Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20230628となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# PyPIにおけるディープラーニングパッケージ・サプライ・チェーンの特徴:ドメイン、クラスタ、ディスエンジメント Characterizing Deep Learning Package Supply Chains in PyPI: Domains, Clusters, and Disengagement ( http://arxiv.org/abs/2306.16307v1 ) ライセンス: Link先を確認	Kai Gao, Runzhi He, Bing Xie, Minghui Zhou	(参考訳) ディープラーニング(DL)パッケージサプライチェーン(SC)は、DLフレームワークが競争力を維持するために不可欠である。しかし、DLパッケージSCの性質に関する重要な知識はいまだに欠如している。本稿では,この知識ギャップを埋めるため,2つの代表的なpypi dlパッケージscsにおいて,パッケージのドメイン,クラスタ,および解除について検討する。約600万のPyPIパッケージディストリビューションのメタデータを分析し、人気のある2つのDLフレームワークであるTensorFlowとPyTorchのバージョンセンシティブなSCを構築します。その結果,2つのSCは8つのカテゴリに属する34のドメインをカバーしている(月間ダウンロード数で測る)。アプリケーション、インフラストラクチャ、科学のカテゴリはそれぞれ、SCとTensorFlowの人気のあるパッケージの85%以上を占めており、PyTorch SCはそれぞれ、インフラストラクチャとアプリケーションのパッケージに特化している。我々は、Leidenコミュニティ検出アルゴリズムを用いて、2つのSCの131と100のクラスタを検出する。クラスタは、主にアロー、スター、ツリー、フォレストという4つの形状を示し、依存関係の複雑さが増す。ほとんどのクラスタはArrowまたはStarだが、TreeとForestのクラスタがほとんどのパッケージ(Tensorflow SC:70%、PyTorch SC:90%)を担っている。パッケージがSCから切り離された3つの理由(すなわち、DLフレームワークとその依存物がインストール依存から削除される)、すなわち依存性の問題、機能改善、インストールの容易さの3つのグループを特定します。 2つのSCの最も一般的な解離原因は異なる。本研究は,PyPI DL SCのメンテナンスと依存性管理の実践に深く影響している。 Deep learning (DL) package supply chains (SCs) are critical for DL frameworks to remain competitive. However, vital knowledge on the nature of DL package SCs is still lacking. In this paper, we explore the domains, clusters, and disengagement of packages in two representative PyPI DL package SCs to bridge this knowledge gap. We analyze the metadata of nearly six million PyPI package distributions and construct version-sensitive SCs for two popular DL frameworks: TensorFlow and PyTorch. We find that popular packages (measured by the number of monthly downloads) in the two SCs cover 34 domains belonging to eight categories. Applications, Infrastructure, and Sciences categories account for over 85% of popular packages in either SC and TensorFlow and PyTorch SC have developed specializations on Infrastructure and Applications packages respectively. We employ the Leiden community detection algorithm and detect 131 and 100 clusters in the two SCs. The clusters mainly exhibit four shapes: Arrow, Star, Tree, and Forest with increasing dependency complexity. Most clusters are Arrow or Star, but Tree and Forest clusters account for most packages (Tensorflow SC: 70%, PyTorch SC: 90%). We identify three groups of reasons why packages disengage from the SC (i.e., remove the DL framework and its dependents from their installation dependencies): dependency issues, functional improvements, and ease of installation. The most common disengagement reason in the two SCs are different. Our study provides rich implications on the maintenance and dependency management practices of PyPI DL SCs.	翻訳日:2023-10-23 18:45:42 公開日:2023-06-28
# FuzzyFlow: プログラム最適化バグの検索とスワッシュにDataflowを活用する FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs ( http://arxiv.org/abs/2306.16178v1 ) ライセンス: Link先を確認	Philipp Schaad and Timo Schneider and Tal Ben-Nun and Alexandru Calotoiu and Alexandros Nikolaos Ziogas and Torsten Hoefler	(参考訳) 現在のハードウェアランドスケープとアプリケーションスケールは、パフォーマンスエンジニアをbespoke最適化を書くよう駆り立てています。このような最適化の検証と、最小限の失敗事例の生成は、入力やサイズといったプログラム条件の変更に対して、堅牢性において重要である。しかしながら、既存のアプリケーションから最小限のテストケースを分離し、新しい構成を生成することは、主にデータフローに関連するシステム状態に副作用があるため、しばしば困難である。本稿では,プログラム最適化をテストするために設計された障害局所化およびテストケース抽出フレームワークであるFuzzyFlowを紹介する。我々は、データフロープログラム表現を利用して、完全に再現可能なシステム状態をキャプチャし、その領域を最適化し、意味同値の高速チェックを可能にする。テスト時間を削減するため,テスト入力を最小限に抑えるアルゴリズムを設計し,再計算のためのメモリ交換を行う。 FuzzyFlowは、従来のアプローチに比べて最大528倍高速な最適化テストとデバッギングを提供する実世界のアプリケーションのユースケースを例に示す。 The current hardware landscape and application scale is driving performance engineers towards writing bespoke optimizations. Verifying such optimizations, and generating minimal failing cases, is important for robustness in the face of changing program conditions, such as inputs and sizes. However, isolation of minimal test-cases from existing applications and generating new configurations are often difficult due to side effects on the system state, mostly related to dataflow. This paper introduces FuzzyFlow: a fault localization and test case extraction framework designed to test program optimizations. We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations to enable fast checking for semantic equivalence. To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation. We demonstrate FuzzyFlow on example use cases in real-world applications where the approach provides up to 528 times faster optimization testing and debugging compared to traditional approaches.	翻訳日:2023-10-23 18:45:16 公開日:2023-06-28
# ソーシャルコーディングプラットフォームにおける画像ベースコミュニケーション Image-based Communication on Social Coding Platforms ( http://arxiv.org/abs/2306.15851v1 ) ライセンス: Link先を確認	Maleknaz Nayebi and Bram Adams	(参考訳) 画像やビデオの形でのビジュアルコンテンツは、様々な方法で汎用ソーシャルネットワークを乗っ取り、オンラインコミュニケーションの合理化と強化を行っている。私たちは、画像の利用がソーシャルコーディングプラットフォームで人気があり、どの程度役に立つかを理解することに興味があります。 MozillaのイシュートラッキングシステムであるBugzillaと、開発者のQ/A、すなわちStack Overflowで最も有名なプラットフォームである。我々はさらに168人のソフトウェア開発者を対象に調査を行い、鉱業結果を三角測量し拡張した。 2013年から2022年の間に、BugzillaとStack Overflowの画像データを含む投稿数は倍増した。さらに、画像を共有することで、他の開発者がコンテンツにより速く関わります。画像が開発者の投稿に含まれている場合の大半では、画像内の情報は提供されたテキストに補完される。最後に,画像が共有された場合,画像内の情報を持たないコンテンツを理解することは,86.9\%のケースではありそうにないことを示した。これらの観察に基づいて、開発者の分析や自動化ツールの設計において、ビジュアルコンテンツを検討することの重要性について論じる。 Visual content in the form of images and videos has taken over general-purpose social networks in a variety of ways, streamlining and enriching online communications. We are interested to understand if and to what extent the use of images is popular and helpful in social coding platforms. We mined nine years of data from two popular software developers' platforms: the Mozilla issue tracking system, i.e., Bugzilla, and the most well-known platform for developers' Q/A, i.e., Stack Overflow. We further triangulated and extended our mining results by performing a survey with 168 software developers. We observed that, between 2013 and 2022, the number of posts containing image data on Bugzilla and Stack Overflow doubled. Furthermore, we found that sharing images makes other developers engage more and faster with the content. In the majority of cases in which an image is included in a developer's post, the information in that image is complementary to the text provided. Finally, our results showed that when an image is shared, understanding the content without the information in the image is unlikely for 86.9\% of the cases. Based on these observations, we discuss the importance of considering visual content when analyzing developers and designing automation tools.	翻訳日:2023-10-23 18:44:57 公開日:2023-06-28
# 多様なポートフォリオにおけるトレーディングのための強化学習手法の評価 Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio ( http://arxiv.org/abs/2309.03202v1 ) ライセンス: Link先を確認	Ishan S. Khare, Tarun K. Martheswaran, Akshana Dassanaike-Perera, Jonah B. Ezekiel	(参考訳) 本研究は,S&P500指数上での強化学習の実現可能性に関する重要な研究課題に答えようとしている。価値反復(vi)のオンポリシー手法と、q-learningのオフポリシー手法とともに、状態-アクション-reward-state-action(sarsa)が実装されている。モデルは2000年から2023年までの数年間の株式市場データからなるデータセット上でトレーニングされ、テストされる。この分析は、covid-19パンデミックの年数を含む2つの異なる期間を使ってモデルをトレーニングし、テストした結果と結果を提示する。その結果、トレーニングデータセットにおけるCOVID-19期間の市場データを含めると、ベースライン戦略よりも優れたパフォーマンスが得られることが示唆された。テスト中、オンラインアプローチ(VIとSARSA)はQラーニングを上回っ、バイアス分散トレードオフの影響とより単純なポリシーの一般化能力を強調した。しかし,Q-ラーニングのパフォーマンスは,今後の市場環境の安定性によって異なる可能性がある。今後の取り組みとして、さまざまな株式の試験および取引におけるqラーニングポリシーの更新を含む実験が提案されている。また,モデル訓練のための代替経済指標の探索も提案している。 This work seeks to answer key research questions regarding the viability of reinforcement learning over the S&P 500 index. The on-policy techniques of Value Iteration (VI) and State-action-reward-state-action (SARSA) are implemented along with the off-policy technique of Q-Learning. The models are trained and tested on a dataset comprising multiple years of stock market data from 2000-2023. The analysis presents the results and findings from training and testing the models using two different time periods: one including the COVID-19 pandemic years and one excluding them. The results indicate that including market data from the COVID-19 period in the training dataset leads to superior performance compared to the baseline strategies. During testing, the on-policy approaches (VI and SARSA) outperform Q-learning, highlighting the influence of bias-variance tradeoff and the generalization capabilities of simpler policies. However, it is noted that the performance of Q-learning may vary depending on the stability of future market conditions. Future work is suggested, including experiments with updated Q-learning policies during testing and trading diverse individual stocks. Additionally, the exploration of alternative economic indicators for training the models is proposed.	翻訳日:2023-10-23 08:54:54 公開日:2023-06-28
# 非エルミート双曲性物質における例外輪郭の発見 Uncovering Exceptional Contours in non-Hermitian Hyperbolic Matter ( http://arxiv.org/abs/2307.04745v1 ) ライセンス: Link先を確認	Nisarg Chadha, Awadhesh Narayan	(参考訳) 双曲格子は、物質の新しい段階を探索するために研究され始めている。同時に、非エルミート物理学は、フォトニック、光学、フォノニック、凝縮体系において最前線にある。本研究では,非エルミート双曲体を導入し,その特異な性質を深く解明する。双曲ブロッホ理論を用いて、非エルミートオンサイトゲインと損失と非相反ホッピングの存在下で双曲格子のバンド構造を調べる。様々な解析的および数値的アプローチを用いて、位相剛性、エネルギースケーリング、渦性を用いて特徴づける10,5}テッセレーションにおいて、広くアクセス可能で可変可能な例外点と輪郭を示す。さらに,ニュートン多角形を用いた<8,4}テセルレーションにおける高次例外点と輪郭の発生を,渦性および位相剛性計算によって実証した。最後に,開放境界スペクトルと状態密度を調べ,バンド理論の結果と比較し,境界局所化の実証を行った。以上の結果から,双曲型非エルミート物質の異常な不均一性がみられた。 Hyperbolic lattices are starting to be explored in search of novel phases of matter. At the same time, non-Hermitian physics has come to the forefront in photonic, optical, phononic, and condensed matter systems. In this work, we introduce non-Hermitian hyperbolic matter and elucidate its exceptional properties in depth. We use hyperbolic Bloch theory to investigate band structures of hyperbolic lattices in the presence of non-Hermitian on-site gain and loss as well as non-reciprocal hopping. Using various analytical and numerical approaches we demonstrate widely accessible and tunable exceptional points and contours in {10,5} tessellations, which we characterize using phase rigidity, energy scaling, and vorticity. We further demonstrate the occurrence of higher-order exceptional points and contours in the {8,4} tessellations using the method of Newton polygons, supported by vorticity and phase rigidity computations. Finally, we investigate the open boundary spectra and densities of states to compare with results from band theory, along with a demonstration of boundary localisation. Our results unveil an abundance of exceptional degeneracies in hyperbolic non-Hermitian matter.	翻訳日:2023-07-16 04:02:56 公開日:2023-06-28
# 中性子・X線反射率データのニューラルネットワーク解析:相問題に取り組むための事前知識の導入 Neural network analysis of neutron and X-ray reflectivity data: Incorporating prior knowledge for tackling the phase problem ( http://arxiv.org/abs/2307.05364v1 ) ライセンス: Link先を確認	Valentin Munteanu, Vladimir Starostin, Alessandro Greco, Linus Pithan, Alexander Gerlach, Alexander Hinderhofer, Stefan Kowarik, Frank Schreiber	(参考訳) 位相情報の欠如により、測定された中性子およびx線反射率曲線から多層薄膜の物理パラメータを決定することは、基本レベルでは、不確定な逆問題である。このいわゆるフェーズ問題は、従来の機械学習ソリューションで考慮されるパラメータの範囲と数を制限する、標準的なニューラルネットワークに制限を与える。そこで本研究では,事前知識を活用し,より広いパラメータ空間上でのトレーニングプロセスを定式化する手法を提案する。ボックスモデルパラメータ化を用いた多層構造や,多層構造に対する散乱長密度プロファイルの物理に着想を得た特殊パラメータ化など,様々なシナリオにおいて本手法の有効性を示す。事前知識の入力を活用することで、トレーニングダイナミクスを改善し、未決定の(未解決の)問題の性質に対処できる。従来の手法とは対照的に,5層多層モデルや最大17個のオープンパラメータを持つn層周期多層モデルにおいても,逆問題の複雑性を増大させる手法は好適である。 Due to the lack of phase information, determining the physical parameters of multilayer thin films from measured neutron and X-ray reflectivity curves is, on a fundamental level, an underdetermined inverse problem. This so-called phase problem poses limitations on standard neural networks, constraining the range and number of considered parameters in previous machine learning solutions. To overcome this, we present an approach that utilizes prior knowledge to regularize the training process over larger parameter spaces. We demonstrate the effectiveness of our method in various scenarios, including multilayer structures with box model parameterization and a physics-inspired special parameterization of the scattering length density profile for a multilayer structure. By leveraging the input of prior knowledge, we can improve the training dynamics and address the underdetermined ("ill-posed") nature of the problem. In contrast to previous methods, our approach scales favorably when increasing the complexity of the inverse problem, working properly even for a 5-layer multilayer model and an N-layer periodic multilayer model with up to 17 open parameters.	翻訳日:2023-07-16 03:55:58 公開日:2023-06-28
# VisText:Semantically Rich Chart Captioningのベンチマーク VisText: A Benchmark for Semantically Rich Chart Captioning ( http://arxiv.org/abs/2307.05356v1 ) ライセンス: Link先を確認	Benny J. Tang, Angie Boggust and Arvind Satyanarayan	(参考訳) チャートを記述または説明するキャプションは、描写されたデータのリコールと理解を改善し、視覚障害者にとってよりアクセスしやすい媒体を提供する。しかし、このようなキャプションを自動生成する現在のアプローチは、チャートの目印である知覚的特徴や認知的特徴(複雑な傾向やパターンなど)を明確にするのに苦労している。グラフの構成を記述した12,441組のチャートとキャプションのデータセットであるVisTextを紹介し、重要な統計を報告し、知覚的および認知的現象を識別する。 VisTextでは、チャートはラスタ化イメージ、バックデータテーブル、シーングラフの3つの表現として利用可能である。これは、チャートの視覚要素をWebページのドキュメントオブジェクトモデル(DOM)に似た階層的な表現である。 vistextの影響を評価するために、グラフキャプションタスクに最先端の言語モデルを微調整し、彼らが伝達する意味的コンテンツが異なるキャプションを作成するためにプレフィックスチューニングを適用します。我々のモデルはコヒーレントでセマンティックにリッチなキャプションを生成し、機械翻訳とテキスト生成のメトリクスで最先端のチャートキャプションモデルと同等に機能する。定性的分析により、我々のモデルが将来の作業に役立てる6つの幅広いエラーカテゴリを特定します。 Captions that describe or explain charts help improve recall and comprehension of the depicted data and provide a more accessible medium for people with visual disabilities. However, current approaches for automatically generating such captions struggle to articulate the perceptual or cognitive features that are the hallmark of charts (e.g., complex trends and patterns). In response, we introduce VisText: a dataset of 12,441 pairs of charts and captions that describe the charts' construction, report key statistics, and identify perceptual and cognitive phenomena. In VisText, a chart is available as three representations: a rasterized image, a backing data table, and a scene graph -- a hierarchical representation of a chart's visual elements akin to a web page's Document Object Model (DOM). To evaluate the impact of VisText, we fine-tune state-of-the-art language models on our chart captioning task and apply prefix-tuning to produce captions that vary the semantic content they convey. Our models generate coherent, semantically rich captions and perform on par with state-of-the-art chart captioning models across machine translation and text generation metrics. Through qualitative analysis, we identify six broad categories of errors that our models make that can inform future work.	翻訳日:2023-07-16 03:55:21 公開日:2023-06-28
# HIVA:ホログラフィー・インテリジェント音声アシスタント HIVA: Holographic Intellectual Voice Assistant ( http://arxiv.org/abs/2307.05501v1 ) ライセンス: Link先を確認	Ruslan Isaev, Radmir Gumerov, Gulzada Esenalieva, Remudin Reshid Mekuria, Ermek Doszhanov	(参考訳) Holographic Intellectual Voice Assistant (HIVA)は、視覚効果と3Dアバターを用いた人間のコンピュータインタラクションを促進することを目的としている。 hivaは、入学、研究問題、手数料、部門、大学構造と歴史、カンティーン、人的資源、図書館、学生生活とイベント、国と市に関する情報など、様々な性質の要求を含む、大学に関する完全な情報を提供している。以上のデータを受信するには、大学の公式サイトやその他のサポートアプリ、HEI(Higher Education Institution)公式ソーシャルメディア、HEIスタッフに直接質問する他のチャンネルなどがある。しかし、HIVAはアニメーション3Dマスコットとの「対面」相互作用のユニークな体験を提供し、実際のコミュニケーションの感覚を得るのに役立つ。このシステムは、多くのサブモジュールを含み、モバイルアプリケーション、Telegramチャットボット、提案分類、エンターテイメントサービスなどのアプリケーション群を接続する。音声アシスタントは、最高のユーザーエクスペリエンスのためにパイプライン化されたロシア語のnlpモデルとツールを使用する。 Holographic Intellectual Voice Assistant (HIVA) aims to facilitate human computer interaction using audiovisual effects and 3D avatar. HIVA provides complete information about the university, including requests of various nature: admission, study issues, fees, departments, university structure and history, canteen, human resources, library, student life and events, information about the country and the city, etc. There are other ways for receiving the data listed above: the university's official website and other supporting apps, HEI (Higher Education Institution) official social media, directly asking the HEI staff, and other channels. However, HIVA provides the unique experience of "face-to-face" interaction with an animated 3D mascot, helping to get a sense of 'real-life' communication. The system includes many sub-modules and connects a family of applications such as mobile applications, Telegram chatbot, suggestion categorization, and entertainment services. The Voice assistant uses Russian language NLP models and tools, which are pipelined for the best user experience.	翻訳日:2023-07-16 03:35:05 公開日:2023-06-28
# 微分力学系に対する古典的フィッシャー情報 Classical Fisher information for differentiable dynamical systems ( http://arxiv.org/abs/2307.00026v1 ) ライセンス: Link先を確認	Mohamed Sahbani, Swetamber Das, and Jason R. Green	(参考訳) フィッシャー情報は、古典的および量子力学的パラメータの統計的推定における不確実性の低い境界である。いくつかの決定論的力学系はランダムなゆらぎには属さないが、それでも不確実性がある: 初期条件に対する無限小の摂動は、決定論的カオスのサインである時間的に指数関数的に増加する。この不確かさの尺度として、他の古典的情報、特に騒音に従わない古典システムの決定論的ダイナミクスを紹介する。この古典的な情報の測度は接空間におけるリャプノフベクトルで定義されており、古典的なフィッシャー情報に似ておらず、ヒルベルト空間の波動ベクトルで定義される量子フィッシャー情報に近い。局所状態空間構造と線形安定性の解析は,この情報の上界と下界につながり,流れのネットストレッチング作用として解釈される。機械的な例のためのこの情報の数値計算は、位相空間の曲率と流れの速度に直接依存していることを示している。 Fisher information is a lower bound on the uncertainty in the statistical estimation of classical and quantum mechanical parameters. While some deterministic dynamical systems are not subject to random fluctuations, they do still have a form of uncertainty: Infinitesimal perturbations to the initial conditions can grow exponentially in time, a signature of deterministic chaos. As a measure of this uncertainty, we introduce another classical information, specifically for the deterministic dynamics of classical systems not subject to noise. This classical measure of information is defined with Lyapunov vectors in tangent space, making it less akin to the classical Fisher information and more akin to the quantum Fisher information defined with wavevectors in Hilbert space. Our analysis of the local state space structure and linear stability lead to upper and lower bounds on this information, giving it an interpretation as the net stretching action of the flow. Numerical calculations of this information for illustrative mechanical examples show that it depends directly on the phase space curvature and speed of the flow.	翻訳日:2023-07-09 13:50:03 公開日:2023-06-28
# EmoSpeech: FastSpeech2が感情テキストから音声へ EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech ( http://arxiv.org/abs/2307.00024v1 ) ライセンス: Link先を確認	Daria Diatlova, Vitaly Shutov	(参考訳) 最先端の音声合成モデルは、人間の声にできるだけ近づこうとしている。したがって、感情のモデル化はテキスト音声(TTS)研究の不可欠な部分である。本研究では,fastspeech2を出発点として選択し,感情音声合成のための一連の修正を提案する。自動評価と人的評価により,我々のモデルであるEmoSpeechは,生成音声におけるMOSスコアと感情認識精度の両方に関する既存モデルを上回った。我々は、EmoSpeechを形成するFastSpeech2アーキテクチャのすべての拡張について、詳細なアブレーション研究を行った。テキスト中の感情の不均一な分布は、より良い、合成された音声とイントネーション知覚に不可欠である。私たちのモデルには、さまざまな強度レベルで各携帯電話に感情が貢献できるようにすることで、この問題を効果的に処理するコンディショニングメカニズムが含まれています。人間の評価は、提案された修正は、より高いMOSと感情表現性を持つ音声を生成することを示している。 State-of-the-art speech synthesis models try to get as close as possible to the human voice. Hence, modelling emotions is an essential part of Text-To-Speech (TTS) research. In our work, we selected FastSpeech2 as the starting point and proposed a series of modifications for synthesizing emotional speech. According to automatic and human evaluation, our model, EmoSpeech, surpasses existing models regarding both MOS score and emotion recognition accuracy in generated speech. We provided a detailed ablation study for every extension to FastSpeech2 architecture that forms EmoSpeech. The uneven distribution of emotions in the text is crucial for better, synthesized speech and intonation perception. Our model includes a conditioning mechanism that effectively handles this issue by allowing emotions to contribute to each phone with varying intensity levels. The human assessment indicates that proposed modifications generate audio with higher MOS and emotional expressiveness.	翻訳日:2023-07-09 13:49:45 公開日:2023-06-28
# 古典状態の幾何学的テンソル The geometric tensor for classical states ( http://arxiv.org/abs/2307.01208v1 ) ライセンス: Link先を確認	A. D. Berm\'udez Manjarres	(参考訳) リウヴィル固有関数を用いて幾何テンソルの古典版を定義し、古典的断熱ゲージポテンシャル(AGP)との関係を研究する。我々は可積分系に注目し、幾何学的テンソルの虚部がハンネー曲率と関連していることを示す。幾何テンソルの特異点と AGP は、アーノルド・リウヴィル積分性からカオスへの遷移と、量子相転移の数学的形式論のいくつかを結びつけることができる。 We use the Liouville eigenfunctions to define a classical version of the geometric tensor and study its relationship with the classical adiabatic gauge potential (AGP). We focus on integrable systems and show that the imaginary part of the geometric tensor is related to the Hannay curvature. The singularities of the geometric tensor and the AGP allows us to link the transition from Arnold-Liouville integrability to chaos with some of the mathematical formalism of quantum phase transitions.	翻訳日:2023-07-09 13:40:53 公開日:2023-06-28
# オンラインおよびモバイルソーシャルネットワークのためのレコメンダシステム:調査 Recommender Systems for Online and Mobile Social Networks: A survey ( http://arxiv.org/abs/2307.01207v1 ) ライセンス: Link先を確認	Mattia Giovanni Campana, Franca Delmastro	(参考訳) Recommender Systems (RS) はオンラインサービスにおける基本的なツールであり、特に Online Social Networks (OSN) が出現した。この場合、ユーザは大量のコンテンツを生成し、無駄な情報によって素早くオーバーロードできる。同時に、ソーシャルメディアはコンテンツやユーザーの興味を特徴づける重要な情報源となっている。 RSはこの情報を利用して提案をさらにパーソナライズし、推奨プロセスを改善することができる。本稿では,オンラインおよびモバイルのソーシャルネットワーク向けに設計・実装されたレコメンダシステムに関する調査を行い,ソーシャルコンテキスト情報の利用がレコメンデーションタスクをどのように改善するか,標準アルゴリズムを拡張・最適化して,日和見ネットワークとして完全に分散した環境で動作させるべきか,について述べる。本稿では,これらのシステムの利点と欠点を,アルゴリズム,対象領域,評価指標,性能評価の観点から説明する。最終的には、この分野におけるオープンリサーチの課題をいくつか提示する。 Recommender Systems (RS) currently represent a fundamental tool in online services, especially with the advent of Online Social Networks (OSN). In this case, users generate huge amounts of contents and they can be quickly overloaded by useless information. At the same time, social media represent an important source of information to characterize contents and users' interests. RS can exploit this information to further personalize suggestions and improve the recommendation process. In this paper we present a survey of Recommender Systems designed and implemented for Online and Mobile Social Networks, highlighting how the use of social context information improves the recommendation task, and how standard algorithms must be enhanced and optimized to run in a fully distributed environment, as opportunistic networks. We describe advantages and drawbacks of these systems in terms of algorithms, target domains, evaluation metrics and performance evaluations. Eventually, we present some open research challenges in this area.	翻訳日:2023-07-09 13:40:45 公開日:2023-06-28
# CTR予測のための信頼度ランキング Confidence Ranking for CTR Prediction ( http://arxiv.org/abs/2307.01206v1 ) ライセンス: Link先を確認	Jian Zhu, Congcong Liu, Pei Wang, Xiwei Zhao, Zhangang Lin, Jingping Shao	(参考訳) モデルの進化とデータの定常利用は、広告やレコメンデーションシステムなど、大規模な実世界の機械学習アプリケーションにおいて2つの一般的な現象である。適応するために、現実世界のシステムは、通常、すべての利用可能なデータで再トレーニングし、最近利用可能なデータでオンライン学習を行い、パフォーマンスの向上を目標として定期的にモデルを更新する。本稿では,2つの異なるモデルを用いたランキング関数として最適化目標を設計する,信頼ランキングという新しいフレームワークを提案する。私たちの信頼度ランキングの損失は、メトリクスの異なる凸サーロゲート関数(例えば、aucと精度)に対するロジット出力の直接最適化を可能にします。提案手法を用いて,信頼度ランキング損失の導入は,公共および産業データセットのCTR予測タスクにおいて,すべてのベースラインを上回り得ることを示す。このフレームワークは、JD.comの広告システムに展開され、ファインランクの段階で主要なトラフィックを提供する。 Model evolution and constant availability of data are two common phenomena in large-scale real-world machine learning applications, e.g. ads and recommendation systems. To adapt, the real-world system typically retrain with all available data and online learn with recently available data to update the models periodically with the goal of better serving performance. In this paper, we propose a novel framework, named Confidence Ranking, which designs the optimization objective as a ranking function with two different models. Our confidence ranking loss allows direct optimization of the logits output for different convex surrogate functions of metrics, e.g. AUC and Accuracy depending on the target task and dataset. Armed with our proposed methods, our experiments show that the introduction of confidence ranking loss can outperform all baselines on the CTR prediction tasks of public and industrial datasets. This framework has been deployed in the advertisement system of JD.com to serve the main traffic in the fine-rank stage.	翻訳日:2023-07-09 13:40:26 公開日:2023-06-28
# 命令チューニングの活用可能性について On the Exploitability of Instruction Tuning ( http://arxiv.org/abs/2306.17194v1 ) ライセンス: Link先を確認	Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein	(参考訳) インストラクションチューニングは、大きな言語モデル(LLM)を人間の意図に合わせる効果的な手法である。本研究では,モデル動作を意図的に変化させる訓練データに,特定の指示追従例を注入することにより,相手が指導チューニングを利用する方法を検討する。例えば、敵は、ターゲットコンテンツに言及するトレーニング例を注入し、下流モデルからそのような行動を引き出すことによって、コンテンツ注入を達成できる。この目的を達成するために、自動データ中毒パイプラインである \textit{AutoPoison} を提案する。自然とコヒーレントに、oracle llmの助けを借りて、汎用的な攻撃目標を有毒データに組み込む。コンテンツインジェクションと過剰拒否攻撃の2つの例を示し、それぞれが特定の悪用可能な振る舞いを誘導する。データ中毒スキームの強さとステルスネスを定量化し、ベンチマークします。以上の結果から, オートポゾンにより, 被毒例の密着性を維持しつつ, 少量のデータのみを有毒化することにより, 敵がモデルの行動を変えることが可能となった。私たちの研究は、データ品質が命令調整モデルの振る舞いにどのように影響するかを明らかにし、llmの責任ある展開におけるデータ品質の重要性に対する認識を高めることを願っています。コードは \url{https://github.com/azshue/autopoison} で入手できる。 Instruction tuning is an effective technique to align large language models (LLMs) with human intents. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally changes the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content and eliciting such behavior from downstream models. To achieve this goal, we propose \textit{AutoPoison}, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs. Code is available at \url{https://github.com/azshue/AutoPoison}.	翻訳日:2023-07-03 14:31:25 公開日:2023-06-28
# 自動脆弱性検出のための機械学習の限界 Limits of Machine Learning for Automatic Vulnerability Detection ( http://arxiv.org/abs/2306.17193v1 ) ライセンス: Link先を確認	Niklas Risse, Marcel B\"ohme	(参考訳) 関数のソースコードのみを$f$とすれば、マシンラーニングテクニックによってトレーニングされたモデルは、$f$が最大70%の精度でセキュリティ上の欠陥を含むかどうかを判断できる。しかし、これらの結果が汎用的でデータセットに固有のものではないことをどうやって知るのか? この質問を研究するために、研究者はセマンティクス保存の変更を注入することでテストセットの増幅を提案し、モデルの精度が大幅に低下することを発見した。言い換えると、このモデルは分類中にいくつかの無関係な特徴を使用する。モデルの堅牢性を高めるために、研究者は増幅されたトレーニングデータをトレーニングすることを提案した。本稿では,本研究を再現・継続し,研究者が脆弱性検出のための機械学習の進歩をよりよく評価する上で有効なモデルベンチマーク手法を提案する。具体的には (i)トレーニングセットまたはテストセットの増幅中に意味保存変換を適用するクロス検証アルゴリズム (ii)脆弱性が修正されたコードスニペットによるテストセットの増幅。 11の変換、3つのMLテクニック、2つのデータセットを使用して、改善された堅牢性は、トレーニングデータ増幅時に使用される特定の変換にのみ適用される。言い換えれば、堅牢化モデルはテストデータの脆弱性を予測するために、いまだ無関係な機能に依存しています。さらに、トレーニングされたモデルでは、脆弱性のある機能をパッチと区別する必要のある修正された設定に一般化できないことも分かりました。 Recent results of machine learning for automatic vulnerability detection have been very promising indeed: Given only the source code of a function $f$, models trained by machine learning techniques can decide if $f$ contains a security flaw with up to 70% accuracy. But how do we know that these results are general and not specific to the datasets? To study this question, researchers proposed to amplify the testing set by injecting semantic preserving changes and found that the model's accuracy significantly drops. In other words, the model uses some unrelated features during classification. In order to increase the robustness of the model, researchers proposed to train on amplified training data, and indeed model accuracy increased to previous levels. In this paper, we replicate and continue this investigation, and provide an actionable model benchmarking methodology to help researchers better evaluate advances in machine learning for vulnerability detection. Specifically, we propose (i) a cross validation algorithm, where a semantic preserving transformation is applied during the amplification of either the training set or the testing set, and (ii) the amplification of the testing set with code snippets where the vulnerabilities are fixed. Using 11 transformations, 3 ML techniques, and 2 datasets, we find that the improved robustness only applies to the specific transformations used during training data amplification. In other words, the robustified models still rely on unrelated features for predicting the vulnerabilities in the testing data. Additionally, we find that the trained models are unable to generalize to the modified setting which requires to distinguish vulnerable functions from their patches.	翻訳日:2023-07-03 14:31:03 公開日:2023-06-28
# ネットワーク干渉による因果推論の局所的アプローチ The Local Approach to Causal Inference under Network Interference ( http://arxiv.org/abs/2105.03810v4 ) ライセンス: Link先を確認	Eric Auerbach and Max Tabord-Meehan	(参考訳) エージェントが社会的・経済的ネットワークにどのようにリンクされているかに依存する場合の因果推論のための新しい非パラメトリックモデリングフレームワークを提案する。このようなネットワーク干渉は、治療の流出、社会的相互作用、社会学習、情報拡散、病気と金融の伝染、社会資本の形成などに関する大きな文献を記述している。提案手法では, エージェントがネットワーク内でどのようにリンクされているかを, 経路距離で測定した他のエージェントと近傍の接続の設定を用いて特徴付ける。ポリシーや治療課題の影響は、同様に構成されたエージェント間で結果データをプールすることで学習される。本研究は,k-nearest-neighbor推定器の平均的又は分布的政策効果/治療反応に対する平均二乗誤差を限定し,政策無関係/無治療仮説に対する漸近的に有効なテストを提案する。 We propose a new nonparametric modeling framework for causal inference when outcomes depend on how agents are linked in a social or economic network. Such network interference describes a large literature on treatment spillovers, social interactions, social learning, information diffusion, disease and financial contagion, social capital formation, and more. Our approach works by first characterizing how an agent is linked in the network using the configuration of other agents and connections nearby as measured by path distance. The impact of a policy or treatment assignment is then learned by pooling outcome data across similarly configured agents. We demonstrate the approach by proposing an asymptotically valid test for the hypothesis of policy irrelevance/no treatment effects and bounding the mean-squared error of a k-nearest-neighbor estimator for the average or distributional policy effect/treatment response.	翻訳日:2023-06-30 19:50:23 公開日:2023-06-28
# 学習型ビジュアルオドメトリーを用いた動的高密度RGB-D SLAM Dynamic Dense RGB-D SLAM using Learning-based Visual Odometry ( http://arxiv.org/abs/2205.05916v2 ) ライセンス: Link先を確認	Shihao Shen, Yilin Cai, Jiayi Qiu, Guangzhao Li	(参考訳) 本稿では,学習に基づくビジュアルオドメトリーであるTartanVOに基づく高密度な動的RGB-D SLAMパイプラインを提案する。 TartanVOは、機能ベースの他の直接的な方法と同様に、高密度の光学的流れを通してカメラのポーズを推定するが、これは静的なシーンにのみ適用され、動的オブジェクトを無視する。色濃度の仮定により、光学フローは動的画素と静的画素の区別ができない。したがって,このような直接的手法で静的マップを再構築するには,光フロー出力を利用して動的/静的セグメンテーションを解決し,静的ポイントのみをマップに融合する。さらに、動的な画素を取り除いた入力フレームを再描画し、視覚的なオドメトリーに繰り返し転送してポーズ推定を洗練させる。 We propose a dense dynamic RGB-D SLAM pipeline based on a learning-based visual odometry, TartanVO. TartanVO, like other direct methods rather than feature-based, estimates camera pose through dense optical flow, which only applies to static scenes and disregards dynamic objects. Due to the color constancy assumption, optical flow is not able to differentiate between dynamic and static pixels. Therefore, to reconstruct a static map through such direct methods, our pipeline resolves dynamic/static segmentation by leveraging the optical flow output, and only fuse static points into the map. Moreover, we rerender the input frames such that the dynamic pixels are removed and iteratively pass them back into the visual odometry to refine the pose estimate.	翻訳日:2023-06-30 19:47:46 公開日:2023-06-28
# 電圧からの構造 Structure from Voltage ( http://arxiv.org/abs/2203.00063v2 ) ライセンス: Link先を確認	Robi Bhattacharjee, Alex Cloninger, Yoav Freund, Andreas Oslandsbotn	(参考訳) 有効抵抗(ER)はグラフの構造を問う魅力的な方法である。これはグラフラプラシアンの固有ベクトルを計算するに代わるものである。グラフラプラシアンは高次元データにおいて低次元構造を見つけるために用いられる。ここでも、ERベースの解析は等ベクトル法よりも有利である。残念ながら、Von Luxburg et al. (2010) は、頂点が計量空間上の分布からのサンプルに対応するとき、遠点間のERの極限はグラフの構造に関する情報を持たない自明な量に収束することを示した。我々は、$n$頂点が$n^2$のグラフにおけるスケーリング抵抗を使用することで、電圧と有効抵抗の有意な制限が得られることを示す。また、計量グラフに「接地」ノードを加えることで、選択された点から他の全ての点までの距離を計算するための単純で自然な方法が得られることを示す。 Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigen-vectors of the graph Laplacian. Graph laplacians are used to find low dimensional structures in high dimensional data. Here too, ER based analysis has advantages over eign-vector based methods. Unfortunately Von Luxburg et al. (2010) show that, when vertices correspond to a sample from a distribution over a metric space, the limit of the ER between distant points converges to a trivial quantity that holds no information about the structure of the graph. We show that by using scaling resistances in a graph with $n$ vertices by $n^2$, one gets a meaningful limit of the voltages and of effective resistances. We also show that by adding a "ground" node to a metric graph one gets a simple and natural way to compute all of the distances from a chosen point to all other points.	翻訳日:2023-06-30 19:47:12 公開日:2023-06-28
# ブロックワイズデータを用いた高次元線形回帰の統計的推測 Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data ( http://arxiv.org/abs/2106.03344v2 ) ライセンス: Link先を確認	Fei Xue, Rong Ma, Hongzhe Li	(参考訳) 異なるソースやモダリティが相補的な情報を含んでいるマルチソースまたはマルチモダリティデータを統合すると、ブロックワイドなデータが頻繁に発生する。本稿では,ブロックワイド共変量と部分的な応答変数を持つ高次元線形回帰モデルについて考察する。本研究では,不偏推定方程式とブロックワイズ計算法に基づく回帰係数ベクトルの計算効率の高い推定器を提案し,その収束率を求める。さらに,初期推定器のバイアス補正を本質的に達成する革新的な予測式法に基づいて,漸近的に分布する各回帰係数に対する偏りのない推定法を提案する。これらの偏差推定器に基づいて、漸近的に有効な信頼区間と各回帰係数に関する統計的試験を構築する。アルツハイマー病の神経画像化イニシアチブデータの数値研究と応用分析により,提案法が従来の方法よりも良好で,教師なし検体より有益であることが示された。 Blockwise missing data occurs frequently when we integrate multisource or multimodality data where different sources or modalities contain complementary information. In this paper, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a blockwise imputation procedure, and obtain its rate of convergence. Furthermore, building upon an innovative projected estimating equation technique that intrinsically achieves bias-correction of the initial estimator, we propose a nearly unbiased estimator for each individual regression coefficient, which is asymptotically normally distributed under mild conditions. Based on these debiased estimators, asymptotically valid confidence intervals and statistical tests about each regression coefficient are constructed. Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.	翻訳日:2023-06-30 19:46:08 公開日:2023-06-28
# SAFER:強化学習による集中的かつ効率的な軌道探索による安全な衝突回避 SAFER: Safe Collision Avoidance using Focused and Efficient Trajectory Search with Reinforcement Learning ( http://arxiv.org/abs/2209.11789v2 ) ライセンス: Link先を確認	Mario Srouji, Hugues Thomas, Hubert Tsai, Ali Farhadi, Jian Zhang	(参考訳) 衝突回避は、現実世界で安全に動く移動ロボットやエージェントにとって重要だ。本研究では,オペレーターが送信する制御コマンドの修正により安全性を向上させることができる,効率的な衝突回避システムSAFERを提案する。現実世界の強化学習(RL)、検索ベースのオンライン軌道計画、自動緊急ブレーキ(AEB)などの自動緊急介入を組み合わせる。 RLの目的は、衝突のない軌道の集中探索に使用される効果的な補正制御動作を学習し、自動緊急ブレーキの起動頻度を低減することである。この新しいセットアップにより、rlポリシーは現実世界の屋内環境でモバイルロボットを安全に直接学習することができ、トレーニング中にも実際のクラッシュを最小限に抑えることができる。私たちの実世界の実験では、いくつかのベースラインと比較すると、平均速度、クラッシュ率の低下、緊急介入の低減、計算オーバーヘッドの低減、全体的なコントロールの円滑化が期待できます。 Collision avoidance is key for mobile robots and agents to operate safely in the real world. In this work we present SAFER, an efficient and effective collision avoidance system that is able to improve safety by correcting the control commands sent by an operator. It combines real-world reinforcement learning (RL), search-based online trajectory planning, and automatic emergency intervention, e.g. automatic emergency braking (AEB). The goal of the RL is to learn an effective corrective control action that is used in a focused search for collision-free trajectories, and to reduce the frequency of triggering automatic emergency braking. This novel setup enables the RL policy to learn safely and directly on mobile robots in a real-world indoor environment, minimizing actual crashes even during training. Our real-world experiments show that, when compared with several baselines, our approach enjoys a higher average speed, lower crash rate, less emergency intervention, smaller computation overhead, and smoother overall control.	翻訳日:2023-06-30 19:35:54 公開日:2023-06-28
# マルチタイム量子通信: 興味深いが反事実ではない Multitime Quantum Communication: Interesting But Not Counterfactual ( http://arxiv.org/abs/2301.01730v3 ) ライセンス: Link先を確認	Robert B. Griffiths	(参考訳) Salihらによって導入された2つの当事者間での情報伝達プロトコルPhys。 Rev. Lett. 110 (2013) 170502 (ここでslazの後)は、単に一方向の信号を送るのではなく、一連のステップで量子チャネルで量子振幅を前後に送信する。著者らは、それらのプロトコルは、両者をつなぐために量子チャネルが必要であるが、ステップ数が無限になる傾向があるため、その実際の使用量は漸近的限界において極めて小さくなるという意味で、'counterfactual'であると主張した。ここでは、量子干渉の存在下で中間時間で有効でない確率論的推論を使用するため、この主張は誤りであることを示す。未定義の確率が、チャネルを通じて送信される振幅の絶対二乗に等しい「コスト」と呼ばれるよく定義されたチャネル使用量の尺度に置き換えられるとき、その総コストは、多くのステップの漸近極限においてゼロにならず、厳密な不等式によって下限となる。詳細な分析により、この境界がSLAZプロトコルで満たされていることが示されている。この境界につながる解析は、純量子状態の集合の内部積によって形成されるグラム行列がヒルベルト部分空間上の加法的であり、ユニタリ時間変換の下で不変であるという事実を用いる。その非対角的要素は概して肯定的ではないが、形式的議論において重要な役割を果たすとともに、情報の伝達を幾分奇妙な方法で可視化する。 A protocol for transmission of information between two parties introduced by Salih et al., Phys. Rev. Lett. 110 (2013) 170502 (hereafter SLAZ), involves sending quantum amplitude back and forth through a quantum channel in a series of steps, rather than simply sending a signal in one direction. The authors claimed that their protocol was ``counterfactual'' in the sense that while a quantum channel is needed to connect the parties, its actual usage becomes vanishingly small in the asymptotic limit as the number of steps tends to infinity. Here we show that this claim is incorrect because it uses probabilistic reasoning that is not valid at intermediate times in the presence of quantum interference. When ill-defined probabilities are replaced with a well-defined measure of channel usage here called ``Cost'', equal to the absolute square of the amplitude sent through the channel, the total Cost does not go to zero in the asymptotic limit of a large number of steps, but is bounded below by a rigorous inequality. A detailed analysis shows that this bound is satisfied in the SLAZ protocol. The analysis leading to the bound uses the fact that the Gram matrix formed by inner products of a collection of pure quantum states is additive over Hilbert subspaces and invariant under unitary time transformations. Its off-diagonal elements, which in general are not positive, play a significant role in the formal argument as well as providing a somewhat strange way of visualizing the transfer of information.	翻訳日:2023-06-30 19:27:24 公開日:2023-06-28
# 一般核行列のデータ駆動線形複雑性低ランク近似:幾何学的アプローチ Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach ( http://arxiv.org/abs/2212.12674v2 ) ライセンス: Link先を確認	Difeng Cai, Edmond Chow, Yuanzhe Xi	(参考訳) 一般に、カーネル行列は $k_{ij} = \kappa(x_i,y_j)$ ここで $\kappa(x,y)$ はカーネル関数であり、$x=\{x_i\}_{i=1}^m$ と $y=\{y_i\}_{i=1}^n$ は2つの点の集合である。本稿では、x$ と $y$ の点の集合が大きすぎて、互いに離れて ``intermingled'' や same など、任意に分布するカーネル行列の低ランク近似を求める。このような長方形のカーネル行列は、例えばガウスのプロセス回帰において、$X$はトレーニングデータに対応し、$Y$はテストデータに対応する。この場合、点はしばしば高次元である。点集合は大きいので、行列が核関数から生じるという事実を活用し、行列の形成を避け、したがってほとんどの代数的手法を除外しなければならない。特に,固定近似ランクのデータサイズに関して,線形あるいはほぼ線形にスケールできる手法を求める。この論文の主なアイデアは、低位近似を構成する点の適切な部分集合を幾何学的に選択することである。本稿では,この選択をいかに行うべきかを考察する。 A general, {\em rectangular} kernel matrix may be defined as $K_{ij} = \kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are arbitrarily distributed, such as away from each other, ``intermingled'', identical, etc. Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function, and avoid forming the matrix, and thus ruling out most algebraic techniques. In particular, we seek methods that can scale linearly or nearly linear with respect to the size of data for a fixed approximation rank. The main idea in this paper is to {\em geometrically} select appropriate subsets of points to construct a low rank approximation. An analysis in this paper guides how this selection should be performed.	翻訳日:2023-06-30 19:26:57 公開日:2023-06-28
# カスケード変分量子固有解法アルゴリズム Cascaded variational quantum eigensolver algorithm ( http://arxiv.org/abs/2303.15237v2 ) ライセンス: Link先を確認	Daniel Gunlycke, C. Stephen Hellberg, and John P. T. Stenger	(参考訳) 本稿では,パラメータ最適化過程において,反復毎に1回ではなく1回の量子回路セットの実行しか必要としないカスケード変分量子固有ソルバアルゴリズムを提案する。このアルゴリズムは量子処理ユニットを用いて必要な確率質量関数を探索し、古典処理ユニットはエネルギー最小化を含む残りの計算を行う。アンサッツ形式はフォック空間を制限せず、対称性やその他の物理的動機付けのある制約を含むアンサッツ状態を完全に制御する。 We present a cascaded variational quantum eigensolver algorithm that only requires the execution of a set of quantum circuits once rather than at every iteration during the parameter optimization process, thereby reducing the number of needed circuit executions. This algorithm uses a quantum processing unit to probe the needed probability mass functions and a classical processing unit perform the remaining calculations, including the energy minimization. The ansatz form does not restrict the Fock space and provides full control over the ansatz state, including the implementation of symmetry and other physically motivated constraints.	翻訳日:2023-06-30 19:18:23 公開日:2023-06-28
# ProphNet: アンカーインフォームド提案による効率的なエージェント中心運動予測 ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals ( http://arxiv.org/abs/2303.12071v3 ) ライセンス: Link先を確認	Xishun Wang, Tong Su, Fang Da, Xiaodong Yang	(参考訳) モーション予測は自動運転システムにおいて重要なモジュールである。マルチソース入力の異質性、エージェントの動作におけるマルチモダリティ、オンボード配置に必要な低レイテンシのため、このタスクは悪名高い課題である。このような問題に対処するため,本研究では,効率的なマルチモーダル動作予測のためのアンカーインフォームド提案を用いたエージェント中心モデルを提案する。複雑な入力を簡潔に統一的に符号化するモダリティ非依存戦略を設計する。我々は,目標志向のシーンコンテキストを持つアンカーと融合した多様な提案を生成し,幅広い将来の軌跡をカバーするマルチモーダル予測を誘導する。我々のネットワークアーキテクチャは高度に均一で簡潔であり、現実の運転環境に適応できる効率的なモデルに繋がる。実験により,エージェント中心のネットワークは予測精度において最先端の手法と好適に比較され,シーン中心レベルの推論レイテンシが達成された。 Motion forecasting is a key module in an autonomous driving system. Due to the heterogeneous nature of multi-sourced input, multimodality in agent behavior, and low latency required by onboard deployment, this task is notoriously challenging. To cope with these difficulties, this paper proposes a novel agent-centric model with anchor-informed proposals for efficient multimodal motion prediction. We design a modality-agnostic strategy to concisely encode the complex input in a unified manner. We generate diverse proposals, fused with anchors bearing goal-oriented scene context, to induce multimodal prediction that covers a wide range of future trajectories. Our network architecture is highly uniform and succinct, leading to an efficient model amenable for real-world driving deployment. Experiments reveal that our agent-centric network compares favorably with the state-of-the-art methods in prediction accuracy, while achieving scene-centric level inference latency.	翻訳日:2023-06-30 19:17:26 公開日:2023-06-28
# 因果推論における変数重要度マッチング Variable Importance Matching for Causal Inference ( http://arxiv.org/abs/2302.11715v2 ) ライセンス: Link先を確認	Quinn Lanners, Harsh Parikh, Alexander Volfovsky, Cynthia Rudin, and David Page	(参考訳) 我々の目標は、監査可能で、トラブルシュートが容易で、治療効果の推定に正確で、高次元データにスケーラブルな観察因果推定法を作ることである。これらの目標を達成するための汎用フレームワークとして,model-to-matchについて述べる。 (i)成果モデリングを通して距離計量を学ぶこと。 (ii)距離計量を用いて一致群を作成すること、 (iii)一致群を用いて治療効果を推定する。 Model-to-Matchは、可変重要度測定を使用して距離メトリックを構築し、様々なアプリケーションに適用可能な柔軟なフレームワークとなる。潜在的な共同設立者数における問題のスケーラビリティに集中して、LASSOでModel-to-Matchフレームワークを運用します。 lassoの成果モデリングが(線形モデルを正しく指定する必要なしに)すべての共同創設者を一貫して識別する設定で、パフォーマンス保証を導き出します。また,本手法の監査性,精度,拡張性,およびより一般的な非パラメトリックな結果モデリングの拡張性を示す実験結果も提供する。 Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the matched groups to estimate treatment effects. Model-to-Match uses variable importance measurements to construct a distance metric, making it a flexible framework that can be adapted to various applications. Concentrating on the scalability of the problem in the number of potential confounders, we operationalize the Model-to-Match framework with LASSO. We derive performance guarantees for settings where LASSO outcome modeling consistently identifies all confounders (importantly without requiring the linear model to be correctly specified). We also provide experimental results demonstrating the method's auditability, accuracy, and scalability as well as extensions to more general nonparametric outcome modeling.	翻訳日:2023-06-30 19:16:22 公開日:2023-06-28
# ディープラーニングによるパーコレーション型ゲームのマスタリング Mastering Percolation-like Games with Deep Learning ( http://arxiv.org/abs/2305.07687v2 ) ライセンス: Link先を確認	Michael M. Danziger, Omkar R. Gojala, Sean P. Cornelius	(参考訳) ランダムアタックに対するネットワークの堅牢性は広く研究されているが、知的エージェントによる意図的な破壊は従来の方法では不可能である。ここでは,ネットワークを破壊しようとする攻撃者の論理を模倣した格子上に単一プレイヤーゲームを作成する。ゲームの目的は、最も少ないステップ数で全てのノードを無効にすることである。我々は,ネットワークを最適に攻撃するために,このゲームをうまくプレイできる深層q学習を用いた強化学習手法を開発した。学習アルゴリズムは普遍的であるため、堅牢性の異なる定義のエージェントを訓練し、学習戦略を比較する。表面的に類似したロバストネスの定義は、訓練されたエージェントに異なる戦略を誘導し、ネットワークを最適に攻撃または防御することが特定の目的に敏感であることを示唆する。本手法はネットワークのロバスト性を理解するための新しい手法であり、障害のあるシステムにおける他の離散プロセスへの潜在的な応用を提供する。 Though robustness of networks to random attacks has been widely studied, intentional destruction by an intelligent agent is not tractable with previous methods. Here we devise a single-player game on a lattice that mimics the logic of an attacker attempting to destroy a network. The objective of the game is to disable all nodes in the fewest number of steps. We develop a reinforcement learning approach using deep Q-learning that is capable of learning to play this game successfully, and in so doing, to optimally attack a network. Because the learning algorithm is universal, we train agents on different definitions of robustness and compare the learned strategies. We find that superficially similar definitions of robustness induce different strategies in the trained agent, implying that optimally attacking or defending a network is sensitive the particular objective. Our method provides a new approach to understand network robustness, with potential applications to other discrete processes in disordered systems.	翻訳日:2023-06-30 19:09:30 公開日:2023-06-28
# ポテンシャルインバージョン理論 The Potential Inversion Theorem ( http://arxiv.org/abs/2305.07260v2 ) ライセンス: Link先を確認	Alec Shelley, Henry Hunt	(参考訳) タイト結合モデルにおける波動関数の確率は、初期条件が厳密に偶数あるいは奇な格子点を占有し、大域的な位相まで存在する限り、ポテンシャルエネルギーの符号反転の下で保存されるというポテンシャル反転定理を証明する。この対称性は電子対の時間はポジトロニウムのように進化し、したがって結合状態を形成する必要がある。我々は、この単純な定理の他の興味深い結果と同様に、粒子が正のポテンシャルと同様に負のポテンシャルに捕捉されるという事実も探求する。格子ホッピングモデル、スピン相互作用モデル、量子場理論の文脈におけるポテンシャル反転定理を議論し、それらがいくつかの無関係な物理的効果を単純化し説明できることを示す。 We prove the potential inversion theorem, which says that wavefunction probability in tight binding models is preserved under the sign inversion of the potential energy as long as the initial conditions occupy strictly even or odd lattice sites and are real up to a global phase. This symmetry requires that electron pairs time evolve like positronium and therefore form bound states. We explore this as well as other intriguing consequences of this simple theorem, such as the fact that particles can be trapped by negative potentials just as well as positive potentials. We discuss the potential inversion theorem in the context of lattice hopping models, spin interaction models, and quantum field theory, and show that it can simplify and explain a number of seemingly unrelated physical effects.	翻訳日:2023-06-30 19:09:14 公開日:2023-06-28
# 分解密度を持つ文字列図形 String Diagrams with Factorized Densities ( http://arxiv.org/abs/2305.02506v2 ) ライセンス: Link先を確認	Eli Sennesh and Jan-Willem van de Meent	(参考訳) 確率的プログラムと因果モデルに関する研究の活発化は、有向グラフィカルモデルを拡張するモデルクラスについて構成的に考える必要性を強調している。確率的プログラムと因果モデルの両方は、ランダム変数の集合上の合同確率密度を定義し、因果関係と条件独立性を推論するために使用できるスパース構造を示す。この研究は、確率写像のマルコフ圏に関する最近の研究に基づいて、射が各サンプル空間上で分解された結合密度と、サンプルから戻り値への決定論的写像を組み合わせた圏を定義する。これは、確率測度に関する最近のカテゴリー論的記述と、確率計画法や因果推論によく用いられる分解密度の操作的定義とのギャップを埋めるためのステップである。 A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density, factorized over each sample space, with a deterministic mapping from samples to return values. This is a step towards closing the gap between recent category-theoretic descriptions of probability measures, and the operational definitions of factorized densities that are commonly employed in probabilistic programming and causal inference.	翻訳日:2023-06-30 19:08:39 公開日:2023-06-28
# 長期一重項状態準備のための反断熱駆動 Counterdiabatic driving for long-lived singlet state preparation ( http://arxiv.org/abs/2305.02096v2 ) ライセンス: Link先を確認	Abhinav Suresh, Vishal Varma, Priya Batra, and T S Mahesh	(参考訳) 量子アディアバティック法は、状態の進化を通じて瞬時に固有状態の個体群を維持するもので、状態の準備と操作のために確立され、しばしば好まれる選択である。駆動コストを著しく最小化するが、その遅い速度はノイズの多い中規模量子(NISQ)時代の技術では厳しい制限となる。断熱経路は多くの物理過程において広く見られるため、断熱をはるかに高速に達成することはより広い関心事である。非断熱経路を高速に駆動することで、遅い断熱過程を克服する断熱技術へのショートカットが近年注目されている。過去10年間に確立された核磁気共鳴における長寿命一重項状態(LLS)の極端に長い寿命は、分光法から生医学的イメージングまで、いくつかの重要な応用を開拓してきた。断熱法を含む様々な方法がLSSの調製にすでに使われている。本稿では,高速駆動によるLSS調製を高速化するために,逆断熱駆動(CD)を用いたことを報告する。 NMR実験により,CDは従来の断熱駆動よりも短い期間でLSSのオーダーを得られることを示した。 The quantum adiabatic method, which maintains populations in their instantaneous eigenstates throughout the state evolution, is an established and often a preferred choice for state preparation and manipulation. Though it minimizes the driving cost significantly, its slow speed is a severe limitation in noisy intermediate-scale quantum (NISQ) era technologies. Since adiabatic paths are extensive in many physical processes, it is of broader interest to achieve adiabaticity at a much faster rate. Shortcuts to adiabaticity techniques which overcome the slow adiabatic process by driving the system faster through non-adiabatic paths, have seen increased attention recently. The extraordinarily long lifetime of the long-lived singlet states (LLS) in nuclear magnetic resonance, established over the past decade, has opened several important applications ranging from spectroscopy to biomedical imaging. Various methods, including adiabatic methods, are already being used to prepare LLS. In this article, we report the use of counterdiabatic driving (CD) to speed up LLS preparation with faster drives. Using NMR experiments, we show that CD can give stronger LLS order in shorter durations than conventional adiabatic driving.	翻訳日:2023-06-30 19:08:25 公開日:2023-06-28
# BMAD: 医学的異常検出のためのベンチマーク BMAD: Benchmarks for Medical Anomaly Detection ( http://arxiv.org/abs/2306.11876v2 ) ライセンス: Link先を確認	Jinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng He, Zhaoxiang Zhang, Xingyu Li	(参考訳) 異常検出(AD)は、機械学習とコンピュータビジョンの基本的な研究課題であり、産業検査、ビデオ監視、医療診断に実用化されている。医用画像では、ADはまれな疾患や病態を示す可能性のある異常の検出と診断に特に重要である。しかし、医療画像上でADメソッドを評価するための普遍的で公平なベンチマークが欠如しており、この特定の領域におけるより一般化された、堅牢なADメソッドの開発を妨げる。このギャップを埋めるために,医療画像における異常検出手法を評価するための総合評価ベンチマークを提案する。このベンチマークは、5つの医学領域(脳MRI、肝CT、網膜OCT、胸部X線、デジタル病理学)から6つの再構成データセットと3つの重要な評価指標を含み、合計14の最先端ADアルゴリズムを含んでいる。本ベンチマークは,最近提案された異常検出手法の総合的な比較を可能にする。これは、コミュニティが公正な比較を行い、医療画像のAD分野を前進させることを促す。 BMADの詳細はGitHubリポジトリで確認できます。 Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision, with practical applications in industrial inspection, video surveillance, and medical diagnosis. In medical imaging, AD is especially vital for detecting and diagnosing anomalies that may indicate rare diseases or conditions. However, there is a lack of a universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this specific domain. To bridge this gap, we introduce a comprehensive evaluation benchmark for assessing anomaly detection methods on medical images. This benchmark encompasses six reorganized datasets from five medical domains (i.e. brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology) and three key evaluation metrics, and includes a total of fourteen state-of-the-art AD algorithms. This standardized and well-curated medical benchmark with the well-structured codebase enables comprehensive comparisons among recently proposed anomaly detection methods. It will facilitate the community to conduct a fair comparison and advance the field of AD on medical imaging. More information on BMAD is available in our GitHub repository: https://github.com/DorisBao/BMAD	翻訳日:2023-06-30 18:58:27 公開日:2023-06-28
# 脳腫瘍分離(BraTS)チャレンジ2023: 腫瘍分離(BraSyn)のための脳MR画像合成 The Brain Tumor Segmentation (BraTS) Challenge 2023: Brain MR Image Synthesis for Tumor Segmentation (BraSyn) ( http://arxiv.org/abs/2305.09011v5 ) ライセンス: Link先を確認	Hongwei Bran Li, Gian Marco Conte, Syed Muhammad Anwar, Florian Kofler, Ivan Ezhov, Koen van Leemput, Marie Piraud, Maria Diaz, Byrone Cole, Evan Calabrese, Jeff Rudie, Felix Meissen, Maruf Adewole, Anastasia Janas, Anahita Fathi Kazerooni, Dominic LaBella, Ahmed W. Moawad, Keyvan Farahani, James Eddy, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Farouk Dako, Walter Wiggins, Zachary Reitman, Chunhao Wang, Xinyang Liu, Zhifan Jiang, Ariana Familiar, Elaine Johanson, Zeke Meier, Christos Davatzikos, John Freymann, Justin Kirby, Michel Bilello, Hassan M. Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Rivka R. Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko, Arash Nazeri, Marc Andr\'e Weber, Abhishek Mahajan, Suyash Mohan, John Mongan, Christopher Hess, Soonmee Cha, Javier Villanueva, Meyer Errol Colak, Priscila Crivellaro, Andras Jakab, Jake Albrecht, Udunna Anazodo, Mariam Aboian, Thomas Yu, Verena Chung, Timothy Bergquist, James Eddy, Jake Albrecht, Ujjwal Baid, Spyridon Bakas, Marius George Linguraru, Bjoern Menze, Juan Eugenio Iglesias, Benedikt Wiestler	(参考訳) 自動脳腫瘍分画法が確立され,臨床応用可能な性能レベルに達している。これらの手法は通常、T1強調画像、T2強調画像、FLAIR画像の4つの入力磁気共鳴イメージング(MRI)モードに依存している。しかしながら、一部のシーケンスは、時間的制約や患者の動きのようなイメージアーティファクトのために臨床実践に欠落することが多い。その結果、これらのアルゴリズムが臨床ルーチンで広く採用されるためには、欠落したモダリティを置換し、セグメンテーション性能を得る能力が極めて望ましい。本稿では,医療用画像コンピューティングとコンピュータ支援インターベンション(MICCAI)2023と連携して脳MR画像合成ベンチマーク(BraSyn)の確立について述べる。この課題の主な目的は、複数の利用可能な画像が提供される際に、MRIの欠落を現実的に生成できる画像合成手法を評価することである。究極の目的は、自動的な脳腫瘍セグメンテーションパイプラインを促進することである。ベンチマークで使用される画像データセットは多様で多様であり、様々な病院や研究機関と協力して作成された。 Automated brain tumor segmentation methods have become well-established and reached performance levels offering clear clinical utility. These methods typically rely on four input magnetic resonance imaging (MRI) modalities: T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some sequences are often missing in clinical practice due to time constraints or image artifacts, such as patient motion. Consequently, the ability to substitute missing modalities and gain segmentation performance is highly desirable and necessary for the broader adoption of these algorithms in the clinical routine. In this work, we present the establishment of the Brain MR Image Synthesis Benchmark (BraSyn) in conjunction with the Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023. The primary objective of this challenge is to evaluate image synthesis methods that can realistically generate missing MRI modalities when multiple available images are provided. The ultimate aim is to facilitate automated brain tumor segmentation pipelines. The image dataset used in the benchmark is diverse and multi-modal, created through collaboration with various hospitals and research institutions.	翻訳日:2023-06-30 18:56:23 公開日:2023-06-28
# 条件付き拡散モデルによる損失画像圧縮 Lossy Image Compression with Conditional Diffusion Models ( http://arxiv.org/abs/2209.06950v5 ) ライセンス: Link先を確認	Ruihan Yang, Stephan Mandt	(参考訳) 本稿では,拡散生成モデルを用いた画像圧縮のエンドツーエンド最適化について概説する。このアプローチは変換符号化パラダイムに依存しており、画像はエントロピー符号化のための潜在空間にマッピングされ、そこから再構成のためにデータ空間にマッピングされる。平均)デコーダが決定論的ニューラルネットワークであるvaeベースのニューラルネットワークとは対照的に、このデコーダは条件拡散モデルである。そこで本手法では,逆拡散過程を条件付けした"コンテンツ"潜在変数を導入し,この変数を用いて画像に関する情報を格納する。拡散過程を特徴付ける残りの「テクスチャ」変数は復号時に合成される。モデルの性能は,関心の認知的指標に調整可能であることを示す。複数のデータセットと画像品質評価指標を含む広範囲な実験により,提案手法はGANモデルよりも強いFIDスコアを得られる一方で,VAEモデルと競合する性能を複数の歪み指標で得ることが示された。さらに、Xパラメータ化による拡散の訓練により、少数の復号化ステップで高品質な再構成が可能となり、モデルの実用性に大きな影響を及ぼす。 This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with X-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality.	翻訳日:2023-06-30 17:03:22 公開日:2023-06-28
# 帯域制限関数の一般化におけるNN上のGNNの優位性 Superiority of GNN over NN in generalizing bandlimited functions ( http://arxiv.org/abs/2206.05904v7 ) ライセンス: Link先を確認	A. Martina Neuman, Rongrong Wang and Yuying Xie	(参考訳) グラフ情報の統合機能を備えたグラフニューラルネットワーク(GNN)は,データ解析に広く利用されている。しかし、GNNの表現力はグラフレベルのタスクにのみ研究されているが、ノード分類のようなノードレベルのタスクでは研究されていない。本稿では, 関数補間問題である, 上記の分類課題に対するgnnの表現力について検討する。具体的には、GNNが帯域制限関数を$\mathbb{R}^d$で補間するのに必要な重みと層の数を求める。以上の結果から,GNNアーキテクチャを用いた帯域制限関数の重み付けは,完全連結ニューラルネットワーク(NN)を用いた一般的な帯域制限関数よりもはるかに少ないことが分かる。特に,$O((\log \epsilon^{-1})^{d})$重み付けは,$O((\log \epsilon^{-1})^{d})$サンプルから$\epsilon$-approximateへ,$\mathbb{R}^d$で離散化された帯域制限信号を生成する。この結果は、gnn構造と古典的なサンプリング定理との関係を描き、我々の研究がこの方向への最初の試みとなるようにすることで得られる。 Graph Neural Network (GNN) with its ability to integrate graph information has been widely used for data analyses. However, the expressive power of GNN has only been studied for graph-level tasks but not for node-level tasks, such as node classification, where one tries to interpolate missing nodal labels from the observed ones. In this paper, we study the expressive power of GNN for the said classification task, which is in essence a function interpolation problem. Explicitly, we derive the number of weights and layers needed for a GNN to interpolate a band-limited function in $\mathbb{R}^d$. Our result shows that, the number of weights needed to $\epsilon$-approximate a bandlimited function using the GNN architecture is much fewer than the best known one using a fully connected neural network (NN) - in particular, one only needs $O((\log \epsilon^{-1})^{d})$ weights using a GNN trained by $O((\log \epsilon^{-1})^{d})$ samples to $\epsilon$-approximate a discretized bandlimited signal in $\mathbb{R}^d$. The result is obtained by drawing a connection between the GNN structure and the classical sampling theorems, making our work the first attempt in this direction.	翻訳日:2023-06-30 17:02:16 公開日:2023-06-28
# データ中毒に対する時間的ロバスト性 Temporal Robustness against Data Poisoning ( http://arxiv.org/abs/2302.03684v2 ) ライセンス: Link先を確認	Wenxiao Wang, Soheil Feizi	(参考訳) データ中毒は、悪意のあるトレーニングデータを通じて機械学習アルゴリズムの振る舞いを操作する場合を考える。データ中毒の既存の脅威モデルは、1つの指標、有毒サンプルの数を中心に構成されている。結果として、多くの実用的なシナリオのように、攻撃者が予想よりも多くのサンプルを手頃なオーバーヘッドで毒殺することができれば、既存の防御を短期間で無効にすることができる可能性がある。この問題に対処するために、私たちはデータの生年月日を示すタイムスタンプを活用しています。これらのタイムスタンプの利点を生かして,攻撃開始までの時間と攻撃の継続時間を測定する2つの新しい指標,アールネスと持続時間によるデータ中毒の時間的脅威モデルを提案する。これらの指標を用いて,データ中毒に対する時間的ロバスト性の概念を定義し,有意な保護感を与える。本稿では,更新モデルの連続データ収集と周期的展開をシミュレートした評価プロトコルを用いて,時間的ロバスト性の実証評価を行う。最後に、我々は、時間的集約(temporal aggregation)、証明可能な時間的堅牢性(temporal robustness)の提供、およびデータ中毒に対する時間的脅威モデルの可能性を強調するベースラインディフェンスを開発し、実証的に検証する。 Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how long an attack started in advance and how long an attack lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.	翻訳日:2023-06-30 16:54:38 公開日:2023-06-28
# ニュースの文レベルの事実性とメディアメディアのバイアスの予測 Predicting Sentence-Level Factuality of News and Bias of Media Outlets ( http://arxiv.org/abs/2301.11850v3 ) ライセンス: Link先を確認	Francielle Vargas, Kokil Jaidka, Thiago A. S. Pardo, Fabr\'icio Benevenuto	(参考訳) ニュースの信頼性と事実チェックを大規模に自動化するには、ニュースの事実とメディアバイアスを正確に予測する必要がある。本稿では,AllSides が提案する事実とメディアバイアスの定義に基づいて,6,191 の注釈付き文からなる「FactNews」という文レベルデータセットを提案する。我々はFactNewsを用いて、ニュースメディアの文章レベルの事実性を予測するための2つのテキスト分類問題を定式化し、ニュースソース全体の信頼性を評価する。本実験では,バイアス文は感情の優位に加えて,実文よりも単語数が多いことを実証する。そこで,ニュース記事の主観性と公平性の微粒化分析により,メディアの信頼性を予測できる有望な結果が得られた。最後に、ブラジルにおける偽ニュースの深刻さと政治的偏見、そしてポルトガル語の研究の欠如により、データセットとベースラインの両方がブラジルポルトガル語向けに提案された。 Automated news credibility and fact-checking at scale require accurately predicting news factuality and media bias. This paper introduces a large sentence-level dataset, titled "FactNews", composed of 6,191 sentences expertly annotated according to factuality and media bias definitions proposed by AllSides. We use FactNews to assess the overall reliability of news sources, by formulating two text classification problems for predicting sentence-level factuality of news reporting and bias of media outlets. Our experiments demonstrate that biased sentences present a higher number of words compared to factual sentences, besides having a predominance of emotions. Hence, the fine-grained analysis of subjectivity and impartiality of news articles provided promising results for predicting the reliability of media outlets. Finally, due to the severity of fake news and political polarization in Brazil, and the lack of research for Portuguese, both dataset and baseline were proposed for Brazilian Portuguese.	翻訳日:2023-06-30 16:53:42 公開日:2023-06-28
# 敵対的攻撃に対するブラックボックスモデルのデータフリー防御 Data-free Defense of Black Box Models Against Adversarial Attacks ( http://arxiv.org/abs/2211.01579v2 ) ライセンス: Link先を確認	Gaurav Kumar Nayak, Inder Khatri, Ruchit Rawal, Anirban Chakraborty	(参考訳) いくつかの企業は、APIを通じてブラックボックスとしてのみ公開することによって、トレーニングされた深層モデル(アーキテクチャの詳細、学習重量、トレーニング詳細など)をサードパーティのユーザから保護することが多い。さらに、プロプライエタリな理由やセンシティブな懸念から、トレーニングデータへのアクセスも提供されない可能性がある。そこで本研究では,データフリーセットアップにおける敵攻撃に対するブラックボックスモデルに対する新しい防御機構を提案する。生成モデルによる合成データを構築し,モデル盗み技術を用いてサロゲートネットワークを訓練する。本稿では,入力画像上で離散ウェーブレット分解を行う「ウェーブレットノイズ除去器」(WNR)を提案し,我々の「ウェーブレット係数選択モジュール」(WCSM)によって決定されるいくつかの重要な係数のみを慎重に選択する。 WNRによるノイズ除去後の画像の高周波コンテンツを回復するため,再構成した画像がサロゲートモデル上で元の予測と類似する係数を復元する目的で,さらに「再生器」ネットワークを訓練する。テスト時には、トレーニングされた再生器ネットワークと組み合わせたWNRがブラックボックスネットワークにプリプションされ、敵の精度が向上する。本手法は,攻撃者がブラックボックスアーキテクチャ(Alexnet)に類似したサロゲートアーキテクチャ(Alexnet-half,Alexnet)をディフェンダーと同じモデルステーリング戦略で使用しても,ベースラインと比較してCIFAR-10の対角精度を38.98%,32.01%向上させる。コードはhttps://github.com/vcl-iisc/data-free-black-box- defenseで入手できる。 Several companies often safeguard their trained deep models (i.e., details of architecture, learnt weights, training details etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data due to proprietary reasons or sensitivity concerns. In this work, we propose a novel defense mechanism for black box models against adversarial attacks in a data-free set up. We construct synthetic data via generative model and train surrogate network using model stealing techniques. To minimize adversarial contamination on perturbed samples, we propose 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully select only a few important coefficients determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image after noise removal via WNR, we further train a 'regenerator' network with the objective of retrieving the coefficients such that the reconstructed image yields similar to original predictions on the surrogate model. At test time, WNR combined with trained regenerator network is prepended to the black box network, resulting in a high boost in adversarial accuracy. Our method improves the adversarial accuracy on CIFAR-10 by 38.98% and 32.01% on state-of-the-art Auto Attack compared to baseline, even when the attacker uses surrogate architecture (Alexnet-half and Alexnet) similar to the black box architecture (Alexnet) with same model stealing strategy as defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense	翻訳日:2023-06-30 16:52:49 公開日:2023-06-28
# 教師なしファジィクラスタリングを用いた歴史的文書の手書き認識 Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering ( http://arxiv.org/abs/2210.16780v2 ) ライセンス: Link先を確認	Sriparna Majumdar and Aaron Brick	(参考訳) デジタル化された文書中の手書きの複数の筆跡への法医学的帰属は、高次元の難しい問題である。ユニークな手書きスタイルは、文字サイズ、ストローク幅、ループ、ダクト、傾斜角、曲がりくねったリガチュアなど、いくつかの要素を混ぜ合わせて区別することができる。隠れマルコフモデル、サポートベクターマシン、半教師付きリカレントニューラルネットワークによるラベル付きデータの研究は、中程度から高い成功を収めている。本研究では, ファジィソフトクラスタリングと線形主成分分析を組み合わせることで, 古写本のハンドシフトの検出に成功している。この進歩は、歴史文書の著者帰属と法医学的文書分析のための教師なし手法の展開を成功に導くものである。 The forensic attribution of the handwriting in a digitized document to multiple scribes is a challenging problem of high dimensionality. Unique handwriting styles may be dissimilar in a blend of several factors including character size, stroke width, loops, ductus, slant angles, and cursive ligatures. Previous work on labeled data with Hidden Markov models, support vector machines, and semi-supervised recurrent neural networks have provided moderate to high success. In this study, we successfully detect hand shifts in a historical manuscript through fuzzy soft clustering in combination with linear principal component analysis. This advance demonstrates the successful deployment of unsupervised methods for writer attribution of historical documents and forensic document analysis.	翻訳日:2023-06-30 16:52:18 公開日:2023-06-28
# AI生成したテキストは確実に検出できるのか? Can AI-Generated Text be Reliably Detected? ( http://arxiv.org/abs/2303.11156v2 ) ライセンス: Link先を確認	Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang and Soheil Feizi	(参考訳) 本稿では,経験的かつ理論的に,いくつかのAIテキスト検出装置が現実的なシナリオでは信頼できないことを示す。実験により,大規模な言語モデル (LLM) 上に光パラフレーズが適用されるパラフレーズ攻撃は,ウォーターマーキングスキームやニューラルネットワークベースの検出器,ゼロショット分類器などを含む,あらゆる種類の検出器を破壊できることを示す。本実験は, 再帰的パラフレージングに対して依然として脆弱であることを示す。次に, 言語モデルがより洗練され, 人間の文章をエミュレートする能力が向上するにつれて, 最良検出器でも性能が低下することを示す理論的に不可能であることを示す。人間の文章を模倣しようとする十分に高度な言語モデルにとって、最も有望な検出器でさえ、ランダムな分類器よりもわずかに優れている。私たちの結果は、特定の記述スタイル、巧妙なプロンプトデザイン、テキストパラフレーズなど、特定のシナリオを捉えるのに十分です。また、擬似乱数生成器が真のランダム性ではなく、AIテキスト生成に使用される場合を含むように、不可能な結果も拡張する。すべての多項式時間計算可能検出器に対して、同じ結果が無視可能な補正項を持つことを示す。最後に、透かし方式で保護されたLLMでさえ、敵対する人間が隠れたLLMテキストシグネチャを推測し、LLMが生成したテキストとして検出する人為的なテキストに追加できる偽造攻撃に対して脆弱であり、開発者が評判を損なう可能性があることを示す。これらの結果は、AI生成テキストの倫理的かつ信頼性の高い使用に関するコミュニティの正直な会話を開こうとしています。 In this paper, both empirically and theoretically, we show that several AI-text detectors are not reliable in practical scenarios. Empirically, we show that paraphrasing attacks, where a light paraphraser is applied on top of a large language model (LLM), can break a whole range of detectors, including ones using watermarking schemes as well as neural network-based detectors and zero-shot classifiers. Our experiments demonstrate that retrieval-based detectors, designed to evade paraphrasing attacks, are still vulnerable to recursive paraphrasing. We then provide a theoretical impossibility result indicating that as language models become more sophisticated and better at emulating human text, the performance of even the best-possible detector decreases. For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier. Our result is general enough to capture specific scenarios such as particular writing styles, clever prompt design, or text paraphrasing. We also extend the impossibility result to include the case where pseudorandom number generators are used for AI-text generation instead of true randomness. We show that the same result holds with a negligible correction term for all polynomial-time computable detectors. Finally, we show that even LLMs protected by watermarking schemes can be vulnerable against spoofing attacks where adversarial humans can infer hidden LLM text signatures and add them to human-generated text to be detected as text generated by the LLMs, potentially causing reputational damage to their developers. We believe these results can open an honest conversation in the community regarding the ethical and reliable use of AI-generated text.	翻訳日:2023-06-30 16:43:25 公開日:2023-06-28
# データスワップのヘイトスケーリング法則について On Hate Scaling Laws For Data-Swamps ( http://arxiv.org/abs/2306.13141v2 ) ライセンス: Link先を確認	Abeba Birhane, Vinay Prabhu, Sang Han, Vishnu Naresh Boddeti	(参考訳) 「モデルをスケールし、データをスケールし、GPUファームをスケール」は、今日の生成AIの世界における支配的な感情である。モデルスケーリングは広く研究されているが、データスケーリングとその下流への影響はまだ検討中である。これは、主要なソースがWorld Wide Webであり、CommonCrawlダンプとしてまとめてパッケージ化されている視覚言語データセットのコンテキストにおいて、特に重要である。この大規模データダンプは、多くの欠点があることが知られているが、繰り返し採掘され、大規模生成モデルのデータメーザーロデとして機能する。本稿では, 1)4億試料と20億試料を含むlaion-400mとlaion-2b-enの比較監査による憎悪コンテンツに対するデータセットのスケーリングの効果の検討 2)シカゴ・フェイス・データセット(CFD)を用いてトレーニングしたモデルの人種的偏りを測定することにより,これらのデータセット変種に基づいて訓練された視覚言語モデルに対するスケールのダウンストリームの影響を評価する。私たちの結果は 1)データセットにおける憎悪コンテンツの存在は,pysentimiento hate-detection natural language processing (nlp)モデルの推論に基づくヘイトコンテンツ率 (hcr) 測定値を用いて測定すると,約12-%$で増加した。 2) 社会バイアスと負のステレオタイプは, 評価したモデルに対するスケールとともに悪化した。スケールが大きくなるにつれて、人間の顔の画像と「人間」のクラスを関連付けるモデルが、他の7つの攻撃クラスを半分に減らす傾向が見られた。さらに、黒人女性のカテゴリーでは、モデルが「犯罪」クラスと顔を関連付ける傾向が2倍になり、黒人男性の顔のクインツップリングは2倍になった。我々は,モデル監査結果の質的・歴史的分析を行い,我々の発見とそのデータセットのキュレーション実践への影響を反映するとともに,この領域で実施すべき知見と今後の課題について概説する。 `Scale the model, scale the data, scale the GPU-farms' is the reigning sentiment in the world of generative AI today. While model scaling has been extensively studied, data scaling and its downstream impacts remain under explored. This is especially of critical importance in the context of visio-linguistic datasets whose main source is the World Wide Web, condensed and packaged as the CommonCrawl dump. This large scale data-dump, which is known to have numerous drawbacks, is repeatedly mined and serves as the data-motherlode for large generative models. In this paper, we: 1) investigate the effect of scaling datasets on hateful content through a comparative audit of the LAION-400M and LAION-2B-en, containing 400 million and 2 billion samples respectively, and 2) evaluate the downstream impact of scale on visio-linguistic models trained on these dataset variants by measuring racial bias of the models trained on them using the Chicago Face Dataset (CFD) as a probe. Our results show that 1) the presence of hateful content in datasets, when measured with a Hate Content Rate (HCR) metric on the inferences of the Pysentimiento hate-detection Natural Language Processing (NLP) model, increased by nearly $12\%$ and 2) societal biases and negative stereotypes were also exacerbated with scale on the models we evaluated. As scale increased, the tendency of the model to associate images of human faces with the `human being' class over 7 other offensive classes reduced by half. Furthermore, for the Black female category, the tendency of the model to associate their faces with the `criminal' class doubled, while quintupling for Black male faces. We present a qualitative and historical analysis of the model audit results, reflect on our findings and its implications for dataset curation practice, and close with a summary of our findings and potential future work to be done in this area.	翻訳日:2023-06-30 16:25:20 公開日:2023-06-28
# ViP:コンピュータビジョンのための微分プライベートファンデーションモデル ViP: A Differentially Private Foundation Model for Computer Vision ( http://arxiv.org/abs/2306.08842v2 ) ライセンス: Link先を確認	Yaodong Yu and Maziar Sanjabi and Yi Ma and Kamalika Chaudhuri and Chuan Guo	(参考訳) 人工知能(AI)は、インターネット規模のデータに基づいてトレーニングされた基礎モデルを使用することで、能力の飛躍的な増加を見せている。逆に、インターネット規模のデータの未処理の性質は、個人情報や著作権のある資料を許可なくトレーニングするべきではないため、重大なプライバシーや法的リスクも伴う。本研究では,DP(差分プライバシ)を保証した基礎的ビジョンモデルを学習するためのレシピの緩和尺度として提案する。マスク付きオートエンコーダは、DP-SGDとうまく一致した適切な学習アルゴリズムであり、LAION400Mデータセットの厳格なプライバシー予算として、差分プライバシを備えたビジョントランスフォーマーであるViPをトレーニングする。我々は、標準の下流視覚タスクを用いて、VPが学習した表現の質を評価する。特に、VPは、ImageNet上で5,5.7 %の(プライベートでない)線形探索精度を達成している。この結果から,インターネット規模データへのスケーリングは,私的学習に有効であることが示唆された。コードは \url{https://github.com/facebookresearch/vip-mae} で入手できる。 Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as they often contain personal information or copyrighted material that should not be trained on without permission. In this work, we propose as a mitigation measure a recipe to train foundation vision models with differential privacy (DP) guarantee. We identify masked autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and train ViP -- a Vision transformer with differential Privacy -- under a strict privacy budget of $\epsilon=8$ on the LAION400M dataset. We evaluate the quality of representation learned by ViP using standard downstream vision tasks; in particular, ViP achieves a (non-private) linear probing accuracy of $55.7\%$ on ImageNet, comparable to that of end-to-end trained AlexNet (trained and evaluated on ImageNet). Our result suggests that scaling to internet-scale data can be practical for private learning. Code is available at \url{https://github.com/facebookresearch/ViP-MAE}.	翻訳日:2023-06-30 16:24:23 公開日:2023-06-28
# ランク縮小カルマンフィルタ : 高次元における近似動的低ランクフィルタリング The Rank-Reduced Kalman Filter: Approximate Dynamical-Low-Rank Filtering In High Dimensions ( http://arxiv.org/abs/2306.07774v2 ) ライセンス: Link先を確認	Jonathan Schmidt, Philipp Hennig, J\"org Nick, Filip Tronarp	(参考訳) 高次元力学系の文脈における推論とシミュレーションは、計算的に難しい問題のままである。いくつかの次元還元は、問題を一般に引き出すのに必要である。本稿では,共分散行列の低ランク近似を伝播する新しい近似ガウスフィルタ・平滑化法を提案する。これは、予測ステップに関連するリアプノフ方程式を低ランク行列の多様体に投影し、最近開発された数値的に安定な動的低ランク積分器によって解かれる。一方、共分散更新は共分散行列の列空間のみを変換し、構成によりランクが低いことを指摘して、更新ステップを扱いやすくする。このアルゴリズムは、共分散行列の低ランク近似が確率的ではなく決定論的であるという点において、既存のアンサンブルに基づくアプローチと差別化する。これにより、低ランク次元が問題の真の次元に近づくにつれて、正確なカルマンフィルタを再現することができる。本手法は,(カルマンフィルタの場合)立方体から最悪の場合の状態空間サイズにおける \emph{quadratic} までの計算複雑性を低減し,状態空間モデルが一定の条件を満たす場合に \emph{linear} の複雑性を実現する。古典的データ同化と時空間回帰の一連の実験を通じて,提案手法は平均誤差と正確なカルマンフィルタに対する共変性の観点から,アンサンブルに基づく手法を一貫して上回っていることを示す。これは漸近的な計算の複雑さに関して追加のコストを伴わない。 Inference and simulation in the context of high-dimensional dynamical systems remain computationally challenging problems. Some form of dimensionality reduction is required to make the problem tractable in general. In this paper, we propose a novel approximate Gaussian filtering and smoothing method which propagates low-rank approximations of the covariance matrices. This is accomplished by projecting the Lyapunov equations associated with the prediction step to a manifold of low-rank matrices, which are then solved by a recently developed, numerically stable, dynamical low-rank integrator. Meanwhile, the update steps are made tractable by noting that the covariance update only transforms the column space of the covariance matrix, which is low-rank by construction. The algorithm differentiates itself from existing ensemble-based approaches in that the low-rank approximations of the covariance matrices are deterministic, rather than stochastic. Crucially, this enables the method to reproduce the exact Kalman filter as the low-rank dimension approaches the true dimensionality of the problem. Our method reduces computational complexity from cubic (for the Kalman filter) to \emph{quadratic} in the state-space size in the worst-case, and can achieve \emph{linear} complexity if the state-space model satisfies certain criteria. Through a set of experiments in classical data-assimilation and spatio-temporal regression, we show that the proposed method consistently outperforms the ensemble-based methods in terms of error in the mean and covariance with respect to the exact Kalman filter. This comes at no additional cost in terms of asymptotic computational complexity.	翻訳日:2023-06-30 16:24:03 公開日:2023-06-28
# GANに基づく研究 A Work Based on GAN ( http://arxiv.org/abs/2306.03538v3 ) ライセンス: Link先を確認	Honghao Fu	(参考訳) この作業は提出段階に入るので、特定の情報は一時的に隠され、タイトルも隠される。 This work will enter the submission stage, so specific information will be temporarily hidden, also hide the title.	翻訳日:2023-06-30 16:23:17 公開日:2023-06-28
# 監視量子デバイスにおける輸送と非相互性の精密記述 Exact description of transport and non-reciprocity in monitored quantum devices ( http://arxiv.org/abs/2306.16452v1 ) ライセンス: Link先を確認	Jo\~ao Ferreira, Tony Jin, Jochen Mannhart, Thierry Giamarchi, Michele Filippone	(参考訳) 本研究では, 連続モニタリング中の非相互作用性フェルミオン系について検討した。測定結果から平均すると, 系内の粒子と熱の流れの正確な公式を導出する。これらの電流は競合する弾性成分と非弾性成分を特徴とし、非自明にモニタリングの強さに依存する。モニタによる非弾性プロセスは非逆流を生じさせ、アクティブなフィードバック制御なしに測定結果から作業を抽出することができる。測定誘導電力または冷却を提供する2つの異なるモニタリングスキームで、我々の形式を説明する。 ~optimal performancesは,摂動的アプローチでは対処が難しい監視強度$\gamma$の値に対して見出される。 We study non-interacting fermionic systems undergoing continuous monitoring and driven by biased reservoirs. Averaging over the measurement outcomes, we derive exact formulas for the particle and heat flows in the system. We show that these currents feature competing elastic and inelastic components, which depend non-trivially on the monitoring strength $\gamma$. We highlight that monitor-induced inelastic processes lead to non-reciprocal currents, allowing to extract work from measurements without active feedback control. We illustrate our formalism with two distinct monitoring schemes providing measurement-induced power or cooling.~Optimal performances are found for values of the monitoring strength $\gamma$ which are hard to address with perturbative approaches.	翻訳日:2023-06-30 16:16:49 公開日:2023-06-28
# shorのアルゴリズムにおけるモジュラー乗算力学系のカオス的根元 Chaotic Roots of the Modular Multiplication Dynamical System in Shor's Algorithm ( http://arxiv.org/abs/2306.16446v1 ) ライセンス: Link先を確認	Abu Musa Patoary and Amit Vikram and Laura Shou and Victor Galitski	(参考訳) ショアのファクタリングアルゴリズムは、古典計算よりも指数関数的なスピードアップを提供すると信じられており、正確に周期的な量子モジュラー乗算作用素の周期を見つけることに依存している。この完全周期性は、モジュラー乗法作用素の古典極限が古典エルゴード階層の「最大ランダムな」ベルヌーイ準位を占有する非常にカオス的なシステムであることから、量子カオスの観点では矛盾する可積分系の特徴である。本研究では、量子力学系の観点からこの明らかなパラドックスにアプローチし、エルゴード性やカオスのシグネチャが実際にカオス系の「可積分」量子化に符号化されるかどうかを検討する。特定の場合において、ショアのモジュラー乗算作用素は量子化されたa-ベーカー写像の重ね合わせとして書くことができ、より典型的な量子カオスとエルゴード性を示す。この研究は、ショアのモジュラー乗算作用素の可積分性は、同じ写像の族における他の「カオス的」量子化の干渉に起因し、量子アルゴリズムによる積分性、エルゴード性、カオスの相互作用のより深い研究の道を開くことを示唆している。 Shor's factoring algorithm, believed to provide an exponential speedup over classical computation, relies on finding the period of an exactly periodic quantum modular multiplication operator. This exact periodicity is the hallmark of an integrable system, which is paradoxical from the viewpoint of quantum chaos, given that the classical limit of the modular multiplication operator is a highly chaotic system that occupies the "maximally random" Bernoulli level of the classical ergodic hierarchy. In this work, we approach this apparent paradox from a quantum dynamical systems viewpoint, and consider whether signatures of ergodicity and chaos may indeed be encoded in such an "integrable" quantization of a chaotic system. We show that Shor's modular multiplication operator, in specific cases, can be written as a superposition of quantized A-baker's maps exhibiting more typical signatures of quantum chaos and ergodicity. This work suggests that the integrability of Shor's modular multiplication operator may stem from the interference of other "chaotic" quantizations of the same family of maps, and paves the way for deeper studies on the interplay of integrability, ergodicity and chaos in and via quantum algorithms.	翻訳日:2023-06-30 16:16:33 公開日:2023-06-28
# モデル非依存な対話的特徴帰属による性能向上とサンプル効率向上 Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions ( http://arxiv.org/abs/2306.16431v1 ) ライセンス: Link先を確認	Joran Michiels, Maarten De Vos, Johan Suykens	(参考訳) モデルに依存しない特徴属性は、複雑なMLモデルに局所的な洞察を与えることができる。説明が正しければ、ドメインエキスパートはモデルの判断を検証し、信頼することができる。しかし、専門家の知識と矛盾する場合、関連する作業はモデルを改善するために無関係な特徴のみを補正する。本稿では,2つの一般的な説明手法(Occlusion と Shapley の値)に対して,モデルに依存しない実装を提供することにより,複雑なモデルに完全に異なる属性を強制する。特定のサンプルのセットに対して、修正された特徴属性を使用して追加の局所データを生成し、サンプルの正しい説明をするためにモデルを再トレーニングする。様々なモデルに関するシミュレーションおよび実データ実験を通じて、提案手法がモデルの性能を大幅に向上させる方法を示し、修正された説明に基づいてトレーニングデータセットを増強する。アクティブな学習環境にインタラクティブな説明を加えることで、サンプル効率が大幅に向上し、既存の説明的対話戦略よりも優れています。さらに、ドメインの専門家がモデルを改善するのに十分正しい機能属性を提供する方法についても検討する。 Model-agnostic feature attributions can provide local insights in complex ML models. If the explanation is correct, a domain expert can validate and trust the model's decision. However, if it contradicts the expert's knowledge, related work only corrects irrelevant features to improve the model. To allow for unlimited interaction, in this paper we provide model-agnostic implementations for two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model. For a particular set of samples, we use the corrected feature attributions to generate extra local data, which is used to retrain the model to have the right explanation for the samples. Through simulated and real data experiments on a variety of models we show how our proposed approach can significantly improve the model's performance only by augmenting its training dataset based on corrected explanations. Adding our interactive explanations to active learning settings increases the sample efficiency significantly and outperforms existing explanatory interactive strategies. Additionally we explore how a domain expert can provide feature attributions which are sufficiently correct to improve the model.	翻訳日:2023-06-30 16:15:51 公開日:2023-06-28
# DNA-TEQ:DNN推論のためのテンソルの適応指数量子化 DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference ( http://arxiv.org/abs/2306.16430v1 ) ライセンス: Link先を確認	Bahareh Khabbazan, Marc Riera, Antonio Gonz\'alez	(参考訳) 量子化はディープニューラルネットワーク(DNN)において、アクティベーションと重みの算術的精度、すなわちテンソルを小さくすることで、記憶と計算の複雑さを減らすために一般的に用いられる。効率的なハードウェアアーキテクチャでは、最近のDNNを組み込みシステムやモバイルデバイスに展開するために線形量子化を用いる。しかし、線形均一量子化はモデル精度の点で高い性能を犠牲にすることなく、通常8ビット未満に数値精度を下げることはできない。パフォーマンスの損失はテンソルが一様分布に従わないためである。本稿では,かなりの量のテンソルが指数分布に適合することを示す。そこで我々は,DNNテンソルを指数関数的に定量化するDNA-TEQを提案する。実験の結果,DNA-TEQの量子化ビット幅は従来の提案よりもはるかに小さく,平均圧縮比は線形INT8ベースラインよりも40%も小さく,精度の低下は無視でき,DNNを再トレーニングすることができないことがわかった。さらに、DNA-TEQは指数領域でのドット生成操作を誘導し、広く使用されているDNNのセットで平均して66%のエネルギー消費を節約する。 Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the storage and computational complexity by decreasing the arithmetical precision of activations and weights, a.k.a. tensors. Efficient hardware architectures employ linear quantization to enable the deployment of recent DNNs onto embedded systems and mobile devices. However, linear uniform quantization cannot usually reduce the numerical precision to less than 8 bits without sacrificing high performance in terms of model accuracy. The performance loss is due to the fact that tensors do not follow uniform distributions. In this paper, we show that a significant amount of tensors fit into an exponential distribution. Then, we propose DNA-TEQ to exponentially quantize DNN tensors with an adaptive scheme that achieves the best trade-off between numerical precision and accuracy loss. The experimental results show that DNA-TEQ provides a much lower quantization bit-width compared to previous proposals, resulting in an average compression ratio of 40% over the linear INT8 baseline, with negligible accuracy loss and without retraining the DNNs. Besides, DNA-TEQ leads the way in performing dot-product operations in the exponential domain, which saves 66% of energy consumption on average for a set of widely used DNNs.	翻訳日:2023-06-30 16:15:21 公開日:2023-06-28
# 低ランクテンソル分解による複素数値適応系同定 Complex-valued Adaptive System Identification via Low-Rank Tensor Decomposition ( http://arxiv.org/abs/2306.16428v1 ) ライセンス: Link先を確認	Oliver Ploder, Christina Auer, Oliver Lang, Thomas Paireder, Mario Huemer	(参考訳) 機械学習(ML)とテンソルベースの手法は、ここ数十年、科学コミュニティにとって大きな関心を集めてきた。以前の研究で、我々はテンソルのみのアーキテクチャの計算負荷を軽減しつつ、非常に優れた性能を達成できる新しいテンソルベースのシステム識別フレームワークを提示した。しかし、導出手法は実数値問題のみを処理できるため、複雑な数値システムを扱う広範囲の信号処理や通信問題に直接適用できない。そこで本研究では,複雑な数値信号の処理を可能にする2つの新しいアーキテクチャを導出し,これらの拡張が,計算資源のオーバーヘッドをわずかに必要とせず,複雑な数値信号の処理が可能なことを示す。 Machine learning (ML) and tensor-based methods have been of significant interest for the scientific community for the last few decades. In a previous work we presented a novel tensor-based system identification framework to ease the computational burden of tensor-only architectures while still being able to achieve exceptionally good performance. However, the derived approach only allows to process real-valued problems and is therefore not directly applicable on a wide range of signal processing and communications problems, which often deal with complex-valued systems. In this work we therefore derive two new architectures to allow the processing of complex-valued signals, and show that these extensions are able to surpass the trivial, complex-valued extension of the original architecture in terms of performance, while only requiring a slight overhead in computational resources to allow for complex-valued operations.	翻訳日:2023-06-30 16:14:59 公開日:2023-06-28
# カラーコードデコーダによる表面符号故障の最小化 Minimising surface-code failures using a color-code decoder ( http://arxiv.org/abs/2306.16476v1 ) ライセンス: Link先を確認	Asmae Benhemou, Kaavya Sahay, Lingling Lao, Benjamin J. Brown	(参考訳) 実用的で高性能な復号アルゴリズムの開発は、フォールトトレラント量子コンピューティングのリソースコストを削減する。本稿では、偏極雑音モデルによって生じる誤差に対する低重補正演算子を求める表面符号のデコーダを提案する。このデコーダは、表面符号のシンドロームをカラーコードのシンドロームにマッピングすることにより、より洗練されたカラーコードデコーダアルゴリズムを採用することができる。解析的な議論と徹底的なテストにより、結果として生じるデコーダは、コード距離$d$でさえも、すべてのウェイト$d/2$デポーラライズエラーに対して最小の重み付け補正を見つけることができる。これにより、指数係数$O(2^{d/2})$による論理誤差率をビットフリップとデフォーカスエラーを別々に扱うデコーダと比較して改善する。この改善を解析的考察と低い誤差率での数値シミュレーションを用いて実証する。また,従来のカラーコード復号法と比較して,色コードに影響を及ぼす独立かつ同一分散ビットフリップ誤差を補正するために用いたデコーダの論理誤り率を指数関数的に改善することを示す。 The development of practical, high-performance decoding algorithms reduces the resource cost of fault-tolerant quantum computing. Here we propose a decoder for the surface code that finds low-weight correction operators for errors produced by the depolarising noise model. The decoder is obtained by mapping the syndrome of the surface code onto that of the color code, thereby allowing us to adopt more sophisticated color-code decoding algorithms. Analytical arguments and exhaustive testing show that the resulting decoder can find a least-weight correction for all weight $d/2$ depolarising errors for even code distance $d$. This improves the logical error rate by an exponential factor $O(2^{d/2})$ compared with decoders that treat bit-flip and dephasing errors separately. We demonstrate this improvement with analytical arguments and supporting numerical simulations at low error rates. Of independent interest, we also demonstrate an exponential improvement in logical error rate for our decoder used to correct independent and identically distributed bit-flip errors affecting the color code compared with more conventional color-code decoding algorithms.	翻訳日:2023-06-30 16:06:05 公開日:2023-06-28
# All-in-fiber動的軌道角運動量モードソート All-in-fiber dynamic orbital angular momentum mode sorting ( http://arxiv.org/abs/2306.16472v1 ) ライセンス: Link先を確認	Alvaro Alarc\'on, Santiago G\'omez, Daniel Spegel-Lexne, Joakim Argillander, Jaime Cari\~ne, Gustavo Ca\~nas, Gustavo Lima, Guilherme B. Xavier	(参考訳) 光の自由度の軌道角運動量(OAM)は、電気通信、量子情報、光に基づくマイクロマニピュレーションなど、多くの応用で広く研究されている。異なる横空間モードを分離し区別する能力はモードソートまたはモード分割と呼ばれ、そのようなアプリケーションで符号化された情報を復元することが不可欠である。理想的な$d$モードソーターは、異なる$d$空間モードの区別を忠実に行うことができ、損失を最小限に抑え、$d$出力を持ち、応答時間が速い。従来の全てのモードソータは、空間光変調器などのバルク光学素子に依存しており、光ファイバーシステムと統合される場合、迅速に調整することはできず、さらに損失が生じる。本稿では,超高速動的再構成性を有するoamモードソートにおける最初のオールインファイバー方式を提案する。提案手法は,まず光線形偏光(LP)モードでOAMモードを分解し,次に干渉的に再結合してトポロジカルチャージを決定し,OAMモードを正しくソートする。さらに、oamモードの超高速ルーティングを行うためにも、セットアップを利用できます。これらの結果は、古典的および量子的情報処理における多くの新しい用途に容易に利用できる新しい光空間モードソート法である。 The orbital angular momentum (OAM) spatial degree of freedom of light has been widely explored in many applications, including telecommunications, quantum information and light-based micro-manipulation. The ability to separate and distinguish between the different transverse spatial modes is called mode sorting or mode demultiplexing, and it is essential to recover the encoded information in such applications. An ideal $d$ mode sorter should be able to faithfully distinguish between the different $d$ spatial modes, with minimal losses, have $d$ outputs, and have fast response times. All previous mode sorters rely on bulk optical elements such as spatial light modulators, which cannot be quickly tuned and have additional losses if they are to be integrated with optical fiber systems. Here we propose and experimentally demonstrate, to the best of our knowledge, the first all-in-fiber method for OAM mode sorting with ultra-fast dynamic reconfigurability. Our scheme first decomposes the OAM mode in fiber-optical linearly polarized (LP) modes, and then interferometrically recombines them to determine the topological charge, thus correctly sorting the OAM mode. In addition, our setup can also be used to perform ultra-fast routing of the OAM modes. These results show a novel and fiber integrated form of optical spatial mode sorting that can be readily used for many new applications in classical and quantum information processing.	翻訳日:2023-06-30 16:05:50 公開日:2023-06-28
# 地球と宇宙における量子センサによるボセノバの検出 Detection of Bosenovae with Quantum Sensors on Earth and in Space ( http://arxiv.org/abs/2306.16468v1 ) ライセンス: Link先を確認	Jason Arakawa, Joshua Eby, Marianna S. Safronova, Volodymyr Takhistov, and Muhammad H. Zaheer	(参考訳) 幅広い理論のクラスにおいて、質量 10^{-22}~\textrm{eV} < m_{\phi} < 1~\textrm{eV}$ の超光暗黒物質(ULDM)の蓄積は、ボソン星として知られる長寿命の境界状態を形成する。 ULDMが自己相互作用を示すと、ボセノヴァ爆発で崩壊するボソン星から相対論的ボソンによって運ばれる膨大なエネルギーが放出される。我々は、光子、電子、グルーオンに結合したULDMを含むスカラー粒子の放射された相対論的バーストの過渡的なシグネチャを検出するための地球外および宇宙ベースの実験の可能性を探り、幅広い動機付けられた理論を捉えた。緩和型ULDMのシナリオとして、核時計や宇宙ベースの干渉計などの今後の実験や技術が、過渡的なボセノバ現象のシグネチャを検出することによって、ULDM結合質量パラメータ空間のオーダーを感度よく探索できることを実証する。解析は相対論的スカラー粒子放出の異なるシナリオに容易に拡張できる。 In a broad class of theories, the accumulation of ultralight dark matter (ULDM) with particles of mass $10^{-22}~\textrm{eV} < m_{\phi} < 1~\textrm{eV}$ leads the to formation of long-lived bound states known as boson stars. When the ULDM exhibits self-interactions, prodigious bursts of energy carried by relativistic bosons are released from collapsing boson stars in bosenova explosions. We extensively explore the potential reach of terrestrial and space-based experiments for detecting transient signatures of emitted relativistic bursts of scalar particles, including ULDM coupled to photons, electrons, and gluons, capturing a wide range of motivated theories. For the scenario of relaxion ULDM, we demonstrate that upcoming experiments and technology such as nuclear clocks as well as space-based interferometers will be able to sensitively probe orders of magnitude in the ULDM coupling-mass parameter space, challenging to study otherwise, by detecting signatures of transient bosenova events. Our analysis can be readily extended to different scenarios of relativistic scalar particle emission.	翻訳日:2023-06-30 16:05:23 公開日:2023-06-28
# 平均場レベルでの光学格子中のキラルスピン液体相 Chiral spin liquid phase in an optical lattice at mean-field level ( http://arxiv.org/abs/2306.16466v1 ) ライセンス: Link先を確認	Jian Yang and Xiong-Jun Liu	(参考訳) 我々は,スレーブ・ローター理論とスピノン平均場理論に基づく低温原子のカイラルスピン液体(csl)相を示すために,{\mathrm{u}(1)$合成ゲージフラックスを持つ光学ラマン正方形格子について検討した。ラマンポテンシャルによって生成される有効U($1$)ゲージ束は、CSL相の実現に重要な役割を果たしている。スレーブロータ技術を用いることで、中間のFermi Hubbard相互作用レギュレーションでCSL位相を求める。強い相互作用系では、4つのスピン相互作用を含む効果的なスピンモデルを導出する。スピノン平均場解析により,強磁気フラストレーションの場合,CSL相は安定していることが示された。 2つの平均場近似法は、一貫した位相図を与え、CSL位相の定性的数値的な証拠を与える。 We study an optical Raman square lattice with $\mathrm{U}(1)$ synthetic gauge flux to show chiral spin liquid (CSL) phase for cold atoms based on slave-rotor theory and spinon mean-field theory, respectively. An effective U($1$) gauge flux generated by Raman potentials plays a major role in realizing the CSL phase. By using slave-rotor techniques we find CSL phase at intermediate on-site Fermi Hubbard interacting regime. For the strong interacting regime we derive an effective spin model including up to the four spin interactions. By spinon mean-field analysis it is shown that CSL phase is stabilized in the case of strong magnetic frustration. The two mean-field approximation methods give consistent phase diagrams and provide qualitative numerical evidence of the CSL phase.	翻訳日:2023-06-30 16:05:02 公開日:2023-06-28
# フロッケ絶縁体と格子フェルミオン Floquet insulators and lattice fermions ( http://arxiv.org/abs/2306.16463v1 ) ライセンス: Link先を確認	Thomas Iadecola, Srimoyee Sen, Lars Sivertsen	(参考訳) フロッケ絶縁体は周期的に駆動される量子システムであり、ドライブパラメータの関数として新しい位相位相をホストすることができる。これらの新しい相は離散時間格子フェルミオン理論のフェルミオン二重化を思わせる特徴を持っている。この提案は、ある駆動パラメータに対する非相互作用(1+1)D Floquet 絶縁体のスペクトルを時間非依存ハミルトニアンによる離散時間格子フェルミオン理論のスペクトルにマッピングすることで具体化する。結果として得られるハミルトニアンは、ストロボスコープダイナミクスを生成するフロケットハミルトニアンとは異なる。離散時間Su-Schrieffer-Heegerモデルと原モデルの空間的位置の半数、あるいは空間的位置の4分の1の(1+1)D Wilson-Dirac理論の形式をとることができる。 Floquet insulators are periodically driven quantum systems that can host novel topological phases as a function of the drive parameters. These new phases exhibit features reminiscent of fermion doubling in discrete-time lattice fermion theories. We make this suggestion concrete by mapping the spectrum of a noninteracting (1+1)D Floquet insulator for certain drive parameters onto that of a discrete-time lattice fermion theory with a time-independent Hamiltonian. The resulting Hamiltonian is distinct from the Floquet Hamiltonian that generates stroboscopic dynamics. It can take the form of a discrete-time Su-Schrieffer-Heeger model with half the number of spatial sites of the original model, or of a (1+1)D Wilson-Dirac theory with one quarter of the spatial sites.	翻訳日:2023-06-30 16:04:46 公開日:2023-06-28
# 非局所量子計算と情報理論暗号 Relating non-local quantum computation to information theoretic cryptography ( http://arxiv.org/abs/2306.16462v1 ) ライセンス: Link先を確認	Rene Allerstorfer, Harry Buhrman, Alex May, Florian Speelman, Philip Verduyn Lunel	(参考訳) 非局所量子計算(NLQC)は位置検証スキームの不正な方法であり、AdS/CFT対応の文脈に現れている。ここでは、nlqcを情報理論的な暗号のより広い文脈に結びつけ、他の多くの暗号プリミティブに関連付ける。 f$-routingとして知られるnlqcの特別な場合の一つは、cdsプリミティブの条件付き開示の量子アナログ(英語版)(quantum analogue of the conditional disclosure of secrets)に相当する。さらに,コヒーレント関数評価(CFE)と呼ばれる位置検証の特殊な事例についても検討し,CFEプロトコルがプライベート同時メッセージパッシング(PSM)シナリオに対して同様の効率的なプロトコルを誘導することを示す。これらの暗号プリミティブに位置検証を関連付けることで、暗号文学における多くの結果はNLQCに新しい意味を与え、その逆も与える。これには、最悪の場合のコストが$f$-routing of $2^{O(\sqrt{n\log n})}$ entanglement(英語版)の最初の部分指数上界、外部にあると思われる問題に対する効率的な$f$-routing(英語版)戦略の最初の例、量子設定におけるCDSの絡み合いの線形下界、CFEの通信コストの線形下界、低T$の量子回路で計算できる関数の量子設定におけるCDSの効率的なプロトコルが含まれる。 Non-local quantum computation (NLQC) is a cheating strategy for position-verification schemes, and has appeared in the context of the AdS/CFT correspondence. Here, we connect NLQC to the wider context of information theoretic cryptography by relating it to a number of other cryptographic primitives. We show one special case of NLQC, known as $f$-routing, is equivalent to the quantum analogue of the conditional disclosure of secrets (CDS) primitive, where by equivalent we mean that a protocol for one task gives a protocol for the other with only small overhead in resource costs. We further consider another special case of position verification, which we call coherent function evaluation (CFE), and show CFE protocols induce similarly efficient protocols for the private simultaneous message passing (PSM) scenario. By relating position-verification to these cryptographic primitives, a number of results in the cryptography literature give new implications for NLQC, and vice versa. These include the first sub-exponential upper bounds on the worst case cost of $f$-routing of $2^{O(\sqrt{n\log n})}$ entanglement, the first example of an efficient $f$-routing strategy for a problem believed to be outside $P/poly$, linear lower bounds on entanglement for CDS in the quantum setting, linear lower bounds on communication cost of CFE, and efficient protocols for CDS in the quantum setting for functions that can be computed with quantum circuits of low $T$ depth.	翻訳日:2023-06-30 16:04:33 公開日:2023-06-28
# ヤコビ法によるフェルミの黄金律を超えて Beyond Fermi's golden rule with the Jacobi method ( http://arxiv.org/abs/2306.16457v1 ) ライセンス: Link先を確認	David M. Long, Dominik Hahn, Marin Bukov, Anushya Chandran	(参考訳) 量子力学における多くの問題は、単一量子状態の連続体への崩壊として考えられる。時間に依存する初期状態の重なりは忠実性と呼ばれ、この崩壊を特徴づける。エルゴード・ハミルトニアンへのクエンチ後の忠実性の解析的表現を導出する。この表現は弱クエンチェと強クエンチェの両方で有効であり、ヒルベルト空間の有限性以前の時間スケールは忠実性を制限する。初期の二次的崩壊と漸近的指数的崩壊を再現し、強いクエンチェではフェルミの黄金律とは異なる速度で再現する。この分析はジャコビ法(Jacobi method)に依存しており、これはもともとほぼ局所的な系に応用され、ここではよく熱化された系に適応する。この結果は,ジャコビ法が量子力学の異なる状態において予測可能であることを示す。 Many problems in quantum dynamics can be cast as the decay of a single quantum state into a continuum. The time dependent overlap with the initial state, called the fidelity, characterizes this decay. We derive an analytic expression for the fidelity after a quench to an ergodic Hamiltonian. The expression is valid for both weak and strong quenches, and timescales before finiteness of the Hilbert space limits the fidelity. It reproduces initial quadratic decay and asymptotic exponential decay with a rate which, for strong quenches, differs from Fermi's golden rule. The analysis relies on the Jacobi method, which was originally applied in nearly localized systems, and which we here adapt to well-thermalizing systems. Our results demonstrate that the Jacobi method is predictive in disparate regimes of quantum dynamics.	翻訳日:2023-06-30 16:04:00 公開日:2023-06-28
# 翻訳不変行列積状態と$W$-State表現について On Translation-Invariant Matrix Product States and $W$-State Representations ( http://arxiv.org/abs/2306.16456v1 ) ライセンス: Link先を確認	Petr Klimov, Richik Sengupta and Jacob Biamonte	(参考訳) この研究は、特定の状態のクラスに対する周期的境界条件を持つ翻訳不変行列積状態を構築する方法の開発に焦点をあてる。特に$n$-party $W$-state の結合次元表現は、現在知られている方法を超えている。さらに、最小限の結合次元の推定を改善する可能性も考慮し、$d(\psi)$と表記する。決定論的アルゴリズムは、$d(\psi)$ を見つけるために構築され、また、$d(\psi)$ の決定とその性質の理解に関連する様々な問題を探索する。特に、W状態に対して、結合次元 $ \left\lfloor \frac{n}{2} \right\rfloor +1 の TI-MPS 表現を構築する。さらに、我々は、結合次元を$nに下げることができる多数の状態を証明する。 $ This work focuses on developing methods to construct translation-invariant matrix product states with periodic boundary conditions for specific classes of states. We notably consider the bond dimension representations for the $n$-party $W$-state, surpassing currently known methods. Additionally, we consider possibilities for improving estimates of the minimal possible bond dimension, denoted as $d(\psi)$. A deterministic algorithm is constructed to discover $d(\psi)$, and we also explore various issues associated with determining $d(\psi)$ and understanding its properties. In particular, we construct for W-state an TI-MPS representation of bond dimension $ \left\lfloor \frac{n}{2} \right\rfloor +1. $ Moreover, we prove a large class of states we can reduce the bond-dimension to $n.$	翻訳日:2023-06-30 16:03:46 公開日:2023-06-28
# 監視アンラベリングによるノイズ浅部回路の効率的なサンプリング Efficient sampling of noisy shallow circuits via monitored unraveling ( http://arxiv.org/abs/2306.16455v1 ) ライセンス: Link先を確認	Zihan Cheng and Matteo Ippoliti	(参考訳) 本研究では,2次元量子ビットアレイ上の浅くノイズの多いランダム回路の出力をサンプリングする古典的なアルゴリズムを提案する。このアルゴリズムは、最近発表された '`space-evolving block decimation'' (SEBD) に基づいて構築され、ノイズ回路の場合に拡張する。 SEBD は、2次元のユニタリ回路を 1D {\displaystyle {\it monitored} にマッピングしたもので、単位ゲートとともに測定を特徴付け、測定誘起絡み合い相転移の存在を利用して有限臨界深さ$T_c$以下の効率的な(近似的な)サンプリングを実現する。我々のノイズ-SEBDアルゴリズムは、ノイズを計測し、さらに絡み合いを減らし、より広い回路深さまで効率的な古典的なサンプリングを可能にする。物理関連ノイズモデル(ユニタリキュービットチャネル)のクラスを2レプリカ統計力学処理で解析し、弱い測定値が最適(つまり最も遠ざかる)アンラベリング(unraveling)であることを示す。次に,実回路モデルにおける回路深さと雑音強度の関数として,ノイズ-sebd複雑性遷移を求める。実例として、IBM QuantumプロセッサをベースとしたCNOTあたりのノイズレート$\approx 2\%の重六角形量子ビットアレイ上の回路を、5iSWAP(または10CNOT)ゲート層まで効率的にサンプリング可能であることを示す。本結果は,ノイズの多いハードウェアのシミュレーションの実用的硬度要件の明確化に有効である。 We introduce a classical algorithm for sampling the output of shallow, noisy random circuits on two-dimensional qubit arrays. The algorithm builds on the recently-proposed ``space-evolving block decimation'' (SEBD) and extends it to the case of noisy circuits. SEBD is based on a mapping of 2D unitary circuits to 1D {\it monitored} ones, which feature measurements alongside unitary gates; it exploits the presence of a measurement-induced entanglement phase transition to achieve efficient (approximate) sampling below a finite critical depth $T_c$. Our noisy-SEBD algorithm unravels the action of noise into measurements, further lowering entanglement and enabling efficient classical sampling up to larger circuit depths. We analyze a class of physically-relevant noise models (unital qubit channels) within a two-replica statistical mechanics treatment, finding weak measurements to be the optimal (i.e. most disentangling) unraveling. We then locate the noisy-SEBD complexity transition as a function of circuit depth and noise strength in realistic circuit models. As an illustrative example, we show that circuits on heavy-hexagon qubit arrays with noise rates of $\approx 2\%$ per CNOT, based on IBM Quantum processors, can be efficiently sampled up to a depth of 5 iSWAP (or 10 CNOT) gate layers. Our results help sharpen the requirements for practical hardness of simulation of noisy hardware.	翻訳日:2023-06-30 16:03:32 公開日:2023-06-28
# デュアルレール量子ネットワークにおけるプログラマブルマルチビット絡み合いの自律的分布 Autonomous distribution of programmable multi-qubit entanglement in a dual-rail quantum network ( http://arxiv.org/abs/2306.16453v1 ) ライセンス: Link先を確認	Joan Agust\'i, Xin H. H. Zhang, Yuri Minoguchi, Peter Rabl	(参考訳) デュアルレール導波路QEDセットアップにおいて、空間分布多ビット絡み合った状態を作成するためのスケーラブルで完全自律的なスキームを提案し、解析する。このアプローチでは、2つの分離導波路に沿って位置する量子ビットの配列は、非退化パラメトリック増幅器の出力からの相関光子によって照らされる。これらの光子は、クビットを、局所的クビット光子デチューニングのパターンによって、多重粒子の絡み合いの程度を便利に調整できるような、純粋に絡み合った定常状態の異なるクラスに駆動する。中規模ネットワークの数値シミュレーションにより、複雑なマルチ量子ビット状態の合成時間はシステムサイズと最も線形に増加し、大きな増幅帯域幅の限界でさらなる高速化の恩恵を受ける可能性があることが示されている。したがって、このスキームは、正確なパルス制御を必要とせず、単一のガウスの絡み合い源のみに依存することなく、大きな量子ネットワークで使える多部絡み合い状態を分散するための興味深い新しいルートを提供する。 We propose and analyze a scalable and fully autonomous scheme for preparing spatially distributed multi-qubit entangled states in a dual-rail waveguide QED setup. In this approach, arrays of qubits located along two separated waveguides are illuminated by correlated photons from the output of a non-degenerate parametric amplifier. These photons drive the qubits into different classes of pure entangled steady states, for which the degree of multipartite entanglement can be conveniently adjusted by the chosen pattern of local qubit-photon detunings. Numerical simulations for moderate-sized networks show that the preparation time for these complex multi-qubit states increases at most linearly with the system size and that one may benefit from an additional speedup in the limit of a large amplifier bandwidth. Therefore, this scheme offers an intriguing new route for distributing ready-to-use multipartite entangled states across large quantum networks, without requiring any precise pulse control and relying on a single Gaussian entanglement source only.	翻訳日:2023-06-30 16:03:05 公開日:2023-06-28
# HNO:PDEを解くハイエナニューラル演算子 HNO: Hyena Neural Operator for solving PDEs ( http://arxiv.org/abs/2306.16524v1 ) ライセンス: Link先を確認	Saurabh Patil, Zijie Li, Amir Barati Farimani	(参考訳) 偏微分方程式(PDE)の数値解法は通常、計算コストのかかる時空間スケールを解くために細かな離散化を必要とする。近年のディープラーニングの進歩は、ニューラル演算子の使用を含むPDEの解決に新たなアプローチをもたらした。ニューラルネットワークは、関数空間間のマッピングを学び、データに基づいて偏微分方程式を解く能力を持つニューラルネットワークアーキテクチャである。本研究は,多層パーセプトロンによりパラメータ化される長い畳み込みフィルタを用いた,ハイエナと呼ばれるニューラル演算子を用いる。ハイエナ作用素(hyena operator)は、大域的な受容場を楽しむ長い畳み込みをパラメータ化するために、準二次複雑性と状態空間モデルを楽しむ演算である。このメカニズムは入力のコンテキストに対するモデルの理解を高め、異なるPDEインスタンスに対するデータ依存重みを可能にする。 PDEの解法における層の効果を測定するため,バーガー方程式とナビエ・ストークス方程式の実験を行った。以上の結果から,Hyena Neural operatorはPDEの解演算子を学習する上で,効率的かつ正確なモデルとして機能することが示唆された。使用したデータとコードは、https://github.com/Saupatil07/Hyena-Neural-Operator.comで見ることができる。 Numerically solving partial differential equations (PDEs) typically requires fine discretization to resolve necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving PDEs that involves the use of neural operators. Neural operators are neural network architectures that learn mappings between function spaces and have the capability to solve partial differential equations based on data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter that is parameterized by a multilayer perceptron. The Hyena operator is an operation that enjoys sub-quadratic complexity and state space model to parameterize long convolution that enjoys global receptive field. This mechanism enhances the model's comprehension of the input's context and enables data-dependent weight for different PDE instances. To measure how effective the layers are in solving PDEs, we conduct experiments on Burger's equation and Navier Stokes equation. Our findings indicate Hyena Neural operator can serve as an efficient and accurate model for learning PDEs' solution operator. The data and code used can be found at: https://github.com/Saupatil07/Hyena-Neural-Operator	翻訳日:2023-06-30 15:57:45 公開日:2023-06-28
# カーネルレンジスペースの場合、定数のクエリは十分である For Kernel Range Spaces a Constant Number of Queries Are Sufficient ( http://arxiv.org/abs/2306.16516v1 ) ライセンス: Link先を確認	Jeff M. Phillips and Hasan Pourmahmood-Aghababa	(参考訳) 我々は、カーネル範囲空間に対する$\varepsilon$-coverの概念を導入する。カーネル範囲空間は、点の集合 $X \subset \mathbb{R}^d$ と、固定されたカーネルによる全てのクエリの空間(例えば、ガウス核 $K(p,\cdot) = \exp(-\\|p-\cdot\\|^2)$)に関する。点集合 $X$ of size $n$ に対して、クエリは値のベクトル $R_p \in \mathbb{R}^n$ を返し、$i$th 座標 $(R_p)_i = K(p,x_i)$ for $x_i \in X$ が返される。 Q \subset \mathbb{R}^d$ は任意の$p \in \mathbb{R}^d$ に対して、$\frac{1}{n} \\|R_pR_q\\|_1\leq \varepsilon$ の集合である。これは、組合せ範囲空間に対するハウスラーの$\varepsilon$-covers(例えば、ボールクエリ内の点の部分集合で定義される)の概念の滑らかな類似であり、結果として得られるベクトル$R_p$は$[0,1]^n$の代わりに$\{0,1\}^n$である。これらの範囲空間のカーネルバージョンは、座標が不確かで不正確であるかもしれないデータ解析タスクに現れ、従ってクエリ範囲内外の概念に柔軟性を加えることを望んでいる。私たちの主な結果は、組合せ範囲空間とは異なり、カーネル $\varepsilon$-covers のサイズは入力サイズ $n$ と次元 $d$ に依存しないということです。ここでは、$(1/\varepsilon)^{\tilde O(1/\varepsilon^2)}$, $\tilde{O}(f(1/\varepsilon))$は、カーネルに依存することができる$(1/\varepsilon)$でログ係数を隠す。これは、範囲クエリにおける境界の概念を緩和することで、最終的には次元の呪いが消え、非常に高次元の機械学習の成功を説明するのに役立つことを意味する。また、この結果を約$(1/\varepsilon)^{\Omega(1/\varepsilon)$で補い、$/\varepsilon$への指数的な依存が必要とされることを示す。 We introduce the notion of an $\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \subset \mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\cdot) = \exp(-\\|p-\cdot\\|^2)$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordinate $(R_p)_i = K(p,x_i)$ for $x_i \in X$. An $\varepsilon$-cover is a subset of points $Q \subset \mathbb{R}^d$ so for any $p \in \mathbb{R}^d$ that $\frac{1}{n} \\|R_p - R_q\\|_1\leq \varepsilon$ for some $q \in Q$. This is a smooth analog of Haussler's notion of $\varepsilon$-covers for combinatorial range spaces (e.g., defined by subsets of points within a ball query) where the resulting vectors $R_p$ are in $\{0,1\}^n$ instead of $[0,1]^n$. The kernel versions of these range spaces show up in data analysis tasks where the coordinates may be uncertain or imprecise, and hence one wishes to add some flexibility in the notion of inside and outside of a query range. Our main result is that, unlike combinatorial range spaces, the size of kernel $\varepsilon$-covers is independent of the input size $n$ and dimension $d$. We obtain a bound of $(1/\varepsilon)^{\tilde O(1/\varepsilon^2)}$, where $\tilde{O}(f(1/\varepsilon))$ hides log factors in $(1/\varepsilon)$ that can depend on the kernel. This implies that by relaxing the notion of boundaries in range queries, eventually the curse of dimensionality disappears, and may help explain the success of machine learning in very high-dimensions. We also complement this result with a lower bound of almost $(1/\varepsilon)^{\Omega(1/\varepsilon)}$, showing the exponential dependence on $1/\varepsilon$ is necessary.	翻訳日:2023-06-30 15:57:24 公開日:2023-06-28
# 自動バイアス曲線の曲げ - 国家安全保障における人間とAIによる意思決定に関する研究 Bending the Automation Bias Curve: A Study of Human and AI-based Decision Making in National Security Contexts ( http://arxiv.org/abs/2306.16507v1 ) ライセンス: Link先を確認	Michael C. Horowitz, Lauren Kahn	(参考訳) ai(artificial intelligence, ai)の利用は、特に機械学習のアプローチによって、世界中のセクターや社会で増加している。 AIの採用は、特に国際セキュリティ分野において、どのように進むのか? 自動化バイアスの研究は、AIにおいて人間が過信される可能性があることを示唆する一方、アルゴリズムの逆転の研究は、決定の利害が高まるにつれて、人間がアルゴリズムを信頼することに対してより慎重になることを示している。我々は、AIに関する背景知識とAIに対する信頼の関係、そしてこれらが国際セキュリティ文脈における自動化バイアスの確率に影響を与える他の要因とどのように相互作用するかを理論化する。我々は、AI産業のレベルが異なる9カ国の9000人の成人の代表例を対象に、事前登録されたタスク識別実験でテストを行った。結果は、特にAIの背景知識に関する理論を強く支持する。ダニング・クルーガー効果(dunning kruger effect)の1バージョンは、aiを使った経験が最低レベルである人は、アルゴリズムが逆になる確率がわずかに高いため、自動化バイアスは、応答者のaiバックグラウンドが最高レベルに達する前に、知識の低レベルで発生する。追加の結果は、タスクの難易度、全体的なAI信頼、人間かAIの意思決定支援が非常に有能であるか、あまり有能でないと説明されるかどうかによる影響を示している。 Uses of artificial intelligence (AI), especially those powered by machine learning approaches, are growing in sectors and societies around the world. How will AI adoption proceed, especially in the international security realm? Research on automation bias suggests that humans can often be overconfident in AI, whereas research on algorithm aversion shows that, as the stakes of a decision rise, humans become more cautious about trusting algorithms. We theorize about the relationship between background knowledge about AI, trust in AI, and how these interact with other factors to influence the probability of automation bias in the international security context. We test these in a preregistered task identification experiment across a representative sample of 9000 adults in 9 countries with varying levels of AI industries. The results strongly support the theory, especially concerning AI background knowledge. A version of the Dunning Kruger effect appears to be at play, whereby those with the lowest level of experience with AI are slightly more likely to be algorithm-averse, then automation bias occurs at lower levels of knowledge before leveling off as a respondent's AI background reaches the highest levels. Additional results show effects from the task's difficulty, overall AI trust, and whether a human or AI decision aid is described as highly competent or less competent.	翻訳日:2023-06-30 15:56:37 公開日:2023-06-28
# 非IIDフェデレーション学習におけるMomentumのメリット Momentum Benefits Non-IID Federated Learning Simply and Provably ( http://arxiv.org/abs/2306.16504v1 ) ライセンス: Link先を確認	Ziheng Cheng, Xinmeng Huang, Kun Yuan	(参考訳) フェデレーション学習は、大規模機械学習の強力なパラダイムだが、信頼性の低いネットワーク接続、遅い通信、クライアント間のデータの不均一性など、大きな課題に直面している。 FedAvgとSCAFFOLDは、これらの課題に対処する2つの基本的なアルゴリズムである。特に、FedAvgは中央サーバと通信する前に複数のローカル更新を使用するが、SCAFFOLDは各クライアントに制御変数を保持し、ローカル更新で"クライアントドリフト"を補償する。これらの2つのアルゴリズムの収束性を高めるために、文献で様々な方法が提案されているが、アルゴリズム構造に対する非現実的な調整を行うか、境界データの不均一性の仮定に依存する。本稿では,FedAvgとSCAFFOLDの性能向上のための運動量の利用について検討する。すべてのクライアントがトレーニングプロセスに参加すると、momentumを組み込むことで、一定の局所学習率を使用しても、境界データの不均一性の仮定に頼らずにfedavgを収束させることができることを実証する。 fedavgの既存の分析では、局所学習率の低下にもかかわらず、境界データの不均一性が必要となるため、これは新しい結果である。部分的な顧客参加の場合、momentumは追加の前提を課すことなく、足場が確実に速く収束できることを示す。さらに,FedAvg と SCAFFOLD の新たな分散還元拡張を開発するために運動量を用いて,最先端の収束率を示す。実験結果はすべての理論的結果を支持する。 Federated learning is a powerful paradigm for large-scale machine learning, but it faces significant challenges due to unreliable network connections, slow communication, and substantial data heterogeneity across clients. FedAvg and SCAFFOLD are two fundamental algorithms to address these challenges. In particular, FedAvg employs multiple local updates before communicating with a central server, while SCAFFOLD maintains a control variable on each client to compensate for "client drift" in its local updates. Various methods have been proposed in literature to enhance the convergence of these two algorithms, but they either make impractical adjustments to algorithmic structure, or rely on the assumption of bounded data heterogeneity. This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD. When all clients participate in the training process, we demonstrate that incorporating momentum allows FedAvg to converge without relying on the assumption of bounded data heterogeneity even using a constant local learning rate. This is a novel result since existing analyses for FedAvg require bounded data heterogeneity even with diminishing local learning rates. In the case of partial client participation, we show that momentum enables SCAFFOLD to converge provably faster without imposing any additional assumptions. Furthermore, we use momentum to develop new variance-reduced extensions of FedAvg and SCAFFOLD, which exhibit state-of-the-art convergence rates. Our experimental results support all theoretical findings.	翻訳日:2023-06-30 15:56:13 公開日:2023-06-28
# SARC:ソフトアクターの反省的批判 SARC: Soft Actor Retrospective Critic ( http://arxiv.org/abs/2306.16503v1 ) ライセンス: Link先を確認	Sukriti Verma, Ayush Chopra, Jayakumar Subramanian, Mausoom Sarkar, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy	(参考訳) 俳優-批判的アルゴリズムであるsacの2倍スケールの性質は、批評家の見積もりが俳優に対して常に収束していないという事実によって特徴づけられるが、批評家は俳優よりも速く学習するので、両者の一貫性が保証される。様々な戦略が文献に導入され、より良い収束を達成するためにより良い勾配推定を学ぶ。グラデーション推定は批評家に依存するため,レビュアーの改善によって,各時点における俳優のグラデーション推定が向上する可能性が示唆される。これを利用することで、SAC批評家の損失を新たな損失期間的損失で増大させ、批評家の収束を早め、その結果、アクターの政策勾配推定をより良くするソフトアクターレトロスペクティブ批評(SARC)を提案する。既存のSACの実装は最小限の変更で簡単にSARCに適応できる。本研究では,SARCがベンチマーク環境におけるSACよりも一貫した改善を提供することを示す。我々は、コードとすべての実験データを、https://github.com/sukritiverma 1996/SARCでオープンソース化する予定です。 The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures eventual consistency between the two. Various strategies have been introduced in literature to learn better gradient estimates to help achieve better convergence. Since gradient estimates depend upon the critic, we posit that improving the critic can provide a better gradient estimate for the actor at each time. Utilizing this, we propose Soft Actor Retrospective Critic (SARC), where we augment the SAC critic loss with another loss term - retrospective loss - leading to faster critic convergence and consequently, better policy gradient estimates for the actor. An existing implementation of SAC can be easily adapted to SARC with minimal modifications. Through extensive experimentation and analysis, we show that SARC provides consistent improvement over SAC on benchmark environments. We plan to open-source the code and all experiment data at: https://github.com/sukritiverma1996/SARC.	翻訳日:2023-06-30 15:55:45 公開日:2023-06-28
# 変分不等式の確率的方法:エルゴディディティ、バイアス、リファインメント Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements ( http://arxiv.org/abs/2306.16502v1 ) ライセンス: Link先を確認	Emmanouil-Vasileios Vlatakis-Gkaragkounis, Angeliki Giannou, Yudong Chen, Qiaomin Xie	(参考訳) 様々な機械学習タスクで発生する分極最適化と変分不等式問題 (VIP) に対して、Stochastic Extragradient (SEG) とStochastic Gradient Descent Ascent (SGDA) が最優先のアルゴリズムとして登場した。 SEG/SGDAの定常的なステップサイズ変種は、簡単なチューニングや初期条件の迅速な許容といった魅力的な利点によって人気を集めているが、それらの収束挙動は初歩的な双線形モデルにおいてもより複雑である。我々の研究は、これらのアルゴリズムに固有の確率構造を解明し、定量化する努力をしている。定数のステップサイズSEG/SGDAを時間同質マルコフ連鎖として再キャストすることにより、大数第一種法則と中心極限定理を確立し、平均イテレートが漸近正規であり、幅広いモノトンおよび非モノトンVIPに対してユニークな不変分布を持つことを示した。凸凹 min-max 最適化に特化して、Von-Neumann の値に対するステップサイズと誘導バイアスの関係を特徴づける。最後に、richardson-romberg外挿により、vipsのグローバルソリューションへの平均反復値の近接性が向上することを示す。我々の確率論的分析は、我々の理論的な発見を裏付ける実験によって支えられ、最適化、マルコフ連鎖、演算子理論からの技術を利用する。 For min-max optimization and variational inequalities problems (VIP) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgiveness of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to the Von-Neumann's value. Finally, we establish that Richardson-Romberg extrapolation can improve proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory.	翻訳日:2023-06-30 15:55:25 公開日:2023-06-28
# ソーシャルメディアストリームからのイベント検出:方法,データセット,機会 Event Detection from Social Media Stream: Methods, Datasets and Opportunities ( http://arxiv.org/abs/2306.16495v1 ) ライセンス: Link先を確認	Quanzhi Li, Yang Chao, Dong Li, Yao Lu, Chi Zhang	(参考訳) ソーシャルメディアストリームには、日々の物語から最新のグローバルおよびローカルイベントやニュースまで、多種多様な情報が含まれている。特にtwitterは、リアルタイムに発生したイベントの迅速な拡散を可能にし、個人や組織が今起きている出来事を知らせ続けることができる。ソーシャルメディアデータからのイベント検出は、従来のテキストとは異なる課題であり、近年注目を集めている研究分野である。本稿では,Twitterデータストリームのイベント検出手法を幅広く調査し,この領域における最近の展開を読者が理解できるようにする。利用可能なデータセットを公開します。さらにいくつかの研究の機会は Social media streams contain large and diverse amount of information, ranging from daily-life stories to the latest global and local events and news. Twitter, especially, allows a fast spread of events happening real time, and enables individuals and organizations to stay informed of the events happening now. Event detection from social media data poses different challenges from traditional text and is a research area that has attracted much attention in recent years. In this paper, we survey a wide range of event detection methods for Twitter data stream, helping readers understand the recent development in this area. We present the datasets available to the public. Furthermore, a few research opportunities	翻訳日:2023-06-30 15:54:56 公開日:2023-06-28
# 独立したサブネットトレーニングの理論的理解に向けて Towards a Better Theoretical Understanding of Independent Subnetwork Training ( http://arxiv.org/abs/2306.16484v1 ) ライセンス: Link先を確認	Egor Shulgin and Peter Richt\'arik	(参考訳) 大規模機械学習の最近の進歩は、データ並列分散コンピューティングのパラダイムなしでは不可能だろう。大規模モデルを用いた分散コンピューティングは通信チャネルに過度な圧力を与えるため、通信コスト削減を目的とした通信圧縮戦略と訓練アルゴリズムの協調設計に向けた重要な研究が進められている。純粋なデータ並列処理はデータスケーリングを向上しますが、モデルスケーリング特性の貧弱さに悩まされます。実際、計算ノードはメモリ制約によって著しく制限され、モデルサイズがさらに大きくなるのを防ぐ。このため、巨大ニューラルネットワークモデルのトレーニングにおける最新の成果も、ある種のモデル並列性に依存している。本稿では,先述の問題を解決するために最近提案されている,高度に効果的な手法である独立サブネットワークトレーニング(ist)について,より理論的に考察する。圧縮通信を用いた分散手法など,ISTと代替手法の基本的な違いを特定し,その最適化性能を2次モデル上で正確に解析する。 Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.	翻訳日:2023-06-30 15:54:47 公開日:2023-06-28
# densebam-gi:hmerのためのmomentum assisted gruを用いた注意強化denesenet DenseBAM-GI: Attention Augmented DeneseNet with momentum aided GRU for HMER ( http://arxiv.org/abs/2306.16482v1 ) ライセンス: Link先を確認	Aniket Pal, Krishna Pratap Singh	(参考訳) 手書き数学表現(HMER)の認識は,デジタル教育や学術研究の分野において重要である。しかし,手書き数式における記号間の長さと複雑な空間関係を正確に決定することは困難である。本研究では,HMER 用の新しいエンコーダ・デコーダアーキテクチャ (DenseBAM-GI) を提案する。そこでは,エンコーダは特徴表現を改善するために Bottleneck Attention Module (BAM) を持ち,デコーダは拡張ゲート付きGated Input-GRU (GI-GRU) ユニットを持ち,長大かつ複雑な表現を容易にする。提案モデルは、表現認識率(exprate)の観点から、最先端モデルと同等のパフォーマンスを持つ効率的で軽量なアーキテクチャである。また、CROHME 2014、2016、2019データセットの上位1、2、3エラー精度も向上している。 DenseBAM-GIは、CROHME 2019データセットで、すべてのモデルの中で最高のエクスプロイトを達成する。重要なことに、これらの成功は計算の複雑さの低下とgpuメモリの必要性の低減によって達成される。 The task of recognising Handwritten Mathematical Expressions (HMER) is crucial in the fields of digital education and scholarly research. However, it is difficult to accurately determine the length and complex spatial relationships among symbols in handwritten mathematical expressions. In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER, where the encoder has a Bottleneck Attention Module (BAM) to improve feature representation and the decoder has a Gated Input-GRU (GI-GRU) unit with an extra gate to make decoding long and complex expressions easier. The proposed model is an efficient and lightweight architecture with performance equivalent to state-of-the-art models in terms of Expression Recognition Rate (exprate). It also performs better in terms of top 1, 2, and 3 error accuracy across the CROHME 2014, 2016, and 2019 datasets. DenseBAM-GI achieves the best exprate among all models on the CROHME 2019 dataset. Importantly, these successes are accomplished with a drop in the complexity of the calculation and a reduction in the need for GPU memory.	翻訳日:2023-06-30 15:54:32 公開日:2023-06-28
# 外部知識ビジュアル質問応答のための事前学習型マルチモーダルドライザー Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering ( http://arxiv.org/abs/2306.16478v1 ) ライセンス: Link先を確認	Alireza Salemi, Mahta Rafiee, Hamed Zamani	(参考訳) 本稿では,質問への回答に外部知識へのアクセスが必要である視覚質問応答タスクのカテゴリについて検討する。このカテゴリーは外部知識視覚質問応答 (OK-VQA) と呼ばれる。 OK-VQAシステムの開発における大きなステップは、与えられたマルチモーダルクエリに関連するドキュメントを取得することである。このタスクの最先端非対称密度検索モデルは、マルチモーダルクエリエンコーダとユニモーダルドキュメントエンコーダを備えたアーキテクチャを使用する。このようなアーキテクチャは、効果的なパフォーマンスのために大量のトレーニングデータを必要とする。そこで本稿では,OK-VQAタスクの経路検索モデルの事前学習のための自動データ生成パイプラインを提案する。提案されたアプローチは、現在の最先端非対称アーキテクチャと比較して26.9%の精度@5の改善をもたらす。さらに、提案した事前学習アプローチは、ゼロショット検索シナリオにおいて優れた能力を示す。 This paper studies a category of visual question answering tasks, in which accessing external knowledge is necessary for answering the questions. This category is called outside-knowledge visual question answering (OK-VQA). A major step in developing OK-VQA systems is to retrieve relevant documents for the given multi-modal query. Current state-of-the-art asymmetric dense retrieval model for this task uses an architecture with a multi-modal query encoder and a uni-modal document encoder. Such an architecture requires a large amount of training data for effective performance. We propose an automatic data generation pipeline for pre-training passage retrieval models for OK-VQA tasks. The proposed approach leads to 26.9% Precision@5 improvements compared to the current state-of-the-art asymmetric architecture. Additionally, the proposed pre-training approach exhibits a good ability in zero-shot retrieval scenarios.	翻訳日:2023-06-30 15:54:09 公開日:2023-06-28
# フレキシブルレート双方向ビデオ圧縮のためのマルチスケール変形性アライメントとコンテンツ適応型推論 Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression ( http://arxiv.org/abs/2306.16544v1 ) ライセンス: Link先を確認	M.Ak{\i}n Y{\i}lmaz, O.Ugur Ulas, A.Murat Tekalp	(参考訳) 動画コンテンツに動き補償モデルを適用する能力の欠如は、現在のエンドツーエンドの学習ビデオ圧縮モデルの重要な制限である。本稿では、エンドツーエンドの速度歪みに最適化された階層的双方向ビデオ圧縮のための適応型モーション補償モデルを提案する。特に2つの新案を提案します一特徴レベルにおけるマルチスケールの変形可能なアライメント方式及びマルチスケール条件付き符号化二運動コンテンツ適応推論さらに,複数のレート歪み動作点で単一モデルを動作させることができるゲインユニットを採用した。また,実際のフレキシブルレート学習ビデオ符号化のために,対応するモデルを微調整することにより,符号内対双方向符号化フレーム間のビット割り当てを制御するためにゲインユニットを利用する。実験により, 学習ビデオ符号化における先行技術に比較して, 最先端の速度歪み性能を示すことができた。 The lack of ability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state-of-the-art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: i) a multi-scale deformable alignment scheme at the feature level combined with multi-scale conditional coding, ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine tuning corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate state-of-the-art rate-distortion performance exceeding those of all prior art in learned video coding.	翻訳日:2023-06-30 15:47:29 公開日:2023-06-28
# 効率的なフォトリアリスティック・ヒューマンレンダリングによる次世代拡張現実会議システムの実現 Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering ( http://arxiv.org/abs/2306.16541v1 ) ライセンス: Link先を確認	Chuanyue Shen, Letian Zhang, Zhangsihao Yang, Masood Mortazavi, Xiyun Song, Liang Peng, Heather Yu	(参考訳) オンラインミーティングが新しい標準になりつつある。オンライン会議のための没入型体験を作ることは、より多様でシームレスな環境に欠かせない。人間の3Dダイナミックスの効率的な光リアルレンダリングは没入型ミーティングの中核である。現在の一般的なアプリケーションはリアルタイム会議を実現しているが,2d空間の制限や,参加者間の現実的なインタラクションを欠いたアバターの使用など,フォトリアリスティックな人間のダイナミクスの提供には不足している。 NeRF(Neural Radiance Field)のようなニューラルレンダリングの最近の進歩は、メタバースミーティングにおいてより大きなリアリズムをもたらす可能性がある。しかし,NeRFのレンダリング速度は遅いため,リアルタイム会議が困難である。データとハードウェアの効率を向上させるために,単眼映像取得と自由視点合成を活用した,将来の拡張現実型メタバース会議システムのためのパイプラインを想定する。没入型会議体験に向けて,光現実性人間力学をより効率的にレンダリングするための,NeRFに基づく高速な自由視点合成アルゴリズムを探索する。提案アルゴリズムは,最先端手法よりも44.5%,213%高速なトレーニングを行いながら,同等のレンダリング品質を実現することを示す。我々の探索は、複雑なアプリケーションシナリオを扱えるメタバース会議システムを構築するための設計基盤を提供する。例えば、カスタマイズされたテーマによる動的シーンのリライトや、現実世界の人々を拡張世界へと調和させるマルチユーザー会議である。 Meeting online is becoming the new normal. Creating an immersive experience for online meetings is a necessity towards more diverse and seamless environments. Efficient photorealistic rendering of human 3D dynamics is the core of immersive meetings. Current popular applications achieve real-time conferencing but fall short in delivering photorealistic human dynamics, either due to limited 2D space or the use of avatars that lack realistic interactions between participants. Recent advances in neural rendering, such as the Neural Radiance Field (NeRF), offer the potential for greater realism in metaverse meetings. However, the slow rendering speed of NeRF poses challenges for real-time conferencing. We envision a pipeline for a future extended reality metaverse conferencing system that leverages monocular video acquisition and free-viewpoint synthesis to enhance data and hardware efficiency. Towards an immersive conferencing experience, we explore an accelerated NeRF-based free-viewpoint synthesis algorithm for rendering photorealistic human dynamics more efficiently. We show that our algorithm achieves comparable rendering quality while performing training and inference 44.5% and 213% faster than state-of-the-art methods, respectively. Our exploration provides a design basis for constructing metaverse conferencing systems that can handle complex application scenarios, including dynamic scene relighting with customized themes and multi-user conferencing that harmonizes real-world people into an extended world.	翻訳日:2023-06-30 15:47:16 公開日:2023-06-28
# 物体検出のための深層学習における前景-背景不均衡問題の体系的研究 A systematic study of the foreground-background imbalance problem in deep learning for object detection ( http://arxiv.org/abs/2306.16539v1 ) ライセンス: Link先を確認	Hanxue Gu, Haoyu Dong, Nicholas Konz, Maciej A. Mazurowski	(参考訳) 深層学習におけるクラス不均衡問題は、いくつかの研究で研究されているが、物体検出におけるこの現象の体系的な解析はまだ行われていない。本稿では,対象検出におけるフォアグラウンドバックグラウンド(f-b)不均衡問題の包括的解析と実験を行う。 F-B不均衡(オブジェクトサイズ,オブジェクト数,データセットサイズ,オブジェクトタイプ)の異なる側面が検出性能に及ぼす影響を実験的に検討した。さらに,Faster-RCNN,SSD,OHEM,Libra-RCNN,Focal-Loss,GHM,PISA,YOLO-v3,GFLの9つの主要な手法を,異なる画像領域のデータセットで比較した。 We conclude that (1) the F-B imbalance can indeed cause a significant drop in detection performance, (2) The detection performance is more affected by F-B imbalance when fewer training data are available, (3) in most cases, decreasing object size leads to larger performance drop than decreasing number of objects, given the same change in the ratio of object pixels to non-object pixels, (6) among all selected methods, Libra-RCNN and PISA demonstrate the best performance in addressing the issue of F-B imbalance. (7) トレーニングデータセットのサイズが大きい場合, 方法の選択は影響を受けない (8) フォーカスロス, GHM, GFLを含むソフトサンプリング手法は, 平均的にかなりよく動作するが, 比較的不安定である。 The class imbalance problem in deep learning has been explored in several studies, but there has yet to be a systematic analysis of this phenomenon in object detection. Here, we present comprehensive analyses and experiments of the foreground-background (F-B) imbalance problem in object detection, which is very common and caused by small, infrequent objects of interest. We experimentally study the effects of different aspects of F-B imbalance (object size, number of objects, dataset size, object type) on detection performance. In addition, we also compare 9 leading methods for addressing this problem, including Faster-RCNN, SSD, OHEM, Libra-RCNN, Focal-Loss, GHM, PISA, YOLO-v3, and GFL with a range of datasets from different imaging domains. We conclude that (1) the F-B imbalance can indeed cause a significant drop in detection performance, (2) The detection performance is more affected by F-B imbalance when fewer training data are available, (3) in most cases, decreasing object size leads to larger performance drop than decreasing number of objects, given the same change in the ratio of object pixels to non-object pixels, (6) among all selected methods, Libra-RCNN and PISA demonstrate the best performance in addressing the issue of F-B imbalance. (7) When the training dataset size is large, the choice of method is not impactful (8) Soft-sampling methods, including focal-loss, GHM, and GFL, perform fairly well on average but are relatively unstable.	翻訳日:2023-06-30 15:46:53 公開日:2023-06-28
# CLANet: ブライトフィールド画像を用いたクロスバッチセルライン識別のための総合的フレームワーク CLANet: A Comprehensive Framework for Cross-Batch Cell Line Identification Using Brightfield Images ( http://arxiv.org/abs/2306.16538v1 ) ライセンス: Link先を確認	Lei Tong, Adam Corrigan, Navin Rathna Kumar, Kerry Hallbrook, Jonathan Orme, Yinhai Wang, Huiyu Zhou	(参考訳) 細胞線認証は、生物医学の分野で重要な役割を担っており、研究者が正確に同定された細胞を扱うことを保証する。教師付き深層学習は、細胞イメージングによる細胞形態学的特徴の研究により、細胞株の同定において顕著な進歩を遂げた。しかし、データが生成される異なる時間から生じる重要な問題であるバッチ効果は、基礎となるデータ分布に大きな変化をもたらし、異なるバッチ培養から細胞列間の信頼性の高い分化を複雑にする。この課題に対処するために,我々は,brightfieldイメージを用いたクロスバッチセルライン識別のための先駆的フレームワークであるclangtを紹介する。本稿では,セル密度の変動を効率的に把握するセルクラスタレベルの選択手法と,画像品質の変動を管理する自己教師型学習戦略を提案する。さらに,複数のインスタンス学習(MIL)を,セルライン識別のためのインスタンスレベルの特徴を効果的に集約するために採用する。当社の革新的な時系列セグメントサンプリングモジュールは,バッチ間のインキュベーション時間の違いによるバイアスを軽減し,milの機能学習能力をさらに向上させる。 astrazenecaグローバルセルバンクの93の実験バッチにわたる32のセルラインのデータを用いて、clangetを検証する。以上の結果から,CLANetは関連するアプローチ(ドメイン適応,MILなど)よりも優れており,細胞株同定におけるバッチ効果に対処する効果が示された。 Cell line authentication plays a crucial role in the biomedical field, ensuring researchers work with accurately identified cells. Supervised deep learning has made remarkable strides in cell line identification by studying cell morphological features through cell imaging. However, batch effects, a significant issue stemming from the different times at which data is generated, lead to substantial shifts in the underlying data distribution, thus complicating reliable differentiation between cell lines from distinct batch cultures. To address this challenge, we introduce CLANet, a pioneering framework for cross-batch cell line identification using brightfield images, specifically designed to tackle three distinct batch effects. We propose a cell cluster-level selection method to efficiently capture cell density variations, and a self-supervised learning strategy to manage image quality variations, thus producing reliable patch representations. Additionally, we adopt multiple instance learning(MIL) for effective aggregation of instance-level features for cell line identification. Our innovative time-series segment sampling module further enhances MIL's feature-learning capabilities, mitigating biases from varying incubation times across batches. We validate CLANet using data from 32 cell lines across 93 experimental batches from the AstraZeneca Global Cell Bank. Our results show that CLANet outperforms related approaches (e.g. domain adaptation, MIL), demonstrating its effectiveness in addressing batch effects in cell line identification.	翻訳日:2023-06-30 15:46:27 公開日:2023-06-28
# 核多体系における多体絡み合いと情報再構成 Multi-Body Entanglement and Information Rearrangement in Nuclear Many-Body Systems ( http://arxiv.org/abs/2306.16535v1 ) ライセンス: Link先を確認	S. Momme Hengstenberg, Caroline E. P. Robin, Martin J. Savage	(参考訳) 核多体系の有効モデル空間(EMS)計算について検討し,多粒子エンタングルメントの収束について検討した。一般化リプキン・メシュコフ・グリク(lmg)モデルは、核の絡み合い駆動記述の将来の発展の動機付けと洞察を提供するために用いられる。効果的なアプローチはヒルベルト空間の切り離しと、関連する基本自由度を構成するクォービット(スピン)の変分回転に基づいている。回転と切り離しの非可換性により、モデル空間の大部分でエネルギー収束が指数関数的に改善される。本分析では, 相関と絡み合いの測定を行い, その収束度をカットオフの増加とともに定量化する。マルチボディの絡み合いを推定するために, 1 および 2 スピンの絡み合いエントロピー,相互情報,および $n$-tangles に焦点を当てた。実効的な記述は回転したスピンのエントロピーや相互情報を強く抑制し、低いカットオフで正確な結果を広範囲に回収することができる。一方、素ハミルトニアンのネーブ・トランケーションは、これらの測度を人工的に過小評価する。本モデルにおけるn$-tangles は、n$-particle の絡み合いの基底独立測度を提供する。 EMSの記述ではこれらを捉えるのが難しいが、最小のハミルトニアンのトランケーションに比べて収束の改善は著しく劇的である。低エネルギーems法は多体系における低次オブザーバブルの予測能力を提供し、lmgモデルにおける量子相関や多体絡み合いの類似性を示し、核多体系や高エネルギー物理学や核物理学に関連する実効場理論の研究を動機付けるものであると結論づける。 We examine how effective-model-space (EMS) calculations of nuclear many-body systems rearrange and converge multi-particle entanglement. The generalized Lipkin-Meshkov-Glick (LMG) model is used to motivate and provide insight for future developments of entanglement-driven descriptions of nuclei. The effective approach is based on a truncation of the Hilbert space together with a variational rotation of the qubits (spins), which constitute the relevant elementary degrees of freedom. The non-commutivity of the rotation and truncation allows for an exponential improvement of the energy convergence throughout much of the model space. Our analysis examines measures of correlations and entanglement, and quantifies their convergence with increasing cut-off. We focus on one- and two-spin entanglement entropies, mutual information, and $n$-tangles for $n=2,4$ to estimate multi-body entanglement. The effective description strongly suppresses entropies and mutual information of the rotated spins, while being able to recover the exact results to a large extent with low cut-offs. Naive truncations of the bare Hamiltonian, on the other hand, artificially underestimate these measures. The $n$-tangles in the present model provide a basis-independent measures of $n$-particle entanglement. While these are more difficult to capture with the EMS description, the improvement in convergence, compared to truncations of the bare Hamiltonian, is significantly more dramatic. We conclude that the low-energy EMS techniques, that successfully provide predictive capabilities for low-lying observables in many-body systems, exhibit analogous efficacy for quantum correlations and multi-body entanglement in the LMG model, motivating future studies in nuclear many-body systems and effective field theories relevant to high-energy physics and nuclear physics.	翻訳日:2023-06-30 15:46:04 公開日:2023-06-28
# 有限時間熱力学における集合的利点 Collective advantages in finite-time thermodynamics ( http://arxiv.org/abs/2306.16534v1 ) ライセンス: Link先を確認	Alberto Rolandi, Mart\'i Perarnau-Llobet	(参考訳) 有限時間熱力学における中心的なタスクは、熱浴に浸漬した系の状態を操作する際に、余剰または散逸する作業を最小化することである。我々は,この課題を,プロセスの開始時と終了時において,構成成分が同一で非相関な$N$ボディシステムとみなす。遅いが有限時間プロセスの状態では、プロトコルに沿って対話が適切に作成される集合プロトコルを考えることで、$W_{\rm diss}$を劇的に削減できることを示す。これは$W_{\rm diss}\sim N^x$ with $x<1$; のサブ線形成長にもつながり、非相互作用プロトコルで満たされる$W_{\rm diss}\sim N$とは対照的に、$N$: $W_{\rm diss}\sim N^x$ with $x<1$; のサブ線形成長につながる。このような集合的利点に対する基本的な限界を導出し、x=0$ が原理的に可能であることを示すが、これは非常に局所的な $n$-body 相互作用を必要とする。次に、現実的な多体相互作用モデル、特に1次元スピンチェーンと全対全スピンモデルによる集合過程を探索し、現実的な制御レベルで顕著な利得を達成する。これらの結果の応用として,情報の消去を有限時間に限定し,ランドーアーの消去限界へのより高速な収束を証明した。 A central task in finite-time thermodynamics is to minimize the excess or dissipated work, $W_{\rm diss}$, when manipulating the state of a system immersed in a thermal bath. We consider this task for an $N$-body system, whose constituents are identical and uncorrelated at the beginning and end of the process. In the regime of slow but finite-time processes, we show that $W_{\rm diss}$ can be dramatically reduced by considering collective protocols in which interactions are suitably created along the protocol. This can even lead to a sub-linear growth of $W_{\rm diss}$ with $N$: $W_{\rm diss}\sim N^x$ with $x<1$; to be contrasted to the expected $W_{\rm diss}\sim N$ satisfied in any non-interacting protocol. We derive the fundamental limits to such collective advantages and show that $x=0$ is in principle possible, which however requires highly non-local $N$-body interactions. We then explore collective processes with realistic many-body interacting models, in particular a 1D spin chain and an all-to-all spin model, achieving noticeable gains under realistic levels of control. As an application of these results, we focus on the erasure of information in finite time, and prove a faster convergence to Landauer's erasure bound.	翻訳日:2023-06-30 15:45:33 公開日:2023-06-28
# ICSVR:ビデオ検索モデルにおける構成的・意味的理解の検討 ICSVR: Investigating Compositional and Semantic Understanding in Video Retrieval Models ( http://arxiv.org/abs/2306.16533v1 ) ライセンス: Link先を確認	Avinash Madasu, Vasudev Lal	(参考訳) ビデオ検索(VR)は、テキストキャプションまたはリバーサが与えられたビデオデータベースから地上の真理ビデオを取得することを含む。合成性の2つの重要なコンポーネント:オブジェクト \&属性とアクションは適切なテキストクエリを形成するために正しいセマンティクスを使って結合される。これらのコンポーネント(属性、アクション、セマンティクスを対象とする)は、それぞれがビデオの識別や正しい地上の真理ビデオの検索に重要な役割を果たす。しかし,これらのコンポーネントがビデオ検索性能に与える影響は明らかでない。そこで我々は,MSRVTT,MSVD,DIDEMOなどの標準ベンチマークを用いて,映像検索モデルの構成的および意味的理解を評価するための体系的研究を行った。本研究は,ビデオ検索モデルの2つのカテゴリについて行った。 (i)ビデオテキストペアで事前学習し、下流ビデオ検索データセット(例えば、Frozen-in-Time、Violet、MCQなど)で微調整する。 (ii) ビデオ検索にCLIP(CLIP4Clip, XCLIP, CLIP2Videoなど)のような事前訓練済みの画像テキスト表現を適用する。ビデオ理解において,アクションやセマンティクスはオブジェクトや属性と比較して小さな役割を担っていることが明らかとなった。さらに、事前学習された画像テキスト表現(CLIP)を用いたビデオ検索モデルは、ビデオテキストデータに事前学習されたモデルと比較して、意味的・構成的理解が優れている。 Video retrieval (VR) involves retrieving the ground truth video from the video database given a text caption or vice-versa. The two important components of compositionality: objects \& attributes and actions are joined using correct semantics to form a proper text query. These components (objects \& attributes, actions and semantics) each play an important role to help distinguish among videos and retrieve the correct ground truth video. However, it is unclear what is the effect of these components on the video retrieval performance. We therefore, conduct a systematic study to evaluate the compositional and semantic understanding of video retrieval models on standard benchmarks such as MSRVTT, MSVD and DIDEMO. The study is performed on two categories of video retrieval models: (i) which are pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (Eg. Frozen-in-Time, Violet, MCQ etc.) (ii) which adapt pre-trained image-text representations like CLIP for video retrieval (Eg. CLIP4Clip, XCLIP, CLIP2Video etc.). Our experiments reveal that actions and semantics play a minor role compared to objects \& attributes in video understanding. Moreover, video retrieval models that use pre-trained image-text representations (CLIP) have better semantic and compositional understanding as compared to models pre-trained on video-text data.	翻訳日:2023-06-30 15:45:08 公開日:2023-06-28
# WHOグレード4グリオーマにおける術前MRIによる早期進行と生存リスクの予測 Prediction of Rapid Early Progression and Survival Risk with Pre-Radiation MRI in WHO Grade 4 Glioma Patients ( http://arxiv.org/abs/2306.16531v1 ) ライセンス: Link先を確認	Walia Farzana, Mustafa M Basree, Norou Diawara, Zeina A. Shboul, Sagel Dubey, Marie M Lockhart, Mohamed Hamza, Joshua D. Palmer, Khan M. Iftekharuddin	(参考訳) 最近の臨床研究では放射線治療開始前にREPを呈するグリオ芽腫のサブセットが報告されている。現在の文献では臨床病理学的特徴を用いてこの人口を記述している。本研究は,従来のra-diomics,洗練されたマルチレゾリューションフラクタルテクスチャ特徴,および非rep症例からのrepの予測のための診断および予後予測ツールとしての異なる分子特徴(mgmt,idh変異)について,計算および統計モデルを用いて初めて検討した。放射線プランニングT1ポストコントラスト(T1C)MRIシークエンスの解析を行った。 1000回以上の5倍のクロスバリデーションを持つアンサンブル法では、AUCは0.793であり、REPと非REPの標準偏差は0.082である。さらに、依存的な検閲(患者のサブセットが死ぬまで追跡されない場合)下でのコプラに基づくモデリングは、患者の生存確率と予後のグルーピングに重要な特徴(p-value <0.05)を特定する。コホート患者の生存率の予測は0.881で、標準偏差は0.056である。融合特徴を用いた予後指標(PI)は、REP症例の84.62%が悪い予後群に該当し、REP症例の比率が高くなる可能性を示唆している。さらに, マルチ分解能フラクタルテクスチャ特性は, REPやサバイバル結果の従来の放射能特性よりも優れていた。 Recent clinical research describes a subset of glioblastoma patients that exhibit REP prior to start of radiation therapy. Current literature has thus far described this population using clinicopathologic features. To our knowledge, this study is the first to investigate the potential of conventional ra-diomics, sophisticated multi-resolution fractal texture features, and different molecular features (MGMT, IDH mutations) as a diagnostic and prognostic tool for prediction of REP from non-REP cases using computational and statistical modeling methods. Radiation-planning T1 post-contrast (T1C) MRI sequences of 70 patients are analyzed. Ensemble method with 5-fold cross validation over 1000 iterations offers AUC of 0.793 with standard deviation of 0.082 for REP and non-REP classification. In addition, copula-based modeling under dependent censoring (where a subset of the patients may not be followed up until death) identifies significant features (p-value <0.05) for survival probability and prognostic grouping of patient cases. The prediction of survival for the patients cohort produces precision of 0.881 with standard deviation of 0.056. The prognostic index (PI) calculated using the fused features suggests that 84.62% of REP cases fall under the bad prognostic group, suggesting potentiality of fused features to predict a higher percentage of REP cases. The experimental result further shows that mul-ti-resolution fractal texture features perform better than conventional radiomics features for REP and survival outcomes.	翻訳日:2023-06-30 15:44:41 公開日:2023-06-28
# OAM光と原子アンサンブルのQND相互作用による並列多量子SWAPゲート Parallel multi-two-qubit SWAP gate via QND interaction of OAM light and atomic ensemble ( http://arxiv.org/abs/2306.16565v1 ) ライセンス: Link先を確認	E.N. Bashmakova, E.A. Vashukevich, and T. Yu. Golubeva	(参考訳) 現在、量子SWAPゲートは量子コンピューティングの不可欠な部分となっているため、その実現方法の研究は様々な量子光学および情報応用において重要な実践的問題であると考えられる。本稿では、原子アンサンブルと軌道角運動量を持つマルチモード光との量子非退化相互作用の枠組みにおいて、離散変数でスワップ論理演算を行うためのスキームを提案する。本稿では、駆動場軌道運動量の異なる値に対する原子状態と磁場状態の集合上の2量子ビット閉サブシステムを明らかにする手順について詳細に論じる。また,並列マルチツーキュービット量子SWAPゲートの実装の可能性を示す。 Nowadays quantum SWAP gate has become an integral part of quantum computing, so investigation of methods of its realization seems to be an important practical problem for various quantum-optical and information applications. In the present paper we propose a scheme for performing a SWAP logic operation in discrete variables in the framework of quantum non-demolition interaction between an atomic ensemble and a multimode light with orbital angular momentum. We discuss in detail the procedure for revealing two-qubit closed subsystems on a set of atomic and field states for different values of the driving field orbital momentum. We also demonstrate the possibility of implementing a parallel multi-two-qubit quantum SWAP gate.	翻訳日:2023-06-30 15:38:01 公開日:2023-06-28
# pareto optimal self-supervision による大規模言語モデルの自動校正と誤り訂正 Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision ( http://arxiv.org/abs/2306.16564v1 ) ライセンス: Link先を確認	Theodore Zhao, Mu Wei, J. Samuel Preston, Hoifung Poon	(参考訳) 大規模言語モデル (LLM) は、広範囲の応用において目覚ましい能力を示してきたが、精度は依然として大きな成長領域であり、特にバイオメディシンのようなミッションクリティカルな領域では顕著である。 LLM応答に対する信頼度を校正する効果的な方法は、エラーを自動的に検出し、ループ内検証を容易にするために不可欠である。キャリブレーション信号の重要な源は、低コストで利用可能であるが、ノイズやカバレッジといった独自の制限がある、専門家によるプログラム的監督にある。本稿では,利用可能なプログラム的監督を活用し,追加の手動作業なしに,各応答に対するリスクスコアを作成することで,llm応答を体系的に校正することができるparetoの最適自己スーパービジョンフレームワークを提案する。これは、より不確実なLSM応答により高いリスクスコアを割り当て、エラー修正を容易にする、他の利用可能な監視源とLLM出力を一致させるハーモニザモデルを学ぶことで達成される。生体医学領域および一般領域における標準関係抽出タスクの実験により,本手法の有効性が示され,本手法のリスクスコアはllmsの実誤差率と高い相関を示した。最も不確実なテスト例では,提案したリスクスコアに基づく動的プロンプトにより,既製のLCMの精度が大幅に向上し,SOTA(State-of-the-art)の監督が弱く,SOTAの監督が難しい評価データセットにGPT-4の結果が及んだ。 Large language models (LLMs) have demonstrated remarkable capabilities out of box for a wide range of applications, yet accuracy still remains a major growth area, especially in mission-critical domains such as biomedicine. An effective method to calibrate the confidence level on LLM responses is essential to automatically detect errors and facilitate human-in-the-loop verification. An important source of calibration signals stems from expert-stipulated programmatic supervision, which is often available at low cost but has its own limitations such as noise and coverage. In this paper, we introduce a Pareto optimal self-supervision framework that can leverage available programmatic supervision to systematically calibrate LLM responses by producing a risk score for every response, without any additional manual efforts. This is accomplished by learning a harmonizer model to align LLM output with other available supervision sources, which would assign higher risk scores to more uncertain LLM responses and facilitate error correction. Experiments on standard relation extraction tasks in biomedical and general domains demonstrate the promise of this approach, with our proposed risk scores highly correlated with the real error rate of LLMs. For the most uncertain test instances, dynamic prompting based on our proposed risk scores results in significant accuracy improvement for off-the-shelf LLMs, boosting GPT-3 results past state-of-the-art (SOTA) weak supervision and GPT-4 results past SOTA supervised results on challenging evaluation datasets.	翻訳日:2023-06-30 15:37:51 公開日:2023-06-28
# 半定義プログラミングと量子情報 Semi-definite programming and quantum information ( http://arxiv.org/abs/2306.16560v1 ) ライセンス: Link先を確認	Piotr Mironowicz	(参考訳) 本稿では,量子情報の文脈における半定値プログラミング(SDP)手法の包括的探索について述べる。凸最適化、双対性、sdp定式化の数学的基礎を調べ、量子システムにおける最適化の課題に対処するための確かな理論的枠組みを提供する。これらのツールを活用することで、研究者や実践者は古典的および量子的相関を特徴づけ、量子状態を最適化し、効率的な量子アルゴリズムとプロトコルを設計することができる。また,量子情報処理における最適化手法の効果的な活用を可能にするため,sdpやモデリングツールなどの実装面についても論じる。この論文で提示された知見と方法論は、量子情報分野の進歩に寄与し、新しい通信プロトコル、自己テスト手法、量子絡み合いのより深い理解を促進することが証明されている。全体として、この研究は最適化と量子情報の交点に関心のある研究者にリソースを提供し、この急速に進化する分野における探索とブレークスルーのための新しい道を開く。 This paper presents a comprehensive exploration of semi-definite programming (SDP) techniques within the context of quantum information. It examines the mathematical foundations of convex optimization, duality, and SDP formulations, providing a solid theoretical framework for addressing optimization challenges in quantum systems. By leveraging these tools, researchers and practitioners can characterize classical and quantum correlations, optimize quantum states, and design efficient quantum algorithms and protocols. The paper also discusses implementational aspects, such as solvers for SDP and modeling tools, enabling the effective employment of optimization techniques in quantum information processing. The insights and methodologies presented in this paper have proven instrumental in advancing the field of quantum information, facilitating the development of novel communication protocols, self-testing methods, and a deeper understanding of quantum entanglement. Overall, this study offers a resource for researchers interested in the intersection of optimization and quantum information, opening up new avenues for exploration and breakthroughs in this rapidly evolving field.	翻訳日:2023-06-30 15:37:21 公開日:2023-06-28
# 特徴選択:属性間の協調をめざして Feature Selection: A perspective on inter-attribute cooperation ( http://arxiv.org/abs/2306.16559v1 ) ライセンス: Link先を確認	Gustavo Sosa-Cabrera, Santiago G\'omez-Guerrero, Miguel Garc\'ia-Torres, Christian E. Schaerer	(参考訳) 高次元データセットは、データマイニングと機械学習における学習タスクの課題を描いている。特徴の選択は次元の縮小を扱う効果的な手法である。これは学習アルゴリズムを適用する前に必要不可欠なデータ処理ステップであることが多い。フィルタの特徴選択手法は、何十年もの間、単純な単変量関係ランキングアルゴリズムから、より洗練された関連性-冗長トレードオフ、そして近年の多変量依存に基づくアプローチへと進化してきた。多変量依存を取り込むこの傾向は、特徴間の相互作用からクラスに関するユニークな情報を得ることを目的としている。本稿では,機能相互運用を支援するフィルタ特徴選択手法に関する最近の研究を包括的に調査し,文献における様々なアプローチの貢献を要約する。さらに,今後の研究開発に期待できる課題や課題についても紹介する。 High-dimensional datasets depict a challenge for learning tasks in data mining and machine learning. Feature selection is an effective technique in dealing with dimensionality reduction. It is often an essential data processing step prior to applying a learning algorithm. Over the decades, filter feature selection methods have evolved from simple univariate relevance ranking algorithms to more sophisticated relevance-redundancy trade-offs and to multivariate dependencies-based approaches in recent years. This tendency to capture multivariate dependence aims at obtaining unique information about the class from the intercooperation among features. This paper presents a comprehensive survey of the state-of-the-art work on filter feature selection methods assisted by feature intercooperation, and summarizes the contributions of different approaches found in the literature. Furthermore, current issues and challenges are introduced to identify promising future research and development.	翻訳日:2023-06-30 15:37:05 公開日:2023-06-28
# 理論的保証付き機械学習のための非凸最適化:ロバスト行列補完とニューラルネットワーク学習 Non-Convex Optimizations for Machine Learning with Theoretical Guarantee: Robust Matrix Completion and Neural Network Learning ( http://arxiv.org/abs/2306.16557v1 ) ライセンス: Link先を確認	Shuai Zhang	(参考訳) 機械学習の最近の発展にもかかわらず、ほとんどの学習システムは未だに「ブラックボックス」という概念の下にあり、パフォーマンスを理解・導出できない。公衆の安全とプライバシーの懸念が高まり、説明可能な学習システムを設計することは、機械学習の新しいトレンドとなっている。一般に、多くの機械学習問題は損失関数の最小化(最大化)として定式化されている。実データは非線形モデルから生成される可能性が高いため、損失関数は一般に非凸である。凸最適化問題と異なり、勾配降下アルゴリズムは非凸最適化の解法において局所的最小値に閉じ込められる。したがって、非凸最適化問題を研究する際に説明可能なアルゴリズムを提供することは困難である。本論文では,(1)低ランク行列補完と(2)ニューラルネットワーク学習の2つの一般的な非凸問題について考察する。 Despite the recent development in machine learning, most learning systems are still under the concept of "black box", where the performance cannot be understood and derived. With the rise of safety and privacy concerns in public, designing an explainable learning system has become a new trend in machine learning. In general, many machine learning problems are formulated as minimizing (or maximizing) some loss function. Since real data are most likely generated from non-linear models, the loss function is non-convex in general. Unlike the convex optimization problem, gradient descent algorithms will be trapped in spurious local minima in solving non-convex optimization. Therefore, it is challenging to provide explainable algorithms when studying non-convex optimization problems. In this thesis, two popular non-convex problems are studied: (1) low-rank matrix completion and (2) neural network learning.	翻訳日:2023-06-30 15:36:52 公開日:2023-06-28
# Rater-Specific Bayesian Neural Networkによる医用画像セグメンテーションにおける層間不確かさの定量化 Inter-Rater Uncertainty Quantification in Medical Image Segmentation via Rater-Specific Bayesian Neural Networks ( http://arxiv.org/abs/2306.16556v1 ) ライセンス: Link先を確認	Qingqiao Hu, Hao Wang, Jing Luo, Yunhao Luo, Zhiheng Zhangg, Jan S. Kirschke, Benedikt Wiestler, Bjoern Menze, Jianguo Zhang, Hongwei Bran Li	(参考訳) 自動医用画像分割は本質的にある程度の不確実性を伴う。この不確実性に寄与する重要な要因の1つは、主に画像の外観の変化によって、対象領域の境界を決定する際に生じる曖昧さである。これに加えて、この分野の専門家の間でも、特定の解剖学的構造の正確な定義に関して異なる意見が生まれることがある。この研究は特に、層間不確実性として知られるセグメンテーションの不確かさのモデリングに対処する。その主な目的は、医療画像の複数の専門家が同じ画像の解釈と注釈を行う際に生じるセグメンテーション結果の変動を探索し分析することである。医用画像セグメンテーションにおけるレータ間不確実性を推定するための新しいベイズニューラルネットワークアーキテクチャを提案する。私たちのアプローチには3つの重要な進歩がある。まず,不確実性推定用に特別に調整した1エンコーダマルチデコーダアーキテクチャを導入することで,各専門家のレートラ固有の表現を捉えることができる。第2に,新しいアーキテクチャのベイズモデルを提案することで,特に制約の少ないシナリオにおいて,レート間分布の効率的なキャプチャを実現する。最後に、各デコーダにアテンションモジュールを組み込むことにより、rater特有の表現を強化する。このモジュールは、各レートのセグメンテーション結果の集中化と洗練を容易にする。合成および実世界のデータセットを使用して広範な評価を行い、技術的革新を厳格に検証する。提案手法は, 各種不確実性を考慮した2つの評価指標を考慮し, 7つのタスクのうち5つにおいて, 既存のベースライン手法を越えている。私たちのコード、モデル、新しいデータセットはgithubリポジトリから入手できます。 Automated medical image segmentation inherently involves a certain degree of uncertainty. One key factor contributing to this uncertainty is the ambiguity that can arise in determining the boundaries of a target region of interest, primarily due to variations in image appearance. On top of this, even among experts in the field, different opinions can emerge regarding the precise definition of specific anatomical structures. This work specifically addresses the modeling of segmentation uncertainty, known as inter-rater uncertainty. Its primary objective is to explore and analyze the variability in segmentation outcomes that can occur when multiple experts in medical imaging interpret and annotate the same images. We introduce a novel Bayesian neural network-based architecture to estimate inter-rater uncertainty in medical image segmentation. Our approach has three key advancements. Firstly, we introduce a one-encoder-multi-decoder architecture specifically tailored for uncertainty estimation, enabling us to capture the rater-specific representation of each expert involved. Secondly, we propose Bayesian modeling for the new architecture, allowing efficient capture of the inter-rater distribution, particularly in scenarios with limited annotations. Lastly, we enhance the rater-specific representation by integrating an attention module into each decoder. This module facilitates focused and refined segmentation results for each rater. We conduct extensive evaluations using synthetic and real-world datasets to validate our technical innovations rigorously. Our method surpasses existing baseline methods in five out of seven diverse tasks on the publicly available \emph{QUBIQ} dataset, considering two evaluation metrics encompassing different uncertainty aspects. Our codes, models, and the new dataset are available through our GitHub repository: https://github.com/HaoWang420/bOEMD-net .	翻訳日:2023-06-30 15:36:38 公開日:2023-06-28
# min-max f-divergence正規化による学習フェア分類 Learning Fair Classifiers via Min-Max F-divergence Regularization ( http://arxiv.org/abs/2306.16552v1 ) ライセンス: Link先を確認	Meiyu Zhong, Ravi Tandon	(参考訳) 機械学習(ML)ベースのシステムは、法執行機関、刑事司法、財務、雇用、入場などの分野で採用されているため、機械学習支援による意思決定の公正性がますます重要になっている。本稿では, 公平な分類の問題に焦点をあて, 高い精度を維持しつつ, 公平な分類モデルを学ぶための min-max F-divergence regularization framework を導入する。このフレームワークは,2つの学習可能なネットワーク,すなわち分類器ネットワークとバイアス/フェアネス推定ネットワークから成り,f-ダイバージェンスの統計的概念を用いてフェアネスを計測する。その結果,f-divergence測度は凸性と微分可能性特性を有し,その変動表現は実用的勾配に基づく学習法に広く適用できることがわかった。提案するフレームワークは、複数の機密属性や高次元データセットに容易に適応できる。グループフェアネス制約,すなわち人口格差と等化確率の2種類のグループフェアネス制約に対するF偏差に基づくトレーニングパラダイムについて検討する。本稿では,複数の領域(コンパス,法律加入,成人所得,セロバデータセットなど)で発生する実世界のデータセットについて,総合的な実験を行う。フェアネス精度のトレードオフを定量化するために、フェアネス精度の受信機動作特性 (FA-ROC) とそれに対応する 'textit{low-bias} FA-ROC の概念を導入する。フェア分類器(前処理,後処理,その他の正規化手法を含む)を学習するためのいくつかの既存手法と比較して,提案手法は,精度と公正性のトレードオフに関して,最先端の性能を実現する。 As machine learning (ML) based systems are adopted in domains such as law enforcement, criminal justice, finance, hiring and admissions, ensuring the fairness of ML aided decision-making is becoming increasingly important. In this paper, we focus on the problem of fair classification, and introduce a novel min-max F-divergence regularization framework for learning fair classification models while preserving high accuracy. Our framework consists of two trainable networks, namely, a classifier network and a bias/fairness estimator network, where the fairness is measured using the statistical notion of F-divergence. We show that F-divergence measures possess convexity and differentiability properties, and their variational representation make them widely applicable in practical gradient based training methods. The proposed framework can be readily adapted to multiple sensitive attributes and for high dimensional datasets. We study the F-divergence based training paradigm for two types of group fairness constraints, namely, demographic parity and equalized odds. We present a comprehensive set of experiments for several real-world data sets arising in multiple domains (including COMPAS, Law Admissions, Adult Income, and CelebA datasets). To quantify the fairness-accuracy tradeoff, we introduce the notion of fairness-accuracy receiver operating characteristic (FA-ROC) and a corresponding \textit{low-bias} FA-ROC, which we argue is an appropriate measure to evaluate different classifiers. In comparison to several existing approaches for learning fair classifiers (including pre-processing, post-processing and other regularization methods), we show that the proposed F-divergence based framework achieves state-of-the-art performance with respect to the trade-off between accuracy and fairness.	翻訳日:2023-06-30 15:36:10 公開日:2023-06-28
# オフロードセマンティックセマンティックセグメンテーション性能におけるLiDAR構成の解析 Analysis of LiDAR Configurations on Off-road Semantic Segmentation Performance ( http://arxiv.org/abs/2306.16551v1 ) ライセンス: Link先を確認	Jinhee Yu, Jingdao Chen, Lalitha Dabbiru, Christopher T. Goodin	(参考訳) 本稿では,LiDARの設定変化が3次元LiDARポイントクラウドセマンティックセグメンテーションモデルの性能に与える影響について検討する。実験にCylinder3Dを用いた3次元LiDARポイントクラウドセマンティックセマンティックセグメンテーションモデルのトレーニングおよびテストにおいて,異なるLiDARチャネルを使用することの効果を検討する。シリンダー3dモデルは、ミシシッピ州立大学のautonomous vehicle simulator(mavs)で作成したシミュレーション3d lidar point cloudデータセットと、現実世界のオフロード環境で収集されたrellis-3dデータセットの32,64チャンネルの3d lidar point cloud上でトレーニングおよびテストされる。実験の結果,センサと空間領域のシフトは,LiDARに基づくセマンティックセグメンテーションモデルの性能に大きく影響することが示された。トレーニングとテストの間の空間領域の変化がないため、同じセンサータイプでトレーニングとテストを行ったモデルは、一般的により優れた性能を示した。さらに,高分解能センサは低分解能センサに比べて性能が向上した。しかし,空間的領域変化が存在すると,結果は異なっていた。場合によっては、センサーの高解像度化の利点は、センサードメインシフトと非センサードメインシフトの両方でパフォーマンスの向上につながった。別の例では、高解像度は特定のドメイン内で過度に適合し、一般化能力が欠如し、異なるセンサー構成のデータでテストした場合のパフォーマンスが低下した。 This paper investigates the impact of LiDAR configuration shifts on the performance of 3D LiDAR point cloud semantic segmentation models, a topic not extensively studied before. We explore the effect of using different LiDAR channels when training and testing a 3D LiDAR point cloud semantic segmentation model, utilizing Cylinder3D for the experiments. A Cylinder3D model is trained and tested on simulated 3D LiDAR point cloud datasets created using the Mississippi State University Autonomous Vehicle Simulator (MAVS) and 32, 64 channel 3D LiDAR point clouds of the RELLIS-3D dataset collected in a real-world off-road environment. Our experimental results demonstrate that sensor and spatial domain shifts significantly impact the performance of LiDAR-based semantic segmentation models. In the absence of spatial domain changes between training and testing, models trained and tested on the same sensor type generally exhibited better performance. Moreover, higher-resolution sensors showed improved performance compared to those with lower-resolution ones. However, results varied when spatial domain changes were present. In some cases, the advantage of a sensor's higher resolution led to better performance both with and without sensor domain shifts. In other instances, the higher resolution resulted in overfitting within a specific domain, causing a lack of generalization capability and decreased performance when tested on data with different sensor configurations.	翻訳日:2023-06-30 15:35:39 公開日:2023-06-28
# UTOPIA: 普遍的にトレーニング可能な最適予測間隔 UTOPIA: Universally Trainable Optimal Prediction Intervals Aggregation ( http://arxiv.org/abs/2306.16549v1 ) ライセンス: Link先を確認	Jianqing Fan, Jiawei Ge and Debarghya Mukherjee	(参考訳) 予測の不確かさの定量化は、バイオメディカルサイエンス、経済研究、天気予報など、様々な分野で重要な応用の興味深い問題である。量子回帰や共形予測などの予測区間を構築するための多くの方法が利用可能である。それでも、モデル不特定(特に高次元)や準最適構成は、しばしばバイアスや不必要に広い予測間隔をもたらす。本稿では,予測帯域の平均幅を最小化するために,予測帯域の平均幅を最小化する手法として,普遍的に学習可能な最適予測間隔集約 (utopia) を提案する。また,基本関数に基づいて予測帯域を直接構築することができる。我々のアプローチは、実装が容易な線形あるいは凸プログラミングに基づいている。提案手法はすべて,本論文で詳述した範囲確率と最適平均長に関する理論的保証によって支持されている。本手法の有効性は,金融・マクロ経済学における合成データと2つの実データに適用することによって実証された。 Uncertainty quantification for prediction is an intriguing problem with significant applications in various fields, such as biomedical science, economic studies, and weather forecasts. Numerous methods are available for constructing prediction intervals, such as quantile regression and conformal predictions, among others. Nevertheless, model misspecification (especially in high-dimension) or sub-optimal constructions can frequently result in biased or unnecessarily-wide prediction intervals. In this paper, we propose a novel and widely applicable technique for aggregating multiple prediction intervals to minimize the average width of the prediction band along with coverage guarantee, called Universally Trainable Optimal Predictive Intervals Aggregation (UTOPIA). The method also allows us to directly construct predictive bands based on elementary basis functions. Our approach is based on linear or convex programming which is easy to implement. All of our proposed methodologies are supported by theoretical guarantees on the coverage probability and optimal average length, which are detailed in this paper. The effectiveness of our approach is convincingly demonstrated by applying it to synthetic data and two real datasets on finance and macroeconomics.	翻訳日:2023-06-30 15:35:12 公開日:2023-06-28
# palm: 言語モデルによる行動予測@ego4d 長期行動予測チャレンジ2023 Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023 ( http://arxiv.org/abs/2306.16545v1 ) ライセンス: Link先を確認	Daoji Huang, Otmar Hilliges, Luc Van Gool, Xi Wang	(参考訳) 視覚言語と大規模言語モデルを利用したLTA(Long-Term Action Precipation)タスクのソリューションであるPalmを提案する。注釈付きアクション周期の入力ビデオが与えられた場合、LTAタスクは将来のアクションを予測することを目的としている。我々は、最適なソリューションは過去のアクションと将来のアクションの間の相互依存性を捉え、過去のアクションで符号化された構造と依存関係に基づいて将来のアクションを推測できるべきだと仮定する。大規模言語モデルは顕著な常識に基づく推論能力を示している。これにインスパイアされたPalmは、画像キャプションモデルと大きな言語モデルをチェーンする。入力ビデオから抽出したフレーム記述とアクションラベルに基づいて、将来のアクションを予測する。提案手法は,EGO4D LTAチャレンジにおける他の参加者よりも優れ,行動予測の観点で最高のパフォーマンスを達成する。私たちのコードはhttps://github.com/DanDoge/Palmで利用可能です。 We present Palm, a solution to the Long-Term Action Anticipation (LTA) task utilizing vision-language and large language models. Given an input video with annotated action periods, the LTA task aims to predict possible future actions. We hypothesize that an optimal solution should capture the interdependency between past and future actions, and be able to infer future actions based on the structure and dependency encoded in the past actions. Large language models have demonstrated remarkable commonsense-based reasoning ability. Inspired by that, Palm chains an image captioning model and a large language model. It predicts future actions based on frame descriptions and action labels extracted from the input videos. Our method outperforms other participants in the EGO4D LTA challenge and achieves the best performance in terms of action prediction. Our code is available at https://github.com/DanDoge/Palm	翻訳日:2023-06-30 15:34:51 公開日:2023-06-28
# 適応量子力学における自由フェルミオン Free fermions under adaptive quantum dynamics ( http://arxiv.org/abs/2306.16595v1 ) ライセンス: Link先を確認	Vikram Ravindranath, Zhi-Cheng Yang and Xiao Chen	(参考訳) ユニタリゲートと射影計測からなる適応量子力学の下で自由フェルミオン系と補正ユニタリ演算について検討した。さらに、各サイトに対して古典的なフラグを導入し、ユニタリゲートが適用可能か否かを判断するアクティブまたは非アクティブな状態を可能にする。この力学において、個々の量子軌道は、連続的監視下で以前に研究された自由フェルミオンのモデルと同様に、臨界値から限界値までのエンタングルメント遷移を示す。さらに, 正則ユニタリ演算は, 電荷密度-波動秩序を特徴とする状態に制御できることがわかった。その結果、量子軌道と量子チャネルの両方のレベルで観察できる追加の位相遷移が起こる。我々は、絡み合い遷移とステアリング遷移が根本的に異なることを確証する。後者の遷移は、固有のフェルミオンパリティと古典的なラベリングの間の相互作用から生じるパリティ保存(PC)普遍性クラスに属する。我々は,フリーフェルミオン系の効率的な数値シミュレーションにより,エンタングルメントとステアリング遷移の双方を実証し,後者のPC普遍性クラスを確認する。 We study free fermion systems under adaptive quantum dynamics consisting of unitary gates and projective measurements followed by corrective unitary operations. We further introduce a classical flag for each site, allowing for an active or inactive status which determines whether or not the unitary gates are allowed to apply. In this dynamics, the individual quantum trajectories exhibit a measurement-induced entanglement transition from critical to area-law scaling above a critical measurement rate, similar to previously studied models of free fermions under continuous monitoring. Furthermore, we find that the corrective unitary operations can steer the system into a state characterized by charge-density-wave order. Consequently, an additional phase transition occurs, which can be observed at both the level of the quantum trajectory and the quantum channel. We establish that the entanglement transition and the steering transition are fundamentally distinct. The latter transition belongs to the parity-conserving (PC) universality class, arising from the interplay between the inherent fermionic parity and classical labelling. We demonstrate both the entanglement and the steering transitions via efficient numerical simulations of free fermion systems, which confirm the PC universality class of the latter.	翻訳日:2023-06-30 15:28:52 公開日:2023-06-28
# 時間不変性と線形性を利用した部分的に観測された動的時系列の開発予測 Forecasting of the development of a partially-observed dynamical time series with the aid of time-invariance and linearity ( http://arxiv.org/abs/2306.16593v1 ) ライセンス: Link先を確認	Akifumi Okuno, Yuya Morishita, Yoh-ichi Mototake	(参考訳) 力学系は進化関数を用いて開発された動的時系列と呼ばれる依存多変量列を生成する。現在の時刻における動的時系列の変数は通常、前の時刻における変数全体に依存するため、既存の研究では進化関数を推定することによって将来の時刻における変数を予測する。しかし、動的時系列のいくつかの変数は、いくつかの実用的な状況では欠落している。本研究では,スラック時系列(ARS)モデルを用いた自己回帰モデルを提案する。 ARSモデルは、力学系の時間不変性と線形性の助けを借りて、進化関数と基礎となる不足変数をスラック時系列として同時推定する。本研究では,提案モデルの有効性を実証的に示す。 A dynamical system produces a dependent multivariate sequence called dynamical time series, developed with an evolution function. As variables in the dynamical time series at the current time-point usually depend on the whole variables in the previous time-point, existing studies forecast the variables at the future time-point by estimating the evolution function. However, some variables in the dynamical time-series are missing in some practical situations. In this study, we propose an autoregressive with slack time series (ARS) model. ARS model involves the simultaneous estimation of the evolution function and the underlying missing variables as a slack time series, with the aid of the time-invariance and linearity of the dynamical system. This study empirically demonstrates the effectiveness of the proposed ARS model.	翻訳日:2023-06-30 15:28:33 公開日:2023-06-28
# SeMLaPS: 潜時事前ネットワークと準平面分割を用いたリアルタイム意味マッピング SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation ( http://arxiv.org/abs/2306.16585v1 ) ライセンス: Link先を確認	Jingwen Wang, Juan Tarrio, Lourdes Agapito, Pablo F. Alcantarilla, Alexander Vakhitov	(参考訳) リアルタイムセマンティクスの可用性はSLAMシステムの中核的な幾何学的機能を大幅に改善し、多数のロボットおよびAR/VRアプリケーションを可能にする。本稿では,2次元ニューラルネットワークとSLAMシステムに基づく3次元ネットワークを組み合わせたRGB-Dシーケンスからのリアルタイムセマンティックマッピング手法を提案する。新しいフレームをセグメント化する際、差別化可能なレンダリングに基づいて、以前のフレームから潜在機能を再投影する。以前のフレームから現在のフレームで再プロジェクションされた特徴マップを再利用することで、イメージを独立して処理するベースラインに比べて、画像セグメンテーションの品質が大幅に向上する。 3次元マップ処理では,曲面正規度に依存して,同じ意味クラスに属する可能性のある3次元マップ要素をグループ化する幾何学的準平面オーバーセグメンテーション法を提案する。また,軽量なセマンティックマップ処理のためのニューラルネットワーク設計について述べる。本システムは,2d-3dネットワークベースのシステムにおいて最先端のセマンティックマッピング品質を実現し,リアルタイム作業中に3つの実屋内データセット上での3次元畳み込みネットワークの性能に適合する。さらに,3d cnnと比較してセンサ間一般化能力が向上し,異なる深度センサを用いたトレーニングや推論が可能となった。コードとデータはプロジェクトページで公開される。 http://jingwenwang95.github.io/SeMLaPS The availability of real-time semantics greatly improves the core geometric functionality of SLAM systems, enabling numerous robotic and AR/VR applications. We present a new methodology for real-time semantic mapping from RGB-D sequences that combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping. When segmenting a new frame we perform latent feature re-projection from previous frames based on differentiable rendering. Fusing re-projected feature maps from previous frames with current-frame features greatly improves image segmentation quality, compared to a baseline that processes images independently. For 3D map processing, we propose a novel geometric quasi-planar over-segmentation method that groups 3D map elements likely to belong to the same semantic classes, relying on surface normals. We also describe a novel neural network design for lightweight semantic map post-processing. Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems and matches the performance of 3D convolutional networks on three real indoor datasets, while working in real-time. Moreover, it shows better cross-sensor generalization abilities compared to 3D CNNs, enabling training and inference with different depth sensors. Code and data will be released on project page: http://jingwenwang95.github.io/SeMLaPS	翻訳日:2023-06-30 15:28:21 公開日:2023-06-28
# 画像分類における深部ニューラルネットワークのロバスト性は有益か? Does Saliency-Based Training bring Robustness for Deep Neural Networks in Image Classification? ( http://arxiv.org/abs/2306.16581v1 ) ライセンス: Link先を確認	Ali Karkehabadi	(参考訳) ディープニューラルネットワークは、複雑なパターンを理解し、意思決定する強力なツールである。しかし、そのブラックボックスの性質は内部の動作を完全に理解することを妨げている。オンライン・サリエンシーガイドによるトレーニング手法では、モデルのアウトプットの顕著な特徴を強調してこの問題を緩和しようとするが、視覚的に説明可能な特徴がモデルの頑健さと敵対的な例と一致するかどうかはあいまいである。本稿では,サリエンシートレーニングモデルの脆弱性を逆例法に適用して検討する。モデルはオンラインのsaliency-guided trainingメソッドを使用してトレーニングされ、敵の例の一般的なアルゴリズムに対して評価される。我々はロバスト性を定量化し、モデルの出力によく説明されている可視化にもかかわらず、サルエントモデルが敵の事例攻撃に対する低いパフォーマンスに苦しむと結論づける。 Deep Neural Networks are powerful tools to understand complex patterns and making decisions. However, their black-box nature impedes a complete understanding of their inner workings. While online saliency-guided training methods try to highlight the prominent features in the model's output to alleviate this problem, it is still ambiguous if the visually explainable features align with robustness of the model against adversarial examples. In this paper, we investigate the saliency trained model's vulnerability to adversarial examples methods. Models are trained using an online saliency-guided training method and evaluated against popular algorithms of adversarial examples. We quantify the robustness and conclude that despite the well-explained visualizations in the model's output, the salient models suffer from the lower performance against adversarial examples attacks.	翻訳日:2023-06-30 15:27:59 公開日:2023-06-28
# 温度状態生成のための量子イマジナリー時間伝搬アルゴリズム Quantum Imaginary Time Propagation algorithm for preparing thermal states ( http://arxiv.org/abs/2306.16580v1 ) ライセンス: Link先を確認	Francesco Turro	(参考訳) 有限温度での計算は、核物理学から凝縮物質まで、様々な科学分野において基本的なものである。想像時間における進化は、量子系の熱状態を作成するための顕著な古典的技法である。本稿では, 量子時間伝搬法に基づく熱状態の生成を, 量子時間演算子の非ユニタリティ特性を克服するために, アンシラ量子ビットを用いた希釈演算子を用いて提案する。提案手法は、一般的なハミルトニアンに対する一般的な量子プロセッサ上の正しい熱密度行列を得ることができる最初の方法である。 2つの中性子系と3つの中性子系の実際の量子ハードウェア計算熱特性の信頼性を証明する。 Calculations at finite temperatures are fundamental in different scientific fields, from nuclear physics to condensed matter. Evolution in imaginary time is a prominent classical technique for preparing thermal states of quantum systems. We propose a new quantum algorithm that prepares thermal states based on the quantum imaginary time propagation method, using a diluted operator with ancilla qubits to overcome the non-unitarity nature of the imaginary time operator. The presented method is the first that allows us to obtain the correct thermal density matrix on a general quantum processor for a generic Hamiltonian. We prove its reliability in the actual quantum hardware computing thermal properties for two and three neutron systems.	翻訳日:2023-06-30 15:27:44 公開日:2023-06-28
# 未知・ランダム・リワードを持つ腕に異種資源を割り当てる Allocating Divisible Resources on Arms with Unknown and Random Rewards ( http://arxiv.org/abs/2306.16578v1 ) ライセンス: Link先を確認	Ningyuan Chen, Wenhao Li	(参考訳) 我々は,各期間に再生可能かつ分別可能な資源の1つの単位を,複数のアームで割り当てる意思決定者を考える。アームには未知およびランダムな報酬があり、その手段は割り当てられたリソースに比例し、その分散は割り当てられたリソースのオーダー$b$に比例する。特に、ある期間に意思決定者がリソース$a_i$をarm$i$に割り当てると、報酬$y_i$は$y_i(a_i)=a_i \mu_i+a_i^b \xi_{i}$となる。 b$ が 0 から 1 まで変化すると、フレームワークは標準の確率的多腕バンディットとオンライン学習を完全なフィードバックでスムーズに橋渡しする。最適なギャップ依存とギャップ非依存の残差境界を$b\in [0,1]$で設計し,$b=1/2$で相転移を示す。理論的な結果は、重みが分数であり、濾過に適応し、単調なサブガウス確率変数の線形結合を境界とする、新しい濃度不等式にかかっている。 We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is$Y_i(A_i)=A_i \mu_i+A_i^b \xi_{i}$, where $\mu_i$ is the unknown mean and the noise $\xi_{i}$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.	翻訳日:2023-06-30 15:27:33 公開日:2023-06-28
# 石油・ガス供給チェーンにおけるブロックチェーン: ユーザセキュリティとプライバシの観点からの文献レビュー Blockchain in Oil and Gas Supply Chain: A Literature Review from User Security and Privacy Perspective ( http://arxiv.org/abs/2306.16576v1 ) ライセンス: Link先を確認	Urvashi Kishnani, Srinidhi Madabhushi and Sanchari Das	(参考訳) ブロックチェーンの影響は金融を超えて広がり、不動産、石油、ガス、教育など様々な分野に影響を与えた。この広範なリーチは、デジタルトランザクションとサプライチェーンを確実に管理できるブロックチェーンの本質的な能力に起因する。石油とガス部門では、ブロックチェーンとサプライチェーンの管理とデータ処理の合併が注目に値するトレンドだ。サプライチェーンには、資源の抽出、輸送、取引、流通など、いくつかの事業がある。残念ながら、現在のサプライチェーン構造では、透明性やトレーサビリティ、フレキシブルなトレーサビリティ、セキュアなデータストレージといった重要な機能が欠落しています。それでも、石油・ガス産業におけるブロックチェーンのセキュリティとプライバシの調査は不可欠である。このような精査により、スムーズでセキュアで使用可能なトランザクションの実行が可能になる。本研究は124冊の学術論文をレビューし,21冊の詳細な分析を行った。サプライチェーンフローのさまざまなフェーズ – 上流,中流,下流,データ管理 – との関連性から,記事の分類を行った。サプライチェーンにおける既存のセキュリティとプライバシの空白に対処できるブロックチェーンのポテンシャルにもかかわらず、石油とガスの運用におけるブロックチェーン統合の実践的な実装が大幅に欠如している。この欠如は、従来の方法からブロックチェーン中心のアプローチへの移行に大きく挑戦する。 Blockchain's influence extends beyond finance, impacting diverse sectors such as real estate, oil and gas, and education. This extensive reach stems from blockchain's intrinsic ability to reliably manage digital transactions and supply chains. Within the oil and gas sector, the merger of blockchain with supply chain management and data handling is a notable trend. The supply chain encompasses several operations: extraction, transportation, trading, and distribution of resources. Unfortunately, the current supply chain structure misses critical features such as transparency, traceability, flexible trading, and secure data storage - all of which blockchain can provide. Nevertheless, it is essential to investigate blockchain's security and privacy in the oil and gas industry. Such scrutiny enables the smooth, secure, and usable execution of transactions. For this purpose, we reviewed 124 peer-reviewed academic publications, conducting an in-depth analysis of 21 among them. We classified the articles by their relevance to various phases of the supply chain flow: upstream, midstream, downstream, and data management. Despite blockchain's potential to address existing security and privacy voids in the supply chain, there is a significant lack of practical implementation of blockchain integration in oil and gas operations. This deficiency substantially challenges the transition from conventional methods to a blockchain-centric approach.	翻訳日:2023-06-30 15:27:05 公開日:2023-06-28
# fisher information rateを用いた有限サンプル対称平均推定 Finite-Sample Symmetric Mean Estimation with Fisher Information Rate ( http://arxiv.org/abs/2306.16573v1 ) ライセンス: Link先を確認	Shivam Gupta, Jasper C.H. Lee, Eric Price	(参考訳) 未知の分散の平均$$\sigma^2$ distribution $f$は、分散$\frac{\sigma^2}{n}$とほぼ対応する部分ガウス率を持つ$n$サンプルから推定することができる。 f$ が翻訳まで知られている場合、これは漸近的に $\frac{1}{n\mathcal i}$ に改善され、ここで $\mathcal i$ は分布のフィッシャー情報である。そのような改善は、一般の未知の$f$では不可能であるが、[Stone, 1975] は、この漸近収束$\textit{is}$が、その平均について$f$が$\textit{symmetric}$であれば可能であることを示した。収束に必要な$n$ は、分配 $f$ と失敗確率 $\delta$ に依存する。本稿では、フィッシャー情報の観点から対称平均推定のための有限サンプル保証を与える。すべての$f, n, \delta$ with $n > \log \frac{1}{\delta}$ に対して、分散$\frac{1}{n \mathcal I_r}$ に近い収束を得る。そのような境界は、既知の$f$設定における有限サンプル保証と本質的に一致する。 The mean of an unknown variance-$\sigma^2$ distribution $f$ can be estimated from $n$ samples with variance $\frac{\sigma^2}{n}$ and nearly corresponding subgaussian rate. When $f$ is known up to translation, this can be improved asymptotically to $\frac{1}{n\mathcal I}$, where $\mathcal I$ is the Fisher information of the distribution. Such an improvement is not possible for general unknown $f$, but [Stone, 1975] showed that this asymptotic convergence $\textit{is}$ possible if $f$ is $\textit{symmetric}$ about its mean. Stone's bound is asymptotic, however: the $n$ required for convergence depends in an unspecified way on the distribution $f$ and failure probability $\delta$. In this paper we give finite-sample guarantees for symmetric mean estimation in terms of Fisher information. For every $f, n, \delta$ with $n > \log \frac{1}{\delta}$, we get convergence close to a subgaussian with variance $\frac{1}{n \mathcal I_r}$, where $\mathcal I_r$ is the $r$-$\textit{smoothed}$ Fisher information with smoothing radius $r$ that decays polynomially in $n$. Such a bound essentially matches the finite-sample guarantees in the known-$f$ setting.	翻訳日:2023-06-30 15:26:43 公開日:2023-06-28
# 実時間および虚数時間における量子および古典シミュレーションのための複合qdrift-product公式 Composite QDrift-Product Formulas for Quantum and Classical Simulations in Real and Imaginary Time ( http://arxiv.org/abs/2306.16572v1 ) ライセンス: Link先を確認	Matthew Pocrnic, Matthew Hagan, Juan Carrasquilla, Dvira Segal, Nathan Wiebe	(参考訳) 最近の研究は、与えられたシミュレーション問題に対してハミルトン$H$をサブセットの$A$と$B$に分割し、$H=A+B$をトロッタースズキチャネルでシミュレートし、QDriftアルゴリズムを介して$B$項をランダムにサンプリングする合成チャネルを実装するのが有利であることを示している。ここでは、このアプローチが虚数時間で成り立つことを示し、量子モンテカルロ計算の古典的アルゴリズム候補となる。虚数時間QDriftと複合チャネルの両方において、Schatten-$1 \to 1$ normを上界する。もう一つの最近の結果は、有限格子上で定義される系に対する幾何学的局所的相互作用を含むハミルトンのシミュレーションが、リーブ・ロビンソンの議論を用いて格子の部分集合上で支持される項のみを含む部分集合に$h$を分解することで改善できることを示した。ここでは,この結果と複合的手法を併用した量子アルゴリズムを ``local composite channel' に提供し,ダイヤモンド距離を上界に設定する。 e^{-ih_j t}$ と $e^{-h_j \beta}$ の形のゲート数を計算してアルゴリズムコストの正確な数値シミュレーションを行い、一定の誤差許容値 $\epsilon$ を満たす。我々は、様々な興味深いハミルトニアンに対して定数因子の利点を示し、その最大値は、ジェリウムのシミュレーションで起こる約20ドルの速度アップである。 Recent work has shown that it can be advantageous to implement a composite channel that partitions the Hamiltonian $H$ for a given simulation problem into subsets $A$ and $B$ such that $H=A+B$, where the terms in $A$ are simulated with a Trotter-Suzuki channel and the $B$ terms are randomly sampled via the QDrift algorithm. Here we show that this approach holds in imaginary time, making it a candidate classical algorithm for quantum Monte-Carlo calculations. We upper-bound the induced Schatten-$1 \to 1$ norm on both imaginary-time QDrift and Composite channels. Another recent result demonstrated that simulations of Hamiltonians containing geometrically-local interactions for systems defined on finite lattices can be improved by decomposing $H$ into subsets that contain only terms supported on that subset of the lattice using a Lieb-Robinson argument. Here, we provide a quantum algorithm by unifying this result with the composite approach into ``local composite channels" and we upper bound the diamond distance. We provide exact numerical simulations of algorithmic cost by counting the number of gates of the form $e^{-iH_j t}$ and $e^{-H_j \beta}$ to meet a certain error tolerance $\epsilon$. We show constant factor advantages for a variety of interesting Hamiltonians, the maximum of which is a $\approx 20$ fold speedup that occurs for a simulation of Jellium.	翻訳日:2023-06-30 15:26:01 公開日:2023-06-28
# 終端イベントの存在下でのリカレントイベントの予測数に対する因果推論 Causal inference for the expected number of recurrent events in the presence of a terminal event ( http://arxiv.org/abs/2306.16571v1 ) ライセンス: Link先を確認	Benjamin R. Baer, Robert L. Strawderman, Ashkan Ertefaie	(参考訳) 終端イベントの存在下での繰り返し事象の因果推論と効率的な推定について検討した。我々は,予測回数の繰り返しイベントと,ランドマーク時間列に沿って評価された障害生存関数の両方からなるベクトルとして推定値を定義する。ランダムに粗大化下で機能する観測データとして、右検閲と因果選択の存在下での推定を同定し、非パラメトリック効率境界を導出し、境界を達成し、迷惑パラメータの非パラメトリック推定を可能にするマルチプライロバスト推定器を提案する。全体として、失敗、検閲、観察されたデータの基本的な確率分布について絶対連続性の仮定は行われない。さらに、粗い分布が分かっていれば影響関数のクラスを導出し、そのクラスに属することができるのかをレビューする。その過程で,因果寿命分析文献における興味深い不一致を浮き彫りにする。 We study causal inference and efficient estimation for the expected number of recurrent events in the presence of a terminal event. We define our estimand as the vector comprising both the expected number of recurrent events and the failure survival function evaluated along a sequence of landmark times. We identify the estimand in the presence of right-censoring and causal selection as an observed data functional under coarsening at random, derive the nonparametric efficiency bound, and propose a multiply-robust estimator that achieves the bound and permits nonparametric estimation of nuisance parameters. Throughout, no absolute continuity assumption is made on the underlying probability distributions of failure, censoring, or the observed data. Additionally, we derive the class of influence functions when the coarsening distribution is known and review how published estimators may belong to the class. Along the way, we highlight some interesting inconsistencies in the causal lifetime analysis literature.	翻訳日:2023-06-30 15:24:58 公開日:2023-06-28
# cpu上のトランスフォーマー言語モデルのための効率的なスパース推論ソフトウェアアクセラレータ An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs ( http://arxiv.org/abs/2306.16601v1 ) ライセンス: Link先を確認	Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, Yi Ding, Yu Luo, Hanwen Chang, Qun Gao, Ziheng Wang, Guy Boudoukh, and Moshe Wasserblat	(参考訳) 近年,トランスフォーマーに基づく言語モデルが自然言語処理タスクの標準的アプローチとなっている。しかし、産業アプリケーションにおける厳格なスループットとレイテンシ要件は採用を制限している。このギャップを軽減するために、構造化プルーニングのようなモデル圧縮技術が推論効率を改善するために使用されている。しかし、既存のほとんどのニューラルネットワーク推論ランタイムは、構造化されたスパーシリティを適切にサポートしていない。本稿では,トランスフォーマーに基づく言語モデルに対して,重みを一定のブロックサイズで刈り取る,効率的なスパース深層学習ソフトウェアスタックを提案する。我々のスパースソフトウェアアクセラレータは、Intel Deep Learning Boostを活用してスパースマトリックス(一般にSpMMと略される)の性能を最大化する。我々のSpMMカーネルは,既存のスパースライブラリ (oneMKL, TVM, LIBXSMM) を5つの代表空間比 (70%, 75%, 80%, 85%, 90%) 以下のGEMM形状で桁違いに処理する。さらに、当社のSpMMカーネルは、業界で広く使われている高度ライブラリであるOneDNNの高密度GEMMカーネルよりも最大5倍高速化されている。スパースアクセラレータを,Bert-Mini, DistilBERT, Bert-Base, BERT-Largeなど,広く使われているTransformerベースの言語モデルに適用する。当社のスパース推論ソフトウェアは,Amazon Web Services上のXeonと同じ構成で,Neural MagicのDeepsparseよりも1.5倍のスピードアップを実現しています。我々はまた、私たちのソリューションを、ONNX RuntimeとPyTorchという2つのフレームワークベースの推論ソリューションと比較し、レイテンシ制約の下で、ONNX Runtimeの最大37倍のスピードアップとXeonのPyTorchの最大345倍のスピードアップを示します。ソースコードはすべてgithubで公開されている。 https://github.com/intel/intel-extension-for-transformers。 In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To mitigate the gap, model compression techniques such as structured pruning are being used to improve inference efficiency. However, most existing neural network inference runtimes lack adequate support for structured sparsity. In this paper, we propose an efficient sparse deep learning inference software stack for Transformer-based language models where the weights are pruned with constant block size. Our sparse software accelerator leverages Intel Deep Learning Boost to maximize the performance of sparse matrix - dense matrix multiplication (commonly abbreviated as SpMM) on CPUs. Our SpMM kernel outperforms the existing sparse libraries (oneMKL, TVM, and LIBXSMM) by an order of magnitude on a wide range of GEMM shapes under 5 representative sparsity ratios (70%, 75%, 80%, 85%, 90%). Moreover, our SpMM kernel shows up to 5x speedup over dense GEMM kernel of oneDNN, a well-optimized dense library widely used in industry. We apply our sparse accelerator on widely-used Transformer-based language models including Bert-Mini, DistilBERT, Bert-Base, and BERT-Large. Our sparse inference software shows up to 1.5x speedup over Neural Magic's Deepsparse under same configurations on Xeon on Amazon Web Services under proxy production latency constraints. We also compare our solution with two framework-based inference solutions, ONNX Runtime and PyTorch, and demonstrate up to 37x speedup over ONNX Runtime and 345x over PyTorch on Xeon under the latency constraints. All the source code is publicly available on Github: https://github.com/intel/intel-extension-for-transformers.	翻訳日:2023-06-30 15:16:11 公開日:2023-06-28
# 浮遊粒子の近量子制限速度分布の観察 Observation of near-quantum-limited velocity distributions of a levitated particle ( http://arxiv.org/abs/2306.16598v1 ) ライセンス: Link先を確認	M. Kamba and K. Aikawa	(参考訳) 超低温浮遊ナノ粒子の飛行時間測定を実証し、量子状態における翻訳速度を明らかにする。繰り返し測定により得られた速度分布は,ナノ粒子の液状化運動により著しく拡大することがわかった。すべてのリリレー運動に対するフィードバック冷却の下で、占有数からの期待値と合理的に一致する速度分布を量子限界の約2倍の幅で回復する。振動中心と質量中心との偏差はナノ粒子の非対称性によって引き起こされるため, 振動運動の翻訳運動に対する強い影響は理解されている。その結果、振動運動の制御の重要性が解明され、浮遊ナノ粒子の速度の観点から量子力学的性質を探求する基礎が確立された。 We demonstrate time-of-flight measurements for an ultracold levitated nanoparticle and reveal its translational velocity in the quantum regime. We discover that the velocity distributions obtained with repeated measurements are significantly broadened via librational motions of the nanoparticle. Under feedback cooling on all the librational motions, we recover the velocity distributions in reasonable agreement with an expectation from the occupation number, with approximately twice the width of the quantum limit. The strong impact of librational motions on the translational motions is understood as a result of the deviation between the libration center and the center of mass, induced by the asymmetry of the nanoparticle. Our results elucidate the importance of the control over librational motions and establish the basis for exploring quantum mechanical properties of levitated nanoparticles in terms of their velocity.	翻訳日:2023-06-30 15:15:39 公開日:2023-06-28
# PFB-Diff:テキスト駆動画像編集のための進行的特徴ブレンディング拡散 PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing ( http://arxiv.org/abs/2306.16894v1 ) ライセンス: Link先を確認	Wenjing Huang, Shikui Tu, Lei Xu	(参考訳) 拡散モデルは、多彩で高品質な画像を合成する優れた能力を示し、実際の画像編集への応用への関心を喚起している。しかしながら、局所的な画像編集のための既存の拡散ベースのアプローチは、ノイズの多い対象画像と拡散潜性変数のピクセルレベルでのブレンドによって、望ましくないアーティファクトに苦しむことが多い。そこで本研究では拡散型画像編集のためのプログレッシブ機能ブレンド手法であるpfb-diffを提案する。従来の方法とは異なり、PFB-Diffはマルチレベルの特徴ブレンディングを通じてテキスト誘導された生成コンテンツをターゲット画像にシームレスに統合する。深い特徴を符号化したリッチなセマンティックスと、高度から低レベルのプログレッシブブレンディングスキームは、編集画像のセマンティックコヒーレンスと高品質を保証します。また,クロスアテンション層に注意マスキング機構を導入し,特定の単語が所望の領域に与える影響を限定し,背景編集の性能をさらに向上させる。 PFB-Diffは、オブジェクト/バックグラウンド置換やオブジェクト属性編集など、様々な編集タスクに効果的に対処できる。本手法は,画像の忠実性,編集精度,効率性,およびオリジナル画像に対する忠実性において,微調整やトレーニングを必要とせずに優れた性能を示す。 Diffusion models have showcased their remarkable capability to synthesize diverse and high-quality images, sparking interest in their application for real image editing. However, existing diffusion-based approaches for local image editing often suffer from undesired artifacts due to the pixel-level blending of the noised target images and diffusion latent variables, which lack the necessary semantics for maintaining image consistency. To address these issues, we propose PFB-Diff, a Progressive Feature Blending method for Diffusion-based image editing. Unlike previous methods, PFB-Diff seamlessly integrates text-guided generated content into the target image through multi-level feature blending. The rich semantics encoded in deep features and the progressive blending scheme from high to low levels ensure semantic coherence and high quality in edited images. Additionally, we introduce an attention masking mechanism in the cross-attention layers to confine the impact of specific words to desired regions, further improving the performance of background editing. PFB-Diff can effectively address various editing tasks, including object/background replacement and object attribute editing. Our method demonstrates its superior performance in terms of image fidelity, editing accuracy, efficiency, and faithfulness to the original image, without the need for fine-tuning or training.	翻訳日:2023-06-30 13:39:22 公開日:2023-06-28
# 重み付け最適化軌道による対人訓練の強化 Enhancing Adversarial Training via Reweighting Optimization Trajectory ( http://arxiv.org/abs/2306.14275v2 ) ライセンス: Link先を確認	Tianjin Huang, Shiwei Liu, Tianlong Chen, Meng Fang, Li Shen, Vlaod Menkovski, Lu Yin, Yulong Pei and Mykola Pechenizkiy	(参考訳) 敵対的トレーニングがディープニューラルネットワークの堅牢性向上のデファクト手法になっているにもかかわらず、バニラ対人トレーニングが頑強なオーバーフィッティングに悩まされ、満足のいく堅牢な一般化をもたらすことはよく知られている。これらの欠点に対処するいくつかのアプローチが提案されている。例えば、余分な正規化、敵の重みの摂動、そして過去数年間のさらなるデータによるトレーニングなどである。しかし、強固な一般化改善はまだ十分ではない。本稿では,この課題に新たな視点でアプローチし,歴史的最適化の軌跡を整理する。本稿では, 時間内学習の最適化トラジェクトリを利用する「textbf{Weighted Optimization Trajectories (WOT)」という新しい手法を提案する。我々は,様々な対人攻撃におけるWOTの有効性を実証するための広範囲な実験を行った。以上の結果から,wotは既存の対向訓練手法とシームレスに統合され,強固なオーバーフィッティング問題を一貫して克服し,対向ロバスト性が向上した。例えば、WOTはAA-$L_{\infty}$アタックのAT-PGDのロバスト精度を1.53\%$\sim$6.11\%向上させ、一方SVHN、CIFAR-10、CIFAR-100、Tiny-ImageNetデータセットのクリーン精度を0.55\%$\sim$5.47\%向上させる。 Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. A number of approaches have been proposed to address these drawbacks such as extra regularization, adversarial weights perturbation, and training with more data over the last few years. However, the robust generalization improvement is yet far from satisfactory. In this paper, we approach this challenge with a brand new perspective -- refining historical optimization trajectories. We propose a new method named \textbf{Weighted Optimization Trajectories (WOT)} that leverages the optimization trajectories of adversarial training in time. We have conducted extensive experiments to demonstrate the effectiveness of WOT under various state-of-the-art adversarial attacks. Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue, resulting in better adversarial robustness. For example, WOT boosts the robust accuracy of AT-PGD under AA-$L_{\infty}$ attack by 1.53\% $\sim$ 6.11\% and meanwhile increases the clean accuracy by 0.55\%$\sim$5.47\% across SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.	翻訳日:2023-06-30 10:21:18 公開日:2023-06-28
# ShuttleSet22: ストロークレベルバドミントンデータセットによるストローク予測のベンチマーク ShuttleSet22: Benchmarking Stroke Forecasting with Stroke-Level Badminton Dataset ( http://arxiv.org/abs/2306.15664v2 ) ライセンス: Link先を確認	Wei-Yao Wang, Wei-Wei Du, Wen-Chih Peng	(参考訳) 近年、人工知能の進歩とデータ収集の効率化により、バドミントン分析が注目を集めている。プレイヤーのパフォーマンスを改善し、調査するための効果的なアプリケーションはいくつかあるが、バドミントン領域以外の研究者に使用できる公開バドミントンデータセットはわずかである。既存のバドミントンシングルスデータセットは特定のマッチアップに焦点を当てているが、異なるプレイヤーや様々なマッチアップに関する包括的な研究は提供できない。本稿では,バドミントン・シングルス・データセットであるshuttleset22を2022年に高位の試合から収集した。 shuttleset22はトレーニングセット2,888回の30,172ストローク、バリデーションセット450回の1,400ストローク、ラリー内の詳細なストロークレベルメタデータを備えたテストセット654の2,040ストロークで構成される。 shuttleset22で既存の作業をベンチマークするために、shuttlenetという最先端のストローク予測手法をテストし、対応するストローク予測タスク、すなわち各ラリーの所定のストロークに基づいて将来のストロークを予測する。また、coachai badminton challenge 2023で、バドミントン集会における今後のターンベースのストロークを予測することで、この問題に取り組む研究者を増やそうとしています。ベースラインコードとデータセットはhttps://github.com/wywyWang/CoachAI-Projects/tree/main/CoachAI-Challenge-IJCAI2023で公開される。 In recent years, badminton analytics has drawn attention due to the advancement of artificial intelligence and the efficiency of data collection. While there is a line of effective applications to improve and investigate player performance, there are only a few public badminton datasets that can be used for researchers outside the badminton domain. Existing badminton singles datasets focus on specific matchups; however, they cannot provide comprehensive studies on different players and various matchups. In this paper, we provide a badminton singles dataset, ShuttleSet22, which is collected from high-ranking matches in 2022. ShuttleSet22 consists of 30,172 strokes in 2,888 rallies in the training set, 1,400 strokes in 450 rallies in the validation set, and 2,040 strokes in 654 rallies in the testing set with detailed stroke-level metadata within a rally. To benchmark existing work with ShuttleSet22, we test the state-of-the-art stroke forecasting approach, ShuttleNet, with the corresponding stroke forecasting task, i.e., predict the future strokes based on the given strokes of each rally. We also hold a challenge, Track 2: Forecasting Future Turn-Based Strokes in Badminton Rallies, at CoachAI Badminton Challenge 2023 to boost researchers to tackle this problem. The baseline codes and the dataset will be made available on https://github.com/wywyWang/CoachAI-Projects/tree/main/CoachAI-Challenge-IJCAI2023.	翻訳日:2023-06-30 10:11:05 公開日:2023-06-28
# 非対称幾何散乱変換によるグラフニューラルネットワークの理解 Understanding Graph Neural Networks with Asymmetric Geometric Scattering Transforms ( http://arxiv.org/abs/1911.06253v4 ) ライセンス: Link先を確認	Michael Perlmutter and Alexander Tong and Feng Gao and Guy Wolf and Matthew Hirn	(参考訳) 散乱変換は、畳み込みニューラルネットワークのモデルとして機能する多層ウェーブレットベースのディープラーニングアーキテクチャである。近年、グラフのような非ユークリッド的な設定に対する散乱変換の一般化がいくつか提案されている。我々の研究は、非対称ウェーブレットの非常に一般的なクラスに基づくグラフに対して、窓付きおよび非窓型の幾何学的散乱変換を導入することで、これらの構成に基づいている。これらの非対称グラフ散乱変換は、対称グラフ散乱変換と多くの理論的保証を持つことを示す。その結果、提案手法は既存のグラフ散乱アーキテクチャの多くに対する既知の理論結果を統一し、拡張する。この研究は、幾何学的散乱と他のグラフニューラルネットワークとのギャップを埋めるのに役立ち、証明可能な安定性と不変性を保証する大きなネットワーク群を導入する。これらの結果は、フィルタを学習したグラフ構造化データのための将来のディープラーニングアーキテクチャの基礎となり、確実に望ましい理論的特性を持つ。 The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.	翻訳日:2023-06-29 19:03:02 公開日:2023-06-28
# 時間的および意思決定タスク用に訓練された繰り返しニューラルネットワークにおける重み初期化、解の多様性、劣化の探求 Exploring weight initialization, diversity of solutions, and degradation in recurrent neural networks trained for temporal and decision-making tasks ( http://arxiv.org/abs/1906.01094v6 ) ライセンス: Link先を確認	Cecilia Jarne and Rodrigo Laje	(参考訳) リカレントニューラルネットワーク(Recurrent Neural Networks, RNN)は、脳機能と構造をモデル化するために頻繁に使用される。本研究では,時間変化刺激による時間・流れ制御タスクを行うために,小型完全接続型RNNを訓練した。また,ネットワークサイズが小さくなったり,間隔が長くなったり,接続障害が大きくなったりすることで,異なるRNNが同じ課題を解くことができることを示す。検討した課題に対して,タスクパラメータ化により学習後に得られるネットワークがいかに堅牢かを検討した。その過程で,計算神経科学における他の関心課題をパラメータ化するためのフレームワークを開発した。この結果は、通常ブラックボックスとして用いられ、大脳皮質領域の生物学的応答をモデル化するために理解する必要があるモデルの異なる側面を定量化するのに有用である。 Recurrent Neural Networks (RNNs) are frequently used to model aspects of brain function and structure. In this work, we trained small fully-connected RNNs to perform temporal and flow control tasks with time-varying stimuli. Our results show that different RNNs can solve the same task by converging to different underlying dynamics and also how the performance gracefully degrades as either network size is decreased, interval duration is increased, or connectivity damage is increased. For the considered tasks, we explored how robust the network obtained after training can be according to task parameterization. In the process, we developed a framework that can be useful to parameterize other tasks of interest in computational neuroscience. Our results are useful to quantify different aspects of the models, which are normally used as black boxes and need to be understood in order to model the biological response of cerebral cortex areas.	翻訳日:2023-06-29 19:02:50 公開日:2023-06-28
# 多クラス分類のためのラベル分布ロバスト損失:一貫性,ロバスト性,適応性 Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity ( http://arxiv.org/abs/2112.14869v4 ) ライセンス: Link先を確認	Dixian Zhu, Yiming Ying and Tianbao Yang	(参考訳) 本研究では,分布的ロバスト最適化(dro)の観点から定式化した多クラス分類のためのラベル分散ロバスト(ldr)損失と呼ばれる損失関数の族について検討する。この観点の利点はいくつかあります。 i) 古典的クロスエントロピー(CE)損失とSVM損失とその変種を説明する統一的なフレームワークを提供する。 (ii)広く採用されているが、よく理解されていない温度スケールのce損失に対応する特殊ファミリーを含む。 (iii)インスタンスレベルでラベル情報の不確実性度に適応することができる。 Our contributions include: (1) we study both consistency and robustness by establishing top-$k$ ($\forall k\geq 1$) consistency of LDR losses for multi-class classification, and a negative result that a top-$1$ consistent and symmetric robust loss cannot achieve top-$k$ consistency simultaneously for all $k\geq 2$; (2) we propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance; (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy label and 1 clean settings against 13 loss functions, and on one real-world noisy dataset. コードは \url{https://github.com/Optimization-AI/ICML2023_LDR} でオープンソース化されている。 We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification that are formulated from distributionally robust optimization (DRO) perspective, where the uncertainty in the given label information are modeled and captured by taking the worse case of distributional weights. The benefits of this perspective are several fold: (i) it provides a unified framework to explain the classical cross-entropy (CE) loss and SVM loss and their variants, (ii) it includes a special family corresponding to the temperature-scaled CE loss, which is widely adopted but poorly understood; (iii) it allows us to achieve adaptivity to the uncertainty degree of label information at an instance level. Our contributions include: (1) we study both consistency and robustness by establishing top-$k$ ($\forall k\geq 1$) consistency of LDR losses for multi-class classification, and a negative result that a top-$1$ consistent and symmetric robust loss cannot achieve top-$k$ consistency simultaneously for all $k\geq 2$; (2) we propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance; (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy label and 1 clean settings against 13 loss functions, and on one real-world noisy dataset. The code is open-sourced at \url{https://github.com/Optimization-AI/ICML2023_LDR}.	翻訳日:2023-06-29 19:02:02 公開日:2023-06-28
# 連立多目的・多目的最適化のためのレバレッジ信頼 Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization ( http://arxiv.org/abs/2112.13901v3 ) ライセンス: Link先を確認	Faran Irshad, Stefan Karsch and Andreas D\"opp	(参考訳) 本稿では,費用対評価システムの効率的な最適化を追求するために,ベイジアン多目的多忠実度最適化(MOMF)の新たなアプローチを提案する。従来の最適化手法は効果的であるが、しばしば1つ以上の目的の多次元最適化において非常に高いコストに直面する。マルチフィデリティアプローチは、低解像度シミュレーションなどの低コストな複数の情報ソースを活用することで、潜在的な改善を提供する。しかし、これら2つの戦略を統合することは大きな課題である。複数目的とデータソースの同時最適化を支援するため,信頼度基準の革新的利用を提案する。提案手法は,パレート最適化問題において,評価コスト当たりの信頼ゲインを1つの目的として組み込むための多目的最適化ポリシーを修正し,低コストで同時MOMFを実現する。本稿では,入力パラメータと信頼パラメータを併用して選択する総合的アプローチと,ベンチマークのための逐次的アプローチの2つのMOMF最適化手法を提案する。合成試験関数のベンチマークにより,本手法は,純多目的最適化と比較して最大1桁の大幅なコスト削減をもたらすことが示された。さらに,信頼ドメインと客観的ドメインの協調最適化は,それらを逐次的に処理する上で優れていた。レーザプラズマ加速シミュレーションの最適化を応用し, 高コストブラックボックス関数のパレート最適化における本手法の可能性を示す。既存のベイズフレームワークでこれらのメソッドを実装するのは簡単で、バッチ最適化に簡単に拡張できる。種々の連続的あるいは離散的忠実度次元を扱う能力により,プラズマ物理学や流体力学などの分野におけるシミュレーション問題に対する幅広い適用性を提供する。 In the pursuit of efficient optimization of expensive-to-evaluate systems, this paper investigates a novel approach to Bayesian multi-objective and multi-fidelity (MOMF) optimization. Traditional optimization methods, while effective, often encounter prohibitively high costs in multi-dimensional optimizations of one or more objectives. Multi-fidelity approaches offer potential remedies by utilizing multiple, less costly information sources, such as low-resolution simulations. However, integrating these two strategies presents a significant challenge. We suggest the innovative use of a trust metric to support simultaneous optimization of multiple objectives and data sources. Our method modifies a multi-objective optimization policy to incorporate the trust gain per evaluation cost as one objective in a Pareto optimization problem, enabling simultaneous MOMF at lower costs. We present and compare two MOMF optimization methods: a holistic approach selecting both the input parameters and the trust parameter jointly, and a sequential approach for benchmarking. Through benchmarks on synthetic test functions, our approach is shown to yield significant cost reductions - up to an order of magnitude compared to pure multi-objective optimization. Furthermore, we find that joint optimization of the trust and objective domains outperforms addressing them in sequential manner. We validate our results using the use case of optimizing laser-plasma acceleration simulations, demonstrating our method's potential in Pareto optimization of high-cost black-box functions. Implementing these methods in existing Bayesian frameworks is simple, and they can be readily extended to batch optimization. With their capability to handle various continuous or discrete fidelity dimensions, our techniques offer broad applicability in solving simulation problems in fields such as plasma physics and fluid dynamics.	翻訳日:2023-06-29 19:01:33 公開日:2023-06-28
# ランダムスパルシファイド勾配による微分プライベートsgdの改善 Improving Differentially Private SGD via Randomly Sparsified Gradients ( http://arxiv.org/abs/2112.00845v3 ) ライセンス: Link先を確認	Junyi Zhu, Matthew B. Blaschko	(参考訳) 個人差分的確率勾配勾配(DP-SGD)は、個々の勾配の最大ノルムと付加等方性ガウス雑音を束縛するために、厳密に定義されたプライバシーを提供するために、ディープラーニングにおいて広く採用されている。非凸状態でのdp-sgdの収束速度の解析により、クリッピングとノイズ化前の勾配をランダムにスパース化することで、収束境界の内部成分間のトレードオフを調整し、ノイズが支配的である場合の上限を小さくする。さらに, 理論的解析および実証評価の結果, トレードオフは自明なものではなく, DP-SGDのユニークな特性である可能性が示唆された。この観測は、DP-SGDが(単純なランダムな)勾配圧縮に固有の空間を持っていることを暗示している。そこで我々は,DP-SGDを強化するためにランダムスペーシフィケーション(RS)を用いた効率的で軽量な拡張手法を提案する。様々なDP-SGDフレームワークを用いた実験では、RSはパフォーマンスを向上させることができる。さらに、生成したRSのスパース勾配は、通信コストの削減と、プライベート機械学習の重要な問題であるリコンストラクション攻撃に対するプライバシー強化の利点を示す。 Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy, which requires gradient clipping to bound the maximum norm of individual gradients and additive isotropic Gaussian noise. With analysis of the convergence rate of DP-SGD in a non-convex setting, we identify that randomly sparsifying gradients before clipping and noisification adjusts a trade-off between internal components of the convergence bound and leads to a smaller upper bound when the noise is dominant. Additionally, our theoretical analysis and empirical evaluations show that the trade-off is not trivial but possibly a unique property of DP-SGD, as either canceling noisification or gradient clipping eliminates the trade-off in the bound. This observation is indicative, as it implies DP-SGD has special inherent room for (even simply random) gradient compression. To verify the observation and utilize it, we propose an efficient and lightweight extension using random sparsification (RS) to strengthen DP-SGD. Experiments with various DP-SGD frameworks show that RS can improve performance. Additionally, the produced sparse gradients of RS exhibit advantages in reducing communication cost and strengthening privacy against reconstruction attacks, which are also key problems in private machine learning.	翻訳日:2023-06-29 19:01:03 公開日:2023-06-28
# 量子コンピューティングは、製造欠陥のある平面量子ビット列上でスケーラブルである Quantum computing is scalable on a planar array of qubits with fabrication defects ( http://arxiv.org/abs/2111.06432v3 ) ライセンス: Link先を確認	Armands Strikis, Simon C. Benjamin and Benjamin J. Brown	(参考訳) 大規模なアルゴリズムをうまく実行するためには、量子コンピュータは基本操作をほぼ完璧に実行する必要がある。これは、全ての物理量子ビットがかなりのノイズに悩まされるため、根本的な問題である。さらに、実数系は有限収率を持つ可能性があり、例えば、複雑な装置の成分の非ゼロの割合は、製造段階で無意味に破られる可能性がある。本稿では,有限生成欠陥密度を持つ2次元ノイズ量子ビット列を用いて,任意に大きな量子計算を失敗の消滅確率で完了できることを示すしきい値定理を提案する。証明を完了するために,不活性量子ビットの広い領域を補償する高重安定化器を測定する頑健なプロトコルを導入する。我々はsurface code architectureを用いて結果を得た。したがって、我々のアプローチは、大規模量子コンピュータを構築するための実験的な取り組みと容易に対応できる。 To successfully execute large-scale algorithms, a quantum computer will need to perform its elementary operations near perfectly. This is a fundamental challenge since all physical qubits suffer a considerable level of noise. Moreover, real systems are likely to have a finite yield, i.e. some non-zero proportion of the components in a complex device may be irredeemably broken at the fabrication stage. We present a threshold theorem showing that an arbitrarily large quantum computation can be completed with a vanishing probability of failure using a two-dimensional array of noisy qubits with a finite density of fabrication defects. To complete our proof we introduce a robust protocol to measure high-weight stabilizers to compensate for large regions of inactive qubits. We obtain our result using a surface code architecture. Our approach is therefore readily compatible with ongoing experimental efforts to build a large-scale quantum computer.	翻訳日:2023-06-29 19:00:38 公開日:2023-06-28
# ブロックランチョスを用いた行列積状態における多重励起の直接解法 Direct solution of multiple excitations in a matrix product state with block Lanczos ( http://arxiv.org/abs/2109.08181v3 ) ライセンス: Link先を確認	Thomas E. Baker, Alexandre Foley, and David S\'en\'echal	(参考訳) 行列積状態法は局所的ガッピングハミルトニアンの基底状態、特に1次元の計算に効率的であることが知られている。我々は,多目的密度行列再正規化群法を導入し,多くの励起を持つ束行列積状態に作用する。ブロックまたはバンド付きlanczosアルゴリズムを使用することで、励起束の同時、変動最適化が可能になる。この手法はハイゼンベルクモデルや他の興味のあるケースで示される。多数の励起は鎖全体で非常に信頼性の高い局所観測可能な小さな結合次元で得ることができる。 Matrix product state methods are known to be efficient for computing ground states of local, gapped Hamiltonians, particularly in one dimension. We introduce the multi-targeted density matrix renormalization group method that acts on a bundled matrix product state, holding many excitations. The use of a block or banded Lanczos algorithm allows for the simultaneous, variational optimization of the bundle of excitations. The method is demonstrated on a Heisenberg model and other cases of interest. A large of number of excitations can be obtained at a small bond dimension with highly reliable local observables throughout the chain.	翻訳日:2023-06-29 19:00:22 公開日:2023-06-28
# DeepSMile: 大腸癌および乳癌におけるH&E全スライディング画像から直接MSIとHRDを分類する対照的な自己監督前訓練効果 DeepSMILE: Contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer ( http://arxiv.org/abs/2107.09405v3 ) ライセンス: Link先を確認	Yoni Schirris, Efstratios Gavves, Iris Nederlof, Hugo Mark Horlings, Jonas Teuwen	(参考訳) 本稿では,Hematoxylin と Eosin (H&E) のスライディング画像全体 (WSI) を解析するための深層学習に基づく弱いラベル学習法を提案する。我々はdeepsmileを相同組換え欠損症(hrd)とマイクロサテライト不安定症(msi)のタスクに適用する。対照的自己教師付き学習を用いて,癌組織の病理組織学タイルの特徴抽出装置を事前学習する。さらに,腫瘍の多様性をモデル化しながら,可変性に着目したディープマルチインスタンス学習を用いてタイル特徴集合関数を学習する。 TCGA-CRC (n=360) の腫瘍診断および色調正常化サブセットにおけるMSI予測は,本提案したDeepSMILE法と同等にタイル管理ベースラインを0.77AUROCから0.87AUROCに改善する。 TCGA-BC (n=1041) では、DeepSMILE は、自己監督またはImageNet事前訓練された特徴抽出器によるタイル管理と比較して、RDD分類性能を 0.77 から 0.81 AUROC に改善した。提案手法は,両データセットのラベル付きデータの40%のみを用いて,ベースライン性能に達する。これらの改善は、組織病理領域における複数のインスタンス学習を組み合わせた標準的な自己教師付き学習技術を用いて、ラベル付きデータの少ないゲノムラベル分類性能を向上させることを示唆する。 We propose a Deep learning-based weak label learning method for analyzing whole slide images (WSIs) of Hematoxylin and Eosin (H&E) stained tumor tissue not requiring pixel-level or tile-level annotations using Self-supervised pre-training and heterogeneity-aware deep Multiple Instance LEarning (DeepSMILE). We apply DeepSMILE to the task of Homologous recombination deficiency (HRD) and microsatellite instability (MSI) prediction. We utilize contrastive self-supervised learning to pre-train a feature extractor on histopathology tiles of cancer tissue. Additionally, we use variability-aware deep multiple instance learning to learn the tile feature aggregation function while modeling tumor heterogeneity. For MSI prediction in a tumor-annotated and color normalized subset of TCGA-CRC (n=360 patients), contrastive self-supervised learning improves the tile supervision baseline from 0.77 to 0.87 AUROC, on par with our proposed DeepSMILE method. On TCGA-BC (n=1041 patients) without any manual annotations, DeepSMILE improves HRD classification performance from 0.77 to 0.81 AUROC compared to tile supervision with either a self-supervised or ImageNet pre-trained feature extractor. Our proposed methods reach the baseline performance using only 40% of the labeled data on both datasets. These improvements suggest we can use standard self-supervised learning techniques combined with multiple instance learning in the histopathology domain to improve genomic label classification performance with fewer labeled data.	翻訳日:2023-06-29 19:00:14 公開日:2023-06-28
# サブ線形深さとエネルギーのしきい値回路の指数下限 Exponential Lower Bounds for Threshold Circuits of Sub-Linear Depth and Energy ( http://arxiv.org/abs/2107.00223v2 ) ライセンス: Link先を確認	Kei Uchizawa and Haruki Abe	(参考訳) 本稿では,しきい値回路の計算能力とニューラルネットワークの他の理論モデルについて,サイズ(ゲート数)、深さ、重み、エネルギーの4つの複雑性尺度を用いて検討する。ここで、回路のエネルギー複雑性は計算の空間性を測定し、全ての入力代入に対して非ゼロの値を出力する最大ゲート数として定義される。主な結果として、任意のしきい値回路$C$ of size $s$, depth $d$, energy $e$ and weight $w$ satisfies $\log (rk(M_C)) \le ed (\log s + \log w + \log n)$, where $rk(M_C)$は通信行列$M_C$の2n$変数ブール関数の階数であることを示す。したがって、そのようなしきい値回路$C$は、通信行列が階数を持つブール関数のみを$s,w$の対数係数の積と$d,e$の線型因子の積で計算することができる。これは、エネルギーと重量が十分に小さい場合、偶数直線深度閾値回路のサイズに対する指数的な下界を意味する。離散レイル回路や離散化sgmoid回路のような他のニューラルネットワークモデルに対しては、同様の不等式が離散化回路に対しても成り立つことが証明される: $rk(m_c) = o(ed(\log s + \log w + \log n)^3)$。 In this paper, we investigate computational power of threshold circuits and other theoretical models of neural networks in terms of the following four complexity measures: size (the number of gates), depth, weight and energy. Here the energy complexity of a circuit measures sparsity of their computation, and is defined as the maximum number of gates outputting non-zero values taken over all the input assignments. As our main result, we prove that any threshold circuit $C$ of size $s$, depth $d$, energy $e$ and weight $w$ satisfies $\log (rk(M_C)) \le ed (\log s + \log w + \log n)$, where $rk(M_C)$ is the rank of the communication matrix $M_C$ of a $2n$-variable Boolean function that $C$ computes. Thus, such a threshold circuit $C$ is able to compute only a Boolean function of which communication matrix has rank bounded by a product of logarithmic factors of $s,w$ and linear factors of $d,e$. This implies an exponential lower bound on the size of even sublinear-depth threshold circuit if energy and weight are sufficiently small. For other models of neural networks such as a discretized ReLE circuits and decretized sigmoid circuits, we prove that a similar inequality also holds for a discretized circuit $C$: $rk(M_C) = O(ed(\log s + \log w + \log n)^3)$.	翻訳日:2023-06-29 18:59:40 公開日:2023-06-28
# GeoT: 信頼性分子特性予測と化学的解釈可能な表現学習のための幾何学的変換器 GeoT: A Geometry-aware Transformer for Reliable Molecular Property Prediction and Chemically Interpretable Representation Learning ( http://arxiv.org/abs/2106.15516v3 ) ライセンス: Link先を確認	Bumju Kwak, Jiwon Park, Taewon Kang, Jeonghee Jo, Byunghan Lee, Sungroh Yoon	(参考訳) 近年、分子表現学習は様々な化学タスクに焦点をあてる重要な領域として浮上している。しかし、既存のモデルの多くは、分子構造の幾何学的情報を完全に考慮できず、直感的な表現は少ない。さらに、化学的な観点からの実験結果の解釈を提供するために、広く使われているメッセージパッシング機構が限られている。これらの課題に対処するために,GeoT(Geometry-aware Transformer)という,分子表現学習のためのトランスフォーマーベースのフレームワークを導入する。 geotは、分子特性の予測だけでなく、信頼できる解釈性を提供するために特別に設計された注意に基づくメカニズムを通じて、分子グラフ構造を学ぶ。これにより、GeoTはトレーニング対象に関連する原子間関係の注意マップを生成することができる。さらに、GeoTはMPNNベースのモデルに匹敵する性能を示しながら、計算複雑性の低減を実現している。実験の結果,geantは分子構造に対する化学的洞察を効果的に学習し,人工知能と分子科学のギャップを橋渡ししていることが明らかとなった。 In recent years, molecular representation learning has emerged as a key area of focus in various chemical tasks. However, many existing models fail to fully consider the geometric information of molecular structures, resulting in less intuitive representations. Moreover, the widely used message-passing mechanism is limited to provide the interpretation of experimental results from a chemical perspective. To address these challenges, we introduce a novel Transformer-based framework for molecular representation learning, named the Geometry-aware Transformer (GeoT). GeoT learns molecular graph structures through attention-based mechanisms specifically designed to offer reliable interpretability, as well as molecular property prediction. Consequently, GeoT can generate attention maps of interatomic relationships associated with training objectives. In addition, GeoT demonstrates comparable performance to MPNN-based models while achieving reduced computational complexity. Our comprehensive experiments, including an empirical simulation, reveal that GeoT effectively learns the chemical insights into molecular structures, bridging the gap between artificial intelligence and molecular sciences.	翻訳日:2023-06-29 18:59:03 公開日:2023-06-28
# 古典的に区別不能となるハーディ型真実装型ガジェットの拡張 Extensions of Hardy-type true-implies-false gadgets to classically obtain indistinguishability ( http://arxiv.org/abs/2006.11396v4 ) ライセンス: Link先を確認	Karl Svozil	(参考訳) 量子論理用語では、ハーディ型引数は、相互に絡み合った文脈とその可観測物の集合として一様に表現され拡張される。古典的に解釈すれば、これらの構造はグラフ理論的な「ゲット」として機能し、事前に選択され、後から選択された観測可能な終点の相関を強制する。この方法は、他のタイプの関係性、特に量子力学的に異なる観測可能な古典的等式を予測する新しいジョイント特性の一般化と拡張を可能にする。また、量子可観測体の忠実直交表現の発見を容易にする。 In quantum logical terms, Hardy-type arguments can be uniformly presented and extended as collections of intertwined contexts and their observables. If interpreted classically those structures serve as graph-theoretic "gadgets" that enforce correlations on the respective preselected and postselected observable terminal points. The method allows the generalization and extension to other types of relational properties, in particular, to novel joint properties predicting classical equality of quantum mechanically distinct observables. It also facilitates finding faithful orthogonal representations of quantum observables.	翻訳日:2023-06-29 18:58:46 公開日:2023-06-28
# 統一機械学習:同時観測・未観測ノベルティ検出による分類 Unified machine learning: Classification with simultaneous observed and unobserved novelty detection ( http://arxiv.org/abs/2002.01368v7 ) ライセンス: Link先を確認	Emile R. Engelbrecht, Johan A. du Preez	(参考訳) Positive and Unlabelled (PU)-learning, Semi-Supervised Learning (SSL), and Open-Set Recognition (OSR) の統一的なアプローチは、コスト効率の高いアプリケーショングレード分類器の開発を大幅に促進する。しかし、以前の試みは、新しいカテゴリである \mbox{\textit{observed}} と \mbox{\textit{unobserved}} の定義を混乱させた。観測された新しいカテゴリは、PU学習において未学習のトレーニングデータとして定義され、トレーニングセットのカテゴリラベルの不完全なセットによって存在する。対照的に、観測されていない新しいカテゴリはosrでテストデータにのみ存在し、時間とともに現れる新しい興味深いパターンを表すものとして定義されている。安全で実用的な分類器の開発を維持するために、モデルはこれらの新しいカテゴリタイプの違いを一般化する必要がある。本稿では,関連する機械学習研究分野を徹底的にレビューし,ラベルなしデータやオープンlacuを活用し,カテゴリを拡張したオープンセット学習という,新たな統一機械学習政策を提案する。具体的には、Open-LACUは、ラベル付きカテゴリの$K > 1$を正確に分類し、同時に観察された新規カテゴリを拡張背景カテゴリ(K + 1$)に検出・分離し、さらに観測されていない新規カテゴリを強化未知カテゴリ(K + 2$)に検出・分離するモデルを必要とする。 Open-LACUは、観察および保存されていない新しいカテゴリを一般化する最初の機械学習ポリシーである。 Open-LACUの意義は、リモートセンシング画像のセマンティックセグメンテーション、医用放射線画像内の物体検出、およびコークス音響分析による病気の識別におけるその応用について論じる。 A unified approach of Positive and Unlabelled (PU)-learning, Semi-Supervised Learning (SSL), and Open-Set Recognition (OSR) would significantly enhance the development of cost-efficient application-grade classifiers. However, previous attempts have conflated the definitions of \mbox{\textit{observed}} and \mbox{\textit{unobserved}} novel categories. Observed novel categories are defined in PU-learning as those in unlabelled training data and exist due to an incomplete set of category labels for the training set. In contrast, unobserved novel categories are defined in OSR as those that only exist in the testing data and represent new and interesting patterns that emerge over time. To maintain safe and practical classifier development, models must generalise the difference between these novel category types. In this letter, we thoroughly review the relevant machine learning research fields to propose a new unified machine learning policy called Open-set Learning with Augmented Categories by exploiting Unlabelled data or Open-LACU. Specifically, Open-LACU requires models to accurately classify $K > 1$ number of labelled categories while simultaneously detecting and separating observed novel categories into the augmented background category ($K + 1$) and further detecting and separating unobserved novel categories into the augmented unknown category ($K + 2$). Open-LACU is the first machine learning policy to generalise observed and unobserved novel categories. The significance of Open-LACU is also highlighted by discussing its application in semantic segmentation of remote sensing images, object detection within medical radiology images and disease identification through cough sound analysis.	翻訳日:2023-06-29 18:58:35 公開日:2023-06-28
# ニューラルマシン翻訳のための局所バイト融合 Local Byte Fusion for Neural Machine Translation ( http://arxiv.org/abs/2205.11490v3 ) ライセンス: Link先を確認	Makesh Narsimhan Sreedhar, Xiangpeng Wan, Yu Cheng, Junjie Hu	(参考訳) サブワードトークン化スキームは、現在のNLPモデルで使用される主要なテクニックである。しかし、そのようなスキームは剛性があり、一方のコーパス上に構築されたトークン化器は他の並列コーパスにうまく適応しない。多言語コーパスでは、サブワードのトークン化スキームが低リソース言語を多言語化することで翻訳性能が低下することが観察されている。サブワードトークンライザの単純な代替手段は、UTF-8のような符号化方式を用いてバイト列へのトークン化を行うバイトベースの方法である。バイトトークンは、しばしばサブキャラクタの粒度で入力を表す。これにより、文字列よりもかなり長いバイトシーケンスが生成される。下層層における局所情報の集約は、モデルに高レベルのセマンティック情報を構築するためのガイドとなる。本稿では,局所意味情報を集約するために,バイトベースの機械翻訳のためのローカルByte Fusion(LOBEF)手法を提案する。多言語翻訳、ゼロショット交叉変換、ドメイン適応に関する大規模な実験は、従来のバイトベースモデルやサブワード技術よりも一貫して改善されている。さらに分析した結果、バイトベースモデルはパラメータ効率が高く、サブワードモデルよりも高速にトレーニングできることがわかった。 Subword tokenization schemes are the dominant technique used in current NLP models. However, such schemes can be rigid and tokenizers built on one corpus do not adapt well to other parallel corpora. It has also been observed that in multilingual corpora, subword tokenization schemes over-segment low-resource languages leading to a drop in translation performance. A simple alternative to subword tokenizers is byte-based methods i.e. tokenization into byte sequences using encoding schemes such as UTF-8. Byte tokens often represent inputs at a sub-character granularity i.e. one character can be represented by a sequence of multiple byte tokens. This results in byte sequences that are significantly longer than character sequences. Enforcing aggregation of local information in the lower layers can guide the model to build higher-level semantic information. We propose a Local Byte Fusion (LOBEF) method for byte-based machine translation -- utilizing byte $n$-gram and word boundaries -- to aggregate local semantic information. Extensive experiments on multilingual translation, zero-shot cross-lingual transfer, and domain adaptation reveal a consistent improvement over traditional byte-based models and even over subword techniques. Further analysis also indicates that our byte-based models are parameter-efficient and can be trained faster than subword models.	翻訳日:2023-06-29 18:51:46 公開日:2023-06-28
# EHRKit: 電子健康記録テキストのためのPython自然言語処理ツールキット EHRKit: A Python Natural Language Processing Toolkit for Electronic Health Record Texts ( http://arxiv.org/abs/2204.06604v5 ) ライセンス: Link先を確認	Irene Li, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Dragomir Radev	(参考訳) 電子健康記録(Electronic Health Record, EHR)は、医療システムにおいて重要な部分であり、医療提供、運営、研究に影響を与える。非構造化テキストは、EHRの構造化情報にもかかわらず多くの注目を集めており、エキサイティングな研究分野となっている。最近のニューラル自然言語処理(NLP)法の成功は、構造化されていない臨床ノートを処理するための新しい方向性につながった。本研究では,臨床テキストのためのピソンライブラリ EHRKit を開発した。 MIMIC-III固有の機能とタスク固有の機能である。第1部では、基本的な検索、情報検索、情報抽出を含むMIMIC-III NOTEEVENTSデータにアクセスするためのインターフェースのリストを紹介する。第2部では、名前付きエンティティ認識、要約、機械翻訳など、最大12のオフセットnlpタスクのために、多くのサードパーティライブラリを統合する。 The Electronic Health Record (EHR) is an essential part of the modern medical system and impacts healthcare delivery, operations, and research. Unstructured text is attracting much attention despite structured information in the EHRs and has become an exciting research field. The success of the recent neural Natural Language Processing (NLP) method has led to a new direction for processing unstructured clinical notes. In this work, we create a python library for clinical texts, EHRKit. This library contains two main parts: MIMIC-III-specific functions and tasks specific functions. The first part introduces a list of interfaces for accessing MIMIC-III NOTEEVENTS data, including basic search, information retrieval, and information extraction. The second part integrates many third-party libraries for up to 12 off-shelf NLP tasks such as named entity recognition, summarization, machine translation, etc.	翻訳日:2023-06-29 18:50:34 公開日:2023-06-28
# SUPERNOVA:リスクベーステストと機械学習を用いたAAAゲームにおけるテスト選択と欠陥防止の自動化 SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning ( http://arxiv.org/abs/2203.05566v2 ) ライセンス: Link先を確認	Alexander Senchenko, Naomi Patterson, Hamman Samuel, Dan Isper	(参考訳) 従来の手法がソフトウェアシステムの成長とともにスケールできないため、ビデオゲームのテストはますます難しくなっている。手動テストは非常に労働集約的なプロセスなので、すぐにコスト禁止になります。自動テストにスクリプトを使用するのは手頃な価格だが、非決定的な環境ではスクリプトが有効ではない。現代のゲームの複雑さ、スコープ、プレイヤーの期待は、品質管理が生産コストと納入リスクの大きな部分を占めるように急速に増大している。このリスクを低減し、生産を実現することは、現在業界にとって大きな課題です。生産コストを前後的に現実的なものにするため、テストやデータ分析の自動化と並行して、予防的な品質保証戦略に重点を置いています。本稿では,自動ハブとして機能しながら,テスト選択と欠陥防止を行うシステムであるSUPERNOVA(Selection of Testing and Universal defect Prevention in external Repositories for Novel Objective Verification of Software Anomalies)を提案する。データ分析機能と機械学習機能を統合することで、SUPERNOVAは品質保証テスタのバグ発見と欠陥の低減を支援し、プロダクションサイクルの安定性を改善し、テストコストをコントロールできる。この直接的な影響は、これらのテスト選択最適化を使用して出荷された未公開のスポーツゲームタイトルのテスト時間を55%以上削減することが観察されている。さらに、半教師付き機械学習モデルによって生成されたリスクスコアを用いて、71%の精度で検出でき、77%がバグを誘発する変更リストの確率を思い出すことができ、この推論の詳細な説明を開発者に提供できる。これらの取り組みはワークフローを改善し、開発中のゲームタイトルに必要なテスト時間を削減する。 Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems. Manual testing is a very labor-intensive process, and therefore quickly becomes cost prohibitive. Using scripts for automated testing is affordable, however scripts are ineffective in non-deterministic environments, and knowing when to run each test is another problem altogether. The modern game's complexity, scope, and player expectations are rapidly increasing where quality control is a big portion of the production cost and delivery risk. Reducing this risk and making production happen is a big challenge for the industry currently. To keep production costs realistic up-to and after release, we are focusing on preventive quality assurance tactics alongside testing and data analysis automation. We present SUPERNOVA (Selection of tests and Universal defect Prevention in External Repositories for Novel Objective Verification of software Anomalies), a system responsible for test selection and defect prevention while also functioning as an automation hub. By integrating data analysis functionality with machine and deep learning capability, SUPERNOVA assists quality assurance testers in finding bugs and developers in reducing defects, which improves stability during the production cycle and keeps testing costs under control. The direct impact of this has been observed to be a reduction in 55% or more testing hours for an undisclosed sports game title that has shipped, which was using these test selection optimizations. Furthermore, using risk scores generated by a semi-supervised machine learning model, we are able to detect with 71% precision and 77% recall the probability of a change-list being bug inducing, and provide a detailed breakdown of this inference to developers. These efforts improve workflow and reduce testing hours required on game titles in development.	翻訳日:2023-06-29 18:49:41 公開日:2023-06-28
# cocofl:部分的nn凍結と量子化によるコミュニケーションと計算の融合学習 CoCoFL: Communication- and Computation-Aware Federated Learning via Partial NN Freezing and Quantization ( http://arxiv.org/abs/2203.05468v3 ) ライセンス: Link先を確認	Kilian Pfeiffer, Martin Rapp, Ramin Khalili, J\"org Henkel	(参考訳) 連邦学習(FL)に参加するデバイスは通常、異種通信、計算、メモリ資源を持つ。しかしながら、同期flでは、すべてのデバイスは、サーバが指示する同じ期限までにトレーニングを終える必要がある。以上の結果から,ニューラルネットワーク(NN)の小さなサブセットを拘束されたデバイス,すなわち最先端技術が提案するニューロン/フィルタを停止させることは非効率であり,これらのデバイスがモデルに効果的な寄与を妨げていることが示された。これにより、特にデバイス間でクラスラベルが歪んだ場合において、制約されたデバイスの達成可能な精度が不公平になる。全てのデバイスでNN構造を完全に維持する新しいFL手法であるCoCoFLを提案する。デバイスの異種リソースに適応するために、cocoflは選択したレイヤを凍結して定量化し、通信、計算、メモリ要件を削減します。これにより、CoCoFLはデバイス上の利用可能なリソースを効率的に利用し、制約されたデバイスがFLシステムに重要な貢献をし、参加者間の公正性(精度の同等性)を高め、モデルの最終的な精度を著しく向上する。 Devices participating in federated learning (FL) typically have heterogeneous communication, computation, and memory resources. However, in synchronous FL, all devices need to finish training by the same deadline dictated by the server. Our results show that training a smaller subset of the neural network (NN) at constrained devices, i.e., dropping neurons/filters as proposed by state of the art, is inefficient, preventing these devices to make an effective contribution to the model. This causes unfairness w.r.t the achievable accuracies of constrained devices, especially in cases with a skewed distribution of class labels across devices. We present a novel FL technique, CoCoFL, which maintains the full NN structure on all devices. To adapt to the devices' heterogeneous resources, CoCoFL freezes and quantizes selected layers, reducing communication, computation, and memory requirements, whereas other layers are still trained in full precision, enabling to reach a high accuracy. Thereby, CoCoFL efficiently utilizes the available resources on devices and allows constrained devices to make a significant contribution to the FL system, increasing fairness among participants (accuracy parity) and significantly improving the final accuracy of the model.	翻訳日:2023-06-29 18:49:12 公開日:2023-06-28
# 高モダリティ多モード変換器:高モダリティ表現学習のためのモダリティと相互作用の不均一性の定量化 High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning ( http://arxiv.org/abs/2203.01311v4 ) ライセンス: Link先を確認	Paul Pu Liang, Yiwei Lyu, Xiang Fan, Jeffrey Tsaw, Yudong Liu, Shentong Mo, Dani Yogatama, Louis-Philippe Morency, Ruslan Salakhutdinov	(参考訳) 現実の多くの問題は本質的にマルチモーダルであり、人間のコミュニケーション、強制、受容、ロボットの視覚センサーなどに使われる言語、ジェスチャー、パラ言語などである。マルチモーダル学習への関心は爆発的に高まっているが、これらの手法は主に言語、視覚、音声に焦点が当てられている。本稿では,多種多様なモダリティに対する一般化を加速するために,多種多様なモダリティを含む高モダリティシナリオに対する効率的な表現学習について検討する。新しいモダリティに新しいモデルを追加することは、必然的に高価になるので、重要な技術的課題は、多様性の定量化である: 前のモダリティとパラメータの共有を可能にするために、類似した情報と相互作用をエンコードするモダリティをどうやって測定できるのか? 異質性量子化のための2つの新しい情報理論指標を提案する。(1) モダリティの不均一性(modality heterogeneity)は、X1からX2への情報転送量を測定することによって、また(2) 相互作用異質性(interaction heterogeneity)は、Fusing {X1,X2} から {X3,X4} への情報転送量を測定することによって、どのように相互作用するかを測定する。提案する2つの指標を,ユニークな情報やインタラクションを含むモダリティの融合を自動的に優先順位付けする方法として重要視する。その結果、単一のモデルであるhighmmtが、最大10のモダリティ(テキスト、画像、音声、ビデオ、センサー、プロピオセプション、音声、時系列、セット、テーブル)と5つの研究領域から15のタスクにスケールする。 HighMMTは、パフォーマンスと効率のトレードオフに関する事前の手法よりも優れているだけでなく、重要なスケーリングの挙動も示している。 Many real-world problems are inherently multimodal, from spoken language, gestures, and paralinguistics humans use to communicate, to force, proprioception, and visual sensors on robots. While there has been an explosion of interest in multimodal learning, these methods are focused on a small set of modalities primarily in language, vision, and audio. In order to accelerate generalization towards diverse and understudied modalities, this paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities. Since adding new models for every new modality becomes prohibitively expensive, a critical technical challenge is heterogeneity quantification: how can we measure which modalities encode similar information and interactions in order to permit parameter sharing with previous modalities? This paper proposes two new information theoretic metrics for heterogeneity quantification: (1) modality heterogeneity studies how similar 2 modalities {X1,X2} are by measuring how much information can be transferred from X1 to X2, while (2) interaction heterogeneity studies how similarly pairs of modalities {X1,X2}, {X3,X4} interact by measuring how much information can be transferred from fusing {X1,X2} to {X3,X4}. We show the importance of these 2 proposed metrics as a way to automatically prioritize the fusion of modalities that contain unique information or interactions. The result is a single model, HighMMT, that scales up to 10 modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and 15 tasks from 5 research areas. Not only does HighMMT outperform prior methods on the tradeoff between performance and efficiency, it also demonstrates a crucial scaling behavior: performance continues to improve with each modality added, and it transfers to entirely new modalities and tasks during fine-tuning.	翻訳日:2023-06-29 18:48:51 公開日:2023-06-28
# 時間非局在サブシステム上の因果不等式に違反するプロセスの存在 Existence of processes violating causal inequalities on time-delocalised subsystems ( http://arxiv.org/abs/2201.11832v3 ) ライセンス: Link先を確認	Julian Wechs, Cyril Branciard, Ognyan Oreshkov	(参考訳) 量子的および古典的過程が存在することは理論的に可能であり、それぞれのパーティによって実行される操作が明確に定義された因果順序で起こらないことが示されている。中心的な疑問は、実際にそのようなプロセスが実現できるかどうかである。このような過程が標準量子論において実現されているという概念を厳密に議論するために、時間非局在量子サブシステムの概念が導入された。本稿では,三成分過程のすべてのユニタリ拡張に対して,時間非局所化サブシステム上の実現が存在することを示す。注目すべきことに、このクラスは因果不等式に違反するプロセス、すなわちデバイス非依存の方法で特定の因果順序と無矛盾性を示す相関を生成するプロセスを含んでいる。我々は,ユニタリ拡張を持つ三部的古典過程の見事な例を考察し,時間非局所化サブシステムにおけるその実現について考察する。次に、因果不等式違反がこの設定においてどのような意味を持つのかを議論し、利害関係変数間の有意因果順序がないことを示すことは確かに有意義な概念であると主張する。 It has been shown that it is theoretically possible for there to exist quantum and classical processes in which the operations performed by separate parties do not occur in a well-defined causal order. A central question is whether and how such processes can be realised in practice. In order to provide a rigorous argument for the notion that certain such processes have a realisation in standard quantum theory, the concept of time-delocalised quantum subsystem has been introduced. In this paper, we show that realisations on time-delocalised subsystems exist for all unitary extensions of tripartite processes. Remarkably, this class contains processes that violate causal inequalities, i.e., that can generate correlations that witness the incompatibility with definite causal order in a device-independent manner. We consider a known striking example of such a tripartite classical process that has a unitary extension, and study its realisation on time-delocalised subsystems. We then discuss the question of what a violation of causal inequalities implies in this setting, and argue that it is indeed a meaningful concept to show the absence of a definite causal order between the variables of interest.	翻訳日:2023-06-29 18:48:05 公開日:2023-06-28
# トランスダクティブおよびセミ教師付き連合学習のためのクロスクライアントラベル伝播 Cross-client Label Propagation for Transductive and Semi-Supervised Federated Learning ( http://arxiv.org/abs/2210.06434v3 ) ライセンス: Link先を確認	Jonathan Scott, Michelle Yeo, Christoph H. Lampert	(参考訳) トランスダクティブフェデレーション学習のための新しい手法であるクロスクライアントラベル伝搬(XCLP)を提案する。 XCLPは、複数のクライアントのデータからデータグラフを共同で推定し、ラベル情報をグラフ全体に伝播することによりラベル付きデータのラベルを算出する。クライアントがデータを誰とでも共有することを避けるため、XCLPは2つの暗号化的にセキュアなプロトコルを使っている。我々は、連合学習におけるXCLPの2つの異なる応用を実証した。最初は、見当たらないテストポイントのラベルを予測するために、ワンショットでそれを使用します。第二に、半教師なしのフェデレーション環境での擬似ラベルなしトレーニングデータを繰り返し使用する。実際のフェデレーションと標準ベンチマークの両方の実験では、XCLPはどちらのアプリケーションでも、代替手法よりも高い分類精度を達成している。 We present Cross-Client Label Propagation(XCLP), a new method for transductive federated learning. XCLP estimates a data graph jointly from the data of multiple clients and computes labels for the unlabeled data by propagating label information across the graph. To avoid clients having to share their data with anyone, XCLP employs two cryptographically secure protocols: secure Hamming distance computation and secure summation. We demonstrate two distinct applications of XCLP within federated learning. In the first, we use it in a one-shot way to predict labels for unseen test points. In the second, we use it to repeatedly pseudo-label unlabeled training data in a federated semi-supervised setting. Experiments on both real federated and standard benchmark datasets show that in both applications XCLP achieves higher classification accuracy than alternative approaches.	翻訳日:2023-06-29 18:42:20 公開日:2023-06-28
# 不均衡な学際的研究提案による階層的ミックスアップマルチラベル分類 Hierarchical MixUp Multi-label Classification with Imbalanced Interdisciplinary Research Proposals ( http://arxiv.org/abs/2209.13912v2 ) ライセンス: Link先を確認	Meng Xiao, Min Wu, Ziyue Qiao, Zhiyuan Ning, Yi Du, Yanjie Fu, Yuanchun Zhou	(参考訳) 資金提供機関は、主にドメインエキスパートと研究提案のトピックマッチングに依存しており、提案レビューアを割り当てている。提案が学際的になるにつれて、提案の学際的性質をプロファイルし、その後、適切な専門知識を持つ専門家を見つけることが困難になる。この問題を解決するための重要なステップは、提案の学際ラベルを正確にモデル化し分類することである。テキスト分類や提案分類といった既存の方法論・応用関連文献は、学際的提案データによる3つの重要な課題を共同で解決するには不十分である。 1)情報科学からAI,AIの基本に至るまで,粗粒から細粒までの提案の規律ラベルの階層構造。 2 提案において、異なる役割を担っている各種主文部の異種意味論 3)非学際研究と学際研究の間には,提案の数は不均衡である。提案の学際的性質を理解する上で,同時に3つの課題に対処できるだろうか? そこで本研究では,H-MixUpと呼ぶ階層型混成多重ラベル分類フレームワークを提案する。 H-MixUpはトランスフォーマーベースの意味情報抽出器とGCNベースの学際知識抽出器を第1号と第2号に活用する。 H-MixUpは、Wold-level MixUp、Word-level CutMix、Manifold MixUp、Document-level MixUpの融合トレーニング方法を開発した。 Funding agencies are largely relied on a topic matching between domain experts and research proposals to assign proposal reviewers. As proposals are increasingly interdisciplinary, it is challenging to profile the interdisciplinary nature of a proposal, and, thereafter, find expert reviewers with an appropriate set of expertise. An essential step in solving this challenge is to accurately model and classify the interdisciplinary labels of a proposal. Existing methodological and application-related literature, such as textual classification and proposal classification, are insufficient in jointly addressing the three key unique issues introduced by interdisciplinary proposal data: 1) the hierarchical structure of discipline labels of a proposal from coarse-grain to fine-grain, e.g., from information science to AI to fundamentals of AI. 2) the heterogeneous semantics of various main textual parts that play different roles in a proposal; 3) the number of proposals is imbalanced between non-interdisciplinary and interdisciplinary research. Can we simultaneously address the three issues in understanding the proposal's interdisciplinary nature? In response to this question, we propose a hierarchical mixup multiple-label classification framework, which we called H-MixUp. H-MixUp leverages a transformer-based semantic information extractor and a GCN-based interdisciplinary knowledge extractor for the first and second issues. H-MixUp develops a fused training method of Wold-level MixUp, Word-level CutMix, Manifold MixUp, and Document-level MixUp to address the third issue.	翻訳日:2023-06-29 18:42:05 公開日:2023-06-28
# 周囲環境における絡み合いの様相 Salient signatures of entanglement in the surrounding environment ( http://arxiv.org/abs/2209.05197v2 ) ライセンス: Link先を確認	{\L}ukasz Rudnicki, Waldemar K{\l}obus, Otavio A. D. Molitor, Wies{\l}aw Laskowski	(参考訳) 我々は, 量子系における絡み合いの存在を, システムを取り巻く環境の粗い観察によって確認できるモデルを開発した。この反直感効果は、システムと環境の間の相互作用が、絡み合う証人である観測可能なものと比例するときに起こりうる。 3つの直感的な例を示しながら一理想気体の雲で、絡み合わされた証人とともに線形ポテンシャルを受けるときは、証人のサインにより指示された方向を加速する。二 2つの量子ビット(又は四レベル原子)を結合したキャビティ内の電磁界の四次数は、同じ方法で変位する。三一つの量子ビットにより与えられる量子環境において、その状態は、ブロッホ球面の1つの半球のみを占め、また、証人のサインと完全に一致する。 We develop a model in which presence of entanglement in a quantum system can be confirmed through coarse observations of the environment surrounding the system. This counter-intuitive effect becomes possible when interaction between the system and its environment is proportional to an observable being an entanglement witness. While presenting three intuitive examples we show that: i) a cloud of an ideal gas, when subject to a linear potential coupled with the entanglement witness, accelerates in the direction dictated by the sign of the witness; ii) quadratures of electromagnetic field in a cavity coupled with two qubits (or a four-level atom) are displaced in the same manner; iii) for a quantum environment given by a single qubit, its state occupies only one hemisphere of the Bloch sphere, again in full agreement with the sign of the witness.	翻訳日:2023-06-29 18:41:41 公開日:2023-06-28
# 量子状態の非線形関数推定のためのハイブリッドフレームワーク A hybrid framework for estimating nonlinear functions of quantum states ( http://arxiv.org/abs/2208.08416v2 ) ライセンス: Link先を確認	You Zhou and Zhenhuan Liu	(参考訳) 量子状態の非線形関数、例えば$\tr(\rho^m)$を推定することは、量子科学と技術に対する基礎的かつ実用的な関心である。ここでは,一般化スワップテストによって量子部分を構成する量子古典的ハイブリッドフレームワークを示し,ランダム化測定から結果を後処理することで古典的部分を実現する。このハイブリッド・フレームワークは、中間スケールの量子プロセッサの部分的コヒーレント・パワーを利用し、同時に量子計測の数と古典的な後処理コストを劇的に削減する。状態モーメント推定と量子誤り軽減のタスクにおける我々のフレームワークの利点を実証する。 Estimating nonlinear functions of quantum states, such as the moment $\tr(\rho^m)$, is of fundamental and practical interest in quantum science and technology. Here we show a quantum-classical hybrid framework to measure them, where the quantum part is constituted by the generalized swap test, and the classical part is realized by postprocessing the result from randomized measurements. This hybrid framework utilizes the partial coherent power of the intermediate-scale quantum processor and, at the same time, dramatically reduces the number of quantum measurements and the cost of classical postprocessing. We demonstrate the advantage of our framework in the tasks of state-moment estimation and quantum error mitigation.	翻訳日:2023-06-29 18:41:20 公開日:2023-06-28
# メタバースxurllcサービスの注意対応リソース割り当てとqoe分析 Attention-aware Resource Allocation and QoE Analysis for Metaverse xURLLC Services ( http://arxiv.org/abs/2208.05438v6 ) ライセンス: Link先を確認	Hongyang Du, Jiazhen Liu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Junshan Zhang, and Dong In Kim	(参考訳) Metaverseは、次世代インターネットの期待をカプセル化しつつ、新しいキーパフォーマンス指標(KPI)を導入しています。従来の超信頼性・低遅延通信(URLLC)は客観的KPIを満足するが,Metaverseの特徴である個人化された没入感体験を提供することは困難である。クオリティ・オブ・エクスペリエンス(QoE)は総合的なKPIとみなすことができるため、URLLCはより高度なQoEを実現するために、パーソナライズされたリソース割り当てスキームを備えた次世代のURLLC(xURLLC)へと進化する。 Metaverse xURLLC サービスをデプロイするために,Metaverse サービスプロバイダ (MSP) とネットワークインフラストラクチャプロバイダ (InP) のインタラクションを調査し,最適なコントラクト設計フレームワークを提供する。具体的には、メタバースユーザーのQoEの関数として定義されたMSPの効用を最大化し、InPのインセンティブを確実にする。本稿では,QoEを数学的にモデル化するために,メタ・インマージョン(Meta-Immersion)という手法を提案する。さらに, xurllc における qoe を改善するため,注意意識型レンダリングキャパシティアロケーションスキームを開発した。ユーザ・オブジェクト・アテンションレベルデータセットを用いてxURLLCが従来のURLLCと比較して平均20.1%のQoE改善を実現可能であることを検証した。この論文のコードはhttps://github.com/hongyangdu/attentionqoeで入手できる。 Metaverse encapsulates our expectations of the next-generation Internet, while bringing new key performance indicators (KPIs). Although conventional ultra-reliable and low-latency communications (URLLC) can satisfy objective KPIs, it is difficult to provide a personalized immersive experience that is a distinctive feature of the Metaverse. Since the quality of experience (QoE) can be regarded as a comprehensive KPI, the URLLC is evolved towards the next generation URLLC (xURLLC) with a personalized resource allocation scheme to achieve higher QoE. To deploy Metaverse xURLLC services, we study the interaction between the Metaverse service provider (MSP) and the network infrastructure provider (InP), and provide an optimal contract design framework. Specifically, the utility of the MSP, defined as a function of Metaverse users' QoE, is to be maximized, while ensuring the incentives of the InP. To model the QoE mathematically, we propose a novel metric named Meta-Immersion that incorporates both the objective KPIs and subjective feelings of Metaverse users. Furthermore, we develop an attention-aware rendering capacity allocation scheme to improve QoE in xURLLC. Using a user-object-attention level dataset, we validate that the xURLLC can achieve an average of 20.1% QoE improvement compared to the conventional URLLC with a uniform resource allocation scheme. The code for this paper is available at https://github.com/HongyangDu/AttentionQoE	翻訳日:2023-06-29 18:41:09 公開日:2023-06-28
# 開空洞から単一光子を伝播する:普遍量子化による記述 Propagating single photons from an open cavity: Description from universal quantization ( http://arxiv.org/abs/2207.04517v2 ) ライセンス: Link先を確認	Astghik Saharyan, Benjamin Rousseaux, Zsolt Kis, Sergiy Stryzhenko and St\'ephane Gu\'erin	(参考訳) 過去数十年間、量子光学は、初期の実験における高品質な因子キャビティから、リークモードを含む新しいキャビティ設計へと進化してきた。非常に信頼できるモデルにもかかわらず、キャビティ量子電磁力学の概念では、光子漏れは現象学的にほとんどの時間を扱う。ここで、異なるアプローチをとり、最初の原理から始めて、元の真のモード表現から派生した内側の表現を定義し、それによって効果的なハミルトニアンとポインティングベクトルを決定できる。現象学モデルとは対照的に、空洞で発生し、自由空間で伝播する単一の光子の完全な記述が可能である。これはレーザー駆動の原子キャビティシステムに適用される。さらに, 単一光子生成のための原子空洞非共振方式を提案し, 結合状態の異なる単一光子を時間および周波数領域で厳密に解析する。最後に, 単一光子のパルス形状を専用に設計した駆動場エンベロープを用いて調整した断熱除去を実現する特定の結合構造を導入する。 Over the last decades, quantum optics has evolved from high quality factor cavities in the early experiments toward new cavity designs involving leaky modes. Despite very reliable models, in the concepts of cavity quantum electrodynamics, photon leakage is most of the time treated phenomenologically. Here, we take a different approach, and starting from first principles, we define an inside-outside representation which is derived from the original true-mode representation, in which one can determine effective Hamiltonian and Poynting vector. Contrary to the phenomenological model, they allow a full description of a leaking single photon produced in the cavity and propagating in free space. This is applied for a laser-driven atom-cavity system. In addition, we propose an atom-cavity non-resonant scheme for single photon generation, and we rigorously analyze the outgoing single photon in time and frequency domains for different coupling regimes. Finally, we introduce a particular coupling regime ensuring adiabatic elimination for which the pulse shape of the outgoing single photon is tailored using a specifically designed driving field envelope.	翻訳日:2023-06-29 18:40:42 公開日:2023-06-28
# 重ね合わせにおける時空の量子共形対称性 Quantum conformal symmetries for spacetimes in superposition ( http://arxiv.org/abs/2207.00021v2 ) ライセンス: Link先を確認	Viktoria Kabel, Anne-Catherine de la Hamette, Esteban Castro-Ruiz, \v{C}aslav Brukner	(参考訳) 量子重力の完全な理論がなければ、量子場と量子粒子が時空の重ね合わせの中でどのように振る舞うかという問題は、理論的および実験的研究の範囲を超えているように見える。ここでは量子参照フレーム形式の拡張を用いて、同値な計量の重ね合わせを前提としたクライン=ゴードン場に対するこの問題に対処する。量子共形変換'' の群構造に基づいて、時空の重ね合わせに量子場を記述する状態と、ミンコフスキーの背景に質量の重ね合わせを持つ量子場を表す状態とをマッピングできる明示的な量子作用素を構築する。これは拡張対称性の原理、すなわち量子共形変換の下で不変性を構成する。後者は、微分同値でない時空の重ね合わせを、曲線化された時空上のより直感的な場の重ね合わせと関連付けることによって理解することができる。さらに、曲がりくねった時空における粒子生成の現象を同値な同値な時空にインポートするためにも用いることができ、改良されたミンコフスキー時空における新しい特徴を明らかにすることができる。 Without a complete theory of quantum gravity, the question of how quantum fields and quantum particles behave in a superposition of spacetimes seems beyond the reach of theoretical and experimental investigations. Here we use an extension of the quantum reference frame formalism to address this question for the Klein-Gordon field residing on a superposition of conformally equivalent metrics. Based on the group structure of ``quantum conformal transformations'', we construct an explicit quantum operator that can map states describing a quantum field on a superposition of spacetimes to states representing a quantum field with a superposition of masses on a Minkowski background. This constitutes an extended symmetry principle, namely invariance under quantum conformal transformations. The latter allows to build an understanding of superpositions of diffeomorphically non-equivalent spacetimes by relating them to a more intuitive superposition of quantum fields on curved spacetime. Furthermore, it can be used to import the phenomenon of particle production in curved spacetime to its conformally equivalent counterpart, thus revealing new features in modified Minkowski spacetime.	翻訳日:2023-06-29 18:40:23 公開日:2023-06-28
# KAB2Sに向けて : 単目的問題から多目的問題への鍵知識の学習 Towards KAB2S: Learning Key Knowledge from Single-Objective Problems to Multi-Objective Problem ( http://arxiv.org/abs/2206.12906v2 ) ライセンス: Link先を確認	Xu Wendi, Wang Xianpeng, Guo Qingxin, Song Xiangman, Zhao Ren, Zhao Guodong, Yang Yang, Xu Te, He Dakuo	(参考訳) 進化的計算研究の新しいフロンティア」として、進化的伝達最適化(eto)は進化的計算研究における過去の問題からの関連する経験と知識のゼロ再利用という従来のパラダイムを克服する。 etoによるスケジューリングアプリケーションでは、知的スケジューリングとグリーンスケジューリングの両方、特に中国からの「炭素中立性」の国際的な誓約のために、非常に魅力的で競争の激しいフレームワーク「ミーティング」が形成される可能性がある。我々の知る限り、ここでのスケジューリングに関する論文は、多目的最適化問題(マルチタスク最適化ではない)が離散ケースにおいて単目的最適化問題を「省略する」場合に、ETOフレームワークのクラスの最初の作業となる。より具体的には、遺伝的アルゴリズムをベースとした位置決定ブロックのような産業応用のための重要な知識は、置換フローショップスケジューリング問題(PFSP)のための新しいコア転送機構と学習技術によって利用することができる。提案するETO-PFSPフレームワークの有効性と大域的普遍性を実証的に検証した。本研究は,(1)ETOフレームワークを充実させ,(2)遺伝的アルゴリズムとメメティックアルゴリズムのブロック構築の古典的・基本的理論に寄与し,(3)中国における「インダストリアルインテリジェンス」のための「知識とビルディングブロックに基づくスケジューリング(KAB2S)」のパラダイムの提案と実践により,進化的スケジューリングのパラダイムシフトに向かう。 As "a new frontier in evolutionary computation research", evolutionary transfer optimization(ETO) will overcome the traditional paradigm of zero reuse of related experience and knowledge from solved past problems in researches of evolutionary computation. In scheduling applications via ETO, a quite appealing and highly competitive framework "meeting" between them could be formed for both intelligent scheduling and green scheduling, especially for international pledge of "carbon neutrality" from China. To the best of our knowledge, our paper on scheduling here, serves as the 1st work of a class of ETO frameworks when multiobjective optimization problem "meets" single-objective optimization problems in discrete case (not multitasking optimization). More specifically, key knowledge conveyed for industrial applications, like positional building blocks with genetic algorithm based settings, could be used via the new core transfer mechanism and learning techniques for permutation flow shop scheduling problem(PFSP). Extensive studies on well-studied benchmarks validate firm effectiveness and great universality of our proposed ETO-PFSP framework empirically. Our investigations (1) enrich the ETO frameworks, (2) contribute to the classical and fundamental theory of building block for genetic algorithms and memetic algorithms, and (3) head towards the paradigm shift of evolutionary scheduling via learning by proposal and practice of paradigm of "knowledge and building-block based scheduling" (KAB2S) for "industrial intelligence" in China.	翻訳日:2023-06-29 18:39:58 公開日:2023-06-28
# スケジューリング:単一目的問題から多目的問題への鍵知識の学習 ETO Meets Scheduling: Learning Key Knowledge from Single-Objective Problems to Multi-Objective Problem ( http://arxiv.org/abs/2206.12902v2 ) ライセンス: Link先を確認	Wendi Xu, Xianpeng Wang	(参考訳) 進化的伝達最適化(ETO)は「進化的計算研究の新しいフロンティア」として機能し、従来の進化的計算において解決された問題から経験と知識をゼロに再利用することを避ける。 ETOを経由したスケジューリングでは、インテリジェントなスケジューリングとグリーンスケジューリングの両方、特に中国の文脈における炭素中立性のために、非常に競争の激しい"ミーティング"フレームワークを構成することができる。我々の知る限り、ここでのスケジューリングに関する我々の研究は、多目的問題(マルチタスク最適化ではない)の単一目的問題において、複雑な最適化のためのETOの最初の研究である。より具体的には、位置決めブロックのような重要な知識は学習され、置換フローショップスケジューリング問題(PFSP)のために転送される。提案するETO-PFSPフレームワークの比較的確実な有効性と大きな可能性を検証する。 Evolutionary transfer optimization(ETO) serves as "a new frontier in evolutionary computation research", which will avoid zero reuse of experience and knowledge from solved problems in traditional evolutionary computation. In scheduling applications via ETO, a highly competitive "meeting" framework between them could be constituted towards both intelligent scheduling and green scheduling, especially for carbon neutrality within the context of China. To the best of our knowledge, our study on scheduling here, is the 1st work of ETO for complex optimization when multiobjective problem "meets" single-objective problems in combinatorial case (not multitasking optimization). More specifically, key knowledge like positional building blocks clustered, could be learned and transferred for permutation flow shop scheduling problem (PFSP). Empirical studies on well-studied benchmarks validate relatively firm effectiveness and great potential of our proposed ETO-PFSP framework.	翻訳日:2023-06-29 18:39:31 公開日:2023-06-28
# 人間の日から機械秒:機械学習の最終結果の自動回答と生成 From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams ( http://arxiv.org/abs/2206.05442v7 ) ライセンス: Link先を確認	Iddo Drori, Sarah J. Zhang, Reece Shuttleworth, Sarah Zhang, Keith Tyser, Zad Chin, Pedro Lantigua, Saisamrit Surbehera, Gregory Hunter, Derek Austin, Leonard Tang, Yann Hicke, Sage Simhon, Sathwik Karnik, Darnell Granberry, Madeleine Udell	(参考訳) mit、ハーバード、コーネルなどの上位機関における機械学習の最終試験は通常、執筆に学部の日を要し、解決には学生の時間を要する。大規模な言語モデルは、トレーニング後のオンラインのファイナルで、人間のレベルで機械学習のファイナルをパスし、新しい品質のファイナル質問を数秒で自動生成することを示した。従来の研究は、数学やSTEMコースにおける大学レベルの問題セットを解くために、プログラム合成と数ショットの学習方法を開発した。本研究では,問題集合とはいくつかの方法で異なる最終試験を解く手法を開発し,比較する。質問はより長く,複数の部分を持ち,より複雑で,幅広い話題にまたがる。オンラインで利用できる機械学習の最終試験のデータセットとベンチマークを作成し、これらの質問に答え、新しい質問を生成するためのコードを作成します。他の質問やコースノートから新しい質問を生成する方法を示します。この最終試験ベンチマークの再現性と今後の研究のために,複数選択,数値,質問に対する自動チェッカーを表現回答とともに使用する。 GPT-3, OPT, Codex, ChatGPT を用いて, ゼロショット学習と少数ショット学習, チェーン・オブ・シークレットとを比較したアブレーション研究を行い, 少数ショット学習が有効であることを示す。我々は,大規模評価の文章作成と解法を合理化する言語モデルの変換可能性に注目し,人間の日数から機械数秒までの作業負荷を大幅に削減する。本研究は,chatgptのような大規模言語モデルを授業で禁止するよりも,学生に対して,正しさ,完全性,回答の独創性を問うことによって活用を指導し,批判的思考を奨励すべきであることが示唆された。 A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We curate a dataset and benchmark of questions from machine learning final exams available online and code for answering these questions and generating new questions. We show how to generate new questions from other questions and course notes. For reproducibility and future research on this final exam benchmark, we use automatic checkers for multiple-choice, numeric, and questions with expression answers. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to mere machine seconds. Our results suggest that rather than banning large language models such as ChatGPT in class, instructors should teach students to harness them by asking students meta-questions about correctness, completeness, and originality of the responses generated, encouraging critical thinking in academic studies.	翻訳日:2023-06-29 18:39:15 公開日:2023-06-28
# 微分可能なユーザモデル Differentiable User Models ( http://arxiv.org/abs/2211.16277v2 ) ライセンス: Link先を確認	Alex H\"am\"al\"ainen, Mustafa Mert \c{C}elikok, Samuel Kaski	(参考訳) 確率的ユーザモデリングは、ループ内に人間がいるユビキタスケースで機械学習システムを構築するために不可欠である。しかし、現代の高度なユーザーモデルは認知行動シミュレータとして設計され、現代の機械学習パイプラインと互換性がなく、ほとんどの実用的なアプリケーションでは計算が禁じられている。我々は、この計算ボトルネックを回避するために、広く適用可能な微分可能サロゲートを導入することでこの問題に対処し、サロゲートは現代の認知モデルを用いた計算効率の高い推論を可能にする。オンラインアプリケーションに適した計算コストで、既存の可能性のない推論手法である、唯一利用可能なソリューションに匹敵するモデリング能力が達成可能であることを実験的に示す。最後に、メニュー検索タスクにおいて、aiアシスタントがオンラインインタラクションに認知モデルをどのように利用できるかを示す。 Probabilistic user modeling is essential for building machine learning systems in the ubiquitous cases with humans in the loop. However, modern advanced user models, often designed as cognitive behavior simulators, are incompatible with modern machine learning pipelines and computationally prohibitive for most practical applications. We address this problem by introducing widely-applicable differentiable surrogates for bypassing this computational bottleneck; the surrogates enable computationally efficient inference with modern cognitive models. We show experimentally that modeling capabilities comparable to the only available solution, existing likelihood-free inference methods, are achievable with a computational cost suitable for online applications. Finally, we demonstrate how AI-assistants can now use cognitive models for online interaction in a menu-search task, which has so far required hours of computation during interaction.	翻訳日:2023-06-29 18:31:33 公開日:2023-06-28
# ストラグラー緩和のための逐次勾配符号化 Sequential Gradient Coding For Straggler Mitigation ( http://arxiv.org/abs/2211.13802v2 ) ライセンス: Link先を確認	M. Nikhil Krishnan, MohammadReza Ebrahimi, Ashish Khisti	(参考訳) 分散コンピューティングでは、遅いノード(ストラグラー)は通常ボトルネックとなる。 Tandonらによって導入されたGC(Gradient Coding)は、誤り訂正符号の原理を用いて、ストラグラーの存在下で勾配計算を分散する効率的な手法である。本稿では,各勾配の処理をラウンド$t$で開始し,ラウンド$(t+t)$で終了するような勾配列$\{g(1),g(2),\ldots,g(j)\}$の分散計算を考える。ここで$T\geq 0$は遅延パラメータを表す。 GCスキームでは、コーディングは計算ノード間でのみ行われ、結果として$T=0$というソリューションが得られる。一方、$t>0$を持つことで、時間次元を利用するスキームを設計することができる。本稿では,GCと比較して性能向上を示す2つの手法を提案する。最初のスキームでは、GCと未完成タスクの選択的な繰り返しを組み合わせることで、トラグラー緩和の改善を実現しています。私たちの主な貢献を構成する第2のスキームでは、タスクのサブセットにgcを適用し、残りのタスクを反復します。次に、過去のストラグラーパターンに基づいて、労働者とラウンドにまたがる2つのタスクのクラスを適応的に多重化する。理論解析を用いて,第2のスキームが計算負荷を大幅に削減できることを実証する。実験では、256のワーカノードを含むAWS Lambdaクラスタ上で、並列に複数のニューラルネットワークをトレーニングする実践的な設定について検討した。提案手法は, 自然に発生する非シミュレートストラグラーの存在下で, ベースラインGC方式よりも16倍のランタイム改善を実現することができることを示す。 In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation in the presence of stragglers. In this paper, we consider the distributed computation of a sequence of gradients $\{g(1),g(2),\ldots,g(J)\}$, where processing of each gradient $g(t)$ starts in round-$t$ and finishes by round-$(t+T)$. Here $T\geq 0$ denotes a delay parameter. For the GC scheme, coding is only across computing nodes and this results in a solution where $T=0$. On the other hand, having $T>0$ allows for designing schemes which exploit the temporal dimension as well. In this work, we propose two schemes that demonstrate improved performance compared to GC. Our first scheme combines GC with selective repetition of previously unfinished tasks and achieves improved straggler mitigation. In our second scheme, which constitutes our main contribution, we apply GC to a subset of the tasks and repetition for the remainder of the tasks. We then multiplex these two classes of tasks across workers and rounds in an adaptive manner, based on past straggler patterns. Using theoretical analysis, we demonstrate that our second scheme achieves significant reduction in the computational load. In our experiments, we study a practical setting of concurrently training multiple neural networks over an AWS Lambda cluster involving 256 worker nodes, where our framework naturally applies. We demonstrate that the latter scheme can yield a 16\% improvement in runtime over the baseline GC scheme, in the presence of naturally occurring, non-simulated stragglers.	翻訳日:2023-06-29 18:31:21 公開日:2023-06-28
# 周期駆動超低温原子を用いた非可換キラルスピン液体の工学と探索 Engineering and probing non-Abelian chiral spin liquids using periodically driven ultracold atoms ( http://arxiv.org/abs/2211.09777v3 ) ライセンス: Link先を確認	Bo-Ye Sun, Nathan Goldman, Monika Aidelsburger, Marin Bukov	(参考訳) 量子シミュレータを用いた非可換キラルスピン液体の実現と探索を目的として,周期(フロッケ)駆動に基づく寒冷原子を用いたキタエフのハニカムモデルの実装法を提案する。実効的なハミルトニアンを逆周波数展開における主次数に導出し、実効的なマヨラナと渦の自由度を混ぜることなくスペクトルの位相的ギャップを開くことを示した。我々は、マヨルダナフェルミオンの物理を探索する課題に対処し、元の合成スピン自由度にのみアクセスする。具体的には,Floquetドライブの存在下でのギャップ分光とエッジクエンチを用いて,キラルスピン液体相の性質を検出することを提案する。その結果得られるキラルエッジ信号は、中性マヨラナ電流に関連する熱ホール効果と関連しており、現実的に準備された状態に対して頑健であることが判明した。フロッケ工学と強い相互作用を組み合わせることで、量子シミュレータを用いた非可換励起と量子化熱輸送の将来研究への道を開く。 We propose a scheme to implement Kitaev's honeycomb model with cold atoms, based on a periodic (Floquet) drive, in view of realizing and probing non-Abelian chiral spin liquids using quantum simulators. We derive the effective Hamiltonian to leading order in the inverse-frequency expansion, and show that the drive opens up a topological gap in the spectrum without mixing the effective Majorana and vortex degrees of freedom. We address the challenge of probing the physics of Majorana fermions, while having only access to the original composite spin degrees of freedom. Specifically, we propose to detect the properties of the chiral spin liquid phase using gap spectroscopy and edge quenches in the presence of the Floquet drive. The resulting chiral edge signal, which relates to the thermal Hall effect associated with neutral Majorana currents, is found to be robust for realistically-prepared states. By combining strong interactions with Floquet engineering, our work paves the way for future studies of non-Abelian excitations and quantized thermal transport using quantum simulators.	翻訳日:2023-06-29 18:30:52 公開日:2023-06-28
# QueryForm: シンプルなゼロショットフォーム Entity Query Framework QueryForm: A Simple Zero-shot Form Entity Query Framework ( http://arxiv.org/abs/2211.07730v2 ) ライセンス: Link先を確認	Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister	(参考訳) 文書理解のためのゼロショット転送学習は、文書エンティティのアノテートにかかわる高コスト化を支援するために不可欠だが、未検討のシナリオである。本稿では,0ショット方式でフォームライクなドキュメントからエンティティ値を抽出する新しいクエリベースのフレームワークQueryFormを提案する。 queryformには、ドキュメントスキーマと特定のエンティティタイプの両方をクエリに構成するデュアルプロンプトメカニズムが含まれており、トランスフォーマーモデルに単一のエンティティ抽出タスクを実行するように促すために使用される。さらに,HTML アノテーションの弱いフォーム風の Web ページから生成された大規模クエリエンタリティペアを,事前学習型 QueryForm に活用することを提案する。事前トレーニングと微調整を同じクエリベースのフレームワークに統合することにより、queryformでは、さまざまなエンティティやレイアウトを含む構造化ドキュメントからモデルが学習できるようになる。 QueryForm は XFUND (+4.6%~10.1%) と Payment (+3.2%~9.5%) のゼロショットベンチマークの両方に新しい最先端の平均 F1 スコアをセットする。 Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities. We present a novel query-based framework, QueryForm, that extracts entity values from form-like documents in a zero-shot fashion. QueryForm contains a dual prompting mechanism that composes both the document schema and a specific entity type into a query, which is used to prompt a Transformer model to perform a single entity extraction task. Furthermore, we propose to leverage large-scale query-entity pairs generated from form-like webpages with weak HTML annotations to pre-train QueryForm. By unifying pre-training and fine-tuning into the same query-based framework, QueryForm enables models to learn from structured documents containing various entities and layouts, leading to better generalization to target document types without the need for target-specific training data. QueryForm sets new state-of-the-art average F1 score on both the XFUND (+4.6%~10.1%) and the Payment (+3.2%~9.5%) zero-shot benchmark, with a smaller model size and no additional image input.	翻訳日:2023-06-29 18:30:25 公開日:2023-06-28
# 相互作用する可積分および非可積分スピン-1/2XYZ鎖における時間外秩序相関子のスロー緩和 Slow relaxation of out-of-time-ordered correlators in interacting integrable and nonintegrable spin-1/2 XYZ chains ( http://arxiv.org/abs/2211.07073v2 ) ライセンス: Link先を確認	Vinitha Balachandran, Lea F. Santos, Marcos Rigol, and Dario Poletti	(参考訳) 時間外順序相関器(OTOC)は量子情報のスクランブルを特徴づけるのに役立ち、通常は非可積分系の文脈で研究される。本研究では、古典的対応を持たないレジームにおける可積分および非可積分スピン-1/2XYZ鎖の相互作用におけるOTOCの緩和ダイナミクスを比較する。両方の種類の鎖において、$U(1)$や超対称性のような対称性の存在を利用して、OTOC作用素がハミルトン作用素と重複するか否かを考察する。 OTOCs の緩和は、連鎖が可積分であるか非可積分であるかに関わらず、重複が存在するとき(そうでないとき)遅い(高速)ことを示す。遅くなると、OTOCのダイナミクスは2点相関器のそれに近いことが示される。数値計算を用いてオトクの動力学を研究し,エネルギー固有ベイシスにおける対応する局所作用素の対角およびオフ対角行列要素の性質から解析的洞察を得る。 Out-of-time ordered correlators (OTOCs) help characterize the scrambling of quantum information and are usually studied in the context of nonintegrable systems. In this work, we compare the relaxation dynamics of OTOCs in interacting integrable and nonintegrable spin-1/2 XYZ chains in regimes without a classical counterpart. In both kinds of chains, using the presence of symmetries such as $U(1)$ and supersymmetry, we consider regimes in which the OTOC operators overlap or not with the Hamiltonian. We show that the relaxation of the OTOCs is slow (fast) when there is (there is not) an overlap, independently of whether the chain is integrable or nonintegrable. When slow, we show that the OTOC dynamics follows closely that of the two-point correlators. We study the dynamics of OTOCs using numerical calculations, and gain analytical insights from the properties of the diagonal and of the off-diagonal matrix elements of the corresponding local operators in the energy eigenbasis.	翻訳日:2023-06-29 18:30:01 公開日:2023-06-28
# 双曲表現学習の数値的安定性 The Numerical Stability of Hyperbolic Representation Learning ( http://arxiv.org/abs/2211.00181v3 ) ライセンス: Link先を確認	Gal Mishne, Zhengchao Wan, Yusu Wang, Sheng Yang	(参考訳) 球の半径が指数関数的に増加すると、双曲空間は任意に小さな歪みで木を埋め込むことができ、したがって階層的なデータセットを表現するために広く注目を集めている。しかし、この指数的成長特性は数値的な不安定さの代償となり、双曲型学習モデルの訓練は時に破滅的なnan問題を引き起こし、浮動小数点演算において表現不能な値に遭遇する。本研究では,双曲空間に対する2つの人気モデルの極限,すなわちポアンカーの球とローレンツ模型を慎重に解析する。まず,64ビットの算術システムにおいて,ポアンカルの球は点を正しく表現するためのローレンツモデルよりも比較的大きな容量を持つことを示す。そして,最適化の観点から,ポアンカーの球に対するローレンツモデルの優位性を理論的に検証する。両方のモデルの数値的な制限を考えると、これらの制限を緩和できる双曲空間のユークリッドパラメトリゼーションを1つ特定する。さらに、このユークリッドパラメトリゼーションを双曲型超平面に拡張し、双曲型SVMの性能を向上させる能力を示す。 Given the exponential growth of the volume of the ball w.r.t. its radius, the hyperbolic space is capable of embedding trees with arbitrarily small distortion and hence has received wide attention for representing hierarchical datasets. However, this exponential growth property comes at a price of numerical instability such that training hyperbolic learning models will sometimes lead to catastrophic NaN problems, encountering unrepresentable values in floating point arithmetic. In this work, we carefully analyze the limitation of two popular models for the hyperbolic space, namely, the Poincar\'e ball and the Lorentz model. We first show that, under the 64 bit arithmetic system, the Poincar\'e ball has a relatively larger capacity than the Lorentz model for correctly representing points. Then, we theoretically validate the superiority of the Lorentz model over the Poincar\'e ball from the perspective of optimization. Given the numerical limitations of both models, we identify one Euclidean parametrization of the hyperbolic space which can alleviate these limitations. We further extend this Euclidean parametrization to hyperbolic hyperplanes and exhibits its ability in improving the performance of hyperbolic SVM.	翻訳日:2023-06-29 18:28:50 公開日:2023-06-28
# 無限格子上の量子スピン系に対する位数1のワッサーシュタイン距離 The Wasserstein distance of order 1 for quantum spin systems on infinite lattices ( http://arxiv.org/abs/2210.11446v2 ) ライセンス: Link先を確認	Giacomo De Palma and Dario Trevisan	(参考訳) 我々は、特定の量子 $w_1$ 距離と呼ばれる格子 $\mathbb{z}^d$ 上の位数 1 から量子スピン系へのワッサーシュタイン距離の一般化を提案する。この提案は[de palma et al., ieee trans. inf. theory 67, 6627 (2021)]のquditsに対する$w_1$ distance に基づいており、任意の有限個のスピン上の限界状態が正準基底で対角的である量子状態に対してornsteinの$\bar{d}$- distance を回復する。また、$\mathbb{Z}^d$ 上の量子相互作用に対するリプシッツ定数の一般化を提案し、そのような量子リプシッツ定数と特定の量子 $W_1$ 距離が互いに双対であることを証明する。我々は、量子$W_1$距離という観点から有限個の量子スピンに対するフォン・ノイマンエントロピーに対する新しい連続性を証明し、それを用いて、$\mathbb{Z}^d$上の量子スピン系に対する特定の量子$W_1$距離という観点から、特定のフォン・ノイマンエントロピーに対する連続性を証明する。最後に、臨界温度を超える局所的な量子交換相互作用が輸送コストの不等式を満たすことを証明し、ギブス状態の特異性を示す。 We propose a generalization of the Wasserstein distance of order 1 to quantum spin systems on the lattice $\mathbb{Z}^d$, which we call specific quantum $W_1$ distance. The proposal is based on the $W_1$ distance for qudits of [De Palma et al., IEEE Trans. Inf. Theory 67, 6627 (2021)] and recovers Ornstein's $\bar{d}$-distance for the quantum states whose marginal states on any finite number of spins are diagonal in the canonical basis. We also propose a generalization of the Lipschitz constant to quantum interactions on $\mathbb{Z}^d$ and prove that such quantum Lipschitz constant and the specific quantum $W_1$ distance are mutually dual. We prove a new continuity bound for the von Neumann entropy for a finite set of quantum spins in terms of the quantum $W_1$ distance, and we apply it to prove a continuity bound for the specific von Neumann entropy in terms of the specific quantum $W_1$ distance for quantum spin systems on $\mathbb{Z}^d$. Finally, we prove that local quantum commuting interactions above a critical temperature satisfy a transportation-cost inequality, which implies the uniqueness of their Gibbs states.	翻訳日:2023-06-29 18:28:19 公開日:2023-06-28
# 準確率表現における量子ベイズ推論 Quantum Bayesian Inference in Quasiprobability Representations ( http://arxiv.org/abs/2301.01952v2 ) ライセンス: Link先を確認	Clive Cenxin Aw, Kelvin Onggadinata, Dagomir Kaszlikowski, Valerio Scarani	(参考訳) ベイズの法則は情報科学や物理科学において重要な論理推論の役割を果たす。量子状態への拡張は、近年のいくつかの研究の対象となっている。これらのベイズの規則の量子バージョンはヒルベルト空間の言語で表現されている。本稿では,正規準確率表現(離散ウィグナー表現を含む)と対称的かつ情報的完全正の演算子値測度(sic-povms)の2つの標準的選択に対する明示的な公式を用いて,任意の準確率表現におけるpetz回復写像の表現を導出する。古典理論と量子理論の論理的推論の核となる違いは、(quasi-)確率ベクトルに作用する(quasi-)確率行列の同じ数学的構文を用いることで、チャネルの表現よりもむしろ参照の操作において見出される。 Bayes' rule plays a crucial piece of logical inference in information and physical sciences alike. Its extension into the quantum regime has been the object of several recent works. These quantum versions of Bayes' rule have been expressed in the language of Hilbert spaces. In this paper, we derive the expression of the Petz recovery map within any quasiprobability representation, with explicit formulas for the two canonical choices of normal quasiprobability representations (which include Discrete Wigner representations) and of representations based on symmetric, informationally complete positive operator-valued measures (SIC-POVMs). By using the same mathematical syntax of (quasi-)stochastic matrices acting on (quasi-)stochastic vectors, the core difference in logical inference between classical and quantum theory is found in the manipulation of the reference prior rather than in the representation of the channel.	翻訳日:2023-06-29 18:21:58 公開日:2023-06-28
# 深いRプログラミング Deep R Programming ( http://arxiv.org/abs/2301.01188v3 ) ライセンス: Link先を確認	Marek Gagolewski	(参考訳) Deep R Programmingは、データサイエンスの最も人気のある言語の1つである包括的で詳細な入門コースである。これは野心的な学生、専門家、研究者に、この強力な環境の独立したユーザーになるための知識とスキルを与え、データラングリングや分析、数値計算、統計学、機械学習に関するあらゆる問題に取り組むことができる。この教科書は非営利プロジェクトです。オンライン版とPDF版は <https://deepr.gagolewski.com/> で無料で入手できる。 Deep R Programming is a comprehensive and in-depth introductory course on one of the most popular languages for data science. It equips ambitious students, professionals, and researchers with the knowledge and skills to become independent users of this potent environment so that they can tackle any problem related to data wrangling and analytics, numerical computing, statistics, and machine learning. This textbook is a non-profit project. Its online and PDF versions are freely available at <https://deepr.gagolewski.com/>.	翻訳日:2023-06-29 18:21:42 公開日:2023-06-28
# チューナブル有効結合を持つKerrパラメトリック発振器の相関振動 Correlated oscillations in Kerr parametric oscillators with tunable effective coupling ( http://arxiv.org/abs/2212.13682v3 ) ライセンス: Link先を確認	T. Yamaji and S. Masuda and A. Yamaguchi and T. Satoh and A. Morioka and Y. Igarashi and M. Shirane and T. Yamamoto	(参考訳) 単一光子kerrレジームにおける2つの分散回路ジョセフソンパラメトリック発振器からなる系の同時パラメトリック振動を静電容量で結合した。系のエネルギーは、振幅と符号がパラメトリックポンプ間の相対位相に依存する効果的なカップリングを持つ2ビットイジングハミルトニアンによって記述される。パラメトリック振動の2相相は相互に相関し, ポンプ位相を調整することで相関のパリティと強度を制御できることを実証した。観測された相関は, 純粋な強調を考慮したシミュレーションで再現される。本結果は、KPOネットワークからなるIsingマシンハードウェアで使用可能な外部マイクロ波の位相によるハミルトンパラメータのチューニング性を示す。 We study simultaneous parametric oscillations in a system composed of two distributed-element-circuit Josephson parametric oscillators in the single-photon Kerr regime coupled via a static capacitance. The energy of the system is described by a two-bit Ising Hamiltonian with an effective coupling whose amplitude and sign depend on the relative phase between parametric pumps. We demonstrate that the binary phases of the parametric oscillations are correlated with each other, and that the parity and strength of the correlation can be controlled by adjusting the pump phase. The observed correlation is reproduced in our simulation taking pure dephasing into account. The present result demonstrates the tunability of the Hamiltonian parameters by the phase of external microwave, which can be used in the Ising machine hardware composed of the KPO network.	翻訳日:2023-06-29 18:21:29 公開日:2023-06-28
# クリロフ状態と作用素複素量に対する普遍的アプローチ A universal approach to Krylov State and Operator complexities ( http://arxiv.org/abs/2212.10583v3 ) ライセンス: Link先を確認	Mohsen Alishahiha and Souvik Banerjee	(参考訳) 我々は、Krylov状態と演算子複雑性の両方を同じ足場に配置できる一般的な枠組みを提案する。我々の形式論において、クリロフ複雑性は、作用素複雑性に対してチャネル状態写像によって得られる二重ヒルベルト空間上に存在する関連する状態の密度行列によって定義される。この密度行列の観点からの複雑性の統一定義により、クリロフ複雑性の概念を部分領域あるいは混合状態複雑性に拡張し、また自然にクリロフ相互複雑度にも拡張することができる。このフレームワークは、複雑さというホログラフィック概念をうまく包含していることを示す。 We present a general framework in which both Krylov state and operator complexities can be put on the same footing. In our formalism, the Krylov complexity is defined in terms of the density matrix of the associated state which, for the operator complexity, lives on a doubled Hilbert space obtained through the channel-state map. This unified definition of complexity in terms of the density matrices enables us to extend the notion of Krylov complexity, to subregion or mixed state complexities and also naturally to the Krylov mutual complexity. We show that this framework also encompasses nicely the holographic notions of complexity.	翻訳日:2023-06-29 18:21:17 公開日:2023-06-28
# トランスフォーマーに基づくバイオメディカル言語モデルのドメイン内適応 Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models ( http://arxiv.org/abs/2212.10422v3 ) ライセンス: Link先を確認	Tommaso Mario Buonocore, Claudio Crema, Alberto Redolfi, Riccardo Bellazzi, Enea Parimbelli	(参考訳) デジタル医療の時代には、病院で毎日生成される膨大なテキスト情報は、タスク固有の、微調整されたバイオメディカル言語表現モデル、患者のケアと管理の改善で活用できる、必須だが未使用の資産である。このような特殊なドメインに対しては、広範囲のチェックポイントから派生した微調整モデルが、大規模なドメイン内リソースに対する追加のトレーニングラウンドに大きく貢献することを示した。しかし、これらのリソースはイタリア語のような低リソース言語には到達できないことが多く、地元の医療機関がドメイン内適応を採用するのを妨げている。このギャップを減らすために,我々の研究は,英語以外の言語で生物医学的言語モデルを導出するための2つのアプローチについて検討した。1つは,英語リソースのニューラルネットワーク翻訳に基づく,品質よりも量を重視する,もう1つは,イタリア語でネイティブに書かれたハイグレードで狭スコープのコーパスに基づく,量よりも品質を優先する,という,具体的なユースケースである。本研究は, 生物医学的適応のためのデータ品質よりもデータ量に厳しい制約があることを示すが, 高品質なデータの結合は, 比較的サイズが制限されたコーパスを扱う場合でも, モデル性能を向上させることができる。我々の調査から得られたモデルは、イタリアの病院やアカデミアにとって重要な研究機会を開放する可能性がある。最後に、この研究から学んだ一連の教訓は、他の低リソース言語や異なるドメイン設定に一般化可能なバイオメディカル言語モデルを構築するためのソリューションに対する貴重な洞察を構成する。 In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can largely benefit additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions to employ in-domain adaptation. In order to reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use-case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insights towards a solution to build biomedical language models that are generalizable to other less-resourced languages and different domain settings.	翻訳日:2023-06-29 18:21:06 公開日:2023-06-28
# ベクターレグレッションのサポート:リスククワッドローグフレームワーク Support Vector Regression: Risk Quadrangle Framework ( http://arxiv.org/abs/2212.09178v4 ) ライセンス: Link先を確認	Anton Malandii, Stan Uryasev	(参考訳) 本稿では, 最適化, リスク管理, 統計的推定を関連付ける基本リスク二次理論の文脈において, サポートベクトル回帰 (svr) について検討する。 SVR, $\varepsilon$-SVR および $\nu$-SVR の2つの定式化は、それぞれ等価なエラー対策(Vapnik error と CVaR norm)の最小化に対応する。これらの誤差測度は、対応するリスク二次数を定義する。 SVRに対応する基本リスク四角形を構築することにより、SVRは2つの対称条件量子平均の漸近的に偏りのない推定器であることを示す。さらに,一般確率環境での$\varepsilon$-SVRと$\nu$-SVRの等価性を証明した。さらに、SVRは正規化ペナルティを持つ正規偏差最小化問題として定式化される。最後に、リスク四角形フレームワークにおけるSVRの二重定式化が導出される。 This paper investigates Support Vector Regression (SVR) in the context of the fundamental risk quadrangle theory, which links optimization, risk management, and statistical estimation. It is shown that both formulations of SVR, $\varepsilon$-SVR and $\nu$-SVR, correspond to the minimization of equivalent error measures (Vapnik error and CVaR norm, respectively) with a regularization penalty. These error measures, in turn, define the corresponding risk quadrangles. By constructing the fundamental risk quadrangle, which corresponds to SVR, we show that SVR is the asymptotically unbiased estimator of the average of two symmetric conditional quantiles. Further, we prove the equivalence of the $\varepsilon$-SVR and $\nu$-SVR in a general stochastic setting. Additionally, SVR is formulated as a regular deviation minimization problem with a regularization penalty. Finally, the dual formulation of SVR in the risk quadrangle framework is derived.	翻訳日:2023-06-29 18:20:38 公開日:2023-06-28
# 国家対立型マルチエージェント強化学習の解決策とは? What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning? ( http://arxiv.org/abs/2212.02705v4 ) ライセンス: Link先を確認	Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, Fei Miao	(参考訳) MARL(Multi-Agent Reinforcement Learning)は,エージェントのポリシーが正確な状態情報に基づいていると仮定して開発されている。しかし、Deep Reinforcement Learning (DRL)を通じて学んだ政策は、敵国の摂動攻撃に影響を受けやすい。本研究では,MARL の基本的特性を状態不確実性下で調査する試みとして,SAMG (State-Adversarial Markov Game) を提案する。分析の結果,最適エージェント政策とロバストnash均衡の一般的な解概念は必ずしもsamgに存在しないことがわかった。この困難を回避するために,エージェントが最悪の状態値の最大化を目指す,ロバストエージェントポリシーと呼ばれる新しいソリューションを考える。我々は,有限状態および有限作用samgに対するロバストエージェントポリシーの存在を証明する。さらに、状態不確実性の下でMARLエージェントの堅牢なポリシーを学習するためのロバスト多エージェントアクタークリティカル(RMA3C)アルゴリズムを提案する。実験により,本アルゴリズムは状態摂動に直面する場合の既存手法よりも優れ,MARLポリシーの堅牢性を大幅に向上することが示された。私たちのコードはhttps://songyanghan.github.io/what_is_solution/で公開しています。 Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate the fundamental properties of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/.	翻訳日:2023-06-29 18:19:41 公開日:2023-06-28
# narrasum: 物語要約のための大規模データセット NarraSum: A Large-Scale Dataset for Abstractive Narrative Summarization ( http://arxiv.org/abs/2212.01476v2 ) ライセンス: Link先を確認	Chao Zhao, Faeze Brahman, Kaiqiang Song, Wenlin Yao, Dian Yu, Snigdha Chaturvedi	(参考訳) 物語の要約は、最も健全な出来事とキャラクターを記述するための物語の蒸留版を作ることを目的としている。物語の要約は、出来事の因果関係と性格行動を理解する必要があるため、難しい。この方向の研究を促進するために,大規模な物語要約データセットであるNarraSumを提案する。 122kの物語文書を収録し、様々なジャンルの映画やテレビ番組の筋書きや、それらに対応する抽象要約から収集する。実験の結果,NarraSumにおける人間と最先端の要約モデルの間には大きなパフォーマンスギャップが存在することがわかった。このデータセットは、今後の要約研究や、自然言語の理解と生成に関する広範な研究を促進することを願っている。データセットはhttps://github.com/zhaochaocs/narrasumで入手できる。 Narrative summarization aims to produce a distilled version of a narrative to describe its most salient events and characters. Summarizing a narrative is challenging as it requires an understanding of event causality and character behaviors. To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset. It contains 122K narrative documents, which are collected from plot descriptions of movies and TV episodes with diverse genres, and their corresponding abstractive summaries. Experiments show that there is a large performance gap between humans and the state-of-the-art summarization models on NarraSum. We hope that this dataset will promote future research in summarization, as well as broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum.	翻訳日:2023-06-29 18:19:19 公開日:2023-06-28
# トランスベース学習最適化 Transformer-Based Learned Optimization ( http://arxiv.org/abs/2212.01055v4 ) ライセンス: Link先を確認	Erik G\"artner, Luke Metz, Mykhaylo Andriluka, C. Daniel Freeman, Cristian Sminchisescu	(参考訳) 本稿では,ニューラルネットワークを用いたオプティマイザ更新ステップの計算を行うための新しい学習最適化手法を提案する。最適化器のパラメータは、最適化タスクのセットのトレーニングによって学習され、効率よく最小化を行う。私たちのイノベーションは、古典的なbfgsアルゴリズムにインスパイアされた学習オプティマイザのための、新しいニューラルネットワークアーキテクチャであるoptimusです。 BFGSと同様に、プレコンディショニング行列をランク1更新の和として推定するが、Transformerベースのニューラルネットワークを用いてこれらの更新をステップ長と方向とともに予測する。近年の学習された最適化に基づくアプローチとは対照的に,我々の定式化により,対象問題のパラメータ空間の次元をまたいだ条件付けが可能となった。提案手法の利点は,これまで最適化アルゴリズムの評価に用いられてきた目標関数と,物理に基づく3次元人体動作の視覚的再構成の現実的実現に有効であることを示す。 We propose a new approach to learned optimization where we represent the computation of an optimizer's update step using a neural network. The parameters of the optimizer are then learned by training on a set of optimization tasks with the objective to perform minimization efficiently. Our innovation is a new neural network architecture, Optimus, for the learned optimizer inspired by the classic BFGS algorithm. As in BFGS, we estimate a preconditioning matrix as a sum of rank-one updates but use a Transformer-based neural network to predict these updates jointly with the step length and direction. In contrast to several recent learned optimization-based approaches, our formulation allows for conditioning across the dimensions of the parameter space of the target problem while remaining applicable to optimization tasks of variable dimensionality without retraining. We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for the evaluation of optimization algorithms, as well as on the real world-task of physics-based visual reconstruction of articulated 3d human motion.	翻訳日:2023-06-29 18:19:07 公開日:2023-06-28
# サンプリングのための近位アルゴリズムの次元依存性の改善 Improved dimension dependence of a proximal algorithm for sampling ( http://arxiv.org/abs/2302.10081v2 ) ライセンス: Link先を確認	Jiaojiao Fan, Bo Yuan and Yongxin Chen	(参考訳) 本研究では,すべての古典的設定(特にlog-concave,log-concave,logarithmic-sobolev inequality (lsi),poincar\'e inequality)において,より汎用的な半スムースあるいは複合ポテンシャルを用いた,より優れた複雑性境界を実現するサンプリングアルゴリズムを提案する。提案アルゴリズムは, 〜\citet{lee2021structured} で導入された近位標本に基づく。この近位サンプリング器の性能は、近位サンプリング器の重要なステップである制限されたガウスオラクル(RGO)によって決定される。この研究の主な貢献は、近似的拒絶サンプリングに基づくRGOの不正確な実現である。 RGOの不等式を束縛するために、ガウス分布上の半滑らか関数に対する新しい濃度不等式を確立し、リプシッツ函数に対するよく知られた濃度不等式を拡張する。 RGOの実装を近位サンプリングに応用し、ほぼすべての設定で最先端の複雑さ境界を達成する。例えば、強い対数対数分布の場合、我々の手法は、MALA の minimax 境界よりも、ウォームスタートのない$\tilde\mathcal{O}(\kappa d^{1/2})$ の複雑さを持つ。 LSIを満たす分布に対して、我々の境界は$\tilde \mathcal{O}(\hat \kappa d^{1/2})$である。 We propose a sampling algorithm that achieves superior complexity bounds in all the classical settings (strongly log-concave, log-concave, Logarithmic-Sobolev inequality (LSI), Poincar\'e inequality) as well as more general settings with semi-smooth or composite potentials. Our algorithm is based on the proximal sampler introduced in~\citet{lee2021structured}. The performance of this proximal sampler is determined by that of the restricted Gaussian oracle (RGO), a key step in the proximal sampler. The main contribution of this work is an inexact realization of RGO based on approximate rejection sampling. To bound the inexactness of RGO, we establish a new concentration inequality for semi-smooth functions over Gaussian distributions, extending the well-known concentration inequality for Lipschitz functions. Applying our RGO implementation to the proximal sampler, we achieve state-of-the-art complexity bounds in almost all settings. For instance, for strongly log-concave distributions, our method has complexity bound $\tilde\mathcal{O}(\kappa d^{1/2})$ without warm start, better than the minimax bound for MALA. For distributions satisfying the LSI, our bound is $\tilde \mathcal{O}(\hat \kappa d^{1/2})$ where $\hat \kappa$ is the ratio between smoothness and the LSI constant, better than all existing bounds.	翻訳日:2023-06-29 18:12:18 公開日:2023-06-28
# 非特定運動データを用いた拡張可能なXRユーザ同定 Extensible Motion-based Identification of XR Users using Non-Specific Motion Data ( http://arxiv.org/abs/2302.07517v3 ) ライセンス: Link先を確認	Christian Schell, Konstantin Kobs, Tamara Fernando, Andreas Hotho, Marc Erich Latoschik	(参考訳) 本稿では,距離ベースと分類に基づくアプローチの強みを組み合わせることで,拡張現実ユーザの動きを識別する。そこで我々は,深層メトリック学習を活用した組込みモデルについて検討する。われわれは,VRゲーム‘Half-Life: Alyx’’をプレイするユーザのデータセット上でモデルをトレーニングし,アート分類ベースモデルの状態をベースラインとして,複数の実験と分析を行う。その結果,埋め込み型手法が有効であった。 1) 数分間の登録データを使用して,非特定動作から新規ユーザを識別できる。 2)新しいユーザーを数秒以内に登録できるが、ベースラインアプローチの再トレーニングにはおよそ1日かかる。 3) 登録データが少ない場合にのみ,ベースラインアプローチよりも信頼性が高い。 4) 異なるVRデバイスで記録された別のデータセットから新しいユーザーを特定するために使用することができる。全体として、我々のソリューションは、拡張可能なxrユーザ識別システムの基礎であり、幅広いユーザ動作に適用できる。また、専門知識やハードウェア、あるいはディープラーニングモデルをトレーニングするためのデータを必要としない、XR実践者が使用可能なプロダクション対応モデルの道を開く。 In this paper, we combine the strengths of distance-based and classification-based approaches for the task of identifying extended reality users by their movements. For this we explore an embedding-based model that leverages deep metric learning. We train the model on a dataset of users playing the VR game ``Half-Life: Alyx'' and conduct multiple experiments and analyses using a state of the art classification-based model as baseline. The results show that the embedding-based method 1) is able to identify new users from non-specific movements using only a few minutes of enrollment data, 2) can enroll new users within seconds, while retraining the baseline approach takes almost a day, 3) is more reliable than the baseline approach when only little enrollment data is available, 4) can be used to identify new users from another dataset recorded with different VR devices. Altogether, our solution is a foundation for easily extensible XR user identification systems, applicable to a wide range of user motions. It also paves the way for production-ready models that could be used by XR practitioners without the requirements of expertise, hardware, or data for training deep learning models.	翻訳日:2023-06-29 18:11:46 公開日:2023-06-28
# 完全共変機械学習に向けて Towards fully covariant machine learning ( http://arxiv.org/abs/2301.13724v2 ) ライセンス: Link先を確認	Soledad Villar (JHU), David W. Hogg (NYU, MPIA, Flatiron), Weichi Yao (NYU), George A. Kevrekidis (JHU, LANL), Bernhard Sch\"olkopf (MPI-IS)	(参考訳) 任意のデータ表現は任意の調査員の選択を伴う。これらの選択はデータ生成過程の外部にあるため、それぞれの選択は1つの可能な表現を別の表現に取る変換の群に対応する正確な対称性をもたらす。これらはパッシブ対称性であり、座標自由度、ゲージ対称性、単位共分散を含み、これらは全て物理学において重要な結果をもたらした。機械学習において、最も目に見える受動対称性はグラフの相対的あるいは置換的対称性である。私たちの目標は、プレイ中の多くの受動的対称性の機械学習の意味を理解することです。受動的対称性を尊重すべきならば,機械学習の実践について,dos と not について議論する。因果モデリングとの関連について議論し、学習問題の目的がサンプルから一般化することである場合、受動的対称性の実装は特に有用であると主張する。この論文は概念的であり、物理学、数学、機械学習の言語に翻訳される。受動的対称性の考察と実装は、機械学習が20世紀に物理学を変革したのと同じ方法で役立つと信じている。 Any representation of data involves arbitrary investigator choices. Because those choices are external to the data-generating process, each choice leads to an exact symmetry, corresponding to the group of transformations that takes one possible representation to another. These are the passive symmetries; they include coordinate freedom, gauge symmetry, and units covariance, all of which have led to important results in physics. In machine learning, the most visible passive symmetry is the relabeling or permutation symmetry of graphs. Our goal is to understand the implications for machine learning of the many passive symmetries in play. We discuss dos and don'ts for machine learning practice if passive symmetries are to be respected. We discuss links to causal modeling, and argue that the implementation of passive symmetries is particularly valuable when the goal of the learning problem is to generalize out of sample. This paper is conceptual: It translates among the languages of physics, mathematics, and machine-learning. We believe that consideration and implementation of passive symmetries might help machine learning in the same ways that it transformed physics in the twentieth century.	翻訳日:2023-06-29 18:11:09 公開日:2023-06-28
# 未発見の論理推論と学位カリキュラムの一般化 Generalization on the Unseen, Logic Reasoning and Degree Curriculum ( http://arxiv.org/abs/2301.13105v2 ) ライセンス: Link先を確認	Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk	(参考訳) 本稿では,論理関数の学習を,分散一般化の強い場合である未完(gotu)設定の一般化に焦点をあてて検討する。これは、ある推論タスク(例えば算術/論理学)におけるデータのリッチな組合せの性質が、代表的データのサンプリングを困難にし、GOTUの下での学習が成功すると、'extrapolating'あるいは'reasoning'学習者の最初のビゲットを与えるという事実が動機である。次に、(S)GDでトレーニングされた異なるネットワークアーキテクチャがGOTUの下でどのように機能するかを研究し、トランスフォーマーのインスタンス、ランダム特徴モデル、対角線ネットワークを含むネットワークモデルのクラスにおいて、無目でmin-degree-interpolatorが学習されるという理論的および実験的証拠を提供する。また、学習率や平均フィールドネットワークが漏れやすい最小限の解に到達した証拠も提示する。これらの知見は,(1)長さ一般化問題(例: Anil et al. 2022)を説明すること,(2)単項をより効率的に学習するDegree-Curriculumというカリキュラム学習アルゴリズムを導入すること,の2つに繋がる。 This paper considers the learning of logical (Boolean) functions with focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree-interpolator is learned on the unseen. We also provide evidence that other instances with larger learning rates or mean-field networks reach leaky min-degree solutions. These findings lead to two implications: (1) we provide an explanation to the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports.	翻訳日:2023-06-29 18:10:55 公開日:2023-06-28
# EHRSQL: 電子健康記録のための実践的なテキストからSQLのベンチマーク EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records ( http://arxiv.org/abs/2301.07695v4 ) ライセンス: Link先を確認	Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, Edward Choi	(参考訳) 電子健康記録(EHR)のための新しいテキスト間SQLデータセットを提案する。発話は、医師、看護師、保険審査および健康記録チームを含む222人の病院スタッフから集められた。構造化EMHデータに基づくQAデータセットを構築するため,大学病院で調査を行い,種問合せの作成に回答した。次に、これらの質問をMIMIC-IIIとeICUの2つのオープンソースのEHRデータベースに手動でリンクし、様々な時間表現と、この調査から収集されたデータセットに持たない質問を含む。私たちのデータセットには、ユニークな課題があります。 1) 病院における幅広いニーズを反映したsqlクエリを生成し、簡単な検索や生存率の計算などの複雑な操作を含む。 2)医療における時間感性質問に対する各種時間表現の理解と対応 3) ある質問が回答可能か否かを区別する。当社のデータセットであるEHRSQLは、構造化されたEHRデータ上でのQAモデルの開発と評価のための実用的なベンチマークとして機能し、テキストからSQLまでの研究と、その医療における実際の展開のギャップを埋めるための一歩を踏み出すことができると考えています。 EHRSQLはhttps://github.com/glee4810/EHRSQLで入手できる。 We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and used the responses to create seed questions. We then manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset, which were also collected from the poll. Our dataset poses a unique set of challenges: the model needs to 1) generate SQL queries that reflect a wide range of needs in the hospital, including simple retrieval and complex operations such as calculating survival rate, 2) understand various time expressions to answer time-sensitive questions in healthcare, and 3) distinguish whether a given question is answerable or unanswerable. We believe our dataset, EHRSQL, can serve as a practical benchmark for developing and assessing QA models on structured EHR data and take a step further towards bridging the gap between text-to-SQL research and its real-life deployment in healthcare. EHRSQL is available at https://github.com/glee4810/EHRSQL.	翻訳日:2023-06-29 18:09:44 公開日:2023-06-28
# コンピュータビジョンとLSTMニューラルネットワークを用いた太陽コロナホール解析と予測 Solar Coronal Hole Analysis and Prediction using Computer Vision and LSTM Neural Network ( http://arxiv.org/abs/2301.06732v4 ) ライセンス: Link先を確認	Juyoung Yun	(参考訳) 人類が宇宙を探索し始めるにつれ、宇宙の天気の重要性が明らかになってきた。宇宙天気現象の一種であるコロナホールが、航空機や衛星の運用に影響を与えることが確立されている。コロナホール(英: coronal hole)は、オープン磁場線と比較的低温を特徴とする太陽上の領域であり、太陽風を平均より高い速度で放出する。本研究では,地球へのコロナホールの影響に備えるために,コンピュータビジョンを用いてコロナホール領域を検出し,太陽動力学観測所(sdo)の画像に基づいてその大きさを計算する。我々は、太陽の各領域のコロナホールを比較し、相関関係を分析する。次に, 深層学習, 特にLong Short-Term Memory (LSTM) 手法を実装し, コロナホール領域データの傾向を解析し, 7日間にわたる異なる太陽領域におけるそのサイズを予測する。本研究は, コロナホール領域の時系列データを解析することにより, コロナホールの挙動のパターンや傾向を同定し, 宇宙気象事象にどのように影響するかを理解することを目的とする。この研究は、地球と技術システムに影響を与える宇宙天気イベントを予測し、準備する能力を改善するための重要なステップである。 As humanity has begun to explore space, the significance of space weather has become apparent. It has been established that coronal holes, a type of space weather phenomenon, can impact the operation of aircraft and satellites. The coronal hole is an area on the sun characterized by open magnetic field lines and relatively low temperatures, which result in the emission of the solar wind at higher than average rates. In this study, To prepare for the impact of coronal holes on the Earth, we use computer vision to detect the coronal hole region and calculate its size based on images from the Solar Dynamics Observatory (SDO). We compare the coronal holes for each region of the Sun and analyze the correlation. We then implement deep learning techniques, specifically the Long Short-Term Memory (LSTM) method, to analyze trends in the coronal hole area data and predict its size for different sun regions over 7 days. By analyzing time series data on the coronal hole area, this study aims to identify patterns and trends in coronal hole behavior and understand how they may impact space weather events. This research represents an important step towards improving our ability to predict and prepare for space weather events that can affect Earth and technological systems.	翻訳日:2023-06-29 18:09:21 公開日:2023-06-28
# 確率的予測によるリアルタイムてんかん発作検出の遅延短縮 Shorter Latency of Real-time Epileptic Seizure Detection via Probabilistic Prediction ( http://arxiv.org/abs/2301.03465v2 ) ライセンス: Link先を確認	Yankun Xu, Jie Yang, Wenjie Ming, Shuang Wang, and Mohamad Sawan	(参考訳) 近年の研究では、感度性能のよい発作検出アルゴリズムが提案されているが、リアルタイムシナリオにおいて検出遅延を大幅に短縮することは困難である。本稿では,確率的予測によるてんかん発作検出遅延の短縮を目的とした,新しいディープラーニングフレームワークを提案する。我々は,従来の二分法から確率予測への変換を,発作指向脳波記録から横断周期を導入し,ソフトラベルを用いたラベル付け規則を提案することで行った。また, 3D-CNNアーキテクチャと組み合わせたSTFTを用いた新しい特徴抽出手法を提案し, サンプルの予測確率を正確に把握する。さらに,予測確率を高めるための修正重み付け戦略と,検出遅延を大幅に短縮する累積決定ルールを提案する。提案手法は,患者固有の離脱1回限りのクロスバリデーション方式において,CHB-MIT scalp EEG データセットと SWEC-ETHZ 頭蓋内 EEG データセットに実装されている。提案手法は, 交差期99例中94例, 脳波開始後100%の発作検出に成功し, 平均14.84%の正規化予測ictal probability (rpip) 誤差, 2.3 s検出遅延, 0.08/h偽検出率 (fdr) をchb-mitデータセット上で検出した。一方、交差期間中に検出された89例中84例、脳波開始後に100%検出された発作、16.17%のrpipエラー、4.7 s検出遅延、0.08/h fdrがswec-ethzデータセット上で達成されている。得られた検出レイテンシは, 従来研究で報告された最先端結果よりも少なくとも50%短い。 Although recent studies have proposed seizure detection algorithms with good sensitivity performance, there is a remained challenge that they were hard to achieve significantly short detection latency in real-time scenarios. In this manuscript, we propose a novel deep learning framework intended for shortening epileptic seizure detection latency via probabilistic prediction. We are the first to convert the seizure detection task from traditional binary classification to probabilistic prediction by introducing a crossing period from seizure-oriented EEG recording and proposing a labeling rule using soft-label for crossing period samples. And, a novel multiscale STFT-based feature extraction method combined with 3D-CNN architecture is proposed to accurately capture predictive probabilities of samples. Furthermore, we also propose rectified weighting strategy to enhance predictive probabilities, and accumulative decision-making rule to achieve significantly shorter detection latency. We implement the proposed framework on two prevalent datasets -- CHB-MIT scalp EEG dataset and SWEC-ETHZ intracranial EEG dataset in patient-specific leave-one-seizure-out cross-validation scheme. Eventually, the proposed algorithm successfully detected 94 out of 99 seizures during crossing period and 100% seizures detected after EEG onset, averaged 14.84% rectified predictive ictal probability (RPIP) errors of crossing samples, 2.3 s detection latency, 0.08/h false detection rate (FDR) on CHB-MIT dataset. Meanwhile, 84 out of 89 detected seizures during crossing period, 100% detected seizures after EEG onset, 16.17% RPIP errors, 4.7 s detection latency, and 0.08/h FDR are achieved on SWEC-ETHZ dataset. The obtained detection latencies are at least 50% shorter than state-of-the-art results reported in previous studies.	翻訳日:2023-06-29 18:08:59 公開日:2023-06-28
# Auto-AVSR: 自動ラベルによる音声認識 Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels ( http://arxiv.org/abs/2303.14307v3 ) ライセンス: Link先を確認	Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic	(参考訳) 音響雑音に対する頑健性から,音声認識には多くの注目を集めている。近年,大規模モデルとトレーニングセットの使用を中心に,自動・視覚的・音声視覚的音声認識(ASR,VSR,AV-ASR)の性能が大幅に向上している。しかし、データセットの正確なラベル付けには時間と費用がかかる。そこで本研究では,ラベルなしデータセットの自動生成転写を用いて,トレーニングセットのサイズを増加させる方法について検討する。この目的のために、AVSpeechやVoxCeleb2といった非競合データセットを自動的に書き起こすために、公開トレーニング済みのASRモデルを使用します。そして、ARS、VSR、AV-ASRのモデルを拡張トレーニングセットでトレーニングし、LSS2とLSS3のデータセットと追加の自動転写データからなる。近年の文献的傾向であるトレーニングセットのサイズが大きくなると,ノイズによる書き起こしにもかかわらずWERが減少することが示されている。提案手法は,RS2 と LRS3 の AV-ASR 上での最先端性能を実現する。特に、現在の最先端アプローチよりも30%向上したRS3で0.9%のWERを達成し、26倍のトレーニングデータを持つ非公開データセットでトレーニングされたメソッドを上回ります。 Audio-visual speech recognition has received a lot of attention due to its robustness against acoustic noise. Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets. However, accurate labelling of datasets is time-consuming and expensive. Hence, in this work, we investigate the use of automatically-generated transcriptions of unlabelled datasets to increase the training set size. For this purpose, we use publicly-available pre-trained ASR models to automatically transcribe unlabelled datasets such as AVSpeech and VoxCeleb2. Then, we train ASR, VSR and AV-ASR models on the augmented training set, which consists of the LRS2 and LRS3 datasets as well as the additional automatically-transcribed data. We demonstrate that increasing the size of the training set, a recent trend in the literature, leads to reduced WER despite using noisy transcriptions. The proposed model achieves new state-of-the-art performance on AV-ASR on LRS2 and LRS3. In particular, it achieves a WER of 0.9% on LRS3, a relative improvement of 30% over the current state-of-the-art approach, and outperforms methods that have been trained on non-publicly available datasets with 26 times more training data.	翻訳日:2023-06-29 18:03:00 公開日:2023-06-28
# xplainer:x線観測からゼロショット診断へ Xplainer: From X-Ray Observations to Explainable Zero-Shot Diagnosis ( http://arxiv.org/abs/2303.13391v3 ) ライセンス: Link先を確認	Chantal Pellegrini, Matthias Keicher, Ege \"Ozsoy, Petra Jiraskova, Rickmer Braren, Nassir Navab	(参考訳) 医療画像からの診断自動予測は臨床的意思決定を支援する貴重な資源である。しかし、そのようなシステムは、通常、医療領域では不足することが多い大量の注釈付きデータに基づいて訓練される必要がある。ゼロショット法は、ラベル付きデータに頼ることなく、異なる臨床所見を持つ新しい設定への柔軟な適応を可能にすることで、この問題に対処する。さらに, 臨床ワークフローに自動診断を統合するためには, 方法が透明で説明しやすいこと, 医療専門家の信頼度を高め, 正確性検証を容易にすることが必要である。本稿では,臨床現場におけるゼロショット診断のための新しいフレームワークであるXplainerを紹介する。 Xplainerは、比較視覚言語モデルの分類記述アプローチを多言語診断タスクに適用する。具体的には、診断を直接予測する代わりに、放射線技師がX線スキャンで探す記述的観察の存在をモデルに分類し、診断の可能性を推定するために記述子確率を使用する。最終的な診断予測は、基礎となる記述子の予測に基づいて直接行われるため、このモデルは設計によって説明可能である。胸部X線データセットであるCheXpertとChestX-ray14のXplainerを評価し,ゼロショット診断の性能と説明性の向上に有効であることを示した。以上の結果から,Xplainerは意思決定プロセスのより詳細な理解を提供し,臨床診断に有用なツールであることが示唆された。 Automated diagnosis prediction from medical images is a valuable resource to support clinical decision-making. However, such systems usually need to be trained on large amounts of annotated data, which often is scarce in the medical domain. Zero-shot methods address this challenge by allowing a flexible adaption to new settings with different clinical findings without relying on labeled data. Further, to integrate automated diagnosis in the clinical workflow, methods should be transparent and explainable, increasing medical professionals' trust and facilitating correctness verification. In this work, we introduce Xplainer, a novel framework for explainable zero-shot diagnosis in the clinical setting. Xplainer adapts the classification-by-description approach of contrastive vision-language models to the multi-label medical diagnosis task. Specifically, instead of directly predicting a diagnosis, we prompt the model to classify the existence of descriptive observations, which a radiologist would look for on an X-Ray scan, and use the descriptor probabilities to estimate the likelihood of a diagnosis. Our model is explainable by design, as the final diagnosis prediction is directly based on the prediction of the underlying descriptors. We evaluate Xplainer on two chest X-ray datasets, CheXpert and ChestX-ray14, and demonstrate its effectiveness in improving the performance and explainability of zero-shot diagnosis. Our results suggest that Xplainer provides a more detailed understanding of the decision-making process and can be a valuable tool for clinical diagnosis.	翻訳日:2023-06-29 18:02:37 公開日:2023-06-28
# 生成的半教師付き学習と生成的オープンセット認識の関連について On the link between generative semi-supervised learning and generative open-set recognition ( http://arxiv.org/abs/2303.11702v3 ) ライセンス: Link先を確認	Emile Reyn Engelbrecht, Johan du Preez	(参考訳) 本研究では,GANにおける半教師付き学習(SSL)とオープンセット認識(OSR)の関係について検討した。 SSLとOSRを公式にリンクした以前の研究はないが、それぞれの手法は大きな類似点を共有している。具体的には、SSL-GANとOSR-GANはジェネレータに相補的な空間でサンプルを生成し、それぞれの分類器ネットワークを正規化する。続いて、sslとosrの下で訓練された分類器はラベル付きカテゴリ周辺の分類境界を厳しくすることでオープンスペースを一般化する。言い換えれば、SSL-GANを使って訓練された分類器はOSRと逆転を本質的に達成する。このSSL-OSRリンクを証明するため、理論上、実験的に最先端のSSL-GANと最先端のOSR-GANを比較した。その結果,すべてのSSL-GANとOSR-GANは同じ目標に向かって動作し,SSLに最適化されたMargin-GANがSSL-OSRの組み合わせタスクに対して,新たな最先端を設定できることがわかった。将来の研究はSSL-GANとOSR-GANの理論的類似性をさらに探求し、SSL-OSRを他の学習ポリシーに拡張する。 This study investigates the relationship between semi-supervised learning (SSL) and open-set recognition (OSR) under the context of generative adversarial networks (GANs). Although no previous study has formally linked SSL and OSR, their respective methods share striking similarities. Specifically, SSL-GANs and OSR-GANs require their generators to produce samples in the complementary space, which are then used to regularise their respective classifier networks. In turn, classifiers trained under SSL and OSR generalise the open space by tightening classification boundaries around the labelled categories. In other words, a classifier trained using an SSL-GAN intrinsically achieves OSR and vice-versa. To prove this SSL-OSR link, we theoretically and experimentally compare the state-of-the-art SSL-GAN with the state-of-the-art OSR-GAN. Our results find that all SSL-GANs and OSR-GANs work towards the same goal, and that the SSL-optimised Margin-GANs set the new state-of-the-art for the combined task of SSL-OSR. Future studies could further explore the theoretical similarities between SSL-GANs and OSR-GANs, as well as extend SSL-OSR to other learning policies.	翻訳日:2023-06-29 18:02:13 公開日:2023-06-28
# IRGen:画像検索のための生成モデリング IRGen: Generative Modeling for Image Retrieval ( http://arxiv.org/abs/2303.10126v3 ) ライセンス: Link先を確認	Yidan Zhang, Ting Zhang, Dong Chen, Yujing Wang, Qi Chen, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Mao Yang, Qingmin Liao, Baining Guo	(参考訳) 生成的モデリングは自然言語処理やコンピュータビジョンにおいてユビキタスであるが、画像検索への応用は未検討である。本稿では,シーケンス・ツー・シーケンスモデルを用いて画像検索を生成モデルの一形態として再キャストし,現在の統一テーマに寄与する。我々のフレームワークIRGenは、エンドツーエンドの微分検索を可能にする統一モデルであり、直接最適化により優れた性能を実現する。 IRGenの開発中、画像の極めて短いセマンティックな配列に変換するという重要な技術的課題に取り組み、効率的かつ効果的な検索を可能にする。実証実験により,本モデルが一般的に使用される3つのベンチマーク,例えばre recall@10スコアのin-shopデータセットにおけるprecision@10の最高基準法よりも22.9\%高い値が得られることを示した。 While generative modeling has been ubiquitous in natural language processing and computer vision, its application to image retrieval remains unexplored. In this paper, we recast image retrieval as a form of generative modeling by employing a sequence-to-sequence model, contributing to the current unified theme. Our framework, IRGen, is a unified model that enables end-to-end differentiable search, thus achieving superior performance thanks to direct optimization. While developing IRGen we tackle the key technical challenge of converting an image into quite a short sequence of semantic units in order to enable efficient and effective retrieval. Empirical experiments demonstrate that our model yields significant improvement over three commonly used benchmarks, for example, 22.9\% higher than the best baseline method in precision@10 on In-shop dataset with comparable recall@10 score.	翻訳日:2023-06-29 18:01:50 公開日:2023-06-28
# アシラ量子測定による多体系の進化 Evolution of many-body systems under ancilla quantum measurements ( http://arxiv.org/abs/2303.07081v3 ) ライセンス: Link先を確認	Elmer V. H. Doggen, Yuval Gefen, Igor V. Gornyi, Alexander D. Mirlin, Dmitry G. Polyakov	(参考訳) 測定誘起相転移は、実験と理論の両方の観点から、激しい電流研究の対象である。我々は,多体格子系を,射影測定を行う自由度(追加の2つの部位を用いて実装)に結合させることにより,量子測定を実装するという概念を探求する。一次元鎖内の相互作用するハードコアボソンの動的相関に対する繰り返し測定(「ストロボスコープ」)の効果を解析した。このプロトコルの重要な特徴は、検出アンシラが各測定工程後に再起動されないことである。これにより、測定された相関系による累積影響の記憶を維持する。はじめに,アシラを1つの格子サイトと結合するモデルを考える。この設定により、アシラ系相互作用によって変調された自由度のラビ振動を通じてシステムに関する情報を得ることができる。量子軌道の統計は、測定が強くなったときに生じる「量子-ゼノバルブ効果」を示し、低エンタングルメントと高エンタングルメントの間に鋭い分岐がある。数値シミュレーションを2つのアンシラの場合に適用し,その後,全部位の計測に拡張する。この現実的な測定装置により、より抽象的なモデルで以前観察されたように、遠絡測定による遷移の証拠が見つかる。力学は絡み合いエントロピーの広い分布を特徴とする。 Measurement-induced phase transitions are the subject of intense current research, both from an experimental and a theoretical perspective. We explore the concept of implementing quantum measurements by coupling a many-body lattice system to an ancillary degree of freedom (implemented using two additional sites), on which projective measurements are performed. We analyze the effect of repeated (``stroboscopic'') measurements on the dynamical correlations of interacting hard-core bosons in a one-dimensional chain. An important distinctive ingredient of the protocol is the fact that the detector ancillas are not re-initialized after each measurement step. The detector thus maintains memory of the accumulated influence by the measured correlated system. Initially, we consider a model in which the ancilla is coupled to a single lattice site. This setup allows obtaining information about the system through Rabi oscillations in the ancillary degrees of freedom, modulated by the ancilla-system interaction. The statistics of quantum trajectories exhibits a ``quantum-Zeno-valve effect'' that occurs when the measurement becomes strong, with sharp branching between low and high entanglement. We proceed by extending numerical simulations to the case of two ancillas and, then, to measurements on all sites. With this realistic measurement apparatus, we find evidence of a disentangling-entangling measurement-induced transition as was previously observed in more abstract models. The dynamics features a broad distribution of the entanglement entropy.	翻訳日:2023-06-29 18:01:04 公開日:2023-06-28
# マルコフ連鎖の線形統計に対するローゼンタール型不等式 Rosenthal-type inequalities for linear statistics of Markov chains ( http://arxiv.org/abs/2303.05838v2 ) ライセンス: Link先を確認	Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Marina Sheshukova	(参考訳) 本稿では、独立確率変数の和に対するローゼンタールやベルンシュタインの不等式に類似した幾何学的エルゴードマルコフ鎖の加法関数に対する新しい偏差境界を確立する。我々は、対応する鎖の混合時間に対する境界の依存性に特に注意を払う。より正確には、ロゼンタール不等式(英語版)のマルティンゲール版からの定数と結びついた明示的境界と、基礎となるマルコフ核の混合特性を特徴づける定数を確立する。最後に、我々の証明手法は、我々の知る限り、新しいもので、ポアソン分解の繰り返し適用に基づいている。 In this paper, we establish novel deviation bounds for additive functionals of geometrically ergodic Markov chains similar to Rosenthal and Bernstein inequalities for sums of independent random variables. We pay special attention to the dependence of our bounds on the mixing time of the corresponding chain. More precisely, we establish explicit bounds that are linked to the constants from the martingale version of the Rosenthal inequality, as well as the constants that characterize the mixing properties of the underlying Markov kernel. Finally, our proof technique is, up to our knowledge, new and based on a recurrent application of the Poisson decomposition.	翻訳日:2023-06-29 18:00:40 公開日:2023-06-28
# 拡大次元空間における低離散サンプリング:粒子群最適化のための加速法 Low-discrepancy Sampling in the Expanded Dimensional Space: An Acceleration Technique for Particle Swarm Optimization ( http://arxiv.org/abs/2303.03055v2 ) ライセンス: Link先を確認	Feng Wu, Yuelin Zhao, Jianhua Pang, Jun Yan, and Wanxie Zhong	(参考訳) ランダムサンプリングと比較すると,低差分サンプリングの方が探索空間の被覆に有効である。しかし, 粒子群最適化 (pso) に対する低分散サンプルの影響が正か負かは, 既存の研究で明らかに述べられていない。ニダーレイターの定理を用いて、この研究はPSOの誤差解析を完了し、各反復におけるPSOの誤差境界は拡張次元空間におけるサンプル集合の分散に依存することを示した。この誤差解析に基づいて,拡張次元空間における低差分サンプリングによるPSO型アルゴリズムの高速化手法を提案する。加速度法は、拡張次元空間においてランダムサンプリングに比べて分散が小さい低分散サンプル集合を生成することができ、また、各イテレーションにおける誤差を低減し、収束速度を向上できる。高速化手法を標準PSOと総合学習粒子群最適化と組み合わせ,改良アルゴリズムの性能を元のアルゴリズムと比較した。実験の結果, 2つの改良アルゴリズムは同じ精度で収束速度が著しく速いことがわかった。 Compared with random sampling, low-discrepancy sampling is more effective in covering the search space. However, the existing research cannot definitely state whether the impact of a low-discrepancy sample on particle swarm optimization (PSO) is positive or negative. Using Niderreiter's theorem, this study completes an error analysis of PSO, which reveals that the error bound of PSO at each iteration depends on the dispersion of the sample set in an expanded dimensional space. Based on this error analysis, an acceleration technique for PSO-type algorithms is proposed with low-discrepancy sampling in the expanded dimensional space. The acceleration technique can generate a low-discrepancy sample set with a smaller dispersion, compared with a random sampling, in the expanded dimensional space; it also reduces the error at each iteration, and hence improves the convergence speed. The acceleration technique is combined with the standard PSO and the comprehensive learning particle swarm optimization, and the performance of the improved algorithm is compared with the original algorithm. The experimental results show that the two improved algorithms have significantly faster convergence speed under the same accuracy requirement.	翻訳日:2023-06-29 18:00:29 公開日:2023-06-28
# 低次モデリングにおける残差学習のためのDeepONet多重忠実度アプローチ A DeepONet multi-fidelity approach for residual learning in reduced order modeling ( http://arxiv.org/abs/2302.12682v2 ) ライセンス: Link先を確認	Nicola Demo and Marco Tezzele and Gianluigi Rozza	(参考訳) 本稿では,多元的視点とdeeponetsを活用し,減少順序モデルの精度を向上させる新しい手法を提案する。縮小モデルは、元のモデルを単純化することで、リアルタイムな数値近似を提供する。そのような演算によって引き起こされるエラーは通常、高速な計算に到達するために無視され、犠牲にされる。そこで本研究では,ニューラルネットワークによって上記の誤差を学習し,新たな予測を推定できるように,機械学習残差学習にモデル還元を組み合わせることを提案する。我々は,高忠実度情報の利用を最大化し,高次オーダーモデルの構築と残差学習に利用することを強調した。本研究では,センサデータに対する正規直交分解(POD)とギャップピーPODの統合について,最近のDeepONetアーキテクチャを用いて検討する。パラメトリックベンチマーク関数と非線形パラメトリックナビエ-ストークス問題に関する数値的研究を行った。 In the present work, we introduce a novel approach to enhance the precision of reduced order models by exploiting a multi-fidelity perspective and DeepONets. Reduced models provide a real-time numerical approximation by simplifying the original model. The error introduced by the such operation is usually neglected and sacrificed in order to reach a fast computation. We propose to couple the model reduction to a machine learning residual learning, such that the above-mentioned error can be learned by a neural network and inferred for new predictions. We emphasize that the framework maximizes the exploitation of high-fidelity information, using it for building the reduced order model and for learning the residual. In this work, we explore the integration of proper orthogonal decomposition (POD), and gappy POD for sensors data, with the recent DeepONet architecture. Numerical investigations for a parametric benchmark function and a nonlinear parametric Navier-Stokes problem are presented.	翻訳日:2023-06-29 17:59:49 公開日:2023-06-28
# 逐次実験後の最適試験 Optimal tests following sequential experiments ( http://arxiv.org/abs/2305.00403v2 ) ライセンス: Link先を確認	Karun Adusumilli	(参考訳) 近年,逐次実験の理論と応用が飛躍的に進歩している。これらの実験は常に仮説検証を念頭に置いて設計されているわけではないが、実験が完了した後もテストの実行に関心があるかもしれない。本研究の目的は,その漸近的性質を解析し,逐次実験の最適テストの開発を支援することである。我々の重要な発見は、あらゆるテストの漸近的なパワー関数は、各処理でガウス過程が観測され、これらのプロセスのドリフトに対する推論が行われる極限実験において、テストによって一致させることができることである。この結果は、強力なsufficiency結果を含む重要な意味を持つ: どんな候補テストも、逐次実験の種類に関わらず、一定の統計セットのみに依存する必要がある。これらの統計は、各治療が実験の終了までにサンプリングされた回数であり、各治療のスコア(パラメトリックモデル)の最終値や効率的な影響関数(非パラメトリックモデル)のプロセスも合わせている。次に,不偏性,\alpha-spending制約など様々な制約下での漸近的最適検定を特徴付ける。最後に,本研究の結果を,コストライジング,グループシーケンシャルトライアル,バンドイット実験の3つの重要な段階に適用し,これらのシナリオにおいて最適な推論を行う方法を示す。 Recent years have seen tremendous advances in the theory and application of sequential experiments. While these experiments are not always designed with hypothesis testing in mind, researchers may still be interested in performing tests after the experiment is completed. The purpose of this paper is to aid in the development of optimal tests for sequential experiments by analyzing their asymptotic properties. Our key finding is that the asymptotic power function of any test can be matched by a test in a limit experiment where a Gaussian process is observed for each treatment, and inference is made for the drifts of these processes. This result has important implications, including a powerful sufficiency result: any candidate test only needs to rely on a fixed set of statistics, regardless of the type of sequential experiment. These statistics are the number of times each treatment has been sampled by the end of the experiment, along with final value of the score (for parametric models) or efficient influence function (for non-parametric models) process for each treatment. We then characterize asymptotically optimal tests under various restrictions such as unbiasedness, \alpha-spending constraints etc. Finally, we apply our our results to three key classes of sequential experiments: costly sampling, group sequential trials, and bandit experiments, and show how optimal inference can be conducted in these scenarios.	翻訳日:2023-06-29 17:53:00 公開日:2023-06-28
# 深層学習支援マイクロ波-プラズマ相互作用に基づくプラズマ密度推定手法 Deep Learning assisted microwave-plasma interaction based technique for plasma density estimation ( http://arxiv.org/abs/2304.14807v2 ) ライセンス: Link先を確認	Pratik Ghosh, Bhaskar Chaudhury, Shishir Purohit, Vishv Joshi, Ashray Kothari, Devdeep Shetranjiwala	(参考訳) 電子密度は、あらゆるプラズマを特徴づける重要なパラメータである。低温プラズマ(LTP)の領域におけるプラズマ応用と研究の大部分は、プラズマ密度とプラズマ温度の正確な推定に基づいている。従来の電子密度測定法は、任意の線形LTPデバイスに対して軸方向および半径方向のプロファイルを提供する。これらの手法は、操作範囲(あまり広くない)、煩雑な計測、複雑なデータ分析手順において大きな欠点がある。本稿は,既存のプラズマ密度測定手法に関連する課題を解決するための新しい代替手法として使用できる,マイクロ波プラズマ相互作用に基づく非侵襲的戦略を提案する。プラズマからのマイクロ波散乱による電界パターンを利用して密度分布を推定する。この概念の証明は、低温、非磁性、衝突プラズマからなるシミュレーショントレーニングデータセットに対して試験される。 10^{16}-10^{19}$ m$^{-3}$の異なる対称(ガウス型)と非対称密度プロファイルは、様々な実験的な構成に対応して検討されている。合成学習データセットの作成中,ノイズの存在や測定データ(dense vs sparse)の量といった実生活実験的な課題が検討されている。 DLベースの技術はプラズマ内の電子密度プロファイルを決定する能力を持つ。提案手法の性能は,SSIM, RMSLE, MAPEの3つの指標を用いて評価されている。得られた結果は, 線形プラズマ装置の密度の2次元半径分布を推定する上で有望な性能を示し, 提案手法のプラズマ診断への応用の可能性を確認した。 The electron density is a key parameter to characterize any plasma. Most of the plasma applications and research in the area of low-temperature plasmas (LTPs) are based on the accurate estimations of plasma density and plasma temperature. The conventional methods for electron density measurements offer axial and radial profiles for any given linear LTP device. These methods have major disadvantages of operational range (not very wide), cumbersome instrumentation, and complicated data analysis procedures. The article proposes a Deep Learning (DL) assisted microwave-plasma interaction-based non-invasive strategy, which can be used as a new alternative approach to address some of the challenges associated with existing plasma density measurement techniques. The electric field pattern due to microwave scattering from plasma is utilized to estimate the density profile. The proof of concept is tested for a simulated training data set comprising a low-temperature, unmagnetized, collisional plasma. Different types of symmetric (Gaussian-shaped) and asymmetrical density profiles, in the range $10^{16}-10^{19}$ m$^{-3}$, addressing a range of experimental configurations have been considered in our study. Real-life experimental issues such as the presence of noise and the amount of measured data (dense vs sparse) have been taken into consideration while preparing the synthetic training data-sets. The DL-based technique has the capability to determine the electron density profile within the plasma. The performance of the proposed deep learning-based approach has been evaluated using three metrics- SSIM, RMSLE, and MAPE. The obtained results show promising performance in estimating the 2D radial profile of the density for the given linear plasma device and affirms the potential of the proposed ML-based approach in plasma diagnostics.	翻訳日:2023-06-29 17:52:38 公開日:2023-06-28
# 自己指導型学習のクックブック A Cookbook of Self-Supervised Learning ( http://arxiv.org/abs/2304.12210v2 ) ライセンス: Link先を確認	Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun and Micah Goldblum	(参考訳) 人工知能のダークマターと呼ばれる自己教師型学習は、機械学習を進めるための有望な道である。しかし、料理と同様にSSLメソッドのトレーニングは、参入障壁の高い繊細なテクニックである。多くのコンポーネントは慣れ親しんでいるが、SSLメソッドをうまくトレーニングするには、プリテキストタスクからハイパーパラメータのトレーニングまで、一連の選択をめちゃくちゃにする必要がある。私たちのゴールは、基本と最新のSSLレシピをクックブックのスタイルで配置することで、SSL研究への参入障壁を低くすることにあります。興味のある研究者がメソッドの地形をナビゲートし、さまざまなノブの役割を理解し、SSLがいかに美味しいかを探求するために必要なノウハウを得ることを期待しています。 Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.	翻訳日:2023-06-29 17:52:11 公開日:2023-06-28
# DeePLT:スマートホームにおける認知者の軌道予測による個人化照明支援 DeePLT: Personalized Lighting Facilitates by Trajectory Prediction of Recognized Residents in the Smart Home ( http://arxiv.org/abs/2304.08027v3 ) ライセンス: Link先を確認	Danial Safaei, Ali Sobhani, Ali Akbar Kiaei	(参考訳) 近年、住宅の様々な部分の知性は、現代の住宅において不可欠な特徴の1つとなっている。これらの部品の1つは、各人の光をパーソナライズする知性照明システムである。本稿では、軌道予測によって推定される、認識されたユーザの即時未来位置における照明をパーソナライズする機械学習に基づくインテリジェントシステムを提案する。提案するシステムは, (i) 与えられた映像フレームの人物を検出・局所化するための人間検出, (ii) 検出された人物を識別するための顔認識, (iii) 映像フレームのシーケンス内の人物を追跡するための人間追跡, (iv) 逆強化学習を用いた環境におけるユーザの将来の位置を予測するための軌道予測,からなる。提案手法は、仕様、顔画像、カスタム照明設定など、各人物にユニークなプロファイルを提供する。このプロファイルは照明調整プロセスで使用される。一定の照明を考慮した他の方法とは異なり,本システムは,ユーザの直接的介入なしに,色や光強度の観点でそれぞれの「好みの照明」を適用できる。これにより、より高速で効率良く照明を調整できる。また, 予測された軌道経路により, 所望の照明を適用でき, 家庭住民の快適で快適な環境が得られる。実験結果では、入力時点から平均1.4秒で所望の光を照射し、人間の検出では22.1mAp、顔認識では95.12%、人間の追跡では93.3%、軌道予測では10.80 MinADE20, 18.55 MinFDE20, 15.8 MinADE5, 30.50 MinFDE5を照射した。 In recent years, the intelligence of various parts of the home has become one of the essential features of any modern home. One of these parts is the intelligence lighting system that personalizes the light for each person. This paper proposes an intelligent system based on machine learning that personalizes lighting in the instant future location of a recognized user, inferred by trajectory prediction. Our proposed system consists of the following modules: (I) human detection to detect and localize the person in each given video frame, (II) face recognition to identify the detected person, (III) human tracking to track the person in the sequence of video frames and (IV) trajectory prediction to forecast the future location of the user in the environment using Inverse Reinforcement Learning. The proposed method provides a unique profile for each person, including specifications, face images, and custom lighting settings. This profile is used in the lighting adjustment process. Unlike other methods that consider constant lighting for every person, our system can apply each 'person's desired lighting in terms of color and light intensity without direct user intervention. Therefore, the lighting is adjusted with higher speed and better efficiency. In addition, the predicted trajectory path makes the proposed system apply the desired lighting, creating more pleasant and comfortable conditions for the home residents. In the experimental results, the system applied the desired lighting in an average time of 1.4 seconds from the moment of entry, as well as a performance of 22.1mAp in human detection, 95.12% accuracy in face recognition, 93.3% MDP in human tracking, and 10.80 MinADE20, 18.55 MinFDE20, 15.8 MinADE5 and 30.50 MinFDE5 in trajectory prediction.	翻訳日:2023-06-29 17:51:57 公開日:2023-06-28
# フーリエ完全有界多項式の影響と量子アルゴリズムの古典シミュレーション Influences of Fourier Completely Bounded Polynomials and Classical Simulation of Quantum Algorithms ( http://arxiv.org/abs/2304.06713v2 ) ライセンス: Link先を確認	Francisco Escudero Guti\'errez	(参考訳) 我々は、Arunachalam, Bri\"et and Palazuelos (SICOMP'19) の主な結果の新しいプレゼンテーションを行い、量子クエリアルゴリズムがフーリエ完全有界多項式と呼ばれる新しい多項式のクラスによって特徴づけられることを示す。そのような多項式はすべて影響変数を持つと推測する。この予想は有名なaaronson-ambainis (aa) 予想 (theory of computing '14) よりも弱いが、量子クエリアルゴリズムの古典的なシミュレーションにも同じ意味を持つ。我々は、同次フーリエ完全有界多項式に対して成り立つことを示すことにより、AA予想の新しいケースを証明した。これは、$d$-query量子アルゴリズムの出力が次数2d$の等質多項式$p$であるなら、少なくとも$Var[p]^2$の影響を持つ変数を持つことを意味する。さらに、Bansal, Sinha and de Wolf (CCC'22 and QIP'23) の結果の代替証明として、ブロック-多重線型完全有界多項式が影響変数を持つことを示す。我々の証明はより単純で、より良い定数を得、ランダム性を使用しない。 We give a new presentation of the main result of Arunachalam, Bri\"et and Palazuelos (SICOMP'19) and show that quantum query algorithms are characterized by a new class of polynomials which we call Fourier completely bounded polynomials. We conjecture that all such polynomials have an influential variable. This conjecture is weaker than the famous Aaronson-Ambainis (AA) conjecture (Theory of Computing'14), but has the same implications for classical simulation of quantum query algorithms. We prove a new case of the AA conjecture by showing that it holds for homogeneous Fourier completely bounded polynomials. This implies that if the output of $d$-query quantum algorithm is a homogeneous polynomial $p$ of degree $2d$, then it has a variable with influence at least $Var[p]^2$. In addition, we give an alternative proof of the results of Bansal, Sinha and de Wolf (CCC'22 and QIP'23) showing that block-multilinear completely bounded polynomials have influential variables. Our proof is simpler, obtains better constants and does not use randomness.	翻訳日:2023-06-29 17:51:26 公開日:2023-06-28
# ポリタプレット損失を考慮した理解・論理推論タスクの深層マニフォールド学習 Deep Manifold Learning for Reading Comprehension and Logical Reasoning Tasks with Polytuplet Loss ( http://arxiv.org/abs/2304.01046v2 ) ライセンス: Link先を確認	Jeffrey Lu, Ivan Rodriguez	(参考訳) 理解と論理的推論タスクを読む機械学習モデルの開発における現在のトレンドは、論理的ルールを理解し活用するモデルの能力を改善することに焦点を当てている。本研究は、人間が理解や論理的推論タスクを与えられたときに使用する共通の戦略を表現することにより、他のモデルよりも解釈可能なコンポーネントを持つ、新しい損失関数と付随するモデルアーキテクチャを提供することに焦点を当てている。この戦略は、絶対精度よりも相対精度を強調し、問題の解答に必要な情報を完全に知ることなく理論的に正しい解を生成できる。このような戦略を転校学習モデルの学習に応用し,読解と論理的推論の問題を解決する効果について検討する。モデルは、難読性理解と論理的推論ベンチマークであるreclorデータセットで評価された。本稿では,三重項損失関数の拡張であるポリタップレット損失関数を提案する。その結果,ポリタプレット損失モデルの方が既存のベースラインモデルより優れていることがわかった。ポリタプレット損失は他のコントラスト損失関数の代替として有望なものであるが、その利点を定量化するためにさらなる研究が必要である。 The current trend in developing machine learning models for reading comprehension and logical reasoning tasks is focused on improving the models' abilities to understand and utilize logical rules. This work focuses on providing a novel loss function and accompanying model architecture that has more interpretable components than some other models by representing a common strategy employed by humans when given reading comprehension and logical reasoning tasks. This strategy involves emphasizing relative accuracy over absolute accuracy and can theoretically produce the correct answer without full knowledge of the information required to solve the question. We examine the effectiveness of applying such a strategy to train transfer learning models to solve reading comprehension and logical reasoning questions. The models were evaluated on the ReClor dataset, a challenging reading comprehension and logical reasoning benchmark. We propose the polytuplet loss function, an extension of the triplet loss function, to ensure prioritization of learning the relative correctness of answer choices over learning the true accuracy of each choice. Our results indicate that models employing polytuplet loss outperform existing baseline models. Although polytuplet loss is a promising alternative to other contrastive loss functions, further research is required to quantify the benefits it may present.	翻訳日:2023-06-29 17:50:35 公開日:2023-06-28
# Pgx:強化学習のためのハードウェアアクセラレーション並列ゲームシミュレータ Pgx: Hardware-accelerated Parallel Game Simulators for Reinforcement Learning ( http://arxiv.org/abs/2303.17503v2 ) ライセンス: Link先を確認	Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori, Yu Murata, Keigo Habara, Haruka Kita, Shin Ishii	(参考訳) JAXで記述され,GPU/TPUアクセラレータ向けに最適化されたボードゲーム強化学習(RL)環境のスイートであるPgxを提案する。 JAXの自動ベクタライゼーションとJust-In-Time(JIT)コンパイルを活用することで、Pgxはアクセラレータ上で数千の並列実行に効率的にスケールできる。 DGX-A100ワークステーションの実験では、Pgxは既存のPython RLライブラリよりも10～100倍高速にRL環境をシミュレートできることがわかった。 Pgxには、バックギャモン、チェス、ショギ、GoといったRL研究のベンチマークとして一般的に使用されるRL環境が含まれている。さらにPgxは、迅速な研究サイクルを促進するために、ミニチュアゲームセットとベースラインモデルを提供している。 pgx環境を用いたgumbel alphazeroアルゴリズムの効率的なトレーニングを行う。 pgxは、研究者がrl実験を加速するための高性能環境シミュレータを提供する。 pgxはhttps://github.com/sotetsuk/pgxで入手できる。 We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging auto-vectorization and Just-In-Time (JIT) compilation of JAX, Pgx can efficiently scale to thousands of parallel executions over accelerators. In our experiments on a DGX-A100 workstation, we discovered that Pgx can simulate RL environments 10-100x faster than existing Python RL libraries. Pgx includes RL environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators for researchers to accelerate their RL experiments. Pgx is available at https://github.com/sotetsuk/pgx.	翻訳日:2023-06-29 17:50:15 公開日:2023-06-28
# PeakNet:ディープニューラルネットワークを備えた自動ブラッグピークファインダ PeakNet: An Autonomous Bragg Peak Finder with Deep Neural Networks ( http://arxiv.org/abs/2303.15301v2 ) ライセンス: Link先を確認	Cong Wang, Po-Nan Li, Jana Thayer and Chun Hong Yoon	(参考訳) X線自由電子レーザー(XFEL)とシンクロトロン施設におけるシリアル結晶学は近年大きな進歩を遂げており、マクロ分子構造と分子過程の新たな科学的研究を可能にしている。しかし、これらの実験はデータ削減とリアルタイムフィードバックにおいて計算上の課題を呈する膨大な量のデータを生成する。ブラッグピーク探索アルゴリズムは有用な画像の識別や、ヒット率と解像度に関するリアルタイムフィードバックを提供する。バッファ溶液,噴射ノズル,その他の遮蔽材からのショット・ツー・ショット強度変動と強い背景散乱により,これは時間を要する最適化問題となる。本稿では,深層ニューラルネットワークを利用した自律型ブラッグピークファインダPeakNetを紹介する。このシステムの開発は 1)手動のアルゴリズムパラメータチューニングの必要性をなくす。 2) 強背景散乱におけるショット・ツー・ショットの変動をリアルタイムに調整することにより, 偽陽性ピークを低減する。 3) 悪い画素マスクを手作業で作成する手間を省き, 必要に応じて再生できるため, イベント毎にマスクを保管する必要がなくなる。 PeakNetは、1920×1920ピクセルの画像をNVIDIA 1080 Ti GPU上で90ミリ秒で処理し、並列化分析やGPUストリーム処理によるさらなる拡張の可能性を秘めている。 PeakNetは、専門家レベルのリアルタイム連続結晶学データ解析に高いデータレートで適している。 Serial crystallography at X-ray free electron laser (XFEL) and synchrotron facilities has experienced tremendous progress in recent times enabling novel scientific investigations into macromolecular structures and molecular processes. However, these experiments generate a significant amount of data posing computational challenges in data reduction and real-time feedback. Bragg peak finding algorithm is used to identify useful images and also provide real-time feedback about hit-rate and resolution. Shot-to-shot intensity fluctuations and strong background scattering from buffer solution, injection nozzle and other shielding materials make this a time-consuming optimization problem. Here, we present PeakNet, an autonomous Bragg peak finder that utilizes deep neural networks. The development of this system 1) eliminates the need for manual algorithm parameter tuning, 2) reduces false-positive peaks by adjusting to shot-to-shot variations in strong background scattering in real-time, 3) eliminates the laborious task of manually creating bad pixel masks and the need to store these masks per event since these can be regenerated on demand. PeakNet also exhibits exceptional runtime efficiency, processing a 1920-by-1920 pixel image around 90 ms on an NVIDIA 1080 Ti GPU, with the potential for further enhancements through parallelized analysis or GPU stream processing. PeakNet is well-suited for expert-level real-time serial crystallography data analysis at high data rates.	翻訳日:2023-06-29 17:49:59 公開日:2023-06-28
# 後方特徴投影による連続学習における線形分離性維持 Preserving Linear Separability in Continual Learning by Backward Feature Projection ( http://arxiv.org/abs/2303.14595v3 ) ライセンス: Link先を確認	Qiao Gu, Dongsub Shim, Florian Shkurti	(参考訳) 破滅的な忘れは、連続的な学習において大きな課題であり、モデルでは、以前見られたタスクからデータにアクセスできない、あるいは制限された、新しいタスクを学習する必要がある。この課題に対処するため,特徴空間における知識蒸留に基づく手法が提案され,忘れの低減が図られている。しかし、ほとんどの特徴蒸留法は、プラスチック性の必要性を見越して、新しい特徴を古いものと一致させるよう直接に制約している。安定性と可塑性のトレードオフを改善するため,我々は,新しい特徴を学習可能な線形変換へと変化させる連続学習法である後方特徴投影法(bfp)を提案する。 BFPは古いクラスの線形分離性を保ちつつ、新しいフィーチャの方向が新しいクラスに対応できるようにしている。 BFPは既存のエクスペリエンスリプレイメソッドと統合することができ、パフォーマンスを大幅に向上させることができる。また,BFPは連続学習中に線形分離性が良好に維持され,高い分類精度が得られるような表現空間の学習にも有効であることを示す。コードはhttps://github.com/rvl-lab-utoronto/BFPで確認できる。 Catastrophic forgetting has been a major challenge in continual learning, where the model needs to learn new tasks with limited or no access to data from previously seen tasks. To tackle this challenge, methods based on knowledge distillation in feature space have been proposed and shown to reduce forgetting. However, most feature distillation methods directly constrain the new features to match the old ones, overlooking the need for plasticity. To achieve a better stability-plasticity trade-off, we propose Backward Feature Projection (BFP), a method for continual learning that allows the new features to change up to a learnable linear transformation of the old features. BFP preserves the linear separability of the old classes while allowing the emergence of new feature directions to accommodate new classes. BFP can be integrated with existing experience replay methods and boost performance by a significant margin. We also demonstrate that BFP helps learn a better representation space, in which linear separability is well preserved during continual learning and linear probing achieves high classification accuracy. The code can be found at https://github.com/rvl-lab-utoronto/BFP	翻訳日:2023-06-29 17:49:37 公開日:2023-06-28
# DC CoMix TTS: Mixerとのコラボレーションによる離散コード付きエンドツーエンド表現型TS DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer ( http://arxiv.org/abs/2305.19567v4 ) ライセンス: Link先を確認	Yerin Choi, Myoung-Wan Koo	(参考訳) TTSの中立性は大きな成功を収めたものの、コンテンツ収集は依然として課題だ。本稿では,プロソディモデリングの改善を実現するための新しい入力表現と単純なアーキテクチャを提案する。近年のttsにおける離散コードの使用の成功に触発されて,参照エンコーダの入力に離散コードを導入する。具体的には,音響圧縮モデルのベクトル量子化器を用いて,すでにトレーニング済みの多様な音響情報を活用する。さらに、修正MLP-Mixerを参照エンコーダに適用し、アーキテクチャをより軽量にする。その結果、プロソディ転送TSをエンドツーエンドで訓練する。本手法は主観的評価と客観的評価の両方を通して有効性を示す。実験において、離散符号を入力として利用する場合、参照エンコーダは話者非依存の韻律を学習できることを実証する。さらに,少ないパラメータを入力しても比較結果が得られる。 Despite the huge successes made in neutral TTS, content-leakage remains a challenge. In this paper, we propose a new input representation and simple architecture to achieve improved prosody modeling. Inspired by the recent success in the use of discrete code in TTS, we introduce discrete code to the input of the reference encoder. Specifically, we leverage the vector quantizer from the audio compression model to exploit the diverse acoustic information it has already been trained on. In addition, we apply the modified MLP-Mixer to the reference encoder, making the architecture lighter. As a result, we train the prosody transfer TTS in an end-to-end manner. We prove the effectiveness of our method through both subjective and objective evaluations. We demonstrate that the reference encoder learns better speaker-independent prosody when discrete code is utilized as input in the experiments. In addition, we obtain comparable results even when fewer parameters are inputted.	翻訳日:2023-06-29 17:44:03 公開日:2023-06-28
# 幾何グラフフィルタとニューラルネットワーク : 限界特性と判別可能性トレードオフ Geometric Graph Filters and Neural Networks: Limit Properties and Discriminability Trade-offs ( http://arxiv.org/abs/2305.18467v2 ) ライセンス: Link先を確認	Zhiyang Wang and Luana Ruiz and Alejandro Ribeiro	(参考訳) 本稿では、グラフニューラルネットワーク(gnn)と多様体ニューラルネットワーク(mnn)の関係について、グラフが多様体からサンプリングされた点の集合から構築され、幾何学的情報をエンコードする場合に検討する。我々は、多様体とグラフの畳み込みがそれぞれラプラス・ベルトラミ作用素とグラフラプラシアンで定義されるような畳み込み MNN と GNN を考える。適切なカーネルを用いて、密度と適度なスパースグラフの両方を分析する。これらのグラフ上の畳み込みフィルタとニューラルネットワークが連続多様体上の畳み込みフィルタとニューラルネットワークに収束することを示す非漸近的誤差境界を証明した。この分析の副産物として、グラフフィルタの識別性と、多様体フィルタの所望の挙動を近似する能力との間の重要なトレードオフを観察する。次に、非線形性の周波数混合性により、このトレードオフがニューラルネットワークでどのように改善されるかについて議論する。さらに、同一多様体からサンプリングされた幾何グラフの転送可能性も導出する。本研究は,ナビゲーション制御問題と点雲分類タスクで数値的に検証する。 This paper studies the relationship between a graph neural network (GNN) and a manifold neural network (MNN) when the graph is constructed from a set of points sampled from the manifold, thus encoding geometric information. We consider convolutional MNNs and GNNs where the manifold and the graph convolutions are respectively defined in terms of the Laplace-Beltrami operator and the graph Laplacian. Using the appropriate kernels, we analyze both dense and moderately sparse graphs. We prove non-asymptotic error bounds showing that convolutional filters and neural networks on these graphs converge to convolutional filters and neural networks on the continuous manifold. As a byproduct of this analysis, we observe an important trade-off between the discriminability of graph filters and their ability to approximate the desired behavior of manifold filters. We then discuss how this trade-off is ameliorated in neural networks due to the frequency mixing property of nonlinearities. We further derive a transferability corollary for geometric graphs sampled from the same manifold. We validate our results numerically on a navigation control problem and a point cloud classification task.	翻訳日:2023-06-29 17:43:50 公開日:2023-06-28
# ビデオ連続学習のための時間情報の再検討 Just a Glimpse: Rethinking Temporal Information for Video Continual Learning ( http://arxiv.org/abs/2305.18418v2 ) ライセンス: Link先を確認	Lama Alssum, Juan Leon Alcazar, Merey Ramazanova, Chen Zhao, Bernard Ghanem	(参考訳) クラス増分学習は、現実世界のアプリケーションシナリオによく似ているため、継続的学習の研究において最も重要な設定の1つである。メモリサイズが制限されると、クラスやタスクの数が増えると、壊滅的な忘れることになる。ビデオ領域での継続的な学習は、ビデオデータが大量のフレームを含んでいるため、リプレイメモリにより高い負担がかかるため、さらに課題となる。現在の一般的なプラクティスは、ビデオストリームからサブサンプルのフレームをリプレイメモリに格納することです。本稿では,個別フレームに基づく効果的なビデオ連続学習のための新しい再生機構SMILEを提案する。広範にわたる実験により,映像の多様性は時間的情報よりも重要な役割を担っていることが明らかとなった。そこで本手法は,多数の一意なビデオを表す少数のフレームから学習することに焦点を当てている。 3つの代表的なビデオデータセット、kinetics, ucf101, activitynetにおいて、提案手法は最先端の性能を最大21.49%向上させた。 Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles real-world application scenarios. With constrained memory sizes, catastrophic forgetting arises as the number of classes/tasks increases. Studying continual learning in the video domain poses even more challenges, as video data contains a large number of frames, which places a higher burden on the replay memory. The current common practice is to sub-sample frames from the video stream and store them in the replay memory. In this paper, we propose SMILE a novel replay mechanism for effective video continual learning based on individual/single frames. Through extensive experimentation, we show that under extreme memory constraints, video diversity plays a more significant role than temporal information. Therefore, our method focuses on learning from a small number of frames that represent a large number of unique videos. On three representative video datasets, Kinetics, UCF101, and ActivityNet, the proposed method achieves state-of-the-art performance, outperforming the previous state-of-the-art by up to 21.49%.	翻訳日:2023-06-29 17:43:33 公開日:2023-06-28
# 量子アニールの強結合極限における$U(N)$ゲージ理論 $U(N)$ gauge theory in the strong coupling limit on a quantum annealer ( http://arxiv.org/abs/2305.18179v2 ) ライセンス: Link先を確認	Jangho Kim and Thomas Luu and Wolfgang Unger	(参考訳) 強結合系における格子 qcd は整数値を持つ双対変数で定式化することができる。この方法では有限密度符号問題を回避し、ワームアルゴリズムによって、控えめな有限温度と有限密度を効率的にシミュレーションすることができる。しかし、低温度の環境は対処に費用がかかる。分割関数は整数の項でのみ表されるので、D-ウェーブ量子アニーラーでの研究には適している。まず、研究対象とするシステムのセットアップを説明し、その後、量子アニール、特にD-Waveに適合する改質を示す。概念実証として、ゲージ群 $U(1)$ に対して D-Wave 上で得られた最初の結果を示し、ゲージ群 $U(3)$ および $SU(3)$ への次のステップを概説する。また,ヒストグラムの重み付けにより,分析結果と比較して観察精度が大幅に向上することがわかった。 Lattice QCD in the strong coupling regime can be formulated in dual variables which are integer-valued. It can be efficiently simulated for modest finite temperatures and finite densities via the Worm algorithm, circumventing the finite density sign problem in this regime. However, the low temperature regime is more expensive to address. As the partition function is solely expressed in terms of integers, it is well suited to be studied on the D-Wave quantum annealer. We will first explain the setup of the system we want to study, and then present its reformulation suitable for a quantum annealer, and in particular the D-Wave. As a proof of concept, we present first results obtained on D-Wave for gauge group $U(1)$ and outline the next steps towards gauge groups $U(3)$ and $SU(3)$. We find that in addition, histogram reweighting greatly improves the accuracy of our observables when compared to analytic results.	翻訳日:2023-06-29 17:43:15 公開日:2023-06-28
# 思考連鎖の背後にある謎の解明に向けて--理論的展望 Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective ( http://arxiv.org/abs/2305.15408v3 ) ライセンス: Link先を確認	Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang	(参考訳) 近年の研究では、特に数学や推論を含む複雑なタスクを扱う場合、CoT(Chain-of-Thought prompting)がLarge Language Models(LLM)の性能を劇的に改善できることが判明している。実験的な成功にもかかわらず、CoTの背後にあるメカニズムとLLMの可能性を解き放つ方法はまだ解明されていない。本稿では,これらの疑問に理論的に答える第一歩を踏み出す。具体的には,基本的な数学的および意思決定問題の解法において,LLMとCoTとの表現性について検討する。まず,モデルサイズが入力長に対して超多項式的に大きくなる限り,有界深度変換器は基本演算/方程式タスクの正解を直接生成できないことを示す。対照的に,定サイズの自己回帰変換器は,一般的な数学言語形式を用いてCoTの導出を生成することで,両方のタスクを解くのに十分であることを示す。さらに, COT を用いた LLM は, 動的プログラミング(Dynamic Programming) と呼ばれる一般的な意思決定問題を解くことができ, 複雑な実世界のタスクに対処する能力の正当化を図っている。最後に、4つのタスクに関する広範な実験では、トランスフォーマーは常に直接答えを予測できないが、十分なCoTの実証から正しいソリューションを段階的に生成できることが示されている。 Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly when dealing with complex tasks involving mathematics or reasoning. Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. Specifically, we examine the expressivity of LLMs with CoT in solving fundamental mathematical and decision-making problems. We start by giving an impossibility result showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length. In contrast, we then prove by construction that autoregressive Transformers of constant size suffice to solve both tasks by generating CoT derivations using a commonly-used math language format. Moreover, we show LLMs with CoT are capable of solving a general class of decision-making problems known as Dynamic Programming, thus justifying its power in tackling complex real-world tasks. Finally, extensive experiments on four tasks show that, while Transformers always fail to predict the answers directly, they can consistently learn to generate correct solutions step-by-step given sufficient CoT demonstrations.	翻訳日:2023-06-29 17:42:59 公開日:2023-06-28
# 予測をフリップする最小トレーニングサブセットのリラベル Relabeling Minimal Training Subset to Flip a Prediction ( http://arxiv.org/abs/2305.12809v2 ) ライセンス: Link先を確認	Jinghan Yang, Linjie Xu, Lequan Yu	(参考訳) 機械学習モデルから不十分な予測に直面する場合、基礎となる理由を調査し、その結果を逆転する可能性を探ることが不可欠である。モデルがトレーニングされる前に、トレーニングデータの最小サブセットである$\mathcal{S}_t$を解放することで、テスト予測を$x_t$に切り替えることができますか? 拡張影響関数を用いてそのような部分集合を同定し、レバー化する効率的な手順を提案する。トレーニングポイントの1%未満のrelabelingでは、モデルの予測をひっくり返すことがしばしばあります。このメカニズムは、(1) 影響力のあるトレーニング部分集合を復元してモデル予測に挑戦するためのアプローチを提供する、(2) モデルのロバスト性を評価する(例えば、$\|\mathcal{S}_t\|$)、(2) トレーニングセットのノイズ比に高い関係があること、および$\|\mathcal{S}_t\|$ が予測確率と相関するが、予測確率に相補的であること、(3) トレーニングポイントがグループ帰属バイアスにつながること、の3つを示す。私たちの知る限りでは、私たちは、与えられた予測を覆すのに必要な最小限のトレーニングサブセットを特定し、緩和することについて、最初に調査します。 When facing an unsatisfactory prediction from a machine learning model, it is crucial to investigate the underlying reasons and explore the potential for reversing the outcome. We ask: can we result in the flipping of a test prediction $x_t$ by relabeling the smallest subset $\mathcal{S}_t$ of the training data before the model is trained? We propose an efficient procedure to identify and relabel such a subset via an extended influence function. We find that relabeling fewer than 1% of the training points can often flip the model's prediction. This mechanism can serve multiple purposes: (1) providing an approach to challenge a model prediction by recovering influential training subsets; (2) evaluating model robustness with the cardinality of the subset (i.e., $\|\mathcal{S}_t\|$); we show that $\|\mathcal{S}_t\|$ is highly related to the noise ratio in the training set and $\|\mathcal{S}_t\|$ is correlated with but complementary to predicted probabilities; (3) revealing training points lead to group attribution bias. To the best of our knowledge, we are the first to investigate identifying and relabeling the minimal training subset required to flip a given prediction.	翻訳日:2023-06-29 17:42:35 公開日:2023-06-28
# マルチタスク階層型逆強化学習 Multi-task Hierarchical Adversarial Inverse Reinforcement Learning ( http://arxiv.org/abs/2305.12633v2 ) ライセンス: Link先を確認	Jiayu Chen, Dipesh Tamboli, Tian Lan, Vaneet Aggarwal	(参考訳) マルチタスク・イミテーション・ラーニング(MIL)は,汎用ロボットに不可欠なマルチタスク・エキスパート・デモに基づいて,タスクの配布が可能な政策を訓練することを目的としている。既存のmilアルゴリズムは、データ効率が低く、複雑な長方形処理では性能が劣る。 MH-AIRL(Multi-task Hierarchical Adversarial Inverse Reinforcement Learning)を開発し、階層的に構造化されたマルチタスクポリシーを学習する。これを実現するため、mh-airlはコンテキストベースのマルチタスク学習、airl(ilアプローチ)、階層的ポリシー学習を効果的に合成する。さらに、MH-AIRLは、実際によりアクセスしやすいタスクやスキルアノテーション(すなわち状態-アクションペアのみ)なしで、デモに採用することができる。 MH-AIRLの各モジュールに対して理論的正当性を提供し、MH-AIRLで学んだマルチタスクポリシーをSOTA MILベースラインよりも優れた性能と転送性を示す。 Multi-task Imitation Learning (MIL) aims to train a policy capable of performing a distribution of tasks based on multi-task expert demonstrations, which is essential for general-purpose robots. Existing MIL algorithms suffer from low data efficiency and poor performance on complex long-horizontal tasks. We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically-structured multi-task policies, which is more beneficial for compositional tasks with long horizons and has higher expert data efficiency through identifying and transferring reusable basic skills across tasks. To realize this, MH-AIRL effectively synthesizes context-based multi-task learning, AIRL (an IL approach), and hierarchical policy learning. Further, MH-AIRL can be adopted to demonstrations without the task or skill annotations (i.e., state-action pairs only) which are more accessible in practice. Theoretical justifications are provided for each module of MH-AIRL, and evaluations on challenging multi-task settings demonstrate superior performance and transferability of the multi-task policies learned with MH-AIRL as compared to SOTA MIL baselines.	翻訳日:2023-06-29 17:42:06 公開日:2023-06-28
# 高速カロリーメータシミュレーションのための幾何学的自己回帰モデル(GAAM)による新しいジオメトリへの一般化 Generalizing to new geometries with Geometry-Aware Autoregressive Models (GAAMs) for fast calorimeter simulation ( http://arxiv.org/abs/2305.11531v2 ) ライセンス: Link先を確認	Junze Liu, Aishik Ghosh, Dylan Smith, Pierre Baldi, Daniel Whiteson	(参考訳) 衝突生成物に対するシミュレート検出器の応答は素粒子物理学のデータ解析に不可欠であるが、計算量は非常に高価である。 1つのサブ検出器であるカロリメータは、細胞の粒度が高く、相互作用の複雑さのために計算時間を支配している。生成モデルは、より迅速なサンプル生産を提供することができるが、現在、特定の検出器ジオメトリのパフォーマンスを最適化するためにかなりの労力を必要としており、しばしば、他のジオメトリに一般化することなく、様々なセルサイズや配置を記述するために多くのモデルが必要となる。我々は,温度計の応答が幾何によってどう変化するかを学習し,余分なトレーニングを伴わずに未知の測地に対するシミュレーション応答を生成できる,$\textit{geometry-aware}$ autoregressive modelを開発した。幾何認識モデルは、生成したワッサーシュタイン距離や、シミュレーションされた応答を要約する鍵量の真の分布といったいくつかの指標において、ベースライン無意識モデルよりも50\%以上優れている。 1つの幾何学的認識モデルは、大型ハドロン衝突型加速器で収集されたデータを分析する物理学者によって、現在カロリーメーターシミュレーション用に設計された数百の生成モデルを置き換えることができる。将来の検出器の研究のためには、このような基礎モデルが重要なツールとなり、通常生成熱量計モデルを開発するのに必要な大規模な事前投資を劇的に削減する。 Generation of simulated detector response to collision products is crucial to data analysis in particle physics, but computationally very expensive. One subdetector, the calorimeter, dominates the computational time due to the high granularity of its cells and complexity of the interactions. Generative models can provide more rapid sample production, but currently require significant effort to optimize performance for specific detector geometries, often requiring many models to describe the varying cell sizes and arrangements, without the ability to generalize to other geometries. We develop a $\textit{geometry-aware}$ autoregressive model, which learns how the calorimeter response varies with geometry, and is capable of generating simulated responses to unseen geometries without additional training. The geometry-aware model outperforms a baseline unaware model by over $50\%$ in several metrics such as the Wasserstein distance between the generated and the true distributions of key quantities which summarize the simulated response. A single geometry-aware model could replace the hundreds of generative models currently designed for calorimeter simulation by physicists analyzing data collected at the Large Hadron Collider. For the study of future detectors, such a foundational model will be a crucial tool, dramatically reducing the large upfront investment usually needed to develop generative calorimeter models.	翻訳日:2023-06-29 17:41:43 公開日:2023-06-28
# MAF-Net: 基底血管画像分割のための複数注意誘導核融合ネットワーク MAF-Net: Multiple attention-guided fusion network for fundus vascular image segmentation ( http://arxiv.org/abs/2305.03617v3 ) ライセンス: Link先を確認	Yuanyuan Peng, Pengpeng Luan, Zixu Zhang	(参考訳) 網膜眼底画像中の血管を正確に分割することは、眼疾患の早期スクリーニング、診断、評価において重要であるが、重要な光変化、不均一な曲率構造、非一様コントラストなどの様々な要因により、セグメンテーションタスクに不明瞭な不確実性をもたらす。その結果,網膜基底画像の血管を正確に検出するためのマルチアテンション誘導核融合ネットワーク (MAF-Net) が提案された。現在、伝統的なunetベースのモデルは、長距離依存関係を明示的にモデル化することで部分的な情報を失う可能性がある。シーン情報補償の損失に対するコンテクスト情報を強化するため、眼底画像から血管の様々な特徴を抽出するために、トランスフォーマによって構築された空間的注意機構とチャネル注意を結合した注意融合機構を用いる。その後、スキップ接続にユニークな空間的注意機構を適用し、低レベル機能から冗長な情報やノイズをフィルタリングすることで、高レベル機能との統合性が向上する。さらに、ドロップアウト層を使用して、いくつかのニューロンをランダムに破棄することで、ディープラーニングネットワークの過剰フィットを防止し、その一般化性能を向上させることができる。実験結果は,F1スコアが0.818,0.836,0.811,Acc値が0.968,0.973,0.973の公開データセットDRIVE,STARE,CHASEDB1で検証された。ビジュアルインスペクションと定量的評価はいずれも,最先端手法と比較して良好な結果が得られることを示す。 Accurately segmenting blood vessels in retinal fundus images is crucial in the early screening, diagnosing, and evaluating some ocular diseases, yet it poses a nontrivial uncertainty for the segmentation task due to various factors such as significant light variations, uneven curvilinear structures, and non-uniform contrast. As a result, a multiple attention-guided fusion network (MAF-Net) is proposed to accurately detect blood vessels in retinal fundus images. Currently, traditional UNet-based models may lose partial information due to explicitly modeling long-distance dependencies, which may lead to unsatisfactory results. To enrich contextual information for the loss of scene information compensation, an attention fusion mechanism that combines the channel attention with spatial attention mechanisms constructed by Transformer is employed to extract various features of blood vessels from retinal fundus images. Subsequently, a unique spatial attention mechanism is applied in the skip connection to filter out redundant information and noise from low-level features, thus enabling better integration with high-level features. In addition, a DropOut layer is employed to randomly discard some neurons, which can prevent overfitting of the deep learning network and improve its generalization performance. Experimental results were verified in public datasets DRIVE, STARE and CHASEDB1 with F1 scores of 0.818, 0.836 and 0.811, and Acc values of 0.968, 0.973 and 0.973, respectively. Both visual inspection and quantitative evaluation demonstrate that our method produces satisfactory results compared to some state-of-the-art methods.	翻訳日:2023-06-29 17:41:20 公開日:2023-06-28
# 深部ニューラルネットワークの統計的最適性 Statistical Optimality of Deep Wide Neural Networks ( http://arxiv.org/abs/2305.02657v2 ) ライセンス: Link先を確認	Yicheng Li, Zixiong Yu, Guhan Chen, Qian Lin	(参考訳) 本稿では、有界領域 $\mathcal X \subset \mathbb R^{d}$ 上で定義された深いフィードフォワード ReLU ニューラルネットワークの一般化能力を考察する。まず、ニューラルネットワークの一般化能力は、対応するディープ・ニューラル・タンジェント・カーネル(NTK)の回帰によって完全に特徴づけられることを示した。次に、深部NTKのスペクトル特性を調査し、深部NTKが$\mathcal{X}$で正定値であり、その固有値減衰率は$(d+1)/d$であることを示す。カーネル回帰の確立された理論により、対応するNTKに付随する再生カーネルヒルベルト空間(RKHS)に回帰関数が存在することを仮定して、勾配降下により訓練された多層ワイドニューラルネットワークが最小最大値を達成することを結論付ける。最後に、オーバーフィットした多層ニューラルネットワークは$\mathbb S^{d}$ではうまく一般化できないことを示す。我々は、$\mathbb r^{d}$ 上の ntk の固有値減衰率を決定する技術上の貢献は、独立した利益であると信じている。 In this paper, we consider the generalization ability of deep wide feedforward ReLU neural networks defined on a bounded domain $\mathcal X \subset \mathbb R^{d}$. We first demonstrate that the generalization ability of the neural network can be fully characterized by that of the corresponding deep neural tangent kernel (NTK) regression. We then investigate on the spectral properties of the deep NTK and show that the deep NTK is positive definite on $\mathcal{X}$ and its eigenvalue decay rate is $(d+1)/d$. Thanks to the well established theories in kernel regression, we then conclude that multilayer wide neural networks trained by gradient descent with proper early stopping achieve the minimax rate, provided that the regression function lies in the reproducing kernel Hilbert space (RKHS) associated with the corresponding NTK. Finally, we illustrate that the overfitted multilayer wide neural networks can not generalize well on $\mathbb S^{d}$. We believe our technical contributions in determining the eigenvalue decay rate of NTK on $\mathbb R^{d}$ might be of independent interests.	翻訳日:2023-06-29 17:40:50 公開日:2023-06-28
# Genomic Interpreter: 1Dシフトウィンドウトランスを備えた階層型ゲノムディープニューラルネットワーク Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer ( http://arxiv.org/abs/2306.05143v2 ) ライセンス: Link先を確認	Zehui Li, Akashaditya Das, William A V Beardall, Yiren Zhao, Guy-Bart Stan	(参考訳) ゲノムデータの量と質の増大を考えると、新しい洞察の抽出には解釈可能な機械学習モデルが必要である。本研究はゲノム解析予測のための新しいアーキテクチャであるゲノム解釈を提示する。このモデルは、ゲノムアッセイ予測タスクの最先端モデルを上回る。我々のモデルはゲノム部位の階層的依存関係を識別できる。これは、我々が長距離階層データをモデル化するために設計した、新しいトランスフォーマーベースのブロックである1d-swinの統合によって実現されている。ゲノムインタプターは17K塩基対の38,171のDNAセグメントを含むデータセットに基づいて評価され、クロマチンアクセシビリティと遺伝子発現予測において優れた性能を示し、遺伝子制御の基礎となる「シンタクス」を解き放つ。 Given the increasing volume and quality of genomics data, extracting new insights requires interpretable machine-learning models. This work presents Genomic Interpreter: a novel architecture for genomic assay prediction. This model outperforms the state-of-the-art models for genomic assay prediction tasks. Our model can identify hierarchical dependencies in genomic sites. This is achieved through the integration of 1D-Swin, a novel Transformer-based block designed by us for modelling long-range hierarchical data. Evaluated on a dataset containing 38,171 DNA segments of 17K base pairs, Genomic Interpreter demonstrates superior performance in chromatin accessibility and gene expression prediction and unmasks the underlying `syntax' of gene regulation.	翻訳日:2023-06-29 17:33:45 公開日:2023-06-28
# テキストプロンプトによる高品質検出データ生成のためのテキスト間拡散モデルへの幾何制御の統合 Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt ( http://arxiv.org/abs/2306.04607v4 ) ライセンス: Link先を確認	Kai Chen, Enze Xie, Zhe Chen, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung	(参考訳) 拡散モデルは、コンテンツの作成や画像分類などのタスクのためのデータの生成に際し、非常に注目されている。しかし、高品質な物体検出データを生成するための拡散モデルの利用は、画像レベルの知覚品質だけでなく、バウンディングボックスやカメラビューのような幾何学的条件が不可欠である未探索領域に留まっている。従来はコピー・ペースト合成やレイアウト・トゥ・イメージ(L2I)生成を利用していた。本稿では,様々な幾何学的条件を柔軟にテキストプロンプトに変換し,高品質なデータ生成のための事前学習されたtext-to-image(t2i)拡散モデルを強化するシンプルなフレームワークgeodiffusionを提案する。従来のl2i法とは異なり、geodiffusionはバウンディングボックスだけでなく、自動運転シーンのカメラビューなどの余分な幾何学的条件もエンコードできる。大規模な実験では、GeoDiffusionは従来のL2I法よりも高速に4倍のトレーニング時間を維持する。私たちの知る限りでは、幾何学的な条件でレイアウトから画像への拡散モデルを採用し、l2i生成画像が物体検出器の性能向上に有用であることを実証するのはこれが初めてです。 Diffusion models have attracted significant attention due to their remarkable ability to create content and generate data for tasks such as image classification. However, the usage of diffusion models to generate high-quality object detection data remains an underexplored area, where not only the image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential. Previous studies have utilized either copy-paste synthesis or layout-to-image (L2I) generation with specifically designed modules to encode semantic layouts. In this paper, we propose GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts and empower the pre-trained text-to-image (T2I) diffusion models for high-quality detection data generation. Unlike previous L2I methods, our GeoDiffusion is able to encode not only bounding boxes but also extra geometric conditions such as camera views in self-driving scenes. Extensive experiments demonstrate GeoDiffusion outperforms previous L2I methods while maintaining 4x training time faster. To the best of our knowledge, this is the first work to adopt diffusion models for layout-to-image generation with geometric conditions and demonstrate that L2I-generated images can be beneficial for improving the performance of object detectors.	翻訳日:2023-06-29 17:32:59 公開日:2023-06-28
# 適応的勾配に基づく外乱除去による雑音ラベルの学習 Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal ( http://arxiv.org/abs/2306.04502v2 ) ライセンス: Link先を確認	Anastasiia Sedova, Lena Zellinger, Benjamin Roth	(参考訳) 正確で実質的なデータセットは、信頼性とパフォーマンスのよいモデルのトレーニングに不可欠です。しかし、手動でアノテートされたデータセットでさえラベルエラーを含んでいる。従来、ラベルのデノイジングの方法は、主に、データセットのオーバーフィルタやアンダーフィルタのプロセスである、異常値の検出と永続的な削除に重点を置いてきた。本稿では,Adaptive GRAdient-based outlier removal を用いて,雑音ラベルを用いた新しい学習法 AGRAを提案する。モデルトレーニングの前にデータセットをクリーニングする代わりに、トレーニングプロセス中にデータセットを動的に調整する。サンプルのバッチの集約勾配と個々のサンプル勾配を比較することで、この時点で対応するサンプルがモデルに有用か、あるいは非生産的かを動的に決定し、現在の更新のために残すべきである。いくつかのデータセットに対する広範囲な評価はAGRAの有効性を示しているが、包括的な結果分析は私たちの最初の仮説を支持している。 An accurate and substantial dataset is essential for training a reliable and well-performing model. However, even manually annotated datasets contain label errors, not to mention automatically labeled ones. Previous methods for label denoising have primarily focused on detecting outliers and their permanent removal - a process that is likely to over- or underfilter the dataset. In this work, we propose AGRA: a new method for learning with noisy labels by using Adaptive GRAdient-based outlier removal. Instead of cleaning the dataset prior to model training, the dataset is dynamically adjusted during the training process. By comparing the aggregated gradient of a batch of samples and an individual example gradient, our method dynamically decides whether a corresponding example is helpful for the model at this point or is counter-productive and should be left out for the current update. Extensive evaluation on several datasets demonstrates AGRA's effectiveness, while a comprehensive results analysis supports our initial hypothesis: permanent hard outlier removal is not always what model benefits the most from.	翻訳日:2023-06-29 17:32:38 公開日:2023-06-28
# 社会技術的ギャップを狭めるモデル評価の再検討 Rethinking Model Evaluation as Narrowing the Socio-Technical Gap ( http://arxiv.org/abs/2306.03100v2 ) ライセンス: Link先を確認	Q. Vera Liao, Ziang Xiao	(参考訳) 最近のジェネレーティブ言語モデル(llm)の開発は、研究コミュニティや業界が取り組んでいるモデル評価に新たな挑戦をもたらしている。これらのモデルの汎用性は興奮を喚起する一方で、必然的に均質化へと跳躍する。本稿では,この均質化によってもたらされる課題と責任に対処するためには,モデル評価の実践が重要な課題を担わなければならないことを論じる。社会科学、ヒューマン・コンピュータ・インタラクション(HCI)、説明可能なAI(XAI)の学際的な分野から教訓を得て、実世界の社会要求に基づく評価手法の開発をコミュニティに促し、現実主義から社会要求へのトレードオフと実用的コストの認識による多様な評価手法を取り入れて評価を行う。 HCI と現在の NLG 評価手法をマッピングすることにより,社会技術的ギャップを狭くし,オープンな疑問を呈する LLM の評価手法を提案する。 The recent development of generative and large language models (LLMs) poses new challenges for model evaluation that the research community and industry are grappling with. While the versatile capabilities of these models ignite excitement, they also inevitably make a leap toward homogenization: powering a wide range of applications with a single, often referred to as ``general-purpose'', model. In this position paper, we argue that model evaluation practices must take on a critical task to cope with the challenges and responsibilities brought by this homogenization: providing valid assessments for whether and how much human needs in downstream use cases can be satisfied by the given model (socio-technical gap). By drawing on lessons from the social sciences, human-computer interaction (HCI), and the interdisciplinary field of explainable AI (XAI), we urge the community to develop evaluation methods based on real-world socio-requirements and embrace diverse evaluation methods with an acknowledgment of trade-offs between realism to socio-requirements and pragmatic costs to conduct the evaluation. By mapping HCI and current NLG evaluation methods, we identify opportunities for evaluation methods for LLMs to narrow the socio-technical gap and pose open questions.	翻訳日:2023-06-29 17:32:24 公開日:2023-06-28
# 重ね合わせ方向の時間軸を持つ量子演算 Quantum operations with the time axis in a superposed direction ( http://arxiv.org/abs/2306.02755v3 ) ライセンス: Link先を確認	Seok Hyung Lie, M. S. Kim	(参考訳) 量子論において、ある過程が行列転位を適用し、それが物理的に保たれているかどうかを調べることによって、時間反転対称性を持つかどうかが示されている。しかし、量子過程の不定因果順序に関する最近の発見は、完全な反転以外に、より一般的な時間の対称性変換が存在することを示唆している。本研究では,行列変換という一般化された転置の概念を導入し,量子演算の未来と過去のヒルベルト空間の一般二部一元変換を考慮し,時間軸を重畳方向に確実に横たわらせ,従来研究されていた「時間の不定方向」、すなわち前方の重畳と後方の時間進化を一般化する。この枠組みは、時空構造が量子力学から現れると説明される量子重力と同様に時間と空間を等しく扱うアプローチに応用することができる。この一般化された転位法を用いて、完全テンソルの連続的一般化、サブシステムのトレースの動的バージョン、二成分量子相互作用における多重時間軸の互換性を調べる。特に,両部間相互作用がより異なる時間軸と一致している場合,因果的違反を防止するため,両者間の情報交換の費用が削減されることを示す。 In the quantum theory, it has been shown that one can see if a process has the time reversal symmetry by applying the matrix transposition and examining if it remains physical. However, recent discoveries regarding the indefinite causal order of quantum processes suggest that there may be other, more general symmetry transformations of time besides the complete reversal. In this work, we introduce an expanded concept of matrix transposition, the generalized transposition, that takes into account general bipartite unitary transformations of a quantum operation's future and past Hilbert spaces, allowing for making the time axis definitely lie in a superposed direction, which generalizes the previously studied `indefinite direction of time', i.e., superposition of the forward and the backward time evolution. This framework may have applications in approaches that treat time and space equally like quantum gravity, where the spatio-temporal structure is explained to emerge from quantum mechanics. We apply this generalized transposition to investigate a continuous generalization of perfect tensors, a dynamic version of tracing out a subsystem, and the compatibility of multiple time axes in bipartite quantum interactions. Notably, we demonstrate that when a bipartite interaction is consistent with more distinct local temporal axes, there is a reduced allowance for information exchange between the two parties in order to prevent causality violations.	翻訳日:2023-06-29 17:32:01 公開日:2023-06-28
# オートエンコーダの最大度トレーニング Maximum Likelihood Training of Autoencoders ( http://arxiv.org/abs/2306.01843v2 ) ライセンス: Link先を確認	Peter Sorrenson, Felix Draxler, Armand Rousselot, Sander Hummerich, Lea Zimmermann and Ullrich K\"othe	(参考訳) 最大度トレーニングは好適な統計特性を持ち、特に正規化フローにおいて生成的モデリングに人気がある。一方、生成オートエンコーダは多様体仮説による流れの正規化よりも効率的なことを約束している。本研究では,制約のないオートエンコーダの最大確率トレーニングを初めて導入し,この2つのパラダイムを組み合わせる。第一に、フリーフォームネットワークのための既存の最大確率推定器は、潜在次元と線形にコストがスケールする反復スキームに依存するため、受け入れがたいほど遅い。改良された推定器を導入し、イテレーションを排除し、一定のコスト(バニラオートエンコーダのバッチあたりのランタイムの約2倍)をもたらす。第2に,自動エンコーダに最大限の確率を適用することで,異なる解を導き出すことが可能であり,この知見を用いて安定的な最大確率トレーニング目標を動機付けることを実証する。我々は,玩具,表,画像データについて広範な実験を行い,その結果の競争性能を実証した。我々は、我々のモデルを最大可能性オートエンコーダ(MLAE)と呼ぶ。 Maximum likelihood training has favorable statistical properties and is popular for generative modeling, especially with normalizing flows. On the other hand, generative autoencoders promise to be more efficient than normalizing flows due to the manifold hypothesis. In this work, we introduce successful maximum likelihood training of unconstrained autoencoders for the first time, bringing the two paradigms together. To do so, we identify and overcome two challenges: Firstly, existing maximum likelihood estimators for free-form networks are unacceptably slow, relying on iteration schemes whose cost scales linearly with latent dimension. We introduce an improved estimator which eliminates iteration, resulting in constant cost (roughly double the runtime per batch of a vanilla autoencoder). Secondly, we demonstrate that naively applying maximum likelihood to autoencoders can lead to divergent solutions and use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular and image data, demonstrating the competitive performance of the resulting model. We call our model the maximum likelihood autoencoder (MLAE).	翻訳日:2023-06-29 17:31:39 公開日:2023-06-28
# 変圧器を用いたアノテーションバイアスを考慮した医用画像分割 Transformer-based Annotation Bias-aware Medical Image Segmentation ( http://arxiv.org/abs/2306.01340v2 ) ライセンス: Link先を確認	Zehui Liao, Yutong Xie, Shishuai Hu, Yong Xia	(参考訳) 手動画像分割は主観的であり、アノテータ関連バイアスに悩まされ、深層学習法によって模倣または増幅される。近年、このバイアスはアノテータの好みと確率的誤差の組合せであり、それぞれデコーダと画素単位の独立なガウス分布の後にある畳み込みブロックによってモデル化されている。畳み込みブロックは、全解像度レベルで様々な好みの度合いを効果的にモデル化することは不可能である。さらに、独立画素ワイドガウス分布は画素相関を無視し、不連続境界をもたらす。本稿では,アノテーションの嗜好と確率的誤りをモデル化することにより,アノテーション関連バイアスに取り組むトランスフォーマタ型アノテーション・バイアス・アウェア(tab)医療画像分割モデルを提案する。 TABはTransformerと学習可能なクエリを使って、好みに重点を置くさまざまな特徴を抽出する。これにより、TABは単一のセグメンテーションヘッドを使用して、様々な好みのセグメンテーションを同時に生成できる。さらに、TABは画素相関をモデル化する多変正規分布を仮定し、アノテーション分布を学習して確率誤差を解消する。 6つのアノテーションを付加したOD/OCセグメンテーションベンチマークでTABを評価した。以上の結果から,TABはアノテータ関連バイアスを考慮した既存の医用画像セグメンテーションモデルより優れていることが示唆された。 Manual medical image segmentation is subjective and suffers from annotator-related bias, which can be mimicked or amplified by deep learning methods. Recently, researchers have suggested that such bias is the combination of the annotator preference and stochastic error, which are modeled by convolution blocks located after decoder and pixel-wise independent Gaussian distribution, respectively. It is unlikely that convolution blocks can effectively model the varying degrees of preference at the full resolution level. Additionally, the independent pixel-wise Gaussian distribution disregards pixel correlations, leading to a discontinuous boundary. This paper proposes a Transformer-based Annotation Bias-aware (TAB) medical image segmentation model, which tackles the annotator-related bias via modeling annotator preference and stochastic errors. TAB employs the Transformer with learnable queries to extract the different preference-focused features. This enables TAB to produce segmentation with various preferences simultaneously using a single segmentation head. Moreover, TAB takes the multivariant normal distribution assumption that models pixel correlations, and learns the annotation distribution to disentangle the stochastic error. We evaluated our TAB on an OD/OC segmentation benchmark annotated by six annotators. Our results suggest that TAB outperforms existing medical image segmentation models which take into account the annotator-related bias.	翻訳日:2023-06-29 17:31:20 公開日:2023-06-28
# AIによる意思決定のためのヒューマンアライズドキャリブレーション Human-Aligned Calibration for AI-Assisted Decision Making ( http://arxiv.org/abs/2306.00074v2 ) ライセンス: Link先を確認	Nina L. Corvelo Benz and Manuel Gomez Rodriguez	(参考訳) バイナリ分類器を使用して意思決定支援を行う場合、通常はラベル予測と信頼値の両方を提供する。次に、意思決定者は、信頼度値を使用して、予測をどれだけ信頼するかを判断する。この文脈では、信頼度値は、予測されたラベルが基底真理ラベルと一致する確率の十分に校正された推定値に対応するべきであるとしばしば主張されている。しかし、複数の実証的証拠は、意思決定者がこれらの信頼度値を用いて予測をいつ信頼するかを判断するのに難しいことを示唆している。本稿では,まずその理由を理解し,より有用な信頼値の構築方法を検討することを目的とする。我々はまず、広範囲のユーティリティ機能に対して、合理的な意思決定者が一般的に、上記の信頼度値を使って最適な決定方針を発見することができないデータ分布が存在することを論じる。しかし, 意思決定者自身の予測に対する信頼度に関して, 信頼度値が自然な整合性を満たすならば, 常に, 意思決定者が予測に立たなければならない信頼度が信頼度に単調であり, 発見可能性の向上に寄与する最適決定方針が存在することを示す。さらに, 意思決定者自身の予測に対する信頼度に対する多重化が, 調整の十分条件であることを示す。分類器が実際の人間の専門家に意思決定支援を提供する4つのAI支援意思決定タスクの実験は、我々の理論的結果を検証するとともに、アライメントがより良い意思決定につながることを示唆している。 Whenever a binary classifier is used to provide decision support, it typically provides both a label prediction and a confidence value. Then, the decision maker is supposed to use the confidence value to calibrate how much to trust the prediction. In this context, it has been often argued that the confidence value should correspond to a well calibrated estimate of the probability that the predicted label matches the ground truth label. However, multiple lines of empirical evidence suggest that decision makers have difficulties at developing a good sense on when to trust a prediction using these confidence values. In this paper, our goal is first to understand why and then investigate how to construct more useful confidence values. We first argue that, for a broad class of utility functions, there exist data distributions for which a rational decision maker is, in general, unlikely to discover the optimal decision policy using the above confidence values -- an optimal decision maker would need to sometimes place more (less) trust on predictions with lower (higher) confidence values. However, we then show that, if the confidence values satisfy a natural alignment property with respect to the decision maker's confidence on her own predictions, there always exists an optimal decision policy under which the level of trust the decision maker would need to place on predictions is monotone on the confidence values, facilitating its discoverability. Further, we show that multicalibration with respect to the decision maker's confidence on her own predictions is a sufficient condition for alignment. Experiments on four different AI-assisted decision making tasks where a classifier provides decision support to real human experts validate our theoretical results and suggest that alignment may lead to better decisions.	翻訳日:2023-06-29 17:30:55 公開日:2023-06-28
# 拡張視野を用いた複数音源変換トモグラフィーの解析的再構成に関する研究 Analytical reconstructions of multiple source-translation computed tomography with extended field of views: a research study ( http://arxiv.org/abs/2305.19767v2 ) ライセンス: Link先を確認	Zhisheng Wang, Yue Liu, Shunli Wang, Xingyuan Bian, Zongfeng Li and Junning Cui	(参考訳) 本稿では,複数音源変換トモグラフィ(mSTCT)を拡張視野(FOV)下での高品質な解析的再構成について検討する。より大規模なFOVでは、D-BPF や S-BPF を含む mSTCT のバックプロジェクションフィルタ (BPF) アルゴリズムが、不安定なバックプロジェクション重み付け因子と半スキャンモードにより画像エッジに許容できない誤りを犯し、mSTCT イメージングの意図から逸脱する。本稿では,fd-bpfとfs-bpfと略されるエラーのバランスをとるために,mstctの非重み付けd-bpf(nwd-bpf)を導出し,bpfsを特別なフルスキャンmstct(f-mstct)に導入する手法を提案する。第一戦略として、D-BPFに特殊変動関係を導入することにより、不安定な後方投影重み付け因子を除去する。第2の戦略として、F-mSTCT幾何とBPFを組み合わせることで、F-mSTCTに適切な冗長重み付け関数を導出する。実験により,提案手法が実証された。その中で、NWD-BPFは画像エッジの不安定性を弱めることができるが、詳細は曖昧であり、FS-BPFは大きな物体を撮像する極端に拡張されたFOVの下で高品質な安定画像を得ることができるが、FD-BPFよりも多くの投影を必要とする。 FOV画像の拡張における様々な実践的要件に対して,アルゴリズムの選択について提案する。 This paper is to investigate the high-quality analytical reconstructions of multiple source-translation computed tomography (mSTCT) under an extended field of view (FOV). Under the larger FOVs, the previously proposed backprojection filtration (BPF) algorithms for mSTCT, including D-BPF and S-BPF, make some intolerable errors in the image edges due to an unstable backprojection weighting factor and the half-scan mode, which deviates from the intention of mSTCT imaging. In this paper, to achieve reconstruction with as little error as possible under the extremely extended FOV, we propose two strategies, including deriving a no-weighting D-BPF (NWD-BPF) for mSTCT and introducing BPFs into a special full-scan mSTCT (F-mSTCT) to balance errors, i.e., abbreviated as FD-BPF and FS-BPF. For the first strategy, we eliminate this unstable backprojection weighting factor by introducing a special variable relationship in D-BPF. For the second strategy, we combine the F-mSTCT geometry with BPFs to study the performance and derive a suitable redundant weighting function for F-mSTCT. The experiments demonstrate our proposed methods for these strategies. Among them, NWD-BPF can weaken the instability at the image edges but blur the details, and FS-BPF can get high-quality stable images under the extremely extended FOV imaging a large object but requires more projections than FD-BPF. For different practical requirements in extending FOV imaging, we give suggestions on algorithm selection.	翻訳日:2023-06-29 17:30:27 公開日:2023-06-28
# 集合観測からの普遍的偏りのない分類法 A Universal Unbiased Method for Classification from Aggregate Observations ( http://arxiv.org/abs/2306.11343v2 ) ライセンス: Link先を確認	Zixi Wei, Lei Feng, Bo Han, Tongliang Liu, Gang Niu, Xiaofeng Zhu, Heng Tao Shen	(参考訳) 従来の教師付き分類では、個々のインスタンスには真のラベルが必要である。しかし、プライバシの懸念や不適切なアノテーションコストのために、個々のインスタンスの真のラベルを収集することは禁止される可能性がある。これは、個々のインスタンスではなく、インスタンスのグループに監督を提供する集合観察(CFAO)からの分類の研究を動機付けている。 CFAOは、多言語学習やラベル比率からの学習など、さまざまな学習問題を含む一般化学習フレームワークである。本研究の目的は,任意の損失に対する分類リスクの偏りのない推定値を保持する,新しいCFAOの普遍的手法を提案することである。実際、本手法はグループ内の各インスタンスに対する各ラベルの重要性を考慮し、分類器が学習するパーソナライズされた監督を提供する。理論的には,提案手法は不偏リスク推定器によるリスクの整合性を保証するだけでなく,任意の損失に対応できる。 CFAOの諸問題に対する大規模な実験により,提案手法の優位性を示した。 In conventional supervised classification, true labels are required for individual instances. However, it could be prohibitive to collect the true labels for individual instances, due to privacy concerns or unaffordable annotation costs. This motivates the study on classification from aggregate observations (CFAO), where the supervision is provided to groups of instances, instead of individual instances. CFAO is a generalized learning framework that contains various learning problems, such as multiple-instance learning and learning from label proportions. The goal of this paper is to present a novel universal method of CFAO, which holds an unbiased estimator of the classification risk for arbitrary losses -- previous research failed to achieve this goal. Practically, our method works by weighing the importance of each label for each instance in the group, which provides purified supervision for the classifier to learn. Theoretically, our proposed method not only guarantees the risk consistency due to the unbiased risk estimator but also can be compatible with arbitrary losses. Extensive experiments on various problems of CFAO demonstrate the superiority of our proposed method.	翻訳日:2023-06-29 17:23:45 公開日:2023-06-28
# 対角線はループ量子宇宙論の一般的な図像か? Is the diagonal case a general picture for Loop Quantum Cosmology? ( http://arxiv.org/abs/2306.10934v2 ) ライセンス: Link先を確認	Matteo Bruno and Giovanni Montani	(参考訳) ループ量子重力の初期の均質宇宙への正しい実装は、su(2)対称性が適切に保持できないため、文献において長い議論の対象となっている。この対称性の役割はガウス制約によって表される。ここで、バニッシュでないガウス制約が見つかる。しかし、適切な変数を用いて3つのアベリア制約に再キャストできることを示し、ループ量子宇宙論においてそのような対称性が存在しないことを正当化する。 The correct implementation of the Loop Quantum Gravity to the early homogeneous Universe has been the subject of a long debate in the literature because the SU(2) symmetry cannot be properly retained. The role of this symmetry is expressed by the Gauss constraint. Here, a non-vanishing Gauss constraint is found. However, we show that using suitable variables, it can be recast into three Abelian constraints, justifying the absence of such a symmetry in Loop Quantum Cosmology.	翻訳日:2023-06-29 17:23:28 公開日:2023-06-28
# the false dawn: チップマクロ配置のためのgoogleの強化学習の再評価 The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement ( http://arxiv.org/abs/2306.09633v4 ) ライセンス: Link先を確認	Igor L. Markov	(参考訳) Google 2021 Natureの論文で、シリコンチップの物理的設計のための強化学習(RL)が論争を引き起こした。 nature紙は、報告された結果を生成するために必要なほとんどの入力と、方法論におけるいくつかの重要なステップを支持した。しかし、2つの異なる評価がギャップを埋め、Google RLが人間設計者より遅れており、よく知られたアルゴリズム(Simulated Annealing)、そして一般的な商用ソフトウェアよりも遅れていることを示した。クロスチェックデータによると、Nature論文の完全性は、行動、分析、報告の誤りによって著しく損なわれている。 Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and attracted critical media coverage. The Nature paper withheld most inputs needed to produce reported results and some critical steps in the methodology. But two separate evaluations filled in the gaps and demonstrated that Google RL lags behind human designers, behind a well-known algorithm (Simulated Annealing), and also behind generally-available commercial software. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in the conduct, analysis and reporting.	翻訳日:2023-06-29 17:23:20 公開日:2023-06-28
# 体積医用画像分割のための学習可能な重み初期化 Learnable Weight Initialization for Volumetric Medical Image Segmentation ( http://arxiv.org/abs/2306.09320v3 ) ライセンス: Link先を確認	Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan	(参考訳) 局所畳み込みとグローバルな注意の利点を組み合わせたハイブリッド容積医用画像セグメンテーションモデルが最近注目されている。主にアーキテクチャの変更に重点を置いているが、既存のほとんどのハイブリッドアプローチでは、医療データの本質的な容積性を無視して性能を制限する従来のデータ非依存の重み初期化スキームが使用されている。そこで本研究では, 利用可能な医療訓練データを用いて, 提案する自己監督目標を用いて, 文脈的および構造的手がかりを効果的に学習する, 学習可能な重み初期化手法を提案する。我々のアプローチはどんなハイブリッドモデルにも簡単に統合でき、外部のトレーニングデータを必要としない。多臓器・肺癌セグメンテーションタスクの実験は、我々のアプローチの有効性を示し、最先端セグメンテーション性能をもたらす。提案手法は,マルチオーガンセグメンテーションタスクにおける大規模データセットを用いて事前学習したswain-unetrモデルと比較して良好に機能する。ソースコードとモデルは、https://github.com/shahinakk/lwi-vmsで利用可能です。 Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention. While mainly focusing on architectural modifications, most existing hybrid approaches still use conventional data-independent weight initialization schemes which restrict their performance due to ignoring the inherent volumetric nature of the medical data. To address this issue, we propose a learnable weight initialization approach that utilizes the available medical training data to effectively learn the contextual and structural cues via the proposed self-supervised objectives. Our approach is easy to integrate into any hybrid model and requires no external training data. Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach, leading to state-of-the-art segmentation performance. Our proposed data-dependent initialization approach performs favorably as compared to the Swin-UNETR model pretrained using large-scale datasets on multi-organ segmentation task. Our source code and models are available at: https://github.com/ShahinaKK/LWI-VMS.	翻訳日:2023-06-29 17:23:05 公開日:2023-06-28
# AssistGPT:計画、実行、検査、学習が可能な汎用マルチモーダルアシスタント AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn ( http://arxiv.org/abs/2306.08640v2 ) ライセンス: Link先を確認	Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou	(参考訳) 近年のLarge Language Models (LLMs) の研究は、一般のNLPAIアシスタントに顕著な進歩をもたらした。いくつかの研究は、より一般的なマルチモーダルユーザクエリに対処するために、モデルやapiの計画と呼び出しにllmの使用をさらに検討している。この進歩にもかかわらず、視覚タスクの多様な性質のため、複雑な視覚ベースのタスクは依然として困難である。この多様性は2つの側面に反映されます 1)経路の推論。多くの実生活アプリケーションでは、クエリ自体を調べるだけでクエリを正確に分解することは困難である。特定の視覚内容と各ステップの結果に基づいた計画が通常必要である。 2)柔軟な入力と中間結果。入力フォームは、野生のケースでは柔軟で、単一の画像やビデオだけでなく、ビデオや画像の混合物(たとえば、ユーザービュー画像といくつかの参照ビデオ)も含む。さらに、複雑な推論プロセスは、ビデオナレーションやセグメント化されたビデオクリップなど、さまざまなマルチモーダル中間結果を生成する。このような一般的なケースに対処するため,我々は,plan,execute,inspect,learning(peil)と呼ばれるインターリーブされたコードと言語推論アプローチを備えたマルチモーダルaiアシスタントである assistgpt を提案する。具体的には、Plannerは自然言語を使ってExecutorのどのツールが次にすべきかを、現在の推論の進捗に基づいて計画することができる。インスペクタは、プランナーが特定のツールに適切な視覚情報を供給するのを補助する効率的なメモリマネージャである。最後に、推論プロセス全体が複雑で柔軟であるため、学習者はモデルが最適な解を自律的に探索し発見できるように設計されている。我々は, A-OKVQA と NExT-QA のベンチマーク実験を行った。さらに,本システムでは,ベンチマークよりもはるかに複雑な質問を処理可能であることを示す。 Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of LLMs for planning and invoking models or APIs to address more general multi-modal user queries. Despite this progress, complex visual-based tasks still remain challenging due to the diverse nature of visual tasks. This diversity is reflected in two aspects: 1) Reasoning paths. For many real-life applications, it is hard to accurately decompose a query simply by examining the query itself. Planning based on the specific visual content and the results of each step is usually required. 2) Flexible inputs and intermediate results. Input forms could be flexible for in-the-wild cases, and involves not only a single image or video but a mixture of videos and images, e.g., a user-view image with some reference videos. Besides, a complex reasoning process will also generate diverse multimodal intermediate results, e.g., video narrations, segmented video clips, etc. To address such general cases, we propose a multi-modal AI assistant, AssistGPT, with an interleaved code and language reasoning approach called Plan, Execute, Inspect, and Learn (PEIL) to integrate LLMs with various tools. Specifically, the Planner is capable of using natural language to plan which tool in Executor should do next based on the current reasoning progress. Inspector is an efficient memory manager to assist the Planner to feed proper visual information into a specific tool. Finally, since the entire reasoning process is complex and flexible, a Learner is designed to enable the model to autonomously explore and discover the optimal solution. We conducted experiments on A-OKVQA and NExT-QA benchmarks, achieving state-of-the-art results. Moreover, showcases demonstrate the ability of our system to handle questions far more complex than those found in the benchmarks.	翻訳日:2023-06-29 17:22:46 公開日:2023-06-28
# 分断された肺気道・容器のトポロジー修復:ベースラインとデータセット Topology Repairing of Disconnected Pulmonary Airways and Vessels: Baselines and a Dataset ( http://arxiv.org/abs/2306.07089v2 ) ライセンス: Link先を確認	Ziqiao Weng, Jiancheng Yang, Dongnan Liu, Weidong Cai	(参考訳) 肺疾患の診断と治療には, 肺気道および血管の正確な分断が重要である。しかし、現在のディープラーニングアプローチは、その臨床的有用性を阻害する分離性の問題に苦しむ。この課題に対処するために, 分離肺管状構造のトポロジーを修復するためにデータ駆動法を応用した後処理手法を提案する。我々のアプローチは、ニューラルネットワークが非接続なコンポーネントをブリッジできるキーポイントを予測するために訓練されるキーポイント検出タスクとして問題を定式化する。完全肺構造から分離したデータを生成するトレーニングデータ合成パイプラインを使用する。さらに、肺気道、動脈、静脈の800の完全な3Dモデルと合成切断データを含む新しい肺樹修復データセットが公開されている。私たちのコードとデータはhttps://github.com/m3dv/pulmonary-tree-repairingで入手できます。 Accurate segmentation of pulmonary airways and vessels is crucial for the diagnosis and treatment of pulmonary diseases. However, current deep learning approaches suffer from disconnectivity issues that hinder their clinical usefulness. To address this challenge, we propose a post-processing approach that leverages a data-driven method to repair the topology of disconnected pulmonary tubular structures. Our approach formulates the problem as a keypoint detection task, where a neural network is trained to predict keypoints that can bridge disconnected components. We use a training data synthesis pipeline that generates disconnected data from complete pulmonary structures. Moreover, the new Pulmonary Tree Repairing (PTR) dataset is publicly available, which comprises 800 complete 3D models of pulmonary airways, arteries, and veins, as well as the synthetic disconnected data. Our code and data are available at https://github.com/M3DV/pulmonary-tree-repairing.	翻訳日:2023-06-29 17:21:53 公開日:2023-06-28
# 近似制約最適化のための自己教師付きEquality Embedded Deep Lagrange Dual Self-supervised Equality Embedded Deep Lagrange Dual for Approximate Constrained Optimization ( http://arxiv.org/abs/2306.06674v3 ) ライセンス: Link先を確認	Minsoo Kim, Hongseok Kim	(参考訳) 従来の解法はしばしば、特に大規模かつ時間クリティカルな問題において、制約付き最適化のために計算コストがかかる。これにより、ニューラルネットワーク(NN)を高速な最適解近似器として使用することへの関心が高まっているが、NNに制約を組み込むことは難しい。そこで本研究では,ラベルを使わずに最適解を見つけることを学ぶフレームワークdeep lagrange dual with equal embedded (deeplde)を提案する。実現可能なソリューションを確保するため、NNに等価性制約を組み込み、未等式制約を課すために原始双対法を用いてNNを訓練する。さらに,DeepLDEの収束性を証明し,本手法だけでは等式埋め込みの助けなしには等式制約を保証できないことを示す。コンベックス,非凸,AC最適電力流(AC-OPF)問題に関するシミュレーション結果から,提案したDeepLDEはNNベースの全アプローチの中で最小の最適性ギャップを達成でき,かつ常に実現可能な解を確保できることを示す。さらに,制約付き凸,非凸最適化,ac-opfの解法において,提案手法の計算時間はdc3および従来の解法に比べて約5～250倍高速である。 Conventional solvers are often computationally expensive for constrained optimization, particularly in large-scale and time-critical problems. While this leads to a growing interest in using neural networks (NNs) as fast optimal solution approximators, incorporating the constraints with NNs is challenging. In this regard, we propose deep Lagrange dual with equality embedding (DeepLDE), a framework that learns to find an optimal solution without using labels. To ensure feasible solutions, we embed equality constraints into the NNs and train the NNs using the primal-dual method to impose inequality constraints. Furthermore, we prove the convergence of DeepLDE and show that the primal-dual learning method alone cannot ensure equality constraints without the help of equality embedding. Simulation results on convex, non-convex, and AC optimal power flow (AC-OPF) problems show that the proposed DeepLDE achieves the smallest optimality gap among all the NN-based approaches while always ensuring feasible solutions. Furthermore, the computation time of the proposed method is about 5 to 250 times faster than DC3 and the conventional solvers in solving constrained convex, non-convex optimization, and/or AC-OPF.	翻訳日:2023-06-29 17:21:39 公開日:2023-06-28
# 微分表示型測光ステレオ Differentiable Display Photometric Stereo ( http://arxiv.org/abs/2306.13325v2 ) ライセンス: Link先を確認	Seokjun Choi, Seungwoo Yoon, Giljoo Nam, Seungyong Lee, Seung-Hwan Baek	(参考訳) フォトメトリックステレオは、光度条件の変化を利用してピクセルごとの表面正常を再構成する。従来のモニタを照明源として使用するディスプレイフォトメトリックステレオの概念は、かさばり、使いづらい従来の設定でしばしば発生する制限を克服する可能性がある。本稿では,市販のモニターとカメラを用いた高忠実度ノーマルリコンストラクションを実現するため,DDPS(diffariable Display Photometric Stereo)を提案する。 ddpsは、フォトメトリックステレオにおける批判的だがしばしば無視される課題に対処している。本稿では,フォトメトリックステレオ再構成法と基底照明画像形成を併用する微分可能なフレームワークを提案する。これにより、ディスプレイパターンの学習が容易になり、自動微分による高品質な正常な再構築につながる。エンドツーエンドの最適化に固有の合成ドメインギャップに対処し、3Dプリントオブジェクトからなる実世界の測光ステレオトレーニングデータセットを提案する。さらに,光度ステレオの異常な性質を低減するために,モニタから放射される線形偏光を利用して,撮像画像中の拡散反射とスペクトル反射を光学的に分離する。 DDPSは、ターゲット設定に最適化されたディスプレイパターンを学習することができ、初期化に堅牢であることを示す。本研究では,3次元プリントオブジェクトにおけるDDPSの評価を行い,DPSが効果的な測光ステレオ再構成を実現することを実証した。 Photometric stereo leverages variations in illumination conditions to reconstruct per-pixel surface normals. The concept of display photometric stereo, which employs a conventional monitor as an illumination source, has the potential to overcome limitations often encountered in bulky and difficult-to-use conventional setups. In this paper, we introduce Differentiable Display Photometric Stereo (DDPS), a method designed to achieve high-fidelity normal reconstruction using an off-the-shelf monitor and camera. DDPS addresses a critical yet often neglected challenge in photometric stereo: the optimization of display patterns for enhanced normal reconstruction. We present a differentiable framework that couples basis-illumination image formation with a photometric-stereo reconstruction method. This facilitates the learning of display patterns that leads to high-quality normal reconstruction through automatic differentiation. Addressing the synthetic-real domain gap inherent in end-to-end optimization, we propose the use of a real-world photometric-stereo training dataset composed of 3D-printed objects. Moreover, to reduce the ill-posed nature of photometric stereo, we exploit the linearly polarized light emitted from the monitor to optically separate diffuse and specular reflections in the captured images. We demonstrate that DDPS allows for learning display patterns optimized for a target configuration and is robust to initialization. We assess DDPS on 3D-printed objects with ground-truth normals and diverse real-world objects, validating that DDPS enables effective photometric-stereo reconstruction.	翻訳日:2023-06-29 17:11:59 公開日:2023-06-28
# フェデレーション学習におけるコミュニケーション削減のための効率的な仮想データ生成手法 An Efficient Virtual Data Generation Method for Reducing Communication in Federated Learning ( http://arxiv.org/abs/2306.12088v2 ) ライセンス: Link先を確認	Cheng Yang, Xue Yang, Dongxian Wu, Xiaohu Tang	(参考訳) コミュニケーションのオーバーヘッドは、連合学習(fl)における大きな課題の1つです。いくつかの古典的なスキームでは、サーバがローカルモデルから参加者のトレーニングデータに関する補助情報を抽出して中央ダミーデータセットを構築することができると仮定している。サーバはダミーデータセットを使用して、集約されたグローバルモデルを微調整し、より少ない通信ラウンドでターゲットテスト精度を達成する。本稿では、上記のソリューションをデータベースの通信効率の高いflフレームワークにまとめる。提案フレームワークの鍵となるのは,ダミーデータセットが集約されたグローバルモデルに正の影響を与えることを保証する効率的な抽出モジュール(EM)を設計することである。ジェネレータを使ってEMを設計する既存手法とは異なり,提案手法では勾配マッチングの概念を取り入れてEMを構築する。具体的には、FedINIBoostは、実際のデータセットのプロキシデータセットを、各コミュニケーションラウンドの参加者毎に2つのステップで構築する。その後、サーバはすべてのプロキシデータセットを集約し、集約されたグローバルモデルを微調整するために使用される中央ダミーデータセットを形成する。従来手法であるFedAVG,FedProx,Moon,FedFTGと比較し,本手法の優位性を検証した。さらに、FedINIBoostは、FLの初期における集約グローバルモデルの性能を微調整する上で重要な役割を果たす。 Communication overhead is one of the major challenges in Federated Learning(FL). A few classical schemes assume the server can extract the auxiliary information about training data of the participants from the local models to construct a central dummy dataset. The server uses the dummy dataset to finetune aggregated global model to achieve the target test accuracy in fewer communication rounds. In this paper, we summarize the above solutions into a data-based communication-efficient FL framework. The key of the proposed framework is to design an efficient extraction module(EM) which ensures the dummy dataset has a positive effect on finetuning aggregated global model. Different from the existing methods that use generator to design EM, our proposed method, FedINIBoost borrows the idea of gradient match to construct EM. Specifically, FedINIBoost builds a proxy dataset of the real dataset in two steps for each participant at each communication round. Then the server aggregates all the proxy datasets to form a central dummy dataset, which is used to finetune aggregated global model. Extensive experiments verify the superiority of our method compared with the existing classical method, FedAVG, FedProx, Moon and FedFTG. Moreover, FedINIBoost plays a significant role in finetuning the performance of aggregated global model at the initial stage of FL.	翻訳日:2023-06-29 17:11:37 公開日:2023-06-28
# G-NM:数値時系列予測モデルのグループ G-NM: A Group of Numerical Time Series Prediction Models ( http://arxiv.org/abs/2306.11667v2 ) ライセンス: Link先を確認	Juyoung Yun	(参考訳) 本研究では,数値時系列予測モデル群 (G-NM) と総称される数値時系列予測モデルの包括的アンサンブルの開発と実装に焦点を当てた。この包括的セットは、リカレントニューラルネットワーク(RNN)やLong Short-Term Memory(LSTM)といった現代のニューラルネットワークモデルに加えて、Autoregressive Integrated moving Average(ARIMA)、Holt-Wintersのメソッド、SVR(Support Vector Regression)といった従来のモデルを含む。 G-NMは、複雑な自然現象に固有のパターンや傾向に関連する予測能力を増強するために明確に構成されている。これらの事象に関連する時系列データを利用することで、g-nmは長期にわたってそのような現象の予測を容易にする。本研究の目的は,このような事象に対する我々の理解を深めることと,予測の精度を著しく向上させることである。 g-nmは時系列データに現れる線形および非線形の依存関係、季節性、トレンドの両方をカプセル化する。これらのモデルはそれぞれ、線形トレンドと季節性を扱うARIMAのレジリエンス、非線形パターンをキャプチャするSVRの習熟度、時系列データの様々なコンポーネントをモデル化するLSTMの適応性など、さまざまな長所に貢献している。 g-nmポテンシャルの活用を通じて,大規模時系列予測モデルにおける最先端の進歩を試みている。我々は,本研究が,自然界を構成する複雑な事象を理解し,予測するための,現在進行中の取り組みにおいて,重要な足掛かりとなることを期待する。 In this study, we focus on the development and implementation of a comprehensive ensemble of numerical time series forecasting models, collectively referred to as the Group of Numerical Time Series Prediction Model (G-NM). This inclusive set comprises traditional models such as Autoregressive Integrated Moving Average (ARIMA), Holt-Winters' method, and Support Vector Regression (SVR), in addition to modern neural network models including Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). G-NM is explicitly constructed to augment our predictive capabilities related to patterns and trends inherent in complex natural phenomena. By utilizing time series data relevant to these events, G-NM facilitates the prediction of such phenomena over extended periods. The primary objective of this research is to both advance our understanding of such occurrences and to significantly enhance the accuracy of our forecasts. G-NM encapsulates both linear and non-linear dependencies, seasonalities, and trends present in time series data. Each of these models contributes distinct strengths, from ARIMA's resilience in handling linear trends and seasonality, SVR's proficiency in capturing non-linear patterns, to LSTM's adaptability in modeling various components of time series data. Through the exploitation of the G-NM potential, we strive to advance the state-of-the-art in large-scale time series forecasting models. We anticipate that this research will represent a significant stepping stone in our ongoing endeavor to comprehend and forecast the complex events that constitute the natural world.	翻訳日:2023-06-29 17:11:17 公開日:2023-06-28
# セグメンテーションはイメージデヘイズに役立ちます Let Segment Anything Help Image Dehaze ( http://arxiv.org/abs/2306.15870v1 ) ライセンス: Link先を確認	Zheyan Jin, Shiqi Chen, Yueting Chen, Zhihai Xu, Huajun Feng	(参考訳) 大きな言語モデルと高レベルのビジョンモデルは、大きなデータセットとモデルサイズで素晴らしいパフォーマンス向上を実現しています。しかし、画像デヘイズやぼかし除去のような低レベルのコンピュータビジョンタスクは、依然として少数のデータセットと小さなモデルに依存しており、一般的にオーバーフィットと局所的なオプティマをもたらす。そこで本稿では,大規模モデルを低レベルコンピュータビジョンタスクに統合するフレームワークを提案する。画像分割のタスクと同様に、hazeの分解もテクスチャに関連している。そこで我々は,グレースケール符号化,ネットワークチャネル拡張,プリデヘイズ構造を検出し,低レベルデヘイジングネットワークに大規模事前知識を統合することを提案する。異なるデータセットとアルゴリズムの比較実験により,低レベルの視覚タスクを導く上で,大規模モデルの有効性と適用性を示す。最後に,灰色スケール符号化,ネットワークチャネル拡張,リカレントネットワーク構造の効果をアブレーション実験により実証する。追加のデータやトレーニングリソースが不要な条件下では,大規模モデルの事前知識の統合により,劣化性能が向上し,低レベル視覚タスクのトレーニング時間を短縮できることを示す。 The large language model and high-level vision model have achieved impressive performance improvements with large datasets and model sizes. However, low-level computer vision tasks, such as image dehaze and blur removal, still rely on a small number of datasets and small-sized models, which generally leads to overfitting and local optima. Therefore, we propose a framework to integrate large-model prior into low-level computer vision tasks. Just as with the task of image segmentation, the degradation of haze is also texture-related. So we propose to detect gray-scale coding, network channel expansion, and pre-dehaze structures to integrate large-model prior knowledge into any low-level dehazing network. We demonstrate the effectiveness and applicability of large models in guiding low-level visual tasks through different datasets and algorithms comparison experiments. Finally, we demonstrate the effect of grayscale coding, network channel expansion, and recurrent network structures through ablation experiments. Under the conditions where additional data and training resources are not required, we successfully prove that the integration of large-model prior knowledge will improve the dehaze performance and save training time for low-level visual tasks.	翻訳日:2023-06-29 16:16:12 公開日:2023-06-28
# grass: リモートセンシング画像セマンティクスセグメンテーションのためのグラデーション誘導サンプリング戦略を用いたコントラスト学習 GraSS: Contrastive Learning with Gradient Guided Sampling Strategy for Remote Sensing Image Semantic Segmentation ( http://arxiv.org/abs/2306.15868v1 ) ライセンス: Link先を確認	Zhaoyang Zhang, Zhen Ren, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li	(参考訳) 自己教師付きコントラスト学習(SSCL)は、リモートセンシング画像(RSI)理解において重要なマイルストーンを達成している。その本質は、ダウンストリームタスクに有益である多数のラベルのない画像から画像の特徴を抽出するための教師なしインスタンス識別プリテキストタスクを設計することである。しかしながら、既存のインスタンス識別ベースのssclは、rsiセマンティックセグメンテーションタスクに適用される場合、2つの制限に苦しむ。 1) 肯定的なサンプル結合問題 2)特徴適応バイアス。ピクセルレベルやオブジェクトレベルの機能を必要とするセマンティックセグメンテーションタスクに適用すると、機能適応バイアスが導入される。本研究では,RSIの特定領域に対して,教師なしのコントラスト損失の勾配によって識別情報をマッピングできることを見いだし,これらの特定領域は特異な接地対象を含む傾向にあることを示した。そこで本研究では,RSIセマンティックセグメンテーションのためのGradient Guided Sampling Strategy(GraSS)を用いたコントラスト学習を提案する。 GraSSは、インスタンス識別ウォームアップ(IDウォームアップ)とGradient Guided Sampling contrastive training(GSトレーニング)の2つのステージで構成される。 idウォームアップは、コントラスト損失勾配に初期識別情報を提供することを目的としている。 gsトレーニングステージは、より特異な接地対象を含むrsiパッチのコントラスト損失勾配および適応的に選択された領域に含まれる識別情報を活用し、新しい正と負のサンプルを構築することを目的としている。 3つのオープンデータセットの実験結果から、GraSSは高分解能RSIセマンティックセグメンテーションにおけるSSCLの性能を効果的に向上することが示された。 5つの異なる種類のssclからの7つのベースライン法と比較すると、草は平均で 1.57 %、最大で 3.58 % の改善を達成している。ソースコードはhttps://github.com/GeoX-Lab/GraSSで入手できる。 Self-supervised contrastive learning (SSCL) has achieved significant milestones in remote sensing image (RSI) understanding. Its essence lies in designing an unsupervised instance discrimination pretext task to extract image features from a large number of unlabeled images that are beneficial for downstream tasks. However, existing instance discrimination based SSCL suffer from two limitations when applied to the RSI semantic segmentation task: 1) Positive sample confounding issue; 2) Feature adaptation bias. It introduces a feature adaptation bias when applied to semantic segmentation tasks that require pixel-level or object-level features. In this study, We observed that the discrimination information can be mapped to specific regions in RSI through the gradient of unsupervised contrastive loss, these specific regions tend to contain singular ground objects. Based on this, we propose contrastive learning with Gradient guided Sampling Strategy (GraSS) for RSI semantic segmentation. GraSS consists of two stages: Instance Discrimination warm-up (ID warm-up) and Gradient guided Sampling contrastive training (GS training). The ID warm-up aims to provide initial discrimination information to the contrastive loss gradients. The GS training stage aims to utilize the discrimination information contained in the contrastive loss gradients and adaptively select regions in RSI patches that contain more singular ground objects, in order to construct new positive and negative samples. Experimental results on three open datasets demonstrate that GraSS effectively enhances the performance of SSCL in high-resolution RSI semantic segmentation. Compared to seven baseline methods from five different types of SSCL, GraSS achieves an average improvement of 1.57\% and a maximum improvement of 3.58\% in terms of mean intersection over the union. The source code is available at https://github.com/GeoX-Lab/GraSS	翻訳日:2023-06-29 16:15:53 公開日:2023-06-28
# 個人別分散推定と学習 Differentially Private Distributed Estimation and Learning ( http://arxiv.org/abs/2306.15865v1 ) ライセンス: Link先を確認	Marios Papachristou, M. Amin Rahimian	(参考訳) エージェントが情報交換を行い、個人が観測したサンプルから未知の確率変数の統計的特性を推定するネットワーク環境における分散推定と学習の問題について検討する。プライベートな観察に関する情報を交換することで、エージェントは未知の量をまとめて見積もることができるが、プライバシー上のリスクにも直面する。我々のアグリゲーション・スキームの目標は、観測されたデータを時間とともに、ネットワーク全体にわたって効率的に組み合わせ、エージェントのプライバシー要求を調整し、その周辺地域を超えて調整することである。我々のアルゴリズムにより、参加者はオフラインまたはオンラインで取得されたプライベート信号から十分な統計量を推定し、その信号とネットワーク近傍のプライバシーを維持することができる。これは微分プライバシー(dp)制約の下で交換された推定値にノイズを付加する調整されたランダム化スキームを持つ線形集計スキームによって達成される。いずれの場合も、全ての信号に中心的なアクセスを持つ仮説的、全知的な観測者の推定への収束を証明し、アルゴリズムの効率を実証する。また,コンバージェンスレート解析と有限時間性能保証を提供し,コンバージェンス時間を最小化するノイズがラプラスノイズであり,各エージェントの信号およびネットワーク特性に対する感度に対応するパラメータであることを示す。最後に,我々の理論的結果を補足し,検証するために,米国電力グリッドネットワークによる実世界のデータと,ドイツ家庭の電力消費データを用いて,すべてのプライバシー体制下での電力ステーションおよび家庭の平均消費電力を推定する実験を行った。 We study distributed estimation and learning problems in a networked environment in which agents exchange information to estimate unknown statistical properties of random variables from their privately observed samples. By exchanging information about their private observations, the agents can collectively estimate the unknown quantities, but they also face privacy risks. The goal of our aggregation schemes is to combine the observed data efficiently over time and across the network, while accommodating the privacy needs of the agents and without any coordination beyond their local neighborhoods. Our algorithms enable the participating agents to estimate a complete sufficient statistic from private signals that are acquired offline or online over time, and to preserve the privacy of their signals and network neighborhoods. This is achieved through linear aggregation schemes with adjusted randomization schemes that add noise to the exchanged estimates subject to differential privacy (DP) constraints. In every case, we demonstrate the efficiency of our algorithms by proving convergence to the estimators of a hypothetical, omniscient observer that has central access to all of the signals. We also provide convergence rate analysis and finite-time performance guarantees and show that the noise that minimizes the convergence time to the best estimates is the Laplace noise, with parameters corresponding to each agent's sensitivity to their signal and network characteristics. Finally, to supplement and validate our theoretical results, we run experiments on real-world data from the US Power Grid Network and electric consumption data from German Households to estimate the average power consumption of power stations and households under all privacy regimes.	翻訳日:2023-06-29 16:15:21 公開日:2023-06-28
# ゼロノイズ補間による実効量子体積の測定値の増大 Increasing the Measured Effective Quantum Volume with Zero Noise Extrapolation ( http://arxiv.org/abs/2306.15863v1 ) ライセンス: Link先を確認	Elijah Pelofske, Vincent Russo, Ryan LaRose, Andrea Mari, Dan Strano, Andreas B\"artschi, Stephan Eidenbenz, William J. Zeng	(参考訳) Quantum Volumeは、短期量子コンピュータのフルスタックベンチマークである。ターゲットデバイス上で合理的な忠実さで実行できる正方形回路の最大サイズを定量化する。エラー緩和(英: error mitigation)は、ノイズ量子コンピュータの期待値を計算する際に発生するノイズの影響を取り除くための一連の手法である。有効量子ボリュームは、ターゲットデバイスだけでなく、エラー軽減アルゴリズムの有効性を評価するために、量子ボリュームプロトコルにエラー緩和を適用するための提案された計量である。ディジタルゼロノイズ外挿法 (Digital Zero-Noise Extrapolation, ZNE) は、回路折り畳みによるノイズレス予測値を推定し、既知のスケール因子による誤差を増幅し、ゼロノイズ限界への外挿を行う。ここでは,大域的かつ局所的なユニタリ折り畳みと分数スケールの因子を併用したZNEが,動的デカップリングと組み合わせることで,ベンダーが測定した量子体積よりも有効な量子体積を増大させることができることを示す。具体的には、4つのibm量子超伝導プロセッサユニットの有効量子体積を測定し、各デバイスでベンダーが測定した量子体積よりも大きい値を得る。これが最初の報告である。 Quantum Volume is a full-stack benchmark for near-term quantum computers. It quantifies the largest size of a square circuit which can be executed on the target device with reasonable fidelity. Error mitigation is a set of techniques intended to remove the effects of noise present in the computation of noisy quantum computers when computing an expectation value of interest. Effective quantum volume is a proposed metric that applies error mitigation to the quantum volume protocol in order to evaluate the effectiveness not only of the target device but also of the error mitigation algorithm. Digital Zero-Noise Extrapolation (ZNE) is an error mitigation technique that estimates the noiseless expectation value using circuit folding to amplify errors by known scale factors and extrapolating to the zero-noise limit. Here we demonstrate that ZNE, with global and local unitary folding with fractional scale factors, in conjunction with dynamical decoupling, can increase the effective quantum volume over the vendor-measured quantum volume. Specifically, we measure the effective quantum volume of four IBM Quantum superconducting processor units, obtaining values that are larger than the vendor-measured quantum volume on each device. This is the first such increase reported.	翻訳日:2023-06-29 16:14:55 公開日:2023-06-28
# 階層型グラフニューラルネットワークによるハンドオブジェクトの6次元ポーズ推定 Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects ( http://arxiv.org/abs/2306.15858v1 ) ライセンス: Link先を確認	Alireza Rezazadeh, Snehal Dikhale, Soshi Iba and Nawid Jamali	(参考訳) ロボット操作、特に手動の物体操作は、しばしば物体の6Dポーズの正確な推定を必要とする。推定ポーズの精度を向上させるため、6次元物体ポーズ推定における最先端のアプローチでは、rgb画像、奥行き、触覚読取などの1つ以上のモードからの観測データを用いる。しかし、既存のアプローチでは、これらのモダリティによって捕獲された物体の基底となる幾何学的構造を限定的に利用し、視覚的特徴に依存している。これにより、このような視覚的特徴が欠けているオブジェクトや、単に視覚的特徴が無視されているオブジェクトが提示される場合のパフォーマンスが低下する。また,現在のアプローチでは,指の位置に埋め込まれた固有情報を利用しない。 To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that flows the information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. 我々は,YCBオブジェクトとモデルセットから多種多様なオブジェクトのサブセット上でモデルを評価し,その手法が既存の最先端技術よりも精度と強靭性で優れていることを示す。また,提案フレームワークを実ロボットにデプロイし,実環境への移動を定量的に示す。 Robotic manipulation, in particular in-hand object manipulation, often requires an accurate estimate of the object's 6D pose. To improve the accuracy of the estimated pose, state-of-the-art approaches in 6D object pose estimation use observational data from one or more modalities, e.g., RGB images, depth, and tactile readings. However, existing approaches make limited use of the underlying geometric structure of the object captured by these modalities, thereby, increasing their reliance on visual features. This results in poor performance when presented with objects that lack such visual features or when visual features are simply occluded. Furthermore, current approaches do not take advantage of the proprioceptive information embedded in the position of the fingers. To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that flows the information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. We evaluate our model on a diverse subset of objects from the YCB Object and Model Set, and show that our method substantially outperforms existing state-of-the-art work in accuracy and robustness to occlusion. We also deploy our proposed framework on a real robot and qualitatively demonstrate successful transfer to real settings.	翻訳日:2023-06-29 16:14:34 公開日:2023-06-28
# 斜め試料を用いた低位構造を有する多腕帯の純探査 Pure exploration in multi-armed bandits with low rank structure using oblivious sampler ( http://arxiv.org/abs/2306.15856v1 ) ライセンス: Link先を確認	Yaxiong Liu, Atsuyoshi Nakamura, Kohei Hatano, Eiji Takimoto	(参考訳) 本稿では,純探索問題の報酬系列の低階構造について考察する。まず, 探索戦略が探査のフィードバックを得られない純粋な探査問題において, 分離した設定を提案する。この分離のため、腕を標本化するための探索戦略が必要である。報奨ベクトルの核情報を取り込むことにより、result bound $o(d\sqrt{(\ln n)/n})$を持つ時間変動と固定ケースの両方に対して効率的なアルゴリズムを提供する。次に,低位列の多腕バンディットにおける純粋探索に対する下限を示す。我々の上界と下界の間には$o(\sqrt{\ln n})$ギャップがある。 In this paper, we consider the low rank structure of the reward sequence of the pure exploration problems. Firstly, we propose the separated setting in pure exploration problem, where the exploration strategy cannot receive the feedback of its explorations. Due to this separation, it requires that the exploration strategy to sample the arms obliviously. By involving the kernel information of the reward vectors, we provide efficient algorithms for both time-varying and fixed cases with regret bound $O(d\sqrt{(\ln N)/n})$. Then, we show the lower bound to the pure exploration in multi-armed bandits with low rank sequence. There is an $O(\sqrt{\ln N})$ gap between our upper bound and the lower bound.	翻訳日:2023-06-29 16:14:11 公開日:2023-06-28
# GoalieNet: アイスホッケーにおけるジョイントゴール、機器、ネットポーズ推定のためのマルチステージネットワーク GoalieNet: A Multi-Stage Network for Joint Goalie, Equipment, and Net Pose Estimation in Ice Hockey ( http://arxiv.org/abs/2306.15853v1 ) ライセンス: Link先を確認	Marjan Shahi, David Clausi, Alexander Wong	(参考訳) コンピュータビジョン駆動アイスホッケー分析の分野では、最も困難で研究の少ないタスクの1つはゴールキーパーのポーズ推定である。一般的な人間のポーズ推定とは違って、太いパッドやマスクの下に隠されたゴールキーパーの関節に対応するキーポイントの検出だけでなく、大きな脚パッドや手袋、スティック、ホッケーネットなどに対応する多数の非人間のキーポイントも含むため、ゴールキーパーポーズ推定ははるかに複雑である。この課題に取り組むため,我々は,ゴールキーパー,機器,ネットのポーズを共同で推定する多段深層ニューラルネットワークであるgoalienetを紹介する。 NHLベンチマークデータを用いた実験の結果,提案したGoalieNetはすべてのキーポイントに対して平均84倍の精度を達成でき,29のキーポイントのうち22が80倍以上の精度で検出されることがわかった。このことは,このような共同ポーズ推定手法が有望な研究方向であることを示す。 In the field of computer vision-driven ice hockey analytics, one of the most challenging and least studied tasks is goalie pose estimation. Unlike general human pose estimation, goalie pose estimation is much more complex as it involves not only the detection of keypoints corresponding to the joints of the goalie concealed under thick padding and mask, but also a large number of non-human keypoints corresponding to the large leg pads and gloves worn, the stick, as well as the hockey net. To tackle this challenge, we introduce GoalieNet, a multi-stage deep neural network for jointly estimating the pose of the goalie, their equipment, and the net. Experimental results using NHL benchmark data demonstrate that the proposed GoalieNet can achieve an average of 84\% accuracy across all keypoints, where 22 out of 29 keypoints are detected with more than 80\% accuracy. This indicates that such a joint pose estimation approach can be a promising research direction.	翻訳日:2023-06-29 16:14:02 公開日:2023-06-28
# 自律ロボットのための新しい室内人間動作データセットRoAMを用いた行動条件深部視覚予測 Action-conditioned Deep Visual Prediction with RoAM, a new Indoor Human Motion Dataset for Autonomous Robots ( http://arxiv.org/abs/2306.15852v1 ) ライセンス: Link先を確認	Meenakshi Sarkar, Vinayak Honkote, Dibyendu Das and Debasish Ghose	(参考訳) 産業におけるロボットの採用の増加に伴い、ロボットが人間と協調して効果的に行動を予測、理解、計画できる高度なアルゴリズムの開発に注力することが重要である。ロボットのエゴビジョンから様々な人間の動きを記録できる様々な屋内環境において、カスタムメイドのタートルボット3バーガーロボットで収集されるロボット自律運動(RoAM)ビデオデータセットを紹介する。データセットには、LiDARスキャンの同期記録や、静的で動く人間のエージェントの周りを移動する際にロボットが取るすべての制御アクションも含まれている。このユニークなデータセットは、記録エージェントが部分的に観察可能なシナリオや、イメージングセンサーが移動プラットフォームにマウントされているケースにおいて、将来の画像フレームを予測できる新しいビジュアル予測フレームワークの開発とベンチマークを提供する。 acpnetと呼ばれる新しい深部視覚予測フレームワークのデータセットをベンチマークし、近似された将来の画像フレームはロボットのアクションにも依存しており、モバイルロボットと自律ナビゲーション研究のためのビデオ予測パラダイムにロボットダイナミクスを組み込む可能性を実証した。 With the increasing adoption of robots across industries, it is crucial to focus on developing advanced algorithms that enable robots to anticipate, comprehend, and plan their actions effectively in collaboration with humans. We introduce the Robot Autonomous Motion (RoAM) video dataset, which is collected with a custom-made turtlebot3 Burger robot in a variety of indoor environments recording various human motions from the robot's ego-vision. The dataset also includes synchronized records of the LiDAR scan and all control actions taken by the robot as it navigates around static and moving human agents. The unique dataset provides an opportunity to develop and benchmark new visual prediction frameworks that can predict future image frames based on the action taken by the recording agent in partially observable scenarios or cases where the imaging sensor is mounted on a moving platform. We have benchmarked the dataset on our novel deep visual prediction framework called ACPNet where the approximated future image frames are also conditioned on action taken by the robot and demonstrated its potential for incorporating robot dynamics into the video prediction paradigm for mobile robotics and autonomous navigation research.	翻訳日:2023-06-29 16:13:41 公開日:2023-06-28
# spotem:エピソディックメモリのための効率的なビデオ検索 SpotEM: Efficient Video Search for Episodic Memory ( http://arxiv.org/abs/2306.15850v1 ) ライセンス: Link先を確認	Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman	(参考訳) エピソードメモリ(EM)の目標は、自然言語の問い合わせに答えるために、長いエゴセントリックなビデオを検索することである(例えば、私は財布をどこに置き去りにしたのか? 既存のemメソッドは、ビデオの至るところで見られるよう、高価な固定長クリップ機能を徹底的に抽出している。本研究では,高い精度を維持しつつ,与えられたEM手法の効率性を実現する手法であるSpotEMを提案する。 SpotEMは3つの重要なアイデアから成り立っている。 1) 言語クエリで条件付き検索を行うための有望なビデオ領域を特定することを学習する新規クリップセレクタ 2) 部屋,オブジェクト,および見るべき場所を示すインタラクションのコンテキストをキャプチャする,低コストでセマンティックなインデックス化機能。 3)クリップセレクタとemモデルのエンドツーエンド合同トレーニングから生じる最適化問題に対処する蒸留損失。 Ego4D EM Natural Language Queriesベンチマークによる200時間以上のビデオと3つの異なるEMモデルによる実験は、我々のアプローチの有効性を示している。プロジェクトページ: https://vision.cs.utexas.edu/projects/spotem The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?"). Existing EM methods exhaustively extract expensive fixed-length clip features to look everywhere in the video for the answer, which is infeasible for long wearable-camera videos that span hours or even days. We propose SpotEM, an approach to achieve efficiency for a given EM method while maintaining good accuracy. SpotEM consists of three key ideas: 1) a novel clip selector that learns to identify promising video regions to search conditioned on the language query; 2) a set of low-cost semantic indexing features that capture the context of rooms, objects, and interactions that suggest where to look; and 3) distillation losses that address the optimization issues arising from end-to-end joint training of the clip selector and EM model. Our experiments on 200+ hours of video from the Ego4D EM Natural Language Queries benchmark and three different EM models demonstrate the effectiveness of our approach: computing only 10% - 25% of the clip features, we preserve 84% - 97% of the original EM model's accuracy. Project page: https://vision.cs.utexas.edu/projects/spotem	翻訳日:2023-06-29 16:13:21 公開日:2023-06-28
# 非置換SGDの順序付け Ordering for Non-Replacement SGD ( http://arxiv.org/abs/2306.15848v1 ) ライセンス: Link先を確認	Yuetong Xu and Baharan Mirzasoleiman	(参考訳) 実行時間の削減と機械学習の効率向上のための1つのアプローチは、使用される最適化アルゴリズムの収束率を低減することである。 Shufflingは機械学習で広く使われているアルゴリズム技術だが、近年は理論上のみ注目を集め始めた。ランダムシャッフルと漸進勾配降下のために異なる収束速度が発達し、アルゴリズムの非置換形式に対する収束率を改善することができる順序付けを求める。最適イテレートと電流イテレートの間の距離の既存の境界に基づいて、エポックの開始時の勾配に依存する上界を導出する。境界解析により、強い凸関数と凸関数のステップサイズを一定かつ小さくするための最適順序付けを開発することができる。さらに, 合成実験と実データを用いた実験を行い, 実験結果の検証を行った。さらに、注文とミニバッチを組み合わせることで、より複雑なニューラルネットワークに適用し、有望な結果を示すことができます。 One approach for reducing run time and improving efficiency of machine learning is to reduce the convergence rate of the optimization algorithm used. Shuffling is an algorithm technique that is widely used in machine learning, but it only started to gain attention theoretically in recent years. With different convergence rates developed for random shuffling and incremental gradient descent, we seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm. Based on existing bounds of the distance between the optimal and current iterate, we derive an upper bound that is dependent on the gradients at the beginning of the epoch. Through analysis of the bound, we are able to develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions. We further test and verify our results through experiments on synthesis and real data sets. In addition, we are able to combine the ordering with mini-batch and further apply it to more complex neural networks, which show promising results.	翻訳日:2023-06-29 16:13:00 公開日:2023-06-28
# asymptotic-preserving convolutional deeponets : 多スケール線形輸送方程式の拡散挙動を捉える Asymptotic-Preserving Convolutional DeepONets Capture the Diffusive Behavior of the Multiscale Linear Transport Equations ( http://arxiv.org/abs/2306.15891v1 ) ライセンス: Link先を確認	Keke Wu and Xiong-bin Yan and Shi Jin and Zheng Ma	(参考訳) 本稿では,マルチスケールの時間依存線形輸送問題に対処するために設計された,漸近保存型畳み込み型深層作用素ネットワーク (apcons) の2つのタイプを提案する。 MLPを改良したバニラ物理インフォームドディープノネットは,所望のマクロな挙動を維持する不安定性を示す可能性がある。したがって、漸近保存損失関数の利用が必要である。拡散方程式における熱核からインスピレーションを得たConvolutional Deep Operator Networksという新しいアーキテクチャを提案し,各フィルタ層におけるプールおよびアクティベーション操作とともに,グローバルな熱カーネルの代わりに複数の局所畳み込み演算を用いる。我々のAPCON法は, グリッドサイズに依存しないパラメータ数を持ち, 線形輸送問題の拡散挙動を捉えることができる。最後に,本手法の有効性をいくつかの数値例を通して検証する。 In this paper, we introduce two types of novel Asymptotic-Preserving Convolutional Deep Operator Networks (APCONs) designed to address the multiscale time-dependent linear transport problem. We observe that the vanilla physics-informed DeepONets with modified MLP may exhibit instability in maintaining the desired limiting macroscopic behavior. Therefore, this necessitates the utilization of an asymptotic-preserving loss function. Drawing inspiration from the heat kernel in the diffusion equation, we propose a new architecture called Convolutional Deep Operator Networks, which employ multiple local convolution operations instead of a global heat kernel, along with pooling and activation operations in each filter layer. Our APCON methods possess a parameter count that is independent of the grid size and are capable of capturing the diffusive behavior of the linear transport problem. Finally, we validate the effectiveness of our methods through several numerical examples.	翻訳日:2023-06-29 16:06:47 公開日:2023-06-28
# 反応・再合成予測のための深層学習の統一的展望:現状と今後の課題 A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges ( http://arxiv.org/abs/2306.15890v1 ) ライセンス: Link先を確認	Ziqiao Meng, Peilin Zhao, Yang Yu, Irwin King	(参考訳) 反応と再合成予測は、最近機械学習と薬物発見コミュニティから注目を集めている計算化学の基本的なタスクである。これらの問題に取り組むために、さまざまなディープラーニングアプローチが提案されている。本研究では,反応・再合成予測のための高度なディープラーニングモデルに関する包括的調査を行う。我々は最先端アプローチの設計機構,強み,弱みを要約する。次に、現在のソリューションの限界と、問題自体のオープンな課題について論じる。最後に,今後の研究を促進するための有望な方向性を示す。本研究は,反応の統一的理解と再合成予測を目的とした,初めての総合的かつ体系的な調査である。 Reaction and retrosynthesis prediction are fundamental tasks in computational chemistry that have recently garnered attention from both the machine learning and drug discovery communities. Various deep learning approaches have been proposed to tackle these problems, and some have achieved initial success. In this survey, we conduct a comprehensive investigation of advanced deep learning-based models for reaction and retrosynthesis prediction. We summarize the design mechanisms, strengths, and weaknesses of state-of-the-art approaches. Then, we discuss the limitations of current solutions and open challenges in the problem itself. Finally, we present promising directions to facilitate future research. To our knowledge, this paper is the first comprehensive and systematic survey that seeks to provide a unified understanding of reaction and retrosynthesis prediction.	翻訳日:2023-06-29 16:06:30 公開日:2023-06-28
# ハイプを超えて: GPT3.5の性能, 信頼性, 臨床適合性を評価する Beyond the Hype: Assessing the Performance, Trustworthiness, and Clinical Suitability of GPT3.5 ( http://arxiv.org/abs/2306.15887v1 ) ライセンス: Link先を確認	Salmonn Talebi, Elizabeth Tong and Mohammad R. K. Mofrad	(参考訳) 医療における大規模言語モデル(LLMs)の使用は普及しているが,臨床現場での実用性や安全性は十分に評価されていない。 LLMにとって、医療環境や信頼性、安全性といった高度な環境が重要な問題である。そこで本研究では,医療画像プロトコル割り当てのためのgpt3.5モデルの性能と信頼性を評価する手法を提案する。細調整されたBERTモデルと放射線技師を比較した。また,決定過程を評価するため,GPT3.5の出力を放射線学者にレビューする。評価データセットは、頭部全体にわたる11のイメージングプロトコルクラスにまたがる4,700人の医師からなる。以上の結果から,GPT3.5はBERTと放射線科医に遅れていることが示唆された。しかし GPT3.5 は BERT よりも優れており、その決定を説明し、関連する単語の指標を検出し、モデルの校正を行う。さらに, 誤分類に対する GPT3.5 の説明を解析することにより, 安全性と臨床応用への適合性を高めるために解決すべき系統的誤りを明らかにする。 The use of large language models (LLMs) in healthcare is gaining popularity, but their practicality and safety in clinical settings have not been thoroughly assessed. In high-stakes environments like medical settings, trust and safety are critical issues for LLMs. To address these concerns, we present an approach to evaluate the performance and trustworthiness of a GPT3.5 model for medical image protocol assignment. We compare it with a fine-tuned BERT model and a radiologist. In addition, we have a radiologist review the GPT3.5 output to evaluate its decision-making process. Our evaluation dataset consists of 4,700 physician entries across 11 imaging protocol classes spanning the entire head. Our findings suggest that the GPT3.5 performance falls behind BERT and a radiologist. However, GPT3.5 outperforms BERT in its ability to explain its decision, detect relevant word indicators, and model calibration. Furthermore, by analyzing the explanations of GPT3.5 for misclassifications, we reveal systematic errors that need to be resolved to enhance its safety and suitability for clinical use.	翻訳日:2023-06-29 16:06:19 公開日:2023-06-28
# 特徴表現に基づく逐次的注意源同定 Sequential Attention Source Identification Based on Feature Representation ( http://arxiv.org/abs/2306.15886v1 ) ライセンス: Link先を確認	Dongpeng Hou, Zhen Wang, Chao Gao, Xuelong Li	(参考訳) スナップショット観測に基づくソースローカライゼーションは、アクセシビリティと低コストのために広く研究されている。しかし, 既存手法におけるユーザ間のインタラクションは, 時間変化による感染シナリオでは対処できない。これらの手法は異種相互作用のシナリオにおいて精度が低下する。そこで本研究では,インダクティブ・ラーニング・アイデアに基づく時間系列に基づくグラフ注意源同定(tgasi)と呼ばれる,シーケンスからシーケンスへの局所化手法を提案する。より具体的には、エンコーダは2人のユーザ間の影響確率を推定して複数の特徴を生成し、デコーダは設計した時間的注意機構により異なるタイムスタンプにおける予測ソースの重要性を区別する。ただし、インダクティブラーニングのアイデアは、TGASIが他の事前知識を知らずに新しいシナリオのソースを検出できることを保証するもので、TGASIのスケーラビリティを証明している点には注意が必要だ。 SOTA法による総合的な実験は、TGASIの異なるシナリオにおける高い検出性能とスケーラビリティを示す。 Snapshot observation based source localization has been widely studied due to its accessibility and low cost. However, the interaction of users in existing methods does not be addressed in time-varying infection scenarios. So these methods have a decreased accuracy in heterogeneous interaction scenarios. To solve this critical issue, this paper proposes a sequence-to-sequence based localization framework called Temporal-sequence based Graph Attention Source Identification (TGASI) based on an inductive learning idea. More specifically, the encoder focuses on generating multiple features by estimating the influence probability between two users, and the decoder distinguishes the importance of prediction sources in different timestamps by a designed temporal attention mechanism. It's worth mentioning that the inductive learning idea ensures that TGASI can detect the sources in new scenarios without knowing other prior knowledge, which proves the scalability of TGASI. Comprehensive experiments with the SOTA methods demonstrate the higher detection performance and scalability in different scenarios of TGASI.	翻訳日:2023-06-29 16:06:03 公開日:2023-06-28
# 真のフレア除去に向けて - 包括的なパイプラインと新しいベンチマーク Toward Real Flare Removal: A Comprehensive Pipeline and A New Benchmark ( http://arxiv.org/abs/2306.15884v1 ) ライセンス: Link先を確認	Zheyan Jin, Shiqi Chen, Huajun Feng, Zhihai Xu, Yueting Chen	(参考訳) 照明不足のシーンでの撮影では、複雑な光源の存在は、強度、スペクトル、反射、収差などの強いフレアのアーチファクトを画像に残すことが多い。画質だけでなく、下流のビジュアルアプリケーションの性能にも影響を及ぼす。したがって、レンズフレアとゴーストの除去は、特に低照度環境での課題である。しかし, 既存のフレア除去法は主に, 散乱フレアのカテゴリーが特異であり, 反射するゴーストが利用できない, 不適切なシミュレーションや実世界の捕獲に限られている。したがって,フレア除去のデータセットの構築には包括的劣化手順が不可欠である。理論的解析と実世界の評価に基づいて,フレア劣化を伴うデータペアを生成する手法を提案する。手順は包括的であり、散在するフレアの類似性と反射するゴーストの対称効果を実現する。さらに,散乱と反射フレアの影響をそれぞれ処理する実写パイプラインを構築し,エンドツーエンドの手法で直接データを生成する。実験の結果,提案手法は既存のフレアデータセットに多様性を付加し,フレアデータペアの包括的なマッピング手順を構築する。また,本手法では,フレア画像の良好な復元を実現するためにデータ駆動モデルを構築し,実際の撮影に基づく評価システムを提案する。 Photographing in the under-illuminated scenes, the presence of complex light sources often leave strong flare artifacts in images, where the intensity, the spectrum, the reflection, and the aberration altogether contribute the deterioration. Besides the image quality, it also influence the performance of down-stream visual applications. Thus, removing the lens flare and ghosts is a challenge issue especially in low-light environment. However, existing methods for flare removal mainly restricted to the problems of inadequate simulation and real-world capture, where the categories of scattered flares are singular and the reflected ghosts are unavailable. Therefore, a comprehensive deterioration procedure is crucial for constructing the dataset of flare removal. Based on the theoretical analysis and real-world evaluation, we propose a well-developed methodology for generating the data-pairs with flare deterioration. The procedure is comprehensive, where the similarity of scattered flares and the symmetric effect of reflected ghosts are realized. Moreover, we also construct a real-shot pipeline that respectively processes the effects of scattering and reflective flares, aiming to directly generate the data for end-to-end methods. Experimental results show that the proposed methodology add diversity to the existing flare datasets and construct a comprehensive mapping procedure for flare data pairs. And our method facilities the data-driven model to realize better restoration in flare images and proposes a better evaluation system based on real shots, resulting promote progress in the area of real flare removal.	翻訳日:2023-06-29 16:05:47 公開日:2023-06-28
# レコメンデーションシステムにおけるブロックワイズ機能相互作用 Blockwise Feature Interaction in Recommendation Systems ( http://arxiv.org/abs/2306.15881v1 ) ライセンス: Link先を確認	Weijie Zhao, Ping Li	(参考訳) 機能インタラクションは、ユーザの好みとアイテム特性の複雑な関係を捉えるため、レコメンデーションシステムにおいて重要な役割を果たす。ディープ・アンド・クロス・ネットワーク(DCNv2)のような既存の手法は、クロス層演算のために高い計算要求に悩まされる可能性がある。本稿では,この問題を軽減するためにbfi(blockwise feature interaction)と呼ばれる新しい手法を提案する。機能相互作用プロセスを小さなブロックに分割することで、メモリフットプリントと計算負荷の両方を大幅に削減できる。 BFIの4つの変種(それぞれP, Q, T, S)が開発され、経験的に比較されている。実験の結果,提案アルゴリズムは標準のDCNv2に比べて精度が良く,計算オーバーヘッドやパラメータ数を大幅に削減できることがわかった。本稿では,機能相互作用効率向上のための実用的なソリューションを提供することで,効率的なレコメンデーションシステムの開発に寄与する。 Feature interactions can play a crucial role in recommendation systems as they capture complex relationships between user preferences and item characteristics. Existing methods such as Deep & Cross Network (DCNv2) may suffer from high computational requirements due to their cross-layer operations. In this paper, we propose a novel approach called blockwise feature interaction (BFI) to help alleviate this issue. By partitioning the feature interaction process into smaller blocks, we can significantly reduce both the memory footprint and the computational burden. Four variants (denoted by P, Q, T, S, respectively) of BFI have been developed and empirically compared. Our experimental results demonstrate that the proposed algorithms achieves close accuracy compared to the standard DCNv2, while greatly reducing the computational overhead and the number of parameters. This paper contributes to the development of efficient recommendation systems by providing a practical solution for improving feature interaction efficiency.	翻訳日:2023-06-29 16:05:23 公開日:2023-06-28
# オープンボキャブラリ学習に向けて:調査 Towards Open Vocabulary Learning: A Survey ( http://arxiv.org/abs/2306.15880v1 ) ライセンス: Link先を確認	Jianzong Wu, Xiangtai Li, Shilin Xu. Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, and Dacheng Tao	(参考訳) 視覚シーン理解の分野では、ディープニューラルネットワークはセグメンテーション、トラッキング、検出など、さまざまなコアタスクにおいて驚くべき進歩を遂げている。しかし、ほとんどのアプローチはクローズセットの仮定に基づいており、トレーニングセットに存在する事前定義されたカテゴリのみを識別できる。近年、視覚言語事前学習の急速な進歩により、オープンな語彙設定が提案されている。これらの新しいアプローチは、注釈付きラベル空間を超えてカテゴリを見つけ、認識することを目指している。オープン語彙のアプローチは、弱教師付きおよびゼロショット設定に比べて、より一般的で実用的で効果的である。本稿では,その分野における最近の発展を要約し,分析し,オープンな語彙学習の徹底的なレビューを行う。特に,ゼロショット学習,オープンセット認識,分散検出といった関連する概念と比較することから始める。次に, セグメンテーションと検出に関して, ロングテール問題, 少数ショット設定, ゼロショット設定など, 密接に関連するタスクをいくつか検討する。本研究は,まず,事前知識としてクローズセットにおける検出とセグメンテーションの基本的な知識を提示する。次に,オープン語彙学習を用いた様々なシナリオについて検討し,共通設計要素とコアアイデアを同定する。次に、一般的なデータセットとベンチマークにおける最近の検出とセグメンテーションのアプローチを比較した。最後に,今後の研究方向性に関する洞察,課題,議論をまとめる。私たちの知る限り、オープンな語彙学習に関する総合的な文献レビューはこれが初めてである。関連する作業をhttps://github.com/jianzongwu/Awesome-Open-Vocabulary.comで追跡しています。 In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective compared to weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Then, we review several closely related tasks in the case of segmentation and detection, including long-tail problems, few-shot, and zero-shot settings. For the method survey, we first present the basic knowledge of detection and segmentation in close-set as the preliminary knowledge. Next, we examine various scenarios in which open vocabulary learning is used, identifying common design elements and core ideas. Then, we compare the recent detection and segmentation approaches in commonly used datasets and benchmarks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To our knowledge, this is the first comprehensive literature review of open vocabulary learning. We keep tracing related works at https://github.com/jianzongwu/Awesome-Open-Vocabulary.	翻訳日:2023-06-29 16:05:08 公開日:2023-06-28
# ハイブリッド蒸留:マスクオートエンコーダとコントラスト学習者との接続 Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners ( http://arxiv.org/abs/2306.15876v1 ) ライセンス: Link先を確認	Bowen Shi, Xiaopeng Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian	(参考訳) 表現学習は従来の教師付きトレーニングからコントラスト学習(CL)やマスケッド画像モデリング(MIM)へと進化してきた。従来の研究では、CLや教師付き事前訓練のエクササイズといった特定のシナリオにおいて、より長い範囲のグローバルパターンを捕捉し、より優れた特徴識別を可能にするとともに、MIMはすべてのトランスフォーマー層により局所的で多様な注意を向けることが可能であった。本稿では,その強みを組み合わせたモデルを得る方法について検討する。まず,前回の特徴蒸留法とマスクの特徴再現法について検討し,その限界を明らかにした。多様性の増大は、主に非対称な設計に由来するが、これらの設計は結果的に識別能力を損なう可能性がある。識別と多様性の両立を図るため,教師/CL教師とMIM教師の双方を併用し,学生モデルを指導する簡易かつ効果的なハイブリッド蒸留戦略を提案する。 Hybrid DistillはMIM教師のトークン関係を模倣し、注意崩壊を緩和し、教師/CL教師の特徴マップを蒸留して差別を可能にする。さらに、プログレッシブな冗長なトークンマスキング戦略を用いて蒸留コストを削減し、局所最適状態に陥ることを避ける。実験の結果、ハイブリッド蒸留は異なるベンチマークで優れた性能を達成できることが証明された。 Representation learning has been evolving from traditional supervised training to Contrastive Learning (CL) and Masked Image Modeling (MIM). Previous works have demonstrated their pros and cons in specific scenarios, i.e., CL and supervised pre-training excel at capturing longer-range global patterns and enabling better feature discrimination, while MIM can introduce more local and diverse attention across all transformer layers. In this paper, we explore how to obtain a model that combines their strengths. We start by examining previous feature distillation and mask feature reconstruction methods and identify their limitations. We find that their increasing diversity mainly derives from the asymmetric designs, but these designs may in turn compromise the discrimination ability. In order to better obtain both discrimination and diversity, we propose a simple but effective Hybrid Distillation strategy, which utilizes both the supervised/CL teacher and the MIM teacher to jointly guide the student model. Hybrid Distill imitates the token relations of the MIM teacher to alleviate attention collapse, as well as distills the feature maps of the supervised/CL teacher to enable discrimination. Furthermore, a progressive redundant token masking strategy is also utilized to reduce the distilling costs and avoid falling into local optima. Experiment results prove that Hybrid Distill can achieve superior performance on different benchmarks.	翻訳日:2023-06-29 16:04:47 公開日:2023-06-28
# fake the real: 音声変換によるディープ音声分類へのバックドア攻撃 Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion ( http://arxiv.org/abs/2306.15875v1 ) ライセンス: Link先を確認	Zhe Ye, Terui Mao, Li Dong, Diqun Yan	(参考訳) 深い音声分類は大きな成功を収め、多くの実世界の応用の出現を大いに促進した。しかし、バックドア攻撃は、特に信頼できるサードパーティプラットフォームにおいて、攻撃者が設定した事前定義されたトリガーがバックドアをアクティベートできるため、新たなセキュリティ脅威をもたらす。既存の音声バックドア攻撃のトリガーのほとんどはサンプルに依存しないものであり、たとえトリガーが目立たずに設計されているとしても、音は聞こえる。本研究は,音声変換に基づくサンプル特異的トリガを用いたバックドア攻撃を探索する。具体的には,事前学習した音声変換モデルを用いてトリガーを生成し,有毒なサンプルが追加の可聴音を発生させないようにする。 2つの音声分類タスクに対する大規模な実験により,攻撃の有効性が示された。さらに,提案するバックドアを活性化する特定のシナリオを分析し,微調整に対する抵抗を検証した。 Deep speech classification has achieved tremendous success and greatly promoted the emergence of many real-world applications. However, backdoor attacks present a new security threat to it, particularly with untrustworthy third-party platforms, as pre-defined triggers set by the attacker can activate the backdoor. Most of the triggers in existing speech backdoor attacks are sample-agnostic, and even if the triggers are designed to be unnoticeable, they can still be audible. This work explores a backdoor attack that utilizes sample-specific triggers based on voice conversion. Specifically, we adopt a pre-trained voice conversion model to generate the trigger, ensuring that the poisoned samples does not introduce any additional audible noise. Extensive experiments on two speech classification tasks demonstrate the effectiveness of our attack. Furthermore, we analyzed the specific scenarios that activated the proposed backdoor and verified its resistance against fine-tuning.	翻訳日:2023-06-29 16:04:22 公開日:2023-06-28
# 変分ベイズ推論を用いた限定データからの確率偏微分方程式の発見 Discovering stochastic partial differential equations from limited data using variational Bayes inference ( http://arxiv.org/abs/2306.15873v1 ) ライセンス: Link先を確認	Yogesh Chandrakant Mathpati and Tapas Tripura and Rajdip Nayek and Souvik Chakraborty	(参考訳) データから確率的部分微分方程式(SPDE)を発見するための新しい枠組みを提案する。提案手法は確率計算、変分ベイズ理論、スパース学習の概念を組み合わせたものである。本研究では,SPDEのドリフト・拡散項を状態応答の観点から表現するための拡張クラマース・モヤル展開法を提案し,Spike-and-Slab プリエントをスパースラーニング技術を用いて,基礎となるSPDEを効率的に正確に発見する。提案手法は3つの標準SPDEに適用されている。 a)確率的熱方程式 (b)確率アレン・カーン方程式、及び (c)確率ナグモ方程式。提案手法は,SPDEを限られたデータで正確に識別できることを示す。これはデータからSPDEを発見する最初の試みであり、気候モデリング、財務予測、化学動力学などの様々な科学的応用に重要な意味を持つ。 We propose a novel framework for discovering Stochastic Partial Differential Equations (SPDEs) from data. The proposed approach combines the concepts of stochastic calculus, variational Bayes theory, and sparse learning. We propose the extended Kramers-Moyal expansion to express the drift and diffusion terms of an SPDE in terms of state responses and use Spike-and-Slab priors with sparse learning techniques to efficiently and accurately discover the underlying SPDEs. The proposed approach has been applied to three canonical SPDEs, (a) stochastic heat equation, (b) stochastic Allen-Cahn equation, and (c) stochastic Nagumo equation. Our results demonstrate that the proposed approach can accurately identify the underlying SPDEs with limited data. This is the first attempt at discovering SPDEs from data, and it has significant implications for various scientific applications, such as climate modeling, financial forecasting, and chemical kinetics.	翻訳日:2023-06-29 16:04:07 公開日:2023-06-28
# 2023年Waymo Open Sim Agents Challengeの第2位 The 2nd Place Solution for 2023 Waymo Open Sim Agents Challenge ( http://arxiv.org/abs/2306.15914v1 ) ライセンス: Link先を確認	Cheng Qian, Di Xiu, Minghao Tian	(参考訳) 本稿では,2023年のWaymo Open Sim Agents Challenge(WOSAC)[4]の2位となるソリューションを提示する。本稿では,MTR(Motion Transformer)[5] という,マルチエージェントの動作をシミュレーションするシンプルな自己回帰手法を提案する。我々の提出したMTR+++は2023 WOSACのRealism Metaメトリックで0.4697を達成しています。また、MTR_Eと命名されたMTRに基づく改良モデルも提案されており、スコアは0.4911で、2023年6月25日現在、WOSACのリーダーボードで3位である。 In this technical report, we present the 2nd place solution of 2023 Waymo Open Sim Agents Challenge (WOSAC)[4]. We propose a simple yet effective autoregressive method for simulating multi-agent behaviors, which is built upon a well-known multimodal motion forecasting framework called Motion Transformer (MTR)[5] with postprocessing algorithms applied. Our submission named MTR+++ achieves 0.4697 on the Realism Meta metric in 2023 WOSAC. Besides, a modified model based on MTR named MTR_E is proposed after the challenge, which has a better score 0.4911 and is ranked the 3rd on the leaderboard of WOSAC as of June 25, 2023.	翻訳日:2023-06-29 15:56:49 公開日:2023-06-28
# DCT:大規模離散行動空間を用いた強化学習のためのアクション埋め込みのデュアルチャネルトレーニング DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces ( http://arxiv.org/abs/2306.15913v1 ) ライセンス: Link先を確認	Pranavi Pathakota and Hardik Meisheri and Harshad Khadilkar	(参考訳) 大規模な離散的行動空間を一般化しながら強固なポリシーを学ぶ能力は、知的システム、特に次元の呪いに直面する雑音環境にとって、オープンな課題である。本稿では,アクション埋め込みを効率的に学習するための新しい枠組みを提案する。本稿では、動作再構成と状態予測精度のバランスをとる2つのチャネル損失を持つ動作埋め込みのためのエンコーダデコーダアーキテクチャについて述べる。我々は、トレーニングされたデコーダと、埋め込み空間でアクションを生成する標準強化学習アルゴリズムを併用する。私たちのアーキテクチャは、4000以上の離散的なノイズアクションを持つ2d maze環境と、現実世界のeコマーストランザクションデータを使用するプロダクトレコメンデーションタスクという、2つの異なる環境での2つの競合ベースラインよりも優れています。経験的な結果は、モデルがよりクリーンなアクション埋め込みをもたらすことを示し、改善された表現はより早い収束でより良いポリシーを学ぶのに役立つ。 The ability to learn robust policies while generalizing over large discrete action spaces is an open challenge for intelligent systems, especially in noisy environments that face the curse of dimensionality. In this paper, we present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action as well as to predict the expected future state. We describe an encoder-decoder architecture for action embeddings with a dual channel loss that balances between action reconstruction and state prediction accuracy. We use the trained decoder in conjunction with a standard reinforcement learning algorithm that produces actions in the embedding space. Our architecture is able to outperform two competitive baselines in two diverse environments: a 2D maze environment with more than 4000 discrete noisy actions, and a product recommendation task that uses real-world e-commerce transaction data. Empirical results show that the model results in cleaner action embeddings, and the improved representations help learn better policies with earlier convergence.	翻訳日:2023-06-29 15:56:37 公開日:2023-06-28
# 食品インスタンスセグメンテーションにおけるインクリメンタル学習 Incremental Learning on Food Instance Segmentation ( http://arxiv.org/abs/2306.15910v1 ) ライセンス: Link先を確認	Huu-Thanh Nguyen, Yu Cao, Chong-Wah Ngo, Wing-Kwong Chan	(参考訳) 食品インスタンスのセグメンテーションは、食品画像中の料理のサービスサイズを推定するために不可欠である。最近のセグメンテーションの最先端技術は、印象的なセグメンテーション品質と高速計算を備えたディープラーニングネットワークである。それでも彼らはデータに飢えており、アノテーションには高価です。本稿では,データラベリング予算に制限のあるモデル性能を最適化するインクリメンタル学習フレームワークを提案する。フレームワークのパワーは、最新のトレーニングされたインスタンスセグメンテーションモデルに対して、非ラベルのサンプルがいかに困難であるかを予測する、新しい困難評価モデルである。データ収集手順はいくつかの段階に分けられ、それぞれに新しいサンプルパッケージが収集される。このフレームワークは、ラベル付け予算を最も難しいサンプルに割り当てる。評価モデルから一定の資格を満たす未ラベルのサンプルを用いて擬似ラベルを生成する。最終的には、手動ラベルと擬似ラベルがトレーニングデータに送られ、インスタンスセグメンテーションモデルが改善される。提案するフレームワークは,4つの大規模食品データセットにおいて,現在のインクリメンタルラーニングベンチマークより優れ,完全注釈付きサンプルでトレーニングしたモデルとの競合性能を実現している。 Food instance segmentation is essential to estimate the serving size of dishes in a food image. The recent cutting-edge techniques for instance segmentation are deep learning networks with impressive segmentation quality and fast computation. Nonetheless, they are hungry for data and expensive for annotation. This paper proposes an incremental learning framework to optimize the model performance given a limited data labelling budget. The power of the framework is a novel difficulty assessment model, which forecasts how challenging an unlabelled sample is to the latest trained instance segmentation model. The data collection procedure is divided into several stages, each in which a new sample package is collected. The framework allocates the labelling budget to the most difficult samples. The unlabelled samples that meet a certain qualification from the assessment model are used to generate pseudo-labels. Eventually, the manual labels and pseudo-labels are sent to the training data to improve the instance segmentation model. On four large-scale food datasets, our proposed framework outperforms current incremental learning benchmarks and achieves competitive performance with the model trained on fully annotated samples.	翻訳日:2023-06-29 15:56:21 公開日:2023-06-28
# RL$^3$: RLによるメタ強化学習をRL$^2$内で促進する RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ ( http://arxiv.org/abs/2306.15909v1 ) ライセンス: Link先を確認	Abhinav Bhatia, Samer B. Nashed, Shlomo Zilberstein	(参考訳) RL$^2$のようなメタ強化学習(meta-RL)手法は、与えられたタスク分布に合わせてデータ効率のよいRLアルゴリズムを学習するための有望なアプローチとして登場した。しかしながら、これらのRLアルゴリズムは、値関数のような一般的なRLコンポーネントにまとめるのではなく、繰り返しニューラルネットワークを使用して経験のシーケンスを処理するため、長い水平タスクや分配タスクに苦労する。さらに、トランスフォーマーでさえ、トレーニングや推論コストが禁じられる前に効率的に推論できる履歴の長さに実用的な制限がある。対照的に、従来のRLアルゴリズムはドメイン知識を活用せず、より多くのデータが利用可能になるにつれて最適なポリシーに収束するので、データ非効率である。本稿では,従来のRLとメタRLを組み合わせたハイブリッド手法であるRL$^3$を提案する。 rl$^3$ は rl$^2$ と比較して長期ホリゾン・アウト・オブ・ディストリビューション・タスクでより大きな累積報酬を得られるが、短期的には後者の効率は維持される。様々な短期的、長期的、複雑な依存関係を示すメタRL文献から、カスタムドメインとベンチマークドメインの両方で実験を行う。 Meta reinforcement learning (meta-RL) methods such as RL$^2$ have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, these RL algorithms struggle with long-horizon tasks and out-of-distribution tasks since they rely on recurrent neural networks to process the sequence of experiences instead of summarizing them into general RL components such as value functions. Moreover, even transformers have a practical limit to the length of histories they can efficiently reason about before training and inference costs become prohibitive. In contrast, traditional RL algorithms are data-inefficient since they do not leverage domain knowledge, but they do converge to an optimal policy as more data becomes available. In this paper, we propose RL$^3$, a principled hybrid approach that combines traditional RL and meta-RL by incorporating task-specific action-values learned through traditional RL as an input to the meta-RL neural network. We show that RL$^3$ earns greater cumulative reward on long-horizon and out-of-distribution tasks compared to RL$^2$, while maintaining the efficiency of the latter in the short term. Experiments are conducted on both custom and benchmark discrete domains from the meta-RL literature that exhibit a range of short-term, long-term, and complex dependencies.	翻訳日:2023-06-29 15:56:03 公開日:2023-06-28
# 南フロリダにおける水ステージ予測のための深層学習モデル Deep Learning Models for Water Stage Predictions in South Florida ( http://arxiv.org/abs/2306.15907v1 ) ライセンス: Link先を確認	Jimeng Shi, Zeda Yin, Rukmangadh Myana, Khandker Ishtiaq, Anupama John, Jayantha Obeysekera, Arturo Leon, Giri Narasimhan	(参考訳) 河川システムにおける水位シミュレーションと予測は,洪水警報,水理操作,洪水軽減に不可欠である。工学分野では、HEC-RAS、MIKE、SWMMといったツールを使用して、詳細な物理に基づく水理・水理計算モデルを構築し、流域全体をシミュレートし、システム内の任意の時点での水ステージを予測する。しかし、これらの物理学に基づくモデルは、特に大きな流域やより長いシミュレーションのために、計算集約的である。この問題を克服するために,我々は複数の深層学習モデル(DL)を代理モデルとして使用し,水ステージを迅速に予測する。南フロリダのマイアミ川の下流は,本論文の事例研究として選択されている。データセットは2010年1月1日から2020年12月31日まで、南フロリダ水管理地区(SFWMD)のDBHYDROデータベースからダウンロードされる。大規模な実験により、DLモデルの性能は極度の降水条件(熱帯嵐)においても物理学に基づくモデルの性能に匹敵することが示された。さらに,予測長の増加に伴うDLモデルの予測精度の低下について検討した。今後の水ステージを予測するため,我々のDLモデルでは,近年の河川系の測定変数と,近い将来に確実に予測できる共変量を用いている。要約すると、ディープラーニングモデルは、物理ベースのモデルと比較して、少なくとも1000倍のスピードアップで、同等またはより良いエラー率を達成する。 Simulating and predicting water levels in river systems is essential for flood warnings, hydraulic operations, and flood mitigations. In the engineering field, tools such as HEC-RAS, MIKE, and SWMM are used to build detailed physics-based hydrological and hydraulic computational models to simulate the entire watershed, thereby predicting the water stage at any point in the system. However, these physics-based models are computationally intensive, especially for large watersheds and for longer simulations. To overcome this problem, we train several deep learning (DL) models for use as surrogate models to rapidly predict the water stage. The downstream stage of the Miami River in South Florida is chosen as a case study for this paper. The dataset is from January 1, 2010, to December 31, 2020, downloaded from the DBHYDRO database of the South Florida Water Management District (SFWMD). Extensive experiments show that the performance of the DL models is comparable to that of the physics-based models, even during extreme precipitation conditions (i.e., tropical storms). Furthermore, we study the decline in prediction accuracy of the DL models with an increase in prediction lengths. In order to predict the water stage in the future, our DL models use measured variables of the river system from the recent past as well as covariates that can be reliably predicted in the near future. In summary, the deep learning models achieve comparable or better error rates with at least 1000x speedup in comparison to the physics-based models.	翻訳日:2023-06-29 15:55:39 公開日:2023-06-28
# 協調濾過における硬質負試料の寸法独立混合 Dimension Independent Mixup for Hard Negative Sample in Collaborative Filtering ( http://arxiv.org/abs/2306.15905v1 ) ライセンス: Link先を確認	Xi Wu, Liangwei Yang, Jibing Gong, Chao Zhou, Tianyu Lin, Xiaolong Liu, Philip S. Yu	(参考訳) 協調フィルタリング(CF)は,過去のインタラクションに基づいてユーザの好みを予測する手法として広く利用されている。負のサンプリングは、暗黙のフィードバックでcfベースのモデルのトレーニングにおいて重要な役割を果たす。本稿では,既存のサンプリング手法を再検討するためのサンプリング領域に基づく新しい視点を提案する。現状のサンプリング手法は, 主にポイントワイズやラインワイズに焦点を合わせ, 柔軟性の欠如, ハードサンプリング領域の大部分を未検討のまま残している。この制限に対処するため,CFモデルを用いた最初のエリアワイドサンプリング手法であるDINS(Dimension Independent Mixup for Hard Negative Smpling)を提案する。 DINSはハード境界定義、次元独立混合、マルチホッププールの3つのモジュールから構成されている。行列分解モデルとグラフベースモデルの両方における実世界のデータセットを用いた実験により、DINSは他の負のサンプリング手法よりも優れ、その効果と優越性を確立した。本研究は,新たな視点と領域的サンプリングの導入,負サンプリングの最先端性能を実現する新たなアプローチとしてdinsを提案する。私たちの実装はPyTorchで利用可能です。 Collaborative filtering (CF) is a widely employed technique that predicts user preferences based on past interactions. Negative sampling plays a vital role in training CF-based models with implicit feedback. In this paper, we propose a novel perspective based on the sampling area to revisit existing sampling methods. We point out that current sampling methods mainly focus on Point-wise or Line-wise sampling, lacking flexibility and leaving a significant portion of the hard sampling area un-explored. To address this limitation, we propose Dimension Independent Mixup for Hard Negative Sampling (DINS), which is the first Area-wise sampling method for training CF-based models. DINS comprises three modules: Hard Boundary Definition, Dimension Independent Mixup, and Multi-hop Pooling. Experiments with real-world datasets on both matrix factorization and graph-based models demonstrate that DINS outperforms other negative sampling methods, establishing its effectiveness and superiority. Our work contributes a new perspective, introduces Area-wise sampling, and presents DINS as a novel approach that achieves state-of-the-art performance for negative sampling. Our implementations are available in PyTorch.	翻訳日:2023-06-29 15:55:17 公開日:2023-06-28
# 多様性と強み - 複数のAIの相互強化学習によるサッカーフルゲームをマスターする Diversity is Strength: Mastering Football Full Game with Interactive Reinforcement Learning of Multiple AIs ( http://arxiv.org/abs/2306.15903v1 ) ライセンス: Link先を確認	Chenglu Sun, Shuo Shen, Sijia Xu, Weidong Zhang	(参考訳) マルチエージェント環境で強力で豊かな戦略でAIを訓練することは、Deep Reinforcement Learning(DRL)において重要な研究トピックである。 AIの強みは戦略の多様性と密接に関連しており、この関係は、強い戦略と豊かな戦略の両方でAIを訓練するためのガイドとなります。この点を証明するために、多種多様なAIを同時に訓練できる新しいDRLトレーニングフレームワークであるDiversity is Strength (DIS)を提案する。これらのAIは相互接続された履歴モデルプール構造を介してリンクされ、その能力と戦略の多様性を高める。また、モデルプールを強化し、最終的なAIを得るために最適なモデルを選択するためのモデル評価およびスクリーニングスキームを設計する。提案手法は,人的データを用いることなく,多様で汎用的で強力なAI戦略を提供する。私たちはGoogle Research Football(GRF)に基づいたAIコンペでテストを行い、5v5と11v11のトラックで優勝しました。この方法により、GRF AIは、複雑なマルチエージェント環境下で、5v5と11v11トラックの両方で、初めてハイレベルになる。行動分析により、トレーニングされたAIは豊富な戦略を持ち、アブレーション実験は、設計されたモジュールがトレーニングプロセスの恩恵を受けることを示した。 Training AI with strong and rich strategies in multi-agent environments remains an important research topic in Deep Reinforcement Learning (DRL). The AI's strength is closely related to its diversity of strategies, and this relationship can guide us to train AI with both strong and rich strategies. To prove this point, we propose Diversity is Strength (DIS), a novel DRL training framework that can simultaneously train multiple kinds of AIs. These AIs are linked through an interconnected history model pool structure, which enhances their capabilities and strategy diversities. We also design a model evaluation and screening scheme to select the best models to enrich the model pool and obtain the final AI. The proposed training method provides diverse, generalizable, and strong AI strategies without using human data. We tested our method in an AI competition based on Google Research Football (GRF) and won the 5v5 and 11v11 tracks. The method enables a GRF AI to have a high level on both 5v5 and 11v11 tracks for the first time, which are under complex multi-agent environments. The behavior analysis shows that the trained AI has rich strategies, and the ablation experiments proved that the designed modules benefit the training process.	翻訳日:2023-06-29 15:54:59 公開日:2023-06-28
# 分布外一般化のための個別及び構造グラフ情報基盤 Individual and Structural Graph Information Bottlenecks for Out-of-Distribution Generalization ( http://arxiv.org/abs/2306.15902v1 ) ライセンス: Link先を確認	Ling Yang, Jiayi Zheng, Heyuan Wang, Zhongyi Liu, Zhilin Huang, Shenda Hong, Wentao Zhang, Bin Cui	(参考訳) アウト・オブ・ディストリビューション (OOD) グラフの一般化は多くの実世界のアプリケーションにとって重要である。既存の方法は、ラベルとは無関係な入力の急激な特徴や騒々しい特徴を捨てることを無視している。さらに、主にインスタンスレベルのクラス不変グラフ学習を行い、グラフインスタンス間の構造クラス関係を利用できない。本研究は,IS-GIB(Personal and Structure Graph Information Bottlenecks)と呼ばれる統合フレームワークを用いて,これらの課題に対処する。分散シフトによるクラス急激な特徴を除去するために,入力グラフと埋め込みの相互情報を最小化することにより,無関係な情報を捨てるPersonal Graph Information Bottleneck (I-GIB)を提案する。構造内およびドメイン間相関の活用を目的として,構造グラフ情報ボトルネック(S-GIB)を提案する。具体的には、複数のドメインを持つグラフのバッチに対して、S-GIBはまずペアの入力-入力、埋め込み-埋め込み、ラベル-ラベル相関を計算する。そして、埋め込みとラベルペア間の相互情報を最大化しながら、入力グラフと埋め込みペア間の相互情報を最小化する。 S-GIBの批判的な洞察は、複数の分布シフトの下でクラス関係を維持することにより、急激な特徴を同時に排除し、高次の視点から不変な特徴を学習することである。特に、提案したI-GIBとS-GIBを統一して、補完的なフレームワークIS-GIBを形成する。ノードレベルのタスクとグラフレベルのタスクの両方で実施された大規模な実験は、IS-GIBの優れた一般化能力を一貫して示している。コードはhttps://github.com/yangling0818/graphoodで入手できる。 Out-of-distribution (OOD) graph generalization are critical for many real-world applications. Existing methods neglect to discard spurious or noisy features of inputs, which are irrelevant to the label. Besides, they mainly conduct instance-level class-invariant graph learning and fail to utilize the structural class relationships between graph instances. In this work, we endeavor to address these issues in a unified framework, dubbed Individual and Structural Graph Information Bottlenecks (IS-GIB). To remove class spurious feature caused by distribution shifts, we propose Individual Graph Information Bottleneck (I-GIB) which discards irrelevant information by minimizing the mutual information between the input graph and its embeddings. To leverage the structural intra- and inter-domain correlations, we propose Structural Graph Information Bottleneck (S-GIB). Specifically for a batch of graphs with multiple domains, S-GIB first computes the pair-wise input-input, embedding-embedding, and label-label correlations. Then it minimizes the mutual information between input graph and embedding pairs while maximizing the mutual information between embedding and label pairs. The critical insight of S-GIB is to simultaneously discard spurious features and learn invariant features from a high-order perspective by maintaining class relationships under multiple distributional shifts. Notably, we unify the proposed I-GIB and S-GIB to form our complementary framework IS-GIB. Extensive experiments conducted on both node- and graph-level tasks consistently demonstrate the superior generalization ability of IS-GIB. The code is available at https://github.com/YangLing0818/GraphOOD.	翻訳日:2023-06-29 15:54:38 公開日:2023-06-28
# 特権情報による擬似ラベル化とそのin situシーケンシング画像への応用 Pseudo-Labeling Enhanced by Privileged Information and Its Application to In Situ Sequencing Images ( http://arxiv.org/abs/2306.15898v1 ) ライセンス: Link先を確認	Marzieh Haghighi, Mario C. Cruz, Erin Weisbart, Beth A. Cimini, Avtar Singh, Julia Bauman, Maria E. Lozada, Sanam L. Kavari, James T. Neal, Paul C. Blainey, Anne E. Carpenter and Shantanu Singh	(参考訳) ラベル・スカース物体検出のための様々な戦略がコンピュータビジョン研究コミュニティによって検討されている。これらの戦略は主に、自然画像に特有の仮定に依存しており、生物学的および生物医学的な視覚領域に直接適用されない。例えば、ほとんどの半教師付き学習戦略は、信頼できる真実の情報源としてラベル付きデータの小さなセットに依存している。しかし、多くの生物学的視覚応用において、基礎的真理は未知であり、間接的な情報はノイズ推定や直交的証拠という形で利用可能である。本研究では,半教師付き物体検出(ssod)問題として,空間的トランスクリプトミクス(iss画像からバーコードを復号する)における重要な問題を考察する。提案フレームワークは,半教師付き学習フレームワークに特権情報という形で追加可能な情報ソースを組み込む。特権情報は教師の疑似ラベルに組み込まれ、教師の教師が自習するイテレーションで学習される。利用可能な特権情報はデータドメインに特化することができるが、特権情報(PLePI)によって強化された擬似ラベルの一般的な戦略を導入し、ISSイメージとCLIPが提供する余分な証拠を用いたCOCOベンチマークを用いて概念を実証した。 Various strategies for label-scarce object detection have been explored by the computer vision research community. These strategies mainly rely on assumptions that are specific to natural images and not directly applicable to the biological and biomedical vision domains. For example, most semi-supervised learning strategies rely on a small set of labeled data as a confident source of ground truth. In many biological vision applications, however, the ground truth is unknown and indirect information might be available in the form of noisy estimations or orthogonal evidence. In this work, we frame a crucial problem in spatial transcriptomics - decoding barcodes from In-Situ-Sequencing (ISS) images - as a semi-supervised object detection (SSOD) problem. Our proposed framework incorporates additional available sources of information into a semi-supervised learning framework in the form of privileged information. The privileged information is incorporated into the teacher's pseudo-labeling in a teacher-student self-training iteration. Although the available privileged information could be data domain specific, we have introduced a general strategy of pseudo-labeling enhanced by privileged information (PLePI) and exemplified the concept using ISS images, as well on the COCO benchmark using extra evidence provided by CLIP.	翻訳日:2023-06-29 15:54:12 公開日:2023-06-28
# 帰属訓練データジェネレータとしての大規模言語モデル:多様性とバイアスの物語 Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias ( http://arxiv.org/abs/2306.15895v1 ) ライセンス: Link先を確認	Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang	(参考訳) 大規模言語モデル(LLM)は、最近、様々な自然言語処理(NLP)タスクのためのトレーニングデータジェネレータとして活用されている。従来の研究では、生成データを用いたモデルトレーニングのさまざまなアプローチが検討されているが、一般的には、生成されたデータの多様性を制限し、LLMの系統的バイアスを継承する、単純なクラス条件のプロンプトに依存している。そこで本研究では,多様な属性を持つプロンプト(例えば,長さやスタイルなどの属性を指定する)を用いたトレーニングデータ生成について検討する。本研究は,高い濃度と多様なドメインを持つデータセットに着目し,帰属プロンプトが,結果モデルの性能の点で単純なクラス条件プロンプトよりも優れていることを示す。 Additionally, we present a comprehensive empirical study on data generation encompassing vital aspects like bias, diversity, and efficiency, and highlight three key observations: firstly, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; secondly, attribute diversity plays a pivotal role in enhancing model performance; lastly, attributed prompts achieve the performance of simple class-conditional prompts while utilizing only 5\% of the querying cost of ChatGPT associated with the latter. 生成されたデータセットを公開し、今後の研究を促進するためにプロンプトを使用します。データとコードは \url{https://github.com/yueyu1030/AttrPrompt} で入手できる。 Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models using generated data, they generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit systematic biases of LLM. Thus, we investigate training data generation with diversely attributed prompts (e.g., specifying attributes like length and style), which have the potential to yield diverse and attributed generated data. Our investigation focuses on datasets with high cardinality and diverse domains, wherein we demonstrate that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance. Additionally, we present a comprehensive empirical study on data generation encompassing vital aspects like bias, diversity, and efficiency, and highlight three key observations: firstly, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; secondly, attribute diversity plays a pivotal role in enhancing model performance; lastly, attributed prompts achieve the performance of simple class-conditional prompts while utilizing only 5\% of the querying cost of ChatGPT associated with the latter. We release the generated dataset and used prompts to facilitate future research. The data and code will be available on \url{https://github.com/yueyu1030/AttrPrompt}.	翻訳日:2023-06-29 15:53:52 公開日:2023-06-28
# グローバルおよびローカル表現に基づくマルチネットワークコントラスト学習 Multi-network Contrastive Learning Based on Global and Local Representations ( http://arxiv.org/abs/2306.15930v1 ) ライセンス: Link先を確認	Weiquan Li, Xianzhong Long, Yun Li	(参考訳) 自己教師付き学習の人気により、ラベル付きデータに頼ることなくモデルをトレーニングすることが可能になった。しかしながら、既存の自己教師付きコントラスト学習手法の多くは、グローバル特徴情報とローカル特徴情報の組み合わせを見落としていることが多い。本稿では,グローバルおよびローカル表現に基づくマルチネットワークコントラスト学習フレームワークを提案する。複数のネットワークを通じて自己指導型コントラスト学習のためのグローバル・ローカル特徴情報を導入する。モデルは、複数のネットワークから生成される埋め込みペアを対比して、画像の異なるスケールで特徴情報を学習する。このフレームワークはまた、コントラストに使用されるサンプル数を拡大し、モデルのトレーニング効率を向上させる。 3つのベンチマークデータセットの線形評価結果から,本手法は従来の自己教師付き学習法よりも優れていることが示された。 The popularity of self-supervised learning has made it possible to train models without relying on labeled data, which saves expensive annotation costs. However, most existing self-supervised contrastive learning methods often overlook the combination of global and local feature information. This paper proposes a multi-network contrastive learning framework based on global and local representations. We introduce global and local feature information for self-supervised contrastive learning through multiple networks. The model learns feature information at different scales of an image by contrasting the embedding pairs generated by multiple networks. The framework also expands the number of samples used for contrast and improves the training efficiency of the model. Linear evaluation results on three benchmark datasets show that our method outperforms several existing classical self-supervised learning methods.	翻訳日:2023-06-29 15:47:40 公開日:2023-06-28
# ジャンプポイント探索における冗長作業の削減 Reducing Redundant Work in Jump Point Search ( http://arxiv.org/abs/2306.15928v1 ) ライセンス: Link先を確認	Shizhe Zhao, Daniel Harabor, Peter J. Stuckey	(参考訳) JPS (Jump Point Search) は、オンライングリッドベースのパスフィンディングのための最先端のアルゴリズムである。ゲームやその他のナビゲーションシナリオで広く使われているが、JPSは研究されていない病理学的行動を示すことができる。 i) 地図の同じ領域を何度もスキャンして後継を見つけることができる。 (ii) 最適下探索ノードを生成して拡張する。本研究では,これらの病的行動の源泉について検討し,実際にどのように起こるかを示し,より効率的に対処するためのオンラインアプローチであるConstrained JPS(CJPS)を提案する。実験の結果、cjpsのオーバーヘッドは低く、動的に変化するグリッド環境ではjpsよりも高速であることが示され、大きなゲームマップでは最大7倍、病理シナリオでは最大14倍まで向上した。 JPS (Jump Point Search) is a state-of-the-art optimal algorithm for online grid-based pathfinding. Widely used in games and other navigation scenarios, JPS nevertheless can exhibit pathological behaviours which are not well studied: (i) it may repeatedly scan the same area of the map to find successors; (ii) it may generate and expand suboptimal search nodes. In this work, we examine the source of these pathological behaviours, show how they can occur in practice, and propose a purely online approach, called Constrained JPS (CJPS), to tackle them efficiently. Experimental results show that CJPS has low overheads and is often faster than JPS in dynamically changing grid environments: by up to 7x in large game maps and up to 14x in pathological scenarios.	翻訳日:2023-06-29 15:47:29 公開日:2023-06-28
# 全文脈情報から動的グラフを学習し, 正確な訪問予測 Learning Dynamic Graphs from All Contextual Information for Accurate Point-of-Interest Visit Forecasting ( http://arxiv.org/abs/2306.15927v1 ) ライセンス: Link先を確認	Arash Hajisafi, Haowen Lin, Sina Shaham, Haoji Hu, Maria Despoina Siampou, Yao-Yi Chiang, Cyrus Shahabi	(参考訳) 都市部におけるポイント・オブ・関心(POI)の訪問数予測は、都市計画・交通管理から公衆衛生・社会研究に至るまで、様々な分野の計画・意思決定に不可欠である。この予測問題は、多変量時系列予測タスクとして定式化することができるが、現在の手法では、POI間の常に変化するマルチコンテキスト相関を完全に活用することはできない。そこで本研究では,pois間のマルチコンテキスト相関を学習し,より正確な訪問予測のための時間的グラフニューラルネットワークであるbroadness graph neural network (bysgnn)を提案する。動的グラフを学習するために時系列データのみを使用する他のアプローチとは異なり、BysGNNはコンテキスト情報と時系列データを利用して正確な動的グラフ表現を学ぶ。文脈的・時間的・空間的な信号をすべて取り入れることで、米国中の実世界のデータセットを用いた実験において、最先端の予測モデルよりも予測精度が大幅に向上するのを観察する。 Forecasting the number of visits to Points-of-Interest (POI) in an urban area is critical for planning and decision-making for various application domains, from urban planning and transportation management to public health and social studies. Although this forecasting problem can be formulated as a multivariate time-series forecasting task, the current approaches cannot fully exploit the ever-changing multi-context correlations among POIs. Therefore, we propose Busyness Graph Neural Network (BysGNN), a temporal graph neural network designed to learn and uncover the underlying multi-context correlations between POIs for accurate visit forecasting. Unlike other approaches where only time-series data is used to learn a dynamic graph, BysGNN utilizes all contextual information and time-series data to learn an accurate dynamic graph representation. By incorporating all contextual, temporal, and spatial signals, we observe a significant improvement in our forecasting accuracy over state-of-the-art forecasting models in our experiments with real-world datasets across the United States.	翻訳日:2023-06-29 15:47:17 公開日:2023-06-28
# ほとんどの言語モデルも詩人になれる:AIライティングアシスタントと制約付きテキスト生成スタジオ Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio ( http://arxiv.org/abs/2306.15926v1 ) ライセンス: Link先を確認	Allen Roush, Sanjay Basu, Akshay Moorthy, Dmitry Dubovoy	(参考訳) 制約された自然言語生成の分野で急速に進歩したにもかかわらず、語彙が語彙的に、意味的に、あるいは音声的に制約された言語モデルの可能性を探る時間はほとんどない。ほとんどの言語モデルは、大きな制約の下でも魅力的なテキストを生成する。本稿では,テキスト単位を生成する前に,言語モデル語彙にフィルタ関数を合成適用することにより,言語モデルの出力をシンプルかつ普遍的に変更する手法を提案する。このアプローチはプラグアンドプレイであり、モデルを変更する必要はない。本手法の価値を示すために,CTGS(Constrained Text Generation Studio)と呼ばれるAI記述アシスタントを提案する。 CTGSは、特定の文字を禁止したり、生成された単語に一定の数の音節を持つように強制したり、単語を他の単語の部分的なアナグラムに強制したりといった、幅広い制約の組み合わせでテキストを生成または選択することができる。文字eを省略する新しい散文データセットを導入する。本手法は,本データセットの微調整のみと比較して,厳格に優れた性能を示す。また,Gadsbyという技術を用いたHuggingfaceのWebアプリケーションも紹介する。コードはここで公開されている。 https://github.com/HellisotherPeople/Constrained-Text-Generation-Studio Despite rapid advancement in the field of Constrained Natural Language Generation, little time has been spent on exploring the potential of language models which have had their vocabularies lexically, semantically, and/or phonetically constrained. We find that most language models generate compelling text even under significant constraints. We present a simple and universally applicable technique for modifying the output of a language model by compositionally applying filter functions to the language models vocabulary before a unit of text is generated. This approach is plug-and-play and requires no modification to the model. To showcase the value of this technique, we present an easy to use AI writing assistant called Constrained Text Generation Studio (CTGS). CTGS allows users to generate or choose from text with any combination of a wide variety of constraints, such as banning a particular letter, forcing the generated words to have a certain number of syllables, and/or forcing the words to be partial anagrams of another word. We introduce a novel dataset of prose that omits the letter e. We show that our method results in strictly superior performance compared to fine-tuning alone on this dataset. We also present a Huggingface space web-app presenting this technique called Gadsby. The code is available to the public here: https://github.com/Hellisotherpeople/Constrained-Text-Generation-Studio	翻訳日:2023-06-29 15:46:57 公開日:2023-06-28
# ロングテール認識のためのサブクラスバランスコントラスト学習 Subclass-balancing Contrastive Learning for Long-tailed Recognition ( http://arxiv.org/abs/2306.15925v1 ) ライセンス: Link先を確認	Chengkai Hou and Jieyu Zhang and Haonan Wang and Tianyi Zhou	(参考訳) 不均衡なクラス分布を持つロングテール認識は、実践的な機械学習アプリケーションで自然に現れる。 data reweighing、resampling、supervised contrastive learningのような既存のメソッドは、クラスバランスを、headクラスとtailクラスのインスタンス間の不均衡を導入する価格で強制し、これは前者のリッチなセマンティックサブ構造を無視し、後者のバイアスを誇張する可能性がある。これらの欠点を,各headクラスを末尾クラスと同じ大きさの複数のサブクラスに分類し,元のクラスとそれらのサブクラスの間の2層クラス階層をキャプチャする表現を強制する,新しい`subclass-balancing contrastive learning (sbcl)'アプローチによって克服した。クラスタリングは、表現空間内で実行され、トレーニング中に更新されるので、サブクラスラベルは、ヘッドクラスのセマンティックサブ構造を保持する。一方、テールクラスのサンプルを過度に強調しないため、各インスタンスは表現学習に等しく貢献する。したがって,本手法はインスタンスとサブクラスのバランスを両立させるが,元のクラスラベルは異なるクラスのサブクラスのコントラスト学習によって学習される。我々は,長期化ベンチマークデータセットの一覧からSBCLを評価し,最先端のパフォーマンスを実現する。さらに,SBCLのさらなる分析とアブレーションを行い,その利点を検証した。 Long-tailed recognition with imbalanced class distribution naturally emerges in practical machine learning applications. Existing methods such as data reweighing, resampling, and supervised contrastive learning enforce the class balance with a price of introducing imbalance between instances of head class and tail class, which may ignore the underlying rich semantic substructures of the former and exaggerate the biases in the latter. We overcome these drawbacks by a novel ``subclass-balancing contrastive learning (SBCL)'' approach that clusters each head class into multiple subclasses of similar sizes as the tail classes and enforce representations to capture the two-layer class hierarchy between the original classes and their subclasses. Since the clustering is conducted in the representation space and updated during the course of training, the subclass labels preserve the semantic substructures of head classes. Meanwhile, it does not overemphasize tail class samples, so each individual instance contribute to the representation learning equally. Hence, our method achieves both the instance- and subclass-balance, while the original class labels are also learned through contrastive learning among subclasses from different classes. We evaluate SBCL over a list of long-tailed benchmark datasets and it achieves the state-of-the-art performance. In addition, we present extensive analyses and ablation studies of SBCL to verify its advantages.	翻訳日:2023-06-29 15:46:36 公開日:2023-06-28
# オペレーター学習における次元の呪い The curse of dimensionality in operator learning ( http://arxiv.org/abs/2306.15924v1 ) ライセンス: Link先を確認	Samuel Lanthaler and Andrew M. Stuart	(参考訳) ニューラルネットワークを用いて、関数のバナッハ空間間の演算子マッピングを近似し、エミュレーションによってモデル評価を加速したり、データからモデルを発見したりすることができる。その結果,近年,この手法が注目され,オペレーター学習の分野が急速に拡大している。この論文の第一の貢献は、C^r$-あるいはリプシッツ正則性のみによって特徴づけられる作用素の一般クラスに対して、無限次元の入力および出力関数空間の表現に関して正確に定義された次元性の呪いに苦しむことである。その結果は、PCA-Net、DeepONet、FNOなど、さまざまな既存のニューラル演算子に適用できる。この論文の第二の貢献は、ハミルトン・ヤコビ方程式によって定義される解作用素に対して、次元性の一般的な呪いが克服可能であることを証明することである。この目的のために、hj-netと呼ばれる新しいニューラルオペレーターアーキテクチャが導入され、基盤となるハミルトン系の特性情報を明示的に考慮した。 hj-net の誤差と複雑性の推定は、このアーキテクチャが無限次元の入出力関数空間に関連する次元の呪いを打ち負かすことができることを示している。 Neural operator architectures employ neural networks to approximate operators mapping between Banach spaces of functions; they may be used to accelerate model evaluations via emulation, or to discover models from data. Consequently, the methodology has received increasing attention over recent years, giving rise to the rapidly growing field of operator learning. The first contribution of this paper is to prove that for general classes of operators which are characterized only by their $C^r$- or Lipschitz-regularity, operator learning suffers from a curse of dimensionality, defined precisely here in terms of representations of the infinite-dimensional input and output function spaces. The result is applicable to a wide variety of existing neural operators, including PCA-Net, DeepONet and the FNO. The second contribution of the paper is to prove that the general curse of dimensionality can be overcome for solution operators defined by the Hamilton-Jacobi equation; this is achieved by leveraging additional structure in the underlying solution operator, going beyond regularity. To this end, a novel neural operator architecture is introduced, termed HJ-Net, which explicitly takes into account characteristic information of the underlying Hamiltonian system. Error and complexity estimates are derived for HJ-Net which show that this architecture can provably beat the curse of dimensionality related to the infinite-dimensional input and output function spaces.	翻訳日:2023-06-29 15:46:11 公開日:2023-06-28
# 微細な3次元物体認識 : アプローチと実験 Fine-grained 3D object recognition: an approach and experiments ( http://arxiv.org/abs/2306.15919v1 ) ライセンス: Link先を確認	Junhyung Jo, Hamidreza Kasaei	(参考訳) 3次元物体認識技術は自動車の自動運転などの先進技術の中核技術として利用されている。 3Dオブジェクト認識には2つのアプローチがある。 (i)Global Orthographic Object Descriptor(GOOD)などの手作りのアプローチ (ii)mobilenetやvggといったディープラーニングベースのアプローチ。しかし、既知のカテゴリの数が時間とともに増加するオープンエンド領域では、これらのアプローチのどちらがよりうまく機能するかを知る必要があり、システムは、少数のトレーニング例を使って、新しいオブジェクトカテゴリについて学ぶ必要がある。本稿では,オブジェクトビューを入力とし,カテゴリラベルを出力として生成するオフライン3Dオブジェクト認識システムを最初に実装した。オフラインの段階では、インスタンスベースの学習(IBL)を使用して新しいカテゴリを形成し、得られたオブジェクト認識性能を評価するためにK-foldクロスバリデーションを使用する。次に,提案手法をシミュレートした教師テストに統合し,オンライン形式でテストを行った。その結果,ディープラーニング機能を用いたアプローチは,よりオープンな手法に適していることがわかった。さらに,手作り・深層学習の特徴を結合することで分類精度が向上することを確認した。 Three-dimensional (3D) object recognition technology is being used as a core technology in advanced technologies such as autonomous driving of automobiles. There are two sets of approaches for 3D object recognition: (i) hand-crafted approaches like Global Orthographic Object Descriptor (GOOD), and (ii) deep learning-based approaches such as MobileNet and VGG. However, it is needed to know which of these approaches works better in an open-ended domain where the number of known categories increases over time, and the system should learn about new object categories using few training examples. In this paper, we first implemented an offline 3D object recognition system that takes an object view as input and generates category labels as output. In the offline stage, instance-based learning (IBL) is used to form a new category and we use K-fold cross-validation to evaluate the obtained object recognition performance. We then test the proposed approach in an online fashion by integrating the code into a simulated teacher test. As a result, we concluded that the approach using deep learning features is more suitable for open-ended fashion. Moreover, we observed that concatenating the hand-crafted and deep learning features increases the classification accuracy.	翻訳日:2023-06-29 15:45:48 公開日:2023-06-28
# ニューラルネットワークが捉えた情報:記憶と一般化とのつながり On information captured by neural networks: connections with memorization and generalization ( http://arxiv.org/abs/2306.15918v1 ) ライセンス: Link先を確認	Hrayr Harutyunyan	(参考訳) ディープラーニングの人気と成功にもかかわらず、ニューラルネットワークが未知の例に一般化する時期、方法、理由の理解は限られている。学習はデータから情報を取り出すものとして見ることができるので、トレーニング中にニューラルネットワークが取得した情報を正式に研究する。具体的には,情報理論的な観点から雑音ラベルの存在下での学習から始め,ラベル雑音情報を重み付けに制限する学習アルゴリズムを導出する。次に、個々のサンプルがディープネットワークのトレーニングに与えるユニークな情報の概念を定義し、非定型的、曖昧、あるいは過度に表現されたサブポピュレーションに属する例のニューラルネットワークの振る舞いに光を当てる。非空の一般化ギャップ境界を導出することで、例情報性と一般化を関連付ける。最後に, 知識蒸留の研究により, 一般化におけるデータとラベルの複雑さの重要性を浮き彫りにする。その結果,ニューラルネットワークの一般化のメカニズムの理解を深めることができた。 Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study information captured by neural networks during training. Specifically, we start with viewing learning in presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits label noise information in weights. We then define a notion of unique information that an individual sample provides to the training of a deep network, shedding some light on the behavior of neural networks on examples that are atypical, ambiguous, or belong to underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous generalization gap bounds. Finally, by studying knowledge distillation, we highlight the important role of data and label complexity in generalization. Overall, our findings contribute to a deeper understanding of the mechanisms underlying neural network generalization.	翻訳日:2023-06-29 15:45:30 公開日:2023-06-28
# 信頼度校正エンサンブルダンスフレーズ検索 Confidence-Calibrated Ensemble Dense Phrase Retrieval ( http://arxiv.org/abs/2306.15917v1 ) ライセンス: Link先を確認	William Yang, Noah Bergam, Arnav Jain, Nima Sheikhoslami	(参考訳) 本稿では, (Karpukhin et al. 2020) によって開発されたトランスフォーマーを用いた Dense Passage Retrieval (DPR) アルゴリズムが, 事前学習なしに最適化できる範囲について考察する。この手法には2つの特別な洞察が含まれている: dprコンテキストエンコーダを様々な句長(例えば、1セントと5セントのセグメント)に適用し、これら全てのセグメントに対して信頼度に合致したアンサンブル予測を行う。このやや徹底的なアプローチは、Google NQやSQuADといったベンチマークデータセットで、最先端の結果を達成する。また,本手法をドメイン固有のデータセットに適用し,異なるドメインに対して異なる粒度が最適であることを示す。 In this paper, we consider the extent to which the transformer-based Dense Passage Retrieval (DPR) algorithm, developed by (Karpukhin et. al. 2020), can be optimized without further pre-training. Our method involves two particular insights: we apply the DPR context encoder at various phrase lengths (e.g. one-sentence versus five-sentence segments), and we take a confidence-calibrated ensemble prediction over all of these different segmentations. This somewhat exhaustive approach achieves start-of-the-art results on benchmark datasets such as Google NQ and SQuAD. We also apply our method to domain-specific datasets, and the results suggest how different granularities are optimal for different domains	翻訳日:2023-06-29 15:45:14 公開日:2023-06-28
# ランダム係数リッジ回帰を用いた伝達学習 Transfer Learning with Random Coefficient Ridge Regression ( http://arxiv.org/abs/2306.15915v1 ) ライセンス: Link先を確認	Hongzhe Zhang and Hongzhe Li	(参考訳) ランダムな係数を持つリッジ回帰は、効果が小さいがゼロではないと期待される場合、高次元の設定で固定係数回帰に重要な代替となる。本稿では,移動学習の設定におけるランダム係数リッジ回帰の推定と予測について考察し,対象モデルからの観測に加えて,異なるが関連性のある回帰モデルからのサンプルも利用可能である。対象モデルに対するソースモデルの情報性は、回帰係数の相関によって定量化することができる。本稿では,実験的推定リスクや予測リスクを最小化して重みを決定できるターゲットモデルとソースモデルの両方のリッジ推定の重み付け和として,対象モデルの回帰係数を2つの推定器を提案する。ランダム行列理論を用いて、最適重みの制限値は、$p/n \rightarrow \gamma$(ここで$p$は予測者の数、$n$はサンプルサイズ)という設定で導出され、推定や予測のリスクを明示的に表わす。シミュレーションでは、これらの制限リスクは経験的リスクと非常によく一致している。脂質特性に対するポリジェニックリスクスコアの予測への応用は, 単一試料隆起回帰法やラッソを用いた伝達学習法よりも, 予測誤差が小さいことを示す。 Ridge regression with random coefficients provides an important alternative to fixed coefficients regression in high dimensional setting when the effects are expected to be small but not zeros. This paper considers estimation and prediction of random coefficient ridge regression in the setting of transfer learning, where in addition to observations from the target model, source samples from different but possibly related regression models are available. The informativeness of the source model to the target model can be quantified by the correlation between the regression coefficients. This paper proposes two estimators of regression coefficients of the target model as the weighted sum of the ridge estimates of both target and source models, where the weights can be determined by minimizing the empirical estimation risk or prediction risk. Using random matrix theory, the limiting values of the optimal weights are derived under the setting when $p/n \rightarrow \gamma$, where $p$ is the number of the predictors and $n$ is the sample size, which leads to an explicit expression of the estimation or prediction risks. Simulations show that these limiting risks agree very well with the empirical risks. An application to predicting the polygenic risk scores for lipid traits shows such transfer learning methods lead to smaller prediction errors than the single sample ridge regression or Lasso-based transfer learning.	翻訳日:2023-06-29 15:45:00 公開日:2023-06-28
# マルチモーダルうわさ検出のための知識強化階層型情報相関学習 Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection ( http://arxiv.org/abs/2306.15946v1 ) ライセンス: Link先を確認	Jiawei Liu, Jingyi Xie, Fanrui Zhang, Qiang Zhang, Zheng-jun Zha	(参考訳) ソーシャルメディア上のテキストや画像による噂の爆発的な成長は、大きな注目を集めている。既存の研究は、クロスモーダル情報インタラクションと融合に多大な貢献をしてきたが、異なるモダリティコンテンツ間の階層的および複雑な意味的相関を十分に探求できず、マルチモーダルなうわさを検出する際の性能を厳しく制限している。本研究では,基本意味相関と高次知識相関を共同でモデル化し,マルチモーダルうわさ検出のための知識エンハンスド階層情報相関学習手法(khicl)を提案する。具体的には、KhiCLはクロスモーダル結合辞書を利用して、異種一様特徴を共通特徴空間に伝達し、クロスモーダル融合層によって基本的なクロスモーダル意味的一貫性と矛盾を捉える。さらに、マルチモーダルコンテンツの記述をエンティティを中心に考えると、KhiCLは画像やテキストから視覚的およびテキスト的エンティティを抽出し、知識関連推論戦略を設計し、外部知識グラフ内の各エンティティ間の最も短い意味的関連パスを見つけ、この経路で他の連結エンティティの補完的なコンテキスト的知識をすべて吸収して知識強化エンティティ表現を学習する。さらに、KhiCLは署名された注意機構を用いて、その対応する意味的関連距離を測定することで、モダリティ内およびモダリティ間エンティティペアの知識強化エンティティ一貫性と矛盾をモデル化する。提案手法の有効性を実験により実証した。 The explosive growth of rumors with text and images on social media platforms has drawn great attention. Existing studies have made significant contributions to cross-modal information interaction and fusion, but they fail to fully explore hierarchical and complex semantic correlation across different modality content, severely limiting their performance on detecting multi-modal rumor. In this work, we propose a novel knowledge-enhanced hierarchical information correlation learning approach (KhiCL) for multi-modal rumor detection by jointly modeling the basic semantic correlation and high-order knowledge-enhanced entity correlation. Specifically, KhiCL exploits cross-modal joint dictionary to transfer the heterogeneous unimodality features into the common feature space and captures the basic cross-modal semantic consistency and inconsistency by a cross-modal fusion layer. Moreover, considering the description of multi-modal content is narrated around entities, KhiCL extracts visual and textual entities from images and text, and designs a knowledge relevance reasoning strategy to find the shortest semantic relevant path between each pair of entities in external knowledge graph, and absorbs all complementary contextual knowledge of other connected entities in this path for learning knowledge-enhanced entity representations. Furthermore, KhiCL utilizes a signed attention mechanism to model the knowledge-enhanced entity consistency and inconsistency of intra-modality and inter-modality entity pairs by measuring their corresponding semantic relevant distance. Extensive experiments have demonstrated the effectiveness of the proposed method.	翻訳日:2023-06-29 15:37:56 公開日:2023-06-28
# Pb-Hash: 分割bビットハッシュ Pb-Hash: Partitioned b-bit Hashing ( http://arxiv.org/abs/2306.15944v1 ) ライセンス: Link先を確認	Ping Li, Weijie Zhao	(参考訳) minwise hashing (minhash), one permutation hashing (oph), consistent weighted sampling (cws) を含む多くのハッシュアルゴリズムは、$b$bitの整数を生成する。データベクトル毎に$k$ハッシュを使用すると、ストレージは$B\times k$ bitsとなり、大規模学習に使用する場合、モデルサイズは$2^B\times k$となる。標準的な戦略は、$b$ビットのうち最低の$b$ビットのみを使用し、ハッシュ数である$k$を多少増やすことである。本研究では,$B$ビットを$m$チャンク,例えば$b\times m =B$に分割することでハッシュを再使用することを提案する。対応するモデルサイズは$m\times 2^b \times k$となり、これは元の$^b\times k$よりもかなり小さい。理論分析の結果、ハッシュ値を$m$チャンクに分割すると精度が低下することが明らかとなった。言い換えれば、$B/m$ビットの$m$チャンクを使うことは、$B$ビットを直接使用するほど正確ではない。これは同じハッシュの再使用による相関のためである。一方、我々の分析では(例えば)$m=2\sim 4$に対して精度があまり低下しないことも示している。一部の地域では、pb-hashは4.99ドルよりはるかに大きい価格でも機能する。 Pb-Hashは、ハッシュメソッド/アプリケーションのファミリーに良い追加であり、産業従事者に利益をもたらすと期待しています。線形SVMモデルおよびディープラーニングモデルに対する機械学習タスクにおけるPb-Hashの有効性を検証する。ハッシュデータは本質的に分類(ID)機能であるため、各ハッシュに埋め込みテーブルを使用する標準的なプラクティスに従う。 Pb-Hashでは、$m$の埋め込みを組み合わせる効果的な戦略を設計する必要があります。本研究は, 連結, 最大プール, 平均プール, 製品プールの4つの手法を実証的に評価した。どのプールが良いかという明確な答えはなく、私たちは将来の研究のためにそれを残します。 Many hashing algorithms including minwise hashing (MinHash), one permutation hashing (OPH), and consistent weighted sampling (CWS) generate integers of $B$ bits. With $k$ hashes for each data vector, the storage would be $B\times k$ bits; and when used for large-scale learning, the model size would be $2^B\times k$, which can be expensive. A standard strategy is to use only the lowest $b$ bits out of the $B$ bits and somewhat increase $k$, the number of hashes. In this study, we propose to re-use the hashes by partitioning the $B$ bits into $m$ chunks, e.g., $b\times m =B$. Correspondingly, the model size becomes $m\times 2^b \times k$, which can be substantially smaller than the original $2^B\times k$. Our theoretical analysis reveals that by partitioning the hash values into $m$ chunks, the accuracy would drop. In other words, using $m$ chunks of $B/m$ bits would not be as accurate as directly using $B$ bits. This is due to the correlation from re-using the same hash. On the other hand, our analysis also shows that the accuracy would not drop much for (e.g.,) $m=2\sim 4$. In some regions, Pb-Hash still works well even for $m$ much larger than 4. We expect Pb-Hash would be a good addition to the family of hashing methods/applications and benefit industrial practitioners. We verify the effectiveness of Pb-Hash in machine learning tasks, for linear SVM models as well as deep learning models. Since the hashed data are essentially categorical (ID) features, we follow the standard practice of using embedding tables for each hash. With Pb-Hash, we need to design an effective strategy to combine $m$ embeddings. Our study provides an empirical evaluation on four pooling schemes: concatenation, max pooling, mean pooling, and product pooling. There is no definite answer which pooling would be always better and we leave that for future study.	翻訳日:2023-06-29 15:37:26 公開日:2023-06-28
# 移動不要:Opti-Mileを用いたラストマイルと公共交通の統合 No Transfers Required: Integrating Last Mile with Public Transit Using Opti-Mile ( http://arxiv.org/abs/2306.15943v1 ) ライセンス: Link先を確認	Raashid Altaf, Pravesh Biyani	(参考訳) 公共交通機関は、ほとんどの地域に到達するのに必要な交通機関の必要性のため不便にもかかわらず、その手頃な価格のため人気のある交通手段である。例えば、ニューデリーのバスと地下鉄のネットワークでは、どの出発点からでも30\%しか直接アクセスできないため、ほとんどの通勤者への乗り換えが必要となる。さらに、リックショー、タクチューク、シャトルといったラストマイルのサービスは、最も近い公共交通機関のアクセスポイントへの給餌機として一般的に使われており、旅の複雑さと非効率性をさらに増す。最終的に、ユーザーは移動モードやラストマイルサービスの有無に関わらず、目的地に到達するためのカバレッジと転送のトレードオフに直面します。公共交通機関における移動に伴うアクセシビリティの制限と非効率の問題に対処するために,ラストマイルサービスと公共交通機関を組み合わせた新しい旅行計画手法である「opti-mile」を提案する。 Opti-mileでは、最大歩行距離や許容範囲などの旅行パラメータをカスタマイズできる。我々はニューデリーの交通ネットワークを解析し、ランダムに選択されたソース-決定ペア間の最適なマルチモーダル旅行におけるオプティマイルの効率、実現可能性、利点を評価する。従来の最短経路に比べて18%の値上げで、オプティマイル走行が10%距離移動を減少させることを示した。また、オプティマイルの旅行は公共交通機関よりも、運賃の大幅な増加を伴わずに、市をカバーしていることを示す。 Public transit is a popular mode of transit due to its affordability, despite the inconveniences due to the necessity of transfers required to reach most areas. For example, in the bus and metro network of New Delhi, only 30\% of stops can be directly accessed from any starting point, thus requiring transfers for most commutes. Additionally, last-mile services like rickshaws, tuk-tuks or shuttles are commonly used as feeders to the nearest public transit access points, which further adds to the complexity and inefficiency of a journey. Ultimately, users often face a tradeoff between coverage and transfers to reach their destination, regardless of the mode of transit or the use of last-mile services. To address the problem of limited accessibility and inefficiency due to transfers in public transit systems, we propose ``opti-mile," a novel trip planning approach that combines last-mile services with public transit such that no transfers are required. Opti-mile allows users to customise trip parameters such as maximum walking distance, and acceptable fare range. We analyse the transit network of New Delhi, evaluating the efficiency, feasibility and advantages of opti-mile for optimal multi-modal trips between randomly selected source-destination pairs. We demonstrate that opti-mile trips lead to a 10% reduction in distance travelled for 18% increase in price compared to traditional shortest paths. We also show that opti-mile trips provide better coverage of the city than public transit, without a significant fare increase.	翻訳日:2023-06-29 15:36:49 公開日:2023-06-28
# ターゲット音声抽出のための空間情報付き強化ニューラルビームフォーマ Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction ( http://arxiv.org/abs/2306.15942v1 ) ライセンス: Link先を確認	Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao and Yujun Wang	(参考訳) 近年,深層学習に基づくビームフォーミングアルゴリズムは,ターゲット音声抽出作業において有望な性能を示した。しかし、ほとんどのシステムは空間情報を十分に利用していない。本稿では,空間情報を利用してニューラルビームフォーマの性能を向上させるターゲット音声抽出ネットワークを提案する。そこで我々はまず, unet-tcn構造を用いて入力特徴をモデル化し, 他のモデルにおける直接次元化による情報損失を回避し, 音声前分離モジュールの推定精度を向上させる。さらに,アレイが受信する空間情報を十分に活用することにより,神経ビームフォーマーの空間情報の知覚を高めるマルチヘッドクロスアテンション機構を提案する。実験の結果,より合理的なターゲットマスク推定ネットワークと空間情報に基づくクロスタッチ機構を組み込んだアプローチが,音声分離性能を効果的に向上することが示された。 Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input features and improve the estimation accuracy of the speech pre-separation module by avoiding information loss caused by direct dimensionality reduction in other models. Furthermore, we introduce a multi-head cross-attention mechanism that enhances the neural beamformer's perception of spatial information by making full use of the spatial information received by the array. Experimental results demonstrate that our approach, which incorporates a more reasonable target mask estimation network and a spatial information-based cross-attention mechanism into the neural beamformer, effectively improves speech separation performance.	翻訳日:2023-06-29 15:36:21 公開日:2023-06-28
# 変分オートエンコーダの概念学習によるセルネットワークの解釈可能な異常検出 Interpretable Anomaly Detection in Cellular Networks by Learning Concepts in Variational Autoencoders ( http://arxiv.org/abs/2306.15938v1 ) ライセンス: Link先を確認	Amandeep Singh, Michael Weber, Markus Lange-Hegermann	(参考訳) 本稿では,セルラーネットワーク内の異常を解釈可能な方法で検出する課題に対処し,データセット内のキーパフォーマンス指標(KPI)毎に潜在空間の解釈可能な表現を学習する可変オートエンコーダ(VAE)を用いた新しいアプローチを提案する。これにより、再構成損失とzスコアに基づく異常の検出が可能になる。我々は,k-meansアルゴリズムを用いた追加情報センタロイド(c)による異常の解釈可能性を確保し,表現学習の促進を図る。我々は,特定のKPIの潜在次元のパターンを解析することにより,モデルの性能を評価し,解釈可能性と異常を実証する。提案するフレームワークは,セルネットワーク内の異常を検出するための高速かつ自律的なソリューションを提供し,ビッグデータ処理におけるディープラーニングベースのアルゴリズムの可能性を示す。 This paper addresses the challenges of detecting anomalies in cellular networks in an interpretable way and proposes a new approach using variational autoencoders (VAEs) that learn interpretable representations of the latent space for each Key Performance Indicator (KPI) in the dataset. This enables the detection of anomalies based on reconstruction loss and Z-scores. We ensure the interpretability of the anomalies via additional information centroids (c) using the K-means algorithm to enhance representation learning. We evaluate the performance of the model by analyzing patterns in the latent dimension for specific KPIs and thereby demonstrate the interpretability and anomalies. The proposed framework offers a faster and autonomous solution for detecting anomalies in cellular networks and showcases the potential of deep learning-based algorithms in handling big data.	翻訳日:2023-06-29 15:36:00 公開日:2023-06-28
# 熱電流の量子制御 Quantum Control of Heat Current ( http://arxiv.org/abs/2306.15937v1 ) ライセンス: Link先を確認	Gobinda Chakraborty, Subhadeep Chakraborty, Tanmoy Basu, and Manas Mukherjee	(参考訳) 2つの熱浴に結合した高調波発振器の量子トリマーにおける局所熱輸送について検討した。それらのカップリングは複雑な相によって増強され、同じ熱浴に接続された2つの発振器間の局所的な非定型熱電流の量子制御につながる。本研究により, この非定型熱電流はダークモードの上昇の結果であり, この電流の変調はシステム浴の相関のばらつきに起因することが明らかとなった。提案する量子システムは、熱電流を利用して量子熱・メモリデバイスに応用できるかもしれない。 We investigate the local thermal transport in a quantum trimer of harmonic oscillators connected to two thermal baths. The coupling between them are augmented by complex phases which leads to the quantum control of the local atypical heat current between two oscillators connected to the same heat bath. Our study reveals that this atypical heat current is a consequence of the lifting of the dark mode and the modulation of this current is due to variation in system bath correlations. The proposed quantum system may find application in quantum thermal and memory devices by leveraging the heat current.	翻訳日:2023-06-29 15:35:37 公開日:2023-06-28
# モデルベース適応のための奇抜なリプレイ Curious Replay for Model-based Adaptation ( http://arxiv.org/abs/2306.15934v1 ) ライセンス: Link先を確認	Isaac Kauvar, Chris Doyle, Linqi Zhou, Nick Haber	(参考訳) エージェントは環境の変化に応じて迅速に適応できなければならない。既存のモデルベース強化学習エージェントは、過去の経験を世界モデルのトレーニングに用いているため、これをうまく実行できないことが分かっています。ここでは、好奇心に基づく優先信号を用いて、モデルベースのエージェントにカスタマイズされた優先的な体験リプレイの形式であるCurious Replayを紹介する。 Curious Replayを使用するエージェントは、動物行動やCrafterベンチマークにインスパイアされた探索パラダイムのパフォーマンス向上を示す。 Curious Replay の DreamerV3 は Crafter の最先端のパフォーマンスを上回り、DreamerV3 の以前の高得点 14.5 よりも大幅に向上した 19.4 のスコアを達成し、Deepmind Control Suite でも同様のパフォーマンスを維持した。 Curious Replayのコードはhttps://github.com/AutonomousAgentsLab/curiousreplayで入手できる。 Agents must be able to adapt quickly as an environment changes. We find that existing model-based reinforcement learning agents are unable to do this well, in part because of how they use past experiences to train their world model. Here, we present Curious Replay -- a form of prioritized experience replay tailored to model-based agents through use of a curiosity-based priority signal. Agents using Curious Replay exhibit improved performance in an exploration paradigm inspired by animal behavior and on the Crafter benchmark. DreamerV3 with Curious Replay surpasses state-of-the-art performance on Crafter, achieving a mean score of 19.4 that substantially improves on the previous high score of 14.5 by DreamerV3 with uniform replay, while also maintaining similar performance on the Deepmind Control Suite. Code for Curious Replay is available at https://github.com/AutonomousAgentsLab/curiousreplay	翻訳日:2023-06-29 15:35:22 公開日:2023-06-28
# 再度生成できる - 検証と修正プロンプトを備えたデータからテキストへの生成 You Can Generate It Again: Data-to-text Generation with Verification and Correction Prompting ( http://arxiv.org/abs/2306.15933v1 ) ライセンス: Link先を確認	Xuan Ren, Lingqiao Liu	(参考訳) 既存のモデルの大幅な進歩にもかかわらず、データ対テキスト生成として知られる構造化データ入力からテキスト記述を生成することは、依然として困難な課題である。本稿では, 生成, 検証, 修正段階からなる多段階プロセスを導入することで, 従来のワンショット生成方法を超える新しい手法を提案する。我々のアプローチであるVCP(Verification and Correction Prompting)は、初期出力を生成するモデルから始まります。次に、生成されたテキストの異なる側面の正しさを検証する。検証ステップからの観察は、特定されたエラーを考慮して出力を再生するようにモデルに指示する特殊なエラー表示プロンプトに変換される。モデルの修正能力を高めるため,注意深く設計したトレーニング手順を開発した。この手順により、モデルがエラー表示プロンプトからのフィードバックを組み込むことができ、結果として出力生成が改善される。実験結果から,本手法は生成テキストの全体的な品質を維持しつつ,スロットエラー率を効果的に低減することを示す。 Despite significant advancements in existing models, generating text descriptions from structured data input, known as data-to-text generation, remains a challenging task. In this paper, we propose a novel approach that goes beyond traditional one-shot generation methods by introducing a multi-step process consisting of generation, verification, and correction stages. Our approach, VCP(Verification and Correction Prompting), begins with the model generating an initial output. We then proceed to verify the correctness of different aspects of the generated text. The observations from the verification step are converted into a specialized error-indication prompt, which instructs the model to regenerate the output while considering the identified errors. To enhance the model's correction ability, we have developed a carefully designed training procedure. This procedure enables the model to incorporate feedback from the error-indication prompt, resulting in improved output generation. Through experimental results, we demonstrate that our approach effectively reduces slot error rates while maintaining the overall quality of the generated text.	翻訳日:2023-06-29 15:34:47 公開日:2023-06-28
# NIPD:実世界の非IIDデータに基づくフェデレーション学習者検出ベンチマーク NIPD: A Federated Learning Person Detection Benchmark Based on Real-World Non-IID Data ( http://arxiv.org/abs/2306.15932v1 ) ライセンス: Link先を確認	Kangning Yin, Zhen Ding, Zhihua Dong, Dongsheng Chen, Jie Fu, Xinhui Ji, Guangqiang Yin and Zhiguo Wang	(参考訳) プライバシー保護型分散機械学習であるfederated learning(fl)は、無線通信ネットワークで急速に適用されている。 FLにより、IoT(Internet of Things)クライアントは、プライバシーの漏洩を防止しつつ、十分にトレーニングされたモデルを得ることができる。人検出は、FLと組み合わせてビデオデータをエッジで直接処理する場合、限られた計算能力を持つエッジデバイスに展開することができる。しかし、異なるカメラの異なるハードウェアおよび展開シナリオのため、カメラが収集したデータは非独立かつ同一に分布しており(非IID)、FLアグリゲーションから派生したグローバルモデルはより効果的ではない。一方、既存の研究では、現実世界のFLオブジェクト検出のための公開データセットが欠如しており、IoTカメラにおける非IID問題の研究には適していない。そこで我々は,5台のカメラから収集した非IID IoT 人物検出(NIPD)データセットをオープンソース化した。我々の知る限り、これがデバイスベースの非IID人物検出データセットとしては初めてのものである。このデータセットに基づいて,fl実験プラットフォームの構築方法を説明し,非iid者検出のためのベンチマークを提供する。 NIPDはFLの適用とスマートシティのセキュリティを促進することが期待されている。 Federated learning (FL), a privacy-preserving distributed machine learning, has been rapidly applied in wireless communication networks. FL enables Internet of Things (IoT) clients to obtain well-trained models while preventing privacy leakage. Person detection can be deployed on edge devices with limited computing power if combined with FL to process the video data directly at the edge. However, due to the different hardware and deployment scenarios of different cameras, the data collected by the camera present non-independent and identically distributed (non-IID), and the global model derived from FL aggregation is less effective. Meanwhile, existing research lacks public data set for real-world FL object detection, which is not conducive to studying the non-IID problem on IoT cameras. Therefore, we open source a non-IID IoT person detection (NIPD) data set, which is collected from five different cameras. To our knowledge, this is the first true device-based non-IID person detection data set. Based on this data set, we explain how to establish a FL experimental platform and provide a benchmark for non-IID person detection. NIPD is expected to promote the application of FL and the security of smart city.	翻訳日:2023-06-29 15:34:21 公開日:2023-06-28
# 学習可能なパッチワイズマスクによる対向移動性の向上 Boosting Adversarial Transferability with Learnable Patch-wise Masks ( http://arxiv.org/abs/2306.15931v1 ) ライセンス: Link先を確認	Xingxing Wei, Shiji Zhao	(参考訳) 敵対的な例は、異なるモデル間での転送性のため、セキュリティクリティカルなアプリケーションで広く注目を集めています。対向移動性を高めるために多くの方法が提案されているが、実際的な需要にはまだギャップがある。本稿では,モデル固有の判別領域が,ソースモデルへの過剰適合を招き,対象モデルへの伝達性を低下させる鍵要因であると主張する。そのため、対向摂動を計算する際に、パッチワイズマスクを用いてモデル固有領域をプルークする。これらの領域を正確にローカライズするために,マスクの自動最適化のための学習可能なアプローチを提案する。具体的には,対象モデルのシミュレーションを行い,シミュレーションモデルのフィードバックに応じてパッチワイズマスクを調整する。効率を向上させるために、差分進化法(DE)アルゴリズムを用いて特定の画像に対するパッチワイドマスクを探索する。反復攻撃中、学習したマスクを画像に適用して、モデル固有の領域に関するパッチをドロップアウトし、勾配をより汎用的にし、対向移動性を向上させる。提案手法は前処理法であり,既存の勾配に基づく手法と統合することで,転送攻撃成功率をさらに高めることができる。 ImageNetデータセットの大規模な実験により,本手法の有効性が示された。提案手法を既存のアンサンブル攻撃手法に組み込んで,最新技術を用いた攻撃性能を効果的に向上させる7つの先進防衛手法に対して平均93.01%の成功率を達成する。 Adversarial examples have raised widespread attention in security-critical applications because of their transferability across different models. Although many methods have been proposed to boost adversarial transferability, a gap still exists in the practical demand. In this paper, we argue that the model-specific discriminative regions are a key factor to cause the over-fitting to the source model, and thus reduce the transferability to the target model. For that, a patch-wise mask is utilized to prune the model-specific regions when calculating adversarial perturbations. To accurately localize these regions, we present a learnable approach to optimize the mask automatically. Specifically, we simulate the target models in our framework, and adjust the patch-wise mask according to the feedback of simulated models. To improve the efficiency, Differential Evolutionary (DE) algorithm is utilized to search for patch-wise masks for a specific image. During iterative attacks, the learned masks are applied to the image to drop out the patches related to model-specific regions, thus making the gradients more generic and improving the adversarial transferability. The proposed approach is a pre-processing method and can be integrated with existing gradient-based methods to further boost the transfer attack success rate. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of our method. We incorporate the proposed approach with existing methods in the ensemble attacks and achieve an average success rate of 93.01% against seven advanced defense methods, which can effectively enhance the state-of-the-art transfer-based attack performance.	翻訳日:2023-06-29 15:34:02 公開日:2023-06-28
# バイモーダルトランスによる血行動態応答関数の再構成 Reconstructing the Hemodynamic Response Function via a Bimodal Transformer ( http://arxiv.org/abs/2306.15971v1 ) ライセンス: Link先を確認	Yoni Choukroun, Lior Golgher, Pablo Blinder, Lior Wolf	(参考訳) 血流と神経活動の関係は広く認識されており、fmri研究において血流は神経活動のサーロゲートとしてよく用いられる。微小なレベルでは、神経活動は近くの血管の血流に影響を与えることが示されている。本研究は、この問題を明示的なニューロン集団レベルで直接扱う最初の予測モデルを提案する。覚醒マウスの生体内記録を用いて, 経時的バイモーダルトランスフォーマー構造を用いて, 経時的血流量と持続する自発的ニューロン活動の両方に基づいて, 血流を推定する。本研究は,神経活動の取り込みにより,血流量の予測能力が著しく向上することが示唆された。モデル行動の解析を通じて,神経活動に対する血行力学的反応の概ね未熟な性質に関する仮説を提案する。 The relationship between blood flow and neuronal activity is widely recognized, with blood flow frequently serving as a surrogate for neuronal activity in fMRI studies. At the microscopic level, neuronal activity has been shown to influence blood flow in nearby blood vessels. This study introduces the first predictive model that addresses this issue directly at the explicit neuronal population level. Using in vivo recordings in awake mice, we employ a novel spatiotemporal bimodal transformer architecture to infer current blood flow based on both historical blood flow and ongoing spontaneous neuronal activity. Our findings indicate that incorporating neuronal activity significantly enhances the model's ability to predict blood flow values. Through analysis of the model's behavior, we propose hypotheses regarding the largely unexplored nature of the hemodynamic response to neuronal activity.	翻訳日:2023-06-29 15:27:51 公開日:2023-06-28
# 雑音量子処理実験における有効量子体積・忠実度・計算コスト Effective quantum volume, fidelity and computational cost of noisy quantum processing experiments ( http://arxiv.org/abs/2306.15970v1 ) ライセンス: Link先を確認	K. Kechedzhi, S. V. Isakov, S. Mandr\`a, B. Villalonga, X. Mi, S. Boixo, V. Smelyanskiy	(参考訳) 今日の実験的なノイズ量子プロセッサは、無作為回路サンプリングの計算ベンチマークタスクのために、最先端のスーパーコンピュータ上のすべての既知のアルゴリズムと競合することができる[1-5]。さらに、局所観測可能な量子情報スクランブル[6]の回路ベースの量子シミュレーションは、例えば、正確なシュロディンガー進化やマトリックス生成状態(MPS)など、標準のフルウェーブ関数シミュレーションアルゴリズムをすでに上回っている。しかし、この実験はまだ観測可能値を計算するためにテンソルネットワークの収縮を越えていない。これらの研究に基づき、本研究は、特定の観測可能な信号対雑音比とそれに対応する計算コストとのトレードオフを説明するために、基礎となる有効回路体積を利用する統一的なフレームワークを提供する。このフレームワークを、ランダム回路サンプリング[5]、量子情報スクランブル[6]、フロッケ回路ユニタリ[7]の最近の量子プロセッサ実験に適用する。これにより、Refの結果を再現できます。 1つのGPUを使って、データポイントあたり1秒未満で [7]。 Today's experimental noisy quantum processors can compete with and surpass all known algorithms on state-of-the-art supercomputers for the computational benchmark task of Random Circuit Sampling [1-5]. Additionally, a circuit-based quantum simulation of quantum information scrambling [6], which measures a local observable, has already outperformed standard full wave function simulation algorithms, e.g., exact Schrodinger evolution and Matrix Product States (MPS). However, this experiment has not yet surpassed tensor network contraction for computing the value of the observable. Based on those studies, we provide a unified framework that utilizes the underlying effective circuit volume to explain the tradeoff between the experimentally achievable signal-to-noise ratio for a specific observable, and the corresponding computational cost. We apply this framework to recent quantum processor experiments of Random Circuit Sampling [5], quantum information scrambling [6], and a Floquet circuit unitary [7]. This allows us to reproduce the results of Ref. [7] in less than one second per data point using one GPU.	翻訳日:2023-06-29 15:27:38 公開日:2023-06-28
# 分離可能な物理インフォームニューラルネットワーク Separable Physics-Informed Neural Networks ( http://arxiv.org/abs/2306.15969v1 ) ライセンス: Link先を確認	Junwoo Cho, Seungtae Nam, Hyunmo Yang, Seok-Bae Yun, Youngjoon Hong, Eunbyung Park	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、様々なPDEに対して有望なデータ駆動型PDE解法として最近登場した。しかし、多次元pdesや近似高複素解関数を解くための訓練ピンの基本的な制限がある。これらの困難なpdesに必要なトレーニングポイント(ロケーションポイント)の数は大幅に増加するが、高価な計算コストとメモリのオーバーヘッドのため、かなり制限されている。この問題を克服するため,我々はpinnのネットワークアーキテクチャとトレーニングアルゴリズムを提案する。提案手法である分離可能なPINN(SPINN)は,従来のPINNのポイントワイド処理とは異なり,多次元PDEにおけるネットワーク伝搬数を著しく削減する。また,PDE残差計算の計算コストを削減し,単一のコモディティGPU上で多数のコロケーションポイント(>10^7)を実現するために,前方モード自動微分法を提案する。実験の結果,多次元PDEにおける計算コスト(壁面時間62倍,FLOPでは1,394倍)を大幅に削減し,精度が向上した。さらに,SPINN は,2+1-d Navier-Stokes 方程式を最良性能の先行手法 (1GPUでは9分対10時間) よりもはるかに高速に解き,精度を維持できることを示した。最後に、SPINNは高非線形多次元PDE(3+1-d Navier-Stokes方程式)の解を正確に得ることを示す。 Physics-informed neural networks (PINNs) have recently emerged as promising data-driven PDE solvers showing encouraging results on various PDEs. However, there is a fundamental limitation of training PINNs to solve multi-dimensional PDEs and approximate highly complex solution functions. The number of training points (collocation points) required on these challenging PDEs grows substantially, but it is severely limited due to the expensive computational costs and heavy memory overhead. To overcome this issue, we propose a network architecture and training algorithm for PINNs. The proposed method, separable PINN (SPINN), operates on a per-axis basis to significantly reduce the number of network propagations in multi-dimensional PDEs unlike point-wise processing in conventional PINNs. We also propose using forward-mode automatic differentiation to reduce the computational cost of computing PDE residuals, enabling a large number of collocation points (>10^7) on a single commodity GPU. The experimental results show drastically reduced computational costs (62x in wall-clock time, 1,394x in FLOPs given the same number of collocation points) in multi-dimensional PDEs while achieving better accuracy. Furthermore, we present that SPINN can solve a chaotic (2+1)-d Navier-Stokes equation significantly faster than the best-performing prior method (9 minutes vs 10 hours in a single GPU), maintaining accuracy. Finally, we showcase that SPINN can accurately obtain the solution of a highly nonlinear and multi-dimensional PDE, a (3+1)-d Navier-Stokes equation.	翻訳日:2023-06-29 15:27:21 公開日:2023-06-28
# 階層型強化学習による都市自律運転の行動と軌道計画 Action and Trajectory Planning for Urban Autonomous Driving with Hierarchical Reinforcement Learning ( http://arxiv.org/abs/2306.15968v1 ) ライセンス: Link先を確認	Xinyang Lu, Flint Xiaofeng Fan and Tianying Wang	(参考訳) 強化学習(rl)は、単純な運転シナリオにおける自動運転車(avs)の計画と意思決定において有望な進歩を遂げた。しかし、AVのための既存のRLアルゴリズムは、複雑な都市シナリオにおいて重要な運転スキルを学ばない。第一に、都市運転シナリオは、従来のRLアルゴリズムが不可能な複数の運転タスクを扱うためにAVを必要とする。第2に、都市シナリオにおける他の車両の存在は動的に変化する環境をもたらし、rlアルゴリズムはavの動作と軌道を計画する。本研究では,階層的強化学習(athrl)法を用いて,ライダーとバードアイの視覚の知覚を用いて,階層的モデルにおけるエージェントの挙動をモデル化するアクションおよび軌道プランナーを提案する。提案手法は,エージェントの将来の軌跡を決定することを学習し,階層DDPGアルゴリズムに基づいて目標経路を連続的に計算する。 athrlモデルによって計画されたウェイポイントは低レベルコントローラに送られ、車両の操縦に必要な操舵およびスロットルコマンドを生成する。 athrlの有効性を,carlaシミュレータにおける複数タスクからなる複雑な都市走行シナリオにおける広範囲な実験により実証的に検証した。実験結果から, 最先端RL法と比較して, 大幅な性能向上が示唆された。 Reinforcement Learning (RL) has made promising progress in planning and decision-making for Autonomous Vehicles (AVs) in simple driving scenarios. However, existing RL algorithms for AVs fail to learn critical driving skills in complex urban scenarios. First, urban driving scenarios require AVs to handle multiple driving tasks of which conventional RL algorithms are incapable. Second, the presence of other vehicles in urban scenarios results in a dynamically changing environment, which challenges RL algorithms to plan the action and trajectory of the AV. In this work, we propose an action and trajectory planner using Hierarchical Reinforcement Learning (atHRL) method, which models the agent behavior in a hierarchical model by using the perception of the lidar and birdeye view. The proposed atHRL method learns to make decisions about the agent's future trajectory and computes target waypoints under continuous settings based on a hierarchical DDPG algorithm. The waypoints planned by the atHRL model are then sent to a low-level controller to generate the steering and throttle commands required for the vehicle maneuver. We empirically verify the efficacy of atHRL through extensive experiments in complex urban driving scenarios that compose multiple tasks with the presence of other vehicles in the CARLA simulator. The experimental results suggest a significant performance improvement compared to the state-of-the-art RL methods.	翻訳日:2023-06-29 15:26:56 公開日:2023-06-28
# 高速融合移動によるグラフ補間 Graph Interpolation via Fast Fused-Gromovization ( http://arxiv.org/abs/2306.15963v1 ) ライセンス: Link先を確認	Xinyu Ma, Xu Chu, Yasha Wang, Yang Lin, Junfeng Zhao, Liantao Ma, Wenwu Zhu	(参考訳) グラフデータの増大はグラフレベルの分類のためのグラフニューラルネットワーク(GNN)の一般化性と堅牢性を高めるのに有効であることが証明されている。しかし、既存の手法は主にグラフ信号空間とグラフ構造空間を独立に拡張することに集中し、それらの相互作用を見越す。本稿では,グラフ構造と信号間の相互作用を考慮に入れたグラフ間のノードマッチングのための最適戦略を見つけることを目的とした,最適輸送問題としてこの問題を定式化する。この問題に対処するために、FGWMixupと呼ばれる新しいグラフ混合アルゴリズムを提案し、FGW(Fused Gromov-Wasserstein)計量空間を利用して、ソースグラフの「中間点」を同定する。この手法のスケーラビリティを向上させるために, 収束率を$\mathcal{o}(t^{-1})$から$\mathcal{o}(t^{-2})$にすることで, fgwmixupを高速化する緩和されたfgwソルバを導入する。古典的(MPNN)と先進的(Graphormers)のGNNバックボーンを併用した5つのデータセットで行われた大規模な実験は、GNNの一般化性と堅牢性を改善する上でFGWMixupの有効性を示した。 Graph data augmentation has proven to be effective in enhancing the generalizability and robustness of graph neural networks (GNNs) for graph-level classifications. However, existing methods mainly focus on augmenting the graph signal space and the graph structure space independently, overlooking their joint interaction. This paper addresses this limitation by formulating the problem as an optimal transport problem that aims to find an optimal strategy for matching nodes between graphs considering the interactions between graph structures and signals. To tackle this problem, we propose a novel graph mixup algorithm dubbed FGWMixup, which leverages the Fused Gromov-Wasserstein (FGW) metric space to identify a "midpoint" of the source graphs. To improve the scalability of our approach, we introduce a relaxed FGW solver that accelerates FGWMixup by enhancing the convergence rate from $\mathcal{O}(t^{-1})$ to $\mathcal{O}(t^{-2})$. Extensive experiments conducted on five datasets, utilizing both classic (MPNNs) and advanced (Graphormers) GNN backbones, demonstrate the effectiveness of FGWMixup in improving the generalizability and robustness of GNNs.	翻訳日:2023-06-29 15:26:33 公開日:2023-06-28
# 六方晶窒化ホウ素中のホウ素原子価欠陥の基底準位によるロバスト核スピン分極 Robust Nuclear Spin Polarization via Ground-State Level Anti-Crossing of Boron Vacancy Defects in Hexagonal Boron Nitride ( http://arxiv.org/abs/2306.15960v1 ) ライセンス: Link先を確認	Shihao Ru, Zhengzhi Jiang, Haidong Liang, Jonathan Kenny, Hongbing Cai, Xiaodan Lyu, Robert Cernansky, Feifei Zhou, Yuzhe Yang, Kenji Watanabe, Takashi Taniguch, Fuli Li, Koh Teck Seng, Xiaogang Liu, Fedor Jelezko, Andrew A. Bettiol, Weibo Gao	(参考訳) 核スピン偏極は、量子情報処理と量子センシングにおいて重要な役割を果たす。本研究では, 六方晶窒化ホウ素 (h-BN) 中のホウ素空孔欠陥 (\mathrm{V_B^-}$) を基底準位アンチクロス (GSLAC) を用いて, 安定かつ効率的な核スピン分極法を示す。 GSLACによる核分極は励起状態の反交差よりもかなり低いレーザーパワーで達成でき、このプロセスは実験的に実現可能である。さらに、h-BNで$\mathrm{V_B^-}$に対して、核スピンの直接光学的読み出しを実証した。以上の結果から, GSLACはh-BNの欠陥を正確に制御し, 操作するための有望な手法であることが示唆された。 Nuclear spin polarization plays a crucial role in quantum information processing and quantum sensing. In this work, we demonstrate a robust and efficient method for nuclear spin polarization with boron vacancy ($\mathrm{V_B^-}$) defects in hexagonal boron nitride (h-BN) using ground-state level anti-crossing (GSLAC). We show that GSLAC-assisted nuclear polarization can be achieved with significantly lower laser power than excited-state level anti-crossing, making the process experimentally more viable. Furthermore, we have demonstrated direct optical readout of nuclear spins for $\mathrm{V_B^-}$ in h-BN. Our findings suggest that GSLAC is a promising technique for the precise control and manipulation of nuclear spins in $\mathrm{V_B^-}$ defects in h-BN.	翻訳日:2023-06-29 15:26:08 公開日:2023-06-28
# ギャップのブリッジ: クラス不均衡下での一般化のための神経崩壊によるプロンプトチューニング Bridging the Gap: Neural Collapse Inspired Prompt Tuning for Generalization under Class Imbalance ( http://arxiv.org/abs/2306.15955v1 ) ライセンス: Link先を確認	Didi Zhu, Yinchuan Li, Min Zhang, Junkun Yuan, Jiashuo Liu, Kun Kuang, Chao Wu	(参考訳) 大規模視覚言語モデル (V-L) は, 高速チューニングによる下流タスクの顕著な一般化機能を示した。しかし、実際のシナリオでは一般的な問題であるクラス不均衡の存在下では、パフォーマンスが著しく低下する。本稿では,クラス不均衡がV-Lモデルの一般化性能に及ぼす影響とニューラル崩壊現象をこれらのモデルに拡張し,クラス不均衡が一般化能力に与える影響の幾何学的理由を明らかにする。この問題を解決するために,ニューラル・コラプスに基づくプロンプト・チューニング(NPT)を提案し,テキストと画像の特徴が同じ単純なETF構造を満たすようにプロンプトを最適化する。 NPTは2つの正規化項、幾何脱バイアスとマルチモーダル同型を導入し、一般化能力を保ちながらクラス不均衡条件下でのV-Lモデルのロバスト性を高める。総合実験により,nptは11種類の画像認識データセットで既存のプロンプト学習技術を上回っており,新しいクラスでは絶対平均値2.63\%,不均衡データでは調和平均値2.47\%を達成した。 Large-scale vision-language (V-L) models have demonstrated remarkable generalization capabilities for downstream tasks through prompt tuning. However, their performance suffers significantly in the presence of class imbalance, a common issue in real-world scenarios. In this paper, we investigate the effects of class imbalance on the generalization performance of V-L models and extend Neural Collapse phenomenon to these models, revealing the geometric reasons behind the impact of class imbalance on their generalization ability. To address this problem, we propose Neural Collapse based Prompt Tuning (NPT), a novel method that optimizes prompts so that both text and image features satisfy the same simplex ETF structure. NPT incorporates two regularization terms, geometric de-biasing and multi-modal isomorphism, to enhance the robustness of V-L models under class imbalance conditions while maintaining their generalization capabilities. Our comprehensive experiments show that NPT outperforms existing prompt learning techniques across 11 diverse image recognition datasets, achieving an absolute average gain of 2.63\% for novel classes and 2.47\% for harmonic mean when facing imbalanced data.	翻訳日:2023-06-29 15:25:54 公開日:2023-06-28
# 球面センサのレンズレスイメージングのための角感光レンズ Angle Sensitive Pixels for Lensless Imaging on Spherical Sensors ( http://arxiv.org/abs/2306.15953v1 ) ライセンス: Link先を確認	Yi Hua, Yongyi Zhao, Aswin C. Sankaranarayanan	(参考訳) 球面センサを用いた撮像のためのレンズレスアーキテクチャであるOrbCamを提案する。レンズレス撮像装置の技術は、主に平面センサの使用に重点を置いているが、そのような設計では、変調素子(例えば振幅マスクや位相マスク)を使用して可逆撮像システムを構築することが重要である。対照的に,曲面上の画素配向の多様性は,シーンとセンサ間のマッピングの条件づけを改善するのに十分であることを示す。したがって、球面センサを撮像する場合、すべての画素は同じ角度応答関数を持つことができ、レンズレス撮像器は互いに同一であり、向きだけが異なる画素で構成されている。本研究では,球面センサにおける画素の角応答設計のための計算ツールについて述べる。シミュレーションと実験室のプロトタイプの両方で設計を検証する。この設計の意義は、レンズレスイメージングを曲面やフレキシブルな面に容易に適用でき、新しいアプリケーション領域を開拓できるということである。 We propose OrbCam, a lensless architecture for imaging with spherical sensors. Prior work in lensless imager techniques have focused largely on using planar sensors; for such designs, it is important to use a modulation element, e.g. amplitude or phase masks, to construct a invertible imaging system. In contrast, we show that the diversity of pixel orientations on a curved surface is sufficient to improve the conditioning of the mapping between the scene and the sensor. Hence, when imaging on a spherical sensor, all pixels can have the same angular response function such that the lensless imager is comprised of pixels that are identical to each other and differ only in their orientations. We provide the computational tools for the design of the angular response of the pixels in a spherical sensor that leads to well-conditioned and noise-robust measurements. We validate our design in both simulation and a lab prototype. The implications of our design is that the lensless imaging can be enabled easily for curved and flexible surfaces thereby opening up a new set of application domains.	翻訳日:2023-06-29 15:25:33 公開日:2023-06-28
# 完全正の写像に対する極小完備定理とほぼすべての同値性 A minimal completion theorem and almost everywhere equivalence for Completely Positive maps ( http://arxiv.org/abs/2306.15952v1 ) ライセンス: Link先を確認	B. V. Rajarama Bhat, Arghya Chongdar	(参考訳) C-代数上の線型写像を完全正の写像に完備化する問題を分析する。そのような完備化が実現可能であれば、一意的な極小完備が存在することが示される。この定理は、いくつかの非常に一般的な条件下では、準純写像とほぼ至る所で完全に正の写像が実際にその写像と等しいことを示すために用いられる。 A problem of completing a linear map on C-algebras to a completely positive map is analyzed. It is shown that whenever such a completion is feasible there exists a unique minimal completion. This theorem is used to show that under some very general conditions a completely positive map almost everywhere equivalent to a quasi-pure map is actually equal to that map.	翻訳日:2023-06-29 15:25:16 公開日:2023-06-28
# 零点スキップによる畳み込み層の計算複雑性の低減 Reduce Computational Complexity for Convolutional Layers by Skipping Zeros ( http://arxiv.org/abs/2306.15951v1 ) ライセンス: Link先を確認	Zhiyi Zhang, Pengfei Zhang, Zhuopin Xu, Qi Wang	(参考訳) ディープニューラルネットワークはアクセラレーションのために並列プロセッサに依存している。オペレータを設計するには、複雑さを減らすための優れたアルゴリズムだけでなく、ハードウェアの十分な利用が必要である。畳み込み層は主に3種類の演算子を含む:前方伝播における畳み込み、逆伝播における畳み込み、拡張畳み込み。これらの演算子を実行するとき、0は常にテンソルに追加され、冗長な計算を引き起こす。本稿では, c-k-sアルゴリズム(convv2, ks-deconv, sk-dilated)について述べる。フィルタを分割してパッド付き0を除外し, 疎テンソルを密度テンソルに変換する。通常の畳み込みとは対照的に、畳み込みは複雑さのため加速しにくい。本稿では,C-K-Sの高性能GPU実装について述べるとともに,PyTorchとの比較による検証を行った。実験によると、C-K-SはPyTorchよりも利点があり、特に小さな特徴写像のデコンボリューションにおいて有利である。 C-K-Sのさらなる強化は、特定のGPUアーキテクチャで完全な最適化を行うことによって達成できる。 Deep neural networks rely on parallel processors for acceleration. To design operators for them, it requires not only good algorithm to reduce complexity, but also sufficient utilization of hardwares. Convolutional layers mainly contain 3 kinds of operators: convolution in forward propagation, deconvolution and dilated-convolution in backward propagation. When executing these operators, 0s are always added to tensors, causing redundant calculations. This paper gives C-K-S algorithm (ConvV2, KS-deconv, Sk-dilated), which skips these 0s in two ways: trim the filters to exclude padded 0s; transform sparse tensors to dense tensors, to avoid inserted 0s in deconvolution and dilated-convolution. In contrast to regular convolution, deconvolution is hard to accelerate due to its complicacy. This paper provides high-performance GPU implementations of C-K-S, and verifies their effectiveness with comparison to PyTorch. According to the experiments, C-K-S has advantages over PyTorch in certain cases, especially in deconvolution on small feature-maps. Further enhancement of C-K-S can be done by making full optimizations oriented at specific GPU architectures.	翻訳日:2023-06-29 15:25:08 公開日:2023-06-28
# 大規模言語モデルの時代におけるクエリ理解 Query Understanding in the Age of Large Language Models ( http://arxiv.org/abs/2306.16004v1 ) ライセンス: Link先を確認	Avishek Anand, Venktesh V, Abhijit Anand, Vinay Setty	(参考訳) 大規模言語モデル(llm)の普及と普及に伴い,自然言語を用いた検索・会話・制御や情報検索インターフェースが急速に普及している。本稿では,LLMを用いた対話型クエリ書き換えのための汎用フレームワークについて述べる。提案手法は,LLMを用いた高性能検索システムを構築しながら,改良的で透明な意図理解のための新たな機会を開拓することを目的としている。我々のフレームワークの重要な側面は、最終検索フェーズの前にさらに洗練され、制御され、編集される自然言語において、リライターが検索エンジンによるマシンインテントを十分に指定できることである。自然言語における機械の意図を表現、相互作用、推論する能力は、透明性、ランキングパフォーマンス、および意図を理解するために教師付きシグナルが収集される伝統的な方法からの離脱に重大な影響を与える。この対話型クエリ理解フレームワークに対するオープンな質問とともに、最初の実験を背景としたコンセプトを詳述する。 Querying, conversing, and controlling search and information-seeking interfaces using natural language are fast becoming ubiquitous with the rise and adoption of large-language models (LLM). In this position paper, we describe a generic framework for interactive query-rewriting using LLMs. Our proposal aims to unfold new opportunities for improved and transparent intent understanding while building high-performance retrieval systems using LLMs. A key aspect of our framework is the ability of the rewriter to fully specify the machine intent by the search engine in natural language that can be further refined, controlled, and edited before the final retrieval phase. The ability to present, interact, and reason over the underlying machine intent in natural language has profound implications on transparency, ranking performance, and a departure from the traditional way in which supervised signals were collected for understanding intents. We detail the concept, backed by initial experiments, along with open questions for this interactive query understanding framework.	翻訳日:2023-06-29 15:16:59 公開日:2023-06-28
# 音声駆動音声合成のテキスト化 Reprogramming Audio-driven Talking Face Synthesis into Text-driven ( http://arxiv.org/abs/2306.16003v1 ) ライセンス: Link先を確認	Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro	(参考訳) 本稿では,テキスト入力で操作できるように,事前学習した音声駆動対話顔合成モデルを再プログラムする手法を提案する。音声駆動発話顔合成モデルは、音声音声を入力として、所望の音声内容の発話アバターを生成するため、予め音声録音を行う必要がある。しかし、再生されるすべてのビデオの音声を記録するのは面倒だ。この問題を軽減するために,事前学習された音声駆動モデルの学習音声潜在空間に入力テキストを埋め込む新しい手法を提案する。そこで我々は,テキスト入力から音声潜在機能へのマッピングを学ぶために,tem(text-to-audio embedded module)を設計した。さらに,音声特徴に含まれる話者特性をモデル化するために,単一顔画像から得られるTAEMに視覚的話者埋め込みを注入することを提案する。訓練後、テキストか音声のどちらかの音声で会話の対面ビデオを合成できる。 In this paper, we propose a method to reprogram pre-trained audio-driven talking face synthesis models to be able to operate with text inputs. As the audio-driven talking face synthesis model takes speech audio as inputs, in order to generate a talking avatar with the desired speech content, speech recording needs to be performed in advance. However, this is burdensome to record audio for every video to be generated. In order to alleviate this problem, we propose a novel method that embeds input text into the learned audio latent space of the pre-trained audio-driven model. To this end, we design a Text-to-Audio Embedding Module (TAEM) which is guided to learn to map a given text input to the audio latent features. Moreover, to model the speaker characteristics lying in the audio features, we propose to inject visual speaker embedding into the TAEM, which is obtained from a single face image. After training, we can synthesize talking face videos with either text or speech audio.	翻訳日:2023-06-29 15:16:45 公開日:2023-06-28
# ディープラーニングによる公衆衛生研究のためのソーシャルメディア情報検索の合理化 Streamlining Social Media Information Retrieval for Public Health Research with Deep Learning ( http://arxiv.org/abs/2306.16001v1 ) ライセンス: Link先を確認	Yining Hua, Shixu Lin, Minghui Li, Yujie Zhang, Peilin Zhou, Ying-Chih Lo, Li Zhou, Jie Yang	(参考訳) 流行監視におけるソーシャルメディアの利用はよく確立されている。それでも、事前に定義されたレキシコンを用いて関連するコーパスを検索する場合、しばしばバイアスが発生する。本研究は,医学用語体系とUMLS概念の広範な辞書のキュレーションを目的としたフレームワークを提案する。このフレームワークは、ソーシャルメディアコンテンツから医療エンティティを識別するBERTベースの名前付きエンティティ認識(NER)モデルと、抽出されたエンティティを標準化するディープラーニング駆動正規化モジュールと、最も確率の高いUMLS概念を標準化されたエンティティに割り当てる半教師付きクラスタリングモジュールの3つのモジュールから構成される。この枠組みを2020年2月1日から2022年4月30日までのCOVID-19関連ツイートに適用し,876 UMLS概念にマッピングされた9,249の標準化されたエンティティと38,175の言語表現からなる症状辞書(https://github.com/ningkko/UMLS_colloquialism/)を生成した。この枠組みは,ソーシャルメディアを用いた公衆衛生研究におけるキーワードマッチング情報検索の制約に対処できる可能性を示す。 The utilization of social media in epidemic surveillance has been well established. Nonetheless, bias is often introduced when pre-defined lexicons are used to retrieve relevant corpus. This study introduces a framework aimed at curating extensive dictionaries of medical colloquialisms and Unified Medical Language System (UMLS) concepts. The framework comprises three modules: a BERT-based Named Entity Recognition (NER) model that identifies medical entities from social media content, a deep-learning powered normalization module that standardizes the extracted entities, and a semi-supervised clustering module that assigns the most probable UMLS concept to each standardized entity. We applied this framework to COVID-19-related tweets from February 1, 2020, to April 30, 2022, generating a symptom dictionary (available at https://github.com/ningkko/UMLS_colloquialism/) composed of 9,249 standardized entities mapped to 876 UMLS concepts and 38,175 colloquial expressions. This framework demonstrates encouraging potential in addressing the constraints of keyword matching information retrieval in social media-based public health research.	翻訳日:2023-06-29 15:16:29 公開日:2023-06-28
# 量子力学におけるエネルギー密度について On the energy density in quantum mechanics ( http://arxiv.org/abs/2306.15999v1 ) ライセンス: Link先を確認	Francisco Torres Arvizu, Adrian Ortega, and Hern\'an Larralde	(参考訳) 量子力学におけるエネルギー密度の定義はいくつかある。これらの降伏式は局所的に異なるが、全て連続性方程式を満たし、考慮中の系の期待エネルギーの値に積分する。したがって、ある定義を他の定義から選択する物理的根拠が存在するかどうかという問題は自然に生じる。本研究では, 量子粒子を含む井戸の大きさを変化させることで, システムの探究方法を提案する。壁を移動させることによる平均的な作業はエネルギー密度の定義の1つと密接に関連していることを示す。具体的には、壁面で評価された適切なエネルギー密度は、粒子が局所的に作用する力に対応し、そこで作業を行う。この同定は2次元系と3次元系に拡張される。 There are several definitions of energy density in quantum mechanics. These yield expressions that differ locally, but all satisfy a continuity equation and integrate to the value of the expected energy of the system under consideration. Thus, the question of whether there are physical grounds to choose one definition over another arises naturally. In this work, we propose a way to probe a system by varying the size of a well containing a quantum particle. We show that the mean work done by moving the wall is closely related to one of the definitions for energy density. Specifically, the appropriate energy density, evaluated at the wall corresponds to the force exerted by the particle locally, against which the work is done. We show that this identification extends to two and three dimensional systems.	翻訳日:2023-06-29 15:16:07 公開日:2023-06-28
# ラベル雑音補正がMLフェアネスに及ぼす影響の系統解析 Systematic analysis of the impact of label noise correction on ML Fairness ( http://arxiv.org/abs/2306.15994v1 ) ライセンス: Link先を確認	I. Oliveira e Silva, C. Soares, I. Sousa, R. Ghani	(参考訳) 任意、矛盾、あるいは欠陥のある意思決定は深刻な懸念を生じさせ、不公平なモデルを防ぐことは、機械学習においてますます重要な課題である。データはしばしば過去の差別行動を反映し、そのようなデータに基づいてトレーニングされたモデルは、性別、人種、年齢などのセンシティブな属性に偏りを反映する可能性がある。公正なモデルを開発するための1つのアプローチは、トレーニングデータを前処理して、例えばバイアス付きラベルを修正することで、関連する情報を保持しながら、基礎となるバイアスを取り除くことである。複数のラベルのノイズ補正手法が利用可能であるが、識別におけるその行動に関する情報は非常に限られている。本研究では,ラベルノイズ補正手法の有効性を定量的に評価し,偏りのあるデータセットで学習したモデルの公平性を保証する実験手法を開発した。提案手法はラベルノイズ量を操作することで,公平性ベンチマークだけでなく,標準mlデータセットでも使用できる。提案手法を適用し,標準OpenMLデータセットの公平度測定値に基づいて6つのラベルノイズ補正手法を解析する。その結果,ハイブリッドラベル雑音補正法は,予測性能と公平性との最良のトレードオフを実現することが示唆された。しかしながら、クラスタリングに基づく補正は、予測パフォーマンスを低下させるコストで、最も差別を低減できる。 Arbitrary, inconsistent, or faulty decision-making raises serious concerns, and preventing unfair models is an increasingly important challenge in Machine Learning. Data often reflect past discriminatory behavior, and models trained on such data may reflect bias on sensitive attributes, such as gender, race, or age. One approach to developing fair models is to preprocess the training data to remove the underlying biases while preserving the relevant information, for example, by correcting biased labels. While multiple label noise correction methods are available, the information about their behavior in identifying discrimination is very limited. In this work, we develop an empirical methodology to systematically evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets. Our methodology involves manipulating the amount of label noise and can be used with fairness benchmarks but also with standard ML datasets. We apply the methodology to analyze six label noise correction methods according to several fairness metrics on standard OpenML datasets. Our results suggest that the Hybrid Label Noise Correction method achieves the best trade-off between predictive performance and fairness. Clustering-Based Correction can reduce discrimination the most, however, at the cost of lower predictive performance.	翻訳日:2023-06-29 15:15:56 公開日:2023-06-28
# MyDigitalFootprint: エッジにおける分散コンピューティングアプリケーションのための広範なコンテキストデータセット MyDigitalFootprint: an extensive context dataset for pervasive computing applications at the edge ( http://arxiv.org/abs/2306.15990v1 ) ライセンス: Link先を確認	Mattia Giovanni Campana, Franca Delmastro	(参考訳) 接続されたスマートデバイスの広範な普及は、インターネットの最先端における急速な拡大と進化に寄与した。パーソナルモバイルデバイスは周囲の他のスマートオブジェクトと対話し、急速に変化するユーザーコンテキストに基づいて行動に適応する。このデータをローカルに処理するモバイルデバイスの能力は、迅速な適応には不可欠である。これは、ユーザアプリケーションやコンテキスト処理用のミドルウェアプラットフォームに統合された単一の開発プロセスによって実現できます。しかし、モバイル環境におけるユーザコンテキストの複雑さを考慮した公開データセットの欠如は、研究の進展を妨げる。 mydigitalfootprintは,スマートフォンのセンサデータ,物理的近接情報,オンラインソーシャルネットワークインタラクションからなる大規模データセットである。このデータセットはマルチモーダルコンテキスト認識と社会的関係モデリングをサポートする。自然環境における31人のボランティアユーザーによる2ヶ月の計測で、制限のない行動を可能にする。既存の公開データセットは、特定のアプリケーションに対する限られたコンテキストデータに重点を置いています。データセットの有効性を示すために,様々な機械学習タスクを活用した3つのコンテキスト認識アプリケーションを提案する。 (i)物理的近接データに基づくソーシャルリンク予測アルゴリズム (ii)スマートフォン内蔵センサデータを用いた日常生活行動認識 (iii)広義の文脈認識推薦システム。我々のデータセットは、その異質な情報と共に、モバイルおよびエッジコンピューティングにおける新しい研究を検証する貴重なリソースとして役立ちます。 The widespread diffusion of connected smart devices has contributed to the rapid expansion and evolution of the Internet at its edge. Personal mobile devices interact with other smart objects in their surroundings, adapting behavior based on rapidly changing user context. The ability of mobile devices to process this data locally is crucial for quick adaptation. This can be achieved through a single elaboration process integrated into user applications or a middleware platform for context processing. However, the lack of public datasets considering user context complexity in the mobile environment hinders research progress. We introduce MyDigitalFootprint, a large-scale dataset comprising smartphone sensor data, physical proximity information, and Online Social Networks interactions. This dataset supports multimodal context recognition and social relationship modeling. It spans two months of measurements from 31 volunteer users in their natural environment, allowing for unrestricted behavior. Existing public datasets focus on limited context data for specific applications, while ours offers comprehensive information on the user context in the mobile environment. To demonstrate the dataset's effectiveness, we present three context-aware applications utilizing various machine learning tasks: (i) a social link prediction algorithm based on physical proximity data, (ii) daily-life activity recognition using smartphone-embedded sensors data, and (iii) a pervasive context-aware recommender system. Our dataset, with its heterogeneity of information, serves as a valuable resource to validate new research in mobile and edge computing.	翻訳日:2023-06-29 15:15:36 公開日:2023-06-28
# テンソルフォーマ:高品質点雲再構成のための正規化マトリックスアテンショントランス Tensorformer: Normalized Matrix Attention Transformer for High-quality Point Cloud Reconstruction ( http://arxiv.org/abs/2306.15989v1 ) ライセンス: Link先を確認	Hui Tian, Zheng Qin, Renjiao Yi, Chenyang Zhu, Kai Xu	(参考訳) 生のポイントクラウドからの表面復元は、コンピュータグラフィックスコミュニティで何十年も研究されてきた。ポアソン曲面再構成のような古典的な解は、合理的な結果を得るために余分な入力として点正規化を必要とする。現代の変圧器に基づく手法は正規化なしでは機能するが、離散点からの局所融合における符号化性能の制限により、結果はより微細化されていない。高品質な再構成を行うための新しい正規化行列アテンショントランス(Tensorformer)を提案する。提案した行列アテンションにより、同時にポイントワイドとチャネルワイドのメッセージパッシングが可能となり、一方、以前のベクトルアテンションは異なるチャネル間で隣接するポイント情報を失う。これにより、機能学習の自由度が高まり、ローカルジオメトリのモデリングが容易になる。提案手法は,ShapeNetCoreとABCの2つの一般的なデータセットの最先端化を実現し,ShapeNet上のIOUを4%改善する。私たちの実装は受け入れ次第リリースします。 Surface reconstruction from raw point clouds has been studied for decades in the computer graphics community, which is highly demanded by modeling and rendering applications nowadays. Classic solutions, such as Poisson surface reconstruction, require point normals as extra input to perform reasonable results. Modern transformer-based methods can work without normals, while the results are less fine-grained due to limited encoding performance in local fusion from discrete points. We introduce a novel normalized matrix attention transformer (Tensorformer) to perform high-quality reconstruction. The proposed matrix attention allows for simultaneous point-wise and channel-wise message passing, while the previous vector attention loses neighbor point information across different channels. It brings more degree of freedom in feature learning and thus facilitates better modeling of local geometries. Our method achieves state-of-the-art on two commonly used datasets, ShapeNetCore and ABC, and attains 4% improvements on IOU on ShapeNet. Our implementation will be released upon acceptance.	翻訳日:2023-06-29 15:15:17 公開日:2023-06-28
# AFPN:オブジェクト検出のための漸近的特徴ピラミッドネットワーク AFPN: Asymptotic Feature Pyramid Network for Object Detection ( http://arxiv.org/abs/2306.15988v1 ) ライセンス: Link先を確認	Guoyu Yang, Jie Lei, Zhikuan Zhu, Siyu Cheng, Zunlei Feng, Ronghua Liang	(参考訳) マルチスケール機能は、オブジェクト検出タスクのばらつきを伴うオブジェクトのエンコーディングにおいて非常に重要である。マルチスケール機能抽出のための一般的な戦略は、古典的なトップダウンおよびボトムアップ機能ピラミッドネットワークを採用することだ。しかし,これらの手法は特徴情報の喪失や劣化に悩まされ,非隣接レベルの融合効果を損なう。本稿では,非隣接レベルで直接インタラクションをサポートする漸近的特徴ピラミッドネットワーク(afpn)を提案する。 AFPNは隣接する2つの低レベル特徴を融合させて開始され、漸近的に高レベル特徴を融合プロセスに組み込む。このように、非隣接レベル間の大きな意味的ギャップを回避できる。各空間位置における特徴融合時に発生する多目的情報衝突の可能性を考えると、適応的な空間融合操作によりこれらの矛盾を軽減できる。提案したAFPNを2段階および1段階のオブジェクト検出フレームワークに組み込んで,MS-COCO 2017バリデーションとテストデータセットを用いて評価する。実験結果から,本手法は他の最先端機能ピラミッドネットワークよりも高い競合性が得られた。コードは \href{https://github.com/gyyang23/afpn}{https://github.com/gyyang23/afpn} で入手できる。 Multi-scale features are of great importance in encoding objects with scale variance in object detection tasks. A common strategy for multi-scale feature extraction is adopting the classic top-down and bottom-up feature pyramid networks. However, these approaches suffer from the loss or degradation of feature information, impairing the fusion effect of non-adjacent levels. This paper proposes an asymptotic feature pyramid network (AFPN) to support direct interaction at non-adjacent levels. AFPN is initiated by fusing two adjacent low-level features and asymptotically incorporates higher-level features into the fusion process. In this way, the larger semantic gap between non-adjacent levels can be avoided. Given the potential for multi-object information conflicts to arise during feature fusion at each spatial location, adaptive spatial fusion operation is further utilized to mitigate these inconsistencies. We incorporate the proposed AFPN into both two-stage and one-stage object detection frameworks and evaluate with the MS-COCO 2017 validation and test datasets. Experimental evaluation shows that our method achieves more competitive results than other state-of-the-art feature pyramid networks. The code is available at \href{https://github.com/gyyang23/AFPN}{https://github.com/gyyang23/AFPN}.	翻訳日:2023-06-29 15:14:58 公開日:2023-06-28
# 日本語文分類と名前付きエンティティ認識のマルチタスク学習のための文間ラベル生成フレームワーク Sentence-to-Label Generation Framework for Multi-task Learning of Japanese Sentence Classification and Named Entity Recognition ( http://arxiv.org/abs/2306.15978v1 ) ライセンス: Link先を確認	Chengguang Gan, Qinghao Zhang and Tatsunori Mori	(参考訳) 情報抽出(IE)は自然言語処理において重要なサブフィールドである。本研究では,Sentence Classification (SC) と Named Entity Recognition (NER) を組み合わせた,Sentence Classification and Named Entity Recognition Multi-task (SCNM) アプローチを提案する。我々はSCNMのためのSLGフレームワークを開発し、SCとNERの両方を含むウィキペディアデータセットを構築する。フォーマット変換器を用いて入力形式を統一し,生成モデルを用いてscラベル,nerラベル,関連するテキストセグメントを生成する。生成フォーマットの精度を向上させるための制約機構(cm)を提案する。その結果,SCの精度はSCNMでは1.13ポイント,NERでは1.06ポイント向上し,CMでは63.61から100に向上した。その結果,scとnerの相互強化効果が示され,統合により両タスクの性能が向上した。 Information extraction(IE) is a crucial subfield within natural language processing. In this study, we introduce a Sentence Classification and Named Entity Recognition Multi-task (SCNM) approach that combines Sentence Classification (SC) and Named Entity Recognition (NER). We develop a Sentence-to-Label Generation (SLG) framework for SCNM and construct a Wikipedia dataset containing both SC and NER. Using a format converter, we unify input formats and employ a generative model to generate SC-labels, NER-labels, and associated text segments. We propose a Constraint Mechanism (CM) to improve generated format accuracy. Our results show SC accuracy increased by 1.13 points and NER by 1.06 points in SCNM compared to standalone tasks, with CM raising format accuracy from 63.61 to 100. The findings indicate mutual reinforcement effects between SC and NER, and integration enhances both tasks' performance.	翻訳日:2023-06-29 15:14:36 公開日:2023-06-28
# 次元構造に基づくクロスモーダル学習のための知識蒸留法 A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning ( http://arxiv.org/abs/2306.15977v1 ) ライセンス: Link先を確認	Lingyu Si, Hongwei Dong, Wenwen Qiang, Junzhi Yu, Wenlong Zhai, Changwen Zheng, Fanjiang Xu, Fuchun Sun	(参考訳) データ品質の制限のため、いくつかの重要な視覚タスクは独立して実行するのは難しい。情報的な暗黒知識を伝達するために、これまで利用できなかった情報を導入することは、そのような困難な課題を解決する一般的な方法である。しかし、なぜ転向した知識労働が広範に研究されていないのか。本稿では,単純かつ難解な課題から抽出された特徴を解析・観察することにより,特徴判別性と次元構造(ds)との相関性を見出す。そこで我々は, 深いチャネル関係と中間空間分布を用いてDSを表現し, 教師付きクロスモーダル学習(CML)の性能向上のための新しいクロスモーダル知識蒸留法を提案する。提案手法では,出力特徴をチャネル毎に独立し,中間特徴を均一に分散させることで,難課題から意味的に無関係な特徴を学習し,その正確性を高める。これは、二重モード間の性能ギャップが比較的大きい特定のアプリケーションで特に有用である。さらに,コミュニティ開発を促進するために,実世界のCMLデータセットを収集した。データセットには1万以上の光学画像とレーダー画像が含まれており、継続的に更新されている。実世界およびベンチマークデータセットにおける実験結果は,提案手法の有効性を検証する。 Due to limitations in data quality, some essential visual tasks are difficult to perform independently. Introducing previously unavailable information to transfer informative dark knowledge has been a common way to solve such hard tasks. However, research on why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks. On this basis, we express DS using deep channel-wise correlation and intermediate spatial distribution, and propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance. The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy. This is especially useful in specific applications where the performance gap between dual modalities is relatively large. Furthermore, we collect a real-world CML dataset to promote community development. The dataset contains more than 10,000 paired optical and radar images and is continuously being updated. Experimental results on real-world and benchmark datasets validate the effectiveness of the proposed method.	翻訳日:2023-06-29 15:14:19 公開日:2023-06-28
# 物理・仮想センサデータを組み合わせたユーザコンテキストの軽量モデリング Lightweight Modeling of User Context Combining Physical and Virtual Sensor Data ( http://arxiv.org/abs/2306.16029v1 ) ライセンス: Link先を確認	Mattia Giovanni Campana, Dimitris Chatzopoulos, Franca Delmastro, Pan Hui	(参考訳) ユーザのモバイルデバイスで利用可能なセンサによって生成される多数のデータと、機械学習技術の進歩、ユーザの現在の状況(物理的コンテキスト)を認識し、システムのパーソナライズ機能を最適化するコンテキスト認識サービスのサポートを組み合わせる。しかし、コンテキスト認識性能は主にコンテキスト推論プロセスの精度に依存しており、これは大規模およびラベル付きデータセットの可用性に厳密に関係している。本研究では,パーソナルモバイルデバイスから得られた異種センシングデータを含むデータセットを収集するフレームワークを提案する。このフレームワークは3人の任意ユーザが2週間使用し、36K以上のサンプルと1331の機能を持つデータセットを生成する。また,ユーザモバイルデバイス上で推論処理全体を効率的に実行するユーザコンテキストをモデル化する軽量なアプローチを提案する。この目的のために, 文脈分類を最適化するために6次元化手法を用いた。生成したデータセットに対する実験結果から,精度損失を3%以下に抑えつつ,10倍のスピードアップと90%以上の特徴低下を実現した。 The multitude of data generated by sensors available on users' mobile devices, combined with advances in machine learning techniques, support context-aware services in recognizing the current situation of a user (i.e., physical context) and optimizing the system's personalization features. However, context-awareness performances mainly depend on the accuracy of the context inference process, which is strictly tied to the availability of large-scale and labeled datasets. In this work, we present a framework developed to collect datasets containing heterogeneous sensing data derived from personal mobile devices. The framework has been used by 3 voluntary users for two weeks, generating a dataset with more than 36K samples and 1331 features. We also propose a lightweight approach to model the user context able to efficiently perform the entire reasoning process on the user mobile device. To this aim, we used six dimensionality reduction techniques in order to optimize the context classification. Experimental results on the generated dataset show that we achieve a 10x speed up and a feature reduction of more than 90% while keeping the accuracy loss less than 3%.	翻訳日:2023-06-29 15:08:27 公開日:2023-06-28
# 古典と量子学習者の指数的分離 Exponential separations between classical and quantum learners ( http://arxiv.org/abs/2306.16028v1 ) ライセンス: Link先を確認	Casper Gyurik and Vedran Dunjko	(参考訳) かなりの努力にもかかわらず、量子機械学習コミュニティは、古典的なデータを扱う際に、人工暗号に触発されたデータセットに対して量子学習の利点しか示していない。本稿では、量子学習アルゴリズムが古典学習アルゴリズムよりも証明可能な指数的高速化を達成できる学習問題を見つけることの課題に対処する。本稿では,この問題に関連する計算学習理論の概念を考察し,定義の微妙な違いが,学習者が満足して解決すべき要件や課題をいかに大きく異なるものにするかを考察する。証明可能な量子スピードアップによる既存の学習問題を検証し、データを生成する関数を識別するのではなく、その古典的な難易度に大きく依存していることを見出した。そこで本研究では,古典的難易度は主にデータ生成関数の同定にある2つの新しい学習分離を提案する。さらに、データが量子生成されるシナリオで量子速度を証明できる計算のハードネスの仮定を探求し、より自然な設定(凝縮物や高エネルギー物理学など)で量子の利点を示唆する。また,学習分離の文脈における古典的な影のパラダイムの限界や,物質相の特徴付けやハミルトン学習といった物理的動機づけのある設定が計算学習フレームワークにどのように適合するかについても論じた。 Despite significant effort, the quantum machine learning community has only demonstrated quantum learning advantages for artificial cryptography-inspired datasets when dealing with classical data. In this paper we address the challenge of finding learning problems where quantum learning algorithms can achieve a provable exponential speedup over classical learning algorithms. We reflect on computational learning theory concepts related to this question and discuss how subtle differences in definitions can result in significantly different requirements and tasks for the learner to meet and solve. We examine existing learning problems with provable quantum speedups and find that they largely rely on the classical hardness of evaluating the function that generates the data, rather than identifying it. To address this, we present two new learning separations where the classical difficulty primarily lies in identifying the function generating the data. Furthermore, we explore computational hardness assumptions that can be leveraged to prove quantum speedups in scenarios where data is quantum-generated, which implies likely quantum advantages in a plethora of more natural settings (e.g., in condensed matter and high energy physics). We also discuss the limitations of the classical shadow paradigm in the context of learning separations, and how physically-motivated settings such as characterizing phases of matter and Hamiltonian learning fit in the computational learning framework.	翻訳日:2023-06-29 15:08:09 公開日:2023-06-28
# フェデレーション学習に基づく分散計算モデル : 時間変動問題を解決するための異種モデルとコンソーシアムブロックチェーンの統合 A Distributed Computation Model Based on Federated Learning Integrates Heterogeneous models and Consortium Blockchain for Solving Time-Varying Problems ( http://arxiv.org/abs/2306.16023v1 ) ライセンス: Link先を確認	Zhihao Hao, Guancheng Wang, Chunwei Tian, Bob Zhang	(参考訳) リカレントニューラルネットワークは,複雑な環境に対応する時間変動問題を効果的に解決するために開発された。しかし、集中処理の方法によって制限されたモデル性能は、実際のモデルやデータのサイロ問題のような要因によって大きく影響を受ける。したがって、フェデレーション学習(fl)のような分散人工知能の出現は、モデル間の動的集約を可能にする。しかしながら、flの統合プロセスは依然としてサーバに依存しており、モデル全体に大きなリスクをもたらす可能性がある。また、均質なモデル間の協調のみが可能であり、異質なモデル間の相互作用に対する良い解決策を持っていない。そこで本研究では,コンソーシアムブロックチェーンネットワークに基づく分散計算モデル(DCM)を提案する。さらに、グローバルソリューションプロセスのために分散階層統合(DHI)アルゴリズムも設計されている。グループ内のパーミッションノードは、異なるパーミッションレスノードからローカルモデルの結果を収集し、集約された結果をすべてのパーミッションレスノードに送信し、ローカルモデルの処理を規則化する。イテレーションが完了すると、ローカルな結果の二次的な統合がパーミッションノード間で行われ、グローバルな結果が得られる。実験では,DCMの効率を検証し,提案したモデルが,フェデレート学習フレームワークに基づく多くの最先端モデルより優れていることを示す。 The recurrent neural network has been greatly developed for effectively solving time-varying problems corresponding to complex environments. However, limited by the way of centralized processing, the model performance is greatly affected by factors like the silos problems of the models and data in reality. Therefore, the emergence of distributed artificial intelligence such as federated learning (FL) makes it possible for the dynamic aggregation among models. However, the integration process of FL is still server-dependent, which may cause a great risk to the overall model. Also, it only allows collaboration between homogeneous models, and does not have a good solution for the interaction between heterogeneous models. Therefore, we propose a Distributed Computation Model (DCM) based on the consortium blockchain network to improve the credibility of the overall model and effective coordination among heterogeneous models. In addition, a Distributed Hierarchical Integration (DHI) algorithm is also designed for the global solution process. Within a group, permissioned nodes collect the local models' results from different permissionless nodes and then sends the aggregated results back to all the permissionless nodes to regularize the processing of the local models. After the iteration is completed, the secondary integration of the local results will be performed between permission nodes to obtain the global results. In the experiments, we verify the efficiency of DCM, where the results show that the proposed model outperforms many state-of-the-art models based on a federated learning framework.	翻訳日:2023-06-29 15:07:46 公開日:2023-06-28
# 強化学習の構造:調査とオープン問題 Structure in Reinforcement Learning: A Survey and Open Problems ( http://arxiv.org/abs/2306.16021v1 ) ライセンス: Link先を確認	Aditya Mohan, Amy Zhang, Marius Lindauer	(参考訳) 関数近似のためのディープニューラルネットワーク(DNN)の表現能力に支えられた強化学習(RL)は、多くのアプリケーションでかなりの成功を収めている。しかし、多様な予測不可能な力学、ノイズ信号、そして大きな状態と行動空間によって特徴づけられる、幅広い現実世界のシナリオに対処する実践性は依然として限られている。この制限は、データ効率の低下、一般化能力の制限、安全性保証の欠如、解釈可能性の欠如などの問題に起因している。これらの課題を克服し、これらの重要な指標にまたがるパフォーマンスを改善するために、問題に関する構造的な情報をRL学習プロセスに組み込むことが期待できる。 RLの様々なサブフィールドは、そのような誘導バイアスを組み込む方法を提案している。我々は,これらの多様な方法論を統一的な枠組みで満たし,学習問題における構造の役割に光を当て,それらの手法を構造を組み込む異なるパターンに分類する。この包括的フレームワークを活用することで、構造化されたRLに関連する課題に関する貴重な洞察を提供し、RL研究におけるデザインパターンの視点の基礎となる。この新しい視点は、現実世界のシナリオをよりうまく処理できる、より効率的かつ効率的なRLアルゴリズムの開発における将来の進歩と支援の道を開く。 Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural Networks (DNNs) for function approximation, has demonstrated considerable success in numerous applications. However, its practicality in addressing a wide range of real-world scenarios, characterized by diverse and unpredictable dynamics, noisy signals, and large state and action spaces, remains limited. This limitation stems from issues such as poor data efficiency, limited generalization capabilities, a lack of safety guarantees, and the absence of interpretability, among other factors. To overcome these challenges and improve performance across these crucial metrics, one promising avenue is to incorporate additional structural information about the problem into the RL learning process. Various sub-fields of RL have proposed methods for incorporating such inductive biases. We amalgamate these diverse methodologies under a unified framework, shedding light on the role of structure in the learning problem, and classify these methods into distinct patterns of incorporating structure. By leveraging this comprehensive framework, we provide valuable insights into the challenges associated with structured RL and lay the groundwork for a design pattern perspective on RL research. This novel perspective paves the way for future advancements and aids in the development of more effective and efficient RL algorithms that can potentially handle real-world scenarios better.	翻訳日:2023-06-29 15:07:24 公開日:2023-06-28
# points for energy reform (pointer): エネルギー特性に関連付けられた100万の建物からなるlidarから派生したポイントクラウドデータセット Points for Energy Renovation (PointER): A LiDAR-Derived Point Cloud Dataset of One Million English Buildings Linked to Energy Characteristics ( http://arxiv.org/abs/2306.16020v1 ) ライセンス: Link先を確認	Sebastian Krapf, Kevin Mayer, Martin Fischer	(参考訳) 欧州の非効率な建物の急速な改修は、気候変動を減らすために必要である。しかし,各建物がユニークなため,大規模建築物の分析・評価は困難である。現在の実施例では、建物のエネルギー性能は、ゆっくりと、コストがかかり、局所的に評価される。本稿では,建物の3次元表現とそのエネルギー特性に関するデータ駆動型大規模理解を促進するビルディングポイントクラウドデータセットを提案する。我々は、ジオリファレンスなLiDARデータとビルディングフットプリントを交差させてビルディングポイント雲を生成し、Unique Property Reference Number (UPRN)を介してイギリスのエネルギーパフォーマンスデータベースから属性をリンクする。代表例を示すために,イギリス各地の農村・都市部から100万棟の建築物を選定し,その50万棟がエネルギー特性と関連づけられている。新しいリージョンにおけるポイントクラウドの構築は、論文と共に公開されたオープンソースコードによって生成される。このデータセットは、エネルギーモデリングの新たな研究を可能にし、UPRNやジオロケーションを通じて構築機能を追加することで、他の研究分野にも容易に拡張できる。 Rapid renovation of Europe's inefficient buildings is required to reduce climate change. However, analyzing and evaluating buildings at scale is challenging because every building is unique. In current practice, the energy performance of buildings is assessed during on-site visits, which are slow, costly, and local. This paper presents a building point cloud dataset that promotes a data-driven, large-scale understanding of the 3D representation of buildings and their energy characteristics. We generate building point clouds by intersecting building footprints with geo-referenced LiDAR data and link them with attributes from UK's energy performance database via the Unique Property Reference Number (UPRN). To achieve a representative sample, we select one million buildings from a range of rural and urban regions across England, of which half a million are linked to energy characteristics. Building point clouds in new regions can be generated with the open-source code published alongside the paper. The dataset enables novel research in building energy modeling and can be easily expanded to other research fields by adding building features via the UPRN or geo-location.	翻訳日:2023-06-29 15:07:04 公開日:2023-06-28
# 改良型深層学習モデルに基づくオフショア風力発電所における鳥の高速認識 Fast Recognition of birds in offshore wind farms based on an improved deep learning model ( http://arxiv.org/abs/2306.16019v1 ) ライセンス: Link先を確認	Yantong Liu, Xingke Li, Jong-Chan Lee	(参考訳) 風力タービンの安全性は、沖合の風力発電所の安定した運用の前提条件である。しかし、鳥の損傷は風力タービンと風力タービンブレードの安全運転に直接的な脅威をもたらす。さらに、毎年何百万もの鳥が風力タービンで死んでいる。そこで本稿では, 環境保全と洋上風力タービンの安全な運転の維持を目的として, 夜間等の低照度環境における電流ターゲット検出アルゴリズムの低検出性能の問題に対処するため, cbamアテンション機構とretinexnetネットワークをyolov5に統合することにより, ネットワーク性能を向上させる手法を提案する。まず、トレーニング用CBAMアテンションモジュールを内蔵したYOLOv5ネットワークにトレーニングセット画像を入力し、最適な重量モデルを保存する。次に、デコンネットとエンハンスネットを用いて低照度画像の強調表示を行い、その精度を最適重みモデルで検証する。さらに、k-means++クラスタリングアルゴリズムを用いてアンカーボックス選択法を最適化し、不安定な初期セントロイドの問題を解消し、より優れたクラスタリング結果を得る。実験の結果、鳥類検出作業におけるこのモデルの精度は87.40%に達し、21.25%が増加することが示された。モデルは,風力タービン近傍の鳥をリアルタイムに検出し,夜間,雨季,風況に強い安定性を示し,そのモデルが風力タービンの安全かつ安定した運転を保証することを証明した。 The safety of wind turbines is a prerequisite for the stable operation of offshore wind farms. However, bird damage poses a direct threat to the safe operation of wind turbines and wind turbine blades. In addition, millions of birds are killed by wind turbines every year. In order to protect the ecological environment and maintain the safe operation of offshore wind turbines, and to address the problem of the low detection capability of current target detection algorithms in low-light environments such as at night, this paper proposes a method to improve the network performance by integrating the CBAM attention mechanism and the RetinexNet network into YOLOv5. First, the training set images are fed into the YOLOv5 network with integrated CBAM attention module for training, and the optimal weight model is stored. Then, low-light images are enhanced and denoised using Decom-Net and Enhance-Net, and the accuracy is tested on the optimal weight model. In addition, the k-means++ clustering algorithm is used to optimise the anchor box selection method, which solves the problem of unstable initial centroids and achieves better clustering results. Experimental results show that the accuracy of this model in bird detection tasks can reach 87.40%, an increase of 21.25%. The model can detect birds near wind turbines in real time and shows strong stability in night, rainy and shaky conditions, proving that the model can ensure the safe and stable operation of wind turbines.	翻訳日:2023-06-29 15:06:45 公開日:2023-06-28
# 複数ラベルの分類に必要な正のラベル Positive Label Is All You Need for Multi-Label Classification ( http://arxiv.org/abs/2306.16016v1 ) ライセンス: Link先を確認	Zhixiang Yuan, Kaixin Zhang, Tao Huang	(参考訳) マルチラベル分類(MLC)は、各画像に様々な意味ラベルを注釈付けすることが困難であるため、トレーニングデータにおいて避けられないラベルノイズに悩まされる。ノイズラベルの影響を軽減するため、既存の手法は主に訓練されたmlcモデルによるラベルミスの識別と修正に費やされている。しかし、これらの方法はいまだに騒がしいラベルをトレーニングに含むため、ノイズラベルを不正確に認識し、パフォーマンスを弱める可能性がある。本稿では, 負ラベルが正ラベル以上であり, ほとんどのノイズラベルが負ラベルから来ていることを考慮し, データセット中のすべての負ラベルを直接破棄し, 正および未ラベルのマルチラベル分類(PU-MLC)と呼ばれる新しい手法を提案する。正のラベル付き学習をmlcタスクに拡張することにより,正のラベルとラベル付きデータのみをモデルに訓練し,損失関数に適応的な再バランス係数と適応温度係数を導入し,ラベル分布の破滅的不均衡と,トレーニングの確率の過大さを緩和する。 PU-MLC は単純かつ効果的であり,MLC-PL タスクを伴う MLC と MLC の両方に適用可能である。 MS-COCOとPASCAL VOCデータセットの大規模な実験により、私たちのPU-MLCはより少ないアノテーションで MLC と MLC-PL の設定を大幅に改善することを示した。コードはリリースされる。 Multi-label classification (MLC) suffers from the inevitable label noise in training data due to the difficulty in annotating various semantic labels in each image. To mitigate the influence of noisy labels, existing methods mainly devote to identifying and correcting the label mistakes via a trained MLC model. However, these methods still involve annoying noisy labels in training, which can result in imprecise recognition of noisy labels and weaken the performance. In this paper, considering that the negative labels are substantially more than positive labels, and most noisy labels are from the negative labels, we directly discard all the negative labels in the dataset, and propose a new method dubbed positive and unlabeled multi-label classification (PU-MLC). By extending positive-unlabeled learning into MLC task, our method trains model with only positive labels and unlabeled data, and introduces adaptive re-balance factor and adaptive temperature coefficient in the loss function to alleviate the catastrophic imbalance in label distribution and over-smoothing of probabilities in training. Our PU-MLC is simple and effective, and it is applicable to both MLC and MLC with partial labels (MLC-PL) tasks. Extensive experiments on MS-COCO and PASCAL VOC datasets demonstrate that our PU-MLC achieves significantly improvements on both MLC and MLC-PL settings with even fewer annotations. Code will be released.	翻訳日:2023-06-29 15:06:19 公開日:2023-06-28
# bayesflow:ニューラルネットワークによるベイズワークフローの償却 BayesFlow: Amortized Bayesian Workflows With Neural Networks ( http://arxiv.org/abs/2306.16015v1 ) ライセンス: Link先を確認	Stefan T Radev and Marvin Schmitt and Lukas Schumacher and Lasse Elsem\"uller and Valentin Pratz and Yannik Sch\"alte and Ullrich K\"othe and Paul-Christian B\"urkner	(参考訳) 現代のベイズ推論は、データ分析の原則的ワークフローの一部として確率的モデルからの結論を推定、検証、描画するための計算技法の混合を含む。ベイズワークフローの典型的な問題は、様々なモデルタイプに対する難解な後続分布の近似と、その複雑さと予測性能の観点から同じプロセスの競合モデルの比較である。この原稿はPythonライブラリのBayesFlowを紹介し、アモートされたデータ圧縮と推論のための確立したニューラルネットワークアーキテクチャのシミュレーションベースのトレーニングを行う。 Amortized Bayesian推論は、BayesFlowで実装されているもので、モデルシミュレーションでカスタムニューラルネットワークをトレーニングし、その後のモデル適用のためにこれらのネットワークを再使用することができる。トレーニングされたネットワークは、ほぼ瞬時に推論を行うことができるため、事前のニューラルネットワークトレーニングは、迅速に償却される。 Modern Bayesian inference involves a mixture of computational techniques for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows for data analysis. Typical problems in Bayesian workflows are the approximation of intractable posterior distributions for diverse model types and the comparison of competing models of the same process in terms of their complexity and predictive performance. This manuscript introduces the Python library BayesFlow for simulation-based training of established neural network architectures for amortized data compression and inference. Amortized Bayesian inference, as implemented in BayesFlow, enables users to train custom neural networks on model simulations and re-use these networks for any subsequent application of the models. Since the trained networks can perform inference almost instantaneously, the upfront neural network training is quickly amortized.	翻訳日:2023-06-29 15:05:50 公開日:2023-06-28
# 隣接トークンマージによるトランスデューサの高速化 Accelerating Transducers through Adjacent Token Merging ( http://arxiv.org/abs/2306.16009v1 ) ライセンス: Link先を確認	Yuang Li, Yu Wu, Jinyu Li, Shujie Liu	(参考訳) 最近のエンドツーエンド自動音声認識(ASR)システムは、高いフレームレートで埋め込みを生成するトランスフォーマーベースの音響エンコーダを使用することが多い。しかし、この設計は非効率であり、特に長い音声信号は、自己着脱の二次計算のためである。そこで本研究では,隣接するトークンと鍵値間の類似度の高いスコアを段階的に組み合わせたAdjacent Token Merging(A-ToMe)を提案する。これにより、総時間ステップを短縮することができ、エンコーダとジョイントネットワークの両方の推論が高速化される。 LibriSpeechの実験により,トークンの57%を削減し,GPU上での推論速度を70%向上できることがわかった。さらに、A-ToMeは、入力音声が複数の発話からなる長文ASRにおけるトークンを減らす効果的な解であることを示す。 Recent end-to-end automatic speech recognition (ASR) systems often utilize a Transformer-based acoustic encoder that generates embedding at a high frame rate. However, this design is inefficient, particularly for long speech signals due to the quadratic computation of self-attention. To address this, we propose a new method, Adjacent Token Merging (A-ToMe), which gradually combines adjacent tokens with high similarity scores between their key values. In this way, the total time step could be reduced, and the inference of both the encoder and joint network is accelerated. Experiments on LibriSpeech show that our method can reduce 57% of tokens and improve the inference speed on GPU by 70% without any notable loss of accuracy. Additionally, we demonstrate that A-ToMe is also an effective solution to reduce tokens in long-form ASR, where the input speech consists of multiple utterances.	翻訳日:2023-06-29 15:05:38 公開日:2023-06-28
# 音声認識におけるゼロショット領域適応のための大規模言語モデルの提案 Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition ( http://arxiv.org/abs/2306.16007v1 ) ライセンス: Link先を確認	Yuang Li, Yu Wu, Jinyu Li, Shujie Liu	(参考訳) 言語モデル(LM)の統合は、音声認識におけるドメインシフトに対処する効果的な方法であることが証明されている。しかし、これらのアプローチは通常、lmsのトレーニングのためにかなりの量のターゲットドメインテキストデータを必要とする。これらの手法と異なり、ドメイン固有のテキストプロンプトのみで、7ビリオンパラメータ大言語モデル(LLM)であるLLaMAを用いた2つのゼロショットASRドメイン適応手法を提案する。 LLMは2つの方法で使われます。 1)第2パス再構成:所定のASR系のN-best仮説をLLaMAで再評価すること。 2)深いLLM融合:エンコーダデコーダベースのASRシステムのデコーダにLLMを組み込む。実験では、1つのドメインプロンプトだけで、両方のメソッドがドメイン外のtedlium-2とspgispeechデータセットでワードエラー率(wer)を効果的に削減できることが示されている。特に、深いLLM融合は、実体語と外語彙語のより優れたリコールの利点がある。 The integration of Language Models (LMs) has proven to be an effective way to address domain shifts in speech recognition. However, these approaches usually require a significant amount of target domain text data for the training of LMs. Different from these methods, in this work, with only a domain-specific text prompt, we propose two zero-shot ASR domain adaptation methods using LLaMA, a 7-billion-parameter large language model (LLM). LLM is used in two ways: 1) second-pass rescoring: reranking N-best hypotheses of a given ASR system with LLaMA; 2) deep LLM-fusion: incorporating LLM into the decoder of an encoder-decoder based ASR system. Experiments show that, with only one domain prompt, both methods can effectively reduce word error rates (WER) on out-of-domain TedLium-2 and SPGISpeech datasets. Especially, the deep LLM-fusion has the advantage of better recall of entity and out-of-vocabulary words.	翻訳日:2023-06-29 15:05:24 公開日:2023-06-28
# ディープラーニングのためのバイナリ・プリソーシングによる主音分類の改善 Improving Primate Sounds Classification using Binary Presorting for Deep Learning ( http://arxiv.org/abs/2306.16054v1 ) ライセンス: Link先を確認	Michael K\"olle, Steffen Illium, Maximilian Zorn, Jonas N\"u{\ss}lein, Patrick Suchostawski and Claudia Linnhoff-Popien	(参考訳) 野生生物の観察と保全の分野では、音声録音における機械学習のアプローチがますます普及している。残念なことに、この研究分野の利用可能なデータセットは、しばしば最適な学習材料ではない。本研究では,MELスペクトル表現のサブセグメンテーションを初めてリラベルする一般化されたアプローチを導入し,実際のマルチクラス分類タスクにおいて高い性能を実現する。バイナリプリソートと分類の両方において、畳み込みニューラルネットワーク(CNN)と様々なデータ拡張技術を利用する。このアプローチの結果を,異なる霊長類音を分類し,相対的に装備されたモデルベースラインと対照的に,高い精度とuarスコアを報告するという課題を伴って,挑戦的な \textit{compare 2021}データセット上で示した。 In the field of wildlife observation and conservation, approaches involving machine learning on audio recordings are becoming increasingly popular. Unfortunately, available datasets from this field of research are often not optimal learning material; Samples can be weakly labeled, of different lengths or come with a poor signal-to-noise ratio. In this work, we introduce a generalized approach that first relabels subsegments of MEL spectrogram representations, to achieve higher performances on the actual multi-class classification tasks. For both the binary pre-sorting and the classification, we make use of convolutional neural networks (CNN) and various data-augmentation techniques. We showcase the results of this approach on the challenging \textit{ComparE 2021} dataset, with the task of classifying between different primate species sounds, and report significantly higher Accuracy and UAR scores in contrast to comparatively equipped model baselines.	翻訳日:2023-06-29 14:56:57 公開日:2023-06-28
# SVNR:デノイング拡散による空間変動ノイズ除去 SVNR: Spatially-variant Noise Removal with Denoising Diffusion ( http://arxiv.org/abs/2306.16052v1 ) ライセンス: Link先を確認	Naama Pearl, Yaron Brodsky, Dana Berman, Assaf Zomet, Alex Rav Acha, Daniel Cohen-Or, Dani Lischinski	(参考訳) 雑音拡散モデルは最近、生成的タスクにおいて印象的な結果を示している。トレーニング画像の膨大なコレクションから強力な事前学習を行うことで、このようなモデルは小さなデノナイジングステップのシーケンスを通じて、完全なノイズをクリーンな自然画像に徐々に修正することができる。しかし, 実写画像のノイズとは異なり, 付加的な白色ガウス雑音に基づくため, 実写ノイズ除去に効果的にデノナイジング拡散モデルを適用することは困難である。本研究では,より現実的で空間的変動のある雑音モデルを想定した,新しい拡散の定式化であるSVNRを提案する。 SVNRは、ノイズの多い入力画像を、その上に処理を条件付けることに加えて、デノナイジング拡散プロセスの出発点として使用できる。この目的のために,各画素が独自の時間埋め込みを持つように拡散過程を適応させ,空間的に変化する時間マップをサポートするトレーニングおよび推論スキームを提案する。我々の定式化は、条件画像と修正拡散過程に沿ったサンプルとの間に存在する相関も説明します。実験では, 強力な拡散モデルベースラインに対するアプローチの利点と, 最先端の単一画像復号法に対するアプローチの利点を実証した。 Denoising diffusion models have recently shown impressive results in generative tasks. By learning powerful priors from huge collections of training images, such models are able to gradually modify complete noise to a clean natural image via a sequence of small denoising steps, seemingly making them well-suited for single image denoising. However, effectively applying denoising diffusion models to removal of realistic noise is more challenging than it may seem, since their formulation is based on additive white Gaussian noise, unlike noise in real-world images. In this work, we present SVNR, a novel formulation of denoising diffusion that assumes a more realistic, spatially-variant noise model. SVNR enables using the noisy input image as the starting point for the denoising diffusion process, in addition to conditioning the process on it. To this end, we adapt the diffusion process to allow each pixel to have its own time embedding, and propose training and inference schemes that support spatially-varying time maps. Our formulation also accounts for the correlation that exists between the condition image and the samples along the modified diffusion process. In our experiments we demonstrate the advantages of our approach over a strong diffusion model baseline, as well as over a state-of-the-art single image denoising method.	翻訳日:2023-06-29 14:56:40 公開日:2023-06-28
# 逆攻撃による深部画像復調モデルの同時性およびロバスト性の評価 Evaluating Similitude and Robustness of Deep Image Denoising Models via Adversarial Attack ( http://arxiv.org/abs/2306.16050v1 ) ライセンス: Link先を確認	Jie Ning, Yao Li, Zhichang Guo	(参考訳) ディープニューラルネットワーク(dnn)は、画像デノイジングの分野で幅広い応用があり、従来の画像デノイジングよりも優れている。しかし、DNNは必然的に、敵の攻撃に直面している弱い堅牢性を示す。本稿では,既存の奥行き画像のデノイジング手法の類似性について検討する。第一に、デノナイジング-PGDは、デノナイジングモデルの全対角法である。現在の主流の非盲検モデル(DnCNN、FFDNet、ECNDNet、BRDNet)、盲検モデル(DnCNN-B、ノイズ2ノイズ、RDDCNN-B、FAN)、プラグ・アンド・プレイ(DPIR、CurvPnP)、グレースケールおよびカラー画像に適用した展開復調モデル(DeamNet)は、同一の手法で攻撃することができる。第2に,画像デノイジングタスクではデノイジングpgdの転写能が顕著であるため,トランスファビリティ下での潜伏の特性を探索する実験をデザインする。トランスファー可能性と同化度を関連付け、深部画像の同化モデルは高い同化度を持つと結論づける。第3に, 対向空間の特徴について検討し, 対向訓練を用いて, 対向攻撃による深層画像の脆弱性を補完する。最後に,この対向攻撃法を制約し,ガウス分布を維持する対向攻撃法L2-denoising-PGD画像を提案する。さらに, BM3Dのモデル駆動画像は, 敵攻撃に対する抵抗性を示した。 Deep neural networks (DNNs) have a wide range of applications in the field of image denoising, and they are superior to traditional image denoising. However, DNNs inevitably show vulnerability, which is the weak robustness in the face of adversarial attacks. In this paper, we find some similitudes between existing deep image denoising methods, as they are consistently fooled by adversarial attacks. First, denoising-PGD is proposed which is a denoising model full adversarial method. The current mainstream non-blind denoising models (DnCNN, FFDNet, ECNDNet, BRDNet), blind denoising models (DnCNN-B, Noise2Noise, RDDCNN-B, FAN), and plug-and-play (DPIR, CurvPnP) and unfolding denoising models (DeamNet) applied to grayscale and color images can be attacked by the same set of methods. Second, since the transferability of denoising-PGD is prominent in the image denoising task, we design experiments to explore the characteristic of the latent under the transferability. We correlate transferability with similitude and conclude that the deep image denoising models have high similitude. Third, we investigate the characteristic of the adversarial space and use adversarial training to complement the vulnerability of deep image denoising to adversarial attacks on image denoising. Finally, we constrain this adversarial attack method and propose the L2-denoising-PGD image denoising adversarial attack method that maintains the Gaussian distribution. Moreover, the model-driven image denoising BM3D shows some resistance in the face of adversarial attacks.	翻訳日:2023-06-29 14:55:26 公開日:2023-06-28
# fifaワールドカップのカタール2022でtwitterとaiを使って学んだ感想と楽しい事実 What Sentiment and Fun Facts We Learnt Before FIFA World Cup Qatar 2022 Using Twitter and AI ( http://arxiv.org/abs/2306.16049v1 ) ライセンス: Link先を確認	James She, Kamilla Swart-Arries, Mohammad Belal and Simon Wong	(参考訳) Twitterは、ほとんどの国を橋渡しし、リアルタイムニュース発見を可能にするソーシャルメディアプラットフォームである。 Twitter上のツイートは通常短く、一般の感情を表現するため、グローバルなイベントに対する意見マイニングと感情分析の源泉となる。本稿では、fifaワールドカップに関するツイートに対する感想を提供するための効果的な解決策を提案する。コミュニティで初めて、少なくとも130万ツイートが収集され、提案された機械学習ソリューションのパフォーマンスを評価するデータセットとして実装されている。これらのツイートはカタールワールドカップ2022の関連ハッシュタグとキーワードで収集される。本論文ではvaderアルゴリズムを用いて感情分析を行う。機械学習手法とTwitterのつぶやきの収集により、ワールドカップ前の期間において重要ないくつかの側面の感情と楽しい事実を発見した。その結果、人々はワールドカップの開会に前向きであることがわかった。 Twitter is a social media platform bridging most countries and allows real-time news discovery. Since the tweets on Twitter are usually short and express public feelings, thus provide a source for opinion mining and sentiment analysis for global events. This paper proposed an effective solution, in providing a sentiment on tweets related to the FIFA World Cup. At least 130k tweets, as the first in the community, are collected and implemented as a dataset to evaluate the performance of the proposed machine learning solution. These tweets are collected with the related hashtags and keywords of the Qatar World Cup 2022. The Vader algorithm is used in this paper for sentiment analysis. Through the machine learning method and collected Twitter tweets, we discovered the sentiments and fun facts of several aspects important to the period before the World Cup. The result shows people are positive to the opening of the World Cup.	翻訳日:2023-06-29 14:54:53 公開日:2023-06-28
# 視覚言語モデルによるゼロショット認識の課題:粒度と正確性 Challenges of Zero-Shot Recognition with Vision-Language Models: Granularity and Correctness ( http://arxiv.org/abs/2306.16048v1 ) ライセンス: Link先を確認	Zhenlin Xu, Yi Zhu, Tiffany Deng, Abhay Mittal, Yanbei Chen, Manchen Wang, Paolo Favaro, Joseph Tighe, Davide Modolo	(参考訳) 本稿では,オープンワールドにおけるゼロショット視覚認識タスクに視覚言語モデル(vlms)を適用する際の課題について,クリップなどのコントラスト的視覚言語モデルに着目して検討する。まず,様々な粒度の概念に対するvlmの性能について検討した。我々は,2つの実験環境において,性能不一致を公平に評価する方法を提案し,vlmがきめ細かい概念を認識するのに優れていることを示す。さらに,vlmsの類似度スコアは,視覚入力によるテキスト入力の正確さを厳密に反映しないことがわかった。本稿では,より情報的な記述に対してスコアが偏りがあるという仮説を検証するための評価プロトコルを提案し,組込み間の類似性スコアの性質は,VLMが類似する記述間の正しさを認識するのを困難にしている。本研究は,VLMをオープンワールド環境で使用する上での課題を強調し,今後のゼロショット機能向上に向けた方向性を提案する。 This paper investigates the challenges of applying vision-language models (VLMs) to zero-shot visual recognition tasks in an open-world setting, with a focus on contrastive vision-language models such as CLIP. We first examine the performance of VLMs on concepts of different granularity levels. We propose a way to fairly evaluate the performance discrepancy under two experimental setups and find that VLMs are better at recognizing fine-grained concepts. Furthermore, we find that the similarity scores from VLMs do not strictly reflect the correctness of the textual inputs given visual input. We propose an evaluation protocol to test our hypothesis that the scores can be biased towards more informative descriptions, and the nature of the similarity score between embedding makes it challenging for VLMs to recognize the correctness between similar but wrong descriptions. Our study highlights the challenges of using VLMs in open-world settings and suggests directions for future research to improve their zero-shot capabilities.	翻訳日:2023-06-29 14:54:41 公開日:2023-06-28
# OpenNDD:神経発達障害検出のためのオープンセット認識 OpenNDD: Open Set Recognition for Neurodevelopmental Disorders Detection ( http://arxiv.org/abs/2306.16045v1 ) ライセンス: Link先を確認	Jiaming Yu, Zihao Guan, Xinyue Chang, Xiumei Liu, Zhenshan Shi, Changcai Yang, Riqing Chen, Lanyan Xue, Lifang Wei	(参考訳) 神経発達障害 (NDD) は、非常に一般的な疾患群であり、強力な臨床行動類似性を示し、自閉症スペクトラム障害 (ASD) や注意欠陥性高活動障害 (ADHD) などの異なるNDDの正確な同定を困難にしている。さらに,NDDの診断には信頼性のある生理学的マーカーは存在せず,心理的評価基準にのみ依存している。しかし, 経過観察と近縁な知的補助診断により, 誤診や下垂体手術を予防することが重要である。そこで本稿では,これらの課題を解消するために,nddsスクリーニングと検出のための新しいオープンセット認識フレームワークを提案する。オートエンコーダと逆方向の相互認識を組み合わせることで、既知のクラスを正確に識別し、遭遇したことのないクラスを認識する。また、異なる被験者間の強い類似性を考慮し、未知の障害を識別するためのMMSと呼ばれる共同スケーリング手法を提案する。提案手法の有効性を検証するために, 自閉症脳画像データ交換装置i (abide i) とadhd-200サンプル (adhd-200) のハイブリッドデータセットに対する相互対向実験プロトコルを, 4地点から771点のサンプルで設計し, 各種指標の優位性を示す。 OpenNDDは77.38%、AUROCは75.53%、オープンセットの分類率は59.43%という有望な性能を達成した。 Neurodevelopmental disorders (NDDs) are a highly prevalent group of disorders and represent strong clinical behavioral similarities, and that make it very challenging for accurate identification of different NDDs such as autism spectrum disorder (ASD) and attention-deficit hyperactivity disorder (ADHD). Moreover, there is no reliable physiological markers for NDDs diagnosis and it solely relies on psychological evaluation criteria. However, it is crucial to prevent misdiagnosis and underdiagnosis by intelligent assisted diagnosis, which is closely related to the follow-up corresponding treatment. In order to relieve these issues, we propose a novel open set recognition framework for NDDs screening and detection, which is the first application of open set recognition in this field. It combines auto encoder and adversarial reciprocal points open set recognition to accurately identify known classes as well as recognize classes never encountered. And considering the strong similarities between different subjects, we present a joint scaling method called MMS to distinguish unknown disorders. To validate the feasibility of our presented method, we design a reciprocal opposition experiment protocol on the hybrid datasets from Autism Brain Imaging Data Exchange I (ABIDE I) and THE ADHD-200 SAMPLE (ADHD-200) with 791 samples from four sites and the results demonstrate the superiority on various metrics. Our OpenNDD has achieved promising performance, where the accuracy is 77.38%, AUROC is 75.53% and the open set classification rate is as high as 59.43%.	翻訳日:2023-06-29 14:54:23 公開日:2023-06-28
# 加速検出器のダイナミックマップ Dynamical Maps for Accelerating Detectors ( http://arxiv.org/abs/2306.16041v1 ) ライセンス: Link先を確認	Shalin Jose (1), Anil Shaji (1) ((1) Indian Institute of Science Education and Research Thiruvananthapuram)	(参考訳) ミンコフスキー真空を無質量スカラー場に弱結合して加速する2レベル粒子検出器の開量子力学について検討する。非ゼロサイズの検出器を考察し、初期慣性運動である場合の時間発展について検討し、その後一定加速度を有限時間オンにする。このようなシステムの進化を記述した力学写像を研究し、力学が完全に正ではないことを示す。加速前の慣性運動は検出器と磁場を絡み合わせることができ、NCPダイナミクスに繋がる。本研究では,前慣性運動の時間と加速度の大きさの関数として,加速相中の開力学の性質を検討する。 We study the open quantum dynamics of a two-level particle detector that starts accelerating through Minkowski vacuum weakly coupled to a massless scalar field. We consider a detector with non-zero size and study its time evolution for the case where it is initially in inertial motion and subsequently a constant acceleration is switched on for a finite time. We study the dynamical maps that describe the evolution of such a system and show that the dynamics is not completely positive (NCP). The inertial motion prior to the acceleration can entangle the detector and field leading to the NCP dynamics. We examine the nature of the open dynamics during the accelerated phase as a function of the duration of prior inertial motion and the magnitude of the acceleration.	翻訳日:2023-06-29 14:53:54 公開日:2023-06-28
# 肝ct画像における超高能率病変検出と偽陽性除去のためのカスケード法 A Cascaded Approach for ultraly High Performance Lesion Detection and False Positive Removal in Liver CT Scans ( http://arxiv.org/abs/2306.16036v1 ) ライセンス: Link先を確認	Fakai Wang, Chi-Tung Cheng, Chien-Wei Peng, Ke Yan, Min Wu, Le Lu, Chien-Hung Liao, and Ling Zhang	(参考訳) 肝臓がんは世界中で高い死亡率と死亡率を持っている。多相CTは肝腫瘍の検出・診断のための主要な医用画像モダリティである。 CT画像における肝病変の自動検出と分類は、臨床ワークフローを改善する可能性がある。この課題は, 肝病変の大きさ, 外観, 画像コントラスト, 腫瘍タイプやサブタイプの複雑さの多様さにより, 依然として困難である。本研究は,4相CT画像,多臓器マスク,多発病変(病理検査で確認された6種類の肝病変)を含む大規模データセットをキュレートするために,多段階CT画像のための多目的ラベリングツールをカスタマイズする。 2段階の肝病変検出パイプラインを開発し,第1段階の高感度検出アルゴリズムは可能な限り多くの病変を発見でき,第2段階の病変再分類アルゴリズムは可能な限り多くの誤報を除去する。多感性病変検出アルゴリズムはセグメント化の個々の確率マップの情報利用を最大化し、病変拡大は病変と肝臓のテクスチャコントラストを効果的に探索する。 331例で個別に検査し, マルチフェーズ造影CT(99.2%, 97.1%, 診断設定)および非コントラストCT(97.3%, 95.7%, スクリーニング設定)における悪性度分類の感度と特異性を得た。 Liver cancer has high morbidity and mortality rates in the world. Multi-phase CT is a main medical imaging modality for detecting/identifying and diagnosing liver tumors. Automatically detecting and classifying liver lesions in CT images have the potential to improve the clinical workflow. This task remains challenging due to liver lesions' large variations in size, appearance, image contrast, and the complexities of tumor types or subtypes. In this work, we customize a multi-object labeling tool for multi-phase CT images, which is used to curate a large-scale dataset containing 1,631 patients with four-phase CT images, multi-organ masks, and multi-lesion (six major types of liver lesions confirmed by pathology) masks. We develop a two-stage liver lesion detection pipeline, where the high-sensitivity detecting algorithms in the first stage discover as many lesion proposals as possible, and the lesion-reclassification algorithms in the second stage remove as many false alarms as possible. The multi-sensitivity lesion detection algorithm maximizes the information utilization of the individual probability maps of segmentation, and the lesion-shuffle augmentation effectively explores the texture contrast between lesions and the liver. Independently tested on 331 patient cases, the proposed model achieves high sensitivity and specificity for malignancy classification in the multi-phase contrast-enhanced CT (99.2%, 97.1%, diagnosis setting) and in the noncontrast CT (97.3%, 95.7%, screening setting).	翻訳日:2023-06-29 14:53:43 公開日:2023-06-28
# stone needle: 医療に向けた汎用マルチモーダル大規模モデルフレームワーク Stone Needle: A General Multimodal Large-scale Model Framework towards Healthcare ( http://arxiv.org/abs/2306.16034v1 ) ライセンス: Link先を確認	Weihua Liu and Yong Zuo	(参考訳) 医療では、マルチモーダルデータは広く普及しており、医療画像や臨床報告など、診断決定前に総合的に分析する必要がある。しかし、現在の大規模人工知能モデルは、主に単一モーダル認知能力に焦点を当て、複数のモーダルの統合を無視している。そこで本研究では,医療応用に適した汎用マルチモーダル大規模モデルフレームワークであるStone Needleを提案する。ストーンニードルは総合的な医療マルチモーダルモデルの基礎として機能し、テキスト、画像、ビデオ、オーディオといった様々なモダリティを統合し、シングルモーダルシステムの限界を超える。インテント分析,医療基盤モデル,プロンプトマネージャ,医療言語モジュールのフレームワークコンポーネントを通じて,アーキテクチャは複数ラウンドの対話でマルチモーダルインタラクションを行うことができる。本手法は汎用マルチモーダル大規模モデルフレームワークであり,多様なモダリティを統合し,特定のタスクを調整できる。本手法はシングルモーダルシステムと比較して優れた性能を示す実験結果である。異なる形態の融合と複雑な医療情報を石針で処理する能力は、正確な診断、治療の推奨、患者のケアに役立つ。 In healthcare, multimodal data is prevalent and requires to be comprehensively analyzed before diagnostic decisions, including medical images, clinical reports, etc. However, current large-scale artificial intelligence models predominantly focus on single-modal cognitive abilities and neglect the integration of multiple modalities. Therefore, we propose Stone Needle, a general multimodal large-scale model framework tailored explicitly for healthcare applications. Stone Needle serves as a comprehensive medical multimodal model foundation, integrating various modalities such as text, images, videos, and audio to surpass the limitations of single-modal systems. Through the framework components of intent analysis, medical foundation models, prompt manager, and medical language module, our architecture can perform multi-modal interaction in multiple rounds of dialogue. Our method is a general multimodal large-scale model framework, integrating diverse modalities and allowing us to tailor for specific tasks. The experimental results demonstrate the superior performance of our method compared to single-modal systems. The fusion of different modalities and the ability to process complex medical information in Stone Needle benefits accurate diagnosis, treatment recommendations, and patient care.	翻訳日:2023-06-29 14:53:14 公開日:2023-06-28
# ソーシャルメディアにおける公開談話の時空間的変動を探る:イタリアにおけるコロナウイルスパンデミックの第一波を事例として Exploring Spatial-Temporal Variations of Public Discourse on Social Media: A Case Study on the First Wave of the Coronavirus Pandemic in Italy ( http://arxiv.org/abs/2306.16031v1 ) ライセンス: Link先を確認	Anslow Michael and Galletti Martina	(参考訳) 本稿では,SARS CoV2パンデミックで流行したような重要な出来事に対する社会的反応の探索にソーシャルメディア上での言語行動をどのように活用できるかを探求するための方法論を提案する。特に、イベントの空間的側面と時間的側面が重要な特徴である。本手法は時系列分析とクラスタリングを用いて,ツイート利用傾向の時空間カテゴリーを定位する。各カテゴリの有意な項は、手書きのカテゴリに集約されたスケールされたfスコアに基づいて定性的比較分析によって同定される。このアプローチを実証するため,イタリアで発生した第1波について事例研究を行った。提案手法を用いて既存の心理学的観察を探索し,事象から物理的距離がコミュニケーション内容に与える影響を考察した。本研究は,sars cov2の発生源である病原体と周辺部が明らかな時系列クラスターに対応し,sars cov2の発生源である病原体は周辺地域よりも連帯性と政策に重点が置かれていることを示し,これらの知見を確認した。さらに,パンデミック時の政策変化と時間的カテゴリーが密接に対応していることが判明した。 This paper proposes a methodology for exploring how linguistic behaviour on social media can be used to explore societal reactions to important events such as those that transpired during the SARS CoV2 pandemic. In particular, where spatial and temporal aspects of events are important features. Our methodology consists of grounding spatial-temporal categories in tweet usage trends using time-series analysis and clustering. Salient terms in each category were then identified through qualitative comparative analysis based on scaled f-scores aggregated into hand-coded categories. To exemplify this approach, we conducted a case study on the first wave of the coronavirus in Italy. We used our proposed methodology to explore existing psychological observations which claimed that physical distance from events affects what is communicated about them. We confirmed these findings by showing that the epicentre of the disease and peripheral regions correspond to clear time-series clusters and that those living in the epicentre of the SARS CoV2 outbreak were more focused on solidarity and policy than those from more peripheral regions. Furthermore, we also found that temporal categories corresponded closely to policy changes during the handling of the pandemic.	翻訳日:2023-06-29 14:52:55 公開日:2023-06-28
# 構造モチーフ型グラフニューラルネットワークによる質量スペクトル予測 Mass Spectra Prediction with Structural Motif-based Graph Neural Networks ( http://arxiv.org/abs/2306.16085v1 ) ライセンス: Link先を確認	Jiwon Park, Jeonghee Jo, Sungroh Yoon	(参考訳) ターゲット分子からのイオン化フラグメントの集合体である質量スペクトルは、分子構造の同定において、様々な分野において重要な役割を果たす。一般的な分析方法は、未知のスペクトルがデータベースと相互参照されるスペクトルライブラリ検索である。しかし、このような探索に基づく手法の有効性は、既存の質量スペクトルデータベースの範囲によって制限され、質量スペクトル予測によるデータベースの拡張の必要性を強調する。本研究では、構造モチーフから得られる情報とグラフニューラルネットワーク(GNN)の実装を用いて、質量スペクトルを予測するMotif-based Mass Spectrum Prediction Network (MoMS-Net)を提案する。我々は、様々な質量スペクトルでモデルを試験し、既存のモデルよりもその優位性を観察した。 MoMS-Netはグラフレベルでのサブ構造を考慮し、グラフトランスフォーマーモデルに比べて少ないメモリを使用しながら、長距離依存の取り込みを容易にする。 Mass spectra, which are agglomerations of ionized fragments from targeted molecules, play a crucial role across various fields for the identification of molecular structures. A prevalent analysis method involves spectral library searches,where unknown spectra are cross-referenced with a database. The effectiveness of such search-based approaches, however, is restricted by the scope of the existing mass spectra database, underscoring the need to expand the database via mass spectra prediction. In this research, we propose the Motif-based Mass Spectrum Prediction Network (MoMS-Net), a system that predicts mass spectra using the information derived from structural motifs and the implementation of Graph Neural Networks (GNNs). We have tested our model across diverse mass spectra and have observed its superiority over other existing models. MoMS-Net considers substructure at the graph level, which facilitates the incorporation of long-range dependencies while using less memory compared to the graph transformer model.	翻訳日:2023-06-29 14:47:31 公開日:2023-06-28
# 高速rcnnに基づく連続的二重チャネルライブラリ占有検知システム A serial dual-channel library occupancy detection system based on Faster RCNN ( http://arxiv.org/abs/2306.16080v1 ) ライセンス: Link先を確認	Guoqiang Yang, Xiaowen Chang, Zitong Wang, Min Yang and Xin Chen	(参考訳) 大学図書館における座席占有現象が問題となっている。しかし、ソフトウェアベースの座席予約やセンサーによる占有検知といった既存のソリューションは、この問題に効果的に対処するには不十分であることが証明されている。本研究では,高速なrcnnに基づく連続2チャンネル物体検出モデルを提案する。さらに,ユーザフレンドリーなWebインターフェースとモバイルAPPを開発し,図書館利用者検出のためのコンピュータビジョンベースのプラットフォームを構築する。データセットを構築するために、現実世界のデータコレックオプションとUE5バーチャルリアリティを組み合わせる。実験の結果,音素単位の仮想データセットの利用は,専用シナリオにおける畳み込みニューラルネットワーク(CNN)の性能を著しく向上させることが示された。シリアルデュアルチャネル検出モデルは、3つのステップを含む。まず、座席が個人によって占有されているかどうかを判断するために、より高速なRCNNアルゴリズムを用いる。その後、移動学習に基づく物体分類アルゴリズムを用いて、未占有座席の画像の分類と識別を行う。これにより、座席を占拠した疑いがあるかどうかについて、手動で判断する必要がなくなる。最後に、WebインターフェースとAPPは、それぞれ図書館員と学生に座席情報を提供し、包括的なサービスを可能にする。本研究は,深層学習手法を活用することで,図書館システムにおける座席占有の課題を効果的に解決する。シート占有率認識の精度を大幅に向上させ,CNNのトレーニングに必要な計算資源を削減し,ライブラリーシート管理の効率を大幅に向上させる。 The phenomenon of seat occupancy in university libraries is a prevalent issue. However, existing solutions, such as software-based seat reservations and sensors-based occupancy detection, have proven to be inadequate in effectively addressing this problem. In this study, we propose a novel approach: a serial dual-channel object detection model based on Faster RCNN. Furthermore, we develop a user-friendly Web interface and mobile APP to create a computer vision-based platform for library seat occupancy detection. To construct our dataset, we combine real-world data collec-tion with UE5 virtual reality. The results of our tests also demonstrate that the utilization of per-sonalized virtual dataset significantly enhances the performance of the convolutional neural net-work (CNN) in dedicated scenarios. The serial dual-channel detection model comprises three es-sential steps. Firstly, we employ Faster RCNN algorithm to determine whether a seat is occupied by an individual. Subsequently, we utilize an object classification algorithm based on transfer learning, to classify and identify images of unoccupied seats. This eliminates the need for manual judgment regarding whether a person is suspected of occupying a seat. Lastly, the Web interface and APP provide seat information to librarians and students respectively, enabling comprehensive services. By leveraging deep learning methodologies, this research effectively addresses the issue of seat occupancy in library systems. It significantly enhances the accuracy of seat occupancy recognition, reduces the computational resources required for training CNNs, and greatly improves the effi-ciency of library seat management.	翻訳日:2023-06-29 14:47:16 公開日:2023-06-28
# カスケードハイブリッド最適化によるセキュアかつ高速な非同期垂直フェデレーション学習 Secure and Fast Asynchronous Vertical Federated Learning via Cascaded Hybrid Optimization ( http://arxiv.org/abs/2306.16077v1 ) ライセンス: Link先を確認	Ganyu Wang, Qingsong Zhang, Li Xiang, Boyu Wang, Bin Gu, Charles Ling	(参考訳) Vertical Federated Learning (VFL)は、複数のパーティが垂直に分割されたデータに対して、プライバシ保護モデルを共同でトレーニングできるようにするため、注目を集めている。近年の研究では、ゼロ階最適化(ZOO)の適用は実用的なVFLアルゴリズムを構築する上で多くの利点があることが示されている。しかし、ZOOベースのVFLの致命的な問題は収束速度が遅いことであり、これは現代の大規模モデルを扱う際の応用を制限している。そこで本研究では,VFLにおけるハイブリッド最適化手法を提案する。この方法では、下流モデル(クライアント)がZOOでトレーニングされ、プライバシーを保護し、内部情報が共有されないことを保証する。一方、アップストリームモデル(サーバ)は、一階最適化(foo)をローカルに更新することで、収束率を大幅に改善し、プライバシとセキュリティを損なうことなく、大規模モデルのトレーニングを可能にする。我々のVFLフレームワークがZOOベースのVFLよりも早く収束することが理論的に証明されている。本手法は,プライバシー保護レベルを維持しつつ,ZOOベースのVFLフレームワークよりも高速な収束を実現することを示す。さらに、VFLの収束は安全でないFOOベースのVFLベースラインに匹敵することを示した。さらに,本手法が大規模モデルのトレーニングを可能にすることを示す。 Vertical Federated Learning (VFL) attracts increasing attention because it empowers multiple parties to jointly train a privacy-preserving model over vertically partitioned data. Recent research has shown that applying zeroth-order optimization (ZOO) has many advantages in building a practical VFL algorithm. However, a vital problem with the ZOO-based VFL is its slow convergence rate, which limits its application in handling modern large models. To address this problem, we propose a cascaded hybrid optimization method in VFL. In this method, the downstream models (clients) are trained with ZOO to protect privacy and ensure that no internal information is shared. Meanwhile, the upstream model (server) is updated with first-order optimization (FOO) locally, which significantly improves the convergence rate, making it feasible to train the large models without compromising privacy and security. We theoretically prove that our VFL framework converges faster than the ZOO-based VFL, as the convergence of our framework is not limited by the size of the server model, making it effective for training large models with the major part on the server. Extensive experiments demonstrate that our method achieves faster convergence than the ZOO-based VFL framework, while maintaining an equivalent level of privacy protection. Moreover, we show that the convergence of our VFL is comparable to the unsafe FOO-based VFL baseline. Additionally, we demonstrate that our method makes the training of a large model feasible.	翻訳日:2023-06-29 14:46:55 公開日:2023-06-28
# 長期会話分析: ユーティリティとプライバシの探求 Long-term Conversation Analysis: Exploring Utility and Privacy ( http://arxiv.org/abs/2306.16071v1 ) ライセンス: Link先を確認	Francesco Nespoli, Jule Pohlhausen, Patrick A. Naylor, Joerg Bitzer	(参考訳) 日常生活で記録された会話の分析にはプライバシー保護が必要である。本稿では,入力特徴量削減,スペクトル平滑化,およびmcadams係数に基づく低コスト話者匿名化手法に基づくプライバシー保全特徴抽出手法について検討する。音声認識と話者検証モデルによりプライバシー保護が決定される一方で,音声活動検出と話者ダイアリゼーションシステムを用いて特徴抽出手法の有用性を評価する。我々は,mcadams係数とスペクトル平滑化の組み合わせが,プライバシを改善しながら有用性を維持していることを示す。 The analysis of conversations recorded in everyday life requires privacy protection. In this contribution, we explore a privacy-preserving feature extraction method based on input feature dimension reduction, spectral smoothing and the low-cost speaker anonymization technique based on McAdams coefficient. We assess the utility of the feature extraction methods with a voice activity detection and a speaker diarization system, while privacy protection is determined with a speech recognition and a speaker verification model. We show that the combination of McAdams coefficient and spectral smoothing maintains the utility while improving privacy.	翻訳日:2023-06-29 14:46:29 公開日:2023-06-28
# 基礎モデルを用いたフェデレーション生成学習 Federated Generative Learning with Foundation Models ( http://arxiv.org/abs/2306.16064v1 ) ライセンス: Link先を確認	Jie Zhang, Xiaohua Qi, Bo Zhao	(参考訳) 既存のフェデレートされた学習ソリューションは、クライアントとサーバの間で機能やパラメータ、ガディアンを伝達することに重点を置いている。新たな基礎生成モデルのおかげで、クライアントとサーバ間で分散トレーニングデータに関連するプロンプトを送信する、新しいフェデレーション学習フレームワーク、federated generative learningを提案する。情報学習データは、プライバシーと基礎生成モデルを含む受信したプロンプトに基づいて遠隔で合成することができる。新しいフレームワークには、通信効率の向上、分散シフトへのレジリエンス向上、実質的なパフォーマンス向上、imagenetとdomainnetデータセットの広範な実験で検証されたプライバシー保護強化など、複数のメリットがある。 Existing federated learning solutions focus on transmitting features, parameters or gadients between clients and server, which suffer from serious low-efficiency and privacy-leakage problems. Thanks to the emerging foundation generative models, we propose a novel federated learning framework, namely Federated Generative Learning, that transmits prompts associated with distributed training data between clients and server. The informative training data can be synthesized remotely based on received prompts containing little privacy and the foundation generative models. The new framework possesses multiple advantages, including improved communication efficiency, better resilience to distribution shift, substantial performance gains, and enhanced privacy protection, which are verified in extensive experiments on ImageNet and DomainNet datasets.	翻訳日:2023-06-29 14:46:21 公開日:2023-06-28
# バナッハ空間の誘導系におけるダイナミクスの収束 Convergence of Dynamics on Inductive Systems of Banach Spaces ( http://arxiv.org/abs/2306.16063v1 ) ライセンス: Link先を確認	Lauritz van Luijk, Alexander Stottmeister an Reinhard F. Werner	(参考訳) 定性的かつ定量的な物理系の多くの特徴は、ある限定的な状況下でのみ、鋭く定義されるか、抽出可能である。例えば、熱力学極限における相転移、量子論からの大きな作用における古典力学の出現、再正規化群固定点から生じる連続量子場理論である。このような多様なアプリケーションで有効な方法がほとんどないように思える。しかし、ここでは理論の極限に対する柔軟なモデリングツールを示す:バナッハ空間の帰納的極限の一般化を構成するソフトインダクティブ極限。この文脈では、ダイナミクスの収束に関する一般的な基準が定式化され、これらの基準が前述の状況に適用されることが示される。 Many features of physical systems, both qualitative and quantitative, become sharply defined or tractable only in some limiting situation. Examples are phase transitions in the thermodynamic limit, the emergence of classical mechanics from quantum theory at large action, and continuum quantum field theory arising from renormalization group fixed points. It would seem that few methods can be useful in such diverse applications. However, we here present a flexible modeling tool for the limit of theories: soft inductive limits constituting a generalization of inductive limits of Banach spaces. In this context, general criteria for the convergence of dynamics will be formulated, and these criteria will be shown to apply in the situations mentioned and more.	翻訳日:2023-06-29 14:46:04 公開日:2023-06-28
# ダイクパスとトポロジカル量子計算 Dyck Paths and Topological Quantum Computation ( http://arxiv.org/abs/2306.16062v1 ) ライセンス: Link先を確認	Vivek Kumar Singh, Akash Sinha, Pramod Padmanabhan, Indrajit Jana	(参考訳) Fibonacci anyonsの融合基底は、普遍量子計算に使用できるユニタリブレイド表現をサポートする。 3つのフィボナッチアーロンの融合基底である$\{\|1\rangle, \|\tau\rangle\}$と、融合基底上の2次元ブレイド群表現と標準の$(2,2)$ヤングダイアグラム上に構築されたブレイド群表現の間の同型による2つの長さのディックパスの間の写像を示す。この対応は、標準 $(N,N)$ Young tableaux がカタルーニャ数、$C_N$ であるとして、Dyck パスを用いて Fibonacci の融合基底を構築するのに役立つ。次に、局所フレドキン運動を用いて、フィボナッチ融合基底に対応するDyckパスを正確に含むスピン鎖を退化集合として構成する。本システムでは, ランダムノイズに対する安定性を検証し, トポロジカル量子計算のプラットフォームとしての有用性を確立する。最後に、所望の1量子ビット演算の実行を効率的に可能とし、所望の精度($\sim 10^{-3}$)を達成するこの回転空間におけるブレイドワードを示す。 The fusion basis of Fibonacci anyons supports unitary braid representations that can be utilized for universal quantum computation. We show a mapping between the fusion basis of three Fibonacci anyons, $\{\|1\rangle, \|\tau\rangle\}$, and the two length 4 Dyck paths via an isomorphism between the two dimensional braid group representations on the fusion basis and the braid group representation built on the standard $(2,2)$ Young diagrams using the Jones construction. This correspondence helps us construct the fusion basis of the Fibonacci anyons using Dyck paths as the number of standard $(N,N)$ Young tableaux is the Catalan number, $C_N$ . We then use the local Fredkin moves to construct a spin chain that contains precisely those Dyck paths that correspond to the Fibonacci fusion basis, as a degenerate set. We show that the system is gapped and examine its stability to random noise thereby establishing its usefulness as a platform for topological quantum computation. Finally, we show braidwords in this rotated space that efficiently enable the execution of any desired single-qubit operation, achieving the desired level of precision($\sim 10^{-3}$).	翻訳日:2023-06-29 14:45:45 公開日:2023-06-28
# RoMo-HER:ロバストなモデルベースの隠れ体験リプレイ RoMo-HER: Robust Model-based Hindsight Experience Replay ( http://arxiv.org/abs/2306.16061v1 ) ライセンス: Link先を確認	Yuming Huang and Bin Ren	(参考訳) スパース報酬はマルチゴール強化学習(RL)におけるサンプル効率の低下につながる要因の1つである。 Hindsight Experience Replay (HER)に基づいて、トレーニングされたモデルと相互作用して得られた仮想軌跡を用いて、モデルに基づくラベリング手法が目標を緩和する手法が提案されている。しかし、ロボット操作環境では効果がない。本稿では,ロボット操作環境における動的モデルを効果的に活用し,サンプル効率を向上させるロバストモデルに基づくHendsight Experience Replay (RoMo-HER) と呼ばれる頑健なフレームワークを設計する。 RoMo-HERは、ダイナミックスモデルと、Foresight relabeling(FR)と呼ばれる、特定の戦略で予測開始状態を選択し、スタート状態の将来の軌跡を予測し、ダイナミックスモデルとエージェントをトレーニングするための最新のポリシーを使用してゴールを再ラベルする新しいゴールレバーリング技術に基づいて構築されている。実験の結果,複数のロボット操作環境において,RoMo-HERはHERやモデルベースHMMよりも高効率であることがわかった。さらに,RoMo-HER と Relay Hindsight Experience Replay (RHER) を統合することで,ロバストモデルに基づく Relay Hindsight Experience Replay (RoMo-RHER) と呼ばれる新しい手法が提案される。 RHERはFetchPush-v1とFetchPickandPlace-v1で25%, 26%, RHERでは25%, RHERでは26%, RHERよりも高い試料効率が得られた。 Sparse rewards are one of the factors leading to low sample efficiency in multi-goal reinforcement learning (RL). Based on Hindsight Experience Replay (HER), model-based relabeling methods have been proposed to relabel goals using virtual trajectories obtained by interacting with the trained model, which can effectively enhance the sample efficiency in accurately modelable sparse-reward environments. However, they are ineffective in robot manipulation environment. In our paper, we design a robust framework called Robust Model-based Hindsight Experience Replay (RoMo-HER) which can effectively utilize the dynamical model in robot manipulation environments to enhance the sample efficiency. RoMo-HER is built upon a dynamics model and a novel goal relabeling technique called Foresight relabeling (FR), which selects the prediction starting state with a specific strategy, predicts the future trajectory of the starting state, and then relabels the goal using the dynamics model and the latest policy to train the agent. Experimental results show that RoMo-HER has higher sample efficiency than HER and Model-based Hindsight Experience Replay in several simulated robot manipulation environments. Furthermore, we integrate RoMo-HER and Relay Hindsight Experience Replay (RHER), which currently exhibits the highest sampling efficiency in most benchmark environments, resulting in a novel approach called Robust Model-based Relay Hindsight Experience Replay (RoMo-RHER). Our experimental results demonstrate that RoMo-RHER achieves higher sample efficiency over RHER, outperforming RHER by 25% and 26% in FetchPush-v1 and FetchPickandPlace-v1, respectively.	翻訳日:2023-06-29 14:45:05 公開日:2023-06-28
# 圧縮センシングのための動的経路制御型ディープアンフォールディングネットワーク Dynamic Path-Controllable Deep Unfolding Network for Compressive Sensing ( http://arxiv.org/abs/2306.16060v1 ) ライセンス: Link先を確認	Jiechong Song and Bin Chen and Jian Zhang	(参考訳) 深層ニューラルネットワークに最適化アルゴリズムを展開するディープ・アンフォールディング・ネットワーク(dun)は、その優れた解釈性と高性能のため、圧縮センシング(cs)において大きな成功を収めている。 DUNの各ステージは最適化の1つのイテレーションに対応する。テスト時には、すべてのサンプリングイメージを全ての段階で処理する必要があるが、これは計算負荷のコストがかかるとともに、コンテンツの復元が容易な画像も不要である。本稿では,CS再構成に着目し,新しいDPC-DUN(Dynamic Path-Controllable Deep Unfolding Network)を提案する。 dpc-dun 設計したパス制御可能なセレクタは、画像毎に高速かつ適切な経路を動的に選択でき、異なる性能・複雑さのトレードオフを制御してスリム化することができる。我々のDPC-DUNは高い柔軟性を示し、適切なトレードオフを得るために優れた性能と動的調整を提供し、現実にアピールする主な要件に対処する。コードはhttps://github.com/songjiechong/dpc-dunで入手できる。 Deep unfolding network (DUN) that unfolds the optimization algorithm into a deep neural network has achieved great success in compressive sensing (CS) due to its good interpretability and high performance. Each stage in DUN corresponds to one iteration in optimization. At the test time, all the sampling images generally need to be processed by all stages, which comes at a price of computation burden and is also unnecessary for the images whose contents are easier to restore. In this paper, we focus on CS reconstruction and propose a novel Dynamic Path-Controllable Deep Unfolding Network (DPC-DUN). DPC-DUN with our designed path-controllable selector can dynamically select a rapid and appropriate route for each image and is slimmable by regulating different performance-complexity tradeoffs. Extensive experiments show that our DPC-DUN is highly flexible and can provide excellent performance and dynamic adjustment to get a suitable tradeoff, thus addressing the main requirements to become appealing in practice. Codes are available at https://github.com/songjiechong/DPC-DUN.	翻訳日:2023-06-29 14:44:21 公開日:2023-06-28
# DUET: 2次元構造とほぼ同変表現 DUET: 2D Structured and Approximately Equivariant Representations ( http://arxiv.org/abs/2306.16058v1 ) ライセンス: Link先を確認	Xavier Suau, Federico Danieli, T. Anderson Keller, Arno Blaas, Chen Huang, Jason Ramapuram, Dan Busbridge, Luca Zappella	(参考訳) MSSL(Multiview Self-Supervised Learning)は、入力変換の集合に関する学習不変性に基づいている。しかし、不変性は変換に関連する情報を表現から部分的にあるいは完全に取り除き、そのような情報を必要とする特定の下流タスクのパフォーマンスを損なう可能性がある。本稿では,行列構造に整理された2次元表現である2DstrUcturedおよびEquivarianT表現(Coined DUET)を提案し,入力データに作用する変換について同変する。 DUET表現は、意味的に表現されたまま、入力変換に関する情報を保持する。 SimCLR (Chen et al., 2020) や ESSL (Dangovski et al., 2022) と比較すると、DUET 表現の構造的および同変性は、再構成エラーの少ない制御生成を可能にし、SimCLR や ESSL では制御不可能である。 DUETは複数の識別タスクに対して高い精度を実現し、転送学習を改善する。 Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2d representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. DUET representations maintain information about an input transformation, while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, while controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy for several discriminative tasks, and improves transfer learning.	翻訳日:2023-06-29 14:44:01 公開日:2023-06-28
# ChatGPTは医療専門家か? バイオメディカルタスクにおける現行GPTモデルのゼロショット性能の探索 Is ChatGPT a Biomedical Expert? -- Exploring the Zero-Shot Performance of Current GPT Models in Biomedical Tasks ( http://arxiv.org/abs/2306.16108v1 ) ライセンス: Link先を確認	Samy Ateia, Udo Kruschwitz	(参考訳) 商業用大規模言語モデル (LLMs) GPT-3.5-Turbo と GPT-4 の性能を2023年のBioASQ課題から評価した。回答生成に焦点を当てたタスク11bフェーズbでは、両方のモデルがリードシステムとの競合能力を示した。注目すべきは、単純なゼロショット学習でこれを達成したことだ。関連したスニペットがなくても、パフォーマンスは良好だったが、最高のシステムと同等ではなかった。興味深いことに、より古く安価なGPT-3.5-Turboシステムでは、ファクトイドとリストの回答に基づいたQ&A設定でGPT-4と競合することができた。タスク11bのフェーズAでは、検索に焦点を当てたゼロショット学習によるクエリ拡張により、性能が向上したが、他のシステムに比べてモデルは低下した。これらの実験を再実行するのに必要なコードはGitHubから入手できる。 We assessed the performance of commercial Large Language Models (LLMs) GPT-3.5-Turbo and GPT-4 on tasks from the 2023 BioASQ challenge. In Task 11b Phase B, which is focused on answer generation, both models demonstrated competitive abilities with leading systems. Remarkably, they achieved this with simple zero-shot learning, grounded with relevant snippets. Even without relevant snippets, their performance was decent, though not on par with the best systems. Interestingly, the older and cheaper GPT-3.5-Turbo system was able to compete with GPT-4 in the grounded Q&A setting on factoid and list answers. In Task 11b Phase A, focusing on retrieval, query expansion through zero-shot learning improved performance, but the models fell short compared to other systems. The code needed to rerun these experiments is available through GitHub.	翻訳日:2023-06-29 14:36:17 公開日:2023-06-28
# 1mのパラメータで十分か? 医用画像分割のための軽量CNNモデル 1M parameters are enough? A lightweight CNN-based model for medical image segmentation ( http://arxiv.org/abs/2306.16103v1 ) ライセンス: Link先を確認	Binh-Duong Dinh, Thanh-Thu Nguyen, Thi-Thao Tran, Van-Truong Pham	(参考訳) 畳み込みニューラルネットワーク(CNN)とトランスフォーマーベースのモデルは、高レベルの特徴を抽出し、画像の重要な側面を捉える能力により、医療画像セグメンテーションに広く適用されている。しかし、高い精度の必要性と低い計算コストの要求との間にはトレードオフがしばしばある。高いパラメータを持つモデルは理論的にはより優れた性能を達成できるが、計算の複雑さとメモリ使用量の増加をもたらすため、実装には実用的ではない。本稿では,u-lite という,同一のままでも優れた性能を得られる軽量な u-net ベースのモデルを提案する。我々は,CNNの強みを生かし,演算パラメータの著しい削減を図るために,Depthwise Separable Convolutionの原理に基づいてU-Liteを設計する。具体的には、エンコーダとデコーダの両方で7x7のカーネルを持つAxial Depthwise Convolutionsを提案し、モデル受容場を拡大する。性能をさらに向上するため,フィルタ3x3によるAxial Dilated Depthwise Convolutionsをいくつかのブランチとして使用しています。全体として、U-Lite は 878K のパラメータしか持たず、従来の U-Net の35倍も小さく、トランスフォーマーベースのモデルよりもはるかに少ない。提案手法は, 計算複雑性を削減しつつ, 他の最先端アーキテクチャと比較して医療用セグメンテーションタスクにおいて印象的な性能を実現する。コードはhttps://github.com/duong-db/u-lite。 Convolutional neural networks (CNNs) and Transformer-based models are being widely applied in medical image segmentation thanks to their ability to extract high-level features and capture important aspects of the image. However, there is often a trade-off between the need for high accuracy and the desire for low computational cost. A model with higher parameters can theoretically achieve better performance but also result in more computational complexity and higher memory usage, and thus is not practical to implement. In this paper, we look for a lightweight U-Net-based model which can remain the same or even achieve better performance, namely U-Lite. We design U-Lite based on the principle of Depthwise Separable Convolution so that the model can both leverage the strength of CNNs and reduce a remarkable number of computing parameters. Specifically, we propose Axial Depthwise Convolutions with kernels 7x7 in both the encoder and decoder to enlarge the model receptive field. To further improve the performance, we use several Axial Dilated Depthwise Convolutions with filters 3x3 for the bottleneck as one of our branches. Overall, U-Lite contains only 878K parameters, 35 times less than the traditional U-Net, and much more times less than other modern Transformer-based models. The proposed model cuts down a large amount of computational complexity while attaining an impressive performance on medical segmentation tasks compared to other state-of-the-art architectures. The code will be available at: https://github.com/duong-db/U-Lite.	翻訳日:2023-06-29 14:36:01 公開日:2023-06-28
# 任意分布型雑音量子チャネルの古典的容量 Classical Capacity of Arbitrarily Distributed Noisy Quantum Channels ( http://arxiv.org/abs/2306.16102v1 ) ライセンス: Link先を確認	Indrakshi Dey, Harun Siljak, Nicola Marchetti	(参考訳) 量子コンピュータと量子衛星の迅速な展開により、古典的な情報を交換できる量子およびハイブリッドな古典量子ネットワークの設計と展開の必要性が高まっている。この文脈では、古典的情報を含む任意の量子チャネルに対する古典的ノイズと量子的ノイズの混合の影響に関する基礎研究を行う。このような混合ノイズを考える背景にある理論的根拠は、量子ノイズは異なるメモリやリピータ技術のような量子伝送シナリオにおける異なる絡み合いや不一致から生じうるが、古典的なノイズは古典的信号との共存から生じるものである。この目的に向けて,古典的システムの観点から混合雑音の分布を導出し,混合雑音の存在下で任意の分散量子チャネル上で実現可能なチャネル容量を定式化する。数値実験の結果,光子数の増加に伴って容量が増加することがわかった。 With the rapid deployment of quantum computers and quantum satellites, there is a pressing need to design and deploy quantum and hybrid classical-quantum networks capable of exchanging classical information. In this context, we conduct the foundational study on the impact of a mixture of classical and quantum noise on an arbitrary quantum channel carrying classical information. The rationale behind considering such mixed noise is that quantum noise can arise from different entanglement and discord in quantum transmission scenarios, like different memories and repeater technologies, while classical noise can arise from the coexistence with the classical signal. Towards this end, we derive the distribution of the mixed noise from a classical system's perspective, and formulate the achievable channel capacity over an arbitrary distributed quantum channel in presence of the mixed noise. Numerical results demonstrate that capacity increases with the increase in the number of photons per usage.	翻訳日:2023-06-29 14:35:30 公開日:2023-06-28
# 密度密度相関を用いた相関差分実現における量子気体のキャラクタリゼーション Characterizing quantum gases in correlated-disorder realizations using density-density correlations ( http://arxiv.org/abs/2306.16099v1 ) ライセンス: Link先を確認	Silvia Hiebel, Benjamin Nagler, Sian Barbosa, Jennifer Koch, and Artur Widera	(参考訳) 物理系における障害の役割は、マクロとミクロの世界で広く研究されている。静的障害は多くの場合よく理解されているが、時間依存障害が量子気体に与える影響はいまだに研究されていない。実験では、波長可変な相関時間を持つ超低温量子気体の時間依存光スペックル障害を生成・特徴付ける。実験的に、コヒーレント光は、スタティックと回転ディフューザの組み合わせを照らし、ディフューザの構造による空間変化相と相対回転による時間変化相とを収集する。ディフューザの回転速度は、ダイナミクスの特徴的な時間スケールを決定する。研究対象の量子気体の典型的な時間スケールと一致する広い範囲で調整することができる。分子ボース・アインシュタイン凝縮体に対するその影響を観測し,その強度分布とその場を測定することで,動的スペックルパターンを特徴付ける。 1つのディフューザが共通の光軸の周りで相対的に回転すると、光学スペックルの強度相関と量子ガス密度密度相関を追跡する。その結果,両測定法に比較して結果が得られた。この設定により、量子ガスの特性に適応した乱れポテンシャルを調整できる。これらの研究は、制御された動的不規則ポテンシャルを用いて相互作用する量子気体における非平衡物理学の研究の道を開いた。 The role of disorder on physical systems has been widely studied in the macroscopic and microscopic world. While static disorder is well understood in many cases, the impact of time-dependent disorder on quantum gases is still poorly investigated. In our experimental setup, we produce and characterize time-dependent optical-speckle disorder for ultracold quantum gases with tunable correlation time. Experimentally, coherent light illuminates a combination of a static and a rotating diffuser, thereby collecting a spatially varying phase due to the diffusers' structure and a temporally variable phase due to the relative rotation. The rotation speed of the diffuser determines a characteristic time scale of the dynamics. It can be tuned within a broad range matching typical time scales of the quantum gases investigated. We characterize the dynamic speckle pattern ex-situ by measuring its intensity distribution and in-situ by observing its impact on a molecular Bose-Einstein condensate. As one diffuser rotates relative to the other around the common optical axis, we trace the optical speckle's intensity correlations and the quantum gas' density-density correlations. Our results show comparable outcomes for both measurement methods. The setup allows us to tune the disorder potential adapted to the characteristics of the quantum gas. These studies pave the way for investigating nonequilibrium physics in interacting quantum gases using controlled dynamical-disorder potentials.	翻訳日:2023-06-29 14:35:15 公開日:2023-06-28
# Chan-Vese Attention U-Net:ロバストセグメンテーションの注意機構 Chan-Vese Attention U-Net: An attention mechanism for robust segmentation ( http://arxiv.org/abs/2306.16098v1 ) ライセンス: Link先を確認	Nicolas Makaroff and Laurent D. Cohen	(参考訳) 畳み込みニューラルネットワークを用いたセグメンテーションアルゴリズムの結果を研究するとき、結果の信頼性と一貫性について疑問を呈する。このことは、疑わしい余地がほとんどないアプリケーションでそのようなアルゴリズムを使用する可能性に疑問を呈する。本稿では,U-Netモデルのような標準CNNアーキテクチャによって与えられるセグメンテーションマスクをより正確に制御するために,Chan-Veseエネルギー最小化に基づく新しいアテンションゲートを提案する。このメカニズムにより、pdeの解像度に基づいてセグメンテーションの制約を得ることができる。本研究により,ニューラルネットワークが保持する空間情報を関心領域で観測し,二分節分割における競合結果を得ることができた。 mri脳画像データベース上での医用画像分割におけるこの手法の有効性について述べる。 When studying the results of a segmentation algorithm using convolutional neural networks, one wonders about the reliability and consistency of the results. This leads to questioning the possibility of using such an algorithm in applications where there is little room for doubt. We propose in this paper a new attention gate based on the use of Chan-Vese energy minimization to control more precisely the segmentation masks given by a standard CNN architecture such as the U-Net model. This mechanism allows to obtain a constraint on the segmentation based on the resolution of a PDE. The study of the results allows us to observe the spatial information retained by the neural network on the region of interest and obtains competitive results on the binary segmentation. We illustrate the efficiency of this approach for medical image segmentation on a database of MRI brain images.	翻訳日:2023-06-29 14:34:54 公開日:2023-06-28
# スパース表現、推論、学習 Sparse Representations, Inference and Learning ( http://arxiv.org/abs/2306.16097v1 ) ライセンス: Link先を確認	Clarissa Lauditi, Emanuele Troiani and Marc M\'ezard	(参考訳) 近年、統計物理学は、機械学習で起こるような大きな次元の推論問題を調べる上で有用なツールであることが証明されている。統計物理学は、解の基本的な限界を研究する分析ツールを提供し、個々のインスタンスを解決するアルゴリズムを提案する。これらのノートでは、2022年にル・フーシュのサマースクールで行われたMarc M\'ezard氏の講義に基づき、圧縮されたセンシング問題やパーセプトロンにおける学習問題を含む、弱い長距離相互作用の様々な問題に使用できる一般的な枠組みを提示する。これらの問題は,理論ツールやアルゴリズムとして,キャビティ手法の開発を通じて,レプリカ対称レベルでどのように研究できるのかを考察する。 In recent years statistical physics has proven to be a valuable tool to probe into large dimensional inference problems such as the ones occurring in machine learning. Statistical physics provides analytical tools to study fundamental limitations in their solutions and proposes algorithms to solve individual instances. In these notes, based on the lectures by Marc M\'ezard in 2022 at the summer school in Les Houches, we will present a general framework that can be used in a large variety of problems with weak long-range interactions, including the compressed sensing problem, or the problem of learning in a perceptron. We shall see how these problems can be studied at the replica symmetric level, using developments of the cavity methods, both as a theoretical tool and as an algorithm.	翻訳日:2023-06-29 14:34:42 公開日:2023-06-28
# chatlaw: 外部知識ベースを統合したオープンソースの法的大型言語モデル ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases ( http://arxiv.org/abs/2306.16092v1 ) ライセンス: Link先を確認	Jiaxi Cui, Zongjian Li, Yang Yan, Bohua Chen and Li Yuan	(参考訳) 大規模言語モデル(LLM)は、様々な領域における自然言語処理タスクに革命をもたらす可能性を示しており、垂直特有の大規模モデルに大きな関心を喚起している。しかし、独自のデータ蓄積を利用して金融分野を前進させたbloomberggptやfingptのようなプロプライエタリなモデルとは異なり、デジタルトランスフォーメーションを促進するために、中国の法律領域に似たような大きな言語モデルはない。本稿では,ChatLawという,オープンソースの法的大規模言語モデルを提案する。データ品質の重要性から、法的なドメインの微調整データセットを慎重に設計しました。さらに,参照データ検索における法データスクリーニングにおけるモデル幻覚の問題を克服するために,ベクトルデータベース検索とキーワード検索を組み合わせた手法を導入し,ベクトルデータベース検索のみに依存する不正確さを効果的に軽減する。さらに,参照データに存在する誤差を克服する大規模モデルの能力を高めること,モデルレベルでのモデル幻覚の問題を最適化すること,大規模モデルの問題解決能力を向上させることを提案する。また、当社のモデルとデータの一部をhttps://github.com/PKU-YuanGroup/ChatLaw.comでオープンソース化しました。 Large Language Models (LLMs) have shown the potential to revolutionize natural language processing tasks in various domains, sparking great interest in vertical-specific large models. However, unlike proprietary models such as BloombergGPT and FinGPT, which have leveraged their unique data accumulations to make strides in the finance domain, there hasn't not many similar large language models in the Chinese legal domain to facilitate its digital transformation. In this paper, we propose an open-source legal large language model named ChatLaw. Due to the importance of data quality, we carefully designed a legal domain fine-tuning dataset. Additionally, to overcome the problem of model hallucinations in legal data screening during reference data retrieval, we introduce a method that combines vector database retrieval with keyword retrieval to effectively reduce the inaccuracy of relying solely on vector database retrieval. Furthermore, we propose a self-attention method to enhance the ability of large models to overcome errors present in reference data, further optimizing the issue of model hallucinations at the model level and improving the problem-solving capabilities of large models. We also open-sourced our model and part of the data at https://github.com/PKU-YuanGroup/ChatLaw.	翻訳日:2023-06-29 14:34:26 公開日:2023-06-28
# ニューラルネットワーク活性化機能の実証的損失景観解析 Empirical Loss Landscape Analysis of Neural Network Activation Functions ( http://arxiv.org/abs/2306.16090v1 ) ライセンス: Link先を確認	Anna Sergeevna Bosman, Andries Engelbrecht, Marde Helbig	(参考訳) 非線型性を有効にすることで、活性化関数はニューラルネットワーク設計において重要な役割を果たす。活性化関数の選択は、以前、結果として生じる損失景観の特性に影響を及ぼすことが示された。アクティベーション関数とロスランドスケープ特性の関係を理解することは、ニューラルアーキテクチャとトレーニングアルゴリズム設計において重要である。本研究は,双曲的接点,整流線形単位,指数的線形単位活性化関数に関連するニューラルネットワーク損失のランドスケープを実験的に検討する。整流された線形ユニットは最も凸なロスランドスケープを示し、指数線型ユニットは最も平坦なロスランドスケープを示し、より優れた一般化性能を示す。全ての活性化関数に対して、損失景観における広狭谷の存在が確立され、狭谷は飽和ニューロンと暗黙的に規則化されたネットワーク構成と相関することが示されている。 Activation functions play a significant role in neural network design by enabling non-linearity. The choice of activation function was previously shown to influence the properties of the resulting loss landscape. Understanding the relationship between activation functions and loss landscape properties is important for neural architecture and training algorithm design. This study empirically investigates neural network loss landscapes associated with hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions. Rectified linear unit is shown to yield the most convex loss landscape, and exponential linear unit is shown to yield the least flat loss landscape, and to exhibit superior generalisation performance. The presence of wide and narrow valleys in the loss landscape is established for all activation functions, and the narrow valleys are shown to correlate with saturated neurons and implicitly regularised network configurations.	翻訳日:2023-06-29 14:34:05 公開日:2023-06-28
# mastering nordschleife - モータースポーツにおけるai戦略決定のための総合的レースシミュレーション Mastering Nordschleife -- A comprehensive race simulation for AI strategy decision-making in motorsports ( http://arxiv.org/abs/2306.16088v1 ) ライセンス: Link先を確認	Max Boettinger, David Klotz	(参考訳) サーキットモータースポーツの分野では、レース戦略はレースの結果を決定する上で重要な役割を果たす。この戦略は燃料消費とタイヤ性能の劣化のために必要となるピット停止のタイミングに焦点を当てている。レース戦略の目的は、タイヤ交換や燃料補給のようなピットストップの利点と、ピットレーンで発生する時間損失のバランスをとることである。現在のレースシミュレーションは、最善のレース戦略を推定するために使用され、粒度、確率的事象のモデル化、インラップの手動入力を必要とする。本稿では,gtレーシングに適した新しいシミュレーションモデルを開発し,戦略決定の自動化に人工知能を活用することで,これらの制約に対処する。 openaiのジムフレームワークとシミュレーションを統合することで、強化学習環境が作成され、エージェントが訓練される。実験パラメータ検証のために,2020 N\"urburgring Langstrecken Serie の時系列データをもとに,様々なハイパーパラメータ構成,観測空間,報酬関数を評価した。その結果、訓練されたエージェントがピット停止タイミングと燃料補給量に関する合理的な判断を行うため、レース戦略決定を改善するための強化学習の可能性を示した。学習率、腐敗率、エピソード数といった重要なパラメータは重要な要因として特定され、燃料質量と現在の人種位置の組み合わせは政策開発に最も有効であることが証明される。この論文は、レースシミュレーションにおける強化学習の幅広い応用に寄与し、特にgtレーシング領域においてfia formula~1を超えるレース戦略最適化の可能性を切り開く。 In the realm of circuit motorsports, race strategy plays a pivotal role in determining race outcomes. This strategy focuses on the timing of pit stops, which are necessary due to fuel consumption and tire performance degradation. The objective of race strategy is to balance the advantages of pit stops, such as tire replacement and refueling, with the time loss incurred in the pit lane. Current race simulations, used to estimate the best possible race strategy, vary in granularity, modeling of probabilistic events, and require manual input for in-laps. This paper addresses these limitations by developing a novel simulation model tailored to GT racing and leveraging artificial intelligence to automate strategic decisions. By integrating the simulation with OpenAI's Gym framework, a reinforcement learning environment is created and an agent is trained. The study evaluates various hyperparameter configurations, observation spaces, and reward functions, drawing upon historical timing data from the 2020 N\"urburgring Langstrecken Serie for empirical parameter validation. The results demonstrate the potential of reinforcement learning for improving race strategy decision-making, as the trained agent makes sensible decisions regarding pit stop timing and refueling amounts. Key parameters, such as learning rate, decay rate and the number of episodes, are identified as crucial factors, while the combination of fuel mass and current race position proves most effective for policy development. The paper contributes to the broader application of reinforcement learning in race simulations and unlocks the potential for race strategy optimization beyond FIA Formula~1, specifically in the GT racing domain.	翻訳日:2023-06-29 14:33:50 公開日:2023-06-28
# 生涯変化検出: ロボットナビゲーションにおける小物体変化検出のための連続領域適応 Lifelong Change Detection: Continuous Domain Adaptation for Small Object Change Detection in Every Robot Navigation ( http://arxiv.org/abs/2306.16086v1 ) ライセンス: Link先を確認	Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura	(参考訳) 最近発表されたロボット工学の研究領域である地景変化検出は、視覚の不確かさと複雑な非線形視点の投影が組み合わさって、その不適切さに苦しめられている。不適切さを正則化するために、一般的に適用される教師付き学習方法(cscd-netなど)は、手作業で注釈付き高品質なオブジェクトクラス固有の優先順位に依存する。本稿では,手動アノテーションが利用できない汎用アプリケーションドメインについて検討し,完全な自己監督アプローチを提案する。本手法は,日常のロボットナビゲーションにおいて検出される物体の変化を,将来の変化検出タスクを改善するために,追加の事前として再利用できるという,強力で汎用的な考え方を採用する。さらに、グラウンダービューの小さなオブジェクト変更検出という、新しい挑戦的な実践シナリオにおいて、堅牢化フレームワークを実装し、実験的に検証する。 The recently emerging research area in robotics, ground view change detection, suffers from its ill-posed-ness because of visual uncertainty combined with complex nonlinear perspective projection. To regularize the ill-posed-ness, the commonly applied supervised learning methods (e.g., CSCD-Net) rely on manually annotated high-quality object-class-specific priors. In this work, we consider general application domains where no manual annotation is available and present a fully self-supervised approach. The present approach adopts the powerful and versatile idea that object changes detected during everyday robot navigation can be reused as additional priors to improve future change detection tasks. Furthermore, a robustified framework is implemented and verified experimentally in a new challenging practical application scenario: ground-view small object change detection.	翻訳日:2023-06-29 14:33:22 公開日:2023-06-28
# INSTA-BEEER: 高速かつ高精度なオブジェクトインスタンスセグメンテーションのための明示的なエラー推定とリファインメント INSTA-BEEER: Explicit Error Estimation and Refinement for Fast and Accurate Unseen Object Instance Segmentation ( http://arxiv.org/abs/2306.16132v1 ) ライセンス: Link先を確認	Seunghyeok Back, Sangbeom Lee, Kangmin Kim, Joosoon Lee, Sungho Shin, Jaemo Maeng, Kyoobin Lee	(参考訳) ロボット操作には、見えない物体の効率的かつ正確なセグメンテーションが不可欠である。しかし、過度あるいは過度なセグメンテーションのため、依然として困難である。既存の改良法はセグメンテーション品質を向上させることができるが、小さな境界エラーのみを修正できるか、十分に高速ではない。本研究では,インスタンスの追加と削除,および境界のシャープ化を可能にする改良モデルであるINSTA-BEEER(INSTAnce boundary Explicit Error Estimation and Refinement)を提案する。このモデルは、エラー推定-then-refinementスキームを利用して、最初に、最初のセグメンテーションでインスタンス境界の真正、真負、偽正、偽負のピクセルのピクセル境界の明示的なエラーを推定する。その後、これらの誤差推定をガイダンスとして、初期セグメンテーションを洗練する。実験により,提案モデルによりセグメント化が著しく向上し,最先端性能が達成された。さらに、高速ランタイム(0.1秒未満)で、モデルは様々な初期セグメンテーションメソッドのパフォーマンスを一貫して改善し、実用的なロボットアプリケーションに適している。 Efficient and accurate segmentation of unseen objects is crucial for robotic manipulation. However, it remains challenging due to over- or under-segmentation. Although existing refinement methods can enhance the segmentation quality, they fix only minor boundary errors or are not sufficiently fast. In this work, we propose INSTAnce Boundary Explicit Error Estimation and Refinement (INSTA-BEEER), a novel refinement model that allows for adding and deleting instances and sharpening boundaries. Leveraging an error-estimation-then-refinement scheme, the model first estimates the pixel-wise boundary explicit errors: true positive, true negative, false positive, and false negative pixels of the instance boundary in the initial segmentation. It then refines the initial segmentation using these error estimates as guidance. Experiments show that the proposed model significantly enhances segmentation, achieving state-of-the-art performance. Furthermore, with a fast runtime (less than 0.1 s), the model consistently improves performance across various initial segmentation methods, making it highly suitable for practical robotic applications.	翻訳日:2023-06-29 14:27:42 公開日:2023-06-28
# 位置認識型対向パッチの分布モデル Distributional Modeling for Location-Aware Adversarial Patches ( http://arxiv.org/abs/2306.16131v1 ) ライセンス: Link先を確認	Xingxing Wei, Shouwei Ruan, Yinpeng Dong and Hang Su	(参考訳) 敵パッチは、物理的な世界で敵攻撃を行う重要な形態の1つである。既存の敵パッチの自然性と攻撃性を改善するために,対象オブジェクト上のパッチの位置を最適化プロセスに統合して攻撃を行う位置認識パッチを提案する。効果的ではあるが、パッチを配置するための最適な場所を効率的に見つけることは、特にブラックボックス攻撃設定下では困難である。本稿では,分散最適化型逆数パッチ(DOPatch, Distribution-Optimized Adversarial Patch)を提案する。第一に、異なるモデルにまたがる場所の分布がかなり似ていることを発見し、サーロゲートモデルに最適化された分布的プリミティブを使用して、未発見のモデルに対して効率的なクエリベースの攻撃を実現できる。第二に、DOPatchは、敵位置の分布を特徴付けることにより、多様な敵のサンプルを生成することができる。そこで我々は,DOP-DMAT (Distributal-Modeling Adversarial Training) を慎重に設計することで,位置対応パッチに対するモデルの堅牢性を向上させることができる。顔認識および画像認識タスクにおけるDOPatchの評価を行い、既存の手法よりも優れた性能と効率性を示す。また,本手法の有効性を検証し,敵位置の分布に関する知見を提供するため,広範囲にわたるアブレーション研究と分析を行った。 Adversarial patch is one of the important forms of performing adversarial attacks in the physical world. To improve the naturalness and aggressiveness of existing adversarial patches, location-aware patches are proposed, where the patch's location on the target object is integrated into the optimization process to perform attacks. Although it is effective, efficiently finding the optimal location for placing the patches is challenging, especially under the black-box attack settings. In this paper, we propose the Distribution-Optimized Adversarial Patch (DOPatch), a novel method that optimizes a multimodal distribution of adversarial locations instead of individual ones. DOPatch has several benefits: Firstly, we find that the locations' distributions across different models are pretty similar, and thus we can achieve efficient query-based attacks to unseen models using a distributional prior optimized on a surrogate model. Secondly, DOPatch can generate diverse adversarial samples by characterizing the distribution of adversarial locations. Thus we can improve the model's robustness to location-aware patches via carefully designed Distributional-Modeling Adversarial Training (DOP-DMAT). We evaluate DOPatch on various face recognition and image recognition tasks and demonstrate its superiority and efficiency over existing methods. We also conduct extensive ablation studies and analyses to validate the effectiveness of our method and provide insights into the distribution of adversarial locations.	翻訳日:2023-06-29 14:27:22 公開日:2023-06-28
# MLSMM:機械学習セキュリティ成熟度モデル MLSMM: Machine Learning Security Maturity Model ( http://arxiv.org/abs/2306.16127v1 ) ライセンス: Link先を確認	Felix Jedrzejewski, Davide Fucci, Oleksandr Adamov	(参考訳) 機械学習(ML)ベースのソフトウェアコンポーネントの開発におけるセキュリティプラクティスの成熟度を評価することは、従来のソフトウェア開発ほど注目されていない。本稿では,ML開発ライフサイクルに沿ってセキュリティプラクティスを整理し,それぞれが3段階の成熟度を確立する機械学習セキュリティ成熟度モデル(MLSMM)を提案する。我々は,MLSMMを産業と学界の緊密な連携の一歩として想定する。 Assessing the maturity of security practices during the development of Machine Learning (ML) based software components has not gotten as much attention as traditional software development. In this Blue Sky idea paper, we propose an initial Machine Learning Security Maturity Model (MLSMM) which organizes security practices along the ML-development lifecycle and, for each, establishes three levels of maturity. We envision MLSMM as a step towards closer collaboration between industry and academia.	翻訳日:2023-06-29 14:26:58 公開日:2023-06-28
# 自動転写表データのより効率的な手作業レビュー More efficient manual review of automatically transcribed tabular data ( http://arxiv.org/abs/2306.16126v1 ) ライセンス: Link先を確認	Bj{\o}rn-Richard Pedersen, Rigmor Katrine Johansen, Einar Holsb{\o}, Hilde Sommerseth, Lars Ailo Bongo	(参考訳) 機械学習手法は、歴史的データの書き起こしに有用であることが証明されている。しかし、精度の高い手法による結果には手動による検証と修正が必要である。このような手作業によるレビューは, 時間と費用がかかるため, より効率的に行うことが目的である。以前は、ノルウェーの1950年国勢調査(97%)から230万個の手書きの職業コードを書き起こすのに機械学習を使いました。モデルの信頼性が最も低い90,000 (3%) のコードを手作業でレビューしました。 9万のコードを人間のレビュアーに割り当て、アノテーションツールを使ってコードをレビューしました。レビューア合意を評価するために、いくつかのコードは複数のレビューアに割り当てられた。そして、レビュー結果を分析して、精度の改善と努力の関係を理解する。さらに、ワークフローを改善するためにレビュアーにインタビューしました。レビュアーはラベルの62.8%を修正し、31.9%のケースでモデルラベルに同意した。画像の約0.2%はラベルを割り当てられず、5.1%はレビュアーが不確実か、または無効なラベルを割り当てられた。 9000枚の画像は、複数のレビュアーによって独立にレビューされ、86.43%の合意と8.96%の不一致が得られた。私たちの自動転写は、最も頻度の高いコードに対して偏りがあり、最も低い頻度のコードに対して高い誤分類があることが分かりました。インタビューの結果,レビュアーは内部品質管理を行い,カスタムツールが適していることがわかった。したがって、レビュアーは1人だけですが、不確実性を報告すべきです。 Machine learning methods have proven useful in transcribing historical data. However, results from even highly accurate methods require manual verification and correction. Such manual review can be time-consuming and expensive, therefore the objective of this paper was to make it more efficient. Previously, we used machine learning to transcribe 2.3 million handwritten occupation codes from the Norwegian 1950 census with high accuracy (97%). We manually reviewed the 90,000 (3%) codes with the lowest model confidence. We allocated those 90,000 codes to human reviewers, who used our annotation tool to review the codes. To assess reviewer agreement, some codes were assigned to multiple reviewers. We then analyzed the review results to understand the relationship between accuracy improvements and effort. Additionally, we interviewed the reviewers to improve the workflow. The reviewers corrected 62.8% of the labels and agreed with the model label in 31.9% of cases. About 0.2% of the images could not be assigned a label, while for 5.1% the reviewers were uncertain, or they assigned an invalid label. 9,000 images were independently reviewed by multiple reviewers, resulting in an agreement of 86.43% and disagreement of 8.96%. We learned that our automatic transcription is biased towards the most frequent codes, with a higher degree of misclassification for the lowest frequency codes. Our interview findings show that the reviewers did internal quality control and found our custom tool well-suited. So, only one reviewer is needed, but they should report uncertainty.	翻訳日:2023-06-29 14:26:50 公開日:2023-06-28
# ソーシャルメディア上でうつ病を識別するためのフレームワーク: mentalriskes@iberlef 2023 A Framework for Identifying Depression on Social Media: MentalRiskES@IberLEF 2023 ( http://arxiv.org/abs/2306.16125v1 ) ライセンス: Link先を確認	Simon Sanchez Viloria, Daniel Peix del R\'io, Rub\'en Berm\'udez Cabo, Guillermo Arturo Arrojo Fuentes, Isabel Segura-Bedmar	(参考訳) 本稿では,IberLEF 2023におけるMentalRiskESタスクへの参加について述べる。そのタスクは、ソーシャルメディアの活動に基づいて、抑うつを経験する個人の可能性を予測することであった。データセットは、175人のテレグラムユーザーの会話から成り、それぞれが障害に苦しむ証拠に従ってラベル付けされた。従来の機械学習とディープラーニングを組み合わせることで、バイナリ分類、単純な回帰、マルチクラス分類、マルチクラス回帰という4つの予測サブタスクを解くことができた。マルチクラスの回帰ケースを解くためにモデルをトレーニングし、他の3つのサブタスクで動作するように予測を変換することで、この問題に対処した。 BERTモデルを微調整し、文の埋め込みを線形回帰器への入力として使用し、後者がより良い結果を得る2つの異なるモデリング手法の性能を比較した。結果はhttps://github.com/simonsanvil/earlydepression-mentalriskesで再現できます。 This paper describes our participation in the MentalRiskES task at IberLEF 2023. The task involved predicting the likelihood of an individual experiencing depression based on their social media activity. The dataset consisted of conversations from 175 Telegram users, each labeled according to their evidence of suffering from the disorder. We used a combination of traditional machine learning and deep learning techniques to solve four predictive subtasks: binary classification, simple regression, multiclass classification, and multiclass regression. We approached this by training a model to solve the multiclass regression case and then transforming the predictions to work for the other three subtasks. We compare the performance of two different modeling approaches: fine-tuning a BERT-based model and using sentence embeddings as inputs to a linear regressor, with the latter yielding better results. The code to reproduce our results can be found at: https://github.com/simonsanvil/EarlyDepression-MentalRiskES.	翻訳日:2023-06-29 14:26:27 公開日:2023-06-28
# コントラストの識別を促進する意味ポジティブペア Semantic Positive Pairs for Enhancing Contrastive Instance Discrimination ( http://arxiv.org/abs/2306.16122v1 ) ライセンス: Link先を確認	Mohammad Alkhalefi, Georgios Leontidis, Mingjun Zhong	(参考訳) インスタンス識別に基づく自己教師付き学習アルゴリズムは,表現の崩壊を効果的に防止し,表現学習に有望な結果をもたらす。しかし、組込み空間において正のペア(すなわち同じインスタンスの2つのビュー)を引き付け、それらのカテゴリに関係なく他のすべてのインスタンス(すなわち負のペア)を撃退するプロセスは、重要な特徴を破棄する。そこで本研究では,類似した意味的内容を持つ画像を特定し,それを正のインスタンスとして扱う手法であるspps(semantic positive pairs set)を提案し,表現学習中に重要な特徴を捨てるリスクを低減させる。このアプローチは、SimCLRやMOCOのような対照的なインスタンス識別フレームワークでも機能します。我々は、ImageNet、STL-10、CIFAR-10の3つのデータセットで実験を行い、我々のアプローチを評価する。実験結果から,本手法は3つのデータセットのベースライン手法であるvanilla SimCLRよりも一貫して優れており,例えば,バッチサイズ1024および800エポックのImageNet上では,線形評価プロトコル下でのvanilla SimCLRを4.18%改善する。 Self-supervised learning algorithms based on instance discrimination effectively prevent representation collapse and produce promising results in representation learning. However, the process of attracting positive pairs (i.e., two views of the same instance) in the embedding space and repelling all other instances (i.e., negative pairs) irrespective of their categories could result in discarding important features. To address this issue, we propose an approach to identifying those images with similar semantic content and treating them as positive instances, named semantic positive pairs set (SPPS), thereby reducing the risk of discarding important features during representation learning. Our approach could work with any contrastive instance discrimination framework such as SimCLR or MOCO. We conduct experiments on three datasets: ImageNet, STL-10 and CIFAR-10 to evaluate our approach. The experimental results show that our approach consistently outperforms the baseline method vanilla SimCLR across all three datasets; for example, our approach improves upon vanilla SimCLR under linear evaluation protocol by 4.18% on ImageNet with a batch size 1024 and 800 epochs.	翻訳日:2023-06-29 14:26:12 公開日:2023-06-28
# gp経路のディープラーニングアルゴリズムを用いたnhsトラストにおける正常胸部x線撮影の実世界性能 Real-World Performance of Autonomously Reporting Normal Chest Radiographs in NHS Trusts Using a Deep-Learning Algorithm on the GP Pathway ( http://arxiv.org/abs/2306.16115v1 ) ライセンス: Link先を確認	Jordan Smith, Tom Naunton Morgan, Paul Williams, Qaiser Malik, Simon Rasalingham	(参考訳) AIMは、現在、2つのNHSトラストにおいて診断決定支援ソフトウェアとしてデプロイされているディープラーニング(DL)アルゴリズムの性能を分析し、アクティブな臨床経路における正常な胸部X線を識別する。材料と方法 2022年12月からサマセット NHS Foundation Trust (SFT) に、2023年3月からCalderdale & Huddersfield NHS Foundation Trust (CHFT) にDLアルゴリズムが配備されている。このアルゴリズムは展開前に開発・訓練され、gp要求胸部x線(cxr)に異常点を割り当てるために使用される。このアルゴリズムは、最も低い異常スコアを持つ試験のサブセットをHigh Confidence Normal (HCN) に分類し、その結果をTrustに表示する。この2段階の研究には、アルゴリズムによって6週間にわたって処理された4,654のcxr連続検査が含まれる。その結果,評価試験の20.0%(930)をhcnとして分類すると,負の予測値(npv)0.96。検査の0.77% (36) が誤ってhcnと分類され、検査医による臨床的に有意な異常は認められなかった。 DLソフトウェアは臨床医への迅速なサービス水準を維持し、平均7.1秒でTrustsに返却された。結論 dlアルゴリズムは低いエラー率で動作し、cxrのサブセットを正常に高信頼で自動報告するために使用される自動診断意思決定支援ツールとして非常に有効である。すべてのcxrの20%を取り除き、レポーターの作業負荷を削減し、放射線部門が他の場所でリソースに集中できるようにする。 AIM To analyse the performance of a deep-learning (DL) algorithm currently deployed as diagnostic decision support software in two NHS Trusts used to identify normal chest x-rays in active clinical pathways. MATERIALS AND METHODS A DL algorithm has been deployed in Somerset NHS Foundation Trust (SFT) since December 2022, and at Calderdale & Huddersfield NHS Foundation Trust (CHFT) since March 2023. The algorithm was developed and trained prior to deployment, and is used to assign abnormality scores to each GP-requested chest x-ray (CXR). The algorithm classifies a subset of examinations with the lowest abnormality scores as High Confidence Normal (HCN), and displays this result to the Trust. This two-site study includes 4,654 CXR continuous examinations processed by the algorithm over a six-week period. RESULTS When classifying 20.0% of assessed examinations (930) as HCN, the model classified exams with a negative predictive value (NPV) of 0.96. There were 0.77% of examinations (36) classified incorrectly as HCN, with none of the abnormalities considered clinically significant by auditing radiologists. The DL software maintained fast levels of service to clinicians, with results returned to Trusts in a mean time of 7.1 seconds. CONCLUSION The DL algorithm performs with a low rate of error and is highly effective as an automated diagnostic decision support tool, used to autonomously report a subset of CXRs as normal with high confidence. Removing 20% of all CXRs reduces workload for reporters and allows radiology departments to focus resources elsewhere.	翻訳日:2023-06-29 14:25:49 公開日:2023-06-28
# 境界条件の異なる円点における磁場の影響の量子情報理論 Quantum-information theory of magnetic field influence on circular dots with different boundary conditions ( http://arxiv.org/abs/2306.16114v1 ) ライセンス: Link先を確認	H. Shafeekali, O. Olendski	(参考訳) 横一様磁場 $\bf b$ の位置 (subscript $\rho$) と運動量 (\gamma$) に対するシャノン量子情報エントロピー $s_{\rho,\gamma}$, fisher informations $i_{\rho,\gamma}$, informational energies $o_{\rho,\gamma}$ および情報エネルギー $o_{\rho,\gamma}$ の影響は、円周がジリクレとノイマン境界条件 (bc) のいずれかをサポートする2次元円形量子ドット (qds) に対して理論的に研究されている。解析により、磁場と表面相互作用の構造特性に対する類似性と影響の相違が明らかになった。スペクトル間の顕著な区別は、同じ放射量子数$n$と隣接する非正角指数$m$でノイマンエネルギーの誘導が増加するときの交差である。 b$が増加すると、どちらのシステムも、その特性が一様場となるとランダウ凝縮を行う。例えば、ディリクレ和 $s_{\rho_{00}}+s_{\gamma_{00}} は、上から基本限界 2(1+\ln\pi)$ へのアプローチにおいて、対応するノイマン量よりも少なくとも $b$ である。広く受け入れられている不平衡不確かさ関係 $o_\rho o_\gamma\leq(2\pi)^{-\mathtt{d}}$ と$\mathtt{d}$ が系の次元であることは、磁場中のノイマン qd によって破られることを指摘した。静電高調波閉じ込めとの比較を行う。物理的解釈は2つのbcの異なる役割とフィールドとの相互作用に基づいている: ディリクレ(ノイマン)曲面は反発的(引き込み的)なインターフェースである。 Influence of the transverse uniform magnetic field $\bf B$ on position (subscript $\rho$) and momentum ($\gamma$) Shannon quantum-information entropies $S_{\rho,\gamma}$, Fisher informations $I_{\rho,\gamma}$ and informational energies $O_{\rho,\gamma}$ is studied theoretically for the 2D circular quantum dots (QDs) whose circumference supports homogeneous either Dirichlet or Neumann boundary condition (BC). Analysis reveals similarities and differences of the influence on the properties of the structure of the surface interaction with the magnetic field. Conspicuous distinction between the spectra are crossings at the increasing induction of the Neumann energies with the same radial quantum number $n$ and adjacent non-positive angular indices $m$. At the growing $B$, either system undergoes Landau condensation when its characteristics turn into their uniform field counterparts. For the Dirichlet system this transformation takes place at the smaller magnetic intensities; e.g., the Dirichlet sum $S_{\rho_{00}}+S_{\gamma_{00}}$ on its approach from above to a fundamental limit $2(1+\ln\pi)$ is at any $B$ smaller than the corresponding Neumann quantity what physically means that the former geometry provides more total information about the position and motion of the particle. It is pointed out that the widely accepted disequilibrium uncertainty relation $O_\rho O_\gamma\leq(2\pi)^{-\mathtt{d}}$, with $\mathtt{d}$ being a dimensionality of the system, is violated by the Neumann QD in the magnetic field. Comparison with electrostatic harmonic confinement is performed. Physical interpretation is based on the different roles of the two BCs and their interplay with the field: Dirichlet (Neumann) surface is a repulsive (attractive) interface.	翻訳日:2023-06-29 14:25:17 公開日:2023-06-28
# 最適時間変数学習における時間正規化 Time Regularization in Optimal Time Variable Learning ( http://arxiv.org/abs/2306.16111v1 ) ライセンス: Link先を確認	Evelyn Herberg and Roland Herzog and Frederik K\"ohne	(参考訳) 近年、arXiv:2204.08528では、ディープニューラルネットワーク(DNN)における最適時変学習が導入されている。この写本では、離散力学系の時間軸に直接関係する正規化項を導入することで概念を拡張している。さらに,Residual Neural Networks (ResNets) に対する適応型プルーニング手法を提案する。この結果は、よく知られたMNISTとFashion MNISTデータセットの分類タスクに提案された概念を適用することで説明される。 pytorchコードはhttps://github.com/frederikkoehne/time_variable_learningで利用できます。 Recently, optimal time variable learning in deep neural networks (DNNs) was introduced in arXiv:2204.08528. In this manuscript we extend the concept by introducing a regularization term that directly relates to the time horizon in discrete dynamical systems. Furthermore, we propose an adaptive pruning approach for Residual Neural Networks (ResNets), which reduces network complexity without compromising expressiveness, while simultaneously decreasing training time. The results are illustrated by applying the proposed concepts to classification tasks on the well known MNIST and Fashion MNIST data sets. Our PyTorch code is available on https://github.com/frederikkoehne/time_variable_learning.	翻訳日:2023-06-29 14:24:31 公開日:2023-06-28
# 高速マーチングエネルギーCNN Fast Marching Energy CNN ( http://arxiv.org/abs/2306.16109v1 ) ライセンス: Link先を確認	Nicolas Makaroff, Th\'eo Bertrand and Laurent D. Cohen	(参考訳) 測地距離とそれらが伝達する幾何学的情報を活用することは、イメージングにおける多くのデータ指向アプリケーションにとって鍵となる。測地線距離計算は、画像ベースメトリクスを用いた画像セグメンテーションに長く使われてきた。我々は、CNNを用いて問題に適応した等方的リーマン計量を生成し、アプリケーションの例を示す新しい方法を提案する。次に、cnnで出力される計量ポテンシャルで計算された測地線距離の単位球としての脳腫瘍のセグメンテーションに適用し、出力マスクに幾何学的および位相的制約を与える。測地的距離モジュールは機械学習フレームワークでうまく機能し、幾何学的および位相的特性を確保しつつ最先端のパフォーマンスを達成するために使用できることを示す。 Leveraging geodesic distances and the geometrical information they convey is key for many data-oriented applications in imaging. Geodesic distance computation has been used for long for image segmentation using Image based metrics. We introduce a new method by generating isotropic Riemannian metrics adapted to a problem using CNN and give as illustrations an example of application. We then apply this idea to the segmentation of brain tumours as unit balls for the geodesic distance computed with the metric potential output by a CNN, thus imposing geometrical and topological constraints on the output mask. We show that geodesic distance modules work well in machine learning frameworks and can be used to achieve state-of-the-art performances while ensuring geometrical and/or topological properties.	翻訳日:2023-06-29 14:24:19 公開日:2023-06-28
# SkillNet-X: わずかに活性化された多言語マルチタスクモデル SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills ( http://arxiv.org/abs/2306.16176v1 ) ライセンス: Link先を確認	Zhangyin Feng, Yong Dai, Fan Zhang, Duyu Tang, Xiaocheng Feng, Shuangzhi Wu, Bing Qin, Yunbo Cao and Shuming Shi	(参考訳) 従来のマルチタスク学習手法は、基本的にタスクや言語に関する共通知識のみを活用でき、言語横断知識やクロスタスク知識が失われる。本稿では,skillnet-xと呼ばれる汎用多言語マルチタスクモデルを提案する。この目的のために、複数の言語固有のスキルとタスク固有のスキルを定義し、それぞれがスキルモジュールに対応する。 skillnet-xは、ターゲットタスクまたはターゲット言語に関連するスキルモジュールの一部をスパースに活性化する。知識伝達ハブとして機能するスキルモジュールは、タスク関連知識と言語関連知識を連続的に吸収することができる。トランスを基盤として,マルチヘッドアテンション層とフィードフォワードネットワーク層を変更し,スキルモジュールに対応する。我々はSkillNet-Xを4言語で11の自然言語理解データセット上で評価した。その結果,SkillNet-Xはタスク固有のベースラインと2つのマルチタスク学習ベースライン(密接な関節モデルとMixture-of-Expertsモデル)よりも優れた性能を示した。さらに、スキル事前トレーニングは、ほぼすべてのデータセット上でSkillNet-Xのパフォーマンスをさらに向上させる。モデルの一般化を検討するために,2つの新しいタスクについて実験を行い,skillnet-xがベースラインを大きく上回ることを確認した。 Traditional multitask learning methods basically can only exploit common knowledge in task- or language-wise, which lose either cross-language or cross-task knowledge. This paper proposes a general multilingual multitask model, named SkillNet-X, which enables a single model to tackle many different tasks from different languages. To this end, we define several language-specific skills and task-specific skills, each of which corresponds to a skill module. SkillNet-X sparsely activates parts of the skill modules which are relevant either to the target task or the target language. Acting as knowledge transit hubs, skill modules are capable of absorbing task-related knowledge and language-related knowledge consecutively. Based on Transformer, we modify the multi-head attention layer and the feed forward network layer to accommodate skill modules. We evaluate SkillNet-X on eleven natural language understanding datasets in four languages. Results show that SkillNet-X performs better than task-specific baselines and two multitask learning baselines (i.e., dense joint model and Mixture-of-Experts model). Furthermore, skill pre-training further improves the performance of SkillNet-X on almost all datasets. To investigate the generalization of our model, we conduct experiments on two new tasks and find that SkillNet-X significantly outperforms baselines.	翻訳日:2023-06-29 14:16:58 公開日:2023-06-28
# $\mathbf{c}^2$former:rgb赤外物体検出のための校正および補完トランスフォーマー $\mathbf{C}^2$Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection ( http://arxiv.org/abs/2306.16175v1 ) ライセンス: Link先を確認	Maoxun Yuan, Xingxing Wei	(参考訳) 可視(rgb)および赤外線(ir)画像上の物体検出は、時間前後のアプリケーションのロバストな検出を容易にする新たなソリューションとして、近年広く注目を集めている。赤外線画像の助けを借りて、オブジェクト検出器はRGB-IR複合情報を使用することにより、実用上より信頼性が高く、堅牢である。しかし、既存の手法は相反性ミスカバリレーションや核融合インプレシジョンの問題に苦しんでいる。本稿では,異なる特徴間のペア関係をモデル化する強力な能力を有するため,これら2つの問題に同時に対処するために,$\mathrm{C}^2$Former という新しいキャリブレーション・補完変換器を提案する。 rgb と ir モダリティの相互接続関係を学習し,そのキャリブレーションと相補的特徴を得るために,$\mathrm{c}^2$former で相互接続(inter-modality cross-attention,ica)モジュールを設計する。 ICAにおけるグローバルアテンションの計算による計算コストを低減するため、特徴写像の次元を小さくするために、適応特徴サンプリング(AFS)モジュールが導入された。 $\mathrm{C}^2$Formerは機能ドメインで機能するため、バックボーンネットワークを介して既存のRGB-IRオブジェクト検出器に組み込むことができる。したがって,1つの単段と2つの2段階の物体検出器に,我々の$\mathrm{C}^2$Formerを組み込んで,その有効性と汎用性を評価する。本研究では,DroneVehicle と KAIST RGB-IR データセットの広範な実験により,RGB-IR 補完情報を完全に活用し,ロバストな検出結果が得られることを確認した。コードはhttps://github.com/yuanmaoxun/Calibrated-and-Complementary-Transformer-for-RGB-Infrared-Object-Detec tion.gitで公開されている。 Object detection on visible (RGB) and infrared (IR) images, as an emerging solution to facilitate robust detection for around-the-clock applications, has received extensive attention in recent years. With the help of IR images, object detectors have been more reliable and robust in practical applications by using RGB-IR combined information. However, existing methods still suffer from modality miscalibration and fusion imprecision problems. Since transformer has the powerful capability to model the pairwise correlations between different features, in this paper, we propose a novel Calibrated and Complementary Transformer called $\mathrm{C}^2$Former to address these two problems simultaneously. In $\mathrm{C}^2$Former, we design an Inter-modality Cross-Attention (ICA) module to obtain the calibrated and complementary features by learning the cross-attention relationship between the RGB and IR modality. To reduce the computational cost caused by computing the global attention in ICA, an Adaptive Feature Sampling (AFS) module is introduced to decrease the dimension of feature maps. Because $\mathrm{C}^2$Former performs in the feature domain, it can be embedded into existed RGB-IR object detectors via the backbone network. Thus, one single-stage and one two-stage object detector both incorporating our $\mathrm{C}^2$Former are constructed to evaluate its effectiveness and versatility. With extensive experiments on the DroneVehicle and KAIST RGB-IR datasets, we verify that our method can fully utilize the RGB-IR complementary information and achieve robust detection results. The code is available at https://github.com/yuanmaoxun/Calibrated-and-Complementary-Transformer-for-RGB-Infrared-Object-Detec tion.git.	翻訳日:2023-06-29 14:16:34 公開日:2023-06-28
# ソースコードの類似度測定とクローン検出に関する体系的文献レビュー--技術,応用,課題 A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges ( http://arxiv.org/abs/2306.16171v1 ) ライセンス: Link先を確認	Morteza Zakeri-Nasrabadi and Saeed Parsa and Mohammad Ramezani and Chanchal Roy and Masoud Ekhtiarzadeh	(参考訳) ソースコードの類似度の測定と評価は、コードのレコメンデーション、重複コード、盗作、マルウェア、嗅覚検出など、幅広いアプリケーションを取り入れた、基本的なソフトウェアエンジニアリング活動である。本稿では,コード類似度測定と評価手法に関する体系的な文献レビューとメタ分析を行い,既存手法とその特性を異なる用途で明らかにする。私たちは最初、4つのデジタルライブラリーに問い合わせて100,000以上の記事を見つけました。研究は方法論、プログラミング言語、データセット、ツール、アプリケーションによって分類された。深い調査によると、80のソフトウェアツールがあり、5つのアプリケーションドメインで8つの異なるテクニックで作業している。約49%のツールはjavaプログラムで動作し、37%はcとc++をサポートしているが、多くのプログラミング言語はサポートしていない。注目すべき点は、ソースコードの類似度測定と重複コードに関連する12のデータセットが存在することだ。信頼できるデータセットの欠如、経験的評価、ハイブリッドメソッド、マルチパラダイム言語にフォーカスすることが、この分野の主要な課題である。コード類似度測定の新たな応用は、メンテナンスに加えて開発フェーズに集中する。 Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focuses on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance.	翻訳日:2023-06-29 14:16:00 公開日:2023-06-28
# マルチテラー逆蒸留による精度・ロバスト性トレードオフの緩和 Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation ( http://arxiv.org/abs/2306.16170v1 ) ライセンス: Link先を確認	Shiji Zhao, Xizhe Wang, Xingxing Wei	(参考訳) 敵対的トレーニングは、敵対的攻撃に対するディープニューラルネットワークの堅牢性を改善するための実践的なアプローチである。信頼性の高いロバスト性をもたらすが、クリーンな例に対する性能は敵の訓練後に負の影響を受ける。近年, 対人訓練に知識蒸留法を応用し, 堅牢性向上に競争力を発揮する研究も行われているが, 清浄な試料の精度はいまだに限られている。本稿では, 高いクリーンな教師と強いロバストな教師を用いて, クリーンな事例と敵対的な事例をそれぞれ扱うことで, モデルの逆トレーニングプロセスの指導を行うマルチTeacher Adversarial Robustness Distillation (MTARD)を導入する。最適化の過程では,異なる教師が同様の知識尺度を示すことを保証するために,教師の温度を調整し,教師の情報エントロピーを一定に保つエントロピーベースバランスアルゴリズムを設計する。また,生徒が複数の教師から比較的一貫した学習速度を持つことを保証するため,異なる種類の知識の学習重みを調整できる正規化損失バランスアルゴリズムを提案する。公開データセット上で行われた一連の実験は、MTARDが様々な敵攻撃に対して最先端の敵の訓練と蒸留法より優れていることを示した。 Adversarial training is a practical approach for improving the robustness of deep neural networks against adversarial attacks. Although bringing reliable robustness, the performance toward clean examples is negatively affected after adversarial training, which means a trade-off exists between accuracy and robustness. Recently, some studies have tried to use knowledge distillation methods in adversarial training, achieving competitive performance in improving the robustness but the accuracy for clean samples is still limited. In this paper, to mitigate the accuracy-robustness trade-off, we introduce the Multi-Teacher Adversarial Robustness Distillation (MTARD) to guide the model's adversarial training process by applying a strong clean teacher and a strong robust teacher to handle the clean examples and adversarial examples, respectively. During the optimization process, to ensure that different teachers show similar knowledge scales, we design the Entropy-Based Balance algorithm to adjust the teacher's temperature and keep the teachers' information entropy consistent. Besides, to ensure that the student has a relatively consistent learning speed from multiple teachers, we propose the Normalization Loss Balance algorithm to adjust the learning weights of different types of knowledge. A series of experiments conducted on public datasets demonstrate that MTARD outperforms the state-of-the-art adversarial training and distillation methods against various adversarial attacks.	翻訳日:2023-06-29 14:15:39 公開日:2023-06-28
# エンドツーエンド自動運転のための階層的フェデレーション学習を制約した通信資源 Communication Resources Constrained Hierarchical Federated Learning for End-to-End Autonomous Driving ( http://arxiv.org/abs/2306.16169v1 ) ライセンス: Link先を確認	Wei-Bin Kou, Shuai Wang, Guangxu Zhu, Bin Luo, Yingxian Chen, Derrick Wing Kwan Ng, and Yik-Chung Wu	(参考訳) フェデレートラーニング(FL)はモデル集約によるエンドツーエンド自動運転の一般化を改善する一方、従来のシングルホップFL(SFL)は、車両とクラウドサーバ間の長距離通信による収束が遅い。階層的連合学習(HFL)は、中点エッジサーバの導入によってそのような欠点を克服する。しかし、制約付き通信資源とhfl性能のオーケストレーションが緊急問題となっている。本稿では,ハイブリッドデータとモデル集約を用いた自律運転モデルの一般化誤差を最小限に抑えるために,最適化に基づく通信資源制約付き階層型学習(CRCHFL)フレームワークを提案する。 CRCHFLの有効性をカーラーニング・トゥ・アクト(CARLA)シミュレーションプラットフォームで評価した。その結果,提案するcrchflは収束速度を加速し,連関学習自律運転モデルの一般化を促進することがわかった。さらに、同じ通信資源予算の下では、HFLを10.33%、SFLを12.44%上回っている。 While federated learning (FL) improves the generalization of end-to-end autonomous driving by model aggregation, the conventional single-hop FL (SFL) suffers from slow convergence rate due to long-range communications among vehicles and cloud server. Hierarchical federated learning (HFL) overcomes such drawbacks via introduction of mid-point edge servers. However, the orchestration between constrained communication resources and HFL performance becomes an urgent problem. This paper proposes an optimization-based Communication Resource Constrained Hierarchical Federated Learning (CRCHFL) framework to minimize the generalization error of the autonomous driving model using hybrid data and model aggregation. The effectiveness of the proposed CRCHFL is evaluated in the Car Learning to Act (CARLA) simulation platform. Results show that the proposed CRCHFL both accelerates the convergence rate and enhances the generalization of federated learning autonomous driving model. Moreover, under the same communication resource budget, it outperforms the HFL by 10.33% and the SFL by 12.44%.	翻訳日:2023-06-29 14:15:11 公開日:2023-06-28
# 機械学習のための最適輸送の最近の進歩 Recent Advances in Optimal Transport for Machine Learning ( http://arxiv.org/abs/2306.16156v1 ) ライセンス: Link先を確認	Eduardo Fernandes Montesuma, Fred Ngol\`e Mboula, Antoine Souloumiac	(参考訳) 近年,確率分布の比較と操作のための機械学習の確率的フレームワークとして最適輸送法が提案されている。これはその豊かな歴史と理論に根ざし、生成モデリングや伝達学習といった機械学習の様々な問題に対する新しい解決策を提供してきた。本研究では,2012～2022年における機械学習の最適輸送の貢献について検討し,教師なし,教師なし,転送,強化学習の4つのサブフィールドに注目した。計算最適輸送の最近の発展と機械学習の実践との相互作用をさらに強調する。 Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. This is rooted in its rich history and theory, and has offered new solutions to different problems in machine learning, such as generative modeling and transfer learning. In this survey we explore contributions of Optimal Transport for Machine Learning over the period 2012 -- 2022, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer and reinforcement learning. We further highlight the recent development in computational Optimal Transport, and its interplay with Machine Learning practice.	翻訳日:2023-06-29 14:14:54 公開日:2023-06-28
# 周期的に蹴られた多体力学における予熱 Prethermalization in aperiodically kicked many-body dynamics ( http://arxiv.org/abs/2306.16144v1 ) ライセンス: Link先を確認	Jin Yan, Roderich Moessner and Hongzheng Zhao	(参考訳) 駆動多体システムは通常、エネルギー保存の欠如により加熱を受ける。加熱は時間周期ドライブでは抑制されるが、通常の駆動プロトコルではほとんど知られていない。本研究では,準周期的なトゥエモースや,n$-multipolar temporal correlationsを持つランダムシーケンス群によって駆動される非周期キック系の加熱ダイナミクスについて検討する。高周波状態から離れても複数の加熱チャネルを除去できることを実証した。除去チャネルの数は、多極性順序$n$で増加する。これを古典的な蹴りローターチェーンで説明し、長寿命の予熱状態を見つける。静的ハミルトニアンが運動エネルギーのみを含む場合、先行熱寿命 $t^$ はドライブの時間的相関に強く依存し、キック強度 $t^\sim k^{-2n}$ のパワーロー依存性を持つ。 Driven many-body systems typically experience heating due to the lack of energy conservation. Heating may be suppressed for time-periodic drives, but little is known for less regular drive protocols. In this work, we investigate the heating dynamics in aperiodically kicked systems, specifically those driven by quasi-periodic Thue-Morse or a family of random sequences with $n$-multipolar temporal correlations. We demonstrate that multiple heating channels can be eliminated even away from the high-frequency regime. The number of eliminated channels increases with multipolar order $n$. We illustrate this in a classical kicked rotor chain where we find a long-lived prethermal regime. When the static Hamiltonian only involves the kinetic energy, the prethermal lifetime $t^$ can strongly depend on the temporal correlations of the drive, with a power-law dependence on the kick strength $t^\sim K^{-2n}$, for which we can account using a simple linearization argument.	翻訳日:2023-06-29 14:14:44 公開日:2023-06-28
# ドメイン固有自然言語処理アプリケーション開発のための生成的ユーザエクスペリエンス研究 Generative User-Experience Research for Developing Domain-specific Natural Language Processing Applications ( http://arxiv.org/abs/2306.16143v1 ) ライセンス: Link先を確認	Anastasia Zhukova, Lukas von Sperl, Christian E. Matt, Bela Gipp	(参考訳) ユーザエクスペリエンス(ux)は、ヒューマンコンピュータインタラクション(hci)研究の一部であり、システムユーザに対する直感性、透明性、シンプルさ、信頼の向上に重点を置いている。機械学習(ML)や自然言語処理(NLP)のためのUX研究のほとんどは、データ駆動の方法論に焦点を当てている。さらに、より一般的なUXメソッドは、最初にユーザニーズについて学ぶのとは異なり、システムをユーザユーザビリティに向けて調整する。本稿では,生成UX研究をドメインNLPアプリケーションに組み込む手法を提案する。生成UX研究は、プロトタイプ開発の初期段階、すなわちアイデアと概念評価、およびユーザ価値の変化を評価するための最終段階において、ドメインユーザーを採用する。本ケーススタディでは,プロセス産業における日常業務のドメイン固有意味検索の完全サイクルプロトタイプ開発について報告する。ケーススタディでは、ドメインエキスパートの関与は、NLPアプリケーションに対する関心と信頼を高めます。さらに,狭義のNLPアプリケーションにおいて重要となるデータおよびユーザ主導の機会と制約を,相乗的UX+NLP研究が効率的に検討していることを示す。 User experience (UX) is a part of human-computer interaction (HCI) research and focuses on increasing intuitiveness, transparency, simplicity, and trust for system users. Most of the UX research for machine learning (ML) or natural language processing (NLP) focuses on a data-driven methodology, i.e., it fails to focus on users' requirements, and engages domain users mainly for usability evaluation. Moreover, more typical UX methods tailor the systems towards user usability, unlike learning about the user needs first. The paper proposes a methodology for integrating generative UX research into developing domain NLP applications. Generative UX research employs domain users at the initial stages of prototype development, i.e., ideation and concept evaluation, and the last stage for evaluating the change in user value. In the case study, we report the full-cycle prototype development of a domain-specific semantic search for daily operations in the process industry. Our case study shows that involving domain experts increases their interest and trust in the final NLP application. Moreover, we show that synergetic UX+NLP research efficiently considers data- and user-driven opportunities and constraints, which can be crucial for NLP applications in narrow domains	翻訳日:2023-06-29 14:14:25 公開日:2023-06-28
# 一方向パストレースレンダリングのための神経方向距離場オブジェクト表現 Neural directional distance field object representation for uni-directional path-traced rendering ( http://arxiv.org/abs/2306.16142v1 ) ライセンス: Link先を確認	Annada Prasad Behera and Subhankar Mishra	(参考訳) 合成画像の高速レンダリングは、コンピュータグラフィックスの分野で核となる問題である。パストラッシングのようなレンダリングアルゴリズムは、画像のサイズ、光の反射数、ピクセル当たりのサンプル数などのパラメータに依存しており、所望の画質の画像を取得したい場合、すべてが固定される。また、レンダリングされるシーンのサイズと複雑さにも依存する。レンダリングにおける最大のボトルネックの1つは、特にシーンが非常に大きい場合、シーン内のあるレイの経路にあるオブジェクトをクエリすることである。シーン内のオブジェクトを表すデータ型を変更することで、レンダリング時間を短縮することができるが、シーンの異なる表現ではレンダリングアルゴリズムを変更する必要がある。この論文では a) 対象物の機能的表現として有向距離場を導入する。 b) 有向距離が,ニューラルネットワークとして格納された場合,どのように機能するか,及び (c) 修正されたパストレースアルゴリズムでそのようなオブジェクトをレンダリングする方法。 Faster rendering of synthetic images is a core problem in the field of computer graphics. Rendering algorithms, such as path-tracing is dependent on parameters like size of the image, number of light bounces, number of samples per pixel, all of which, are fixed if one wants to obtain a image of a desired quality. It is also dependent on the size and complexity of the scene being rendered. One of the largest bottleneck in rendering, particularly when the scene is very large, is querying for objects in the path of a given ray in the scene. By changing the data type that represents the objects in the scene, one may reduce render time, however, a different representation of a scene requires the modification of the rendering algorithm. In this paper, (a) we introduce directed distance field, as a functional representation of a object; (b) how the directed distance functions, when stored as a neural network, be optimized and; (c) how such an object can be rendered with a modified path-tracing algorithm.	翻訳日:2023-06-29 14:14:03 公開日:2023-06-28
# 大規模オンライン学習による深層サロゲートモデルのトレーニング Training Deep Surrogate Models with Large Scale Online Learning ( http://arxiv.org/abs/2306.16133v1 ) ライセンス: Link先を確認	Lucas Meyer (EDF R\&D, SINCLAIR AI Lab, DATAMOVE ), Marc Schouler (DATAMOVE ), Robert Alexander Caulk (DATAMOVE ), Alejandro Rib\'es (SINCLAIR AI Lab, EDF R\&D), Bruno Raffin (DATAMOVE )	(参考訳) 部分微分方程式(PDE)の時空間分解は、世界の物理現象の数学的記述において重要な役割を果たす。一般に、科学者や技術者は計算に要求される解法を用いてPDEを数値的に解く。近年,PDEの高速解の代替としてディープラーニングアルゴリズムが登場している。モデルは通常、ソルバが生成した合成データに基づいて訓練され、ディスクに格納され、トレーニングのために読み戻される。本稿では、これらのモデルをトレーニングするために従来の静的データセットを頼りにしているため、ソルバの完全なメリットをデータジェネレータとして利用できないことを主張する。深層サロゲートモデルのためのオープンソースのオンライントレーニングフレームワークを提案する。このフレームワークは、数値シミュレーションとディープニューラルネットワークのトレーニングを同時に行うことに焦点を当てた、いくつかのレベルの並列処理を実装している。このアプローチは、ディスクロードされたデータセットに関連するI/Oとストレージのボトルネックを抑制し、はるかに大きなデータセットのトレーニング方法を開く。実験では、最先端アーキテクチャを含む4つの代理モデルのオフラインおよびオンライントレーニングを比較した。以上の結果から,データセットの多様性を最大数百GBにまで高めることで,モデル一般化能力が向上する可能性が示唆された。フル接続ニューラルネットワーク、フーリエニューラル演算子(FNO)、メッセージパスPDEソルバー予測精度はそれぞれ68%、16%、7%向上している。 The spatiotemporal resolution of Partial Differential Equations (PDEs) plays important roles in the mathematical description of the world's physical phenomena. In general, scientists and engineers solve PDEs numerically by the use of computationally demanding solvers. Recently, deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs. Models are usually trained on synthetic data generated by solvers, stored on disk and read back for training. This paper advocates that relying on a traditional static dataset to train these models does not allow the full benefit of the solver to be used as a data generator. It proposes an open source online training framework for deep surrogate models. The framework implements several levels of parallelism focused on simultaneously generating numerical simulations and training deep neural networks. This approach suppresses the I/O and storage bottleneck associated with disk-loaded datasets, and opens the way to training on significantly larger datasets. Experiments compare the offline and online training of four surrogate models, including state-of-the-art architectures. Results indicate that exposing deep surrogate models to more dataset diversity, up to hundreds of GB, can increase model generalization capabilities. Fully connected neural networks, Fourier Neural Operator (FNO), and Message Passing PDE Solver prediction accuracy is improved by 68%, 16% and 7%, respectively.	翻訳日:2023-06-29 14:13:49 公開日:2023-06-28
# 行動や指示からエージェントを伝達する目標を推測する Inferring the Goals of Communicating Agents from Actions and Instructions ( http://arxiv.org/abs/2306.16207v1 ) ライセンス: Link先を確認	Lance Ying, Tan Zhi-Xuan, Vikash Mansinghka, Joshua B. Tenenbaum	(参考訳) 人間が協力すると、彼らはしばしば言語コミュニケーションと非言語行動の両方を通して活動の協調を行い、この情報を使って共通の目標と計画を立てる。この推論能力をどのようにモデル化するか? 本稿では,一つのエージェントであるプリンシパルが,共有計画に関する自然言語指示を他のエージェントであるアシスタントに伝え,gpt-3を指導発話の確率関数として用いる協調チームのモデルを提案する。次に、第三者のオブザーバが行動や指示からマルチモーダルベイズ逆計画を通じてチームのゴールを推測し、エージェントが行動し、合理的にコミュニケーションして達成することを前提として、目標に対する後方分布を計算する方法を示す。提案手法は,マルチエージェントグリッドワールドにおける人間の目標推定と比較し,モデルの推定が人間の判断と密接に相関していること(R = 0.96)を見出した。また,行動のみからの推論と比較すると,指示がより迅速かつ不確定な目標推論につながり,協調エージェントにとっての言語コミュニケーションの重要性が強調された。 When humans cooperate, they frequently coordinate their activity through both verbal communication and non-verbal actions, using this information to infer a shared goal and plan. How can we model this inferential ability? In this paper, we introduce a model of a cooperative team where one agent, the principal, may communicate natural language instructions about their shared plan to another agent, the assistant, using GPT-3 as a likelihood function for instruction utterances. We then show how a third person observer can infer the team's goal via multi-modal Bayesian inverse planning from actions and instructions, computing the posterior distribution over goals under the assumption that agents will act and communicate rationally to achieve them. We evaluate this approach by comparing it with human goal inferences in a multi-agent gridworld, finding that our model's inferences closely correlate with human judgments (R = 0.96). When compared to inference from actions alone, we also find that instructions lead to more rapid and less uncertain goal inference, highlighting the importance of verbal communication for cooperative agents.	翻訳日:2023-06-29 14:08:05 公開日:2023-06-28
# マルチエージェントチームによる学習の理解を深める Towards a Better Understanding of Learning with Multiagent Teams ( http://arxiv.org/abs/2306.16205v1 ) ライセンス: Link先を確認	David Radke, Kate Larson, Tim Brecht and Kyle Tilbury	(参考訳) 個人学習エージェントのチームは、その部分の合計よりも大きいと長年認識されてきたが、最近の研究によると、より大きなチームは必ずしも小さなものよりも効果的ではない。本稿では,特定のチーム構造が個人学習エージェントの集団に対して効果的な学習を促進する理由と条件について検討する。環境によっては、エージェントが特定の役割を専門化するのに役立ついくつかのチーム構造が、より望ましいグローバルな結果をもたらすことを示している。しかし、大きなチームはコーディネーションを減らすためのクレジット割り当ての課題を作り、大きなチームは小さなチームに比べてパフォーマンスが悪くなります。理論的分析と経験的結果の両方で結論を支持する。 While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why and under which conditions certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, resulting in more favorable global results. However, large teams create credit assignment challenges that reduce coordination, leading to large teams performing poorly compared to smaller ones. We support our conclusions with both theoretical analysis and empirical results.	翻訳日:2023-06-29 14:07:45 公開日:2023-06-28
# 半教師対象検出のための低信頼サンプルマイニング Low-Confidence Samples Mining for Semi-supervised Object Detection ( http://arxiv.org/abs/2306.16201v1 ) ライセンス: Link先を確認	Guandu Liu, Fangyuan Zhang, Tianxiang Pan, Bin Wang	(参考訳) ラベルなしデータからの信頼性の高い擬似ラベルは、半教師付きオブジェクト検出(SSOD)において重要な役割を果たす。しかし、最先端のssodメソッドはすべて信頼度の高い擬似ラベルに依存しており、信頼度の低い貴重な擬似ラベルを無視している。さらに、ラベルなしデータの発掘が不十分なため、リコール率が過度に低くなり、ネットワークトレーニングが損なわれる。本稿では,低信頼擬似ラベルを効率的に利用する新しい低信頼サンプルマイニング(lsm)手法を提案する。具体的には,低分解能な特徴マップを考慮に入れた付加的な擬似情報マイニング(pim)ブランチを開発し,信頼性の高い大規模インスタンスを抽出した。 pimとメインブランチの相補的な予測により、両者を相互に学習して補うための自己蒸留(sd)を更に設計する。一方、上記のアプローチの拡張性により、我々のLSMは、それぞれ Faster-RCNN と Deformable-DETR に適用できる。 MS-COCOベンチマークでは,5%のラベル付け率で最先端手法よりも3.54%のmAP改善を実現している。 Reliable pseudo-labels from unlabeled data play a key role in semi-supervised object detection (SSOD). However, the state-of-the-art SSOD methods all rely on pseudo-labels with high confidence, which ignore valuable pseudo-labels with lower confidence. Additionally, the insufficient excavation for unlabeled data results in an excessively low recall rate thus hurting the network training. In this paper, we propose a novel Low-confidence Samples Mining (LSM) method to utilize low-confidence pseudo-labels efficiently. Specifically, we develop an additional pseudo information mining (PIM) branch on account of low-resolution feature maps to extract reliable large-area instances, the IoUs of which are higher than small-area ones. Owing to the complementary predictions between PIM and the main branch, we further design self-distillation (SD) to compensate for both in a mutually-learning manner. Meanwhile, the extensibility of the above approaches enables our LSM to apply to Faster-RCNN and Deformable-DETR respectively. On the MS-COCO benchmark, our method achieves 3.54% mAP improvement over state-of-the-art methods under 5% labeling ratios.	翻訳日:2023-06-29 14:07:33 公開日:2023-06-28
# 自由度3次元超音波再構成のためのオンライン自己整合型マルチIMU Multi-IMU with Online Self-Consistency for Freehand 3D Ultrasound Reconstruction ( http://arxiv.org/abs/2306.16197v1 ) ライセンス: Link先を確認	Mingyuan Luo, Xin Yang, Zhongnuo Yan, Yuanji Zhang, Junyu Li, Jiongquan Chen, Xindi Hu, Jikuan Qian, Jun Cheng, Dong Ni	(参考訳) 超音波(US)イメージングは臨床診断において一般的なツールであり、安全性、再現性、リアルタイム能力を提供する。 Freehand 3D USは、複雑さを増すことなくスキャンされた領域をより深く理解する技術である。しかし,標高変位と累積誤差の推定は依然として困難であり,画像のみを用いて相対位置を推定することは困難である。複雑さを増すことなく再建性能を向上させるために,外部軽量センサの追加が提案されている。本稿では,複数慣性測定ユニット (imus) を用いた新しいオンライン自己抵抗ネットワーク (oscnet) を提案する。 OSCNetは、複数のIMU情報を融合し、各IMUデータから得られた再構成結果の違いを減らすために、モーダルレベルの自己管理戦略を利用する。さらに,スキャンシーケンスとそのサブシーケンス間の予測結果の階層的一貫性を改善するために,シーケンスレベルの自己一貫性戦略を提案する。複数のスキャン戦術を用いた大規模腕と頸動脈データセットの実験では,oscnetが従来の手法を上回っており,最先端の再構築性能を実現している。 Ultrasound (US) imaging is a popular tool in clinical diagnosis, offering safety, repeatability, and real-time capabilities. Freehand 3D US is a technique that provides a deeper understanding of scanned regions without increasing complexity. However, estimating elevation displacement and accumulation error remains challenging, making it difficult to infer the relative position using images alone. The addition of external lightweight sensors has been proposed to enhance reconstruction performance without adding complexity, which has been shown to be beneficial. We propose a novel online self-consistency network (OSCNet) using multiple inertial measurement units (IMUs) to improve reconstruction performance. OSCNet utilizes a modal-level self-supervised strategy to fuse multiple IMU information and reduce differences between reconstruction results obtained from each IMU data. Additionally, a sequence-level self-consistency strategy is proposed to improve the hierarchical consistency of prediction results among the scanning sequence and its sub-sequences. Experiments on large-scale arm and carotid datasets with multiple scanning tactics demonstrate that our OSCNet outperforms previous methods, achieving state-of-the-art reconstruction performance.	翻訳日:2023-06-29 14:07:16 公開日:2023-06-28
# 動的グラフ知識集約による対話生成の促進 Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation ( http://arxiv.org/abs/2306.16195v1 ) ライセンス: Link先を確認	Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin and Frank Guerin	(参考訳) 外部グラフ知識をニューラルチャットボットモデルに組み込むことは、対話生成の強化に有効であることが証明されている。しかし、従来のグラフニューラルネットワーク(gnns)では、グラフ上のメッセージパッシングはテキストから独立しており、結果としてグラフ表現の隠れ空間はテキストとは異なっている。既存のモデルのこのトレーニング体制は、グラフ知識とテキストの間に意味的なギャップをもたらす。本研究では,知識グラフ強化対話生成のための新しいフレームワークを提案する。疑似ノードを持つマルチホップ知識グラフを動的に構築し,すべてのステップで言語モデルをグラフ内の特徴集約に組み込む。バニラ部分グラフの学習による意味バイアスを回避するため,提案フレームワークは疑似ノード上のグラフの特徴を集約する階層グラフの注意を応用し,グローバルな特徴を得る。したがって、フレームワークはポストと外部のグラフの知識の両方から異質な機能をうまく利用することができる。広範な実験により,対話生成におけるsota(state-of-the-art)ベースラインよりも優れたフレームワークが得られた。さらに,テキストとグラフの知識の両方の表現を凝集することにより,表現学習フレームワークが意味的ギャップを埋めることを示す。さらに、言語モデルは、機能集約プロセスでサブグラフパターンを利用することによって、より有益な応答のために知識トリプルを選択する方法も学んでいます。私たちのコードとリソースはhttps://github.com/tangg555/sabartで利用可能です。 Incorporating external graph knowledge into neural chatbot models has been proven effective for enhancing dialogue generation. However, in conventional graph neural networks (GNNs), message passing on a graph is independent from text, resulting in the graph representation hidden space differing from that of the text. This training regime of existing models therefore leads to a semantic gap between graph knowledge and text. In this study, we propose a novel framework for knowledge graph enhanced dialogue generation. We dynamically construct a multi-hop knowledge graph with pseudo nodes to involve the language model in feature aggregation within the graph at all steps. To avoid the semantic biases caused by learning on vanilla subgraphs, the proposed framework applies hierarchical graph attention to aggregate graph features on pseudo nodes and then attains a global feature. Therefore, the framework can better utilise the heterogeneous features from both the post and external graph knowledge. Extensive experiments demonstrate that our framework outperforms state-of-the-art (SOTA) baselines on dialogue generation. Further analysis also shows that our representation learning framework can fill the semantic gap by coagulating representations of both text and graph knowledge. Moreover, the language model also learns how to better select knowledge triples for a more informative response via exploiting subgraph patterns within our feature aggregation process. Our code and resources are available at https://github.com/tangg555/SaBART.	翻訳日:2023-06-29 14:06:57 公開日:2023-06-28
# 近接近傍相互作用を持つ1次元量子デバイス上でのスピンスクイージングの変分生成 Variational generation of spin squeezing on one-dimensional quantum devices with nearest-neighbor interactions ( http://arxiv.org/abs/2306.16194v1 ) ライセンス: Link先を確認	Zheng-Hang Sun, Yong-Yi Wang, Yu-Ran Zhang, Franco Nori, Heng Fan	(参考訳) スピンスクイーズ状態の効率的な調製は量子化メトロジーにとって重要である。強いスピンスクイーズを生成するための現在のプロトコルは、高次元または長距離の相互作用に依存する。鍵となる課題は、近傍の相互作用しか持たない1次元系のスピンスクイーズを生成する方法である。そこで我々は,この問題を解決するために変分スピンスキーズアルゴリズムを開発した。これらの変分アルゴリズムについて,ディジタル回路とアナログ量子回路の両方を考察する。変分スピンスケージングアルゴリズムの閉最適化ループの後、生成されたスクイージングは、2軸ツイストリングから生成される最強のスクイージングに匹敵する。実験的不完全性の解析により、本研究で提案する変分スピンスキーズアルゴリズムは、近年開発された雑音中規模量子コンピュータにおいて実現可能である。 Efficient preparation of spin-squeezed states is important for quantum-enhanced metrology. Current protocols for generating strong spin squeezing rely on either high dimensionality or long-range interactions. A key challenge is how to generate considerable spin squeezing in one-dimensional systems with only nearest-neighbor interactions. Here, we develop variational spin-squeezing algorithms to solve this problem. We consider both digital and analog quantum circuits for these variational algorithms. After the closed optimization loop of the variational spin-squeezing algorithms, the generated squeezing can be comparable to the strongest squeezing created from two-axis twisting. By analyzing the experimental imperfections, the variational spin-squeezing algorithms proposed in this work are feasible in recent developed noisy intermediate-scale quantum computers.	翻訳日:2023-06-29 14:06:35 公開日:2023-06-28
# 特定知識注入による繊維欠陥分割のための事前学習型大規模視覚モデルの有効性 Effective Transfer of Pretrained Large Visual Model for Fabric Defect Segmentation via Specifc Knowledge Injection ( http://arxiv.org/abs/2306.16186v1 ) ライセンス: Link先を確認	Zhewei Chen, Wai Keung Wong, Zuofeng Zhong, Jinpiao Liao, Ying Qu	(参考訳) 繊維欠陥セグメンテーションは繊維品質管理に不可欠である。それにもかかわらず、高品質なアノテートデータの不足とファブリック欠陥の多様性は、この分野でのディープラーニングの適用に重大な課題をもたらす。これらの要因は、既存のモデルの一般化とセグメンテーション性能を制限し、多様なファブリックタイプや欠陥の複雑さを扱う能力を妨げる。これらの障害を克服するため,本研究では,織物欠陥の専門知識を大規模視覚モデルsegment anything model(sam)に注入する革新的な手法を提案する。ファブリック欠陥関連パラメータのユニークなセットを導入し、訓練することにより、既存のモデルパラメータに広範な修正を加えることなく、ドメイン固有の知識をSAMにシームレスに統合する。改良されたSAMモデルは、ファブリック欠陥固有の知識を取り入れながら、大規模な自然画像データセットから学んだ一般化イメージ理解を活用し、ファブリック欠陥分割タスクの習熟性を確保する。実験結果から, 汎用的知識とファブリック固有の知識の融合によるモデルセグメンテーション性能の大幅な向上が示された。 3つのデータセットにまたがる一般的なセグメンテーションモデルに対してベンチマークを行うと、提案モデルが性能を大幅に向上することを示す。クロスデータセット比較と数発の学習実験による印象的な結果は、繊維品質管理の実践的応用の可能性をさらに示している。 Fabric defect segmentation is integral to textile quality control. Despite this, the scarcity of high-quality annotated data and the diversity of fabric defects present significant challenges to the application of deep learning in this field. These factors limit the generalization and segmentation performance of existing models, impeding their ability to handle the complexity of diverse fabric types and defects. To overcome these obstacles, this study introduces an innovative method to infuse specialized knowledge of fabric defects into the Segment Anything Model (SAM), a large-scale visual model. By introducing and training a unique set of fabric defect-related parameters, this approach seamlessly integrates domain-specific knowledge into SAM without the need for extensive modifications to the pre-existing model parameters. The revamped SAM model leverages generalized image understanding learned from large-scale natural image datasets while incorporating fabric defect-specific knowledge, ensuring its proficiency in fabric defect segmentation tasks. The experimental results reveal a significant improvement in the model's segmentation performance, attributable to this novel amalgamation of generic and fabric-specific knowledge. When benchmarking against popular existing segmentation models across three datasets, our proposed model demonstrates a substantial leap in performance. Its impressive results in cross-dataset comparisons and few-shot learning experiments further demonstrate its potential for practical applications in textile quality control.	翻訳日:2023-06-29 14:06:22 公開日:2023-06-28
# 空間的詳細記憶を用いたパンシャープ化への学習 Learning to Pan-sharpening with Memories of Spatial Details ( http://arxiv.org/abs/2306.16181v1 ) ライセンス: Link先を確認	Maoxun Yuan, Tianyi Zhao, Bo Li, Xingxing Wei	(参考訳) リモートセンシングシステムにおいて最もよく用いられる技術の一つであるパンシャーペニングは、パンクロマティック画像からマルチスペクトル画像に空間的詳細を注入し、高分解能MS画像を得る。ディープラーニングはその強固な適合能力と効率的な特徴抽出によって広く注目を集めているため、優れた性能を達成するために様々なパンシャープ化手法が提案されている。しかしながら、現在のパンシャーピング法では、通常、入力としてPANとMSの2つのイメージが必要である。この問題に対処するために,本論文では,PAN画像の空間的詳細が主に高周波の手がかりである,すなわち入力PAN画像の輪郭を反映していることを観察する。これにより,いくつかのベースエッジを格納するPAN非依存表現を開発し,それを介して対応するPAN画像の輪郭を構成することができる。その結果、推定時にms画像のみを用いてパンシャープ化タスクを行うことができる。この目的のために、メモリベースのネットワークは、トレーニングフェーズ中に空間の詳細を抽出して記憶するように適応し、メモリベースの空間詳細ネットワーク(MSDN)と呼ばれる推論時にPAN画像から空間情報を取得するプロセスを置き換えるために使用される。我々は最終的に提案したMSDNモジュールを既存のDLベースのパンシャーピング手法に統合し、エンドツーエンドのパンシャーピングネットワークを実現する。我々はGaofen1衛星とWorldView-4衛星の広範な実験により、PAN画像なしで良好な空間的詳細を構築し、最高の性能を達成することを検証する。コードはhttps://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.gitで公開されている。 Pan-sharpening, as one of the most commonly used techniques in remote sensing systems, aims to inject spatial details from panchromatic images into multi-spectral images to obtain high-resolution MS images. Since deep learning has received widespread attention because of its powerful fitting ability and efficient feature extraction, a variety of pan-sharpening methods have been proposed to achieve remarkable performance. However, current pan-sharpening methods usually require the paired PAN and MS images as the input, which limits their usage in some scenarios. To address this issue, in this paper, we observe that the spatial details from PAN images are mainly high-frequency cues, i.e., the edges reflect the contour of input PAN images. This motivates us to develop a PAN-agnostic representation to store some base edges, so as to compose the contour for the corresponding PAN image via them. As a result, we can perform the pan-sharpening task with only the MS image when inference. To this end, a memory-based network is adapted to extract and memorize the spatial details during the training phase and is used to replace the process of obtaining spatial information from PAN images when inference, which is called Memory-based Spatial Details Network (MSDN). We finally integrate the proposed MSDN module into the existing DL-based pan-sharpening methods to achieve an end-to-end pan-sharpening network. With extensive experiments on the Gaofen1 and WorldView-4 satellites, we verify that our method constructs good spatial details without PAN images and achieves the best performance. The code is available at https://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.git.	翻訳日:2023-06-29 14:05:57 公開日:2023-06-28
# 複数インスタンス学習に基づく全スライド画像分類のための疑似バッグミックスアップ拡張 Pseudo-Bag Mixup Augmentation for Multiple Instance Learning Based Whole Slide Image Classification ( http://arxiv.org/abs/2306.16180v1 ) ライセンス: Link先を確認	Pei Liu, Luping Ji, Xinyu Zhang, Feng Ye	(参考訳) ギガピクセル画像のモデリングの特別な状況を考えると、MIL(Multiple Case Learning)はWSI(Whole Slide Image)分類において最も重要なフレームワークの1つとなっている。現在、ほとんどのMILネットワークは、トレーニングにおいて避けられない2つの問題に直面している。 i) 不十分なWSIデータ及び二ニューラルネットワークに固有のデータ記憶の性質これらの問題は、WSIの分類モデルの継続的な性能向上を抑えるため、MILモデルが適切かつ効率的な訓練から妨げられる可能性がある。そこで本研究では,MILモデルのトレーニングを改善するためのPseudo-bag Mixup(PseMix)データ拡張スキームを提案する。このスキームは、MILに基づくWSI分類に適用するために、一般的な画像のMixup戦略を擬似バグを介して特別なWSIに一般化する。疑似バッグによる協調により,psemixはミックスアップ戦略におけるクリティカルサイズアライメントとセマンティクスアライメントを満足する。さらに、MILに適応した効率的で分離された手法として設計されており、時間を要する操作やMILモデル予測に依存しない。比較実験とアブレーション研究はPseMixの有効性と利点を評価するために特別に設計されている。 PseMixはWSI分類におけるMILネットワークの性能を向上させることができる。さらに、MILモデルの一般化能力を高め、排他的ラベルやノイズのあるラベルをパッチする堅牢性を促進することもできる。ソースコードはhttps://github.com/liupei101/psemixで入手できます。 Given the special situation of modeling gigapixel images, multiple instance learning (MIL) has become one of the most important frameworks for Whole Slide Image (WSI) classification. In current practice, most MIL networks often face two unavoidable problems in training: i) insufficient WSI data, and ii) the data memorization nature inherent in neural networks. These problems may hinder MIL models from adequate and efficient training, suppressing the continuous performance promotion of classification models on WSIs. Inspired by the basic idea of Mixup, this paper proposes a Pseudo-bag Mixup (PseMix) data augmentation scheme to improve the training of MIL models. This scheme generalizes the Mixup strategy for general images to special WSIs via pseudo-bags so as to be applied in MIL-based WSI classification. Cooperated by pseudo-bags, our PseMix fulfills the critical size alignment and semantic alignment in Mixup strategy. Moreover, it is designed as an efficient and decoupled method adaptive to MIL, neither involving time-consuming operations nor relying on MIL model predictions. Comparative experiments and ablation studies are specially designed to evaluate the effectiveness and advantages of our PseMix. Test results show that PseMix could often improve the performance of MIL networks in WSI classification. Besides, it could also boost the generalization capacity of MIL models, and promote their robustness to patch occlusion and noisy labels. Our source code is available at https://github.com/liupei101/PseMix.	翻訳日:2023-06-29 14:05:27 公開日:2023-06-28
# データサイエンスを定義する: 探究の新しい分野 Defining data science: a new field of inquiry ( http://arxiv.org/abs/2306.16177v1 ) ライセンス: Link先を確認	Michael L Brodie	(参考訳) データサイエンスは科学ではない。それは研究パラダイムです。その力、範囲、スケールは、我々の最も強力な研究パラダイムである科学を越え、知識の発見と世界を変えることができるでしょう。私たちはまだそれを理解し定義しておらず、その可能性を認識し、リスクを管理するために不可欠です。現代のデータサイエンスは始まったばかりです。 1962年から徐々に発展し、2000年から急速に発展し、21世紀の最も活発で強力な革新の1つであり、基本的に新しい調査分野である。その価値、パワー、適用性のために、40以上の規律、何百もの研究領域、何千ものアプリケーションに現れています。何百万ものデータサイエンス出版物には、データサイエンスとデータサイエンスの問題解決の無数の定義が含まれている。幼少期のため、多くの定義は独立性、アプリケーション固有性、相互不完全性、冗長性、矛盾性がある。本研究では,データサイエンスコミュニティのためのデータサイエンスジャーナルを用いた,データサイエンス参照フレームワークに基づくコヒーレントで統一的な定義の開発を提案することにより,このデータサイエンスの多重定義の課題を解決する。本稿では、そのような定義を議論するために必要なデータサイエンスアーティファクトの候補定義を提供する。データサイエンスの哲学、データサイエンスの問題解決パラダイム、およびデータサイエンスを定義し、統一し、発展させるためのフレームワークとしてしばしば呼ばれる6つの要素データサイエンス参照フレームワーク(公理学、オントロジ、認識論、方法論、手法、技術)からなる古典的な研究パラダイムの概念に基づいている。データ科学を定義するための課題、すなわち、データ科学を定義するための手段、そして包括的ソリューションの基盤としてのそれらの要求と利益を示す。 Data science is not a science. It is a research paradigm. Its power, scope, and scale will surpass science, our most powerful research paradigm, to enable knowledge discovery and change our world. We have yet to understand and define it, vital to realizing its potential and managing its risks. Modern data science is in its infancy. Emerging slowly since 1962 and rapidly since 2000, it is a fundamentally new field of inquiry, one of the most active, powerful, and rapidly evolving 21st century innovations. Due to its value, power, and applicability, it is emerging in 40+ disciplines, hundreds of research areas, and thousands of applications. Millions of data science publications contain myriad definitions of data science and data science problem solving. Due to its infancy, many definitions are independent, application-specific, mutually incomplete, redundant, or inconsistent, hence so is data science. This research addresses this data science multiple definitions challenge by proposing the development of coherent, unified definition based on a data science reference framework using a data science journal for the data science community to achieve such a definition. This paper provides candidate definitions for essential data science artifacts that are required to discuss such a definition. They are based on the classical research paradigm concept consisting of a philosophy of data science, the data science problem solving paradigm, and the six component data science reference framework (axiology, ontology, epistemology, methodology, methods, technology) that is a frequently called for unifying framework with which to define, unify, and evolve data science. It presents challenges for defining data science, solution approaches, i.e., means for defining data science, and their requirements and benefits as the basis of a comprehensive solution.	翻訳日:2023-06-29 14:04:58 公開日:2023-06-28
# アフガニスタンにおける教育禁止ツイートの感情分析 Emotion Analysis of Tweets Banning Education in Afghanistan ( http://arxiv.org/abs/2306.16268v1 ) ライセンス: Link先を確認	Mohammad Ali Hussiny, Lilja {\O}vrelid	(参考訳) 本稿ではアフガニスタンで話されているペルシア語のダリ変種に対する最初の感情注釈データセットを紹介する。 LetHerLearnのデータセットには、2022年にタリバンで女性教育の権利が禁止されたことを受けて投稿された7,600のツイートが含まれている。ここでは、データ収集とアノテーションのプロセス、関連するデータセットの統計、結果のデータセットに関する最初の実験、dari感情分類のタスクのために様々なニューラルネットワークアーキテクチャをベンチマークします。 This paper introduces the first emotion annotated dataset for the Dari variant of Persian spoken in Afghanistan. The LetHerLearn dataset contains 7,600 tweets posted in reaction to the Taliban ban of women rights to education in 2022 and has been manually annotated according to Ekman emotion categories. We here detail the data collection and annotation process, present relevant dataset statistics as well as initial experiments on the resulting dataset, benchmarking a number of different neural architectures for the task of Dari emotion classification.	翻訳日:2023-06-29 13:56:56 公開日:2023-06-28
# 学習者自身のコードに関する自動質問は、脆弱な知識を検出するのに役立つ Automated Questions About Learners' Own Code Help to Detect Fragile Knowledge ( http://arxiv.org/abs/2306.16267v1 ) ライセンス: Link先を確認	Teemu Lehtinen, Otto Sepp\"al\"a, Ari Korhonen	(参考訳) 学生は、実際にどのように動作するかを脆弱に理解していても、正しく機能するプログラムコードを作成できる。個々のエクササイズサブミッション(qlc)から自動的に派生した質問は、学生が作成したコードの構造とロジックをどの程度理解しているかを調査できる。以前の研究は、最初のプログラミングコースの文脈でこのアプローチを研究した。本研究は,CS1における一般概念の要約を含む工学生のためのフォローアッププログラミングコースの再現である。その課題は、学生の90%が解決した古典的な降雨問題であった。合格申請ごとに生成されたQLCは意図的にシンプルに保たれていたが、学生の27%は少なくとも1回は失敗した。自己のプログラム論理に関する質問に苦しむ学生は、正解した学生よりもコースポイント全体の中央値が低かった。 Students are able to produce correctly functioning program code even though they have a fragile understanding of how it actually works. Questions derived automatically from individual exercise submissions (QLC) can probe if and how well the students understand the structure and logic of the code they just created. Prior research studied this approach in the context of the first programming course. We replicate the study on a follow-up programming course for engineering students which contains a recap of general concepts in CS1. The task was the classic rainfall problem which was solved by 90% of the students. The QLCs generated from each passing submission were kept intentionally simple, yet 27% of the students failed in at least one of them. Students who struggled with questions about their own program logic had a lower median for overall course points than students who answered correctly.	翻訳日:2023-06-29 13:56:46 公開日:2023-06-28
# MIMO信号検出のための深部展開模擬分岐 Deep Unfolded Simulated Bifurcation for Massive MIMO Signal Detection ( http://arxiv.org/abs/2306.16264v1 ) ライセンス: Link先を確認	Satoshi Takabe	(参考訳) マルチインプット多重出力(MIMO)は次世代無線通信の鍵となる要素である。近年,深層学習技術と量子(インスパイアされた)アルゴリズムに基づく様々なMIMO信号検出器が提案され,従来の検出器と比較して検出性能が向上している。本稿では,量子インスパイアされたアルゴリズムであるシミュレート分岐(sb)アルゴリズムに注目した。本稿では,検出性能を向上させる2つの手法を提案する。第一は、レベンバーグ・マーカルトアルゴリズムに触発されたアルゴリズムを修正して、最大確率検出の極小を取り除いたことである。 2つ目は、反復アルゴリズムの内部パラメータをトレーニングするためのディープラーニングテクニックである、deep unfoldingの利用である。本稿では,SBの更新ルールを微分可能とした深部展開SBを提案する。その結果,これらの検出器はMIMOシステムの信号検出性能を著しく向上することがわかった。 Multiple-input multiple-output (MIMO) is a key ingredient of next-generation wireless communications. Recently, various MIMO signal detectors based on deep learning techniques and quantum(-inspired) algorithms have been proposed to improve the detection performance compared with conventional detectors. This paper focuses on the simulated bifurcation (SB) algorithm, a quantum-inspired algorithm. This paper proposes two techniques to improve its detection performance. The first is modifying the algorithm inspired by the Levenberg-Marquardt algorithm to eliminate local minima of maximum likelihood detection. The second is the use of deep unfolding, a deep learning technique to train the internal parameters of an iterative algorithm. We propose a deep-unfolded SB by making the update rule of SB differentiable. The numerical results show that these proposed detectors significantly improve the signal detection performance in massive MIMO systems.	翻訳日:2023-06-29 13:56:34 公開日:2023-06-28
# i.i.d.行列の散逸スペクトル形式因子 The Dissipative Spectral Form Factor for I.I.D. Matrices ( http://arxiv.org/abs/2306.16262v1 ) ライセンス: Link先を確認	Giorgio Cipolloni and Nicolo Grometto	(参考訳) ジニブレアンサンブルの[arXiv:2103.05001]に最近導入された散逸スペクトル形因子(DSFF)は、散逸量子系の普遍的性質を研究するための鍵となるツールである。本研究では,実数や複素数を中間時間スケールまで含む大きな乱数行列のdsffを計算し, [arxiv:2103.05001] からの予測を確認した。実例におけるDSFFの解析式は以前不明であった。さらに,DSFFの連結成分は,短時間で成分の4次累積に依存する非普遍的補正を示すことを示した。これらの結果は、非エルミート確率行列[arXiv:2002.02438, arXiv:1912.04100]の線形固有値統計に対する中心極限定理に基づいている。 The Dissipative Spectral Form Factor (DSFF), recently introduced in [arXiv:2103.05001] for the Ginibre ensemble, is a key tool to study universal properties of dissipative quantum systems. In this work we compute the DSFF for a large class of random matrices with real or complex entries up to an intermediate time scale, confirming the predictions from [arXiv:2103.05001]. The analytic formula for the DSFF in the real case was previously unknown. Furthermore, we show that for short times the connected component of the DSFF exhibits a non-universal correction depending on the fourth cumulant of the entries. These results are based on the central limit theorem for linear eigenvalue statistics of non-Hermitian random matrices [arXiv:2002.02438, arXiv:1912.04100].	翻訳日:2023-06-29 13:56:22 公開日:2023-06-28
# センチネル2画像からの疎アノテーションによる土地被覆区分 Land Cover Segmentation with Sparse Annotations from Sentinel-2 Imagery ( http://arxiv.org/abs/2306.16252v1 ) ライセンス: Link先を確認	Marco Galatola, Edoardo Arnaudo, Luca Barco, Claudio Rossi, Fabrizio Dominici	(参考訳) 土地被覆(LC)セグメンテーションは, 環境分析や自然災害管理など, 様々な分野で重要な役割を担っている。しかし、正確なLCマップの生成は複雑で時間を要する作業であり、環境変化を考慮して複数のアノテータの専門知識と定期的な更新が必要である。本研究では,LCセグメンテーションに関連する課題に,スパースアノテーションとドメイン適応手法を用いて対処する,燃料マップ記述のためのフレームワークSPADAを紹介する。 LUCASやUrban Atlasといった信頼性の高い地上事実を用いた性能評価は、この手法の有効性を示している。 SPADAは最先端のセマンティックセグメンテーションアプローチやサードパーティ製品よりも優れており、平均IoUスコアは42.86、F1スコアは67.93である。 Land cover (LC) segmentation plays a critical role in various applications, including environmental analysis and natural disaster management. However, generating accurate LC maps is a complex and time-consuming task that requires the expertise of multiple annotators and regular updates to account for environmental changes. In this work, we introduce SPADA, a framework for fuel map delineation that addresses the challenges associated with LC segmentation using sparse annotations and domain adaptation techniques for semantic segmentation. Performance evaluations using reliable ground truths, such as LUCAS and Urban Atlas, demonstrate the technique's effectiveness. SPADA outperforms state-of-the-art semantic segmentation approaches as well as third-party products, achieving a mean Intersection over Union (IoU) score of 42.86 and an F1 score of 67.93 on Urban Atlas and LUCAS, respectively.	翻訳日:2023-06-29 13:56:08 公開日:2023-06-28
# 均一空間上の潜在SDE Latent SDEs on Homogeneous Spaces ( http://arxiv.org/abs/2306.16248v1 ) ライセンス: Link先を確認	Sebastian Zeng, Florian Graf, Roland Kwitt	(参考訳) 確率過程が(おそらく複雑な)観測された場合、潜時確率微分方程式(SDE)の解によって支配される潜在変数モデルにおける変分ベイズ推論の問題を考察する。効率的な勾配計算などの大規模データから(ほぼ任意の)潜伏ニューラルネットワークSDEを学習しようとするときの課題に触発されて、我々は一歩後退して特定のサブクラスを研究する。我々の場合、SDEは同次潜在空間上で進化し、対応する(行列)リー群の確率力学によって誘導される。学習問題において、単位$n$-sphere上のSDEは、おそらくこの設定の最も関連性の高いインカーネーションである。特に、変分推論において、球面は真に非形式的事前SDEの使用を容易にするだけでなく、証明の下界における近似的後続過程と先行過程の間のクルバック・リーブラー分岐に対する特に単純で直感的な表現も得られる。実験により, 提案手法の潜在sdeを, 既存の1段階幾何オイラー・マルヤマスキームを用いて効率的に学習できることを実証した。より多様なSDEに制限されているにもかかわらず、様々な時系列補間および分類ベンチマークにおいて、競争力や最先端のパフォーマンスを達成する。 We consider the problem of variational Bayesian inference in a latent variable model where a (possibly complex) observed stochastic process is governed by the solution of a latent stochastic differential equation (SDE). Motivated by the challenges that arise when trying to learn an (almost arbitrary) latent neural SDE from large-scale data, such as efficient gradient computation, we take a step back and study a specific subclass instead. In our case, the SDE evolves on a homogeneous latent space and is induced by stochastic dynamics of the corresponding (matrix) Lie group. In learning problems, SDEs on the unit $n$-sphere are arguably the most relevant incarnation of this setup. Notably, for variational inference, the sphere not only facilitates using a truly uninformative prior SDE, but we also obtain a particularly simple and intuitive expression for the Kullback-Leibler divergence between the approximate posterior and prior process in the evidence lower bound. Experiments demonstrate that a latent SDE of the proposed type can be learned efficiently by means of an existing one-step geometric Euler-Maruyama scheme. Despite restricting ourselves to a less diverse class of SDEs, we achieve competitive or even state-of-the-art performance on various time series interpolation and classification benchmarks.	翻訳日:2023-06-29 13:55:50 公開日:2023-06-28
# CBBQ: 大規模言語モデルのための人間-AIコラボレーションによる中国のバイアスベンチマークデータセット CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models ( http://arxiv.org/abs/2306.16244v1 ) ライセンス: Link先を確認	Yufei Huang and Deyi Xiong	(参考訳) 大規模言語モデルの社会的バイアスを理論的に測定することは、高度に有能なAIモデルの倫理的リスクの検出と低減に不可欠である。本研究では,人間の専門家と生成言語モデルが共同で構築した10万以上の質問からなり,中国文化と価値観に関連する14の社会的次元におけるステレオタイプと社会バイアスをカバーする,中国バイアスベンチマークデータセットを提案する。キュレーションプロセスには、広範な文献レビューによるバイアス識別、曖昧なコンテキスト生成、AIによるあいまいなコンテキスト生成、snd manual Review \& recompositionの4つの重要なステップが含まれている。データセットのテストインスタンスは、3K以上の高品質なテンプレートから自動的に抽出される。データセットは広範囲のカバレッジと高い多様性を示す。広範な実験により、データセットがモデルバイアスの検出に有効であることが示され、公に入手可能な10の中国語大言語モデルはすべて、特定のカテゴリにおいて強いバイアスを示している。さらに,我々は実験から,微調整されたモデルがある程度の注意を払って,あるタイプにおいて道徳的に有害なアウトプットを生成するのを避けることができることを観察した。私たちのデータセットと結果は \href{https://github.com/yfhuangxxxx/cbbq}{https://github.com/yfhuangxxxx/cbbq} で公開されています。 Holistically measuring societal biases of large language models is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values. The curation process contains 4 essential steps: bias identification via extensive literature review, ambiguous context generation, AI-assisted disambiguous context generation, snd manual review \& recomposition. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. The dataset exhibits wide coverage and high diversity. Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories. Additionally, we observe from our experiments that fine-tuned models could, to a certain extent, heed instructions and avoid generating outputs that are morally harmful in some types, in the way of "moral self-correction". Our dataset and results are publicly available at \href{https://github.com/YFHuangxxxx/CBBQ}{https://github.com/YFHuangxxxx/CBBQ}, offering debiasing research opportunities to a widened community.	翻訳日:2023-06-29 13:55:29 公開日:2023-06-28
# 散逸フェルミオン系に対する局所非エルミットハミルトン形式とフェルミ超流動系の損失誘起人口増加 Local Non-Hermitian Hamiltonian Formalism for Dissipative Fermionic Systems and Loss-Induced Population Increase in Fermi Superfluids ( http://arxiv.org/abs/2306.16235v1 ) ライセンス: Link先を確認	Teng Xiao and Gentaro Watanabe	(参考訳) 非エルミートハミルトニアン(Non-Hermitian Hamiltonian、NHH)は、開量子系に対する効果的な形式主義である。共通認識では、リンドブラッドマスター方程式で系を記述するとき、そのジャンプ項を無視して得られるnhhは散逸率の逆よりも十分に短い時間スケールのよい近似であると考えられている。この共通知恵に挑戦し、散逸性フェルミオン系に対する元のマスター方程式から適切なNHHを得るためのスキームを開発する。この NHH は局所的な NHH と呼ばれ、各モードにおける損失過程を局所的に記述する。具体例として、フェミオン超流動を用いた新しいスキームを1体損失下で正当化する。さらに, ペアリングギャップと異常電界との間の散逸誘起位相ロックにより, 長期的進化における損失による個体増加がみられた。 Non-Hermitian Hamiltonian (NHH) is an effective formalism for open quantum systems. In common wisdom, when the system is described by the Lindblad master equation, the NHH obtained by neglecting its jump term is believed to be a good approximation for a timescale sufficiently shorter than the inverse of the dissipation rate. We challenge this common wisdom and develop a scheme to obtain an appropriate NHH from the original master equation for dissipative fermionic systems. This NHH, called the local NHH, describes the loss process in each individual mode locally. As a concrete example, we justify our new scheme using fermionic superfluid under one-body loss. Furthermore, we find loss-induced population increase in the long time evolution due to the dissipation-induced phase locking between the pairing gap and the anomalous field.	翻訳日:2023-06-29 13:55:01 公開日:2023-06-28
# 自己組織化生体分子薄膜によるカシミール力の効率的な低減 Efficient Reduction of Casimir Forces by Self-assembled Bio-molecular Thin Films ( http://arxiv.org/abs/2306.16209v1 ) ライセンス: Link先を確認	Ren\'e I.P. Sedmik, Alexander Urech, Zeev Zalevsky, Itai Carmeli	(参考訳) ロンドン・ヴァン・デル・ワールス力に関連するカシミール力は、電磁揺らぎのスペクトルが境界によって制限された場合に生じる。ナノスケールでこれらの力を制御するための基礎科学と技術的応用の両方から大きな関心がある。科学的には、カシミール効果は、マクロな物体の間に現れる唯一の既知の量子真空効果であり、真空の未知の物理を調べることができる。本研究では, プレートと球体間のカシミール力に及ぼす自己組織化分子バイオ薄膜と有機薄膜の影響を実験的に検討した。分子薄膜は、わずか数ナノメートルの厚さにもかかわらず、カシミール力は最大14%減少することがわかった。この還元に繋がる分子特性を明らかにするため, 化学的, 物理的性質の異なる5種類の生体分子膜について検討した。分光データから、金層と分子膜の電子状態が自己集合過程における電荷再配置によって混合されることに起因する広い吸収帯が明らかとなった。リフシッツ理論を用いて、観察されたカシミール力の変化は、分子層の形成による新しい吸収帯の出現と一致することを計算した。所望のカシミール力の低減は、溶液中の単純な自己組織化技術を用いて、複数の単層を積み重ねることで調整できる。分子は、それぞれ数ナノメートルの長さで、小さな空洞や穴を貫通し、あらゆる表面を高い効率で覆うことができる。このプロセスは、マイクロエレクトロメカニカルシステム(MEMS)の製造における現在の手法と互換性があり、カシミール効果によって生じる「スティクション」により、一定のサイズを超えて小型化できない。したがって、我々のアプローチはこれらのデバイスをさらに小型化することができる。 Casimir forces, related to London-van der Waals forces, arise if the spectrum of electromagnetic fluctuations is restricted by boundaries. There is great interest both from fundamental science and technical applications to control these forces on the nano scale. Scientifically, the Casimir effect being the only known quantum vacuum effect manifesting between macroscopic objects, allows to investigate the poorly known physics of the vacuum. In this work, we experimentally investigate the influence of self-assembled molecular bio and organic thin films on the Casimir force between a plate and a sphere. We find that molecular thin films, despite being a mere few nanometers thick, reduce the Casimir force by up to 14%. To identify the molecular characteristics leading to this reduction, five different bio-molecular films with varying chemical and physical properties were investigated. Spectroscopic data reveal a broad absorption band whose presence can be attributed to the mixing of electronic states of the underlying gold layer and those of the molecular film due to charge rearrangement in the process of self-assembly. Using Lifshitz theory we calculate that the observed change in the Casimir force is consistent with the appearance of the new absorption band due to the formation of molecular layers. The desired Casimir force reduction can be tuned by stacking several monolayers, using a simple self-assembly technique in a solution. The molecules - each a few nanometers long - can penetrate small cavities and holes, and cover any surface with high efficiency. This process seems compatible with current methods in the production of micro-electromechanical systems (MEMS), which cannot be miniaturized beyond a certain size due to `stiction' caused by the Casimir effect. Our approach could therefore readily enable further miniaturization of these devices.	翻訳日:2023-06-29 13:54:46 公開日:2023-06-28
# McKean-Vlasov制御問題に対する連続時間q-ラーニング Continuous-Time q-learning for McKean-Vlasov Control Problems ( http://arxiv.org/abs/2306.16208v1 ) ライセンス: Link先を確認	Xiaoli Wei, Xiang Yu	(参考訳) 本稿では,最近Jia と Zhou (2022c) による Q-learning の連続的対応として作られた q-learning を,エントロピー規則化強化学習の設定における Mckean-Vlasov 制御問題に対して検討する。 Jia と Zhou (2022c) における単一のエージェントの制御問題とは対照的に、エージェントの平均場相互作用は q-函数の定義をより微妙に表現し、2つの異なる q-函数が自然に生じることを示した。 i) テストポリシを含む弱いマルティンゲール条件で学習可能な統合Q関数(Gu, Guo, Wei, Xu (2023))の1次近似としての統合q関数($q$で記述) (ii)政策改善イテレーションで使用される本質的なq-関数($q_e$で示される)。 2つのq関数は、すべてのテストポリシーの下で積分表現を介して関連していることを示す。統合q関数の弱martingale条件と提案するテストポリシー探索法に基づき,モデルフリーのオフラインおよびオンライン学習アルゴリズムを考案した。 LQ制御フレームワークとLQ制御フレームワーク以外の2つの金融アプリケーションにおいて、値関数と2つのq-関数の正確なパラメータ化を求め、シミュレーション実験でアルゴリズムを説明できる。 This paper studies the q-learning, recently coined as the continuous-time counterpart of Q-learning by Jia and Zhou (2022c), for continuous time Mckean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent's control problem in Jia and Zhou (2022c), the mean-field interaction of agents render the definition of q-function more subtle, for which we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by $q$) as the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023) that can be learnt by a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_e$) that is employed in the policy improvement iterations. We show that two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition of the integrated q-function and our proposed searching method of test policies, some model-free offline and online learning algorithms are devised. In two financial applications, one in LQ control framework and one beyond LQ control framework, we can obtain the exact parameterization of the value function and two q-functions and illustrate our algorithms with simulation experiments.	翻訳日:2023-06-29 13:54:16 公開日:2023-06-28
# スタイン法によるガウス確率場近似と広帯域ランダムニューラルネットワークへの応用 Gaussian random field approximation via Stein's method with applications to wide random neural networks ( http://arxiv.org/abs/2306.16308v1 ) ライセンス: Link先を確認	Krishnakumar Balasubramanian, Larry Goldstein, Nathan Ross, Adil Salim	(参考訳) 我々は、シュテインの手法に基づいて、n$-球面とガウス面によってインデックスづけされた任意の連続的$\mathbb{r}^d$値の確率場の間の、ワッサースタイン距離(w_1$)の上界を、$\sup$-normに関して導出する。我々は、より滑らかな計量の束縛を$w_1$距離に移すことができる新しいガウス平滑化手法を開発した。滑らか化はラプラシアン作用素の力を使って構成された共分散関数に基づいており、関連するガウス過程がトラクタブルなキャメロン・マーチンあるいは再生するケルネル・ヒルベルト空間を持つように設計されている。この機能により、以前文献で考慮されていた1次元の間隔ベースのインデックスセットを越えられるようになります。一般結果に特化して、任意の深さの広いランダムニューラルネットワークのガウス確率場近似とランダム場レベルでのリプシッツ活性化関数の第一境界を求める。我々の境界は、ネットワークの幅とランダムな重みのモーメントで明示的に表現される。また、活性化関数が3つの有界導関数を持つとき、より厳密な境界が得られる。 We derive upper bounds on the Wasserstein distance ($W_1$), with respect to $\sup$-norm, between any continuous $\mathbb{R}^d$ valued random field indexed by the $n$-sphere and the Gaussian, based on Stein's method. We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance. The smoothing is based on covariance functions constructed using powers of Laplacian operators, designed so that the associated Gaussian process has a tractable Cameron-Martin or Reproducing Kernel Hilbert Space. This feature enables us to move beyond one dimensional interval-based index sets that were previously considered in the literature. Specializing our general result, we obtain the first bounds on the Gaussian random field approximation of wide random neural networks of any depth and Lipschitz activation functions at the random field level. Our bounds are explicitly expressed in terms of the widths of the network and moments of the random weights. We also obtain tighter bounds when the activation function has three bounded derivatives.	翻訳日:2023-06-29 13:48:50 公開日:2023-06-28
# 点2Point : 時空間占有予測におけるヒルベルト選別点雲の効率的な深層学習のためのフレームワーク Point2Point : A Framework for Efficient Deep Learning on Hilbert sorted Point Clouds with applications in Spatio-Temporal Occupancy Prediction ( http://arxiv.org/abs/2306.16306v1 ) ライセンス: Link先を確認	Athrva Atul Pandhare	(参考訳) ポイントクラウドデータの不規則性と置換不変性は、効果的な学習に挑戦する。この問題に対処する従来の方法は、生の点雲を3Dボクセルグリッドやレンジイメージなどの中間表現に変換することである。このような中間表現は置換不変性の問題を解くが、情報のかなりの損失をもたらす。原点雲で学習するアプローチは、点間の近傍関係の解決に支障をきたすか、あるいはそれらの定式化において複雑すぎる。本論文では,ヒルベルト空間充填曲線によって誘導される1次元秩序を保存する局所性として点雲を表現する新しい手法を提案する。ヒルベルトソートされた点雲上で効果的に学習できるニューラルアーキテクチャであるpoint2pointも紹介する。 Point2Pointは、ポイントクラウドセグメンテーションと生成タスクで競合する性能を示す。最後に,ポイント雲からの時空間占有予測におけるpoint2pointの性能を示す。 The irregularity and permutation invariance of point cloud data pose challenges for effective learning. Conventional methods for addressing this issue involve converting raw point clouds to intermediate representations such as 3D voxel grids or range images. While such intermediate representations solve the problem of permutation invariance, they can result in significant loss of information. Approaches that do learn on raw point clouds either have trouble in resolving neighborhood relationships between points or are too complicated in their formulation. In this paper, we propose a novel approach to representing point clouds as a locality preserving 1D ordering induced by the Hilbert space-filling curve. We also introduce Point2Point, a neural architecture that can effectively learn on Hilbert-sorted point clouds. We show that Point2Point shows competitive performance on point cloud segmentation and generation tasks. Finally, we show the performance of Point2Point on Spatio-temporal Occupancy prediction from Point clouds.	翻訳日:2023-06-29 13:48:25 公開日:2023-06-28
# 超伝導量子デバイス用超音波エッジマイクロカットを用いた高qトレンチアルミニウムコプラナー共振器 High-Q trenched aluminum coplanar resonators with an ultrasonic edge microcutting for superconducting quantum devices ( http://arxiv.org/abs/2306.16301v1 ) ライセンス: Link先を確認	E.V. Zikiy, A.I. Ivanov, N.S. Smirnov, D.O. Moskalev, V.I. Polozov, A.R. Matanin, E.I. Malevannaya, V.V. Echeistov, T.G. Konstantinova and I.A. Rodionov	(参考訳) 誘電損失は超伝導量子ビットのコヒーレンスを制限する重要な要因の1つである。材料および製造工程が誘電損失に与える影響を共平面導波路(cpw)マイクロ波共振器を用いて評価する。本稿では,内部品質係数が5x106以上,低出力2x106(最大4.4x106)の超伝導マイクロ波共振器について報告する。このような性能は、量子ジョセフソン接合回路でよく用いられる高抵抗シリコン基板上で、7-10.5Um帯の100nm厚アルミニウム共振器で実証される。乾式および湿式アルミニウムエッチングとシリコン基板の深さおよび等方性反応性イオンエッチングを併用した共振器の内部品質因子について検討した。エアブリッジとシリコン基板エッチングの両方を用いたジョセフソン接合互換CPW共振器製造法を提案する。最後に, エアブリッジの位置と余分なプロセスステップが誘電損失に与える影響を実証する。ウェットエッチングされたアルミニウム共振器と等方性除去基板に対して, 超音波金属エッジマイクロカットにより, 最適品質のfa ctorを得る。 Dielectric losses are one of the key factors limiting the coherence of superconducting qubits. The impact of materials and fabrication steps on dielectric losses can be evaluated using coplanar waveguide (CPW) microwave resonators. Here, we report on superconducting CPW microwave resonators with internal quality factors systematically exceeding 5x106 at high powers and 2x106 (with the best value of 4.4x106) at low power. Such performance is demonstrated for 100-nm-thick aluminum resonators with 7-10.5 um center trace on high-resistivity silicon substrates commonly used in quantum Josephson junction circuits. We investigate internal quality factors of the resonators with both dry and wet aluminum etching, as well as deep and isotropic reactive ion etching of silicon substrate. Josephson junction compatible CPW resonators fabrication process with both airbridges and silicon substrate etching is proposed. Finally, we demonstrate the effect of airbridges positions and extra process steps on the overall dielectric losses. The best quality fa ctors are obtained for the wet etched aluminum resonators and isotropically removed substrate with the proposed ultrasonic metal edge microcutting.	翻訳日:2023-06-29 13:48:09 公開日:2023-06-28
# 社会世界の知識 : モデリングと応用 Social World Knowledge: Modeling and Applications ( http://arxiv.org/abs/2306.16299v1 ) ライセンス: Link先を確認	Nir Lotan and Einat Minkov	(参考訳) 社会世界の知識は、人間や機械による効果的なコミュニケーションと情報処理の重要な要素である。現在、実世界の知識を表す多くの知識基盤が存在する。しかし、世界の知識の社会的側面を捉えた資源は存在しない。我々はこのような資源の定式化と構築に向けて重要な一歩を踏み出したと信じている。ソーシャルネットワークで発生する社会的文脈から低次元の実体埋め込みを抽出するための一般的なフレームワークであるSocialVecを紹介する。このフレームワークでは、エンティティは一般的な関心を呼び出す非常に人気のあるアカウントに対応する。個人ユーザが共同でフォローする傾向にあるエンティティは社会的に関連があると仮定し、このソーシャルコンテキストの定義を用いてエンティティの埋め込みを学習する。テキストセマンティクスを含む作業を容易にする単語埋め込みと同様に、学習されたソーシャルエンティティ埋め込みは、複数のソーシャルフレーバーのタスクに利益をもたらすことを期待する。この研究では、約2万のエンティティのソーシャル埋め込みを、130万人のTwitterユーザーとフォローするアカウントのサンプルから引き出した。社会的重要性の2つのタスクに、結果の埋め込みを取り入れて評価する。まず、ニュースソースの政治的バイアスを、社会的埋め込み空間における実体的類似性の観点から評価する。第2に,フォローするエンティティのソーシャルな埋め込みに基づいて,個々のtwitterユーザの個人的特性を予測する。どちらの場合も、タスク固有のベースラインと比較して、我々のアプローチで有利または競争的なパフォーマンスを示す。さらに、事実に基づく既存の実体埋め込み方式は、知識の社会的側面を捉えないことを示す。我々は、社会世界の知識とその応用のさらなる探索を支援するために、学習された社会エンティティの埋め込みを研究コミュニティに公開する。 Social world knowledge is a key ingredient in effective communication and information processing by humans and machines alike. As of today, there exist many knowledge bases that represent factual world knowledge. Yet, there is no resource that is designed to capture social aspects of world knowledge. We believe that this work makes an important step towards the formulation and construction of such a resource. We introduce SocialVec, a general framework for eliciting low-dimensional entity embeddings from the social contexts in which they occur in social networks. In this framework, entities correspond to highly popular accounts which invoke general interest. We assume that entities that individual users tend to co-follow are socially related, and use this definition of social context to learn the entity embeddings. Similar to word embeddings which facilitate tasks that involve text semantics, we expect the learned social entity embeddings to benefit multiple tasks of social flavor. In this work, we elicited the social embeddings of roughly 200K entities from a sample of 1.3M Twitter users and the accounts that they follow. We employ and gauge the resulting embeddings on two tasks of social importance. First, we assess the political bias of news sources in terms of entity similarity in the social embedding space. Second, we predict the personal traits of individual Twitter users based on the social embeddings of entities that they follow. In both cases, we show advantageous or competitive performance using our approach compared with task-specific baselines. We further show that existing entity embedding schemes, which are fact-based, fail to capture social aspects of knowledge. We make the learned social entity embeddings available to the research community to support further exploration of social world knowledge and its applications.	翻訳日:2023-06-29 13:47:52 公開日:2023-06-28
# 時間変動モデレーション評価のための因果的帰納効果推定のためのメタラーニング手法 A Meta-Learning Method for Estimation of Causal Excursion Effects to Assess Time-Varying Moderation ( http://arxiv.org/abs/2306.16297v1 ) ライセンス: Link先を確認	Jieru Shi, Walter Dempsey	(参考訳) ウェアラブル技術とスマートフォンによるデジタル健康介入における双子革命は、様々な健康科学分野におけるモバイルヘルス(mhealth)介入のアクセシビリティと取り込みを大きく拡大した。マイクロランダム化試験(MRT)と呼ばれる連続ランダム化実験は、これらのmHealth介入成分の有効性を実証的に評価するために人気が高まっている。 MRTは「因果抽出効果(causal excursion effect)」と呼ばれる新しい種類の因果推定を行い、健康科学者は介入の効果が時間とともにどのように変化するか、あるいは過去の個々の特性、文脈、反応によって緩和されるかを評価することができる。しかし、因果抽出効果を推定する現在のデータ解析手法では、重要なニュアンスパラメータの作業モデルを構築するために、観測された高次元歴史の特徴を事前に特定する必要がある。機械学習アルゴリズムは自動機能構築に理想的だが、因果的再帰推定へのナイーブな応用は、モデルの誤特定下でバイアスを生じさせ、介入効果に関する誤った結論をもたらす可能性がある。この問題に対処するために,本稿ではメタリーナーの観点から因果的帰納効果の推定を再検討する。そこでは,ニュアサンスパラメータの推定に用いられる教師付き学習アルゴリズムの選択に,アナリストはいまだ無依存である。本論文は,新しい推定器の漸近特性を理論的および広範囲なシミュレーション実験により比較し,相対効率の向上を実証し,既存手法の2倍頑健な代替手法を提案する。最後に,本手法の実用性について,米国における初年の医療従事者の多施設コホート(NeCampら,2020年)からのデータを分析した。 Twin revolutions in wearable technologies and smartphone-delivered digital health interventions have significantly expanded the accessibility and uptake of mobile health (mHealth) interventions across various health science domains. Sequentially randomized experiments called micro-randomized trials (MRTs) have grown in popularity to empirically evaluate the effectiveness of these mHealth intervention components. MRTs have given rise to a new class of causal estimands known as "causal excursion effects", which enable health scientists to assess how intervention effectiveness changes over time or is moderated by individual characteristics, context, or responses in the past. However, current data analysis methods for estimating causal excursion effects require pre-specified features of the observed high-dimensional history to construct a working model of an important nuisance parameter. While machine learning algorithms are ideal for automatic feature construction, their naive application to causal excursion estimation can lead to bias under model misspecification, potentially yielding incorrect conclusions about intervention effectiveness. To address this issue, this paper revisits the estimation of causal excursion effects from a meta-learner perspective, where the analyst remains agnostic to the choices of supervised learning algorithms used to estimate nuisance parameters. The paper presents asymptotic properties of the novel estimators and compares them theoretically and through extensive simulation experiments, demonstrating relative efficiency gains and supporting the recommendation for a doubly robust alternative to existing methods. Finally, the practical utility of the proposed methods is demonstrated by analyzing data from a multi-institution cohort of first-year medical residents in the United States (NeCamp et al., 2020).	翻訳日:2023-06-29 13:47:28 公開日:2023-06-28
# 関連エンティティの選択:ゼロショット解析による知識グラフブートストラップ Relevant Entity Selection: Knowledge Graph Bootstrapping via Zero-Shot Analogical Pruning ( http://arxiv.org/abs/2306.16296v1 ) ライセンス: Link先を確認	Lucas Jarnac, Miguel Couceiro, Pierre Monnin	(参考訳) 知識グラフ構築(kgc)は、高品質の核から始まった反復的なプロセスと見なすことができる。このような核はWikidataのようなオープンなKGに存在する知識から得ることができる。しかし、そのような汎用kgのサイズのため、それらを全体として統合することは、無関係なコンテンツとスケーラビリティの問題を伴う可能性がある。我々は,汎用kg に対する興味を持つ種実体から始まり,それらの隣り合う実体を保持または従属するアナロジーに基づくアプローチを提案する。ウィキデータに対する我々のアプローチは、ドメイン均質または異質なシードエンティティを含む2つの手動ラベル付きデータセットを通して評価する。我々は,我々の類推に基づくアプローチがLSTM,ランダムフォレスト,SVM,MLPを著しく低いパラメータ数で上回ることを示す。また,その一般化ポテンシャルを転送学習環境において評価する。これらの結果は、KGライフサイクルに関連するタスクにおけるアナロジーに基づく推論のさらなる統合を提唱する。 Knowledge Graph Construction (KGC) can be seen as an iterative process starting from a high quality nucleus that is refined by knowledge extraction approaches in a virtuous loop. Such a nucleus can be obtained from knowledge existing in an open KG like Wikidata. However, due to the size of such generic KGs, integrating them as a whole may entail irrelevant content and scalability issues. We propose an analogy-based approach that starts from seed entities of interest in a generic KG, and keeps or prunes their neighboring entities. We evaluate our approach on Wikidata through two manually labeled datasets that contain either domain-homogeneous or -heterogeneous seed entities. We empirically show that our analogy-based approach outperforms LSTM, Random Forest, SVM, and MLP, with a drastically lower number of parameters. We also evaluate its generalization potential in a transfer learning setting. These results advocate for the further integration of analogy-based inference in tasks related to the KG lifecycle.	翻訳日:2023-06-29 13:46:57 公開日:2023-06-28
# ワン・ツー・マニー合成による手術器具の領域分割の一般化 Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis ( http://arxiv.org/abs/2306.16285v1 ) ライセンス: Link先を確認	An Wang, Mobarakol Islam, Mengya Xu, Hongliang Ren	(参考訳) 様々な手術シーン理解タスクにおける優れたパフォーマンスにもかかわらず、深層学習に基づく手法は、様々な原因のために実世界の外科的応用に展開することを妨げることが多い。特に、サイトと患者の間のデータ収集、アノテーション、ドメインシフトが最も一般的な障害です。本研究では,最小限のソースイメージを有効活用して,合成手術器具セグメンテーションデータセットを生成することにより,データ関連の問題を軽減し,目に見えない実領域における優れた一般化性能を実現する。具体的には,1つの背景組織像と,各前景楽器の少なくとも3つの画像のみをシード画像とする。これらのソース画像は、前景と背景画像プールを構築するために広範囲に変換され、ランダムにサンプリングされた組織と楽器の画像を複数のブレンディング技術で合成し、新しい手術シーン画像を生成する。さらに,トレーニングデータの多様化を図るために,ハイブリッドなトレーニング時間拡張も導入する。 Endo2017、Endo2018、RoboToolの3つの実世界のデータセットに対する広範囲な評価は、我々の1対多の人工外科用データセットの生成とセグメンテーションフレームワークが実際のデータによるトレーニングと比較して、奨励的なパフォーマンスを達成することを実証している。特に、より重要なドメインギャップが存在するRoboToolデータセットでは、我々のフレームワークは、かなりのマージンで一般化の優位性を示している。我々は、データ合成によるモデル一般化の改善に研究の関心を惹きつけることを期待している。 Despite their impressive performance in various surgical scene understanding tasks, deep learning-based methods are frequently hindered from deploying to real-world surgical applications for various causes. Particularly, data collection, annotation, and domain shift in-between sites and patients are the most common obstacles. In this work, we mitigate data-related issues by efficiently leveraging minimal source images to generate synthetic surgical instrument segmentation datasets and achieve outstanding generalization performance on unseen real domains. Specifically, in our framework, only one background tissue image and at most three images of each foreground instrument are taken as the seed images. These source images are extensively transformed and employed to build up the foreground and background image pools, from which randomly sampled tissue and instrument images are composed with multiple blending techniques to generate new surgical scene images. Besides, we introduce hybrid training-time augmentations to diversify the training data further. Extensive evaluation on three real-world datasets, i.e., Endo2017, Endo2018, and RoboTool, demonstrates that our one-to-many synthetic surgical instruments datasets generation and segmentation framework can achieve encouraging performance compared with training with real data. Notably, on the RoboTool dataset, where a more significant domain gap exists, our framework shows its superiority of generalization by a considerable margin. We expect that our inspiring results will attract research attention to improving model generalization with data synthesizing.	翻訳日:2023-06-29 13:46:40 公開日:2023-06-28
# GPT-4による食品効果の要約と反復的プロンプティングによる製品特異的ガイダンス開発 Leveraging GPT-4 for Food Effect Summarization to Enhance Product-Specific Guidance Development via Iterative Prompting ( http://arxiv.org/abs/2306.16275v1 ) ライセンス: Link先を確認	Yiwen Shi, Ping Ren, Jing Wang, Biao Han, Taha ValizadehAslani, Felix Agbavor, Yi Zhang, Meng Hu, Liang Zhao, Hualou Liang	(参考訳) 新薬応用(NDA)による食品効果の要約は、製品特異的ガイダンス(PSG)の開発と評価に欠かせない要素である。しかし、広範囲な薬物アプリケーションレビュー文書からの食品効果の手動要約は時間がかかるため、自動化方法の開発の必要性が高まる。 chatgptやgpt-4といった大規模言語モデル(llm)の最近の進歩は、自動テキスト要約の有効性を向上させる大きな可能性を示しているが、psg評価における食品効果の要約の精度に関する能力は未だ不明である。本研究では,ChatGPT や GPT-4 との相互作用をより効果的かつ効率的に行うための,反復的プロンプト法を提案する。具体的には,食品効果要約のための3ターン反復プロンプト手法を提案し,キーワード指向プロンプトと長さ制御プロンプトを連続して提供し,生成した要約の質を向上させる。我々は,過去5年間に選択された100件のNDAレビュー文書に対して,自動測定からFDA専門家,さらにはGPT-4による評価まで,幅広い評価を行っている。我々は,プロセス全体で要約品質が徐々に改善されることを観察する。また, FDA専門家(43%対12%), GPT-4(64%対35%)では, GPT-4がChatGPTより優れていた。重要なことに、すべてのFDA専門家は、GPT-4が生成したサマリーの85%が、黄金の基準の要約と事実上一致していると全会一致で評価した。これらの結果は、gpt-4が食品効果サマリーを作成する大きな可能性を強く示唆しており、fdaの専門家によってレビューされ、psgアセスメントサイクルの効率を改善し、ジェネリック医薬品開発を促進する。 Food effect summarization from New Drug Application (NDA) is an essential component of product-specific guidance (PSG) development and assessment. However, manual summarization of food effect from extensive drug application review documents is time-consuming, which arouses a need to develop automated methods. Recent advances in large language models (LLMs) such as ChatGPT and GPT-4, have demonstrated great potential in improving the effectiveness of automated text summarization, but its ability regarding the accuracy in summarizing food effect for PSG assessment remains unclear. In this study, we introduce a simple yet effective approach, iterative prompting, which allows one to interact with ChatGPT or GPT-4 more effectively and efficiently through multi-turn interaction. Specifically, we propose a three-turn iterative prompting approach to food effect summarization in which the keyword-focused and length-controlled prompts are respectively provided in consecutive turns to refine the quality of the generated summary. We conduct a series of extensive evaluations, ranging from automated metrics to FDA professionals and even evaluation by GPT-4, on 100 NDA review documents selected over the past five years. We observe that the summary quality is progressively improved throughout the process. Moreover, we find that GPT-4 performs better than ChatGPT, as evaluated by FDA professionals (43% vs. 12%) and GPT-4 (64% vs. 35%). Importantly, all the FDA professionals unanimously rated that 85% of the summaries generated by GPT-4 are factually consistent with the golden reference summary, a finding further supported by GPT-4 rating of 72% consistency. These results strongly suggest a great potential for GPT-4 to draft food effect summaries that could be reviewed by FDA professionals, thereby improving the efficiency of PSG assessment cycle and promoting the generic drug product development.	翻訳日:2023-06-29 13:46:15 公開日:2023-06-28
# S2SNet:超伝導発見のためのトレーニング済みニューラルネットワーク S2SNet: A Pretrained Neural Network for Superconductivity Discovery ( http://arxiv.org/abs/2306.16270v1 ) ライセンス: Link先を確認	Ke Liu and Kaifan Yang and Jiahong Zhang and Renjun Xu	(参考訳) 超伝導は、エネルギー損失なしに電流を流すことができ、固体超伝導は物理学、物質科学、電気工学の最大の目標である。 16人以上のノーベル賞受賞者が超伝導研究への貢献で受賞している。超伝導体は、気候変動緩和、安価でクリーンなエネルギー、産業、イノベーション、インフラなど、持続可能な開発目標(SDG)に価値がある。しかし、全ての超伝導機構を説明する統一物理学理論はまだ不明である。超伝導は分子組成だけでなく、幾何学的結晶構造も原因であると考えられている。そのため、結晶構造と超伝導臨界温度の両方を含む新しいデータセットS2SがSuperConとMaterial Project上に構築されている。この新たなデータセットに基づいて,超伝導予測に注目機構を利用する新しいモデルS2SNetを提案する。データ不足を克服するために、s2snetは、マスキング言語モデリング(mlm)を使用して、マテリアルプロジェクトデータセット全体に事前トレーニングされる。 S2SNetは、新しい最先端の精度を92%、AUC(Area Under Curve)を0.92の精度で実現している。我々の知る限りでは、S2SNetは結晶構造の情報のみを用いて超伝導を予測する最初の研究である。この研究は超伝導の発見とさらにsdgに有益である。コードとデータセットはhttps://github.com/zjuKeLiu/S2SNetで入手できる。 Superconductivity allows electrical current to flow without any energy loss, and thus making solids superconducting is a grand goal of physics, material science, and electrical engineering. More than 16 Nobel Laureates have been awarded for their contribution to superconductivity research. Superconductors are valuable for sustainable development goals (SDGs), such as climate change mitigation, affordable and clean energy, industry, innovation and infrastructure, and so on. However, a unified physics theory explaining all superconductivity mechanism is still unknown. It is believed that superconductivity is microscopically due to not only molecular compositions but also the geometric crystal structure. Hence a new dataset, S2S, containing both crystal structures and superconducting critical temperature, is built upon SuperCon and Material Project. Based on this new dataset, we propose a novel model, S2SNet, which utilizes the attention mechanism for superconductivity prediction. To overcome the shortage of data, S2SNet is pre-trained on the whole Material Project dataset with Masked-Language Modeling (MLM). S2SNet makes a new state-of-the-art, with out-of-sample accuracy of 92% and Area Under Curve (AUC) of 0.92. To the best of our knowledge, S2SNet is the first work to predict superconductivity with only information of crystal structures. This work is beneficial to superconductivity discovery and further SDGs. Code and datasets are available in https://github.com/zjuKeLiu/S2SNet	翻訳日:2023-06-29 13:45:39 公開日:2023-06-28
# RSPrompter: Visual Foundation Modelに基づくリモートセンシングインスタンスセグメンテーションのためのプロンプト学習 RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model ( http://arxiv.org/abs/2306.16269v1 ) ライセンス: Link先を確認	Keyan Chen, Chenyang Liu, Hao Chen, Haotian Zhang, Wenyuan Li, Zhengxia Zou, and Zhenwei Shi	(参考訳) Meta AI Researchが提案したSegment Anything Model(SAM)は、大規模なトレーニングデータ(SA-1B)を活用することで、優れた一般化とゼロショット機能を示している。それでも、SAMはカテゴリに依存しないインスタンスセグメンテーション法として、ポイント、ボックス、粗いきめのマスクを含む以前の手動ガイダンスに大きく依存している。さらに,リモートセンシング画像分割タスクの性能については,まだ十分に検証されていない。本稿では,SAMファウンデーションモデルに基づくリモートセンシング画像の自動インスタンスセグメンテーション手法の設計について検討する。そこで本研究では,SAM入力に対する適切なプロンプトの生成を学習する手法を提案する。これにより、SAMはリモートセンシング画像に対して意味的に識別可能なセグメンテーション結果を生成することができる。また,SAMコミュニティの最近の発展をベースとして,例分節タスクなどいくつかの派生案を提案し,その性能をRSPrompterと比較した。 WHUビルディング,NWPU VHR-10,SSDDデータセットの大規模実験により,提案手法の有効性が検証された。私たちのコードは \url{https://kyanchen.github.io/RSPrompter} でアクセスできます。 Leveraging vast training data (SA-1B), the foundation Segment Anything Model (SAM) proposed by Meta AI Research exhibits remarkable generalization and zero-shot capabilities. Nonetheless, as a category-agnostic instance segmentation method, SAM heavily depends on prior manual guidance involving points, boxes, and coarse-grained masks. Additionally, its performance on remote sensing image segmentation tasks has yet to be fully explored and demonstrated. In this paper, we consider designing an automated instance segmentation approach for remote sensing images based on the SAM foundation model, incorporating semantic category information. Inspired by prompt learning, we propose a method to learn the generation of appropriate prompts for SAM input. This enables SAM to produce semantically discernible segmentation results for remote sensing images, which we refer to as RSPrompter. We also suggest several ongoing derivatives for instance segmentation tasks, based on recent developments in the SAM community, and compare their performance with RSPrompter. Extensive experimental results on the WHU building, NWPU VHR-10, and SSDD datasets validate the efficacy of our proposed method. Our code is accessible at \url{https://kyanchen.github.io/RSPrompter}.	翻訳日:2023-06-29 13:45:16 公開日:2023-06-28
# ランダム分類雑音をもつ辺縁半空間学習のための情報計算トレードオフ Information-Computation Tradeoffs for Learning Margin Halfspaces with Random Classification Noise ( http://arxiv.org/abs/2306.16352v1 ) ライセンス: Link先を確認	Ilias Diakonikolas, Jelena Diakonikolas, Daniel M. Kane, Puqian Wang, Nikos Zarifis	(参考訳) ランダムな分類雑音を伴うpac学習問題である\gamma$-margin半空間について検討する。我々は、問題のサンプル複雑性と計算効率の良いアルゴリズムのサンプル複雑性との間に固有のギャップを示唆する情報計算トレードオフを確立する。具体的には、問題のサンプル複雑性は$\widetilde{\Theta}(1/(\gamma^2 \epsilon)$である。まず、サンプル複雑性 $\widetilde{O}(1/(\gamma^2 \epsilon^2))$ の単純な効率的なアルゴリズムを与える。我々の主な結果は統計クエリ(sq)アルゴリズムと低次多項式テストに対する下限であり、サンプル複雑性における1/\epsilon$の二次依存は計算効率の高いアルゴリズムに固有のものであることを示唆している。具体的には、効率的なSQ学習者や低次テストのサンプル複雑さについて、より低い値の$\widetilde{\Omega}(1/(\gamma^{1/2} \epsilon^2)を示唆する。 We study the problem of PAC learning $\gamma$-margin halfspaces with Random Classification Noise. We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms. Concretely, the sample complexity of the problem is $\widetilde{\Theta}(1/(\gamma^2 \epsilon))$. We start by giving a simple efficient algorithm with sample complexity $\widetilde{O}(1/(\gamma^2 \epsilon^2))$. Our main result is a lower bound for Statistical Query (SQ) algorithms and low-degree polynomial tests suggesting that the quadratic dependence on $1/\epsilon$ in the sample complexity is inherent for computationally efficient algorithms. Specifically, our results imply a lower bound of $\widetilde{\Omega}(1/(\gamma^{1/2} \epsilon^2))$ on the sample complexity of any efficient SQ learner or low-degree test.	翻訳日:2023-06-29 13:37:33 公開日:2023-06-28
# ボソニックガウス流路の低地・高地容量領域解析 Low-ground/High ground capacity regions analysis for Bosonic Gaussian Channels ( http://arxiv.org/abs/2306.16350v1 ) ライセンス: Link先を確認	Farzad Kianvash, Marco Fanizza, and Vittorio Giovannetti	(参考訳) 本稿では, 単一モード, 位相非感受性ガウスボソニックチャネル間の相互接続の包括的特性について述べる。この特徴付けにより、これらのマップのパラメータ空間において、低地と高地という2つの異なる領域を識別できる。低地領域では、情報容量は指定基準値よりも小さく、高地領域では、確実に大きい。直接の結果として、既知の上界と合成規則を組み合わせて既存の結果を改善するこれらの写像の量子容量とプライベート容量の明示的な上界の集合を体系的に概説する。 We present a comprehensive characterization of the interconnections between single-mode, phaseinsensitive Gaussian Bosonic Channels resulting from channel concatenation. This characterization enables us to identify, in the parameter space of these maps, two distinct regions: low-ground and high-ground. In the low-ground region, the information capacities are smaller than a designated reference value, while in the high-ground region, they are provably greater. As a direct consequence, we systematically outline an explicit set of upper bounds for the quantum and private capacity of these maps, which combine known upper bounds and composition rules, improving upon existing results.	翻訳日:2023-06-29 13:37:20 公開日:2023-06-28
# SpinBusアーキテクチャ - 電子シャットリングによるスピン量子のスケーリング The SpinBus Architecture: Scaling Spin Qubits with Electron Shuttling ( http://arxiv.org/abs/2306.16348v1 ) ライセンス: Link先を確認	Matthias K\"unne, Alexander Willmes, Max Oberl\"ander, Christian Gorjaew, Julian D. Teske, Harsh Bhardwaj, Max Beer, Eugen Kammerloher, Ren\'e Otten, Inga Seidler, Ran Xue, Lars R. Schreiber and Hendrik Bluhm	(参考訳) 量子プロセッサアーキテクチャは、2次元の量子ビット接続と必要な操作能力を提供しながら、大きな量子ビット数へのスケーリングを可能にする必要がある。マイクロ波制御された半導体スピン量子ビットでは、密度の強いアレイがかなりの進歩を遂げているが、配線ファンアウトによりサイズが制限され、クォービット間のクロストークが顕著である。これらの制約を克服するために、電子シャットリングを用いてキュービットを接続し、低動作周波数と拡張キュービットコヒーレンスを特徴とするSpinBusアーキテクチャを導入する。 Si/SiGeプラットフォームにおける全ての関連する操作のデバイスシミュレーションは、確立された半導体パターン技術と99.9%以上の動作フィデリティによる実現可能性を検証する。室温計を用いた制御は、少なくとも144量子ビットを確実に支持できるが、もっと多くの数値が低温制御回路で認識できる。高忠実度スピンコヒーレント電子遮断の理論的実現可能性に基づいて、スピンバスアーキテクチャは実用的な量子コンピューティングのスケーラビリティ要件を満たすスピンベースの量子プロセッサの基礎となるかもしれない。 Quantum processor architectures must enable scaling to large qubit numbers while providing two-dimensional qubit connectivity and exquisite operation fidelities. For microwave-controlled semiconductor spin qubits, dense arrays have made considerable progress, but are still limited in size by wiring fan-out and exhibit significant crosstalk between qubits. To overcome these limitations, we introduce the SpinBus architecture, which uses electron shuttling to connect qubits and features low operating frequencies and enhanced qubit coherence. Device simulations for all relevant operations in the Si/SiGe platform validate the feasibility with established semiconductor patterning technology and operation fidelities exceeding 99.9 %. Control using room temperature instruments can plausibly support at least 144 qubits, but much larger numbers are conceivable with cryogenic control circuits. Building on the theoretical feasibility of high-fidelity spin-coherent electron shuttling as key enabling factor, the SpinBus architecture may be the basis for a spin-based quantum processor that meets the scalability requirements for practical quantum computing.	翻訳日:2023-06-29 13:37:08 公開日:2023-06-28
# 多様体上の自己回帰モデル(mnarx)を用いた複素系のダイナミクスの模倣 Emulating the dynamics of complex systems using autoregressive models on manifolds (mNARX) ( http://arxiv.org/abs/2306.16335v1 ) ライセンス: Link先を確認	Styfen Sch\"ar, Stefano Marelli, Bruno Sudret	(参考訳) 本研究では, 時間変化による外因性励起による複雑な力学系の応答を, 効率的に正確に近似するための新しい代理モデリング手法を提案する。我々のアプローチでは, 自己回帰的サロゲートを構成するのに最適な問題固有の外因性入力多様体を構築することを含む, 非線形自己回帰モデル (mNARX) と命名する。 mNARX の核を形成する多様体は、システムの物理と事前の専門家およびドメイン知識を組み込むことで漸進的に構成される。 mNARXは完全な問題を一連の小さなサブプロブレムに分解し、それぞれが元のより低い複雑さを持つので、最終的なサロゲートのトレーニングと評価のコストの両面で、問題の複雑さによく対応している。さらに、mnarxは従来の次元還元技術とよく調和しており、高次元外因性入力を持つ力学系をモデル化するのに非常に適しており、解くのが難しく、特に工学的応用で見られるような物理システムではドメイン知識が豊富であるため、mnarxはこれらの応用に適している。 1次元ランダム励起により励起される古典的結合ばね質量系の応答を予測するため,mNARXは従来の自己回帰代理よりも優れていた。さらに,mNARXは,アクティブコントローラの影響を受けても,現実的なエアロサーボ弾性風力タービンシミュレータの動力学を補助することにより,高次元時間・状態依存系のエミュレートに適していることを示す。一般に,mNARXは複雑な力学系を,精度と効率の観点からモデル化する上で有望な可能性を示している。 In this study, we propose a novel surrogate modelling approach to efficiently and accurately approximate the response of complex dynamical systems driven by time-varying exogenous excitations over extended time periods. Our approach, that we name \emph{manifold nonlinear autoregressive modelling with exogenous input} (mNARX), involves constructing a problem-specific exogenous input manifold that is optimal for constructing autoregressive surrogates. The manifold, which forms the core of mNARX, is constructed incrementally by incorporating the physics of the system, as well as prior expert- and domain- knowledge. Because mNARX decomposes the full problem into a series of smaller sub-problems, each with a lower complexity than the original, it scales well with the complexity of the problem, both in terms of training and evaluation costs of the final surrogate. Furthermore, mNARX synergizes well with traditional dimensionality reduction techniques, making it highly suitable for modelling dynamical systems with high-dimensional exogenous inputs, a class of problems that is typically challenging to solve.Since domain knowledge is particularly abundant in physical systems, such as those found in engineering applications, mNARX is well suited for these applications. We demonstrate that mNARX outperforms traditional autoregressive surrogates in predicting the response of a classical coupled spring-mass system excited by a one-dimensional random excitation. Additionally, we show that mNARX is well suited for emulating very high-dimensional time- and state-dependent systems, even when affected by active controllers, by surrogating the dynamics of a realistic aero-servo-elastic onshore wind turbine simulator. In general, our results demonstrate that mNARX offers promising prospects for modelling complex dynamical systems, in terms of accuracy and efficiency.	翻訳日:2023-06-29 13:36:48 公開日:2023-06-28
# 密度ランドマーク検出による離散化潜在座標系の同定可能性 Identifiability of Discretized Latent Coordinate Systems via Density Landmarks Detection ( http://arxiv.org/abs/2306.16334v1 ) ライセンス: Link先を確認	Vit\'oria Barin-Pacela, Kartik Ahuja, Simon Lacoste-Julien, Pascal Vincent	(参考訳) 乱れは観測された分布のみから有意義な潜在的地中要因を回復することを目的としている。 Identifiabilityは、不整合が十分に確立される理論的根拠を提供する。残念なことに、独立潜在因子の教師なし識別性は、因子から観測までの一般的な非線形滑らかな写像の下でのi.d.セッティングにおいて理論的に証明された不可能性である。本研究では,高次非線形滑らかな写像(微分同相写像)の下で離散化された潜在座標を,追加の帰納的バイアスを伴わずに復元可能であることを示す。これは、潜在密度が軸合わせの不連続性ランドマークを持つと仮定するが、因子の統計的独立性を非現実的に仮定する必要はない。本稿では,量子化座標identifiabilityと呼ばれる,この新しい形態の識別可能性を紹介し,離散座標の回復の包括的証明を提供する。 Disentanglement aims to recover meaningful latent ground-truth factors from only the observed distribution. Identifiability provides the theoretical grounding for disentanglement to be well-founded. Unfortunately, unsupervised identifiability of independent latent factors is a theoretically proven impossibility in the i.i.d. setting under a general nonlinear smooth map from factors to observations. In this work, we show that, remarkably, it is possible to recover discretized latent coordinates under a highly generic nonlinear smooth mapping (a diffeomorphism) without any additional inductive bias on the mapping. This is, assuming that latent density has axis-aligned discontinuity landmarks, but without making the unrealistic assumption of statistical independence of the factors. We introduce this novel form of identifiability, termed quantized coordinate identifiability, and provide a comprehensive proof of the recovery of discretized coordinates.	翻訳日:2023-06-29 13:36:16 公開日:2023-06-28
# diffcomplete:拡散に基づく生成的3次元形状完了 DiffComplete: Diffusion-based Generative 3D Shape Completion ( http://arxiv.org/abs/2306.16329v1 ) ライセンス: Link先を確認	Ruihang Chu, Enze Xie, Shentong Mo, Zhenguo Li, Matthias Nie{\ss}ner, Chi-Wing Fu, Jiaya Jia	(参考訳) 3dレンジスキャンによる形状完了のための新しい拡散ベース手法を提案する。従来の決定論的および確率論的手法と比較して、現実主義、多様性、高忠実性のバランスをとる。不完全な形状を条件とした生成タスクとして、形状完了をキャスティングすることでDiffCompleteを提案する。私たちのキーデザインは2倍です。まず,空間的に一貫した方法で条件付き特徴を注入する階層的特徴集約機構を考案する。そこで, 形状完了を制御するために, 条件入力の局所的詳細とより広い文脈の両方をキャプチャできる。第2に,複数の部分形状の完成と,入力条件に対する高い柔軟性を実現するために,我々のモデルにおける占有を考慮した融合戦略を提案する。 DiffCompleteは2つの大規模3D形状補完ベンチマーク上で新しいSOTA性能(例:l_1エラーの40%削減)を設定する。我々の完成形は決定論的方法と比較して現実的な見通しを持つだけでなく、確率的代替物と比較して基礎的真理と高い類似性を示す。さらに、DiffCompleteは、合成データと実データの両方に対して、完全に見えないクラスのオブジェクトに対して強力な一般化性を持ち、様々なアプリケーションでモデルの再トレーニングを不要にする。 We introduce a new diffusion-based approach for shape completion on 3D range scans. Compared with prior deterministic and probabilistic methods, we strike a balance between realism, multi-modality, and high fidelity. We propose DiffComplete by casting shape completion as a generative task conditioned on the incomplete shape. Our key designs are two-fold. First, we devise a hierarchical feature aggregation mechanism to inject conditional features in a spatially-consistent manner. So, we can capture both local details and broader contexts of the conditional inputs to control the shape completion. Second, we propose an occupancy-aware fusion strategy in our model to enable the completion of multiple partial shapes and introduce higher flexibility on the input conditions. DiffComplete sets a new SOTA performance (e.g., 40% decrease on l_1 error) on two large-scale 3D shape completion benchmarks. Our completed shapes not only have a realistic outlook compared with the deterministic methods but also exhibit high similarity to the ground truths compared with the probabilistic alternatives. Further, DiffComplete has strong generalizability on objects of entirely unseen classes for both synthetic and real data, eliminating the need for model re-training in various applications.	翻訳日:2023-06-29 13:35:58 公開日:2023-06-28
# 変分ベイズネットワークによる表現学習 Representation Learning via Variational Bayesian Networks ( http://arxiv.org/abs/2306.16326v1 ) ライセンス: Link先を確認	Oren Barkan, Avi Caciularu, Idan Rejwan, Ori Katz, Jonathan Weill, Itzik Malkiel, Noam Koenigstein	(参考訳) 変動ベイズネットワーク(vbn) - 階層的およびリレーショナルなサイド情報を利用した新しいベイズ型エンティティ表現学習モデルであり、データが不足している 'long-tail'' におけるエンティティのモデリングに特に有用である。第一に、vbnは共通の祖先を共有するエンティティ間の情報伝達を可能にする情報的階層的優先順位を採用している。さらに、VBNは相補構造と一貫性を強制するエンティティ間の明示的な関係をモデル化し、学習された表現をより意味のある空間配置へと導く。第2に、VBNは(ベクトルではなく)密度による実体を表すため、データ不足に対処する上で相補的な役割を果たす不確実性をモデル化する。最後に,高速近似ベイズ推定を可能にするスケーラブルな変分ベイズ最適化アルゴリズムを提案する。言語,推奨,医学的推論タスクにおけるvbnの有効性を評価した。以上の結果から,VBNは複数のデータセット,特にロングテールにおいて,既存の手法よりも優れていることがわかった。 We present Variational Bayesian Network (VBN) - a novel Bayesian entity representation learning model that utilizes hierarchical and relational side information and is particularly useful for modeling entities in the ``long-tail'', where the data is scarce. VBN provides better modeling for long-tail entities via two complementary mechanisms: First, VBN employs informative hierarchical priors that enable information propagation between entities sharing common ancestors. Additionally, VBN models explicit relations between entities that enforce complementary structure and consistency, guiding the learned representations towards a more meaningful arrangement in space. Second, VBN represents entities by densities (rather than vectors), hence modeling uncertainty that plays a complementary role in coping with data scarcity. Finally, we propose a scalable Variational Bayes optimization algorithm that enables fast approximate Bayesian inference. We evaluate the effectiveness of VBN on linguistic, recommendations, and medical inference tasks. Our findings show that VBN outperforms other existing methods across multiple datasets, and especially in the long-tail.	翻訳日:2023-06-29 13:35:39 公開日:2023-06-28
# DoseDiff:放射線治療における線量予測のための距離認識拡散モデル DoseDiff: Distance-aware Diffusion Model for Dose Prediction in Radiotherapy ( http://arxiv.org/abs/2306.16324v1 ) ライセンス: Link先を確認	Yiwen Zhang, Chuanpu Li, Liming Zhong, Zeli Chen, Wei Yang, and Xuetao Wang	(参考訳) 治療計画は放射線治療のワークフローにおいて重要な要素であり、一般的には医療物理学者が時間を要する試行錯誤の方法で行う。これまでの研究では、医学物理学者が治療計画の効率を改善するのに役立つ線量分布マップを予測するための知識ベースまたは深層学習ベースの方法が提案されている。しかしながら、これらの線量予測法は通常、周囲の組織と標的または臓器間の距離情報の有効利用を欠いている。さらに、予測された線量分布図における線量経路の分布特性の維持に乏しく、医療物理学者が得る貴重な情報を失うことになる。本稿では,線量分布を正確に予測するための距離認識拡散モデル(DoseDiff)を提案する。我々は、線量予測を、CT画像と署名された距離マップ(SDM)の条件で予測された線量分布マップを生成する、認知ステップのシーケンスとして定義する。 SDMは、画像の各画素から目標またはOARのアウトラインまでの距離情報を提供するターゲットまたはOARのマスクからの距離変換によって得られる。さらに,マルチエンコーダとマルチスケールフュージョンネットワーク(MMFNet)を提案し,マルチスケールフュージョンとトランスフォーマーベースフュージョンモジュールを組み込むことにより,機能レベルでのCT画像とSDM間の情報フュージョンを強化する。本モデルは,乳癌患者と鼻咽頭癌患者から収集した2つのデータを用いて評価した。その結果,ドセディフは量的および視覚的品質の両面で最先端の線量予測法より優れていた。 Treatment planning is a critical component of the radiotherapy workflow, typically carried out by a medical physicist using a time-consuming trial-and-error manner. Previous studies have proposed knowledge-based or deep learning-based methods for predicting dose distribution maps to assist medical physicists in improving the efficiency of treatment planning. However, these dose prediction methods usuallylack the effective utilization of distance information between surrounding tissues andtargets or organs-at-risk (OARs). Moreover, they are poor in maintaining the distribution characteristics of ray paths in the predicted dose distribution maps, resulting in a loss of valuable information obtained by medical physicists. In this paper, we propose a distance-aware diffusion model (DoseDiff) for precise prediction of dose distribution. We define dose prediction as a sequence of denoising steps, wherein the predicted dose distribution map is generated with the conditions of the CT image and signed distance maps (SDMs). The SDMs are obtained by a distance transformation from the masks of targets or OARs, which provide the distance information from each pixel in the image to the outline of the targets or OARs. Besides, we propose a multiencoder and multi-scale fusion network (MMFNet) that incorporates a multi-scale fusion and a transformer-based fusion module to enhance information fusion between the CT image and SDMs at the feature level. Our model was evaluated on two datasets collected from patients with breast cancer and nasopharyngeal cancer, respectively. The results demonstrate that our DoseDiff outperforms the state-of-the-art dose prediction methods in terms of both quantitative and visual quality.	翻訳日:2023-06-29 13:35:22 公開日:2023-06-28
# Taqyim: ChatGPTモデルによるアラビアNLPタスクの評価 Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models ( http://arxiv.org/abs/2306.16322v1 ) ライセンス: Link先を確認	Zaid Alyafeai and Maged S. Alshaibani and Badr AlKhamissi and Hamzah Luqman and Ebrahim Alareqi and Ali Fadel	(参考訳) GPT-3.5やGPT-4のようなLLM上に構築されたチャットベースのモデルChatGPTなど、さまざまなダウンストリームタスクにおいて、微調整を必要とせずに、大きな言語モデル(LLM)が印象的なパフォーマンスを示している。英語に比べて訓練率が低いにもかかわらず、これらのモデルは他の言語でも顕著な能力を示す。本研究では, 感情分析, 翻訳, 翻訳, パラフレージング, 音声タグ付け, 要約, ダイアクリマイゼーションの7つの異なるNLPタスクにおけるGPT-3.5およびGPT-4モデルの性能評価を行った。 GPT-4は7タスク中5タスクでGPT-3.5を上回った。さらに、感情分析タスクを広範囲に分析し、難易度データセット上でLCMが例外的な結果を得る方法について考察する。さらに,これらのタスクの評価を容易にする新しいPythonインターフェース https://github.com/ARBML/Taqyimを導入する。 Large language models (LLMs) have demonstrated impressive performance on various downstream tasks without requiring fine-tuning, including ChatGPT, a chat-based model built on top of LLMs such as GPT-3.5 and GPT-4. Despite having a lower training proportion compared to English, these models also exhibit remarkable capabilities in other languages. In this study, we assess the performance of GPT-3.5 and GPT-4 models on seven distinct Arabic NLP tasks: sentiment analysis, translation, transliteration, paraphrasing, part of speech tagging, summarization, and diacritization. Our findings reveal that GPT-4 outperforms GPT-3.5 on five out of the seven tasks. Furthermore, we conduct an extensive analysis of the sentiment analysis task, providing insights into how LLMs achieve exceptional results on a challenging dialectal dataset. Additionally, we introduce a new Python interface https://github.com/ARBML/Taqyim that facilitates the evaluation of these tasks effortlessly.	翻訳日:2023-06-29 13:34:55 公開日:2023-06-28
# 意味的検出を用いた中国語テキスト修正のための敵対的マルチタスク学習法 An Adversarial Multi-Task Learning Method for Chinese Text Correction with Semantic Detection ( http://arxiv.org/abs/2306.16313v1 ) ライセンス: Link先を確認	Fanyu Wang and Zhenping Xie	(参考訳) テキストの修正、特により広く使用されるシーンのセマンティックな修正は、テキストの流速と筆記効率を改善するために強く要求される。中国語文文脈における文字ポリセミーのモデル化と検出能力を高めるために, 対向的多タスク学習法を提案する。そこで、マスク言語モデルとスコアリング言語モデルという2つのモデルが、結合されただけでなく、逆の学習タスクとして導入された。さらに,モンテカルロ木探索戦略とポリシーネットワークを導入して,意味検出による中国語テキストの効率的な修正作業を実現する。実験は,3つのデータセットと5つの比較手法を用いて実施され,本手法は,意味的合理性を高めるために,中国語テキスト修正タスクにおいて優れた性能が得られることを示す。 Text correction, especially the semantic correction of more widely used scenes, is strongly required to improve, for the fluency and writing efficiency of the text. An adversarial multi-task learning method is proposed to enhance the modeling and detection ability of character polysemy in Chinese sentence context. Wherein, two models, the masked language model and scoring language model, are introduced as a pair of not only coupled but also adversarial learning tasks. Moreover, the Monte Carlo tree search strategy and a policy network are introduced to accomplish the efficient Chinese text correction task with semantic detection. The experiments are executed on three datasets and five comparable methods, and the experimental results show that our method can obtain good performance in Chinese text correction task for better semantic rationality.	翻訳日:2023-06-29 13:34:38 公開日:2023-06-28
# ベイズ逆問題に対する時空間ベソフ前処理 Spatiotemporal Besov Priors for Bayesian Inverse Problems ( http://arxiv.org/abs/2306.16378v1 ) ライセンス: Link先を確認	Shiwei Lan, Mirjeta Pasha, and Shuyi Li	(参考訳) 科学技術の急速な発展は、突然の変化や鋭いコントラストといった特別なデータ特徴を捉えるための適切な統計ツールの必要性を招いた。データサイエンスにおける多くの応用は、不連続性や特異性のある時間依存物体(例えば、動的コンピュータ断層撮影(ct)画像)から時空間的再構成を求める。ガウス過程(gp)に基づく従来の手法は、過剰な事前候補を提供する傾向があるため、十分な解を提供することができない。近年、ランダム係数を持つウェーブレット展開によって定義されるベッソフ過程(bp)は、このタイプのベイズ逆問題より適切であるとして提案されている。 BPは画像解析においてGPを上回ってエッジ保存再構成を生成するが、動的に変化する画像に遺伝する時間相関を自動的に組み込むわけではない。本稿では,時系列相関強度を規定するq指数過程に従って,系列展開の確率係数を確率時間関数に置き換え,時空間領域(stbp)へbpを一般化する。 STBPに関する数学的および統計的性質を慎重に研究した。また,STBPの白色雑音表現も提案し,後方サンプリングによる最大値(MAP)と不確かさ定量化(UQ)による点推定を容易にする。 2つの有限角ct再構成例とnavier-stokes方程式を含む高非線形逆問題を用いて、従来のstgpと時間非相関アプローチと比較して、時間変化を考慮した空間的特徴の保存におけるstbpの利点を示す。 Fast development in science and technology has driven the need for proper statistical tools to capture special data features such as abrupt changes or sharp contrast. Many applications in the data science seek spatiotemporal reconstruction from a sequence of time-dependent objects with discontinuity or singularity, e.g. dynamic computerized tomography (CT) images with edges. Traditional methods based on Gaussian processes (GP) may not provide satisfactory solutions since they tend to offer over-smooth prior candidates. Recently, Besov process (BP) defined by wavelet expansions with random coefficients has been proposed as a more appropriate prior for this type of Bayesian inverse problems. While BP outperforms GP in imaging analysis to produce edge-preserving reconstructions, it does not automatically incorporate temporal correlation inherited in the dynamically changing images. In this paper, we generalize BP to the spatiotemporal domain (STBP) by replacing the random coefficients in the series expansion with stochastic time functions following Q-exponential process which governs the temporal correlation strength. Mathematical and statistical properties about STBP are carefully studied. A white-noise representation of STBP is also proposed to facilitate the point estimation through maximum a posterior (MAP) and the uncertainty quantification (UQ) by posterior sampling. Two limited-angle CT reconstruction examples and a highly non-linear inverse problem involving Navier-Stokes equation are used to demonstrate the advantage of the proposed STBP in preserving spatial features while accounting for temporal changes compared with the classic STGP and a time-uncorrelated approach.	翻訳日:2023-06-29 13:28:55 公開日:2023-06-28
# メモリ・マイクロスケール接続機能を有する単一電子情報処理デバイス用Si/SiGe QuBus Si/SiGe QuBus for single electron information-processing devices with memory and micron-scale connectivity function ( http://arxiv.org/abs/2306.16375v1 ) ライセンス: Link先を確認	Ran Xue, Max Beer, Inga Seidler, Simon Humpohl, Jhih-Sian Tu, Stefan Trellenkamp, Tom Struck, Hendrik Bluhm, Lars R. Schreiber	(参考訳) 単一キャリア情報処理デバイス内の接続には、単一電荷量子の転送とストレージが必要である。われわれの全電動Si/SiGeシャトル装置は量子バス(QuBus)と呼ばれ、長さは10$\mathrm{\mu}$mで、電圧パルスは6つしかない。コンベアモード(containor-mode)、すなわち電子は移動QDに閉じ込められ、断熱的に輸送される。我々は,QuBusの潜在的な欠陥と局所的なシャトル忠実度をベンチマークするために,シャトルトモグラフィーと呼ばれるキャラクタリゼーション手法を提案する。全デバイスと背面を横断する単電子シャトルの忠実性(合計距離は19ドルの\mathrm{\mu}$m)は$(99.7 \pm 0.3)\,\%$である。 QuBusを用いて最大34個の電子の位置と検出を行い、任意に選択されたゼロ電子と単一電子のパターンを持つ34個の量子ドットのレジスタを初期化する。単純な演算信号、産業製造との互換性、$^{28}$Si/SiGeでの低スピン環境相互作用は、量子コンピューティングアーキテクチャにおける量子接続のためのスピン保存輸送を約束する。 The connectivity within single carrier information-processing devices requires transport and storage of single charge quanta. Our all-electrical Si/SiGe shuttle device, called quantum bus (QuBus), spans a length of 10 $\mathrm{\mu}$m and is operated by only six simply-tunable voltage pulses. It operates in conveyor-mode, i.e. the electron is adiabatically transported while confined to a moving QD. We introduce a characterization method, called shuttle-tomography, to benchmark the potential imperfections and local shuttle-fidelity of the QuBus. The fidelity of the single-electron shuttle across the full device and back (a total distance of 19 $\mathrm{\mu}$m) is $(99.7 \pm 0.3)\,\%$. Using the QuBus, we position and detect up to 34 electrons and initialize a register of 34 quantum dots with arbitrarily chosen patterns of zero and single-electrons. The simple operation signals, compatibility with industry fabrication and low spin-environment-interaction in $^{28}$Si/SiGe, promises spin-conserving transport of spin qubits for quantum connectivity in quantum computing architectures.	翻訳日:2023-06-29 13:28:14 公開日:2023-06-28
# フォールトトレランス前の量子コンピューティングの有用性に関するエビデンスの高速古典シミュレーション Fast classical simulation of evidence for the utility of quantum computing before fault tolerance ( http://arxiv.org/abs/2306.16372v1 ) ライセンス: Link先を確認	Tomislav Begu\v{s}i\'c and Garnet Kin-Lic Chan	(参考訳) スパース・ポーリ・ダイナミクスに基づく古典的アルゴリズムは、ibmのeagleプロセッサ[nature 618, 500 (2023)]の127量子ビットに関する最近の実験で研究された量子回路を効率的にシミュレートできる。ラップトップの単一コア上の古典的なシミュレーションは、報告された量子シミュレーションのウォールタイムよりも桁違いに速く、古典的な処理のない推定量子ハードウェアランタイムよりも高速で、ゼロノイズ外挿実験結果とよく一致しています。 We show that a classical algorithm based on sparse Pauli dynamics can efficiently simulate quantum circuits studied in a recent experiment on 127 qubits of IBM's Eagle processor [Nature 618, 500 (2023)]. Our classical simulations on a single core of a laptop are orders of magnitude faster than the reported walltime of the quantum simulations, as well as faster than the estimated quantum hardware runtime without classical processing, and are in good agreement with the zero-noise extrapolated experimental results.	翻訳日:2023-06-29 13:27:47 公開日:2023-06-28
# 不平衡振幅をもつ和オーバーパスの完全等式理論 Complete equational theories for the sum-over-paths with unbalanced amplitudes ( http://arxiv.org/abs/2306.16369v1 ) ライセンス: Link先を確認	Matthew Amy	(参考訳) vilmart氏は最近、toffoli-hadamard回路と拡張clifford+$\mathrm{diag}(1, \zeta_{2^k})$回路上の平衡和オーバーパスの完全な方程式理論を提示した。それらの理論は、位相自由なZH-計算に基づいており、完全なZH-計算の平均的な規則を著しく省略し、振幅の局所的な和を許容しない。ここでは局所和を自然に支持する不均衡経路和における完全性の問題を考察する。非平衡和オーバーパスの具体的構文を示し、記号的多線型代数と干渉規則とともに、zh-係数の平均および正則の様々な定式化が任意の環と体上の完全な方程式論を与えるのに十分であることを示す。 Vilmart recently gave a complete equational theory for the balanced sum-over-paths over Toffoli-Hadamard circuits, and by extension Clifford+$\mathrm{diag}(1, \zeta_{2^k})$ circuits. Their theory is based on the phase-free ZH-calculus which crucially omits the average rule of the full ZH-calculus, dis-allowing the local summation of amplitudes. Here we study the question of completeness in unbalanced path sums which naturally support local summation. We give a concrete syntax for the unbalanced sum-over-paths and show that, together with symbolic multilinear algebra and the interference rule, various formulations of the average and ortho rules of the ZH-calculus are sufficient to give complete equational theories over arbitrary rings and fields.	翻訳日:2023-06-29 13:27:29 公開日:2023-06-28
# ラグランジアンに基づく自動推論のためのAアルゴリズム Lagrangian based A algorithm for automated reasoning ( http://arxiv.org/abs/2306.16368v1 ) ライセンス: Link先を確認	Renju Rajan	(参考訳) 本稿では,Aアルゴリズムの修正を最短経路問題として検討する。重み付けはAアルゴリズムのヒューリスティックな部分で導入され、効率が向上する。このアルゴリズムの応用は、速度をヒューリスティックの湿潤化とみなすUAV経路計画に適用できると考えられる。当初、ラグランジュ方程式に基づく変分法を用いて速度を力学系の決定的因子として同定した。このアプローチは、これらの領域におけるアルゴリズムの効率を改善するのに他の問題にも役立つだろう。 In this paper, a modification of A* algorithm is considered for the shortest path problem. A weightage is introduced in the heuristic part of the A* algorithm to improve its efficiency. An application of the algorithm is considered for UAV path planning wherein velocity is taken as the weigtage to the heuristic. At the outset, calculus of variations based Lagrange's equation was used to identify velocity as the decisive factor for the dynamical system. This approach would be useful for other problems as well to improve the efficiency of algorithms in those areas.	翻訳日:2023-06-29 13:27:11 公開日:2023-06-28
# 再帰的・注意的モデルとNVFlareを用いた多段階臨床フェデレーション学習 Multi-Site Clinical Federated Learning using Recursive and Attentive Models and NVFlare ( http://arxiv.org/abs/2306.16367v1 ) ライセンス: Link先を確認	Won Joon Yun, Samuel Kim, Joongheon Kim	(参考訳) デジタルヘルスデータの驚異的な成長は、自然言語処理(NLP)や医療記録、臨床ノート、その他のテキストベースの健康情報を精査するための機械学習手法の利用への関心が高まっている。 nlp技術は患者のケアを増強し、臨床意思決定を知らせる上で大きな可能性を秘めているが、データプライバシと規制への順守は重要な懸念として続いている。フェデレーテッド・ラーニング(FL)は実行可能なソリューションとして登場し、複数の組織が生データを広めることなく、機械学習モデルを協調的にトレーニングすることを可能にする。本稿では, NVIDIA が開発した FL, NLP モデル, NVFlare フレームワークを併用することにより, 医療用 NLP に対する実用的アプローチを実現する。医療データ内のコンテキストやセマンティクスの理解において例外的な性能を示した,長期記憶モデル(lstm)とトランスフォーマ(bert)からの双方向エンコーダ表現モデルという,2つの模範的nlpモデルを提案する。本稿では,データのプライバシと規制遵守の課題に対処しつつ,高い精度と性能を維持しながら,bertプリトレーニングを取り入れ,提案手法の有効性を包括的に検証する統合フレームワークの開発について述べる。 The prodigious growth of digital health data has precipitated a mounting interest in harnessing machine learning methodologies, such as natural language processing (NLP), to scrutinize medical records, clinical notes, and other text-based health information. Although NLP techniques have exhibited substantial potential in augmenting patient care and informing clinical decision-making, data privacy and adherence to regulations persist as critical concerns. Federated learning (FL) emerges as a viable solution, empowering multiple organizations to train machine learning models collaboratively without disseminating raw data. This paper proffers a pragmatic approach to medical NLP by amalgamating FL, NLP models, and the NVFlare framework, developed by NVIDIA. We introduce two exemplary NLP models, the Long-Short Term Memory (LSTM)-based model and Bidirectional Encoder Representations from Transformers (BERT), which have demonstrated exceptional performance in comprehending context and semantics within medical data. This paper encompasses the development of an integrated framework that addresses data privacy and regulatory compliance challenges while maintaining elevated accuracy and performance, incorporating BERT pretraining, and comprehensively substantiating the efficacy of the proposed approach.	翻訳日:2023-06-29 13:27:01 公開日:2023-06-28
# Vanilla Gradient Descentを用いたNTKを超えて:ポリノーミアル幅,サンプル,時間を有するニューラルネットワークの平均場解析 Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time ( http://arxiv.org/abs/2306.16361v1 ) ライセンス: Link先を確認	Arvind Mahankali, Jeff Z. Haochen, Kefan Dong, Margalit Glasgow, Tengyu Ma	(参考訳) 2層ニューラルネットワークの非凸最適化に関する最近の理論的な進歩にもかかわらず、不自然な修正を伴わないニューラルネットワークの勾配降下がカーネル法よりも優れたサンプル複雑性を達成することができるかどうかはまだ疑問である。本稿では,多項式幅2層ニューラルネットワーク上の投影勾配流れのクリーンな平均場解析を提供する。先行研究と異なり,本解析では最適化アルゴリズムの不自然な修正は不要である。サンプルサイズ $n = O(d^{3.1})$ の場合、$d$ は入力の次元であり、ネットワークは多項式的に多くの反復に収束し、$n \ll d^4$ サンプルを用いてカーネルメソッドでは達成できない非自明な誤差に収束するので、修正されていない勾配降下と NTK の明確な分離を示す。 Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it is still an open question whether gradient descent on neural networks without unnatural modifications can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Different from prior works, our analysis does not require unnatural modifications of the optimization algorithm. We prove that with sample size $n = O(d^{3.1})$ where $d$ is the dimension of the inputs, the network converges in polynomially many iterations to a non-trivial error that is not achievable by kernel methods using $n \ll d^4$ samples, hence demonstrating a clear separation between unmodified gradient descent and NTK.	翻訳日:2023-06-29 13:26:38 公開日:2023-06-28
# 分極量子回路における古典計算性能境界 Classically computing performance bounds on depolarized quantum circuits ( http://arxiv.org/abs/2306.16360v1 ) ライセンス: Link先を確認	Sattwik Deb Mishra, Miguel Fr\'ias-P\'erez, Rahul Trivedi	(参考訳) 量子コンピュータとシミュレータは、古典的および量子的ハミルトニアンの基底状態の発見において、古典的コンピュータを上回る可能性がある。しかし、この利点が誤り訂正なしでノイズの存在に持続できるかどうかはまだ不明である。本稿では,ラグランジュ双対性の原理を生かして,量子回路の出力状態によって達成可能な最小エネルギーに対する検証可能な下限を,非分極ノイズの存在下で古典的に計算する数値解法を開発した。提案手法は、雑音量子回路の性能に回路構造依存的な境界を与えることができるという理論的および数値的な証拠を提供する。 Quantum computers and simulators can potentially outperform classical computers in finding ground states of classical and quantum Hamiltonians. However, if this advantage can persist in the presence of noise without error correction remains unclear. In this paper, by exploiting the principle of Lagrangian duality, we develop a numerical method to classically compute a certifiable lower bound on the minimum energy attainable by the output state of a quantum circuit in the presence of depolarizing noise. We provide theoretical and numerical evidence that this approach can provide circuit-architecture dependent bounds on the performance of noisy quantum circuits.	翻訳日:2023-06-29 13:26:18 公開日:2023-06-28
# 時空間グラフ畳み込みネットワークの伝達学習による視覚障害者の劇場支援システム Theater Aid System for the Visually Impaired Through Transfer Learning of Spatio-Temporal Graph Convolution Networks ( http://arxiv.org/abs/2306.16357v1 ) ライセンス: Link先を確認	Leyla Benhamida, Slimane Larabi	(参考訳) 本研究の目的は、視覚障害者や視覚障害者を支援するためにステージで行う人間の行動を認識することである。そこで我々は,深度画像で捉えた骨格データを入力として利用する演劇人間行動認識システムを開発した。劇場環境における人間の行動の新たなサンプルを収集し,スケルトンに基づく人行動認識のための3つの事前訓練された時空間グラフ畳み込みネットワーク(時空間グラフ畳み込みネットワーク,2ストリーム適応グラフ畳み込みネットワーク,およびマルチスケール不整合グラフ畳み込みネットワーク)を用いて移動学習手法を検証した。我々は、NTU-RGBDのヒューマンアクションベンチマークをソースドメインとして選択し、収集したデータセットをターゲットドメインとして使用した。本研究は,事前学習モデルの伝達可能性を分析し,トランスファー学習手法をソース領域とターゲット領域の多様性に適用し,適用するための2つの構成を提案した。移行学習の使用は、演劇の文脈における人間の行動システムの性能向上に寄与した。その結果,時空間グラフ畳み込みネットワークは肯定的に転送され,転送学習のないベースラインに比べて性能が向上した。 The aim of this research is to recognize human actions performed on stage to aid visually impaired and blind individuals. To achieve this, we have created a theatre human action recognition system that uses skeleton data captured by depth image as input. We collected new samples of human actions in a theatre environment, and then tested the transfer learning technique with three pre-trained Spatio-Temporal Graph Convolution Networks for skeleton-based human action recognition: the spatio-temporal graph convolution network, the two-stream adaptive graph convolution network, and the multi-scale disentangled unified graph convolution network. We selected the NTU-RGBD human action benchmark as the source domain and used our collected dataset as the target domain. We analyzed the transferability of the pre-trained models and proposed two configurations to apply and adapt the transfer learning technique to the diversity between the source and target domains. The use of transfer learning helped to improve the performance of the human action system within the context of theatre. The results indicate that Spatio-Temporal Graph Convolution Networks is positively transferred, and there was an improvement in performance compared to the baseline without transfer learning.	翻訳日:2023-06-29 13:26:06 公開日:2023-06-28
# cuSLINK:GPU上の単一リンク集約クラスタリング cuSLINK: Single-linkage Agglomerative Clustering on the GPU ( http://arxiv.org/abs/2306.16354v1 ) ライセンス: Link先を確認	Corey J. Nolet, Divye Gala, Alex Fender, Mahesh Doijade, Joe Eaton, Edward Raff, John Zedlewski, Brad Rees, Tim Oates	(参考訳) 本稿では,GPU上のSLINKアルゴリズムの新規かつ最先端の再構成であるcuSLINKを提案する。また,cuslinkを構成する新規かつ再利用可能なビルディングブロックのセットを提案する。これらのビルディングブロックには、$k$-NNグラフ構築、スパンニングツリー、デンドログラムクラスタ抽出のための高度に最適化された計算パターンが含まれている。我々は、プリミティブをGPU上でcuSLINKのエンドツーエンド実装にどのように使用したかを示し、さらに、かつて難解だったさまざまな現実世界のデータマイニングと機械学習アプリケーションを可能にしました。 HDBSCANアルゴリズムの主要な計算ボトルネックであるだけでなく、我々のエンドツーエンドのcuSLINKアルゴリズムの影響は、ソーシャルおよびコンピュータネットワークにおけるクラスタ分析、自然言語処理、コンピュータビジョンなど、幅広い重要な応用に及んでいる。 cuSLINKはhttps://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clusteringで入手できる。 In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation of the SLINK algorithm on the GPU which requires only $O(Nk)$ space and uses a parameter $k$ to trade off space and time. We also propose a set of novel and reusable building blocks that compose cuSLINK. These building blocks include highly optimized computational patterns for $k$-NN graph construction, spanning trees, and dendrogram cluster extraction. We show how we used our primitives to implement cuSLINK end-to-end on the GPU, further enabling a wide range of real-world data mining and machine learning applications that were once intractable. In addition to being a primary computational bottleneck in the popular HDBSCAN algorithm, the impact of our end-to-end cuSLINK algorithm spans a large range of important applications, including cluster analysis in social and computer networks, natural language processing, and computer vision. Users can obtain cuSLINK at https://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clustering	翻訳日:2023-06-29 13:25:42 公開日:2023-06-28
# データ攻撃に対するアグリゲーション防衛の実践的側面について On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks ( http://arxiv.org/abs/2306.16415v1 ) ライセンス: Link先を確認	Wenxiao Wang, Soheil Feizi	(参考訳) データへのアクセスの増加は、悪意のあるトレーニングサンプルでディープラーニングモデルの振る舞いを操作できるため、ディープラーニングの機会とリスクの両方をもたらす。このような攻撃はデータ中毒として知られている。データ中毒に対する防衛戦略の最近の進歩は、診断された中毒の堅牢性における最先端の成果を達成するための集約スキームの有効性を強調している。しかし、これらのアプローチの実践的意味はいまだ不明である。ここでは,代表的なアグリゲーション防御であるディープパーティショニングアグリゲーションに注目し,その実用的側面である効率,性能,堅牢性を評価する。評価には、ImageNetを64×64の解像度にリサイズして、以前のものよりも大規模な評価を可能にする。まず,集約防御のトレーニングと推論の効率を向上し,ベースモデルのスケーリングをシンプルかつ実用的なアプローチとして示す。第2に,データ・複雑度比,すなわちデータセットのサイズとサンプルの複雑さの比率を,精度を保ちながら展開可能なベースモデルの最大数を実用的に推定する実験的な証拠を提供する。最後に,アグリゲーション・ディフェンスが,アグリゲーションのアグリゲーションのアグリゲーション・ロバスト性において,アグリゲーションのアグリゲーション・ロバスト性を示す主要なメカニズムである中毒過剰適合現象を経験的に促進する方法を指摘する。全体として,データ中毒の脅威を軽減するために,アグリゲーション防御の実践的実装に有用な知見を提供する。 The increasing access to data poses both opportunities and risks in deep learning, as one can manipulate the behaviors of deep learning models with malicious training samples. Such attacks are known as data poisoning. Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving state-of-the-art results in certified poisoning robustness. However, the practical implications of these approaches remain unclear. Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness. For evaluations, we use ImageNet resized to a resolution of 64 by 64 to enable evaluations at a larger scale than previous ones. Firstly, we demonstrate a simple yet practical approach to scaling base models, which improves the efficiency of training and inference for aggregation defenses. Secondly, we provide empirical evidence supporting the data-to-complexity ratio, i.e. the ratio between the data set size and sample complexity, as a practical estimation of the maximum number of base models that can be deployed while preserving accuracy. Last but not least, we point out how aggregation defenses boost poisoning robustness empirically through the poisoning overfitting phenomenon, which is the key underlying mechanism for the empirical poisoning robustness of aggregations. Overall, our findings provide valuable insights for practical implementations of aggregation defenses to mitigate the threat of data poisoning.	翻訳日:2023-06-29 13:18:21 公開日:2023-06-28
# MultiZoo & MultiBench: マルチモーダルディープラーニングのための標準ツールキット MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning ( http://arxiv.org/abs/2306.16413v1 ) ライセンス: Link先を確認	Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov	(参考訳) マルチモーダル表現の学習には、複数の異種データソースからの情報を統合することが含まれる。実世界のロバスト性を確保しつつ、未調査のモダリティやタスクの進歩を加速するため、20以上のコアマルチモーダルアルゴリズムと15のデータセット、10のモダリティ、20の予測タスク、および6つの研究領域にまたがる大規模ベンチマークであるMultiBenchを実装した公開ツールキットであるMultiZooをリリースする。これらを合わせて、データローディングや実験的なセットアップ、モデル評価の簡略化と標準化を行う、エンドツーエンドのマシンラーニングパイプラインが提供される。本研究では,(1)一般化,(2)時間と空間の複雑さ,(3)モダリティの堅牢性を評価するための包括的方法論を提案する。マルチベンチは、使いやすさ、アクセシビリティ、再現性を確保しつつ、マルチモーダルモデルの能力と制限をよりよく理解するための道を開く。私たちのツールキットは公開され、定期的に更新され、コミュニティからのインプットを歓迎します。 Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiZoo, a public toolkit consisting of standardized implementations of > 20 core multimodal algorithms and MultiBench, a large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. Together, these provide an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, we offer a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench paves the way towards a better understanding of the capabilities and limitations of multimodal models, while ensuring ease of use, accessibility, and reproducibility. Our toolkits are publicly available, will be regularly updated, and welcome inputs from the community.	翻訳日:2023-06-29 13:17:52 公開日:2023-06-28
# 見える言語モデルに向けて:自然言語レンズによるコンピュータビジョン Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language ( http://arxiv.org/abs/2306.16410v1 ) ライセンス: Link先を確認	William Berrios, Gautam Mittal, Tristan Thrush, Douwe Kiela, Amanpreet Singh	(参考訳) 大規模言語モデル(LLM)のパワーを活用することで,コンピュータビジョン問題に対処するためのモジュール型アプローチであるLENSを提案する。本システムでは、画像に関する徹底的な情報を提供する独立かつ記述性の高い視覚モジュール群からの出力を推論するために言語モデルを用いる。我々は,ゼロショットや少数ショットの物体認識などの純粋コンピュータビジョンの設定や,視覚や言語の問題に対するアプローチを評価する。 LENS は市販の LLM にも適用可能であり,LENS を用いた LLM は,より大規模で高度なシステムで高い競争力を発揮する。私たちはコードをhttps://github.com/contextualai/lensでオープンソースにし、インタラクティブなデモを提供します。 We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recognition, as well as on vision and language problems. LENS can be applied to any off-the-shelf LLM and we find that the LLMs with LENS perform highly competitively with much bigger and much more sophisticated systems, without any multimodal training whatsoever. We open-source our code at https://github.com/ContextualAI/lens and provide an interactive demo.	翻訳日:2023-06-29 13:17:31 公開日:2023-06-28
# データセットシフトの一般形に基づく効率的かつ多元的ロバストリスク推定 Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift ( http://arxiv.org/abs/2306.16406v1 ) ライセンス: Link先を確認	Hongxiang Qiu, Eric Tchetgen Tchetgen, Edgar Dobriban	(参考訳) 統計的な機械学習手法は、利害関係者から利用可能な限られたデータの課題に直面することが多い。 1つの治療法は、いくつかの条件分布を共有したり、ターゲットドメインと他の方法でリンクされた補助源集団のデータを活用することである。このような \emph{dataset shift} 条件を活用する手法は \emph{domain adaptation} または \emph{transfer learning} として知られている。データセットのシフトに関する広範な文献にもかかわらず、限定的な研究は、対象人口における与えられた機械学習タスクのリスク評価の正確性を改善するために補助人口を効率的に利用する方法に言及している。本稿では, 半パラメトリック効率理論を用いて, 様々なデータセットシフト条件下でターゲット人口リスクを効率的に推定する一般的な問題について検討する。我々は,共変量,ラベル,概念シフトの3つの一般的な条件を含む,データセットシフト条件の一般的なクラスを特別なケースとして検討する。我々は、ソースとターゲットの人口の間で部分的に重複しない支持を可能にする。我々は、これらのデータセットシフト条件の簡単な仕様テストと共に、効率的かつ多重にロバストな推定器を開発する。また、他の2つのデータセットシフト条件、後方ドリフトと位置スケールシフトの効率境界も導出する。シミュレーション研究は、妥当なデータセットシフト条件の活用による効率向上を支援する。 Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are linked in other ways with the target domain. Techniques leveraging such \emph{dataset shift} conditions are known as \emph{domain adaptation} or \emph{transfer learning}. Despite extensive literature on dataset shift, limited works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population. In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions -- covariate, label and concept shift -- as special cases. We allow for partially non-overlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains due to leveraging plausible dataset shift conditions.	翻訳日:2023-06-29 13:17:19 公開日:2023-06-28
# 量子2ブロック群代数符号 Quantum two-block group algebra codes ( http://arxiv.org/abs/2306.16400v1 ) ライセンス: Link先を確認	Hsiang-Ku Lin and Leonid P. Pryadko	(参考訳) 量子2ブロック群代数 (2BGA) は、これまで研究されていない最小の持ち上げ積 (LP) 符号の族である。これらの符号は一般化双サイクル(gb)符号と関係があるが、巡回群は任意の有限群(一般に非可換群)に置き換えられる。特別な場合として、2BGA符号は、準巡回符号を含むアーベル群上の正方行列LP符号のサブセットと、古典群符号の対から構築された全正方行列ハイパーグラフ積符号を含む。 2bga符号の置換同値性の基準を定め、それらのパラメータの境界を明示的かつ他の量子符号と古典符号との関係で与える。また、安定化器発生器重みが$W \le 8$、アーベル群が$n \le 100$、非アーベル群が$n \le 200$の全ての非等価連結2BGA符号の最適パラメータを列挙する。 We consider quantum two-block group algebra (2BGA) codes, a previously unstudied family of smallest lifted-product (LP) codes. These codes are related to generalized-bicycle (GB) codes, except a cyclic group is replaced with an arbitrary finite group, generally non-abelian. As special cases, 2BGA codes include a subset of square-matrix LP codes over abelian groups, including quasi-cyclic codes, and all square-matrix hypergraph-product codes constructed from a pair of classical group codes. We establish criteria for permutation equivalence of 2BGA codes and give bounds for their parameters, both explicit and in relation to other quantum and classical codes. We also enumerate the optimal parameters of all inequivalent connected 2BGA codes with stabilizer generator weights $W \le 8$, of length $n \le 100$ for abelian groups, and $n \le 200$ for non-abelian groups.	翻訳日:2023-06-29 13:17:01 公開日:2023-06-28
# SUSYによる平行非一様電磁界におけるディラック材料:新しいキラル平面ホール効果のクラスか? Dirac materials in parallel non-uniform electromagnetic fields generated by SUSY: A new class of chiral Planar Hall Effect? ( http://arxiv.org/abs/2306.16399v1 ) ライセンス: Link先を確認	Julio Cesar P\'erez-Pedraza, Juan D. Garc\'ia-Mu\~noz and A. Raya	(参考訳) 超対称量子力学(susy-qm)の枠組みでは、外部平行電気および磁場の存在下でディラック材料を記述する(3+1)ディラック方程式が解かれる。 y方向に沿った並進対称性を持つ静的だが非一様の電気的および磁気的プロファイルを考えると、ディラック方程式はフェルミオン場の各キラリティに対して2つの分離されたschr\"odinger方程式に変換される。ベクトルポテンシャルとスカラーポテンシャルの三角プロファイルと双曲プロファイルを取り、susyパートナー p\"oschl-teller-like quantum potentials に到達した。解析的ゼロモード解を支持するポテンシャルの条件に限定すると、電場と磁場が作用する同じ平面で非自明な電流密度が得られるが、両者は垂直であり、平面ホール効果を実現する可能性を示している。さらに、この非破壊電流密度は、左右のキラル性に対する電流密度の和であり、正の電流はキラル対称性の結果であることを示唆している。 Within a Supersymmetric Quantum Mechanics (SUSY-QM) framework, the (3+1) Dirac equation describing a Dirac material in the presence of external parallel electric and magnetic fields is solved. Considering static but non-uniform electric and magnetic profiles with translational symmetry along the y-direction, the Dirac equation is transformed into two decoupled pairs of Schr\"odinger equations, one for each chirality of the fermion fields. Taking trigonometric and hyperbolic profiles for the vector and scalar potentials, respectively, we arrive at SUSY partner P\"oschl-Teller-like quantum potentials. Restricting to the conditions of the potentials that support an analytic zero-mode solution, we obtain a nontrivial current density in the same plane where the electric and magnetic fields lie, but perpendicular to both of them, indicating the possibility of realizing the Planar Hall Effect. Furthermore, this non-vanishing current density is the sum of current densities for the left- and right-chiralities, suggesting that the net current is a consequence of chiral symmetry.	翻訳日:2023-06-29 13:16:40 公開日:2023-06-28
# 量子チャネルとスーパーチャネルの双対性は基底依存性である Duality between quantum channels and super-channels is basis-dependent ( http://arxiv.org/abs/2306.16395v1 ) ライセンス: Link先を確認	Sohail, Sahil, Ritabrata Sengupta, Ujjwal Sen	(参考訳) Choi-Jamio{\l}kowski-Kraus-Sudarshan量子チャネル状態同型における完全正の対正の対応は基底の選択に依存する。例えば、パウリのスピン行列と、二次元複素ヒルベルト空間上の有界作用素の空間の基底としての恒等式を使用すれば、この対応は崩壊する。この対応の妥当性に基づく十分条件は、後に Kye~\cite{Kye} によって必要であることが証明された Paulsen と Shult~\cite{Paulsen} の業績に与えられる。スーパーマップの空間と入力と出力の空間のテンソル積の間にも対応性が存在する。特に、超写像が完全CP保存であることと、そのChoi型表現が完全正であることは同値である。この対応は基底の特定の選択にも依存する。本研究では,この対応が真であるように,必要かつ十分な条件を求める。 The complete positivity vs positivity correspondence in the Choi-Jamio{\l}kowski-Kraus-Sudarshan quantum channel-state isomorphism depends on the choice of basis. Instead of the ``canonical'' basis, if we use, e.g., the Pauli spin matrices along with the identity as the basis for the space of bounded operators on the two-dimensional complex Hilbert space, this correspondence breaks down. A sufficient condition on the basis for validity of this correspondence is provided in the work of Paulsen and Shult~\cite{Paulsen}, which was later proven to be necessary by Kye~\cite{Kye}. A correspondence is also present between the space of super-maps and the tensor product of the spaces of the inputs and outputs of the same. In particular, a super-map is completely CP-preserving if and only if its Choi-type representation is completely positive (CP). This correspondence also depends on a specific choice of basis. In this work, we find the necessary and sufficient condition on a basis such that this correspondence holds true.	翻訳日:2023-06-29 13:16:18 公開日:2023-06-28
# 平均回帰マルコフ決定過程に対するシャーパモデルフリー強化学習 Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes ( http://arxiv.org/abs/2306.16394v1 ) ライセンス: Link先を確認	Zihan Zhang and Qiaomin Xie	(参考訳) 我々は,無限水平平均逆マルコフ決定過程(MDPs)のモデルフリー強化学習(RL)アルゴリズムを開発した。オンライン設定とシミュレータへのアクセスによる設定の両方について検討する。本稿では,参照アドバンテージ分解に基づくモデルフリーRLアルゴリズムを提案する。このアルゴリズムは、$t$ステップ後に$\widetilde{o}(s^5a^2\mathrm{sp}(h^)\sqrt{t})$を成し、$s\times a$は状態-作用空間のサイズであり、$\mathrm{sp}(h^)$は最適なバイアス関数の幅である。我々の結果は、弱通信型MDPに対するT$の最適依存性を最初に達成したものである。シミュレータ設定では,$\widetilde{O} \left(\frac{SA\mathrm{sp}^2(h^)}{\epsilon^2}+\frac{S^2A\mathrm{sp}(h^)}{\epsilon} \right)$サンプルを用いて,$\Omega\left(\frac{SA\mathrm{sp}(h^)}{\epsilon^2}\right)$を使用するモデルフリーなRLアルゴリズムを提案する。この結果は,平均回帰設定でユニークな2つの新しい手法に基づいている。 1) 値差推定によるより良い割引近似 2) 空間複雑性を$O(SA)$とする最適バイアス関数に対する信頼領域の効率的な構築。 We develop several provably efficient model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov Decision Processes (MDPs). We consider both online setting and the setting with access to a simulator. In the online setting, we propose model-free RL algorithms based on reference-advantage decomposition. Our algorithm achieves $\widetilde{O}(S^5A^2\mathrm{sp}(h^)\sqrt{T})$ regret after $T$ steps, where $S\times A$ is the size of state-action space, and $\mathrm{sp}(h^)$ the span of the optimal bias function. Our results are the first to achieve optimal dependence in $T$ for weakly communicating MDPs. In the simulator setting, we propose a model-free RL algorithm that finds an $\epsilon$-optimal policy using $\widetilde{O} \left(\frac{SA\mathrm{sp}^2(h^)}{\epsilon^2}+\frac{S^2A\mathrm{sp}(h^)}{\epsilon} \right)$ samples, whereas the minimax lower bound is $\Omega\left(\frac{SA\mathrm{sp}(h^)}{\epsilon^2}\right)$. Our results are based on two new techniques that are unique in the average-reward setting: 1) better discounted approximation by value-difference estimation; 2) efficient construction of confidence region for the optimal bias function with space complexity $O(SA)$.	翻訳日:2023-06-29 13:16:00 公開日:2023-06-28
# 言語モデルにおける主観的グローバル・オピニオン表現の計測に向けて Towards Measuring the Representation of Subjective Global Opinions in Language Models ( http://arxiv.org/abs/2306.16388v1 ) ライセンス: Link先を確認	Esin Durmus, Karina Nyugen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli	(参考訳) 大規模言語モデル(LLM)は、社会問題に関する多様なグローバルな視点を公平に表すものではない。本稿では,モデル生成応答がより類似している意見を評価するための定量的枠組みを開発する。まず,各国のグローバル問題に対するさまざまな意見の収集を目的とした全国横断調査から回答を得たデータセットGlobalOpinionQAを構築した。次に, LLM が生成する調査応答と, 国別に設定した人的応答の類似度を定量化する指標を定義した。われわれのフレームワークでは、3つの実験をLEMで実施し、立憲AIに役立ち、正直で無害であるように訓練した。デフォルトでは、LCMの反応は、米国や一部のヨーロッパや南米諸国のような特定の人口の意見とよく似ており、偏見の可能性を浮き彫りにしている。モデルに特定の国の視点を考察するよう促すと、応答は人口の意見によく似ているが、有害な文化的ステレオタイプを反映することができる。我々がGlobalOpinionQA質問を対象言語に翻訳するとき、モデルの応答は必ずしもそれらの言語の話者の意見に最もよく似ているとは限らない。他の人が使用して構築するためのデータセットをリリースします。私たちのデータはhttps://huggingface.co/datasets/Anthropic/llm_global_opinionsにあります。また、https://llmglobalvalues.anthropic.comでもインタラクティブな可視化を提供しています。 Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.	翻訳日:2023-06-29 13:15:20 公開日:2023-06-28
# GPUによる直接ストレージアクセスによるGNNフレームワークのサンプリングと集約操作の高速化 Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses ( http://arxiv.org/abs/2306.16384v1 ) ライセンス: Link先を確認	Jeongmin Brian Park and Vikram Sharma Mailthody and Zaid Qureshi and Wen-mei Hwu	(参考訳) グラフニューラルネットワーク(gnns)は、グラフ構造化データから学び、さまざまなアプリケーションドメインで高度な推論タスクを実行するための強力なツールとして登場している。 GNNは、控えめなグラフで有効であることが示されているが、効率的なデータアクセスとデータ移動方法がないため、大規模グラフでそれらを訓練することは大きな課題である。既存のGNNトレーニングフレームワークでは、グラフサンプリングと機能集約にCPUを使用し、GPU上でモデルの重み付けのトレーニングと更新が実行される。しかし、我々の詳細なプロファイリングは、CPUがGNNモデルのトレーニングスループットを飽和させるのに必要なスループットを達成できないことを示している。さらに、グラフとその埋め込みがCPUメモリに収まらない場合、オペレーティングシステムによって導入されたオーバーヘッド、例えばページフォールトを扱うことは、実行の重要な経路となる。これらの問題に対処するために、GPU Initiated Direct Storage Access (GIDS) データローダを提案し、CPUメモリ、ストレージ、GPUメモリなどのハードウェアリソースをハイブリッドデータ配置戦略で効率的に活用しながら、大規模グラフに対するGPU指向のGNNトレーニングを可能にする。 GPUスレッドがストレージから直接特徴ベクトルをフェッチできるようにすることで、GIDSデータローダはGPU指向のGNNトレーニングのメモリ容量問題を解決する。さらに、GIDSデータローダはGPU並列性を利用してストレージ遅延を許容し、高価なページフォールトオーバーヘッドを排除している。これにより、局所性を活かし、GNNトレーニングに有効な帯域幅を増やすための新しい最適化を設計できる。テラバイト規模のGNNデータセット上の1つのGPUを用いて評価したところ、GIDSデータローダは、現在最先端のDGLデータローダと比較して、DGL GNNトレーニングパイプライン全体を最大392倍高速化することがわかった。 Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data and performing sophisticated inference tasks in various application domains. Although GNNs have been shown to be effective on modest-sized graphs, training them on large-scale graphs remains a significant challenge due to lack of efficient data access and data movement methods. Existing frameworks for training GNNs use CPUs for graph sampling and feature aggregation, while the training and updating of model weights are executed on GPUs. However, our in-depth profiling shows the CPUs cannot achieve the throughput required to saturate GNN model training throughput, causing gross under-utilization of expensive GPU resources. Furthermore, when the graph and its embeddings do not fit in the CPU memory, the overhead introduced by the operating system, say for handling page-faults, comes in the critical path of execution. To address these issues, we propose the GPU Initiated Direct Storage Access (GIDS) dataloader, to enable GPU-oriented GNN training for large-scale graphs while efficiently utilizing all hardware resources, such as CPU memory, storage, and GPU memory with a hybrid data placement strategy. By enabling GPU threads to fetch feature vectors directly from storage, GIDS dataloader solves the memory capacity problem for GPU-oriented GNN training. Moreover, GIDS dataloader leverages GPU parallelism to tolerate storage latency and eliminates expensive page-fault overhead. Doing so enables us to design novel optimizations for exploiting locality and increasing effective bandwidth for GNN training. Our evaluation using a single GPU on terabyte-scale GNN datasets shows that GIDS dataloader accelerates the overall DGL GNN training pipeline by up to 392X when compared to the current, state-of-the-art DGL dataloader.	翻訳日:2023-06-29 13:14:57 公開日:2023-06-28
# 自動運転の新技術の概要 An Overview about Emerging Technologies of Autonomous Driving ( http://arxiv.org/abs/2306.13302v3 ) ライセンス: Link先を確認	Yu Huang, Yue Chen, Zijiang Yang	(参考訳) 2004年にDARPAがグランドチャレンジを始め、2007年にアーバンチャレンジを開始して以来、自動運転はAIアプリケーションの最も活発な分野となっている。本稿では,自動運転技術とオープン問題の技術的側面について概説する。本稿では,認識,マッピングとローカライゼーション,予測,計画と制御,シミュレーション,V2X,安全性など,自動運転システムの主要な分野について検討する。特に私たちは,ロングテールの自動運転問題を解決するための一般的なプラットフォームであるdata closed loopのフレームワークで,これらすべての問題を詳しく説明しています。 Since DARPA started Grand Challenges in 2004 and Urban Challenges in 2007, autonomous driving has been the most active field of AI applications. This paper gives an overview about technical aspects of autonomous driving technologies and open problems. We investigate the major fields of self-driving systems, such as perception, mapping and localization, prediction, planning and control, simulation, V2X and safety etc. Especially we elaborate on all these issues in a framework of data closed loop, a popular platform to solve the long tailed autonomous driving problems.	翻訳日:2023-06-29 11:30:31 公開日:2023-06-28
# 意識的知識グラフ畳み込みネットワークに基づく観光客の推薦 Tourist Attractions Recommendation based on Attention Knowledge Graph Convolution Network ( http://arxiv.org/abs/2306.10946v3 ) ライセンス: Link先を確認	Ahmad A. Mubarak and Afifa Kahled	(参考訳) 知識グラフに基づく推薦アルゴリズムは比較的成熟した段階にある。しかし、特定の分野の推薦にはいくつかの問題がある。例えば、観光分野では、観光アトラクションの推奨基盤として、適切な観光アトラクション属性の選択プロセスが複雑である。本稿では,対象の景観スポットの近傍のエンティティを自動的に意味的に発見する改良された意識知識グラフ畳み込みネットワークモデル(Att-KGCN)を提案する。注意層は比較的類似した位置を集約し、隣接するベクトルでそれらを表現する。そして、観光客の好む選択により、類似点の確率を推薦システムとして予測する。 Socotra Island-Yemenの観光データに基づく観光名所の知識グラフデータセット実験により,アテンションナレッジグラフ畳み込みネットワークが観光名所のレコメンデーションに良い影響を与え,観光客の選択により多くのレコメンデーションをすることができることを確認した。 The recommendation algorithm based on knowledge graphs is at a relatively mature stage. However, there are still some problems in the recommendation of specific areas. For example, in the tourism field, selecting suitable tourist attraction attributes process is complicated as the recommendation basis for tourist attractions. In this paper, we propose the improved Attention Knowledge Graph Convolution Network model, named (Att-KGCN), which automatically discovers the neighboring entities of the target scenic spot semantically. The attention layer aggregates relatively similar locations and represents them with an adjacent vector. Then, according to the tourist's preferred choices, the model predicts the probability of similar spots as a recommendation system. A knowledge graph dataset of tourist attractions used based on tourism data on Socotra Island-Yemen. Through experiments, it is verified that the Attention Knowledge Graph Convolution Network has a good effect on the recommendation of tourist attractions and can make more recommendations for tourists' choices.	翻訳日:2023-06-29 11:30:22 公開日:2023-06-28
# TSMixer:多変量時系列予測のための軽量MLPミクサモデル TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting ( http://arxiv.org/abs/2306.09364v3 ) ライセンス: Link先を確認	Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam	(参考訳) トランスフォーマーは時系列予測において、長い列の相互作用を捉える能力で人気を集めている。しかし、その高いメモリとコンピューティング要件は長期的な予測に重大なボトルネックをもたらす。そこで本研究では,多層パーセプトロン(MLP)モジュールのみからなる軽量ニューラルネットワークTSMixerを提案する。 tsmixerはパッチ付き時系列の多変量予測と表現学習のために設計されており、トランスフォーマーの効率的な代替手段を提供する。我々のモデルはコンピュータビジョンにおけるMLP-Mixerモデルの成功からインスピレーションを得ている。時系列にVision MLP-Mixerを適用する際の課題を示し、精度を高めるために経験的検証されたコンポーネントを導入する。これは、階層構造やチャネル相関などの時系列特性を明示的にモデル化するための、MLP-Mixerバックボーンにオンライン和解ヘッドを付加する新しい設計パラダイムを含む。また,既存のパッチチャネル混合方式では一般的な課題である,多種多様なデータセット間のノイズチャネルインタラクションと一般化を効果的に処理するためのハイブリッドチャネルモデリング手法を提案する。さらに、重要な特徴を優先するために、バックボーンに単純なゲートアテンション機構が導入される。これらの軽量なコンポーネントを組み込むことで、単純なmlp構造の学習能力を大幅に向上させ、最小の計算使用量で複雑なトランスフォーマーモデルを上回る。さらに、TSMixerのモジュール設計により、教師付きとマスク付きの両方の自己教師付き学習手法との互換性が実現され、時系列基礎モデルのための有望なビルディングブロックとなる。 TSMixer は最先端の MLP と Transformer のモデルよりも 8-60% の差で予測できる。また、Patch-Transformerモデルの最新の強力なベンチマーク(1～2%)を上回り、メモリとランタイム(2～3倍)を大幅に削減した。 Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and computing requirements pose a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture exclusively composed of multi-layer perceptron (MLP) modules. TSMixer is designed for multivariate forecasting and representation learning on patched time series, providing an efficient alternative to Transformers. Our model draws inspiration from the success of MLP-Mixer models in computer vision. We demonstrate the challenges involved in adapting Vision MLP-Mixer for time series and introduce empirically validated components to enhance accuracy. This includes a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone, for explicitly modeling the time-series properties such as hierarchy and channel-correlations. We also propose a Hybrid channel modeling approach to effectively handle noisy channel interactions and generalization across diverse datasets, a common challenge in existing patch channel-mixing methods. Additionally, a simple gated attention mechanism is introduced in the backbone to prioritize important features. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal computing usage. Moreover, TSMixer's modular design enables compatibility with both supervised and masked self-supervised learning methods, making it a promising building block for time-series Foundation Models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong benchmarks of Patch-Transformer models (by 1-2%) with a significant reduction in memory and runtime (2-3X).	翻訳日:2023-06-29 11:30:07 公開日:2023-06-28
# 非均一サンプリングによるネットワークデータの等角予測の有効性について On the Validity of Conformal Prediction for Network Data Under Non-Uniform Sampling ( http://arxiv.org/abs/2306.07252v3 ) ライセンス: Link先を確認	Robert Lunde	(参考訳) 実例ではよく見られるが,ノードの非表現的なサンプルとなる様々なサンプリングメカニズムの下で,ネットワークデータの共形予測の特性について検討する。これらのサンプリング機構を,過集団に適用する選択規則として解釈し,適切な選択イベントにおける共形予測条件の有効性について検討する。選択規則が置換不変性を満たす場合、サンプルされたサブアレイは選択イベント上で交換可能条件であり、その超集団に対して共有交換可能条件が成立することを示す。以上の結果から,エゴネットワークや雪玉サンプリングに関連する特定の選択事象に対する共形予測の有限サンプルの有効性が示唆された。また,グラフ上のランダムなウォークでデータをサンプリングすると,重み付き共形予測の変種が個体群から選択したノードに対して漸近的に妥当な予測集合を生成することを示した。 We study the properties of conformal prediction for network data under various sampling mechanisms that commonly arise in practice but often result in a non-representative sample of nodes. We interpret these sampling mechanisms as selection rules applied to a superpopulation and study the validity of conformal prediction conditional on an appropriate selection event. We show that the sampled subarray is exchangeable conditional on the selection event if the selection rule satisfies a permutation invariance property and a joint exchangeability condition holds for the superpopulation. Our result implies the finite-sample validity of conformal prediction for certain selection events related to ego networks and snowball sampling. We also show that when data are sampled via a random walk on a graph, a variant of weighted conformal prediction yields asymptotically valid prediction sets for an independently selected node from the population.	翻訳日:2023-06-29 11:29:37 公開日:2023-06-28
# 大規模言語モデルからレコメンダシステムにどのようなメリットがあるか:調査 How Can Recommender Systems Benefit from Large Language Models: A Survey ( http://arxiv.org/abs/2306.05817v4 ) ライセンス: Link先を確認	Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, Weinan Zhang	(参考訳) インターネットアプリケーションにおいて,レコメンダシステム(RS)はユーザの情報要求に合わせて重要な役割を果たす。自然言語処理(nlp)領域では、大規模言語モデル(llm)は驚くべき創発的能力(例えば命令追従、推論)を示しており、llmをrsに適用してパフォーマンスの向上とユーザエクスペリエンスの改善を行う有望な研究方向を生み出している。本稿では,本研究の方向性をアプリケーション指向の観点から総合的に調査する。まず, LLM を RS に適用する方法という2つの直交的な視点から, 既存の研究成果を要約する。 where"という質問に対して、我々は、レコメンデーションパイプラインのさまざまなステージでllmが果たすことができる役割、すなわち、機能工学、特徴エンコーダ、スコアリング/ランキング関数、パイプラインコントローラについて論じる。 how"問題に対しては、トレーニングと推論の戦略を調査し、llmをチューニングするか否か、推論に従来の推奨モデル(crm)を関与させるかどうかという2つの詳細な分類基準を導出する。いずれの質問にも詳細な分析と一般的な開発軌跡が提供される。次に,3つの側面,すなわち効率性,有効性,倫理性から,LSMをRSに適用する上での課題を強調した。最後に,調査の概要と今後の展望について考察する。また、この上昇方向において、論文やその他の関連リソースのためのgithubリポジトリを積極的に維持している。 Recommender systems (RS) play important roles to match users' information needs for Internet applications. In natural language processing (NLP) domains, large language model (LLM) has shown astonishing emergent abilities (e.g., instruction following, reasoning), thus giving rise to the promising research direction of adapting LLM to RS for performance enhancements and user experience improvements. In this paper, we conduct a comprehensive survey on this research direction from an application-oriented view. We first summarize existing research works from two orthogonal perspectives: where and how to adapt LLM to RS. For the "WHERE" question, we discuss the roles that LLM could play in different stages of the recommendation pipeline, i.e., feature engineering, feature encoder, scoring/ranking function, and pipeline controller. For the "HOW" question, we investigate the training and inference strategies, resulting in two fine-grained taxonomy criteria, i.e., whether to tune LLMs or not, and whether to involve conventional recommendation model (CRM) for inference. Detailed analysis and general development trajectories are provided for both questions, respectively. Then, we highlight key challenges in adapting LLM to RS from three aspects, i.e., efficiency, effectiveness, and ethics. Finally, we summarize the survey and discuss the future prospects. We also actively maintain a GitHub repository for papers and other related resources in this rising direction: https://github.com/CHIANGEL/Awesome-LLM-for-RecSys.	翻訳日:2023-06-29 11:29:22 公開日:2023-06-28
# ストリーミングデータからのニューラルネットワークオンライン学習のための低ランク拡張カルマンフィルタ Low-rank extended Kalman filtering for online learning of neural networks from streaming data ( http://arxiv.org/abs/2305.19535v3 ) ライセンス: Link先を確認	Peter G. Chang, Gerardo Dur\'an-Mart\'in, Alexander Y Shestopaloff, Matt Jones, Kevin Murphy	(参考訳) 非定常データストリームから非線形関数のパラメータを推定するための効率的なオンライン近似ベイズ推定アルゴリズムを提案する。この方法は拡張カルマンフィルタ(ekf)に基づいているが、モデルパラメータの数に線形なステップあたりのコストを与える、後方精度行列の新たな低ランク+対角分解を用いる。確率的変動推論に基づく手法とは対照的に,本手法は完全に決定論的であり,ステップサイズチューニングを必要としない。実験により,この結果がより高速(より標本効率のよい)学習となり,分布の変化に適応しやすくなり,文脈的バンディットアルゴリズムの一部として使用する場合の報酬の蓄積が早くなることを示した。 We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream. The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix, which gives a cost per step which is linear in the number of model parameters. In contrast to methods based on stochastic variational inference, our method is fully deterministic, and does not require step-size tuning. We show experimentally that this results in much faster (more sample efficient) learning, which results in more rapid adaptation to changing distributions, and faster accumulation of reward when used as part of a contextual bandit algorithm.	翻訳日:2023-06-29 11:28:57 公開日:2023-06-28
# QR-CLIP: 位置と時間推論のための明示的なオープンワールド知識の導入 QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning ( http://arxiv.org/abs/2302.00952v3 ) ライセンス: Link先を確認	Weimin Shi, Mingchen Zhuge, Dehong Gao, Zhong Zhou, Ming-Ming Cheng, Deng-Ping Fan	(参考訳) 日々のイメージは、私たちが記憶し、それらから深い情報を推測する必要がある抽象的な意味を伝える。このような人間的な推論を促進するために、我々は機械に従来のセグメンテーションや分類といった基本的なタスクではなく、いつ、どこで、いつ取られたかを予測するように教える。 Horn氏のQR理論に触発されて、2つのコンポーネントからなる新しいQR-CLIPモデルを設計した。 1)Quantityモジュールは,まず,候補言語の入力として,よりオープンワールドな知識を振り返る。 2) 関連モジュールは,視覚と言語手がかりを慎重に推定し,位置と時刻を推定する。実験によりQR-CLIPの有効性が示され、各タスクにおける以前のSOTAを、位置と時間的推論の観点から平均約10%と130%の相対的なリフトで上回ります。本研究は,位置情報と時間的推論の技術的基礎を築いており,オープンワールド知識の効果的な導入が課題のパナセの1つであることを示唆する。 Daily images may convey abstract meanings that require us to memorize and infer profound information from them. To encourage such human-like reasoning, in this work, we teach machines to predict where and when it was taken rather than performing basic tasks like traditional segmentation or classification. Inspired by Horn's QR theory, we designed a novel QR-CLIP model consisting of two components: 1) the Quantity module first retrospects more open-world knowledge as the candidate language inputs; 2) the Relevance module carefully estimates vision and language cues and infers the location and time. Experiments show our QR-CLIP's effectiveness, and it outperforms the previous SOTA on each task by an average of about 10% and 130% relative lift in terms of location and time reasoning. This study lays a technical foundation for location and time reasoning and suggests that effectively introducing open-world knowledge is one of the panaceas for the tasks.	翻訳日:2023-06-29 11:28:42 公開日:2023-06-28
# マインドディアル:神経対話生成のための理論オブマインドモデリングによる信念のダイナミクス追跡 MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation ( http://arxiv.org/abs/2306.15253v2 ) ライセンス: Link先を確認	Shuwen Qiu, Song-Chun Zhu, Zilong Zheng	(参考訳) 人間は表現された意味や共通点を交渉しながら自由に話す。大きな生成言語モデルの印象的な会話能力にもかかわらず、共有場所における文脈理解の個人差は考慮されていない。本研究はMindDialを提案する。MindDialは、位置自由な応答を生成できる新しい対話型フレームワークで、共通基盤の交渉を行う。我々は,3段階の信念を追跡可能な明示的なマインドモジュールを設計する。話者の信念,話者の聴取者の信念の予測,および最初の2つの間隙に基づく共通信念である。そして、話す行為分類ヘッドは、話を続けるか、このターンを終了するか、タスク関連のアクションを取ることに決めます。 2つのエージェント間の無料チャットに基づいて,1つの相互友人を見つけることを目標とする,信念ダイナミクスアノテーションを用いた共通基底アライメントデータセットの相互フレンドを補強する。実験により, 人間の自然な会話の流れを再現する上で, 心的状態モデリングを用いたモデルが人間の反応に類似することが確認された。さらに、アブレーション研究により、第3レベルの共通信念は、第1および第2の信念の情報を集約し、共通基盤をより効率的に調整することができる。 Humans talk in free-form while negotiating the expressed meanings or common ground. Despite the impressive conversational abilities of the large generative language models, they do not consider the individual differences in contextual understanding in a shared situated environment. In this work, we propose MindDial, a novel conversational framework that can generate situated free-form responses to negotiate common ground. We design an explicit mind module that can track three-level beliefs -- the speaker's belief, the speaker's prediction of the listener's belief, and the common belief based on the gap between the first two. Then the speaking act classification head will decide to continue to talk, end this turn, or take task-related action. We augment a common ground alignment dataset MutualFriend with belief dynamics annotation, of which the goal is to find a single mutual friend based on the free chat between two agents. Experiments show that our model with mental state modeling can resemble human responses when aligning common ground meanwhile mimic the natural human conversation flow. The ablation study further validates the third-level common belief can aggregate information of the first and second-order beliefs and align common ground more efficiently.	翻訳日:2023-06-29 11:23:18 公開日:2023-06-28
# MIMIC:画像対応による仮面画像モデリング MIMIC: Masked Image Modeling with Image Correspondences ( http://arxiv.org/abs/2306.15128v2 ) ライセンス: Link先を確認	Kalyani Marathe, Mahtab Bigverdi, Nishat Khan, Tuhin Kundu, Aniruddha Kembhavi, Linda G. Shapiro, Ranjay Krishna	(参考訳) 現在、コンピュータビジョンにおける深度推定とセマンティックセグメンテーションは、事前訓練された画像表現に依存している。したがって、効果的な事前学習データセットのキュレーションは不可欠である。残念ながら、効果的な事前トレーニングデータセットは、マルチビューシーンを持つもので、アノテーション付き3Dメッシュ、ポイントクラウド、シミュレートされた環境からのカメラパラメータを使用してのみキュレートされている。アノテーションを必要としないデータセット作成機構を提案する。我々は、MIMIC-1M with 1.3MとMIMIC-3M with 3.1Mの2つのデータセットを、オープンソースビデオデータセットと合成3D環境から抽出した。マスク付き画像モデリングの目的が異なる複数の自己教師付きモデルをトレーニングし、以下の結果を示す。深度推定、意味セグメンテーション、表面正規化、ポーズ推定など、複数の下流タスクでアノテーションを使用してマイニングされたものよりも、模倣3mでトレーニングされた表現が優れている。また、ダウンストリームのトレーニングデータに制限がある場合、凍結された表現よりも優れています。より大規模なデータセット(MIMIC-3M)は、より大規模なデータセットを生成するために任意にスケールできるので、パフォーマンスが大幅に向上する。 MIMICコード、データセット、トレーニング済みモデルはhttps://github.com/RAIVNLab/MIMICでオープンソース化されている。 Many pixelwise dense prediction tasks-depth estimation and semantic segmentation in computer vision today rely on pretrained image representations. Therefore, curating effective pretraining datasets is vital. Unfortunately, the effective pretraining datasets are those with multi-view scenes and have only been curated using annotated 3D meshes, point clouds, and camera parameters from simulated environments. We propose a dataset-curation mechanism that does not require any annotations. We mine two datasets: MIMIC-1M with 1.3M and MIMIC-3M with 3.1M multi-view image pairs from open-sourced video datasets and from synthetic 3D environments. We train multiple self-supervised models with different masked image modeling objectives to showcase the following findings: Representations trained on MIMIC-3M outperform those mined using annotations on multiple downstream tasks, including depth estimation, semantic segmentation, surface normals, and pose estimation. They also outperform representations that are frozen and when downstream training data is limited to few-shot. Larger dataset (MIMIC-3M) significantly improves performance, which is promising since our curation method can arbitrarily scale to produce even larger datasets. MIMIC code, dataset, and pretrained models are open-sourced at https://github.com/RAIVNLab/MIMIC.	翻訳日:2023-06-29 11:22:51 公開日:2023-06-28
# bertのクロスドメイン挙動のレビュー理解における検討 Investigating Cross-Domain Behaviors of BERT in Review Understanding ( http://arxiv.org/abs/2306.15123v2 ) ライセンス: Link先を確認	Albert Lu and Meng Jiang	(参考訳) レビュースコアの予測には、自然言語処理の現実的な応用であるレビューテキスト理解が必要である。製品レビューにおける異種テキストドメインのため、共通するプラクティスは、異なるドメインのレビューに基づいてBERTモデルを微調整することである。しかし、製品レビュー理解の様々なタスクにおいて、BERTモデルのクロスドメイン動作に関する実証的研究は未だ行われていない。本稿では,単一ドメインおよび複数ドメインのAmazonレビューデータに基づいて,BERTモデルのテキスト分類を行う。以上の結果から,マルチドメインモデルと比較した場合,単一ドメインモデルの性能は若干向上したが,マルチドメインモデルでは,マルチドメインデータで評価した場合の単一ドメインモデルよりも優れており,単一ドメインモデルでは微調整が行えず,すべてのテストで平均的に性能が向上した。単一ドメインモデルの微調整によって精度がわずかに向上するが、ドメイン間でよく機能するマルチドメインモデルを利用することで、計算資源とコストを削減できる。 Review score prediction requires review text understanding, a critical real-world application of natural language processing. Due to dissimilar text domains in product reviews, a common practice is fine-tuning BERT models upon reviews of differing domains. However, there has not yet been an empirical study of cross-domain behaviors of BERT models in the various tasks of product review understanding. In this project, we investigate text classification BERT models fine-tuned on single-domain and multi-domain Amazon review data. In our findings, though single-domain models achieved marginally improved performance on their corresponding domain compared to multi-domain models, multi-domain models outperformed single-domain models when evaluated on multi-domain data, single-domain data the single-domain model was not fine-tuned on, and on average when considering all tests. Though slight increases in accuracy can be achieved through single-domain model fine-tuning, computational resources and costs can be reduced by utilizing multi-domain models that perform well across domains.	翻訳日:2023-06-29 11:22:28 公開日:2023-06-28
# 拡張カルマンフィルタを用いたストリーミング量子ゲートトモグラフィ Streaming quantum gate set tomography using the extended Kalman filter ( http://arxiv.org/abs/2306.15116v2 ) ライセンス: Link先を確認	J. P. Marceaux and Kevin Young	(参考訳) 量子プロセッサのリアルタイム校正のためのクローズドループ制御アルゴリズムは、測定された量子回路結果のストリームに基づいて物理誤差パラメータを推定できる効率的なフィルタを必要とする。このようなフィルタの開発は、観測された回路結果と初歩誤差の大きさとの非線形関係が複雑である。本研究では,量子ゲート集合トモグラフィのデータに対して拡張カルマンフィルタを適用し,システム誤差モデルとその不確かさをストリーミング推定する。我々の数値的な例から、拡張カルマンフィルタは最大推定値と同等の性能が得られるが、計算コストは劇的に低い。提案手法により, 標準ラップトップは1ビットと2ビットの回路結果を処理することができ, ゲートセットエラーモデルを現在の実験実行に匹敵する速度で更新することができる。 Closed-loop control algorithms for real-time calibration of quantum processors require efficient filters that can estimate physical error parameters based on streams of measured quantum circuit outcomes. Development of such filters is complicated by the highly nonlinear relationship relationship between observed circuit outcomes and the magnitudes of elementary errors. In this work, we apply the extended Kalman filter to data from quantum gate set tomography to provide a streaming estimator of the both the system error model and its uncertainties. Our numerical examples indicate extended Kalman filtering can achieve similar performance to maximum likelihood estimation, but with dramatically lower computational cost. With our methods, a standard laptop can process one- and two-qubit circuit outcomes and update gate set error model at rates comparable with current experimental execution.	翻訳日:2023-06-29 11:22:11 公開日:2023-06-28
# フラクタル場理論における量子クエンチ Quantum quenches in fractonic field theories ( http://arxiv.org/abs/2306.14951v2 ) ライセンス: Link先を確認	Dmitry S. Ageev and Vasilii V. Pushkarev	(参考訳) フラクトロンスカラー場理論における大域量子クエンチによる平衡外ダイナミクスについて検討する。数種類のクエンチ、特に離散回転対称性の異なる理論における質量クエンチ(\mathbb{z}_4$ および $\mathbb{z}_8$)とそれらの間の遷移による瞬時クエンチを考える。また, ユークリッド時間に有限幅スラブ上に初期状態が作成されるフラクタル境界クエンチについても検討した。有限体積におけるフラクトロン系の摂動は、特に、特定の$\mathbb{Z}_4$-対称空間構造の形成とその後の進化を通じて制限されたモビリティを強調する。我々は$\mathbb{Z}_n$-対称場理論への一般化について議論し、適切な正則化を導入し、フラクトロン場理論に固有の発散を明示的に扱うことができる。 We study out-of-equilibrium dynamics caused by global quantum quenches in fractonic scalar field theories. We consider several types of quenches, in particular, the mass quench in theories with different types of discrete rotational symmetries ($\mathbb{Z}_4$ and $\mathbb{Z}_8$), as well as an instantaneous quench via the transition between them. We also investigate fractonic boundary quenches, where the initial state is prepared on a finite-width slab in Euclidean time. We find that perturbing a fractonic system in finite volume especially highlights the restricted mobility via the formation and subsequent evolution of specific $\mathbb{Z}_4$-symmetric spatial structures. We discuss a generalization to $\mathbb{Z}_n$-symmetric field theories, and introduce a proper regularization, which allows us to explicitly deal with divergences inherent to fractonic field theories.	翻訳日:2023-06-29 11:21:58 公開日:2023-06-28
# phd論文:認知とコンピュータビジョンのアーキテクチャにおける(自己)アテンションの役割を探求する PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture ( http://arxiv.org/abs/2306.14650v2 ) ライセンス: Link先を確認	Mohit Vaishnav	(参考訳) 複雑な推論タスクにおける注意と記憶の役割について検討する。トランスフォーマーに基づく自己認識をモデルとして分析し,メモリで拡張する。合成視覚的推論テストの研究により、推論タスクの分類を洗練する。 resnet50にセルフ・アテンションを組み込んだ機能マップを機能ベースおよび空間的注意力を用いて拡張し,視覚的推論課題を効率的に解決する。本研究は,SVRTタスクの注意的ニーズの理解に寄与する。さらに,アクティブビジョン理論に触発された注意と記憶を組み合わせた認知アーキテクチャGAMRを提案する。 GAMRはサンプル効率、堅牢性、構成性において他のアーキテクチャよりも優れており、新しい推論タスクにおいてゼロショットの一般化を示す。 We investigate the role of attention and memory in complex reasoning tasks. We analyze Transformer-based self-attention as a model and extend it with memory. By studying a synthetic visual reasoning test, we refine the taxonomy of reasoning tasks. Incorporating self-attention with ResNet50, we enhance feature maps using feature-based and spatial attention, achieving efficient solving of challenging visual reasoning tasks. Our findings contribute to understanding the attentional needs of SVRT tasks. Additionally, we propose GAMR, a cognitive architecture combining attention and memory, inspired by active vision theory. GAMR outperforms other architectures in sample efficiency, robustness, and compositionality, and shows zero-shot generalization on new reasoning tasks.	翻訳日:2023-06-29 11:20:56 公開日:2023-06-28
# stylegan 埋め込み画像を用いた癌予後予測のための深層学習 Deep Learning for Cancer Prognosis Prediction Using Portrait Photos by StyleGAN Embedding ( http://arxiv.org/abs/2306.14596v2 ) ライセンス: Link先を確認	Amr Hagag, Ahmed Gomaa, Dominik Kornek, Andreas Maier, Rainer Fietkau, Christoph Bert, Florian Putz and Yixing Huang	(参考訳) がん患者の生存予測は最適な治療選択と患者管理に重要である。現在の患者生存予測法は、典型的には患者の臨床記録データまたは生物学的および画像データから生存情報を抽出する。実際に、経験豊富な臨床医は、主に顔の特徴である観察可能な身体的外観に基づいて、患者の健康状態の予備評価を行うことができる。しかし、この評価は非常に主観的である。本研究は,従来のポートレート写真に含まれる予測情報を,深層学習を用いて客観的に捉え,活用する効果について初めて検討した。事前トレーニングされたStyleGAN2モデルは、がん患者の写真のカスタムデータセットに基づいて微調整され、患者の写真に合った生成能力で生成する。 StyleGAN2は、写真を非常に表現力のある潜伏空間に埋め込むために使用される。最先端のサバイバル分析モデルと、styleganの潜在空間写真埋め込みに基づいて、このアプローチは0.677のc-インデックスを達成し、これは単純な2d顔画像に埋め込まれた予測値よりも顕著に高い。さらに、StyleGANの解釈可能な潜伏空間のおかげで、我々の生存予測モデルは、重要な顔の特徴に依存し、衣服や背景などの外部情報からのバイアスを排除できる。さらに、患者のケアに重要な電位値を有する回帰係数から健康属性を求める。 Survival prediction for cancer patients is critical for optimal treatment selection and patient management. Current patient survival prediction methods typically extract survival information from patients' clinical record data or biological and imaging data. In practice, experienced clinicians can have a preliminary assessment of patients' health status based on patients' observable physical appearances, which are mainly facial features. However, such assessment is highly subjective. In this work, the efficacy of objectively capturing and using prognostic information contained in conventional portrait photographs using deep learning for survival predication purposes is investigated for the first time. A pre-trained StyleGAN2 model is fine-tuned on a custom dataset of our cancer patients' photos to empower its generator with generative ability suitable for patients' photos. The StyleGAN2 is then used to embed the photographs to its highly expressive latent space. Utilizing the state-of-the-art survival analysis models and based on StyleGAN's latent space photo embeddings, this approach achieved a C-index of 0.677, which is notably higher than chance and evidencing the prognostic value embedded in simple 2D facial images. In addition, thanks to StyleGAN's interpretable latent space, our survival prediction model can be validated for relying on essential facial features, eliminating any biases from extraneous information like clothing or background. Moreover, a health attribute is obtained from regression coefficients, which has important potential value for patient care.	翻訳日:2023-06-29 11:20:40 公開日:2023-06-28
# PoseDiffusion: Diffusion-aided Bundle Adjustment によるPose推定の解法 PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment ( http://arxiv.org/abs/2306.15667v2 ) ライセンス: Link先を確認	Jianyuan Wang, Christian Rupprecht, David Novotny	(参考訳) カメラポーズ推定は、従来は手作りのキーポイントマッチング、RANSAC、バンドル調整といった古典的な手法に依存していたコンピュータビジョンの問題である。本稿では,入力画像に対するカメラポーズの条件分布をモデル化し,確率拡散フレームワーク内の運動からの構造 (sfm) を定式化する。古い問題に対するこの新しい見方にはいくつかの利点がある。 (i)拡散フレームワークの性質は、バンドル調整の反復手順を反映している。 (ii)この定式化はエピポーラ幾何学からの幾何学的制約のシームレスな統合を可能にする。 (iii)広い基準線を持つスパースビューのような典型的な難易度シナリオに優れる。 (iv)任意の量の画像に対して内在性及び外在性を予測することができる。提案手法は,従来のSfMパイプラインと実世界の2つのデータセットに対する学習アプローチよりも大幅に改善されていることを示す。最後に,本手法がさらなるトレーニングを行なわずにデータセットをまたいで一般化できることが観察された。プロジェクトページ: https://posediffusion.github.io/ Camera pose estimation is a long-standing computer vision problem that to date often relies on classical methods, such as handcrafted keypoint matching, RANSAC and bundle adjustment. In this paper, we propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework, modelling the conditional distribution of camera poses given input images. This novel view of an old problem has several advantages. (i) The nature of the diffusion framework mirrors the iterative procedure of bundle adjustment. (ii) The formulation allows a seamless integration of geometric constraints from epipolar geometry. (iii) It excels in typically difficult scenarios such as sparse views with wide baselines. (iv) The method can predict intrinsics and extrinsics for an arbitrary amount of images. We demonstrate that our method PoseDiffusion significantly improves over the classic SfM pipelines and the learned approaches on two real-world datasets. Finally, it is observed that our method can generalize across datasets without further training. Project page: https://posediffusion.github.io/	翻訳日:2023-06-29 11:13:30 公開日:2023-06-28
# フランス語物語における直接音声の自動アノテーション Automatic Annotation of Direct Speech in Written French Narratives ( http://arxiv.org/abs/2306.15634v2 ) ライセンス: Link先を確認	No\'e Durandard and Viet-Anh Tran and Gaspard Michel and Elena V. Epure	(参考訳) テキスト中の直接音声(aads)の自動注釈は、しばしば計算的な物語理解に使われている。ルールやディープニューラルネットワークに基づく手法は、特に英語やドイツ語で研究されている。しかし、フランス語では、我々の対象とする言語は多くはない。私たちのゴールは、フランス語でAADSモデルを設計、評価するための統一されたフレームワークを作ることです。そこで我々は,一語あたりのDSに注釈付けされた最大かつ最新のフランス語物語データセットを統合し,他の言語でのシーケンスラベリングやAADSから様々なベースラインを適応させ,一般化に焦点を当てた広範な評価を行った。結果は,タスクにはまだかなりの努力が必要であり,各ベースラインの特徴を強調していることを示している。このフレームワークは改善される可能性があるが、このトピックに関するさらなる研究を促進するための一歩である。 The automatic annotation of direct speech (AADS) in written text has been often used in computational narrative understanding. Methods based on either rules or deep neural networks have been explored, in particular for English or German languages. Yet, for French, our target language, not many works exist. Our goal is to create a unified framework to design and evaluate AADS models in French. For this, we consolidated the largest-to-date French narrative dataset annotated with DS per word; we adapted various baselines for sequence labelling or from AADS in other languages; and we designed and conducted an extensive evaluation focused on generalisation. Results show that the task still requires substantial efforts and emphasise characteristics of each baseline. Although this framework could be improved, it is a step further to encourage more research on the topic.	翻訳日:2023-06-29 11:13:15 公開日:2023-06-28
# コサイクルを用いた非同期アルゴリズムアライメント Asynchronous Algorithmic Alignment with Cocycles ( http://arxiv.org/abs/2306.15632v2 ) ライセンス: Link先を確認	Andrew Dudzik, Tamara von Glehn, Razvan Pascanu, Petar Veli\v{c}kovi\'c	(参考訳) 最先端のニューラルネットワーク推論器は、グラフニューラルネットワーク(GNN)でメッセージパッシングを利用する。しかし、典型的なgnnはメッセージ関数の定義と呼び出しの区別を曖昧にし、ノードが各レイヤの近隣にメッセージを同期的に送らなければならない。しかし、動的プログラミングアルゴリズムの実行を学ぶためにGNNを適用する場合、ほとんどのステップでは、送信すべき意味のあるアップデートはノードのごく一部に限られる。したがって、多くの中間gnnステップがid関数を学ばなければならないため、グラフ全体にあまりにも多くの無関係なデータを送信することで、非効率なリスクを負うことになる。この作業では、ノードの状態更新とメッセージ関数呼び出しの概念を明示的に分離します。この分離により、アルゴリズムとニューラルネットワークの両方で非同期計算を推論できる数学的定式化が得られる。 State-of-the-art neural algorithmic reasoners make use of message passing in graph neural networks (GNNs). But typical GNNs blur the distinction between the definition and invocation of the message function, forcing a node to send messages to its neighbours at every layer, synchronously. When applying GNNs to learn to execute dynamic programming algorithms, however, on most steps only a handful of the nodes would have meaningful updates to send. One, hence, runs the risk of inefficiencies by sending too much irrelevant data across the graph -- with many intermediate GNN steps having to learn identity functions. In this work, we explicitly separate the concepts of node state update and message function invocation. With this separation, we obtain a mathematical formulation that allows us to reason about asynchronous computation in both algorithms and neural networks.	翻訳日:2023-06-29 11:13:03 公開日:2023-06-28
# 位置補間による大規模言語モデルのコンテキストウィンドウの拡張 Extending Context Window of Large Language Models via Positional Interpolation ( http://arxiv.org/abs/2306.15595v2 ) ライセンス: Link先を確認	Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian	(参考訳) LLaMAモデルのようなRoPEベースで事前訓練されたLLMのコンテキストウィンドウサイズを、最小限の微調整(1000ステップ以内)で最大32768まで拡張し、パスキー検索、言語モデリング、LLaMA 7Bから65Bまでの長い文書要約などの長いコンテキストを必要とするタスクに対して強力な実験結果を示す。一方、位置補間による拡張モデルは、元のコンテキストウィンドウ内のタスクの質を比較的よく保っている。この目的を達成するために、位置補間は入力位置指標を線形にダウンスケールし、トレーニングされたコンテキスト長を超えて外挿するのではなく、自己保持機構を完全に破壊する破滅的な高い注意スコアを与える。我々の理論的研究は、補間上限が少なくとも$\sim 600 \times$は外挿限界よりも小さいことを示し、その安定性を示している。位置補間によって拡張されたモデルは元のアーキテクチャを維持し、既存の最適化とインフラを再利用することができる。 We present Position Interpolation (PI) that extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within 1000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization from LLaMA 7B to 65B. Meanwhile, the extended model by Position Interpolation preserve quality relatively well on tasks within its original context window. To achieve this goal, Position Interpolation linearly down-scales the input position indices to match the original context window size, rather than extrapolating beyond the trained context length which may lead to catastrophically high attention scores that completely ruin the self-attention mechanism. Our theoretical study shows that the upper bound of interpolation is at least $\sim 600 \times$ smaller than that of extrapolation, further demonstrating its stability. Models extended via Position Interpolation retain its original architecture and can reuse most pre-existing optimization and infrastructure.	翻訳日:2023-06-29 11:12:50 公開日:2023-06-28
# 幾何超音波局在顕微鏡 Geometric Ultrasound Localization Microscopy ( http://arxiv.org/abs/2306.15548v2 ) ライセンス: Link先を確認	Christopher Hahne and Raphael Sznitman	(参考訳) 造影超音波(CEUS)は、医学診断における非侵襲的、動的可視化の有効な方法となっているが、超音波局在顕微鏡(ULM)は10倍の高分解能を提供することで、画期的なブレークスルーを実現している。現在までに、遅延アンドサム(DAS)ビームフォーマを使用してULMフレームをレンダリングし、最終的に画像解像度の能力を決定する。 ULMを最大限に活用するために,本研究では,ビームフォーミングがULMの最も効果的な処理ステップであるかどうかを疑問視し,TDoA情報のみに依存する代替手法を提案する。この目的のために, 既存のビームフォーミング限界を克服するために, 楕円交差による微小気泡局在のための新しい幾何学的枠組みを提案する。本稿では,既存のベースライン法よりも精度と信頼性の面で優れており,利用可能なトランスデューサデータの一部のみを活用できる公開データセットに基づくベンチマーク比較を行う。 Contrast-Enhanced Ultra-Sound (CEUS) has become a viable method for non-invasive, dynamic visualization in medical diagnostics, yet Ultrasound Localization Microscopy (ULM) has enabled a revolutionary breakthrough by offering ten times higher resolution. To date, Delay-And-Sum (DAS) beamformers are used to render ULM frames, ultimately determining the image resolution capability. To take full advantage of ULM, this study questions whether beamforming is the most effective processing step for ULM, suggesting an alternative approach that relies solely on Time-Difference-of-Arrival (TDoA) information. To this end, a novel geometric framework for micro bubble localization via ellipse intersections is proposed to overcome existing beamforming limitations. We present a benchmark comparison based on a public dataset for which our geometric ULM outperforms existing baseline methods in terms of accuracy and reliability while only utilizing a portion of the available transducer data.	翻訳日:2023-06-29 11:12:30 公開日:2023-06-28
# 不規則時系列の事前異常検出 Precursor-of-Anomaly Detection for Irregular Time Series ( http://arxiv.org/abs/2306.15489v2 ) ライセンス: Link先を確認	Sheo Yon Jhin, Jaehoon Lee, Noseong Park	(参考訳) 異常検出は予期せぬパターンやデータポイントを特定することを目的とした重要な分野であり、金融、製造、サイバーセキュリティなどにおける多くの現実世界の問題と密接に関連している。様々な分野で異常検出が広く研究されているが、今後の異常検出は未発見領域のままである。本稿では,新しい種類の異常検出手法であるemph{\textbf{P}recursor-of-\textbf{A}nomaly (PoA) を提案する。特定の時系列観測が異常であるか否かを決定する従来の異常検出とは異なり、PoA検出は将来の異常を検出することを目的としている。両課題を同時に解決するために,ニューラル制御による微分方程式に基づくニューラルネットワークとそのマルチタスク学習アルゴリズムを提案する。 17のベースラインと3つのデータセットを使って、規則的および不規則な時系列を含む実験を行い、提案手法がほぼすべてのケースでベースラインを上回ることを実証した。また, マルチタスクトレーニング手法は, 異常検出とpoa検出の両方において, 全体的な性能を著しく向上させることが示唆された。 Anomaly detection is an important field that aims to identify unexpected patterns or data points, and it is closely related to many real-world problems, particularly to applications in finance, manufacturing, cyber security, and so on. While anomaly detection has been studied extensively in various fields, detecting future anomalies before they occur remains an unexplored territory. In this paper, we present a novel type of anomaly detection, called \emph{\textbf{P}recursor-of-\textbf{A}nomaly} (PoA) detection. Unlike conventional anomaly detection, which focuses on determining whether a given time series observation is an anomaly or not, PoA detection aims to detect future anomalies before they happen. To solve both problems at the same time, we present a neural controlled differential equation-based neural network and its multi-task learning algorithm. We conduct experiments using 17 baselines and 3 datasets, including regular and irregular time series, and demonstrate that our presented method outperforms the baselines in almost all cases. Our ablation studies also indicate that the multitasking training method significantly enhances the overall performance for both anomaly and PoA detection.	翻訳日:2023-06-29 11:12:13 公開日:2023-06-28
# フリースタイル・高速3次元ポートレート合成 Free-style and Fast 3D Portrait Synthesis ( http://arxiv.org/abs/2306.15419v2 ) ライセンス: Link先を確認	Tianxiang Ma, Kang Zhao, Jianxin Sun, Jing Dong, Tieniu Tan	(参考訳) 高品質で一貫性のあるフリースタイルの3Dポートレートを効果的に生成することは、有望だが難しい課題だ。既存のほとんどのメソッドで生成されるポートレートスタイルは通常、FFHQのような特定の顔データセットで学習される3Dジェネレータによって制限される。フリースタイルの3Dポートレートを得るには、大規模なマルチスタイルデータベースを構築して3Dジェネレータを再トレーニングするか、あるいはオフザシェルフツールを使ってスタイル翻訳を行うことができる。しかし、データ収集とトレーニングプロセスのために前者は時間がかかり、後者はマルチビューの一貫性を損なう可能性がある。この問題に対処するため,本論文では,テキストプロンプトを用いてスタイルを指定可能な高速な3次元肖像画合成フレームワークを提案する。具体的には、3d対応ganジェネレータ (eg3d) とテキスト誘導画像エディタ (ip2p) の2つの生成前処理を利用して、数発のトレーニングセットを迅速に構築し、ip2pの推論プロセスを最適化し、編集をより安定させる。次に、EG3Dの原型三葉機を2つの目的のためにImage-to-Triplane (I2T)モジュールに置き換える。 1) 少数ショットデータセット上でI2Tを微調整することにより,事前訓練したEG3Dのスタイル制約を解消する。 2) I2Tを除くEG3Dのすべての部分の固定による訓練効率の向上。さらに,本手法のスケーラビリティと一般化を実証するために,マルチスタイルかつマルチidentity 3dポートレートデータベースを構築した。実験の結果,高品質な3dポートレートを数分で合成でき,最新技術に匹敵することがわかった。 Efficiently generating a free-style 3D portrait with high quality and consistency is a promising yet challenging task. The portrait styles generated by most existing methods are usually restricted by their 3D generators, which are learned in specific facial datasets, such as FFHQ. To get a free-style 3D portrait, one can build a large-scale multi-style database to retrain the 3D generator, or use a off-the-shelf tool to do the style translation. However, the former is time-consuming due to data collection and training process, the latter may destroy the multi-view consistency. To tackle this problem, we propose a fast 3D portrait synthesis framework in this paper, which enable one to use text prompts to specify styles. Specifically, for a given portrait style, we first leverage two generative priors, a 3D-aware GAN generator (EG3D) and a text-guided image editor (Ip2p), to quickly construct a few-shot training set, where the inference process of Ip2p is optimized to make editing more stable. Then we replace original triplane generator of EG3D with a Image-to-Triplane (I2T) module for two purposes: 1) getting rid of the style constraints of pre-trained EG3D by fine-tuning I2T on the few-shot dataset; 2) improving training efficiency by fixing all parts of EG3D except I2T. Furthermore, we construct a multi-style and multi-identity 3D portrait database to demonstrate the scalability and generalization of our method. Experimental results show that our method is capable of synthesizing high-quality 3D portraits with specified styles in a few minutes, outperforming the state-of-the-art.	翻訳日:2023-06-29 11:11:52 公開日:2023-06-28
# TrickVOS:ビデオオブジェクトセグメンテーションのためのトリックの袋 TrickVOS: A Bag of Tricks for Video Object Segmentation ( http://arxiv.org/abs/2306.15377v2 ) ライセンス: Link先を確認	Evangelos Skartados, Konstantinos Georgiadis, Mehmet Kerim Yucel, Koskinas Ioannis, Armando Domi, Anastasios Drosou, Bruno Manganelli, Albert Saa-Garriga	(参考訳) 空間時間メモリ(STM)ネットワーク手法は,その性能上,半教師付きビデオオブジェクトセグメンテーション(SVOS)において支配的であった。本研究では,このような手法を改良できる3つの重要な側面を同定する。一監督信号二事前訓練及び訓練 iii) 空間意識。次に、各側面に対処できる汎用的なメソッドに依存しないトリックバッグであるtrickvosを提案する。一構造対応ハイブリッド損失二簡易復号機事前訓練体制及び三モデル予測に空間的制約を課す安価な追跡装置最後に、軽量なネットワークを提案し、TrickVOSでトレーニングすると、DAVISとYouTubeベンチマークの最先端メソッドと競合する結果が得られ、モバイルデバイス上でリアルタイムに実行できるSTMベースのSVOSメソッドの1つであることを示す。 Space-time memory (STM) network methods have been dominant in semi-supervised video object segmentation (SVOS) due to their remarkable performance. In this work, we identify three key aspects where we can improve such methods; i) supervisory signal, ii) pretraining and iii) spatial awareness. We then propose TrickVOS; a generic, method-agnostic bag of tricks addressing each aspect with i) a structure-aware hybrid loss, ii) a simple decoder pretraining regime and iii) a cheap tracker that imposes spatial constraints in model predictions. Finally, we propose a lightweight network and show that when trained with TrickVOS, it achieves competitive results to state-of-the-art methods on DAVIS and YouTube benchmarks, while being one of the first STM-based SVOS methods that can run in real-time on a mobile device.	翻訳日:2023-06-29 11:11:25 公開日:2023-06-28
# 3D-Speaker: 大規模マルチデバイス, マルチディスタンス, マルチディレクトコーパスによる音声表現遠絡 3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement ( http://arxiv.org/abs/2306.15354v2 ) ライセンス: Link先を確認	Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang, Qian Chen	(参考訳) 発話における非相関情報の拡散は、音声コミュニティにおいて重要な研究課題である。異なる音声関連タスクは、他の非相関情報の影響を最小限に抑えながら、異なる音声表現を抽出することに焦点を当てる。本稿では,音声表現のゆがみの研究を容易にするための大規模音声コーパスを提案する。 3D-Speakerには10,000人以上のスピーカーが含まれており、それぞれが複数のデバイスによって同時に記録され、異なる距離に配置されている。多次元オーディオデータの制御された組み合わせは、多様な音声表現の絡み合いの混合のマトリックスを生じさせ、興味をそそる方法の動機付けとなる。 3D-Speakerのマルチドメインの性質は、ドメイン外学習と自己教師型学習の大規模な普遍的な音声モデルと実験方法を評価するのに適している。 https://3dspeaker.github.io/ Disentangling uncorrelated information in speech utterances is a crucial research topic within speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the affects of other uncorrelated information. We present a large-scale speech corpus to facilitate the research of speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom are simultaneously recorded by multiple Devices, locating at different Distances, and some speakers are speaking multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of a diverse blend of speech representation entanglement, thereby motivating intriguing methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource to evaluate large universal speech models and experiment methods of out-of-domain learning and self-supervised learning. https://3dspeaker.github.io/	翻訳日:2023-06-29 11:11:11 公開日:2023-06-28
# 時系列モデリングのための変分潜在離散表現 Variational Latent Discrete Representation for Time Series Modelling ( http://arxiv.org/abs/2306.15282v2 ) ライセンス: Link先を確認	Max Cohen, Maurice Charbit, Sylvain Le Corff	(参考訳) 離散潜在空間モデルは、最近、深部変分推論における連続的な空間と同等の性能を達成した。彼らはまだ様々な実装課題に直面しているが、これらのモデルは自然に離散的な現象をより直接的に表現するだけでなく、潜在空間をよりよく解釈する機会を提供する。最近のアプローチでは、離散潜在データ上で非常に高次元の事前モデルを個別に訓練することを提案している。本稿では、離散状態がマルコフ連鎖であり、高速なエンドツーエンドトレーニングを可能にする潜在データモデルを提案する。生成モデルの性能はビル管理データセットと一般公開されているElectricity Transformer Datasetに基づいて評価する。 Discrete latent space models have recently achieved performance on par with their continuous counterparts in deep variational inference. While they still face various implementation challenges, these models offer the opportunity for a better interpretation of latent spaces, as well as a more direct representation of naturally discrete phenomena. Most recent approaches propose to train separately very high-dimensional prior models on the discrete latent data which is a challenging task on its own. In this paper, we introduce a latent data model where the discrete state is a Markov chain, which allows fast end-to-end training. The performance of our generative model is assessed on a building management dataset and on the publicly available Electricity Transformer Dataset.	翻訳日:2023-06-29 11:10:54 公開日:2023-06-28

Title

Authors

Abstract

論文公表日・翻訳日

# PyPIにおけるディープラーニングパッケージ・サプライ・チェーンの特徴:ドメイン、クラスタ、ディスエンジメント

Characterizing Deep Learning Package Supply Chains in PyPI: Domains, Clusters, and Disengagement ( http://arxiv.org/abs/2306.16307v1 )

ライセンス: Link先を確認

Kai Gao, Runzhi He, Bing Xie, Minghui Zhou

(参考訳) ディープラーニング(DL)パッケージサプライチェーン(SC)は、DLフレームワークが競争力を維持するために不可欠である。しかし、DLパッケージSCの性質に関する重要な知識はいまだに欠如している。本稿では,この知識ギャップを埋めるため,2つの代表的なpypi dlパッケージscsにおいて,パッケージのドメイン,クラスタ,および解除について検討する。約600万のPyPIパッケージディストリビューションのメタデータを分析し、人気のある2つのDLフレームワークであるTensorFlowとPyTorchのバージョンセンシティブなSCを構築します。その結果,2つのSCは8つのカテゴリに属する34のドメインをカバーしている(月間ダウンロード数で測る)。アプリケーション、インフラストラクチャ、科学のカテゴリはそれぞれ、SCとTensorFlowの人気のあるパッケージの85%以上を占めており、PyTorch SCはそれぞれ、インフラストラクチャとアプリケーションのパッケージに特化している。我々は、Leidenコミュニティ検出アルゴリズムを用いて、2つのSCの131と100のクラスタを検出する。クラスタは、主にアロー、スター、ツリー、フォレストという4つの形状を示し、依存関係の複雑さが増す。ほとんどのクラスタはArrowまたはStarだが、TreeとForestのクラスタがほとんどのパッケージ(Tensorflow SC:70%、PyTorch SC:90%)を担っている。パッケージがSCから切り離された3つの理由(すなわち、DLフレームワークとその依存物がインストール依存から削除される)、すなわち依存性の問題、機能改善、インストールの容易さの3つのグループを特定します。 2つのSCの最も一般的な解離原因は異なる。本研究は,PyPI DL SCのメンテナンスと依存性管理の実践に深く影響している。

Deep learning (DL) package supply chains (SCs) are critical for DL frameworks to remain competitive. However, vital knowledge on the nature of DL package SCs is still lacking. In this paper, we explore the domains, clusters, and disengagement of packages in two representative PyPI DL package SCs to bridge this knowledge gap. We analyze the metadata of nearly six million PyPI package distributions and construct version-sensitive SCs for two popular DL frameworks: TensorFlow and PyTorch. We find that popular packages (measured by the number of monthly downloads) in the two SCs cover 34 domains belonging to eight categories. Applications, Infrastructure, and Sciences categories account for over 85% of popular packages in either SC and TensorFlow and PyTorch SC have developed specializations on Infrastructure and Applications packages respectively. We employ the Leiden community detection algorithm and detect 131 and 100 clusters in the two SCs. The clusters mainly exhibit four shapes: Arrow, Star, Tree, and Forest with increasing dependency complexity. Most clusters are Arrow or Star, but Tree and Forest clusters account for most packages (Tensorflow SC: 70%, PyTorch SC: 90%). We identify three groups of reasons why packages disengage from the SC (i.e., remove the DL framework and its dependents from their installation dependencies): dependency issues, functional improvements, and ease of installation. The most common disengagement reason in the two SCs are different. Our study provides rich implications on the maintenance and dependency management practices of PyPI DL SCs.

翻訳日:2023-10-23 18:45:42 公開日:2023-06-28

# FuzzyFlow: プログラム最適化バグの検索とスワッシュにDataflowを活用する

FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs ( http://arxiv.org/abs/2306.16178v1 )

ライセンス: Link先を確認

Philipp Schaad and Timo Schneider and Tal Ben-Nun and Alexandru Calotoiu and Alexandros Nikolaos Ziogas and Torsten Hoefler

(参考訳) 現在のハードウェアランドスケープとアプリケーションスケールは、パフォーマンスエンジニアをbespoke最適化を書くよう駆り立てています。このような最適化の検証と、最小限の失敗事例の生成は、入力やサイズといったプログラム条件の変更に対して、堅牢性において重要である。しかしながら、既存のアプリケーションから最小限のテストケースを分離し、新しい構成を生成することは、主にデータフローに関連するシステム状態に副作用があるため、しばしば困難である。本稿では,プログラム最適化をテストするために設計された障害局所化およびテストケース抽出フレームワークであるFuzzyFlowを紹介する。我々は、データフロープログラム表現を利用して、完全に再現可能なシステム状態をキャプチャし、その領域を最適化し、意味同値の高速チェックを可能にする。テスト時間を削減するため,テスト入力を最小限に抑えるアルゴリズムを設計し,再計算のためのメモリ交換を行う。 FuzzyFlowは、従来のアプローチに比べて最大528倍高速な最適化テストとデバッギングを提供する実世界のアプリケーションのユースケースを例に示す。

The current hardware landscape and application scale is driving performance engineers towards writing bespoke optimizations. Verifying such optimizations, and generating minimal failing cases, is important for robustness in the face of changing program conditions, such as inputs and sizes. However, isolation of minimal test-cases from existing applications and generating new configurations are often difficult due to side effects on the system state, mostly related to dataflow. This paper introduces FuzzyFlow: a fault localization and test case extraction framework designed to test program optimizations. We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations to enable fast checking for semantic equivalence. To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation. We demonstrate FuzzyFlow on example use cases in real-world applications where the approach provides up to 528 times faster optimization testing and debugging compared to traditional approaches.

翻訳日:2023-10-23 18:45:16 公開日:2023-06-28

# ソーシャルコーディングプラットフォームにおける画像ベースコミュニケーション

Image-based Communication on Social Coding Platforms ( http://arxiv.org/abs/2306.15851v1 )

ライセンス: Link先を確認

Maleknaz Nayebi and Bram Adams

(参考訳) 画像やビデオの形でのビジュアルコンテンツは、様々な方法で汎用ソーシャルネットワークを乗っ取り、オンラインコミュニケーションの合理化と強化を行っている。私たちは、画像の利用がソーシャルコーディングプラットフォームで人気があり、どの程度役に立つかを理解することに興味があります。 MozillaのイシュートラッキングシステムであるBugzillaと、開発者のQ/A、すなわちStack Overflowで最も有名なプラットフォームである。我々はさらに168人のソフトウェア開発者を対象に調査を行い、鉱業結果を三角測量し拡張した。 2013年から2022年の間に、BugzillaとStack Overflowの画像データを含む投稿数は倍増した。さらに、画像を共有することで、他の開発者がコンテンツにより速く関わります。画像が開発者の投稿に含まれている場合の大半では、画像内の情報は提供されたテキストに補完される。最後に,画像が共有された場合,画像内の情報を持たないコンテンツを理解することは,86.9\%のケースではありそうにないことを示した。これらの観察に基づいて、開発者の分析や自動化ツールの設計において、ビジュアルコンテンツを検討することの重要性について論じる。

Visual content in the form of images and videos has taken over general-purpose social networks in a variety of ways, streamlining and enriching online communications. We are interested to understand if and to what extent the use of images is popular and helpful in social coding platforms. We mined nine years of data from two popular software developers' platforms: the Mozilla issue tracking system, i.e., Bugzilla, and the most well-known platform for developers' Q/A, i.e., Stack Overflow. We further triangulated and extended our mining results by performing a survey with 168 software developers. We observed that, between 2013 and 2022, the number of posts containing image data on Bugzilla and Stack Overflow doubled. Furthermore, we found that sharing images makes other developers engage more and faster with the content. In the majority of cases in which an image is included in a developer's post, the information in that image is complementary to the text provided. Finally, our results showed that when an image is shared, understanding the content without the information in the image is unlikely for 86.9\% of the cases. Based on these observations, we discuss the importance of considering visual content when analyzing developers and designing automation tools.

翻訳日:2023-10-23 18:44:57 公開日:2023-06-28

# 多様なポートフォリオにおけるトレーディングのための強化学習手法の評価

Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio ( http://arxiv.org/abs/2309.03202v1 )

ライセンス: Link先を確認

Ishan S. Khare, Tarun K. Martheswaran, Akshana Dassanaike-Perera, Jonah B. Ezekiel

(参考訳) 本研究は,S&P500指数上での強化学習の実現可能性に関する重要な研究課題に答えようとしている。価値反復(vi)のオンポリシー手法と、q-learningのオフポリシー手法とともに、状態-アクション-reward-state-action(sarsa)が実装されている。モデルは2000年から2023年までの数年間の株式市場データからなるデータセット上でトレーニングされ、テストされる。この分析は、covid-19パンデミックの年数を含む2つの異なる期間を使ってモデルをトレーニングし、テストした結果と結果を提示する。その結果、トレーニングデータセットにおけるCOVID-19期間の市場データを含めると、ベースライン戦略よりも優れたパフォーマンスが得られることが示唆された。テスト中、オンラインアプローチ(VIとSARSA)はQラーニングを上回っ、バイアス分散トレードオフの影響とより単純なポリシーの一般化能力を強調した。しかし,Q-ラーニングのパフォーマンスは,今後の市場環境の安定性によって異なる可能性がある。今後の取り組みとして、さまざまな株式の試験および取引におけるqラーニングポリシーの更新を含む実験が提案されている。また,モデル訓練のための代替経済指標の探索も提案している。

This work seeks to answer key research questions regarding the viability of reinforcement learning over the S&P 500 index. The on-policy techniques of Value Iteration (VI) and State-action-reward-state-action (SARSA) are implemented along with the off-policy technique of Q-Learning. The models are trained and tested on a dataset comprising multiple years of stock market data from 2000-2023. The analysis presents the results and findings from training and testing the models using two different time periods: one including the COVID-19 pandemic years and one excluding them. The results indicate that including market data from the COVID-19 period in the training dataset leads to superior performance compared to the baseline strategies. During testing, the on-policy approaches (VI and SARSA) outperform Q-learning, highlighting the influence of bias-variance tradeoff and the generalization capabilities of simpler policies. However, it is noted that the performance of Q-learning may vary depending on the stability of future market conditions. Future work is suggested, including experiments with updated Q-learning policies during testing and trading diverse individual stocks. Additionally, the exploration of alternative economic indicators for training the models is proposed.

翻訳日:2023-10-23 08:54:54 公開日:2023-06-28

# 非エルミート双曲性物質における例外輪郭の発見

Uncovering Exceptional Contours in non-Hermitian Hyperbolic Matter ( http://arxiv.org/abs/2307.04745v1 )

ライセンス: Link先を確認

Nisarg Chadha, Awadhesh Narayan

(参考訳) 双曲格子は、物質の新しい段階を探索するために研究され始めている。同時に、非エルミート物理学は、フォトニック、光学、フォノニック、凝縮体系において最前線にある。本研究では,非エルミート双曲体を導入し,その特異な性質を深く解明する。双曲ブロッホ理論を用いて、非エルミートオンサイトゲインと損失と非相反ホッピングの存在下で双曲格子のバンド構造を調べる。様々な解析的および数値的アプローチを用いて、位相剛性、エネルギースケーリング、渦性を用いて特徴づける10,5}テッセレーションにおいて、広くアクセス可能で可変可能な例外点と輪郭を示す。さらに,ニュートン多角形を用いた<8,4}テセルレーションにおける高次例外点と輪郭の発生を,渦性および位相剛性計算によって実証した。最後に,開放境界スペクトルと状態密度を調べ,バンド理論の結果と比較し,境界局所化の実証を行った。以上の結果から,双曲型非エルミート物質の異常な不均一性がみられた。

Hyperbolic lattices are starting to be explored in search of novel phases of matter. At the same time, non-Hermitian physics has come to the forefront in photonic, optical, phononic, and condensed matter systems. In this work, we introduce non-Hermitian hyperbolic matter and elucidate its exceptional properties in depth. We use hyperbolic Bloch theory to investigate band structures of hyperbolic lattices in the presence of non-Hermitian on-site gain and loss as well as non-reciprocal hopping. Using various analytical and numerical approaches we demonstrate widely accessible and tunable exceptional points and contours in {10,5} tessellations, which we characterize using phase rigidity, energy scaling, and vorticity. We further demonstrate the occurrence of higher-order exceptional points and contours in the {8,4} tessellations using the method of Newton polygons, supported by vorticity and phase rigidity computations. Finally, we investigate the open boundary spectra and densities of states to compare with results from band theory, along with a demonstration of boundary localisation. Our results unveil an abundance of exceptional degeneracies in hyperbolic non-Hermitian matter.

翻訳日:2023-07-16 04:02:56 公開日:2023-06-28

# 中性子・X線反射率データのニューラルネットワーク解析:相問題に取り組むための事前知識の導入

Neural network analysis of neutron and X-ray reflectivity data: Incorporating prior knowledge for tackling the phase problem ( http://arxiv.org/abs/2307.05364v1 )

ライセンス: Link先を確認

Valentin Munteanu, Vladimir Starostin, Alessandro Greco, Linus Pithan, Alexander Gerlach, Alexander Hinderhofer, Stefan Kowarik, Frank Schreiber

(参考訳) 位相情報の欠如により、測定された中性子およびx線反射率曲線から多層薄膜の物理パラメータを決定することは、基本レベルでは、不確定な逆問題である。このいわゆるフェーズ問題は、従来の機械学習ソリューションで考慮されるパラメータの範囲と数を制限する、標準的なニューラルネットワークに制限を与える。そこで本研究では,事前知識を活用し,より広いパラメータ空間上でのトレーニングプロセスを定式化する手法を提案する。ボックスモデルパラメータ化を用いた多層構造や,多層構造に対する散乱長密度プロファイルの物理に着想を得た特殊パラメータ化など,様々なシナリオにおいて本手法の有効性を示す。事前知識の入力を活用することで、トレーニングダイナミクスを改善し、未決定の(未解決の)問題の性質に対処できる。従来の手法とは対照的に,5層多層モデルや最大17個のオープンパラメータを持つn層周期多層モデルにおいても,逆問題の複雑性を増大させる手法は好適である。

Due to the lack of phase information, determining the physical parameters of multilayer thin films from measured neutron and X-ray reflectivity curves is, on a fundamental level, an underdetermined inverse problem. This so-called phase problem poses limitations on standard neural networks, constraining the range and number of considered parameters in previous machine learning solutions. To overcome this, we present an approach that utilizes prior knowledge to regularize the training process over larger parameter spaces. We demonstrate the effectiveness of our method in various scenarios, including multilayer structures with box model parameterization and a physics-inspired special parameterization of the scattering length density profile for a multilayer structure. By leveraging the input of prior knowledge, we can improve the training dynamics and address the underdetermined ("ill-posed") nature of the problem. In contrast to previous methods, our approach scales favorably when increasing the complexity of the inverse problem, working properly even for a 5-layer multilayer model and an N-layer periodic multilayer model with up to 17 open parameters.

翻訳日:2023-07-16 03:55:58 公開日:2023-06-28

# VisText:Semantically Rich Chart Captioningのベンチマーク

VisText: A Benchmark for Semantically Rich Chart Captioning ( http://arxiv.org/abs/2307.05356v1 )

ライセンス: Link先を確認

Benny J. Tang, Angie Boggust and Arvind Satyanarayan

(参考訳) チャートを記述または説明するキャプションは、描写されたデータのリコールと理解を改善し、視覚障害者にとってよりアクセスしやすい媒体を提供する。しかし、このようなキャプションを自動生成する現在のアプローチは、チャートの目印である知覚的特徴や認知的特徴(複雑な傾向やパターンなど)を明確にするのに苦労している。グラフの構成を記述した12,441組のチャートとキャプションのデータセットであるVisTextを紹介し、重要な統計を報告し、知覚的および認知的現象を識別する。 VisTextでは、チャートはラスタ化イメージ、バックデータテーブル、シーングラフの3つの表現として利用可能である。これは、チャートの視覚要素をWebページのドキュメントオブジェクトモデル(DOM)に似た階層的な表現である。 vistextの影響を評価するために、グラフキャプションタスクに最先端の言語モデルを微調整し、彼らが伝達する意味的コンテンツが異なるキャプションを作成するためにプレフィックスチューニングを適用します。我々のモデルはコヒーレントでセマンティックにリッチなキャプションを生成し、機械翻訳とテキスト生成のメトリクスで最先端のチャートキャプションモデルと同等に機能する。定性的分析により、我々のモデルが将来の作業に役立てる6つの幅広いエラーカテゴリを特定します。

Captions that describe or explain charts help improve recall and comprehension of the depicted data and provide a more accessible medium for people with visual disabilities. However, current approaches for automatically generating such captions struggle to articulate the perceptual or cognitive features that are the hallmark of charts (e.g., complex trends and patterns). In response, we introduce VisText: a dataset of 12,441 pairs of charts and captions that describe the charts' construction, report key statistics, and identify perceptual and cognitive phenomena. In VisText, a chart is available as three representations: a rasterized image, a backing data table, and a scene graph -- a hierarchical representation of a chart's visual elements akin to a web page's Document Object Model (DOM). To evaluate the impact of VisText, we fine-tune state-of-the-art language models on our chart captioning task and apply prefix-tuning to produce captions that vary the semantic content they convey. Our models generate coherent, semantically rich captions and perform on par with state-of-the-art chart captioning models across machine translation and text generation metrics. Through qualitative analysis, we identify six broad categories of errors that our models make that can inform future work.

翻訳日:2023-07-16 03:55:21 公開日:2023-06-28

# HIVA:ホログラフィー・インテリジェント音声アシスタント

HIVA: Holographic Intellectual Voice Assistant ( http://arxiv.org/abs/2307.05501v1 )

ライセンス: Link先を確認

Ruslan Isaev, Radmir Gumerov, Gulzada Esenalieva, Remudin Reshid Mekuria, Ermek Doszhanov

(参考訳) Holographic Intellectual Voice Assistant (HIVA)は、視覚効果と3Dアバターを用いた人間のコンピュータインタラクションを促進することを目的としている。 hivaは、入学、研究問題、手数料、部門、大学構造と歴史、カンティーン、人的資源、図書館、学生生活とイベント、国と市に関する情報など、様々な性質の要求を含む、大学に関する完全な情報を提供している。以上のデータを受信するには、大学の公式サイトやその他のサポートアプリ、HEI(Higher Education Institution)公式ソーシャルメディア、HEIスタッフに直接質問する他のチャンネルなどがある。しかし、HIVAはアニメーション3Dマスコットとの「対面」相互作用のユニークな体験を提供し、実際のコミュニケーションの感覚を得るのに役立つ。このシステムは、多くのサブモジュールを含み、モバイルアプリケーション、Telegramチャットボット、提案分類、エンターテイメントサービスなどのアプリケーション群を接続する。音声アシスタントは、最高のユーザーエクスペリエンスのためにパイプライン化されたロシア語のnlpモデルとツールを使用する。

Holographic Intellectual Voice Assistant (HIVA) aims to facilitate human computer interaction using audiovisual effects and 3D avatar. HIVA provides complete information about the university, including requests of various nature: admission, study issues, fees, departments, university structure and history, canteen, human resources, library, student life and events, information about the country and the city, etc. There are other ways for receiving the data listed above: the university's official website and other supporting apps, HEI (Higher Education Institution) official social media, directly asking the HEI staff, and other channels. However, HIVA provides the unique experience of "face-to-face" interaction with an animated 3D mascot, helping to get a sense of 'real-life' communication. The system includes many sub-modules and connects a family of applications such as mobile applications, Telegram chatbot, suggestion categorization, and entertainment services. The Voice assistant uses Russian language NLP models and tools, which are pipelined for the best user experience.

翻訳日:2023-07-16 03:35:05 公開日:2023-06-28

# 微分力学系に対する古典的フィッシャー情報

Classical Fisher information for differentiable dynamical systems ( http://arxiv.org/abs/2307.00026v1 )

ライセンス: Link先を確認

Mohamed Sahbani, Swetamber Das, and Jason R. Green

(参考訳) フィッシャー情報は、古典的および量子力学的パラメータの統計的推定における不確実性の低い境界である。いくつかの決定論的力学系はランダムなゆらぎには属さないが、それでも不確実性がある: 初期条件に対する無限小の摂動は、決定論的カオスのサインである時間的に指数関数的に増加する。この不確かさの尺度として、他の古典的情報、特に騒音に従わない古典システムの決定論的ダイナミクスを紹介する。この古典的な情報の測度は接空間におけるリャプノフベクトルで定義されており、古典的なフィッシャー情報に似ておらず、ヒルベルト空間の波動ベクトルで定義される量子フィッシャー情報に近い。局所状態空間構造と線形安定性の解析は,この情報の上界と下界につながり,流れのネットストレッチング作用として解釈される。機械的な例のためのこの情報の数値計算は、位相空間の曲率と流れの速度に直接依存していることを示している。

Fisher information is a lower bound on the uncertainty in the statistical estimation of classical and quantum mechanical parameters. While some deterministic dynamical systems are not subject to random fluctuations, they do still have a form of uncertainty: Infinitesimal perturbations to the initial conditions can grow exponentially in time, a signature of deterministic chaos. As a measure of this uncertainty, we introduce another classical information, specifically for the deterministic dynamics of classical systems not subject to noise. This classical measure of information is defined with Lyapunov vectors in tangent space, making it less akin to the classical Fisher information and more akin to the quantum Fisher information defined with wavevectors in Hilbert space. Our analysis of the local state space structure and linear stability lead to upper and lower bounds on this information, giving it an interpretation as the net stretching action of the flow. Numerical calculations of this information for illustrative mechanical examples show that it depends directly on the phase space curvature and speed of the flow.

翻訳日:2023-07-09 13:50:03 公開日:2023-06-28

# EmoSpeech: FastSpeech2が感情テキストから音声へ

EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech ( http://arxiv.org/abs/2307.00024v1 )

ライセンス: Link先を確認

Daria Diatlova, Vitaly Shutov

(参考訳) 最先端の音声合成モデルは、人間の声にできるだけ近づこうとしている。したがって、感情のモデル化はテキスト音声(TTS)研究の不可欠な部分である。本研究では,fastspeech2を出発点として選択し,感情音声合成のための一連の修正を提案する。自動評価と人的評価により,我々のモデルであるEmoSpeechは,生成音声におけるMOSスコアと感情認識精度の両方に関する既存モデルを上回った。我々は、EmoSpeechを形成するFastSpeech2アーキテクチャのすべての拡張について、詳細なアブレーション研究を行った。テキスト中の感情の不均一な分布は、より良い、合成された音声とイントネーション知覚に不可欠である。私たちのモデルには、さまざまな強度レベルで各携帯電話に感情が貢献できるようにすることで、この問題を効果的に処理するコンディショニングメカニズムが含まれています。人間の評価は、提案された修正は、より高いMOSと感情表現性を持つ音声を生成することを示している。

State-of-the-art speech synthesis models try to get as close as possible to the human voice. Hence, modelling emotions is an essential part of Text-To-Speech (TTS) research. In our work, we selected FastSpeech2 as the starting point and proposed a series of modifications for synthesizing emotional speech. According to automatic and human evaluation, our model, EmoSpeech, surpasses existing models regarding both MOS score and emotion recognition accuracy in generated speech. We provided a detailed ablation study for every extension to FastSpeech2 architecture that forms EmoSpeech. The uneven distribution of emotions in the text is crucial for better, synthesized speech and intonation perception. Our model includes a conditioning mechanism that effectively handles this issue by allowing emotions to contribute to each phone with varying intensity levels. The human assessment indicates that proposed modifications generate audio with higher MOS and emotional expressiveness.

翻訳日:2023-07-09 13:49:45 公開日:2023-06-28

# 古典状態の幾何学的テンソル

The geometric tensor for classical states ( http://arxiv.org/abs/2307.01208v1 )

ライセンス: Link先を確認

A. D. Berm\'udez Manjarres

(参考訳) リウヴィル固有関数を用いて幾何テンソルの古典版を定義し、古典的断熱ゲージポテンシャル(AGP)との関係を研究する。我々は可積分系に注目し、幾何学的テンソルの虚部がハンネー曲率と関連していることを示す。幾何テンソルの特異点と AGP は、アーノルド・リウヴィル積分性からカオスへの遷移と、量子相転移の数学的形式論のいくつかを結びつけることができる。

We use the Liouville eigenfunctions to define a classical version of the geometric tensor and study its relationship with the classical adiabatic gauge potential (AGP). We focus on integrable systems and show that the imaginary part of the geometric tensor is related to the Hannay curvature. The singularities of the geometric tensor and the AGP allows us to link the transition from Arnold-Liouville integrability to chaos with some of the mathematical formalism of quantum phase transitions.

翻訳日:2023-07-09 13:40:53 公開日:2023-06-28

# オンラインおよびモバイルソーシャルネットワークのためのレコメンダシステム:調査

Recommender Systems for Online and Mobile Social Networks: A survey ( http://arxiv.org/abs/2307.01207v1 )

ライセンス: Link先を確認

Mattia Giovanni Campana, Franca Delmastro

(参考訳) Recommender Systems (RS) はオンラインサービスにおける基本的なツールであり、特に Online Social Networks (OSN) が出現した。この場合、ユーザは大量のコンテンツを生成し、無駄な情報によって素早くオーバーロードできる。同時に、ソーシャルメディアはコンテンツやユーザーの興味を特徴づける重要な情報源となっている。 RSはこの情報を利用して提案をさらにパーソナライズし、推奨プロセスを改善することができる。本稿では,オンラインおよびモバイルのソーシャルネットワーク向けに設計・実装されたレコメンダシステムに関する調査を行い,ソーシャルコンテキスト情報の利用がレコメンデーションタスクをどのように改善するか,標準アルゴリズムを拡張・最適化して,日和見ネットワークとして完全に分散した環境で動作させるべきか,について述べる。本稿では,これらのシステムの利点と欠点を,アルゴリズム,対象領域,評価指標,性能評価の観点から説明する。最終的には、この分野におけるオープンリサーチの課題をいくつか提示する。

Recommender Systems (RS) currently represent a fundamental tool in online services, especially with the advent of Online Social Networks (OSN). In this case, users generate huge amounts of contents and they can be quickly overloaded by useless information. At the same time, social media represent an important source of information to characterize contents and users' interests. RS can exploit this information to further personalize suggestions and improve the recommendation process. In this paper we present a survey of Recommender Systems designed and implemented for Online and Mobile Social Networks, highlighting how the use of social context information improves the recommendation task, and how standard algorithms must be enhanced and optimized to run in a fully distributed environment, as opportunistic networks. We describe advantages and drawbacks of these systems in terms of algorithms, target domains, evaluation metrics and performance evaluations. Eventually, we present some open research challenges in this area.

翻訳日:2023-07-09 13:40:45 公開日:2023-06-28

# CTR予測のための信頼度ランキング

Confidence Ranking for CTR Prediction ( http://arxiv.org/abs/2307.01206v1 )

ライセンス: Link先を確認

Jian Zhu, Congcong Liu, Pei Wang, Xiwei Zhao, Zhangang Lin, Jingping Shao

(参考訳) モデルの進化とデータの定常利用は、広告やレコメンデーションシステムなど、大規模な実世界の機械学習アプリケーションにおいて2つの一般的な現象である。適応するために、現実世界のシステムは、通常、すべての利用可能なデータで再トレーニングし、最近利用可能なデータでオンライン学習を行い、パフォーマンスの向上を目標として定期的にモデルを更新する。本稿では,2つの異なるモデルを用いたランキング関数として最適化目標を設計する,信頼ランキングという新しいフレームワークを提案する。私たちの信頼度ランキングの損失は、メトリクスの異なる凸サーロゲート関数(例えば、aucと精度)に対するロジット出力の直接最適化を可能にします。提案手法を用いて,信頼度ランキング損失の導入は,公共および産業データセットのCTR予測タスクにおいて,すべてのベースラインを上回り得ることを示す。このフレームワークは、JD.comの広告システムに展開され、ファインランクの段階で主要なトラフィックを提供する。

Model evolution and constant availability of data are two common phenomena in large-scale real-world machine learning applications, e.g. ads and recommendation systems. To adapt, the real-world system typically retrain with all available data and online learn with recently available data to update the models periodically with the goal of better serving performance. In this paper, we propose a novel framework, named Confidence Ranking, which designs the optimization objective as a ranking function with two different models. Our confidence ranking loss allows direct optimization of the logits output for different convex surrogate functions of metrics, e.g. AUC and Accuracy depending on the target task and dataset. Armed with our proposed methods, our experiments show that the introduction of confidence ranking loss can outperform all baselines on the CTR prediction tasks of public and industrial datasets. This framework has been deployed in the advertisement system of JD.com to serve the main traffic in the fine-rank stage.

翻訳日:2023-07-09 13:40:26 公開日:2023-06-28

# 命令チューニングの活用可能性について

On the Exploitability of Instruction Tuning ( http://arxiv.org/abs/2306.17194v1 )

ライセンス: Link先を確認

Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein

(参考訳) インストラクションチューニングは、大きな言語モデル(LLM)を人間の意図に合わせる効果的な手法である。本研究では,モデル動作を意図的に変化させる訓練データに,特定の指示追従例を注入することにより,相手が指導チューニングを利用する方法を検討する。例えば、敵は、ターゲットコンテンツに言及するトレーニング例を注入し、下流モデルからそのような行動を引き出すことによって、コンテンツ注入を達成できる。この目的を達成するために、自動データ中毒パイプラインである \textit{AutoPoison} を提案する。自然とコヒーレントに、oracle llmの助けを借りて、汎用的な攻撃目標を有毒データに組み込む。コンテンツインジェクションと過剰拒否攻撃の2つの例を示し、それぞれが特定の悪用可能な振る舞いを誘導する。データ中毒スキームの強さとステルスネスを定量化し、ベンチマークします。以上の結果から, オートポゾンにより, 被毒例の密着性を維持しつつ, 少量のデータのみを有毒化することにより, 敵がモデルの行動を変えることが可能となった。私たちの研究は、データ品質が命令調整モデルの振る舞いにどのように影響するかを明らかにし、llmの責任ある展開におけるデータ品質の重要性に対する認識を高めることを願っています。コードは \url{https://github.com/azshue/autopoison} で入手できる。

Instruction tuning is an effective technique to align large language models (LLMs) with human intents. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally changes the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content and eliciting such behavior from downstream models. To achieve this goal, we propose \textit{AutoPoison}, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs. Code is available at \url{https://github.com/azshue/AutoPoison}.

翻訳日:2023-07-03 14:31:25 公開日:2023-06-28

# 自動脆弱性検出のための機械学習の限界

Limits of Machine Learning for Automatic Vulnerability Detection ( http://arxiv.org/abs/2306.17193v1 )

ライセンス: Link先を確認

Niklas Risse, Marcel B\"ohme

(参考訳) 関数のソースコードのみを$f$とすれば、マシンラーニングテクニックによってトレーニングされたモデルは、$f$が最大70%の精度でセキュリティ上の欠陥を含むかどうかを判断できる。しかし、これらの結果が汎用的でデータセットに固有のものではないことをどうやって知るのか? この質問を研究するために、研究者はセマンティクス保存の変更を注入することでテストセットの増幅を提案し、モデルの精度が大幅に低下することを発見した。言い換えると、このモデルは分類中にいくつかの無関係な特徴を使用する。モデルの堅牢性を高めるために、研究者は増幅されたトレーニングデータをトレーニングすることを提案した。本稿では,本研究を再現・継続し,研究者が脆弱性検出のための機械学習の進歩をよりよく評価する上で有効なモデルベンチマーク手法を提案する。具体的には (i)トレーニングセットまたはテストセットの増幅中に意味保存変換を適用するクロス検証アルゴリズム (ii)脆弱性が修正されたコードスニペットによるテストセットの増幅。 11の変換、3つのMLテクニック、2つのデータセットを使用して、改善された堅牢性は、トレーニングデータ増幅時に使用される特定の変換にのみ適用される。言い換えれば、堅牢化モデルはテストデータの脆弱性を予測するために、いまだ無関係な機能に依存しています。さらに、トレーニングされたモデルでは、脆弱性のある機能をパッチと区別する必要のある修正された設定に一般化できないことも分かりました。

Recent results of machine learning for automatic vulnerability detection have been very promising indeed: Given only the source code of a function $f$, models trained by machine learning techniques can decide if $f$ contains a security flaw with up to 70% accuracy. But how do we know that these results are general and not specific to the datasets? To study this question, researchers proposed to amplify the testing set by injecting semantic preserving changes and found that the model's accuracy significantly drops. In other words, the model uses some unrelated features during classification. In order to increase the robustness of the model, researchers proposed to train on amplified training data, and indeed model accuracy increased to previous levels. In this paper, we replicate and continue this investigation, and provide an actionable model benchmarking methodology to help researchers better evaluate advances in machine learning for vulnerability detection. Specifically, we propose (i) a cross validation algorithm, where a semantic preserving transformation is applied during the amplification of either the training set or the testing set, and (ii) the amplification of the testing set with code snippets where the vulnerabilities are fixed. Using 11 transformations, 3 ML techniques, and 2 datasets, we find that the improved robustness only applies to the specific transformations used during training data amplification. In other words, the robustified models still rely on unrelated features for predicting the vulnerabilities in the testing data. Additionally, we find that the trained models are unable to generalize to the modified setting which requires to distinguish vulnerable functions from their patches.

翻訳日:2023-07-03 14:31:03 公開日:2023-06-28

# ネットワーク干渉による因果推論の局所的アプローチ

The Local Approach to Causal Inference under Network Interference ( http://arxiv.org/abs/2105.03810v4 )

ライセンス: Link先を確認

Eric Auerbach and Max Tabord-Meehan

(参考訳) エージェントが社会的・経済的ネットワークにどのようにリンクされているかに依存する場合の因果推論のための新しい非パラメトリックモデリングフレームワークを提案する。このようなネットワーク干渉は、治療の流出、社会的相互作用、社会学習、情報拡散、病気と金融の伝染、社会資本の形成などに関する大きな文献を記述している。提案手法では, エージェントがネットワーク内でどのようにリンクされているかを, 経路距離で測定した他のエージェントと近傍の接続の設定を用いて特徴付ける。ポリシーや治療課題の影響は、同様に構成されたエージェント間で結果データをプールすることで学習される。本研究は,k-nearest-neighbor推定器の平均的又は分布的政策効果/治療反応に対する平均二乗誤差を限定し,政策無関係/無治療仮説に対する漸近的に有効なテストを提案する。

We propose a new nonparametric modeling framework for causal inference when outcomes depend on how agents are linked in a social or economic network. Such network interference describes a large literature on treatment spillovers, social interactions, social learning, information diffusion, disease and financial contagion, social capital formation, and more. Our approach works by first characterizing how an agent is linked in the network using the configuration of other agents and connections nearby as measured by path distance. The impact of a policy or treatment assignment is then learned by pooling outcome data across similarly configured agents. We demonstrate the approach by proposing an asymptotically valid test for the hypothesis of policy irrelevance/no treatment effects and bounding the mean-squared error of a k-nearest-neighbor estimator for the average or distributional policy effect/treatment response.

翻訳日:2023-06-30 19:50:23 公開日:2023-06-28

# 学習型ビジュアルオドメトリーを用いた動的高密度RGB-D SLAM

Dynamic Dense RGB-D SLAM using Learning-based Visual Odometry ( http://arxiv.org/abs/2205.05916v2 )

ライセンス: Link先を確認

Shihao Shen, Yilin Cai, Jiayi Qiu, Guangzhao Li

(参考訳) 本稿では,学習に基づくビジュアルオドメトリーであるTartanVOに基づく高密度な動的RGB-D SLAMパイプラインを提案する。 TartanVOは、機能ベースの他の直接的な方法と同様に、高密度の光学的流れを通してカメラのポーズを推定するが、これは静的なシーンにのみ適用され、動的オブジェクトを無視する。色濃度の仮定により、光学フローは動的画素と静的画素の区別ができない。したがって,このような直接的手法で静的マップを再構築するには,光フロー出力を利用して動的/静的セグメンテーションを解決し,静的ポイントのみをマップに融合する。さらに、動的な画素を取り除いた入力フレームを再描画し、視覚的なオドメトリーに繰り返し転送してポーズ推定を洗練させる。

We propose a dense dynamic RGB-D SLAM pipeline based on a learning-based visual odometry, TartanVO. TartanVO, like other direct methods rather than feature-based, estimates camera pose through dense optical flow, which only applies to static scenes and disregards dynamic objects. Due to the color constancy assumption, optical flow is not able to differentiate between dynamic and static pixels. Therefore, to reconstruct a static map through such direct methods, our pipeline resolves dynamic/static segmentation by leveraging the optical flow output, and only fuse static points into the map. Moreover, we rerender the input frames such that the dynamic pixels are removed and iteratively pass them back into the visual odometry to refine the pose estimate.

翻訳日:2023-06-30 19:47:46 公開日:2023-06-28

# 電圧からの構造

Structure from Voltage ( http://arxiv.org/abs/2203.00063v2 )

ライセンス: Link先を確認

Robi Bhattacharjee, Alex Cloninger, Yoav Freund, Andreas Oslandsbotn

(参考訳) 有効抵抗(ER)はグラフの構造を問う魅力的な方法である。これはグラフラプラシアンの固有ベクトルを計算するに代わるものである。グラフラプラシアンは高次元データにおいて低次元構造を見つけるために用いられる。ここでも、ERベースの解析は等ベクトル法よりも有利である。残念ながら、Von Luxburg et al. (2010) は、頂点が計量空間上の分布からのサンプルに対応するとき、遠点間のERの極限はグラフの構造に関する情報を持たない自明な量に収束することを示した。我々は、$n$頂点が$n^2$のグラフにおけるスケーリング抵抗を使用することで、電圧と有効抵抗の有意な制限が得られることを示す。また、計量グラフに「接地」ノードを加えることで、選択された点から他の全ての点までの距離を計算するための単純で自然な方法が得られることを示す。

Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigen-vectors of the graph Laplacian. Graph laplacians are used to find low dimensional structures in high dimensional data. Here too, ER based analysis has advantages over eign-vector based methods. Unfortunately Von Luxburg et al. (2010) show that, when vertices correspond to a sample from a distribution over a metric space, the limit of the ER between distant points converges to a trivial quantity that holds no information about the structure of the graph. We show that by using scaling resistances in a graph with $n$ vertices by $n^2$, one gets a meaningful limit of the voltages and of effective resistances. We also show that by adding a "ground" node to a metric graph one gets a simple and natural way to compute all of the distances from a chosen point to all other points.

翻訳日:2023-06-30 19:47:12 公開日:2023-06-28

# ブロックワイズデータを用いた高次元線形回帰の統計的推測

Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data ( http://arxiv.org/abs/2106.03344v2 )

ライセンス: Link先を確認

Fei Xue, Rong Ma, Hongzhe Li

(参考訳) 異なるソースやモダリティが相補的な情報を含んでいるマルチソースまたはマルチモダリティデータを統合すると、ブロックワイドなデータが頻繁に発生する。本稿では,ブロックワイド共変量と部分的な応答変数を持つ高次元線形回帰モデルについて考察する。本研究では,不偏推定方程式とブロックワイズ計算法に基づく回帰係数ベクトルの計算効率の高い推定器を提案し,その収束率を求める。さらに,初期推定器のバイアス補正を本質的に達成する革新的な予測式法に基づいて,漸近的に分布する各回帰係数に対する偏りのない推定法を提案する。これらの偏差推定器に基づいて、漸近的に有効な信頼区間と各回帰係数に関する統計的試験を構築する。アルツハイマー病の神経画像化イニシアチブデータの数値研究と応用分析により,提案法が従来の方法よりも良好で,教師なし検体より有益であることが示された。

Blockwise missing data occurs frequently when we integrate multisource or multimodality data where different sources or modalities contain complementary information. In this paper, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a blockwise imputation procedure, and obtain its rate of convergence. Furthermore, building upon an innovative projected estimating equation technique that intrinsically achieves bias-correction of the initial estimator, we propose a nearly unbiased estimator for each individual regression coefficient, which is asymptotically normally distributed under mild conditions. Based on these debiased estimators, asymptotically valid confidence intervals and statistical tests about each regression coefficient are constructed. Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.

翻訳日:2023-06-30 19:46:08 公開日:2023-06-28

# SAFER:強化学習による集中的かつ効率的な軌道探索による安全な衝突回避

SAFER: Safe Collision Avoidance using Focused and Efficient Trajectory Search with Reinforcement Learning ( http://arxiv.org/abs/2209.11789v2 )

ライセンス: Link先を確認

Mario Srouji, Hugues Thomas, Hubert Tsai, Ali Farhadi, Jian Zhang

(参考訳) 衝突回避は、現実世界で安全に動く移動ロボットやエージェントにとって重要だ。本研究では,オペレーターが送信する制御コマンドの修正により安全性を向上させることができる,効率的な衝突回避システムSAFERを提案する。現実世界の強化学習(RL)、検索ベースのオンライン軌道計画、自動緊急ブレーキ(AEB)などの自動緊急介入を組み合わせる。 RLの目的は、衝突のない軌道の集中探索に使用される効果的な補正制御動作を学習し、自動緊急ブレーキの起動頻度を低減することである。この新しいセットアップにより、rlポリシーは現実世界の屋内環境でモバイルロボットを安全に直接学習することができ、トレーニング中にも実際のクラッシュを最小限に抑えることができる。私たちの実世界の実験では、いくつかのベースラインと比較すると、平均速度、クラッシュ率の低下、緊急介入の低減、計算オーバーヘッドの低減、全体的なコントロールの円滑化が期待できます。

Collision avoidance is key for mobile robots and agents to operate safely in the real world. In this work we present SAFER, an efficient and effective collision avoidance system that is able to improve safety by correcting the control commands sent by an operator. It combines real-world reinforcement learning (RL), search-based online trajectory planning, and automatic emergency intervention, e.g. automatic emergency braking (AEB). The goal of the RL is to learn an effective corrective control action that is used in a focused search for collision-free trajectories, and to reduce the frequency of triggering automatic emergency braking. This novel setup enables the RL policy to learn safely and directly on mobile robots in a real-world indoor environment, minimizing actual crashes even during training. Our real-world experiments show that, when compared with several baselines, our approach enjoys a higher average speed, lower crash rate, less emergency intervention, smaller computation overhead, and smoother overall control.

翻訳日:2023-06-30 19:35:54 公開日:2023-06-28

# マルチタイム量子通信: 興味深いが反事実ではない

Multitime Quantum Communication: Interesting But Not Counterfactual ( http://arxiv.org/abs/2301.01730v3 )

ライセンス: Link先を確認

Robert B. Griffiths

(参考訳) Salihらによって導入された2つの当事者間での情報伝達プロトコルPhys。 Rev. Lett. 110 (2013) 170502 (ここでslazの後)は、単に一方向の信号を送るのではなく、一連のステップで量子チャネルで量子振幅を前後に送信する。著者らは、それらのプロトコルは、両者をつなぐために量子チャネルが必要であるが、ステップ数が無限になる傾向があるため、その実際の使用量は漸近的限界において極めて小さくなるという意味で、'counterfactual'であると主張した。ここでは、量子干渉の存在下で中間時間で有効でない確率論的推論を使用するため、この主張は誤りであることを示す。未定義の確率が、チャネルを通じて送信される振幅の絶対二乗に等しい「コスト」と呼ばれるよく定義されたチャネル使用量の尺度に置き換えられるとき、その総コストは、多くのステップの漸近極限においてゼロにならず、厳密な不等式によって下限となる。詳細な分析により、この境界がSLAZプロトコルで満たされていることが示されている。この境界につながる解析は、純量子状態の集合の内部積によって形成されるグラム行列がヒルベルト部分空間上の加法的であり、ユニタリ時間変換の下で不変であるという事実を用いる。その非対角的要素は概して肯定的ではないが、形式的議論において重要な役割を果たすとともに、情報の伝達を幾分奇妙な方法で可視化する。

A protocol for transmission of information between two parties introduced by Salih et al., Phys. Rev. Lett. 110 (2013) 170502 (hereafter SLAZ), involves sending quantum amplitude back and forth through a quantum channel in a series of steps, rather than simply sending a signal in one direction. The authors claimed that their protocol was ``counterfactual'' in the sense that while a quantum channel is needed to connect the parties, its actual usage becomes vanishingly small in the asymptotic limit as the number of steps tends to infinity. Here we show that this claim is incorrect because it uses probabilistic reasoning that is not valid at intermediate times in the presence of quantum interference. When ill-defined probabilities are replaced with a well-defined measure of channel usage here called ``Cost'', equal to the absolute square of the amplitude sent through the channel, the total Cost does not go to zero in the asymptotic limit of a large number of steps, but is bounded below by a rigorous inequality. A detailed analysis shows that this bound is satisfied in the SLAZ protocol. The analysis leading to the bound uses the fact that the Gram matrix formed by inner products of a collection of pure quantum states is additive over Hilbert subspaces and invariant under unitary time transformations. Its off-diagonal elements, which in general are not positive, play a significant role in the formal argument as well as providing a somewhat strange way of visualizing the transfer of information.

翻訳日:2023-06-30 19:27:24 公開日:2023-06-28

# 一般核行列のデータ駆動線形複雑性低ランク近似:幾何学的アプローチ

Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach ( http://arxiv.org/abs/2212.12674v2 )

ライセンス: Link先を確認

Difeng Cai, Edmond Chow, Yuanzhe Xi

(参考訳) 一般に、カーネル行列は $k_{ij} = \kappa(x_i,y_j)$ ここで $\kappa(x,y)$ はカーネル関数であり、$x=\{x_i\}_{i=1}^m$ と $y=\{y_i\}_{i=1}^n$ は2つの点の集合である。本稿では、x$ と $y$ の点の集合が大きすぎて、互いに離れて ``intermingled'' や same など、任意に分布するカーネル行列の低ランク近似を求める。このような長方形のカーネル行列は、例えばガウスのプロセス回帰において、$X$はトレーニングデータに対応し、$Y$はテストデータに対応する。この場合、点はしばしば高次元である。点集合は大きいので、行列が核関数から生じるという事実を活用し、行列の形成を避け、したがってほとんどの代数的手法を除外しなければならない。特に,固定近似ランクのデータサイズに関して,線形あるいはほぼ線形にスケールできる手法を求める。この論文の主なアイデアは、低位近似を構成する点の適切な部分集合を幾何学的に選択することである。本稿では,この選択をいかに行うべきかを考察する。

A general, {\em rectangular} kernel matrix may be defined as $K_{ij} = \kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are arbitrarily distributed, such as away from each other, ``intermingled'', identical, etc. Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function, and avoid forming the matrix, and thus ruling out most algebraic techniques. In particular, we seek methods that can scale linearly or nearly linear with respect to the size of data for a fixed approximation rank. The main idea in this paper is to {\em geometrically} select appropriate subsets of points to construct a low rank approximation. An analysis in this paper guides how this selection should be performed.

翻訳日:2023-06-30 19:26:57 公開日:2023-06-28

# カスケード変分量子固有解法アルゴリズム

Cascaded variational quantum eigensolver algorithm ( http://arxiv.org/abs/2303.15237v2 )

ライセンス: Link先を確認

Daniel Gunlycke, C. Stephen Hellberg, and John P. T. Stenger

(参考訳) 本稿では,パラメータ最適化過程において,反復毎に1回ではなく1回の量子回路セットの実行しか必要としないカスケード変分量子固有ソルバアルゴリズムを提案する。このアルゴリズムは量子処理ユニットを用いて必要な確率質量関数を探索し、古典処理ユニットはエネルギー最小化を含む残りの計算を行う。アンサッツ形式はフォック空間を制限せず、対称性やその他の物理的動機付けのある制約を含むアンサッツ状態を完全に制御する。

We present a cascaded variational quantum eigensolver algorithm that only requires the execution of a set of quantum circuits once rather than at every iteration during the parameter optimization process, thereby reducing the number of needed circuit executions. This algorithm uses a quantum processing unit to probe the needed probability mass functions and a classical processing unit perform the remaining calculations, including the energy minimization. The ansatz form does not restrict the Fock space and provides full control over the ansatz state, including the implementation of symmetry and other physically motivated constraints.

翻訳日:2023-06-30 19:18:23 公開日:2023-06-28

# ProphNet: アンカーインフォームド提案による効率的なエージェント中心運動予測

ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals ( http://arxiv.org/abs/2303.12071v3 )

ライセンス: Link先を確認

Xishun Wang, Tong Su, Fang Da, Xiaodong Yang

(参考訳) モーション予測は自動運転システムにおいて重要なモジュールである。マルチソース入力の異質性、エージェントの動作におけるマルチモダリティ、オンボード配置に必要な低レイテンシのため、このタスクは悪名高い課題である。このような問題に対処するため,本研究では,効率的なマルチモーダル動作予測のためのアンカーインフォームド提案を用いたエージェント中心モデルを提案する。複雑な入力を簡潔に統一的に符号化するモダリティ非依存戦略を設計する。我々は,目標志向のシーンコンテキストを持つアンカーと融合した多様な提案を生成し,幅広い将来の軌跡をカバーするマルチモーダル予測を誘導する。我々のネットワークアーキテクチャは高度に均一で簡潔であり、現実の運転環境に適応できる効率的なモデルに繋がる。実験により,エージェント中心のネットワークは予測精度において最先端の手法と好適に比較され,シーン中心レベルの推論レイテンシが達成された。

Motion forecasting is a key module in an autonomous driving system. Due to the heterogeneous nature of multi-sourced input, multimodality in agent behavior, and low latency required by onboard deployment, this task is notoriously challenging. To cope with these difficulties, this paper proposes a novel agent-centric model with anchor-informed proposals for efficient multimodal motion prediction. We design a modality-agnostic strategy to concisely encode the complex input in a unified manner. We generate diverse proposals, fused with anchors bearing goal-oriented scene context, to induce multimodal prediction that covers a wide range of future trajectories. Our network architecture is highly uniform and succinct, leading to an efficient model amenable for real-world driving deployment. Experiments reveal that our agent-centric network compares favorably with the state-of-the-art methods in prediction accuracy, while achieving scene-centric level inference latency.

翻訳日:2023-06-30 19:17:26 公開日:2023-06-28

# 因果推論における変数重要度マッチング

Variable Importance Matching for Causal Inference ( http://arxiv.org/abs/2302.11715v2 )

ライセンス: Link先を確認

Quinn Lanners, Harsh Parikh, Alexander Volfovsky, Cynthia Rudin, and David Page

(参考訳) 我々の目標は、監査可能で、トラブルシュートが容易で、治療効果の推定に正確で、高次元データにスケーラブルな観察因果推定法を作ることである。これらの目標を達成するための汎用フレームワークとして,model-to-matchについて述べる。 (i)成果モデリングを通して距離計量を学ぶこと。 (ii)距離計量を用いて一致群を作成すること、 (iii)一致群を用いて治療効果を推定する。 Model-to-Matchは、可変重要度測定を使用して距離メトリックを構築し、様々なアプリケーションに適用可能な柔軟なフレームワークとなる。潜在的な共同設立者数における問題のスケーラビリティに集中して、LASSOでModel-to-Matchフレームワークを運用します。 lassoの成果モデリングが(線形モデルを正しく指定する必要なしに)すべての共同創設者を一貫して識別する設定で、パフォーマンス保証を導き出します。また,本手法の監査性,精度,拡張性,およびより一般的な非パラメトリックな結果モデリングの拡張性を示す実験結果も提供する。

Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the matched groups to estimate treatment effects. Model-to-Match uses variable importance measurements to construct a distance metric, making it a flexible framework that can be adapted to various applications. Concentrating on the scalability of the problem in the number of potential confounders, we operationalize the Model-to-Match framework with LASSO. We derive performance guarantees for settings where LASSO outcome modeling consistently identifies all confounders (importantly without requiring the linear model to be correctly specified). We also provide experimental results demonstrating the method's auditability, accuracy, and scalability as well as extensions to more general nonparametric outcome modeling.

翻訳日:2023-06-30 19:16:22 公開日:2023-06-28

# ディープラーニングによるパーコレーション型ゲームのマスタリング

Mastering Percolation-like Games with Deep Learning ( http://arxiv.org/abs/2305.07687v2 )

ライセンス: Link先を確認

Michael M. Danziger, Omkar R. Gojala, Sean P. Cornelius

(参考訳) ランダムアタックに対するネットワークの堅牢性は広く研究されているが、知的エージェントによる意図的な破壊は従来の方法では不可能である。ここでは,ネットワークを破壊しようとする攻撃者の論理を模倣した格子上に単一プレイヤーゲームを作成する。ゲームの目的は、最も少ないステップ数で全てのノードを無効にすることである。我々は,ネットワークを最適に攻撃するために,このゲームをうまくプレイできる深層q学習を用いた強化学習手法を開発した。学習アルゴリズムは普遍的であるため、堅牢性の異なる定義のエージェントを訓練し、学習戦略を比較する。表面的に類似したロバストネスの定義は、訓練されたエージェントに異なる戦略を誘導し、ネットワークを最適に攻撃または防御することが特定の目的に敏感であることを示唆する。本手法はネットワークのロバスト性を理解するための新しい手法であり、障害のあるシステムにおける他の離散プロセスへの潜在的な応用を提供する。

Though robustness of networks to random attacks has been widely studied, intentional destruction by an intelligent agent is not tractable with previous methods. Here we devise a single-player game on a lattice that mimics the logic of an attacker attempting to destroy a network. The objective of the game is to disable all nodes in the fewest number of steps. We develop a reinforcement learning approach using deep Q-learning that is capable of learning to play this game successfully, and in so doing, to optimally attack a network. Because the learning algorithm is universal, we train agents on different definitions of robustness and compare the learned strategies. We find that superficially similar definitions of robustness induce different strategies in the trained agent, implying that optimally attacking or defending a network is sensitive the particular objective. Our method provides a new approach to understand network robustness, with potential applications to other discrete processes in disordered systems.

翻訳日:2023-06-30 19:09:30 公開日:2023-06-28

# ポテンシャルインバージョン理論

The Potential Inversion Theorem ( http://arxiv.org/abs/2305.07260v2 )

ライセンス: Link先を確認

Alec Shelley, Henry Hunt

(参考訳) タイト結合モデルにおける波動関数の確率は、初期条件が厳密に偶数あるいは奇な格子点を占有し、大域的な位相まで存在する限り、ポテンシャルエネルギーの符号反転の下で保存されるというポテンシャル反転定理を証明する。この対称性は電子対の時間はポジトロニウムのように進化し、したがって結合状態を形成する必要がある。我々は、この単純な定理の他の興味深い結果と同様に、粒子が正のポテンシャルと同様に負のポテンシャルに捕捉されるという事実も探求する。格子ホッピングモデル、スピン相互作用モデル、量子場理論の文脈におけるポテンシャル反転定理を議論し、それらがいくつかの無関係な物理的効果を単純化し説明できることを示す。

We prove the potential inversion theorem, which says that wavefunction probability in tight binding models is preserved under the sign inversion of the potential energy as long as the initial conditions occupy strictly even or odd lattice sites and are real up to a global phase. This symmetry requires that electron pairs time evolve like positronium and therefore form bound states. We explore this as well as other intriguing consequences of this simple theorem, such as the fact that particles can be trapped by negative potentials just as well as positive potentials. We discuss the potential inversion theorem in the context of lattice hopping models, spin interaction models, and quantum field theory, and show that it can simplify and explain a number of seemingly unrelated physical effects.

翻訳日:2023-06-30 19:09:14 公開日:2023-06-28

# 分解密度を持つ文字列図形

String Diagrams with Factorized Densities ( http://arxiv.org/abs/2305.02506v2 )

ライセンス: Link先を確認

Eli Sennesh and Jan-Willem van de Meent

(参考訳) 確率的プログラムと因果モデルに関する研究の活発化は、有向グラフィカルモデルを拡張するモデルクラスについて構成的に考える必要性を強調している。確率的プログラムと因果モデルの両方は、ランダム変数の集合上の合同確率密度を定義し、因果関係と条件独立性を推論するために使用できるスパース構造を示す。この研究は、確率写像のマルコフ圏に関する最近の研究に基づいて、射が各サンプル空間上で分解された結合密度と、サンプルから戻り値への決定論的写像を組み合わせた圏を定義する。これは、確率測度に関する最近のカテゴリー論的記述と、確率計画法や因果推論によく用いられる分解密度の操作的定義とのギャップを埋めるためのステップである。

A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density, factorized over each sample space, with a deterministic mapping from samples to return values. This is a step towards closing the gap between recent category-theoretic descriptions of probability measures, and the operational definitions of factorized densities that are commonly employed in probabilistic programming and causal inference.

翻訳日:2023-06-30 19:08:39 公開日:2023-06-28

# 長期一重項状態準備のための反断熱駆動

Counterdiabatic driving for long-lived singlet state preparation ( http://arxiv.org/abs/2305.02096v2 )

ライセンス: Link先を確認

Abhinav Suresh, Vishal Varma, Priya Batra, and T S Mahesh

(参考訳) 量子アディアバティック法は、状態の進化を通じて瞬時に固有状態の個体群を維持するもので、状態の準備と操作のために確立され、しばしば好まれる選択である。駆動コストを著しく最小化するが、その遅い速度はノイズの多い中規模量子(NISQ)時代の技術では厳しい制限となる。断熱経路は多くの物理過程において広く見られるため、断熱をはるかに高速に達成することはより広い関心事である。非断熱経路を高速に駆動することで、遅い断熱過程を克服する断熱技術へのショートカットが近年注目されている。過去10年間に確立された核磁気共鳴における長寿命一重項状態(LLS)の極端に長い寿命は、分光法から生医学的イメージングまで、いくつかの重要な応用を開拓してきた。断熱法を含む様々な方法がLSSの調製にすでに使われている。本稿では,高速駆動によるLSS調製を高速化するために,逆断熱駆動(CD)を用いたことを報告する。 NMR実験により,CDは従来の断熱駆動よりも短い期間でLSSのオーダーを得られることを示した。

The quantum adiabatic method, which maintains populations in their instantaneous eigenstates throughout the state evolution, is an established and often a preferred choice for state preparation and manipulation. Though it minimizes the driving cost significantly, its slow speed is a severe limitation in noisy intermediate-scale quantum (NISQ) era technologies. Since adiabatic paths are extensive in many physical processes, it is of broader interest to achieve adiabaticity at a much faster rate. Shortcuts to adiabaticity techniques which overcome the slow adiabatic process by driving the system faster through non-adiabatic paths, have seen increased attention recently. The extraordinarily long lifetime of the long-lived singlet states (LLS) in nuclear magnetic resonance, established over the past decade, has opened several important applications ranging from spectroscopy to biomedical imaging. Various methods, including adiabatic methods, are already being used to prepare LLS. In this article, we report the use of counterdiabatic driving (CD) to speed up LLS preparation with faster drives. Using NMR experiments, we show that CD can give stronger LLS order in shorter durations than conventional adiabatic driving.

翻訳日:2023-06-30 19:08:25 公開日:2023-06-28

# BMAD: 医学的異常検出のためのベンチマーク

BMAD: Benchmarks for Medical Anomaly Detection ( http://arxiv.org/abs/2306.11876v2 )

ライセンス: Link先を確認

Jinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng He, Zhaoxiang Zhang, Xingyu Li

(参考訳) 異常検出(AD)は、機械学習とコンピュータビジョンの基本的な研究課題であり、産業検査、ビデオ監視、医療診断に実用化されている。医用画像では、ADはまれな疾患や病態を示す可能性のある異常の検出と診断に特に重要である。しかし、医療画像上でADメソッドを評価するための普遍的で公平なベンチマークが欠如しており、この特定の領域におけるより一般化された、堅牢なADメソッドの開発を妨げる。このギャップを埋めるために,医療画像における異常検出手法を評価するための総合評価ベンチマークを提案する。このベンチマークは、5つの医学領域(脳MRI、肝CT、網膜OCT、胸部X線、デジタル病理学)から6つの再構成データセットと3つの重要な評価指標を含み、合計14の最先端ADアルゴリズムを含んでいる。本ベンチマークは,最近提案された異常検出手法の総合的な比較を可能にする。これは、コミュニティが公正な比較を行い、医療画像のAD分野を前進させることを促す。 BMADの詳細はGitHubリポジトリで確認できます。

Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision, with practical applications in industrial inspection, video surveillance, and medical diagnosis. In medical imaging, AD is especially vital for detecting and diagnosing anomalies that may indicate rare diseases or conditions. However, there is a lack of a universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this specific domain. To bridge this gap, we introduce a comprehensive evaluation benchmark for assessing anomaly detection methods on medical images. This benchmark encompasses six reorganized datasets from five medical domains (i.e. brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology) and three key evaluation metrics, and includes a total of fourteen state-of-the-art AD algorithms. This standardized and well-curated medical benchmark with the well-structured codebase enables comprehensive comparisons among recently proposed anomaly detection methods. It will facilitate the community to conduct a fair comparison and advance the field of AD on medical imaging. More information on BMAD is available in our GitHub repository: https://github.com/DorisBao/BMAD

翻訳日:2023-06-30 18:58:27 公開日:2023-06-28

# 脳腫瘍分離(BraTS)チャレンジ2023: 腫瘍分離(BraSyn)のための脳MR画像合成

The Brain Tumor Segmentation (BraTS) Challenge 2023: Brain MR Image Synthesis for Tumor Segmentation (BraSyn) ( http://arxiv.org/abs/2305.09011v5 )

ライセンス: Link先を確認

Hongwei Bran Li, Gian Marco Conte, Syed Muhammad Anwar, Florian Kofler, Ivan Ezhov, Koen van Leemput, Marie Piraud, Maria Diaz, Byrone Cole, Evan Calabrese, Jeff Rudie, Felix Meissen, Maruf Adewole, Anastasia Janas, Anahita Fathi Kazerooni, Dominic LaBella, Ahmed W. Moawad, Keyvan Farahani, James Eddy, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Farouk Dako, Walter Wiggins, Zachary Reitman, Chunhao Wang, Xinyang Liu, Zhifan Jiang, Ariana Familiar, Elaine Johanson, Zeke Meier, Christos Davatzikos, John Freymann, Justin Kirby, Michel Bilello, Hassan M. Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Rivka R. Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko, Arash Nazeri, Marc Andr\'e Weber, Abhishek Mahajan, Suyash Mohan, John Mongan, Christopher Hess, Soonmee Cha, Javier Villanueva, Meyer Errol Colak, Priscila Crivellaro, Andras Jakab, Jake Albrecht, Udunna Anazodo, Mariam Aboian, Thomas Yu, Verena Chung, Timothy Bergquist, James Eddy, Jake Albrecht, Ujjwal Baid, Spyridon Bakas, Marius George Linguraru, Bjoern Menze, Juan Eugenio Iglesias, Benedikt Wiestler

(参考訳) 自動脳腫瘍分画法が確立され,臨床応用可能な性能レベルに達している。これらの手法は通常、T1強調画像、T2強調画像、FLAIR画像の4つの入力磁気共鳴イメージング(MRI)モードに依存している。しかしながら、一部のシーケンスは、時間的制約や患者の動きのようなイメージアーティファクトのために臨床実践に欠落することが多い。その結果、これらのアルゴリズムが臨床ルーチンで広く採用されるためには、欠落したモダリティを置換し、セグメンテーション性能を得る能力が極めて望ましい。本稿では,医療用画像コンピューティングとコンピュータ支援インターベンション(MICCAI)2023と連携して脳MR画像合成ベンチマーク(BraSyn)の確立について述べる。この課題の主な目的は、複数の利用可能な画像が提供される際に、MRIの欠落を現実的に生成できる画像合成手法を評価することである。究極の目的は、自動的な脳腫瘍セグメンテーションパイプラインを促進することである。ベンチマークで使用される画像データセットは多様で多様であり、様々な病院や研究機関と協力して作成された。

Automated brain tumor segmentation methods have become well-established and reached performance levels offering clear clinical utility. These methods typically rely on four input magnetic resonance imaging (MRI) modalities: T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some sequences are often missing in clinical practice due to time constraints or image artifacts, such as patient motion. Consequently, the ability to substitute missing modalities and gain segmentation performance is highly desirable and necessary for the broader adoption of these algorithms in the clinical routine. In this work, we present the establishment of the Brain MR Image Synthesis Benchmark (BraSyn) in conjunction with the Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023. The primary objective of this challenge is to evaluate image synthesis methods that can realistically generate missing MRI modalities when multiple available images are provided. The ultimate aim is to facilitate automated brain tumor segmentation pipelines. The image dataset used in the benchmark is diverse and multi-modal, created through collaboration with various hospitals and research institutions.

翻訳日:2023-06-30 18:56:23 公開日:2023-06-28

# 条件付き拡散モデルによる損失画像圧縮

Lossy Image Compression with Conditional Diffusion Models ( http://arxiv.org/abs/2209.06950v5 )

ライセンス: Link先を確認

Ruihan Yang, Stephan Mandt

(参考訳) 本稿では,拡散生成モデルを用いた画像圧縮のエンドツーエンド最適化について概説する。このアプローチは変換符号化パラダイムに依存しており、画像はエントロピー符号化のための潜在空間にマッピングされ、そこから再構成のためにデータ空間にマッピングされる。平均)デコーダが決定論的ニューラルネットワークであるvaeベースのニューラルネットワークとは対照的に、このデコーダは条件拡散モデルである。そこで本手法では,逆拡散過程を条件付けした"コンテンツ"潜在変数を導入し,この変数を用いて画像に関する情報を格納する。拡散過程を特徴付ける残りの「テクスチャ」変数は復号時に合成される。モデルの性能は,関心の認知的指標に調整可能であることを示す。複数のデータセットと画像品質評価指標を含む広範囲な実験により,提案手法はGANモデルよりも強いFIDスコアを得られる一方で,VAEモデルと競合する性能を複数の歪み指標で得ることが示された。さらに、Xパラメータ化による拡散の訓練により、少数の復号化ステップで高品質な再構成が可能となり、モデルの実用性に大きな影響を及ぼす。

This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional "content" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining "texture" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with X-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality.

翻訳日:2023-06-30 17:03:22 公開日:2023-06-28

# 帯域制限関数の一般化におけるNN上のGNNの優位性

Superiority of GNN over NN in generalizing bandlimited functions ( http://arxiv.org/abs/2206.05904v7 )

ライセンス: Link先を確認

A. Martina Neuman, Rongrong Wang and Yuying Xie

(参考訳) グラフ情報の統合機能を備えたグラフニューラルネットワーク(GNN)は,データ解析に広く利用されている。しかし、GNNの表現力はグラフレベルのタスクにのみ研究されているが、ノード分類のようなノードレベルのタスクでは研究されていない。本稿では, 関数補間問題である, 上記の分類課題に対するgnnの表現力について検討する。具体的には、GNNが帯域制限関数を$\mathbb{R}^d$で補間するのに必要な重みと層の数を求める。以上の結果から,GNNアーキテクチャを用いた帯域制限関数の重み付けは,完全連結ニューラルネットワーク(NN)を用いた一般的な帯域制限関数よりもはるかに少ないことが分かる。特に,$O((\log \epsilon^{-1})^{d})$重み付けは,$O((\log \epsilon^{-1})^{d})$サンプルから$\epsilon$-approximateへ,$\mathbb{R}^d$で離散化された帯域制限信号を生成する。この結果は、gnn構造と古典的なサンプリング定理との関係を描き、我々の研究がこの方向への最初の試みとなるようにすることで得られる。

Graph Neural Network (GNN) with its ability to integrate graph information has been widely used for data analyses. However, the expressive power of GNN has only been studied for graph-level tasks but not for node-level tasks, such as node classification, where one tries to interpolate missing nodal labels from the observed ones. In this paper, we study the expressive power of GNN for the said classification task, which is in essence a function interpolation problem. Explicitly, we derive the number of weights and layers needed for a GNN to interpolate a band-limited function in $\mathbb{R}^d$. Our result shows that, the number of weights needed to $\epsilon$-approximate a bandlimited function using the GNN architecture is much fewer than the best known one using a fully connected neural network (NN) - in particular, one only needs $O((\log \epsilon^{-1})^{d})$ weights using a GNN trained by $O((\log \epsilon^{-1})^{d})$ samples to $\epsilon$-approximate a discretized bandlimited signal in $\mathbb{R}^d$. The result is obtained by drawing a connection between the GNN structure and the classical sampling theorems, making our work the first attempt in this direction.

翻訳日:2023-06-30 17:02:16 公開日:2023-06-28

# データ中毒に対する時間的ロバスト性

Temporal Robustness against Data Poisoning ( http://arxiv.org/abs/2302.03684v2 )

ライセンス: Link先を確認

Wenxiao Wang, Soheil Feizi

(参考訳) データ中毒は、悪意のあるトレーニングデータを通じて機械学習アルゴリズムの振る舞いを操作する場合を考える。データ中毒の既存の脅威モデルは、1つの指標、有毒サンプルの数を中心に構成されている。結果として、多くの実用的なシナリオのように、攻撃者が予想よりも多くのサンプルを手頃なオーバーヘッドで毒殺することができれば、既存の防御を短期間で無効にすることができる可能性がある。この問題に対処するために、私たちはデータの生年月日を示すタイムスタンプを活用しています。これらのタイムスタンプの利点を生かして,攻撃開始までの時間と攻撃の継続時間を測定する2つの新しい指標,アールネスと持続時間によるデータ中毒の時間的脅威モデルを提案する。これらの指標を用いて,データ中毒に対する時間的ロバスト性の概念を定義し,有意な保護感を与える。本稿では,更新モデルの連続データ収集と周期的展開をシミュレートした評価プロトコルを用いて,時間的ロバスト性の実証評価を行う。最後に、我々は、時間的集約(temporal aggregation)、証明可能な時間的堅牢性(temporal robustness)の提供、およびデータ中毒に対する時間的脅威モデルの可能性を強調するベースラインディフェンスを開発し、実証的に検証する。

Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how long an attack started in advance and how long an attack lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.

翻訳日:2023-06-30 16:54:38 公開日:2023-06-28

# ニュースの文レベルの事実性とメディアメディアのバイアスの予測

Predicting Sentence-Level Factuality of News and Bias of Media Outlets ( http://arxiv.org/abs/2301.11850v3 )

ライセンス: Link先を確認

Francielle Vargas, Kokil Jaidka, Thiago A. S. Pardo, Fabr\'icio Benevenuto

(参考訳) ニュースの信頼性と事実チェックを大規模に自動化するには、ニュースの事実とメディアバイアスを正確に予測する必要がある。本稿では,AllSides が提案する事実とメディアバイアスの定義に基づいて,6,191 の注釈付き文からなる「FactNews」という文レベルデータセットを提案する。我々はFactNewsを用いて、ニュースメディアの文章レベルの事実性を予測するための2つのテキスト分類問題を定式化し、ニュースソース全体の信頼性を評価する。本実験では,バイアス文は感情の優位に加えて,実文よりも単語数が多いことを実証する。そこで,ニュース記事の主観性と公平性の微粒化分析により,メディアの信頼性を予測できる有望な結果が得られた。最後に、ブラジルにおける偽ニュースの深刻さと政治的偏見、そしてポルトガル語の研究の欠如により、データセットとベースラインの両方がブラジルポルトガル語向けに提案された。

Automated news credibility and fact-checking at scale require accurately predicting news factuality and media bias. This paper introduces a large sentence-level dataset, titled "FactNews", composed of 6,191 sentences expertly annotated according to factuality and media bias definitions proposed by AllSides. We use FactNews to assess the overall reliability of news sources, by formulating two text classification problems for predicting sentence-level factuality of news reporting and bias of media outlets. Our experiments demonstrate that biased sentences present a higher number of words compared to factual sentences, besides having a predominance of emotions. Hence, the fine-grained analysis of subjectivity and impartiality of news articles provided promising results for predicting the reliability of media outlets. Finally, due to the severity of fake news and political polarization in Brazil, and the lack of research for Portuguese, both dataset and baseline were proposed for Brazilian Portuguese.

翻訳日:2023-06-30 16:53:42 公開日:2023-06-28

# 敵対的攻撃に対するブラックボックスモデルのデータフリー防御

Data-free Defense of Black Box Models Against Adversarial Attacks ( http://arxiv.org/abs/2211.01579v2 )

ライセンス: Link先を確認

Gaurav Kumar Nayak, Inder Khatri, Ruchit Rawal, Anirban Chakraborty

(参考訳) いくつかの企業は、APIを通じてブラックボックスとしてのみ公開することによって、トレーニングされた深層モデル(アーキテクチャの詳細、学習重量、トレーニング詳細など)をサードパーティのユーザから保護することが多い。さらに、プロプライエタリな理由やセンシティブな懸念から、トレーニングデータへのアクセスも提供されない可能性がある。そこで本研究では,データフリーセットアップにおける敵攻撃に対するブラックボックスモデルに対する新しい防御機構を提案する。生成モデルによる合成データを構築し,モデル盗み技術を用いてサロゲートネットワークを訓練する。本稿では,入力画像上で離散ウェーブレット分解を行う「ウェーブレットノイズ除去器」(WNR)を提案し,我々の「ウェーブレット係数選択モジュール」(WCSM)によって決定されるいくつかの重要な係数のみを慎重に選択する。 WNRによるノイズ除去後の画像の高周波コンテンツを回復するため,再構成した画像がサロゲートモデル上で元の予測と類似する係数を復元する目的で,さらに「再生器」ネットワークを訓練する。テスト時には、トレーニングされた再生器ネットワークと組み合わせたWNRがブラックボックスネットワークにプリプションされ、敵の精度が向上する。本手法は,攻撃者がブラックボックスアーキテクチャ(Alexnet)に類似したサロゲートアーキテクチャ(Alexnet-half,Alexnet)をディフェンダーと同じモデルステーリング戦略で使用しても,ベースラインと比較してCIFAR-10の対角精度を38.98%,32.01%向上させる。コードはhttps://github.com/vcl-iisc/data-free-black-box- defenseで入手できる。

Several companies often safeguard their trained deep models (i.e., details of architecture, learnt weights, training details etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data due to proprietary reasons or sensitivity concerns. In this work, we propose a novel defense mechanism for black box models against adversarial attacks in a data-free set up. We construct synthetic data via generative model and train surrogate network using model stealing techniques. To minimize adversarial contamination on perturbed samples, we propose 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully select only a few important coefficients determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image after noise removal via WNR, we further train a 'regenerator' network with the objective of retrieving the coefficients such that the reconstructed image yields similar to original predictions on the surrogate model. At test time, WNR combined with trained regenerator network is prepended to the black box network, resulting in a high boost in adversarial accuracy. Our method improves the adversarial accuracy on CIFAR-10 by 38.98% and 32.01% on state-of-the-art Auto Attack compared to baseline, even when the attacker uses surrogate architecture (Alexnet-half and Alexnet) similar to the black box architecture (Alexnet) with same model stealing strategy as defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense

翻訳日:2023-06-30 16:52:49 公開日:2023-06-28

# 教師なしファジィクラスタリングを用いた歴史的文書の手書き認識

Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering ( http://arxiv.org/abs/2210.16780v2 )

ライセンス: Link先を確認

Sriparna Majumdar and Aaron Brick

(参考訳) デジタル化された文書中の手書きの複数の筆跡への法医学的帰属は、高次元の難しい問題である。ユニークな手書きスタイルは、文字サイズ、ストローク幅、ループ、ダクト、傾斜角、曲がりくねったリガチュアなど、いくつかの要素を混ぜ合わせて区別することができる。隠れマルコフモデル、サポートベクターマシン、半教師付きリカレントニューラルネットワークによるラベル付きデータの研究は、中程度から高い成功を収めている。本研究では, ファジィソフトクラスタリングと線形主成分分析を組み合わせることで, 古写本のハンドシフトの検出に成功している。この進歩は、歴史文書の著者帰属と法医学的文書分析のための教師なし手法の展開を成功に導くものである。

The forensic attribution of the handwriting in a digitized document to multiple scribes is a challenging problem of high dimensionality. Unique handwriting styles may be dissimilar in a blend of several factors including character size, stroke width, loops, ductus, slant angles, and cursive ligatures. Previous work on labeled data with Hidden Markov models, support vector machines, and semi-supervised recurrent neural networks have provided moderate to high success. In this study, we successfully detect hand shifts in a historical manuscript through fuzzy soft clustering in combination with linear principal component analysis. This advance demonstrates the successful deployment of unsupervised methods for writer attribution of historical documents and forensic document analysis.

翻訳日:2023-06-30 16:52:18 公開日:2023-06-28

# AI生成したテキストは確実に検出できるのか?

Can AI-Generated Text be Reliably Detected? ( http://arxiv.org/abs/2303.11156v2 )

ライセンス: Link先を確認

Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang and Soheil Feizi

(参考訳) 本稿では,経験的かつ理論的に,いくつかのAIテキスト検出装置が現実的なシナリオでは信頼できないことを示す。実験により,大規模な言語モデル (LLM) 上に光パラフレーズが適用されるパラフレーズ攻撃は,ウォーターマーキングスキームやニューラルネットワークベースの検出器,ゼロショット分類器などを含む,あらゆる種類の検出器を破壊できることを示す。本実験は, 再帰的パラフレージングに対して依然として脆弱であることを示す。次に, 言語モデルがより洗練され, 人間の文章をエミュレートする能力が向上するにつれて, 最良検出器でも性能が低下することを示す理論的に不可能であることを示す。人間の文章を模倣しようとする十分に高度な言語モデルにとって、最も有望な検出器でさえ、ランダムな分類器よりもわずかに優れている。私たちの結果は、特定の記述スタイル、巧妙なプロンプトデザイン、テキストパラフレーズなど、特定のシナリオを捉えるのに十分です。また、擬似乱数生成器が真のランダム性ではなく、AIテキスト生成に使用される場合を含むように、不可能な結果も拡張する。すべての多項式時間計算可能検出器に対して、同じ結果が無視可能な補正項を持つことを示す。最後に、透かし方式で保護されたLLMでさえ、敵対する人間が隠れたLLMテキストシグネチャを推測し、LLMが生成したテキストとして検出する人為的なテキストに追加できる偽造攻撃に対して脆弱であり、開発者が評判を損なう可能性があることを示す。これらの結果は、AI生成テキストの倫理的かつ信頼性の高い使用に関するコミュニティの正直な会話を開こうとしています。

In this paper, both empirically and theoretically, we show that several AI-text detectors are not reliable in practical scenarios. Empirically, we show that paraphrasing attacks, where a light paraphraser is applied on top of a large language model (LLM), can break a whole range of detectors, including ones using watermarking schemes as well as neural network-based detectors and zero-shot classifiers. Our experiments demonstrate that retrieval-based detectors, designed to evade paraphrasing attacks, are still vulnerable to recursive paraphrasing. We then provide a theoretical impossibility result indicating that as language models become more sophisticated and better at emulating human text, the performance of even the best-possible detector decreases. For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier. Our result is general enough to capture specific scenarios such as particular writing styles, clever prompt design, or text paraphrasing. We also extend the impossibility result to include the case where pseudorandom number generators are used for AI-text generation instead of true randomness. We show that the same result holds with a negligible correction term for all polynomial-time computable detectors. Finally, we show that even LLMs protected by watermarking schemes can be vulnerable against spoofing attacks where adversarial humans can infer hidden LLM text signatures and add them to human-generated text to be detected as text generated by the LLMs, potentially causing reputational damage to their developers. We believe these results can open an honest conversation in the community regarding the ethical and reliable use of AI-generated text.

翻訳日:2023-06-30 16:43:25 公開日:2023-06-28

# データスワップのヘイトスケーリング法則について

On Hate Scaling Laws For Data-Swamps ( http://arxiv.org/abs/2306.13141v2 )

ライセンス: Link先を確認

Abeba Birhane, Vinay Prabhu, Sang Han, Vishnu Naresh Boddeti

(参考訳) 「モデルをスケールし、データをスケールし、GPUファームをスケール」は、今日の生成AIの世界における支配的な感情である。モデルスケーリングは広く研究されているが、データスケーリングとその下流への影響はまだ検討中である。これは、主要なソースがWorld Wide Webであり、CommonCrawlダンプとしてまとめてパッケージ化されている視覚言語データセットのコンテキストにおいて、特に重要である。この大規模データダンプは、多くの欠点があることが知られているが、繰り返し採掘され、大規模生成モデルのデータメーザーロデとして機能する。本稿では, 1)4億試料と20億試料を含むlaion-400mとlaion-2b-enの比較監査による憎悪コンテンツに対するデータセットのスケーリングの効果の検討 2)シカゴ・フェイス・データセット(CFD)を用いてトレーニングしたモデルの人種的偏りを測定することにより,これらのデータセット変種に基づいて訓練された視覚言語モデルに対するスケールのダウンストリームの影響を評価する。私たちの結果は 1)データセットにおける憎悪コンテンツの存在は,pysentimiento hate-detection natural language processing (nlp)モデルの推論に基づくヘイトコンテンツ率 (hcr) 測定値を用いて測定すると,約12-%$で増加した。 2) 社会バイアスと負のステレオタイプは, 評価したモデルに対するスケールとともに悪化した。スケールが大きくなるにつれて、人間の顔の画像と「人間」のクラスを関連付けるモデルが、他の7つの攻撃クラスを半分に減らす傾向が見られた。さらに、黒人女性のカテゴリーでは、モデルが「犯罪」クラスと顔を関連付ける傾向が2倍になり、黒人男性の顔のクインツップリングは2倍になった。我々は,モデル監査結果の質的・歴史的分析を行い,我々の発見とそのデータセットのキュレーション実践への影響を反映するとともに,この領域で実施すべき知見と今後の課題について概説する。

`Scale the model, scale the data, scale the GPU-farms' is the reigning sentiment in the world of generative AI today. While model scaling has been extensively studied, data scaling and its downstream impacts remain under explored. This is especially of critical importance in the context of visio-linguistic datasets whose main source is the World Wide Web, condensed and packaged as the CommonCrawl dump. This large scale data-dump, which is known to have numerous drawbacks, is repeatedly mined and serves as the data-motherlode for large generative models. In this paper, we: 1) investigate the effect of scaling datasets on hateful content through a comparative audit of the LAION-400M and LAION-2B-en, containing 400 million and 2 billion samples respectively, and 2) evaluate the downstream impact of scale on visio-linguistic models trained on these dataset variants by measuring racial bias of the models trained on them using the Chicago Face Dataset (CFD) as a probe. Our results show that 1) the presence of hateful content in datasets, when measured with a Hate Content Rate (HCR) metric on the inferences of the Pysentimiento hate-detection Natural Language Processing (NLP) model, increased by nearly $12\%$ and 2) societal biases and negative stereotypes were also exacerbated with scale on the models we evaluated. As scale increased, the tendency of the model to associate images of human faces with the `human being' class over 7 other offensive classes reduced by half. Furthermore, for the Black female category, the tendency of the model to associate their faces with the `criminal' class doubled, while quintupling for Black male faces. We present a qualitative and historical analysis of the model audit results, reflect on our findings and its implications for dataset curation practice, and close with a summary of our findings and potential future work to be done in this area.

翻訳日:2023-06-30 16:25:20 公開日:2023-06-28

# ViP:コンピュータビジョンのための微分プライベートファンデーションモデル

ViP: A Differentially Private Foundation Model for Computer Vision ( http://arxiv.org/abs/2306.08842v2 )

ライセンス: Link先を確認

Yaodong Yu and Maziar Sanjabi and Yi Ma and Kamalika Chaudhuri and Chuan Guo

(参考訳) 人工知能(AI)は、インターネット規模のデータに基づいてトレーニングされた基礎モデルを使用することで、能力の飛躍的な増加を見せている。逆に、インターネット規模のデータの未処理の性質は、個人情報や著作権のある資料を許可なくトレーニングするべきではないため、重大なプライバシーや法的リスクも伴う。本研究では,DP(差分プライバシ)を保証した基礎的ビジョンモデルを学習するためのレシピの緩和尺度として提案する。マスク付きオートエンコーダは、DP-SGDとうまく一致した適切な学習アルゴリズムであり、LAION400Mデータセットの厳格なプライバシー予算として、差分プライバシを備えたビジョントランスフォーマーであるViPをトレーニングする。我々は、標準の下流視覚タスクを用いて、VPが学習した表現の質を評価する。特に、VPは、ImageNet上で5,5.7 %の(プライベートでない)線形探索精度を達成している。この結果から,インターネット規模データへのスケーリングは,私的学習に有効であることが示唆された。コードは \url{https://github.com/facebookresearch/vip-mae} で入手できる。

Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as they often contain personal information or copyrighted material that should not be trained on without permission. In this work, we propose as a mitigation measure a recipe to train foundation vision models with differential privacy (DP) guarantee. We identify masked autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and train ViP -- a Vision transformer with differential Privacy -- under a strict privacy budget of $\epsilon=8$ on the LAION400M dataset. We evaluate the quality of representation learned by ViP using standard downstream vision tasks; in particular, ViP achieves a (non-private) linear probing accuracy of $55.7\%$ on ImageNet, comparable to that of end-to-end trained AlexNet (trained and evaluated on ImageNet). Our result suggests that scaling to internet-scale data can be practical for private learning. Code is available at \url{https://github.com/facebookresearch/ViP-MAE}.

翻訳日:2023-06-30 16:24:23 公開日:2023-06-28

# ランク縮小カルマンフィルタ : 高次元における近似動的低ランクフィルタリング

The Rank-Reduced Kalman Filter: Approximate Dynamical-Low-Rank Filtering In High Dimensions ( http://arxiv.org/abs/2306.07774v2 )

ライセンス: Link先を確認

Jonathan Schmidt, Philipp Hennig, J\"org Nick, Filip Tronarp

(参考訳) 高次元力学系の文脈における推論とシミュレーションは、計算的に難しい問題のままである。いくつかの次元還元は、問題を一般に引き出すのに必要である。本稿では,共分散行列の低ランク近似を伝播する新しい近似ガウスフィルタ・平滑化法を提案する。これは、予測ステップに関連するリアプノフ方程式を低ランク行列の多様体に投影し、最近開発された数値的に安定な動的低ランク積分器によって解かれる。一方、共分散更新は共分散行列の列空間のみを変換し、構成によりランクが低いことを指摘して、更新ステップを扱いやすくする。このアルゴリズムは、共分散行列の低ランク近似が確率的ではなく決定論的であるという点において、既存のアンサンブルに基づくアプローチと差別化する。これにより、低ランク次元が問題の真の次元に近づくにつれて、正確なカルマンフィルタを再現することができる。本手法は,(カルマンフィルタの場合)立方体から最悪の場合の状態空間サイズにおける \emph{quadratic} までの計算複雑性を低減し,状態空間モデルが一定の条件を満たす場合に \emph{linear} の複雑性を実現する。古典的データ同化と時空間回帰の一連の実験を通じて,提案手法は平均誤差と正確なカルマンフィルタに対する共変性の観点から,アンサンブルに基づく手法を一貫して上回っていることを示す。これは漸近的な計算の複雑さに関して追加のコストを伴わない。

Inference and simulation in the context of high-dimensional dynamical systems remain computationally challenging problems. Some form of dimensionality reduction is required to make the problem tractable in general. In this paper, we propose a novel approximate Gaussian filtering and smoothing method which propagates low-rank approximations of the covariance matrices. This is accomplished by projecting the Lyapunov equations associated with the prediction step to a manifold of low-rank matrices, which are then solved by a recently developed, numerically stable, dynamical low-rank integrator. Meanwhile, the update steps are made tractable by noting that the covariance update only transforms the column space of the covariance matrix, which is low-rank by construction. The algorithm differentiates itself from existing ensemble-based approaches in that the low-rank approximations of the covariance matrices are deterministic, rather than stochastic. Crucially, this enables the method to reproduce the exact Kalman filter as the low-rank dimension approaches the true dimensionality of the problem. Our method reduces computational complexity from cubic (for the Kalman filter) to \emph{quadratic} in the state-space size in the worst-case, and can achieve \emph{linear} complexity if the state-space model satisfies certain criteria. Through a set of experiments in classical data-assimilation and spatio-temporal regression, we show that the proposed method consistently outperforms the ensemble-based methods in terms of error in the mean and covariance with respect to the exact Kalman filter. This comes at no additional cost in terms of asymptotic computational complexity.

翻訳日:2023-06-30 16:24:03 公開日:2023-06-28

# GANに基づく研究

A Work Based on GAN ( http://arxiv.org/abs/2306.03538v3 )

ライセンス: Link先を確認

Honghao Fu

(参考訳) この作業は提出段階に入るので、特定の情報は一時的に隠され、タイトルも隠される。

This work will enter the submission stage, so specific information will be temporarily hidden, also hide the title.

翻訳日:2023-06-30 16:23:17 公開日:2023-06-28

# 監視量子デバイスにおける輸送と非相互性の精密記述

Exact description of transport and non-reciprocity in monitored quantum devices ( http://arxiv.org/abs/2306.16452v1 )

ライセンス: Link先を確認

Jo\~ao Ferreira, Tony Jin, Jochen Mannhart, Thierry Giamarchi, Michele Filippone

(参考訳) 本研究では, 連続モニタリング中の非相互作用性フェルミオン系について検討した。測定結果から平均すると, 系内の粒子と熱の流れの正確な公式を導出する。これらの電流は競合する弾性成分と非弾性成分を特徴とし、非自明にモニタリングの強さに依存する。モニタによる非弾性プロセスは非逆流を生じさせ、アクティブなフィードバック制御なしに測定結果から作業を抽出することができる。測定誘導電力または冷却を提供する2つの異なるモニタリングスキームで、我々の形式を説明する。 ~optimal performancesは,摂動的アプローチでは対処が難しい監視強度$\gamma$の値に対して見出される。

We study non-interacting fermionic systems undergoing continuous monitoring and driven by biased reservoirs. Averaging over the measurement outcomes, we derive exact formulas for the particle and heat flows in the system. We show that these currents feature competing elastic and inelastic components, which depend non-trivially on the monitoring strength $\gamma$. We highlight that monitor-induced inelastic processes lead to non-reciprocal currents, allowing to extract work from measurements without active feedback control. We illustrate our formalism with two distinct monitoring schemes providing measurement-induced power or cooling.~Optimal performances are found for values of the monitoring strength $\gamma$ which are hard to address with perturbative approaches.

翻訳日:2023-06-30 16:16:49 公開日:2023-06-28

# shorのアルゴリズムにおけるモジュラー乗算力学系のカオス的根元

Chaotic Roots of the Modular Multiplication Dynamical System in Shor's Algorithm ( http://arxiv.org/abs/2306.16446v1 )

ライセンス: Link先を確認

Abu Musa Patoary and Amit Vikram and Laura Shou and Victor Galitski

(参考訳) ショアのファクタリングアルゴリズムは、古典計算よりも指数関数的なスピードアップを提供すると信じられており、正確に周期的な量子モジュラー乗算作用素の周期を見つけることに依存している。この完全周期性は、モジュラー乗法作用素の古典極限が古典エルゴード階層の「最大ランダムな」ベルヌーイ準位を占有する非常にカオス的なシステムであることから、量子カオスの観点では矛盾する可積分系の特徴である。本研究では、量子力学系の観点からこの明らかなパラドックスにアプローチし、エルゴード性やカオスのシグネチャが実際にカオス系の「可積分」量子化に符号化されるかどうかを検討する。特定の場合において、ショアのモジュラー乗算作用素は量子化されたa-ベーカー写像の重ね合わせとして書くことができ、より典型的な量子カオスとエルゴード性を示す。この研究は、ショアのモジュラー乗算作用素の可積分性は、同じ写像の族における他の「カオス的」量子化の干渉に起因し、量子アルゴリズムによる積分性、エルゴード性、カオスの相互作用のより深い研究の道を開くことを示唆している。

Shor's factoring algorithm, believed to provide an exponential speedup over classical computation, relies on finding the period of an exactly periodic quantum modular multiplication operator. This exact periodicity is the hallmark of an integrable system, which is paradoxical from the viewpoint of quantum chaos, given that the classical limit of the modular multiplication operator is a highly chaotic system that occupies the "maximally random" Bernoulli level of the classical ergodic hierarchy. In this work, we approach this apparent paradox from a quantum dynamical systems viewpoint, and consider whether signatures of ergodicity and chaos may indeed be encoded in such an "integrable" quantization of a chaotic system. We show that Shor's modular multiplication operator, in specific cases, can be written as a superposition of quantized A-baker's maps exhibiting more typical signatures of quantum chaos and ergodicity. This work suggests that the integrability of Shor's modular multiplication operator may stem from the interference of other "chaotic" quantizations of the same family of maps, and paves the way for deeper studies on the interplay of integrability, ergodicity and chaos in and via quantum algorithms.

翻訳日:2023-06-30 16:16:33 公開日:2023-06-28

# モデル非依存な対話的特徴帰属による性能向上とサンプル効率向上

Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions ( http://arxiv.org/abs/2306.16431v1 )

ライセンス: Link先を確認

Joran Michiels, Maarten De Vos, Johan Suykens

(参考訳) モデルに依存しない特徴属性は、複雑なMLモデルに局所的な洞察を与えることができる。説明が正しければ、ドメインエキスパートはモデルの判断を検証し、信頼することができる。しかし、専門家の知識と矛盾する場合、関連する作業はモデルを改善するために無関係な特徴のみを補正する。本稿では,2つの一般的な説明手法(Occlusion と Shapley の値)に対して,モデルに依存しない実装を提供することにより,複雑なモデルに完全に異なる属性を強制する。特定のサンプルのセットに対して、修正された特徴属性を使用して追加の局所データを生成し、サンプルの正しい説明をするためにモデルを再トレーニングする。様々なモデルに関するシミュレーションおよび実データ実験を通じて、提案手法がモデルの性能を大幅に向上させる方法を示し、修正された説明に基づいてトレーニングデータセットを増強する。アクティブな学習環境にインタラクティブな説明を加えることで、サンプル効率が大幅に向上し、既存の説明的対話戦略よりも優れています。さらに、ドメインの専門家がモデルを改善するのに十分正しい機能属性を提供する方法についても検討する。

Model-agnostic feature attributions can provide local insights in complex ML models. If the explanation is correct, a domain expert can validate and trust the model's decision. However, if it contradicts the expert's knowledge, related work only corrects irrelevant features to improve the model. To allow for unlimited interaction, in this paper we provide model-agnostic implementations for two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model. For a particular set of samples, we use the corrected feature attributions to generate extra local data, which is used to retrain the model to have the right explanation for the samples. Through simulated and real data experiments on a variety of models we show how our proposed approach can significantly improve the model's performance only by augmenting its training dataset based on corrected explanations. Adding our interactive explanations to active learning settings increases the sample efficiency significantly and outperforms existing explanatory interactive strategies. Additionally we explore how a domain expert can provide feature attributions which are sufficiently correct to improve the model.

翻訳日:2023-06-30 16:15:51 公開日:2023-06-28

# DNA-TEQ:DNN推論のためのテンソルの適応指数量子化

DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference ( http://arxiv.org/abs/2306.16430v1 )

ライセンス: Link先を確認

Bahareh Khabbazan, Marc Riera, Antonio Gonz\'alez

(参考訳) 量子化はディープニューラルネットワーク(DNN)において、アクティベーションと重みの算術的精度、すなわちテンソルを小さくすることで、記憶と計算の複雑さを減らすために一般的に用いられる。効率的なハードウェアアーキテクチャでは、最近のDNNを組み込みシステムやモバイルデバイスに展開するために線形量子化を用いる。しかし、線形均一量子化はモデル精度の点で高い性能を犠牲にすることなく、通常8ビット未満に数値精度を下げることはできない。パフォーマンスの損失はテンソルが一様分布に従わないためである。本稿では,かなりの量のテンソルが指数分布に適合することを示す。そこで我々は,DNNテンソルを指数関数的に定量化するDNA-TEQを提案する。実験の結果,DNA-TEQの量子化ビット幅は従来の提案よりもはるかに小さく,平均圧縮比は線形INT8ベースラインよりも40%も小さく,精度の低下は無視でき,DNNを再トレーニングすることができないことがわかった。さらに、DNA-TEQは指数領域でのドット生成操作を誘導し、広く使用されているDNNのセットで平均して66%のエネルギー消費を節約する。

Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the storage and computational complexity by decreasing the arithmetical precision of activations and weights, a.k.a. tensors. Efficient hardware architectures employ linear quantization to enable the deployment of recent DNNs onto embedded systems and mobile devices. However, linear uniform quantization cannot usually reduce the numerical precision to less than 8 bits without sacrificing high performance in terms of model accuracy. The performance loss is due to the fact that tensors do not follow uniform distributions. In this paper, we show that a significant amount of tensors fit into an exponential distribution. Then, we propose DNA-TEQ to exponentially quantize DNN tensors with an adaptive scheme that achieves the best trade-off between numerical precision and accuracy loss. The experimental results show that DNA-TEQ provides a much lower quantization bit-width compared to previous proposals, resulting in an average compression ratio of 40% over the linear INT8 baseline, with negligible accuracy loss and without retraining the DNNs. Besides, DNA-TEQ leads the way in performing dot-product operations in the exponential domain, which saves 66% of energy consumption on average for a set of widely used DNNs.

翻訳日:2023-06-30 16:15:21 公開日:2023-06-28

# 低ランクテンソル分解による複素数値適応系同定

Complex-valued Adaptive System Identification via Low-Rank Tensor Decomposition ( http://arxiv.org/abs/2306.16428v1 )

ライセンス: Link先を確認

Oliver Ploder, Christina Auer, Oliver Lang, Thomas Paireder, Mario Huemer

(参考訳) 機械学習(ML)とテンソルベースの手法は、ここ数十年、科学コミュニティにとって大きな関心を集めてきた。以前の研究で、我々はテンソルのみのアーキテクチャの計算負荷を軽減しつつ、非常に優れた性能を達成できる新しいテンソルベースのシステム識別フレームワークを提示した。しかし、導出手法は実数値問題のみを処理できるため、複雑な数値システムを扱う広範囲の信号処理や通信問題に直接適用できない。そこで本研究では,複雑な数値信号の処理を可能にする2つの新しいアーキテクチャを導出し,これらの拡張が,計算資源のオーバーヘッドをわずかに必要とせず,複雑な数値信号の処理が可能なことを示す。

Machine learning (ML) and tensor-based methods have been of significant interest for the scientific community for the last few decades. In a previous work we presented a novel tensor-based system identification framework to ease the computational burden of tensor-only architectures while still being able to achieve exceptionally good performance. However, the derived approach only allows to process real-valued problems and is therefore not directly applicable on a wide range of signal processing and communications problems, which often deal with complex-valued systems. In this work we therefore derive two new architectures to allow the processing of complex-valued signals, and show that these extensions are able to surpass the trivial, complex-valued extension of the original architecture in terms of performance, while only requiring a slight overhead in computational resources to allow for complex-valued operations.

翻訳日:2023-06-30 16:14:59 公開日:2023-06-28

# カラーコードデコーダによる表面符号故障の最小化

Minimising surface-code failures using a color-code decoder ( http://arxiv.org/abs/2306.16476v1 )

ライセンス: Link先を確認

Asmae Benhemou, Kaavya Sahay, Lingling Lao, Benjamin J. Brown

(参考訳) 実用的で高性能な復号アルゴリズムの開発は、フォールトトレラント量子コンピューティングのリソースコストを削減する。本稿では、偏極雑音モデルによって生じる誤差に対する低重補正演算子を求める表面符号のデコーダを提案する。このデコーダは、表面符号のシンドロームをカラーコードのシンドロームにマッピングすることにより、より洗練されたカラーコードデコーダアルゴリズムを採用することができる。解析的な議論と徹底的なテストにより、結果として生じるデコーダは、コード距離$d$でさえも、すべてのウェイト$d/2$デポーラライズエラーに対して最小の重み付け補正を見つけることができる。これにより、指数係数$O(2^{d/2})$による論理誤差率をビットフリップとデフォーカスエラーを別々に扱うデコーダと比較して改善する。この改善を解析的考察と低い誤差率での数値シミュレーションを用いて実証する。また,従来のカラーコード復号法と比較して,色コードに影響を及ぼす独立かつ同一分散ビットフリップ誤差を補正するために用いたデコーダの論理誤り率を指数関数的に改善することを示す。

The development of practical, high-performance decoding algorithms reduces the resource cost of fault-tolerant quantum computing. Here we propose a decoder for the surface code that finds low-weight correction operators for errors produced by the depolarising noise model. The decoder is obtained by mapping the syndrome of the surface code onto that of the color code, thereby allowing us to adopt more sophisticated color-code decoding algorithms. Analytical arguments and exhaustive testing show that the resulting decoder can find a least-weight correction for all weight $d/2$ depolarising errors for even code distance $d$. This improves the logical error rate by an exponential factor $O(2^{d/2})$ compared with decoders that treat bit-flip and dephasing errors separately. We demonstrate this improvement with analytical arguments and supporting numerical simulations at low error rates. Of independent interest, we also demonstrate an exponential improvement in logical error rate for our decoder used to correct independent and identically distributed bit-flip errors affecting the color code compared with more conventional color-code decoding algorithms.

翻訳日:2023-06-30 16:06:05 公開日:2023-06-28

# All-in-fiber動的軌道角運動量モードソート

All-in-fiber dynamic orbital angular momentum mode sorting ( http://arxiv.org/abs/2306.16472v1 )

ライセンス: Link先を確認

Alvaro Alarc\'on, Santiago G\'omez, Daniel Spegel-Lexne, Joakim Argillander, Jaime Cari\~ne, Gustavo Ca\~nas, Gustavo Lima, Guilherme B. Xavier

(参考訳) 光の自由度の軌道角運動量(OAM)は、電気通信、量子情報、光に基づくマイクロマニピュレーションなど、多くの応用で広く研究されている。異なる横空間モードを分離し区別する能力はモードソートまたはモード分割と呼ばれ、そのようなアプリケーションで符号化された情報を復元することが不可欠である。理想的な$d$モードソーターは、異なる$d$空間モードの区別を忠実に行うことができ、損失を最小限に抑え、$d$出力を持ち、応答時間が速い。従来の全てのモードソータは、空間光変調器などのバルク光学素子に依存しており、光ファイバーシステムと統合される場合、迅速に調整することはできず、さらに損失が生じる。本稿では,超高速動的再構成性を有するoamモードソートにおける最初のオールインファイバー方式を提案する。提案手法は,まず光線形偏光(LP)モードでOAMモードを分解し,次に干渉的に再結合してトポロジカルチャージを決定し,OAMモードを正しくソートする。さらに、oamモードの超高速ルーティングを行うためにも、セットアップを利用できます。これらの結果は、古典的および量子的情報処理における多くの新しい用途に容易に利用できる新しい光空間モードソート法である。

The orbital angular momentum (OAM) spatial degree of freedom of light has been widely explored in many applications, including telecommunications, quantum information and light-based micro-manipulation. The ability to separate and distinguish between the different transverse spatial modes is called mode sorting or mode demultiplexing, and it is essential to recover the encoded information in such applications. An ideal $d$ mode sorter should be able to faithfully distinguish between the different $d$ spatial modes, with minimal losses, have $d$ outputs, and have fast response times. All previous mode sorters rely on bulk optical elements such as spatial light modulators, which cannot be quickly tuned and have additional losses if they are to be integrated with optical fiber systems. Here we propose and experimentally demonstrate, to the best of our knowledge, the first all-in-fiber method for OAM mode sorting with ultra-fast dynamic reconfigurability. Our scheme first decomposes the OAM mode in fiber-optical linearly polarized (LP) modes, and then interferometrically recombines them to determine the topological charge, thus correctly sorting the OAM mode. In addition, our setup can also be used to perform ultra-fast routing of the OAM modes. These results show a novel and fiber integrated form of optical spatial mode sorting that can be readily used for many new applications in classical and quantum information processing.

翻訳日:2023-06-30 16:05:50 公開日:2023-06-28

# 地球と宇宙における量子センサによるボセノバの検出

Detection of Bosenovae with Quantum Sensors on Earth and in Space ( http://arxiv.org/abs/2306.16468v1 )

ライセンス: Link先を確認

Jason Arakawa, Joshua Eby, Marianna S. Safronova, Volodymyr Takhistov, and Muhammad H. Zaheer

(参考訳) 幅広い理論のクラスにおいて、質量 10^{-22}~\textrm{eV} < m_{\phi} < 1~\textrm{eV}$ の超光暗黒物質(ULDM)の蓄積は、ボソン星として知られる長寿命の境界状態を形成する。 ULDMが自己相互作用を示すと、ボセノヴァ爆発で崩壊するボソン星から相対論的ボソンによって運ばれる膨大なエネルギーが放出される。我々は、光子、電子、グルーオンに結合したULDMを含むスカラー粒子の放射された相対論的バーストの過渡的なシグネチャを検出するための地球外および宇宙ベースの実験の可能性を探り、幅広い動機付けられた理論を捉えた。緩和型ULDMのシナリオとして、核時計や宇宙ベースの干渉計などの今後の実験や技術が、過渡的なボセノバ現象のシグネチャを検出することによって、ULDM結合質量パラメータ空間のオーダーを感度よく探索できることを実証する。解析は相対論的スカラー粒子放出の異なるシナリオに容易に拡張できる。

In a broad class of theories, the accumulation of ultralight dark matter (ULDM) with particles of mass $10^{-22}~\textrm{eV} < m_{\phi} < 1~\textrm{eV}$ leads the to formation of long-lived bound states known as boson stars. When the ULDM exhibits self-interactions, prodigious bursts of energy carried by relativistic bosons are released from collapsing boson stars in bosenova explosions. We extensively explore the potential reach of terrestrial and space-based experiments for detecting transient signatures of emitted relativistic bursts of scalar particles, including ULDM coupled to photons, electrons, and gluons, capturing a wide range of motivated theories. For the scenario of relaxion ULDM, we demonstrate that upcoming experiments and technology such as nuclear clocks as well as space-based interferometers will be able to sensitively probe orders of magnitude in the ULDM coupling-mass parameter space, challenging to study otherwise, by detecting signatures of transient bosenova events. Our analysis can be readily extended to different scenarios of relativistic scalar particle emission.

翻訳日:2023-06-30 16:05:23 公開日:2023-06-28

# 平均場レベルでの光学格子中のキラルスピン液体相

Chiral spin liquid phase in an optical lattice at mean-field level ( http://arxiv.org/abs/2306.16466v1 )

ライセンス: Link先を確認

Jian Yang and Xiong-Jun Liu

(参考訳) 我々は,スレーブ・ローター理論とスピノン平均場理論に基づく低温原子のカイラルスピン液体(csl)相を示すために,{\mathrm{u}(1)$合成ゲージフラックスを持つ光学ラマン正方形格子について検討した。ラマンポテンシャルによって生成される有効U($1$)ゲージ束は、CSL相の実現に重要な役割を果たしている。スレーブロータ技術を用いることで、中間のFermi Hubbard相互作用レギュレーションでCSL位相を求める。強い相互作用系では、4つのスピン相互作用を含む効果的なスピンモデルを導出する。スピノン平均場解析により,強磁気フラストレーションの場合,CSL相は安定していることが示された。 2つの平均場近似法は、一貫した位相図を与え、CSL位相の定性的数値的な証拠を与える。

We study an optical Raman square lattice with $\mathrm{U}(1)$ synthetic gauge flux to show chiral spin liquid (CSL) phase for cold atoms based on slave-rotor theory and spinon mean-field theory, respectively. An effective U($1$) gauge flux generated by Raman potentials plays a major role in realizing the CSL phase. By using slave-rotor techniques we find CSL phase at intermediate on-site Fermi Hubbard interacting regime. For the strong interacting regime we derive an effective spin model including up to the four spin interactions. By spinon mean-field analysis it is shown that CSL phase is stabilized in the case of strong magnetic frustration. The two mean-field approximation methods give consistent phase diagrams and provide qualitative numerical evidence of the CSL phase.

翻訳日:2023-06-30 16:05:02 公開日:2023-06-28

# フロッケ絶縁体と格子フェルミオン

Floquet insulators and lattice fermions ( http://arxiv.org/abs/2306.16463v1 )

ライセンス: Link先を確認

Thomas Iadecola, Srimoyee Sen, Lars Sivertsen

(参考訳) フロッケ絶縁体は周期的に駆動される量子システムであり、ドライブパラメータの関数として新しい位相位相をホストすることができる。これらの新しい相は離散時間格子フェルミオン理論のフェルミオン二重化を思わせる特徴を持っている。この提案は、ある駆動パラメータに対する非相互作用(1+1)D Floquet 絶縁体のスペクトルを時間非依存ハミルトニアンによる離散時間格子フェルミオン理論のスペクトルにマッピングすることで具体化する。結果として得られるハミルトニアンは、ストロボスコープダイナミクスを生成するフロケットハミルトニアンとは異なる。離散時間Su-Schrieffer-Heegerモデルと原モデルの空間的位置の半数、あるいは空間的位置の4分の1の(1+1)D Wilson-Dirac理論の形式をとることができる。

Floquet insulators are periodically driven quantum systems that can host novel topological phases as a function of the drive parameters. These new phases exhibit features reminiscent of fermion doubling in discrete-time lattice fermion theories. We make this suggestion concrete by mapping the spectrum of a noninteracting (1+1)D Floquet insulator for certain drive parameters onto that of a discrete-time lattice fermion theory with a time-independent Hamiltonian. The resulting Hamiltonian is distinct from the Floquet Hamiltonian that generates stroboscopic dynamics. It can take the form of a discrete-time Su-Schrieffer-Heeger model with half the number of spatial sites of the original model, or of a (1+1)D Wilson-Dirac theory with one quarter of the spatial sites.

翻訳日:2023-06-30 16:04:46 公開日:2023-06-28

# 非局所量子計算と情報理論暗号

Relating non-local quantum computation to information theoretic cryptography ( http://arxiv.org/abs/2306.16462v1 )

ライセンス: Link先を確認

Rene Allerstorfer, Harry Buhrman, Alex May, Florian Speelman, Philip Verduyn Lunel

(参考訳) 非局所量子計算(NLQC)は位置検証スキームの不正な方法であり、AdS/CFT対応の文脈に現れている。ここでは、nlqcを情報理論的な暗号のより広い文脈に結びつけ、他の多くの暗号プリミティブに関連付ける。 f$-routingとして知られるnlqcの特別な場合の一つは、cdsプリミティブの条件付き開示の量子アナログ(英語版)(quantum analogue of the conditional disclosure of secrets)に相当する。さらに,コヒーレント関数評価(CFE)と呼ばれる位置検証の特殊な事例についても検討し,CFEプロトコルがプライベート同時メッセージパッシング(PSM)シナリオに対して同様の効率的なプロトコルを誘導することを示す。これらの暗号プリミティブに位置検証を関連付けることで、暗号文学における多くの結果はNLQCに新しい意味を与え、その逆も与える。これには、最悪の場合のコストが$f$-routing of $2^{O(\sqrt{n\log n})}$ entanglement(英語版)の最初の部分指数上界、外部にあると思われる問題に対する効率的な$f$-routing(英語版)戦略の最初の例、量子設定におけるCDSの絡み合いの線形下界、CFEの通信コストの線形下界、低T$の量子回路で計算できる関数の量子設定におけるCDSの効率的なプロトコルが含まれる。

Non-local quantum computation (NLQC) is a cheating strategy for position-verification schemes, and has appeared in the context of the AdS/CFT correspondence. Here, we connect NLQC to the wider context of information theoretic cryptography by relating it to a number of other cryptographic primitives. We show one special case of NLQC, known as $f$-routing, is equivalent to the quantum analogue of the conditional disclosure of secrets (CDS) primitive, where by equivalent we mean that a protocol for one task gives a protocol for the other with only small overhead in resource costs. We further consider another special case of position verification, which we call coherent function evaluation (CFE), and show CFE protocols induce similarly efficient protocols for the private simultaneous message passing (PSM) scenario. By relating position-verification to these cryptographic primitives, a number of results in the cryptography literature give new implications for NLQC, and vice versa. These include the first sub-exponential upper bounds on the worst case cost of $f$-routing of $2^{O(\sqrt{n\log n})}$ entanglement, the first example of an efficient $f$-routing strategy for a problem believed to be outside $P/poly$, linear lower bounds on entanglement for CDS in the quantum setting, linear lower bounds on communication cost of CFE, and efficient protocols for CDS in the quantum setting for functions that can be computed with quantum circuits of low $T$ depth.

翻訳日:2023-06-30 16:04:33 公開日:2023-06-28

# ヤコビ法によるフェルミの黄金律を超えて

Beyond Fermi's golden rule with the Jacobi method ( http://arxiv.org/abs/2306.16457v1 )

ライセンス: Link先を確認

David M. Long, Dominik Hahn, Marin Bukov, Anushya Chandran

(参考訳) 量子力学における多くの問題は、単一量子状態の連続体への崩壊として考えられる。時間に依存する初期状態の重なりは忠実性と呼ばれ、この崩壊を特徴づける。エルゴード・ハミルトニアンへのクエンチ後の忠実性の解析的表現を導出する。この表現は弱クエンチェと強クエンチェの両方で有効であり、ヒルベルト空間の有限性以前の時間スケールは忠実性を制限する。初期の二次的崩壊と漸近的指数的崩壊を再現し、強いクエンチェではフェルミの黄金律とは異なる速度で再現する。この分析はジャコビ法(Jacobi method)に依存しており、これはもともとほぼ局所的な系に応用され、ここではよく熱化された系に適応する。この結果は,ジャコビ法が量子力学の異なる状態において予測可能であることを示す。

Many problems in quantum dynamics can be cast as the decay of a single quantum state into a continuum. The time dependent overlap with the initial state, called the fidelity, characterizes this decay. We derive an analytic expression for the fidelity after a quench to an ergodic Hamiltonian. The expression is valid for both weak and strong quenches, and timescales before finiteness of the Hilbert space limits the fidelity. It reproduces initial quadratic decay and asymptotic exponential decay with a rate which, for strong quenches, differs from Fermi's golden rule. The analysis relies on the Jacobi method, which was originally applied in nearly localized systems, and which we here adapt to well-thermalizing systems. Our results demonstrate that the Jacobi method is predictive in disparate regimes of quantum dynamics.

翻訳日:2023-06-30 16:04:00 公開日:2023-06-28

# 翻訳不変行列積状態と$W$-State表現について

On Translation-Invariant Matrix Product States and $W$-State Representations ( http://arxiv.org/abs/2306.16456v1 )

ライセンス: Link先を確認

Petr Klimov, Richik Sengupta and Jacob Biamonte

(参考訳) この研究は、特定の状態のクラスに対する周期的境界条件を持つ翻訳不変行列積状態を構築する方法の開発に焦点をあてる。特に$n$-party $W$-state の結合次元表現は、現在知られている方法を超えている。さらに、最小限の結合次元の推定を改善する可能性も考慮し、$d(\psi)$と表記する。決定論的アルゴリズムは、$d(\psi)$ を見つけるために構築され、また、$d(\psi)$ の決定とその性質の理解に関連する様々な問題を探索する。特に、W状態に対して、結合次元 $ \left\lfloor \frac{n}{2} \right\rfloor +1 の TI-MPS 表現を構築する。さらに、我々は、結合次元を$nに下げることができる多数の状態を証明する。 $

This work focuses on developing methods to construct translation-invariant matrix product states with periodic boundary conditions for specific classes of states. We notably consider the bond dimension representations for the $n$-party $W$-state, surpassing currently known methods. Additionally, we consider possibilities for improving estimates of the minimal possible bond dimension, denoted as $d(\psi)$. A deterministic algorithm is constructed to discover $d(\psi)$, and we also explore various issues associated with determining $d(\psi)$ and understanding its properties. In particular, we construct for W-state an TI-MPS representation of bond dimension $ \left\lfloor \frac{n}{2} \right\rfloor +1. $ Moreover, we prove a large class of states we can reduce the bond-dimension to $n.$

翻訳日:2023-06-30 16:03:46 公開日:2023-06-28

# 監視アンラベリングによるノイズ浅部回路の効率的なサンプリング

Efficient sampling of noisy shallow circuits via monitored unraveling ( http://arxiv.org/abs/2306.16455v1 )

ライセンス: Link先を確認

Zihan Cheng and Matteo Ippoliti

(参考訳) 本研究では,2次元量子ビットアレイ上の浅くノイズの多いランダム回路の出力をサンプリングする古典的なアルゴリズムを提案する。このアルゴリズムは、最近発表された '`space-evolving block decimation'' (SEBD) に基づいて構築され、ノイズ回路の場合に拡張する。 SEBD は、2次元のユニタリ回路を 1D {\displaystyle {\it monitored} にマッピングしたもので、単位ゲートとともに測定を特徴付け、測定誘起絡み合い相転移の存在を利用して有限臨界深さ$T_c$以下の効率的な(近似的な)サンプリングを実現する。我々のノイズ-SEBDアルゴリズムは、ノイズを計測し、さらに絡み合いを減らし、より広い回路深さまで効率的な古典的なサンプリングを可能にする。物理関連ノイズモデル(ユニタリキュービットチャネル)のクラスを2レプリカ統計力学処理で解析し、弱い測定値が最適(つまり最も遠ざかる)アンラベリング(unraveling)であることを示す。次に,実回路モデルにおける回路深さと雑音強度の関数として,ノイズ-sebd複雑性遷移を求める。実例として、IBM QuantumプロセッサをベースとしたCNOTあたりのノイズレート$\approx 2\%の重六角形量子ビットアレイ上の回路を、5iSWAP(または10CNOT)ゲート層まで効率的にサンプリング可能であることを示す。本結果は,ノイズの多いハードウェアのシミュレーションの実用的硬度要件の明確化に有効である。

We introduce a classical algorithm for sampling the output of shallow, noisy random circuits on two-dimensional qubit arrays. The algorithm builds on the recently-proposed ``space-evolving block decimation'' (SEBD) and extends it to the case of noisy circuits. SEBD is based on a mapping of 2D unitary circuits to 1D {\it monitored} ones, which feature measurements alongside unitary gates; it exploits the presence of a measurement-induced entanglement phase transition to achieve efficient (approximate) sampling below a finite critical depth $T_c$. Our noisy-SEBD algorithm unravels the action of noise into measurements, further lowering entanglement and enabling efficient classical sampling up to larger circuit depths. We analyze a class of physically-relevant noise models (unital qubit channels) within a two-replica statistical mechanics treatment, finding weak measurements to be the optimal (i.e. most disentangling) unraveling. We then locate the noisy-SEBD complexity transition as a function of circuit depth and noise strength in realistic circuit models. As an illustrative example, we show that circuits on heavy-hexagon qubit arrays with noise rates of $\approx 2\%$ per CNOT, based on IBM Quantum processors, can be efficiently sampled up to a depth of 5 iSWAP (or 10 CNOT) gate layers. Our results help sharpen the requirements for practical hardness of simulation of noisy hardware.

翻訳日:2023-06-30 16:03:32 公開日:2023-06-28

# デュアルレール量子ネットワークにおけるプログラマブルマルチビット絡み合いの自律的分布

Autonomous distribution of programmable multi-qubit entanglement in a dual-rail quantum network ( http://arxiv.org/abs/2306.16453v1 )

ライセンス: Link先を確認

Joan Agust\'i, Xin H. H. Zhang, Yuri Minoguchi, Peter Rabl

(参考訳) デュアルレール導波路QEDセットアップにおいて、空間分布多ビット絡み合った状態を作成するためのスケーラブルで完全自律的なスキームを提案し、解析する。このアプローチでは、2つの分離導波路に沿って位置する量子ビットの配列は、非退化パラメトリック増幅器の出力からの相関光子によって照らされる。これらの光子は、クビットを、局所的クビット光子デチューニングのパターンによって、多重粒子の絡み合いの程度を便利に調整できるような、純粋に絡み合った定常状態の異なるクラスに駆動する。中規模ネットワークの数値シミュレーションにより、複雑なマルチ量子ビット状態の合成時間はシステムサイズと最も線形に増加し、大きな増幅帯域幅の限界でさらなる高速化の恩恵を受ける可能性があることが示されている。したがって、このスキームは、正確なパルス制御を必要とせず、単一のガウスの絡み合い源のみに依存することなく、大きな量子ネットワークで使える多部絡み合い状態を分散するための興味深い新しいルートを提供する。

We propose and analyze a scalable and fully autonomous scheme for preparing spatially distributed multi-qubit entangled states in a dual-rail waveguide QED setup. In this approach, arrays of qubits located along two separated waveguides are illuminated by correlated photons from the output of a non-degenerate parametric amplifier. These photons drive the qubits into different classes of pure entangled steady states, for which the degree of multipartite entanglement can be conveniently adjusted by the chosen pattern of local qubit-photon detunings. Numerical simulations for moderate-sized networks show that the preparation time for these complex multi-qubit states increases at most linearly with the system size and that one may benefit from an additional speedup in the limit of a large amplifier bandwidth. Therefore, this scheme offers an intriguing new route for distributing ready-to-use multipartite entangled states across large quantum networks, without requiring any precise pulse control and relying on a single Gaussian entanglement source only.

翻訳日:2023-06-30 16:03:05 公開日:2023-06-28

# HNO:PDEを解くハイエナニューラル演算子

HNO: Hyena Neural Operator for solving PDEs ( http://arxiv.org/abs/2306.16524v1 )

ライセンス: Link先を確認

Saurabh Patil, Zijie Li, Amir Barati Farimani

(参考訳) 偏微分方程式(PDE)の数値解法は通常、計算コストのかかる時空間スケールを解くために細かな離散化を必要とする。近年のディープラーニングの進歩は、ニューラル演算子の使用を含むPDEの解決に新たなアプローチをもたらした。ニューラルネットワークは、関数空間間のマッピングを学び、データに基づいて偏微分方程式を解く能力を持つニューラルネットワークアーキテクチャである。本研究は,多層パーセプトロンによりパラメータ化される長い畳み込みフィルタを用いた,ハイエナと呼ばれるニューラル演算子を用いる。ハイエナ作用素(hyena operator)は、大域的な受容場を楽しむ長い畳み込みをパラメータ化するために、準二次複雑性と状態空間モデルを楽しむ演算である。このメカニズムは入力のコンテキストに対するモデルの理解を高め、異なるPDEインスタンスに対するデータ依存重みを可能にする。 PDEの解法における層の効果を測定するため,バーガー方程式とナビエ・ストークス方程式の実験を行った。以上の結果から,Hyena Neural operatorはPDEの解演算子を学習する上で,効率的かつ正確なモデルとして機能することが示唆された。使用したデータとコードは、https://github.com/Saupatil07/Hyena-Neural-Operator.comで見ることができる。

Numerically solving partial differential equations (PDEs) typically requires fine discretization to resolve necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving PDEs that involves the use of neural operators. Neural operators are neural network architectures that learn mappings between function spaces and have the capability to solve partial differential equations based on data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter that is parameterized by a multilayer perceptron. The Hyena operator is an operation that enjoys sub-quadratic complexity and state space model to parameterize long convolution that enjoys global receptive field. This mechanism enhances the model's comprehension of the input's context and enables data-dependent weight for different PDE instances. To measure how effective the layers are in solving PDEs, we conduct experiments on Burger's equation and Navier Stokes equation. Our findings indicate Hyena Neural operator can serve as an efficient and accurate model for learning PDEs' solution operator. The data and code used can be found at: https://github.com/Saupatil07/Hyena-Neural-Operator

翻訳日:2023-06-30 15:57:45 公開日:2023-06-28

# カーネルレンジスペースの場合、定数のクエリは十分である

For Kernel Range Spaces a Constant Number of Queries Are Sufficient ( http://arxiv.org/abs/2306.16516v1 )

ライセンス: Link先を確認

Jeff M. Phillips and Hasan Pourmahmood-Aghababa

(参考訳) 我々は、カーネル範囲空間に対する$\varepsilon$-coverの概念を導入する。カーネル範囲空間は、点の集合 $X \subset \mathbb{R}^d$ と、固定されたカーネルによる全てのクエリの空間(例えば、ガウス核 $K(p,\cdot) = \exp(-\|p-\cdot\|^2)$)に関する。点集合 $X$ of size $n$ に対して、クエリは値のベクトル $R_p \in \mathbb{R}^n$ を返し、$i$th 座標 $(R_p)_i = K(p,x_i)$ for $x_i \in X$ が返される。 Q \subset \mathbb{R}^d$ は任意の$p \in \mathbb{R}^d$ に対して、$\frac{1}{n} \|R_pR_q\|_1\leq \varepsilon$ の集合である。これは、組合せ範囲空間に対するハウスラーの$\varepsilon$-covers(例えば、ボールクエリ内の点の部分集合で定義される)の概念の滑らかな類似であり、結果として得られるベクトル$R_p$は$[0,1]^n$の代わりに$\{0,1\}^n$である。これらの範囲空間のカーネルバージョンは、座標が不確かで不正確であるかもしれないデータ解析タスクに現れ、従ってクエリ範囲内外の概念に柔軟性を加えることを望んでいる。私たちの主な結果は、組合せ範囲空間とは異なり、カーネル $\varepsilon$-covers のサイズは入力サイズ $n$ と次元 $d$ に依存しないということです。ここでは、$(1/\varepsilon)^{\tilde O(1/\varepsilon^2)}$, $\tilde{O}(f(1/\varepsilon))$は、カーネルに依存することができる$(1/\varepsilon)$でログ係数を隠す。これは、範囲クエリにおける境界の概念を緩和することで、最終的には次元の呪いが消え、非常に高次元の機械学習の成功を説明するのに役立つことを意味する。また、この結果を約$(1/\varepsilon)^{\Omega(1/\varepsilon)$で補い、$/\varepsilon$への指数的な依存が必要とされることを示す。

We introduce the notion of an $\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \subset \mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\cdot) = \exp(-\|p-\cdot\|^2)$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \in \mathbb{R}^n$, where the $i$th coordinate $(R_p)_i = K(p,x_i)$ for $x_i \in X$. An $\varepsilon$-cover is a subset of points $Q \subset \mathbb{R}^d$ so for any $p \in \mathbb{R}^d$ that $\frac{1}{n} \|R_p - R_q\|_1\leq \varepsilon$ for some $q \in Q$. This is a smooth analog of Haussler's notion of $\varepsilon$-covers for combinatorial range spaces (e.g., defined by subsets of points within a ball query) where the resulting vectors $R_p$ are in $\{0,1\}^n$ instead of $[0,1]^n$. The kernel versions of these range spaces show up in data analysis tasks where the coordinates may be uncertain or imprecise, and hence one wishes to add some flexibility in the notion of inside and outside of a query range. Our main result is that, unlike combinatorial range spaces, the size of kernel $\varepsilon$-covers is independent of the input size $n$ and dimension $d$. We obtain a bound of $(1/\varepsilon)^{\tilde O(1/\varepsilon^2)}$, where $\tilde{O}(f(1/\varepsilon))$ hides log factors in $(1/\varepsilon)$ that can depend on the kernel. This implies that by relaxing the notion of boundaries in range queries, eventually the curse of dimensionality disappears, and may help explain the success of machine learning in very high-dimensions. We also complement this result with a lower bound of almost $(1/\varepsilon)^{\Omega(1/\varepsilon)}$, showing the exponential dependence on $1/\varepsilon$ is necessary.

翻訳日:2023-06-30 15:57:24 公開日:2023-06-28

# 自動バイアス曲線の曲げ - 国家安全保障における人間とAIによる意思決定に関する研究

Bending the Automation Bias Curve: A Study of Human and AI-based Decision Making in National Security Contexts ( http://arxiv.org/abs/2306.16507v1 )

ライセンス: Link先を確認

Michael C. Horowitz, Lauren Kahn

(参考訳) ai(artificial intelligence, ai)の利用は、特に機械学習のアプローチによって、世界中のセクターや社会で増加している。 AIの採用は、特に国際セキュリティ分野において、どのように進むのか? 自動化バイアスの研究は、AIにおいて人間が過信される可能性があることを示唆する一方、アルゴリズムの逆転の研究は、決定の利害が高まるにつれて、人間がアルゴリズムを信頼することに対してより慎重になることを示している。我々は、AIに関する背景知識とAIに対する信頼の関係、そしてこれらが国際セキュリティ文脈における自動化バイアスの確率に影響を与える他の要因とどのように相互作用するかを理論化する。我々は、AI産業のレベルが異なる9カ国の9000人の成人の代表例を対象に、事前登録されたタスク識別実験でテストを行った。結果は、特にAIの背景知識に関する理論を強く支持する。ダニング・クルーガー効果(dunning kruger effect)の1バージョンは、aiを使った経験が最低レベルである人は、アルゴリズムが逆になる確率がわずかに高いため、自動化バイアスは、応答者のaiバックグラウンドが最高レベルに達する前に、知識の低レベルで発生する。追加の結果は、タスクの難易度、全体的なAI信頼、人間かAIの意思決定支援が非常に有能であるか、あまり有能でないと説明されるかどうかによる影響を示している。

Uses of artificial intelligence (AI), especially those powered by machine learning approaches, are growing in sectors and societies around the world. How will AI adoption proceed, especially in the international security realm? Research on automation bias suggests that humans can often be overconfident in AI, whereas research on algorithm aversion shows that, as the stakes of a decision rise, humans become more cautious about trusting algorithms. We theorize about the relationship between background knowledge about AI, trust in AI, and how these interact with other factors to influence the probability of automation bias in the international security context. We test these in a preregistered task identification experiment across a representative sample of 9000 adults in 9 countries with varying levels of AI industries. The results strongly support the theory, especially concerning AI background knowledge. A version of the Dunning Kruger effect appears to be at play, whereby those with the lowest level of experience with AI are slightly more likely to be algorithm-averse, then automation bias occurs at lower levels of knowledge before leveling off as a respondent's AI background reaches the highest levels. Additional results show effects from the task's difficulty, overall AI trust, and whether a human or AI decision aid is described as highly competent or less competent.

翻訳日:2023-06-30 15:56:37 公開日:2023-06-28

# 非IIDフェデレーション学習におけるMomentumのメリット

Momentum Benefits Non-IID Federated Learning Simply and Provably ( http://arxiv.org/abs/2306.16504v1 )

ライセンス: Link先を確認

Ziheng Cheng, Xinmeng Huang, Kun Yuan

(参考訳) フェデレーション学習は、大規模機械学習の強力なパラダイムだが、信頼性の低いネットワーク接続、遅い通信、クライアント間のデータの不均一性など、大きな課題に直面している。 FedAvgとSCAFFOLDは、これらの課題に対処する2つの基本的なアルゴリズムである。特に、FedAvgは中央サーバと通信する前に複数のローカル更新を使用するが、SCAFFOLDは各クライアントに制御変数を保持し、ローカル更新で"クライアントドリフト"を補償する。これらの2つのアルゴリズムの収束性を高めるために、文献で様々な方法が提案されているが、アルゴリズム構造に対する非現実的な調整を行うか、境界データの不均一性の仮定に依存する。本稿では,FedAvgとSCAFFOLDの性能向上のための運動量の利用について検討する。すべてのクライアントがトレーニングプロセスに参加すると、momentumを組み込むことで、一定の局所学習率を使用しても、境界データの不均一性の仮定に頼らずにfedavgを収束させることができることを実証する。 fedavgの既存の分析では、局所学習率の低下にもかかわらず、境界データの不均一性が必要となるため、これは新しい結果である。部分的な顧客参加の場合、momentumは追加の前提を課すことなく、足場が確実に速く収束できることを示す。さらに,FedAvg と SCAFFOLD の新たな分散還元拡張を開発するために運動量を用いて,最先端の収束率を示す。実験結果はすべての理論的結果を支持する。

Federated learning is a powerful paradigm for large-scale machine learning, but it faces significant challenges due to unreliable network connections, slow communication, and substantial data heterogeneity across clients. FedAvg and SCAFFOLD are two fundamental algorithms to address these challenges. In particular, FedAvg employs multiple local updates before communicating with a central server, while SCAFFOLD maintains a control variable on each client to compensate for "client drift" in its local updates. Various methods have been proposed in literature to enhance the convergence of these two algorithms, but they either make impractical adjustments to algorithmic structure, or rely on the assumption of bounded data heterogeneity. This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD. When all clients participate in the training process, we demonstrate that incorporating momentum allows FedAvg to converge without relying on the assumption of bounded data heterogeneity even using a constant local learning rate. This is a novel result since existing analyses for FedAvg require bounded data heterogeneity even with diminishing local learning rates. In the case of partial client participation, we show that momentum enables SCAFFOLD to converge provably faster without imposing any additional assumptions. Furthermore, we use momentum to develop new variance-reduced extensions of FedAvg and SCAFFOLD, which exhibit state-of-the-art convergence rates. Our experimental results support all theoretical findings.

翻訳日:2023-06-30 15:56:13 公開日:2023-06-28

# SARC:ソフトアクターの反省的批判

SARC: Soft Actor Retrospective Critic ( http://arxiv.org/abs/2306.16503v1 )

ライセンス: Link先を確認

Sukriti Verma, Ayush Chopra, Jayakumar Subramanian, Mausoom Sarkar, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy

(参考訳) 俳優-批判的アルゴリズムであるsacの2倍スケールの性質は、批評家の見積もりが俳優に対して常に収束していないという事実によって特徴づけられるが、批評家は俳優よりも速く学習するので、両者の一貫性が保証される。様々な戦略が文献に導入され、より良い収束を達成するためにより良い勾配推定を学ぶ。グラデーション推定は批評家に依存するため,レビュアーの改善によって,各時点における俳優のグラデーション推定が向上する可能性が示唆される。これを利用することで、SAC批評家の損失を新たな損失期間的損失で増大させ、批評家の収束を早め、その結果、アクターの政策勾配推定をより良くするソフトアクターレトロスペクティブ批評(SARC)を提案する。既存のSACの実装は最小限の変更で簡単にSARCに適応できる。本研究では,SARCがベンチマーク環境におけるSACよりも一貫した改善を提供することを示す。我々は、コードとすべての実験データを、https://github.com/sukritiverma 1996/SARCでオープンソース化する予定です。

The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures eventual consistency between the two. Various strategies have been introduced in literature to learn better gradient estimates to help achieve better convergence. Since gradient estimates depend upon the critic, we posit that improving the critic can provide a better gradient estimate for the actor at each time. Utilizing this, we propose Soft Actor Retrospective Critic (SARC), where we augment the SAC critic loss with another loss term - retrospective loss - leading to faster critic convergence and consequently, better policy gradient estimates for the actor. An existing implementation of SAC can be easily adapted to SARC with minimal modifications. Through extensive experimentation and analysis, we show that SARC provides consistent improvement over SAC on benchmark environments. We plan to open-source the code and all experiment data at: https://github.com/sukritiverma1996/SARC.

翻訳日:2023-06-30 15:55:45 公開日:2023-06-28

# 変分不等式の確率的方法:エルゴディディティ、バイアス、リファインメント

Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements ( http://arxiv.org/abs/2306.16502v1 )

ライセンス: Link先を確認

Emmanouil-Vasileios Vlatakis-Gkaragkounis, Angeliki Giannou, Yudong Chen, Qiaomin Xie

(参考訳) 様々な機械学習タスクで発生する分極最適化と変分不等式問題 (VIP) に対して、Stochastic Extragradient (SEG) とStochastic Gradient Descent Ascent (SGDA) が最優先のアルゴリズムとして登場した。 SEG/SGDAの定常的なステップサイズ変種は、簡単なチューニングや初期条件の迅速な許容といった魅力的な利点によって人気を集めているが、それらの収束挙動は初歩的な双線形モデルにおいてもより複雑である。我々の研究は、これらのアルゴリズムに固有の確率構造を解明し、定量化する努力をしている。定数のステップサイズSEG/SGDAを時間同質マルコフ連鎖として再キャストすることにより、大数第一種法則と中心極限定理を確立し、平均イテレートが漸近正規であり、幅広いモノトンおよび非モノトンVIPに対してユニークな不変分布を持つことを示した。凸凹 min-max 最適化に特化して、Von-Neumann の値に対するステップサイズと誘導バイアスの関係を特徴づける。最後に、richardson-romberg外挿により、vipsのグローバルソリューションへの平均反復値の近接性が向上することを示す。我々の確率論的分析は、我々の理論的な発見を裏付ける実験によって支えられ、最適化、マルコフ連鎖、演算子理論からの技術を利用する。

For min-max optimization and variational inequalities problems (VIP) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgiveness of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to the Von-Neumann's value. Finally, we establish that Richardson-Romberg extrapolation can improve proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory.

翻訳日:2023-06-30 15:55:25 公開日:2023-06-28

# ソーシャルメディアストリームからのイベント検出:方法,データセット,機会

Event Detection from Social Media Stream: Methods, Datasets and Opportunities ( http://arxiv.org/abs/2306.16495v1 )

ライセンス: Link先を確認

Quanzhi Li, Yang Chao, Dong Li, Yao Lu, Chi Zhang

(参考訳) ソーシャルメディアストリームには、日々の物語から最新のグローバルおよびローカルイベントやニュースまで、多種多様な情報が含まれている。特にtwitterは、リアルタイムに発生したイベントの迅速な拡散を可能にし、個人や組織が今起きている出来事を知らせ続けることができる。ソーシャルメディアデータからのイベント検出は、従来のテキストとは異なる課題であり、近年注目を集めている研究分野である。本稿では,Twitterデータストリームのイベント検出手法を幅広く調査し,この領域における最近の展開を読者が理解できるようにする。利用可能なデータセットを公開します。さらにいくつかの研究の機会は

Social media streams contain large and diverse amount of information, ranging from daily-life stories to the latest global and local events and news. Twitter, especially, allows a fast spread of events happening real time, and enables individuals and organizations to stay informed of the events happening now. Event detection from social media data poses different challenges from traditional text and is a research area that has attracted much attention in recent years. In this paper, we survey a wide range of event detection methods for Twitter data stream, helping readers understand the recent development in this area. We present the datasets available to the public. Furthermore, a few research opportunities

翻訳日:2023-06-30 15:54:56 公開日:2023-06-28

# 独立したサブネットトレーニングの理論的理解に向けて

Towards a Better Theoretical Understanding of Independent Subnetwork Training ( http://arxiv.org/abs/2306.16484v1 )

ライセンス: Link先を確認

Egor Shulgin and Peter Richt\'arik

(参考訳) 大規模機械学習の最近の進歩は、データ並列分散コンピューティングのパラダイムなしでは不可能だろう。大規模モデルを用いた分散コンピューティングは通信チャネルに過度な圧力を与えるため、通信コスト削減を目的とした通信圧縮戦略と訓練アルゴリズムの協調設計に向けた重要な研究が進められている。純粋なデータ並列処理はデータスケーリングを向上しますが、モデルスケーリング特性の貧弱さに悩まされます。実際、計算ノードはメモリ制約によって著しく制限され、モデルサイズがさらに大きくなるのを防ぐ。このため、巨大ニューラルネットワークモデルのトレーニングにおける最新の成果も、ある種のモデル並列性に依存している。本稿では,先述の問題を解決するために最近提案されている,高度に効果的な手法である独立サブネットワークトレーニング(ist)について,より理論的に考察する。圧縮通信を用いた分散手法など,ISTと代替手法の基本的な違いを特定し,その最適化性能を2次モデル上で正確に解析する。

Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.

翻訳日:2023-06-30 15:54:47 公開日:2023-06-28

# densebam-gi:hmerのためのmomentum assisted gruを用いた注意強化denesenet

DenseBAM-GI: Attention Augmented DeneseNet with momentum aided GRU for HMER ( http://arxiv.org/abs/2306.16482v1 )

ライセンス: Link先を確認

Aniket Pal, Krishna Pratap Singh

(参考訳) 手書き数学表現(HMER)の認識は,デジタル教育や学術研究の分野において重要である。しかし,手書き数式における記号間の長さと複雑な空間関係を正確に決定することは困難である。本研究では,HMER 用の新しいエンコーダ・デコーダアーキテクチャ (DenseBAM-GI) を提案する。そこでは,エンコーダは特徴表現を改善するために Bottleneck Attention Module (BAM) を持ち,デコーダは拡張ゲート付きGated Input-GRU (GI-GRU) ユニットを持ち,長大かつ複雑な表現を容易にする。提案モデルは、表現認識率(exprate)の観点から、最先端モデルと同等のパフォーマンスを持つ効率的で軽量なアーキテクチャである。また、CROHME 2014、2016、2019データセットの上位1、2、3エラー精度も向上している。 DenseBAM-GIは、CROHME 2019データセットで、すべてのモデルの中で最高のエクスプロイトを達成する。重要なことに、これらの成功は計算の複雑さの低下とgpuメモリの必要性の低減によって達成される。

The task of recognising Handwritten Mathematical Expressions (HMER) is crucial in the fields of digital education and scholarly research. However, it is difficult to accurately determine the length and complex spatial relationships among symbols in handwritten mathematical expressions. In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER, where the encoder has a Bottleneck Attention Module (BAM) to improve feature representation and the decoder has a Gated Input-GRU (GI-GRU) unit with an extra gate to make decoding long and complex expressions easier. The proposed model is an efficient and lightweight architecture with performance equivalent to state-of-the-art models in terms of Expression Recognition Rate (exprate). It also performs better in terms of top 1, 2, and 3 error accuracy across the CROHME 2014, 2016, and 2019 datasets. DenseBAM-GI achieves the best exprate among all models on the CROHME 2019 dataset. Importantly, these successes are accomplished with a drop in the complexity of the calculation and a reduction in the need for GPU memory.

翻訳日:2023-06-30 15:54:32 公開日:2023-06-28

# 外部知識ビジュアル質問応答のための事前学習型マルチモーダルドライザー

Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering ( http://arxiv.org/abs/2306.16478v1 )

ライセンス: Link先を確認

Alireza Salemi, Mahta Rafiee, Hamed Zamani

(参考訳) 本稿では,質問への回答に外部知識へのアクセスが必要である視覚質問応答タスクのカテゴリについて検討する。このカテゴリーは外部知識視覚質問応答 (OK-VQA) と呼ばれる。 OK-VQAシステムの開発における大きなステップは、与えられたマルチモーダルクエリに関連するドキュメントを取得することである。このタスクの最先端非対称密度検索モデルは、マルチモーダルクエリエンコーダとユニモーダルドキュメントエンコーダを備えたアーキテクチャを使用する。このようなアーキテクチャは、効果的なパフォーマンスのために大量のトレーニングデータを必要とする。そこで本稿では,OK-VQAタスクの経路検索モデルの事前学習のための自動データ生成パイプラインを提案する。提案されたアプローチは、現在の最先端非対称アーキテクチャと比較して26.9%の精度@5の改善をもたらす。さらに、提案した事前学習アプローチは、ゼロショット検索シナリオにおいて優れた能力を示す。

This paper studies a category of visual question answering tasks, in which accessing external knowledge is necessary for answering the questions. This category is called outside-knowledge visual question answering (OK-VQA). A major step in developing OK-VQA systems is to retrieve relevant documents for the given multi-modal query. Current state-of-the-art asymmetric dense retrieval model for this task uses an architecture with a multi-modal query encoder and a uni-modal document encoder. Such an architecture requires a large amount of training data for effective performance. We propose an automatic data generation pipeline for pre-training passage retrieval models for OK-VQA tasks. The proposed approach leads to 26.9% Precision@5 improvements compared to the current state-of-the-art asymmetric architecture. Additionally, the proposed pre-training approach exhibits a good ability in zero-shot retrieval scenarios.

翻訳日:2023-06-30 15:54:09 公開日:2023-06-28

# フレキシブルレート双方向ビデオ圧縮のためのマルチスケール変形性アライメントとコンテンツ適応型推論

Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression ( http://arxiv.org/abs/2306.16544v1 )

ライセンス: Link先を確認

M.Ak{\i}n Y{\i}lmaz, O.Ugur Ulas, A.Murat Tekalp

(参考訳) 動画コンテンツに動き補償モデルを適用する能力の欠如は、現在のエンドツーエンドの学習ビデオ圧縮モデルの重要な制限である。本稿では、エンドツーエンドの速度歪みに最適化された階層的双方向ビデオ圧縮のための適応型モーション補償モデルを提案する。特に2つの新案を提案します一特徴レベルにおけるマルチスケールの変形可能なアライメント方式及びマルチスケール条件付き符号化二運動コンテンツ適応推論さらに,複数のレート歪み動作点で単一モデルを動作させることができるゲインユニットを採用した。また,実際のフレキシブルレート学習ビデオ符号化のために,対応するモデルを微調整することにより,符号内対双方向符号化フレーム間のビット割り当てを制御するためにゲインユニットを利用する。実験により, 学習ビデオ符号化における先行技術に比較して, 最先端の速度歪み性能を示すことができた。

The lack of ability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state-of-the-art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: i) a multi-scale deformable alignment scheme at the feature level combined with multi-scale conditional coding, ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine tuning corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate state-of-the-art rate-distortion performance exceeding those of all prior art in learned video coding.

翻訳日:2023-06-30 15:47:29 公開日:2023-06-28

# 効率的なフォトリアリスティック・ヒューマンレンダリングによる次世代拡張現実会議システムの実現

Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering ( http://arxiv.org/abs/2306.16541v1 )

ライセンス: Link先を確認

Chuanyue Shen, Letian Zhang, Zhangsihao Yang, Masood Mortazavi, Xiyun Song, Liang Peng, Heather Yu

(参考訳) オンラインミーティングが新しい標準になりつつある。オンライン会議のための没入型体験を作ることは、より多様でシームレスな環境に欠かせない。人間の3Dダイナミックスの効率的な光リアルレンダリングは没入型ミーティングの中核である。現在の一般的なアプリケーションはリアルタイム会議を実現しているが,2d空間の制限や,参加者間の現実的なインタラクションを欠いたアバターの使用など,フォトリアリスティックな人間のダイナミクスの提供には不足している。 NeRF(Neural Radiance Field)のようなニューラルレンダリングの最近の進歩は、メタバースミーティングにおいてより大きなリアリズムをもたらす可能性がある。しかし,NeRFのレンダリング速度は遅いため,リアルタイム会議が困難である。データとハードウェアの効率を向上させるために,単眼映像取得と自由視点合成を活用した,将来の拡張現実型メタバース会議システムのためのパイプラインを想定する。没入型会議体験に向けて,光現実性人間力学をより効率的にレンダリングするための,NeRFに基づく高速な自由視点合成アルゴリズムを探索する。提案アルゴリズムは,最先端手法よりも44.5%,213%高速なトレーニングを行いながら,同等のレンダリング品質を実現することを示す。我々の探索は、複雑なアプリケーションシナリオを扱えるメタバース会議システムを構築するための設計基盤を提供する。例えば、カスタマイズされたテーマによる動的シーンのリライトや、現実世界の人々を拡張世界へと調和させるマルチユーザー会議である。

Meeting online is becoming the new normal. Creating an immersive experience for online meetings is a necessity towards more diverse and seamless environments. Efficient photorealistic rendering of human 3D dynamics is the core of immersive meetings. Current popular applications achieve real-time conferencing but fall short in delivering photorealistic human dynamics, either due to limited 2D space or the use of avatars that lack realistic interactions between participants. Recent advances in neural rendering, such as the Neural Radiance Field (NeRF), offer the potential for greater realism in metaverse meetings. However, the slow rendering speed of NeRF poses challenges for real-time conferencing. We envision a pipeline for a future extended reality metaverse conferencing system that leverages monocular video acquisition and free-viewpoint synthesis to enhance data and hardware efficiency. Towards an immersive conferencing experience, we explore an accelerated NeRF-based free-viewpoint synthesis algorithm for rendering photorealistic human dynamics more efficiently. We show that our algorithm achieves comparable rendering quality while performing training and inference 44.5% and 213% faster than state-of-the-art methods, respectively. Our exploration provides a design basis for constructing metaverse conferencing systems that can handle complex application scenarios, including dynamic scene relighting with customized themes and multi-user conferencing that harmonizes real-world people into an extended world.

翻訳日:2023-06-30 15:47:16 公開日:2023-06-28

# 物体検出のための深層学習における前景-背景不均衡問題の体系的研究

A systematic study of the foreground-background imbalance problem in deep learning for object detection ( http://arxiv.org/abs/2306.16539v1 )

ライセンス: Link先を確認

Hanxue Gu, Haoyu Dong, Nicholas Konz, Maciej A. Mazurowski

(参考訳) 深層学習におけるクラス不均衡問題は、いくつかの研究で研究されているが、物体検出におけるこの現象の体系的な解析はまだ行われていない。本稿では,対象検出におけるフォアグラウンドバックグラウンド(f-b)不均衡問題の包括的解析と実験を行う。 F-B不均衡(オブジェクトサイズ,オブジェクト数,データセットサイズ,オブジェクトタイプ)の異なる側面が検出性能に及ぼす影響を実験的に検討した。さらに,Faster-RCNN,SSD,OHEM,Libra-RCNN,Focal-Loss,GHM,PISA,YOLO-v3,GFLの9つの主要な手法を,異なる画像領域のデータセットで比較した。 We conclude that (1) the F-B imbalance can indeed cause a significant drop in detection performance, (2) The detection performance is more affected by F-B imbalance when fewer training data are available, (3) in most cases, decreasing object size leads to larger performance drop than decreasing number of objects, given the same change in the ratio of object pixels to non-object pixels, (6) among all selected methods, Libra-RCNN and PISA demonstrate the best performance in addressing the issue of F-B imbalance. (7) トレーニングデータセットのサイズが大きい場合, 方法の選択は影響を受けない (8) フォーカスロス, GHM, GFLを含むソフトサンプリング手法は, 平均的にかなりよく動作するが, 比較的不安定である。

The class imbalance problem in deep learning has been explored in several studies, but there has yet to be a systematic analysis of this phenomenon in object detection. Here, we present comprehensive analyses and experiments of the foreground-background (F-B) imbalance problem in object detection, which is very common and caused by small, infrequent objects of interest. We experimentally study the effects of different aspects of F-B imbalance (object size, number of objects, dataset size, object type) on detection performance. In addition, we also compare 9 leading methods for addressing this problem, including Faster-RCNN, SSD, OHEM, Libra-RCNN, Focal-Loss, GHM, PISA, YOLO-v3, and GFL with a range of datasets from different imaging domains. We conclude that (1) the F-B imbalance can indeed cause a significant drop in detection performance, (2) The detection performance is more affected by F-B imbalance when fewer training data are available, (3) in most cases, decreasing object size leads to larger performance drop than decreasing number of objects, given the same change in the ratio of object pixels to non-object pixels, (6) among all selected methods, Libra-RCNN and PISA demonstrate the best performance in addressing the issue of F-B imbalance. (7) When the training dataset size is large, the choice of method is not impactful (8) Soft-sampling methods, including focal-loss, GHM, and GFL, perform fairly well on average but are relatively unstable.

翻訳日:2023-06-30 15:46:53 公開日:2023-06-28

# CLANet: ブライトフィールド画像を用いたクロスバッチセルライン識別のための総合的フレームワーク

CLANet: A Comprehensive Framework for Cross-Batch Cell Line Identification Using Brightfield Images ( http://arxiv.org/abs/2306.16538v1 )

ライセンス: Link先を確認

Lei Tong, Adam Corrigan, Navin Rathna Kumar, Kerry Hallbrook, Jonathan Orme, Yinhai Wang, Huiyu Zhou

(参考訳) 細胞線認証は、生物医学の分野で重要な役割を担っており、研究者が正確に同定された細胞を扱うことを保証する。教師付き深層学習は、細胞イメージングによる細胞形態学的特徴の研究により、細胞株の同定において顕著な進歩を遂げた。しかし、データが生成される異なる時間から生じる重要な問題であるバッチ効果は、基礎となるデータ分布に大きな変化をもたらし、異なるバッチ培養から細胞列間の信頼性の高い分化を複雑にする。この課題に対処するために,我々は,brightfieldイメージを用いたクロスバッチセルライン識別のための先駆的フレームワークであるclangtを紹介する。本稿では,セル密度の変動を効率的に把握するセルクラスタレベルの選択手法と,画像品質の変動を管理する自己教師型学習戦略を提案する。さらに,複数のインスタンス学習(MIL)を,セルライン識別のためのインスタンスレベルの特徴を効果的に集約するために採用する。当社の革新的な時系列セグメントサンプリングモジュールは,バッチ間のインキュベーション時間の違いによるバイアスを軽減し,milの機能学習能力をさらに向上させる。 astrazenecaグローバルセルバンクの93の実験バッチにわたる32のセルラインのデータを用いて、clangetを検証する。以上の結果から,CLANetは関連するアプローチ(ドメイン適応,MILなど)よりも優れており,細胞株同定におけるバッチ効果に対処する効果が示された。

Cell line authentication plays a crucial role in the biomedical field, ensuring researchers work with accurately identified cells. Supervised deep learning has made remarkable strides in cell line identification by studying cell morphological features through cell imaging. However, batch effects, a significant issue stemming from the different times at which data is generated, lead to substantial shifts in the underlying data distribution, thus complicating reliable differentiation between cell lines from distinct batch cultures. To address this challenge, we introduce CLANet, a pioneering framework for cross-batch cell line identification using brightfield images, specifically designed to tackle three distinct batch effects. We propose a cell cluster-level selection method to efficiently capture cell density variations, and a self-supervised learning strategy to manage image quality variations, thus producing reliable patch representations. Additionally, we adopt multiple instance learning(MIL) for effective aggregation of instance-level features for cell line identification. Our innovative time-series segment sampling module further enhances MIL's feature-learning capabilities, mitigating biases from varying incubation times across batches. We validate CLANet using data from 32 cell lines across 93 experimental batches from the AstraZeneca Global Cell Bank. Our results show that CLANet outperforms related approaches (e.g. domain adaptation, MIL), demonstrating its effectiveness in addressing batch effects in cell line identification.

翻訳日:2023-06-30 15:46:27 公開日:2023-06-28

# 核多体系における多体絡み合いと情報再構成

Multi-Body Entanglement and Information Rearrangement in Nuclear Many-Body Systems ( http://arxiv.org/abs/2306.16535v1 )

ライセンス: Link先を確認

S. Momme Hengstenberg, Caroline E. P. Robin, Martin J. Savage

(参考訳) 核多体系の有効モデル空間(EMS)計算について検討し,多粒子エンタングルメントの収束について検討した。一般化リプキン・メシュコフ・グリク(lmg)モデルは、核の絡み合い駆動記述の将来の発展の動機付けと洞察を提供するために用いられる。効果的なアプローチはヒルベルト空間の切り離しと、関連する基本自由度を構成するクォービット(スピン)の変分回転に基づいている。回転と切り離しの非可換性により、モデル空間の大部分でエネルギー収束が指数関数的に改善される。本分析では, 相関と絡み合いの測定を行い, その収束度をカットオフの増加とともに定量化する。マルチボディの絡み合いを推定するために, 1 および 2 スピンの絡み合いエントロピー,相互情報,および $n$-tangles に焦点を当てた。実効的な記述は回転したスピンのエントロピーや相互情報を強く抑制し、低いカットオフで正確な結果を広範囲に回収することができる。一方、素ハミルトニアンのネーブ・トランケーションは、これらの測度を人工的に過小評価する。本モデルにおけるn$-tangles は、n$-particle の絡み合いの基底独立測度を提供する。 EMSの記述ではこれらを捉えるのが難しいが、最小のハミルトニアンのトランケーションに比べて収束の改善は著しく劇的である。低エネルギーems法は多体系における低次オブザーバブルの予測能力を提供し、lmgモデルにおける量子相関や多体絡み合いの類似性を示し、核多体系や高エネルギー物理学や核物理学に関連する実効場理論の研究を動機付けるものであると結論づける。

We examine how effective-model-space (EMS) calculations of nuclear many-body systems rearrange and converge multi-particle entanglement. The generalized Lipkin-Meshkov-Glick (LMG) model is used to motivate and provide insight for future developments of entanglement-driven descriptions of nuclei. The effective approach is based on a truncation of the Hilbert space together with a variational rotation of the qubits (spins), which constitute the relevant elementary degrees of freedom. The non-commutivity of the rotation and truncation allows for an exponential improvement of the energy convergence throughout much of the model space. Our analysis examines measures of correlations and entanglement, and quantifies their convergence with increasing cut-off. We focus on one- and two-spin entanglement entropies, mutual information, and $n$-tangles for $n=2,4$ to estimate multi-body entanglement. The effective description strongly suppresses entropies and mutual information of the rotated spins, while being able to recover the exact results to a large extent with low cut-offs. Naive truncations of the bare Hamiltonian, on the other hand, artificially underestimate these measures. The $n$-tangles in the present model provide a basis-independent measures of $n$-particle entanglement. While these are more difficult to capture with the EMS description, the improvement in convergence, compared to truncations of the bare Hamiltonian, is significantly more dramatic. We conclude that the low-energy EMS techniques, that successfully provide predictive capabilities for low-lying observables in many-body systems, exhibit analogous efficacy for quantum correlations and multi-body entanglement in the LMG model, motivating future studies in nuclear many-body systems and effective field theories relevant to high-energy physics and nuclear physics.

翻訳日:2023-06-30 15:46:04 公開日:2023-06-28

# 有限時間熱力学における集合的利点

Collective advantages in finite-time thermodynamics ( http://arxiv.org/abs/2306.16534v1 )

ライセンス: Link先を確認

Alberto Rolandi, Mart\'i Perarnau-Llobet

(参考訳) 有限時間熱力学における中心的なタスクは、熱浴に浸漬した系の状態を操作する際に、余剰または散逸する作業を最小化することである。我々は,この課題を,プロセスの開始時と終了時において,構成成分が同一で非相関な$N$ボディシステムとみなす。遅いが有限時間プロセスの状態では、プロトコルに沿って対話が適切に作成される集合プロトコルを考えることで、$W_{\rm diss}$を劇的に削減できることを示す。これは$W_{\rm diss}\sim N^x$ with $x<1$; のサブ線形成長にもつながり、非相互作用プロトコルで満たされる$W_{\rm diss}\sim N$とは対照的に、$N$: $W_{\rm diss}\sim N^x$ with $x<1$; のサブ線形成長につながる。このような集合的利点に対する基本的な限界を導出し、x=0$ が原理的に可能であることを示すが、これは非常に局所的な $n$-body 相互作用を必要とする。次に、現実的な多体相互作用モデル、特に1次元スピンチェーンと全対全スピンモデルによる集合過程を探索し、現実的な制御レベルで顕著な利得を達成する。これらの結果の応用として,情報の消去を有限時間に限定し,ランドーアーの消去限界へのより高速な収束を証明した。

A central task in finite-time thermodynamics is to minimize the excess or dissipated work, $W_{\rm diss}$, when manipulating the state of a system immersed in a thermal bath. We consider this task for an $N$-body system, whose constituents are identical and uncorrelated at the beginning and end of the process. In the regime of slow but finite-time processes, we show that $W_{\rm diss}$ can be dramatically reduced by considering collective protocols in which interactions are suitably created along the protocol. This can even lead to a sub-linear growth of $W_{\rm diss}$ with $N$: $W_{\rm diss}\sim N^x$ with $x<1$; to be contrasted to the expected $W_{\rm diss}\sim N$ satisfied in any non-interacting protocol. We derive the fundamental limits to such collective advantages and show that $x=0$ is in principle possible, which however requires highly non-local $N$-body interactions. We then explore collective processes with realistic many-body interacting models, in particular a 1D spin chain and an all-to-all spin model, achieving noticeable gains under realistic levels of control. As an application of these results, we focus on the erasure of information in finite time, and prove a faster convergence to Landauer's erasure bound.

翻訳日:2023-06-30 15:45:33 公開日:2023-06-28

# ICSVR:ビデオ検索モデルにおける構成的・意味的理解の検討

ICSVR: Investigating Compositional and Semantic Understanding in Video Retrieval Models ( http://arxiv.org/abs/2306.16533v1 )

ライセンス: Link先を確認

Avinash Madasu, Vasudev Lal

(参考訳) ビデオ検索(VR)は、テキストキャプションまたはリバーサが与えられたビデオデータベースから地上の真理ビデオを取得することを含む。合成性の2つの重要なコンポーネント:オブジェクト \&属性とアクションは適切なテキストクエリを形成するために正しいセマンティクスを使って結合される。これらのコンポーネント(属性、アクション、セマンティクスを対象とする)は、それぞれがビデオの識別や正しい地上の真理ビデオの検索に重要な役割を果たす。しかし,これらのコンポーネントがビデオ検索性能に与える影響は明らかでない。そこで我々は,MSRVTT,MSVD,DIDEMOなどの標準ベンチマークを用いて,映像検索モデルの構成的および意味的理解を評価するための体系的研究を行った。本研究は,ビデオ検索モデルの2つのカテゴリについて行った。 (i)ビデオテキストペアで事前学習し、下流ビデオ検索データセット(例えば、Frozen-in-Time、Violet、MCQなど)で微調整する。 (ii) ビデオ検索にCLIP(CLIP4Clip, XCLIP, CLIP2Videoなど)のような事前訓練済みの画像テキスト表現を適用する。ビデオ理解において,アクションやセマンティクスはオブジェクトや属性と比較して小さな役割を担っていることが明らかとなった。さらに、事前学習された画像テキスト表現(CLIP)を用いたビデオ検索モデルは、ビデオテキストデータに事前学習されたモデルと比較して、意味的・構成的理解が優れている。

Video retrieval (VR) involves retrieving the ground truth video from the video database given a text caption or vice-versa. The two important components of compositionality: objects \& attributes and actions are joined using correct semantics to form a proper text query. These components (objects \& attributes, actions and semantics) each play an important role to help distinguish among videos and retrieve the correct ground truth video. However, it is unclear what is the effect of these components on the video retrieval performance. We therefore, conduct a systematic study to evaluate the compositional and semantic understanding of video retrieval models on standard benchmarks such as MSRVTT, MSVD and DIDEMO. The study is performed on two categories of video retrieval models: (i) which are pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (Eg. Frozen-in-Time, Violet, MCQ etc.) (ii) which adapt pre-trained image-text representations like CLIP for video retrieval (Eg. CLIP4Clip, XCLIP, CLIP2Video etc.). Our experiments reveal that actions and semantics play a minor role compared to objects \& attributes in video understanding. Moreover, video retrieval models that use pre-trained image-text representations (CLIP) have better semantic and compositional understanding as compared to models pre-trained on video-text data.

翻訳日:2023-06-30 15:45:08 公開日:2023-06-28

# WHOグレード4グリオーマにおける術前MRIによる早期進行と生存リスクの予測

Prediction of Rapid Early Progression and Survival Risk with Pre-Radiation MRI in WHO Grade 4 Glioma Patients ( http://arxiv.org/abs/2306.16531v1 )

ライセンス: Link先を確認

Walia Farzana, Mustafa M Basree, Norou Diawara, Zeina A. Shboul, Sagel Dubey, Marie M Lockhart, Mohamed Hamza, Joshua D. Palmer, Khan M. Iftekharuddin

(参考訳) 最近の臨床研究では放射線治療開始前にREPを呈するグリオ芽腫のサブセットが報告されている。現在の文献では臨床病理学的特徴を用いてこの人口を記述している。本研究は,従来のra-diomics,洗練されたマルチレゾリューションフラクタルテクスチャ特徴,および非rep症例からのrepの予測のための診断および予後予測ツールとしての異なる分子特徴(mgmt,idh変異)について,計算および統計モデルを用いて初めて検討した。放射線プランニングT1ポストコントラスト(T1C)MRIシークエンスの解析を行った。 1000回以上の5倍のクロスバリデーションを持つアンサンブル法では、AUCは0.793であり、REPと非REPの標準偏差は0.082である。さらに、依存的な検閲(患者のサブセットが死ぬまで追跡されない場合)下でのコプラに基づくモデリングは、患者の生存確率と予後のグルーピングに重要な特徴(p-value <0.05)を特定する。コホート患者の生存率の予測は0.881で、標準偏差は0.056である。融合特徴を用いた予後指標(PI)は、REP症例の84.62%が悪い予後群に該当し、REP症例の比率が高くなる可能性を示唆している。さらに, マルチ分解能フラクタルテクスチャ特性は, REPやサバイバル結果の従来の放射能特性よりも優れていた。

Recent clinical research describes a subset of glioblastoma patients that exhibit REP prior to start of radiation therapy. Current literature has thus far described this population using clinicopathologic features. To our knowledge, this study is the first to investigate the potential of conventional ra-diomics, sophisticated multi-resolution fractal texture features, and different molecular features (MGMT, IDH mutations) as a diagnostic and prognostic tool for prediction of REP from non-REP cases using computational and statistical modeling methods. Radiation-planning T1 post-contrast (T1C) MRI sequences of 70 patients are analyzed. Ensemble method with 5-fold cross validation over 1000 iterations offers AUC of 0.793 with standard deviation of 0.082 for REP and non-REP classification. In addition, copula-based modeling under dependent censoring (where a subset of the patients may not be followed up until death) identifies significant features (p-value <0.05) for survival probability and prognostic grouping of patient cases. The prediction of survival for the patients cohort produces precision of 0.881 with standard deviation of 0.056. The prognostic index (PI) calculated using the fused features suggests that 84.62% of REP cases fall under the bad prognostic group, suggesting potentiality of fused features to predict a higher percentage of REP cases. The experimental result further shows that mul-ti-resolution fractal texture features perform better than conventional radiomics features for REP and survival outcomes.

翻訳日:2023-06-30 15:44:41 公開日:2023-06-28

# OAM光と原子アンサンブルのQND相互作用による並列多量子SWAPゲート

Parallel multi-two-qubit SWAP gate via QND interaction of OAM light and atomic ensemble ( http://arxiv.org/abs/2306.16565v1 )

ライセンス: Link先を確認

E.N. Bashmakova, E.A. Vashukevich, and T. Yu. Golubeva

(参考訳) 現在、量子SWAPゲートは量子コンピューティングの不可欠な部分となっているため、その実現方法の研究は様々な量子光学および情報応用において重要な実践的問題であると考えられる。本稿では、原子アンサンブルと軌道角運動量を持つマルチモード光との量子非退化相互作用の枠組みにおいて、離散変数でスワップ論理演算を行うためのスキームを提案する。本稿では、駆動場軌道運動量の異なる値に対する原子状態と磁場状態の集合上の2量子ビット閉サブシステムを明らかにする手順について詳細に論じる。また,並列マルチツーキュービット量子SWAPゲートの実装の可能性を示す。

Nowadays quantum SWAP gate has become an integral part of quantum computing, so investigation of methods of its realization seems to be an important practical problem for various quantum-optical and information applications. In the present paper we propose a scheme for performing a SWAP logic operation in discrete variables in the framework of quantum non-demolition interaction between an atomic ensemble and a multimode light with orbital angular momentum. We discuss in detail the procedure for revealing two-qubit closed subsystems on a set of atomic and field states for different values of the driving field orbital momentum. We also demonstrate the possibility of implementing a parallel multi-two-qubit quantum SWAP gate.

翻訳日:2023-06-30 15:38:01 公開日:2023-06-28

# pareto optimal self-supervision による大規模言語モデルの自動校正と誤り訂正

Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision ( http://arxiv.org/abs/2306.16564v1 )

ライセンス: Link先を確認

Theodore Zhao, Mu Wei, J. Samuel Preston, Hoifung Poon

(参考訳) 大規模言語モデル (LLM) は、広範囲の応用において目覚ましい能力を示してきたが、精度は依然として大きな成長領域であり、特にバイオメディシンのようなミッションクリティカルな領域では顕著である。 LLM応答に対する信頼度を校正する効果的な方法は、エラーを自動的に検出し、ループ内検証を容易にするために不可欠である。キャリブレーション信号の重要な源は、低コストで利用可能であるが、ノイズやカバレッジといった独自の制限がある、専門家によるプログラム的監督にある。本稿では,利用可能なプログラム的監督を活用し,追加の手動作業なしに,各応答に対するリスクスコアを作成することで,llm応答を体系的に校正することができるparetoの最適自己スーパービジョンフレームワークを提案する。これは、より不確実なLSM応答により高いリスクスコアを割り当て、エラー修正を容易にする、他の利用可能な監視源とLLM出力を一致させるハーモニザモデルを学ぶことで達成される。生体医学領域および一般領域における標準関係抽出タスクの実験により,本手法の有効性が示され,本手法のリスクスコアはllmsの実誤差率と高い相関を示した。最も不確実なテスト例では,提案したリスクスコアに基づく動的プロンプトにより,既製のLCMの精度が大幅に向上し,SOTA(State-of-the-art)の監督が弱く,SOTAの監督が難しい評価データセットにGPT-4の結果が及んだ。

Large language models (LLMs) have demonstrated remarkable capabilities out of box for a wide range of applications, yet accuracy still remains a major growth area, especially in mission-critical domains such as biomedicine. An effective method to calibrate the confidence level on LLM responses is essential to automatically detect errors and facilitate human-in-the-loop verification. An important source of calibration signals stems from expert-stipulated programmatic supervision, which is often available at low cost but has its own limitations such as noise and coverage. In this paper, we introduce a Pareto optimal self-supervision framework that can leverage available programmatic supervision to systematically calibrate LLM responses by producing a risk score for every response, without any additional manual efforts. This is accomplished by learning a harmonizer model to align LLM output with other available supervision sources, which would assign higher risk scores to more uncertain LLM responses and facilitate error correction. Experiments on standard relation extraction tasks in biomedical and general domains demonstrate the promise of this approach, with our proposed risk scores highly correlated with the real error rate of LLMs. For the most uncertain test instances, dynamic prompting based on our proposed risk scores results in significant accuracy improvement for off-the-shelf LLMs, boosting GPT-3 results past state-of-the-art (SOTA) weak supervision and GPT-4 results past SOTA supervised results on challenging evaluation datasets.

翻訳日:2023-06-30 15:37:51 公開日:2023-06-28

# 半定義プログラミングと量子情報

Semi-definite programming and quantum information ( http://arxiv.org/abs/2306.16560v1 )

ライセンス: Link先を確認

Piotr Mironowicz

(参考訳) 本稿では,量子情報の文脈における半定値プログラミング(SDP)手法の包括的探索について述べる。凸最適化、双対性、sdp定式化の数学的基礎を調べ、量子システムにおける最適化の課題に対処するための確かな理論的枠組みを提供する。これらのツールを活用することで、研究者や実践者は古典的および量子的相関を特徴づけ、量子状態を最適化し、効率的な量子アルゴリズムとプロトコルを設計することができる。また,量子情報処理における最適化手法の効果的な活用を可能にするため,sdpやモデリングツールなどの実装面についても論じる。この論文で提示された知見と方法論は、量子情報分野の進歩に寄与し、新しい通信プロトコル、自己テスト手法、量子絡み合いのより深い理解を促進することが証明されている。全体として、この研究は最適化と量子情報の交点に関心のある研究者にリソースを提供し、この急速に進化する分野における探索とブレークスルーのための新しい道を開く。

This paper presents a comprehensive exploration of semi-definite programming (SDP) techniques within the context of quantum information. It examines the mathematical foundations of convex optimization, duality, and SDP formulations, providing a solid theoretical framework for addressing optimization challenges in quantum systems. By leveraging these tools, researchers and practitioners can characterize classical and quantum correlations, optimize quantum states, and design efficient quantum algorithms and protocols. The paper also discusses implementational aspects, such as solvers for SDP and modeling tools, enabling the effective employment of optimization techniques in quantum information processing. The insights and methodologies presented in this paper have proven instrumental in advancing the field of quantum information, facilitating the development of novel communication protocols, self-testing methods, and a deeper understanding of quantum entanglement. Overall, this study offers a resource for researchers interested in the intersection of optimization and quantum information, opening up new avenues for exploration and breakthroughs in this rapidly evolving field.

翻訳日:2023-06-30 15:37:21 公開日:2023-06-28

# 特徴選択:属性間の協調をめざして

Feature Selection: A perspective on inter-attribute cooperation ( http://arxiv.org/abs/2306.16559v1 )

ライセンス: Link先を確認

Gustavo Sosa-Cabrera, Santiago G\'omez-Guerrero, Miguel Garc\'ia-Torres, Christian E. Schaerer

(参考訳) 高次元データセットは、データマイニングと機械学習における学習タスクの課題を描いている。特徴の選択は次元の縮小を扱う効果的な手法である。これは学習アルゴリズムを適用する前に必要不可欠なデータ処理ステップであることが多い。フィルタの特徴選択手法は、何十年もの間、単純な単変量関係ランキングアルゴリズムから、より洗練された関連性-冗長トレードオフ、そして近年の多変量依存に基づくアプローチへと進化してきた。多変量依存を取り込むこの傾向は、特徴間の相互作用からクラスに関するユニークな情報を得ることを目的としている。本稿では,機能相互運用を支援するフィルタ特徴選択手法に関する最近の研究を包括的に調査し,文献における様々なアプローチの貢献を要約する。さらに,今後の研究開発に期待できる課題や課題についても紹介する。

High-dimensional datasets depict a challenge for learning tasks in data mining and machine learning. Feature selection is an effective technique in dealing with dimensionality reduction. It is often an essential data processing step prior to applying a learning algorithm. Over the decades, filter feature selection methods have evolved from simple univariate relevance ranking algorithms to more sophisticated relevance-redundancy trade-offs and to multivariate dependencies-based approaches in recent years. This tendency to capture multivariate dependence aims at obtaining unique information about the class from the intercooperation among features. This paper presents a comprehensive survey of the state-of-the-art work on filter feature selection methods assisted by feature intercooperation, and summarizes the contributions of different approaches found in the literature. Furthermore, current issues and challenges are introduced to identify promising future research and development.

翻訳日:2023-06-30 15:37:05 公開日:2023-06-28

# 理論的保証付き機械学習のための非凸最適化:ロバスト行列補完とニューラルネットワーク学習

Non-Convex Optimizations for Machine Learning with Theoretical Guarantee: Robust Matrix Completion and Neural Network Learning ( http://arxiv.org/abs/2306.16557v1 )

ライセンス: Link先を確認

Shuai Zhang

(参考訳) 機械学習の最近の発展にもかかわらず、ほとんどの学習システムは未だに「ブラックボックス」という概念の下にあり、パフォーマンスを理解・導出できない。公衆の安全とプライバシーの懸念が高まり、説明可能な学習システムを設計することは、機械学習の新しいトレンドとなっている。一般に、多くの機械学習問題は損失関数の最小化(最大化)として定式化されている。実データは非線形モデルから生成される可能性が高いため、損失関数は一般に非凸である。凸最適化問題と異なり、勾配降下アルゴリズムは非凸最適化の解法において局所的最小値に閉じ込められる。したがって、非凸最適化問題を研究する際に説明可能なアルゴリズムを提供することは困難である。本論文では,(1)低ランク行列補完と(2)ニューラルネットワーク学習の2つの一般的な非凸問題について考察する。

Despite the recent development in machine learning, most learning systems are still under the concept of "black box", where the performance cannot be understood and derived. With the rise of safety and privacy concerns in public, designing an explainable learning system has become a new trend in machine learning. In general, many machine learning problems are formulated as minimizing (or maximizing) some loss function. Since real data are most likely generated from non-linear models, the loss function is non-convex in general. Unlike the convex optimization problem, gradient descent algorithms will be trapped in spurious local minima in solving non-convex optimization. Therefore, it is challenging to provide explainable algorithms when studying non-convex optimization problems. In this thesis, two popular non-convex problems are studied: (1) low-rank matrix completion and (2) neural network learning.

翻訳日:2023-06-30 15:36:52 公開日:2023-06-28

# Rater-Specific Bayesian Neural Networkによる医用画像セグメンテーションにおける層間不確かさの定量化

Inter-Rater Uncertainty Quantification in Medical Image Segmentation via Rater-Specific Bayesian Neural Networks ( http://arxiv.org/abs/2306.16556v1 )

ライセンス: Link先を確認

Qingqiao Hu, Hao Wang, Jing Luo, Yunhao Luo, Zhiheng Zhangg, Jan S. Kirschke, Benedikt Wiestler, Bjoern Menze, Jianguo Zhang, Hongwei Bran Li

(参考訳) 自動医用画像分割は本質的にある程度の不確実性を伴う。この不確実性に寄与する重要な要因の1つは、主に画像の外観の変化によって、対象領域の境界を決定する際に生じる曖昧さである。これに加えて、この分野の専門家の間でも、特定の解剖学的構造の正確な定義に関して異なる意見が生まれることがある。この研究は特に、層間不確実性として知られるセグメンテーションの不確かさのモデリングに対処する。その主な目的は、医療画像の複数の専門家が同じ画像の解釈と注釈を行う際に生じるセグメンテーション結果の変動を探索し分析することである。医用画像セグメンテーションにおけるレータ間不確実性を推定するための新しいベイズニューラルネットワークアーキテクチャを提案する。私たちのアプローチには3つの重要な進歩がある。まず,不確実性推定用に特別に調整した1エンコーダマルチデコーダアーキテクチャを導入することで,各専門家のレートラ固有の表現を捉えることができる。第2に,新しいアーキテクチャのベイズモデルを提案することで,特に制約の少ないシナリオにおいて,レート間分布の効率的なキャプチャを実現する。最後に、各デコーダにアテンションモジュールを組み込むことにより、rater特有の表現を強化する。このモジュールは、各レートのセグメンテーション結果の集中化と洗練を容易にする。合成および実世界のデータセットを使用して広範な評価を行い、技術的革新を厳格に検証する。提案手法は, 各種不確実性を考慮した2つの評価指標を考慮し, 7つのタスクのうち5つにおいて, 既存のベースライン手法を越えている。私たちのコード、モデル、新しいデータセットはgithubリポジトリから入手できます。

Automated medical image segmentation inherently involves a certain degree of uncertainty. One key factor contributing to this uncertainty is the ambiguity that can arise in determining the boundaries of a target region of interest, primarily due to variations in image appearance. On top of this, even among experts in the field, different opinions can emerge regarding the precise definition of specific anatomical structures. This work specifically addresses the modeling of segmentation uncertainty, known as inter-rater uncertainty. Its primary objective is to explore and analyze the variability in segmentation outcomes that can occur when multiple experts in medical imaging interpret and annotate the same images. We introduce a novel Bayesian neural network-based architecture to estimate inter-rater uncertainty in medical image segmentation. Our approach has three key advancements. Firstly, we introduce a one-encoder-multi-decoder architecture specifically tailored for uncertainty estimation, enabling us to capture the rater-specific representation of each expert involved. Secondly, we propose Bayesian modeling for the new architecture, allowing efficient capture of the inter-rater distribution, particularly in scenarios with limited annotations. Lastly, we enhance the rater-specific representation by integrating an attention module into each decoder. This module facilitates focused and refined segmentation results for each rater. We conduct extensive evaluations using synthetic and real-world datasets to validate our technical innovations rigorously. Our method surpasses existing baseline methods in five out of seven diverse tasks on the publicly available \emph{QUBIQ} dataset, considering two evaluation metrics encompassing different uncertainty aspects. Our codes, models, and the new dataset are available through our GitHub repository: https://github.com/HaoWang420/bOEMD-net .

翻訳日:2023-06-30 15:36:38 公開日:2023-06-28

# min-max f-divergence正規化による学習フェア分類

Learning Fair Classifiers via Min-Max F-divergence Regularization ( http://arxiv.org/abs/2306.16552v1 )

ライセンス: Link先を確認

Meiyu Zhong, Ravi Tandon

(参考訳) 機械学習(ML)ベースのシステムは、法執行機関、刑事司法、財務、雇用、入場などの分野で採用されているため、機械学習支援による意思決定の公正性がますます重要になっている。本稿では, 公平な分類の問題に焦点をあて, 高い精度を維持しつつ, 公平な分類モデルを学ぶための min-max F-divergence regularization framework を導入する。このフレームワークは,2つの学習可能なネットワーク,すなわち分類器ネットワークとバイアス/フェアネス推定ネットワークから成り,f-ダイバージェンスの統計的概念を用いてフェアネスを計測する。その結果,f-divergence測度は凸性と微分可能性特性を有し,その変動表現は実用的勾配に基づく学習法に広く適用できることがわかった。提案するフレームワークは、複数の機密属性や高次元データセットに容易に適応できる。グループフェアネス制約,すなわち人口格差と等化確率の2種類のグループフェアネス制約に対するF偏差に基づくトレーニングパラダイムについて検討する。本稿では,複数の領域(コンパス,法律加入,成人所得,セロバデータセットなど)で発生する実世界のデータセットについて,総合的な実験を行う。フェアネス精度のトレードオフを定量化するために、フェアネス精度の受信機動作特性 (FA-ROC) とそれに対応する 'textit{low-bias} FA-ROC の概念を導入する。フェア分類器(前処理,後処理,その他の正規化手法を含む)を学習するためのいくつかの既存手法と比較して,提案手法は,精度と公正性のトレードオフに関して,最先端の性能を実現する。

As machine learning (ML) based systems are adopted in domains such as law enforcement, criminal justice, finance, hiring and admissions, ensuring the fairness of ML aided decision-making is becoming increasingly important. In this paper, we focus on the problem of fair classification, and introduce a novel min-max F-divergence regularization framework for learning fair classification models while preserving high accuracy. Our framework consists of two trainable networks, namely, a classifier network and a bias/fairness estimator network, where the fairness is measured using the statistical notion of F-divergence. We show that F-divergence measures possess convexity and differentiability properties, and their variational representation make them widely applicable in practical gradient based training methods. The proposed framework can be readily adapted to multiple sensitive attributes and for high dimensional datasets. We study the F-divergence based training paradigm for two types of group fairness constraints, namely, demographic parity and equalized odds. We present a comprehensive set of experiments for several real-world data sets arising in multiple domains (including COMPAS, Law Admissions, Adult Income, and CelebA datasets). To quantify the fairness-accuracy tradeoff, we introduce the notion of fairness-accuracy receiver operating characteristic (FA-ROC) and a corresponding \textit{low-bias} FA-ROC, which we argue is an appropriate measure to evaluate different classifiers. In comparison to several existing approaches for learning fair classifiers (including pre-processing, post-processing and other regularization methods), we show that the proposed F-divergence based framework achieves state-of-the-art performance with respect to the trade-off between accuracy and fairness.

翻訳日:2023-06-30 15:36:10 公開日:2023-06-28

# オフロードセマンティックセマンティックセグメンテーション性能におけるLiDAR構成の解析

Analysis of LiDAR Configurations on Off-road Semantic Segmentation Performance ( http://arxiv.org/abs/2306.16551v1 )

ライセンス: Link先を確認

Jinhee Yu, Jingdao Chen, Lalitha Dabbiru, Christopher T. Goodin

(参考訳) 本稿では,LiDARの設定変化が3次元LiDARポイントクラウドセマンティックセグメンテーションモデルの性能に与える影響について検討する。実験にCylinder3Dを用いた3次元LiDARポイントクラウドセマンティックセマンティックセグメンテーションモデルのトレーニングおよびテストにおいて,異なるLiDARチャネルを使用することの効果を検討する。シリンダー3dモデルは、ミシシッピ州立大学のautonomous vehicle simulator(mavs)で作成したシミュレーション3d lidar point cloudデータセットと、現実世界のオフロード環境で収集されたrellis-3dデータセットの32,64チャンネルの3d lidar point cloud上でトレーニングおよびテストされる。実験の結果,センサと空間領域のシフトは,LiDARに基づくセマンティックセグメンテーションモデルの性能に大きく影響することが示された。トレーニングとテストの間の空間領域の変化がないため、同じセンサータイプでトレーニングとテストを行ったモデルは、一般的により優れた性能を示した。さらに,高分解能センサは低分解能センサに比べて性能が向上した。しかし,空間的領域変化が存在すると,結果は異なっていた。場合によっては、センサーの高解像度化の利点は、センサードメインシフトと非センサードメインシフトの両方でパフォーマンスの向上につながった。別の例では、高解像度は特定のドメイン内で過度に適合し、一般化能力が欠如し、異なるセンサー構成のデータでテストした場合のパフォーマンスが低下した。

This paper investigates the impact of LiDAR configuration shifts on the performance of 3D LiDAR point cloud semantic segmentation models, a topic not extensively studied before. We explore the effect of using different LiDAR channels when training and testing a 3D LiDAR point cloud semantic segmentation model, utilizing Cylinder3D for the experiments. A Cylinder3D model is trained and tested on simulated 3D LiDAR point cloud datasets created using the Mississippi State University Autonomous Vehicle Simulator (MAVS) and 32, 64 channel 3D LiDAR point clouds of the RELLIS-3D dataset collected in a real-world off-road environment. Our experimental results demonstrate that sensor and spatial domain shifts significantly impact the performance of LiDAR-based semantic segmentation models. In the absence of spatial domain changes between training and testing, models trained and tested on the same sensor type generally exhibited better performance. Moreover, higher-resolution sensors showed improved performance compared to those with lower-resolution ones. However, results varied when spatial domain changes were present. In some cases, the advantage of a sensor's higher resolution led to better performance both with and without sensor domain shifts. In other instances, the higher resolution resulted in overfitting within a specific domain, causing a lack of generalization capability and decreased performance when tested on data with different sensor configurations.

翻訳日:2023-06-30 15:35:39 公開日:2023-06-28

# UTOPIA: 普遍的にトレーニング可能な最適予測間隔

UTOPIA: Universally Trainable Optimal Prediction Intervals Aggregation ( http://arxiv.org/abs/2306.16549v1 )

ライセンス: Link先を確認

Jianqing Fan, Jiawei Ge and Debarghya Mukherjee

(参考訳) 予測の不確かさの定量化は、バイオメディカルサイエンス、経済研究、天気予報など、様々な分野で重要な応用の興味深い問題である。量子回帰や共形予測などの予測区間を構築するための多くの方法が利用可能である。それでも、モデル不特定(特に高次元)や準最適構成は、しばしばバイアスや不必要に広い予測間隔をもたらす。本稿では,予測帯域の平均幅を最小化するために,予測帯域の平均幅を最小化する手法として,普遍的に学習可能な最適予測間隔集約 (utopia) を提案する。また,基本関数に基づいて予測帯域を直接構築することができる。我々のアプローチは、実装が容易な線形あるいは凸プログラミングに基づいている。提案手法はすべて,本論文で詳述した範囲確率と最適平均長に関する理論的保証によって支持されている。本手法の有効性は,金融・マクロ経済学における合成データと2つの実データに適用することによって実証された。

Uncertainty quantification for prediction is an intriguing problem with significant applications in various fields, such as biomedical science, economic studies, and weather forecasts. Numerous methods are available for constructing prediction intervals, such as quantile regression and conformal predictions, among others. Nevertheless, model misspecification (especially in high-dimension) or sub-optimal constructions can frequently result in biased or unnecessarily-wide prediction intervals. In this paper, we propose a novel and widely applicable technique for aggregating multiple prediction intervals to minimize the average width of the prediction band along with coverage guarantee, called Universally Trainable Optimal Predictive Intervals Aggregation (UTOPIA). The method also allows us to directly construct predictive bands based on elementary basis functions. Our approach is based on linear or convex programming which is easy to implement. All of our proposed methodologies are supported by theoretical guarantees on the coverage probability and optimal average length, which are detailed in this paper. The effectiveness of our approach is convincingly demonstrated by applying it to synthetic data and two real datasets on finance and macroeconomics.

翻訳日:2023-06-30 15:35:12 公開日:2023-06-28

# palm: 言語モデルによる行動予測@ego4d 長期行動予測チャレンジ2023

Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023 ( http://arxiv.org/abs/2306.16545v1 )

ライセンス: Link先を確認

Daoji Huang, Otmar Hilliges, Luc Van Gool, Xi Wang

(参考訳) 視覚言語と大規模言語モデルを利用したLTA(Long-Term Action Precipation)タスクのソリューションであるPalmを提案する。注釈付きアクション周期の入力ビデオが与えられた場合、LTAタスクは将来のアクションを予測することを目的としている。我々は、最適なソリューションは過去のアクションと将来のアクションの間の相互依存性を捉え、過去のアクションで符号化された構造と依存関係に基づいて将来のアクションを推測できるべきだと仮定する。大規模言語モデルは顕著な常識に基づく推論能力を示している。これにインスパイアされたPalmは、画像キャプションモデルと大きな言語モデルをチェーンする。入力ビデオから抽出したフレーム記述とアクションラベルに基づいて、将来のアクションを予測する。提案手法は,EGO4D LTAチャレンジにおける他の参加者よりも優れ,行動予測の観点で最高のパフォーマンスを達成する。私たちのコードはhttps://github.com/DanDoge/Palmで利用可能です。

We present Palm, a solution to the Long-Term Action Anticipation (LTA) task utilizing vision-language and large language models. Given an input video with annotated action periods, the LTA task aims to predict possible future actions. We hypothesize that an optimal solution should capture the interdependency between past and future actions, and be able to infer future actions based on the structure and dependency encoded in the past actions. Large language models have demonstrated remarkable commonsense-based reasoning ability. Inspired by that, Palm chains an image captioning model and a large language model. It predicts future actions based on frame descriptions and action labels extracted from the input videos. Our method outperforms other participants in the EGO4D LTA challenge and achieves the best performance in terms of action prediction. Our code is available at https://github.com/DanDoge/Palm

翻訳日:2023-06-30 15:34:51 公開日:2023-06-28

# 適応量子力学における自由フェルミオン

Free fermions under adaptive quantum dynamics ( http://arxiv.org/abs/2306.16595v1 )

ライセンス: Link先を確認

Vikram Ravindranath, Zhi-Cheng Yang and Xiao Chen

(参考訳) ユニタリゲートと射影計測からなる適応量子力学の下で自由フェルミオン系と補正ユニタリ演算について検討した。さらに、各サイトに対して古典的なフラグを導入し、ユニタリゲートが適用可能か否かを判断するアクティブまたは非アクティブな状態を可能にする。この力学において、個々の量子軌道は、連続的監視下で以前に研究された自由フェルミオンのモデルと同様に、臨界値から限界値までのエンタングルメント遷移を示す。さらに, 正則ユニタリ演算は, 電荷密度-波動秩序を特徴とする状態に制御できることがわかった。その結果、量子軌道と量子チャネルの両方のレベルで観察できる追加の位相遷移が起こる。我々は、絡み合い遷移とステアリング遷移が根本的に異なることを確証する。後者の遷移は、固有のフェルミオンパリティと古典的なラベリングの間の相互作用から生じるパリティ保存(PC)普遍性クラスに属する。我々は,フリーフェルミオン系の効率的な数値シミュレーションにより,エンタングルメントとステアリング遷移の双方を実証し,後者のPC普遍性クラスを確認する。

We study free fermion systems under adaptive quantum dynamics consisting of unitary gates and projective measurements followed by corrective unitary operations. We further introduce a classical flag for each site, allowing for an active or inactive status which determines whether or not the unitary gates are allowed to apply. In this dynamics, the individual quantum trajectories exhibit a measurement-induced entanglement transition from critical to area-law scaling above a critical measurement rate, similar to previously studied models of free fermions under continuous monitoring. Furthermore, we find that the corrective unitary operations can steer the system into a state characterized by charge-density-wave order. Consequently, an additional phase transition occurs, which can be observed at both the level of the quantum trajectory and the quantum channel. We establish that the entanglement transition and the steering transition are fundamentally distinct. The latter transition belongs to the parity-conserving (PC) universality class, arising from the interplay between the inherent fermionic parity and classical labelling. We demonstrate both the entanglement and the steering transitions via efficient numerical simulations of free fermion systems, which confirm the PC universality class of the latter.

翻訳日:2023-06-30 15:28:52 公開日:2023-06-28

# 時間不変性と線形性を利用した部分的に観測された動的時系列の開発予測

Forecasting of the development of a partially-observed dynamical time series with the aid of time-invariance and linearity ( http://arxiv.org/abs/2306.16593v1 )

ライセンス: Link先を確認

Akifumi Okuno, Yuya Morishita, Yoh-ichi Mototake

(参考訳) 力学系は進化関数を用いて開発された動的時系列と呼ばれる依存多変量列を生成する。現在の時刻における動的時系列の変数は通常、前の時刻における変数全体に依存するため、既存の研究では進化関数を推定することによって将来の時刻における変数を予測する。しかし、動的時系列のいくつかの変数は、いくつかの実用的な状況では欠落している。本研究では,スラック時系列(ARS)モデルを用いた自己回帰モデルを提案する。 ARSモデルは、力学系の時間不変性と線形性の助けを借りて、進化関数と基礎となる不足変数をスラック時系列として同時推定する。本研究では,提案モデルの有効性を実証的に示す。

A dynamical system produces a dependent multivariate sequence called dynamical time series, developed with an evolution function. As variables in the dynamical time series at the current time-point usually depend on the whole variables in the previous time-point, existing studies forecast the variables at the future time-point by estimating the evolution function. However, some variables in the dynamical time-series are missing in some practical situations. In this study, we propose an autoregressive with slack time series (ARS) model. ARS model involves the simultaneous estimation of the evolution function and the underlying missing variables as a slack time series, with the aid of the time-invariance and linearity of the dynamical system. This study empirically demonstrates the effectiveness of the proposed ARS model.

翻訳日:2023-06-30 15:28:33 公開日:2023-06-28

# SeMLaPS: 潜時事前ネットワークと準平面分割を用いたリアルタイム意味マッピング

SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation ( http://arxiv.org/abs/2306.16585v1 )

ライセンス: Link先を確認

Jingwen Wang, Juan Tarrio, Lourdes Agapito, Pablo F. Alcantarilla, Alexander Vakhitov

(参考訳) リアルタイムセマンティクスの可用性はSLAMシステムの中核的な幾何学的機能を大幅に改善し、多数のロボットおよびAR/VRアプリケーションを可能にする。本稿では,2次元ニューラルネットワークとSLAMシステムに基づく3次元ネットワークを組み合わせたRGB-Dシーケンスからのリアルタイムセマンティックマッピング手法を提案する。新しいフレームをセグメント化する際、差別化可能なレンダリングに基づいて、以前のフレームから潜在機能を再投影する。以前のフレームから現在のフレームで再プロジェクションされた特徴マップを再利用することで、イメージを独立して処理するベースラインに比べて、画像セグメンテーションの品質が大幅に向上する。 3次元マップ処理では,曲面正規度に依存して,同じ意味クラスに属する可能性のある3次元マップ要素をグループ化する幾何学的準平面オーバーセグメンテーション法を提案する。また,軽量なセマンティックマップ処理のためのニューラルネットワーク設計について述べる。本システムは,2d-3dネットワークベースのシステムにおいて最先端のセマンティックマッピング品質を実現し,リアルタイム作業中に3つの実屋内データセット上での3次元畳み込みネットワークの性能に適合する。さらに,3d cnnと比較してセンサ間一般化能力が向上し,異なる深度センサを用いたトレーニングや推論が可能となった。コードとデータはプロジェクトページで公開される。 http://jingwenwang95.github.io/SeMLaPS

The availability of real-time semantics greatly improves the core geometric functionality of SLAM systems, enabling numerous robotic and AR/VR applications. We present a new methodology for real-time semantic mapping from RGB-D sequences that combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping. When segmenting a new frame we perform latent feature re-projection from previous frames based on differentiable rendering. Fusing re-projected feature maps from previous frames with current-frame features greatly improves image segmentation quality, compared to a baseline that processes images independently. For 3D map processing, we propose a novel geometric quasi-planar over-segmentation method that groups 3D map elements likely to belong to the same semantic classes, relying on surface normals. We also describe a novel neural network design for lightweight semantic map post-processing. Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems and matches the performance of 3D convolutional networks on three real indoor datasets, while working in real-time. Moreover, it shows better cross-sensor generalization abilities compared to 3D CNNs, enabling training and inference with different depth sensors. Code and data will be released on project page: http://jingwenwang95.github.io/SeMLaPS

翻訳日:2023-06-30 15:28:21 公開日:2023-06-28

# 画像分類における深部ニューラルネットワークのロバスト性は有益か?

Does Saliency-Based Training bring Robustness for Deep Neural Networks in Image Classification? ( http://arxiv.org/abs/2306.16581v1 )

ライセンス: Link先を確認

Ali Karkehabadi

(参考訳) ディープニューラルネットワークは、複雑なパターンを理解し、意思決定する強力なツールである。しかし、そのブラックボックスの性質は内部の動作を完全に理解することを妨げている。オンライン・サリエンシーガイドによるトレーニング手法では、モデルのアウトプットの顕著な特徴を強調してこの問題を緩和しようとするが、視覚的に説明可能な特徴がモデルの頑健さと敵対的な例と一致するかどうかはあいまいである。本稿では,サリエンシートレーニングモデルの脆弱性を逆例法に適用して検討する。モデルはオンラインのsaliency-guided trainingメソッドを使用してトレーニングされ、敵の例の一般的なアルゴリズムに対して評価される。我々はロバスト性を定量化し、モデルの出力によく説明されている可視化にもかかわらず、サルエントモデルが敵の事例攻撃に対する低いパフォーマンスに苦しむと結論づける。

Deep Neural Networks are powerful tools to understand complex patterns and making decisions. However, their black-box nature impedes a complete understanding of their inner workings. While online saliency-guided training methods try to highlight the prominent features in the model's output to alleviate this problem, it is still ambiguous if the visually explainable features align with robustness of the model against adversarial examples. In this paper, we investigate the saliency trained model's vulnerability to adversarial examples methods. Models are trained using an online saliency-guided training method and evaluated against popular algorithms of adversarial examples. We quantify the robustness and conclude that despite the well-explained visualizations in the model's output, the salient models suffer from the lower performance against adversarial examples attacks.

翻訳日:2023-06-30 15:27:59 公開日:2023-06-28

# 温度状態生成のための量子イマジナリー時間伝搬アルゴリズム

Quantum Imaginary Time Propagation algorithm for preparing thermal states ( http://arxiv.org/abs/2306.16580v1 )

ライセンス: Link先を確認

Francesco Turro

(参考訳) 有限温度での計算は、核物理学から凝縮物質まで、様々な科学分野において基本的なものである。想像時間における進化は、量子系の熱状態を作成するための顕著な古典的技法である。本稿では, 量子時間伝搬法に基づく熱状態の生成を, 量子時間演算子の非ユニタリティ特性を克服するために, アンシラ量子ビットを用いた希釈演算子を用いて提案する。提案手法は、一般的なハミルトニアンに対する一般的な量子プロセッサ上の正しい熱密度行列を得ることができる最初の方法である。 2つの中性子系と3つの中性子系の実際の量子ハードウェア計算熱特性の信頼性を証明する。

Calculations at finite temperatures are fundamental in different scientific fields, from nuclear physics to condensed matter. Evolution in imaginary time is a prominent classical technique for preparing thermal states of quantum systems. We propose a new quantum algorithm that prepares thermal states based on the quantum imaginary time propagation method, using a diluted operator with ancilla qubits to overcome the non-unitarity nature of the imaginary time operator. The presented method is the first that allows us to obtain the correct thermal density matrix on a general quantum processor for a generic Hamiltonian. We prove its reliability in the actual quantum hardware computing thermal properties for two and three neutron systems.

翻訳日:2023-06-30 15:27:44 公開日:2023-06-28

# 未知・ランダム・リワードを持つ腕に異種資源を割り当てる

Allocating Divisible Resources on Arms with Unknown and Random Rewards ( http://arxiv.org/abs/2306.16578v1 )

ライセンス: Link先を確認

Ningyuan Chen, Wenhao Li

(参考訳) 我々は,各期間に再生可能かつ分別可能な資源の1つの単位を,複数のアームで割り当てる意思決定者を考える。アームには未知およびランダムな報酬があり、その手段は割り当てられたリソースに比例し、その分散は割り当てられたリソースのオーダー$b$に比例する。特に、ある期間に意思決定者がリソース$a_i$をarm$i$に割り当てると、報酬$y_i$は$y_i(a_i)=a_i \mu_i+a_i^b \xi_{i}$となる。 b$ が 0 から 1 まで変化すると、フレームワークは標準の確率的多腕バンディットとオンライン学習を完全なフィードバックでスムーズに橋渡しする。最適なギャップ依存とギャップ非依存の残差境界を$b\in [0,1]$で設計し,$b=1/2$で相転移を示す。理論的な結果は、重みが分数であり、濾過に適応し、単調なサブガウス確率変数の線形結合を境界とする、新しい濃度不等式にかかっている。

We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is$Y_i(A_i)=A_i \mu_i+A_i^b \xi_{i}$, where $\mu_i$ is the unknown mean and the noise $\xi_{i}$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.

翻訳日:2023-06-30 15:27:33 公開日:2023-06-28

# 石油・ガス供給チェーンにおけるブロックチェーン: ユーザセキュリティとプライバシの観点からの文献レビュー

Blockchain in Oil and Gas Supply Chain: A Literature Review from User Security and Privacy Perspective ( http://arxiv.org/abs/2306.16576v1 )

ライセンス: Link先を確認

Urvashi Kishnani, Srinidhi Madabhushi and Sanchari Das

(参考訳) ブロックチェーンの影響は金融を超えて広がり、不動産、石油、ガス、教育など様々な分野に影響を与えた。この広範なリーチは、デジタルトランザクションとサプライチェーンを確実に管理できるブロックチェーンの本質的な能力に起因する。石油とガス部門では、ブロックチェーンとサプライチェーンの管理とデータ処理の合併が注目に値するトレンドだ。サプライチェーンには、資源の抽出、輸送、取引、流通など、いくつかの事業がある。残念ながら、現在のサプライチェーン構造では、透明性やトレーサビリティ、フレキシブルなトレーサビリティ、セキュアなデータストレージといった重要な機能が欠落しています。それでも、石油・ガス産業におけるブロックチェーンのセキュリティとプライバシの調査は不可欠である。このような精査により、スムーズでセキュアで使用可能なトランザクションの実行が可能になる。本研究は124冊の学術論文をレビューし,21冊の詳細な分析を行った。サプライチェーンフローのさまざまなフェーズ – 上流,中流,下流,データ管理 – との関連性から,記事の分類を行った。サプライチェーンにおける既存のセキュリティとプライバシの空白に対処できるブロックチェーンのポテンシャルにもかかわらず、石油とガスの運用におけるブロックチェーン統合の実践的な実装が大幅に欠如している。この欠如は、従来の方法からブロックチェーン中心のアプローチへの移行に大きく挑戦する。

Blockchain's influence extends beyond finance, impacting diverse sectors such as real estate, oil and gas, and education. This extensive reach stems from blockchain's intrinsic ability to reliably manage digital transactions and supply chains. Within the oil and gas sector, the merger of blockchain with supply chain management and data handling is a notable trend. The supply chain encompasses several operations: extraction, transportation, trading, and distribution of resources. Unfortunately, the current supply chain structure misses critical features such as transparency, traceability, flexible trading, and secure data storage - all of which blockchain can provide. Nevertheless, it is essential to investigate blockchain's security and privacy in the oil and gas industry. Such scrutiny enables the smooth, secure, and usable execution of transactions. For this purpose, we reviewed 124 peer-reviewed academic publications, conducting an in-depth analysis of 21 among them. We classified the articles by their relevance to various phases of the supply chain flow: upstream, midstream, downstream, and data management. Despite blockchain's potential to address existing security and privacy voids in the supply chain, there is a significant lack of practical implementation of blockchain integration in oil and gas operations. This deficiency substantially challenges the transition from conventional methods to a blockchain-centric approach.

翻訳日:2023-06-30 15:27:05 公開日:2023-06-28

# fisher information rateを用いた有限サンプル対称平均推定

Finite-Sample Symmetric Mean Estimation with Fisher Information Rate ( http://arxiv.org/abs/2306.16573v1 )

ライセンス: Link先を確認

Shivam Gupta, Jasper C.H. Lee, Eric Price

(参考訳) 未知の分散の平均$$\sigma^2$ distribution $f$は、分散$\frac{\sigma^2}{n}$とほぼ対応する部分ガウス率を持つ$n$サンプルから推定することができる。 f$ が翻訳まで知られている場合、これは漸近的に $\frac{1}{n\mathcal i}$ に改善され、ここで $\mathcal i$ は分布のフィッシャー情報である。そのような改善は、一般の未知の$f$では不可能であるが、[Stone, 1975] は、この漸近収束$\textit{is}$が、その平均について$f$が$\textit{symmetric}$であれば可能であることを示した。収束に必要な$n$ は、分配 $f$ と失敗確率 $\delta$ に依存する。本稿では、フィッシャー情報の観点から対称平均推定のための有限サンプル保証を与える。すべての$f, n, \delta$ with $n > \log \frac{1}{\delta}$ に対して、分散$\frac{1}{n \mathcal I_r}$ に近い収束を得る。そのような境界は、既知の$f$設定における有限サンプル保証と本質的に一致する。

The mean of an unknown variance-$\sigma^2$ distribution $f$ can be estimated from $n$ samples with variance $\frac{\sigma^2}{n}$ and nearly corresponding subgaussian rate. When $f$ is known up to translation, this can be improved asymptotically to $\frac{1}{n\mathcal I}$, where $\mathcal I$ is the Fisher information of the distribution. Such an improvement is not possible for general unknown $f$, but [Stone, 1975] showed that this asymptotic convergence $\textit{is}$ possible if $f$ is $\textit{symmetric}$ about its mean. Stone's bound is asymptotic, however: the $n$ required for convergence depends in an unspecified way on the distribution $f$ and failure probability $\delta$. In this paper we give finite-sample guarantees for symmetric mean estimation in terms of Fisher information. For every $f, n, \delta$ with $n > \log \frac{1}{\delta}$, we get convergence close to a subgaussian with variance $\frac{1}{n \mathcal I_r}$, where $\mathcal I_r$ is the $r$-$\textit{smoothed}$ Fisher information with smoothing radius $r$ that decays polynomially in $n$. Such a bound essentially matches the finite-sample guarantees in the known-$f$ setting.

翻訳日:2023-06-30 15:26:43 公開日:2023-06-28

# 実時間および虚数時間における量子および古典シミュレーションのための複合qdrift-product公式

Composite QDrift-Product Formulas for Quantum and Classical Simulations in Real and Imaginary Time ( http://arxiv.org/abs/2306.16572v1 )

ライセンス: Link先を確認

Matthew Pocrnic, Matthew Hagan, Juan Carrasquilla, Dvira Segal, Nathan Wiebe

(参考訳) 最近の研究は、与えられたシミュレーション問題に対してハミルトン$H$をサブセットの$A$と$B$に分割し、$H=A+B$をトロッタースズキチャネルでシミュレートし、QDriftアルゴリズムを介して$B$項をランダムにサンプリングする合成チャネルを実装するのが有利であることを示している。ここでは、このアプローチが虚数時間で成り立つことを示し、量子モンテカルロ計算の古典的アルゴリズム候補となる。虚数時間QDriftと複合チャネルの両方において、Schatten-$1 \to 1$ normを上界する。もう一つの最近の結果は、有限格子上で定義される系に対する幾何学的局所的相互作用を含むハミルトンのシミュレーションが、リーブ・ロビンソンの議論を用いて格子の部分集合上で支持される項のみを含む部分集合に$h$を分解することで改善できることを示した。ここでは,この結果と複合的手法を併用した量子アルゴリズムを ``local composite channel' に提供し,ダイヤモンド距離を上界に設定する。 e^{-ih_j t}$ と $e^{-h_j \beta}$ の形のゲート数を計算してアルゴリズムコストの正確な数値シミュレーションを行い、一定の誤差許容値 $\epsilon$ を満たす。我々は、様々な興味深いハミルトニアンに対して定数因子の利点を示し、その最大値は、ジェリウムのシミュレーションで起こる約20ドルの速度アップである。

Recent work has shown that it can be advantageous to implement a composite channel that partitions the Hamiltonian $H$ for a given simulation problem into subsets $A$ and $B$ such that $H=A+B$, where the terms in $A$ are simulated with a Trotter-Suzuki channel and the $B$ terms are randomly sampled via the QDrift algorithm. Here we show that this approach holds in imaginary time, making it a candidate classical algorithm for quantum Monte-Carlo calculations. We upper-bound the induced Schatten-$1 \to 1$ norm on both imaginary-time QDrift and Composite channels. Another recent result demonstrated that simulations of Hamiltonians containing geometrically-local interactions for systems defined on finite lattices can be improved by decomposing $H$ into subsets that contain only terms supported on that subset of the lattice using a Lieb-Robinson argument. Here, we provide a quantum algorithm by unifying this result with the composite approach into ``local composite channels" and we upper bound the diamond distance. We provide exact numerical simulations of algorithmic cost by counting the number of gates of the form $e^{-iH_j t}$ and $e^{-H_j \beta}$ to meet a certain error tolerance $\epsilon$. We show constant factor advantages for a variety of interesting Hamiltonians, the maximum of which is a $\approx 20$ fold speedup that occurs for a simulation of Jellium.

翻訳日:2023-06-30 15:26:01 公開日:2023-06-28

# 終端イベントの存在下でのリカレントイベントの予測数に対する因果推論

Causal inference for the expected number of recurrent events in the presence of a terminal event ( http://arxiv.org/abs/2306.16571v1 )

ライセンス: Link先を確認

Benjamin R. Baer, Robert L. Strawderman, Ashkan Ertefaie

(参考訳) 終端イベントの存在下での繰り返し事象の因果推論と効率的な推定について検討した。我々は,予測回数の繰り返しイベントと,ランドマーク時間列に沿って評価された障害生存関数の両方からなるベクトルとして推定値を定義する。ランダムに粗大化下で機能する観測データとして、右検閲と因果選択の存在下での推定を同定し、非パラメトリック効率境界を導出し、境界を達成し、迷惑パラメータの非パラメトリック推定を可能にするマルチプライロバスト推定器を提案する。全体として、失敗、検閲、観察されたデータの基本的な確率分布について絶対連続性の仮定は行われない。さらに、粗い分布が分かっていれば影響関数のクラスを導出し、そのクラスに属することができるのかをレビューする。その過程で,因果寿命分析文献における興味深い不一致を浮き彫りにする。

We study causal inference and efficient estimation for the expected number of recurrent events in the presence of a terminal event. We define our estimand as the vector comprising both the expected number of recurrent events and the failure survival function evaluated along a sequence of landmark times. We identify the estimand in the presence of right-censoring and causal selection as an observed data functional under coarsening at random, derive the nonparametric efficiency bound, and propose a multiply-robust estimator that achieves the bound and permits nonparametric estimation of nuisance parameters. Throughout, no absolute continuity assumption is made on the underlying probability distributions of failure, censoring, or the observed data. Additionally, we derive the class of influence functions when the coarsening distribution is known and review how published estimators may belong to the class. Along the way, we highlight some interesting inconsistencies in the causal lifetime analysis literature.

翻訳日:2023-06-30 15:24:58 公開日:2023-06-28

# cpu上のトランスフォーマー言語モデルのための効率的なスパース推論ソフトウェアアクセラレータ

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs ( http://arxiv.org/abs/2306.16601v1 )

ライセンス: Link先を確認

Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, Yi Ding, Yu Luo, Hanwen Chang, Qun Gao, Ziheng Wang, Guy Boudoukh, and Moshe Wasserblat

(参考訳) 近年,トランスフォーマーに基づく言語モデルが自然言語処理タスクの標準的アプローチとなっている。しかし、産業アプリケーションにおける厳格なスループットとレイテンシ要件は採用を制限している。このギャップを軽減するために、構造化プルーニングのようなモデル圧縮技術が推論効率を改善するために使用されている。しかし、既存のほとんどのニューラルネットワーク推論ランタイムは、構造化されたスパーシリティを適切にサポートしていない。本稿では,トランスフォーマーに基づく言語モデルに対して,重みを一定のブロックサイズで刈り取る,効率的なスパース深層学習ソフトウェアスタックを提案する。我々のスパースソフトウェアアクセラレータは、Intel Deep Learning Boostを活用してスパースマトリックス(一般にSpMMと略される)の性能を最大化する。我々のSpMMカーネルは,既存のスパースライブラリ (oneMKL, TVM, LIBXSMM) を5つの代表空間比 (70%, 75%, 80%, 85%, 90%) 以下のGEMM形状で桁違いに処理する。さらに、当社のSpMMカーネルは、業界で広く使われている高度ライブラリであるOneDNNの高密度GEMMカーネルよりも最大5倍高速化されている。スパースアクセラレータを,Bert-Mini, DistilBERT, Bert-Base, BERT-Largeなど,広く使われているTransformerベースの言語モデルに適用する。当社のスパース推論ソフトウェアは,Amazon Web Services上のXeonと同じ構成で,Neural MagicのDeepsparseよりも1.5倍のスピードアップを実現しています。我々はまた、私たちのソリューションを、ONNX RuntimeとPyTorchという2つのフレームワークベースの推論ソリューションと比較し、レイテンシ制約の下で、ONNX Runtimeの最大37倍のスピードアップとXeonのPyTorchの最大345倍のスピードアップを示します。ソースコードはすべてgithubで公開されている。 https://github.com/intel/intel-extension-for-transformers。

In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To mitigate the gap, model compression techniques such as structured pruning are being used to improve inference efficiency. However, most existing neural network inference runtimes lack adequate support for structured sparsity. In this paper, we propose an efficient sparse deep learning inference software stack for Transformer-based language models where the weights are pruned with constant block size. Our sparse software accelerator leverages Intel Deep Learning Boost to maximize the performance of sparse matrix - dense matrix multiplication (commonly abbreviated as SpMM) on CPUs. Our SpMM kernel outperforms the existing sparse libraries (oneMKL, TVM, and LIBXSMM) by an order of magnitude on a wide range of GEMM shapes under 5 representative sparsity ratios (70%, 75%, 80%, 85%, 90%). Moreover, our SpMM kernel shows up to 5x speedup over dense GEMM kernel of oneDNN, a well-optimized dense library widely used in industry. We apply our sparse accelerator on widely-used Transformer-based language models including Bert-Mini, DistilBERT, Bert-Base, and BERT-Large. Our sparse inference software shows up to 1.5x speedup over Neural Magic's Deepsparse under same configurations on Xeon on Amazon Web Services under proxy production latency constraints. We also compare our solution with two framework-based inference solutions, ONNX Runtime and PyTorch, and demonstrate up to 37x speedup over ONNX Runtime and 345x over PyTorch on Xeon under the latency constraints. All the source code is publicly available on Github: https://github.com/intel/intel-extension-for-transformers.

翻訳日:2023-06-30 15:16:11 公開日:2023-06-28

# 浮遊粒子の近量子制限速度分布の観察

Observation of near-quantum-limited velocity distributions of a levitated particle ( http://arxiv.org/abs/2306.16598v1 )

ライセンス: Link先を確認

M. Kamba and K. Aikawa

(参考訳) 超低温浮遊ナノ粒子の飛行時間測定を実証し、量子状態における翻訳速度を明らかにする。繰り返し測定により得られた速度分布は,ナノ粒子の液状化運動により著しく拡大することがわかった。すべてのリリレー運動に対するフィードバック冷却の下で、占有数からの期待値と合理的に一致する速度分布を量子限界の約2倍の幅で回復する。振動中心と質量中心との偏差はナノ粒子の非対称性によって引き起こされるため, 振動運動の翻訳運動に対する強い影響は理解されている。その結果、振動運動の制御の重要性が解明され、浮遊ナノ粒子の速度の観点から量子力学的性質を探求する基礎が確立された。

We demonstrate time-of-flight measurements for an ultracold levitated nanoparticle and reveal its translational velocity in the quantum regime. We discover that the velocity distributions obtained with repeated measurements are significantly broadened via librational motions of the nanoparticle. Under feedback cooling on all the librational motions, we recover the velocity distributions in reasonable agreement with an expectation from the occupation number, with approximately twice the width of the quantum limit. The strong impact of librational motions on the translational motions is understood as a result of the deviation between the libration center and the center of mass, induced by the asymmetry of the nanoparticle. Our results elucidate the importance of the control over librational motions and establish the basis for exploring quantum mechanical properties of levitated nanoparticles in terms of their velocity.

翻訳日:2023-06-30 15:15:39 公開日:2023-06-28

# PFB-Diff:テキスト駆動画像編集のための進行的特徴ブレンディング拡散

PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing ( http://arxiv.org/abs/2306.16894v1 )

ライセンス: Link先を確認

Wenjing Huang, Shikui Tu, Lei Xu

(参考訳) 拡散モデルは、多彩で高品質な画像を合成する優れた能力を示し、実際の画像編集への応用への関心を喚起している。しかしながら、局所的な画像編集のための既存の拡散ベースのアプローチは、ノイズの多い対象画像と拡散潜性変数のピクセルレベルでのブレンドによって、望ましくないアーティファクトに苦しむことが多い。そこで本研究では拡散型画像編集のためのプログレッシブ機能ブレンド手法であるpfb-diffを提案する。従来の方法とは異なり、PFB-Diffはマルチレベルの特徴ブレンディングを通じてテキスト誘導された生成コンテンツをターゲット画像にシームレスに統合する。深い特徴を符号化したリッチなセマンティックスと、高度から低レベルのプログレッシブブレンディングスキームは、編集画像のセマンティックコヒーレンスと高品質を保証します。また,クロスアテンション層に注意マスキング機構を導入し,特定の単語が所望の領域に与える影響を限定し,背景編集の性能をさらに向上させる。 PFB-Diffは、オブジェクト/バックグラウンド置換やオブジェクト属性編集など、様々な編集タスクに効果的に対処できる。本手法は,画像の忠実性,編集精度,効率性,およびオリジナル画像に対する忠実性において,微調整やトレーニングを必要とせずに優れた性能を示す。

Diffusion models have showcased their remarkable capability to synthesize diverse and high-quality images, sparking interest in their application for real image editing. However, existing diffusion-based approaches for local image editing often suffer from undesired artifacts due to the pixel-level blending of the noised target images and diffusion latent variables, which lack the necessary semantics for maintaining image consistency. To address these issues, we propose PFB-Diff, a Progressive Feature Blending method for Diffusion-based image editing. Unlike previous methods, PFB-Diff seamlessly integrates text-guided generated content into the target image through multi-level feature blending. The rich semantics encoded in deep features and the progressive blending scheme from high to low levels ensure semantic coherence and high quality in edited images. Additionally, we introduce an attention masking mechanism in the cross-attention layers to confine the impact of specific words to desired regions, further improving the performance of background editing. PFB-Diff can effectively address various editing tasks, including object/background replacement and object attribute editing. Our method demonstrates its superior performance in terms of image fidelity, editing accuracy, efficiency, and faithfulness to the original image, without the need for fine-tuning or training.

翻訳日:2023-06-30 13:39:22 公開日:2023-06-28

# 重み付け最適化軌道による対人訓練の強化

Enhancing Adversarial Training via Reweighting Optimization Trajectory ( http://arxiv.org/abs/2306.14275v2 )

ライセンス: Link先を確認

Tianjin Huang, Shiwei Liu, Tianlong Chen, Meng Fang, Li Shen, Vlaod Menkovski, Lu Yin, Yulong Pei and Mykola Pechenizkiy

(参考訳) 敵対的トレーニングがディープニューラルネットワークの堅牢性向上のデファクト手法になっているにもかかわらず、バニラ対人トレーニングが頑強なオーバーフィッティングに悩まされ、満足のいく堅牢な一般化をもたらすことはよく知られている。これらの欠点に対処するいくつかのアプローチが提案されている。例えば、余分な正規化、敵の重みの摂動、そして過去数年間のさらなるデータによるトレーニングなどである。しかし、強固な一般化改善はまだ十分ではない。本稿では,この課題に新たな視点でアプローチし,歴史的最適化の軌跡を整理する。本稿では, 時間内学習の最適化トラジェクトリを利用する「textbf{Weighted Optimization Trajectories (WOT)」という新しい手法を提案する。我々は,様々な対人攻撃におけるWOTの有効性を実証するための広範囲な実験を行った。以上の結果から,wotは既存の対向訓練手法とシームレスに統合され,強固なオーバーフィッティング問題を一貫して克服し,対向ロバスト性が向上した。例えば、WOTはAA-$L_{\infty}$アタックのAT-PGDのロバスト精度を1.53\%$\sim$6.11\%向上させ、一方SVHN、CIFAR-10、CIFAR-100、Tiny-ImageNetデータセットのクリーン精度を0.55\%$\sim$5.47\%向上させる。

Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. A number of approaches have been proposed to address these drawbacks such as extra regularization, adversarial weights perturbation, and training with more data over the last few years. However, the robust generalization improvement is yet far from satisfactory. In this paper, we approach this challenge with a brand new perspective -- refining historical optimization trajectories. We propose a new method named \textbf{Weighted Optimization Trajectories (WOT)} that leverages the optimization trajectories of adversarial training in time. We have conducted extensive experiments to demonstrate the effectiveness of WOT under various state-of-the-art adversarial attacks. Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue, resulting in better adversarial robustness. For example, WOT boosts the robust accuracy of AT-PGD under AA-$L_{\infty}$ attack by 1.53\% $\sim$ 6.11\% and meanwhile increases the clean accuracy by 0.55\%$\sim$5.47\% across SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.

翻訳日:2023-06-30 10:21:18 公開日:2023-06-28

# ShuttleSet22: ストロークレベルバドミントンデータセットによるストローク予測のベンチマーク

ShuttleSet22: Benchmarking Stroke Forecasting with Stroke-Level Badminton Dataset ( http://arxiv.org/abs/2306.15664v2 )

ライセンス: Link先を確認

Wei-Yao Wang, Wei-Wei Du, Wen-Chih Peng

(参考訳) 近年、人工知能の進歩とデータ収集の効率化により、バドミントン分析が注目を集めている。プレイヤーのパフォーマンスを改善し、調査するための効果的なアプリケーションはいくつかあるが、バドミントン領域以外の研究者に使用できる公開バドミントンデータセットはわずかである。既存のバドミントンシングルスデータセットは特定のマッチアップに焦点を当てているが、異なるプレイヤーや様々なマッチアップに関する包括的な研究は提供できない。本稿では,バドミントン・シングルス・データセットであるshuttleset22を2022年に高位の試合から収集した。 shuttleset22はトレーニングセット2,888回の30,172ストローク、バリデーションセット450回の1,400ストローク、ラリー内の詳細なストロークレベルメタデータを備えたテストセット654の2,040ストロークで構成される。 shuttleset22で既存の作業をベンチマークするために、shuttlenetという最先端のストローク予測手法をテストし、対応するストローク予測タスク、すなわち各ラリーの所定のストロークに基づいて将来のストロークを予測する。また、coachai badminton challenge 2023で、バドミントン集会における今後のターンベースのストロークを予測することで、この問題に取り組む研究者を増やそうとしています。ベースラインコードとデータセットはhttps://github.com/wywyWang/CoachAI-Projects/tree/main/CoachAI-Challenge-IJCAI2023で公開される。

In recent years, badminton analytics has drawn attention due to the advancement of artificial intelligence and the efficiency of data collection. While there is a line of effective applications to improve and investigate player performance, there are only a few public badminton datasets that can be used for researchers outside the badminton domain. Existing badminton singles datasets focus on specific matchups; however, they cannot provide comprehensive studies on different players and various matchups. In this paper, we provide a badminton singles dataset, ShuttleSet22, which is collected from high-ranking matches in 2022. ShuttleSet22 consists of 30,172 strokes in 2,888 rallies in the training set, 1,400 strokes in 450 rallies in the validation set, and 2,040 strokes in 654 rallies in the testing set with detailed stroke-level metadata within a rally. To benchmark existing work with ShuttleSet22, we test the state-of-the-art stroke forecasting approach, ShuttleNet, with the corresponding stroke forecasting task, i.e., predict the future strokes based on the given strokes of each rally. We also hold a challenge, Track 2: Forecasting Future Turn-Based Strokes in Badminton Rallies, at CoachAI Badminton Challenge 2023 to boost researchers to tackle this problem. The baseline codes and the dataset will be made available on https://github.com/wywyWang/CoachAI-Projects/tree/main/CoachAI-Challenge-IJCAI2023.

翻訳日:2023-06-30 10:11:05 公開日:2023-06-28

# 非対称幾何散乱変換によるグラフニューラルネットワークの理解

Understanding Graph Neural Networks with Asymmetric Geometric Scattering Transforms ( http://arxiv.org/abs/1911.06253v4 )

ライセンス: Link先を確認

Michael Perlmutter and Alexander Tong and Feng Gao and Guy Wolf and Matthew Hirn

(参考訳) 散乱変換は、畳み込みニューラルネットワークのモデルとして機能する多層ウェーブレットベースのディープラーニングアーキテクチャである。近年、グラフのような非ユークリッド的な設定に対する散乱変換の一般化がいくつか提案されている。我々の研究は、非対称ウェーブレットの非常に一般的なクラスに基づくグラフに対して、窓付きおよび非窓型の幾何学的散乱変換を導入することで、これらの構成に基づいている。これらの非対称グラフ散乱変換は、対称グラフ散乱変換と多くの理論的保証を持つことを示す。その結果、提案手法は既存のグラフ散乱アーキテクチャの多くに対する既知の理論結果を統一し、拡張する。この研究は、幾何学的散乱と他のグラフニューラルネットワークとのギャップを埋めるのに役立ち、証明可能な安定性と不変性を保証する大きなネットワーク群を導入する。これらの結果は、フィルタを学習したグラフ構造化データのための将来のディープラーニングアーキテクチャの基礎となり、確実に望ましい理論的特性を持つ。

The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.

翻訳日:2023-06-29 19:03:02 公開日:2023-06-28

# 時間的および意思決定タスク用に訓練された繰り返しニューラルネットワークにおける重み初期化、解の多様性、劣化の探求

Exploring weight initialization, diversity of solutions, and degradation in recurrent neural networks trained for temporal and decision-making tasks ( http://arxiv.org/abs/1906.01094v6 )

ライセンス: Link先を確認

Cecilia Jarne and Rodrigo Laje

(参考訳) リカレントニューラルネットワーク(Recurrent Neural Networks, RNN)は、脳機能と構造をモデル化するために頻繁に使用される。本研究では,時間変化刺激による時間・流れ制御タスクを行うために,小型完全接続型RNNを訓練した。また,ネットワークサイズが小さくなったり,間隔が長くなったり,接続障害が大きくなったりすることで,異なるRNNが同じ課題を解くことができることを示す。検討した課題に対して,タスクパラメータ化により学習後に得られるネットワークがいかに堅牢かを検討した。その過程で,計算神経科学における他の関心課題をパラメータ化するためのフレームワークを開発した。この結果は、通常ブラックボックスとして用いられ、大脳皮質領域の生物学的応答をモデル化するために理解する必要があるモデルの異なる側面を定量化するのに有用である。

Recurrent Neural Networks (RNNs) are frequently used to model aspects of brain function and structure. In this work, we trained small fully-connected RNNs to perform temporal and flow control tasks with time-varying stimuli. Our results show that different RNNs can solve the same task by converging to different underlying dynamics and also how the performance gracefully degrades as either network size is decreased, interval duration is increased, or connectivity damage is increased. For the considered tasks, we explored how robust the network obtained after training can be according to task parameterization. In the process, we developed a framework that can be useful to parameterize other tasks of interest in computational neuroscience. Our results are useful to quantify different aspects of the models, which are normally used as black boxes and need to be understood in order to model the biological response of cerebral cortex areas.

翻訳日:2023-06-29 19:02:50 公開日:2023-06-28

# 多クラス分類のためのラベル分布ロバスト損失:一貫性,ロバスト性,適応性

Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity ( http://arxiv.org/abs/2112.14869v4 )

ライセンス: Link先を確認

Dixian Zhu, Yiming Ying and Tianbao Yang

(参考訳) 本研究では,分布的ロバスト最適化(dro)の観点から定式化した多クラス分類のためのラベル分散ロバスト(ldr)損失と呼ばれる損失関数の族について検討する。この観点の利点はいくつかあります。 i) 古典的クロスエントロピー(CE)損失とSVM損失とその変種を説明する統一的なフレームワークを提供する。 (ii)広く採用されているが、よく理解されていない温度スケールのce損失に対応する特殊ファミリーを含む。 (iii)インスタンスレベルでラベル情報の不確実性度に適応することができる。 Our contributions include: (1) we study both consistency and robustness by establishing top-$k$ ($\forall k\geq 1$) consistency of LDR losses for multi-class classification, and a negative result that a top-$1$ consistent and symmetric robust loss cannot achieve top-$k$ consistency simultaneously for all $k\geq 2$; (2) we propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance; (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy label and 1 clean settings against 13 loss functions, and on one real-world noisy dataset. コードは \url{https://github.com/Optimization-AI/ICML2023_LDR} でオープンソース化されている。

We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification that are formulated from distributionally robust optimization (DRO) perspective, where the uncertainty in the given label information are modeled and captured by taking the worse case of distributional weights. The benefits of this perspective are several fold: (i) it provides a unified framework to explain the classical cross-entropy (CE) loss and SVM loss and their variants, (ii) it includes a special family corresponding to the temperature-scaled CE loss, which is widely adopted but poorly understood; (iii) it allows us to achieve adaptivity to the uncertainty degree of label information at an instance level. Our contributions include: (1) we study both consistency and robustness by establishing top-$k$ ($\forall k\geq 1$) consistency of LDR losses for multi-class classification, and a negative result that a top-$1$ consistent and symmetric robust loss cannot achieve top-$k$ consistency simultaneously for all $k\geq 2$; (2) we propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance; (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy label and 1 clean settings against 13 loss functions, and on one real-world noisy dataset. The code is open-sourced at \url{https://github.com/Optimization-AI/ICML2023_LDR}.

翻訳日:2023-06-29 19:02:02 公開日:2023-06-28

# 連立多目的・多目的最適化のためのレバレッジ信頼

Leveraging Trust for Joint Multi-Objective and Multi-Fidelity Optimization ( http://arxiv.org/abs/2112.13901v3 )

ライセンス: Link先を確認

Faran Irshad, Stefan Karsch and Andreas D\"opp

(参考訳) 本稿では,費用対評価システムの効率的な最適化を追求するために,ベイジアン多目的多忠実度最適化(MOMF)の新たなアプローチを提案する。従来の最適化手法は効果的であるが、しばしば1つ以上の目的の多次元最適化において非常に高いコストに直面する。マルチフィデリティアプローチは、低解像度シミュレーションなどの低コストな複数の情報ソースを活用することで、潜在的な改善を提供する。しかし、これら2つの戦略を統合することは大きな課題である。複数目的とデータソースの同時最適化を支援するため,信頼度基準の革新的利用を提案する。提案手法は,パレート最適化問題において,評価コスト当たりの信頼ゲインを1つの目的として組み込むための多目的最適化ポリシーを修正し,低コストで同時MOMFを実現する。本稿では,入力パラメータと信頼パラメータを併用して選択する総合的アプローチと,ベンチマークのための逐次的アプローチの2つのMOMF最適化手法を提案する。合成試験関数のベンチマークにより,本手法は,純多目的最適化と比較して最大1桁の大幅なコスト削減をもたらすことが示された。さらに,信頼ドメインと客観的ドメインの協調最適化は,それらを逐次的に処理する上で優れていた。レーザプラズマ加速シミュレーションの最適化を応用し, 高コストブラックボックス関数のパレート最適化における本手法の可能性を示す。既存のベイズフレームワークでこれらのメソッドを実装するのは簡単で、バッチ最適化に簡単に拡張できる。種々の連続的あるいは離散的忠実度次元を扱う能力により,プラズマ物理学や流体力学などの分野におけるシミュレーション問題に対する幅広い適用性を提供する。

In the pursuit of efficient optimization of expensive-to-evaluate systems, this paper investigates a novel approach to Bayesian multi-objective and multi-fidelity (MOMF) optimization. Traditional optimization methods, while effective, often encounter prohibitively high costs in multi-dimensional optimizations of one or more objectives. Multi-fidelity approaches offer potential remedies by utilizing multiple, less costly information sources, such as low-resolution simulations. However, integrating these two strategies presents a significant challenge. We suggest the innovative use of a trust metric to support simultaneous optimization of multiple objectives and data sources. Our method modifies a multi-objective optimization policy to incorporate the trust gain per evaluation cost as one objective in a Pareto optimization problem, enabling simultaneous MOMF at lower costs. We present and compare two MOMF optimization methods: a holistic approach selecting both the input parameters and the trust parameter jointly, and a sequential approach for benchmarking. Through benchmarks on synthetic test functions, our approach is shown to yield significant cost reductions - up to an order of magnitude compared to pure multi-objective optimization. Furthermore, we find that joint optimization of the trust and objective domains outperforms addressing them in sequential manner. We validate our results using the use case of optimizing laser-plasma acceleration simulations, demonstrating our method's potential in Pareto optimization of high-cost black-box functions. Implementing these methods in existing Bayesian frameworks is simple, and they can be readily extended to batch optimization. With their capability to handle various continuous or discrete fidelity dimensions, our techniques offer broad applicability in solving simulation problems in fields such as plasma physics and fluid dynamics.

翻訳日:2023-06-29 19:01:33 公開日:2023-06-28

# ランダムスパルシファイド勾配による微分プライベートsgdの改善

Improving Differentially Private SGD via Randomly Sparsified Gradients ( http://arxiv.org/abs/2112.00845v3 )

ライセンス: Link先を確認

Junyi Zhu, Matthew B. Blaschko

(参考訳) 個人差分的確率勾配勾配(DP-SGD)は、個々の勾配の最大ノルムと付加等方性ガウス雑音を束縛するために、厳密に定義されたプライバシーを提供するために、ディープラーニングにおいて広く採用されている。非凸状態でのdp-sgdの収束速度の解析により、クリッピングとノイズ化前の勾配をランダムにスパース化することで、収束境界の内部成分間のトレードオフを調整し、ノイズが支配的である場合の上限を小さくする。さらに, 理論的解析および実証評価の結果, トレードオフは自明なものではなく, DP-SGDのユニークな特性である可能性が示唆された。この観測は、DP-SGDが(単純なランダムな)勾配圧縮に固有の空間を持っていることを暗示している。そこで我々は,DP-SGDを強化するためにランダムスペーシフィケーション(RS)を用いた効率的で軽量な拡張手法を提案する。様々なDP-SGDフレームワークを用いた実験では、RSはパフォーマンスを向上させることができる。さらに、生成したRSのスパース勾配は、通信コストの削減と、プライベート機械学習の重要な問題であるリコンストラクション攻撃に対するプライバシー強化の利点を示す。

Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy, which requires gradient clipping to bound the maximum norm of individual gradients and additive isotropic Gaussian noise. With analysis of the convergence rate of DP-SGD in a non-convex setting, we identify that randomly sparsifying gradients before clipping and noisification adjusts a trade-off between internal components of the convergence bound and leads to a smaller upper bound when the noise is dominant. Additionally, our theoretical analysis and empirical evaluations show that the trade-off is not trivial but possibly a unique property of DP-SGD, as either canceling noisification or gradient clipping eliminates the trade-off in the bound. This observation is indicative, as it implies DP-SGD has special inherent room for (even simply random) gradient compression. To verify the observation and utilize it, we propose an efficient and lightweight extension using random sparsification (RS) to strengthen DP-SGD. Experiments with various DP-SGD frameworks show that RS can improve performance. Additionally, the produced sparse gradients of RS exhibit advantages in reducing communication cost and strengthening privacy against reconstruction attacks, which are also key problems in private machine learning.

翻訳日:2023-06-29 19:01:03 公開日:2023-06-28

# 量子コンピューティングは、製造欠陥のある平面量子ビット列上でスケーラブルである

Quantum computing is scalable on a planar array of qubits with fabrication defects ( http://arxiv.org/abs/2111.06432v3 )

ライセンス: Link先を確認

Armands Strikis, Simon C. Benjamin and Benjamin J. Brown

(参考訳) 大規模なアルゴリズムをうまく実行するためには、量子コンピュータは基本操作をほぼ完璧に実行する必要がある。これは、全ての物理量子ビットがかなりのノイズに悩まされるため、根本的な問題である。さらに、実数系は有限収率を持つ可能性があり、例えば、複雑な装置の成分の非ゼロの割合は、製造段階で無意味に破られる可能性がある。本稿では,有限生成欠陥密度を持つ2次元ノイズ量子ビット列を用いて,任意に大きな量子計算を失敗の消滅確率で完了できることを示すしきい値定理を提案する。証明を完了するために,不活性量子ビットの広い領域を補償する高重安定化器を測定する頑健なプロトコルを導入する。我々はsurface code architectureを用いて結果を得た。したがって、我々のアプローチは、大規模量子コンピュータを構築するための実験的な取り組みと容易に対応できる。

To successfully execute large-scale algorithms, a quantum computer will need to perform its elementary operations near perfectly. This is a fundamental challenge since all physical qubits suffer a considerable level of noise. Moreover, real systems are likely to have a finite yield, i.e. some non-zero proportion of the components in a complex device may be irredeemably broken at the fabrication stage. We present a threshold theorem showing that an arbitrarily large quantum computation can be completed with a vanishing probability of failure using a two-dimensional array of noisy qubits with a finite density of fabrication defects. To complete our proof we introduce a robust protocol to measure high-weight stabilizers to compensate for large regions of inactive qubits. We obtain our result using a surface code architecture. Our approach is therefore readily compatible with ongoing experimental efforts to build a large-scale quantum computer.

翻訳日:2023-06-29 19:00:38 公開日:2023-06-28

# ブロックランチョスを用いた行列積状態における多重励起の直接解法

Direct solution of multiple excitations in a matrix product state with block Lanczos ( http://arxiv.org/abs/2109.08181v3 )

ライセンス: Link先を確認

Thomas E. Baker, Alexandre Foley, and David S\'en\'echal

(参考訳) 行列積状態法は局所的ガッピングハミルトニアンの基底状態、特に1次元の計算に効率的であることが知られている。我々は,多目的密度行列再正規化群法を導入し,多くの励起を持つ束行列積状態に作用する。ブロックまたはバンド付きlanczosアルゴリズムを使用することで、励起束の同時、変動最適化が可能になる。この手法はハイゼンベルクモデルや他の興味のあるケースで示される。多数の励起は鎖全体で非常に信頼性の高い局所観測可能な小さな結合次元で得ることができる。

Matrix product state methods are known to be efficient for computing ground states of local, gapped Hamiltonians, particularly in one dimension. We introduce the multi-targeted density matrix renormalization group method that acts on a bundled matrix product state, holding many excitations. The use of a block or banded Lanczos algorithm allows for the simultaneous, variational optimization of the bundle of excitations. The method is demonstrated on a Heisenberg model and other cases of interest. A large of number of excitations can be obtained at a small bond dimension with highly reliable local observables throughout the chain.

翻訳日:2023-06-29 19:00:22 公開日:2023-06-28

# DeepSMile: 大腸癌および乳癌におけるH&E全スライディング画像から直接MSIとHRDを分類する対照的な自己監督前訓練効果

DeepSMILE: Contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer ( http://arxiv.org/abs/2107.09405v3 )

ライセンス: Link先を確認

Yoni Schirris, Efstratios Gavves, Iris Nederlof, Hugo Mark Horlings, Jonas Teuwen

(参考訳) 本稿では,Hematoxylin と Eosin (H&E) のスライディング画像全体 (WSI) を解析するための深層学習に基づく弱いラベル学習法を提案する。我々はdeepsmileを相同組換え欠損症(hrd)とマイクロサテライト不安定症(msi)のタスクに適用する。対照的自己教師付き学習を用いて,癌組織の病理組織学タイルの特徴抽出装置を事前学習する。さらに,腫瘍の多様性をモデル化しながら,可変性に着目したディープマルチインスタンス学習を用いてタイル特徴集合関数を学習する。 TCGA-CRC (n=360) の腫瘍診断および色調正常化サブセットにおけるMSI予測は,本提案したDeepSMILE法と同等にタイル管理ベースラインを0.77AUROCから0.87AUROCに改善する。 TCGA-BC (n=1041) では、DeepSMILE は、自己監督またはImageNet事前訓練された特徴抽出器によるタイル管理と比較して、RDD分類性能を 0.77 から 0.81 AUROC に改善した。提案手法は,両データセットのラベル付きデータの40%のみを用いて,ベースライン性能に達する。これらの改善は、組織病理領域における複数のインスタンス学習を組み合わせた標準的な自己教師付き学習技術を用いて、ラベル付きデータの少ないゲノムラベル分類性能を向上させることを示唆する。

We propose a Deep learning-based weak label learning method for analyzing whole slide images (WSIs) of Hematoxylin and Eosin (H&E) stained tumor tissue not requiring pixel-level or tile-level annotations using Self-supervised pre-training and heterogeneity-aware deep Multiple Instance LEarning (DeepSMILE). We apply DeepSMILE to the task of Homologous recombination deficiency (HRD) and microsatellite instability (MSI) prediction. We utilize contrastive self-supervised learning to pre-train a feature extractor on histopathology tiles of cancer tissue. Additionally, we use variability-aware deep multiple instance learning to learn the tile feature aggregation function while modeling tumor heterogeneity. For MSI prediction in a tumor-annotated and color normalized subset of TCGA-CRC (n=360 patients), contrastive self-supervised learning improves the tile supervision baseline from 0.77 to 0.87 AUROC, on par with our proposed DeepSMILE method. On TCGA-BC (n=1041 patients) without any manual annotations, DeepSMILE improves HRD classification performance from 0.77 to 0.81 AUROC compared to tile supervision with either a self-supervised or ImageNet pre-trained feature extractor. Our proposed methods reach the baseline performance using only 40% of the labeled data on both datasets. These improvements suggest we can use standard self-supervised learning techniques combined with multiple instance learning in the histopathology domain to improve genomic label classification performance with fewer labeled data.

翻訳日:2023-06-29 19:00:14 公開日:2023-06-28

# サブ線形深さとエネルギーのしきい値回路の指数下限

Exponential Lower Bounds for Threshold Circuits of Sub-Linear Depth and Energy ( http://arxiv.org/abs/2107.00223v2 )

ライセンス: Link先を確認

Kei Uchizawa and Haruki Abe

(参考訳) 本稿では,しきい値回路の計算能力とニューラルネットワークの他の理論モデルについて,サイズ(ゲート数)、深さ、重み、エネルギーの4つの複雑性尺度を用いて検討する。ここで、回路のエネルギー複雑性は計算の空間性を測定し、全ての入力代入に対して非ゼロの値を出力する最大ゲート数として定義される。主な結果として、任意のしきい値回路$C$ of size $s$, depth $d$, energy $e$ and weight $w$ satisfies $\log (rk(M_C)) \le ed (\log s + \log w + \log n)$, where $rk(M_C)$は通信行列$M_C$の2n$変数ブール関数の階数であることを示す。したがって、そのようなしきい値回路$C$は、通信行列が階数を持つブール関数のみを$s,w$の対数係数の積と$d,e$の線型因子の積で計算することができる。これは、エネルギーと重量が十分に小さい場合、偶数直線深度閾値回路のサイズに対する指数的な下界を意味する。離散レイル回路や離散化sgmoid回路のような他のニューラルネットワークモデルに対しては、同様の不等式が離散化回路に対しても成り立つことが証明される: $rk(m_c) = o(ed(\log s + \log w + \log n)^3)$。

In this paper, we investigate computational power of threshold circuits and other theoretical models of neural networks in terms of the following four complexity measures: size (the number of gates), depth, weight and energy. Here the energy complexity of a circuit measures sparsity of their computation, and is defined as the maximum number of gates outputting non-zero values taken over all the input assignments. As our main result, we prove that any threshold circuit $C$ of size $s$, depth $d$, energy $e$ and weight $w$ satisfies $\log (rk(M_C)) \le ed (\log s + \log w + \log n)$, where $rk(M_C)$ is the rank of the communication matrix $M_C$ of a $2n$-variable Boolean function that $C$ computes. Thus, such a threshold circuit $C$ is able to compute only a Boolean function of which communication matrix has rank bounded by a product of logarithmic factors of $s,w$ and linear factors of $d,e$. This implies an exponential lower bound on the size of even sublinear-depth threshold circuit if energy and weight are sufficiently small. For other models of neural networks such as a discretized ReLE circuits and decretized sigmoid circuits, we prove that a similar inequality also holds for a discretized circuit $C$: $rk(M_C) = O(ed(\log s + \log w + \log n)^3)$.

翻訳日:2023-06-29 18:59:40 公開日:2023-06-28

# GeoT: 信頼性分子特性予測と化学的解釈可能な表現学習のための幾何学的変換器

GeoT: A Geometry-aware Transformer for Reliable Molecular Property Prediction and Chemically Interpretable Representation Learning ( http://arxiv.org/abs/2106.15516v3 )

ライセンス: Link先を確認

Bumju Kwak, Jiwon Park, Taewon Kang, Jeonghee Jo, Byunghan Lee, Sungroh Yoon

(参考訳) 近年、分子表現学習は様々な化学タスクに焦点をあてる重要な領域として浮上している。しかし、既存のモデルの多くは、分子構造の幾何学的情報を完全に考慮できず、直感的な表現は少ない。さらに、化学的な観点からの実験結果の解釈を提供するために、広く使われているメッセージパッシング機構が限られている。これらの課題に対処するために,GeoT(Geometry-aware Transformer)という,分子表現学習のためのトランスフォーマーベースのフレームワークを導入する。 geotは、分子特性の予測だけでなく、信頼できる解釈性を提供するために特別に設計された注意に基づくメカニズムを通じて、分子グラフ構造を学ぶ。これにより、GeoTはトレーニング対象に関連する原子間関係の注意マップを生成することができる。さらに、GeoTはMPNNベースのモデルに匹敵する性能を示しながら、計算複雑性の低減を実現している。実験の結果,geantは分子構造に対する化学的洞察を効果的に学習し,人工知能と分子科学のギャップを橋渡ししていることが明らかとなった。

In recent years, molecular representation learning has emerged as a key area of focus in various chemical tasks. However, many existing models fail to fully consider the geometric information of molecular structures, resulting in less intuitive representations. Moreover, the widely used message-passing mechanism is limited to provide the interpretation of experimental results from a chemical perspective. To address these challenges, we introduce a novel Transformer-based framework for molecular representation learning, named the Geometry-aware Transformer (GeoT). GeoT learns molecular graph structures through attention-based mechanisms specifically designed to offer reliable interpretability, as well as molecular property prediction. Consequently, GeoT can generate attention maps of interatomic relationships associated with training objectives. In addition, GeoT demonstrates comparable performance to MPNN-based models while achieving reduced computational complexity. Our comprehensive experiments, including an empirical simulation, reveal that GeoT effectively learns the chemical insights into molecular structures, bridging the gap between artificial intelligence and molecular sciences.

翻訳日:2023-06-29 18:59:03 公開日:2023-06-28

# 古典的に区別不能となるハーディ型真実装型ガジェットの拡張

Extensions of Hardy-type true-implies-false gadgets to classically obtain indistinguishability ( http://arxiv.org/abs/2006.11396v4 )

ライセンス: Link先を確認

Karl Svozil

(参考訳) 量子論理用語では、ハーディ型引数は、相互に絡み合った文脈とその可観測物の集合として一様に表現され拡張される。古典的に解釈すれば、これらの構造はグラフ理論的な「ゲット」として機能し、事前に選択され、後から選択された観測可能な終点の相関を強制する。この方法は、他のタイプの関係性、特に量子力学的に異なる観測可能な古典的等式を予測する新しいジョイント特性の一般化と拡張を可能にする。また、量子可観測体の忠実直交表現の発見を容易にする。

In quantum logical terms, Hardy-type arguments can be uniformly presented and extended as collections of intertwined contexts and their observables. If interpreted classically those structures serve as graph-theoretic "gadgets" that enforce correlations on the respective preselected and postselected observable terminal points. The method allows the generalization and extension to other types of relational properties, in particular, to novel joint properties predicting classical equality of quantum mechanically distinct observables. It also facilitates finding faithful orthogonal representations of quantum observables.

翻訳日:2023-06-29 18:58:46 公開日:2023-06-28

# 統一機械学習:同時観測・未観測ノベルティ検出による分類

Unified machine learning: Classification with simultaneous observed and unobserved novelty detection ( http://arxiv.org/abs/2002.01368v7 )

ライセンス: Link先を確認

Emile R. Engelbrecht, Johan A. du Preez

(参考訳) Positive and Unlabelled (PU)-learning, Semi-Supervised Learning (SSL), and Open-Set Recognition (OSR) の統一的なアプローチは、コスト効率の高いアプリケーショングレード分類器の開発を大幅に促進する。しかし、以前の試みは、新しいカテゴリである \mbox{\textit{observed}} と \mbox{\textit{unobserved}} の定義を混乱させた。観測された新しいカテゴリは、PU学習において未学習のトレーニングデータとして定義され、トレーニングセットのカテゴリラベルの不完全なセットによって存在する。対照的に、観測されていない新しいカテゴリはosrでテストデータにのみ存在し、時間とともに現れる新しい興味深いパターンを表すものとして定義されている。安全で実用的な分類器の開発を維持するために、モデルはこれらの新しいカテゴリタイプの違いを一般化する必要がある。本稿では,関連する機械学習研究分野を徹底的にレビューし,ラベルなしデータやオープンlacuを活用し,カテゴリを拡張したオープンセット学習という,新たな統一機械学習政策を提案する。具体的には、Open-LACUは、ラベル付きカテゴリの$K > 1$を正確に分類し、同時に観察された新規カテゴリを拡張背景カテゴリ(K + 1$)に検出・分離し、さらに観測されていない新規カテゴリを強化未知カテゴリ(K + 2$)に検出・分離するモデルを必要とする。 Open-LACUは、観察および保存されていない新しいカテゴリを一般化する最初の機械学習ポリシーである。 Open-LACUの意義は、リモートセンシング画像のセマンティックセグメンテーション、医用放射線画像内の物体検出、およびコークス音響分析による病気の識別におけるその応用について論じる。

A unified approach of Positive and Unlabelled (PU)-learning, Semi-Supervised Learning (SSL), and Open-Set Recognition (OSR) would significantly enhance the development of cost-efficient application-grade classifiers. However, previous attempts have conflated the definitions of \mbox{\textit{observed}} and \mbox{\textit{unobserved}} novel categories. Observed novel categories are defined in PU-learning as those in unlabelled training data and exist due to an incomplete set of category labels for the training set. In contrast, unobserved novel categories are defined in OSR as those that only exist in the testing data and represent new and interesting patterns that emerge over time. To maintain safe and practical classifier development, models must generalise the difference between these novel category types. In this letter, we thoroughly review the relevant machine learning research fields to propose a new unified machine learning policy called Open-set Learning with Augmented Categories by exploiting Unlabelled data or Open-LACU. Specifically, Open-LACU requires models to accurately classify $K > 1$ number of labelled categories while simultaneously detecting and separating observed novel categories into the augmented background category ($K + 1$) and further detecting and separating unobserved novel categories into the augmented unknown category ($K + 2$). Open-LACU is the first machine learning policy to generalise observed and unobserved novel categories. The significance of Open-LACU is also highlighted by discussing its application in semantic segmentation of remote sensing images, object detection within medical radiology images and disease identification through cough sound analysis.

翻訳日:2023-06-29 18:58:35 公開日:2023-06-28

# ニューラルマシン翻訳のための局所バイト融合

Local Byte Fusion for Neural Machine Translation ( http://arxiv.org/abs/2205.11490v3 )

ライセンス: Link先を確認

Makesh Narsimhan Sreedhar, Xiangpeng Wan, Yu Cheng, Junjie Hu

(参考訳) サブワードトークン化スキームは、現在のNLPモデルで使用される主要なテクニックである。しかし、そのようなスキームは剛性があり、一方のコーパス上に構築されたトークン化器は他の並列コーパスにうまく適応しない。多言語コーパスでは、サブワードのトークン化スキームが低リソース言語を多言語化することで翻訳性能が低下することが観察されている。サブワードトークンライザの単純な代替手段は、UTF-8のような符号化方式を用いてバイト列へのトークン化を行うバイトベースの方法である。バイトトークンは、しばしばサブキャラクタの粒度で入力を表す。これにより、文字列よりもかなり長いバイトシーケンスが生成される。下層層における局所情報の集約は、モデルに高レベルのセマンティック情報を構築するためのガイドとなる。本稿では,局所意味情報を集約するために,バイトベースの機械翻訳のためのローカルByte Fusion(LOBEF)手法を提案する。多言語翻訳、ゼロショット交叉変換、ドメイン適応に関する大規模な実験は、従来のバイトベースモデルやサブワード技術よりも一貫して改善されている。さらに分析した結果、バイトベースモデルはパラメータ効率が高く、サブワードモデルよりも高速にトレーニングできることがわかった。

Subword tokenization schemes are the dominant technique used in current NLP models. However, such schemes can be rigid and tokenizers built on one corpus do not adapt well to other parallel corpora. It has also been observed that in multilingual corpora, subword tokenization schemes over-segment low-resource languages leading to a drop in translation performance. A simple alternative to subword tokenizers is byte-based methods i.e. tokenization into byte sequences using encoding schemes such as UTF-8. Byte tokens often represent inputs at a sub-character granularity i.e. one character can be represented by a sequence of multiple byte tokens. This results in byte sequences that are significantly longer than character sequences. Enforcing aggregation of local information in the lower layers can guide the model to build higher-level semantic information. We propose a Local Byte Fusion (LOBEF) method for byte-based machine translation -- utilizing byte $n$-gram and word boundaries -- to aggregate local semantic information. Extensive experiments on multilingual translation, zero-shot cross-lingual transfer, and domain adaptation reveal a consistent improvement over traditional byte-based models and even over subword techniques. Further analysis also indicates that our byte-based models are parameter-efficient and can be trained faster than subword models.

翻訳日:2023-06-29 18:51:46 公開日:2023-06-28

# EHRKit: 電子健康記録テキストのためのPython自然言語処理ツールキット

EHRKit: A Python Natural Language Processing Toolkit for Electronic Health Record Texts ( http://arxiv.org/abs/2204.06604v5 )

ライセンス: Link先を確認

Irene Li, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Dragomir Radev

(参考訳) 電子健康記録(Electronic Health Record, EHR)は、医療システムにおいて重要な部分であり、医療提供、運営、研究に影響を与える。非構造化テキストは、EHRの構造化情報にもかかわらず多くの注目を集めており、エキサイティングな研究分野となっている。最近のニューラル自然言語処理(NLP)法の成功は、構造化されていない臨床ノートを処理するための新しい方向性につながった。本研究では,臨床テキストのためのピソンライブラリ EHRKit を開発した。 MIMIC-III固有の機能とタスク固有の機能である。第1部では、基本的な検索、情報検索、情報抽出を含むMIMIC-III NOTEEVENTSデータにアクセスするためのインターフェースのリストを紹介する。第2部では、名前付きエンティティ認識、要約、機械翻訳など、最大12のオフセットnlpタスクのために、多くのサードパーティライブラリを統合する。

The Electronic Health Record (EHR) is an essential part of the modern medical system and impacts healthcare delivery, operations, and research. Unstructured text is attracting much attention despite structured information in the EHRs and has become an exciting research field. The success of the recent neural Natural Language Processing (NLP) method has led to a new direction for processing unstructured clinical notes. In this work, we create a python library for clinical texts, EHRKit. This library contains two main parts: MIMIC-III-specific functions and tasks specific functions. The first part introduces a list of interfaces for accessing MIMIC-III NOTEEVENTS data, including basic search, information retrieval, and information extraction. The second part integrates many third-party libraries for up to 12 off-shelf NLP tasks such as named entity recognition, summarization, machine translation, etc.

翻訳日:2023-06-29 18:50:34 公開日:2023-06-28

# SUPERNOVA:リスクベーステストと機械学習を用いたAAAゲームにおけるテスト選択と欠陥防止の自動化

SUPERNOVA: Automating Test Selection and Defect Prevention in AAA Video Games Using Risk Based Testing and Machine Learning ( http://arxiv.org/abs/2203.05566v2 )

ライセンス: Link先を確認

Alexander Senchenko, Naomi Patterson, Hamman Samuel, Dan Isper

(参考訳) 従来の手法がソフトウェアシステムの成長とともにスケールできないため、ビデオゲームのテストはますます難しくなっている。手動テストは非常に労働集約的なプロセスなので、すぐにコスト禁止になります。自動テストにスクリプトを使用するのは手頃な価格だが、非決定的な環境ではスクリプトが有効ではない。現代のゲームの複雑さ、スコープ、プレイヤーの期待は、品質管理が生産コストと納入リスクの大きな部分を占めるように急速に増大している。このリスクを低減し、生産を実現することは、現在業界にとって大きな課題です。生産コストを前後的に現実的なものにするため、テストやデータ分析の自動化と並行して、予防的な品質保証戦略に重点を置いています。本稿では,自動ハブとして機能しながら,テスト選択と欠陥防止を行うシステムであるSUPERNOVA(Selection of Testing and Universal defect Prevention in external Repositories for Novel Objective Verification of Software Anomalies)を提案する。データ分析機能と機械学習機能を統合することで、SUPERNOVAは品質保証テスタのバグ発見と欠陥の低減を支援し、プロダクションサイクルの安定性を改善し、テストコストをコントロールできる。この直接的な影響は、これらのテスト選択最適化を使用して出荷された未公開のスポーツゲームタイトルのテスト時間を55%以上削減することが観察されている。さらに、半教師付き機械学習モデルによって生成されたリスクスコアを用いて、71%の精度で検出でき、77%がバグを誘発する変更リストの確率を思い出すことができ、この推論の詳細な説明を開発者に提供できる。これらの取り組みはワークフローを改善し、開発中のゲームタイトルに必要なテスト時間を削減する。

Testing video games is an increasingly difficult task as traditional methods fail to scale with growing software systems. Manual testing is a very labor-intensive process, and therefore quickly becomes cost prohibitive. Using scripts for automated testing is affordable, however scripts are ineffective in non-deterministic environments, and knowing when to run each test is another problem altogether. The modern game's complexity, scope, and player expectations are rapidly increasing where quality control is a big portion of the production cost and delivery risk. Reducing this risk and making production happen is a big challenge for the industry currently. To keep production costs realistic up-to and after release, we are focusing on preventive quality assurance tactics alongside testing and data analysis automation. We present SUPERNOVA (Selection of tests and Universal defect Prevention in External Repositories for Novel Objective Verification of software Anomalies), a system responsible for test selection and defect prevention while also functioning as an automation hub. By integrating data analysis functionality with machine and deep learning capability, SUPERNOVA assists quality assurance testers in finding bugs and developers in reducing defects, which improves stability during the production cycle and keeps testing costs under control. The direct impact of this has been observed to be a reduction in 55% or more testing hours for an undisclosed sports game title that has shipped, which was using these test selection optimizations. Furthermore, using risk scores generated by a semi-supervised machine learning model, we are able to detect with 71% precision and 77% recall the probability of a change-list being bug inducing, and provide a detailed breakdown of this inference to developers. These efforts improve workflow and reduce testing hours required on game titles in development.

翻訳日:2023-06-29 18:49:41 公開日:2023-06-28

# cocofl:部分的nn凍結と量子化によるコミュニケーションと計算の融合学習

CoCoFL: Communication- and Computation-Aware Federated Learning via Partial NN Freezing and Quantization ( http://arxiv.org/abs/2203.05468v3 )

ライセンス: Link先を確認

Kilian Pfeiffer, Martin Rapp, Ramin Khalili, J\"org Henkel

(参考訳) 連邦学習(FL)に参加するデバイスは通常、異種通信、計算、メモリ資源を持つ。しかしながら、同期flでは、すべてのデバイスは、サーバが指示する同じ期限までにトレーニングを終える必要がある。以上の結果から,ニューラルネットワーク(NN)の小さなサブセットを拘束されたデバイス,すなわち最先端技術が提案するニューロン/フィルタを停止させることは非効率であり,これらのデバイスがモデルに効果的な寄与を妨げていることが示された。これにより、特にデバイス間でクラスラベルが歪んだ場合において、制約されたデバイスの達成可能な精度が不公平になる。全てのデバイスでNN構造を完全に維持する新しいFL手法であるCoCoFLを提案する。デバイスの異種リソースに適応するために、cocoflは選択したレイヤを凍結して定量化し、通信、計算、メモリ要件を削減します。これにより、CoCoFLはデバイス上の利用可能なリソースを効率的に利用し、制約されたデバイスがFLシステムに重要な貢献をし、参加者間の公正性(精度の同等性)を高め、モデルの最終的な精度を著しく向上する。

Devices participating in federated learning (FL) typically have heterogeneous communication, computation, and memory resources. However, in synchronous FL, all devices need to finish training by the same deadline dictated by the server. Our results show that training a smaller subset of the neural network (NN) at constrained devices, i.e., dropping neurons/filters as proposed by state of the art, is inefficient, preventing these devices to make an effective contribution to the model. This causes unfairness w.r.t the achievable accuracies of constrained devices, especially in cases with a skewed distribution of class labels across devices. We present a novel FL technique, CoCoFL, which maintains the full NN structure on all devices. To adapt to the devices' heterogeneous resources, CoCoFL freezes and quantizes selected layers, reducing communication, computation, and memory requirements, whereas other layers are still trained in full precision, enabling to reach a high accuracy. Thereby, CoCoFL efficiently utilizes the available resources on devices and allows constrained devices to make a significant contribution to the FL system, increasing fairness among participants (accuracy parity) and significantly improving the final accuracy of the model.

翻訳日:2023-06-29 18:49:12 公開日:2023-06-28

# 高モダリティ多モード変換器:高モダリティ表現学習のためのモダリティと相互作用の不均一性の定量化

High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning ( http://arxiv.org/abs/2203.01311v4 )

ライセンス: Link先を確認

Paul Pu Liang, Yiwei Lyu, Xiang Fan, Jeffrey Tsaw, Yudong Liu, Shentong Mo, Dani Yogatama, Louis-Philippe Morency, Ruslan Salakhutdinov

(参考訳) 現実の多くの問題は本質的にマルチモーダルであり、人間のコミュニケーション、強制、受容、ロボットの視覚センサーなどに使われる言語、ジェスチャー、パラ言語などである。マルチモーダル学習への関心は爆発的に高まっているが、これらの手法は主に言語、視覚、音声に焦点が当てられている。本稿では,多種多様なモダリティに対する一般化を加速するために,多種多様なモダリティを含む高モダリティシナリオに対する効率的な表現学習について検討する。新しいモダリティに新しいモデルを追加することは、必然的に高価になるので、重要な技術的課題は、多様性の定量化である: 前のモダリティとパラメータの共有を可能にするために、類似した情報と相互作用をエンコードするモダリティをどうやって測定できるのか? 異質性量子化のための2つの新しい情報理論指標を提案する。(1) モダリティの不均一性(modality heterogeneity)は、X1からX2への情報転送量を測定することによって、また(2) 相互作用異質性(interaction heterogeneity)は、Fusing {X1,X2} から {X3,X4} への情報転送量を測定することによって、どのように相互作用するかを測定する。提案する2つの指標を,ユニークな情報やインタラクションを含むモダリティの融合を自動的に優先順位付けする方法として重要視する。その結果、単一のモデルであるhighmmtが、最大10のモダリティ(テキスト、画像、音声、ビデオ、センサー、プロピオセプション、音声、時系列、セット、テーブル)と5つの研究領域から15のタスクにスケールする。 HighMMTは、パフォーマンスと効率のトレードオフに関する事前の手法よりも優れているだけでなく、重要なスケーリングの挙動も示している。

Many real-world problems are inherently multimodal, from spoken language, gestures, and paralinguistics humans use to communicate, to force, proprioception, and visual sensors on robots. While there has been an explosion of interest in multimodal learning, these methods are focused on a small set of modalities primarily in language, vision, and audio. In order to accelerate generalization towards diverse and understudied modalities, this paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities. Since adding new models for every new modality becomes prohibitively expensive, a critical technical challenge is heterogeneity quantification: how can we measure which modalities encode similar information and interactions in order to permit parameter sharing with previous modalities? This paper proposes two new information theoretic metrics for heterogeneity quantification: (1) modality heterogeneity studies how similar 2 modalities {X1,X2} are by measuring how much information can be transferred from X1 to X2, while (2) interaction heterogeneity studies how similarly pairs of modalities {X1,X2}, {X3,X4} interact by measuring how much information can be transferred from fusing {X1,X2} to {X3,X4}. We show the importance of these 2 proposed metrics as a way to automatically prioritize the fusion of modalities that contain unique information or interactions. The result is a single model, HighMMT, that scales up to 10 modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and 15 tasks from 5 research areas. Not only does HighMMT outperform prior methods on the tradeoff between performance and efficiency, it also demonstrates a crucial scaling behavior: performance continues to improve with each modality added, and it transfers to entirely new modalities and tasks during fine-tuning.

翻訳日:2023-06-29 18:48:51 公開日:2023-06-28

# 時間非局在サブシステム上の因果不等式に違反するプロセスの存在

Existence of processes violating causal inequalities on time-delocalised subsystems ( http://arxiv.org/abs/2201.11832v3 )

ライセンス: Link先を確認

Julian Wechs, Cyril Branciard, Ognyan Oreshkov

(参考訳) 量子的および古典的過程が存在することは理論的に可能であり、それぞれのパーティによって実行される操作が明確に定義された因果順序で起こらないことが示されている。中心的な疑問は、実際にそのようなプロセスが実現できるかどうかである。このような過程が標準量子論において実現されているという概念を厳密に議論するために、時間非局在量子サブシステムの概念が導入された。本稿では,三成分過程のすべてのユニタリ拡張に対して,時間非局所化サブシステム上の実現が存在することを示す。注目すべきことに、このクラスは因果不等式に違反するプロセス、すなわちデバイス非依存の方法で特定の因果順序と無矛盾性を示す相関を生成するプロセスを含んでいる。我々は,ユニタリ拡張を持つ三部的古典過程の見事な例を考察し,時間非局所化サブシステムにおけるその実現について考察する。次に、因果不等式違反がこの設定においてどのような意味を持つのかを議論し、利害関係変数間の有意因果順序がないことを示すことは確かに有意義な概念であると主張する。

It has been shown that it is theoretically possible for there to exist quantum and classical processes in which the operations performed by separate parties do not occur in a well-defined causal order. A central question is whether and how such processes can be realised in practice. In order to provide a rigorous argument for the notion that certain such processes have a realisation in standard quantum theory, the concept of time-delocalised quantum subsystem has been introduced. In this paper, we show that realisations on time-delocalised subsystems exist for all unitary extensions of tripartite processes. Remarkably, this class contains processes that violate causal inequalities, i.e., that can generate correlations that witness the incompatibility with definite causal order in a device-independent manner. We consider a known striking example of such a tripartite classical process that has a unitary extension, and study its realisation on time-delocalised subsystems. We then discuss the question of what a violation of causal inequalities implies in this setting, and argue that it is indeed a meaningful concept to show the absence of a definite causal order between the variables of interest.

翻訳日:2023-06-29 18:48:05 公開日:2023-06-28

# トランスダクティブおよびセミ教師付き連合学習のためのクロスクライアントラベル伝播

Cross-client Label Propagation for Transductive and Semi-Supervised Federated Learning ( http://arxiv.org/abs/2210.06434v3 )

ライセンス: Link先を確認

Jonathan Scott, Michelle Yeo, Christoph H. Lampert

(参考訳) トランスダクティブフェデレーション学習のための新しい手法であるクロスクライアントラベル伝搬(XCLP)を提案する。 XCLPは、複数のクライアントのデータからデータグラフを共同で推定し、ラベル情報をグラフ全体に伝播することによりラベル付きデータのラベルを算出する。クライアントがデータを誰とでも共有することを避けるため、XCLPは2つの暗号化的にセキュアなプロトコルを使っている。我々は、連合学習におけるXCLPの2つの異なる応用を実証した。最初は、見当たらないテストポイントのラベルを予測するために、ワンショットでそれを使用します。第二に、半教師なしのフェデレーション環境での擬似ラベルなしトレーニングデータを繰り返し使用する。実際のフェデレーションと標準ベンチマークの両方の実験では、XCLPはどちらのアプリケーションでも、代替手法よりも高い分類精度を達成している。

We present Cross-Client Label Propagation(XCLP), a new method for transductive federated learning. XCLP estimates a data graph jointly from the data of multiple clients and computes labels for the unlabeled data by propagating label information across the graph. To avoid clients having to share their data with anyone, XCLP employs two cryptographically secure protocols: secure Hamming distance computation and secure summation. We demonstrate two distinct applications of XCLP within federated learning. In the first, we use it in a one-shot way to predict labels for unseen test points. In the second, we use it to repeatedly pseudo-label unlabeled training data in a federated semi-supervised setting. Experiments on both real federated and standard benchmark datasets show that in both applications XCLP achieves higher classification accuracy than alternative approaches.

翻訳日:2023-06-29 18:42:20 公開日:2023-06-28

# 不均衡な学際的研究提案による階層的ミックスアップマルチラベル分類

Hierarchical MixUp Multi-label Classification with Imbalanced Interdisciplinary Research Proposals ( http://arxiv.org/abs/2209.13912v2 )

ライセンス: Link先を確認

Meng Xiao, Min Wu, Ziyue Qiao, Zhiyuan Ning, Yi Du, Yanjie Fu, Yuanchun Zhou

(参考訳) 資金提供機関は、主にドメインエキスパートと研究提案のトピックマッチングに依存しており、提案レビューアを割り当てている。提案が学際的になるにつれて、提案の学際的性質をプロファイルし、その後、適切な専門知識を持つ専門家を見つけることが困難になる。この問題を解決するための重要なステップは、提案の学際ラベルを正確にモデル化し分類することである。テキスト分類や提案分類といった既存の方法論・応用関連文献は、学際的提案データによる3つの重要な課題を共同で解決するには不十分である。 1)情報科学からAI,AIの基本に至るまで,粗粒から細粒までの提案の規律ラベルの階層構造。 2 提案において、異なる役割を担っている各種主文部の異種意味論 3)非学際研究と学際研究の間には,提案の数は不均衡である。提案の学際的性質を理解する上で,同時に3つの課題に対処できるだろうか? そこで本研究では,H-MixUpと呼ぶ階層型混成多重ラベル分類フレームワークを提案する。 H-MixUpはトランスフォーマーベースの意味情報抽出器とGCNベースの学際知識抽出器を第1号と第2号に活用する。 H-MixUpは、Wold-level MixUp、Word-level CutMix、Manifold MixUp、Document-level MixUpの融合トレーニング方法を開発した。

Funding agencies are largely relied on a topic matching between domain experts and research proposals to assign proposal reviewers. As proposals are increasingly interdisciplinary, it is challenging to profile the interdisciplinary nature of a proposal, and, thereafter, find expert reviewers with an appropriate set of expertise. An essential step in solving this challenge is to accurately model and classify the interdisciplinary labels of a proposal. Existing methodological and application-related literature, such as textual classification and proposal classification, are insufficient in jointly addressing the three key unique issues introduced by interdisciplinary proposal data: 1) the hierarchical structure of discipline labels of a proposal from coarse-grain to fine-grain, e.g., from information science to AI to fundamentals of AI. 2) the heterogeneous semantics of various main textual parts that play different roles in a proposal; 3) the number of proposals is imbalanced between non-interdisciplinary and interdisciplinary research. Can we simultaneously address the three issues in understanding the proposal's interdisciplinary nature? In response to this question, we propose a hierarchical mixup multiple-label classification framework, which we called H-MixUp. H-MixUp leverages a transformer-based semantic information extractor and a GCN-based interdisciplinary knowledge extractor for the first and second issues. H-MixUp develops a fused training method of Wold-level MixUp, Word-level CutMix, Manifold MixUp, and Document-level MixUp to address the third issue.

翻訳日:2023-06-29 18:42:05 公開日:2023-06-28

# 周囲環境における絡み合いの様相

Salient signatures of entanglement in the surrounding environment ( http://arxiv.org/abs/2209.05197v2 )

ライセンス: Link先を確認

{\L}ukasz Rudnicki, Waldemar K{\l}obus, Otavio A. D. Molitor, Wies{\l}aw Laskowski

(参考訳) 我々は, 量子系における絡み合いの存在を, システムを取り巻く環境の粗い観察によって確認できるモデルを開発した。この反直感効果は、システムと環境の間の相互作用が、絡み合う証人である観測可能なものと比例するときに起こりうる。 3つの直感的な例を示しながら一理想気体の雲で、絡み合わされた証人とともに線形ポテンシャルを受けるときは、証人のサインにより指示された方向を加速する。二 2つの量子ビット(又は四レベル原子)を結合したキャビティ内の電磁界の四次数は、同じ方法で変位する。三一つの量子ビットにより与えられる量子環境において、その状態は、ブロッホ球面の1つの半球のみを占め、また、証人のサインと完全に一致する。

We develop a model in which presence of entanglement in a quantum system can be confirmed through coarse observations of the environment surrounding the system. This counter-intuitive effect becomes possible when interaction between the system and its environment is proportional to an observable being an entanglement witness. While presenting three intuitive examples we show that: i) a cloud of an ideal gas, when subject to a linear potential coupled with the entanglement witness, accelerates in the direction dictated by the sign of the witness; ii) quadratures of electromagnetic field in a cavity coupled with two qubits (or a four-level atom) are displaced in the same manner; iii) for a quantum environment given by a single qubit, its state occupies only one hemisphere of the Bloch sphere, again in full agreement with the sign of the witness.

翻訳日:2023-06-29 18:41:41 公開日:2023-06-28

# 量子状態の非線形関数推定のためのハイブリッドフレームワーク

A hybrid framework for estimating nonlinear functions of quantum states ( http://arxiv.org/abs/2208.08416v2 )

ライセンス: Link先を確認

You Zhou and Zhenhuan Liu

(参考訳) 量子状態の非線形関数、例えば$\tr(\rho^m)$を推定することは、量子科学と技術に対する基礎的かつ実用的な関心である。ここでは,一般化スワップテストによって量子部分を構成する量子古典的ハイブリッドフレームワークを示し,ランダム化測定から結果を後処理することで古典的部分を実現する。このハイブリッド・フレームワークは、中間スケールの量子プロセッサの部分的コヒーレント・パワーを利用し、同時に量子計測の数と古典的な後処理コストを劇的に削減する。状態モーメント推定と量子誤り軽減のタスクにおける我々のフレームワークの利点を実証する。

Estimating nonlinear functions of quantum states, such as the moment $\tr(\rho^m)$, is of fundamental and practical interest in quantum science and technology. Here we show a quantum-classical hybrid framework to measure them, where the quantum part is constituted by the generalized swap test, and the classical part is realized by postprocessing the result from randomized measurements. This hybrid framework utilizes the partial coherent power of the intermediate-scale quantum processor and, at the same time, dramatically reduces the number of quantum measurements and the cost of classical postprocessing. We demonstrate the advantage of our framework in the tasks of state-moment estimation and quantum error mitigation.

翻訳日:2023-06-29 18:41:20 公開日:2023-06-28

# メタバースxurllcサービスの注意対応リソース割り当てとqoe分析

Attention-aware Resource Allocation and QoE Analysis for Metaverse xURLLC Services ( http://arxiv.org/abs/2208.05438v6 )

ライセンス: Link先を確認

Hongyang Du, Jiazhen Liu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Junshan Zhang, and Dong In Kim

(参考訳) Metaverseは、次世代インターネットの期待をカプセル化しつつ、新しいキーパフォーマンス指標(KPI)を導入しています。従来の超信頼性・低遅延通信(URLLC)は客観的KPIを満足するが,Metaverseの特徴である個人化された没入感体験を提供することは困難である。クオリティ・オブ・エクスペリエンス(QoE)は総合的なKPIとみなすことができるため、URLLCはより高度なQoEを実現するために、パーソナライズされたリソース割り当てスキームを備えた次世代のURLLC(xURLLC)へと進化する。 Metaverse xURLLC サービスをデプロイするために,Metaverse サービスプロバイダ (MSP) とネットワークインフラストラクチャプロバイダ (InP) のインタラクションを調査し,最適なコントラクト設計フレームワークを提供する。具体的には、メタバースユーザーのQoEの関数として定義されたMSPの効用を最大化し、InPのインセンティブを確実にする。本稿では,QoEを数学的にモデル化するために,メタ・インマージョン(Meta-Immersion)という手法を提案する。さらに, xurllc における qoe を改善するため,注意意識型レンダリングキャパシティアロケーションスキームを開発した。ユーザ・オブジェクト・アテンションレベルデータセットを用いてxURLLCが従来のURLLCと比較して平均20.1%のQoE改善を実現可能であることを検証した。この論文のコードはhttps://github.com/hongyangdu/attentionqoeで入手できる。

Metaverse encapsulates our expectations of the next-generation Internet, while bringing new key performance indicators (KPIs). Although conventional ultra-reliable and low-latency communications (URLLC) can satisfy objective KPIs, it is difficult to provide a personalized immersive experience that is a distinctive feature of the Metaverse. Since the quality of experience (QoE) can be regarded as a comprehensive KPI, the URLLC is evolved towards the next generation URLLC (xURLLC) with a personalized resource allocation scheme to achieve higher QoE. To deploy Metaverse xURLLC services, we study the interaction between the Metaverse service provider (MSP) and the network infrastructure provider (InP), and provide an optimal contract design framework. Specifically, the utility of the MSP, defined as a function of Metaverse users' QoE, is to be maximized, while ensuring the incentives of the InP. To model the QoE mathematically, we propose a novel metric named Meta-Immersion that incorporates both the objective KPIs and subjective feelings of Metaverse users. Furthermore, we develop an attention-aware rendering capacity allocation scheme to improve QoE in xURLLC. Using a user-object-attention level dataset, we validate that the xURLLC can achieve an average of 20.1% QoE improvement compared to the conventional URLLC with a uniform resource allocation scheme. The code for this paper is available at https://github.com/HongyangDu/AttentionQoE

翻訳日:2023-06-29 18:41:09 公開日:2023-06-28

# 開空洞から単一光子を伝播する:普遍量子化による記述

Propagating single photons from an open cavity: Description from universal quantization ( http://arxiv.org/abs/2207.04517v2 )

ライセンス: Link先を確認

Astghik Saharyan, Benjamin Rousseaux, Zsolt Kis, Sergiy Stryzhenko and St\'ephane Gu\'erin

(参考訳) 過去数十年間、量子光学は、初期の実験における高品質な因子キャビティから、リークモードを含む新しいキャビティ設計へと進化してきた。非常に信頼できるモデルにもかかわらず、キャビティ量子電磁力学の概念では、光子漏れは現象学的にほとんどの時間を扱う。ここで、異なるアプローチをとり、最初の原理から始めて、元の真のモード表現から派生した内側の表現を定義し、それによって効果的なハミルトニアンとポインティングベクトルを決定できる。現象学モデルとは対照的に、空洞で発生し、自由空間で伝播する単一の光子の完全な記述が可能である。これはレーザー駆動の原子キャビティシステムに適用される。さらに, 単一光子生成のための原子空洞非共振方式を提案し, 結合状態の異なる単一光子を時間および周波数領域で厳密に解析する。最後に, 単一光子のパルス形状を専用に設計した駆動場エンベロープを用いて調整した断熱除去を実現する特定の結合構造を導入する。

Over the last decades, quantum optics has evolved from high quality factor cavities in the early experiments toward new cavity designs involving leaky modes. Despite very reliable models, in the concepts of cavity quantum electrodynamics, photon leakage is most of the time treated phenomenologically. Here, we take a different approach, and starting from first principles, we define an inside-outside representation which is derived from the original true-mode representation, in which one can determine effective Hamiltonian and Poynting vector. Contrary to the phenomenological model, they allow a full description of a leaking single photon produced in the cavity and propagating in free space. This is applied for a laser-driven atom-cavity system. In addition, we propose an atom-cavity non-resonant scheme for single photon generation, and we rigorously analyze the outgoing single photon in time and frequency domains for different coupling regimes. Finally, we introduce a particular coupling regime ensuring adiabatic elimination for which the pulse shape of the outgoing single photon is tailored using a specifically designed driving field envelope.

翻訳日:2023-06-29 18:40:42 公開日:2023-06-28

# 重ね合わせにおける時空の量子共形対称性

Quantum conformal symmetries for spacetimes in superposition ( http://arxiv.org/abs/2207.00021v2 )

ライセンス: Link先を確認

Viktoria Kabel, Anne-Catherine de la Hamette, Esteban Castro-Ruiz, \v{C}aslav Brukner

(参考訳) 量子重力の完全な理論がなければ、量子場と量子粒子が時空の重ね合わせの中でどのように振る舞うかという問題は、理論的および実験的研究の範囲を超えているように見える。ここでは量子参照フレーム形式の拡張を用いて、同値な計量の重ね合わせを前提としたクライン=ゴードン場に対するこの問題に対処する。量子共形変換'' の群構造に基づいて、時空の重ね合わせに量子場を記述する状態と、ミンコフスキーの背景に質量の重ね合わせを持つ量子場を表す状態とをマッピングできる明示的な量子作用素を構築する。これは拡張対称性の原理、すなわち量子共形変換の下で不変性を構成する。後者は、微分同値でない時空の重ね合わせを、曲線化された時空上のより直感的な場の重ね合わせと関連付けることによって理解することができる。さらに、曲がりくねった時空における粒子生成の現象を同値な同値な時空にインポートするためにも用いることができ、改良されたミンコフスキー時空における新しい特徴を明らかにすることができる。

Without a complete theory of quantum gravity, the question of how quantum fields and quantum particles behave in a superposition of spacetimes seems beyond the reach of theoretical and experimental investigations. Here we use an extension of the quantum reference frame formalism to address this question for the Klein-Gordon field residing on a superposition of conformally equivalent metrics. Based on the group structure of ``quantum conformal transformations'', we construct an explicit quantum operator that can map states describing a quantum field on a superposition of spacetimes to states representing a quantum field with a superposition of masses on a Minkowski background. This constitutes an extended symmetry principle, namely invariance under quantum conformal transformations. The latter allows to build an understanding of superpositions of diffeomorphically non-equivalent spacetimes by relating them to a more intuitive superposition of quantum fields on curved spacetime. Furthermore, it can be used to import the phenomenon of particle production in curved spacetime to its conformally equivalent counterpart, thus revealing new features in modified Minkowski spacetime.

翻訳日:2023-06-29 18:40:23 公開日:2023-06-28

# KAB2Sに向けて : 単目的問題から多目的問題への鍵知識の学習

Towards KAB2S: Learning Key Knowledge from Single-Objective Problems to Multi-Objective Problem ( http://arxiv.org/abs/2206.12906v2 )

ライセンス: Link先を確認

Xu Wendi, Wang Xianpeng, Guo Qingxin, Song Xiangman, Zhao Ren, Zhao Guodong, Yang Yang, Xu Te, He Dakuo

(参考訳) 進化的計算研究の新しいフロンティア」として、進化的伝達最適化(eto)は進化的計算研究における過去の問題からの関連する経験と知識のゼロ再利用という従来のパラダイムを克服する。 etoによるスケジューリングアプリケーションでは、知的スケジューリングとグリーンスケジューリングの両方、特に中国からの「炭素中立性」の国際的な誓約のために、非常に魅力的で競争の激しいフレームワーク「ミーティング」が形成される可能性がある。我々の知る限り、ここでのスケジューリングに関する論文は、多目的最適化問題(マルチタスク最適化ではない)が離散ケースにおいて単目的最適化問題を「省略する」場合に、ETOフレームワークのクラスの最初の作業となる。より具体的には、遺伝的アルゴリズムをベースとした位置決定ブロックのような産業応用のための重要な知識は、置換フローショップスケジューリング問題(PFSP)のための新しいコア転送機構と学習技術によって利用することができる。提案するETO-PFSPフレームワークの有効性と大域的普遍性を実証的に検証した。本研究は,(1)ETOフレームワークを充実させ,(2)遺伝的アルゴリズムとメメティックアルゴリズムのブロック構築の古典的・基本的理論に寄与し,(3)中国における「インダストリアルインテリジェンス」のための「知識とビルディングブロックに基づくスケジューリング(KAB2S)」のパラダイムの提案と実践により,進化的スケジューリングのパラダイムシフトに向かう。

As "a new frontier in evolutionary computation research", evolutionary transfer optimization(ETO) will overcome the traditional paradigm of zero reuse of related experience and knowledge from solved past problems in researches of evolutionary computation. In scheduling applications via ETO, a quite appealing and highly competitive framework "meeting" between them could be formed for both intelligent scheduling and green scheduling, especially for international pledge of "carbon neutrality" from China. To the best of our knowledge, our paper on scheduling here, serves as the 1st work of a class of ETO frameworks when multiobjective optimization problem "meets" single-objective optimization problems in discrete case (not multitasking optimization). More specifically, key knowledge conveyed for industrial applications, like positional building blocks with genetic algorithm based settings, could be used via the new core transfer mechanism and learning techniques for permutation flow shop scheduling problem(PFSP). Extensive studies on well-studied benchmarks validate firm effectiveness and great universality of our proposed ETO-PFSP framework empirically. Our investigations (1) enrich the ETO frameworks, (2) contribute to the classical and fundamental theory of building block for genetic algorithms and memetic algorithms, and (3) head towards the paradigm shift of evolutionary scheduling via learning by proposal and practice of paradigm of "knowledge and building-block based scheduling" (KAB2S) for "industrial intelligence" in China.

翻訳日:2023-06-29 18:39:58 公開日:2023-06-28

# スケジューリング:単一目的問題から多目的問題への鍵知識の学習

ETO Meets Scheduling: Learning Key Knowledge from Single-Objective Problems to Multi-Objective Problem ( http://arxiv.org/abs/2206.12902v2 )

ライセンス: Link先を確認

Wendi Xu, Xianpeng Wang

(参考訳) 進化的伝達最適化(ETO)は「進化的計算研究の新しいフロンティア」として機能し、従来の進化的計算において解決された問題から経験と知識をゼロに再利用することを避ける。 ETOを経由したスケジューリングでは、インテリジェントなスケジューリングとグリーンスケジューリングの両方、特に中国の文脈における炭素中立性のために、非常に競争の激しい"ミーティング"フレームワークを構成することができる。我々の知る限り、ここでのスケジューリングに関する我々の研究は、多目的問題(マルチタスク最適化ではない)の単一目的問題において、複雑な最適化のためのETOの最初の研究である。より具体的には、位置決めブロックのような重要な知識は学習され、置換フローショップスケジューリング問題(PFSP)のために転送される。提案するETO-PFSPフレームワークの比較的確実な有効性と大きな可能性を検証する。

Evolutionary transfer optimization(ETO) serves as "a new frontier in evolutionary computation research", which will avoid zero reuse of experience and knowledge from solved problems in traditional evolutionary computation. In scheduling applications via ETO, a highly competitive "meeting" framework between them could be constituted towards both intelligent scheduling and green scheduling, especially for carbon neutrality within the context of China. To the best of our knowledge, our study on scheduling here, is the 1st work of ETO for complex optimization when multiobjective problem "meets" single-objective problems in combinatorial case (not multitasking optimization). More specifically, key knowledge like positional building blocks clustered, could be learned and transferred for permutation flow shop scheduling problem (PFSP). Empirical studies on well-studied benchmarks validate relatively firm effectiveness and great potential of our proposed ETO-PFSP framework.

翻訳日:2023-06-29 18:39:31 公開日:2023-06-28

# 人間の日から機械秒:機械学習の最終結果の自動回答と生成

From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams ( http://arxiv.org/abs/2206.05442v7 )

ライセンス: Link先を確認

Iddo Drori, Sarah J. Zhang, Reece Shuttleworth, Sarah Zhang, Keith Tyser, Zad Chin, Pedro Lantigua, Saisamrit Surbehera, Gregory Hunter, Derek Austin, Leonard Tang, Yann Hicke, Sage Simhon, Sathwik Karnik, Darnell Granberry, Madeleine Udell

(参考訳) mit、ハーバード、コーネルなどの上位機関における機械学習の最終試験は通常、執筆に学部の日を要し、解決には学生の時間を要する。大規模な言語モデルは、トレーニング後のオンラインのファイナルで、人間のレベルで機械学習のファイナルをパスし、新しい品質のファイナル質問を数秒で自動生成することを示した。従来の研究は、数学やSTEMコースにおける大学レベルの問題セットを解くために、プログラム合成と数ショットの学習方法を開発した。本研究では,問題集合とはいくつかの方法で異なる最終試験を解く手法を開発し,比較する。質問はより長く,複数の部分を持ち,より複雑で,幅広い話題にまたがる。オンラインで利用できる機械学習の最終試験のデータセットとベンチマークを作成し、これらの質問に答え、新しい質問を生成するためのコードを作成します。他の質問やコースノートから新しい質問を生成する方法を示します。この最終試験ベンチマークの再現性と今後の研究のために,複数選択,数値,質問に対する自動チェッカーを表現回答とともに使用する。 GPT-3, OPT, Codex, ChatGPT を用いて, ゼロショット学習と少数ショット学習, チェーン・オブ・シークレットとを比較したアブレーション研究を行い, 少数ショット学習が有効であることを示す。我々は,大規模評価の文章作成と解法を合理化する言語モデルの変換可能性に注目し,人間の日数から機械数秒までの作業負荷を大幅に削減する。本研究は,chatgptのような大規模言語モデルを授業で禁止するよりも,学生に対して,正しさ,完全性,回答の独創性を問うことによって活用を指導し,批判的思考を奨励すべきであることが示唆された。

A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level, on finals available online after the models were trained, and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We curate a dataset and benchmark of questions from machine learning final exams available online and code for answering these questions and generating new questions. We show how to generate new questions from other questions and course notes. For reproducibility and future research on this final exam benchmark, we use automatic checkers for multiple-choice, numeric, and questions with expression answers. We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to mere machine seconds. Our results suggest that rather than banning large language models such as ChatGPT in class, instructors should teach students to harness them by asking students meta-questions about correctness, completeness, and originality of the responses generated, encouraging critical thinking in academic studies.

翻訳日:2023-06-29 18:39:15 公開日:2023-06-28

# 微分可能なユーザモデル

Differentiable User Models ( http://arxiv.org/abs/2211.16277v2 )

ライセンス: Link先を確認

Alex H\"am\"al\"ainen, Mustafa Mert \c{C}elikok, Samuel Kaski

(参考訳) 確率的ユーザモデリングは、ループ内に人間がいるユビキタスケースで機械学習システムを構築するために不可欠である。しかし、現代の高度なユーザーモデルは認知行動シミュレータとして設計され、現代の機械学習パイプラインと互換性がなく、ほとんどの実用的なアプリケーションでは計算が禁じられている。我々は、この計算ボトルネックを回避するために、広く適用可能な微分可能サロゲートを導入することでこの問題に対処し、サロゲートは現代の認知モデルを用いた計算効率の高い推論を可能にする。オンラインアプリケーションに適した計算コストで、既存の可能性のない推論手法である、唯一利用可能なソリューションに匹敵するモデリング能力が達成可能であることを実験的に示す。最後に、メニュー検索タスクにおいて、aiアシスタントがオンラインインタラクションに認知モデルをどのように利用できるかを示す。

Probabilistic user modeling is essential for building machine learning systems in the ubiquitous cases with humans in the loop. However, modern advanced user models, often designed as cognitive behavior simulators, are incompatible with modern machine learning pipelines and computationally prohibitive for most practical applications. We address this problem by introducing widely-applicable differentiable surrogates for bypassing this computational bottleneck; the surrogates enable computationally efficient inference with modern cognitive models. We show experimentally that modeling capabilities comparable to the only available solution, existing likelihood-free inference methods, are achievable with a computational cost suitable for online applications. Finally, we demonstrate how AI-assistants can now use cognitive models for online interaction in a menu-search task, which has so far required hours of computation during interaction.

翻訳日:2023-06-29 18:31:33 公開日:2023-06-28

# ストラグラー緩和のための逐次勾配符号化

Sequential Gradient Coding For Straggler Mitigation ( http://arxiv.org/abs/2211.13802v2 )

ライセンス: Link先を確認

M. Nikhil Krishnan, MohammadReza Ebrahimi, Ashish Khisti

(参考訳) 分散コンピューティングでは、遅いノード(ストラグラー)は通常ボトルネックとなる。 Tandonらによって導入されたGC(Gradient Coding)は、誤り訂正符号の原理を用いて、ストラグラーの存在下で勾配計算を分散する効率的な手法である。本稿では,各勾配の処理をラウンド$t$で開始し,ラウンド$(t+t)$で終了するような勾配列$\{g(1),g(2),\ldots,g(j)\}$の分散計算を考える。ここで$T\geq 0$は遅延パラメータを表す。 GCスキームでは、コーディングは計算ノード間でのみ行われ、結果として$T=0$というソリューションが得られる。一方、$t>0$を持つことで、時間次元を利用するスキームを設計することができる。本稿では,GCと比較して性能向上を示す2つの手法を提案する。最初のスキームでは、GCと未完成タスクの選択的な繰り返しを組み合わせることで、トラグラー緩和の改善を実現しています。私たちの主な貢献を構成する第2のスキームでは、タスクのサブセットにgcを適用し、残りのタスクを反復します。次に、過去のストラグラーパターンに基づいて、労働者とラウンドにまたがる2つのタスクのクラスを適応的に多重化する。理論解析を用いて,第2のスキームが計算負荷を大幅に削減できることを実証する。実験では、256のワーカノードを含むAWS Lambdaクラスタ上で、並列に複数のニューラルネットワークをトレーニングする実践的な設定について検討した。提案手法は, 自然に発生する非シミュレートストラグラーの存在下で, ベースラインGC方式よりも16倍のランタイム改善を実現することができることを示す。

In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation in the presence of stragglers. In this paper, we consider the distributed computation of a sequence of gradients $\{g(1),g(2),\ldots,g(J)\}$, where processing of each gradient $g(t)$ starts in round-$t$ and finishes by round-$(t+T)$. Here $T\geq 0$ denotes a delay parameter. For the GC scheme, coding is only across computing nodes and this results in a solution where $T=0$. On the other hand, having $T>0$ allows for designing schemes which exploit the temporal dimension as well. In this work, we propose two schemes that demonstrate improved performance compared to GC. Our first scheme combines GC with selective repetition of previously unfinished tasks and achieves improved straggler mitigation. In our second scheme, which constitutes our main contribution, we apply GC to a subset of the tasks and repetition for the remainder of the tasks. We then multiplex these two classes of tasks across workers and rounds in an adaptive manner, based on past straggler patterns. Using theoretical analysis, we demonstrate that our second scheme achieves significant reduction in the computational load. In our experiments, we study a practical setting of concurrently training multiple neural networks over an AWS Lambda cluster involving 256 worker nodes, where our framework naturally applies. We demonstrate that the latter scheme can yield a 16\% improvement in runtime over the baseline GC scheme, in the presence of naturally occurring, non-simulated stragglers.

翻訳日:2023-06-29 18:31:21 公開日:2023-06-28

# 周期駆動超低温原子を用いた非可換キラルスピン液体の工学と探索

Engineering and probing non-Abelian chiral spin liquids using periodically driven ultracold atoms ( http://arxiv.org/abs/2211.09777v3 )

ライセンス: Link先を確認

Bo-Ye Sun, Nathan Goldman, Monika Aidelsburger, Marin Bukov

(参考訳) 量子シミュレータを用いた非可換キラルスピン液体の実現と探索を目的として,周期(フロッケ)駆動に基づく寒冷原子を用いたキタエフのハニカムモデルの実装法を提案する。実効的なハミルトニアンを逆周波数展開における主次数に導出し、実効的なマヨラナと渦の自由度を混ぜることなくスペクトルの位相的ギャップを開くことを示した。我々は、マヨルダナフェルミオンの物理を探索する課題に対処し、元の合成スピン自由度にのみアクセスする。具体的には,Floquetドライブの存在下でのギャップ分光とエッジクエンチを用いて,キラルスピン液体相の性質を検出することを提案する。その結果得られるキラルエッジ信号は、中性マヨラナ電流に関連する熱ホール効果と関連しており、現実的に準備された状態に対して頑健であることが判明した。フロッケ工学と強い相互作用を組み合わせることで、量子シミュレータを用いた非可換励起と量子化熱輸送の将来研究への道を開く。

We propose a scheme to implement Kitaev's honeycomb model with cold atoms, based on a periodic (Floquet) drive, in view of realizing and probing non-Abelian chiral spin liquids using quantum simulators. We derive the effective Hamiltonian to leading order in the inverse-frequency expansion, and show that the drive opens up a topological gap in the spectrum without mixing the effective Majorana and vortex degrees of freedom. We address the challenge of probing the physics of Majorana fermions, while having only access to the original composite spin degrees of freedom. Specifically, we propose to detect the properties of the chiral spin liquid phase using gap spectroscopy and edge quenches in the presence of the Floquet drive. The resulting chiral edge signal, which relates to the thermal Hall effect associated with neutral Majorana currents, is found to be robust for realistically-prepared states. By combining strong interactions with Floquet engineering, our work paves the way for future studies of non-Abelian excitations and quantized thermal transport using quantum simulators.

翻訳日:2023-06-29 18:30:52 公開日:2023-06-28

# QueryForm: シンプルなゼロショットフォーム Entity Query Framework

QueryForm: A Simple Zero-shot Form Entity Query Framework ( http://arxiv.org/abs/2211.07730v2 )

ライセンス: Link先を確認

Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister

(参考訳) 文書理解のためのゼロショット転送学習は、文書エンティティのアノテートにかかわる高コスト化を支援するために不可欠だが、未検討のシナリオである。本稿では,0ショット方式でフォームライクなドキュメントからエンティティ値を抽出する新しいクエリベースのフレームワークQueryFormを提案する。 queryformには、ドキュメントスキーマと特定のエンティティタイプの両方をクエリに構成するデュアルプロンプトメカニズムが含まれており、トランスフォーマーモデルに単一のエンティティ抽出タスクを実行するように促すために使用される。さらに,HTML アノテーションの弱いフォーム風の Web ページから生成された大規模クエリエンタリティペアを,事前学習型 QueryForm に活用することを提案する。事前トレーニングと微調整を同じクエリベースのフレームワークに統合することにより、queryformでは、さまざまなエンティティやレイアウトを含む構造化ドキュメントからモデルが学習できるようになる。 QueryForm は XFUND (+4.6%~10.1%) と Payment (+3.2%~9.5%) のゼロショットベンチマークの両方に新しい最先端の平均 F1 スコアをセットする。

Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities. We present a novel query-based framework, QueryForm, that extracts entity values from form-like documents in a zero-shot fashion. QueryForm contains a dual prompting mechanism that composes both the document schema and a specific entity type into a query, which is used to prompt a Transformer model to perform a single entity extraction task. Furthermore, we propose to leverage large-scale query-entity pairs generated from form-like webpages with weak HTML annotations to pre-train QueryForm. By unifying pre-training and fine-tuning into the same query-based framework, QueryForm enables models to learn from structured documents containing various entities and layouts, leading to better generalization to target document types without the need for target-specific training data. QueryForm sets new state-of-the-art average F1 score on both the XFUND (+4.6%~10.1%) and the Payment (+3.2%~9.5%) zero-shot benchmark, with a smaller model size and no additional image input.

翻訳日:2023-06-29 18:30:25 公開日:2023-06-28

# 相互作用する可積分および非可積分スピン-1/2XYZ鎖における時間外秩序相関子のスロー緩和

Slow relaxation of out-of-time-ordered correlators in interacting integrable and nonintegrable spin-1/2 XYZ chains ( http://arxiv.org/abs/2211.07073v2 )

ライセンス: Link先を確認

Vinitha Balachandran, Lea F. Santos, Marcos Rigol, and Dario Poletti

(参考訳) 時間外順序相関器(OTOC)は量子情報のスクランブルを特徴づけるのに役立ち、通常は非可積分系の文脈で研究される。本研究では、古典的対応を持たないレジームにおける可積分および非可積分スピン-1/2XYZ鎖の相互作用におけるOTOCの緩和ダイナミクスを比較する。両方の種類の鎖において、$U(1)$や超対称性のような対称性の存在を利用して、OTOC作用素がハミルトン作用素と重複するか否かを考察する。 OTOCs の緩和は、連鎖が可積分であるか非可積分であるかに関わらず、重複が存在するとき(そうでないとき)遅い(高速)ことを示す。遅くなると、OTOCのダイナミクスは2点相関器のそれに近いことが示される。数値計算を用いてオトクの動力学を研究し,エネルギー固有ベイシスにおける対応する局所作用素の対角およびオフ対角行列要素の性質から解析的洞察を得る。

Out-of-time ordered correlators (OTOCs) help characterize the scrambling of quantum information and are usually studied in the context of nonintegrable systems. In this work, we compare the relaxation dynamics of OTOCs in interacting integrable and nonintegrable spin-1/2 XYZ chains in regimes without a classical counterpart. In both kinds of chains, using the presence of symmetries such as $U(1)$ and supersymmetry, we consider regimes in which the OTOC operators overlap or not with the Hamiltonian. We show that the relaxation of the OTOCs is slow (fast) when there is (there is not) an overlap, independently of whether the chain is integrable or nonintegrable. When slow, we show that the OTOC dynamics follows closely that of the two-point correlators. We study the dynamics of OTOCs using numerical calculations, and gain analytical insights from the properties of the diagonal and of the off-diagonal matrix elements of the corresponding local operators in the energy eigenbasis.

翻訳日:2023-06-29 18:30:01 公開日:2023-06-28

# 双曲表現学習の数値的安定性

The Numerical Stability of Hyperbolic Representation Learning ( http://arxiv.org/abs/2211.00181v3 )

ライセンス: Link先を確認

Gal Mishne, Zhengchao Wan, Yusu Wang, Sheng Yang

(参考訳) 球の半径が指数関数的に増加すると、双曲空間は任意に小さな歪みで木を埋め込むことができ、したがって階層的なデータセットを表現するために広く注目を集めている。しかし、この指数的成長特性は数値的な不安定さの代償となり、双曲型学習モデルの訓練は時に破滅的なnan問題を引き起こし、浮動小数点演算において表現不能な値に遭遇する。本研究では,双曲空間に対する2つの人気モデルの極限,すなわちポアンカーの球とローレンツ模型を慎重に解析する。まず,64ビットの算術システムにおいて,ポアンカルの球は点を正しく表現するためのローレンツモデルよりも比較的大きな容量を持つことを示す。そして,最適化の観点から,ポアンカーの球に対するローレンツモデルの優位性を理論的に検証する。両方のモデルの数値的な制限を考えると、これらの制限を緩和できる双曲空間のユークリッドパラメトリゼーションを1つ特定する。さらに、このユークリッドパラメトリゼーションを双曲型超平面に拡張し、双曲型SVMの性能を向上させる能力を示す。

Given the exponential growth of the volume of the ball w.r.t. its radius, the hyperbolic space is capable of embedding trees with arbitrarily small distortion and hence has received wide attention for representing hierarchical datasets. However, this exponential growth property comes at a price of numerical instability such that training hyperbolic learning models will sometimes lead to catastrophic NaN problems, encountering unrepresentable values in floating point arithmetic. In this work, we carefully analyze the limitation of two popular models for the hyperbolic space, namely, the Poincar\'e ball and the Lorentz model. We first show that, under the 64 bit arithmetic system, the Poincar\'e ball has a relatively larger capacity than the Lorentz model for correctly representing points. Then, we theoretically validate the superiority of the Lorentz model over the Poincar\'e ball from the perspective of optimization. Given the numerical limitations of both models, we identify one Euclidean parametrization of the hyperbolic space which can alleviate these limitations. We further extend this Euclidean parametrization to hyperbolic hyperplanes and exhibits its ability in improving the performance of hyperbolic SVM.

翻訳日:2023-06-29 18:28:50 公開日:2023-06-28

# 無限格子上の量子スピン系に対する位数1のワッサーシュタイン距離

The Wasserstein distance of order 1 for quantum spin systems on infinite lattices ( http://arxiv.org/abs/2210.11446v2 )

ライセンス: Link先を確認

Giacomo De Palma and Dario Trevisan

(参考訳) 我々は、特定の量子 $w_1$ 距離と呼ばれる格子 $\mathbb{z}^d$ 上の位数 1 から量子スピン系へのワッサーシュタイン距離の一般化を提案する。この提案は[de palma et al., ieee trans. inf. theory 67, 6627 (2021)]のquditsに対する$w_1$ distance に基づいており、任意の有限個のスピン上の限界状態が正準基底で対角的である量子状態に対してornsteinの$\bar{d}$- distance を回復する。また、$\mathbb{Z}^d$ 上の量子相互作用に対するリプシッツ定数の一般化を提案し、そのような量子リプシッツ定数と特定の量子 $W_1$ 距離が互いに双対であることを証明する。我々は、量子$W_1$距離という観点から有限個の量子スピンに対するフォン・ノイマンエントロピーに対する新しい連続性を証明し、それを用いて、$\mathbb{Z}^d$上の量子スピン系に対する特定の量子$W_1$距離という観点から、特定のフォン・ノイマンエントロピーに対する連続性を証明する。最後に、臨界温度を超える局所的な量子交換相互作用が輸送コストの不等式を満たすことを証明し、ギブス状態の特異性を示す。

We propose a generalization of the Wasserstein distance of order 1 to quantum spin systems on the lattice $\mathbb{Z}^d$, which we call specific quantum $W_1$ distance. The proposal is based on the $W_1$ distance for qudits of [De Palma et al., IEEE Trans. Inf. Theory 67, 6627 (2021)] and recovers Ornstein's $\bar{d}$-distance for the quantum states whose marginal states on any finite number of spins are diagonal in the canonical basis. We also propose a generalization of the Lipschitz constant to quantum interactions on $\mathbb{Z}^d$ and prove that such quantum Lipschitz constant and the specific quantum $W_1$ distance are mutually dual. We prove a new continuity bound for the von Neumann entropy for a finite set of quantum spins in terms of the quantum $W_1$ distance, and we apply it to prove a continuity bound for the specific von Neumann entropy in terms of the specific quantum $W_1$ distance for quantum spin systems on $\mathbb{Z}^d$. Finally, we prove that local quantum commuting interactions above a critical temperature satisfy a transportation-cost inequality, which implies the uniqueness of their Gibbs states.

翻訳日:2023-06-29 18:28:19 公開日:2023-06-28

# 準確率表現における量子ベイズ推論

Quantum Bayesian Inference in Quasiprobability Representations ( http://arxiv.org/abs/2301.01952v2 )

ライセンス: Link先を確認

Clive Cenxin Aw, Kelvin Onggadinata, Dagomir Kaszlikowski, Valerio Scarani

(参考訳) ベイズの法則は情報科学や物理科学において重要な論理推論の役割を果たす。量子状態への拡張は、近年のいくつかの研究の対象となっている。これらのベイズの規則の量子バージョンはヒルベルト空間の言語で表現されている。本稿では,正規準確率表現(離散ウィグナー表現を含む)と対称的かつ情報的完全正の演算子値測度(sic-povms)の2つの標準的選択に対する明示的な公式を用いて,任意の準確率表現におけるpetz回復写像の表現を導出する。古典理論と量子理論の論理的推論の核となる違いは、(quasi-)確率ベクトルに作用する(quasi-)確率行列の同じ数学的構文を用いることで、チャネルの表現よりもむしろ参照の操作において見出される。

Bayes' rule plays a crucial piece of logical inference in information and physical sciences alike. Its extension into the quantum regime has been the object of several recent works. These quantum versions of Bayes' rule have been expressed in the language of Hilbert spaces. In this paper, we derive the expression of the Petz recovery map within any quasiprobability representation, with explicit formulas for the two canonical choices of normal quasiprobability representations (which include Discrete Wigner representations) and of representations based on symmetric, informationally complete positive operator-valued measures (SIC-POVMs). By using the same mathematical syntax of (quasi-)stochastic matrices acting on (quasi-)stochastic vectors, the core difference in logical inference between classical and quantum theory is found in the manipulation of the reference prior rather than in the representation of the channel.

翻訳日:2023-06-29 18:21:58 公開日:2023-06-28

# 深いRプログラミング

Deep R Programming ( http://arxiv.org/abs/2301.01188v3 )

ライセンス: Link先を確認

Marek Gagolewski

(参考訳) Deep R Programmingは、データサイエンスの最も人気のある言語の1つである包括的で詳細な入門コースである。これは野心的な学生、専門家、研究者に、この強力な環境の独立したユーザーになるための知識とスキルを与え、データラングリングや分析、数値計算、統計学、機械学習に関するあらゆる問題に取り組むことができる。この教科書は非営利プロジェクトです。オンライン版とPDF版は <https://deepr.gagolewski.com/> で無料で入手できる。

Deep R Programming is a comprehensive and in-depth introductory course on one of the most popular languages for data science. It equips ambitious students, professionals, and researchers with the knowledge and skills to become independent users of this potent environment so that they can tackle any problem related to data wrangling and analytics, numerical computing, statistics, and machine learning. This textbook is a non-profit project. Its online and PDF versions are freely available at <https://deepr.gagolewski.com/>.

翻訳日:2023-06-29 18:21:42 公開日:2023-06-28

# チューナブル有効結合を持つKerrパラメトリック発振器の相関振動

Correlated oscillations in Kerr parametric oscillators with tunable effective coupling ( http://arxiv.org/abs/2212.13682v3 )

ライセンス: Link先を確認

T. Yamaji and S. Masuda and A. Yamaguchi and T. Satoh and A. Morioka and Y. Igarashi and M. Shirane and T. Yamamoto

(参考訳) 単一光子kerrレジームにおける2つの分散回路ジョセフソンパラメトリック発振器からなる系の同時パラメトリック振動を静電容量で結合した。系のエネルギーは、振幅と符号がパラメトリックポンプ間の相対位相に依存する効果的なカップリングを持つ2ビットイジングハミルトニアンによって記述される。パラメトリック振動の2相相は相互に相関し, ポンプ位相を調整することで相関のパリティと強度を制御できることを実証した。観測された相関は, 純粋な強調を考慮したシミュレーションで再現される。本結果は、KPOネットワークからなるIsingマシンハードウェアで使用可能な外部マイクロ波の位相によるハミルトンパラメータのチューニング性を示す。

We study simultaneous parametric oscillations in a system composed of two distributed-element-circuit Josephson parametric oscillators in the single-photon Kerr regime coupled via a static capacitance. The energy of the system is described by a two-bit Ising Hamiltonian with an effective coupling whose amplitude and sign depend on the relative phase between parametric pumps. We demonstrate that the binary phases of the parametric oscillations are correlated with each other, and that the parity and strength of the correlation can be controlled by adjusting the pump phase. The observed correlation is reproduced in our simulation taking pure dephasing into account. The present result demonstrates the tunability of the Hamiltonian parameters by the phase of external microwave, which can be used in the Ising machine hardware composed of the KPO network.

翻訳日:2023-06-29 18:21:29 公開日:2023-06-28

# クリロフ状態と作用素複素量に対する普遍的アプローチ

A universal approach to Krylov State and Operator complexities ( http://arxiv.org/abs/2212.10583v3 )

ライセンス: Link先を確認

Mohsen Alishahiha and Souvik Banerjee

(参考訳) 我々は、Krylov状態と演算子複雑性の両方を同じ足場に配置できる一般的な枠組みを提案する。我々の形式論において、クリロフ複雑性は、作用素複雑性に対してチャネル状態写像によって得られる二重ヒルベルト空間上に存在する関連する状態の密度行列によって定義される。この密度行列の観点からの複雑性の統一定義により、クリロフ複雑性の概念を部分領域あるいは混合状態複雑性に拡張し、また自然にクリロフ相互複雑度にも拡張することができる。このフレームワークは、複雑さというホログラフィック概念をうまく包含していることを示す。

We present a general framework in which both Krylov state and operator complexities can be put on the same footing. In our formalism, the Krylov complexity is defined in terms of the density matrix of the associated state which, for the operator complexity, lives on a doubled Hilbert space obtained through the channel-state map. This unified definition of complexity in terms of the density matrices enables us to extend the notion of Krylov complexity, to subregion or mixed state complexities and also naturally to the Krylov mutual complexity. We show that this framework also encompasses nicely the holographic notions of complexity.

翻訳日:2023-06-29 18:21:17 公開日:2023-06-28

# トランスフォーマーに基づくバイオメディカル言語モデルのドメイン内適応

Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models ( http://arxiv.org/abs/2212.10422v3 )

ライセンス: Link先を確認

Tommaso Mario Buonocore, Claudio Crema, Alberto Redolfi, Riccardo Bellazzi, Enea Parimbelli

(参考訳) デジタル医療の時代には、病院で毎日生成される膨大なテキスト情報は、タスク固有の、微調整されたバイオメディカル言語表現モデル、患者のケアと管理の改善で活用できる、必須だが未使用の資産である。このような特殊なドメインに対しては、広範囲のチェックポイントから派生した微調整モデルが、大規模なドメイン内リソースに対する追加のトレーニングラウンドに大きく貢献することを示した。しかし、これらのリソースはイタリア語のような低リソース言語には到達できないことが多く、地元の医療機関がドメイン内適応を採用するのを妨げている。このギャップを減らすために,我々の研究は,英語以外の言語で生物医学的言語モデルを導出するための2つのアプローチについて検討した。1つは,英語リソースのニューラルネットワーク翻訳に基づく,品質よりも量を重視する,もう1つは,イタリア語でネイティブに書かれたハイグレードで狭スコープのコーパスに基づく,量よりも品質を優先する,という,具体的なユースケースである。本研究は, 生物医学的適応のためのデータ品質よりもデータ量に厳しい制約があることを示すが, 高品質なデータの結合は, 比較的サイズが制限されたコーパスを扱う場合でも, モデル性能を向上させることができる。我々の調査から得られたモデルは、イタリアの病院やアカデミアにとって重要な研究機会を開放する可能性がある。最後に、この研究から学んだ一連の教訓は、他の低リソース言語や異なるドメイン設定に一般化可能なバイオメディカル言語モデルを構築するためのソリューションに対する貴重な洞察を構成する。

In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can largely benefit additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions to employ in-domain adaptation. In order to reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use-case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insights towards a solution to build biomedical language models that are generalizable to other less-resourced languages and different domain settings.

翻訳日:2023-06-29 18:21:06 公開日:2023-06-28

# ベクターレグレッションのサポート:リスククワッドローグフレームワーク

Support Vector Regression: Risk Quadrangle Framework ( http://arxiv.org/abs/2212.09178v4 )

ライセンス: Link先を確認

Anton Malandii, Stan Uryasev

(参考訳) 本稿では, 最適化, リスク管理, 統計的推定を関連付ける基本リスク二次理論の文脈において, サポートベクトル回帰 (svr) について検討する。 SVR, $\varepsilon$-SVR および $\nu$-SVR の2つの定式化は、それぞれ等価なエラー対策(Vapnik error と CVaR norm)の最小化に対応する。これらの誤差測度は、対応するリスク二次数を定義する。 SVRに対応する基本リスク四角形を構築することにより、SVRは2つの対称条件量子平均の漸近的に偏りのない推定器であることを示す。さらに,一般確率環境での$\varepsilon$-SVRと$\nu$-SVRの等価性を証明した。さらに、SVRは正規化ペナルティを持つ正規偏差最小化問題として定式化される。最後に、リスク四角形フレームワークにおけるSVRの二重定式化が導出される。

This paper investigates Support Vector Regression (SVR) in the context of the fundamental risk quadrangle theory, which links optimization, risk management, and statistical estimation. It is shown that both formulations of SVR, $\varepsilon$-SVR and $\nu$-SVR, correspond to the minimization of equivalent error measures (Vapnik error and CVaR norm, respectively) with a regularization penalty. These error measures, in turn, define the corresponding risk quadrangles. By constructing the fundamental risk quadrangle, which corresponds to SVR, we show that SVR is the asymptotically unbiased estimator of the average of two symmetric conditional quantiles. Further, we prove the equivalence of the $\varepsilon$-SVR and $\nu$-SVR in a general stochastic setting. Additionally, SVR is formulated as a regular deviation minimization problem with a regularization penalty. Finally, the dual formulation of SVR in the risk quadrangle framework is derived.

翻訳日:2023-06-29 18:20:38 公開日:2023-06-28

# 国家対立型マルチエージェント強化学習の解決策とは?

What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning? ( http://arxiv.org/abs/2212.02705v4 )

ライセンス: Link先を確認

Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, Fei Miao

(参考訳) MARL(Multi-Agent Reinforcement Learning)は,エージェントのポリシーが正確な状態情報に基づいていると仮定して開発されている。しかし、Deep Reinforcement Learning (DRL)を通じて学んだ政策は、敵国の摂動攻撃に影響を受けやすい。本研究では,MARL の基本的特性を状態不確実性下で調査する試みとして,SAMG (State-Adversarial Markov Game) を提案する。分析の結果,最適エージェント政策とロバストnash均衡の一般的な解概念は必ずしもsamgに存在しないことがわかった。この困難を回避するために,エージェントが最悪の状態値の最大化を目指す,ロバストエージェントポリシーと呼ばれる新しいソリューションを考える。我々は,有限状態および有限作用samgに対するロバストエージェントポリシーの存在を証明する。さらに、状態不確実性の下でMARLエージェントの堅牢なポリシーを学習するためのロバスト多エージェントアクタークリティカル(RMA3C)アルゴリズムを提案する。実験により,本アルゴリズムは状態摂動に直面する場合の既存手法よりも優れ,MARLポリシーの堅牢性を大幅に向上することが示された。私たちのコードはhttps://songyanghan.github.io/what_is_solution/で公開しています。

Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate the fundamental properties of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on https://songyanghan.github.io/what_is_solution/.

翻訳日:2023-06-29 18:19:41 公開日:2023-06-28

# narrasum: 物語要約のための大規模データセット

NarraSum: A Large-Scale Dataset for Abstractive Narrative Summarization ( http://arxiv.org/abs/2212.01476v2 )

ライセンス: Link先を確認

Chao Zhao, Faeze Brahman, Kaiqiang Song, Wenlin Yao, Dian Yu, Snigdha Chaturvedi

(参考訳) 物語の要約は、最も健全な出来事とキャラクターを記述するための物語の蒸留版を作ることを目的としている。物語の要約は、出来事の因果関係と性格行動を理解する必要があるため、難しい。この方向の研究を促進するために,大規模な物語要約データセットであるNarraSumを提案する。 122kの物語文書を収録し、様々なジャンルの映画やテレビ番組の筋書きや、それらに対応する抽象要約から収集する。実験の結果,NarraSumにおける人間と最先端の要約モデルの間には大きなパフォーマンスギャップが存在することがわかった。このデータセットは、今後の要約研究や、自然言語の理解と生成に関する広範な研究を促進することを願っている。データセットはhttps://github.com/zhaochaocs/narrasumで入手できる。

Narrative summarization aims to produce a distilled version of a narrative to describe its most salient events and characters. Summarizing a narrative is challenging as it requires an understanding of event causality and character behaviors. To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset. It contains 122K narrative documents, which are collected from plot descriptions of movies and TV episodes with diverse genres, and their corresponding abstractive summaries. Experiments show that there is a large performance gap between humans and the state-of-the-art summarization models on NarraSum. We hope that this dataset will promote future research in summarization, as well as broader studies of natural language understanding and generation. The dataset is available at https://github.com/zhaochaocs/narrasum.

翻訳日:2023-06-29 18:19:19 公開日:2023-06-28

# トランスベース学習最適化

Transformer-Based Learned Optimization ( http://arxiv.org/abs/2212.01055v4 )

ライセンス: Link先を確認

Erik G\"artner, Luke Metz, Mykhaylo Andriluka, C. Daniel Freeman, Cristian Sminchisescu

(参考訳) 本稿では,ニューラルネットワークを用いたオプティマイザ更新ステップの計算を行うための新しい学習最適化手法を提案する。最適化器のパラメータは、最適化タスクのセットのトレーニングによって学習され、効率よく最小化を行う。私たちのイノベーションは、古典的なbfgsアルゴリズムにインスパイアされた学習オプティマイザのための、新しいニューラルネットワークアーキテクチャであるoptimusです。 BFGSと同様に、プレコンディショニング行列をランク1更新の和として推定するが、Transformerベースのニューラルネットワークを用いてこれらの更新をステップ長と方向とともに予測する。近年の学習された最適化に基づくアプローチとは対照的に,我々の定式化により,対象問題のパラメータ空間の次元をまたいだ条件付けが可能となった。提案手法の利点は,これまで最適化アルゴリズムの評価に用いられてきた目標関数と,物理に基づく3次元人体動作の視覚的再構成の現実的実現に有効であることを示す。

We propose a new approach to learned optimization where we represent the computation of an optimizer's update step using a neural network. The parameters of the optimizer are then learned by training on a set of optimization tasks with the objective to perform minimization efficiently. Our innovation is a new neural network architecture, Optimus, for the learned optimizer inspired by the classic BFGS algorithm. As in BFGS, we estimate a preconditioning matrix as a sum of rank-one updates but use a Transformer-based neural network to predict these updates jointly with the step length and direction. In contrast to several recent learned optimization-based approaches, our formulation allows for conditioning across the dimensions of the parameter space of the target problem while remaining applicable to optimization tasks of variable dimensionality without retraining. We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for the evaluation of optimization algorithms, as well as on the real world-task of physics-based visual reconstruction of articulated 3d human motion.

翻訳日:2023-06-29 18:19:07 公開日:2023-06-28

# サンプリングのための近位アルゴリズムの次元依存性の改善

Improved dimension dependence of a proximal algorithm for sampling ( http://arxiv.org/abs/2302.10081v2 )

ライセンス: Link先を確認

Jiaojiao Fan, Bo Yuan and Yongxin Chen

(参考訳) 本研究では,すべての古典的設定(特にlog-concave,log-concave,logarithmic-sobolev inequality (lsi),poincar\'e inequality)において,より汎用的な半スムースあるいは複合ポテンシャルを用いた,より優れた複雑性境界を実現するサンプリングアルゴリズムを提案する。提案アルゴリズムは, 〜\citet{lee2021structured} で導入された近位標本に基づく。この近位サンプリング器の性能は、近位サンプリング器の重要なステップである制限されたガウスオラクル(RGO)によって決定される。この研究の主な貢献は、近似的拒絶サンプリングに基づくRGOの不正確な実現である。 RGOの不等式を束縛するために、ガウス分布上の半滑らか関数に対する新しい濃度不等式を確立し、リプシッツ函数に対するよく知られた濃度不等式を拡張する。 RGOの実装を近位サンプリングに応用し、ほぼすべての設定で最先端の複雑さ境界を達成する。例えば、強い対数対数分布の場合、我々の手法は、MALA の minimax 境界よりも、ウォームスタートのない$\tilde\mathcal{O}(\kappa d^{1/2})$ の複雑さを持つ。 LSIを満たす分布に対して、我々の境界は$\tilde \mathcal{O}(\hat \kappa d^{1/2})$である。

We propose a sampling algorithm that achieves superior complexity bounds in all the classical settings (strongly log-concave, log-concave, Logarithmic-Sobolev inequality (LSI), Poincar\'e inequality) as well as more general settings with semi-smooth or composite potentials. Our algorithm is based on the proximal sampler introduced in~\citet{lee2021structured}. The performance of this proximal sampler is determined by that of the restricted Gaussian oracle (RGO), a key step in the proximal sampler. The main contribution of this work is an inexact realization of RGO based on approximate rejection sampling. To bound the inexactness of RGO, we establish a new concentration inequality for semi-smooth functions over Gaussian distributions, extending the well-known concentration inequality for Lipschitz functions. Applying our RGO implementation to the proximal sampler, we achieve state-of-the-art complexity bounds in almost all settings. For instance, for strongly log-concave distributions, our method has complexity bound $\tilde\mathcal{O}(\kappa d^{1/2})$ without warm start, better than the minimax bound for MALA. For distributions satisfying the LSI, our bound is $\tilde \mathcal{O}(\hat \kappa d^{1/2})$ where $\hat \kappa$ is the ratio between smoothness and the LSI constant, better than all existing bounds.

翻訳日:2023-06-29 18:12:18 公開日:2023-06-28

# 非特定運動データを用いた拡張可能なXRユーザ同定

Extensible Motion-based Identification of XR Users using Non-Specific Motion Data ( http://arxiv.org/abs/2302.07517v3 )

ライセンス: Link先を確認

Christian Schell, Konstantin Kobs, Tamara Fernando, Andreas Hotho, Marc Erich Latoschik

(参考訳) 本稿では,距離ベースと分類に基づくアプローチの強みを組み合わせることで,拡張現実ユーザの動きを識別する。そこで我々は,深層メトリック学習を活用した組込みモデルについて検討する。われわれは,VRゲーム‘Half-Life: Alyx’’をプレイするユーザのデータセット上でモデルをトレーニングし,アート分類ベースモデルの状態をベースラインとして,複数の実験と分析を行う。その結果,埋め込み型手法が有効であった。 1) 数分間の登録データを使用して,非特定動作から新規ユーザを識別できる。 2)新しいユーザーを数秒以内に登録できるが、ベースラインアプローチの再トレーニングにはおよそ1日かかる。 3) 登録データが少ない場合にのみ,ベースラインアプローチよりも信頼性が高い。 4) 異なるVRデバイスで記録された別のデータセットから新しいユーザーを特定するために使用することができる。全体として、我々のソリューションは、拡張可能なxrユーザ識別システムの基礎であり、幅広いユーザ動作に適用できる。また、専門知識やハードウェア、あるいはディープラーニングモデルをトレーニングするためのデータを必要としない、XR実践者が使用可能なプロダクション対応モデルの道を開く。

In this paper, we combine the strengths of distance-based and classification-based approaches for the task of identifying extended reality users by their movements. For this we explore an embedding-based model that leverages deep metric learning. We train the model on a dataset of users playing the VR game ``Half-Life: Alyx'' and conduct multiple experiments and analyses using a state of the art classification-based model as baseline. The results show that the embedding-based method 1) is able to identify new users from non-specific movements using only a few minutes of enrollment data, 2) can enroll new users within seconds, while retraining the baseline approach takes almost a day, 3) is more reliable than the baseline approach when only little enrollment data is available, 4) can be used to identify new users from another dataset recorded with different VR devices. Altogether, our solution is a foundation for easily extensible XR user identification systems, applicable to a wide range of user motions. It also paves the way for production-ready models that could be used by XR practitioners without the requirements of expertise, hardware, or data for training deep learning models.

翻訳日:2023-06-29 18:11:46 公開日:2023-06-28

# 完全共変機械学習に向けて

Towards fully covariant machine learning ( http://arxiv.org/abs/2301.13724v2 )

ライセンス: Link先を確認

Soledad Villar (JHU), David W. Hogg (NYU, MPIA, Flatiron), Weichi Yao (NYU), George A. Kevrekidis (JHU, LANL), Bernhard Sch\"olkopf (MPI-IS)

(参考訳) 任意のデータ表現は任意の調査員の選択を伴う。これらの選択はデータ生成過程の外部にあるため、それぞれの選択は1つの可能な表現を別の表現に取る変換の群に対応する正確な対称性をもたらす。これらはパッシブ対称性であり、座標自由度、ゲージ対称性、単位共分散を含み、これらは全て物理学において重要な結果をもたらした。機械学習において、最も目に見える受動対称性はグラフの相対的あるいは置換的対称性である。私たちの目標は、プレイ中の多くの受動的対称性の機械学習の意味を理解することです。受動的対称性を尊重すべきならば,機械学習の実践について,dos と not について議論する。因果モデリングとの関連について議論し、学習問題の目的がサンプルから一般化することである場合、受動的対称性の実装は特に有用であると主張する。この論文は概念的であり、物理学、数学、機械学習の言語に翻訳される。受動的対称性の考察と実装は、機械学習が20世紀に物理学を変革したのと同じ方法で役立つと信じている。

Any representation of data involves arbitrary investigator choices. Because those choices are external to the data-generating process, each choice leads to an exact symmetry, corresponding to the group of transformations that takes one possible representation to another. These are the passive symmetries; they include coordinate freedom, gauge symmetry, and units covariance, all of which have led to important results in physics. In machine learning, the most visible passive symmetry is the relabeling or permutation symmetry of graphs. Our goal is to understand the implications for machine learning of the many passive symmetries in play. We discuss dos and don'ts for machine learning practice if passive symmetries are to be respected. We discuss links to causal modeling, and argue that the implementation of passive symmetries is particularly valuable when the goal of the learning problem is to generalize out of sample. This paper is conceptual: It translates among the languages of physics, mathematics, and machine-learning. We believe that consideration and implementation of passive symmetries might help machine learning in the same ways that it transformed physics in the twentieth century.

翻訳日:2023-06-29 18:11:09 公開日:2023-06-28

# 未発見の論理推論と学位カリキュラムの一般化

Generalization on the Unseen, Logic Reasoning and Degree Curriculum ( http://arxiv.org/abs/2301.13105v2 )

ライセンス: Link先を確認

Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk

(参考訳) 本稿では,論理関数の学習を,分散一般化の強い場合である未完(gotu)設定の一般化に焦点をあてて検討する。これは、ある推論タスク(例えば算術/論理学)におけるデータのリッチな組合せの性質が、代表的データのサンプリングを困難にし、GOTUの下での学習が成功すると、'extrapolating'あるいは'reasoning'学習者の最初のビゲットを与えるという事実が動機である。次に、(S)GDでトレーニングされた異なるネットワークアーキテクチャがGOTUの下でどのように機能するかを研究し、トランスフォーマーのインスタンス、ランダム特徴モデル、対角線ネットワークを含むネットワークモデルのクラスにおいて、無目でmin-degree-interpolatorが学習されるという理論的および実験的証拠を提供する。また、学習率や平均フィールドネットワークが漏れやすい最小限の解に到達した証拠も提示する。これらの知見は,(1)長さ一般化問題(例: Anil et al. 2022)を説明すること,(2)単項をより効率的に学習するDegree-Curriculumというカリキュラム学習アルゴリズムを導入すること,の2つに繋がる。

This paper considers the learning of logical (Boolean) functions with focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree-interpolator is learned on the unseen. We also provide evidence that other instances with larger learning rates or mean-field networks reach leaky min-degree solutions. These findings lead to two implications: (1) we provide an explanation to the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports.

翻訳日:2023-06-29 18:10:55 公開日:2023-06-28

# EHRSQL: 電子健康記録のための実践的なテキストからSQLのベンチマーク

EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records ( http://arxiv.org/abs/2301.07695v4 )

ライセンス: Link先を確認

Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, Edward Choi

(参考訳) 電子健康記録(EHR)のための新しいテキスト間SQLデータセットを提案する。発話は、医師、看護師、保険審査および健康記録チームを含む222人の病院スタッフから集められた。構造化EMHデータに基づくQAデータセットを構築するため,大学病院で調査を行い,種問合せの作成に回答した。次に、これらの質問をMIMIC-IIIとeICUの2つのオープンソースのEHRデータベースに手動でリンクし、様々な時間表現と、この調査から収集されたデータセットに持たない質問を含む。私たちのデータセットには、ユニークな課題があります。 1) 病院における幅広いニーズを反映したsqlクエリを生成し、簡単な検索や生存率の計算などの複雑な操作を含む。 2)医療における時間感性質問に対する各種時間表現の理解と対応 3) ある質問が回答可能か否かを区別する。当社のデータセットであるEHRSQLは、構造化されたEHRデータ上でのQAモデルの開発と評価のための実用的なベンチマークとして機能し、テキストからSQLまでの研究と、その医療における実際の展開のギャップを埋めるための一歩を踏み出すことができると考えています。 EHRSQLはhttps://github.com/glee4810/EHRSQLで入手できる。

We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and used the responses to create seed questions. We then manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset, which were also collected from the poll. Our dataset poses a unique set of challenges: the model needs to 1) generate SQL queries that reflect a wide range of needs in the hospital, including simple retrieval and complex operations such as calculating survival rate, 2) understand various time expressions to answer time-sensitive questions in healthcare, and 3) distinguish whether a given question is answerable or unanswerable. We believe our dataset, EHRSQL, can serve as a practical benchmark for developing and assessing QA models on structured EHR data and take a step further towards bridging the gap between text-to-SQL research and its real-life deployment in healthcare. EHRSQL is available at https://github.com/glee4810/EHRSQL.

翻訳日:2023-06-29 18:09:44 公開日:2023-06-28

# コンピュータビジョンとLSTMニューラルネットワークを用いた太陽コロナホール解析と予測

Solar Coronal Hole Analysis and Prediction using Computer Vision and LSTM Neural Network ( http://arxiv.org/abs/2301.06732v4 )

ライセンス: Link先を確認

Juyoung Yun

(参考訳) 人類が宇宙を探索し始めるにつれ、宇宙の天気の重要性が明らかになってきた。宇宙天気現象の一種であるコロナホールが、航空機や衛星の運用に影響を与えることが確立されている。コロナホール(英: coronal hole)は、オープン磁場線と比較的低温を特徴とする太陽上の領域であり、太陽風を平均より高い速度で放出する。本研究では,地球へのコロナホールの影響に備えるために,コンピュータビジョンを用いてコロナホール領域を検出し,太陽動力学観測所(sdo)の画像に基づいてその大きさを計算する。我々は、太陽の各領域のコロナホールを比較し、相関関係を分析する。次に, 深層学習, 特にLong Short-Term Memory (LSTM) 手法を実装し, コロナホール領域データの傾向を解析し, 7日間にわたる異なる太陽領域におけるそのサイズを予測する。本研究は, コロナホール領域の時系列データを解析することにより, コロナホールの挙動のパターンや傾向を同定し, 宇宙気象事象にどのように影響するかを理解することを目的とする。この研究は、地球と技術システムに影響を与える宇宙天気イベントを予測し、準備する能力を改善するための重要なステップである。

As humanity has begun to explore space, the significance of space weather has become apparent. It has been established that coronal holes, a type of space weather phenomenon, can impact the operation of aircraft and satellites. The coronal hole is an area on the sun characterized by open magnetic field lines and relatively low temperatures, which result in the emission of the solar wind at higher than average rates. In this study, To prepare for the impact of coronal holes on the Earth, we use computer vision to detect the coronal hole region and calculate its size based on images from the Solar Dynamics Observatory (SDO). We compare the coronal holes for each region of the Sun and analyze the correlation. We then implement deep learning techniques, specifically the Long Short-Term Memory (LSTM) method, to analyze trends in the coronal hole area data and predict its size for different sun regions over 7 days. By analyzing time series data on the coronal hole area, this study aims to identify patterns and trends in coronal hole behavior and understand how they may impact space weather events. This research represents an important step towards improving our ability to predict and prepare for space weather events that can affect Earth and technological systems.

翻訳日:2023-06-29 18:09:21 公開日:2023-06-28

# 確率的予測によるリアルタイムてんかん発作検出の遅延短縮

Shorter Latency of Real-time Epileptic Seizure Detection via Probabilistic Prediction ( http://arxiv.org/abs/2301.03465v2 )

ライセンス: Link先を確認

Yankun Xu, Jie Yang, Wenjie Ming, Shuang Wang, and Mohamad Sawan

(参考訳) 近年の研究では、感度性能のよい発作検出アルゴリズムが提案されているが、リアルタイムシナリオにおいて検出遅延を大幅に短縮することは困難である。本稿では,確率的予測によるてんかん発作検出遅延の短縮を目的とした,新しいディープラーニングフレームワークを提案する。我々は,従来の二分法から確率予測への変換を,発作指向脳波記録から横断周期を導入し,ソフトラベルを用いたラベル付け規則を提案することで行った。また, 3D-CNNアーキテクチャと組み合わせたSTFTを用いた新しい特徴抽出手法を提案し, サンプルの予測確率を正確に把握する。さらに,予測確率を高めるための修正重み付け戦略と,検出遅延を大幅に短縮する累積決定ルールを提案する。提案手法は,患者固有の離脱1回限りのクロスバリデーション方式において,CHB-MIT scalp EEG データセットと SWEC-ETHZ 頭蓋内 EEG データセットに実装されている。提案手法は, 交差期99例中94例, 脳波開始後100%の発作検出に成功し, 平均14.84%の正規化予測ictal probability (rpip) 誤差, 2.3 s検出遅延, 0.08/h偽検出率 (fdr) をchb-mitデータセット上で検出した。一方、交差期間中に検出された89例中84例、脳波開始後に100%検出された発作、16.17%のrpipエラー、4.7 s検出遅延、0.08/h fdrがswec-ethzデータセット上で達成されている。得られた検出レイテンシは, 従来研究で報告された最先端結果よりも少なくとも50%短い。

Although recent studies have proposed seizure detection algorithms with good sensitivity performance, there is a remained challenge that they were hard to achieve significantly short detection latency in real-time scenarios. In this manuscript, we propose a novel deep learning framework intended for shortening epileptic seizure detection latency via probabilistic prediction. We are the first to convert the seizure detection task from traditional binary classification to probabilistic prediction by introducing a crossing period from seizure-oriented EEG recording and proposing a labeling rule using soft-label for crossing period samples. And, a novel multiscale STFT-based feature extraction method combined with 3D-CNN architecture is proposed to accurately capture predictive probabilities of samples. Furthermore, we also propose rectified weighting strategy to enhance predictive probabilities, and accumulative decision-making rule to achieve significantly shorter detection latency. We implement the proposed framework on two prevalent datasets -- CHB-MIT scalp EEG dataset and SWEC-ETHZ intracranial EEG dataset in patient-specific leave-one-seizure-out cross-validation scheme. Eventually, the proposed algorithm successfully detected 94 out of 99 seizures during crossing period and 100% seizures detected after EEG onset, averaged 14.84% rectified predictive ictal probability (RPIP) errors of crossing samples, 2.3 s detection latency, 0.08/h false detection rate (FDR) on CHB-MIT dataset. Meanwhile, 84 out of 89 detected seizures during crossing period, 100% detected seizures after EEG onset, 16.17% RPIP errors, 4.7 s detection latency, and 0.08/h FDR are achieved on SWEC-ETHZ dataset. The obtained detection latencies are at least 50% shorter than state-of-the-art results reported in previous studies.

翻訳日:2023-06-29 18:08:59 公開日:2023-06-28

# Auto-AVSR: 自動ラベルによる音声認識

Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels ( http://arxiv.org/abs/2303.14307v3 )

ライセンス: Link先を確認

Pingchuan Ma, Alexandros Haliassos, Adriana Fernandez-Lopez, Honglie Chen, Stavros Petridis, Maja Pantic

(参考訳) 音響雑音に対する頑健性から,音声認識には多くの注目を集めている。近年,大規模モデルとトレーニングセットの使用を中心に,自動・視覚的・音声視覚的音声認識(ASR,VSR,AV-ASR)の性能が大幅に向上している。しかし、データセットの正確なラベル付けには時間と費用がかかる。そこで本研究では,ラベルなしデータセットの自動生成転写を用いて,トレーニングセットのサイズを増加させる方法について検討する。この目的のために、AVSpeechやVoxCeleb2といった非競合データセットを自動的に書き起こすために、公開トレーニング済みのASRモデルを使用します。そして、ARS、VSR、AV-ASRのモデルを拡張トレーニングセットでトレーニングし、LSS2とLSS3のデータセットと追加の自動転写データからなる。近年の文献的傾向であるトレーニングセットのサイズが大きくなると,ノイズによる書き起こしにもかかわらずWERが減少することが示されている。提案手法は,RS2 と LRS3 の AV-ASR 上での最先端性能を実現する。特に、現在の最先端アプローチよりも30%向上したRS3で0.9%のWERを達成し、26倍のトレーニングデータを持つ非公開データセットでトレーニングされたメソッドを上回ります。

Audio-visual speech recognition has received a lot of attention due to its robustness against acoustic noise. Recently, the performance of automatic, visual, and audio-visual speech recognition (ASR, VSR, and AV-ASR, respectively) has been substantially improved, mainly due to the use of larger models and training sets. However, accurate labelling of datasets is time-consuming and expensive. Hence, in this work, we investigate the use of automatically-generated transcriptions of unlabelled datasets to increase the training set size. For this purpose, we use publicly-available pre-trained ASR models to automatically transcribe unlabelled datasets such as AVSpeech and VoxCeleb2. Then, we train ASR, VSR and AV-ASR models on the augmented training set, which consists of the LRS2 and LRS3 datasets as well as the additional automatically-transcribed data. We demonstrate that increasing the size of the training set, a recent trend in the literature, leads to reduced WER despite using noisy transcriptions. The proposed model achieves new state-of-the-art performance on AV-ASR on LRS2 and LRS3. In particular, it achieves a WER of 0.9% on LRS3, a relative improvement of 30% over the current state-of-the-art approach, and outperforms methods that have been trained on non-publicly available datasets with 26 times more training data.

翻訳日:2023-06-29 18:03:00 公開日:2023-06-28

# xplainer:x線観測からゼロショット診断へ

Xplainer: From X-Ray Observations to Explainable Zero-Shot Diagnosis ( http://arxiv.org/abs/2303.13391v3 )

ライセンス: Link先を確認

Chantal Pellegrini, Matthias Keicher, Ege \"Ozsoy, Petra Jiraskova, Rickmer Braren, Nassir Navab

(参考訳) 医療画像からの診断自動予測は臨床的意思決定を支援する貴重な資源である。しかし、そのようなシステムは、通常、医療領域では不足することが多い大量の注釈付きデータに基づいて訓練される必要がある。ゼロショット法は、ラベル付きデータに頼ることなく、異なる臨床所見を持つ新しい設定への柔軟な適応を可能にすることで、この問題に対処する。さらに, 臨床ワークフローに自動診断を統合するためには, 方法が透明で説明しやすいこと, 医療専門家の信頼度を高め, 正確性検証を容易にすることが必要である。本稿では,臨床現場におけるゼロショット診断のための新しいフレームワークであるXplainerを紹介する。 Xplainerは、比較視覚言語モデルの分類記述アプローチを多言語診断タスクに適用する。具体的には、診断を直接予測する代わりに、放射線技師がX線スキャンで探す記述的観察の存在をモデルに分類し、診断の可能性を推定するために記述子確率を使用する。最終的な診断予測は、基礎となる記述子の予測に基づいて直接行われるため、このモデルは設計によって説明可能である。胸部X線データセットであるCheXpertとChestX-ray14のXplainerを評価し,ゼロショット診断の性能と説明性の向上に有効であることを示した。以上の結果から,Xplainerは意思決定プロセスのより詳細な理解を提供し,臨床診断に有用なツールであることが示唆された。

Automated diagnosis prediction from medical images is a valuable resource to support clinical decision-making. However, such systems usually need to be trained on large amounts of annotated data, which often is scarce in the medical domain. Zero-shot methods address this challenge by allowing a flexible adaption to new settings with different clinical findings without relying on labeled data. Further, to integrate automated diagnosis in the clinical workflow, methods should be transparent and explainable, increasing medical professionals' trust and facilitating correctness verification. In this work, we introduce Xplainer, a novel framework for explainable zero-shot diagnosis in the clinical setting. Xplainer adapts the classification-by-description approach of contrastive vision-language models to the multi-label medical diagnosis task. Specifically, instead of directly predicting a diagnosis, we prompt the model to classify the existence of descriptive observations, which a radiologist would look for on an X-Ray scan, and use the descriptor probabilities to estimate the likelihood of a diagnosis. Our model is explainable by design, as the final diagnosis prediction is directly based on the prediction of the underlying descriptors. We evaluate Xplainer on two chest X-ray datasets, CheXpert and ChestX-ray14, and demonstrate its effectiveness in improving the performance and explainability of zero-shot diagnosis. Our results suggest that Xplainer provides a more detailed understanding of the decision-making process and can be a valuable tool for clinical diagnosis.

翻訳日:2023-06-29 18:02:37 公開日:2023-06-28

# 生成的半教師付き学習と生成的オープンセット認識の関連について

On the link between generative semi-supervised learning and generative open-set recognition ( http://arxiv.org/abs/2303.11702v3 )

ライセンス: Link先を確認

Emile Reyn Engelbrecht, Johan du Preez

(参考訳) 本研究では,GANにおける半教師付き学習(SSL)とオープンセット認識(OSR)の関係について検討した。 SSLとOSRを公式にリンクした以前の研究はないが、それぞれの手法は大きな類似点を共有している。具体的には、SSL-GANとOSR-GANはジェネレータに相補的な空間でサンプルを生成し、それぞれの分類器ネットワークを正規化する。続いて、sslとosrの下で訓練された分類器はラベル付きカテゴリ周辺の分類境界を厳しくすることでオープンスペースを一般化する。言い換えれば、SSL-GANを使って訓練された分類器はOSRと逆転を本質的に達成する。このSSL-OSRリンクを証明するため、理論上、実験的に最先端のSSL-GANと最先端のOSR-GANを比較した。その結果,すべてのSSL-GANとOSR-GANは同じ目標に向かって動作し,SSLに最適化されたMargin-GANがSSL-OSRの組み合わせタスクに対して,新たな最先端を設定できることがわかった。将来の研究はSSL-GANとOSR-GANの理論的類似性をさらに探求し、SSL-OSRを他の学習ポリシーに拡張する。

This study investigates the relationship between semi-supervised learning (SSL) and open-set recognition (OSR) under the context of generative adversarial networks (GANs). Although no previous study has formally linked SSL and OSR, their respective methods share striking similarities. Specifically, SSL-GANs and OSR-GANs require their generators to produce samples in the complementary space, which are then used to regularise their respective classifier networks. In turn, classifiers trained under SSL and OSR generalise the open space by tightening classification boundaries around the labelled categories. In other words, a classifier trained using an SSL-GAN intrinsically achieves OSR and vice-versa. To prove this SSL-OSR link, we theoretically and experimentally compare the state-of-the-art SSL-GAN with the state-of-the-art OSR-GAN. Our results find that all SSL-GANs and OSR-GANs work towards the same goal, and that the SSL-optimised Margin-GANs set the new state-of-the-art for the combined task of SSL-OSR. Future studies could further explore the theoretical similarities between SSL-GANs and OSR-GANs, as well as extend SSL-OSR to other learning policies.

翻訳日:2023-06-29 18:02:13 公開日:2023-06-28

# IRGen:画像検索のための生成モデリング

IRGen: Generative Modeling for Image Retrieval ( http://arxiv.org/abs/2303.10126v3 )

ライセンス: Link先を確認

Yidan Zhang, Ting Zhang, Dong Chen, Yujing Wang, Qi Chen, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Mao Yang, Qingmin Liao, Baining Guo

(参考訳) 生成的モデリングは自然言語処理やコンピュータビジョンにおいてユビキタスであるが、画像検索への応用は未検討である。本稿では,シーケンス・ツー・シーケンスモデルを用いて画像検索を生成モデルの一形態として再キャストし,現在の統一テーマに寄与する。我々のフレームワークIRGenは、エンドツーエンドの微分検索を可能にする統一モデルであり、直接最適化により優れた性能を実現する。 IRGenの開発中、画像の極めて短いセマンティックな配列に変換するという重要な技術的課題に取り組み、効率的かつ効果的な検索を可能にする。実証実験により,本モデルが一般的に使用される3つのベンチマーク,例えばre recall@10スコアのin-shopデータセットにおけるprecision@10の最高基準法よりも22.9\%高い値が得られることを示した。

While generative modeling has been ubiquitous in natural language processing and computer vision, its application to image retrieval remains unexplored. In this paper, we recast image retrieval as a form of generative modeling by employing a sequence-to-sequence model, contributing to the current unified theme. Our framework, IRGen, is a unified model that enables end-to-end differentiable search, thus achieving superior performance thanks to direct optimization. While developing IRGen we tackle the key technical challenge of converting an image into quite a short sequence of semantic units in order to enable efficient and effective retrieval. Empirical experiments demonstrate that our model yields significant improvement over three commonly used benchmarks, for example, 22.9\% higher than the best baseline method in precision@10 on In-shop dataset with comparable recall@10 score.

翻訳日:2023-06-29 18:01:50 公開日:2023-06-28

# アシラ量子測定による多体系の進化

Evolution of many-body systems under ancilla quantum measurements ( http://arxiv.org/abs/2303.07081v3 )

ライセンス: Link先を確認

Elmer V. H. Doggen, Yuval Gefen, Igor V. Gornyi, Alexander D. Mirlin, Dmitry G. Polyakov

(参考訳) 測定誘起相転移は、実験と理論の両方の観点から、激しい電流研究の対象である。我々は,多体格子系を,射影測定を行う自由度(追加の2つの部位を用いて実装)に結合させることにより,量子測定を実装するという概念を探求する。一次元鎖内の相互作用するハードコアボソンの動的相関に対する繰り返し測定(「ストロボスコープ」)の効果を解析した。このプロトコルの重要な特徴は、検出アンシラが各測定工程後に再起動されないことである。これにより、測定された相関系による累積影響の記憶を維持する。はじめに,アシラを1つの格子サイトと結合するモデルを考える。この設定により、アシラ系相互作用によって変調された自由度のラビ振動を通じてシステムに関する情報を得ることができる。量子軌道の統計は、測定が強くなったときに生じる「量子-ゼノバルブ効果」を示し、低エンタングルメントと高エンタングルメントの間に鋭い分岐がある。数値シミュレーションを2つのアンシラの場合に適用し,その後,全部位の計測に拡張する。この現実的な測定装置により、より抽象的なモデルで以前観察されたように、遠絡測定による遷移の証拠が見つかる。力学は絡み合いエントロピーの広い分布を特徴とする。

Measurement-induced phase transitions are the subject of intense current research, both from an experimental and a theoretical perspective. We explore the concept of implementing quantum measurements by coupling a many-body lattice system to an ancillary degree of freedom (implemented using two additional sites), on which projective measurements are performed. We analyze the effect of repeated (``stroboscopic'') measurements on the dynamical correlations of interacting hard-core bosons in a one-dimensional chain. An important distinctive ingredient of the protocol is the fact that the detector ancillas are not re-initialized after each measurement step. The detector thus maintains memory of the accumulated influence by the measured correlated system. Initially, we consider a model in which the ancilla is coupled to a single lattice site. This setup allows obtaining information about the system through Rabi oscillations in the ancillary degrees of freedom, modulated by the ancilla-system interaction. The statistics of quantum trajectories exhibits a ``quantum-Zeno-valve effect'' that occurs when the measurement becomes strong, with sharp branching between low and high entanglement. We proceed by extending numerical simulations to the case of two ancillas and, then, to measurements on all sites. With this realistic measurement apparatus, we find evidence of a disentangling-entangling measurement-induced transition as was previously observed in more abstract models. The dynamics features a broad distribution of the entanglement entropy.

翻訳日:2023-06-29 18:01:04 公開日:2023-06-28

# マルコフ連鎖の線形統計に対するローゼンタール型不等式

Rosenthal-type inequalities for linear statistics of Markov chains ( http://arxiv.org/abs/2303.05838v2 )

ライセンス: Link先を確認

Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Marina Sheshukova

(参考訳) 本稿では、独立確率変数の和に対するローゼンタールやベルンシュタインの不等式に類似した幾何学的エルゴードマルコフ鎖の加法関数に対する新しい偏差境界を確立する。我々は、対応する鎖の混合時間に対する境界の依存性に特に注意を払う。より正確には、ロゼンタール不等式(英語版)のマルティンゲール版からの定数と結びついた明示的境界と、基礎となるマルコフ核の混合特性を特徴づける定数を確立する。最後に、我々の証明手法は、我々の知る限り、新しいもので、ポアソン分解の繰り返し適用に基づいている。

In this paper, we establish novel deviation bounds for additive functionals of geometrically ergodic Markov chains similar to Rosenthal and Bernstein inequalities for sums of independent random variables. We pay special attention to the dependence of our bounds on the mixing time of the corresponding chain. More precisely, we establish explicit bounds that are linked to the constants from the martingale version of the Rosenthal inequality, as well as the constants that characterize the mixing properties of the underlying Markov kernel. Finally, our proof technique is, up to our knowledge, new and based on a recurrent application of the Poisson decomposition.

翻訳日:2023-06-29 18:00:40 公開日:2023-06-28

# 拡大次元空間における低離散サンプリング:粒子群最適化のための加速法

Low-discrepancy Sampling in the Expanded Dimensional Space: An Acceleration Technique for Particle Swarm Optimization ( http://arxiv.org/abs/2303.03055v2 )

ライセンス: Link先を確認

Feng Wu, Yuelin Zhao, Jianhua Pang, Jun Yan, and Wanxie Zhong

(参考訳) ランダムサンプリングと比較すると,低差分サンプリングの方が探索空間の被覆に有効である。しかし, 粒子群最適化 (pso) に対する低分散サンプルの影響が正か負かは, 既存の研究で明らかに述べられていない。ニダーレイターの定理を用いて、この研究はPSOの誤差解析を完了し、各反復におけるPSOの誤差境界は拡張次元空間におけるサンプル集合の分散に依存することを示した。この誤差解析に基づいて,拡張次元空間における低差分サンプリングによるPSO型アルゴリズムの高速化手法を提案する。加速度法は、拡張次元空間においてランダムサンプリングに比べて分散が小さい低分散サンプル集合を生成することができ、また、各イテレーションにおける誤差を低減し、収束速度を向上できる。高速化手法を標準PSOと総合学習粒子群最適化と組み合わせ,改良アルゴリズムの性能を元のアルゴリズムと比較した。実験の結果, 2つの改良アルゴリズムは同じ精度で収束速度が著しく速いことがわかった。

Compared with random sampling, low-discrepancy sampling is more effective in covering the search space. However, the existing research cannot definitely state whether the impact of a low-discrepancy sample on particle swarm optimization (PSO) is positive or negative. Using Niderreiter's theorem, this study completes an error analysis of PSO, which reveals that the error bound of PSO at each iteration depends on the dispersion of the sample set in an expanded dimensional space. Based on this error analysis, an acceleration technique for PSO-type algorithms is proposed with low-discrepancy sampling in the expanded dimensional space. The acceleration technique can generate a low-discrepancy sample set with a smaller dispersion, compared with a random sampling, in the expanded dimensional space; it also reduces the error at each iteration, and hence improves the convergence speed. The acceleration technique is combined with the standard PSO and the comprehensive learning particle swarm optimization, and the performance of the improved algorithm is compared with the original algorithm. The experimental results show that the two improved algorithms have significantly faster convergence speed under the same accuracy requirement.

翻訳日:2023-06-29 18:00:29 公開日:2023-06-28

# 低次モデリングにおける残差学習のためのDeepONet多重忠実度アプローチ

A DeepONet multi-fidelity approach for residual learning in reduced order modeling ( http://arxiv.org/abs/2302.12682v2 )

ライセンス: Link先を確認

Nicola Demo and Marco Tezzele and Gianluigi Rozza

(参考訳) 本稿では,多元的視点とdeeponetsを活用し,減少順序モデルの精度を向上させる新しい手法を提案する。縮小モデルは、元のモデルを単純化することで、リアルタイムな数値近似を提供する。そのような演算によって引き起こされるエラーは通常、高速な計算に到達するために無視され、犠牲にされる。そこで本研究では,ニューラルネットワークによって上記の誤差を学習し,新たな予測を推定できるように,機械学習残差学習にモデル還元を組み合わせることを提案する。我々は,高忠実度情報の利用を最大化し,高次オーダーモデルの構築と残差学習に利用することを強調した。本研究では,センサデータに対する正規直交分解(POD)とギャップピーPODの統合について,最近のDeepONetアーキテクチャを用いて検討する。パラメトリックベンチマーク関数と非線形パラメトリックナビエ-ストークス問題に関する数値的研究を行った。

In the present work, we introduce a novel approach to enhance the precision of reduced order models by exploiting a multi-fidelity perspective and DeepONets. Reduced models provide a real-time numerical approximation by simplifying the original model. The error introduced by the such operation is usually neglected and sacrificed in order to reach a fast computation. We propose to couple the model reduction to a machine learning residual learning, such that the above-mentioned error can be learned by a neural network and inferred for new predictions. We emphasize that the framework maximizes the exploitation of high-fidelity information, using it for building the reduced order model and for learning the residual. In this work, we explore the integration of proper orthogonal decomposition (POD), and gappy POD for sensors data, with the recent DeepONet architecture. Numerical investigations for a parametric benchmark function and a nonlinear parametric Navier-Stokes problem are presented.

翻訳日:2023-06-29 17:59:49 公開日:2023-06-28

# 逐次実験後の最適試験

Optimal tests following sequential experiments ( http://arxiv.org/abs/2305.00403v2 )

ライセンス: Link先を確認

Karun Adusumilli

(参考訳) 近年,逐次実験の理論と応用が飛躍的に進歩している。これらの実験は常に仮説検証を念頭に置いて設計されているわけではないが、実験が完了した後もテストの実行に関心があるかもしれない。本研究の目的は,その漸近的性質を解析し,逐次実験の最適テストの開発を支援することである。我々の重要な発見は、あらゆるテストの漸近的なパワー関数は、各処理でガウス過程が観測され、これらのプロセスのドリフトに対する推論が行われる極限実験において、テストによって一致させることができることである。この結果は、強力なsufficiency結果を含む重要な意味を持つ: どんな候補テストも、逐次実験の種類に関わらず、一定の統計セットのみに依存する必要がある。これらの統計は、各治療が実験の終了までにサンプリングされた回数であり、各治療のスコア(パラメトリックモデル)の最終値や効率的な影響関数(非パラメトリックモデル)のプロセスも合わせている。次に,不偏性,\alpha-spending制約など様々な制約下での漸近的最適検定を特徴付ける。最後に,本研究の結果を,コストライジング,グループシーケンシャルトライアル,バンドイット実験の3つの重要な段階に適用し,これらのシナリオにおいて最適な推論を行う方法を示す。

Recent years have seen tremendous advances in the theory and application of sequential experiments. While these experiments are not always designed with hypothesis testing in mind, researchers may still be interested in performing tests after the experiment is completed. The purpose of this paper is to aid in the development of optimal tests for sequential experiments by analyzing their asymptotic properties. Our key finding is that the asymptotic power function of any test can be matched by a test in a limit experiment where a Gaussian process is observed for each treatment, and inference is made for the drifts of these processes. This result has important implications, including a powerful sufficiency result: any candidate test only needs to rely on a fixed set of statistics, regardless of the type of sequential experiment. These statistics are the number of times each treatment has been sampled by the end of the experiment, along with final value of the score (for parametric models) or efficient influence function (for non-parametric models) process for each treatment. We then characterize asymptotically optimal tests under various restrictions such as unbiasedness, \alpha-spending constraints etc. Finally, we apply our our results to three key classes of sequential experiments: costly sampling, group sequential trials, and bandit experiments, and show how optimal inference can be conducted in these scenarios.

翻訳日:2023-06-29 17:53:00 公開日:2023-06-28

# 深層学習支援マイクロ波-プラズマ相互作用に基づくプラズマ密度推定手法

Deep Learning assisted microwave-plasma interaction based technique for plasma density estimation ( http://arxiv.org/abs/2304.14807v2 )

ライセンス: Link先を確認

Pratik Ghosh, Bhaskar Chaudhury, Shishir Purohit, Vishv Joshi, Ashray Kothari, Devdeep Shetranjiwala

(参考訳) 電子密度は、あらゆるプラズマを特徴づける重要なパラメータである。低温プラズマ(LTP)の領域におけるプラズマ応用と研究の大部分は、プラズマ密度とプラズマ温度の正確な推定に基づいている。従来の電子密度測定法は、任意の線形LTPデバイスに対して軸方向および半径方向のプロファイルを提供する。これらの手法は、操作範囲(あまり広くない)、煩雑な計測、複雑なデータ分析手順において大きな欠点がある。本稿は,既存のプラズマ密度測定手法に関連する課題を解決するための新しい代替手法として使用できる,マイクロ波プラズマ相互作用に基づく非侵襲的戦略を提案する。プラズマからのマイクロ波散乱による電界パターンを利用して密度分布を推定する。この概念の証明は、低温、非磁性、衝突プラズマからなるシミュレーショントレーニングデータセットに対して試験される。 10^{16}-10^{19}$ m$^{-3}$の異なる対称(ガウス型)と非対称密度プロファイルは、様々な実験的な構成に対応して検討されている。合成学習データセットの作成中,ノイズの存在や測定データ(dense vs sparse)の量といった実生活実験的な課題が検討されている。 DLベースの技術はプラズマ内の電子密度プロファイルを決定する能力を持つ。提案手法の性能は,SSIM, RMSLE, MAPEの3つの指標を用いて評価されている。得られた結果は, 線形プラズマ装置の密度の2次元半径分布を推定する上で有望な性能を示し, 提案手法のプラズマ診断への応用の可能性を確認した。

The electron density is a key parameter to characterize any plasma. Most of the plasma applications and research in the area of low-temperature plasmas (LTPs) are based on the accurate estimations of plasma density and plasma temperature. The conventional methods for electron density measurements offer axial and radial profiles for any given linear LTP device. These methods have major disadvantages of operational range (not very wide), cumbersome instrumentation, and complicated data analysis procedures. The article proposes a Deep Learning (DL) assisted microwave-plasma interaction-based non-invasive strategy, which can be used as a new alternative approach to address some of the challenges associated with existing plasma density measurement techniques. The electric field pattern due to microwave scattering from plasma is utilized to estimate the density profile. The proof of concept is tested for a simulated training data set comprising a low-temperature, unmagnetized, collisional plasma. Different types of symmetric (Gaussian-shaped) and asymmetrical density profiles, in the range $10^{16}-10^{19}$ m$^{-3}$, addressing a range of experimental configurations have been considered in our study. Real-life experimental issues such as the presence of noise and the amount of measured data (dense vs sparse) have been taken into consideration while preparing the synthetic training data-sets. The DL-based technique has the capability to determine the electron density profile within the plasma. The performance of the proposed deep learning-based approach has been evaluated using three metrics- SSIM, RMSLE, and MAPE. The obtained results show promising performance in estimating the 2D radial profile of the density for the given linear plasma device and affirms the potential of the proposed ML-based approach in plasma diagnostics.

翻訳日:2023-06-29 17:52:38 公開日:2023-06-28

# 自己指導型学習のクックブック

A Cookbook of Self-Supervised Learning ( http://arxiv.org/abs/2304.12210v2 )

ライセンス: Link先を確認

Randall Balestriero, Mark Ibrahim, Vlad Sobal, Ari Morcos, Shashank Shekhar, Tom Goldstein, Florian Bordes, Adrien Bardes, Gregoire Mialon, Yuandong Tian, Avi Schwarzschild, Andrew Gordon Wilson, Jonas Geiping, Quentin Garrido, Pierre Fernandez, Amir Bar, Hamed Pirsiavash, Yann LeCun and Micah Goldblum

(参考訳) 人工知能のダークマターと呼ばれる自己教師型学習は、機械学習を進めるための有望な道である。しかし、料理と同様にSSLメソッドのトレーニングは、参入障壁の高い繊細なテクニックである。多くのコンポーネントは慣れ親しんでいるが、SSLメソッドをうまくトレーニングするには、プリテキストタスクからハイパーパラメータのトレーニングまで、一連の選択をめちゃくちゃにする必要がある。私たちのゴールは、基本と最新のSSLレシピをクックブックのスタイルで配置することで、SSL研究への参入障壁を低くすることにあります。興味のある研究者がメソッドの地形をナビゲートし、さまざまなノブの役割を理解し、SSLがいかに美味しいかを探求するために必要なノウハウを得ることを期待しています。

Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training a SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.

翻訳日:2023-06-29 17:52:11 公開日:2023-06-28

# DeePLT:スマートホームにおける認知者の軌道予測による個人化照明支援

DeePLT: Personalized Lighting Facilitates by Trajectory Prediction of Recognized Residents in the Smart Home ( http://arxiv.org/abs/2304.08027v3 )

ライセンス: Link先を確認

Danial Safaei, Ali Sobhani, Ali Akbar Kiaei

(参考訳) 近年、住宅の様々な部分の知性は、現代の住宅において不可欠な特徴の1つとなっている。これらの部品の1つは、各人の光をパーソナライズする知性照明システムである。本稿では、軌道予測によって推定される、認識されたユーザの即時未来位置における照明をパーソナライズする機械学習に基づくインテリジェントシステムを提案する。提案するシステムは, (i) 与えられた映像フレームの人物を検出・局所化するための人間検出, (ii) 検出された人物を識別するための顔認識, (iii) 映像フレームのシーケンス内の人物を追跡するための人間追跡, (iv) 逆強化学習を用いた環境におけるユーザの将来の位置を予測するための軌道予測,からなる。提案手法は、仕様、顔画像、カスタム照明設定など、各人物にユニークなプロファイルを提供する。このプロファイルは照明調整プロセスで使用される。一定の照明を考慮した他の方法とは異なり,本システムは,ユーザの直接的介入なしに,色や光強度の観点でそれぞれの「好みの照明」を適用できる。これにより、より高速で効率良く照明を調整できる。また, 予測された軌道経路により, 所望の照明を適用でき, 家庭住民の快適で快適な環境が得られる。実験結果では、入力時点から平均1.4秒で所望の光を照射し、人間の検出では22.1mAp、顔認識では95.12%、人間の追跡では93.3%、軌道予測では10.80 MinADE20, 18.55 MinFDE20, 15.8 MinADE5, 30.50 MinFDE5を照射した。

In recent years, the intelligence of various parts of the home has become one of the essential features of any modern home. One of these parts is the intelligence lighting system that personalizes the light for each person. This paper proposes an intelligent system based on machine learning that personalizes lighting in the instant future location of a recognized user, inferred by trajectory prediction. Our proposed system consists of the following modules: (I) human detection to detect and localize the person in each given video frame, (II) face recognition to identify the detected person, (III) human tracking to track the person in the sequence of video frames and (IV) trajectory prediction to forecast the future location of the user in the environment using Inverse Reinforcement Learning. The proposed method provides a unique profile for each person, including specifications, face images, and custom lighting settings. This profile is used in the lighting adjustment process. Unlike other methods that consider constant lighting for every person, our system can apply each 'person's desired lighting in terms of color and light intensity without direct user intervention. Therefore, the lighting is adjusted with higher speed and better efficiency. In addition, the predicted trajectory path makes the proposed system apply the desired lighting, creating more pleasant and comfortable conditions for the home residents. In the experimental results, the system applied the desired lighting in an average time of 1.4 seconds from the moment of entry, as well as a performance of 22.1mAp in human detection, 95.12% accuracy in face recognition, 93.3% MDP in human tracking, and 10.80 MinADE20, 18.55 MinFDE20, 15.8 MinADE5 and 30.50 MinFDE5 in trajectory prediction.

翻訳日:2023-06-29 17:51:57 公開日:2023-06-28

# フーリエ完全有界多項式の影響と量子アルゴリズムの古典シミュレーション

Influences of Fourier Completely Bounded Polynomials and Classical Simulation of Quantum Algorithms ( http://arxiv.org/abs/2304.06713v2 )

ライセンス: Link先を確認

Francisco Escudero Guti\'errez

(参考訳) 我々は、Arunachalam, Bri\"et and Palazuelos (SICOMP'19) の主な結果の新しいプレゼンテーションを行い、量子クエリアルゴリズムがフーリエ完全有界多項式と呼ばれる新しい多項式のクラスによって特徴づけられることを示す。そのような多項式はすべて影響変数を持つと推測する。この予想は有名なaaronson-ambainis (aa) 予想 (theory of computing '14) よりも弱いが、量子クエリアルゴリズムの古典的なシミュレーションにも同じ意味を持つ。我々は、同次フーリエ完全有界多項式に対して成り立つことを示すことにより、AA予想の新しいケースを証明した。これは、$d$-query量子アルゴリズムの出力が次数2d$の等質多項式$p$であるなら、少なくとも$Var[p]^2$の影響を持つ変数を持つことを意味する。さらに、Bansal, Sinha and de Wolf (CCC'22 and QIP'23) の結果の代替証明として、ブロック-多重線型完全有界多項式が影響変数を持つことを示す。我々の証明はより単純で、より良い定数を得、ランダム性を使用しない。

We give a new presentation of the main result of Arunachalam, Bri\"et and Palazuelos (SICOMP'19) and show that quantum query algorithms are characterized by a new class of polynomials which we call Fourier completely bounded polynomials. We conjecture that all such polynomials have an influential variable. This conjecture is weaker than the famous Aaronson-Ambainis (AA) conjecture (Theory of Computing'14), but has the same implications for classical simulation of quantum query algorithms. We prove a new case of the AA conjecture by showing that it holds for homogeneous Fourier completely bounded polynomials. This implies that if the output of $d$-query quantum algorithm is a homogeneous polynomial $p$ of degree $2d$, then it has a variable with influence at least $Var[p]^2$. In addition, we give an alternative proof of the results of Bansal, Sinha and de Wolf (CCC'22 and QIP'23) showing that block-multilinear completely bounded polynomials have influential variables. Our proof is simpler, obtains better constants and does not use randomness.

翻訳日:2023-06-29 17:51:26 公開日:2023-06-28

# ポリタプレット損失を考慮した理解・論理推論タスクの深層マニフォールド学習

Deep Manifold Learning for Reading Comprehension and Logical Reasoning Tasks with Polytuplet Loss ( http://arxiv.org/abs/2304.01046v2 )

ライセンス: Link先を確認

Jeffrey Lu, Ivan Rodriguez

(参考訳) 理解と論理的推論タスクを読む機械学習モデルの開発における現在のトレンドは、論理的ルールを理解し活用するモデルの能力を改善することに焦点を当てている。本研究は、人間が理解や論理的推論タスクを与えられたときに使用する共通の戦略を表現することにより、他のモデルよりも解釈可能なコンポーネントを持つ、新しい損失関数と付随するモデルアーキテクチャを提供することに焦点を当てている。この戦略は、絶対精度よりも相対精度を強調し、問題の解答に必要な情報を完全に知ることなく理論的に正しい解を生成できる。このような戦略を転校学習モデルの学習に応用し,読解と論理的推論の問題を解決する効果について検討する。モデルは、難読性理解と論理的推論ベンチマークであるreclorデータセットで評価された。本稿では,三重項損失関数の拡張であるポリタップレット損失関数を提案する。その結果,ポリタプレット損失モデルの方が既存のベースラインモデルより優れていることがわかった。ポリタプレット損失は他のコントラスト損失関数の代替として有望なものであるが、その利点を定量化するためにさらなる研究が必要である。

The current trend in developing machine learning models for reading comprehension and logical reasoning tasks is focused on improving the models' abilities to understand and utilize logical rules. This work focuses on providing a novel loss function and accompanying model architecture that has more interpretable components than some other models by representing a common strategy employed by humans when given reading comprehension and logical reasoning tasks. This strategy involves emphasizing relative accuracy over absolute accuracy and can theoretically produce the correct answer without full knowledge of the information required to solve the question. We examine the effectiveness of applying such a strategy to train transfer learning models to solve reading comprehension and logical reasoning questions. The models were evaluated on the ReClor dataset, a challenging reading comprehension and logical reasoning benchmark. We propose the polytuplet loss function, an extension of the triplet loss function, to ensure prioritization of learning the relative correctness of answer choices over learning the true accuracy of each choice. Our results indicate that models employing polytuplet loss outperform existing baseline models. Although polytuplet loss is a promising alternative to other contrastive loss functions, further research is required to quantify the benefits it may present.

翻訳日:2023-06-29 17:50:35 公開日:2023-06-28

# Pgx:強化学習のためのハードウェアアクセラレーション並列ゲームシミュレータ

Pgx: Hardware-accelerated Parallel Game Simulators for Reinforcement Learning ( http://arxiv.org/abs/2303.17503v2 )

ライセンス: Link先を確認

Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori, Yu Murata, Keigo Habara, Haruka Kita, Shin Ishii

(参考訳) JAXで記述され,GPU/TPUアクセラレータ向けに最適化されたボードゲーム強化学習(RL)環境のスイートであるPgxを提案する。 JAXの自動ベクタライゼーションとJust-In-Time(JIT)コンパイルを活用することで、Pgxはアクセラレータ上で数千の並列実行に効率的にスケールできる。 DGX-A100ワークステーションの実験では、Pgxは既存のPython RLライブラリよりも10～100倍高速にRL環境をシミュレートできることがわかった。 Pgxには、バックギャモン、チェス、ショギ、GoといったRL研究のベンチマークとして一般的に使用されるRL環境が含まれている。さらにPgxは、迅速な研究サイクルを促進するために、ミニチュアゲームセットとベースラインモデルを提供している。 pgx環境を用いたgumbel alphazeroアルゴリズムの効率的なトレーニングを行う。 pgxは、研究者がrl実験を加速するための高性能環境シミュレータを提供する。 pgxはhttps://github.com/sotetsuk/pgxで入手できる。

We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging auto-vectorization and Just-In-Time (JIT) compilation of JAX, Pgx can efficiently scale to thousands of parallel executions over accelerators. In our experiments on a DGX-A100 workstation, we discovered that Pgx can simulate RL environments 10-100x faster than existing Python RL libraries. Pgx includes RL environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators for researchers to accelerate their RL experiments. Pgx is available at https://github.com/sotetsuk/pgx.

翻訳日:2023-06-29 17:50:15 公開日:2023-06-28

# PeakNet:ディープニューラルネットワークを備えた自動ブラッグピークファインダ

PeakNet: An Autonomous Bragg Peak Finder with Deep Neural Networks ( http://arxiv.org/abs/2303.15301v2 )

ライセンス: Link先を確認

Cong Wang, Po-Nan Li, Jana Thayer and Chun Hong Yoon

(参考訳) X線自由電子レーザー(XFEL)とシンクロトロン施設におけるシリアル結晶学は近年大きな進歩を遂げており、マクロ分子構造と分子過程の新たな科学的研究を可能にしている。しかし、これらの実験はデータ削減とリアルタイムフィードバックにおいて計算上の課題を呈する膨大な量のデータを生成する。ブラッグピーク探索アルゴリズムは有用な画像の識別や、ヒット率と解像度に関するリアルタイムフィードバックを提供する。バッファ溶液,噴射ノズル,その他の遮蔽材からのショット・ツー・ショット強度変動と強い背景散乱により,これは時間を要する最適化問題となる。本稿では,深層ニューラルネットワークを利用した自律型ブラッグピークファインダPeakNetを紹介する。このシステムの開発は 1)手動のアルゴリズムパラメータチューニングの必要性をなくす。 2) 強背景散乱におけるショット・ツー・ショットの変動をリアルタイムに調整することにより, 偽陽性ピークを低減する。 3) 悪い画素マスクを手作業で作成する手間を省き, 必要に応じて再生できるため, イベント毎にマスクを保管する必要がなくなる。 PeakNetは、1920×1920ピクセルの画像をNVIDIA 1080 Ti GPU上で90ミリ秒で処理し、並列化分析やGPUストリーム処理によるさらなる拡張の可能性を秘めている。 PeakNetは、専門家レベルのリアルタイム連続結晶学データ解析に高いデータレートで適している。

Serial crystallography at X-ray free electron laser (XFEL) and synchrotron facilities has experienced tremendous progress in recent times enabling novel scientific investigations into macromolecular structures and molecular processes. However, these experiments generate a significant amount of data posing computational challenges in data reduction and real-time feedback. Bragg peak finding algorithm is used to identify useful images and also provide real-time feedback about hit-rate and resolution. Shot-to-shot intensity fluctuations and strong background scattering from buffer solution, injection nozzle and other shielding materials make this a time-consuming optimization problem. Here, we present PeakNet, an autonomous Bragg peak finder that utilizes deep neural networks. The development of this system 1) eliminates the need for manual algorithm parameter tuning, 2) reduces false-positive peaks by adjusting to shot-to-shot variations in strong background scattering in real-time, 3) eliminates the laborious task of manually creating bad pixel masks and the need to store these masks per event since these can be regenerated on demand. PeakNet also exhibits exceptional runtime efficiency, processing a 1920-by-1920 pixel image around 90 ms on an NVIDIA 1080 Ti GPU, with the potential for further enhancements through parallelized analysis or GPU stream processing. PeakNet is well-suited for expert-level real-time serial crystallography data analysis at high data rates.

翻訳日:2023-06-29 17:49:59 公開日:2023-06-28

# 後方特徴投影による連続学習における線形分離性維持

Preserving Linear Separability in Continual Learning by Backward Feature Projection ( http://arxiv.org/abs/2303.14595v3 )

ライセンス: Link先を確認

Qiao Gu, Dongsub Shim, Florian Shkurti

(参考訳) 破滅的な忘れは、連続的な学習において大きな課題であり、モデルでは、以前見られたタスクからデータにアクセスできない、あるいは制限された、新しいタスクを学習する必要がある。この課題に対処するため,特徴空間における知識蒸留に基づく手法が提案され,忘れの低減が図られている。しかし、ほとんどの特徴蒸留法は、プラスチック性の必要性を見越して、新しい特徴を古いものと一致させるよう直接に制約している。安定性と可塑性のトレードオフを改善するため,我々は,新しい特徴を学習可能な線形変換へと変化させる連続学習法である後方特徴投影法(bfp)を提案する。 BFPは古いクラスの線形分離性を保ちつつ、新しいフィーチャの方向が新しいクラスに対応できるようにしている。 BFPは既存のエクスペリエンスリプレイメソッドと統合することができ、パフォーマンスを大幅に向上させることができる。また,BFPは連続学習中に線形分離性が良好に維持され,高い分類精度が得られるような表現空間の学習にも有効であることを示す。コードはhttps://github.com/rvl-lab-utoronto/BFPで確認できる。

Catastrophic forgetting has been a major challenge in continual learning, where the model needs to learn new tasks with limited or no access to data from previously seen tasks. To tackle this challenge, methods based on knowledge distillation in feature space have been proposed and shown to reduce forgetting. However, most feature distillation methods directly constrain the new features to match the old ones, overlooking the need for plasticity. To achieve a better stability-plasticity trade-off, we propose Backward Feature Projection (BFP), a method for continual learning that allows the new features to change up to a learnable linear transformation of the old features. BFP preserves the linear separability of the old classes while allowing the emergence of new feature directions to accommodate new classes. BFP can be integrated with existing experience replay methods and boost performance by a significant margin. We also demonstrate that BFP helps learn a better representation space, in which linear separability is well preserved during continual learning and linear probing achieves high classification accuracy. The code can be found at https://github.com/rvl-lab-utoronto/BFP

翻訳日:2023-06-29 17:49:37 公開日:2023-06-28

# DC CoMix TTS: Mixerとのコラボレーションによる離散コード付きエンドツーエンド表現型TS

DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer ( http://arxiv.org/abs/2305.19567v4 )

ライセンス: Link先を確認

Yerin Choi, Myoung-Wan Koo

(参考訳) TTSの中立性は大きな成功を収めたものの、コンテンツ収集は依然として課題だ。本稿では,プロソディモデリングの改善を実現するための新しい入力表現と単純なアーキテクチャを提案する。近年のttsにおける離散コードの使用の成功に触発されて,参照エンコーダの入力に離散コードを導入する。具体的には,音響圧縮モデルのベクトル量子化器を用いて,すでにトレーニング済みの多様な音響情報を活用する。さらに、修正MLP-Mixerを参照エンコーダに適用し、アーキテクチャをより軽量にする。その結果、プロソディ転送TSをエンドツーエンドで訓練する。本手法は主観的評価と客観的評価の両方を通して有効性を示す。実験において、離散符号を入力として利用する場合、参照エンコーダは話者非依存の韻律を学習できることを実証する。さらに,少ないパラメータを入力しても比較結果が得られる。

Despite the huge successes made in neutral TTS, content-leakage remains a challenge. In this paper, we propose a new input representation and simple architecture to achieve improved prosody modeling. Inspired by the recent success in the use of discrete code in TTS, we introduce discrete code to the input of the reference encoder. Specifically, we leverage the vector quantizer from the audio compression model to exploit the diverse acoustic information it has already been trained on. In addition, we apply the modified MLP-Mixer to the reference encoder, making the architecture lighter. As a result, we train the prosody transfer TTS in an end-to-end manner. We prove the effectiveness of our method through both subjective and objective evaluations. We demonstrate that the reference encoder learns better speaker-independent prosody when discrete code is utilized as input in the experiments. In addition, we obtain comparable results even when fewer parameters are inputted.

翻訳日:2023-06-29 17:44:03 公開日:2023-06-28

# 幾何グラフフィルタとニューラルネットワーク : 限界特性と判別可能性トレードオフ

Geometric Graph Filters and Neural Networks: Limit Properties and Discriminability Trade-offs ( http://arxiv.org/abs/2305.18467v2 )

ライセンス: Link先を確認

Zhiyang Wang and Luana Ruiz and Alejandro Ribeiro

(参考訳) 本稿では、グラフニューラルネットワーク(gnn)と多様体ニューラルネットワーク(mnn)の関係について、グラフが多様体からサンプリングされた点の集合から構築され、幾何学的情報をエンコードする場合に検討する。我々は、多様体とグラフの畳み込みがそれぞれラプラス・ベルトラミ作用素とグラフラプラシアンで定義されるような畳み込み MNN と GNN を考える。適切なカーネルを用いて、密度と適度なスパースグラフの両方を分析する。これらのグラフ上の畳み込みフィルタとニューラルネットワークが連続多様体上の畳み込みフィルタとニューラルネットワークに収束することを示す非漸近的誤差境界を証明した。この分析の副産物として、グラフフィルタの識別性と、多様体フィルタの所望の挙動を近似する能力との間の重要なトレードオフを観察する。次に、非線形性の周波数混合性により、このトレードオフがニューラルネットワークでどのように改善されるかについて議論する。さらに、同一多様体からサンプリングされた幾何グラフの転送可能性も導出する。本研究は,ナビゲーション制御問題と点雲分類タスクで数値的に検証する。

This paper studies the relationship between a graph neural network (GNN) and a manifold neural network (MNN) when the graph is constructed from a set of points sampled from the manifold, thus encoding geometric information. We consider convolutional MNNs and GNNs where the manifold and the graph convolutions are respectively defined in terms of the Laplace-Beltrami operator and the graph Laplacian. Using the appropriate kernels, we analyze both dense and moderately sparse graphs. We prove non-asymptotic error bounds showing that convolutional filters and neural networks on these graphs converge to convolutional filters and neural networks on the continuous manifold. As a byproduct of this analysis, we observe an important trade-off between the discriminability of graph filters and their ability to approximate the desired behavior of manifold filters. We then discuss how this trade-off is ameliorated in neural networks due to the frequency mixing property of nonlinearities. We further derive a transferability corollary for geometric graphs sampled from the same manifold. We validate our results numerically on a navigation control problem and a point cloud classification task.

翻訳日:2023-06-29 17:43:50 公開日:2023-06-28

# ビデオ連続学習のための時間情報の再検討

Just a Glimpse: Rethinking Temporal Information for Video Continual Learning ( http://arxiv.org/abs/2305.18418v2 )

ライセンス: Link先を確認

Lama Alssum, Juan Leon Alcazar, Merey Ramazanova, Chen Zhao, Bernard Ghanem

(参考訳) クラス増分学習は、現実世界のアプリケーションシナリオによく似ているため、継続的学習の研究において最も重要な設定の1つである。メモリサイズが制限されると、クラスやタスクの数が増えると、壊滅的な忘れることになる。ビデオ領域での継続的な学習は、ビデオデータが大量のフレームを含んでいるため、リプレイメモリにより高い負担がかかるため、さらに課題となる。現在の一般的なプラクティスは、ビデオストリームからサブサンプルのフレームをリプレイメモリに格納することです。本稿では,個別フレームに基づく効果的なビデオ連続学習のための新しい再生機構SMILEを提案する。広範にわたる実験により,映像の多様性は時間的情報よりも重要な役割を担っていることが明らかとなった。そこで本手法は,多数の一意なビデオを表す少数のフレームから学習することに焦点を当てている。 3つの代表的なビデオデータセット、kinetics, ucf101, activitynetにおいて、提案手法は最先端の性能を最大21.49%向上させた。

Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles real-world application scenarios. With constrained memory sizes, catastrophic forgetting arises as the number of classes/tasks increases. Studying continual learning in the video domain poses even more challenges, as video data contains a large number of frames, which places a higher burden on the replay memory. The current common practice is to sub-sample frames from the video stream and store them in the replay memory. In this paper, we propose SMILE a novel replay mechanism for effective video continual learning based on individual/single frames. Through extensive experimentation, we show that under extreme memory constraints, video diversity plays a more significant role than temporal information. Therefore, our method focuses on learning from a small number of frames that represent a large number of unique videos. On three representative video datasets, Kinetics, UCF101, and ActivityNet, the proposed method achieves state-of-the-art performance, outperforming the previous state-of-the-art by up to 21.49%.

翻訳日:2023-06-29 17:43:33 公開日:2023-06-28

# 量子アニールの強結合極限における$U(N)$ゲージ理論

$U(N)$ gauge theory in the strong coupling limit on a quantum annealer ( http://arxiv.org/abs/2305.18179v2 )

ライセンス: Link先を確認

Jangho Kim and Thomas Luu and Wolfgang Unger

(参考訳) 強結合系における格子 qcd は整数値を持つ双対変数で定式化することができる。この方法では有限密度符号問題を回避し、ワームアルゴリズムによって、控えめな有限温度と有限密度を効率的にシミュレーションすることができる。しかし、低温度の環境は対処に費用がかかる。分割関数は整数の項でのみ表されるので、D-ウェーブ量子アニーラーでの研究には適している。まず、研究対象とするシステムのセットアップを説明し、その後、量子アニール、特にD-Waveに適合する改質を示す。概念実証として、ゲージ群 $U(1)$ に対して D-Wave 上で得られた最初の結果を示し、ゲージ群 $U(3)$ および $SU(3)$ への次のステップを概説する。また,ヒストグラムの重み付けにより,分析結果と比較して観察精度が大幅に向上することがわかった。

Lattice QCD in the strong coupling regime can be formulated in dual variables which are integer-valued. It can be efficiently simulated for modest finite temperatures and finite densities via the Worm algorithm, circumventing the finite density sign problem in this regime. However, the low temperature regime is more expensive to address. As the partition function is solely expressed in terms of integers, it is well suited to be studied on the D-Wave quantum annealer. We will first explain the setup of the system we want to study, and then present its reformulation suitable for a quantum annealer, and in particular the D-Wave. As a proof of concept, we present first results obtained on D-Wave for gauge group $U(1)$ and outline the next steps towards gauge groups $U(3)$ and $SU(3)$. We find that in addition, histogram reweighting greatly improves the accuracy of our observables when compared to analytic results.

翻訳日:2023-06-29 17:43:15 公開日:2023-06-28

# 思考連鎖の背後にある謎の解明に向けて--理論的展望

Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective ( http://arxiv.org/abs/2305.15408v3 )

ライセンス: Link先を確認

Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang

(参考訳) 近年の研究では、特に数学や推論を含む複雑なタスクを扱う場合、CoT(Chain-of-Thought prompting)がLarge Language Models(LLM)の性能を劇的に改善できることが判明している。実験的な成功にもかかわらず、CoTの背後にあるメカニズムとLLMの可能性を解き放つ方法はまだ解明されていない。本稿では,これらの疑問に理論的に答える第一歩を踏み出す。具体的には,基本的な数学的および意思決定問題の解法において,LLMとCoTとの表現性について検討する。まず,モデルサイズが入力長に対して超多項式的に大きくなる限り,有界深度変換器は基本演算/方程式タスクの正解を直接生成できないことを示す。対照的に,定サイズの自己回帰変換器は,一般的な数学言語形式を用いてCoTの導出を生成することで,両方のタスクを解くのに十分であることを示す。さらに, COT を用いた LLM は, 動的プログラミング(Dynamic Programming) と呼ばれる一般的な意思決定問題を解くことができ, 複雑な実世界のタスクに対処する能力の正当化を図っている。最後に、4つのタスクに関する広範な実験では、トランスフォーマーは常に直接答えを予測できないが、十分なCoTの実証から正しいソリューションを段階的に生成できることが示されている。

Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly when dealing with complex tasks involving mathematics or reasoning. Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. Specifically, we examine the expressivity of LLMs with CoT in solving fundamental mathematical and decision-making problems. We start by giving an impossibility result showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length. In contrast, we then prove by construction that autoregressive Transformers of constant size suffice to solve both tasks by generating CoT derivations using a commonly-used math language format. Moreover, we show LLMs with CoT are capable of solving a general class of decision-making problems known as Dynamic Programming, thus justifying its power in tackling complex real-world tasks. Finally, extensive experiments on four tasks show that, while Transformers always fail to predict the answers directly, they can consistently learn to generate correct solutions step-by-step given sufficient CoT demonstrations.

翻訳日:2023-06-29 17:42:59 公開日:2023-06-28

# 予測をフリップする最小トレーニングサブセットのリラベル

Relabeling Minimal Training Subset to Flip a Prediction ( http://arxiv.org/abs/2305.12809v2 )

ライセンス: Link先を確認

Jinghan Yang, Linjie Xu, Lequan Yu

(参考訳) 機械学習モデルから不十分な予測に直面する場合、基礎となる理由を調査し、その結果を逆転する可能性を探ることが不可欠である。モデルがトレーニングされる前に、トレーニングデータの最小サブセットである$\mathcal{S}_t$を解放することで、テスト予測を$x_t$に切り替えることができますか? 拡張影響関数を用いてそのような部分集合を同定し、レバー化する効率的な手順を提案する。トレーニングポイントの1%未満のrelabelingでは、モデルの予測をひっくり返すことがしばしばあります。このメカニズムは、(1) 影響力のあるトレーニング部分集合を復元してモデル予測に挑戦するためのアプローチを提供する、(2) モデルのロバスト性を評価する(例えば、$|\mathcal{S}_t|$)、(2) トレーニングセットのノイズ比に高い関係があること、および$|\mathcal{S}_t|$ が予測確率と相関するが、予測確率に相補的であること、(3) トレーニングポイントがグループ帰属バイアスにつながること、の3つを示す。私たちの知る限りでは、私たちは、与えられた予測を覆すのに必要な最小限のトレーニングサブセットを特定し、緩和することについて、最初に調査します。

When facing an unsatisfactory prediction from a machine learning model, it is crucial to investigate the underlying reasons and explore the potential for reversing the outcome. We ask: can we result in the flipping of a test prediction $x_t$ by relabeling the smallest subset $\mathcal{S}_t$ of the training data before the model is trained? We propose an efficient procedure to identify and relabel such a subset via an extended influence function. We find that relabeling fewer than 1% of the training points can often flip the model's prediction. This mechanism can serve multiple purposes: (1) providing an approach to challenge a model prediction by recovering influential training subsets; (2) evaluating model robustness with the cardinality of the subset (i.e., $|\mathcal{S}_t|$); we show that $|\mathcal{S}_t|$ is highly related to the noise ratio in the training set and $|\mathcal{S}_t|$ is correlated with but complementary to predicted probabilities; (3) revealing training points lead to group attribution bias. To the best of our knowledge, we are the first to investigate identifying and relabeling the minimal training subset required to flip a given prediction.

翻訳日:2023-06-29 17:42:35 公開日:2023-06-28

# マルチタスク階層型逆強化学習

Multi-task Hierarchical Adversarial Inverse Reinforcement Learning ( http://arxiv.org/abs/2305.12633v2 )

ライセンス: Link先を確認

Jiayu Chen, Dipesh Tamboli, Tian Lan, Vaneet Aggarwal

(参考訳) マルチタスク・イミテーション・ラーニング(MIL)は,汎用ロボットに不可欠なマルチタスク・エキスパート・デモに基づいて,タスクの配布が可能な政策を訓練することを目的としている。既存のmilアルゴリズムは、データ効率が低く、複雑な長方形処理では性能が劣る。 MH-AIRL(Multi-task Hierarchical Adversarial Inverse Reinforcement Learning)を開発し、階層的に構造化されたマルチタスクポリシーを学習する。これを実現するため、mh-airlはコンテキストベースのマルチタスク学習、airl(ilアプローチ)、階層的ポリシー学習を効果的に合成する。さらに、MH-AIRLは、実際によりアクセスしやすいタスクやスキルアノテーション(すなわち状態-アクションペアのみ)なしで、デモに採用することができる。 MH-AIRLの各モジュールに対して理論的正当性を提供し、MH-AIRLで学んだマルチタスクポリシーをSOTA MILベースラインよりも優れた性能と転送性を示す。

Multi-task Imitation Learning (MIL) aims to train a policy capable of performing a distribution of tasks based on multi-task expert demonstrations, which is essential for general-purpose robots. Existing MIL algorithms suffer from low data efficiency and poor performance on complex long-horizontal tasks. We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically-structured multi-task policies, which is more beneficial for compositional tasks with long horizons and has higher expert data efficiency through identifying and transferring reusable basic skills across tasks. To realize this, MH-AIRL effectively synthesizes context-based multi-task learning, AIRL (an IL approach), and hierarchical policy learning. Further, MH-AIRL can be adopted to demonstrations without the task or skill annotations (i.e., state-action pairs only) which are more accessible in practice. Theoretical justifications are provided for each module of MH-AIRL, and evaluations on challenging multi-task settings demonstrate superior performance and transferability of the multi-task policies learned with MH-AIRL as compared to SOTA MIL baselines.

翻訳日:2023-06-29 17:42:06 公開日:2023-06-28

# 高速カロリーメータシミュレーションのための幾何学的自己回帰モデル(GAAM)による新しいジオメトリへの一般化

Generalizing to new geometries with Geometry-Aware Autoregressive Models (GAAMs) for fast calorimeter simulation ( http://arxiv.org/abs/2305.11531v2 )

ライセンス: Link先を確認

Junze Liu, Aishik Ghosh, Dylan Smith, Pierre Baldi, Daniel Whiteson

(参考訳) 衝突生成物に対するシミュレート検出器の応答は素粒子物理学のデータ解析に不可欠であるが、計算量は非常に高価である。 1つのサブ検出器であるカロリメータは、細胞の粒度が高く、相互作用の複雑さのために計算時間を支配している。生成モデルは、より迅速なサンプル生産を提供することができるが、現在、特定の検出器ジオメトリのパフォーマンスを最適化するためにかなりの労力を必要としており、しばしば、他のジオメトリに一般化することなく、様々なセルサイズや配置を記述するために多くのモデルが必要となる。我々は,温度計の応答が幾何によってどう変化するかを学習し,余分なトレーニングを伴わずに未知の測地に対するシミュレーション応答を生成できる,$\textit{geometry-aware}$ autoregressive modelを開発した。幾何認識モデルは、生成したワッサーシュタイン距離や、シミュレーションされた応答を要約する鍵量の真の分布といったいくつかの指標において、ベースライン無意識モデルよりも50\%以上優れている。 1つの幾何学的認識モデルは、大型ハドロン衝突型加速器で収集されたデータを分析する物理学者によって、現在カロリーメーターシミュレーション用に設計された数百の生成モデルを置き換えることができる。将来の検出器の研究のためには、このような基礎モデルが重要なツールとなり、通常生成熱量計モデルを開発するのに必要な大規模な事前投資を劇的に削減する。

Generation of simulated detector response to collision products is crucial to data analysis in particle physics, but computationally very expensive. One subdetector, the calorimeter, dominates the computational time due to the high granularity of its cells and complexity of the interactions. Generative models can provide more rapid sample production, but currently require significant effort to optimize performance for specific detector geometries, often requiring many models to describe the varying cell sizes and arrangements, without the ability to generalize to other geometries. We develop a $\textit{geometry-aware}$ autoregressive model, which learns how the calorimeter response varies with geometry, and is capable of generating simulated responses to unseen geometries without additional training. The geometry-aware model outperforms a baseline unaware model by over $50\%$ in several metrics such as the Wasserstein distance between the generated and the true distributions of key quantities which summarize the simulated response. A single geometry-aware model could replace the hundreds of generative models currently designed for calorimeter simulation by physicists analyzing data collected at the Large Hadron Collider. For the study of future detectors, such a foundational model will be a crucial tool, dramatically reducing the large upfront investment usually needed to develop generative calorimeter models.

翻訳日:2023-06-29 17:41:43 公開日:2023-06-28

# MAF-Net: 基底血管画像分割のための複数注意誘導核融合ネットワーク

MAF-Net: Multiple attention-guided fusion network for fundus vascular image segmentation ( http://arxiv.org/abs/2305.03617v3 )

ライセンス: Link先を確認

Yuanyuan Peng, Pengpeng Luan, Zixu Zhang

(参考訳) 網膜眼底画像中の血管を正確に分割することは、眼疾患の早期スクリーニング、診断、評価において重要であるが、重要な光変化、不均一な曲率構造、非一様コントラストなどの様々な要因により、セグメンテーションタスクに不明瞭な不確実性をもたらす。その結果,網膜基底画像の血管を正確に検出するためのマルチアテンション誘導核融合ネットワーク (MAF-Net) が提案された。現在、伝統的なunetベースのモデルは、長距離依存関係を明示的にモデル化することで部分的な情報を失う可能性がある。シーン情報補償の損失に対するコンテクスト情報を強化するため、眼底画像から血管の様々な特徴を抽出するために、トランスフォーマによって構築された空間的注意機構とチャネル注意を結合した注意融合機構を用いる。その後、スキップ接続にユニークな空間的注意機構を適用し、低レベル機能から冗長な情報やノイズをフィルタリングすることで、高レベル機能との統合性が向上する。さらに、ドロップアウト層を使用して、いくつかのニューロンをランダムに破棄することで、ディープラーニングネットワークの過剰フィットを防止し、その一般化性能を向上させることができる。実験結果は,F1スコアが0.818,0.836,0.811,Acc値が0.968,0.973,0.973の公開データセットDRIVE,STARE,CHASEDB1で検証された。ビジュアルインスペクションと定量的評価はいずれも,最先端手法と比較して良好な結果が得られることを示す。

Accurately segmenting blood vessels in retinal fundus images is crucial in the early screening, diagnosing, and evaluating some ocular diseases, yet it poses a nontrivial uncertainty for the segmentation task due to various factors such as significant light variations, uneven curvilinear structures, and non-uniform contrast. As a result, a multiple attention-guided fusion network (MAF-Net) is proposed to accurately detect blood vessels in retinal fundus images. Currently, traditional UNet-based models may lose partial information due to explicitly modeling long-distance dependencies, which may lead to unsatisfactory results. To enrich contextual information for the loss of scene information compensation, an attention fusion mechanism that combines the channel attention with spatial attention mechanisms constructed by Transformer is employed to extract various features of blood vessels from retinal fundus images. Subsequently, a unique spatial attention mechanism is applied in the skip connection to filter out redundant information and noise from low-level features, thus enabling better integration with high-level features. In addition, a DropOut layer is employed to randomly discard some neurons, which can prevent overfitting of the deep learning network and improve its generalization performance. Experimental results were verified in public datasets DRIVE, STARE and CHASEDB1 with F1 scores of 0.818, 0.836 and 0.811, and Acc values of 0.968, 0.973 and 0.973, respectively. Both visual inspection and quantitative evaluation demonstrate that our method produces satisfactory results compared to some state-of-the-art methods.

翻訳日:2023-06-29 17:41:20 公開日:2023-06-28

# 深部ニューラルネットワークの統計的最適性

Statistical Optimality of Deep Wide Neural Networks ( http://arxiv.org/abs/2305.02657v2 )

ライセンス: Link先を確認

Yicheng Li, Zixiong Yu, Guhan Chen, Qian Lin

(参考訳) 本稿では、有界領域 $\mathcal X \subset \mathbb R^{d}$ 上で定義された深いフィードフォワード ReLU ニューラルネットワークの一般化能力を考察する。まず、ニューラルネットワークの一般化能力は、対応するディープ・ニューラル・タンジェント・カーネル(NTK)の回帰によって完全に特徴づけられることを示した。次に、深部NTKのスペクトル特性を調査し、深部NTKが$\mathcal{X}$で正定値であり、その固有値減衰率は$(d+1)/d$であることを示す。カーネル回帰の確立された理論により、対応するNTKに付随する再生カーネルヒルベルト空間(RKHS)に回帰関数が存在することを仮定して、勾配降下により訓練された多層ワイドニューラルネットワークが最小最大値を達成することを結論付ける。最後に、オーバーフィットした多層ニューラルネットワークは$\mathbb S^{d}$ではうまく一般化できないことを示す。我々は、$\mathbb r^{d}$ 上の ntk の固有値減衰率を決定する技術上の貢献は、独立した利益であると信じている。

In this paper, we consider the generalization ability of deep wide feedforward ReLU neural networks defined on a bounded domain $\mathcal X \subset \mathbb R^{d}$. We first demonstrate that the generalization ability of the neural network can be fully characterized by that of the corresponding deep neural tangent kernel (NTK) regression. We then investigate on the spectral properties of the deep NTK and show that the deep NTK is positive definite on $\mathcal{X}$ and its eigenvalue decay rate is $(d+1)/d$. Thanks to the well established theories in kernel regression, we then conclude that multilayer wide neural networks trained by gradient descent with proper early stopping achieve the minimax rate, provided that the regression function lies in the reproducing kernel Hilbert space (RKHS) associated with the corresponding NTK. Finally, we illustrate that the overfitted multilayer wide neural networks can not generalize well on $\mathbb S^{d}$. We believe our technical contributions in determining the eigenvalue decay rate of NTK on $\mathbb R^{d}$ might be of independent interests.

翻訳日:2023-06-29 17:40:50 公開日:2023-06-28

# Genomic Interpreter: 1Dシフトウィンドウトランスを備えた階層型ゲノムディープニューラルネットワーク

Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer ( http://arxiv.org/abs/2306.05143v2 )

ライセンス: Link先を確認

Zehui Li, Akashaditya Das, William A V Beardall, Yiren Zhao, Guy-Bart Stan

(参考訳) ゲノムデータの量と質の増大を考えると、新しい洞察の抽出には解釈可能な機械学習モデルが必要である。本研究はゲノム解析予測のための新しいアーキテクチャであるゲノム解釈を提示する。このモデルは、ゲノムアッセイ予測タスクの最先端モデルを上回る。我々のモデルはゲノム部位の階層的依存関係を識別できる。これは、我々が長距離階層データをモデル化するために設計した、新しいトランスフォーマーベースのブロックである1d-swinの統合によって実現されている。ゲノムインタプターは17K塩基対の38,171のDNAセグメントを含むデータセットに基づいて評価され、クロマチンアクセシビリティと遺伝子発現予測において優れた性能を示し、遺伝子制御の基礎となる「シンタクス」を解き放つ。

Given the increasing volume and quality of genomics data, extracting new insights requires interpretable machine-learning models. This work presents Genomic Interpreter: a novel architecture for genomic assay prediction. This model outperforms the state-of-the-art models for genomic assay prediction tasks. Our model can identify hierarchical dependencies in genomic sites. This is achieved through the integration of 1D-Swin, a novel Transformer-based block designed by us for modelling long-range hierarchical data. Evaluated on a dataset containing 38,171 DNA segments of 17K base pairs, Genomic Interpreter demonstrates superior performance in chromatin accessibility and gene expression prediction and unmasks the underlying `syntax' of gene regulation.

翻訳日:2023-06-29 17:33:45 公開日:2023-06-28

# テキストプロンプトによる高品質検出データ生成のためのテキスト間拡散モデルへの幾何制御の統合

Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt ( http://arxiv.org/abs/2306.04607v4 )

ライセンス: Link先を確認

Kai Chen, Enze Xie, Zhe Chen, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung

(参考訳) 拡散モデルは、コンテンツの作成や画像分類などのタスクのためのデータの生成に際し、非常に注目されている。しかし、高品質な物体検出データを生成するための拡散モデルの利用は、画像レベルの知覚品質だけでなく、バウンディングボックスやカメラビューのような幾何学的条件が不可欠である未探索領域に留まっている。従来はコピー・ペースト合成やレイアウト・トゥ・イメージ(L2I)生成を利用していた。本稿では,様々な幾何学的条件を柔軟にテキストプロンプトに変換し,高品質なデータ生成のための事前学習されたtext-to-image(t2i)拡散モデルを強化するシンプルなフレームワークgeodiffusionを提案する。従来のl2i法とは異なり、geodiffusionはバウンディングボックスだけでなく、自動運転シーンのカメラビューなどの余分な幾何学的条件もエンコードできる。大規模な実験では、GeoDiffusionは従来のL2I法よりも高速に4倍のトレーニング時間を維持する。私たちの知る限りでは、幾何学的な条件でレイアウトから画像への拡散モデルを採用し、l2i生成画像が物体検出器の性能向上に有用であることを実証するのはこれが初めてです。

Diffusion models have attracted significant attention due to their remarkable ability to create content and generate data for tasks such as image classification. However, the usage of diffusion models to generate high-quality object detection data remains an underexplored area, where not only the image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential. Previous studies have utilized either copy-paste synthesis or layout-to-image (L2I) generation with specifically designed modules to encode semantic layouts. In this paper, we propose GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts and empower the pre-trained text-to-image (T2I) diffusion models for high-quality detection data generation. Unlike previous L2I methods, our GeoDiffusion is able to encode not only bounding boxes but also extra geometric conditions such as camera views in self-driving scenes. Extensive experiments demonstrate GeoDiffusion outperforms previous L2I methods while maintaining 4x training time faster. To the best of our knowledge, this is the first work to adopt diffusion models for layout-to-image generation with geometric conditions and demonstrate that L2I-generated images can be beneficial for improving the performance of object detectors.

翻訳日:2023-06-29 17:32:59 公開日:2023-06-28

# 適応的勾配に基づく外乱除去による雑音ラベルの学習

Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal ( http://arxiv.org/abs/2306.04502v2 )

ライセンス: Link先を確認

Anastasiia Sedova, Lena Zellinger, Benjamin Roth

(参考訳) 正確で実質的なデータセットは、信頼性とパフォーマンスのよいモデルのトレーニングに不可欠です。しかし、手動でアノテートされたデータセットでさえラベルエラーを含んでいる。従来、ラベルのデノイジングの方法は、主に、データセットのオーバーフィルタやアンダーフィルタのプロセスである、異常値の検出と永続的な削除に重点を置いてきた。本稿では,Adaptive GRAdient-based outlier removal を用いて,雑音ラベルを用いた新しい学習法 AGRAを提案する。モデルトレーニングの前にデータセットをクリーニングする代わりに、トレーニングプロセス中にデータセットを動的に調整する。サンプルのバッチの集約勾配と個々のサンプル勾配を比較することで、この時点で対応するサンプルがモデルに有用か、あるいは非生産的かを動的に決定し、現在の更新のために残すべきである。いくつかのデータセットに対する広範囲な評価はAGRAの有効性を示しているが、包括的な結果分析は私たちの最初の仮説を支持している。

An accurate and substantial dataset is essential for training a reliable and well-performing model. However, even manually annotated datasets contain label errors, not to mention automatically labeled ones. Previous methods for label denoising have primarily focused on detecting outliers and their permanent removal - a process that is likely to over- or underfilter the dataset. In this work, we propose AGRA: a new method for learning with noisy labels by using Adaptive GRAdient-based outlier removal. Instead of cleaning the dataset prior to model training, the dataset is dynamically adjusted during the training process. By comparing the aggregated gradient of a batch of samples and an individual example gradient, our method dynamically decides whether a corresponding example is helpful for the model at this point or is counter-productive and should be left out for the current update. Extensive evaluation on several datasets demonstrates AGRA's effectiveness, while a comprehensive results analysis supports our initial hypothesis: permanent hard outlier removal is not always what model benefits the most from.

翻訳日:2023-06-29 17:32:38 公開日:2023-06-28

# 社会技術的ギャップを狭めるモデル評価の再検討

Rethinking Model Evaluation as Narrowing the Socio-Technical Gap ( http://arxiv.org/abs/2306.03100v2 )

ライセンス: Link先を確認

Q. Vera Liao, Ziang Xiao

(参考訳) 最近のジェネレーティブ言語モデル(llm)の開発は、研究コミュニティや業界が取り組んでいるモデル評価に新たな挑戦をもたらしている。これらのモデルの汎用性は興奮を喚起する一方で、必然的に均質化へと跳躍する。本稿では,この均質化によってもたらされる課題と責任に対処するためには,モデル評価の実践が重要な課題を担わなければならないことを論じる。社会科学、ヒューマン・コンピュータ・インタラクション(HCI)、説明可能なAI(XAI)の学際的な分野から教訓を得て、実世界の社会要求に基づく評価手法の開発をコミュニティに促し、現実主義から社会要求へのトレードオフと実用的コストの認識による多様な評価手法を取り入れて評価を行う。 HCI と現在の NLG 評価手法をマッピングすることにより,社会技術的ギャップを狭くし,オープンな疑問を呈する LLM の評価手法を提案する。

The recent development of generative and large language models (LLMs) poses new challenges for model evaluation that the research community and industry are grappling with. While the versatile capabilities of these models ignite excitement, they also inevitably make a leap toward homogenization: powering a wide range of applications with a single, often referred to as ``general-purpose'', model. In this position paper, we argue that model evaluation practices must take on a critical task to cope with the challenges and responsibilities brought by this homogenization: providing valid assessments for whether and how much human needs in downstream use cases can be satisfied by the given model (socio-technical gap). By drawing on lessons from the social sciences, human-computer interaction (HCI), and the interdisciplinary field of explainable AI (XAI), we urge the community to develop evaluation methods based on real-world socio-requirements and embrace diverse evaluation methods with an acknowledgment of trade-offs between realism to socio-requirements and pragmatic costs to conduct the evaluation. By mapping HCI and current NLG evaluation methods, we identify opportunities for evaluation methods for LLMs to narrow the socio-technical gap and pose open questions.

翻訳日:2023-06-29 17:32:24 公開日:2023-06-28

# 重ね合わせ方向の時間軸を持つ量子演算

Quantum operations with the time axis in a superposed direction ( http://arxiv.org/abs/2306.02755v3 )

ライセンス: Link先を確認

Seok Hyung Lie, M. S. Kim

(参考訳) 量子論において、ある過程が行列転位を適用し、それが物理的に保たれているかどうかを調べることによって、時間反転対称性を持つかどうかが示されている。しかし、量子過程の不定因果順序に関する最近の発見は、完全な反転以外に、より一般的な時間の対称性変換が存在することを示唆している。本研究では,行列変換という一般化された転置の概念を導入し,量子演算の未来と過去のヒルベルト空間の一般二部一元変換を考慮し,時間軸を重畳方向に確実に横たわらせ,従来研究されていた「時間の不定方向」、すなわち前方の重畳と後方の時間進化を一般化する。この枠組みは、時空構造が量子力学から現れると説明される量子重力と同様に時間と空間を等しく扱うアプローチに応用することができる。この一般化された転位法を用いて、完全テンソルの連続的一般化、サブシステムのトレースの動的バージョン、二成分量子相互作用における多重時間軸の互換性を調べる。特に,両部間相互作用がより異なる時間軸と一致している場合,因果的違反を防止するため,両者間の情報交換の費用が削減されることを示す。

In the quantum theory, it has been shown that one can see if a process has the time reversal symmetry by applying the matrix transposition and examining if it remains physical. However, recent discoveries regarding the indefinite causal order of quantum processes suggest that there may be other, more general symmetry transformations of time besides the complete reversal. In this work, we introduce an expanded concept of matrix transposition, the generalized transposition, that takes into account general bipartite unitary transformations of a quantum operation's future and past Hilbert spaces, allowing for making the time axis definitely lie in a superposed direction, which generalizes the previously studied `indefinite direction of time', i.e., superposition of the forward and the backward time evolution. This framework may have applications in approaches that treat time and space equally like quantum gravity, where the spatio-temporal structure is explained to emerge from quantum mechanics. We apply this generalized transposition to investigate a continuous generalization of perfect tensors, a dynamic version of tracing out a subsystem, and the compatibility of multiple time axes in bipartite quantum interactions. Notably, we demonstrate that when a bipartite interaction is consistent with more distinct local temporal axes, there is a reduced allowance for information exchange between the two parties in order to prevent causality violations.

翻訳日:2023-06-29 17:32:01 公開日:2023-06-28

# オートエンコーダの最大度トレーニング

Maximum Likelihood Training of Autoencoders ( http://arxiv.org/abs/2306.01843v2 )

ライセンス: Link先を確認

Peter Sorrenson, Felix Draxler, Armand Rousselot, Sander Hummerich, Lea Zimmermann and Ullrich K\"othe

(参考訳) 最大度トレーニングは好適な統計特性を持ち、特に正規化フローにおいて生成的モデリングに人気がある。一方、生成オートエンコーダは多様体仮説による流れの正規化よりも効率的なことを約束している。本研究では,制約のないオートエンコーダの最大確率トレーニングを初めて導入し,この2つのパラダイムを組み合わせる。第一に、フリーフォームネットワークのための既存の最大確率推定器は、潜在次元と線形にコストがスケールする反復スキームに依存するため、受け入れがたいほど遅い。改良された推定器を導入し、イテレーションを排除し、一定のコスト(バニラオートエンコーダのバッチあたりのランタイムの約2倍)をもたらす。第2に,自動エンコーダに最大限の確率を適用することで,異なる解を導き出すことが可能であり,この知見を用いて安定的な最大確率トレーニング目標を動機付けることを実証する。我々は,玩具,表,画像データについて広範な実験を行い,その結果の競争性能を実証した。我々は、我々のモデルを最大可能性オートエンコーダ(MLAE)と呼ぶ。

Maximum likelihood training has favorable statistical properties and is popular for generative modeling, especially with normalizing flows. On the other hand, generative autoencoders promise to be more efficient than normalizing flows due to the manifold hypothesis. In this work, we introduce successful maximum likelihood training of unconstrained autoencoders for the first time, bringing the two paradigms together. To do so, we identify and overcome two challenges: Firstly, existing maximum likelihood estimators for free-form networks are unacceptably slow, relying on iteration schemes whose cost scales linearly with latent dimension. We introduce an improved estimator which eliminates iteration, resulting in constant cost (roughly double the runtime per batch of a vanilla autoencoder). Secondly, we demonstrate that naively applying maximum likelihood to autoencoders can lead to divergent solutions and use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular and image data, demonstrating the competitive performance of the resulting model. We call our model the maximum likelihood autoencoder (MLAE).

翻訳日:2023-06-29 17:31:39 公開日:2023-06-28

# 変圧器を用いたアノテーションバイアスを考慮した医用画像分割

Transformer-based Annotation Bias-aware Medical Image Segmentation ( http://arxiv.org/abs/2306.01340v2 )

ライセンス: Link先を確認

Zehui Liao, Yutong Xie, Shishuai Hu, Yong Xia

(参考訳) 手動画像分割は主観的であり、アノテータ関連バイアスに悩まされ、深層学習法によって模倣または増幅される。近年、このバイアスはアノテータの好みと確率的誤差の組合せであり、それぞれデコーダと画素単位の独立なガウス分布の後にある畳み込みブロックによってモデル化されている。畳み込みブロックは、全解像度レベルで様々な好みの度合いを効果的にモデル化することは不可能である。さらに、独立画素ワイドガウス分布は画素相関を無視し、不連続境界をもたらす。本稿では,アノテーションの嗜好と確率的誤りをモデル化することにより,アノテーション関連バイアスに取り組むトランスフォーマタ型アノテーション・バイアス・アウェア(tab)医療画像分割モデルを提案する。 TABはTransformerと学習可能なクエリを使って、好みに重点を置くさまざまな特徴を抽出する。これにより、TABは単一のセグメンテーションヘッドを使用して、様々な好みのセグメンテーションを同時に生成できる。さらに、TABは画素相関をモデル化する多変正規分布を仮定し、アノテーション分布を学習して確率誤差を解消する。 6つのアノテーションを付加したOD/OCセグメンテーションベンチマークでTABを評価した。以上の結果から,TABはアノテータ関連バイアスを考慮した既存の医用画像セグメンテーションモデルより優れていることが示唆された。

Manual medical image segmentation is subjective and suffers from annotator-related bias, which can be mimicked or amplified by deep learning methods. Recently, researchers have suggested that such bias is the combination of the annotator preference and stochastic error, which are modeled by convolution blocks located after decoder and pixel-wise independent Gaussian distribution, respectively. It is unlikely that convolution blocks can effectively model the varying degrees of preference at the full resolution level. Additionally, the independent pixel-wise Gaussian distribution disregards pixel correlations, leading to a discontinuous boundary. This paper proposes a Transformer-based Annotation Bias-aware (TAB) medical image segmentation model, which tackles the annotator-related bias via modeling annotator preference and stochastic errors. TAB employs the Transformer with learnable queries to extract the different preference-focused features. This enables TAB to produce segmentation with various preferences simultaneously using a single segmentation head. Moreover, TAB takes the multivariant normal distribution assumption that models pixel correlations, and learns the annotation distribution to disentangle the stochastic error. We evaluated our TAB on an OD/OC segmentation benchmark annotated by six annotators. Our results suggest that TAB outperforms existing medical image segmentation models which take into account the annotator-related bias.

翻訳日:2023-06-29 17:31:20 公開日:2023-06-28

# AIによる意思決定のためのヒューマンアライズドキャリブレーション

Human-Aligned Calibration for AI-Assisted Decision Making ( http://arxiv.org/abs/2306.00074v2 )

ライセンス: Link先を確認

Nina L. Corvelo Benz and Manuel Gomez Rodriguez

(参考訳) バイナリ分類器を使用して意思決定支援を行う場合、通常はラベル予測と信頼値の両方を提供する。次に、意思決定者は、信頼度値を使用して、予測をどれだけ信頼するかを判断する。この文脈では、信頼度値は、予測されたラベルが基底真理ラベルと一致する確率の十分に校正された推定値に対応するべきであるとしばしば主張されている。しかし、複数の実証的証拠は、意思決定者がこれらの信頼度値を用いて予測をいつ信頼するかを判断するのに難しいことを示唆している。本稿では,まずその理由を理解し,より有用な信頼値の構築方法を検討することを目的とする。我々はまず、広範囲のユーティリティ機能に対して、合理的な意思決定者が一般的に、上記の信頼度値を使って最適な決定方針を発見することができないデータ分布が存在することを論じる。しかし, 意思決定者自身の予測に対する信頼度に関して, 信頼度値が自然な整合性を満たすならば, 常に, 意思決定者が予測に立たなければならない信頼度が信頼度に単調であり, 発見可能性の向上に寄与する最適決定方針が存在することを示す。さらに, 意思決定者自身の予測に対する信頼度に対する多重化が, 調整の十分条件であることを示す。分類器が実際の人間の専門家に意思決定支援を提供する4つのAI支援意思決定タスクの実験は、我々の理論的結果を検証するとともに、アライメントがより良い意思決定につながることを示唆している。

Whenever a binary classifier is used to provide decision support, it typically provides both a label prediction and a confidence value. Then, the decision maker is supposed to use the confidence value to calibrate how much to trust the prediction. In this context, it has been often argued that the confidence value should correspond to a well calibrated estimate of the probability that the predicted label matches the ground truth label. However, multiple lines of empirical evidence suggest that decision makers have difficulties at developing a good sense on when to trust a prediction using these confidence values. In this paper, our goal is first to understand why and then investigate how to construct more useful confidence values. We first argue that, for a broad class of utility functions, there exist data distributions for which a rational decision maker is, in general, unlikely to discover the optimal decision policy using the above confidence values -- an optimal decision maker would need to sometimes place more (less) trust on predictions with lower (higher) confidence values. However, we then show that, if the confidence values satisfy a natural alignment property with respect to the decision maker's confidence on her own predictions, there always exists an optimal decision policy under which the level of trust the decision maker would need to place on predictions is monotone on the confidence values, facilitating its discoverability. Further, we show that multicalibration with respect to the decision maker's confidence on her own predictions is a sufficient condition for alignment. Experiments on four different AI-assisted decision making tasks where a classifier provides decision support to real human experts validate our theoretical results and suggest that alignment may lead to better decisions.

翻訳日:2023-06-29 17:30:55 公開日:2023-06-28

# 拡張視野を用いた複数音源変換トモグラフィーの解析的再構成に関する研究

Analytical reconstructions of multiple source-translation computed tomography with extended field of views: a research study ( http://arxiv.org/abs/2305.19767v2 )

ライセンス: Link先を確認

Zhisheng Wang, Yue Liu, Shunli Wang, Xingyuan Bian, Zongfeng Li and Junning Cui

(参考訳) 本稿では,複数音源変換トモグラフィ(mSTCT)を拡張視野(FOV)下での高品質な解析的再構成について検討する。より大規模なFOVでは、D-BPF や S-BPF を含む mSTCT のバックプロジェクションフィルタ (BPF) アルゴリズムが、不安定なバックプロジェクション重み付け因子と半スキャンモードにより画像エッジに許容できない誤りを犯し、mSTCT イメージングの意図から逸脱する。本稿では,fd-bpfとfs-bpfと略されるエラーのバランスをとるために,mstctの非重み付けd-bpf(nwd-bpf)を導出し,bpfsを特別なフルスキャンmstct(f-mstct)に導入する手法を提案する。第一戦略として、D-BPFに特殊変動関係を導入することにより、不安定な後方投影重み付け因子を除去する。第2の戦略として、F-mSTCT幾何とBPFを組み合わせることで、F-mSTCTに適切な冗長重み付け関数を導出する。実験により,提案手法が実証された。その中で、NWD-BPFは画像エッジの不安定性を弱めることができるが、詳細は曖昧であり、FS-BPFは大きな物体を撮像する極端に拡張されたFOVの下で高品質な安定画像を得ることができるが、FD-BPFよりも多くの投影を必要とする。 FOV画像の拡張における様々な実践的要件に対して,アルゴリズムの選択について提案する。

This paper is to investigate the high-quality analytical reconstructions of multiple source-translation computed tomography (mSTCT) under an extended field of view (FOV). Under the larger FOVs, the previously proposed backprojection filtration (BPF) algorithms for mSTCT, including D-BPF and S-BPF, make some intolerable errors in the image edges due to an unstable backprojection weighting factor and the half-scan mode, which deviates from the intention of mSTCT imaging. In this paper, to achieve reconstruction with as little error as possible under the extremely extended FOV, we propose two strategies, including deriving a no-weighting D-BPF (NWD-BPF) for mSTCT and introducing BPFs into a special full-scan mSTCT (F-mSTCT) to balance errors, i.e., abbreviated as FD-BPF and FS-BPF. For the first strategy, we eliminate this unstable backprojection weighting factor by introducing a special variable relationship in D-BPF. For the second strategy, we combine the F-mSTCT geometry with BPFs to study the performance and derive a suitable redundant weighting function for F-mSTCT. The experiments demonstrate our proposed methods for these strategies. Among them, NWD-BPF can weaken the instability at the image edges but blur the details, and FS-BPF can get high-quality stable images under the extremely extended FOV imaging a large object but requires more projections than FD-BPF. For different practical requirements in extending FOV imaging, we give suggestions on algorithm selection.

翻訳日:2023-06-29 17:30:27 公開日:2023-06-28

# 集合観測からの普遍的偏りのない分類法

A Universal Unbiased Method for Classification from Aggregate Observations ( http://arxiv.org/abs/2306.11343v2 )

ライセンス: Link先を確認

Zixi Wei, Lei Feng, Bo Han, Tongliang Liu, Gang Niu, Xiaofeng Zhu, Heng Tao Shen

(参考訳) 従来の教師付き分類では、個々のインスタンスには真のラベルが必要である。しかし、プライバシの懸念や不適切なアノテーションコストのために、個々のインスタンスの真のラベルを収集することは禁止される可能性がある。これは、個々のインスタンスではなく、インスタンスのグループに監督を提供する集合観察(CFAO)からの分類の研究を動機付けている。 CFAOは、多言語学習やラベル比率からの学習など、さまざまな学習問題を含む一般化学習フレームワークである。本研究の目的は,任意の損失に対する分類リスクの偏りのない推定値を保持する,新しいCFAOの普遍的手法を提案することである。実際、本手法はグループ内の各インスタンスに対する各ラベルの重要性を考慮し、分類器が学習するパーソナライズされた監督を提供する。理論的には,提案手法は不偏リスク推定器によるリスクの整合性を保証するだけでなく,任意の損失に対応できる。 CFAOの諸問題に対する大規模な実験により,提案手法の優位性を示した。

In conventional supervised classification, true labels are required for individual instances. However, it could be prohibitive to collect the true labels for individual instances, due to privacy concerns or unaffordable annotation costs. This motivates the study on classification from aggregate observations (CFAO), where the supervision is provided to groups of instances, instead of individual instances. CFAO is a generalized learning framework that contains various learning problems, such as multiple-instance learning and learning from label proportions. The goal of this paper is to present a novel universal method of CFAO, which holds an unbiased estimator of the classification risk for arbitrary losses -- previous research failed to achieve this goal. Practically, our method works by weighing the importance of each label for each instance in the group, which provides purified supervision for the classifier to learn. Theoretically, our proposed method not only guarantees the risk consistency due to the unbiased risk estimator but also can be compatible with arbitrary losses. Extensive experiments on various problems of CFAO demonstrate the superiority of our proposed method.

翻訳日:2023-06-29 17:23:45 公開日:2023-06-28

# 対角線はループ量子宇宙論の一般的な図像か?

Is the diagonal case a general picture for Loop Quantum Cosmology? ( http://arxiv.org/abs/2306.10934v2 )

ライセンス: Link先を確認

Matteo Bruno and Giovanni Montani

(参考訳) ループ量子重力の初期の均質宇宙への正しい実装は、su(2)対称性が適切に保持できないため、文献において長い議論の対象となっている。この対称性の役割はガウス制約によって表される。ここで、バニッシュでないガウス制約が見つかる。しかし、適切な変数を用いて3つのアベリア制約に再キャストできることを示し、ループ量子宇宙論においてそのような対称性が存在しないことを正当化する。

The correct implementation of the Loop Quantum Gravity to the early homogeneous Universe has been the subject of a long debate in the literature because the SU(2) symmetry cannot be properly retained. The role of this symmetry is expressed by the Gauss constraint. Here, a non-vanishing Gauss constraint is found. However, we show that using suitable variables, it can be recast into three Abelian constraints, justifying the absence of such a symmetry in Loop Quantum Cosmology.

翻訳日:2023-06-29 17:23:28 公開日:2023-06-28

# the false dawn: チップマクロ配置のためのgoogleの強化学習の再評価

The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement ( http://arxiv.org/abs/2306.09633v4 )

ライセンス: Link先を確認

Igor L. Markov

(参考訳) Google 2021 Natureの論文で、シリコンチップの物理的設計のための強化学習(RL)が論争を引き起こした。 nature紙は、報告された結果を生成するために必要なほとんどの入力と、方法論におけるいくつかの重要なステップを支持した。しかし、2つの異なる評価がギャップを埋め、Google RLが人間設計者より遅れており、よく知られたアルゴリズム(Simulated Annealing)、そして一般的な商用ソフトウェアよりも遅れていることを示した。クロスチェックデータによると、Nature論文の完全性は、行動、分析、報告の誤りによって著しく損なわれている。

Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and attracted critical media coverage. The Nature paper withheld most inputs needed to produce reported results and some critical steps in the methodology. But two separate evaluations filled in the gaps and demonstrated that Google RL lags behind human designers, behind a well-known algorithm (Simulated Annealing), and also behind generally-available commercial software. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in the conduct, analysis and reporting.

翻訳日:2023-06-29 17:23:20 公開日:2023-06-28

# 体積医用画像分割のための学習可能な重み初期化

Learnable Weight Initialization for Volumetric Medical Image Segmentation ( http://arxiv.org/abs/2306.09320v3 )

ライセンス: Link先を確認

Shahina Kunhimon, Abdelrahman Shaker, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

(参考訳) 局所畳み込みとグローバルな注意の利点を組み合わせたハイブリッド容積医用画像セグメンテーションモデルが最近注目されている。主にアーキテクチャの変更に重点を置いているが、既存のほとんどのハイブリッドアプローチでは、医療データの本質的な容積性を無視して性能を制限する従来のデータ非依存の重み初期化スキームが使用されている。そこで本研究では, 利用可能な医療訓練データを用いて, 提案する自己監督目標を用いて, 文脈的および構造的手がかりを効果的に学習する, 学習可能な重み初期化手法を提案する。我々のアプローチはどんなハイブリッドモデルにも簡単に統合でき、外部のトレーニングデータを必要としない。多臓器・肺癌セグメンテーションタスクの実験は、我々のアプローチの有効性を示し、最先端セグメンテーション性能をもたらす。提案手法は,マルチオーガンセグメンテーションタスクにおける大規模データセットを用いて事前学習したswain-unetrモデルと比較して良好に機能する。ソースコードとモデルは、https://github.com/shahinakk/lwi-vmsで利用可能です。

Hybrid volumetric medical image segmentation models, combining the advantages of local convolution and global attention, have recently received considerable attention. While mainly focusing on architectural modifications, most existing hybrid approaches still use conventional data-independent weight initialization schemes which restrict their performance due to ignoring the inherent volumetric nature of the medical data. To address this issue, we propose a learnable weight initialization approach that utilizes the available medical training data to effectively learn the contextual and structural cues via the proposed self-supervised objectives. Our approach is easy to integrate into any hybrid model and requires no external training data. Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach, leading to state-of-the-art segmentation performance. Our proposed data-dependent initialization approach performs favorably as compared to the Swin-UNETR model pretrained using large-scale datasets on multi-organ segmentation task. Our source code and models are available at: https://github.com/ShahinaKK/LWI-VMS.

翻訳日:2023-06-29 17:23:05 公開日:2023-06-28

# AssistGPT:計画、実行、検査、学習が可能な汎用マルチモーダルアシスタント

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn ( http://arxiv.org/abs/2306.08640v2 )

ライセンス: Link先を確認

Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou

(参考訳) 近年のLarge Language Models (LLMs) の研究は、一般のNLPAIアシスタントに顕著な進歩をもたらした。いくつかの研究は、より一般的なマルチモーダルユーザクエリに対処するために、モデルやapiの計画と呼び出しにllmの使用をさらに検討している。この進歩にもかかわらず、視覚タスクの多様な性質のため、複雑な視覚ベースのタスクは依然として困難である。この多様性は2つの側面に反映されます 1)経路の推論。多くの実生活アプリケーションでは、クエリ自体を調べるだけでクエリを正確に分解することは困難である。特定の視覚内容と各ステップの結果に基づいた計画が通常必要である。 2)柔軟な入力と中間結果。入力フォームは、野生のケースでは柔軟で、単一の画像やビデオだけでなく、ビデオや画像の混合物(たとえば、ユーザービュー画像といくつかの参照ビデオ)も含む。さらに、複雑な推論プロセスは、ビデオナレーションやセグメント化されたビデオクリップなど、さまざまなマルチモーダル中間結果を生成する。このような一般的なケースに対処するため,我々は,plan,execute,inspect,learning(peil)と呼ばれるインターリーブされたコードと言語推論アプローチを備えたマルチモーダルaiアシスタントである assistgpt を提案する。具体的には、Plannerは自然言語を使ってExecutorのどのツールが次にすべきかを、現在の推論の進捗に基づいて計画することができる。インスペクタは、プランナーが特定のツールに適切な視覚情報を供給するのを補助する効率的なメモリマネージャである。最後に、推論プロセス全体が複雑で柔軟であるため、学習者はモデルが最適な解を自律的に探索し発見できるように設計されている。我々は, A-OKVQA と NExT-QA のベンチマーク実験を行った。さらに,本システムでは,ベンチマークよりもはるかに複雑な質問を処理可能であることを示す。

Recent research on Large Language Models (LLMs) has led to remarkable advancements in general NLP AI assistants. Some studies have further explored the use of LLMs for planning and invoking models or APIs to address more general multi-modal user queries. Despite this progress, complex visual-based tasks still remain challenging due to the diverse nature of visual tasks. This diversity is reflected in two aspects: 1) Reasoning paths. For many real-life applications, it is hard to accurately decompose a query simply by examining the query itself. Planning based on the specific visual content and the results of each step is usually required. 2) Flexible inputs and intermediate results. Input forms could be flexible for in-the-wild cases, and involves not only a single image or video but a mixture of videos and images, e.g., a user-view image with some reference videos. Besides, a complex reasoning process will also generate diverse multimodal intermediate results, e.g., video narrations, segmented video clips, etc. To address such general cases, we propose a multi-modal AI assistant, AssistGPT, with an interleaved code and language reasoning approach called Plan, Execute, Inspect, and Learn (PEIL) to integrate LLMs with various tools. Specifically, the Planner is capable of using natural language to plan which tool in Executor should do next based on the current reasoning progress. Inspector is an efficient memory manager to assist the Planner to feed proper visual information into a specific tool. Finally, since the entire reasoning process is complex and flexible, a Learner is designed to enable the model to autonomously explore and discover the optimal solution. We conducted experiments on A-OKVQA and NExT-QA benchmarks, achieving state-of-the-art results. Moreover, showcases demonstrate the ability of our system to handle questions far more complex than those found in the benchmarks.

翻訳日:2023-06-29 17:22:46 公開日:2023-06-28

# 分断された肺気道・容器のトポロジー修復:ベースラインとデータセット

Topology Repairing of Disconnected Pulmonary Airways and Vessels: Baselines and a Dataset ( http://arxiv.org/abs/2306.07089v2 )

ライセンス: Link先を確認

Ziqiao Weng, Jiancheng Yang, Dongnan Liu, Weidong Cai

(参考訳) 肺疾患の診断と治療には, 肺気道および血管の正確な分断が重要である。しかし、現在のディープラーニングアプローチは、その臨床的有用性を阻害する分離性の問題に苦しむ。この課題に対処するために, 分離肺管状構造のトポロジーを修復するためにデータ駆動法を応用した後処理手法を提案する。我々のアプローチは、ニューラルネットワークが非接続なコンポーネントをブリッジできるキーポイントを予測するために訓練されるキーポイント検出タスクとして問題を定式化する。完全肺構造から分離したデータを生成するトレーニングデータ合成パイプラインを使用する。さらに、肺気道、動脈、静脈の800の完全な3Dモデルと合成切断データを含む新しい肺樹修復データセットが公開されている。私たちのコードとデータはhttps://github.com/m3dv/pulmonary-tree-repairingで入手できます。

Accurate segmentation of pulmonary airways and vessels is crucial for the diagnosis and treatment of pulmonary diseases. However, current deep learning approaches suffer from disconnectivity issues that hinder their clinical usefulness. To address this challenge, we propose a post-processing approach that leverages a data-driven method to repair the topology of disconnected pulmonary tubular structures. Our approach formulates the problem as a keypoint detection task, where a neural network is trained to predict keypoints that can bridge disconnected components. We use a training data synthesis pipeline that generates disconnected data from complete pulmonary structures. Moreover, the new Pulmonary Tree Repairing (PTR) dataset is publicly available, which comprises 800 complete 3D models of pulmonary airways, arteries, and veins, as well as the synthetic disconnected data. Our code and data are available at https://github.com/M3DV/pulmonary-tree-repairing.

翻訳日:2023-06-29 17:21:53 公開日:2023-06-28

# 近似制約最適化のための自己教師付きEquality Embedded Deep Lagrange Dual

Self-supervised Equality Embedded Deep Lagrange Dual for Approximate Constrained Optimization ( http://arxiv.org/abs/2306.06674v3 )

ライセンス: Link先を確認

Minsoo Kim, Hongseok Kim

(参考訳) 従来の解法はしばしば、特に大規模かつ時間クリティカルな問題において、制約付き最適化のために計算コストがかかる。これにより、ニューラルネットワーク(NN)を高速な最適解近似器として使用することへの関心が高まっているが、NNに制約を組み込むことは難しい。そこで本研究では,ラベルを使わずに最適解を見つけることを学ぶフレームワークdeep lagrange dual with equal embedded (deeplde)を提案する。実現可能なソリューションを確保するため、NNに等価性制約を組み込み、未等式制約を課すために原始双対法を用いてNNを訓練する。さらに,DeepLDEの収束性を証明し,本手法だけでは等式埋め込みの助けなしには等式制約を保証できないことを示す。コンベックス,非凸,AC最適電力流(AC-OPF)問題に関するシミュレーション結果から,提案したDeepLDEはNNベースの全アプローチの中で最小の最適性ギャップを達成でき,かつ常に実現可能な解を確保できることを示す。さらに,制約付き凸,非凸最適化,ac-opfの解法において,提案手法の計算時間はdc3および従来の解法に比べて約5～250倍高速である。

Conventional solvers are often computationally expensive for constrained optimization, particularly in large-scale and time-critical problems. While this leads to a growing interest in using neural networks (NNs) as fast optimal solution approximators, incorporating the constraints with NNs is challenging. In this regard, we propose deep Lagrange dual with equality embedding (DeepLDE), a framework that learns to find an optimal solution without using labels. To ensure feasible solutions, we embed equality constraints into the NNs and train the NNs using the primal-dual method to impose inequality constraints. Furthermore, we prove the convergence of DeepLDE and show that the primal-dual learning method alone cannot ensure equality constraints without the help of equality embedding. Simulation results on convex, non-convex, and AC optimal power flow (AC-OPF) problems show that the proposed DeepLDE achieves the smallest optimality gap among all the NN-based approaches while always ensuring feasible solutions. Furthermore, the computation time of the proposed method is about 5 to 250 times faster than DC3 and the conventional solvers in solving constrained convex, non-convex optimization, and/or AC-OPF.

翻訳日:2023-06-29 17:21:39 公開日:2023-06-28

# 微分表示型測光ステレオ

Differentiable Display Photometric Stereo ( http://arxiv.org/abs/2306.13325v2 )

ライセンス: Link先を確認

Seokjun Choi, Seungwoo Yoon, Giljoo Nam, Seungyong Lee, Seung-Hwan Baek

(参考訳) フォトメトリックステレオは、光度条件の変化を利用してピクセルごとの表面正常を再構成する。従来のモニタを照明源として使用するディスプレイフォトメトリックステレオの概念は、かさばり、使いづらい従来の設定でしばしば発生する制限を克服する可能性がある。本稿では,市販のモニターとカメラを用いた高忠実度ノーマルリコンストラクションを実現するため,DDPS(diffariable Display Photometric Stereo)を提案する。 ddpsは、フォトメトリックステレオにおける批判的だがしばしば無視される課題に対処している。本稿では,フォトメトリックステレオ再構成法と基底照明画像形成を併用する微分可能なフレームワークを提案する。これにより、ディスプレイパターンの学習が容易になり、自動微分による高品質な正常な再構築につながる。エンドツーエンドの最適化に固有の合成ドメインギャップに対処し、3Dプリントオブジェクトからなる実世界の測光ステレオトレーニングデータセットを提案する。さらに,光度ステレオの異常な性質を低減するために,モニタから放射される線形偏光を利用して,撮像画像中の拡散反射とスペクトル反射を光学的に分離する。 DDPSは、ターゲット設定に最適化されたディスプレイパターンを学習することができ、初期化に堅牢であることを示す。本研究では,3次元プリントオブジェクトにおけるDDPSの評価を行い,DPSが効果的な測光ステレオ再構成を実現することを実証した。

Photometric stereo leverages variations in illumination conditions to reconstruct per-pixel surface normals. The concept of display photometric stereo, which employs a conventional monitor as an illumination source, has the potential to overcome limitations often encountered in bulky and difficult-to-use conventional setups. In this paper, we introduce Differentiable Display Photometric Stereo (DDPS), a method designed to achieve high-fidelity normal reconstruction using an off-the-shelf monitor and camera. DDPS addresses a critical yet often neglected challenge in photometric stereo: the optimization of display patterns for enhanced normal reconstruction. We present a differentiable framework that couples basis-illumination image formation with a photometric-stereo reconstruction method. This facilitates the learning of display patterns that leads to high-quality normal reconstruction through automatic differentiation. Addressing the synthetic-real domain gap inherent in end-to-end optimization, we propose the use of a real-world photometric-stereo training dataset composed of 3D-printed objects. Moreover, to reduce the ill-posed nature of photometric stereo, we exploit the linearly polarized light emitted from the monitor to optically separate diffuse and specular reflections in the captured images. We demonstrate that DDPS allows for learning display patterns optimized for a target configuration and is robust to initialization. We assess DDPS on 3D-printed objects with ground-truth normals and diverse real-world objects, validating that DDPS enables effective photometric-stereo reconstruction.

翻訳日:2023-06-29 17:11:59 公開日:2023-06-28

# フェデレーション学習におけるコミュニケーション削減のための効率的な仮想データ生成手法

An Efficient Virtual Data Generation Method for Reducing Communication in Federated Learning ( http://arxiv.org/abs/2306.12088v2 )

ライセンス: Link先を確認

Cheng Yang, Xue Yang, Dongxian Wu, Xiaohu Tang

(参考訳) コミュニケーションのオーバーヘッドは、連合学習(fl)における大きな課題の1つです。いくつかの古典的なスキームでは、サーバがローカルモデルから参加者のトレーニングデータに関する補助情報を抽出して中央ダミーデータセットを構築することができると仮定している。サーバはダミーデータセットを使用して、集約されたグローバルモデルを微調整し、より少ない通信ラウンドでターゲットテスト精度を達成する。本稿では、上記のソリューションをデータベースの通信効率の高いflフレームワークにまとめる。提案フレームワークの鍵となるのは,ダミーデータセットが集約されたグローバルモデルに正の影響を与えることを保証する効率的な抽出モジュール(EM)を設計することである。ジェネレータを使ってEMを設計する既存手法とは異なり,提案手法では勾配マッチングの概念を取り入れてEMを構築する。具体的には、FedINIBoostは、実際のデータセットのプロキシデータセットを、各コミュニケーションラウンドの参加者毎に2つのステップで構築する。その後、サーバはすべてのプロキシデータセットを集約し、集約されたグローバルモデルを微調整するために使用される中央ダミーデータセットを形成する。従来手法であるFedAVG,FedProx,Moon,FedFTGと比較し,本手法の優位性を検証した。さらに、FedINIBoostは、FLの初期における集約グローバルモデルの性能を微調整する上で重要な役割を果たす。

Communication overhead is one of the major challenges in Federated Learning(FL). A few classical schemes assume the server can extract the auxiliary information about training data of the participants from the local models to construct a central dummy dataset. The server uses the dummy dataset to finetune aggregated global model to achieve the target test accuracy in fewer communication rounds. In this paper, we summarize the above solutions into a data-based communication-efficient FL framework. The key of the proposed framework is to design an efficient extraction module(EM) which ensures the dummy dataset has a positive effect on finetuning aggregated global model. Different from the existing methods that use generator to design EM, our proposed method, FedINIBoost borrows the idea of gradient match to construct EM. Specifically, FedINIBoost builds a proxy dataset of the real dataset in two steps for each participant at each communication round. Then the server aggregates all the proxy datasets to form a central dummy dataset, which is used to finetune aggregated global model. Extensive experiments verify the superiority of our method compared with the existing classical method, FedAVG, FedProx, Moon and FedFTG. Moreover, FedINIBoost plays a significant role in finetuning the performance of aggregated global model at the initial stage of FL.

翻訳日:2023-06-29 17:11:37 公開日:2023-06-28

# G-NM:数値時系列予測モデルのグループ

G-NM: A Group of Numerical Time Series Prediction Models ( http://arxiv.org/abs/2306.11667v2 )

ライセンス: Link先を確認

Juyoung Yun

(参考訳) 本研究では,数値時系列予測モデル群 (G-NM) と総称される数値時系列予測モデルの包括的アンサンブルの開発と実装に焦点を当てた。この包括的セットは、リカレントニューラルネットワーク(RNN)やLong Short-Term Memory(LSTM)といった現代のニューラルネットワークモデルに加えて、Autoregressive Integrated moving Average(ARIMA)、Holt-Wintersのメソッド、SVR(Support Vector Regression)といった従来のモデルを含む。 G-NMは、複雑な自然現象に固有のパターンや傾向に関連する予測能力を増強するために明確に構成されている。これらの事象に関連する時系列データを利用することで、g-nmは長期にわたってそのような現象の予測を容易にする。本研究の目的は,このような事象に対する我々の理解を深めることと,予測の精度を著しく向上させることである。 g-nmは時系列データに現れる線形および非線形の依存関係、季節性、トレンドの両方をカプセル化する。これらのモデルはそれぞれ、線形トレンドと季節性を扱うARIMAのレジリエンス、非線形パターンをキャプチャするSVRの習熟度、時系列データの様々なコンポーネントをモデル化するLSTMの適応性など、さまざまな長所に貢献している。 g-nmポテンシャルの活用を通じて,大規模時系列予測モデルにおける最先端の進歩を試みている。我々は,本研究が,自然界を構成する複雑な事象を理解し,予測するための,現在進行中の取り組みにおいて,重要な足掛かりとなることを期待する。

In this study, we focus on the development and implementation of a comprehensive ensemble of numerical time series forecasting models, collectively referred to as the Group of Numerical Time Series Prediction Model (G-NM). This inclusive set comprises traditional models such as Autoregressive Integrated Moving Average (ARIMA), Holt-Winters' method, and Support Vector Regression (SVR), in addition to modern neural network models including Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). G-NM is explicitly constructed to augment our predictive capabilities related to patterns and trends inherent in complex natural phenomena. By utilizing time series data relevant to these events, G-NM facilitates the prediction of such phenomena over extended periods. The primary objective of this research is to both advance our understanding of such occurrences and to significantly enhance the accuracy of our forecasts. G-NM encapsulates both linear and non-linear dependencies, seasonalities, and trends present in time series data. Each of these models contributes distinct strengths, from ARIMA's resilience in handling linear trends and seasonality, SVR's proficiency in capturing non-linear patterns, to LSTM's adaptability in modeling various components of time series data. Through the exploitation of the G-NM potential, we strive to advance the state-of-the-art in large-scale time series forecasting models. We anticipate that this research will represent a significant stepping stone in our ongoing endeavor to comprehend and forecast the complex events that constitute the natural world.

翻訳日:2023-06-29 17:11:17 公開日:2023-06-28

# セグメンテーションはイメージデヘイズに役立ちます

Let Segment Anything Help Image Dehaze ( http://arxiv.org/abs/2306.15870v1 )

ライセンス: Link先を確認

Zheyan Jin, Shiqi Chen, Yueting Chen, Zhihai Xu, Huajun Feng

(参考訳) 大きな言語モデルと高レベルのビジョンモデルは、大きなデータセットとモデルサイズで素晴らしいパフォーマンス向上を実現しています。しかし、画像デヘイズやぼかし除去のような低レベルのコンピュータビジョンタスクは、依然として少数のデータセットと小さなモデルに依存しており、一般的にオーバーフィットと局所的なオプティマをもたらす。そこで本稿では,大規模モデルを低レベルコンピュータビジョンタスクに統合するフレームワークを提案する。画像分割のタスクと同様に、hazeの分解もテクスチャに関連している。そこで我々は,グレースケール符号化,ネットワークチャネル拡張,プリデヘイズ構造を検出し,低レベルデヘイジングネットワークに大規模事前知識を統合することを提案する。異なるデータセットとアルゴリズムの比較実験により,低レベルの視覚タスクを導く上で,大規模モデルの有効性と適用性を示す。最後に,灰色スケール符号化,ネットワークチャネル拡張,リカレントネットワーク構造の効果をアブレーション実験により実証する。追加のデータやトレーニングリソースが不要な条件下では,大規模モデルの事前知識の統合により,劣化性能が向上し,低レベル視覚タスクのトレーニング時間を短縮できることを示す。

The large language model and high-level vision model have achieved impressive performance improvements with large datasets and model sizes. However, low-level computer vision tasks, such as image dehaze and blur removal, still rely on a small number of datasets and small-sized models, which generally leads to overfitting and local optima. Therefore, we propose a framework to integrate large-model prior into low-level computer vision tasks. Just as with the task of image segmentation, the degradation of haze is also texture-related. So we propose to detect gray-scale coding, network channel expansion, and pre-dehaze structures to integrate large-model prior knowledge into any low-level dehazing network. We demonstrate the effectiveness and applicability of large models in guiding low-level visual tasks through different datasets and algorithms comparison experiments. Finally, we demonstrate the effect of grayscale coding, network channel expansion, and recurrent network structures through ablation experiments. Under the conditions where additional data and training resources are not required, we successfully prove that the integration of large-model prior knowledge will improve the dehaze performance and save training time for low-level visual tasks.

翻訳日:2023-06-29 16:16:12 公開日:2023-06-28

# grass: リモートセンシング画像セマンティクスセグメンテーションのためのグラデーション誘導サンプリング戦略を用いたコントラスト学習

GraSS: Contrastive Learning with Gradient Guided Sampling Strategy for Remote Sensing Image Semantic Segmentation ( http://arxiv.org/abs/2306.15868v1 )

ライセンス: Link先を確認

Zhaoyang Zhang, Zhen Ren, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li

(参考訳) 自己教師付きコントラスト学習(SSCL)は、リモートセンシング画像(RSI)理解において重要なマイルストーンを達成している。その本質は、ダウンストリームタスクに有益である多数のラベルのない画像から画像の特徴を抽出するための教師なしインスタンス識別プリテキストタスクを設計することである。しかしながら、既存のインスタンス識別ベースのssclは、rsiセマンティックセグメンテーションタスクに適用される場合、2つの制限に苦しむ。 1) 肯定的なサンプル結合問題 2)特徴適応バイアス。ピクセルレベルやオブジェクトレベルの機能を必要とするセマンティックセグメンテーションタスクに適用すると、機能適応バイアスが導入される。本研究では,RSIの特定領域に対して,教師なしのコントラスト損失の勾配によって識別情報をマッピングできることを見いだし,これらの特定領域は特異な接地対象を含む傾向にあることを示した。そこで本研究では,RSIセマンティックセグメンテーションのためのGradient Guided Sampling Strategy(GraSS)を用いたコントラスト学習を提案する。 GraSSは、インスタンス識別ウォームアップ(IDウォームアップ)とGradient Guided Sampling contrastive training(GSトレーニング)の2つのステージで構成される。 idウォームアップは、コントラスト損失勾配に初期識別情報を提供することを目的としている。 gsトレーニングステージは、より特異な接地対象を含むrsiパッチのコントラスト損失勾配および適応的に選択された領域に含まれる識別情報を活用し、新しい正と負のサンプルを構築することを目的としている。 3つのオープンデータセットの実験結果から、GraSSは高分解能RSIセマンティックセグメンテーションにおけるSSCLの性能を効果的に向上することが示された。 5つの異なる種類のssclからの7つのベースライン法と比較すると、草は平均で 1.57 %、最大で 3.58 % の改善を達成している。ソースコードはhttps://github.com/GeoX-Lab/GraSSで入手できる。

Self-supervised contrastive learning (SSCL) has achieved significant milestones in remote sensing image (RSI) understanding. Its essence lies in designing an unsupervised instance discrimination pretext task to extract image features from a large number of unlabeled images that are beneficial for downstream tasks. However, existing instance discrimination based SSCL suffer from two limitations when applied to the RSI semantic segmentation task: 1) Positive sample confounding issue; 2) Feature adaptation bias. It introduces a feature adaptation bias when applied to semantic segmentation tasks that require pixel-level or object-level features. In this study, We observed that the discrimination information can be mapped to specific regions in RSI through the gradient of unsupervised contrastive loss, these specific regions tend to contain singular ground objects. Based on this, we propose contrastive learning with Gradient guided Sampling Strategy (GraSS) for RSI semantic segmentation. GraSS consists of two stages: Instance Discrimination warm-up (ID warm-up) and Gradient guided Sampling contrastive training (GS training). The ID warm-up aims to provide initial discrimination information to the contrastive loss gradients. The GS training stage aims to utilize the discrimination information contained in the contrastive loss gradients and adaptively select regions in RSI patches that contain more singular ground objects, in order to construct new positive and negative samples. Experimental results on three open datasets demonstrate that GraSS effectively enhances the performance of SSCL in high-resolution RSI semantic segmentation. Compared to seven baseline methods from five different types of SSCL, GraSS achieves an average improvement of 1.57\% and a maximum improvement of 3.58\% in terms of mean intersection over the union. The source code is available at https://github.com/GeoX-Lab/GraSS

翻訳日:2023-06-29 16:15:53 公開日:2023-06-28

# 個人別分散推定と学習

Differentially Private Distributed Estimation and Learning ( http://arxiv.org/abs/2306.15865v1 )

ライセンス: Link先を確認

Marios Papachristou, M. Amin Rahimian

(参考訳) エージェントが情報交換を行い、個人が観測したサンプルから未知の確率変数の統計的特性を推定するネットワーク環境における分散推定と学習の問題について検討する。プライベートな観察に関する情報を交換することで、エージェントは未知の量をまとめて見積もることができるが、プライバシー上のリスクにも直面する。我々のアグリゲーション・スキームの目標は、観測されたデータを時間とともに、ネットワーク全体にわたって効率的に組み合わせ、エージェントのプライバシー要求を調整し、その周辺地域を超えて調整することである。我々のアルゴリズムにより、参加者はオフラインまたはオンラインで取得されたプライベート信号から十分な統計量を推定し、その信号とネットワーク近傍のプライバシーを維持することができる。これは微分プライバシー(dp)制約の下で交換された推定値にノイズを付加する調整されたランダム化スキームを持つ線形集計スキームによって達成される。いずれの場合も、全ての信号に中心的なアクセスを持つ仮説的、全知的な観測者の推定への収束を証明し、アルゴリズムの効率を実証する。また,コンバージェンスレート解析と有限時間性能保証を提供し,コンバージェンス時間を最小化するノイズがラプラスノイズであり,各エージェントの信号およびネットワーク特性に対する感度に対応するパラメータであることを示す。最後に,我々の理論的結果を補足し,検証するために,米国電力グリッドネットワークによる実世界のデータと,ドイツ家庭の電力消費データを用いて,すべてのプライバシー体制下での電力ステーションおよび家庭の平均消費電力を推定する実験を行った。

We study distributed estimation and learning problems in a networked environment in which agents exchange information to estimate unknown statistical properties of random variables from their privately observed samples. By exchanging information about their private observations, the agents can collectively estimate the unknown quantities, but they also face privacy risks. The goal of our aggregation schemes is to combine the observed data efficiently over time and across the network, while accommodating the privacy needs of the agents and without any coordination beyond their local neighborhoods. Our algorithms enable the participating agents to estimate a complete sufficient statistic from private signals that are acquired offline or online over time, and to preserve the privacy of their signals and network neighborhoods. This is achieved through linear aggregation schemes with adjusted randomization schemes that add noise to the exchanged estimates subject to differential privacy (DP) constraints. In every case, we demonstrate the efficiency of our algorithms by proving convergence to the estimators of a hypothetical, omniscient observer that has central access to all of the signals. We also provide convergence rate analysis and finite-time performance guarantees and show that the noise that minimizes the convergence time to the best estimates is the Laplace noise, with parameters corresponding to each agent's sensitivity to their signal and network characteristics. Finally, to supplement and validate our theoretical results, we run experiments on real-world data from the US Power Grid Network and electric consumption data from German Households to estimate the average power consumption of power stations and households under all privacy regimes.

翻訳日:2023-06-29 16:15:21 公開日:2023-06-28

# ゼロノイズ補間による実効量子体積の測定値の増大

Increasing the Measured Effective Quantum Volume with Zero Noise Extrapolation ( http://arxiv.org/abs/2306.15863v1 )

ライセンス: Link先を確認

Elijah Pelofske, Vincent Russo, Ryan LaRose, Andrea Mari, Dan Strano, Andreas B\"artschi, Stephan Eidenbenz, William J. Zeng

(参考訳) Quantum Volumeは、短期量子コンピュータのフルスタックベンチマークである。ターゲットデバイス上で合理的な忠実さで実行できる正方形回路の最大サイズを定量化する。エラー緩和(英: error mitigation)は、ノイズ量子コンピュータの期待値を計算する際に発生するノイズの影響を取り除くための一連の手法である。有効量子ボリュームは、ターゲットデバイスだけでなく、エラー軽減アルゴリズムの有効性を評価するために、量子ボリュームプロトコルにエラー緩和を適用するための提案された計量である。ディジタルゼロノイズ外挿法 (Digital Zero-Noise Extrapolation, ZNE) は、回路折り畳みによるノイズレス予測値を推定し、既知のスケール因子による誤差を増幅し、ゼロノイズ限界への外挿を行う。ここでは,大域的かつ局所的なユニタリ折り畳みと分数スケールの因子を併用したZNEが,動的デカップリングと組み合わせることで,ベンダーが測定した量子体積よりも有効な量子体積を増大させることができることを示す。具体的には、4つのibm量子超伝導プロセッサユニットの有効量子体積を測定し、各デバイスでベンダーが測定した量子体積よりも大きい値を得る。これが最初の報告である。

Quantum Volume is a full-stack benchmark for near-term quantum computers. It quantifies the largest size of a square circuit which can be executed on the target device with reasonable fidelity. Error mitigation is a set of techniques intended to remove the effects of noise present in the computation of noisy quantum computers when computing an expectation value of interest. Effective quantum volume is a proposed metric that applies error mitigation to the quantum volume protocol in order to evaluate the effectiveness not only of the target device but also of the error mitigation algorithm. Digital Zero-Noise Extrapolation (ZNE) is an error mitigation technique that estimates the noiseless expectation value using circuit folding to amplify errors by known scale factors and extrapolating to the zero-noise limit. Here we demonstrate that ZNE, with global and local unitary folding with fractional scale factors, in conjunction with dynamical decoupling, can increase the effective quantum volume over the vendor-measured quantum volume. Specifically, we measure the effective quantum volume of four IBM Quantum superconducting processor units, obtaining values that are larger than the vendor-measured quantum volume on each device. This is the first such increase reported.

翻訳日:2023-06-29 16:14:55 公開日:2023-06-28

# 階層型グラフニューラルネットワークによるハンドオブジェクトの6次元ポーズ推定

Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects ( http://arxiv.org/abs/2306.15858v1 )

ライセンス: Link先を確認

Alireza Rezazadeh, Snehal Dikhale, Soshi Iba and Nawid Jamali

(参考訳) ロボット操作、特に手動の物体操作は、しばしば物体の6Dポーズの正確な推定を必要とする。推定ポーズの精度を向上させるため、6次元物体ポーズ推定における最先端のアプローチでは、rgb画像、奥行き、触覚読取などの1つ以上のモードからの観測データを用いる。しかし、既存のアプローチでは、これらのモダリティによって捕獲された物体の基底となる幾何学的構造を限定的に利用し、視覚的特徴に依存している。これにより、このような視覚的特徴が欠けているオブジェクトや、単に視覚的特徴が無視されているオブジェクトが提示される場合のパフォーマンスが低下する。また,現在のアプローチでは,指の位置に埋め込まれた固有情報を利用しない。 To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that flows the information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. 我々は,YCBオブジェクトとモデルセットから多種多様なオブジェクトのサブセット上でモデルを評価し,その手法が既存の最先端技術よりも精度と強靭性で優れていることを示す。また,提案フレームワークを実ロボットにデプロイし,実環境への移動を定量的に示す。

Robotic manipulation, in particular in-hand object manipulation, often requires an accurate estimate of the object's 6D pose. To improve the accuracy of the estimated pose, state-of-the-art approaches in 6D object pose estimation use observational data from one or more modalities, e.g., RGB images, depth, and tactile readings. However, existing approaches make limited use of the underlying geometric structure of the object captured by these modalities, thereby, increasing their reliance on visual features. This results in poor performance when presented with objects that lack such visual features or when visual features are simply occluded. Furthermore, current approaches do not take advantage of the proprioceptive information embedded in the position of the fingers. To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that flows the information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. We evaluate our model on a diverse subset of objects from the YCB Object and Model Set, and show that our method substantially outperforms existing state-of-the-art work in accuracy and robustness to occlusion. We also deploy our proposed framework on a real robot and qualitatively demonstrate successful transfer to real settings.

翻訳日:2023-06-29 16:14:34 公開日:2023-06-28

# 斜め試料を用いた低位構造を有する多腕帯の純探査

Pure exploration in multi-armed bandits with low rank structure using oblivious sampler ( http://arxiv.org/abs/2306.15856v1 )

ライセンス: Link先を確認

Yaxiong Liu, Atsuyoshi Nakamura, Kohei Hatano, Eiji Takimoto

(参考訳) 本稿では,純探索問題の報酬系列の低階構造について考察する。まず, 探索戦略が探査のフィードバックを得られない純粋な探査問題において, 分離した設定を提案する。この分離のため、腕を標本化するための探索戦略が必要である。報奨ベクトルの核情報を取り込むことにより、result bound $o(d\sqrt{(\ln n)/n})$を持つ時間変動と固定ケースの両方に対して効率的なアルゴリズムを提供する。次に,低位列の多腕バンディットにおける純粋探索に対する下限を示す。我々の上界と下界の間には$o(\sqrt{\ln n})$ギャップがある。

In this paper, we consider the low rank structure of the reward sequence of the pure exploration problems. Firstly, we propose the separated setting in pure exploration problem, where the exploration strategy cannot receive the feedback of its explorations. Due to this separation, it requires that the exploration strategy to sample the arms obliviously. By involving the kernel information of the reward vectors, we provide efficient algorithms for both time-varying and fixed cases with regret bound $O(d\sqrt{(\ln N)/n})$. Then, we show the lower bound to the pure exploration in multi-armed bandits with low rank sequence. There is an $O(\sqrt{\ln N})$ gap between our upper bound and the lower bound.

翻訳日:2023-06-29 16:14:11 公開日:2023-06-28

# GoalieNet: アイスホッケーにおけるジョイントゴール、機器、ネットポーズ推定のためのマルチステージネットワーク

GoalieNet: A Multi-Stage Network for Joint Goalie, Equipment, and Net Pose Estimation in Ice Hockey ( http://arxiv.org/abs/2306.15853v1 )

ライセンス: Link先を確認

Marjan Shahi, David Clausi, Alexander Wong

(参考訳) コンピュータビジョン駆動アイスホッケー分析の分野では、最も困難で研究の少ないタスクの1つはゴールキーパーのポーズ推定である。一般的な人間のポーズ推定とは違って、太いパッドやマスクの下に隠されたゴールキーパーの関節に対応するキーポイントの検出だけでなく、大きな脚パッドや手袋、スティック、ホッケーネットなどに対応する多数の非人間のキーポイントも含むため、ゴールキーパーポーズ推定ははるかに複雑である。この課題に取り組むため,我々は,ゴールキーパー,機器,ネットのポーズを共同で推定する多段深層ニューラルネットワークであるgoalienetを紹介する。 NHLベンチマークデータを用いた実験の結果,提案したGoalieNetはすべてのキーポイントに対して平均84倍の精度を達成でき,29のキーポイントのうち22が80倍以上の精度で検出されることがわかった。このことは,このような共同ポーズ推定手法が有望な研究方向であることを示す。

In the field of computer vision-driven ice hockey analytics, one of the most challenging and least studied tasks is goalie pose estimation. Unlike general human pose estimation, goalie pose estimation is much more complex as it involves not only the detection of keypoints corresponding to the joints of the goalie concealed under thick padding and mask, but also a large number of non-human keypoints corresponding to the large leg pads and gloves worn, the stick, as well as the hockey net. To tackle this challenge, we introduce GoalieNet, a multi-stage deep neural network for jointly estimating the pose of the goalie, their equipment, and the net. Experimental results using NHL benchmark data demonstrate that the proposed GoalieNet can achieve an average of 84\% accuracy across all keypoints, where 22 out of 29 keypoints are detected with more than 80\% accuracy. This indicates that such a joint pose estimation approach can be a promising research direction.

翻訳日:2023-06-29 16:14:02 公開日:2023-06-28

# 自律ロボットのための新しい室内人間動作データセットRoAMを用いた行動条件深部視覚予測

Action-conditioned Deep Visual Prediction with RoAM, a new Indoor Human Motion Dataset for Autonomous Robots ( http://arxiv.org/abs/2306.15852v1 )

ライセンス: Link先を確認

Meenakshi Sarkar, Vinayak Honkote, Dibyendu Das and Debasish Ghose

(参考訳) 産業におけるロボットの採用の増加に伴い、ロボットが人間と協調して効果的に行動を予測、理解、計画できる高度なアルゴリズムの開発に注力することが重要である。ロボットのエゴビジョンから様々な人間の動きを記録できる様々な屋内環境において、カスタムメイドのタートルボット3バーガーロボットで収集されるロボット自律運動(RoAM)ビデオデータセットを紹介する。データセットには、LiDARスキャンの同期記録や、静的で動く人間のエージェントの周りを移動する際にロボットが取るすべての制御アクションも含まれている。このユニークなデータセットは、記録エージェントが部分的に観察可能なシナリオや、イメージングセンサーが移動プラットフォームにマウントされているケースにおいて、将来の画像フレームを予測できる新しいビジュアル予測フレームワークの開発とベンチマークを提供する。 acpnetと呼ばれる新しい深部視覚予測フレームワークのデータセットをベンチマークし、近似された将来の画像フレームはロボットのアクションにも依存しており、モバイルロボットと自律ナビゲーション研究のためのビデオ予測パラダイムにロボットダイナミクスを組み込む可能性を実証した。

With the increasing adoption of robots across industries, it is crucial to focus on developing advanced algorithms that enable robots to anticipate, comprehend, and plan their actions effectively in collaboration with humans. We introduce the Robot Autonomous Motion (RoAM) video dataset, which is collected with a custom-made turtlebot3 Burger robot in a variety of indoor environments recording various human motions from the robot's ego-vision. The dataset also includes synchronized records of the LiDAR scan and all control actions taken by the robot as it navigates around static and moving human agents. The unique dataset provides an opportunity to develop and benchmark new visual prediction frameworks that can predict future image frames based on the action taken by the recording agent in partially observable scenarios or cases where the imaging sensor is mounted on a moving platform. We have benchmarked the dataset on our novel deep visual prediction framework called ACPNet where the approximated future image frames are also conditioned on action taken by the robot and demonstrated its potential for incorporating robot dynamics into the video prediction paradigm for mobile robotics and autonomous navigation research.

翻訳日:2023-06-29 16:13:41 公開日:2023-06-28

# spotem:エピソディックメモリのための効率的なビデオ検索

SpotEM: Efficient Video Search for Episodic Memory ( http://arxiv.org/abs/2306.15850v1 )

ライセンス: Link先を確認

Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

(参考訳) エピソードメモリ(EM)の目標は、自然言語の問い合わせに答えるために、長いエゴセントリックなビデオを検索することである(例えば、私は財布をどこに置き去りにしたのか? 既存のemメソッドは、ビデオの至るところで見られるよう、高価な固定長クリップ機能を徹底的に抽出している。本研究では,高い精度を維持しつつ,与えられたEM手法の効率性を実現する手法であるSpotEMを提案する。 SpotEMは3つの重要なアイデアから成り立っている。 1) 言語クエリで条件付き検索を行うための有望なビデオ領域を特定することを学習する新規クリップセレクタ 2) 部屋,オブジェクト,および見るべき場所を示すインタラクションのコンテキストをキャプチャする,低コストでセマンティックなインデックス化機能。 3)クリップセレクタとemモデルのエンドツーエンド合同トレーニングから生じる最適化問題に対処する蒸留損失。 Ego4D EM Natural Language Queriesベンチマークによる200時間以上のビデオと3つの異なるEMモデルによる実験は、我々のアプローチの有効性を示している。プロジェクトページ: https://vision.cs.utexas.edu/projects/spotem

The goal in episodic memory (EM) is to search a long egocentric video to answer a natural language query (e.g., "where did I leave my purse?"). Existing EM methods exhaustively extract expensive fixed-length clip features to look everywhere in the video for the answer, which is infeasible for long wearable-camera videos that span hours or even days. We propose SpotEM, an approach to achieve efficiency for a given EM method while maintaining good accuracy. SpotEM consists of three key ideas: 1) a novel clip selector that learns to identify promising video regions to search conditioned on the language query; 2) a set of low-cost semantic indexing features that capture the context of rooms, objects, and interactions that suggest where to look; and 3) distillation losses that address the optimization issues arising from end-to-end joint training of the clip selector and EM model. Our experiments on 200+ hours of video from the Ego4D EM Natural Language Queries benchmark and three different EM models demonstrate the effectiveness of our approach: computing only 10% - 25% of the clip features, we preserve 84% - 97% of the original EM model's accuracy. Project page: https://vision.cs.utexas.edu/projects/spotem

翻訳日:2023-06-29 16:13:21 公開日:2023-06-28

# 非置換SGDの順序付け

Ordering for Non-Replacement SGD ( http://arxiv.org/abs/2306.15848v1 )

ライセンス: Link先を確認

Yuetong Xu and Baharan Mirzasoleiman

(参考訳) 実行時間の削減と機械学習の効率向上のための1つのアプローチは、使用される最適化アルゴリズムの収束率を低減することである。 Shufflingは機械学習で広く使われているアルゴリズム技術だが、近年は理論上のみ注目を集め始めた。ランダムシャッフルと漸進勾配降下のために異なる収束速度が発達し、アルゴリズムの非置換形式に対する収束率を改善することができる順序付けを求める。最適イテレートと電流イテレートの間の距離の既存の境界に基づいて、エポックの開始時の勾配に依存する上界を導出する。境界解析により、強い凸関数と凸関数のステップサイズを一定かつ小さくするための最適順序付けを開発することができる。さらに, 合成実験と実データを用いた実験を行い, 実験結果の検証を行った。さらに、注文とミニバッチを組み合わせることで、より複雑なニューラルネットワークに適用し、有望な結果を示すことができます。

One approach for reducing run time and improving efficiency of machine learning is to reduce the convergence rate of the optimization algorithm used. Shuffling is an algorithm technique that is widely used in machine learning, but it only started to gain attention theoretically in recent years. With different convergence rates developed for random shuffling and incremental gradient descent, we seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm. Based on existing bounds of the distance between the optimal and current iterate, we derive an upper bound that is dependent on the gradients at the beginning of the epoch. Through analysis of the bound, we are able to develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions. We further test and verify our results through experiments on synthesis and real data sets. In addition, we are able to combine the ordering with mini-batch and further apply it to more complex neural networks, which show promising results.

翻訳日:2023-06-29 16:13:00 公開日:2023-06-28

# asymptotic-preserving convolutional deeponets : 多スケール線形輸送方程式の拡散挙動を捉える

Asymptotic-Preserving Convolutional DeepONets Capture the Diffusive Behavior of the Multiscale Linear Transport Equations ( http://arxiv.org/abs/2306.15891v1 )

ライセンス: Link先を確認

Keke Wu and Xiong-bin Yan and Shi Jin and Zheng Ma

(参考訳) 本稿では,マルチスケールの時間依存線形輸送問題に対処するために設計された,漸近保存型畳み込み型深層作用素ネットワーク (apcons) の2つのタイプを提案する。 MLPを改良したバニラ物理インフォームドディープノネットは,所望のマクロな挙動を維持する不安定性を示す可能性がある。したがって、漸近保存損失関数の利用が必要である。拡散方程式における熱核からインスピレーションを得たConvolutional Deep Operator Networksという新しいアーキテクチャを提案し,各フィルタ層におけるプールおよびアクティベーション操作とともに,グローバルな熱カーネルの代わりに複数の局所畳み込み演算を用いる。我々のAPCON法は, グリッドサイズに依存しないパラメータ数を持ち, 線形輸送問題の拡散挙動を捉えることができる。最後に,本手法の有効性をいくつかの数値例を通して検証する。

In this paper, we introduce two types of novel Asymptotic-Preserving Convolutional Deep Operator Networks (APCONs) designed to address the multiscale time-dependent linear transport problem. We observe that the vanilla physics-informed DeepONets with modified MLP may exhibit instability in maintaining the desired limiting macroscopic behavior. Therefore, this necessitates the utilization of an asymptotic-preserving loss function. Drawing inspiration from the heat kernel in the diffusion equation, we propose a new architecture called Convolutional Deep Operator Networks, which employ multiple local convolution operations instead of a global heat kernel, along with pooling and activation operations in each filter layer. Our APCON methods possess a parameter count that is independent of the grid size and are capable of capturing the diffusive behavior of the linear transport problem. Finally, we validate the effectiveness of our methods through several numerical examples.

翻訳日:2023-06-29 16:06:47 公開日:2023-06-28

# 反応・再合成予測のための深層学習の統一的展望:現状と今後の課題

A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges ( http://arxiv.org/abs/2306.15890v1 )

ライセンス: Link先を確認

Ziqiao Meng, Peilin Zhao, Yang Yu, Irwin King

(参考訳) 反応と再合成予測は、最近機械学習と薬物発見コミュニティから注目を集めている計算化学の基本的なタスクである。これらの問題に取り組むために、さまざまなディープラーニングアプローチが提案されている。本研究では,反応・再合成予測のための高度なディープラーニングモデルに関する包括的調査を行う。我々は最先端アプローチの設計機構,強み,弱みを要約する。次に、現在のソリューションの限界と、問題自体のオープンな課題について論じる。最後に,今後の研究を促進するための有望な方向性を示す。本研究は,反応の統一的理解と再合成予測を目的とした,初めての総合的かつ体系的な調査である。

Reaction and retrosynthesis prediction are fundamental tasks in computational chemistry that have recently garnered attention from both the machine learning and drug discovery communities. Various deep learning approaches have been proposed to tackle these problems, and some have achieved initial success. In this survey, we conduct a comprehensive investigation of advanced deep learning-based models for reaction and retrosynthesis prediction. We summarize the design mechanisms, strengths, and weaknesses of state-of-the-art approaches. Then, we discuss the limitations of current solutions and open challenges in the problem itself. Finally, we present promising directions to facilitate future research. To our knowledge, this paper is the first comprehensive and systematic survey that seeks to provide a unified understanding of reaction and retrosynthesis prediction.

翻訳日:2023-06-29 16:06:30 公開日:2023-06-28

# ハイプを超えて: GPT3.5の性能, 信頼性, 臨床適合性を評価する

Beyond the Hype: Assessing the Performance, Trustworthiness, and Clinical Suitability of GPT3.5 ( http://arxiv.org/abs/2306.15887v1 )

ライセンス: Link先を確認

Salmonn Talebi, Elizabeth Tong and Mohammad R. K. Mofrad

(参考訳) 医療における大規模言語モデル(LLMs)の使用は普及しているが,臨床現場での実用性や安全性は十分に評価されていない。 LLMにとって、医療環境や信頼性、安全性といった高度な環境が重要な問題である。そこで本研究では,医療画像プロトコル割り当てのためのgpt3.5モデルの性能と信頼性を評価する手法を提案する。細調整されたBERTモデルと放射線技師を比較した。また,決定過程を評価するため,GPT3.5の出力を放射線学者にレビューする。評価データセットは、頭部全体にわたる11のイメージングプロトコルクラスにまたがる4,700人の医師からなる。以上の結果から,GPT3.5はBERTと放射線科医に遅れていることが示唆された。しかし GPT3.5 は BERT よりも優れており、その決定を説明し、関連する単語の指標を検出し、モデルの校正を行う。さらに, 誤分類に対する GPT3.5 の説明を解析することにより, 安全性と臨床応用への適合性を高めるために解決すべき系統的誤りを明らかにする。

The use of large language models (LLMs) in healthcare is gaining popularity, but their practicality and safety in clinical settings have not been thoroughly assessed. In high-stakes environments like medical settings, trust and safety are critical issues for LLMs. To address these concerns, we present an approach to evaluate the performance and trustworthiness of a GPT3.5 model for medical image protocol assignment. We compare it with a fine-tuned BERT model and a radiologist. In addition, we have a radiologist review the GPT3.5 output to evaluate its decision-making process. Our evaluation dataset consists of 4,700 physician entries across 11 imaging protocol classes spanning the entire head. Our findings suggest that the GPT3.5 performance falls behind BERT and a radiologist. However, GPT3.5 outperforms BERT in its ability to explain its decision, detect relevant word indicators, and model calibration. Furthermore, by analyzing the explanations of GPT3.5 for misclassifications, we reveal systematic errors that need to be resolved to enhance its safety and suitability for clinical use.

翻訳日:2023-06-29 16:06:19 公開日:2023-06-28

# 特徴表現に基づく逐次的注意源同定

Sequential Attention Source Identification Based on Feature Representation ( http://arxiv.org/abs/2306.15886v1 )

ライセンス: Link先を確認

Dongpeng Hou, Zhen Wang, Chao Gao, Xuelong Li

(参考訳) スナップショット観測に基づくソースローカライゼーションは、アクセシビリティと低コストのために広く研究されている。しかし, 既存手法におけるユーザ間のインタラクションは, 時間変化による感染シナリオでは対処できない。これらの手法は異種相互作用のシナリオにおいて精度が低下する。そこで本研究では,インダクティブ・ラーニング・アイデアに基づく時間系列に基づくグラフ注意源同定(tgasi)と呼ばれる,シーケンスからシーケンスへの局所化手法を提案する。より具体的には、エンコーダは2人のユーザ間の影響確率を推定して複数の特徴を生成し、デコーダは設計した時間的注意機構により異なるタイムスタンプにおける予測ソースの重要性を区別する。ただし、インダクティブラーニングのアイデアは、TGASIが他の事前知識を知らずに新しいシナリオのソースを検出できることを保証するもので、TGASIのスケーラビリティを証明している点には注意が必要だ。 SOTA法による総合的な実験は、TGASIの異なるシナリオにおける高い検出性能とスケーラビリティを示す。

Snapshot observation based source localization has been widely studied due to its accessibility and low cost. However, the interaction of users in existing methods does not be addressed in time-varying infection scenarios. So these methods have a decreased accuracy in heterogeneous interaction scenarios. To solve this critical issue, this paper proposes a sequence-to-sequence based localization framework called Temporal-sequence based Graph Attention Source Identification (TGASI) based on an inductive learning idea. More specifically, the encoder focuses on generating multiple features by estimating the influence probability between two users, and the decoder distinguishes the importance of prediction sources in different timestamps by a designed temporal attention mechanism. It's worth mentioning that the inductive learning idea ensures that TGASI can detect the sources in new scenarios without knowing other prior knowledge, which proves the scalability of TGASI. Comprehensive experiments with the SOTA methods demonstrate the higher detection performance and scalability in different scenarios of TGASI.

翻訳日:2023-06-29 16:06:03 公開日:2023-06-28

# 真のフレア除去に向けて - 包括的なパイプラインと新しいベンチマーク

Toward Real Flare Removal: A Comprehensive Pipeline and A New Benchmark ( http://arxiv.org/abs/2306.15884v1 )

ライセンス: Link先を確認

Zheyan Jin, Shiqi Chen, Huajun Feng, Zhihai Xu, Yueting Chen

(参考訳) 照明不足のシーンでの撮影では、複雑な光源の存在は、強度、スペクトル、反射、収差などの強いフレアのアーチファクトを画像に残すことが多い。画質だけでなく、下流のビジュアルアプリケーションの性能にも影響を及ぼす。したがって、レンズフレアとゴーストの除去は、特に低照度環境での課題である。しかし, 既存のフレア除去法は主に, 散乱フレアのカテゴリーが特異であり, 反射するゴーストが利用できない, 不適切なシミュレーションや実世界の捕獲に限られている。したがって,フレア除去のデータセットの構築には包括的劣化手順が不可欠である。理論的解析と実世界の評価に基づいて,フレア劣化を伴うデータペアを生成する手法を提案する。手順は包括的であり、散在するフレアの類似性と反射するゴーストの対称効果を実現する。さらに,散乱と反射フレアの影響をそれぞれ処理する実写パイプラインを構築し,エンドツーエンドの手法で直接データを生成する。実験の結果,提案手法は既存のフレアデータセットに多様性を付加し,フレアデータペアの包括的なマッピング手順を構築する。また,本手法では,フレア画像の良好な復元を実現するためにデータ駆動モデルを構築し,実際の撮影に基づく評価システムを提案する。

Photographing in the under-illuminated scenes, the presence of complex light sources often leave strong flare artifacts in images, where the intensity, the spectrum, the reflection, and the aberration altogether contribute the deterioration. Besides the image quality, it also influence the performance of down-stream visual applications. Thus, removing the lens flare and ghosts is a challenge issue especially in low-light environment. However, existing methods for flare removal mainly restricted to the problems of inadequate simulation and real-world capture, where the categories of scattered flares are singular and the reflected ghosts are unavailable. Therefore, a comprehensive deterioration procedure is crucial for constructing the dataset of flare removal. Based on the theoretical analysis and real-world evaluation, we propose a well-developed methodology for generating the data-pairs with flare deterioration. The procedure is comprehensive, where the similarity of scattered flares and the symmetric effect of reflected ghosts are realized. Moreover, we also construct a real-shot pipeline that respectively processes the effects of scattering and reflective flares, aiming to directly generate the data for end-to-end methods. Experimental results show that the proposed methodology add diversity to the existing flare datasets and construct a comprehensive mapping procedure for flare data pairs. And our method facilities the data-driven model to realize better restoration in flare images and proposes a better evaluation system based on real shots, resulting promote progress in the area of real flare removal.

翻訳日:2023-06-29 16:05:47 公開日:2023-06-28

# レコメンデーションシステムにおけるブロックワイズ機能相互作用

Blockwise Feature Interaction in Recommendation Systems ( http://arxiv.org/abs/2306.15881v1 )

ライセンス: Link先を確認

Weijie Zhao, Ping Li

(参考訳) 機能インタラクションは、ユーザの好みとアイテム特性の複雑な関係を捉えるため、レコメンデーションシステムにおいて重要な役割を果たす。ディープ・アンド・クロス・ネットワーク(DCNv2)のような既存の手法は、クロス層演算のために高い計算要求に悩まされる可能性がある。本稿では,この問題を軽減するためにbfi(blockwise feature interaction)と呼ばれる新しい手法を提案する。機能相互作用プロセスを小さなブロックに分割することで、メモリフットプリントと計算負荷の両方を大幅に削減できる。 BFIの4つの変種(それぞれP, Q, T, S)が開発され、経験的に比較されている。実験の結果,提案アルゴリズムは標準のDCNv2に比べて精度が良く,計算オーバーヘッドやパラメータ数を大幅に削減できることがわかった。本稿では,機能相互作用効率向上のための実用的なソリューションを提供することで,効率的なレコメンデーションシステムの開発に寄与する。

Feature interactions can play a crucial role in recommendation systems as they capture complex relationships between user preferences and item characteristics. Existing methods such as Deep & Cross Network (DCNv2) may suffer from high computational requirements due to their cross-layer operations. In this paper, we propose a novel approach called blockwise feature interaction (BFI) to help alleviate this issue. By partitioning the feature interaction process into smaller blocks, we can significantly reduce both the memory footprint and the computational burden. Four variants (denoted by P, Q, T, S, respectively) of BFI have been developed and empirically compared. Our experimental results demonstrate that the proposed algorithms achieves close accuracy compared to the standard DCNv2, while greatly reducing the computational overhead and the number of parameters. This paper contributes to the development of efficient recommendation systems by providing a practical solution for improving feature interaction efficiency.

翻訳日:2023-06-29 16:05:23 公開日:2023-06-28

# オープンボキャブラリ学習に向けて:調査

Towards Open Vocabulary Learning: A Survey ( http://arxiv.org/abs/2306.15880v1 )

ライセンス: Link先を確認

Jianzong Wu, Xiangtai Li, Shilin Xu. Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, and Dacheng Tao

(参考訳) 視覚シーン理解の分野では、ディープニューラルネットワークはセグメンテーション、トラッキング、検出など、さまざまなコアタスクにおいて驚くべき進歩を遂げている。しかし、ほとんどのアプローチはクローズセットの仮定に基づいており、トレーニングセットに存在する事前定義されたカテゴリのみを識別できる。近年、視覚言語事前学習の急速な進歩により、オープンな語彙設定が提案されている。これらの新しいアプローチは、注釈付きラベル空間を超えてカテゴリを見つけ、認識することを目指している。オープン語彙のアプローチは、弱教師付きおよびゼロショット設定に比べて、より一般的で実用的で効果的である。本稿では,その分野における最近の発展を要約し,分析し,オープンな語彙学習の徹底的なレビューを行う。特に,ゼロショット学習,オープンセット認識,分散検出といった関連する概念と比較することから始める。次に, セグメンテーションと検出に関して, ロングテール問題, 少数ショット設定, ゼロショット設定など, 密接に関連するタスクをいくつか検討する。本研究は,まず,事前知識としてクローズセットにおける検出とセグメンテーションの基本的な知識を提示する。次に,オープン語彙学習を用いた様々なシナリオについて検討し,共通設計要素とコアアイデアを同定する。次に、一般的なデータセットとベンチマークにおける最近の検出とセグメンテーションのアプローチを比較した。最後に,今後の研究方向性に関する洞察,課題,議論をまとめる。私たちの知る限り、オープンな語彙学習に関する総合的な文献レビューはこれが初めてである。関連する作業をhttps://github.com/jianzongwu/Awesome-Open-Vocabulary.comで追跡しています。

In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective compared to weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Then, we review several closely related tasks in the case of segmentation and detection, including long-tail problems, few-shot, and zero-shot settings. For the method survey, we first present the basic knowledge of detection and segmentation in close-set as the preliminary knowledge. Next, we examine various scenarios in which open vocabulary learning is used, identifying common design elements and core ideas. Then, we compare the recent detection and segmentation approaches in commonly used datasets and benchmarks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To our knowledge, this is the first comprehensive literature review of open vocabulary learning. We keep tracing related works at https://github.com/jianzongwu/Awesome-Open-Vocabulary.

翻訳日:2023-06-29 16:05:08 公開日:2023-06-28

# ハイブリッド蒸留:マスクオートエンコーダとコントラスト学習者との接続

Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners ( http://arxiv.org/abs/2306.15876v1 )

ライセンス: Link先を確認

Bowen Shi, Xiaopeng Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian

(参考訳) 表現学習は従来の教師付きトレーニングからコントラスト学習(CL)やマスケッド画像モデリング(MIM)へと進化してきた。従来の研究では、CLや教師付き事前訓練のエクササイズといった特定のシナリオにおいて、より長い範囲のグローバルパターンを捕捉し、より優れた特徴識別を可能にするとともに、MIMはすべてのトランスフォーマー層により局所的で多様な注意を向けることが可能であった。本稿では,その強みを組み合わせたモデルを得る方法について検討する。まず,前回の特徴蒸留法とマスクの特徴再現法について検討し,その限界を明らかにした。多様性の増大は、主に非対称な設計に由来するが、これらの設計は結果的に識別能力を損なう可能性がある。識別と多様性の両立を図るため,教師/CL教師とMIM教師の双方を併用し,学生モデルを指導する簡易かつ効果的なハイブリッド蒸留戦略を提案する。 Hybrid DistillはMIM教師のトークン関係を模倣し、注意崩壊を緩和し、教師/CL教師の特徴マップを蒸留して差別を可能にする。さらに、プログレッシブな冗長なトークンマスキング戦略を用いて蒸留コストを削減し、局所最適状態に陥ることを避ける。実験の結果、ハイブリッド蒸留は異なるベンチマークで優れた性能を達成できることが証明された。

Representation learning has been evolving from traditional supervised training to Contrastive Learning (CL) and Masked Image Modeling (MIM). Previous works have demonstrated their pros and cons in specific scenarios, i.e., CL and supervised pre-training excel at capturing longer-range global patterns and enabling better feature discrimination, while MIM can introduce more local and diverse attention across all transformer layers. In this paper, we explore how to obtain a model that combines their strengths. We start by examining previous feature distillation and mask feature reconstruction methods and identify their limitations. We find that their increasing diversity mainly derives from the asymmetric designs, but these designs may in turn compromise the discrimination ability. In order to better obtain both discrimination and diversity, we propose a simple but effective Hybrid Distillation strategy, which utilizes both the supervised/CL teacher and the MIM teacher to jointly guide the student model. Hybrid Distill imitates the token relations of the MIM teacher to alleviate attention collapse, as well as distills the feature maps of the supervised/CL teacher to enable discrimination. Furthermore, a progressive redundant token masking strategy is also utilized to reduce the distilling costs and avoid falling into local optima. Experiment results prove that Hybrid Distill can achieve superior performance on different benchmarks.

翻訳日:2023-06-29 16:04:47 公開日:2023-06-28

# fake the real: 音声変換によるディープ音声分類へのバックドア攻撃

Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion ( http://arxiv.org/abs/2306.15875v1 )

ライセンス: Link先を確認

Zhe Ye, Terui Mao, Li Dong, Diqun Yan

(参考訳) 深い音声分類は大きな成功を収め、多くの実世界の応用の出現を大いに促進した。しかし、バックドア攻撃は、特に信頼できるサードパーティプラットフォームにおいて、攻撃者が設定した事前定義されたトリガーがバックドアをアクティベートできるため、新たなセキュリティ脅威をもたらす。既存の音声バックドア攻撃のトリガーのほとんどはサンプルに依存しないものであり、たとえトリガーが目立たずに設計されているとしても、音は聞こえる。本研究は,音声変換に基づくサンプル特異的トリガを用いたバックドア攻撃を探索する。具体的には,事前学習した音声変換モデルを用いてトリガーを生成し,有毒なサンプルが追加の可聴音を発生させないようにする。 2つの音声分類タスクに対する大規模な実験により,攻撃の有効性が示された。さらに,提案するバックドアを活性化する特定のシナリオを分析し,微調整に対する抵抗を検証した。

Deep speech classification has achieved tremendous success and greatly promoted the emergence of many real-world applications. However, backdoor attacks present a new security threat to it, particularly with untrustworthy third-party platforms, as pre-defined triggers set by the attacker can activate the backdoor. Most of the triggers in existing speech backdoor attacks are sample-agnostic, and even if the triggers are designed to be unnoticeable, they can still be audible. This work explores a backdoor attack that utilizes sample-specific triggers based on voice conversion. Specifically, we adopt a pre-trained voice conversion model to generate the trigger, ensuring that the poisoned samples does not introduce any additional audible noise. Extensive experiments on two speech classification tasks demonstrate the effectiveness of our attack. Furthermore, we analyzed the specific scenarios that activated the proposed backdoor and verified its resistance against fine-tuning.

翻訳日:2023-06-29 16:04:22 公開日:2023-06-28

# 変分ベイズ推論を用いた限定データからの確率偏微分方程式の発見

Discovering stochastic partial differential equations from limited data using variational Bayes inference ( http://arxiv.org/abs/2306.15873v1 )

ライセンス: Link先を確認

Yogesh Chandrakant Mathpati and Tapas Tripura and Rajdip Nayek and Souvik Chakraborty

(参考訳) データから確率的部分微分方程式(SPDE)を発見するための新しい枠組みを提案する。提案手法は確率計算、変分ベイズ理論、スパース学習の概念を組み合わせたものである。本研究では,SPDEのドリフト・拡散項を状態応答の観点から表現するための拡張クラマース・モヤル展開法を提案し,Spike-and-Slab プリエントをスパースラーニング技術を用いて,基礎となるSPDEを効率的に正確に発見する。提案手法は3つの標準SPDEに適用されている。 a)確率的熱方程式 (b)確率アレン・カーン方程式、及び (c)確率ナグモ方程式。提案手法は,SPDEを限られたデータで正確に識別できることを示す。これはデータからSPDEを発見する最初の試みであり、気候モデリング、財務予測、化学動力学などの様々な科学的応用に重要な意味を持つ。

We propose a novel framework for discovering Stochastic Partial Differential Equations (SPDEs) from data. The proposed approach combines the concepts of stochastic calculus, variational Bayes theory, and sparse learning. We propose the extended Kramers-Moyal expansion to express the drift and diffusion terms of an SPDE in terms of state responses and use Spike-and-Slab priors with sparse learning techniques to efficiently and accurately discover the underlying SPDEs. The proposed approach has been applied to three canonical SPDEs, (a) stochastic heat equation, (b) stochastic Allen-Cahn equation, and (c) stochastic Nagumo equation. Our results demonstrate that the proposed approach can accurately identify the underlying SPDEs with limited data. This is the first attempt at discovering SPDEs from data, and it has significant implications for various scientific applications, such as climate modeling, financial forecasting, and chemical kinetics.

翻訳日:2023-06-29 16:04:07 公開日:2023-06-28

# 2023年Waymo Open Sim Agents Challengeの第2位

The 2nd Place Solution for 2023 Waymo Open Sim Agents Challenge ( http://arxiv.org/abs/2306.15914v1 )

ライセンス: Link先を確認

Cheng Qian, Di Xiu, Minghao Tian

(参考訳) 本稿では,2023年のWaymo Open Sim Agents Challenge(WOSAC)[4]の2位となるソリューションを提示する。本稿では,MTR(Motion Transformer)[5] という,マルチエージェントの動作をシミュレーションするシンプルな自己回帰手法を提案する。我々の提出したMTR+++は2023 WOSACのRealism Metaメトリックで0.4697を達成しています。また、MTR_Eと命名されたMTRに基づく改良モデルも提案されており、スコアは0.4911で、2023年6月25日現在、WOSACのリーダーボードで3位である。

In this technical report, we present the 2nd place solution of 2023 Waymo Open Sim Agents Challenge (WOSAC)[4]. We propose a simple yet effective autoregressive method for simulating multi-agent behaviors, which is built upon a well-known multimodal motion forecasting framework called Motion Transformer (MTR)[5] with postprocessing algorithms applied. Our submission named MTR+++ achieves 0.4697 on the Realism Meta metric in 2023 WOSAC. Besides, a modified model based on MTR named MTR_E is proposed after the challenge, which has a better score 0.4911 and is ranked the 3rd on the leaderboard of WOSAC as of June 25, 2023.

翻訳日:2023-06-29 15:56:49 公開日:2023-06-28

# DCT:大規模離散行動空間を用いた強化学習のためのアクション埋め込みのデュアルチャネルトレーニング

DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces ( http://arxiv.org/abs/2306.15913v1 )

ライセンス: Link先を確認

Pranavi Pathakota and Hardik Meisheri and Harshad Khadilkar

(参考訳) 大規模な離散的行動空間を一般化しながら強固なポリシーを学ぶ能力は、知的システム、特に次元の呪いに直面する雑音環境にとって、オープンな課題である。本稿では,アクション埋め込みを効率的に学習するための新しい枠組みを提案する。本稿では、動作再構成と状態予測精度のバランスをとる2つのチャネル損失を持つ動作埋め込みのためのエンコーダデコーダアーキテクチャについて述べる。我々は、トレーニングされたデコーダと、埋め込み空間でアクションを生成する標準強化学習アルゴリズムを併用する。私たちのアーキテクチャは、4000以上の離散的なノイズアクションを持つ2d maze環境と、現実世界のeコマーストランザクションデータを使用するプロダクトレコメンデーションタスクという、2つの異なる環境での2つの競合ベースラインよりも優れています。経験的な結果は、モデルがよりクリーンなアクション埋め込みをもたらすことを示し、改善された表現はより早い収束でより良いポリシーを学ぶのに役立つ。

The ability to learn robust policies while generalizing over large discrete action spaces is an open challenge for intelligent systems, especially in noisy environments that face the curse of dimensionality. In this paper, we present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action as well as to predict the expected future state. We describe an encoder-decoder architecture for action embeddings with a dual channel loss that balances between action reconstruction and state prediction accuracy. We use the trained decoder in conjunction with a standard reinforcement learning algorithm that produces actions in the embedding space. Our architecture is able to outperform two competitive baselines in two diverse environments: a 2D maze environment with more than 4000 discrete noisy actions, and a product recommendation task that uses real-world e-commerce transaction data. Empirical results show that the model results in cleaner action embeddings, and the improved representations help learn better policies with earlier convergence.

翻訳日:2023-06-29 15:56:37 公開日:2023-06-28

# 食品インスタンスセグメンテーションにおけるインクリメンタル学習

Incremental Learning on Food Instance Segmentation ( http://arxiv.org/abs/2306.15910v1 )

ライセンス: Link先を確認

Huu-Thanh Nguyen, Yu Cao, Chong-Wah Ngo, Wing-Kwong Chan

(参考訳) 食品インスタンスのセグメンテーションは、食品画像中の料理のサービスサイズを推定するために不可欠である。最近のセグメンテーションの最先端技術は、印象的なセグメンテーション品質と高速計算を備えたディープラーニングネットワークである。それでも彼らはデータに飢えており、アノテーションには高価です。本稿では,データラベリング予算に制限のあるモデル性能を最適化するインクリメンタル学習フレームワークを提案する。フレームワークのパワーは、最新のトレーニングされたインスタンスセグメンテーションモデルに対して、非ラベルのサンプルがいかに困難であるかを予測する、新しい困難評価モデルである。データ収集手順はいくつかの段階に分けられ、それぞれに新しいサンプルパッケージが収集される。このフレームワークは、ラベル付け予算を最も難しいサンプルに割り当てる。評価モデルから一定の資格を満たす未ラベルのサンプルを用いて擬似ラベルを生成する。最終的には、手動ラベルと擬似ラベルがトレーニングデータに送られ、インスタンスセグメンテーションモデルが改善される。提案するフレームワークは,4つの大規模食品データセットにおいて,現在のインクリメンタルラーニングベンチマークより優れ,完全注釈付きサンプルでトレーニングしたモデルとの競合性能を実現している。

Food instance segmentation is essential to estimate the serving size of dishes in a food image. The recent cutting-edge techniques for instance segmentation are deep learning networks with impressive segmentation quality and fast computation. Nonetheless, they are hungry for data and expensive for annotation. This paper proposes an incremental learning framework to optimize the model performance given a limited data labelling budget. The power of the framework is a novel difficulty assessment model, which forecasts how challenging an unlabelled sample is to the latest trained instance segmentation model. The data collection procedure is divided into several stages, each in which a new sample package is collected. The framework allocates the labelling budget to the most difficult samples. The unlabelled samples that meet a certain qualification from the assessment model are used to generate pseudo-labels. Eventually, the manual labels and pseudo-labels are sent to the training data to improve the instance segmentation model. On four large-scale food datasets, our proposed framework outperforms current incremental learning benchmarks and achieves competitive performance with the model trained on fully annotated samples.

翻訳日:2023-06-29 15:56:21 公開日:2023-06-28

# RL$^3$: RLによるメタ強化学習をRL$^2$内で促進する

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ ( http://arxiv.org/abs/2306.15909v1 )

ライセンス: Link先を確認

Abhinav Bhatia, Samer B. Nashed, Shlomo Zilberstein

(参考訳) RL$^2$のようなメタ強化学習(meta-RL)手法は、与えられたタスク分布に合わせてデータ効率のよいRLアルゴリズムを学習するための有望なアプローチとして登場した。しかしながら、これらのRLアルゴリズムは、値関数のような一般的なRLコンポーネントにまとめるのではなく、繰り返しニューラルネットワークを使用して経験のシーケンスを処理するため、長い水平タスクや分配タスクに苦労する。さらに、トランスフォーマーでさえ、トレーニングや推論コストが禁じられる前に効率的に推論できる履歴の長さに実用的な制限がある。対照的に、従来のRLアルゴリズムはドメイン知識を活用せず、より多くのデータが利用可能になるにつれて最適なポリシーに収束するので、データ非効率である。本稿では,従来のRLとメタRLを組み合わせたハイブリッド手法であるRL$^3$を提案する。 rl$^3$ は rl$^2$ と比較して長期ホリゾン・アウト・オブ・ディストリビューション・タスクでより大きな累積報酬を得られるが、短期的には後者の効率は維持される。様々な短期的、長期的、複雑な依存関係を示すメタRL文献から、カスタムドメインとベンチマークドメインの両方で実験を行う。

Meta reinforcement learning (meta-RL) methods such as RL$^2$ have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, these RL algorithms struggle with long-horizon tasks and out-of-distribution tasks since they rely on recurrent neural networks to process the sequence of experiences instead of summarizing them into general RL components such as value functions. Moreover, even transformers have a practical limit to the length of histories they can efficiently reason about before training and inference costs become prohibitive. In contrast, traditional RL algorithms are data-inefficient since they do not leverage domain knowledge, but they do converge to an optimal policy as more data becomes available. In this paper, we propose RL$^3$, a principled hybrid approach that combines traditional RL and meta-RL by incorporating task-specific action-values learned through traditional RL as an input to the meta-RL neural network. We show that RL$^3$ earns greater cumulative reward on long-horizon and out-of-distribution tasks compared to RL$^2$, while maintaining the efficiency of the latter in the short term. Experiments are conducted on both custom and benchmark discrete domains from the meta-RL literature that exhibit a range of short-term, long-term, and complex dependencies.

翻訳日:2023-06-29 15:56:03 公開日:2023-06-28

# 南フロリダにおける水ステージ予測のための深層学習モデル

Deep Learning Models for Water Stage Predictions in South Florida ( http://arxiv.org/abs/2306.15907v1 )

ライセンス: Link先を確認

Jimeng Shi, Zeda Yin, Rukmangadh Myana, Khandker Ishtiaq, Anupama John, Jayantha Obeysekera, Arturo Leon, Giri Narasimhan

(参考訳) 河川システムにおける水位シミュレーションと予測は,洪水警報,水理操作,洪水軽減に不可欠である。工学分野では、HEC-RAS、MIKE、SWMMといったツールを使用して、詳細な物理に基づく水理・水理計算モデルを構築し、流域全体をシミュレートし、システム内の任意の時点での水ステージを予測する。しかし、これらの物理学に基づくモデルは、特に大きな流域やより長いシミュレーションのために、計算集約的である。この問題を克服するために,我々は複数の深層学習モデル(DL)を代理モデルとして使用し,水ステージを迅速に予測する。南フロリダのマイアミ川の下流は,本論文の事例研究として選択されている。データセットは2010年1月1日から2020年12月31日まで、南フロリダ水管理地区(SFWMD)のDBHYDROデータベースからダウンロードされる。大規模な実験により、DLモデルの性能は極度の降水条件(熱帯嵐)においても物理学に基づくモデルの性能に匹敵することが示された。さらに,予測長の増加に伴うDLモデルの予測精度の低下について検討した。今後の水ステージを予測するため,我々のDLモデルでは,近年の河川系の測定変数と,近い将来に確実に予測できる共変量を用いている。要約すると、ディープラーニングモデルは、物理ベースのモデルと比較して、少なくとも1000倍のスピードアップで、同等またはより良いエラー率を達成する。

Simulating and predicting water levels in river systems is essential for flood warnings, hydraulic operations, and flood mitigations. In the engineering field, tools such as HEC-RAS, MIKE, and SWMM are used to build detailed physics-based hydrological and hydraulic computational models to simulate the entire watershed, thereby predicting the water stage at any point in the system. However, these physics-based models are computationally intensive, especially for large watersheds and for longer simulations. To overcome this problem, we train several deep learning (DL) models for use as surrogate models to rapidly predict the water stage. The downstream stage of the Miami River in South Florida is chosen as a case study for this paper. The dataset is from January 1, 2010, to December 31, 2020, downloaded from the DBHYDRO database of the South Florida Water Management District (SFWMD). Extensive experiments show that the performance of the DL models is comparable to that of the physics-based models, even during extreme precipitation conditions (i.e., tropical storms). Furthermore, we study the decline in prediction accuracy of the DL models with an increase in prediction lengths. In order to predict the water stage in the future, our DL models use measured variables of the river system from the recent past as well as covariates that can be reliably predicted in the near future. In summary, the deep learning models achieve comparable or better error rates with at least 1000x speedup in comparison to the physics-based models.

翻訳日:2023-06-29 15:55:39 公開日:2023-06-28

# 協調濾過における硬質負試料の寸法独立混合

Dimension Independent Mixup for Hard Negative Sample in Collaborative Filtering ( http://arxiv.org/abs/2306.15905v1 )

ライセンス: Link先を確認

Xi Wu, Liangwei Yang, Jibing Gong, Chao Zhou, Tianyu Lin, Xiaolong Liu, Philip S. Yu

(参考訳) 協調フィルタリング(CF)は,過去のインタラクションに基づいてユーザの好みを予測する手法として広く利用されている。負のサンプリングは、暗黙のフィードバックでcfベースのモデルのトレーニングにおいて重要な役割を果たす。本稿では,既存のサンプリング手法を再検討するためのサンプリング領域に基づく新しい視点を提案する。現状のサンプリング手法は, 主にポイントワイズやラインワイズに焦点を合わせ, 柔軟性の欠如, ハードサンプリング領域の大部分を未検討のまま残している。この制限に対処するため,CFモデルを用いた最初のエリアワイドサンプリング手法であるDINS(Dimension Independent Mixup for Hard Negative Smpling)を提案する。 DINSはハード境界定義、次元独立混合、マルチホッププールの3つのモジュールから構成されている。行列分解モデルとグラフベースモデルの両方における実世界のデータセットを用いた実験により、DINSは他の負のサンプリング手法よりも優れ、その効果と優越性を確立した。本研究は,新たな視点と領域的サンプリングの導入,負サンプリングの最先端性能を実現する新たなアプローチとしてdinsを提案する。私たちの実装はPyTorchで利用可能です。

Collaborative filtering (CF) is a widely employed technique that predicts user preferences based on past interactions. Negative sampling plays a vital role in training CF-based models with implicit feedback. In this paper, we propose a novel perspective based on the sampling area to revisit existing sampling methods. We point out that current sampling methods mainly focus on Point-wise or Line-wise sampling, lacking flexibility and leaving a significant portion of the hard sampling area un-explored. To address this limitation, we propose Dimension Independent Mixup for Hard Negative Sampling (DINS), which is the first Area-wise sampling method for training CF-based models. DINS comprises three modules: Hard Boundary Definition, Dimension Independent Mixup, and Multi-hop Pooling. Experiments with real-world datasets on both matrix factorization and graph-based models demonstrate that DINS outperforms other negative sampling methods, establishing its effectiveness and superiority. Our work contributes a new perspective, introduces Area-wise sampling, and presents DINS as a novel approach that achieves state-of-the-art performance for negative sampling. Our implementations are available in PyTorch.

翻訳日:2023-06-29 15:55:17 公開日:2023-06-28

# 多様性と強み - 複数のAIの相互強化学習によるサッカーフルゲームをマスターする

Diversity is Strength: Mastering Football Full Game with Interactive Reinforcement Learning of Multiple AIs ( http://arxiv.org/abs/2306.15903v1 )

ライセンス: Link先を確認

Chenglu Sun, Shuo Shen, Sijia Xu, Weidong Zhang

(参考訳) マルチエージェント環境で強力で豊かな戦略でAIを訓練することは、Deep Reinforcement Learning(DRL)において重要な研究トピックである。 AIの強みは戦略の多様性と密接に関連しており、この関係は、強い戦略と豊かな戦略の両方でAIを訓練するためのガイドとなります。この点を証明するために、多種多様なAIを同時に訓練できる新しいDRLトレーニングフレームワークであるDiversity is Strength (DIS)を提案する。これらのAIは相互接続された履歴モデルプール構造を介してリンクされ、その能力と戦略の多様性を高める。また、モデルプールを強化し、最終的なAIを得るために最適なモデルを選択するためのモデル評価およびスクリーニングスキームを設計する。提案手法は,人的データを用いることなく,多様で汎用的で強力なAI戦略を提供する。私たちはGoogle Research Football(GRF)に基づいたAIコンペでテストを行い、5v5と11v11のトラックで優勝しました。この方法により、GRF AIは、複雑なマルチエージェント環境下で、5v5と11v11トラックの両方で、初めてハイレベルになる。行動分析により、トレーニングされたAIは豊富な戦略を持ち、アブレーション実験は、設計されたモジュールがトレーニングプロセスの恩恵を受けることを示した。

Training AI with strong and rich strategies in multi-agent environments remains an important research topic in Deep Reinforcement Learning (DRL). The AI's strength is closely related to its diversity of strategies, and this relationship can guide us to train AI with both strong and rich strategies. To prove this point, we propose Diversity is Strength (DIS), a novel DRL training framework that can simultaneously train multiple kinds of AIs. These AIs are linked through an interconnected history model pool structure, which enhances their capabilities and strategy diversities. We also design a model evaluation and screening scheme to select the best models to enrich the model pool and obtain the final AI. The proposed training method provides diverse, generalizable, and strong AI strategies without using human data. We tested our method in an AI competition based on Google Research Football (GRF) and won the 5v5 and 11v11 tracks. The method enables a GRF AI to have a high level on both 5v5 and 11v11 tracks for the first time, which are under complex multi-agent environments. The behavior analysis shows that the trained AI has rich strategies, and the ablation experiments proved that the designed modules benefit the training process.

翻訳日:2023-06-29 15:54:59 公開日:2023-06-28

# 分布外一般化のための個別及び構造グラフ情報基盤

Individual and Structural Graph Information Bottlenecks for Out-of-Distribution Generalization ( http://arxiv.org/abs/2306.15902v1 )

ライセンス: Link先を確認

Ling Yang, Jiayi Zheng, Heyuan Wang, Zhongyi Liu, Zhilin Huang, Shenda Hong, Wentao Zhang, Bin Cui

(参考訳) アウト・オブ・ディストリビューション (OOD) グラフの一般化は多くの実世界のアプリケーションにとって重要である。既存の方法は、ラベルとは無関係な入力の急激な特徴や騒々しい特徴を捨てることを無視している。さらに、主にインスタンスレベルのクラス不変グラフ学習を行い、グラフインスタンス間の構造クラス関係を利用できない。本研究は,IS-GIB(Personal and Structure Graph Information Bottlenecks)と呼ばれる統合フレームワークを用いて,これらの課題に対処する。分散シフトによるクラス急激な特徴を除去するために,入力グラフと埋め込みの相互情報を最小化することにより,無関係な情報を捨てるPersonal Graph Information Bottleneck (I-GIB)を提案する。構造内およびドメイン間相関の活用を目的として,構造グラフ情報ボトルネック(S-GIB)を提案する。具体的には、複数のドメインを持つグラフのバッチに対して、S-GIBはまずペアの入力-入力、埋め込み-埋め込み、ラベル-ラベル相関を計算する。そして、埋め込みとラベルペア間の相互情報を最大化しながら、入力グラフと埋め込みペア間の相互情報を最小化する。 S-GIBの批判的な洞察は、複数の分布シフトの下でクラス関係を維持することにより、急激な特徴を同時に排除し、高次の視点から不変な特徴を学習することである。特に、提案したI-GIBとS-GIBを統一して、補完的なフレームワークIS-GIBを形成する。ノードレベルのタスクとグラフレベルのタスクの両方で実施された大規模な実験は、IS-GIBの優れた一般化能力を一貫して示している。コードはhttps://github.com/yangling0818/graphoodで入手できる。

Out-of-distribution (OOD) graph generalization are critical for many real-world applications. Existing methods neglect to discard spurious or noisy features of inputs, which are irrelevant to the label. Besides, they mainly conduct instance-level class-invariant graph learning and fail to utilize the structural class relationships between graph instances. In this work, we endeavor to address these issues in a unified framework, dubbed Individual and Structural Graph Information Bottlenecks (IS-GIB). To remove class spurious feature caused by distribution shifts, we propose Individual Graph Information Bottleneck (I-GIB) which discards irrelevant information by minimizing the mutual information between the input graph and its embeddings. To leverage the structural intra- and inter-domain correlations, we propose Structural Graph Information Bottleneck (S-GIB). Specifically for a batch of graphs with multiple domains, S-GIB first computes the pair-wise input-input, embedding-embedding, and label-label correlations. Then it minimizes the mutual information between input graph and embedding pairs while maximizing the mutual information between embedding and label pairs. The critical insight of S-GIB is to simultaneously discard spurious features and learn invariant features from a high-order perspective by maintaining class relationships under multiple distributional shifts. Notably, we unify the proposed I-GIB and S-GIB to form our complementary framework IS-GIB. Extensive experiments conducted on both node- and graph-level tasks consistently demonstrate the superior generalization ability of IS-GIB. The code is available at https://github.com/YangLing0818/GraphOOD.

翻訳日:2023-06-29 15:54:38 公開日:2023-06-28

# 特権情報による擬似ラベル化とそのin situシーケンシング画像への応用

Pseudo-Labeling Enhanced by Privileged Information and Its Application to In Situ Sequencing Images ( http://arxiv.org/abs/2306.15898v1 )

ライセンス: Link先を確認

Marzieh Haghighi, Mario C. Cruz, Erin Weisbart, Beth A. Cimini, Avtar Singh, Julia Bauman, Maria E. Lozada, Sanam L. Kavari, James T. Neal, Paul C. Blainey, Anne E. Carpenter and Shantanu Singh

(参考訳) ラベル・スカース物体検出のための様々な戦略がコンピュータビジョン研究コミュニティによって検討されている。これらの戦略は主に、自然画像に特有の仮定に依存しており、生物学的および生物医学的な視覚領域に直接適用されない。例えば、ほとんどの半教師付き学習戦略は、信頼できる真実の情報源としてラベル付きデータの小さなセットに依存している。しかし、多くの生物学的視覚応用において、基礎的真理は未知であり、間接的な情報はノイズ推定や直交的証拠という形で利用可能である。本研究では,半教師付き物体検出(ssod)問題として,空間的トランスクリプトミクス(iss画像からバーコードを復号する)における重要な問題を考察する。提案フレームワークは,半教師付き学習フレームワークに特権情報という形で追加可能な情報ソースを組み込む。特権情報は教師の疑似ラベルに組み込まれ、教師の教師が自習するイテレーションで学習される。利用可能な特権情報はデータドメインに特化することができるが、特権情報(PLePI)によって強化された擬似ラベルの一般的な戦略を導入し、ISSイメージとCLIPが提供する余分な証拠を用いたCOCOベンチマークを用いて概念を実証した。

Various strategies for label-scarce object detection have been explored by the computer vision research community. These strategies mainly rely on assumptions that are specific to natural images and not directly applicable to the biological and biomedical vision domains. For example, most semi-supervised learning strategies rely on a small set of labeled data as a confident source of ground truth. In many biological vision applications, however, the ground truth is unknown and indirect information might be available in the form of noisy estimations or orthogonal evidence. In this work, we frame a crucial problem in spatial transcriptomics - decoding barcodes from In-Situ-Sequencing (ISS) images - as a semi-supervised object detection (SSOD) problem. Our proposed framework incorporates additional available sources of information into a semi-supervised learning framework in the form of privileged information. The privileged information is incorporated into the teacher's pseudo-labeling in a teacher-student self-training iteration. Although the available privileged information could be data domain specific, we have introduced a general strategy of pseudo-labeling enhanced by privileged information (PLePI) and exemplified the concept using ISS images, as well on the COCO benchmark using extra evidence provided by CLIP.

翻訳日:2023-06-29 15:54:12 公開日:2023-06-28

# 帰属訓練データジェネレータとしての大規模言語モデル:多様性とバイアスの物語

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias ( http://arxiv.org/abs/2306.15895v1 )

ライセンス: Link先を確認

Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang

(参考訳) 大規模言語モデル(LLM)は、最近、様々な自然言語処理(NLP)タスクのためのトレーニングデータジェネレータとして活用されている。従来の研究では、生成データを用いたモデルトレーニングのさまざまなアプローチが検討されているが、一般的には、生成されたデータの多様性を制限し、LLMの系統的バイアスを継承する、単純なクラス条件のプロンプトに依存している。そこで本研究では,多様な属性を持つプロンプト(例えば,長さやスタイルなどの属性を指定する)を用いたトレーニングデータ生成について検討する。本研究は,高い濃度と多様なドメインを持つデータセットに着目し,帰属プロンプトが,結果モデルの性能の点で単純なクラス条件プロンプトよりも優れていることを示す。 Additionally, we present a comprehensive empirical study on data generation encompassing vital aspects like bias, diversity, and efficiency, and highlight three key observations: firstly, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; secondly, attribute diversity plays a pivotal role in enhancing model performance; lastly, attributed prompts achieve the performance of simple class-conditional prompts while utilizing only 5\% of the querying cost of ChatGPT associated with the latter. 生成されたデータセットを公開し、今後の研究を促進するためにプロンプトを使用します。データとコードは \url{https://github.com/yueyu1030/AttrPrompt} で入手できる。

Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models using generated data, they generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit systematic biases of LLM. Thus, we investigate training data generation with diversely attributed prompts (e.g., specifying attributes like length and style), which have the potential to yield diverse and attributed generated data. Our investigation focuses on datasets with high cardinality and diverse domains, wherein we demonstrate that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance. Additionally, we present a comprehensive empirical study on data generation encompassing vital aspects like bias, diversity, and efficiency, and highlight three key observations: firstly, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; secondly, attribute diversity plays a pivotal role in enhancing model performance; lastly, attributed prompts achieve the performance of simple class-conditional prompts while utilizing only 5\% of the querying cost of ChatGPT associated with the latter. We release the generated dataset and used prompts to facilitate future research. The data and code will be available on \url{https://github.com/yueyu1030/AttrPrompt}.

翻訳日:2023-06-29 15:53:52 公開日:2023-06-28

# グローバルおよびローカル表現に基づくマルチネットワークコントラスト学習

Multi-network Contrastive Learning Based on Global and Local Representations ( http://arxiv.org/abs/2306.15930v1 )

ライセンス: Link先を確認

Weiquan Li, Xianzhong Long, Yun Li

(参考訳) 自己教師付き学習の人気により、ラベル付きデータに頼ることなくモデルをトレーニングすることが可能になった。しかしながら、既存の自己教師付きコントラスト学習手法の多くは、グローバル特徴情報とローカル特徴情報の組み合わせを見落としていることが多い。本稿では,グローバルおよびローカル表現に基づくマルチネットワークコントラスト学習フレームワークを提案する。複数のネットワークを通じて自己指導型コントラスト学習のためのグローバル・ローカル特徴情報を導入する。モデルは、複数のネットワークから生成される埋め込みペアを対比して、画像の異なるスケールで特徴情報を学習する。このフレームワークはまた、コントラストに使用されるサンプル数を拡大し、モデルのトレーニング効率を向上させる。 3つのベンチマークデータセットの線形評価結果から,本手法は従来の自己教師付き学習法よりも優れていることが示された。

The popularity of self-supervised learning has made it possible to train models without relying on labeled data, which saves expensive annotation costs. However, most existing self-supervised contrastive learning methods often overlook the combination of global and local feature information. This paper proposes a multi-network contrastive learning framework based on global and local representations. We introduce global and local feature information for self-supervised contrastive learning through multiple networks. The model learns feature information at different scales of an image by contrasting the embedding pairs generated by multiple networks. The framework also expands the number of samples used for contrast and improves the training efficiency of the model. Linear evaluation results on three benchmark datasets show that our method outperforms several existing classical self-supervised learning methods.

翻訳日:2023-06-29 15:47:40 公開日:2023-06-28

# ジャンプポイント探索における冗長作業の削減

Reducing Redundant Work in Jump Point Search ( http://arxiv.org/abs/2306.15928v1 )

ライセンス: Link先を確認

Shizhe Zhao, Daniel Harabor, Peter J. Stuckey

(参考訳) JPS (Jump Point Search) は、オンライングリッドベースのパスフィンディングのための最先端のアルゴリズムである。ゲームやその他のナビゲーションシナリオで広く使われているが、JPSは研究されていない病理学的行動を示すことができる。 i) 地図の同じ領域を何度もスキャンして後継を見つけることができる。 (ii) 最適下探索ノードを生成して拡張する。本研究では,これらの病的行動の源泉について検討し,実際にどのように起こるかを示し,より効率的に対処するためのオンラインアプローチであるConstrained JPS(CJPS)を提案する。実験の結果、cjpsのオーバーヘッドは低く、動的に変化するグリッド環境ではjpsよりも高速であることが示され、大きなゲームマップでは最大7倍、病理シナリオでは最大14倍まで向上した。

JPS (Jump Point Search) is a state-of-the-art optimal algorithm for online grid-based pathfinding. Widely used in games and other navigation scenarios, JPS nevertheless can exhibit pathological behaviours which are not well studied: (i) it may repeatedly scan the same area of the map to find successors; (ii) it may generate and expand suboptimal search nodes. In this work, we examine the source of these pathological behaviours, show how they can occur in practice, and propose a purely online approach, called Constrained JPS (CJPS), to tackle them efficiently. Experimental results show that CJPS has low overheads and is often faster than JPS in dynamically changing grid environments: by up to 7x in large game maps and up to 14x in pathological scenarios.

翻訳日:2023-06-29 15:47:29 公開日:2023-06-28

# 全文脈情報から動的グラフを学習し, 正確な訪問予測

Learning Dynamic Graphs from All Contextual Information for Accurate Point-of-Interest Visit Forecasting ( http://arxiv.org/abs/2306.15927v1 )

ライセンス: Link先を確認

Arash Hajisafi, Haowen Lin, Sina Shaham, Haoji Hu, Maria Despoina Siampou, Yao-Yi Chiang, Cyrus Shahabi

(参考訳) 都市部におけるポイント・オブ・関心(POI)の訪問数予測は、都市計画・交通管理から公衆衛生・社会研究に至るまで、様々な分野の計画・意思決定に不可欠である。この予測問題は、多変量時系列予測タスクとして定式化することができるが、現在の手法では、POI間の常に変化するマルチコンテキスト相関を完全に活用することはできない。そこで本研究では,pois間のマルチコンテキスト相関を学習し,より正確な訪問予測のための時間的グラフニューラルネットワークであるbroadness graph neural network (bysgnn)を提案する。動的グラフを学習するために時系列データのみを使用する他のアプローチとは異なり、BysGNNはコンテキスト情報と時系列データを利用して正確な動的グラフ表現を学ぶ。文脈的・時間的・空間的な信号をすべて取り入れることで、米国中の実世界のデータセットを用いた実験において、最先端の予測モデルよりも予測精度が大幅に向上するのを観察する。

Forecasting the number of visits to Points-of-Interest (POI) in an urban area is critical for planning and decision-making for various application domains, from urban planning and transportation management to public health and social studies. Although this forecasting problem can be formulated as a multivariate time-series forecasting task, the current approaches cannot fully exploit the ever-changing multi-context correlations among POIs. Therefore, we propose Busyness Graph Neural Network (BysGNN), a temporal graph neural network designed to learn and uncover the underlying multi-context correlations between POIs for accurate visit forecasting. Unlike other approaches where only time-series data is used to learn a dynamic graph, BysGNN utilizes all contextual information and time-series data to learn an accurate dynamic graph representation. By incorporating all contextual, temporal, and spatial signals, we observe a significant improvement in our forecasting accuracy over state-of-the-art forecasting models in our experiments with real-world datasets across the United States.

翻訳日:2023-06-29 15:47:17 公開日:2023-06-28

# ほとんどの言語モデルも詩人になれる:AIライティングアシスタントと制約付きテキスト生成スタジオ

Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio ( http://arxiv.org/abs/2306.15926v1 )

ライセンス: Link先を確認

Allen Roush, Sanjay Basu, Akshay Moorthy, Dmitry Dubovoy

(参考訳) 制約された自然言語生成の分野で急速に進歩したにもかかわらず、語彙が語彙的に、意味的に、あるいは音声的に制約された言語モデルの可能性を探る時間はほとんどない。ほとんどの言語モデルは、大きな制約の下でも魅力的なテキストを生成する。本稿では,テキスト単位を生成する前に,言語モデル語彙にフィルタ関数を合成適用することにより,言語モデルの出力をシンプルかつ普遍的に変更する手法を提案する。このアプローチはプラグアンドプレイであり、モデルを変更する必要はない。本手法の価値を示すために,CTGS(Constrained Text Generation Studio)と呼ばれるAI記述アシスタントを提案する。 CTGSは、特定の文字を禁止したり、生成された単語に一定の数の音節を持つように強制したり、単語を他の単語の部分的なアナグラムに強制したりといった、幅広い制約の組み合わせでテキストを生成または選択することができる。文字eを省略する新しい散文データセットを導入する。本手法は,本データセットの微調整のみと比較して,厳格に優れた性能を示す。また,Gadsbyという技術を用いたHuggingfaceのWebアプリケーションも紹介する。コードはここで公開されている。 https://github.com/HellisotherPeople/Constrained-Text-Generation-Studio

Despite rapid advancement in the field of Constrained Natural Language Generation, little time has been spent on exploring the potential of language models which have had their vocabularies lexically, semantically, and/or phonetically constrained. We find that most language models generate compelling text even under significant constraints. We present a simple and universally applicable technique for modifying the output of a language model by compositionally applying filter functions to the language models vocabulary before a unit of text is generated. This approach is plug-and-play and requires no modification to the model. To showcase the value of this technique, we present an easy to use AI writing assistant called Constrained Text Generation Studio (CTGS). CTGS allows users to generate or choose from text with any combination of a wide variety of constraints, such as banning a particular letter, forcing the generated words to have a certain number of syllables, and/or forcing the words to be partial anagrams of another word. We introduce a novel dataset of prose that omits the letter e. We show that our method results in strictly superior performance compared to fine-tuning alone on this dataset. We also present a Huggingface space web-app presenting this technique called Gadsby. The code is available to the public here: https://github.com/Hellisotherpeople/Constrained-Text-Generation-Studio

翻訳日:2023-06-29 15:46:57 公開日:2023-06-28

# ロングテール認識のためのサブクラスバランスコントラスト学習

Subclass-balancing Contrastive Learning for Long-tailed Recognition ( http://arxiv.org/abs/2306.15925v1 )

ライセンス: Link先を確認

Chengkai Hou and Jieyu Zhang and Haonan Wang and Tianyi Zhou

(参考訳) 不均衡なクラス分布を持つロングテール認識は、実践的な機械学習アプリケーションで自然に現れる。 data reweighing、resampling、supervised contrastive learningのような既存のメソッドは、クラスバランスを、headクラスとtailクラスのインスタンス間の不均衡を導入する価格で強制し、これは前者のリッチなセマンティックサブ構造を無視し、後者のバイアスを誇張する可能性がある。これらの欠点を,各headクラスを末尾クラスと同じ大きさの複数のサブクラスに分類し,元のクラスとそれらのサブクラスの間の2層クラス階層をキャプチャする表現を強制する,新しい`subclass-balancing contrastive learning (sbcl)'アプローチによって克服した。クラスタリングは、表現空間内で実行され、トレーニング中に更新されるので、サブクラスラベルは、ヘッドクラスのセマンティックサブ構造を保持する。一方、テールクラスのサンプルを過度に強調しないため、各インスタンスは表現学習に等しく貢献する。したがって,本手法はインスタンスとサブクラスのバランスを両立させるが,元のクラスラベルは異なるクラスのサブクラスのコントラスト学習によって学習される。我々は,長期化ベンチマークデータセットの一覧からSBCLを評価し,最先端のパフォーマンスを実現する。さらに,SBCLのさらなる分析とアブレーションを行い,その利点を検証した。

Long-tailed recognition with imbalanced class distribution naturally emerges in practical machine learning applications. Existing methods such as data reweighing, resampling, and supervised contrastive learning enforce the class balance with a price of introducing imbalance between instances of head class and tail class, which may ignore the underlying rich semantic substructures of the former and exaggerate the biases in the latter. We overcome these drawbacks by a novel ``subclass-balancing contrastive learning (SBCL)'' approach that clusters each head class into multiple subclasses of similar sizes as the tail classes and enforce representations to capture the two-layer class hierarchy between the original classes and their subclasses. Since the clustering is conducted in the representation space and updated during the course of training, the subclass labels preserve the semantic substructures of head classes. Meanwhile, it does not overemphasize tail class samples, so each individual instance contribute to the representation learning equally. Hence, our method achieves both the instance- and subclass-balance, while the original class labels are also learned through contrastive learning among subclasses from different classes. We evaluate SBCL over a list of long-tailed benchmark datasets and it achieves the state-of-the-art performance. In addition, we present extensive analyses and ablation studies of SBCL to verify its advantages.

翻訳日:2023-06-29 15:46:36 公開日:2023-06-28

# オペレーター学習における次元の呪い

The curse of dimensionality in operator learning ( http://arxiv.org/abs/2306.15924v1 )

ライセンス: Link先を確認

Samuel Lanthaler and Andrew M. Stuart

(参考訳) ニューラルネットワークを用いて、関数のバナッハ空間間の演算子マッピングを近似し、エミュレーションによってモデル評価を加速したり、データからモデルを発見したりすることができる。その結果,近年,この手法が注目され,オペレーター学習の分野が急速に拡大している。この論文の第一の貢献は、C^r$-あるいはリプシッツ正則性のみによって特徴づけられる作用素の一般クラスに対して、無限次元の入力および出力関数空間の表現に関して正確に定義された次元性の呪いに苦しむことである。その結果は、PCA-Net、DeepONet、FNOなど、さまざまな既存のニューラル演算子に適用できる。この論文の第二の貢献は、ハミルトン・ヤコビ方程式によって定義される解作用素に対して、次元性の一般的な呪いが克服可能であることを証明することである。この目的のために、hj-netと呼ばれる新しいニューラルオペレーターアーキテクチャが導入され、基盤となるハミルトン系の特性情報を明示的に考慮した。 hj-net の誤差と複雑性の推定は、このアーキテクチャが無限次元の入出力関数空間に関連する次元の呪いを打ち負かすことができることを示している。

Neural operator architectures employ neural networks to approximate operators mapping between Banach spaces of functions; they may be used to accelerate model evaluations via emulation, or to discover models from data. Consequently, the methodology has received increasing attention over recent years, giving rise to the rapidly growing field of operator learning. The first contribution of this paper is to prove that for general classes of operators which are characterized only by their $C^r$- or Lipschitz-regularity, operator learning suffers from a curse of dimensionality, defined precisely here in terms of representations of the infinite-dimensional input and output function spaces. The result is applicable to a wide variety of existing neural operators, including PCA-Net, DeepONet and the FNO. The second contribution of the paper is to prove that the general curse of dimensionality can be overcome for solution operators defined by the Hamilton-Jacobi equation; this is achieved by leveraging additional structure in the underlying solution operator, going beyond regularity. To this end, a novel neural operator architecture is introduced, termed HJ-Net, which explicitly takes into account characteristic information of the underlying Hamiltonian system. Error and complexity estimates are derived for HJ-Net which show that this architecture can provably beat the curse of dimensionality related to the infinite-dimensional input and output function spaces.

翻訳日:2023-06-29 15:46:11 公開日:2023-06-28

# 微細な3次元物体認識 : アプローチと実験

Fine-grained 3D object recognition: an approach and experiments ( http://arxiv.org/abs/2306.15919v1 )

ライセンス: Link先を確認

Junhyung Jo, Hamidreza Kasaei

(参考訳) 3次元物体認識技術は自動車の自動運転などの先進技術の中核技術として利用されている。 3Dオブジェクト認識には2つのアプローチがある。 (i)Global Orthographic Object Descriptor(GOOD)などの手作りのアプローチ (ii)mobilenetやvggといったディープラーニングベースのアプローチ。しかし、既知のカテゴリの数が時間とともに増加するオープンエンド領域では、これらのアプローチのどちらがよりうまく機能するかを知る必要があり、システムは、少数のトレーニング例を使って、新しいオブジェクトカテゴリについて学ぶ必要がある。本稿では,オブジェクトビューを入力とし,カテゴリラベルを出力として生成するオフライン3Dオブジェクト認識システムを最初に実装した。オフラインの段階では、インスタンスベースの学習(IBL)を使用して新しいカテゴリを形成し、得られたオブジェクト認識性能を評価するためにK-foldクロスバリデーションを使用する。次に,提案手法をシミュレートした教師テストに統合し,オンライン形式でテストを行った。その結果,ディープラーニング機能を用いたアプローチは,よりオープンな手法に適していることがわかった。さらに,手作り・深層学習の特徴を結合することで分類精度が向上することを確認した。

Three-dimensional (3D) object recognition technology is being used as a core technology in advanced technologies such as autonomous driving of automobiles. There are two sets of approaches for 3D object recognition: (i) hand-crafted approaches like Global Orthographic Object Descriptor (GOOD), and (ii) deep learning-based approaches such as MobileNet and VGG. However, it is needed to know which of these approaches works better in an open-ended domain where the number of known categories increases over time, and the system should learn about new object categories using few training examples. In this paper, we first implemented an offline 3D object recognition system that takes an object view as input and generates category labels as output. In the offline stage, instance-based learning (IBL) is used to form a new category and we use K-fold cross-validation to evaluate the obtained object recognition performance. We then test the proposed approach in an online fashion by integrating the code into a simulated teacher test. As a result, we concluded that the approach using deep learning features is more suitable for open-ended fashion. Moreover, we observed that concatenating the hand-crafted and deep learning features increases the classification accuracy.

翻訳日:2023-06-29 15:45:48 公開日:2023-06-28

# ニューラルネットワークが捉えた情報:記憶と一般化とのつながり

On information captured by neural networks: connections with memorization and generalization ( http://arxiv.org/abs/2306.15918v1 )

ライセンス: Link先を確認

Hrayr Harutyunyan

(参考訳) ディープラーニングの人気と成功にもかかわらず、ニューラルネットワークが未知の例に一般化する時期、方法、理由の理解は限られている。学習はデータから情報を取り出すものとして見ることができるので、トレーニング中にニューラルネットワークが取得した情報を正式に研究する。具体的には,情報理論的な観点から雑音ラベルの存在下での学習から始め,ラベル雑音情報を重み付けに制限する学習アルゴリズムを導出する。次に、個々のサンプルがディープネットワークのトレーニングに与えるユニークな情報の概念を定義し、非定型的、曖昧、あるいは過度に表現されたサブポピュレーションに属する例のニューラルネットワークの振る舞いに光を当てる。非空の一般化ギャップ境界を導出することで、例情報性と一般化を関連付ける。最後に, 知識蒸留の研究により, 一般化におけるデータとラベルの複雑さの重要性を浮き彫りにする。その結果,ニューラルネットワークの一般化のメカニズムの理解を深めることができた。

Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study information captured by neural networks during training. Specifically, we start with viewing learning in presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits label noise information in weights. We then define a notion of unique information that an individual sample provides to the training of a deep network, shedding some light on the behavior of neural networks on examples that are atypical, ambiguous, or belong to underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous generalization gap bounds. Finally, by studying knowledge distillation, we highlight the important role of data and label complexity in generalization. Overall, our findings contribute to a deeper understanding of the mechanisms underlying neural network generalization.

翻訳日:2023-06-29 15:45:30 公開日:2023-06-28

# 信頼度校正エンサンブルダンスフレーズ検索

Confidence-Calibrated Ensemble Dense Phrase Retrieval ( http://arxiv.org/abs/2306.15917v1 )

ライセンス: Link先を確認

William Yang, Noah Bergam, Arnav Jain, Nima Sheikhoslami

(参考訳) 本稿では, (Karpukhin et al. 2020) によって開発されたトランスフォーマーを用いた Dense Passage Retrieval (DPR) アルゴリズムが, 事前学習なしに最適化できる範囲について考察する。この手法には2つの特別な洞察が含まれている: dprコンテキストエンコーダを様々な句長(例えば、1セントと5セントのセグメント)に適用し、これら全てのセグメントに対して信頼度に合致したアンサンブル予測を行う。このやや徹底的なアプローチは、Google NQやSQuADといったベンチマークデータセットで、最先端の結果を達成する。また,本手法をドメイン固有のデータセットに適用し,異なるドメインに対して異なる粒度が最適であることを示す。

In this paper, we consider the extent to which the transformer-based Dense Passage Retrieval (DPR) algorithm, developed by (Karpukhin et. al. 2020), can be optimized without further pre-training. Our method involves two particular insights: we apply the DPR context encoder at various phrase lengths (e.g. one-sentence versus five-sentence segments), and we take a confidence-calibrated ensemble prediction over all of these different segmentations. This somewhat exhaustive approach achieves start-of-the-art results on benchmark datasets such as Google NQ and SQuAD. We also apply our method to domain-specific datasets, and the results suggest how different granularities are optimal for different domains

翻訳日:2023-06-29 15:45:14 公開日:2023-06-28

# ランダム係数リッジ回帰を用いた伝達学習

Transfer Learning with Random Coefficient Ridge Regression ( http://arxiv.org/abs/2306.15915v1 )

ライセンス: Link先を確認

Hongzhe Zhang and Hongzhe Li

(参考訳) ランダムな係数を持つリッジ回帰は、効果が小さいがゼロではないと期待される場合、高次元の設定で固定係数回帰に重要な代替となる。本稿では,移動学習の設定におけるランダム係数リッジ回帰の推定と予測について考察し,対象モデルからの観測に加えて,異なるが関連性のある回帰モデルからのサンプルも利用可能である。対象モデルに対するソースモデルの情報性は、回帰係数の相関によって定量化することができる。本稿では,実験的推定リスクや予測リスクを最小化して重みを決定できるターゲットモデルとソースモデルの両方のリッジ推定の重み付け和として,対象モデルの回帰係数を2つの推定器を提案する。ランダム行列理論を用いて、最適重みの制限値は、$p/n \rightarrow \gamma$(ここで$p$は予測者の数、$n$はサンプルサイズ)という設定で導出され、推定や予測のリスクを明示的に表わす。シミュレーションでは、これらの制限リスクは経験的リスクと非常によく一致している。脂質特性に対するポリジェニックリスクスコアの予測への応用は, 単一試料隆起回帰法やラッソを用いた伝達学習法よりも, 予測誤差が小さいことを示す。

Ridge regression with random coefficients provides an important alternative to fixed coefficients regression in high dimensional setting when the effects are expected to be small but not zeros. This paper considers estimation and prediction of random coefficient ridge regression in the setting of transfer learning, where in addition to observations from the target model, source samples from different but possibly related regression models are available. The informativeness of the source model to the target model can be quantified by the correlation between the regression coefficients. This paper proposes two estimators of regression coefficients of the target model as the weighted sum of the ridge estimates of both target and source models, where the weights can be determined by minimizing the empirical estimation risk or prediction risk. Using random matrix theory, the limiting values of the optimal weights are derived under the setting when $p/n \rightarrow \gamma$, where $p$ is the number of the predictors and $n$ is the sample size, which leads to an explicit expression of the estimation or prediction risks. Simulations show that these limiting risks agree very well with the empirical risks. An application to predicting the polygenic risk scores for lipid traits shows such transfer learning methods lead to smaller prediction errors than the single sample ridge regression or Lasso-based transfer learning.

翻訳日:2023-06-29 15:45:00 公開日:2023-06-28

# マルチモーダルうわさ検出のための知識強化階層型情報相関学習

Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection ( http://arxiv.org/abs/2306.15946v1 )

ライセンス: Link先を確認

Jiawei Liu, Jingyi Xie, Fanrui Zhang, Qiang Zhang, Zheng-jun Zha

(参考訳) ソーシャルメディア上のテキストや画像による噂の爆発的な成長は、大きな注目を集めている。既存の研究は、クロスモーダル情報インタラクションと融合に多大な貢献をしてきたが、異なるモダリティコンテンツ間の階層的および複雑な意味的相関を十分に探求できず、マルチモーダルなうわさを検出する際の性能を厳しく制限している。本研究では,基本意味相関と高次知識相関を共同でモデル化し,マルチモーダルうわさ検出のための知識エンハンスド階層情報相関学習手法(khicl)を提案する。具体的には、KhiCLはクロスモーダル結合辞書を利用して、異種一様特徴を共通特徴空間に伝達し、クロスモーダル融合層によって基本的なクロスモーダル意味的一貫性と矛盾を捉える。さらに、マルチモーダルコンテンツの記述をエンティティを中心に考えると、KhiCLは画像やテキストから視覚的およびテキスト的エンティティを抽出し、知識関連推論戦略を設計し、外部知識グラフ内の各エンティティ間の最も短い意味的関連パスを見つけ、この経路で他の連結エンティティの補完的なコンテキスト的知識をすべて吸収して知識強化エンティティ表現を学習する。さらに、KhiCLは署名された注意機構を用いて、その対応する意味的関連距離を測定することで、モダリティ内およびモダリティ間エンティティペアの知識強化エンティティ一貫性と矛盾をモデル化する。提案手法の有効性を実験により実証した。

The explosive growth of rumors with text and images on social media platforms has drawn great attention. Existing studies have made significant contributions to cross-modal information interaction and fusion, but they fail to fully explore hierarchical and complex semantic correlation across different modality content, severely limiting their performance on detecting multi-modal rumor. In this work, we propose a novel knowledge-enhanced hierarchical information correlation learning approach (KhiCL) for multi-modal rumor detection by jointly modeling the basic semantic correlation and high-order knowledge-enhanced entity correlation. Specifically, KhiCL exploits cross-modal joint dictionary to transfer the heterogeneous unimodality features into the common feature space and captures the basic cross-modal semantic consistency and inconsistency by a cross-modal fusion layer. Moreover, considering the description of multi-modal content is narrated around entities, KhiCL extracts visual and textual entities from images and text, and designs a knowledge relevance reasoning strategy to find the shortest semantic relevant path between each pair of entities in external knowledge graph, and absorbs all complementary contextual knowledge of other connected entities in this path for learning knowledge-enhanced entity representations. Furthermore, KhiCL utilizes a signed attention mechanism to model the knowledge-enhanced entity consistency and inconsistency of intra-modality and inter-modality entity pairs by measuring their corresponding semantic relevant distance. Extensive experiments have demonstrated the effectiveness of the proposed method.

翻訳日:2023-06-29 15:37:56 公開日:2023-06-28

# Pb-Hash: 分割bビットハッシュ

Pb-Hash: Partitioned b-bit Hashing ( http://arxiv.org/abs/2306.15944v1 )

ライセンス: Link先を確認

Ping Li, Weijie Zhao

(参考訳) minwise hashing (minhash), one permutation hashing (oph), consistent weighted sampling (cws) を含む多くのハッシュアルゴリズムは、$b$bitの整数を生成する。データベクトル毎に$k$ハッシュを使用すると、ストレージは$B\times k$ bitsとなり、大規模学習に使用する場合、モデルサイズは$2^B\times k$となる。標準的な戦略は、$b$ビットのうち最低の$b$ビットのみを使用し、ハッシュ数である$k$を多少増やすことである。本研究では,$B$ビットを$m$チャンク,例えば$b\times m =B$に分割することでハッシュを再使用することを提案する。対応するモデルサイズは$m\times 2^b \times k$となり、これは元の$^b\times k$よりもかなり小さい。理論分析の結果、ハッシュ値を$m$チャンクに分割すると精度が低下することが明らかとなった。言い換えれば、$B/m$ビットの$m$チャンクを使うことは、$B$ビットを直接使用するほど正確ではない。これは同じハッシュの再使用による相関のためである。一方、我々の分析では(例えば)$m=2\sim 4$に対して精度があまり低下しないことも示している。一部の地域では、pb-hashは4.99ドルよりはるかに大きい価格でも機能する。 Pb-Hashは、ハッシュメソッド/アプリケーションのファミリーに良い追加であり、産業従事者に利益をもたらすと期待しています。線形SVMモデルおよびディープラーニングモデルに対する機械学習タスクにおけるPb-Hashの有効性を検証する。ハッシュデータは本質的に分類(ID)機能であるため、各ハッシュに埋め込みテーブルを使用する標準的なプラクティスに従う。 Pb-Hashでは、$m$の埋め込みを組み合わせる効果的な戦略を設計する必要があります。本研究は, 連結, 最大プール, 平均プール, 製品プールの4つの手法を実証的に評価した。どのプールが良いかという明確な答えはなく、私たちは将来の研究のためにそれを残します。

Many hashing algorithms including minwise hashing (MinHash), one permutation hashing (OPH), and consistent weighted sampling (CWS) generate integers of $B$ bits. With $k$ hashes for each data vector, the storage would be $B\times k$ bits; and when used for large-scale learning, the model size would be $2^B\times k$, which can be expensive. A standard strategy is to use only the lowest $b$ bits out of the $B$ bits and somewhat increase $k$, the number of hashes. In this study, we propose to re-use the hashes by partitioning the $B$ bits into $m$ chunks, e.g., $b\times m =B$. Correspondingly, the model size becomes $m\times 2^b \times k$, which can be substantially smaller than the original $2^B\times k$. Our theoretical analysis reveals that by partitioning the hash values into $m$ chunks, the accuracy would drop. In other words, using $m$ chunks of $B/m$ bits would not be as accurate as directly using $B$ bits. This is due to the correlation from re-using the same hash. On the other hand, our analysis also shows that the accuracy would not drop much for (e.g.,) $m=2\sim 4$. In some regions, Pb-Hash still works well even for $m$ much larger than 4. We expect Pb-Hash would be a good addition to the family of hashing methods/applications and benefit industrial practitioners. We verify the effectiveness of Pb-Hash in machine learning tasks, for linear SVM models as well as deep learning models. Since the hashed data are essentially categorical (ID) features, we follow the standard practice of using embedding tables for each hash. With Pb-Hash, we need to design an effective strategy to combine $m$ embeddings. Our study provides an empirical evaluation on four pooling schemes: concatenation, max pooling, mean pooling, and product pooling. There is no definite answer which pooling would be always better and we leave that for future study.

翻訳日:2023-06-29 15:37:26 公開日:2023-06-28

# 移動不要:Opti-Mileを用いたラストマイルと公共交通の統合

No Transfers Required: Integrating Last Mile with Public Transit Using Opti-Mile ( http://arxiv.org/abs/2306.15943v1 )

ライセンス: Link先を確認

Raashid Altaf, Pravesh Biyani

(参考訳) 公共交通機関は、ほとんどの地域に到達するのに必要な交通機関の必要性のため不便にもかかわらず、その手頃な価格のため人気のある交通手段である。例えば、ニューデリーのバスと地下鉄のネットワークでは、どの出発点からでも30\%しか直接アクセスできないため、ほとんどの通勤者への乗り換えが必要となる。さらに、リックショー、タクチューク、シャトルといったラストマイルのサービスは、最も近い公共交通機関のアクセスポイントへの給餌機として一般的に使われており、旅の複雑さと非効率性をさらに増す。最終的に、ユーザーは移動モードやラストマイルサービスの有無に関わらず、目的地に到達するためのカバレッジと転送のトレードオフに直面します。公共交通機関における移動に伴うアクセシビリティの制限と非効率の問題に対処するために,ラストマイルサービスと公共交通機関を組み合わせた新しい旅行計画手法である「opti-mile」を提案する。 Opti-mileでは、最大歩行距離や許容範囲などの旅行パラメータをカスタマイズできる。我々はニューデリーの交通ネットワークを解析し、ランダムに選択されたソース-決定ペア間の最適なマルチモーダル旅行におけるオプティマイルの効率、実現可能性、利点を評価する。従来の最短経路に比べて18%の値上げで、オプティマイル走行が10%距離移動を減少させることを示した。また、オプティマイルの旅行は公共交通機関よりも、運賃の大幅な増加を伴わずに、市をカバーしていることを示す。

Public transit is a popular mode of transit due to its affordability, despite the inconveniences due to the necessity of transfers required to reach most areas. For example, in the bus and metro network of New Delhi, only 30\% of stops can be directly accessed from any starting point, thus requiring transfers for most commutes. Additionally, last-mile services like rickshaws, tuk-tuks or shuttles are commonly used as feeders to the nearest public transit access points, which further adds to the complexity and inefficiency of a journey. Ultimately, users often face a tradeoff between coverage and transfers to reach their destination, regardless of the mode of transit or the use of last-mile services. To address the problem of limited accessibility and inefficiency due to transfers in public transit systems, we propose ``opti-mile," a novel trip planning approach that combines last-mile services with public transit such that no transfers are required. Opti-mile allows users to customise trip parameters such as maximum walking distance, and acceptable fare range. We analyse the transit network of New Delhi, evaluating the efficiency, feasibility and advantages of opti-mile for optimal multi-modal trips between randomly selected source-destination pairs. We demonstrate that opti-mile trips lead to a 10% reduction in distance travelled for 18% increase in price compared to traditional shortest paths. We also show that opti-mile trips provide better coverage of the city than public transit, without a significant fare increase.

翻訳日:2023-06-29 15:36:49 公開日:2023-06-28

# ターゲット音声抽出のための空間情報付き強化ニューラルビームフォーマ

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction ( http://arxiv.org/abs/2306.15942v1 )

ライセンス: Link先を確認

Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao and Yujun Wang

(参考訳) 近年,深層学習に基づくビームフォーミングアルゴリズムは,ターゲット音声抽出作業において有望な性能を示した。しかし、ほとんどのシステムは空間情報を十分に利用していない。本稿では,空間情報を利用してニューラルビームフォーマの性能を向上させるターゲット音声抽出ネットワークを提案する。そこで我々はまず, unet-tcn構造を用いて入力特徴をモデル化し, 他のモデルにおける直接次元化による情報損失を回避し, 音声前分離モジュールの推定精度を向上させる。さらに,アレイが受信する空間情報を十分に活用することにより,神経ビームフォーマーの空間情報の知覚を高めるマルチヘッドクロスアテンション機構を提案する。実験の結果,より合理的なターゲットマスク推定ネットワークと空間情報に基づくクロスタッチ機構を組み込んだアプローチが,音声分離性能を効果的に向上することが示された。

Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input features and improve the estimation accuracy of the speech pre-separation module by avoiding information loss caused by direct dimensionality reduction in other models. Furthermore, we introduce a multi-head cross-attention mechanism that enhances the neural beamformer's perception of spatial information by making full use of the spatial information received by the array. Experimental results demonstrate that our approach, which incorporates a more reasonable target mask estimation network and a spatial information-based cross-attention mechanism into the neural beamformer, effectively improves speech separation performance.

翻訳日:2023-06-29 15:36:21 公開日:2023-06-28

# 変分オートエンコーダの概念学習によるセルネットワークの解釈可能な異常検出

Interpretable Anomaly Detection in Cellular Networks by Learning Concepts in Variational Autoencoders ( http://arxiv.org/abs/2306.15938v1 )

ライセンス: Link先を確認

Amandeep Singh, Michael Weber, Markus Lange-Hegermann

(参考訳) 本稿では,セルラーネットワーク内の異常を解釈可能な方法で検出する課題に対処し,データセット内のキーパフォーマンス指標(KPI)毎に潜在空間の解釈可能な表現を学習する可変オートエンコーダ(VAE)を用いた新しいアプローチを提案する。これにより、再構成損失とzスコアに基づく異常の検出が可能になる。我々は,k-meansアルゴリズムを用いた追加情報センタロイド(c)による異常の解釈可能性を確保し,表現学習の促進を図る。我々は,特定のKPIの潜在次元のパターンを解析することにより,モデルの性能を評価し,解釈可能性と異常を実証する。提案するフレームワークは,セルネットワーク内の異常を検出するための高速かつ自律的なソリューションを提供し,ビッグデータ処理におけるディープラーニングベースのアルゴリズムの可能性を示す。

This paper addresses the challenges of detecting anomalies in cellular networks in an interpretable way and proposes a new approach using variational autoencoders (VAEs) that learn interpretable representations of the latent space for each Key Performance Indicator (KPI) in the dataset. This enables the detection of anomalies based on reconstruction loss and Z-scores. We ensure the interpretability of the anomalies via additional information centroids (c) using the K-means algorithm to enhance representation learning. We evaluate the performance of the model by analyzing patterns in the latent dimension for specific KPIs and thereby demonstrate the interpretability and anomalies. The proposed framework offers a faster and autonomous solution for detecting anomalies in cellular networks and showcases the potential of deep learning-based algorithms in handling big data.

翻訳日:2023-06-29 15:36:00 公開日:2023-06-28

# 熱電流の量子制御

Quantum Control of Heat Current ( http://arxiv.org/abs/2306.15937v1 )

ライセンス: Link先を確認

Gobinda Chakraborty, Subhadeep Chakraborty, Tanmoy Basu, and Manas Mukherjee

(参考訳) 2つの熱浴に結合した高調波発振器の量子トリマーにおける局所熱輸送について検討した。それらのカップリングは複雑な相によって増強され、同じ熱浴に接続された2つの発振器間の局所的な非定型熱電流の量子制御につながる。本研究により, この非定型熱電流はダークモードの上昇の結果であり, この電流の変調はシステム浴の相関のばらつきに起因することが明らかとなった。提案する量子システムは、熱電流を利用して量子熱・メモリデバイスに応用できるかもしれない。

We investigate the local thermal transport in a quantum trimer of harmonic oscillators connected to two thermal baths. The coupling between them are augmented by complex phases which leads to the quantum control of the local atypical heat current between two oscillators connected to the same heat bath. Our study reveals that this atypical heat current is a consequence of the lifting of the dark mode and the modulation of this current is due to variation in system bath correlations. The proposed quantum system may find application in quantum thermal and memory devices by leveraging the heat current.

翻訳日:2023-06-29 15:35:37 公開日:2023-06-28

# モデルベース適応のための奇抜なリプレイ

Curious Replay for Model-based Adaptation ( http://arxiv.org/abs/2306.15934v1 )

ライセンス: Link先を確認

Isaac Kauvar, Chris Doyle, Linqi Zhou, Nick Haber

(参考訳) エージェントは環境の変化に応じて迅速に適応できなければならない。既存のモデルベース強化学習エージェントは、過去の経験を世界モデルのトレーニングに用いているため、これをうまく実行できないことが分かっています。ここでは、好奇心に基づく優先信号を用いて、モデルベースのエージェントにカスタマイズされた優先的な体験リプレイの形式であるCurious Replayを紹介する。 Curious Replayを使用するエージェントは、動物行動やCrafterベンチマークにインスパイアされた探索パラダイムのパフォーマンス向上を示す。 Curious Replay の DreamerV3 は Crafter の最先端のパフォーマンスを上回り、DreamerV3 の以前の高得点 14.5 よりも大幅に向上した 19.4 のスコアを達成し、Deepmind Control Suite でも同様のパフォーマンスを維持した。 Curious Replayのコードはhttps://github.com/AutonomousAgentsLab/curiousreplayで入手できる。

Agents must be able to adapt quickly as an environment changes. We find that existing model-based reinforcement learning agents are unable to do this well, in part because of how they use past experiences to train their world model. Here, we present Curious Replay -- a form of prioritized experience replay tailored to model-based agents through use of a curiosity-based priority signal. Agents using Curious Replay exhibit improved performance in an exploration paradigm inspired by animal behavior and on the Crafter benchmark. DreamerV3 with Curious Replay surpasses state-of-the-art performance on Crafter, achieving a mean score of 19.4 that substantially improves on the previous high score of 14.5 by DreamerV3 with uniform replay, while also maintaining similar performance on the Deepmind Control Suite. Code for Curious Replay is available at https://github.com/AutonomousAgentsLab/curiousreplay

翻訳日:2023-06-29 15:35:22 公開日:2023-06-28

# 再度生成できる - 検証と修正プロンプトを備えたデータからテキストへの生成

You Can Generate It Again: Data-to-text Generation with Verification and Correction Prompting ( http://arxiv.org/abs/2306.15933v1 )

ライセンス: Link先を確認

Xuan Ren, Lingqiao Liu

(参考訳) 既存のモデルの大幅な進歩にもかかわらず、データ対テキスト生成として知られる構造化データ入力からテキスト記述を生成することは、依然として困難な課題である。本稿では, 生成, 検証, 修正段階からなる多段階プロセスを導入することで, 従来のワンショット生成方法を超える新しい手法を提案する。我々のアプローチであるVCP(Verification and Correction Prompting)は、初期出力を生成するモデルから始まります。次に、生成されたテキストの異なる側面の正しさを検証する。検証ステップからの観察は、特定されたエラーを考慮して出力を再生するようにモデルに指示する特殊なエラー表示プロンプトに変換される。モデルの修正能力を高めるため,注意深く設計したトレーニング手順を開発した。この手順により、モデルがエラー表示プロンプトからのフィードバックを組み込むことができ、結果として出力生成が改善される。実験結果から,本手法は生成テキストの全体的な品質を維持しつつ,スロットエラー率を効果的に低減することを示す。

Despite significant advancements in existing models, generating text descriptions from structured data input, known as data-to-text generation, remains a challenging task. In this paper, we propose a novel approach that goes beyond traditional one-shot generation methods by introducing a multi-step process consisting of generation, verification, and correction stages. Our approach, VCP(Verification and Correction Prompting), begins with the model generating an initial output. We then proceed to verify the correctness of different aspects of the generated text. The observations from the verification step are converted into a specialized error-indication prompt, which instructs the model to regenerate the output while considering the identified errors. To enhance the model's correction ability, we have developed a carefully designed training procedure. This procedure enables the model to incorporate feedback from the error-indication prompt, resulting in improved output generation. Through experimental results, we demonstrate that our approach effectively reduces slot error rates while maintaining the overall quality of the generated text.

翻訳日:2023-06-29 15:34:47 公開日:2023-06-28

# NIPD:実世界の非IIDデータに基づくフェデレーション学習者検出ベンチマーク

NIPD: A Federated Learning Person Detection Benchmark Based on Real-World Non-IID Data ( http://arxiv.org/abs/2306.15932v1 )

ライセンス: Link先を確認

Kangning Yin, Zhen Ding, Zhihua Dong, Dongsheng Chen, Jie Fu, Xinhui Ji, Guangqiang Yin and Zhiguo Wang

(参考訳) プライバシー保護型分散機械学習であるfederated learning(fl)は、無線通信ネットワークで急速に適用されている。 FLにより、IoT(Internet of Things)クライアントは、プライバシーの漏洩を防止しつつ、十分にトレーニングされたモデルを得ることができる。人検出は、FLと組み合わせてビデオデータをエッジで直接処理する場合、限られた計算能力を持つエッジデバイスに展開することができる。しかし、異なるカメラの異なるハードウェアおよび展開シナリオのため、カメラが収集したデータは非独立かつ同一に分布しており(非IID)、FLアグリゲーションから派生したグローバルモデルはより効果的ではない。一方、既存の研究では、現実世界のFLオブジェクト検出のための公開データセットが欠如しており、IoTカメラにおける非IID問題の研究には適していない。そこで我々は,5台のカメラから収集した非IID IoT 人物検出(NIPD)データセットをオープンソース化した。我々の知る限り、これがデバイスベースの非IID人物検出データセットとしては初めてのものである。このデータセットに基づいて,fl実験プラットフォームの構築方法を説明し,非iid者検出のためのベンチマークを提供する。 NIPDはFLの適用とスマートシティのセキュリティを促進することが期待されている。

Federated learning (FL), a privacy-preserving distributed machine learning, has been rapidly applied in wireless communication networks. FL enables Internet of Things (IoT) clients to obtain well-trained models while preventing privacy leakage. Person detection can be deployed on edge devices with limited computing power if combined with FL to process the video data directly at the edge. However, due to the different hardware and deployment scenarios of different cameras, the data collected by the camera present non-independent and identically distributed (non-IID), and the global model derived from FL aggregation is less effective. Meanwhile, existing research lacks public data set for real-world FL object detection, which is not conducive to studying the non-IID problem on IoT cameras. Therefore, we open source a non-IID IoT person detection (NIPD) data set, which is collected from five different cameras. To our knowledge, this is the first true device-based non-IID person detection data set. Based on this data set, we explain how to establish a FL experimental platform and provide a benchmark for non-IID person detection. NIPD is expected to promote the application of FL and the security of smart city.

翻訳日:2023-06-29 15:34:21 公開日:2023-06-28

# 学習可能なパッチワイズマスクによる対向移動性の向上

Boosting Adversarial Transferability with Learnable Patch-wise Masks ( http://arxiv.org/abs/2306.15931v1 )

ライセンス: Link先を確認

Xingxing Wei, Shiji Zhao

(参考訳) 敵対的な例は、異なるモデル間での転送性のため、セキュリティクリティカルなアプリケーションで広く注目を集めています。対向移動性を高めるために多くの方法が提案されているが、実際的な需要にはまだギャップがある。本稿では,モデル固有の判別領域が,ソースモデルへの過剰適合を招き,対象モデルへの伝達性を低下させる鍵要因であると主張する。そのため、対向摂動を計算する際に、パッチワイズマスクを用いてモデル固有領域をプルークする。これらの領域を正確にローカライズするために,マスクの自動最適化のための学習可能なアプローチを提案する。具体的には,対象モデルのシミュレーションを行い,シミュレーションモデルのフィードバックに応じてパッチワイズマスクを調整する。効率を向上させるために、差分進化法(DE)アルゴリズムを用いて特定の画像に対するパッチワイドマスクを探索する。反復攻撃中、学習したマスクを画像に適用して、モデル固有の領域に関するパッチをドロップアウトし、勾配をより汎用的にし、対向移動性を向上させる。提案手法は前処理法であり,既存の勾配に基づく手法と統合することで,転送攻撃成功率をさらに高めることができる。 ImageNetデータセットの大規模な実験により,本手法の有効性が示された。提案手法を既存のアンサンブル攻撃手法に組み込んで,最新技術を用いた攻撃性能を効果的に向上させる7つの先進防衛手法に対して平均93.01%の成功率を達成する。

Adversarial examples have raised widespread attention in security-critical applications because of their transferability across different models. Although many methods have been proposed to boost adversarial transferability, a gap still exists in the practical demand. In this paper, we argue that the model-specific discriminative regions are a key factor to cause the over-fitting to the source model, and thus reduce the transferability to the target model. For that, a patch-wise mask is utilized to prune the model-specific regions when calculating adversarial perturbations. To accurately localize these regions, we present a learnable approach to optimize the mask automatically. Specifically, we simulate the target models in our framework, and adjust the patch-wise mask according to the feedback of simulated models. To improve the efficiency, Differential Evolutionary (DE) algorithm is utilized to search for patch-wise masks for a specific image. During iterative attacks, the learned masks are applied to the image to drop out the patches related to model-specific regions, thus making the gradients more generic and improving the adversarial transferability. The proposed approach is a pre-processing method and can be integrated with existing gradient-based methods to further boost the transfer attack success rate. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of our method. We incorporate the proposed approach with existing methods in the ensemble attacks and achieve an average success rate of 93.01% against seven advanced defense methods, which can effectively enhance the state-of-the-art transfer-based attack performance.

翻訳日:2023-06-29 15:34:02 公開日:2023-06-28

# バイモーダルトランスによる血行動態応答関数の再構成

Reconstructing the Hemodynamic Response Function via a Bimodal Transformer ( http://arxiv.org/abs/2306.15971v1 )

ライセンス: Link先を確認

Yoni Choukroun, Lior Golgher, Pablo Blinder, Lior Wolf

(参考訳) 血流と神経活動の関係は広く認識されており、fmri研究において血流は神経活動のサーロゲートとしてよく用いられる。微小なレベルでは、神経活動は近くの血管の血流に影響を与えることが示されている。本研究は、この問題を明示的なニューロン集団レベルで直接扱う最初の予測モデルを提案する。覚醒マウスの生体内記録を用いて, 経時的バイモーダルトランスフォーマー構造を用いて, 経時的血流量と持続する自発的ニューロン活動の両方に基づいて, 血流を推定する。本研究は,神経活動の取り込みにより,血流量の予測能力が著しく向上することが示唆された。モデル行動の解析を通じて,神経活動に対する血行力学的反応の概ね未熟な性質に関する仮説を提案する。

The relationship between blood flow and neuronal activity is widely recognized, with blood flow frequently serving as a surrogate for neuronal activity in fMRI studies. At the microscopic level, neuronal activity has been shown to influence blood flow in nearby blood vessels. This study introduces the first predictive model that addresses this issue directly at the explicit neuronal population level. Using in vivo recordings in awake mice, we employ a novel spatiotemporal bimodal transformer architecture to infer current blood flow based on both historical blood flow and ongoing spontaneous neuronal activity. Our findings indicate that incorporating neuronal activity significantly enhances the model's ability to predict blood flow values. Through analysis of the model's behavior, we propose hypotheses regarding the largely unexplored nature of the hemodynamic response to neuronal activity.

翻訳日:2023-06-29 15:27:51 公開日:2023-06-28

# 雑音量子処理実験における有効量子体積・忠実度・計算コスト

Effective quantum volume, fidelity and computational cost of noisy quantum processing experiments ( http://arxiv.org/abs/2306.15970v1 )

ライセンス: Link先を確認

K. Kechedzhi, S. V. Isakov, S. Mandr\`a, B. Villalonga, X. Mi, S. Boixo, V. Smelyanskiy

(参考訳) 今日の実験的なノイズ量子プロセッサは、無作為回路サンプリングの計算ベンチマークタスクのために、最先端のスーパーコンピュータ上のすべての既知のアルゴリズムと競合することができる[1-5]。さらに、局所観測可能な量子情報スクランブル[6]の回路ベースの量子シミュレーションは、例えば、正確なシュロディンガー進化やマトリックス生成状態(MPS)など、標準のフルウェーブ関数シミュレーションアルゴリズムをすでに上回っている。しかし、この実験はまだ観測可能値を計算するためにテンソルネットワークの収縮を越えていない。これらの研究に基づき、本研究は、特定の観測可能な信号対雑音比とそれに対応する計算コストとのトレードオフを説明するために、基礎となる有効回路体積を利用する統一的なフレームワークを提供する。このフレームワークを、ランダム回路サンプリング[5]、量子情報スクランブル[6]、フロッケ回路ユニタリ[7]の最近の量子プロセッサ実験に適用する。これにより、Refの結果を再現できます。 1つのGPUを使って、データポイントあたり1秒未満で [7]。

Today's experimental noisy quantum processors can compete with and surpass all known algorithms on state-of-the-art supercomputers for the computational benchmark task of Random Circuit Sampling [1-5]. Additionally, a circuit-based quantum simulation of quantum information scrambling [6], which measures a local observable, has already outperformed standard full wave function simulation algorithms, e.g., exact Schrodinger evolution and Matrix Product States (MPS). However, this experiment has not yet surpassed tensor network contraction for computing the value of the observable. Based on those studies, we provide a unified framework that utilizes the underlying effective circuit volume to explain the tradeoff between the experimentally achievable signal-to-noise ratio for a specific observable, and the corresponding computational cost. We apply this framework to recent quantum processor experiments of Random Circuit Sampling [5], quantum information scrambling [6], and a Floquet circuit unitary [7]. This allows us to reproduce the results of Ref. [7] in less than one second per data point using one GPU.

翻訳日:2023-06-29 15:27:38 公開日:2023-06-28

# 分離可能な物理インフォームニューラルネットワーク

Separable Physics-Informed Neural Networks ( http://arxiv.org/abs/2306.15969v1 )

ライセンス: Link先を確認

Junwoo Cho, Seungtae Nam, Hyunmo Yang, Seok-Bae Yun, Youngjoon Hong, Eunbyung Park

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、様々なPDEに対して有望なデータ駆動型PDE解法として最近登場した。しかし、多次元pdesや近似高複素解関数を解くための訓練ピンの基本的な制限がある。これらの困難なpdesに必要なトレーニングポイント(ロケーションポイント)の数は大幅に増加するが、高価な計算コストとメモリのオーバーヘッドのため、かなり制限されている。この問題を克服するため,我々はpinnのネットワークアーキテクチャとトレーニングアルゴリズムを提案する。提案手法である分離可能なPINN(SPINN)は,従来のPINNのポイントワイド処理とは異なり,多次元PDEにおけるネットワーク伝搬数を著しく削減する。また,PDE残差計算の計算コストを削減し,単一のコモディティGPU上で多数のコロケーションポイント(>10^7)を実現するために,前方モード自動微分法を提案する。実験の結果,多次元PDEにおける計算コスト(壁面時間62倍,FLOPでは1,394倍)を大幅に削減し,精度が向上した。さらに,SPINN は,2+1-d Navier-Stokes 方程式を最良性能の先行手法 (1GPUでは9分対10時間) よりもはるかに高速に解き,精度を維持できることを示した。最後に、SPINNは高非線形多次元PDE(3+1-d Navier-Stokes方程式)の解を正確に得ることを示す。

Physics-informed neural networks (PINNs) have recently emerged as promising data-driven PDE solvers showing encouraging results on various PDEs. However, there is a fundamental limitation of training PINNs to solve multi-dimensional PDEs and approximate highly complex solution functions. The number of training points (collocation points) required on these challenging PDEs grows substantially, but it is severely limited due to the expensive computational costs and heavy memory overhead. To overcome this issue, we propose a network architecture and training algorithm for PINNs. The proposed method, separable PINN (SPINN), operates on a per-axis basis to significantly reduce the number of network propagations in multi-dimensional PDEs unlike point-wise processing in conventional PINNs. We also propose using forward-mode automatic differentiation to reduce the computational cost of computing PDE residuals, enabling a large number of collocation points (>10^7) on a single commodity GPU. The experimental results show drastically reduced computational costs (62x in wall-clock time, 1,394x in FLOPs given the same number of collocation points) in multi-dimensional PDEs while achieving better accuracy. Furthermore, we present that SPINN can solve a chaotic (2+1)-d Navier-Stokes equation significantly faster than the best-performing prior method (9 minutes vs 10 hours in a single GPU), maintaining accuracy. Finally, we showcase that SPINN can accurately obtain the solution of a highly nonlinear and multi-dimensional PDE, a (3+1)-d Navier-Stokes equation.

翻訳日:2023-06-29 15:27:21 公開日:2023-06-28

# 階層型強化学習による都市自律運転の行動と軌道計画

Action and Trajectory Planning for Urban Autonomous Driving with Hierarchical Reinforcement Learning ( http://arxiv.org/abs/2306.15968v1 )

ライセンス: Link先を確認

Xinyang Lu, Flint Xiaofeng Fan and Tianying Wang

(参考訳) 強化学習(rl)は、単純な運転シナリオにおける自動運転車(avs)の計画と意思決定において有望な進歩を遂げた。しかし、AVのための既存のRLアルゴリズムは、複雑な都市シナリオにおいて重要な運転スキルを学ばない。第一に、都市運転シナリオは、従来のRLアルゴリズムが不可能な複数の運転タスクを扱うためにAVを必要とする。第2に、都市シナリオにおける他の車両の存在は動的に変化する環境をもたらし、rlアルゴリズムはavの動作と軌道を計画する。本研究では,階層的強化学習(athrl)法を用いて,ライダーとバードアイの視覚の知覚を用いて,階層的モデルにおけるエージェントの挙動をモデル化するアクションおよび軌道プランナーを提案する。提案手法は,エージェントの将来の軌跡を決定することを学習し,階層DDPGアルゴリズムに基づいて目標経路を連続的に計算する。 athrlモデルによって計画されたウェイポイントは低レベルコントローラに送られ、車両の操縦に必要な操舵およびスロットルコマンドを生成する。 athrlの有効性を,carlaシミュレータにおける複数タスクからなる複雑な都市走行シナリオにおける広範囲な実験により実証的に検証した。実験結果から, 最先端RL法と比較して, 大幅な性能向上が示唆された。

Reinforcement Learning (RL) has made promising progress in planning and decision-making for Autonomous Vehicles (AVs) in simple driving scenarios. However, existing RL algorithms for AVs fail to learn critical driving skills in complex urban scenarios. First, urban driving scenarios require AVs to handle multiple driving tasks of which conventional RL algorithms are incapable. Second, the presence of other vehicles in urban scenarios results in a dynamically changing environment, which challenges RL algorithms to plan the action and trajectory of the AV. In this work, we propose an action and trajectory planner using Hierarchical Reinforcement Learning (atHRL) method, which models the agent behavior in a hierarchical model by using the perception of the lidar and birdeye view. The proposed atHRL method learns to make decisions about the agent's future trajectory and computes target waypoints under continuous settings based on a hierarchical DDPG algorithm. The waypoints planned by the atHRL model are then sent to a low-level controller to generate the steering and throttle commands required for the vehicle maneuver. We empirically verify the efficacy of atHRL through extensive experiments in complex urban driving scenarios that compose multiple tasks with the presence of other vehicles in the CARLA simulator. The experimental results suggest a significant performance improvement compared to the state-of-the-art RL methods.

翻訳日:2023-06-29 15:26:56 公開日:2023-06-28

# 高速融合移動によるグラフ補間

Graph Interpolation via Fast Fused-Gromovization ( http://arxiv.org/abs/2306.15963v1 )

ライセンス: Link先を確認

Xinyu Ma, Xu Chu, Yasha Wang, Yang Lin, Junfeng Zhao, Liantao Ma, Wenwu Zhu

(参考訳) グラフデータの増大はグラフレベルの分類のためのグラフニューラルネットワーク(GNN)の一般化性と堅牢性を高めるのに有効であることが証明されている。しかし、既存の手法は主にグラフ信号空間とグラフ構造空間を独立に拡張することに集中し、それらの相互作用を見越す。本稿では,グラフ構造と信号間の相互作用を考慮に入れたグラフ間のノードマッチングのための最適戦略を見つけることを目的とした,最適輸送問題としてこの問題を定式化する。この問題に対処するために、FGWMixupと呼ばれる新しいグラフ混合アルゴリズムを提案し、FGW(Fused Gromov-Wasserstein)計量空間を利用して、ソースグラフの「中間点」を同定する。この手法のスケーラビリティを向上させるために, 収束率を$\mathcal{o}(t^{-1})$から$\mathcal{o}(t^{-2})$にすることで, fgwmixupを高速化する緩和されたfgwソルバを導入する。古典的(MPNN)と先進的(Graphormers)のGNNバックボーンを併用した5つのデータセットで行われた大規模な実験は、GNNの一般化性と堅牢性を改善する上でFGWMixupの有効性を示した。

Graph data augmentation has proven to be effective in enhancing the generalizability and robustness of graph neural networks (GNNs) for graph-level classifications. However, existing methods mainly focus on augmenting the graph signal space and the graph structure space independently, overlooking their joint interaction. This paper addresses this limitation by formulating the problem as an optimal transport problem that aims to find an optimal strategy for matching nodes between graphs considering the interactions between graph structures and signals. To tackle this problem, we propose a novel graph mixup algorithm dubbed FGWMixup, which leverages the Fused Gromov-Wasserstein (FGW) metric space to identify a "midpoint" of the source graphs. To improve the scalability of our approach, we introduce a relaxed FGW solver that accelerates FGWMixup by enhancing the convergence rate from $\mathcal{O}(t^{-1})$ to $\mathcal{O}(t^{-2})$. Extensive experiments conducted on five datasets, utilizing both classic (MPNNs) and advanced (Graphormers) GNN backbones, demonstrate the effectiveness of FGWMixup in improving the generalizability and robustness of GNNs.

翻訳日:2023-06-29 15:26:33 公開日:2023-06-28

# 六方晶窒化ホウ素中のホウ素原子価欠陥の基底準位によるロバスト核スピン分極

Robust Nuclear Spin Polarization via Ground-State Level Anti-Crossing of Boron Vacancy Defects in Hexagonal Boron Nitride ( http://arxiv.org/abs/2306.15960v1 )

ライセンス: Link先を確認

Shihao Ru, Zhengzhi Jiang, Haidong Liang, Jonathan Kenny, Hongbing Cai, Xiaodan Lyu, Robert Cernansky, Feifei Zhou, Yuzhe Yang, Kenji Watanabe, Takashi Taniguch, Fuli Li, Koh Teck Seng, Xiaogang Liu, Fedor Jelezko, Andrew A. Bettiol, Weibo Gao

(参考訳) 核スピン偏極は、量子情報処理と量子センシングにおいて重要な役割を果たす。本研究では, 六方晶窒化ホウ素 (h-BN) 中のホウ素空孔欠陥 (\mathrm{V_B^-}$) を基底準位アンチクロス (GSLAC) を用いて, 安定かつ効率的な核スピン分極法を示す。 GSLACによる核分極は励起状態の反交差よりもかなり低いレーザーパワーで達成でき、このプロセスは実験的に実現可能である。さらに、h-BNで$\mathrm{V_B^-}$に対して、核スピンの直接光学的読み出しを実証した。以上の結果から, GSLACはh-BNの欠陥を正確に制御し, 操作するための有望な手法であることが示唆された。

Nuclear spin polarization plays a crucial role in quantum information processing and quantum sensing. In this work, we demonstrate a robust and efficient method for nuclear spin polarization with boron vacancy ($\mathrm{V_B^-}$) defects in hexagonal boron nitride (h-BN) using ground-state level anti-crossing (GSLAC). We show that GSLAC-assisted nuclear polarization can be achieved with significantly lower laser power than excited-state level anti-crossing, making the process experimentally more viable. Furthermore, we have demonstrated direct optical readout of nuclear spins for $\mathrm{V_B^-}$ in h-BN. Our findings suggest that GSLAC is a promising technique for the precise control and manipulation of nuclear spins in $\mathrm{V_B^-}$ defects in h-BN.

翻訳日:2023-06-29 15:26:08 公開日:2023-06-28

# ギャップのブリッジ: クラス不均衡下での一般化のための神経崩壊によるプロンプトチューニング

Bridging the Gap: Neural Collapse Inspired Prompt Tuning for Generalization under Class Imbalance ( http://arxiv.org/abs/2306.15955v1 )

ライセンス: Link先を確認

Didi Zhu, Yinchuan Li, Min Zhang, Junkun Yuan, Jiashuo Liu, Kun Kuang, Chao Wu

(参考訳) 大規模視覚言語モデル (V-L) は, 高速チューニングによる下流タスクの顕著な一般化機能を示した。しかし、実際のシナリオでは一般的な問題であるクラス不均衡の存在下では、パフォーマンスが著しく低下する。本稿では,クラス不均衡がV-Lモデルの一般化性能に及ぼす影響とニューラル崩壊現象をこれらのモデルに拡張し,クラス不均衡が一般化能力に与える影響の幾何学的理由を明らかにする。この問題を解決するために,ニューラル・コラプスに基づくプロンプト・チューニング(NPT)を提案し,テキストと画像の特徴が同じ単純なETF構造を満たすようにプロンプトを最適化する。 NPTは2つの正規化項、幾何脱バイアスとマルチモーダル同型を導入し、一般化能力を保ちながらクラス不均衡条件下でのV-Lモデルのロバスト性を高める。総合実験により,nptは11種類の画像認識データセットで既存のプロンプト学習技術を上回っており,新しいクラスでは絶対平均値2.63\%,不均衡データでは調和平均値2.47\%を達成した。

Large-scale vision-language (V-L) models have demonstrated remarkable generalization capabilities for downstream tasks through prompt tuning. However, their performance suffers significantly in the presence of class imbalance, a common issue in real-world scenarios. In this paper, we investigate the effects of class imbalance on the generalization performance of V-L models and extend Neural Collapse phenomenon to these models, revealing the geometric reasons behind the impact of class imbalance on their generalization ability. To address this problem, we propose Neural Collapse based Prompt Tuning (NPT), a novel method that optimizes prompts so that both text and image features satisfy the same simplex ETF structure. NPT incorporates two regularization terms, geometric de-biasing and multi-modal isomorphism, to enhance the robustness of V-L models under class imbalance conditions while maintaining their generalization capabilities. Our comprehensive experiments show that NPT outperforms existing prompt learning techniques across 11 diverse image recognition datasets, achieving an absolute average gain of 2.63\% for novel classes and 2.47\% for harmonic mean when facing imbalanced data.

翻訳日:2023-06-29 15:25:54 公開日:2023-06-28

# 球面センサのレンズレスイメージングのための角感光レンズ

Angle Sensitive Pixels for Lensless Imaging on Spherical Sensors ( http://arxiv.org/abs/2306.15953v1 )

ライセンス: Link先を確認

Yi Hua, Yongyi Zhao, Aswin C. Sankaranarayanan

(参考訳) 球面センサを用いた撮像のためのレンズレスアーキテクチャであるOrbCamを提案する。レンズレス撮像装置の技術は、主に平面センサの使用に重点を置いているが、そのような設計では、変調素子(例えば振幅マスクや位相マスク)を使用して可逆撮像システムを構築することが重要である。対照的に,曲面上の画素配向の多様性は,シーンとセンサ間のマッピングの条件づけを改善するのに十分であることを示す。したがって、球面センサを撮像する場合、すべての画素は同じ角度応答関数を持つことができ、レンズレス撮像器は互いに同一であり、向きだけが異なる画素で構成されている。本研究では,球面センサにおける画素の角応答設計のための計算ツールについて述べる。シミュレーションと実験室のプロトタイプの両方で設計を検証する。この設計の意義は、レンズレスイメージングを曲面やフレキシブルな面に容易に適用でき、新しいアプリケーション領域を開拓できるということである。

We propose OrbCam, a lensless architecture for imaging with spherical sensors. Prior work in lensless imager techniques have focused largely on using planar sensors; for such designs, it is important to use a modulation element, e.g. amplitude or phase masks, to construct a invertible imaging system. In contrast, we show that the diversity of pixel orientations on a curved surface is sufficient to improve the conditioning of the mapping between the scene and the sensor. Hence, when imaging on a spherical sensor, all pixels can have the same angular response function such that the lensless imager is comprised of pixels that are identical to each other and differ only in their orientations. We provide the computational tools for the design of the angular response of the pixels in a spherical sensor that leads to well-conditioned and noise-robust measurements. We validate our design in both simulation and a lab prototype. The implications of our design is that the lensless imaging can be enabled easily for curved and flexible surfaces thereby opening up a new set of application domains.

翻訳日:2023-06-29 15:25:33 公開日:2023-06-28

# 完全正の写像に対する極小完備定理とほぼすべての同値性

A minimal completion theorem and almost everywhere equivalence for Completely Positive maps ( http://arxiv.org/abs/2306.15952v1 )

ライセンス: Link先を確認

B. V. Rajarama Bhat, Arghya Chongdar

(参考訳) C*-代数上の線型写像を完全正の写像に完備化する問題を分析する。そのような完備化が実現可能であれば、一意的な極小完備が存在することが示される。この定理は、いくつかの非常に一般的な条件下では、準純写像とほぼ至る所で完全に正の写像が実際にその写像と等しいことを示すために用いられる。

A problem of completing a linear map on C*-algebras to a completely positive map is analyzed. It is shown that whenever such a completion is feasible there exists a unique minimal completion. This theorem is used to show that under some very general conditions a completely positive map almost everywhere equivalent to a quasi-pure map is actually equal to that map.

翻訳日:2023-06-29 15:25:16 公開日:2023-06-28

# 零点スキップによる畳み込み層の計算複雑性の低減

Reduce Computational Complexity for Convolutional Layers by Skipping Zeros ( http://arxiv.org/abs/2306.15951v1 )

ライセンス: Link先を確認

Zhiyi Zhang, Pengfei Zhang, Zhuopin Xu, Qi Wang

(参考訳) ディープニューラルネットワークはアクセラレーションのために並列プロセッサに依存している。オペレータを設計するには、複雑さを減らすための優れたアルゴリズムだけでなく、ハードウェアの十分な利用が必要である。畳み込み層は主に3種類の演算子を含む:前方伝播における畳み込み、逆伝播における畳み込み、拡張畳み込み。これらの演算子を実行するとき、0は常にテンソルに追加され、冗長な計算を引き起こす。本稿では, c-k-sアルゴリズム(convv2, ks-deconv, sk-dilated)について述べる。フィルタを分割してパッド付き0を除外し, 疎テンソルを密度テンソルに変換する。通常の畳み込みとは対照的に、畳み込みは複雑さのため加速しにくい。本稿では,C-K-Sの高性能GPU実装について述べるとともに,PyTorchとの比較による検証を行った。実験によると、C-K-SはPyTorchよりも利点があり、特に小さな特徴写像のデコンボリューションにおいて有利である。 C-K-Sのさらなる強化は、特定のGPUアーキテクチャで完全な最適化を行うことによって達成できる。

Deep neural networks rely on parallel processors for acceleration. To design operators for them, it requires not only good algorithm to reduce complexity, but also sufficient utilization of hardwares. Convolutional layers mainly contain 3 kinds of operators: convolution in forward propagation, deconvolution and dilated-convolution in backward propagation. When executing these operators, 0s are always added to tensors, causing redundant calculations. This paper gives C-K-S algorithm (ConvV2, KS-deconv, Sk-dilated), which skips these 0s in two ways: trim the filters to exclude padded 0s; transform sparse tensors to dense tensors, to avoid inserted 0s in deconvolution and dilated-convolution. In contrast to regular convolution, deconvolution is hard to accelerate due to its complicacy. This paper provides high-performance GPU implementations of C-K-S, and verifies their effectiveness with comparison to PyTorch. According to the experiments, C-K-S has advantages over PyTorch in certain cases, especially in deconvolution on small feature-maps. Further enhancement of C-K-S can be done by making full optimizations oriented at specific GPU architectures.

翻訳日:2023-06-29 15:25:08 公開日:2023-06-28

# 大規模言語モデルの時代におけるクエリ理解

Query Understanding in the Age of Large Language Models ( http://arxiv.org/abs/2306.16004v1 )

ライセンス: Link先を確認

Avishek Anand, Venktesh V, Abhijit Anand, Vinay Setty

(参考訳) 大規模言語モデル(llm)の普及と普及に伴い,自然言語を用いた検索・会話・制御や情報検索インターフェースが急速に普及している。本稿では,LLMを用いた対話型クエリ書き換えのための汎用フレームワークについて述べる。提案手法は,LLMを用いた高性能検索システムを構築しながら,改良的で透明な意図理解のための新たな機会を開拓することを目的としている。我々のフレームワークの重要な側面は、最終検索フェーズの前にさらに洗練され、制御され、編集される自然言語において、リライターが検索エンジンによるマシンインテントを十分に指定できることである。自然言語における機械の意図を表現、相互作用、推論する能力は、透明性、ランキングパフォーマンス、および意図を理解するために教師付きシグナルが収集される伝統的な方法からの離脱に重大な影響を与える。この対話型クエリ理解フレームワークに対するオープンな質問とともに、最初の実験を背景としたコンセプトを詳述する。

Querying, conversing, and controlling search and information-seeking interfaces using natural language are fast becoming ubiquitous with the rise and adoption of large-language models (LLM). In this position paper, we describe a generic framework for interactive query-rewriting using LLMs. Our proposal aims to unfold new opportunities for improved and transparent intent understanding while building high-performance retrieval systems using LLMs. A key aspect of our framework is the ability of the rewriter to fully specify the machine intent by the search engine in natural language that can be further refined, controlled, and edited before the final retrieval phase. The ability to present, interact, and reason over the underlying machine intent in natural language has profound implications on transparency, ranking performance, and a departure from the traditional way in which supervised signals were collected for understanding intents. We detail the concept, backed by initial experiments, along with open questions for this interactive query understanding framework.

翻訳日:2023-06-29 15:16:59 公開日:2023-06-28

# 音声駆動音声合成のテキスト化

Reprogramming Audio-driven Talking Face Synthesis into Text-driven ( http://arxiv.org/abs/2306.16003v1 )

ライセンス: Link先を確認

Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro

(参考訳) 本稿では,テキスト入力で操作できるように,事前学習した音声駆動対話顔合成モデルを再プログラムする手法を提案する。音声駆動発話顔合成モデルは、音声音声を入力として、所望の音声内容の発話アバターを生成するため、予め音声録音を行う必要がある。しかし、再生されるすべてのビデオの音声を記録するのは面倒だ。この問題を軽減するために,事前学習された音声駆動モデルの学習音声潜在空間に入力テキストを埋め込む新しい手法を提案する。そこで我々は,テキスト入力から音声潜在機能へのマッピングを学ぶために,tem(text-to-audio embedded module)を設計した。さらに,音声特徴に含まれる話者特性をモデル化するために,単一顔画像から得られるTAEMに視覚的話者埋め込みを注入することを提案する。訓練後、テキストか音声のどちらかの音声で会話の対面ビデオを合成できる。

In this paper, we propose a method to reprogram pre-trained audio-driven talking face synthesis models to be able to operate with text inputs. As the audio-driven talking face synthesis model takes speech audio as inputs, in order to generate a talking avatar with the desired speech content, speech recording needs to be performed in advance. However, this is burdensome to record audio for every video to be generated. In order to alleviate this problem, we propose a novel method that embeds input text into the learned audio latent space of the pre-trained audio-driven model. To this end, we design a Text-to-Audio Embedding Module (TAEM) which is guided to learn to map a given text input to the audio latent features. Moreover, to model the speaker characteristics lying in the audio features, we propose to inject visual speaker embedding into the TAEM, which is obtained from a single face image. After training, we can synthesize talking face videos with either text or speech audio.

翻訳日:2023-06-29 15:16:45 公開日:2023-06-28

# ディープラーニングによる公衆衛生研究のためのソーシャルメディア情報検索の合理化

Streamlining Social Media Information Retrieval for Public Health Research with Deep Learning ( http://arxiv.org/abs/2306.16001v1 )

ライセンス: Link先を確認

Yining Hua, Shixu Lin, Minghui Li, Yujie Zhang, Peilin Zhou, Ying-Chih Lo, Li Zhou, Jie Yang

(参考訳) 流行監視におけるソーシャルメディアの利用はよく確立されている。それでも、事前に定義されたレキシコンを用いて関連するコーパスを検索する場合、しばしばバイアスが発生する。本研究は,医学用語体系とUMLS概念の広範な辞書のキュレーションを目的としたフレームワークを提案する。このフレームワークは、ソーシャルメディアコンテンツから医療エンティティを識別するBERTベースの名前付きエンティティ認識(NER)モデルと、抽出されたエンティティを標準化するディープラーニング駆動正規化モジュールと、最も確率の高いUMLS概念を標準化されたエンティティに割り当てる半教師付きクラスタリングモジュールの3つのモジュールから構成される。この枠組みを2020年2月1日から2022年4月30日までのCOVID-19関連ツイートに適用し,876 UMLS概念にマッピングされた9,249の標準化されたエンティティと38,175の言語表現からなる症状辞書(https://github.com/ningkko/UMLS_colloquialism/)を生成した。この枠組みは,ソーシャルメディアを用いた公衆衛生研究におけるキーワードマッチング情報検索の制約に対処できる可能性を示す。

The utilization of social media in epidemic surveillance has been well established. Nonetheless, bias is often introduced when pre-defined lexicons are used to retrieve relevant corpus. This study introduces a framework aimed at curating extensive dictionaries of medical colloquialisms and Unified Medical Language System (UMLS) concepts. The framework comprises three modules: a BERT-based Named Entity Recognition (NER) model that identifies medical entities from social media content, a deep-learning powered normalization module that standardizes the extracted entities, and a semi-supervised clustering module that assigns the most probable UMLS concept to each standardized entity. We applied this framework to COVID-19-related tweets from February 1, 2020, to April 30, 2022, generating a symptom dictionary (available at https://github.com/ningkko/UMLS_colloquialism/) composed of 9,249 standardized entities mapped to 876 UMLS concepts and 38,175 colloquial expressions. This framework demonstrates encouraging potential in addressing the constraints of keyword matching information retrieval in social media-based public health research.

翻訳日:2023-06-29 15:16:29 公開日:2023-06-28

# 量子力学におけるエネルギー密度について

On the energy density in quantum mechanics ( http://arxiv.org/abs/2306.15999v1 )

ライセンス: Link先を確認

Francisco Torres Arvizu, Adrian Ortega, and Hern\'an Larralde

(参考訳) 量子力学におけるエネルギー密度の定義はいくつかある。これらの降伏式は局所的に異なるが、全て連続性方程式を満たし、考慮中の系の期待エネルギーの値に積分する。したがって、ある定義を他の定義から選択する物理的根拠が存在するかどうかという問題は自然に生じる。本研究では, 量子粒子を含む井戸の大きさを変化させることで, システムの探究方法を提案する。壁を移動させることによる平均的な作業はエネルギー密度の定義の1つと密接に関連していることを示す。具体的には、壁面で評価された適切なエネルギー密度は、粒子が局所的に作用する力に対応し、そこで作業を行う。この同定は2次元系と3次元系に拡張される。

There are several definitions of energy density in quantum mechanics. These yield expressions that differ locally, but all satisfy a continuity equation and integrate to the value of the expected energy of the system under consideration. Thus, the question of whether there are physical grounds to choose one definition over another arises naturally. In this work, we propose a way to probe a system by varying the size of a well containing a quantum particle. We show that the mean work done by moving the wall is closely related to one of the definitions for energy density. Specifically, the appropriate energy density, evaluated at the wall corresponds to the force exerted by the particle locally, against which the work is done. We show that this identification extends to two and three dimensional systems.

翻訳日:2023-06-29 15:16:07 公開日:2023-06-28

# ラベル雑音補正がMLフェアネスに及ぼす影響の系統解析

Systematic analysis of the impact of label noise correction on ML Fairness ( http://arxiv.org/abs/2306.15994v1 )

ライセンス: Link先を確認

I. Oliveira e Silva, C. Soares, I. Sousa, R. Ghani

(参考訳) 任意、矛盾、あるいは欠陥のある意思決定は深刻な懸念を生じさせ、不公平なモデルを防ぐことは、機械学習においてますます重要な課題である。データはしばしば過去の差別行動を反映し、そのようなデータに基づいてトレーニングされたモデルは、性別、人種、年齢などのセンシティブな属性に偏りを反映する可能性がある。公正なモデルを開発するための1つのアプローチは、トレーニングデータを前処理して、例えばバイアス付きラベルを修正することで、関連する情報を保持しながら、基礎となるバイアスを取り除くことである。複数のラベルのノイズ補正手法が利用可能であるが、識別におけるその行動に関する情報は非常に限られている。本研究では,ラベルノイズ補正手法の有効性を定量的に評価し,偏りのあるデータセットで学習したモデルの公平性を保証する実験手法を開発した。提案手法はラベルノイズ量を操作することで,公平性ベンチマークだけでなく,標準mlデータセットでも使用できる。提案手法を適用し,標準OpenMLデータセットの公平度測定値に基づいて6つのラベルノイズ補正手法を解析する。その結果,ハイブリッドラベル雑音補正法は,予測性能と公平性との最良のトレードオフを実現することが示唆された。しかしながら、クラスタリングに基づく補正は、予測パフォーマンスを低下させるコストで、最も差別を低減できる。

Arbitrary, inconsistent, or faulty decision-making raises serious concerns, and preventing unfair models is an increasingly important challenge in Machine Learning. Data often reflect past discriminatory behavior, and models trained on such data may reflect bias on sensitive attributes, such as gender, race, or age. One approach to developing fair models is to preprocess the training data to remove the underlying biases while preserving the relevant information, for example, by correcting biased labels. While multiple label noise correction methods are available, the information about their behavior in identifying discrimination is very limited. In this work, we develop an empirical methodology to systematically evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets. Our methodology involves manipulating the amount of label noise and can be used with fairness benchmarks but also with standard ML datasets. We apply the methodology to analyze six label noise correction methods according to several fairness metrics on standard OpenML datasets. Our results suggest that the Hybrid Label Noise Correction method achieves the best trade-off between predictive performance and fairness. Clustering-Based Correction can reduce discrimination the most, however, at the cost of lower predictive performance.

翻訳日:2023-06-29 15:15:56 公開日:2023-06-28

# MyDigitalFootprint: エッジにおける分散コンピューティングアプリケーションのための広範なコンテキストデータセット

MyDigitalFootprint: an extensive context dataset for pervasive computing applications at the edge ( http://arxiv.org/abs/2306.15990v1 )

ライセンス: Link先を確認

Mattia Giovanni Campana, Franca Delmastro

(参考訳) 接続されたスマートデバイスの広範な普及は、インターネットの最先端における急速な拡大と進化に寄与した。パーソナルモバイルデバイスは周囲の他のスマートオブジェクトと対話し、急速に変化するユーザーコンテキストに基づいて行動に適応する。このデータをローカルに処理するモバイルデバイスの能力は、迅速な適応には不可欠である。これは、ユーザアプリケーションやコンテキスト処理用のミドルウェアプラットフォームに統合された単一の開発プロセスによって実現できます。しかし、モバイル環境におけるユーザコンテキストの複雑さを考慮した公開データセットの欠如は、研究の進展を妨げる。 mydigitalfootprintは,スマートフォンのセンサデータ,物理的近接情報,オンラインソーシャルネットワークインタラクションからなる大規模データセットである。このデータセットはマルチモーダルコンテキスト認識と社会的関係モデリングをサポートする。自然環境における31人のボランティアユーザーによる2ヶ月の計測で、制限のない行動を可能にする。既存の公開データセットは、特定のアプリケーションに対する限られたコンテキストデータに重点を置いています。データセットの有効性を示すために,様々な機械学習タスクを活用した3つのコンテキスト認識アプリケーションを提案する。 (i)物理的近接データに基づくソーシャルリンク予測アルゴリズム (ii)スマートフォン内蔵センサデータを用いた日常生活行動認識 (iii)広義の文脈認識推薦システム。我々のデータセットは、その異質な情報と共に、モバイルおよびエッジコンピューティングにおける新しい研究を検証する貴重なリソースとして役立ちます。

The widespread diffusion of connected smart devices has contributed to the rapid expansion and evolution of the Internet at its edge. Personal mobile devices interact with other smart objects in their surroundings, adapting behavior based on rapidly changing user context. The ability of mobile devices to process this data locally is crucial for quick adaptation. This can be achieved through a single elaboration process integrated into user applications or a middleware platform for context processing. However, the lack of public datasets considering user context complexity in the mobile environment hinders research progress. We introduce MyDigitalFootprint, a large-scale dataset comprising smartphone sensor data, physical proximity information, and Online Social Networks interactions. This dataset supports multimodal context recognition and social relationship modeling. It spans two months of measurements from 31 volunteer users in their natural environment, allowing for unrestricted behavior. Existing public datasets focus on limited context data for specific applications, while ours offers comprehensive information on the user context in the mobile environment. To demonstrate the dataset's effectiveness, we present three context-aware applications utilizing various machine learning tasks: (i) a social link prediction algorithm based on physical proximity data, (ii) daily-life activity recognition using smartphone-embedded sensors data, and (iii) a pervasive context-aware recommender system. Our dataset, with its heterogeneity of information, serves as a valuable resource to validate new research in mobile and edge computing.

翻訳日:2023-06-29 15:15:36 公開日:2023-06-28

# テンソルフォーマ:高品質点雲再構成のための正規化マトリックスアテンショントランス

Tensorformer: Normalized Matrix Attention Transformer for High-quality Point Cloud Reconstruction ( http://arxiv.org/abs/2306.15989v1 )

ライセンス: Link先を確認

Hui Tian, Zheng Qin, Renjiao Yi, Chenyang Zhu, Kai Xu

(参考訳) 生のポイントクラウドからの表面復元は、コンピュータグラフィックスコミュニティで何十年も研究されてきた。ポアソン曲面再構成のような古典的な解は、合理的な結果を得るために余分な入力として点正規化を必要とする。現代の変圧器に基づく手法は正規化なしでは機能するが、離散点からの局所融合における符号化性能の制限により、結果はより微細化されていない。高品質な再構成を行うための新しい正規化行列アテンショントランス(Tensorformer)を提案する。提案した行列アテンションにより、同時にポイントワイドとチャネルワイドのメッセージパッシングが可能となり、一方、以前のベクトルアテンションは異なるチャネル間で隣接するポイント情報を失う。これにより、機能学習の自由度が高まり、ローカルジオメトリのモデリングが容易になる。提案手法は,ShapeNetCoreとABCの2つの一般的なデータセットの最先端化を実現し,ShapeNet上のIOUを4%改善する。私たちの実装は受け入れ次第リリースします。

Surface reconstruction from raw point clouds has been studied for decades in the computer graphics community, which is highly demanded by modeling and rendering applications nowadays. Classic solutions, such as Poisson surface reconstruction, require point normals as extra input to perform reasonable results. Modern transformer-based methods can work without normals, while the results are less fine-grained due to limited encoding performance in local fusion from discrete points. We introduce a novel normalized matrix attention transformer (Tensorformer) to perform high-quality reconstruction. The proposed matrix attention allows for simultaneous point-wise and channel-wise message passing, while the previous vector attention loses neighbor point information across different channels. It brings more degree of freedom in feature learning and thus facilitates better modeling of local geometries. Our method achieves state-of-the-art on two commonly used datasets, ShapeNetCore and ABC, and attains 4% improvements on IOU on ShapeNet. Our implementation will be released upon acceptance.

翻訳日:2023-06-29 15:15:17 公開日:2023-06-28

# AFPN:オブジェクト検出のための漸近的特徴ピラミッドネットワーク

AFPN: Asymptotic Feature Pyramid Network for Object Detection ( http://arxiv.org/abs/2306.15988v1 )

ライセンス: Link先を確認

Guoyu Yang, Jie Lei, Zhikuan Zhu, Siyu Cheng, Zunlei Feng, Ronghua Liang

(参考訳) マルチスケール機能は、オブジェクト検出タスクのばらつきを伴うオブジェクトのエンコーディングにおいて非常に重要である。マルチスケール機能抽出のための一般的な戦略は、古典的なトップダウンおよびボトムアップ機能ピラミッドネットワークを採用することだ。しかし,これらの手法は特徴情報の喪失や劣化に悩まされ,非隣接レベルの融合効果を損なう。本稿では,非隣接レベルで直接インタラクションをサポートする漸近的特徴ピラミッドネットワーク(afpn)を提案する。 AFPNは隣接する2つの低レベル特徴を融合させて開始され、漸近的に高レベル特徴を融合プロセスに組み込む。このように、非隣接レベル間の大きな意味的ギャップを回避できる。各空間位置における特徴融合時に発生する多目的情報衝突の可能性を考えると、適応的な空間融合操作によりこれらの矛盾を軽減できる。提案したAFPNを2段階および1段階のオブジェクト検出フレームワークに組み込んで,MS-COCO 2017バリデーションとテストデータセットを用いて評価する。実験結果から,本手法は他の最先端機能ピラミッドネットワークよりも高い競合性が得られた。コードは \href{https://github.com/gyyang23/afpn}{https://github.com/gyyang23/afpn} で入手できる。

Multi-scale features are of great importance in encoding objects with scale variance in object detection tasks. A common strategy for multi-scale feature extraction is adopting the classic top-down and bottom-up feature pyramid networks. However, these approaches suffer from the loss or degradation of feature information, impairing the fusion effect of non-adjacent levels. This paper proposes an asymptotic feature pyramid network (AFPN) to support direct interaction at non-adjacent levels. AFPN is initiated by fusing two adjacent low-level features and asymptotically incorporates higher-level features into the fusion process. In this way, the larger semantic gap between non-adjacent levels can be avoided. Given the potential for multi-object information conflicts to arise during feature fusion at each spatial location, adaptive spatial fusion operation is further utilized to mitigate these inconsistencies. We incorporate the proposed AFPN into both two-stage and one-stage object detection frameworks and evaluate with the MS-COCO 2017 validation and test datasets. Experimental evaluation shows that our method achieves more competitive results than other state-of-the-art feature pyramid networks. The code is available at \href{https://github.com/gyyang23/AFPN}{https://github.com/gyyang23/AFPN}.

翻訳日:2023-06-29 15:14:58 公開日:2023-06-28

# 日本語文分類と名前付きエンティティ認識のマルチタスク学習のための文間ラベル生成フレームワーク

Sentence-to-Label Generation Framework for Multi-task Learning of Japanese Sentence Classification and Named Entity Recognition ( http://arxiv.org/abs/2306.15978v1 )

ライセンス: Link先を確認

Chengguang Gan, Qinghao Zhang and Tatsunori Mori

(参考訳) 情報抽出(IE)は自然言語処理において重要なサブフィールドである。本研究では,Sentence Classification (SC) と Named Entity Recognition (NER) を組み合わせた,Sentence Classification and Named Entity Recognition Multi-task (SCNM) アプローチを提案する。我々はSCNMのためのSLGフレームワークを開発し、SCとNERの両方を含むウィキペディアデータセットを構築する。フォーマット変換器を用いて入力形式を統一し,生成モデルを用いてscラベル,nerラベル,関連するテキストセグメントを生成する。生成フォーマットの精度を向上させるための制約機構(cm)を提案する。その結果,SCの精度はSCNMでは1.13ポイント,NERでは1.06ポイント向上し,CMでは63.61から100に向上した。その結果,scとnerの相互強化効果が示され,統合により両タスクの性能が向上した。

Information extraction(IE) is a crucial subfield within natural language processing. In this study, we introduce a Sentence Classification and Named Entity Recognition Multi-task (SCNM) approach that combines Sentence Classification (SC) and Named Entity Recognition (NER). We develop a Sentence-to-Label Generation (SLG) framework for SCNM and construct a Wikipedia dataset containing both SC and NER. Using a format converter, we unify input formats and employ a generative model to generate SC-labels, NER-labels, and associated text segments. We propose a Constraint Mechanism (CM) to improve generated format accuracy. Our results show SC accuracy increased by 1.13 points and NER by 1.06 points in SCNM compared to standalone tasks, with CM raising format accuracy from 63.61 to 100. The findings indicate mutual reinforcement effects between SC and NER, and integration enhances both tasks' performance.

翻訳日:2023-06-29 15:14:36 公開日:2023-06-28

# 次元構造に基づくクロスモーダル学習のための知識蒸留法

A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning ( http://arxiv.org/abs/2306.15977v1 )

ライセンス: Link先を確認

Lingyu Si, Hongwei Dong, Wenwen Qiang, Junzhi Yu, Wenlong Zhai, Changwen Zheng, Fanjiang Xu, Fuchun Sun

(参考訳) データ品質の制限のため、いくつかの重要な視覚タスクは独立して実行するのは難しい。情報的な暗黒知識を伝達するために、これまで利用できなかった情報を導入することは、そのような困難な課題を解決する一般的な方法である。しかし、なぜ転向した知識労働が広範に研究されていないのか。本稿では,単純かつ難解な課題から抽出された特徴を解析・観察することにより,特徴判別性と次元構造(ds)との相関性を見出す。そこで我々は, 深いチャネル関係と中間空間分布を用いてDSを表現し, 教師付きクロスモーダル学習(CML)の性能向上のための新しいクロスモーダル知識蒸留法を提案する。提案手法では,出力特徴をチャネル毎に独立し,中間特徴を均一に分散させることで,難課題から意味的に無関係な特徴を学習し,その正確性を高める。これは、二重モード間の性能ギャップが比較的大きい特定のアプリケーションで特に有用である。さらに,コミュニティ開発を促進するために,実世界のCMLデータセットを収集した。データセットには1万以上の光学画像とレーダー画像が含まれており、継続的に更新されている。実世界およびベンチマークデータセットにおける実験結果は,提案手法の有効性を検証する。

Due to limitations in data quality, some essential visual tasks are difficult to perform independently. Introducing previously unavailable information to transfer informative dark knowledge has been a common way to solve such hard tasks. However, research on why transferred knowledge works has not been extensively explored. To address this issue, in this paper, we discover the correlation between feature discriminability and dimensional structure (DS) by analyzing and observing features extracted from simple and hard tasks. On this basis, we express DS using deep channel-wise correlation and intermediate spatial distribution, and propose a novel cross-modal knowledge distillation (CMKD) method for better supervised cross-modal learning (CML) performance. The proposed method enforces output features to be channel-wise independent and intermediate ones to be uniformly distributed, thereby learning semantically irrelevant features from the hard task to boost its accuracy. This is especially useful in specific applications where the performance gap between dual modalities is relatively large. Furthermore, we collect a real-world CML dataset to promote community development. The dataset contains more than 10,000 paired optical and radar images and is continuously being updated. Experimental results on real-world and benchmark datasets validate the effectiveness of the proposed method.

翻訳日:2023-06-29 15:14:19 公開日:2023-06-28

# 物理・仮想センサデータを組み合わせたユーザコンテキストの軽量モデリング

Lightweight Modeling of User Context Combining Physical and Virtual Sensor Data ( http://arxiv.org/abs/2306.16029v1 )

ライセンス: Link先を確認

Mattia Giovanni Campana, Dimitris Chatzopoulos, Franca Delmastro, Pan Hui

(参考訳) ユーザのモバイルデバイスで利用可能なセンサによって生成される多数のデータと、機械学習技術の進歩、ユーザの現在の状況(物理的コンテキスト)を認識し、システムのパーソナライズ機能を最適化するコンテキスト認識サービスのサポートを組み合わせる。しかし、コンテキスト認識性能は主にコンテキスト推論プロセスの精度に依存しており、これは大規模およびラベル付きデータセットの可用性に厳密に関係している。本研究では,パーソナルモバイルデバイスから得られた異種センシングデータを含むデータセットを収集するフレームワークを提案する。このフレームワークは3人の任意ユーザが2週間使用し、36K以上のサンプルと1331の機能を持つデータセットを生成する。また,ユーザモバイルデバイス上で推論処理全体を効率的に実行するユーザコンテキストをモデル化する軽量なアプローチを提案する。この目的のために, 文脈分類を最適化するために6次元化手法を用いた。生成したデータセットに対する実験結果から,精度損失を3%以下に抑えつつ,10倍のスピードアップと90%以上の特徴低下を実現した。

The multitude of data generated by sensors available on users' mobile devices, combined with advances in machine learning techniques, support context-aware services in recognizing the current situation of a user (i.e., physical context) and optimizing the system's personalization features. However, context-awareness performances mainly depend on the accuracy of the context inference process, which is strictly tied to the availability of large-scale and labeled datasets. In this work, we present a framework developed to collect datasets containing heterogeneous sensing data derived from personal mobile devices. The framework has been used by 3 voluntary users for two weeks, generating a dataset with more than 36K samples and 1331 features. We also propose a lightweight approach to model the user context able to efficiently perform the entire reasoning process on the user mobile device. To this aim, we used six dimensionality reduction techniques in order to optimize the context classification. Experimental results on the generated dataset show that we achieve a 10x speed up and a feature reduction of more than 90% while keeping the accuracy loss less than 3%.

翻訳日:2023-06-29 15:08:27 公開日:2023-06-28

# 古典と量子学習者の指数的分離

Exponential separations between classical and quantum learners ( http://arxiv.org/abs/2306.16028v1 )

ライセンス: Link先を確認

Casper Gyurik and Vedran Dunjko

(参考訳) かなりの努力にもかかわらず、量子機械学習コミュニティは、古典的なデータを扱う際に、人工暗号に触発されたデータセットに対して量子学習の利点しか示していない。本稿では、量子学習アルゴリズムが古典学習アルゴリズムよりも証明可能な指数的高速化を達成できる学習問題を見つけることの課題に対処する。本稿では,この問題に関連する計算学習理論の概念を考察し,定義の微妙な違いが,学習者が満足して解決すべき要件や課題をいかに大きく異なるものにするかを考察する。証明可能な量子スピードアップによる既存の学習問題を検証し、データを生成する関数を識別するのではなく、その古典的な難易度に大きく依存していることを見出した。そこで本研究では,古典的難易度は主にデータ生成関数の同定にある2つの新しい学習分離を提案する。さらに、データが量子生成されるシナリオで量子速度を証明できる計算のハードネスの仮定を探求し、より自然な設定(凝縮物や高エネルギー物理学など)で量子の利点を示唆する。また,学習分離の文脈における古典的な影のパラダイムの限界や,物質相の特徴付けやハミルトン学習といった物理的動機づけのある設定が計算学習フレームワークにどのように適合するかについても論じた。

Despite significant effort, the quantum machine learning community has only demonstrated quantum learning advantages for artificial cryptography-inspired datasets when dealing with classical data. In this paper we address the challenge of finding learning problems where quantum learning algorithms can achieve a provable exponential speedup over classical learning algorithms. We reflect on computational learning theory concepts related to this question and discuss how subtle differences in definitions can result in significantly different requirements and tasks for the learner to meet and solve. We examine existing learning problems with provable quantum speedups and find that they largely rely on the classical hardness of evaluating the function that generates the data, rather than identifying it. To address this, we present two new learning separations where the classical difficulty primarily lies in identifying the function generating the data. Furthermore, we explore computational hardness assumptions that can be leveraged to prove quantum speedups in scenarios where data is quantum-generated, which implies likely quantum advantages in a plethora of more natural settings (e.g., in condensed matter and high energy physics). We also discuss the limitations of the classical shadow paradigm in the context of learning separations, and how physically-motivated settings such as characterizing phases of matter and Hamiltonian learning fit in the computational learning framework.

翻訳日:2023-06-29 15:08:09 公開日:2023-06-28

# フェデレーション学習に基づく分散計算モデル : 時間変動問題を解決するための異種モデルとコンソーシアムブロックチェーンの統合

A Distributed Computation Model Based on Federated Learning Integrates Heterogeneous models and Consortium Blockchain for Solving Time-Varying Problems ( http://arxiv.org/abs/2306.16023v1 )

ライセンス: Link先を確認

Zhihao Hao, Guancheng Wang, Chunwei Tian, Bob Zhang

(参考訳) リカレントニューラルネットワークは,複雑な環境に対応する時間変動問題を効果的に解決するために開発された。しかし、集中処理の方法によって制限されたモデル性能は、実際のモデルやデータのサイロ問題のような要因によって大きく影響を受ける。したがって、フェデレーション学習(fl)のような分散人工知能の出現は、モデル間の動的集約を可能にする。しかしながら、flの統合プロセスは依然としてサーバに依存しており、モデル全体に大きなリスクをもたらす可能性がある。また、均質なモデル間の協調のみが可能であり、異質なモデル間の相互作用に対する良い解決策を持っていない。そこで本研究では,コンソーシアムブロックチェーンネットワークに基づく分散計算モデル(DCM)を提案する。さらに、グローバルソリューションプロセスのために分散階層統合(DHI)アルゴリズムも設計されている。グループ内のパーミッションノードは、異なるパーミッションレスノードからローカルモデルの結果を収集し、集約された結果をすべてのパーミッションレスノードに送信し、ローカルモデルの処理を規則化する。イテレーションが完了すると、ローカルな結果の二次的な統合がパーミッションノード間で行われ、グローバルな結果が得られる。実験では,DCMの効率を検証し,提案したモデルが,フェデレート学習フレームワークに基づく多くの最先端モデルより優れていることを示す。

The recurrent neural network has been greatly developed for effectively solving time-varying problems corresponding to complex environments. However, limited by the way of centralized processing, the model performance is greatly affected by factors like the silos problems of the models and data in reality. Therefore, the emergence of distributed artificial intelligence such as federated learning (FL) makes it possible for the dynamic aggregation among models. However, the integration process of FL is still server-dependent, which may cause a great risk to the overall model. Also, it only allows collaboration between homogeneous models, and does not have a good solution for the interaction between heterogeneous models. Therefore, we propose a Distributed Computation Model (DCM) based on the consortium blockchain network to improve the credibility of the overall model and effective coordination among heterogeneous models. In addition, a Distributed Hierarchical Integration (DHI) algorithm is also designed for the global solution process. Within a group, permissioned nodes collect the local models' results from different permissionless nodes and then sends the aggregated results back to all the permissionless nodes to regularize the processing of the local models. After the iteration is completed, the secondary integration of the local results will be performed between permission nodes to obtain the global results. In the experiments, we verify the efficiency of DCM, where the results show that the proposed model outperforms many state-of-the-art models based on a federated learning framework.

翻訳日:2023-06-29 15:07:46 公開日:2023-06-28

# 強化学習の構造:調査とオープン問題

Structure in Reinforcement Learning: A Survey and Open Problems ( http://arxiv.org/abs/2306.16021v1 )

ライセンス: Link先を確認

Aditya Mohan, Amy Zhang, Marius Lindauer

(参考訳) 関数近似のためのディープニューラルネットワーク(DNN)の表現能力に支えられた強化学習(RL)は、多くのアプリケーションでかなりの成功を収めている。しかし、多様な予測不可能な力学、ノイズ信号、そして大きな状態と行動空間によって特徴づけられる、幅広い現実世界のシナリオに対処する実践性は依然として限られている。この制限は、データ効率の低下、一般化能力の制限、安全性保証の欠如、解釈可能性の欠如などの問題に起因している。これらの課題を克服し、これらの重要な指標にまたがるパフォーマンスを改善するために、問題に関する構造的な情報をRL学習プロセスに組み込むことが期待できる。 RLの様々なサブフィールドは、そのような誘導バイアスを組み込む方法を提案している。我々は,これらの多様な方法論を統一的な枠組みで満たし,学習問題における構造の役割に光を当て,それらの手法を構造を組み込む異なるパターンに分類する。この包括的フレームワークを活用することで、構造化されたRLに関連する課題に関する貴重な洞察を提供し、RL研究におけるデザインパターンの視点の基礎となる。この新しい視点は、現実世界のシナリオをよりうまく処理できる、より効率的かつ効率的なRLアルゴリズムの開発における将来の進歩と支援の道を開く。

Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural Networks (DNNs) for function approximation, has demonstrated considerable success in numerous applications. However, its practicality in addressing a wide range of real-world scenarios, characterized by diverse and unpredictable dynamics, noisy signals, and large state and action spaces, remains limited. This limitation stems from issues such as poor data efficiency, limited generalization capabilities, a lack of safety guarantees, and the absence of interpretability, among other factors. To overcome these challenges and improve performance across these crucial metrics, one promising avenue is to incorporate additional structural information about the problem into the RL learning process. Various sub-fields of RL have proposed methods for incorporating such inductive biases. We amalgamate these diverse methodologies under a unified framework, shedding light on the role of structure in the learning problem, and classify these methods into distinct patterns of incorporating structure. By leveraging this comprehensive framework, we provide valuable insights into the challenges associated with structured RL and lay the groundwork for a design pattern perspective on RL research. This novel perspective paves the way for future advancements and aids in the development of more effective and efficient RL algorithms that can potentially handle real-world scenarios better.

翻訳日:2023-06-29 15:07:24 公開日:2023-06-28

# points for energy reform (pointer): エネルギー特性に関連付けられた100万の建物からなるlidarから派生したポイントクラウドデータセット

Points for Energy Renovation (PointER): A LiDAR-Derived Point Cloud Dataset of One Million English Buildings Linked to Energy Characteristics ( http://arxiv.org/abs/2306.16020v1 )

ライセンス: Link先を確認

Sebastian Krapf, Kevin Mayer, Martin Fischer

(参考訳) 欧州の非効率な建物の急速な改修は、気候変動を減らすために必要である。しかし,各建物がユニークなため,大規模建築物の分析・評価は困難である。現在の実施例では、建物のエネルギー性能は、ゆっくりと、コストがかかり、局所的に評価される。本稿では,建物の3次元表現とそのエネルギー特性に関するデータ駆動型大規模理解を促進するビルディングポイントクラウドデータセットを提案する。我々は、ジオリファレンスなLiDARデータとビルディングフットプリントを交差させてビルディングポイント雲を生成し、Unique Property Reference Number (UPRN)を介してイギリスのエネルギーパフォーマンスデータベースから属性をリンクする。代表例を示すために,イギリス各地の農村・都市部から100万棟の建築物を選定し,その50万棟がエネルギー特性と関連づけられている。新しいリージョンにおけるポイントクラウドの構築は、論文と共に公開されたオープンソースコードによって生成される。このデータセットは、エネルギーモデリングの新たな研究を可能にし、UPRNやジオロケーションを通じて構築機能を追加することで、他の研究分野にも容易に拡張できる。

Rapid renovation of Europe's inefficient buildings is required to reduce climate change. However, analyzing and evaluating buildings at scale is challenging because every building is unique. In current practice, the energy performance of buildings is assessed during on-site visits, which are slow, costly, and local. This paper presents a building point cloud dataset that promotes a data-driven, large-scale understanding of the 3D representation of buildings and their energy characteristics. We generate building point clouds by intersecting building footprints with geo-referenced LiDAR data and link them with attributes from UK's energy performance database via the Unique Property Reference Number (UPRN). To achieve a representative sample, we select one million buildings from a range of rural and urban regions across England, of which half a million are linked to energy characteristics. Building point clouds in new regions can be generated with the open-source code published alongside the paper. The dataset enables novel research in building energy modeling and can be easily expanded to other research fields by adding building features via the UPRN or geo-location.

翻訳日:2023-06-29 15:07:04 公開日:2023-06-28

# 改良型深層学習モデルに基づくオフショア風力発電所における鳥の高速認識

Fast Recognition of birds in offshore wind farms based on an improved deep learning model ( http://arxiv.org/abs/2306.16019v1 )

ライセンス: Link先を確認

Yantong Liu, Xingke Li, Jong-Chan Lee

(参考訳) 風力タービンの安全性は、沖合の風力発電所の安定した運用の前提条件である。しかし、鳥の損傷は風力タービンと風力タービンブレードの安全運転に直接的な脅威をもたらす。さらに、毎年何百万もの鳥が風力タービンで死んでいる。そこで本稿では, 環境保全と洋上風力タービンの安全な運転の維持を目的として, 夜間等の低照度環境における電流ターゲット検出アルゴリズムの低検出性能の問題に対処するため, cbamアテンション機構とretinexnetネットワークをyolov5に統合することにより, ネットワーク性能を向上させる手法を提案する。まず、トレーニング用CBAMアテンションモジュールを内蔵したYOLOv5ネットワークにトレーニングセット画像を入力し、最適な重量モデルを保存する。次に、デコンネットとエンハンスネットを用いて低照度画像の強調表示を行い、その精度を最適重みモデルで検証する。さらに、k-means++クラスタリングアルゴリズムを用いてアンカーボックス選択法を最適化し、不安定な初期セントロイドの問題を解消し、より優れたクラスタリング結果を得る。実験の結果、鳥類検出作業におけるこのモデルの精度は87.40%に達し、21.25%が増加することが示された。モデルは,風力タービン近傍の鳥をリアルタイムに検出し,夜間,雨季,風況に強い安定性を示し,そのモデルが風力タービンの安全かつ安定した運転を保証することを証明した。

The safety of wind turbines is a prerequisite for the stable operation of offshore wind farms. However, bird damage poses a direct threat to the safe operation of wind turbines and wind turbine blades. In addition, millions of birds are killed by wind turbines every year. In order to protect the ecological environment and maintain the safe operation of offshore wind turbines, and to address the problem of the low detection capability of current target detection algorithms in low-light environments such as at night, this paper proposes a method to improve the network performance by integrating the CBAM attention mechanism and the RetinexNet network into YOLOv5. First, the training set images are fed into the YOLOv5 network with integrated CBAM attention module for training, and the optimal weight model is stored. Then, low-light images are enhanced and denoised using Decom-Net and Enhance-Net, and the accuracy is tested on the optimal weight model. In addition, the k-means++ clustering algorithm is used to optimise the anchor box selection method, which solves the problem of unstable initial centroids and achieves better clustering results. Experimental results show that the accuracy of this model in bird detection tasks can reach 87.40%, an increase of 21.25%. The model can detect birds near wind turbines in real time and shows strong stability in night, rainy and shaky conditions, proving that the model can ensure the safe and stable operation of wind turbines.

翻訳日:2023-06-29 15:06:45 公開日:2023-06-28

# 複数ラベルの分類に必要な正のラベル

Positive Label Is All You Need for Multi-Label Classification ( http://arxiv.org/abs/2306.16016v1 )

ライセンス: Link先を確認

Zhixiang Yuan, Kaixin Zhang, Tao Huang

(参考訳) マルチラベル分類(MLC)は、各画像に様々な意味ラベルを注釈付けすることが困難であるため、トレーニングデータにおいて避けられないラベルノイズに悩まされる。ノイズラベルの影響を軽減するため、既存の手法は主に訓練されたmlcモデルによるラベルミスの識別と修正に費やされている。しかし、これらの方法はいまだに騒がしいラベルをトレーニングに含むため、ノイズラベルを不正確に認識し、パフォーマンスを弱める可能性がある。本稿では, 負ラベルが正ラベル以上であり, ほとんどのノイズラベルが負ラベルから来ていることを考慮し, データセット中のすべての負ラベルを直接破棄し, 正および未ラベルのマルチラベル分類(PU-MLC)と呼ばれる新しい手法を提案する。正のラベル付き学習をmlcタスクに拡張することにより,正のラベルとラベル付きデータのみをモデルに訓練し,損失関数に適応的な再バランス係数と適応温度係数を導入し,ラベル分布の破滅的不均衡と,トレーニングの確率の過大さを緩和する。 PU-MLC は単純かつ効果的であり,MLC-PL タスクを伴う MLC と MLC の両方に適用可能である。 MS-COCOとPASCAL VOCデータセットの大規模な実験により、私たちのPU-MLCはより少ないアノテーションで MLC と MLC-PL の設定を大幅に改善することを示した。コードはリリースされる。

Multi-label classification (MLC) suffers from the inevitable label noise in training data due to the difficulty in annotating various semantic labels in each image. To mitigate the influence of noisy labels, existing methods mainly devote to identifying and correcting the label mistakes via a trained MLC model. However, these methods still involve annoying noisy labels in training, which can result in imprecise recognition of noisy labels and weaken the performance. In this paper, considering that the negative labels are substantially more than positive labels, and most noisy labels are from the negative labels, we directly discard all the negative labels in the dataset, and propose a new method dubbed positive and unlabeled multi-label classification (PU-MLC). By extending positive-unlabeled learning into MLC task, our method trains model with only positive labels and unlabeled data, and introduces adaptive re-balance factor and adaptive temperature coefficient in the loss function to alleviate the catastrophic imbalance in label distribution and over-smoothing of probabilities in training. Our PU-MLC is simple and effective, and it is applicable to both MLC and MLC with partial labels (MLC-PL) tasks. Extensive experiments on MS-COCO and PASCAL VOC datasets demonstrate that our PU-MLC achieves significantly improvements on both MLC and MLC-PL settings with even fewer annotations. Code will be released.

翻訳日:2023-06-29 15:06:19 公開日:2023-06-28

# bayesflow:ニューラルネットワークによるベイズワークフローの償却

BayesFlow: Amortized Bayesian Workflows With Neural Networks ( http://arxiv.org/abs/2306.16015v1 )

ライセンス: Link先を確認

Stefan T Radev and Marvin Schmitt and Lukas Schumacher and Lasse Elsem\"uller and Valentin Pratz and Yannik Sch\"alte and Ullrich K\"othe and Paul-Christian B\"urkner

(参考訳) 現代のベイズ推論は、データ分析の原則的ワークフローの一部として確率的モデルからの結論を推定、検証、描画するための計算技法の混合を含む。ベイズワークフローの典型的な問題は、様々なモデルタイプに対する難解な後続分布の近似と、その複雑さと予測性能の観点から同じプロセスの競合モデルの比較である。この原稿はPythonライブラリのBayesFlowを紹介し、アモートされたデータ圧縮と推論のための確立したニューラルネットワークアーキテクチャのシミュレーションベースのトレーニングを行う。 Amortized Bayesian推論は、BayesFlowで実装されているもので、モデルシミュレーションでカスタムニューラルネットワークをトレーニングし、その後のモデル適用のためにこれらのネットワークを再使用することができる。トレーニングされたネットワークは、ほぼ瞬時に推論を行うことができるため、事前のニューラルネットワークトレーニングは、迅速に償却される。

Modern Bayesian inference involves a mixture of computational techniques for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows for data analysis. Typical problems in Bayesian workflows are the approximation of intractable posterior distributions for diverse model types and the comparison of competing models of the same process in terms of their complexity and predictive performance. This manuscript introduces the Python library BayesFlow for simulation-based training of established neural network architectures for amortized data compression and inference. Amortized Bayesian inference, as implemented in BayesFlow, enables users to train custom neural networks on model simulations and re-use these networks for any subsequent application of the models. Since the trained networks can perform inference almost instantaneously, the upfront neural network training is quickly amortized.

翻訳日:2023-06-29 15:05:50 公開日:2023-06-28

# 隣接トークンマージによるトランスデューサの高速化

Accelerating Transducers through Adjacent Token Merging ( http://arxiv.org/abs/2306.16009v1 )

ライセンス: Link先を確認

Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

(参考訳) 最近のエンドツーエンド自動音声認識(ASR)システムは、高いフレームレートで埋め込みを生成するトランスフォーマーベースの音響エンコーダを使用することが多い。しかし、この設計は非効率であり、特に長い音声信号は、自己着脱の二次計算のためである。そこで本研究では,隣接するトークンと鍵値間の類似度の高いスコアを段階的に組み合わせたAdjacent Token Merging(A-ToMe)を提案する。これにより、総時間ステップを短縮することができ、エンコーダとジョイントネットワークの両方の推論が高速化される。 LibriSpeechの実験により,トークンの57%を削減し,GPU上での推論速度を70%向上できることがわかった。さらに、A-ToMeは、入力音声が複数の発話からなる長文ASRにおけるトークンを減らす効果的な解であることを示す。

Recent end-to-end automatic speech recognition (ASR) systems often utilize a Transformer-based acoustic encoder that generates embedding at a high frame rate. However, this design is inefficient, particularly for long speech signals due to the quadratic computation of self-attention. To address this, we propose a new method, Adjacent Token Merging (A-ToMe), which gradually combines adjacent tokens with high similarity scores between their key values. In this way, the total time step could be reduced, and the inference of both the encoder and joint network is accelerated. Experiments on LibriSpeech show that our method can reduce 57% of tokens and improve the inference speed on GPU by 70% without any notable loss of accuracy. Additionally, we demonstrate that A-ToMe is also an effective solution to reduce tokens in long-form ASR, where the input speech consists of multiple utterances.

翻訳日:2023-06-29 15:05:38 公開日:2023-06-28

# 音声認識におけるゼロショット領域適応のための大規模言語モデルの提案

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition ( http://arxiv.org/abs/2306.16007v1 )

ライセンス: Link先を確認

Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

(参考訳) 言語モデル(LM)の統合は、音声認識におけるドメインシフトに対処する効果的な方法であることが証明されている。しかし、これらのアプローチは通常、lmsのトレーニングのためにかなりの量のターゲットドメインテキストデータを必要とする。これらの手法と異なり、ドメイン固有のテキストプロンプトのみで、7ビリオンパラメータ大言語モデル(LLM)であるLLaMAを用いた2つのゼロショットASRドメイン適応手法を提案する。 LLMは2つの方法で使われます。 1)第2パス再構成:所定のASR系のN-best仮説をLLaMAで再評価すること。 2)深いLLM融合:エンコーダデコーダベースのASRシステムのデコーダにLLMを組み込む。実験では、1つのドメインプロンプトだけで、両方のメソッドがドメイン外のtedlium-2とspgispeechデータセットでワードエラー率(wer)を効果的に削減できることが示されている。特に、深いLLM融合は、実体語と外語彙語のより優れたリコールの利点がある。

The integration of Language Models (LMs) has proven to be an effective way to address domain shifts in speech recognition. However, these approaches usually require a significant amount of target domain text data for the training of LMs. Different from these methods, in this work, with only a domain-specific text prompt, we propose two zero-shot ASR domain adaptation methods using LLaMA, a 7-billion-parameter large language model (LLM). LLM is used in two ways: 1) second-pass rescoring: reranking N-best hypotheses of a given ASR system with LLaMA; 2) deep LLM-fusion: incorporating LLM into the decoder of an encoder-decoder based ASR system. Experiments show that, with only one domain prompt, both methods can effectively reduce word error rates (WER) on out-of-domain TedLium-2 and SPGISpeech datasets. Especially, the deep LLM-fusion has the advantage of better recall of entity and out-of-vocabulary words.

翻訳日:2023-06-29 15:05:24 公開日:2023-06-28

# ディープラーニングのためのバイナリ・プリソーシングによる主音分類の改善

Improving Primate Sounds Classification using Binary Presorting for Deep Learning ( http://arxiv.org/abs/2306.16054v1 )

ライセンス: Link先を確認

Michael K\"olle, Steffen Illium, Maximilian Zorn, Jonas N\"u{\ss}lein, Patrick Suchostawski and Claudia Linnhoff-Popien

(参考訳) 野生生物の観察と保全の分野では、音声録音における機械学習のアプローチがますます普及している。残念なことに、この研究分野の利用可能なデータセットは、しばしば最適な学習材料ではない。本研究では,MELスペクトル表現のサブセグメンテーションを初めてリラベルする一般化されたアプローチを導入し,実際のマルチクラス分類タスクにおいて高い性能を実現する。バイナリプリソートと分類の両方において、畳み込みニューラルネットワーク(CNN)と様々なデータ拡張技術を利用する。このアプローチの結果を,異なる霊長類音を分類し,相対的に装備されたモデルベースラインと対照的に,高い精度とuarスコアを報告するという課題を伴って,挑戦的な \textit{compare 2021}データセット上で示した。

In the field of wildlife observation and conservation, approaches involving machine learning on audio recordings are becoming increasingly popular. Unfortunately, available datasets from this field of research are often not optimal learning material; Samples can be weakly labeled, of different lengths or come with a poor signal-to-noise ratio. In this work, we introduce a generalized approach that first relabels subsegments of MEL spectrogram representations, to achieve higher performances on the actual multi-class classification tasks. For both the binary pre-sorting and the classification, we make use of convolutional neural networks (CNN) and various data-augmentation techniques. We showcase the results of this approach on the challenging \textit{ComparE 2021} dataset, with the task of classifying between different primate species sounds, and report significantly higher Accuracy and UAR scores in contrast to comparatively equipped model baselines.

翻訳日:2023-06-29 14:56:57 公開日:2023-06-28

# SVNR:デノイング拡散による空間変動ノイズ除去

SVNR: Spatially-variant Noise Removal with Denoising Diffusion ( http://arxiv.org/abs/2306.16052v1 )

ライセンス: Link先を確認

Naama Pearl, Yaron Brodsky, Dana Berman, Assaf Zomet, Alex Rav Acha, Daniel Cohen-Or, Dani Lischinski

(参考訳) 雑音拡散モデルは最近、生成的タスクにおいて印象的な結果を示している。トレーニング画像の膨大なコレクションから強力な事前学習を行うことで、このようなモデルは小さなデノナイジングステップのシーケンスを通じて、完全なノイズをクリーンな自然画像に徐々に修正することができる。しかし, 実写画像のノイズとは異なり, 付加的な白色ガウス雑音に基づくため, 実写ノイズ除去に効果的にデノナイジング拡散モデルを適用することは困難である。本研究では,より現実的で空間的変動のある雑音モデルを想定した,新しい拡散の定式化であるSVNRを提案する。 SVNRは、ノイズの多い入力画像を、その上に処理を条件付けることに加えて、デノナイジング拡散プロセスの出発点として使用できる。この目的のために,各画素が独自の時間埋め込みを持つように拡散過程を適応させ,空間的に変化する時間マップをサポートするトレーニングおよび推論スキームを提案する。我々の定式化は、条件画像と修正拡散過程に沿ったサンプルとの間に存在する相関も説明します。実験では, 強力な拡散モデルベースラインに対するアプローチの利点と, 最先端の単一画像復号法に対するアプローチの利点を実証した。

Denoising diffusion models have recently shown impressive results in generative tasks. By learning powerful priors from huge collections of training images, such models are able to gradually modify complete noise to a clean natural image via a sequence of small denoising steps, seemingly making them well-suited for single image denoising. However, effectively applying denoising diffusion models to removal of realistic noise is more challenging than it may seem, since their formulation is based on additive white Gaussian noise, unlike noise in real-world images. In this work, we present SVNR, a novel formulation of denoising diffusion that assumes a more realistic, spatially-variant noise model. SVNR enables using the noisy input image as the starting point for the denoising diffusion process, in addition to conditioning the process on it. To this end, we adapt the diffusion process to allow each pixel to have its own time embedding, and propose training and inference schemes that support spatially-varying time maps. Our formulation also accounts for the correlation that exists between the condition image and the samples along the modified diffusion process. In our experiments we demonstrate the advantages of our approach over a strong diffusion model baseline, as well as over a state-of-the-art single image denoising method.

翻訳日:2023-06-29 14:56:40 公開日:2023-06-28

# 逆攻撃による深部画像復調モデルの同時性およびロバスト性の評価

Evaluating Similitude and Robustness of Deep Image Denoising Models via Adversarial Attack ( http://arxiv.org/abs/2306.16050v1 )

ライセンス: Link先を確認

Jie Ning, Yao Li, Zhichang Guo

(参考訳) ディープニューラルネットワーク(dnn)は、画像デノイジングの分野で幅広い応用があり、従来の画像デノイジングよりも優れている。しかし、DNNは必然的に、敵の攻撃に直面している弱い堅牢性を示す。本稿では,既存の奥行き画像のデノイジング手法の類似性について検討する。第一に、デノナイジング-PGDは、デノナイジングモデルの全対角法である。現在の主流の非盲検モデル(DnCNN、FFDNet、ECNDNet、BRDNet)、盲検モデル(DnCNN-B、ノイズ2ノイズ、RDDCNN-B、FAN)、プラグ・アンド・プレイ(DPIR、CurvPnP)、グレースケールおよびカラー画像に適用した展開復調モデル(DeamNet)は、同一の手法で攻撃することができる。第2に,画像デノイジングタスクではデノイジングpgdの転写能が顕著であるため,トランスファビリティ下での潜伏の特性を探索する実験をデザインする。トランスファー可能性と同化度を関連付け、深部画像の同化モデルは高い同化度を持つと結論づける。第3に, 対向空間の特徴について検討し, 対向訓練を用いて, 対向攻撃による深層画像の脆弱性を補完する。最後に,この対向攻撃法を制約し,ガウス分布を維持する対向攻撃法L2-denoising-PGD画像を提案する。さらに, BM3Dのモデル駆動画像は, 敵攻撃に対する抵抗性を示した。

Deep neural networks (DNNs) have a wide range of applications in the field of image denoising, and they are superior to traditional image denoising. However, DNNs inevitably show vulnerability, which is the weak robustness in the face of adversarial attacks. In this paper, we find some similitudes between existing deep image denoising methods, as they are consistently fooled by adversarial attacks. First, denoising-PGD is proposed which is a denoising model full adversarial method. The current mainstream non-blind denoising models (DnCNN, FFDNet, ECNDNet, BRDNet), blind denoising models (DnCNN-B, Noise2Noise, RDDCNN-B, FAN), and plug-and-play (DPIR, CurvPnP) and unfolding denoising models (DeamNet) applied to grayscale and color images can be attacked by the same set of methods. Second, since the transferability of denoising-PGD is prominent in the image denoising task, we design experiments to explore the characteristic of the latent under the transferability. We correlate transferability with similitude and conclude that the deep image denoising models have high similitude. Third, we investigate the characteristic of the adversarial space and use adversarial training to complement the vulnerability of deep image denoising to adversarial attacks on image denoising. Finally, we constrain this adversarial attack method and propose the L2-denoising-PGD image denoising adversarial attack method that maintains the Gaussian distribution. Moreover, the model-driven image denoising BM3D shows some resistance in the face of adversarial attacks.

翻訳日:2023-06-29 14:55:26 公開日:2023-06-28

# fifaワールドカップのカタール2022でtwitterとaiを使って学んだ感想と楽しい事実

What Sentiment and Fun Facts We Learnt Before FIFA World Cup Qatar 2022 Using Twitter and AI ( http://arxiv.org/abs/2306.16049v1 )

ライセンス: Link先を確認

James She, Kamilla Swart-Arries, Mohammad Belal and Simon Wong

(参考訳) Twitterは、ほとんどの国を橋渡しし、リアルタイムニュース発見を可能にするソーシャルメディアプラットフォームである。 Twitter上のツイートは通常短く、一般の感情を表現するため、グローバルなイベントに対する意見マイニングと感情分析の源泉となる。本稿では、fifaワールドカップに関するツイートに対する感想を提供するための効果的な解決策を提案する。コミュニティで初めて、少なくとも130万ツイートが収集され、提案された機械学習ソリューションのパフォーマンスを評価するデータセットとして実装されている。これらのツイートはカタールワールドカップ2022の関連ハッシュタグとキーワードで収集される。本論文ではvaderアルゴリズムを用いて感情分析を行う。機械学習手法とTwitterのつぶやきの収集により、ワールドカップ前の期間において重要ないくつかの側面の感情と楽しい事実を発見した。その結果、人々はワールドカップの開会に前向きであることがわかった。

Twitter is a social media platform bridging most countries and allows real-time news discovery. Since the tweets on Twitter are usually short and express public feelings, thus provide a source for opinion mining and sentiment analysis for global events. This paper proposed an effective solution, in providing a sentiment on tweets related to the FIFA World Cup. At least 130k tweets, as the first in the community, are collected and implemented as a dataset to evaluate the performance of the proposed machine learning solution. These tweets are collected with the related hashtags and keywords of the Qatar World Cup 2022. The Vader algorithm is used in this paper for sentiment analysis. Through the machine learning method and collected Twitter tweets, we discovered the sentiments and fun facts of several aspects important to the period before the World Cup. The result shows people are positive to the opening of the World Cup.

翻訳日:2023-06-29 14:54:53 公開日:2023-06-28

# 視覚言語モデルによるゼロショット認識の課題:粒度と正確性

Challenges of Zero-Shot Recognition with Vision-Language Models: Granularity and Correctness ( http://arxiv.org/abs/2306.16048v1 )

ライセンス: Link先を確認

Zhenlin Xu, Yi Zhu, Tiffany Deng, Abhay Mittal, Yanbei Chen, Manchen Wang, Paolo Favaro, Joseph Tighe, Davide Modolo

(参考訳) 本稿では,オープンワールドにおけるゼロショット視覚認識タスクに視覚言語モデル(vlms)を適用する際の課題について,クリップなどのコントラスト的視覚言語モデルに着目して検討する。まず,様々な粒度の概念に対するvlmの性能について検討した。我々は,2つの実験環境において,性能不一致を公平に評価する方法を提案し,vlmがきめ細かい概念を認識するのに優れていることを示す。さらに,vlmsの類似度スコアは,視覚入力によるテキスト入力の正確さを厳密に反映しないことがわかった。本稿では,より情報的な記述に対してスコアが偏りがあるという仮説を検証するための評価プロトコルを提案し,組込み間の類似性スコアの性質は,VLMが類似する記述間の正しさを認識するのを困難にしている。本研究は,VLMをオープンワールド環境で使用する上での課題を強調し,今後のゼロショット機能向上に向けた方向性を提案する。

This paper investigates the challenges of applying vision-language models (VLMs) to zero-shot visual recognition tasks in an open-world setting, with a focus on contrastive vision-language models such as CLIP. We first examine the performance of VLMs on concepts of different granularity levels. We propose a way to fairly evaluate the performance discrepancy under two experimental setups and find that VLMs are better at recognizing fine-grained concepts. Furthermore, we find that the similarity scores from VLMs do not strictly reflect the correctness of the textual inputs given visual input. We propose an evaluation protocol to test our hypothesis that the scores can be biased towards more informative descriptions, and the nature of the similarity score between embedding makes it challenging for VLMs to recognize the correctness between similar but wrong descriptions. Our study highlights the challenges of using VLMs in open-world settings and suggests directions for future research to improve their zero-shot capabilities.

翻訳日:2023-06-29 14:54:41 公開日:2023-06-28

# OpenNDD:神経発達障害検出のためのオープンセット認識

OpenNDD: Open Set Recognition for Neurodevelopmental Disorders Detection ( http://arxiv.org/abs/2306.16045v1 )

ライセンス: Link先を確認

Jiaming Yu, Zihao Guan, Xinyue Chang, Xiumei Liu, Zhenshan Shi, Changcai Yang, Riqing Chen, Lanyan Xue, Lifang Wei

(参考訳) 神経発達障害 (NDD) は、非常に一般的な疾患群であり、強力な臨床行動類似性を示し、自閉症スペクトラム障害 (ASD) や注意欠陥性高活動障害 (ADHD) などの異なるNDDの正確な同定を困難にしている。さらに,NDDの診断には信頼性のある生理学的マーカーは存在せず,心理的評価基準にのみ依存している。しかし, 経過観察と近縁な知的補助診断により, 誤診や下垂体手術を予防することが重要である。そこで本稿では,これらの課題を解消するために,nddsスクリーニングと検出のための新しいオープンセット認識フレームワークを提案する。オートエンコーダと逆方向の相互認識を組み合わせることで、既知のクラスを正確に識別し、遭遇したことのないクラスを認識する。また、異なる被験者間の強い類似性を考慮し、未知の障害を識別するためのMMSと呼ばれる共同スケーリング手法を提案する。提案手法の有効性を検証するために, 自閉症脳画像データ交換装置i (abide i) とadhd-200サンプル (adhd-200) のハイブリッドデータセットに対する相互対向実験プロトコルを, 4地点から771点のサンプルで設計し, 各種指標の優位性を示す。 OpenNDDは77.38%、AUROCは75.53%、オープンセットの分類率は59.43%という有望な性能を達成した。

Neurodevelopmental disorders (NDDs) are a highly prevalent group of disorders and represent strong clinical behavioral similarities, and that make it very challenging for accurate identification of different NDDs such as autism spectrum disorder (ASD) and attention-deficit hyperactivity disorder (ADHD). Moreover, there is no reliable physiological markers for NDDs diagnosis and it solely relies on psychological evaluation criteria. However, it is crucial to prevent misdiagnosis and underdiagnosis by intelligent assisted diagnosis, which is closely related to the follow-up corresponding treatment. In order to relieve these issues, we propose a novel open set recognition framework for NDDs screening and detection, which is the first application of open set recognition in this field. It combines auto encoder and adversarial reciprocal points open set recognition to accurately identify known classes as well as recognize classes never encountered. And considering the strong similarities between different subjects, we present a joint scaling method called MMS to distinguish unknown disorders. To validate the feasibility of our presented method, we design a reciprocal opposition experiment protocol on the hybrid datasets from Autism Brain Imaging Data Exchange I (ABIDE I) and THE ADHD-200 SAMPLE (ADHD-200) with 791 samples from four sites and the results demonstrate the superiority on various metrics. Our OpenNDD has achieved promising performance, where the accuracy is 77.38%, AUROC is 75.53% and the open set classification rate is as high as 59.43%.

翻訳日:2023-06-29 14:54:23 公開日:2023-06-28

# 加速検出器のダイナミックマップ

Dynamical Maps for Accelerating Detectors ( http://arxiv.org/abs/2306.16041v1 )

ライセンス: Link先を確認

Shalin Jose (1), Anil Shaji (1) ((1) Indian Institute of Science Education and Research Thiruvananthapuram)

(参考訳) ミンコフスキー真空を無質量スカラー場に弱結合して加速する2レベル粒子検出器の開量子力学について検討する。非ゼロサイズの検出器を考察し、初期慣性運動である場合の時間発展について検討し、その後一定加速度を有限時間オンにする。このようなシステムの進化を記述した力学写像を研究し、力学が完全に正ではないことを示す。加速前の慣性運動は検出器と磁場を絡み合わせることができ、NCPダイナミクスに繋がる。本研究では,前慣性運動の時間と加速度の大きさの関数として,加速相中の開力学の性質を検討する。

We study the open quantum dynamics of a two-level particle detector that starts accelerating through Minkowski vacuum weakly coupled to a massless scalar field. We consider a detector with non-zero size and study its time evolution for the case where it is initially in inertial motion and subsequently a constant acceleration is switched on for a finite time. We study the dynamical maps that describe the evolution of such a system and show that the dynamics is not completely positive (NCP). The inertial motion prior to the acceleration can entangle the detector and field leading to the NCP dynamics. We examine the nature of the open dynamics during the accelerated phase as a function of the duration of prior inertial motion and the magnitude of the acceleration.

翻訳日:2023-06-29 14:53:54 公開日:2023-06-28

# 肝ct画像における超高能率病変検出と偽陽性除去のためのカスケード法

A Cascaded Approach for ultraly High Performance Lesion Detection and False Positive Removal in Liver CT Scans ( http://arxiv.org/abs/2306.16036v1 )

ライセンス: Link先を確認

Fakai Wang, Chi-Tung Cheng, Chien-Wei Peng, Ke Yan, Min Wu, Le Lu, Chien-Hung Liao, and Ling Zhang

(参考訳) 肝臓がんは世界中で高い死亡率と死亡率を持っている。多相CTは肝腫瘍の検出・診断のための主要な医用画像モダリティである。 CT画像における肝病変の自動検出と分類は、臨床ワークフローを改善する可能性がある。この課題は, 肝病変の大きさ, 外観, 画像コントラスト, 腫瘍タイプやサブタイプの複雑さの多様さにより, 依然として困難である。本研究は,4相CT画像,多臓器マスク,多発病変(病理検査で確認された6種類の肝病変)を含む大規模データセットをキュレートするために,多段階CT画像のための多目的ラベリングツールをカスタマイズする。 2段階の肝病変検出パイプラインを開発し,第1段階の高感度検出アルゴリズムは可能な限り多くの病変を発見でき,第2段階の病変再分類アルゴリズムは可能な限り多くの誤報を除去する。多感性病変検出アルゴリズムはセグメント化の個々の確率マップの情報利用を最大化し、病変拡大は病変と肝臓のテクスチャコントラストを効果的に探索する。 331例で個別に検査し, マルチフェーズ造影CT(99.2%, 97.1%, 診断設定)および非コントラストCT(97.3%, 95.7%, スクリーニング設定)における悪性度分類の感度と特異性を得た。

Liver cancer has high morbidity and mortality rates in the world. Multi-phase CT is a main medical imaging modality for detecting/identifying and diagnosing liver tumors. Automatically detecting and classifying liver lesions in CT images have the potential to improve the clinical workflow. This task remains challenging due to liver lesions' large variations in size, appearance, image contrast, and the complexities of tumor types or subtypes. In this work, we customize a multi-object labeling tool for multi-phase CT images, which is used to curate a large-scale dataset containing 1,631 patients with four-phase CT images, multi-organ masks, and multi-lesion (six major types of liver lesions confirmed by pathology) masks. We develop a two-stage liver lesion detection pipeline, where the high-sensitivity detecting algorithms in the first stage discover as many lesion proposals as possible, and the lesion-reclassification algorithms in the second stage remove as many false alarms as possible. The multi-sensitivity lesion detection algorithm maximizes the information utilization of the individual probability maps of segmentation, and the lesion-shuffle augmentation effectively explores the texture contrast between lesions and the liver. Independently tested on 331 patient cases, the proposed model achieves high sensitivity and specificity for malignancy classification in the multi-phase contrast-enhanced CT (99.2%, 97.1%, diagnosis setting) and in the noncontrast CT (97.3%, 95.7%, screening setting).

翻訳日:2023-06-29 14:53:43 公開日:2023-06-28

# stone needle: 医療に向けた汎用マルチモーダル大規模モデルフレームワーク

Stone Needle: A General Multimodal Large-scale Model Framework towards Healthcare ( http://arxiv.org/abs/2306.16034v1 )

ライセンス: Link先を確認

Weihua Liu and Yong Zuo

(参考訳) 医療では、マルチモーダルデータは広く普及しており、医療画像や臨床報告など、診断決定前に総合的に分析する必要がある。しかし、現在の大規模人工知能モデルは、主に単一モーダル認知能力に焦点を当て、複数のモーダルの統合を無視している。そこで本研究では,医療応用に適した汎用マルチモーダル大規模モデルフレームワークであるStone Needleを提案する。ストーンニードルは総合的な医療マルチモーダルモデルの基礎として機能し、テキスト、画像、ビデオ、オーディオといった様々なモダリティを統合し、シングルモーダルシステムの限界を超える。インテント分析,医療基盤モデル,プロンプトマネージャ,医療言語モジュールのフレームワークコンポーネントを通じて,アーキテクチャは複数ラウンドの対話でマルチモーダルインタラクションを行うことができる。本手法は汎用マルチモーダル大規模モデルフレームワークであり,多様なモダリティを統合し,特定のタスクを調整できる。本手法はシングルモーダルシステムと比較して優れた性能を示す実験結果である。異なる形態の融合と複雑な医療情報を石針で処理する能力は、正確な診断、治療の推奨、患者のケアに役立つ。

In healthcare, multimodal data is prevalent and requires to be comprehensively analyzed before diagnostic decisions, including medical images, clinical reports, etc. However, current large-scale artificial intelligence models predominantly focus on single-modal cognitive abilities and neglect the integration of multiple modalities. Therefore, we propose Stone Needle, a general multimodal large-scale model framework tailored explicitly for healthcare applications. Stone Needle serves as a comprehensive medical multimodal model foundation, integrating various modalities such as text, images, videos, and audio to surpass the limitations of single-modal systems. Through the framework components of intent analysis, medical foundation models, prompt manager, and medical language module, our architecture can perform multi-modal interaction in multiple rounds of dialogue. Our method is a general multimodal large-scale model framework, integrating diverse modalities and allowing us to tailor for specific tasks. The experimental results demonstrate the superior performance of our method compared to single-modal systems. The fusion of different modalities and the ability to process complex medical information in Stone Needle benefits accurate diagnosis, treatment recommendations, and patient care.

翻訳日:2023-06-29 14:53:14 公開日:2023-06-28

# ソーシャルメディアにおける公開談話の時空間的変動を探る:イタリアにおけるコロナウイルスパンデミックの第一波を事例として

Exploring Spatial-Temporal Variations of Public Discourse on Social Media: A Case Study on the First Wave of the Coronavirus Pandemic in Italy ( http://arxiv.org/abs/2306.16031v1 )

ライセンス: Link先を確認

Anslow Michael and Galletti Martina

(参考訳) 本稿では,SARS CoV2パンデミックで流行したような重要な出来事に対する社会的反応の探索にソーシャルメディア上での言語行動をどのように活用できるかを探求するための方法論を提案する。特に、イベントの空間的側面と時間的側面が重要な特徴である。本手法は時系列分析とクラスタリングを用いて,ツイート利用傾向の時空間カテゴリーを定位する。各カテゴリの有意な項は、手書きのカテゴリに集約されたスケールされたfスコアに基づいて定性的比較分析によって同定される。このアプローチを実証するため,イタリアで発生した第1波について事例研究を行った。提案手法を用いて既存の心理学的観察を探索し,事象から物理的距離がコミュニケーション内容に与える影響を考察した。本研究は,sars cov2の発生源である病原体と周辺部が明らかな時系列クラスターに対応し,sars cov2の発生源である病原体は周辺地域よりも連帯性と政策に重点が置かれていることを示し,これらの知見を確認した。さらに,パンデミック時の政策変化と時間的カテゴリーが密接に対応していることが判明した。

This paper proposes a methodology for exploring how linguistic behaviour on social media can be used to explore societal reactions to important events such as those that transpired during the SARS CoV2 pandemic. In particular, where spatial and temporal aspects of events are important features. Our methodology consists of grounding spatial-temporal categories in tweet usage trends using time-series analysis and clustering. Salient terms in each category were then identified through qualitative comparative analysis based on scaled f-scores aggregated into hand-coded categories. To exemplify this approach, we conducted a case study on the first wave of the coronavirus in Italy. We used our proposed methodology to explore existing psychological observations which claimed that physical distance from events affects what is communicated about them. We confirmed these findings by showing that the epicentre of the disease and peripheral regions correspond to clear time-series clusters and that those living in the epicentre of the SARS CoV2 outbreak were more focused on solidarity and policy than those from more peripheral regions. Furthermore, we also found that temporal categories corresponded closely to policy changes during the handling of the pandemic.

翻訳日:2023-06-29 14:52:55 公開日:2023-06-28

# 構造モチーフ型グラフニューラルネットワークによる質量スペクトル予測

Mass Spectra Prediction with Structural Motif-based Graph Neural Networks ( http://arxiv.org/abs/2306.16085v1 )

ライセンス: Link先を確認

Jiwon Park, Jeonghee Jo, Sungroh Yoon

(参考訳) ターゲット分子からのイオン化フラグメントの集合体である質量スペクトルは、分子構造の同定において、様々な分野において重要な役割を果たす。一般的な分析方法は、未知のスペクトルがデータベースと相互参照されるスペクトルライブラリ検索である。しかし、このような探索に基づく手法の有効性は、既存の質量スペクトルデータベースの範囲によって制限され、質量スペクトル予測によるデータベースの拡張の必要性を強調する。本研究では、構造モチーフから得られる情報とグラフニューラルネットワーク(GNN)の実装を用いて、質量スペクトルを予測するMotif-based Mass Spectrum Prediction Network (MoMS-Net)を提案する。我々は、様々な質量スペクトルでモデルを試験し、既存のモデルよりもその優位性を観察した。 MoMS-Netはグラフレベルでのサブ構造を考慮し、グラフトランスフォーマーモデルに比べて少ないメモリを使用しながら、長距離依存の取り込みを容易にする。

Mass spectra, which are agglomerations of ionized fragments from targeted molecules, play a crucial role across various fields for the identification of molecular structures. A prevalent analysis method involves spectral library searches,where unknown spectra are cross-referenced with a database. The effectiveness of such search-based approaches, however, is restricted by the scope of the existing mass spectra database, underscoring the need to expand the database via mass spectra prediction. In this research, we propose the Motif-based Mass Spectrum Prediction Network (MoMS-Net), a system that predicts mass spectra using the information derived from structural motifs and the implementation of Graph Neural Networks (GNNs). We have tested our model across diverse mass spectra and have observed its superiority over other existing models. MoMS-Net considers substructure at the graph level, which facilitates the incorporation of long-range dependencies while using less memory compared to the graph transformer model.

翻訳日:2023-06-29 14:47:31 公開日:2023-06-28

# 高速rcnnに基づく連続的二重チャネルライブラリ占有検知システム

A serial dual-channel library occupancy detection system based on Faster RCNN ( http://arxiv.org/abs/2306.16080v1 )

ライセンス: Link先を確認

Guoqiang Yang, Xiaowen Chang, Zitong Wang, Min Yang and Xin Chen

(参考訳) 大学図書館における座席占有現象が問題となっている。しかし、ソフトウェアベースの座席予約やセンサーによる占有検知といった既存のソリューションは、この問題に効果的に対処するには不十分であることが証明されている。本研究では,高速なrcnnに基づく連続2チャンネル物体検出モデルを提案する。さらに,ユーザフレンドリーなWebインターフェースとモバイルAPPを開発し,図書館利用者検出のためのコンピュータビジョンベースのプラットフォームを構築する。データセットを構築するために、現実世界のデータコレックオプションとUE5バーチャルリアリティを組み合わせる。実験の結果,音素単位の仮想データセットの利用は,専用シナリオにおける畳み込みニューラルネットワーク(CNN)の性能を著しく向上させることが示された。シリアルデュアルチャネル検出モデルは、3つのステップを含む。まず、座席が個人によって占有されているかどうかを判断するために、より高速なRCNNアルゴリズムを用いる。その後、移動学習に基づく物体分類アルゴリズムを用いて、未占有座席の画像の分類と識別を行う。これにより、座席を占拠した疑いがあるかどうかについて、手動で判断する必要がなくなる。最後に、WebインターフェースとAPPは、それぞれ図書館員と学生に座席情報を提供し、包括的なサービスを可能にする。本研究は,深層学習手法を活用することで,図書館システムにおける座席占有の課題を効果的に解決する。シート占有率認識の精度を大幅に向上させ,CNNのトレーニングに必要な計算資源を削減し,ライブラリーシート管理の効率を大幅に向上させる。

The phenomenon of seat occupancy in university libraries is a prevalent issue. However, existing solutions, such as software-based seat reservations and sensors-based occupancy detection, have proven to be inadequate in effectively addressing this problem. In this study, we propose a novel approach: a serial dual-channel object detection model based on Faster RCNN. Furthermore, we develop a user-friendly Web interface and mobile APP to create a computer vision-based platform for library seat occupancy detection. To construct our dataset, we combine real-world data collec-tion with UE5 virtual reality. The results of our tests also demonstrate that the utilization of per-sonalized virtual dataset significantly enhances the performance of the convolutional neural net-work (CNN) in dedicated scenarios. The serial dual-channel detection model comprises three es-sential steps. Firstly, we employ Faster RCNN algorithm to determine whether a seat is occupied by an individual. Subsequently, we utilize an object classification algorithm based on transfer learning, to classify and identify images of unoccupied seats. This eliminates the need for manual judgment regarding whether a person is suspected of occupying a seat. Lastly, the Web interface and APP provide seat information to librarians and students respectively, enabling comprehensive services. By leveraging deep learning methodologies, this research effectively addresses the issue of seat occupancy in library systems. It significantly enhances the accuracy of seat occupancy recognition, reduces the computational resources required for training CNNs, and greatly improves the effi-ciency of library seat management.

翻訳日:2023-06-29 14:47:16 公開日:2023-06-28

# カスケードハイブリッド最適化によるセキュアかつ高速な非同期垂直フェデレーション学習

Secure and Fast Asynchronous Vertical Federated Learning via Cascaded Hybrid Optimization ( http://arxiv.org/abs/2306.16077v1 )

ライセンス: Link先を確認

Ganyu Wang, Qingsong Zhang, Li Xiang, Boyu Wang, Bin Gu, Charles Ling

(参考訳) Vertical Federated Learning (VFL)は、複数のパーティが垂直に分割されたデータに対して、プライバシ保護モデルを共同でトレーニングできるようにするため、注目を集めている。近年の研究では、ゼロ階最適化(ZOO)の適用は実用的なVFLアルゴリズムを構築する上で多くの利点があることが示されている。しかし、ZOOベースのVFLの致命的な問題は収束速度が遅いことであり、これは現代の大規模モデルを扱う際の応用を制限している。そこで本研究では,VFLにおけるハイブリッド最適化手法を提案する。この方法では、下流モデル(クライアント)がZOOでトレーニングされ、プライバシーを保護し、内部情報が共有されないことを保証する。一方、アップストリームモデル(サーバ)は、一階最適化(foo)をローカルに更新することで、収束率を大幅に改善し、プライバシとセキュリティを損なうことなく、大規模モデルのトレーニングを可能にする。我々のVFLフレームワークがZOOベースのVFLよりも早く収束することが理論的に証明されている。本手法は,プライバシー保護レベルを維持しつつ,ZOOベースのVFLフレームワークよりも高速な収束を実現することを示す。さらに、VFLの収束は安全でないFOOベースのVFLベースラインに匹敵することを示した。さらに,本手法が大規模モデルのトレーニングを可能にすることを示す。

Vertical Federated Learning (VFL) attracts increasing attention because it empowers multiple parties to jointly train a privacy-preserving model over vertically partitioned data. Recent research has shown that applying zeroth-order optimization (ZOO) has many advantages in building a practical VFL algorithm. However, a vital problem with the ZOO-based VFL is its slow convergence rate, which limits its application in handling modern large models. To address this problem, we propose a cascaded hybrid optimization method in VFL. In this method, the downstream models (clients) are trained with ZOO to protect privacy and ensure that no internal information is shared. Meanwhile, the upstream model (server) is updated with first-order optimization (FOO) locally, which significantly improves the convergence rate, making it feasible to train the large models without compromising privacy and security. We theoretically prove that our VFL framework converges faster than the ZOO-based VFL, as the convergence of our framework is not limited by the size of the server model, making it effective for training large models with the major part on the server. Extensive experiments demonstrate that our method achieves faster convergence than the ZOO-based VFL framework, while maintaining an equivalent level of privacy protection. Moreover, we show that the convergence of our VFL is comparable to the unsafe FOO-based VFL baseline. Additionally, we demonstrate that our method makes the training of a large model feasible.

翻訳日:2023-06-29 14:46:55 公開日:2023-06-28

# 長期会話分析: ユーティリティとプライバシの探求

Long-term Conversation Analysis: Exploring Utility and Privacy ( http://arxiv.org/abs/2306.16071v1 )

ライセンス: Link先を確認

Francesco Nespoli, Jule Pohlhausen, Patrick A. Naylor, Joerg Bitzer

(参考訳) 日常生活で記録された会話の分析にはプライバシー保護が必要である。本稿では,入力特徴量削減,スペクトル平滑化,およびmcadams係数に基づく低コスト話者匿名化手法に基づくプライバシー保全特徴抽出手法について検討する。音声認識と話者検証モデルによりプライバシー保護が決定される一方で,音声活動検出と話者ダイアリゼーションシステムを用いて特徴抽出手法の有用性を評価する。我々は,mcadams係数とスペクトル平滑化の組み合わせが,プライバシを改善しながら有用性を維持していることを示す。

The analysis of conversations recorded in everyday life requires privacy protection. In this contribution, we explore a privacy-preserving feature extraction method based on input feature dimension reduction, spectral smoothing and the low-cost speaker anonymization technique based on McAdams coefficient. We assess the utility of the feature extraction methods with a voice activity detection and a speaker diarization system, while privacy protection is determined with a speech recognition and a speaker verification model. We show that the combination of McAdams coefficient and spectral smoothing maintains the utility while improving privacy.

翻訳日:2023-06-29 14:46:29 公開日:2023-06-28

# 基礎モデルを用いたフェデレーション生成学習

Federated Generative Learning with Foundation Models ( http://arxiv.org/abs/2306.16064v1 )

ライセンス: Link先を確認

Jie Zhang, Xiaohua Qi, Bo Zhao

(参考訳) 既存のフェデレートされた学習ソリューションは、クライアントとサーバの間で機能やパラメータ、ガディアンを伝達することに重点を置いている。新たな基礎生成モデルのおかげで、クライアントとサーバ間で分散トレーニングデータに関連するプロンプトを送信する、新しいフェデレーション学習フレームワーク、federated generative learningを提案する。情報学習データは、プライバシーと基礎生成モデルを含む受信したプロンプトに基づいて遠隔で合成することができる。新しいフレームワークには、通信効率の向上、分散シフトへのレジリエンス向上、実質的なパフォーマンス向上、imagenetとdomainnetデータセットの広範な実験で検証されたプライバシー保護強化など、複数のメリットがある。

Existing federated learning solutions focus on transmitting features, parameters or gadients between clients and server, which suffer from serious low-efficiency and privacy-leakage problems. Thanks to the emerging foundation generative models, we propose a novel federated learning framework, namely Federated Generative Learning, that transmits prompts associated with distributed training data between clients and server. The informative training data can be synthesized remotely based on received prompts containing little privacy and the foundation generative models. The new framework possesses multiple advantages, including improved communication efficiency, better resilience to distribution shift, substantial performance gains, and enhanced privacy protection, which are verified in extensive experiments on ImageNet and DomainNet datasets.

翻訳日:2023-06-29 14:46:21 公開日:2023-06-28

# バナッハ空間の誘導系におけるダイナミクスの収束

Convergence of Dynamics on Inductive Systems of Banach Spaces ( http://arxiv.org/abs/2306.16063v1 )

ライセンス: Link先を確認

Lauritz van Luijk, Alexander Stottmeister an Reinhard F. Werner

(参考訳) 定性的かつ定量的な物理系の多くの特徴は、ある限定的な状況下でのみ、鋭く定義されるか、抽出可能である。例えば、熱力学極限における相転移、量子論からの大きな作用における古典力学の出現、再正規化群固定点から生じる連続量子場理論である。このような多様なアプリケーションで有効な方法がほとんどないように思える。しかし、ここでは理論の極限に対する柔軟なモデリングツールを示す:バナッハ空間の帰納的極限の一般化を構成するソフトインダクティブ極限。この文脈では、ダイナミクスの収束に関する一般的な基準が定式化され、これらの基準が前述の状況に適用されることが示される。

Many features of physical systems, both qualitative and quantitative, become sharply defined or tractable only in some limiting situation. Examples are phase transitions in the thermodynamic limit, the emergence of classical mechanics from quantum theory at large action, and continuum quantum field theory arising from renormalization group fixed points. It would seem that few methods can be useful in such diverse applications. However, we here present a flexible modeling tool for the limit of theories: soft inductive limits constituting a generalization of inductive limits of Banach spaces. In this context, general criteria for the convergence of dynamics will be formulated, and these criteria will be shown to apply in the situations mentioned and more.

翻訳日:2023-06-29 14:46:04 公開日:2023-06-28

# ダイクパスとトポロジカル量子計算

Dyck Paths and Topological Quantum Computation ( http://arxiv.org/abs/2306.16062v1 )

ライセンス: Link先を確認

Vivek Kumar Singh, Akash Sinha, Pramod Padmanabhan, Indrajit Jana

(参考訳) Fibonacci anyonsの融合基底は、普遍量子計算に使用できるユニタリブレイド表現をサポートする。 3つのフィボナッチアーロンの融合基底である$\{|1\rangle, |\tau\rangle\}$と、融合基底上の2次元ブレイド群表現と標準の$(2,2)$ヤングダイアグラム上に構築されたブレイド群表現の間の同型による2つの長さのディックパスの間の写像を示す。この対応は、標準 $(N,N)$ Young tableaux がカタルーニャ数、$C_N$ であるとして、Dyck パスを用いて Fibonacci の融合基底を構築するのに役立つ。次に、局所フレドキン運動を用いて、フィボナッチ融合基底に対応するDyckパスを正確に含むスピン鎖を退化集合として構成する。本システムでは, ランダムノイズに対する安定性を検証し, トポロジカル量子計算のプラットフォームとしての有用性を確立する。最後に、所望の1量子ビット演算の実行を効率的に可能とし、所望の精度($\sim 10^{-3}$)を達成するこの回転空間におけるブレイドワードを示す。

The fusion basis of Fibonacci anyons supports unitary braid representations that can be utilized for universal quantum computation. We show a mapping between the fusion basis of three Fibonacci anyons, $\{|1\rangle, |\tau\rangle\}$, and the two length 4 Dyck paths via an isomorphism between the two dimensional braid group representations on the fusion basis and the braid group representation built on the standard $(2,2)$ Young diagrams using the Jones construction. This correspondence helps us construct the fusion basis of the Fibonacci anyons using Dyck paths as the number of standard $(N,N)$ Young tableaux is the Catalan number, $C_N$ . We then use the local Fredkin moves to construct a spin chain that contains precisely those Dyck paths that correspond to the Fibonacci fusion basis, as a degenerate set. We show that the system is gapped and examine its stability to random noise thereby establishing its usefulness as a platform for topological quantum computation. Finally, we show braidwords in this rotated space that efficiently enable the execution of any desired single-qubit operation, achieving the desired level of precision($\sim 10^{-3}$).

翻訳日:2023-06-29 14:45:45 公開日:2023-06-28

# RoMo-HER:ロバストなモデルベースの隠れ体験リプレイ

RoMo-HER: Robust Model-based Hindsight Experience Replay ( http://arxiv.org/abs/2306.16061v1 )

ライセンス: Link先を確認

Yuming Huang and Bin Ren

(参考訳) スパース報酬はマルチゴール強化学習(RL)におけるサンプル効率の低下につながる要因の1つである。 Hindsight Experience Replay (HER)に基づいて、トレーニングされたモデルと相互作用して得られた仮想軌跡を用いて、モデルに基づくラベリング手法が目標を緩和する手法が提案されている。しかし、ロボット操作環境では効果がない。本稿では,ロボット操作環境における動的モデルを効果的に活用し,サンプル効率を向上させるロバストモデルに基づくHendsight Experience Replay (RoMo-HER) と呼ばれる頑健なフレームワークを設計する。 RoMo-HERは、ダイナミックスモデルと、Foresight relabeling(FR)と呼ばれる、特定の戦略で予測開始状態を選択し、スタート状態の将来の軌跡を予測し、ダイナミックスモデルとエージェントをトレーニングするための最新のポリシーを使用してゴールを再ラベルする新しいゴールレバーリング技術に基づいて構築されている。実験の結果,複数のロボット操作環境において,RoMo-HERはHERやモデルベースHMMよりも高効率であることがわかった。さらに,RoMo-HER と Relay Hindsight Experience Replay (RHER) を統合することで,ロバストモデルに基づく Relay Hindsight Experience Replay (RoMo-RHER) と呼ばれる新しい手法が提案される。 RHERはFetchPush-v1とFetchPickandPlace-v1で25%, 26%, RHERでは25%, RHERでは26%, RHERよりも高い試料効率が得られた。

Sparse rewards are one of the factors leading to low sample efficiency in multi-goal reinforcement learning (RL). Based on Hindsight Experience Replay (HER), model-based relabeling methods have been proposed to relabel goals using virtual trajectories obtained by interacting with the trained model, which can effectively enhance the sample efficiency in accurately modelable sparse-reward environments. However, they are ineffective in robot manipulation environment. In our paper, we design a robust framework called Robust Model-based Hindsight Experience Replay (RoMo-HER) which can effectively utilize the dynamical model in robot manipulation environments to enhance the sample efficiency. RoMo-HER is built upon a dynamics model and a novel goal relabeling technique called Foresight relabeling (FR), which selects the prediction starting state with a specific strategy, predicts the future trajectory of the starting state, and then relabels the goal using the dynamics model and the latest policy to train the agent. Experimental results show that RoMo-HER has higher sample efficiency than HER and Model-based Hindsight Experience Replay in several simulated robot manipulation environments. Furthermore, we integrate RoMo-HER and Relay Hindsight Experience Replay (RHER), which currently exhibits the highest sampling efficiency in most benchmark environments, resulting in a novel approach called Robust Model-based Relay Hindsight Experience Replay (RoMo-RHER). Our experimental results demonstrate that RoMo-RHER achieves higher sample efficiency over RHER, outperforming RHER by 25% and 26% in FetchPush-v1 and FetchPickandPlace-v1, respectively.

翻訳日:2023-06-29 14:45:05 公開日:2023-06-28

# 圧縮センシングのための動的経路制御型ディープアンフォールディングネットワーク

Dynamic Path-Controllable Deep Unfolding Network for Compressive Sensing ( http://arxiv.org/abs/2306.16060v1 )

ライセンス: Link先を確認

Jiechong Song and Bin Chen and Jian Zhang

(参考訳) 深層ニューラルネットワークに最適化アルゴリズムを展開するディープ・アンフォールディング・ネットワーク(dun)は、その優れた解釈性と高性能のため、圧縮センシング(cs)において大きな成功を収めている。 DUNの各ステージは最適化の1つのイテレーションに対応する。テスト時には、すべてのサンプリングイメージを全ての段階で処理する必要があるが、これは計算負荷のコストがかかるとともに、コンテンツの復元が容易な画像も不要である。本稿では,CS再構成に着目し,新しいDPC-DUN(Dynamic Path-Controllable Deep Unfolding Network)を提案する。 dpc-dun 設計したパス制御可能なセレクタは、画像毎に高速かつ適切な経路を動的に選択でき、異なる性能・複雑さのトレードオフを制御してスリム化することができる。我々のDPC-DUNは高い柔軟性を示し、適切なトレードオフを得るために優れた性能と動的調整を提供し、現実にアピールする主な要件に対処する。コードはhttps://github.com/songjiechong/dpc-dunで入手できる。

Deep unfolding network (DUN) that unfolds the optimization algorithm into a deep neural network has achieved great success in compressive sensing (CS) due to its good interpretability and high performance. Each stage in DUN corresponds to one iteration in optimization. At the test time, all the sampling images generally need to be processed by all stages, which comes at a price of computation burden and is also unnecessary for the images whose contents are easier to restore. In this paper, we focus on CS reconstruction and propose a novel Dynamic Path-Controllable Deep Unfolding Network (DPC-DUN). DPC-DUN with our designed path-controllable selector can dynamically select a rapid and appropriate route for each image and is slimmable by regulating different performance-complexity tradeoffs. Extensive experiments show that our DPC-DUN is highly flexible and can provide excellent performance and dynamic adjustment to get a suitable tradeoff, thus addressing the main requirements to become appealing in practice. Codes are available at https://github.com/songjiechong/DPC-DUN.

翻訳日:2023-06-29 14:44:21 公開日:2023-06-28

# DUET: 2次元構造とほぼ同変表現

DUET: 2D Structured and Approximately Equivariant Representations ( http://arxiv.org/abs/2306.16058v1 )

ライセンス: Link先を確認

Xavier Suau, Federico Danieli, T. Anderson Keller, Arno Blaas, Chen Huang, Jason Ramapuram, Dan Busbridge, Luca Zappella

(参考訳) MSSL(Multiview Self-Supervised Learning)は、入力変換の集合に関する学習不変性に基づいている。しかし、不変性は変換に関連する情報を表現から部分的にあるいは完全に取り除き、そのような情報を必要とする特定の下流タスクのパフォーマンスを損なう可能性がある。本稿では,行列構造に整理された2次元表現である2DstrUcturedおよびEquivarianT表現(Coined DUET)を提案し,入力データに作用する変換について同変する。 DUET表現は、意味的に表現されたまま、入力変換に関する情報を保持する。 SimCLR (Chen et al., 2020) や ESSL (Dangovski et al., 2022) と比較すると、DUET 表現の構造的および同変性は、再構成エラーの少ない制御生成を可能にし、SimCLR や ESSL では制御不可能である。 DUETは複数の識別タスクに対して高い精度を実現し、転送学習を改善する。

Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2d representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data. DUET representations maintain information about an input transformation, while remaining semantically expressive. Compared to SimCLR (Chen et al., 2020) (unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured and equivariant), the structured and equivariant nature of DUET representations enables controlled generation with lower reconstruction error, while controllability is not possible with SimCLR or ESSL. DUET also achieves higher accuracy for several discriminative tasks, and improves transfer learning.

翻訳日:2023-06-29 14:44:01 公開日:2023-06-28

# ChatGPTは医療専門家か? バイオメディカルタスクにおける現行GPTモデルのゼロショット性能の探索

Is ChatGPT a Biomedical Expert? -- Exploring the Zero-Shot Performance of Current GPT Models in Biomedical Tasks ( http://arxiv.org/abs/2306.16108v1 )

ライセンス: Link先を確認

Samy Ateia, Udo Kruschwitz

(参考訳) 商業用大規模言語モデル (LLMs) GPT-3.5-Turbo と GPT-4 の性能を2023年のBioASQ課題から評価した。回答生成に焦点を当てたタスク11bフェーズbでは、両方のモデルがリードシステムとの競合能力を示した。注目すべきは、単純なゼロショット学習でこれを達成したことだ。関連したスニペットがなくても、パフォーマンスは良好だったが、最高のシステムと同等ではなかった。興味深いことに、より古く安価なGPT-3.5-Turboシステムでは、ファクトイドとリストの回答に基づいたQ&A設定でGPT-4と競合することができた。タスク11bのフェーズAでは、検索に焦点を当てたゼロショット学習によるクエリ拡張により、性能が向上したが、他のシステムに比べてモデルは低下した。これらの実験を再実行するのに必要なコードはGitHubから入手できる。

We assessed the performance of commercial Large Language Models (LLMs) GPT-3.5-Turbo and GPT-4 on tasks from the 2023 BioASQ challenge. In Task 11b Phase B, which is focused on answer generation, both models demonstrated competitive abilities with leading systems. Remarkably, they achieved this with simple zero-shot learning, grounded with relevant snippets. Even without relevant snippets, their performance was decent, though not on par with the best systems. Interestingly, the older and cheaper GPT-3.5-Turbo system was able to compete with GPT-4 in the grounded Q&A setting on factoid and list answers. In Task 11b Phase A, focusing on retrieval, query expansion through zero-shot learning improved performance, but the models fell short compared to other systems. The code needed to rerun these experiments is available through GitHub.

翻訳日:2023-06-29 14:36:17 公開日:2023-06-28

# 1mのパラメータで十分か? 医用画像分割のための軽量CNNモデル

1M parameters are enough? A lightweight CNN-based model for medical image segmentation ( http://arxiv.org/abs/2306.16103v1 )

ライセンス: Link先を確認

Binh-Duong Dinh, Thanh-Thu Nguyen, Thi-Thao Tran, Van-Truong Pham

(参考訳) 畳み込みニューラルネットワーク(CNN)とトランスフォーマーベースのモデルは、高レベルの特徴を抽出し、画像の重要な側面を捉える能力により、医療画像セグメンテーションに広く適用されている。しかし、高い精度の必要性と低い計算コストの要求との間にはトレードオフがしばしばある。高いパラメータを持つモデルは理論的にはより優れた性能を達成できるが、計算の複雑さとメモリ使用量の増加をもたらすため、実装には実用的ではない。本稿では,u-lite という,同一のままでも優れた性能を得られる軽量な u-net ベースのモデルを提案する。我々は,CNNの強みを生かし,演算パラメータの著しい削減を図るために,Depthwise Separable Convolutionの原理に基づいてU-Liteを設計する。具体的には、エンコーダとデコーダの両方で7x7のカーネルを持つAxial Depthwise Convolutionsを提案し、モデル受容場を拡大する。性能をさらに向上するため,フィルタ3x3によるAxial Dilated Depthwise Convolutionsをいくつかのブランチとして使用しています。全体として、U-Lite は 878K のパラメータしか持たず、従来の U-Net の35倍も小さく、トランスフォーマーベースのモデルよりもはるかに少ない。提案手法は, 計算複雑性を削減しつつ, 他の最先端アーキテクチャと比較して医療用セグメンテーションタスクにおいて印象的な性能を実現する。コードはhttps://github.com/duong-db/u-lite。

Convolutional neural networks (CNNs) and Transformer-based models are being widely applied in medical image segmentation thanks to their ability to extract high-level features and capture important aspects of the image. However, there is often a trade-off between the need for high accuracy and the desire for low computational cost. A model with higher parameters can theoretically achieve better performance but also result in more computational complexity and higher memory usage, and thus is not practical to implement. In this paper, we look for a lightweight U-Net-based model which can remain the same or even achieve better performance, namely U-Lite. We design U-Lite based on the principle of Depthwise Separable Convolution so that the model can both leverage the strength of CNNs and reduce a remarkable number of computing parameters. Specifically, we propose Axial Depthwise Convolutions with kernels 7x7 in both the encoder and decoder to enlarge the model receptive field. To further improve the performance, we use several Axial Dilated Depthwise Convolutions with filters 3x3 for the bottleneck as one of our branches. Overall, U-Lite contains only 878K parameters, 35 times less than the traditional U-Net, and much more times less than other modern Transformer-based models. The proposed model cuts down a large amount of computational complexity while attaining an impressive performance on medical segmentation tasks compared to other state-of-the-art architectures. The code will be available at: https://github.com/duong-db/U-Lite.

翻訳日:2023-06-29 14:36:01 公開日:2023-06-28

# 任意分布型雑音量子チャネルの古典的容量

Classical Capacity of Arbitrarily Distributed Noisy Quantum Channels ( http://arxiv.org/abs/2306.16102v1 )

ライセンス: Link先を確認

Indrakshi Dey, Harun Siljak, Nicola Marchetti

(参考訳) 量子コンピュータと量子衛星の迅速な展開により、古典的な情報を交換できる量子およびハイブリッドな古典量子ネットワークの設計と展開の必要性が高まっている。この文脈では、古典的情報を含む任意の量子チャネルに対する古典的ノイズと量子的ノイズの混合の影響に関する基礎研究を行う。このような混合ノイズを考える背景にある理論的根拠は、量子ノイズは異なるメモリやリピータ技術のような量子伝送シナリオにおける異なる絡み合いや不一致から生じうるが、古典的なノイズは古典的信号との共存から生じるものである。この目的に向けて,古典的システムの観点から混合雑音の分布を導出し,混合雑音の存在下で任意の分散量子チャネル上で実現可能なチャネル容量を定式化する。数値実験の結果,光子数の増加に伴って容量が増加することがわかった。

With the rapid deployment of quantum computers and quantum satellites, there is a pressing need to design and deploy quantum and hybrid classical-quantum networks capable of exchanging classical information. In this context, we conduct the foundational study on the impact of a mixture of classical and quantum noise on an arbitrary quantum channel carrying classical information. The rationale behind considering such mixed noise is that quantum noise can arise from different entanglement and discord in quantum transmission scenarios, like different memories and repeater technologies, while classical noise can arise from the coexistence with the classical signal. Towards this end, we derive the distribution of the mixed noise from a classical system's perspective, and formulate the achievable channel capacity over an arbitrary distributed quantum channel in presence of the mixed noise. Numerical results demonstrate that capacity increases with the increase in the number of photons per usage.

翻訳日:2023-06-29 14:35:30 公開日:2023-06-28

# 密度密度相関を用いた相関差分実現における量子気体のキャラクタリゼーション

Characterizing quantum gases in correlated-disorder realizations using density-density correlations ( http://arxiv.org/abs/2306.16099v1 )

ライセンス: Link先を確認

Silvia Hiebel, Benjamin Nagler, Sian Barbosa, Jennifer Koch, and Artur Widera

(参考訳) 物理系における障害の役割は、マクロとミクロの世界で広く研究されている。静的障害は多くの場合よく理解されているが、時間依存障害が量子気体に与える影響はいまだに研究されていない。実験では、波長可変な相関時間を持つ超低温量子気体の時間依存光スペックル障害を生成・特徴付ける。実験的に、コヒーレント光は、スタティックと回転ディフューザの組み合わせを照らし、ディフューザの構造による空間変化相と相対回転による時間変化相とを収集する。ディフューザの回転速度は、ダイナミクスの特徴的な時間スケールを決定する。研究対象の量子気体の典型的な時間スケールと一致する広い範囲で調整することができる。分子ボース・アインシュタイン凝縮体に対するその影響を観測し,その強度分布とその場を測定することで,動的スペックルパターンを特徴付ける。 1つのディフューザが共通の光軸の周りで相対的に回転すると、光学スペックルの強度相関と量子ガス密度密度相関を追跡する。その結果,両測定法に比較して結果が得られた。この設定により、量子ガスの特性に適応した乱れポテンシャルを調整できる。これらの研究は、制御された動的不規則ポテンシャルを用いて相互作用する量子気体における非平衡物理学の研究の道を開いた。

The role of disorder on physical systems has been widely studied in the macroscopic and microscopic world. While static disorder is well understood in many cases, the impact of time-dependent disorder on quantum gases is still poorly investigated. In our experimental setup, we produce and characterize time-dependent optical-speckle disorder for ultracold quantum gases with tunable correlation time. Experimentally, coherent light illuminates a combination of a static and a rotating diffuser, thereby collecting a spatially varying phase due to the diffusers' structure and a temporally variable phase due to the relative rotation. The rotation speed of the diffuser determines a characteristic time scale of the dynamics. It can be tuned within a broad range matching typical time scales of the quantum gases investigated. We characterize the dynamic speckle pattern ex-situ by measuring its intensity distribution and in-situ by observing its impact on a molecular Bose-Einstein condensate. As one diffuser rotates relative to the other around the common optical axis, we trace the optical speckle's intensity correlations and the quantum gas' density-density correlations. Our results show comparable outcomes for both measurement methods. The setup allows us to tune the disorder potential adapted to the characteristics of the quantum gas. These studies pave the way for investigating nonequilibrium physics in interacting quantum gases using controlled dynamical-disorder potentials.

翻訳日:2023-06-29 14:35:15 公開日:2023-06-28

# Chan-Vese Attention U-Net:ロバストセグメンテーションの注意機構

Chan-Vese Attention U-Net: An attention mechanism for robust segmentation ( http://arxiv.org/abs/2306.16098v1 )

ライセンス: Link先を確認

Nicolas Makaroff and Laurent D. Cohen

(参考訳) 畳み込みニューラルネットワークを用いたセグメンテーションアルゴリズムの結果を研究するとき、結果の信頼性と一貫性について疑問を呈する。このことは、疑わしい余地がほとんどないアプリケーションでそのようなアルゴリズムを使用する可能性に疑問を呈する。本稿では,U-Netモデルのような標準CNNアーキテクチャによって与えられるセグメンテーションマスクをより正確に制御するために,Chan-Veseエネルギー最小化に基づく新しいアテンションゲートを提案する。このメカニズムにより、pdeの解像度に基づいてセグメンテーションの制約を得ることができる。本研究により,ニューラルネットワークが保持する空間情報を関心領域で観測し,二分節分割における競合結果を得ることができた。 mri脳画像データベース上での医用画像分割におけるこの手法の有効性について述べる。

When studying the results of a segmentation algorithm using convolutional neural networks, one wonders about the reliability and consistency of the results. This leads to questioning the possibility of using such an algorithm in applications where there is little room for doubt. We propose in this paper a new attention gate based on the use of Chan-Vese energy minimization to control more precisely the segmentation masks given by a standard CNN architecture such as the U-Net model. This mechanism allows to obtain a constraint on the segmentation based on the resolution of a PDE. The study of the results allows us to observe the spatial information retained by the neural network on the region of interest and obtains competitive results on the binary segmentation. We illustrate the efficiency of this approach for medical image segmentation on a database of MRI brain images.

翻訳日:2023-06-29 14:34:54 公開日:2023-06-28

# スパース表現、推論、学習

Sparse Representations, Inference and Learning ( http://arxiv.org/abs/2306.16097v1 )

ライセンス: Link先を確認

Clarissa Lauditi, Emanuele Troiani and Marc M\'ezard

(参考訳) 近年、統計物理学は、機械学習で起こるような大きな次元の推論問題を調べる上で有用なツールであることが証明されている。統計物理学は、解の基本的な限界を研究する分析ツールを提供し、個々のインスタンスを解決するアルゴリズムを提案する。これらのノートでは、2022年にル・フーシュのサマースクールで行われたMarc M\'ezard氏の講義に基づき、圧縮されたセンシング問題やパーセプトロンにおける学習問題を含む、弱い長距離相互作用の様々な問題に使用できる一般的な枠組みを提示する。これらの問題は,理論ツールやアルゴリズムとして,キャビティ手法の開発を通じて,レプリカ対称レベルでどのように研究できるのかを考察する。

In recent years statistical physics has proven to be a valuable tool to probe into large dimensional inference problems such as the ones occurring in machine learning. Statistical physics provides analytical tools to study fundamental limitations in their solutions and proposes algorithms to solve individual instances. In these notes, based on the lectures by Marc M\'ezard in 2022 at the summer school in Les Houches, we will present a general framework that can be used in a large variety of problems with weak long-range interactions, including the compressed sensing problem, or the problem of learning in a perceptron. We shall see how these problems can be studied at the replica symmetric level, using developments of the cavity methods, both as a theoretical tool and as an algorithm.

翻訳日:2023-06-29 14:34:42 公開日:2023-06-28

# chatlaw: 外部知識ベースを統合したオープンソースの法的大型言語モデル

ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases ( http://arxiv.org/abs/2306.16092v1 )

ライセンス: Link先を確認

Jiaxi Cui, Zongjian Li, Yang Yan, Bohua Chen and Li Yuan

(参考訳) 大規模言語モデル(LLM)は、様々な領域における自然言語処理タスクに革命をもたらす可能性を示しており、垂直特有の大規模モデルに大きな関心を喚起している。しかし、独自のデータ蓄積を利用して金融分野を前進させたbloomberggptやfingptのようなプロプライエタリなモデルとは異なり、デジタルトランスフォーメーションを促進するために、中国の法律領域に似たような大きな言語モデルはない。本稿では,ChatLawという,オープンソースの法的大規模言語モデルを提案する。データ品質の重要性から、法的なドメインの微調整データセットを慎重に設計しました。さらに,参照データ検索における法データスクリーニングにおけるモデル幻覚の問題を克服するために,ベクトルデータベース検索とキーワード検索を組み合わせた手法を導入し,ベクトルデータベース検索のみに依存する不正確さを効果的に軽減する。さらに,参照データに存在する誤差を克服する大規模モデルの能力を高めること,モデルレベルでのモデル幻覚の問題を最適化すること,大規模モデルの問題解決能力を向上させることを提案する。また、当社のモデルとデータの一部をhttps://github.com/PKU-YuanGroup/ChatLaw.comでオープンソース化しました。

Large Language Models (LLMs) have shown the potential to revolutionize natural language processing tasks in various domains, sparking great interest in vertical-specific large models. However, unlike proprietary models such as BloombergGPT and FinGPT, which have leveraged their unique data accumulations to make strides in the finance domain, there hasn't not many similar large language models in the Chinese legal domain to facilitate its digital transformation. In this paper, we propose an open-source legal large language model named ChatLaw. Due to the importance of data quality, we carefully designed a legal domain fine-tuning dataset. Additionally, to overcome the problem of model hallucinations in legal data screening during reference data retrieval, we introduce a method that combines vector database retrieval with keyword retrieval to effectively reduce the inaccuracy of relying solely on vector database retrieval. Furthermore, we propose a self-attention method to enhance the ability of large models to overcome errors present in reference data, further optimizing the issue of model hallucinations at the model level and improving the problem-solving capabilities of large models. We also open-sourced our model and part of the data at https://github.com/PKU-YuanGroup/ChatLaw.

翻訳日:2023-06-29 14:34:26 公開日:2023-06-28

# ニューラルネットワーク活性化機能の実証的損失景観解析

Empirical Loss Landscape Analysis of Neural Network Activation Functions ( http://arxiv.org/abs/2306.16090v1 )

ライセンス: Link先を確認

Anna Sergeevna Bosman, Andries Engelbrecht, Marde Helbig

(参考訳) 非線型性を有効にすることで、活性化関数はニューラルネットワーク設計において重要な役割を果たす。活性化関数の選択は、以前、結果として生じる損失景観の特性に影響を及ぼすことが示された。アクティベーション関数とロスランドスケープ特性の関係を理解することは、ニューラルアーキテクチャとトレーニングアルゴリズム設計において重要である。本研究は,双曲的接点,整流線形単位,指数的線形単位活性化関数に関連するニューラルネットワーク損失のランドスケープを実験的に検討する。整流された線形ユニットは最も凸なロスランドスケープを示し、指数線型ユニットは最も平坦なロスランドスケープを示し、より優れた一般化性能を示す。全ての活性化関数に対して、損失景観における広狭谷の存在が確立され、狭谷は飽和ニューロンと暗黙的に規則化されたネットワーク構成と相関することが示されている。

Activation functions play a significant role in neural network design by enabling non-linearity. The choice of activation function was previously shown to influence the properties of the resulting loss landscape. Understanding the relationship between activation functions and loss landscape properties is important for neural architecture and training algorithm design. This study empirically investigates neural network loss landscapes associated with hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions. Rectified linear unit is shown to yield the most convex loss landscape, and exponential linear unit is shown to yield the least flat loss landscape, and to exhibit superior generalisation performance. The presence of wide and narrow valleys in the loss landscape is established for all activation functions, and the narrow valleys are shown to correlate with saturated neurons and implicitly regularised network configurations.

翻訳日:2023-06-29 14:34:05 公開日:2023-06-28

# mastering nordschleife - モータースポーツにおけるai戦略決定のための総合的レースシミュレーション

Mastering Nordschleife -- A comprehensive race simulation for AI strategy decision-making in motorsports ( http://arxiv.org/abs/2306.16088v1 )

ライセンス: Link先を確認

Max Boettinger, David Klotz

(参考訳) サーキットモータースポーツの分野では、レース戦略はレースの結果を決定する上で重要な役割を果たす。この戦略は燃料消費とタイヤ性能の劣化のために必要となるピット停止のタイミングに焦点を当てている。レース戦略の目的は、タイヤ交換や燃料補給のようなピットストップの利点と、ピットレーンで発生する時間損失のバランスをとることである。現在のレースシミュレーションは、最善のレース戦略を推定するために使用され、粒度、確率的事象のモデル化、インラップの手動入力を必要とする。本稿では,gtレーシングに適した新しいシミュレーションモデルを開発し,戦略決定の自動化に人工知能を活用することで,これらの制約に対処する。 openaiのジムフレームワークとシミュレーションを統合することで、強化学習環境が作成され、エージェントが訓練される。実験パラメータ検証のために,2020 N\"urburgring Langstrecken Serie の時系列データをもとに,様々なハイパーパラメータ構成,観測空間,報酬関数を評価した。その結果、訓練されたエージェントがピット停止タイミングと燃料補給量に関する合理的な判断を行うため、レース戦略決定を改善するための強化学習の可能性を示した。学習率、腐敗率、エピソード数といった重要なパラメータは重要な要因として特定され、燃料質量と現在の人種位置の組み合わせは政策開発に最も有効であることが証明される。この論文は、レースシミュレーションにおける強化学習の幅広い応用に寄与し、特にgtレーシング領域においてfia formula~1を超えるレース戦略最適化の可能性を切り開く。

In the realm of circuit motorsports, race strategy plays a pivotal role in determining race outcomes. This strategy focuses on the timing of pit stops, which are necessary due to fuel consumption and tire performance degradation. The objective of race strategy is to balance the advantages of pit stops, such as tire replacement and refueling, with the time loss incurred in the pit lane. Current race simulations, used to estimate the best possible race strategy, vary in granularity, modeling of probabilistic events, and require manual input for in-laps. This paper addresses these limitations by developing a novel simulation model tailored to GT racing and leveraging artificial intelligence to automate strategic decisions. By integrating the simulation with OpenAI's Gym framework, a reinforcement learning environment is created and an agent is trained. The study evaluates various hyperparameter configurations, observation spaces, and reward functions, drawing upon historical timing data from the 2020 N\"urburgring Langstrecken Serie for empirical parameter validation. The results demonstrate the potential of reinforcement learning for improving race strategy decision-making, as the trained agent makes sensible decisions regarding pit stop timing and refueling amounts. Key parameters, such as learning rate, decay rate and the number of episodes, are identified as crucial factors, while the combination of fuel mass and current race position proves most effective for policy development. The paper contributes to the broader application of reinforcement learning in race simulations and unlocks the potential for race strategy optimization beyond FIA Formula~1, specifically in the GT racing domain.

翻訳日:2023-06-29 14:33:50 公開日:2023-06-28

# 生涯変化検出: ロボットナビゲーションにおける小物体変化検出のための連続領域適応

Lifelong Change Detection: Continuous Domain Adaptation for Small Object Change Detection in Every Robot Navigation ( http://arxiv.org/abs/2306.16086v1 )

ライセンス: Link先を確認

Koji Takeda, Kanji Tanaka, Yoshimasa Nakamura

(参考訳) 最近発表されたロボット工学の研究領域である地景変化検出は、視覚の不確かさと複雑な非線形視点の投影が組み合わさって、その不適切さに苦しめられている。不適切さを正則化するために、一般的に適用される教師付き学習方法(cscd-netなど)は、手作業で注釈付き高品質なオブジェクトクラス固有の優先順位に依存する。本稿では,手動アノテーションが利用できない汎用アプリケーションドメインについて検討し,完全な自己監督アプローチを提案する。本手法は,日常のロボットナビゲーションにおいて検出される物体の変化を,将来の変化検出タスクを改善するために,追加の事前として再利用できるという,強力で汎用的な考え方を採用する。さらに、グラウンダービューの小さなオブジェクト変更検出という、新しい挑戦的な実践シナリオにおいて、堅牢化フレームワークを実装し、実験的に検証する。

The recently emerging research area in robotics, ground view change detection, suffers from its ill-posed-ness because of visual uncertainty combined with complex nonlinear perspective projection. To regularize the ill-posed-ness, the commonly applied supervised learning methods (e.g., CSCD-Net) rely on manually annotated high-quality object-class-specific priors. In this work, we consider general application domains where no manual annotation is available and present a fully self-supervised approach. The present approach adopts the powerful and versatile idea that object changes detected during everyday robot navigation can be reused as additional priors to improve future change detection tasks. Furthermore, a robustified framework is implemented and verified experimentally in a new challenging practical application scenario: ground-view small object change detection.

翻訳日:2023-06-29 14:33:22 公開日:2023-06-28

# INSTA-BEEER: 高速かつ高精度なオブジェクトインスタンスセグメンテーションのための明示的なエラー推定とリファインメント

INSTA-BEEER: Explicit Error Estimation and Refinement for Fast and Accurate Unseen Object Instance Segmentation ( http://arxiv.org/abs/2306.16132v1 )

ライセンス: Link先を確認

Seunghyeok Back, Sangbeom Lee, Kangmin Kim, Joosoon Lee, Sungho Shin, Jaemo Maeng, Kyoobin Lee

(参考訳) ロボット操作には、見えない物体の効率的かつ正確なセグメンテーションが不可欠である。しかし、過度あるいは過度なセグメンテーションのため、依然として困難である。既存の改良法はセグメンテーション品質を向上させることができるが、小さな境界エラーのみを修正できるか、十分に高速ではない。本研究では,インスタンスの追加と削除,および境界のシャープ化を可能にする改良モデルであるINSTA-BEEER(INSTAnce boundary Explicit Error Estimation and Refinement)を提案する。このモデルは、エラー推定-then-refinementスキームを利用して、最初に、最初のセグメンテーションでインスタンス境界の真正、真負、偽正、偽負のピクセルのピクセル境界の明示的なエラーを推定する。その後、これらの誤差推定をガイダンスとして、初期セグメンテーションを洗練する。実験により,提案モデルによりセグメント化が著しく向上し,最先端性能が達成された。さらに、高速ランタイム(0.1秒未満)で、モデルは様々な初期セグメンテーションメソッドのパフォーマンスを一貫して改善し、実用的なロボットアプリケーションに適している。

Efficient and accurate segmentation of unseen objects is crucial for robotic manipulation. However, it remains challenging due to over- or under-segmentation. Although existing refinement methods can enhance the segmentation quality, they fix only minor boundary errors or are not sufficiently fast. In this work, we propose INSTAnce Boundary Explicit Error Estimation and Refinement (INSTA-BEEER), a novel refinement model that allows for adding and deleting instances and sharpening boundaries. Leveraging an error-estimation-then-refinement scheme, the model first estimates the pixel-wise boundary explicit errors: true positive, true negative, false positive, and false negative pixels of the instance boundary in the initial segmentation. It then refines the initial segmentation using these error estimates as guidance. Experiments show that the proposed model significantly enhances segmentation, achieving state-of-the-art performance. Furthermore, with a fast runtime (less than 0.1 s), the model consistently improves performance across various initial segmentation methods, making it highly suitable for practical robotic applications.

翻訳日:2023-06-29 14:27:42 公開日:2023-06-28

# 位置認識型対向パッチの分布モデル

Distributional Modeling for Location-Aware Adversarial Patches ( http://arxiv.org/abs/2306.16131v1 )

ライセンス: Link先を確認

Xingxing Wei, Shouwei Ruan, Yinpeng Dong and Hang Su

(参考訳) 敵パッチは、物理的な世界で敵攻撃を行う重要な形態の1つである。既存の敵パッチの自然性と攻撃性を改善するために,対象オブジェクト上のパッチの位置を最適化プロセスに統合して攻撃を行う位置認識パッチを提案する。効果的ではあるが、パッチを配置するための最適な場所を効率的に見つけることは、特にブラックボックス攻撃設定下では困難である。本稿では,分散最適化型逆数パッチ(DOPatch, Distribution-Optimized Adversarial Patch)を提案する。第一に、異なるモデルにまたがる場所の分布がかなり似ていることを発見し、サーロゲートモデルに最適化された分布的プリミティブを使用して、未発見のモデルに対して効率的なクエリベースの攻撃を実現できる。第二に、DOPatchは、敵位置の分布を特徴付けることにより、多様な敵のサンプルを生成することができる。そこで我々は,DOP-DMAT (Distributal-Modeling Adversarial Training) を慎重に設計することで,位置対応パッチに対するモデルの堅牢性を向上させることができる。顔認識および画像認識タスクにおけるDOPatchの評価を行い、既存の手法よりも優れた性能と効率性を示す。また,本手法の有効性を検証し,敵位置の分布に関する知見を提供するため,広範囲にわたるアブレーション研究と分析を行った。

Adversarial patch is one of the important forms of performing adversarial attacks in the physical world. To improve the naturalness and aggressiveness of existing adversarial patches, location-aware patches are proposed, where the patch's location on the target object is integrated into the optimization process to perform attacks. Although it is effective, efficiently finding the optimal location for placing the patches is challenging, especially under the black-box attack settings. In this paper, we propose the Distribution-Optimized Adversarial Patch (DOPatch), a novel method that optimizes a multimodal distribution of adversarial locations instead of individual ones. DOPatch has several benefits: Firstly, we find that the locations' distributions across different models are pretty similar, and thus we can achieve efficient query-based attacks to unseen models using a distributional prior optimized on a surrogate model. Secondly, DOPatch can generate diverse adversarial samples by characterizing the distribution of adversarial locations. Thus we can improve the model's robustness to location-aware patches via carefully designed Distributional-Modeling Adversarial Training (DOP-DMAT). We evaluate DOPatch on various face recognition and image recognition tasks and demonstrate its superiority and efficiency over existing methods. We also conduct extensive ablation studies and analyses to validate the effectiveness of our method and provide insights into the distribution of adversarial locations.

翻訳日:2023-06-29 14:27:22 公開日:2023-06-28

# MLSMM:機械学習セキュリティ成熟度モデル

MLSMM: Machine Learning Security Maturity Model ( http://arxiv.org/abs/2306.16127v1 )

ライセンス: Link先を確認

Felix Jedrzejewski, Davide Fucci, Oleksandr Adamov

(参考訳) 機械学習(ML)ベースのソフトウェアコンポーネントの開発におけるセキュリティプラクティスの成熟度を評価することは、従来のソフトウェア開発ほど注目されていない。本稿では,ML開発ライフサイクルに沿ってセキュリティプラクティスを整理し,それぞれが3段階の成熟度を確立する機械学習セキュリティ成熟度モデル(MLSMM)を提案する。我々は,MLSMMを産業と学界の緊密な連携の一歩として想定する。

Assessing the maturity of security practices during the development of Machine Learning (ML) based software components has not gotten as much attention as traditional software development. In this Blue Sky idea paper, we propose an initial Machine Learning Security Maturity Model (MLSMM) which organizes security practices along the ML-development lifecycle and, for each, establishes three levels of maturity. We envision MLSMM as a step towards closer collaboration between industry and academia.

翻訳日:2023-06-29 14:26:58 公開日:2023-06-28

# 自動転写表データのより効率的な手作業レビュー

More efficient manual review of automatically transcribed tabular data ( http://arxiv.org/abs/2306.16126v1 )

ライセンス: Link先を確認

Bj{\o}rn-Richard Pedersen, Rigmor Katrine Johansen, Einar Holsb{\o}, Hilde Sommerseth, Lars Ailo Bongo

(参考訳) 機械学習手法は、歴史的データの書き起こしに有用であることが証明されている。しかし、精度の高い手法による結果には手動による検証と修正が必要である。このような手作業によるレビューは, 時間と費用がかかるため, より効率的に行うことが目的である。以前は、ノルウェーの1950年国勢調査(97%)から230万個の手書きの職業コードを書き起こすのに機械学習を使いました。モデルの信頼性が最も低い90,000 (3%) のコードを手作業でレビューしました。 9万のコードを人間のレビュアーに割り当て、アノテーションツールを使ってコードをレビューしました。レビューア合意を評価するために、いくつかのコードは複数のレビューアに割り当てられた。そして、レビュー結果を分析して、精度の改善と努力の関係を理解する。さらに、ワークフローを改善するためにレビュアーにインタビューしました。レビュアーはラベルの62.8%を修正し、31.9%のケースでモデルラベルに同意した。画像の約0.2%はラベルを割り当てられず、5.1%はレビュアーが不確実か、または無効なラベルを割り当てられた。 9000枚の画像は、複数のレビュアーによって独立にレビューされ、86.43%の合意と8.96%の不一致が得られた。私たちの自動転写は、最も頻度の高いコードに対して偏りがあり、最も低い頻度のコードに対して高い誤分類があることが分かりました。インタビューの結果,レビュアーは内部品質管理を行い,カスタムツールが適していることがわかった。したがって、レビュアーは1人だけですが、不確実性を報告すべきです。

Machine learning methods have proven useful in transcribing historical data. However, results from even highly accurate methods require manual verification and correction. Such manual review can be time-consuming and expensive, therefore the objective of this paper was to make it more efficient. Previously, we used machine learning to transcribe 2.3 million handwritten occupation codes from the Norwegian 1950 census with high accuracy (97%). We manually reviewed the 90,000 (3%) codes with the lowest model confidence. We allocated those 90,000 codes to human reviewers, who used our annotation tool to review the codes. To assess reviewer agreement, some codes were assigned to multiple reviewers. We then analyzed the review results to understand the relationship between accuracy improvements and effort. Additionally, we interviewed the reviewers to improve the workflow. The reviewers corrected 62.8% of the labels and agreed with the model label in 31.9% of cases. About 0.2% of the images could not be assigned a label, while for 5.1% the reviewers were uncertain, or they assigned an invalid label. 9,000 images were independently reviewed by multiple reviewers, resulting in an agreement of 86.43% and disagreement of 8.96%. We learned that our automatic transcription is biased towards the most frequent codes, with a higher degree of misclassification for the lowest frequency codes. Our interview findings show that the reviewers did internal quality control and found our custom tool well-suited. So, only one reviewer is needed, but they should report uncertainty.

翻訳日:2023-06-29 14:26:50 公開日:2023-06-28

# ソーシャルメディア上でうつ病を識別するためのフレームワーク: mentalriskes@iberlef 2023

A Framework for Identifying Depression on Social Media: MentalRiskES@IberLEF 2023 ( http://arxiv.org/abs/2306.16125v1 )

ライセンス: Link先を確認

Simon Sanchez Viloria, Daniel Peix del R\'io, Rub\'en Berm\'udez Cabo, Guillermo Arturo Arrojo Fuentes, Isabel Segura-Bedmar

(参考訳) 本稿では,IberLEF 2023におけるMentalRiskESタスクへの参加について述べる。そのタスクは、ソーシャルメディアの活動に基づいて、抑うつを経験する個人の可能性を予測することであった。データセットは、175人のテレグラムユーザーの会話から成り、それぞれが障害に苦しむ証拠に従ってラベル付けされた。従来の機械学習とディープラーニングを組み合わせることで、バイナリ分類、単純な回帰、マルチクラス分類、マルチクラス回帰という4つの予測サブタスクを解くことができた。マルチクラスの回帰ケースを解くためにモデルをトレーニングし、他の3つのサブタスクで動作するように予測を変換することで、この問題に対処した。 BERTモデルを微調整し、文の埋め込みを線形回帰器への入力として使用し、後者がより良い結果を得る2つの異なるモデリング手法の性能を比較した。結果はhttps://github.com/simonsanvil/earlydepression-mentalriskesで再現できます。

This paper describes our participation in the MentalRiskES task at IberLEF 2023. The task involved predicting the likelihood of an individual experiencing depression based on their social media activity. The dataset consisted of conversations from 175 Telegram users, each labeled according to their evidence of suffering from the disorder. We used a combination of traditional machine learning and deep learning techniques to solve four predictive subtasks: binary classification, simple regression, multiclass classification, and multiclass regression. We approached this by training a model to solve the multiclass regression case and then transforming the predictions to work for the other three subtasks. We compare the performance of two different modeling approaches: fine-tuning a BERT-based model and using sentence embeddings as inputs to a linear regressor, with the latter yielding better results. The code to reproduce our results can be found at: https://github.com/simonsanvil/EarlyDepression-MentalRiskES.

翻訳日:2023-06-29 14:26:27 公開日:2023-06-28

# コントラストの識別を促進する意味ポジティブペア

Semantic Positive Pairs for Enhancing Contrastive Instance Discrimination ( http://arxiv.org/abs/2306.16122v1 )

ライセンス: Link先を確認

Mohammad Alkhalefi, Georgios Leontidis, Mingjun Zhong

(参考訳) インスタンス識別に基づく自己教師付き学習アルゴリズムは,表現の崩壊を効果的に防止し,表現学習に有望な結果をもたらす。しかし、組込み空間において正のペア(すなわち同じインスタンスの2つのビュー)を引き付け、それらのカテゴリに関係なく他のすべてのインスタンス(すなわち負のペア)を撃退するプロセスは、重要な特徴を破棄する。そこで本研究では,類似した意味的内容を持つ画像を特定し,それを正のインスタンスとして扱う手法であるspps(semantic positive pairs set)を提案し,表現学習中に重要な特徴を捨てるリスクを低減させる。このアプローチは、SimCLRやMOCOのような対照的なインスタンス識別フレームワークでも機能します。我々は、ImageNet、STL-10、CIFAR-10の3つのデータセットで実験を行い、我々のアプローチを評価する。実験結果から,本手法は3つのデータセットのベースライン手法であるvanilla SimCLRよりも一貫して優れており,例えば,バッチサイズ1024および800エポックのImageNet上では,線形評価プロトコル下でのvanilla SimCLRを4.18%改善する。

Self-supervised learning algorithms based on instance discrimination effectively prevent representation collapse and produce promising results in representation learning. However, the process of attracting positive pairs (i.e., two views of the same instance) in the embedding space and repelling all other instances (i.e., negative pairs) irrespective of their categories could result in discarding important features. To address this issue, we propose an approach to identifying those images with similar semantic content and treating them as positive instances, named semantic positive pairs set (SPPS), thereby reducing the risk of discarding important features during representation learning. Our approach could work with any contrastive instance discrimination framework such as SimCLR or MOCO. We conduct experiments on three datasets: ImageNet, STL-10 and CIFAR-10 to evaluate our approach. The experimental results show that our approach consistently outperforms the baseline method vanilla SimCLR across all three datasets; for example, our approach improves upon vanilla SimCLR under linear evaluation protocol by 4.18% on ImageNet with a batch size 1024 and 800 epochs.

翻訳日:2023-06-29 14:26:12 公開日:2023-06-28

# gp経路のディープラーニングアルゴリズムを用いたnhsトラストにおける正常胸部x線撮影の実世界性能

Real-World Performance of Autonomously Reporting Normal Chest Radiographs in NHS Trusts Using a Deep-Learning Algorithm on the GP Pathway ( http://arxiv.org/abs/2306.16115v1 )

ライセンス: Link先を確認

Jordan Smith, Tom Naunton Morgan, Paul Williams, Qaiser Malik, Simon Rasalingham

(参考訳) AIMは、現在、2つのNHSトラストにおいて診断決定支援ソフトウェアとしてデプロイされているディープラーニング(DL)アルゴリズムの性能を分析し、アクティブな臨床経路における正常な胸部X線を識別する。材料と方法 2022年12月からサマセット NHS Foundation Trust (SFT) に、2023年3月からCalderdale & Huddersfield NHS Foundation Trust (CHFT) にDLアルゴリズムが配備されている。このアルゴリズムは展開前に開発・訓練され、gp要求胸部x線(cxr)に異常点を割り当てるために使用される。このアルゴリズムは、最も低い異常スコアを持つ試験のサブセットをHigh Confidence Normal (HCN) に分類し、その結果をTrustに表示する。この2段階の研究には、アルゴリズムによって6週間にわたって処理された4,654のcxr連続検査が含まれる。その結果,評価試験の20.0%(930)をhcnとして分類すると,負の予測値(npv)0.96。検査の0.77% (36) が誤ってhcnと分類され、検査医による臨床的に有意な異常は認められなかった。 DLソフトウェアは臨床医への迅速なサービス水準を維持し、平均7.1秒でTrustsに返却された。結論 dlアルゴリズムは低いエラー率で動作し、cxrのサブセットを正常に高信頼で自動報告するために使用される自動診断意思決定支援ツールとして非常に有効である。すべてのcxrの20%を取り除き、レポーターの作業負荷を削減し、放射線部門が他の場所でリソースに集中できるようにする。

AIM To analyse the performance of a deep-learning (DL) algorithm currently deployed as diagnostic decision support software in two NHS Trusts used to identify normal chest x-rays in active clinical pathways. MATERIALS AND METHODS A DL algorithm has been deployed in Somerset NHS Foundation Trust (SFT) since December 2022, and at Calderdale & Huddersfield NHS Foundation Trust (CHFT) since March 2023. The algorithm was developed and trained prior to deployment, and is used to assign abnormality scores to each GP-requested chest x-ray (CXR). The algorithm classifies a subset of examinations with the lowest abnormality scores as High Confidence Normal (HCN), and displays this result to the Trust. This two-site study includes 4,654 CXR continuous examinations processed by the algorithm over a six-week period. RESULTS When classifying 20.0% of assessed examinations (930) as HCN, the model classified exams with a negative predictive value (NPV) of 0.96. There were 0.77% of examinations (36) classified incorrectly as HCN, with none of the abnormalities considered clinically significant by auditing radiologists. The DL software maintained fast levels of service to clinicians, with results returned to Trusts in a mean time of 7.1 seconds. CONCLUSION The DL algorithm performs with a low rate of error and is highly effective as an automated diagnostic decision support tool, used to autonomously report a subset of CXRs as normal with high confidence. Removing 20% of all CXRs reduces workload for reporters and allows radiology departments to focus resources elsewhere.

翻訳日:2023-06-29 14:25:49 公開日:2023-06-28

# 境界条件の異なる円点における磁場の影響の量子情報理論

Quantum-information theory of magnetic field influence on circular dots with different boundary conditions ( http://arxiv.org/abs/2306.16114v1 )

ライセンス: Link先を確認

H. Shafeekali, O. Olendski

(参考訳) 横一様磁場 $\bf b$ の位置 (subscript $\rho$) と運動量 (\gamma$) に対するシャノン量子情報エントロピー $s_{\rho,\gamma}$, fisher informations $i_{\rho,\gamma}$, informational energies $o_{\rho,\gamma}$ および情報エネルギー $o_{\rho,\gamma}$ の影響は、円周がジリクレとノイマン境界条件 (bc) のいずれかをサポートする2次元円形量子ドット (qds) に対して理論的に研究されている。解析により、磁場と表面相互作用の構造特性に対する類似性と影響の相違が明らかになった。スペクトル間の顕著な区別は、同じ放射量子数$n$と隣接する非正角指数$m$でノイマンエネルギーの誘導が増加するときの交差である。 b$が増加すると、どちらのシステムも、その特性が一様場となるとランダウ凝縮を行う。例えば、ディリクレ和 $s_{\rho_{00}}+s_{\gamma_{00}} は、上から基本限界 2(1+\ln\pi)$ へのアプローチにおいて、対応するノイマン量よりも少なくとも $b$ である。広く受け入れられている不平衡不確かさ関係 $o_\rho o_\gamma\leq(2\pi)^{-\mathtt{d}}$ と$\mathtt{d}$ が系の次元であることは、磁場中のノイマン qd によって破られることを指摘した。静電高調波閉じ込めとの比較を行う。物理的解釈は2つのbcの異なる役割とフィールドとの相互作用に基づいている: ディリクレ(ノイマン)曲面は反発的(引き込み的)なインターフェースである。

Influence of the transverse uniform magnetic field $\bf B$ on position (subscript $\rho$) and momentum ($\gamma$) Shannon quantum-information entropies $S_{\rho,\gamma}$, Fisher informations $I_{\rho,\gamma}$ and informational energies $O_{\rho,\gamma}$ is studied theoretically for the 2D circular quantum dots (QDs) whose circumference supports homogeneous either Dirichlet or Neumann boundary condition (BC). Analysis reveals similarities and differences of the influence on the properties of the structure of the surface interaction with the magnetic field. Conspicuous distinction between the spectra are crossings at the increasing induction of the Neumann energies with the same radial quantum number $n$ and adjacent non-positive angular indices $m$. At the growing $B$, either system undergoes Landau condensation when its characteristics turn into their uniform field counterparts. For the Dirichlet system this transformation takes place at the smaller magnetic intensities; e.g., the Dirichlet sum $S_{\rho_{00}}+S_{\gamma_{00}}$ on its approach from above to a fundamental limit $2(1+\ln\pi)$ is at any $B$ smaller than the corresponding Neumann quantity what physically means that the former geometry provides more total information about the position and motion of the particle. It is pointed out that the widely accepted disequilibrium uncertainty relation $O_\rho O_\gamma\leq(2\pi)^{-\mathtt{d}}$, with $\mathtt{d}$ being a dimensionality of the system, is violated by the Neumann QD in the magnetic field. Comparison with electrostatic harmonic confinement is performed. Physical interpretation is based on the different roles of the two BCs and their interplay with the field: Dirichlet (Neumann) surface is a repulsive (attractive) interface.

翻訳日:2023-06-29 14:25:17 公開日:2023-06-28

# 最適時間変数学習における時間正規化

Time Regularization in Optimal Time Variable Learning ( http://arxiv.org/abs/2306.16111v1 )

ライセンス: Link先を確認

Evelyn Herberg and Roland Herzog and Frederik K\"ohne

(参考訳) 近年、arXiv:2204.08528では、ディープニューラルネットワーク(DNN)における最適時変学習が導入されている。この写本では、離散力学系の時間軸に直接関係する正規化項を導入することで概念を拡張している。さらに,Residual Neural Networks (ResNets) に対する適応型プルーニング手法を提案する。この結果は、よく知られたMNISTとFashion MNISTデータセットの分類タスクに提案された概念を適用することで説明される。 pytorchコードはhttps://github.com/frederikkoehne/time_variable_learningで利用できます。

Recently, optimal time variable learning in deep neural networks (DNNs) was introduced in arXiv:2204.08528. In this manuscript we extend the concept by introducing a regularization term that directly relates to the time horizon in discrete dynamical systems. Furthermore, we propose an adaptive pruning approach for Residual Neural Networks (ResNets), which reduces network complexity without compromising expressiveness, while simultaneously decreasing training time. The results are illustrated by applying the proposed concepts to classification tasks on the well known MNIST and Fashion MNIST data sets. Our PyTorch code is available on https://github.com/frederikkoehne/time_variable_learning.

翻訳日:2023-06-29 14:24:31 公開日:2023-06-28

# 高速マーチングエネルギーCNN

Fast Marching Energy CNN ( http://arxiv.org/abs/2306.16109v1 )

ライセンス: Link先を確認

Nicolas Makaroff, Th\'eo Bertrand and Laurent D. Cohen

(参考訳) 測地距離とそれらが伝達する幾何学的情報を活用することは、イメージングにおける多くのデータ指向アプリケーションにとって鍵となる。測地線距離計算は、画像ベースメトリクスを用いた画像セグメンテーションに長く使われてきた。我々は、CNNを用いて問題に適応した等方的リーマン計量を生成し、アプリケーションの例を示す新しい方法を提案する。次に、cnnで出力される計量ポテンシャルで計算された測地線距離の単位球としての脳腫瘍のセグメンテーションに適用し、出力マスクに幾何学的および位相的制約を与える。測地的距離モジュールは機械学習フレームワークでうまく機能し、幾何学的および位相的特性を確保しつつ最先端のパフォーマンスを達成するために使用できることを示す。

Leveraging geodesic distances and the geometrical information they convey is key for many data-oriented applications in imaging. Geodesic distance computation has been used for long for image segmentation using Image based metrics. We introduce a new method by generating isotropic Riemannian metrics adapted to a problem using CNN and give as illustrations an example of application. We then apply this idea to the segmentation of brain tumours as unit balls for the geodesic distance computed with the metric potential output by a CNN, thus imposing geometrical and topological constraints on the output mask. We show that geodesic distance modules work well in machine learning frameworks and can be used to achieve state-of-the-art performances while ensuring geometrical and/or topological properties.

翻訳日:2023-06-29 14:24:19 公開日:2023-06-28

# SkillNet-X: わずかに活性化された多言語マルチタスクモデル

SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills ( http://arxiv.org/abs/2306.16176v1 )

ライセンス: Link先を確認

Zhangyin Feng, Yong Dai, Fan Zhang, Duyu Tang, Xiaocheng Feng, Shuangzhi Wu, Bing Qin, Yunbo Cao and Shuming Shi

(参考訳) 従来のマルチタスク学習手法は、基本的にタスクや言語に関する共通知識のみを活用でき、言語横断知識やクロスタスク知識が失われる。本稿では,skillnet-xと呼ばれる汎用多言語マルチタスクモデルを提案する。この目的のために、複数の言語固有のスキルとタスク固有のスキルを定義し、それぞれがスキルモジュールに対応する。 skillnet-xは、ターゲットタスクまたはターゲット言語に関連するスキルモジュールの一部をスパースに活性化する。知識伝達ハブとして機能するスキルモジュールは、タスク関連知識と言語関連知識を連続的に吸収することができる。トランスを基盤として,マルチヘッドアテンション層とフィードフォワードネットワーク層を変更し,スキルモジュールに対応する。我々はSkillNet-Xを4言語で11の自然言語理解データセット上で評価した。その結果,SkillNet-Xはタスク固有のベースラインと2つのマルチタスク学習ベースライン(密接な関節モデルとMixture-of-Expertsモデル)よりも優れた性能を示した。さらに、スキル事前トレーニングは、ほぼすべてのデータセット上でSkillNet-Xのパフォーマンスをさらに向上させる。モデルの一般化を検討するために,2つの新しいタスクについて実験を行い,skillnet-xがベースラインを大きく上回ることを確認した。

Traditional multitask learning methods basically can only exploit common knowledge in task- or language-wise, which lose either cross-language or cross-task knowledge. This paper proposes a general multilingual multitask model, named SkillNet-X, which enables a single model to tackle many different tasks from different languages. To this end, we define several language-specific skills and task-specific skills, each of which corresponds to a skill module. SkillNet-X sparsely activates parts of the skill modules which are relevant either to the target task or the target language. Acting as knowledge transit hubs, skill modules are capable of absorbing task-related knowledge and language-related knowledge consecutively. Based on Transformer, we modify the multi-head attention layer and the feed forward network layer to accommodate skill modules. We evaluate SkillNet-X on eleven natural language understanding datasets in four languages. Results show that SkillNet-X performs better than task-specific baselines and two multitask learning baselines (i.e., dense joint model and Mixture-of-Experts model). Furthermore, skill pre-training further improves the performance of SkillNet-X on almost all datasets. To investigate the generalization of our model, we conduct experiments on two new tasks and find that SkillNet-X significantly outperforms baselines.

翻訳日:2023-06-29 14:16:58 公開日:2023-06-28

# $\mathbf{c}^2$former:rgb赤外物体検出のための校正および補完トランスフォーマー

$\mathbf{C}^2$Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection ( http://arxiv.org/abs/2306.16175v1 )

ライセンス: Link先を確認

Maoxun Yuan, Xingxing Wei

(参考訳) 可視(rgb)および赤外線(ir)画像上の物体検出は、時間前後のアプリケーションのロバストな検出を容易にする新たなソリューションとして、近年広く注目を集めている。赤外線画像の助けを借りて、オブジェクト検出器はRGB-IR複合情報を使用することにより、実用上より信頼性が高く、堅牢である。しかし、既存の手法は相反性ミスカバリレーションや核融合インプレシジョンの問題に苦しんでいる。本稿では,異なる特徴間のペア関係をモデル化する強力な能力を有するため,これら2つの問題に同時に対処するために,$\mathrm{C}^2$Former という新しいキャリブレーション・補完変換器を提案する。 rgb と ir モダリティの相互接続関係を学習し,そのキャリブレーションと相補的特徴を得るために,$\mathrm{c}^2$former で相互接続(inter-modality cross-attention,ica)モジュールを設計する。 ICAにおけるグローバルアテンションの計算による計算コストを低減するため、特徴写像の次元を小さくするために、適応特徴サンプリング(AFS)モジュールが導入された。 $\mathrm{C}^2$Formerは機能ドメインで機能するため、バックボーンネットワークを介して既存のRGB-IRオブジェクト検出器に組み込むことができる。したがって,1つの単段と2つの2段階の物体検出器に,我々の$\mathrm{C}^2$Formerを組み込んで,その有効性と汎用性を評価する。本研究では,DroneVehicle と KAIST RGB-IR データセットの広範な実験により,RGB-IR 補完情報を完全に活用し,ロバストな検出結果が得られることを確認した。コードはhttps://github.com/yuanmaoxun/Calibrated-and-Complementary-Transformer-for-RGB-Infrared-Object-Detec tion.gitで公開されている。

Object detection on visible (RGB) and infrared (IR) images, as an emerging solution to facilitate robust detection for around-the-clock applications, has received extensive attention in recent years. With the help of IR images, object detectors have been more reliable and robust in practical applications by using RGB-IR combined information. However, existing methods still suffer from modality miscalibration and fusion imprecision problems. Since transformer has the powerful capability to model the pairwise correlations between different features, in this paper, we propose a novel Calibrated and Complementary Transformer called $\mathrm{C}^2$Former to address these two problems simultaneously. In $\mathrm{C}^2$Former, we design an Inter-modality Cross-Attention (ICA) module to obtain the calibrated and complementary features by learning the cross-attention relationship between the RGB and IR modality. To reduce the computational cost caused by computing the global attention in ICA, an Adaptive Feature Sampling (AFS) module is introduced to decrease the dimension of feature maps. Because $\mathrm{C}^2$Former performs in the feature domain, it can be embedded into existed RGB-IR object detectors via the backbone network. Thus, one single-stage and one two-stage object detector both incorporating our $\mathrm{C}^2$Former are constructed to evaluate its effectiveness and versatility. With extensive experiments on the DroneVehicle and KAIST RGB-IR datasets, we verify that our method can fully utilize the RGB-IR complementary information and achieve robust detection results. The code is available at https://github.com/yuanmaoxun/Calibrated-and-Complementary-Transformer-for-RGB-Infrared-Object-Detec tion.git.

翻訳日:2023-06-29 14:16:34 公開日:2023-06-28

# ソースコードの類似度測定とクローン検出に関する体系的文献レビュー--技術,応用,課題

A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges ( http://arxiv.org/abs/2306.16171v1 )

ライセンス: Link先を確認

Morteza Zakeri-Nasrabadi and Saeed Parsa and Mohammad Ramezani and Chanchal Roy and Masoud Ekhtiarzadeh

(参考訳) ソースコードの類似度の測定と評価は、コードのレコメンデーション、重複コード、盗作、マルウェア、嗅覚検出など、幅広いアプリケーションを取り入れた、基本的なソフトウェアエンジニアリング活動である。本稿では,コード類似度測定と評価手法に関する体系的な文献レビューとメタ分析を行い,既存手法とその特性を異なる用途で明らかにする。私たちは最初、4つのデジタルライブラリーに問い合わせて100,000以上の記事を見つけました。研究は方法論、プログラミング言語、データセット、ツール、アプリケーションによって分類された。深い調査によると、80のソフトウェアツールがあり、5つのアプリケーションドメインで8つの異なるテクニックで作業している。約49%のツールはjavaプログラムで動作し、37%はcとc++をサポートしているが、多くのプログラミング言語はサポートしていない。注目すべき点は、ソースコードの類似度測定と重複コードに関連する12のデータセットが存在することだ。信頼できるデータセットの欠如、経験的評価、ハイブリッドメソッド、マルチパラダイム言語にフォーカスすることが、この分野の主要な課題である。コード類似度測定の新たな応用は、メンテナンスに加えて開発フェーズに集中する。

Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focuses on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance.

翻訳日:2023-06-29 14:16:00 公開日:2023-06-28

# マルチテラー逆蒸留による精度・ロバスト性トレードオフの緩和

Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation ( http://arxiv.org/abs/2306.16170v1 )

ライセンス: Link先を確認

Shiji Zhao, Xizhe Wang, Xingxing Wei

(参考訳) 敵対的トレーニングは、敵対的攻撃に対するディープニューラルネットワークの堅牢性を改善するための実践的なアプローチである。信頼性の高いロバスト性をもたらすが、クリーンな例に対する性能は敵の訓練後に負の影響を受ける。近年, 対人訓練に知識蒸留法を応用し, 堅牢性向上に競争力を発揮する研究も行われているが, 清浄な試料の精度はいまだに限られている。本稿では, 高いクリーンな教師と強いロバストな教師を用いて, クリーンな事例と敵対的な事例をそれぞれ扱うことで, モデルの逆トレーニングプロセスの指導を行うマルチTeacher Adversarial Robustness Distillation (MTARD)を導入する。最適化の過程では,異なる教師が同様の知識尺度を示すことを保証するために,教師の温度を調整し,教師の情報エントロピーを一定に保つエントロピーベースバランスアルゴリズムを設計する。また,生徒が複数の教師から比較的一貫した学習速度を持つことを保証するため,異なる種類の知識の学習重みを調整できる正規化損失バランスアルゴリズムを提案する。公開データセット上で行われた一連の実験は、MTARDが様々な敵攻撃に対して最先端の敵の訓練と蒸留法より優れていることを示した。

Adversarial training is a practical approach for improving the robustness of deep neural networks against adversarial attacks. Although bringing reliable robustness, the performance toward clean examples is negatively affected after adversarial training, which means a trade-off exists between accuracy and robustness. Recently, some studies have tried to use knowledge distillation methods in adversarial training, achieving competitive performance in improving the robustness but the accuracy for clean samples is still limited. In this paper, to mitigate the accuracy-robustness trade-off, we introduce the Multi-Teacher Adversarial Robustness Distillation (MTARD) to guide the model's adversarial training process by applying a strong clean teacher and a strong robust teacher to handle the clean examples and adversarial examples, respectively. During the optimization process, to ensure that different teachers show similar knowledge scales, we design the Entropy-Based Balance algorithm to adjust the teacher's temperature and keep the teachers' information entropy consistent. Besides, to ensure that the student has a relatively consistent learning speed from multiple teachers, we propose the Normalization Loss Balance algorithm to adjust the learning weights of different types of knowledge. A series of experiments conducted on public datasets demonstrate that MTARD outperforms the state-of-the-art adversarial training and distillation methods against various adversarial attacks.

翻訳日:2023-06-29 14:15:39 公開日:2023-06-28

# エンドツーエンド自動運転のための階層的フェデレーション学習を制約した通信資源

Communication Resources Constrained Hierarchical Federated Learning for End-to-End Autonomous Driving ( http://arxiv.org/abs/2306.16169v1 )

ライセンス: Link先を確認

Wei-Bin Kou, Shuai Wang, Guangxu Zhu, Bin Luo, Yingxian Chen, Derrick Wing Kwan Ng, and Yik-Chung Wu

(参考訳) フェデレートラーニング(FL)はモデル集約によるエンドツーエンド自動運転の一般化を改善する一方、従来のシングルホップFL(SFL)は、車両とクラウドサーバ間の長距離通信による収束が遅い。階層的連合学習(HFL)は、中点エッジサーバの導入によってそのような欠点を克服する。しかし、制約付き通信資源とhfl性能のオーケストレーションが緊急問題となっている。本稿では,ハイブリッドデータとモデル集約を用いた自律運転モデルの一般化誤差を最小限に抑えるために,最適化に基づく通信資源制約付き階層型学習(CRCHFL)フレームワークを提案する。 CRCHFLの有効性をカーラーニング・トゥ・アクト(CARLA)シミュレーションプラットフォームで評価した。その結果,提案するcrchflは収束速度を加速し,連関学習自律運転モデルの一般化を促進することがわかった。さらに、同じ通信資源予算の下では、HFLを10.33%、SFLを12.44%上回っている。

While federated learning (FL) improves the generalization of end-to-end autonomous driving by model aggregation, the conventional single-hop FL (SFL) suffers from slow convergence rate due to long-range communications among vehicles and cloud server. Hierarchical federated learning (HFL) overcomes such drawbacks via introduction of mid-point edge servers. However, the orchestration between constrained communication resources and HFL performance becomes an urgent problem. This paper proposes an optimization-based Communication Resource Constrained Hierarchical Federated Learning (CRCHFL) framework to minimize the generalization error of the autonomous driving model using hybrid data and model aggregation. The effectiveness of the proposed CRCHFL is evaluated in the Car Learning to Act (CARLA) simulation platform. Results show that the proposed CRCHFL both accelerates the convergence rate and enhances the generalization of federated learning autonomous driving model. Moreover, under the same communication resource budget, it outperforms the HFL by 10.33% and the SFL by 12.44%.

翻訳日:2023-06-29 14:15:11 公開日:2023-06-28

# 機械学習のための最適輸送の最近の進歩

Recent Advances in Optimal Transport for Machine Learning ( http://arxiv.org/abs/2306.16156v1 )

ライセンス: Link先を確認

Eduardo Fernandes Montesuma, Fred Ngol\`e Mboula, Antoine Souloumiac

(参考訳) 近年,確率分布の比較と操作のための機械学習の確率的フレームワークとして最適輸送法が提案されている。これはその豊かな歴史と理論に根ざし、生成モデリングや伝達学習といった機械学習の様々な問題に対する新しい解決策を提供してきた。本研究では,2012～2022年における機械学習の最適輸送の貢献について検討し,教師なし,教師なし,転送,強化学習の4つのサブフィールドに注目した。計算最適輸送の最近の発展と機械学習の実践との相互作用をさらに強調する。

Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. This is rooted in its rich history and theory, and has offered new solutions to different problems in machine learning, such as generative modeling and transfer learning. In this survey we explore contributions of Optimal Transport for Machine Learning over the period 2012 -- 2022, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer and reinforcement learning. We further highlight the recent development in computational Optimal Transport, and its interplay with Machine Learning practice.

翻訳日:2023-06-29 14:14:54 公開日:2023-06-28

# 周期的に蹴られた多体力学における予熱

Prethermalization in aperiodically kicked many-body dynamics ( http://arxiv.org/abs/2306.16144v1 )

ライセンス: Link先を確認

Jin Yan, Roderich Moessner and Hongzheng Zhao

(参考訳) 駆動多体システムは通常、エネルギー保存の欠如により加熱を受ける。加熱は時間周期ドライブでは抑制されるが、通常の駆動プロトコルではほとんど知られていない。本研究では,準周期的なトゥエモースや,n$-multipolar temporal correlationsを持つランダムシーケンス群によって駆動される非周期キック系の加熱ダイナミクスについて検討する。高周波状態から離れても複数の加熱チャネルを除去できることを実証した。除去チャネルの数は、多極性順序$n$で増加する。これを古典的な蹴りローターチェーンで説明し、長寿命の予熱状態を見つける。静的ハミルトニアンが運動エネルギーのみを含む場合、先行熱寿命 $t^*$ はドライブの時間的相関に強く依存し、キック強度 $t^*\sim k^{-2n}$ のパワーロー依存性を持つ。

Driven many-body systems typically experience heating due to the lack of energy conservation. Heating may be suppressed for time-periodic drives, but little is known for less regular drive protocols. In this work, we investigate the heating dynamics in aperiodically kicked systems, specifically those driven by quasi-periodic Thue-Morse or a family of random sequences with $n$-multipolar temporal correlations. We demonstrate that multiple heating channels can be eliminated even away from the high-frequency regime. The number of eliminated channels increases with multipolar order $n$. We illustrate this in a classical kicked rotor chain where we find a long-lived prethermal regime. When the static Hamiltonian only involves the kinetic energy, the prethermal lifetime $t^*$ can strongly depend on the temporal correlations of the drive, with a power-law dependence on the kick strength $t^*\sim K^{-2n}$, for which we can account using a simple linearization argument.

翻訳日:2023-06-29 14:14:44 公開日:2023-06-28

# ドメイン固有自然言語処理アプリケーション開発のための生成的ユーザエクスペリエンス研究

Generative User-Experience Research for Developing Domain-specific Natural Language Processing Applications ( http://arxiv.org/abs/2306.16143v1 )

ライセンス: Link先を確認

Anastasia Zhukova, Lukas von Sperl, Christian E. Matt, Bela Gipp

(参考訳) ユーザエクスペリエンス(ux)は、ヒューマンコンピュータインタラクション(hci)研究の一部であり、システムユーザに対する直感性、透明性、シンプルさ、信頼の向上に重点を置いている。機械学習(ML)や自然言語処理(NLP)のためのUX研究のほとんどは、データ駆動の方法論に焦点を当てている。さらに、より一般的なUXメソッドは、最初にユーザニーズについて学ぶのとは異なり、システムをユーザユーザビリティに向けて調整する。本稿では,生成UX研究をドメインNLPアプリケーションに組み込む手法を提案する。生成UX研究は、プロトタイプ開発の初期段階、すなわちアイデアと概念評価、およびユーザ価値の変化を評価するための最終段階において、ドメインユーザーを採用する。本ケーススタディでは,プロセス産業における日常業務のドメイン固有意味検索の完全サイクルプロトタイプ開発について報告する。ケーススタディでは、ドメインエキスパートの関与は、NLPアプリケーションに対する関心と信頼を高めます。さらに,狭義のNLPアプリケーションにおいて重要となるデータおよびユーザ主導の機会と制約を,相乗的UX+NLP研究が効率的に検討していることを示す。

User experience (UX) is a part of human-computer interaction (HCI) research and focuses on increasing intuitiveness, transparency, simplicity, and trust for system users. Most of the UX research for machine learning (ML) or natural language processing (NLP) focuses on a data-driven methodology, i.e., it fails to focus on users' requirements, and engages domain users mainly for usability evaluation. Moreover, more typical UX methods tailor the systems towards user usability, unlike learning about the user needs first. The paper proposes a methodology for integrating generative UX research into developing domain NLP applications. Generative UX research employs domain users at the initial stages of prototype development, i.e., ideation and concept evaluation, and the last stage for evaluating the change in user value. In the case study, we report the full-cycle prototype development of a domain-specific semantic search for daily operations in the process industry. Our case study shows that involving domain experts increases their interest and trust in the final NLP application. Moreover, we show that synergetic UX+NLP research efficiently considers data- and user-driven opportunities and constraints, which can be crucial for NLP applications in narrow domains

翻訳日:2023-06-29 14:14:25 公開日:2023-06-28

# 一方向パストレースレンダリングのための神経方向距離場オブジェクト表現

Neural directional distance field object representation for uni-directional path-traced rendering ( http://arxiv.org/abs/2306.16142v1 )

ライセンス: Link先を確認

Annada Prasad Behera and Subhankar Mishra

(参考訳) 合成画像の高速レンダリングは、コンピュータグラフィックスの分野で核となる問題である。パストラッシングのようなレンダリングアルゴリズムは、画像のサイズ、光の反射数、ピクセル当たりのサンプル数などのパラメータに依存しており、所望の画質の画像を取得したい場合、すべてが固定される。また、レンダリングされるシーンのサイズと複雑さにも依存する。レンダリングにおける最大のボトルネックの1つは、特にシーンが非常に大きい場合、シーン内のあるレイの経路にあるオブジェクトをクエリすることである。シーン内のオブジェクトを表すデータ型を変更することで、レンダリング時間を短縮することができるが、シーンの異なる表現ではレンダリングアルゴリズムを変更する必要がある。この論文では a) 対象物の機能的表現として有向距離場を導入する。 b) 有向距離が,ニューラルネットワークとして格納された場合,どのように機能するか,及び (c) 修正されたパストレースアルゴリズムでそのようなオブジェクトをレンダリングする方法。

Faster rendering of synthetic images is a core problem in the field of computer graphics. Rendering algorithms, such as path-tracing is dependent on parameters like size of the image, number of light bounces, number of samples per pixel, all of which, are fixed if one wants to obtain a image of a desired quality. It is also dependent on the size and complexity of the scene being rendered. One of the largest bottleneck in rendering, particularly when the scene is very large, is querying for objects in the path of a given ray in the scene. By changing the data type that represents the objects in the scene, one may reduce render time, however, a different representation of a scene requires the modification of the rendering algorithm. In this paper, (a) we introduce directed distance field, as a functional representation of a object; (b) how the directed distance functions, when stored as a neural network, be optimized and; (c) how such an object can be rendered with a modified path-tracing algorithm.

翻訳日:2023-06-29 14:14:03 公開日:2023-06-28

# 大規模オンライン学習による深層サロゲートモデルのトレーニング

Training Deep Surrogate Models with Large Scale Online Learning ( http://arxiv.org/abs/2306.16133v1 )

ライセンス: Link先を確認

Lucas Meyer (EDF R\&D, SINCLAIR AI Lab, DATAMOVE ), Marc Schouler (DATAMOVE ), Robert Alexander Caulk (DATAMOVE ), Alejandro Rib\'es (SINCLAIR AI Lab, EDF R\&D), Bruno Raffin (DATAMOVE )

(参考訳) 部分微分方程式(PDE)の時空間分解は、世界の物理現象の数学的記述において重要な役割を果たす。一般に、科学者や技術者は計算に要求される解法を用いてPDEを数値的に解く。近年,PDEの高速解の代替としてディープラーニングアルゴリズムが登場している。モデルは通常、ソルバが生成した合成データに基づいて訓練され、ディスクに格納され、トレーニングのために読み戻される。本稿では、これらのモデルをトレーニングするために従来の静的データセットを頼りにしているため、ソルバの完全なメリットをデータジェネレータとして利用できないことを主張する。深層サロゲートモデルのためのオープンソースのオンライントレーニングフレームワークを提案する。このフレームワークは、数値シミュレーションとディープニューラルネットワークのトレーニングを同時に行うことに焦点を当てた、いくつかのレベルの並列処理を実装している。このアプローチは、ディスクロードされたデータセットに関連するI/Oとストレージのボトルネックを抑制し、はるかに大きなデータセットのトレーニング方法を開く。実験では、最先端アーキテクチャを含む4つの代理モデルのオフラインおよびオンライントレーニングを比較した。以上の結果から,データセットの多様性を最大数百GBにまで高めることで,モデル一般化能力が向上する可能性が示唆された。フル接続ニューラルネットワーク、フーリエニューラル演算子(FNO)、メッセージパスPDEソルバー予測精度はそれぞれ68%、16%、7%向上している。

The spatiotemporal resolution of Partial Differential Equations (PDEs) plays important roles in the mathematical description of the world's physical phenomena. In general, scientists and engineers solve PDEs numerically by the use of computationally demanding solvers. Recently, deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs. Models are usually trained on synthetic data generated by solvers, stored on disk and read back for training. This paper advocates that relying on a traditional static dataset to train these models does not allow the full benefit of the solver to be used as a data generator. It proposes an open source online training framework for deep surrogate models. The framework implements several levels of parallelism focused on simultaneously generating numerical simulations and training deep neural networks. This approach suppresses the I/O and storage bottleneck associated with disk-loaded datasets, and opens the way to training on significantly larger datasets. Experiments compare the offline and online training of four surrogate models, including state-of-the-art architectures. Results indicate that exposing deep surrogate models to more dataset diversity, up to hundreds of GB, can increase model generalization capabilities. Fully connected neural networks, Fourier Neural Operator (FNO), and Message Passing PDE Solver prediction accuracy is improved by 68%, 16% and 7%, respectively.

翻訳日:2023-06-29 14:13:49 公開日:2023-06-28

# 行動や指示からエージェントを伝達する目標を推測する

Inferring the Goals of Communicating Agents from Actions and Instructions ( http://arxiv.org/abs/2306.16207v1 )

ライセンス: Link先を確認

Lance Ying, Tan Zhi-Xuan, Vikash Mansinghka, Joshua B. Tenenbaum

(参考訳) 人間が協力すると、彼らはしばしば言語コミュニケーションと非言語行動の両方を通して活動の協調を行い、この情報を使って共通の目標と計画を立てる。この推論能力をどのようにモデル化するか? 本稿では,一つのエージェントであるプリンシパルが,共有計画に関する自然言語指示を他のエージェントであるアシスタントに伝え,gpt-3を指導発話の確率関数として用いる協調チームのモデルを提案する。次に、第三者のオブザーバが行動や指示からマルチモーダルベイズ逆計画を通じてチームのゴールを推測し、エージェントが行動し、合理的にコミュニケーションして達成することを前提として、目標に対する後方分布を計算する方法を示す。提案手法は,マルチエージェントグリッドワールドにおける人間の目標推定と比較し,モデルの推定が人間の判断と密接に相関していること(R = 0.96)を見出した。また,行動のみからの推論と比較すると,指示がより迅速かつ不確定な目標推論につながり,協調エージェントにとっての言語コミュニケーションの重要性が強調された。

When humans cooperate, they frequently coordinate their activity through both verbal communication and non-verbal actions, using this information to infer a shared goal and plan. How can we model this inferential ability? In this paper, we introduce a model of a cooperative team where one agent, the principal, may communicate natural language instructions about their shared plan to another agent, the assistant, using GPT-3 as a likelihood function for instruction utterances. We then show how a third person observer can infer the team's goal via multi-modal Bayesian inverse planning from actions and instructions, computing the posterior distribution over goals under the assumption that agents will act and communicate rationally to achieve them. We evaluate this approach by comparing it with human goal inferences in a multi-agent gridworld, finding that our model's inferences closely correlate with human judgments (R = 0.96). When compared to inference from actions alone, we also find that instructions lead to more rapid and less uncertain goal inference, highlighting the importance of verbal communication for cooperative agents.

翻訳日:2023-06-29 14:08:05 公開日:2023-06-28

# マルチエージェントチームによる学習の理解を深める

Towards a Better Understanding of Learning with Multiagent Teams ( http://arxiv.org/abs/2306.16205v1 )

ライセンス: Link先を確認

David Radke, Kate Larson, Tim Brecht and Kyle Tilbury

(参考訳) 個人学習エージェントのチームは、その部分の合計よりも大きいと長年認識されてきたが、最近の研究によると、より大きなチームは必ずしも小さなものよりも効果的ではない。本稿では,特定のチーム構造が個人学習エージェントの集団に対して効果的な学習を促進する理由と条件について検討する。環境によっては、エージェントが特定の役割を専門化するのに役立ついくつかのチーム構造が、より望ましいグローバルな結果をもたらすことを示している。しかし、大きなチームはコーディネーションを減らすためのクレジット割り当ての課題を作り、大きなチームは小さなチームに比べてパフォーマンスが悪くなります。理論的分析と経験的結果の両方で結論を支持する。

While it has long been recognized that a team of individual learning agents can be greater than the sum of its parts, recent work has shown that larger teams are not necessarily more effective than smaller ones. In this paper, we study why and under which conditions certain team structures promote effective learning for a population of individual learning agents. We show that, depending on the environment, some team structures help agents learn to specialize into specific roles, resulting in more favorable global results. However, large teams create credit assignment challenges that reduce coordination, leading to large teams performing poorly compared to smaller ones. We support our conclusions with both theoretical analysis and empirical results.

翻訳日:2023-06-29 14:07:45 公開日:2023-06-28

# 半教師対象検出のための低信頼サンプルマイニング

Low-Confidence Samples Mining for Semi-supervised Object Detection ( http://arxiv.org/abs/2306.16201v1 )

ライセンス: Link先を確認

Guandu Liu, Fangyuan Zhang, Tianxiang Pan, Bin Wang

(参考訳) ラベルなしデータからの信頼性の高い擬似ラベルは、半教師付きオブジェクト検出(SSOD)において重要な役割を果たす。しかし、最先端のssodメソッドはすべて信頼度の高い擬似ラベルに依存しており、信頼度の低い貴重な擬似ラベルを無視している。さらに、ラベルなしデータの発掘が不十分なため、リコール率が過度に低くなり、ネットワークトレーニングが損なわれる。本稿では,低信頼擬似ラベルを効率的に利用する新しい低信頼サンプルマイニング(lsm)手法を提案する。具体的には,低分解能な特徴マップを考慮に入れた付加的な擬似情報マイニング(pim)ブランチを開発し,信頼性の高い大規模インスタンスを抽出した。 pimとメインブランチの相補的な予測により、両者を相互に学習して補うための自己蒸留(sd)を更に設計する。一方、上記のアプローチの拡張性により、我々のLSMは、それぞれ Faster-RCNN と Deformable-DETR に適用できる。 MS-COCOベンチマークでは,5%のラベル付け率で最先端手法よりも3.54%のmAP改善を実現している。

Reliable pseudo-labels from unlabeled data play a key role in semi-supervised object detection (SSOD). However, the state-of-the-art SSOD methods all rely on pseudo-labels with high confidence, which ignore valuable pseudo-labels with lower confidence. Additionally, the insufficient excavation for unlabeled data results in an excessively low recall rate thus hurting the network training. In this paper, we propose a novel Low-confidence Samples Mining (LSM) method to utilize low-confidence pseudo-labels efficiently. Specifically, we develop an additional pseudo information mining (PIM) branch on account of low-resolution feature maps to extract reliable large-area instances, the IoUs of which are higher than small-area ones. Owing to the complementary predictions between PIM and the main branch, we further design self-distillation (SD) to compensate for both in a mutually-learning manner. Meanwhile, the extensibility of the above approaches enables our LSM to apply to Faster-RCNN and Deformable-DETR respectively. On the MS-COCO benchmark, our method achieves 3.54% mAP improvement over state-of-the-art methods under 5% labeling ratios.

翻訳日:2023-06-29 14:07:33 公開日:2023-06-28

# 自由度3次元超音波再構成のためのオンライン自己整合型マルチIMU

Multi-IMU with Online Self-Consistency for Freehand 3D Ultrasound Reconstruction ( http://arxiv.org/abs/2306.16197v1 )

ライセンス: Link先を確認

Mingyuan Luo, Xin Yang, Zhongnuo Yan, Yuanji Zhang, Junyu Li, Jiongquan Chen, Xindi Hu, Jikuan Qian, Jun Cheng, Dong Ni

(参考訳) 超音波(US)イメージングは臨床診断において一般的なツールであり、安全性、再現性、リアルタイム能力を提供する。 Freehand 3D USは、複雑さを増すことなくスキャンされた領域をより深く理解する技術である。しかし,標高変位と累積誤差の推定は依然として困難であり,画像のみを用いて相対位置を推定することは困難である。複雑さを増すことなく再建性能を向上させるために,外部軽量センサの追加が提案されている。本稿では,複数慣性測定ユニット (imus) を用いた新しいオンライン自己抵抗ネットワーク (oscnet) を提案する。 OSCNetは、複数のIMU情報を融合し、各IMUデータから得られた再構成結果の違いを減らすために、モーダルレベルの自己管理戦略を利用する。さらに,スキャンシーケンスとそのサブシーケンス間の予測結果の階層的一貫性を改善するために,シーケンスレベルの自己一貫性戦略を提案する。複数のスキャン戦術を用いた大規模腕と頸動脈データセットの実験では,oscnetが従来の手法を上回っており,最先端の再構築性能を実現している。

Ultrasound (US) imaging is a popular tool in clinical diagnosis, offering safety, repeatability, and real-time capabilities. Freehand 3D US is a technique that provides a deeper understanding of scanned regions without increasing complexity. However, estimating elevation displacement and accumulation error remains challenging, making it difficult to infer the relative position using images alone. The addition of external lightweight sensors has been proposed to enhance reconstruction performance without adding complexity, which has been shown to be beneficial. We propose a novel online self-consistency network (OSCNet) using multiple inertial measurement units (IMUs) to improve reconstruction performance. OSCNet utilizes a modal-level self-supervised strategy to fuse multiple IMU information and reduce differences between reconstruction results obtained from each IMU data. Additionally, a sequence-level self-consistency strategy is proposed to improve the hierarchical consistency of prediction results among the scanning sequence and its sub-sequences. Experiments on large-scale arm and carotid datasets with multiple scanning tactics demonstrate that our OSCNet outperforms previous methods, achieving state-of-the-art reconstruction performance.

翻訳日:2023-06-29 14:07:16 公開日:2023-06-28

# 動的グラフ知識集約による対話生成の促進

Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation ( http://arxiv.org/abs/2306.16195v1 )

ライセンス: Link先を確認

Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin and Frank Guerin

(参考訳) 外部グラフ知識をニューラルチャットボットモデルに組み込むことは、対話生成の強化に有効であることが証明されている。しかし、従来のグラフニューラルネットワーク(gnns)では、グラフ上のメッセージパッシングはテキストから独立しており、結果としてグラフ表現の隠れ空間はテキストとは異なっている。既存のモデルのこのトレーニング体制は、グラフ知識とテキストの間に意味的なギャップをもたらす。本研究では,知識グラフ強化対話生成のための新しいフレームワークを提案する。疑似ノードを持つマルチホップ知識グラフを動的に構築し,すべてのステップで言語モデルをグラフ内の特徴集約に組み込む。バニラ部分グラフの学習による意味バイアスを回避するため,提案フレームワークは疑似ノード上のグラフの特徴を集約する階層グラフの注意を応用し,グローバルな特徴を得る。したがって、フレームワークはポストと外部のグラフの知識の両方から異質な機能をうまく利用することができる。広範な実験により,対話生成におけるsota(state-of-the-art)ベースラインよりも優れたフレームワークが得られた。さらに,テキストとグラフの知識の両方の表現を凝集することにより,表現学習フレームワークが意味的ギャップを埋めることを示す。さらに、言語モデルは、機能集約プロセスでサブグラフパターンを利用することによって、より有益な応答のために知識トリプルを選択する方法も学んでいます。私たちのコードとリソースはhttps://github.com/tangg555/sabartで利用可能です。

Incorporating external graph knowledge into neural chatbot models has been proven effective for enhancing dialogue generation. However, in conventional graph neural networks (GNNs), message passing on a graph is independent from text, resulting in the graph representation hidden space differing from that of the text. This training regime of existing models therefore leads to a semantic gap between graph knowledge and text. In this study, we propose a novel framework for knowledge graph enhanced dialogue generation. We dynamically construct a multi-hop knowledge graph with pseudo nodes to involve the language model in feature aggregation within the graph at all steps. To avoid the semantic biases caused by learning on vanilla subgraphs, the proposed framework applies hierarchical graph attention to aggregate graph features on pseudo nodes and then attains a global feature. Therefore, the framework can better utilise the heterogeneous features from both the post and external graph knowledge. Extensive experiments demonstrate that our framework outperforms state-of-the-art (SOTA) baselines on dialogue generation. Further analysis also shows that our representation learning framework can fill the semantic gap by coagulating representations of both text and graph knowledge. Moreover, the language model also learns how to better select knowledge triples for a more informative response via exploiting subgraph patterns within our feature aggregation process. Our code and resources are available at https://github.com/tangg555/SaBART.

翻訳日:2023-06-29 14:06:57 公開日:2023-06-28

# 近接近傍相互作用を持つ1次元量子デバイス上でのスピンスクイージングの変分生成

Variational generation of spin squeezing on one-dimensional quantum devices with nearest-neighbor interactions ( http://arxiv.org/abs/2306.16194v1 )

ライセンス: Link先を確認

Zheng-Hang Sun, Yong-Yi Wang, Yu-Ran Zhang, Franco Nori, Heng Fan

(参考訳) スピンスクイーズ状態の効率的な調製は量子化メトロジーにとって重要である。強いスピンスクイーズを生成するための現在のプロトコルは、高次元または長距離の相互作用に依存する。鍵となる課題は、近傍の相互作用しか持たない1次元系のスピンスクイーズを生成する方法である。そこで我々は,この問題を解決するために変分スピンスキーズアルゴリズムを開発した。これらの変分アルゴリズムについて,ディジタル回路とアナログ量子回路の両方を考察する。変分スピンスケージングアルゴリズムの閉最適化ループの後、生成されたスクイージングは、2軸ツイストリングから生成される最強のスクイージングに匹敵する。実験的不完全性の解析により、本研究で提案する変分スピンスキーズアルゴリズムは、近年開発された雑音中規模量子コンピュータにおいて実現可能である。

Efficient preparation of spin-squeezed states is important for quantum-enhanced metrology. Current protocols for generating strong spin squeezing rely on either high dimensionality or long-range interactions. A key challenge is how to generate considerable spin squeezing in one-dimensional systems with only nearest-neighbor interactions. Here, we develop variational spin-squeezing algorithms to solve this problem. We consider both digital and analog quantum circuits for these variational algorithms. After the closed optimization loop of the variational spin-squeezing algorithms, the generated squeezing can be comparable to the strongest squeezing created from two-axis twisting. By analyzing the experimental imperfections, the variational spin-squeezing algorithms proposed in this work are feasible in recent developed noisy intermediate-scale quantum computers.

翻訳日:2023-06-29 14:06:35 公開日:2023-06-28

# 特定知識注入による繊維欠陥分割のための事前学習型大規模視覚モデルの有効性

Effective Transfer of Pretrained Large Visual Model for Fabric Defect Segmentation via Specifc Knowledge Injection ( http://arxiv.org/abs/2306.16186v1 )

ライセンス: Link先を確認

Zhewei Chen, Wai Keung Wong, Zuofeng Zhong, Jinpiao Liao, Ying Qu

(参考訳) 繊維欠陥セグメンテーションは繊維品質管理に不可欠である。それにもかかわらず、高品質なアノテートデータの不足とファブリック欠陥の多様性は、この分野でのディープラーニングの適用に重大な課題をもたらす。これらの要因は、既存のモデルの一般化とセグメンテーション性能を制限し、多様なファブリックタイプや欠陥の複雑さを扱う能力を妨げる。これらの障害を克服するため,本研究では,織物欠陥の専門知識を大規模視覚モデルsegment anything model(sam)に注入する革新的な手法を提案する。ファブリック欠陥関連パラメータのユニークなセットを導入し、訓練することにより、既存のモデルパラメータに広範な修正を加えることなく、ドメイン固有の知識をSAMにシームレスに統合する。改良されたSAMモデルは、ファブリック欠陥固有の知識を取り入れながら、大規模な自然画像データセットから学んだ一般化イメージ理解を活用し、ファブリック欠陥分割タスクの習熟性を確保する。実験結果から, 汎用的知識とファブリック固有の知識の融合によるモデルセグメンテーション性能の大幅な向上が示された。 3つのデータセットにまたがる一般的なセグメンテーションモデルに対してベンチマークを行うと、提案モデルが性能を大幅に向上することを示す。クロスデータセット比較と数発の学習実験による印象的な結果は、繊維品質管理の実践的応用の可能性をさらに示している。

Fabric defect segmentation is integral to textile quality control. Despite this, the scarcity of high-quality annotated data and the diversity of fabric defects present significant challenges to the application of deep learning in this field. These factors limit the generalization and segmentation performance of existing models, impeding their ability to handle the complexity of diverse fabric types and defects. To overcome these obstacles, this study introduces an innovative method to infuse specialized knowledge of fabric defects into the Segment Anything Model (SAM), a large-scale visual model. By introducing and training a unique set of fabric defect-related parameters, this approach seamlessly integrates domain-specific knowledge into SAM without the need for extensive modifications to the pre-existing model parameters. The revamped SAM model leverages generalized image understanding learned from large-scale natural image datasets while incorporating fabric defect-specific knowledge, ensuring its proficiency in fabric defect segmentation tasks. The experimental results reveal a significant improvement in the model's segmentation performance, attributable to this novel amalgamation of generic and fabric-specific knowledge. When benchmarking against popular existing segmentation models across three datasets, our proposed model demonstrates a substantial leap in performance. Its impressive results in cross-dataset comparisons and few-shot learning experiments further demonstrate its potential for practical applications in textile quality control.

翻訳日:2023-06-29 14:06:22 公開日:2023-06-28

# 空間的詳細記憶を用いたパンシャープ化への学習

Learning to Pan-sharpening with Memories of Spatial Details ( http://arxiv.org/abs/2306.16181v1 )

ライセンス: Link先を確認

Maoxun Yuan, Tianyi Zhao, Bo Li, Xingxing Wei

(参考訳) リモートセンシングシステムにおいて最もよく用いられる技術の一つであるパンシャーペニングは、パンクロマティック画像からマルチスペクトル画像に空間的詳細を注入し、高分解能MS画像を得る。ディープラーニングはその強固な適合能力と効率的な特徴抽出によって広く注目を集めているため、優れた性能を達成するために様々なパンシャープ化手法が提案されている。しかしながら、現在のパンシャーピング法では、通常、入力としてPANとMSの2つのイメージが必要である。この問題に対処するために,本論文では,PAN画像の空間的詳細が主に高周波の手がかりである,すなわち入力PAN画像の輪郭を反映していることを観察する。これにより,いくつかのベースエッジを格納するPAN非依存表現を開発し,それを介して対応するPAN画像の輪郭を構成することができる。その結果、推定時にms画像のみを用いてパンシャープ化タスクを行うことができる。この目的のために、メモリベースのネットワークは、トレーニングフェーズ中に空間の詳細を抽出して記憶するように適応し、メモリベースの空間詳細ネットワーク(MSDN)と呼ばれる推論時にPAN画像から空間情報を取得するプロセスを置き換えるために使用される。我々は最終的に提案したMSDNモジュールを既存のDLベースのパンシャーピング手法に統合し、エンドツーエンドのパンシャーピングネットワークを実現する。我々はGaofen1衛星とWorldView-4衛星の広範な実験により、PAN画像なしで良好な空間的詳細を構築し、最高の性能を達成することを検証する。コードはhttps://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.gitで公開されている。

Pan-sharpening, as one of the most commonly used techniques in remote sensing systems, aims to inject spatial details from panchromatic images into multi-spectral images to obtain high-resolution MS images. Since deep learning has received widespread attention because of its powerful fitting ability and efficient feature extraction, a variety of pan-sharpening methods have been proposed to achieve remarkable performance. However, current pan-sharpening methods usually require the paired PAN and MS images as the input, which limits their usage in some scenarios. To address this issue, in this paper, we observe that the spatial details from PAN images are mainly high-frequency cues, i.e., the edges reflect the contour of input PAN images. This motivates us to develop a PAN-agnostic representation to store some base edges, so as to compose the contour for the corresponding PAN image via them. As a result, we can perform the pan-sharpening task with only the MS image when inference. To this end, a memory-based network is adapted to extract and memorize the spatial details during the training phase and is used to replace the process of obtaining spatial information from PAN images when inference, which is called Memory-based Spatial Details Network (MSDN). We finally integrate the proposed MSDN module into the existing DL-based pan-sharpening methods to achieve an end-to-end pan-sharpening network. With extensive experiments on the Gaofen1 and WorldView-4 satellites, we verify that our method constructs good spatial details without PAN images and achieves the best performance. The code is available at https://github.com/Zhao-Tian-yi/Learning-to-Pan-sharpening-with-Memories-of-Spatial-Details.git.

翻訳日:2023-06-29 14:05:57 公開日:2023-06-28

# 複数インスタンス学習に基づく全スライド画像分類のための疑似バッグミックスアップ拡張

Pseudo-Bag Mixup Augmentation for Multiple Instance Learning Based Whole Slide Image Classification ( http://arxiv.org/abs/2306.16180v1 )

ライセンス: Link先を確認

Pei Liu, Luping Ji, Xinyu Zhang, Feng Ye

(参考訳) ギガピクセル画像のモデリングの特別な状況を考えると、MIL(Multiple Case Learning)はWSI(Whole Slide Image)分類において最も重要なフレームワークの1つとなっている。現在、ほとんどのMILネットワークは、トレーニングにおいて避けられない2つの問題に直面している。 i) 不十分なWSIデータ及び二ニューラルネットワークに固有のデータ記憶の性質これらの問題は、WSIの分類モデルの継続的な性能向上を抑えるため、MILモデルが適切かつ効率的な訓練から妨げられる可能性がある。そこで本研究では,MILモデルのトレーニングを改善するためのPseudo-bag Mixup(PseMix)データ拡張スキームを提案する。このスキームは、MILに基づくWSI分類に適用するために、一般的な画像のMixup戦略を擬似バグを介して特別なWSIに一般化する。疑似バッグによる協調により,psemixはミックスアップ戦略におけるクリティカルサイズアライメントとセマンティクスアライメントを満足する。さらに、MILに適応した効率的で分離された手法として設計されており、時間を要する操作やMILモデル予測に依存しない。比較実験とアブレーション研究はPseMixの有効性と利点を評価するために特別に設計されている。 PseMixはWSI分類におけるMILネットワークの性能を向上させることができる。さらに、MILモデルの一般化能力を高め、排他的ラベルやノイズのあるラベルをパッチする堅牢性を促進することもできる。ソースコードはhttps://github.com/liupei101/psemixで入手できます。

Given the special situation of modeling gigapixel images, multiple instance learning (MIL) has become one of the most important frameworks for Whole Slide Image (WSI) classification. In current practice, most MIL networks often face two unavoidable problems in training: i) insufficient WSI data, and ii) the data memorization nature inherent in neural networks. These problems may hinder MIL models from adequate and efficient training, suppressing the continuous performance promotion of classification models on WSIs. Inspired by the basic idea of Mixup, this paper proposes a Pseudo-bag Mixup (PseMix) data augmentation scheme to improve the training of MIL models. This scheme generalizes the Mixup strategy for general images to special WSIs via pseudo-bags so as to be applied in MIL-based WSI classification. Cooperated by pseudo-bags, our PseMix fulfills the critical size alignment and semantic alignment in Mixup strategy. Moreover, it is designed as an efficient and decoupled method adaptive to MIL, neither involving time-consuming operations nor relying on MIL model predictions. Comparative experiments and ablation studies are specially designed to evaluate the effectiveness and advantages of our PseMix. Test results show that PseMix could often improve the performance of MIL networks in WSI classification. Besides, it could also boost the generalization capacity of MIL models, and promote their robustness to patch occlusion and noisy labels. Our source code is available at https://github.com/liupei101/PseMix.

翻訳日:2023-06-29 14:05:27 公開日:2023-06-28

# データサイエンスを定義する: 探究の新しい分野

Defining data science: a new field of inquiry ( http://arxiv.org/abs/2306.16177v1 )

ライセンス: Link先を確認

Michael L Brodie

(参考訳) データサイエンスは科学ではない。それは研究パラダイムです。その力、範囲、スケールは、我々の最も強力な研究パラダイムである科学を越え、知識の発見と世界を変えることができるでしょう。私たちはまだそれを理解し定義しておらず、その可能性を認識し、リスクを管理するために不可欠です。現代のデータサイエンスは始まったばかりです。 1962年から徐々に発展し、2000年から急速に発展し、21世紀の最も活発で強力な革新の1つであり、基本的に新しい調査分野である。その価値、パワー、適用性のために、40以上の規律、何百もの研究領域、何千ものアプリケーションに現れています。何百万ものデータサイエンス出版物には、データサイエンスとデータサイエンスの問題解決の無数の定義が含まれている。幼少期のため、多くの定義は独立性、アプリケーション固有性、相互不完全性、冗長性、矛盾性がある。本研究では,データサイエンスコミュニティのためのデータサイエンスジャーナルを用いた,データサイエンス参照フレームワークに基づくコヒーレントで統一的な定義の開発を提案することにより,このデータサイエンスの多重定義の課題を解決する。本稿では、そのような定義を議論するために必要なデータサイエンスアーティファクトの候補定義を提供する。データサイエンスの哲学、データサイエンスの問題解決パラダイム、およびデータサイエンスを定義し、統一し、発展させるためのフレームワークとしてしばしば呼ばれる6つの要素データサイエンス参照フレームワーク(公理学、オントロジ、認識論、方法論、手法、技術)からなる古典的な研究パラダイムの概念に基づいている。データ科学を定義するための課題、すなわち、データ科学を定義するための手段、そして包括的ソリューションの基盤としてのそれらの要求と利益を示す。

Data science is not a science. It is a research paradigm. Its power, scope, and scale will surpass science, our most powerful research paradigm, to enable knowledge discovery and change our world. We have yet to understand and define it, vital to realizing its potential and managing its risks. Modern data science is in its infancy. Emerging slowly since 1962 and rapidly since 2000, it is a fundamentally new field of inquiry, one of the most active, powerful, and rapidly evolving 21st century innovations. Due to its value, power, and applicability, it is emerging in 40+ disciplines, hundreds of research areas, and thousands of applications. Millions of data science publications contain myriad definitions of data science and data science problem solving. Due to its infancy, many definitions are independent, application-specific, mutually incomplete, redundant, or inconsistent, hence so is data science. This research addresses this data science multiple definitions challenge by proposing the development of coherent, unified definition based on a data science reference framework using a data science journal for the data science community to achieve such a definition. This paper provides candidate definitions for essential data science artifacts that are required to discuss such a definition. They are based on the classical research paradigm concept consisting of a philosophy of data science, the data science problem solving paradigm, and the six component data science reference framework (axiology, ontology, epistemology, methodology, methods, technology) that is a frequently called for unifying framework with which to define, unify, and evolve data science. It presents challenges for defining data science, solution approaches, i.e., means for defining data science, and their requirements and benefits as the basis of a comprehensive solution.

翻訳日:2023-06-29 14:04:58 公開日:2023-06-28

# アフガニスタンにおける教育禁止ツイートの感情分析

Emotion Analysis of Tweets Banning Education in Afghanistan ( http://arxiv.org/abs/2306.16268v1 )

ライセンス: Link先を確認

Mohammad Ali Hussiny, Lilja {\O}vrelid

(参考訳) 本稿ではアフガニスタンで話されているペルシア語のダリ変種に対する最初の感情注釈データセットを紹介する。 LetHerLearnのデータセットには、2022年にタリバンで女性教育の権利が禁止されたことを受けて投稿された7,600のツイートが含まれている。ここでは、データ収集とアノテーションのプロセス、関連するデータセットの統計、結果のデータセットに関する最初の実験、dari感情分類のタスクのために様々なニューラルネットワークアーキテクチャをベンチマークします。

This paper introduces the first emotion annotated dataset for the Dari variant of Persian spoken in Afghanistan. The LetHerLearn dataset contains 7,600 tweets posted in reaction to the Taliban ban of women rights to education in 2022 and has been manually annotated according to Ekman emotion categories. We here detail the data collection and annotation process, present relevant dataset statistics as well as initial experiments on the resulting dataset, benchmarking a number of different neural architectures for the task of Dari emotion classification.

翻訳日:2023-06-29 13:56:56 公開日:2023-06-28

# 学習者自身のコードに関する自動質問は、脆弱な知識を検出するのに役立つ

Automated Questions About Learners' Own Code Help to Detect Fragile Knowledge ( http://arxiv.org/abs/2306.16267v1 )

ライセンス: Link先を確認

Teemu Lehtinen, Otto Sepp\"al\"a, Ari Korhonen

(参考訳) 学生は、実際にどのように動作するかを脆弱に理解していても、正しく機能するプログラムコードを作成できる。個々のエクササイズサブミッション(qlc)から自動的に派生した質問は、学生が作成したコードの構造とロジックをどの程度理解しているかを調査できる。以前の研究は、最初のプログラミングコースの文脈でこのアプローチを研究した。本研究は,CS1における一般概念の要約を含む工学生のためのフォローアッププログラミングコースの再現である。その課題は、学生の90%が解決した古典的な降雨問題であった。合格申請ごとに生成されたQLCは意図的にシンプルに保たれていたが、学生の27%は少なくとも1回は失敗した。自己のプログラム論理に関する質問に苦しむ学生は、正解した学生よりもコースポイント全体の中央値が低かった。

Students are able to produce correctly functioning program code even though they have a fragile understanding of how it actually works. Questions derived automatically from individual exercise submissions (QLC) can probe if and how well the students understand the structure and logic of the code they just created. Prior research studied this approach in the context of the first programming course. We replicate the study on a follow-up programming course for engineering students which contains a recap of general concepts in CS1. The task was the classic rainfall problem which was solved by 90% of the students. The QLCs generated from each passing submission were kept intentionally simple, yet 27% of the students failed in at least one of them. Students who struggled with questions about their own program logic had a lower median for overall course points than students who answered correctly.

翻訳日:2023-06-29 13:56:46 公開日:2023-06-28

# MIMO信号検出のための深部展開模擬分岐

Deep Unfolded Simulated Bifurcation for Massive MIMO Signal Detection ( http://arxiv.org/abs/2306.16264v1 )

ライセンス: Link先を確認

Satoshi Takabe

(参考訳) マルチインプット多重出力(MIMO)は次世代無線通信の鍵となる要素である。近年,深層学習技術と量子(インスパイアされた)アルゴリズムに基づく様々なMIMO信号検出器が提案され,従来の検出器と比較して検出性能が向上している。本稿では,量子インスパイアされたアルゴリズムであるシミュレート分岐(sb)アルゴリズムに注目した。本稿では,検出性能を向上させる2つの手法を提案する。第一は、レベンバーグ・マーカルトアルゴリズムに触発されたアルゴリズムを修正して、最大確率検出の極小を取り除いたことである。 2つ目は、反復アルゴリズムの内部パラメータをトレーニングするためのディープラーニングテクニックである、deep unfoldingの利用である。本稿では,SBの更新ルールを微分可能とした深部展開SBを提案する。その結果,これらの検出器はMIMOシステムの信号検出性能を著しく向上することがわかった。

Multiple-input multiple-output (MIMO) is a key ingredient of next-generation wireless communications. Recently, various MIMO signal detectors based on deep learning techniques and quantum(-inspired) algorithms have been proposed to improve the detection performance compared with conventional detectors. This paper focuses on the simulated bifurcation (SB) algorithm, a quantum-inspired algorithm. This paper proposes two techniques to improve its detection performance. The first is modifying the algorithm inspired by the Levenberg-Marquardt algorithm to eliminate local minima of maximum likelihood detection. The second is the use of deep unfolding, a deep learning technique to train the internal parameters of an iterative algorithm. We propose a deep-unfolded SB by making the update rule of SB differentiable. The numerical results show that these proposed detectors significantly improve the signal detection performance in massive MIMO systems.

翻訳日:2023-06-29 13:56:34 公開日:2023-06-28

# i.i.d.行列の散逸スペクトル形式因子

The Dissipative Spectral Form Factor for I.I.D. Matrices ( http://arxiv.org/abs/2306.16262v1 )

ライセンス: Link先を確認

Giorgio Cipolloni and Nicolo Grometto

(参考訳) ジニブレアンサンブルの[arXiv:2103.05001]に最近導入された散逸スペクトル形因子(DSFF)は、散逸量子系の普遍的性質を研究するための鍵となるツールである。本研究では,実数や複素数を中間時間スケールまで含む大きな乱数行列のdsffを計算し, [arxiv:2103.05001] からの予測を確認した。実例におけるDSFFの解析式は以前不明であった。さらに,DSFFの連結成分は,短時間で成分の4次累積に依存する非普遍的補正を示すことを示した。これらの結果は、非エルミート確率行列[arXiv:2002.02438, arXiv:1912.04100]の線形固有値統計に対する中心極限定理に基づいている。

The Dissipative Spectral Form Factor (DSFF), recently introduced in [arXiv:2103.05001] for the Ginibre ensemble, is a key tool to study universal properties of dissipative quantum systems. In this work we compute the DSFF for a large class of random matrices with real or complex entries up to an intermediate time scale, confirming the predictions from [arXiv:2103.05001]. The analytic formula for the DSFF in the real case was previously unknown. Furthermore, we show that for short times the connected component of the DSFF exhibits a non-universal correction depending on the fourth cumulant of the entries. These results are based on the central limit theorem for linear eigenvalue statistics of non-Hermitian random matrices [arXiv:2002.02438, arXiv:1912.04100].

翻訳日:2023-06-29 13:56:22 公開日:2023-06-28

# センチネル2画像からの疎アノテーションによる土地被覆区分

Land Cover Segmentation with Sparse Annotations from Sentinel-2 Imagery ( http://arxiv.org/abs/2306.16252v1 )

ライセンス: Link先を確認

Marco Galatola, Edoardo Arnaudo, Luca Barco, Claudio Rossi, Fabrizio Dominici

(参考訳) 土地被覆(LC)セグメンテーションは, 環境分析や自然災害管理など, 様々な分野で重要な役割を担っている。しかし、正確なLCマップの生成は複雑で時間を要する作業であり、環境変化を考慮して複数のアノテータの専門知識と定期的な更新が必要である。本研究では,LCセグメンテーションに関連する課題に,スパースアノテーションとドメイン適応手法を用いて対処する,燃料マップ記述のためのフレームワークSPADAを紹介する。 LUCASやUrban Atlasといった信頼性の高い地上事実を用いた性能評価は、この手法の有効性を示している。 SPADAは最先端のセマンティックセグメンテーションアプローチやサードパーティ製品よりも優れており、平均IoUスコアは42.86、F1スコアは67.93である。

Land cover (LC) segmentation plays a critical role in various applications, including environmental analysis and natural disaster management. However, generating accurate LC maps is a complex and time-consuming task that requires the expertise of multiple annotators and regular updates to account for environmental changes. In this work, we introduce SPADA, a framework for fuel map delineation that addresses the challenges associated with LC segmentation using sparse annotations and domain adaptation techniques for semantic segmentation. Performance evaluations using reliable ground truths, such as LUCAS and Urban Atlas, demonstrate the technique's effectiveness. SPADA outperforms state-of-the-art semantic segmentation approaches as well as third-party products, achieving a mean Intersection over Union (IoU) score of 42.86 and an F1 score of 67.93 on Urban Atlas and LUCAS, respectively.

翻訳日:2023-06-29 13:56:08 公開日:2023-06-28

# 均一空間上の潜在SDE

Latent SDEs on Homogeneous Spaces ( http://arxiv.org/abs/2306.16248v1 )

ライセンス: Link先を確認

Sebastian Zeng, Florian Graf, Roland Kwitt

(参考訳) 確率過程が(おそらく複雑な)観測された場合、潜時確率微分方程式(SDE)の解によって支配される潜在変数モデルにおける変分ベイズ推論の問題を考察する。効率的な勾配計算などの大規模データから(ほぼ任意の)潜伏ニューラルネットワークSDEを学習しようとするときの課題に触発されて、我々は一歩後退して特定のサブクラスを研究する。我々の場合、SDEは同次潜在空間上で進化し、対応する(行列)リー群の確率力学によって誘導される。学習問題において、単位$n$-sphere上のSDEは、おそらくこの設定の最も関連性の高いインカーネーションである。特に、変分推論において、球面は真に非形式的事前SDEの使用を容易にするだけでなく、証明の下界における近似的後続過程と先行過程の間のクルバック・リーブラー分岐に対する特に単純で直感的な表現も得られる。実験により, 提案手法の潜在sdeを, 既存の1段階幾何オイラー・マルヤマスキームを用いて効率的に学習できることを実証した。より多様なSDEに制限されているにもかかわらず、様々な時系列補間および分類ベンチマークにおいて、競争力や最先端のパフォーマンスを達成する。

We consider the problem of variational Bayesian inference in a latent variable model where a (possibly complex) observed stochastic process is governed by the solution of a latent stochastic differential equation (SDE). Motivated by the challenges that arise when trying to learn an (almost arbitrary) latent neural SDE from large-scale data, such as efficient gradient computation, we take a step back and study a specific subclass instead. In our case, the SDE evolves on a homogeneous latent space and is induced by stochastic dynamics of the corresponding (matrix) Lie group. In learning problems, SDEs on the unit $n$-sphere are arguably the most relevant incarnation of this setup. Notably, for variational inference, the sphere not only facilitates using a truly uninformative prior SDE, but we also obtain a particularly simple and intuitive expression for the Kullback-Leibler divergence between the approximate posterior and prior process in the evidence lower bound. Experiments demonstrate that a latent SDE of the proposed type can be learned efficiently by means of an existing one-step geometric Euler-Maruyama scheme. Despite restricting ourselves to a less diverse class of SDEs, we achieve competitive or even state-of-the-art performance on various time series interpolation and classification benchmarks.

翻訳日:2023-06-29 13:55:50 公開日:2023-06-28

# CBBQ: 大規模言語モデルのための人間-AIコラボレーションによる中国のバイアスベンチマークデータセット

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models ( http://arxiv.org/abs/2306.16244v1 )

ライセンス: Link先を確認

Yufei Huang and Deyi Xiong

(参考訳) 大規模言語モデルの社会的バイアスを理論的に測定することは、高度に有能なAIモデルの倫理的リスクの検出と低減に不可欠である。本研究では,人間の専門家と生成言語モデルが共同で構築した10万以上の質問からなり,中国文化と価値観に関連する14の社会的次元におけるステレオタイプと社会バイアスをカバーする,中国バイアスベンチマークデータセットを提案する。キュレーションプロセスには、広範な文献レビューによるバイアス識別、曖昧なコンテキスト生成、AIによるあいまいなコンテキスト生成、snd manual Review \& recompositionの4つの重要なステップが含まれている。データセットのテストインスタンスは、3K以上の高品質なテンプレートから自動的に抽出される。データセットは広範囲のカバレッジと高い多様性を示す。広範な実験により、データセットがモデルバイアスの検出に有効であることが示され、公に入手可能な10の中国語大言語モデルはすべて、特定のカテゴリにおいて強いバイアスを示している。さらに,我々は実験から,微調整されたモデルがある程度の注意を払って,あるタイプにおいて道徳的に有害なアウトプットを生成するのを避けることができることを観察した。私たちのデータセットと結果は \href{https://github.com/yfhuangxxxx/cbbq}{https://github.com/yfhuangxxxx/cbbq} で公開されています。

Holistically measuring societal biases of large language models is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values. The curation process contains 4 essential steps: bias identification via extensive literature review, ambiguous context generation, AI-assisted disambiguous context generation, snd manual review \& recomposition. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. The dataset exhibits wide coverage and high diversity. Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories. Additionally, we observe from our experiments that fine-tuned models could, to a certain extent, heed instructions and avoid generating outputs that are morally harmful in some types, in the way of "moral self-correction". Our dataset and results are publicly available at \href{https://github.com/YFHuangxxxx/CBBQ}{https://github.com/YFHuangxxxx/CBBQ}, offering debiasing research opportunities to a widened community.

翻訳日:2023-06-29 13:55:29 公開日:2023-06-28

# 散逸フェルミオン系に対する局所非エルミットハミルトン形式とフェルミ超流動系の損失誘起人口増加

Local Non-Hermitian Hamiltonian Formalism for Dissipative Fermionic Systems and Loss-Induced Population Increase in Fermi Superfluids ( http://arxiv.org/abs/2306.16235v1 )

ライセンス: Link先を確認

Teng Xiao and Gentaro Watanabe

(参考訳) 非エルミートハミルトニアン(Non-Hermitian Hamiltonian、NHH)は、開量子系に対する効果的な形式主義である。共通認識では、リンドブラッドマスター方程式で系を記述するとき、そのジャンプ項を無視して得られるnhhは散逸率の逆よりも十分に短い時間スケールのよい近似であると考えられている。この共通知恵に挑戦し、散逸性フェルミオン系に対する元のマスター方程式から適切なNHHを得るためのスキームを開発する。この NHH は局所的な NHH と呼ばれ、各モードにおける損失過程を局所的に記述する。具体例として、フェミオン超流動を用いた新しいスキームを1体損失下で正当化する。さらに, ペアリングギャップと異常電界との間の散逸誘起位相ロックにより, 長期的進化における損失による個体増加がみられた。

Non-Hermitian Hamiltonian (NHH) is an effective formalism for open quantum systems. In common wisdom, when the system is described by the Lindblad master equation, the NHH obtained by neglecting its jump term is believed to be a good approximation for a timescale sufficiently shorter than the inverse of the dissipation rate. We challenge this common wisdom and develop a scheme to obtain an appropriate NHH from the original master equation for dissipative fermionic systems. This NHH, called the local NHH, describes the loss process in each individual mode locally. As a concrete example, we justify our new scheme using fermionic superfluid under one-body loss. Furthermore, we find loss-induced population increase in the long time evolution due to the dissipation-induced phase locking between the pairing gap and the anomalous field.

翻訳日:2023-06-29 13:55:01 公開日:2023-06-28

# 自己組織化生体分子薄膜によるカシミール力の効率的な低減

Efficient Reduction of Casimir Forces by Self-assembled Bio-molecular Thin Films ( http://arxiv.org/abs/2306.16209v1 )

ライセンス: Link先を確認

Ren\'e I.P. Sedmik, Alexander Urech, Zeev Zalevsky, Itai Carmeli

(参考訳) ロンドン・ヴァン・デル・ワールス力に関連するカシミール力は、電磁揺らぎのスペクトルが境界によって制限された場合に生じる。ナノスケールでこれらの力を制御するための基礎科学と技術的応用の両方から大きな関心がある。科学的には、カシミール効果は、マクロな物体の間に現れる唯一の既知の量子真空効果であり、真空の未知の物理を調べることができる。本研究では, プレートと球体間のカシミール力に及ぼす自己組織化分子バイオ薄膜と有機薄膜の影響を実験的に検討した。分子薄膜は、わずか数ナノメートルの厚さにもかかわらず、カシミール力は最大14%減少することがわかった。この還元に繋がる分子特性を明らかにするため, 化学的, 物理的性質の異なる5種類の生体分子膜について検討した。分光データから、金層と分子膜の電子状態が自己集合過程における電荷再配置によって混合されることに起因する広い吸収帯が明らかとなった。リフシッツ理論を用いて、観察されたカシミール力の変化は、分子層の形成による新しい吸収帯の出現と一致することを計算した。所望のカシミール力の低減は、溶液中の単純な自己組織化技術を用いて、複数の単層を積み重ねることで調整できる。分子は、それぞれ数ナノメートルの長さで、小さな空洞や穴を貫通し、あらゆる表面を高い効率で覆うことができる。このプロセスは、マイクロエレクトロメカニカルシステム(MEMS)の製造における現在の手法と互換性があり、カシミール効果によって生じる「スティクション」により、一定のサイズを超えて小型化できない。したがって、我々のアプローチはこれらのデバイスをさらに小型化することができる。

Casimir forces, related to London-van der Waals forces, arise if the spectrum of electromagnetic fluctuations is restricted by boundaries. There is great interest both from fundamental science and technical applications to control these forces on the nano scale. Scientifically, the Casimir effect being the only known quantum vacuum effect manifesting between macroscopic objects, allows to investigate the poorly known physics of the vacuum. In this work, we experimentally investigate the influence of self-assembled molecular bio and organic thin films on the Casimir force between a plate and a sphere. We find that molecular thin films, despite being a mere few nanometers thick, reduce the Casimir force by up to 14%. To identify the molecular characteristics leading to this reduction, five different bio-molecular films with varying chemical and physical properties were investigated. Spectroscopic data reveal a broad absorption band whose presence can be attributed to the mixing of electronic states of the underlying gold layer and those of the molecular film due to charge rearrangement in the process of self-assembly. Using Lifshitz theory we calculate that the observed change in the Casimir force is consistent with the appearance of the new absorption band due to the formation of molecular layers. The desired Casimir force reduction can be tuned by stacking several monolayers, using a simple self-assembly technique in a solution. The molecules - each a few nanometers long - can penetrate small cavities and holes, and cover any surface with high efficiency. This process seems compatible with current methods in the production of micro-electromechanical systems (MEMS), which cannot be miniaturized beyond a certain size due to `stiction' caused by the Casimir effect. Our approach could therefore readily enable further miniaturization of these devices.

翻訳日:2023-06-29 13:54:46 公開日:2023-06-28

# McKean-Vlasov制御問題に対する連続時間q-ラーニング

Continuous-Time q-learning for McKean-Vlasov Control Problems ( http://arxiv.org/abs/2306.16208v1 )

ライセンス: Link先を確認

Xiaoli Wei, Xiang Yu

(参考訳) 本稿では,最近Jia と Zhou (2022c) による Q-learning の連続的対応として作られた q-learning を,エントロピー規則化強化学習の設定における Mckean-Vlasov 制御問題に対して検討する。 Jia と Zhou (2022c) における単一のエージェントの制御問題とは対照的に、エージェントの平均場相互作用は q-函数の定義をより微妙に表現し、2つの異なる q-函数が自然に生じることを示した。 i) テストポリシを含む弱いマルティンゲール条件で学習可能な統合Q関数(Gu, Guo, Wei, Xu (2023))の1次近似としての統合q関数($q$で記述) (ii)政策改善イテレーションで使用される本質的なq-関数($q_e$で示される)。 2つのq関数は、すべてのテストポリシーの下で積分表現を介して関連していることを示す。統合q関数の弱martingale条件と提案するテストポリシー探索法に基づき,モデルフリーのオフラインおよびオンライン学習アルゴリズムを考案した。 LQ制御フレームワークとLQ制御フレームワーク以外の2つの金融アプリケーションにおいて、値関数と2つのq-関数の正確なパラメータ化を求め、シミュレーション実験でアルゴリズムを説明できる。

This paper studies the q-learning, recently coined as the continuous-time counterpart of Q-learning by Jia and Zhou (2022c), for continuous time Mckean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent's control problem in Jia and Zhou (2022c), the mean-field interaction of agents render the definition of q-function more subtle, for which we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by $q$) as the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023) that can be learnt by a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_e$) that is employed in the policy improvement iterations. We show that two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition of the integrated q-function and our proposed searching method of test policies, some model-free offline and online learning algorithms are devised. In two financial applications, one in LQ control framework and one beyond LQ control framework, we can obtain the exact parameterization of the value function and two q-functions and illustrate our algorithms with simulation experiments.

翻訳日:2023-06-29 13:54:16 公開日:2023-06-28

# スタイン法によるガウス確率場近似と広帯域ランダムニューラルネットワークへの応用

Gaussian random field approximation via Stein's method with applications to wide random neural networks ( http://arxiv.org/abs/2306.16308v1 )

ライセンス: Link先を確認

Krishnakumar Balasubramanian, Larry Goldstein, Nathan Ross, Adil Salim

(参考訳) 我々は、シュテインの手法に基づいて、n$-球面とガウス面によってインデックスづけされた任意の連続的$\mathbb{r}^d$値の確率場の間の、ワッサースタイン距離(w_1$)の上界を、$\sup$-normに関して導出する。我々は、より滑らかな計量の束縛を$w_1$距離に移すことができる新しいガウス平滑化手法を開発した。滑らか化はラプラシアン作用素の力を使って構成された共分散関数に基づいており、関連するガウス過程がトラクタブルなキャメロン・マーチンあるいは再生するケルネル・ヒルベルト空間を持つように設計されている。この機能により、以前文献で考慮されていた1次元の間隔ベースのインデックスセットを越えられるようになります。一般結果に特化して、任意の深さの広いランダムニューラルネットワークのガウス確率場近似とランダム場レベルでのリプシッツ活性化関数の第一境界を求める。我々の境界は、ネットワークの幅とランダムな重みのモーメントで明示的に表現される。また、活性化関数が3つの有界導関数を持つとき、より厳密な境界が得られる。

We derive upper bounds on the Wasserstein distance ($W_1$), with respect to $\sup$-norm, between any continuous $\mathbb{R}^d$ valued random field indexed by the $n$-sphere and the Gaussian, based on Stein's method. We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance. The smoothing is based on covariance functions constructed using powers of Laplacian operators, designed so that the associated Gaussian process has a tractable Cameron-Martin or Reproducing Kernel Hilbert Space. This feature enables us to move beyond one dimensional interval-based index sets that were previously considered in the literature. Specializing our general result, we obtain the first bounds on the Gaussian random field approximation of wide random neural networks of any depth and Lipschitz activation functions at the random field level. Our bounds are explicitly expressed in terms of the widths of the network and moments of the random weights. We also obtain tighter bounds when the activation function has three bounded derivatives.

翻訳日:2023-06-29 13:48:50 公開日:2023-06-28

# 点2Point : 時空間占有予測におけるヒルベルト選別点雲の効率的な深層学習のためのフレームワーク

Point2Point : A Framework for Efficient Deep Learning on Hilbert sorted Point Clouds with applications in Spatio-Temporal Occupancy Prediction ( http://arxiv.org/abs/2306.16306v1 )

ライセンス: Link先を確認

Athrva Atul Pandhare

(参考訳) ポイントクラウドデータの不規則性と置換不変性は、効果的な学習に挑戦する。この問題に対処する従来の方法は、生の点雲を3Dボクセルグリッドやレンジイメージなどの中間表現に変換することである。このような中間表現は置換不変性の問題を解くが、情報のかなりの損失をもたらす。原点雲で学習するアプローチは、点間の近傍関係の解決に支障をきたすか、あるいはそれらの定式化において複雑すぎる。本論文では,ヒルベルト空間充填曲線によって誘導される1次元秩序を保存する局所性として点雲を表現する新しい手法を提案する。ヒルベルトソートされた点雲上で効果的に学習できるニューラルアーキテクチャであるpoint2pointも紹介する。 Point2Pointは、ポイントクラウドセグメンテーションと生成タスクで競合する性能を示す。最後に,ポイント雲からの時空間占有予測におけるpoint2pointの性能を示す。

The irregularity and permutation invariance of point cloud data pose challenges for effective learning. Conventional methods for addressing this issue involve converting raw point clouds to intermediate representations such as 3D voxel grids or range images. While such intermediate representations solve the problem of permutation invariance, they can result in significant loss of information. Approaches that do learn on raw point clouds either have trouble in resolving neighborhood relationships between points or are too complicated in their formulation. In this paper, we propose a novel approach to representing point clouds as a locality preserving 1D ordering induced by the Hilbert space-filling curve. We also introduce Point2Point, a neural architecture that can effectively learn on Hilbert-sorted point clouds. We show that Point2Point shows competitive performance on point cloud segmentation and generation tasks. Finally, we show the performance of Point2Point on Spatio-temporal Occupancy prediction from Point clouds.

翻訳日:2023-06-29 13:48:25 公開日:2023-06-28

# 超伝導量子デバイス用超音波エッジマイクロカットを用いた高qトレンチアルミニウムコプラナー共振器

High-Q trenched aluminum coplanar resonators with an ultrasonic edge microcutting for superconducting quantum devices ( http://arxiv.org/abs/2306.16301v1 )

ライセンス: Link先を確認

E.V. Zikiy, A.I. Ivanov, N.S. Smirnov, D.O. Moskalev, V.I. Polozov, A.R. Matanin, E.I. Malevannaya, V.V. Echeistov, T.G. Konstantinova and I.A. Rodionov

(参考訳) 誘電損失は超伝導量子ビットのコヒーレンスを制限する重要な要因の1つである。材料および製造工程が誘電損失に与える影響を共平面導波路(cpw)マイクロ波共振器を用いて評価する。本稿では,内部品質係数が5x106以上,低出力2x106(最大4.4x106)の超伝導マイクロ波共振器について報告する。このような性能は、量子ジョセフソン接合回路でよく用いられる高抵抗シリコン基板上で、7-10.5Um帯の100nm厚アルミニウム共振器で実証される。乾式および湿式アルミニウムエッチングとシリコン基板の深さおよび等方性反応性イオンエッチングを併用した共振器の内部品質因子について検討した。エアブリッジとシリコン基板エッチングの両方を用いたジョセフソン接合互換CPW共振器製造法を提案する。最後に, エアブリッジの位置と余分なプロセスステップが誘電損失に与える影響を実証する。ウェットエッチングされたアルミニウム共振器と等方性除去基板に対して, 超音波金属エッジマイクロカットにより, 最適品質のfa ctorを得る。

Dielectric losses are one of the key factors limiting the coherence of superconducting qubits. The impact of materials and fabrication steps on dielectric losses can be evaluated using coplanar waveguide (CPW) microwave resonators. Here, we report on superconducting CPW microwave resonators with internal quality factors systematically exceeding 5x106 at high powers and 2x106 (with the best value of 4.4x106) at low power. Such performance is demonstrated for 100-nm-thick aluminum resonators with 7-10.5 um center trace on high-resistivity silicon substrates commonly used in quantum Josephson junction circuits. We investigate internal quality factors of the resonators with both dry and wet aluminum etching, as well as deep and isotropic reactive ion etching of silicon substrate. Josephson junction compatible CPW resonators fabrication process with both airbridges and silicon substrate etching is proposed. Finally, we demonstrate the effect of airbridges positions and extra process steps on the overall dielectric losses. The best quality fa ctors are obtained for the wet etched aluminum resonators and isotropically removed substrate with the proposed ultrasonic metal edge microcutting.

翻訳日:2023-06-29 13:48:09 公開日:2023-06-28

# 社会世界の知識 : モデリングと応用

Social World Knowledge: Modeling and Applications ( http://arxiv.org/abs/2306.16299v1 )

ライセンス: Link先を確認

Nir Lotan and Einat Minkov

(参考訳) 社会世界の知識は、人間や機械による効果的なコミュニケーションと情報処理の重要な要素である。現在、実世界の知識を表す多くの知識基盤が存在する。しかし、世界の知識の社会的側面を捉えた資源は存在しない。我々はこのような資源の定式化と構築に向けて重要な一歩を踏み出したと信じている。ソーシャルネットワークで発生する社会的文脈から低次元の実体埋め込みを抽出するための一般的なフレームワークであるSocialVecを紹介する。このフレームワークでは、エンティティは一般的な関心を呼び出す非常に人気のあるアカウントに対応する。個人ユーザが共同でフォローする傾向にあるエンティティは社会的に関連があると仮定し、このソーシャルコンテキストの定義を用いてエンティティの埋め込みを学習する。テキストセマンティクスを含む作業を容易にする単語埋め込みと同様に、学習されたソーシャルエンティティ埋め込みは、複数のソーシャルフレーバーのタスクに利益をもたらすことを期待する。この研究では、約2万のエンティティのソーシャル埋め込みを、130万人のTwitterユーザーとフォローするアカウントのサンプルから引き出した。社会的重要性の2つのタスクに、結果の埋め込みを取り入れて評価する。まず、ニュースソースの政治的バイアスを、社会的埋め込み空間における実体的類似性の観点から評価する。第2に,フォローするエンティティのソーシャルな埋め込みに基づいて,個々のtwitterユーザの個人的特性を予測する。どちらの場合も、タスク固有のベースラインと比較して、我々のアプローチで有利または競争的なパフォーマンスを示す。さらに、事実に基づく既存の実体埋め込み方式は、知識の社会的側面を捉えないことを示す。我々は、社会世界の知識とその応用のさらなる探索を支援するために、学習された社会エンティティの埋め込みを研究コミュニティに公開する。

Social world knowledge is a key ingredient in effective communication and information processing by humans and machines alike. As of today, there exist many knowledge bases that represent factual world knowledge. Yet, there is no resource that is designed to capture social aspects of world knowledge. We believe that this work makes an important step towards the formulation and construction of such a resource. We introduce SocialVec, a general framework for eliciting low-dimensional entity embeddings from the social contexts in which they occur in social networks. In this framework, entities correspond to highly popular accounts which invoke general interest. We assume that entities that individual users tend to co-follow are socially related, and use this definition of social context to learn the entity embeddings. Similar to word embeddings which facilitate tasks that involve text semantics, we expect the learned social entity embeddings to benefit multiple tasks of social flavor. In this work, we elicited the social embeddings of roughly 200K entities from a sample of 1.3M Twitter users and the accounts that they follow. We employ and gauge the resulting embeddings on two tasks of social importance. First, we assess the political bias of news sources in terms of entity similarity in the social embedding space. Second, we predict the personal traits of individual Twitter users based on the social embeddings of entities that they follow. In both cases, we show advantageous or competitive performance using our approach compared with task-specific baselines. We further show that existing entity embedding schemes, which are fact-based, fail to capture social aspects of knowledge. We make the learned social entity embeddings available to the research community to support further exploration of social world knowledge and its applications.

翻訳日:2023-06-29 13:47:52 公開日:2023-06-28

# 時間変動モデレーション評価のための因果的帰納効果推定のためのメタラーニング手法

A Meta-Learning Method for Estimation of Causal Excursion Effects to Assess Time-Varying Moderation ( http://arxiv.org/abs/2306.16297v1 )

ライセンス: Link先を確認

Jieru Shi, Walter Dempsey

(参考訳) ウェアラブル技術とスマートフォンによるデジタル健康介入における双子革命は、様々な健康科学分野におけるモバイルヘルス(mhealth)介入のアクセシビリティと取り込みを大きく拡大した。マイクロランダム化試験(MRT)と呼ばれる連続ランダム化実験は、これらのmHealth介入成分の有効性を実証的に評価するために人気が高まっている。 MRTは「因果抽出効果(causal excursion effect)」と呼ばれる新しい種類の因果推定を行い、健康科学者は介入の効果が時間とともにどのように変化するか、あるいは過去の個々の特性、文脈、反応によって緩和されるかを評価することができる。しかし、因果抽出効果を推定する現在のデータ解析手法では、重要なニュアンスパラメータの作業モデルを構築するために、観測された高次元歴史の特徴を事前に特定する必要がある。機械学習アルゴリズムは自動機能構築に理想的だが、因果的再帰推定へのナイーブな応用は、モデルの誤特定下でバイアスを生じさせ、介入効果に関する誤った結論をもたらす可能性がある。この問題に対処するために,本稿ではメタリーナーの観点から因果的帰納効果の推定を再検討する。そこでは,ニュアサンスパラメータの推定に用いられる教師付き学習アルゴリズムの選択に,アナリストはいまだ無依存である。本論文は,新しい推定器の漸近特性を理論的および広範囲なシミュレーション実験により比較し,相対効率の向上を実証し,既存手法の2倍頑健な代替手法を提案する。最後に,本手法の実用性について,米国における初年の医療従事者の多施設コホート(NeCampら,2020年)からのデータを分析した。

Twin revolutions in wearable technologies and smartphone-delivered digital health interventions have significantly expanded the accessibility and uptake of mobile health (mHealth) interventions across various health science domains. Sequentially randomized experiments called micro-randomized trials (MRTs) have grown in popularity to empirically evaluate the effectiveness of these mHealth intervention components. MRTs have given rise to a new class of causal estimands known as "causal excursion effects", which enable health scientists to assess how intervention effectiveness changes over time or is moderated by individual characteristics, context, or responses in the past. However, current data analysis methods for estimating causal excursion effects require pre-specified features of the observed high-dimensional history to construct a working model of an important nuisance parameter. While machine learning algorithms are ideal for automatic feature construction, their naive application to causal excursion estimation can lead to bias under model misspecification, potentially yielding incorrect conclusions about intervention effectiveness. To address this issue, this paper revisits the estimation of causal excursion effects from a meta-learner perspective, where the analyst remains agnostic to the choices of supervised learning algorithms used to estimate nuisance parameters. The paper presents asymptotic properties of the novel estimators and compares them theoretically and through extensive simulation experiments, demonstrating relative efficiency gains and supporting the recommendation for a doubly robust alternative to existing methods. Finally, the practical utility of the proposed methods is demonstrated by analyzing data from a multi-institution cohort of first-year medical residents in the United States (NeCamp et al., 2020).

翻訳日:2023-06-29 13:47:28 公開日:2023-06-28

# 関連エンティティの選択:ゼロショット解析による知識グラフブートストラップ

Relevant Entity Selection: Knowledge Graph Bootstrapping via Zero-Shot Analogical Pruning ( http://arxiv.org/abs/2306.16296v1 )

ライセンス: Link先を確認

Lucas Jarnac, Miguel Couceiro, Pierre Monnin

(参考訳) 知識グラフ構築(kgc)は、高品質の核から始まった反復的なプロセスと見なすことができる。このような核はWikidataのようなオープンなKGに存在する知識から得ることができる。しかし、そのような汎用kgのサイズのため、それらを全体として統合することは、無関係なコンテンツとスケーラビリティの問題を伴う可能性がある。我々は,汎用kg に対する興味を持つ種実体から始まり,それらの隣り合う実体を保持または従属するアナロジーに基づくアプローチを提案する。ウィキデータに対する我々のアプローチは、ドメイン均質または異質なシードエンティティを含む2つの手動ラベル付きデータセットを通して評価する。我々は,我々の類推に基づくアプローチがLSTM,ランダムフォレスト,SVM,MLPを著しく低いパラメータ数で上回ることを示す。また,その一般化ポテンシャルを転送学習環境において評価する。これらの結果は、KGライフサイクルに関連するタスクにおけるアナロジーに基づく推論のさらなる統合を提唱する。

Knowledge Graph Construction (KGC) can be seen as an iterative process starting from a high quality nucleus that is refined by knowledge extraction approaches in a virtuous loop. Such a nucleus can be obtained from knowledge existing in an open KG like Wikidata. However, due to the size of such generic KGs, integrating them as a whole may entail irrelevant content and scalability issues. We propose an analogy-based approach that starts from seed entities of interest in a generic KG, and keeps or prunes their neighboring entities. We evaluate our approach on Wikidata through two manually labeled datasets that contain either domain-homogeneous or -heterogeneous seed entities. We empirically show that our analogy-based approach outperforms LSTM, Random Forest, SVM, and MLP, with a drastically lower number of parameters. We also evaluate its generalization potential in a transfer learning setting. These results advocate for the further integration of analogy-based inference in tasks related to the KG lifecycle.

翻訳日:2023-06-29 13:46:57 公開日:2023-06-28

# ワン・ツー・マニー合成による手術器具の領域分割の一般化

Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis ( http://arxiv.org/abs/2306.16285v1 )

ライセンス: Link先を確認

An Wang, Mobarakol Islam, Mengya Xu, Hongliang Ren

(参考訳) 様々な手術シーン理解タスクにおける優れたパフォーマンスにもかかわらず、深層学習に基づく手法は、様々な原因のために実世界の外科的応用に展開することを妨げることが多い。特に、サイトと患者の間のデータ収集、アノテーション、ドメインシフトが最も一般的な障害です。本研究では,最小限のソースイメージを有効活用して,合成手術器具セグメンテーションデータセットを生成することにより,データ関連の問題を軽減し,目に見えない実領域における優れた一般化性能を実現する。具体的には,1つの背景組織像と,各前景楽器の少なくとも3つの画像のみをシード画像とする。これらのソース画像は、前景と背景画像プールを構築するために広範囲に変換され、ランダムにサンプリングされた組織と楽器の画像を複数のブレンディング技術で合成し、新しい手術シーン画像を生成する。さらに,トレーニングデータの多様化を図るために,ハイブリッドなトレーニング時間拡張も導入する。 Endo2017、Endo2018、RoboToolの3つの実世界のデータセットに対する広範囲な評価は、我々の1対多の人工外科用データセットの生成とセグメンテーションフレームワークが実際のデータによるトレーニングと比較して、奨励的なパフォーマンスを達成することを実証している。特に、より重要なドメインギャップが存在するRoboToolデータセットでは、我々のフレームワークは、かなりのマージンで一般化の優位性を示している。我々は、データ合成によるモデル一般化の改善に研究の関心を惹きつけることを期待している。

Despite their impressive performance in various surgical scene understanding tasks, deep learning-based methods are frequently hindered from deploying to real-world surgical applications for various causes. Particularly, data collection, annotation, and domain shift in-between sites and patients are the most common obstacles. In this work, we mitigate data-related issues by efficiently leveraging minimal source images to generate synthetic surgical instrument segmentation datasets and achieve outstanding generalization performance on unseen real domains. Specifically, in our framework, only one background tissue image and at most three images of each foreground instrument are taken as the seed images. These source images are extensively transformed and employed to build up the foreground and background image pools, from which randomly sampled tissue and instrument images are composed with multiple blending techniques to generate new surgical scene images. Besides, we introduce hybrid training-time augmentations to diversify the training data further. Extensive evaluation on three real-world datasets, i.e., Endo2017, Endo2018, and RoboTool, demonstrates that our one-to-many synthetic surgical instruments datasets generation and segmentation framework can achieve encouraging performance compared with training with real data. Notably, on the RoboTool dataset, where a more significant domain gap exists, our framework shows its superiority of generalization by a considerable margin. We expect that our inspiring results will attract research attention to improving model generalization with data synthesizing.

翻訳日:2023-06-29 13:46:40 公開日:2023-06-28

# GPT-4による食品効果の要約と反復的プロンプティングによる製品特異的ガイダンス開発

Leveraging GPT-4 for Food Effect Summarization to Enhance Product-Specific Guidance Development via Iterative Prompting ( http://arxiv.org/abs/2306.16275v1 )

ライセンス: Link先を確認

Yiwen Shi, Ping Ren, Jing Wang, Biao Han, Taha ValizadehAslani, Felix Agbavor, Yi Zhang, Meng Hu, Liang Zhao, Hualou Liang

(参考訳) 新薬応用(NDA)による食品効果の要約は、製品特異的ガイダンス(PSG)の開発と評価に欠かせない要素である。しかし、広範囲な薬物アプリケーションレビュー文書からの食品効果の手動要約は時間がかかるため、自動化方法の開発の必要性が高まる。 chatgptやgpt-4といった大規模言語モデル(llm)の最近の進歩は、自動テキスト要約の有効性を向上させる大きな可能性を示しているが、psg評価における食品効果の要約の精度に関する能力は未だ不明である。本研究では,ChatGPT や GPT-4 との相互作用をより効果的かつ効率的に行うための,反復的プロンプト法を提案する。具体的には,食品効果要約のための3ターン反復プロンプト手法を提案し,キーワード指向プロンプトと長さ制御プロンプトを連続して提供し,生成した要約の質を向上させる。我々は,過去5年間に選択された100件のNDAレビュー文書に対して,自動測定からFDA専門家,さらにはGPT-4による評価まで,幅広い評価を行っている。我々は,プロセス全体で要約品質が徐々に改善されることを観察する。また, FDA専門家(43%対12%), GPT-4(64%対35%)では, GPT-4がChatGPTより優れていた。重要なことに、すべてのFDA専門家は、GPT-4が生成したサマリーの85%が、黄金の基準の要約と事実上一致していると全会一致で評価した。これらの結果は、gpt-4が食品効果サマリーを作成する大きな可能性を強く示唆しており、fdaの専門家によってレビューされ、psgアセスメントサイクルの効率を改善し、ジェネリック医薬品開発を促進する。

Food effect summarization from New Drug Application (NDA) is an essential component of product-specific guidance (PSG) development and assessment. However, manual summarization of food effect from extensive drug application review documents is time-consuming, which arouses a need to develop automated methods. Recent advances in large language models (LLMs) such as ChatGPT and GPT-4, have demonstrated great potential in improving the effectiveness of automated text summarization, but its ability regarding the accuracy in summarizing food effect for PSG assessment remains unclear. In this study, we introduce a simple yet effective approach, iterative prompting, which allows one to interact with ChatGPT or GPT-4 more effectively and efficiently through multi-turn interaction. Specifically, we propose a three-turn iterative prompting approach to food effect summarization in which the keyword-focused and length-controlled prompts are respectively provided in consecutive turns to refine the quality of the generated summary. We conduct a series of extensive evaluations, ranging from automated metrics to FDA professionals and even evaluation by GPT-4, on 100 NDA review documents selected over the past five years. We observe that the summary quality is progressively improved throughout the process. Moreover, we find that GPT-4 performs better than ChatGPT, as evaluated by FDA professionals (43% vs. 12%) and GPT-4 (64% vs. 35%). Importantly, all the FDA professionals unanimously rated that 85% of the summaries generated by GPT-4 are factually consistent with the golden reference summary, a finding further supported by GPT-4 rating of 72% consistency. These results strongly suggest a great potential for GPT-4 to draft food effect summaries that could be reviewed by FDA professionals, thereby improving the efficiency of PSG assessment cycle and promoting the generic drug product development.

翻訳日:2023-06-29 13:46:15 公開日:2023-06-28

# S2SNet:超伝導発見のためのトレーニング済みニューラルネットワーク

S2SNet: A Pretrained Neural Network for Superconductivity Discovery ( http://arxiv.org/abs/2306.16270v1 )

ライセンス: Link先を確認

Ke Liu and Kaifan Yang and Jiahong Zhang and Renjun Xu

(参考訳) 超伝導は、エネルギー損失なしに電流を流すことができ、固体超伝導は物理学、物質科学、電気工学の最大の目標である。 16人以上のノーベル賞受賞者が超伝導研究への貢献で受賞している。超伝導体は、気候変動緩和、安価でクリーンなエネルギー、産業、イノベーション、インフラなど、持続可能な開発目標(SDG)に価値がある。しかし、全ての超伝導機構を説明する統一物理学理論はまだ不明である。超伝導は分子組成だけでなく、幾何学的結晶構造も原因であると考えられている。そのため、結晶構造と超伝導臨界温度の両方を含む新しいデータセットS2SがSuperConとMaterial Project上に構築されている。この新たなデータセットに基づいて,超伝導予測に注目機構を利用する新しいモデルS2SNetを提案する。データ不足を克服するために、s2snetは、マスキング言語モデリング(mlm)を使用して、マテリアルプロジェクトデータセット全体に事前トレーニングされる。 S2SNetは、新しい最先端の精度を92%、AUC(Area Under Curve)を0.92の精度で実現している。我々の知る限りでは、S2SNetは結晶構造の情報のみを用いて超伝導を予測する最初の研究である。この研究は超伝導の発見とさらにsdgに有益である。コードとデータセットはhttps://github.com/zjuKeLiu/S2SNetで入手できる。

Superconductivity allows electrical current to flow without any energy loss, and thus making solids superconducting is a grand goal of physics, material science, and electrical engineering. More than 16 Nobel Laureates have been awarded for their contribution to superconductivity research. Superconductors are valuable for sustainable development goals (SDGs), such as climate change mitigation, affordable and clean energy, industry, innovation and infrastructure, and so on. However, a unified physics theory explaining all superconductivity mechanism is still unknown. It is believed that superconductivity is microscopically due to not only molecular compositions but also the geometric crystal structure. Hence a new dataset, S2S, containing both crystal structures and superconducting critical temperature, is built upon SuperCon and Material Project. Based on this new dataset, we propose a novel model, S2SNet, which utilizes the attention mechanism for superconductivity prediction. To overcome the shortage of data, S2SNet is pre-trained on the whole Material Project dataset with Masked-Language Modeling (MLM). S2SNet makes a new state-of-the-art, with out-of-sample accuracy of 92% and Area Under Curve (AUC) of 0.92. To the best of our knowledge, S2SNet is the first work to predict superconductivity with only information of crystal structures. This work is beneficial to superconductivity discovery and further SDGs. Code and datasets are available in https://github.com/zjuKeLiu/S2SNet

翻訳日:2023-06-29 13:45:39 公開日:2023-06-28

# RSPrompter: Visual Foundation Modelに基づくリモートセンシングインスタンスセグメンテーションのためのプロンプト学習

RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model ( http://arxiv.org/abs/2306.16269v1 )

ライセンス: Link先を確認

Keyan Chen, Chenyang Liu, Hao Chen, Haotian Zhang, Wenyuan Li, Zhengxia Zou, and Zhenwei Shi

(参考訳) Meta AI Researchが提案したSegment Anything Model(SAM)は、大規模なトレーニングデータ(SA-1B)を活用することで、優れた一般化とゼロショット機能を示している。それでも、SAMはカテゴリに依存しないインスタンスセグメンテーション法として、ポイント、ボックス、粗いきめのマスクを含む以前の手動ガイダンスに大きく依存している。さらに,リモートセンシング画像分割タスクの性能については,まだ十分に検証されていない。本稿では,SAMファウンデーションモデルに基づくリモートセンシング画像の自動インスタンスセグメンテーション手法の設計について検討する。そこで本研究では,SAM入力に対する適切なプロンプトの生成を学習する手法を提案する。これにより、SAMはリモートセンシング画像に対して意味的に識別可能なセグメンテーション結果を生成することができる。また,SAMコミュニティの最近の発展をベースとして,例分節タスクなどいくつかの派生案を提案し,その性能をRSPrompterと比較した。 WHUビルディング,NWPU VHR-10,SSDDデータセットの大規模実験により,提案手法の有効性が検証された。私たちのコードは \url{https://kyanchen.github.io/RSPrompter} でアクセスできます。

Leveraging vast training data (SA-1B), the foundation Segment Anything Model (SAM) proposed by Meta AI Research exhibits remarkable generalization and zero-shot capabilities. Nonetheless, as a category-agnostic instance segmentation method, SAM heavily depends on prior manual guidance involving points, boxes, and coarse-grained masks. Additionally, its performance on remote sensing image segmentation tasks has yet to be fully explored and demonstrated. In this paper, we consider designing an automated instance segmentation approach for remote sensing images based on the SAM foundation model, incorporating semantic category information. Inspired by prompt learning, we propose a method to learn the generation of appropriate prompts for SAM input. This enables SAM to produce semantically discernible segmentation results for remote sensing images, which we refer to as RSPrompter. We also suggest several ongoing derivatives for instance segmentation tasks, based on recent developments in the SAM community, and compare their performance with RSPrompter. Extensive experimental results on the WHU building, NWPU VHR-10, and SSDD datasets validate the efficacy of our proposed method. Our code is accessible at \url{https://kyanchen.github.io/RSPrompter}.

翻訳日:2023-06-29 13:45:16 公開日:2023-06-28

# ランダム分類雑音をもつ辺縁半空間学習のための情報計算トレードオフ

Information-Computation Tradeoffs for Learning Margin Halfspaces with Random Classification Noise ( http://arxiv.org/abs/2306.16352v1 )

ライセンス: Link先を確認

Ilias Diakonikolas, Jelena Diakonikolas, Daniel M. Kane, Puqian Wang, Nikos Zarifis

(参考訳) ランダムな分類雑音を伴うpac学習問題である\gamma$-margin半空間について検討する。我々は、問題のサンプル複雑性と計算効率の良いアルゴリズムのサンプル複雑性との間に固有のギャップを示唆する情報計算トレードオフを確立する。具体的には、問題のサンプル複雑性は$\widetilde{\Theta}(1/(\gamma^2 \epsilon)$である。まず、サンプル複雑性 $\widetilde{O}(1/(\gamma^2 \epsilon^2))$ の単純な効率的なアルゴリズムを与える。我々の主な結果は統計クエリ(sq)アルゴリズムと低次多項式テストに対する下限であり、サンプル複雑性における1/\epsilon$の二次依存は計算効率の高いアルゴリズムに固有のものであることを示唆している。具体的には、効率的なSQ学習者や低次テストのサンプル複雑さについて、より低い値の$\widetilde{\Omega}(1/(\gamma^{1/2} \epsilon^2)を示唆する。

We study the problem of PAC learning $\gamma$-margin halfspaces with Random Classification Noise. We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms. Concretely, the sample complexity of the problem is $\widetilde{\Theta}(1/(\gamma^2 \epsilon))$. We start by giving a simple efficient algorithm with sample complexity $\widetilde{O}(1/(\gamma^2 \epsilon^2))$. Our main result is a lower bound for Statistical Query (SQ) algorithms and low-degree polynomial tests suggesting that the quadratic dependence on $1/\epsilon$ in the sample complexity is inherent for computationally efficient algorithms. Specifically, our results imply a lower bound of $\widetilde{\Omega}(1/(\gamma^{1/2} \epsilon^2))$ on the sample complexity of any efficient SQ learner or low-degree test.

翻訳日:2023-06-29 13:37:33 公開日:2023-06-28

# ボソニックガウス流路の低地・高地容量領域解析

Low-ground/High ground capacity regions analysis for Bosonic Gaussian Channels ( http://arxiv.org/abs/2306.16350v1 )

ライセンス: Link先を確認

Farzad Kianvash, Marco Fanizza, and Vittorio Giovannetti

(参考訳) 本稿では, 単一モード, 位相非感受性ガウスボソニックチャネル間の相互接続の包括的特性について述べる。この特徴付けにより、これらのマップのパラメータ空間において、低地と高地という2つの異なる領域を識別できる。低地領域では、情報容量は指定基準値よりも小さく、高地領域では、確実に大きい。直接の結果として、既知の上界と合成規則を組み合わせて既存の結果を改善するこれらの写像の量子容量とプライベート容量の明示的な上界の集合を体系的に概説する。

We present a comprehensive characterization of the interconnections between single-mode, phaseinsensitive Gaussian Bosonic Channels resulting from channel concatenation. This characterization enables us to identify, in the parameter space of these maps, two distinct regions: low-ground and high-ground. In the low-ground region, the information capacities are smaller than a designated reference value, while in the high-ground region, they are provably greater. As a direct consequence, we systematically outline an explicit set of upper bounds for the quantum and private capacity of these maps, which combine known upper bounds and composition rules, improving upon existing results.

翻訳日:2023-06-29 13:37:20 公開日:2023-06-28

# SpinBusアーキテクチャ - 電子シャットリングによるスピン量子のスケーリング

The SpinBus Architecture: Scaling Spin Qubits with Electron Shuttling ( http://arxiv.org/abs/2306.16348v1 )

ライセンス: Link先を確認

Matthias K\"unne, Alexander Willmes, Max Oberl\"ander, Christian Gorjaew, Julian D. Teske, Harsh Bhardwaj, Max Beer, Eugen Kammerloher, Ren\'e Otten, Inga Seidler, Ran Xue, Lars R. Schreiber and Hendrik Bluhm

(参考訳) 量子プロセッサアーキテクチャは、2次元の量子ビット接続と必要な操作能力を提供しながら、大きな量子ビット数へのスケーリングを可能にする必要がある。マイクロ波制御された半導体スピン量子ビットでは、密度の強いアレイがかなりの進歩を遂げているが、配線ファンアウトによりサイズが制限され、クォービット間のクロストークが顕著である。これらの制約を克服するために、電子シャットリングを用いてキュービットを接続し、低動作周波数と拡張キュービットコヒーレンスを特徴とするSpinBusアーキテクチャを導入する。 Si/SiGeプラットフォームにおける全ての関連する操作のデバイスシミュレーションは、確立された半導体パターン技術と99.9%以上の動作フィデリティによる実現可能性を検証する。室温計を用いた制御は、少なくとも144量子ビットを確実に支持できるが、もっと多くの数値が低温制御回路で認識できる。高忠実度スピンコヒーレント電子遮断の理論的実現可能性に基づいて、スピンバスアーキテクチャは実用的な量子コンピューティングのスケーラビリティ要件を満たすスピンベースの量子プロセッサの基礎となるかもしれない。

Quantum processor architectures must enable scaling to large qubit numbers while providing two-dimensional qubit connectivity and exquisite operation fidelities. For microwave-controlled semiconductor spin qubits, dense arrays have made considerable progress, but are still limited in size by wiring fan-out and exhibit significant crosstalk between qubits. To overcome these limitations, we introduce the SpinBus architecture, which uses electron shuttling to connect qubits and features low operating frequencies and enhanced qubit coherence. Device simulations for all relevant operations in the Si/SiGe platform validate the feasibility with established semiconductor patterning technology and operation fidelities exceeding 99.9 %. Control using room temperature instruments can plausibly support at least 144 qubits, but much larger numbers are conceivable with cryogenic control circuits. Building on the theoretical feasibility of high-fidelity spin-coherent electron shuttling as key enabling factor, the SpinBus architecture may be the basis for a spin-based quantum processor that meets the scalability requirements for practical quantum computing.

翻訳日:2023-06-29 13:37:08 公開日:2023-06-28

# 多様体上の自己回帰モデル(mnarx)を用いた複素系のダイナミクスの模倣

Emulating the dynamics of complex systems using autoregressive models on manifolds (mNARX) ( http://arxiv.org/abs/2306.16335v1 )

ライセンス: Link先を確認

Styfen Sch\"ar, Stefano Marelli, Bruno Sudret

(参考訳) 本研究では, 時間変化による外因性励起による複雑な力学系の応答を, 効率的に正確に近似するための新しい代理モデリング手法を提案する。我々のアプローチでは, 自己回帰的サロゲートを構成するのに最適な問題固有の外因性入力多様体を構築することを含む, 非線形自己回帰モデル (mNARX) と命名する。 mNARX の核を形成する多様体は、システムの物理と事前の専門家およびドメイン知識を組み込むことで漸進的に構成される。 mNARXは完全な問題を一連の小さなサブプロブレムに分解し、それぞれが元のより低い複雑さを持つので、最終的なサロゲートのトレーニングと評価のコストの両面で、問題の複雑さによく対応している。さらに、mnarxは従来の次元還元技術とよく調和しており、高次元外因性入力を持つ力学系をモデル化するのに非常に適しており、解くのが難しく、特に工学的応用で見られるような物理システムではドメイン知識が豊富であるため、mnarxはこれらの応用に適している。 1次元ランダム励起により励起される古典的結合ばね質量系の応答を予測するため,mNARXは従来の自己回帰代理よりも優れていた。さらに,mNARXは,アクティブコントローラの影響を受けても,現実的なエアロサーボ弾性風力タービンシミュレータの動力学を補助することにより,高次元時間・状態依存系のエミュレートに適していることを示す。一般に,mNARXは複雑な力学系を,精度と効率の観点からモデル化する上で有望な可能性を示している。

In this study, we propose a novel surrogate modelling approach to efficiently and accurately approximate the response of complex dynamical systems driven by time-varying exogenous excitations over extended time periods. Our approach, that we name \emph{manifold nonlinear autoregressive modelling with exogenous input} (mNARX), involves constructing a problem-specific exogenous input manifold that is optimal for constructing autoregressive surrogates. The manifold, which forms the core of mNARX, is constructed incrementally by incorporating the physics of the system, as well as prior expert- and domain- knowledge. Because mNARX decomposes the full problem into a series of smaller sub-problems, each with a lower complexity than the original, it scales well with the complexity of the problem, both in terms of training and evaluation costs of the final surrogate. Furthermore, mNARX synergizes well with traditional dimensionality reduction techniques, making it highly suitable for modelling dynamical systems with high-dimensional exogenous inputs, a class of problems that is typically challenging to solve.Since domain knowledge is particularly abundant in physical systems, such as those found in engineering applications, mNARX is well suited for these applications. We demonstrate that mNARX outperforms traditional autoregressive surrogates in predicting the response of a classical coupled spring-mass system excited by a one-dimensional random excitation. Additionally, we show that mNARX is well suited for emulating very high-dimensional time- and state-dependent systems, even when affected by active controllers, by surrogating the dynamics of a realistic aero-servo-elastic onshore wind turbine simulator. In general, our results demonstrate that mNARX offers promising prospects for modelling complex dynamical systems, in terms of accuracy and efficiency.

翻訳日:2023-06-29 13:36:48 公開日:2023-06-28

# 密度ランドマーク検出による離散化潜在座標系の同定可能性

Identifiability of Discretized Latent Coordinate Systems via Density Landmarks Detection ( http://arxiv.org/abs/2306.16334v1 )

ライセンス: Link先を確認

Vit\'oria Barin-Pacela, Kartik Ahuja, Simon Lacoste-Julien, Pascal Vincent

(参考訳) 乱れは観測された分布のみから有意義な潜在的地中要因を回復することを目的としている。 Identifiabilityは、不整合が十分に確立される理論的根拠を提供する。残念なことに、独立潜在因子の教師なし識別性は、因子から観測までの一般的な非線形滑らかな写像の下でのi.d.セッティングにおいて理論的に証明された不可能性である。本研究では,高次非線形滑らかな写像(微分同相写像)の下で離散化された潜在座標を,追加の帰納的バイアスを伴わずに復元可能であることを示す。これは、潜在密度が軸合わせの不連続性ランドマークを持つと仮定するが、因子の統計的独立性を非現実的に仮定する必要はない。本稿では,量子化座標identifiabilityと呼ばれる,この新しい形態の識別可能性を紹介し,離散座標の回復の包括的証明を提供する。

Disentanglement aims to recover meaningful latent ground-truth factors from only the observed distribution. Identifiability provides the theoretical grounding for disentanglement to be well-founded. Unfortunately, unsupervised identifiability of independent latent factors is a theoretically proven impossibility in the i.i.d. setting under a general nonlinear smooth map from factors to observations. In this work, we show that, remarkably, it is possible to recover discretized latent coordinates under a highly generic nonlinear smooth mapping (a diffeomorphism) without any additional inductive bias on the mapping. This is, assuming that latent density has axis-aligned discontinuity landmarks, but without making the unrealistic assumption of statistical independence of the factors. We introduce this novel form of identifiability, termed quantized coordinate identifiability, and provide a comprehensive proof of the recovery of discretized coordinates.

翻訳日:2023-06-29 13:36:16 公開日:2023-06-28

# diffcomplete:拡散に基づく生成的3次元形状完了

DiffComplete: Diffusion-based Generative 3D Shape Completion ( http://arxiv.org/abs/2306.16329v1 )

ライセンス: Link先を確認

Ruihang Chu, Enze Xie, Shentong Mo, Zhenguo Li, Matthias Nie{\ss}ner, Chi-Wing Fu, Jiaya Jia

(参考訳) 3dレンジスキャンによる形状完了のための新しい拡散ベース手法を提案する。従来の決定論的および確率論的手法と比較して、現実主義、多様性、高忠実性のバランスをとる。不完全な形状を条件とした生成タスクとして、形状完了をキャスティングすることでDiffCompleteを提案する。私たちのキーデザインは2倍です。まず,空間的に一貫した方法で条件付き特徴を注入する階層的特徴集約機構を考案する。そこで, 形状完了を制御するために, 条件入力の局所的詳細とより広い文脈の両方をキャプチャできる。第2に,複数の部分形状の完成と,入力条件に対する高い柔軟性を実現するために,我々のモデルにおける占有を考慮した融合戦略を提案する。 DiffCompleteは2つの大規模3D形状補完ベンチマーク上で新しいSOTA性能(例:l_1エラーの40%削減)を設定する。我々の完成形は決定論的方法と比較して現実的な見通しを持つだけでなく、確率的代替物と比較して基礎的真理と高い類似性を示す。さらに、DiffCompleteは、合成データと実データの両方に対して、完全に見えないクラスのオブジェクトに対して強力な一般化性を持ち、様々なアプリケーションでモデルの再トレーニングを不要にする。

We introduce a new diffusion-based approach for shape completion on 3D range scans. Compared with prior deterministic and probabilistic methods, we strike a balance between realism, multi-modality, and high fidelity. We propose DiffComplete by casting shape completion as a generative task conditioned on the incomplete shape. Our key designs are two-fold. First, we devise a hierarchical feature aggregation mechanism to inject conditional features in a spatially-consistent manner. So, we can capture both local details and broader contexts of the conditional inputs to control the shape completion. Second, we propose an occupancy-aware fusion strategy in our model to enable the completion of multiple partial shapes and introduce higher flexibility on the input conditions. DiffComplete sets a new SOTA performance (e.g., 40% decrease on l_1 error) on two large-scale 3D shape completion benchmarks. Our completed shapes not only have a realistic outlook compared with the deterministic methods but also exhibit high similarity to the ground truths compared with the probabilistic alternatives. Further, DiffComplete has strong generalizability on objects of entirely unseen classes for both synthetic and real data, eliminating the need for model re-training in various applications.

翻訳日:2023-06-29 13:35:58 公開日:2023-06-28

# 変分ベイズネットワークによる表現学習

Representation Learning via Variational Bayesian Networks ( http://arxiv.org/abs/2306.16326v1 )

ライセンス: Link先を確認

Oren Barkan, Avi Caciularu, Idan Rejwan, Ori Katz, Jonathan Weill, Itzik Malkiel, Noam Koenigstein

(参考訳) 変動ベイズネットワーク(vbn) - 階層的およびリレーショナルなサイド情報を利用した新しいベイズ型エンティティ表現学習モデルであり、データが不足している 'long-tail'' におけるエンティティのモデリングに特に有用である。第一に、vbnは共通の祖先を共有するエンティティ間の情報伝達を可能にする情報的階層的優先順位を採用している。さらに、VBNは相補構造と一貫性を強制するエンティティ間の明示的な関係をモデル化し、学習された表現をより意味のある空間配置へと導く。第2に、VBNは(ベクトルではなく)密度による実体を表すため、データ不足に対処する上で相補的な役割を果たす不確実性をモデル化する。最後に,高速近似ベイズ推定を可能にするスケーラブルな変分ベイズ最適化アルゴリズムを提案する。言語,推奨,医学的推論タスクにおけるvbnの有効性を評価した。以上の結果から,VBNは複数のデータセット,特にロングテールにおいて,既存の手法よりも優れていることがわかった。

We present Variational Bayesian Network (VBN) - a novel Bayesian entity representation learning model that utilizes hierarchical and relational side information and is particularly useful for modeling entities in the ``long-tail'', where the data is scarce. VBN provides better modeling for long-tail entities via two complementary mechanisms: First, VBN employs informative hierarchical priors that enable information propagation between entities sharing common ancestors. Additionally, VBN models explicit relations between entities that enforce complementary structure and consistency, guiding the learned representations towards a more meaningful arrangement in space. Second, VBN represents entities by densities (rather than vectors), hence modeling uncertainty that plays a complementary role in coping with data scarcity. Finally, we propose a scalable Variational Bayes optimization algorithm that enables fast approximate Bayesian inference. We evaluate the effectiveness of VBN on linguistic, recommendations, and medical inference tasks. Our findings show that VBN outperforms other existing methods across multiple datasets, and especially in the long-tail.

翻訳日:2023-06-29 13:35:39 公開日:2023-06-28

# DoseDiff:放射線治療における線量予測のための距離認識拡散モデル

DoseDiff: Distance-aware Diffusion Model for Dose Prediction in Radiotherapy ( http://arxiv.org/abs/2306.16324v1 )

ライセンス: Link先を確認

Yiwen Zhang, Chuanpu Li, Liming Zhong, Zeli Chen, Wei Yang, and Xuetao Wang

(参考訳) 治療計画は放射線治療のワークフローにおいて重要な要素であり、一般的には医療物理学者が時間を要する試行錯誤の方法で行う。これまでの研究では、医学物理学者が治療計画の効率を改善するのに役立つ線量分布マップを予測するための知識ベースまたは深層学習ベースの方法が提案されている。しかしながら、これらの線量予測法は通常、周囲の組織と標的または臓器間の距離情報の有効利用を欠いている。さらに、予測された線量分布図における線量経路の分布特性の維持に乏しく、医療物理学者が得る貴重な情報を失うことになる。本稿では,線量分布を正確に予測するための距離認識拡散モデル(DoseDiff)を提案する。我々は、線量予測を、CT画像と署名された距離マップ(SDM)の条件で予測された線量分布マップを生成する、認知ステップのシーケンスとして定義する。 SDMは、画像の各画素から目標またはOARのアウトラインまでの距離情報を提供するターゲットまたはOARのマスクからの距離変換によって得られる。さらに,マルチエンコーダとマルチスケールフュージョンネットワーク(MMFNet)を提案し,マルチスケールフュージョンとトランスフォーマーベースフュージョンモジュールを組み込むことにより,機能レベルでのCT画像とSDM間の情報フュージョンを強化する。本モデルは,乳癌患者と鼻咽頭癌患者から収集した2つのデータを用いて評価した。その結果,ドセディフは量的および視覚的品質の両面で最先端の線量予測法より優れていた。

Treatment planning is a critical component of the radiotherapy workflow, typically carried out by a medical physicist using a time-consuming trial-and-error manner. Previous studies have proposed knowledge-based or deep learning-based methods for predicting dose distribution maps to assist medical physicists in improving the efficiency of treatment planning. However, these dose prediction methods usuallylack the effective utilization of distance information between surrounding tissues andtargets or organs-at-risk (OARs). Moreover, they are poor in maintaining the distribution characteristics of ray paths in the predicted dose distribution maps, resulting in a loss of valuable information obtained by medical physicists. In this paper, we propose a distance-aware diffusion model (DoseDiff) for precise prediction of dose distribution. We define dose prediction as a sequence of denoising steps, wherein the predicted dose distribution map is generated with the conditions of the CT image and signed distance maps (SDMs). The SDMs are obtained by a distance transformation from the masks of targets or OARs, which provide the distance information from each pixel in the image to the outline of the targets or OARs. Besides, we propose a multiencoder and multi-scale fusion network (MMFNet) that incorporates a multi-scale fusion and a transformer-based fusion module to enhance information fusion between the CT image and SDMs at the feature level. Our model was evaluated on two datasets collected from patients with breast cancer and nasopharyngeal cancer, respectively. The results demonstrate that our DoseDiff outperforms the state-of-the-art dose prediction methods in terms of both quantitative and visual quality.

翻訳日:2023-06-29 13:35:22 公開日:2023-06-28

# Taqyim: ChatGPTモデルによるアラビアNLPタスクの評価

Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models ( http://arxiv.org/abs/2306.16322v1 )

ライセンス: Link先を確認

Zaid Alyafeai and Maged S. Alshaibani and Badr AlKhamissi and Hamzah Luqman and Ebrahim Alareqi and Ali Fadel

(参考訳) GPT-3.5やGPT-4のようなLLM上に構築されたチャットベースのモデルChatGPTなど、さまざまなダウンストリームタスクにおいて、微調整を必要とせずに、大きな言語モデル(LLM)が印象的なパフォーマンスを示している。英語に比べて訓練率が低いにもかかわらず、これらのモデルは他の言語でも顕著な能力を示す。本研究では, 感情分析, 翻訳, 翻訳, パラフレージング, 音声タグ付け, 要約, ダイアクリマイゼーションの7つの異なるNLPタスクにおけるGPT-3.5およびGPT-4モデルの性能評価を行った。 GPT-4は7タスク中5タスクでGPT-3.5を上回った。さらに、感情分析タスクを広範囲に分析し、難易度データセット上でLCMが例外的な結果を得る方法について考察する。さらに,これらのタスクの評価を容易にする新しいPythonインターフェース https://github.com/ARBML/Taqyimを導入する。

Large language models (LLMs) have demonstrated impressive performance on various downstream tasks without requiring fine-tuning, including ChatGPT, a chat-based model built on top of LLMs such as GPT-3.5 and GPT-4. Despite having a lower training proportion compared to English, these models also exhibit remarkable capabilities in other languages. In this study, we assess the performance of GPT-3.5 and GPT-4 models on seven distinct Arabic NLP tasks: sentiment analysis, translation, transliteration, paraphrasing, part of speech tagging, summarization, and diacritization. Our findings reveal that GPT-4 outperforms GPT-3.5 on five out of the seven tasks. Furthermore, we conduct an extensive analysis of the sentiment analysis task, providing insights into how LLMs achieve exceptional results on a challenging dialectal dataset. Additionally, we introduce a new Python interface https://github.com/ARBML/Taqyim that facilitates the evaluation of these tasks effortlessly.

翻訳日:2023-06-29 13:34:55 公開日:2023-06-28

# 意味的検出を用いた中国語テキスト修正のための敵対的マルチタスク学習法

An Adversarial Multi-Task Learning Method for Chinese Text Correction with Semantic Detection ( http://arxiv.org/abs/2306.16313v1 )

ライセンス: Link先を確認

Fanyu Wang and Zhenping Xie

(参考訳) テキストの修正、特により広く使用されるシーンのセマンティックな修正は、テキストの流速と筆記効率を改善するために強く要求される。中国語文文脈における文字ポリセミーのモデル化と検出能力を高めるために, 対向的多タスク学習法を提案する。そこで、マスク言語モデルとスコアリング言語モデルという2つのモデルが、結合されただけでなく、逆の学習タスクとして導入された。さらに,モンテカルロ木探索戦略とポリシーネットワークを導入して,意味検出による中国語テキストの効率的な修正作業を実現する。実験は,3つのデータセットと5つの比較手法を用いて実施され,本手法は,意味的合理性を高めるために,中国語テキスト修正タスクにおいて優れた性能が得られることを示す。

Text correction, especially the semantic correction of more widely used scenes, is strongly required to improve, for the fluency and writing efficiency of the text. An adversarial multi-task learning method is proposed to enhance the modeling and detection ability of character polysemy in Chinese sentence context. Wherein, two models, the masked language model and scoring language model, are introduced as a pair of not only coupled but also adversarial learning tasks. Moreover, the Monte Carlo tree search strategy and a policy network are introduced to accomplish the efficient Chinese text correction task with semantic detection. The experiments are executed on three datasets and five comparable methods, and the experimental results show that our method can obtain good performance in Chinese text correction task for better semantic rationality.

翻訳日:2023-06-29 13:34:38 公開日:2023-06-28

# ベイズ逆問題に対する時空間ベソフ前処理

Spatiotemporal Besov Priors for Bayesian Inverse Problems ( http://arxiv.org/abs/2306.16378v1 )

ライセンス: Link先を確認

Shiwei Lan, Mirjeta Pasha, and Shuyi Li

(参考訳) 科学技術の急速な発展は、突然の変化や鋭いコントラストといった特別なデータ特徴を捉えるための適切な統計ツールの必要性を招いた。データサイエンスにおける多くの応用は、不連続性や特異性のある時間依存物体(例えば、動的コンピュータ断層撮影(ct)画像)から時空間的再構成を求める。ガウス過程(gp)に基づく従来の手法は、過剰な事前候補を提供する傾向があるため、十分な解を提供することができない。近年、ランダム係数を持つウェーブレット展開によって定義されるベッソフ過程(bp)は、このタイプのベイズ逆問題より適切であるとして提案されている。 BPは画像解析においてGPを上回ってエッジ保存再構成を生成するが、動的に変化する画像に遺伝する時間相関を自動的に組み込むわけではない。本稿では,時系列相関強度を規定するq指数過程に従って,系列展開の確率係数を確率時間関数に置き換え,時空間領域(stbp)へbpを一般化する。 STBPに関する数学的および統計的性質を慎重に研究した。また,STBPの白色雑音表現も提案し,後方サンプリングによる最大値(MAP)と不確かさ定量化(UQ)による点推定を容易にする。 2つの有限角ct再構成例とnavier-stokes方程式を含む高非線形逆問題を用いて、従来のstgpと時間非相関アプローチと比較して、時間変化を考慮した空間的特徴の保存におけるstbpの利点を示す。

Fast development in science and technology has driven the need for proper statistical tools to capture special data features such as abrupt changes or sharp contrast. Many applications in the data science seek spatiotemporal reconstruction from a sequence of time-dependent objects with discontinuity or singularity, e.g. dynamic computerized tomography (CT) images with edges. Traditional methods based on Gaussian processes (GP) may not provide satisfactory solutions since they tend to offer over-smooth prior candidates. Recently, Besov process (BP) defined by wavelet expansions with random coefficients has been proposed as a more appropriate prior for this type of Bayesian inverse problems. While BP outperforms GP in imaging analysis to produce edge-preserving reconstructions, it does not automatically incorporate temporal correlation inherited in the dynamically changing images. In this paper, we generalize BP to the spatiotemporal domain (STBP) by replacing the random coefficients in the series expansion with stochastic time functions following Q-exponential process which governs the temporal correlation strength. Mathematical and statistical properties about STBP are carefully studied. A white-noise representation of STBP is also proposed to facilitate the point estimation through maximum a posterior (MAP) and the uncertainty quantification (UQ) by posterior sampling. Two limited-angle CT reconstruction examples and a highly non-linear inverse problem involving Navier-Stokes equation are used to demonstrate the advantage of the proposed STBP in preserving spatial features while accounting for temporal changes compared with the classic STGP and a time-uncorrelated approach.

翻訳日:2023-06-29 13:28:55 公開日:2023-06-28

# メモリ・マイクロスケール接続機能を有する単一電子情報処理デバイス用Si/SiGe QuBus

Si/SiGe QuBus for single electron information-processing devices with memory and micron-scale connectivity function ( http://arxiv.org/abs/2306.16375v1 )

ライセンス: Link先を確認

Ran Xue, Max Beer, Inga Seidler, Simon Humpohl, Jhih-Sian Tu, Stefan Trellenkamp, Tom Struck, Hendrik Bluhm, Lars R. Schreiber

(参考訳) 単一キャリア情報処理デバイス内の接続には、単一電荷量子の転送とストレージが必要である。われわれの全電動Si/SiGeシャトル装置は量子バス(QuBus)と呼ばれ、長さは10$\mathrm{\mu}$mで、電圧パルスは6つしかない。コンベアモード(containor-mode)、すなわち電子は移動QDに閉じ込められ、断熱的に輸送される。我々は,QuBusの潜在的な欠陥と局所的なシャトル忠実度をベンチマークするために,シャトルトモグラフィーと呼ばれるキャラクタリゼーション手法を提案する。全デバイスと背面を横断する単電子シャトルの忠実性(合計距離は19ドルの\mathrm{\mu}$m)は$(99.7 \pm 0.3)\,\%$である。 QuBusを用いて最大34個の電子の位置と検出を行い、任意に選択されたゼロ電子と単一電子のパターンを持つ34個の量子ドットのレジスタを初期化する。単純な演算信号、産業製造との互換性、$^{28}$Si/SiGeでの低スピン環境相互作用は、量子コンピューティングアーキテクチャにおける量子接続のためのスピン保存輸送を約束する。

The connectivity within single carrier information-processing devices requires transport and storage of single charge quanta. Our all-electrical Si/SiGe shuttle device, called quantum bus (QuBus), spans a length of 10 $\mathrm{\mu}$m and is operated by only six simply-tunable voltage pulses. It operates in conveyor-mode, i.e. the electron is adiabatically transported while confined to a moving QD. We introduce a characterization method, called shuttle-tomography, to benchmark the potential imperfections and local shuttle-fidelity of the QuBus. The fidelity of the single-electron shuttle across the full device and back (a total distance of 19 $\mathrm{\mu}$m) is $(99.7 \pm 0.3)\,\%$. Using the QuBus, we position and detect up to 34 electrons and initialize a register of 34 quantum dots with arbitrarily chosen patterns of zero and single-electrons. The simple operation signals, compatibility with industry fabrication and low spin-environment-interaction in $^{28}$Si/SiGe, promises spin-conserving transport of spin qubits for quantum connectivity in quantum computing architectures.

翻訳日:2023-06-29 13:28:14 公開日:2023-06-28

# フォールトトレランス前の量子コンピューティングの有用性に関するエビデンスの高速古典シミュレーション

Fast classical simulation of evidence for the utility of quantum computing before fault tolerance ( http://arxiv.org/abs/2306.16372v1 )

ライセンス: Link先を確認

Tomislav Begu\v{s}i\'c and Garnet Kin-Lic Chan

(参考訳) スパース・ポーリ・ダイナミクスに基づく古典的アルゴリズムは、ibmのeagleプロセッサ[nature 618, 500 (2023)]の127量子ビットに関する最近の実験で研究された量子回路を効率的にシミュレートできる。ラップトップの単一コア上の古典的なシミュレーションは、報告された量子シミュレーションのウォールタイムよりも桁違いに速く、古典的な処理のない推定量子ハードウェアランタイムよりも高速で、ゼロノイズ外挿実験結果とよく一致しています。

We show that a classical algorithm based on sparse Pauli dynamics can efficiently simulate quantum circuits studied in a recent experiment on 127 qubits of IBM's Eagle processor [Nature 618, 500 (2023)]. Our classical simulations on a single core of a laptop are orders of magnitude faster than the reported walltime of the quantum simulations, as well as faster than the estimated quantum hardware runtime without classical processing, and are in good agreement with the zero-noise extrapolated experimental results.

翻訳日:2023-06-29 13:27:47 公開日:2023-06-28

# 不平衡振幅をもつ和オーバーパスの完全等式理論

Complete equational theories for the sum-over-paths with unbalanced amplitudes ( http://arxiv.org/abs/2306.16369v1 )

ライセンス: Link先を確認

Matthew Amy

(参考訳) vilmart氏は最近、toffoli-hadamard回路と拡張clifford+$\mathrm{diag}(1, \zeta_{2^k})$回路上の平衡和オーバーパスの完全な方程式理論を提示した。それらの理論は、位相自由なZH-計算に基づいており、完全なZH-計算の平均的な規則を著しく省略し、振幅の局所的な和を許容しない。ここでは局所和を自然に支持する不均衡経路和における完全性の問題を考察する。非平衡和オーバーパスの具体的構文を示し、記号的多線型代数と干渉規則とともに、zh-係数の平均および正則の様々な定式化が任意の環と体上の完全な方程式論を与えるのに十分であることを示す。

Vilmart recently gave a complete equational theory for the balanced sum-over-paths over Toffoli-Hadamard circuits, and by extension Clifford+$\mathrm{diag}(1, \zeta_{2^k})$ circuits. Their theory is based on the phase-free ZH-calculus which crucially omits the average rule of the full ZH-calculus, dis-allowing the local summation of amplitudes. Here we study the question of completeness in unbalanced path sums which naturally support local summation. We give a concrete syntax for the unbalanced sum-over-paths and show that, together with symbolic multilinear algebra and the interference rule, various formulations of the average and ortho rules of the ZH-calculus are sufficient to give complete equational theories over arbitrary rings and fields.

翻訳日:2023-06-29 13:27:29 公開日:2023-06-28

# ラグランジアンに基づく自動推論のためのA*アルゴリズム

Lagrangian based A* algorithm for automated reasoning ( http://arxiv.org/abs/2306.16368v1 )

ライセンス: Link先を確認

Renju Rajan

(参考訳) 本稿では,A*アルゴリズムの修正を最短経路問題として検討する。重み付けはA*アルゴリズムのヒューリスティックな部分で導入され、効率が向上する。このアルゴリズムの応用は、速度をヒューリスティックの湿潤化とみなすUAV経路計画に適用できると考えられる。当初、ラグランジュ方程式に基づく変分法を用いて速度を力学系の決定的因子として同定した。このアプローチは、これらの領域におけるアルゴリズムの効率を改善するのに他の問題にも役立つだろう。

In this paper, a modification of A* algorithm is considered for the shortest path problem. A weightage is introduced in the heuristic part of the A* algorithm to improve its efficiency. An application of the algorithm is considered for UAV path planning wherein velocity is taken as the weigtage to the heuristic. At the outset, calculus of variations based Lagrange's equation was used to identify velocity as the decisive factor for the dynamical system. This approach would be useful for other problems as well to improve the efficiency of algorithms in those areas.

翻訳日:2023-06-29 13:27:11 公開日:2023-06-28

# 再帰的・注意的モデルとNVFlareを用いた多段階臨床フェデレーション学習

Multi-Site Clinical Federated Learning using Recursive and Attentive Models and NVFlare ( http://arxiv.org/abs/2306.16367v1 )

ライセンス: Link先を確認

Won Joon Yun, Samuel Kim, Joongheon Kim

(参考訳) デジタルヘルスデータの驚異的な成長は、自然言語処理(NLP)や医療記録、臨床ノート、その他のテキストベースの健康情報を精査するための機械学習手法の利用への関心が高まっている。 nlp技術は患者のケアを増強し、臨床意思決定を知らせる上で大きな可能性を秘めているが、データプライバシと規制への順守は重要な懸念として続いている。フェデレーテッド・ラーニング(FL)は実行可能なソリューションとして登場し、複数の組織が生データを広めることなく、機械学習モデルを協調的にトレーニングすることを可能にする。本稿では, NVIDIA が開発した FL, NLP モデル, NVFlare フレームワークを併用することにより, 医療用 NLP に対する実用的アプローチを実現する。医療データ内のコンテキストやセマンティクスの理解において例外的な性能を示した,長期記憶モデル(lstm)とトランスフォーマ(bert)からの双方向エンコーダ表現モデルという,2つの模範的nlpモデルを提案する。本稿では,データのプライバシと規制遵守の課題に対処しつつ,高い精度と性能を維持しながら,bertプリトレーニングを取り入れ,提案手法の有効性を包括的に検証する統合フレームワークの開発について述べる。

The prodigious growth of digital health data has precipitated a mounting interest in harnessing machine learning methodologies, such as natural language processing (NLP), to scrutinize medical records, clinical notes, and other text-based health information. Although NLP techniques have exhibited substantial potential in augmenting patient care and informing clinical decision-making, data privacy and adherence to regulations persist as critical concerns. Federated learning (FL) emerges as a viable solution, empowering multiple organizations to train machine learning models collaboratively without disseminating raw data. This paper proffers a pragmatic approach to medical NLP by amalgamating FL, NLP models, and the NVFlare framework, developed by NVIDIA. We introduce two exemplary NLP models, the Long-Short Term Memory (LSTM)-based model and Bidirectional Encoder Representations from Transformers (BERT), which have demonstrated exceptional performance in comprehending context and semantics within medical data. This paper encompasses the development of an integrated framework that addresses data privacy and regulatory compliance challenges while maintaining elevated accuracy and performance, incorporating BERT pretraining, and comprehensively substantiating the efficacy of the proposed approach.

翻訳日:2023-06-29 13:27:01 公開日:2023-06-28

# Vanilla Gradient Descentを用いたNTKを超えて:ポリノーミアル幅,サンプル,時間を有するニューラルネットワークの平均場解析

Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time ( http://arxiv.org/abs/2306.16361v1 )

ライセンス: Link先を確認

Arvind Mahankali, Jeff Z. Haochen, Kefan Dong, Margalit Glasgow, Tengyu Ma

(参考訳) 2層ニューラルネットワークの非凸最適化に関する最近の理論的な進歩にもかかわらず、不自然な修正を伴わないニューラルネットワークの勾配降下がカーネル法よりも優れたサンプル複雑性を達成することができるかどうかはまだ疑問である。本稿では,多項式幅2層ニューラルネットワーク上の投影勾配流れのクリーンな平均場解析を提供する。先行研究と異なり,本解析では最適化アルゴリズムの不自然な修正は不要である。サンプルサイズ $n = O(d^{3.1})$ の場合、$d$ は入力の次元であり、ネットワークは多項式的に多くの反復に収束し、$n \ll d^4$ サンプルを用いてカーネルメソッドでは達成できない非自明な誤差に収束するので、修正されていない勾配降下と NTK の明確な分離を示す。

Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it is still an open question whether gradient descent on neural networks without unnatural modifications can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Different from prior works, our analysis does not require unnatural modifications of the optimization algorithm. We prove that with sample size $n = O(d^{3.1})$ where $d$ is the dimension of the inputs, the network converges in polynomially many iterations to a non-trivial error that is not achievable by kernel methods using $n \ll d^4$ samples, hence demonstrating a clear separation between unmodified gradient descent and NTK.

翻訳日:2023-06-29 13:26:38 公開日:2023-06-28

# 分極量子回路における古典計算性能境界

Classically computing performance bounds on depolarized quantum circuits ( http://arxiv.org/abs/2306.16360v1 )

ライセンス: Link先を確認

Sattwik Deb Mishra, Miguel Fr\'ias-P\'erez, Rahul Trivedi

(参考訳) 量子コンピュータとシミュレータは、古典的および量子的ハミルトニアンの基底状態の発見において、古典的コンピュータを上回る可能性がある。しかし、この利点が誤り訂正なしでノイズの存在に持続できるかどうかはまだ不明である。本稿では,ラグランジュ双対性の原理を生かして,量子回路の出力状態によって達成可能な最小エネルギーに対する検証可能な下限を,非分極ノイズの存在下で古典的に計算する数値解法を開発した。提案手法は、雑音量子回路の性能に回路構造依存的な境界を与えることができるという理論的および数値的な証拠を提供する。

Quantum computers and simulators can potentially outperform classical computers in finding ground states of classical and quantum Hamiltonians. However, if this advantage can persist in the presence of noise without error correction remains unclear. In this paper, by exploiting the principle of Lagrangian duality, we develop a numerical method to classically compute a certifiable lower bound on the minimum energy attainable by the output state of a quantum circuit in the presence of depolarizing noise. We provide theoretical and numerical evidence that this approach can provide circuit-architecture dependent bounds on the performance of noisy quantum circuits.

翻訳日:2023-06-29 13:26:18 公開日:2023-06-28

# 時空間グラフ畳み込みネットワークの伝達学習による視覚障害者の劇場支援システム

Theater Aid System for the Visually Impaired Through Transfer Learning of Spatio-Temporal Graph Convolution Networks ( http://arxiv.org/abs/2306.16357v1 )

ライセンス: Link先を確認

Leyla Benhamida, Slimane Larabi

(参考訳) 本研究の目的は、視覚障害者や視覚障害者を支援するためにステージで行う人間の行動を認識することである。そこで我々は,深度画像で捉えた骨格データを入力として利用する演劇人間行動認識システムを開発した。劇場環境における人間の行動の新たなサンプルを収集し,スケルトンに基づく人行動認識のための3つの事前訓練された時空間グラフ畳み込みネットワーク(時空間グラフ畳み込みネットワーク,2ストリーム適応グラフ畳み込みネットワーク,およびマルチスケール不整合グラフ畳み込みネットワーク)を用いて移動学習手法を検証した。我々は、NTU-RGBDのヒューマンアクションベンチマークをソースドメインとして選択し、収集したデータセットをターゲットドメインとして使用した。本研究は,事前学習モデルの伝達可能性を分析し,トランスファー学習手法をソース領域とターゲット領域の多様性に適用し,適用するための2つの構成を提案した。移行学習の使用は、演劇の文脈における人間の行動システムの性能向上に寄与した。その結果,時空間グラフ畳み込みネットワークは肯定的に転送され,転送学習のないベースラインに比べて性能が向上した。

The aim of this research is to recognize human actions performed on stage to aid visually impaired and blind individuals. To achieve this, we have created a theatre human action recognition system that uses skeleton data captured by depth image as input. We collected new samples of human actions in a theatre environment, and then tested the transfer learning technique with three pre-trained Spatio-Temporal Graph Convolution Networks for skeleton-based human action recognition: the spatio-temporal graph convolution network, the two-stream adaptive graph convolution network, and the multi-scale disentangled unified graph convolution network. We selected the NTU-RGBD human action benchmark as the source domain and used our collected dataset as the target domain. We analyzed the transferability of the pre-trained models and proposed two configurations to apply and adapt the transfer learning technique to the diversity between the source and target domains. The use of transfer learning helped to improve the performance of the human action system within the context of theatre. The results indicate that Spatio-Temporal Graph Convolution Networks is positively transferred, and there was an improvement in performance compared to the baseline without transfer learning.

翻訳日:2023-06-29 13:26:06 公開日:2023-06-28

# cuSLINK:GPU上の単一リンク集約クラスタリング

cuSLINK: Single-linkage Agglomerative Clustering on the GPU ( http://arxiv.org/abs/2306.16354v1 )

ライセンス: Link先を確認

Corey J. Nolet, Divye Gala, Alex Fender, Mahesh Doijade, Joe Eaton, Edward Raff, John Zedlewski, Brad Rees, Tim Oates

(参考訳) 本稿では,GPU上のSLINKアルゴリズムの新規かつ最先端の再構成であるcuSLINKを提案する。また,cuslinkを構成する新規かつ再利用可能なビルディングブロックのセットを提案する。これらのビルディングブロックには、$k$-NNグラフ構築、スパンニングツリー、デンドログラムクラスタ抽出のための高度に最適化された計算パターンが含まれている。我々は、プリミティブをGPU上でcuSLINKのエンドツーエンド実装にどのように使用したかを示し、さらに、かつて難解だったさまざまな現実世界のデータマイニングと機械学習アプリケーションを可能にしました。 HDBSCANアルゴリズムの主要な計算ボトルネックであるだけでなく、我々のエンドツーエンドのcuSLINKアルゴリズムの影響は、ソーシャルおよびコンピュータネットワークにおけるクラスタ分析、自然言語処理、コンピュータビジョンなど、幅広い重要な応用に及んでいる。 cuSLINKはhttps://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clusteringで入手できる。

In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation of the SLINK algorithm on the GPU which requires only $O(Nk)$ space and uses a parameter $k$ to trade off space and time. We also propose a set of novel and reusable building blocks that compose cuSLINK. These building blocks include highly optimized computational patterns for $k$-NN graph construction, spanning trees, and dendrogram cluster extraction. We show how we used our primitives to implement cuSLINK end-to-end on the GPU, further enabling a wide range of real-world data mining and machine learning applications that were once intractable. In addition to being a primary computational bottleneck in the popular HDBSCAN algorithm, the impact of our end-to-end cuSLINK algorithm spans a large range of important applications, including cluster analysis in social and computer networks, natural language processing, and computer vision. Users can obtain cuSLINK at https://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clustering

翻訳日:2023-06-29 13:25:42 公開日:2023-06-28

# データ攻撃に対するアグリゲーション防衛の実践的側面について

On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks ( http://arxiv.org/abs/2306.16415v1 )

ライセンス: Link先を確認

Wenxiao Wang, Soheil Feizi

(参考訳) データへのアクセスの増加は、悪意のあるトレーニングサンプルでディープラーニングモデルの振る舞いを操作できるため、ディープラーニングの機会とリスクの両方をもたらす。このような攻撃はデータ中毒として知られている。データ中毒に対する防衛戦略の最近の進歩は、診断された中毒の堅牢性における最先端の成果を達成するための集約スキームの有効性を強調している。しかし、これらのアプローチの実践的意味はいまだ不明である。ここでは,代表的なアグリゲーション防御であるディープパーティショニングアグリゲーションに注目し,その実用的側面である効率,性能,堅牢性を評価する。評価には、ImageNetを64×64の解像度にリサイズして、以前のものよりも大規模な評価を可能にする。まず,集約防御のトレーニングと推論の効率を向上し,ベースモデルのスケーリングをシンプルかつ実用的なアプローチとして示す。第2に,データ・複雑度比,すなわちデータセットのサイズとサンプルの複雑さの比率を,精度を保ちながら展開可能なベースモデルの最大数を実用的に推定する実験的な証拠を提供する。最後に,アグリゲーション・ディフェンスが,アグリゲーションのアグリゲーションのアグリゲーション・ロバスト性において,アグリゲーションのアグリゲーション・ロバスト性を示す主要なメカニズムである中毒過剰適合現象を経験的に促進する方法を指摘する。全体として,データ中毒の脅威を軽減するために,アグリゲーション防御の実践的実装に有用な知見を提供する。

The increasing access to data poses both opportunities and risks in deep learning, as one can manipulate the behaviors of deep learning models with malicious training samples. Such attacks are known as data poisoning. Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving state-of-the-art results in certified poisoning robustness. However, the practical implications of these approaches remain unclear. Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness. For evaluations, we use ImageNet resized to a resolution of 64 by 64 to enable evaluations at a larger scale than previous ones. Firstly, we demonstrate a simple yet practical approach to scaling base models, which improves the efficiency of training and inference for aggregation defenses. Secondly, we provide empirical evidence supporting the data-to-complexity ratio, i.e. the ratio between the data set size and sample complexity, as a practical estimation of the maximum number of base models that can be deployed while preserving accuracy. Last but not least, we point out how aggregation defenses boost poisoning robustness empirically through the poisoning overfitting phenomenon, which is the key underlying mechanism for the empirical poisoning robustness of aggregations. Overall, our findings provide valuable insights for practical implementations of aggregation defenses to mitigate the threat of data poisoning.

翻訳日:2023-06-29 13:18:21 公開日:2023-06-28

# MultiZoo & MultiBench: マルチモーダルディープラーニングのための標準ツールキット

MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning ( http://arxiv.org/abs/2306.16413v1 )

ライセンス: Link先を確認

Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov

(参考訳) マルチモーダル表現の学習には、複数の異種データソースからの情報を統合することが含まれる。実世界のロバスト性を確保しつつ、未調査のモダリティやタスクの進歩を加速するため、20以上のコアマルチモーダルアルゴリズムと15のデータセット、10のモダリティ、20の予測タスク、および6つの研究領域にまたがる大規模ベンチマークであるMultiBenchを実装した公開ツールキットであるMultiZooをリリースする。これらを合わせて、データローディングや実験的なセットアップ、モデル評価の簡略化と標準化を行う、エンドツーエンドのマシンラーニングパイプラインが提供される。本研究では,(1)一般化,(2)時間と空間の複雑さ,(3)モダリティの堅牢性を評価するための包括的方法論を提案する。マルチベンチは、使いやすさ、アクセシビリティ、再現性を確保しつつ、マルチモーダルモデルの能力と制限をよりよく理解するための道を開く。私たちのツールキットは公開され、定期的に更新され、コミュニティからのインプットを歓迎します。

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiZoo, a public toolkit consisting of standardized implementations of > 20 core multimodal algorithms and MultiBench, a large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. Together, these provide an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, we offer a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench paves the way towards a better understanding of the capabilities and limitations of multimodal models, while ensuring ease of use, accessibility, and reproducibility. Our toolkits are publicly available, will be regularly updated, and welcome inputs from the community.

翻訳日:2023-06-29 13:17:52 公開日:2023-06-28

# 見える言語モデルに向けて:自然言語レンズによるコンピュータビジョン

Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language ( http://arxiv.org/abs/2306.16410v1 )

ライセンス: Link先を確認

William Berrios, Gautam Mittal, Tristan Thrush, Douwe Kiela, Amanpreet Singh

(参考訳) 大規模言語モデル(LLM)のパワーを活用することで,コンピュータビジョン問題に対処するためのモジュール型アプローチであるLENSを提案する。本システムでは、画像に関する徹底的な情報を提供する独立かつ記述性の高い視覚モジュール群からの出力を推論するために言語モデルを用いる。我々は,ゼロショットや少数ショットの物体認識などの純粋コンピュータビジョンの設定や,視覚や言語の問題に対するアプローチを評価する。 LENS は市販の LLM にも適用可能であり,LENS を用いた LLM は,より大規模で高度なシステムで高い競争力を発揮する。私たちはコードをhttps://github.com/contextualai/lensでオープンソースにし、インタラクティブなデモを提供します。

We propose LENS, a modular approach for tackling computer vision problems by leveraging the power of large language models (LLMs). Our system uses a language model to reason over outputs from a set of independent and highly descriptive vision modules that provide exhaustive information about an image. We evaluate the approach on pure computer vision settings such as zero- and few-shot object recognition, as well as on vision and language problems. LENS can be applied to any off-the-shelf LLM and we find that the LLMs with LENS perform highly competitively with much bigger and much more sophisticated systems, without any multimodal training whatsoever. We open-source our code at https://github.com/ContextualAI/lens and provide an interactive demo.

翻訳日:2023-06-29 13:17:31 公開日:2023-06-28

# データセットシフトの一般形に基づく効率的かつ多元的ロバストリスク推定

Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift ( http://arxiv.org/abs/2306.16406v1 )

ライセンス: Link先を確認

Hongxiang Qiu, Eric Tchetgen Tchetgen, Edgar Dobriban

(参考訳) 統計的な機械学習手法は、利害関係者から利用可能な限られたデータの課題に直面することが多い。 1つの治療法は、いくつかの条件分布を共有したり、ターゲットドメインと他の方法でリンクされた補助源集団のデータを活用することである。このような \emph{dataset shift} 条件を活用する手法は \emph{domain adaptation} または \emph{transfer learning} として知られている。データセットのシフトに関する広範な文献にもかかわらず、限定的な研究は、対象人口における与えられた機械学習タスクのリスク評価の正確性を改善するために補助人口を効率的に利用する方法に言及している。本稿では, 半パラメトリック効率理論を用いて, 様々なデータセットシフト条件下でターゲット人口リスクを効率的に推定する一般的な問題について検討する。我々は,共変量,ラベル,概念シフトの3つの一般的な条件を含む,データセットシフト条件の一般的なクラスを特別なケースとして検討する。我々は、ソースとターゲットの人口の間で部分的に重複しない支持を可能にする。我々は、これらのデータセットシフト条件の簡単な仕様テストと共に、効率的かつ多重にロバストな推定器を開発する。また、他の2つのデータセットシフト条件、後方ドリフトと位置スケールシフトの効率境界も導出する。シミュレーション研究は、妥当なデータセットシフト条件の活用による効率向上を支援する。

Statistical machine learning methods often face the challenge of limited data available from the population of interest. One remedy is to leverage data from auxiliary source populations, which share some conditional distributions or are linked in other ways with the target domain. Techniques leveraging such \emph{dataset shift} conditions are known as \emph{domain adaptation} or \emph{transfer learning}. Despite extensive literature on dataset shift, limited works address how to efficiently use the auxiliary populations to improve the accuracy of risk evaluation for a given machine learning task in the target population. In this paper, we study the general problem of efficiently estimating target population risk under various dataset shift conditions, leveraging semiparametric efficiency theory. We consider a general class of dataset shift conditions, which includes three popular conditions -- covariate, label and concept shift -- as special cases. We allow for partially non-overlapping support between the source and target populations. We develop efficient and multiply robust estimators along with a straightforward specification test of these dataset shift conditions. We also derive efficiency bounds for two other dataset shift conditions, posterior drift and location-scale shift. Simulation studies support the efficiency gains due to leveraging plausible dataset shift conditions.

翻訳日:2023-06-29 13:17:19 公開日:2023-06-28

# 量子2ブロック群代数符号

Quantum two-block group algebra codes ( http://arxiv.org/abs/2306.16400v1 )

ライセンス: Link先を確認

Hsiang-Ku Lin and Leonid P. Pryadko

(参考訳) 量子2ブロック群代数 (2BGA) は、これまで研究されていない最小の持ち上げ積 (LP) 符号の族である。これらの符号は一般化双サイクル(gb)符号と関係があるが、巡回群は任意の有限群(一般に非可換群)に置き換えられる。特別な場合として、2BGA符号は、準巡回符号を含むアーベル群上の正方行列LP符号のサブセットと、古典群符号の対から構築された全正方行列ハイパーグラフ積符号を含む。 2bga符号の置換同値性の基準を定め、それらのパラメータの境界を明示的かつ他の量子符号と古典符号との関係で与える。また、安定化器発生器重みが$W \le 8$、アーベル群が$n \le 100$、非アーベル群が$n \le 200$の全ての非等価連結2BGA符号の最適パラメータを列挙する。

We consider quantum two-block group algebra (2BGA) codes, a previously unstudied family of smallest lifted-product (LP) codes. These codes are related to generalized-bicycle (GB) codes, except a cyclic group is replaced with an arbitrary finite group, generally non-abelian. As special cases, 2BGA codes include a subset of square-matrix LP codes over abelian groups, including quasi-cyclic codes, and all square-matrix hypergraph-product codes constructed from a pair of classical group codes. We establish criteria for permutation equivalence of 2BGA codes and give bounds for their parameters, both explicit and in relation to other quantum and classical codes. We also enumerate the optimal parameters of all inequivalent connected 2BGA codes with stabilizer generator weights $W \le 8$, of length $n \le 100$ for abelian groups, and $n \le 200$ for non-abelian groups.

翻訳日:2023-06-29 13:17:01 公開日:2023-06-28

# SUSYによる平行非一様電磁界におけるディラック材料:新しいキラル平面ホール効果のクラスか?

Dirac materials in parallel non-uniform electromagnetic fields generated by SUSY: A new class of chiral Planar Hall Effect? ( http://arxiv.org/abs/2306.16399v1 )

ライセンス: Link先を確認

Julio Cesar P\'erez-Pedraza, Juan D. Garc\'ia-Mu\~noz and A. Raya

(参考訳) 超対称量子力学(susy-qm)の枠組みでは、外部平行電気および磁場の存在下でディラック材料を記述する(3+1)ディラック方程式が解かれる。 y方向に沿った並進対称性を持つ静的だが非一様の電気的および磁気的プロファイルを考えると、ディラック方程式はフェルミオン場の各キラリティに対して2つの分離されたschr\"odinger方程式に変換される。ベクトルポテンシャルとスカラーポテンシャルの三角プロファイルと双曲プロファイルを取り、susyパートナー p\"oschl-teller-like quantum potentials に到達した。解析的ゼロモード解を支持するポテンシャルの条件に限定すると、電場と磁場が作用する同じ平面で非自明な電流密度が得られるが、両者は垂直であり、平面ホール効果を実現する可能性を示している。さらに、この非破壊電流密度は、左右のキラル性に対する電流密度の和であり、正の電流はキラル対称性の結果であることを示唆している。

Within a Supersymmetric Quantum Mechanics (SUSY-QM) framework, the (3+1) Dirac equation describing a Dirac material in the presence of external parallel electric and magnetic fields is solved. Considering static but non-uniform electric and magnetic profiles with translational symmetry along the y-direction, the Dirac equation is transformed into two decoupled pairs of Schr\"odinger equations, one for each chirality of the fermion fields. Taking trigonometric and hyperbolic profiles for the vector and scalar potentials, respectively, we arrive at SUSY partner P\"oschl-Teller-like quantum potentials. Restricting to the conditions of the potentials that support an analytic zero-mode solution, we obtain a nontrivial current density in the same plane where the electric and magnetic fields lie, but perpendicular to both of them, indicating the possibility of realizing the Planar Hall Effect. Furthermore, this non-vanishing current density is the sum of current densities for the left- and right-chiralities, suggesting that the net current is a consequence of chiral symmetry.

翻訳日:2023-06-29 13:16:40 公開日:2023-06-28

# 量子チャネルとスーパーチャネルの双対性は基底依存性である

Duality between quantum channels and super-channels is basis-dependent ( http://arxiv.org/abs/2306.16395v1 )

ライセンス: Link先を確認

Sohail, Sahil, Ritabrata Sengupta, Ujjwal Sen

(参考訳) Choi-Jamio{\l}kowski-Kraus-Sudarshan量子チャネル状態同型における完全正の対正の対応は基底の選択に依存する。例えば、パウリのスピン行列と、二次元複素ヒルベルト空間上の有界作用素の空間の基底としての恒等式を使用すれば、この対応は崩壊する。この対応の妥当性に基づく十分条件は、後に Kye~\cite{Kye} によって必要であることが証明された Paulsen と Shult~\cite{Paulsen} の業績に与えられる。スーパーマップの空間と入力と出力の空間のテンソル積の間にも対応性が存在する。特に、超写像が完全CP保存であることと、そのChoi型表現が完全正であることは同値である。この対応は基底の特定の選択にも依存する。本研究では,この対応が真であるように,必要かつ十分な条件を求める。

The complete positivity vs positivity correspondence in the Choi-Jamio{\l}kowski-Kraus-Sudarshan quantum channel-state isomorphism depends on the choice of basis. Instead of the ``canonical'' basis, if we use, e.g., the Pauli spin matrices along with the identity as the basis for the space of bounded operators on the two-dimensional complex Hilbert space, this correspondence breaks down. A sufficient condition on the basis for validity of this correspondence is provided in the work of Paulsen and Shult~\cite{Paulsen}, which was later proven to be necessary by Kye~\cite{Kye}. A correspondence is also present between the space of super-maps and the tensor product of the spaces of the inputs and outputs of the same. In particular, a super-map is completely CP-preserving if and only if its Choi-type representation is completely positive (CP). This correspondence also depends on a specific choice of basis. In this work, we find the necessary and sufficient condition on a basis such that this correspondence holds true.

翻訳日:2023-06-29 13:16:18 公開日:2023-06-28

# 平均回帰マルコフ決定過程に対するシャーパモデルフリー強化学習

Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes ( http://arxiv.org/abs/2306.16394v1 )

ライセンス: Link先を確認

Zihan Zhang and Qiaomin Xie

(参考訳) 我々は,無限水平平均逆マルコフ決定過程(MDPs)のモデルフリー強化学習(RL)アルゴリズムを開発した。オンライン設定とシミュレータへのアクセスによる設定の両方について検討する。本稿では,参照アドバンテージ分解に基づくモデルフリーRLアルゴリズムを提案する。このアルゴリズムは、$t$ステップ後に$\widetilde{o}(s^5a^2\mathrm{sp}(h^*)\sqrt{t})$を成し、$s\times a$は状態-作用空間のサイズであり、$\mathrm{sp}(h^*)$は最適なバイアス関数の幅である。我々の結果は、弱通信型MDPに対するT$の最適依存性を最初に達成したものである。シミュレータ設定では,$\widetilde{O} \left(\frac{SA\mathrm{sp}^2(h^*)}{\epsilon^2}+\frac{S^2A\mathrm{sp}(h^*)}{\epsilon} \right)$サンプルを用いて,$\Omega\left(\frac{SA\mathrm{sp}(h^*)}{\epsilon^2}\right)$を使用するモデルフリーなRLアルゴリズムを提案する。この結果は,平均回帰設定でユニークな2つの新しい手法に基づいている。 1) 値差推定によるより良い割引近似 2) 空間複雑性を$O(SA)$とする最適バイアス関数に対する信頼領域の効率的な構築。

We develop several provably efficient model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov Decision Processes (MDPs). We consider both online setting and the setting with access to a simulator. In the online setting, we propose model-free RL algorithms based on reference-advantage decomposition. Our algorithm achieves $\widetilde{O}(S^5A^2\mathrm{sp}(h^*)\sqrt{T})$ regret after $T$ steps, where $S\times A$ is the size of state-action space, and $\mathrm{sp}(h^*)$ the span of the optimal bias function. Our results are the first to achieve optimal dependence in $T$ for weakly communicating MDPs. In the simulator setting, we propose a model-free RL algorithm that finds an $\epsilon$-optimal policy using $\widetilde{O} \left(\frac{SA\mathrm{sp}^2(h^*)}{\epsilon^2}+\frac{S^2A\mathrm{sp}(h^*)}{\epsilon} \right)$ samples, whereas the minimax lower bound is $\Omega\left(\frac{SA\mathrm{sp}(h^*)}{\epsilon^2}\right)$. Our results are based on two new techniques that are unique in the average-reward setting: 1) better discounted approximation by value-difference estimation; 2) efficient construction of confidence region for the optimal bias function with space complexity $O(SA)$.

翻訳日:2023-06-29 13:16:00 公開日:2023-06-28

# 言語モデルにおける主観的グローバル・オピニオン表現の計測に向けて

Towards Measuring the Representation of Subjective Global Opinions in Language Models ( http://arxiv.org/abs/2306.16388v1 )

ライセンス: Link先を確認

Esin Durmus, Karina Nyugen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli

(参考訳) 大規模言語モデル(LLM)は、社会問題に関する多様なグローバルな視点を公平に表すものではない。本稿では,モデル生成応答がより類似している意見を評価するための定量的枠組みを開発する。まず,各国のグローバル問題に対するさまざまな意見の収集を目的とした全国横断調査から回答を得たデータセットGlobalOpinionQAを構築した。次に, LLM が生成する調査応答と, 国別に設定した人的応答の類似度を定量化する指標を定義した。われわれのフレームワークでは、3つの実験をLEMで実施し、立憲AIに役立ち、正直で無害であるように訓練した。デフォルトでは、LCMの反応は、米国や一部のヨーロッパや南米諸国のような特定の人口の意見とよく似ており、偏見の可能性を浮き彫りにしている。モデルに特定の国の視点を考察するよう促すと、応答は人口の意見によく似ているが、有害な文化的ステレオタイプを反映することができる。我々がGlobalOpinionQA質問を対象言語に翻訳するとき、モデルの応答は必ずしもそれらの言語の話者の意見に最もよく似ているとは限らない。他の人が使用して構築するためのデータセットをリリースします。私たちのデータはhttps://huggingface.co/datasets/Anthropic/llm_global_opinionsにあります。また、https://llmglobalvalues.anthropic.comでもインタラクティブな可視化を提供しています。

Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.

翻訳日:2023-06-29 13:15:20 公開日:2023-06-28

# GPUによる直接ストレージアクセスによるGNNフレームワークのサンプリングと集約操作の高速化

Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses ( http://arxiv.org/abs/2306.16384v1 )

ライセンス: Link先を確認

Jeongmin Brian Park and Vikram Sharma Mailthody and Zaid Qureshi and Wen-mei Hwu

(参考訳) グラフニューラルネットワーク(gnns)は、グラフ構造化データから学び、さまざまなアプリケーションドメインで高度な推論タスクを実行するための強力なツールとして登場している。 GNNは、控えめなグラフで有効であることが示されているが、効率的なデータアクセスとデータ移動方法がないため、大規模グラフでそれらを訓練することは大きな課題である。既存のGNNトレーニングフレームワークでは、グラフサンプリングと機能集約にCPUを使用し、GPU上でモデルの重み付けのトレーニングと更新が実行される。しかし、我々の詳細なプロファイリングは、CPUがGNNモデルのトレーニングスループットを飽和させるのに必要なスループットを達成できないことを示している。さらに、グラフとその埋め込みがCPUメモリに収まらない場合、オペレーティングシステムによって導入されたオーバーヘッド、例えばページフォールトを扱うことは、実行の重要な経路となる。これらの問題に対処するために、GPU Initiated Direct Storage Access (GIDS) データローダを提案し、CPUメモリ、ストレージ、GPUメモリなどのハードウェアリソースをハイブリッドデータ配置戦略で効率的に活用しながら、大規模グラフに対するGPU指向のGNNトレーニングを可能にする。 GPUスレッドがストレージから直接特徴ベクトルをフェッチできるようにすることで、GIDSデータローダはGPU指向のGNNトレーニングのメモリ容量問題を解決する。さらに、GIDSデータローダはGPU並列性を利用してストレージ遅延を許容し、高価なページフォールトオーバーヘッドを排除している。これにより、局所性を活かし、GNNトレーニングに有効な帯域幅を増やすための新しい最適化を設計できる。テラバイト規模のGNNデータセット上の1つのGPUを用いて評価したところ、GIDSデータローダは、現在最先端のDGLデータローダと比較して、DGL GNNトレーニングパイプライン全体を最大392倍高速化することがわかった。

Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data and performing sophisticated inference tasks in various application domains. Although GNNs have been shown to be effective on modest-sized graphs, training them on large-scale graphs remains a significant challenge due to lack of efficient data access and data movement methods. Existing frameworks for training GNNs use CPUs for graph sampling and feature aggregation, while the training and updating of model weights are executed on GPUs. However, our in-depth profiling shows the CPUs cannot achieve the throughput required to saturate GNN model training throughput, causing gross under-utilization of expensive GPU resources. Furthermore, when the graph and its embeddings do not fit in the CPU memory, the overhead introduced by the operating system, say for handling page-faults, comes in the critical path of execution. To address these issues, we propose the GPU Initiated Direct Storage Access (GIDS) dataloader, to enable GPU-oriented GNN training for large-scale graphs while efficiently utilizing all hardware resources, such as CPU memory, storage, and GPU memory with a hybrid data placement strategy. By enabling GPU threads to fetch feature vectors directly from storage, GIDS dataloader solves the memory capacity problem for GPU-oriented GNN training. Moreover, GIDS dataloader leverages GPU parallelism to tolerate storage latency and eliminates expensive page-fault overhead. Doing so enables us to design novel optimizations for exploiting locality and increasing effective bandwidth for GNN training. Our evaluation using a single GPU on terabyte-scale GNN datasets shows that GIDS dataloader accelerates the overall DGL GNN training pipeline by up to 392X when compared to the current, state-of-the-art DGL dataloader.

翻訳日:2023-06-29 13:14:57 公開日:2023-06-28

# 自動運転の新技術の概要

An Overview about Emerging Technologies of Autonomous Driving ( http://arxiv.org/abs/2306.13302v3 )

ライセンス: Link先を確認

Yu Huang, Yue Chen, Zijiang Yang

(参考訳) 2004年にDARPAがグランドチャレンジを始め、2007年にアーバンチャレンジを開始して以来、自動運転はAIアプリケーションの最も活発な分野となっている。本稿では,自動運転技術とオープン問題の技術的側面について概説する。本稿では,認識,マッピングとローカライゼーション,予測,計画と制御,シミュレーション,V2X,安全性など,自動運転システムの主要な分野について検討する。特に私たちは,ロングテールの自動運転問題を解決するための一般的なプラットフォームであるdata closed loopのフレームワークで,これらすべての問題を詳しく説明しています。

Since DARPA started Grand Challenges in 2004 and Urban Challenges in 2007, autonomous driving has been the most active field of AI applications. This paper gives an overview about technical aspects of autonomous driving technologies and open problems. We investigate the major fields of self-driving systems, such as perception, mapping and localization, prediction, planning and control, simulation, V2X and safety etc. Especially we elaborate on all these issues in a framework of data closed loop, a popular platform to solve the long tailed autonomous driving problems.

翻訳日:2023-06-29 11:30:31 公開日:2023-06-28

# 意識的知識グラフ畳み込みネットワークに基づく観光客の推薦

Tourist Attractions Recommendation based on Attention Knowledge Graph Convolution Network ( http://arxiv.org/abs/2306.10946v3 )

ライセンス: Link先を確認

Ahmad A. Mubarak and Afifa Kahled

(参考訳) 知識グラフに基づく推薦アルゴリズムは比較的成熟した段階にある。しかし、特定の分野の推薦にはいくつかの問題がある。例えば、観光分野では、観光アトラクションの推奨基盤として、適切な観光アトラクション属性の選択プロセスが複雑である。本稿では,対象の景観スポットの近傍のエンティティを自動的に意味的に発見する改良された意識知識グラフ畳み込みネットワークモデル(Att-KGCN)を提案する。注意層は比較的類似した位置を集約し、隣接するベクトルでそれらを表現する。そして、観光客の好む選択により、類似点の確率を推薦システムとして予測する。 Socotra Island-Yemenの観光データに基づく観光名所の知識グラフデータセット実験により,アテンションナレッジグラフ畳み込みネットワークが観光名所のレコメンデーションに良い影響を与え,観光客の選択により多くのレコメンデーションをすることができることを確認した。

The recommendation algorithm based on knowledge graphs is at a relatively mature stage. However, there are still some problems in the recommendation of specific areas. For example, in the tourism field, selecting suitable tourist attraction attributes process is complicated as the recommendation basis for tourist attractions. In this paper, we propose the improved Attention Knowledge Graph Convolution Network model, named (Att-KGCN), which automatically discovers the neighboring entities of the target scenic spot semantically. The attention layer aggregates relatively similar locations and represents them with an adjacent vector. Then, according to the tourist's preferred choices, the model predicts the probability of similar spots as a recommendation system. A knowledge graph dataset of tourist attractions used based on tourism data on Socotra Island-Yemen. Through experiments, it is verified that the Attention Knowledge Graph Convolution Network has a good effect on the recommendation of tourist attractions and can make more recommendations for tourists' choices.

翻訳日:2023-06-29 11:30:22 公開日:2023-06-28

# TSMixer:多変量時系列予測のための軽量MLPミクサモデル

TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting ( http://arxiv.org/abs/2306.09364v3 )

ライセンス: Link先を確認

Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam

(参考訳) トランスフォーマーは時系列予測において、長い列の相互作用を捉える能力で人気を集めている。しかし、その高いメモリとコンピューティング要件は長期的な予測に重大なボトルネックをもたらす。そこで本研究では,多層パーセプトロン(MLP)モジュールのみからなる軽量ニューラルネットワークTSMixerを提案する。 tsmixerはパッチ付き時系列の多変量予測と表現学習のために設計されており、トランスフォーマーの効率的な代替手段を提供する。我々のモデルはコンピュータビジョンにおけるMLP-Mixerモデルの成功からインスピレーションを得ている。時系列にVision MLP-Mixerを適用する際の課題を示し、精度を高めるために経験的検証されたコンポーネントを導入する。これは、階層構造やチャネル相関などの時系列特性を明示的にモデル化するための、MLP-Mixerバックボーンにオンライン和解ヘッドを付加する新しい設計パラダイムを含む。また,既存のパッチチャネル混合方式では一般的な課題である,多種多様なデータセット間のノイズチャネルインタラクションと一般化を効果的に処理するためのハイブリッドチャネルモデリング手法を提案する。さらに、重要な特徴を優先するために、バックボーンに単純なゲートアテンション機構が導入される。これらの軽量なコンポーネントを組み込むことで、単純なmlp構造の学習能力を大幅に向上させ、最小の計算使用量で複雑なトランスフォーマーモデルを上回る。さらに、TSMixerのモジュール設計により、教師付きとマスク付きの両方の自己教師付き学習手法との互換性が実現され、時系列基礎モデルのための有望なビルディングブロックとなる。 TSMixer は最先端の MLP と Transformer のモデルよりも 8-60% の差で予測できる。また、Patch-Transformerモデルの最新の強力なベンチマーク(1～2%)を上回り、メモリとランタイム(2～3倍)を大幅に削減した。

Transformers have gained popularity in time series forecasting for their ability to capture long-sequence interactions. However, their high memory and computing requirements pose a critical bottleneck for long-term forecasting. To address this, we propose TSMixer, a lightweight neural architecture exclusively composed of multi-layer perceptron (MLP) modules. TSMixer is designed for multivariate forecasting and representation learning on patched time series, providing an efficient alternative to Transformers. Our model draws inspiration from the success of MLP-Mixer models in computer vision. We demonstrate the challenges involved in adapting Vision MLP-Mixer for time series and introduce empirically validated components to enhance accuracy. This includes a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone, for explicitly modeling the time-series properties such as hierarchy and channel-correlations. We also propose a Hybrid channel modeling approach to effectively handle noisy channel interactions and generalization across diverse datasets, a common challenge in existing patch channel-mixing methods. Additionally, a simple gated attention mechanism is introduced in the backbone to prioritize important features. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal computing usage. Moreover, TSMixer's modular design enables compatibility with both supervised and masked self-supervised learning methods, making it a promising building block for time-series Foundation Models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong benchmarks of Patch-Transformer models (by 1-2%) with a significant reduction in memory and runtime (2-3X).

翻訳日:2023-06-29 11:30:07 公開日:2023-06-28

# 非均一サンプリングによるネットワークデータの等角予測の有効性について

On the Validity of Conformal Prediction for Network Data Under Non-Uniform Sampling ( http://arxiv.org/abs/2306.07252v3 )

ライセンス: Link先を確認

Robert Lunde

(参考訳) 実例ではよく見られるが,ノードの非表現的なサンプルとなる様々なサンプリングメカニズムの下で,ネットワークデータの共形予測の特性について検討する。これらのサンプリング機構を,過集団に適用する選択規則として解釈し,適切な選択イベントにおける共形予測条件の有効性について検討する。選択規則が置換不変性を満たす場合、サンプルされたサブアレイは選択イベント上で交換可能条件であり、その超集団に対して共有交換可能条件が成立することを示す。以上の結果から,エゴネットワークや雪玉サンプリングに関連する特定の選択事象に対する共形予測の有限サンプルの有効性が示唆された。また,グラフ上のランダムなウォークでデータをサンプリングすると,重み付き共形予測の変種が個体群から選択したノードに対して漸近的に妥当な予測集合を生成することを示した。

We study the properties of conformal prediction for network data under various sampling mechanisms that commonly arise in practice but often result in a non-representative sample of nodes. We interpret these sampling mechanisms as selection rules applied to a superpopulation and study the validity of conformal prediction conditional on an appropriate selection event. We show that the sampled subarray is exchangeable conditional on the selection event if the selection rule satisfies a permutation invariance property and a joint exchangeability condition holds for the superpopulation. Our result implies the finite-sample validity of conformal prediction for certain selection events related to ego networks and snowball sampling. We also show that when data are sampled via a random walk on a graph, a variant of weighted conformal prediction yields asymptotically valid prediction sets for an independently selected node from the population.

翻訳日:2023-06-29 11:29:37 公開日:2023-06-28

# 大規模言語モデルからレコメンダシステムにどのようなメリットがあるか:調査

How Can Recommender Systems Benefit from Large Language Models: A Survey ( http://arxiv.org/abs/2306.05817v4 )

ライセンス: Link先を確認

Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, Weinan Zhang

(参考訳) インターネットアプリケーションにおいて,レコメンダシステム(RS)はユーザの情報要求に合わせて重要な役割を果たす。自然言語処理(nlp)領域では、大規模言語モデル(llm)は驚くべき創発的能力(例えば命令追従、推論)を示しており、llmをrsに適用してパフォーマンスの向上とユーザエクスペリエンスの改善を行う有望な研究方向を生み出している。本稿では,本研究の方向性をアプリケーション指向の観点から総合的に調査する。まず, LLM を RS に適用する方法という2つの直交的な視点から, 既存の研究成果を要約する。 where"という質問に対して、我々は、レコメンデーションパイプラインのさまざまなステージでllmが果たすことができる役割、すなわち、機能工学、特徴エンコーダ、スコアリング/ランキング関数、パイプラインコントローラについて論じる。 how"問題に対しては、トレーニングと推論の戦略を調査し、llmをチューニングするか否か、推論に従来の推奨モデル(crm)を関与させるかどうかという2つの詳細な分類基準を導出する。いずれの質問にも詳細な分析と一般的な開発軌跡が提供される。次に,3つの側面,すなわち効率性,有効性,倫理性から,LSMをRSに適用する上での課題を強調した。最後に,調査の概要と今後の展望について考察する。また、この上昇方向において、論文やその他の関連リソースのためのgithubリポジトリを積極的に維持している。

Recommender systems (RS) play important roles to match users' information needs for Internet applications. In natural language processing (NLP) domains, large language model (LLM) has shown astonishing emergent abilities (e.g., instruction following, reasoning), thus giving rise to the promising research direction of adapting LLM to RS for performance enhancements and user experience improvements. In this paper, we conduct a comprehensive survey on this research direction from an application-oriented view. We first summarize existing research works from two orthogonal perspectives: where and how to adapt LLM to RS. For the "WHERE" question, we discuss the roles that LLM could play in different stages of the recommendation pipeline, i.e., feature engineering, feature encoder, scoring/ranking function, and pipeline controller. For the "HOW" question, we investigate the training and inference strategies, resulting in two fine-grained taxonomy criteria, i.e., whether to tune LLMs or not, and whether to involve conventional recommendation model (CRM) for inference. Detailed analysis and general development trajectories are provided for both questions, respectively. Then, we highlight key challenges in adapting LLM to RS from three aspects, i.e., efficiency, effectiveness, and ethics. Finally, we summarize the survey and discuss the future prospects. We also actively maintain a GitHub repository for papers and other related resources in this rising direction: https://github.com/CHIANGEL/Awesome-LLM-for-RecSys.

翻訳日:2023-06-29 11:29:22 公開日:2023-06-28

# ストリーミングデータからのニューラルネットワークオンライン学習のための低ランク拡張カルマンフィルタ

Low-rank extended Kalman filtering for online learning of neural networks from streaming data ( http://arxiv.org/abs/2305.19535v3 )

ライセンス: Link先を確認

Peter G. Chang, Gerardo Dur\'an-Mart\'in, Alexander Y Shestopaloff, Matt Jones, Kevin Murphy

(参考訳) 非定常データストリームから非線形関数のパラメータを推定するための効率的なオンライン近似ベイズ推定アルゴリズムを提案する。この方法は拡張カルマンフィルタ(ekf)に基づいているが、モデルパラメータの数に線形なステップあたりのコストを与える、後方精度行列の新たな低ランク+対角分解を用いる。確率的変動推論に基づく手法とは対照的に,本手法は完全に決定論的であり,ステップサイズチューニングを必要としない。実験により,この結果がより高速(より標本効率のよい)学習となり,分布の変化に適応しやすくなり,文脈的バンディットアルゴリズムの一部として使用する場合の報酬の蓄積が早くなることを示した。

We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream. The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix, which gives a cost per step which is linear in the number of model parameters. In contrast to methods based on stochastic variational inference, our method is fully deterministic, and does not require step-size tuning. We show experimentally that this results in much faster (more sample efficient) learning, which results in more rapid adaptation to changing distributions, and faster accumulation of reward when used as part of a contextual bandit algorithm.

翻訳日:2023-06-29 11:28:57 公開日:2023-06-28

# QR-CLIP: 位置と時間推論のための明示的なオープンワールド知識の導入

QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning ( http://arxiv.org/abs/2302.00952v3 )

ライセンス: Link先を確認

Weimin Shi, Mingchen Zhuge, Dehong Gao, Zhong Zhou, Ming-Ming Cheng, Deng-Ping Fan

(参考訳) 日々のイメージは、私たちが記憶し、それらから深い情報を推測する必要がある抽象的な意味を伝える。このような人間的な推論を促進するために、我々は機械に従来のセグメンテーションや分類といった基本的なタスクではなく、いつ、どこで、いつ取られたかを予測するように教える。 Horn氏のQR理論に触発されて、2つのコンポーネントからなる新しいQR-CLIPモデルを設計した。 1)Quantityモジュールは,まず,候補言語の入力として,よりオープンワールドな知識を振り返る。 2) 関連モジュールは,視覚と言語手がかりを慎重に推定し,位置と時刻を推定する。実験によりQR-CLIPの有効性が示され、各タスクにおける以前のSOTAを、位置と時間的推論の観点から平均約10%と130%の相対的なリフトで上回ります。本研究は,位置情報と時間的推論の技術的基礎を築いており,オープンワールド知識の効果的な導入が課題のパナセの1つであることを示唆する。

Daily images may convey abstract meanings that require us to memorize and infer profound information from them. To encourage such human-like reasoning, in this work, we teach machines to predict where and when it was taken rather than performing basic tasks like traditional segmentation or classification. Inspired by Horn's QR theory, we designed a novel QR-CLIP model consisting of two components: 1) the Quantity module first retrospects more open-world knowledge as the candidate language inputs; 2) the Relevance module carefully estimates vision and language cues and infers the location and time. Experiments show our QR-CLIP's effectiveness, and it outperforms the previous SOTA on each task by an average of about 10% and 130% relative lift in terms of location and time reasoning. This study lays a technical foundation for location and time reasoning and suggests that effectively introducing open-world knowledge is one of the panaceas for the tasks.

翻訳日:2023-06-29 11:28:42 公開日:2023-06-28

# マインドディアル:神経対話生成のための理論オブマインドモデリングによる信念のダイナミクス追跡

MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation ( http://arxiv.org/abs/2306.15253v2 )

ライセンス: Link先を確認

Shuwen Qiu, Song-Chun Zhu, Zilong Zheng

(参考訳) 人間は表現された意味や共通点を交渉しながら自由に話す。大きな生成言語モデルの印象的な会話能力にもかかわらず、共有場所における文脈理解の個人差は考慮されていない。本研究はMindDialを提案する。MindDialは、位置自由な応答を生成できる新しい対話型フレームワークで、共通基盤の交渉を行う。我々は,3段階の信念を追跡可能な明示的なマインドモジュールを設計する。話者の信念,話者の聴取者の信念の予測,および最初の2つの間隙に基づく共通信念である。そして、話す行為分類ヘッドは、話を続けるか、このターンを終了するか、タスク関連のアクションを取ることに決めます。 2つのエージェント間の無料チャットに基づいて,1つの相互友人を見つけることを目標とする,信念ダイナミクスアノテーションを用いた共通基底アライメントデータセットの相互フレンドを補強する。実験により, 人間の自然な会話の流れを再現する上で, 心的状態モデリングを用いたモデルが人間の反応に類似することが確認された。さらに、アブレーション研究により、第3レベルの共通信念は、第1および第2の信念の情報を集約し、共通基盤をより効率的に調整することができる。

Humans talk in free-form while negotiating the expressed meanings or common ground. Despite the impressive conversational abilities of the large generative language models, they do not consider the individual differences in contextual understanding in a shared situated environment. In this work, we propose MindDial, a novel conversational framework that can generate situated free-form responses to negotiate common ground. We design an explicit mind module that can track three-level beliefs -- the speaker's belief, the speaker's prediction of the listener's belief, and the common belief based on the gap between the first two. Then the speaking act classification head will decide to continue to talk, end this turn, or take task-related action. We augment a common ground alignment dataset MutualFriend with belief dynamics annotation, of which the goal is to find a single mutual friend based on the free chat between two agents. Experiments show that our model with mental state modeling can resemble human responses when aligning common ground meanwhile mimic the natural human conversation flow. The ablation study further validates the third-level common belief can aggregate information of the first and second-order beliefs and align common ground more efficiently.

翻訳日:2023-06-29 11:23:18 公開日:2023-06-28

# MIMIC:画像対応による仮面画像モデリング

MIMIC: Masked Image Modeling with Image Correspondences ( http://arxiv.org/abs/2306.15128v2 )

ライセンス: Link先を確認

Kalyani Marathe, Mahtab Bigverdi, Nishat Khan, Tuhin Kundu, Aniruddha Kembhavi, Linda G. Shapiro, Ranjay Krishna

(参考訳) 現在、コンピュータビジョンにおける深度推定とセマンティックセグメンテーションは、事前訓練された画像表現に依存している。したがって、効果的な事前学習データセットのキュレーションは不可欠である。残念ながら、効果的な事前トレーニングデータセットは、マルチビューシーンを持つもので、アノテーション付き3Dメッシュ、ポイントクラウド、シミュレートされた環境からのカメラパラメータを使用してのみキュレートされている。アノテーションを必要としないデータセット作成機構を提案する。我々は、MIMIC-1M with 1.3MとMIMIC-3M with 3.1Mの2つのデータセットを、オープンソースビデオデータセットと合成3D環境から抽出した。マスク付き画像モデリングの目的が異なる複数の自己教師付きモデルをトレーニングし、以下の結果を示す。深度推定、意味セグメンテーション、表面正規化、ポーズ推定など、複数の下流タスクでアノテーションを使用してマイニングされたものよりも、模倣3mでトレーニングされた表現が優れている。また、ダウンストリームのトレーニングデータに制限がある場合、凍結された表現よりも優れています。より大規模なデータセット(MIMIC-3M)は、より大規模なデータセットを生成するために任意にスケールできるので、パフォーマンスが大幅に向上する。 MIMICコード、データセット、トレーニング済みモデルはhttps://github.com/RAIVNLab/MIMICでオープンソース化されている。

Many pixelwise dense prediction tasks-depth estimation and semantic segmentation in computer vision today rely on pretrained image representations. Therefore, curating effective pretraining datasets is vital. Unfortunately, the effective pretraining datasets are those with multi-view scenes and have only been curated using annotated 3D meshes, point clouds, and camera parameters from simulated environments. We propose a dataset-curation mechanism that does not require any annotations. We mine two datasets: MIMIC-1M with 1.3M and MIMIC-3M with 3.1M multi-view image pairs from open-sourced video datasets and from synthetic 3D environments. We train multiple self-supervised models with different masked image modeling objectives to showcase the following findings: Representations trained on MIMIC-3M outperform those mined using annotations on multiple downstream tasks, including depth estimation, semantic segmentation, surface normals, and pose estimation. They also outperform representations that are frozen and when downstream training data is limited to few-shot. Larger dataset (MIMIC-3M) significantly improves performance, which is promising since our curation method can arbitrarily scale to produce even larger datasets. MIMIC code, dataset, and pretrained models are open-sourced at https://github.com/RAIVNLab/MIMIC.

翻訳日:2023-06-29 11:22:51 公開日:2023-06-28

# bertのクロスドメイン挙動のレビュー理解における検討

Investigating Cross-Domain Behaviors of BERT in Review Understanding ( http://arxiv.org/abs/2306.15123v2 )

ライセンス: Link先を確認

Albert Lu and Meng Jiang

(参考訳) レビュースコアの予測には、自然言語処理の現実的な応用であるレビューテキスト理解が必要である。製品レビューにおける異種テキストドメインのため、共通するプラクティスは、異なるドメインのレビューに基づいてBERTモデルを微調整することである。しかし、製品レビュー理解の様々なタスクにおいて、BERTモデルのクロスドメイン動作に関する実証的研究は未だ行われていない。本稿では,単一ドメインおよび複数ドメインのAmazonレビューデータに基づいて,BERTモデルのテキスト分類を行う。以上の結果から,マルチドメインモデルと比較した場合,単一ドメインモデルの性能は若干向上したが,マルチドメインモデルでは,マルチドメインデータで評価した場合の単一ドメインモデルよりも優れており,単一ドメインモデルでは微調整が行えず,すべてのテストで平均的に性能が向上した。単一ドメインモデルの微調整によって精度がわずかに向上するが、ドメイン間でよく機能するマルチドメインモデルを利用することで、計算資源とコストを削減できる。

Review score prediction requires review text understanding, a critical real-world application of natural language processing. Due to dissimilar text domains in product reviews, a common practice is fine-tuning BERT models upon reviews of differing domains. However, there has not yet been an empirical study of cross-domain behaviors of BERT models in the various tasks of product review understanding. In this project, we investigate text classification BERT models fine-tuned on single-domain and multi-domain Amazon review data. In our findings, though single-domain models achieved marginally improved performance on their corresponding domain compared to multi-domain models, multi-domain models outperformed single-domain models when evaluated on multi-domain data, single-domain data the single-domain model was not fine-tuned on, and on average when considering all tests. Though slight increases in accuracy can be achieved through single-domain model fine-tuning, computational resources and costs can be reduced by utilizing multi-domain models that perform well across domains.

翻訳日:2023-06-29 11:22:28 公開日:2023-06-28

# 拡張カルマンフィルタを用いたストリーミング量子ゲートトモグラフィ

Streaming quantum gate set tomography using the extended Kalman filter ( http://arxiv.org/abs/2306.15116v2 )

ライセンス: Link先を確認

J. P. Marceaux and Kevin Young

(参考訳) 量子プロセッサのリアルタイム校正のためのクローズドループ制御アルゴリズムは、測定された量子回路結果のストリームに基づいて物理誤差パラメータを推定できる効率的なフィルタを必要とする。このようなフィルタの開発は、観測された回路結果と初歩誤差の大きさとの非線形関係が複雑である。本研究では,量子ゲート集合トモグラフィのデータに対して拡張カルマンフィルタを適用し,システム誤差モデルとその不確かさをストリーミング推定する。我々の数値的な例から、拡張カルマンフィルタは最大推定値と同等の性能が得られるが、計算コストは劇的に低い。提案手法により, 標準ラップトップは1ビットと2ビットの回路結果を処理することができ, ゲートセットエラーモデルを現在の実験実行に匹敵する速度で更新することができる。

Closed-loop control algorithms for real-time calibration of quantum processors require efficient filters that can estimate physical error parameters based on streams of measured quantum circuit outcomes. Development of such filters is complicated by the highly nonlinear relationship relationship between observed circuit outcomes and the magnitudes of elementary errors. In this work, we apply the extended Kalman filter to data from quantum gate set tomography to provide a streaming estimator of the both the system error model and its uncertainties. Our numerical examples indicate extended Kalman filtering can achieve similar performance to maximum likelihood estimation, but with dramatically lower computational cost. With our methods, a standard laptop can process one- and two-qubit circuit outcomes and update gate set error model at rates comparable with current experimental execution.

翻訳日:2023-06-29 11:22:11 公開日:2023-06-28

# フラクタル場理論における量子クエンチ

Quantum quenches in fractonic field theories ( http://arxiv.org/abs/2306.14951v2 )

ライセンス: Link先を確認

Dmitry S. Ageev and Vasilii V. Pushkarev

(参考訳) フラクトロンスカラー場理論における大域量子クエンチによる平衡外ダイナミクスについて検討する。数種類のクエンチ、特に離散回転対称性の異なる理論における質量クエンチ(\mathbb{z}_4$ および $\mathbb{z}_8$)とそれらの間の遷移による瞬時クエンチを考える。また, ユークリッド時間に有限幅スラブ上に初期状態が作成されるフラクタル境界クエンチについても検討した。有限体積におけるフラクトロン系の摂動は、特に、特定の$\mathbb{Z}_4$-対称空間構造の形成とその後の進化を通じて制限されたモビリティを強調する。我々は$\mathbb{Z}_n$-対称場理論への一般化について議論し、適切な正則化を導入し、フラクトロン場理論に固有の発散を明示的に扱うことができる。

We study out-of-equilibrium dynamics caused by global quantum quenches in fractonic scalar field theories. We consider several types of quenches, in particular, the mass quench in theories with different types of discrete rotational symmetries ($\mathbb{Z}_4$ and $\mathbb{Z}_8$), as well as an instantaneous quench via the transition between them. We also investigate fractonic boundary quenches, where the initial state is prepared on a finite-width slab in Euclidean time. We find that perturbing a fractonic system in finite volume especially highlights the restricted mobility via the formation and subsequent evolution of specific $\mathbb{Z}_4$-symmetric spatial structures. We discuss a generalization to $\mathbb{Z}_n$-symmetric field theories, and introduce a proper regularization, which allows us to explicitly deal with divergences inherent to fractonic field theories.

翻訳日:2023-06-29 11:21:58 公開日:2023-06-28

# phd論文:認知とコンピュータビジョンのアーキテクチャにおける(自己)アテンションの役割を探求する

PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture ( http://arxiv.org/abs/2306.14650v2 )

ライセンス: Link先を確認

Mohit Vaishnav

(参考訳) 複雑な推論タスクにおける注意と記憶の役割について検討する。トランスフォーマーに基づく自己認識をモデルとして分析し,メモリで拡張する。合成視覚的推論テストの研究により、推論タスクの分類を洗練する。 resnet50にセルフ・アテンションを組み込んだ機能マップを機能ベースおよび空間的注意力を用いて拡張し,視覚的推論課題を効率的に解決する。本研究は,SVRTタスクの注意的ニーズの理解に寄与する。さらに,アクティブビジョン理論に触発された注意と記憶を組み合わせた認知アーキテクチャGAMRを提案する。 GAMRはサンプル効率、堅牢性、構成性において他のアーキテクチャよりも優れており、新しい推論タスクにおいてゼロショットの一般化を示す。

We investigate the role of attention and memory in complex reasoning tasks. We analyze Transformer-based self-attention as a model and extend it with memory. By studying a synthetic visual reasoning test, we refine the taxonomy of reasoning tasks. Incorporating self-attention with ResNet50, we enhance feature maps using feature-based and spatial attention, achieving efficient solving of challenging visual reasoning tasks. Our findings contribute to understanding the attentional needs of SVRT tasks. Additionally, we propose GAMR, a cognitive architecture combining attention and memory, inspired by active vision theory. GAMR outperforms other architectures in sample efficiency, robustness, and compositionality, and shows zero-shot generalization on new reasoning tasks.

翻訳日:2023-06-29 11:20:56 公開日:2023-06-28

# stylegan 埋め込み画像を用いた癌予後予測のための深層学習

Deep Learning for Cancer Prognosis Prediction Using Portrait Photos by StyleGAN Embedding ( http://arxiv.org/abs/2306.14596v2 )

ライセンス: Link先を確認

Amr Hagag, Ahmed Gomaa, Dominik Kornek, Andreas Maier, Rainer Fietkau, Christoph Bert, Florian Putz and Yixing Huang

(参考訳) がん患者の生存予測は最適な治療選択と患者管理に重要である。現在の患者生存予測法は、典型的には患者の臨床記録データまたは生物学的および画像データから生存情報を抽出する。実際に、経験豊富な臨床医は、主に顔の特徴である観察可能な身体的外観に基づいて、患者の健康状態の予備評価を行うことができる。しかし、この評価は非常に主観的である。本研究は,従来のポートレート写真に含まれる予測情報を,深層学習を用いて客観的に捉え,活用する効果について初めて検討した。事前トレーニングされたStyleGAN2モデルは、がん患者の写真のカスタムデータセットに基づいて微調整され、患者の写真に合った生成能力で生成する。 StyleGAN2は、写真を非常に表現力のある潜伏空間に埋め込むために使用される。最先端のサバイバル分析モデルと、styleganの潜在空間写真埋め込みに基づいて、このアプローチは0.677のc-インデックスを達成し、これは単純な2d顔画像に埋め込まれた予測値よりも顕著に高い。さらに、StyleGANの解釈可能な潜伏空間のおかげで、我々の生存予測モデルは、重要な顔の特徴に依存し、衣服や背景などの外部情報からのバイアスを排除できる。さらに、患者のケアに重要な電位値を有する回帰係数から健康属性を求める。

Survival prediction for cancer patients is critical for optimal treatment selection and patient management. Current patient survival prediction methods typically extract survival information from patients' clinical record data or biological and imaging data. In practice, experienced clinicians can have a preliminary assessment of patients' health status based on patients' observable physical appearances, which are mainly facial features. However, such assessment is highly subjective. In this work, the efficacy of objectively capturing and using prognostic information contained in conventional portrait photographs using deep learning for survival predication purposes is investigated for the first time. A pre-trained StyleGAN2 model is fine-tuned on a custom dataset of our cancer patients' photos to empower its generator with generative ability suitable for patients' photos. The StyleGAN2 is then used to embed the photographs to its highly expressive latent space. Utilizing the state-of-the-art survival analysis models and based on StyleGAN's latent space photo embeddings, this approach achieved a C-index of 0.677, which is notably higher than chance and evidencing the prognostic value embedded in simple 2D facial images. In addition, thanks to StyleGAN's interpretable latent space, our survival prediction model can be validated for relying on essential facial features, eliminating any biases from extraneous information like clothing or background. Moreover, a health attribute is obtained from regression coefficients, which has important potential value for patient care.

翻訳日:2023-06-29 11:20:40 公開日:2023-06-28

# PoseDiffusion: Diffusion-aided Bundle Adjustment によるPose推定の解法

PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment ( http://arxiv.org/abs/2306.15667v2 )

ライセンス: Link先を確認

Jianyuan Wang, Christian Rupprecht, David Novotny

(参考訳) カメラポーズ推定は、従来は手作りのキーポイントマッチング、RANSAC、バンドル調整といった古典的な手法に依存していたコンピュータビジョンの問題である。本稿では,入力画像に対するカメラポーズの条件分布をモデル化し,確率拡散フレームワーク内の運動からの構造 (sfm) を定式化する。古い問題に対するこの新しい見方にはいくつかの利点がある。 (i)拡散フレームワークの性質は、バンドル調整の反復手順を反映している。 (ii)この定式化はエピポーラ幾何学からの幾何学的制約のシームレスな統合を可能にする。 (iii)広い基準線を持つスパースビューのような典型的な難易度シナリオに優れる。 (iv)任意の量の画像に対して内在性及び外在性を予測することができる。提案手法は,従来のSfMパイプラインと実世界の2つのデータセットに対する学習アプローチよりも大幅に改善されていることを示す。最後に,本手法がさらなるトレーニングを行なわずにデータセットをまたいで一般化できることが観察された。プロジェクトページ: https://posediffusion.github.io/

Camera pose estimation is a long-standing computer vision problem that to date often relies on classical methods, such as handcrafted keypoint matching, RANSAC and bundle adjustment. In this paper, we propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework, modelling the conditional distribution of camera poses given input images. This novel view of an old problem has several advantages. (i) The nature of the diffusion framework mirrors the iterative procedure of bundle adjustment. (ii) The formulation allows a seamless integration of geometric constraints from epipolar geometry. (iii) It excels in typically difficult scenarios such as sparse views with wide baselines. (iv) The method can predict intrinsics and extrinsics for an arbitrary amount of images. We demonstrate that our method PoseDiffusion significantly improves over the classic SfM pipelines and the learned approaches on two real-world datasets. Finally, it is observed that our method can generalize across datasets without further training. Project page: https://posediffusion.github.io/

翻訳日:2023-06-29 11:13:30 公開日:2023-06-28

# フランス語物語における直接音声の自動アノテーション

Automatic Annotation of Direct Speech in Written French Narratives ( http://arxiv.org/abs/2306.15634v2 )

ライセンス: Link先を確認

No\'e Durandard and Viet-Anh Tran and Gaspard Michel and Elena V. Epure

(参考訳) テキスト中の直接音声(aads)の自動注釈は、しばしば計算的な物語理解に使われている。ルールやディープニューラルネットワークに基づく手法は、特に英語やドイツ語で研究されている。しかし、フランス語では、我々の対象とする言語は多くはない。私たちのゴールは、フランス語でAADSモデルを設計、評価するための統一されたフレームワークを作ることです。そこで我々は,一語あたりのDSに注釈付けされた最大かつ最新のフランス語物語データセットを統合し,他の言語でのシーケンスラベリングやAADSから様々なベースラインを適応させ,一般化に焦点を当てた広範な評価を行った。結果は,タスクにはまだかなりの努力が必要であり,各ベースラインの特徴を強調していることを示している。このフレームワークは改善される可能性があるが、このトピックに関するさらなる研究を促進するための一歩である。

The automatic annotation of direct speech (AADS) in written text has been often used in computational narrative understanding. Methods based on either rules or deep neural networks have been explored, in particular for English or German languages. Yet, for French, our target language, not many works exist. Our goal is to create a unified framework to design and evaluate AADS models in French. For this, we consolidated the largest-to-date French narrative dataset annotated with DS per word; we adapted various baselines for sequence labelling or from AADS in other languages; and we designed and conducted an extensive evaluation focused on generalisation. Results show that the task still requires substantial efforts and emphasise characteristics of each baseline. Although this framework could be improved, it is a step further to encourage more research on the topic.

翻訳日:2023-06-29 11:13:15 公開日:2023-06-28

# コサイクルを用いた非同期アルゴリズムアライメント

Asynchronous Algorithmic Alignment with Cocycles ( http://arxiv.org/abs/2306.15632v2 )

ライセンス: Link先を確認

Andrew Dudzik, Tamara von Glehn, Razvan Pascanu, Petar Veli\v{c}kovi\'c

(参考訳) 最先端のニューラルネットワーク推論器は、グラフニューラルネットワーク(GNN)でメッセージパッシングを利用する。しかし、典型的なgnnはメッセージ関数の定義と呼び出しの区別を曖昧にし、ノードが各レイヤの近隣にメッセージを同期的に送らなければならない。しかし、動的プログラミングアルゴリズムの実行を学ぶためにGNNを適用する場合、ほとんどのステップでは、送信すべき意味のあるアップデートはノードのごく一部に限られる。したがって、多くの中間gnnステップがid関数を学ばなければならないため、グラフ全体にあまりにも多くの無関係なデータを送信することで、非効率なリスクを負うことになる。この作業では、ノードの状態更新とメッセージ関数呼び出しの概念を明示的に分離します。この分離により、アルゴリズムとニューラルネットワークの両方で非同期計算を推論できる数学的定式化が得られる。

State-of-the-art neural algorithmic reasoners make use of message passing in graph neural networks (GNNs). But typical GNNs blur the distinction between the definition and invocation of the message function, forcing a node to send messages to its neighbours at every layer, synchronously. When applying GNNs to learn to execute dynamic programming algorithms, however, on most steps only a handful of the nodes would have meaningful updates to send. One, hence, runs the risk of inefficiencies by sending too much irrelevant data across the graph -- with many intermediate GNN steps having to learn identity functions. In this work, we explicitly separate the concepts of node state update and message function invocation. With this separation, we obtain a mathematical formulation that allows us to reason about asynchronous computation in both algorithms and neural networks.

翻訳日:2023-06-29 11:13:03 公開日:2023-06-28

# 位置補間による大規模言語モデルのコンテキストウィンドウの拡張

Extending Context Window of Large Language Models via Positional Interpolation ( http://arxiv.org/abs/2306.15595v2 )

ライセンス: Link先を確認

Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian

(参考訳) LLaMAモデルのようなRoPEベースで事前訓練されたLLMのコンテキストウィンドウサイズを、最小限の微調整(1000ステップ以内)で最大32768まで拡張し、パスキー検索、言語モデリング、LLaMA 7Bから65Bまでの長い文書要約などの長いコンテキストを必要とするタスクに対して強力な実験結果を示す。一方、位置補間による拡張モデルは、元のコンテキストウィンドウ内のタスクの質を比較的よく保っている。この目的を達成するために、位置補間は入力位置指標を線形にダウンスケールし、トレーニングされたコンテキスト長を超えて外挿するのではなく、自己保持機構を完全に破壊する破滅的な高い注意スコアを与える。我々の理論的研究は、補間上限が少なくとも$\sim 600 \times$は外挿限界よりも小さいことを示し、その安定性を示している。位置補間によって拡張されたモデルは元のアーキテクチャを維持し、既存の最適化とインフラを再利用することができる。

We present Position Interpolation (PI) that extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within 1000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization from LLaMA 7B to 65B. Meanwhile, the extended model by Position Interpolation preserve quality relatively well on tasks within its original context window. To achieve this goal, Position Interpolation linearly down-scales the input position indices to match the original context window size, rather than extrapolating beyond the trained context length which may lead to catastrophically high attention scores that completely ruin the self-attention mechanism. Our theoretical study shows that the upper bound of interpolation is at least $\sim 600 \times$ smaller than that of extrapolation, further demonstrating its stability. Models extended via Position Interpolation retain its original architecture and can reuse most pre-existing optimization and infrastructure.

翻訳日:2023-06-29 11:12:50 公開日:2023-06-28

# 幾何超音波局在顕微鏡

Geometric Ultrasound Localization Microscopy ( http://arxiv.org/abs/2306.15548v2 )

ライセンス: Link先を確認

Christopher Hahne and Raphael Sznitman

(参考訳) 造影超音波(CEUS)は、医学診断における非侵襲的、動的可視化の有効な方法となっているが、超音波局在顕微鏡(ULM)は10倍の高分解能を提供することで、画期的なブレークスルーを実現している。現在までに、遅延アンドサム(DAS)ビームフォーマを使用してULMフレームをレンダリングし、最終的に画像解像度の能力を決定する。 ULMを最大限に活用するために,本研究では,ビームフォーミングがULMの最も効果的な処理ステップであるかどうかを疑問視し,TDoA情報のみに依存する代替手法を提案する。この目的のために, 既存のビームフォーミング限界を克服するために, 楕円交差による微小気泡局在のための新しい幾何学的枠組みを提案する。本稿では,既存のベースライン法よりも精度と信頼性の面で優れており,利用可能なトランスデューサデータの一部のみを活用できる公開データセットに基づくベンチマーク比較を行う。

Contrast-Enhanced Ultra-Sound (CEUS) has become a viable method for non-invasive, dynamic visualization in medical diagnostics, yet Ultrasound Localization Microscopy (ULM) has enabled a revolutionary breakthrough by offering ten times higher resolution. To date, Delay-And-Sum (DAS) beamformers are used to render ULM frames, ultimately determining the image resolution capability. To take full advantage of ULM, this study questions whether beamforming is the most effective processing step for ULM, suggesting an alternative approach that relies solely on Time-Difference-of-Arrival (TDoA) information. To this end, a novel geometric framework for micro bubble localization via ellipse intersections is proposed to overcome existing beamforming limitations. We present a benchmark comparison based on a public dataset for which our geometric ULM outperforms existing baseline methods in terms of accuracy and reliability while only utilizing a portion of the available transducer data.

翻訳日:2023-06-29 11:12:30 公開日:2023-06-28

# 不規則時系列の事前異常検出

Precursor-of-Anomaly Detection for Irregular Time Series ( http://arxiv.org/abs/2306.15489v2 )

ライセンス: Link先を確認

Sheo Yon Jhin, Jaehoon Lee, Noseong Park

(参考訳) 異常検出は予期せぬパターンやデータポイントを特定することを目的とした重要な分野であり、金融、製造、サイバーセキュリティなどにおける多くの現実世界の問題と密接に関連している。様々な分野で異常検出が広く研究されているが、今後の異常検出は未発見領域のままである。本稿では,新しい種類の異常検出手法であるemph{\textbf{P}recursor-of-\textbf{A}nomaly (PoA) を提案する。特定の時系列観測が異常であるか否かを決定する従来の異常検出とは異なり、PoA検出は将来の異常を検出することを目的としている。両課題を同時に解決するために,ニューラル制御による微分方程式に基づくニューラルネットワークとそのマルチタスク学習アルゴリズムを提案する。 17のベースラインと3つのデータセットを使って、規則的および不規則な時系列を含む実験を行い、提案手法がほぼすべてのケースでベースラインを上回ることを実証した。また, マルチタスクトレーニング手法は, 異常検出とpoa検出の両方において, 全体的な性能を著しく向上させることが示唆された。

Anomaly detection is an important field that aims to identify unexpected patterns or data points, and it is closely related to many real-world problems, particularly to applications in finance, manufacturing, cyber security, and so on. While anomaly detection has been studied extensively in various fields, detecting future anomalies before they occur remains an unexplored territory. In this paper, we present a novel type of anomaly detection, called \emph{\textbf{P}recursor-of-\textbf{A}nomaly} (PoA) detection. Unlike conventional anomaly detection, which focuses on determining whether a given time series observation is an anomaly or not, PoA detection aims to detect future anomalies before they happen. To solve both problems at the same time, we present a neural controlled differential equation-based neural network and its multi-task learning algorithm. We conduct experiments using 17 baselines and 3 datasets, including regular and irregular time series, and demonstrate that our presented method outperforms the baselines in almost all cases. Our ablation studies also indicate that the multitasking training method significantly enhances the overall performance for both anomaly and PoA detection.

翻訳日:2023-06-29 11:12:13 公開日:2023-06-28

# フリースタイル・高速3次元ポートレート合成

Free-style and Fast 3D Portrait Synthesis ( http://arxiv.org/abs/2306.15419v2 )

ライセンス: Link先を確認

Tianxiang Ma, Kang Zhao, Jianxin Sun, Jing Dong, Tieniu Tan

(参考訳) 高品質で一貫性のあるフリースタイルの3Dポートレートを効果的に生成することは、有望だが難しい課題だ。既存のほとんどのメソッドで生成されるポートレートスタイルは通常、FFHQのような特定の顔データセットで学習される3Dジェネレータによって制限される。フリースタイルの3Dポートレートを得るには、大規模なマルチスタイルデータベースを構築して3Dジェネレータを再トレーニングするか、あるいはオフザシェルフツールを使ってスタイル翻訳を行うことができる。しかし、データ収集とトレーニングプロセスのために前者は時間がかかり、後者はマルチビューの一貫性を損なう可能性がある。この問題に対処するため,本論文では,テキストプロンプトを用いてスタイルを指定可能な高速な3次元肖像画合成フレームワークを提案する。具体的には、3d対応ganジェネレータ (eg3d) とテキスト誘導画像エディタ (ip2p) の2つの生成前処理を利用して、数発のトレーニングセットを迅速に構築し、ip2pの推論プロセスを最適化し、編集をより安定させる。次に、EG3Dの原型三葉機を2つの目的のためにImage-to-Triplane (I2T)モジュールに置き換える。 1) 少数ショットデータセット上でI2Tを微調整することにより,事前訓練したEG3Dのスタイル制約を解消する。 2) I2Tを除くEG3Dのすべての部分の固定による訓練効率の向上。さらに,本手法のスケーラビリティと一般化を実証するために,マルチスタイルかつマルチidentity 3dポートレートデータベースを構築した。実験の結果,高品質な3dポートレートを数分で合成でき,最新技術に匹敵することがわかった。

Efficiently generating a free-style 3D portrait with high quality and consistency is a promising yet challenging task. The portrait styles generated by most existing methods are usually restricted by their 3D generators, which are learned in specific facial datasets, such as FFHQ. To get a free-style 3D portrait, one can build a large-scale multi-style database to retrain the 3D generator, or use a off-the-shelf tool to do the style translation. However, the former is time-consuming due to data collection and training process, the latter may destroy the multi-view consistency. To tackle this problem, we propose a fast 3D portrait synthesis framework in this paper, which enable one to use text prompts to specify styles. Specifically, for a given portrait style, we first leverage two generative priors, a 3D-aware GAN generator (EG3D) and a text-guided image editor (Ip2p), to quickly construct a few-shot training set, where the inference process of Ip2p is optimized to make editing more stable. Then we replace original triplane generator of EG3D with a Image-to-Triplane (I2T) module for two purposes: 1) getting rid of the style constraints of pre-trained EG3D by fine-tuning I2T on the few-shot dataset; 2) improving training efficiency by fixing all parts of EG3D except I2T. Furthermore, we construct a multi-style and multi-identity 3D portrait database to demonstrate the scalability and generalization of our method. Experimental results show that our method is capable of synthesizing high-quality 3D portraits with specified styles in a few minutes, outperforming the state-of-the-art.

翻訳日:2023-06-29 11:11:52 公開日:2023-06-28

# TrickVOS:ビデオオブジェクトセグメンテーションのためのトリックの袋

TrickVOS: A Bag of Tricks for Video Object Segmentation ( http://arxiv.org/abs/2306.15377v2 )

ライセンス: Link先を確認

Evangelos Skartados, Konstantinos Georgiadis, Mehmet Kerim Yucel, Koskinas Ioannis, Armando Domi, Anastasios Drosou, Bruno Manganelli, Albert Saa-Garriga

(参考訳) 空間時間メモリ(STM)ネットワーク手法は,その性能上,半教師付きビデオオブジェクトセグメンテーション(SVOS)において支配的であった。本研究では,このような手法を改良できる3つの重要な側面を同定する。一監督信号二事前訓練及び訓練 iii) 空間意識。次に、各側面に対処できる汎用的なメソッドに依存しないトリックバッグであるtrickvosを提案する。一構造対応ハイブリッド損失二簡易復号機事前訓練体制及び三モデル予測に空間的制約を課す安価な追跡装置最後に、軽量なネットワークを提案し、TrickVOSでトレーニングすると、DAVISとYouTubeベンチマークの最先端メソッドと競合する結果が得られ、モバイルデバイス上でリアルタイムに実行できるSTMベースのSVOSメソッドの1つであることを示す。

Space-time memory (STM) network methods have been dominant in semi-supervised video object segmentation (SVOS) due to their remarkable performance. In this work, we identify three key aspects where we can improve such methods; i) supervisory signal, ii) pretraining and iii) spatial awareness. We then propose TrickVOS; a generic, method-agnostic bag of tricks addressing each aspect with i) a structure-aware hybrid loss, ii) a simple decoder pretraining regime and iii) a cheap tracker that imposes spatial constraints in model predictions. Finally, we propose a lightweight network and show that when trained with TrickVOS, it achieves competitive results to state-of-the-art methods on DAVIS and YouTube benchmarks, while being one of the first STM-based SVOS methods that can run in real-time on a mobile device.

翻訳日:2023-06-29 11:11:25 公開日:2023-06-28

# 3D-Speaker: 大規模マルチデバイス, マルチディスタンス, マルチディレクトコーパスによる音声表現遠絡

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement ( http://arxiv.org/abs/2306.15354v2 )

ライセンス: Link先を確認

Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang, Qian Chen

(参考訳) 発話における非相関情報の拡散は、音声コミュニティにおいて重要な研究課題である。異なる音声関連タスクは、他の非相関情報の影響を最小限に抑えながら、異なる音声表現を抽出することに焦点を当てる。本稿では,音声表現のゆがみの研究を容易にするための大規模音声コーパスを提案する。 3D-Speakerには10,000人以上のスピーカーが含まれており、それぞれが複数のデバイスによって同時に記録され、異なる距離に配置されている。多次元オーディオデータの制御された組み合わせは、多様な音声表現の絡み合いの混合のマトリックスを生じさせ、興味をそそる方法の動機付けとなる。 3D-Speakerのマルチドメインの性質は、ドメイン外学習と自己教師型学習の大規模な普遍的な音声モデルと実験方法を評価するのに適している。 https://3dspeaker.github.io/

Disentangling uncorrelated information in speech utterances is a crucial research topic within speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the affects of other uncorrelated information. We present a large-scale speech corpus to facilitate the research of speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom are simultaneously recorded by multiple Devices, locating at different Distances, and some speakers are speaking multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of a diverse blend of speech representation entanglement, thereby motivating intriguing methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource to evaluate large universal speech models and experiment methods of out-of-domain learning and self-supervised learning. https://3dspeaker.github.io/

翻訳日:2023-06-29 11:11:11 公開日:2023-06-28

# 時系列モデリングのための変分潜在離散表現

Variational Latent Discrete Representation for Time Series Modelling ( http://arxiv.org/abs/2306.15282v2 )

ライセンス: Link先を確認

Max Cohen, Maurice Charbit, Sylvain Le Corff

(参考訳) 離散潜在空間モデルは、最近、深部変分推論における連続的な空間と同等の性能を達成した。彼らはまだ様々な実装課題に直面しているが、これらのモデルは自然に離散的な現象をより直接的に表現するだけでなく、潜在空間をよりよく解釈する機会を提供する。最近のアプローチでは、離散潜在データ上で非常に高次元の事前モデルを個別に訓練することを提案している。本稿では、離散状態がマルコフ連鎖であり、高速なエンドツーエンドトレーニングを可能にする潜在データモデルを提案する。生成モデルの性能はビル管理データセットと一般公開されているElectricity Transformer Datasetに基づいて評価する。

Discrete latent space models have recently achieved performance on par with their continuous counterparts in deep variational inference. While they still face various implementation challenges, these models offer the opportunity for a better interpretation of latent spaces, as well as a more direct representation of naturally discrete phenomena. Most recent approaches propose to train separately very high-dimensional prior models on the discrete latent data which is a challenging task on its own. In this paper, we introduce a latent data model where the discrete state is a Markov chain, which allows fast end-to-end training. The performance of our generative model is assessed on a building management dataset and on the publicly available Electricity Transformer Dataset.

翻訳日:2023-06-29 11:10:54 公開日:2023-06-28

PDF登録状況（公開日: 20230628）