Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20230513となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# プログラムの自動修復手法に関する調査研究 A Survey on Automated Program Repair Techniques ( http://arxiv.org/abs/2303.18184v3 ) ライセンス: Link先を確認	Kai Huang, Zhengzi Xu, Su Yang, Hongyu Sun, Xuejun Li, Zheng Yan, Yuqing Zhang	(参考訳) プログラムソフトウェアの急速な開発と大規模な普及により、現代社会はますますソフトウェアシステムに依存している。しかし、ソフトウェアによってもたらされる問題は表面化している。ソフトウェア欠陥は、開発者を悩ませる重要な要因になっている。この文脈では、自動プログラム修復(automated program repair, apr)技術が登場し、ソフトウェアの欠陥問題を自動修正し、手動デバッグ作業を減らすことを目的としている。特に、ディープラーニングの進歩により、近年多くの学習ベースのAPR技術が出現し、APR研究の新たな機会ももたらされている。研究者は、APR技術の完全な発展と今後の展望を素早く概説するため、APR技術の進化を再考し、APR研究の最新の進歩について深く議論する。本稿では,検索ベース,制約ベース,テンプレートベース,学習ベースという4つの異なるパッチ生成方式を用いて,APR手法の開発を紹介する。さらに,各APRツールをレビュー・比較するための一貫した基準セットを提案し,APR技術の利点とデメリットをまとめた上で,APR開発の現状について論じる。さらに,本研究は,APR開発を進める大きな動機となった,APRの関連技術分野の研究についても紹介する。最後に,現状の課題と今後の方向性を分析し,特に大規模言語モデルがapr研究にもたらした重要な機会を強調する。 With the rapid development and large-scale popularity of program software, modern society increasingly relies on software systems. However, the problems exposed by software have also come to the fore. Software defect has become an important factor troubling developers. In this context, Automated Program Repair (APR) techniques have emerged, aiming to automatically fix software defect problems and reduce manual debugging work. In particular, benefiting from the advances in deep learning, numerous learning-based APR techniques have emerged in recent years, which also bring new opportunities for APR research. To give researchers a quick overview of APR techniques' complete development and future opportunities, we revisit the evolution of APR techniques and discuss in depth the latest advances in APR research. In this paper, the development of APR techniques is introduced in terms of four different patch generation schemes: search-based, constraint-based, template-based, and learning-based. Moreover, we propose a uniform set of criteria to review and compare each APR tool, summarize the advantages and disadvantages of APR techniques, and discuss the current state of APR development. Furthermore, we introduce the research on the related technical areas of APR that have also provided a strong motivation to advance APR development. Finally, we analyze current challenges and future directions, especially highlighting the critical opportunities that large language models bring to APR research.	翻訳日:2023-10-24 12:56:27 公開日:2023-05-13
# AR/VRテクノロジスタック: ソフトウェア開発ライブラリ、プラットフォーム、ツールの中心的なリポジトリ The AR/VR Technology Stack: A Central Repository of Software Development Libraries, Platforms, and Tools ( http://arxiv.org/abs/2305.07842v1 ) ライセンス: Link先を確認	Jasmine Roberts	(参考訳) 拡張現実、バーチャル、複合現実の領域に特化して、ソフトウェア開発ライブラリ、プラットフォーム、ツールの包括的なリポジトリ。 A comprehensive repository of software development libraries, platforms, and tools specifically to the domains of augmented, virtual, and mixed reality.	翻訳日:2023-10-24 08:55:30 公開日:2023-05-13
# 形式的手法によるXAI神話の否定 -初期結果- Disproving XAI Myths with Formal Methods -- Initial Results ( http://arxiv.org/abs/2306.01744v1 ) ライセンス: Link先を確認	Joao Marques-Silva	(参考訳) 近年の機械学習(ML)の進歩は印象的かつ広範囲に及んでいる。しかしながら、MLモデルのデプロイは、最高のパフォーマンスを持つMLモデルの予測方法に対する信頼の欠如によって、依然として損なわれている。信頼の欠如の問題は、高リスクまたは安全クリティカルな領域におけるmlモデルの使用においてさらに深刻である。 eXplainable AI(XAI)は、信頼できるAIを提供するための継続的な取り組みの中核にある。残念ながら、XAIは信頼を構築する代わりに不信を育むという批判的な誤解に打ち消されている。本稿は、XAIにおける最も目に見える誤解のいくつかを詳述し、これらの誤解を否定するためにも、実用的な代替案を考案するためにも、形式的手法がどのように使われたかを示す。 The advances in Machine Learning (ML) in recent years have been both impressive and far-reaching. However, the deployment of ML models is still impaired by a lack of trust in how the best-performing ML models make predictions. The issue of lack of trust is even more acute in the uses of ML models in high-risk or safety-critical domains. eXplainable artificial intelligence (XAI) is at the core of ongoing efforts for delivering trustworthy AI. Unfortunately, XAI is riddled with critical misconceptions, that foster distrust instead of building trust. This paper details some of the most visible misconceptions in XAI, and shows how formal methods have been used, both to disprove those misconceptions, but also to devise practically effective alternatives.	翻訳日:2023-06-11 14:04:56 公開日:2023-05-13
# 消失する活性化:深部カプセルネットワークの症状 Vanishing Activations: A Symptom of Deep Capsule Networks ( http://arxiv.org/abs/2305.11178v1 ) ライセンス: Link先を確認	Miles Everett, Mingjun Zhong and Georgios Leontidis	(参考訳) カプセルネットワークは、スカラーの代わりにベクトルや行列表現を利用したニューラルネットワークの拡張であり、当初、視覚概念が部分から完全なオブジェクトへと進化する動的解析木を作成するために開発された。カプセルネットワークの初期実装は、さまざまなデータセットで最先端の成果を達成し、維持する。しかし、最近の研究では、パースツリーの構築に失敗したことと、より深いネットワークに展開する際の勾配の消失に対する感受性など、オリジナルのCapsule Networkアーキテクチャの欠点が明らかにされている。本稿では,本研究を主要なCapsule Networkアーキテクチャにまで拡張し,これらの課題が元の設計に限らないことを示す。カプセルネットワーク研究の大多数は、元々のカプセルネットワークとはやや異なるが、根本的に類似した構造を保っているアーキテクチャを生み出していると論じている。この設計上の類似性がカプセルネットワークのスケーラビリティを阻害している可能性がある。本研究は,Capsule Networksの堅牢性とスケーラビリティ向上に関する広範な議論に寄与する。 Capsule Networks, an extension to Neural Networks utilizing vector or matrix representations instead of scalars, were initially developed to create a dynamic parse tree where visual concepts evolve from parts to complete objects. Early implementations of Capsule Networks achieved and maintain state-of-the-art results on various datasets. However, recent studies have revealed shortcomings in the original Capsule Network architecture, notably its failure to construct a parse tree and its susceptibility to vanishing gradients when deployed in deeper networks. This paper extends the investigation to a range of leading Capsule Network architectures, demonstrating that these issues are not confined to the original design. We argue that the majority of Capsule Network research has produced architectures that, while modestly divergent from the original Capsule Network, still retain a fundamentally similar structure. We posit that this inherent design similarity might be impeding the scalability of Capsule Networks. Our study contributes to the broader discussion on improving the robustness and scalability of Capsule Networks.	翻訳日:2023-05-28 05:36:22 公開日:2023-05-13
# CBAGAN-RRT:Samping-based Path Planningのための畳み込みブロック注意生成支援ネットワーク CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning ( http://arxiv.org/abs/2305.10442v1 ) ライセンス: Link先を確認	Abhinav Sagar, Sai Teja Gilukara	(参考訳) サンプリングに基づく経路計画アルゴリズムは自律ロボットにおいて重要な役割を果たす。しかし、RTRベースのアルゴリズムで共通する問題は、生成した初期経路が最適ではなく、収束が遅すぎて現実のアプリケーションでは利用できないことである。本稿では,空間的およびチャネル的注意と新たな損失関数を組み合わせた畳み込みブロック的注意生成逆ネットワークを用いた新しい画像ベース学習アルゴリズム(cbagan-rrt)を提案する。我々のganモデルから生成される経路の確率分布は、rrtアルゴリズムのサンプリングプロセスを導くために用いられる。我々は, \cite{zhang2021generative} が生成したデータセット上でネットワークをトレーニングし, IOU Score, Dice Score, FIDスコア, 時間コストやノード数といったパス計画指標を用いて, 従来の最先端アルゴリズムよりも優れた性能を示す。我々は,本研究の実現可能性を示すため,詳細な実験とアブレーション実験を行い,本モデルがトレーニングデータセットだけでなく,未発見のテストデータセット上でも良好に機能することを示す。このアプローチの利点は、状態空間における複雑な前処理を回避でき、精度を損なうことなくターンや狭い通路を含むような複雑な環境に一般化でき、我々のモデルはサンプリングに基づく他の経路計画アルゴリズムと容易に統合できることである。 Sampling-based path planning algorithms play an important role in autonomous robotics. However, a common problem among the RRT-based algorithms is that the initial path generated is not optimal and the convergence is too slow to be used in real-world applications. In this paper, we propose a novel image-based learning algorithm (CBAGAN-RRT) using a Convolutional Block Attention Generative Adversarial Network with a combination of spatial and channel attention and a novel loss function to design the heuristics, find a better optimal path, and improve the convergence of the algorithm both concerning time and speed. The probability distribution of the paths generated from our GAN model is used to guide the sampling process for the RRT algorithm. We train and test our network on the dataset generated by \cite{zhang2021generative} and demonstrate that our algorithm outperforms the previous state-of-the-art algorithms using both the image quality generation metrics like IOU Score, Dice Score, FID score, and path planning metrics like time cost and the number of nodes. We conduct detailed experiments and ablation studies to illustrate the feasibility of our study and show that our model performs well not only on the training dataset but also on the unseen test dataset. The advantage of our approach is that we can avoid the complicated preprocessing in the state space, our model can be generalized to complicated environments like those containing turns and narrow passages without loss of accuracy, and our model can be easily integrated with other sampling-based path planning algorithms.	翻訳日:2023-05-21 10:24:15 公開日:2023-05-13
# 医用samアダプタ : 医用画像分割のためのsegment anythingモデルの適用 Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation ( http://arxiv.org/abs/2304.12620v6 ) ライセンス: Link先を確認	Junde Wu and Yu Zhang and Rao Fu and Huihui Fang and Yuanpei Liu and Zhaowei Wang and Yanwu Xu and Yueming Jin	(参考訳) Segment Anything Model (SAM)は画像セグメンテーションの分野で最近人気を集めている。全面的なセグメンテーションタスクとプロンプトベースのインターフェースの素晴らしい機能のおかげで、SAMはコミュニティ内で激しい議論を巻き起こした。イメージセグメンテーションのタスクはSAMによって「完了」されたと多くの名高い専門家から言われている。しかし, イメージセグメンテーションは, イメージセグメンテーションファミリーの重要な分枝であるが, セグメンテーション"Anything"の範囲には含まれていないようである。多くの個人実験や最近の研究では、SAMは医療画像のセグメンテーションのサブパールを担っていることが示されている。自然な疑問は、SAMの強力なセグメンテーション能力を医療画像セグメンテーションに拡張するために、パズルの欠片を見つける方法である。本稿では,SAMモデルを微調整する代わりに,医療特化領域の知識をセグメンテーションモデルに統合するMed SAM Adapterを提案する。この単純な実装は、医療画像のセグメンテーションにおいて驚くほど優れた性能を示しており、一般的なNLP技術であるAdapterをコンピュータビジョンのケースに転送する試みの1つだ。医用SAMアダプタ (MSA) は, CT, MRI, 超音波画像, 眼底画像, 皮膚内視鏡画像など, 様々な画像モダリティを有する19の医用画像セグメンテーションタスクにおいて, 優れた性能を示した。 MSAは、nnUNet、TransUNet、UNetr、MedSegDiffのような幅広い最先端(SOTA)の医療画像セグメンテーション手法より優れており、また、完全に細返されたMedSAMよりもかなりパフォーマンスの差がある。コードは、https://github.com/WuJunde/Medical-SAM-Adapter.comでリリースされる。 The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation. Thanks to its impressive capabilities in all-round segmentation tasks and its prompt-based interface, SAM has sparked intensive discussion within the community. It is even said by many prestigious experts that image segmentation task has been "finished" by SAM. However, medical image segmentation, although an important branch of the image segmentation family, seems not to be included in the scope of Segmenting "Anything". Many individual experiments and recent studies have shown that SAM performs subpar in medical image segmentation. A natural question is how to find the missing piece of the puzzle to extend the strong segmentation capability of SAM to medical image segmentation. In this paper, instead of fine-tuning the SAM model, we propose Med SAM Adapter, which integrates the medical specific domain knowledge to the segmentation model, by a simple yet effective adaptation technique. Although this work is still one of a few to transfer the popular NLP technique Adapter to computer vision cases, this simple implementation shows surprisingly good performance on medical image segmentation. A medical image adapted SAM, which we have dubbed Medical SAM Adapter (MSA), shows superior performance on 19 medical image segmentation tasks with various image modalities including CT, MRI, ultrasound image, fundus image, and dermoscopic images. MSA outperforms a wide range of state-of-the-art (SOTA) medical image segmentation methods, such as nnUNet, TransUNet, UNetr, MedSegDiff, and also outperforms the fully fine-turned MedSAM with a considerable performance gap. Code will be released at: https://github.com/WuJunde/Medical-SAM-Adapter.	翻訳日:2023-05-18 19:31:04 公開日:2023-05-13
# モデル整合性検証のための決定に基づく繰り返しフラクティブ透かし Decision-based iterative fragile watermarking for model integrity verification ( http://arxiv.org/abs/2305.09684v1 ) ライセンス: Link先を確認	Zhaoxia Yin, Heng Yin, Hang Su, Xinpeng Zhang, Zhenzhe Gao	(参考訳) 通常、ファンデーションモデルは彼らのサービスに対する高い需要を満たすためにクラウドサーバーにホストされる。しかしこれは、アタッカーがクラウドにアップロードしたり、ローカルシステムから転送した後で修正できるため、セキュリティ上のリスクにさらされる。そこで本研究では,通常のトレーニングサンプルをモデル変更に敏感な脆弱なサンプルに変換する反復的決定ベース脆弱性透かしアルゴリズムを提案する。提案手法は,変換されたサンプルを投入した場合に,対象モデルが出力する予測確率分布の分散を最小化することを目的とした最適化問題であり,通常のサンプルを複数回繰り返して脆弱なサンプルに変換する。 Our method has some advantages: (1) the iterative update of samples is done in a decision-based black-box manner, relying solely on the predicted probability distribution of the target model, which reduces the risk of exposure to adversarial attacks, (2) the small-amplitude multiple iterations approach allows the fragile samples to perform well visually, with a PSNR of 55 dB in TinyImageNet compared to the original samples, (3) even with changes in the overall parameters of the model of magnitude 1e-4, the fragile samples can detect such changes, and (4) the method is independent of the specific model structure and dataset. 本稿では,複数のモデルとデータセットにおける提案手法の有効性を実証し,現状よりも優れていることを示す。 Typically, foundation models are hosted on cloud servers to meet the high demand for their services. However, this exposes them to security risks, as attackers can modify them after uploading to the cloud or transferring from a local system. To address this issue, we propose an iterative decision-based fragile watermarking algorithm that transforms normal training samples into fragile samples that are sensitive to model changes. We then compare the output of sensitive samples from the original model to that of the compromised model during validation to assess the model's completeness.The proposed fragile watermarking algorithm is an optimization problem that aims to minimize the variance of the predicted probability distribution outputed by the target model when fed with the converted sample.We convert normal samples to fragile samples through multiple iterations. Our method has some advantages: (1) the iterative update of samples is done in a decision-based black-box manner, relying solely on the predicted probability distribution of the target model, which reduces the risk of exposure to adversarial attacks, (2) the small-amplitude multiple iterations approach allows the fragile samples to perform well visually, with a PSNR of 55 dB in TinyImageNet compared to the original samples, (3) even with changes in the overall parameters of the model of magnitude 1e-4, the fragile samples can detect such changes, and (4) the method is independent of the specific model structure and dataset. We demonstrate the effectiveness of our method on multiple models and datasets, and show that it outperforms the current state-of-the-art.	翻訳日:2023-05-18 19:11:22 公開日:2023-05-13
# スパイキングネットワークの初期化とフィリングレートの崩壊 Spiking Network Initialisation and Firing Rate Collapse ( http://arxiv.org/abs/2305.08879v1 ) ライセンス: Link先を確認	Nicolas Perez-Nieves and Dan F.M Goodman	(参考訳) 近年、スパイクニューラルネットワーク(SNN)を訓練する手法が開発され、精度の面ではArtificial Neural Networks(ANN)の代替となり、同時に推論やトレーニング時のエネルギー効率も向上している。しかし、SNNの優れた初期化を構成するものについては、まだ不明である。 ANNトレーニングのために開発された初期化スキームは、しばしば不十分で手動チューニングを必要とする。本稿では,ANN初期化文献の手法と計算神経科学結果を用いてこの問題に対処する。提案手法では,snsのスパイク・リセット非線形性と燃焼速度の崩壊問題により,annの重量初期化問題はannよりも微妙な問題であることを示した。まず,従来のランダムウォーク法とウィーナー法を応用して,様々な仮定の集合の下で発火速度崩壊問題に対するいくつかの解を同定し,提案する。次に,annからの分散伝播法と異なる手法を組み合わせたsn初期化のための一般的な戦略を考案し,拡散とショットノイズ近似に基づく期待発火率と膜電位分布を求める。また, しきい値の存在下での膜電位分布を考慮したSNN初期化を理論的に検討した。しかし、これらの手法が実際のデータセット上でSNNにどの程度うまく適用できるかは未解決のままである。 In recent years, newly developed methods to train spiking neural networks (SNNs) have rendered them as a plausible alternative to Artificial Neural Networks (ANNs) in terms of accuracy, while at the same time being much more energy efficient at inference and potentially at training time. However, it is still unclear what constitutes a good initialisation for an SNN. We often use initialisation schemes developed for ANN training which are often inadequate and require manual tuning. In this paper, we attempt to tackle this issue by using techniques from the ANN initialisation literature as well as computational neuroscience results. We show that the problem of weight initialisation for ANNs is a more nuanced problem than it is for ANNs due to the spike-and-reset non-linearity of SNNs and the firing rate collapse problem. We firstly identify and propose several solutions to the firing rate collapse problem under different sets of assumptions which successfully solve the issue by leveraging classical random walk and Wiener processes results. Secondly, we devise a general strategy for SNN initialisation which combines variance propagation techniques from ANNs and different methods to obtain the expected firing rate and membrane potential distribution based on diffusion and shot-noise approximations. Altogether, we obtain theoretical results to solve the SNN initialisation which consider the membrane potential distribution in the presence of a threshold. Yet, to what extent can these methods be successfully applied to SNNs on real datasets remains an open question.	翻訳日:2023-05-17 17:51:42 公開日:2023-05-13
# 脳腫瘍のセグメンテーションにおける未学習特徴の学習 Learning to Learn Unlearned Feature for Brain Tumor Segmentation ( http://arxiv.org/abs/2305.08878v1 ) ライセンス: Link先を確認	Seungyub Han, Yeongmo Kim, Seokhyeon Ha, Jungwoo Lee, Seunghong Choi	(参考訳) そこで本研究では,脳腫瘍の分類を微調整するアルゴリズムを提案し,少数のデータサンプルを必要とせず,ネットワークが元のタスクを忘れないようにする。我々のアプローチはアクティブラーニングとメタラーニングに基づいている。医学的画像分割の難しさの1つは、適切なアノテーションによるデータセットの欠如であり、医師が信頼できるアノテーションをタグ付けする必要があることと、脳腫瘍の種類が異なり、mr画像に異なる構造的特徴を持つグリオーマや脳転移など、疾患の多くの変種が存在するためである。したがって、あらゆる種類の疾患に対して大規模な医療画像データセットを作成することは不可能である。本稿では,高次グリオーマから脳転移への伝達学習法を示し,そのアルゴリズムが数ステップでグリオーマと脳転移ドメインのバランスの取れたパラメータを実現することを示す。 We propose a fine-tuning algorithm for brain tumor segmentation that needs only a few data samples and helps networks not to forget the original tasks. Our approach is based on active learning and meta-learning. One of the difficulties in medical image segmentation is the lack of datasets with proper annotations, because it requires doctors to tag reliable annotation and there are many variants of a disease, such as glioma and brain metastasis, which are the different types of brain tumor and have different structural features in MR images. Therefore, it is impossible to produce the large-scale medical image datasets for all types of diseases. In this paper, we show a transfer learning method from high grade glioma to brain metastasis, and demonstrate that the proposed algorithm achieves balanced parameters for both glioma and brain metastasis domains within a few steps.	翻訳日:2023-05-17 17:51:15 公開日:2023-05-13
# M$^2$DAR:視覚変換器を用いたマルチビューマルチスケールドライバ動作認識 M$^2$DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer ( http://arxiv.org/abs/2305.08877v1 ) ライセンス: Link先を確認	Yunsheng Ma, Liangqi Yuan, Amr Abdelraouf, Kyungtae Han, Rohit Gupta, Zihao Li, Ziran Wang	(参考訳) 交通安全の確保と事故防止は、この目標を達成するためにコンピュータビジョン技術の進歩を活用できる日々の運転において重要な目標である。本稿では,非トリミングビデオ(M$^2$DAR)における自然主義的運転行動認識とローカライズのためのマルチビュー・マルチスケールフレームワークを提案する。本システムでは,ロバストな階層表現を学習する重み付きマルチスケールトランスフォーマーベースの行動認識ネットワークを特徴とする。さらに,複数のビューにまたがる行動認識モジュールから予備結果を洗練するために,集約,フィルタリング,マージ,選択プロセスからなる新しい選挙アルゴリズムを提案する。第7回ai city challenge track 3データセットで行った広範囲な実験により,a2テストセットで0.5921のオーバーラップスコアを達成した。我々のソースコードは \url{https://github.com/PurdueDigitalTwin/M2DAR} で入手できる。 Ensuring traffic safety and preventing accidents is a critical goal in daily driving, where the advancement of computer vision technologies can be leveraged to achieve this goal. In this paper, we present a multi-view, multi-scale framework for naturalistic driving action recognition and localization in untrimmed videos, namely M$^2$DAR, with a particular focus on detecting distracted driving behaviors. Our system features a weight-sharing, multi-scale Transformer-based action recognition network that learns robust hierarchical representations. Furthermore, we propose a new election algorithm consisting of aggregation, filtering, merging, and selection processes to refine the preliminary results from the action recognition module across multiple views. Extensive experiments conducted on the 7th AI City Challenge Track 3 dataset demonstrate the effectiveness of our approach, where we achieved an overlap score of 0.5921 on the A2 test set. Our source code is available at \url{https://github.com/PurdueDigitalTwin/M2DAR}.	翻訳日:2023-05-17 17:50:58 公開日:2023-05-13
# カプセル内視鏡における三次元表面再構成の課題 Challenges of 3D Surface Reconstruction in Capsule Endoscopy ( http://arxiv.org/abs/2103.10390v3 ) ライセンス: Link先を確認	Olivier Rukundo	(参考訳) 大腸がん検診の精度と信頼性を向上させるため,カプセル内視鏡(CE)画像を用いた三次元3次元表面再構成は,CEハードウェアとソフトウェア制限のために依然として困難である。本研究は3次元可視化の課題に焦点をあて,前処理および非前処理のCE画像を用いて再構成した3次元表面に対する視線選択の不確定の影響を簡潔に検討する。さらに,同じ方位角で見る3次元表面の内容と,視線の角度の違いについて検討した。この研究は、再構成された3D表面の3Dプリンティングは、2Dスクリーンの非決定的な選択や視覚的制約といった課題を克服できると結論付けている。 Essential for improving the accuracy and reliability of bowel cancer screening, three-dimensional (3D) surface reconstruction using capsule endoscopy (CE) images remains challenging due to CE hardware and software limitations. This study focuses on 3D visualization challenges and briefly investigates the impact of indeterminate selection of the line-of-sight on 3D surfaces reconstructed using both preprocessed and non-preprocessed CE images. Furthermore, the study examines the content of 3D surfaces viewed at the same azimuth angles and different elevation angles of the line of sight. The study concludes that 3D printing of reconstructed 3D surfaces can overcome the challenges such as 2D screen line-of-sight indeterminate selection and visual restrictions.	翻訳日:2023-05-17 01:50:49 公開日:2023-05-13
# 行列積密度演算子(Matrix Product Density Operators): ローカルな親 Hamiltonian はいつ存在するか? Matrix Product Density Operators: when do they have a local parent Hamiltonian? ( http://arxiv.org/abs/2010.14682v3 ) ライセンス: Link先を確認	Chi-Fang Chen, Kohtaro Kato, and Fernando G.S.L. Brand\~ao	(参考訳) 準局所親ハミルトニアンのギブス状態として行列積密度演算子(MPDO)を書けるかを検討する。我々は、これが一般的なMPDOのケースであり、証拠を裏付けるものであると推測する。親ハミルトニアンの局所性を調べるため、量子条件付き相互情報が指数関数的に崩壊するかどうかをチェックする。我々が考えるMPDOは、1-入出力/2-アウトプット('Y-shaped')完全正の写像の連鎖から構成される。確率的チャネルと厳密に正のチャネルの条件付き相互情報の上界を導出し、そのチャネルの補正可能な代数が自明であれば指数関数的に崩壊することを示す。また、簡単な修正可能な代数を持つすべてのY字チャネルに対する条件付き相互情報の指数関数的崩壊を意味する量子データ処理の不等式に関する予想も導入する。さらに,近親だが同値でない従兄弟であるmpdoを局所的に測定した。測定された状態の条件付き相互情報の指数的減衰に対して十分な条件を与え、あるランダムmpdoに対して汎用的に正しいことを数値的に確認する。 We study whether one can write a Matrix Product Density Operator (MPDO) as the Gibbs state of a quasi-local parent Hamiltonian. We conjecture this is the case for generic MPDO and give supporting evidences. To investigate the locality of the parent Hamiltonian, we take the approach of checking whether the quantum conditional mutual information decays exponentially. The MPDO we consider are constructed from a chain of 1-input/2-output (`Y-shaped') completely-positive maps, i.e., the MPDO have a local purification. We derive an upper bound on the conditional mutual information for bistochastic channels and strictly positive channels and show that it decays exponentially if the correctable algebra of the channel is trivial. We also introduce a conjecture on a quantum data processing inequality that implies the exponential decay of the conditional mutual information for every Y-shaped channel with trivial correctable algebra. We additionally investigate a close but nonequivalent cousin: MPDO measured in a local basis. We provide sufficient conditions for the exponential decay of the conditional mutual information of the measured states and numerically confirm they are generically true for certain random MPDO.	翻訳日:2023-05-17 01:50:05 公開日:2023-05-13
# タスクに条件付けられた明示的ハイパーパラメータ予測関数の学習 Learning an Explicit Hyperparameter Prediction Function Conditioned on Tasks ( http://arxiv.org/abs/2107.02378v2 ) ライセンス: Link先を確認	Jun Shu, Deyu Meng, Zongben Xu	(参考訳) メタ学習は最近、機械学習コミュニティで注目を集めている。新しいクエリデータのためのラベルを予測するために固有の予測ルールを学習する従来の機械学習とは対照的に、メタ学習は、観察したタスクから機械学習の学習方法論を学習することを目的としており、メタ学習学習手法を利用して新しいクエリタスクを一般化する。本研究では,すべての学習タスクで共有される明示的なハイパーパラメータ予測関数の学習として,学習方法論を解釈する。具体的には、この関数はメタラーナーと呼ばれるパラメータ化関数として表現され、トレーニング/テストタスクから適切なハイパーパラメータ設定にマッピングされる。このような設定により、メタ学習学習手法は、現在の多くのメタ学習手法によって固定されたハイパーパラメータを得る代わりに、様々なクエリタスクを柔軟に適合させることができる。このようなメタ学習の理解は、一般的な損失/タスク/モデルで一般化境界を分析する従来の学習理論から容易に成功する。この理論は自然に、抽出されたメタリーナーの品質を改善するための実現可能な制御戦略を導いており、少数ショット回帰、少数ショット分類、ドメイン一般化など、いくつかの典型的なメタ学習アプリケーションにおいて、その一般化能力を微妙に改善できることが証明されている。 Meta learning has attracted much attention recently in machine learning community. Contrary to conventional machine learning aiming to learn inherent prediction rules to predict labels for new query data, meta learning aims to learn the learning methodology for machine learning from observed tasks, so as to generalize to new query tasks by leveraging the meta-learned learning methodology. In this study, we interpret such learning methodology as learning an explicit hyper-parameter prediction function shared by all training tasks. Specifically, this function is represented as a parameterized function called meta-learner, mapping from a training/test task to its suitable hyper-parameter setting, extracted from a pre-specified function set called meta learning machine. Such setting guarantees that the meta-learned learning methodology is able to flexibly fit diverse query tasks, instead of only obtaining fixed hyper-parameters by many current meta learning methods, with less adaptability to query task's variations. Such understanding of meta learning also makes it easily succeed from traditional learning theory for analyzing its generalization bounds with general losses/tasks/models. The theory naturally leads to some feasible controlling strategies for ameliorating the quality of the extracted meta-learner, verified to be able to finely ameliorate its generalization capability in some typical meta learning applications, including few-shot regression, few-shot classification and domain generalization.	翻訳日:2023-05-17 01:40:02 公開日:2023-05-13
# 放送Bellシナリオにおけるデバイス非依存およびセミデバイス非依存の絡み合い認証 Device-independent and semi-device-independent entanglement certification in broadcast Bell scenarios ( http://arxiv.org/abs/2111.06358v4 ) ライセンス: Link先を確認	Emanuel-Cristian Boghiu, Flavien Hirsch, Pei-Sheng Lin, Marco T\'ulio Quintino, Joseph Bowles	(参考訳) 近年、二成分量子状態のサブシステムをブロードキャストすることで、ベル非局所性を活性化し、デバイス非依存の絡み合い認証のためのノイズ許容範囲を大幅に改善できることが示されている。この研究では、これらの結果を強化し、この現象の新たな側面を探求する。まず,ベル非局所性の活性化に関する新しい結果を示す。我々は,放送シナリオに合わせたベル不等式を構築し,放送がベル非局所活性化のより強固な概念につながることを示す。特に,これらの概念を応用して,局所的隠れ変数モデルが一般化された場合,二成分状態が真の三成分非局所相関に繋がることを示す。次に,放送シナリオにおけるデバイス非依存の絡み合い認証について検討し,デバイス非依存の絡み合い認証が2ビットのワーナー状態に対して本質的には絡み合いの範囲全体において可能であることを示す。最後に、EPRステアリングの概念を放送シナリオに拡張し、2ビット等方性状態の活性化の新たな例を示す。その結果,ブロードキャストベースのデバイス依存およびセミデバイス非依存プロトコルへの道が開けた。 It has recently been shown that by broadcasting the subsystems of a bipartite quantum state, one can activate Bell nonlocality and significantly improve noise tolerance bounds for device-independent entanglement certification. In this work we strengthen these results and explore new aspects of this phenomenon. First, we prove new results related to the activation of Bell nonlocality. We construct Bell inequalities tailored to the broadcast scenario, and show how broadcasting can lead to even stronger notions of Bell nonlocality activation. In particular, we exploit these ideas to show that bipartite states admitting a local hidden-variable model for general measurements can lead to genuine tripartite nonlocal correlations. We then study device-independent entanglement certification in the broadcast scenario, and show through semidefinite programming techniques that device-independent entanglement certification is possible for the two-qubit Werner state in essentially the entire range of entanglement. Finally, we extend the concept of EPR steering to the broadcast scenario, and present novel examples of activation of the two-qubit isotropic state. Our results pave the way for broadcast-based device-dependent and semi-device-independent protocols.	翻訳日:2023-05-17 01:30:25 公開日:2023-05-13
# ポイント2NeRF:3次元点雲からのニューラル放射場の生成 Points2NeRF: Generating Neural Radiance Fields from 3D point cloud ( http://arxiv.org/abs/2206.01290v2 ) ライセンス: Link先を確認	D. Zimny, T. Trzci\'nski, P. Spurek	(参考訳) LIDARや様々な深度カメラなどの3D視覚情報のための現代の登録装置は、データを3Dポイントクラウドとしてキャプチャする。逆に、そのような雲はサイズと複雑さのため処理が難しい。既存のメソッドは、メッシュをポイントクラウドに適合させ、代わりにレンダリングすることで、この問題に対処する。しかしこのアプローチは、結果として生じる視覚化の忠実さを低下させ、コンピュータグラフィックスアプリケーションで重要なオブジェクトの色情報を見逃してしまう。本研究では,3次元物体をNeRF(Neural Radiance Fields)として表現することで,この課題を軽減することを提案する。我々は、ハイパーネットワークのパラダイムを活用し、モデルをトレーニングし、関連するカラー値を持つ3Dポイント・クラウドを取り、入力された2D画像から3Dオブジェクトを再構成するNeRFネットワークの重みを返す。提案手法は,3次元オブジェクトの効率的な表現を提供し,NeRFの条件付けや,学習対象以外の一般化の改善など,既存のアプローチに対していくつかの利点を提供している。後者も経験的評価の結果で確認した。 Contemporary registration devices for 3D visual information, such as LIDARs and various depth cameras, capture data as 3D point clouds. In turn, such clouds are challenging to be processed due to their size and complexity. Existing methods address this problem by fitting a mesh to the point cloud and rendering it instead. This approach, however, leads to the reduced fidelity of the resulting visualization and misses color information of the objects crucial in computer graphics applications. In this work, we propose to mitigate this challenge by representing 3D objects as Neural Radiance Fields (NeRFs). We leverage a hypernetwork paradigm and train the model to take a 3D point cloud with the associated color values and return a NeRF network's weights that reconstruct 3D objects from input 2D images. Our method provides efficient 3D object representation and offers several advantages over the existing approaches, including the ability to condition NeRFs and improved generalization beyond objects seen in training. The latter we also confirmed in the results of our empirical evaluation.	翻訳日:2023-05-17 01:22:12 公開日:2023-05-13
# 中性子干渉法における位相渦格子 Phase Vortex Lattices in Neutron Interferometry ( http://arxiv.org/abs/2205.00536v2 ) ライセンス: Link先を確認	Niels Geerits and Hartmut Lemmel and Anna-Sophie Berger and Stephan Sponar	(参考訳) ネストループ干渉計に挿入されたアルミニウムプリズムの組み合わせを用いて、軌道角運動量l_z=0.35の中性子位相渦格子を220ミクロンの長さスケールで生成し、伝播方向を横切る。本手法は,最近開発された磁気的手法の一般化であり,強い核相互作用を活用できる。これらのプリズムの強いポテンシャルにより、より強い格子が生成される。中性子化合物光学およびスプリット結晶干渉計の最近の進歩と組み合わせることで、本手法は固有の中性子軌道角運動量状態の生成に適用できる。最後に、現在の状態では、我々の設定は異方性極小角中性子散乱に直接適用可能であると断言する。 A combination of aluminium prisms inserted into a nested loop interferometer is used to generate a neutron phase vortex lattice with significant extrinsic orbital angular momentum, L_z=0.35, on a length scale of 220 microns, transverse to the propagation direction. Our method is a generalization of recently developed magnetic methods, such that we can exploit the strong nuclear interaction. The stronger potential of these prisms allows for the generation of a tighter lattice. Combined with recent advances in neutron compound optics and split crystal interferometry our method may be applied to the generation of intrinsic neutron orbital angular momentum states. Finally, we assert that, in its current state, our setup is directly applicable to anisotropic ultra small angle neutron scattering.	翻訳日:2023-05-17 01:21:19 公開日:2023-05-13
# PADA:自己教師付き音声表現のためのドメイン適応処理 PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations ( http://arxiv.org/abs/2203.16965v4 ) ライセンス: Link先を確認	Lodagala V S V Durga Prasad and Sreyan Ghosh and S. Umesh	(参考訳) 自己教師付き音声表現学習(ssl)モデルは下流の様々なタスクをこなすが、これらのモデルはラベルのないデータが起源となる領域に過剰に適合することが観察されている。この問題を軽減するために,大量のドメイン外データに基づいて事前訓練されたモデルからPAD(Pruning Assisted Domain Adaptation)と余剰重量をゼロにする手法を提案する。直感的には、ターゲットドメインのASR微調整のためのスペースを作るのに役立つ。冗長な重みは、この作業の一部として詳細に議論された様々な刈り取り戦略を通じて特定することができる。具体的には,最近発見されたタスク非依存型およびタスク認識型プルーニングがPADに与える影響を調査し,後者に基づいた新たなプルーニングパラダイムを提案する。 CD-TAWは、十分に調整されたOODモデルから初期プルーニングマスクを取得し、論文で論じるプルーニング戦略の他の部分と大きく異なる。提案するCD-TAW法は,言語モデル(LM)復号化を伴わないSwitchboardデータの2時間サブセットを微調整することにより,ベースラインよりも20.6%の相対的なWER改善を実現する。さらに,提案手法の重要な設計選択を強調するために,詳細な分析を行った。 While self-supervised speech representation learning (SSL) models serve a variety of downstream tasks, these models have been observed to overfit to the domain from which the unlabelled data originates. To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data. Intuitively, this helps to make space for the target-domain ASR finetuning. The redundant weights can be identified through various pruning strategies which have been discussed in detail as a part of this work. Specifically, we investigate the effect of the recently discovered Task-Agnostic and Task-Aware pruning on PADA and propose a new pruning paradigm based on the latter, which we call Cross-Domain Task-Aware Pruning (CD-TAW). CD-TAW obtains the initial pruning mask from a well fine-tuned OOD model, which makes it starkly different from the rest of the pruning strategies discussed in the paper. Our proposed CD-TAW methodology achieves up to 20.6% relative WER improvement over our baseline when fine-tuned on a 2-hour subset of Switchboard data without language model (LM) decoding. Furthermore, we conduct a detailed analysis to highlight the key design choices of our proposed method.	翻訳日:2023-05-17 01:19:42 公開日:2023-05-13
# 逐次キャンセラリストデコーディングのためのスケーラブル極性コード構築:グラフニューラルネットワークに基づくアプローチ Scalable Polar Code Construction for Successive Cancellation List Decoding: A Graph Neural Network-Based Approach ( http://arxiv.org/abs/2207.01105v4 ) ライセンス: Link先を確認	Yun Liao, Seyyed Ali Hashemi, Hengjie Yang, John M. Cioffi	(参考訳) 逐次復号化のための極性符号はビットチャネルをソートすることで効率よく実装できるが、巡回冗長チェック支援逐次復号化リスト(CA-SCL)の最適極性符号の探索はまだ検討が待たれている。本稿では,まず極性コードを,極性コード構築メッセージパッシング(pccmp)グラフと呼ばれる一意な不均一グラフにマッピングする。次に、CA-SCLデコーディングの下で最小フレーム誤り率の極符号に対応するPCCMPグラフを見つけることを目的とした、異種グラフニューラルネットベースの反復メッセージパス(IMP)アルゴリズムを提案する。この新しいIMPアルゴリズムの主な利点はスケーラビリティである。すなわち、モデル複雑性はブロック長とコードレートとは独立であり、短い極性コード上で訓練されたIMPモデルは、長い極性コードの構成に容易に適用できる。数値実験により、IMPベースの極符号構造はCA-SCLデコードの下での古典的な構成よりも優れていた。さらに、長さ128の極性符号で訓練されたIMPモデルがコードレートとブロック長の異なる極性符号の構築に直接適用されると、これらの極性符号構造が5G極性符号に匹敵する性能を示すことがシミュレーションで示されている。 While constructing polar codes for successive-cancellation decoding can be implemented efficiently by sorting the bit-channels, finding optimal polar codes for cyclic-redundancy-check-aided successive-cancellation list (CA-SCL) decoding in an efficient and scalable manner still awaits investigation. This paper first maps a polar code to a unique heterogeneous graph called the polar-code-construction message-passing (PCCMP) graph. Next, a heterogeneous graph-neural-network-based iterative message-passing (IMP) algorithm is proposed which aims to find a PCCMP graph that corresponds to the polar code with minimum frame error rate under CA-SCL decoding. This new IMP algorithm's major advantage lies in its scalability power. That is, the model complexity is independent of the blocklength and code rate, and a trained IMP model over a short polar code can be readily applied to a long polar code's construction. Numerical experiments show that IMP-based polar-code constructions outperform classical constructions under CA-SCL decoding. In addition, when an IMP model trained on a length-128 polar code directly applies to the construction of polar codes with different code rates and blocklengths, simulations show that these polar code constructions deliver comparable performance to the 5G polar codes.	翻訳日:2023-05-17 01:12:25 公開日:2023-05-13
# グラフ埋め込み法のメモリと容量 Memory and Capacity of Graph Embedding Methods ( http://arxiv.org/abs/2208.08769v3 ) ライセンス: Link先を確認	Frank Qiu	(参考訳) テンソル製品とおよそオーソノーマルコードによるグラフ埋め込み(Graph Embeddings via Tensor Products and A roughly Orthonormal Codes)を参照してください。 THIS PAPER IS NOW DEFUNCT: Check out "Graph Embeddings via Tensor Products and Approximately Orthonormal Codes", where it has been combined into one paper.	翻訳日:2023-05-17 01:02:45 公開日:2023-05-13
# ユークリッド選好モデルにおける誤差 Error in the Euclidean Preference Model ( http://arxiv.org/abs/2208.08160v3 ) ライセンス: Link先を確認	Luke Thorburn, Maria Polukarov, Carmine Ventre	(参考訳) 選好の空間モデルは、ベクトル埋め込みの形で、レコメンダシステムを含む多くのディープラーニングとマルチエージェントシステムによって学習される。これらのモデルはしばしばユークリッド構造を近似すると仮定され、ユークリッド計量によって測定されるように、個人は「理想点」に近い位置にある選択肢を好む。しかし、Bogomolnaia and Laslier (2007) は、ユークリッド空間が個人や代替物よりも2つの少ない次元を持つ場合、この構造で表現できない順序的選好プロファイルが存在することを示した。この結果を拡張し、ほぼすべての選好プロファイルをユークリッドモデルで表現できない状況を示し、ユークリッドモデルを用いて非ユークリッド選好プロファイルを近似する際の予測誤差の理論的下限を導出する。この結果は、ベクトル埋め込みの解釈と利用に影響を及ぼす。なぜなら、任意の、真の順序関係の近似が、埋め込みの次元が表現される実体数の実質的な分数である場合に限り、予測できるからである。 Spatial models of preference, in the form of vector embeddings, are learned by many deep learning and multiagent systems, including recommender systems. Often these models are assumed to approximate a Euclidean structure, where an individual prefers alternatives positioned closer to their "ideal point", as measured by the Euclidean metric. However, Bogomolnaia and Laslier (2007) showed that there exist ordinal preference profiles that cannot be represented with this structure if the Euclidean space has two fewer dimensions than there are individuals or alternatives. We extend this result, showing that there are situations in which almost all preference profiles cannot be represented with the Euclidean model, and derive a theoretical lower bound on the expected error when using the Euclidean model to approximate non-Euclidean preference profiles. Our results have implications for the interpretation and use of vector embeddings, because in some cases close approximation of arbitrary, true ordinal relationships can be expected only if the dimensionality of the embeddings is a substantial fraction of the number of entities represented.	翻訳日:2023-05-17 01:02:41 公開日:2023-05-13
# 分散トレーニングにおけるビザンチン攻撃の検出と軽減 Detection and Mitigation of Byzantine Attacks in Distributed Training ( http://arxiv.org/abs/2208.08085v4 ) ライセンス: Link先を確認	Konstantinos Konstantinidis, Namrata Vaswani, and Aditya Ramamoorthy	(参考訳) 現代の機械学習タスクの多くは、トレーニングパイプラインの重要なコンポーネントとして大規模分散クラスタを使用する必要がある。しかし、作業ノードの異常なビザンチン挙動は、トレーニングを脱線させ、推論の品質を損なう可能性がある。このような動作は意図しないシステム障害や組織的攻撃によるものでもあり、結果として、トレーニングを調整するパラメータサーバ(PS)に任意の結果を返すノードもある。最近の研究は、幅広い攻撃モデルを検討し、歪んだ勾配を補正するためにロバストアグリゲーションと/または計算冗長性を検討した。本研究では,攻撃モデルについて検討する。$q$ 防御プロトコルに精通し,反復から弱いものへ変更できる。$q$ ランダムに選択した敵は,一度に数回のイテレーションでのみ変更可能な,限定的な結束能力を持つ。我々のアルゴリズムは、冗長なタスク割り当てと敵対行動の検出に頼っている。また,文献で考慮される共通の仮定と設定の下での最適点への本手法の収束性を示す。強い攻撃に対しては,従来の最先端技術と比較して16%～99%の歪み勾配が減少することを示した。トップ1の分類精度はcifar-10のデータセットにおいて,最先端の手法と比較して25%の精度向上(強弱のシナリオ平均)を示した。 A plethora of modern machine learning tasks require the utilization of large-scale distributed clusters as a critical component of the training pipeline. However, abnormal Byzantine behavior of the worker nodes can derail the training and compromise the quality of the inference. Such behavior can be attributed to unintentional system malfunctions or orchestrated attacks; as a result, some nodes may return arbitrary results to the parameter server (PS) that coordinates the training. Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients. In this work, we consider attack models ranging from strong ones: $q$ omniscient adversaries with full knowledge of the defense protocol that can change from iteration to iteration to weak ones: $q$ randomly chosen adversaries with limited collusion abilities which only change every few iterations at a time. Our algorithms rely on redundant task assignments coupled with detection of adversarial behavior. We also show the convergence of our method to the optimal point under common assumptions and settings considered in literature. For strong attacks, we demonstrate a reduction in the fraction of distorted gradients ranging from 16%-99% as compared to the prior state-of-the-art. Our top-1 classification accuracy results on the CIFAR-10 data set demonstrate 25% advantage in accuracy (averaged over strong and weak scenarios) under the most sophisticated attacks compared to state-of-the-art methods.	翻訳日:2023-05-17 01:02:22 公開日:2023-05-13
# 量子回路は一般化されるか? Do Quantum Circuit Born Machines Generalize? ( http://arxiv.org/abs/2207.13645v4 ) ライセンス: Link先を確認	Kaitlin Gili, Mohamed Hibat-Allah, Marta Mauri, Chris Ballance, Alejandro Perdomo-Ortiz	(参考訳) 近年、生成タスクのための量子回路モデルの提案において、それらの性能に関する議論は、既知のターゲット分布を再現する能力に限られている。例えば、QCBM(Quantum Circuit Born Machines)のような表現型モデルファミリは、与えられたターゲット分布を高精度に学習する能力について、ほぼ完全に評価されている。この側面はいくつかのタスクには理想的かもしれないが、ジェネレーティブモデルの評価の範囲を一般化するよりもデータを記憶する能力に制限する。その結果、モデルの一般化性能とそのような能力とリソース要件との関係、例えば回路深さとトレーニングデータの量についてはほとんど理解されていない。本研究では,最近提案された一般化評価フレームワークを活用し,この知識ギャップに対処する。まず,QCBMの濃度制約分布の学習過程を調査し,回路深度を増大させながら一般化性能が向上することを示した。ここで示した12量子ビットの例では、トレーニングセット内の有効データの最大30%で、qcbmは、知覚不能で有効なデータを生成するための最良の一般化性能を示す。最後に、QCBMが有効なサンプルだけでなく、適切に再重み付けされた分布に応じて分布する高品質なビットストリングに一般化できる能力を評価する。 QCBMは、再重み付けされたデータセットを効果的に学習し、トレーニングセットのデータセットよりも高い品質の未確認サンプルを生成することができる。我々の知る限り、これはQCBMの一般化性能を量子生成モデルの積分評価指標として示し、QCBMが高品質で望まれる新しいサンプルに一般化する能力を示す文献の中では初めてのものである。 In recent proposals of quantum circuit models for generative tasks, the discussion about their performance has been limited to their ability to reproduce a known target distribution. For example, expressive model families such as Quantum Circuit Born Machines (QCBMs) have been almost entirely evaluated on their capability to learn a given target distribution with high accuracy. While this aspect may be ideal for some tasks, it limits the scope of a generative model's assessment to its ability to memorize data rather than generalize. As a result, there has been little understanding of a model's generalization performance and the relation between such capability and the resource requirements, e.g., the circuit depth and the amount of training data. In this work, we leverage upon a recently proposed generalization evaluation framework to begin addressing this knowledge gap. We first investigate the QCBM's learning process of a cardinality-constrained distribution and see an increase in generalization performance while increasing the circuit depth. In the 12-qubit example presented here, we observe that with as few as 30% of the valid data in the training set, the QCBM exhibits the best generalization performance toward generating unseen and valid data. Lastly, we assess the QCBM's ability to generalize not only to valid samples, but to high-quality bitstrings distributed according to an adequately re-weighted distribution. We see that the QCBM is able to effectively learn the reweighted dataset and generate unseen samples with higher quality than those in the training set. To the best of our knowledge, this is the first work in the literature that presents the QCBM's generalization performance as an integral evaluation metric for quantum generative models, and demonstrates the QCBM's ability to generalize to high-quality, desired novel samples.	翻訳日:2023-05-17 01:00:41 公開日:2023-05-13
# ViT-DD:セミスーパービジョンドライバディトラクション検出用マルチタスク・ビジョン・トランス ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection ( http://arxiv.org/abs/2209.09178v3 ) ライセンス: Link先を確認	Yunsheng Ma and Ziran Wang	(参考訳) 現代の運転における交通安全確保と事故軽減が最重要であり、コンピュータビジョン技術はこの目標に大きく貢献する可能性がある。本稿では,運転者注意障害検出と運転者の感情認識の両方に関連するトレーニング信号からインダクティブ情報を取り入れたマルチモーダル視覚変換器(ViT-DD)を提案する。さらに,感情ラベルのないドライバデータをvit-ddのマルチタスクトレーニングプロセスにシームレスに統合可能な自己学習アルゴリズムを開発した。実験結果から,提案したViT-DDは,SFDDDデータセットとAUCDDデータセットにおいて,運転者の気晴らし検出の既存手法を6.5\%,0.9\%で上回ることがわかった。この重要な研究領域における再現性をサポートし、さらなる進歩を促進するため、このアプローチのソースコードはhttps://github.com/PurdueDigitalTwin/ViT-DDで公開されている。 Ensuring traffic safety and mitigating accidents in modern driving is of paramount importance, and computer vision technologies have the potential to significantly contribute to this goal. This paper presents a multi-modal Vision Transformer for Driver Distraction Detection (termed ViT-DD), which incorporates inductive information from training signals related to both distraction detection and driver emotion recognition. Additionally, a self-learning algorithm is developed, allowing for the seamless integration of driver data without emotion labels into the multi-task training process of ViT-DD. Experimental results reveal that the proposed ViT-DD surpasses existing state-of-the-art methods for driver distraction detection by 6.5\% and 0.9\% on the SFDDD and AUCDD datasets, respectively. To support reproducibility and foster further advancements in this critical research area, the source code for this approach is made publicly available at https://github.com/PurdueDigitalTwin/ViT-DD.	翻訳日:2023-05-17 00:53:52 公開日:2023-05-13
# ロボットアームの安全かつ効率的なマルチオブジェクト把持検出手法 A Secure and Efficient Multi-Object Grasping Detection Approach for Robotic Arms ( http://arxiv.org/abs/2209.03511v2 ) ライセンス: Link先を確認	Hui Wang, Jieren Cheng, Yichen Xu, Sirui Ni, Zaijia Yang and Jiangpeng Li	(参考訳) ロボットアームは自動産業で広く使われている。しかし、ロボットアームにおけるディープラーニングの幅広い応用により、コンピューティングパワーの把握の割り当てやセキュリティに対する需要の増加など、新たな課題が存在する。本研究では,ディープラーニングとエッジクラウドの協調に基づくロボットアームの把握手法を提案する。本手法は,ロボットアームの任意の把握計画を実現し,把握効率と情報セキュリティを考慮した。さらに、GANによって訓練されたエンコーダとデコーダにより、圧縮中に画像が暗号化され、プライバシーのセキュリティが保証される。このモデルは、OCIDデータセット上で92%の精度を実現し、画像圧縮比が0.03%に達し、構造差値が0.91以上である。 Robotic arms are widely used in automatic industries. However, with wide applications of deep learning in robotic arms, there are new challenges such as the allocation of grasping computing power and the growing demand for security. In this work, we propose a robotic arm grasping approach based on deep learning and edge-cloud collaboration. This approach realizes the arbitrary grasp planning of the robot arm and considers the grasp efficiency and information security. In addition, the encoder and decoder trained by GAN enable the images to be encrypted while compressing, which ensures the security of privacy. The model achieves 92% accuracy on the OCID dataset, the image compression ratio reaches 0.03%, and the structural difference value is higher than 0.91.	翻訳日:2023-05-17 00:52:44 公開日:2023-05-13
# テンソル製品とほぼ正規コードによるグラフ埋め込み Graph Embeddings via Tensor Products and Approximately Orthonormal Codes ( http://arxiv.org/abs/2208.10917v4 ) ライセンス: Link先を確認	Frank Qiu	(参考訳) グラフを構造保存的な方法でベクトルとして埋め込む手法を解析し、その豊かな表現能力を示し、その理論的性質のいくつかを確立する。我々の手順はバインド・アンド・サム法に該当し、テンソル積が重ね合わせ原理を尊重する最も一般的な結合演算であることを示す。また,提案手法の挙動を特徴づける精度の高い結果が得られ,球面符号の使用が上限のパッキングを実現することを示す。本手法は,ある意味では,疎グラフ表現への応用を伴う隣接行列の圧縮であることを示すために,隣接行列へのリンクを確立する。 We analyze a method for embedding graphs as vectors in a structure-preserving manner, showcasing its rich representational capacity and establishing some of its theoretical properties. Our procedure falls under the bind-and-sum approach, and we show that the tensor product is the most general binding operation that respects the superposition principle. We also establish some precise results characterizing the behavior of our method, and we show that our use of spherical codes achieves a packing upper bound. We establish a link to adjacency matrices, showing that our method is, in some sense, a compression of adjacency matrices with applications towards sparse graph representations.	翻訳日:2023-05-17 00:51:41 公開日:2023-05-13
# CCC-wav2vec 2.0:クラスタリング支援による音声表現のクロスコントラスト自己教師型学習 CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations ( http://arxiv.org/abs/2210.02592v3 ) ライセンス: Link先を確認	Vasista Sai Lodagala and Sreyan Ghosh and S. Umesh	(参考訳) Self-Supervised Learningは、利用可能なラベルなしデータからスケールのメリットを得るのに役立ちましたが、学習パラダイムは継続的に改善されています。本稿では,クラスタリングと拡張に基づくクロスコントラスト損失を自己管理対象とする,ccc-wav2vec 2.0という新たな事前学習戦略を提案する。クラスタリングモジュールを通じて、ポジティブと非常によく似た否定的な例の影響をスケールダウンします。クロスコントラスト損失は、元のサンプルのエンコーダ出力と、その増大と逆転の量子化器出力との間に計算され、事前学習戦略に堅牢性をもたらす。 ccc-wav2vec 2.0は、librispeechのベースラインであるwav2vec 2.0よりも15.6%と12.7%の改善を達成している。提案手法は,Switchboardデータに微調整を施すと,ベースラインwav2vec 2.0よりも14.9%の相対的なWER改善を実現する。すべてのコードをgithubで公開しています。 While Self-Supervised Learning has helped reap the benefit of the scale from the available unlabeled data, the learning paradigms are continuously being bettered. We present a new pre-training strategy named ccc-wav2vec 2.0, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those negative examples that are highly similar to the positive. The Cross-Contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation and vice-versa, bringing robustness to the pre-training strategy. ccc-wav2vec 2.0 achieves up to 15.6% and 12.7% relative WER improvement over the baseline wav2vec 2.0 on the test-clean and test-other sets, respectively, of LibriSpeech, without the use of any language model. The proposed method also achieves up to 14.9% relative WER improvement over the baseline wav2vec 2.0 when fine-tuned on Switchboard data. We make all our codes publicly available on GitHub.	翻訳日:2023-05-17 00:44:12 公開日:2023-05-13
# どうやってそこに着くの? 英語過去時制インフレクションの認知モデルとしてのトランスフォーマーニューラルネットワークの評価 How do we get there? Evaluating transformer neural networks as cognitive models for English past tense inflection ( http://arxiv.org/abs/2210.09167v2 ) ライセンス: Link先を確認	Xiaomeng Ma and Lingyu Gao	(参考訳) ニューラルネットワークが人間のような言語の準規則性を把握できるかどうか、議論が続いている。典型的な準正則性タスクである英語の過去時制インフレクションにおいて、ニューラルネットワークモデルは、最も頻繁なパターンを一般化するためにのみ学習し、正規パターンではなく、正規パターンと不規則パターンの抽象的なカテゴリを学ぶことができず、人間のパフォーマンスと異なることを長年批判されてきた。本研究では,異なる設定の変圧器モデルのセットをトレーニングし,その動作について検討する。モデルでは, 正規動詞の認識精度が向上し, 不規則動詞の精度も向上した。レギュラーモデルの性能はタイプ周波数と比に大きく影響されるが、トークンの頻度と比率には影響せず、逆もまた不規則である。正規化と不規則化の異なる振る舞いは、モデルが動詞の規則性についてある程度の記号的学習を持っていることを示唆している。さらに、モデルは名詞動詞の人間の行動と弱い相関関係にある。トランスフォーマーモデルは動詞の規則性の抽象的なカテゴリーについてある程度の学習レベルを示すが、その性能は人間のデータにうまく適合せず、良い認知モデルではない可能性がある。 There is an ongoing debate on whether neural networks can grasp the quasi-regularities in languages like humans. In a typical quasi-regularity task, English past tense inflections, the neural network model has long been criticized that it learns only to generalize the most frequent pattern, but not the regular pattern, thus can not learn the abstract categories of regular and irregular and is dissimilar to human performance. In this work, we train a set of transformer models with different settings to examine their behavior on this task. The models achieved high accuracy on unseen regular verbs and some accuracy on unseen irregular verbs. The models' performance on the regulars is heavily affected by type frequency and ratio but not token frequency and ratio, and vice versa for the irregulars. The different behaviors on the regulars and irregulars suggest that the models have some degree of symbolic learning on the regularity of the verbs. In addition, the models are weakly correlated with human behavior on nonce verbs. Although the transformer model exhibits some level of learning on the abstract category of verb regularity, its performance does not fit human data well, suggesting that it might not be a good cognitive model.	翻訳日:2023-05-17 00:34:29 公開日:2023-05-13
# Covariance Matrix Adaptation MAP-Annealing による多次元制御系の訓練 Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing ( http://arxiv.org/abs/2210.02622v2 ) ライセンス: Link先を確認	Bryon Tjanaka, Matthew C. Fontaine, David H. Lee, Aniruddha Kalkar, Stefanos Nikolaidis	(参考訳) シミュレーションでさまざまなニューラルネットワークコントローラを事前トレーニングすることで、ロボットのロコモーションタスクの損傷に対するオンライン適応が可能になる。しかし、多様で高性能なコントローラを見つけるには、高価なネットワークトレーニングと多数のハイパーパラメータの広範なチューニングが必要となる。一方,進化戦略(es)に基づく品質多様性アルゴリズムである共分散行列適応map-annealing (cma-mae) は,このような制限がなく,標準qdベンチマークで最先端の性能を達成している。しかし、CMA-MAEは2次複雑さのため、現代のニューラルネットワークコントローラにはスケールできない。我々はESにおける効率的な近似手法を活用し、高次元にスケールする3つの新しいCMA-MAE変種を提案する。実験では,ロボットの歩行タスクにおいて,esベースのベースラインを上回っており,最先端の深層強化学習に基づく品質多様性アルゴリズムに匹敵する。 Pre-training a diverse set of neural network controllers in simulation has enabled robots to adapt online to damage in robot locomotion tasks. However, finding diverse, high-performing controllers requires expensive network training and extensive tuning of a large number of hyperparameters. On the other hand, Covariance Matrix Adaptation MAP-Annealing (CMA-MAE), an evolution strategies (ES)-based quality diversity algorithm, does not have these limitations and has achieved state-of-the-art performance on standard QD benchmarks. However, CMA-MAE cannot scale to modern neural network controllers due to its quadratic complexity. We leverage efficient approximation methods in ES to propose three new CMA-MAE variants that scale to high dimensions. Our experiments show that the variants outperform ES-based baselines in benchmark robotic locomotion tasks, while being comparable with or exceeding state-of-the-art deep reinforcement learning-based quality diversity algorithms.	翻訳日:2023-05-17 00:32:33 公開日:2023-05-13
# data2vec-aqc:Teacher-Studentトレーニング設定における適切な教師アシスタントの探索 data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup ( http://arxiv.org/abs/2211.01246v2 ) ライセンス: Link先を確認	Vasista Sai Lodagala and Sreyan Ghosh and S. Umesh	(参考訳) 本稿では、ラベルなし音声データから音声表現学習を行うための、Data2vec-aqcと呼ばれる新しい自己教師付き学習アルゴリズムを提案する。我々の目標は、ラベル付きデータとラベル付きデータの両方が制限されたドメインにおける音声のSSLを改善することです。最近導入されたdata2vecをベースに、データ拡張、量子化表現、クラスタリングの恩恵を受けるdata2vecフレームワークに追加のモジュールを導入しました。これらのモジュール間の相互作用は、追加の自己監督目的として相互競合損失を解決するのに役立つ。 data2vec-aqc は librispeech の既存の state-the-art data2vec システムよりも 14.1% と 20.9% の改善を達成している。提案モデルでは,Switchboardデータセットのサブセットを微調整すると,ベースラインの data2vec に対して最大17.8\% の相対的な WER ゲインが得られる。コード: https://github.com/speech-lab-iitm/data2vec-aqc。 In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc, for speech representation learning from unlabeled speech data. Our goal is to improve SSL for speech in domains where both unlabeled and labeled data are limited. Building on the recently introduced data2vec, we introduce additional modules to the data2vec framework that leverage the benefit of data augmentations, quantized representations, and clustering. The interaction between these modules helps solve the cross-contrastive loss as an additional self-supervised objective. data2vec-aqc achieves up to 14.1% and 20.9% relative WER improvement over the existing state-of-the-art data2vec system over the test-clean and test-other sets, respectively of LibriSpeech, without the use of any language model (LM). Our proposed model also achieves up to 17.8\% relative WER gains over the baseline data2vec when fine-tuned on a subset of the Switchboard dataset. Code: https://github.com/Speech-Lab-IITM/data2vec-aqc.	翻訳日:2023-05-17 00:25:03 公開日:2023-05-13
# 衛星間連続可変量子鍵分布:低地球軌道におけるガウスおよび離散変調プロトコル Satellite-to-Ground Continuous Variable Quantum Key Distribution: The Gaussian and Discrete Modulated Protocols in Low Earth Orbit ( http://arxiv.org/abs/2211.16862v3 ) ライセンス: Link先を確認	Mikhael Sayat, Biveen Shajilal, Sebastian P. Kish, Syed M. Assad, Thomas Symul, Ping Koy Lam, Nicholas Rattenbury, John Cater	(参考訳) ガウス変調連続可変量子鍵分布 (GM-CVQKD) プロトコルは、量子鍵分布 (QKD) において、両者の相互情報を最大化する。別の変調方式は離散変調CVQKD(DM-CVQKD)プロトコルである。本稿では,低SNRにおける衛星間リンク上のGM-CVQKDプロトコルとともに,位相シフト鍵(M-PSK)と準振幅変調(M QAM)DM-CVQKDプロトコルについて検討する。本研究では, 幾何損失, シンチレーション, 散乱損失をそれぞれリンク距離, 大気乱流, 大気エアロゾルから考慮し, 衛星対地リンクモデルを用いた。さらに,近年の多次元符号化・多段復号法モデルとマルチエッジ型低密度パリティチェック(MET-LDPC)符号モデルを組み合わせて,復号化効率を判定する手法が提案されている。その結果,GM-CVQKDはDM-CVQKDより優れていた。さらに、MD調整によるGM-CVQKDは、リンク距離と低高度角度で正の秘密鍵レートを発生させることにより、GM-CVQKDとLC-MSD調整を有限サイズ制限で上回る。 The Gaussian modulated continuous variable quantum key distribution (GM-CVQKD) protocol is known to maximise the mutual information between two parties during quantum key distribution (QKD). An alternative modulation scheme is the discrete modulated CVQKD (DM-CVQKD) protocol. In this paper, we study the Phase Shift Keying (M-PSK) and Quadrature Amplitude Modulation (M QAM) DM-CVQKD protocols along with the GM-CVQKD protocol over a satellite-to-ground link in the low SNR regime. We use a satellite-to-ground link model which takes into account geometric losses, scintillation, and scattering losses from the link distance, atmospheric turbulence, and atmospheric aerosols, respectively. In addition, recent multidimensional (MD) and multilevel coding and multistage decoding (MLC-MSD) reconciliation method models in combination with multiedge-type low-density parity-check (MET-LDPC) code models have been used to determine the reconciliation efficiency. The results show that GM-CVQKD outperforms DM-CVQKD. In addition, GM-CVQKD with MD reconciliation outperforms GM-CVQKD with MLC-MSD reconciliation in the finite size limit by producing positive secret key rates at larger link distances and lower elevation angles.	翻訳日:2023-05-17 00:16:50 公開日:2023-05-13
# オンラインメディアにおける語数成長のためのロジスティック方程式の小さな拡張:社会における成長現象の多様性のパラメトリック記述 A minor extension of the logistic equation for growth of word counts on online media: Parametric description of diversity of growth phenomena in society ( http://arxiv.org/abs/2211.16733v2 ) ライセンス: Link先を確認	Hayafumi Watanabe	(参考訳) 2007年から2019年にかけての約10億の日本語ブログ記事から抽出した月次単語数時系列を,全国のオンラインソーシャルメディア上での新たな語彙の増大現象を解析した。特に、拡張ロジスティック方程式を元の方程式に1つのパラメータを加えることで導入し、ロジスティック関数、線形成長、有限時間発散といった実際の成長曲線の様々なパターンを一貫して再現できることを示した。第二に、モデルパラメータの解析により、典型的な成長パターンは、様々な複雑なシステムにしばしば現れるロジスティック関数であるだけでなく、指数関数から始まる非自明な成長曲線であり、定常状態のないパワー関数に漸近的に近づくことを発見した。さらに,機能的成長形態とピークアウトとの関係も観察した。最後に,提案したモデルと統計特性は,検索クエリの全国的普及の時系列であるGoogle Trendsデータ(英語,フランス語,スペイン語,日本語)にも有効であることを示した。 To understand the growing phenomena of new vocabulary on nationwide online social media, we analyzed monthly word count time series extracted from approximately 1 billion Japanese blog articles from 2007 to 2019. In particular, we first introduced the extended logistic equation by adding one parameter to the original equation and showed that the model can consistently reproduce various patterns of actual growth curves, such as the logistic function, linear growth, and finite-time divergence. Second, by analyzing the model parameters, we found that the typical growth pattern is not only a logistic function, which often appears in various complex systems, but also a nontrivial growth curve that starts with an exponential function and asymptotically approaches a power function without a steady state. Furthermore, we observed a connection between the functional form of growth and the peak-out. Finally, we showed that the proposed model and statistical properties are also valid for Google Trends data (English, French, Spanish, and Japanese), which is a time series of the nationwide popularity of search queries.	翻訳日:2023-05-17 00:16:05 公開日:2023-05-13
# PipeFisher: パイプライニングと漁業情報行列を用いた大規模言語モデルの効率的な訓練 PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices ( http://arxiv.org/abs/2211.14133v2 ) ライセンス: Link先を確認	Kazuki Osawa, Shigang Li, Torsten Hoefler	(参考訳) パイプライン並列処理により、大規模分散アクセラレータクラスタ上でのLarge Language Models(LLM)の効率的なトレーニングが可能になる。しかし、起動時と分解時のパイプラインバブルはアクセラレータの利用を減らす。マイクロバッチと双方向パイプラインを用いた効率的なパイプラインスキームが提案されているが、同期前方および後方通過では相当数の気泡が充填できない。この問題に対処するため,llm訓練の補助的効果を得るために気泡に余分な作業を割り当てることを提案する。この方向の例として,フィッシャー情報行列に基づく2次最適化手法であるK-FACをバブルに割り当てて収束を加速するPipeFisherを提案する。 BERTベースとラージモデルの第1相事前トレーニングでは、K-FACによる加速利用を大幅に改善し、改良された収束の恩恵を受けることにより、一階オプティマイザによるトレーニングに比べて(シミュレーションされた)トレーニング時間を50-75%に短縮する。 Pipeline parallelism enables efficient training of Large Language Models (LLMs) on large-scale distributed accelerator clusters. Yet, pipeline bubbles during startup and tear-down reduce the utilization of accelerators. Although efficient pipeline schemes with micro-batching and bidirectional pipelines have been proposed to maximize utilization, a significant number of bubbles cannot be filled using synchronous forward and backward passes. To address this problem, we suggest that extra work be assigned to the bubbles to gain auxiliary benefits in LLM training. As an example in this direction, we propose PipeFisher, which assigns the work of K-FAC, a second-order optimization method based on the Fisher information matrix, to the bubbles to accelerate convergence. In Phase 1 pretraining of BERT-Base and -Large models, PipeFisher reduces the (simulated) training time to 50-75% compared to training with a first-order optimizer by greatly improving the accelerator utilization and benefiting from the improved convergence by K-FAC.	翻訳日:2023-05-17 00:15:12 公開日:2023-05-13
# 過去が重要なこと:ガウス過程モデルの軌道予測における後続状態の相関 The Past Does Matter: Correlation of Subsequent States in Trajectory Predictions of Gaussian Process Models ( http://arxiv.org/abs/2211.11103v2 ) ライセンス: Link先を確認	Steffen Ridderbusch, Sina Ober-Bl\"obaum, Paul Goulart	(参考訳) 力学系のガウス過程モデルから軌跡の分布を計算することは,そのようなモデルを利用する上で重要な課題である。サンプリングベースアプローチの計算コストに動機づけられ,モデルの出力と軌道分布の近似を考える。従来の不確実性伝播は離散状態空間モデルに焦点をあて、予測された軌道のその後の状態間の独立性の仮定を誤って含んでいた。これらのアイデアを連続常微分方程式モデルに拡張し、この仮定の意義を説明し、ガウス過程の新たな分割線形近似を提案する。 Computing the distribution of trajectories from a Gaussian Process model of a dynamical system is an important challenge in utilizing such models. Motivated by the computational cost of sampling-based approaches, we consider approximations of the model's output and trajectory distribution. We show that previous work on uncertainty propagation, focussed on discrete state-space models, incorrectly included an independence assumption between subsequent states of the predicted trajectories. Expanding these ideas to continuous ordinary differential equation models, we illustrate the implications of this assumption and propose a novel piecewise linear approximation of Gaussian Processes to mitigate them.	翻訳日:2023-05-17 00:14:28 公開日:2023-05-13
# 線形力学系におけるオフラインデータポジショニング攻撃の解析と検出可能性 Analysis and Detectability of Offline Data Poisoning Attacks on Linear Dynamical Systems ( http://arxiv.org/abs/2211.08804v4 ) ライセンス: Link先を確認	Alessio Russo	(参考訳) 近年、データ駆動制御手法に対するデータ中毒攻撃の影響に対する関心が高まっている。毒殺攻撃は機械学習コミュニティではよく知られていますが、これは一般的に線形力学系では持たない、クロスサンプル独立のような仮定を利用しています。したがって、これらのシステムは、i.i.d.\設定の教師付き学習問題のために開発されたものとは異なる攻撃および検出方法を必要とする。多くのデータ駆動制御アルゴリズムは最小二乗推定器を利用するため、統計検査のレンズを通して最小二乗推定値に毒がどのような影響を及ぼすか、また、データ中毒攻撃を検出する方法に疑問を呈する。我々は,データに適合するモデルの集合がシステムの真のモデルを含む条件を定式化し,攻撃者に対する異なる中毒戦略を分析する。そこで本稿では,古典的統計的テストから逃れることのできる最小二乗推定器に対するステルスデータ中毒攻撃を提案し,提案攻撃の有効性を示す。 In recent years, there has been a growing interest in the effects of data poisoning attacks on data-driven control methods. Poisoning attacks are well-known to the Machine Learning community, which, however, make use of assumptions, such as cross-sample independence, that in general do not hold for linear dynamical systems. Consequently, these systems require different attack and detection methods than those developed for supervised learning problems in the i.i.d.\ setting. Since most data-driven control algorithms make use of the least-squares estimator, we study how poisoning impacts the least-squares estimate through the lens of statistical testing, and question in what way data poisoning attacks can be detected. We establish under which conditions the set of models compatible with the data includes the true model of the system, and we analyze different poisoning strategies for the attacker. On the basis of the arguments hereby presented, we propose a stealthy data poisoning attack on the least-squares estimator that can escape classical statistical tests, and conclude by showing the efficiency of the proposed attack.	翻訳日:2023-05-17 00:13:54 公開日:2023-05-13
# Recommenderシステムにおける言語モデリングのPivotalの役割:タスク特化学習とタスク非依存表現学習の強化 Pivotal Role of Language Modeling in Recommender Systems: Enriching Task-specific and Task-agnostic Representation Learning ( http://arxiv.org/abs/2212.03760v5 ) ライセンス: Link先を確認	Kyuyong Shin, Hanock Kwak, Wonjae Kim, Jisu Jeong, Seungjae Jung, Kyung-Min Kim, Jung-Woo Ha, Sang-Woo Lee	(参考訳) 近年,様々なアプリケーションのユーザ行動データを活用する統合ユーザモデリングフレームワークが提案されている。それらの多くは、ユーザの振る舞いシーケンスをプレーンテキストとして利用することで、一般性を失うことなく、任意のドメインやシステム内のリッチな情報を表現することができる。ユーザ履歴コーパスのための言語モデリングは、レコメンダシステムを改善するのに役立つか? その汎用性は、多くのドメインで広く研究されてきたが、レコメンデーションシステムへの応用は、まだ未検討のままである。タスク固有のユーザ履歴に直接適用される言語モデリングは,様々なレコメンデーションタスクにおいて優れた結果が得られることを示す。また、追加のタスクに依存しないユーザ履歴を利用することで、大きなパフォーマンス上のメリットが得られます。さらに,本手法は,未確認領域やサービスにおいても,幅広い実世界のレコメンデータシステムに対して,有望な伝達学習能力を提供できることを示す。 Recent studies have proposed unified user modeling frameworks that leverage user behavior data from various applications. Many of them benefit from utilizing users' behavior sequences as plain texts, representing rich information in any domain or system without losing generality. Hence, a question arises: Can language modeling for user history corpus help improve recommender systems? While its versatile usability has been widely investigated in many domains, its applications to recommender systems still remain underexplored. We show that language modeling applied directly to task-specific user histories achieves excellent results on diverse recommendation tasks. Also, leveraging additional task-agnostic user histories delivers significant performance benefits. We further demonstrate that our approach can provide promising transfer learning capabilities for a broad spectrum of real-world recommender systems, even on unseen domains and services.	翻訳日:2023-05-17 00:06:06 公開日:2023-05-13
# ポート・ハミルトンニューラルネットワークを用いた動的システムの構成学習 Compositional Learning of Dynamical System Models Using Port-Hamiltonian Neural Networks ( http://arxiv.org/abs/2212.00893v2 ) ライセンス: Link先を確認	Cyrus Neary and Ufuk Topcu	(参考訳) 環境と対話するロボットから、大規模なマルチフィジカルシステムまで、多くの動的システムは、多くの相互作用するサブシステムを含んでいる。このようなシステムの複合モデル学習の目的に向けて,本稿で提示する。一構成ニューラルネットワークの枠組み二これらのモデルを訓練するアルゴリズム三学習したモデルを構成する方法四結果の合成モデルの誤差を拘束する理論的結果及び五先入観が知られていないとき、その構成自体を学習する方法ニューラルネットワークのサブモデルは比較的単純なサブシステムによって生成された軌道データに基づいて訓練され、さらに複雑なコンポジットシステムのダイナミクスは、コンポジットシステム自身で生成された追加データを必要としないように予測される。この構成性は、各サブシステムと同様に、ポート-ハミルトンニューラルネットワーク(PHNN)として、ポート-ハミルトン系を帰納バイアスとして用いるニューラル常微分方程式のクラスとして表現することで達成される。 phnnのコレクションは、前もって知られていたり、データから学ばれたりできる、物理に変形した相互接続構造を用いて構成する。本稿では,spring-mass-damperシステムの相互作用に関する数値例を通して,提案フレームワークの新たな機能を示す。非線形エネルギー散逸と制御入力を含むこれらのシステムのモデルは独立に学習される。正確な構成は、新しいモデルをスクラッチからトレーニングするために必要なものと比べて無視できる大量のトレーニングデータを用いて学習される。最後に、複合PHNNはシクロパッシビティのようなポート-ハミルトン系の特性を享受し、制御目的に有用な特性を享受する。 Many dynamical systems -- from robots interacting with their surroundings to large-scale multiphysics systems -- involve a number of interacting subsystems. Toward the objective of learning composite models of such systems from data, we present i) a framework for compositional neural networks, ii) algorithms to train these models, iii) a method to compose the learned models, iv) theoretical results that bound the error of the resulting composite models, and v) a method to learn the composition itself, when it is not known a priori. The end result is a modular approach to learning: neural network submodels are trained on trajectory data generated by relatively simple subsystems, and the dynamics of more complex composite systems are then predicted without requiring additional data generated by the composite systems themselves. We achieve this compositionality by representing the system of interest, as well as each of its subsystems, as a port-Hamiltonian neural network (PHNN) -- a class of neural ordinary differential equations that uses the port-Hamiltonian systems formulation as inductive bias. We compose collections of PHNNs by using the system's physics-informed interconnection structure, which may be known a priori, or may itself be learned from data. We demonstrate the novel capabilities of the proposed framework through numerical examples involving interacting spring-mass-damper systems. Models of these systems, which include nonlinear energy dissipation and control inputs, are learned independently. Accurate compositions are learned using an amount of training data that is negligible in comparison with that required to train a new model from scratch. Finally, we observe that the composite PHNNs enjoy properties of port-Hamiltonian systems, such as cyclo-passivity -- a property that is useful for control purposes.	翻訳日:2023-05-17 00:03:42 公開日:2023-05-13
# 多言語事前学習の促進:多言語モデルのための三角形文書レベル事前学習 Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models ( http://arxiv.org/abs/2212.07752v2 ) ライセンス: Link先を確認	Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Furu Wei	(参考訳) 多言語系列から列への事前学習の成功にもかかわらず、既存のアプローチの多くは、多くの異なる言語における文書レベルの単言語コーパス、文レベルの複言語コーパス、\footnote{in the paperでは、多くの異なる言語ペアにおいて、平行コーパスと「バイリンガル翻訳ペア」を表すために「バイリンガルコーパス」を使用し、それぞれが異なる言語で書かれた2つの文/文書からなる。我々は,3つの文/文書からなる多言語の組み合わせで,'三言語翻訳ペア'と平行コーパスを表すために,'三言語コーパス'を用いる。時に、合成文書レベルのバイリンガルコーパス。これは、文書レベルの変換のような言語間文書レベルのタスクでパフォーマンスを損なう。そこで本研究では,文書レベルの三言語並列コーパスを用いて,多言語前訓練のシーケンシャル・トゥ・シークエンスを改善することを提案する。従来の単言語とバイリンガルの目標を三言語目標に加速する最初の手法として,グラフトリングと呼ばれる新しい手法を用いて,文書レベル \textbf{p}re-training (\textbf{trip}) を提案する。実験により、TRIPは3つの多言語文書レベルの機械翻訳ベンチマークと1つの言語間の抽象的要約ベンチマークにおいて、最大3.11d-BLEU点と8.9ROUGE-L点の一貫性のある改善を含む、強力なSOTAスコアを達成することが示された。 Despite the success of multilingual sequence-to-sequence pre-training, most existing approaches rely on document-level monolingual corpora in many different languages, sentence-level bilingual corpora,\footnote{In this paper, we use `bilingual corpora' to denote parallel corpora with `bilingual translation pairs' in many different language pairs, each consisting of two sentences/documents with the same meaning written in different languages. We use `trilingual corpora' to denote parallel corpora with `trilingual translation pairs' in many different language combinations, each consisting of three sentences/documents.} and sometimes synthetic document-level bilingual corpora. This hampers the performance with cross-lingual document-level tasks such as document-level translation. Therefore, we propose to mine and leverage document-level trilingual parallel corpora to improve sequence-to-sequence multilingual pre-training. We present \textbf{Tri}angular Document-level \textbf{P}re-training (\textbf{TRIP}), which is the first in the field to accelerate the conventional monolingual and bilingual objectives into a trilingual objective with a novel method called Grafting. Experiments show that TRIP achieves several strong state-of-the-art (SOTA) scores on three multilingual document-level machine translation benchmarks and one cross-lingual abstractive summarization benchmark, including consistent improvements by up to 3.11 d-BLEU points and 8.9 ROUGE-L points.	翻訳日:2023-05-16 23:56:23 公開日:2023-05-13
# groma: ディープニューラルネットワークのグローバルロバスト性を測定するツール gRoMA: a Tool for Measuring Deep Neural Networks Global Robustness ( http://arxiv.org/abs/2301.02288v2 ) ライセンス: Link先を確認	Natan Levy and Raz Yerushalmi and Guy Katz	(参考訳) ディープニューラルネットワーク(DNN)は最先端技術の最前線にあり、さまざまな複雑なタスクにおいて顕著なパフォーマンスを実現している。それでも、航空宇宙分野や自動車分野のような安全クリティカルなシステムへの統合は、敵の入力の脅威(DNNが重大な誤りを犯す可能性のある入力の摂動)のために大きな課題を生んでいる。複数の研究では、現代のDNNでさえ敵の入力に影響を受けやすいことが示されており、このリスクは安全クリティカルシステムへのDNNの配備を可能にするために測定および緩和されなければならない。本稿では,DNNのグローバルな分類的ロバスト性を測定するための確率論的検証手法を実装した,革新的でスケーラブルなgRoMA(global Robustness Measurement and Assessment)を提案する。具体的には、gRoMAは特定の出力カテゴリに対して逆入力に遭遇する確率を測定する。本ツールは,事前学習したブラックボックス分類DNNで動作し,興味のある出力カテゴリに属する入力サンプルを生成する。これは、DNNがこれらの入力の周囲の敵対的な入力に対する感受性を計測し、結果を集約し、DNNの全体的カテゴリー的ロバスト性を小さな境界統計誤差まで推測する。我々は,CIFAR10データセット上で人気のDensenet DNNモデルを用いてツールの評価を行った。結果から, 出力カテゴリーの頑健さに有意な差が認められた。この実験は、我々のアプローチの有用性とスケーラビリティ、およびDNNを重要なシステムに展開できる可能性を示す。 Deep neural networks (DNNs) are at the forefront of cutting-edge technology, and have been achieving remarkable performance in a variety of complex tasks. Nevertheless, their integration into safety-critical systems, such as in the aerospace or automotive domains, poses a significant challenge due to the threat of adversarial inputs: perturbations in inputs that might cause the DNN to make grievous mistakes. Multiple studies have demonstrated that even modern DNNs are susceptible to adversarial inputs; and this risk must thus be measured and mitigated to allow the deployment of DNNs in safety-critical systems. Here, we present gRoMA (global Robustness Measurement and Assessment), an innovative and scalable tool that implements a probabilistic verification approach to measure the global categorial robustness of a DNN. Specifically, gRoMA measures the probability of encountering adversarial inputs for a specific output category. Our tool operates on pre-trained, black-box classification DNNs, and generates input samples belonging to an output category of interest. It measures the DNN's susceptibility to adversarial inputs around these inputs, and aggregates the results to infer the overall global categorial robustness of the DNN up to some small bounded statistical error. We evaluate our tool on the popular Densenet DNN model over the CIFAR10 dataset. Our results reveal significant gaps in the robustness of the different output categories. This experiment demonstrates the usefulness and scalability of our approach, and its potential for allowing DNNs to be deployed within critical systems of interest.	翻訳日:2023-05-16 23:46:41 公開日:2023-05-13
# 非プレーヤ文字対話のオントロジー的忠実生成 Ontologically Faithful Generation of Non-Player Character Dialogues ( http://arxiv.org/abs/2212.10618v2 ) ライセンス: Link先を確認	Nathaniel Weir, Ryan Thomas, Randolph D'Amore, Kellie Hill, Benjamin Van Durme, Harsh Jhamtani	(参考訳) 本稿では,人気ゲーム環境に根ざした言語生成タスクを提案する。 KNUDGE(KNowledge Constrained User-NPC Dialogue GEneration)は、自然言語で記述されたクエストとエンティティ仕様を正確に反映したビデオゲームキャラクター間の対話のツリーを作成するモデルである。クヌージは、オブシディアン・エンタテインメントの『ザ・アウターワールド』のゲームデータから直接引き出されたサイドクエスト対話から構築されており、(1)対話は、発話の線形連鎖とは対照的に木を分岐させ、(2)発話は、ゲームlore -- 人格的ペルソナ、バックストーリー、および人間関係に忠実でありなければならず、(3)対話は、人間のプレイヤーに新しいクエストの詳細を正確に明らかにする必要がある。教師付きおよびコンテキスト内学習技術を用いたニューラルネットワークモデルの結果を報告する。現実的でゲーム品質の対話を創り出す上での課題に対処する上で、今後の作業には有能なパフォーマンスと余地を見出す。 We introduce a language generation task grounded in a popular video game environment. KNUDGE (KNowledge Constrained User-NPC Dialogue GEneration) requires models to produce trees of dialogue between video game characters that accurately reflect quest and entity specifications stated in natural language. KNUDGE is constructed from side quest dialogues drawn directly from game data of Obsidian Entertainment's The Outer Worlds, leading to real-world complexities in generation: (1) dialogues are branching trees as opposed to linear chains of utterances; (2) utterances must remain faithful to the game lore -- character personas, backstories, and entity relationships; and (3) a dialogue must accurately reveal new quest details to the human player. We report results for a set of neural generation models using supervised and in-context learning techniques; we find competent performance but room for future work addressing the challenges of creating realistic, game-quality dialogues.	翻訳日:2023-05-16 23:44:52 公開日:2023-05-13
# LegendreTron: マルチクラスの損失学習が向上 LegendreTron: Uprising Proper Multiclass Loss Learning ( http://arxiv.org/abs/2301.11695v2 ) ライセンス: Link先を確認	Kevin Lam, Christian Walder, Spiridon Penev, Richard Nock	(参考訳) 損失関数は教師付き学習の基礎となり、しばしばモデル開発の前に選択される。損失のアドホックな選択を避けるために、統計的決定理論は、ベイズの法則が最適であると主張する \emph{properness} として知られる損失の望ましい性質を記述する。近年の研究では、emph{learn loss} とモデルの共同開発が試みられている。既存の方法では、逆正準リンク関数を単調に$\mathbb{R}$を$[0,1]$にし、二元問題に対する確率を推定する。本論文では、凸関数の勾配の単調性を用いて、$\mathbb{R}^{C-1}$と予想される確率単純度$\tilde{\Delta}^{C-1}$の間の写像への単調性を拡張する。本稿では,emph{proper canonical loss} と多クラス問題に対する確率を共同で学習する新規かつ実用的な方法として {\sc LegendreTron を提案する。最大1000のクラスを持つドメインのベンチマークでテストした結果、我々のメソッドは10以上のクラスを持つすべてのデータセットで99%の価値がある$t$-testで、自然のマルチクラスベースラインを一貫して上回ります。 Loss functions serve as the foundation of supervised learning and are often chosen prior to model development. To avoid potentially ad hoc choices of losses, statistical decision theory describes a desirable property for losses known as \emph{properness}, which asserts that Bayes' rule is optimal. Recent works have sought to \emph{learn losses} and models jointly. Existing methods do this by fitting an inverse canonical link function which monotonically maps $\mathbb{R}$ to $[0,1]$ to estimate probabilities for binary problems. In this paper, we extend monotonicity to maps between $\mathbb{R}^{C-1}$ and the projected probability simplex $\tilde{\Delta}^{C-1}$ by using monotonicity of gradients of convex functions. We present {\sc LegendreTron} as a novel and practical method that jointly learns \emph{proper canonical losses} and probabilities for multiclass problems. Tested on a benchmark of domains with up to 1,000 classes, our experimental results show that our method consistently outperforms the natural multiclass baseline under a $t$-test at 99% significance on all datasets with greater than 10 classes.	翻訳日:2023-05-16 23:38:12 公開日:2023-05-13
# マルチエージェント強化学習システムにおける直接罰が協調の創発に及ぼす影響の検討 Investigating the Impact of Direct Punishment on the Emergence of Cooperation in Multi-Agent Reinforcement Learning Systems ( http://arxiv.org/abs/2301.08278v2 ) ライセンス: Link先を確認	Nayana Dasgupta, Mirco Musolesi	(参考訳) 協力の解決は機能的社会の創出と維持に不可欠であり、道路の分岐点の航行から炭素削減条約の交渉まで、協調的なジレンマの例である。 AIの利用が社会全体に広まるにつれ、これらの複雑な協調ジレンマをナビゲートできる社会的にインテリジェントなエージェントの必要性がますます明白になりつつある。自然界では、直接罰(direct punishment)は、集団内の協力の出現の恩恵を受ける、ユビキタスな社会的メカニズムである。しかし、社会的ジレンマを経験する人工学習エージェントの集団における協力の発展に先行研究が与える影響は調査されていない。さらに、自然集団内では、いかなる形態の刑罰も、パートナーの選択と評判の関連する社会的メカニズムと強く結びついている。しかし, マルチエージェントシステムにおける協調の出現に, 複数の社会的メカニズムを組み合わせることが及ぼす影響は, これまで検討されていない。そこで,本稿では,マルチエージェント強化学習システムにおける直接的な処罰に関連する行動と学習のダイナミクスを包括的に分析し,パートナー選択と評価の社会的メカニズムと組み合わせることで,第三者の罰と比較する。エージェントが学習した戦略のダイナミクスに対するこれらの重要なメカニズムの影響を広範囲かつ体系的に評価する。最後に,これらのメカニズムが協調型AIシステムの設計に与える影響について論じる。 Solving the problem of cooperation is of fundamental importance to the creation and maintenance of functional societies, with examples of cooperative dilemmas ranging from navigating busy road junctions to negotiating carbon reduction treaties. As the use of AI becomes more pervasive throughout society, the need for socially intelligent agents that are able to navigate these complex cooperative dilemmas is becoming increasingly evident. In the natural world, direct punishment is an ubiquitous social mechanism that has been shown to benefit the emergence of cooperation within populations. However no prior work has investigated its impact on the development of cooperation within populations of artificial learning agents experiencing social dilemmas. Additionally, within natural populations the use of any form of punishment is strongly coupled with the related social mechanisms of partner selection and reputation. However, no previous work has considered the impact of combining multiple social mechanisms on the emergence of cooperation in multi-agent systems. Therefore, in this paper we present a comprehensive analysis of the behaviours and learning dynamics associated with direct punishment in multi-agent reinforcement learning systems and how it compares to third-party punishment, when both are combined with the related social mechanisms of partner selection and reputation. We provide an extensive and systematic evaluation of the impact of these key mechanisms on the dynamics of the strategies learned by agents. Finally, we discuss the implications of the use of these mechanisms on the design of cooperative AI systems.	翻訳日:2023-05-16 23:37:10 公開日:2023-05-13
# フェデレーションレコメンデーションにおける二重パーソナライズ Dual Personalization on Federated Recommendation ( http://arxiv.org/abs/2301.08143v2 ) ライセンス: Link先を確認	Chunxu Zhang, Guodong Long, Tianyi Zhou, Peng Yan, Zijian Zhang, Chengqi Zhang, Bo Yang	(参考訳) フェデレーションレコメンデーション(federated recommendation)は、プライバシー保護レコメンデーションサービスをフェデレーション設定で提供する、新しいインターネットサービスアーキテクチャである。既存のソリューションは、分散レコメンデーションアルゴリズムとプライバシ保護メカニズムを組み合わせるために使用される。したがって、本質的にはサーバでヘビーウェイトモデルの形をとり、デバイス上のインテリジェントモデルのエンドユーザへのデプロイを妨げる。本稿では、サーバ上の重み付けモデルではなく、スマートデバイスにデプロイされる多くのユーザ固有の軽量モデルを学ぶために、Personalized Federated Recommendation(PFedRec)フレームワークを提案する。さらに,ユーザとアイテムの両方の詳細なパーソナライズを効果的に学習するための,新たな二重パーソナライズ機構を提案する。全体的な学習プロセスは統合された最適化フレームワークに定式化される。具体的には、フェデレーションシステムでユーザ間でまったく同じアイテム埋め込みを共有する従来の方法とは異なり、デュアルパーソナライズにより、各ユーザがアイテム埋め込みを穏やかに微調整することで、アイテム表現に対するユーザ固有のビューを生成し、既存のフェデレーション推奨メソッドに統合して、すぐに改善を得られるようになる。複数のベンチマークデータセットの実験では、PFedRecと二重パーソナライゼーション機構の有効性が実証されている。さらに,アイテム埋め込みにおけるパーソナライズ手法の可視化と詳細な分析を行い,フェデレーション設定におけるレコメンダシステムの設計に関する新たな知見を得た。コードは利用可能です。 Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions are used to combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus it inherently takes the form of heavyweight models at the server and hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views for item representations which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed novel insights on the design of recommender systems in federated settings. The code is available.	翻訳日:2023-05-16 23:36:48 公開日:2023-05-13
# 推定特徴属性に対する負のフラックス凝集 Negative Flux Aggregation to Estimate Feature Attributions ( http://arxiv.org/abs/2301.06989v2 ) ライセンス: Link先を確認	Xin Li, Deng Pan, Chengyin Li, Yao Qiang and Dongxiao Zhu	(参考訳) セキュリティや透明性の懸念が高まる中で、ディープニューラルネットワーク(DNN)の動作を理解する必要性が高まっている。ディープニューラルネットワークアーキテクチャの多層非線形性のため、DNN予測の説明は依然として未解決の問題であり、メカニズムの深い理解を妨げている。 DNNの説明可能性を高めるために,分岐とフラックスを用いた予測課題に対する入力特徴の属性を推定する。ベクトル解析における発散定理に着想を得て,新しい負流束集合(neflag)定式化法と帰属写像を推定するための効率的な近似アルゴリズムを開発した。以前の技術とは異なり、私たちの手法はサーロゲートモデルに適合したり、勾配のパス統合を必要としたりしません。定性的かつ定量的な実験は、競合する方法よりも忠実な帰属写像を生成する上で、NeFLAGの優れた性能を示す。我々のコードは \url{https://github.com/xinli0928/NeFLAG} で入手できる。 There are increasing demands for understanding deep neural networks' (DNNs) behavior spurred by growing security and/or transparency concerns. Due to multi-layer nonlinearity of the deep neural network architectures, explaining DNN predictions still remains as an open problem, preventing us from gaining a deeper understanding of the mechanisms. To enhance the explainability of DNNs, we estimate the input feature's attributions to the prediction task using divergence and flux. Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate attribution map. Unlike the previous techniques, ours doesn't rely on fitting a surrogate model nor need any path integration of gradients. Both qualitative and quantitative experiments demonstrate a superior performance of NeFLAG in generating more faithful attribution maps than the competing methods. Our code is available at \url{https://github.com/xinli0928/NeFLAG}	翻訳日:2023-05-16 23:35:49 公開日:2023-05-13
# MPAS-Oとグローバルドリフトデータセットの動的データ同化 Dynamic Data Assimilation of MPAS-O and the Global Drifter Dataset ( http://arxiv.org/abs/2301.05551v2 ) ライセンス: Link先を確認	Derek DeSantis, Ayan Biswas, Earl Lawrence, Phillip Wolfram	(参考訳) 本研究では,海洋における温度予測の精度を向上させるために,地球系モデル(esms)とin situ buoy測定を組み合わせた新しい手法を提案する。この技術はesmで識別されるダイナミクスとモードを利用して、季節性などの特徴を保ちながらブイ測定の精度を向上させる。この手法を用いることで,MPAS-Oモデルによる局所温度予測の誤差を補正することができる。提案手法は他の補間法やデータ同化法に比べて精度が向上することを示す。本手法は,グローバル・ドリフト・プログラムの海洋ブイデータセットを用いて,スケールス・オーシャン・コンポーネント (mpas-o) の予測モデルを適用した。 In this study, we propose a new method for combining in situ buoy measurements with Earth system models (ESMs) to improve the accuracy of temperature predictions in the ocean. The technique utilizes the dynamics and modes identified in ESMs to improve the accuracy of buoy measurements while still preserving features such as seasonality. Using this technique, errors in localized temperature predictions made by the MPAS-O model can be corrected. We demonstrate that our approach improves accuracy compared to other interpolation and data assimilation methods. We apply our method to assimilate the Model for Prediction Across Scales Ocean component (MPAS-O) with the Global Drifter Program's in-situ ocean buoy dataset.	翻訳日:2023-05-16 23:35:33 公開日:2023-05-13
# 国家の安全強化学習に関する調査 State-wise Safe Reinforcement Learning: A Survey ( http://arxiv.org/abs/2302.03122v2 ) ライセンス: Link先を確認	Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei, Changliu Liu	(参考訳) シミュレーション環境でRL(Reinforcement Learning)アルゴリズムが驚くほど成功したにもかかわらず、実世界のアプリケーションにRLを適用することは、まだ多くの課題に直面している。主な懸念事項は安全性、つまり制約満足度である。状態毎の制約は、現実世界のアプリケーションで最も一般的な制約の1つであり、safe rlで最も難しい制約の1つです。自律運転やロボット操作など,多くの課題に対して,国家的制約の実施が不可欠である。本稿では、RLにおける状態制約に対処する既存のアプローチを包括的にレビューする。 SCMDP(State-wise Constrained Markov Decision Process)の枠組みの下で、既存のアプローチの関連、相違、トレードオフについて議論する。 (i)安全性の保証と拡張性。 (ii)安全と報酬の成果、及び (iii)収束後及び訓練中の安全性。また,現在の手法の限界を要約し,今後の方向性について考察する。 Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in another word, constraint satisfaction. State-wise constraints are one of the most common constraints in real-world applications and one of the most challenging constraints in Safe RL. Enforcing state-wise constraints is necessary and essential to many challenging tasks such as autonomous driving, robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of State-wise Constrained Markov Decision Process (SCMDP), we will discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantee and scalability, (ii) safety and reward performance, and (iii) safety after convergence and during training. We also summarize limitations of current methods and discuss potential future directions.	翻訳日:2023-05-16 23:28:01 公開日:2023-05-13
# リスク分解による自己指導型学習の評価 Evaluating Self-Supervised Learning via Risk Decomposition ( http://arxiv.org/abs/2302.03068v2 ) ライセンス: Link先を確認	Yann Dubois and Tatsunori Hashimoto and Percy Liang	(参考訳) 自己教師付き学習(SSL)パイプラインは、アーキテクチャや拡張、事前トレーニングデータなど、多くの設計上の選択肢が異なる。しかし、SSLは通常、1つのメトリックを使って評価される。これにより、モデルがなぜ、いつ、どのように改善されるのか、多くの洞察が得られない。そこで本研究では,表現学習ステップから生じる誤りを考慮し,古典的教師付き近似推定分解を一般化したsslリスク分解を提案する。分解は,近似,表現ユーザビリティ,プローブ一般化,エンコーダ一般化の4つの誤差成分からなる。我々は,各コンポーネントに対して効率的な推定器を提供し,imagenet で評価した 169 ssl ビジョンモデルに対する30 の設計選択の影響を分析する。私たちの分析はSSLモデルを設計、使用するための貴重な洞察を与えます。例えば、エラーの主なソースを強調し、エラーコンポーネントのトレーディングによって特定の設定(フル対数ショット)でSSLを改善する方法を示している。すべての結果と事前訓練されたモデルはhttps://github.com/YannDubs/SSL-Risk-Decompositionにある。 Self-supervised learning (SSL) pipelines differ in many design choices such as the architecture, augmentations, or pretraining data. Yet SSL is typically evaluated using a single metric: linear probing on ImageNet. This does not provide much insight into why or when a model is better, now how to improve it. To address this, we propose an SSL risk decomposition, which generalizes the classical supervised approximation-estimation decomposition by considering errors arising from the representation learning step. Our decomposition consists of four error components: approximation, representation usability, probe generalization, and encoder generalization. We provide efficient estimators for each component and use them to analyze the effect of 30 design choices on 169 SSL vision models evaluated on ImageNet. Our analysis gives valuable insights for designing and using SSL models. For example, it highlights the main sources of error and shows how to improve SSL in specific settings (full- vs few-shot) by trading off error components. All results and pretrained models are at https://github.com/YannDubs/SSL-Risk-Decomposition.	翻訳日:2023-05-16 23:27:47 公開日:2023-05-13
# 知識グラフ補完のための二重置換等価性 Double Permutation Equivariance for Knowledge Graph Completion ( http://arxiv.org/abs/2302.01313v4 ) ライセンス: Link先を確認	Jianfei Gao, Yangze Zhou, Bruno Ribeiro	(参考訳) この研究は知識グラフ(kgs)を、二重交換可能な有理グラフを表す新しいグラフのクラスとして形式化し、ノードとペアワイズ(joint 2-node)表現は、ノードidとエッジ(&node)属性(relation & node feature)の両方の置換に同値でなければならない。二重置換同変 KG 表現は KG の新しい研究方向を開く。この等分散は、ニューラルネットワークが複雑な論理推論タスクをkgsで実行できるようにする関係の構造的表現を課す。最後に,このような等価表現に対する一般的な青写真を導入し,wn18rr,fb237,nell995インダクティブkg完了タスクにおいて最先端のhis@10テスト精度を達成し,既存の手法では実行できない論理的推論タスクを最善の知識に対して正確に実行可能にする,単純なgnnベースの二重置換同変ニューラルネットワークアーキテクチャをテストする。 This work provides a formalization of Knowledge Graphs (KGs) as a new class of graphs that we denote doubly exchangeable attributed graphs, where node and pairwise (joint 2-node) representations must be equivariant to permutations of both node ids and edge (& node) attributes (relations & node features). Double-permutation equivariant KG representations open a new research direction in KGs. We show that this equivariance imposes a structural representation of relations that allows neural networks to perform complex logical reasoning tasks in KGs. Finally, we introduce a general blueprint for such equivariant representations and test a simple GNN-based double-permutation equivariant neural architecture that achieve state-of-the-art Hits@10 test accuracy in the WN18RR, FB237 and NELL995 inductive KG completion tasks, and can accurately perform logical reasoning tasks that no existing methods can perform, to the best of our knowledge.	翻訳日:2023-05-16 23:27:12 公開日:2023-05-13
# 非自己回帰テキスト生成のための拡散モデル:調査 Diffusion Models for Non-autoregressive Text Generation: A Survey ( http://arxiv.org/abs/2303.06574v2 ) ライセンス: Link先を確認	Yifan Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen	(参考訳) 非自己回帰(NAR)テキスト生成は、推論遅延を大幅に低減するが、生成精度を犠牲にする自然言語処理の分野で大きな注目を集めている。近年,narテキスト生成に潜伏型可変生成モデルのクラスである拡散モデルが導入され,テキスト生成品質が向上している。本稿では,NARテキスト生成における拡散モデルの最近の進歩を概観する。背景として,まず拡散モデルとテキスト拡散モデルの一般定義を提示し,ナル生成のメリットについて考察する。コアコンテンツとして,既存のテキスト拡散における2つの主流拡散モデルを紹介し,拡散過程の重要な設計について検討する。さらに,テキスト拡散モデルにおける事前学習言語モデル(PLM)の利用について検討し,テキストデータの最適化手法を導入する。最後に,いくつかの有望な方向性について議論し,本論文をまとめる。本研究の目的は,NAR生成のためのテキスト拡散モデルに関する研究の体系的な参照を提供することである。我々はテキスト拡散モデルの集合をhttps://github.com/RUCAIBox/Awesome-Text-Diffusion-Modelsで紹介する。 Non-autoregressive (NAR) text generation has attracted much attention in the field of natural language processing, which greatly reduces the inference latency but has to sacrifice the generation accuracy. Recently, diffusion models, a class of latent variable generative models, have been introduced into NAR text generation, showing an improved text generation quality. In this survey, we review the recent progress in diffusion models for NAR text generation. As the background, we first present the general definition of diffusion models and the text diffusion models, and then discuss their merits for NAR generation. As the core content, we further introduce two mainstream diffusion models in existing work of text diffusion, and review the key designs of the diffusion process. Moreover, we discuss the utilization of pre-trained language models (PLMs) for text diffusion models and introduce optimization techniques for text data. Finally, we discuss several promising directions and conclude this paper. Our survey aims to provide researchers with a systematic reference of related research on text diffusion models for NAR generation. We present our collection of text diffusion models at https://github.com/RUCAIBox/Awesome-Text-Diffusion-Models.	翻訳日:2023-05-16 23:08:12 公開日:2023-05-13
# 漢字命名におけるトランスフォーマーモデルの評価と人間の行動 Evaluating Transformer Models and Human Behaviors on Chinese Character Naming ( http://arxiv.org/abs/2303.12294v2 ) ライセンス: Link先を確認	Xiaomeng Ma and Lingyu Gao	(参考訳) ニューラルネットワークモデルは、多くのアルファベット言語に対する人間のグラファイム・音素マッピングプロセスを説明するために提案されている。これらのモデルは、文字文字列とその発音の対応をうまく学習しただけでなく、人間の振る舞いを言葉命名タスクで捉えた。ニューラルネットワークは、非アルファベット言語(例えば中国語)の未知文字タスクに対してどのように機能するか? モデルはどの程度人間の行動を捉えますか? 本研究では,まず未知の漢字命名課題に対する話者の回答を収集し,その性能を未知の漢字命名課題における人間の行動と比較し,トランスフォーマーモデルの評価を行った。モデルと人間は同じような振る舞いをしており、各キャラクタに類似した精度分布を持ち、回答にかなりの重複があることが判明した。さらに、モデルの回答は人間の回答と非常に相関している。これらの結果はトランスモデルが人間のキャラクタ命名行動をうまく捉えていることを示唆している。 Neural network models have been proposed to explain the grapheme-phoneme mapping process in humans for many alphabet languages. These models not only successfully learned the correspondence of the letter strings and their pronunciation, but also captured human behavior in nonce word naming tasks. How would the neural models perform for a non-alphabet language (e.g., Chinese) unknown character task? How well would the model capture human behavior? In this study, we first collect human speakers' answers on unknown character naming tasks and then evaluate a set of transformer models by comparing their performances with human behaviors on an unknown Chinese character naming task. We found that the models and humans behaved very similarly, that they had similar accuracy distribution for each character, and had a substantial overlap in answers. In addition, the models' answers are highly correlated with humans' answers. These results suggested that the transformer models can well capture human's character naming behavior.	翻訳日:2023-05-16 22:59:47 公開日:2023-05-13
# 開量子系に対する一般化ブラウン粒子としてのディシパトン:ディシパトン埋め込み量子マスター方程式 Dissipatons as generalized Brownian particles for open quantum systems: Dissipaton-embedded quantum master equation ( http://arxiv.org/abs/2303.10666v2 ) ライセンス: Link先を確認	Xiang Li, Yu Su, Zi-Hao Chen, Yao Wang, Rui-Xue Xu, Xiao Zheng, YiJing Yan	(参考訳) ディシパトン理論はオープン量子系力学を扱うための正確で非摂動的なアプローチとして提案され、ガウス環境の影響はディシパトンと呼ばれる統計的準粒子によって特徴づけられる。本研究では、ディシパトン運動方程式を再検討し、同値なディシパトン埋め込み量子マスター方程式(dqme)を確立し、一般化されたブラウン粒子としてディシパトンを生成する。この論文で説明されているように、dqmeはディシパトンと物理的に支持されるハイブリッドバスモードの統計特性を調べるための直接的なアプローチを提供する。電子移動モデルを用いて数値実験を行い, 溶媒和座標の過渡的統計特性を示す。 Dissipaton theory had been proposed as an exact and nonperturbative approach to deal with open quantum system dynamics, where the influence of Gaussian environment is characterized by statistical quasi-particles named as dissipatons. In this work, we revisit the dissipaton equation of motion theory and establish an equivalent dissipatons-embedded quantum master equation (DQME), which gives rise to dissipatons as generalized Brownian particles. As explained in this work, the DQME supplies a direct approach to investigate the statistical characteristics of dissipatons and thus the physically supporting hybrid bath modes. Numerical demonstrations are carried out on the electron transfer model, exhibiting the transient statistical properties of the solvation coordinate.	翻訳日:2023-05-16 22:59:32 公開日:2023-05-13
# 人間中心設計のための人工共感に向けて:フレームワーク Toward Artificial Empathy for Human-Centered Design: A Framework ( http://arxiv.org/abs/2303.10583v2 ) ライセンス: Link先を確認	Qihao Zhu and Jianxi Luo	(参考訳) 設計プロセスの初期段階では、デザイナは未完成のニーズを発見し、潜在的な解決策として革新的な概念を開発することで機会を探る。人間中心のデザインの観点からは、デザイナーはニーズを真に理解するために、人々と共感しなくてはならない。しかし、共感の発達は、デザイナーの共感能力に大きく依存する複雑で主観的なプロセスである。したがって、共感的理解の発達は直感的であり、基礎となるニーズの発見はしばしばセレンディピティである。本稿では,AIによる人間中心設計の今後の方向性を示すために,人工知能研究からの洞察を提供することを目的としている。具体的には,データ駆動型ユーザ研究,共感的理解開発,人為的共感など研究分野を学際的に調査する。本稿では,人間中心設計において人工共感が果たす役割を論じ,人間中心設計のための人工共感フレームワークを提案する。共感の背後にあるメカニズムと共感設計の研究からの洞察に基づいて、このフレームワークは共感のかなり複雑で主観的な概念を計算的にモデル化できるコンポーネントとモジュールに分解することを目的としている。さらに,このようなシステムを開発することの期待できる利点を議論し,今後の研究努力を促進するための現在の研究ギャップを明らかにする。 In the early stages of the design process, designers explore opportunities by discovering unmet needs and developing innovative concepts as potential solutions. From a human-centered design perspective, designers must develop empathy with people to truly understand their needs. However, developing empathy is a complex and subjective process that relies heavily on the designer's empathic capability. Therefore, the development of empathic understanding is intuitive, and the discovery of underlying needs is often serendipitous. This paper aims to provide insights from artificial intelligence research to indicate the future direction of AI-driven human-centered design, taking into account the essential role of empathy. Specifically, we conduct an interdisciplinary investigation of research areas such as data-driven user studies, empathic understanding development, and artificial empathy. Based on this foundation, we discuss the role that artificial empathy can play in human-centered design and propose an artificial empathy framework for human-centered design. Building on the mechanisms behind empathy and insights from empathic design research, the framework aims to break down the rather complex and subjective concept of empathy into components and modules that can potentially be modeled computationally. Furthermore, we discuss the expected benefits of developing such systems and identify current research gaps to encourage future research efforts.	翻訳日:2023-05-16 22:59:02 公開日:2023-05-13
# 間隔密結合戦略に基づくスウィントランスを用いた低画質画像の解像度向上処理 Resolution Enhancement Processing on Low Quality Images Using Swin Transformer Based on Interval Dense Connection Strategy ( http://arxiv.org/abs/2303.09190v2 ) ライセンス: Link先を確認	Rui-Yang Ju, Chih-Chia Chen, Jen-Shiun Chiang, Yu-Shian Lin, Wei-Han Chen, Chun-Tse Chien	(参考訳) 本手法は,畳み込みニューラルネットワーク(cnns)に基づく手法と比較して,画像の超解像性能が著しく向上した。しかし、画像から特徴情報を抽出するために、SwinIR (Image Restoration Using Swin Transformer) のような自己保持機構を使用するには、膨大な量の計算資源が必要であるため、低計算パワープラットフォームへの応用が制限される。モデル機能の再利用を改善するため,新たに設計されたアルゴリズムに従って異なるブロックを接続するインターバルDense Connection Strategyを提案する。我々はこの戦略をSwinIRに適用し、SwinOIR (Object Image Restoration using Swin Transformer) と名付けた新しいモデルを提案する。画像の超解像に対して,区間密結合戦略がモデル性能に及ぼす影響を示すため,アブレーション実験を行った。さらに,このモデルを様々なベンチマークデータセット上で評価し,他のSOTA(State-of-the-art)軽量モデルと比較した。例えば、SwinOIRはUrban100データセットの超高解像度化のために26.62dBのPSNRを取得し、これはSOTAモデルSwinIRよりも0.15dB高い。本研究は, リアルタイムアプリケーションにおいて, YOLOv8(You Only Look Once)モデルの最後のバージョンと提案モデルを適用し, 低画質画像上でオブジェクト検出とリアルタイム画像の超解像を行う。この実装コードはhttps://github.com/Rubbbbbbby/SwinOIRで公開されている。 The Transformer-based method has demonstrated remarkable performance for image super-resolution in comparison to the method based on the convolutional neural networks (CNNs). However, using the self-attention mechanism like SwinIR (Image Restoration Using Swin Transformer) to extract feature information from images needs a significant amount of computational resources, which limits its application on low computing power platforms. To improve the model feature reuse, this research work proposes the Interval Dense Connection Strategy, which connects different blocks according to the newly designed algorithm. We apply this strategy to SwinIR and present a new model, which named SwinOIR (Object Image Restoration Using Swin Transformer). For image super-resolution, an ablation study is conducted to demonstrate the positive effect of the Interval Dense Connection Strategy on the model performance. Furthermore, we evaluate our model on various popular benchmark datasets, and compare it with other state-of-the-art (SOTA) lightweight models. For example, SwinOIR obtains a PSNR of 26.62 dB for x4 upscaling image super-resolution on Urban100 dataset, which is 0.15 dB higher than the SOTA model SwinIR. For real-life application, this work applies the lastest version of You Only Look Once (YOLOv8) model and the proposed model to perform object detection and real-life image super-resolution on low-quality images. This implementation code is publicly available at https://github.com/Rubbbbbbbbby/SwinOIR.	翻訳日:2023-05-16 22:58:05 公開日:2023-05-13
# Global Prompt Cell: 効率的なPromptチューニングのためのポータブルモジュール Global Prompt Cell: A Portable Control Module for Effective Prompt Tuning ( http://arxiv.org/abs/2304.05642v2 ) ライセンス: Link先を確認	Chi Liu, Haochun Wang, Nuwa Xi, Sendong Zhao, Bing Qin	(参考訳) 事前訓練されたモデルをチューニングするための新しいアプローチとして、プロンプトチューニングは、第1層の入力にトレーニング可能な埋め込みを挿入しながら、下流タスクでパラメータを凍結する。しかし,従来の手法は主に,プロンプト埋め込みの初期化に重点を置いている。適切な方法で迅速な埋め込みを訓練し活用する戦略は、迅速なチューニングの有効性の制限要因となっている。この問題に対処するために,すべてのエンコーダ層にまたがるプロンプト情報を選択的に保存するプロンプトチューニングモジュールであるGPC(Global Prompt Cell)を導入する。実験の結果,バニラプロンプトチューニングと比較して,SuperGLUEデータセットは5.8%改善した。 As a novel approach to tuning pre-trained models, prompt tuning involves freezing the parameters in downstream tasks while inserting trainable embeddings into inputs in the first layer. However, previous methods have mainly focused on the initialization of prompt embeddings. The strategy of training and utilizing prompt embeddings in a reasonable way has become a limiting factor in the effectiveness of prompt tuning. To address this issue, we introduce the Global Prompt Cell (GPC), a portable control module for prompt tuning that selectively preserves prompt information across all encoder layers. Our experimental results demonstrate a 5.8% improvement on SuperGLUE datasets compared to vanilla prompt tuning.	翻訳日:2023-05-16 22:50:01 公開日:2023-05-13
# DDP:高密度視覚予測のための拡散モデル DDP: Diffusion Model for Dense Visual Prediction ( http://arxiv.org/abs/2303.17559v2 ) ライセンス: Link先を確認	Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo	(参考訳) 本研究では,条件拡散パイプラインに基づく高密度視覚予測のための簡易かつ効率的かつ強力なフレームワークを提案する。提案手法は,ランダムなガウス分布からノイズを段階的に除去して予測する「ノイズ・ツー・マップ」生成パラダイムに従う。 DDPと呼ばれるこの手法は、デノナイジング拡散過程を現代の知覚パイプラインに効率的に拡張する。タスク固有の設計とアーキテクチャのカスタマイズがなければ、DDPはセマンティックセグメンテーションや深さ推定といった最も密集した予測タスクに簡単に一般化できる。さらにDDPは,従来の一段階判別法とは対照的に,動的推論や不確実性認識などの魅力的な特性を示す。 3つの代表的なタスクで,6つのベンチマークで上位結果を示し,トリックを伴わずに,ddpは各タスクの最高性能や競争性能を,専門家と比較した。例えば、セマンティックセグメンテーション (83.9 mIoU on Cityscapes)、BEVマップセグメンテーション (70.6 mIoU on nuScenes)、深さ推定 (0.05 REL on KITTI) などがある。私たちのアプローチが、堅固なベースラインとなり、将来の研究を促進することを願っています。 We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks, without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts. For example, semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research	翻訳日:2023-05-16 22:48:05 公開日:2023-05-13
# 自然言語の推論, 調査 Natural Language Reasoning, A Survey ( http://arxiv.org/abs/2303.14725v2 ) ライセンス: Link先を確認	Fei Yu, Hongbo Zhang, Prayag Tiwari, Benyou Wang	(参考訳) 本稿では,自然言語処理(NLP)分野における自然言語推論について,概念的にも実用的にも,より明確な視点を提案する。概念的には、我々は、哲学とNLPシナリオの両方に基づいて、NLPにおける自然言語推論の明確な定義を提供し、どのタスクが推論を必要とするかを議論し、推論の分類を導入します。本稿は,古典論理推論,自然言語推論,マルチホップ質問応答,コモンセンス推論を中心に,NLPにおける自然言語推論に関する総合的な文献レビューを行う。本稿は,多段階推論の強力なパラダイムである後方推論を同定し,考察し,自然言語推論研究における最も重要な将来方向の1つとしてデファシブル推論を導入する。ニューロシンボリック手法と数学的推論を除外し,単一モダリティ非構造化自然言語テキストに注目した。 This survey paper proposes a clearer view of natural language reasoning in the field of Natural Language Processing (NLP), both conceptually and practically. Conceptually, we provide a distinct definition for natural language reasoning in NLP, based on both philosophy and NLP scenarios, discuss what types of tasks require reasoning, and introduce a taxonomy of reasoning. Practically, we conduct a comprehensive literature review on natural language reasoning in NLP, mainly covering classical logical reasoning, natural language inference, multi-hop question answering, and commonsense reasoning. The paper also identifies and views backward reasoning, a powerful paradigm for multi-step reasoning, and introduces defeasible reasoning as one of the most important future directions in natural language reasoning research. We focus on single-modality unstructured natural language text, excluding neuro-symbolic techniques and mathematical reasoning.	翻訳日:2023-05-16 22:47:09 公開日:2023-05-13
# 行動検索:ラベルなしデータセットのクエリによるマイテーション学習 Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets ( http://arxiv.org/abs/2304.08742v2 ) ライセンス: Link先を確認	Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn	(参考訳) データ効率のよい方法で新しい視覚運動のスキルを習得するロボットの開発は、無数の課題に対して未解決の問題である。この問題に対処するための一般的なパラダイムは、多くの振る舞いを持つ大きなラベルのないデータセットを活用して、少数のタスク固有の人的監督(例えば介入やデモンストレーション)を使用して特定のタスクにポリシーを適用することである。しかし、タスク固有の監督を狭くし、オフラインデータとバランスをとるのがいかに最適かは、未解決の問題である。この研究における私たちの重要な洞察は、タスク固有のデータはエージェントがトレーニングする新しいデータを提供するだけでなく、エージェントが学習に使用するべき事前データの種類を知らせることもできます。具体的には、少量のダウンストリーム専門家データを使用して、オフラインでラベルなしのデータセット(多くのサブ最適動作を含む)から関連する振る舞いを選択的にクエリするシンプルなアプローチを提案する。エージェントは専門家とクエリーデータで共同で訓練される。提案手法はタスクへの関連する遷移のみをクエリし、サブ最適またはタスク不要なデータをフィルタリングすることを学習する。これにより、タスク固有のデータとオフラインのデータの混合からより効果的に学習することができる。さらに,画像からロボット操作タスクをシミュレートすることで,より複雑な目標条件付け手法を20%向上させることができた。ビデオやコードについてはhttps://sites.google.com/view/behaviorretrievalを参照。 Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is through leveraging large unlabeled datasets that have many behaviors in them and then adapting a policy to a specific task using a small amount of task-specific human supervision (i.e. interventions or demonstrations). However, how best to leverage the narrow task-specific supervision and balance it with offline data remains an open question. Our key insight in this work is that task-specific data not only provides new data for an agent to train on but can also inform the type of prior data the agent should use for learning. Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors). The agent is then jointly trained on the expert and queried data. We observe that our method learns to query only the relevant transitions to the task, filtering out sub-optimal or task-irrelevant data. By doing so, it is able to learn more effectively from the mix of task-specific and offline data compared to naively mixing the data or only using the task-specific data. Furthermore, we find that our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images. See https://sites.google.com/view/behaviorretrieval for videos and code.	翻訳日:2023-05-16 21:03:16 公開日:2023-05-13
# SweCTRL-Mini:スウェーデンにおける制御可能なテキスト生成のためのデータ透過トランスフォーマーに基づく大規模言語モデル SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish ( http://arxiv.org/abs/2304.13994v2 ) ライセンス: Link先を確認	Dmytro Kalpakchi, Johan Boye	(参考訳) SweCTRL-Miniは,1つのコンシューマグレードGPU上での推論と微調整に使用できる,スウェーデンの大規模言語モデルである。このモデルはKeskar, McCann, Varshney, Xiong, Socher (2019)によるCTRLアーキテクチャに基づいており、SweCTRL-Miniモデルのユーザは生成プロンプトに特別なトークンを挿入することで生成されたテキストのジャンルを制御できる。 SweCTRL-MiniはスウェーデンのmC4コーパスのサブセットとスウェーデンの小説のセットで訓練されている。本稿では,(1)使用済みの訓練データとテキストの前処理ステップの詳細な説明,(2)特定のフレーズ/ソースが訓練データの一部であったかどうかの確認,(2)自動評価手法と生成課題を用いた判別作業におけるモデルの評価について述べる。また,モデル生成能力とGPT-3の比較を行った。 SweCTRL-Miniは完全にオープンで、ダウンロードできる。 We present SweCTRL-Mini, a large Swedish language model that can be used for inference and fine-tuning on a single consumer-grade GPU. The model is based on the CTRL architecture by Keskar, McCann, Varshney, Xiong, and Socher (2019), which means that users of the SweCTRL-Mini model can control the genre of the generated text by inserting special tokens in the generation prompts. SweCTRL-Mini is trained on a subset of the Swedish part of the mC4 corpus and a set of Swedish novels. In this article, we provide (1) a detailed account of the utilized training data and text pre-processing steps, to the extent that it is possible to check whether a specific phrase/source was a part of the training data, and (2) an evaluation of the model on both discriminative tasks, using automatic evaluation methods, and generative tasks, using human referees. We also compare the generative capabilities of the model with those of GPT-3. SweCTRL-Mini is fully open and available for download.	翻訳日:2023-05-16 20:53:45 公開日:2023-05-13
# テンソルネットワークに基づく量子スピン系の還元基底サロゲート Reduced basis surrogates for quantum spin systems based on tensor networks ( http://arxiv.org/abs/2304.13587v2 ) ライセンス: Link先を確認	Paul Brehmer, Michael F. Herbst, Stefan Wessel, Matteo Rizzi, Benjamin Stamm	(参考訳) 還元基底法アプローチでは、例えば基底状態の位相図を調べるために、量子多体ヒルベルト空間の有効な低次元部分空間を構築する。この部分空間の基盤はスナップショットの解、すなわち、特定のパラメータ値と well-chosen パラメータ値に対応する基底状態から成り立っている。本稿では, 行列積状態(MPS)計算に基づいて, 還元基底を組み立て, パラメータ点を選択するための欲求戦略について述べる。減少基底が得られれば、位相図の計算に必要な可観測性は任意のパラメータ値のヒルベルト空間とは無関係な計算複雑性で計算することができる。本稿では、異方性および双曲面交換相互作用を含む、異なる1次元量子スピン-1モデルに対するこのアプローチの効率と精度を示し、リッチ量子位相図を導出する。 Within the reduced basis methods approach, an effective low-dimensional subspace of a quantum many-body Hilbert space is constructed in order to investigate, e.g., the ground-state phase diagram. The basis of this subspace is built from solutions of snapshots, i.e., ground states corresponding to particular and well-chosen parameter values. Here, we show how a greedy strategy to assemble the reduced basis and thus to select the parameter points can be implemented based on matrix-product-states (MPS) calculations. Once the reduced basis has been obtained, observables required for the computation of phase diagrams can be computed with a computational complexity independent of the underlying Hilbert space for any parameter value. We illustrate the efficiency and accuracy of this approach for different one-dimensional quantum spin-1 models, including anisotropic as well as biquadratic exchange interactions, leading to rich quantum phase diagrams.	翻訳日:2023-05-16 20:53:23 公開日:2023-05-13
# すべてのモデルはローカルである: 外部バリデーションをリカレントローカルバリデーションに置き換える時間 All models are local: time to replace external validation with recurrent local validation ( http://arxiv.org/abs/2305.03219v2 ) ライセンス: Link先を確認	Alex Youssef, Michael Pencina, Anshul Thakur, Tingting Zhu, David Clifton, Nigam H. Shah	(参考訳) 外部検証はMLモデルの一般化性を保証するためにしばしば推奨される。しかし、汎用性や、モデルの臨床的有用性(あらゆる臨床的意思決定支援ツールの最終的な目標)に匹敵するものではない。外部検証は、現在のヘルスケアMLのニーズと不一致である。まず、患者データは時間、地理、施設によって変化する。これらの変化は、単一の固定モデル(特に臨床mlを支配しているディープラーニングモデル)のパフォーマンスに大きなボラティリティをもたらします。第二に、新しいML技術、現在の市場力、更新された規制フレームワークは、デプロイされた個々のモデルインスタンスの頻繁な更新と監視を可能にしている。 MLモデルの安全性やユーティリティを確立するには,外部検証が不十分であることを示す。外部バリデーションパラダイムを修正するための提案は、十分に行き届かない。最終的なテストが私たちを混乱に導く可能性が高いので、引き続きそれに依存します。本稿では,MLOpsにインスパイアされた局所的検証のパラダイムを提案する。このパラダイムは、デプロイ毎のサイト固有の信頼性テストと、デプロイされたアルゴリズムのライフサイクル全体にわたる定期的かつ反復的なチェックに依存する。初期および繰り返しの信頼性テストは、パフォーマンス破壊的な分散シフトと、患者の安全性を損なうコンセプトドリフトから保護される。 External validation is often recommended to ensure the generalizability of ML models. However, it neither guarantees generalizability nor equates to a model's clinical usefulness (the ultimate goal of any clinical decision-support tool). External validation is misaligned with current healthcare ML needs. First, patient data changes across time, geography, and facilities. These changes create significant volatility in the performance of a single fixed model (especially for deep learning models, which dominate clinical ML). Second, newer ML techniques, current market forces, and updated regulatory frameworks are enabling frequent updating and monitoring of individual deployed model instances. We submit that external validation is insufficient to establish ML models' safety or utility. Proposals to fix the external validation paradigm do not go far enough. Continued reliance on it as the ultimate test is likely to lead us astray. We propose the MLOps-inspired paradigm of recurring local validation as an alternative that ensures the validity of models while protecting against performance-disruptive data variability. This paradigm relies on site-specific reliability tests before every deployment, followed by regular and recurrent checks throughout the life cycle of the deployed algorithm. Initial and recurrent reliability tests protect against performance-disruptive distribution shifts, and concept drifts that jeopardize patient safety.	翻訳日:2023-05-16 20:44:51 公開日:2023-05-13
# tweezer配列における反強磁性ボソニック$t$-$j$モデルとその量子シミュレーション Antiferromagnetic bosonic $t$-$J$ models and their quantum simulation in tweezer arrays ( http://arxiv.org/abs/2305.02322v2 ) ライセンス: Link先を確認	Lukas Homeier and Timothy J. Harris and Tizian Blatz and Ulrich Schollw\"ock and Fabian Grusdt and Annabelle Bohrdt	(参考訳) 分子の双極子交換とrydberg原子のヴァン・ダー・ワールス相互作用による強い相互作用を持つ光学トワイザーアレイの組み合わせは、幅広い量子スピンモデルの研究の扉を開いた。次の重要なステップは、そのような設定とモバイルのドーパントの組み合わせである。これにより、多くの強い相関量子材料を弱めていると信じられている物理学をシミュレートすることができる。ここでは、局所ヒルベルト空間を3つの内部原子状態あるいは分子状態の集合に符号化することで、ボゾン$t$-$J$モデルを実現する実験スキームを提案する。スピン間の反強磁性(AFM)カップレートの工学的結合により、高T_c$カップレートと同様の電荷運動と磁気秩序の競合を実現することができる。提案する2d $t$-$j$モデルのbosonic afmバージョンは以前に研究されていなかったので、まず2つのドーパント(ボソニック統計が役割を果たす最も単純な例)のケースを分析し、その結果をフェルミオンの場合と比較する。六脚シリンダ上で大規模密度行列再正規化群 (DMRG) 計算を行い, ストリップを形成するボソニックホールの強い傾向を見出した。これは、ボソニックなAFM$t$-$J$モデルが強い相関電子の集合相と同様の物理を含むことを証明している。 The combination of optical tweezer arrays with strong interactions -- via dipole-exchange of molecules and van-der-Waals interactions of Rydberg atoms -- has opened the door for the exploration of a wide variety of quantum spin models. A next significant step will be the combination of such settings with mobile dopants: This will enable to simulate the physics believed to underlie many strongly correlated quantum materials. Here we propose an experimental scheme to realize bosonic $t$-$J$ models via encoding the local Hilbert space in a set of three internal atomic or molecular states. By engineering antiferromagnetic (AFM) couplings between spins, competition between charge motion and magnetic order similar to that in high-$T_c$ cuprates can be realized. Since the bosonic AFM version of the 2D $t$-$J$ model we propose has not been studied previously, we start by analyzing the case of two dopants -- the simplest instance in which their bosonic statistics plays a role, and contrast our results to the fermionic case. We perform large-scale density matrix renormalization group (DMRG) calculations on six-legged cylinders, and find a strong tendency for bosonic holes to form stripes. This demonstrates that bosonic, AFM $t$-$J$ models may contain similar physics as the collective phases in strongly correlated electrons.	翻訳日:2023-05-16 20:44:21 公開日:2023-05-13
# 強結合ボースポーラロンの統一理論:反発ポーラロンから非ガウス多体バウンド状態へ A unified theory of strong coupling Bose polarons: From repulsive polarons to non-Gaussian many-body bound states ( http://arxiv.org/abs/2305.00835v2 ) ライセンス: Link先を確認	Nader Mostaan, Nathan Goldman, Fabian Grusdt	(参考訳) 我々は、フェシュバッハ共鳴を通じて、ホストボース・アインシュタイン凝縮体(BEC)と強く相互作用する移動不純物のボースポーラロン問題に対処する。強い結合における反発側では、理論的なアプローチは2つの異なるポラロン分岐を誘引性および反発性ポラロンに対応させて予測するが、この2つがどのように関連しているかは定かではない。これは、弱い反発的(安定)ボソン・ボソン相互作用と強い魅力(不安定)な不純物・ボソン相互作用の競合によるものであり、その相互作用は現代の理論手法では説明が難しい。ここでは、無限個のボソニック励起を含む不純物-ボソン散乱状態間のガウス相関と、不純物-ボソン結合状態を占めるボソン間の正確な非ガウス相関を結合する強力な変分フレームワークを開発する。この変分スキームは、共鳴の反発側でフェシュバッハ分子に生じる強い非線形性の完全な処理を可能にする。この枠組みでは,不純物誘起不安定性とボソン-ボソン相互作用による安定化の相互作用が,誘電体と反発性ポラロンの中間エネルギーにおける準安定多体結合状態の離散的集合をもたらすことを示した。これらの状態は非ガウス量子相関の形で強い量子統計特性を示し、その特徴づけには平均場以外の摂動性を必要とする。さらに、これらの多体結合状態は分子スペクトル重みを持ち、分子分光法技術によってアクセス可能である。この研究は、フェシュバッハ共鳴の反発側における魅力的で反発的なボースポーラロンの統一理論を提供する。 We address the Bose polaron problem of a mobile impurity interacting strongly with a host Bose-Einstein condensate (BEC) through a Feshbach resonance. On the repulsive side at strong couplings, theoretical approaches predict two distinct polaron branches corresponding to attractive and repulsive polarons, but it remains unclear how the two are related. This is partly due to the challenges resulting from a competition of strongly attractive (destabilizing) impurity-boson interactions with weakly repulsive (stabilizing) boson-boson interactions, whose interplay is difficult to describe with contemporary theoretical methods. Here we develop a powerful variational framework that combines Gaussian correlations among impurity-boson scattering states, including up to an infinite number of bosonic excitations, with exact non-Gaussian correlations among bosons occupying an impurity-boson bound state. This variational scheme enables a full treatment of strong nonlinearities arising in the Feshbach molecule on the repulsive side of the resonance. Within this framework, we demonstrate that the interplay of impurity-induced instability and stabilization by repulsive boson-boson interactions results in a discrete set of metastable many-body bound states at intermediate energies between the attractive and repulsive polaron branches. These states exhibit strong quantum statistical characteristics in the form of non-Gaussian quantum correlations, requiring non-perturbative beyond mean-field treatments for their characterization. Furthermore, these many-body bound states have sizable molecular spectral weights, accessible via molecular spectroscopy techniques. This work provides a unified theory of attractive and repulsive Bose polarons on the repulsive side of the Feshbach resonance.	翻訳日:2023-05-16 20:43:36 公開日:2023-05-13
# 1対1変圧器によるエンド・ツー・エンド車線検出 End-to-End Lane detection with One-to-Several Transformer ( http://arxiv.org/abs/2305.00675v4 ) ライセンス: Link先を確認	Kunyang Zhou and Rui Zhou	(参考訳) レーン検出手法は実世界のシナリオで印象的な性能を示したが、ほとんどの方法は十分に堅牢ではない後処理を必要とする。したがって、車線検出にはDetection TRansformer(DETR)のようなエンドツーエンド検出器が導入されたが、DTRにおける1対1のラベル割り当ては、ラベルセマンティックコンフリクトによるトレーニング効率の低下を招いている。さらに、detrにおける位置クエリは明示的な位置優先を提供することができないため、最適化が難しい。本稿では,1-to-Several Transformer(O2SFormer)を提案する。まず,1対1のラベル代入と1対1のラベル代入を組み合わせた1対1のラベル代入を提案する。 1対1の割り当てを最適化する難しさを克服する。さらに,異なるデコーダ層において正のレーンアンカーの正の重みを動的に調整する層別ソフトラベルを提案する。最後に,動的アンカーに基づく位置問合せの設計を行い,位置問合せにレーンアンカーを組み込むことにより位置先行を探索する。実験の結果、resnet50 backboneのo2sformerはculaneデータセットで77.83%のf1スコアを獲得し、既存のtransformerベースおよびcnnベースの検出器よりも優れていた。さらにO2SFormerはResNet18バックボーンのDETRよりも12.5倍高速に収束する。 Although lane detection methods have shown impressive performance in real-world scenarios, most of methods require post-processing which is not robust enough. Therefore, end-to-end detectors like DEtection TRansformer(DETR) have been introduced in lane detection.However, one-to-one label assignment in DETR can degrade the training efficiency due to label semantic conflicts. Besides, positional query in DETR is unable to provide explicit positional prior, making it difficult to be optimized. In this paper, we present the One-to-Several Transformer(O2SFormer). We first propose the one-to-several label assignment, which combines one-to-many and one-to-one label assignment to solve label semantic conflicts while keeping end-to-end detection. To overcome the difficulty in optimizing one-to-one assignment. We further propose the layer-wise soft label which dynamically adjusts the positive weight of positive lane anchors in different decoder layers. Finally, we design the dynamic anchor-based positional query to explore positional prior by incorporating lane anchors into positional query. Experimental results show that O2SFormer with ResNet50 backbone achieves 77.83% F1 score on CULane dataset, outperforming existing Transformer-based and CNN-based detectors. Futhermore, O2SFormer converges 12.5x faster than DETR for the ResNet18 backbone.	翻訳日:2023-05-16 20:43:07 公開日:2023-05-13
# 振動ポラリトン化学の微視的理論 Microscopic Theory of Vibrational Polariton Chemistry ( http://arxiv.org/abs/2305.05005v2 ) ライセンス: Link先を確認	Wenxiang Ying, Michael A.D. Taylor, and Pengfei Huo	(参考訳) 振動強い結合(VSC)修飾反応速度定数を説明するための顕微鏡理論を提案する。解析理論は、キャビティモードが反応の速度制限ステップである反応物の基底状態から振動励起状態への遷移を促進するという力学的予想に基づいている。この理論は通常の入射角度で観測された共鳴効果を説明する。コヒーレントな振動エネルギー移動像を仮定すると、理論は集団効果を説明でき、実験的に検証可能ないくつかの予測を行うことができる。 We present a microscopic theory that aims to explain the vibrational strong coupling (VSC) modified reaction rate constant. The analytic theory is based on a mechanistic conjecture that cavity modes promote the transition from the ground state to the vibrational excited state of the reactant, which is the rate-limiting step of the reaction. The theory explains the observed resonance effect at the normal incident angle. Assuming the coherent vibrational energy transfer picture, the theory can also explain the collective effect and makes several predictions that are experimentally verifiable.	翻訳日:2023-05-16 20:34:48 公開日:2023-05-13
# AlignSTS: クロスモーダルアライメントによる音声対歌変換 AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment ( http://arxiv.org/abs/2305.04476v3 ) ライセンス: Link先を確認	Ruiqi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao	(参考訳) 音声認識(sts)音声変換タスクは、音声録音に対応する歌唱サンプルを生成することを目的としており、ターゲット(音声)ピッチ輪郭とソース(音声)コンテンツとのアライメントは、テキストのない状況では学習が困難である。本稿では,音節や内容などの発話の相違を異なるモーダル性として捉えた,明示的なクロスモーダルアライメントに基づくSTSモデルであるAlignSTSを提案する。人間がメロディの歌詞を歌うメカニズムに触発されたAlignSTS: 1)新規なリズム適応器を採用して、目標リズム表現を予測し、そのリズム表現が単純で効果的な方法で計算され、離散空間に量子化される、内容とピッチの間のモダリティギャップを橋渡しする。 2) 予測リズム表現を用いて, クロスアテンションに基づいてコンテンツを再調整し, 再合成のためのクロスモーダル融合を行う。大規模な実験では、AlignSTSは客観的な指標と主観的な指標の両方で優れたパフォーマンスを達成している。オーディオサンプルはhttps://alignsts.github.ioで入手できる。 The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings while facing a major challenge: the alignment between the target (singing) pitch contour and the source (speech) content is difficult to learn in a text-free situation. This paper proposes AlignSTS, an STS model based on explicit cross-modal alignment, which views speech variance such as pitch and content as different modalities. Inspired by the mechanism of how humans will sing the lyrics to the melody, AlignSTS: 1) adopts a novel rhythm adaptor to predict the target rhythm representation to bridge the modality gap between content and pitch, where the rhythm representation is computed in a simple yet effective way and is quantized into a discrete space; and 2) uses the predicted rhythm representation to re-align the content based on cross-attention and conducts a cross-modal fusion for re-synthesize. Extensive experiments show that AlignSTS achieves superior performance in terms of both objective and subjective metrics. Audio samples are available at https://alignsts.github.io.	翻訳日:2023-05-16 20:33:48 公開日:2023-05-13
# 全方向量子制限位相保存増幅器 Fully Directional Quantum-limited Phase-Preserving Amplifier ( http://arxiv.org/abs/2305.04184v2 ) ライセンス: Link先を確認	Gangqiang Liu, Andrew Lingenfelter, Vidul R. Joshi, Nicholas E. Frattini, Volodymyr V. Sivak, Shyam Shankar and Michel H. Devoret	(参考訳) 本研究では,4つのモードにまたがる6つのパラメトリックプロセス間の干渉を利用して,4ポート4モード超伝導ジョセフソン回路の完全指向性,量子制限型位相保存増幅を実現する方法を提案する。完全方向性(full directionality)は、増幅器の入力ポートと出力ポートの間の前方利得を超える逆分離として定義され、アプリケーション中に出力ポートに存在するインピーダンスミスマッチに対するロバスト性を保証する。既存の指向性位相保存増幅器とは異なり、最小のバックアクションとこの増幅器の量子制限付加ノイズは出力ポートのノイズインシデントの影響を受けない。さらに、一致した入力および出力ポートは、これらの増幅器を他の回路QEDコンポーネントと直接チップ上で統合することができ、超伝導量子プロセッサのスケールアップを容易にする。 We present a way to achieve fully directional, quantum-limited phase-preserving amplification in a four-port, four-mode superconducting Josephson circuit by utilizing interference between six parametric processes that couple all four modes. Full directionality, defined as the reverse isolation surpassing forward gain between the matched input and output ports of the amplifier, ensures its robustness against impedance mismatch that might be present at its output port during applications. Unlike existing directional phase-preserving amplifiers, both the minimal back-action and the quantum-limited added noise of this amplifier remains unaffected by noise incident on its output port. In addition, the matched input and output ports allow direct on-chip integration of these amplifiers with other circuit QED components, facilitating scaling up of superconducting quantum processors.	翻訳日:2023-05-16 20:33:27 公開日:2023-05-13
# Patchwork Learning: 多様なバイオメディカルデータソースの統合分析に向けたパラダイム Patchwork Learning: A Paradigm Towards Integrative Analysis across Diverse Biomedical Data Sources ( http://arxiv.org/abs/2305.06217v2 ) ライセンス: Link先を確認	Suraj Rajendran, Weishen Pan, Mert R. Sabuncu, Yong Chen, Jiayu Zhou, Fei Wang	(参考訳) 医療における機械学習(ml)は、患者ケア、人口健康、医療提供者のワークフローを強化する多くの機会を提供する。しかし、データプライバシや異種データソースの課題、複数のデータモダリティを完全に活用できないため、実際の臨床とコストのメリットは依然として限られている。本稿では,異なるデータモダリティ(クリニカル・フリーテキスト,医用画像,オミクスなど)から構成される異なるデータセットからの情報を統合することにより,これらの制約に対処する新しいパラダイムである"パッチワーク・ラーニング"(PL)を紹介する。 PLはデータのプライバシを保ちながら補完的なデータソースを同時に利用することを可能にし、より包括的で一般化可能なMLモデルの開発を可能にする。本稿では,パッチワーク学習の概念と医療における現在の実装について紹介し,様々な医療課題に対処するための潜在的機会と適用可能なデータソースについて検討する。 PLは、情報共有と欠落したデータのインプットを容易にするために、サイトをまたいだブリッジングのモダリティや重複する特徴空間を活用し、関連する予測タスクに対処する。本稿では,PLに関連する課題について論じる。その多くが連合学習とマルチモーダル学習によって共有され,今後の研究への提言を提供する。医療データ統合に対するより包括的なアプローチを提供することで、パッチワーク学習はMLモデルの臨床的適用性に革命をもたらす可能性がある。このパラダイムは、パーソナライゼーションと一般化可能性のバランスを保ち、最終的には患者の体験を向上し、人口の健康を改善し、医療提供者のワークフローを最適化することを約束する。 Machine learning (ML) in healthcare presents numerous opportunities for enhancing patient care, population health, and healthcare providers' workflows. However, the real-world clinical and cost benefits remain limited due to challenges in data privacy, heterogeneous data sources, and the inability to fully leverage multiple data modalities. In this perspective paper, we introduce "patchwork learning" (PL), a novel paradigm that addresses these limitations by integrating information from disparate datasets composed of different data modalities (e.g., clinical free-text, medical images, omics) and distributed across separate and secure sites. PL allows the simultaneous utilization of complementary data sources while preserving data privacy, enabling the development of more holistic and generalizable ML models. We present the concept of patchwork learning and its current implementations in healthcare, exploring the potential opportunities and applicable data sources for addressing various healthcare challenges. PL leverages bridging modalities or overlapping feature spaces across sites to facilitate information sharing and impute missing data, thereby addressing related prediction tasks. We discuss the challenges associated with PL, many of which are shared by federated and multimodal learning, and provide recommendations for future research in this field. By offering a more comprehensive approach to healthcare data integration, patchwork learning has the potential to revolutionize the clinical applicability of ML models. This paradigm promises to strike a balance between personalization and generalizability, ultimately enhancing patient experiences, improving population health, and optimizing healthcare providers' workflows.	翻訳日:2023-05-16 20:25:39 公開日:2023-05-13
# 何て言うんだ! 大きな言語モデルでは否定的常識の知識が多すぎる Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge ( http://arxiv.org/abs/2305.05976v2 ) ライセンス: Link先を確認	Jiangjie Chen, Wei Shi, Ziquan Fu, Sijie Cheng, Lei Li, Yanghua Xiao	(参考訳) 大規模言語モデル(llm)は、ポジティブな知識を蓄積し活用する能力について広く研究されている。しかし、「lion don't live in the ocean」のような否定的な知識は世界でもユビキタスであるが、テキストで明示的に言及されることは滅多にない。 LLMは負の知識について何を知っているのか? 本研究は,LLMの負のコモンセンス知識に対する能力について検討する。制約付きキーワード対文生成タスク(CG)とブール質問回答タスク(QA)を設計し,LLMを探索する。実験の結果,LLMは負のコモンセンス知識に基づく有効な文の生成に失敗することが多いことがわかった。我々はこの現象をLLMの信念衝突と呼ぶ。さらなる分析から,言語モデリングの事前学習による統計的近道と否定報告バイアスが,この衝突の原因となることが示された。 Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as "lions don't live in the ocean", is also ubiquitous in the world but rarely mentioned explicitly in the text. What do LLMs know about negative knowledge? This work examines the ability of LLMs to negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question-answering task (QA) to probe LLMs. Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs. Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict.	翻訳日:2023-05-16 20:24:50 公開日:2023-05-13
# MetaMorphosis:マルチタスク学習のためのタスク指向プライバシ認知機能生成 MetaMorphosis: Task-oriented Privacy Cognizant Feature Generation for Multi-task Learning ( http://arxiv.org/abs/2305.07815v1 ) ライセンス: Link先を確認	Md Adnan Arefeen, Zhouyu Li, Md Yusuf Sarwar Uddin, Anupam Das	(参考訳) コンピュータビジョンアプリケーションの成長、ディープラーニング、エッジコンピューティングは、エッジデバイスとクラウドにワークロードを分散させることで、実用的なコラボレーションインテリジェンス(CI)の確保に寄与する。しかしながら、エッジデバイス上で別々のシングルタスクモデルを実行することは、必要な計算リソースと時間に関して非効率である。このコンテキストでは、マルチタスク学習は、セマンティックセグメンテーションや入ってくるビデオフレームの深さ推定など、複数のタスクを実行するために単一のディープラーニングモデルを活用することができる。この単一処理パイプラインは、マルチタスクモジュール間で共有される共通の深い特徴を生成する。しかし、コラボレーティブインテリジェンスシナリオでは、共通の深い特徴を生成するには2つの大きな問題がある。まず、深い機能には、下流モジュールに露出した入力情報(入力のプライバシーを侵害する)が不注意に含まれる可能性がある。第二に、生成されたユニバーサル機能は、あるタスクを意図したものよりも集合的な情報を露出し、あるタスクの機能を他のタスク(タスクプライバシに違反する)に利用することができる。本稿では,特定のタスクに対する推論能力を制限する,新しいディープラーニングベースのプライバシー認識機能生成プロセスであるmetamorphosisを提案する。そこで本研究では,すべてのタスクに明確な注意を払い,差分プライバシーを持つ非相関損失関数を用いて,各タスクのアウトプットとして異なるプライバシアウェア機能を生成するディープラーニングモデルを訓練する,チャネルスクイーズ励起型特徴形変換モジュールcross-secを提案する。シーン理解と顔の属性に関する多様な画像からなる4つのデータセットを広範囲に実験した結果,画像とビデオ分析の効率的な方法でプライバシ要件を保証し,近年の逆学習や普遍的特徴生成手法よりもメタモルフィズムが優れていることが示された。 With the growth of computer vision applications, deep learning, and edge computing contribute to ensuring practical collaborative intelligence (CI) by distributing the workload among edge devices and the cloud. However, running separate single-task models on edge devices is inefficient regarding the required computational resource and time. In this context, multi-task learning allows leveraging a single deep learning model for performing multiple tasks, such as semantic segmentation and depth estimation on incoming video frames. This single processing pipeline generates common deep features that are shared among multi-task modules. However, in a collaborative intelligence scenario, generating common deep features has two major issues. First, the deep features may inadvertently contain input information exposed to the downstream modules (violating input privacy). Second, the generated universal features expose a piece of collective information than what is intended for a certain task, in which features for one task can be utilized to perform another task (violating task privacy). This paper proposes a novel deep learning-based privacy-cognizant feature generation process called MetaMorphosis that limits inference capability to specific tasks at hand. To achieve this, we propose a channel squeeze-excitation based feature metamorphosis module, Cross-SEC, to achieve distinct attention of all tasks and a de-correlation loss function with differential-privacy to train a deep learning model that produces distinct privacy-aware features as an output for the respective tasks. With extensive experimentation on four datasets consisting of diverse images related to scene understanding and facial attributes, we show that MetaMorphosis outperforms recent adversarial learning and universal feature generation methods by guaranteeing privacy requirements in an efficient way for image and video analytics.	翻訳日:2023-05-16 19:39:03 公開日:2023-05-13
# Cloud-RAIN: 反射不変性によるポイントクラウド分析 Cloud-RAIN: Point Cloud Analysis with Reflectional Invariance ( http://arxiv.org/abs/2305.07814v1 ) ライセンス: Link先を確認	Yiming Cui, Lecheng Ruan, Hang-Cheng Dong, Qiang Li, Zhongming Wu, Tieyong Zeng, Feng-Lei Fan	(参考訳) 点雲タスクのネットワークは、回転や反射のような点雲が親和的に変換されるときに不変であることが期待される。これまでのところ、近年研究が注目されている回転不変性に対して、反射不変性はほとんど対処されていない。にもかかわらず、リフレクション対称性は、構造化道路の静的反射対称性、動く物体(歩行者など)の双方向運動からの動的反射対称性、異なる国の左右の交通慣行など、非常に一般的で重要なシナリオで自分自身を見つけることができる。私たちの知る限りでは、残念ながら、これまでポイントクラウド分析でリフレクション不変ネットワークが報告されていない。このギャップを埋めるために,2次ニューロンと,Cloud-RAINと呼ばれるPCA標準表現を用いて, \underline{R}eflection\underline{A}l \underline{IN} 分散を用いた点 \underline{Cloud} モデルを実現する枠組みを提案する。クラウドレーンはなぜ反射対称性を享受できるのかを説明するための定理を証明する。さらに、広範な実験は、提案したCloud-RAINの反射特性を相関させ、Cloud-RAINがデータ拡張よりも優れていることを示す。私たちのコードはhttps://github.com/YimingCuiCuiCui/Cloud-RAINで利用可能です。 The networks for point cloud tasks are expected to be invariant when the point clouds are affinely transformed such as rotation and reflection. So far, relative to the rotational invariance that has been attracting major research attention in the past years, the reflection invariance is little addressed. Notwithstanding, reflection symmetry can find itself in very common and important scenarios, e.g., static reflection symmetry of structured streets, dynamic reflection symmetry from bidirectional motion of moving objects (such as pedestrians), and left- and right-hand traffic practices in different countries. To the best of our knowledge, unfortunately, no reflection-invariant network has been reported in point cloud analysis till now. To fill this gap, we propose a framework by using quadratic neurons and PCA canonical representation, referred to as Cloud-RAIN, to endow point \underline{Cloud} models with \underline{R}eflection\underline{A}l \underline{IN}variance. We prove a theorem to explain why Cloud-RAIN can enjoy reflection symmetry. Furthermore, extensive experiments also corroborate the reflection property of the proposed Cloud-RAIN and show that Cloud-RAIN is superior to data augmentation. Our code is available at https://github.com/YimingCuiCuiCui/Cloud-RAIN.	翻訳日:2023-05-16 19:38:32 公開日:2023-05-13
# ドアベルカメラの軽量化検出 Lightweight Delivery Detection on Doorbell Cameras ( http://arxiv.org/abs/2305.07812v1 ) ライセンス: Link先を確認	Pirazh Khorramshahi, Zhe Wu, Tianchen Wang, Luke Deluccia, Hongcheng Wang	(参考訳) 近年の映像ベース行動認識と強固な時空間モデリングの進歩にもかかわらず、提案手法の多くは計算資源の豊富さに頼り、大規模で計算集約的な畳み込みやトランスフォーマーベースのニューラルネットワークを実行して十分な結果を得る。これにより、電力とコンピューティングリソースが制限されたエッジデバイスへのそのようなモデルのデプロイが制限される。本研究では、重要なスマートホームアプリケーション、ビデオベースの配信検出、リソース制約されたドアベルカメラ上で動作可能な、このタスクのためのシンプルで軽量なパイプラインを提案する。提案するパイプラインは,動きの手がかりに基づいて,粗いアクティビティ提案のセットを生成し,さらに,モバイルフレンドリーな3dcnnネットワークで分類する。トレーニングのために、ネットワークが強固な時空間的特徴を学ぶのに役立つ新しい半教師付きアテンションモジュールを設計し、ネットワークによってなされる予測の不確かさを定量化するためのエビデンスベースの最適化目標を採用する。私たちのキュレーションされたデリバリデータセットにおける実験結果は、代替品と比較してパイプラインの有意な有効性を示し、自由かつかなりの推論時間のパフォーマンス向上を達成するためのトレーニングフェーズのノベルティのメリットを強調しています。 Despite recent advances in video-based action recognition and robust spatio-temporal modeling, most of the proposed approaches rely on the abundance of computational resources to afford running huge and computation-intensive convolutional or transformer-based neural networks to obtain satisfactory results. This limits the deployment of such models on edge devices with limited power and computing resources. In this work we investigate an important smart home application, video based delivery detection, and present a simple and lightweight pipeline for this task that can run on resource-constrained doorbell cameras. Our proposed pipeline relies on motion cues to generate a set of coarse activity proposals followed by their classification with a mobile-friendly 3DCNN network. For training we design a novel semi-supervised attention module that helps the network to learn robust spatio-temporal features and adopt an evidence-based optimization objective that allows for quantifying the uncertainty of predictions made by the network. Experimental results on our curated delivery dataset shows the significant effectiveness of our pipeline compared to alternatives and highlights the benefits of our training phase novelties to achieve free and considerable inference-time performance gains.	翻訳日:2023-05-16 19:38:05 公開日:2023-05-13
# ReLU MLPにおける$\mu$P学習率の深さ依存性 Depth Dependence of $\mu$P Learning Rates in ReLU MLPs ( http://arxiv.org/abs/2305.07810v1 ) ライセンス: Link先を確認	Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar	(参考訳) 本稿では、平均フィールド重み初期化を備えた幅$n$と深さ$L$のランダム完全連結ReLUネットワークについて考察する。我々の目的は、最大更新(\mu$p)学習率のn$とl$への依存を調べることである。 yang et の $\mu$p に関する先行研究と同じように。この最大更新学習率は、第1層と第2層の重みを除いて、すべて$n$とは独立している。しかし、それは非自明な$l$依存性を持ち、$l^{-3/2}のようにスケーリングする。 $ In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization. Our purpose is to study the dependence on $n$ and $L$ of the maximal update ($\mu$P) learning rate, the largest learning rate for which the mean squared change in pre-activations after one step of gradient descent remains uniformly bounded at large $n,L$. As in prior work on $\mu$P of Yang et. al., we find that this maximal update learning rate is independent of $n$ for all but the first and last layer weights. However, we find that it has a non-trivial dependence of $L$, scaling like $L^{-3/2}.$	翻訳日:2023-05-16 19:37:44 公開日:2023-05-13
# Mesh2SSM: 表面メッシュから解剖の統計的形状モデルへ Mesh2SSM: From Surface Meshes to Statistical Shape Models of Anatomy ( http://arxiv.org/abs/2305.07805v1 ) ライセンス: Link先を確認	Krithika Iyer, Shireen Elhabian	(参考訳) 統計的形状モデリングは、医療画像(MRIやCTスキャンなど)で捉えたセグメント化された解剖学から重要な形状パラメータを発見する計算過程である。人間の解剖学における実質的な非線形変動の存在は、しばしば伝統的な形状モデリングプロセスを困難にしている。深層学習技術は、形状の複雑な非線形表現を学習し、基礎となる人口レベルの変動に忠実な統計的形状モデルを生成することができる。しかし、既存のディープラーニングモデルは依然として制限があり、トレーニングのために確立/最適化された形状モデルが必要である。我々は、教師なしの置換不変表現学習を活用して、テンプレートポイントクラウドを主観的なメッシュに変形する方法を推定し、対応性に基づく形状モデルを作成する新しいアプローチであるMesh2SSMを提案する。 Mesh2SSMは集団固有のテンプレートも学習でき、テンプレート選択によるバイアスを低減できる。提案手法はメッシュ上で直接動作し,計算効率が高いため,従来型および深層学習に基づくSSMアプローチの代替となる。 Statistical shape modeling is the computational process of discovering significant shape parameters from segmented anatomies captured by medical images (such as MRI and CT scans), which can fully describe subject-specific anatomy in the context of a population. The presence of substantial non-linear variability in human anatomy often makes the traditional shape modeling process challenging. Deep learning techniques can learn complex non-linear representations of shapes and generate statistical shape models that are more faithful to the underlying population-level variability. However, existing deep learning models still have limitations and require established/optimized shape models for training. We propose Mesh2SSM, a new approach that leverages unsupervised, permutation-invariant representation learning to estimate how to deform a template point cloud to subject-specific meshes, forming a correspondence-based shape model. Mesh2SSM can also learn a population-specific template, reducing any bias due to template selection. The proposed method operates directly on meshes and is computationally efficient, making it an attractive alternative to traditional and deep learning-based SSM approaches.	翻訳日:2023-05-16 19:37:32 公開日:2023-05-13
# CEMFormer:空間時間変換器による車内および外部カメラからのドライバー意図の予測 CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers ( http://arxiv.org/abs/2305.07840v1 ) ライセンス: Link先を確認	Yunsheng Ma, Wenqian Ye, Xu Cao, Amr Abdelraouf, Kyungtae Han, Rohit Gupta, Ziran Wang	(参考訳) ドライバーの意図予測は、周囲の交通環境に関する行動を分析することによってドライバーの行動を予測しようとするものである。既存のアプローチは主にレイトフュージョン技術に注目し、予測と一般的な駆動コンテキスト間の一貫性を維持することの重要性を無視している。本稿では,時空間トランスフォーマを使用してドライバの意図予測を改善するための統合メモリ表現を学習する,cross-view episodic memory transformer(cemformer)と呼ばれる新しいフレームワークを提案する。具体的には,in-cabinとexternal cameraの双方からの情報とエピソディックメモリ表現を統合し,履歴データを連続的に融合する空間時空間エンコーダを開発した。さらに,運転コンテキストを補助的監視信号として組み込んで予測性能を向上させる新しいコンテキスト一貫性損失を提案する。 Brain4Carsデータセットに関する包括的な実験は、CEMFormerがドライバーの意図予測において既存の最先端メソッドを一貫して上回っていることを示している。 Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing driving contexts. In this paper, we introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer), which employs spatio-temporal transformers to learn unified memory representations for an improved driver intention prediction. Specifically, we develop a spatial-temporal encoder to integrate information from both in-cabin and external camera views, along with episodic memory representations to continuously fuse historical data. Furthermore, we propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance. Comprehensive experiments on the Brain4Cars dataset demonstrate that CEMFormer consistently outperforms existing state-of-the-art methods in driver intention prediction.	翻訳日:2023-05-16 19:30:35 公開日:2023-05-13
# 多言語言語モデルの幾何学:平等レンズ The Geometry of Multilingual Language Models: An Equality Lens ( http://arxiv.org/abs/2305.07839v1 ) ライセンス: Link先を確認	Cheril Shah, Yashashree Chandak, Manan Suri	(参考訳) 多言語言語モデルにおける異なる言語の表現を理解することは、言語間特性の理解、下流タスクのパフォーマンスの予測、言語間のバイアスの特定に不可欠である。本研究では, ユークリッド空間における3つの多言語モデルの幾何学を解析し, すべての言語が一意な幾何学で表されることを示す。幾何学的分離性指数を用いて、言語は言語族によって近い傾向にあるが、それらは他族の言語とほぼ分離可能である。また,意味空間における言語間距離を測定するために,言語間類似度指数を導入する。以上の結果から,低リソース言語は,いずれのモデルにおいても高リソース言語ほど良く表現されていないことが示唆された。 Understanding the representations of different languages in multilingual language models is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying any biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that all languages are represented by unique geometries. Using a geometric separability index we find that although languages tend to be closer according to their linguistic family, they are almost separable with languages from other families. We also introduce a Cross-Lingual Similarity Index to measure the distance of languages with each other in the semantic space. Our findings indicate that the low-resource languages are not represented as good as high resource languages in any of the models	翻訳日:2023-05-16 19:30:18 公開日:2023-05-13
# 重み付きパッチ品質予測による非参照点クラウド品質評価 No-Reference Point Cloud Quality Assessment via Weighted Patch Quality Prediction ( http://arxiv.org/abs/2305.07829v1 ) ライセンス: Link先を確認	Jun Cheng, Honglei Su, Jari Korhonen	(参考訳) ポイントクラウドに基づく3Dビジョンアプリケーションの開発が急速に進み、ポイントクラウド品質評価(PCQA)が重要な研究トピックになりつつある。しかし、従来のPCQA手法では、点雲の異なる領域における局所的な品質変動の影響を無視する。品質分布不均衡の利点を生かし,地域相関解析機能を備えた非参照点雲質評価法(NR-PCQA)を提案する。具体的には、ポイントクラウドをパッチに分割し、各パッチのテクスチャと構造機能を生成し、それらをパッチ機能に融合してパッチ品質を予測します。そして,相関解析のために点雲のすべてのパッチの特徴を収集し,相関重みを求める。最後に、すべてのパッチに対する予測品質と相関重みを用いて最終的な品質スコアを導出する。実験の結果,提案手法はNR-PCQA法よりも優れていた。 COPP-Netのソースコードはhttps://github.com/philox12358/COPP-Netにある。 With the rapid development of 3D vision applications based on point clouds, point cloud quality assessment(PCQA) is becoming an important research topic. However, the prior PCQA methods ignore the effect of local quality variance across different areas of the point cloud. To take an advantage of the quality distribution imbalance, we propose a no-reference point cloud quality assessment (NR-PCQA) method with local area correlation analysis capability, denoted as COPP-Net. More specifically, we split a point cloud into patches, generate texture and structure features for each patch, and fuse them into patch features to predict patch quality. Then, we gather the features of all the patches of a point cloud for correlation analysis, to obtain the correlation weights. Finally, the predicted qualities and correlation weights for all the patches are used to derive the final quality score. Experimental results show that our method outperforms the state-of-the-art benchmark NR-PCQA methods. The source code for the proposed COPP-Net can be found at https://github.com/philox12358/COPP-Net.	翻訳日:2023-05-16 19:30:05 公開日:2023-05-13
# DCASE 2023チャレンジタスクの解説と議論第2報:機械条件モニタリングのための1ショット無監督異常音検出 Description and Discussion on DCASE 2023 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring ( http://arxiv.org/abs/2305.07828v1 ) ライセンス: Link先を確認	Kota Dohi and Keisuke Imoto and Noboru Harada and Daisuke Niizumi and Yuma Koizumi and Tomoya Nishida and Harsh Purohit and Ryo Tanabe and Takashi Endo and Yohei Kawaguchi	(参考訳) 本稿では,音響シーンとイベントの検出と分類に関するタスク記述(dcase)2023 challenge task 2: "first-shot unsupervised anomalous sound detection (asd) for machine condition monitoring"について述べる。主な目標は、ハイパーパラメータチューニングを必要とせず、少数の正常なサンプルのみを使用して、新しい種類のマシンにasdシステムを迅速に展開できるようにすることである。過去のASDタスクでは、開発および評価データセットが同じマシンタイプであったため、各マシンタイプごとにハイパーパラメータをチューニングする手法が開発された。しかし、通常データや異常データを開発データセットとして収集することは現実には不可能である。 2023年タスク2では、全く新しいマシンタイプのマシンでモデルをトレーニングするという課題であるファーストショット問題を解決することに集中します。具体的には (i)各マシンタイプは1つのセクションしか持たず、 (ii) 開発・評価データセットのマシンタイプは全く異なる。課題提出期限後に,課題結果と提案内容の分析を加えます。 We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 2: "First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring". The main goal is to enable rapid deployment of ASD systems for new kinds of machines using only a few normal samples, without the need for hyperparameter tuning. In the past ASD tasks, developed methods tuned hyperparameters for each machine type, as the development and evaluation datasets had the same machine types. However, collecting normal and anomalous data as the development dataset can be infeasible in practice. In 2023 Task 2, we focus on solving first-shot problem, which is the challenge of training a model on a few machines of a completely novel machine type. Specifically, (i) each machine type has only one section, and (ii) machine types in the development and evaluation datasets are completely different. We will add challenge results and analysis of the submissions after the challenge submission deadline.	翻訳日:2023-05-16 19:29:48 公開日:2023-05-13
# 混合商品距離による静的単語埋め込みの周波数対応次元選択 Frequency-aware Dimension Selection for Static Word Embedding by Mixed Product Distance ( http://arxiv.org/abs/2305.07826v1 ) ライセンス: Link先を確認	Lingfeng Shen, Haiyun Jiang, Lemao Liu, Ying Chen	(参考訳) 静的な単語埋め込みは、特にコンテキストが利用できないタスクでは、事前学習された言語モデルは、静的な単語埋め込みよりもパフォーマンスが悪いため、まだ有用である。次元は静的単語埋め込みの品質を決定する重要な要素であるが、自動次元選択はめったに議論されない。本稿では, 単語の頻度が次元選択に与える影響について検討し, 単語の頻度が非常に重要であり, 次元選択中に考慮する必要があることを実証的に確認する。このような経験的発見に基づいて, 単語埋め込みアルゴリズムを訓練することなく, 単語埋め込みアルゴリズムの適切な次元を選択するために, 距離(Mixed Product Distance, MPD)を用いた次元選択法を提案する。オラクル行列に後処理関数を適用することで、MPDベースの手法は単語周波数の影響を非強調化することができる。コンテクスト未使用タスクとコンテクスト利用可能タスクの両方に関する実験は、ベースライン上のmpdベースの次元選択方法の効率とパフォーマンスのトレードオフをよりよく示しています。 Static word embedding is still useful, particularly for context-unavailable tasks, because in the case of no context available, pre-trained language models often perform worse than static word embeddings. Although dimension is a key factor determining the quality of static word embeddings, automatic dimension selection is rarely discussed. In this paper, we investigate the impact of word frequency on the dimension selection, and empirically find that word frequency is so vital that it needs to be taken into account during dimension selection. Based on such an empirical finding, this paper proposes a dimension selection method that uses a metric (Mixed Product Distance, MPD) to select a proper dimension for word embedding algorithms without training any word embedding. Through applying a post-processing function to oracle matrices, the MPD-based method can de-emphasize the impact of word frequency. Experiments on both context-unavailable and context-available tasks demonstrate the better efficiency-performance trade-off of our MPD-based dimension selection method over baselines.	翻訳日:2023-05-16 19:29:29 公開日:2023-05-13
# YOLOv7-BRAとマルチモデル融合に基づく学生の授業行動検出 Student Classroom Behavior Detection based on YOLOv7-BRA and Multi-Model Fusion ( http://arxiv.org/abs/2305.07825v1 ) ライセンス: Link先を確認	Fan Yang and Tao Wang, Xiaofei Wang	(参考訳) 教室ビデオにおける生徒の行動を正確に検出することは,授業パフォーマンスの分析と指導効果の向上に寄与する。しかし、動作検出における現在の精度は低い。そこで本稿では, YOLOv7-BRA (YOLOv7 with Bi-level Routing Attention ) に基づく授業行動検出システムを提案する。我々は,立ち上がり,座った,話す,聞く,歩く,手を上げる,読む,書くという8つの行動パターンを特定した。本研究では,11,248個のラベルと4,001個の画像を含むデータセットを構築し,教室環境における手を挙げる一般的な行動に着目した(Student Classroom Behavior dataset, SCB-Dataset)。検出精度を向上させるため,biformer attentionモジュールをyolov7ネットワークに追加した。最後に、学生の教室行動データを得るために、YOLOv7 CrowdHuman、SlowFast、DeepSortモデルの結果を融合した。 SCB-Datasetの実験を行い、YOLOv7-BRAはmAP@0.5の87.1%を達成した。 SCBデータセットは、https://github.com/Whiffe/SCB-dataseからダウンロードできます。 Accurately detecting student behavior in classroom videos can aid in analyzing their classroom performance and improving teaching effectiveness. However, the current accuracy rate in behavior detection is low. To address this challenge, we propose the Student Classroom Behavior Detection system based on based on YOLOv7-BRA (YOLOv7 with Bi-level Routing Attention ). We identified eight different behavior patterns, including standing, sitting, speaking, listening, walking, raising hands, reading, and writing. We constructed a dataset, which contained 11,248 labels and 4,001 images, with an emphasis on the common behavior of raising hands in a classroom setting (Student Classroom Behavior dataset, SCB-Dataset). To improve detection accuracy, we added the biformer attention module to the YOLOv7 network. Finally, we fused the results from YOLOv7 CrowdHuman, SlowFast, and DeepSort models to obtain student classroom behavior data. We conducted experiments on the SCB-Dataset, and YOLOv7-BRA achieved an mAP@0.5 of 87.1%, resulting in a 2.2% improvement over previous results. Our SCB-dataset can be downloaded from: https://github.com/Whiffe/SCB-datase	翻訳日:2023-05-16 19:29:13 公開日:2023-05-13
# 教師なし文表現強調のためのシンプルかつプラグアンドプレイ法 A Simple and Plug-and-play Method for Unsupervised Sentence Representation Enhancement ( http://arxiv.org/abs/2305.07824v1 ) ライセンス: Link先を確認	Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi	(参考訳) 文の適切な埋め込みを教師なしの方法で生成することは、実世界のシナリオにおける意味マッチングと検索の問題に有用である。本稿では,文章表現を高度化する非常に単純な後処理手法であるRepresentation ALchemy(RepAL)を提案する。 RepALの基本的な考え方は、事前訓練されたモデルによって生成された文埋め込みの冗長な情報を強調することである。総合的な実験を通して、RepALは学習自由であり、既存の教師なし文章学習モデルと組み合わせることができるプラグアンドプレイ法であることを示す。また,RepALの理解のために詳細な分析を行った。 Generating proper embedding of sentences through an unsupervised way is beneficial to semantic matching and retrieval problems in real-world scenarios. This paper presents Representation ALchemy (RepAL), an extremely simple post-processing method that enhances sentence representations. The basic idea in RepAL is to de-emphasize redundant information of sentence embedding generated by pre-trained models. Through comprehensive experiments, we show that RepAL is free of training and is a plug-and-play method that can be combined with most existing unsupervised sentence learning models. We also conducted in-depth analysis to understand RepAL.	翻訳日:2023-05-16 19:28:50 公開日:2023-05-13
# 深層学習に基づく心臓運動からの電気不整脈回路の予測 : シリコンを用いた研究 Deep Learning-based Prediction of Electrical Arrhythmia Circuits from Cardiac Motion: An In-Silico Study ( http://arxiv.org/abs/2305.07822v1 ) ライセンス: Link先を確認	Jan Lebert, Daniel Deng, Lei Fan, Lik Chuan Lee, and Jan Christoph	(参考訳) 心臓の収縮は、心筋を介して伝播する電気的興奮によって引き起こされる。近年,深層学習を用いてシミュレーションされた心筋組織の収縮運動から電気刺激を計算できることが示されている。心臓の電気生理学において、第一の診断目標は、心臓のリズム障害の電気的トリガーやドライバを特定することである。しかし、電気マッピング技術を用いることで、特に心室不整脈において、心臓筋全体、特に心室不整脈の間における電気波の3次元形態をマッピングすることは不可能である。したがって、心臓の動きから電気的興奮を計算または予測するアプローチは、有望な代替診断手法である可能性がある。本稿では,深層学習を用いて,心室の変形力学から3次元の波動力学を予測できることを計算機シミュレーションで実証する。心室図の電気機械的アクティベーションダイナミクスのシミュレーションを何千回も実施し,そのデータを用いてニューラルネットワークのトレーニングを行い,変形の原因となる3次元波動パターンを予測した。ネットワークが特定の不整脈を見たことがない場合でも、焦点波パターンの横で複雑な3次元の電磁波パターンを再構成できることを実証した。本研究では, 有限要素法 (FEM) で生成したデータに対して, 平滑化粒子流体力学 (SPH) 法で生成したデータに基づいて, 学習モデルを学習し, 一般化できることを示す。予測は、傷跡の存在下で、そして著しい異質性をもって行うことができる。以上の結果から,深部ニューラルネットワークを用いて心筋運動の画像データから筋内活動電位波形を算出できることが示唆された。 The heart's contraction is caused by electrical excitation which propagates through the heart muscle. It was recently shown that the electrical excitation can be computed from the contractile motion of a simulated piece of heart muscle tissue using deep learning. In cardiac electrophysiology, a primary diagnostic goal is to identify electrical triggers or drivers of heart rhythm disorders. However, using electrical mapping techniques, it is currently impossible to map the three-dimensional morphology of the electrical waves throughout the entire heart muscle, especially during ventricular arrhythmias. Therefore, the approach to calculate or predict electrical excitation from the hearts motion could be a promising alternative diagnostic approach. Here, we demonstrate in computer simulations that it is possible to predict three-dimensional electrical wave dynamics from ventricular deformation mechanics using deep learning. We performed thousands of simulations of electromechanical activation dynamics in ventricular geometries and used the data to train a neural network which subsequently predicts the three-dimensional electrical wave pattern that caused the deformation. We demonstrate that, next to focal wave patterns, even complicated three-dimensional electrical wave patterns can be reconstructed, even if the network has never seen the particular arrhythmia. We show that the deep learning model has the ability to generalize by training it on data generated with the smoothed particle hydrodynamics (SPH) method and subsequently applying it to data generated with the finite element method (FEM). Predictions can be performed in the presence of scars and with significant heterogeneity. Our results suggest that, deep neural networks could be used to calculate intramural action potential wave patterns from imaging data of the motion of the heart muscle.	翻訳日:2023-05-16 19:28:34 公開日:2023-05-13
# 配電系統におけるキャパシティ解析をホストするアクティブラーニング手法 An Active Learning-based Approach for Hosting Capacity Analysis in Distribution Systems ( http://arxiv.org/abs/2305.07818v1 ) ライセンス: Link先を確認	Kiyeob Lee, Peng Zhao, Anirban Bhattacharya, Bani K. Mallick, Le Xie	(参考訳) 分散エネルギー資源(ders)の統合が増加するにつれて、将来の電力配電網のためのホスティングキャパシティ(hc)をモデル化し、分析する必要がある。ホスティングキャパシティ分析(hca)は、グリッドに安全に統合できるdersの量を調べ、全一般性において挑戦的なタスクである。すなわち、実現可能集合と実現不可能集合の間には、多くの極点が存在する。さらに、HCは複数の因子に依存する。 (a)社会経済的行動に依存したDrの採用パターン b)DERの制御方法と管理方法これらの2つの要因は、dersのすべての統合が集中的に計画されているわけではなく、hcに関する我々の理解を大きく変える可能性があるため、問題空間に固有のものである。本稿では2つの要因を捉えることで研究ギャップを解消する。 (a)及び b) HCAやいくつかの最も洞察に富んだHCシナリオをドメイン知識の犠牲にして特定すること。我々は,データ駆動型HCAフレームワークを提案し,シナリオを効果的に探求するために,HCAにおけるアクティブラーニングを導入する。 HCAにおけるアクティブラーニングとHCの特性 (a)及び (b) は 3 行の例で示される。次に,その意義を理解するために,詳細な大規模研究を提案する。 (a)及び (b) HCとその解釈は2つの要因によって大きく変化することが示唆された。 (a)及び (b) With the increasing amount of distributed energy resources (DERs) integration, there is a significant need to model and analyze hosting capacity (HC) for future electric distribution grids. Hosting capacity analysis (HCA) examines the amount of DERs that can be safely integrated into the grid and is a challenging task in full generality because there are many possible integration of DERs in foresight. That is, there are numerous extreme points between feasible and infeasible sets. Moreover, HC depends on multiple factors such as (a) adoption patterns of DERs that depend on socio-economic behaviors and (b) how DERs are controlled and managed. These two factors are intrinsic to the problem space because not all integration of DERs may be centrally planned, and could largely change our understanding about HC. This paper addresses the research gap by capturing the two factors (a) and (b) in HCA and by identifying a few most insightful HC scenarios at the cost of domain knowledge. We propose a data-driven HCA framework and introduce active learning in HCA to effectively explore scenarios. Active learning in HCA and characteristics of HC with respect to the two factors (a) and (b) are illustrated in a 3-bus example. Next, detailed large-scale studies are proposed to understand the significance of (a) and (b). Our findings suggest that HC and its interpretations significantly change subject to the two factors (a) and (b).	翻訳日:2023-05-16 19:27:46 公開日:2023-05-13
# palm: 病的近視認識と解剖学的構造アノテーションを備えた開眼眼底写真データセット PALM: Open Fundus Photograph Dataset with Pathologic Myopia Recognition and Anatomical Structure Annotation ( http://arxiv.org/abs/2305.07816v1 ) ライセンス: Link先を確認	Huihui Fang, Fei Li, Junde Wu, Huazhu Fu, Xu Sun, Jos\'e Ignacio Orlando, Hrvoje Bogunovi\'c, Xiulan Zhang, Yanwu Xu	(参考訳) 病理組織学的ミオニア (PM) は、近視性網膜変性症である。この状態の早期スクリーニングは、それに伴う眼底病変による損傷を減少させ、視力の喪失を予防することができる。人工知能に基づく自動診断ツールは、臨床医が病気の兆候を識別したり、カラー・ファンドス写真を使って集団を検査したりすることで、このプロセスの恩恵を受けることができる。本稿では,病理組織診断と解剖学的構造アノテーションのためのPALM,オープン・ファンドス・イメージング・データセットについて考察する。本データベースは, 病的近視カテゴリのラベル付き1200枚の画像と, 視神経乳頭の位置, パッチ状網膜萎縮(乳頭性萎縮症を含む), 網膜剥離などの病変の描出に関する手指注釈を含む。さらに,本論文では,データベース構築に使用されるラベル付けプロセス,サンプルの品質と特性などの詳細を詳述し,他の利用ノートを提供する。 Pathologic myopia (PM) is a common blinding retinal degeneration suffered by highly myopic population. Early screening of this condition can reduce the damage caused by the associated fundus lesions and therefore prevent vision loss. Automated diagnostic tools based on artificial intelligence methods can benefit this process by aiding clinicians to identify disease signs or to screen mass populations using color fundus photographs as inputs. This paper provides insights about PALM, our open fundus imaging dataset for pathological myopia recognition and anatomical structure annotation. Our databases comprises 1200 images with associated labels for the pathologic myopia category and manual annotations of the optic disc, the position of the fovea and delineations of lesions such as patchy retinal atrophy (including peripapillary atrophy) and retinal detachment. In addition, this paper elaborates on other details such as the labeling process used to construct the database, the quality and characteristics of the samples and provides other relevant usage notes.	翻訳日:2023-05-16 19:27:21 公開日:2023-05-13
# 希少事象シミュレーションのためのフローベース生成モデル A Flow-Based Generative Model for Rare-Event Simulation ( http://arxiv.org/abs/2305.07863v1 ) ライセンス: Link先を確認	Lachlan Gibson, Marcus Hoerger, Dirk Kroese	(参考訳) 複雑で確率的な環境で決定問題を解くことは、モンテカルロサンプリングによる決定の結果を推定することでしばしば達成される。しかし、サンプリングは稀だが重要な出来事を見落とし、決定プロセスに重大な影響を及ぼす可能性がある。本稿では,まれな事象が発生した場合の条件分布から直接サンプルをシミュレートする正規化フロー生成モデルを訓練する手法を提案する。カップリングフローを利用することで,任意のサンプリング分布を任意に近似することができる。近似法とImportance Smplingを組み合わせることで、複雑な積分と期待値の高精度な推定値が得られる。本手法を高次元, 希少な設定でも, 効率的なサンプリングと推定に利用できる例をいくつか紹介する。我々は,レアイベント分布から直接シミュレートすることで,レアイベントの発生方法に大きな洞察を得ることができることを示す。 Solving decision problems in complex, stochastic environments is often achieved by estimating the expected outcome of decisions via Monte Carlo sampling. However, sampling may overlook rare, but important events, which can severely impact the decision making process. We present a method in which a Normalizing Flow generative model is trained to simulate samples directly from a conditional distribution given that a rare event occurs. By utilizing Coupling Flows, our model can, in principle, approximate any sampling distribution arbitrarily well. By combining the approximation method with Importance Sampling, highly accurate estimates of complicated integrals and expectations can be obtained. We include several examples to demonstrate how the method can be used for efficient sampling and estimation, even in high-dimensional and rare-event settings. We illustrate that by simulating directly from a rare-event distribution significant insight can be gained into the way rare events happen.	翻訳日:2023-05-16 19:20:14 公開日:2023-05-13
# HAiVA: クラウド特性が気候パターンに与える影響を研究するためのハイブリッドAI支援ビジュアル分析フレームワーク HAiVA: Hybrid AI-assisted Visual Analysis Framework to Study the Effects of Cloud Properties on Climate Patterns ( http://arxiv.org/abs/2305.07859v1 ) ライセンス: Link先を確認	Subhashis Hazarika, Haruki Hirasawa, Sookyung Kim, Kalai Ramea, Salva R. Cachay, Peetak Mitra, Dipti Hingmire, Hansi Singh, Phil J. Rasch	(参考訳) 雲は地球の気候システムに大きな影響を及ぼす。これらは地球の放射収支を調整し、温度と降水量の地域的変化を促進する上で重要な役割を担っている。これにより、雲は、雲の反射率の修正を意味するマリン・クラウド・ブライトニング(MCB)のような気候介入技術に理想的である。しかし,MCBの意図しない影響を避けるためには,気候応答関数に対する複雑な雲の理解を深める必要がある。従来のアース・システム・モデルによる介入シナリオの設計とテストは計算コストがかかる。そこで我々は,このような科学的研究を進めるためのハイブリッドAI支援視覚分析フレームワークを提案し,様々なMCB介入シナリオをインタラクティブに検討し,その意図的かつ意図しない影響が気候パターンに与える影響を評価する。我々は気候科学者のチームと協力して,クラウドと気候の応答関数を模倣するハイブリッドaiモデル群を開発し,異なるmcb介入実験を行うための,密結合したフロントエンドインタラクティブなビジュアル分析システムを設計する。 Clouds have a significant impact on the Earth's climate system. They play a vital role in modulating Earth's radiation budget and driving regional changes in temperature and precipitation. This makes clouds ideal for climate intervention techniques like Marine Cloud Brightening (MCB) which refers to modification in cloud reflectivity, thereby cooling the surrounding region. However, to avoid unintended effects of MCB, we need a better understanding of the complex cloud to climate response function. Designing and testing such interventions scenarios with conventional Earth System Models is computationally expensive. Therefore, we propose a hybrid AI-assisted visual analysis framework to drive such scientific studies and facilitate interactive what-if investigation of different MCB intervention scenarios to assess their intended and unintended impacts on climate patterns. We work with a team of climate scientists to develop a suite of hybrid AI models emulating cloud-climate response function and design a tightly coupled frontend interactive visual analysis system to perform different MCB intervention experiments.	翻訳日:2023-05-16 19:20:01 公開日:2023-05-13
# AURA : 物体除去のためのランダム入力サンプリングを用いたマスク自動生成装置 AURA : Automatic Mask Generator using Randomized Input Sampling for Object Removal ( http://arxiv.org/abs/2305.07857v1 ) ライセンス: Link先を確認	Changsuk Oh, Dongseok Shim, H. Jin Kim	(参考訳) 画像の塗装作業の目的は、画像の欠落した領域を視覚的に可視的に埋めることである。近年,深層学習に基づく画像インパインティングネットワークは,画像中の不要なオブジェクトをマスキングすることで,オブジェクトの除去に利用している。しかしながら、ネットワークを使ってオブジェクトを適切に削除しようとする一方で、以前の作業では入力マスクの重要性に注意を払わない。本稿では,オフザシェルフ・イメージ・インパインティング・ネットワークを用いて,オブジェクトをよりよく除去するための入力マスクの生成に焦点をあてる。本稿では,説明可能なai(xai)手法に触発された自動マスク生成法を提案する。提案手法は,ランダムな入力マスクを用いて重要度マップを生成し,ランダムなマスクから得られた画像のスコアを定量的に推定する。出力マスクは、重要度マップから生成される候補マスクのうち、判定モジュールによって選択される。判定モジュールを設計し,オブジェクト除去結果の品質を定量的に推定する。さらに, 対象除去結果の報告に用いられた評価手法が, 対象除去器の性能を推定するには適切でないことを実証的に見出した。そこで我々は,対象物除去器の品質を適切に評価するために,新しい評価指標(FID$^$とU-IDS$^$)を提案する。実験により,本手法は意味的セグメンテーションマップから生成したマスクよりも,目的のクラスオブジェクトを除去する性能が良好であることが確認された。 The objective of the image inpainting task is to fill missing regions of an image in a visually plausible way. Recently, deep-learning-based image inpainting networks have generated outstanding results, and some utilize their models as object removers by masking unwanted objects in an image. However, while trying to better remove objects using their networks, the previous works pay less attention to the importance of the input mask. In this paper, we focus on generating the input mask to better remove objects using the off-the-shelf image inpainting network. We propose an automatic mask generator inspired by the explainable AI (XAI) method, whose output can better remove objects than a semantic segmentation mask. The proposed method generates an importance map using randomly sampled input masks and quantitatively estimated scores of the completed images obtained from the random masks. The output mask is selected by a judge module among the candidate masks which are generated from the importance map. We design the judge module to quantitatively estimate the quality of the object removal results. In addition, we empirically find that the evaluation methods used in the previous works reporting object removal results are not appropriate for estimating the performance of an object remover. Therefore, we propose new evaluation metrics (FID$^$ and U-IDS$^$) to properly evaluate the quality of object removers. Experiments confirm that our method shows better performance in removing target class objects than the masks generated from the semantic segmentation maps, and the two proposed metrics make judgments consistent with humans.	翻訳日:2023-05-16 19:19:42 公開日:2023-05-13
# マルチエージェントシステムにおける非同期動作コーディネーションのためのstackelberg決定トランスフォーマ Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems ( http://arxiv.org/abs/2305.07856v1 ) ライセンス: Link先を確認	Bin Zhang, Hangyu Mao, Lijuan Li, Zhiwei Xu, Dapeng Li, Rui Zhao, Guoliang Fan	(参考訳) 非同期アクションコーディネーションは、マルチエージェントシステム(mas)において、スタックルバーグゲーム(sg)として表現できる広汎な挑戦を示す。しかし,SGに基づくMARL(Multi-Agent Reinforcement Learning)手法のスケーラビリティは,ネットワーク構造や環境制約によって厳しく制約されている。この問題に対処するために,エージェント間の階層的協調の困難を解消するヒューリスティックアプローチであるStackelberg Decision Transformer (STEER)を提案する。 STEERは、SGの階層的決定構造、自己回帰配列モデルのモデリング能力、MARLの探索的学習手法を取り入れ、空間的および時間的文脈における意思決定プロセスを効率的に管理する。本研究は,masにおける様々なタスクタイプや環境構成に広く適用可能な,効果的かつ適応可能な非同期動作協調手法の開発に寄与する。実験の結果,提案手法はstackelberg平衡解に収束し,他の既存手法よりも複雑なシナリオで優れていることがわかった。 Asynchronous action coordination presents a pervasive challenge in Multi-Agent Systems (MAS), which can be represented as a Stackelberg game (SG). However, the scalability of existing Multi-Agent Reinforcement Learning (MARL) methods based on SG is severely constrained by network structures or environmental limitations. To address this issue, we propose the Stackelberg Decision Transformer (STEER), a heuristic approach that resolves the difficulties of hierarchical coordination among agents. STEER efficiently manages decision-making processes in both spatial and temporal contexts by incorporating the hierarchical decision structure of SG, the modeling capability of autoregressive sequence models, and the exploratory learning methodology of MARL. Our research contributes to the development of an effective and adaptable asynchronous action coordination method that can be widely applied to various task types and environmental configurations in MAS. Experimental results demonstrate that our method can converge to Stackelberg equilibrium solutions and outperforms other existing methods in complex scenarios.	翻訳日:2023-05-16 19:19:16 公開日:2023-05-13
# マッチング特徴抽出を用いた異種エッジデバイスのためのフェデレーション学習型産業健康診断 A Federated Learning-based Industrial Health Prognostics for Heterogeneous Edge Devices using Matched Feature Extraction ( http://arxiv.org/abs/2305.07854v1 ) ライセンス: Link先を確認	Anushiya Arunan, Yan Qin, Xiaoli Li, and Chau Yuen	(参考訳) データ駆動型産業健康予測は、正確で信頼性の高い予測モデルを開発するために豊富な訓練データを必要とする。しかし、厳格なデータプライバシー法とエッジ産業データの豊富さは、分散データ利用を必要とする。したがって,産業保健分野は,分散型・プライバシー保全型学習手法であるフェデレーション学習(fl)から著しく利益を得るのに適している。しかしながら,ヘテロジニアスデータから学習したモデルパラメータを有意義に集約し,ハイパフォーマンスなフェデレーションモデルを形成するという複雑さから,flベースの健康予測タスクはほとんど研究されていない。特に、異質な分解機構と不等なデータセットサイズに由来するエッジデバイス間のデータの不均一性は、正確なフェデレーションモデルを開発する上で重要な統計的課題となる。特徴類似性マッチングパラメータアグリゲーションアルゴリズムを用いて、異種エッジデータから識別的に学習するFLベースの健康予後モデルを提案する。このアルゴリズムは局所的に訓練された不均一なモデルを探索し、まずニューロンと確率論的に類似した特徴抽出関数をマッチングし、それらを選択的に平均化し、フェデレートされたモデルパラメータを形成する。このアルゴリズムは、従来の座標方向ニューロンの平均化とは対照的に、類似したニューロンを平均するだけであるため、局所モデルの異なる特徴抽出器は、結果のフェデレーションモデルへの希釈を少なくする。ターボファンエンジンのLiイオン電池の循環劣化データと非循環劣化データの両方を用いて, 提案手法は, それぞれ44.5\%, 39.3\%の精度向上を達成できることを示した。 Data-driven industrial health prognostics require rich training data to develop accurate and reliable predictive models. However, stringent data privacy laws and the abundance of edge industrial data necessitate decentralized data utilization. Thus, the industrial health prognostics field is well suited to significantly benefit from federated learning (FL), a decentralized and privacy-preserving learning technique. However, FL-based health prognostics tasks have hardly been investigated due to the complexities of meaningfully aggregating model parameters trained from heterogeneous data to form a high performing federated model. Specifically, data heterogeneity among edge devices, stemming from dissimilar degradation mechanisms and unequal dataset sizes, poses a critical statistical challenge for developing accurate federated models. We propose a pioneering FL-based health prognostic model with a feature similarity-matched parameter aggregation algorithm to discriminatingly learn from heterogeneous edge data. The algorithm searches across the heterogeneous locally trained models and matches neurons with probabilistically similar feature extraction functions first, before selectively averaging them to form the federated model parameters. As the algorithm only averages similar neurons, as opposed to conventional naive averaging of coordinate-wise neurons, the distinct feature extractors of local models are carried over with less dilution to the resultant federated model. Using both cyclic degradation data of Li-ion batteries and non-cyclic data of turbofan engines, we demonstrate that the proposed method yields accuracy improvements as high as 44.5\% and 39.3\% for state-of-health estimation and remaining useful life estimation, respectively.	翻訳日:2023-05-16 19:18:57 公開日:2023-05-13
# EV-MGRFlowNet:ハイブリッド運動補償損失を有する教師なしイベントベース光流の動作誘導リカレントネットワーク EV-MGRFlowNet: Motion-Guided Recurrent Network for Unsupervised Event-based Optical Flow with Hybrid Motion-Compensation Loss ( http://arxiv.org/abs/2305.07853v1 ) ライセンス: Link先を確認	Hao Zhuang, Xinjie Huang, Kuanxu Hou, Delei Kong, Chenming Hu, Zheng Fang	(参考訳) イベントカメラは、高時間分解能や高ダイナミックレンジなどの有望な特性を提供する。これらの利点は多くの機械ビジョンタスク、特に光学フロー推定に利用されてきた。現在、ほとんどのイベントベースの作品は、ディープラーニングを使って光の流れを推定している。しかし、それらのネットワークは以前の隠れ状態や動きの流れを完全に活用していない。さらに、彼らの監視戦略は、ネットワークの可能性を解き放つためにイベントデータの幾何学的制約を十分に活用していない。本稿では,ハイブリッド動作補償損失を用いた動作誘導型リカレントネットワークを備えた,教師なしイベントベース光フロー推定パイプラインEV-MGRFlowNetを提案する。まず,従来の隠れ状態を完全に活用してマルチレベル動作特性を得る機能強化型リカレントエンコーダネットワーク(FERE-Net)を提案する。そこで我々は,フロー誘導型デコーダネットワーク(FGD-Net)を提案する。最後に,より正確なイベントアライメントのための幾何学的制約を強化するために,ハイブリッドモーション補償損失(hmc-loss)を設計する。実験結果から,本手法はmvsecデータセットのsof(state-of-the-art, sota)法を上回っており,平均エンドポイント誤差(aee)は22.71%であった。我々の知る限り,本手法は教師なし学習手法の1つである。 Event cameras offer promising properties, such as high temporal resolution and high dynamic range. These benefits have been utilized into many machine vision tasks, especially optical flow estimation. Currently, most existing event-based works use deep learning to estimate optical flow. However, their networks have not fully exploited prior hidden states and motion flows. Additionally, their supervision strategy has not fully leveraged the geometric constraints of event data to unlock the potential of networks. In this paper, we propose EV-MGRFlowNet, an unsupervised event-based optical flow estimation pipeline with motion-guided recurrent networks using a hybrid motion-compensation loss. First, we propose a feature-enhanced recurrent encoder network (FERE-Net) which fully utilizes prior hidden states to obtain multi-level motion features. Then, we propose a flow-guided decoder network (FGD-Net) to integrate prior motion flows. Finally, we design a hybrid motion-compensation loss (HMC-Loss) to strengthen geometric constraints for the more accurate alignment of events. Experimental results show that our method outperforms the current state-of-the-art (SOTA) method on the MVSEC dataset, with an average reduction of approximately 22.71% in average endpoint error (AEE). To our knowledge, our method ranks first among unsupervised learning-based methods.	翻訳日:2023-05-16 19:18:27 公開日:2023-05-13
# スクイーズ励起埋め込み注意用unetによる脳腫瘍の分節 Squeeze Excitation Embedded Attention UNet for Brain Tumor Segmentation ( http://arxiv.org/abs/2305.07850v1 ) ライセンス: Link先を確認	Gaurav Prasanna, John Rohit Ernest, Lalitha G and Sathiya Narayanan	(参考訳) 深層学習に基づく技術は、ここ数年医学の分野で重要性を増してきた。医学画像の分類、分類、識別など様々な用途で使用されている。 unetやアテンションunet、アテンション残差unetといった既存のアーキテクチャは、すでに脳腫瘍のセグメンテーションと同じ応用法として存在しているが、チャンネルレベルの特徴の抽出方法については、いずれも対処されていない。本稿では,Squeeze Excitation Embedded Attention UNet (SEEA-UNet) と呼ばれる新しいアーキテクチャを提案する。提案モデルと既存アーキテクチャとの比較を行った結果,学習回数が少なくなるほど,提案モデルの性能が向上した。双対焦点損失とジャカード係数はモデルの性能を監視するために用いられた。 Deep Learning based techniques have gained significance over the past few years in the field of medicine. They are used in various applications such as classifying medical images, segmentation and identification. The existing architectures such as UNet, Attention UNet and Attention Residual UNet are already currently existing methods for the same application of brain tumor segmentation, but none of them address the issue of how to extract the features in channel level. In this paper, we propose a new architecture called Squeeze Excitation Embedded Attention UNet (SEEA-UNet), this architecture has both Attention UNet and Squeeze Excitation Network for better results and predictions, this is used mainly because to get information at both Spatial and channel levels. The proposed model was compared with the existing architectures based on the comparison it was found out that for lesser number of epochs trained, the proposed model performed better. Binary focal loss and Jaccard Coefficient were used to monitor the model's performance.	翻訳日:2023-05-16 19:18:06 公開日:2023-05-13
# Meta-Polyp: 効率的なPolypセグメンテーションのためのベースライン Meta-Polyp: a baseline for efficient Polyp segmentation ( http://arxiv.org/abs/2305.07848v1 ) ライセンス: Link先を確認	Quoc-Huy Trinh	(参考訳) 近年,ポリプのセグメンテーションが重要となり,cnn,視覚トランスフォーマー,トランスフォーマー技術を用いた競合的手法が数多く開発されている。しかし、これらの手法は、分散外データセット、境界の欠如、小さなポリプを扱う際にしばしば困難に直面する。 2022年、メタフォーマーはビジョンの新しいベースラインとして導入され、マルチタスクコンピュータビジョンのパフォーマンスを向上させるだけでなく、ビジョントランスフォーマーとcnnファミリーバックボーンの制限にも対処した。セグメンテーションをさらに強化するために,UNetとMeta-Formerの融合と,テクスチャを強化するためにデコーダステージにレベルアップを組み合わせたマルチスケールアップサンプリングブロックを提案するとともに,Meta-Formerのアイデアに基づいたConvformerブロックベースを提案し,ローカル特徴の重要な情報を強化する。これらのブロックは、ポリープの全体形状のようなグローバル情報と、医療区分の決定に不可欠な局所情報と境界情報の組み合わせを可能にする。提案手法は競争性能を達成し,CVC-300データセット,Kvasir,CVC-ColonDBデータセットにおける最先端の成果を得た。 Kvasir-SEGとは別に、他はアウトオブディストリビューションデータセットである。実装は以下の通りである。 https://github.com/huyquoctrinh/MetaPolyp-CBMS2023。 In recent years, polyp segmentation has gained significant importance, and many methods have been developed using CNN, Vision Transformer, and Transformer techniques to achieve competitive results. However, these methods often face difficulties when dealing with out-of-distribution datasets, missing boundaries, and small polyps. In 2022, Meta-Former was introduced as a new baseline for vision, which not only improved the performance of multi-task computer vision but also addressed the limitations of the Vision Transformer and CNN family backbones. To further enhance segmentation, we propose a fusion of Meta-Former with UNet, along with the introduction of a Multi-scale Upsampling block with a level-up combination in the decoder stage to enhance the texture, also we propose the Convformer block base on the idea of the Meta-former to enhance the crucial information of the local feature. These blocks enable the combination of global information, such as the overall shape of the polyp, with local information and boundary information, which is crucial for the decision of the medical segmentation. Our proposed approach achieved competitive performance and obtained the top result in the State of the Art on the CVC-300 dataset, Kvasir, and CVC-ColonDB dataset. Apart from Kvasir-SEG, others are out-of-distribution datasets. The implementation can be found at: https://github.com/huyquoctrinh/MetaPolyp-CBMS2023.	翻訳日:2023-05-16 19:17:47 公開日:2023-05-13
# 不均一データを用いたフェデレーション学習における平均モデル理解 Understanding Model Averaging in Federated Learning on Heterogeneous Data ( http://arxiv.org/abs/2305.07845v1 ) ライセンス: Link先を確認	Tailin Zhou, Zehong Lin, Jun Zhang, Danny H.K. Tsang	(参考訳) モデル平均化(model averaging)は、フェデレーション学習(fl)で広く採用されている手法で、異種データでトレーニングされた複数のクライアントモデルを集約し、よく整備されたグローバルモデルを得る。しかし、その成功の根拠はよく理解されていない。そこで本研究では,損失/エラーの景観を可視化し,モデル平均化の幾何学的性質について検討する。幾何学的可視化は、クライアントモデルが共通盆地内のグローバルモデルを取り囲み、クライアントモデルよりも優れた性能を示したとしても、グローバルモデルは盆地の底部から逸脱する可能性があることを示している。この現象をさらに理解するために,グローバルモデルの予測誤差をクライアントモデルに関連する5つの要因に分解する。特に、早期トレーニング後のグローバルモデルエラーは、主に、一クライアントデータセットとグローバルデータセットの重複しないデータのクライアントモデルエラー二グローバルモデルとクライアントモデルとの間の最大距離これらの知見に触発されて,グローバルモデルに反復移動平均化(IMA)を適用して予測誤差を低減し,遅延訓練時の最大距離を制御するクライアント探索を制限することを提案する。実験により,既存のfl法の精度とトレーニング速度が,様々なデータ不均一性を持つベンチマークデータセットにおいて著しく向上することを示した。 Model averaging, a widely adopted technique in federated learning (FL), aggregates multiple client models trained on heterogeneous data to obtain a well-performed global model. However, the rationale behind its success is not well understood. To shed light on this issue, we investigate the geometric properties of model averaging by visualizing the loss/error landscape. The geometrical visualization shows that the client models surround the global model within a common basin, and the global model may deviate from the bottom of the basin even though it performs better than the client models. To further understand this phenomenon, we decompose the expected prediction error of the global model into five factors related to client models. Specifically, we find that the global-model error after early training mainly comes from i) the client-model error on non-overlapping data between client datasets and the global dataset and ii) the maximal distance between the global and client models. Inspired by these findings, we propose adopting iterative moving averaging (IMA) on global models to reduce the prediction error and limiting client exploration to control the maximal distance at the late training. Our experiments demonstrate that IMA significantly improves the accuracy and training speed of existing FL methods on benchmark datasets with various data heterogeneity.	翻訳日:2023-05-16 19:17:19 公開日:2023-05-13
# 不定形作用をもつパラメータ化マルコフ決定過程に対するトンプソンサンプリング Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions ( http://arxiv.org/abs/2305.07844v1 ) ライセンス: Link先を確認	Michael Gimelfarb and Michael Jong Kim	(参考訳) 興味の主パラメータが不明であり,ベイズ推定を用いて学習しなければならないパラメータ化MDP(PMDP)について検討した。このようなモデルのキーとなる特徴は、未知のパラメータに関する情報を提供する「非形式的」なアクションの存在である。我々はpmdpに対する一連の仮定を提案し、トンプソンサンプリングは、キューイング、在庫管理、動的価格といった多くの問題に対して容易に検証できる、漸近的に最適な期待後悔値である$o(t^{-1})$を保証する。 We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference. One key defining feature of such models is the presence of "uninformative" actions that provide no information about the unknown parameters. We contribute a set of assumptions for PMDPs under which Thompson sampling guarantees an asymptotically optimal expected regret bound of $O(T^{-1})$, which are easily verified for many classes of problems such as queuing, inventory control, and dynamic pricing.	翻訳日:2023-05-16 19:16:56 公開日:2023-05-13
# Logit Attribution Matchingによるコントラスト領域の一般化 Contrastive Domain Generalization via Logit Attribution Matching ( http://arxiv.org/abs/2305.07888v1 ) ライセンス: Link先を確認	Han Gao, Kaican Li, Yongxiang Huang, Luning Wang, Caleb Chen Cao, Nevin L.Zhang	(参考訳) ドメイン一般化(DG)は機械学習において重要なオープンな問題である。深いモデルは、たとえ分単位のドメインシフトに影響を受けやすく、実際のアプリケーションにおける信頼性を著しく損なう。この問題を軽減するために、既存のほとんどのメソッドは、複数のトレーニングドメインにまたがる様々な不変制約を適用している。しかし、このようなアプローチは、一般的に新しいテストドメインに対するパフォーマンス保証をほとんど提供しない。本稿では,複数領域の代わりに強コントラストデータ対によって示される意味的不変性を利用した,cdg(con contrastive domain generalization)という異なる手法について検討する。本稿では,CDGの潜在能力を示す因果的DG理論を提案し,正規化手法とともに,CDGを実現するためのロジット属性マッチング(LAM)を提案する。 LAMは、ペアデータのごく一部で、最先端のDGメソッドよりも優れており、モデルがDGに不可欠なセマンティック機能により焦点を合わせるのに役立つことを実証的に示す。 Domain Generalization (DG) is an important open problem in machine learning. Deep models are susceptible to domain shifts of even minute degrees, which severely compromises their reliability in real applications. To alleviate the issue, most existing methods enforce various invariant constraints across multiple training domains. However,such an approach provides little performance guarantee for novel test domains in general. In this paper, we investigate a different approach named Contrastive Domain Generalization (CDG), which exploits semantic invariance exhibited by strongly contrastive data pairs in lieu of multiple domains. We present a causal DG theory that shows the potential capability of CDG; together with a regularization technique, Logit Attribution Matching (LAM), for realizing CDG. We empirically show that LAM outperforms state-of-the-art DG methods with only a small portion of paired data and that LAM helps models better focus on semantic features which are crucial to DG.	翻訳日:2023-05-16 19:10:39 公開日:2023-05-13
# レビュアー代入問題:スコーピングレビュー Reviewer assignment problem: A scoping review ( http://arxiv.org/abs/2305.07887v1 ) ライセンス: Link先を確認	Jelena Jovanovic (1) and Ebrahim Bagheri (2) ((1) University of Belgrade, Serbia, (2) Toronto Metropolitan University, Canada)	(参考訳) ピアレビューは科学研究の不可欠な要素である。査読の質、その結果発表された研究の質は、提出された論文に対して適切な審査員を募集する能力に大きく依存する。しかし、科学的論文の制作と学者の作業負荷の継続的な増加など、いくつかの要因により、このようなレビュアーの発見はますます困難になっている。これらの課題を緩和するために、レビューア(しばしばレビュア代入問題(RAP)と呼ばれる)と論文の自動関連のためのソリューションが30年間研究の対象となっている。多くの解が提案されているが、我々の知る限り、最近のRAP関連の文献の体系的な合成は欠落している。本稿では、このギャップを埋め、さらにRAP関連の研究を支援するために、RAPに対処するための計算手法のスコーピングレビューを行う。 3つのデータベース(Scopus, Google Scholar, DBLP)からRAPに関する最近の文献を収集し、信頼性基準を適用した後、RAP研究の諸側面におけるデータの抽出・合成に関する26の研究を継続した。一 RAPに対する全体的な枠組み及びアプローチ二審査官選定の基準三候補者審査官及び提出者のモデリング四審査官及び提出書の照合のための計算方法五提案したソリューションの性能を評価する方法この論文は、前述のRAP研究の各側面について要約し、考察し、今後の研究方向性を提案する。 Peer review is an integral component of scientific research. The quality of peer review, and consequently the published research, depends to a large extent on the ability to recruit adequate reviewers for submitted papers. However, finding such reviewers is an increasingly difficult task due to several factors, such as the continuous increase both in the production of scientific papers and the workload of scholars. To mitigate these challenges, solutions for automated association of papers with "well matching" reviewers - the task often referred to as reviewer assignment problem (RAP) - have been the subject of research for thirty years now. Even though numerous solutions have been suggested, to our knowledge, a recent systematic synthesis of the RAP-related literature is missing. To fill this gap and support further RAP-related research, in this paper, we present a scoping review of computational approaches for addressing RAP. Following the latest methodological guidance for scoping reviews, we have collected recent literature on RAP from three databases (Scopus, Google Scholar, DBLP) and, after applying the eligibility criteria, retained 26 studies for extracting and synthesising data on several aspects of RAP research including: i) the overall framing of and approach to RAP; ii) the criteria for reviewer selection; iii) the modelling of candidate reviewers and submissions; iv) the computational methods for matching reviewers and submissions; and v) the methods for evaluating the performance of the proposed solutions. The paper summarises and discusses the findings for each of the aforementioned aspects of RAP research and suggests future research directions.	翻訳日:2023-05-16 19:10:22 公開日:2023-05-13
# 側方カシミール力の測定から非ニュートン重力の制約を強化する方法 How to strengthen constraints on non-Newtonian gravity from measuring the lateral Casimir force ( http://arxiv.org/abs/2305.07884v1 ) ライセンス: Link先を確認	G. L. Klimchitskaya and V. M. Mostepanenko	(参考訳) ナノメートルの相互作用範囲では、利用可能な実験データはニュートンの重力法則に対する湯川型の補正を除外していないことが知られている。この相互作用における湯川型相互作用のパラメータに関する最も強い制約は、中性子散乱の実験と、波形表面間の横及び正常カシミール力の測定から導かれる。本研究では,高い相関振幅と小さい相関周期を犠牲にして実験構成を最適化することにより,4.5nmから37nmの範囲で現在利用可能な制約を大幅に強化できることを実証する。相互作用範囲19nmに対して,40倍以上の最大強度が到達可能であることを示す。 It has been known that in the nanometer interaction range the available experimental data do not exclude the Yukawa-type corrections to Newton's gravitational law which exceed the Newtonian gravitational force by many orders of magnitude. The strongest constraints on the parameters of Yukawa-type interaction in this interaction range follow from the experiments on neutron scattering and from measurements of the lateral and normal Casimir forces between corrugated surfaces. In this work, we demonstrate that by optimizing the experimental configuration at the expense of the higher corrugation amplitudes and smaller periods of corrugations it is possible to considerably strengthen the currently available constraints within the wide interaction range from 4.5 to 37nm. We show that the maximum strengthening by more than a factor of 40 is reachable for the interaction range of 19nm.	翻訳日:2023-05-16 19:09:55 公開日:2023-05-13
# 画素不確かさ推定による医用画像分割の一般化に向けて Towards Generalizable Medical Image Segmentation with Pixel-wise Uncertainty Estimation ( http://arxiv.org/abs/2305.07883v1 ) ライセンス: Link先を確認	Shuai Wang, Zipei Yan, Daoan Zhang, Zhongsen Li, Sirui Wu, Wenxuan Chen, Rui Li	(参考訳) ディープニューラルネットワーク(DNN)は、独立および同一分散(IID)仮説の下で視覚認識において有望な性能を達成する。対照的に、IDD仮説は多くの現実世界、特に医用画像解析において普遍的に保証されていない。医用画像分割は通常、各ピクセルをカテゴリに分類する画素単位の分類タスクとして定式化される。しかし、この定式化はdnnを混乱させるため、例えば境界付近の画素など、分類が難しい画素を無視している。本稿では,まず,分類の難しい画素が不確実性が高いことを明らかにする。そこで本研究では,dnnの分類が難しい画素を強調するために不確実性推定を用いた新しい枠組みを提案する。提案手法はprostateとfundusの2つのベンチマークで評価した。実験の結果,本手法は最先端手法よりも優れていた。 Deep neural networks (DNNs) achieve promising performance in visual recognition under the independent and identically distributed (IID) hypothesis. In contrast, the IID hypothesis is not universally guaranteed in numerous real-world applications, especially in medical image analysis. Medical image segmentation is typically formulated as a pixel-wise classification task in which each pixel is classified into a category. However, this formulation ignores the hard-to-classified pixels, e.g., some pixels near the boundary area, as they usually confuse DNNs. In this paper, we first explore that hard-to-classified pixels are associated with high uncertainty. Based on this, we propose a novel framework that utilizes uncertainty estimation to highlight hard-to-classified pixels for DNNs, thereby improving its generalization. We evaluate our method on two popular benchmarks: prostate and fundus datasets. The results of the experiment demonstrate that our method outperforms state-of-the-art methods.	翻訳日:2023-05-16 19:09:43 公開日:2023-05-13
# 生成aiと大規模言語モデルの二重利用問題 Dual Use Concerns of Generative AI and Large Language Models ( http://arxiv.org/abs/2305.07882v1 ) ライセンス: Link先を確認	Alexei Grinbaum and Laurynas Adomaitis	(参考訳) 本稿では,生命科学のために設計された Dual Use Research of Concern (DURC) フレームワークを,Large Language Models (LLM) に特化して,生成AIの領域に実装することを提案する。生物学的研究における利点と欠点が証明されていることから、DURCの基準はLLMに対して効果的に再定義できると考えており、AIガバナンスの改善に寄与する可能性がある。 DURCフレームワークを採用する際に課せられるバランスを認識し、生成的AIの影響に対する社会的認識を高める上で重要な政治的役割を強調します。最後に,LLM 研究に DURC アプローチを適用するための具体的な推奨事項について述べる。 We suggest the implementation of the Dual Use Research of Concern (DURC) framework, originally designed for life sciences, to the domain of generative AI, with a specific focus on Large Language Models (LLMs). With its demonstrated advantages and drawbacks in biological research, we believe the DURC criteria can be effectively redefined for LLMs, potentially contributing to improved AI governance. Acknowledging the balance that must be struck when employing the DURC framework, we highlight its crucial political role in enhancing societal awareness of the impact of generative AI. As a final point, we offer a series of specific recommendations for applying the DURC approach to LLM research.	翻訳日:2023-05-16 19:09:30 公開日:2023-05-13
# 2段階知識蒸留によるブラックボックスソースフリードメイン適応 Black-box Source-free Domain Adaptation via Two-stage Knowledge Distillation ( http://arxiv.org/abs/2305.07881v1 ) ライセンス: Link先を確認	Shuai Wang, Daoan Zhang, Zipei Yan, Shitong Shao, Rui Li	(参考訳) ソースフリーなドメイン適応は、トレーニング済みのソースモデルとターゲットデータのみを使用して、ディープニューラルネットワークを適用することを目的としている。しかし、ソースモデルにアクセスすると、ソースデータを漏洩する可能性があるため、患者のプライバシが明らかになる。本稿では,ソースモデルと対象データの出力のみを利用できるブラックボックス・ソースフリー領域適応法について検討する。簡便で効果的な二段階知識蒸留法を提案する。 uppercase\expandafter{\romannumeral1}では、ターゲットモデルをスクラッチからトレーニングし、ソースモデルによって生成されたソフトな擬似ラベルを知識蒸留法で生成する。 uppercase\expandafter{\romannumeral2}では、ノイズの多い擬似ラベルによるエラーの蓄積を避けるために、新しい学生モデルとして別のモデルを初期化する。学生モデルの学習を指導するために,教師モデルに弱い増補を施したイメージを給付する。提案手法は単純で柔軟であり,3つのクロスドメインセグメンテーションタスクにおいて驚くべき結果が得られる。 Source-free domain adaptation aims to adapt deep neural networks using only pre-trained source models and target data. However, accessing the source model still has a potential concern about leaking the source data, which reveals the patient's privacy. In this paper, we study the challenging but practical problem: black-box source-free domain adaptation where only the outputs of the source model and target data are available. We propose a simple but effective two-stage knowledge distillation method. In Stage \uppercase\expandafter{\romannumeral1}, we train the target model from scratch with soft pseudo-labels generated by the source model in a knowledge distillation manner. In Stage \uppercase\expandafter{\romannumeral2}, we initialize another model as the new student model to avoid the error accumulation caused by noisy pseudo-labels. We feed the images with weak augmentation to the teacher model to guide the learning of the student model. Our method is simple and flexible, and achieves surprising results on three cross-domain segmentation tasks.	翻訳日:2023-05-16 19:09:18 公開日:2023-05-13
# ウイルス感染症と細菌感染症の鑑別 : 血液検査値に基づく機械学習モデル Differentiating Viral and Bacterial Infections: A Machine Learning Model Based on Routine Blood Test Values ( http://arxiv.org/abs/2305.07877v1 ) ライセンス: Link先を確認	Gregor Gun\v{c}ar, Matja\v{z} Kukar, Tim Smole, Sa\v{s}o Mo\v{s}kon, Toma\v{z} Vovko, Simon Podnar, Peter \v{C}ernel\v{c}, Miran Brvar, Mateja Notar, Manca K\"oster, Marjeta Tu\v{s}ek Jelenc, Marko Notar	(参考訳) 抗生物質耐性の脅威の増大は、適切な抗生物質投与のために細菌感染とウイルス感染の正確な区別を必要とする。本研究では,16種類の血液検査結果,c-reactive proteinレベル,生物学的性および年齢を用いて,これらの感染型を識別するために,ウイルス対バクテリア機械学習モデルを開発した。単一の医療センターから44,120件のデータセットを用いて、ウイルス対細菌モデルでは82.2%、ブライアスコア0.129、ROC曲線0.11以下の領域が従来のCRP決定規則モデルよりも高い精度で示された。このモデルは1040mg/LのCRP範囲内で、細菌とウイルスの感染を区別するために、CRPのみが限られた診断値を提供する間隔において、大幅に改善された精度を示す。これらの知見は、診断決定のための複数の血液パラメータを検討することの重要性を強調し、ウイルス対細菌モデルが革新的な診断ツールの作成に寄与することを示唆している。このようなツールは、機械学習と関連するバイオマーカーを利用して、感染症の管理における臨床意思決定を強化する。 The growing threat of antibiotic resistance necessitates accurate differentiation between bacterial and viral infections for proper antibiotic administration. In this study, a Virus vs. Bacteria machine learning model was developed to discern between these infection types using 16 routine blood test results, C-reactive protein levels, biological sex, and age. With a dataset of 44,120 cases from a single medical center, the Virus vs. Bacteria model demonstrated remarkable accuracy of 82.2%, a Brier score of 0.129, and an area under the ROC curve of 0.91, surpassing the performance of traditional CRP decision rule models. The model demonstrates substantially improved accuracy within the CRP range of 10 40 mg/L, an interval in which CRP alone offers limited diagnostic value for distinguishing between bacterial and viral infections. These findings underscore the importance of considering multiple blood parameters for diagnostic decision-making and suggest that the Virus vs. Bacteria model could contribute to the creation of innovative diagnostic tools. Such tools would harness machine learning and relevant biomarkers to support enhanced clinical decision-making in managing infections.	翻訳日:2023-05-16 19:09:02 公開日:2023-05-13
# SPP-CNN: ネットワークロバストネス予測のための効率的なフレームワーク SPP-CNN: An Efficient Framework for Network Robustness Prediction ( http://arxiv.org/abs/2305.07872v1 ) ライセンス: Link先を確認	Chengpei Wu and Yang Lou and Lin Wang and Junli Li and Xiang Li and Guanrong Chen	(参考訳) 本稿では,ネットワークの接続性と悪意のある攻撃に対する制御性を維持するためのロバスト性について述べる。この種のネットワークの堅牢性は、通常、時間を要する攻撃シミュレーションによって測定され、ノードまたはエッジ削除攻撃のシーケンスの後、残りの接続性と制御可能性を記録する一連の値を返す。本稿では,空間ピラミッドプーリング畳み込みニューラルネットワーク(SPP-CNN)のネットワーク堅牢性予測のための効率的なフレームワークを開発する。新しいフレームワークは、畳み込み層と完全連結層の間に空間ピラミッドプーリング層を設置し、CNNベースの予測手法における一般的なミスマッチ問題を克服し、その一般化性を拡張する。 SPP-CNNと最先端の3つの堅牢性予測器、すなわちCNNベースの2つのグラフニューラルネットワークベースのフレームワークを比較して、大規模な実験を行う。配向および非配向の合成および実世界のネットワークについて検討した。実験の結果,提案したSPP-CNNは未知のデータセットに対する予測性能の向上と一般化性の向上を実現している。 This paper addresses the robustness of a network to sustain its connectivity and controllability against malicious attacks. This kind of network robustness is typically measured by the time-consuming attack simulation, which returns a sequence of values that record the remaining connectivity and controllability after a sequence of node- or edge-removal attacks. For improvement, this paper develops an efficient framework for network robustness prediction, the spatial pyramid pooling convolutional neural network (SPP-CNN). The new framework installs a spatial pyramid pooling layer between the convolutional and fully-connected layers, overcoming the common mismatch issue in the CNN-based prediction approaches and extending its generalizability. Extensive experiments are carried out by comparing SPP-CNN with three state-of-the-art robustness predictors, namely a CNN-based and two graph neural networks-based frameworks. Synthetic and real-world networks, both directed and undirected, are investigated. Experimental results demonstrate that the proposed SPP-CNN achieves better prediction performances and better generalizability to unknown datasets, with significantly lower time-consumption, than its counterparts.	翻訳日:2023-05-16 19:08:43 公開日:2023-05-13
# 事前学習型言語モデルを用いたスケーラブルな教育用質問生成 Scalable Educational Question Generation with Pre-trained Language Models ( http://arxiv.org/abs/2305.07871v1 ) ライセンス: Link先を確認	Sahan Bulathwela, Hamze Muse and Emine Yilmaz	(参考訳) 教育的質問の自動生成は、オンライン教育のスケールにおいて重要な役割を担い、グローバルな人口が個人化された学習旅行を運営しているときに、大規模に自己評価を可能にする。大規模言語モデルを適用した新しい教育的質問生成モデルである \textit{eduqg} を開発した。学術文献および科学質問データに基づく事前学習型言語モデルの構築と微調整により,<textit{EduQG} が優れた教育的質問を作成できることを示す。 The automatic generation of educational questions will play a key role in scaling online education, enabling self-assessment at scale when a global population is manoeuvring their personalised learning journeys. We develop \textit{EduQG}, a novel educational question generation model built by adapting a large language model. Our extensive experiments demonstrate that \textit{EduQG} can produce superior educational questions by further pre-training and fine-tuning a pre-trained language model on the scientific text and science question data.	翻訳日:2023-05-16 19:08:21 公開日:2023-05-13
# AIによるブリッジング履歴 : 予測精度とFact CheckingにおけるGPT3.5、GPT4、GoogleBARDの比較評価 Bridging History with AI A Comparative Evaluation of GPT 3.5, GPT4, and GoogleBARD in Predictive Accuracy and Fact Checking ( http://arxiv.org/abs/2305.07868v1 ) ライセンス: Link先を確認	Davut Emre Tasar, Ceren Ocal Tasar	(参考訳) デジタル時代の情報の急速な拡散は、正確な歴史的表現と解釈の重要性を強調している。人工知能は様々な分野で有望だが、歴史的事実チェックやギャップフィリングの可能性を秘めている。本研究では,LLM 3.5,GPT 4,GoogleBARD の3つの大言語モデルの性能を,与えられたデータに基づいて過去の事象を予測・検証する文脈で評価する。 DTR(Distance to Reality)と呼ばれる新しい指標を導入し、既存の歴史的事実からモデルのアウトプットを評価する。その結果, GPT 4は優れた性能を示すとともに, 歴史的研究におけるAIの潜在的な可能性を明らかにした。本稿では,過去の理解を深め,歴史知識のギャップを埋める上でのAIの役割について,さらなる研究の必要性を明らかにする。 The rapid proliferation of information in the digital era underscores the importance of accurate historical representation and interpretation. While artificial intelligence has shown promise in various fields, its potential for historical fact-checking and gap-filling remains largely untapped. This study evaluates the performance of three large language models LLMs GPT 3.5, GPT 4, and GoogleBARD in the context of predicting and verifying historical events based on given data. A novel metric, Distance to Reality (DTR), is introduced to assess the models' outputs against established historical facts. The results reveal a substantial potential for AI in historical studies, with GPT 4 demonstrating superior performance. This paper underscores the need for further research into AI's role in enriching our understanding of the past and bridging historical knowledge gaps.	翻訳日:2023-05-16 19:08:12 公開日:2023-05-13
# セマンティック対応による時間一貫性自動ビデオカラー化 Temporal Consistent Automatic Video Colorization via Semantic Correspondence ( http://arxiv.org/abs/2305.07904v1 ) ライセンス: Link先を確認	Yu Zhang, Siqi Chen, Mingdao Wang, Xianlin Zhang, Chuang Zhu, Yue Zhang, Xueming Li	(参考訳) 近年,ビデオカラー化作業が注目されている。近年の手法では,隣接するフレームやフレームの時間的一貫性に重点が置かれている。しかし,大きな間隔でフレーム間の不整合に直面する深刻な問題に直面しており,この問題を解決するために,セマンティック対応と自動ビデオカラー化を組み合わせて長距離一貫性を維持する新しい映像カラー化フレームワークを提案する。まず、参照着色ネットワークは、各ビデオの第1フレームを自動的に着色するように設計され、参照画像を取得し、以下の全着色プロセスを監督する。このような自動カラー化基準画像は、作業集約的かつ時間のかかる手動選択を回避できるだけでなく、参照画像とグレースケール画像の類似性を高めることができる。その後、セマンティック対応ネットワークと画像カラー化ネットワークを導入し、参照の助けを借りて残りのフレームの一連の色付けを行う。各フレームは、参照画像と即座に彩色された先行フレームの両方で監督され、短距離と長距離の時間的一貫性が向上する。広範な実験により,本手法は定性的および定量的に時間的一貫性を維持する他の手法よりも優れていることが示された。 NTIRE 2023ビデオカラー化チャレンジでは,色分布一貫性(CDC)最適化トラックで3位にランクインした。 Video colorization task has recently attracted wide attention. Recent methods mainly work on the temporal consistency in adjacent frames or frames with small interval. However, it still faces severe challenge of the inconsistency between frames with large interval.To address this issue, we propose a novel video colorization framework, which combines semantic correspondence into automatic video colorization to keep long-range consistency. Firstly, a reference colorization network is designed to automatically colorize the first frame of each video, obtaining a reference image to supervise the following whole colorization process. Such automatically colorized reference image can not only avoid labor-intensive and time-consuming manual selection, but also enhance the similarity between reference and grayscale images. Afterwards, a semantic correspondence network and an image colorization network are introduced to colorize a series of the remaining frames with the help of the reference. Each frame is supervised by both the reference image and the immediately colorized preceding frame to improve both short-range and long-range temporal consistency. Extensive experiments demonstrate that our method outperforms other methods in maintaining temporal consistency both qualitatively and quantitatively. In the NTIRE 2023 Video Colorization Challenge, our method ranks at the 3rd place in Color Distribution Consistency (CDC) Optimization track.	翻訳日:2023-05-16 19:01:04 公開日:2023-05-13
# SUMO-Kを高次集合論に翻訳する Translating SUMO-K to Higher-Order Set Theory ( http://arxiv.org/abs/2305.07903v1 ) ライセンス: Link先を確認	Chad Brown, Adam Pease, Josef Urban	(参考訳) 我々はSUMO(SUMO-K)の断片から高階集合論への変換を記述する。この翻訳は、一階を超えて、これまで非公式の解釈しかなかった相撲の一部の形式的な意味論を提供する。また、大きな常識オントロジーを非常に安全な対話的定理証明システムに組み込むのも初めてである。我々は、SUMOの高次構造の一部を含む一階構造からSUMOの矛盾を見つけるためのこれまでの研究をさらに拡張する。最後に、この翻訳を用いて、高階の対話的自動定理証明器を用いて証明できる問題を作成することができる。これはいくつかのシステムでテストされており、高階の常識推論問題のコーパスを形成するために使用できる。 We describe a translation from a fragment of SUMO (SUMO-K) into higher-order set theory. The translation provides a formal semantics for portions of SUMO which are beyond first-order and which have previously only had an informal interpretation. It also for the first time embeds a large common-sense ontology into a very secure interactive theorem proving system. We further extend our previous work in finding contradictions in SUMO from first order constructs to include a portion of SUMO's higher order constructs. Finally, using the translation, we can create problems that can be proven using higher-order interactive and automated theorem provers. This is tested in several systems and can be used to form a corpus of higher-order common-sense reasoning problems.	翻訳日:2023-05-16 19:00:43 公開日:2023-05-13
# 量子計算を用いた電子構造計算 Electronic Structure Calculations using Quantum Computing ( http://arxiv.org/abs/2305.07902v1 ) ライセンス: Link先を確認	Nouhaila Innan, Muhammad Al-Zafar Khan, and Mohamed Bennai	(参考訳) 量子レベルでの電子構造特性の計算は、現代の物理学研究の重要な側面である。しかし、従来の手法はより大きく複雑なシステムに対して計算的に要求することができる。この問題に対処するために,変分量子固有解法(VQE)アルゴリズムを用いたハイブリッド古典量子計算手法を提案する。量子系を量子ビットのセットにマッピングし、量子回路を用いて基底状態の波動関数を準備することにより、従来の方法よりも少ない計算資源を必要とする流線型プロセスを実現する。本アルゴリズムは, 分子の密度汎関数理論やハートリー・フォック理論など, 従来の電子構造法と比較して, 比較的少ない資源を有効利用しながら, 類似した精度を実証した。これらの結果は,新しい材料や技術の発展を早めるアルゴリズムの可能性を示している。この研究は、電子構造計算の計算上の課題を克服する道を開く。これは、量子コンピューティングが複雑な量子システムの理解を前進させる上での変換的影響を示している。 The computation of electronic structure properties at the quantum level is a crucial aspect of modern physics research. However, conventional methods can be computationally demanding for larger, more complex systems. To address this issue, we present a hybrid Classical-Quantum computational procedure that uses the Variational Quantum Eigensolver (VQE) algorithm. By mapping the quantum system to a set of qubits and utilising a quantum circuit to prepare the ground state wavefunction, our algorithm offers a streamlined process requiring fewer computational resources than classical methods. Our algorithm demonstrated similar accuracy in rigorous comparisons with conventional electronic structure methods, such as Density Functional Theory and Hartree-Fock Theory, on a range of molecules while utilising significantly fewer resources. These results indicate the potential of the algorithm to expedite the development of new materials and technologies. This work paves the way for overcoming the computational challenges of electronic structure calculations. It demonstrates the transformative impact of quantum computing on advancing our understanding of complex quantum systems.	翻訳日:2023-05-16 19:00:32 公開日:2023-05-13
# パワーグリッドにおける分断スイッチと量子アニーリングの組合せに関する最適化問題に関する研究 A study of the optimization problem on the combination of sectionalizing switches in power grid with quantum annealing ( http://arxiv.org/abs/2305.07899v1 ) ライセンス: Link先を確認	Masaya Takahashi, Hiroaki Nishioka, Masahiro Hirai, Hidetaka Takano	(参考訳) 地球温暖化の観点からは、電力網の効率改善が課題となっている。電力グリッドには、電力の流れを制御する多くのスイッチングデバイスがある。ワイヤにわずかな抵抗があり、電力消費は電流の正方形に比例するので、電流の供給経路を変えるスイッチ値の組み合わせに応じて、ワイヤ上の電力損失の値が変化する。スイッチの組み合わせの総数はスイッチ数とともに指数関数的に増加し、スイッチ値の最適な組み合わせを見つけるために様々なアルゴリズムが研究されている。本稿では,電力網におけるスイッチ結合問題を2次非拘束二元最適化(QUBO)として捉え,量子アニールを用いた評価関数を導出する手法を提案する。結果は特許庁でp6736787として登録される。 From the perspective of global warming, efficiency improvement of power grids is a pressing issue. Power grids have many switching devices to control the flow of electricity. Since there is a slight resistance in the wires and power consumption is proportional to the square of the current, the value of power loss on the wires changes depending on the combination of switch values that change the supply path of the current. The total number of switch combinations increases exponentially with the number of switches, and various algorithms have been studied to find the optimal combination of switch values. We propose a method to capture the switch combination problem in power grids as quadratic unconstrained binary optimization (QUBO) and derive an evaluation function to solve it using quantum annealing. The result is registered as a patent P6736787 at Japanese patent office.	翻訳日:2023-05-16 19:00:16 公開日:2023-05-13
# network-giant: harmonic hessian consensusによる完全分散ニュートン型最適化 Network-GIANT: Fully distributed Newton-type optimization via harmonic Hessian consensus ( http://arxiv.org/abs/2305.07898v1 ) ライセンス: Link先を確認	Alessio Maritan, Ganesh Sharma, Luca Schenato, Subhrakanti Dey	(参考訳) 本稿では,局所最適化と近隣ノード間の情報交換による局所的目的(経験的損失)関数の和を最小化する分散マルチエージェント学習の課題について考察する。本稿では,集中型パラメータサーバに依存する連合学習アルゴリズムである giant に基づく,ニュートン型完全分散最適化アルゴリズム network-giant を提案する。ネットワークジャイアントアルゴリズムは、各ノードにおける勾配追跡とニュートン型反復アルゴリズムの組み合わせによって設計され、局所勾配とニュートン更新のコンセンサスに基づく平均化を行う。提案アルゴリズムは,強い凸関数と滑らかな損失関数を仮定して,ネットワーク上の厳密解に対する半グローバルおよび指数収束を保証する。本稿では,ネットワークダインやニュートン・ラフソンコンセンサスなどの最先端分散学習アルゴリズムよりも,ネットワークジャイアントの収束性能が優れていることを示す実証的証拠を提供する。 This paper considers the problem of distributed multi-agent learning, where the global aim is to minimize a sum of local objective (empirical loss) functions through local optimization and information exchange between neighbouring nodes. We introduce a Newton-type fully distributed optimization algorithm, Network-GIANT, which is based on GIANT, a Federated learning algorithm that relies on a centralized parameter server. The Network-GIANT algorithm is designed via a combination of gradient-tracking and a Newton-type iterative algorithm at each node with consensus based averaging of local gradient and Newton updates. We prove that our algorithm guarantees semi-global and exponential convergence to the exact solution over the network assuming strongly convex and smooth loss functions. We provide empirical evidence of the superior convergence performance of Network-GIANT over other state-of-art distributed learning algorithms such as Network-DANE and Newton-Raphson Consensus.	翻訳日:2023-05-16 19:00:03 公開日:2023-05-13
# 大規模マルチモーダルモデルにおけるOCRの隠れミステリーについて On the Hidden Mystery of OCR in Large Multimodal Models ( http://arxiv.org/abs/2305.07895v1 ) ライセンス: Link先を確認	Yuliang Liu, Zhang Li, Hongliang Li, Wenwen Yu, Mingxin Huang, Dezhi Peng, Mingyu Liu, Mingrui Chen, Chunyuan Li, Lianwen Jin, Xiang Bai	(参考訳) 大規模モデルは近年,自然言語処理やマルチモーダル視覚言語学習において重要な役割を担っている。テキスト関連視覚タスクにおける有効性については,いまだ検討されていない。既存のマルチモーダルモデルについて総合的研究を行い,テキスト認識,テキストに基づく視覚的質問応答,キー情報抽出の性能評価を行った。これらのモデルの強みと弱みは、主に単語認識の意味的理解に依存し、個々の文字形状に対する劣った知覚を示す。また、テキスト長に対する差分を表示し、画像のきめ細かい特徴を検出する能力に制限がある。その結果,現在最も強力な大規模マルチモーダルモデルでさえ,従来のテキストタスクではドメイン固有メソッドと一致せず,より複雑なタスクでは大きな課題に直面していることがわかった。最も重要な点は,ゼロショットマルチモーダル技術の向上を目的とした革新的戦略の構想と評価のための基礎的枠組みを,本研究で提示した基礎的結果が提供できることである。評価パイプラインはhttps://github.com/Yuliang-Liu/MultimodalOCRで提供される。 Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. It remains less explored about their efficacy in text-related visual tasks. We conducted a comprehensive study of existing publicly available multimodal models, evaluating their performance in text recognition, text-based visual question answering, and key information extraction. Our findings reveal strengths and weaknesses in these models, which primarily rely on semantic understanding for word recognition and exhibit inferior perception of individual character shapes. They also display indifference towards text length and have limited capabilities in detecting fine-grained features in images. Consequently, these results demonstrate that even the current most powerful large multimodal models cannot match domain-specific methods in traditional text tasks and face greater challenges in more complex tasks. Most importantly, the baseline results showcased in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal techniques. Evaluation pipeline will be available at https://github.com/Yuliang-Liu/MultimodalOCR.	翻訳日:2023-05-16 18:59:48 公開日:2023-05-13
# 3次元非教師付き(深く)教師付きニューラルネットワークを用いた多孔質部品のボクセルワイズ分類 Voxel-wise classification for porosity investigation of additive manufactured parts with 3D unsupervised and (deeply) supervised neural networks ( http://arxiv.org/abs/2305.07894v1 ) ライセンス: Link先を確認	Domenico Iuso, Soumick Chatterjee, Jan De Beenhouwer, Jan Sijbers	(参考訳) アダプティブ・マニュファクチャリング(AM)は、デジタルモデルからサンプルを直接生産できる製造プロセスとして登場した。バッチのすべての製造サンプルで品質基準が満たされることを保証するため、X線CT(Computerd Tomography)が自動異常検出と組み合わせられることが多い。後者では、画像品質の低下に対して分析され、耐性がある材料に対して堅牢であるように訓練できるため、ディープラーニング(DL)異常検出技術が増えている。残念なことに、最近のDLモデルは2次元画像処理のために開発されており、貴重なボリューム情報を無視している。本研究は,X-CT画像からのAMサンプルのポロシティ解析のための非教師付き (UNet, UNet++, UNet 3+, MSS-UNet) と非教師付き (VAE, ceVAE, gmVAE, vqVAE) DLモデルを再検討し, 3次元パッチパイプラインを用いて3次元入力データを受け入れるように拡張した。教師付きモデルはFocal Tversky損失を用いてトレーニングされ、トレーニングデータセットの低いポロシティから生じるクラス不均衡に対処した。教師なしモデルの出力は、オブジェクト表面を適切に表現できないことによる誤分類を減らすために後処理される。その結果,DLモデルの性能ベンチマーク,ポストプロセッシングアルゴリズムの評価,教師なしモデルの出力による教師なしモデルのトレーニング効果の評価など,5倍の精度で検証された。画像品質の悪いテストセットの最終的なパフォーマンスベンチマークでは、最高のパフォーマンス教師付きモデルは平均精度0.808$\pm$0.013のMSS-UNetであり、最も優れた教師なしモデルは処理後のceVAE 0.935$\pm$ 0.001である。 VAE/ceVAEモデルは特に後処理技術を活用する際に優れた性能を示した。 Additive Manufacturing (AM) has emerged as a manufacturing process that allows the direct production of samples from digital models. To ensure that quality standards are met in all manufactured samples of a batch, X-ray computed tomography (X-CT) is often used combined with automated anomaly detection. For the latter, deep learning (DL) anomaly detection techniques are increasingly, as they can be trained to be robust to the material being analysed and resilient towards poor image quality. Unfortunately, most recent and popular DL models have been developed for 2D image processing, thereby disregarding valuable volumetric information. This study revisits recent supervised (UNet, UNet++, UNet 3+, MSS-UNet) and unsupervised (VAE, ceVAE, gmVAE, vqVAE) DL models for porosity analysis of AM samples from X-CT images and extends them to accept 3D input data with a 3D-patch pipeline for lower computational requirements, improved efficiency and generalisability. The supervised models were trained using the Focal Tversky loss to address class imbalance that arises from the low porosity in the training datasets. The output of the unsupervised models is post-processed to reduce misclassifications caused by their inability to adequately represent the object surface. The findings were cross-validated in a 5-fold fashion and include: a performance benchmark of the DL models, an evaluation of the post-processing algorithm, an evaluation of the effect of training supervised models with the output of unsupervised models. In a final performance benchmark on a test set with poor image quality, the best performing supervised model was MSS-UNet with an average precision of 0.808 $\pm$ 0.013, while the best unsupervised model was the post-processed ceVAE with 0.935 $\pm$ 0.001. The VAE/ceVAE models demonstrated superior capabilities, particularly when leveraging post-processing techniques.	翻訳日:2023-05-16 18:59:32 公開日:2023-05-13
# PESTS: セマンティックテキスト類似性のためのペルシャ英語クロスリンガルコーパス PESTS: Persian_English Cross Lingual Corpus for Semantic Textual Similarity ( http://arxiv.org/abs/2305.07893v1 ) ライセンス: Link先を確認	Mohammad Abdous, Poorya Piroozfar, Behrouz Minaei Bidgoli	(参考訳) 最近多くの調査を受けた自然言語処理のコンポーネントの1つは、セマンティックテキストの類似性である。計算言語学や自然言語処理では、単語、句、段落、テキストの意味的類似性を評価することが重要である。意味的類似性(semantic similarity)は、単言語版とクロス言語版の両方で提供される2つのテキスト片、段落、句間の意味的類似度を計算することである。言語間の意味的類似性は、ソース言語とターゲット言語の両方に意味的類似度を持つ文対が存在するコーパスを必要とする。多くの既存の言語間セマンティック類似モデルでは、機械翻訳誤差の伝搬がモデルの精度を低下させるクロス言語間セマンティック類似性データセットが利用できないため、機械翻訳を用いる。一方、機械翻訳に意味的類似性を利用したい場合は、意味的類似性のために同じ機械翻訳を使うべきではない。ペルシャ語は低資源言語の1つであるが、この点において努力は行われておらず、2つの言語の文脈を理解できるモデルの必要性はこれまで以上に感じられる。本稿では,ペルシア語と英語の文間の意味的テキスト類似性のコーパスを,言語専門家を用いて初めて作成した。このデータセットをPESTS (Persian English Semantic Textual similarity) と名付けた。このコーパスは5375の文対を含む。また、トランスフォーマーに基づくモデルもこのデータセットを使って微調整されている。その結果、PESTSデータセットを用いて、XLM ROBERTaモデルのピアソン相関は85.87%から95.62%に増加した。 One of the components of natural language processing that has received a lot of investigation recently is semantic textual similarity. In computational linguistics and natural language processing, assessing the semantic similarity of words, phrases, paragraphs, and texts is crucial. Calculating the degree of semantic resemblance between two textual pieces, paragraphs, or phrases provided in both monolingual and cross-lingual versions is known as semantic similarity. Cross lingual semantic similarity requires corpora in which there are sentence pairs in both the source and target languages with a degree of semantic similarity between them. Many existing cross lingual semantic similarity models use a machine translation due to the unavailability of cross lingual semantic similarity dataset, which the propagation of the machine translation error reduces the accuracy of the model. On the other hand, when we want to use semantic similarity features for machine translation the same machine translations should not be used for semantic similarity. For Persian, which is one of the low resource languages, no effort has been made in this regard and the need for a model that can understand the context of two languages is felt more than ever. In this article, the corpus of semantic textual similarity between sentences in Persian and English languages has been produced for the first time by using linguistic experts. We named this dataset PESTS (Persian English Semantic Textual Similarity). This corpus contains 5375 sentence pairs. Also, different models based on transformers have been fine-tuned using this dataset. The results show that using the PESTS dataset, the Pearson correlation of the XLM ROBERTa model increases from 85.87% to 95.62%.	翻訳日:2023-05-16 18:58:53 公開日:2023-05-13
# dac-mr: メタ学習のためのデータ拡張一貫性に基づくメタレギュライゼーション DAC-MR: Data Augmentation Consistency Based Meta-Regularization for Meta-Learning ( http://arxiv.org/abs/2305.07892v1 ) ライセンス: Link先を確認	Jun Shu, Xiang Yuan, Deyu Meng, Zongben Xu	(参考訳) 最近、メタ学習は研究され、現代の機械学習の進歩に貢献した。しかし、優れたメタ学習モデルを実現するには、基礎となるタスクの一般化目標を表す高品質なメタデータを備えた大量のトレーニングタスクが必要である。しかし、現在のメタデータ駆動型メタ学習アプローチは、不十分なトレーニングタスクで満足なメタモデルをトレーニングすることがかなり難しい。この問題に対処するため,メタ知識をメタ学習プロセスに統合することによりメタ学習を改善するメタ知識情報メタ学習(MKIML)フレームワークを提案する。メタモデル関数クラスのキャパシティ複雑性を正規化するために,適切なメタ正規化(MR)目標を用いてメタ知識をメタオブジェクトに統合し,未確認タスクの一般化を容易にする。 DAC-MRで表されるMR目標をインスタンス化するためのメタ知識として,不変性を符号化するためのデータ拡張整合性を導入する。提案するdac-mrは、ノイズ、スパース、あるいは使用不能なメタデータを持つトレーニングタスクから、パフォーマンスのよいメタモデルを学ぶことを希望する。理論的には,DAC-MRは,高品質なメタデータを持たないメタモデルを評価するために用いられるプロキシメタオブジェクトとして扱うことができる。さらに,DAC-MRと組み合わせたメタデータ駆動型メタロスは,より優れたメタレベルの一般化を実現することができる。異なるネットワークアーキテクチャとベンチマークを持つ10のメタラーニングタスクは、メタモデル学習を支援するdac-mrの能力を示しています。 DAC-MRの優れた性能は、すべての設定で得られ、我々の理論的知見とよく一致している。これは、私たちのDAC-MRは問題に非依存であり、広範なメタ学習問題やタスクに容易に適用できることを望んでいます。 Meta learning recently has been heavily researched and helped advance the contemporary machine learning. However, achieving well-performing meta-learning model requires a large amount of training tasks with high-quality meta-data representing the underlying task generalization goal, which is sometimes difficult and expensive to obtain for real applications. Current meta-data-driven meta-learning approaches, however, are fairly hard to train satisfactory meta-models with imperfect training tasks. To address this issue, we suggest a meta-knowledge informed meta-learning (MKIML) framework to improve meta-learning by additionally integrating compensated meta-knowledge into meta-learning process. We preliminarily integrate meta-knowledge into meta-objective via using an appropriate meta-regularization (MR) objective to regularize capacity complexity of the meta-model function class to facilitate better generalization on unseen tasks. As a practical implementation, we introduce data augmentation consistency to encode invariance as meta-knowledge for instantiating MR objective, denoted by DAC-MR. The proposed DAC-MR is hopeful to learn well-performing meta-models from training tasks with noisy, sparse or unavailable meta-data. We theoretically demonstrate that DAC-MR can be treated as a proxy meta-objective used to evaluate meta-model without high-quality meta-data. Besides, meta-data-driven meta-loss objective combined with DAC-MR is capable of achieving better meta-level generalization. 10 meta-learning tasks with different network architectures and benchmarks substantiate the capability of our DAC-MR on aiding meta-model learning. Fine performance of DAC-MR are obtained across all settings, and are well-aligned with our theoretical insights. This implies that our DAC-MR is problem-agnostic, and hopeful to be readily applied to extensive meta-learning problems and tasks.	翻訳日:2023-05-16 18:58:28 公開日:2023-05-13
# 構造シミュレーションとブリッジ健康モニタリングのためのニューラルオペレータ Neural operator for structural simulation and bridge health monitoring ( http://arxiv.org/abs/2305.07889v1 ) ライセンス: Link先を確認	Chawit Kaewnuratchadasorn, Jiaji Wang, Chul-Woo Kim	(参考訳) 構造工学による深層学習は,前向き問題(構造シミュレーション)と逆問題(構造健康モニタリング)の両方に広く注目されている。フーリエ・ニューラル・オペレーターに基づいて,橋梁構造のディジタル双対としてvino(vehicle-bridge interaction neural operator)を提案する。 VINOは構造応答場と損傷場のマッピングを学ぶ。本研究では, 構造初期損傷場のランダム分布を考慮したパラメータ有限要素(FE)シミュレーションにより, VBI-FEデータセットを構築した。その後、vbi-expデータセットは4つの損傷シナリオで実験的研究を行った。 VINOはVBI-FEによって事前訓練され、VBI-EXPによって正常状態の橋から微調整された後、以下の2つの改善が達成された。まず、フォワードVINOは、FEモデルよりも正確に損傷場入力から構造応答を予測できる。第二に、逆VINOはすべてのシナリオにおけるダメージを決定、ローカライズ、定量化し、データ駆動アプローチの実践性を示唆する。 Infusing deep learning with structural engineering has received widespread attention for both forward problems (structural simulation) and inverse problems (structural health monitoring). Based on Fourier Neural Operator, this study proposes VINO (Vehicle-bridge Interaction Neural Operator) to serve as the digital twin of bridge structures. VINO learns mappings between structural response fields and damage fields. In this study, VBI-FE dataset was established by running parametric finite element (FE) simulations considering a random distribution of structural initial damage field. Subsequently, VBI-EXP dataset was produced by conducting an experimental study under four damage scenarios. After VINO was pre-trained by VBI-FE and fine-tuned by VBI-EXP from the bridge at the healthy state, the model achieved the following two improvements. First, forward VINO can predict structural responses from damage field inputs more accurately than the FE model. Second, inverse VINO can determine, localize, and quantify damages in all scenarios, suggesting the practicality of data-driven approaches.	翻訳日:2023-05-16 18:57:58 公開日:2023-05-13
# 滑らかな演算子に基づく一段階量子探索アルゴリズム One-step quantum search algorithms based on smooth operators ( http://arxiv.org/abs/2305.07924v1 ) ライセンス: Link先を確認	Basanta R. Pahari, Sagar Bhat, Siri Davidi, William Oates	(参考訳) 微分や積分の発見は科学知識の飛躍的な飛躍であり、数学、物理学、工学など多くの分野に革命をもたらした。高次微分の存在は近似の精度を高め、任意の物理現象のより正確なモデリングを可能にする。ここでは、無限に微分可能な滑らかな演算子を用いて、2つの量子探索アルゴリズムを構築し、一見異なる領域を接続する。滑らかな関数とともに、置換演算子とユニティの根を利用して量子探索を行う量子回路を生成する。量子シミュレータを用いてモデルを検証し、IBMの量子ハードウェア上でテストする。さらに,ノイズと誤差伝播の効果について検討し,groverのアルゴリズムのような反復法に比べて,雑音に対してより頑健な手法であることを示す。 The discovery of derivatives and integrals was a tremendous leap in scientific knowledge and completely revolutionized many fields, including mathematics, physics, and engineering. The existence of higher-order derivatives means better approximation and, thus, more accurate modeling of any physical phenomenon. Here we use smooth operators that are infinitely differentiable to construct two quantum search algorithms and connect these seemingly different areas. Along with smooth functions, permutation operators and the roots of unity are exploited to create quantum circuits to perform a quantum search. We validate our models through quantum simulators and test them on IBM's quantum hardware. Furthermore, we investigate the effect of noise and error propagation and demonstrate that our approach is more robust to noise compared to iterative methods like Grover's algorithm.	翻訳日:2023-05-16 18:51:58 公開日:2023-05-13
# CodeT5+: コード理解と生成のためのオープンコード大言語モデル CodeT5+: Open Code Large Language Models for Code Understanding and Generation ( http://arxiv.org/abs/2305.07922v1 ) ライセンス: Link先を確認	Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi	(参考訳) 大きな言語モデル (LLM) は膨大なソースコードで事前訓練されており、コードインテリジェンスにおいて顕著な進歩を遂げている。しかし、既存のLLMにはアーキテクチャと事前訓練タスクの2つの主な制限がある。まず、特定のアーキテクチャ(エンコーダのみまたはデコーダのみ)を採用するか、あるいは異なるダウンストリームタスクに統一されたエンコーダデコーダネットワークに依存する。前者のパラダイムはアプリケーションの柔軟性によって制限されるが、後者では、モデルが全てのタスクに対して単一のシステムとして扱われ、タスクのサブセット上での最適なパフォーマンスをもたらす。第2に,ダウンストリームタスクとは無関係な,限定的な事前トレーニング目標を採用して,結果としてパフォーマンスが大幅に低下することが多い。これらの制限に対処するために,コンポーネントモジュールを柔軟に組み合わせて幅広いダウンストリームコードタスクに適合させることができるコード用エンコーダデコーダLLMのファミリーである ``CodeT5+' を提案する。このような柔軟性は,プレトレイン-ファイントゥーンの相違を緩和するための事前学習目的の混合によって実現される。これらの目的は、単調かつバイモーダルな多言語コードコーパスにおいて、認知、コントラスト学習、テキストコードマッチング、因果的LM事前訓練タスクをカバーする。さらに,スクラッチからトレーニングを受けることなく既製のLLMでCodeT5+を初期化してモデルを効率的にスケールアップし,自然言語命令と整合するインストラクションチューニングについて検討する。我々は、ゼロショット、微調整、命令調整を含む20以上のコード関連ベンチマークでCodeT5+を広範囲に評価した。我々は,コード生成や完了,数学プログラミング,テキスト・ツー・コード検索タスクなど,コード関連タスクにおける最先端(SoTA)モデルのパフォーマンスを観察する。特に,命令調整した CodeT5+ 16B では,HumanEval コード生成タスクにおいて,他のオープンコード LLM に対して新たな SoTA 結果が得られる。 Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limited by inflexibility in applications while in the latter, the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of tasks. Secondly, they often employ a limited set of pretraining objectives which might not be relevant to some downstream tasks and hence result in substantial performance degrade. To address these limitations, we propose ``CodeT5+'', a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. Such flexibility is enabled by our proposed mixture of pretraining objectives to mitigate the pretrain-finetune discrepancy. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora. Furthermore, we propose to initialize CodeT5+ with frozen off-the-shelf LLMs without training from scratch to efficiently scale up our models, and explore instruction-tuning to align with natural language instructions. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning. We observe state-of-the-art (SoTA) model performance on various code-related tasks, such as code generation and completion, math programming, and text-to-code retrieval tasks. Particularly, our instruction-tuned CodeT5+ 16B achieves new SoTA results on HumanEval code generation task against other open code LLMs.	翻訳日:2023-05-16 18:51:45 公開日:2023-05-13
# 医用ビジョンランゲージ事前トレーニングのためのアライメントモデリングによるマルチタスクペアマスキング Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training ( http://arxiv.org/abs/2305.07920v1 ) ライセンス: Link先を確認	Ke Zhang, Hanliang Jiang, Jian Zhang, Qingming Huang, Jianping Fan, Jun Yu and Weidong Han	(参考訳) 近年,医用画像診断の需要が高まり,放射線科医にとって大きな負担となっている。既存のmed-vlp手法は,大規模医用画像から普遍表現を学習する自動医用画像解析のソリューションを提供し,細かなアノテーションを必要とせずに下流タスクに便益を与える。しかし, 既存の画像・テキスト合成手法では, 関節再建にともなうクロスモーダルアライメントの重要性が無視され, 不適切なクロスモーダル相互作用が得られた。本稿では,マルチタスク・ペアリング・マスク・アライメント(mpma)に基づく統合型メド・vlpフレームワークを提案し,クロスモーダルアライメントタスクを統合画像テキスト合成フレームワークに統合し,より包括的なクロスモーダルインタラクションを実現する。より包括的なクロスモーダル融合を実現するため,視覚的特徴を完全に統合し,レポート再構築のプロセスを支援するメモリ拡張クロスモーダル融合(MA-CMF)モジュールも提案する。実験の結果,提案手法は,ユニモーダルタスク,クロスモーダルタスク,マルチモーダルタスクなど,すべての下流タスクに対して従来手法よりも優れていた。 In recent years, the growing demand for medical imaging diagnosis has brought a significant burden to radiologists. The existing Med-VLP methods provide a solution for automated medical image analysis which learns universal representations from large-scale medical images and reports and benefits downstream tasks without requiring fine-grained annotations. However, the existing methods based on joint image-text reconstruction neglect the importance of cross-modal alignment in conjunction with joint reconstruction, resulting in inadequate cross-modal interaction. In this paper, we propose a unified Med-VLP framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework to achieve more comprehensive cross-modal interaction, while a global and local alignment (GLA) module is designed to assist self-supervised paradigm in obtaining semantic representations with rich domain knowledge. To achieve more comprehensive cross-modal fusion, we also propose a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual features to assist in the process of report reconstruction. Experimental results show that our approach outperforms previous methods over all downstream tasks, including uni-modal, cross-modal and multi-modal tasks.	翻訳日:2023-05-16 18:51:12 公開日:2023-05-13
# マルチオブザーバによる高次元モニタリングとリアリズムの出現 High-dimensional monitoring and the emergence of realism via multiple observers ( http://arxiv.org/abs/2305.07919v1 ) ライセンス: Link先を確認	Alexandre C. Orthey Jr., Pedro R. Dieguez, Owidiusz Makuta, Remigiusz Augusiak	(参考訳) 量子測定はユニタリ進化であり、後に部分的トレースが続く。そこで本研究では,量子世界の物理的現実の出現を,弱度と強い非選択性の測定を補間するモデルを導入することによって解決する。一般化オブザーバブルとハイゼンベルク・ワイル作用素に基づくモデルでは,高次元のquditに対しては,量子ダーウィン主義の枠組みに従って,システムと複数の環境quditと相互作用させることで,システムに関する完全な情報を得ることができることを示唆する。 Quantum measurements are unitary evolutions followed by partial traces. Based on that, we address the problem of the emergence of physical reality from the quantum world by introducing a model that interpolates between weak and strong non-selective measurements for qudits. Our model, which is based on generalized observables and Heisenberg-Weyl operators, suggests that for high-dimensional qudits, full information about the system can only be obtained by making the system interact with not just one but several environmental qudits, following a Quantum Darwinism framework.	翻訳日:2023-05-16 18:50:44 公開日:2023-05-13
# スペシャリストの原理の証明 A Proof of Specker's Principle ( http://arxiv.org/abs/2305.07917v1 ) ライセンス: Link先を確認	Guido Bacciagaluppi	(参考訳) スペクターの原理、対の直交命題は共同直交でなければならないという条件は、量子力学を特徴づける物理原理を見つけるプログラムの中で、近年広く研究されている。しかし、透明な正当性が欠けていることがほとんどである。本稿では,最大エンタングルメントの存在,非最大測定の存在,および符号付けの3つの仮定から,スペクトルの原理を導出する。これら3つの仮定について議論し、2つの命題を満たす非Specker集合の正準例を記述する。これらの例は、量子力学の解釈における様々なアプローチ、特にレトロカウセーションに基づく類似性を示す。また、ポープスクやローリッヒの作品との関係についても論じる。証明の核心(そして、署名の禁止に違反する主な例)は、私が紙を開く『ニーネヴェのシーザー』というスペクターの物語の変種によって説明されている。 Specker's principle, the condition that pairwise orthogonal propositions must be jointly orthogonal, has been much investigated recently within the programme of finding physical principles to characterise quantum mechanics. It largely appears, however, to lack a transparent justification. In this paper, I provide a derivation of Specker's principle from three assumptions (made suitably precise): the existence of maximal entanglement, the existence of non-maximal measurements, and no-signalling. I discuss these three assumptions and describe canonical examples of non-Specker sets of propositions satisfying any two of them. These examples display analogies with various approaches in the interpretation of quantum mechanics, notably ones based on retrocausation. I also discuss connections with the work of Popescu and Rohrlich. The core of the proof (and the main example violating no-signalling) is illustrated by a variant of Specker's tale of the seer of Nineveh, with which I open the paper.	翻訳日:2023-05-16 18:50:33 公開日:2023-05-13
# 干渉による測定のための量子不確かさ原理 Quantum Uncertainty Principles for Measurements with Interventions ( http://arxiv.org/abs/2305.07914v1 ) ライセンス: Link先を確認	Yunlong Xiao, Yuxiang Yang, Ximing Wang, Qing Liu, Mile Gu	(参考訳) ハイゼンベルクの不確実性原理は、量子システムのどの性質を同時に学べるかに関する基本的な制約を意味する。しかし、通常は、これらの性質を1つの点で測定することで調査する。対照的に、複雑なプロセスにおける因果依存性を推測するには、しばしば対話的な実験を必要とする。ここでは任意の介入ラウンドを含む一般的な対話的測定のための普遍的不確実性原理を示す。ケーススタディとして,異なる因果関係に適合する測定値間の不確実性トレードオフを示唆することを示す。 Heisenberg's uncertainty principle implies fundamental constraints on what properties of a quantum system can we simultaneously learn. However, it typically assumes that we probe these properties via measurements at a single point in time. In contrast, inferring causal dependencies in complex processes often requires interactive experimentation - multiple rounds of interventions where we adaptively probe the process with different inputs to observe how they affect outputs. Here we demonstrate universal uncertainty principles for general interactive measurements involving arbitrary rounds of interventions. As a case study, we show that they imply an uncertainty trade-off between measurements compatible with different causal dependencies.	翻訳日:2023-05-16 18:50:16 公開日:2023-05-13
# 時間知識グラフ補完のためのプロンプト付き事前学習言語モデル Pre-trained Language Model with Prompts for Temporal Knowledge Graph Completion ( http://arxiv.org/abs/2305.07912v1 ) ライセンス: Link先を確認	Wenjie Xu, Ben Liu, Miao Peng, Xu Jia, Min Peng	(参考訳) 時間知識グラフ補完(TKGC)は、事実の欠落部分を完成させるために既知のタイムスタンプでの推論を含む重要なタスクであり、近年ますます注目を集めている。既存の手法のほとんどは、時間スタンプから情報を不正確に抽出しながら、グラフニューラルネットワークに基づく表現の学習に重点を置いている。これらの問題に対処するため,我々は新しいtkgcモデル,すなわちtkgc (ppt) のプロンプト付き事前学習言語モデルを提案する。サンプルの四重項を事前訓練した言語モデル入力に変換し、タイムスタンプ間の間隔を異なるプロンプトに変換し、暗黙的な意味情報を持つ一貫性のある文を生成する。我々は、TKGCタスクをマスク付きトークン予測タスクに変換するためのマスキング戦略でモデルを訓練し、事前訓練された言語モデルにおける意味情報を活用することができる。 3つのベンチマークデータセットに関する実験と広範な分析によって、我々のモデルは4つのメトリクスを持つ他のモデルと比較して大きな競合性を示している。我々のモデルは、時間的知識グラフからの情報を言語モデルに効果的に組み込むことができる。 Temporal Knowledge graph completion (TKGC) is a crucial task that involves reasoning at known timestamps to complete the missing part of facts and has attracted more and more attention in recent years. Most existing methods focus on learning representations based on graph neural networks while inaccurately extracting information from timestamps and insufficiently utilizing the implied information in relations. To address these problems, we propose a novel TKGC model, namely Pre-trained Language Model with Prompts for TKGC (PPT). We convert a series of sampled quadruples into pre-trained language model inputs and convert intervals between timestamps into different prompts to make coherent sentences with implicit semantic information. We train our model with a masking strategy to convert TKGC task into a masked token prediction task, which can leverage the semantic information in pre-trained language models. Experiments on three benchmark datasets and extensive analysis demonstrate that our model has great competitiveness compared to other models with four metrics. Our model can effectively incorporate information from temporal knowledge graphs into the language models.	翻訳日:2023-05-16 18:50:07 公開日:2023-05-13
# 遅延適応型政策最適化とバンディットフィードバックによる逆mdpの後悔改善 Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback ( http://arxiv.org/abs/2305.07911v1 ) ライセンス: Link先を確認	Tal Lancewicki, Aviv Rosenberg, Dmitry Sotnikov	(参考訳) 政策最適化(PO)は強化学習(RL)において最も一般的な手法の1つである。したがって、POアルゴリズムの理論的保証はRLコミュニティにとって特に重要である。本稿では,ほぼすべての実世界のアプリケーションで発生する課題である,敵対的MDPにおけるPOについて検討する。表形式のMDPでPOに最も近い最適後悔境界を与え、最先端(効率の低い手法)を超越する可能性さえある。私たちの小説『Delay-Adapted PO』(DAPO)は簡単に実装でき、一般化でき、アルゴリズムを次のように拡張できます。 (i)線形$q$-関数を仮定した無限状態空間は、関数近似を用いて遅延フィードバックに対する最初の後悔の限界を証明する。 (II)MuJoCoドメインの実験において,その有効性を示した深部RL。 Policy Optimization (PO) is one of the most popular methods in Reinforcement Learning (RL). Thus, theoretical guarantees for PO algorithms have become especially important to the RL community. In this paper, we study PO in adversarial MDPs with a challenge that arises in almost every real-world application -- \textit{delayed bandit feedback}. We give the first near-optimal regret bounds for PO in tabular MDPs, and may even surpass state-of-the-art (which uses less efficient methods). Our novel Delay-Adapted PO (DAPO) is easy to implement and to generalize, allowing us to extend our algorithm to: (i) infinite state space under the assumption of linear $Q$-function, proving the first regret bounds for delayed feedback with function approximation. (ii) deep RL, demonstrating its effectiveness in experiments on MuJoCo domains.	翻訳日:2023-05-16 18:49:47 公開日:2023-05-13
# mask to reconstruction: コラボレーティブ・セマンティクス・コンプリートによるビデオテキスト検索 Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval ( http://arxiv.org/abs/2305.07910v1 ) ライセンス: Link先を確認	Han Fang and Zhifei Yang and Xianghao Zang and Chao Ban and Hao Sun	(参考訳) 近年,マスク付きビデオモデリングが広く研究され,局所レベルでの視覚領域の理解能力が大幅に向上している。しかし、既存の手法は通常ランダムマスキングを採用し、クロスモーダルコンテンツ間の相関を活用しないマスキング領域を完備するために同じ再構成パラダイムに従う。本稿では,セマンティクスに基づくマスクモデルに基づいて,セマンティクス補完のためのマスク(mascot)を提案する。具体的には、注意に基づくビデオマスキングを用いて、高インフォームドかつ低インフォームドマスクを生成した後、マスキングされたセマンティクス情報を復元するためのインフォームドセマンティクス補完を提案する。このリカバリメカニズムは、マスクされたコンテンツと、マスクされていない視覚領域と対応するテキストコンテキストを整合させることで実現され、モデルがパッチレベルでよりテキスト関連の詳細をキャプチャする。さらに,無関係な背景から差別的な部分への再構成を重視し,低変形マスクの領域を無視する。さらに,両マスク協調学習を設計し,異なるマスクの下にビデオキューを組み込んで,より整列した映像表現を学習する。 MSR-VTT, LSMDC, ActivityNet, DiDeMo など,4つの主要なテキストビデオ検索ベンチマークで最先端のパフォーマンスを実現した。広範なアブレーション研究により,提案手法の有効性が示された。 Recently, masked video modeling has been widely explored and significantly improved the model's understanding ability of visual regions at a local level. However, existing methods usually adopt random masking and follow the same reconstruction paradigm to complete the masked regions, which do not leverage the correlations between cross-modal content. In this paper, we present Mask for Semantics Completion (MASCOT) based on semantic-based masked modeling. Specifically, after applying attention-based video masking to generate high-informed and low-informed masks, we propose Informed Semantics Completion to recover masked semantics information. The recovery mechanism is achieved by aligning the masked content with the unmasked visual regions and corresponding textual context, which makes the model capture more text-related details at a patch level. Additionally, we shift the emphasis of reconstruction from irrelevant backgrounds to discriminative parts to ignore regions with low-informed masks. Furthermore, we design dual-mask co-learning to incorporate video cues under different masks and learn more aligned video representation. Our MASCOT performs state-of-the-art performance on four major text-video retrieval benchmarks, including MSR-VTT, LSMDC, ActivityNet, and DiDeMo. Extensive ablation studies demonstrate the effectiveness of the proposed schemes.	翻訳日:2023-05-16 18:49:34 公開日:2023-05-13
# ハードウェア貯留層におけるブールウェイト最適化の収束とスケーリング Convergence and scaling of Boolean-weight optimization for hardware reservoirs ( http://arxiv.org/abs/2305.07908v1 ) ライセンス: Link先を確認	Louis Andreoli, St\'ephane Chr\'etien, Xavier Porte, Daniel Brunner	(参考訳) ニューラルネットワークのハードウェア実装は、次世代の効率的で強力な人工知能ソリューションを実装するための重要なステップである。並列で効率的でスケーラブルなハードウェアアーキテクチャの実現に加えて、サンプリング効率のよいアプローチでシステムの非常に大きなパラメータ空間の最適化が不可欠である。本稿では,ランダムリカレント結合型ニューラルネットワーク,リザーバの読み出し層を最適化するために,高度に効率的な座標降下のためのスケーリング則を解析的に導出する。この収束は指数関数的であり,ネットワークのニューロン数に線形にスケールすることを示す。本結果は,概念実証実験で実施した大規模フォトニック貯水池の収束とスケーリングを再現するものである。そこで本研究では,ハードウェアネットワークにおけるこのような最適化の基盤を提供し,ニューラルネットワークの振幅統計量と重み付け更新則を活用し,学習中に収束速度を最適化する今後の方向性を明らかにした。 Hardware implementation of neural network are an essential step to implement next generation efficient and powerful artificial intelligence solutions. Besides the realization of a parallel, efficient and scalable hardware architecture, the optimization of the system's extremely large parameter space with sampling-efficient approaches is essential. Here, we analytically derive the scaling laws for highly efficient Coordinate Descent applied to optimizing the readout layer of a random recurrently connection neural network, a reservoir. We demonstrate that the convergence is exponential and scales linear with the network's number of neurons. Our results perfectly reproduce the convergence and scaling of a large-scale photonic reservoir implemented in a proof-of-concept experiment. Our work therefore provides a solid foundation for such optimization in hardware networks, and identifies future directions that are promising for optimizing convergence speed during learning leveraging measures of a neural network's amplitude statistics and the weight update rule.	翻訳日:2023-05-16 18:49:09 公開日:2023-05-13
# 会話型推薦システムにおける大規模言語モデル活用 Leveraging Large Language Models in Conversational Recommender Systems ( http://arxiv.org/abs/2305.07961v1 ) ライセンス: Link先を確認	Luke Friedman, Sameer Ahuja, David Allen, Terry Tan, Hakim Sidahmed, Changbo Long, Jun Xie, Gabriel Schubiner, Ajay Patel, Harsh Lara, Brian Chu, Zexi Chen, Manoj Tiwari	(参考訳) Conversational Recommender System (CRS)は、リアルタイムのマルチターン対話を通じてシステムと対話できるようにすることにより、ユーザに対して透明性とコントロールを向上する。近年、Large Language Models (LLMs) は、自然に会話し、世界知識と常識推論を言語理解に取り入れ、このパラダイムの可能性を解き放つ前例のない能力を示した。しかし、CRS内でLLMを効果的に活用することは、複雑な会話を適切に理解し、制御し、外部の情報ソースから取り出すなど、新しい技術的課題をもたらす。これらの問題は、大きく進化した項目コーパスと、トレーニングのための会話データの欠如によって悪化する。本稿では,LSMを用いたエンドツーエンドの大規模CRSを構築するためのロードマップを提供する。特に,LLMを利用した統合アーキテクチャの一部として,ユーザ好みの理解,フレキシブルな対話管理,説明可能なレコメンデーションのための新しい実装を提案する。パーソナライズを改善するために,LLMが解釈可能な自然言語ユーザプロファイルを消費し,セッションレベルのコンテキストを変調するために利用する方法について述べる。既存のCRSが存在しない場合の会話データ制限を克服するため,制御可能なLCMベースのユーザシミュレータを構築し,合成会話を生成する手法を提案する。概念実証として、LaMDA上に構築されたYouTubeビデオ用の大規模CRSであるRecLLMを紹介し、説明的な例による会話を通じて、その流布性と多様な機能を示す。 A Conversational Recommender System (CRS) offers increased transparency and control to users by enabling them to engage with the system through a real-time multi-turn dialogue. Recently, Large Language Models (LLMs) have exhibited an unprecedented ability to converse naturally and incorporate world knowledge and common-sense reasoning into language understanding, unlocking the potential of this paradigm. However, effectively leveraging LLMs within a CRS introduces new technical challenges, including properly understanding and controlling a complex conversation and retrieving from external sources of information. These issues are exacerbated by a large, evolving item corpus and a lack of conversational data for training. In this paper, we provide a roadmap for building an end-to-end large-scale CRS using LLMs. In particular, we propose new implementations for user preference understanding, flexible dialogue management and explainable recommendations as part of an integrated architecture powered by LLMs. For improved personalization, we describe how an LLM can consume interpretable natural language user profiles and use them to modulate session-level context. To overcome conversational data limitations in the absence of an existing production CRS, we propose techniques for building a controllable LLM-based user simulator to generate synthetic conversations. As a proof of concept we introduce RecLLM, a large-scale CRS for YouTube videos built on LaMDA, and demonstrate its fluency and diverse functionality through some illustrative example conversations.	翻訳日:2023-05-16 18:41:27 公開日:2023-05-13
# 分類木の最適学習のための新しいメメティック戦略 A Novel Memetic Strategy for Optimized Learning of Classification Trees ( http://arxiv.org/abs/2305.07959v1 ) ライセンス: Link先を確認	Tommaso Aldinucci	(参考訳) 解釈可能な機械学習への関心が高まる中、分類木はそのガラス箱構造のために再び科学界の注目を集めてきた。これらのモデルは、通常、不純物対策を最小化する特徴空間の切断を見つけるためにサブプロブレムを解く、欲求手続きを用いて構築される。本稿では,milpに基づく厳密な定式化による学習問題の定義において,この標準的欲望アプローチや近年の進歩とは対照的に,数千点のデータセットを処理可能なミーム的手法を活用し,分類木を誘導するための新しい進化的アルゴリズムを提案する。提案手法は,実現可能な解空間の探索と局所探索を組み合わせることで,最先端手法と競合する一般化能力を持つ構造を得る。 Given the increasing interest in interpretable machine learning, classification trees have again attracted the attention of the scientific community because of their glass-box structure. These models are usually built using greedy procedures, solving subproblems to find cuts in the feature space that minimize some impurity measures. In contrast to this standard greedy approach and to the recent advances in the definition of the learning problem through MILP-based exact formulations, in this paper we propose a novel evolutionary algorithm for the induction of classification trees that exploits a memetic approach that is able to handle datasets with thousands of points. Our procedure combines the exploration of the feasible space of solutions with local searches to obtain structures with generalization capabilities that are competitive with the state-of-the-art methods.	翻訳日:2023-05-16 18:41:00 公開日:2023-05-13
# more for less: より強力なパフォーマンス保証による安全なポリシー改善 More for Less: Safe Policy Improvement With Stronger Performance Guarantees ( http://arxiv.org/abs/2305.07958v1 ) ライセンス: Link先を確認	Patrick Wienh\"oft, Marnix Suilen, Thiago D. Sim\~ao, Clemens Dubslaff, Christel Baier, Nils Jansen	(参考訳) オフラインの強化学習環境では、安全なポリシー改善(SPI)問題は、サンプルデータが生成された行動ポリシーの性能を改善することを目的としている。 SPIに対する最先端のアプローチは、改善されたポリシーの性能に関する実用的な確率的保証を提供するために、多数のサンプルを必要とする。このような保証のために少ないデータを必要とする手段を提供するspi問題に対して,新たなアプローチを提案する。具体的には、これらの保証の正しさを証明するために、SPIのより厳密な改善境界を導出するための理論的基礎となるデータセットと基礎となる環境モデルに暗黙的な変換を考案する。ベースラインブートストラップ法(SPIBB)アルゴリズムを標準ベンチマークで確立したSPIを用いて,本手法がSPIBBアルゴリズムのサンプリング複雑性を著しく低減することを示す。 In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated. State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy's performance. We present a novel approach to the SPI problem that provides the means to require less data for such guarantees. Specifically, to prove the correctness of these guarantees, we devise implicit transformations on the data set and the underlying environment model that serve as theoretical foundations to derive tighter improvement bounds for SPI. Our empirical evaluation, using the well-established SPI with baseline bootstrapping (SPIBB) algorithm, on standard benchmarks shows that our method indeed significantly reduces the sample complexity of the SPIBB algorithm.	翻訳日:2023-05-16 18:40:46 公開日:2023-05-13
# 境界駆動量子鎖のジャンプチャネル統計におけるパターン Patterns in the jump-channel statistics of boundary driven quantum chains ( http://arxiv.org/abs/2305.07957v1 ) ライセンス: Link先を確認	Gabriel T. Landi	(参考訳) 量子系を複数のジャンプチャネルで連続的に測定することに由来する確率過程を考える。このプロセスは、ジャンプ間のランダムな時間だけでなく、ジャンプチャンネルを表す一連のシンボルによっても記述される。我々はこの記号列の基本的な性質を確立する。まず、ダイナミクスを完全に制御する特別なスーパーオペレータセットを決定し、多点分布を計算し、確率的軌道をシミュレートする効率的な方法を提供する。また、確率過程が定常である条件を決定し、遠方の放出の間の記憶が特定のスーパーオペレータのスペクトル特性によって決定されることを示す。最後に、あるシステムはパターンをサポートしており、各ジャンプの後の進化は閉じた状態のセットで実行される。これは、私たちが議論しているように、将来の結果の予測を大いに促進するために利用できます。境界駆動型一次元XYスピンチェーンによる輸送の研究により、これらのアイデアを説明する。統計はチェーンサイズに大きく依存していることが示される。そして、ハミルトニアンにおけるペアリング項の存在は、既存のパターンを破壊する。 We consider the stochastic process stemming from continuously measuring a quantum system with multiple jump channels. The process is described not only by the random times between jumps, but also by a sequence of emitted symbols representing each jump channel. We establish the fundamental properties of this sequence of symbols. First, we determine a special set of superoperators that completely govern the dynamics, and provide an efficient way for computing multi-point distributions and for simulating stochastic trajectories. We also determine the conditions for the stochastic process to be stationary and show that the memory between distant emissions is determined by the spectral properties of a specific superoperator. Finally, we show that some systems support a pattern, where the evolution after each jump runs over a closed set of states. This, as we argue, can be used to greatly facilitate our prediction of future outcomes. We illustrate these ideas by studying transport through a boundary-driven one-dimensional XY spin chain. We show that the statistics depends dramatically on the chain size. And that the presence of pairing terms in the Hamiltonian destroy any existing patterns.	翻訳日:2023-05-16 18:40:32 公開日:2023-05-13
# 確率的グラフマッチングによる画像分割 Image Segmentation via Probabilistic Graph Matching ( http://arxiv.org/abs/2305.07954v1 ) ライセンス: Link先を確認	Ayelet Heimowitz and Yosi Keller	(参考訳) 本研究では,低レベル画像を用いて計算された不定値および一対割り当て確率に基づく推論問題としてセグメンテーションを定式化する,教師なしかつ半自動的な画像セグメンテーション手法を提案する。この推論は確率的グラフマッチングスキームによって解決され、低レベルの画像キューとパラメータの自動チューニングを厳密に組み込むことができる。提案手法は, 現代の最先端画像集合に適用した場合に, 半教師なし・非教師付き画像分割方式と良好に比較できることを示した。 This work presents an unsupervised and semi-automatic image segmentation approach where we formulate the segmentation as a inference problem based on unary and pairwise assignment probabilities computed using low-level image cues. The inference is solved via a probabilistic graph matching scheme, which allows rigorous incorporation of low level image cues and automatic tuning of parameters. The proposed scheme is experimentally shown to compare favorably with contemporary semi-supervised and unsupervised image segmentation schemes, when applied to contemporary state-of-the-art image sets.	翻訳日:2023-05-16 18:40:17 公開日:2023-05-13
# 局所的パッチ間不変性に基づく視覚計測用イルミネーション非感受性バイナリ記述子 Illumination-insensitive Binary Descriptor for Visual Measurement Based on Local Inter-patch Invariance ( http://arxiv.org/abs/2305.07943v1 ) ライセンス: Link先を確認	Xinyu Lin, Yingjie Zhou, Xun Zhang, Yipeng Liu, and Ce Zhu	(参考訳) バイナリ機能記述子は様々な視覚計測タスク、特に限られた計算資源とストレージ容量を持つタスクで広く使われている。既存のバイナリディスクリプタは、照明のバリエーションに敏感なため、長期の視覚的な測定タスクではうまく機能しない。画像照明が劇的に変化すると、局所的なパッチ間の相対関係はほとんど無傷であることが観察できる。そこで本研究では,複数の空間的粒度に現れる局所的なパッチ間不変性を利用して,照明に敏感なバイナリ(IIB)ディスクリプタを提案する。局所パッチ特徴計算に積分画像を利用することにより、高効率なIIB記述子を実現する。スケーラブルな機能を複数の空間的な粒度にエンコードすることで,粗面から細面への計算効率の高い階層マッチングを実現する。さらに、IIBディスクリプタは、いくつかのアプリケーションで利用可能なディープマップやセマンティックセグメンテーション結果など、他のタイプの画像データにも適用することができる。自然と合成の両方のデータセットに関する数値実験により、提案したIIBディスクリプタは最先端のバイナリディスクリプタといくつかのテストフロートディスクリプタより優れていることが明らかになった。提案したIIBディスクリプタは、長期視覚的ローカライゼーションのためのデモシステムにも採用されている。 IIBディスクリプタのコードは公開されている。 Binary feature descriptors have been widely used in various visual measurement tasks, particularly those with limited computing resources and storage capacities. Existing binary descriptors may not perform well for long-term visual measurement tasks due to their sensitivity to illumination variations. It can be observed that when image illumination changes dramatically, the relative relationship among local patches mostly remains intact. Based on the observation, consequently, this study presents an illumination-insensitive binary (IIB) descriptor by leveraging the local inter-patch invariance exhibited in multiple spatial granularities to deal with unfavorable illumination variations. By taking advantage of integral images for local patch feature computation, a highly efficient IIB descriptor is achieved. It can encode scalable features in multiple spatial granularities, thus facilitating a computationally efficient hierarchical matching from coarse to fine. Moreover, the IIB descriptor can also apply to other types of image data, such as depth maps and semantic segmentation results, when available in some applications. Numerical experiments on both natural and synthetic datasets reveal that the proposed IIB descriptor outperforms state-of-the-art binary descriptors and some testing float descriptors. The proposed IIB descriptor has also been successfully employed in a demo system for long-term visual localization. The code of the IIB descriptor will be publicly available.	翻訳日:2023-05-16 18:40:07 公開日:2023-05-13
# データ駆動のディストピア:不断の倫理違反 Data-Driven Dystopia: an uninterrupted breach of ethics ( http://arxiv.org/abs/2305.07934v1 ) ライセンス: Link先を確認	Shreyansh Padarha	(参考訳) 本稿では、データの増加と大企業のデータの誤用に関連するリスクと複雑さについて論じる。この記事は、ユーザのプライバシに違反するデータ漏洩やデータ収集プラクティスの例を示している。また、不平等と差別を永続するビッグデータモデルを指すwmds(weapons of math destruction)の概念も検討している。この記事では、ユーザ情報の保護とデータモデル、AI、MLの倫理的利用に責任を負う企業の必要性を強調している。記事はまた、個人の日常生活におけるデータプライバシの重要性と、データ管理に対するより意識的で責任あるアプローチの必要性を強調している。 This article discusses the risks and complexities associated with the exponential rise in data and the misuse of data by large corporations. The article presents instances of data breaches and data harvesting practices that violate user privacy. It also explores the concept of "Weapons Of Math Destruction" (WMDs), which refers to big data models that perpetuate inequality and discrimination. The article highlights the need for companies to take responsibility for safeguarding user information and the ethical use of data models, AI, and ML. The article also emphasises the significance of data privacy for individuals in their daily lives and the need for a more conscious and responsible approach towards data management.	翻訳日:2023-05-16 18:39:43 公開日:2023-05-13
# GSB:限られたトレーニングサンプルを用いたビジョントランスのためのグループ重ね合わせ二元化 GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples ( http://arxiv.org/abs/2305.07931v1 ) ライセンス: Link先を確認	Tian Gao, Cheng-Zhong Xu, Le Zhang, Hui Kong	(参考訳) 大量のパラメータの影響を受け、ViTは通常、比較的限られた数のトレーニングサンプルで深刻なオーバーフィット問題に悩まされる。さらに、ViTは通常、リソース制限されたデバイスへのデプロイメントを制限する重いコンピューティングリソースを必要とする。モデル圧縮法の一種として、モデル双対化は上記の問題を解決する良い選択である可能性がある。完全な倍数化法と比較すると、複雑なテンソル乗算を単純なビット単位の2進演算に置き換え、全倍数モデルのパラメータとアクティベーションを1ビットのみで表現し、モデルサイズと計算複雑性の問題をそれぞれ解決する。本稿では,バイナリViTモデルの精度の低下は,アテンションモジュールと値ベクトルの情報損失が主な原因であることを示す。そこで本研究では,これらの問題に対処するため,GSB(Group Superposition Binarization)と呼ばれる新しいモデルバイナライゼーション手法を提案する。さらに,二元化モデルの性能をさらに向上させるために,二元化過程における勾配計算手順を調査し,gsbのより適切な勾配計算式を導出し,勾配ミスマッチの影響を低減した。次に, モデル2値化による性能劣化を緩和するために, 知識蒸留技術を導入する。限られたトレーニングサンプル数を持つ3つのデータセットの実験では、提案したGSBモデルがバイナリ量子化スキームの最先端性能を実現し、いくつかの指標でその完全精度を上回ることが示されている。 Affected by the massive amount of parameters, ViT usually suffers from serious overfitting problems with a relatively limited number of training samples. In addition, ViT generally demands heavy computing resources, which limit its deployment on resource-constrained devices. As a type of model-compression method,model binarization is potentially a good choice to solve the above problems. Compared with the full-precision one, the model with the binarization method replaces complex tensor multiplication with simple bit-wise binary operations and represents full-precision model parameters and activations with only 1-bit ones, which potentially solves the problem of model size and computational complexity, respectively. In this paper, we find that the decline of the accuracy of the binary ViT model is mainly due to the information loss of the Attention module and the Value vector. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, in order to further improve the performance of the binarization model, we have investigated the gradient calculation procedure in the binarization process and derived more proper gradient calculation equations for GSB to reduce the influence of gradient mismatch. Then, the knowledge distillation technique is introduced to alleviate the performance degradation caused by model binarization. Experiments on three datasets with limited numbers of training samples demonstrate that the proposed GSB model achieves state-of-the-art performance among the binary quantization schemes and exceeds its full-precision counterpart on some indicators.	翻訳日:2023-05-16 18:39:32 公開日:2023-05-13
# AMTSS:多言語言語推論のための適応型マルチ教師単段階知識蒸留フレームワーク AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation Framework For Multilingual Language Inference ( http://arxiv.org/abs/2305.07928v1 ) ライセンス: Link先を確認	Qianglong Chen, Feng Ji, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang and Yin Zhang	(参考訳) 知識蒸留は、実アプリケーションのための多言語事前学習言語モデルをローンチする上で重要である。多言語環境での費用対効果のある言語推論を支援するために,複数の教師から1人の学生への知識の蒸留を可能にする適応型多教師単学生蒸留フレームワークであるAMTSSを提案する。まず,適応型学習戦略と教師重要度重みを導入し,生徒がマックスマージン教師から効果的に学び,新しい言語に容易に適応できるようにする。さらに,複数の言語をサポートする異なるプロジェクション層を持つ共有学生エンコーダを提案する。 AMTSSは,Eコマースシナリオにおいて,パブリックXNLIデータセットとリアル産業データセットAliExpress(AE)の競争結果を得ることを示す。 Knowledge distillation is of key importance to launching multilingual pre-trained language models for real applications. To support cost-effective language inference in multilingual settings, we propose AMTSS, an adaptive multi-teacher single-student distillation framework, which allows distilling knowledge from multiple teachers to a single student. We first introduce an adaptive learning strategy and teacher importance weight, which enables a student to effectively learn from max-margin teachers and easily adapt to new languages. Moreover, we present a shared student encoder with different projection layers in support of multiple languages, which contributes to largely reducing development and machine cost. Experimental results show that AMTSS gains competitive results on the public XNLI dataset and the realistic industrial dataset AliExpress (AE) in the E-commerce scenario.	翻訳日:2023-05-16 18:39:06 公開日:2023-05-13
# RC3: 正規化コントラストクロスランガルクロスモーダルプレトレーニング RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training ( http://arxiv.org/abs/2305.07927v1 ) ライセンス: Link先を確認	Chulun Zhou, Yunlong Liang, Fandong Meng, Jinan Xu, Jinsong Su and Jie Zhou	(参考訳) 多言語視覚言語(V&L)の事前学習は、様々なモダリティや言語にまたがる普遍表現の学習において顕著な進歩を遂げた。近年の成功にもかかわらず、多言語環境でのV&L事前訓練モデルのさらなる改善には依然として課題がある。特に、現在のV&L事前学習法は、機械翻訳を通じて英語中心のデータセットから生成される厳密な多言語画像テキストペアに大きく依存している。しかし、厳密に整合したデータセットの収集と翻訳のコストは通常、計り知れない。本稿では,より豊富な弱結合型多言語画像テキストペアを活用した正規化コントラスト言語間クロスモーダル(rc^3)事前学習を提案する。具体的には、テキスト関連性に応じて、弱整列型視覚テキスト入力の表現近接を制約する正規化言語間視覚テキストコントラスト学習目標を設計する。さらに、既存のV&L事前トレーニングアプローチは、主に関心の領域(ROI)機能またはパッチ埋め込みによる視覚的な入力を扱う。事前学習と下流マルチモーダルタスクのためのモデルに,2種類の視覚的特徴を柔軟に統合する。 6言語にまたがる下流5つのマルチモーダルタスクに関する広範囲な実験により,ゼロショット能力の強いコントラストモデルに対する提案手法の有効性が示された。 Multilingual vision-language (V&L) pre-training has achieved remarkable progress in learning universal representations across different modalities and languages. In spite of recent success, there still remain challenges limiting further improvements of V&L pre-trained models in multilingual settings. Particularly, current V&L pre-training methods rely heavily on strictly-aligned multilingual image-text pairs generated from English-centric datasets through machine translation. However, the cost of collecting and translating such strictly-aligned datasets is usually unbearable. In this paper, we propose Regularized Contrastive Cross-lingual Cross-modal (RC^3) pre-training, which further exploits more abundant weakly-aligned multilingual image-text pairs. Specifically, we design a regularized cross-lingual visio-textual contrastive learning objective that constrains the representation proximity of weakly-aligned visio-textual inputs according to textual relevance. Besides, existing V&L pre-training approaches mainly deal with visual inputs by either region-of-interest (ROI) features or patch embeddings. We flexibly integrate the two forms of visual features into our model for pre-training and downstream multi-modal tasks. Extensive experiments on 5 downstream multi-modal tasks across 6 languages demonstrate the effectiveness of our proposed method over competitive contrast models with stronger zero-shot capability.	翻訳日:2023-05-16 18:38:49 公開日:2023-05-13
# gt-rain challenge cvpr 2023 workshop ug$^{\textbf{2}}$+ track 3の2段階実画像レーディング手法 A Two-Stage Real Image Deraining Method for GT-RAIN Challenge CVPR 2023 Workshop UG$^{\textbf{2}}$+ Track 3 ( http://arxiv.org/abs/2305.07979v1 ) ライセンス: Link先を確認	Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan	(参考訳) 本稿では,CVPR 2023 UG$^{2}$+ Track 3におけるGT-Rain ChallengeのためのチームHUST\li VIEのソリューションについて紹介する。本研究では,雨のフレームから鮮明な画像を再構成する効率的な二段階フレームワークを提案する。まず,マルチフレームとアライメントされた雨枠の利点を生かした疑似gtを生成するために,低ランクビデオデライニング手法を用いる。第2に,大規模な実雨データセットを事前トレーニングし,擬似gtを微調整して画像復元をさらに改善するために,トランスフォーマによる単一画像レーダネットワーク uformer を実装した。さらに、視覚的快楽効果の面では、パイプラインの終了時に包括的な画像処理モジュールが使用される。我々の全体的なフレームワークは精巧に設計されており、最終テストフェーズで提供される豪雨シーケンスと霧のシーケンスの両方を処理できます。最後に、平均構造類似度(SSIM)で1位、平均ピーク信号-雑音比(PSNR)で2位にランクする。私たちのコードはhttps://github.com/yunguo224/ug2_derainingで利用可能です。 In this technical report, we briefly introduce the solution of our team HUST\li VIE for GT-Rain Challenge in CVPR 2023 UG$^{2}$+ Track 3. In this task, we propose an efficient two-stage framework to reconstruct a clear image from rainy frames. Firstly, a low-rank based video deraining method is utilized to generate pseudo GT, which fully takes the advantage of multi and aligned rainy frames. Secondly, a transformer-based single image deraining network Uformer is implemented to pre-train on large real rain dataset and then fine-tuned on pseudo GT to further improve image restoration. Moreover, in terms of visual pleasing effect, a comprehensive image processor module is utilized at the end of pipeline. Our overall framework is elaborately designed and able to handle both heavy rainy and foggy sequences provided in the final testing phase. Finally, we rank 1st on the average structural similarity (SSIM) and rank 2nd on the average peak signal-to-noise ratio (PSNR). Our code is available at https://github.com/yunguo224/UG2_Deraining.	翻訳日:2023-05-16 18:32:44 公開日:2023-05-13
# デュアルフォーミュレーションによる非負の低ランクテンソルコンプリートと画像・ビデオコンプリートへの応用 Nonnegative Low-Rank Tensor Completion via Dual Formulation with Applications to Image and Video Completion ( http://arxiv.org/abs/2305.07976v1 ) ライセンス: Link先を確認	Tanmay Kumar Sinha, Jayadev Naram, Pawan Kumar	(参考訳) テンソル補完問題に対する最近のアプローチは、しばしばデータの非負構造を見落としている。我々は,非負の低ランクテンソルを学習する問題を考えるとともに,双対性理論を用いて,そのようなテンソルの新しい因子分解を提案する。因子化は非負の制約を低ランクの制約から切り離す。その結果の問題は多様体上の最適化問題であり、それを解決するためにリーマン共役勾配の変種を提案する。提案アルゴリズムは,カラー画像インペインティング,映像補完,ハイパースペクトル画像補完など,様々なタスクにわたってテストを行う。実験の結果,提案手法は多くのテンソル補完アルゴリズムよりも優れていることがわかった。 Recent approaches to the tensor completion problem have often overlooked the nonnegative structure of the data. We consider the problem of learning a nonnegative low-rank tensor, and using duality theory, we propose a novel factorization of such tensors. The factorization decouples the nonnegative constraints from the low-rank constraints. The resulting problem is an optimization problem on manifolds, and we propose a variant of Riemannian conjugate gradients to solve it. We test the proposed algorithm across various tasks such as colour image inpainting, video completion, and hyperspectral image completion. Experimental results show that the proposed method outperforms many state-of-the-art tensor completion algorithms.	翻訳日:2023-05-16 18:32:24 公開日:2023-05-13
# 線形制約系の作用素解の単純解法 Simplicial techniques for operator solutions of linear constraint systems ( http://arxiv.org/abs/2305.07974v1 ) ライセンス: Link先を確認	Ho Yiu Chung, Cihan Okay, Igor Sikora	(参考訳) 線形制約系は、整数の群 $\ZZ_d$ 上の線型方程式によって指定される。作用素解は、量子文脈性や非局所ゲームの研究において重要な役割を果たす。本稿では、単純集合の理論を用いて線形系の作用素解を研究するための枠組みを開発する。このアプローチは、これらの群を空間の基本群と密接に関連する代数的不変量として同定することにより、解群に基づくよく知られた群論的アプローチを洗練する。この観点から、我々のアプローチは細胞複合体に基づく初期のホモトピー的アプローチとも関係している。フレームワーク内では、単純集合から来る線形系の新しいクラスを導入し、任意の線形系をその形式の1つに還元できることを示す。次に、群に関連する線形系を専門とする。群内の解を許容する任意の線型系に対して、$\ZZ_d$の解が認められるという予想に対して、重要な証拠を提供する。 A linear constraint system is specified by linear equations over the group $\ZZ_d$ of integers modulo $d$. Their operator solutions play an important role in the study of quantum contextuality and non-local games. In this paper, we use the theory of simplicial sets to develop a framework for studying operator solutions of linear systems. Our approach refines the well-known group-theoretical approach based on solution groups by identifying these groups as algebraic invariants closely related to the fundamental group of a space. In this respect, our approach also makes a connection to the earlier homotopical approach based on cell complexes. Within our framework, we introduce a new class of linear systems that come from simplicial sets and show that any linear system can be reduced to one of that form. Then we specialize in linear systems that are associated with groups. We provide significant evidence for a conjecture stating that for odd $d$ every linear system admitting a solution in a group admits a solution in $\ZZ_d$.	翻訳日:2023-05-16 18:32:13 公開日:2023-05-13
# 確率的セキュリティの計算コストについて On the Computational Cost of Stochastic Security ( http://arxiv.org/abs/2305.07973v1 ) ライセンス: Link先を確認	Noah A. Crum, Leanto Sunny, Pooya Ronagh, Raymond Laflamme, Radhakrishnan Balu, George Siopsis	(参考訳) 本稿では,Langevin Dynamicsの長期持続鎖モンテカルロシミュレーションにより,エネルギーベースモデル(EBM)による表現の質が向上するかどうかを考察する。本研究では,学習したebmを用いた拡散過程のモンテカルロシミュレーションを用いて,独立分類器ネットワークの逆ロバスト性やキャリブレーションスコアを改善する手法を提案する。本研究は, 連続エネルギーポテンシャルからギブズサンプリングを効率よく行うために, 量子・古典ハードウェアとソフトウェアを新たに実現し, モデルのキャリブレーションと対角ロバスト性を向上させることを目的として, ギブズサンプリングの計算予算の増大を図った。 We investigate whether long-run persistent chain Monte Carlo simulation of Langevin dynamics improves the quality of the representations achieved by energy-based models (EBM). We consider a scheme wherein Monte Carlo simulation of a diffusion process using a trained EBM is used to improve the adversarial robustness and the calibration score of an independent classifier network. Our results show that increasing the computational budget of Gibbs sampling in persistent contrastive divergence improves the calibration and adversarial robustness of the model, elucidating the practical merit of realizing new quantum and classical hardware and software for efficient Gibbs sampling from continuous energy potentials.	翻訳日:2023-05-16 18:31:56 公開日:2023-05-13
# Trillion Dollar Words: 新たな金融データセットとタスク&マーケット分析 Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis ( http://arxiv.org/abs/2305.07972v1 ) ライセンス: Link先を確認	Agam Shah and Suvan Paturi and Sudheer Chava	(参考訳) 連邦公開市場委員会(FOMC)による金融政策宣言は、金融市場リターンの主要な要因である。我々は、金融政策が金融市場に与える影響を理解するために、fomcスピーチ、会議分、記者会見の書き起こしの最大のトークン化および注釈付きデータセットを構築します。本研究では,ホーカッシュ・ドヴィッシュ分類の新たなタスクを開発し,提案するデータセット上での各種事前学習言語モデルのベンチマークを行った。最良業績モデル(RoBERTa-large)を用いて,FOMC文書公開日に対する金融政策スタンスを測定する。構築した指標を評価するため,金融市場,株式市場,マクロ経済指標への影響について検討する。私たちのデータセット、モデル、コードはcc by-nc 4.0ライセンスの下でhughingfaceとgithubで公開されている。 Monetary policy pronouncements by Federal Open Market Committee (FOMC) are a major driver of financial market returns. We construct the largest tokenized and annotated dataset of FOMC speeches, meeting minutes, and press conference transcripts in order to understand how monetary policy influences financial markets. In this study, we develop a novel task of hawkish-dovish classification and benchmark various pre-trained language models on the proposed dataset. Using the best-performing model (RoBERTa-large), we construct a measure of monetary policy stance for the FOMC document release days. To evaluate the constructed measure, we study its impact on the treasury market, stock market, and macroeconomic indicators. Our dataset, models, and code are publicly available on Huggingface and GitHub under CC BY-NC 4.0 license.	翻訳日:2023-05-16 18:31:44 公開日:2023-05-13
# 距離空間におけるグラフ埋め込みの厳密かつ高速な一般化誤差境界 Tight and fast generalization error bound of graph embedding in metric space ( http://arxiv.org/abs/2305.07971v1 ) ライセンス: Link先を確認	Atsushi Suzuki, Atsushi Nitanda, Taiji Suzuki, Jing Wang, Feng Tian, and Kenji Yamanishi	(参考訳) 近年の研究では、計量空間におけるグラフの構造を反映した頂点表現を得ることを目的として、非ユークリッド計量空間において有効かつ効率的なグラフ埋め込みが達成できることが実験的に示されている。具体的には、双曲空間へのグラフ埋め込みは、例えば自然言語、ソーシャルネットワーク、知識ベースなどの階層構造を持つグラフの埋め込みに実験的に成功している。しかし、近年の理論解析により、非ユークリッドグラフ埋め込みの一般化誤差はユークリッドグラフよりもかなり高い値を示しており、高い一般化誤差はデータの不完全性とノイズが学習性能に重大な影響を与えることを示している。これは、既存の境界が非ユークリッド距離空間におけるグラフ埋め込みの成功を実際の訓練データサイズで保証できないことを意味しており、非ユークリッドグラフ埋め込みの実際の問題への応用を防ぐことができる。本稿では、表現対の距離の関数集合としてモデルの局所ラデマッハ複雑性を評価することにより、グラフ埋め込みの一般化誤差の新たな上限を与える。我々の境界は、双曲空間を含む非ユークリッド距離空間におけるグラフ埋め込みのパフォーマンスが、既存の上界よりも優れていることを明確化する。具体的には、我々の新しい上界は距離空間の幾何半径$R$の多項式であり、最大で$O(\frac{1}{S})$で、$S$はトレーニングデータサイズである。我々のバウンダリは、既存のバウンダリよりもかなり強く、高速で$R$と$O(\frac{1}{\sqrt{S}})$に指数関数化できる。例における特定の計算により、非ユークリッド計量空間へのグラフ埋め込みは、既存の有界よりもはるかに少ない訓練データを持つユークリッド空間におけるグラフ埋め込みよりも優れていることが示される。 Recent studies have experimentally shown that we can achieve in non-Euclidean metric space effective and efficient graph embedding, which aims to obtain the vertices' representations reflecting the graph's structure in the metric space. Specifically, graph embedding in hyperbolic space has experimentally succeeded in embedding graphs with hierarchical-tree structure, e.g., data in natural languages, social networks, and knowledge bases. However, recent theoretical analyses have shown a much higher upper bound on non-Euclidean graph embedding's generalization error than Euclidean one's, where a high generalization error indicates that the incompleteness and noise in the data can significantly damage learning performance. It implies that the existing bound cannot guarantee the success of graph embedding in non-Euclidean metric space in a practical training data size, which can prevent non-Euclidean graph embedding's application in real problems. This paper provides a novel upper bound of graph embedding's generalization error by evaluating the local Rademacher complexity of the model as a function set of the distances of representation couples. Our bound clarifies that the performance of graph embedding in non-Euclidean metric space, including hyperbolic space, is better than the existing upper bounds suggest. Specifically, our new upper bound is polynomial in the metric space's geometric radius $R$ and can be $O(\frac{1}{S})$ at the fastest, where $S$ is the training data size. Our bound is significantly tighter and faster than the existing one, which can be exponential to $R$ and $O(\frac{1}{\sqrt{S}})$ at the fastest. Specific calculations on example cases show that graph embedding in non-Euclidean metric space can outperform that in Euclidean space with much smaller training data than the existing bound has suggested.	翻訳日:2023-05-16 18:31:31 公開日:2023-05-13
# 実験経済学を用いた大規模言語モデルにおける創発的ゴール様行動の調査 Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics ( http://arxiv.org/abs/2305.07970v1 ) ライセンス: Link先を確認	Steve Phelps and Yvan I. Russell	(参考訳) 本研究では,社会的ジレンマにおける協調的,競争的,利他的,利他的行動の自然言語記述を運用するための大規模言語モデル(llm)の能力,特にgpt-3.5について検討する。我々の焦点は、非ゼロサム相互作用の古典的な例である反復囚人のジレンマであるが、我々の広範な研究プログラムは、最後通算ゲーム、独裁ゲーム、公共財ゲームを含む様々な実験経済シナリオを含んでいる。実験では,様々なプロンプトを用いてllm生成エージェントをインスタンス化し,協調的および競争的スタンスを伝達した。そこで我々は,囚人のジレンマを繰り返すエージェントの協力レベルを評価し,パートナーの協力行動や離脱行動に対する反応を考慮に入れた。その結果、llmは利他主義と利己主義の自然言語記述をある程度適切な行動に翻訳できるが、条件付き相反性に基づく行動適応の限界が示された。障害者との協力の増大と協力者の協力の減少が観察されたパターンは、社会ジレンマにおける人間の行動に関する知識を一般化するLLMの能力の潜在的な制約を強調している。我々は,幅広い社会的ジレンマの中で,llm生成エージェントの創発的行動に寄与する要因について,研究コミュニティにさらなる検討を求め,モデルアーキテクチャ,トレーニングパラメータ,エージェント行動に対する様々なパートナー戦略の影響について検討する。 GPT-4のような先進的なLLMが利用可能になるにつれて、それらが類似した制限を示すか、より微妙な協調行動が可能かどうかを調査することが重要であり、最終的には人間の価値観や社会的規範に適合したAIシステムの開発を促進する。 In this study, we investigate the capacity of large language models (LLMs), specifically GPT-3.5, to operationalise natural language descriptions of cooperative, competitive, altruistic, and self-interested behavior in social dilemmas. Our focus is on the iterated Prisoner's Dilemma, a classic example of a non-zero-sum interaction, but our broader research program encompasses a range of experimental economics scenarios, including the ultimatum game, dictator game, and public goods game. Using a within-subject experimental design, we instantiated LLM-generated agents with various prompts that conveyed different cooperative and competitive stances. We then assessed the agents' level of cooperation in the iterated Prisoner's Dilemma, taking into account their responsiveness to the cooperative or defection actions of their partners. Our results provide evidence that LLMs can translate natural language descriptions of altruism and selfishness into appropriate behaviour to some extent, but exhibit limitations in adapting their behavior based on conditioned reciprocity. The observed pattern of increased cooperation with defectors and decreased cooperation with cooperators highlights potential constraints in the LLM's ability to generalize its knowledge about human behavior in social dilemmas. We call upon the research community to further explore the factors contributing to the emergent behavior of LLM-generated agents in a wider array of social dilemmas, examining the impact of model architecture, training parameters, and various partner strategies on agent behavior. As more advanced LLMs like GPT-4 become available, it is crucial to investigate whether they exhibit similar limitations or are capable of more nuanced cooperative behaviors, ultimately fostering the development of AI systems that better align with human values and social norms.	翻訳日:2023-05-16 18:30:59 公開日:2023-05-13
# GPT-Sentinel:人間とチャットGPT生成コンテンツを識別する GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content ( http://arxiv.org/abs/2305.07969v1 ) ライセンス: Link先を確認	Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Ramakrishnan	(参考訳) 本稿では,言語モデルを用いたChatGPT生成対人文テキスト検出手法を提案する。この目的のために、我々はまずOpenGPTTextという、ChatGPTを用いて生成されたリフレーズ付きコンテンツからなる前処理データセットを収集し、リリースした。次に、RoBERTa(Roustly Optimized BERT Pretraining Approach)とText-to-Text Transfer Transformer(T5)を用いて、テキスト分類のための2つの異なるモデルの設計、実装、訓練を行った。私たちのモデルは、さまざまなメトリクスで評価したように、テストデータセット上で97%以上の精度で、驚くべき結果を達成しました。さらに,人間の手書きテキストとChatGPT生成テキストの主な特徴を抽出し,識別する能力を示すための解釈可能性の検討を行った。本研究は,生成テキストの検出における言語モデルの有効利用に関する重要な知見を提供する。 This paper presents a novel approach for detecting ChatGPT-generated vs. human-written text using language models. To this end, we first collected and released a pre-processed dataset named OpenGPTText, which consists of rephrased content generated using ChatGPT. We then designed, implemented, and trained two different models for text classification, using Robustly Optimized BERT Pretraining Approach (RoBERTa) and Text-to-Text Transfer Transformer (T5), respectively. Our models achieved remarkable results, with an accuracy of over 97% on the test dataset, as evaluated through various metrics. Furthermore, we conducted an interpretability study to showcase our model's ability to extract and differentiate key features between human-written and ChatGPT-generated text. Our findings provide important insights into the effective use of language models to detect generated text.	翻訳日:2023-05-16 18:30:27 公開日:2023-05-13
# 量子ゼノダイナミクスによるポテンシャル中の量子粒子のテレポーテーション Teleportation of a quantum particle in a potential via quantum Zeno dynamics ( http://arxiv.org/abs/2305.07968v1 ) ライセンス: Link先を確認	Miguel A. Porras, Miguel Casado-\'Alvaro, and Isabel Gonzalo	(参考訳) 量子状態のテレポーテーションとは、絡み合いによって異なる現象である量子粒子のテレポーテーションの可能性について報告する。第一の意味から、粒子を(一定の不確実性を持って)あるポテンシャル井戸や障壁の平衡点から配置し、粒子が静止しているかどうかを頻繁に監視することで、理論上はテレポーテーションが可能である。この量子ゼノダイナミクスは加速度を阻害し、他のターニングポイントにおける古典的ターニングポイントからの消失と他のターニングポイントの出現を特徴とする。粒子は常に静止しており、2つの旋回点の間の経路には見つからず、移動時間を節約します。電子、陽子、その他の粒子のテレポーテーションの実現可能性について議論し、粒子が重くなるにつれてその非現実性が増加すると結論づける。 We report on the possibility of teleportation of a quantum particle, a distinctly different phenomenon from the teleportation of a quantum state through entanglement. With the first meaning, teleportation is theoretically possible by placing the particle initially at rest (with a certain uncertainty) out of any equilibrium point of a potential well or barrier and by frequently monitoring whether the particle remains at rest. This quantum Zeno dynamics inhibits acceleration, and features disappearance from the classical turning point and appearance in other turning point, if there is any other, with a probability that approaches unity by increasing the frequency of the measurements. This phenomenon has all the ingredients attributed in science fiction to teleportation: The particle is always at rest, cannot be found in the path between the two turning points, and saves travel time. We discuss the feasibility, in principle, of teleportation of electrons, protons and other particles, and conclude its increasing impracticability as the particle gets heavier.	翻訳日:2023-05-16 18:30:09 公開日:2023-05-13
# 構造化低ランクテンソル学習 Structured Low-Rank Tensor Learning ( http://arxiv.org/abs/2305.07967v1 ) ライセンス: Link先を確認	Jayadev Naram, Tanmay Kumar Sinha, Pawan Kumar	(参考訳) 構造的制約のある部分的な観測から低ランクテンソルを学習する問題を考察し、そのようなテンソルの新たな因子化を提案し、より単純な最適化問題を導いた。結果として生じる問題は多様体上の最適化問題である。この問題を解決するために,一階および二階リーマン最適化アルゴリズムを開発した。得られた問題の双対性ギャップを導出し,提案アルゴリズムの正しさを実験的に検証する。非負の制約とハンケルの制約に関するアルゴリズムを実証する。 We consider the problem of learning low-rank tensors from partial observations with structural constraints, and propose a novel factorization of such tensors, which leads to a simpler optimization problem. The resulting problem is an optimization problem on manifolds. We develop first-order and second-order Riemannian optimization algorithms to solve it. The duality gap for the resulting problem is derived, and we experimentally verify the correctness of the proposed algorithm. We demonstrate the algorithm on nonnegative constraints and Hankel constraints.	翻訳日:2023-05-16 18:29:50 公開日:2023-05-13
# ナノダイアモンド回転とNV中心スピンの高絡み合い状態の準備 Preparing highly entangled states of nanodiamond rotation and NV center spin ( http://arxiv.org/abs/2305.08008v1 ) ライセンス: Link先を確認	Wen-Liang Li, D. L. Zhou	(参考訳) nv(embedd nitrogen-vacancy)センターを備えたナノダイアモンドは、現在の技術でコヒーレントに操作できる実験システムの1つである。 nv中心電子スピンとナノダイヤモンドの機械的回転の絡み合いは、これらの微視的およびメソスコピックな動きを繋ぐ量子ネットワークを構築する上で重要な役割を果たす。本稿では,外部磁場を漸近的に上昇させることで,量子回転と電子スピンの高度に絡み合った状態を漸近的に生成するプロトコルを提案する。 A nanodiamond with an embedded nitrogen-vacancy (NV) center is one of the experimental systems that can be coherently manipulated within current technologies. Entanglement between NV center electron spin and mechanical rotation of the nanodiamond plays a fundamental role in building a quantum network connecting these microscopic and mesoscopic degrees of motions. Here we present a protocol to asymptotically prepare a highly entangled state of the quantum rotation and electron spin by adiabatically boosting the external magnetic field.	翻訳日:2023-05-16 18:22:41 公開日:2023-05-13
# over the safeguards: chatgptのセキュリティリスクを探求する Beyond the Safeguards: Exploring the Security Risks of ChatGPT ( http://arxiv.org/abs/2305.08005v1 ) ライセンス: Link先を確認	Erik Derner and Kristina Batisti\v{c}	(参考訳) ChatGPTのような大規模言語モデル(LLM)の人気が高まり、安全性、セキュリティリスク、倫理的影響に対する懸念が高まっている。本稿では,悪意のあるテキストやコード生成,プライベートデータ開示,不正サービス,情報収集,非倫理的コンテンツの生成など,ChatGPTに関連するさまざまなセキュリティリスクの概要について述べる。本稿では,ChatGPTのコンテンツフィルタの有効性を検証し,保護されている場合でもLLMに持続する倫理的影響とセキュリティリスクを実証し,これらの保護を回避できる可能性を探究する。セキュリティへの影響の質的な分析に基づいて、これらのリスクを軽減し、研究者、政策立案者、業界専門家にChatGPTのようなLLMがもたらす複雑なセキュリティ課題について通知する潜在的な戦略について議論する。本研究は, LLMの倫理的, セキュリティ的含意に関する継続的な議論に寄与し, この分野における継続的な研究の必要性を浮き彫りにしている。 The increasing popularity of large language models (LLMs) such as ChatGPT has led to growing concerns about their safety, security risks, and ethical implications. This paper aims to provide an overview of the different types of security risks associated with ChatGPT, including malicious text and code generation, private data disclosure, fraudulent services, information gathering, and producing unethical content. We present an empirical study examining the effectiveness of ChatGPT's content filters and explore potential ways to bypass these safeguards, demonstrating the ethical implications and security risks that persist in LLMs even when protections are in place. Based on a qualitative analysis of the security implications, we discuss potential strategies to mitigate these risks and inform researchers, policymakers, and industry professionals about the complex security challenges posed by LLMs like ChatGPT. This study contributes to the ongoing discussion on the ethical and security implications of LLMs, underscoring the need for continued research in this area.	翻訳日:2023-05-16 18:22:33 公開日:2023-05-13
# 混合状態に対する最適量子速度 Optimal quantum speed for mixed states ( http://arxiv.org/abs/2305.08004v1 ) ライセンス: Link先を確認	Ashraf Naderzadeh and Seyed Javad Akhtarshenas	(参考訳) 量子状態がいかに高速に進化できるかという問題を考える。 phys におけるユークリッド距離に基づく二乗速度の定義を用いる。 Rev. Reaserch, {\bf 2}, 033127 (2019)] では、時間非依存ハミルトニアンの下で一元的に進化した$d$次元システムの最適速度を得るための体系的な枠組みを提供する。同じ純度を持つ混合量子状態の組のうち、最適状態はその純度パラメータを用いて得られる。任意の$d$に対して、最適状態は二次対角線に対して対称な追加特性を持つ$X$-状態によって与えられることを示す。純度が最大混合状態$\Id/d$を少なくとも2/d^2$で純度を超える十分低い純度に対して、最適状態の非零対角エントリーは$\varrho_{1d}$であり、それぞれ最小固有値と最大固有値を持つ2つのエネルギー固有状態間の遷移振幅に対応する。しかし、より大きな純度の場合、他の二次径のエントリ$\varrho_{i,d-i+1}$を非零値とするかどうかは、相対エネルギーギャップ$\|E_{d-i+1}-E_{i}\|$に依存する。エネルギー基底に対するコヒーレンスと絡み合いの影響も検討され、最適状態においてはどちらの資源も純度の単調関数であり、量子進化を加速させ、量子速度の限界を小さくすることができる。以上の結果から, 進化速度は, 状態のコヒーレンスが担うが, 最高速では, 二次対角形メークロールに位置する外対角要素によって引き起こされるコヒーレンスのみを示す。 The question that how fast a quantum state can evolve is considered. Using the definition of squared speed based on the Euclidean distance given in [Phys. Rev. Reaserch, {\bf 2}, 033127 (2019)], we provide a systematic framework to obtain the optimal speed of a $d$-dimensional system evolved unitarily under a time-independent Hamiltonian. Among the set of mixed quantum states having the same purity, the optimal state is obtained in terms of its purity parameter. We show that for an arbitrary $d$, the optimal state is given by a $X$-state with an additional property of being symmetric with respect to the secondary diagonal. For sufficiently low purities for which the purity exceeds the purity of maximally mixed state $\Id/d$ by at most $2/d^2$, the only nonzero off-diagonal entry of the optimal state is $\varrho_{1d}$, corresponding to the transition amplitude between two energy eigenstates with minimum and maximum eigenvalues, respectively. For larger purities, however, whether or not the other secondary diameter entries $\varrho_{i,d-i+1}$ take nonzero values depends on their relative energy gaps $\|E_{d-i+1}-E_{i}\|$. The effects of coherence and entanglement, with respect to the energy basis, are also examined and find that for optimal states both resources are monotonic functions of purity, so they can causs speed up quantum evolution leading to a smaller quantum speed limit. Our results show that although the coherence of the states is responsible for the speed of evolution, for the fastest states only the coherence caused by some off-diagonal entries located on the secondary diagonal make role.	翻訳日:2023-05-16 18:22:16 公開日:2023-05-13
# 構造化データを用いた高速非同期確率勾配アルゴリズム Efficient Asynchronize Stochastic Gradient Algorithm with Structured Data ( http://arxiv.org/abs/2305.08001v1 ) ライセンス: Link先を確認	Zhao Song, Mingquan Ye	(参考訳) ディープラーニングは、その優れた一般化により、様々な分野で印象的な成功を収めた。しかしながら、多数のレイヤを持つニューラルネットワークを迅速にトレーニングすることは、これまでも難しい問題でした。既存の作業では、局所性に敏感なハッシュ技術や、空間分割上のデータ構造を利用して、各イテレーションのトレーニングコストを軽減する。本研究では、入力データポイントの観点から各イテレーションにおける計算の高速化を試みる。具体的には、トレーニングデータがKronecker構造のような特別な特性を持つ2層完全連結ニューラルネットワークの場合、各イテレーションはデータ次元のサブ線形時間で完了することができる。 Deep learning has achieved impressive success in a variety of fields because of its good generalization. However, it has been a challenging problem to quickly train a neural network with a large number of layers. The existing works utilize the locality-sensitive hashing technique or some data structures on space partitioning to alleviate the training cost in each iteration. In this work, we try accelerating the computations in each iteration from the perspective of input data points. Specifically, for a two-layer fully connected neural network, when the training data have some special properties, e.g., Kronecker structure, each iteration can be completed in sublinear time in the data dimension.	翻訳日:2023-05-16 18:21:40 公開日:2023-05-13
# 特徴適応を用いたDNN圧縮領域認識 DNN-Compressed Domain Visual Recognition with Feature Adaptation ( http://arxiv.org/abs/2305.08000v1 ) ライセンス: Link先を確認	Yingpeng Deng and Lina J. Karam	(参考訳) 学習に基づく画像圧縮は、最先端の変換ベースのコーデックと競合する性能を発揮する。これはJPEG-AIのような新しい学習ベースのビジュアル圧縮標準の開発を動機づけた。これらの新しい標準に対する特に関心は、人間と機械の両方をターゲットにした学習ベースの画像圧縮システムの開発である。本稿では,圧縮領域表現を用いて,圧縮領域内で直接視覚処理やコンピュータビジョンタスクを行う学習ベース圧縮方式について述べる。本研究では,ビットレートの異なる圧縮ドメイン潜在表現を用いて視覚認識を行うための,学習ベースの圧縮ドメイン分類フレームワークを採用する。本稿では,抽出されたチャネル情報の中で重要な特徴を適応的に強調・強化するために,軽量な注意モデルを統合する新しい特徴適応モジュールを提案する。また,事前訓練された画素領域重みを利用するための適応学習戦略を設計する。比較のために,提案手法を用いて得られた性能評価結果に加えて,画素領域内の圧縮・完全復号画像とオリジナル未圧縮画像を用いた性能評価結果も提示する。その結果,提案した圧縮領域分類モデルは,既存の圧縮領域分類モデルよりも明らかに優れており,完全復号化画像を用いて訓練された画素領域モデルと比較して,計算効率が向上することを示す。 Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned with learning-based compression schemes whose compressed-domain representations can be utilized to perform visual processing and computer vision tasks directly in the compressed domain. In our work, we adopt a learning-based compressed-domain classification framework for performing visual recognition using the compressed-domain latent representation at varying bit-rates. We propose a novel feature adaptation module integrating a lightweight attention model to adaptively emphasize and enhance the key features within the extracted channel-wise information. Also, we design an adaptation training strategy to utilize the pretrained pixel-domain weights. For comparison, in addition to the performance results that are obtained using our proposed latent-based compressed-domain method, we also present performance results using compressed but fully decoded images in the pixel domain as well as original uncompressed images. The obtained performance results show that our proposed compressed-domain classification model can distinctly outperform the existing compressed-domain classification models, and that it can also yield similar accuracy results with a much higher computational efficiency as compared to the pixel-domain models that are trained using fully decoded images.	翻訳日:2023-05-16 18:21:29 公開日:2023-05-13
# ディープニューラルネットワークのための逐次アフィン学習 Successive Affine Learning for Deep Neural Networks ( http://arxiv.org/abs/2305.07996v1 ) ライセンス: Link先を確認	Yuesheng Xu	(参考訳) 本稿では,深層ニューラルネットワーク構築のための逐次アフィン学習(SAL)モデルを提案する。伝統的に、DNNは非凸最適化問題の解決によって構築される。このような問題を非凸性や多数の層を持つため数値的に解くことはしばしば困難である。本論文の著者らにより,人間教育システムに触発されたこの課題に対処するため,近年,多段階深層学習(MGDL)モデルが始められた。 MGDLモデルはいくつかのグレードでDNNを学習し、それぞれが少数の層からなる浅いDNNを構築する。 MGDLモデルは、まだいくつかの非凸最適化問題を解く必要がある。提案したSALモデルはMGDLモデルから変異する。 DNNの各層がアフィン写像とアクティベーション関数から構成されていることに注意し、活性化関数を重み行列と現在の層のバイアスベクトルのみを含む二次凸最適化問題を解くことでアフィン写像を学習することを提案する。関数近似の文脈において、与えられた関数に対して、SALモデルはDNNの形式で適応基底関数を持つ関数の直交展開を生成する。 SALモデルにより生成された直交系に対して,ピタゴラスのアイデンティティとParsevalのアイデンティティを確立する。さらに、SAL過程の収束定理は、有限個のグレードの後に終了するか、その最適誤差関数のノルムが、階数数が無限大に増加するにつれて、極限まで厳密に減少することを意味する。さらに,提案したSALモデルが従来のディープラーニングモデルよりも優れていることを示す概念実証の数値例を示す。 This paper introduces a successive affine learning (SAL) model for constructing deep neural networks (DNNs). Traditionally, a DNN is built by solving a non-convex optimization problem. It is often challenging to solve such a problem numerically due to its non-convexity and having a large number of layers. To address this challenge, inspired by the human education system, the multi-grade deep learning (MGDL) model was recently initiated by the author of this paper. The MGDL model learns a DNN in several grades, in each of which one constructs a shallow DNN consisting of a small number of layers. The MGDL model still requires solving several non-convex optimization problems. The proposed SAL model mutates from the MGDL model. Noting that each layer of a DNN consists of an affine map followed by an activation function, we propose to learn the affine map by solving a quadratic/convex optimization problem which involves the activation function only {\it after} the weight matrix and the bias vector for the current layer have been trained. In the context of function approximation, for a given function the SAL model generates an orthogonal expansion of the function with adaptive basis functions in the form of DNNs. We establish the Pythagorean identity and the Parseval identity for the orthogonal system generated by the SAL model. Moreover, we provide a convergence theorem of the SAL process in the sense that either it terminates after a finite number of grades or the norms of its optimal error functions strictly decrease to a limit as the grade number increases to infinity. Furthermore, we present numerical examples of proof of concept which demonstrate that the proposed SAL model significantly outperforms the traditional deep learning model.	翻訳日:2023-05-16 18:21:10 公開日:2023-05-13
# 多言語前ファクトチェッククレーム検索 Multilingual Previously Fact-Checked Claim Retrieval ( http://arxiv.org/abs/2305.07991v1 ) ライセンス: Link先を確認	Mat\'u\v{s} Pikuliak and Ivan Srba and Robert Moro and Timo Hromadka and Timotej Smolen and Martin Melisek and Ivan Vykopal and Jakub Simko and Juraj Podrouzek and Maria Bielikova	(参考訳) ファクトチェックは、事実チェックが必要な大量のオンラインコンテンツによって、しばしば妨げられる。 NLPは、調査中のコンテンツに関連する既存の事実チェックを取得することで、それらを支援することができる。本稿では,以前に事実確認されたクレーム検索のための多言語データセットであるMultiClaimを紹介する。ソーシャルメディアから27の言語で28kの投稿、プロのファクトチェック担当者が書いた39の言語で206kのファクトチェック、そしてこれら2つのグループ間の31kの接続を集めました。これは、これまででもっとも広範囲で言語的に多様なデータセットである。教師なしの手法がデータセットとその様々な次元にどう影響するかを評価した。このような多種多様なデータセットの評価には複雑さがあり,結果の解釈に先立って適切な対応が必要となる。また,教師なしの微調整手法も評価し,教師なし手法を大幅に改善した。 Fact-checkers are often hampered by the sheer amount of online content that needs to be fact-checked. NLP can help them by retrieving already existing fact-checks relevant to the content being investigated. This paper introduces a new multilingual dataset -- MultiClaim -- for previously fact-checked claim retrieval. We collected 28k posts in 27 languages from social media, 206k fact-checks in 39 languages written by professional fact-checkers, as well as 31k connections between these two groups. This is the most extensive and the most linguistically diverse dataset of this kind to date. We evaluated how different unsupervised methods fare on this dataset and its various dimensions. We show that evaluating such a diverse dataset has its complexities and proper care needs to be taken before interpreting the results. We also evaluated a supervised fine-tuning approach, improving upon the unsupervised method significantly.	翻訳日:2023-05-16 18:20:45 公開日:2023-05-13
# 会議要約のための自己監督文圧縮 Self-Supervised Sentence Compression for Meeting Summarization ( http://arxiv.org/abs/2305.07988v1 ) ライセンス: Link先を確認	Haochen Tan, Han Wu, Wei Shao, Xinyun Zhang, Mingjie Zhan, Zhaohui Hou, Ding Liang, Linqi Song	(参考訳) 従来の要約モデルは、通常、ミーティングコーパスは長い会話を持つ複数のパーティを伴い、冗長で自明なコンテンツが詰め込まれているため、文書の要約において重要な情報をキャプチャできないことが多い。この問題に対処するために,svbは,スライディング・ウィンドウ対話の復元と\textbf{s}coring,チャネルワイズ重要度スコア \textbf{v}oting,相対位置的 \textbf{b}ucketingの3つのプロセスを通じて,冗長性を保ちながら,冗長性を‘圧縮’する,効果的かつ効率的な要約フレームワークである。具体的には、自己監督パラダイムの下で、スライディングウィンドウスコアは、複数のビューから各トークンの重要性を評価することを目的としている。そして、これらの評価はチャンネルワイド投票によって集計される。高評価のトークンは有能な情報と見なされ、‘textit{anchors} とラベル付けされる。最後に、言語モデルに対して許容される長さに長大な入力を調整するために、異なる粒度で他の無関係な内容を圧縮しながらアンカーを保持する相対的な位置バケットアルゴリズムを実行する。大規模事前学習やエキスパートレベルアノテートツールがなければ,提案手法は従来の最先端手法に匹敵する。本手法の有効性を証明するために,膨大な評価と分析を行った。 The conventional summarization model often fails to capture critical information in meeting transcripts, as meeting corpus usually involves multiple parties with lengthy conversations and is stuffed with redundant and trivial content. To tackle this problem, we present SVB, an effective and efficient framework for meeting summarization that `compress' the redundancy while preserving important content via three processes: sliding-window dialogue restoration and \textbf{S}coring, channel-wise importance score \textbf{V}oting, and relative positional \textbf{B}ucketing. Specifically, under the self-supervised paradigm, the sliding-window scoring aims to rate the importance of each token from multiple views. Then these ratings are aggregated by channel-wise voting. Tokens with high ratings will be regarded as salient information and labeled as \textit{anchors}. Finally, to tailor the lengthy input to an acceptable length for the language model, the relative positional bucketing algorithm is performed to retain the anchors while compressing other irrelevant contents in different granularities. Without large-scale pre-training or expert-grade annotating tools, our proposed method outperforms previous state-of-the-art approaches. A vast amount of evaluations and analyses are conducted to prove the effectiveness of our method.	翻訳日:2023-05-16 18:20:31 公開日:2023-05-13
# SCENE: 否定的事例への外挿のための自己ラベル型対策 SCENE: Self-Labeled Counterfactuals for Extrapolating to Negative Examples ( http://arxiv.org/abs/2305.07984v1 ) ライセンス: Link先を確認	Deqing Fu, Ameya Godbole, Robin Jia	(参考訳) 否定を検知する(非包含関係、未解決問題、虚偽主張など)ことは、多くの自然言語理解タスクにおいて重要かつ困難な側面である。手動による挑戦的なネガティブな例の収集は、モデルの検出に役立つが、コストとドメイン固有性の両方がある。本研究では,課題となる否定的な例を検出するモデルの能力を大幅に向上させるトレーニングデータの合成手法であるscene(expolating to negative examples)を提案する。既存のラベルの新しい例を合成する標準的なデータ拡張とは対照的に、SCENEは正の例のみから負の例をゼロショットに合成することができる。正の例が与えられた場合、SCENEはマスク満載モデルでそれを摂動し、その結果の例が自己学習ヒューリスティックに基づいて負かどうかを決定する。回答可能なトレーニング例のみを使用することで、studio 2.0でトレーニングされたモデルと比較して、studio 2.0のパフォーマンスギャップの69.6%をクローズすることができる。また,本手法は,文の包含度を認識してブール質問応答に拡張し,SQuADからACE-whQAへの一般化を改善する。 Detecting negatives (such as non-entailment relationships, unanswerable questions, and false claims) is an important and challenging aspect of many natural language understanding tasks. Though manually collecting challenging negative examples can help models detect them, it is both costly and domain-specific. In this work, we propose Self-labeled Counterfactuals for Extrapolating to Negative Examples (SCENE), an automatic method for synthesizing training data that greatly improves models' ability to detect challenging negative examples. In contrast with standard data augmentation, which synthesizes new examples for existing labels, SCENE can synthesize negative examples zero-shot from only positive ones. Given a positive example, SCENE perturbs it with a mask infilling model, then determines whether the resulting example is negative based on a self-training heuristic. With access to only answerable training examples, SCENE can close 69.6% of the performance gap on SQuAD 2.0, a dataset where half of the evaluation examples are unanswerable, compared to a model trained on SQuAD 2.0. Our method also extends to boolean question answering and recognizing textual entailment, and improves generalization from SQuAD to ACE-whQA, an out-of-domain extractive QA benchmark.	翻訳日:2023-05-16 18:20:04 公開日:2023-05-13
# ゼロショットFactual Error Correction Zero-shot Faithful Factual Error Correction ( http://arxiv.org/abs/2305.07982v1 ) ライセンス: Link先を確認	Kung-Hsiang Huang, Hou Pong Chan, Heng Ji	(参考訳) 事実的誤りを忠実に訂正することは、テキスト的知識基盤の完全性を維持し、シーケンスからシーケンスへのモデルの幻覚を防止するために重要である。人間が事実の誤りを識別し、訂正する能力に基づいて、入力クレームに関する質問を定式化し、与えられた証拠の正しい回答を求め、その証拠と整合性に基づいて各補正の忠実さを評価するゼロショットフレームワークを提案する。私たちのゼロショットフレームワークは、FEVERとSciFactデータセットの実験で示されたように、完全に教師されたアプローチよりも優れています。さらに重要なことに、フレームワークの分解性は本質的に解釈可能性を提供します。さらに,事実的誤り訂正を評価するのに最も適した指標を明らかにするために,一般的に使用される指標と人間の判断との相関を,知性と忠実性に関する3つの異なる次元で分析する。 Faithfully correcting factual errors is critical for maintaining the integrity of textual knowledge bases and preventing hallucinations in sequence-to-sequence models. Drawing on humans' ability to identify and correct factual errors, we present a zero-shot framework that formulates questions about input claims, looks for correct answers in the given evidence, and assesses the faithfulness of each correction based on its consistency with the evidence. Our zero-shot framework outperforms fully-supervised approaches, as demonstrated by experiments on the FEVER and SciFact datasets, where our outputs are shown to be more faithful. More importantly, the decomposability nature of our framework inherently provides interpretability. Additionally, to reveal the most suitable metrics for evaluating factual error corrections, we analyze the correlation between commonly used metrics with human judgments in terms of three different dimensions regarding intelligibility and faithfulness.	翻訳日:2023-05-16 18:19:40 公開日:2023-05-13
# 低次元多様体上の超音速重力空力 Grasping Extreme Aerodynamics on a Low-Dimensional Manifold ( http://arxiv.org/abs/2305.08024v1 ) ライセンス: Link先を確認	Kai Fukami and Kunihiko Taira	(参考訳) 現代の航空車両は輸送、防衛、監視、救助など幅広い活動を行っている。これらの航空機は穏やかな条件で飛行できるが、都市キャニオンや山岳地帯、船の覚醒などに見られる悪質な環境での運用を避けることができる。より小型の航空機は特にこのようなガスト障害を起こしやすい。地球温暖化により極端に天候が頻繁になり、特に小型の航空機は大規模な大気障害に遭遇し、安定した飛行を管理することが期待されている。しかし、極端に渦巻くガストが飛行体に与える影響を記述できる基礎は事実上存在しない。この難しい問題を解くために、ガスティ条件の翼が遭遇する巨大なパラメータ空間が存在する。渦巻と翼の相互作用は、ガストパラメータの組合せごとに複雑で異なるように見えるが、この研究では、極端空気力学の背後にある基礎物理学は、従来の予想よりもはるかに単純で低ランクであることが示されている。時間およびパラメータ空間上の非線形渦流場は、元の高次元物理学の本質を保ちながら、リフト誘導オートエンコーダを持つ3変数のみに圧縮できることが明らかとなった。極端な空気力学的流れは、機械学習によって低次元多様体に最適圧縮され、適切な座標の同定が解析、モデリング、極端に非定常なガスティ流れの制御を促進することを示唆する。本研究は、伝統的に飛行不能と考えられる大気条件下での次世代小型航空機の安定飛行を支援するものである。 Modern air vehicles perform a wide range of operations, including transportation, defense, surveillance, and rescue. These aircraft can fly in calm conditions but avoid operations in gusty environments, which are seen in urban canyons, over mountainous terrains, and in ship wakes. Smaller aircraft are especially prone to such gust disturbances. With extreme weather becoming ever more frequent due to global warming, it is anticipated that aircraft, especially those that are smaller in size, encounter large-scale atmospheric disturbances and still be expected to manage stable flight. However, there exists virtually no foundation to describe the influence of extreme vortical gusts on flying bodies. To compound on this difficult problem, there is an enormous parameter space for gusty conditions wings encounter. While the interaction between the vortical gusts and wings is seemingly complex and different for each combination of gust parameters, we show in this study that the fundamental physics behind extreme aerodynamics is far simpler and low-rank than traditionally expected. It is revealed that the nonlinear vortical flow field over time and parameter space can be compressed to only three variables with a lift-augmented autoencoder while holding the essence of the original high-dimensional physics. Extreme aerodynamic flows can be optimally compressed through machine learning into a low-dimensional manifold, implying that the identification of appropriate coordinates facilitates analyses, modeling, and control of extremely unsteady gusty flows. The present findings support the stable flight of next-generation small air vehicles in atmosphere conditions traditionally considered unflyable.	翻訳日:2023-05-16 18:14:15 公開日:2023-05-13
# ヒント: 時間的ニューラルネットワークのためのトポロジカルに重要な経路サンプリング TIPS: Topologically Important Path Sampling for Anytime Neural Networks ( http://arxiv.org/abs/2305.08021v1 ) ライセンス: Link先を確認	Guihong Li, Kartikeya Bhardwaj, Yuedong Yang, Radu Marculescu	(参考訳) anytime neural network(anytimenns)は、さまざまなハードウェアリソース制約下で実行時にモデルの複雑さを適応的に調整するための有望なソリューションである。しかし、手動設計のAnytimeNNはデザイナの事前経験に偏りがあり、したがって準最適ソリューションを提供する。既存の手作りアプローチの限界に対処するために、我々は最初にanytimennsのトレーニングプロセスを離散時間マルコフ連鎖(dtmc)としてモデル化し、anytimennsのトレーニングに最も寄与する経路を特定するためにそれを使用する。この新たなDTMCに基づく分析に基づいて,様々なハードウェア制約下でAnytimeNNを自動設計するフレームワークであるTIPSを提案する。実験の結果,TIPSはAnytimeNNの収束率とテスト精度を向上させることができることがわかった。既存のAnytimeNNのアプローチと比較して、TIPSは複数のデータセットで精度を2%-6.6%向上し、SOTAの精度-FLOPのトレードオフを達成する。 Anytime neural networks (AnytimeNNs) are a promising solution to adaptively adjust the model complexity at runtime under various hardware resource constraints. However, the manually-designed AnytimeNNs are biased by designers' prior experience and thus provide sub-optimal solutions. To address the limitations of existing hand-crafted approaches, we first model the training process of AnytimeNNs as a discrete-time Markov chain (DTMC) and use it to identify the paths that contribute the most to the training of AnytimeNNs. Based on this new DTMC-based analysis, we further propose TIPS, a framework to automatically design AnytimeNNs under various hardware constraints. Our experimental results show that TIPS can improve the convergence rate and test accuracy of AnytimeNNs. Compared to the existing AnytimeNNs approaches, TIPS improves the accuracy by 2%-6.6% on multiple datasets and achieves SOTA accuracy-FLOPs tradeoffs.	翻訳日:2023-05-16 18:13:50 公開日:2023-05-13
# DRew:遅延で動的にリワイヤされたメッセージパッシング DRew: Dynamically Rewired Message Passing with Delay ( http://arxiv.org/abs/2305.08018v1 ) ライセンス: Link先を確認	Benjamin Gutteridge, Xiaowen Dong, Michael Bronstein, Francesco Di Giovanni	(参考訳) メッセージパッシングニューラルネットワーク(mpnn)は、長距離インタラクションに依存するタスクのパフォーマンス低下を引き起こす過剰スワッシング現象に苦しむことが示されている。これは主に、ノードの直近の近傍でローカルにのみ発生するメッセージパッシングに起因している。グラフを'より連結'しようとするアプローチをリライトすることは、長距離タスクに適していると思われるが、遠いノードをすべての層で瞬時に通信させるため、グラフ上の距離によって得られる帰納バイアスを失うことが多い。本稿では,いずれのmpnnアーキテクチャにも適用可能な,グラフの段階的高密度化を保証するためのレイヤ依存リワイリングを実現するフレームワークを提案する。また,各層と相互距離に依存するノード間の接続をスキップする遅延機構を提案する。提案手法を複数の長距離タスクで検証し,グラフトランスフォーマーやマルチホップmpnnよりも優れていることを示す。 Message passing neural networks (MPNNs) have been shown to suffer from the phenomenon of over-squashing that causes poor performance for tasks relying on long-range interactions. This can be largely attributed to message passing only occurring locally, over a node's immediate neighbours. Rewiring approaches attempting to make graphs `more connected', and supposedly better suited to long-range tasks, often lose the inductive bias provided by distance on the graph since they make distant nodes communicate instantly at every layer. In this paper we propose a framework, applicable to any MPNN architecture, that performs a layer-dependent rewiring to ensure gradual densification of the graph. We also propose a delay mechanism that permits skip connections between nodes depending on the layer and their mutual distance. We validate our approach on several long-range tasks and show that it outperforms graph Transformers and multi-hop MPNNs.	翻訳日:2023-05-16 18:13:32 公開日:2023-05-13
# CheXDragonのトレーニング方法:新しいタスクや医療システムへの移行のための胸部X線モデルのトレーニング How to Train Your CheXDragon: Training Chest X-Ray Models for Transfer to Novel Tasks and Healthcare Systems ( http://arxiv.org/abs/2305.08017v1 ) ライセンス: Link先を確認	Cara Van Uden and Jeremy Irvin and Mars Huang and Nathan Dean and Jason Carr and Andrew Ng and Curtis Langlotz	(参考訳) 自己教師付き学習(SSL)は、機械学習モデルのラベルの効率的なトレーニングを可能にする。これは医療画像などの領域において必須であり、ラベルは費用がかかり、治すのに時間がかかります。しかし、異なる医療システムや新しいタスクにモデルを転送するための最も効果的な教師付きまたはSSL戦略はよく理解されていない。本研究では,医療画像(ケストX線)とテキスト(放射線学報告)のマルチモーダルデータセットを用いて,教師付きおよび自己指導型事前訓練戦略を体系的に実験した。次に、様々なタスクセットを持つ2つの外部機関のデータによる性能評価を行う。さらに,これらのモデルを新しいタスクや医療システムに効果的に適用するために,異なるトランスファー学習戦略を実験する。我々の経験的結果は、マルチモーダルSSLは、新しい医療システムやタスクにおけるパフォーマンスにおいて、完全な監視で事前訓練されたモデルに匹敵する、一過性のSSLよりも大幅に向上していることを示唆している。マルチモーダルドメイン適応型事前学習(DAPT)、線形探索(LP-FT)、および両手法の組み合わせにより、新しいデータセットとタスクにさらに適応したモデルによるさらなる性能向上を示す。これらの追加がすべて実現可能でないシナリオで使用する代替モデルを提案する。本研究は,新しい医療システムと新しい課題に対する医用画像解釈モデルの一般化に関するガイダンスを提供する。 Self-supervised learning (SSL) enables label efficient training for machine learning models. This is essential for domains such as medical imaging, where labels are costly and time-consuming to curate. However, the most effective supervised or SSL strategy for transferring models to different healthcare systems or novel tasks is not well understood. In this work, we systematically experiment with a variety of supervised and self-supervised pretraining strategies using multimodal datasets of medical images (chest X-rays) and text (radiology reports). We then evaluate their performance on data from two external institutions with diverse sets of tasks. In addition, we experiment with different transfer learning strategies to effectively adapt these pretrained models to new tasks and healthcare systems. Our empirical results suggest that multimodal SSL gives substantial gains over unimodal SSL in performance across new healthcare systems and tasks, comparable to models pretrained with full supervision. We demonstrate additional performance gains with models further adapted to the new dataset and task, using multimodal domain-adaptive pretraining (DAPT), linear probing then finetuning (LP-FT), and both methods combined. We offer suggestions for alternative models to use in scenarios where not all of these additions are feasible. Our results provide guidance for improving the generalization of medical image interpretation models to new healthcare systems and novel tasks.	翻訳日:2023-05-16 18:13:02 公開日:2023-05-13
# 軽量オールコンベネト・トランスファー学習による表面emgに基づくセッション間/サブジェクション認識 Surface EMG-Based Inter-Session/Inter-Subject Gesture Recognition by Leveraging Lightweight All-ConvNet and Transfer Learning ( http://arxiv.org/abs/2305.08014v1 ) ライセンス: Link先を確認	Md. Rabiul Islam, Daniel Massicotte, Philippe Y. Massicotte, and Wei-Ping Zhu	(参考訳) 低解像度のHD-sEMG画像を用いたジェスチャー認識は、より流動的で自然な筋肉-コンピュータインターフェースを開発するための新たな道を開く。しかし、セッション間およびサブジェクト間シナリオ間のデータ変動は大きな課題となる。既存のアプローチでは、非常に大きく複雑なConvNetまたは2SRNNベースのドメイン適応手法を使用して、これらのセッション間およびオブジェクト間データのばらつきに起因する分散シフトを近似した。したがって、これらの方法は、何百万ものトレーニングパラメータと、事前トレーニングと適応段階の両方で、トレーニング済みおよびターゲットドメインデータセットを学習する必要がある。その結果、リアルタイムアプリケーションへのデプロイには、ハイエンドのリソースバウンドと計算コストが非常にかかる。本稿では,この問題を解決するために,軽量なall-convnet and transfer learning(tl)を活用した軽量なall-convnet+tlモデルを提案する。 all-convnet+tlモデルは畳み込み層のみで構成されており、セッション間およびサブジェクト間データ可変性によって引き起こされる分散シフトに対処するための不変および判別表現を学習するための単純かつ効率的なフレームワークである。 4つのデータセットに対する実験により,提案手法は,既存の手法よりも大きなマージンで優れており,セッション間およびオブジェクト間シナリオにおける最先端の結果が得られ,セッション内ジェスチャ認識において同等あるいは競合的に実行されることを示した。これらのパフォーマンスギャップは、少数のデータ(例えば単一のトライアル)がターゲットドメインで利用可能になったときにさらに増加する。これらの顕著な実験結果は、現在の最先端モデルが、sEMGベースのセッション間およびオブジェクト間ジェスチャー認識タスクに対して過度にパラメータ化されていることを示す。 Gesture recognition using low-resolution instantaneous HD-sEMG images opens up new avenues for the development of more fluid and natural muscle-computer interfaces. However, the data variability between inter-session and inter-subject scenarios presents a great challenge. The existing approaches employed very large and complex deep ConvNet or 2SRNN-based domain adaptation methods to approximate the distribution shift caused by these inter-session and inter-subject data variability. Hence, these methods also require learning over millions of training parameters and a large pre-trained and target domain dataset in both the pre-training and adaptation stages. As a result, it makes high-end resource-bounded and computationally very expensive for deployment in real-time applications. To overcome this problem, we propose a lightweight All-ConvNet+TL model that leverages lightweight All-ConvNet and transfer learning (TL) for the enhancement of inter-session and inter-subject gesture recognition performance. The All-ConvNet+TL model consists solely of convolutional layers, a simple yet efficient framework for learning invariant and discriminative representations to address the distribution shifts caused by inter-session and inter-subject data variability. Experiments on four datasets demonstrate that our proposed methods outperform the most complex existing approaches by a large margin and achieve state-of-the-art results on inter-session and inter-subject scenarios and perform on par or competitively on intra-session gesture recognition. These performance gaps increase even more when a tiny amount (e.g., a single trial) of data is available on the target domain for adaptation. These outstanding experimental results provide evidence that the current state-of-the-art models may be overparameterized for sEMG-based inter-session and inter-subject gesture recognition tasks.	翻訳日:2023-05-16 18:12:02 公開日:2023-05-13
# 損失圧縮によるディープニューラルネットワークの情報ボトルネック解析 Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression ( http://arxiv.org/abs/2305.08013v1 ) ライセンス: Link先を確認	Ivan Butakov, Aleksander Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov, Kirill Andreev	(参考訳) Information Bottleneck(IB)原則は、ディープニューラルネットワーク(DNN)のトレーニングプロセスを分析するための情報理論フレームワークを提供する。その本質は、2つの相互情報(MI)値のダイナミクスを追跡することである。1つは隠れた層とクラスラベルの間のもので、もう1つは隠れた層とDNN入力の間のものである。 Shwartz-Ziv と Tishby (2017) の仮説によれば、トレーニングプロセスは、フィッティングと圧縮の2つの異なるフェーズで構成されている。後者のフェーズは、DNNによる優れた一般化性能を考慮に入れていると考えられている。高次元ランダムベクトル間のmi推定の難しい性質から、この仮説はおもちゃのnnや量子化nnやドロップアウトnnといった特定のタイプのnnに対してのみ検証されている。本稿では,一般NNのICB解析を行うための包括的フレームワークを提案する。提案手法はGoldfeld et al. (2019) によって提案された確率的NN法を利用しており、高次元性に関連する障害を克服するための圧縮ステップを取り入れている。言い換えれば、高次元ランダムベクトルの圧縮表現の間のmiを推定する。提案手法は理論的および実用的正当性の両方で支持される。特に,事前定義されたmi値を用いた合成実験により推定器の精度を示す。最後に,MI力学の新たな特徴を明らかにする畳み込み DNN を用いて IB 解析を行う。 The Information Bottleneck (IB) principle offers an information-theoretic framework for analyzing the training process of deep neural networks (DNNs). Its essence lies in tracking the dynamics of two mutual information (MI) values: one between the hidden layer and the class label, and the other between the hidden layer and the DNN input. According to the hypothesis put forth by Shwartz-Ziv and Tishby (2017), the training process consists of two distinct phases: fitting and compression. The latter phase is believed to account for the good generalization performance exhibited by DNNs. Due to the challenging nature of estimating MI between high-dimensional random vectors, this hypothesis has only been verified for toy NNs or specific types of NNs, such as quantized NNs and dropout NNs. In this paper, we introduce a comprehensive framework for conducting IB analysis of general NNs. Our approach leverages the stochastic NN method proposed by Goldfeld et al. (2019) and incorporates a compression step to overcome the obstacles associated with high dimensionality. In other words, we estimate the MI between the compressed representations of high-dimensional random vectors. The proposed method is supported by both theoretical and practical justifications. Notably, we demonstrate the accuracy of our estimator through synthetic experiments featuring predefined MI values. Finally, we perform IB analysis on a close-to-real-scale convolutional DNN, which reveals new features of the MI dynamics.	翻訳日:2023-05-16 18:11:30 公開日:2023-05-13
# スパイクニューラルネットワークの量子化 Quantization in Spiking Neural Networks ( http://arxiv.org/abs/2305.08012v1 ) ライセンス: Link先を確認	Bernhard A. Moser and Michael Lunglmayr	(参考訳) スパイキングニューラルネットワーク(SNN)では、各ノードで重み付きディラックパルスの入力シーケンスをスパイク集約としきい値の閾値に基づく漏れ積分(LIF)ニューロンモデルにより重み付きディラックパルスの出力シーケンスに変換する。この写像は量子化作用素として理解でき、アレクセイヴィチノルムを用いて量子化誤差に対応する公式を述べる。この分析は LIF モデルにおける再初期化の再考に影響を及ぼし、モジュロベースのリセット変種として 'reset-to-mod' を提案する。 In spiking neural networks (SNN), at each node, an incoming sequence of weighted Dirac pulses is converted into an output sequence of weighted Dirac pulses by a leaky-integrate-and-fire (LIF) neuron model based on spike aggregation and thresholding. We show that this mapping can be understood as a quantization operator and state a corresponding formula for the quantization error by means of the Alexiewicz norm. This analysis has implications for rethinking re-initialization in the LIF model, leading to the proposal of 'reset-to-mod' as a modulo-based reset variant.	翻訳日:2023-05-16 18:11:06 公開日:2023-05-13
# ProKnow:メンタルヘルス診断支援のための安全・説明可能な質問生成のためのプロセス知識 ProKnow: Process Knowledge for Safety Constrained and Explainable Question Generation for Mental Health Diagnostic Assistance ( http://arxiv.org/abs/2305.08010v1 ) ライセンス: Link先を確認	Kaushik Roy, Manas Gaur, Misagh Soltani, Vipula Rawte, Ashwin Kalyan, Amit Sheth	(参考訳) 現在のバーチャルメンタルヘルスアシスタント(vmhas)はカウンセリングと示唆的なケアを提供する。彼らは安全性と専門的な臨床プロセス知識の訓練が不足しているため、患者の診断支援を控えている。本研究では,Proknowをエビデンスに基づくガイドラインやドメインの専門家に対する概念理解のカテゴリにマップする情報集合として定義する。また,医療従事者が使用する安全制約やプロノウハウによって誘導される,新たな診断会話データセットも導入する。患者からの診断情報を対話的に収集する自然言語質問生成法(NLG)を開発した。このデータセットで最先端の大規模言語モデル(LM)を使用することの限界を実証する。我々のアルゴリズムは、安全性、知識獲得、説明可能性を明確にモデル化することでプロセスの知識をモデル化する。 ProKnowガイド法で拡張したLMは、うつ病や不安領域でより安全な89%の質問を発生させた。生成した質問の説明性は、抑うつや不安に関する知識ベースの概念と類似した計算によって評価される。総じて,本手法を改良したlmsのタイプに関わらず,安全性,説明可能性,プロセスガイドによる質問生成において,事前学習した単純なlmsと比較して平均82%の改善を達成できた。提案手法の有効性を定量的に定量的に評価し,安全性,説明可能性,プロセス知識の順守に関する3つの新しい評価指標を導入する。 Current Virtual Mental Health Assistants (VMHAs) provide counseling and suggestive care. They refrain from patient diagnostic assistance because they lack training in safety-constrained and specialized clinical process knowledge. In this work, we define Proknow as an ordered set of information that maps to evidence-based guidelines or categories of conceptual understanding to experts in a domain. We also introduce a new dataset of diagnostic conversations guided by safety constraints and Proknow that healthcare professionals use. We develop a method for natural language question generation (NLG) that collects diagnostic information from the patient interactively. We demonstrate the limitations of using state-of-the-art large-scale language models (LMs) on this dataset. Our algorithm models the process knowledge through explicitly modeling safety, knowledge capture, and explainability. LMs augmented with ProKnow guided method generated 89% safer questions in the depression and anxiety domain. The Explainability of the generated question is assessed by computing similarity with concepts in depression and anxiety knowledge bases. Overall, irrespective of the type of LMs augmented with our ProKnow, we achieved an average 82% improvement over simple pre-trained LMs on safety, explainability, and process-guided question generation. We qualitatively and quantitatively evaluate the efficacy of the proposed ProKnow-guided methods by introducing three new evaluation metrics for safety, explainability, and process knowledge adherence.	翻訳日:2023-05-16 18:10:52 公開日:2023-05-13
# 時間依存量子振動子に対するウィグナー・ヴラソフ形式 The Wigner-Vlasov formalism for time-dependent quantum oscillator ( http://arxiv.org/abs/2305.06069v3 ) ライセンス: Link先を確認	E.E. Perepelkin, B.I. Sadovnikov, N.G. Inozemtseva, A.A. Korepanova	(参考訳) 本稿では,位相空間における量子系に対するvlasov理論とwigner関数の枠組みにおける時間依存周波数を持つ高調波発振器の問題を包括的に検討する。ヴェラソフ方程式チェーンとシュル=オディンガー方程式、およびウィグナー関数のモヤル方程式の関係を用いて、この問題の厳密な解を求める新しい方法が提案されている。位相空間におけるウィグナー関数上のエネルギー関数を平均化する方法は、量子系に対する時間依存エネルギースペクトルを得るために用いられる。ヴラソフ方程式の解はヒル方程式を満たす特性の形で表現することができる。ヒル方程式の特別な場合、すなわち不安定解を持つマチュー方程式は詳細に検討されている。不安定な量子系のダイナミクスの解析により、ウィグナー関数レベル線で有界な位相空間の正方形は時間保存されるが、エネルギー関数線で有界な位相空間の正方形は増加する。この場合、ヴラソフ方程式の特徴はウィグナー関数レベル線とエネルギー関数ラインの交差点に位置する。このクロスポイントは不安定なシステムのダイナミクスを表す軌道で時間とともに移動する。それぞれの軌道は独自のエネルギーを持ち、ウィグナー関数上でこれらのエネルギーを平均すると、システム全体の時間依存離散エネルギースペクトルとなる。一般化位相空間 $\left\{x,p,\dot{p},\ddot{p} \right\} において、4階のウィグナー函数に対して明示的な表現が得られている。 $ This paper presents a comprehensive investigation of the problem of a harmonic oscillator with time-depending frequencies in the framework of the Vlasov theory and the Wigner function apparatus for quantum systems in the phase space. A new method is proposed to find an exact solution of this problem using a relation of the Vlasov equation chain with the Schr\"odinger equation and with the Moyal equation for the Wigner function. A method of averaging the energy function over the Wigner function in the phase space can be used to obtain time-dependent energy spectrum for a quantum system. The Vlasov equation solution can be represented in the form of characteristics satisfying the Hill equation. A particular case of the Hill equation, namely the Mathieu equation with unstable solutions, has been considered in details. An analysis of the dynamics of an unstable quantum system shows that the phase space square bounded with the Wigner function level line conserves in time, but the phase space square bounded with the energy function line increases. In this case the Vlasov equation characteristic is situated on the crosspoint of the Wigner function level line and the energy function line. This crosspoint moves in time with a trajectory that represents the unstable system dynamics. Each such trajectory has its own energy, and averaging these energies over the Wigner function results in time-dependent discreet energy spectrum for the whole system. An explicit expression has been obtained for the Wigner function of the 4th rank in the generalized phase space $\left\{ x,p,\dot{p},\ddot{p} \right\}.$	翻訳日:2023-05-16 11:16:59 公開日:2023-05-13

Title

Authors

Abstract

論文公表日・翻訳日

# プログラムの自動修復手法に関する調査研究

A Survey on Automated Program Repair Techniques ( http://arxiv.org/abs/2303.18184v3 )

ライセンス: Link先を確認

Kai Huang, Zhengzi Xu, Su Yang, Hongyu Sun, Xuejun Li, Zheng Yan, Yuqing Zhang

(参考訳) プログラムソフトウェアの急速な開発と大規模な普及により、現代社会はますますソフトウェアシステムに依存している。しかし、ソフトウェアによってもたらされる問題は表面化している。ソフトウェア欠陥は、開発者を悩ませる重要な要因になっている。この文脈では、自動プログラム修復(automated program repair, apr)技術が登場し、ソフトウェアの欠陥問題を自動修正し、手動デバッグ作業を減らすことを目的としている。特に、ディープラーニングの進歩により、近年多くの学習ベースのAPR技術が出現し、APR研究の新たな機会ももたらされている。研究者は、APR技術の完全な発展と今後の展望を素早く概説するため、APR技術の進化を再考し、APR研究の最新の進歩について深く議論する。本稿では,検索ベース,制約ベース,テンプレートベース,学習ベースという4つの異なるパッチ生成方式を用いて,APR手法の開発を紹介する。さらに,各APRツールをレビュー・比較するための一貫した基準セットを提案し,APR技術の利点とデメリットをまとめた上で,APR開発の現状について論じる。さらに,本研究は,APR開発を進める大きな動機となった,APRの関連技術分野の研究についても紹介する。最後に,現状の課題と今後の方向性を分析し,特に大規模言語モデルがapr研究にもたらした重要な機会を強調する。

With the rapid development and large-scale popularity of program software, modern society increasingly relies on software systems. However, the problems exposed by software have also come to the fore. Software defect has become an important factor troubling developers. In this context, Automated Program Repair (APR) techniques have emerged, aiming to automatically fix software defect problems and reduce manual debugging work. In particular, benefiting from the advances in deep learning, numerous learning-based APR techniques have emerged in recent years, which also bring new opportunities for APR research. To give researchers a quick overview of APR techniques' complete development and future opportunities, we revisit the evolution of APR techniques and discuss in depth the latest advances in APR research. In this paper, the development of APR techniques is introduced in terms of four different patch generation schemes: search-based, constraint-based, template-based, and learning-based. Moreover, we propose a uniform set of criteria to review and compare each APR tool, summarize the advantages and disadvantages of APR techniques, and discuss the current state of APR development. Furthermore, we introduce the research on the related technical areas of APR that have also provided a strong motivation to advance APR development. Finally, we analyze current challenges and future directions, especially highlighting the critical opportunities that large language models bring to APR research.

翻訳日:2023-10-24 12:56:27 公開日:2023-05-13

# AR/VRテクノロジスタック: ソフトウェア開発ライブラリ、プラットフォーム、ツールの中心的なリポジトリ

The AR/VR Technology Stack: A Central Repository of Software Development Libraries, Platforms, and Tools ( http://arxiv.org/abs/2305.07842v1 )

ライセンス: Link先を確認

Jasmine Roberts

(参考訳) 拡張現実、バーチャル、複合現実の領域に特化して、ソフトウェア開発ライブラリ、プラットフォーム、ツールの包括的なリポジトリ。

A comprehensive repository of software development libraries, platforms, and tools specifically to the domains of augmented, virtual, and mixed reality.

翻訳日:2023-10-24 08:55:30 公開日:2023-05-13

# 形式的手法によるXAI神話の否定 -初期結果-

Disproving XAI Myths with Formal Methods -- Initial Results ( http://arxiv.org/abs/2306.01744v1 )

ライセンス: Link先を確認

Joao Marques-Silva

(参考訳) 近年の機械学習(ML)の進歩は印象的かつ広範囲に及んでいる。しかしながら、MLモデルのデプロイは、最高のパフォーマンスを持つMLモデルの予測方法に対する信頼の欠如によって、依然として損なわれている。信頼の欠如の問題は、高リスクまたは安全クリティカルな領域におけるmlモデルの使用においてさらに深刻である。 eXplainable AI(XAI)は、信頼できるAIを提供するための継続的な取り組みの中核にある。残念ながら、XAIは信頼を構築する代わりに不信を育むという批判的な誤解に打ち消されている。本稿は、XAIにおける最も目に見える誤解のいくつかを詳述し、これらの誤解を否定するためにも、実用的な代替案を考案するためにも、形式的手法がどのように使われたかを示す。

The advances in Machine Learning (ML) in recent years have been both impressive and far-reaching. However, the deployment of ML models is still impaired by a lack of trust in how the best-performing ML models make predictions. The issue of lack of trust is even more acute in the uses of ML models in high-risk or safety-critical domains. eXplainable artificial intelligence (XAI) is at the core of ongoing efforts for delivering trustworthy AI. Unfortunately, XAI is riddled with critical misconceptions, that foster distrust instead of building trust. This paper details some of the most visible misconceptions in XAI, and shows how formal methods have been used, both to disprove those misconceptions, but also to devise practically effective alternatives.

翻訳日:2023-06-11 14:04:56 公開日:2023-05-13

# 消失する活性化:深部カプセルネットワークの症状

Vanishing Activations: A Symptom of Deep Capsule Networks ( http://arxiv.org/abs/2305.11178v1 )

ライセンス: Link先を確認

Miles Everett, Mingjun Zhong and Georgios Leontidis

(参考訳) カプセルネットワークは、スカラーの代わりにベクトルや行列表現を利用したニューラルネットワークの拡張であり、当初、視覚概念が部分から完全なオブジェクトへと進化する動的解析木を作成するために開発された。カプセルネットワークの初期実装は、さまざまなデータセットで最先端の成果を達成し、維持する。しかし、最近の研究では、パースツリーの構築に失敗したことと、より深いネットワークに展開する際の勾配の消失に対する感受性など、オリジナルのCapsule Networkアーキテクチャの欠点が明らかにされている。本稿では,本研究を主要なCapsule Networkアーキテクチャにまで拡張し,これらの課題が元の設計に限らないことを示す。カプセルネットワーク研究の大多数は、元々のカプセルネットワークとはやや異なるが、根本的に類似した構造を保っているアーキテクチャを生み出していると論じている。この設計上の類似性がカプセルネットワークのスケーラビリティを阻害している可能性がある。本研究は,Capsule Networksの堅牢性とスケーラビリティ向上に関する広範な議論に寄与する。

Capsule Networks, an extension to Neural Networks utilizing vector or matrix representations instead of scalars, were initially developed to create a dynamic parse tree where visual concepts evolve from parts to complete objects. Early implementations of Capsule Networks achieved and maintain state-of-the-art results on various datasets. However, recent studies have revealed shortcomings in the original Capsule Network architecture, notably its failure to construct a parse tree and its susceptibility to vanishing gradients when deployed in deeper networks. This paper extends the investigation to a range of leading Capsule Network architectures, demonstrating that these issues are not confined to the original design. We argue that the majority of Capsule Network research has produced architectures that, while modestly divergent from the original Capsule Network, still retain a fundamentally similar structure. We posit that this inherent design similarity might be impeding the scalability of Capsule Networks. Our study contributes to the broader discussion on improving the robustness and scalability of Capsule Networks.

翻訳日:2023-05-28 05:36:22 公開日:2023-05-13

# CBAGAN-RRT:Samping-based Path Planningのための畳み込みブロック注意生成支援ネットワーク

CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning ( http://arxiv.org/abs/2305.10442v1 )

ライセンス: Link先を確認

Abhinav Sagar, Sai Teja Gilukara

(参考訳) サンプリングに基づく経路計画アルゴリズムは自律ロボットにおいて重要な役割を果たす。しかし、RTRベースのアルゴリズムで共通する問題は、生成した初期経路が最適ではなく、収束が遅すぎて現実のアプリケーションでは利用できないことである。本稿では,空間的およびチャネル的注意と新たな損失関数を組み合わせた畳み込みブロック的注意生成逆ネットワークを用いた新しい画像ベース学習アルゴリズム(cbagan-rrt)を提案する。我々のganモデルから生成される経路の確率分布は、rrtアルゴリズムのサンプリングプロセスを導くために用いられる。我々は, \cite{zhang2021generative} が生成したデータセット上でネットワークをトレーニングし, IOU Score, Dice Score, FIDスコア, 時間コストやノード数といったパス計画指標を用いて, 従来の最先端アルゴリズムよりも優れた性能を示す。我々は,本研究の実現可能性を示すため,詳細な実験とアブレーション実験を行い,本モデルがトレーニングデータセットだけでなく,未発見のテストデータセット上でも良好に機能することを示す。このアプローチの利点は、状態空間における複雑な前処理を回避でき、精度を損なうことなくターンや狭い通路を含むような複雑な環境に一般化でき、我々のモデルはサンプリングに基づく他の経路計画アルゴリズムと容易に統合できることである。

Sampling-based path planning algorithms play an important role in autonomous robotics. However, a common problem among the RRT-based algorithms is that the initial path generated is not optimal and the convergence is too slow to be used in real-world applications. In this paper, we propose a novel image-based learning algorithm (CBAGAN-RRT) using a Convolutional Block Attention Generative Adversarial Network with a combination of spatial and channel attention and a novel loss function to design the heuristics, find a better optimal path, and improve the convergence of the algorithm both concerning time and speed. The probability distribution of the paths generated from our GAN model is used to guide the sampling process for the RRT algorithm. We train and test our network on the dataset generated by \cite{zhang2021generative} and demonstrate that our algorithm outperforms the previous state-of-the-art algorithms using both the image quality generation metrics like IOU Score, Dice Score, FID score, and path planning metrics like time cost and the number of nodes. We conduct detailed experiments and ablation studies to illustrate the feasibility of our study and show that our model performs well not only on the training dataset but also on the unseen test dataset. The advantage of our approach is that we can avoid the complicated preprocessing in the state space, our model can be generalized to complicated environments like those containing turns and narrow passages without loss of accuracy, and our model can be easily integrated with other sampling-based path planning algorithms.

翻訳日:2023-05-21 10:24:15 公開日:2023-05-13

# 医用samアダプタ : 医用画像分割のためのsegment anythingモデルの適用

Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation ( http://arxiv.org/abs/2304.12620v6 )

ライセンス: Link先を確認

Junde Wu and Yu Zhang and Rao Fu and Huihui Fang and Yuanpei Liu and Zhaowei Wang and Yanwu Xu and Yueming Jin

(参考訳) Segment Anything Model (SAM)は画像セグメンテーションの分野で最近人気を集めている。全面的なセグメンテーションタスクとプロンプトベースのインターフェースの素晴らしい機能のおかげで、SAMはコミュニティ内で激しい議論を巻き起こした。イメージセグメンテーションのタスクはSAMによって「完了」されたと多くの名高い専門家から言われている。しかし, イメージセグメンテーションは, イメージセグメンテーションファミリーの重要な分枝であるが, セグメンテーション"Anything"の範囲には含まれていないようである。多くの個人実験や最近の研究では、SAMは医療画像のセグメンテーションのサブパールを担っていることが示されている。自然な疑問は、SAMの強力なセグメンテーション能力を医療画像セグメンテーションに拡張するために、パズルの欠片を見つける方法である。本稿では,SAMモデルを微調整する代わりに,医療特化領域の知識をセグメンテーションモデルに統合するMed SAM Adapterを提案する。この単純な実装は、医療画像のセグメンテーションにおいて驚くほど優れた性能を示しており、一般的なNLP技術であるAdapterをコンピュータビジョンのケースに転送する試みの1つだ。医用SAMアダプタ (MSA) は, CT, MRI, 超音波画像, 眼底画像, 皮膚内視鏡画像など, 様々な画像モダリティを有する19の医用画像セグメンテーションタスクにおいて, 優れた性能を示した。 MSAは、nnUNet、TransUNet、UNetr、MedSegDiffのような幅広い最先端(SOTA)の医療画像セグメンテーション手法より優れており、また、完全に細返されたMedSAMよりもかなりパフォーマンスの差がある。コードは、https://github.com/WuJunde/Medical-SAM-Adapter.comでリリースされる。

The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation. Thanks to its impressive capabilities in all-round segmentation tasks and its prompt-based interface, SAM has sparked intensive discussion within the community. It is even said by many prestigious experts that image segmentation task has been "finished" by SAM. However, medical image segmentation, although an important branch of the image segmentation family, seems not to be included in the scope of Segmenting "Anything". Many individual experiments and recent studies have shown that SAM performs subpar in medical image segmentation. A natural question is how to find the missing piece of the puzzle to extend the strong segmentation capability of SAM to medical image segmentation. In this paper, instead of fine-tuning the SAM model, we propose Med SAM Adapter, which integrates the medical specific domain knowledge to the segmentation model, by a simple yet effective adaptation technique. Although this work is still one of a few to transfer the popular NLP technique Adapter to computer vision cases, this simple implementation shows surprisingly good performance on medical image segmentation. A medical image adapted SAM, which we have dubbed Medical SAM Adapter (MSA), shows superior performance on 19 medical image segmentation tasks with various image modalities including CT, MRI, ultrasound image, fundus image, and dermoscopic images. MSA outperforms a wide range of state-of-the-art (SOTA) medical image segmentation methods, such as nnUNet, TransUNet, UNetr, MedSegDiff, and also outperforms the fully fine-turned MedSAM with a considerable performance gap. Code will be released at: https://github.com/WuJunde/Medical-SAM-Adapter.

翻訳日:2023-05-18 19:31:04 公開日:2023-05-13

# モデル整合性検証のための決定に基づく繰り返しフラクティブ透かし

Decision-based iterative fragile watermarking for model integrity verification ( http://arxiv.org/abs/2305.09684v1 )

ライセンス: Link先を確認

Zhaoxia Yin, Heng Yin, Hang Su, Xinpeng Zhang, Zhenzhe Gao

(参考訳) 通常、ファンデーションモデルは彼らのサービスに対する高い需要を満たすためにクラウドサーバーにホストされる。しかしこれは、アタッカーがクラウドにアップロードしたり、ローカルシステムから転送した後で修正できるため、セキュリティ上のリスクにさらされる。そこで本研究では,通常のトレーニングサンプルをモデル変更に敏感な脆弱なサンプルに変換する反復的決定ベース脆弱性透かしアルゴリズムを提案する。提案手法は,変換されたサンプルを投入した場合に,対象モデルが出力する予測確率分布の分散を最小化することを目的とした最適化問題であり,通常のサンプルを複数回繰り返して脆弱なサンプルに変換する。 Our method has some advantages: (1) the iterative update of samples is done in a decision-based black-box manner, relying solely on the predicted probability distribution of the target model, which reduces the risk of exposure to adversarial attacks, (2) the small-amplitude multiple iterations approach allows the fragile samples to perform well visually, with a PSNR of 55 dB in TinyImageNet compared to the original samples, (3) even with changes in the overall parameters of the model of magnitude 1e-4, the fragile samples can detect such changes, and (4) the method is independent of the specific model structure and dataset. 本稿では,複数のモデルとデータセットにおける提案手法の有効性を実証し,現状よりも優れていることを示す。

Typically, foundation models are hosted on cloud servers to meet the high demand for their services. However, this exposes them to security risks, as attackers can modify them after uploading to the cloud or transferring from a local system. To address this issue, we propose an iterative decision-based fragile watermarking algorithm that transforms normal training samples into fragile samples that are sensitive to model changes. We then compare the output of sensitive samples from the original model to that of the compromised model during validation to assess the model's completeness.The proposed fragile watermarking algorithm is an optimization problem that aims to minimize the variance of the predicted probability distribution outputed by the target model when fed with the converted sample.We convert normal samples to fragile samples through multiple iterations. Our method has some advantages: (1) the iterative update of samples is done in a decision-based black-box manner, relying solely on the predicted probability distribution of the target model, which reduces the risk of exposure to adversarial attacks, (2) the small-amplitude multiple iterations approach allows the fragile samples to perform well visually, with a PSNR of 55 dB in TinyImageNet compared to the original samples, (3) even with changes in the overall parameters of the model of magnitude 1e-4, the fragile samples can detect such changes, and (4) the method is independent of the specific model structure and dataset. We demonstrate the effectiveness of our method on multiple models and datasets, and show that it outperforms the current state-of-the-art.

翻訳日:2023-05-18 19:11:22 公開日:2023-05-13

# スパイキングネットワークの初期化とフィリングレートの崩壊

Spiking Network Initialisation and Firing Rate Collapse ( http://arxiv.org/abs/2305.08879v1 )

ライセンス: Link先を確認

Nicolas Perez-Nieves and Dan F.M Goodman

(参考訳) 近年、スパイクニューラルネットワーク(SNN)を訓練する手法が開発され、精度の面ではArtificial Neural Networks(ANN)の代替となり、同時に推論やトレーニング時のエネルギー効率も向上している。しかし、SNNの優れた初期化を構成するものについては、まだ不明である。 ANNトレーニングのために開発された初期化スキームは、しばしば不十分で手動チューニングを必要とする。本稿では,ANN初期化文献の手法と計算神経科学結果を用いてこの問題に対処する。提案手法では,snsのスパイク・リセット非線形性と燃焼速度の崩壊問題により,annの重量初期化問題はannよりも微妙な問題であることを示した。まず,従来のランダムウォーク法とウィーナー法を応用して,様々な仮定の集合の下で発火速度崩壊問題に対するいくつかの解を同定し,提案する。次に,annからの分散伝播法と異なる手法を組み合わせたsn初期化のための一般的な戦略を考案し,拡散とショットノイズ近似に基づく期待発火率と膜電位分布を求める。また, しきい値の存在下での膜電位分布を考慮したSNN初期化を理論的に検討した。しかし、これらの手法が実際のデータセット上でSNNにどの程度うまく適用できるかは未解決のままである。

In recent years, newly developed methods to train spiking neural networks (SNNs) have rendered them as a plausible alternative to Artificial Neural Networks (ANNs) in terms of accuracy, while at the same time being much more energy efficient at inference and potentially at training time. However, it is still unclear what constitutes a good initialisation for an SNN. We often use initialisation schemes developed for ANN training which are often inadequate and require manual tuning. In this paper, we attempt to tackle this issue by using techniques from the ANN initialisation literature as well as computational neuroscience results. We show that the problem of weight initialisation for ANNs is a more nuanced problem than it is for ANNs due to the spike-and-reset non-linearity of SNNs and the firing rate collapse problem. We firstly identify and propose several solutions to the firing rate collapse problem under different sets of assumptions which successfully solve the issue by leveraging classical random walk and Wiener processes results. Secondly, we devise a general strategy for SNN initialisation which combines variance propagation techniques from ANNs and different methods to obtain the expected firing rate and membrane potential distribution based on diffusion and shot-noise approximations. Altogether, we obtain theoretical results to solve the SNN initialisation which consider the membrane potential distribution in the presence of a threshold. Yet, to what extent can these methods be successfully applied to SNNs on real datasets remains an open question.

翻訳日:2023-05-17 17:51:42 公開日:2023-05-13

# 脳腫瘍のセグメンテーションにおける未学習特徴の学習

Learning to Learn Unlearned Feature for Brain Tumor Segmentation ( http://arxiv.org/abs/2305.08878v1 )

ライセンス: Link先を確認

Seungyub Han, Yeongmo Kim, Seokhyeon Ha, Jungwoo Lee, Seunghong Choi

(参考訳) そこで本研究では,脳腫瘍の分類を微調整するアルゴリズムを提案し,少数のデータサンプルを必要とせず,ネットワークが元のタスクを忘れないようにする。我々のアプローチはアクティブラーニングとメタラーニングに基づいている。医学的画像分割の難しさの1つは、適切なアノテーションによるデータセットの欠如であり、医師が信頼できるアノテーションをタグ付けする必要があることと、脳腫瘍の種類が異なり、mr画像に異なる構造的特徴を持つグリオーマや脳転移など、疾患の多くの変種が存在するためである。したがって、あらゆる種類の疾患に対して大規模な医療画像データセットを作成することは不可能である。本稿では,高次グリオーマから脳転移への伝達学習法を示し,そのアルゴリズムが数ステップでグリオーマと脳転移ドメインのバランスの取れたパラメータを実現することを示す。

We propose a fine-tuning algorithm for brain tumor segmentation that needs only a few data samples and helps networks not to forget the original tasks. Our approach is based on active learning and meta-learning. One of the difficulties in medical image segmentation is the lack of datasets with proper annotations, because it requires doctors to tag reliable annotation and there are many variants of a disease, such as glioma and brain metastasis, which are the different types of brain tumor and have different structural features in MR images. Therefore, it is impossible to produce the large-scale medical image datasets for all types of diseases. In this paper, we show a transfer learning method from high grade glioma to brain metastasis, and demonstrate that the proposed algorithm achieves balanced parameters for both glioma and brain metastasis domains within a few steps.

翻訳日:2023-05-17 17:51:15 公開日:2023-05-13

# M$^2$DAR:視覚変換器を用いたマルチビューマルチスケールドライバ動作認識

M$^2$DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer ( http://arxiv.org/abs/2305.08877v1 )

ライセンス: Link先を確認

Yunsheng Ma, Liangqi Yuan, Amr Abdelraouf, Kyungtae Han, Rohit Gupta, Zihao Li, Ziran Wang

(参考訳) 交通安全の確保と事故防止は、この目標を達成するためにコンピュータビジョン技術の進歩を活用できる日々の運転において重要な目標である。本稿では,非トリミングビデオ(M$^2$DAR)における自然主義的運転行動認識とローカライズのためのマルチビュー・マルチスケールフレームワークを提案する。本システムでは,ロバストな階層表現を学習する重み付きマルチスケールトランスフォーマーベースの行動認識ネットワークを特徴とする。さらに,複数のビューにまたがる行動認識モジュールから予備結果を洗練するために,集約,フィルタリング,マージ,選択プロセスからなる新しい選挙アルゴリズムを提案する。第7回ai city challenge track 3データセットで行った広範囲な実験により,a2テストセットで0.5921のオーバーラップスコアを達成した。我々のソースコードは \url{https://github.com/PurdueDigitalTwin/M2DAR} で入手できる。

Ensuring traffic safety and preventing accidents is a critical goal in daily driving, where the advancement of computer vision technologies can be leveraged to achieve this goal. In this paper, we present a multi-view, multi-scale framework for naturalistic driving action recognition and localization in untrimmed videos, namely M$^2$DAR, with a particular focus on detecting distracted driving behaviors. Our system features a weight-sharing, multi-scale Transformer-based action recognition network that learns robust hierarchical representations. Furthermore, we propose a new election algorithm consisting of aggregation, filtering, merging, and selection processes to refine the preliminary results from the action recognition module across multiple views. Extensive experiments conducted on the 7th AI City Challenge Track 3 dataset demonstrate the effectiveness of our approach, where we achieved an overlap score of 0.5921 on the A2 test set. Our source code is available at \url{https://github.com/PurdueDigitalTwin/M2DAR}.

翻訳日:2023-05-17 17:50:58 公開日:2023-05-13

# カプセル内視鏡における三次元表面再構成の課題

Challenges of 3D Surface Reconstruction in Capsule Endoscopy ( http://arxiv.org/abs/2103.10390v3 )

ライセンス: Link先を確認

Olivier Rukundo

(参考訳) 大腸がん検診の精度と信頼性を向上させるため,カプセル内視鏡(CE)画像を用いた三次元3次元表面再構成は,CEハードウェアとソフトウェア制限のために依然として困難である。本研究は3次元可視化の課題に焦点をあて,前処理および非前処理のCE画像を用いて再構成した3次元表面に対する視線選択の不確定の影響を簡潔に検討する。さらに,同じ方位角で見る3次元表面の内容と,視線の角度の違いについて検討した。この研究は、再構成された3D表面の3Dプリンティングは、2Dスクリーンの非決定的な選択や視覚的制約といった課題を克服できると結論付けている。

Essential for improving the accuracy and reliability of bowel cancer screening, three-dimensional (3D) surface reconstruction using capsule endoscopy (CE) images remains challenging due to CE hardware and software limitations. This study focuses on 3D visualization challenges and briefly investigates the impact of indeterminate selection of the line-of-sight on 3D surfaces reconstructed using both preprocessed and non-preprocessed CE images. Furthermore, the study examines the content of 3D surfaces viewed at the same azimuth angles and different elevation angles of the line of sight. The study concludes that 3D printing of reconstructed 3D surfaces can overcome the challenges such as 2D screen line-of-sight indeterminate selection and visual restrictions.

翻訳日:2023-05-17 01:50:49 公開日:2023-05-13

# 行列積密度演算子(Matrix Product Density Operators): ローカルな親 Hamiltonian はいつ存在するか?

Matrix Product Density Operators: when do they have a local parent Hamiltonian? ( http://arxiv.org/abs/2010.14682v3 )

ライセンス: Link先を確認

Chi-Fang Chen, Kohtaro Kato, and Fernando G.S.L. Brand\~ao

(参考訳) 準局所親ハミルトニアンのギブス状態として行列積密度演算子(MPDO)を書けるかを検討する。我々は、これが一般的なMPDOのケースであり、証拠を裏付けるものであると推測する。親ハミルトニアンの局所性を調べるため、量子条件付き相互情報が指数関数的に崩壊するかどうかをチェックする。我々が考えるMPDOは、1-入出力/2-アウトプット('Y-shaped')完全正の写像の連鎖から構成される。確率的チャネルと厳密に正のチャネルの条件付き相互情報の上界を導出し、そのチャネルの補正可能な代数が自明であれば指数関数的に崩壊することを示す。また、簡単な修正可能な代数を持つすべてのY字チャネルに対する条件付き相互情報の指数関数的崩壊を意味する量子データ処理の不等式に関する予想も導入する。さらに,近親だが同値でない従兄弟であるmpdoを局所的に測定した。測定された状態の条件付き相互情報の指数的減衰に対して十分な条件を与え、あるランダムmpdoに対して汎用的に正しいことを数値的に確認する。

We study whether one can write a Matrix Product Density Operator (MPDO) as the Gibbs state of a quasi-local parent Hamiltonian. We conjecture this is the case for generic MPDO and give supporting evidences. To investigate the locality of the parent Hamiltonian, we take the approach of checking whether the quantum conditional mutual information decays exponentially. The MPDO we consider are constructed from a chain of 1-input/2-output (`Y-shaped') completely-positive maps, i.e., the MPDO have a local purification. We derive an upper bound on the conditional mutual information for bistochastic channels and strictly positive channels and show that it decays exponentially if the correctable algebra of the channel is trivial. We also introduce a conjecture on a quantum data processing inequality that implies the exponential decay of the conditional mutual information for every Y-shaped channel with trivial correctable algebra. We additionally investigate a close but nonequivalent cousin: MPDO measured in a local basis. We provide sufficient conditions for the exponential decay of the conditional mutual information of the measured states and numerically confirm they are generically true for certain random MPDO.

翻訳日:2023-05-17 01:50:05 公開日:2023-05-13

# タスクに条件付けられた明示的ハイパーパラメータ予測関数の学習

Learning an Explicit Hyperparameter Prediction Function Conditioned on Tasks ( http://arxiv.org/abs/2107.02378v2 )

ライセンス: Link先を確認

Jun Shu, Deyu Meng, Zongben Xu

(参考訳) メタ学習は最近、機械学習コミュニティで注目を集めている。新しいクエリデータのためのラベルを予測するために固有の予測ルールを学習する従来の機械学習とは対照的に、メタ学習は、観察したタスクから機械学習の学習方法論を学習することを目的としており、メタ学習学習手法を利用して新しいクエリタスクを一般化する。本研究では,すべての学習タスクで共有される明示的なハイパーパラメータ予測関数の学習として,学習方法論を解釈する。具体的には、この関数はメタラーナーと呼ばれるパラメータ化関数として表現され、トレーニング/テストタスクから適切なハイパーパラメータ設定にマッピングされる。このような設定により、メタ学習学習手法は、現在の多くのメタ学習手法によって固定されたハイパーパラメータを得る代わりに、様々なクエリタスクを柔軟に適合させることができる。このようなメタ学習の理解は、一般的な損失/タスク/モデルで一般化境界を分析する従来の学習理論から容易に成功する。この理論は自然に、抽出されたメタリーナーの品質を改善するための実現可能な制御戦略を導いており、少数ショット回帰、少数ショット分類、ドメイン一般化など、いくつかの典型的なメタ学習アプリケーションにおいて、その一般化能力を微妙に改善できることが証明されている。

Meta learning has attracted much attention recently in machine learning community. Contrary to conventional machine learning aiming to learn inherent prediction rules to predict labels for new query data, meta learning aims to learn the learning methodology for machine learning from observed tasks, so as to generalize to new query tasks by leveraging the meta-learned learning methodology. In this study, we interpret such learning methodology as learning an explicit hyper-parameter prediction function shared by all training tasks. Specifically, this function is represented as a parameterized function called meta-learner, mapping from a training/test task to its suitable hyper-parameter setting, extracted from a pre-specified function set called meta learning machine. Such setting guarantees that the meta-learned learning methodology is able to flexibly fit diverse query tasks, instead of only obtaining fixed hyper-parameters by many current meta learning methods, with less adaptability to query task's variations. Such understanding of meta learning also makes it easily succeed from traditional learning theory for analyzing its generalization bounds with general losses/tasks/models. The theory naturally leads to some feasible controlling strategies for ameliorating the quality of the extracted meta-learner, verified to be able to finely ameliorate its generalization capability in some typical meta learning applications, including few-shot regression, few-shot classification and domain generalization.

翻訳日:2023-05-17 01:40:02 公開日:2023-05-13

# 放送Bellシナリオにおけるデバイス非依存およびセミデバイス非依存の絡み合い認証

Device-independent and semi-device-independent entanglement certification in broadcast Bell scenarios ( http://arxiv.org/abs/2111.06358v4 )

ライセンス: Link先を確認

Emanuel-Cristian Boghiu, Flavien Hirsch, Pei-Sheng Lin, Marco T\'ulio Quintino, Joseph Bowles

(参考訳) 近年、二成分量子状態のサブシステムをブロードキャストすることで、ベル非局所性を活性化し、デバイス非依存の絡み合い認証のためのノイズ許容範囲を大幅に改善できることが示されている。この研究では、これらの結果を強化し、この現象の新たな側面を探求する。まず,ベル非局所性の活性化に関する新しい結果を示す。我々は,放送シナリオに合わせたベル不等式を構築し,放送がベル非局所活性化のより強固な概念につながることを示す。特に,これらの概念を応用して,局所的隠れ変数モデルが一般化された場合,二成分状態が真の三成分非局所相関に繋がることを示す。次に,放送シナリオにおけるデバイス非依存の絡み合い認証について検討し,デバイス非依存の絡み合い認証が2ビットのワーナー状態に対して本質的には絡み合いの範囲全体において可能であることを示す。最後に、EPRステアリングの概念を放送シナリオに拡張し、2ビット等方性状態の活性化の新たな例を示す。その結果,ブロードキャストベースのデバイス依存およびセミデバイス非依存プロトコルへの道が開けた。

It has recently been shown that by broadcasting the subsystems of a bipartite quantum state, one can activate Bell nonlocality and significantly improve noise tolerance bounds for device-independent entanglement certification. In this work we strengthen these results and explore new aspects of this phenomenon. First, we prove new results related to the activation of Bell nonlocality. We construct Bell inequalities tailored to the broadcast scenario, and show how broadcasting can lead to even stronger notions of Bell nonlocality activation. In particular, we exploit these ideas to show that bipartite states admitting a local hidden-variable model for general measurements can lead to genuine tripartite nonlocal correlations. We then study device-independent entanglement certification in the broadcast scenario, and show through semidefinite programming techniques that device-independent entanglement certification is possible for the two-qubit Werner state in essentially the entire range of entanglement. Finally, we extend the concept of EPR steering to the broadcast scenario, and present novel examples of activation of the two-qubit isotropic state. Our results pave the way for broadcast-based device-dependent and semi-device-independent protocols.

翻訳日:2023-05-17 01:30:25 公開日:2023-05-13

# ポイント2NeRF:3次元点雲からのニューラル放射場の生成

Points2NeRF: Generating Neural Radiance Fields from 3D point cloud ( http://arxiv.org/abs/2206.01290v2 )

ライセンス: Link先を確認

D. Zimny, T. Trzci\'nski, P. Spurek

(参考訳) LIDARや様々な深度カメラなどの3D視覚情報のための現代の登録装置は、データを3Dポイントクラウドとしてキャプチャする。逆に、そのような雲はサイズと複雑さのため処理が難しい。既存のメソッドは、メッシュをポイントクラウドに適合させ、代わりにレンダリングすることで、この問題に対処する。しかしこのアプローチは、結果として生じる視覚化の忠実さを低下させ、コンピュータグラフィックスアプリケーションで重要なオブジェクトの色情報を見逃してしまう。本研究では,3次元物体をNeRF(Neural Radiance Fields)として表現することで,この課題を軽減することを提案する。我々は、ハイパーネットワークのパラダイムを活用し、モデルをトレーニングし、関連するカラー値を持つ3Dポイント・クラウドを取り、入力された2D画像から3Dオブジェクトを再構成するNeRFネットワークの重みを返す。提案手法は,3次元オブジェクトの効率的な表現を提供し,NeRFの条件付けや,学習対象以外の一般化の改善など,既存のアプローチに対していくつかの利点を提供している。後者も経験的評価の結果で確認した。

Contemporary registration devices for 3D visual information, such as LIDARs and various depth cameras, capture data as 3D point clouds. In turn, such clouds are challenging to be processed due to their size and complexity. Existing methods address this problem by fitting a mesh to the point cloud and rendering it instead. This approach, however, leads to the reduced fidelity of the resulting visualization and misses color information of the objects crucial in computer graphics applications. In this work, we propose to mitigate this challenge by representing 3D objects as Neural Radiance Fields (NeRFs). We leverage a hypernetwork paradigm and train the model to take a 3D point cloud with the associated color values and return a NeRF network's weights that reconstruct 3D objects from input 2D images. Our method provides efficient 3D object representation and offers several advantages over the existing approaches, including the ability to condition NeRFs and improved generalization beyond objects seen in training. The latter we also confirmed in the results of our empirical evaluation.

翻訳日:2023-05-17 01:22:12 公開日:2023-05-13

# 中性子干渉法における位相渦格子

Phase Vortex Lattices in Neutron Interferometry ( http://arxiv.org/abs/2205.00536v2 )

ライセンス: Link先を確認

Niels Geerits and Hartmut Lemmel and Anna-Sophie Berger and Stephan Sponar

(参考訳) ネストループ干渉計に挿入されたアルミニウムプリズムの組み合わせを用いて、軌道角運動量l_z=0.35の中性子位相渦格子を220ミクロンの長さスケールで生成し、伝播方向を横切る。本手法は,最近開発された磁気的手法の一般化であり,強い核相互作用を活用できる。これらのプリズムの強いポテンシャルにより、より強い格子が生成される。中性子化合物光学およびスプリット結晶干渉計の最近の進歩と組み合わせることで、本手法は固有の中性子軌道角運動量状態の生成に適用できる。最後に、現在の状態では、我々の設定は異方性極小角中性子散乱に直接適用可能であると断言する。

A combination of aluminium prisms inserted into a nested loop interferometer is used to generate a neutron phase vortex lattice with significant extrinsic orbital angular momentum, L_z=0.35, on a length scale of 220 microns, transverse to the propagation direction. Our method is a generalization of recently developed magnetic methods, such that we can exploit the strong nuclear interaction. The stronger potential of these prisms allows for the generation of a tighter lattice. Combined with recent advances in neutron compound optics and split crystal interferometry our method may be applied to the generation of intrinsic neutron orbital angular momentum states. Finally, we assert that, in its current state, our setup is directly applicable to anisotropic ultra small angle neutron scattering.

翻訳日:2023-05-17 01:21:19 公開日:2023-05-13

# PADA:自己教師付き音声表現のためのドメイン適応処理

PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations ( http://arxiv.org/abs/2203.16965v4 )

ライセンス: Link先を確認

Lodagala V S V Durga Prasad and Sreyan Ghosh and S. Umesh

(参考訳) 自己教師付き音声表現学習(ssl)モデルは下流の様々なタスクをこなすが、これらのモデルはラベルのないデータが起源となる領域に過剰に適合することが観察されている。この問題を軽減するために,大量のドメイン外データに基づいて事前訓練されたモデルからPAD(Pruning Assisted Domain Adaptation)と余剰重量をゼロにする手法を提案する。直感的には、ターゲットドメインのASR微調整のためのスペースを作るのに役立つ。冗長な重みは、この作業の一部として詳細に議論された様々な刈り取り戦略を通じて特定することができる。具体的には,最近発見されたタスク非依存型およびタスク認識型プルーニングがPADに与える影響を調査し,後者に基づいた新たなプルーニングパラダイムを提案する。 CD-TAWは、十分に調整されたOODモデルから初期プルーニングマスクを取得し、論文で論じるプルーニング戦略の他の部分と大きく異なる。提案するCD-TAW法は,言語モデル(LM)復号化を伴わないSwitchboardデータの2時間サブセットを微調整することにより,ベースラインよりも20.6%の相対的なWER改善を実現する。さらに,提案手法の重要な設計選択を強調するために,詳細な分析を行った。

While self-supervised speech representation learning (SSL) models serve a variety of downstream tasks, these models have been observed to overfit to the domain from which the unlabelled data originates. To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data. Intuitively, this helps to make space for the target-domain ASR finetuning. The redundant weights can be identified through various pruning strategies which have been discussed in detail as a part of this work. Specifically, we investigate the effect of the recently discovered Task-Agnostic and Task-Aware pruning on PADA and propose a new pruning paradigm based on the latter, which we call Cross-Domain Task-Aware Pruning (CD-TAW). CD-TAW obtains the initial pruning mask from a well fine-tuned OOD model, which makes it starkly different from the rest of the pruning strategies discussed in the paper. Our proposed CD-TAW methodology achieves up to 20.6% relative WER improvement over our baseline when fine-tuned on a 2-hour subset of Switchboard data without language model (LM) decoding. Furthermore, we conduct a detailed analysis to highlight the key design choices of our proposed method.

翻訳日:2023-05-17 01:19:42 公開日:2023-05-13

# 逐次キャンセラリストデコーディングのためのスケーラブル極性コード構築:グラフニューラルネットワークに基づくアプローチ

Scalable Polar Code Construction for Successive Cancellation List Decoding: A Graph Neural Network-Based Approach ( http://arxiv.org/abs/2207.01105v4 )

ライセンス: Link先を確認

Yun Liao, Seyyed Ali Hashemi, Hengjie Yang, John M. Cioffi

(参考訳) 逐次復号化のための極性符号はビットチャネルをソートすることで効率よく実装できるが、巡回冗長チェック支援逐次復号化リスト(CA-SCL)の最適極性符号の探索はまだ検討が待たれている。本稿では,まず極性コードを,極性コード構築メッセージパッシング(pccmp)グラフと呼ばれる一意な不均一グラフにマッピングする。次に、CA-SCLデコーディングの下で最小フレーム誤り率の極符号に対応するPCCMPグラフを見つけることを目的とした、異種グラフニューラルネットベースの反復メッセージパス(IMP)アルゴリズムを提案する。この新しいIMPアルゴリズムの主な利点はスケーラビリティである。すなわち、モデル複雑性はブロック長とコードレートとは独立であり、短い極性コード上で訓練されたIMPモデルは、長い極性コードの構成に容易に適用できる。数値実験により、IMPベースの極符号構造はCA-SCLデコードの下での古典的な構成よりも優れていた。さらに、長さ128の極性符号で訓練されたIMPモデルがコードレートとブロック長の異なる極性符号の構築に直接適用されると、これらの極性符号構造が5G極性符号に匹敵する性能を示すことがシミュレーションで示されている。

While constructing polar codes for successive-cancellation decoding can be implemented efficiently by sorting the bit-channels, finding optimal polar codes for cyclic-redundancy-check-aided successive-cancellation list (CA-SCL) decoding in an efficient and scalable manner still awaits investigation. This paper first maps a polar code to a unique heterogeneous graph called the polar-code-construction message-passing (PCCMP) graph. Next, a heterogeneous graph-neural-network-based iterative message-passing (IMP) algorithm is proposed which aims to find a PCCMP graph that corresponds to the polar code with minimum frame error rate under CA-SCL decoding. This new IMP algorithm's major advantage lies in its scalability power. That is, the model complexity is independent of the blocklength and code rate, and a trained IMP model over a short polar code can be readily applied to a long polar code's construction. Numerical experiments show that IMP-based polar-code constructions outperform classical constructions under CA-SCL decoding. In addition, when an IMP model trained on a length-128 polar code directly applies to the construction of polar codes with different code rates and blocklengths, simulations show that these polar code constructions deliver comparable performance to the 5G polar codes.

翻訳日:2023-05-17 01:12:25 公開日:2023-05-13

# グラフ埋め込み法のメモリと容量

Memory and Capacity of Graph Embedding Methods ( http://arxiv.org/abs/2208.08769v3 )

ライセンス: Link先を確認

Frank Qiu

(参考訳) テンソル製品とおよそオーソノーマルコードによるグラフ埋め込み(Graph Embeddings via Tensor Products and A roughly Orthonormal Codes)を参照してください。

THIS PAPER IS NOW DEFUNCT: Check out "Graph Embeddings via Tensor Products and Approximately Orthonormal Codes", where it has been combined into one paper.

翻訳日:2023-05-17 01:02:45 公開日:2023-05-13

# ユークリッド選好モデルにおける誤差

Error in the Euclidean Preference Model ( http://arxiv.org/abs/2208.08160v3 )

ライセンス: Link先を確認

Luke Thorburn, Maria Polukarov, Carmine Ventre

(参考訳) 選好の空間モデルは、ベクトル埋め込みの形で、レコメンダシステムを含む多くのディープラーニングとマルチエージェントシステムによって学習される。これらのモデルはしばしばユークリッド構造を近似すると仮定され、ユークリッド計量によって測定されるように、個人は「理想点」に近い位置にある選択肢を好む。しかし、Bogomolnaia and Laslier (2007) は、ユークリッド空間が個人や代替物よりも2つの少ない次元を持つ場合、この構造で表現できない順序的選好プロファイルが存在することを示した。この結果を拡張し、ほぼすべての選好プロファイルをユークリッドモデルで表現できない状況を示し、ユークリッドモデルを用いて非ユークリッド選好プロファイルを近似する際の予測誤差の理論的下限を導出する。この結果は、ベクトル埋め込みの解釈と利用に影響を及ぼす。なぜなら、任意の、真の順序関係の近似が、埋め込みの次元が表現される実体数の実質的な分数である場合に限り、予測できるからである。

Spatial models of preference, in the form of vector embeddings, are learned by many deep learning and multiagent systems, including recommender systems. Often these models are assumed to approximate a Euclidean structure, where an individual prefers alternatives positioned closer to their "ideal point", as measured by the Euclidean metric. However, Bogomolnaia and Laslier (2007) showed that there exist ordinal preference profiles that cannot be represented with this structure if the Euclidean space has two fewer dimensions than there are individuals or alternatives. We extend this result, showing that there are situations in which almost all preference profiles cannot be represented with the Euclidean model, and derive a theoretical lower bound on the expected error when using the Euclidean model to approximate non-Euclidean preference profiles. Our results have implications for the interpretation and use of vector embeddings, because in some cases close approximation of arbitrary, true ordinal relationships can be expected only if the dimensionality of the embeddings is a substantial fraction of the number of entities represented.

翻訳日:2023-05-17 01:02:41 公開日:2023-05-13

# 分散トレーニングにおけるビザンチン攻撃の検出と軽減

Detection and Mitigation of Byzantine Attacks in Distributed Training ( http://arxiv.org/abs/2208.08085v4 )

ライセンス: Link先を確認

Konstantinos Konstantinidis, Namrata Vaswani, and Aditya Ramamoorthy

(参考訳) 現代の機械学習タスクの多くは、トレーニングパイプラインの重要なコンポーネントとして大規模分散クラスタを使用する必要がある。しかし、作業ノードの異常なビザンチン挙動は、トレーニングを脱線させ、推論の品質を損なう可能性がある。このような動作は意図しないシステム障害や組織的攻撃によるものでもあり、結果として、トレーニングを調整するパラメータサーバ(PS)に任意の結果を返すノードもある。最近の研究は、幅広い攻撃モデルを検討し、歪んだ勾配を補正するためにロバストアグリゲーションと/または計算冗長性を検討した。本研究では,攻撃モデルについて検討する。$q$ 防御プロトコルに精通し,反復から弱いものへ変更できる。$q$ ランダムに選択した敵は,一度に数回のイテレーションでのみ変更可能な,限定的な結束能力を持つ。我々のアルゴリズムは、冗長なタスク割り当てと敵対行動の検出に頼っている。また,文献で考慮される共通の仮定と設定の下での最適点への本手法の収束性を示す。強い攻撃に対しては,従来の最先端技術と比較して16%～99%の歪み勾配が減少することを示した。トップ1の分類精度はcifar-10のデータセットにおいて,最先端の手法と比較して25%の精度向上(強弱のシナリオ平均)を示した。

A plethora of modern machine learning tasks require the utilization of large-scale distributed clusters as a critical component of the training pipeline. However, abnormal Byzantine behavior of the worker nodes can derail the training and compromise the quality of the inference. Such behavior can be attributed to unintentional system malfunctions or orchestrated attacks; as a result, some nodes may return arbitrary results to the parameter server (PS) that coordinates the training. Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients. In this work, we consider attack models ranging from strong ones: $q$ omniscient adversaries with full knowledge of the defense protocol that can change from iteration to iteration to weak ones: $q$ randomly chosen adversaries with limited collusion abilities which only change every few iterations at a time. Our algorithms rely on redundant task assignments coupled with detection of adversarial behavior. We also show the convergence of our method to the optimal point under common assumptions and settings considered in literature. For strong attacks, we demonstrate a reduction in the fraction of distorted gradients ranging from 16%-99% as compared to the prior state-of-the-art. Our top-1 classification accuracy results on the CIFAR-10 data set demonstrate 25% advantage in accuracy (averaged over strong and weak scenarios) under the most sophisticated attacks compared to state-of-the-art methods.

翻訳日:2023-05-17 01:02:22 公開日:2023-05-13

# 量子回路は一般化されるか?

Do Quantum Circuit Born Machines Generalize? ( http://arxiv.org/abs/2207.13645v4 )

ライセンス: Link先を確認

Kaitlin Gili, Mohamed Hibat-Allah, Marta Mauri, Chris Ballance, Alejandro Perdomo-Ortiz

(参考訳) 近年、生成タスクのための量子回路モデルの提案において、それらの性能に関する議論は、既知のターゲット分布を再現する能力に限られている。例えば、QCBM(Quantum Circuit Born Machines)のような表現型モデルファミリは、与えられたターゲット分布を高精度に学習する能力について、ほぼ完全に評価されている。この側面はいくつかのタスクには理想的かもしれないが、ジェネレーティブモデルの評価の範囲を一般化するよりもデータを記憶する能力に制限する。その結果、モデルの一般化性能とそのような能力とリソース要件との関係、例えば回路深さとトレーニングデータの量についてはほとんど理解されていない。本研究では,最近提案された一般化評価フレームワークを活用し,この知識ギャップに対処する。まず,QCBMの濃度制約分布の学習過程を調査し,回路深度を増大させながら一般化性能が向上することを示した。ここで示した12量子ビットの例では、トレーニングセット内の有効データの最大30%で、qcbmは、知覚不能で有効なデータを生成するための最良の一般化性能を示す。最後に、QCBMが有効なサンプルだけでなく、適切に再重み付けされた分布に応じて分布する高品質なビットストリングに一般化できる能力を評価する。 QCBMは、再重み付けされたデータセットを効果的に学習し、トレーニングセットのデータセットよりも高い品質の未確認サンプルを生成することができる。我々の知る限り、これはQCBMの一般化性能を量子生成モデルの積分評価指標として示し、QCBMが高品質で望まれる新しいサンプルに一般化する能力を示す文献の中では初めてのものである。

In recent proposals of quantum circuit models for generative tasks, the discussion about their performance has been limited to their ability to reproduce a known target distribution. For example, expressive model families such as Quantum Circuit Born Machines (QCBMs) have been almost entirely evaluated on their capability to learn a given target distribution with high accuracy. While this aspect may be ideal for some tasks, it limits the scope of a generative model's assessment to its ability to memorize data rather than generalize. As a result, there has been little understanding of a model's generalization performance and the relation between such capability and the resource requirements, e.g., the circuit depth and the amount of training data. In this work, we leverage upon a recently proposed generalization evaluation framework to begin addressing this knowledge gap. We first investigate the QCBM's learning process of a cardinality-constrained distribution and see an increase in generalization performance while increasing the circuit depth. In the 12-qubit example presented here, we observe that with as few as 30% of the valid data in the training set, the QCBM exhibits the best generalization performance toward generating unseen and valid data. Lastly, we assess the QCBM's ability to generalize not only to valid samples, but to high-quality bitstrings distributed according to an adequately re-weighted distribution. We see that the QCBM is able to effectively learn the reweighted dataset and generate unseen samples with higher quality than those in the training set. To the best of our knowledge, this is the first work in the literature that presents the QCBM's generalization performance as an integral evaluation metric for quantum generative models, and demonstrates the QCBM's ability to generalize to high-quality, desired novel samples.

翻訳日:2023-05-17 01:00:41 公開日:2023-05-13

# ViT-DD:セミスーパービジョンドライバディトラクション検出用マルチタスク・ビジョン・トランス

ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection ( http://arxiv.org/abs/2209.09178v3 )

ライセンス: Link先を確認

Yunsheng Ma and Ziran Wang

(参考訳) 現代の運転における交通安全確保と事故軽減が最重要であり、コンピュータビジョン技術はこの目標に大きく貢献する可能性がある。本稿では,運転者注意障害検出と運転者の感情認識の両方に関連するトレーニング信号からインダクティブ情報を取り入れたマルチモーダル視覚変換器(ViT-DD)を提案する。さらに,感情ラベルのないドライバデータをvit-ddのマルチタスクトレーニングプロセスにシームレスに統合可能な自己学習アルゴリズムを開発した。実験結果から,提案したViT-DDは,SFDDDデータセットとAUCDDデータセットにおいて,運転者の気晴らし検出の既存手法を6.5\%,0.9\%で上回ることがわかった。この重要な研究領域における再現性をサポートし、さらなる進歩を促進するため、このアプローチのソースコードはhttps://github.com/PurdueDigitalTwin/ViT-DDで公開されている。

Ensuring traffic safety and mitigating accidents in modern driving is of paramount importance, and computer vision technologies have the potential to significantly contribute to this goal. This paper presents a multi-modal Vision Transformer for Driver Distraction Detection (termed ViT-DD), which incorporates inductive information from training signals related to both distraction detection and driver emotion recognition. Additionally, a self-learning algorithm is developed, allowing for the seamless integration of driver data without emotion labels into the multi-task training process of ViT-DD. Experimental results reveal that the proposed ViT-DD surpasses existing state-of-the-art methods for driver distraction detection by 6.5\% and 0.9\% on the SFDDD and AUCDD datasets, respectively. To support reproducibility and foster further advancements in this critical research area, the source code for this approach is made publicly available at https://github.com/PurdueDigitalTwin/ViT-DD.

翻訳日:2023-05-17 00:53:52 公開日:2023-05-13

# ロボットアームの安全かつ効率的なマルチオブジェクト把持検出手法

A Secure and Efficient Multi-Object Grasping Detection Approach for Robotic Arms ( http://arxiv.org/abs/2209.03511v2 )

ライセンス: Link先を確認

Hui Wang, Jieren Cheng, Yichen Xu, Sirui Ni, Zaijia Yang and Jiangpeng Li

(参考訳) ロボットアームは自動産業で広く使われている。しかし、ロボットアームにおけるディープラーニングの幅広い応用により、コンピューティングパワーの把握の割り当てやセキュリティに対する需要の増加など、新たな課題が存在する。本研究では,ディープラーニングとエッジクラウドの協調に基づくロボットアームの把握手法を提案する。本手法は,ロボットアームの任意の把握計画を実現し,把握効率と情報セキュリティを考慮した。さらに、GANによって訓練されたエンコーダとデコーダにより、圧縮中に画像が暗号化され、プライバシーのセキュリティが保証される。このモデルは、OCIDデータセット上で92%の精度を実現し、画像圧縮比が0.03%に達し、構造差値が0.91以上である。

Robotic arms are widely used in automatic industries. However, with wide applications of deep learning in robotic arms, there are new challenges such as the allocation of grasping computing power and the growing demand for security. In this work, we propose a robotic arm grasping approach based on deep learning and edge-cloud collaboration. This approach realizes the arbitrary grasp planning of the robot arm and considers the grasp efficiency and information security. In addition, the encoder and decoder trained by GAN enable the images to be encrypted while compressing, which ensures the security of privacy. The model achieves 92% accuracy on the OCID dataset, the image compression ratio reaches 0.03%, and the structural difference value is higher than 0.91.

翻訳日:2023-05-17 00:52:44 公開日:2023-05-13

# テンソル製品とほぼ正規コードによるグラフ埋め込み

Graph Embeddings via Tensor Products and Approximately Orthonormal Codes ( http://arxiv.org/abs/2208.10917v4 )

ライセンス: Link先を確認

Frank Qiu

(参考訳) グラフを構造保存的な方法でベクトルとして埋め込む手法を解析し、その豊かな表現能力を示し、その理論的性質のいくつかを確立する。我々の手順はバインド・アンド・サム法に該当し、テンソル積が重ね合わせ原理を尊重する最も一般的な結合演算であることを示す。また,提案手法の挙動を特徴づける精度の高い結果が得られ,球面符号の使用が上限のパッキングを実現することを示す。本手法は,ある意味では,疎グラフ表現への応用を伴う隣接行列の圧縮であることを示すために,隣接行列へのリンクを確立する。

We analyze a method for embedding graphs as vectors in a structure-preserving manner, showcasing its rich representational capacity and establishing some of its theoretical properties. Our procedure falls under the bind-and-sum approach, and we show that the tensor product is the most general binding operation that respects the superposition principle. We also establish some precise results characterizing the behavior of our method, and we show that our use of spherical codes achieves a packing upper bound. We establish a link to adjacency matrices, showing that our method is, in some sense, a compression of adjacency matrices with applications towards sparse graph representations.

翻訳日:2023-05-17 00:51:41 公開日:2023-05-13

# CCC-wav2vec 2.0:クラスタリング支援による音声表現のクロスコントラスト自己教師型学習

CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations ( http://arxiv.org/abs/2210.02592v3 )

ライセンス: Link先を確認

Vasista Sai Lodagala and Sreyan Ghosh and S. Umesh

(参考訳) Self-Supervised Learningは、利用可能なラベルなしデータからスケールのメリットを得るのに役立ちましたが、学習パラダイムは継続的に改善されています。本稿では,クラスタリングと拡張に基づくクロスコントラスト損失を自己管理対象とする,ccc-wav2vec 2.0という新たな事前学習戦略を提案する。クラスタリングモジュールを通じて、ポジティブと非常によく似た否定的な例の影響をスケールダウンします。クロスコントラスト損失は、元のサンプルのエンコーダ出力と、その増大と逆転の量子化器出力との間に計算され、事前学習戦略に堅牢性をもたらす。 ccc-wav2vec 2.0は、librispeechのベースラインであるwav2vec 2.0よりも15.6%と12.7%の改善を達成している。提案手法は,Switchboardデータに微調整を施すと,ベースラインwav2vec 2.0よりも14.9%の相対的なWER改善を実現する。すべてのコードをgithubで公開しています。

While Self-Supervised Learning has helped reap the benefit of the scale from the available unlabeled data, the learning paradigms are continuously being bettered. We present a new pre-training strategy named ccc-wav2vec 2.0, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those negative examples that are highly similar to the positive. The Cross-Contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation and vice-versa, bringing robustness to the pre-training strategy. ccc-wav2vec 2.0 achieves up to 15.6% and 12.7% relative WER improvement over the baseline wav2vec 2.0 on the test-clean and test-other sets, respectively, of LibriSpeech, without the use of any language model. The proposed method also achieves up to 14.9% relative WER improvement over the baseline wav2vec 2.0 when fine-tuned on Switchboard data. We make all our codes publicly available on GitHub.

翻訳日:2023-05-17 00:44:12 公開日:2023-05-13

# どうやってそこに着くの? 英語過去時制インフレクションの認知モデルとしてのトランスフォーマーニューラルネットワークの評価

How do we get there? Evaluating transformer neural networks as cognitive models for English past tense inflection ( http://arxiv.org/abs/2210.09167v2 )

ライセンス: Link先を確認

Xiaomeng Ma and Lingyu Gao

(参考訳) ニューラルネットワークが人間のような言語の準規則性を把握できるかどうか、議論が続いている。典型的な準正則性タスクである英語の過去時制インフレクションにおいて、ニューラルネットワークモデルは、最も頻繁なパターンを一般化するためにのみ学習し、正規パターンではなく、正規パターンと不規則パターンの抽象的なカテゴリを学ぶことができず、人間のパフォーマンスと異なることを長年批判されてきた。本研究では,異なる設定の変圧器モデルのセットをトレーニングし,その動作について検討する。モデルでは, 正規動詞の認識精度が向上し, 不規則動詞の精度も向上した。レギュラーモデルの性能はタイプ周波数と比に大きく影響されるが、トークンの頻度と比率には影響せず、逆もまた不規則である。正規化と不規則化の異なる振る舞いは、モデルが動詞の規則性についてある程度の記号的学習を持っていることを示唆している。さらに、モデルは名詞動詞の人間の行動と弱い相関関係にある。トランスフォーマーモデルは動詞の規則性の抽象的なカテゴリーについてある程度の学習レベルを示すが、その性能は人間のデータにうまく適合せず、良い認知モデルではない可能性がある。

There is an ongoing debate on whether neural networks can grasp the quasi-regularities in languages like humans. In a typical quasi-regularity task, English past tense inflections, the neural network model has long been criticized that it learns only to generalize the most frequent pattern, but not the regular pattern, thus can not learn the abstract categories of regular and irregular and is dissimilar to human performance. In this work, we train a set of transformer models with different settings to examine their behavior on this task. The models achieved high accuracy on unseen regular verbs and some accuracy on unseen irregular verbs. The models' performance on the regulars is heavily affected by type frequency and ratio but not token frequency and ratio, and vice versa for the irregulars. The different behaviors on the regulars and irregulars suggest that the models have some degree of symbolic learning on the regularity of the verbs. In addition, the models are weakly correlated with human behavior on nonce verbs. Although the transformer model exhibits some level of learning on the abstract category of verb regularity, its performance does not fit human data well, suggesting that it might not be a good cognitive model.

翻訳日:2023-05-17 00:34:29 公開日:2023-05-13

# Covariance Matrix Adaptation MAP-Annealing による多次元制御系の訓練

Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing ( http://arxiv.org/abs/2210.02622v2 )

ライセンス: Link先を確認

Bryon Tjanaka, Matthew C. Fontaine, David H. Lee, Aniruddha Kalkar, Stefanos Nikolaidis

(参考訳) シミュレーションでさまざまなニューラルネットワークコントローラを事前トレーニングすることで、ロボットのロコモーションタスクの損傷に対するオンライン適応が可能になる。しかし、多様で高性能なコントローラを見つけるには、高価なネットワークトレーニングと多数のハイパーパラメータの広範なチューニングが必要となる。一方,進化戦略(es)に基づく品質多様性アルゴリズムである共分散行列適応map-annealing (cma-mae) は,このような制限がなく,標準qdベンチマークで最先端の性能を達成している。しかし、CMA-MAEは2次複雑さのため、現代のニューラルネットワークコントローラにはスケールできない。我々はESにおける効率的な近似手法を活用し、高次元にスケールする3つの新しいCMA-MAE変種を提案する。実験では,ロボットの歩行タスクにおいて,esベースのベースラインを上回っており,最先端の深層強化学習に基づく品質多様性アルゴリズムに匹敵する。

Pre-training a diverse set of neural network controllers in simulation has enabled robots to adapt online to damage in robot locomotion tasks. However, finding diverse, high-performing controllers requires expensive network training and extensive tuning of a large number of hyperparameters. On the other hand, Covariance Matrix Adaptation MAP-Annealing (CMA-MAE), an evolution strategies (ES)-based quality diversity algorithm, does not have these limitations and has achieved state-of-the-art performance on standard QD benchmarks. However, CMA-MAE cannot scale to modern neural network controllers due to its quadratic complexity. We leverage efficient approximation methods in ES to propose three new CMA-MAE variants that scale to high dimensions. Our experiments show that the variants outperform ES-based baselines in benchmark robotic locomotion tasks, while being comparable with or exceeding state-of-the-art deep reinforcement learning-based quality diversity algorithms.

翻訳日:2023-05-17 00:32:33 公開日:2023-05-13

# data2vec-aqc:Teacher-Studentトレーニング設定における適切な教師アシスタントの探索

data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup ( http://arxiv.org/abs/2211.01246v2 )

ライセンス: Link先を確認

Vasista Sai Lodagala and Sreyan Ghosh and S. Umesh

(参考訳) 本稿では、ラベルなし音声データから音声表現学習を行うための、Data2vec-aqcと呼ばれる新しい自己教師付き学習アルゴリズムを提案する。我々の目標は、ラベル付きデータとラベル付きデータの両方が制限されたドメインにおける音声のSSLを改善することです。最近導入されたdata2vecをベースに、データ拡張、量子化表現、クラスタリングの恩恵を受けるdata2vecフレームワークに追加のモジュールを導入しました。これらのモジュール間の相互作用は、追加の自己監督目的として相互競合損失を解決するのに役立つ。 data2vec-aqc は librispeech の既存の state-the-art data2vec システムよりも 14.1% と 20.9% の改善を達成している。提案モデルでは,Switchboardデータセットのサブセットを微調整すると,ベースラインの data2vec に対して最大17.8\% の相対的な WER ゲインが得られる。コード: https://github.com/speech-lab-iitm/data2vec-aqc。

In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc, for speech representation learning from unlabeled speech data. Our goal is to improve SSL for speech in domains where both unlabeled and labeled data are limited. Building on the recently introduced data2vec, we introduce additional modules to the data2vec framework that leverage the benefit of data augmentations, quantized representations, and clustering. The interaction between these modules helps solve the cross-contrastive loss as an additional self-supervised objective. data2vec-aqc achieves up to 14.1% and 20.9% relative WER improvement over the existing state-of-the-art data2vec system over the test-clean and test-other sets, respectively of LibriSpeech, without the use of any language model (LM). Our proposed model also achieves up to 17.8\% relative WER gains over the baseline data2vec when fine-tuned on a subset of the Switchboard dataset. Code: https://github.com/Speech-Lab-IITM/data2vec-aqc.

翻訳日:2023-05-17 00:25:03 公開日:2023-05-13

# 衛星間連続可変量子鍵分布:低地球軌道におけるガウスおよび離散変調プロトコル

Satellite-to-Ground Continuous Variable Quantum Key Distribution: The Gaussian and Discrete Modulated Protocols in Low Earth Orbit ( http://arxiv.org/abs/2211.16862v3 )

ライセンス: Link先を確認

Mikhael Sayat, Biveen Shajilal, Sebastian P. Kish, Syed M. Assad, Thomas Symul, Ping Koy Lam, Nicholas Rattenbury, John Cater

(参考訳) ガウス変調連続可変量子鍵分布 (GM-CVQKD) プロトコルは、量子鍵分布 (QKD) において、両者の相互情報を最大化する。別の変調方式は離散変調CVQKD(DM-CVQKD)プロトコルである。本稿では,低SNRにおける衛星間リンク上のGM-CVQKDプロトコルとともに,位相シフト鍵(M-PSK)と準振幅変調(M QAM)DM-CVQKDプロトコルについて検討する。本研究では, 幾何損失, シンチレーション, 散乱損失をそれぞれリンク距離, 大気乱流, 大気エアロゾルから考慮し, 衛星対地リンクモデルを用いた。さらに,近年の多次元符号化・多段復号法モデルとマルチエッジ型低密度パリティチェック(MET-LDPC)符号モデルを組み合わせて,復号化効率を判定する手法が提案されている。その結果,GM-CVQKDはDM-CVQKDより優れていた。さらに、MD調整によるGM-CVQKDは、リンク距離と低高度角度で正の秘密鍵レートを発生させることにより、GM-CVQKDとLC-MSD調整を有限サイズ制限で上回る。

The Gaussian modulated continuous variable quantum key distribution (GM-CVQKD) protocol is known to maximise the mutual information between two parties during quantum key distribution (QKD). An alternative modulation scheme is the discrete modulated CVQKD (DM-CVQKD) protocol. In this paper, we study the Phase Shift Keying (M-PSK) and Quadrature Amplitude Modulation (M QAM) DM-CVQKD protocols along with the GM-CVQKD protocol over a satellite-to-ground link in the low SNR regime. We use a satellite-to-ground link model which takes into account geometric losses, scintillation, and scattering losses from the link distance, atmospheric turbulence, and atmospheric aerosols, respectively. In addition, recent multidimensional (MD) and multilevel coding and multistage decoding (MLC-MSD) reconciliation method models in combination with multiedge-type low-density parity-check (MET-LDPC) code models have been used to determine the reconciliation efficiency. The results show that GM-CVQKD outperforms DM-CVQKD. In addition, GM-CVQKD with MD reconciliation outperforms GM-CVQKD with MLC-MSD reconciliation in the finite size limit by producing positive secret key rates at larger link distances and lower elevation angles.

翻訳日:2023-05-17 00:16:50 公開日:2023-05-13

# オンラインメディアにおける語数成長のためのロジスティック方程式の小さな拡張:社会における成長現象の多様性のパラメトリック記述

A minor extension of the logistic equation for growth of word counts on online media: Parametric description of diversity of growth phenomena in society ( http://arxiv.org/abs/2211.16733v2 )

ライセンス: Link先を確認

Hayafumi Watanabe

(参考訳) 2007年から2019年にかけての約10億の日本語ブログ記事から抽出した月次単語数時系列を,全国のオンラインソーシャルメディア上での新たな語彙の増大現象を解析した。特に、拡張ロジスティック方程式を元の方程式に1つのパラメータを加えることで導入し、ロジスティック関数、線形成長、有限時間発散といった実際の成長曲線の様々なパターンを一貫して再現できることを示した。第二に、モデルパラメータの解析により、典型的な成長パターンは、様々な複雑なシステムにしばしば現れるロジスティック関数であるだけでなく、指数関数から始まる非自明な成長曲線であり、定常状態のないパワー関数に漸近的に近づくことを発見した。さらに,機能的成長形態とピークアウトとの関係も観察した。最後に,提案したモデルと統計特性は,検索クエリの全国的普及の時系列であるGoogle Trendsデータ(英語,フランス語,スペイン語,日本語)にも有効であることを示した。

To understand the growing phenomena of new vocabulary on nationwide online social media, we analyzed monthly word count time series extracted from approximately 1 billion Japanese blog articles from 2007 to 2019. In particular, we first introduced the extended logistic equation by adding one parameter to the original equation and showed that the model can consistently reproduce various patterns of actual growth curves, such as the logistic function, linear growth, and finite-time divergence. Second, by analyzing the model parameters, we found that the typical growth pattern is not only a logistic function, which often appears in various complex systems, but also a nontrivial growth curve that starts with an exponential function and asymptotically approaches a power function without a steady state. Furthermore, we observed a connection between the functional form of growth and the peak-out. Finally, we showed that the proposed model and statistical properties are also valid for Google Trends data (English, French, Spanish, and Japanese), which is a time series of the nationwide popularity of search queries.

翻訳日:2023-05-17 00:16:05 公開日:2023-05-13

# PipeFisher: パイプライニングと漁業情報行列を用いた大規模言語モデルの効率的な訓練

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices ( http://arxiv.org/abs/2211.14133v2 )

ライセンス: Link先を確認

Kazuki Osawa, Shigang Li, Torsten Hoefler

(参考訳) パイプライン並列処理により、大規模分散アクセラレータクラスタ上でのLarge Language Models(LLM)の効率的なトレーニングが可能になる。しかし、起動時と分解時のパイプラインバブルはアクセラレータの利用を減らす。マイクロバッチと双方向パイプラインを用いた効率的なパイプラインスキームが提案されているが、同期前方および後方通過では相当数の気泡が充填できない。この問題に対処するため,llm訓練の補助的効果を得るために気泡に余分な作業を割り当てることを提案する。この方向の例として,フィッシャー情報行列に基づく2次最適化手法であるK-FACをバブルに割り当てて収束を加速するPipeFisherを提案する。 BERTベースとラージモデルの第1相事前トレーニングでは、K-FACによる加速利用を大幅に改善し、改良された収束の恩恵を受けることにより、一階オプティマイザによるトレーニングに比べて(シミュレーションされた)トレーニング時間を50-75%に短縮する。

Pipeline parallelism enables efficient training of Large Language Models (LLMs) on large-scale distributed accelerator clusters. Yet, pipeline bubbles during startup and tear-down reduce the utilization of accelerators. Although efficient pipeline schemes with micro-batching and bidirectional pipelines have been proposed to maximize utilization, a significant number of bubbles cannot be filled using synchronous forward and backward passes. To address this problem, we suggest that extra work be assigned to the bubbles to gain auxiliary benefits in LLM training. As an example in this direction, we propose PipeFisher, which assigns the work of K-FAC, a second-order optimization method based on the Fisher information matrix, to the bubbles to accelerate convergence. In Phase 1 pretraining of BERT-Base and -Large models, PipeFisher reduces the (simulated) training time to 50-75% compared to training with a first-order optimizer by greatly improving the accelerator utilization and benefiting from the improved convergence by K-FAC.

翻訳日:2023-05-17 00:15:12 公開日:2023-05-13

# 過去が重要なこと:ガウス過程モデルの軌道予測における後続状態の相関

The Past Does Matter: Correlation of Subsequent States in Trajectory Predictions of Gaussian Process Models ( http://arxiv.org/abs/2211.11103v2 )

ライセンス: Link先を確認

Steffen Ridderbusch, Sina Ober-Bl\"obaum, Paul Goulart

(参考訳) 力学系のガウス過程モデルから軌跡の分布を計算することは,そのようなモデルを利用する上で重要な課題である。サンプリングベースアプローチの計算コストに動機づけられ,モデルの出力と軌道分布の近似を考える。従来の不確実性伝播は離散状態空間モデルに焦点をあて、予測された軌道のその後の状態間の独立性の仮定を誤って含んでいた。これらのアイデアを連続常微分方程式モデルに拡張し、この仮定の意義を説明し、ガウス過程の新たな分割線形近似を提案する。

Computing the distribution of trajectories from a Gaussian Process model of a dynamical system is an important challenge in utilizing such models. Motivated by the computational cost of sampling-based approaches, we consider approximations of the model's output and trajectory distribution. We show that previous work on uncertainty propagation, focussed on discrete state-space models, incorrectly included an independence assumption between subsequent states of the predicted trajectories. Expanding these ideas to continuous ordinary differential equation models, we illustrate the implications of this assumption and propose a novel piecewise linear approximation of Gaussian Processes to mitigate them.

翻訳日:2023-05-17 00:14:28 公開日:2023-05-13

# 線形力学系におけるオフラインデータポジショニング攻撃の解析と検出可能性

Analysis and Detectability of Offline Data Poisoning Attacks on Linear Dynamical Systems ( http://arxiv.org/abs/2211.08804v4 )

ライセンス: Link先を確認

Alessio Russo

(参考訳) 近年、データ駆動制御手法に対するデータ中毒攻撃の影響に対する関心が高まっている。毒殺攻撃は機械学習コミュニティではよく知られていますが、これは一般的に線形力学系では持たない、クロスサンプル独立のような仮定を利用しています。したがって、これらのシステムは、i.i.d.\設定の教師付き学習問題のために開発されたものとは異なる攻撃および検出方法を必要とする。多くのデータ駆動制御アルゴリズムは最小二乗推定器を利用するため、統計検査のレンズを通して最小二乗推定値に毒がどのような影響を及ぼすか、また、データ中毒攻撃を検出する方法に疑問を呈する。我々は,データに適合するモデルの集合がシステムの真のモデルを含む条件を定式化し,攻撃者に対する異なる中毒戦略を分析する。そこで本稿では,古典的統計的テストから逃れることのできる最小二乗推定器に対するステルスデータ中毒攻撃を提案し,提案攻撃の有効性を示す。

In recent years, there has been a growing interest in the effects of data poisoning attacks on data-driven control methods. Poisoning attacks are well-known to the Machine Learning community, which, however, make use of assumptions, such as cross-sample independence, that in general do not hold for linear dynamical systems. Consequently, these systems require different attack and detection methods than those developed for supervised learning problems in the i.i.d.\ setting. Since most data-driven control algorithms make use of the least-squares estimator, we study how poisoning impacts the least-squares estimate through the lens of statistical testing, and question in what way data poisoning attacks can be detected. We establish under which conditions the set of models compatible with the data includes the true model of the system, and we analyze different poisoning strategies for the attacker. On the basis of the arguments hereby presented, we propose a stealthy data poisoning attack on the least-squares estimator that can escape classical statistical tests, and conclude by showing the efficiency of the proposed attack.

翻訳日:2023-05-17 00:13:54 公開日:2023-05-13

# Recommenderシステムにおける言語モデリングのPivotalの役割:タスク特化学習とタスク非依存表現学習の強化

Pivotal Role of Language Modeling in Recommender Systems: Enriching Task-specific and Task-agnostic Representation Learning ( http://arxiv.org/abs/2212.03760v5 )

ライセンス: Link先を確認

Kyuyong Shin, Hanock Kwak, Wonjae Kim, Jisu Jeong, Seungjae Jung, Kyung-Min Kim, Jung-Woo Ha, Sang-Woo Lee

(参考訳) 近年,様々なアプリケーションのユーザ行動データを活用する統合ユーザモデリングフレームワークが提案されている。それらの多くは、ユーザの振る舞いシーケンスをプレーンテキストとして利用することで、一般性を失うことなく、任意のドメインやシステム内のリッチな情報を表現することができる。ユーザ履歴コーパスのための言語モデリングは、レコメンダシステムを改善するのに役立つか? その汎用性は、多くのドメインで広く研究されてきたが、レコメンデーションシステムへの応用は、まだ未検討のままである。タスク固有のユーザ履歴に直接適用される言語モデリングは,様々なレコメンデーションタスクにおいて優れた結果が得られることを示す。また、追加のタスクに依存しないユーザ履歴を利用することで、大きなパフォーマンス上のメリットが得られます。さらに,本手法は,未確認領域やサービスにおいても,幅広い実世界のレコメンデータシステムに対して,有望な伝達学習能力を提供できることを示す。

Recent studies have proposed unified user modeling frameworks that leverage user behavior data from various applications. Many of them benefit from utilizing users' behavior sequences as plain texts, representing rich information in any domain or system without losing generality. Hence, a question arises: Can language modeling for user history corpus help improve recommender systems? While its versatile usability has been widely investigated in many domains, its applications to recommender systems still remain underexplored. We show that language modeling applied directly to task-specific user histories achieves excellent results on diverse recommendation tasks. Also, leveraging additional task-agnostic user histories delivers significant performance benefits. We further demonstrate that our approach can provide promising transfer learning capabilities for a broad spectrum of real-world recommender systems, even on unseen domains and services.

翻訳日:2023-05-17 00:06:06 公開日:2023-05-13

# ポート・ハミルトンニューラルネットワークを用いた動的システムの構成学習

Compositional Learning of Dynamical System Models Using Port-Hamiltonian Neural Networks ( http://arxiv.org/abs/2212.00893v2 )

ライセンス: Link先を確認

Cyrus Neary and Ufuk Topcu

(参考訳) 環境と対話するロボットから、大規模なマルチフィジカルシステムまで、多くの動的システムは、多くの相互作用するサブシステムを含んでいる。このようなシステムの複合モデル学習の目的に向けて,本稿で提示する。一構成ニューラルネットワークの枠組み二これらのモデルを訓練するアルゴリズム三学習したモデルを構成する方法四結果の合成モデルの誤差を拘束する理論的結果及び五先入観が知られていないとき、その構成自体を学習する方法ニューラルネットワークのサブモデルは比較的単純なサブシステムによって生成された軌道データに基づいて訓練され、さらに複雑なコンポジットシステムのダイナミクスは、コンポジットシステム自身で生成された追加データを必要としないように予測される。この構成性は、各サブシステムと同様に、ポート-ハミルトンニューラルネットワーク(PHNN)として、ポート-ハミルトン系を帰納バイアスとして用いるニューラル常微分方程式のクラスとして表現することで達成される。 phnnのコレクションは、前もって知られていたり、データから学ばれたりできる、物理に変形した相互接続構造を用いて構成する。本稿では,spring-mass-damperシステムの相互作用に関する数値例を通して,提案フレームワークの新たな機能を示す。非線形エネルギー散逸と制御入力を含むこれらのシステムのモデルは独立に学習される。正確な構成は、新しいモデルをスクラッチからトレーニングするために必要なものと比べて無視できる大量のトレーニングデータを用いて学習される。最後に、複合PHNNはシクロパッシビティのようなポート-ハミルトン系の特性を享受し、制御目的に有用な特性を享受する。

Many dynamical systems -- from robots interacting with their surroundings to large-scale multiphysics systems -- involve a number of interacting subsystems. Toward the objective of learning composite models of such systems from data, we present i) a framework for compositional neural networks, ii) algorithms to train these models, iii) a method to compose the learned models, iv) theoretical results that bound the error of the resulting composite models, and v) a method to learn the composition itself, when it is not known a priori. The end result is a modular approach to learning: neural network submodels are trained on trajectory data generated by relatively simple subsystems, and the dynamics of more complex composite systems are then predicted without requiring additional data generated by the composite systems themselves. We achieve this compositionality by representing the system of interest, as well as each of its subsystems, as a port-Hamiltonian neural network (PHNN) -- a class of neural ordinary differential equations that uses the port-Hamiltonian systems formulation as inductive bias. We compose collections of PHNNs by using the system's physics-informed interconnection structure, which may be known a priori, or may itself be learned from data. We demonstrate the novel capabilities of the proposed framework through numerical examples involving interacting spring-mass-damper systems. Models of these systems, which include nonlinear energy dissipation and control inputs, are learned independently. Accurate compositions are learned using an amount of training data that is negligible in comparison with that required to train a new model from scratch. Finally, we observe that the composite PHNNs enjoy properties of port-Hamiltonian systems, such as cyclo-passivity -- a property that is useful for control purposes.

翻訳日:2023-05-17 00:03:42 公開日:2023-05-13

# 多言語事前学習の促進:多言語モデルのための三角形文書レベル事前学習

Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models ( http://arxiv.org/abs/2212.07752v2 )

ライセンス: Link先を確認

Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Furu Wei

(参考訳) 多言語系列から列への事前学習の成功にもかかわらず、既存のアプローチの多くは、多くの異なる言語における文書レベルの単言語コーパス、文レベルの複言語コーパス、\footnote{in the paperでは、多くの異なる言語ペアにおいて、平行コーパスと「バイリンガル翻訳ペア」を表すために「バイリンガルコーパス」を使用し、それぞれが異なる言語で書かれた2つの文/文書からなる。我々は,3つの文/文書からなる多言語の組み合わせで,'三言語翻訳ペア'と平行コーパスを表すために,'三言語コーパス'を用いる。時に、合成文書レベルのバイリンガルコーパス。これは、文書レベルの変換のような言語間文書レベルのタスクでパフォーマンスを損なう。そこで本研究では,文書レベルの三言語並列コーパスを用いて,多言語前訓練のシーケンシャル・トゥ・シークエンスを改善することを提案する。従来の単言語とバイリンガルの目標を三言語目標に加速する最初の手法として,グラフトリングと呼ばれる新しい手法を用いて,文書レベル \textbf{p}re-training (\textbf{trip}) を提案する。実験により、TRIPは3つの多言語文書レベルの機械翻訳ベンチマークと1つの言語間の抽象的要約ベンチマークにおいて、最大3.11d-BLEU点と8.9ROUGE-L点の一貫性のある改善を含む、強力なSOTAスコアを達成することが示された。

Despite the success of multilingual sequence-to-sequence pre-training, most existing approaches rely on document-level monolingual corpora in many different languages, sentence-level bilingual corpora,\footnote{In this paper, we use `bilingual corpora' to denote parallel corpora with `bilingual translation pairs' in many different language pairs, each consisting of two sentences/documents with the same meaning written in different languages. We use `trilingual corpora' to denote parallel corpora with `trilingual translation pairs' in many different language combinations, each consisting of three sentences/documents.} and sometimes synthetic document-level bilingual corpora. This hampers the performance with cross-lingual document-level tasks such as document-level translation. Therefore, we propose to mine and leverage document-level trilingual parallel corpora to improve sequence-to-sequence multilingual pre-training. We present \textbf{Tri}angular Document-level \textbf{P}re-training (\textbf{TRIP}), which is the first in the field to accelerate the conventional monolingual and bilingual objectives into a trilingual objective with a novel method called Grafting. Experiments show that TRIP achieves several strong state-of-the-art (SOTA) scores on three multilingual document-level machine translation benchmarks and one cross-lingual abstractive summarization benchmark, including consistent improvements by up to 3.11 d-BLEU points and 8.9 ROUGE-L points.

翻訳日:2023-05-16 23:56:23 公開日:2023-05-13

# groma: ディープニューラルネットワークのグローバルロバスト性を測定するツール

gRoMA: a Tool for Measuring Deep Neural Networks Global Robustness ( http://arxiv.org/abs/2301.02288v2 )

ライセンス: Link先を確認

Natan Levy and Raz Yerushalmi and Guy Katz

(参考訳) ディープニューラルネットワーク(DNN)は最先端技術の最前線にあり、さまざまな複雑なタスクにおいて顕著なパフォーマンスを実現している。それでも、航空宇宙分野や自動車分野のような安全クリティカルなシステムへの統合は、敵の入力の脅威(DNNが重大な誤りを犯す可能性のある入力の摂動)のために大きな課題を生んでいる。複数の研究では、現代のDNNでさえ敵の入力に影響を受けやすいことが示されており、このリスクは安全クリティカルシステムへのDNNの配備を可能にするために測定および緩和されなければならない。本稿では,DNNのグローバルな分類的ロバスト性を測定するための確率論的検証手法を実装した,革新的でスケーラブルなgRoMA(global Robustness Measurement and Assessment)を提案する。具体的には、gRoMAは特定の出力カテゴリに対して逆入力に遭遇する確率を測定する。本ツールは,事前学習したブラックボックス分類DNNで動作し,興味のある出力カテゴリに属する入力サンプルを生成する。これは、DNNがこれらの入力の周囲の敵対的な入力に対する感受性を計測し、結果を集約し、DNNの全体的カテゴリー的ロバスト性を小さな境界統計誤差まで推測する。我々は,CIFAR10データセット上で人気のDensenet DNNモデルを用いてツールの評価を行った。結果から, 出力カテゴリーの頑健さに有意な差が認められた。この実験は、我々のアプローチの有用性とスケーラビリティ、およびDNNを重要なシステムに展開できる可能性を示す。

Deep neural networks (DNNs) are at the forefront of cutting-edge technology, and have been achieving remarkable performance in a variety of complex tasks. Nevertheless, their integration into safety-critical systems, such as in the aerospace or automotive domains, poses a significant challenge due to the threat of adversarial inputs: perturbations in inputs that might cause the DNN to make grievous mistakes. Multiple studies have demonstrated that even modern DNNs are susceptible to adversarial inputs; and this risk must thus be measured and mitigated to allow the deployment of DNNs in safety-critical systems. Here, we present gRoMA (global Robustness Measurement and Assessment), an innovative and scalable tool that implements a probabilistic verification approach to measure the global categorial robustness of a DNN. Specifically, gRoMA measures the probability of encountering adversarial inputs for a specific output category. Our tool operates on pre-trained, black-box classification DNNs, and generates input samples belonging to an output category of interest. It measures the DNN's susceptibility to adversarial inputs around these inputs, and aggregates the results to infer the overall global categorial robustness of the DNN up to some small bounded statistical error. We evaluate our tool on the popular Densenet DNN model over the CIFAR10 dataset. Our results reveal significant gaps in the robustness of the different output categories. This experiment demonstrates the usefulness and scalability of our approach, and its potential for allowing DNNs to be deployed within critical systems of interest.

翻訳日:2023-05-16 23:46:41 公開日:2023-05-13

# 非プレーヤ文字対話のオントロジー的忠実生成

Ontologically Faithful Generation of Non-Player Character Dialogues ( http://arxiv.org/abs/2212.10618v2 )

ライセンス: Link先を確認

Nathaniel Weir, Ryan Thomas, Randolph D'Amore, Kellie Hill, Benjamin Van Durme, Harsh Jhamtani

(参考訳) 本稿では,人気ゲーム環境に根ざした言語生成タスクを提案する。 KNUDGE(KNowledge Constrained User-NPC Dialogue GEneration)は、自然言語で記述されたクエストとエンティティ仕様を正確に反映したビデオゲームキャラクター間の対話のツリーを作成するモデルである。クヌージは、オブシディアン・エンタテインメントの『ザ・アウターワールド』のゲームデータから直接引き出されたサイドクエスト対話から構築されており、(1)対話は、発話の線形連鎖とは対照的に木を分岐させ、(2)発話は、ゲームlore -- 人格的ペルソナ、バックストーリー、および人間関係に忠実でありなければならず、(3)対話は、人間のプレイヤーに新しいクエストの詳細を正確に明らかにする必要がある。教師付きおよびコンテキスト内学習技術を用いたニューラルネットワークモデルの結果を報告する。現実的でゲーム品質の対話を創り出す上での課題に対処する上で、今後の作業には有能なパフォーマンスと余地を見出す。

We introduce a language generation task grounded in a popular video game environment. KNUDGE (KNowledge Constrained User-NPC Dialogue GEneration) requires models to produce trees of dialogue between video game characters that accurately reflect quest and entity specifications stated in natural language. KNUDGE is constructed from side quest dialogues drawn directly from game data of Obsidian Entertainment's The Outer Worlds, leading to real-world complexities in generation: (1) dialogues are branching trees as opposed to linear chains of utterances; (2) utterances must remain faithful to the game lore -- character personas, backstories, and entity relationships; and (3) a dialogue must accurately reveal new quest details to the human player. We report results for a set of neural generation models using supervised and in-context learning techniques; we find competent performance but room for future work addressing the challenges of creating realistic, game-quality dialogues.

翻訳日:2023-05-16 23:44:52 公開日:2023-05-13

# LegendreTron: マルチクラスの損失学習が向上

LegendreTron: Uprising Proper Multiclass Loss Learning ( http://arxiv.org/abs/2301.11695v2 )

ライセンス: Link先を確認

Kevin Lam, Christian Walder, Spiridon Penev, Richard Nock

(参考訳) 損失関数は教師付き学習の基礎となり、しばしばモデル開発の前に選択される。損失のアドホックな選択を避けるために、統計的決定理論は、ベイズの法則が最適であると主張する \emph{properness} として知られる損失の望ましい性質を記述する。近年の研究では、emph{learn loss} とモデルの共同開発が試みられている。既存の方法では、逆正準リンク関数を単調に$\mathbb{R}$を$[0,1]$にし、二元問題に対する確率を推定する。本論文では、凸関数の勾配の単調性を用いて、$\mathbb{R}^{C-1}$と予想される確率単純度$\tilde{\Delta}^{C-1}$の間の写像への単調性を拡張する。本稿では,emph{proper canonical loss} と多クラス問題に対する確率を共同で学習する新規かつ実用的な方法として {\sc LegendreTron を提案する。最大1000のクラスを持つドメインのベンチマークでテストした結果、我々のメソッドは10以上のクラスを持つすべてのデータセットで99%の価値がある$t$-testで、自然のマルチクラスベースラインを一貫して上回ります。

Loss functions serve as the foundation of supervised learning and are often chosen prior to model development. To avoid potentially ad hoc choices of losses, statistical decision theory describes a desirable property for losses known as \emph{properness}, which asserts that Bayes' rule is optimal. Recent works have sought to \emph{learn losses} and models jointly. Existing methods do this by fitting an inverse canonical link function which monotonically maps $\mathbb{R}$ to $[0,1]$ to estimate probabilities for binary problems. In this paper, we extend monotonicity to maps between $\mathbb{R}^{C-1}$ and the projected probability simplex $\tilde{\Delta}^{C-1}$ by using monotonicity of gradients of convex functions. We present {\sc LegendreTron} as a novel and practical method that jointly learns \emph{proper canonical losses} and probabilities for multiclass problems. Tested on a benchmark of domains with up to 1,000 classes, our experimental results show that our method consistently outperforms the natural multiclass baseline under a $t$-test at 99% significance on all datasets with greater than 10 classes.

翻訳日:2023-05-16 23:38:12 公開日:2023-05-13

# マルチエージェント強化学習システムにおける直接罰が協調の創発に及ぼす影響の検討

Investigating the Impact of Direct Punishment on the Emergence of Cooperation in Multi-Agent Reinforcement Learning Systems ( http://arxiv.org/abs/2301.08278v2 )

ライセンス: Link先を確認

Nayana Dasgupta, Mirco Musolesi

(参考訳) 協力の解決は機能的社会の創出と維持に不可欠であり、道路の分岐点の航行から炭素削減条約の交渉まで、協調的なジレンマの例である。 AIの利用が社会全体に広まるにつれ、これらの複雑な協調ジレンマをナビゲートできる社会的にインテリジェントなエージェントの必要性がますます明白になりつつある。自然界では、直接罰(direct punishment)は、集団内の協力の出現の恩恵を受ける、ユビキタスな社会的メカニズムである。しかし、社会的ジレンマを経験する人工学習エージェントの集団における協力の発展に先行研究が与える影響は調査されていない。さらに、自然集団内では、いかなる形態の刑罰も、パートナーの選択と評判の関連する社会的メカニズムと強く結びついている。しかし, マルチエージェントシステムにおける協調の出現に, 複数の社会的メカニズムを組み合わせることが及ぼす影響は, これまで検討されていない。そこで,本稿では,マルチエージェント強化学習システムにおける直接的な処罰に関連する行動と学習のダイナミクスを包括的に分析し,パートナー選択と評価の社会的メカニズムと組み合わせることで,第三者の罰と比較する。エージェントが学習した戦略のダイナミクスに対するこれらの重要なメカニズムの影響を広範囲かつ体系的に評価する。最後に,これらのメカニズムが協調型AIシステムの設計に与える影響について論じる。

Solving the problem of cooperation is of fundamental importance to the creation and maintenance of functional societies, with examples of cooperative dilemmas ranging from navigating busy road junctions to negotiating carbon reduction treaties. As the use of AI becomes more pervasive throughout society, the need for socially intelligent agents that are able to navigate these complex cooperative dilemmas is becoming increasingly evident. In the natural world, direct punishment is an ubiquitous social mechanism that has been shown to benefit the emergence of cooperation within populations. However no prior work has investigated its impact on the development of cooperation within populations of artificial learning agents experiencing social dilemmas. Additionally, within natural populations the use of any form of punishment is strongly coupled with the related social mechanisms of partner selection and reputation. However, no previous work has considered the impact of combining multiple social mechanisms on the emergence of cooperation in multi-agent systems. Therefore, in this paper we present a comprehensive analysis of the behaviours and learning dynamics associated with direct punishment in multi-agent reinforcement learning systems and how it compares to third-party punishment, when both are combined with the related social mechanisms of partner selection and reputation. We provide an extensive and systematic evaluation of the impact of these key mechanisms on the dynamics of the strategies learned by agents. Finally, we discuss the implications of the use of these mechanisms on the design of cooperative AI systems.

翻訳日:2023-05-16 23:37:10 公開日:2023-05-13

# フェデレーションレコメンデーションにおける二重パーソナライズ

Dual Personalization on Federated Recommendation ( http://arxiv.org/abs/2301.08143v2 )

ライセンス: Link先を確認

Chunxu Zhang, Guodong Long, Tianyi Zhou, Peng Yan, Zijian Zhang, Chengqi Zhang, Bo Yang

(参考訳) フェデレーションレコメンデーション(federated recommendation)は、プライバシー保護レコメンデーションサービスをフェデレーション設定で提供する、新しいインターネットサービスアーキテクチャである。既存のソリューションは、分散レコメンデーションアルゴリズムとプライバシ保護メカニズムを組み合わせるために使用される。したがって、本質的にはサーバでヘビーウェイトモデルの形をとり、デバイス上のインテリジェントモデルのエンドユーザへのデプロイを妨げる。本稿では、サーバ上の重み付けモデルではなく、スマートデバイスにデプロイされる多くのユーザ固有の軽量モデルを学ぶために、Personalized Federated Recommendation(PFedRec)フレームワークを提案する。さらに,ユーザとアイテムの両方の詳細なパーソナライズを効果的に学習するための,新たな二重パーソナライズ機構を提案する。全体的な学習プロセスは統合された最適化フレームワークに定式化される。具体的には、フェデレーションシステムでユーザ間でまったく同じアイテム埋め込みを共有する従来の方法とは異なり、デュアルパーソナライズにより、各ユーザがアイテム埋め込みを穏やかに微調整することで、アイテム表現に対するユーザ固有のビューを生成し、既存のフェデレーション推奨メソッドに統合して、すぐに改善を得られるようになる。複数のベンチマークデータセットの実験では、PFedRecと二重パーソナライゼーション機構の有効性が実証されている。さらに,アイテム埋め込みにおけるパーソナライズ手法の可視化と詳細な分析を行い,フェデレーション設定におけるレコメンダシステムの設計に関する新たな知見を得た。コードは利用可能です。

Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions are used to combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus it inherently takes the form of heavyweight models at the server and hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views for item representations which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed novel insights on the design of recommender systems in federated settings. The code is available.

翻訳日:2023-05-16 23:36:48 公開日:2023-05-13

# 推定特徴属性に対する負のフラックス凝集

Negative Flux Aggregation to Estimate Feature Attributions ( http://arxiv.org/abs/2301.06989v2 )

ライセンス: Link先を確認

Xin Li, Deng Pan, Chengyin Li, Yao Qiang and Dongxiao Zhu

(参考訳) セキュリティや透明性の懸念が高まる中で、ディープニューラルネットワーク(DNN)の動作を理解する必要性が高まっている。ディープニューラルネットワークアーキテクチャの多層非線形性のため、DNN予測の説明は依然として未解決の問題であり、メカニズムの深い理解を妨げている。 DNNの説明可能性を高めるために,分岐とフラックスを用いた予測課題に対する入力特徴の属性を推定する。ベクトル解析における発散定理に着想を得て,新しい負流束集合(neflag)定式化法と帰属写像を推定するための効率的な近似アルゴリズムを開発した。以前の技術とは異なり、私たちの手法はサーロゲートモデルに適合したり、勾配のパス統合を必要としたりしません。定性的かつ定量的な実験は、競合する方法よりも忠実な帰属写像を生成する上で、NeFLAGの優れた性能を示す。我々のコードは \url{https://github.com/xinli0928/NeFLAG} で入手できる。

There are increasing demands for understanding deep neural networks' (DNNs) behavior spurred by growing security and/or transparency concerns. Due to multi-layer nonlinearity of the deep neural network architectures, explaining DNN predictions still remains as an open problem, preventing us from gaining a deeper understanding of the mechanisms. To enhance the explainability of DNNs, we estimate the input feature's attributions to the prediction task using divergence and flux. Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate attribution map. Unlike the previous techniques, ours doesn't rely on fitting a surrogate model nor need any path integration of gradients. Both qualitative and quantitative experiments demonstrate a superior performance of NeFLAG in generating more faithful attribution maps than the competing methods. Our code is available at \url{https://github.com/xinli0928/NeFLAG}

翻訳日:2023-05-16 23:35:49 公開日:2023-05-13

# MPAS-Oとグローバルドリフトデータセットの動的データ同化

Dynamic Data Assimilation of MPAS-O and the Global Drifter Dataset ( http://arxiv.org/abs/2301.05551v2 )

ライセンス: Link先を確認

Derek DeSantis, Ayan Biswas, Earl Lawrence, Phillip Wolfram

(参考訳) 本研究では,海洋における温度予測の精度を向上させるために,地球系モデル(esms)とin situ buoy測定を組み合わせた新しい手法を提案する。この技術はesmで識別されるダイナミクスとモードを利用して、季節性などの特徴を保ちながらブイ測定の精度を向上させる。この手法を用いることで,MPAS-Oモデルによる局所温度予測の誤差を補正することができる。提案手法は他の補間法やデータ同化法に比べて精度が向上することを示す。本手法は,グローバル・ドリフト・プログラムの海洋ブイデータセットを用いて,スケールス・オーシャン・コンポーネント (mpas-o) の予測モデルを適用した。

In this study, we propose a new method for combining in situ buoy measurements with Earth system models (ESMs) to improve the accuracy of temperature predictions in the ocean. The technique utilizes the dynamics and modes identified in ESMs to improve the accuracy of buoy measurements while still preserving features such as seasonality. Using this technique, errors in localized temperature predictions made by the MPAS-O model can be corrected. We demonstrate that our approach improves accuracy compared to other interpolation and data assimilation methods. We apply our method to assimilate the Model for Prediction Across Scales Ocean component (MPAS-O) with the Global Drifter Program's in-situ ocean buoy dataset.

翻訳日:2023-05-16 23:35:33 公開日:2023-05-13

# 国家の安全強化学習に関する調査

State-wise Safe Reinforcement Learning: A Survey ( http://arxiv.org/abs/2302.03122v2 )

ライセンス: Link先を確認

Weiye Zhao, Tairan He, Rui Chen, Tianhao Wei, Changliu Liu

(参考訳) シミュレーション環境でRL(Reinforcement Learning)アルゴリズムが驚くほど成功したにもかかわらず、実世界のアプリケーションにRLを適用することは、まだ多くの課題に直面している。主な懸念事項は安全性、つまり制約満足度である。状態毎の制約は、現実世界のアプリケーションで最も一般的な制約の1つであり、safe rlで最も難しい制約の1つです。自律運転やロボット操作など,多くの課題に対して,国家的制約の実施が不可欠である。本稿では、RLにおける状態制約に対処する既存のアプローチを包括的にレビューする。 SCMDP(State-wise Constrained Markov Decision Process)の枠組みの下で、既存のアプローチの関連、相違、トレードオフについて議論する。 (i)安全性の保証と拡張性。 (ii)安全と報酬の成果、及び (iii)収束後及び訓練中の安全性。また,現在の手法の限界を要約し,今後の方向性について考察する。

Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in another word, constraint satisfaction. State-wise constraints are one of the most common constraints in real-world applications and one of the most challenging constraints in Safe RL. Enforcing state-wise constraints is necessary and essential to many challenging tasks such as autonomous driving, robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of State-wise Constrained Markov Decision Process (SCMDP), we will discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantee and scalability, (ii) safety and reward performance, and (iii) safety after convergence and during training. We also summarize limitations of current methods and discuss potential future directions.

翻訳日:2023-05-16 23:28:01 公開日:2023-05-13

# リスク分解による自己指導型学習の評価

Evaluating Self-Supervised Learning via Risk Decomposition ( http://arxiv.org/abs/2302.03068v2 )

ライセンス: Link先を確認

Yann Dubois and Tatsunori Hashimoto and Percy Liang

(参考訳) 自己教師付き学習(SSL)パイプラインは、アーキテクチャや拡張、事前トレーニングデータなど、多くの設計上の選択肢が異なる。しかし、SSLは通常、1つのメトリックを使って評価される。これにより、モデルがなぜ、いつ、どのように改善されるのか、多くの洞察が得られない。そこで本研究では,表現学習ステップから生じる誤りを考慮し,古典的教師付き近似推定分解を一般化したsslリスク分解を提案する。分解は,近似,表現ユーザビリティ,プローブ一般化,エンコーダ一般化の4つの誤差成分からなる。我々は,各コンポーネントに対して効率的な推定器を提供し,imagenet で評価した 169 ssl ビジョンモデルに対する30 の設計選択の影響を分析する。私たちの分析はSSLモデルを設計、使用するための貴重な洞察を与えます。例えば、エラーの主なソースを強調し、エラーコンポーネントのトレーディングによって特定の設定(フル対数ショット)でSSLを改善する方法を示している。すべての結果と事前訓練されたモデルはhttps://github.com/YannDubs/SSL-Risk-Decompositionにある。

Self-supervised learning (SSL) pipelines differ in many design choices such as the architecture, augmentations, or pretraining data. Yet SSL is typically evaluated using a single metric: linear probing on ImageNet. This does not provide much insight into why or when a model is better, now how to improve it. To address this, we propose an SSL risk decomposition, which generalizes the classical supervised approximation-estimation decomposition by considering errors arising from the representation learning step. Our decomposition consists of four error components: approximation, representation usability, probe generalization, and encoder generalization. We provide efficient estimators for each component and use them to analyze the effect of 30 design choices on 169 SSL vision models evaluated on ImageNet. Our analysis gives valuable insights for designing and using SSL models. For example, it highlights the main sources of error and shows how to improve SSL in specific settings (full- vs few-shot) by trading off error components. All results and pretrained models are at https://github.com/YannDubs/SSL-Risk-Decomposition.

翻訳日:2023-05-16 23:27:47 公開日:2023-05-13

# 知識グラフ補完のための二重置換等価性

Double Permutation Equivariance for Knowledge Graph Completion ( http://arxiv.org/abs/2302.01313v4 )

ライセンス: Link先を確認

Jianfei Gao, Yangze Zhou, Bruno Ribeiro

(参考訳) この研究は知識グラフ(kgs)を、二重交換可能な有理グラフを表す新しいグラフのクラスとして形式化し、ノードとペアワイズ(joint 2-node)表現は、ノードidとエッジ(&node)属性(relation & node feature)の両方の置換に同値でなければならない。二重置換同変 KG 表現は KG の新しい研究方向を開く。この等分散は、ニューラルネットワークが複雑な論理推論タスクをkgsで実行できるようにする関係の構造的表現を課す。最後に,このような等価表現に対する一般的な青写真を導入し,wn18rr,fb237,nell995インダクティブkg完了タスクにおいて最先端のhis@10テスト精度を達成し,既存の手法では実行できない論理的推論タスクを最善の知識に対して正確に実行可能にする,単純なgnnベースの二重置換同変ニューラルネットワークアーキテクチャをテストする。

This work provides a formalization of Knowledge Graphs (KGs) as a new class of graphs that we denote doubly exchangeable attributed graphs, where node and pairwise (joint 2-node) representations must be equivariant to permutations of both node ids and edge (& node) attributes (relations & node features). Double-permutation equivariant KG representations open a new research direction in KGs. We show that this equivariance imposes a structural representation of relations that allows neural networks to perform complex logical reasoning tasks in KGs. Finally, we introduce a general blueprint for such equivariant representations and test a simple GNN-based double-permutation equivariant neural architecture that achieve state-of-the-art Hits@10 test accuracy in the WN18RR, FB237 and NELL995 inductive KG completion tasks, and can accurately perform logical reasoning tasks that no existing methods can perform, to the best of our knowledge.

翻訳日:2023-05-16 23:27:12 公開日:2023-05-13

# 非自己回帰テキスト生成のための拡散モデル:調査

Diffusion Models for Non-autoregressive Text Generation: A Survey ( http://arxiv.org/abs/2303.06574v2 )

ライセンス: Link先を確認

Yifan Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

(参考訳) 非自己回帰(NAR)テキスト生成は、推論遅延を大幅に低減するが、生成精度を犠牲にする自然言語処理の分野で大きな注目を集めている。近年,narテキスト生成に潜伏型可変生成モデルのクラスである拡散モデルが導入され,テキスト生成品質が向上している。本稿では,NARテキスト生成における拡散モデルの最近の進歩を概観する。背景として,まず拡散モデルとテキスト拡散モデルの一般定義を提示し,ナル生成のメリットについて考察する。コアコンテンツとして,既存のテキスト拡散における2つの主流拡散モデルを紹介し,拡散過程の重要な設計について検討する。さらに,テキスト拡散モデルにおける事前学習言語モデル(PLM)の利用について検討し,テキストデータの最適化手法を導入する。最後に,いくつかの有望な方向性について議論し,本論文をまとめる。本研究の目的は,NAR生成のためのテキスト拡散モデルに関する研究の体系的な参照を提供することである。我々はテキスト拡散モデルの集合をhttps://github.com/RUCAIBox/Awesome-Text-Diffusion-Modelsで紹介する。

Non-autoregressive (NAR) text generation has attracted much attention in the field of natural language processing, which greatly reduces the inference latency but has to sacrifice the generation accuracy. Recently, diffusion models, a class of latent variable generative models, have been introduced into NAR text generation, showing an improved text generation quality. In this survey, we review the recent progress in diffusion models for NAR text generation. As the background, we first present the general definition of diffusion models and the text diffusion models, and then discuss their merits for NAR generation. As the core content, we further introduce two mainstream diffusion models in existing work of text diffusion, and review the key designs of the diffusion process. Moreover, we discuss the utilization of pre-trained language models (PLMs) for text diffusion models and introduce optimization techniques for text data. Finally, we discuss several promising directions and conclude this paper. Our survey aims to provide researchers with a systematic reference of related research on text diffusion models for NAR generation. We present our collection of text diffusion models at https://github.com/RUCAIBox/Awesome-Text-Diffusion-Models.

翻訳日:2023-05-16 23:08:12 公開日:2023-05-13

# 漢字命名におけるトランスフォーマーモデルの評価と人間の行動

Evaluating Transformer Models and Human Behaviors on Chinese Character Naming ( http://arxiv.org/abs/2303.12294v2 )

ライセンス: Link先を確認

Xiaomeng Ma and Lingyu Gao

(参考訳) ニューラルネットワークモデルは、多くのアルファベット言語に対する人間のグラファイム・音素マッピングプロセスを説明するために提案されている。これらのモデルは、文字文字列とその発音の対応をうまく学習しただけでなく、人間の振る舞いを言葉命名タスクで捉えた。ニューラルネットワークは、非アルファベット言語(例えば中国語)の未知文字タスクに対してどのように機能するか? モデルはどの程度人間の行動を捉えますか? 本研究では,まず未知の漢字命名課題に対する話者の回答を収集し,その性能を未知の漢字命名課題における人間の行動と比較し,トランスフォーマーモデルの評価を行った。モデルと人間は同じような振る舞いをしており、各キャラクタに類似した精度分布を持ち、回答にかなりの重複があることが判明した。さらに、モデルの回答は人間の回答と非常に相関している。これらの結果はトランスモデルが人間のキャラクタ命名行動をうまく捉えていることを示唆している。

Neural network models have been proposed to explain the grapheme-phoneme mapping process in humans for many alphabet languages. These models not only successfully learned the correspondence of the letter strings and their pronunciation, but also captured human behavior in nonce word naming tasks. How would the neural models perform for a non-alphabet language (e.g., Chinese) unknown character task? How well would the model capture human behavior? In this study, we first collect human speakers' answers on unknown character naming tasks and then evaluate a set of transformer models by comparing their performances with human behaviors on an unknown Chinese character naming task. We found that the models and humans behaved very similarly, that they had similar accuracy distribution for each character, and had a substantial overlap in answers. In addition, the models' answers are highly correlated with humans' answers. These results suggested that the transformer models can well capture human's character naming behavior.

翻訳日:2023-05-16 22:59:47 公開日:2023-05-13

# 開量子系に対する一般化ブラウン粒子としてのディシパトン:ディシパトン埋め込み量子マスター方程式

Dissipatons as generalized Brownian particles for open quantum systems: Dissipaton-embedded quantum master equation ( http://arxiv.org/abs/2303.10666v2 )

ライセンス: Link先を確認

Xiang Li, Yu Su, Zi-Hao Chen, Yao Wang, Rui-Xue Xu, Xiao Zheng, YiJing Yan

(参考訳) ディシパトン理論はオープン量子系力学を扱うための正確で非摂動的なアプローチとして提案され、ガウス環境の影響はディシパトンと呼ばれる統計的準粒子によって特徴づけられる。本研究では、ディシパトン運動方程式を再検討し、同値なディシパトン埋め込み量子マスター方程式(dqme)を確立し、一般化されたブラウン粒子としてディシパトンを生成する。この論文で説明されているように、dqmeはディシパトンと物理的に支持されるハイブリッドバスモードの統計特性を調べるための直接的なアプローチを提供する。電子移動モデルを用いて数値実験を行い, 溶媒和座標の過渡的統計特性を示す。

Dissipaton theory had been proposed as an exact and nonperturbative approach to deal with open quantum system dynamics, where the influence of Gaussian environment is characterized by statistical quasi-particles named as dissipatons. In this work, we revisit the dissipaton equation of motion theory and establish an equivalent dissipatons-embedded quantum master equation (DQME), which gives rise to dissipatons as generalized Brownian particles. As explained in this work, the DQME supplies a direct approach to investigate the statistical characteristics of dissipatons and thus the physically supporting hybrid bath modes. Numerical demonstrations are carried out on the electron transfer model, exhibiting the transient statistical properties of the solvation coordinate.

翻訳日:2023-05-16 22:59:32 公開日:2023-05-13

# 人間中心設計のための人工共感に向けて:フレームワーク

Toward Artificial Empathy for Human-Centered Design: A Framework ( http://arxiv.org/abs/2303.10583v2 )

ライセンス: Link先を確認

Qihao Zhu and Jianxi Luo

(参考訳) 設計プロセスの初期段階では、デザイナは未完成のニーズを発見し、潜在的な解決策として革新的な概念を開発することで機会を探る。人間中心のデザインの観点からは、デザイナーはニーズを真に理解するために、人々と共感しなくてはならない。しかし、共感の発達は、デザイナーの共感能力に大きく依存する複雑で主観的なプロセスである。したがって、共感的理解の発達は直感的であり、基礎となるニーズの発見はしばしばセレンディピティである。本稿では,AIによる人間中心設計の今後の方向性を示すために,人工知能研究からの洞察を提供することを目的としている。具体的には,データ駆動型ユーザ研究,共感的理解開発,人為的共感など研究分野を学際的に調査する。本稿では,人間中心設計において人工共感が果たす役割を論じ,人間中心設計のための人工共感フレームワークを提案する。共感の背後にあるメカニズムと共感設計の研究からの洞察に基づいて、このフレームワークは共感のかなり複雑で主観的な概念を計算的にモデル化できるコンポーネントとモジュールに分解することを目的としている。さらに,このようなシステムを開発することの期待できる利点を議論し,今後の研究努力を促進するための現在の研究ギャップを明らかにする。

In the early stages of the design process, designers explore opportunities by discovering unmet needs and developing innovative concepts as potential solutions. From a human-centered design perspective, designers must develop empathy with people to truly understand their needs. However, developing empathy is a complex and subjective process that relies heavily on the designer's empathic capability. Therefore, the development of empathic understanding is intuitive, and the discovery of underlying needs is often serendipitous. This paper aims to provide insights from artificial intelligence research to indicate the future direction of AI-driven human-centered design, taking into account the essential role of empathy. Specifically, we conduct an interdisciplinary investigation of research areas such as data-driven user studies, empathic understanding development, and artificial empathy. Based on this foundation, we discuss the role that artificial empathy can play in human-centered design and propose an artificial empathy framework for human-centered design. Building on the mechanisms behind empathy and insights from empathic design research, the framework aims to break down the rather complex and subjective concept of empathy into components and modules that can potentially be modeled computationally. Furthermore, we discuss the expected benefits of developing such systems and identify current research gaps to encourage future research efforts.

翻訳日:2023-05-16 22:59:02 公開日:2023-05-13

# 間隔密結合戦略に基づくスウィントランスを用いた低画質画像の解像度向上処理

Resolution Enhancement Processing on Low Quality Images Using Swin Transformer Based on Interval Dense Connection Strategy ( http://arxiv.org/abs/2303.09190v2 )

ライセンス: Link先を確認

Rui-Yang Ju, Chih-Chia Chen, Jen-Shiun Chiang, Yu-Shian Lin, Wei-Han Chen, Chun-Tse Chien

(参考訳) 本手法は,畳み込みニューラルネットワーク(cnns)に基づく手法と比較して,画像の超解像性能が著しく向上した。しかし、画像から特徴情報を抽出するために、SwinIR (Image Restoration Using Swin Transformer) のような自己保持機構を使用するには、膨大な量の計算資源が必要であるため、低計算パワープラットフォームへの応用が制限される。モデル機能の再利用を改善するため,新たに設計されたアルゴリズムに従って異なるブロックを接続するインターバルDense Connection Strategyを提案する。我々はこの戦略をSwinIRに適用し、SwinOIR (Object Image Restoration using Swin Transformer) と名付けた新しいモデルを提案する。画像の超解像に対して,区間密結合戦略がモデル性能に及ぼす影響を示すため,アブレーション実験を行った。さらに,このモデルを様々なベンチマークデータセット上で評価し,他のSOTA(State-of-the-art)軽量モデルと比較した。例えば、SwinOIRはUrban100データセットの超高解像度化のために26.62dBのPSNRを取得し、これはSOTAモデルSwinIRよりも0.15dB高い。本研究は, リアルタイムアプリケーションにおいて, YOLOv8(You Only Look Once)モデルの最後のバージョンと提案モデルを適用し, 低画質画像上でオブジェクト検出とリアルタイム画像の超解像を行う。この実装コードはhttps://github.com/Rubbbbbbby/SwinOIRで公開されている。

The Transformer-based method has demonstrated remarkable performance for image super-resolution in comparison to the method based on the convolutional neural networks (CNNs). However, using the self-attention mechanism like SwinIR (Image Restoration Using Swin Transformer) to extract feature information from images needs a significant amount of computational resources, which limits its application on low computing power platforms. To improve the model feature reuse, this research work proposes the Interval Dense Connection Strategy, which connects different blocks according to the newly designed algorithm. We apply this strategy to SwinIR and present a new model, which named SwinOIR (Object Image Restoration Using Swin Transformer). For image super-resolution, an ablation study is conducted to demonstrate the positive effect of the Interval Dense Connection Strategy on the model performance. Furthermore, we evaluate our model on various popular benchmark datasets, and compare it with other state-of-the-art (SOTA) lightweight models. For example, SwinOIR obtains a PSNR of 26.62 dB for x4 upscaling image super-resolution on Urban100 dataset, which is 0.15 dB higher than the SOTA model SwinIR. For real-life application, this work applies the lastest version of You Only Look Once (YOLOv8) model and the proposed model to perform object detection and real-life image super-resolution on low-quality images. This implementation code is publicly available at https://github.com/Rubbbbbbbbby/SwinOIR.

翻訳日:2023-05-16 22:58:05 公開日:2023-05-13

# Global Prompt Cell: 効率的なPromptチューニングのためのポータブルモジュール

Global Prompt Cell: A Portable Control Module for Effective Prompt Tuning ( http://arxiv.org/abs/2304.05642v2 )

ライセンス: Link先を確認

Chi Liu, Haochun Wang, Nuwa Xi, Sendong Zhao, Bing Qin

(参考訳) 事前訓練されたモデルをチューニングするための新しいアプローチとして、プロンプトチューニングは、第1層の入力にトレーニング可能な埋め込みを挿入しながら、下流タスクでパラメータを凍結する。しかし,従来の手法は主に,プロンプト埋め込みの初期化に重点を置いている。適切な方法で迅速な埋め込みを訓練し活用する戦略は、迅速なチューニングの有効性の制限要因となっている。この問題に対処するために,すべてのエンコーダ層にまたがるプロンプト情報を選択的に保存するプロンプトチューニングモジュールであるGPC(Global Prompt Cell)を導入する。実験の結果,バニラプロンプトチューニングと比較して,SuperGLUEデータセットは5.8%改善した。

As a novel approach to tuning pre-trained models, prompt tuning involves freezing the parameters in downstream tasks while inserting trainable embeddings into inputs in the first layer. However, previous methods have mainly focused on the initialization of prompt embeddings. The strategy of training and utilizing prompt embeddings in a reasonable way has become a limiting factor in the effectiveness of prompt tuning. To address this issue, we introduce the Global Prompt Cell (GPC), a portable control module for prompt tuning that selectively preserves prompt information across all encoder layers. Our experimental results demonstrate a 5.8% improvement on SuperGLUE datasets compared to vanilla prompt tuning.

翻訳日:2023-05-16 22:50:01 公開日:2023-05-13

# DDP:高密度視覚予測のための拡散モデル

DDP: Diffusion Model for Dense Visual Prediction ( http://arxiv.org/abs/2303.17559v2 )

ライセンス: Link先を確認

Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

(参考訳) 本研究では,条件拡散パイプラインに基づく高密度視覚予測のための簡易かつ効率的かつ強力なフレームワークを提案する。提案手法は,ランダムなガウス分布からノイズを段階的に除去して予測する「ノイズ・ツー・マップ」生成パラダイムに従う。 DDPと呼ばれるこの手法は、デノナイジング拡散過程を現代の知覚パイプラインに効率的に拡張する。タスク固有の設計とアーキテクチャのカスタマイズがなければ、DDPはセマンティックセグメンテーションや深さ推定といった最も密集した予測タスクに簡単に一般化できる。さらにDDPは,従来の一段階判別法とは対照的に,動的推論や不確実性認識などの魅力的な特性を示す。 3つの代表的なタスクで,6つのベンチマークで上位結果を示し,トリックを伴わずに,ddpは各タスクの最高性能や競争性能を,専門家と比較した。例えば、セマンティックセグメンテーション (83.9 mIoU on Cityscapes)、BEVマップセグメンテーション (70.6 mIoU on nuScenes)、深さ推定 (0.05 REL on KITTI) などがある。私たちのアプローチが、堅固なベースラインとなり、将来の研究を促進することを願っています。

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline. Our approach follows a "noise-to-map" generative paradigm for prediction by progressively removing noise from a random Gaussian distribution, guided by the image. The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline. Without task-specific design and architecture customization, DDP is easy to generalize to most dense prediction tasks, e.g., semantic segmentation and depth estimation. In addition, DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods. We show top results on three representative tasks with six diverse benchmarks, without tricks, DDP achieves state-of-the-art or competitive performance on each task compared to the specialist counterparts. For example, semantic segmentation (83.9 mIoU on Cityscapes), BEV map segmentation (70.6 mIoU on nuScenes), and depth estimation (0.05 REL on KITTI). We hope that our approach will serve as a solid baseline and facilitate future research

翻訳日:2023-05-16 22:48:05 公開日:2023-05-13

# 自然言語の推論, 調査

Natural Language Reasoning, A Survey ( http://arxiv.org/abs/2303.14725v2 )

ライセンス: Link先を確認

Fei Yu, Hongbo Zhang, Prayag Tiwari, Benyou Wang

(参考訳) 本稿では,自然言語処理(NLP)分野における自然言語推論について,概念的にも実用的にも,より明確な視点を提案する。概念的には、我々は、哲学とNLPシナリオの両方に基づいて、NLPにおける自然言語推論の明確な定義を提供し、どのタスクが推論を必要とするかを議論し、推論の分類を導入します。本稿は,古典論理推論,自然言語推論,マルチホップ質問応答,コモンセンス推論を中心に,NLPにおける自然言語推論に関する総合的な文献レビューを行う。本稿は,多段階推論の強力なパラダイムである後方推論を同定し,考察し,自然言語推論研究における最も重要な将来方向の1つとしてデファシブル推論を導入する。ニューロシンボリック手法と数学的推論を除外し,単一モダリティ非構造化自然言語テキストに注目した。

This survey paper proposes a clearer view of natural language reasoning in the field of Natural Language Processing (NLP), both conceptually and practically. Conceptually, we provide a distinct definition for natural language reasoning in NLP, based on both philosophy and NLP scenarios, discuss what types of tasks require reasoning, and introduce a taxonomy of reasoning. Practically, we conduct a comprehensive literature review on natural language reasoning in NLP, mainly covering classical logical reasoning, natural language inference, multi-hop question answering, and commonsense reasoning. The paper also identifies and views backward reasoning, a powerful paradigm for multi-step reasoning, and introduces defeasible reasoning as one of the most important future directions in natural language reasoning research. We focus on single-modality unstructured natural language text, excluding neuro-symbolic techniques and mathematical reasoning.

翻訳日:2023-05-16 22:47:09 公開日:2023-05-13

# 行動検索:ラベルなしデータセットのクエリによるマイテーション学習

Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets ( http://arxiv.org/abs/2304.08742v2 )

ライセンス: Link先を確認

Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn

(参考訳) データ効率のよい方法で新しい視覚運動のスキルを習得するロボットの開発は、無数の課題に対して未解決の問題である。この問題に対処するための一般的なパラダイムは、多くの振る舞いを持つ大きなラベルのないデータセットを活用して、少数のタスク固有の人的監督(例えば介入やデモンストレーション)を使用して特定のタスクにポリシーを適用することである。しかし、タスク固有の監督を狭くし、オフラインデータとバランスをとるのがいかに最適かは、未解決の問題である。この研究における私たちの重要な洞察は、タスク固有のデータはエージェントがトレーニングする新しいデータを提供するだけでなく、エージェントが学習に使用するべき事前データの種類を知らせることもできます。具体的には、少量のダウンストリーム専門家データを使用して、オフラインでラベルなしのデータセット(多くのサブ最適動作を含む)から関連する振る舞いを選択的にクエリするシンプルなアプローチを提案する。エージェントは専門家とクエリーデータで共同で訓練される。提案手法はタスクへの関連する遷移のみをクエリし、サブ最適またはタスク不要なデータをフィルタリングすることを学習する。これにより、タスク固有のデータとオフラインのデータの混合からより効果的に学習することができる。さらに,画像からロボット操作タスクをシミュレートすることで,より複雑な目標条件付け手法を20%向上させることができた。ビデオやコードについてはhttps://sites.google.com/view/behaviorretrievalを参照。

Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is through leveraging large unlabeled datasets that have many behaviors in them and then adapting a policy to a specific task using a small amount of task-specific human supervision (i.e. interventions or demonstrations). However, how best to leverage the narrow task-specific supervision and balance it with offline data remains an open question. Our key insight in this work is that task-specific data not only provides new data for an agent to train on but can also inform the type of prior data the agent should use for learning. Concretely, we propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset (including many sub-optimal behaviors). The agent is then jointly trained on the expert and queried data. We observe that our method learns to query only the relevant transitions to the task, filtering out sub-optimal or task-irrelevant data. By doing so, it is able to learn more effectively from the mix of task-specific and offline data compared to naively mixing the data or only using the task-specific data. Furthermore, we find that our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images. See https://sites.google.com/view/behaviorretrieval for videos and code.

翻訳日:2023-05-16 21:03:16 公開日:2023-05-13

# SweCTRL-Mini:スウェーデンにおける制御可能なテキスト生成のためのデータ透過トランスフォーマーに基づく大規模言語モデル

SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish ( http://arxiv.org/abs/2304.13994v2 )

ライセンス: Link先を確認

Dmytro Kalpakchi, Johan Boye

(参考訳) SweCTRL-Miniは,1つのコンシューマグレードGPU上での推論と微調整に使用できる,スウェーデンの大規模言語モデルである。このモデルはKeskar, McCann, Varshney, Xiong, Socher (2019)によるCTRLアーキテクチャに基づいており、SweCTRL-Miniモデルのユーザは生成プロンプトに特別なトークンを挿入することで生成されたテキストのジャンルを制御できる。 SweCTRL-MiniはスウェーデンのmC4コーパスのサブセットとスウェーデンの小説のセットで訓練されている。本稿では,(1)使用済みの訓練データとテキストの前処理ステップの詳細な説明,(2)特定のフレーズ/ソースが訓練データの一部であったかどうかの確認,(2)自動評価手法と生成課題を用いた判別作業におけるモデルの評価について述べる。また,モデル生成能力とGPT-3の比較を行った。 SweCTRL-Miniは完全にオープンで、ダウンロードできる。

We present SweCTRL-Mini, a large Swedish language model that can be used for inference and fine-tuning on a single consumer-grade GPU. The model is based on the CTRL architecture by Keskar, McCann, Varshney, Xiong, and Socher (2019), which means that users of the SweCTRL-Mini model can control the genre of the generated text by inserting special tokens in the generation prompts. SweCTRL-Mini is trained on a subset of the Swedish part of the mC4 corpus and a set of Swedish novels. In this article, we provide (1) a detailed account of the utilized training data and text pre-processing steps, to the extent that it is possible to check whether a specific phrase/source was a part of the training data, and (2) an evaluation of the model on both discriminative tasks, using automatic evaluation methods, and generative tasks, using human referees. We also compare the generative capabilities of the model with those of GPT-3. SweCTRL-Mini is fully open and available for download.

翻訳日:2023-05-16 20:53:45 公開日:2023-05-13

# テンソルネットワークに基づく量子スピン系の還元基底サロゲート

Reduced basis surrogates for quantum spin systems based on tensor networks ( http://arxiv.org/abs/2304.13587v2 )

ライセンス: Link先を確認

Paul Brehmer, Michael F. Herbst, Stefan Wessel, Matteo Rizzi, Benjamin Stamm

(参考訳) 還元基底法アプローチでは、例えば基底状態の位相図を調べるために、量子多体ヒルベルト空間の有効な低次元部分空間を構築する。この部分空間の基盤はスナップショットの解、すなわち、特定のパラメータ値と well-chosen パラメータ値に対応する基底状態から成り立っている。本稿では, 行列積状態(MPS)計算に基づいて, 還元基底を組み立て, パラメータ点を選択するための欲求戦略について述べる。減少基底が得られれば、位相図の計算に必要な可観測性は任意のパラメータ値のヒルベルト空間とは無関係な計算複雑性で計算することができる。本稿では、異方性および双曲面交換相互作用を含む、異なる1次元量子スピン-1モデルに対するこのアプローチの効率と精度を示し、リッチ量子位相図を導出する。

Within the reduced basis methods approach, an effective low-dimensional subspace of a quantum many-body Hilbert space is constructed in order to investigate, e.g., the ground-state phase diagram. The basis of this subspace is built from solutions of snapshots, i.e., ground states corresponding to particular and well-chosen parameter values. Here, we show how a greedy strategy to assemble the reduced basis and thus to select the parameter points can be implemented based on matrix-product-states (MPS) calculations. Once the reduced basis has been obtained, observables required for the computation of phase diagrams can be computed with a computational complexity independent of the underlying Hilbert space for any parameter value. We illustrate the efficiency and accuracy of this approach for different one-dimensional quantum spin-1 models, including anisotropic as well as biquadratic exchange interactions, leading to rich quantum phase diagrams.

翻訳日:2023-05-16 20:53:23 公開日:2023-05-13

# すべてのモデルはローカルである: 外部バリデーションをリカレントローカルバリデーションに置き換える時間

All models are local: time to replace external validation with recurrent local validation ( http://arxiv.org/abs/2305.03219v2 )

ライセンス: Link先を確認

Alex Youssef, Michael Pencina, Anshul Thakur, Tingting Zhu, David Clifton, Nigam H. Shah

(参考訳) 外部検証はMLモデルの一般化性を保証するためにしばしば推奨される。しかし、汎用性や、モデルの臨床的有用性(あらゆる臨床的意思決定支援ツールの最終的な目標)に匹敵するものではない。外部検証は、現在のヘルスケアMLのニーズと不一致である。まず、患者データは時間、地理、施設によって変化する。これらの変化は、単一の固定モデル(特に臨床mlを支配しているディープラーニングモデル)のパフォーマンスに大きなボラティリティをもたらします。第二に、新しいML技術、現在の市場力、更新された規制フレームワークは、デプロイされた個々のモデルインスタンスの頻繁な更新と監視を可能にしている。 MLモデルの安全性やユーティリティを確立するには,外部検証が不十分であることを示す。外部バリデーションパラダイムを修正するための提案は、十分に行き届かない。最終的なテストが私たちを混乱に導く可能性が高いので、引き続きそれに依存します。本稿では,MLOpsにインスパイアされた局所的検証のパラダイムを提案する。このパラダイムは、デプロイ毎のサイト固有の信頼性テストと、デプロイされたアルゴリズムのライフサイクル全体にわたる定期的かつ反復的なチェックに依存する。初期および繰り返しの信頼性テストは、パフォーマンス破壊的な分散シフトと、患者の安全性を損なうコンセプトドリフトから保護される。

External validation is often recommended to ensure the generalizability of ML models. However, it neither guarantees generalizability nor equates to a model's clinical usefulness (the ultimate goal of any clinical decision-support tool). External validation is misaligned with current healthcare ML needs. First, patient data changes across time, geography, and facilities. These changes create significant volatility in the performance of a single fixed model (especially for deep learning models, which dominate clinical ML). Second, newer ML techniques, current market forces, and updated regulatory frameworks are enabling frequent updating and monitoring of individual deployed model instances. We submit that external validation is insufficient to establish ML models' safety or utility. Proposals to fix the external validation paradigm do not go far enough. Continued reliance on it as the ultimate test is likely to lead us astray. We propose the MLOps-inspired paradigm of recurring local validation as an alternative that ensures the validity of models while protecting against performance-disruptive data variability. This paradigm relies on site-specific reliability tests before every deployment, followed by regular and recurrent checks throughout the life cycle of the deployed algorithm. Initial and recurrent reliability tests protect against performance-disruptive distribution shifts, and concept drifts that jeopardize patient safety.

翻訳日:2023-05-16 20:44:51 公開日:2023-05-13

# tweezer配列における反強磁性ボソニック$t$-$j$モデルとその量子シミュレーション

Antiferromagnetic bosonic $t$-$J$ models and their quantum simulation in tweezer arrays ( http://arxiv.org/abs/2305.02322v2 )

ライセンス: Link先を確認

Lukas Homeier and Timothy J. Harris and Tizian Blatz and Ulrich Schollw\"ock and Fabian Grusdt and Annabelle Bohrdt

(参考訳) 分子の双極子交換とrydberg原子のヴァン・ダー・ワールス相互作用による強い相互作用を持つ光学トワイザーアレイの組み合わせは、幅広い量子スピンモデルの研究の扉を開いた。次の重要なステップは、そのような設定とモバイルのドーパントの組み合わせである。これにより、多くの強い相関量子材料を弱めていると信じられている物理学をシミュレートすることができる。ここでは、局所ヒルベルト空間を3つの内部原子状態あるいは分子状態の集合に符号化することで、ボゾン$t$-$J$モデルを実現する実験スキームを提案する。スピン間の反強磁性(AFM)カップレートの工学的結合により、高T_c$カップレートと同様の電荷運動と磁気秩序の競合を実現することができる。提案する2d $t$-$j$モデルのbosonic afmバージョンは以前に研究されていなかったので、まず2つのドーパント(ボソニック統計が役割を果たす最も単純な例)のケースを分析し、その結果をフェルミオンの場合と比較する。六脚シリンダ上で大規模密度行列再正規化群 (DMRG) 計算を行い, ストリップを形成するボソニックホールの強い傾向を見出した。これは、ボソニックなAFM$t$-$J$モデルが強い相関電子の集合相と同様の物理を含むことを証明している。

The combination of optical tweezer arrays with strong interactions -- via dipole-exchange of molecules and van-der-Waals interactions of Rydberg atoms -- has opened the door for the exploration of a wide variety of quantum spin models. A next significant step will be the combination of such settings with mobile dopants: This will enable to simulate the physics believed to underlie many strongly correlated quantum materials. Here we propose an experimental scheme to realize bosonic $t$-$J$ models via encoding the local Hilbert space in a set of three internal atomic or molecular states. By engineering antiferromagnetic (AFM) couplings between spins, competition between charge motion and magnetic order similar to that in high-$T_c$ cuprates can be realized. Since the bosonic AFM version of the 2D $t$-$J$ model we propose has not been studied previously, we start by analyzing the case of two dopants -- the simplest instance in which their bosonic statistics plays a role, and contrast our results to the fermionic case. We perform large-scale density matrix renormalization group (DMRG) calculations on six-legged cylinders, and find a strong tendency for bosonic holes to form stripes. This demonstrates that bosonic, AFM $t$-$J$ models may contain similar physics as the collective phases in strongly correlated electrons.

翻訳日:2023-05-16 20:44:21 公開日:2023-05-13

# 強結合ボースポーラロンの統一理論:反発ポーラロンから非ガウス多体バウンド状態へ

A unified theory of strong coupling Bose polarons: From repulsive polarons to non-Gaussian many-body bound states ( http://arxiv.org/abs/2305.00835v2 )

ライセンス: Link先を確認

Nader Mostaan, Nathan Goldman, Fabian Grusdt

(参考訳) 我々は、フェシュバッハ共鳴を通じて、ホストボース・アインシュタイン凝縮体(BEC)と強く相互作用する移動不純物のボースポーラロン問題に対処する。強い結合における反発側では、理論的なアプローチは2つの異なるポラロン分岐を誘引性および反発性ポラロンに対応させて予測するが、この2つがどのように関連しているかは定かではない。これは、弱い反発的(安定)ボソン・ボソン相互作用と強い魅力(不安定)な不純物・ボソン相互作用の競合によるものであり、その相互作用は現代の理論手法では説明が難しい。ここでは、無限個のボソニック励起を含む不純物-ボソン散乱状態間のガウス相関と、不純物-ボソン結合状態を占めるボソン間の正確な非ガウス相関を結合する強力な変分フレームワークを開発する。この変分スキームは、共鳴の反発側でフェシュバッハ分子に生じる強い非線形性の完全な処理を可能にする。この枠組みでは,不純物誘起不安定性とボソン-ボソン相互作用による安定化の相互作用が,誘電体と反発性ポラロンの中間エネルギーにおける準安定多体結合状態の離散的集合をもたらすことを示した。これらの状態は非ガウス量子相関の形で強い量子統計特性を示し、その特徴づけには平均場以外の摂動性を必要とする。さらに、これらの多体結合状態は分子スペクトル重みを持ち、分子分光法技術によってアクセス可能である。この研究は、フェシュバッハ共鳴の反発側における魅力的で反発的なボースポーラロンの統一理論を提供する。

We address the Bose polaron problem of a mobile impurity interacting strongly with a host Bose-Einstein condensate (BEC) through a Feshbach resonance. On the repulsive side at strong couplings, theoretical approaches predict two distinct polaron branches corresponding to attractive and repulsive polarons, but it remains unclear how the two are related. This is partly due to the challenges resulting from a competition of strongly attractive (destabilizing) impurity-boson interactions with weakly repulsive (stabilizing) boson-boson interactions, whose interplay is difficult to describe with contemporary theoretical methods. Here we develop a powerful variational framework that combines Gaussian correlations among impurity-boson scattering states, including up to an infinite number of bosonic excitations, with exact non-Gaussian correlations among bosons occupying an impurity-boson bound state. This variational scheme enables a full treatment of strong nonlinearities arising in the Feshbach molecule on the repulsive side of the resonance. Within this framework, we demonstrate that the interplay of impurity-induced instability and stabilization by repulsive boson-boson interactions results in a discrete set of metastable many-body bound states at intermediate energies between the attractive and repulsive polaron branches. These states exhibit strong quantum statistical characteristics in the form of non-Gaussian quantum correlations, requiring non-perturbative beyond mean-field treatments for their characterization. Furthermore, these many-body bound states have sizable molecular spectral weights, accessible via molecular spectroscopy techniques. This work provides a unified theory of attractive and repulsive Bose polarons on the repulsive side of the Feshbach resonance.

翻訳日:2023-05-16 20:43:36 公開日:2023-05-13

# 1対1変圧器によるエンド・ツー・エンド車線検出

End-to-End Lane detection with One-to-Several Transformer ( http://arxiv.org/abs/2305.00675v4 )

ライセンス: Link先を確認

Kunyang Zhou and Rui Zhou

(参考訳) レーン検出手法は実世界のシナリオで印象的な性能を示したが、ほとんどの方法は十分に堅牢ではない後処理を必要とする。したがって、車線検出にはDetection TRansformer(DETR)のようなエンドツーエンド検出器が導入されたが、DTRにおける1対1のラベル割り当ては、ラベルセマンティックコンフリクトによるトレーニング効率の低下を招いている。さらに、detrにおける位置クエリは明示的な位置優先を提供することができないため、最適化が難しい。本稿では,1-to-Several Transformer(O2SFormer)を提案する。まず,1対1のラベル代入と1対1のラベル代入を組み合わせた1対1のラベル代入を提案する。 1対1の割り当てを最適化する難しさを克服する。さらに,異なるデコーダ層において正のレーンアンカーの正の重みを動的に調整する層別ソフトラベルを提案する。最後に,動的アンカーに基づく位置問合せの設計を行い,位置問合せにレーンアンカーを組み込むことにより位置先行を探索する。実験の結果、resnet50 backboneのo2sformerはculaneデータセットで77.83%のf1スコアを獲得し、既存のtransformerベースおよびcnnベースの検出器よりも優れていた。さらにO2SFormerはResNet18バックボーンのDETRよりも12.5倍高速に収束する。

Although lane detection methods have shown impressive performance in real-world scenarios, most of methods require post-processing which is not robust enough. Therefore, end-to-end detectors like DEtection TRansformer(DETR) have been introduced in lane detection.However, one-to-one label assignment in DETR can degrade the training efficiency due to label semantic conflicts. Besides, positional query in DETR is unable to provide explicit positional prior, making it difficult to be optimized. In this paper, we present the One-to-Several Transformer(O2SFormer). We first propose the one-to-several label assignment, which combines one-to-many and one-to-one label assignment to solve label semantic conflicts while keeping end-to-end detection. To overcome the difficulty in optimizing one-to-one assignment. We further propose the layer-wise soft label which dynamically adjusts the positive weight of positive lane anchors in different decoder layers. Finally, we design the dynamic anchor-based positional query to explore positional prior by incorporating lane anchors into positional query. Experimental results show that O2SFormer with ResNet50 backbone achieves 77.83% F1 score on CULane dataset, outperforming existing Transformer-based and CNN-based detectors. Futhermore, O2SFormer converges 12.5x faster than DETR for the ResNet18 backbone.

翻訳日:2023-05-16 20:43:07 公開日:2023-05-13

# 振動ポラリトン化学の微視的理論

Microscopic Theory of Vibrational Polariton Chemistry ( http://arxiv.org/abs/2305.05005v2 )

ライセンス: Link先を確認

Wenxiang Ying, Michael A.D. Taylor, and Pengfei Huo

(参考訳) 振動強い結合(VSC)修飾反応速度定数を説明するための顕微鏡理論を提案する。解析理論は、キャビティモードが反応の速度制限ステップである反応物の基底状態から振動励起状態への遷移を促進するという力学的予想に基づいている。この理論は通常の入射角度で観測された共鳴効果を説明する。コヒーレントな振動エネルギー移動像を仮定すると、理論は集団効果を説明でき、実験的に検証可能ないくつかの予測を行うことができる。

We present a microscopic theory that aims to explain the vibrational strong coupling (VSC) modified reaction rate constant. The analytic theory is based on a mechanistic conjecture that cavity modes promote the transition from the ground state to the vibrational excited state of the reactant, which is the rate-limiting step of the reaction. The theory explains the observed resonance effect at the normal incident angle. Assuming the coherent vibrational energy transfer picture, the theory can also explain the collective effect and makes several predictions that are experimentally verifiable.

翻訳日:2023-05-16 20:34:48 公開日:2023-05-13

# AlignSTS: クロスモーダルアライメントによる音声対歌変換

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment ( http://arxiv.org/abs/2305.04476v3 )

ライセンス: Link先を確認

Ruiqi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao

(参考訳) 音声認識(sts)音声変換タスクは、音声録音に対応する歌唱サンプルを生成することを目的としており、ターゲット(音声)ピッチ輪郭とソース(音声)コンテンツとのアライメントは、テキストのない状況では学習が困難である。本稿では,音節や内容などの発話の相違を異なるモーダル性として捉えた,明示的なクロスモーダルアライメントに基づくSTSモデルであるAlignSTSを提案する。人間がメロディの歌詞を歌うメカニズムに触発されたAlignSTS: 1)新規なリズム適応器を採用して、目標リズム表現を予測し、そのリズム表現が単純で効果的な方法で計算され、離散空間に量子化される、内容とピッチの間のモダリティギャップを橋渡しする。 2) 予測リズム表現を用いて, クロスアテンションに基づいてコンテンツを再調整し, 再合成のためのクロスモーダル融合を行う。大規模な実験では、AlignSTSは客観的な指標と主観的な指標の両方で優れたパフォーマンスを達成している。オーディオサンプルはhttps://alignsts.github.ioで入手できる。

The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings while facing a major challenge: the alignment between the target (singing) pitch contour and the source (speech) content is difficult to learn in a text-free situation. This paper proposes AlignSTS, an STS model based on explicit cross-modal alignment, which views speech variance such as pitch and content as different modalities. Inspired by the mechanism of how humans will sing the lyrics to the melody, AlignSTS: 1) adopts a novel rhythm adaptor to predict the target rhythm representation to bridge the modality gap between content and pitch, where the rhythm representation is computed in a simple yet effective way and is quantized into a discrete space; and 2) uses the predicted rhythm representation to re-align the content based on cross-attention and conducts a cross-modal fusion for re-synthesize. Extensive experiments show that AlignSTS achieves superior performance in terms of both objective and subjective metrics. Audio samples are available at https://alignsts.github.io.

翻訳日:2023-05-16 20:33:48 公開日:2023-05-13

# 全方向量子制限位相保存増幅器

Fully Directional Quantum-limited Phase-Preserving Amplifier ( http://arxiv.org/abs/2305.04184v2 )

ライセンス: Link先を確認

Gangqiang Liu, Andrew Lingenfelter, Vidul R. Joshi, Nicholas E. Frattini, Volodymyr V. Sivak, Shyam Shankar and Michel H. Devoret

(参考訳) 本研究では,4つのモードにまたがる6つのパラメトリックプロセス間の干渉を利用して,4ポート4モード超伝導ジョセフソン回路の完全指向性,量子制限型位相保存増幅を実現する方法を提案する。完全方向性(full directionality)は、増幅器の入力ポートと出力ポートの間の前方利得を超える逆分離として定義され、アプリケーション中に出力ポートに存在するインピーダンスミスマッチに対するロバスト性を保証する。既存の指向性位相保存増幅器とは異なり、最小のバックアクションとこの増幅器の量子制限付加ノイズは出力ポートのノイズインシデントの影響を受けない。さらに、一致した入力および出力ポートは、これらの増幅器を他の回路QEDコンポーネントと直接チップ上で統合することができ、超伝導量子プロセッサのスケールアップを容易にする。

We present a way to achieve fully directional, quantum-limited phase-preserving amplification in a four-port, four-mode superconducting Josephson circuit by utilizing interference between six parametric processes that couple all four modes. Full directionality, defined as the reverse isolation surpassing forward gain between the matched input and output ports of the amplifier, ensures its robustness against impedance mismatch that might be present at its output port during applications. Unlike existing directional phase-preserving amplifiers, both the minimal back-action and the quantum-limited added noise of this amplifier remains unaffected by noise incident on its output port. In addition, the matched input and output ports allow direct on-chip integration of these amplifiers with other circuit QED components, facilitating scaling up of superconducting quantum processors.

翻訳日:2023-05-16 20:33:27 公開日:2023-05-13

# Patchwork Learning: 多様なバイオメディカルデータソースの統合分析に向けたパラダイム

Patchwork Learning: A Paradigm Towards Integrative Analysis across Diverse Biomedical Data Sources ( http://arxiv.org/abs/2305.06217v2 )

ライセンス: Link先を確認

Suraj Rajendran, Weishen Pan, Mert R. Sabuncu, Yong Chen, Jiayu Zhou, Fei Wang

(参考訳) 医療における機械学習(ml)は、患者ケア、人口健康、医療提供者のワークフローを強化する多くの機会を提供する。しかし、データプライバシや異種データソースの課題、複数のデータモダリティを完全に活用できないため、実際の臨床とコストのメリットは依然として限られている。本稿では,異なるデータモダリティ(クリニカル・フリーテキスト,医用画像,オミクスなど)から構成される異なるデータセットからの情報を統合することにより,これらの制約に対処する新しいパラダイムである"パッチワーク・ラーニング"(PL)を紹介する。 PLはデータのプライバシを保ちながら補完的なデータソースを同時に利用することを可能にし、より包括的で一般化可能なMLモデルの開発を可能にする。本稿では,パッチワーク学習の概念と医療における現在の実装について紹介し,様々な医療課題に対処するための潜在的機会と適用可能なデータソースについて検討する。 PLは、情報共有と欠落したデータのインプットを容易にするために、サイトをまたいだブリッジングのモダリティや重複する特徴空間を活用し、関連する予測タスクに対処する。本稿では,PLに関連する課題について論じる。その多くが連合学習とマルチモーダル学習によって共有され,今後の研究への提言を提供する。医療データ統合に対するより包括的なアプローチを提供することで、パッチワーク学習はMLモデルの臨床的適用性に革命をもたらす可能性がある。このパラダイムは、パーソナライゼーションと一般化可能性のバランスを保ち、最終的には患者の体験を向上し、人口の健康を改善し、医療提供者のワークフローを最適化することを約束する。

Machine learning (ML) in healthcare presents numerous opportunities for enhancing patient care, population health, and healthcare providers' workflows. However, the real-world clinical and cost benefits remain limited due to challenges in data privacy, heterogeneous data sources, and the inability to fully leverage multiple data modalities. In this perspective paper, we introduce "patchwork learning" (PL), a novel paradigm that addresses these limitations by integrating information from disparate datasets composed of different data modalities (e.g., clinical free-text, medical images, omics) and distributed across separate and secure sites. PL allows the simultaneous utilization of complementary data sources while preserving data privacy, enabling the development of more holistic and generalizable ML models. We present the concept of patchwork learning and its current implementations in healthcare, exploring the potential opportunities and applicable data sources for addressing various healthcare challenges. PL leverages bridging modalities or overlapping feature spaces across sites to facilitate information sharing and impute missing data, thereby addressing related prediction tasks. We discuss the challenges associated with PL, many of which are shared by federated and multimodal learning, and provide recommendations for future research in this field. By offering a more comprehensive approach to healthcare data integration, patchwork learning has the potential to revolutionize the clinical applicability of ML models. This paradigm promises to strike a balance between personalization and generalizability, ultimately enhancing patient experiences, improving population health, and optimizing healthcare providers' workflows.

翻訳日:2023-05-16 20:25:39 公開日:2023-05-13

# 何て言うんだ! 大きな言語モデルでは否定的常識の知識が多すぎる

Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge ( http://arxiv.org/abs/2305.05976v2 )

ライセンス: Link先を確認

Jiangjie Chen, Wei Shi, Ziquan Fu, Sijie Cheng, Lei Li, Yanghua Xiao

(参考訳) 大規模言語モデル(llm)は、ポジティブな知識を蓄積し活用する能力について広く研究されている。しかし、「lion don't live in the ocean」のような否定的な知識は世界でもユビキタスであるが、テキストで明示的に言及されることは滅多にない。 LLMは負の知識について何を知っているのか? 本研究は,LLMの負のコモンセンス知識に対する能力について検討する。制約付きキーワード対文生成タスク(CG)とブール質問回答タスク(QA)を設計し,LLMを探索する。実験の結果,LLMは負のコモンセンス知識に基づく有効な文の生成に失敗することが多いことがわかった。我々はこの現象をLLMの信念衝突と呼ぶ。さらなる分析から,言語モデリングの事前学習による統計的近道と否定報告バイアスが,この衝突の原因となることが示された。

Large language models (LLMs) have been widely studied for their ability to store and utilize positive knowledge. However, negative knowledge, such as "lions don't live in the ocean", is also ubiquitous in the world but rarely mentioned explicitly in the text. What do LLMs know about negative knowledge? This work examines the ability of LLMs to negative commonsense knowledge. We design a constrained keywords-to-sentence generation task (CG) and a Boolean question-answering task (QA) to probe LLMs. Our experiments reveal that LLMs frequently fail to generate valid sentences grounded in negative commonsense knowledge, yet they can correctly answer polar yes-or-no questions. We term this phenomenon the belief conflict of LLMs. Our further analysis shows that statistical shortcuts and negation reporting bias from language modeling pre-training cause this conflict.

翻訳日:2023-05-16 20:24:50 公開日:2023-05-13

# MetaMorphosis:マルチタスク学習のためのタスク指向プライバシ認知機能生成

MetaMorphosis: Task-oriented Privacy Cognizant Feature Generation for Multi-task Learning ( http://arxiv.org/abs/2305.07815v1 )

ライセンス: Link先を確認

Md Adnan Arefeen, Zhouyu Li, Md Yusuf Sarwar Uddin, Anupam Das

(参考訳) コンピュータビジョンアプリケーションの成長、ディープラーニング、エッジコンピューティングは、エッジデバイスとクラウドにワークロードを分散させることで、実用的なコラボレーションインテリジェンス(CI)の確保に寄与する。しかしながら、エッジデバイス上で別々のシングルタスクモデルを実行することは、必要な計算リソースと時間に関して非効率である。このコンテキストでは、マルチタスク学習は、セマンティックセグメンテーションや入ってくるビデオフレームの深さ推定など、複数のタスクを実行するために単一のディープラーニングモデルを活用することができる。この単一処理パイプラインは、マルチタスクモジュール間で共有される共通の深い特徴を生成する。しかし、コラボレーティブインテリジェンスシナリオでは、共通の深い特徴を生成するには2つの大きな問題がある。まず、深い機能には、下流モジュールに露出した入力情報(入力のプライバシーを侵害する)が不注意に含まれる可能性がある。第二に、生成されたユニバーサル機能は、あるタスクを意図したものよりも集合的な情報を露出し、あるタスクの機能を他のタスク(タスクプライバシに違反する)に利用することができる。本稿では,特定のタスクに対する推論能力を制限する,新しいディープラーニングベースのプライバシー認識機能生成プロセスであるmetamorphosisを提案する。そこで本研究では,すべてのタスクに明確な注意を払い,差分プライバシーを持つ非相関損失関数を用いて,各タスクのアウトプットとして異なるプライバシアウェア機能を生成するディープラーニングモデルを訓練する,チャネルスクイーズ励起型特徴形変換モジュールcross-secを提案する。シーン理解と顔の属性に関する多様な画像からなる4つのデータセットを広範囲に実験した結果,画像とビデオ分析の効率的な方法でプライバシ要件を保証し,近年の逆学習や普遍的特徴生成手法よりもメタモルフィズムが優れていることが示された。

With the growth of computer vision applications, deep learning, and edge computing contribute to ensuring practical collaborative intelligence (CI) by distributing the workload among edge devices and the cloud. However, running separate single-task models on edge devices is inefficient regarding the required computational resource and time. In this context, multi-task learning allows leveraging a single deep learning model for performing multiple tasks, such as semantic segmentation and depth estimation on incoming video frames. This single processing pipeline generates common deep features that are shared among multi-task modules. However, in a collaborative intelligence scenario, generating common deep features has two major issues. First, the deep features may inadvertently contain input information exposed to the downstream modules (violating input privacy). Second, the generated universal features expose a piece of collective information than what is intended for a certain task, in which features for one task can be utilized to perform another task (violating task privacy). This paper proposes a novel deep learning-based privacy-cognizant feature generation process called MetaMorphosis that limits inference capability to specific tasks at hand. To achieve this, we propose a channel squeeze-excitation based feature metamorphosis module, Cross-SEC, to achieve distinct attention of all tasks and a de-correlation loss function with differential-privacy to train a deep learning model that produces distinct privacy-aware features as an output for the respective tasks. With extensive experimentation on four datasets consisting of diverse images related to scene understanding and facial attributes, we show that MetaMorphosis outperforms recent adversarial learning and universal feature generation methods by guaranteeing privacy requirements in an efficient way for image and video analytics.

翻訳日:2023-05-16 19:39:03 公開日:2023-05-13

# Cloud-RAIN: 反射不変性によるポイントクラウド分析

Cloud-RAIN: Point Cloud Analysis with Reflectional Invariance ( http://arxiv.org/abs/2305.07814v1 )

ライセンス: Link先を確認

Yiming Cui, Lecheng Ruan, Hang-Cheng Dong, Qiang Li, Zhongming Wu, Tieyong Zeng, Feng-Lei Fan

(参考訳) 点雲タスクのネットワークは、回転や反射のような点雲が親和的に変換されるときに不変であることが期待される。これまでのところ、近年研究が注目されている回転不変性に対して、反射不変性はほとんど対処されていない。にもかかわらず、リフレクション対称性は、構造化道路の静的反射対称性、動く物体(歩行者など)の双方向運動からの動的反射対称性、異なる国の左右の交通慣行など、非常に一般的で重要なシナリオで自分自身を見つけることができる。私たちの知る限りでは、残念ながら、これまでポイントクラウド分析でリフレクション不変ネットワークが報告されていない。このギャップを埋めるために,2次ニューロンと,Cloud-RAINと呼ばれるPCA標準表現を用いて, \underline{R}eflection\underline{A}l \underline{IN} 分散を用いた点 \underline{Cloud} モデルを実現する枠組みを提案する。クラウドレーンはなぜ反射対称性を享受できるのかを説明するための定理を証明する。さらに、広範な実験は、提案したCloud-RAINの反射特性を相関させ、Cloud-RAINがデータ拡張よりも優れていることを示す。私たちのコードはhttps://github.com/YimingCuiCuiCui/Cloud-RAINで利用可能です。

The networks for point cloud tasks are expected to be invariant when the point clouds are affinely transformed such as rotation and reflection. So far, relative to the rotational invariance that has been attracting major research attention in the past years, the reflection invariance is little addressed. Notwithstanding, reflection symmetry can find itself in very common and important scenarios, e.g., static reflection symmetry of structured streets, dynamic reflection symmetry from bidirectional motion of moving objects (such as pedestrians), and left- and right-hand traffic practices in different countries. To the best of our knowledge, unfortunately, no reflection-invariant network has been reported in point cloud analysis till now. To fill this gap, we propose a framework by using quadratic neurons and PCA canonical representation, referred to as Cloud-RAIN, to endow point \underline{Cloud} models with \underline{R}eflection\underline{A}l \underline{IN}variance. We prove a theorem to explain why Cloud-RAIN can enjoy reflection symmetry. Furthermore, extensive experiments also corroborate the reflection property of the proposed Cloud-RAIN and show that Cloud-RAIN is superior to data augmentation. Our code is available at https://github.com/YimingCuiCuiCui/Cloud-RAIN.

翻訳日:2023-05-16 19:38:32 公開日:2023-05-13

# ドアベルカメラの軽量化検出

Lightweight Delivery Detection on Doorbell Cameras ( http://arxiv.org/abs/2305.07812v1 )

ライセンス: Link先を確認

Pirazh Khorramshahi, Zhe Wu, Tianchen Wang, Luke Deluccia, Hongcheng Wang

(参考訳) 近年の映像ベース行動認識と強固な時空間モデリングの進歩にもかかわらず、提案手法の多くは計算資源の豊富さに頼り、大規模で計算集約的な畳み込みやトランスフォーマーベースのニューラルネットワークを実行して十分な結果を得る。これにより、電力とコンピューティングリソースが制限されたエッジデバイスへのそのようなモデルのデプロイが制限される。本研究では、重要なスマートホームアプリケーション、ビデオベースの配信検出、リソース制約されたドアベルカメラ上で動作可能な、このタスクのためのシンプルで軽量なパイプラインを提案する。提案するパイプラインは,動きの手がかりに基づいて,粗いアクティビティ提案のセットを生成し,さらに,モバイルフレンドリーな3dcnnネットワークで分類する。トレーニングのために、ネットワークが強固な時空間的特徴を学ぶのに役立つ新しい半教師付きアテンションモジュールを設計し、ネットワークによってなされる予測の不確かさを定量化するためのエビデンスベースの最適化目標を採用する。私たちのキュレーションされたデリバリデータセットにおける実験結果は、代替品と比較してパイプラインの有意な有効性を示し、自由かつかなりの推論時間のパフォーマンス向上を達成するためのトレーニングフェーズのノベルティのメリットを強調しています。

Despite recent advances in video-based action recognition and robust spatio-temporal modeling, most of the proposed approaches rely on the abundance of computational resources to afford running huge and computation-intensive convolutional or transformer-based neural networks to obtain satisfactory results. This limits the deployment of such models on edge devices with limited power and computing resources. In this work we investigate an important smart home application, video based delivery detection, and present a simple and lightweight pipeline for this task that can run on resource-constrained doorbell cameras. Our proposed pipeline relies on motion cues to generate a set of coarse activity proposals followed by their classification with a mobile-friendly 3DCNN network. For training we design a novel semi-supervised attention module that helps the network to learn robust spatio-temporal features and adopt an evidence-based optimization objective that allows for quantifying the uncertainty of predictions made by the network. Experimental results on our curated delivery dataset shows the significant effectiveness of our pipeline compared to alternatives and highlights the benefits of our training phase novelties to achieve free and considerable inference-time performance gains.

翻訳日:2023-05-16 19:38:05 公開日:2023-05-13

# ReLU MLPにおける$\mu$P学習率の深さ依存性

Depth Dependence of $\mu$P Learning Rates in ReLU MLPs ( http://arxiv.org/abs/2305.07810v1 )

ライセンス: Link先を確認

Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar

(参考訳) 本稿では、平均フィールド重み初期化を備えた幅$n$と深さ$L$のランダム完全連結ReLUネットワークについて考察する。我々の目的は、最大更新(\mu$p)学習率のn$とl$への依存を調べることである。 yang et の $\mu$p に関する先行研究と同じように。この最大更新学習率は、第1層と第2層の重みを除いて、すべて$n$とは独立している。しかし、それは非自明な$l$依存性を持ち、$l^{-3/2}のようにスケーリングする。 $

In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization. Our purpose is to study the dependence on $n$ and $L$ of the maximal update ($\mu$P) learning rate, the largest learning rate for which the mean squared change in pre-activations after one step of gradient descent remains uniformly bounded at large $n,L$. As in prior work on $\mu$P of Yang et. al., we find that this maximal update learning rate is independent of $n$ for all but the first and last layer weights. However, we find that it has a non-trivial dependence of $L$, scaling like $L^{-3/2}.$

翻訳日:2023-05-16 19:37:44 公開日:2023-05-13

# Mesh2SSM: 表面メッシュから解剖の統計的形状モデルへ

Mesh2SSM: From Surface Meshes to Statistical Shape Models of Anatomy ( http://arxiv.org/abs/2305.07805v1 )

ライセンス: Link先を確認

Krithika Iyer, Shireen Elhabian

(参考訳) 統計的形状モデリングは、医療画像(MRIやCTスキャンなど)で捉えたセグメント化された解剖学から重要な形状パラメータを発見する計算過程である。人間の解剖学における実質的な非線形変動の存在は、しばしば伝統的な形状モデリングプロセスを困難にしている。深層学習技術は、形状の複雑な非線形表現を学習し、基礎となる人口レベルの変動に忠実な統計的形状モデルを生成することができる。しかし、既存のディープラーニングモデルは依然として制限があり、トレーニングのために確立/最適化された形状モデルが必要である。我々は、教師なしの置換不変表現学習を活用して、テンプレートポイントクラウドを主観的なメッシュに変形する方法を推定し、対応性に基づく形状モデルを作成する新しいアプローチであるMesh2SSMを提案する。 Mesh2SSMは集団固有のテンプレートも学習でき、テンプレート選択によるバイアスを低減できる。提案手法はメッシュ上で直接動作し,計算効率が高いため,従来型および深層学習に基づくSSMアプローチの代替となる。

Statistical shape modeling is the computational process of discovering significant shape parameters from segmented anatomies captured by medical images (such as MRI and CT scans), which can fully describe subject-specific anatomy in the context of a population. The presence of substantial non-linear variability in human anatomy often makes the traditional shape modeling process challenging. Deep learning techniques can learn complex non-linear representations of shapes and generate statistical shape models that are more faithful to the underlying population-level variability. However, existing deep learning models still have limitations and require established/optimized shape models for training. We propose Mesh2SSM, a new approach that leverages unsupervised, permutation-invariant representation learning to estimate how to deform a template point cloud to subject-specific meshes, forming a correspondence-based shape model. Mesh2SSM can also learn a population-specific template, reducing any bias due to template selection. The proposed method operates directly on meshes and is computationally efficient, making it an attractive alternative to traditional and deep learning-based SSM approaches.

翻訳日:2023-05-16 19:37:32 公開日:2023-05-13

# CEMFormer:空間時間変換器による車内および外部カメラからのドライバー意図の予測

CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers ( http://arxiv.org/abs/2305.07840v1 )

ライセンス: Link先を確認

Yunsheng Ma, Wenqian Ye, Xu Cao, Amr Abdelraouf, Kyungtae Han, Rohit Gupta, Ziran Wang

(参考訳) ドライバーの意図予測は、周囲の交通環境に関する行動を分析することによってドライバーの行動を予測しようとするものである。既存のアプローチは主にレイトフュージョン技術に注目し、予測と一般的な駆動コンテキスト間の一貫性を維持することの重要性を無視している。本稿では,時空間トランスフォーマを使用してドライバの意図予測を改善するための統合メモリ表現を学習する,cross-view episodic memory transformer(cemformer)と呼ばれる新しいフレームワークを提案する。具体的には,in-cabinとexternal cameraの双方からの情報とエピソディックメモリ表現を統合し,履歴データを連続的に融合する空間時空間エンコーダを開発した。さらに,運転コンテキストを補助的監視信号として組み込んで予測性能を向上させる新しいコンテキスト一貫性損失を提案する。 Brain4Carsデータセットに関する包括的な実験は、CEMFormerがドライバーの意図予測において既存の最先端メソッドを一貫して上回っていることを示している。

Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing driving contexts. In this paper, we introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer), which employs spatio-temporal transformers to learn unified memory representations for an improved driver intention prediction. Specifically, we develop a spatial-temporal encoder to integrate information from both in-cabin and external camera views, along with episodic memory representations to continuously fuse historical data. Furthermore, we propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance. Comprehensive experiments on the Brain4Cars dataset demonstrate that CEMFormer consistently outperforms existing state-of-the-art methods in driver intention prediction.

翻訳日:2023-05-16 19:30:35 公開日:2023-05-13

# 多言語言語モデルの幾何学:平等レンズ

The Geometry of Multilingual Language Models: An Equality Lens ( http://arxiv.org/abs/2305.07839v1 )

ライセンス: Link先を確認

Cheril Shah, Yashashree Chandak, Manan Suri

(参考訳) 多言語言語モデルにおける異なる言語の表現を理解することは、言語間特性の理解、下流タスクのパフォーマンスの予測、言語間のバイアスの特定に不可欠である。本研究では, ユークリッド空間における3つの多言語モデルの幾何学を解析し, すべての言語が一意な幾何学で表されることを示す。幾何学的分離性指数を用いて、言語は言語族によって近い傾向にあるが、それらは他族の言語とほぼ分離可能である。また,意味空間における言語間距離を測定するために,言語間類似度指数を導入する。以上の結果から,低リソース言語は,いずれのモデルにおいても高リソース言語ほど良く表現されていないことが示唆された。

Understanding the representations of different languages in multilingual language models is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying any biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that all languages are represented by unique geometries. Using a geometric separability index we find that although languages tend to be closer according to their linguistic family, they are almost separable with languages from other families. We also introduce a Cross-Lingual Similarity Index to measure the distance of languages with each other in the semantic space. Our findings indicate that the low-resource languages are not represented as good as high resource languages in any of the models

翻訳日:2023-05-16 19:30:18 公開日:2023-05-13

# 重み付きパッチ品質予測による非参照点クラウド品質評価

No-Reference Point Cloud Quality Assessment via Weighted Patch Quality Prediction ( http://arxiv.org/abs/2305.07829v1 )

ライセンス: Link先を確認

Jun Cheng, Honglei Su, Jari Korhonen

(参考訳) ポイントクラウドに基づく3Dビジョンアプリケーションの開発が急速に進み、ポイントクラウド品質評価(PCQA)が重要な研究トピックになりつつある。しかし、従来のPCQA手法では、点雲の異なる領域における局所的な品質変動の影響を無視する。品質分布不均衡の利点を生かし,地域相関解析機能を備えた非参照点雲質評価法(NR-PCQA)を提案する。具体的には、ポイントクラウドをパッチに分割し、各パッチのテクスチャと構造機能を生成し、それらをパッチ機能に融合してパッチ品質を予測します。そして,相関解析のために点雲のすべてのパッチの特徴を収集し,相関重みを求める。最後に、すべてのパッチに対する予測品質と相関重みを用いて最終的な品質スコアを導出する。実験の結果,提案手法はNR-PCQA法よりも優れていた。 COPP-Netのソースコードはhttps://github.com/philox12358/COPP-Netにある。

With the rapid development of 3D vision applications based on point clouds, point cloud quality assessment(PCQA) is becoming an important research topic. However, the prior PCQA methods ignore the effect of local quality variance across different areas of the point cloud. To take an advantage of the quality distribution imbalance, we propose a no-reference point cloud quality assessment (NR-PCQA) method with local area correlation analysis capability, denoted as COPP-Net. More specifically, we split a point cloud into patches, generate texture and structure features for each patch, and fuse them into patch features to predict patch quality. Then, we gather the features of all the patches of a point cloud for correlation analysis, to obtain the correlation weights. Finally, the predicted qualities and correlation weights for all the patches are used to derive the final quality score. Experimental results show that our method outperforms the state-of-the-art benchmark NR-PCQA methods. The source code for the proposed COPP-Net can be found at https://github.com/philox12358/COPP-Net.

翻訳日:2023-05-16 19:30:05 公開日:2023-05-13

# DCASE 2023チャレンジタスクの解説と議論第2報:機械条件モニタリングのための1ショット無監督異常音検出

Description and Discussion on DCASE 2023 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring ( http://arxiv.org/abs/2305.07828v1 )

ライセンス: Link先を確認

Kota Dohi and Keisuke Imoto and Noboru Harada and Daisuke Niizumi and Yuma Koizumi and Tomoya Nishida and Harsh Purohit and Ryo Tanabe and Takashi Endo and Yohei Kawaguchi

(参考訳) 本稿では,音響シーンとイベントの検出と分類に関するタスク記述(dcase)2023 challenge task 2: "first-shot unsupervised anomalous sound detection (asd) for machine condition monitoring"について述べる。主な目標は、ハイパーパラメータチューニングを必要とせず、少数の正常なサンプルのみを使用して、新しい種類のマシンにasdシステムを迅速に展開できるようにすることである。過去のASDタスクでは、開発および評価データセットが同じマシンタイプであったため、各マシンタイプごとにハイパーパラメータをチューニングする手法が開発された。しかし、通常データや異常データを開発データセットとして収集することは現実には不可能である。 2023年タスク2では、全く新しいマシンタイプのマシンでモデルをトレーニングするという課題であるファーストショット問題を解決することに集中します。具体的には (i)各マシンタイプは1つのセクションしか持たず、 (ii) 開発・評価データセットのマシンタイプは全く異なる。課題提出期限後に,課題結果と提案内容の分析を加えます。

We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 2: "First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring". The main goal is to enable rapid deployment of ASD systems for new kinds of machines using only a few normal samples, without the need for hyperparameter tuning. In the past ASD tasks, developed methods tuned hyperparameters for each machine type, as the development and evaluation datasets had the same machine types. However, collecting normal and anomalous data as the development dataset can be infeasible in practice. In 2023 Task 2, we focus on solving first-shot problem, which is the challenge of training a model on a few machines of a completely novel machine type. Specifically, (i) each machine type has only one section, and (ii) machine types in the development and evaluation datasets are completely different. We will add challenge results and analysis of the submissions after the challenge submission deadline.

翻訳日:2023-05-16 19:29:48 公開日:2023-05-13

# 混合商品距離による静的単語埋め込みの周波数対応次元選択

Frequency-aware Dimension Selection for Static Word Embedding by Mixed Product Distance ( http://arxiv.org/abs/2305.07826v1 )

ライセンス: Link先を確認

Lingfeng Shen, Haiyun Jiang, Lemao Liu, Ying Chen

(参考訳) 静的な単語埋め込みは、特にコンテキストが利用できないタスクでは、事前学習された言語モデルは、静的な単語埋め込みよりもパフォーマンスが悪いため、まだ有用である。次元は静的単語埋め込みの品質を決定する重要な要素であるが、自動次元選択はめったに議論されない。本稿では, 単語の頻度が次元選択に与える影響について検討し, 単語の頻度が非常に重要であり, 次元選択中に考慮する必要があることを実証的に確認する。このような経験的発見に基づいて, 単語埋め込みアルゴリズムを訓練することなく, 単語埋め込みアルゴリズムの適切な次元を選択するために, 距離(Mixed Product Distance, MPD)を用いた次元選択法を提案する。オラクル行列に後処理関数を適用することで、MPDベースの手法は単語周波数の影響を非強調化することができる。コンテクスト未使用タスクとコンテクスト利用可能タスクの両方に関する実験は、ベースライン上のmpdベースの次元選択方法の効率とパフォーマンスのトレードオフをよりよく示しています。

Static word embedding is still useful, particularly for context-unavailable tasks, because in the case of no context available, pre-trained language models often perform worse than static word embeddings. Although dimension is a key factor determining the quality of static word embeddings, automatic dimension selection is rarely discussed. In this paper, we investigate the impact of word frequency on the dimension selection, and empirically find that word frequency is so vital that it needs to be taken into account during dimension selection. Based on such an empirical finding, this paper proposes a dimension selection method that uses a metric (Mixed Product Distance, MPD) to select a proper dimension for word embedding algorithms without training any word embedding. Through applying a post-processing function to oracle matrices, the MPD-based method can de-emphasize the impact of word frequency. Experiments on both context-unavailable and context-available tasks demonstrate the better efficiency-performance trade-off of our MPD-based dimension selection method over baselines.

翻訳日:2023-05-16 19:29:29 公開日:2023-05-13

# YOLOv7-BRAとマルチモデル融合に基づく学生の授業行動検出

Student Classroom Behavior Detection based on YOLOv7-BRA and Multi-Model Fusion ( http://arxiv.org/abs/2305.07825v1 )

ライセンス: Link先を確認

Fan Yang and Tao Wang, Xiaofei Wang

(参考訳) 教室ビデオにおける生徒の行動を正確に検出することは,授業パフォーマンスの分析と指導効果の向上に寄与する。しかし、動作検出における現在の精度は低い。そこで本稿では, YOLOv7-BRA (YOLOv7 with Bi-level Routing Attention ) に基づく授業行動検出システムを提案する。我々は,立ち上がり,座った,話す,聞く,歩く,手を上げる,読む,書くという8つの行動パターンを特定した。本研究では,11,248個のラベルと4,001個の画像を含むデータセットを構築し,教室環境における手を挙げる一般的な行動に着目した(Student Classroom Behavior dataset, SCB-Dataset)。検出精度を向上させるため,biformer attentionモジュールをyolov7ネットワークに追加した。最後に、学生の教室行動データを得るために、YOLOv7 CrowdHuman、SlowFast、DeepSortモデルの結果を融合した。 SCB-Datasetの実験を行い、YOLOv7-BRAはmAP@0.5の87.1%を達成した。 SCBデータセットは、https://github.com/Whiffe/SCB-dataseからダウンロードできます。

Accurately detecting student behavior in classroom videos can aid in analyzing their classroom performance and improving teaching effectiveness. However, the current accuracy rate in behavior detection is low. To address this challenge, we propose the Student Classroom Behavior Detection system based on based on YOLOv7-BRA (YOLOv7 with Bi-level Routing Attention ). We identified eight different behavior patterns, including standing, sitting, speaking, listening, walking, raising hands, reading, and writing. We constructed a dataset, which contained 11,248 labels and 4,001 images, with an emphasis on the common behavior of raising hands in a classroom setting (Student Classroom Behavior dataset, SCB-Dataset). To improve detection accuracy, we added the biformer attention module to the YOLOv7 network. Finally, we fused the results from YOLOv7 CrowdHuman, SlowFast, and DeepSort models to obtain student classroom behavior data. We conducted experiments on the SCB-Dataset, and YOLOv7-BRA achieved an mAP@0.5 of 87.1%, resulting in a 2.2% improvement over previous results. Our SCB-dataset can be downloaded from: https://github.com/Whiffe/SCB-datase

翻訳日:2023-05-16 19:29:13 公開日:2023-05-13

# 教師なし文表現強調のためのシンプルかつプラグアンドプレイ法

A Simple and Plug-and-play Method for Unsupervised Sentence Representation Enhancement ( http://arxiv.org/abs/2305.07824v1 )

ライセンス: Link先を確認

Lingfeng Shen, Haiyun Jiang, Lemao Liu, Shuming Shi

(参考訳) 文の適切な埋め込みを教師なしの方法で生成することは、実世界のシナリオにおける意味マッチングと検索の問題に有用である。本稿では,文章表現を高度化する非常に単純な後処理手法であるRepresentation ALchemy(RepAL)を提案する。 RepALの基本的な考え方は、事前訓練されたモデルによって生成された文埋め込みの冗長な情報を強調することである。総合的な実験を通して、RepALは学習自由であり、既存の教師なし文章学習モデルと組み合わせることができるプラグアンドプレイ法であることを示す。また,RepALの理解のために詳細な分析を行った。

Generating proper embedding of sentences through an unsupervised way is beneficial to semantic matching and retrieval problems in real-world scenarios. This paper presents Representation ALchemy (RepAL), an extremely simple post-processing method that enhances sentence representations. The basic idea in RepAL is to de-emphasize redundant information of sentence embedding generated by pre-trained models. Through comprehensive experiments, we show that RepAL is free of training and is a plug-and-play method that can be combined with most existing unsupervised sentence learning models. We also conducted in-depth analysis to understand RepAL.

翻訳日:2023-05-16 19:28:50 公開日:2023-05-13

# 深層学習に基づく心臓運動からの電気不整脈回路の予測 : シリコンを用いた研究

Deep Learning-based Prediction of Electrical Arrhythmia Circuits from Cardiac Motion: An In-Silico Study ( http://arxiv.org/abs/2305.07822v1 )

ライセンス: Link先を確認

Jan Lebert, Daniel Deng, Lei Fan, Lik Chuan Lee, and Jan Christoph

(参考訳) 心臓の収縮は、心筋を介して伝播する電気的興奮によって引き起こされる。近年,深層学習を用いてシミュレーションされた心筋組織の収縮運動から電気刺激を計算できることが示されている。心臓の電気生理学において、第一の診断目標は、心臓のリズム障害の電気的トリガーやドライバを特定することである。しかし、電気マッピング技術を用いることで、特に心室不整脈において、心臓筋全体、特に心室不整脈の間における電気波の3次元形態をマッピングすることは不可能である。したがって、心臓の動きから電気的興奮を計算または予測するアプローチは、有望な代替診断手法である可能性がある。本稿では,深層学習を用いて,心室の変形力学から3次元の波動力学を予測できることを計算機シミュレーションで実証する。心室図の電気機械的アクティベーションダイナミクスのシミュレーションを何千回も実施し,そのデータを用いてニューラルネットワークのトレーニングを行い,変形の原因となる3次元波動パターンを予測した。ネットワークが特定の不整脈を見たことがない場合でも、焦点波パターンの横で複雑な3次元の電磁波パターンを再構成できることを実証した。本研究では, 有限要素法 (FEM) で生成したデータに対して, 平滑化粒子流体力学 (SPH) 法で生成したデータに基づいて, 学習モデルを学習し, 一般化できることを示す。予測は、傷跡の存在下で、そして著しい異質性をもって行うことができる。以上の結果から,深部ニューラルネットワークを用いて心筋運動の画像データから筋内活動電位波形を算出できることが示唆された。

The heart's contraction is caused by electrical excitation which propagates through the heart muscle. It was recently shown that the electrical excitation can be computed from the contractile motion of a simulated piece of heart muscle tissue using deep learning. In cardiac electrophysiology, a primary diagnostic goal is to identify electrical triggers or drivers of heart rhythm disorders. However, using electrical mapping techniques, it is currently impossible to map the three-dimensional morphology of the electrical waves throughout the entire heart muscle, especially during ventricular arrhythmias. Therefore, the approach to calculate or predict electrical excitation from the hearts motion could be a promising alternative diagnostic approach. Here, we demonstrate in computer simulations that it is possible to predict three-dimensional electrical wave dynamics from ventricular deformation mechanics using deep learning. We performed thousands of simulations of electromechanical activation dynamics in ventricular geometries and used the data to train a neural network which subsequently predicts the three-dimensional electrical wave pattern that caused the deformation. We demonstrate that, next to focal wave patterns, even complicated three-dimensional electrical wave patterns can be reconstructed, even if the network has never seen the particular arrhythmia. We show that the deep learning model has the ability to generalize by training it on data generated with the smoothed particle hydrodynamics (SPH) method and subsequently applying it to data generated with the finite element method (FEM). Predictions can be performed in the presence of scars and with significant heterogeneity. Our results suggest that, deep neural networks could be used to calculate intramural action potential wave patterns from imaging data of the motion of the heart muscle.

翻訳日:2023-05-16 19:28:34 公開日:2023-05-13

# 配電系統におけるキャパシティ解析をホストするアクティブラーニング手法

An Active Learning-based Approach for Hosting Capacity Analysis in Distribution Systems ( http://arxiv.org/abs/2305.07818v1 )

ライセンス: Link先を確認

Kiyeob Lee, Peng Zhao, Anirban Bhattacharya, Bani K. Mallick, Le Xie

(参考訳) 分散エネルギー資源(ders)の統合が増加するにつれて、将来の電力配電網のためのホスティングキャパシティ(hc)をモデル化し、分析する必要がある。ホスティングキャパシティ分析(hca)は、グリッドに安全に統合できるdersの量を調べ、全一般性において挑戦的なタスクである。すなわち、実現可能集合と実現不可能集合の間には、多くの極点が存在する。さらに、HCは複数の因子に依存する。 (a)社会経済的行動に依存したDrの採用パターン b)DERの制御方法と管理方法これらの2つの要因は、dersのすべての統合が集中的に計画されているわけではなく、hcに関する我々の理解を大きく変える可能性があるため、問題空間に固有のものである。本稿では2つの要因を捉えることで研究ギャップを解消する。 (a)及び b) HCAやいくつかの最も洞察に富んだHCシナリオをドメイン知識の犠牲にして特定すること。我々は,データ駆動型HCAフレームワークを提案し,シナリオを効果的に探求するために,HCAにおけるアクティブラーニングを導入する。 HCAにおけるアクティブラーニングとHCの特性 (a)及び (b) は 3 行の例で示される。次に,その意義を理解するために,詳細な大規模研究を提案する。 (a)及び (b) HCとその解釈は2つの要因によって大きく変化することが示唆された。 (a)及び (b)

With the increasing amount of distributed energy resources (DERs) integration, there is a significant need to model and analyze hosting capacity (HC) for future electric distribution grids. Hosting capacity analysis (HCA) examines the amount of DERs that can be safely integrated into the grid and is a challenging task in full generality because there are many possible integration of DERs in foresight. That is, there are numerous extreme points between feasible and infeasible sets. Moreover, HC depends on multiple factors such as (a) adoption patterns of DERs that depend on socio-economic behaviors and (b) how DERs are controlled and managed. These two factors are intrinsic to the problem space because not all integration of DERs may be centrally planned, and could largely change our understanding about HC. This paper addresses the research gap by capturing the two factors (a) and (b) in HCA and by identifying a few most insightful HC scenarios at the cost of domain knowledge. We propose a data-driven HCA framework and introduce active learning in HCA to effectively explore scenarios. Active learning in HCA and characteristics of HC with respect to the two factors (a) and (b) are illustrated in a 3-bus example. Next, detailed large-scale studies are proposed to understand the significance of (a) and (b). Our findings suggest that HC and its interpretations significantly change subject to the two factors (a) and (b).

翻訳日:2023-05-16 19:27:46 公開日:2023-05-13

# palm: 病的近視認識と解剖学的構造アノテーションを備えた開眼眼底写真データセット

PALM: Open Fundus Photograph Dataset with Pathologic Myopia Recognition and Anatomical Structure Annotation ( http://arxiv.org/abs/2305.07816v1 )

ライセンス: Link先を確認

Huihui Fang, Fei Li, Junde Wu, Huazhu Fu, Xu Sun, Jos\'e Ignacio Orlando, Hrvoje Bogunovi\'c, Xiulan Zhang, Yanwu Xu

(参考訳) 病理組織学的ミオニア (PM) は、近視性網膜変性症である。この状態の早期スクリーニングは、それに伴う眼底病変による損傷を減少させ、視力の喪失を予防することができる。人工知能に基づく自動診断ツールは、臨床医が病気の兆候を識別したり、カラー・ファンドス写真を使って集団を検査したりすることで、このプロセスの恩恵を受けることができる。本稿では,病理組織診断と解剖学的構造アノテーションのためのPALM,オープン・ファンドス・イメージング・データセットについて考察する。本データベースは, 病的近視カテゴリのラベル付き1200枚の画像と, 視神経乳頭の位置, パッチ状網膜萎縮(乳頭性萎縮症を含む), 網膜剥離などの病変の描出に関する手指注釈を含む。さらに,本論文では,データベース構築に使用されるラベル付けプロセス,サンプルの品質と特性などの詳細を詳述し,他の利用ノートを提供する。

Pathologic myopia (PM) is a common blinding retinal degeneration suffered by highly myopic population. Early screening of this condition can reduce the damage caused by the associated fundus lesions and therefore prevent vision loss. Automated diagnostic tools based on artificial intelligence methods can benefit this process by aiding clinicians to identify disease signs or to screen mass populations using color fundus photographs as inputs. This paper provides insights about PALM, our open fundus imaging dataset for pathological myopia recognition and anatomical structure annotation. Our databases comprises 1200 images with associated labels for the pathologic myopia category and manual annotations of the optic disc, the position of the fovea and delineations of lesions such as patchy retinal atrophy (including peripapillary atrophy) and retinal detachment. In addition, this paper elaborates on other details such as the labeling process used to construct the database, the quality and characteristics of the samples and provides other relevant usage notes.

翻訳日:2023-05-16 19:27:21 公開日:2023-05-13

# 希少事象シミュレーションのためのフローベース生成モデル

A Flow-Based Generative Model for Rare-Event Simulation ( http://arxiv.org/abs/2305.07863v1 )

ライセンス: Link先を確認

Lachlan Gibson, Marcus Hoerger, Dirk Kroese

(参考訳) 複雑で確率的な環境で決定問題を解くことは、モンテカルロサンプリングによる決定の結果を推定することでしばしば達成される。しかし、サンプリングは稀だが重要な出来事を見落とし、決定プロセスに重大な影響を及ぼす可能性がある。本稿では,まれな事象が発生した場合の条件分布から直接サンプルをシミュレートする正規化フロー生成モデルを訓練する手法を提案する。カップリングフローを利用することで,任意のサンプリング分布を任意に近似することができる。近似法とImportance Smplingを組み合わせることで、複雑な積分と期待値の高精度な推定値が得られる。本手法を高次元, 希少な設定でも, 効率的なサンプリングと推定に利用できる例をいくつか紹介する。我々は,レアイベント分布から直接シミュレートすることで,レアイベントの発生方法に大きな洞察を得ることができることを示す。

Solving decision problems in complex, stochastic environments is often achieved by estimating the expected outcome of decisions via Monte Carlo sampling. However, sampling may overlook rare, but important events, which can severely impact the decision making process. We present a method in which a Normalizing Flow generative model is trained to simulate samples directly from a conditional distribution given that a rare event occurs. By utilizing Coupling Flows, our model can, in principle, approximate any sampling distribution arbitrarily well. By combining the approximation method with Importance Sampling, highly accurate estimates of complicated integrals and expectations can be obtained. We include several examples to demonstrate how the method can be used for efficient sampling and estimation, even in high-dimensional and rare-event settings. We illustrate that by simulating directly from a rare-event distribution significant insight can be gained into the way rare events happen.

翻訳日:2023-05-16 19:20:14 公開日:2023-05-13

# HAiVA: クラウド特性が気候パターンに与える影響を研究するためのハイブリッドAI支援ビジュアル分析フレームワーク

HAiVA: Hybrid AI-assisted Visual Analysis Framework to Study the Effects of Cloud Properties on Climate Patterns ( http://arxiv.org/abs/2305.07859v1 )

ライセンス: Link先を確認

Subhashis Hazarika, Haruki Hirasawa, Sookyung Kim, Kalai Ramea, Salva R. Cachay, Peetak Mitra, Dipti Hingmire, Hansi Singh, Phil J. Rasch

(参考訳) 雲は地球の気候システムに大きな影響を及ぼす。これらは地球の放射収支を調整し、温度と降水量の地域的変化を促進する上で重要な役割を担っている。これにより、雲は、雲の反射率の修正を意味するマリン・クラウド・ブライトニング(MCB)のような気候介入技術に理想的である。しかし,MCBの意図しない影響を避けるためには,気候応答関数に対する複雑な雲の理解を深める必要がある。従来のアース・システム・モデルによる介入シナリオの設計とテストは計算コストがかかる。そこで我々は,このような科学的研究を進めるためのハイブリッドAI支援視覚分析フレームワークを提案し,様々なMCB介入シナリオをインタラクティブに検討し,その意図的かつ意図しない影響が気候パターンに与える影響を評価する。我々は気候科学者のチームと協力して,クラウドと気候の応答関数を模倣するハイブリッドaiモデル群を開発し,異なるmcb介入実験を行うための,密結合したフロントエンドインタラクティブなビジュアル分析システムを設計する。

Clouds have a significant impact on the Earth's climate system. They play a vital role in modulating Earth's radiation budget and driving regional changes in temperature and precipitation. This makes clouds ideal for climate intervention techniques like Marine Cloud Brightening (MCB) which refers to modification in cloud reflectivity, thereby cooling the surrounding region. However, to avoid unintended effects of MCB, we need a better understanding of the complex cloud to climate response function. Designing and testing such interventions scenarios with conventional Earth System Models is computationally expensive. Therefore, we propose a hybrid AI-assisted visual analysis framework to drive such scientific studies and facilitate interactive what-if investigation of different MCB intervention scenarios to assess their intended and unintended impacts on climate patterns. We work with a team of climate scientists to develop a suite of hybrid AI models emulating cloud-climate response function and design a tightly coupled frontend interactive visual analysis system to perform different MCB intervention experiments.

翻訳日:2023-05-16 19:20:01 公開日:2023-05-13

# AURA : 物体除去のためのランダム入力サンプリングを用いたマスク自動生成装置

AURA : Automatic Mask Generator using Randomized Input Sampling for Object Removal ( http://arxiv.org/abs/2305.07857v1 )

ライセンス: Link先を確認

Changsuk Oh, Dongseok Shim, H. Jin Kim

(参考訳) 画像の塗装作業の目的は、画像の欠落した領域を視覚的に可視的に埋めることである。近年,深層学習に基づく画像インパインティングネットワークは,画像中の不要なオブジェクトをマスキングすることで,オブジェクトの除去に利用している。しかしながら、ネットワークを使ってオブジェクトを適切に削除しようとする一方で、以前の作業では入力マスクの重要性に注意を払わない。本稿では,オフザシェルフ・イメージ・インパインティング・ネットワークを用いて,オブジェクトをよりよく除去するための入力マスクの生成に焦点をあてる。本稿では,説明可能なai(xai)手法に触発された自動マスク生成法を提案する。提案手法は,ランダムな入力マスクを用いて重要度マップを生成し,ランダムなマスクから得られた画像のスコアを定量的に推定する。出力マスクは、重要度マップから生成される候補マスクのうち、判定モジュールによって選択される。判定モジュールを設計し,オブジェクト除去結果の品質を定量的に推定する。さらに, 対象除去結果の報告に用いられた評価手法が, 対象除去器の性能を推定するには適切でないことを実証的に見出した。そこで我々は,対象物除去器の品質を適切に評価するために,新しい評価指標(FID$^*$とU-IDS$^*$)を提案する。実験により,本手法は意味的セグメンテーションマップから生成したマスクよりも,目的のクラスオブジェクトを除去する性能が良好であることが確認された。

The objective of the image inpainting task is to fill missing regions of an image in a visually plausible way. Recently, deep-learning-based image inpainting networks have generated outstanding results, and some utilize their models as object removers by masking unwanted objects in an image. However, while trying to better remove objects using their networks, the previous works pay less attention to the importance of the input mask. In this paper, we focus on generating the input mask to better remove objects using the off-the-shelf image inpainting network. We propose an automatic mask generator inspired by the explainable AI (XAI) method, whose output can better remove objects than a semantic segmentation mask. The proposed method generates an importance map using randomly sampled input masks and quantitatively estimated scores of the completed images obtained from the random masks. The output mask is selected by a judge module among the candidate masks which are generated from the importance map. We design the judge module to quantitatively estimate the quality of the object removal results. In addition, we empirically find that the evaluation methods used in the previous works reporting object removal results are not appropriate for estimating the performance of an object remover. Therefore, we propose new evaluation metrics (FID$^*$ and U-IDS$^*$) to properly evaluate the quality of object removers. Experiments confirm that our method shows better performance in removing target class objects than the masks generated from the semantic segmentation maps, and the two proposed metrics make judgments consistent with humans.

翻訳日:2023-05-16 19:19:42 公開日:2023-05-13

# マルチエージェントシステムにおける非同期動作コーディネーションのためのstackelberg決定トランスフォーマ

Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems ( http://arxiv.org/abs/2305.07856v1 )

ライセンス: Link先を確認

Bin Zhang, Hangyu Mao, Lijuan Li, Zhiwei Xu, Dapeng Li, Rui Zhao, Guoliang Fan

(参考訳) 非同期アクションコーディネーションは、マルチエージェントシステム(mas)において、スタックルバーグゲーム(sg)として表現できる広汎な挑戦を示す。しかし,SGに基づくMARL(Multi-Agent Reinforcement Learning)手法のスケーラビリティは,ネットワーク構造や環境制約によって厳しく制約されている。この問題に対処するために,エージェント間の階層的協調の困難を解消するヒューリスティックアプローチであるStackelberg Decision Transformer (STEER)を提案する。 STEERは、SGの階層的決定構造、自己回帰配列モデルのモデリング能力、MARLの探索的学習手法を取り入れ、空間的および時間的文脈における意思決定プロセスを効率的に管理する。本研究は,masにおける様々なタスクタイプや環境構成に広く適用可能な,効果的かつ適応可能な非同期動作協調手法の開発に寄与する。実験の結果,提案手法はstackelberg平衡解に収束し,他の既存手法よりも複雑なシナリオで優れていることがわかった。

Asynchronous action coordination presents a pervasive challenge in Multi-Agent Systems (MAS), which can be represented as a Stackelberg game (SG). However, the scalability of existing Multi-Agent Reinforcement Learning (MARL) methods based on SG is severely constrained by network structures or environmental limitations. To address this issue, we propose the Stackelberg Decision Transformer (STEER), a heuristic approach that resolves the difficulties of hierarchical coordination among agents. STEER efficiently manages decision-making processes in both spatial and temporal contexts by incorporating the hierarchical decision structure of SG, the modeling capability of autoregressive sequence models, and the exploratory learning methodology of MARL. Our research contributes to the development of an effective and adaptable asynchronous action coordination method that can be widely applied to various task types and environmental configurations in MAS. Experimental results demonstrate that our method can converge to Stackelberg equilibrium solutions and outperforms other existing methods in complex scenarios.

翻訳日:2023-05-16 19:19:16 公開日:2023-05-13

# マッチング特徴抽出を用いた異種エッジデバイスのためのフェデレーション学習型産業健康診断

A Federated Learning-based Industrial Health Prognostics for Heterogeneous Edge Devices using Matched Feature Extraction ( http://arxiv.org/abs/2305.07854v1 )

ライセンス: Link先を確認

Anushiya Arunan, Yan Qin, Xiaoli Li, and Chau Yuen

(参考訳) データ駆動型産業健康予測は、正確で信頼性の高い予測モデルを開発するために豊富な訓練データを必要とする。しかし、厳格なデータプライバシー法とエッジ産業データの豊富さは、分散データ利用を必要とする。したがって,産業保健分野は,分散型・プライバシー保全型学習手法であるフェデレーション学習(fl)から著しく利益を得るのに適している。しかしながら,ヘテロジニアスデータから学習したモデルパラメータを有意義に集約し,ハイパフォーマンスなフェデレーションモデルを形成するという複雑さから,flベースの健康予測タスクはほとんど研究されていない。特に、異質な分解機構と不等なデータセットサイズに由来するエッジデバイス間のデータの不均一性は、正確なフェデレーションモデルを開発する上で重要な統計的課題となる。特徴類似性マッチングパラメータアグリゲーションアルゴリズムを用いて、異種エッジデータから識別的に学習するFLベースの健康予後モデルを提案する。このアルゴリズムは局所的に訓練された不均一なモデルを探索し、まずニューロンと確率論的に類似した特徴抽出関数をマッチングし、それらを選択的に平均化し、フェデレートされたモデルパラメータを形成する。このアルゴリズムは、従来の座標方向ニューロンの平均化とは対照的に、類似したニューロンを平均するだけであるため、局所モデルの異なる特徴抽出器は、結果のフェデレーションモデルへの希釈を少なくする。ターボファンエンジンのLiイオン電池の循環劣化データと非循環劣化データの両方を用いて, 提案手法は, それぞれ44.5\%, 39.3\%の精度向上を達成できることを示した。

Data-driven industrial health prognostics require rich training data to develop accurate and reliable predictive models. However, stringent data privacy laws and the abundance of edge industrial data necessitate decentralized data utilization. Thus, the industrial health prognostics field is well suited to significantly benefit from federated learning (FL), a decentralized and privacy-preserving learning technique. However, FL-based health prognostics tasks have hardly been investigated due to the complexities of meaningfully aggregating model parameters trained from heterogeneous data to form a high performing federated model. Specifically, data heterogeneity among edge devices, stemming from dissimilar degradation mechanisms and unequal dataset sizes, poses a critical statistical challenge for developing accurate federated models. We propose a pioneering FL-based health prognostic model with a feature similarity-matched parameter aggregation algorithm to discriminatingly learn from heterogeneous edge data. The algorithm searches across the heterogeneous locally trained models and matches neurons with probabilistically similar feature extraction functions first, before selectively averaging them to form the federated model parameters. As the algorithm only averages similar neurons, as opposed to conventional naive averaging of coordinate-wise neurons, the distinct feature extractors of local models are carried over with less dilution to the resultant federated model. Using both cyclic degradation data of Li-ion batteries and non-cyclic data of turbofan engines, we demonstrate that the proposed method yields accuracy improvements as high as 44.5\% and 39.3\% for state-of-health estimation and remaining useful life estimation, respectively.

翻訳日:2023-05-16 19:18:57 公開日:2023-05-13

# EV-MGRFlowNet:ハイブリッド運動補償損失を有する教師なしイベントベース光流の動作誘導リカレントネットワーク

EV-MGRFlowNet: Motion-Guided Recurrent Network for Unsupervised Event-based Optical Flow with Hybrid Motion-Compensation Loss ( http://arxiv.org/abs/2305.07853v1 )

ライセンス: Link先を確認

Hao Zhuang, Xinjie Huang, Kuanxu Hou, Delei Kong, Chenming Hu, Zheng Fang

(参考訳) イベントカメラは、高時間分解能や高ダイナミックレンジなどの有望な特性を提供する。これらの利点は多くの機械ビジョンタスク、特に光学フロー推定に利用されてきた。現在、ほとんどのイベントベースの作品は、ディープラーニングを使って光の流れを推定している。しかし、それらのネットワークは以前の隠れ状態や動きの流れを完全に活用していない。さらに、彼らの監視戦略は、ネットワークの可能性を解き放つためにイベントデータの幾何学的制約を十分に活用していない。本稿では,ハイブリッド動作補償損失を用いた動作誘導型リカレントネットワークを備えた,教師なしイベントベース光フロー推定パイプラインEV-MGRFlowNetを提案する。まず,従来の隠れ状態を完全に活用してマルチレベル動作特性を得る機能強化型リカレントエンコーダネットワーク(FERE-Net)を提案する。そこで我々は,フロー誘導型デコーダネットワーク(FGD-Net)を提案する。最後に,より正確なイベントアライメントのための幾何学的制約を強化するために,ハイブリッドモーション補償損失(hmc-loss)を設計する。実験結果から,本手法はmvsecデータセットのsof(state-of-the-art, sota)法を上回っており,平均エンドポイント誤差(aee)は22.71%であった。我々の知る限り,本手法は教師なし学習手法の1つである。

Event cameras offer promising properties, such as high temporal resolution and high dynamic range. These benefits have been utilized into many machine vision tasks, especially optical flow estimation. Currently, most existing event-based works use deep learning to estimate optical flow. However, their networks have not fully exploited prior hidden states and motion flows. Additionally, their supervision strategy has not fully leveraged the geometric constraints of event data to unlock the potential of networks. In this paper, we propose EV-MGRFlowNet, an unsupervised event-based optical flow estimation pipeline with motion-guided recurrent networks using a hybrid motion-compensation loss. First, we propose a feature-enhanced recurrent encoder network (FERE-Net) which fully utilizes prior hidden states to obtain multi-level motion features. Then, we propose a flow-guided decoder network (FGD-Net) to integrate prior motion flows. Finally, we design a hybrid motion-compensation loss (HMC-Loss) to strengthen geometric constraints for the more accurate alignment of events. Experimental results show that our method outperforms the current state-of-the-art (SOTA) method on the MVSEC dataset, with an average reduction of approximately 22.71% in average endpoint error (AEE). To our knowledge, our method ranks first among unsupervised learning-based methods.

翻訳日:2023-05-16 19:18:27 公開日:2023-05-13

# スクイーズ励起埋め込み注意用unetによる脳腫瘍の分節

Squeeze Excitation Embedded Attention UNet for Brain Tumor Segmentation ( http://arxiv.org/abs/2305.07850v1 )

ライセンス: Link先を確認

Gaurav Prasanna, John Rohit Ernest, Lalitha G and Sathiya Narayanan

(参考訳) 深層学習に基づく技術は、ここ数年医学の分野で重要性を増してきた。医学画像の分類、分類、識別など様々な用途で使用されている。 unetやアテンションunet、アテンション残差unetといった既存のアーキテクチャは、すでに脳腫瘍のセグメンテーションと同じ応用法として存在しているが、チャンネルレベルの特徴の抽出方法については、いずれも対処されていない。本稿では,Squeeze Excitation Embedded Attention UNet (SEEA-UNet) と呼ばれる新しいアーキテクチャを提案する。提案モデルと既存アーキテクチャとの比較を行った結果,学習回数が少なくなるほど,提案モデルの性能が向上した。双対焦点損失とジャカード係数はモデルの性能を監視するために用いられた。

Deep Learning based techniques have gained significance over the past few years in the field of medicine. They are used in various applications such as classifying medical images, segmentation and identification. The existing architectures such as UNet, Attention UNet and Attention Residual UNet are already currently existing methods for the same application of brain tumor segmentation, but none of them address the issue of how to extract the features in channel level. In this paper, we propose a new architecture called Squeeze Excitation Embedded Attention UNet (SEEA-UNet), this architecture has both Attention UNet and Squeeze Excitation Network for better results and predictions, this is used mainly because to get information at both Spatial and channel levels. The proposed model was compared with the existing architectures based on the comparison it was found out that for lesser number of epochs trained, the proposed model performed better. Binary focal loss and Jaccard Coefficient were used to monitor the model's performance.

翻訳日:2023-05-16 19:18:06 公開日:2023-05-13

# Meta-Polyp: 効率的なPolypセグメンテーションのためのベースライン

Meta-Polyp: a baseline for efficient Polyp segmentation ( http://arxiv.org/abs/2305.07848v1 )

ライセンス: Link先を確認

Quoc-Huy Trinh

(参考訳) 近年,ポリプのセグメンテーションが重要となり,cnn,視覚トランスフォーマー,トランスフォーマー技術を用いた競合的手法が数多く開発されている。しかし、これらの手法は、分散外データセット、境界の欠如、小さなポリプを扱う際にしばしば困難に直面する。 2022年、メタフォーマーはビジョンの新しいベースラインとして導入され、マルチタスクコンピュータビジョンのパフォーマンスを向上させるだけでなく、ビジョントランスフォーマーとcnnファミリーバックボーンの制限にも対処した。セグメンテーションをさらに強化するために,UNetとMeta-Formerの融合と,テクスチャを強化するためにデコーダステージにレベルアップを組み合わせたマルチスケールアップサンプリングブロックを提案するとともに,Meta-Formerのアイデアに基づいたConvformerブロックベースを提案し,ローカル特徴の重要な情報を強化する。これらのブロックは、ポリープの全体形状のようなグローバル情報と、医療区分の決定に不可欠な局所情報と境界情報の組み合わせを可能にする。提案手法は競争性能を達成し,CVC-300データセット,Kvasir,CVC-ColonDBデータセットにおける最先端の成果を得た。 Kvasir-SEGとは別に、他はアウトオブディストリビューションデータセットである。実装は以下の通りである。 https://github.com/huyquoctrinh/MetaPolyp-CBMS2023。

In recent years, polyp segmentation has gained significant importance, and many methods have been developed using CNN, Vision Transformer, and Transformer techniques to achieve competitive results. However, these methods often face difficulties when dealing with out-of-distribution datasets, missing boundaries, and small polyps. In 2022, Meta-Former was introduced as a new baseline for vision, which not only improved the performance of multi-task computer vision but also addressed the limitations of the Vision Transformer and CNN family backbones. To further enhance segmentation, we propose a fusion of Meta-Former with UNet, along with the introduction of a Multi-scale Upsampling block with a level-up combination in the decoder stage to enhance the texture, also we propose the Convformer block base on the idea of the Meta-former to enhance the crucial information of the local feature. These blocks enable the combination of global information, such as the overall shape of the polyp, with local information and boundary information, which is crucial for the decision of the medical segmentation. Our proposed approach achieved competitive performance and obtained the top result in the State of the Art on the CVC-300 dataset, Kvasir, and CVC-ColonDB dataset. Apart from Kvasir-SEG, others are out-of-distribution datasets. The implementation can be found at: https://github.com/huyquoctrinh/MetaPolyp-CBMS2023.

翻訳日:2023-05-16 19:17:47 公開日:2023-05-13

# 不均一データを用いたフェデレーション学習における平均モデル理解

Understanding Model Averaging in Federated Learning on Heterogeneous Data ( http://arxiv.org/abs/2305.07845v1 )

ライセンス: Link先を確認

Tailin Zhou, Zehong Lin, Jun Zhang, Danny H.K. Tsang

(参考訳) モデル平均化(model averaging)は、フェデレーション学習(fl)で広く採用されている手法で、異種データでトレーニングされた複数のクライアントモデルを集約し、よく整備されたグローバルモデルを得る。しかし、その成功の根拠はよく理解されていない。そこで本研究では,損失/エラーの景観を可視化し,モデル平均化の幾何学的性質について検討する。幾何学的可視化は、クライアントモデルが共通盆地内のグローバルモデルを取り囲み、クライアントモデルよりも優れた性能を示したとしても、グローバルモデルは盆地の底部から逸脱する可能性があることを示している。この現象をさらに理解するために,グローバルモデルの予測誤差をクライアントモデルに関連する5つの要因に分解する。特に、早期トレーニング後のグローバルモデルエラーは、主に、一クライアントデータセットとグローバルデータセットの重複しないデータのクライアントモデルエラー二グローバルモデルとクライアントモデルとの間の最大距離これらの知見に触発されて,グローバルモデルに反復移動平均化(IMA)を適用して予測誤差を低減し,遅延訓練時の最大距離を制御するクライアント探索を制限することを提案する。実験により,既存のfl法の精度とトレーニング速度が,様々なデータ不均一性を持つベンチマークデータセットにおいて著しく向上することを示した。

Model averaging, a widely adopted technique in federated learning (FL), aggregates multiple client models trained on heterogeneous data to obtain a well-performed global model. However, the rationale behind its success is not well understood. To shed light on this issue, we investigate the geometric properties of model averaging by visualizing the loss/error landscape. The geometrical visualization shows that the client models surround the global model within a common basin, and the global model may deviate from the bottom of the basin even though it performs better than the client models. To further understand this phenomenon, we decompose the expected prediction error of the global model into five factors related to client models. Specifically, we find that the global-model error after early training mainly comes from i) the client-model error on non-overlapping data between client datasets and the global dataset and ii) the maximal distance between the global and client models. Inspired by these findings, we propose adopting iterative moving averaging (IMA) on global models to reduce the prediction error and limiting client exploration to control the maximal distance at the late training. Our experiments demonstrate that IMA significantly improves the accuracy and training speed of existing FL methods on benchmark datasets with various data heterogeneity.

翻訳日:2023-05-16 19:17:19 公開日:2023-05-13

# 不定形作用をもつパラメータ化マルコフ決定過程に対するトンプソンサンプリング

Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions ( http://arxiv.org/abs/2305.07844v1 )

ライセンス: Link先を確認

Michael Gimelfarb and Michael Jong Kim

(参考訳) 興味の主パラメータが不明であり,ベイズ推定を用いて学習しなければならないパラメータ化MDP(PMDP)について検討した。このようなモデルのキーとなる特徴は、未知のパラメータに関する情報を提供する「非形式的」なアクションの存在である。我々はpmdpに対する一連の仮定を提案し、トンプソンサンプリングは、キューイング、在庫管理、動的価格といった多くの問題に対して容易に検証できる、漸近的に最適な期待後悔値である$o(t^{-1})$を保証する。

We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference. One key defining feature of such models is the presence of "uninformative" actions that provide no information about the unknown parameters. We contribute a set of assumptions for PMDPs under which Thompson sampling guarantees an asymptotically optimal expected regret bound of $O(T^{-1})$, which are easily verified for many classes of problems such as queuing, inventory control, and dynamic pricing.

翻訳日:2023-05-16 19:16:56 公開日:2023-05-13

# Logit Attribution Matchingによるコントラスト領域の一般化

Contrastive Domain Generalization via Logit Attribution Matching ( http://arxiv.org/abs/2305.07888v1 )

ライセンス: Link先を確認

Han Gao, Kaican Li, Yongxiang Huang, Luning Wang, Caleb Chen Cao, Nevin L.Zhang

(参考訳) ドメイン一般化(DG)は機械学習において重要なオープンな問題である。深いモデルは、たとえ分単位のドメインシフトに影響を受けやすく、実際のアプリケーションにおける信頼性を著しく損なう。この問題を軽減するために、既存のほとんどのメソッドは、複数のトレーニングドメインにまたがる様々な不変制約を適用している。しかし、このようなアプローチは、一般的に新しいテストドメインに対するパフォーマンス保証をほとんど提供しない。本稿では,複数領域の代わりに強コントラストデータ対によって示される意味的不変性を利用した,cdg(con contrastive domain generalization)という異なる手法について検討する。本稿では,CDGの潜在能力を示す因果的DG理論を提案し,正規化手法とともに,CDGを実現するためのロジット属性マッチング(LAM)を提案する。 LAMは、ペアデータのごく一部で、最先端のDGメソッドよりも優れており、モデルがDGに不可欠なセマンティック機能により焦点を合わせるのに役立つことを実証的に示す。

Domain Generalization (DG) is an important open problem in machine learning. Deep models are susceptible to domain shifts of even minute degrees, which severely compromises their reliability in real applications. To alleviate the issue, most existing methods enforce various invariant constraints across multiple training domains. However,such an approach provides little performance guarantee for novel test domains in general. In this paper, we investigate a different approach named Contrastive Domain Generalization (CDG), which exploits semantic invariance exhibited by strongly contrastive data pairs in lieu of multiple domains. We present a causal DG theory that shows the potential capability of CDG; together with a regularization technique, Logit Attribution Matching (LAM), for realizing CDG. We empirically show that LAM outperforms state-of-the-art DG methods with only a small portion of paired data and that LAM helps models better focus on semantic features which are crucial to DG.

翻訳日:2023-05-16 19:10:39 公開日:2023-05-13

# レビュアー代入問題:スコーピングレビュー

Reviewer assignment problem: A scoping review ( http://arxiv.org/abs/2305.07887v1 )

ライセンス: Link先を確認

Jelena Jovanovic (1) and Ebrahim Bagheri (2) ((1) University of Belgrade, Serbia, (2) Toronto Metropolitan University, Canada)

(参考訳) ピアレビューは科学研究の不可欠な要素である。査読の質、その結果発表された研究の質は、提出された論文に対して適切な審査員を募集する能力に大きく依存する。しかし、科学的論文の制作と学者の作業負荷の継続的な増加など、いくつかの要因により、このようなレビュアーの発見はますます困難になっている。これらの課題を緩和するために、レビューア(しばしばレビュア代入問題(RAP)と呼ばれる)と論文の自動関連のためのソリューションが30年間研究の対象となっている。多くの解が提案されているが、我々の知る限り、最近のRAP関連の文献の体系的な合成は欠落している。本稿では、このギャップを埋め、さらにRAP関連の研究を支援するために、RAPに対処するための計算手法のスコーピングレビューを行う。 3つのデータベース(Scopus, Google Scholar, DBLP)からRAPに関する最近の文献を収集し、信頼性基準を適用した後、RAP研究の諸側面におけるデータの抽出・合成に関する26の研究を継続した。一 RAPに対する全体的な枠組み及びアプローチ二審査官選定の基準三候補者審査官及び提出者のモデリング四審査官及び提出書の照合のための計算方法五提案したソリューションの性能を評価する方法この論文は、前述のRAP研究の各側面について要約し、考察し、今後の研究方向性を提案する。

Peer review is an integral component of scientific research. The quality of peer review, and consequently the published research, depends to a large extent on the ability to recruit adequate reviewers for submitted papers. However, finding such reviewers is an increasingly difficult task due to several factors, such as the continuous increase both in the production of scientific papers and the workload of scholars. To mitigate these challenges, solutions for automated association of papers with "well matching" reviewers - the task often referred to as reviewer assignment problem (RAP) - have been the subject of research for thirty years now. Even though numerous solutions have been suggested, to our knowledge, a recent systematic synthesis of the RAP-related literature is missing. To fill this gap and support further RAP-related research, in this paper, we present a scoping review of computational approaches for addressing RAP. Following the latest methodological guidance for scoping reviews, we have collected recent literature on RAP from three databases (Scopus, Google Scholar, DBLP) and, after applying the eligibility criteria, retained 26 studies for extracting and synthesising data on several aspects of RAP research including: i) the overall framing of and approach to RAP; ii) the criteria for reviewer selection; iii) the modelling of candidate reviewers and submissions; iv) the computational methods for matching reviewers and submissions; and v) the methods for evaluating the performance of the proposed solutions. The paper summarises and discusses the findings for each of the aforementioned aspects of RAP research and suggests future research directions.

翻訳日:2023-05-16 19:10:22 公開日:2023-05-13

# 側方カシミール力の測定から非ニュートン重力の制約を強化する方法

How to strengthen constraints on non-Newtonian gravity from measuring the lateral Casimir force ( http://arxiv.org/abs/2305.07884v1 )

ライセンス: Link先を確認

G. L. Klimchitskaya and V. M. Mostepanenko

(参考訳) ナノメートルの相互作用範囲では、利用可能な実験データはニュートンの重力法則に対する湯川型の補正を除外していないことが知られている。この相互作用における湯川型相互作用のパラメータに関する最も強い制約は、中性子散乱の実験と、波形表面間の横及び正常カシミール力の測定から導かれる。本研究では,高い相関振幅と小さい相関周期を犠牲にして実験構成を最適化することにより,4.5nmから37nmの範囲で現在利用可能な制約を大幅に強化できることを実証する。相互作用範囲19nmに対して,40倍以上の最大強度が到達可能であることを示す。

It has been known that in the nanometer interaction range the available experimental data do not exclude the Yukawa-type corrections to Newton's gravitational law which exceed the Newtonian gravitational force by many orders of magnitude. The strongest constraints on the parameters of Yukawa-type interaction in this interaction range follow from the experiments on neutron scattering and from measurements of the lateral and normal Casimir forces between corrugated surfaces. In this work, we demonstrate that by optimizing the experimental configuration at the expense of the higher corrugation amplitudes and smaller periods of corrugations it is possible to considerably strengthen the currently available constraints within the wide interaction range from 4.5 to 37nm. We show that the maximum strengthening by more than a factor of 40 is reachable for the interaction range of 19nm.

翻訳日:2023-05-16 19:09:55 公開日:2023-05-13

# 画素不確かさ推定による医用画像分割の一般化に向けて

Towards Generalizable Medical Image Segmentation with Pixel-wise Uncertainty Estimation ( http://arxiv.org/abs/2305.07883v1 )

ライセンス: Link先を確認

Shuai Wang, Zipei Yan, Daoan Zhang, Zhongsen Li, Sirui Wu, Wenxuan Chen, Rui Li

(参考訳) ディープニューラルネットワーク(DNN)は、独立および同一分散(IID)仮説の下で視覚認識において有望な性能を達成する。対照的に、IDD仮説は多くの現実世界、特に医用画像解析において普遍的に保証されていない。医用画像分割は通常、各ピクセルをカテゴリに分類する画素単位の分類タスクとして定式化される。しかし、この定式化はdnnを混乱させるため、例えば境界付近の画素など、分類が難しい画素を無視している。本稿では,まず,分類の難しい画素が不確実性が高いことを明らかにする。そこで本研究では,dnnの分類が難しい画素を強調するために不確実性推定を用いた新しい枠組みを提案する。提案手法はprostateとfundusの2つのベンチマークで評価した。実験の結果,本手法は最先端手法よりも優れていた。

Deep neural networks (DNNs) achieve promising performance in visual recognition under the independent and identically distributed (IID) hypothesis. In contrast, the IID hypothesis is not universally guaranteed in numerous real-world applications, especially in medical image analysis. Medical image segmentation is typically formulated as a pixel-wise classification task in which each pixel is classified into a category. However, this formulation ignores the hard-to-classified pixels, e.g., some pixels near the boundary area, as they usually confuse DNNs. In this paper, we first explore that hard-to-classified pixels are associated with high uncertainty. Based on this, we propose a novel framework that utilizes uncertainty estimation to highlight hard-to-classified pixels for DNNs, thereby improving its generalization. We evaluate our method on two popular benchmarks: prostate and fundus datasets. The results of the experiment demonstrate that our method outperforms state-of-the-art methods.

翻訳日:2023-05-16 19:09:43 公開日:2023-05-13

# 生成aiと大規模言語モデルの二重利用問題

Dual Use Concerns of Generative AI and Large Language Models ( http://arxiv.org/abs/2305.07882v1 )

ライセンス: Link先を確認

Alexei Grinbaum and Laurynas Adomaitis

(参考訳) 本稿では,生命科学のために設計された Dual Use Research of Concern (DURC) フレームワークを,Large Language Models (LLM) に特化して,生成AIの領域に実装することを提案する。生物学的研究における利点と欠点が証明されていることから、DURCの基準はLLMに対して効果的に再定義できると考えており、AIガバナンスの改善に寄与する可能性がある。 DURCフレームワークを採用する際に課せられるバランスを認識し、生成的AIの影響に対する社会的認識を高める上で重要な政治的役割を強調します。最後に,LLM 研究に DURC アプローチを適用するための具体的な推奨事項について述べる。

We suggest the implementation of the Dual Use Research of Concern (DURC) framework, originally designed for life sciences, to the domain of generative AI, with a specific focus on Large Language Models (LLMs). With its demonstrated advantages and drawbacks in biological research, we believe the DURC criteria can be effectively redefined for LLMs, potentially contributing to improved AI governance. Acknowledging the balance that must be struck when employing the DURC framework, we highlight its crucial political role in enhancing societal awareness of the impact of generative AI. As a final point, we offer a series of specific recommendations for applying the DURC approach to LLM research.

翻訳日:2023-05-16 19:09:30 公開日:2023-05-13

# 2段階知識蒸留によるブラックボックスソースフリードメイン適応

Black-box Source-free Domain Adaptation via Two-stage Knowledge Distillation ( http://arxiv.org/abs/2305.07881v1 )

ライセンス: Link先を確認

Shuai Wang, Daoan Zhang, Zipei Yan, Shitong Shao, Rui Li

(参考訳) ソースフリーなドメイン適応は、トレーニング済みのソースモデルとターゲットデータのみを使用して、ディープニューラルネットワークを適用することを目的としている。しかし、ソースモデルにアクセスすると、ソースデータを漏洩する可能性があるため、患者のプライバシが明らかになる。本稿では,ソースモデルと対象データの出力のみを利用できるブラックボックス・ソースフリー領域適応法について検討する。簡便で効果的な二段階知識蒸留法を提案する。 uppercase\expandafter{\romannumeral1}では、ターゲットモデルをスクラッチからトレーニングし、ソースモデルによって生成されたソフトな擬似ラベルを知識蒸留法で生成する。 uppercase\expandafter{\romannumeral2}では、ノイズの多い擬似ラベルによるエラーの蓄積を避けるために、新しい学生モデルとして別のモデルを初期化する。学生モデルの学習を指導するために,教師モデルに弱い増補を施したイメージを給付する。提案手法は単純で柔軟であり,3つのクロスドメインセグメンテーションタスクにおいて驚くべき結果が得られる。

Source-free domain adaptation aims to adapt deep neural networks using only pre-trained source models and target data. However, accessing the source model still has a potential concern about leaking the source data, which reveals the patient's privacy. In this paper, we study the challenging but practical problem: black-box source-free domain adaptation where only the outputs of the source model and target data are available. We propose a simple but effective two-stage knowledge distillation method. In Stage \uppercase\expandafter{\romannumeral1}, we train the target model from scratch with soft pseudo-labels generated by the source model in a knowledge distillation manner. In Stage \uppercase\expandafter{\romannumeral2}, we initialize another model as the new student model to avoid the error accumulation caused by noisy pseudo-labels. We feed the images with weak augmentation to the teacher model to guide the learning of the student model. Our method is simple and flexible, and achieves surprising results on three cross-domain segmentation tasks.

翻訳日:2023-05-16 19:09:18 公開日:2023-05-13

# ウイルス感染症と細菌感染症の鑑別 : 血液検査値に基づく機械学習モデル

Differentiating Viral and Bacterial Infections: A Machine Learning Model Based on Routine Blood Test Values ( http://arxiv.org/abs/2305.07877v1 )

ライセンス: Link先を確認

Gregor Gun\v{c}ar, Matja\v{z} Kukar, Tim Smole, Sa\v{s}o Mo\v{s}kon, Toma\v{z} Vovko, Simon Podnar, Peter \v{C}ernel\v{c}, Miran Brvar, Mateja Notar, Manca K\"oster, Marjeta Tu\v{s}ek Jelenc, Marko Notar

(参考訳) 抗生物質耐性の脅威の増大は、適切な抗生物質投与のために細菌感染とウイルス感染の正確な区別を必要とする。本研究では,16種類の血液検査結果,c-reactive proteinレベル,生物学的性および年齢を用いて,これらの感染型を識別するために,ウイルス対バクテリア機械学習モデルを開発した。単一の医療センターから44,120件のデータセットを用いて、ウイルス対細菌モデルでは82.2%、ブライアスコア0.129、ROC曲線0.11以下の領域が従来のCRP決定規則モデルよりも高い精度で示された。このモデルは1040mg/LのCRP範囲内で、細菌とウイルスの感染を区別するために、CRPのみが限られた診断値を提供する間隔において、大幅に改善された精度を示す。これらの知見は、診断決定のための複数の血液パラメータを検討することの重要性を強調し、ウイルス対細菌モデルが革新的な診断ツールの作成に寄与することを示唆している。このようなツールは、機械学習と関連するバイオマーカーを利用して、感染症の管理における臨床意思決定を強化する。

The growing threat of antibiotic resistance necessitates accurate differentiation between bacterial and viral infections for proper antibiotic administration. In this study, a Virus vs. Bacteria machine learning model was developed to discern between these infection types using 16 routine blood test results, C-reactive protein levels, biological sex, and age. With a dataset of 44,120 cases from a single medical center, the Virus vs. Bacteria model demonstrated remarkable accuracy of 82.2%, a Brier score of 0.129, and an area under the ROC curve of 0.91, surpassing the performance of traditional CRP decision rule models. The model demonstrates substantially improved accuracy within the CRP range of 10 40 mg/L, an interval in which CRP alone offers limited diagnostic value for distinguishing between bacterial and viral infections. These findings underscore the importance of considering multiple blood parameters for diagnostic decision-making and suggest that the Virus vs. Bacteria model could contribute to the creation of innovative diagnostic tools. Such tools would harness machine learning and relevant biomarkers to support enhanced clinical decision-making in managing infections.

翻訳日:2023-05-16 19:09:02 公開日:2023-05-13

# SPP-CNN: ネットワークロバストネス予測のための効率的なフレームワーク

SPP-CNN: An Efficient Framework for Network Robustness Prediction ( http://arxiv.org/abs/2305.07872v1 )

ライセンス: Link先を確認

Chengpei Wu and Yang Lou and Lin Wang and Junli Li and Xiang Li and Guanrong Chen

(参考訳) 本稿では,ネットワークの接続性と悪意のある攻撃に対する制御性を維持するためのロバスト性について述べる。この種のネットワークの堅牢性は、通常、時間を要する攻撃シミュレーションによって測定され、ノードまたはエッジ削除攻撃のシーケンスの後、残りの接続性と制御可能性を記録する一連の値を返す。本稿では,空間ピラミッドプーリング畳み込みニューラルネットワーク(SPP-CNN)のネットワーク堅牢性予測のための効率的なフレームワークを開発する。新しいフレームワークは、畳み込み層と完全連結層の間に空間ピラミッドプーリング層を設置し、CNNベースの予測手法における一般的なミスマッチ問題を克服し、その一般化性を拡張する。 SPP-CNNと最先端の3つの堅牢性予測器、すなわちCNNベースの2つのグラフニューラルネットワークベースのフレームワークを比較して、大規模な実験を行う。配向および非配向の合成および実世界のネットワークについて検討した。実験の結果,提案したSPP-CNNは未知のデータセットに対する予測性能の向上と一般化性の向上を実現している。

This paper addresses the robustness of a network to sustain its connectivity and controllability against malicious attacks. This kind of network robustness is typically measured by the time-consuming attack simulation, which returns a sequence of values that record the remaining connectivity and controllability after a sequence of node- or edge-removal attacks. For improvement, this paper develops an efficient framework for network robustness prediction, the spatial pyramid pooling convolutional neural network (SPP-CNN). The new framework installs a spatial pyramid pooling layer between the convolutional and fully-connected layers, overcoming the common mismatch issue in the CNN-based prediction approaches and extending its generalizability. Extensive experiments are carried out by comparing SPP-CNN with three state-of-the-art robustness predictors, namely a CNN-based and two graph neural networks-based frameworks. Synthetic and real-world networks, both directed and undirected, are investigated. Experimental results demonstrate that the proposed SPP-CNN achieves better prediction performances and better generalizability to unknown datasets, with significantly lower time-consumption, than its counterparts.

翻訳日:2023-05-16 19:08:43 公開日:2023-05-13

# 事前学習型言語モデルを用いたスケーラブルな教育用質問生成

Scalable Educational Question Generation with Pre-trained Language Models ( http://arxiv.org/abs/2305.07871v1 )

ライセンス: Link先を確認

Sahan Bulathwela, Hamze Muse and Emine Yilmaz

(参考訳) 教育的質問の自動生成は、オンライン教育のスケールにおいて重要な役割を担い、グローバルな人口が個人化された学習旅行を運営しているときに、大規模に自己評価を可能にする。大規模言語モデルを適用した新しい教育的質問生成モデルである \textit{eduqg} を開発した。学術文献および科学質問データに基づく事前学習型言語モデルの構築と微調整により,<textit{EduQG} が優れた教育的質問を作成できることを示す。

The automatic generation of educational questions will play a key role in scaling online education, enabling self-assessment at scale when a global population is manoeuvring their personalised learning journeys. We develop \textit{EduQG}, a novel educational question generation model built by adapting a large language model. Our extensive experiments demonstrate that \textit{EduQG} can produce superior educational questions by further pre-training and fine-tuning a pre-trained language model on the scientific text and science question data.

翻訳日:2023-05-16 19:08:21 公開日:2023-05-13

# AIによるブリッジング履歴 : 予測精度とFact CheckingにおけるGPT3.5、GPT4、GoogleBARDの比較評価

Bridging History with AI A Comparative Evaluation of GPT 3.5, GPT4, and GoogleBARD in Predictive Accuracy and Fact Checking ( http://arxiv.org/abs/2305.07868v1 )

ライセンス: Link先を確認

Davut Emre Tasar, Ceren Ocal Tasar

(参考訳) デジタル時代の情報の急速な拡散は、正確な歴史的表現と解釈の重要性を強調している。人工知能は様々な分野で有望だが、歴史的事実チェックやギャップフィリングの可能性を秘めている。本研究では,LLM 3.5,GPT 4,GoogleBARD の3つの大言語モデルの性能を,与えられたデータに基づいて過去の事象を予測・検証する文脈で評価する。 DTR(Distance to Reality)と呼ばれる新しい指標を導入し、既存の歴史的事実からモデルのアウトプットを評価する。その結果, GPT 4は優れた性能を示すとともに, 歴史的研究におけるAIの潜在的な可能性を明らかにした。本稿では,過去の理解を深め,歴史知識のギャップを埋める上でのAIの役割について,さらなる研究の必要性を明らかにする。

The rapid proliferation of information in the digital era underscores the importance of accurate historical representation and interpretation. While artificial intelligence has shown promise in various fields, its potential for historical fact-checking and gap-filling remains largely untapped. This study evaluates the performance of three large language models LLMs GPT 3.5, GPT 4, and GoogleBARD in the context of predicting and verifying historical events based on given data. A novel metric, Distance to Reality (DTR), is introduced to assess the models' outputs against established historical facts. The results reveal a substantial potential for AI in historical studies, with GPT 4 demonstrating superior performance. This paper underscores the need for further research into AI's role in enriching our understanding of the past and bridging historical knowledge gaps.

翻訳日:2023-05-16 19:08:12 公開日:2023-05-13

# セマンティック対応による時間一貫性自動ビデオカラー化

Temporal Consistent Automatic Video Colorization via Semantic Correspondence ( http://arxiv.org/abs/2305.07904v1 )

ライセンス: Link先を確認

Yu Zhang, Siqi Chen, Mingdao Wang, Xianlin Zhang, Chuang Zhu, Yue Zhang, Xueming Li

(参考訳) 近年,ビデオカラー化作業が注目されている。近年の手法では,隣接するフレームやフレームの時間的一貫性に重点が置かれている。しかし,大きな間隔でフレーム間の不整合に直面する深刻な問題に直面しており,この問題を解決するために,セマンティック対応と自動ビデオカラー化を組み合わせて長距離一貫性を維持する新しい映像カラー化フレームワークを提案する。まず、参照着色ネットワークは、各ビデオの第1フレームを自動的に着色するように設計され、参照画像を取得し、以下の全着色プロセスを監督する。このような自動カラー化基準画像は、作業集約的かつ時間のかかる手動選択を回避できるだけでなく、参照画像とグレースケール画像の類似性を高めることができる。その後、セマンティック対応ネットワークと画像カラー化ネットワークを導入し、参照の助けを借りて残りのフレームの一連の色付けを行う。各フレームは、参照画像と即座に彩色された先行フレームの両方で監督され、短距離と長距離の時間的一貫性が向上する。広範な実験により,本手法は定性的および定量的に時間的一貫性を維持する他の手法よりも優れていることが示された。 NTIRE 2023ビデオカラー化チャレンジでは,色分布一貫性(CDC)最適化トラックで3位にランクインした。

Video colorization task has recently attracted wide attention. Recent methods mainly work on the temporal consistency in adjacent frames or frames with small interval. However, it still faces severe challenge of the inconsistency between frames with large interval.To address this issue, we propose a novel video colorization framework, which combines semantic correspondence into automatic video colorization to keep long-range consistency. Firstly, a reference colorization network is designed to automatically colorize the first frame of each video, obtaining a reference image to supervise the following whole colorization process. Such automatically colorized reference image can not only avoid labor-intensive and time-consuming manual selection, but also enhance the similarity between reference and grayscale images. Afterwards, a semantic correspondence network and an image colorization network are introduced to colorize a series of the remaining frames with the help of the reference. Each frame is supervised by both the reference image and the immediately colorized preceding frame to improve both short-range and long-range temporal consistency. Extensive experiments demonstrate that our method outperforms other methods in maintaining temporal consistency both qualitatively and quantitatively. In the NTIRE 2023 Video Colorization Challenge, our method ranks at the 3rd place in Color Distribution Consistency (CDC) Optimization track.

翻訳日:2023-05-16 19:01:04 公開日:2023-05-13

# SUMO-Kを高次集合論に翻訳する

Translating SUMO-K to Higher-Order Set Theory ( http://arxiv.org/abs/2305.07903v1 )

ライセンス: Link先を確認

Chad Brown, Adam Pease, Josef Urban

(参考訳) 我々はSUMO(SUMO-K)の断片から高階集合論への変換を記述する。この翻訳は、一階を超えて、これまで非公式の解釈しかなかった相撲の一部の形式的な意味論を提供する。また、大きな常識オントロジーを非常に安全な対話的定理証明システムに組み込むのも初めてである。我々は、SUMOの高次構造の一部を含む一階構造からSUMOの矛盾を見つけるためのこれまでの研究をさらに拡張する。最後に、この翻訳を用いて、高階の対話的自動定理証明器を用いて証明できる問題を作成することができる。これはいくつかのシステムでテストされており、高階の常識推論問題のコーパスを形成するために使用できる。

We describe a translation from a fragment of SUMO (SUMO-K) into higher-order set theory. The translation provides a formal semantics for portions of SUMO which are beyond first-order and which have previously only had an informal interpretation. It also for the first time embeds a large common-sense ontology into a very secure interactive theorem proving system. We further extend our previous work in finding contradictions in SUMO from first order constructs to include a portion of SUMO's higher order constructs. Finally, using the translation, we can create problems that can be proven using higher-order interactive and automated theorem provers. This is tested in several systems and can be used to form a corpus of higher-order common-sense reasoning problems.

翻訳日:2023-05-16 19:00:43 公開日:2023-05-13

# 量子計算を用いた電子構造計算

Electronic Structure Calculations using Quantum Computing ( http://arxiv.org/abs/2305.07902v1 )

ライセンス: Link先を確認

Nouhaila Innan, Muhammad Al-Zafar Khan, and Mohamed Bennai

(参考訳) 量子レベルでの電子構造特性の計算は、現代の物理学研究の重要な側面である。しかし、従来の手法はより大きく複雑なシステムに対して計算的に要求することができる。この問題に対処するために,変分量子固有解法(VQE)アルゴリズムを用いたハイブリッド古典量子計算手法を提案する。量子系を量子ビットのセットにマッピングし、量子回路を用いて基底状態の波動関数を準備することにより、従来の方法よりも少ない計算資源を必要とする流線型プロセスを実現する。本アルゴリズムは, 分子の密度汎関数理論やハートリー・フォック理論など, 従来の電子構造法と比較して, 比較的少ない資源を有効利用しながら, 類似した精度を実証した。これらの結果は,新しい材料や技術の発展を早めるアルゴリズムの可能性を示している。この研究は、電子構造計算の計算上の課題を克服する道を開く。これは、量子コンピューティングが複雑な量子システムの理解を前進させる上での変換的影響を示している。

The computation of electronic structure properties at the quantum level is a crucial aspect of modern physics research. However, conventional methods can be computationally demanding for larger, more complex systems. To address this issue, we present a hybrid Classical-Quantum computational procedure that uses the Variational Quantum Eigensolver (VQE) algorithm. By mapping the quantum system to a set of qubits and utilising a quantum circuit to prepare the ground state wavefunction, our algorithm offers a streamlined process requiring fewer computational resources than classical methods. Our algorithm demonstrated similar accuracy in rigorous comparisons with conventional electronic structure methods, such as Density Functional Theory and Hartree-Fock Theory, on a range of molecules while utilising significantly fewer resources. These results indicate the potential of the algorithm to expedite the development of new materials and technologies. This work paves the way for overcoming the computational challenges of electronic structure calculations. It demonstrates the transformative impact of quantum computing on advancing our understanding of complex quantum systems.

翻訳日:2023-05-16 19:00:32 公開日:2023-05-13

# パワーグリッドにおける分断スイッチと量子アニーリングの組合せに関する最適化問題に関する研究

A study of the optimization problem on the combination of sectionalizing switches in power grid with quantum annealing ( http://arxiv.org/abs/2305.07899v1 )

ライセンス: Link先を確認

Masaya Takahashi, Hiroaki Nishioka, Masahiro Hirai, Hidetaka Takano

(参考訳) 地球温暖化の観点からは、電力網の効率改善が課題となっている。電力グリッドには、電力の流れを制御する多くのスイッチングデバイスがある。ワイヤにわずかな抵抗があり、電力消費は電流の正方形に比例するので、電流の供給経路を変えるスイッチ値の組み合わせに応じて、ワイヤ上の電力損失の値が変化する。スイッチの組み合わせの総数はスイッチ数とともに指数関数的に増加し、スイッチ値の最適な組み合わせを見つけるために様々なアルゴリズムが研究されている。本稿では,電力網におけるスイッチ結合問題を2次非拘束二元最適化(QUBO)として捉え,量子アニールを用いた評価関数を導出する手法を提案する。結果は特許庁でp6736787として登録される。

From the perspective of global warming, efficiency improvement of power grids is a pressing issue. Power grids have many switching devices to control the flow of electricity. Since there is a slight resistance in the wires and power consumption is proportional to the square of the current, the value of power loss on the wires changes depending on the combination of switch values that change the supply path of the current. The total number of switch combinations increases exponentially with the number of switches, and various algorithms have been studied to find the optimal combination of switch values. We propose a method to capture the switch combination problem in power grids as quadratic unconstrained binary optimization (QUBO) and derive an evaluation function to solve it using quantum annealing. The result is registered as a patent P6736787 at Japanese patent office.

翻訳日:2023-05-16 19:00:16 公開日:2023-05-13

# network-giant: harmonic hessian consensusによる完全分散ニュートン型最適化

Network-GIANT: Fully distributed Newton-type optimization via harmonic Hessian consensus ( http://arxiv.org/abs/2305.07898v1 )

ライセンス: Link先を確認

Alessio Maritan, Ganesh Sharma, Luca Schenato, Subhrakanti Dey

(参考訳) 本稿では,局所最適化と近隣ノード間の情報交換による局所的目的(経験的損失)関数の和を最小化する分散マルチエージェント学習の課題について考察する。本稿では,集中型パラメータサーバに依存する連合学習アルゴリズムである giant に基づく,ニュートン型完全分散最適化アルゴリズム network-giant を提案する。ネットワークジャイアントアルゴリズムは、各ノードにおける勾配追跡とニュートン型反復アルゴリズムの組み合わせによって設計され、局所勾配とニュートン更新のコンセンサスに基づく平均化を行う。提案アルゴリズムは,強い凸関数と滑らかな損失関数を仮定して,ネットワーク上の厳密解に対する半グローバルおよび指数収束を保証する。本稿では,ネットワークダインやニュートン・ラフソンコンセンサスなどの最先端分散学習アルゴリズムよりも,ネットワークジャイアントの収束性能が優れていることを示す実証的証拠を提供する。

This paper considers the problem of distributed multi-agent learning, where the global aim is to minimize a sum of local objective (empirical loss) functions through local optimization and information exchange between neighbouring nodes. We introduce a Newton-type fully distributed optimization algorithm, Network-GIANT, which is based on GIANT, a Federated learning algorithm that relies on a centralized parameter server. The Network-GIANT algorithm is designed via a combination of gradient-tracking and a Newton-type iterative algorithm at each node with consensus based averaging of local gradient and Newton updates. We prove that our algorithm guarantees semi-global and exponential convergence to the exact solution over the network assuming strongly convex and smooth loss functions. We provide empirical evidence of the superior convergence performance of Network-GIANT over other state-of-art distributed learning algorithms such as Network-DANE and Newton-Raphson Consensus.

翻訳日:2023-05-16 19:00:03 公開日:2023-05-13

# 大規模マルチモーダルモデルにおけるOCRの隠れミステリーについて

On the Hidden Mystery of OCR in Large Multimodal Models ( http://arxiv.org/abs/2305.07895v1 )

ライセンス: Link先を確認

Yuliang Liu, Zhang Li, Hongliang Li, Wenwen Yu, Mingxin Huang, Dezhi Peng, Mingyu Liu, Mingrui Chen, Chunyuan Li, Lianwen Jin, Xiang Bai

(参考訳) 大規模モデルは近年,自然言語処理やマルチモーダル視覚言語学習において重要な役割を担っている。テキスト関連視覚タスクにおける有効性については,いまだ検討されていない。既存のマルチモーダルモデルについて総合的研究を行い,テキスト認識,テキストに基づく視覚的質問応答,キー情報抽出の性能評価を行った。これらのモデルの強みと弱みは、主に単語認識の意味的理解に依存し、個々の文字形状に対する劣った知覚を示す。また、テキスト長に対する差分を表示し、画像のきめ細かい特徴を検出する能力に制限がある。その結果,現在最も強力な大規模マルチモーダルモデルでさえ,従来のテキストタスクではドメイン固有メソッドと一致せず,より複雑なタスクでは大きな課題に直面していることがわかった。最も重要な点は,ゼロショットマルチモーダル技術の向上を目的とした革新的戦略の構想と評価のための基礎的枠組みを,本研究で提示した基礎的結果が提供できることである。評価パイプラインはhttps://github.com/Yuliang-Liu/MultimodalOCRで提供される。

Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. It remains less explored about their efficacy in text-related visual tasks. We conducted a comprehensive study of existing publicly available multimodal models, evaluating their performance in text recognition, text-based visual question answering, and key information extraction. Our findings reveal strengths and weaknesses in these models, which primarily rely on semantic understanding for word recognition and exhibit inferior perception of individual character shapes. They also display indifference towards text length and have limited capabilities in detecting fine-grained features in images. Consequently, these results demonstrate that even the current most powerful large multimodal models cannot match domain-specific methods in traditional text tasks and face greater challenges in more complex tasks. Most importantly, the baseline results showcased in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal techniques. Evaluation pipeline will be available at https://github.com/Yuliang-Liu/MultimodalOCR.

翻訳日:2023-05-16 18:59:48 公開日:2023-05-13

# 3次元非教師付き(深く)教師付きニューラルネットワークを用いた多孔質部品のボクセルワイズ分類

Voxel-wise classification for porosity investigation of additive manufactured parts with 3D unsupervised and (deeply) supervised neural networks ( http://arxiv.org/abs/2305.07894v1 )

ライセンス: Link先を確認

Domenico Iuso, Soumick Chatterjee, Jan De Beenhouwer, Jan Sijbers

(参考訳) アダプティブ・マニュファクチャリング(AM)は、デジタルモデルからサンプルを直接生産できる製造プロセスとして登場した。バッチのすべての製造サンプルで品質基準が満たされることを保証するため、X線CT(Computerd Tomography)が自動異常検出と組み合わせられることが多い。後者では、画像品質の低下に対して分析され、耐性がある材料に対して堅牢であるように訓練できるため、ディープラーニング(DL)異常検出技術が増えている。残念なことに、最近のDLモデルは2次元画像処理のために開発されており、貴重なボリューム情報を無視している。本研究は,X-CT画像からのAMサンプルのポロシティ解析のための非教師付き (UNet, UNet++, UNet 3+, MSS-UNet) と非教師付き (VAE, ceVAE, gmVAE, vqVAE) DLモデルを再検討し, 3次元パッチパイプラインを用いて3次元入力データを受け入れるように拡張した。教師付きモデルはFocal Tversky損失を用いてトレーニングされ、トレーニングデータセットの低いポロシティから生じるクラス不均衡に対処した。教師なしモデルの出力は、オブジェクト表面を適切に表現できないことによる誤分類を減らすために後処理される。その結果,DLモデルの性能ベンチマーク,ポストプロセッシングアルゴリズムの評価,教師なしモデルの出力による教師なしモデルのトレーニング効果の評価など,5倍の精度で検証された。画像品質の悪いテストセットの最終的なパフォーマンスベンチマークでは、最高のパフォーマンス教師付きモデルは平均精度0.808$\pm$0.013のMSS-UNetであり、最も優れた教師なしモデルは処理後のceVAE 0.935$\pm$ 0.001である。 VAE/ceVAEモデルは特に後処理技術を活用する際に優れた性能を示した。

Additive Manufacturing (AM) has emerged as a manufacturing process that allows the direct production of samples from digital models. To ensure that quality standards are met in all manufactured samples of a batch, X-ray computed tomography (X-CT) is often used combined with automated anomaly detection. For the latter, deep learning (DL) anomaly detection techniques are increasingly, as they can be trained to be robust to the material being analysed and resilient towards poor image quality. Unfortunately, most recent and popular DL models have been developed for 2D image processing, thereby disregarding valuable volumetric information. This study revisits recent supervised (UNet, UNet++, UNet 3+, MSS-UNet) and unsupervised (VAE, ceVAE, gmVAE, vqVAE) DL models for porosity analysis of AM samples from X-CT images and extends them to accept 3D input data with a 3D-patch pipeline for lower computational requirements, improved efficiency and generalisability. The supervised models were trained using the Focal Tversky loss to address class imbalance that arises from the low porosity in the training datasets. The output of the unsupervised models is post-processed to reduce misclassifications caused by their inability to adequately represent the object surface. The findings were cross-validated in a 5-fold fashion and include: a performance benchmark of the DL models, an evaluation of the post-processing algorithm, an evaluation of the effect of training supervised models with the output of unsupervised models. In a final performance benchmark on a test set with poor image quality, the best performing supervised model was MSS-UNet with an average precision of 0.808 $\pm$ 0.013, while the best unsupervised model was the post-processed ceVAE with 0.935 $\pm$ 0.001. The VAE/ceVAE models demonstrated superior capabilities, particularly when leveraging post-processing techniques.

翻訳日:2023-05-16 18:59:32 公開日:2023-05-13

# PESTS: セマンティックテキスト類似性のためのペルシャ英語クロスリンガルコーパス

PESTS: Persian_English Cross Lingual Corpus for Semantic Textual Similarity ( http://arxiv.org/abs/2305.07893v1 )

ライセンス: Link先を確認

Mohammad Abdous, Poorya Piroozfar, Behrouz Minaei Bidgoli

(参考訳) 最近多くの調査を受けた自然言語処理のコンポーネントの1つは、セマンティックテキストの類似性である。計算言語学や自然言語処理では、単語、句、段落、テキストの意味的類似性を評価することが重要である。意味的類似性(semantic similarity)は、単言語版とクロス言語版の両方で提供される2つのテキスト片、段落、句間の意味的類似度を計算することである。言語間の意味的類似性は、ソース言語とターゲット言語の両方に意味的類似度を持つ文対が存在するコーパスを必要とする。多くの既存の言語間セマンティック類似モデルでは、機械翻訳誤差の伝搬がモデルの精度を低下させるクロス言語間セマンティック類似性データセットが利用できないため、機械翻訳を用いる。一方、機械翻訳に意味的類似性を利用したい場合は、意味的類似性のために同じ機械翻訳を使うべきではない。ペルシャ語は低資源言語の1つであるが、この点において努力は行われておらず、2つの言語の文脈を理解できるモデルの必要性はこれまで以上に感じられる。本稿では,ペルシア語と英語の文間の意味的テキスト類似性のコーパスを,言語専門家を用いて初めて作成した。このデータセットをPESTS (Persian English Semantic Textual similarity) と名付けた。このコーパスは5375の文対を含む。また、トランスフォーマーに基づくモデルもこのデータセットを使って微調整されている。その結果、PESTSデータセットを用いて、XLM ROBERTaモデルのピアソン相関は85.87%から95.62%に増加した。

One of the components of natural language processing that has received a lot of investigation recently is semantic textual similarity. In computational linguistics and natural language processing, assessing the semantic similarity of words, phrases, paragraphs, and texts is crucial. Calculating the degree of semantic resemblance between two textual pieces, paragraphs, or phrases provided in both monolingual and cross-lingual versions is known as semantic similarity. Cross lingual semantic similarity requires corpora in which there are sentence pairs in both the source and target languages with a degree of semantic similarity between them. Many existing cross lingual semantic similarity models use a machine translation due to the unavailability of cross lingual semantic similarity dataset, which the propagation of the machine translation error reduces the accuracy of the model. On the other hand, when we want to use semantic similarity features for machine translation the same machine translations should not be used for semantic similarity. For Persian, which is one of the low resource languages, no effort has been made in this regard and the need for a model that can understand the context of two languages is felt more than ever. In this article, the corpus of semantic textual similarity between sentences in Persian and English languages has been produced for the first time by using linguistic experts. We named this dataset PESTS (Persian English Semantic Textual Similarity). This corpus contains 5375 sentence pairs. Also, different models based on transformers have been fine-tuned using this dataset. The results show that using the PESTS dataset, the Pearson correlation of the XLM ROBERTa model increases from 85.87% to 95.62%.

翻訳日:2023-05-16 18:58:53 公開日:2023-05-13

# dac-mr: メタ学習のためのデータ拡張一貫性に基づくメタレギュライゼーション

DAC-MR: Data Augmentation Consistency Based Meta-Regularization for Meta-Learning ( http://arxiv.org/abs/2305.07892v1 )

ライセンス: Link先を確認

Jun Shu, Xiang Yuan, Deyu Meng, Zongben Xu

(参考訳) 最近、メタ学習は研究され、現代の機械学習の進歩に貢献した。しかし、優れたメタ学習モデルを実現するには、基礎となるタスクの一般化目標を表す高品質なメタデータを備えた大量のトレーニングタスクが必要である。しかし、現在のメタデータ駆動型メタ学習アプローチは、不十分なトレーニングタスクで満足なメタモデルをトレーニングすることがかなり難しい。この問題に対処するため,メタ知識をメタ学習プロセスに統合することによりメタ学習を改善するメタ知識情報メタ学習(MKIML)フレームワークを提案する。メタモデル関数クラスのキャパシティ複雑性を正規化するために,適切なメタ正規化(MR)目標を用いてメタ知識をメタオブジェクトに統合し,未確認タスクの一般化を容易にする。 DAC-MRで表されるMR目標をインスタンス化するためのメタ知識として,不変性を符号化するためのデータ拡張整合性を導入する。提案するdac-mrは、ノイズ、スパース、あるいは使用不能なメタデータを持つトレーニングタスクから、パフォーマンスのよいメタモデルを学ぶことを希望する。理論的には,DAC-MRは,高品質なメタデータを持たないメタモデルを評価するために用いられるプロキシメタオブジェクトとして扱うことができる。さらに,DAC-MRと組み合わせたメタデータ駆動型メタロスは,より優れたメタレベルの一般化を実現することができる。異なるネットワークアーキテクチャとベンチマークを持つ10のメタラーニングタスクは、メタモデル学習を支援するdac-mrの能力を示しています。 DAC-MRの優れた性能は、すべての設定で得られ、我々の理論的知見とよく一致している。これは、私たちのDAC-MRは問題に非依存であり、広範なメタ学習問題やタスクに容易に適用できることを望んでいます。

Meta learning recently has been heavily researched and helped advance the contemporary machine learning. However, achieving well-performing meta-learning model requires a large amount of training tasks with high-quality meta-data representing the underlying task generalization goal, which is sometimes difficult and expensive to obtain for real applications. Current meta-data-driven meta-learning approaches, however, are fairly hard to train satisfactory meta-models with imperfect training tasks. To address this issue, we suggest a meta-knowledge informed meta-learning (MKIML) framework to improve meta-learning by additionally integrating compensated meta-knowledge into meta-learning process. We preliminarily integrate meta-knowledge into meta-objective via using an appropriate meta-regularization (MR) objective to regularize capacity complexity of the meta-model function class to facilitate better generalization on unseen tasks. As a practical implementation, we introduce data augmentation consistency to encode invariance as meta-knowledge for instantiating MR objective, denoted by DAC-MR. The proposed DAC-MR is hopeful to learn well-performing meta-models from training tasks with noisy, sparse or unavailable meta-data. We theoretically demonstrate that DAC-MR can be treated as a proxy meta-objective used to evaluate meta-model without high-quality meta-data. Besides, meta-data-driven meta-loss objective combined with DAC-MR is capable of achieving better meta-level generalization. 10 meta-learning tasks with different network architectures and benchmarks substantiate the capability of our DAC-MR on aiding meta-model learning. Fine performance of DAC-MR are obtained across all settings, and are well-aligned with our theoretical insights. This implies that our DAC-MR is problem-agnostic, and hopeful to be readily applied to extensive meta-learning problems and tasks.

翻訳日:2023-05-16 18:58:28 公開日:2023-05-13

# 構造シミュレーションとブリッジ健康モニタリングのためのニューラルオペレータ

Neural operator for structural simulation and bridge health monitoring ( http://arxiv.org/abs/2305.07889v1 )

ライセンス: Link先を確認

Chawit Kaewnuratchadasorn, Jiaji Wang, Chul-Woo Kim

(参考訳) 構造工学による深層学習は,前向き問題(構造シミュレーション)と逆問題(構造健康モニタリング)の両方に広く注目されている。フーリエ・ニューラル・オペレーターに基づいて,橋梁構造のディジタル双対としてvino(vehicle-bridge interaction neural operator)を提案する。 VINOは構造応答場と損傷場のマッピングを学ぶ。本研究では, 構造初期損傷場のランダム分布を考慮したパラメータ有限要素(FE)シミュレーションにより, VBI-FEデータセットを構築した。その後、vbi-expデータセットは4つの損傷シナリオで実験的研究を行った。 VINOはVBI-FEによって事前訓練され、VBI-EXPによって正常状態の橋から微調整された後、以下の2つの改善が達成された。まず、フォワードVINOは、FEモデルよりも正確に損傷場入力から構造応答を予測できる。第二に、逆VINOはすべてのシナリオにおけるダメージを決定、ローカライズ、定量化し、データ駆動アプローチの実践性を示唆する。

Infusing deep learning with structural engineering has received widespread attention for both forward problems (structural simulation) and inverse problems (structural health monitoring). Based on Fourier Neural Operator, this study proposes VINO (Vehicle-bridge Interaction Neural Operator) to serve as the digital twin of bridge structures. VINO learns mappings between structural response fields and damage fields. In this study, VBI-FE dataset was established by running parametric finite element (FE) simulations considering a random distribution of structural initial damage field. Subsequently, VBI-EXP dataset was produced by conducting an experimental study under four damage scenarios. After VINO was pre-trained by VBI-FE and fine-tuned by VBI-EXP from the bridge at the healthy state, the model achieved the following two improvements. First, forward VINO can predict structural responses from damage field inputs more accurately than the FE model. Second, inverse VINO can determine, localize, and quantify damages in all scenarios, suggesting the practicality of data-driven approaches.

翻訳日:2023-05-16 18:57:58 公開日:2023-05-13

# 滑らかな演算子に基づく一段階量子探索アルゴリズム

One-step quantum search algorithms based on smooth operators ( http://arxiv.org/abs/2305.07924v1 )

ライセンス: Link先を確認

Basanta R. Pahari, Sagar Bhat, Siri Davidi, William Oates

(参考訳) 微分や積分の発見は科学知識の飛躍的な飛躍であり、数学、物理学、工学など多くの分野に革命をもたらした。高次微分の存在は近似の精度を高め、任意の物理現象のより正確なモデリングを可能にする。ここでは、無限に微分可能な滑らかな演算子を用いて、2つの量子探索アルゴリズムを構築し、一見異なる領域を接続する。滑らかな関数とともに、置換演算子とユニティの根を利用して量子探索を行う量子回路を生成する。量子シミュレータを用いてモデルを検証し、IBMの量子ハードウェア上でテストする。さらに,ノイズと誤差伝播の効果について検討し,groverのアルゴリズムのような反復法に比べて,雑音に対してより頑健な手法であることを示す。

The discovery of derivatives and integrals was a tremendous leap in scientific knowledge and completely revolutionized many fields, including mathematics, physics, and engineering. The existence of higher-order derivatives means better approximation and, thus, more accurate modeling of any physical phenomenon. Here we use smooth operators that are infinitely differentiable to construct two quantum search algorithms and connect these seemingly different areas. Along with smooth functions, permutation operators and the roots of unity are exploited to create quantum circuits to perform a quantum search. We validate our models through quantum simulators and test them on IBM's quantum hardware. Furthermore, we investigate the effect of noise and error propagation and demonstrate that our approach is more robust to noise compared to iterative methods like Grover's algorithm.

翻訳日:2023-05-16 18:51:58 公開日:2023-05-13

# CodeT5+: コード理解と生成のためのオープンコード大言語モデル

CodeT5+: Open Code Large Language Models for Code Understanding and Generation ( http://arxiv.org/abs/2305.07922v1 )

ライセンス: Link先を確認

Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi

(参考訳) 大きな言語モデル (LLM) は膨大なソースコードで事前訓練されており、コードインテリジェンスにおいて顕著な進歩を遂げている。しかし、既存のLLMにはアーキテクチャと事前訓練タスクの2つの主な制限がある。まず、特定のアーキテクチャ(エンコーダのみまたはデコーダのみ)を採用するか、あるいは異なるダウンストリームタスクに統一されたエンコーダデコーダネットワークに依存する。前者のパラダイムはアプリケーションの柔軟性によって制限されるが、後者では、モデルが全てのタスクに対して単一のシステムとして扱われ、タスクのサブセット上での最適なパフォーマンスをもたらす。第2に,ダウンストリームタスクとは無関係な,限定的な事前トレーニング目標を採用して,結果としてパフォーマンスが大幅に低下することが多い。これらの制限に対処するために,コンポーネントモジュールを柔軟に組み合わせて幅広いダウンストリームコードタスクに適合させることができるコード用エンコーダデコーダLLMのファミリーである ``CodeT5+' を提案する。このような柔軟性は,プレトレイン-ファイントゥーンの相違を緩和するための事前学習目的の混合によって実現される。これらの目的は、単調かつバイモーダルな多言語コードコーパスにおいて、認知、コントラスト学習、テキストコードマッチング、因果的LM事前訓練タスクをカバーする。さらに,スクラッチからトレーニングを受けることなく既製のLLMでCodeT5+を初期化してモデルを効率的にスケールアップし,自然言語命令と整合するインストラクションチューニングについて検討する。我々は、ゼロショット、微調整、命令調整を含む20以上のコード関連ベンチマークでCodeT5+を広範囲に評価した。我々は,コード生成や完了,数学プログラミング,テキスト・ツー・コード検索タスクなど,コード関連タスクにおける最先端(SoTA)モデルのパフォーマンスを観察する。特に,命令調整した CodeT5+ 16B では,HumanEval コード生成タスクにおいて,他のオープンコード LLM に対して新たな SoTA 結果が得られる。

Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limited by inflexibility in applications while in the latter, the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of tasks. Secondly, they often employ a limited set of pretraining objectives which might not be relevant to some downstream tasks and hence result in substantial performance degrade. To address these limitations, we propose ``CodeT5+'', a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. Such flexibility is enabled by our proposed mixture of pretraining objectives to mitigate the pretrain-finetune discrepancy. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora. Furthermore, we propose to initialize CodeT5+ with frozen off-the-shelf LLMs without training from scratch to efficiently scale up our models, and explore instruction-tuning to align with natural language instructions. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning. We observe state-of-the-art (SoTA) model performance on various code-related tasks, such as code generation and completion, math programming, and text-to-code retrieval tasks. Particularly, our instruction-tuned CodeT5+ 16B achieves new SoTA results on HumanEval code generation task against other open code LLMs.

翻訳日:2023-05-16 18:51:45 公開日:2023-05-13

# 医用ビジョンランゲージ事前トレーニングのためのアライメントモデリングによるマルチタスクペアマスキング

Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training ( http://arxiv.org/abs/2305.07920v1 )

ライセンス: Link先を確認

Ke Zhang, Hanliang Jiang, Jian Zhang, Qingming Huang, Jianping Fan, Jun Yu and Weidong Han

(参考訳) 近年,医用画像診断の需要が高まり,放射線科医にとって大きな負担となっている。既存のmed-vlp手法は,大規模医用画像から普遍表現を学習する自動医用画像解析のソリューションを提供し,細かなアノテーションを必要とせずに下流タスクに便益を与える。しかし, 既存の画像・テキスト合成手法では, 関節再建にともなうクロスモーダルアライメントの重要性が無視され, 不適切なクロスモーダル相互作用が得られた。本稿では,マルチタスク・ペアリング・マスク・アライメント(mpma)に基づく統合型メド・vlpフレームワークを提案し,クロスモーダルアライメントタスクを統合画像テキスト合成フレームワークに統合し,より包括的なクロスモーダルインタラクションを実現する。より包括的なクロスモーダル融合を実現するため,視覚的特徴を完全に統合し,レポート再構築のプロセスを支援するメモリ拡張クロスモーダル融合(MA-CMF)モジュールも提案する。実験の結果,提案手法は,ユニモーダルタスク,クロスモーダルタスク,マルチモーダルタスクなど,すべての下流タスクに対して従来手法よりも優れていた。

In recent years, the growing demand for medical imaging diagnosis has brought a significant burden to radiologists. The existing Med-VLP methods provide a solution for automated medical image analysis which learns universal representations from large-scale medical images and reports and benefits downstream tasks without requiring fine-grained annotations. However, the existing methods based on joint image-text reconstruction neglect the importance of cross-modal alignment in conjunction with joint reconstruction, resulting in inadequate cross-modal interaction. In this paper, we propose a unified Med-VLP framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework to achieve more comprehensive cross-modal interaction, while a global and local alignment (GLA) module is designed to assist self-supervised paradigm in obtaining semantic representations with rich domain knowledge. To achieve more comprehensive cross-modal fusion, we also propose a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual features to assist in the process of report reconstruction. Experimental results show that our approach outperforms previous methods over all downstream tasks, including uni-modal, cross-modal and multi-modal tasks.

翻訳日:2023-05-16 18:51:12 公開日:2023-05-13

# マルチオブザーバによる高次元モニタリングとリアリズムの出現

High-dimensional monitoring and the emergence of realism via multiple observers ( http://arxiv.org/abs/2305.07919v1 )

ライセンス: Link先を確認

Alexandre C. Orthey Jr., Pedro R. Dieguez, Owidiusz Makuta, Remigiusz Augusiak

(参考訳) 量子測定はユニタリ進化であり、後に部分的トレースが続く。そこで本研究では,量子世界の物理的現実の出現を,弱度と強い非選択性の測定を補間するモデルを導入することによって解決する。一般化オブザーバブルとハイゼンベルク・ワイル作用素に基づくモデルでは,高次元のquditに対しては,量子ダーウィン主義の枠組みに従って,システムと複数の環境quditと相互作用させることで,システムに関する完全な情報を得ることができることを示唆する。

Quantum measurements are unitary evolutions followed by partial traces. Based on that, we address the problem of the emergence of physical reality from the quantum world by introducing a model that interpolates between weak and strong non-selective measurements for qudits. Our model, which is based on generalized observables and Heisenberg-Weyl operators, suggests that for high-dimensional qudits, full information about the system can only be obtained by making the system interact with not just one but several environmental qudits, following a Quantum Darwinism framework.

翻訳日:2023-05-16 18:50:44 公開日:2023-05-13

# スペシャリストの原理の証明

A Proof of Specker's Principle ( http://arxiv.org/abs/2305.07917v1 )

ライセンス: Link先を確認

Guido Bacciagaluppi

(参考訳) スペクターの原理、対の直交命題は共同直交でなければならないという条件は、量子力学を特徴づける物理原理を見つけるプログラムの中で、近年広く研究されている。しかし、透明な正当性が欠けていることがほとんどである。本稿では,最大エンタングルメントの存在,非最大測定の存在,および符号付けの3つの仮定から,スペクトルの原理を導出する。これら3つの仮定について議論し、2つの命題を満たす非Specker集合の正準例を記述する。これらの例は、量子力学の解釈における様々なアプローチ、特にレトロカウセーションに基づく類似性を示す。また、ポープスクやローリッヒの作品との関係についても論じる。証明の核心(そして、署名の禁止に違反する主な例)は、私が紙を開く『ニーネヴェのシーザー』というスペクターの物語の変種によって説明されている。

Specker's principle, the condition that pairwise orthogonal propositions must be jointly orthogonal, has been much investigated recently within the programme of finding physical principles to characterise quantum mechanics. It largely appears, however, to lack a transparent justification. In this paper, I provide a derivation of Specker's principle from three assumptions (made suitably precise): the existence of maximal entanglement, the existence of non-maximal measurements, and no-signalling. I discuss these three assumptions and describe canonical examples of non-Specker sets of propositions satisfying any two of them. These examples display analogies with various approaches in the interpretation of quantum mechanics, notably ones based on retrocausation. I also discuss connections with the work of Popescu and Rohrlich. The core of the proof (and the main example violating no-signalling) is illustrated by a variant of Specker's tale of the seer of Nineveh, with which I open the paper.

翻訳日:2023-05-16 18:50:33 公開日:2023-05-13

# 干渉による測定のための量子不確かさ原理

Quantum Uncertainty Principles for Measurements with Interventions ( http://arxiv.org/abs/2305.07914v1 )

ライセンス: Link先を確認

Yunlong Xiao, Yuxiang Yang, Ximing Wang, Qing Liu, Mile Gu

(参考訳) ハイゼンベルクの不確実性原理は、量子システムのどの性質を同時に学べるかに関する基本的な制約を意味する。しかし、通常は、これらの性質を1つの点で測定することで調査する。対照的に、複雑なプロセスにおける因果依存性を推測するには、しばしば対話的な実験を必要とする。ここでは任意の介入ラウンドを含む一般的な対話的測定のための普遍的不確実性原理を示す。ケーススタディとして,異なる因果関係に適合する測定値間の不確実性トレードオフを示唆することを示す。

Heisenberg's uncertainty principle implies fundamental constraints on what properties of a quantum system can we simultaneously learn. However, it typically assumes that we probe these properties via measurements at a single point in time. In contrast, inferring causal dependencies in complex processes often requires interactive experimentation - multiple rounds of interventions where we adaptively probe the process with different inputs to observe how they affect outputs. Here we demonstrate universal uncertainty principles for general interactive measurements involving arbitrary rounds of interventions. As a case study, we show that they imply an uncertainty trade-off between measurements compatible with different causal dependencies.

翻訳日:2023-05-16 18:50:16 公開日:2023-05-13

# 時間知識グラフ補完のためのプロンプト付き事前学習言語モデル

Pre-trained Language Model with Prompts for Temporal Knowledge Graph Completion ( http://arxiv.org/abs/2305.07912v1 )

ライセンス: Link先を確認

Wenjie Xu, Ben Liu, Miao Peng, Xu Jia, Min Peng

(参考訳) 時間知識グラフ補完(TKGC)は、事実の欠落部分を完成させるために既知のタイムスタンプでの推論を含む重要なタスクであり、近年ますます注目を集めている。既存の手法のほとんどは、時間スタンプから情報を不正確に抽出しながら、グラフニューラルネットワークに基づく表現の学習に重点を置いている。これらの問題に対処するため,我々は新しいtkgcモデル,すなわちtkgc (ppt) のプロンプト付き事前学習言語モデルを提案する。サンプルの四重項を事前訓練した言語モデル入力に変換し、タイムスタンプ間の間隔を異なるプロンプトに変換し、暗黙的な意味情報を持つ一貫性のある文を生成する。我々は、TKGCタスクをマスク付きトークン予測タスクに変換するためのマスキング戦略でモデルを訓練し、事前訓練された言語モデルにおける意味情報を活用することができる。 3つのベンチマークデータセットに関する実験と広範な分析によって、我々のモデルは4つのメトリクスを持つ他のモデルと比較して大きな競合性を示している。我々のモデルは、時間的知識グラフからの情報を言語モデルに効果的に組み込むことができる。

Temporal Knowledge graph completion (TKGC) is a crucial task that involves reasoning at known timestamps to complete the missing part of facts and has attracted more and more attention in recent years. Most existing methods focus on learning representations based on graph neural networks while inaccurately extracting information from timestamps and insufficiently utilizing the implied information in relations. To address these problems, we propose a novel TKGC model, namely Pre-trained Language Model with Prompts for TKGC (PPT). We convert a series of sampled quadruples into pre-trained language model inputs and convert intervals between timestamps into different prompts to make coherent sentences with implicit semantic information. We train our model with a masking strategy to convert TKGC task into a masked token prediction task, which can leverage the semantic information in pre-trained language models. Experiments on three benchmark datasets and extensive analysis demonstrate that our model has great competitiveness compared to other models with four metrics. Our model can effectively incorporate information from temporal knowledge graphs into the language models.

翻訳日:2023-05-16 18:50:07 公開日:2023-05-13

# 遅延適応型政策最適化とバンディットフィードバックによる逆mdpの後悔改善

Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback ( http://arxiv.org/abs/2305.07911v1 )

ライセンス: Link先を確認

Tal Lancewicki, Aviv Rosenberg, Dmitry Sotnikov

(参考訳) 政策最適化(PO)は強化学習(RL)において最も一般的な手法の1つである。したがって、POアルゴリズムの理論的保証はRLコミュニティにとって特に重要である。本稿では,ほぼすべての実世界のアプリケーションで発生する課題である,敵対的MDPにおけるPOについて検討する。表形式のMDPでPOに最も近い最適後悔境界を与え、最先端(効率の低い手法)を超越する可能性さえある。私たちの小説『Delay-Adapted PO』(DAPO)は簡単に実装でき、一般化でき、アルゴリズムを次のように拡張できます。 (i)線形$q$-関数を仮定した無限状態空間は、関数近似を用いて遅延フィードバックに対する最初の後悔の限界を証明する。 (II)MuJoCoドメインの実験において,その有効性を示した深部RL。

Policy Optimization (PO) is one of the most popular methods in Reinforcement Learning (RL). Thus, theoretical guarantees for PO algorithms have become especially important to the RL community. In this paper, we study PO in adversarial MDPs with a challenge that arises in almost every real-world application -- \textit{delayed bandit feedback}. We give the first near-optimal regret bounds for PO in tabular MDPs, and may even surpass state-of-the-art (which uses less efficient methods). Our novel Delay-Adapted PO (DAPO) is easy to implement and to generalize, allowing us to extend our algorithm to: (i) infinite state space under the assumption of linear $Q$-function, proving the first regret bounds for delayed feedback with function approximation. (ii) deep RL, demonstrating its effectiveness in experiments on MuJoCo domains.

翻訳日:2023-05-16 18:49:47 公開日:2023-05-13

# mask to reconstruction: コラボレーティブ・セマンティクス・コンプリートによるビデオテキスト検索

Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval ( http://arxiv.org/abs/2305.07910v1 )

ライセンス: Link先を確認

Han Fang and Zhifei Yang and Xianghao Zang and Chao Ban and Hao Sun

(参考訳) 近年,マスク付きビデオモデリングが広く研究され,局所レベルでの視覚領域の理解能力が大幅に向上している。しかし、既存の手法は通常ランダムマスキングを採用し、クロスモーダルコンテンツ間の相関を活用しないマスキング領域を完備するために同じ再構成パラダイムに従う。本稿では,セマンティクスに基づくマスクモデルに基づいて,セマンティクス補完のためのマスク(mascot)を提案する。具体的には、注意に基づくビデオマスキングを用いて、高インフォームドかつ低インフォームドマスクを生成した後、マスキングされたセマンティクス情報を復元するためのインフォームドセマンティクス補完を提案する。このリカバリメカニズムは、マスクされたコンテンツと、マスクされていない視覚領域と対応するテキストコンテキストを整合させることで実現され、モデルがパッチレベルでよりテキスト関連の詳細をキャプチャする。さらに,無関係な背景から差別的な部分への再構成を重視し,低変形マスクの領域を無視する。さらに,両マスク協調学習を設計し,異なるマスクの下にビデオキューを組み込んで,より整列した映像表現を学習する。 MSR-VTT, LSMDC, ActivityNet, DiDeMo など,4つの主要なテキストビデオ検索ベンチマークで最先端のパフォーマンスを実現した。広範なアブレーション研究により,提案手法の有効性が示された。

Recently, masked video modeling has been widely explored and significantly improved the model's understanding ability of visual regions at a local level. However, existing methods usually adopt random masking and follow the same reconstruction paradigm to complete the masked regions, which do not leverage the correlations between cross-modal content. In this paper, we present Mask for Semantics Completion (MASCOT) based on semantic-based masked modeling. Specifically, after applying attention-based video masking to generate high-informed and low-informed masks, we propose Informed Semantics Completion to recover masked semantics information. The recovery mechanism is achieved by aligning the masked content with the unmasked visual regions and corresponding textual context, which makes the model capture more text-related details at a patch level. Additionally, we shift the emphasis of reconstruction from irrelevant backgrounds to discriminative parts to ignore regions with low-informed masks. Furthermore, we design dual-mask co-learning to incorporate video cues under different masks and learn more aligned video representation. Our MASCOT performs state-of-the-art performance on four major text-video retrieval benchmarks, including MSR-VTT, LSMDC, ActivityNet, and DiDeMo. Extensive ablation studies demonstrate the effectiveness of the proposed schemes.

翻訳日:2023-05-16 18:49:34 公開日:2023-05-13

# ハードウェア貯留層におけるブールウェイト最適化の収束とスケーリング

Convergence and scaling of Boolean-weight optimization for hardware reservoirs ( http://arxiv.org/abs/2305.07908v1 )

ライセンス: Link先を確認

Louis Andreoli, St\'ephane Chr\'etien, Xavier Porte, Daniel Brunner

(参考訳) ニューラルネットワークのハードウェア実装は、次世代の効率的で強力な人工知能ソリューションを実装するための重要なステップである。並列で効率的でスケーラブルなハードウェアアーキテクチャの実現に加えて、サンプリング効率のよいアプローチでシステムの非常に大きなパラメータ空間の最適化が不可欠である。本稿では,ランダムリカレント結合型ニューラルネットワーク,リザーバの読み出し層を最適化するために,高度に効率的な座標降下のためのスケーリング則を解析的に導出する。この収束は指数関数的であり,ネットワークのニューロン数に線形にスケールすることを示す。本結果は,概念実証実験で実施した大規模フォトニック貯水池の収束とスケーリングを再現するものである。そこで本研究では,ハードウェアネットワークにおけるこのような最適化の基盤を提供し,ニューラルネットワークの振幅統計量と重み付け更新則を活用し,学習中に収束速度を最適化する今後の方向性を明らかにした。

Hardware implementation of neural network are an essential step to implement next generation efficient and powerful artificial intelligence solutions. Besides the realization of a parallel, efficient and scalable hardware architecture, the optimization of the system's extremely large parameter space with sampling-efficient approaches is essential. Here, we analytically derive the scaling laws for highly efficient Coordinate Descent applied to optimizing the readout layer of a random recurrently connection neural network, a reservoir. We demonstrate that the convergence is exponential and scales linear with the network's number of neurons. Our results perfectly reproduce the convergence and scaling of a large-scale photonic reservoir implemented in a proof-of-concept experiment. Our work therefore provides a solid foundation for such optimization in hardware networks, and identifies future directions that are promising for optimizing convergence speed during learning leveraging measures of a neural network's amplitude statistics and the weight update rule.

翻訳日:2023-05-16 18:49:09 公開日:2023-05-13

# 会話型推薦システムにおける大規模言語モデル活用

Leveraging Large Language Models in Conversational Recommender Systems ( http://arxiv.org/abs/2305.07961v1 )

ライセンス: Link先を確認

Luke Friedman, Sameer Ahuja, David Allen, Terry Tan, Hakim Sidahmed, Changbo Long, Jun Xie, Gabriel Schubiner, Ajay Patel, Harsh Lara, Brian Chu, Zexi Chen, Manoj Tiwari

(参考訳) Conversational Recommender System (CRS)は、リアルタイムのマルチターン対話を通じてシステムと対話できるようにすることにより、ユーザに対して透明性とコントロールを向上する。近年、Large Language Models (LLMs) は、自然に会話し、世界知識と常識推論を言語理解に取り入れ、このパラダイムの可能性を解き放つ前例のない能力を示した。しかし、CRS内でLLMを効果的に活用することは、複雑な会話を適切に理解し、制御し、外部の情報ソースから取り出すなど、新しい技術的課題をもたらす。これらの問題は、大きく進化した項目コーパスと、トレーニングのための会話データの欠如によって悪化する。本稿では,LSMを用いたエンドツーエンドの大規模CRSを構築するためのロードマップを提供する。特に,LLMを利用した統合アーキテクチャの一部として,ユーザ好みの理解,フレキシブルな対話管理,説明可能なレコメンデーションのための新しい実装を提案する。パーソナライズを改善するために,LLMが解釈可能な自然言語ユーザプロファイルを消費し,セッションレベルのコンテキストを変調するために利用する方法について述べる。既存のCRSが存在しない場合の会話データ制限を克服するため,制御可能なLCMベースのユーザシミュレータを構築し,合成会話を生成する手法を提案する。概念実証として、LaMDA上に構築されたYouTubeビデオ用の大規模CRSであるRecLLMを紹介し、説明的な例による会話を通じて、その流布性と多様な機能を示す。

A Conversational Recommender System (CRS) offers increased transparency and control to users by enabling them to engage with the system through a real-time multi-turn dialogue. Recently, Large Language Models (LLMs) have exhibited an unprecedented ability to converse naturally and incorporate world knowledge and common-sense reasoning into language understanding, unlocking the potential of this paradigm. However, effectively leveraging LLMs within a CRS introduces new technical challenges, including properly understanding and controlling a complex conversation and retrieving from external sources of information. These issues are exacerbated by a large, evolving item corpus and a lack of conversational data for training. In this paper, we provide a roadmap for building an end-to-end large-scale CRS using LLMs. In particular, we propose new implementations for user preference understanding, flexible dialogue management and explainable recommendations as part of an integrated architecture powered by LLMs. For improved personalization, we describe how an LLM can consume interpretable natural language user profiles and use them to modulate session-level context. To overcome conversational data limitations in the absence of an existing production CRS, we propose techniques for building a controllable LLM-based user simulator to generate synthetic conversations. As a proof of concept we introduce RecLLM, a large-scale CRS for YouTube videos built on LaMDA, and demonstrate its fluency and diverse functionality through some illustrative example conversations.

翻訳日:2023-05-16 18:41:27 公開日:2023-05-13

# 分類木の最適学習のための新しいメメティック戦略

A Novel Memetic Strategy for Optimized Learning of Classification Trees ( http://arxiv.org/abs/2305.07959v1 )

ライセンス: Link先を確認

Tommaso Aldinucci

(参考訳) 解釈可能な機械学習への関心が高まる中、分類木はそのガラス箱構造のために再び科学界の注目を集めてきた。これらのモデルは、通常、不純物対策を最小化する特徴空間の切断を見つけるためにサブプロブレムを解く、欲求手続きを用いて構築される。本稿では,milpに基づく厳密な定式化による学習問題の定義において,この標準的欲望アプローチや近年の進歩とは対照的に,数千点のデータセットを処理可能なミーム的手法を活用し,分類木を誘導するための新しい進化的アルゴリズムを提案する。提案手法は,実現可能な解空間の探索と局所探索を組み合わせることで,最先端手法と競合する一般化能力を持つ構造を得る。

Given the increasing interest in interpretable machine learning, classification trees have again attracted the attention of the scientific community because of their glass-box structure. These models are usually built using greedy procedures, solving subproblems to find cuts in the feature space that minimize some impurity measures. In contrast to this standard greedy approach and to the recent advances in the definition of the learning problem through MILP-based exact formulations, in this paper we propose a novel evolutionary algorithm for the induction of classification trees that exploits a memetic approach that is able to handle datasets with thousands of points. Our procedure combines the exploration of the feasible space of solutions with local searches to obtain structures with generalization capabilities that are competitive with the state-of-the-art methods.

翻訳日:2023-05-16 18:41:00 公開日:2023-05-13

# more for less: より強力なパフォーマンス保証による安全なポリシー改善

More for Less: Safe Policy Improvement With Stronger Performance Guarantees ( http://arxiv.org/abs/2305.07958v1 )

ライセンス: Link先を確認

Patrick Wienh\"oft, Marnix Suilen, Thiago D. Sim\~ao, Clemens Dubslaff, Christel Baier, Nils Jansen

(参考訳) オフラインの強化学習環境では、安全なポリシー改善(SPI)問題は、サンプルデータが生成された行動ポリシーの性能を改善することを目的としている。 SPIに対する最先端のアプローチは、改善されたポリシーの性能に関する実用的な確率的保証を提供するために、多数のサンプルを必要とする。このような保証のために少ないデータを必要とする手段を提供するspi問題に対して,新たなアプローチを提案する。具体的には、これらの保証の正しさを証明するために、SPIのより厳密な改善境界を導出するための理論的基礎となるデータセットと基礎となる環境モデルに暗黙的な変換を考案する。ベースラインブートストラップ法(SPIBB)アルゴリズムを標準ベンチマークで確立したSPIを用いて,本手法がSPIBBアルゴリズムのサンプリング複雑性を著しく低減することを示す。

In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated. State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy's performance. We present a novel approach to the SPI problem that provides the means to require less data for such guarantees. Specifically, to prove the correctness of these guarantees, we devise implicit transformations on the data set and the underlying environment model that serve as theoretical foundations to derive tighter improvement bounds for SPI. Our empirical evaluation, using the well-established SPI with baseline bootstrapping (SPIBB) algorithm, on standard benchmarks shows that our method indeed significantly reduces the sample complexity of the SPIBB algorithm.

翻訳日:2023-05-16 18:40:46 公開日:2023-05-13

# 境界駆動量子鎖のジャンプチャネル統計におけるパターン

Patterns in the jump-channel statistics of boundary driven quantum chains ( http://arxiv.org/abs/2305.07957v1 )

ライセンス: Link先を確認

Gabriel T. Landi

(参考訳) 量子系を複数のジャンプチャネルで連続的に測定することに由来する確率過程を考える。このプロセスは、ジャンプ間のランダムな時間だけでなく、ジャンプチャンネルを表す一連のシンボルによっても記述される。我々はこの記号列の基本的な性質を確立する。まず、ダイナミクスを完全に制御する特別なスーパーオペレータセットを決定し、多点分布を計算し、確率的軌道をシミュレートする効率的な方法を提供する。また、確率過程が定常である条件を決定し、遠方の放出の間の記憶が特定のスーパーオペレータのスペクトル特性によって決定されることを示す。最後に、あるシステムはパターンをサポートしており、各ジャンプの後の進化は閉じた状態のセットで実行される。これは、私たちが議論しているように、将来の結果の予測を大いに促進するために利用できます。境界駆動型一次元XYスピンチェーンによる輸送の研究により、これらのアイデアを説明する。統計はチェーンサイズに大きく依存していることが示される。そして、ハミルトニアンにおけるペアリング項の存在は、既存のパターンを破壊する。

We consider the stochastic process stemming from continuously measuring a quantum system with multiple jump channels. The process is described not only by the random times between jumps, but also by a sequence of emitted symbols representing each jump channel. We establish the fundamental properties of this sequence of symbols. First, we determine a special set of superoperators that completely govern the dynamics, and provide an efficient way for computing multi-point distributions and for simulating stochastic trajectories. We also determine the conditions for the stochastic process to be stationary and show that the memory between distant emissions is determined by the spectral properties of a specific superoperator. Finally, we show that some systems support a pattern, where the evolution after each jump runs over a closed set of states. This, as we argue, can be used to greatly facilitate our prediction of future outcomes. We illustrate these ideas by studying transport through a boundary-driven one-dimensional XY spin chain. We show that the statistics depends dramatically on the chain size. And that the presence of pairing terms in the Hamiltonian destroy any existing patterns.

翻訳日:2023-05-16 18:40:32 公開日:2023-05-13

# 確率的グラフマッチングによる画像分割

Image Segmentation via Probabilistic Graph Matching ( http://arxiv.org/abs/2305.07954v1 )

ライセンス: Link先を確認

Ayelet Heimowitz and Yosi Keller

(参考訳) 本研究では,低レベル画像を用いて計算された不定値および一対割り当て確率に基づく推論問題としてセグメンテーションを定式化する,教師なしかつ半自動的な画像セグメンテーション手法を提案する。この推論は確率的グラフマッチングスキームによって解決され、低レベルの画像キューとパラメータの自動チューニングを厳密に組み込むことができる。提案手法は, 現代の最先端画像集合に適用した場合に, 半教師なし・非教師付き画像分割方式と良好に比較できることを示した。

This work presents an unsupervised and semi-automatic image segmentation approach where we formulate the segmentation as a inference problem based on unary and pairwise assignment probabilities computed using low-level image cues. The inference is solved via a probabilistic graph matching scheme, which allows rigorous incorporation of low level image cues and automatic tuning of parameters. The proposed scheme is experimentally shown to compare favorably with contemporary semi-supervised and unsupervised image segmentation schemes, when applied to contemporary state-of-the-art image sets.

翻訳日:2023-05-16 18:40:17 公開日:2023-05-13

# 局所的パッチ間不変性に基づく視覚計測用イルミネーション非感受性バイナリ記述子

Illumination-insensitive Binary Descriptor for Visual Measurement Based on Local Inter-patch Invariance ( http://arxiv.org/abs/2305.07943v1 )

ライセンス: Link先を確認

Xinyu Lin, Yingjie Zhou, Xun Zhang, Yipeng Liu, and Ce Zhu

(参考訳) バイナリ機能記述子は様々な視覚計測タスク、特に限られた計算資源とストレージ容量を持つタスクで広く使われている。既存のバイナリディスクリプタは、照明のバリエーションに敏感なため、長期の視覚的な測定タスクではうまく機能しない。画像照明が劇的に変化すると、局所的なパッチ間の相対関係はほとんど無傷であることが観察できる。そこで本研究では,複数の空間的粒度に現れる局所的なパッチ間不変性を利用して,照明に敏感なバイナリ(IIB)ディスクリプタを提案する。局所パッチ特徴計算に積分画像を利用することにより、高効率なIIB記述子を実現する。スケーラブルな機能を複数の空間的な粒度にエンコードすることで,粗面から細面への計算効率の高い階層マッチングを実現する。さらに、IIBディスクリプタは、いくつかのアプリケーションで利用可能なディープマップやセマンティックセグメンテーション結果など、他のタイプの画像データにも適用することができる。自然と合成の両方のデータセットに関する数値実験により、提案したIIBディスクリプタは最先端のバイナリディスクリプタといくつかのテストフロートディスクリプタより優れていることが明らかになった。提案したIIBディスクリプタは、長期視覚的ローカライゼーションのためのデモシステムにも採用されている。 IIBディスクリプタのコードは公開されている。

Binary feature descriptors have been widely used in various visual measurement tasks, particularly those with limited computing resources and storage capacities. Existing binary descriptors may not perform well for long-term visual measurement tasks due to their sensitivity to illumination variations. It can be observed that when image illumination changes dramatically, the relative relationship among local patches mostly remains intact. Based on the observation, consequently, this study presents an illumination-insensitive binary (IIB) descriptor by leveraging the local inter-patch invariance exhibited in multiple spatial granularities to deal with unfavorable illumination variations. By taking advantage of integral images for local patch feature computation, a highly efficient IIB descriptor is achieved. It can encode scalable features in multiple spatial granularities, thus facilitating a computationally efficient hierarchical matching from coarse to fine. Moreover, the IIB descriptor can also apply to other types of image data, such as depth maps and semantic segmentation results, when available in some applications. Numerical experiments on both natural and synthetic datasets reveal that the proposed IIB descriptor outperforms state-of-the-art binary descriptors and some testing float descriptors. The proposed IIB descriptor has also been successfully employed in a demo system for long-term visual localization. The code of the IIB descriptor will be publicly available.

翻訳日:2023-05-16 18:40:07 公開日:2023-05-13

# データ駆動のディストピア:不断の倫理違反

Data-Driven Dystopia: an uninterrupted breach of ethics ( http://arxiv.org/abs/2305.07934v1 )

ライセンス: Link先を確認

Shreyansh Padarha

(参考訳) 本稿では、データの増加と大企業のデータの誤用に関連するリスクと複雑さについて論じる。この記事は、ユーザのプライバシに違反するデータ漏洩やデータ収集プラクティスの例を示している。また、不平等と差別を永続するビッグデータモデルを指すwmds(weapons of math destruction)の概念も検討している。この記事では、ユーザ情報の保護とデータモデル、AI、MLの倫理的利用に責任を負う企業の必要性を強調している。記事はまた、個人の日常生活におけるデータプライバシの重要性と、データ管理に対するより意識的で責任あるアプローチの必要性を強調している。

This article discusses the risks and complexities associated with the exponential rise in data and the misuse of data by large corporations. The article presents instances of data breaches and data harvesting practices that violate user privacy. It also explores the concept of "Weapons Of Math Destruction" (WMDs), which refers to big data models that perpetuate inequality and discrimination. The article highlights the need for companies to take responsibility for safeguarding user information and the ethical use of data models, AI, and ML. The article also emphasises the significance of data privacy for individuals in their daily lives and the need for a more conscious and responsible approach towards data management.

翻訳日:2023-05-16 18:39:43 公開日:2023-05-13

# GSB:限られたトレーニングサンプルを用いたビジョントランスのためのグループ重ね合わせ二元化

GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples ( http://arxiv.org/abs/2305.07931v1 )

ライセンス: Link先を確認

Tian Gao, Cheng-Zhong Xu, Le Zhang, Hui Kong

(参考訳) 大量のパラメータの影響を受け、ViTは通常、比較的限られた数のトレーニングサンプルで深刻なオーバーフィット問題に悩まされる。さらに、ViTは通常、リソース制限されたデバイスへのデプロイメントを制限する重いコンピューティングリソースを必要とする。モデル圧縮法の一種として、モデル双対化は上記の問題を解決する良い選択である可能性がある。完全な倍数化法と比較すると、複雑なテンソル乗算を単純なビット単位の2進演算に置き換え、全倍数モデルのパラメータとアクティベーションを1ビットのみで表現し、モデルサイズと計算複雑性の問題をそれぞれ解決する。本稿では,バイナリViTモデルの精度の低下は,アテンションモジュールと値ベクトルの情報損失が主な原因であることを示す。そこで本研究では,これらの問題に対処するため,GSB(Group Superposition Binarization)と呼ばれる新しいモデルバイナライゼーション手法を提案する。さらに,二元化モデルの性能をさらに向上させるために,二元化過程における勾配計算手順を調査し,gsbのより適切な勾配計算式を導出し,勾配ミスマッチの影響を低減した。次に, モデル2値化による性能劣化を緩和するために, 知識蒸留技術を導入する。限られたトレーニングサンプル数を持つ3つのデータセットの実験では、提案したGSBモデルがバイナリ量子化スキームの最先端性能を実現し、いくつかの指標でその完全精度を上回ることが示されている。

Affected by the massive amount of parameters, ViT usually suffers from serious overfitting problems with a relatively limited number of training samples. In addition, ViT generally demands heavy computing resources, which limit its deployment on resource-constrained devices. As a type of model-compression method,model binarization is potentially a good choice to solve the above problems. Compared with the full-precision one, the model with the binarization method replaces complex tensor multiplication with simple bit-wise binary operations and represents full-precision model parameters and activations with only 1-bit ones, which potentially solves the problem of model size and computational complexity, respectively. In this paper, we find that the decline of the accuracy of the binary ViT model is mainly due to the information loss of the Attention module and the Value vector. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, in order to further improve the performance of the binarization model, we have investigated the gradient calculation procedure in the binarization process and derived more proper gradient calculation equations for GSB to reduce the influence of gradient mismatch. Then, the knowledge distillation technique is introduced to alleviate the performance degradation caused by model binarization. Experiments on three datasets with limited numbers of training samples demonstrate that the proposed GSB model achieves state-of-the-art performance among the binary quantization schemes and exceeds its full-precision counterpart on some indicators.

翻訳日:2023-05-16 18:39:32 公開日:2023-05-13

# AMTSS:多言語言語推論のための適応型マルチ教師単段階知識蒸留フレームワーク

AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation Framework For Multilingual Language Inference ( http://arxiv.org/abs/2305.07928v1 )

ライセンス: Link先を確認

Qianglong Chen, Feng Ji, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang and Yin Zhang

(参考訳) 知識蒸留は、実アプリケーションのための多言語事前学習言語モデルをローンチする上で重要である。多言語環境での費用対効果のある言語推論を支援するために,複数の教師から1人の学生への知識の蒸留を可能にする適応型多教師単学生蒸留フレームワークであるAMTSSを提案する。まず,適応型学習戦略と教師重要度重みを導入し,生徒がマックスマージン教師から効果的に学び,新しい言語に容易に適応できるようにする。さらに,複数の言語をサポートする異なるプロジェクション層を持つ共有学生エンコーダを提案する。 AMTSSは,Eコマースシナリオにおいて,パブリックXNLIデータセットとリアル産業データセットAliExpress(AE)の競争結果を得ることを示す。

Knowledge distillation is of key importance to launching multilingual pre-trained language models for real applications. To support cost-effective language inference in multilingual settings, we propose AMTSS, an adaptive multi-teacher single-student distillation framework, which allows distilling knowledge from multiple teachers to a single student. We first introduce an adaptive learning strategy and teacher importance weight, which enables a student to effectively learn from max-margin teachers and easily adapt to new languages. Moreover, we present a shared student encoder with different projection layers in support of multiple languages, which contributes to largely reducing development and machine cost. Experimental results show that AMTSS gains competitive results on the public XNLI dataset and the realistic industrial dataset AliExpress (AE) in the E-commerce scenario.

翻訳日:2023-05-16 18:39:06 公開日:2023-05-13

# RC3: 正規化コントラストクロスランガルクロスモーダルプレトレーニング

RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training ( http://arxiv.org/abs/2305.07927v1 )

ライセンス: Link先を確認

Chulun Zhou, Yunlong Liang, Fandong Meng, Jinan Xu, Jinsong Su and Jie Zhou

(参考訳) 多言語視覚言語(V&L)の事前学習は、様々なモダリティや言語にまたがる普遍表現の学習において顕著な進歩を遂げた。近年の成功にもかかわらず、多言語環境でのV&L事前訓練モデルのさらなる改善には依然として課題がある。特に、現在のV&L事前学習法は、機械翻訳を通じて英語中心のデータセットから生成される厳密な多言語画像テキストペアに大きく依存している。しかし、厳密に整合したデータセットの収集と翻訳のコストは通常、計り知れない。本稿では,より豊富な弱結合型多言語画像テキストペアを活用した正規化コントラスト言語間クロスモーダル(rc^3)事前学習を提案する。具体的には、テキスト関連性に応じて、弱整列型視覚テキスト入力の表現近接を制約する正規化言語間視覚テキストコントラスト学習目標を設計する。さらに、既存のV&L事前トレーニングアプローチは、主に関心の領域(ROI)機能またはパッチ埋め込みによる視覚的な入力を扱う。事前学習と下流マルチモーダルタスクのためのモデルに,2種類の視覚的特徴を柔軟に統合する。 6言語にまたがる下流5つのマルチモーダルタスクに関する広範囲な実験により,ゼロショット能力の強いコントラストモデルに対する提案手法の有効性が示された。

Multilingual vision-language (V&L) pre-training has achieved remarkable progress in learning universal representations across different modalities and languages. In spite of recent success, there still remain challenges limiting further improvements of V&L pre-trained models in multilingual settings. Particularly, current V&L pre-training methods rely heavily on strictly-aligned multilingual image-text pairs generated from English-centric datasets through machine translation. However, the cost of collecting and translating such strictly-aligned datasets is usually unbearable. In this paper, we propose Regularized Contrastive Cross-lingual Cross-modal (RC^3) pre-training, which further exploits more abundant weakly-aligned multilingual image-text pairs. Specifically, we design a regularized cross-lingual visio-textual contrastive learning objective that constrains the representation proximity of weakly-aligned visio-textual inputs according to textual relevance. Besides, existing V&L pre-training approaches mainly deal with visual inputs by either region-of-interest (ROI) features or patch embeddings. We flexibly integrate the two forms of visual features into our model for pre-training and downstream multi-modal tasks. Extensive experiments on 5 downstream multi-modal tasks across 6 languages demonstrate the effectiveness of our proposed method over competitive contrast models with stronger zero-shot capability.

翻訳日:2023-05-16 18:38:49 公開日:2023-05-13

# gt-rain challenge cvpr 2023 workshop ug$^{\textbf{2}}$+ track 3の2段階実画像レーディング手法

A Two-Stage Real Image Deraining Method for GT-RAIN Challenge CVPR 2023 Workshop UG$^{\textbf{2}}$+ Track 3 ( http://arxiv.org/abs/2305.07979v1 )

ライセンス: Link先を確認

Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan

(参考訳) 本稿では,CVPR 2023 UG$^{2}$+ Track 3におけるGT-Rain ChallengeのためのチームHUST\li VIEのソリューションについて紹介する。本研究では,雨のフレームから鮮明な画像を再構成する効率的な二段階フレームワークを提案する。まず,マルチフレームとアライメントされた雨枠の利点を生かした疑似gtを生成するために,低ランクビデオデライニング手法を用いる。第2に,大規模な実雨データセットを事前トレーニングし,擬似gtを微調整して画像復元をさらに改善するために,トランスフォーマによる単一画像レーダネットワーク uformer を実装した。さらに、視覚的快楽効果の面では、パイプラインの終了時に包括的な画像処理モジュールが使用される。我々の全体的なフレームワークは精巧に設計されており、最終テストフェーズで提供される豪雨シーケンスと霧のシーケンスの両方を処理できます。最後に、平均構造類似度(SSIM)で1位、平均ピーク信号-雑音比(PSNR)で2位にランクする。私たちのコードはhttps://github.com/yunguo224/ug2_derainingで利用可能です。

In this technical report, we briefly introduce the solution of our team HUST\li VIE for GT-Rain Challenge in CVPR 2023 UG$^{2}$+ Track 3. In this task, we propose an efficient two-stage framework to reconstruct a clear image from rainy frames. Firstly, a low-rank based video deraining method is utilized to generate pseudo GT, which fully takes the advantage of multi and aligned rainy frames. Secondly, a transformer-based single image deraining network Uformer is implemented to pre-train on large real rain dataset and then fine-tuned on pseudo GT to further improve image restoration. Moreover, in terms of visual pleasing effect, a comprehensive image processor module is utilized at the end of pipeline. Our overall framework is elaborately designed and able to handle both heavy rainy and foggy sequences provided in the final testing phase. Finally, we rank 1st on the average structural similarity (SSIM) and rank 2nd on the average peak signal-to-noise ratio (PSNR). Our code is available at https://github.com/yunguo224/UG2_Deraining.

翻訳日:2023-05-16 18:32:44 公開日:2023-05-13

# デュアルフォーミュレーションによる非負の低ランクテンソルコンプリートと画像・ビデオコンプリートへの応用

Nonnegative Low-Rank Tensor Completion via Dual Formulation with Applications to Image and Video Completion ( http://arxiv.org/abs/2305.07976v1 )

ライセンス: Link先を確認

Tanmay Kumar Sinha, Jayadev Naram, Pawan Kumar

(参考訳) テンソル補完問題に対する最近のアプローチは、しばしばデータの非負構造を見落としている。我々は,非負の低ランクテンソルを学習する問題を考えるとともに,双対性理論を用いて,そのようなテンソルの新しい因子分解を提案する。因子化は非負の制約を低ランクの制約から切り離す。その結果の問題は多様体上の最適化問題であり、それを解決するためにリーマン共役勾配の変種を提案する。提案アルゴリズムは,カラー画像インペインティング,映像補完,ハイパースペクトル画像補完など,様々なタスクにわたってテストを行う。実験の結果,提案手法は多くのテンソル補完アルゴリズムよりも優れていることがわかった。

Recent approaches to the tensor completion problem have often overlooked the nonnegative structure of the data. We consider the problem of learning a nonnegative low-rank tensor, and using duality theory, we propose a novel factorization of such tensors. The factorization decouples the nonnegative constraints from the low-rank constraints. The resulting problem is an optimization problem on manifolds, and we propose a variant of Riemannian conjugate gradients to solve it. We test the proposed algorithm across various tasks such as colour image inpainting, video completion, and hyperspectral image completion. Experimental results show that the proposed method outperforms many state-of-the-art tensor completion algorithms.

翻訳日:2023-05-16 18:32:24 公開日:2023-05-13

# 線形制約系の作用素解の単純解法

Simplicial techniques for operator solutions of linear constraint systems ( http://arxiv.org/abs/2305.07974v1 )

ライセンス: Link先を確認

Ho Yiu Chung, Cihan Okay, Igor Sikora

(参考訳) 線形制約系は、整数の群 $\ZZ_d$ 上の線型方程式によって指定される。作用素解は、量子文脈性や非局所ゲームの研究において重要な役割を果たす。本稿では、単純集合の理論を用いて線形系の作用素解を研究するための枠組みを開発する。このアプローチは、これらの群を空間の基本群と密接に関連する代数的不変量として同定することにより、解群に基づくよく知られた群論的アプローチを洗練する。この観点から、我々のアプローチは細胞複合体に基づく初期のホモトピー的アプローチとも関係している。フレームワーク内では、単純集合から来る線形系の新しいクラスを導入し、任意の線形系をその形式の1つに還元できることを示す。次に、群に関連する線形系を専門とする。群内の解を許容する任意の線型系に対して、$\ZZ_d$の解が認められるという予想に対して、重要な証拠を提供する。

A linear constraint system is specified by linear equations over the group $\ZZ_d$ of integers modulo $d$. Their operator solutions play an important role in the study of quantum contextuality and non-local games. In this paper, we use the theory of simplicial sets to develop a framework for studying operator solutions of linear systems. Our approach refines the well-known group-theoretical approach based on solution groups by identifying these groups as algebraic invariants closely related to the fundamental group of a space. In this respect, our approach also makes a connection to the earlier homotopical approach based on cell complexes. Within our framework, we introduce a new class of linear systems that come from simplicial sets and show that any linear system can be reduced to one of that form. Then we specialize in linear systems that are associated with groups. We provide significant evidence for a conjecture stating that for odd $d$ every linear system admitting a solution in a group admits a solution in $\ZZ_d$.

翻訳日:2023-05-16 18:32:13 公開日:2023-05-13

# 確率的セキュリティの計算コストについて

On the Computational Cost of Stochastic Security ( http://arxiv.org/abs/2305.07973v1 )

ライセンス: Link先を確認

Noah A. Crum, Leanto Sunny, Pooya Ronagh, Raymond Laflamme, Radhakrishnan Balu, George Siopsis

(参考訳) 本稿では,Langevin Dynamicsの長期持続鎖モンテカルロシミュレーションにより,エネルギーベースモデル(EBM)による表現の質が向上するかどうかを考察する。本研究では,学習したebmを用いた拡散過程のモンテカルロシミュレーションを用いて,独立分類器ネットワークの逆ロバスト性やキャリブレーションスコアを改善する手法を提案する。本研究は, 連続エネルギーポテンシャルからギブズサンプリングを効率よく行うために, 量子・古典ハードウェアとソフトウェアを新たに実現し, モデルのキャリブレーションと対角ロバスト性を向上させることを目的として, ギブズサンプリングの計算予算の増大を図った。

We investigate whether long-run persistent chain Monte Carlo simulation of Langevin dynamics improves the quality of the representations achieved by energy-based models (EBM). We consider a scheme wherein Monte Carlo simulation of a diffusion process using a trained EBM is used to improve the adversarial robustness and the calibration score of an independent classifier network. Our results show that increasing the computational budget of Gibbs sampling in persistent contrastive divergence improves the calibration and adversarial robustness of the model, elucidating the practical merit of realizing new quantum and classical hardware and software for efficient Gibbs sampling from continuous energy potentials.

翻訳日:2023-05-16 18:31:56 公開日:2023-05-13

# Trillion Dollar Words: 新たな金融データセットとタスク&マーケット分析

Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis ( http://arxiv.org/abs/2305.07972v1 )

ライセンス: Link先を確認

Agam Shah and Suvan Paturi and Sudheer Chava

(参考訳) 連邦公開市場委員会(FOMC)による金融政策宣言は、金融市場リターンの主要な要因である。我々は、金融政策が金融市場に与える影響を理解するために、fomcスピーチ、会議分、記者会見の書き起こしの最大のトークン化および注釈付きデータセットを構築します。本研究では,ホーカッシュ・ドヴィッシュ分類の新たなタスクを開発し,提案するデータセット上での各種事前学習言語モデルのベンチマークを行った。最良業績モデル(RoBERTa-large)を用いて,FOMC文書公開日に対する金融政策スタンスを測定する。構築した指標を評価するため,金融市場,株式市場,マクロ経済指標への影響について検討する。私たちのデータセット、モデル、コードはcc by-nc 4.0ライセンスの下でhughingfaceとgithubで公開されている。

Monetary policy pronouncements by Federal Open Market Committee (FOMC) are a major driver of financial market returns. We construct the largest tokenized and annotated dataset of FOMC speeches, meeting minutes, and press conference transcripts in order to understand how monetary policy influences financial markets. In this study, we develop a novel task of hawkish-dovish classification and benchmark various pre-trained language models on the proposed dataset. Using the best-performing model (RoBERTa-large), we construct a measure of monetary policy stance for the FOMC document release days. To evaluate the constructed measure, we study its impact on the treasury market, stock market, and macroeconomic indicators. Our dataset, models, and code are publicly available on Huggingface and GitHub under CC BY-NC 4.0 license.

翻訳日:2023-05-16 18:31:44 公開日:2023-05-13

# 距離空間におけるグラフ埋め込みの厳密かつ高速な一般化誤差境界

Tight and fast generalization error bound of graph embedding in metric space ( http://arxiv.org/abs/2305.07971v1 )

ライセンス: Link先を確認

Atsushi Suzuki, Atsushi Nitanda, Taiji Suzuki, Jing Wang, Feng Tian, and Kenji Yamanishi

(参考訳) 近年の研究では、計量空間におけるグラフの構造を反映した頂点表現を得ることを目的として、非ユークリッド計量空間において有効かつ効率的なグラフ埋め込みが達成できることが実験的に示されている。具体的には、双曲空間へのグラフ埋め込みは、例えば自然言語、ソーシャルネットワーク、知識ベースなどの階層構造を持つグラフの埋め込みに実験的に成功している。しかし、近年の理論解析により、非ユークリッドグラフ埋め込みの一般化誤差はユークリッドグラフよりもかなり高い値を示しており、高い一般化誤差はデータの不完全性とノイズが学習性能に重大な影響を与えることを示している。これは、既存の境界が非ユークリッド距離空間におけるグラフ埋め込みの成功を実際の訓練データサイズで保証できないことを意味しており、非ユークリッドグラフ埋め込みの実際の問題への応用を防ぐことができる。本稿では、表現対の距離の関数集合としてモデルの局所ラデマッハ複雑性を評価することにより、グラフ埋め込みの一般化誤差の新たな上限を与える。我々の境界は、双曲空間を含む非ユークリッド距離空間におけるグラフ埋め込みのパフォーマンスが、既存の上界よりも優れていることを明確化する。具体的には、我々の新しい上界は距離空間の幾何半径$R$の多項式であり、最大で$O(\frac{1}{S})$で、$S$はトレーニングデータサイズである。我々のバウンダリは、既存のバウンダリよりもかなり強く、高速で$R$と$O(\frac{1}{\sqrt{S}})$に指数関数化できる。例における特定の計算により、非ユークリッド計量空間へのグラフ埋め込みは、既存の有界よりもはるかに少ない訓練データを持つユークリッド空間におけるグラフ埋め込みよりも優れていることが示される。

Recent studies have experimentally shown that we can achieve in non-Euclidean metric space effective and efficient graph embedding, which aims to obtain the vertices' representations reflecting the graph's structure in the metric space. Specifically, graph embedding in hyperbolic space has experimentally succeeded in embedding graphs with hierarchical-tree structure, e.g., data in natural languages, social networks, and knowledge bases. However, recent theoretical analyses have shown a much higher upper bound on non-Euclidean graph embedding's generalization error than Euclidean one's, where a high generalization error indicates that the incompleteness and noise in the data can significantly damage learning performance. It implies that the existing bound cannot guarantee the success of graph embedding in non-Euclidean metric space in a practical training data size, which can prevent non-Euclidean graph embedding's application in real problems. This paper provides a novel upper bound of graph embedding's generalization error by evaluating the local Rademacher complexity of the model as a function set of the distances of representation couples. Our bound clarifies that the performance of graph embedding in non-Euclidean metric space, including hyperbolic space, is better than the existing upper bounds suggest. Specifically, our new upper bound is polynomial in the metric space's geometric radius $R$ and can be $O(\frac{1}{S})$ at the fastest, where $S$ is the training data size. Our bound is significantly tighter and faster than the existing one, which can be exponential to $R$ and $O(\frac{1}{\sqrt{S}})$ at the fastest. Specific calculations on example cases show that graph embedding in non-Euclidean metric space can outperform that in Euclidean space with much smaller training data than the existing bound has suggested.

翻訳日:2023-05-16 18:31:31 公開日:2023-05-13

# 実験経済学を用いた大規模言語モデルにおける創発的ゴール様行動の調査

Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics ( http://arxiv.org/abs/2305.07970v1 )

ライセンス: Link先を確認

Steve Phelps and Yvan I. Russell

(参考訳) 本研究では,社会的ジレンマにおける協調的,競争的,利他的,利他的行動の自然言語記述を運用するための大規模言語モデル(llm)の能力,特にgpt-3.5について検討する。我々の焦点は、非ゼロサム相互作用の古典的な例である反復囚人のジレンマであるが、我々の広範な研究プログラムは、最後通算ゲーム、独裁ゲーム、公共財ゲームを含む様々な実験経済シナリオを含んでいる。実験では,様々なプロンプトを用いてllm生成エージェントをインスタンス化し,協調的および競争的スタンスを伝達した。そこで我々は,囚人のジレンマを繰り返すエージェントの協力レベルを評価し,パートナーの協力行動や離脱行動に対する反応を考慮に入れた。その結果、llmは利他主義と利己主義の自然言語記述をある程度適切な行動に翻訳できるが、条件付き相反性に基づく行動適応の限界が示された。障害者との協力の増大と協力者の協力の減少が観察されたパターンは、社会ジレンマにおける人間の行動に関する知識を一般化するLLMの能力の潜在的な制約を強調している。我々は,幅広い社会的ジレンマの中で,llm生成エージェントの創発的行動に寄与する要因について,研究コミュニティにさらなる検討を求め,モデルアーキテクチャ,トレーニングパラメータ,エージェント行動に対する様々なパートナー戦略の影響について検討する。 GPT-4のような先進的なLLMが利用可能になるにつれて、それらが類似した制限を示すか、より微妙な協調行動が可能かどうかを調査することが重要であり、最終的には人間の価値観や社会的規範に適合したAIシステムの開発を促進する。

In this study, we investigate the capacity of large language models (LLMs), specifically GPT-3.5, to operationalise natural language descriptions of cooperative, competitive, altruistic, and self-interested behavior in social dilemmas. Our focus is on the iterated Prisoner's Dilemma, a classic example of a non-zero-sum interaction, but our broader research program encompasses a range of experimental economics scenarios, including the ultimatum game, dictator game, and public goods game. Using a within-subject experimental design, we instantiated LLM-generated agents with various prompts that conveyed different cooperative and competitive stances. We then assessed the agents' level of cooperation in the iterated Prisoner's Dilemma, taking into account their responsiveness to the cooperative or defection actions of their partners. Our results provide evidence that LLMs can translate natural language descriptions of altruism and selfishness into appropriate behaviour to some extent, but exhibit limitations in adapting their behavior based on conditioned reciprocity. The observed pattern of increased cooperation with defectors and decreased cooperation with cooperators highlights potential constraints in the LLM's ability to generalize its knowledge about human behavior in social dilemmas. We call upon the research community to further explore the factors contributing to the emergent behavior of LLM-generated agents in a wider array of social dilemmas, examining the impact of model architecture, training parameters, and various partner strategies on agent behavior. As more advanced LLMs like GPT-4 become available, it is crucial to investigate whether they exhibit similar limitations or are capable of more nuanced cooperative behaviors, ultimately fostering the development of AI systems that better align with human values and social norms.

翻訳日:2023-05-16 18:30:59 公開日:2023-05-13

# GPT-Sentinel:人間とチャットGPT生成コンテンツを識別する

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content ( http://arxiv.org/abs/2305.07969v1 )

ライセンス: Link先を確認

Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Ramakrishnan

(参考訳) 本稿では,言語モデルを用いたChatGPT生成対人文テキスト検出手法を提案する。この目的のために、我々はまずOpenGPTTextという、ChatGPTを用いて生成されたリフレーズ付きコンテンツからなる前処理データセットを収集し、リリースした。次に、RoBERTa(Roustly Optimized BERT Pretraining Approach)とText-to-Text Transfer Transformer(T5)を用いて、テキスト分類のための2つの異なるモデルの設計、実装、訓練を行った。私たちのモデルは、さまざまなメトリクスで評価したように、テストデータセット上で97%以上の精度で、驚くべき結果を達成しました。さらに,人間の手書きテキストとChatGPT生成テキストの主な特徴を抽出し,識別する能力を示すための解釈可能性の検討を行った。本研究は,生成テキストの検出における言語モデルの有効利用に関する重要な知見を提供する。

This paper presents a novel approach for detecting ChatGPT-generated vs. human-written text using language models. To this end, we first collected and released a pre-processed dataset named OpenGPTText, which consists of rephrased content generated using ChatGPT. We then designed, implemented, and trained two different models for text classification, using Robustly Optimized BERT Pretraining Approach (RoBERTa) and Text-to-Text Transfer Transformer (T5), respectively. Our models achieved remarkable results, with an accuracy of over 97% on the test dataset, as evaluated through various metrics. Furthermore, we conducted an interpretability study to showcase our model's ability to extract and differentiate key features between human-written and ChatGPT-generated text. Our findings provide important insights into the effective use of language models to detect generated text.

翻訳日:2023-05-16 18:30:27 公開日:2023-05-13

# 量子ゼノダイナミクスによるポテンシャル中の量子粒子のテレポーテーション

Teleportation of a quantum particle in a potential via quantum Zeno dynamics ( http://arxiv.org/abs/2305.07968v1 )

ライセンス: Link先を確認

Miguel A. Porras, Miguel Casado-\'Alvaro, and Isabel Gonzalo

(参考訳) 量子状態のテレポーテーションとは、絡み合いによって異なる現象である量子粒子のテレポーテーションの可能性について報告する。第一の意味から、粒子を(一定の不確実性を持って)あるポテンシャル井戸や障壁の平衡点から配置し、粒子が静止しているかどうかを頻繁に監視することで、理論上はテレポーテーションが可能である。この量子ゼノダイナミクスは加速度を阻害し、他のターニングポイントにおける古典的ターニングポイントからの消失と他のターニングポイントの出現を特徴とする。粒子は常に静止しており、2つの旋回点の間の経路には見つからず、移動時間を節約します。電子、陽子、その他の粒子のテレポーテーションの実現可能性について議論し、粒子が重くなるにつれてその非現実性が増加すると結論づける。

We report on the possibility of teleportation of a quantum particle, a distinctly different phenomenon from the teleportation of a quantum state through entanglement. With the first meaning, teleportation is theoretically possible by placing the particle initially at rest (with a certain uncertainty) out of any equilibrium point of a potential well or barrier and by frequently monitoring whether the particle remains at rest. This quantum Zeno dynamics inhibits acceleration, and features disappearance from the classical turning point and appearance in other turning point, if there is any other, with a probability that approaches unity by increasing the frequency of the measurements. This phenomenon has all the ingredients attributed in science fiction to teleportation: The particle is always at rest, cannot be found in the path between the two turning points, and saves travel time. We discuss the feasibility, in principle, of teleportation of electrons, protons and other particles, and conclude its increasing impracticability as the particle gets heavier.

翻訳日:2023-05-16 18:30:09 公開日:2023-05-13

# 構造化低ランクテンソル学習

Structured Low-Rank Tensor Learning ( http://arxiv.org/abs/2305.07967v1 )

ライセンス: Link先を確認

Jayadev Naram, Tanmay Kumar Sinha, Pawan Kumar

(参考訳) 構造的制約のある部分的な観測から低ランクテンソルを学習する問題を考察し、そのようなテンソルの新たな因子化を提案し、より単純な最適化問題を導いた。結果として生じる問題は多様体上の最適化問題である。この問題を解決するために,一階および二階リーマン最適化アルゴリズムを開発した。得られた問題の双対性ギャップを導出し,提案アルゴリズムの正しさを実験的に検証する。非負の制約とハンケルの制約に関するアルゴリズムを実証する。

We consider the problem of learning low-rank tensors from partial observations with structural constraints, and propose a novel factorization of such tensors, which leads to a simpler optimization problem. The resulting problem is an optimization problem on manifolds. We develop first-order and second-order Riemannian optimization algorithms to solve it. The duality gap for the resulting problem is derived, and we experimentally verify the correctness of the proposed algorithm. We demonstrate the algorithm on nonnegative constraints and Hankel constraints.

翻訳日:2023-05-16 18:29:50 公開日:2023-05-13

# ナノダイアモンド回転とNV中心スピンの高絡み合い状態の準備

Preparing highly entangled states of nanodiamond rotation and NV center spin ( http://arxiv.org/abs/2305.08008v1 )

ライセンス: Link先を確認

Wen-Liang Li, D. L. Zhou

(参考訳) nv(embedd nitrogen-vacancy)センターを備えたナノダイアモンドは、現在の技術でコヒーレントに操作できる実験システムの1つである。 nv中心電子スピンとナノダイヤモンドの機械的回転の絡み合いは、これらの微視的およびメソスコピックな動きを繋ぐ量子ネットワークを構築する上で重要な役割を果たす。本稿では,外部磁場を漸近的に上昇させることで,量子回転と電子スピンの高度に絡み合った状態を漸近的に生成するプロトコルを提案する。

A nanodiamond with an embedded nitrogen-vacancy (NV) center is one of the experimental systems that can be coherently manipulated within current technologies. Entanglement between NV center electron spin and mechanical rotation of the nanodiamond plays a fundamental role in building a quantum network connecting these microscopic and mesoscopic degrees of motions. Here we present a protocol to asymptotically prepare a highly entangled state of the quantum rotation and electron spin by adiabatically boosting the external magnetic field.

翻訳日:2023-05-16 18:22:41 公開日:2023-05-13

# over the safeguards: chatgptのセキュリティリスクを探求する

Beyond the Safeguards: Exploring the Security Risks of ChatGPT ( http://arxiv.org/abs/2305.08005v1 )

ライセンス: Link先を確認

Erik Derner and Kristina Batisti\v{c}

(参考訳) ChatGPTのような大規模言語モデル(LLM)の人気が高まり、安全性、セキュリティリスク、倫理的影響に対する懸念が高まっている。本稿では,悪意のあるテキストやコード生成,プライベートデータ開示,不正サービス,情報収集,非倫理的コンテンツの生成など,ChatGPTに関連するさまざまなセキュリティリスクの概要について述べる。本稿では,ChatGPTのコンテンツフィルタの有効性を検証し,保護されている場合でもLLMに持続する倫理的影響とセキュリティリスクを実証し,これらの保護を回避できる可能性を探究する。セキュリティへの影響の質的な分析に基づいて、これらのリスクを軽減し、研究者、政策立案者、業界専門家にChatGPTのようなLLMがもたらす複雑なセキュリティ課題について通知する潜在的な戦略について議論する。本研究は, LLMの倫理的, セキュリティ的含意に関する継続的な議論に寄与し, この分野における継続的な研究の必要性を浮き彫りにしている。

The increasing popularity of large language models (LLMs) such as ChatGPT has led to growing concerns about their safety, security risks, and ethical implications. This paper aims to provide an overview of the different types of security risks associated with ChatGPT, including malicious text and code generation, private data disclosure, fraudulent services, information gathering, and producing unethical content. We present an empirical study examining the effectiveness of ChatGPT's content filters and explore potential ways to bypass these safeguards, demonstrating the ethical implications and security risks that persist in LLMs even when protections are in place. Based on a qualitative analysis of the security implications, we discuss potential strategies to mitigate these risks and inform researchers, policymakers, and industry professionals about the complex security challenges posed by LLMs like ChatGPT. This study contributes to the ongoing discussion on the ethical and security implications of LLMs, underscoring the need for continued research in this area.

翻訳日:2023-05-16 18:22:33 公開日:2023-05-13

# 混合状態に対する最適量子速度

Optimal quantum speed for mixed states ( http://arxiv.org/abs/2305.08004v1 )

ライセンス: Link先を確認

Ashraf Naderzadeh and Seyed Javad Akhtarshenas

(参考訳) 量子状態がいかに高速に進化できるかという問題を考える。 phys におけるユークリッド距離に基づく二乗速度の定義を用いる。 Rev. Reaserch, {\bf 2}, 033127 (2019)] では、時間非依存ハミルトニアンの下で一元的に進化した$d$次元システムの最適速度を得るための体系的な枠組みを提供する。同じ純度を持つ混合量子状態の組のうち、最適状態はその純度パラメータを用いて得られる。任意の$d$に対して、最適状態は二次対角線に対して対称な追加特性を持つ$X$-状態によって与えられることを示す。純度が最大混合状態$\Id/d$を少なくとも2/d^2$で純度を超える十分低い純度に対して、最適状態の非零対角エントリーは$\varrho_{1d}$であり、それぞれ最小固有値と最大固有値を持つ2つのエネルギー固有状態間の遷移振幅に対応する。しかし、より大きな純度の場合、他の二次径のエントリ$\varrho_{i,d-i+1}$を非零値とするかどうかは、相対エネルギーギャップ$|E_{d-i+1}-E_{i}|$に依存する。エネルギー基底に対するコヒーレンスと絡み合いの影響も検討され、最適状態においてはどちらの資源も純度の単調関数であり、量子進化を加速させ、量子速度の限界を小さくすることができる。以上の結果から, 進化速度は, 状態のコヒーレンスが担うが, 最高速では, 二次対角形メークロールに位置する外対角要素によって引き起こされるコヒーレンスのみを示す。

The question that how fast a quantum state can evolve is considered. Using the definition of squared speed based on the Euclidean distance given in [Phys. Rev. Reaserch, {\bf 2}, 033127 (2019)], we provide a systematic framework to obtain the optimal speed of a $d$-dimensional system evolved unitarily under a time-independent Hamiltonian. Among the set of mixed quantum states having the same purity, the optimal state is obtained in terms of its purity parameter. We show that for an arbitrary $d$, the optimal state is given by a $X$-state with an additional property of being symmetric with respect to the secondary diagonal. For sufficiently low purities for which the purity exceeds the purity of maximally mixed state $\Id/d$ by at most $2/d^2$, the only nonzero off-diagonal entry of the optimal state is $\varrho_{1d}$, corresponding to the transition amplitude between two energy eigenstates with minimum and maximum eigenvalues, respectively. For larger purities, however, whether or not the other secondary diameter entries $\varrho_{i,d-i+1}$ take nonzero values depends on their relative energy gaps $|E_{d-i+1}-E_{i}|$. The effects of coherence and entanglement, with respect to the energy basis, are also examined and find that for optimal states both resources are monotonic functions of purity, so they can causs speed up quantum evolution leading to a smaller quantum speed limit. Our results show that although the coherence of the states is responsible for the speed of evolution, for the fastest states only the coherence caused by some off-diagonal entries located on the secondary diagonal make role.

翻訳日:2023-05-16 18:22:16 公開日:2023-05-13

# 構造化データを用いた高速非同期確率勾配アルゴリズム

Efficient Asynchronize Stochastic Gradient Algorithm with Structured Data ( http://arxiv.org/abs/2305.08001v1 )

ライセンス: Link先を確認

Zhao Song, Mingquan Ye

(参考訳) ディープラーニングは、その優れた一般化により、様々な分野で印象的な成功を収めた。しかしながら、多数のレイヤを持つニューラルネットワークを迅速にトレーニングすることは、これまでも難しい問題でした。既存の作業では、局所性に敏感なハッシュ技術や、空間分割上のデータ構造を利用して、各イテレーションのトレーニングコストを軽減する。本研究では、入力データポイントの観点から各イテレーションにおける計算の高速化を試みる。具体的には、トレーニングデータがKronecker構造のような特別な特性を持つ2層完全連結ニューラルネットワークの場合、各イテレーションはデータ次元のサブ線形時間で完了することができる。

Deep learning has achieved impressive success in a variety of fields because of its good generalization. However, it has been a challenging problem to quickly train a neural network with a large number of layers. The existing works utilize the locality-sensitive hashing technique or some data structures on space partitioning to alleviate the training cost in each iteration. In this work, we try accelerating the computations in each iteration from the perspective of input data points. Specifically, for a two-layer fully connected neural network, when the training data have some special properties, e.g., Kronecker structure, each iteration can be completed in sublinear time in the data dimension.

翻訳日:2023-05-16 18:21:40 公開日:2023-05-13

# 特徴適応を用いたDNN圧縮領域認識

DNN-Compressed Domain Visual Recognition with Feature Adaptation ( http://arxiv.org/abs/2305.08000v1 )

ライセンス: Link先を確認

Yingpeng Deng and Lina J. Karam

(参考訳) 学習に基づく画像圧縮は、最先端の変換ベースのコーデックと競合する性能を発揮する。これはJPEG-AIのような新しい学習ベースのビジュアル圧縮標準の開発を動機づけた。これらの新しい標準に対する特に関心は、人間と機械の両方をターゲットにした学習ベースの画像圧縮システムの開発である。本稿では,圧縮領域表現を用いて,圧縮領域内で直接視覚処理やコンピュータビジョンタスクを行う学習ベース圧縮方式について述べる。本研究では,ビットレートの異なる圧縮ドメイン潜在表現を用いて視覚認識を行うための,学習ベースの圧縮ドメイン分類フレームワークを採用する。本稿では,抽出されたチャネル情報の中で重要な特徴を適応的に強調・強化するために,軽量な注意モデルを統合する新しい特徴適応モジュールを提案する。また,事前訓練された画素領域重みを利用するための適応学習戦略を設計する。比較のために,提案手法を用いて得られた性能評価結果に加えて,画素領域内の圧縮・完全復号画像とオリジナル未圧縮画像を用いた性能評価結果も提示する。その結果,提案した圧縮領域分類モデルは,既存の圧縮領域分類モデルよりも明らかに優れており,完全復号化画像を用いて訓練された画素領域モデルと比較して,計算効率が向上することを示す。

Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned with learning-based compression schemes whose compressed-domain representations can be utilized to perform visual processing and computer vision tasks directly in the compressed domain. In our work, we adopt a learning-based compressed-domain classification framework for performing visual recognition using the compressed-domain latent representation at varying bit-rates. We propose a novel feature adaptation module integrating a lightweight attention model to adaptively emphasize and enhance the key features within the extracted channel-wise information. Also, we design an adaptation training strategy to utilize the pretrained pixel-domain weights. For comparison, in addition to the performance results that are obtained using our proposed latent-based compressed-domain method, we also present performance results using compressed but fully decoded images in the pixel domain as well as original uncompressed images. The obtained performance results show that our proposed compressed-domain classification model can distinctly outperform the existing compressed-domain classification models, and that it can also yield similar accuracy results with a much higher computational efficiency as compared to the pixel-domain models that are trained using fully decoded images.

翻訳日:2023-05-16 18:21:29 公開日:2023-05-13

# ディープニューラルネットワークのための逐次アフィン学習

Successive Affine Learning for Deep Neural Networks ( http://arxiv.org/abs/2305.07996v1 )

ライセンス: Link先を確認

Yuesheng Xu

(参考訳) 本稿では,深層ニューラルネットワーク構築のための逐次アフィン学習(SAL)モデルを提案する。伝統的に、DNNは非凸最適化問題の解決によって構築される。このような問題を非凸性や多数の層を持つため数値的に解くことはしばしば困難である。本論文の著者らにより,人間教育システムに触発されたこの課題に対処するため,近年,多段階深層学習(MGDL)モデルが始められた。 MGDLモデルはいくつかのグレードでDNNを学習し、それぞれが少数の層からなる浅いDNNを構築する。 MGDLモデルは、まだいくつかの非凸最適化問題を解く必要がある。提案したSALモデルはMGDLモデルから変異する。 DNNの各層がアフィン写像とアクティベーション関数から構成されていることに注意し、活性化関数を重み行列と現在の層のバイアスベクトルのみを含む二次凸最適化問題を解くことでアフィン写像を学習することを提案する。関数近似の文脈において、与えられた関数に対して、SALモデルはDNNの形式で適応基底関数を持つ関数の直交展開を生成する。 SALモデルにより生成された直交系に対して,ピタゴラスのアイデンティティとParsevalのアイデンティティを確立する。さらに、SAL過程の収束定理は、有限個のグレードの後に終了するか、その最適誤差関数のノルムが、階数数が無限大に増加するにつれて、極限まで厳密に減少することを意味する。さらに,提案したSALモデルが従来のディープラーニングモデルよりも優れていることを示す概念実証の数値例を示す。

This paper introduces a successive affine learning (SAL) model for constructing deep neural networks (DNNs). Traditionally, a DNN is built by solving a non-convex optimization problem. It is often challenging to solve such a problem numerically due to its non-convexity and having a large number of layers. To address this challenge, inspired by the human education system, the multi-grade deep learning (MGDL) model was recently initiated by the author of this paper. The MGDL model learns a DNN in several grades, in each of which one constructs a shallow DNN consisting of a small number of layers. The MGDL model still requires solving several non-convex optimization problems. The proposed SAL model mutates from the MGDL model. Noting that each layer of a DNN consists of an affine map followed by an activation function, we propose to learn the affine map by solving a quadratic/convex optimization problem which involves the activation function only {\it after} the weight matrix and the bias vector for the current layer have been trained. In the context of function approximation, for a given function the SAL model generates an orthogonal expansion of the function with adaptive basis functions in the form of DNNs. We establish the Pythagorean identity and the Parseval identity for the orthogonal system generated by the SAL model. Moreover, we provide a convergence theorem of the SAL process in the sense that either it terminates after a finite number of grades or the norms of its optimal error functions strictly decrease to a limit as the grade number increases to infinity. Furthermore, we present numerical examples of proof of concept which demonstrate that the proposed SAL model significantly outperforms the traditional deep learning model.

翻訳日:2023-05-16 18:21:10 公開日:2023-05-13

# 多言語前ファクトチェッククレーム検索

Multilingual Previously Fact-Checked Claim Retrieval ( http://arxiv.org/abs/2305.07991v1 )

ライセンス: Link先を確認

Mat\'u\v{s} Pikuliak and Ivan Srba and Robert Moro and Timo Hromadka and Timotej Smolen and Martin Melisek and Ivan Vykopal and Jakub Simko and Juraj Podrouzek and Maria Bielikova

(参考訳) ファクトチェックは、事実チェックが必要な大量のオンラインコンテンツによって、しばしば妨げられる。 NLPは、調査中のコンテンツに関連する既存の事実チェックを取得することで、それらを支援することができる。本稿では,以前に事実確認されたクレーム検索のための多言語データセットであるMultiClaimを紹介する。ソーシャルメディアから27の言語で28kの投稿、プロのファクトチェック担当者が書いた39の言語で206kのファクトチェック、そしてこれら2つのグループ間の31kの接続を集めました。これは、これまででもっとも広範囲で言語的に多様なデータセットである。教師なしの手法がデータセットとその様々な次元にどう影響するかを評価した。このような多種多様なデータセットの評価には複雑さがあり,結果の解釈に先立って適切な対応が必要となる。また,教師なしの微調整手法も評価し,教師なし手法を大幅に改善した。

Fact-checkers are often hampered by the sheer amount of online content that needs to be fact-checked. NLP can help them by retrieving already existing fact-checks relevant to the content being investigated. This paper introduces a new multilingual dataset -- MultiClaim -- for previously fact-checked claim retrieval. We collected 28k posts in 27 languages from social media, 206k fact-checks in 39 languages written by professional fact-checkers, as well as 31k connections between these two groups. This is the most extensive and the most linguistically diverse dataset of this kind to date. We evaluated how different unsupervised methods fare on this dataset and its various dimensions. We show that evaluating such a diverse dataset has its complexities and proper care needs to be taken before interpreting the results. We also evaluated a supervised fine-tuning approach, improving upon the unsupervised method significantly.

翻訳日:2023-05-16 18:20:45 公開日:2023-05-13

# 会議要約のための自己監督文圧縮

Self-Supervised Sentence Compression for Meeting Summarization ( http://arxiv.org/abs/2305.07988v1 )

ライセンス: Link先を確認

Haochen Tan, Han Wu, Wei Shao, Xinyun Zhang, Mingjie Zhan, Zhaohui Hou, Ding Liang, Linqi Song

(参考訳) 従来の要約モデルは、通常、ミーティングコーパスは長い会話を持つ複数のパーティを伴い、冗長で自明なコンテンツが詰め込まれているため、文書の要約において重要な情報をキャプチャできないことが多い。この問題に対処するために,svbは,スライディング・ウィンドウ対話の復元と\textbf{s}coring,チャネルワイズ重要度スコア \textbf{v}oting,相対位置的 \textbf{b}ucketingの3つのプロセスを通じて,冗長性を保ちながら,冗長性を‘圧縮’する,効果的かつ効率的な要約フレームワークである。具体的には、自己監督パラダイムの下で、スライディングウィンドウスコアは、複数のビューから各トークンの重要性を評価することを目的としている。そして、これらの評価はチャンネルワイド投票によって集計される。高評価のトークンは有能な情報と見なされ、‘textit{anchors} とラベル付けされる。最後に、言語モデルに対して許容される長さに長大な入力を調整するために、異なる粒度で他の無関係な内容を圧縮しながらアンカーを保持する相対的な位置バケットアルゴリズムを実行する。大規模事前学習やエキスパートレベルアノテートツールがなければ,提案手法は従来の最先端手法に匹敵する。本手法の有効性を証明するために,膨大な評価と分析を行った。

The conventional summarization model often fails to capture critical information in meeting transcripts, as meeting corpus usually involves multiple parties with lengthy conversations and is stuffed with redundant and trivial content. To tackle this problem, we present SVB, an effective and efficient framework for meeting summarization that `compress' the redundancy while preserving important content via three processes: sliding-window dialogue restoration and \textbf{S}coring, channel-wise importance score \textbf{V}oting, and relative positional \textbf{B}ucketing. Specifically, under the self-supervised paradigm, the sliding-window scoring aims to rate the importance of each token from multiple views. Then these ratings are aggregated by channel-wise voting. Tokens with high ratings will be regarded as salient information and labeled as \textit{anchors}. Finally, to tailor the lengthy input to an acceptable length for the language model, the relative positional bucketing algorithm is performed to retain the anchors while compressing other irrelevant contents in different granularities. Without large-scale pre-training or expert-grade annotating tools, our proposed method outperforms previous state-of-the-art approaches. A vast amount of evaluations and analyses are conducted to prove the effectiveness of our method.

翻訳日:2023-05-16 18:20:31 公開日:2023-05-13

# SCENE: 否定的事例への外挿のための自己ラベル型対策

SCENE: Self-Labeled Counterfactuals for Extrapolating to Negative Examples ( http://arxiv.org/abs/2305.07984v1 )

ライセンス: Link先を確認

Deqing Fu, Ameya Godbole, Robin Jia

(参考訳) 否定を検知する(非包含関係、未解決問題、虚偽主張など)ことは、多くの自然言語理解タスクにおいて重要かつ困難な側面である。手動による挑戦的なネガティブな例の収集は、モデルの検出に役立つが、コストとドメイン固有性の両方がある。本研究では,課題となる否定的な例を検出するモデルの能力を大幅に向上させるトレーニングデータの合成手法であるscene(expolating to negative examples)を提案する。既存のラベルの新しい例を合成する標準的なデータ拡張とは対照的に、SCENEは正の例のみから負の例をゼロショットに合成することができる。正の例が与えられた場合、SCENEはマスク満載モデルでそれを摂動し、その結果の例が自己学習ヒューリスティックに基づいて負かどうかを決定する。回答可能なトレーニング例のみを使用することで、studio 2.0でトレーニングされたモデルと比較して、studio 2.0のパフォーマンスギャップの69.6%をクローズすることができる。また,本手法は,文の包含度を認識してブール質問応答に拡張し,SQuADからACE-whQAへの一般化を改善する。

Detecting negatives (such as non-entailment relationships, unanswerable questions, and false claims) is an important and challenging aspect of many natural language understanding tasks. Though manually collecting challenging negative examples can help models detect them, it is both costly and domain-specific. In this work, we propose Self-labeled Counterfactuals for Extrapolating to Negative Examples (SCENE), an automatic method for synthesizing training data that greatly improves models' ability to detect challenging negative examples. In contrast with standard data augmentation, which synthesizes new examples for existing labels, SCENE can synthesize negative examples zero-shot from only positive ones. Given a positive example, SCENE perturbs it with a mask infilling model, then determines whether the resulting example is negative based on a self-training heuristic. With access to only answerable training examples, SCENE can close 69.6% of the performance gap on SQuAD 2.0, a dataset where half of the evaluation examples are unanswerable, compared to a model trained on SQuAD 2.0. Our method also extends to boolean question answering and recognizing textual entailment, and improves generalization from SQuAD to ACE-whQA, an out-of-domain extractive QA benchmark.

翻訳日:2023-05-16 18:20:04 公開日:2023-05-13

# ゼロショットFactual Error Correction

Zero-shot Faithful Factual Error Correction ( http://arxiv.org/abs/2305.07982v1 )

ライセンス: Link先を確認

Kung-Hsiang Huang, Hou Pong Chan, Heng Ji

(参考訳) 事実的誤りを忠実に訂正することは、テキスト的知識基盤の完全性を維持し、シーケンスからシーケンスへのモデルの幻覚を防止するために重要である。人間が事実の誤りを識別し、訂正する能力に基づいて、入力クレームに関する質問を定式化し、与えられた証拠の正しい回答を求め、その証拠と整合性に基づいて各補正の忠実さを評価するゼロショットフレームワークを提案する。私たちのゼロショットフレームワークは、FEVERとSciFactデータセットの実験で示されたように、完全に教師されたアプローチよりも優れています。さらに重要なことに、フレームワークの分解性は本質的に解釈可能性を提供します。さらに,事実的誤り訂正を評価するのに最も適した指標を明らかにするために,一般的に使用される指標と人間の判断との相関を,知性と忠実性に関する3つの異なる次元で分析する。

Faithfully correcting factual errors is critical for maintaining the integrity of textual knowledge bases and preventing hallucinations in sequence-to-sequence models. Drawing on humans' ability to identify and correct factual errors, we present a zero-shot framework that formulates questions about input claims, looks for correct answers in the given evidence, and assesses the faithfulness of each correction based on its consistency with the evidence. Our zero-shot framework outperforms fully-supervised approaches, as demonstrated by experiments on the FEVER and SciFact datasets, where our outputs are shown to be more faithful. More importantly, the decomposability nature of our framework inherently provides interpretability. Additionally, to reveal the most suitable metrics for evaluating factual error corrections, we analyze the correlation between commonly used metrics with human judgments in terms of three different dimensions regarding intelligibility and faithfulness.

翻訳日:2023-05-16 18:19:40 公開日:2023-05-13

# 低次元多様体上の超音速重力空力

Grasping Extreme Aerodynamics on a Low-Dimensional Manifold ( http://arxiv.org/abs/2305.08024v1 )

ライセンス: Link先を確認

Kai Fukami and Kunihiko Taira

(参考訳) 現代の航空車両は輸送、防衛、監視、救助など幅広い活動を行っている。これらの航空機は穏やかな条件で飛行できるが、都市キャニオンや山岳地帯、船の覚醒などに見られる悪質な環境での運用を避けることができる。より小型の航空機は特にこのようなガスト障害を起こしやすい。地球温暖化により極端に天候が頻繁になり、特に小型の航空機は大規模な大気障害に遭遇し、安定した飛行を管理することが期待されている。しかし、極端に渦巻くガストが飛行体に与える影響を記述できる基礎は事実上存在しない。この難しい問題を解くために、ガスティ条件の翼が遭遇する巨大なパラメータ空間が存在する。渦巻と翼の相互作用は、ガストパラメータの組合せごとに複雑で異なるように見えるが、この研究では、極端空気力学の背後にある基礎物理学は、従来の予想よりもはるかに単純で低ランクであることが示されている。時間およびパラメータ空間上の非線形渦流場は、元の高次元物理学の本質を保ちながら、リフト誘導オートエンコーダを持つ3変数のみに圧縮できることが明らかとなった。極端な空気力学的流れは、機械学習によって低次元多様体に最適圧縮され、適切な座標の同定が解析、モデリング、極端に非定常なガスティ流れの制御を促進することを示唆する。本研究は、伝統的に飛行不能と考えられる大気条件下での次世代小型航空機の安定飛行を支援するものである。

Modern air vehicles perform a wide range of operations, including transportation, defense, surveillance, and rescue. These aircraft can fly in calm conditions but avoid operations in gusty environments, which are seen in urban canyons, over mountainous terrains, and in ship wakes. Smaller aircraft are especially prone to such gust disturbances. With extreme weather becoming ever more frequent due to global warming, it is anticipated that aircraft, especially those that are smaller in size, encounter large-scale atmospheric disturbances and still be expected to manage stable flight. However, there exists virtually no foundation to describe the influence of extreme vortical gusts on flying bodies. To compound on this difficult problem, there is an enormous parameter space for gusty conditions wings encounter. While the interaction between the vortical gusts and wings is seemingly complex and different for each combination of gust parameters, we show in this study that the fundamental physics behind extreme aerodynamics is far simpler and low-rank than traditionally expected. It is revealed that the nonlinear vortical flow field over time and parameter space can be compressed to only three variables with a lift-augmented autoencoder while holding the essence of the original high-dimensional physics. Extreme aerodynamic flows can be optimally compressed through machine learning into a low-dimensional manifold, implying that the identification of appropriate coordinates facilitates analyses, modeling, and control of extremely unsteady gusty flows. The present findings support the stable flight of next-generation small air vehicles in atmosphere conditions traditionally considered unflyable.

翻訳日:2023-05-16 18:14:15 公開日:2023-05-13

# ヒント: 時間的ニューラルネットワークのためのトポロジカルに重要な経路サンプリング

TIPS: Topologically Important Path Sampling for Anytime Neural Networks ( http://arxiv.org/abs/2305.08021v1 )

ライセンス: Link先を確認

Guihong Li, Kartikeya Bhardwaj, Yuedong Yang, Radu Marculescu

(参考訳) anytime neural network(anytimenns)は、さまざまなハードウェアリソース制約下で実行時にモデルの複雑さを適応的に調整するための有望なソリューションである。しかし、手動設計のAnytimeNNはデザイナの事前経験に偏りがあり、したがって準最適ソリューションを提供する。既存の手作りアプローチの限界に対処するために、我々は最初にanytimennsのトレーニングプロセスを離散時間マルコフ連鎖(dtmc)としてモデル化し、anytimennsのトレーニングに最も寄与する経路を特定するためにそれを使用する。この新たなDTMCに基づく分析に基づいて,様々なハードウェア制約下でAnytimeNNを自動設計するフレームワークであるTIPSを提案する。実験の結果,TIPSはAnytimeNNの収束率とテスト精度を向上させることができることがわかった。既存のAnytimeNNのアプローチと比較して、TIPSは複数のデータセットで精度を2%-6.6%向上し、SOTAの精度-FLOPのトレードオフを達成する。

Anytime neural networks (AnytimeNNs) are a promising solution to adaptively adjust the model complexity at runtime under various hardware resource constraints. However, the manually-designed AnytimeNNs are biased by designers' prior experience and thus provide sub-optimal solutions. To address the limitations of existing hand-crafted approaches, we first model the training process of AnytimeNNs as a discrete-time Markov chain (DTMC) and use it to identify the paths that contribute the most to the training of AnytimeNNs. Based on this new DTMC-based analysis, we further propose TIPS, a framework to automatically design AnytimeNNs under various hardware constraints. Our experimental results show that TIPS can improve the convergence rate and test accuracy of AnytimeNNs. Compared to the existing AnytimeNNs approaches, TIPS improves the accuracy by 2%-6.6% on multiple datasets and achieves SOTA accuracy-FLOPs tradeoffs.

翻訳日:2023-05-16 18:13:50 公開日:2023-05-13

# DRew:遅延で動的にリワイヤされたメッセージパッシング

DRew: Dynamically Rewired Message Passing with Delay ( http://arxiv.org/abs/2305.08018v1 )

ライセンス: Link先を確認

Benjamin Gutteridge, Xiaowen Dong, Michael Bronstein, Francesco Di Giovanni

(参考訳) メッセージパッシングニューラルネットワーク(mpnn)は、長距離インタラクションに依存するタスクのパフォーマンス低下を引き起こす過剰スワッシング現象に苦しむことが示されている。これは主に、ノードの直近の近傍でローカルにのみ発生するメッセージパッシングに起因している。グラフを'より連結'しようとするアプローチをリライトすることは、長距離タスクに適していると思われるが、遠いノードをすべての層で瞬時に通信させるため、グラフ上の距離によって得られる帰納バイアスを失うことが多い。本稿では,いずれのmpnnアーキテクチャにも適用可能な,グラフの段階的高密度化を保証するためのレイヤ依存リワイリングを実現するフレームワークを提案する。また,各層と相互距離に依存するノード間の接続をスキップする遅延機構を提案する。提案手法を複数の長距離タスクで検証し,グラフトランスフォーマーやマルチホップmpnnよりも優れていることを示す。

Message passing neural networks (MPNNs) have been shown to suffer from the phenomenon of over-squashing that causes poor performance for tasks relying on long-range interactions. This can be largely attributed to message passing only occurring locally, over a node's immediate neighbours. Rewiring approaches attempting to make graphs `more connected', and supposedly better suited to long-range tasks, often lose the inductive bias provided by distance on the graph since they make distant nodes communicate instantly at every layer. In this paper we propose a framework, applicable to any MPNN architecture, that performs a layer-dependent rewiring to ensure gradual densification of the graph. We also propose a delay mechanism that permits skip connections between nodes depending on the layer and their mutual distance. We validate our approach on several long-range tasks and show that it outperforms graph Transformers and multi-hop MPNNs.

翻訳日:2023-05-16 18:13:32 公開日:2023-05-13

# CheXDragonのトレーニング方法:新しいタスクや医療システムへの移行のための胸部X線モデルのトレーニング

How to Train Your CheXDragon: Training Chest X-Ray Models for Transfer to Novel Tasks and Healthcare Systems ( http://arxiv.org/abs/2305.08017v1 )

ライセンス: Link先を確認

Cara Van Uden and Jeremy Irvin and Mars Huang and Nathan Dean and Jason Carr and Andrew Ng and Curtis Langlotz

(参考訳) 自己教師付き学習(SSL)は、機械学習モデルのラベルの効率的なトレーニングを可能にする。これは医療画像などの領域において必須であり、ラベルは費用がかかり、治すのに時間がかかります。しかし、異なる医療システムや新しいタスクにモデルを転送するための最も効果的な教師付きまたはSSL戦略はよく理解されていない。本研究では,医療画像(ケストX線)とテキスト(放射線学報告)のマルチモーダルデータセットを用いて,教師付きおよび自己指導型事前訓練戦略を体系的に実験した。次に、様々なタスクセットを持つ2つの外部機関のデータによる性能評価を行う。さらに,これらのモデルを新しいタスクや医療システムに効果的に適用するために,異なるトランスファー学習戦略を実験する。我々の経験的結果は、マルチモーダルSSLは、新しい医療システムやタスクにおけるパフォーマンスにおいて、完全な監視で事前訓練されたモデルに匹敵する、一過性のSSLよりも大幅に向上していることを示唆している。マルチモーダルドメイン適応型事前学習(DAPT)、線形探索(LP-FT)、および両手法の組み合わせにより、新しいデータセットとタスクにさらに適応したモデルによるさらなる性能向上を示す。これらの追加がすべて実現可能でないシナリオで使用する代替モデルを提案する。本研究は,新しい医療システムと新しい課題に対する医用画像解釈モデルの一般化に関するガイダンスを提供する。

Self-supervised learning (SSL) enables label efficient training for machine learning models. This is essential for domains such as medical imaging, where labels are costly and time-consuming to curate. However, the most effective supervised or SSL strategy for transferring models to different healthcare systems or novel tasks is not well understood. In this work, we systematically experiment with a variety of supervised and self-supervised pretraining strategies using multimodal datasets of medical images (chest X-rays) and text (radiology reports). We then evaluate their performance on data from two external institutions with diverse sets of tasks. In addition, we experiment with different transfer learning strategies to effectively adapt these pretrained models to new tasks and healthcare systems. Our empirical results suggest that multimodal SSL gives substantial gains over unimodal SSL in performance across new healthcare systems and tasks, comparable to models pretrained with full supervision. We demonstrate additional performance gains with models further adapted to the new dataset and task, using multimodal domain-adaptive pretraining (DAPT), linear probing then finetuning (LP-FT), and both methods combined. We offer suggestions for alternative models to use in scenarios where not all of these additions are feasible. Our results provide guidance for improving the generalization of medical image interpretation models to new healthcare systems and novel tasks.

翻訳日:2023-05-16 18:13:02 公開日:2023-05-13

# 軽量オールコンベネト・トランスファー学習による表面emgに基づくセッション間/サブジェクション認識

Surface EMG-Based Inter-Session/Inter-Subject Gesture Recognition by Leveraging Lightweight All-ConvNet and Transfer Learning ( http://arxiv.org/abs/2305.08014v1 )

ライセンス: Link先を確認

Md. Rabiul Islam, Daniel Massicotte, Philippe Y. Massicotte, and Wei-Ping Zhu

(参考訳) 低解像度のHD-sEMG画像を用いたジェスチャー認識は、より流動的で自然な筋肉-コンピュータインターフェースを開発するための新たな道を開く。しかし、セッション間およびサブジェクト間シナリオ間のデータ変動は大きな課題となる。既存のアプローチでは、非常に大きく複雑なConvNetまたは2SRNNベースのドメイン適応手法を使用して、これらのセッション間およびオブジェクト間データのばらつきに起因する分散シフトを近似した。したがって、これらの方法は、何百万ものトレーニングパラメータと、事前トレーニングと適応段階の両方で、トレーニング済みおよびターゲットドメインデータセットを学習する必要がある。その結果、リアルタイムアプリケーションへのデプロイには、ハイエンドのリソースバウンドと計算コストが非常にかかる。本稿では,この問題を解決するために,軽量なall-convnet and transfer learning(tl)を活用した軽量なall-convnet+tlモデルを提案する。 all-convnet+tlモデルは畳み込み層のみで構成されており、セッション間およびサブジェクト間データ可変性によって引き起こされる分散シフトに対処するための不変および判別表現を学習するための単純かつ効率的なフレームワークである。 4つのデータセットに対する実験により,提案手法は,既存の手法よりも大きなマージンで優れており,セッション間およびオブジェクト間シナリオにおける最先端の結果が得られ,セッション内ジェスチャ認識において同等あるいは競合的に実行されることを示した。これらのパフォーマンスギャップは、少数のデータ(例えば単一のトライアル)がターゲットドメインで利用可能になったときにさらに増加する。これらの顕著な実験結果は、現在の最先端モデルが、sEMGベースのセッション間およびオブジェクト間ジェスチャー認識タスクに対して過度にパラメータ化されていることを示す。

Gesture recognition using low-resolution instantaneous HD-sEMG images opens up new avenues for the development of more fluid and natural muscle-computer interfaces. However, the data variability between inter-session and inter-subject scenarios presents a great challenge. The existing approaches employed very large and complex deep ConvNet or 2SRNN-based domain adaptation methods to approximate the distribution shift caused by these inter-session and inter-subject data variability. Hence, these methods also require learning over millions of training parameters and a large pre-trained and target domain dataset in both the pre-training and adaptation stages. As a result, it makes high-end resource-bounded and computationally very expensive for deployment in real-time applications. To overcome this problem, we propose a lightweight All-ConvNet+TL model that leverages lightweight All-ConvNet and transfer learning (TL) for the enhancement of inter-session and inter-subject gesture recognition performance. The All-ConvNet+TL model consists solely of convolutional layers, a simple yet efficient framework for learning invariant and discriminative representations to address the distribution shifts caused by inter-session and inter-subject data variability. Experiments on four datasets demonstrate that our proposed methods outperform the most complex existing approaches by a large margin and achieve state-of-the-art results on inter-session and inter-subject scenarios and perform on par or competitively on intra-session gesture recognition. These performance gaps increase even more when a tiny amount (e.g., a single trial) of data is available on the target domain for adaptation. These outstanding experimental results provide evidence that the current state-of-the-art models may be overparameterized for sEMG-based inter-session and inter-subject gesture recognition tasks.

翻訳日:2023-05-16 18:12:02 公開日:2023-05-13

# 損失圧縮によるディープニューラルネットワークの情報ボトルネック解析

Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression ( http://arxiv.org/abs/2305.08013v1 )

ライセンス: Link先を確認

Ivan Butakov, Aleksander Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov, Kirill Andreev

(参考訳) Information Bottleneck(IB)原則は、ディープニューラルネットワーク(DNN)のトレーニングプロセスを分析するための情報理論フレームワークを提供する。その本質は、2つの相互情報(MI)値のダイナミクスを追跡することである。1つは隠れた層とクラスラベルの間のもので、もう1つは隠れた層とDNN入力の間のものである。 Shwartz-Ziv と Tishby (2017) の仮説によれば、トレーニングプロセスは、フィッティングと圧縮の2つの異なるフェーズで構成されている。後者のフェーズは、DNNによる優れた一般化性能を考慮に入れていると考えられている。高次元ランダムベクトル間のmi推定の難しい性質から、この仮説はおもちゃのnnや量子化nnやドロップアウトnnといった特定のタイプのnnに対してのみ検証されている。本稿では,一般NNのICB解析を行うための包括的フレームワークを提案する。提案手法はGoldfeld et al. (2019) によって提案された確率的NN法を利用しており、高次元性に関連する障害を克服するための圧縮ステップを取り入れている。言い換えれば、高次元ランダムベクトルの圧縮表現の間のmiを推定する。提案手法は理論的および実用的正当性の両方で支持される。特に,事前定義されたmi値を用いた合成実験により推定器の精度を示す。最後に,MI力学の新たな特徴を明らかにする畳み込み DNN を用いて IB 解析を行う。

The Information Bottleneck (IB) principle offers an information-theoretic framework for analyzing the training process of deep neural networks (DNNs). Its essence lies in tracking the dynamics of two mutual information (MI) values: one between the hidden layer and the class label, and the other between the hidden layer and the DNN input. According to the hypothesis put forth by Shwartz-Ziv and Tishby (2017), the training process consists of two distinct phases: fitting and compression. The latter phase is believed to account for the good generalization performance exhibited by DNNs. Due to the challenging nature of estimating MI between high-dimensional random vectors, this hypothesis has only been verified for toy NNs or specific types of NNs, such as quantized NNs and dropout NNs. In this paper, we introduce a comprehensive framework for conducting IB analysis of general NNs. Our approach leverages the stochastic NN method proposed by Goldfeld et al. (2019) and incorporates a compression step to overcome the obstacles associated with high dimensionality. In other words, we estimate the MI between the compressed representations of high-dimensional random vectors. The proposed method is supported by both theoretical and practical justifications. Notably, we demonstrate the accuracy of our estimator through synthetic experiments featuring predefined MI values. Finally, we perform IB analysis on a close-to-real-scale convolutional DNN, which reveals new features of the MI dynamics.

翻訳日:2023-05-16 18:11:30 公開日:2023-05-13

# スパイクニューラルネットワークの量子化

Quantization in Spiking Neural Networks ( http://arxiv.org/abs/2305.08012v1 )

ライセンス: Link先を確認

Bernhard A. Moser and Michael Lunglmayr

(参考訳) スパイキングニューラルネットワーク(SNN)では、各ノードで重み付きディラックパルスの入力シーケンスをスパイク集約としきい値の閾値に基づく漏れ積分(LIF)ニューロンモデルにより重み付きディラックパルスの出力シーケンスに変換する。この写像は量子化作用素として理解でき、アレクセイヴィチノルムを用いて量子化誤差に対応する公式を述べる。この分析は LIF モデルにおける再初期化の再考に影響を及ぼし、モジュロベースのリセット変種として 'reset-to-mod' を提案する。

In spiking neural networks (SNN), at each node, an incoming sequence of weighted Dirac pulses is converted into an output sequence of weighted Dirac pulses by a leaky-integrate-and-fire (LIF) neuron model based on spike aggregation and thresholding. We show that this mapping can be understood as a quantization operator and state a corresponding formula for the quantization error by means of the Alexiewicz norm. This analysis has implications for rethinking re-initialization in the LIF model, leading to the proposal of 'reset-to-mod' as a modulo-based reset variant.

翻訳日:2023-05-16 18:11:06 公開日:2023-05-13

# ProKnow:メンタルヘルス診断支援のための安全・説明可能な質問生成のためのプロセス知識

ProKnow: Process Knowledge for Safety Constrained and Explainable Question Generation for Mental Health Diagnostic Assistance ( http://arxiv.org/abs/2305.08010v1 )

ライセンス: Link先を確認

Kaushik Roy, Manas Gaur, Misagh Soltani, Vipula Rawte, Ashwin Kalyan, Amit Sheth

(参考訳) 現在のバーチャルメンタルヘルスアシスタント(vmhas)はカウンセリングと示唆的なケアを提供する。彼らは安全性と専門的な臨床プロセス知識の訓練が不足しているため、患者の診断支援を控えている。本研究では,Proknowをエビデンスに基づくガイドラインやドメインの専門家に対する概念理解のカテゴリにマップする情報集合として定義する。また,医療従事者が使用する安全制約やプロノウハウによって誘導される,新たな診断会話データセットも導入する。患者からの診断情報を対話的に収集する自然言語質問生成法(NLG)を開発した。このデータセットで最先端の大規模言語モデル(LM)を使用することの限界を実証する。我々のアルゴリズムは、安全性、知識獲得、説明可能性を明確にモデル化することでプロセスの知識をモデル化する。 ProKnowガイド法で拡張したLMは、うつ病や不安領域でより安全な89%の質問を発生させた。生成した質問の説明性は、抑うつや不安に関する知識ベースの概念と類似した計算によって評価される。総じて,本手法を改良したlmsのタイプに関わらず,安全性,説明可能性,プロセスガイドによる質問生成において,事前学習した単純なlmsと比較して平均82%の改善を達成できた。提案手法の有効性を定量的に定量的に評価し,安全性,説明可能性,プロセス知識の順守に関する3つの新しい評価指標を導入する。

Current Virtual Mental Health Assistants (VMHAs) provide counseling and suggestive care. They refrain from patient diagnostic assistance because they lack training in safety-constrained and specialized clinical process knowledge. In this work, we define Proknow as an ordered set of information that maps to evidence-based guidelines or categories of conceptual understanding to experts in a domain. We also introduce a new dataset of diagnostic conversations guided by safety constraints and Proknow that healthcare professionals use. We develop a method for natural language question generation (NLG) that collects diagnostic information from the patient interactively. We demonstrate the limitations of using state-of-the-art large-scale language models (LMs) on this dataset. Our algorithm models the process knowledge through explicitly modeling safety, knowledge capture, and explainability. LMs augmented with ProKnow guided method generated 89% safer questions in the depression and anxiety domain. The Explainability of the generated question is assessed by computing similarity with concepts in depression and anxiety knowledge bases. Overall, irrespective of the type of LMs augmented with our ProKnow, we achieved an average 82% improvement over simple pre-trained LMs on safety, explainability, and process-guided question generation. We qualitatively and quantitatively evaluate the efficacy of the proposed ProKnow-guided methods by introducing three new evaluation metrics for safety, explainability, and process knowledge adherence.

翻訳日:2023-05-16 18:10:52 公開日:2023-05-13

# 時間依存量子振動子に対するウィグナー・ヴラソフ形式

The Wigner-Vlasov formalism for time-dependent quantum oscillator ( http://arxiv.org/abs/2305.06069v3 )

ライセンス: Link先を確認

E.E. Perepelkin, B.I. Sadovnikov, N.G. Inozemtseva, A.A. Korepanova

(参考訳) 本稿では,位相空間における量子系に対するvlasov理論とwigner関数の枠組みにおける時間依存周波数を持つ高調波発振器の問題を包括的に検討する。ヴェラソフ方程式チェーンとシュル=オディンガー方程式、およびウィグナー関数のモヤル方程式の関係を用いて、この問題の厳密な解を求める新しい方法が提案されている。位相空間におけるウィグナー関数上のエネルギー関数を平均化する方法は、量子系に対する時間依存エネルギースペクトルを得るために用いられる。ヴラソフ方程式の解はヒル方程式を満たす特性の形で表現することができる。ヒル方程式の特別な場合、すなわち不安定解を持つマチュー方程式は詳細に検討されている。不安定な量子系のダイナミクスの解析により、ウィグナー関数レベル線で有界な位相空間の正方形は時間保存されるが、エネルギー関数線で有界な位相空間の正方形は増加する。この場合、ヴラソフ方程式の特徴はウィグナー関数レベル線とエネルギー関数ラインの交差点に位置する。このクロスポイントは不安定なシステムのダイナミクスを表す軌道で時間とともに移動する。それぞれの軌道は独自のエネルギーを持ち、ウィグナー関数上でこれらのエネルギーを平均すると、システム全体の時間依存離散エネルギースペクトルとなる。一般化位相空間 $\left\{x,p,\dot{p},\ddot{p} \right\} において、4階のウィグナー函数に対して明示的な表現が得られている。 $

This paper presents a comprehensive investigation of the problem of a harmonic oscillator with time-depending frequencies in the framework of the Vlasov theory and the Wigner function apparatus for quantum systems in the phase space. A new method is proposed to find an exact solution of this problem using a relation of the Vlasov equation chain with the Schr\"odinger equation and with the Moyal equation for the Wigner function. A method of averaging the energy function over the Wigner function in the phase space can be used to obtain time-dependent energy spectrum for a quantum system. The Vlasov equation solution can be represented in the form of characteristics satisfying the Hill equation. A particular case of the Hill equation, namely the Mathieu equation with unstable solutions, has been considered in details. An analysis of the dynamics of an unstable quantum system shows that the phase space square bounded with the Wigner function level line conserves in time, but the phase space square bounded with the energy function line increases. In this case the Vlasov equation characteristic is situated on the crosspoint of the Wigner function level line and the energy function line. This crosspoint moves in time with a trajectory that represents the unstable system dynamics. Each such trajectory has its own energy, and averaging these energies over the Wigner function results in time-dependent discreet energy spectrum for the whole system. An explicit expression has been obtained for the Wigner function of the 4th rank in the generalized phase space $\left\{ x,p,\dot{p},\ddot{p} \right\}.$

翻訳日:2023-05-16 11:16:59 公開日:2023-05-13

PDF登録状況（公開日: 20230513）