Fugu-MT: arXiv Paper Translations

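This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0 and are produced with Fugu-Machine Translator.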

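The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).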

Papers with a publication date of 20230706.

Title	Authors	Abstract	Publication date / translation date
# Version Control of Speaker Recognition Systems ( http://arxiv.org/abs/2007.12069v4 ) License: see the linked page	Quan Wang, Ignacio Lopez Moreno	This paper discusses one of the most challenging practical engineering problems in speaker recognition systems: the version control of models and user profiles. A typical speaker recognition system consists of two stages: the enrollment stage, where a profile is generated from user-provided enrollment audio; and the runtime stage, where the voice identity of the runtime audio is compared against the stored profiles. As technology advances, the speaker recognition system needs to be updated for better performance. However, if the stored user profiles are not updated accordingly, version mismatch will result in meaningless recognition results. In this paper, we describe different version control strategies for speaker recognition systems that have been carefully studied at Google over years of engineering practice. These strategies are categorized into three groups according to how they are deployed in the production environment: device-side deployment, server-side deployment, and hybrid deployment. To compare different strategies with quantitative metrics under various network configurations, we present SpeakerVerSim, an easily-extensible Python-based simulation framework for different server-side deployment strategies of speaker recognition systems.	Translated: 2023-10-24 16:07:39 Published: 2023-07-06
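The version-mismatch problem described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Profile` structure, the `score` function, and the use of cosine similarity are hypothetical and are not taken from the paper or from SpeakerVerSim. The point is only that a stored profile and a runtime embedding should be compared when they come from the same model version.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Profile:
    """A stored speaker profile (hypothetical structure, not from the paper)."""
    model_version: int
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def score(profile: Profile, runtime_embedding: Sequence[float], runtime_version: int) -> float:
    """Score a runtime embedding against a profile, refusing mismatched versions."""
    if profile.model_version != runtime_version:
        # Embeddings from different model versions live in unrelated spaces,
        # so a similarity score between them would be meaningless.
        raise ValueError(
            f"version mismatch: profile v{profile.model_version}, "
            f"runtime v{runtime_version}"
        )
    return cosine(profile.embedding, runtime_embedding)
```

In a real deployment the mismatch branch would presumably trigger a profile update or re-enrollment rather than an error; which of those applies depends on the deployment strategy and is not specified here.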
# Exploring the Advances in Identifying Useful Code Review Comments ( http://arxiv.org/abs/2307.00692v2 ) License: see the linked page	Sharif Ahmed and Nasir U. Eisty	Effective peer code review in collaborative software development necessitates useful reviewer comments and supportive automated tools. Code review comments are a central component of the Modern Code Review process in industry and open-source development. Therefore, it is important to ensure these comments serve their purposes. This paper reflects on the evolution of research on the usefulness of code review comments. It examines papers that define the usefulness of code review comments, mine and annotate datasets, study developers' perceptions, analyze factors from different aspects, and use machine learning classifiers to automatically predict the usefulness of code review comments. Finally, it discusses the open problems and challenges in recognizing useful code review comments for future research.	Translated: 2023-10-23 18:36:25 Published: 2023-07-06
# PersonaGen: A Tool for Generating Personas from User Feedback ( http://arxiv.org/abs/2307.00390v2 ) License: see the linked page	Xishuo Zhang, Lin Liu, Yi Wang, Xiao Liu, Hailong Wang, Anqi Ren, Chetan Arora	Personas are crucial in software development processes, particularly in agile settings. However, no effective tools are available for generating personas from user feedback in agile software development processes. To fill this gap, we propose a novel tool that uses the GPT-4 model and a knowledge graph to generate persona templates from well-processed user feedback, facilitating requirement analysis in agile software development processes. We developed a tool called PersonaGen. We evaluated PersonaGen using qualitative feedback from a small-scale user study involving student software projects. The results were mixed, highlighting challenges in persona-based educational practice and addressing non-functional requirements.	Translated: 2023-10-23 18:34:37 Published: 2023-07-06
# TEASER: Simulation-based CAN Bus Regression Testing for Self-driving Cars Software ( http://arxiv.org/abs/2307.03279v1 ) License: see the linked page	Christian Birchler, Cyrill Rohrbach, Hyeongkyun Kim, Alessio Gambi, Tianhai Liu, Jens Horneber, Timo Kehrer, Sebastiano Panichella	Software systems for safety-critical systems like self-driving cars (SDCs) need to be tested rigorously. In particular, the electronic control units (ECUs) of SDCs should be tested with realistic input data. In this context, a communication protocol called Controller Area Network (CAN) is typically used to transfer sensor data to the SDC control units. A challenge for SDC maintainers and testers is the need to manually define the CAN inputs that realistically represent the state of the SDC in the real world. To address this challenge, we developed TEASER, which is a tool that generates realistic CAN signals for SDCs obtained from the sensors of state-of-the-art car simulators. We evaluated TEASER based on its integration capability into a DevOps pipeline of aicas GmbH, a company in the automotive sector. Concretely, we integrated TEASER in a Continuous Integration (CI) pipeline configured with Jenkins. The pipeline executes the test cases in simulation environments and sends the sensor data over the CAN bus to a physical CAN device, which is the test subject. Our evaluation shows the ability of TEASER to generate and execute CI test cases that expose simulation-based faults (using regression strategies); the tool produces CAN inputs that realistically represent the state of the SDC in the real world. This result is of critical importance for increasing automation and effectiveness of simulation-based CAN bus regression testing for SDC software. Tool: https://doi.org/10.5281/zenodo.7964890 GitHub: https://github.com/christianbirchler-org/sdc-scissor/releases/tag/v2.2.0-rc.1 Documentation: https://sdc-scissor.readthedocs.io	Translated: 2023-10-23 18:15:27 Published: 2023-07-06
# Artistic Strategies to Guide Neural Networks ( http://arxiv.org/abs/2307.07521v1 ) License: see the linked page	Varvara Guljajeva, Mar Canet Sola, Isaac Joseph Clarke	Artificial Intelligence is present in the generation and distribution of culture. How do artists exploit neural networks? What impact do these algorithms have on artistic practice? Through a practice-based research methodology, this paper explores the potentials and limits of current AI technology, more precisely deep neural networks, in the context of image, text, form and translation of semiotic spaces. In a relatively short time, the generation of high-resolution images and 3D objects has been achieved. There are models, like CLIP and text2mesh, that do not need the same kind of media input as the output; we call them translation models. Such a twist contributes toward creativity arousal, which manifests itself in art practice and feeds back to the developers' pipeline. Yet again, we see how artworks act as catalysts for technology development. Those creative scenarios and processes are enabled not solely by AI models, but by the hard work behind implementing these new technologies. AI does not create a 'push-a-button' masterpiece but requires a deep understanding of the technology behind it, and a creative and critical mindset. Thus, AI opens new avenues for inspiration and offers novel tool sets, and yet again the question of authorship is asked.	Translated: 2023-07-23 12:15:00 Published: 2023-07-06
# Amplifying Limitations, Harms and Risks of Large Language Models ( http://arxiv.org/abs/2307.04821v1 ) License: see the linked page	Michael O'Neill and Mark Connor	We present this article as a small gesture in an attempt to counter what appears to be exponentially growing hype around Artificial Intelligence (AI) and its capabilities, and the distraction provided by the associated talk of science-fiction scenarios that might arise if AI should become sentient and super-intelligent. It may also help those outside of the field to become more informed about some of the limitations of AI technology. In the current context of popular discourse, AI defaults to mean foundation and large language models (LLMs) such as those used to create ChatGPT. This in itself is a misrepresentation of the diversity, depth and volume of research, researchers, and technology that truly represents the field of AI. AI is a field of research that has existed in software artefacts since at least the 1950s. We set out to highlight a number of limitations of LLMs, and in so doing highlight that harms have already arisen and will continue to arise due to these limitations. Along the way we also highlight some of the associated risks for individuals and organisations in using this technology.	Translated: 2023-07-16 04:05:12 Published: 2023-07-06
# S2vNTM: Semi-supervised vMF Neural Topic Modeling ( http://arxiv.org/abs/2307.04804v1 ) License: see the linked page	Weijie Xu, Jay Desai, Srinivasan Sengamedu, Xiaoyu Jiang, Francis Iannacci	Language model based methods are powerful techniques for text classification. However, the models have several shortcomings. (1) It is difficult to integrate human knowledge such as keywords. (2) Training the models requires substantial resources. (3) They rely on large text corpora for pretraining. In this paper, we propose Semi-Supervised vMF Neural Topic Modeling (S2vNTM) to overcome these difficulties. S2vNTM takes a few seed keywords as input for topics. S2vNTM leverages the pattern of keywords to identify potential topics, as well as optimize the quality of topics' keyword sets. Across a variety of datasets, S2vNTM outperforms existing semi-supervised topic modeling methods in classification accuracy with limited keywords provided. S2vNTM is at least twice as fast as baselines.	Translated: 2023-07-16 04:03:44 Published: 2023-07-06
# UniCoRN: Unified Cognitive Signal ReconstructioN bridging cognitive signals and human language ( http://arxiv.org/abs/2307.05355v1 ) License: see the linked page	Nuwa Xi, Sendong Zhao, Haochun Wang, Chi Liu, Bing Qin and Ting Liu	Decoding text stimuli from cognitive signals (e.g. fMRI) enhances our understanding of the human language system, paving the way for building versatile Brain-Computer Interfaces. However, existing studies largely focus on decoding individual word-level fMRI volumes from a restricted vocabulary, which is far too idealized for real-world application. In this paper, we propose fMRI2text, the first open-vocabulary task aiming to bridge fMRI time series and human language. Furthermore, to explore the potential of this new task, we present a baseline solution, UniCoRN: the Unified Cognitive Signal ReconstructioN for Brain Decoding. By reconstructing both individual time points and time series, UniCoRN establishes a robust encoder for cognitive signals (fMRI & EEG). Leveraging a pre-trained language model as decoder, UniCoRN proves its efficacy in decoding coherent text from fMRI series across various split settings. Our model achieves a 34.77% BLEU score on fMRI2text, and a 37.04% BLEU when generalized to EEG-to-text decoding, thereby surpassing the former baseline. Experimental results indicate the feasibility of decoding consecutive fMRI volumes, and the effectiveness of decoding different cognitive signals using a unified structure.	Translated: 2023-07-16 03:54:59 Published: 2023-07-06
# Learned Kernels for Interpretable and Efficient PPG Signal Quality Assessment and Artifact Segmentation ( http://arxiv.org/abs/2307.05385v1 ) License: see the linked page	Sully F. Chen, Zhicheng Guo, Cheng Ding, Xiao Hu, Cynthia Rudin	Photoplethysmography (PPG) provides a low-cost, non-invasive method to continuously monitor various cardiovascular parameters. PPG signals are generated by wearable devices and frequently contain large artifacts caused by external factors, such as motion of the human subject. In order to ensure robust and accurate extraction of physiological parameters, corrupted areas of the signal need to be identified and handled appropriately. Previous methodology relied either on handcrafted feature detectors or signal metrics which yield sub-optimal performance, or relied on machine learning techniques such as deep neural networks (DNN) which lack interpretability and are computationally and memory intensive. In this work, we present a novel method to learn a small set of interpretable convolutional kernels that has performance similar to, and often better than, the state-of-the-art DNN approach with several orders of magnitude fewer parameters. This work allows for efficient, robust, and interpretable signal quality assessment and artifact segmentation on low-power devices.	Translated: 2023-07-16 03:44:09 Published: 2023-07-06
# A Machine-Learned Ranking Algorithm for Dynamic and Personalised Car Pooling Services ( http://arxiv.org/abs/2307.05697v1 ) License: see the linked page	Mattia Giovanni Campana, Franca Delmastro, Raffaele Bruno	Car pooling is expected to significantly help in reducing traffic congestion and pollution in cities by enabling drivers to share their cars with travellers with similar itineraries and time schedules. A number of car pooling matching services have been designed in order to efficiently find successful ride matches in a given pool of drivers and potential passengers. However, it is now recognised that many non-monetary aspects and social considerations, besides simple mobility needs, may influence the individual willingness to share a ride, and these are difficult to predict. To address this problem, in this study we propose GoTogether, a recommender system for car pooling services that leverages learning-to-rank techniques to automatically derive the personalised ranking model of each user from the history of her choices (i.e., the type of accepted or rejected shared rides). Then, GoTogether builds the list of recommended rides in order to maximise the success rate of the offered matches. To test the performance of our scheme we use real data from Twitter and Foursquare sources in order to generate a dataset of plausible mobility patterns and ride requests in a metropolitan area. The results show that the proposed solution quickly obtains an accurate prediction of the personalised user's choice model both in static and dynamic conditions.	Translated: 2023-07-16 03:25:27 Published: 2023-07-06
# Spectral characterization of a SPDC source with a fast broadband spectrometer ( http://arxiv.org/abs/2307.06843v1 ) License: see the linked page	Brianna Farella, Gregory Medwig, Raphael A. Abrahao, and Andrei Nomerotski	Knowing the properties of the single photons produced in a Spontaneous Parametric Down-Conversion (SPDC) source can be crucial for specific applications and uses. In particular, the spectral properties are of key relevance. Here, we investigate a commercial SPDC source using our fast broadband spectrometer. Our analysis is a valid method for other SPDC sources, as well as other single-photon generation techniques, thus providing a good example of how to use this spectrometer design. We calibrate the spectrometer using known lines of the argon emission spectrum. We show that the two down-converted photons from the SPDC source have different spectral properties depending on the pump power, and we identify the condition under which we measured spectrally similar down-converted photons. Lastly, we were able to reconstruct and investigate the spectral information for the pump photon.	Translated: 2023-07-16 03:16:33 Published: 2023-07-06
# A Novel Site-Agnostic Multimodal Deep Learning Model to Identify Pro-Eating Disorder Content on Social Media ( http://arxiv.org/abs/2307.06775v1 ) License: see the linked page	Jonathan Feldman	Over the last decade, there has been a vast increase in eating disorder diagnoses and eating disorder-attributed deaths, reaching their zenith during the Covid-19 pandemic. This immense growth derived in part from the stressors of the pandemic but also from increased exposure to social media, which is rife with content that promotes eating disorders. Such content can induce eating disorders in viewers. This study aimed to create a multimodal deep learning model capable of determining whether a given social media post promotes eating disorders based on a combination of visual and textual data. A labeled dataset of Tweets was collected from Twitter, upon which twelve deep learning models were trained and tested. Based on model performance, the most effective deep learning model was the multimodal fusion of the RoBERTa natural language processing model and the MaxViT image classification model, attaining accuracy and F1 scores of 95.9% and 0.959 respectively. The RoBERTa and MaxViT fusion model, deployed to classify an unlabeled dataset of posts from the social media sites Tumblr and Reddit, generated classifications similar to those of previous research studies that did not employ artificial intelligence, showing that artificial intelligence can develop insights congruent with those of researchers. Additionally, the model was used to conduct a time-series analysis of yet unseen Tweets from eight Twitter hashtags, uncovering that the relative abundance of pro-eating disorder content has decreased drastically. However, since approximately 2018, pro-eating disorder content has either stopped its decline or risen once more.	Translated: 2023-07-16 03:15:37 Published: 2023-07-06
# A Fully Automated and Explainable Algorithm for the Prediction of Malignant Transformation in Oral Epithelial Dysplasia ( http://arxiv.org/abs/2307.03757v1 ) License: see the linked page	Adam J Shephard, Raja Muhammad Saad Bashir, Hanya Mahmood, Mostafa Jahanifar, Fayyaz Minhas, Shan E Ahmed Raza, Kris D McCombe, Stephanie G Craig, Jacqueline James, Jill Brooks, Paul Nankivell, Hisham Mehanna, Syed Ali Khurram, Nasir M Rajpoot	Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. Its grading suffers from significant inter-/intra-observer variability, and does not reliably predict malignancy progression, potentially leading to suboptimal treatment decisions. To address this, we developed a novel artificial intelligence algorithm that can assign an Oral Malignant Transformation (OMT) risk score, based on histological patterns in Haematoxylin and Eosin stained whole slide images, to quantify the risk of OED progression. The algorithm is based on the detection and segmentation of nuclei within (and around) the epithelium using an in-house segmentation model. We then employed a shallow neural network fed with interpretable morphological/spatial features, emulating histological markers. We conducted internal cross-validation on our development cohort (Sheffield; n = 193 cases) followed by independent validation on two external cohorts (Birmingham and Belfast; n = 92 cases). The proposed OMTscore yields an AUROC = 0.74 in predicting whether an OED progresses to malignancy or not. Survival analyses showed the prognostic value of our OMTscore for predicting malignancy transformation, when compared to the manually-assigned WHO and binary grades. Analysis of the correctly predicted cases elucidated the presence of peri-epithelial and epithelium-infiltrating lymphocytes in the most predictive patches of cases that transformed (p < 0.0001). This is the first study to propose a completely automated algorithm for predicting OED transformation based on interpretable nuclear features, whilst being validated on external datasets. The algorithm shows better-than-human-level performance for prediction of OED malignant transformation and offers a promising solution to the challenges of grading OED in routine clinical practice.	Translated: 2023-07-11 17:47:05 Published: 2023-07-06
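For readers unfamiliar with the AUROC figure quoted in this abstract, the following is a minimal sketch of how an AUROC can be computed from per-case risk scores. It uses the standard pairwise (Mann-Whitney) definition: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The function name and the example data are made up for illustration and have no connection to the study's cohorts or code.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for cases that progressed (positive), 0 otherwise.
    scores: risk scores, higher meaning more likely to progress.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up example: three of the four positive/negative pairs are ordered correctly.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The double loop is quadratic in the number of cases, which is fine for a sketch; a rank-based formulation would be the usual choice for large datasets.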
# FITS: Modeling Time Series with $10k$ Parameters ( http://arxiv.org/abs/2307.03756v1 ) License: see the linked page	Zhijian Xu, Ailing Zeng, Qiang Xu	In this paper, we introduce FITS, a lightweight yet powerful model for time series analysis. Unlike existing models that directly process raw time-domain data, FITS operates on the principle that time series can be manipulated through interpolation in the complex frequency domain. By discarding high-frequency components with negligible impact on time series data, FITS achieves performance comparable to state-of-the-art models for time series forecasting and anomaly detection tasks, while having a remarkably compact size of only approximately $10k$ parameters. Such a lightweight model can be easily trained and deployed in edge devices, creating opportunities for various applications. The anonymous code repo is available at: https://anonymous.4open.science/r/FITS	Translated: 2023-07-11 17:46:30 Published: 2023-07-06
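The two ideas named in this abstract, discarding high-frequency components and interpolating in the frequency domain, can be illustrated with a small sketch. This is not the FITS model: the abstract does not say how the interpolation is parameterized, so the code below assumes plain band-limited (zero-padding style) interpolation with no trainable parameters. The function names `dft` and `lowpass_interpolate` and the `cutoff` parameter are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def lowpass_interpolate(x, out_len, cutoff):
    """Keep frequency bins 0..cutoff and resample the series to out_len points."""
    n = len(x)
    if not 0 <= cutoff < n / 2:
        raise ValueError("cutoff must satisfy 0 <= cutoff < len(x) / 2")
    spectrum = dft(x)
    result = []
    for t in range(out_len):
        # Position of output sample t on the input time grid (may be fractional).
        pos = t * n / out_len
        # DC term plus the retained bins; each bin k is paired with its
        # conjugate-symmetric partner n - k, hence the factor 2 and .real.
        value = spectrum[0].real
        for k in range(1, cutoff + 1):
            value += 2 * (spectrum[k] * cmath.exp(2j * math.pi * k * pos / n)).real
        result.append(value / n)
    return result
```

For example, an 8-point series made of one slow and one fast cosine, resampled to 16 points with `cutoff=1`, comes back as the slow cosine alone on the finer grid, since the fast component lies above the cutoff and is dropped.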
# Read, Look or Listen? What's Needed for Solving a Multimodal Dataset ( http://arxiv.org/abs/2307.04532v1 ) License: see the linked page	Netta Madvil, Yonatan Bitton, Roy Schwartz	The prevalence of large-scale multimodal datasets presents unique challenges in assessing dataset quality. We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it. Our method sheds light on the importance of different modalities in datasets, as well as the relationship between them. We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality. Moreover, we find that more than 70% of the questions are solvable using several different single-modality strategies, e.g., by either looking at the video or listening to the audio, highlighting the limited integration of multiple modalities in TVQA. We leverage our annotation to analyze MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, but also with auditory speaker identification. Based on our observations, we introduce a new test set that necessitates multiple modalities, observing a dramatic drop in model performance. Our methodology provides valuable insights into multimodal datasets and highlights the need for the development of more robust models.	Translated: 2023-07-11 13:01:50 Published: 2023-07-06
# Can ChatGPT's Responses Boost Traditional Natural Language Processing? ( http://arxiv.org/abs/2307.04648v1 ) License: see the linked page	Mostafa M. Amin, Erik Cambria, Björn W. Schuller	The employment of foundation models is steadily expanding, especially with the launch of ChatGPT and the release of other foundation models. These models have shown the potential of emerging capabilities to solve problems, without being specifically trained to solve them. A previous work demonstrated these emerging capabilities in affective computing tasks; the performance quality was similar to traditional Natural Language Processing (NLP) techniques, but fell short of specialised trained models, like fine-tuning of the RoBERTa language model. In this work, we extend this by exploring if ChatGPT has novel knowledge that would enhance existing specialised models when they are fused together. We achieve this by investigating the utility of verbose responses from ChatGPT about solving a downstream task, in addition to studying the utility of fusing that with existing NLP methods. The study is conducted on three affective computing problems, namely sentiment analysis, suicide tendency detection, and big-five personality assessment. The results conclude that ChatGPT has indeed novel knowledge that can improve existing NLP techniques by way of fusion, be it early or late fusion.	Translated: 2023-07-11 12:31:36 Published: 2023-07-06
# CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering ( http://arxiv.org/abs/2307.04683v1 ) License: see the linked page	David Pride, Matteo Cancellieri and Petr Knoth	In this paper, we present CORE-GPT, a novel question-answering platform that combines GPT-based language models and more than 32 million full-text open access scientific articles from CORE. We first demonstrate that GPT3.5 and GPT4 cannot be relied upon to provide references or citations for generated text. We then introduce CORE-GPT which delivers evidence-based answers to questions, along with citations and links to the cited papers, greatly increasing the trustworthiness of the answers and reducing the risk of hallucinations. CORE-GPT's performance was evaluated on a dataset of 100 questions covering the top 20 scientific domains in CORE, resulting in 100 answers and links to 500 relevant articles. The quality of the provided answers and the relevance of the links were assessed by two annotators. Our results demonstrate that CORE-GPT can produce comprehensive and trustworthy answers across the majority of scientific domains, complete with links to genuine, relevant scientific articles.	Translated: 2023-07-11 12:22:56 Published: 2023-07-06
# トルコ語音声テキスト変換における事前学習モデルの性能比較:Whisper-SmallとWav2Vec2-XLS-R-300M Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M ( http://arxiv.org/abs/2307.04765v1 ) ライセンス: Link先を確認	Oyku Berfin Mercan, Sercan Cepni, Davut Emre Tasar, Sukru Ozan	(参考訳) 本研究では,事前学習された2つの音声からテキストへの多言語モデルであるWhisper-SmallとWav2Vec2-XLS-R-300Mの性能を,トルコ語について検討した。 トルコ語で準備されたオープンソースのデータセットであるMozilla Common Voiceバージョン11.0を使用した。多言語モデルであるWhisper-SmallとWav2Vec2-XLS-R-300Mは、少量のデータを含むこのデータセットで微調整された。 2つのモデルの音声からテキストへの変換性能を比較した。 WER値は、Wav2Vec2-XLS-R-300MとWhisper-Smallモデルでそれぞれ0.28と0.16と計算される。さらに、トレーニングおよび検証データセットに含まれていないコールセンター記録から作成したテストデータを用いて、モデルの性能について検討した。 In this study, the performances of the Whisper-Small and Wav2Vec2-XLS-R-300M models which are two pre-trained multilingual models for speech to text were examined for the Turkish language. Mozilla Common Voice version 11.0 which is prepared in Turkish language and is an open-source data set, was used in the study. The multilingual models, Whisper-Small and Wav2Vec2-XLS-R-300M were fine-tuned with this data set which contains a small amount of data. The speech to text performance of the two models was compared. WER values are calculated as 0.28 and 0.16 for the Wav2Vec2-XLS-R-300M and the Whisper-Small models respectively. In addition, the performances of the models were examined with the test data prepared with call center records that were not included in the training and validation dataset.	翻訳日:2023-07-11 12:05:13 公開日:2023-07-06
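The entry above reports WER (word error rate) values of 0.28 and 0.16. As a rough illustration only, and not the authors' evaluation code, WER is commonly computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` is a hypothetical choice for this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch; `word_error_rate` is a hypothetical name and
    whitespace tokenization is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a WER of 0.16 corresponds to roughly 16 word errors per 100 reference words; note that WER can exceed 1.0 when the hypothesis contains many insertions.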
# 単語意味論と音韻学がアルツハイマー病患者の手書きにどのように影響するか : 機械学習による分析 How word semantics and phonology affect handwriting of Alzheimer's patients: a machine learning based analysis ( http://arxiv.org/abs/2307.04762v1 ) ライセンス: Link先を確認	Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Sabato Marco Siniscalchi	(参考訳) 神経変性疾患の診断を支援するために手書き文字のキネマティックな特性を利用することは、真の課題である。機械学習アプローチと組み合わせた非侵襲的な検出技術は、この研究分野における大きな前進を約束する。文献において,提案されてきた課題は,筆跡運動を誘発する様々な認知的スキルに着目したものである。特に、コピーする単語の意味と音韻は、筆記の流暢さを損なう可能性がある。本稿では,アルツハイマー病の影響を受ける人の筆跡に単語意味論と音韻学がどのように影響するかを検討した。この目的のために,6つの手書き課題から得られたデータを用いた。各課題では,規則語(予測可能な音素-書記素対応を持つ,例えばcat),非規則語(非定型的な音素-書記素対応を持つ,例えばlaugh),非単語(音素-書記素変換規則に準拠した,意味を持たない発音可能な文字列)のいずれかに属する1つの単語をコピーする必要がある。我々は、広く使われている4つのよく知られた分類器と特徴選択を実装することで、機械学習アプローチを用いてデータを分析した。実験の結果,特徴選択により,単語タイプごとに異なる,識別力の高い特徴集合を導出できることがわかった。さらに、非規則語は、平均してより多くの特徴を必要としたが、優れた分類性能を達成した: 最良の結果は非規則語で得られ、90%近い精度に達した。 Using kinematic properties of handwriting to support the diagnosis of neurodegenerative disease is a real challenge: non-invasive detection techniques combined with machine learning approaches promise big steps forward in this research field. In the literature, the tasks proposed focused on different cognitive skills to elicit handwriting movements. In particular, the meaning and phonology of words to copy can compromise writing fluency. In this paper, we investigated how word semantics and phonology affect the handwriting of people affected by Alzheimer's disease. To this aim, we used the data from six handwriting tasks, each requiring copying a word belonging to one of the following categories: regular (have a predictable phoneme-grapheme correspondence, e.g., cat), non-regular (have atypical phoneme-grapheme correspondence, e.g., laugh), and non-word (non-meaningful pronounceable letter strings that conform to phoneme-grapheme conversion rules). We analyzed the data using a machine learning approach by implementing four well-known and widely-used classifiers and feature selection. 
The experimental results showed that the feature selection allowed us to derive a different set of highly distinctive features for each word type. Furthermore, non-regular words needed, on average, more features but achieved excellent classification performance: the best result was obtained on a non-regular word, reaching an accuracy close to 90%.	翻訳日:2023-07-11 12:04:56 公開日:2023-07-06
# Heteroscedastic noise modelによる患者特異的根本原因の同定 Identifying Patient-Specific Root Causes with the Heteroscedastic Noise Model ( http://arxiv.org/abs/2205.13085v2 ) ライセンス: Link先を確認	Eric V. Strobl, Thomas A. Lasko	(参考訳) 複雑な疾患は、同一の診断カテゴリー内でも患者によって異なる様々な要因によって引き起こされる。それでも、少数の根本原因が、それぞれの患者で疾患の発生を引き起こす可能性がある。そこで我々は,疾患の患者固有の根本原因の同定に焦点をあて,これを構造方程式モデルにおける外因性誤差項の標本特異的な予測性と同一視する。線形設定から,非線形関数 $m(X)$ と $\sigma(X)$ がそれぞれ条件付き平均と平均絶対偏差を表す不等分散(heteroscedastic)ノイズモデル $Y = m(X) + \varepsilon\sigma(X)$ へ一般化する。このモデルは識別可能性を保持するが、誤差項を正しく抽出するために一般化根本因果推論(GRCI)と呼ばれる専用アルゴリズムを必要とする非自明な課題をもたらす。 GRCIは、既存の代替手法よりも患者固有の根本原因を正確に回復する。 Complex diseases are caused by a multitude of factors that may differ between patients even within the same diagnostic category. A few underlying root causes may nevertheless initiate the development of disease within each patient. We therefore focus on identifying patient-specific root causes of disease, which we equate to the sample-specific predictivity of the exogenous error terms in a structural equation model. We generalize from the linear setting to the heteroscedastic noise model where $Y = m(X) + \varepsilon\sigma(X)$ with non-linear functions $m(X)$ and $\sigma(X)$ representing the conditional mean and mean absolute deviation, respectively. This model preserves identifiability but introduces non-trivial challenges that require a customized algorithm called Generalized Root Causal Inference (GRCI) to extract the error terms correctly. GRCI recovers patient-specific root causes more accurately than existing alternatives.	翻訳日:2023-07-10 16:14:27 公開日:2023-07-06
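To make the model $Y = m(X) + \varepsilon\sigma(X)$ in the entry above concrete, the following minimal sketch simulates data from it and inverts the equation to recover the error term. It is not GRCI: the functions `m` and `sigma` below are arbitrary hypothetical choices and are assumed to be known, whereas the paper's algorithm has to estimate them from data.

```python
import math
import random

def m(x: float) -> float:
    """Hypothetical conditional-mean function (an assumption of this sketch)."""
    return math.sin(x)

def sigma(x: float) -> float:
    """Hypothetical, strictly positive noise-scale function (also assumed)."""
    return 0.5 + x * x

def simulate(n: int, seed: int = 0):
    """Draw n triples (x, y, eps) with y = m(x) + eps * sigma(x)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        eps = rng.gauss(0.0, 1.0)
        rows.append((x, m(x) + eps * sigma(x), eps))
    return rows

def recover_error(x: float, y: float) -> float:
    """Invert the model: eps = (y - m(x)) / sigma(x)."""
    return (y - m(x)) / sigma(x)
```

The inversion shows why the noise scale matters: dividing the residual `y - m(x)` by `sigma(x)` is what separates the exogenous error from the input-dependent spread, a step that a purely additive-noise model would skip.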
# エンドツーエンドのマルチモーダルファクトチェックと説明生成: 挑戦的なデータセットとモデル End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models ( http://arxiv.org/abs/2205.12487v2 ) ライセンス: Link先を確認	Barry Menglong Yao (1), Aditya Shah (1), Lichao Sun (2), Jin-Hee Cho (1), Lifu Huang (1) ((1) Virginia Tech, (2) Lehigh University)	(参考訳) 本稿では, エンドツーエンドのマルチモーダルなファクトチェックと説明生成を提案する。入力はクレームと, 記事, 画像, ビデオ, ツイートを含む大量のWebソースの集合であり, 関連するエビデンスを検索して真理性ラベル(例えば, 支持, 反論, 情報不足)を予測することでクレームの真理性を評価し, さらに推論と裁定の過程を要約・説明する文を生成することを目的とする。この研究を支援するために,各クレームに真理性ラベルと裁定文を付記した15,601件のクレームと,エビデンスとして合計33,880段落のテキストと12,112枚の画像からなる大規模データセットであるMochegを構築した。Mocheg上でのベースライン性能を確立するため,マルチモーダルエビデンス検索,クレーム検証,説明生成という3つのパイプラインサブタスクで複数の最先端ニューラルアーキテクチャを実験し,最先端のエンドツーエンドのマルチモーダルファクトチェックの性能が満足できる結果にならないことを実証した。私たちの知る限りでは、ベンチマークデータセットとエンドツーエンドのマルチモーダルファクトチェックと説明生成のためのソリューションを最初に構築しました。データセット、ソースコード、モデルチェックポイントはhttps://github.com/VT-NLP/Mochegで入手できる。 We propose end-to-end multimodal fact-checking and explanation generation, where the input is a claim and a large collection of web sources, including articles, images, videos, and tweets, and the goal is to assess the truthfulness of the claim by retrieving relevant evidence and predicting a truthfulness label (e.g., support, refute or not enough information), and to generate a statement to summarize and explain the reasoning and ruling process. To support this research, we construct Mocheg, a large-scale dataset consisting of 15,601 claims where each claim is annotated with a truthfulness label and a ruling statement, and 33,880 textual paragraphs and 12,112 images in total as evidence. To establish baseline performances on Mocheg, we experiment with several state-of-the-art neural architectures on the three pipelined subtasks: multimodal evidence retrieval, claim verification, and explanation generation, and demonstrate that the performance of the state-of-the-art end-to-end multimodal fact-checking does not provide satisfactory outcomes. 
To the best of our knowledge, we are the first to build the benchmark dataset and solutions for end-to-end multimodal fact-checking and explanation generation. The dataset, source code and model checkpoints are available at https://github.com/VT-NLP/Mocheg.	翻訳日:2023-07-10 16:14:08 公開日:2023-07-06
# 幅優先パイプライン並列処理 Breadth-First Pipeline Parallelism ( http://arxiv.org/abs/2211.05953v2 ) ライセンス: Link先を確認	Joel Lamy-Poirier	(参考訳) パイプラインとデータ並列性の組み合わせを最適化する,新たなトレーニングスケジュールであるBreadth-First Pipeline Parallelismを導入する。 Breadth-First Pipeline Parallelismは、高いGPU利用率とGPU当たりの小さなバッチサイズを組み合わせ、完全なシャードデータ並列性を使用することで、トレーニング時間、コスト、メモリ使用量を低下させる。実験では、Megatron-LMと比較して、GPU当たりのバッチサイズが小さい520億パラメータのモデルでトレーニングスループットが最大43%向上し、大規模なGPUクラスタではトレーニング時間とコストが同じ割合で削減されることがわかった。 We introduce Breadth-First Pipeline Parallelism, a novel training schedule which optimizes the combination of pipeline and data parallelism. Breadth-First Pipeline Parallelism lowers training time, cost and memory usage by combining a high GPU utilization with a small batch size per GPU, and by making use of fully sharded data parallelism. Experimentally, we observed an increase of up to 43% in training throughput for a 52 billion-parameter model using a small batch size per GPU compared to Megatron-LM, which would reduce the training time and cost by the same amount on a large GPU cluster.	翻訳日:2023-07-10 16:05:54 公開日:2023-07-06
# 優先マトロイド中央値に対するbicriteria近似アルゴリズム Bicriteria Approximation Algorithms for Priority Matroid Median ( http://arxiv.org/abs/2210.01888v2 ) ライセンス: Link先を確認	Tanvi Bajpai and Chandra Chekuri	(参考訳) フェアネスの考慮は近年、新しいクラスタリング問題とアルゴリズムを動機付けている。本稿では,最近研究されている優先度 $k$-median 問題を一般化した優先度マトロイド中央値問題を考える。入力は、計量空間$(\mathcal{F} \cup \mathcal{C},d)$にある施設の集合$\mathcal{F}$とクライアントの集合$\mathcal{C}$、および施設上のマトロイド$\mathcal{M}=(\mathcal{F},\mathcal{I})$からなる。さらに、各クライアント$j$ は指定された半径 $r_j \ge 0$ を持ち、各施設 $i \in \mathcal{F}$ は開設コスト $f_i$ を持つ。目的は、次の2つの制約のもとで $\sum_{i \in S} f_i + \sum_{j \in \mathcal{C}} d(j,S)$ を最小化する施設の部分集合 $S \subseteq \mathcal{F}$ を選択することである。 (i) $S$は$\mathcal{M}$の独立集合である(つまり$S \in \mathcal{I}$)。 (ii) 各クライアント$j$に対して、開設された施設までの距離は最大$r_j$(つまり$d(j,S) \le r_j$)である。この問題に対して、固定定数 $c_1,c_2$ に対する最初のbicriteria $(c_1,c_2)$ 近似を記述する: クライアントの半径制約の違反は高々$c_1$倍であり、目的コストは最適なコストの最大$c_2$倍である。また、一様半径設定($r_j := L$ $\forall j \in \mathcal{C}$)に対する既知のbicriteria近似も改善する。 Fairness considerations have motivated new clustering problems and algorithms in recent years. In this paper we consider the Priority Matroid Median problem which generalizes the Priority $k$-Median problem that has recently been studied. The input consists of a set of facilities $\mathcal{F}$ and a set of clients $\mathcal{C}$ that lie in a metric space $(\mathcal{F} \cup \mathcal{C},d)$, and a matroid $\mathcal{M}=(\mathcal{F},\mathcal{I})$ over the facilities. In addition each client $j$ has a specified radius $r_j \ge 0$ and each facility $i \in \mathcal{F}$ has an opening cost $f_i$. The goal is to choose a subset $S \subseteq \mathcal{F}$ of facilities to minimize $\sum_{i \in S} f_i + \sum_{j \in \mathcal{C}} d(j,S)$ subject to two constraints: (i) $S$ is an independent set in $\mathcal{M}$ (that is $S \in \mathcal{I}$) and (ii) for each client $j$, its distance to an open facility is at most $r_j$ (that is, $d(j,S) \le r_j$). 
For this problem we describe the first bicriteria $(c_1,c_2)$ approximations for fixed constants $c_1,c_2$: the radius constraints of the clients are violated by at most a factor of $c_1$ and the objective cost is at most $c_2$ times the optimum cost. We also improve the previously known bicriteria approximation for the uniform radius setting ($r_j := L$ $\forall j \in \mathcal{C}$).	翻訳日:2023-07-10 16:05:02 公開日:2023-07-06
# キャプション生成のための視覚的セマンティック類似表現:学習した教訓 Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned ( http://arxiv.org/abs/2209.12817v2 ) ライセンス: Link先を確認	Ahmed Sabir	(参考訳) 本稿では,画像キャプチャ生成システムによって生成されるキャプションの強化に着目する。本稿では,モデルが生成する最も可能性の高い出力ではなく,最も関連性の高い出力を選択することでキャプション生成システムを改善する手法を提案する。我々のモデルは視覚的文脈の観点から言語生成出力ビーム探索を改訂する。画像中の関連情報と適切なキャプションを一致させるために,単語と文レベルの視覚的意味尺度を用いる。提案手法は後処理に基づく手法として任意の字幕システムに適用できる。 This paper focuses on enhancing the captions generated by image-caption generation systems. We propose an approach for improving caption generation systems by choosing the most closely related output to the image rather than the most likely output produced by the model. Our model revises the language generation output beam search from a visual context perspective. We employ a visual semantic measure in a word and sentence level manner to match the proper caption to the related information in the image. The proposed approach can be applied to any caption system as a post-processing based method.	翻訳日:2023-07-10 16:03:53 公開日:2023-07-06
# ボース・アインシュタイン凝縮体における集合励起を用いた量子レジスタ A quantum register using collective excitations in a Bose-Einstein condensate ( http://arxiv.org/abs/2211.09252v3 ) ライセンス: Link先を確認	Elisha Haber (1), Zekai Chen (1 and 2), Nicholas P. Bigelow (1) ((1) University of Rochester, (2) University of Innsbruck)	(参考訳) 原子の集合からなる量子ビットは、原子の損失に対する耐性から魅力的であり、そのような量子ビットを実現するための多くの提案は、リドベルク封鎖効果に基づいている。代わりに、空間的に重なり合うボース-アインシュタイン凝縮体からスピン依存光学格子をコヒーレントにロードする実験的に実現可能なプロトコルを考える。各格子サイトを量子ビットとして同定し, 空あるいは充填されたサイトを量子ビットとして, 高忠実度単一量子ビット演算, 任意の量子ビット間の2量子ゲート, 非破壊測定を行う方法について検討した。この設定では、原子損失の影響は緩和され、原子は基底状態多様体から取り除かれる必要はなく、量子ビットの別個の記憶と計算の基盤は不要であり、これらは全て他の多くの種類の原子量子ビットにおいて重要なデコヒーレンスの原因となる。 A qubit made up of an ensemble of atoms is attractive due to its resistance to atom losses, and many proposals to realize such a qubit are based on the Rydberg blockade effect. In this work, we instead consider an experimentally feasible protocol to coherently load a spin-dependent optical lattice from a spatially overlapping Bose--Einstein condensate. Identifying each lattice site as a qubit, with an empty or filled site as the qubit basis, we discuss how high-fidelity single-qubit operations, two-qubit gates between arbitrary pairs of qubits, and nondestructive measurements could be performed. In this setup, the effect of atom losses has been mitigated, the atoms never need to be removed from the ground state manifold, and separate storage and computational bases for the qubits are not required, all of which can be significant sources of decoherence in many other types of atomic qubits.	翻訳日:2023-07-10 15:53:57 公開日:2023-07-06
# Calibrated Interpretation:Semantic Parsingにおける信頼度推定 Calibrated Interpretation: Confidence Estimation in Semantic Parsing ( http://arxiv.org/abs/2211.07443v6 ) ライセンス: Link先を確認	Elias Stengel-Eskin and Benjamin Van Durme	(参考訳) シーケンス生成モデルは、自然言語をプログラムに変換するために、すなわち実行可能なセマンティック解析を実行するために、ますます使われている。セマンティック解析は、現実世界で実行されるアクションにつながるプログラムを予測することを目的としているという事実は、安全なシステムの開発を動機付けている。これにより、特に安全性の中心的な要素であるキャリブレーションの測定が重要になる。一般的な4つのセマンティックパーシングデータセットのキャリブレーションを調査し、モデルやデータセットによって異なることを発見した。次に、キャリブレーションエラーに関連する要因を分析し、2つの解析データセットの新しい信頼度に基づく課題分割をリリースする。セマンティック解析評価にキャリブレーションを組み込むことを容易にするため,キャリブレーションメトリクスを計算するためのライブラリをリリースする。 Sequence generation models are increasingly being used to translate natural language into programs, i.e. to perform executable semantic parsing. The fact that semantic parsing aims to predict programs that can lead to executed actions in the real world motivates developing safe systems. This in turn makes measuring calibration -- a central component to safety -- particularly important. We investigate the calibration of popular generation models across four popular semantic parsing datasets, finding that it varies across models and datasets. We then analyze factors associated with calibration error and release new confidence-based challenge splits of two parsing datasets. To facilitate the inclusion of calibration in semantic parsing evaluations, we release a library for computing calibration metrics.	翻訳日:2023-07-10 15:53:38 公開日:2023-07-06
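The entry above centers on measuring calibration. One commonly used calibration metric is expected calibration error (ECE); the sketch below is a generic illustration of it and is not the library released by the authors, whose API is not described here. The function name, the equal-width binning, and the default of 10 bins are assumptions of this sketch.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    `confidences` are predicted probabilities in [0, 1]; `correct` are
    booleans saying whether each prediction was right. Hypothetical helper.
    """
    if len(confidences) != len(correct):
        raise ValueError("inputs must have the same length")
    if not confidences:
        raise ValueError("inputs must be non-empty")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A perfectly calibrated parser would score 0: among predictions made with confidence 0.9, about 90% would be correct. Larger values indicate over- or under-confidence.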
# 量子論におけるリレーショナルシズムを超えて--量子論の新しい不確定性に基づく解釈 Beyond relationalism in quantum theory: A new indeterminacy-based interpretation of quantum theory ( http://arxiv.org/abs/2304.00608v5 ) ライセンス: Link先を確認	Francisco Pipa	(参考訳) 物理学の基礎と哲学における受け入れられた見解は、ある隠れ変数で量子論(QT)を補うことを拒絶し、ユニタリQTが正しく普遍的であるとみなすならば、QTのリレーショナルな解釈を採用するべきであるというものである。リレーショナルな解釈は、測定結果を、例えば世界、システム、エージェント、参照フレームに相対化する。それは多世界解釈、関係量子力学、QBismを含んでいる。これらの解釈は、そのリレーショナル性に結びついた潜在的なコストを持ち、それが魅力を損なっている。したがって、非リレーショナルで隠れ変数を用いない普遍的なQTの解釈が存在するなら、真剣に考えるべきである。環境決定性に基づく量子論(Environmental Determinacy-based or EnD Quantum Theory, EnDQT)と呼ばれる、リレーショナル主義と受け入れられた見解を超えた解釈を提示する。 EnDQTは、ユニタリで隠れ変数を用いない普遍的なQTを維持しながら、不確定値と確定値、および基礎となる量子特性についての非リレーショナルな説明を構築することによって、リレーショナル主義を回避する。 EnDQTによると、リレーショナル主義者が、拡張されたウィグナーの友人シナリオのように測定結果が相対化されると仮定する状況では、確定した結果は存在せず、非リレーショナルな不確定値を持つシステムが存在する。このアプローチでは、確定値は特定のシステムを通じてある時点で生じ、特定のネットワークで表現された特定の相互作用を通じてそれらのおかげで持続する。拡張ウィグナーの友人のシナリオにおける友人の研究室の内部のように、これらのネットワークに属する他のシステムから隔離されると、内部で不確定値が非リレーショナルに生じる。ベル相関の局所的因果説明や、これらのネットワークで表現された新しい経験的仮定など、EnDQTを採用する他の独立した正当な理由について論じる。 The received view in foundations and philosophy of physics holds that if we reject supplementing quantum theory (QT) with certain hidden variables and consider that unitary QT is correct and universal, we should adopt a relationalist interpretation of QT. Relationalist interpretations relativize measurement outcomes to, for example, worlds, systems, agents, or reference frames. They include the Many-Worlds Interpretation, Relational Quantum Mechanics, and QBism. These interpretations have potential costs connected with their relationalism that make them unattractive. Thus, if there exists a non-relational, non-hidden-variable universal interpretation of QT, it should be taken seriously. I will present an interpretation of this kind called Environmental Determinacy-based or EnD Quantum Theory (EnDQT), which goes beyond relationalism and the received view. 
EnDQT circumvents relationalism by constructing an account of indeterminate and determinate values and underlying quantum properties that is not relational while maintaining unitary non-hidden variable universal QT. In situations where a relationalist assumes that measurement outcomes are relativized, such as in the extended Wigner's friend scenarios, according to EnDQT there aren't determinate outcomes but systems with non-relational indeterminate values. In this approach, determinate values arose at some point in time through certain systems and persist due to them via certain interactions represented by certain networks. When there is isolation from the rest of the systems that belong to these networks, such as inside the friend's lab in the extended Wigner's friend scenarios, indeterminate values non-relationally arise inside. I will discuss other independent good reasons for adopting EnDQT, including providing a local causal explanation of Bell correlations and novel empirical posits represented by these networks.	翻訳日:2023-07-10 15:36:53 公開日:2023-07-06
# 情報検索のための埋め込みAPIの評価 Evaluating Embedding APIs for Information Retrieval ( http://arxiv.org/abs/2305.06300v2 ) ライセンス: Link先を確認	Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, Nandan Thakur, David Alfonso-Hermelo, Mehdi Rezagholizadeh, Jimmy Lin	(参考訳) 言語モデルのサイズが拡大するにつれ、コミュニティへの普及が加速し、多くの企業がAPIを通じて大きな言語モデルにアクセスできるようになる。密集検索に適した1つの特定のタイプは、入力テキストのベクトル表現を構築するセマンティック埋め込みサービスである。公開apiの数が増えているため,本論文の目標は,既存の提供物を現実的な検索シナリオで分析し,実践者や研究者がニーズに応じて適切なサービスを見つけるのを支援することである。具体的には、ドメインの一般化と多言語検索における既存のセマンティック埋め込みAPIの機能について検討する。本研究では,BEIR と MIRACL の2つの標準ベンチマークでこれらのサービスを評価した。このAPIを用いてBM25の結果を再評価することは予算に優しいアプローチであり、第1段階のレトリバーとして使用する標準的なプラクティスとは対照的に、英語でもっとも効果的である。非英語検索の場合、再ランク付けは結果を改善するが、bm25のハイブリッドモデルの方が高いコストで機能する。我々は,情報アクセスにおいて,検索において重要なセマンティック埋め込みAPIを評価するための基礎を築き上げたい。 The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs. One particular type, suitable for dense retrieval, is a semantic embedding service that builds vector representations of input text. With a growing number of publicly available APIs, our goal in this paper is to analyze existing offerings in realistic retrieval scenarios, to assist practitioners and researchers in finding suitable services according to their needs. Specifically, we investigate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval. For this purpose, we evaluate these services on two standard benchmarks, BEIR and MIRACL. We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English, in contrast to the standard practice of employing them as first-stage retrievers. For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost. 
We hope our work lays the groundwork for evaluating semantic embedding APIs that are critical in search and more broadly, for information access.	翻訳日:2023-07-10 15:25:37 公開日:2023-07-06
# 特徴空間密度マッチングによる医用画像分割のための教師なし領域適応 Unsupervised Domain Adaptation for Medical Image Segmentation via Feature-space Density Matching ( http://arxiv.org/abs/2305.05789v2 ) ライセンス: Link先を確認	Tushar Kataria, Beatrice Knudsen, and Shireen Elhabian	(参考訳) セマンティックセグメンテーション(セマンティックセグメンテーション、Semantic segmentation)は、画像の自動解釈と解析において重要なステップである。セマンティックセグメンテーションのためのディープラーニングアプローチは、アノテーション付き画像のパワーを利用して、これらのセマンティッククラスを示す特徴を学習する。それでも、トレーニング(すなわち、ソース)データとデプロイ時に遭遇するデータセット(すなわち、ターゲット)の間に重要なドメイン(すなわち、分散)シフトがある場合、ターゲットデータに対して手動アノテーションを必要とせず、許容可能なパフォーマンスを達成することがしばしばある。異なる画像モダリティは、プロトコールとベンダーの変動性により、サイト内およびサイト間において大きな変動をもたらすため、医療画像において特に重要である。現在の技術はハイパーパラメータチューニングとターゲットデータセットサイズに敏感である。本稿では,対象データのアノテートの必要性を緩和する意味セグメンテーションのための教師なしドメイン適応手法を提案する。カーネル密度推定を用いて,対象データ分布を特徴空間のソース,特に対象サンプル数(対象データセットサイズの3%)が限られている場合と一致させる。提案手法の有効性を2つのデータセット,多部位前立腺MRI,病理組織像に示す。 Semantic segmentation is a critical step in automated image interpretation and analysis where pixels are classified into one or more predefined semantically meaningful classes. Deep learning approaches for semantic segmentation rely on harnessing the power of annotated images to learn features indicative of these semantic classes. Nonetheless, they often fail to generalize when there is a significant domain (i.e., distributional) shift between the training (i.e., source) data and the dataset(s) encountered when deployed (i.e., target), necessitating manual annotations for the target data to achieve acceptable performance. This is especially important in medical imaging because different image modalities have significant intra- and inter-site variations due to protocol and vendor variability. Current techniques are sensitive to hyperparameter tuning and target dataset size. This paper presents an unsupervised domain adaptation approach for semantic segmentation that alleviates the need for annotating target data. 
Using kernel density estimation, we match the target data distribution to the source in the feature space, particularly when the number of target samples is limited (3% of the target dataset size). We demonstrate the efficacy of our proposed approach on 2 datasets, multisite prostate MRI and histopathology images.	翻訳日:2023-07-10 15:25:19 公開日:2023-07-06
# ESPnet-ST-v2:多目的音声翻訳ツールキット ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit ( http://arxiv.org/abs/2304.04596v3 ) ライセンス: Link先を確認	Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe	(参考訳) ESPnet-ST-v2は、音声翻訳コミュニティの関心の広がりを受けて、オープンソースのESPnet-STツールキットを刷新したものである。 ESPnet-ST-v2は 1)オフライン音声テキスト翻訳(ST)、 2)同時音声テキスト翻訳(SST)、及び 3)オフライン音声音声翻訳(S2ST)をサポートする。各タスクは幅広いアプローチでサポートされており、この点でESPnet-ST-v2は他のオープンソースの音声翻訳ツールキットと区別される。このツールキットはトランスデューサ、ハイブリッドCTC/アテンション、検索可能な中間表現を持つマルチデコーダ、時間同期ブロックワイズCTC/アテンション、Translatotronモデル、直接離散単位モデルなどの最先端アーキテクチャを提供する。本稿では,https://github.com/espnet/espnetで公開されているESPnet-ST-v2の背後にある全体的な設計,各タスクのモデル例,パフォーマンスベンチマークについて述べる。 ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech translation (S2ST) -- each task is supported with a wide variety of approaches, differentiating ESPnet-ST-v2 from other open source spoken language translation toolkits. This toolkit offers state-of-the-art architectures such as transducers, hybrid CTC/attention, multi-decoders with searchable intermediates, time-synchronous blockwise CTC/attention, Translatotron models, and direct discrete unit models. In this paper, we describe the overall design, example models for each task, and performance benchmarking behind ESPnet-ST-v2, which is publicly available at https://github.com/espnet/espnet.	翻訳日:2023-07-10 15:23:28 公開日:2023-07-06
# Einstein-Podolsky-Rosen ステアリングの一方向フィルタ Filtering one-way Einstein-Podolsky-Rosen steering ( http://arxiv.org/abs/2304.04210v2 ) ライセンス: Link先を確認	Ze-Yan Hao, Yan Wang, Jia-Kun Li, Yu Xiang, Qiong-Yi He, Zheng-Hao Liu, Mu Yang, Kai Sun, Jin-Shi Xu, Chuan-Feng Li, and Guang-Can Guo	(参考訳) EPR(Einstein-Podolsky-Rosen)ステアリング(EPR)は、量子非局所性の基本概念であり、ある観測者が別の観測者の状態に局所的な測定でリモートで影響する能力を記述する。対称量子相関と関連する量子絡み合いやベル非局所性とは異なり、EPRステアリングは量子非局所性のユニークな非対称性を表す。システム成分が廃棄される局所フィルタ演算により、量子非局所性を蒸留して非局所相関を強化することができ、隠れた非局所性も活性化することができる。しかしながら、フィルタ演算における非対称な量子非局所性は、特に量子非局所相関が確率で存在する可能性のある破棄された部分を考えると、十分に取り調べられた研究を欠いている。ここでは,EPRステアリングに対する局所フィルタの効果について,理論と実験の両方において検討する。 EPRステアリングのすべての構成を同時に観察し、一方方向のEPRステアリングの方向を反転させるなど、非対称な量子非局所性の興味深い進化を観察する。この研究は、非対称量子非局所性を理解するための補完的な視点を提供し、非対称量子システムを量子情報タスクに有意な応用で操作するための実用的なツールボックスを示す。 Einstein-Podolsky-Rosen (EPR) steering, a fundamental concept of quantum nonlocality, describes one observer's capability to remotely affect another distant observer's state by local measurements. Unlike quantum entanglement and Bell nonlocality, both associated with the symmetric quantum correlation, EPR steering depicts the unique asymmetric property of quantum nonlocality. With the local filter operation in which some system components are discarded, quantum nonlocality can be distilled to enhance the nonlocal correlation, and even the hidden nonlocality can be activated. However, asymmetric quantum nonlocality in the filter operation still lacks a well-rounded investigation, especially considering the discarded parts where quantum nonlocal correlations may still exist with probabilities. Here, in both theory and experiment, we investigate the effect of the local filter on EPR steering. We observe all configurations of EPR steering simultaneously and other intriguing evolution of asymmetric quantum nonlocality, such as reversing the direction of one-way EPR steering. 
This work provides a complementary perspective to understand the asymmetric quantum nonlocality and demonstrates a practical toolbox for manipulating asymmetric quantum systems with significant potential applications in quantum information tasks.	翻訳日:2023-07-10 15:23:10 公開日:2023-07-06
# 大規模言語モデル時代におけるオープンドメイン質問応答の評価 Evaluating Open-Domain Question Answering in the Era of Large Language Models ( http://arxiv.org/abs/2305.06984v3 ) ライセンス: Link先を確認	Ehsan Kamalloo, Nouha Dziri, Charles L. A. Clarke, Davood Rafiei	(参考訳) 語彙マッチングは、オープンドメイン質問応答(QA)のデファクト評価方法として残っている。残念なことに、語彙マッチングは、正解リストにもっともらしい候補解が現れない場合に完全に失敗し、抽出モデルから生成モデルへ移行するにつれて、ますますその傾向が増す。近年の大規模言語モデル (LLMs) の成功により、候補解が長くなると語彙マッチングの失敗が増加し、正解とのマッチングはさらに困難になる。正確な評価がなければ、オープンドメインQAの真の進歩は分からないままである。本稿では,一般的なベンチマークであるNQ-openのサブセットで回答を手動評価することにより,LLMを含む様々なオープンドメインQAモデルの徹底的な分析を行う。私たちの評価では、すべてのモデルの真のパフォーマンスは著しく過小評価されているものの、InstructGPT (zero-shot) LLMのパフォーマンスは60%近く向上し、既存のトップモデルと同等になり、InstructGPT (few-shot) モデルはNQ-openの新たな最先端を実際に達成しています。また、語彙マッチング失敗の50%以上が意味論的に等価な答えによるものであることが判明した。さらに、正規表現マッチングは、不必要な厳密さに悩まされるものの、人間の判断と整合する形でQAモデルを順位付けることを示す。最後に, 自動評価モデルは, 状況によっては語彙マッチングの合理的な代替となるが, LLMが生成する長文の回答に対してはそうではないことを示す。自動モデルはLLM回答の幻覚を検出するのに苦労し、LLMを評価することができない。現段階では、人間の評価に代わるものはないようである。 Lexical matching remains the de facto evaluation method for open-domain question answering (QA). Unfortunately, lexical matching fails completely when a plausible candidate answer does not appear in the list of gold answers, which is increasingly the case as we shift from extractive to generative models. The recent success of large language models (LLMs) for QA aggravates lexical matching failures since candidate answers become longer, thereby making matching with the gold answers even more challenging. Without accurate evaluation, the true progress in open-domain QA remains unknown. In this paper, we conduct a thorough analysis of various open-domain QA models, including LLMs, by manually evaluating their answers on a subset of NQ-open, a popular benchmark. 
Our assessments reveal that while the true performance of all models is significantly underestimated, the performance of the InstructGPT (zero-shot) LLM increases by nearly +60%, making it on par with existing top models, and the InstructGPT (few-shot) model actually achieves a new state-of-the-art on NQ-open. We also find that more than 50% of lexical matching failures are attributed to semantically equivalent answers. We further demonstrate that regex matching ranks QA models consistent with human judgments, although still suffering from unnecessary strictness. Finally, we demonstrate that automated evaluation models are a reasonable surrogate for lexical matching in some circumstances, but not for long-form answers generated by LLMs. The automated models struggle in detecting hallucinations in LLM answers and are thus unable to evaluate LLMs. At this time, there appears to be no substitute for human evaluation.	翻訳日:2023-07-10 15:14:09 公開日:2023-07-06
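The failure mode described in the entry above can be illustrated with a minimal exact-match scorer. This is a generic sketch, not the evaluation code used in the paper; the normalization steps (lowercasing, stripping punctuation and English articles) follow a widespread convention, and the function names and sample answers are hypothetical.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(candidate: str, gold_answers) -> bool:
    """True if the normalized candidate equals any normalized gold answer."""
    normalized = normalize(candidate)
    return any(normalized == normalize(gold) for gold in gold_answers)
```

With this scorer, `exact_match("The Eiffel Tower.", ["Eiffel Tower"])` succeeds, while a longer but equally correct answer such as `"It is located in Paris, France"` fails against the gold list `["Paris"]`. Longer generated answers are therefore penalized even when a human judge would accept them.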
# 自分の目を信じるな - 特徴可視化の(非)信頼性について Don't trust your eyes: on the (un)reliability of feature visualizations ( http://arxiv.org/abs/2306.04719v4 ) ライセンス: Link先を確認	Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim	(参考訳) ニューラルネットワークはどのようにピクセルからパターンを抽出するか? 特徴可視化は、最適化によって強く活性化するパターンを視覚化することで、この重要な質問に答えようとしている。今日、可視化手法は、機械的な解釈可能性の一種として、ニューラルネットワークの内部動作に関する我々の知識の基礎を形成している。特徴可視化はどの程度信頼できるのか? 我々は,自然入力上での通常のネットワーク動作から完全に切り離された任意のパターンを示すように特徴可視化を騙すネットワーク回路を開発することから調査を始める。次に,操作されていない標準的なネットワークでも同様の現象が起きている証拠を示す。特徴可視化は標準入力とは全く異なる処理を受けており、ニューラルネットワークが自然画像をどのように処理するかを「説明」する能力に疑問を投げかけている。特徴可視化によって確実に理解できる関数の集合は極めて小さく、一般的なブラックボックスニューラルネットワークを含まないことを証明する理論によって、この経験的発見を裏付ける。そのため、より信頼性の高い特徴可視化を実現するために、特定の構造を強制するネットワークの開発が有望な方向となりうる。 How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. We underpin this empirical finding by theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.	翻訳日:2023-07-10 15:05:53 公開日:2023-07-06
# 励起状態量子相転移を利用した精密磁気計測 Precision magnetometry exploiting excited state quantum phase transitions ( http://arxiv.org/abs/2306.01126v2 ) ライセンス: Link先を確認	Qian Wang, Ugo Marzolino	(参考訳) 相転移における臨界挙動は精密計測の資源である。理由は、フィッシャー情報として知られる関数が臨界点において超示量的(superextensive)であり、同時に計測プロトコルの性能を定量化するからである。したがって、相転移点で計測プローブを準備することで、転移の制御パラメータの測定精度が向上する。我々は、異なる磁場で励起状態量子相転移を示すリプキン-メシュコフ-グリックモデルに焦点を当てる。モデルのスペクトル特性に基づき、フィッシャー情報の広いピークを示し、高精度磁力計測の効率的なスキームを提案する。リプキン-メシュコフ-グリックモデルは、超伝導と核系のために初めて導入され、最近いくつかの凝縮系プラットフォームで実現された。上記の計測スキームは、リプキン-メシュコフ-グリックモデルをシミュレートできるシステムの微視的性質を測定するためにも利用できる。 Critical behaviour in phase transitions is a resource for enhanced precision metrology. The reason is that the function, known as Fisher information, is superextensive at critical points, and, at the same time, quantifies performances of metrological protocols. Therefore, preparing metrological probes at phase transitions provides enhanced precision in measuring the transition control parameter. We focus on the Lipkin-Meshkov-Glick model that exhibits excited state quantum phase transitions at different magnetic fields. Resting on the model spectral properties, we show broad peaks of the Fisher information, and propose efficient schemes for precision magnetometry. The Lipkin-Meshkov-Glick model was first introduced for superconductivity and for nuclear systems, and recently realised in several condensed matter platforms. The above metrological schemes can be also exploited to measure microscopic properties of systems able to simulate the Lipkin-Meshkov-Glick model.	翻訳日:2023-07-10 15:04:16 公開日:2023-07-06
# KoLA: 大規模言語モデルのワールドナレッジを慎重にベンチマークする KoLA: Carefully Benchmarking World Knowledge of Large Language Models ( http://arxiv.org/abs/2306.09296v2 ) ライセンス: Link先を確認	Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-Li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan Yao, Ning Ding, Lei Hou, Zhiyuan Liu, Bin Xu, Jie Tang, Juanzi Li	(参考訳) 大規模言語モデル(LLM)の先例のない性能は、評価の改善を必要とする。単にLLM能力の広さを探求するだけでなく、綿密で思慮深い設計が、徹底的で偏見がなく、適用可能な評価に不可欠であると信じている。 LLMに対する世界的知識の重要性を考慮し、知識指向LLMアセスメントベンチマーク(KoLA)を構築し、(1)能力モデリングでは、人間の認知を模倣して知識関連能力の4段階の分類を作成し、19ドルのタスクをカバーしている。 2)データを公平に比較するためには,LLMが事前学習したコーパスであるウィキペディアと,未確認データを扱う能力と知識の進化を評価することを目的とした,新たなコーパスを併用する。 (3) 評価基準には,タスクやモデル間の数値コンパビリティ向上のための総合的な基準スコアと,知識幻覚の自動評価のための独自の自己コントラスト尺度が採用されている。オープンソースおよび商用LLMを21ドルで評価し,興味深い結果を得た。 KoLAデータセットとオープン参加型リーダボードはhttps://kola.xlore.cnで公開されており、LLMとナレッジ関連のシステムを開発するためのリファレンスを提供するために継続的に更新される。 The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering $19$ tasks. (2) For data, to ensure fair comparisons, we use both Wikipedia, a corpus prevalently pre-trained by LLMs, along with continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. 
(3) For evaluation criteria, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models and a unique self-contrast metric for automatically evaluating knowledge hallucination. We evaluate $21$ open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset and open-participation leaderboard are publicly released at https://kola.xlore.cn and will be continuously updated to provide references for developing LLMs and knowledge-related systems.	翻訳日:2023-07-10 14:55:09 公開日:2023-07-06
# 深部変動クラスタリングを用いたエキスパート非依存超音波画像品質評価 Expert-Agnostic Ultrasound Image Quality Assessment using Deep Variational Clustering ( http://arxiv.org/abs/2307.02462v2 ) ライセンス: Link先を確認	Deepak Raina, Dimitrios Ntentia, SH Chandrashekhara, Richard Voyles, Subir Kumar Saha	(参考訳) 超音波イメージングは、いくつかの診断および治療の手順で一般的に用いられるモダリティである。しかし超音波による診断は、超音波撮影者が手動で評価した画像の品質に大きく依存しており、診断の客観性を低下させ、操作者に依存している。自動品質評価のための教師付き学習ベースの手法は、手動で注釈付きデータセットを必要とする。これらの超音波画像は品質が低く、オブザーバ間の知覚変化によるノイズの多いアノテーションに苦しむため、学習効率が損なわれる。我々は,手動アノテーションの負担と不確実性を解消するUnSupervised UltraSound Image Quality Assessment Network (US2QNet)を提案する。 US2QNetは、前処理、クラスタリング、後処理の3つのモジュールに埋め込まれた変分オートエンコーダを使用して、超音波画像の品質特徴表現を共同で強化、抽出、クラスタリング、可視化する。プリプロセッシングモジュールはイメージのフィルタリングを使用して、ノイズに注意をそらすのではなく、ネットワークの注意を優れた品質機能に向ける。 2次元空間における特徴表現のクラスタを可視化するための後処理を提案する。提案する膀胱超音波画像の品質評価の枠組みを検証した。提案手法は,最先端クラスタリング手法よりも精度が78%,性能が優れている。 Ultrasound imaging is a commonly used modality for several diagnostic and therapeutic procedures. However, the diagnosis by ultrasound relies heavily on the quality of images assessed manually by sonographers, which diminishes the objectivity of the diagnosis and makes it operator-dependent. The supervised learning-based methods for automated quality assessment require manually annotated datasets, which are highly labour-intensive to acquire. These ultrasound images are low in quality and suffer from noisy annotations caused by inter-observer perceptual variations, which hampers learning efficiency. We propose an UnSupervised UltraSound image Quality assessment Network, US2QNet, that eliminates the burden and uncertainty of manual annotations. US2QNet uses the variational autoencoder embedded with the three modules, pre-processing, clustering and post-processing, to jointly enhance, extract, cluster and visualize the quality feature representation of ultrasound images. The pre-processing module uses filtering of images to point the network's attention towards salient quality features, rather than getting distracted by noise. 
Post-processing is proposed for visualizing the clusters of feature representations in 2D space. We validated the proposed framework for quality assessment of the urinary bladder ultrasound images. The proposed framework achieved 78% accuracy and superior performance to state-of-the-art clustering methods.	翻訳日:2023-07-10 14:36:26 公開日:2023-07-06
# 都市領域埋め込みのための領域ワイズ注意型多視点表現学習 Region-Wise Attentive Multi-View Representation Learning for Urban Region Embeddings ( http://arxiv.org/abs/2307.03212v1 ) ライセンス: Link先を確認	Weiliang Chan and Qianqian Ren	(参考訳) 都市領域の埋め込みは、都市データの複雑さと絶えず変化する性質のため、重要かつ非常に困難な課題である。この課題に対処するため、我々は、厳格な近傍領域条件の制約を受けずに多視点の依存関係を捉え、都市領域の表現力の高い表現を学習する領域ワイズ多視点表現学習(ROMER)を提案する。本モデルは、多元的な都市データから都市領域の表現を学習することに焦点を当てる。まず、モビリティフローパターン、POIセマンティクス、チェックインダイナミクスから多視点の相関を捉える。次に、グラフ内の任意の2頂点間の類似性を学習するために、グローバルグラフアテンションネットワークを採用する。複数ビューの特徴を包括的に考慮し共有するために、外部アテンションで重みを学習して多視点埋め込みを融合する2段階の融合モジュールをさらに提案する。実世界のデータセット上の2つの下流タスクに対する広範な実験により、我々のモデルは最先端の手法を最大17%上回ることを示した。 Urban region embedding is an important and yet highly challenging issue due to the complexity and constantly changing nature of urban data. To address the challenges, we propose a Region-Wise Multi-View Representation Learning (ROMER) to capture multi-view dependencies and learn expressive representations of urban regions without the constraints of rigid neighbourhood region conditions. Our model focuses on learning urban region representations from multi-source urban data. First, we capture the multi-view correlations from mobility flow patterns, POI semantics and check-in dynamics. Then, we adopt global graph attention networks to learn similarity of any two vertices in graphs. To comprehensively consider and share features of multiple views, a two-stage fusion module is further proposed to learn weights with external attention to fuse multi-view embeddings. Extensive experiments for two downstream tasks on real-world datasets demonstrate that our model outperforms state-of-the-art methods by up to 17% improvement.	翻訳日:2023-07-10 14:29:03 公開日:2023-07-06
# PseudoCell:深層学習によるセントロブラスト細胞検出のためのPseudo Labelingとしてのハードネガティブマイニング PseudoCell: Hard Negative Mining as Pseudo Labeling for Deep Learning-Based Centroblast Cell Detection ( http://arxiv.org/abs/2307.03211v1 ) ライセンス: Link先を確認	Narongrid Seesawad, Piyalitt Ittichaiwong, Thapanun Sudhawiyangkul, Phattarapong Sawangjai, Peti Thuwajit, Paisarn Boonsakan, Supasan Sripodok, Kanyakorn Veerakanjana, Phoomraphee Luenam, Komgrid Charngkaew, Ananya Pongpaibul, Napat Angkathunyakul, Narit Hnoohom, Sumeth Yuenyong, Chanitra Thuwajit, Theerawit Wilaiprasitporn	(参考訳) 深層学習に基づくパッチ分類モデルはH&E染色組織試料の全スライディング画像(WSI)に利用され, 悪性リンパ腫の診断に有用である。しかし、これらのアプローチはいまだに病理学者が中心芽細胞を手動で同定し、最適な性能のラベルを提供する必要がある。これに対処するために、wsi(ソースコードはhttps://github.com/iobt-vistec/pseudocell.gitで利用可能)でcentroblast検出を自動化するオブジェクト検出フレームワークであるpseudocellを提案する。この枠組みは、病理学者のセントロブラストラベルを組み込んで、細胞の形態学的特徴を用いた偽陽性予測から得られた偽陰性のラベルと組み合わせている。 PseudoCellを用いることで、病理学者の作業量を削減し、組織を調べる際に注意を要する領域を正確に絞り込むことができる。信頼性のしきい値に応じて、pseudocellはwsiの非中心芽球組織領域の58.18-99.35%を除去できる。本研究は, 病理学者が改良のために洗練されたラベルを必要としない, 実用的な遠心細胞前スクリーニング法を提案する。 PseudoCellの実践に関する詳細なガイダンスが議論のセクションで提供されている。 Patch classification models based on deep learning have been utilized in whole-slide images (WSI) of H&E-stained tissue samples to assist pathologists in grading follicular lymphoma patients. However, these approaches still require pathologists to manually identify centroblast cells and provide refined labels for optimal performance. To address this, we propose PseudoCell, an object detection framework to automate centroblast detection in WSI (source code is available at https://github.com/IoBT-VISTEC/PseudoCell.git). This framework incorporates centroblast labels from pathologists and combines them with pseudo-negative labels obtained from undersampled false-positive predictions using the cell's morphological features. By employing PseudoCell, pathologists' workload can be reduced as it accurately narrows down the areas requiring their attention during examining tissue. 
Depending on the confidence threshold, PseudoCell can eliminate 58.18-99.35% of non-centroblasts tissue areas on WSI. This study presents a practical centroblast prescreening method that does not require pathologists' refined labels for improvement. Detailed guidance on the practical implementation of PseudoCell is provided in the discussion section.	翻訳日:2023-07-10 14:28:44 公開日:2023-07-06
# スパースなグラフィカル線形力学系 Sparse Graphical Linear Dynamical Systems ( http://arxiv.org/abs/2307.03210v1 ) ライセンス: Link先を確認	Emilie Chouzenoux and Victor Elvira	(参考訳) 時系列データセットは、生物医学、地球観測、ネットワーク分析など、科学と工学の多くの分野の中心である。状態空間モデル(SSM)は、時系列上で確率的かつ解釈可能な学習を可能にする強力な数学的ツールである。モデルパラメータをSSMで推定することは、おそらく最も複雑なタスクの1つであり、事前知識の含みは、解釈の容易さだけでなく、推論タスクを複雑にすることが知られている。非常に最近の研究は、これらのモデルパラメータのいくつかにグラフィカルな視点を組み込もうと試みているが、これらは、この作業が対処する顕著な制限を示している。より一般的に、既存のグラフィカルモデリングツールは静的情報、独立した確率変数間の統計的依存関係(例えば、グラフィカルラッソアプローチ)、または時系列サンプル間の因果関係を強調する動的情報(例えば、グラフィカルグランガーアプローチ)のいずれかを組み込むように設計されている。しかし、SSMのコンテキスト内で静的および動的グラフィカルモデリングを組み合わせた共同アプローチは存在しない。本研究では,静的グラフィカルラッソモデルと線形ガウスSSMに対する因果的グラフィカルアプローチを橋渡しする共同グラフィカルモデリングフレームワークを導入することにより,このギャップを埋めるための新しいアプローチを提案する。本稿では,このフレームワークにおける新しい推論手法であるdglasso(dynamic graphical lasso)を提案する。アルゴリズムの収束は、非線形解析から現代のツールから離れることによって確立される。合成および実気象変動データの実験的検証は,提案したモデルと推論アルゴリズムの有効性を示す。 Time-series datasets are central in numerous fields of science and engineering, such as biomedicine, Earth observation, and network analysis. Extensive research exists on state-space models (SSMs), which are powerful mathematical tools that allow for probabilistic and interpretable learning on time series. Estimating the model parameters in SSMs is arguably one of the most complicated tasks, and the inclusion of prior knowledge is known to both ease the interpretation but also to complicate the inferential tasks. Very recent works have attempted to incorporate a graphical perspective on some of those model parameters, but they present notable limitations that this work addresses. More generally, existing graphical modeling tools are designed to incorporate either static information, focusing on statistical dependencies among independent random variables (e.g., graphical Lasso approach), or dynamic information, emphasizing causal relationships among time series samples (e.g., graphical Granger approaches). 
However, there are no joint approaches combining static and dynamic graphical modeling within the context of SSMs. This work proposes a novel approach to fill this gap by introducing a joint graphical modeling framework that bridges the static graphical Lasso model and a causal-based graphical approach for the linear-Gaussian SSM. We present DGLASSO (Dynamic Graphical Lasso), a new inference method within this framework that implements an efficient block alternating majorization-minimization algorithm. The algorithm's convergence is established by departing from modern tools from nonlinear analysis. Experimental validation on synthetic and real weather variability data showcases the effectiveness of the proposed model and inference algorithm.	翻訳日:2023-07-10 14:28:06 公開日:2023-07-06
# DENCLUEの最適帯域幅選択 Optimal Bandwidth Selection for DENCLUE ( http://arxiv.org/abs/2307.03206v1 ) ライセンス: Link先を確認	Hao Wang	(参考訳) 現代の産業界では、クラスタリングアルゴリズムはアルゴリズムエンジニアの日常業務である。クラスタリングアルゴリズムは2010年以前に急速に発展したが、ディープラーニングが機械学習アプリケーションの事実上の業界標準となって以降、この研究トピックに関するイノベーションは停滞している。2007年、非線形データ構造に対するクラスタリング問題を解決するために、密度に基づくクラスタリングアルゴリズムDENCLUEが発明された。しかし、そのパラメータ選択問題は2011年までほとんど無視されていた。本稿では、DENCLUEアルゴリズムの最適パラメータを計算する新しい手法を提案し、その性能を実験の節で議論する。 In modern day industry, clustering algorithms are daily routines of algorithm engineers. Although clustering algorithms experienced rapid growth before 2010, innovation related to the research topic has stagnated after deep learning became the de facto industrial standard for machine learning applications. In 2007, a density-based clustering algorithm named DENCLUE was invented to solve the clustering problem for nonlinear data structures. However, its parameter selection problem was largely neglected until 2011. In this paper, we propose a new approach to compute the optimal parameters for the DENCLUE algorithm, and discuss its performance in the experiment section.	翻訳日:2023-07-10 14:27:05 公開日:2023-07-06
# セルフリー大量MIMOのためのハイブリッド知識駆動チャネルセマンティック獲得とビームフォーミング Hybrid Knowledge-Data Driven Channel Semantic Acquisition and Beamforming for Cell-Free Massive MIMO ( http://arxiv.org/abs/2307.03070v1 ) ライセンス: Link先を確認	Zhen Gao, Shicong Liu, Yu Su, Zhongxiang Li, Dezhi Zheng	(参考訳) 本稿では,ユビキタスな拡張現実(XR)アプリケーションをサポートし,現在の屋内無線通信能力とのギャップを埋めるため,屋外無線システムの進歩に焦点をあてる。セルレス大規模マルチインプットマルチアウトプット(MIMO)システムにおけるチャネル意味獲得とマルチユーザビームフォーミングのためのハイブリッド知識データ駆動方式を提案する。具体的には、まず、パイロット信号、チャネルセマンティック埋め込みのためのCSI量子化器、チャネルセマンティック抽出のためのCSI再構成をエンドツーエンドで共同で最適化する、チャネルセマンティック取得のためのデータ駆動多重層パーセプトロン(MLP)ベースの自動エンコーダを提案する。さらに、取得したチャネルセマンティクスに基づいて、屋外XRシナリオにおけるCSIの完全性に優れたスペクトル効率を実現することができる知識駆動型深層展開型マルチユーザビームフォーマを提案する。従来の逐次オーバーリラクシエーション(sor)に基づく線形ビームフォーミングスキームをディープラーニングで展開することにより,最適なパラメータを適応的に学習し,収束を加速し,不完全csiに対するロバスト性を向上させることができる。提案手法は,完全ディジタルアレーを用いたアクセスポイント (aps) と,アナログ-デジタルアレーのハイブリッド構造を持つapsに対して使用可能である。シミュレーションの結果,提案手法がチャネル獲得精度の向上に有効であり,csi取得とビームフォーマ設計の複雑さを低減できることを示した。提案手法は,ダウンリンク伝送を3回繰り返しただけで,収束スペクトル効率の約96%を達成し,その効果とアウトドアxr応用の可能性を示した。 This paper focuses on advancing outdoor wireless systems to better support ubiquitous extended reality (XR) applications, and close the gap with current indoor wireless transmission capabilities. We propose a hybrid knowledge-data driven method for channel semantic acquisition and multi-user beamforming in cell-free massive multiple-input multiple-output (MIMO) systems. Specifically, we firstly propose a data-driven multiple layer perceptron (MLP)-Mixer-based auto-encoder for channel semantic acquisition, where the pilot signals, CSI quantizer for channel semantic embedding, and CSI reconstruction for channel semantic extraction are jointly optimized in an end-to-end manner. Moreover, based on the acquired channel semantic, we further propose a knowledge-driven deep-unfolding multi-user beamformer, which is capable of achieving good spectral efficiency with robustness to imperfect CSI in outdoor XR scenarios. 
By unfolding conventional successive over-relaxation (SOR)-based linear beamforming scheme with deep learning, the proposed beamforming scheme is capable of adaptively learning the optimal parameters to accelerate convergence and improve the robustness to imperfect CSI. The proposed deep unfolding beamforming scheme can be used for access points (APs) with fully-digital array and APs with hybrid analog-digital array structure. Simulation results demonstrate the effectiveness of our proposed scheme in improving the accuracy of channel acquisition, as well as reducing complexity in both CSI acquisition and beamformer design. The proposed beamforming method achieves approximately 96% of the converged spectrum efficiency performance after only three iterations in downlink transmission, demonstrating its efficacy and potential to improve outdoor XR applications.	翻訳日:2023-07-10 14:25:22 公開日:2023-07-06
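The idea of unfolding successive over-relaxation (SOR) described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' beamformer: the function name `sor_unfolded`, the small real-valued test system, and the per-sweep relaxation factors are all hypothetical, and the real problem involves complex-valued channel matrices and factors obtained by training.

```python
def sor_unfolded(A, b, omegas):
    """Approximately solve A x = b with one SOR sweep per entry of `omegas`.

    Each sweep plays the role of one "layer" of an unfolded network; in a
    deep-unfolding setup the relaxation factors would be learned, whereas
    here they are simply passed in (an assumption made for illustration).
    """
    n = len(b)
    x = [0.0] * n  # start from the zero vector
    for omega in omegas:
        for i in range(n):
            # Off-diagonal contribution, using already-updated x[j] for j < i.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Blend the previous value with the Gauss-Seidel update.
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
    return x


# Hypothetical symmetric positive-definite system, solved with three sweeps.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x3 = sor_unfolded(A, b, omegas=[1.0, 1.05, 1.1])
```

Fixing the number of sweeps in advance, as with the three factors above, is what makes the iteration look like a fixed-depth network whose few parameters can be tuned for fast convergence.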
# PSDR-Room: 微分可能レンダリングによる単一写真からシーンへ PSDR-Room: Single Photo to Scene using Differentiable Rendering ( http://arxiv.org/abs/2307.03244v1 ) ライセンス: Link先を確認	Kai Yan, Fujun Luan, Miloš Hašan, Thibault Groueix, Valentin Deschaintre, Shuang Zhao	(参考訳) 3Dデジタルシーンにはライト、マテリアル、ジオメトリなど多くの要素が含まれ、それらが相互作用して望ましい外観を実現する。このようなシーンの構築には時間がかかり、芸術と技術の両方のスキルが必要となる。本研究では、室内シーンのターゲット画像に一致するように、照明および個々のオブジェクトのポーズとマテリアルを最小限のユーザ入力で最適化できるシステムPSDR-Roomを提案する。この目的のために、ジオメトリ、照明、プロシージャルマテリアルに関するレンダリングの不偏な勾配を与える最近の経路空間微分可能レンダリング手法を活用し、これらすべての要素を勾配降下法で最適化して、入力写真の外観に視覚的に一致させる。最適化の初期化と適切な3Dモデルおよびマテリアルの探索には、最近の単一画像シーン理解手法を用いる。本手法を屋内シーンの実写真で評価し、得られたシーン要素の編集可能性を示す。 A 3D digital scene contains many components: lights, materials and geometries, interacting to reach the desired appearance. Staging such a scene is time-consuming and requires both artistic and technical skills. In this work, we propose PSDR-Room, a system allowing to optimize lighting as well as the pose and materials of individual objects to match a target image of a room scene, with minimal user input. To this end, we leverage a recent path-space differentiable rendering approach that provides unbiased gradients of the rendering with respect to geometry, lighting, and procedural materials, allowing us to optimize all of these components using gradient descent to visually match the input photo appearance. We use recent single-image scene understanding methods to initialize the optimization and search for appropriate 3D models and materials. We evaluate our method on real photographs of indoor scenes and demonstrate the editability of the resulting scene components.	翻訳日:2023-07-10 14:18:38 公開日:2023-07-06
# That's BAD: 暗黙的局所特徴クラスタリングによるブラインド異常検出 That's BAD: Blind Anomaly Detection by Implicit Local Feature Clustering ( http://arxiv.org/abs/2307.03243v1 ) ライセンス: Link先を確認	Jie Zhang, Masanori Suganuma, Takayuki Okatani	(参考訳) 産業用物体・テクスチャの視覚異常検出(AD)に関する最近の研究は、非常に優れた性能を達成している。それらは教師なしの設定、特に1クラス設定を考慮し、トレーニング用に正常(すなわち異常のない)画像の集合が利用可能であると仮定する。本稿では、正常サンプルと異常サンプルの両方を含む可能性のある画像集合から異常を検出する、教師なしADのより困難なシナリオを考察する。この設定は、既知の正常データが利用可能であることを仮定せず、したがって人手によるアノテーションを一切必要としない点で、最近の研究で考慮されている標準的なADとは異なる。明確化のため、この設定をブラインド異常検出(BAD)と呼ぶ。本稿では、BADを局所外れ値検出問題に変換できることを示し、画像レベルおよび画素レベルの異常を正確に検出できるPatchClusterという新しい手法を提案する。実験結果から、PatchClusterは正常データの知識なしに有望な性能を示し、正常データを必要とする1クラス設定で適用されるSOTA手法に匹敵することさえあることがわかった。 Recent studies on visual anomaly detection (AD) of industrial objects/textures have achieved quite good performance. They consider an unsupervised setting, specifically the one-class setting, in which we assume the availability of a set of normal (i.e., anomaly-free) images for training. In this paper, we consider a more challenging scenario of unsupervised AD, in which we detect anomalies in a given set of images that might contain both normal and anomalous samples. The setting does not assume the availability of known normal data and thus is completely free from human annotation, which differs from the standard AD considered in recent studies. For clarity, we call the setting blind anomaly detection (BAD). We show that BAD can be converted into a local outlier detection problem and propose a novel method named PatchCluster that can accurately detect image- and pixel-level anomalies. Experimental results show that PatchCluster shows a promising performance without the knowledge of normal data, even comparable to the SOTA methods applied in the one-class setting needing it.	翻訳日:2023-07-10 14:18:22 公開日:2023-07-06
# 可視赤外人物再同定のための原始中間情報の適応生成 Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification ( http://arxiv.org/abs/2307.03240v1 ) ライセンス: Link先を確認	Mahdi Alehdaghi, Arthur Josi, Pourya Shamsolmoali, Rafael M. O. Cruz, and Eric Granger	(参考訳) 可視赤外線の人物再識別は、RGBと赤外線センサーの分散ネットワーク上で撮影された同一人物の画像の検索を試みる。いくつかのV-I ReIDアプローチは、VとIのモダリティを直接統合して、共有表現空間内の人物を識別する。しかしながら、V と I のモダリティ間のデータ分布の重大なギャップを考えると、V-I ReID は依然として困難である。最近のアプローチでは、v と i のモダリティを橋渡しできる中間空間を活用することで一般化が改善されているが、そのような情報領域のデータの選択や生成には効果的な方法が必要である。本稿では,V と I のモダリティ間の識別情報をブリッジする仮想ドメインを適応し,生成するための適応型中間情報学習手法を提案する。 AGPI^2の背後にある重要な動機は、付加情報を提供する特権画像を生成することによって、深いV-I ReIDバックボーンのトレーニングを強化することである。これらの特権画像は、オリジナルのVまたはIモードでのみアクセスできない共有識別特徴をキャプチャする。この目的に向けて、非線形生成モジュールは、逆対象で訓練され、V 画像をより小さな領域シフト w.r.t.I ドメインで中間空間に変換する。一方、AGPI^2内の埋め込みモジュールは、Vと生成された画像の両方に類似した特徴を生成し、すべてのモダリティに共通する特徴の抽出を促進する。これらの貢献に加えて、AGPI^2 は中間画像に適応するための敵の目的を採用しており、V と I ドメイン間の大きなドメインシフトに対処する非モダリティ固有の空間を作る上で重要な役割を果たす。 V-I ReIDデータセットを用いた実験結果から,AGPI^2は推論中に余分な計算資源を使わずにマッチング精度を向上させることが示された。 Visible-infrared person re-identification seeks to retrieve images of the same individual captured over a distributed network of RGB and IR sensors. Several V-I ReID approaches directly integrate both V and I modalities to discriminate persons within a shared representation space. However, given the significant gap in data distributions between V and I modalities, cross-modal V-I ReID remains challenging. Some recent approaches improve generalization by leveraging intermediate spaces that can bridge V and I modalities, yet effective methods are required to select or generate data for such informative domains. In this paper, the Adaptive Generation of Privileged Intermediate Information training approach is introduced to adapt and generate a virtual domain that bridges discriminant information between the V and I modalities. 
The key motivation behind AGPI^2 is to enhance the training of a deep V-I ReID backbone by generating privileged images that provide additional information. These privileged images capture shared discriminative features that are not easily accessible within the original V or I modalities alone. Towards this goal, a non-linear generative module is trained with an adversarial objective, translating V images into intermediate spaces with a smaller domain shift w.r.t. the I domain. Meanwhile, the embedding module within AGPI^2 aims to produce similar features for both V and generated images, encouraging the extraction of features that are common to all modalities. In addition to these contributions, AGPI^2 employs adversarial objectives for adapting the intermediate images, which play a crucial role in creating a non-modality-specific space to address the large domain shifts between V and I domains. Experimental results conducted on challenging V-I ReID datasets indicate that AGPI^2 increases matching accuracy without extra computational resources during inference.	翻訳日:2023-07-10 14:18:03 公開日:2023-07-06
# 高エネルギー物理学のための量子コンピューティング:最先端の技術と課題 QC4HEPワーキンググループの概要 Quantum Computing for High-Energy Physics: State of the Art and Challenges. Summary of the QC4HEP Working Group ( http://arxiv.org/abs/2307.03236v1 ) ライセンス: Link先を確認	Alberto Di Meglio, Karl Jansen, Ivano Tavernelli, Constantia Alexandrou, Srinivasan Arunachalam, Christian W. Bauer, Kerstin Borras, Stefano Carrazza, Arianna Crippa, Vincent Croft, Roland de Putter, Andrea Delgado, Vedran Dunjko, Daniel J. Egger, Elias Fernandez-Combarro, Elina Fuchs, Lena Funcke, Daniel Gonzalez-Cuadra, Michele Grossi, Jad C. Halimeh, Zoe Holmes, Stefan Kuhn, Denis Lacroix, Randy Lewis, Donatella Lucchesi, Miriam Lucio Martinez, Federico Meloni, Antonio Mezzacapo, Simone Montangero, Lento Nagano, Voica Radescu, Enrique Rico Ortega, Alessandro Roggero, Julian Schuhmacher, Joao Seixas, Pietro Silvi, Panagiotis Spentzouris, Francesco Tacchino, Kristan Temme, Koji Terashi, Jordi Tura, Cenk Tuysuz, Sofia Vallecorsa, Uwe-Jens Wiese, Shinjae Yoo, Jinglei Zhang	(参考訳) 量子コンピュータは、自然科学や他の分野におけるコンピューティングのパラダイム変化に興味深い経路を提供し、いわゆる量子優位、すなわち数値シミュレーションの重要な(指数関数的な)スピードアップを達成する可能性を秘めている。量子ビットの様々な実現を伴うハードウェアデバイスの急速な開発により、量子コンピュータ上での小規模ながら代表的な応用が可能になる。特に、高エネルギー物理学コミュニティは、この分野が計算問題への挑戦の原動力であるため、量子コンピューティングの力にアクセスする上で重要な役割を果たす。この懸念は、理論的な面では、古典的な手法で対処するのが非常に困難または不可能なモデルの探索であり、実験的な面では、大型ハドロン衝突型加速器のアップグレードのような新しい実験の巨大なデータ課題である。 CERN、DESY、IBMが主導するこのロードマップ論文では、高エネルギー物理量子計算の状況を提供し、近い将来に対処できる理論的および実験的なターゲットベンチマーク応用の例を示す。可能であれば、IBM 100 x 100の課題を念頭に置いて、エラー軽減量子コンピューティングを使用した例のリソース推定も提供する。 Quantum computers offer an intriguing path for a paradigmatic change of computing in the natural sciences and beyond, with the potential for achieving a so-called quantum advantage, namely a significant (in some cases exponential) speed-up of numerical simulations. The rapid development of hardware devices with various realizations of qubits enables the execution of small scale but representative applications on quantum computers. 
In particular, the high-energy physics community plays a pivotal role in accessing the power of quantum computing, since the field is a driving source for challenging computational problems. This concerns, on the theoretical side, the exploration of models which are very hard or even impossible to address with classical techniques and, on the experimental side, the enormous data challenge of newly emerging experiments, such as the upgrade of the Large Hadron Collider. In this roadmap paper, led by CERN, DESY and IBM, we provide the status of high-energy physics quantum computations and give examples for theoretical and experimental target benchmark applications, which can be addressed in the near future. Having the IBM 100 x 100 challenge in mind, where possible, we also provide resource estimates for the examples given using error mitigated quantum computing.	翻訳日:2023-07-10 14:17:31 公開日:2023-07-06
# 量子誤り訂正プリミティブへの単純な化学応用のコンパイル Compilation of a simple chemistry application to quantum error correction primitives ( http://arxiv.org/abs/2307.03233v1 ) ライセンス: Link先を確認	Nick S. Blunt, György P. Gehér, Alexandra E. Moylett	(参考訳) 量子誤り訂正の分野では、最近多くのエキサイティングな結果が得られている。これには、現在の量子ハードウェアにおける誤り訂正の初期の実証や、実世界のアプリケーションのために大規模量子アルゴリズムを実行するための要件の理解を深めるリソース推定が含まれる。本研究では、最小限の化学の例について、フォールトトレラントに量子位相推定(QPE)を行うために必要なリソースを注意深く推定することにより、これら2つの発展の間のギャップを橋渡しする。具体的には、最小基底系における水素分子について、QPE回路を回転表面符号の格子手術(lattice surgery)演算へ詳細にコンパイルする方法を述べる。アルゴリズムと誤り訂正の両方のレベルにおける多数の最適化について述べる。単純な化学回路の実装でさえ900量子ビットと2,300回の量子誤り訂正ラウンドを必要とすることがわかり、初期のフォールトトレラント領域を特に対象とした誤り訂正技術の改善の必要性が浮き彫りになった。 A number of exciting recent results have been seen in the field of quantum error correction. These include initial demonstrations of error correction on current quantum hardware, and resource estimates which improve understanding of the requirements to run large-scale quantum algorithms for real-world applications. In this work, we bridge the gap between these two developments by performing careful estimation of the resources required to fault-tolerantly perform quantum phase estimation (QPE) on a minimal chemical example. Specifically, we describe a detailed compilation of the QPE circuit to lattice surgery operations for the rotated surface code, for a hydrogen molecule in a minimal basis set. We describe a number of optimisations at both the algorithmic and error correction levels. We find that implementing even a simple chemistry circuit requires 900 qubits and 2,300 quantum error correction rounds, emphasising the need for improved error correction techniques specifically targeting the early fault-tolerant regime.	翻訳日:2023-07-10 14:17:12 公開日:2023-07-06
# 高忠実度仮想2量子ビットゲートの実験的実証 Experimental demonstration of a high-fidelity virtual two-qubit gate ( http://arxiv.org/abs/2307.03232v1 ) ライセンス: Link先を確認	Akhil Pratap Singh (1), Kosuke Mitarai (2), Yasunari Suzuki (3), Kentaro Heya (4), Yutaka Tabuchi (4), Keisuke Fujii (2 and 4) and Yasunobu Nakamura (1 and 4) ((1) Department of Applied Physics, Graduate School of Engineering, The University of Tokyo, (2) Graduate School of Engineering Science, Osaka University, (3) NTT Computer and Data Science Laboratories, NTT Corporation, (4) RIKEN Center for Quantum Computing)	(参考訳) 仮想2量子ビットゲートを実験的に実証し、量子プロセストモグラフィ(QPT)を用いてその特性を評価する。仮想2量子ビットゲートは、期待値推定のための量子回路において、実際の2量子ビットゲートを単一量子ビット操作と射影測定に分解する。射影測定は、回路途中の分散読み出しによって実装する。決定論的サンプリング方式により、仮想2量子ビットゲートの分解に必要な実験回路の評価回数を削減する。また、測定誤差緩和を適用して読み出し誤差の影響を抑制し、仮想制御-$Z$(CZ)ゲートの平均ゲート忠実度を$f_{\rm av} = 0.9975 \pm 0.0028$に改善する。本結果は、高い忠実度で仮想2量子ビットゲートを実装する実用的な手法を示しており、より少ない量子ビットによる量子回路のシミュレーションや、離れた量子ビット対への2量子ビットゲートの実装に有用である。 We experimentally demonstrate a virtual two-qubit gate and characterize it using quantum process tomography (QPT). The virtual two-qubit gate decomposes an actual two-qubit gate into single-qubit operations and projective measurements in quantum circuits for expectation-value estimation. We implement projective measurements via mid-circuit dispersive readout. The deterministic sampling scheme reduces the number of experimental circuit evaluations required for decomposing a virtual two-qubit gate. We also apply measurement error mitigation to suppress the effect of readout errors and improve the average gate fidelity of a virtual controlled-$Z$ (CZ) gate to $f_{\rm av} = 0.9975 \pm 0.0028$. Our results highlight a practical approach to implement virtual two-qubit gates with high fidelities, which are useful for simulating quantum circuits using fewer qubits and implementing two-qubit gates on a distant pair of qubits.	翻訳日:2023-07-10 14:16:58 公開日:2023-07-06
# 適応投影型変分量子力学 Adaptive projected variational quantum dynamics ( http://arxiv.org/abs/2307.03229v1 ) ライセンス: Link先を確認	David Linteau, Stefano Barison, Netanel Lindner, Giuseppe Carleo	(参考訳) 本稿では,正確な変動時間進化波動関数を作成するための適応量子アルゴリズムを提案する。この手法は,変分パラメータ数に線形スケーリングを施した大域的最適化を行う,予測された変分量子ダイナミクス(pVQD)アルゴリズムに基づいている。シミュレーションの開始時に変分アンザッツを修正する代わりに、回路は時間進化中に体系的に成長する。さらに、適応ステップは補助量子ビットを必要とせず、ゲート探索は異なる量子デバイス上で並列に行うことができる。この新しいアルゴリズムはadaptive pvqd(適応型pvqd)を駆動スピンモデルとフェルミオン系のシミュレーションに適用し、トロッタ化回路と非適応変分法の両方と比較した場合の利点を示す。最後に,適応型pvqdアルゴリズムを用いて作製した浅層回路を用いて,ハードウェア上の量子システムの物理特性をより正確に測定する。 We propose an adaptive quantum algorithm to prepare accurate variational time evolved wave functions. The method is based on the projected Variational Quantum Dynamics (pVQD) algorithm, that performs a global optimization with linear scaling in the number of variational parameters. Instead of fixing a variational ansatz at the beginning of the simulation, the circuit is grown systematically during the time evolution. Moreover, the adaptive step does not require auxiliary qubits and the gate search can be performed in parallel on different quantum devices. We apply the new algorithm, named Adaptive pVQD, to the simulation of driven spin models and fermionic systems, where it shows an advantage when compared to both Trotterized circuits and non-adaptive variational methods. Finally, we use the shallower circuits prepared using the Adaptive pVQD algorithm to obtain more accurate measurements of physical properties of quantum systems on hardware.	翻訳日:2023-07-10 14:16:40 公開日:2023-07-06
# ニューラルネットワーク場の理論: 非ガウス性、作用、局所性 Neural Network Field Theories: Non-Gaussianity, Actions, and Locality ( http://arxiv.org/abs/2307.03223v1 ) ライセンス: Link先を確認	Mehmet Demirtas, James Halverson, Anindita Maiti, Matthew D. Schwartz, Keegan Stoner	(参考訳) 場の理論における経路積分測度とニューラルネットワークのアンサンブルは、いずれも関数上の分布を記述する。無限幅(無限$N$)極限で中心極限定理が適用できるとき、ネットワークのアンサンブルは自由場理論に対応する。$1/N$展開は場の理論における相互作用に対応するが、ネットワークパラメータの統計的独立性のわずかな破れなど、他の展開も相互作用する理論を導きうる。これらの他の展開は、例えば普遍近似定理に関する振る舞いが改善されることにより、$1/N$展開よりも有利になりうる。場の理論の連結相関関数が与えられれば、連結相関関数を頂点とする新しいファインマン図の規則を用いて、展開パラメータの次数ごとに作用を系統的に再構成できる。この方法はエッジワース展開に動機付けられており、ニューラルネットワーク場の理論の作用を導出することを可能にする。逆に、この対応により、作用の変形をニューラルネットワークパラメータ密度の変形として表現することで、与えられた場の理論を実現するアーキテクチャを設計できる。例として、$\phi^4$理論が無限$N$ニューラルネットワーク場の理論として実現される。 Both the path integral measure in field theory and ensembles of neural networks describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-$N$) limit, the ensemble of networks corresponds to a free field theory. Although an expansion in $1/N$ corresponds to interactions in the field theory, others, such as in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the $1/N$-expansion, for example by improved behavior with respect to the universal approximation theorem. Given the connected correlators of a field theory, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for neural network field theories. Conversely, the correspondence allows one to engineer architectures realizing a given field theory by representing action deformations as deformations of neural network parameter densities. As an example, $\phi^4$ theory is realized as an infinite-$N$ neural network field theory.	翻訳日:2023-07-10 14:16:25 公開日:2023-07-06
# 敵対的モデルによる不確かさの定量化 Quantification of Uncertainty with Adversarial Models ( http://arxiv.org/abs/2307.03217v1 ) ライセンス: Link先を確認	Kajetan Schweighofer, Lukas Aichberger, Mykyta Ielanskyi, Günter Klambauer, Sepp Hochreiter	(参考訳) 不確かさの定量化は実世界のアプリケーションで実行可能な予測に重要である。予測的不確実性定量化の重要な部分は、ダイバージェンス関数と事後分布の積の積分として定義される認識的不確実性の推定である。ディープアンサンブルやMCドロップアウトのような現在の手法は、モデルをサンプリングする際に主に事後分布のみを考慮しているため、認識的不確実性を推定するには不十分である。認識的不確実性をよりよく推定するために、敵対的モデルによる不確かさの定量化(QUAM)を提案する。 QUAMは、事後分布だけでなく、被積分関数である積全体が大きい領域を特定する。その結果、QUAMは従来の方法に比べて認識的不確実性の近似誤差が低い。この積が大きいモデルは、(敵対的サンプルではなく)敵対的モデルに対応する。敵対的モデルは、高い事後確率を持ち、かつその予測と参照モデルの予測との間のダイバージェンスが大きい。実験の結果、QUAMは深層学習モデルの認識的不確実性をうまく捉え、視覚領域における難しい課題で従来の手法よりも優れていることがわかった。 Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior as well as a high divergence between their predictions and that of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.	翻訳日:2023-07-10 14:16:07 公開日:2023-07-06
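The definition quoted in the QUAM abstract above can be written out as a formula. The notation here is an assumption chosen for illustration ($w$ for the reference model, $\tilde{w}$ for the integration variable, $\mathcal{D}$ for the training data, $\mathrm{D}$ for an unspecified divergence); the abstract fixes only the structure, an integral of a divergence multiplied by the posterior.

```latex
u(x, w) \;=\; \int \mathrm{D}\big(p(y \mid x, w)\,\|\,p(y \mid x, \tilde{w})\big)\; p(\tilde{w} \mid \mathcal{D})\; \mathrm{d}\tilde{w}
```

Read this way, a sampler that follows only $p(\tilde{w} \mid \mathcal{D})$ can miss models for which the divergence factor is large, which is the gap the abstract says QUAM targets.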
# preadd: 制御テキスト生成のためのプレフィックス適応復号 PREADD: Prefix-Adaptive Decoding for Controlled Text Generation ( http://arxiv.org/abs/2307.03214v1 ) ライセンス: Link先を確認	Jonathan Pei, Kevin Yang, and Dan Klein	(参考訳) テキスト生成のためのフレキシブルな方法であるPREADD(Prefix-Adaptive Decoding)を提案する。属性の制御に補助的な専門家モデルを使用する既存の方法とは異なり、PreADDは外部モデルを必要としない。具体的には、preaddは、プレフィックスプリプンを使用して生成されたものとrawプロンプトを使用して生成された出力ロジットを対比し、プレフィックスによってカプセル化された属性に関して、ポジティブとネガティブの両方の制御を可能にする。有害なアウトプット緩和,ジェンダーバイアス低減,感情制御の3つのタスクにおいてpreADDを評価した結果,PreADDはベースラインを刺激するだけでなく,各タスクの主指標に対して12%以上の相対的な利得で補助的専門的制御方法も優れていることがわかった。 We propose Prefix-Adaptive Decoding (PREADD), a flexible method for controlled text generation. Unlike existing methods that use auxiliary expert models to control for attributes, PREADD does not require an external model, instead relying on linearly combining output logits from multiple prompts. Specifically, PREADD contrasts the output logits generated using a raw prompt against those generated using a prefix-prepended prompt, enabling both positive and negative control with respect to any attribute encapsulated by the prefix. We evaluate PREADD on three tasks -- toxic output mitigation, gender bias reduction, and sentiment control -- and find that PREADD outperforms not only prompting baselines, but also an auxiliary-expert control method, by 12% or more in relative gain on our main metrics for each task.	翻訳日:2023-07-10 14:15:48 公開日:2023-07-06
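The logit contrast described in the PREADD abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `preadd_logits`, the strength parameter `alpha`, and the toy logit values are hypothetical. The abstract states only that output logits from a raw prompt and a prefix-prepended prompt are linearly combined.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preadd_logits(raw_logits, prefix_logits, alpha):
    """Linearly combine next-token logits from two prompts (hypothetical form).

    alpha = 0 recovers the raw-prompt logits, alpha = 1 recovers the
    prefix-prompt logits, alpha > 1 amplifies the prefix's effect, and
    alpha < 0 steers away from it (negative control).
    """
    return [r + alpha * (p - r) for r, p in zip(raw_logits, prefix_logits)]


# Toy next-token logits over a three-token vocabulary (made-up numbers).
raw = [2.0, 1.0, 0.5]        # from the raw prompt
prefixed = [1.0, 2.5, 0.5]   # from the prefix-prepended prompt

baseline = softmax(raw)
steered_towards = softmax(preadd_logits(raw, prefixed, alpha=1.5))
steered_away = softmax(preadd_logits(raw, prefixed, alpha=-1.0))
```

In this toy setup the prefix favours the second token, so a positive `alpha` raises its probability relative to the raw prompt and a negative `alpha` lowers it.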
# OmniBoost: マルチDNN負荷下における異種組み込みデバイスのスループット向上 OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload ( http://arxiv.org/abs/2307.03290v1 ) ライセンス: Link先を確認	Andreas Karatzas and Iraklis Anagnostopoulos	(参考訳) 現代のディープニューラルネットワーク(DNN)は、高い効率性と精度を示す。これにより、複数のDNNアプリケーションで構成されるアプリケーションワークロードが導入され、ワークロードの分散に関する新たな課題が提起された。多様なアクセラレーターを備えた新しい組込みシステムは、現在のランタイムコントローラが完全に利用できないアーキテクチャ上の不均一性を示す。マルチDNNワークロードで高いスループットを実現するために、このようなコントローラは、基礎となる不均一性を活用するために、数十万の可能なソリューションを探索する必要がある。本稿では,異種組み込みデバイスのための軽量かつ拡張可能なマルチDNNマネージャであるOmniBoostを提案する。我々は確率的空間探索を活用し、それを高精度な性能推定器と組み合わせて、他の最先端手法と比較して平均4.6倍のスループット向上を観測する。評価はHiKey970開発ボードで行われた。 Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded systems present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller ought to explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We leverage stochastic space exploration and combine it with a highly accurate performance estimator to observe a 4.6x average throughput boost compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.	翻訳日:2023-07-10 14:08:45 公開日:2023-07-06
# サブ線形ハイパーボリュームレグレットの最適スカラー化 Optimal Scalarizations for Sublinear Hypervolume Regret ( http://arxiv.org/abs/2307.03288v1 ) ライセンス: Link先を確認	Qiuyi Zhang (Richard)	(参考訳) スケーラビリティは、例えば最近のRLHFでは、人間の好みを調整する報酬モデルをトレーニングするなど、複数の目的をひとつに減らすために、任意の多目的設定にデプロイできる一般的なテクニックである。しかし、線形スカラー化がパレート辺境の凹部を見逃していることが知られているため、この古典的アプローチを否定する者もいる。そのために我々は,パレート・フロンティアにおけるk$目標の多種多様な集合を探索することのできる,単純な非線形スカラー化を見つけることを目指している。均一にランダムな重みを持つ超体積スカラー化は、任意のアルゴリズムが漸近的により良い処理を行なわないように、最適なサブ線形後悔境界を$O(T^{-1/k})$で達成し、超体積後悔を確実に最小化するのに驚くほど最適であることを示す。理論的なケーススタディとして、多目的確率的線形バンディッツ問題を検討し、超体積スカラー化のsublinear regret boundsを利用すると、$\tilde{o}(d t^{-1/2} + t^{-1/k})$ の高体積後悔境界を生成する新しい非ユークリッド解析が得られることを示す。 EHVIのようなベイズ最適化における標準的な多目的アルゴリズムと同様に、線形スカラー化とチェビシェフスカラー化の両方を一貫して上回る単純な超体積スカラー化を用いることで、我々の理論を強い経験的性能で支持する。 Scalarization is a general technique that can be deployed in any multiobjective setting to reduce multiple objectives into one, such as recently in RLHF for training reward models that align human preferences. Yet some have dismissed this classical approach because linear scalarizations are known to miss concave regions of the Pareto frontier. To that end, we aim to find simple non-linear scalarizations that can explore a diverse set of $k$ objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights are surprisingly optimal for provably minimizing the hypervolume regret, achieving an optimal sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds that preclude any algorithm from doing better asymptotically. As a theoretical case study, we consider the multiobjective stochastic linear bandits problem and demonstrate that by exploiting the sublinear regret bounds of the hypervolume scalarizations, we can derive a novel non-Euclidean analysis that produces improved hypervolume regret bounds of $\tilde{O}( d T^{-1/2} + T^{-1/k})$. 
We support our theory with strong empirical results: simple hypervolume scalarizations consistently outperform both the linear and Chebyshev scalarizations, as well as standard multiobjective algorithms in Bayesian optimization, such as EHVI.	翻訳日:2023-07-10 14:08:31 公開日:2023-07-06
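To make "hypervolume scalarizations with uniformly random weights" concrete, here is a small Monte Carlo sketch. It assumes a commonly used form of the scalarization, $s_\lambda(y) = \min_i \max(0, y_i/\lambda_i)^k$ with $\lambda$ drawn uniformly from the positive part of the unit sphere and the origin as reference point. The abstract does not state the formula, so treat this form, the constant `c_k`, and all function names as assumptions rather than the paper's exact construction.

```python
import math
import random


def hypervolume_scalarization(y, weights):
    """Assumed form: min_i max(0, y_i / w_i) ** k for a k-objective point y."""
    k = len(y)
    return min(max(0.0, yi / wi) for yi, wi in zip(y, weights)) ** k


def random_positive_unit_vector(k, rng):
    """Uniform direction on the positive orthant of the unit sphere."""
    g = [abs(rng.gauss(0.0, 1.0)) for _ in range(k)]
    norm = math.sqrt(sum(v * v for v in g))
    # Clamp to avoid division by zero in the scalarization.
    return [max(v / norm, 1e-12) for v in g]


def estimate_hypervolume(points, num_samples=20000, seed=0):
    """Estimate the hypervolume dominated by `points` relative to the origin.

    Averages the best scalarized value over random weight vectors and
    rescales by c_k, the volume of the positive orthant of the unit k-ball.
    """
    rng = random.Random(seed)
    k = len(points[0])
    c_k = math.pi ** (k / 2) / (2 ** k * math.gamma(k / 2 + 1))
    total = 0.0
    for _ in range(num_samples):
        w = random_positive_unit_vector(k, rng)
        total += max(hypervolume_scalarization(y, w) for y in points)
    return c_k * total / num_samples
```

For the single point `(1.0, 1.0)` the dominated region relative to the origin is the unit square, so `estimate_hypervolume([(1.0, 1.0)])` should come out close to 1. For the two points `(2.0, 1.0)` and `(1.0, 2.0)` the dominated area is 3.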
# 接続制限付き量子符号の速度-距離トレードオフの改善 Improved rate-distance trade-offs for quantum codes with restricted connectivity ( http://arxiv.org/abs/2307.03283v1 ) ライセンス: Link先を確認	Nouédyn Baspin, Venkatesan Guruswami, Anirudh Krishna, Ray Li	(参考訳) 量子誤り訂正符号が実現可能であるためには、符号制約を受ける量子ビットがある種の限定接続性を示すことが重要である。 Bravyi & Terhal (BT) と Bravyi, Poulin & Terhal (BPT) の研究は、幾何的局所性が符号特性を制約することを示した。例えば、$D$次元格子上の局所チェックによって定義される $[[n,k,d]]$ 量子符号は、$k d^{2/(D-1)} \le O(n)$ に従わなければならない。 BaspinとKrishnaは、量子符号に関連する接続グラフが符号パラメータをどう制約するかというより一般的な問題を研究した。これらのトレードオフは、幾何学的に局所的な符号のみを対象とするBPTおよびBT境界と比べて、より豊富な符号のクラスに適用される。我々は,接続グラフにおけるセパレータの大きさの関数として,より厳密な次元-距離トレードオフを確立することにより,この研究を拡張し,改善する。また、LDPC符号のみでなく、特定の分離プロファイルを持つ安定化器符号を全てカバーする距離境界を得る。 For quantum error-correcting codes to be realizable, it is important that the qubits subject to the code constraints exhibit some form of limited connectivity. The works of Bravyi & Terhal (BT) and Bravyi, Poulin & Terhal (BPT) established that geometric locality constrains code properties -- for instance $[[n,k,d]]$ quantum codes defined by local checks on the $D$-dimensional lattice must obey $k d^{2/(D-1)} \le O(n)$. Baspin and Krishna studied the more general question of how the connectivity graph associated with a quantum code constrains the code parameters. These trade-offs apply to a richer class of codes compared to the BPT and BT bounds, which only capture geometrically-local codes. We extend and improve this work, establishing a tighter dimension-distance trade-off as a function of the size of separators in the connectivity graph. We also obtain a distance bound that covers all stabilizer codes with a particular separation profile, rather than only LDPC codes.	翻訳日:2023-07-10 14:08:01 公開日:2023-07-06
# ニューラルネットワークデコーダによる表面実験 Neural network decoder for near-term surface-code experiments ( http://arxiv.org/abs/2307.03280v1 ) ライセンス: Link先を確認	Boris M. Varbanov, Marc Serra-Peralta, David Byfield, Barbara M. Terhal	(参考訳) ニューラルネットワークデコーダは、表面コードをデコードする際に、従来のデコーダよりも低い論理エラー率を達成することができる。さらに、これらのデコーダは物理エラー率に関する事前情報を必要としないため、高度に適応可能である。本研究では,トランスモン量子ビットプロセッサから得られたシミュレーションデータと実験データの両方を用いて,小型表面符号に着目したデコーダの性能について検討する。最初に、ニューラルネットワークが典型的には、マッチするデコーダよりも優れた処理エラーにより、例えば$Y$エラーなど、複数の相関したシンドローム欠陥につながることが示される。 Google Quantum AI, Nature 614, 676 (2023)]の実験データに適用すると、ニューラルネットワークデコーダは最小ウェイト完全マッチングよりも約25\%$低い論理誤差率を達成し、最大ライクなデコーダのパフォーマンスにアプローチする。このデコーダの柔軟性を実証するために、トランスモン量子ビットのアナログ読み出しで利用できるソフト情報を組み込んで、対称ガウスノイズモデルを用いてシミュレーションにおいてこのデコーダの性能を評価する。ソフトな情報を考えると、測定誤差の確率に応じて、約10〜%の論理誤差率が低下する。優れた論理性能、柔軟性、計算効率により、ニューラルネットワークデコーダは量子メモリの短期的な実証に適している。 Neural-network decoders can achieve a lower logical error rate compared to conventional decoders, like minimum-weight perfect matching, when decoding the surface code. Furthermore, these decoders require no prior information about the physical error rates, making them highly adaptable. In this study, we investigate the performance of such a decoder using both simulated and experimental data obtained from a transmon-qubit processor, focusing on small-distance surface codes. We first show that the neural network typically outperforms the matching decoder due to better handling errors leading to multiple correlated syndrome defects, such as $Y$ errors. When applied to the experimental data of [Google Quantum AI, Nature 614, 676 (2023)], the neural network decoder achieves logical error rates approximately $25\%$ lower than minimum-weight perfect matching, approaching the performance of a maximum-likelihood decoder. To demonstrate the flexibility of this decoder, we incorporate the soft information available in the analog readout of transmon qubits and evaluate the performance of this decoder in simulation using a symmetric Gaussian-noise model. 
Considering the soft information leads to an approximately $10\%$ lower logical error rate, depending on the probability of a measurement error. The good logical performance, flexibility, and computational efficiency make neural network decoders well-suited for near-term demonstrations of quantum memories.	翻訳日:2023-07-10 14:07:34 公開日:2023-07-06
# 事前訓練するか、事前訓練しないか? 病理組織学におけるセマンティクスセグメンテーションのためのドメイン特化前訓練の事例研究 To pretrain or not to pretrain? A case study of domain-specific pretraining for semantic segmentation in histopathology ( http://arxiv.org/abs/2307.03275v1 ) ライセンス: Link先を確認	Tushar Kataria, Beatrice Knudsen and Shireen Elhabian	(参考訳) 医用画像データセットのアノテートは費用がかかるため、細調整(あるいは伝達学習)は疾患分類やセマンティックセグメンテーションなどのデジタル病理ビジョン応用において最も効果的な方法である。しかし、実際の画像に基づいて訓練されたモデルのテクスチャバイアスにより、転送学習は、未ラベルの病理学データと自己教師によるドメイン固有の特徴の発見を必要とするような、パフォーマンスの低いモデルをもたらす可能性がある。そこで我々は,病理組織特異的な事前訓練モデルが,病理視覚,すなわち腺と細胞セグメンテーションにより良い初期化をもたらすという前提を検証した。本研究では,腺と細胞セグメンテーションタスクのパフォーマンスを,ドメイン特異的および非ドメイン特異的な事前訓練重量と比較した。さらに,ドメイン固有事前学習が統計的に有意な性能差をもたらすデータサイズについて検討する。さらに,ドメイン固有の初期化によって,異なるデータセット上でのドメイン外テストの有効性が向上するかどうかを検討した。その結果、ドメイン固有の事前トレーニングによるパフォーマンス向上は、タスクとトレーニングデータセットのサイズの両方に依存することがわかった。データセットサイズが限定されたインスタンスでは腺分節性能が著しく向上するのに対し,細胞分節データセットでトレーニングしたモデルでは改善は見られなかった。 Annotating medical imaging datasets is costly, so fine-tuning (or transfer learning) is the most effective method for digital pathology vision applications such as disease classification and semantic segmentation. However, due to texture bias in models trained on real-world images, transfer learning for histopathology applications might result in underperforming models, which necessitates the need for using unlabeled histopathology data and self-supervised methods to discover domain-specific characteristics. Here, we tested the premise that histopathology-specific pretrained models provide better initializations for pathology vision tasks, i.e., gland and cell segmentation. In this study, we compare the performance of gland and cell segmentation tasks with domain-specific and non-domain-specific pretrained weights. Moreover, we investigate the data size at which domain-specific pretraining produces a statistically significant difference in performance. 
In addition, we investigated whether domain-specific initialization improves the effectiveness of out-of-domain testing on distinct datasets but the same task. The results indicate that performance gain using domain-specific pretraining depends on both the task and the size of the training dataset. In instances with limited dataset sizes, a significant improvement in gland segmentation performance was also observed, whereas models trained on cell segmentation datasets exhibit no improvement.	翻訳日:2023-07-10 14:07:09 公開日:2023-07-06
# 性的に推奨的ではなく、教育的だ。 tiktokビデオにおける性教育と提案コンテンツの分離 It is not Sexually Suggestive, It is Educative. Separating Sex Education from Suggestive Content on TikTok Videos ( http://arxiv.org/abs/2307.03274v1 ) ライセンス: Link先を確認	Enfa George, Mihai Surdeanu	(参考訳) sextokは、tiktokの動画を(注釈者の視点から)性的に示唆する、性教育的なコンテンツ、あるいはその両方とラベル付けしたマルチモーダルデータセットである。このようなデータセットは、TikTok上の性的な推奨コンテンツと仮想性教育ビデオの区別という課題に対処するために必要である。子どもの性的な示唆的なビデオへの露出は、その発達に逆効果があることが示されている。一方、バーチャルセックス教育、特にLGBTQIA+コミュニティとより関係のあるテーマは、非常に貴重である。プラットフォームの現在のシステムは、異なる目的のために、両方のタイプのビデオの一部を削除またはペナルティ化する。私たちのデータセットにはビデオURLが含まれています。その重要性を検証するために,ビデオの分類のための2つのトランスフォーマーモデルを検討する。予備的な結果は、これらのタイプの動画を区別する作業は学習可能であるが難しいことを示唆している。これらの実験は、このデータセットが有意義であることを示唆している。 We introduce SexTok, a multi-modal dataset composed of TikTok videos labeled as sexually suggestive (from the annotator's point of view), sex-educational content, or neither. Such a dataset is necessary to address the challenge of distinguishing between sexually suggestive content and virtual sex education videos on TikTok. Children's exposure to sexually suggestive videos has been shown to have adversarial effects on their development. Meanwhile, virtual sex education, especially on subjects that are more relevant to the LGBTQIA+ community, is very valuable. The platform's current system removes or penalizes some of both types of videos, even though they serve different purposes. Our dataset contains video URLs, and it is also audio transcribed. To validate its importance, we explore two transformer-based models for classifying the videos. Our preliminary results suggest that the task of distinguishing between these types of videos is learnable but challenging. These experiments suggest that this dataset is meaningful and invites further study on the subject.	翻訳日:2023-07-10 14:06:45 公開日:2023-07-06
# ADASSM:画像からの統計的形状モデルにおける逆データ拡張 ADASSM: Adversarial Data Augmentation in Statistical Shape Models From Images ( http://arxiv.org/abs/2307.03273v1 ) ライセンス: Link先を確認	Mokshagna Sai Teja Karanam, Tushar Kataria and Shireen Elhabian	(参考訳) 統計的形状モデル (SSM) は, 個体群全体の解剖学的変化を識別するための優れたツールとして確立されている。形状モデルは、与えられたコホート内のすべてのサンプルに対して一貫した形状表現を使用し、形状を比較し、病理を検出できるバリエーションを特定し、治療計画を定式化するのに役立ちます。医用画像では、これらの形状表現をCT/MRIスキャンから計算するには、解剖学的セグメンテーションアノテーション、登録、テクスチャデノイングを含む時間集約的な前処理操作が必要となる。深層学習モデルは、容積画像から直接形状表現を学習する際、例外的な能力を示し、高効率で効率的な画像からSSMへと導く。それでもこれらのモデルはデータ不足であり、医療データの入手が限られているため、ディープラーニングモデルは過度に適合する傾向にある。形状拡張されたサンプルを生成するためにカーネル密度推定(KDE)法を用いるオフラインデータ拡張技術は、従来のSSM法と同等の精度で画像からSSMネットワークを支援することに成功した。しかし,これらの拡張手法は形状向上に重点を置いているのに対し,深層学習モデルは準最適モデルにおけるテクスチャバイアスの結果を示す。本稿では,データ依存型ノイズ生成やテクスチャ拡張を利用して,画像間SSMフレームワークのオンザフライデータ拡張のための新しい戦略を提案する。提案するフレームワークは,画像対ssmネットワークの敵として訓練され,多様で難解なサンプルを補完する。提案手法は,画素値のみに頼らず,基礎となる幾何学に焦点をあてることにより,精度の向上を実現する。 Statistical shape models (SSM) have been well-established as an excellent tool for identifying variations in the morphology of anatomy across the underlying population. Shape models use consistent shape representation across all the samples in a given cohort, which helps to compare shapes and identify the variations that can detect pathologies and help in formulating treatment plans. In medical imaging, computing these shape representations from CT/MRI scans requires time-intensive preprocessing operations, including but not limited to anatomy segmentation annotations, registration, and texture denoising. Deep learning models have demonstrated exceptional capabilities in learning shape representations directly from volumetric images, giving rise to highly effective and efficient Image-to-SSM. Nevertheless, these models are data-hungry and due to the limited availability of medical data, deep learning models tend to overfit. 
Offline data augmentation techniques that use kernel density estimation (KDE) based methods to generate shape-augmented samples have successfully helped Image-to-SSM networks achieve accuracy comparable to traditional SSM methods. However, these augmentation methods focus on shape augmentation, whereas deep learning models exhibit a texture bias that results in sub-optimal models. This paper introduces a novel strategy for on-the-fly data augmentation for the Image-to-SSM framework by leveraging data-dependent noise generation or texture augmentation. The proposed framework is trained as an adversary to the Image-to-SSM network, augmenting diverse and challenging noisy samples. Our approach achieves improved accuracy by encouraging the model to focus on the underlying geometry rather than relying solely on pixel values.	翻訳日:2023-07-10 14:06:27 公開日:2023-07-06
# 量子プロセッサのためのハイブリッド量子-古典的生成逆数ネットワーク A Hybrid Quantum-Classical Generative Adversarial Network for Near-Term Quantum Processors ( http://arxiv.org/abs/2307.03269v1 ) ライセンス: Link先を確認	Albha O'Dwyer Boyle and Reza Nikandish	(参考訳) 本稿では,近距離量子プロセッサのためのハイブリッド量子古典生成逆数ネットワーク(GAN)を提案する。ハイブリッドGANは、ジェネレータと識別器量子ニューラルネットワーク(QNN)とを備える。生成ネットワークは、角符号化量子回路と変分量子アンサッツを用いて実現される。識別器ネットワークは、多段トレーニング可能な量子回路を用いて実現される。 QNNでは,その深度を制御し,精度と回路複雑度を妥協するモジュール設計手法が提案されている。ジェネレータと判別器ネットワークの損失関数の勾配は、その実装に使用される同じ量子回路を用いて導出される。これにより、余分な量子回路や補助量子ビットが不要になる。量子シミュレーションはIBM Qiskitオープンソースソフトウェア開発キット(SDK)を用いて行われ、ハイブリッド量子古典的GANのトレーニングは、古典的コンピュータ上でのミニバッチ確率勾配勾配(SGD)最適化を用いて行われる。ハイブリッド量子古典的GANは、異なる識別器ネットワーク構造を持つ2量子システムを用いて実装される。 5段階の判別器ネットワークを用いて実現されたハイブリッドGANは、63個の量子ゲートと31個のトレーニング可能なパラメータから構成され、実データ分布と生成されたデータ分布の類似性をそれぞれ0.39および4.16のKullback-Leibler(KL)とJensen-Shannon(JS)の発散スコアを達成する。 In this article, we present a hybrid quantum-classical generative adversarial network (GAN) for near-term quantum processors. The hybrid GAN comprises a generator and a discriminator quantum neural network (QNN). The generator network is realized using an angle encoding quantum circuit and a variational quantum ansatz. The discriminator network is realized using multi-stage trainable encoding quantum circuits. A modular design approach is proposed for the QNNs which enables control on their depth to compromise between accuracy and circuit complexity. Gradient of the loss functions for the generator and discriminator networks are derived using the same quantum circuits used for their implementation. This prevents the need for extra quantum circuits or auxiliary qubits. The quantum simulations are performed using the IBM Qiskit open-source software development kit (SDK), while the training of the hybrid quantum-classical GAN is conducted using the mini-batch stochastic gradient descent (SGD) optimization on a classic computer. 
The hybrid quantum-classical GAN is implemented using a two-qubit system with different discriminator network structures. The hybrid GAN realized using a five-stage discriminator network comprises 63 quantum gates and 31 trainable parameters, and achieves Kullback-Leibler (KL) and Jensen-Shannon (JS) divergence scores of 0.39 and 4.16, respectively, for the similarity between the real and generated data distributions.	翻訳日:2023-07-10 14:05:57 publication date placeholder
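The two similarity scores named in the abstract above, the Kullback-Leibler and Jensen-Shannon divergences, can be computed for discrete distributions as sketched below. This is a generic illustration using natural logarithms. The abstract does not say which logarithm base, binning, or scaling the authors used, and the example distributions are made up.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever
    p_i is positive, otherwise the divergence is infinite.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical "real" and "generated" histograms over two outcomes.
real = [0.5, 0.5]
generated = [0.9, 0.1]

kl_score = kl_divergence(real, generated)
js_score = js_divergence(real, generated)
```

Unlike KL, the JS divergence is symmetric in its arguments and stays finite even when one distribution assigns zero probability to an outcome the other supports.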
# 前立腺イメージングにおけるセグメンテーション基礎モデルの実証解析 Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging ( http://arxiv.org/abs/2307.03266v1 ) ライセンス: Link先を確認	Heejong Kim, Victor Ion Butoi, Adrian V. Dalca, Mert R. Sabuncu	(参考訳) 医療画像セグメンテーションの最先端技術のほとんどは、ディープラーニングモデルに依存している。しかしながら、これらのモデルは、しばしば、高価なラベル付きデータセットを必要とする教師付き方法で、狭義のタスクで訓練される。自然言語生成などの機械学習領域の最近の進歩は、ラベル付きデータはほとんどなく、下流の様々なタスクにカスタマイズ可能な基礎モデルの構築の実現可能性と有用性を示している。これは、基礎モデルがこの分野の未来を形作ることを期待する医療画像のパラダイムシフトである可能性が高い。本稿では,最近開発された医用画像分割の基礎モデル universeg について述べる。本研究では,前立腺画像の文脈で経験的評価を行い,従来のタスク固有セグメンテーションモデルの訓練手法と比較する。本研究は, 医用画像セグメンテーションの基礎モデルの開発と導入において重要となるいくつかの重要な要因について考察した。 Most state-of-the-art techniques for medical image segmentation rely on deep-learning models. These models, however, are often trained on narrowly-defined tasks in a supervised fashion, which requires expensive labeled datasets. Recent advances in several machine learning domains, such as natural language generation have demonstrated the feasibility and utility of building foundation models that can be customized for various downstream tasks with little to no labeled data. This likely represents a paradigm shift for medical imaging, where we expect that foundation models may shape the future of the field. In this paper, we consider a recently developed foundation model for medical image segmentation, UniverSeg. We conduct an empirical evaluation study in the context of prostate imaging and compare it against the conventional approach of training a task-specific segmentation model. Our results and discussion highlight several important factors that will likely be important in the development and adoption of foundation models for medical image segmentation.	翻訳日:2023-07-10 14:05:29 公開日:2023-07-06
# Vision Language Transformers: 調査 Vision Language Transformers: A Survey ( http://arxiv.org/abs/2307.03254v1 ) ライセンス: Link先を確認	Clayton Fields, Casey Kennington	(参考訳) 画像に関する質問に答えたり、画像を記述するキャプションを生成したりするといった視覚言語タスクは、コンピュータが実行するのが難しいタスクである。比較的最近の一連の研究は、Vaswani et al. (2017) で導入された事前訓練されたトランスフォーマーアーキテクチャを視覚言語モデリングに応用した。トランスフォーマーモデルは、以前のビジョン言語モデルよりも性能と汎用性を大幅に改善した。大規模なジェネリックデータセットでモデルを事前トレーニングし、アーキテクチャやパラメータ値に小さな変更を加えることで、学習を新しいタスクに移す。この種の伝達学習は、自然言語処理とコンピュータビジョンの両方において標準モデリングの実践となっている。視覚言語トランスフォーマーは、視覚と言語の両方を必要とするタスクで同様の進歩を生み出すことを約束する。本稿では,現在利用可能な視覚言語トランスフォーマーモデルに関する幅広い研究の合成を行い,その強み,限界,未解決の疑問について分析する。 Vision language tasks, such as answering questions about or generating captions that describe an image, are difficult tasks for computers to perform. A relatively recent body of research has adapted the pretrained transformer architecture introduced in Vaswani et al. (2017) to vision language modeling. Transformer models have greatly improved performance and versatility over previous vision language models. They do so by pretraining models on large generic datasets and transferring their learning to new tasks with minor changes in architecture and parameter values. This type of transfer learning has become the standard modeling practice in both natural language processing and computer vision. Vision language transformers offer the promise of producing similar advancements in tasks which require both vision and language. In this paper, we provide a broad synthesis of the currently available research on vision language transformer models and offer some analysis of their strengths, limitations and some open questions that remain.	翻訳日:2023-07-10 14:05:16 公開日:2023-07-06
# InfoSync:多言語半構造化テーブル間の情報同期 InfoSync: Information Synchronization across Multilingual Semi-structured Tables ( http://arxiv.org/abs/2307.03313v1 ) ライセンス: Link先を確認	Siddharth Khincha, Chelsi Jain, Vivek Gupta, Tushar Kataria, Shuo Zhang	(参考訳) 言語間の半構造化データの情報同期は困難である。例えば、ある言語のウィキペディアテーブルは言語間で同期する必要がある。この問題に対処するために,新しいデータセットInfoSyncと2段階の表同期手法を導入する。 InfoSyncには14言語にまたがる100Kのエンティティ中心テーブル(Wikipedia Infobox)が含まれており、サブセット(3.5Kペア)が手動で注釈付けされている。提案手法は、1) 行を対応付ける情報アライメントと、2) 多言語テーブル間で対応付けられたテーブルの欠落・古い情報を更新する情報更新からなる。 InfoSyncで評価すると、情報アライメントはF1スコア87.91(en <-> non-en)を達成する。情報更新を評価するため,603のテーブル対に対してInfoboxesで人手によるウィキペディア編集を行う。本手法はウィキペディア上で77.28%の受け入れ率を示し,提案手法の有効性を示した。 Information Synchronization of semi-structured data across languages is challenging. For instance, Wikipedia tables in one language should be synchronized across languages. To address this problem, we introduce a new dataset InfoSync and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (3.5K pairs) are manually annotated. The proposed method includes 1) Information Alignment to map rows and 2) Information Update for updating missing/outdated information for aligned tables across multilingual tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method.	翻訳日:2023-07-10 13:59:01 公開日:2023-07-06
# スカラーおよびベクトルデータに対する球面調和表現の不変性、等分散、相関および畳み込みについて On Invariance, Equivariance, Correlation and Convolution of Spherical Harmonic Representations for Scalar and Vectorial Data ( http://arxiv.org/abs/2307.03311v1 ) ライセンス: Link先を確認	Janis Keuper	(参考訳) Spherical Harmonic (SH) ドメインにおけるデータの数学的表現は、最近、機械学習コミュニティへの関心が高まっている。この技術報告では、SH表現の理論的基礎と実践的な実装について詳細に紹介し、回転不変および同変特性に関する研究を要約するとともに、球面上の信号の畳み込みと正確な相関について述べる。拡張において、これらの手法はスカラーSH表現からベクトル調和(VH)へ一般化され、球面上の3次元ベクトル場にも同様の機能を与える。 The mathematical representations of data in the Spherical Harmonic (SH) domain has recently regained increasing interest in the machine learning community. This technical report gives an in-depth introduction to the theoretical foundation and practical implementation of SH representations, summarizing works on rotation invariant and equivariant features, as well as convolutions and exact correlations of signals on spheres. In extension, these methods are then generalized from scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3d vector fields on spheres	翻訳日:2023-07-10 13:58:45 公開日:2023-07-06
# 機械学習による積分可能な量子多体系のダイナミクスの探索 Finding the Dynamics of an Integrable Quantum Many-Body System via Machine Learning ( http://arxiv.org/abs/2307.03310v1 ) ライセンス: Link先を確認	Victor Wei, Alev Orfi, Felix Fehse, W. A. Coish	(参考訳) 学習手法を用いて,ガウディン磁石(中心スピンモデル)の力学について検討する。このモデルは、例えば、環境スピンの大きな浴と相互作用する中心スピンの非マルコフ非コヒーレンスダイナミクスの研究や非平衡超伝導の研究など、実用上重要なものである。ガウディン磁石もまた可積分であり、多くの保存量を認めている:$N$スピンに対して、モデルハミルトニアンは$N$独立通勤作用素の和として書くことができる。この高次対称性にもかかわらず、この多体問題の力学に対する一般閉形式解析解はいまだ解明されていない。機械学習手法は、明示的な解析解が明らかでない場合でも、可積分問題における高次対称性を利用するのに適している。この直観に動機づけられ、モデルハミルトニアンの各変分固有状態に対してニューラルネットワーク表現(制限ボルツマン機械)を用いる。次に、変分モンテカルロ計算により、ガウディン・マグネットハミルトニアンの基底状態と低次励起状態の正確な表現を得る。低次固有状態から、スピン浴の存在下での時間変化する横磁場に対する中心スピンの線形応答を記述する非摂動動的横スピン感受性を求める。この感受性を効率的に記述することは、量子2レベルシステムの環境と相互作用する量子ビットのキャラクタリゼーションと量子制御手順を改善するための扉を開く。これらのシステムには、超微粒子相互作用を介して環境核スピンと相互作用する電子スピンおよびホールスピン量子ビットや、コヒーレント電荷または常磁性不純物と相互作用する自由度を持つ量子ビットが含まれる。 We study the dynamics of the Gaudin magnet ("central-spin model") using machine-learning methods. This model is of practical importance, e.g., for studying non-Markovian decoherence dynamics of a central spin interacting with a large bath of environmental spins and for studies of nonequilibrium superconductivity. The Gaudin magnet is also integrable, admitting many conserved quantities: For $N$ spins, the model Hamiltonian can be written as the sum of $N$ independent commuting operators. Despite this high degree of symmetry, a general closed-form analytic solution for the dynamics of this many-body problem remains elusive. Machine-learning methods may be well suited to exploiting the high degree of symmetry in integrable problems, even when an explicit analytic solution is not obvious. Motivated in part by this intuition, we use a neural-network representation (restricted Boltzmann machine) for each variational eigenstate of the model Hamiltonian. 
We then obtain accurate representations of the ground state and of the low-lying excited states of the Gaudin-magnet Hamiltonian through a variational Monte Carlo calculation. From the low-lying eigenstates, we find the non-perturbative dynamic transverse spin susceptibility, describing the linear response of a central spin to a time-varying transverse magnetic field in the presence of a spin bath. Having an efficient description of this susceptibility opens the door to improved characterization and quantum control procedures for qubits interacting with an environment of quantum two-level systems. These systems include electron-spin and hole-spin qubits interacting with environmental nuclear spins via hyperfine interactions or qubits with charge or flux degrees of freedom interacting with coherent charge or paramagnetic impurities.	翻訳日:2023-07-10 13:58:34 公開日:2023-07-06
# 高協調光学系における熱的相互変調バックアクション Thermal intermodulation backaction in a high-cooperativity optomechanical system ( http://arxiv.org/abs/2307.03309v1 ) ライセンス: Link先を確認	Christian M. Pluchar, Aman R. Agrawal, Dalziel J. Wilson	(参考訳) テザリングナノメカニカル共振器を用いた室温量子光力学の追求は、外部の機械的自由度による厳密な課題に直面している。重要な例は熱相互変調ノイズ(TIN)であり、熱雑音ピークの混合によって生じる余分な光学ノイズの一種である。 TINは光場の位相から切り離すことができるが、放射圧によって間接的に結合し、ショットノイズを圧倒する可能性のある隠れたバックアクションの源を示唆している。本稿では,Fabry-Pérot型キャビティに結合した音響周波数Si$_3$N$_4$トランポリンからなる高協調性の室温キャビティ光機械系におけるTINのバックアクションを観察する。観測したバックアクションは、熱運動がキャビティ線幅の10分の1であるにもかかわらず、熱雑音を20 dB、放射圧ショットノイズを40 dB上回る。この結果は、TINの緩和が、様々な現代の光機械系において室温から量子状態に到達する上で重要であることを示唆している。 The pursuit of room temperature quantum optomechanics with tethered nanomechanical resonators faces stringent challenges owing to extraneous mechanical degrees of freedom. An important example is thermal intermodulation noise (TIN), a form of excess optical noise produced by mixing of thermal noise peaks. While TIN can be decoupled from the phase of the optical field, it remains indirectly coupled via radiation pressure, implying a hidden source of backaction that might overwhelm shot noise. Here we report observation of TIN backaction in a high-cooperativity, room temperature cavity optomechanical system consisting of an acoustic-frequency Si$_3$N$_4$ trampoline coupled to a Fabry-Pérot cavity. The backaction we observe exceeds thermal noise by 20 dB and radiation pressure shot noise by 40 dB, despite the thermal motion being 10 times smaller than the cavity linewidth. Our results suggest that mitigating TIN may be critical to reaching the quantum regime from room temperature in a variety of contemporary optomechanical systems.	翻訳日:2023-07-10 13:58:06 公開日:2023-07-06
# When Fair Classification Meets Noisy Protected Attributes ( http://arxiv.org/abs/2307.03306v1 ) License: see link	Avijit Ghosh, Pablo Kvitca, Christo Wilson	The operationalization of algorithmic fairness comes with several practical challenges, not the least of which is the availability or reliability of protected attributes in datasets. In real-world contexts, practical and legal impediments may prevent the collection and use of demographic data, making it difficult to ensure algorithmic fairness. While initial fairness algorithms did not consider these limitations, recent proposals aim to achieve algorithmic fairness in classification by incorporating noisiness in protected attributes or not using protected attributes at all. To the best of our knowledge, this is the first head-to-head study of fair classification algorithms to compare attribute-reliant, noise-tolerant and attribute-blind algorithms along the dual axes of predictivity and fairness. We evaluated these algorithms via case studies on four real-world datasets and synthetic perturbations. Our study reveals that attribute-blind and noise-tolerant fair classifiers can potentially achieve a similar level of performance as attribute-reliant algorithms, even when protected attributes are noisy. However, implementing them in practice requires careful nuance. Our study provides insights into the practical implications of using fair classification algorithms in scenarios where protected attributes are noisy or partially available.	Translated: 2023-07-10 13:57:47 Published: 2023-07-06
# A Vulnerability of Attribution Methods Using Pre-Softmax Scores ( http://arxiv.org/abs/2307.03305v1 ) License: see link	Miguel Lerma and Mirtha Lucas	We discuss a vulnerability involving a category of attribution methods used to provide explanations for the outputs of convolutional neural networks working as classifiers. It is known that networks of this type are vulnerable to adversarial attacks, in which imperceptible perturbations of the input may alter the outputs of the model. In contrast, here we focus on effects that small modifications in the model may cause on the attribution method without altering the model outputs.	Translated: 2023-07-10 13:57:25 Published: 2023-07-06
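
One simple way such an effect can arise is that softmax ignores any term added equally to every logit, while a gradient taken on a pre-softmax score does not. The sketch below illustrates only that general fact and is not necessarily the authors' construction; the toy one-dimensional "model" and the names `logits_original`, `logits_modified`, and `numerical_gradient` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_original(x):
    """Hypothetical toy classifier: two class logits for a scalar input x."""
    return [2.0 * x, -1.0 * x]


def logits_modified(x):
    """Same classifier with one input-dependent term added to every logit."""
    shift = 5.0 * x
    return [z + shift for z in logits_original(x)]


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


x = 0.3
# The class probabilities (the model outputs) are the same for both models.
print(softmax(logits_original(x)), softmax(logits_modified(x)))
# The gradient of the class-0 pre-softmax score differs between the models.
print(numerical_gradient(lambda t: logits_original(t)[0], x))
print(numerical_gradient(lambda t: logits_modified(t)[0], x))
```

An attribution map computed from the pre-softmax score would therefore change under this modification, whereas one computed from the post-softmax probability would not.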
# Equivariant Spherical CNN for Data Efficient and High-Performance Medical Image Processing ( http://arxiv.org/abs/2307.03298v1 ) License: see link	Amirreza Hashemi, Yuemeng Feng, Hamid Sabet	This work highlights the significance of equivariant networks as efficient and high-performance approaches for tomography applications. Our study is motivated by the limitations of Convolutional Neural Networks (CNNs), which have shown promise in post-processing various medical imaging systems. However, the efficiency of conventional CNNs relies heavily on an undiminished and proper training set. To tackle this issue, in this study, we introduce an equivariant network, aiming to reduce the CNN's dependency on specific training sets. We evaluate the efficacy of equivariant CNNs on spherical signals for tomographic medical imaging problems. Our results demonstrate superior quality and computational efficiency of spherical CNNs (SCNNs) in denoising and reconstructing benchmark problems. Furthermore, we propose a novel approach to employ SCNNs as a complement to conventional image reconstruction tools, enhancing the outcomes while reducing reliance on the training set. Across all cases, we observe a significant decrease in computational costs while maintaining the same or higher quality of image processing using SCNNs compared to CNNs. Additionally, we explore the potential of this network for broader tomography applications, particularly those requiring omnidirectional representation.	Translated: 2023-07-10 13:57:18 Published: 2023-07-06
# Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment ( http://arxiv.org/abs/2307.03296v1 ) License: see link	Aref Farhadipour and Hadi Veisi	Dysarthria is a disability that causes a disturbance in the human speech system and reduces the quality and intelligibility of a person's speech. Because of this effect, normal speech processing systems cannot work properly on impaired speech. This disability is usually associated with physical disabilities. Therefore, designing a system that can perform some tasks by receiving voice commands in the smart home can be a significant achievement. In this work, we introduce the gammatonegram as an effective method to represent audio files with discriminative details, which is used as input for the convolutional neural network. In other words, we convert each speech file into an image and propose an image recognition system to classify speech in different scenarios. The proposed CNN is based on the transfer learning method on the pre-trained AlexNet. In this research, the efficiency of the proposed system for speech recognition, speaker identification, and intelligibility assessment is evaluated. According to the results on the UA dataset, the proposed speech recognition system achieved 91.29% accuracy in speaker-dependent mode, the speaker identification system acquired 87.74% accuracy in text-dependent mode, and the intelligibility assessment system achieved 96.47% accuracy in two-class mode. Finally, we propose a multi-network speech recognition system that works fully automatically. This system is arranged in cascade with the two-class intelligibility assessment system, and the output of the assessment system activates one of the speech recognition networks. This architecture achieves an accuracy of 92.3% WRR. The source code of this paper is available.	Translated: 2023-07-10 13:57:02 Published: 2023-07-06
# CheXmask: a large-scale dataset of anatomical segmentation masks for multi-center chest x-ray images ( http://arxiv.org/abs/2307.03293v1 ) License: see link	Nicolás Gaggion, Candelaria Mosquera, Lucas Mansilla, Martina Aineseder, Diego H. Milone, Enzo Ferrante	The development of successful artificial intelligence models for chest X-ray analysis relies on large, diverse datasets with high-quality annotations. While several databases of chest X-ray images have been released, most include disease diagnosis labels but lack detailed pixel-level anatomical segmentation labels. To address this gap, we introduce an extensive chest X-ray multi-center segmentation dataset with uniform and fine-grained anatomical annotations for images coming from six well-known publicly available databases: CANDID-PTX, ChestX-ray8, CheXpert, MIMIC-CXR-JPG, PadChest, and VinDr-CXR, resulting in 676,803 segmentation masks. Our methodology utilizes the HybridGNet model to ensure consistent and high-quality segmentations across all datasets. Rigorous validation, including expert physician evaluation and automatic quality control, was conducted to validate the resulting masks. Additionally, we provide individualized quality indices per mask and an overall quality estimation per dataset. This dataset serves as a valuable resource for the broader scientific community, streamlining the development and assessment of innovative methodologies in chest X-ray analysis. The CheXmask dataset is publicly available at: https://physionet.org/content/chexmask-cxr-segmentation-data/.	Translated: 2023-07-10 13:56:36 Published: 2023-07-06
# Identifying overparameterization in Quantum Circuit Born Machines ( http://arxiv.org/abs/2307.03292v1 ) License: see link	Andrea Delgado, Francisco Rios, Kathleen E. Hamilton	In machine learning, overparameterization is associated with qualitative changes in the empirical risk landscape, which can lead to more efficient training dynamics. For many parameterized models used in statistical learning, there exists a critical number of parameters, or model size, above which the model is constructed and trained in the overparameterized regime. There are many characteristics of overparameterized loss landscapes. The most significant is the convergence of standard gradient descent to global or local minima of low loss. In this work, we study the onset of overparameterization transitions for quantum circuit Born machines, generative models that are trained using non-adversarial gradient-based methods. We observe that bounds based on numerical analysis are in general good lower bounds on the overparameterization transition. However, bounds based on the quantum circuit's algebraic structure are very loose upper bounds. Our results indicate that fully understanding the trainability of these models remains an open question.	Translated: 2023-07-10 13:56:08 Published: 2023-07-06
# ACDNet: Attention-guided Collaborative Decision Network for Effective Medication Recommendation ( http://arxiv.org/abs/2307.03332v1 ) License: see link	Jiacong Mi, Yi Zu, Zhuoyuan Wang, Jieyue He	Medication recommendation using Electronic Health Records (EHR) is challenging due to complex medical data. Current approaches extract longitudinal information from patient EHR to personalize recommendations. However, existing models often lack sufficient patient representation and overlook the importance of considering the similarity between a patient's medication records and specific medicines. Therefore, an Attention-guided Collaborative Decision Network (ACDNet) for medication recommendation is proposed in this paper. Specifically, ACDNet utilizes an attention mechanism and a Transformer to effectively capture patient health conditions and medication records by modeling their historical visits at both global and local levels. ACDNet also employs a collaborative decision framework, utilizing the similarity between medication records and medicine representation to facilitate the recommendation process. The experimental results on two extensive medical datasets, MIMIC-III and MIMIC-IV, clearly demonstrate that ACDNet outperforms state-of-the-art models in terms of Jaccard, PR-AUC, and F1 score, reaffirming its superiority. Moreover, the ablation experiments provide solid evidence of the effectiveness of each module in ACDNet, validating their contribution to the overall performance. Furthermore, a detailed case study reinforces the effectiveness of ACDNet in medication recommendation based on EHR data, showcasing its practical value in real-world healthcare scenarios.	Translated: 2023-07-10 13:47:53 Published: 2023-07-06
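
Two of the metrics named in the abstract above, Jaccard and F1, compare a recommended set of medications with the set actually prescribed. A minimal sketch of the usual set-based definitions follows, assuming one visit at a time; the medication codes are made up for illustration, and the paper's exact averaging over visits and patients is not specified here.

```python
def jaccard(predicted, actual):
    """Size of the intersection divided by size of the union of two sets."""
    predicted, actual = set(predicted), set(actual)
    union = predicted | actual
    # Convention assumed here: two empty sets count as a perfect match.
    return len(predicted & actual) / len(union) if union else 1.0


def f1_score(predicted, actual):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical medication codes for a single visit.
recommended = {"A", "B", "C"}
prescribed = {"B", "C", "D"}
print(jaccard(recommended, prescribed))
print(f1_score(recommended, prescribed))
```

With two of three recommended medications correct and one prescribed medication missed, the overlap is 2 and the union is 4, so Jaccard is 0.5 while precision and recall are both 2/3.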
# Facial Landmark Detection Evaluation on MOBIO Database ( http://arxiv.org/abs/2307.03329v1 ) License: see link	Na Zhang	MOBIO is a bi-modal database that was captured almost exclusively on mobile phones. It aims to improve research into deploying biometric techniques to mobile devices. Research has shown that face and speaker recognition can be performed in a mobile environment. Facial landmark localization aims at finding the coordinates of a set of pre-defined key points for 2D face images. A facial landmark usually has specific semantic meaning, e.g. nose tip or eye centre, which provides rich geometric information for other face analysis tasks such as face recognition, emotion estimation and 3D face reconstruction. Most facial landmark detection methods adopt still face databases, such as 300W, AFW, AFLW, or COFW, for evaluation, but seldom use mobile data. Our work is the first to perform facial landmark detection evaluation on mobile still data, i.e., face images from the MOBIO database. About 20,600 face images have been extracted from this audio-visual database and manually labeled with 22 landmarks as the ground truth. Several state-of-the-art facial landmark detection methods are adopted to evaluate their performance on these data. The results show that the data from the MOBIO database is quite challenging. This database can serve as a new, challenging benchmark for facial landmark detection evaluation.	Translated: 2023-07-10 13:47:30 Published: 2023-07-06
# Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays ( http://arxiv.org/abs/2307.03327v1 ) License: see link	Rajib Bhattacharjea, Nathan West	This work presents the first application of self-supervised learning to data from digital antenna arrays. Encoder-decoder networks are pretrained on digital array data to perform a self-supervised noisy-reconstruction task called channel in-painting, in which the network infers the contents of array data that has been masked with zeros. The self-supervised step requires no human-labeled data. The encoder architecture and weights from pretraining are then transferred to a new network with a task-specific decoder, and the new network is trained on a small volume of labeled data. We show that pretraining on the unlabeled data allows the new network to perform the task of bandwidth regression on the digital array data better than an equivalent network that is trained on the same labeled data from random initialization.	Translated: 2023-07-10 13:47:11 Published: 2023-07-06
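
The channel in-painting task described above needs only unlabeled array data: some of it is zeroed out and the untouched original is the reconstruction target. The sketch below shows how such a training pair could be built. It assumes that whole antenna channels are masked and uses real-valued samples for brevity, neither of which the abstract confirms; `mask_channels` and `make_inpainting_pair` are hypothetical names.

```python
import random


def mask_channels(array_data, masked_channels):
    """Return a copy in which the chosen channels are replaced by zeros."""
    return [
        [0.0] * len(samples) if ch in masked_channels else list(samples)
        for ch, samples in enumerate(array_data)
    ]


def make_inpainting_pair(array_data, num_masked, rng):
    """Build one (network input, reconstruction target) pair without labels."""
    masked = set(rng.sample(range(len(array_data)), num_masked))
    return mask_channels(array_data, masked), array_data, masked


# Toy array: 8 channels with 4 samples each (all values nonzero).
rng = random.Random(0)
data = [[float(ch * 10 + t) + 1.0 for t in range(4)] for ch in range(8)]
network_input, target, masked = make_inpainting_pair(data, 2, rng)
print(sorted(masked))
print(network_input)
```

An encoder-decoder would then be trained to map `network_input` back to `target`, after which the encoder could be reused for the downstream regression task.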
# Machine Learning to detect cyber-attacks and discriminating the types of power system disturbances ( http://arxiv.org/abs/2307.03323v1 ) License: see link	Diane Tuyizere and Remy Ihabwikuzo	This research proposes a machine learning-based attack detection model for power systems, specifically targeting smart grids. By utilizing data and logs collected from Phasor Measurement Units (PMUs), the model aims to learn system behaviors and effectively identify potential security boundaries. The proposed approach involves crucial stages including dataset pre-processing, feature selection, model creation, and evaluation. To validate our approach, we used a dataset consisting of 15 separate datasets obtained from different PMUs, relay snort alarms and logs. Three machine learning models (Random Forest, Logistic Regression, and K-Nearest Neighbour) were built and evaluated using various performance metrics. The findings indicate that the Random Forest model achieves the highest performance, with an accuracy of 90.56% in detecting power system disturbances, and has the potential to assist operators in decision-making processes.	Translated: 2023-07-10 13:46:55 Published: 2023-07-06
# BiPhone: Modeling Inter Language Phonetic Influences in Text ( http://arxiv.org/abs/2307.03322v1 ) License: see link	Abhirut Gupta, Ananya B. Sai, Richard Sproat, Yuri Vasilevski, James S. Ren, Ambarish Jash, Sukhdeep S. Sodhi, and Aravindan Raghuveer	A large number of people are forced to use the Web in a language they have low literacy in due to technology asymmetries. Written text in the second language (L2) from such users often contains a large number of errors that are influenced by their native language (L1). We propose a method to mine phoneme confusions (sounds in L2 that an L1 speaker is likely to conflate) for pairs of L1 and L2. These confusions are then plugged into a generative model (Bi-Phone) for synthetically producing corrupted L2 text. Through human evaluations, we show that Bi-Phone generates plausible corruptions that differ across L1s and also have widespread coverage on the Web. We also corrupt the popular language understanding benchmark SuperGLUE with our technique (FunGLUE for Phonetically Noised GLUE) and show that SoTA language understanding models perform poorly. We also introduce a new phoneme prediction pre-training task which helps byte models to recover performance close to SuperGLUE. Finally, we also release the FunGLUE benchmark to promote further research in phonetically robust language models. To the best of our knowledge, FunGLUE is the first benchmark to introduce L1-L2 interactions in text.	Translated: 2023-07-10 13:46:37 Published: 2023-07-06
# Quantum Entanglement & Purity Testing: A Graph Zeta Function Perspective ( http://arxiv.org/abs/2307.03321v1 ) License: see link	Zachary P. Bradshaw and Margarite L. LaBorde	We assign an arbitrary density matrix to a weighted graph and associate to it a graph zeta function that is both a generalization of the Ihara zeta function and a special case of the edge zeta function. We show that a recently developed bipartite pure state separability algorithm based on the symmetric group is equivalent to the condition that the coefficients in the exponential expansion of this zeta function are unity. Moreover, there is a one-to-one correspondence between the nonzero eigenvalues of a density matrix and the singularities of its zeta function. Several examples are given to illustrate these findings.	Translated: 2023-07-10 13:46:17 Published: 2023-07-06
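
As background for the purity condition mentioned above, a standard linear-algebra fact is that a density matrix ρ is pure exactly when Tr(ρ^k) = 1 for every k ≥ 1, and that a bipartite pure state is separable exactly when its reduced state is pure. The sketch below checks this for two-qubit states. It does not reproduce the paper's graph construction, and the helper names `reduced_density_matrix` and `power_traces` are hypothetical.

```python
from math import sqrt


def reduced_density_matrix(psi):
    """Reduced state of qubit A for a two-qubit pure state.

    psi lists the four amplitudes in the order |00>, |01>, |10>, |11>.
    """
    # Arrange the amplitudes as a 2x2 matrix c with c[a][b] = <ab|psi>.
    c = [[psi[0], psi[1]], [psi[2], psi[3]]]
    # Tracing out qubit B gives rho_A = c c^dagger.
    return [
        [sum(c[i][k] * complex(c[j][k]).conjugate() for k in range(2))
         for j in range(2)]
        for i in range(2)
    ]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def power_traces(rho, kmax):
    """Return [Tr(rho^1), ..., Tr(rho^kmax)] as real numbers."""
    traces, power = [], rho
    for _ in range(kmax):
        traces.append(sum(power[i][i] for i in range(len(power))).real)
        power = matmul(power, rho)
    return traces


s = 1 / sqrt(2)
product_state = [1, 0, 0, 0]  # |00>, separable
bell_state = [s, 0, 0, s]     # (|00> + |11>) / sqrt(2), entangled
print(power_traces(reduced_density_matrix(product_state), 3))
print(power_traces(reduced_density_matrix(bell_state), 3))
```

For the product state every power trace is 1, while for the Bell state the reduced state is maximally mixed and Tr(ρ^k) = 2^(1-k). These traces are also the coefficients in the identity 1/det(I - uρ) = exp(Σ_k Tr(ρ^k) u^k / k), whose singularities sit at the reciprocals of the nonzero eigenvalues of ρ; whether the paper's zeta function takes exactly this form is an assumption here.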
# Covering Uncommon Ground: Gap-Focused Question Generation for Answer Assessment ( http://arxiv.org/abs/2307.03319v1 ) License: see link	Roni Rabin, Alexandre Djerbetian, Roee Engelberg, Lidan Hackmon, Gal Elidan, Reut Tsarfaty, Amir Globerson	Human communication often involves information gaps between the interlocutors. For example, in an educational dialogue, a student often provides an answer that is incomplete, and there is a gap between this answer and the perfect one expected by the teacher. Successful dialogue then hinges on the teacher asking about this gap in an effective manner, thus creating a rich and interactive educational experience. We focus on the problem of generating such gap-focused questions (GFQs) automatically. We define the task, highlight key desired aspects of a good GFQ, and propose a model that satisfies these. Finally, we provide an evaluation by human annotators of our generated questions compared against human-generated ones, demonstrating competitive performance.	Translated: 2023-07-10 13:46:07 Published: 2023-07-06
# Assisting Clinical Decisions for Scarcely Available Treatment via Disentangled Latent Representation ( http://arxiv.org/abs/2307.03315v1 ) License: see link	Bing Xue, Ahmed Sameh Said, Ziqi Xu, Hanyang Liu, Neel Shah, Hanqing Yang, Philip Payne, Chenyang Lu	Extracorporeal membrane oxygenation (ECMO) is an essential life-supporting modality for COVID-19 patients who are refractory to conventional therapies. However, the proper treatment decision has been the subject of significant debate, and it remains controversial who benefits from this scarcely available and technically complex treatment option. To support clinical decisions, there is a critical need to predict the treatment need and the potential treatment and no-treatment responses. Targeting this clinical challenge, we propose Treatment Variational AutoEncoder (TVAE), a novel approach for individualized treatment analysis. TVAE is specifically designed to address the modeling challenges of treatments like ECMO, which have strong treatment selection bias and scarce treatment cases. TVAE conceptualizes the treatment decision as a multi-scale problem. We model a patient's potential treatment assignment and the factual and counterfactual outcomes as part of their intrinsic characteristics that can be represented by a deep latent variable model. The factual and counterfactual prediction errors are alleviated via a reconstruction regularization scheme together with semi-supervision, and the selection bias and the scarcity of treatment cases are mitigated by the disentangled and distribution-matched latent space and the label-balancing generative strategy. We evaluate TVAE on two real-world COVID-19 datasets: an international dataset collected from 1651 hospitals across 63 countries, and an institutional dataset collected from 15 hospitals. The results show that TVAE outperforms state-of-the-art treatment effect models in predicting both the propensity scores and factual outcomes on heterogeneous COVID-19 datasets. Additional experiments also show TVAE outperforms the best existing models in individual treatment effect estimation on the synthesized IHDP benchmark dataset.	Translated: 2023-07-10 13:45:56 Published: 2023-07-06
# Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning ( http://arxiv.org/abs/2307.03591v1 ) License: see link	Ke Liang, Sihang Zhou, Yue Liu, Lingyuan Meng, Meng Liu, Xinwang Liu	Multimodal knowledge graphs (MKGs), which intuitively organize information in various modalities, can benefit multiple practical downstream tasks, such as recommendation systems and visual question answering. However, most MKGs are still far from complete, which motivates the flourishing of MKG reasoning models. Recently, with the development of general artificial architectures, pretrained transformer models have drawn increasing attention, especially for multimodal scenarios. However, research on multimodal pretrained transformers (MPT) for knowledge graph reasoning (KGR) is still at an early stage. As the biggest difference between MKG and other multimodal data, the rich structural information underlying the MKG still cannot be fully leveraged in existing MPT models. Most of them only utilize the graph structure as a retrieval map for matching images and texts connected with the same entity. This practice hinders their reasoning performance. To this end, we propose the graph Structure Guided Multimodal Pretrained Transformer for knowledge graph reasoning, termed SGMPT. Specifically, the graph structure encoder is adopted for structural feature encoding. Then, a structure-guided fusion module with two different strategies, i.e., weighted summation and alignment constraint, is first designed to inject the structural information into both the textual and visual features. To the best of our knowledge, SGMPT is the first MPT model for multimodal KGR that mines the structural information underlying the knowledge graph. Extensive experiments on FB15k-237-IMG and WN18-IMG demonstrate that our SGMPT outperforms existing state-of-the-art models and prove the effectiveness of the designed strategies.	Translated: 2023-07-10 12:20:58 Published: 2023-07-06
# Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages ( http://arxiv.org/abs/2307.03679v1 ) License: see link	Shreyanth S	By incorporating the undecimated wavelet transform within a Word Embedded Semantic Marginal Autoencoder (WESMA), this research study provides a novel strategy for improving security measures and denoising multiple languages. The incorporation of these strategies is intended to address the issues of robustness, privacy, and multilingualism in data processing applications. The undecimated wavelet transform is used as a feature extraction tool to identify prominent language patterns and structural qualities in the input data. The proposed system may successfully capture significant information while preserving the temporal and geographical links within the data by employing this transform. This improves security measures by increasing the system's ability to detect abnormalities, discover hidden patterns, and distinguish between legitimate content and dangerous threats. The Word Embedded Semantic Marginal Autoencoder also functions as an intelligent framework for dimensionality and noise reduction. The autoencoder effectively learns the underlying semantics of the data and reduces noise components by exploiting word embeddings and semantic context. As a result, data quality and accuracy are increased in subsequent processing stages. The suggested methodology is tested using a diversified dataset that includes several languages and security scenarios. The experimental results show that the proposed approach is effective in attaining security enhancement and denoising capabilities across multiple languages. The system is strong in dealing with linguistic variances, producing consistent outcomes regardless of the language used. Furthermore, incorporating the undecimated wavelet transform considerably improves the system's ability to efficiently address complex security concerns.	Translated: 2023-07-10 12:01:48 Published: 2023-07-06
# Frontier AI Regulation: Managing Emerging Risks to Public Safety ( http://arxiv.org/abs/2307.03718v1 ) License: see link	Markus Anderljung, Joslyn Barnhart, Jade Leung, Anton Korinek, Cullen O'Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, Kevin Wolf	Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term "frontier AI" models: highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety. Frontier AI models pose a distinct regulatory challenge: dangerous capabilities can arise unexpectedly; it is difficult to robustly prevent a deployed model from being misused; and it is difficult to stop a model's capabilities from proliferating broadly. To address these challenges, at least three building blocks for the regulation of frontier models are needed: (1) standard-setting processes to identify appropriate requirements for frontier AI developers, (2) registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and (3) mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models. Industry self-regulation is an important first step. However, wider societal discussions and government intervention will be needed to create standards and to ensure compliance with them. We consider several options to this end, including granting enforcement powers to supervisory authorities and licensure regimes for frontier AI models. Finally, we propose an initial set of safety standards. These include conducting pre-deployment risk assessments; external scrutiny of model behavior; using risk assessments to inform deployment decisions; and monitoring and responding to new information about model capabilities and uses post-deployment. We hope this discussion contributes to the broader conversation on how to balance public safety risks and innovation benefits from advances at the frontier of AI development.	Translated: 2023-07-10 11:52:51 Published: 2023-07-06
# レーザと機械学習モデルを用いた鋼表面粗さパラメータ計算 Steel Surface Roughness Parameter Calculations Using Lasers and Machine Learning Models ( http://arxiv.org/abs/2307.03723v1 ) ライセンス: Link先を確認	Alex Milne, Xianghua Xie	(参考訳) 鋼板の表面性状の制御は, 亜鉛めっきおよび調質圧延プロセスにおける顧客の要求を満たすために不可欠である。従来の方法はポストプロダクションのスタイラス測定に依存し、オンライン技術はストリップ全体の非接触およびリアルタイム計測を提供する。しかし, 製造パイプラインの有効利用には, 正確な測定の確保が不可欠である。さらに、正確なオンライン測定により製造工程パラメータのリアルタイム調整が可能となり、一貫性のある品質とテンパーミルのクローズドループ制御が可能となる。本研究では,最先端の機械学習モデルを用いて,オンライン計測からはるかに高精度なRa表面粗さ指標への変換を改善する。深層学習法と非深層学習法の両方を含むデータ駆動型アプローチの選択を閉形式変換と比較することにより, 調質圧延帯鋼製造における表面テクスチャ制御の改善の可能性を評価する。 Control of surface texture in strip steel is essential to meet customer requirements during galvanizing and temper rolling processes. Traditional methods rely on post-production stylus measurements, while on-line techniques offer non-contact and real-time measurements of the entire strip. However, ensuring accurate measurement is imperative for their effective utilization in the manufacturing pipeline. Moreover, accurate on-line measurements enable real-time adjustments of manufacturing processing parameters during production, ensuring consistent quality and the possibility of closed-loop control of the temper mill. In this study, we leverage state-of-the-art machine learning models to enhance the transformation of on-line measurements into a significantly more accurate Ra surface roughness metric. By comparing a selection of data-driven approaches, including both deep learning and non-deep learning methods, to the closed-form transformation, we evaluate their potential for improving surface texture control in temper strip steel manufacturing.	翻訳日:2023-07-10 11:40:51 公開日:2023-07-06
# エアフォイル GAN:空力形状最適化のためのエアフォイルのエンコードと合成 Airfoil GAN: Encoding and Synthesizing Airfoils for Aerodynamic Shape Optimization ( http://arxiv.org/abs/2101.04757v2 ) ライセンス: Link先を確認	Yuyang Wang, Kenji Shimada, Amir Barati Farimani	(参考訳) エアフォイルのような空力形状の現在の設計は、可能な設計空間を探索するための計算集約的なシミュレーションを伴う。通常、このような設計は設計パラメータの事前定義に依存し、新しい形状の合成に制限を課す。本研究では,既存の翼から表現を自動的に学習し,学習した表現を用いて新しい翼を生成するデータ駆動型形状符号化・生成法を提案する。これらの表現は、空気力学的性能に基づいて合成翼形状の最適化に使用される。我々のモデルは、変分オートエンコーダとジェネレーティブ・アドバーサリアル・ネットワークを組み合わせたニューラルネットワークであるVAEGANに基づいて構築されており、勾配に基づく手法で訓練されている。本モデルでは,(1)既存のエアフォイルを潜在ベクターにエンコードし,それからエアフォイルを再構築し,(2)潜在ベクターをランダムにサンプリングしてエアフォイル座標領域にマッピングし,(3)学習した特徴を遺伝的アルゴリズムにより最適化し,所望の空力特性を有するエアフォイルを合成する。実験の結果,事前定義された設計パラメータを使わずに,形状情報を網羅的かつ包括的に符号化できることがわかった。特徴ベクトルの補間/補間またはガウス雑音からのサンプリングにより、モデルは、モデル訓練のために使用される翼と競合する、あるいはより優れた空力特性を持つ、新しい翼形状を自動的に合成することができる。遺伝的アルゴリズムによって学習された潜在領域の形状を最適化することで、合成された翼は空力特性をターゲットに進化することができる。これは効率のよい学習ベースの翼設計の枠組みを示し、潜水領域の翼を符号化し最適化し、空力性能に必要な有望な翼候補を合成する。 The current design of aerodynamic shapes, like airfoils, involves computationally intensive simulations to explore the possible design space. Usually, such design relies on the prior definition of design parameters and places restrictions on synthesizing novel shapes. In this work, we propose a data-driven shape encoding and generating method, which automatically learns representations from existing airfoils and uses the learned representations to generate new airfoils. The representations are then used in the optimization of synthesized airfoil shapes based on their aerodynamic performance. Our model is built upon VAEGAN, a neural network that combines Variational Autoencoder with Generative Adversarial Network and is trained by the gradient-based technique. 
Our model can (1) encode the existing airfoil into a latent vector and reconstruct the airfoil from it, (2) generate novel airfoils by randomly sampling the latent vectors and mapping the vectors to the airfoil coordinate domain, and (3) synthesize airfoils with desired aerodynamic properties by optimizing learned features via a genetic algorithm. Our experiments show that the learned features encode shape information thoroughly and comprehensively without predefined design parameters. By interpolating/extrapolating feature vectors or sampling from Gaussian noise, the model can automatically synthesize novel airfoil shapes, some of which possess competitive or even better aerodynamic properties compared to the airfoils used for model training. By optimizing shapes on the learned latent domain via a genetic algorithm, synthesized airfoils can evolve to target aerodynamic properties. This demonstrates an efficient learning-based airfoil design framework, which encodes and optimizes the airfoil on the latent domain and synthesizes promising airfoil candidates for required aerodynamic performance.	翻訳日:2023-07-07 19:04:53 公開日:2023-07-06
# ShadowNet:畳み込みニューラルネットワークのためのセキュアで効率的なオンデバイスモデル推論システム ShadowNet: A Secure and Efficient On-device Model Inference System for Convolutional Neural Networks ( http://arxiv.org/abs/2011.05905v4 ) ライセンス: Link先を確認	Zhichuang Sun, Ruimin Sun, Changming Liu, Amrita Roy Chowdhury, Long Lu, Somesh Jha	(参考訳) モバイルとエッジデバイスにおけるAIアクセラレータの使用が増加し、オンデバイス機械学習(ML)が人気を集めている。何千ものプロプライエタリなMLモデルが今日、何十億もの信頼できないデバイスにデプロイされている。これはモデルプライバシに関する深刻なセキュリティ上の懸念を引き起こす。しかし、信頼できないAIアクセラレーターへのアクセスを失うことなくモデルのプライバシを保護することは難しい問題である。本稿では,デバイス上での新たなモデル推論システムであるShadowNetを提案する。 shadownetはモデルプライバシをtrusted execution environment(tee)で保護するとともに、モデルの重線形層を信頼できないハードウェアアクセラレータに安全にアウトソーシングする。 ShadowNetは、アウトソーシングする前にリニアレイヤの重みを変換し、TEE内の結果を復元することで、これを実現する。非線形層は、TEE内でも安全である。 ShadowNetの設計は、重みの効率的な変換とその後の結果の復元を保証する。 TensorFlow LiteをベースにShadowNetのプロトタイプを構築し、MobileNet、ResNet-44、MiniVGG、ResNet-404、YOLOv4-tinyという5つの人気のあるCNNで評価する。評価の結果,ShadowNetは適切な性能で強力なセキュリティ保証を実現し,デバイス上での安全なモデル推論のための実用的なソリューションを提供する。 With the increased usage of AI accelerators on mobile and edge devices, on-device machine learning (ML) is gaining popularity. Thousands of proprietary ML models are being deployed today on billions of untrusted devices. This raises serious security concerns about model privacy. However, protecting model privacy without losing access to the untrusted AI accelerators is a challenging problem. In this paper, we present a novel on-device model inference system, ShadowNet. ShadowNet protects the model privacy with Trusted Execution Environment (TEE) while securely outsourcing the heavy linear layers of the model to the untrusted hardware accelerators. ShadowNet achieves this by transforming the weights of the linear layers before outsourcing them and restoring the results inside the TEE. The non-linear layers are also kept secure inside the TEE. ShadowNet's design ensures efficient transformation of the weights and the subsequent restoration of the results. 
We build a ShadowNet prototype based on TensorFlow Lite and evaluate it on five popular CNNs, namely, MobileNet, ResNet-44, MiniVGG, ResNet-404, and YOLOv4-tiny. Our evaluation shows that ShadowNet achieves strong security guarantees with reasonable performance, offering a practical solution for secure on-device model inference.	翻訳日:2023-07-07 19:04:21 公開日:2023-07-06
# SAT解決のためのタイムリープチャレンジ A Time Leap Challenge for SAT Solving ( http://arxiv.org/abs/2008.02215v2 ) ライセンス: Link先を確認	Johannes K. Fichte, Markus Hecher, Stefan Szeider	(参考訳) 我々は過去20年間のSAT問題解決におけるハードウェアの進歩とアルゴリズムの進歩の影響を比較した。特に,新しいハードウェア上の20年前のSATソルバと,20年前のハードウェア上の最新のSATソルバを比較した。以上の結果から,アルゴリズム面での進歩は,ハードウェア面での進歩と少なくとも同程度の影響を持つことがわかった。 We compare the impact of hardware advancement and algorithm advancement for SAT solving over the last two decades. In particular, we compare 20-year-old SAT-solvers on new computer hardware with modern SAT-solvers on 20-year-old hardware. Our findings show that the progress on the algorithmic side has at least as much impact as the progress on the hardware side.	翻訳日:2023-07-07 19:03:58 公開日:2023-07-06
# 半デバイス依存ブラインド量子トモグラフィ Semi-device-dependent blind quantum tomography ( http://arxiv.org/abs/2006.03069v2 ) ライセンス: Link先を確認	Ingo Roth, Jadwiga Wilkens, Dominik Hangleiter, Jens Eisert	(参考訳) 量子状態に関するトモグラフィー情報を抽出することは、高精度量子デバイスを開発するための重要な課題である。現在のスキームは通常、高精度に調整されたトモグラフィのための測定装置を必要とする。皮肉なことに、測定校正の精度は、状態調整の精度によって根本的に制限され、悪循環が確立される。そこで本研究では, このサイクルが破られ, 測定装置のキャリブレーションに対する依存性が著しく緩和されることを示す。その結果, 量子状態の自然低ランク構造を利用して, 古典的に効率的な後処理アルゴリズムを用いた高度にスケーラブルな ‘blind' トモグラフィ法が得られた。さらに,キャリブレーションのスパース構造を利用することで,提案手法の効率をさらに向上させる。これは、ブラインド量子トモグラフィー問題を低ランク行列のスパース和のデミキシングに緩和することで達成される。提案アルゴリズムは低ランクな量子状態を復元し,測定モデルが制限された等尺性を示すことを証明した。総合的な測定を行うには,測定設定を最適に行う必要がある。これらの概念的および数学的知見を補完し、トラップイオンの実装にインスパイアされた実用的な環境で、ロバストブラインド量子トモグラフィーが可能であることを数値的に示す。 Extracting tomographic information about quantum states is a crucial task in the quest towards devising high-precision quantum devices. Current schemes typically require measurement devices for tomography that are a priori calibrated to high precision. Ironically, the accuracy of the measurement calibration is fundamentally limited by the accuracy of state preparation, establishing a vicious cycle. Here, we prove that this cycle can be broken and the dependence on the measurement device's calibration significantly relaxed. We show that exploiting the natural low-rank structure of quantum states of interest suffices to arrive at a highly scalable `blind' tomography scheme with a classically efficient post-processing algorithm. We further improve the efficiency of our scheme by making use of the sparse structure of the calibrations. This is achieved by relaxing the blind quantum tomography problem to the de-mixing of a sparse sum of low-rank matrices. We prove that the proposed algorithm recovers a low-rank quantum state and the calibration provided that the measurement model exhibits a restricted isometry property. For generic measurements, we show that it requires a close-to-optimal number of measurement settings. 
Complementing these conceptual and mathematical insights, we numerically demonstrate that robust blind quantum tomography is possible in a practical setting inspired by an implementation of trapped ions.	翻訳日:2023-07-07 19:03:53 公開日:2023-07-06
# 非可換代数を用いた畳み込みフィルタとニューラルネットワーク Convolutional Filtering and Neural Networks with Non Commutative Algebras ( http://arxiv.org/abs/2108.09923v3 ) ライセンス: Link先を確認	Alejandro Parada-Mayorga, Landon Butler and Alejandro Ribeiro	(参考訳) 本稿では,非可換畳み込み畳み込みニューラルネットワークの代数的一般化について述べる。代数的信号処理の理論を畳み込み型非可換アーキテクチャのモデル化に活用し、畳み込み型畳み込み型ニューラルネットワークの文献で得られたものを拡張する具体的安定性境界を導出する。非可換畳み込み構造は作用素空間上の変形に対して安定であることを示す。我々は非可換信号モデルのスペクトル表現を開発し、非可換フィルタが互いに独立してフーリエ成分を処理することを示す。特に、非可換モデルにおける信号のスペクトル分解は次元が 1 より大きい固有空間に関連付けられるが、安定性と選択性の間にはトレードオフがあり、低次元行列空間における行列多項式関数によって制御される。このトレードオフは、代数のフィルタが安定に制限されているとき、ポイントワイズ非線形性によってネットワーク内で補償される識別可能性の損失があることを示している。本稿では,群ニューラルネットワーク,マルチグラフニューラルネットワーク,四元系ニューラルネットワークなどの非可換畳み込みアーキテクチャへの直接的な適用と,摂動発生時の挙動を示す数値実験を行った。 In this paper we introduce and study the algebraic generalization of non commutative convolutional neural networks. We leverage the theory of algebraic signal processing to model convolutional non commutative architectures, and we derive concrete stability bounds that extend those obtained in the literature for commutative convolutional neural networks. We show that non commutative convolutional architectures can be stable to deformations on the space of operators. We develop the spectral representation of non commutative signal models to show that non commutative filters process Fourier components independently of each other. In particular we prove that although the spectral decompositions of signals in non commutative models are associated to eigenspaces of dimension larger than one, there exists a trade-off between stability and selectivity, which is controlled by matrix polynomial functions in spaces of matrices of low dimension. This tradeoff shows how when the filters in the algebra are restricted to be stable, there is a loss in discriminability that is compensated in the network by the pointwise nonlinearities. 
The results derived in this paper have direct applications and implications in non commutative convolutional architectures such as group neural networks, multigraph neural networks, and quaternion neural networks, for which we provide a set of numerical experiments showing their behavior when perturbations are present.	翻訳日:2023-07-07 18:59:56 公開日:2023-07-06
# 高次元コヒーレント一方向量子鍵分布 High-dimensional coherent one-way quantum key distribution ( http://arxiv.org/abs/2105.04733v4 ) ライセンス: Link先を確認	Kfir Sulimany, Guy Pelc, Rom Dudkiewicz, Simcha Korenblit, Hagai S. Eisenberg, Yaron Bromberg, Michael Ben-Or	(参考訳) 高次元量子鍵分布(QKD)は、2次元符号化によるQKDプロトコルでは得られないセキュアな鍵レートによる究極のセキュアな通信を提供する。しかし、既存の高次元QKDプロトコルは、マルチポート干渉計や複数の検出器などの追加の実験資源を必要とするため、実用的な高次元システムのコストが上がり、使用が制限される。本稿では,標準的な2次元システムのハードウェアのみを必要とする任意の次元QKDのための新しいプロトコルを提示し,解析する。個々の攻撃やコヒーレント攻撃に対するセキュリティ証明を提供し、セキュアな鍵レートの上限と下限を設定する。そして,40kmのファイバーリンク上の標準2次元QKDシステムにおいて,新しい高次元プロトコルをテストする。新しいプロトコルは、ハードウェアの変更をシステムに導入することなく、標準の2次元コヒーレントなワンウェイプロトコルと比較して、セキュアなキーレートを2倍に向上させる。この作業は、ソフトウェアアップデートだけで既にデプロイされているタイムビンQKDシステムの性能を向上させる大きな可能性を秘めている。さらに、その応用はQKDキューディットの様々な符号化スキームにまたがる。 High-dimensional quantum key distribution (QKD) provides ultimately secure communication with secure key rates that cannot be obtained by QKD protocols with two-dimensional encoding. However, existing high-dimensional QKD protocols require additional experimental resources, such as multiport interferometers and multiple detectors, thus raising the cost of practical high-dimensional systems and limiting their use. Here, we present and analyze a novel protocol for arbitrary-dimensional QKD that requires only the hardware of a standard two-dimensional system. We provide security proofs against individual attacks and coherent attacks, setting an upper and lower bound on the secure key rates. Then, we test the new high-dimensional protocol in a standard two-dimensional QKD system over a 40 km fiber link. The new protocol yields a two-fold enhancement of the secure key rate compared to the standard two-dimensional coherent one-way protocol, without introducing any hardware modifications to the system. This work therefore holds great potential to enhance the performance of already deployed time-bin QKD systems through a software update alone. Furthermore, its applications extend across different encoding schemes of QKD qudits.	翻訳日:2023-07-07 18:59:34 公開日:2023-07-06
# 補間テンソル積ウェーブレットに基づく電子構造計算 Electronic structure calculations with interpolating tensor product wavelet basis ( http://arxiv.org/abs/2101.05540v7 ) ライセンス: Link先を確認	Tommi Höynälänmaa and Tapio T. Rantala	(参考訳) 本稿では,3次元Deslauriers-Dubucウェーブレットからなる基底集合を導入し,HおよびHe原子と分子 $\mathrm{H}_2$, $\mathrm{H}_2^+$, $\mathrm{LiH}$ のSchrödinger方程式をHF法とDFT法で数値的に解く。水素の2sと2pの励起状態も計算する。核のクーロン特異性は擬ポテンシャルを用いて処理される。固有値問題はArnoldi法とLanczos法で、Poisson方程式はGMRES法とCGNR法で解き、補間ウェーブレットの双直交関係を用いて行列要素を計算する。パフォーマンスはCCCBDBやBigDFTと比較される。 We introduce a basis set consisting of three-dimensional Deslauriers-Dubuc wavelets and numerically solve the Schrödinger equations of H and He atoms and molecules $\mathrm{H}_2$, $\mathrm{H}_2^+$, and $\mathrm{LiH}$ with HF and DFT methods. We also compute the 2s and 2p excited states of hydrogen. The Coulomb singularity at the nucleus is handled by using a pseudopotential. The eigenvalue problem is solved with Arnoldi and Lanczos methods, the Poisson equation with GMRES and CGNR methods, and matrix elements are computed using the biorthogonality relations of the interpolating wavelets. Performance is compared with that of CCCBDB and BigDFT.	翻訳日:2023-07-07 18:57:14 公開日:2023-07-06
# 訓練可能な重量平均化:サブスペーストレーニングのための一般的なアプローチ Trainable Weight Averaging: A General Approach for Subspace Training ( http://arxiv.org/abs/2205.13104v2 ) ライセンス: Link先を確認	Tao Li, Zhehao Huang, Qinghua Tao, Yingwen Wu, Xiaolin Huang	(参考訳) 低次元部分空間におけるディープニューラルネットワーク(DNN)のトレーニングは、効率的なトレーニングとより良い一般化性能を達成する上で有望な方向である。従来の研究は、ランダムな投影やトレーニング軌道上の次元削減手法を用いて部分空間を抽出するが、これらの手法は次元性や数値演算の点で非効率または不安定である。本稿では,重み付けにサブスペーストレーニングを結びつけるとともに,従来の取り組みを一般化したサブスペーストレーニングの一般的なアプローチであるTWAを提案する。 TWAは次元の点で効率的であり、使用も容易であり、サブスペーストレーニングのための有望な新しい方法である。さらに,複数のノードにわたる並列トレーニングを可能とし,各ノードにメモリと計算負荷を均等に分散させる,大規模な問題に対処する部分空間トレーニングの効率的なスキームを設計する。我々は、TWAを効率的なニューラルネットワークトレーニングに適用し、細調整されたパフォーマンスタスクを改善し、我々のアプローチの優れた効率と有効性を示す。我々は、様々なアーキテクチャを用いて、様々なベンチマークコンピュータビジョンとニューラル言語処理タスクをカバーする広範な実験を行う。実装コードはhttps://github.com/nblt/twa。 Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better generalization performance. Previous works extract the subspaces by using random projection or performing dimensionality reduction method on the training trajectory, but these methods can be inefficient or unstable in terms of dimensionality and numerical operations. In this paper, we connect subspace training to weight averaging and propose Trainable Weight Averaging (TWA), a general approach for subspace training that generalizes the previous efforts. TWA is efficient in terms of dimensionality and also easy to use, making it a promising new method for subspace training. We further design an efficient scheme for subspace training to cope with large-scale problems, which allows parallel training across multiple nodes and evenly distributing the memory and computation burden to each node. We apply TWA to efficient neural network training and improving fine-tuning performance tasks to demonstrate the great efficiency and effectiveness of our approach. 
We conduct extensive experiments that cover various benchmark computer vision and neural language processing tasks with various architectures. The code of implementation is available at https://github.com/nblt/TWA.	翻訳日:2023-07-07 18:50:34 公開日:2023-07-06
# 等価性と包摂オントロジーマッチングのための機械学習フレンドリーなバイオメディカルデータセット Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching ( http://arxiv.org/abs/2205.03447v7 ) ライセンス: Link先を確認	Yuan He, Jiaoyan Chen, Hang Dong, Ernesto Jiménez-Ruiz, Ali Hadian, Ian Horrocks	(参考訳) オントロジーマッチング(OM)はバイオインフォマティクスやセマンティックウェブなど多くの分野において重要な役割を担い、特に機械学習(ML)技術の適用によってその研究はますます人気が高まっている。オントロジーアライメント評価イニシアチブ(OAEI)は,OMシステムの体系的評価に多大な努力を払っているものの,包摂マッピングの限定的な評価,最適でない参照マッピング,MLベースのシステム評価の限定的なサポートなど,いくつかの制限に悩まされている。これらの制約に対処するために,Mondo と UMLS から抽出したオントロジーを含む5つの新しいバイオメディカル OM タスクを導入する。各タスクは等価性と包摂マッチングの両方を含み、参照マッピングの品質は人間のキュレーションやオントロジープルーニングなどで保証される。 MLベースのOMシステムと非MLベースのOMシステムの両方において,様々な観点からOM性能を測定するための総合評価フレームワークを提案する。我々は,これらのリソースの利用方法を示すため,異なるタイプのOMシステムの評価結果を報告する。これらのリソースはすべて,OAEI 2022における新たなBioMLトラックの一部として公開されている。 Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new BioML track at OAEI 2022.	翻訳日:2023-07-07 18:50:12 公開日:2023-07-06
# ニューラルマシン翻訳における非自己回帰生成に関する調査研究 A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond ( http://arxiv.org/abs/2204.09269v2 ) ライセンス: Link先を確認	Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-yan Liu	(参考訳) 推論を高速化するためにニューラルネットワーク翻訳(NMT)で最初に提案された非自己回帰(NAR)生成は、機械学習と自然言語処理のコミュニティの両方で注目を集めている。 NAR生成は機械翻訳の推論速度を大幅に高速化するが、高速化は自動回帰(AR)生成と比較して翻訳精度を犠牲にするコストがかかる。近年,NAR生成とAR生成の精度ギャップを埋めるために,多くの新しいモデルやアルゴリズムが設計・提案されている。本稿では,様々な側面の非自己回帰翻訳(nat)モデルの比較と議論を体系的に実施する。具体的には,natの取り組みを,データ操作,モデリング手法,トレーニング基準,デコードアルゴリズム,事前学習モデルのメリットなど,いくつかのグループに分類した。さらに, 文法的誤り訂正, テキスト要約, テキストスタイル変換, 対話, 意味解析, 自動音声認識など, 機械翻訳以外のNARモデルの応用についても, 簡単なレビューを行った。さらに、KDの依存関係、合理的なトレーニング目標、NARの事前トレーニング、より広範なアプリケーションなど、今後の探索の方向性についても論じる。この調査は、研究者が最新のNAR生成の進歩を捉え、先進的なNARモデルとアルゴリズムの設計を刺激し、業界関係者がアプリケーションに適切なソリューションを選択できるようにするのに役立つことを願っている。このサーベイのWebページは \url{https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications} にある。 Non-autoregressive (NAR) generation, which is first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both machine learning and natural language processing communities. While NAR generation can significantly accelerate inference speed for machine translation, the speedup comes at the cost of sacrificed translation accuracy compared to its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been designed/proposed to bridge the accuracy gap between NAR generation and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize the efforts of NAT into several groups, including data manipulation, modeling methods, training criterion, decoding algorithms, and the benefit from pre-trained models. 
Furthermore, we briefly review other applications of NAR models beyond machine translation, such as grammatical error correction, text summarization, text style transfer, dialogue, semantic parsing, and automatic speech recognition. In addition, we discuss potential directions for future exploration, including reducing the dependence on knowledge distillation (KD), reasonable training objectives, pre-training for NAR, and wider applications. We hope this survey can help researchers capture the latest progress in NAR generation, inspire the design of advanced NAR models and algorithms, and enable industry practitioners to choose appropriate solutions for their applications. The web page of this survey is at https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications.	翻訳日:2023-07-07 18:49:49 公開日:2023-07-06
# 高次元雑音データから低次元非線形構造を学習する:積分演算子アプローチ Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach ( http://arxiv.org/abs/2203.00126v2 ) ライセンス: Link先を確認	Xiucai Ding and Rong Ma	(参考訳) 本研究では,高次元および雑音の観測から低次元非線形構造を学習するためのカーネルスペクトル埋め込みアルゴリズムを提案する。このアルゴリズムは、基礎となる多様体の事前知識に依存しない適応的帯域選択手順を用いる。得られた低次元埋め込みは、データ可視化、クラスタリング、予測などの下流目的にさらに活用することができる。我々の方法は理論的に正当化され、事実上解釈可能である。具体的には,サンプルの寸法と大きさが可分に大きい場合,最終的な埋め込みの収束を確立し,信号対雑音比が収束率と位相遷移に与える影響を特徴付ける。また、ある再生核ヒルベルト空間の核写像によって定義される積分作用素の固有関数への埋め込みの収束を証明し、基礎となる非線形構造を捉える。 3つの実データセットの数値シミュレーションと解析により,様々な多様体を多様な応用で学習する手法と比較して,提案手法の実証的性能が優れていることを示す。 We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.	翻訳日:2023-07-07 18:48:51 公開日:2023-07-06
# データセットバイアスの潜在的発生源 : 機械学習アルゴリズムによる診断不足の検討 Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms ( http://arxiv.org/abs/2201.07856v2 ) ライセンス: Link先を確認	Mélanie Bernhardt, Charles Jones, Ben Glocker	(参考訳) 機械学習アルゴリズムがトレーニングデータに埋め込まれたバイアスによって、健康格差を増幅するリスクを懸念する報告が増えている。 Seyyed-Kalantariらは、3つの胸部X線データセットで訓練されたモデルが(疾患がないことを示す)'no-finding'ラベルのサブグループ間で偽陽性率(FPR)の差をもたらすことを発見した。これらのモデルは、歴史的に十分な医療を受けられてこなかったことが知られているサブグループにおいて、常に高いFPRをもたらし、この研究は、モデルが体系的な過少診断を示し、さらには増幅する可能性があると結論づけている。我々は、この研究における実験設定は、アルゴリズムによる過少診断の研究には不十分であると主張する。データセットバイアスの程度と性質に関する特定の知識(または仮定)がないため、モデルバイアスを調査することは困難である。重要なことに、トレーニングデータ(ランダム分割による)と同じバイアスを示すテストデータの使用は、報告された格差の解釈を著しく複雑にする。 An increasing number of reports raise concerns about the risk that machine learning algorithms could amplify health disparities due to biases embedded in the training data. Seyyed-Kalantari et al. find that models trained on three chest X-ray datasets yield disparities in false-positive rates (FPR) across subgroups on the 'no-finding' label (indicating the absence of disease). The models consistently yield higher FPR on subgroups known to be historically underserved, and the study concludes that the models exhibit and potentially even amplify systematic underdiagnosis. We argue that the experimental setup in the study is insufficient to study algorithmic underdiagnosis. In the absence of specific knowledge (or assumptions) about the extent and nature of the dataset bias, it is difficult to investigate model bias. Importantly, their use of test data exhibiting the same bias as the training data (due to random splitting) severely complicates the interpretation of the reported disparities.	翻訳日:2023-07-07 18:48:34 公開日:2023-07-06
# データ解析のための完全適応ベイズアルゴリズム, FABADA Fully Adaptive Bayesian Algorithm for Data Analysis, FABADA ( http://arxiv.org/abs/2201.05145v2 ) ライセンス: Link先を確認	Pablo M Sanchez-Alarcon and Yago Ascasibar Sequeiros	(参考訳) 本研究の目的は,1次元と2次元のデータ,例えば天文学的な画像やスペクトルの信号対雑音比を自動的に改善する,ベイズ推定の観点から,新しい非パラメトリックノイズ低減手法を記述することである。このアルゴリズムはデータの平滑化可能なバージョンである平滑化モデルを反復的に評価し、ノイズ測定と統計的に互換性のある信号の推定を得る。繰り返しは、最後の滑らかなモデルのエビデンスと$\chi^2$統計量に基づいて停止し、スムーズなモデルの集合全体の重み付き平均として信号の期待値を計算する。本稿では,アルゴリズムの数学的形式化と数値的実装について述べるとともに,一連の実天体観測を用いて,ピーク信号対雑音比,構造的類似度指数,時間ペイロードの観点からその性能を評価する。データ解析のための完全適応ベイズアルゴリズム(FABADA)は、パラメータチューニングなしで、実際のアプリケーションでは不可能である真の信号に基づいてパラメータを最適化した標準的な画像処理アルゴリズムに匹敵する結果をもたらす。 BM3Dのような最先端の非パラメトリックな手法は高い信号対雑音比でわずかに優れた性能を示すが、極めてノイズの多いデータでは我々のアルゴリズムの方がかなり正確である(相対誤差が20〜40%以上であり、天文学の分野に特に関心がある状況である)。この範囲では, 復元によって得られた残差の標準偏差は, 元の測定値よりも1桁以上小さくなる可能性がある。このレポートで提示された結果をすべて再現するために必要なソースコードは、メソッドの実装を含めて、https://github.com/PabloMSanAla/fabada で公開されている。 The aim of this paper is to describe a novel non-parametric noise reduction technique from the point of view of Bayesian inference that may automatically improve the signal-to-noise ratio of one- and two-dimensional data, such as astronomical images and spectra. The algorithm iteratively evaluates possible smoothed versions of the data, the smooth models, obtaining an estimation of the underlying signal that is statistically compatible with the noisy measurements. Iterations stop based on the evidence and the $\chi^2$ statistic of the last smooth model, and we compute the expected value of the signal as a weighted average of the whole set of smooth models. In this paper, we explain the mathematical formalism and numerical implementation of the algorithm, and we evaluate its performance in terms of the peak signal-to-noise ratio, the structural similarity index, and the time payload, using a battery of real astronomical observations. 
Our Fully Adaptive Bayesian Algorithm for Data Analysis (FABADA) yields results that, without any parameter tuning, are comparable to standard image processing algorithms whose parameters have been optimized based on the true signal to be recovered, something that is impossible in a real application. State-of-the-art non-parametric methods, such as BM3D, offer slightly better performance at high signal-to-noise ratio, while our algorithm is significantly more accurate for extremely noisy data (higher than $20-40\%$ relative errors, a situation of particular interest in the field of astronomy). In this range, the standard deviation of the residuals obtained by our reconstruction may become more than an order of magnitude lower than that of the original measurements. The source code needed to reproduce all the results presented in this report, including the implementation of the method, is publicly available at https://github.com/PabloMSanAla/fabada	翻訳日:2023-07-07 18:48:15 公開日:2023-07-06
# AGM信念修正、意味論的に AGM Belief Revision, Semantically ( http://arxiv.org/abs/2112.13557v2 ) ライセンス: Link先を確認	Faiq Miftakhul Falakh, Sebastian Rudolph, Kai Sauerwald	(参考訳) 我々は、Alchourrón、Gärdenfors、Makinson(AGM)の先駆的研究に従い、最小変化のパラダイムを実装する信念修正演算子の汎用的なモデル理論的特徴づけを確立する。我々の特徴づけは、すべてのタルスキ論理、すなわち古典的なモデル理論的意味論を持つすべての論理に適用され、したがって、モデル理論的特徴づけがこれまで欠けていた多くのものを含む、知識表現やそれ以外の分野で用いられる多様な形式体系に適用される。我々の出発点は、有限シグネチャ上の命題論理に対してそのような特徴づけを与えたKatsunoとMendelzon(K&M)のアプローチである。我々はK&Mのアプローチを、任意のタルスキ論理におけるベース上のAGMスタイルの修正の設定に一般化する。ここでベースとは、エージェントの信念を表現するさまざまな方法(信念集合、任意のまたは有限の文集合、単一の文など)のいずれかを指す。第1のコア結果は、AGMスタイルの修正演算子と特定の代入との間の双方向の対応を与える表現定理である。この代入は、各ベースに解釈上の「選好」関係を対応づける関数であり、その関係は完全(total)でなければならないが、従来のアプローチとは対照的に、必ずしも推移的ではない。第2のコアコントリビューションとして、我々は結果が(K&Mのオリジナルの研究のように)推移的な選好関係を生み出す代入に強化されるような全ての論理の特徴づけを提供する。これらの主な貢献とともに、我々の発見の多様な変種と、他の信念修正理論の分野への影響について論じる。 We establish a generic, model-theoretic characterization of belief revision operators implementing the paradigm of minimal change according to the seminal work by Alchourrón, Gärdenfors, and Makinson (AGM). 
Our characterization applies to all Tarskian logics, that is, all logics with a classical model-theoretic semantics, and hence a wide variety of formalisms used in knowledge representation and beyond, including many for which a model-theoretic characterization has hitherto been lacking. Our starting point is the approach by Katsuno and Mendelzon (K&M), who provided such a characterization for propositional logic over finite signatures. We generalize K&M's approach to the setting of AGM-style revision over bases in arbitrary Tarskian logics, where base may refer to one of the various ways of representing an agent's beliefs (such as belief sets, arbitrary or finite sets of sentences, or single sentences). Our first core result is a representation theorem providing a two-way correspondence between AGM-style revision operators and specific assignments: functions associating every base to a "preference" relation over interpretations, which must be total but is - in contrast to prior approaches - not always transitive. As our second core contribution, we provide a characterization of all logics for which our result can be strengthened to assignments producing transitive preference relations (as in K&M's original work). Alongside these main contributions, we discuss diverse variants of our findings as well as ramifications for other areas of belief revision theory.	翻訳日:2023-07-07 18:47:44 公開日:2023-07-06
# ネットワークにおける記述的対推論的コミュニティ検出:落とし穴、神話、半真実 Descriptive vs. inferential community detection in networks: pitfalls, myths, and half-truths ( http://arxiv.org/abs/2112.00183v7 ) ライセンス: Link先を確認	Tiago P. Peixoto	(参考訳) コミュニティ検出はネットワーク科学における最も重要な方法論の1つであり、過去数十年でかなりの注目を集めてきた。この領域は、ネットワークを基本的なビルディングブロックに分割し、その大規模構造の要約を提供することを目的としている。その重要性と広く採用されているにもかかわらず、最先端技術と、実際に様々な分野で実際に使われている方法との間には、明らかなギャップがある。ここでは、既存のメソッドが「記述的」か「推論的」かに応じて分割することで、この相違に対処しようと試みる。記述的手法は、コミュニティ構造の文脈依存的な概念に基づくネットワーク内のパターンを見つけるが、推論的手法は生成モデルを明確にし、それらをデータに適合させようとする。このようにして、彼らはネットワーク形成のメカニズムに関する洞察を与え、統計的証拠によって支持される方法でランダム性から構造を分離することができる。我々は,推論目的による記述的手法の導入が,落とし穴や誤解を招く解答に悩まされており,一般的には避けるべきであることを示す。我々は、推論法はより明確な科学的質問と一致し、より強固な結果をもたらし、多くの場合好まれるべきであると主張する。我々は,コミュニティ検出が実際に行われている場合によく信じられる神話や半真実を,そのような手法の使用と結果の解釈の両方を改善するために,取り除こうとしている。 Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state-of-the-art and the methods that are actually used in practice in a variety of fields. Here we attempt to address this discrepancy by dividing existing methods according to whether they have a "descriptive" or an "inferential" goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate generative models, and attempt to fit them to data. In this way, they are able to provide insights into the mechanisms of network formation, and separate structure from randomness in a manner supported by statistical evidence. We review how employing descriptive methods with inferential aims is riddled with pitfalls and misleading answers, and thus should be in general avoided. 
We argue that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should be in many cases preferred. We attempt to dispel some myths and half-truths often believed when community detection is employed in practice, in an effort to improve both the use of such methods as well as the interpretation of their results.	翻訳日:2023-07-07 18:47:24 公開日:2023-07-06
# 金融予測のためのエキスパートアグリゲーション Expert Aggregation for Financial Forecasting ( http://arxiv.org/abs/2111.15365v4 ) ライセンス: Link先を確認	Carl Remlinger, Brière Marie, Alasseur Clémence, Joseph Mikael	(参考訳) 金融時系列予測専用の機械学習アルゴリズムは、多くの関心を集めている。しかし、その推定精度は時間とともに不安定になりうるため、複数のアルゴリズムの中から選択することは難しい。エキスパートのオンラインアグリゲーションは、モデルについて何も仮定することなく、有限個のモデルの予測を一つのアプローチで組み合わせる。本稿では,Bernstein Online Aggregation (BOA) 手法を適用し,異なる機械学習モデルから得られる個別株式リターン予測に基づくロングショート戦略を構築する。エキスパートのオンライン混合は、非定常性によって特徴づけられる環境においても魅力的なポートフォリオパフォーマンスをもたらす。このアグリゲーションは個々のアルゴリズムより優れており、同程度のターンオーバーで、より高いポートフォリオ Sharpe Ratio と、より低いショートフォールを提供する。ポートフォリオ評価指標のファミリーに関して混合全体を改善するため、エキスパートおよびアグリゲーションの特殊化への拡張も提案する。 Machine learning algorithms dedicated to financial time series forecasting have gained a lot of interest. But choosing between several algorithms can be challenging, as their estimation accuracy may be unstable over time. Online aggregation of experts combines the forecasts of a finite set of models in a single approach without making any assumption about the models. In this paper, a Bernstein Online Aggregation (BOA) procedure is applied to the construction of long-short strategies built from individual stock return forecasts coming from different machine learning models. The online mixture of experts leads to attractive portfolio performances even in environments characterised by non-stationarity. The aggregation outperforms individual algorithms, offering a higher portfolio Sharpe Ratio, lower shortfall, with a similar turnover. Extensions to expert and aggregation specialisations are also proposed to improve the overall mixture on a family of portfolio evaluation metrics.	翻訳日:2023-07-07 18:46:50 公開日:2023-07-06
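The online aggregation idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses an exponential-weights update with a second-order correction in the spirit of BOA, and the function name `boa_style_aggregate`, the fixed learning rate `eta`, and the squared-error loss are all assumptions made for the example. Losses are assumed to be bounded so that `eta * excess` stays above -1.

```python
import numpy as np

def boa_style_aggregate(expert_preds, targets, eta=0.5):
    """Aggregate expert forecasts online (hypothetical, simplified BOA-style update).

    expert_preds: array of shape (T, K), forecast of each of K experts at each step.
    targets:      array of shape (T,), realised values.
    Returns the aggregated forecasts (T,) and the final weights (K,).
    """
    expert_preds = np.asarray(expert_preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_steps, n_experts = expert_preds.shape
    weights = np.full(n_experts, 1.0 / n_experts)  # start from uniform weights
    aggregated = np.empty(n_steps)
    for t in range(n_steps):
        # Forecast first, using only the weights learned from past steps.
        aggregated[t] = weights @ expert_preds[t]
        # Squared-error loss of each expert (an assumption of this sketch).
        losses = (expert_preds[t] - targets[t]) ** 2
        # Loss of each expert relative to the weighted average loss.
        excess = losses - weights @ losses
        # Exponential update with a second-order correction term.
        weights = weights * np.exp(-eta * excess * (1.0 + eta * excess))
        weights /= weights.sum()
    return aggregated, weights
```

With one expert that is always right and one that is always wrong, the weight mass moves to the accurate expert within a few dozen steps. A production version would typically tune or adapt `eta` per expert rather than fix it.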
# 光パラメトリック増幅によるウィグナー関数トモグラフィ Wigner Function Tomography via Optical Parametric Amplification ( http://arxiv.org/abs/2207.10030v4 ) ライセンス: Link先を確認	Mahmoud Kalash and Maria V. Chekhova	(参考訳) ウィグナー関数トモグラフィーは量子状態の特徴付けに不可欠であるが、その一般的な手法であるバランス型ホモダイン検出にはいくつかの弱点がある。第一に、効率的な検出が必要であり、これは壊れやすい非ガウス状態、特に明るい状態の測定において決定的に重要である。第二に、測定対象の状態の時空間特性に合わせて調整された局所発振器が必要であり、マルチモード状態や広帯域状態では機能しない。本稿では,光パラメトリック増幅とそれに続く直接検出に基づくウィグナー関数トモグラフィーを提案する。この方法は、検出の非効率性や損失の影響を受けず、広帯域で空間的・時間的にマルチモードの量子状態に適している。原理を実証するため,強いマルチモード状態の単一モードを占有するスクイーズド真空のウィグナー関数を実験的に再構成した。主にフィルタリングによる97%以上の損失にもかかわらず、$-7.5\pm 0.4$ dBのスクイージングと$0.91^{+0.09}_{-0.08}$の純度を得る。理論的には、スクイーズされた単一光子(明るい非ガウス状態)の再構成も検討する。強いマルチモードパラメトリック増幅により、この方法は複数のモードの同時トモグラフィーを可能にする。これにより、光量子情報処理の強力なツールとなる。 Wigner function tomography is indispensable for characterizing quantum states, but its commonly used version, balanced homodyne detection, suffers from several weaknesses. First, it requires efficient detection, which is critical for measuring fragile non-Gaussian states, especially bright ones. Second, it needs a local oscillator, tailored to match the spatiotemporal properties of the state under test, and fails for multimode and broadband states. Here we propose Wigner function tomography based on optical parametric amplification followed by direct detection. The method is immune to detection inefficiency and loss, and suitable for broadband, spatially and temporally multimode quantum states. To prove the principle, we experimentally reconstruct the Wigner function of squeezed vacuum occupying a single mode of a strongly multimode state. We obtain a squeezing of $-7.5\pm 0.4$ dB and a purity of $0.91^{+0.09}_{-0.08}$ despite more than $97\%$ loss caused mainly by filtering. Theoretically, we also consider the reconstruction of a squeezed single photon - a bright non-Gaussian state. Due to strong multimode parametric amplification, the method allows for the simultaneous tomography of multiple modes. 
This makes it a powerful tool for optical quantum information processing.	翻訳日:2023-07-07 18:41:37 公開日:2023-07-06
# 摂動パラメトリック量子進化の微分に対する"適切な"シフト則 "Proper" Shift Rules for Derivatives of Perturbed-Parametric Quantum Evolutions ( http://arxiv.org/abs/2207.01587v3 ) ライセンス: Link先を確認	Dirk Oliver Theis	(参考訳) Banchi & Crooks (Quantum, 2021) は、我々が「摂動」量子進化 $x\mapsto e^{i(x A + B)/\hbar}$ と呼ぶものを通じて入るパラメータに依存する期待値の微分を推定する方法を与えている。彼らの手法は、単にパラメータを変更するだけでなく、現れるユニタリへの修正を必要とする。さらに、$B$項が避けられない場合、この微分の厳密な方法(不偏推定量)は知られていないようである: Banchi & Crooks の手法は近似を与える。本稿では、このタイプのパラメータ化期待値の導関数を推定するために、パラメータのシフトのみを必要とし、量子進化の他の変更を必要としない方法(「適切な」シフト規則)を提案する。本手法は厳密であり(すなわち解析的導関数、不偏推定量を与える)、Banchi-Crooks法と同じ最悪の場合の分散を持つ。さらに、摂動パラメトリック量子進化のフーリエ解析に基づいて、適切なシフト規則を取り巻く理論について議論し、フーリエ変換の観点から適切なシフト規則を特徴づけ、そこからシフトが指数的に集中する適切なシフト規則の非存在結果を導く。近似誤差を伴う打ち切り法を導出し、予備的な数値シミュレーションに基づいてBanchi-Crooks法と比較する。 Banchi & Crooks (Quantum, 2021) have given methods to estimate derivatives of expectation values depending on a parameter that enters via what we call a "perturbed" quantum evolution $x\mapsto e^{i(x A + B)/\hbar}$. Their methods require modifications, beyond merely changing parameters, to the unitaries that appear. Moreover, in the case when the $B$-term is unavoidable, no exact method (unbiased estimator) for the derivative seems to be known: Banchi & Crooks's method gives an approximation. In this paper, for estimating the derivatives of parameterized expectation values of this type, we present a method that only requires shifting parameters, no other modifications of the quantum evolutions (a "proper" shift rule). Our method is exact (i.e., it gives analytic derivatives, unbiased estimators), and it has the same worst-case variance as Banchi-Crooks's. Moreover, we discuss the theory surrounding proper shift rules, based on Fourier analysis of perturbed-parametric quantum evolutions, resulting in a characterization of the proper shift rules in terms of their Fourier transforms, which in turn leads us to non-existence results of proper shift rules with exponential concentration of the shifts. 
We derive truncated methods that exhibit approximation errors, and compare to Banchi-Crooks's based on preliminary numerical simulations.	翻訳日:2023-07-07 18:40:26 公開日:2023-07-06
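To see why the perturbation term matters, the sketch below applies the familiar two-term parameter-shift rule (shifts of ±π/2) to a single-qubit example. It is only an illustration of the problem setting, not the "proper" shift rule constructed in the paper. The choices A = X/2, B = bZ/2, the sign convention exp(-i(xX + bZ)/2), the observable Z, the initial state |0>, and all function names are assumptions made for this example.

```python
import numpy as np

# Pauli matrices used in this toy example.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expectation(x, b=0.0):
    """<0| U(x)^dagger Z U(x) |0> with U(x) = exp(-i (x X + b Z) / 2)."""
    hamiltonian = (x * X + b * Z) / 2
    # Exponentiate the Hermitian generator through its eigendecomposition.
    evals, evecs = np.linalg.eigh(hamiltonian)
    unitary = evecs @ np.diag(np.exp(-1j * evals)) @ evecs.conj().T
    psi = unitary @ np.array([1, 0], dtype=complex)
    return float(np.real(psi.conj() @ Z @ psi))

def shift_rule(f, x):
    """Standard two-term parameter-shift estimate of f'(x)."""
    return 0.5 * (f(x + np.pi / 2) - f(x - np.pi / 2))

def finite_difference(f, x, h=1e-6):
    """Central finite difference, used here only as a numerical reference."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.7
# Unperturbed case (b = 0): the expectation equals cos(x), and the rule is exact.
err_unperturbed = abs(shift_rule(lambda t: expectation(t, 0.0), x0) + np.sin(x0))
# Perturbed case (b = 1): the same rule no longer matches the true derivative.
err_perturbed = abs(
    shift_rule(lambda t: expectation(t, 1.0), x0)
    - finite_difference(lambda t: expectation(t, 1.0), x0)
)
```

In the unperturbed case the error is at the level of floating-point round-off, while with b = 1 the two-term rule shows a clearly visible bias. This gap is what motivates shift rules designed specifically for perturbed evolutions.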
# 遷移確率のための資源調整可能な近未来量子アルゴリズムの改良と物理学および変分量子線形代数への応用 Improved resource-tunable near-term quantum algorithms for transition probabilities, with applications in physics and variational quantum linear algebra ( http://arxiv.org/abs/2206.14213v2 ) ライセンス: Link先を確認	Nicolas PD Sawaya, Joonsuk Huh	(参考訳) 遷移振幅と遷移確率は、応答特性と相関関数の計算を含む物理学シミュレーションの多くの領域に関係している。これらの量は線形方程式系の求解にも関連づけられる。ここでは遷移確率を計算するための3つの関連アルゴリズムを提案する。まず、2つの入力状態が非直交であってもよいように、既発表の浅い回路深さのアルゴリズムを拡張する。この第1の手順に基づいて、トロッター化とリチャードソン外挿に基づく、より回路は深いが回路評価回数の少ないアルゴリズムを導出する。第3に、回路の深さと測定の複雑さをトレードオフできる調整可能なアルゴリズムを導入し、特定のハードウェア特性に合わせられるアルゴリズムを得る。最後に、物理学および化学のモデルと変分量子線形解法(VQLS)のサブルーチンに対して原理実証の数値計算を行った。我々のアプローチの主な利点は、(a) 量子資源のわずかな増加で任意の非直交状態を利用できること、(b) (最近提案された他の方法と同様に)3量子ビットゲートの分解を必要としうるアダマールテストのようなサブルーチンを完全に回避すること、(c) 場合によっては、遷移確率に対するNISQアルゴリズムの従来の最先端と比較して、必要な量子回路評価が少なくなることである。 Transition amplitudes and transition probabilities are relevant to many areas of physics simulation, including the calculation of response properties and correlation functions. These quantities can also be related to solving linear systems of equations. Here we present three related algorithms for calculating transition probabilities. First, we extend a previously published short-depth algorithm, allowing for the two input states to be non-orthogonal. Building on this first procedure, we then derive a higher-depth algorithm based on Trotterization and Richardson extrapolation that requires fewer circuit evaluations. Third, we introduce a tunable algorithm that allows for trading off circuit depth and measurement complexity, yielding an algorithm that can be tailored to specific hardware characteristics. Finally, we implement proof-of-principle numerics for models in physics and chemistry and for a subroutine in variational quantum linear solving (VQLS). 
The primary benefits of our approaches are that (a) arbitrary non-orthogonal states may now be used with small increases in quantum resources, (b) we (like another recently proposed method) entirely avoid subroutines such as the Hadamard test that may require three-qubit gates to be decomposed, and (c) in some cases fewer quantum circuit evaluations are required as compared to the previous state-of-the-art in NISQ algorithms for transition probabilities.	翻訳日:2023-07-07 18:40:00 公開日:2023-07-06
# 単純立方格子における2レベル原子のダイナミクスの量子平均場処理 Quantum mean-field treatment of the dynamics of a two-level atom in a simple cubic lattice ( http://arxiv.org/abs/2206.14156v3 ) ライセンス: Link先を確認	Yamen Hamdouni	(参考訳) 平均場近似を用いて、キュリー温度近傍の強磁性格子中の2レベル原子のダイナミクスの一般的な特徴を調べる。様々な解析的および数値的な結果が得られる。まず、格子ハミルトニアンを線形化し、磁場の任意の方向に対する相転移の秩序パラメータの自己無撞着方程式を導出する。縮約ダイナミクスは格子の自由度をトレースアウトすることで導かれ、その結果、格子の単位セルと同じサイズの有効スピン浴中の原子のダイナミクスに帰着する。特定の方向に沿って磁場を印加することにより、位相緩和と励起状態の占有確率が増強されうることがわかった。また,温度変化とスピンの大きさに対する依存性についても検討した。熱揺らぎの増加は励起状態の占有確率を減少させる可能性があることが判明した。隣接しないセルを占有する2つのそのような原子のエンタングルメントを調べ、その時間変化は磁場の方向にあまり敏感でないことがわかった。エンタングルメントの突然死と復活が臨界温度近傍で起こることを示す。 The mean field approximation is used to investigate the general features of the dynamics of a two-level atom in a ferromagnetic lattice close to the Curie temperature. Various analytical and numerical results are obtained. We first linearize the lattice Hamiltonian, and we derive the self-consistency equation for the order parameter of the phase transition for arbitrary direction of the magnetic field. The reduced dynamics is deduced by tracing out the degrees of freedom of the lattice, which results in the reduction of the dynamics to that of an atom in an effective spin bath whose size is equal to the size of a unit cell of the lattice. It is found that the dephasing and the excited state occupation probability may be enhanced by applying the magnetic field along some specific directions. The dependence on the change of the temperature and the magnitude of spin is also investigated. It turns out that the increase of thermal fluctuations may reduce the occupation probability of the excited state. The entanglement of two such atoms that occupy non-adjacent cells is studied and its variation in time is found to be not much sensitive to the direction of the magnetic field. Entanglement sudden death and revival is shown to occur close to the critical temperature.	翻訳日:2023-07-07 18:39:40 公開日:2023-07-06
# アルゴリズム付きプラットフォームによるコンテンツクリエータインセンティブのモデリング Modeling Content Creator Incentives on Algorithm-Curated Platforms ( http://arxiv.org/abs/2206.13102v2 ) ライセンス: Link先を確認	Jiri Hron, Karl Krauth, Michael I. Jordan, Niki Kilbertus, Sarah Dean	(参考訳) コンテンツクリエイターはユーザーの注意を競います。彼らのリーチは、オンラインプラットフォーム上で開発者が行うアルゴリズムの選択に大きく依存する。露出を最大化するために、多くのクリエーターは、スプロールする検索エンジン最適化産業のような例によって証明されているように、戦略的に適応する。これは有限ユーザアテンションプールの競争を招きます。我々はこれらのダイナミクスを、現代の因数分解や(ディープ)2towerアーキテクチャを含むアルゴリズムによって誘導されるインセンティブのモデルである露光ゲームと呼ぶ形で形式化する。非負対非拘束因子化のような一見無害なアルゴリズム選択は、露出ゲームにおける(nash)平衡の存在と特性に大きな影響を与えることが証明される。エクスポージャーゲームのようなクリエーターの行動モデルを使って、(以前の)デプロイ前の監査を行います。このような監査は、望ましいコンテンツとインセンティブのあるコンテンツのミスアライメントを特定し、コンテンツフィルタリングやモデレーションといったポストホックな措置を補完する。そこで本研究では,露出ゲームにおける平衡を数値的に検出するツールを提案し,MovieLensおよびLastFMデータセットの監査結果を示す。さらに, 戦略的に生成したコンテンツは, アルゴリズム探索とコンテンツの多様性, モデル表現率とジェンダーベースユーザとクリエーターグループへの偏見に強く依存していることが判明した。 Content creators compete for user attention. Their reach crucially depends on algorithmic choices made by developers on online platforms. To maximize exposure, many creators adapt strategically, as evidenced by examples like the sprawling search engine optimization industry. This begets competition for the finite user attention pool. We formalize these dynamics in what we call an exposure game, a model of incentives induced by algorithms, including modern factorization and (deep) two-tower architectures. We prove that seemingly innocuous algorithmic choices, e.g., non-negative vs. unconstrained factorization, significantly affect the existence and character of (Nash) equilibria in exposure games. We proffer use of creator behavior models, like exposure games, for an (ex-ante) pre-deployment audit. Such an audit can identify misalignment between desirable and incentivized content, and thus complement post-hoc measures like content filtering and moderation. To this end, we propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets. 
Among other things, we find that the strategically produced content exhibits strong dependence between algorithmic exploration and content diversity, and between model expressivity and bias towards gender-based user and creator groups.	翻訳日:2023-07-07 18:39:23 公開日:2023-07-06
# 視覚観察からのオフライン強化学習の課題と機会 Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations ( http://arxiv.org/abs/2206.04779v3 ) ライセンス: Link先を確認	Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh	(参考訳) オフライン強化学習は、ポリシー学習に大規模な事前に収集されたデータセットを活用する上で大きな可能性を秘めている。しかしながら、連続的なアクションスペースによる視覚的観察からのオフライン強化学習は、この複雑なドメインにおける重要な課題に対する理解が限られているため、未検討のままである。本稿では、視覚領域における連続的な制御のための単純なベースラインを確立し、実世界のオフラインRL問題に存在するデータ分布をより良く表現するために設計された視覚観測からオフライン強化学習のための一連のベンチマークタスクを導入する。このベンチマークタスクを用いて、DreamerV2とDrQ-v2という2つの人気のある視覚ベースのオンライン強化学習アルゴリズムに簡単な修正を加えることで、既存のオフラインRLメソッドを上回り、視覚領域における継続的な制御のための競争的ベースラインを確立することができることを示す。我々はこれらのアルゴリズムを厳密に評価し、視覚的観察から連続制御するための最先端モデルベースとモデルなしオフラインRL法の違いを実証的に評価する。この評価で使用されるコードとデータは、この領域の進歩を促進するためにオープンソース化されている。 Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we establish simple baselines for continuous control in the visual domain and introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations designed to better represent the data distributions present in real-world offline RL problems and guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and visually identifiable changes in dynamics. Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain. 
We rigorously evaluate these algorithms and perform an empirical evaluation of the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.	翻訳日:2023-07-07 18:38:26 公開日:2023-07-06
# 部分的誤り訂正の時代における清潔で汚れたキュービットの戦い The battle of clean and dirty qubits in the era of partial error correction ( http://arxiv.org/abs/2205.13454v3 ) ライセンス: Link先を確認	Daniel Bultrini, Samson Wang, Piotr Czarnik, Max Hunter Gordon, M. Cerezo, Patrick J. Coles, Lukasz Cincio	(参考訳) 誤り訂正が可能になった場合、各論理量子ビットに多数の物理量子ビットを割り当てる必要がある。誤り訂正はより深い回路を動作させることができるが、それぞれの物理量子ビットは計算空間の指数的な増加に寄与しうるため、誤り訂正のためにキュービットを使用するか、ノイズの多いキュービットとして使用するかのトレードオフがある。本研究では、ノイズのない量子ビット(誤り訂正された量子ビットの理想的なモデル)とともにノイズの多い量子ビットを使うことの効果を考察し、これを「クリーンで汚い」構成と呼ぶ。この設定を特徴付けるために解析モデルと数値シミュレーションを用いる。イジングモデルであるハミルトン変分アンサッツ回路において,ノイズ誘起不規則高原(nibps)の出現,すなわちノイズによる観測可能物質の指数関数濃度を示す。一つの量子ビットだけがノイズがあり、十分な回路が与えられたとしても、NIBPは単に量子ビットのサブセットを誤り訂正することによって完全に克服できないことを示唆する。正の面では、回路内のすべてのノイズレスキュービットに対して、勾配可観測物の濃度が指数関数的に抑制され、部分誤差補正の利点が示される。最後に, 解析モデルにより, 観測対象が汚染量子ビットの比に関連する指数のスケーリングに集中していることを示し, これらの知見を裏付ける。 When error correction becomes possible it will be necessary to dedicate a large number of physical qubits to each logical qubit. Error correction allows for deeper circuits to be run, but each additional physical qubit can potentially contribute an exponential increase in computational space, so there is a trade-off between using qubits for error correction or using them as noisy qubits. In this work we look at the effects of using noisy qubits in conjunction with noiseless qubits (an idealized model for error-corrected qubits), which we call the "clean and dirty" setup. We employ analytical models and numerical simulations to characterize this setup. Numerically we show the appearance of Noise-Induced Barren Plateaus (NIBPs), i.e., an exponential concentration of observables caused by noise, in an Ising model Hamiltonian variational ansatz circuit. We observe this even if only a single qubit is noisy and given a deep enough circuit, suggesting that NIBPs cannot be fully overcome simply by error-correcting a subset of the qubits. 
On the positive side, we find that for every noiseless qubit in the circuit, there is an exponential suppression in concentration of gradient observables, showing the benefit of partial error correction. Finally, our analytical models corroborate these findings by showing that observables concentrate with a scaling in the exponent related to the ratio of dirty-to-total qubits.	翻訳日:2023-07-07 18:38:02 公開日:2023-07-06
# マルチドメイン物理インフォームドニューラルネットワークにおけるインタフェース条件のメタ学習 Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks ( http://arxiv.org/abs/2210.12669v2 ) ライセンス: Link先を確認	Shibo Li, Michael Penwarden, Yiming Xu, Conor Tillinghast, Akil Narayan, Robert M. Kirby, Shandian Zhe	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、偏微分方程式(PDE)の一般的なメッシュフリーな解法として登場している。最近の拡張では、ドメインを分解し、異なるPINNを適用して各サブドメインの問題を解決し、サブドメインをインターフェースで縫い合わせる。これにより、問題の複雑性をさらに軽減し、計算コストを削減し、並列化を可能にする。しかし,マルチドメインPINNの性能は,インターフェース条件の選択に敏感である。かなり多くの条件が提案されているが、特定の問題に応じて条件を選択する方法は提案されていない。このギャップに対処するために、パラメトリックPDEのファミリーを解くための適切なインタフェース条件を動的に決定する、シンプルで効率的かつ強力なアプローチであるMETALIC(META Learning of Interface Conditions)を提案する。具体的には,2つの文脈付き多腕バンディット(MAB)モデルを開発した。1つ目はトレーニング全体に適用され、PDEパラメータとインターフェース条件から性能を予測するガウス過程(GP)報酬モデルをオンラインで更新する。我々は,UCBとトンプソンサンプリングの両方に対して劣線形のリグレット上界を証明し,理論的にMABの有効性を保証している。2つ目はトレーニングを確率的フェーズと決定論的フェーズの2段階に分割し、各フェーズのGP報酬を更新することで、2段階で異なる条件選択を可能にし、柔軟性と性能をさらに向上させる。我々は4つのベンチマークPDEファミリーにおいてMETALICの利点を示した。 Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, apply different PINNs to solve the problem in each subdomain, and stitch the subdomains at the interface. Thereby, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of multi-domain PINNs is sensitive to the choice of the interface conditions. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit (MAB) models. 
The first one applies to the entire training course, and updates online a Gaussian process (GP) reward model that, given the PDE parameters and interface conditions, predicts the performance. We prove a sub-linear regret bound for both UCB and Thompson sampling, which in theory guarantees the effectiveness of our MAB. The second one partitions the training into two stages, a stochastic phase and a deterministic phase; we update a GP reward for each phase to enable different condition selections at the two stages, further bolstering flexibility and performance. We have shown the advantage of METALIC on four benchmark PDE families.	翻訳日:2023-07-07 18:31:06 公開日:2023-07-06
# 強度三重相関によるab initio空間位相検索 Ab Initio Spatial Phase Retrieval via Intensity Triple Correlations ( http://arxiv.org/abs/2210.03793v3 ) ライセンス: Link先を確認	Nolan Peard, Kartik Ayyer, and Henry N. Chapman	(参考訳) 非コヒーレントエミッタからの2次強度相関は、空間分布のフーリエ変換係数を明らかにすることができるが、実空間への完全一般フーリエ変換を可能にするための位相の検索は依然として困難である。 3階の強度相関による位相検索は、計算において未対応の符号問題を単純化する特別なエミッタ構成に依存している。この符号問題の完全な処理がなければ、エミッターの真に任意の配置からフーリエ位相を検索する一般的なケースは不可能である。本稿では, 強度三重相関を用いた ab initio 相の一般検索法について述べる。シミュレーションは、撮像星や蛍光原子や分子に応用できる非コヒーレントエミッターのクラスターの正確な位相検索を示す。この研究により、フーリエ変換を直接実行し、遠方界の強度相関のみを通して任意の独立したエミッター配列の画像を再構成することができるようになった。 Second-order intensity correlations from incoherent emitters can reveal the Fourier transform modulus of their spatial distribution, but retrieving the phase to enable completely general Fourier inversion to real space remains challenging. Phase retrieval via the third-order intensity correlations has relied on special emitter configurations which simplified an unaddressed sign problem in the computation. Without a complete treatment of this sign problem, the general case of retrieving the Fourier phase from a truly arbitrary configuration of emitters is not possible. In this paper, a general method for ab initio phase retrieval via the intensity triple correlations is described. Simulations demonstrate accurate phase retrieval for clusters of incoherent emitters which could be applied to imaging stars or fluorescent atoms and molecules. With this work, it is now finally tractable to perform Fourier inversion directly and reconstruct images of arbitrary arrays of independent emitters via far-field intensity correlations alone.	翻訳日:2023-07-07 18:30:12 公開日:2023-07-06
# adaptive sparse vit: セルフアテンションをフル活用した学習可能な適応トークンプルーニング Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention ( http://arxiv.org/abs/2209.13802v2 ) ライセンス: Link先を確認	Xiangcheng Liu, Tianyi Wu, Guodong Guo	(参考訳) ビジョントランスフォーマーはコンピュータビジョンの新しいパラダイムとして登場し、高価な計算コストを伴う優れた性能を示している。画像トークンのプルーニングは、トークン数に対して複雑さが二次的であること、背景領域のみを含む多くのトークンが最終的な予測に真に寄与しないという事実から、ViT圧縮の主要なアプローチの1つである。既存の作業は、個々のトークンの重要性を評価するために追加モジュールに依存するか、異なる入力インスタンスに対して固定比率プルーニング戦略を実装している。本研究では,最小限のコストで適応的なスパーストークンプルーニングフレームワークを提案する。具体的には,まず,安価な注意頭部重要度重み付けクラス注意得点機構を提案する。そして、学習可能なパラメータをしきい値として挿入して、重要でないトークンと情報を区別する。トークンアテンションスコアとしきい値を比較することで、不要なトークンを階層的に破棄し、推論を加速することができる。学習可能なしきい値は、精度と複雑さのバランスをとるために予算対応トレーニングに最適化され、異なる入力インスタンスに対して対応するプルーニング設定を実行する。大規模な実験は我々のアプローチの有効性を実証する。提案手法はdeit-sのスループットを50%向上させ,top-1精度が0.2%低下しただけで,従来の手法よりも精度とレイテンシのトレードオフが向上した。 Vision transformer has emerged as a new paradigm in computer vision, showing excellent performance while accompanied by expensive computational cost. Image token pruning is one of the main approaches for ViT compression, due to the facts that the complexity is quadratic with respect to the token number, and many tokens containing only background regions do not truly contribute to the final prediction. Existing works either rely on additional modules to score the importance of individual tokens, or implement a fixed ratio pruning strategy for different input instances. In this work, we propose an adaptive sparse token pruning framework with a minimal cost. Specifically, we firstly propose an inexpensive attention head importance weighted class attention scoring mechanism. Then, learnable parameters are inserted as thresholds to distinguish informative tokens from unimportant ones. By comparing token attention scores and thresholds, we can discard useless tokens hierarchically and thus accelerate inference. 
The learnable thresholds are optimized in budget-aware training to balance accuracy and complexity, performing the corresponding pruning configurations for different input instances. Extensive experiments demonstrate the effectiveness of our approach. Our method improves the throughput of DeiT-S by 50% and brings only 0.2% drop in top-1 accuracy, which achieves a better trade-off between accuracy and latency than the previous methods.	翻訳日:2023-07-07 18:29:36 公開日:2023-07-06
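The threshold-based pruning step described in the abstract above can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the function name `prune_tokens`, the array shapes, and the use of a fixed scalar threshold (learned during budget-aware training in the paper) are all hypothetical.

```python
import numpy as np

def prune_tokens(tokens, cls_attention, head_importance, threshold):
    """Keep only tokens whose head-weighted class-attention score exceeds a threshold.

    tokens:          (N, D) patch-token embeddings.
    cls_attention:   (H, N) attention from the class token to each patch token, per head.
    head_importance: (H,) non-negative importance of each attention head.
    threshold:       scalar cut-off (fixed here; an assumption of this sketch).
    Returns the kept tokens, the boolean keep-mask, and the per-token scores.
    """
    tokens = np.asarray(tokens, dtype=float)
    cls_attention = np.asarray(cls_attention, dtype=float)
    head_importance = np.asarray(head_importance, dtype=float)
    # Normalise head importances so that they act as mixing weights.
    head_weights = head_importance / head_importance.sum()
    # Importance-weighted average of the class-attention rows gives one score per token.
    scores = head_weights @ cls_attention
    keep_mask = scores > threshold
    return tokens[keep_mask], keep_mask, scores

# Toy usage: 4 tokens, 2 heads, with the second head weighted three times as much.
toy_tokens = np.arange(12.0).reshape(4, 3)
toy_attention = np.array([[0.7, 0.1, 0.1, 0.1],
                          [0.4, 0.4, 0.1, 0.1]])
kept, mask, scores = prune_tokens(toy_tokens, toy_attention, [1.0, 3.0], threshold=0.2)
```

Applying such a step at several depths of the network, each with its own threshold, gives the hierarchical pruning the abstract refers to. Because the mask depends on the input, different images keep different numbers of tokens.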
# アンドロイドは電気羊を笑うのか? New Yorkerのキャプションコンテストにおけるユーモアの「理解」ベンチマーク Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest ( http://arxiv.org/abs/2209.06293v2 ) ライセンス: Link先を確認	Jack Hessel and Ana Marasović and Jena D. Hwang and Lillian Lee and Jeff Da and Rowan Zellers and Robert Mankoff and Yejin Choi	(参考訳) 大規模なニューラルネットワークはジョークを生成できるようになったが、本当にユーモアを「理解」しているのだろうか? 我々は、New Yorker Cartoon Caption Contestから派生した3つのタスクでAIモデルに挑戦する: ジョークと漫画のマッチング、勝利したキャプションの特定、そして勝利したキャプションが面白い理由の説明である。これらのタスクは、漫画を「理解」することの段階的に高度な側面を捉えている。重要な要素は、画像とキャプションの間の複雑でしばしば驚くべき関係と、人間の経験や文化への間接的で遊び心のある言及が頻繁に含まれることである。我々は,マルチモーダルモデルと言語のみのモデルの両方について検討する。前者には漫画画像を直接与え,後者には人間レベルの視覚的理解をシミュレートするために視覚シーンの多面的な記述を与える。どちらのタイプのモデルも3つのタスクすべてで苦戦する。例えば、我々の最良のマルチモーダルモデルは、マッチングタスクにおいて人間の精度を30ポイント下回り、正解(ground-truth)の視覚シーン記述が与えられた場合でも、2/3以上のケースで、人間が書いた説明が最良の機械生成の説明(few-shot GPT-4)よりも一対一の比較で好まれる。モデル、コード、リーダーボード、コーパスを公開する。コーパスには、画像の場所や実体、シーン内の異常な点、ジョークの説明を記述する新たに収集したアノテーションが含まれる。 Large neural networks can now generate jokes, but do they really "understand" humor? We challenge AI models with three tasks derived from the New Yorker Cartoon Caption Contest: matching a joke to a cartoon, identifying a winning caption, and explaining why a winning caption is funny. These tasks encapsulate progressively more sophisticated aspects of "understanding" a cartoon; key elements are the complex, often surprising relationships between images and captions and the frequent inclusion of indirect and playful allusions to human experience and culture. We investigate both multimodal and language-only models: the former are challenged with the cartoon images directly, while the latter are given multifaceted descriptions of the visual scene to simulate human-level visual understanding. We find that both types of models struggle at all three tasks. 
For example, our best multimodal models fall 30 accuracy points behind human performance on the matching task, and, even when provided ground-truth visual scene descriptors, human-authored explanations are preferred head-to-head over the best machine-authored ones (few-shot GPT-4) in more than 2/3 of cases. We release models, code, leaderboard, and corpus, which includes newly-gathered annotations describing the image's locations/entities, what's unusual in the scene, and an explanation of the joke.	翻訳日:2023-07-07 18:29:14 公開日:2023-07-06
# 衛星からのウイルス検出:グラフニューラルネットワークによる西ナイルウイルスの循環のモデル化 Spotting Virus from Satellites: Modeling the Circulation of West Nile Virus Through Graph Neural Networks ( http://arxiv.org/abs/2209.05251v2 ) ライセンス: Link先を確認	Lorenzo Bonicelli, Angelo Porrello, Stefano Vincenzi, Carla Ippoliti, Federica Iapaolo, Annamaria Conte, Simone Calderara	(参考訳) 西ナイルウイルス(West Nile Virus、WNV)の発生は、蚊が媒介する人獣共通ウイルス感染症の中で最も一般的なものの1つである。その循環は通常、ベクターの増殖とウイルスの複製に適した気候および環境条件と関連している。さらに、WNV循環のモデル化と予測のためにいくつかの統計モデルが開発されており、特に近年の地球観測(EO)データの大量利用可能性と人工知能分野の継続的な進歩は、貴重な機会を提供している。本稿では,環境・気候特性を含むことが広く示されている衛星画像を深層ニューラルネットワーク(DNN)に入力することで,WNV循環の予測を試みる。特に,従来の手法が各地点を個別に解析するのに対し,我々は近接する地点の特性も考慮する空間認識手法を提案する。具体的には,グラフニューラルネットワーク(GNN)を基盤として,隣接する地点から特徴を集約し,さらにこれらのモジュールを拡張して,2地点間の温度や土壌水分の差,地理的距離など,複数の関係を考慮する。さらに,ウイルス拡散の季節性を考慮するため,時間関連情報をモデルに直接注入する。我々は、LandsatとSentinelのミッションの衛星画像と、イタリアにおけるWNV循環の地上観測を組み合わせた実験設定を設計する。提案するMulti-Adjacency Graph Attention Network (MAGAT) は, 適切な事前学習段階と組み合わせると, 一貫して高い性能が得られることを示す。最後に,アブレーション研究においてMAGATの各構成要素の重要性を評価する。 The occurrence of West Nile Virus (WNV) represents one of the most common mosquito-borne zoonosis viral infections. Its circulation is usually associated with climatic and environmental conditions suitable for vector proliferation and virus replication. On top of that, several statistical models have been developed to shape and forecast WNV circulation: in particular, the recent massive availability of Earth Observation (EO) data, coupled with the continuous advances in the field of Artificial Intelligence, offer valuable opportunities. In this paper, we seek to predict WNV circulation by feeding Deep Neural Networks (DNNs) with satellite images, which have been extensively shown to hold environmental and climatic features. Notably, while previous approaches analyze each geographical site independently, we propose a spatial-aware approach that considers also the characteristics of close sites. 
Specifically, we build upon Graph Neural Networks (GNN) to aggregate features from neighbouring places, and further extend these modules to consider multiple relations, such as the difference in temperature and soil moisture between two sites, as well as the geographical distance. Moreover, we inject time-related information directly into the model to take into account the seasonality of virus spread. We design an experimental setting that combines satellite images - from Landsat and Sentinel missions - with ground truth observations of WNV circulation in Italy. We show that our proposed Multi-Adjacency Graph Attention Network (MAGAT) consistently leads to higher performance when paired with an appropriate pre-training stage. Finally, we assess the importance of each component of MAGAT in our ablation studies.	翻訳日:2023-07-07 18:28:50 公開日:2023-07-06
# Strange correlators for topological quantum systems from bulk-boundary correspondence ( http://arxiv.org/abs/2209.04283v3 ) License: see link	Luca Lepori and Michele Burrello and Andrea Trombettoni and Simone Paganelli	"Strange" correlators provide a tool to detect topological phases arising in many-body models by computing the matrix elements of suitably defined two-point correlations between the states under investigation and trivial reference states. Their effectiveness depends on the choice of the adopted operators. In this paper we give a systematic procedure for this choice, discussing the advantages of choosing operators using the bulk-boundary correspondence of the systems under scrutiny. Via the scaling exponents, we directly relate the algebraic decay of the strange correlators to the scaling dimensions of gapless edge-mode operators. We begin our analysis with lattice models hosting symmetry-protected topological phases and we analyze the sums of the strange correlators, pointing out that integrating their moduli substantially reduces cancellations and finite-size effects. We also analyze instances of systems hosting intrinsic topological order, as well as strange correlators between states with different nontrivial topologies. 
Our results for both translationally invariant and non-invariant cases, and in the presence of on-site disorder and long-range couplings, extend the validity of the strange-correlator approach for the diagnosis of topological phases of matter, and indicate a general procedure for the optimal choice of operators.	Translated: 2023-07-07 18:28:26 Published: 2023-07-06
# Evaluating the Planning and Operational Resilience of Electrical Distribution Systems with Distributed Energy Resources using Complex Network Theory ( http://arxiv.org/abs/2208.11543v4 ) License: see link	Divyanshi Dwivedi, Pradeep Kumar Yemula, Mayukha Pal	Electrical distribution systems are extensively penetrated with Distributed Energy Resources (DERs) to cater to energy demands, with the general perception that this enhances the system's resilience. However, integration of DERs may adversely affect grid operation and system resilience due to various factors such as their intermittent availability, dynamics of weather conditions, non-linearity, complexity, number of malicious threats, and improved reliability requirements of consumers. This paper proposes a methodology to evaluate the planning and operational resilience of power distribution systems under extreme events and determines the withstand capability of the electrical network. The proposed framework is developed by employing complex network theory. Correlated networks for undesirable configurations are developed from the time series data of active power monitored at nodes of the electrical network. 
For these correlated networks, we compute network parameters such as the clustering coefficient, assortativity coefficient, average degree and power-law exponent for anticipation, and the percolation threshold for determining the network's withstand capability under extreme conditions. The proposed methodology is also suitable for identifying the hosting capacity of solar panels in the system while maintaining resilience under different unfavourable conditions, and for identifying the most critical nodes of the system that could drive the system into non-resilience. This framework is demonstrated on the IEEE 123-node test feeder by generating active power time-series data for a variety of electrical conditions using the simulation software GridLAB-D. The percolation threshold proved to be an effective metric for determining the planning and operational resilience of the power distribution system.	Translated: 2023-07-07 18:28:05 Published: 2023-07-06
# Unbiased Heterogeneous Scene Graph Generation with Relation-aware Message Passing Neural Network ( http://arxiv.org/abs/2212.00443v4 ) License: see link	Kanghoon Yoon, Kibum Kim, Jinyoung Moon, Chanyoung Park	Recent scene graph generation (SGG) frameworks have focused on learning complex relationships among multiple objects in an image. Because message passing neural networks (MPNNs) model high-order interactions between objects and their neighboring objects, they are the dominant representation learning modules for SGG. However, existing MPNN-based frameworks treat the scene graph as a homogeneous graph, which restricts the context-awareness of visual relations between objects. That is, they overlook the fact that the relations tend to be highly dependent on the objects with which the relations are associated. In this paper, we propose an unbiased heterogeneous scene graph generation (HetSGG) framework that captures relation-aware context using message passing neural networks. We devise a novel message passing layer, called relation-aware message passing neural network (RMP), that aggregates the contextual information of an image considering the predicate type between objects. Our extensive evaluations demonstrate that HetSGG outperforms state-of-the-art methods, especially on tail predicate classes.	Translated: 2023-07-07 18:21:25 Published: 2023-07-06
# Fourier-Net: Fast Image Registration with Band-limited Deformation ( http://arxiv.org/abs/2211.16342v2 ) License: see link	Xi Jia, Joseph Bartlett, Wei Chen, Siyang Song, Tianyang Zhang, Xinxing Cheng, Wenqi Lu, Zhaowen Qiu, Jinming Duan	Unsupervised image registration commonly adopts U-Net style networks to predict dense displacement fields in the full-resolution spatial domain. For high-resolution volumetric image data, however, this process is resource-intensive and time-consuming. To tackle this problem, we propose Fourier-Net, which replaces the expansive path in a U-Net style network with a parameter-free model-driven decoder. Specifically, instead of learning to output a full-resolution displacement field in the spatial domain, Fourier-Net learns its low-dimensional representation in a band-limited Fourier domain. This representation is then decoded by our model-driven decoder (consisting of a zero-padding layer and an inverse discrete Fourier transform layer) to the dense, full-resolution displacement field in the spatial domain. These changes allow our unsupervised Fourier-Net to contain fewer parameters and computational operations, resulting in faster inference speeds. Fourier-Net is then evaluated on two public 3D brain datasets against various state-of-the-art approaches. 
For example, when compared to a recent transformer-based method named TransMorph, our Fourier-Net, which uses only 2.2% of its parameters and 6.66% of its multiply-add operations, achieves a 0.5% higher Dice score and an 11.48 times faster inference speed. Code is available at https://github.com/xi-jia/Fourier-Net.	Translated: 2023-07-07 18:21:04 Published: 2023-07-06
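The decoder described in the Fourier-Net entry above (zero padding followed by an inverse discrete Fourier transform) can be illustrated with a one-dimensional toy. This is a minimal sketch and not the authors' implementation: the signal, the sizes `N` and `K`, and the helper names `dft`, `idft` and `zero_pad_spectrum` are assumptions made for illustration, and the low-frequency coefficients are taken from a known field as a stand-in for what a network would predict.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def zero_pad_spectrum(low, n):
    """Place centred low-frequency coefficients (k = -h..h) into a length-n spectrum."""
    h = len(low) // 2
    full = [0j] * n
    for i, coeff in enumerate(low):
        k = i - h                  # signed frequency index
        full[k % n] = coeff        # negative frequencies wrap to the end
    return full

N, K = 16, 5                       # full resolution and band-limited size (toy values)
# A band-limited 1D "displacement field": only frequencies |k| <= 2 are present.
field = [0.5 + math.cos(2 * math.pi * t / N) + 0.3 * math.sin(4 * math.pi * t / N)
         for t in range(N)]

spectrum = dft(field)
h = K // 2
low = [spectrum[k % N] for k in range(-h, h + 1)]   # the K-coefficient representation

# Model-driven decoding: zero-pad to full size, then inverse DFT.
decoded = [v.real for v in idft(zero_pad_spectrum(low, N))]
max_err = max(abs(a - b) for a, b in zip(field, decoded))
```

Because the toy field contains no frequencies outside the retained band, the five coefficients reconstruct all sixteen samples up to floating-point error; a field with higher-frequency content would only be approximated.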
# A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms ( http://arxiv.org/abs/2211.12386v2 ) License: see link	Danimir T. Doncevic, Alexander Mitsos, Yue Guo, Qianxiao Li, Felix Dietrich, Manuel Dahmen, Ioannis G. Kevrekidis	Meta-learning of numerical algorithms for a given task consists of the data-driven identification and adaptation of an algorithmic structure and the associated hyperparameters. To limit the complexity of the meta-learning problem, neural architectures with a certain inductive bias towards favorable algorithmic structures can, and should, be used. We generalize our previously introduced Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms. In contrast to off-the-shelf deep learning approaches, it features a distinct division into modules for the generation of information and for the subsequent assembly of this information towards a solution. Local information in the form of a subspace is generated by subordinate, inner iterations of recurrent function evaluations starting at the current outer iterate. The update to the next outer iterate is computed as a linear combination of these evaluations, reducing the residual in this space, and constitutes the output of the network. 
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta integrators for ordinary differential equations. Due to its modularity, the superstructure can be readily extended with functionalities needed to represent more general classes of iterative algorithms traditionally based on Taylor series expansions.	Translated: 2023-07-07 18:20:40 Published: 2023-07-06
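The structure described in the R2N2 entry above (inner iterations that generate a subspace, followed by a linear combination that forms the next outer iterate) can be sketched for a small linear system. This is an illustrative toy, not the authors' architecture: the matrix `A`, the right-hand side `b`, and the weights `theta` are hand-picked assumptions standing in for trained parameters, and the inner evaluations are simply the Krylov vectors r, Ar.

```python
def matvec(a, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def outer_step(a, b, x, theta):
    """One outer iterate: inner evaluations r, A r, ... combined linearly with theta."""
    r = [bi - ci for bi, ci in zip(b, matvec(a, x))]
    basis = [r]
    for _ in range(len(theta) - 1):
        basis.append(matvec(a, basis[-1]))        # next inner evaluation
    return [xi + sum(t * v[i] for t, v in zip(theta, basis))
            for i, xi in enumerate(x)]

# Symmetric toy system with eigenvalues 1, 2 and 4.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
theta = [0.5, -0.05]     # assumed weights; in R2N2 these would be learned from data

x = [0.0, 0.0, 0.0]
residual_norms = []
for _ in range(30):
    x = outer_step(A, b, x, theta)
    r = [bi - ci for bi, ci in zip(b, matvec(A, x))]
    residual_norms.append(dot(r, r) ** 0.5)
```

With these weights each outer step multiplies the residual by the polynomial 1 - 0.5λ + 0.05λ² evaluated on the spectrum of `A`, whose magnitude is at most 0.55 for the eigenvalues 1, 2 and 4, so the residual norm shrinks at every step.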
# Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics ( http://arxiv.org/abs/2211.10580v2 ) License: see link	Ancheng Lin, Jun Li	High-quality estimation of surface normals can help reduce ambiguity in many geometry understanding problems, such as collision avoidance and occlusion inference. This paper presents a technique for estimating normals from 3D point clouds and 2D colour images. We have developed a transformer neural network that learns to utilise the hybrid information of visual semantic and 3D geometric data, as well as effective learning strategies. Experiments show that the information fusion of the proposed method is more effective than that of existing methods. We have also built a simulation environment of outdoor traffic scenes in a 3D rendering engine to obtain annotated data to train the normal estimator. The model trained on synthetic data is tested on real scenes in the KITTI dataset. Subsequent tasks built upon the estimated normal directions in the KITTI dataset show that the proposed estimator has an advantage over existing methods.	Translated: 2023-07-07 18:19:51 Published: 2023-07-06
# A Finite-Particle Convergence Rate for Stein Variational Gradient Descent ( http://arxiv.org/abs/2211.09721v4 ) License: see link	Jiaxin Shi and Lester Mackey	We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order 1/sqrt(log log n) rate. We suspect that the dependence on n can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements.	Translated: 2023-07-07 18:19:38 Published: 2023-07-06
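For readers unfamiliar with the algorithm analysed in the SVGD entry above, the particle update can be sketched in a few lines. This is a minimal one-dimensional illustration under stated assumptions, not the setting of the paper's analysis: the RBF kernel with fixed bandwidth `h`, the constant step size, the standard normal target, and the initial particle positions are all choices made here for the example.

```python
import math

def svgd_step(xs, score, step, h=1.0):
    """One SVGD update for 1D particles with an RBF kernel of bandwidth h."""
    n = len(xs)
    updated = []
    for xi in xs:
        phi = 0.0
        for xj in xs:
            k = math.exp(-(xj - xi) ** 2 / (2 * h * h))
            grad_k = -(xj - xi) / (h * h) * k     # derivative of k with respect to xj
            phi += k * score(xj) + grad_k         # attraction to the target + repulsion
        updated.append(xi + step * phi / n)
    return updated

def standard_normal_score(x):
    """Gradient of the log-density of N(0, 1)."""
    return -x

# Twenty particles that start far from the target, evenly spaced on [3, 5].
particles = [3.0 + 2.0 * i / 19 for i in range(20)]
for _ in range(1000):
    particles = svgd_step(particles, standard_normal_score, step=0.1)

mean = sum(particles) / len(particles)
spread = max(particles) - min(particles)
```

The first term in `phi` pulls particles toward regions of high target density, while the kernel-gradient term pushes them apart so that they do not collapse onto a single point.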
# DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models ( http://arxiv.org/abs/2210.14896v4 ) License: see link	Zijie J. Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover, Duen Horng Chau	With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. 
DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.	Translated: 2023-07-07 18:19:07 Published: 2023-07-06
# A Chinese Spelling Check Framework Based on Reverse Contrastive Learning ( http://arxiv.org/abs/2210.13823v2 ) License: see link	Nankai Lin, Hongyan Wu, Sihui Fu, Shengyi Jiang, Aimin Yang	Chinese spelling check is a task to detect and correct spelling mistakes in Chinese text. Existing research aims to enhance the text representation and use multi-source information to improve the detection and correction capabilities of models, but pays little attention to improving their ability to distinguish between confusable words. Contrastive learning, whose aim is to minimize the distance in representation space between similar sample pairs, has recently become a dominant technique in natural language processing. Inspired by contrastive learning, we present a novel framework for Chinese spelling check, which consists of three modules: language representation, spelling check and reverse contrastive learning. Specifically, we propose a reverse contrastive learning strategy, which explicitly forces the model to minimize the agreement between similar examples, namely, the phonetically and visually confusable characters. Experimental results show that our framework is model-agnostic and can be combined with existing Chinese spelling check models to yield state-of-the-art performance.	Translated: 2023-07-07 18:18:43 Published: 2023-07-06
# Adapting Neural Link Predictors for Complex Query Answering ( http://arxiv.org/abs/2301.12313v2 ) License: see link	Erik Arakelyan, Pasquale Minervini, Isabelle Augenstein	Answering complex queries on incomplete knowledge graphs is a challenging task where a model needs to answer complex logical queries in the presence of missing knowledge. Recently, Arakelyan et al. (2021) and Minervini et al. (2022) showed that neural link predictors could also be used for answering complex queries: their Continuous Query Decomposition (CQD) method works by decomposing complex queries into atomic sub-queries, answering them using neural link predictors, and aggregating their scores via t-norms to rank the answers to each complex query. However, CQD does not handle negations and only uses the training signal from atomic training queries: neural link prediction scores are not calibrated to interact together via fuzzy logic t-norms during complex query answering. In this work, we propose to address this problem by training a parameter-efficient score adaptation model to re-calibrate neural link prediction scores: this new component is trained on complex queries by back-propagating through the complex query-answering process. 
Our method, CQD$^{A}$, produces significantly more accurate results than current state-of-the-art methods, improving from $34.4$ to $35.1$ Mean Reciprocal Rank values averaged across all datasets and query types while using $\leq 35\%$ of the available training query types. We further show that CQD$^{A}$ is data-efficient, achieving competitive results with only $1\%$ of the training data, and robust in out-of-domain evaluations.	Translated: 2023-07-07 18:11:31 Published: 2023-07-06
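The scoring pipeline summarised in the CQD entry above (atomic scores from a link predictor, re-calibrated and then combined with a t-norm) can be sketched on toy numbers. Everything specific here is an assumption for illustration: the affine-then-sigmoid form of `adapt`, its parameters `alpha` and `beta`, and the raw scores are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adapt(raw, alpha, beta):
    """Hypothetical affine re-calibration of a raw link-prediction score."""
    return sigmoid(alpha * raw + beta)

def product_t_norm(a, b):
    return a * b

def godel_t_norm(a, b):
    return min(a, b)

# Toy raw scores for the two atoms of a conjunctive query, per candidate answer.
raw_scores = {
    "e1": (2.0, -1.0),
    "e2": (0.5, 0.5),
    "e3": (-2.0, 3.0),
}

def rank(t_norm, alpha=1.0, beta=0.0):
    """Rank candidate answers by the t-norm of their adapted atom scores."""
    scored = {entity: t_norm(adapt(s1, alpha, beta), adapt(s2, alpha, beta))
              for entity, (s1, s2) in raw_scores.items()}
    return sorted(scored, key=scored.get, reverse=True)

ranking_product = rank(product_t_norm)
ranking_godel = rank(godel_t_norm)
```

A conjunction is only as plausible as its atoms jointly allow, so the candidate with two moderate scores outranks candidates with one strong and one weak atom under both t-norms; training `alpha` and `beta` on complex queries is what would change how the atoms trade off.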
# On the linear friction many-body equation for dissipative spontaneous wavefunction collapse ( http://arxiv.org/abs/2301.07661v2 ) License: see link	Giovanni Di Bartolomeo and Matteo Carlesso and Kristian Piscicchia and Catalina Curceanu and Maaneli Derakhshani and Lajos Diósi	We construct and study the simplest universal dissipative Lindblad master equation for many-body systems, with the purpose of providing a new dissipative extension of existing nonrelativistic theories of fundamental spontaneous decoherence and spontaneous wave function collapse in nature. It is universal as it is written in terms of the second-quantized mass density $\hat \rho$ and current $\hat J$, thus making it independent of the material structure and its parameters. Assuming linear friction in the current, we find that the dissipative structure is strictly constrained. Following the general structure of our dissipative Lindblad equation, we derive and analyze the dissipative extensions of the two best-known spontaneous wave function collapse models, the Diósi-Penrose and the continuous spontaneous localization models.	Translated: 2023-07-07 18:10:59 Published: 2023-07-06
# Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise ( http://arxiv.org/abs/2301.01054v2 ) License: see link	Hendrik A. Mehrtens, Alexander Kurz, Tabea-Clara Bucher, Titus J. Brinker	In the past years, deep learning has seen an increase in usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images, with a focus on the task of selective classification, where the model should reject the classification in situations in which it is uncertain. We conduct our experiments at tile level under domain shift and label noise, as well as at slide level. In our experiments, we compare Deep Ensembles, Monte-Carlo Dropout, Stochastic Variational Inference, Test-Time Data Augmentation as well as ensembles of the latter approaches. 
We observe that ensembles of methods generally lead to better uncertainty estimates as well as increased robustness towards domain shifts and label noise, while, contrary to results from classical computer vision benchmarks, no systematic gain of the other methods can be shown. Across methods, rejecting the most uncertain samples reliably leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.	Translated: 2023-07-07 18:10:41 Published: 2023-07-06
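The selective classification task evaluated in the entry above amounts to rejecting the least confident predictions and measuring accuracy on what remains. The sketch below shows that computation on made-up numbers; the function name `accuracy_at_coverage` and the toy confidences and correctness labels are assumptions for illustration, not data from the paper.

```python
def accuracy_at_coverage(confidences, correct, coverage):
    """Accuracy on the `coverage` fraction of samples with the highest confidence."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    kept = order[:max(1, round(coverage * len(order)))]
    return sum(correct[i] for i in kept) / len(kept)

# Toy predictions: model confidence and whether each prediction was correct.
confidences = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.52]
correct = [1, 1, 1, 1, 0, 1, 0, 0]

full_accuracy = accuracy_at_coverage(confidences, correct, 1.0)   # keep everything
half_accuracy = accuracy_at_coverage(confidences, correct, 0.5)   # reject the uncertain half
```

In this toy case accuracy rises from 5/8 to 4/4 when the least confident half is rejected, which is the kind of gain a well-calibrated uncertainty estimate is supposed to deliver.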
# Nonlinear perturbation of a high-order exceptional point: skin discrete breathers and the hierarchical power-law scaling ( http://arxiv.org/abs/2212.13765v2 ) License: see link	Hui Jiang, Enhong Cheng, Ziyu Zhou, and Li-Jun Lang	We study the nonlinear perturbation of a high-order exceptional point (EP) of order equal to the system site number $L$ in a Hatano-Nelson model with unidirectional hopping and Kerr nonlinearity. Notably, we find a class of discrete breathers that aggregate at one boundary, which we name skin discrete breathers (SDBs). The nonlinear spectrum of these SDBs shows a hierarchical power-law scaling near the EP. Specifically, the response of the nonlinear energy to the perturbation is given by $E_m\propto \varGamma^{\alpha_{m}}$, where $\alpha_m=3^{m-1}$ is the power with $m=1,\cdots,L$ labeling the nonlinear energy bands. This is in sharp contrast to the $L$-th root of a linear perturbation in general. These SDBs decay in a double-exponential manner, unlike the edge states or skin modes in linear systems, which decay exponentially. Furthermore, these SDBs can survive over the full range of nonlinearity strength and are continuously connected to the self-trapped states in the limit of large nonlinearity. 
They are also stable, as confirmed by a stability analysis using a suitably defined nonlinear fidelity of an adiabatic evolution. As nonreciprocal nonlinear models may be experimentally realized in various platforms, such as the classical platform of optical waveguides, where Kerr nonlinearity is naturally present, and the quantum platform of optical lattices with Bose-Einstein condensates, our analytical results may inspire further exploration of the interplay between nonlinearity and non-Hermiticity, particularly on high-order EPs, and benchmark the relevant simulations.	Translated: 2023-07-07 18:10:12 Published: 2023-07-06
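The contrast drawn in the entry above, between the $L$-th-root response to a linear perturbation and the exponents $\alpha_m=3^{m-1}$, can be made concrete for the linear side. The sketch assumes one particular linear perturbation, a coupling `eps` from the last site back to the first in a chain with unidirectional hopping of unit strength; that choice, the values of `L` and `eps`, and the function names are illustrative assumptions and not the paper's nonlinear model.

```python
import cmath

def perturbed_chain_eigenvalues(L, eps):
    """Roots of lam**L = eps: spectrum of the unidirectional chain closed by eps."""
    radius = eps ** (1.0 / L)
    return [radius * cmath.exp(2j * cmath.pi * k / L) for k in range(L)]

def residual(L, eps, lam):
    """Max-norm of H v - lam v for the trial eigenvector v_i = lam**(-i)."""
    v = [lam ** (-i) for i in range(L)]
    rows = [eps * v[L - 1] - lam * v[0]]                   # row 0: corner coupling
    rows += [v[i - 1] - lam * v[i] for i in range(1, L)]   # rows 1..L-1: hop from i-1 to i
    return max(abs(r) for r in rows)

L, eps = 8, 1e-8
eigenvalues = perturbed_chain_eigenvalues(L, eps)
worst_residual = max(residual(L, eps, lam) for lam in eigenvalues)

linear_scale = abs(eigenvalues[0])                          # eps**(1/L)
nonlinear_exponents = [3 ** (m - 1) for m in range(1, L + 1)]
```

A perturbation of size 1e-8 therefore shifts the linear spectrum by about 0.1 for eight sites, whereas the exponents 1, 3, 9, ... quoted for the nonlinear bands describe a response that becomes rapidly weaker with increasing band index.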
# Computer says "No": The Case Against Empathetic Conversational AI ( http://arxiv.org/abs/2212.10983v2 ) License: see link	Alba Curry, Amanda Cercas Curry	Emotions are an integral part of human cognition and they guide not only our understanding of the world but also our actions within it. As such, whether we soothe or flame an emotion is not inconsequential. Recent work in conversational AI has focused on responding empathetically to users, validating and soothing their emotions without a real basis. This AI-aided emotional regulation can have negative consequences for users and society, tending towards a one-noted happiness defined as only the absence of "negative" emotions. We argue that we must carefully consider whether and how to respond to users' emotions.	Translated: 2023-07-07 18:09:38 Published: 2023-07-06
# Bell inequalities in 2-2 scattering ( http://arxiv.org/abs/2212.10213v3 ) License: see link	Aninda Sinha, Ahmadullah Zahed	We consider Bell inequalities in 2-2 scattering of photons, gravitons, fermions and pions. We choose measurement settings that give maximum Bell violation for maximally entangled states and calculate the relevant Bell inequalities for these processes. For photon scattering at low energies, QED exhibits Bell violation for all scattering angles except for a small transverse region. This leads to a fine-tuning problem. Incorporating a light axion/axion-like particle (ALP) removes the fine-tuning problem and constrains the axion-coupling and axion-mass parameters. Allowing for graviton exchange and demanding Bell violation in photon scattering, we find that the Weak Gravity Conjecture is satisfied. The quantum gravity effect on the axion coupling is discussed. For 2-2 graviton scattering, we find that CEMZ bounds allow for at most small Bell violations. A restriction on the Weinberg angle is found by demanding Bell violation in Bhabha scattering. We use recent S-matrix bootstrap data for pions and photons to study the Bell parameter in the space of allowed S-matrices. In the photon case, we study the Bell parameters as a function of energy and find support for the EFT observations. We discuss the Bell parameter for S-matrices of pions, which are qutrits. 
For pions, we find that a suitable Bell parameter is minimized for S-matrices which exhibit Regge behaviour.	Translated: 2023-07-07 18:09:27 Published: 2023-07-06
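The phrase "measurement settings that give maximum Bell violation for maximally entangled states" in the entry above can be illustrated with the textbook two-qubit CHSH case. This is a generic sketch, not the paper's scattering calculation: the state, the measurement angles, and the function names are assumptions chosen for the example, and the paper's Bell parameters for photons, gravitons and qutrit-valued pions may be defined differently.

```python
import math

def observable(theta):
    """Spin measurement cos(theta) Z + sin(theta) X as a 2x2 real matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def correlation(state, angle_a, angle_b):
    """Expectation value of A(angle_a) tensor A(angle_b) in a real two-qubit state."""
    op_a, op_b = observable(angle_a), observable(angle_b)
    total = 0.0
    for i in range(2):
        for j in range(2):
            for p in range(2):
                for q in range(2):
                    total += (state[2 * i + j] * op_a[i][p] * op_b[j][q]
                              * state[2 * p + q])
    return total

# Maximally entangled state (|00> + |11>) / sqrt(2), amplitudes ordered 00, 01, 10, 11.
bell_state = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]

a, a_prime = 0.0, math.pi / 2
b, b_prime = math.pi / 4, -math.pi / 4

chsh = (correlation(bell_state, a, b) + correlation(bell_state, a, b_prime)
        + correlation(bell_state, a_prime, b) - correlation(bell_state, a_prime, b_prime))
```

For this state the correlation reduces to the cosine of the angle difference, so the four terms each contribute √2/2 and the CHSH combination reaches 2√2, above the bound of 2 obeyed by local hidden-variable models.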
# Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model ( http://arxiv.org/abs/2212.09811v2 ) License: see link	Yeskendir Koishekenov, Vassilina Nikoulina, Alexandre Berard	Compared to conventional bilingual translation systems, massively multilingual machine translation is appealing because a single model can translate into multiple languages and benefit from knowledge transfer for low-resource languages. On the other hand, massively multilingual models suffer from the curse of multilinguality unless their size is scaled massively, which increases their training and inference costs. Sparse Mixture-of-Experts models are a way to drastically increase model capacity without the need for a proportional amount of computing. The recently released NLLB-200 is an example of such a model. It covers 202 languages but requires at least four 32GB GPUs just for inference. In this work, we propose a pruning method that allows the removal of up to 80% of experts with a negligible loss in translation quality, which makes it feasible to run the model on a single 32GB GPU. Further analysis suggests that our pruning metrics allow us to identify language-specific experts and prune non-relevant experts for a given language pair.	Translated: 2023-07-07 18:09:07 Published: 2023-07-06
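The idea in the entry above, keeping only the experts that matter for a given language pair, can be sketched as a ranking problem. The abstract does not state which pruning metric is used, so the routing-count metric, the function name `prune_experts`, and the toy numbers below are all assumptions made for illustration.

```python
def prune_experts(usage, keep_fraction):
    """Keep the most-used experts for one language pair; return their ids, sorted."""
    n_keep = max(1, round(keep_fraction * len(usage)))
    ranked = sorted(usage, key=usage.get, reverse=True)
    return sorted(ranked[:n_keep])

# Hypothetical routing counts for ten experts on one language pair.
usage = {0: 910, 1: 12, 2: 3, 3: 450, 4: 0, 5: 27, 6: 5, 7: 1, 8: 88, 9: 2}

kept = prune_experts(usage, keep_fraction=0.2)    # remove 80% of the experts
```

When routing is this skewed, the two retained experts handle most of the tokens for the pair, which is the intuition behind pruning a large share of experts with little quality loss.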
# LOANet:UAV空中リモートセンシング画像から建物や道路を抽出するオブジェクト注意を用いた軽量ネットワーク LOANet: A Lightweight Network Using Object Attention for Extracting Buildings and Roads from UAV Aerial Remote Sensing Images ( http://arxiv.org/abs/2212.08490v6 ) ライセンス: Link先を確認	Xiaoxiang Han, Yiman Liu, Gang Liu, Yuanjie Lin, Qiaohong Liu	(参考訳) 深層学習による無人航空機(uav)リモートセンシング画像から建物や道路を抽出するセマンティックセグメンテーションは、測量やマッピングの分野で従来の手動セグメンテーションよりも効率的で便利である。モデルを軽量化し,モデルの精度を向上させるために,uav空中リモートセンシング画像から建物や道路にオブジェクト・アテンション(loanet)を用いた軽量ネットワークを提案する。提案するネットワークは,軽量Densely Connected Network (LDCNet) をエンコーダとして開発したエンコーダデコーダアーキテクチャを採用している。復号器部では、Atrous Space Pyramid Pooling Module (ASPP) と Object Attention Module (OAM) から構成される2つのマルチスケールコンテキストモジュールが、UAVリモートセンシング画像の特徴マップからより多くのコンテキスト情報を取得するように設計されている。 ASPPとOAMの間には、ASPPから抽出したマルチスケール機能を融合するために、FPN(Feature Pyramid Network)モジュールが使用されている。 2431のトレーニングセット、945の検証セット、および475のテストセットを含むUAVが撮影するリモートセンシング画像のプライベートデータセットを構築する。提案する基本モデルは1.4mパラメータと5.48g浮動小数点演算(flops)しか持たず、優れた平均交叉結合(miou)を達成している。 LoveDAとCITY-OSMデータセットのさらなる実験を行い、提案した基本モデルと大規模モデルの有効性をさらに検証し、優れたmIoU結果を得た。すべてのコードはhttps://github.com/GtLinyer/LOANetで入手できる。 Semantic segmentation for extracting buildings and roads from uncrewed aerial vehicle (UAV) remote sensing images by deep learning becomes a more efficient and convenient method than traditional manual segmentation in surveying and mapping fields. In order to make the model lightweight and improve the model accuracy, a Lightweight Network Using Object Attention (LOANet) for Buildings and Roads from UAV Aerial Remote Sensing Images is proposed. The proposed network adopts an encoder-decoder architecture in which a Lightweight Densely Connected Network (LDCNet) is developed as the encoder. In the decoder part, the dual multi-scale context modules which consist of the Atrous Spatial Pyramid Pooling module (ASPP) and the Object Attention Module (OAM) are designed to capture more context information from feature maps of UAV remote sensing images. 
Between ASPP and OAM, a Feature Pyramid Network (FPN) module is used to fuse multi-scale features extracted from ASPP. A private dataset of remote sensing images taken by UAV which contains 2431 training sets, 945 validation sets, and 475 test sets is constructed. The proposed basic model performs well on this dataset, with only 1.4M parameters and 5.48G floating point operations (FLOPs), achieving excellent mean Intersection-over-Union (mIoU). Further experiments on the publicly available LoveDA and CITY-OSM datasets have been conducted to validate the effectiveness of the proposed basic and large model, and outstanding mIoU results have been achieved. All code is available at https://github.com/GtLinyer/LOANet.	翻訳日:2023-07-07 18:08:47 公開日:2023-07-06
# 量子隠れマルコフモデルの実装と学習 Implementation and Learning of Quantum Hidden Markov Models ( http://arxiv.org/abs/2212.03796v2 ) ライセンス: Link先を確認	Vanio Markov, Vladimir Rastunkov, Amol Deshmukh, Daniel Fry, Charlee Stefanski	(参考訳) 本稿では、量子チャネルとオープン量子システムの理論を用いて、量子隠れマルコフモデル(QHMM)として知られる確率的生成系の効率的なユニタリ特性を提供する。ユニタリ・キャラクタリゼーションを利用して、任意のQHMMを中間回路計測による量子回路として実装できることを実証する。従来の隠れマルコフモデル (HMM) と比較して, QHMM は確率過程言語のより効率的な定義であることを示す。 QHMMを量子チャネルとして定式化することから始め、Stinespring の構成を用いて、これらのモデルを中間回路測定によるユニタリ量子回路として表現する。 QHMMの単位パラメータ化を利用して,形式的QHMM学習モデルを定義する。このモデルは、対象の確率過程言語の経験的分布を定式化し、量子回路の仮説空間を定義し、学習の成功基準として経験的確率的発散測度ハイポテーゼ適合性を導入する。学習モデルは,Stinespringの拡張の連続性により,スムーズな探索環境を有することを示す。仮説と適合空間の滑らかなマッピングは、効率的なヒューリスティックおよび勾配降下学習アルゴリズムの開発を可能にする。本稿では,QHMMのための2つの実践的学習アルゴリズムを提案する。最初のアルゴリズムはハイパーパラメータ適応進化探索である。第2のアルゴリズムは、マルチパラメータ非線形最適化手法を用いてqhmmを量子アンサッツ回路として学習する。 In this article we use the theory of quantum channels and open quantum systems to provide an efficient unitary characterization of a class of stochastic generators known as quantum hidden Markov models (QHMMs). By utilizing the unitary characterization, we demonstrate that any QHMM can be implemented as a quantum circuit with mid-circuit measurement. We prove that QHMMs are more efficient definitions of stochastic process languages compared to the equivalent classical hidden Markov Models (HMMs). Starting with the formulation of QHMMs as quantum channels, we employ Stinespring's construction to represent these models as unitary quantum circuits with mid-circuit measurement. By utilizing the unitary parameterization of QHMMs, we define a formal QHMM learning model. The model formalizes the empirical distributions of target stochastic process languages, defines hypothesis space of quantum circuits, and introduces an empirical stochastic divergence measure - hypothesis fitness - as a success criterion for learning. We demonstrate that the learning model has a smooth search landscape due to the continuity of Stinespring's dilation. 
The smooth mapping between the hypothesis and fitness spaces enables the development of efficient heuristic and gradient descent learning algorithms. We propose two practical learning algorithms for QHMMs. The first algorithm is a hyperparameter-adaptive evolutionary search. The second algorithm learns the QHMM as a quantum ansatz circuit using a multi-parameter non-linear optimization technique.	翻訳日:2023-07-07 18:08:16 公開日:2023-07-06
# マルチラベルテキスト分類におけるコントラスト学習の有効活用 An Effective Employment of Contrastive Learning in Multi-label Text Classification ( http://arxiv.org/abs/2212.00552v2 ) ライセンス: Link先を確認	Nankai Lin, Guanqiu Qin, Jigang Wang, Aimin Yang, Dong Zhou	(参考訳) 自然言語処理タスクにおけるコントラスト学習技術の有効性はまだ探究・分析されていない。正と負のサンプルを正しくかつ合理的に構築する方法は、コントラスト学習の核となる課題である。複数ラベルのテキスト分類タスクで対照的なオブジェクトを見つけるのはさらに難しい。以前提案された対照的な損失はほとんどない。本稿では,複数ラベルのテキスト分類タスクに対して,新しいコントラスト損失を5つ提案することにより,問題を異なる角度から検討する。これらは、SCL(Strict Contrastive Loss)、ICL(Intra-label Contrastive Loss)、JSCL(Jaccard similarity Contrastive Loss)、JSPCL(Jaccard similarity Probability Contrastive Loss)、SLCL(Stepwise Label Contrastive Loss)である。本稿では,これら新たな損失の雇用によるマルチラベルテキスト分類タスクにおけるコントラスト学習の有効性について検討し,コントラスト学習手法を特定のタスクに展開するためのベースラインモデルを提案する。さらに,このアプローチの解釈可能な分析を行い,コントラスト学習損失の異なる要素がどのように役割を担っているかを示す。実験結果から,提案したコントラスト損失は,複数ラベルテキスト分類タスクの改善につながることが示された。また,マルチラベルテキスト分類タスクにコントラスト学習をどのように適用すべきかについても検討した。 The effectiveness of contrastive learning technology in natural language processing tasks is yet to be explored and analyzed. How to construct positive and negative samples correctly and reasonably is the core challenge of contrastive learning. It is even harder to discover contrastive objects in multi-label text classification tasks. There are very few contrastive losses proposed previously. In this paper, we investigate the problem from a different angle by proposing five novel contrastive losses for multi-label text classification tasks. These are Strict Contrastive Loss (SCL), Intra-label Contrastive Loss (ICL), Jaccard Similarity Contrastive Loss (JSCL), Jaccard Similarity Probability Contrastive Loss (JSPCL), and Stepwise Label Contrastive Loss (SLCL). We explore the effectiveness of contrastive learning for multi-label text classification tasks by the employment of these novel losses and provide a set of baseline models for deploying contrastive learning techniques on specific tasks. 
We further perform an interpretable analysis of our approach to show how different components of contrastive learning losses play their roles. The experimental results show that our proposed contrastive losses can bring improvement to multi-label text classification tasks. Our work also explores how contrastive learning should be adapted for multi-label text classification tasks.	翻訳日:2023-07-07 18:07:57 公開日:2023-07-06
# FederatedTrust: 信頼できるフェデレーション学習のためのソリューション FederatedTrust: A Solution for Trustworthy Federated Learning ( http://arxiv.org/abs/2302.09844v2 ) ライセンス: Link先を確認	Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Ning Xie, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller	(参考訳) IoT(Internet of Things)とエッジコンピューティングの急速な拡張により、センシティブな情報を保持する分散データサイロの存在により、集中型機械学習(ML/DL)手法の課題が提示された。データプライバシに関する懸念に対処するため、フェデレートラーニング(FL)のようなML/DL技術が登場している。しかし、モデル予測に対する信頼を確立する必要性が高まっているため、データのプライバシとパフォーマンスの確保だけでは不十分である。既存の文献では、信頼できるML/DL(データプライバシを除く)に対して、堅牢性、公正性、説明可能性、説明責任を重要な柱として特定する様々なアプローチが提案されている。それでも、FLモデルに関連する信頼性柱と評価指標を同定し、FLモデルの信頼性レベルを計算できるソリューションを開発するためには、さらなる研究が必要である。本研究は、FLの信頼性を評価するための既存の要件を検証し、6つの柱(プライバシー、ロバスト性、公平性、説明可能性、説明責任、連邦)と、FLモデルの信頼性を計算するための30以上の指標からなる包括的分類法を導入する。その後、FederatedTrustというアルゴリズムが分類学で同定された柱とメトリクスに基づいて設計され、FLモデルの信頼性スコアが計算される。 FederatedTrustのプロトタイプが実装され、十分に確立されたFLフレームワークであるFederatedScopeの学習プロセスに統合される。最後に,FederatedScopeの異なる構成を用いて5つの実験を行い,FLモデルの信頼性計算におけるFederatedTrustの有用性を実証した。 3つの実験では、FEMNISTデータセットを使用し、2つは、実際のIoTセキュリティユースケースを考慮したN-BaIoTデータセットを使用する。 The rapid expansion of the Internet of Things (IoT) and Edge Computing has presented challenges for centralized Machine and Deep Learning (ML/DL) methods due to the presence of distributed data silos that hold sensitive information. To address concerns regarding data privacy, collaborative and privacy-preserving ML/DL techniques like Federated Learning (FL) have emerged. However, ensuring data privacy and performance alone is insufficient since there is a growing need to establish trust in model predictions. Existing literature has proposed various approaches on trustworthy ML/DL (excluding data privacy), identifying robustness, fairness, explainability, and accountability as important pillars. 
Nevertheless, further research is required to identify trustworthiness pillars and evaluation metrics specifically relevant to FL models, as well as to develop solutions that can compute the trustworthiness level of FL models. This work examines the existing requirements for evaluating trustworthiness in FL and introduces a comprehensive taxonomy consisting of six pillars (privacy, robustness, fairness, explainability, accountability, and federation), along with over 30 metrics for computing the trustworthiness of FL models. Subsequently, an algorithm named FederatedTrust is designed based on the pillars and metrics identified in the taxonomy to compute the trustworthiness score of FL models. A prototype of FederatedTrust is implemented and integrated into the learning process of FederatedScope, a well-established FL framework. Finally, five experiments are conducted using different configurations of FederatedScope to demonstrate the utility of FederatedTrust in computing the trustworthiness of FL models. Three experiments employ the FEMNIST dataset, and two utilize the N-BaIoT dataset considering a real-world IoT security use case.	翻訳日:2023-07-07 18:02:40 公開日:2023-07-06
# ポインタジェネレータネットワークとSciBERT埋め込みを用いた研究論文からのハイライト生成 Generation of Highlights from Research Papers Using Pointer-Generator Networks and SciBERT Embeddings ( http://arxiv.org/abs/2302.07729v2 ) ライセンス: Link先を確認	Tohida Rehman, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban Kumar Bhowmick, Partha Pratim Das	(参考訳) 近年,本論文の主な知見を要約する研究論文が多数発表されている。ハイライトは、研究者が論文のコントリビューションを正確かつ迅速に特定するのに役立つだけでなく、検索エンジンによる発見可能性を高める。研究論文の特定の部分について,研究ハイライトを自動的に作成することを目的としている。我々は,入力トークンをSciBERT埋め込みにエンコードする入力に,カバレッジ機構を備えたポインタジェネレータネットワークとコンテキスト埋め込み層を使用する。我々は、ベンチマークデータセットCSPubSumでモデルをテストし、また、自動研究ハイライト生成のための新しい論文の多分野コーパスであるMixSubを提示する。 CSPubSum と MixSub の両モデルにおいて,提案モデルが関連する変種や文献で提案する他のモデルと比較して,最高の性能を達成できることを示した。 CSPubSumデータセットでは,入力が紙の抽象的な部分のみである場合に,他の部分に対して最高の性能が得られる。 ROUGE-1、ROUGE-2、ROUGE-L F1スコアは38.26、14.26、35.51、METEORスコアは32.62、BERTScore F1は86.65で、全てのベースラインを上回っている。新しいMixSubデータセットにおいて,提案したモデル(対象カテゴリを区別せずにトレーニングコーパス全体をトレーニングした場合)は,それぞれ31.78,9.76,29.3のROUGE-1,ROUGE-2,ROUGE-L F1スコア,METEORスコア24.00,BERTScore F1,85.25のそれぞれを達成する。 Nowadays many research articles are prefaced with research highlights to summarize the main findings of the paper. Highlights not only help researchers precisely and quickly identify the contributions of a paper, they also enhance the discoverability of the article via search engines. We aim to automatically construct research highlights given certain segments of a research paper. We use a pointer-generator network with coverage mechanism and a contextual embedding layer at the input that encodes the input tokens into SciBERT embeddings. We test our model on a benchmark dataset, CSPubSum, and also present MixSub, a new multi-disciplinary corpus of papers for automatic research highlight generation. For both CSPubSum and MixSub, we have observed that the proposed model achieves the best performance compared to related variants and other models proposed in the literature. 
On the CSPubSum dataset, our model achieves the best performance when the input is only the abstract of a paper as opposed to other segments of the paper. It produces ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 38.26, 14.26 and 35.51, respectively, METEOR score of 32.62, and BERTScore F1 of 86.65 which outperform all other baselines. On the new MixSub dataset, where only the abstract is the input, our proposed model (when trained on the whole training corpus without distinguishing between the subject categories) achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 31.78, 9.76 and 29.3, respectively, METEOR score of 24.00, and BERTScore F1 of 85.25.	翻訳日:2023-07-07 18:02:08 公開日:2023-07-06
# その部分の合計:操作対象の慣性パラメータ識別のための視覚部分分割 The Sum of Its Parts: Visual Part Segmentation for Inertial Parameter Identification of Manipulated Objects ( http://arxiv.org/abs/2302.06685v2 ) ライセンス: Link先を確認	Philippe Nadeau, Matthew Giamou, Jonathan Kelly	(参考訳) 作業者と共に安全かつ効率的に作業するためには,協調ロボット(cobots)は,操作対象のダイナミックスを迅速に理解する能力が必要である。しかしながら、慣性パラメータの完全なセットを推定する従来の方法は、必ずしも高速で安全でない動き(十分な信号対雑音比を達成するために)に依存する。本研究では,視覚と力のねじれを組み合わせることで,動きの遅さや「ストップ・アンド・ゴー」のみを必要とする慣性パラメータ同定アルゴリズムを開発した。この手法は均質部分分割 (hps) と呼ばれ, 人工物は異なる均質な部分から構成されていることが多いという観察を生かしている。我々は,表面に基づく点クラスタリング法と体積形状分割アルゴリズムを組み合わせることで,操作対象の部分レベルセグメンテーションを高速に生成し,そのセグメンテーション表現をHPSにより精度よくオブジェクトの慣性パラメータを推定するために利用する。アルゴリズムをベンチマークするために、20の共通ワークショップツールに対して、現実的なメッシュ、セグメント化されたポイントクラウド、慣性パラメータからなる新しいデータセットを作成し、利用する。最後に,低コストの協調ロボットアームを用いて,複雑な「ハンマーバランス法」を自律的かつオンラインで実施することにより,HPSの実際の性能と精度を実証する。私たちのコードとデータセットはオープンソースで、自由に利用できます。 To operate safely and efficiently alongside human workers, collaborative robots (cobots) require the ability to quickly understand the dynamics of manipulated objects. However, traditional methods for estimating the full set of inertial parameters rely on motions that are necessarily fast and unsafe (to achieve a sufficient signal-to-noise ratio). In this work, we take an alternative approach: by combining visual and force-torque measurements, we develop an inertial parameter identification algorithm that requires slow or 'stop-and-go' motions only, and hence is ideally tailored for use around humans. Our technique, called Homogeneous Part Segmentation (HPS), leverages the observation that man-made objects are often composed of distinct, homogeneous parts. We combine a surface-based point clustering method with a volumetric shape segmentation algorithm to quickly produce a part-level segmentation of a manipulated object; the segmented representation is then used by HPS to accurately estimate the object's inertial parameters. 
To benchmark our algorithm, we create and utilize a novel dataset consisting of realistic meshes, segmented point clouds, and inertial parameters for 20 common workshop tools. Finally, we demonstrate the real-world performance and accuracy of HPS by performing an intricate 'hammer balancing act' autonomously and online with a low-cost collaborative robotic arm. Our code and dataset are open source and freely available.	翻訳日:2023-07-07 18:01:36 公開日:2023-07-06
# 心電図のパワーを解き放つ : 心電図信号を用いた医療システムにおける新しい患者同定法 Unleashing the Power of Electrocardiograms: A novel approach for Patient Identification in Healthcare Systems with ECG Signals ( http://arxiv.org/abs/2302.06529v2 ) ライセンス: Link先を確認	Caterina Fuster-Barceló, Carmen Cámara, Pedro Peris-López	(参考訳) 過去20年間に渡り、心臓のシグナルを生体計測のモダリティとして活用する可能性についてかなりの研究が続けられてきた。本稿では心電図信号を用いた医療システムにおける患者識別のための新しいアプローチを提案する。畳み込みニューラルネットワークは、ECG信号から抽出された画像に基づいてユーザを分類するために使用される。提案する識別システムは複数のデータベースで評価され,実世界のシナリオにおけるその可能性の包括的理解を提供する。心臓血管疾患の一般ユーザ識別への影響は、これまでの研究では概ね見過ごされてきた。本手法は, 患者の心血管状態を考慮し, 得られた結果が偏りや制限がないことを保証する。さらに、得られた結果は、広範囲な実験によって示されるように、低いエラー率と高い精度のメトリクスで、一貫性と信頼性がある。これらの機能はすべて、医療システムにおける患者識別の分野において、提案手法が貴重な貢献となり、実用的応用の強力な候補となる。 Over the course of the past two decades, a substantial body of research has substantiated the viability of utilising cardiac signals as a biometric modality. This paper presents a novel approach for patient identification in healthcare systems using electrocardiogram signals. A convolutional neural network is used to classify users based on images extracted from ECG signals. The proposed identification system is evaluated in multiple databases, providing a comprehensive understanding of its potential in real-world scenarios. The impact of Cardiovascular Diseases on generic user identification has been largely overlooked in previous studies. The presented method takes into account the cardiovascular condition of the patients, ensuring that the results obtained are not biased or limited. Furthermore, the results obtained are consistent and reliable, with lower error rates and higher accuracy metrics, as demonstrated through extensive experimentation. All these features make the proposed method a valuable contribution to the field of patient identification in healthcare systems, and make it a strong contender for practical applications.	翻訳日:2023-07-07 18:01:11 公開日:2023-07-06
# 仮想量子エラー検出 Virtual quantum error detection ( http://arxiv.org/abs/2302.02626v2 ) ライセンス: Link先を確認	Kento Tsubouchi, Yasunari Suzuki, Yuuki Tokunaga, Nobuyuki Yoshioka, Suguru Endo	(参考訳) 量子誤差補正と量子誤差検出は、エラーを検出するために症候群の測定を必要とする。各安定化器発電機のシンドローム測定は、現在の量子ハードウェアにおける読み出し忠実度が一般的にゲート忠実度よりも低いという事実を考慮すると、大きなオーバーヘッドとなる。本稿では,対称性拡張と呼ばれる量子エラー緩和手法を一般化することにより,仮想量子エラー検出(VQED)と呼ばれるプロトコルを提案する。この方法では、回路実行中の量子誤差検出により得られた後選択量子状態に対応する計算結果を、シンドローム測定を実装せずに、事実上評価することができる。安定化器発生器毎のアダマール試験回路の実装を必要とする従来の量子誤り検出とは異なり、我々のVQEDプロトコルは、安定化器発生器の数に関係なく、アンシラ量子ビットを持つ一定の深さの浅い量子回路で実行することができる。また,vqedを用いた計算結果は,vqedの動作中に発生する雑音に対して頑健であり,本手法は他の誤差緩和手法と完全互換であり,計算精度のさらなる向上と高忠実性量子コンピューティングの容易化が図れる。 Quantum error correction and quantum error detection necessitate syndrome measurements to detect errors. Performing syndrome measurements for each stabilizer generator can be a significant overhead, considering the fact that the readout fidelity in the current quantum hardware is generally lower than gate fidelity. Here, by generalizing a quantum error mitigation method known as symmetry expansion, we propose a protocol called virtual quantum error detection (VQED). This method virtually allows for evaluating computation results corresponding to post-selected quantum states obtained through quantum error detection during circuit execution, without implementing syndrome measurements. Unlike conventional quantum error detection, which requires the implementation of Hadamard test circuits for each stabilizer generator, our VQED protocol can be performed with a constant depth shallow quantum circuit with an ancilla qubit, irrespective of the number of stabilizer generators. Furthermore, the computation results obtained using VQED are robust against the noise that occurred during the operation of VQED, and our method is fully compatible with other error mitigation schemes, enabling further improvements in computation accuracy and facilitating high-fidelity quantum computing.	翻訳日:2023-07-07 18:00:27 公開日:2023-07-06
# マージナルコントリビューションを伴わないシェープリー値の近似 Approximating the Shapley Value without Marginal Contributions ( http://arxiv.org/abs/2302.00736v2 ) ライセンス: Link先を確認	Patrick Kolpaczki, Viktor Bengs, Maximilian Muschalik, Eyke Hüllermeier	(参考訳) Shapley値は、最近説明可能な人工知能で集中的に使用されている協調ゲームにおいて、プレイヤーに有意義な貢献価値を割り当てる最も一般的なアプローチであることは間違いない。意味性は、シャプリー値のみが満足する公理的性質によるものであるが、エージェントの数で指数関数的に増加する正確な計算を犠牲にしている。したがって、多くの研究がシェープリーの値の効率的な近似に費やされており、そのほとんどはエージェントの限界貢献の概念を中心に展開している。本稿では,限界貢献の概念から分離されたShapley値の表現に基づいて,SVARM と Stratified SVARM の2つのパラメータフリーおよびドメイン非依存近似アルゴリズムを提案する。我々は,その近似的品質に関する比類のない理論的保証を証明し,合成ゲームを含む経験的結果と,最先端手法と比較する一般的な説明可能性ユースケースを提供する。 The Shapley value is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, which has recently been used intensively in explainable artificial intelligence. The meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, comes at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley values, most of them revolve around the notion of an agent's marginal contribution. In this paper, we propose with SVARM and Stratified SVARM two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contributions. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results including synthetic games as well as common explainability use cases comparing ourselves with state-of-the-art methods.	翻訳日:2023-07-07 18:00:07 公開日:2023-07-06
# ESC:ゼロショットオブジェクトナビゲーションのためのソフトコモンセンス制約による探索 ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation ( http://arxiv.org/abs/2301.13166v3 ) ライセンス: Link先を確認	Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang	(参考訳) 特定のオブジェクトを正確に見つけてナビゲートする能力は、現実世界で動作し、タスクを完了させるためにオブジェクトと対話するエージェントにとって重要な能力である。このようなオブジェクトナビゲーションタスクは、通常、ラベル付きオブジェクトを持つ視覚環境において大規模なトレーニングを必要とする。本研究では,事前学習モデルにおける常識知識を,ナビゲーション経験や視覚環境でのトレーニングなしにオープンワールドオブジェクトナビゲーションに伝達する,ソフト・コモンセンス制約(esc)を用いた新たなゼロショットオブジェクトナビゲーション手法を提案する。第一に、ESCは、オープンワールドのプロンプトベースのグラウンドリングのための事前学習されたビジョンと言語モデルと、ルームおよびオブジェクト推論のための事前学習されたコモンセンス言語モデルを利用する。そして、ESCはコモンセンス知識を、効率的な探索のためのソフトロジック述語としてモデル化することで、ナビゲーション行動に変換する。 MP3D, HM3D, および RoboTHOR ベンチマークの大規模な実験により、我々のESC法はベースラインよりも大幅に改善され、ゼロショットオブジェクトナビゲーションのための新しい最先端結果が得られる(例えば、MP3D の CoW よりも288% の相対的継承率の向上)。 The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience nor any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. 
Extensive experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines, and achieves new state-of-the-art results for zero-shot object navigation (e.g., a 288% relative Success Rate improvement over CoW on MP3D).	翻訳日:2023-07-07 17:59:50 公開日:2023-07-06
# グラフ表現学習による効率的かつ実現可能なロボット組立シーケンス計画 Efficient and Feasible Robotic Assembly Sequence Planning via Graph Representation Learning ( http://arxiv.org/abs/2303.10135v3 ) ライセンス: Link先を確認	Matan Atad, Jianxiang Feng, Ismael Rodríguez, Maximilian Durner, Rudolph Triebel	(参考訳) 自動ロボット組立シーケンス計画(RASP)は、製品カスタマイズの必要性が高まるとともに、現代製造業における生産性とレジリエンスを大幅に向上させることができる。このような自動化を実現する上での最大の課題のひとつは、ますます複雑なアセンブリの潜在的なシーケンスの数が増えることによるソリューションの効率的な発見にある。さらに、ロボットシステムにはコストのかかる実現性チェックが常に必要です。そこで本研究では,製品アセンブリのためのグラフ表現であるアセンブリグラフと,アセンブリシーケンス生成のためのGRACEと呼ばれるポリシアーキテクチャであるGraph Assembly Processing Networkを提案する。次に、GRACEを用いてグラフ入力から意味のある情報を抽出し、ステップバイステップでアセンブリシーケンスを予測する。実験では、両腕ロボットシステムのシミュレーションで収集したデータに基づいて、アルミニウムプロファイルの製品変種間で実現可能な組立シーケンスを予測できることを示す。さらに,本手法は, 偽予測による望ましくない影響を著しく軽減し, 現実の展開を容易にすることができることを示す。コードとトレーニングデータはhttps://github.com/DLR-RM/GRACEで公開されている。 Automatic Robotic Assembly Sequence Planning (RASP) can significantly improve productivity and resilience in modern manufacturing along with the growing need for greater product customization. One of the main challenges in realizing such automation resides in efficiently finding solutions from a growing number of potential sequences for increasingly complex assemblies. Besides, costly feasibility checks are always required for the robotic system. To address this, we propose a holistic graphical approach including a graph representation called Assembly Graph for product assemblies and a policy architecture, Graph Assembly Processing Network, dubbed GRACE for assembly sequence generation. Secondly, we use GRACE to extract meaningful information from the graph input and predict assembly sequences in a step-by-step manner. In experiments, we show that our approach can predict feasible assembly sequences across product variants of aluminum profiles based on data collected in simulation of a dual-armed robotic system. 
We further demonstrate that our method is capable of detecting infeasible assemblies, substantially alleviating the undesirable impacts from false predictions, and hence facilitating real-world deployment soon. Code and training data are available at https://github.com/DLR-RM/GRACE.	翻訳日:2023-07-07 17:51:17 公開日:2023-07-06
# NeRF固有の4つ: 逆内在カメラパラメータと外在カメラパラメータの同時最適化 NeRFtrinsic Four: An End-To-End Trainable NeRF Jointly Optimizing Diverse Intrinsic and Extrinsic Camera Parameters ( http://arxiv.org/abs/2303.09412v3 ) ライセンス: Link先を確認	Hannah Schieber, Fabian Deuser, Bernhard Egger, Norbert Oswald, Daniel Roth	(参考訳) ニューラル放射場(NeRF)を用いた新しいビュー合成は、新しい視点から高品質な画像を生成する最先端技術である。既存の手法では、極端および内在的なカメラパラメータに関する事前知識が必要である。これにより、前処理ステップが必要な合成シーンや現実世界のシナリオへの適用が制限される。カメラパラメータとNeRFの合同最適化に関する最近の研究は、ノイズのある外部カメラパラメータの精製に重点を置いており、しばしば固有のカメラパラメータの事前処理に依存している。さらなるアプローチは、1つのカメラのみを本質的にカバーすることに限られる。これらの制約に対処するため、我々はNeRFtrinsic Fourと呼ばれる新しいエンドツーエンドのトレーニング可能なアプローチを提案する。我々は,gaussian fourier特徴を用いて,外部カメラパラメータを推定し,投影誤差の監視により,固有カメラパラメータの変動を動的に予測する。提案手法はLLFFとBLEFFの既存の共同最適化手法よりも優れている。これら既存のデータセットに加えて,固有カメラパラメータの異なるiffと呼ばれる新しいデータセットも導入する。 nerftrinsic fourは、nerfベースのビュー合成を共同最適化するステップであり、カメラパラメータの異なる現実世界のシナリオにおいて、よりリアルで柔軟なレンダリングを可能にする。 Novel view synthesis using neural radiance fields (NeRF) is the state-of-the-art technique for generating high-quality images from novel viewpoints. Existing methods require a priori knowledge about extrinsic and intrinsic camera parameters. This limits their applicability to synthetic scenes, or real-world scenarios with the necessity of a preprocessing step. Current research on the joint optimization of camera parameters and NeRF focuses on refining noisy extrinsic camera parameters and often relies on the preprocessing of intrinsic camera parameters. Further approaches are limited to cover only one single camera intrinsic. To address these limitations, we propose a novel end-to-end trainable approach called NeRFtrinsic Four. We utilize Gaussian Fourier features to estimate extrinsic camera parameters and dynamically predict varying intrinsic camera parameters through the supervision of the projection error. Our approach outperforms existing joint optimization methods on LLFF and BLEFF. 
In addition to these existing datasets, we introduce a new dataset called iFF with varying intrinsic camera parameters. NeRFtrinsic Four is a step forward in joint optimization NeRF-based view synthesis and enables more realistic and flexible rendering in real-world scenarios with varying camera parameters.	翻訳日:2023-07-07 17:51:01 公開日:2023-07-06
# 複合時間スタンプイベントストリームの高速・マルチアスペクトマイニング Fast and Multi-aspect Mining of Complex Time-stamped Event Streams ( http://arxiv.org/abs/2303.03789v2 ) ライセンス: Link先を確認	Kota Nakamura, Yasuko Matsubara, Koki Kawabata, Yuhei Umeda, Yuichiro Wada and Yasushi Sakurai	(参考訳) オンラインショッピングログ (item, price, brand, time) やローカルモビリティアクティビティ (pick-up and drop-off location, time) など,さまざまな属性を備えた時間進化イベントの巨大なオンラインストリームを,どのようにして,大規模で動的高次テンソルストリームを要約すればよいか? 隠れたパターンやルール、異常をどうやって見るのか? 我々は,高次テンソルストリーム上の効率的かつ効果的な手法であるcubescopeを提案するため,'regimes'と'components'という2種類のパターンに注目した。具体的には、突然の不連続を識別し、異なる動的パターン(例えば、平日、ウィークエンド、ホリデーパターン)を認識する。各制度では、すべての属性(アイテム、価格、ブランド、時間など)に対して多方向の要約を行い、潜在グループ(アイテム/ブランドグループなど)とその関係を表す隠れた'コンポーネント'を発見する。簡潔だが効果的な要約のおかげで、CubeScopeは異常の突然の出現を検出し、実際に発生する異常の種類を特定することもできる。提案手法は以下の特性を有する。 (a) 実効性: 動的マルチアスペクトパターン、すなわちレジームとコンポーネントをキャプチャし、統計的にすべての事象を要約する。 b) 一般に,データ圧縮,パターン発見,および様々なテンソルストリームの異常検出に成功させるには,実用的である。 (c)スケーラブル:我々のアルゴリズムは,データストリームの長さと次元に依存しない。実データセットに関する広範な実験は、立方体スコープが有意義なパターンや異常を正しく発見し、精度と実行速度に関して最先端の手法を一貫して上回っていることを示している。 Given a huge, online stream of time-evolving events with multiple attributes, such as online shopping logs: (item, price, brand, time), and local mobility activities: (pick-up and drop-off locations, time), how can we summarize large, dynamic high-order tensor streams? How can we see any hidden patterns, rules, and anomalies? Our answer is to focus on two types of patterns, i.e., ''regimes'' and ''components'', for which we present CubeScope, an efficient and effective method over high-order tensor streams. Specifically, it identifies any sudden discontinuity and recognizes distinct dynamical patterns, ''regimes'' (e.g., weekday/weekend/holiday patterns). In each regime, it also performs multi-way summarization for all attributes (e.g., item, price, brand, and time) and discovers hidden ''components'' representing latent groups (e.g., item/brand groups) and their relationship. 
Thanks to its concise but effective summarization, CubeScope can also detect the sudden appearance of anomalies and identify the types of anomalies that occur in practice. Our proposed method has the following properties: (a) Effective: it captures dynamical multi-aspect patterns, i.e., regimes and components, and statistically summarizes all the events; (b) General: it is practical for successful application to data compression, pattern discovery, and anomaly detection on various types of tensor streams; (c) Scalable: our algorithm does not depend on the length of the data stream and its dimensionality. Extensive experiments on real datasets demonstrate that CubeScope finds meaningful patterns and anomalies correctly, and consistently outperforms the state-of-the-art methods as regards accuracy and execution speed.	翻訳日:2023-07-07 17:50:22 公開日:2023-07-06
# 高精度・伝達可能なニューラルポテンシャルのための非平衡分子のDenoise Pretraining Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials ( http://arxiv.org/abs/2303.02216v2 ) ライセンス: Link先を確認	Yuyang Wang, Changwen Xu, Zijie Li, Amir Barati Farimani	(参考訳) 等変グラフニューラルネットワーク(GNN)の最近の進歩は、分子ポテンシャル予測のための高価なアブ初期量子力学(QM)アプローチへの高速サロゲートモデルの開発に深層学習が適している。しかしながら、gnnを用いた正確で転送可能なポテンシャルモデルの構築は、特に大規模で複雑な分子システムにおいて、高価な計算コストとqm法の理論のレベルによって非常に制限されるため、依然として困難である。本研究では,非平衡分子配座を事前学習して,より正確かつ伝達可能なGNNポテンシャル予測を実現することを提案する。具体的には、サンプル非平衡配座の原子座標はランダムノイズによって摂動され、GNNは、元の座標を復元する摂動分子配座を飾るために事前訓練される。複数のベンチマークでの厳密な実験は、事前学習が神経電位の精度を大幅に向上させることを示した。さらに,提案手法はモデル非依存であり,異なる不変量および同変量gnnの性能が向上することを示した。特に, 分子にプリトレーニングされたモデルでは, 異種分子, 荷電分子, 生体分子, 大型分子など, 様々な分子系に微調整した場合の性能が向上する。これらの結果は、複雑な分子系に対してより一般化可能なニューラルポテンシャルを構築するために、denoise Pretrainingアプローチを活用する可能性を強調している。 Recent advances in equivariant graph neural networks (GNNs) have made deep learning amenable to developing fast surrogate models to expensive ab initio quantum mechanics (QM) approaches for molecular potential predictions. However, building accurate and transferable potential models using GNNs remains challenging, as the data is greatly limited by the expensive computational costs and level of theory of QM methods, especially for large and complex molecular systems. In this work, we propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions. Specifically, atomic coordinates of sampled nonequilibrium conformations are perturbed by random noises and GNNs are pretrained to denoise the perturbed molecular conformations which recovers the original coordinates. Rigorous experiments on multiple benchmarks reveal that pretraining significantly improves the accuracy of neural potentials. Furthermore, we show that the proposed pretraining approach is model-agnostic, as it improves the performance of different invariant and equivariant GNNs. 
Notably, our models pretrained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems, including different elements, charged molecules, biomolecules, and larger systems. These results highlight the potential for leveraging denoise pretraining approaches to build more generalizable neural potentials for complex molecular systems.	翻訳日:2023-07-07 17:49:27 公開日:2023-07-06
# 混合スパース線形回帰における統計計算的トレードオフ Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression ( http://arxiv.org/abs/2303.02118v2 ) ライセンス: Link先を確認	Gabriel Arpino and Ramji Venkataramanan	(参考訳) 2つの成分による混合スパース線形回帰の問題を考えると、2つの実$k$スパース信号 $\beta_1, \beta_2$ が$n$の非ラベリングノイズ線型測定から回収される。スパーシティは次元において部分線型であることが許され、加法ノイズは分散 $\sigma^2$ を持つ独立ガウスであると仮定される。以前の研究によると、この問題は$\frac{k}{snr^2}$-to-$\frac{k^2}{snr^2}$ 統計計算から計算へのギャップに苦しんでおり、スパースpcaやロバストスパース平均推定のような計算上困難な他の高次元推論問題に似ている。低次多項式の手法によりこの問題に対するより広範な計算障壁の存在を確立するが、この問題は非常に狭い対称パラメータ状態においてのみ計算的に困難であることを示す。この難易度における任意のランダム化アルゴリズムに対して,サンプル複雑性$n$と実行時の間のスムーズな情報計算トレードオフを同定する。単純な還元により、サンプル複雑性 $n = \tilde{o}(k^2)$ でスパース位相検索における正確な支持回復を解決するために計算障壁が存在するという新しい厳密な証拠が得られる。第2の貢献は, 難解な狭い状況以外では, サンプル数と正方根の時間と(非混合)スパース線形回帰に必要なサンプルの複雑さを一致させて, 関連する混合回帰検出問題を$O(np)$で解く, という単純なしきい値決定アルゴリズムを解析することである。この結果の特別な場合として,この単純なアルゴリズムは,分散線形回帰法において,完全符号付きサポートリカバリを解くためのアルゴリズム群の中で,順序最適であることを示す。 We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $\beta_1, \beta_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and additive noise is assumed to be independent Gaussian with variance $\sigma^2$. Prior work has shown that the problem suffers from a $\frac{k}{SNR^2}$-to-$\frac{k^2}{SNR^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $SNR$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and runtime for any randomized algorithm in this hard regime. 
Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time with square-root the number of samples and matches the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms in solving exact signed support recovery in sparse linear regression.	翻訳日:2023-07-07 17:49:01 公開日:2023-07-06
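The abstract above mentions a simple thresholding algorithm that runs in $O(np)$ time but does not spell it out. The sketch below is not the authors' algorithm. It is a generic second-moment score, shown under stated assumptions: standard Gaussian covariates, the symmetric case $\beta_2 = -\beta_1$ (where plain correlation with $y$ cancels out), and a known sparsity $k$. The names `second_moment_scores`, `top_k_support`, and `simulate` are hypothetical.

```python
import random

def second_moment_scores(X, y):
    """Score coordinate j by (1/n) * sum_i y_i^2 * (x_ij^2 - 1).

    For standard Gaussian covariates the expected score is 2 * beta_j^2
    regardless of which mixture component produced y_i, so the hidden
    labels do not matter. Cost is O(n * p).
    """
    n, p = len(X), len(X[0])
    scores = [0.0] * p
    for xi, yi in zip(X, y):
        y2 = yi * yi
        for j in range(p):
            scores[j] += y2 * (xi[j] * xi[j] - 1.0)
    return [s / n for s in scores]

def top_k_support(scores, k):
    """Return the indices of the k largest scores, sorted ascending."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def simulate(n=5000, p=50, k=3, sigma=0.5, seed=0):
    """Draw unlabelled measurements from the symmetric mixture beta2 = -beta1."""
    rng = random.Random(seed)
    support = sorted(rng.sample(range(p), k))
    beta1 = [1.0 if j in support else 0.0 for j in range(p)]
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        sign = 1.0 if rng.random() < 0.5 else -1.0  # hidden component label
        signal = sign * sum(b * v for b, v in zip(beta1, x))
        X.append(x)
        y.append(signal + sigma * rng.gauss(0.0, 1.0))
    return X, y, support
```

Calling `top_k_support(second_moment_scores(X, y), k)` on the output of `simulate()` should return the planted support when $n$ is large relative to $k^2$; replacing top-$k$ selection with a fixed threshold is the natural variant when $k$ is unknown.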
# 数量子配列における長持続的反相関 Long persistent anticorrelations in few-qubit arrays ( http://arxiv.org/abs/2303.02085v2 ) ライセンス: Link先を確認	Danil Kornovan, Alexander Poddubny, and Alexander Poshakinskiy	(参考訳) 一般電磁環境における2レベル原子配列に散在する光子間のアンチバンチングを実現する機構を理論的に検討する。私たちの目標は、個々の原子の自発的放出寿命よりもはるかに長い時間持続するアンチバンチングです。このような持続的なアンチバンチングのメカニズムを2つ挙げる。 1つは原子配列のサブラジアント状態に基づいており、もう1つはサブラジアント状態を必要としない。我々は,自由空間の配列と導波路に結合した配列に基づいて,最適化されたアンチバンチを持つ配列パラメータの具体例を2つ提案した。 We consider theoretically the mechanisms to realize antibunching between the photons scattered on the array of two-level atoms in a general electromagnetic environment. Our goal is the antibunching that persists for the times much longer than the spontaneous emission lifetime of an individual atom. We identify two mechanisms for such persistent antibunching. The first one is based on subradiant states of the atomic array, and the second one does not require any subradiant states. We provided two specific examples of array parameters with optimized antibunching, based on an array in a free space and an array coupled to a waveguide.	翻訳日:2023-07-07 17:48:22 公開日:2023-07-06
# ディープラーニングを用いた光リモートセンシング画像における指向性物体検出 Oriented Object Detection in Optical Remote Sensing Images using Deep Learning: A Survey ( http://arxiv.org/abs/2302.10473v2 ) ライセンス: Link先を確認	Kun Wang, Zi Wang, Zhang Li, Ang Su, Xichao Teng, Minhao Liu and Qifeng Yu	(参考訳) 指向オブジェクト検出は、リモートセンシングにおける最も基本的かつ挑戦的なタスクの1つであり、多数の事前定義されたオブジェクトカテゴリの指向オブジェクトを見つけることを目的としている。近年,光リモートセンシング画像における指向性物体の検出において,深層学習に基づく手法が顕著な成果を上げている。しかし,リモートセンシングにおける文献の徹底的なレビューは行われていない。そこで我々は,近年の進歩を包括的に調査し,問題定義,一般的なデータセット,評価プロトコル,検出フレームワーク,指向オブジェクト表現,特徴表現など,指向オブジェクト検出の多くの側面をカバーする。さらに,最先端の手法を分析し,考察する。最後に,今後の研究の方向性を議論し,有用な研究指導を行う。この調査は学界や産業界の研究者にとって価値あるものになると考えている。 Oriented object detection is one of the most fundamental and challenging tasks in remote sensing, aiming at locating the oriented objects of numerous predefined object categories. Recently, deep learning based methods have achieved remarkable performance in detecting oriented objects in optical remote sensing imagery. However, a thorough review of the literature in remote sensing has not yet emerged. Therefore, we give a comprehensive survey of recent advances and cover many aspects of oriented object detection, including problem definition, commonly used datasets, evaluation protocols, detection frameworks, oriented object representations, and feature representations. Besides, the state-of-the-art methods are analyzed and discussed. We finally discuss future research directions to put forward some useful research guidance. We believe that this survey shall be valuable to researchers across academia and industry.	翻訳日:2023-07-07 17:48:12 公開日:2023-07-06
# 多様なマルチモーダル制御を備えたインタラクティブな画像記述 Caption Anything: Interactive Image Description with Diverse Multimodal Controls ( http://arxiv.org/abs/2305.02677v3 ) ライセンス: Link先を確認	Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao	(参考訳) 制御可能な画像キャプション(英: Controllable image captioning)は、人間の目的に従って自然言語で画像を記述することを目的とした、新たなマルチモーダルトピックである。最先端の手法は、アノテーション付き入力制御と出力キャプションで訓練される。しかし、このような注釈付きマルチモーダルデータの不足は、対話型AIシステムのユーザビリティとスケーラビリティを大幅に制限する。ユニモーダル命令追跡基盤モデルを活用することは、幅広いデータソースの恩恵を受ける有望な代替手段である。本稿では,幅広いマルチモーダル制御をサポートする基盤モデル拡張画像キャプションフレームワークであるCaption AnyThing(CAT)について述べる。 1) 点,箱,軌跡を含む視覚制御 2)感情,長さ,言語,事実性などの言語制御。 Segment Anything Model(SAM)とChatGPTによって、視覚と言語プロンプトをモジュール化されたフレームワークに統合し、異なるコントロール間の柔軟な組み合わせを可能にします。広範なケーススタディは,視覚言語アプリケーションにおける効果的なユーザインタラクションモデリングに光を当てながら,このフレームワークのユーザ意図アライメント機能を実証する。私たちのコードは https://github.com/ttengwang/Caption-Anything で公開されています。 Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, $\textit{e.g.}$, looking at the specified regions or telling in a particular text style. State-of-the-art methods are trained on annotated pairs of input controls and output captions. However, the scarcity of such well-annotated multimodal data largely limits their usability and scalability for interactive AI systems. Leveraging unimodal instruction-following foundation models is a promising alternative that benefits from broader sources of data. In this paper, we present Caption AnyThing (CAT), a foundation-model-augmented image captioning framework supporting a wide range of multimodal controls: 1) visual controls, including points, boxes, and trajectories; 2) language controls, such as sentiment, length, language, and factuality. Powered by Segment Anything Model (SAM) and ChatGPT, we unify the visual and language prompts into a modularized framework, enabling the flexible combination between different controls. 
Extensive case studies demonstrate the user intention alignment capabilities of our framework, shedding light on effective user interaction modeling in vision-language applications. Our code is publicly available at https://github.com/ttengwang/Caption-Anything.	翻訳日:2023-07-07 17:42:07 公開日:2023-07-06
# FedVS: 分割モデルのためのストラグラー耐性とプライバシ保護による垂直的フェデレーション学習 FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models ( http://arxiv.org/abs/2304.13407v3 ) ライセンス: Link先を確認	Songze Li, Duanyi Yao, Jin Liu	(参考訳) 中央サーバと多くの分散クライアントからなる垂直連合学習(VFL)システムにおいて、トレーニングデータを垂直に分割し、異なる特徴を異なるクライアントにプライベートに格納する。分割VFLの問題は、サーバとクライアントの間で分割されたモデルをトレーニングすることだ。本稿では,分割VFLにおける2つの課題に対処することを目的とする。 1) 研修中にクライアントを絞ったことによる性能の低下 2) クライアントがアップロードしたデータ埋め込みからのデータとモデルのプライバシリーク。我々はこれらの2つの課題に同時に対処するためにFedVSを提案する。 fedvsの鍵となるアイデアは、ローカルデータやモデルのシークレット共有スキームをデザインすることであり、クライアントと好奇心に満ちたサーバに対する情報理論的なプライバシーが保証され、全てのクライアントの埋め込みの集約は、非ストラグリングクライアントから計算共有を復号することで損失なく再構築される。様々な種類のVFLデータセット(表、CV、マルチビューを含む)に対する大規模な実験は、ベースラインプロトコルに対するトラグラー緩和とプライバシ保護におけるFedVSの普遍的な利点を示している。 In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients' uploaded data embeddings. We propose FedVS to simultaneously address these two challenges. The key idea of FedVS is to design secret sharing schemes for the local data and models, such that information-theoretical privacy against colluding clients and curious server is guaranteed, and the aggregation of all clients' embeddings is reconstructed losslessly, via decrypting computation shares from the non-straggling clients. Extensive experiments on various types of VFL datasets (including tabular, CV, and multi-view) demonstrate the universal advantages of FedVS in straggler mitigation and privacy protection over baseline protocols.	翻訳日:2023-07-07 17:41:13 公開日:2023-07-06
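The FedVS abstract relies on two properties of secret sharing: any sufficiently large subset of shares reconstructs the value (so stragglers can be ignored), and shares can be combined before reconstruction (so only an aggregate is revealed). The abstract does not describe the actual FedVS construction, so the sketch below uses textbook Shamir sharing over a prime field purely as an illustration. The field size, function names, and parameters are assumptions.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; the field size is an illustrative choice

def make_shares(secret, threshold, num_shares, rng):
    """Split `secret` into `num_shares` points on a random polynomial of
    degree threshold - 1; any `threshold` points recover the secret."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        inv_den = pow(den, PRIME - 2, PRIME)  # modular inverse via Fermat
        secret = (secret + yi * num * inv_den) % PRIME
    return secret

def add_shares(shares_a, shares_b):
    """Add two share vectors pointwise; the result shares the sum of the secrets."""
    return [(xa, (ya + yb) % PRIME) for (xa, ya), (_, yb) in zip(shares_a, shares_b)]
```

With `threshold=3` and five shares, any three responders suffice, which models dropping two straggling clients; `add_shares` models aggregating two clients' values without reconstructing either one individually.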
# 部分観測からナビゲーションパターンを予測する学習 Learning to Predict Navigational Patterns from Partial Observations ( http://arxiv.org/abs/2304.13242v2 ) ライセンス: Link先を確認	Robin Karlsson, Alexander Carballo, Francisco Lepe-Salazar, Keisuke Fujii, Kento Ohtani, Kazuya Takeda	(参考訳) 人間は、相互に知られた航法パターンに固執することで、規則に制約された環境を協調的にナビゲートする。不完全な環境からこれらのナビゲーションパターンを推測することは、地図化されていない場所で動作するインテリジェントな移動ロボットに必要である。しかし、これらのナビゲーションパターンをアルゴリズム的に定義することは非自明である。本稿では,実環境におけるナビゲーションパターンを部分的観測のみから推測する,最初の自己教師付き学習(SSL)手法を提案する。幾何学的データ拡張, 予測世界モデリング, 情報理論正規化器により, 無限データの極限においてバイアスのない局所指向性ソフトレーン確率(DSLP)場を予測できることを説明する。 DSLP 場に最尤グラフをフィッティングすることにより、グローバルナビゲーションパターンを推定する方法を実証する。実験の結果,SSLモデルは,nuScenesデータセット上の2つのSOTA教師付きレーングラフ予測モデルよりも優れていた。認識によるナビゲーションのためのスケーラブルで解釈可能な連続学習パラダイムとしてSSL方式を提案する。コードはhttps://github.com/robin-karlsson0/dslpで入手できる。 Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This paper presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enable our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception. 
Code is available at https://github.com/robin-karlsson0/dslp.	翻訳日:2023-07-07 17:40:54 公開日:2023-07-06
# DETRはリアルタイム物体検出でYOLOに勝る DETRs Beat YOLOs on Real-time Object Detection ( http://arxiv.org/abs/2304.08069v2 ) ライセンス: Link先を確認	Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu	(参考訳) 近年, エンド・ツー・エンドのTransformerベース検出器 (DETR)は優れた性能を発揮している。しかし, DETR の高計算コストの問題は効果的に解決されておらず,実用的利用を制限し,非最大抑制 (NMS) などの後処理が不要であるという利点を十分に活用できていない。本稿では,現代のリアルタイム物体検出器におけるNMSの推論速度への影響を解析し,エンドツーエンドの速度ベンチマークを確立する。 NMSによる推論遅延を回避するため,我々の知る限り最初のリアルタイム・エンドツーエンド物体検出器であるリアルタイム検出TRansformer (RT-DETR)を提案する。具体的には,スケール内相互作用とクロススケールフュージョンを分離してマルチスケール特徴を効率的に処理するハイブリッドエンコーダを設計し,オブジェクトクエリの初期化を改善するためにIoU対応クエリ選択を提案する。また,提案する検出器は,異なるデコーダ層を用いて,再訓練を必要とせず柔軟に推論速度を調整できるため,実時間物体検出器の実用化が容易である。 RT-DETR-LはCOCO val2017で53.0%AP、T4 GPUで114FPS、RT-DETR-Xは54.8%APと74FPSを達成し、同じスケールのYOLO検出器をスピードと精度で上回っている。さらに, RT-DETR-R50は53.1%のAPと108のFPSを達成し, DINO-Deformable-DETR-R50を精度で2.2% AP, FPSで約21倍上回っている。ソースコードと事前トレーニング済みモデルは https://github.com/lyuwenyu/RT-DETR で公開されている。 Recently, end-to-end transformer-based detectors (DETRs) have achieved remarkable performance. However, the issue of the high computational cost of DETRs has not been effectively addressed, limiting their practical application and preventing them from fully exploiting the benefit of needing no post-processing such as non-maximum suppression (NMS). In this paper, we first analyze the influence of NMS in modern real-time object detectors on inference speed, and establish an end-to-end speed benchmark. To avoid the inference delay caused by NMS, we propose a Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge. Specifically, we design an efficient hybrid encoder that processes multi-scale features by decoupling the intra-scale interaction and cross-scale fusion, and propose IoU-aware query selection to improve the initialization of object queries. 
In addition, our proposed detector supports flexible adjustment of the inference speed by using different decoder layers without the need for retraining, which facilitates the practical application of real-time object detectors. Our RT-DETR-L achieves 53.0% AP on COCO val2017 and 114 FPS on T4 GPU, while RT-DETR-X achieves 54.8% AP and 74 FPS, outperforming all YOLO detectors of the same scale in both speed and accuracy. Furthermore, our RT-DETR-R50 achieves 53.1% AP and 108 FPS, outperforming DINO-Deformable-DETR-R50 by 2.2% AP in accuracy and by about 21 times in FPS. Source code and pre-trained models are available at https://github.com/lyuwenyu/RT-DETR.	翻訳日:2023-07-07 17:40:30 公開日:2023-07-06
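For context on the post-processing step that the RT-DETR abstract says it avoids, the following is a minimal sketch of greedy non-maximum suppression. It is the standard textbook procedure, not code from the RT-DETR repository; the box format `(x1, y1, x2, y2)` and the default threshold of 0.5 are assumptions.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than `iou_threshold`.
    Returns the indices of the kept boxes in score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The work done here depends on how many candidate boxes survive score filtering and on the chosen thresholds, which is one reason the latency of NMS-based detectors is harder to state as a single number than that of an end-to-end detector.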
# シュミット位と行列位によるエンタングルメント蒸留 Entanglement distillation in terms of Schmidt rank and matrix rank ( http://arxiv.org/abs/2304.05563v2 ) ライセンス: Link先を確認	Tianyi Ding, Lin Chen	(参考訳) エンタングルメント蒸留は量子情報処理において重要なタスクである。本稿では,Schmidt階数と行列階数の非正分位 (NPT) バイパルタイト状態を蒸留する。シュミットランク2の全ての二成分状態は古典古典的状態と局所的に等価であり、シュミットランク3の全ての二成分状態は1-不飽和状態であることを示す。次に, 生成物ベクトルを含む低ランクのB値のNPT状態が蒸留可能であることを示し, 低ランクのB値のNPT状態は, 大容量の密度演算子に対して蒸留可能であることを示した。最終的には、$M\times N$ bipartite state of rank $\max\{M,N\}+1$ を蒸留する等価条件を示す。 Entanglement distillation is a key task in quantum-information processing. In this paper, we distill non-positive-partial-transpose (NPT) bipartite states of some given Schmidt rank and matrix rank. We show that all bipartite states of Schmidt rank two are locally equivalent to classical-classical states, and all bipartite states of Schmidt rank three are 1-undistillable. Subsequently, we show that low-rank B-irreducible NPT states are distillable for large-rank reduced density operators by proving low-rank B-irreducible NPT state whose range contains a product vector is distillable. Eventually, we present an equivalent condition to distill $M\times N$ bipartite states of rank $\max\{M,N\}+1$.	翻訳日:2023-07-07 17:39:57 公開日:2023-07-06
# 臍帯動脈ドプラ画像の自動誘導と品質評価システム An Automatic Guidance and Quality Assessment System for Doppler Imaging of Umbilical Artery ( http://arxiv.org/abs/2304.05463v2 ) ライセンス: Link先を確認	Chun Kit Wong and Manxi Lin and Alberto Raheli and Zahra Bashir and Morten Bo Søndergaard Svendsen and Martin Grønnebæk Tolsgaard and Aasa Feragen and Anders Nymark Christensen	(参考訳) 超音波ドプラ法による臍帯動脈の検査は,臍帯を通じた胎児への血液供給を調べるために行われ,胎児の健康モニタリングに欠かせない。このような検査は、測定に最適な部位を特定すること、ドップラースペクトルの形で血流曲線を取得すること、一連の品質基準に準拠すること、など、正しく行う必要があるいくつかのステップを含む。これらのステップはオペレーターのスキルに大きく依存しており、経験豊富な超音波検査技師の不足が機械支援の需要を生み出している。本研究では,このギャップを埋める自動システムを提案する。改良されたFaster R-CNNネットワークを用いることで,ドップラー計測に適した位置を提案するアルゴリズムを得る。また,ドップラースペクトルの品質評価手法も開発した。提案システムは,全国の超音波検診データベースから657枚の画像に対して検証し,ガイダンスシステムとしての可能性を示した。 Examination of the umbilical artery with Doppler ultrasonography is performed to investigate blood supply to the fetus through the umbilical cord, which is vital for the monitoring of fetal health. Such examination involves several steps that must be performed correctly: identifying suitable sites on the umbilical artery for the measurement, acquiring the blood flow curve in the form of a Doppler spectrum, and ensuring compliance to a set of quality standards. These steps rely heavily on the operator's skill, and the shortage of experienced sonographers has thus created a demand for machine assistance. In this work, we propose an automatic system to fill the gap. By using a modified Faster R-CNN network, we obtain an algorithm that can suggest locations suitable for Doppler measurement. Meanwhile, we have also developed a method for assessment of the Doppler spectrum's quality. The proposed system is validated on 657 images from a national ultrasound screening database, with results demonstrating its potential as a guidance system.	翻訳日:2023-07-07 17:39:44 公開日:2023-07-06
# SLPerf: 分散学習のベンチマークのための統一フレームワーク SLPerf: a Unified Framework for Benchmarking Split Learning ( http://arxiv.org/abs/2304.01502v2 ) ライセンス: Link先を確認	Tianchen Zhou, Zhanyi Hu, Bingzhe Wu, Cen Chen	(参考訳) データプライバシの懸念により、サイロに分散したデータの集中的なトレーニングが実現不可能となり、協調学習フレームワークの必要性が高まった。これに対処するために、フェデレーション学習(fl)とスプリット学習(sl)という、2つの著名なフレームワークが登場した。 FLは様々なベンチマークフレームワークや研究ライブラリを確立しているが、SLは現在、ラベル共有、モデルアグリゲーション、カット層選択の点で多様性があるにもかかわらず、統一ライブラリを欠いている。この標準化の欠如はSLパラダイムの比較を困難にしている。そこで本研究では,SLのための統一的な研究フレームワークであるSLPerfを提案し,IIDおよび非IIDデータ設定下で広く使用されている4つのデータセットについて広範な実験を行った。我々のコントリビューションには、最近提案されたSLパラダイムの包括的調査、さまざまな状況におけるSLパラダイムの詳細なベンチマーク比較、SLパラダイムを改善するためのリッチエンジニアリングのテイクアウトメッセージと研究の洞察が含まれている。 SLPerfはSLアルゴリズムの開発と公正な性能比較を容易にする。コードはhttps://github.com/Rainysponge/Split-learning-Attacksで入手できる。 Data privacy concerns have made centralized training of data, which is scattered across silos, infeasible, leading to the need for collaborative learning frameworks. To address that, two prominent frameworks emerged, i.e., federated learning (FL) and split learning (SL). While FL has established various benchmark frameworks and research libraries, SL currently lacks a unified library despite its diversity in terms of label sharing, model aggregation, and cut layer choice. This lack of standardization makes comparing SL paradigms difficult. To address this, we propose SLPerf, a unified research framework and open research library for SL, and conduct extensive experiments on four widely-used datasets under both IID and Non-IID data settings. Our contributions include a comprehensive survey of recently proposed SL paradigms, a detailed benchmark comparison of different SL paradigms in different situations, and rich engineering take-away messages and research insights for improving SL paradigms. SLPerf can facilitate SL algorithm development and fair performance comparisons. The code is available at https://github.com/Rainysponge/Split-learning-Attacks.	翻訳日:2023-07-07 17:39:30 公開日:2023-07-06
# 非マルコフ進化が量子熱力学のキャラクタリゼーションに及ぼす影響 Impact of non-Markovian evolution on characterizations of quantum thermodynamics ( http://arxiv.org/abs/2305.10622v2 ) ライセンス: Link先を確認	Devvrat Tiwari and Subhashish Banerjee	(参考訳) 本研究では,非マルコフ進化がエルゴトロピーやパワーといった量子熱力学の顕著な特性に与える影響について考察する。これらは量子速度制限時間の挙動によってベンチマークされる。本稿では,幾何学的,特に量子フィッシャーとウィグナー・ヤナゼ情報量測定と物性に基づく測定,特に相対純度測度とコヒーレンス測度の相対エントロピーを用いて,量子速度制限時間を計算する。非マルコフ振幅減衰進化を示すボソニック浴中の量子ビットの単純な非マルコフ模型は、有限な初期エルゴトロピーを持つ量子熱力学の観点から量子バッテリーとして観察することができる。この目的のために,量子速度制限時間の物理特性に基づく測定値とエルゴトロピーのコヒーレント成分との関係を考察する。非マルコフ進化は量子電池の充電過程に影響を与えることが示されている。さらに、量子電池の放電充電サイクルと、量子速度制限時間の幾何学的測定との接続を観測する。 Here we study the impact of non-Markovian evolution on prominent characteristics of quantum thermodynamics, such as ergotropy and power. These are benchmarked by the behavior of the quantum speed limit time. We make use of both geometric-based, particularly quantum Fisher and Wigner-Yanase information metric, and physical properties based-measures, particularly relative purity measure and relative entropy of coherence measure, to compute the quantum speed limit time. A simple non-Markovian model of a qubit in a bosonic bath exhibiting non-Markovian amplitude damping evolution is considered, which, from the quantum thermodynamic perspective with finite initial ergotropy, can be envisaged as a quantum battery. To this end, we explore the connections between the physical properties-based measures of quantum speed limit time and the coherent component of ergotropy. The non-Markovian evolution is shown to impact the recharging process of the quantum battery. Further, a connection between the discharging-charging cycle of the quantum battery and the geometric measures of quantum speed limit time is observed.	翻訳日:2023-07-07 17:31:53 公開日:2023-07-06
# ランダム畳み込み核を用いた時系列クラスタリング Time Series Clustering With Random Convolutional Kernels ( http://arxiv.org/abs/2305.10457v2 ) ライセンス: Link先を確認	Jorge Marco-Blanco, Rubén Cuevas	(参考訳) 気候学からファイナンス、医療まで幅広いアプリケーションにわたる時系列データは、その大きさと複雑さのためにデータマイニングにおいて重大な課題を呈する。ひとつは時系列クラスタリングであり、ラベルなしの時系列データの大量処理と貴重な洞察の解放に不可欠である。しかし、伝統的かつ近代的な分析手法は、しばしばこれらの複雑さに苦しむ。これらの制約に対処するために、ランダムに選択されたパラメータを持つ畳み込みアーキテクチャを利用するR-Clusteringを導入する。大規模な評価を通じて、R-Clusteringはクラスタリングの精度、計算効率、スケーラビリティの観点から、既存の手法よりも優れた性能を示す。 UCRアーカイブを用いて得られた実験結果は,様々な時系列データセットにまたがるアプローチの有効性を示した。この結果は、様々な領域やアプリケーションにおけるRクラスタリングの重要性を強調し、時系列データマイニングの進歩に寄与している。 Time series data, spanning applications ranging from climatology to finance to healthcare, presents significant challenges in data mining due to its size and complexity. One open issue lies in time series clustering, which is crucial for processing large volumes of unlabeled time series data and unlocking valuable insights. Traditional and modern analysis methods, however, often struggle with these complexities. To address these limitations, we introduce R-Clustering, a novel method that utilizes convolutional architectures with randomly selected parameters. Through extensive evaluations, R-Clustering demonstrates superior performance over existing methods in terms of clustering accuracy, computational efficiency and scalability. Empirical results obtained using the UCR archive demonstrate the effectiveness of our approach across diverse time series datasets. The findings highlight the significance of R-Clustering in various domains and applications, contributing to the advancement of time series data mining.	翻訳日:2023-07-07 17:31:34 公開日:2023-07-06
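The R-Clustering abstract says only that the method uses convolutional architectures with randomly selected parameters. As a rough illustration of that idea, the sketch below builds untrained random 1-D kernels and pools each convolution output into two features (the maximum and the proportion of positive values), in the style of random-kernel transforms for time series. The kernel lengths, dilations, and pooling choices are assumptions and are not taken from the paper; the resulting feature vectors would then be passed to any standard clustering algorithm.

```python
import random

def random_kernels(num_kernels, rng):
    """Draw untrained kernels as (weights, bias, dilation) tuples.
    Lengths, dilations, and distributions are illustrative choices."""
    kernels = []
    for _ in range(num_kernels):
        length = rng.choice([3, 5, 7])
        weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
        mean = sum(weights) / length
        weights = [w - mean for w in weights]  # center the weights
        bias = rng.uniform(-1.0, 1.0)
        dilation = rng.choice([1, 2, 4])
        kernels.append((weights, bias, dilation))
    return kernels

def transform(series, kernels):
    """Map one series to 2 * len(kernels) features: for each kernel, the
    maximum output and the proportion of positive outputs (PPV)."""
    features = []
    for weights, bias, dilation in kernels:
        span = (len(weights) - 1) * dilation
        if len(series) <= span:
            raise ValueError("series is shorter than the dilated kernel")
        outputs = [
            bias + sum(w * series[t + i * dilation] for i, w in enumerate(weights))
            for t in range(len(series) - span)
        ]
        features.append(max(outputs))
        features.append(sum(1 for o in outputs if o > 0) / len(outputs))
    return features
```

Because the kernels are never trained, the transform costs a single pass over each series per kernel, which is where the efficiency of this family of methods comes from.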
# レニアの新たな複雑さを捉える Capturing Emerging Complexity in Lenia ( http://arxiv.org/abs/2305.09378v2 ) ライセンス: Link先を確認	Sanyam Jain, Aarati Shrestha and Stefano Nichele	(参考訳) この研究プロジェクトは、デジタル生物の生態系をシミュレートする人工生命プラットフォームLeniaを調査する。レニアの生態系は、移動し、消費し、成長し、再生できる単純な人工生物から成り立っている。このプラットフォームは、様々な能力と行動を持つ多様な生物を生み出すためのスケーラブルで柔軟な環境を提供するため、人工生命と進化を研究するためのツールとして重要である。レニアの複雑さを測定することは、まだ発見されていないレニアの行動を改善することを目的として、ルールの長期的な複雑な出現行動を測定するための指標を特定する研究の重要な側面である。遺伝的アルゴリズムは、近辺やカーネルを遺伝子型として使用し、レニアの残りのパラメータを例えば成長関数のように固定し、個体群ごとに異なる行動を生成し、その結果生じる行動の複雑さを決定するために適合値を測定する。まず,フレーム間のばらつきが高まるようなフィットネス機能として,時間とともに変化を利用する。第2に,フレームの復元損失リストの変動が報われる自動エンコーダベースの適合性を用いる。第3に、再構成フレームの画素密度のより高い変動が報われるような複合フィットネスを行う。 3つの実験はすべてpixel alive thresholdとフレームで調整されている。最後に、500世代毎に各フィットネスの9つの実験を行った後、さらなる進化のスコープがあるような全ての実験から構成を選択し、2500世代にわたって実行します。結果は、核の質量中心は、特定のピクセル集合と、核がガウス分布を達成しようとする境界とともに増加することを示している。 This research project investigates Lenia, an artificial life platform that simulates ecosystems of digital creatures. Lenia's ecosystem consists of simple, artificial organisms that can move, consume, grow, and reproduce. The platform is important as a tool for studying artificial life and evolution, as it provides a scalable and flexible environment for creating a diverse range of organisms with varying abilities and behaviors. Measuring complexity in Lenia is a key aspect of the study, which identifies the metrics for measuring long-term complex emerging behavior of rules, with the aim of evolving better Lenia behaviors which have not yet been discovered. The Genetic Algorithm uses neighborhoods or kernels as genotype while keeping the rest of the parameters of Lenia, for example the growth function, fixed, to produce different behaviors for each population, and then measures a fitness value to decide the complexity of the resulting behavior. First, we use Variation over Time as a fitness function where higher variance between the frames is rewarded. 
Second, we use an auto-encoder-based fitness in which variation in the list of reconstruction losses for the frames is rewarded. Third, we use a combined fitness in which higher variation in the pixel density of reconstructed frames is rewarded. All three experiments are tweaked with the pixel-alive threshold and the number of frames used. Finally, after performing nine experiments of each fitness for 500 generations, we pick configurations from all experiments that leave scope for further evolution, and run them for 2500 generations. Results show that the kernel's center of mass increases with a specific set of pixels and that, together with the borders, the kernel tries to achieve a Gaussian distribution.	翻訳日:2023-07-07 17:30:55 公開日:2023-07-06
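The abstract describes the first fitness function only as rewarding higher variance between frames, tuned by a pixel-alive threshold. One possible reading is sketched below: count the pixels above the threshold in each frame and use the variance of those counts over the rollout. This is an assumption made for illustration, not the paper's definition; the function names and the default threshold of 0.1 are hypothetical.

```python
from statistics import pvariance

def alive_count(frame, alive_threshold=0.1):
    """Number of cells in a 2-D frame whose value exceeds the alive threshold."""
    return sum(1 for row in frame for value in row if value > alive_threshold)

def variation_over_time(frames, alive_threshold=0.1):
    """Fitness sketch: population variance of the per-frame alive-pixel counts.
    A static or dead pattern scores 0; a pattern whose mass changes over the
    rollout scores higher."""
    counts = [alive_count(frame, alive_threshold) for frame in frames]
    return pvariance(counts)
```

A genetic algorithm would evaluate this score on a rollout produced by each candidate kernel and select the higher-scoring candidates. Note that a measure like this rewards any change in mass, including noise, which may be one motivation for the additional auto-encoder-based fitness mentioned in the abstract.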
# 量子ゲートの物理的誤差寄与の高速推定 Fast Estimation of Physical Error Contributions of Quantum Gates ( http://arxiv.org/abs/2305.08916v2 ) ライセンス: Link先を確認	Miha Papič, Adrian Auer, Inés de Vega	(参考訳) 大規模量子計算では、実装された量子ゲートの主なエラー源を高速に評価する必要がある。そこで本研究では,各物理ノイズ源の寄与を,少数の実験的な測定値を用いて,一連のゲートの非忠実度から抽出する学習ベースのフレームワークを提案する。本手法を説明するために,超伝導トランスモンアーキテクチャを例として,可変カプラを用いたCZゲートのダイアバティック実装に着目する。この文脈では、非マルコフ雑音、電子的不完全性、可変カプラによる計算誤差の影響など、関連する全てのノイズ源を考慮に入れる。 Large-scale quantum computation requires a fast assessment of the main sources of error in the implemented quantum gates. To this end, we provide a learning-based framework that allows one to extract the contribution of each physical noise source to the infidelity of a series of gates with a small number of experimental measurements. To illustrate this method, we consider the case of superconducting transmon architectures, where we focus on the diabatic implementation of the CZ gate with tunable couplers. In this context, we account for all relevant noise sources, including non-Markovian noise, electronics imperfections and the effect of tunable couplers on the error of the computation.	翻訳日:2023-07-07 17:30:31 公開日:2023-07-06
# 量子力学におけるパラドックスとその解法について On a paradox in quantum mechanics and its resolution ( http://arxiv.org/abs/2305.08556v2 ) ライセンス: Link先を確認	Padtarapan Banyadsin and Salvatore De Vincenzo	(参考訳) ディリクレ境界条件によって特徴づけられる壁のある区間内の自由シュレーディンガー粒子を考える。この境界条件を満たす粒子の正規化状態として放物線を選択する。その状態におけるハミルトニアンの分散を計算するには、ハミルトニアンの平均値とその2乗の平均値を計算する必要がある。これらの平均値を計算するのに標準式を使用すると、両者の結果は困難なく得られるが、分散は予想外に虚数値を取る。これらの平均値を計算するのに同じ式を使うが、まず各固有関数と固有値の項でハミルトニアンとその2乗を書けば、ハミルトニアンの平均値に対して上と同じ結果が得られるが、その2乗の平均値は異なる値になる(実際にはゼロではない)ので、分散は許容できる値となる。この矛盾した結果はどこから生じるのか? 後者のパラドックスは、ヒルベルト空間における線型作用素の一般理論の中で、ある基本的な概念を使用することでのみ適切に解決できる問題の例として文献に提示されている。ここでは、これらの概念を慎重に検討し、パラドックスを解決するための詳細な方法で適用する。我々の結果は波動力学の自然な枠組みの中で定式化され、ディラックの象徴的形式主義がもたらす不便さを避けるために、記事全体を通してその形式主義の使用を避ける。さらに、関係する演算子の定義域によって課される制約に対処することなく、完全に形式的な方法でパラドックスの解決を得る。本論文の内容は,学部生や大学院生,およびその指導者にとって有用であると考えられる。 Consider a free Schrödinger particle inside an interval with walls characterized by the Dirichlet boundary condition. Choose a parabola as the normalized state of the particle that satisfies this boundary condition. To calculate the variance of the Hamiltonian in that state, one needs to calculate the mean value of the Hamiltonian and that of its square. If one uses the standard formula to calculate these mean values, one obtains both results without difficulty, but the variance unexpectedly takes an imaginary value. If one uses the same expression to calculate these mean values but first writes the Hamiltonian and its square in terms of their respective eigenfunctions and eigenvalues, one obtains the same result as above for the mean value of the Hamiltonian but a different value for its square (in fact, it is not zero); hence, the variance takes an acceptable value. From whence do these contradictory results arise? The latter paradox has been presented in the literature as an example of a problem that can only be properly solved by making use of certain fundamental concepts within the general theory of linear operators in Hilbert spaces. 
Here, we carefully review those concepts and apply them in a detailed way to resolve the paradox. Our results are formulated within the natural framework of wave mechanics, and to avoid inconveniences that the use of Dirac's symbolic formalism could bring, we avoid the use of that formalism throughout the article. In addition, we obtain a resolution of the paradox in an entirely formal way without addressing the restrictions imposed by the domains of the operators involved. We think that the content of this paper will be useful to undergraduate and graduate students as well as to their instructors.	翻訳日:2023-07-07 17:30:21 公開日:2023-07-06
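The paradox described above can be made concrete with a short worked calculation. The abstract does not give the interval or the normalization, so the choices below (the interval $[0, a]$ and the parabola $\psi(x) \propto x(a-x)$) are assumptions made for illustration; the numerical values follow from them.

```latex
\begin{align}
\psi(x) &= \sqrt{\tfrac{30}{a^{5}}}\,x(a-x), \quad 0 \le x \le a, \qquad
H = -\frac{\hbar^{2}}{2m}\frac{d^{2}}{dx^{2}}, \\
H\psi &= \frac{\hbar^{2}}{m}\sqrt{\tfrac{30}{a^{5}}}
  \quad \text{(a nonzero constant on the interval)}, \\
\langle H\rangle &= \int_{0}^{a}\psi\,H\psi\,dx = \frac{5\hbar^{2}}{m a^{2}}, \\
\langle H^{2}\rangle_{\text{naive}} &= \int_{0}^{a}\psi\,H(H\psi)\,dx = 0
  \;\Rightarrow\;
  \langle H^{2}\rangle_{\text{naive}} - \langle H\rangle^{2}
  = -\frac{25\hbar^{4}}{m^{2}a^{4}} < 0, \\
\langle H^{2}\rangle_{\text{spectral}} &= \sum_{n}|c_{n}|^{2}E_{n}^{2}
  = \int_{0}^{a}|H\psi|^{2}\,dx = \frac{30\hbar^{4}}{m^{2}a^{4}}
  \;\Rightarrow\;
  (\Delta H)^{2} = \frac{5\hbar^{4}}{m^{2}a^{4}} .
\end{align}
```

The naive route gives a negative variance, so the uncertainty $\Delta H$ would be imaginary. The discrepancy arises because $H\psi$ is a nonzero constant and therefore does not satisfy the Dirichlet condition: $\psi$ belongs to the domain of $H$ but not to the domain of $H^{2}$, so applying $H$ twice under the integral is not justified, whereas $\|H\psi\|^{2}$ is well defined.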
# 事前データから言語モデル、下流タスクへ:不公平なNLPモデルによる政治的バイアスの軌跡を追跡する From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models ( http://arxiv.org/abs/2305.08283v3 ) ライセンス: Link先を確認	Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov	(参考訳) 言語モデル(LM)は、ニュース、ディスカッションフォーラム、書籍、オンライン百科事典など、さまざまなデータソースで事前訓練されている。このデータの大部分には、民主主義とアイデアの多様性を祝福する意見と視点が含まれており、一方で本質的に社会的に偏っている。本研究は,(1)そのようなコーパスで訓練されたLMの社会的偏見を社会的・経済的軸に沿って測定し,(2)政治的偏見のあるLM上で訓練された下流NLPモデルの公平さを測定するための新しい手法を開発する。我々はヘイトスピーチと誤情報検出に注目し、ハイテイクなソーシャル指向タスクの公平性に関する事前学習データにおける政治的(社会的、経済的)バイアスの効果を実証的に定量化することを目的としている。以上の結果から, 事前学習されたLMは, コーパスの偏極性を高める政治的傾向を示し, 社会的バイアスをヘイトスピーチ予測や誤情報検知器に伝播させることがわかった。我々は,nlp研究の意義を議論し,不公平を緩和するための今後の方向性を提案する。 Language models (LMs) are pretrained on diverse data sources, including news, discussion forums, books, and online encyclopedias. A significant portion of this data includes opinions and perspectives which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure political biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks. Our findings reveal that pretrained LMs do have political leanings that reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.	翻訳日:2023-07-07 17:29:55 公開日:2023-07-06
# ブラウンドワーフモデルグリッドの相互比較と機械学習による大気検索 Intercomparison of Brown Dwarf Model Grids and Atmospheric Retrieval Using Machine Learning ( http://arxiv.org/abs/2305.07719v2 ) ライセンス: Link先を確認	Anna Lueber, Daniel Kitzmann, Chloe E. Fisher, Brendan P. Bowler, Adam J. Burgasser, Mark Marley, Kevin Heng	(参考訳) サブステラースペクトルデータとモデルの違いを理解することは、特にブラウンドワーフ大気の徹底的な調査に必要な自己整合モデルグリッドにおいて、大きな課題であることが証明されている。ランダムフォレストという教師付き機械学習手法を用いて,1997年から2021年までに公開されたブラウンドワーフの14個のモデルグリッドの情報量について検討した。ランダムフォレスト法により,モデルグリッドの予測力を解析し,近似ベイズ計算(ABC)の枠組み内でデータを解釈することができる。我々のキュレートされたデータセットには、3つのベンチマークブラウンドワーフ(Gl 570D, $\epsilon$ Indi Ba, Bb)と19個のLおよびT型矮星のサンプルが含まれており、このサンプルは従来型のベイズ法(ネステッドサンプリング)を用いてLueber et al. (2022)で分析された。この解釈のために選択されたモデルグリッドとは無関係に、ブラウンドワーフの有効温度を頑健に予測できることが判明した。しかし、表面重力の推論はモデルに依存します。具体的には、BT-Settl, Sonora Bobcat および Sonora Cholla モデルグリッドは、アルカリ線の形状に関する知識が不完全であることの影響を抑えるために 1.2 $\mu$m より短波長側のデータを除外しても、logg ~3-4 (cgs unit) を予測する傾向にある。ブラウンドワーフの大気における雲の影響を理解することに関連する2つの大きな、長い間の課題は、第一原理からそれらをモデル化できないことと、これらのモデルを堅牢に検証することである。 Understanding differences between sub-stellar spectral data and models has proven to be a major challenge, especially for self-consistent model grids that are necessary for a thorough investigation of brown dwarf atmospheres. Using the supervised machine learning method of the random forest, we study the information content of 14 previously published model grids of brown dwarfs (from 1997 to 2021). The random forest method allows us to analyze the predictive power of these model grids, as well as interpret data within the framework of Approximate Bayesian Computation (ABC). Our curated dataset includes 3 benchmark brown dwarfs (Gl 570D, $\epsilon$ Indi Ba and Bb) as well as a sample of 19 L and T dwarfs; this sample was previously analyzed in Lueber et al. (2022) using traditional Bayesian methods (nested sampling). We find that the effective temperature of a brown dwarf can be robustly predicted independent of the model grid chosen for the interpretation. 
However, inference of the surface gravity is model-dependent. Specifically, the BT-Settl, Sonora Bobcat and Sonora Cholla model grids tend to predict $\log g \sim 3$-$4$ (cgs units) even after data blueward of 1.2 $\mu$m have been disregarded to mitigate our incomplete knowledge of the shapes of alkali lines. Two major, longstanding challenges associated with understanding the influence of clouds in brown dwarf atmospheres remain: our inability to model them from first principles and also to robustly validate these models.	翻訳日:2023-07-07 17:29:34 公開日:2023-07-06
# 機械学習を用いた意思決定システムにおけるデータ時間ラグの豚肉価格予測に及ぼす影響 Effects of data time lag in a decision-making system using machine learning for pork price prediction ( http://arxiv.org/abs/2305.05677v2 ) ライセンス: Link先を確認	Mario Suaza-Medina, F. Javier Zarazaga-Soria, Jorge Pinilla-Lopez, Francisco J. López-Pellicer, Javier Lacasta	(参考訳) スペインは世界第3位の豚肉生産国であり、いくつかの地域の多くの農場はこの市場の進化に依存している。しかし、現在の価格体系は不公平であり、一部の俳優は他の業者よりも優れた市場情報を持っている。この文脈では、歴史的価格設定は簡単で手頃な価格なデータソースであり、すべてのエージェントにより良い情報を提供するのに役立つ。しかし、データ取得の遅れが価格決定に影響を及ぼす可能性がある。本稿では,複数の予測アルゴリズムを用いて,データ取得遅延が価格予測システムに与える影響について検討する。本稿では,最適な提案を意思決定支援システムのプロトタイプに統合し,実際のシナリオでテストする。具体的には、農務省が発行したスペインの最も重要な地域豚肉市場の公開データを用いて、同日に取得した同市場の2週間の遅延とサブスクリプションベースのデータを用いている。その結果,最高のパブリックモデルとデータサブスクリプションモデルとの誤差差は0.6ユーロであり,遅延のないデータに有利であることがわかった。市場規模はこれらの違いをサプライチェーンにおいて重要なものにし、市場価格を交渉するためのより良いツールを提供する。 Spain is the third-largest producer of pork meat in the world, and many farms in several regions depend on the evolution of this market. However, the current pricing system is unfair, as some actors have better market information than others. In this context, historical pricing is an easy-to-find and affordable data source that can help all agents to be better informed. However, the time lag in data acquisition can affect their pricing decisions. In this paper, we study the effect that data acquisition delay has on a price prediction system using multiple prediction algorithms. We describe the integration of the best proposal into a decision support system prototype and test it in a real-case scenario. Specifically, we use public data from the most important regional pork meat markets in Spain published by the Ministry of Agriculture with a two-week delay and subscription-based data of the same markets obtained on the same day. The results show that the error difference between the best public and data subscription models is 0.6 Euro cents in favor of the data without delay. 
The market dimension makes these differences significant in the supply chain, giving pricing agents a better tool to negotiate market prices.	翻訳日:2023-07-07 17:29:00 公開日:2023-07-06
# セマンティックセグメンテーションのための構造的および統計的テクスチャ知識蒸留 Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation ( http://arxiv.org/abs/2305.03944v2 ) ライセンス: Link先を確認	Deyi Ji, Haoran Wang, Mingyuan Tao, Jianqiang Huang, Xian-Sheng Hua, Hongtao Lu	(参考訳) 既存の知識蒸留は、主に教師から学生への高度な文脈知識の伝達に焦点を当てている。しかし、低レベルのテクスチャ知識は、高レベルの深い特徴に対処できない境界、滑らかさ、規則性、色コントラストといった、局所的な構造パターンとグローバルな統計特性を特徴付ける上でも不可欠である。本稿では,構造的・統計的テクスチャ知識を最大限に活用し,意味的セグメント化のための新しい構造的・統計的テクスチャ知識蒸留(sstkd)フレームワークを提案する。具体的には,構造テクスチャ知識のために,構造テクスチャ知識をマイニングするために,ラプラシアンピラミッドと指向性フィルタバンクで低レベル特徴を分解するContourlet Decomposition Module (CDM)を導入する。統計的知識については,統計テクスチャ知識を適応的に抽出し,ヒューリスティックス反復量子化と復号化操作により拡張するDenoized Texture Intensity Equalization Module (DTIEM)を提案する。最後に、各知識学習は個々の損失関数によって監督され、学生ネットワークはより広い視点から教師をよりよく模倣する。実験の結果,提案手法はCityscapes, Pascal VOC 2012, ADE20Kデータセット上での最先端性能を実現することがわかった。 Existing knowledge distillation works for semantic segmentation mainly focus on transferring high-level contextual knowledge from teacher to student. However, low-level texture knowledge is also of vital importance for characterizing the local structural pattern and global statistical property, such as boundary, smoothness, regularity and color contrast, which may not be well addressed by high-level deep features. In this paper, we are intended to take full advantage of both structural and statistical texture knowledge and propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation. Specifically, for structural texture knowledge, we introduce a Contourlet Decomposition Module (CDM) that decomposes low-level features with iterative Laplacian pyramid and directional filter bank to mine the structural texture knowledge. For statistical knowledge, we propose a Denoised Texture Intensity Equalization Module (DTIEM) to adaptively extract and enhance statistical texture knowledge through heuristics iterative quantization and denoised operation. 
Finally, each knowledge learning is supervised by an individual loss function, forcing the student network to mimic the teacher better from a broader perspective. Experiments show that the proposed method achieves state-of-the-art performance on Cityscapes, Pascal VOC 2012 and ADE20K datasets.	翻訳日:2023-07-07 17:28:43 公開日:2023-07-06
# 表面の下を見る: サンプル効率の良いオフラインRLのための基本対称性の活用 Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL ( http://arxiv.org/abs/2306.04220v3 ) ライセンス: Link先を確認	Peng Cheng, Xianyuan Zhan, Zhihao Wu, Wenjia Zhang, Shoucheng Song, Han Wang, Youfang Lin, Li Jiang	(参考訳) オフライン強化学習(RL)は、事前収集されたデータセットから環境と対話することなくポリシーを学習することで、現実世界のタスクに魅力的なアプローチを提供する。しかし、既存のオフラインRLアルゴリズムの性能はデータセットの規模と状態-行動空間のカバレッジに大きく依存する。現実世界のデータ収集は、しばしば高価で制御不能であり、小規模でカバレッジの狭いデータセットにつながり、オフラインRLの実用的なデプロイに重大な課題をもたらす。本稿では,システムダイナミクスの基本的な対称性を活用することで,小規模データセット下でのオフラインRLの性能が大幅に向上するという新たな知見を示す。具体的には,時間反転対称性(T対称性)を強制した動力学モデル(T-symmetry enforced Dynamics Model, TDM)を提案し,順方向と逆方向の潜在動力学の対の間に整合性を確立する。 TDMは、小さなデータセットに対する良好な表現と、T対称性の遵守に基づくOODサンプルに対する新しい信頼性尺度の両方を提供する。これらは、より保守的でないポリシー制約を持つ新しいオフラインRLアルゴリズム(TSRL)の構築や、信頼性の高い潜在空間データ拡張手順に容易に使用できる。大規模な実験に基づいて、TSRLは、元のサンプルのわずか1%しかない小さなベンチマークデータセットで優れた性能を発揮し、データ効率と汎化性の観点から、最近のオフラインRLアルゴリズムを著しく上回ることがわかった。 Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expensive and uncontrollable, leading to small and narrowly covered datasets and posing significant challenges for practical deployments of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on compliance with the T-symmetry. 
These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability.	翻訳日:2023-07-07 17:23:04 公開日:2023-07-06
# OSPC:オンライン連続測光校正 OSPC: Online Sequential Photometric Calibration ( http://arxiv.org/abs/2305.17673v2 ) ライセンス: Link先を確認	Jawad Haidar, Douaa Khalil, Daniel Asmar	(参考訳) 測光キャリブレーションは多くのコンピュータビジョンアプリケーションに必須である。主な利点の1つは、特に標準のKLTアルゴリズムのようなトラッキングの直接的な方法に依存する場合、Visual SLAMの性能を向上させることである。もうひとつの利点は、測定された強度からセンサーの照射値を取得することであり、シェーディングの形状のような視覚アルゴリズムの事前処理ステップである。現在の測光キャリブレーションシステムは、共同最適化の問題に頼り、推定値の曖昧さに遭遇する。本稿では, 逐次推定手法を用いて, 測光パラメータを求める新しい手法を提案する。提案手法は,すべてのパラメータを高精度に推定でき,さらに定式化は線形かつ凸であり,その解を高速かつオンラインアプリケーションに適したものにしている。提案手法を検証し,その利点を実証するビジュアルオドメトリーシステムの実験を行った。 Photometric calibration is essential to many computer vision applications. One of its key benefits is enhancing the performance of Visual SLAM, especially when it depends on a direct method for tracking, such as the standard KLT algorithm. Another advantage could be in retrieving the sensor irradiance values from measured intensities, as a pre-processing step for some vision algorithms, such as shape-from-shading. Current photometric calibration systems rely on a joint optimization problem and encounter an ambiguity in the estimates, which can only be resolved using ground truth information. We propose a novel method that solves for photometric parameters using a sequential estimation approach. Our proposed method achieves high accuracy in estimating all parameters; furthermore, the formulations are linear and convex, which makes the solution fast and suitable for online applications. Experiments on a Visual Odometry system validate the proposed method and demonstrate its advantages.	翻訳日:2023-07-07 17:21:54 公開日:2023-07-06
# ランダム化SVDの雑音感度について On the Noise Sensitivity of the Randomized SVD ( http://arxiv.org/abs/2305.17435v2 ) ライセンス: Link先を確認	Elad Romanov	(参考訳) ランダム化特異値分解(R-SVD)は、大きな行列の部分SVDを効率的に計算するための、スケッチに基づく一般的なアルゴリズムである。行列が低ランクの場合、R-SVDはその部分SVDを正確に生成するが、ランクが大きい場合は近似しか得られない。データサイエンスと主成分分析(PCA)への応用を動機として、低ランク信号+雑音の測定モデルの下で、具体的には入力がスパイク付きランダム行列である場合のR-SVDを解析する。 R-SVDが生成する特異値はBBP型の相転移を示す。すなわち、SNRが次元削減係数に依存する特定の検出可能性閾値を超えると最大特異値は外れ値となり、閾値を下回ると特異値のバルクから外れ値は現れない。さらに、真の信号の特異ベクトルとR-SVDによる近似との重なりに関する漸近公式を計算する。次元削減には、ノイズを高度に非線形な形で増幅するという悪影響がある。我々の結果は,より素朴なスケッチ型PCAに対するR-SVDの信号検出と推定の両面での統計的優位性を示しており,この優位性はスケッチ次元が小さい場合に特に顕著である。我々の解析は漸近的に厳密であり、中程度のSNR領域では意味のある誤差推定をほとんど与えないR-SVDの既存の作用素ノルム誤差上界よりもかなり精密である。この解析は、ガウスi.i.d.スケッチ、ランダム射影、サブサンプリングされたアダマール変換など、これまで文献で検討されてきた幅広いスケッチ行列の族に適用できる。最後に、R-SVDによって得られる特異値と特異ベクトルに対する最適な特異値縮小器を導出する。これは行列のデノイジングへの応用に有用でありうる。 The randomized singular value decomposition (R-SVD) is a popular sketching-based algorithm for efficiently computing the partial SVD of a large matrix. When the matrix is low-rank, the R-SVD produces its partial SVD exactly; but when the rank is large, it only yields an approximation. Motivated by applications in data science and principal component analysis (PCA), we analyze the R-SVD under a low-rank signal plus noise measurement model; specifically, when its input is a spiked random matrix. The singular values produced by the R-SVD are shown to exhibit a BBP-like phase transition: when the SNR exceeds a certain detectability threshold, that depends on the dimension reduction factor, the largest singular value is an outlier; below the threshold, no outlier emerges from the bulk of singular values. We further compute asymptotic formulas for the overlap between the ground truth signal singular vectors and the approximations produced by the R-SVD. Dimensionality reduction has the adverse effect of amplifying the noise in a highly nonlinear manner. 
Our results demonstrate the statistical advantage -- in both signal detection and estimation -- of the R-SVD over more naive sketched PCA variants; the advantage is especially dramatic when the sketching dimension is small. Our analysis is asymptotically exact, and substantially more fine-grained than existing operator-norm error bounds for the R-SVD, which largely fail to give meaningful error estimates in the moderate SNR regime. It applies for a broad family of sketching matrices previously considered in the literature, including Gaussian i.i.d. sketches, random projections, and the sub-sampled Hadamard transform, among others. Lastly, we derive an optimal singular value shrinker for singular values and vectors obtained through the R-SVD, which may be useful for applications in matrix denoising.	翻訳日:2023-07-07 17:21:34 公開日:2023-07-06
# 多視点制限カーネルマシンにおける双対性 Duality in Multi-View Restricted Kernel Machines ( http://arxiv.org/abs/2305.17251v2 ) ライセンス: Link先を確認	Sonny Achten, Arun Pandey, Hannes De Meulemeester, Bart De Moor, Johan A. K. Suykens	(参考訳) 本稿では,既存の制限付きカーネルマシン手法を,教師あり設定と教師なし設定の両方におけるカーネル主成分分析のための単一のプライマル・デュアルなマルチビューフレームワークに統合する統一的な設定を提案する。フレームワークの主表現と双対表現を導出し、理論的な観点から異なる訓練アルゴリズムと推論アルゴリズムを関連づける。主変数を再スケーリングすることで、主形式と双対形式の完全な等価性を実現する方法を示す。最後に,未知のテストデータを再帰的に予測し,学習した特徴を可視化することにより,複数の時系列データセットにおいてこの等価性を実験的に検証し,異なる手法間の関係についての知見を与える。 We propose a unifying setting that combines existing restricted kernel machine methods into a single primal-dual multi-view framework for kernel principal component analysis in both supervised and unsupervised settings. We derive the primal and dual representations of the framework and relate different training and inference algorithms from a theoretical perspective. We show how to achieve full equivalence in primal and dual formulations by rescaling primal variables. Finally, we experimentally validate the equivalence and provide insight into the relationships between different methods on a number of time series data sets by recursively forecasting unseen test data and visualizing the learned features.	翻訳日:2023-07-07 17:20:59 公開日:2023-07-06
# 音声による抑うつ検出における自己教師付き表現 Self-supervised representations in speech-based depression detection ( http://arxiv.org/abs/2305.12263v2 ) ライセンス: Link先を確認	Wen Wu, Chao Zhang, Philip C. Woodland	(参考訳) 本稿では,自己教師付き学習(ssl)による基礎モデルを用いた音声自動抑うつ検出(sdd)における学習データのスパーシティの取り扱いを提案する。予め訓練された基礎モデルの異なる層から派生したSSL表現をSDDで解析し、うつ病検出に適した指標の洞察を提供する。次に、基礎モデルの微調整により、自動音声認識(ASR)と感情認識からSDDへの知識伝達を行う。その結果,asrモデルの隠れた表現とasrのテキスト情報とが組み合わさった場合,oracle と asr の書き起こしが同様の sdd 性能をもたらすことがわかった。複数の基礎モデルから表現を統合することで、DAIC-WOZデータセット上で実際のASRに基づく最先端SDD結果が得られた。 This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). An analysis of SSL representations derived from different layers of pre-trained foundation models is first presented for SDD, which provides insight to suitable indicator for depression detection. Knowledge transfer is then performed from automatic speech recognition (ASR) and emotion recognition to SDD by fine-tuning the foundation models. Results show that the uses of oracle and ASR transcriptions yield similar SDD performance when the hidden representations of the ASR model is incorporated along with the ASR textual information. By integrating representations from multiple foundation models, state-of-the-art SDD results based on real ASR were achieved on the DAIC-WOZ dataset.	翻訳日:2023-07-07 17:19:55 公開日:2023-07-06
# 冷原子不純物モデルを用いたフェルミオン物質波量子光学 Fermionic matter-wave quantum optics with cold-atom impurity models ( http://arxiv.org/abs/2305.11610v2 ) ライセンス: Link先を確認	Bennet Windt, Miguel Bello, Eugene Demler, J. Ignacio Cirac	(参考訳) 物質-波導波路QEDの近年のコールド原子実現により、簡単なフェルミオン不純物モデルが研究され、非自明な境界状態の形成、(マター波)放出ダイナミクス、集団散逸など、量子光学におけるいくつかのパラダイム現象のフェルミオン類似が議論される。単一不純物の場合、特に不純物スクリーニングクラウドに関連する創発的長さスケールの実際の空間シグネチャに焦点を当て、興味深い基底状態の特徴を強調します。また,単重および多重不純物系のクエンチダイナミクスにおいて,フェルミ準位付近の分数減衰や連続体の束縛状態による多重励起集団トラップを含む,新しい非マルコフ多体効果を示す。 Motivated by recent cold-atom realisations of matter-wave waveguide QED, we study simple fermionic impurity models and discuss fermionic analogues of several paradigmatic phenomena in quantum optics, including formation of non-trivial bound states, (matter-wave) emission dynamics, and collective dissipation. For a single impurity, we highlight interesting ground-state features, focusing in particular on real-space signatures of an emergent length scale associated with an impurity screening cloud. We also present novel non-Markovian many-body effects in the quench dynamics of single- and multiple-impurity systems, including fractional decay around the Fermi level and multi-excitation population trapping due to bound states in the continuum.	翻訳日:2023-07-07 17:19:43 公開日:2023-07-06
# ビデオ異常検出のためのマルチスケール時空間インタラクションネットワーク Multi-scale Spatial-temporal Interaction Network for Video Anomaly Detection ( http://arxiv.org/abs/2306.10239v2 ) ライセンス: Link先を確認	Zhiyuan Ning, Zhangxun Li, Zhengliang Guo, Zile Wang, Liang Song	(参考訳) ビデオ異常検出(VAD)は信号処理において不可欠だが困難なタスクである。時間情報または空間情報のいずれかを単独で分析しても検出できない異常があるため、これら2種類のデータ間の相互作用はVADにとって重要であると考えられている。しかし、現在のデュアルストリームアーキテクチャは、この相互作用をオートエンコーダのボトルネックに限定するか、異常とは無関係な背景画素を相互作用の過程に持ち込むため、VADの精度を損なっている。これらの欠陥に対処するために,VADのためのマルチスケール時空間インタラクションネットワーク(MSTI-Net)を提案する。まず,シーン内の移動物体の検出を優先し,2種類のデータ間の大きな意味的相違を調和させるため,従来の直接融合の代替として,アテンションに基づく時空間融合モジュール(ASTFM)を提案する。さらに、デュアルストリームネットワークの外観ストリームと動きストリームを橋渡しする複数のASTFMベースの接続を導入し、マルチスケールの時空間相互作用を促進する。最後に,正常な活動と異常な活動の区別を強化するため,メモリモジュールに正常な情報を記録する。 3つのベンチマークデータセットにおける実験結果は提案手法の有効性を裏付けており,UCSD Ped2,CUHK Avenue,ShanghaiTechデータセットでそれぞれ96.8%,87.6%,73.9%のAUCを達成した。 Video Anomaly Detection (VAD) is an essential yet challenging task in signal processing. Since certain anomalies cannot be detected by isolated analysis of either temporal or spatial information, the interaction between these two types of data is considered crucial for VAD. However, current dual-stream architectures either confine this integral interaction to the bottleneck of the autoencoder or introduce anomaly-irrelevant background pixels into the interactive process, hindering the accuracy of VAD. To address these deficiencies, we propose a Multi-scale Spatial-Temporal Interaction Network (MSTI-Net) for VAD. First, to prioritize the detection of moving objects in the scene and harmonize the substantial semantic discrepancies between the two types of data, we propose an Attention-based Spatial-Temporal Fusion Module (ASTFM) as a substitute for the conventional direct fusion. Furthermore, we inject multi-ASTFM-based connections that bridge the appearance and motion streams of the dual-stream network, thus fostering multi-scale spatial-temporal interaction. 
Finally, to bolster the delineation between normal and abnormal activities, our system records the regular information in a memory module. Experimental results on three benchmark datasets validate the effectiveness of our approach, which achieves AUCs of 96.8%, 87.6%, and 73.9% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively.	翻訳日:2023-07-07 17:12:45 公開日:2023-07-06
# wasserstein barycentersによるマルチタスク学習の公平性 Fairness in Multi-Task Learning via Wasserstein Barycenters ( http://arxiv.org/abs/2306.10155v2 ) ライセンス: Link先を確認	Fran\c{c}ois Hu, Philipp Ratz, Arthur Charpentier	(参考訳) アルゴリズムフェアネスは、データのバイアスを減らすことを目的とした機械学習の確立された分野である。近年の進歩は、単一タスクの非バイアス化を目標とする単変量環境における公平性を確保するための様々な方法が提案されている。しかし、複数の目的が共有表現を用いて最適化されるマルチタスク設定への公平性の拡張は、未探索のままである。このギャップを埋めるため,マルチマルジナルなワッサーシュタイン・バリセンタを用いたマルチタスク学習に,Strong Demographic Parityの定義を拡張した手法を開発した。提案手法は回帰および二項分類タスクを含む最適フェアマルチタスク予測器に対する閉形式解を提供する。本研究では,データ駆動型解推定手法を開発し,合成データと実データの両方について数値実験を行う。実験結果は, 公平な意思決定を促進する上で, 処理後方法論の実際的価値を浮き彫りにするものである。 Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared representation, remains underexplored. To bridge this gap, we develop a method that extends the definition of Strong Demographic Parity to multi-task learning using multi-marginal Wasserstein barycenters. Our approach provides a closed form solution for the optimal fair multi-task predictor including both regression and binary classification tasks. We develop a data-driven estimation procedure for the solution and run numerical experiments on both synthetic and real datasets. The empirical results highlight the practical value of our post-processing methodology in promoting fair decision-making.	翻訳日:2023-07-07 17:12:21 公開日:2023-07-06
# コーディング問題のタグ付けのためのハイパーパラメータ調整済みモデルのスタッキング Stacking of Hyperparameter Tuned Models for Tagging Coding Problems ( http://arxiv.org/abs/2306.10077v2 ) ライセンス: Link先を確認	Sathya Krishnan TS, S. Lakshmana Pandian and P. Shunmugapriya	(参考訳) コーディング問題とは、コンピュータプログラムの形で解を求める問題である。コーディング問題は、スキルやキャリアの機会を高めることから、学生や専門家の間で人気がある。コーディング問題を練習する人を支援するAIシステムは非常に有用であり、そのようなシステムには大きな可能性がある。本研究では,ハイパーパラメータを調整したブースティングモデルのスタッキングを用いて,CodeforcesとLeetcodeからスクレイピングしたデータセット上で77.8%の精度と0.815のPR-AUCという印象的な指標スコアを達成するモデルを提案する。この研究のために開発したデータセットとモデルをオープンソースとして公開する。 Coding problems are problems that require a solution in the form of a computer program. Coding problems are popular among students and professionals as it enhances their skills and career opportunities. An AI system that would help those who practice coding problems would be highly useful and there is a huge potential for such a system. In this work, we propose a model which uses stacking of hyperparameter tuned boosting models to achieve impressive metric scores of 77.8% accuracy and 0.815 PR-AUC on the dataset that was scraped from Codeforces and Leetcode. We open source the dataset and the models developed for this work.	翻訳日:2023-07-07 17:11:41 公開日:2023-07-06
# ChatGPT と LLM が医療画像のステークホルダーに与える影響 : 展望とユースケース The Impact of ChatGPT and LLMs on Medical Imaging Stakeholders: Perspectives and Use Cases ( http://arxiv.org/abs/2306.06767v2 ) ライセンス: Link先を確認	Jiancheng Yang, Hongwei Bran Li, Donglai Wei	(参考訳) 本研究では,OpenAI ChatGPTなどの大規模言語モデル(LLM)が医療画像分野にもたらす変革の可能性について検討する。公開データの助けを借りて、優れた言語理解・生成能力を持つこれらのモデルは、放射線科医の読影能力を補強し、患者と医師のコミュニケーションを強化し、臨床ワークフローを合理化している。本稿では,LLMと,企業,保険事業者,政府,研究機関,病院(通称BIGR-H)を含む医療画像ステークホルダーの広範なエコシステムとの複雑な相互作用を示すための分析枠組みを紹介する。このパースペクティブ論文は、詳細な分析、具体的なユースケース、より広範な影響と今後の方向性に関する議論を通じて、AIを活用した医療の時代における戦略的計画と意思決定についての議論を喚起することを目指している。 This study investigates the transformative potential of Large Language Models (LLMs), such as OpenAI ChatGPT, in medical imaging. With the aid of public data, these models, which possess remarkable language understanding and generation capabilities, are augmenting the interpretive skills of radiologists, enhancing patient-physician communication, and streamlining clinical workflows. The paper introduces an analytic framework for presenting the complex interactions between LLMs and the broader ecosystem of medical imaging stakeholders, including businesses, insurance entities, governments, research institutions, and hospitals (nicknamed BIGR-H). Through detailed analyses, illustrative use cases, and discussions on the broader implications and future directions, this perspective seeks to raise discussion in strategic planning and decision-making in the era of AI-enabled healthcare.	翻訳日:2023-07-07 17:11:24 公開日:2023-07-06
# スパース観測からの1日先予測のための深層学習 Deep Learning for Day Forecasts from Sparse Observations ( http://arxiv.org/abs/2306.06079v3 ) ライセンス: Link先を確認	Marcin Andrychowicz, Lasse Espeholt, Di Li, Samier Merchant, Alexander Merose, Fred Zyda, Shreya Agrawal, Nal Kalchbrenner	(参考訳) 深層ニューラルネットワークは、気象条件をモデル化するための代替パラダイムを提供する。データが利用可能になってから1秒未満で、非常に高い時間分解能と空間分解能で予測できる能力、そして大気観測から直接学習できる能力は、ニューラルモデルのユニークな利点のほんの一部にすぎない。最も忠実度が高く遅延の小さいデータである大気観測を用いて訓練されたニューラルモデルは、これまで、最先端の確率的数値気象予報モデルと比較して、最大12時間のリードタイムまで、しかも降水という単一の変数についてのみ良好な性能を達成していた。本稿では,観測に基づくニューラルモデルが良好に予測できるリードタイムの範囲と変数の両方を大きく拡張するMetNet-3を提案する。 MetNet-3は、密なデータセンサーと疎なデータセンサーの両方から学習し、降水、風、気温、露点を最大24時間先まで予測する。 MetNet-3は、極端に疎なターゲットでネットワークを訓練するにもかかわらず、データ同化を暗黙的に捉え、空間的に密な予測を生成する重要な高密度化技術を導入している。 MetNet-3は、それぞれ最大2分と1kmという高い時間分解能と空間分解能を持ち、運用上の遅延も小さい。 MetNet-3は、CONUS領域において最大24時間先まで、HRRRやENSのような最良の単一メンバーおよびマルチメンバーのNWPを上回り、観測ベースのニューラルモデルの新たな性能マイルストーンを打ち立てた。 MetNet-3は運用中であり、その予測は他のモデルとともにGoogle検索で提供されている。 Deep neural networks offer an alternative paradigm for modeling weather conditions. The ability of neural models to make a prediction in less than a second once the data is available and to do so with very high temporal and spatial resolution, and the ability to learn directly from atmospheric observations, are just some of these models' unique advantages. Neural models trained using atmospheric observations, the highest fidelity and lowest latency data, have to date achieved good performance only up to twelve hours of lead time when compared with state-of-the-art probabilistic Numerical Weather Prediction models and only for the sole variable of precipitation. In this paper, we present MetNet-3 that extends significantly both the lead time range and the variables that an observation based neural model can predict well. MetNet-3 learns from both dense and sparse data sensors and makes predictions up to 24 hours ahead for precipitation, wind, temperature and dew point. 
MetNet-3 introduces a key densification technique that implicitly captures data assimilation and produces spatially dense forecasts in spite of the network training on extremely sparse targets. MetNet-3 has a high temporal and spatial resolution of, respectively, up to 2 minutes and 1 km as well as a low operational latency. We find that MetNet-3 is able to outperform the best single- and multi-member NWPs such as HRRR and ENS over the CONUS region for up to 24 hours ahead setting a new performance milestone for observation based neural models. MetNet-3 is operational and its forecasts are served in Google Search in conjunction with other models.	翻訳日:2023-07-07 17:10:50 公開日:2023-07-06
# 近位ガイダンスによるチューニングフリー実画像編集の改善 Improving Tuning-Free Real Image Editing with Proximal Guidance ( http://arxiv.org/abs/2306.05414v3 ) ライセンス: Link先を確認	Ligong Han, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei Ren, Ruijiang Gao, Anastasis Stathopoulos, Xiaoxiao He, Yuxiao Chen, Di Liu, Qilong Zhangli, Jindong Jiang, Zhaoyang Xia, Akash Srivastava, Dimitris Metaxas	(参考訳) DDIMインバージョンは、拡散ベースの手法における実画像編集の大きな可能性を明らかにした。しかし、編集を強化するためにより大きな分類器フリーガイダンス(CFG)スケールを用いると、DDIM再構成の精度は低下する。 Null-text inversion (NTI) は、より大きなCFGスケールで再構成と反転の軌跡を整合させるためにnull埋め込みを最適化し、クロスアテンション制御による実画像編集を可能にする。 Negative-prompt inversion (NPI) はさらに、NTIの訓練不要な閉形式解を提供する。しかし、アーティファクトを生じる可能性があり、依然としてDDIM再構成の品質に制約される。これらの制限を克服するため,我々は近位ガイダンス(proximal guidance)を提案し,クロスアテンション制御を備えたNPIに組み込む。我々は、正則化項と再構成ガイダンスでNPIを強化し、訓練不要という性質を生かしながらアーティファクトを減らす。さらに,この考え方を拡張して相互自己注意制御を組み込むことにより,編集過程における形状やレイアウトの変更を可能にする。提案手法は効率的で簡潔なアプローチであり,最小限の計算オーバーヘッドで実画像編集タスクに効果的に対処する。 DDIM inversion has revealed the remarkable potential of real image editing within diffusion-based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier-free guidance (CFG) scales being used for enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales, enabling real image editing with cross-attention control. Negative-prompt inversion (NPI) further offers a training-free closed-form solution of NTI. However, it may introduce artifacts and is still constrained by DDIM reconstruction quality. To overcome these limitations, we propose proximal guidance and incorporate it to NPI with cross-attention control. We enhance NPI with a regularization term and reconstruction guidance, which reduces artifacts while capitalizing on its training-free nature. Additionally, we extend the concepts to incorporate mutual self-attention control, enabling geometry and layout alterations in the editing process. 
Our method provides an efficient and straightforward approach, effectively addressing real image editing tasks with minimal computational overhead.	翻訳日:2023-07-07 17:10:21 公開日:2023-07-06
# 統計学者としてのトランスフォーマー:コンテキスト内アルゴリズム選択による証明可能なインコンテキスト学習 Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection ( http://arxiv.org/abs/2306.04637v2 ) ライセンス: Link先を確認	Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, Song Mei	(参考訳) トランスフォーマーアーキテクチャに基づくニューラルシーケンスモデルは、訓練例とテスト例をプロンプトとして与えるだけで、モデルのパラメータを更新することなく新たなタスクを実行できるという、注目すべきインコンテキスト学習(ICL)能力を示している。本研究はまず、トランスフォーマーがICLを実行するための包括的な統計理論を提供する。具体的には、トランスフォーマーが最小二乗法、リッジ回帰、Lasso、一般化線形モデルの学習、二層ニューラルネットワーク上の勾配降下法など、幅広い標準的な機械学習アルゴリズムをコンテキスト内で実装でき、様々なコンテキスト内データ分布に対してほぼ最適な予測力を持つことを示す。コンテキスト内勾配降下法の効率的な実装を基礎的な機構として用いることで、我々のトランスフォーマーの構成は緩やかなサイズ上界を持ち、多項式個の事前学習シーケンスで学習できる。これらの「基本」ICLアルゴリズムに基づいて、興味深いことに、トランスフォーマーがコンテキスト内アルゴリズム選択を伴うより複雑なICL手順を実装できることを示す。これは統計学者が実際に行うことに似ており、単一のトランスフォーマーが、正しいアルゴリズムやタスクを明示的にプロンプトで指定されることなく、異なる入力シーケンスに対して異なる基本ICLアルゴリズムを適応的に選択し、さらには質的に異なるタスクを実行することができる。我々は,この現象を明示的な構成によって理論的に確立し,実験的にも観察する。理論的には,ICL前テストとICL後検証という2つの一般的なアルゴリズム選択機構を具体例とともに構築する。例えば、ICL後検証機構を用いて、ノイズレベルが混在する雑音付き線形モデルという難しいタスクにおいて、ほぼベイズ最適なICLを実行できるトランスフォーマーを構築する。実験により,標準的なトランスフォーマーアーキテクチャの強力なコンテキスト内アルゴリズム選択能力を示す。 Neural sequence models based on the transformer architecture have demonstrated remarkable \emph{in-context learning} (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences. 
Building on these ``base'' ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving \emph{in-context algorithm selection}, akin to what a statistician can do in real life -- A \emph{single} transformer can adaptively select different base ICL algorithms -- or even perform qualitatively different tasks -- on different input sequences, without any explicit prompting of the right algorithm or task. We both establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: pre-ICL testing, and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task -- noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.	翻訳日:2023-07-07 17:10:01 公開日:2023-07-06
# バナッハ空間の帰納系上のダイナミクスの収束 Convergence of Dynamics on Inductive Systems of Banach Spaces ( http://arxiv.org/abs/2306.16063v2 ) ライセンス: Link先を確認	Lauritz van Luijk, Alexander Stottmeister, Reinhard F. Werner	(参考訳) 物理系の多くの特徴は、定性的なものも定量的なものも、ある極限的な状況でのみ明確に定義されるか、扱いやすくなる。例えば、熱力学極限における相転移、大きな作用における量子論からの古典力学の出現、繰り込み群の固定点から生じる連続体の場の量子論である。このように多様な応用に有用な方法はほとんどないように思われる。しかし、ここでは理論の極限に対する柔軟なモデリングツールを提示する。それは、バナッハ空間の帰納的極限の一般化であるソフト帰納的極限である。この文脈で、ダイナミクスの収束に関する一般的な基準を定式化し、これらの基準が上述の状況などに適用できることを示す。 Many features of physical systems, both qualitative and quantitative, become sharply defined or tractable only in some limiting situation. Examples are phase transitions in the thermodynamic limit, the emergence of classical mechanics from quantum theory at large action, and continuum quantum field theory arising from renormalization group fixed points. It would seem that few methods can be useful in such diverse applications. However, we here present a flexible modeling tool for the limit of theories: soft inductive limits constituting a generalization of inductive limits of Banach spaces. In this context, general criteria for the convergence of dynamics will be formulated, and these criteria will be shown to apply in the situations mentioned and more.	翻訳日:2023-07-07 17:03:14 公開日:2023-07-06
# オープンボキャブラリ学習に向けて:調査 Towards Open Vocabulary Learning: A Survey ( http://arxiv.org/abs/2306.15880v2 ) ライセンス: Link先を確認	Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao	(参考訳) 視覚シーン理解の分野では、ディープニューラルネットワークはセグメンテーション、トラッキング、検出など、さまざまなコアタスクにおいて目覚ましい進歩を遂げている。しかし、ほとんどのアプローチはクローズドセットの仮定に基づいており、トレーニングセットに存在する事前定義されたカテゴリしか識別できない。近年、視覚言語事前学習の急速な進歩により、オープン語彙設定が提案されている。これらの新しいアプローチは、注釈付きラベル空間を超えたカテゴリを検出し、認識することを目指している。オープン語彙のアプローチは、弱教師付きおよびゼロショット設定に比べて、より一般的で実用的で効果的である。本稿では,この分野における最近の発展を要約・分析し,オープン語彙学習の徹底的なレビューを行う。特に,ゼロショット学習,オープンセット認識,分布外検出といった関連する概念との比較から始める。次に, セグメンテーションと検出に関して, ロングテール問題, 少数ショット設定, ゼロショット設定など, 密接に関連するタスクをいくつか概観する。手法のサーベイでは,まず,予備知識としてクローズドセットにおける検出とセグメンテーションの基礎知識を提示する。次に,オープン語彙学習が用いられる様々なシナリオを検討し,共通の設計要素と中核となるアイデアを特定する。次に、一般的なデータセットとベンチマークにおいて、最近の検出およびセグメンテーション手法を比較する。最後に,今後の研究方向性に関する洞察,課題,議論をまとめる。私たちの知る限り、オープン語彙学習に関する包括的な文献レビューはこれが初めてである。関連研究は https://github.com/jianzongwu/Awesome-Open-Vocabulary で継続的に追跡している。 In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective compared to weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. 
Then, we review several closely related tasks in the case of segmentation and detection, including long-tail problems, few-shot, and zero-shot settings. For the method survey, we first present the basic knowledge of detection and segmentation in close-set as the preliminary knowledge. Next, we examine various scenarios in which open vocabulary learning is used, identifying common design elements and core ideas. Then, we compare the recent detection and segmentation approaches in commonly used datasets and benchmarks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To our knowledge, this is the first comprehensive literature review of open vocabulary learning. We keep tracing related works at https://github.com/jianzongwu/Awesome-Open-Vocabulary.	翻訳日:2023-07-07 17:03:03 公開日:2023-07-06
# UTRNet: 印刷文書における高解像度ウルドゥー語テキスト認識 UTRNet: High-Resolution Urdu Text Recognition In Printed Documents ( http://arxiv.org/abs/2306.15782v2 ) ライセンス: Link先を確認	Abdur Rahman, Arjun Ghosh, and Chetan Arora	(参考訳) 本稿では,高解像度・マルチスケールな意味的特徴抽出を用いて,印刷されたウルドゥー語テキスト認識の課題に対処する新しい手法を提案する。提案するハイブリッドCNN-RNNモデルであるUTRNetアーキテクチャは,ベンチマークデータセット上で最先端の性能を示す。ウルドゥー文字の複雑さへの汎化に苦戦し,十分な注釈付き実世界データを欠いていた先行研究の限界に対処するために,我々は,11,000行以上からなる大規模な注釈付き実世界データセットUTRSet-Realと,実世界のデータによく似た2万行の合成データセットUTRSet-Synthを導入し,さらに既存のIIITHデータセットの正解ラベルを修正して,将来の研究のためのより信頼性の高いリソースとした。また、スキャンした文書におけるウルドゥー語テキスト行検出のためのベンチマークデータセットであるUrduDocも提供する。さらに,UTRNetをテキスト検出モデルと統合することにより,印刷文書からのエンドツーエンドのウルドゥー語OCRのためのオンラインツールを開発した。我々の研究は、現在のウルドゥー語OCRの限界に対処するだけでなく、この領域における今後の研究の道を開き、ウルドゥー語OCR技術の継続的な進歩を促進する。ソースコード、データセット、アノテーション、訓練済みモデル、オンラインツールを備えたプロジェクトページは、abdur75648.github.io/UTRNet で入手できる。 In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of previous works, which struggle to generalize to the intricacies of the Urdu script and the lack of sufficient annotated real-world data, we have introduced the UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines and UTRSet-Synth, a synthetic dataset with 20,000 lines closely resembling real-world and made corrections to the ground truth of the existing IIITH dataset, making it a more reliable resource for future research. We also provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR from printed documents by integrating UTRNet with a text detection model. 
Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research in this area and facilitates the continued advancement of Urdu OCR technology. The project page with source code, datasets, annotations, trained models, and online tool is available at abdur75648.github.io/UTRNet.	翻訳日:2023-07-07 17:02:41 公開日:2023-07-06
# 注意型生成型adversarial networkのための単純かつ効果的なベースライン A Simple and Effective Baseline for Attentional Generative Adversarial Networks ( http://arxiv.org/abs/2306.14708v2 ) ライセンス: Link先を確認	Mingyu Jin, Chong Zhang, Qinkai Yu, Haochen Xue, Xiaobo Jin, Xi Yang	(参考訳) テキスト記述を通じて生成モデルを導くことで高品質画像のテキスト対画像モデルを合成することは、革新的で挑戦的なタスクである。近年,GANトレーニングをガイドするアテンション機構に基づくAttnGAN,ジェネレータの性能と画像生成の質を向上させる自己蒸留技術を採用したSD-GAN,複数のジェネレータと識別器を積み重ねることで画像の細部と品質を徐々に改善するStack-GAN++などが提案されている。しかし、この一連のGANの改善は、いずれもある程度の冗長性を持ち、世代性能と複雑性にある程度影響を及ぼす。我々は,一般的なシンプルで効果的なアイデアを用いて,(1) AttnGANの冗長構造を除去してバックボーンネットワークを改善し,(2) DAMSMの複数損失を統合・再構築する。モデルサイズとトレーニング効率を大幅に改善するとともに,モデルの性能が変化しないことを保証し,最終的にSEAttnGANを提案する。コードはhttps://github.com/jmyissb/SEAttnGANで入手可能である。 Synthesising a text-to-image model of high-quality images by guiding the generative model through the Text description is an innovative and challenging task. In recent years, AttnGAN based on the Attention mechanism to guide GAN training has been proposed, SD-GAN, which adopts a self-distillation technique to improve the performance of the generator and the quality of image generation, and Stack-GAN++, which gradually improves the details and quality of the image by stacking multiple generators and discriminators. However, this series of improvements to GAN all have redundancy to a certain extent, which affects the generation performance and complexity to a certain extent. We use the popular simple and effective idea (1) to remove redundancy structure and improve the backbone network of AttnGAN. (2) to integrate and reconstruct multiple losses of DAMSM. Our improvements have significantly improved the model size and training efficiency while ensuring that the model's performance is unchanged and finally proposed our SEAttnGAN. Code is available at https://github.com/jmyissb/SEAttnGAN.	翻訳日:2023-07-07 17:01:58 公開日:2023-07-06
# 歌声変換チャレンジ2023 The Singing Voice Conversion Challenge 2023 ( http://arxiv.org/abs/2306.14422v2 ) ライセンス: Link先を確認	Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda	(参考訳) 本稿では,共通データセットに基づく異なる音声変換(VC)システムの比較と理解を目的とした,二年制の科学イベントであるVCCシリーズの最新版を紹介する。今年はsvc(singing voice conversion challenge)に焦点を移し、the challenge the singing voice conversion challenge(svcc)と命名しました。新しいデータベースはドメイン内およびドメイン間SVCという2つのタスクのために構築された。チャレンジは2ヶ月間実施され、合計26の応募があり、2つのベースラインがありました。クラウドソースによる大規模なリスニングテストを通じて,人間レベルの自然性はトップシステムによって達成されたが,目標とする話者ほど高い類似度スコアを得ることはできなかった。また、予想通り、ドメイン間SVCは、特に類似性の観点から、ドメイン内SVCよりも難しい。また,既存の客観的測定値が知覚的パフォーマンスを予測できたかを調査し,有意な相関が得られたのはごくわずかであった。 We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual scientific event aiming to compare and understand different voice conversion (VC) systems based on a common dataset. This year we shifted our focus to singing voice conversion (SVC), thus named the challenge the Singing Voice Conversion Challenge (SVCC). A new database was constructed for two tasks, namely in-domain and cross-domain SVC. The challenge was run for two months, and in total we received 26 submissions, including 2 baselines. Through a large-scale crowd-sourced listening test, we observed that for both tasks, although human-level naturalness was achieved by the top system, no team was able to obtain a similarity score as high as the target speakers. Also, as expected, cross-domain SVC is harder than in-domain SVC, especially in the similarity aspect. We also investigated whether existing objective measurements were able to predict perceptual performance, and found that only few of them could reach a significant correlation.	翻訳日:2023-07-07 17:01:38 公開日:2023-07-06
# 大規模言語モデルによる中国のきめ細かな金融感情分析 Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models ( http://arxiv.org/abs/2306.14096v2 ) ライセンス: Link先を確認	Yinyu Lan, Yanru Wu, Wang Xu, Weiqiang Feng, Youhao Zhang	(参考訳) 金融ドメインにおけるエンティティレベルのきめ細かい感情分析は、感情分析の重要なサブタスクであり、現在多くの課題に直面している。主な課題は、財務的なテキスト感情分析用に特別に設計された高品質で大規模な注釈付きコーパスが欠如していることであり、それによって効果的なテキスト処理技術を開発するために必要なデータの利用が制限される。大規模言語モデル(llm)の最近の進歩は、自然言語処理タスクにおいて、主に言語パターンマッチングを中心に顕著なパフォーマンスをもたらした。本稿では,企業早期警戒のための中国における財務感情分析データセットFinChina SAを提案する。我々のデータセットを用いて、よく知られたオープンソースのLCMを徹底的に評価し、実験した。我々は、我々のデータセットが、将来の研究の焦点となる実世界の財務感情分析タスクの探索を進めるための貴重なリソースとなると強く信じている。私たちのデータセットと実験結果を複製するすべてのコードがリリースされます。 Entity-level fine-grained sentiment analysis in the financial domain is a crucial subtask of sentiment analysis and currently faces numerous challenges. The primary challenge stems from the lack of high-quality and large-scale annotated corpora specifically designed for financial text sentiment analysis, which in turn limits the availability of data necessary for developing effective text processing techniques. Recent advancements in large language models (LLMs) have yielded remarkable performance in natural language processing tasks, primarily centered around language pattern matching. In this paper, we propose a novel and extensive Chinese fine-grained financial sentiment analysis dataset, FinChina SA, for enterprise early warning. We thoroughly evaluate and experiment with well-known existing open-source LLMs using our dataset. We firmly believe that our dataset will serve as a valuable resource to advance the exploration of real-world financial sentiment analysis tasks, which should be the focus of future research. Our dataset and all code to replicate the experimental results will be released.	翻訳日:2023-07-07 17:01:21 公開日:2023-07-06
# 新型コロナウイルスワクチン接種に関するトピックとパブリックスタンスの関係の可視化 Visualizing Relation Between (De)Motivating Topics and Public Stance toward COVID-19 Vaccine ( http://arxiv.org/abs/2306.12118v2 ) ライセンス: Link先を確認	Ashiqur Rahman and Hamed Alhoori	(参考訳) 現代のソーシャルメディアはコミュニケーションにおいて重要な役割を担っているが、誤った情報や荒らしが簡単に会話を引き継ぎ、これらのプラットフォームで世論を操ることができる。新型コロナウイルス(covid-19)パンデミックの際には、公衆衛生当局が国民にワクチン接種の動機付けを図りながら大きな反発を受けた。緊急時の現在および将来の脅威に対処し、共通の目標に向けて国民を動機付けるためには、公共のモチベーションがどのように変化し、どのトピックが一般市民の間で共鳴しているかを理解することが不可欠である。本研究では、新型コロナウイルス(covid-19)パンデミック時にtwitter圏内で共鳴した話題を検査・分析し、公衆の予防接種に対するスタンスを変えた重要な要因を理解するためのインタラクティブな可視化ツールを提案する。このツールは、視覚分析のあらゆるシナリオに対して容易に一般化することができ、研究者や一般大衆のソーシャルメディアデータの透明性を高めることができる。 While social media plays a vital role in communication nowadays, misinformation and trolls can easily take over the conversation and steer public opinion on these platforms. We saw the effect of misinformation during the COVID-19 pandemic when public health officials faced significant push-back while trying to motivate the public to vaccinate. To tackle the current and any future threats in emergencies and motivate the public towards a common goal, it is essential to understand how public motivation shifts and which topics resonate among the general population. In this study, we proposed an interactive visualization tool to inspect and analyze the topics that resonated among Twitter-sphere during the COVID-19 pandemic and understand the key factors that shifted public stance for vaccination. This tool can easily be generalized for any scenario for visual analysis and to increase the transparency of social media data for researchers and the general population alike.	翻訳日:2023-07-07 17:01:04 公開日:2023-07-06
# 熱2次元混合スピン1/2系の幾何学的位相 Geometric phases for a thermal two-dimensional mixed spin 1/2 system ( http://arxiv.org/abs/2306.11752v3 ) ライセンス: Link先を確認	Y. Ben-Aryeh	(参考訳) 混合状態に対する幾何位相を得るための量子力学的手法を解析する。純粋状態に対する並列輸送方程式は、動的位相を排除した混合状態に一般化される。混合状態の幾何学的位相はパンチャラトナム相として得られ、これは開サイクルにも有効である。幾何相は、NMRや中性子干渉実験で用いられるものと異なる混合熱状態のSU(2)変換によって引き起こされる。ゼロ次ハミルトニアンは、z方向における磁気モーメントと定磁場の相互作用によって与えられるが、本論文で想定される高次摂動は同じz方向の2つの振動磁場からなる。これらの仮定は、幾何相および干渉強度に関する結果が導出される混合熱状態のSU(2)ユニタリ変換の特別な形式をもたらす。 Quantum mechanical methods for getting geometric phases for mixed states are analyzed. Parallel transport equations for pure states are generalized to mixed states by which dynamical phases are eliminated. The geometric phases of mixed states are obtained as Pancharatnam phases which are valid also for open cycles. The geometric phases are derived here by SU(2) transformations of mixed thermal states which are different from those used in NMR and neutron interference experiments. While the zeroth order Hamiltonian is given by the interaction of a magnetic moment and constant magnetic field in the z direction, the high order perturbations assumed in the present article are composed of two oscillating magnetic fields in the same z direction. These assumptions lead to a special form of the SU(2) unitary transformation of the mixed thermal states by which results for geometric phase and for interference intensity are derived.	翻訳日:2023-07-07 17:00:45 公開日:2023-07-06
# ベクトル探索のための共設計ハードウェアとアルゴリズム Co-design Hardware and Algorithm for Vector Search ( http://arxiv.org/abs/2306.11182v3 ) ライセンス: Link先を確認	Wenqi Jiang and Shigang Li and Yu Zhu and Johannes de Fine Licht and Zhenhao He and Runbin Shi and Cedric Renggli and Shuai Zhang and Theodoros Rekatsinas and Torsten Hoefler and Gustavo Alonso	(参考訳) ベクトル検索は大規模な情報検索と機械学習システムの基盤として現れ、GoogleやBingといった検索エンジンは、エンコードされたクエリテキストとWebドキュメント間のベクトル類似性を評価することによって、ペタバイト規模のドキュメントデータセットで毎秒数万のクエリを処理する。ベクトル探索システムの性能要求が急増するにつれて、加速ハードウェアはポストムーアの法則時代において有望な解決策を提供する。 FPGA上のエンドツーエンドでスケーラブルなベクトル検索フレームワークである FANNS を紹介する。データセットとハードウェアリソースの予算に関するユーザが提供するリコール要求を前提として、 FANNSは自動的にハードウェアとアルゴリズムを設計し、それに対応するアクセラレータを生成する。このフレームワークは、ハードウェアTCP/IPスタックをアクセラレータに組み込むことでスケールアウトもサポートする。 FANNSはFPGAとCPUのベースラインと比較してそれぞれ最大23.0倍と37.2倍の高速化を達成し、GPUに対する優れたスケーラビリティを示し、8アクセラレータ構成において中央値レイテンシで5.5倍、95パーセンタイル(P95)レイテンシで7.6倍の高速化を達成する。 FANNS の顕著な性能は、データセンターとAIスーパーコンピュータにおける将来のFPGA統合の堅牢な基盤となる。 Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce FANNS, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, FANNS automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. 
FANNS attains up to 23.0× and 37.2× speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5× and 7.6× speedup in median and 95th percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of FANNS lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.	翻訳日:2023-07-07 17:00:33 公開日:2023-07-06
# Image Matters:マルチモーダルハイパボラ検出のための新しいデータセットと実証的研究 Image Matters: A New Dataset and Empirical Study for Multimodal Hyperbole Detection ( http://arxiv.org/abs/2307.00209v2 ) ライセンス: Link先を確認	Huixuan Zhang, Xiaojun Wan	(参考訳) 誇張(Hyperbole)または誇張(exaggeration)は、一般的な言語現象である。ハイパボールの発見は、人間の表現を理解する重要な部分である。ハイパボラ検出の研究はいくつかあるが、そのほとんどはテキストのモダリティのみに焦点を当てている。しかし、ソーシャルメディアの発展によって、テキスト、画像、ビデオなど、さまざまなモダリティを持った双曲表現が作成できるようになる。本稿では,マルチモーダルハイパーボイル検出に注目する。weibo(中国のソーシャルメディア)からマルチモーダル検出データセット(コミュニティに公開予定)を作成し、それに関するいくつかの研究を行った。 weiboの一部のテキストと画像を2つのモダリティとして扱い,ハイパーボイル検出におけるテキストと画像の役割について検討する。このダウンストリームタスクでは、さまざまなプリトレーニングされたマルチモーダルエンコーダも評価され、パフォーマンスを示している。さらに、このデータセットは5つの異なるトピックから構築されているため、異なるモデルのクロスドメイン性能も評価する。これらの研究は、ベンチマークとして機能し、マルチモーダルハイパーボイル検出に関するさらなる研究の方向性を指摘することができる。 Hyperbole, or exaggeration, is a common linguistic phenomenon. The detection of hyperbole is an important part of understanding human expression. There have been several studies on hyperbole detection, but most of them focus on text modality only. However, with the development of social media, people can create hyperbolic expressions with various modalities, including text, images, videos, etc. In this paper, we focus on multimodal hyperbole detection. We create a multimodal detection dataset (to be released to the community) from Weibo (a Chinese social media platform) and carry out some studies on it. We treat the text and image from a Weibo post as two modalities and explore the role of text and image for hyperbole detection. Different pre-trained multimodal encoders are also evaluated on this downstream task to show their performance. Besides, since this dataset is constructed from five different topics, we also evaluate the cross-domain performance of different models. These studies can serve as a benchmark and point out the direction of further study on multimodal hyperbole detection.	翻訳日:2023-07-07 16:51:50 公開日:2023-07-06
# バイオメディカル言語モデルは準最適トークン化にロバストである Biomedical Language Models are Robust to Sub-optimal Tokenization ( http://arxiv.org/abs/2306.17649v2 ) ライセンス: Link先を確認	Bernal Jiménez Gutiérrez, Huan Sun, Yu Su	(参考訳) 一般英語とは対照的に、バイオメディカル用語学の多くの概念は、正確で簡潔なことを目標として、近年のバイオメディカル専門家によって設計された。これはしばしば、意味のある生体形態を結合して新しい意味単位を作成することで達成される。しかしながら、現代のほとんどのバイオメディカル言語モデル(LM)は、バイオメディカル言語の凝集特性を明示的に活用することなく、大規模バイオメディカルコーパス統計から派生した標準ドメイン固有のトークン化剤を用いて事前訓練されている。本研究では,バイオメディカルな用語を意味のある構成要素に分割できない標準オープンドメインとバイオメディカルなトークン化剤について述べる。そこで, バイオメディカル用語をより正確に区分するトークン化装置を用いることで, 下流のバイオメディカルNLPタスク, 特に名前付きエンティティ認識(NER)やエンティティリンクなどのバイオメディカル用語を直接含むタスクにおいて, バイオメディカルLMの性能を向上させることができると仮定した。驚くべきことに、より正確なバイオメディカルトークンを使用して生体医学的lmを事前トレーニングすることは、マスク言語モデリング予測(mlm)の精度やnerおよびエンティティリンクのパフォーマンスといったいくつかの本質的および極端的な尺度で測定されるように、言語モデルのエンティティ表現品質を改善するものではない。これらの定量的研究は、実体表現の質をより直接的に探求するケーススタディとともに、生物医学的な事前学習プロセスが準最適トークン化の事例に対して非常に堅牢であることを示している。 As opposed to general English, many concepts in biomedical terminology have been designed in recent history by biomedical professionals with the goal of being precise and concise. This is often achieved by concatenating meaningful biomedical morphemes to create new semantic units. Nevertheless, most modern biomedical language models (LMs) are pre-trained using standard domain-specific tokenizers derived from large scale biomedical corpus statistics without explicitly leveraging the agglutinating nature of biomedical language. In this work, we first find that standard open-domain and biomedical tokenizers are largely unable to segment biomedical terms into meaningful components. Therefore, we hypothesize that using a tokenizer which segments biomedical terminology more accurately would enable biomedical LMs to improve their performance on downstream biomedical NLP tasks, especially ones which involve biomedical terms directly such as named entity recognition (NER) and entity linking. 
Surprisingly, we find that pre-training a biomedical LM using a more accurate biomedical tokenizer does not improve the entity representation quality of a language model as measured by several intrinsic and extrinsic measures such as masked language modeling prediction (MLM) accuracy as well as NER and entity linking performance. These quantitative findings, along with a case study which explores entity representation quality more directly, suggest that the biomedical pre-training process is quite robust to instances of sub-optimal tokenization.	翻訳日:2023-07-07 16:51:35 公開日:2023-07-06
# 古典的および量子力学による原子表面散乱の物理過程の比較 Comparison of physical processes of atom-surface scattering computed by classical and quantum dynamics ( http://arxiv.org/abs/2306.17483v2 ) ライセンス: Link先を確認	Tapas Sahoo and Eli Pollak	(参考訳) 我々は,原子表面散乱の物理過程,例えばトラップ確率と平均エネルギー損失,腐食した熱表面から散乱した粒子の最終的な角分布の動的量を計算するために,古典的および量子力学シミュレーションを行った。ここでは、垂直距離 z と水平座標 x の2つの自由度しか考慮しなくてよいように、平面内散乱に自分自身を制限した。さらに, 表面の熱フォノン浴との相互作用により垂直座標のみが変動することが仮定された。初期位相 - 量子力学の初期波動関数から導かれたウィグナー分布関数に従って, 系の空間変数と古典的シミュレーションのための浴が生成される。非常に低い入射エネルギーでは、脱着した粒子と熱表面の量子力学的平均エネルギー損失は、特定の表面温度において古典的な粒子よりも小さいことが判明した。古典シミュレーションにより得られた散乱粒子の脱出確率は表面温度の増加とともに増加することに留意する必要がある。一方、量子速度は粒子の入射エネルギー2 meVでほぼ温度に依存し、古典的な結果と5 meVで同じ傾向を示し、量子速度は古典的な速度よりも低い。また、古典的だけでなく量子力学においても散乱粒子の最終的な角分布が定性的に異なるが、その量は多かれ少なかれ温度に依存しない。 We have performed classical and quantum dynamical simulations to calculate dynamical quantities for physical processes of atom - surface scattering, e.g., trapping probability and average energy loss, final angular distribution of a particle scattered from a corrugated thermal surface. Here we have restricted ourselves to in-plane scattering so that only two degrees of freedom of the particle have to be considered - the vertical distance z and the horizontal coordinate x. Moreover, we assumed further that only the vertical coordinate fluctuates due to interaction with thermal phonon bath of the surface. Initial phase - space variables of the system and the bath for our classical simulations were generated according to Wigner distribution functions which were derived from initial wavefunctions of our quantum dynamics. At very low incident energy, we have found that the quantum mechanical average energy loss of the escaped particle from the corrugated as well as thermal surface are smaller than the classical ones at a particular surface temperature. It is important to note that the rate of escaping probability of the scattered particle obtained by classical simulation increases with increasing surface temperature. 
On the other hand, quantum rate is almost temperature independent at 2 meV incident energy of the particle, whereas it shows same trend with the classical results at 5 meV and the quantum rate is lower than the classical rate. We have also noticed that the final angular distributions of the scattered particle both for classical as well as quantum dynamics are qualitatively different but the quantities are more or less temperature independent.	翻訳日:2023-07-07 16:51:06 公開日:2023-07-06
# 情報的非平衡の動的資源理論 The Dynamical Resource Theory of Informational Non-Equilibrium ( http://arxiv.org/abs/2306.16848v2 ) ライセンス: Link先を確認	Benjamin Stratton, Chung-Yun Hsieh, Paul Skrzypczyk	(参考訳) 情報は熱力学の理解に欠かせない。彼らの相互作用は、熱力学変換への情報的貢献を分離できる完全に縮退したハミルトニアンを通じて研究されている。この設定では、最大混合状態以外の全ての状態は情報非平衡状態であると考えられる。情報的非平衡を維持するために量子力学の能力をどのように特徴付けるか? ここでは, 情報的非平衡保存可能性に関する動的資源理論を導入し, この問いへの答えを述べる。許容される演算のキャラクタリゼーションは、キュービットチャネルとn次元ワイル共変チャネル(一般チャネルの物理的関連部分集合)に対して与えられる。ベル状態測定を伴う状態識別ゲームの操作解釈が与えられる。最後に、チャネルの古典的容量と情報非平衡を維持する能力との明示的なリンクを作る。 Information is instrumental in our understanding of thermodynamics. Their interplay has been studied through completely degenerate Hamiltonians whereby the informational contributions to thermodynamic transformations can be isolated. In this setting, all states other than the maximally mixed state are considered to be in informational non-equilibrium. An important yet still open question is: how to characterise the ability of quantum dynamics to maintain informational non-equilibrium? Here, the dynamical resource theory of informational non-equilibrium preservability is introduced to begin providing an answer to this question. A characterisation of the allowed operations is given for qubit channels and the n-dimensional Weyl-covariant channels - a physically relevant subset of the general channels. An operational interpretation of a state discrimination game with Bell state measurements is given. Finally, an explicit link between a channel's classical capacity and its ability to maintain informational non-equilibrium is made.	翻訳日:2023-07-07 16:50:42 公開日:2023-07-06
# KITE:セマンティックマニピュレーションのためのキーポイント型ポリシー KITE: Keypoint-Conditioned Policies for Semantic Manipulation ( http://arxiv.org/abs/2306.16605v2 ) ライセンス: Link先を確認	Priya Sundaresan, Suneel Belkhale, Dorsa Sadigh, Jeannette Bohg	(参考訳) 自然言語は人間とロボットに便利な共有インターフェースを提供するが、ロボットが言語コマンドを解釈し従わせることは、操作において長年の課題である。動作指示追従ロボットを実現するための重要なステップは、ロボットが「ぬいぐるみを拾い上げる」といった高レベルな指示から「象の左耳を磨く」といったより詳細な入力まで、異なる特異性で言語を解釈する意味操作を実現することである。そこで我々は,シーンセマンティクス(視覚的場面における異なるオブジェクトの識別)とオブジェクトセマンティクス(正確にはオブジェクトインスタンス内の異なる部分のローカライズ)の両方に対応する意味操作のための2段階のフレームワークであるKeypoints + Instructions to Execution (KITE)を提案する。 KITEは、まず2次元画像キーポイントを通して視覚シーンに入力命令を接地し、下流アクション推論のための高精度なオブジェクト中心バイアスを提供する。 KITEはRGB-Dシーンの観察を行い、学習されたキーポイント条件のスキルを実行して命令を実行する。キーポイントの精度とパラメータ化スキルを組み合わせることで、シーンやオブジェクトのバリエーションを一般化したきめ細かい操作が可能になる。実世界の3つの環境 – 長距離6-DoFテーブルトップ操作,意味的把握,高精度コーヒー製造タスク – において,KITEを実証した。これらの設定では、KITEはそれぞれ75%、70%、全体の71%の成功率を達成している。 KITEは、キーポイントベースのグラウンドよりも事前訓練されたビジュアル言語モデルを選択するフレームワークや、エンドツーエンドのビジュモータコントロールを優先して省略スキルを向上する。追加資料、データセット、コード、ビデオは、私たちのWebサイトにある。 While natural language offers a convenient shared interface for humans and robots, enabling robots to interpret and follow language commands remains a longstanding challenge in manipulation. A crucial step to realizing a performant instruction-following robot is achieving semantic manipulation, where a robot interprets language at different specificities, from high-level instructions like "Pick up the stuffed animal" to more detailed inputs like "Grab the left ear of the elephant." To tackle this, we propose Keypoints + Instructions to Execution (KITE), a two-step framework for semantic manipulation which attends to both scene semantics (distinguishing between different objects in a visual scene) and object semantics (precisely localizing different parts within an object instance). KITE first grounds an input instruction in a visual scene through 2D image keypoints, providing a highly accurate object-centric bias for downstream action inference. 
Provided an RGB-D scene observation, KITE then executes a learned keypoint-conditioned skill to carry out the instruction. The combined precision of keypoints and parameterized skills enables fine-grained manipulation with generalization to scene and object variations. Empirically, we demonstrate KITE in 3 real-world environments: long-horizon 6-DoF tabletop manipulation, semantic grasping, and a high-precision coffee-making task. In these settings, KITE achieves a 75%, 70%, and 71% overall success rate for instruction-following, respectively. KITE outperforms frameworks that opt for pre-trained visual language models over keypoint-based grounding, or omit skills in favor of end-to-end visuomotor control, all while being trained from fewer or comparable amounts of demonstrations. Supplementary material, datasets, code, and videos can be found on our website: http://tinyurl.com/kite-site.	翻訳日:2023-07-07 16:50:30 公開日:2023-07-06
# Pareto Optimal Self-supervisionによるLCM校正と幻覚自動検出 LLM Calibration and Automatic Hallucination Detection via Pareto Optimal Self-supervision ( http://arxiv.org/abs/2306.16564v2 ) ライセンス: Link先を確認	Theodore Zhao, Mu Wei, J. Samuel Preston, Hoifung Poon	(参考訳) 大規模言語モデル (LLM) は、広範囲の応用において目覚ましい能力を示してきたが、精度は依然として大きな成長領域であり、特にバイオメディシンのようなミッションクリティカルな領域では顕著である。 LLM応答に対する信頼度を校正する効果的な方法は、エラーを自動的に検出し、ループ内検証を容易にするために不可欠である。キャリブレーション信号の重要な源は、低コストで利用可能であるが、ノイズやカバレッジといった独自の制限がある、専門家によるプログラム的監督にある。本稿では,利用可能なプログラム的監督を活用し,追加の手動作業なしに,各応答に対するリスクスコアを作成することで,llm応答を体系的に校正することができるparetoの最適自己スーパービジョンフレームワークを提案する。これは、より不確実なLSM応答により高いリスクスコアを割り当て、エラー修正を容易にする、他の利用可能な監視源とLLM出力を一致させるハーモニザモデルを学ぶことで達成される。生体医学領域および一般領域における標準関係抽出タスクの実験により,本手法の有効性が示され,本手法のリスクスコアはllmsの実誤差率と高い相関を示した。最も不確実なテスト例では,提案したリスクスコアに基づく動的プロンプトにより,既製のLCMの精度が大幅に向上し,SOTA(State-of-the-art)の監督が弱く,SOTAの監督が難しい評価データセットにGPT-4の結果が及んだ。 Large language models (LLMs) have demonstrated remarkable capabilities out of box for a wide range of applications, yet accuracy still remains a major growth area, especially in mission-critical domains such as biomedicine. An effective method to calibrate the confidence level on LLM responses is essential to automatically detect errors and facilitate human-in-the-loop verification. An important source of calibration signals stems from expert-stipulated programmatic supervision, which is often available at low cost but has its own limitations such as noise and coverage. In this paper, we introduce a Pareto optimal self-supervision framework that can leverage available programmatic supervision to systematically calibrate LLM responses by producing a risk score for every response, without any additional manual efforts. This is accomplished by learning a harmonizer model to align LLM output with other available supervision sources, which would assign higher risk scores to more uncertain LLM responses and facilitate error correction. 
Experiments on standard relation extraction tasks in biomedical and general domains demonstrate the promise of this approach, with our proposed risk scores highly correlated with the real error rate of LLMs. For the most uncertain test instances, dynamic prompting based on our proposed risk scores results in significant accuracy improvement for off-the-shelf LLMs, boosting GPT-3 results past state-of-the-art (SOTA) weak supervision and GPT-4 results past SOTA supervised results on challenging evaluation datasets.	翻訳日:2023-07-07 16:49:53 公開日:2023-07-06
# ランダムおよび自然言語文によるストラー数の統計力学 Statistical Mechanics of Strahler Number via Random and Natural Language Sentences ( http://arxiv.org/abs/2307.02697v1 ) ライセンス: Link先を確認	Kumiko Tanaka-Ishii and Akira Tanaka	(参考訳) ストラー数は当初、河川分岐の複雑さを特徴付けるために提案され、様々な応用を見出した。本稿では,統計的力学解析が可能な大規模データセットで利用可能な自然言語文木構造に対するストラー数の上・下限の計算を提案する。文法的に注釈付けされたデータにわたる経験的な測定により、自然言語文のストラー数は、Strahler (1957) と Horton (1945) が報告した河川分岐の場合と同様に、ほぼ常に 3 または 4 であることが示された。数字の背景にある理論から、ストラー数が特定のモデルの下で文を処理するのに必要なメモリ量の下限であることを示す。乱数木の数学的解析は、ストラー数の性質に関するさらなる予想を与え、それが定数ではなく対数的に成長することを示す。この発見は、一般的な木構造ターゲットの特徴として、ストラー数の背後にある統計的基礎を明らかにする。 The Strahler number was originally proposed to characterize the complexity of river bifurcation and has found various applications. This article proposes computation of the Strahler number's upper and lower limits for natural language sentence tree structures, which are available in a large dataset allowing for statistical mechanics analysis. Through empirical measurements across grammatically annotated data, the Strahler number of natural language sentences is shown to be almost always 3 or 4, similar to the case of river bifurcation as reported by Strahler (1957) and Horton (1945). From the theory behind the number, we show that it is the lower limit of the amount of memory required to process sentences under a particular model. A mathematical analysis of random trees provides a further conjecture on the nature of the Strahler number, revealing that it is not a constant but grows logarithmically. This finding uncovers the statistical basics behind the Strahler number as a characteristic of a general tree structure target.	翻訳日:2023-07-07 15:45:09 公開日:2023-07-06
# フェアネスレンズを通して:エンティティマッチングの実験解析と評価 Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching ( http://arxiv.org/abs/2307.02726v1 ) ライセンス: Link先を確認	Nima Shahbazi, Nikola Danevski, Fatemeh Nargesian, Abolfazl Asudeh, Divesh Srivastava	(参考訳) エンティティマッチング(em)は、さまざまなコミュニティが半世紀以上にわたって研究してきた困難な問題である。アルゴリズムの公平さは、マシンバイアスとその社会的影響に対処するためのタイムリーなトピックにもなっている。これら2つのトピックに関する広範な研究にもかかわらず、エンティティマッチングの公平性にはほとんど注意が払われていない。このギャップに対処するため,本論文では様々なem手法を広範囲に実験的に評価する。我々は、公正なレンズを通してEMを監査するために、公開データセットから2つのソーシャルデータセットを生成した。本研究は,実社会における2つの共通条件下での潜在的不公平性を浮き彫りにする。 (i)一部の人口集団が過剰に代表される場合 (ii)他のグループに比べて名前が似ている場合。多くの発見のうち、様々な公平性の定義は、emのクラス不均衡性のため、異なる設定で価値があるが、ポジティブな予測値パリティや真の正のレートパリティといった尺度は、一般に、emの不公平性を明らかにすることができる。 Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on these two topics, little attention has been paid to the fairness of entity matching. Towards addressing this gap, we perform an extensive experimental evaluation of a variety of EM techniques in this paper. We generated two social datasets from publicly available datasets for the purpose of auditing EM through the lens of fairness. Our findings underscore potential unfairness under two common conditions in real-world societies: (i) when some demographic groups are overrepresented, and (ii) when names are more similar in some groups compared to others. Among our many findings, it is noteworthy to mention that while various fairness definitions are valuable for different settings, due to EM's class imbalance nature, measures such as positive predictive value parity and true positive rate parity are, in general, more capable of revealing EM unfairness.	翻訳日:2023-07-07 15:34:08 公開日:2023-07-06
# 知識蒸留によるキーワードスポッティングのためのオンデバイス制約付き自己教師付き音声表現学習 On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation ( http://arxiv.org/abs/2307.02720v1 ) ライセンス: Link先を確認	Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu	(参考訳) 大きな自己教師付きモデルは効果的な機能抽出器であるが、そのアプリケーションはオンデバイス予算の制約やバイアス付きデータセットの収集、特にキーワードスポッティングの下では困難である。そこで我々は,オンデバイスキーワードスポッティングのための知識蒸留に基づく自己教師型音声表現学習(S3RL)アーキテクチャを提案する。提案手法では,より大規模で複雑なモデルから,より小型で軽量なモデルに知識を移すための教師・教師の枠組みを用いて,二重視相互相関蒸留と教師のコードブックを学習対象とした。我々は、社内データセットを用いて、Alexaキーワードスポッティング検出タスクでモデルの性能を評価した。本手法は,通常および騒音条件において異常な性能を示し,オンデバイスリソース制約下で作業しながらキーワードスポッティングタスクの自己教師付きモデル構築における知識蒸留法の有効性を示した。 Large self-supervised models are effective feature extractors, but their application is challenging under on-device budget constraints and biased dataset collection, especially in keyword spotting. To address this, we proposed a knowledge distillation-based self-supervised speech representation learning (S3RL) architecture for on-device keyword spotting. Our approach used a teacher-student framework to transfer knowledge from a larger, more complex model to a smaller, light-weight model using dual-view cross-correlation distillation and the teacher's codebook as learning objectives. We evaluated our model's performance on an Alexa keyword spotting detection task using a 16.6k-hour in-house dataset. Our technique showed exceptional performance in normal and noisy conditions, demonstrating the efficacy of knowledge distillation methods in constructing self-supervised models for keyword spotting tasks while working within on-device resource constraints.	翻訳日:2023-07-07 15:33:53 公開日:2023-07-06
# 不確かさサンプリングを理解する Understanding Uncertainty Sampling ( http://arxiv.org/abs/2307.02719v1 ) ライセンス: Link先を確認	Shang Liu, Xiaocheng Li	(参考訳) 不確実性サンプリングは、現在の予測モデルが不確実であるデータサンプルの注釈を逐次クエリする、一般的なアクティブラーニングアルゴリズムである。しかし、不確実性サンプリングの使用は概ねヒューリスティックである。 (i)特定の損失を受けた特定のタスクに対する「不確実性」の適切な定義についての合意がないこと。 (ii)アルゴリズムを実装するための標準プロトコルを規定する理論的保証はない。例えば、確率勾配降下のような最適化アルゴリズムの枠組みの下で、逐次到着した注釈付きデータをどう扱うか。本研究では,ストリームベースとプールベースの両方のアクティブラーニングの下で不確実性サンプリングアルゴリズムを体系的に検討する。そこで本研究では, 不確実性尺度と元の損失関数に依存する等価損失の概念を提案し, 不確実性サンプリングアルゴリズムが等価損失に対して本質的に最適化することを示す。この観点は、既存の不確実性対策の正当性を2つの側面から検証する。さらに、不確実性尺度を設計するための新しい概念である「loss as uncertainty」を提案する。これは、特徴量が与えられたときの条件付き期待損失を不確実性尺度として用いるというアイデアである。このような不確実性測度は、分類問題と回帰問題の両方をカバーする優れた解析的性質と一般性を有しており、基礎となるモデルと問題の完全な一般性において、ストリームベースとプールベースの設定の両方において不確実性サンプリングアルゴリズムに対する最初の汎化境界を提供することができる。最後に,リスクに敏感な目標と分布的ロバスト性を持つ不確実性サンプリングアルゴリズムのある種の変種間の接続を確立することにより,サンプルサイズが小さい場合の不確実性サンプリングアルゴリズムの利点を部分的に説明できる。 Uncertainty sampling is a prevalent active learning algorithm that queries sequentially the annotations of data samples which the current prediction model is uncertain about. However, the usage of uncertainty sampling has been largely heuristic: (i) There is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) There is no theoretical guarantee that prescribes a standard protocol to implement the algorithm, for example, how to handle the sequentially arrived annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss which depends on the used uncertainty measure and the original loss function and establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. 
The perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new notion for designing uncertainty measures called \textit{loss as uncertainty}. The idea is to use the conditional expected loss given the features as the uncertainty measure. Such an uncertainty measure has nice analytical properties and generality to cover both classification and regression problems, which enable us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in the full generality of the underlying model and problem. Lastly, we establish connections between certain variants of the uncertainty sampling algorithms with risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.	翻訳日:2023-07-07 15:33:36 公開日:2023-07-06
# TL-nvSRAM-CIM: DC-Power Free Restore と Ternary MAC 操作による超高密度3レベル ReRAM-Assisted Computing-in-nvSRAM TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations ( http://arxiv.org/abs/2307.02717v1 ) ライセンス: Link先を確認	Dengfeng Wang, Liukai Xu, Songyuan Liu, zhi Li, Yiming Chen, Weifeng He, Xueqing Li and Yanan Su	(参考訳) 大規模NNのためにチップ上のすべての重量を調節することは、オンチップ容量に制限のあるSRAMベースのコンピューティングインメモリ(SRAM-CIM)にとって、依然として大きな課題である。従来の非揮発性SRAM-CIM(nvSRAM-CIM)は、高効率SRAM-CIMの上に高密度のシングルレベルReRAMを統合することでこの問題に対処し、オフチップメモリアクセスをなくした。しかし、以前のSL-nvSRAM-CIMは、SL-ReRAMの増加と計算効率の制限によりスケーラビリティが低下していた。これらの課題を克服するために、大規模なNNモデルのための超高密度3レベルReRAM支援非揮発性SRAM(TL-nvSRAM-CIM)方式を提案する。クラスタ化されたn-selector-n-ReRAM (cluster-nSnRs) は、DC電力を排除した信頼性の高い重み復元に使用される。さらに、高NN精度を維持しつつ、エネルギー効率のよい三値MAC演算に対して、微分計算方式による三値SRAM-CIM機構を提案する。提案したTL-nvSRAM-CIMは、最先端技術と比較して7.8倍のストレージ密度を実現する。さらに、TL-nvSRAM-CIMはSRAM-CIMとReRAM-CIMのベースライン設計と比較して最大2.9倍、エネルギー効率は1.9倍に向上した。 Accommodating all the weights on-chip for large-scale NNs remains a great challenge for SRAM based computing-in-memory (SRAM-CIM) with limited on-chip capacity. Previous non-volatile SRAM-CIM (nvSRAM-CIM) addresses this issue by integrating high-density single-level ReRAMs on the top of high-efficiency SRAM-CIM for weight storage to eliminate the off-chip memory access. However, previous SL-nvSRAM-CIM suffers from poor scalability for an increased number of SL-ReRAMs and limited computing efficiency. To overcome these challenges, this work proposes an ultra-high-density three-level ReRAMs-assisted computing-in-nonvolatile-SRAM (TL-nvSRAM-CIM) scheme for large NN models. The clustered n-selector-n-ReRAM (cluster-nSnRs) is employed for reliable weight-restore with eliminated DC power. Furthermore, a ternary SRAM-CIM mechanism with differential computing scheme is proposed for energy-efficient ternary MAC operations while preserving high NN accuracy. 
The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density compared with state-of-the-art works. Moreover, TL-nvSRAM-CIM shows up to 2.9x and 1.9x enhanced energy-efficiency compared to the baseline designs of SRAM-CIM and ReRAM-CIM, respectively.	翻訳日:2023-07-07 15:33:09 公開日:2023-07-06
# CFSum:マルチモーダル要約のための粗から細への寄与ネットワーク CFSum: A Coarse-to-Fine Contribution Network for Multimodal Summarization ( http://arxiv.org/abs/2307.02716v1 ) ライセンス: Link先を確認	Min Xiao, Junnan Zhu, Haitao Lin, Yu Zhou, Chengqing Zong	(参考訳) マルチモーダル要約は通常、視覚モダリティの寄与が不明確であるという問題に苦しむ。既存のマルチモーダル要約手法は、視覚的モダリティが有用である適応条件を無視しながら、異なるモダリティの融合方法の設計に重点を置いている。そこで本研究では,要約に対する画像の異なる寄与を考慮するための,マルチモーダル要約のための新しい粗から細への寄与ネットワーク(CFSum)を提案する。まず,無駄な画像の干渉をなくすため,無駄な画像を破棄するプリフィルタモジュールを提案する。次に,有用な画像を正確に利用するために,単語レベルと句レベルという2つのレベルの視覚補完モジュールを提案する。具体的には、画像の寄与を計算し、それを用いてテキストと視覚の両方のモダリティの注意を誘導する。実験の結果、CFSumは標準ベンチマークで複数の強いベースラインを著しく上回っていることがわかった。さらに,有用な画像は,画像中に暗黙的に表現される非視覚的単語の生成にも役立つことを分析により検証した。 Multimodal summarization usually suffers from the problem that the contribution of the visual modality is unclear. Existing multimodal summarization approaches focus on designing the fusion methods of different modalities, while ignoring the adaptive conditions under which visual modalities are useful. Therefore, we propose a novel Coarse-to-Fine contribution network for multimodal Summarization (CFSum) to consider different contributions of images for summarization. First, to eliminate the interference of useless images, we propose a pre-filter module to abandon useless images. Second, to make accurate use of useful images, we propose two levels of visual complement modules, word level and phrase level. Specifically, image contributions are calculated and are adopted to guide the attention of both textual and visual modalities. Experimental results have shown that CFSum significantly outperforms multiple strong baselines on the standard benchmark. Furthermore, the analysis verifies that useful images can even help generate non-visual words which are implicitly represented in the image.	翻訳日:2023-07-07 15:32:42 公開日:2023-07-06
# 多類似性コントラスト学習 Multi-Similarity Contrastive Learning ( http://arxiv.org/abs/2307.02712v1 ) ライセンス: Link先を確認	Emily Mu, John Guttag, Maggie Makar	(参考訳) 類似度計量が与えられたとき、対照的な手法は、類似する例が一つにまとめられ、異なる例が引き離される表現を学ぶ。画像分類からキャプション生成までのタスクの表現を学習するために,コントラスト学習技術が広く利用されている。しかし、既存の対照的な学習アプローチは、異なる類似性関係の可能性を考慮していないため、一般化できない可能性がある。本稿では,複数の類似度指標の監視を共同で活用することにより,一般化可能な埋め込みを学習する新しい多類似性コントラスト損失(MSCon)を提案する。提案手法は,対応する類似性の不確実性に基づいてコントラスト的類似度重み付けを自動的に学習し,不確実なタスクの重みを下げることで,新たなタスクへのドメイン外一般化を改善する。我々は、MSConでトレーニングされたネットワークが、ドメイン内およびドメイン外設定で最先端のベースラインより優れていることを実証的に示す。 Given a similarity metric, contrastive methods learn a representation in which examples that are similar are pushed together and examples that are dissimilar are pulled apart. Contrastive learning techniques have been utilized extensively to learn representations for tasks ranging from image classification to caption generation. However, existing contrastive learning approaches can fail to generalize because they do not take into account the possibility of different similarity relations. In this paper, we propose a novel multi-similarity contrastive loss (MSCon), that learns generalizable embeddings by jointly utilizing supervision from multiple metrics of similarity. Our method automatically learns contrastive similarity weightings based on the uncertainty in the corresponding similarity, down-weighting uncertain tasks and leading to better out-of-domain generalization to new tasks. We show empirically that networks trained with MSCon outperform state-of-the-art baselines on in-domain and out-of-domain settings.	翻訳日:2023-07-07 15:32:23 公開日:2023-07-06
# 不正確な正解ラベルを用いた評価のための論理的評価式の実用性検証 Validation of the Practicability of Logical Assessment Formula for Evaluations with Inaccurate Ground-Truth Labels ( http://arxiv.org/abs/2307.02709v1 ) ライセンス: Link先を確認	Yongquan Yang and Hong Bu	(参考訳) 論理的評価式 (LAF) は、様々な人工知能応用の予測モデルを評価するために、不正確な正解ラベル (IAGTL) を用いた評価のために提案された新しい理論である。しかし, IAGTLを用いた評価におけるLAFの実用性は,実世界の実践ではまだ検証されていない。本稿では,この課題に対処するため,医療病理組織学的全スライド画像解析(MHWSIA)における乳癌の腫瘍セグメンテーション(TSfBC)にLAFを適用した。実験結果と解析から,TSfBCの場合における IAGTL を用いた評価に対する LAF の妥当性が示され,MHWSIA に適用した LAF の可能性が示唆された。 Logical assessment formula (LAF) is a new theory proposed for evaluations with inaccurate ground-truth labels (IAGTLs) to assess the predictive models for various artificial intelligence applications. However, the practicability of LAF for evaluations with IAGTLs has not yet been validated in real-world practice. In this paper, to address this issue, we applied LAF to tumour segmentation for breast cancer (TSfBC) in medical histopathology whole slide image analysis (MHWSIA). Experimental results and analysis show the validity of LAF for evaluations with IAGTLs in the case of TSfBC and reflect the potentials of LAF applied to MHWSIA.	翻訳日:2023-07-07 15:32:05 公開日:2023-07-06
# 対称性を考慮した周期材料の創製に向けて Towards Symmetry-Aware Generation of Periodic Materials ( http://arxiv.org/abs/2307.02707v1 ) ライセンス: Link先を確認	Youzhi Luo, Chengkai Liu, Shuiwang Ji	(参考訳) 深層モデルを用いた周期材料生成の問題を考える。対称性を考慮した分子生成は広く研究されているが、周期的物質は異なる対称性を持ち、既存の方法では完全には捉えられていない。本稿では,周期的物質構造の物理的対称性を捉える新しい材料生成手法であるSyMatを提案する。 SyMatは、変分オートエンコーダモデルを用いて、原子タイプセット、格子長、格子角を生成することによって、材料の原子タイプと格子を生成する。さらに、SyMatはスコアベースの拡散モデルを用いて材料の原子座標を生成し、座標拡散過程において新しい対称性を考慮した確率モデルを用いる。我々は,SyMatが材料上のすべての対称性変換に理論的に不変であることを示し,SyMatがランダム生成および特性最適化タスクにおいて有望な性能を達成することを示す。 We consider the problem of generating periodic materials with deep models. While symmetry-aware molecule generation has been studied extensively, periodic materials possess different symmetries, which have not been completely captured by existing methods. In this work, we propose SyMat, a novel material generation approach that can capture physical symmetries of periodic material structures. SyMat generates atom types and lattices of materials through generating atom type sets, lattice lengths and lattice angles with a variational auto-encoder model. In addition, SyMat employs a score-based diffusion model to generate atom coordinates of materials, in which a novel symmetry-aware probabilistic model is used in the coordinate diffusion process. We show that SyMat is theoretically invariant to all symmetry transformations on materials and demonstrate that SyMat achieves promising performance on random generation and property optimization tasks.	翻訳日:2023-07-07 15:31:51 公開日:2023-07-06
# 積分ゆらぎ定理とトレース保存写像 Integral fluctuation theorems and trace-preserving map ( http://arxiv.org/abs/2307.02705v1 ) ライセンス: Link先を確認	Zhiqiang Huang	(参考訳) 詳細ゆらぎ定理はエントロピー生成確率の生成関数に関する対称性を意味する。積分ゆらぎ定理は、この対称性と確率の正規化からすぐに従う。本稿では,生成関数を完全正写像で書き直し,積分FTが,構築した写像のトレース保存性によって決定されることを示す。固有状態ゆらぎ定理と2つの系間の熱交換を議論することで,この枠組みの利便性を実証する。この手法は準確率の生成関数にも適用でき、Petz回復写像はこの枠組みから自然に生じることが分かる。また、揺動散逸定理の一般化の研究に役立つであろう多時刻過程の生成関数についても、簡潔に論じる。 The detailed fluctuation theorem implies the symmetry on the generating function of the entropy production probability. The integral fluctuation theorem follows immediately from this symmetry and normalization of the probability. In this paper, we rewrite the generating function with completely positive maps and show that the integral FT is determined by the trace-preserving property of constructed maps. We demonstrate the convenience of this framework by discussing the eigenstate fluctuation theorem and the heat exchange between two systems. This set of methods is also applicable to the generating function of quasi-probability, and we find that the Petz recovery map can arise naturally from this framework. We also briefly discuss generating functions for multitime processes, which may be helpful in studying the generalization of the fluctuation-dissipation theorem.	翻訳日:2023-07-07 15:31:37 公開日:2023-07-06
# 拡散モデルを用いた局所制御によるカラーパレットの適用 Applying a Color Palette with Local Control using Diffusion Models ( http://arxiv.org/abs/2307.02698v1 ) ライセンス: Link先を確認	Vaibhav Vavilala and David Forsyth	(参考訳) ファンタジーカードアートの文脈における2つの新しい編集手順を実証する。パレット転送は、指定された参照パレットを所定のカードに適用する。ファンタジーアートにとって、パレットの望ましい変化は非常に大きく、アートの「外観」に大きな変化をもたらす可能性がある。ベクトル量子化、マッチング、および(拡散モデルを用いた)「ベクトル逆量子化」からなるパイプラインが、極端なパレット転送を成功させることを示す。セグメント制御により、アーティストは1つ以上の画像セグメントを移動でき、任意に結果の色を指定することができる。これら2種類の編集の組み合わせは、セグメントを移動してから再着色する、再着色してから一部のセグメントに所定の色を強制するといった、価値あるワークフローをもたらす。我々は,挑戦的なYu-Gi-Ohカードアートデータセットで手法を実証する。 We demonstrate two novel editing procedures in the context of fantasy card art. Palette transfer applies a specified reference palette to a given card. For fantasy art, the desired change in palette can be very large, leading to huge changes in the "look" of the art. We demonstrate that a pipeline of vector quantization; matching; and "vector dequantization" (using a diffusion model) produces successful extreme palette transfers. Segment control allows an artist to move one or more image segments, and to optionally specify the desired color of the result. The combination of these two types of edit yields valuable workflows, including: move a segment, then recolor; recolor, then force some segments to take a prescribed color. We demonstrate our methods on the challenging Yu-Gi-Oh card art dataset.	翻訳日:2023-07-07 15:31:25 公開日:2023-07-06
# ALPCAH:Tail Singular Value Regularizationを用いたサンプルワイズヘテロシダスティックPCA ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization ( http://arxiv.org/abs/2307.02745v1 ) ライセンス: Link先を確認	Javier Salazar Cavazos, Jeffrey A. Fessler, Laura Balzano	(参考訳) 主成分分析(PCA)はデータ次元削減の分野で重要なツールであり、様々なデータサイエンス問題に有用である。しかし、多くの応用は、異なるデータ源に関連するノイズ特性により品質が変化する異種データを含む。この混合データセットを扱う手法はヘテロシデスティック法として知られている。 HePPCATのような現在の手法は、実際は成り立たない基底係数のガウス的仮定を作る。重み付きPCA (WPCA) のような他の手法はノイズの分散が知られていると仮定するが、実際は知るのが難しい。本稿では,サンプル単位の雑音分散を推定できるPCA法を開発し,この情報を用いてデータの低ランク構造に関連する部分空間ベースの推定を改善する。これは低ランク成分の分布的な仮定やノイズ分散が知られていると仮定せずに行われる。シミュレーションでは, データのヘテロセシスティック性を考慮し, 全データと良好なデータのみを保持することの利点, PCA, Robust PCA (RPCA) や HePPCAT などの文献で確立されている他の PCA 手法との比較を行った。コードはhttps://github.com/javiersc1/alpcahで利用可能 Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction that is useful for various data science problems. However, many applications involve heterogeneous data that varies in quality due to noise characteristics associated with different sources of the data. Methods that deal with this mixed dataset are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions of the basis coefficients that may not hold in practice. Other methods such as Weighted PCA (WPCA) assume the noise variances are known, which may be difficult to know in practice. This paper develops a PCA method that can estimate the sample-wise noise variances and use this information in the model to improve the estimate of the subspace basis associated with the low-rank structure of the data. This is done without distributional assumptions of the low-rank component and without assuming the noise variances are known. 
Simulations show the effectiveness of accounting for such heteroscedasticity in the data, the benefits of using such a method with all of the data versus retaining only good data, and comparisons are made against other PCA methods established in the literature like PCA, Robust PCA (RPCA), and HePPCAT. Code available at https://github.com/javiersc1/ALPCAH	翻訳日:2023-07-07 15:25:44 公開日:2023-07-06
# 表情認識のためのコントラスト事前学習によるアクティブラーニング Active Learning with Contrastive Pre-training for Facial Expression Recognition ( http://arxiv.org/abs/2307.02744v1 ) ライセンス: Link先を確認	Shuvendu Roy, Ali Etemad	(参考訳) ディープラーニングは、大規模なモデルと大量のラベル付きデータのおかげで、表情認識(FER)の成功に重要な役割を果たしてきた。しかし、ラベル付きデータを得るには膨大な人的労力、時間、資金が必要となる。いくつかの先行研究は、異なる教師なし手法を用いて大量のラベル付きデータの必要性を減らすことに重点を置いているが、アクティブラーニングと呼ばれる別の有望なアプローチは、FERの文脈ではほとんど研究されていない。このアプローチでは、制限された「ラベル予算」を最大限に活用するために、ラベルなしのセットから最も代表的なサンプルを選択してラベル付けする。本稿では,3つの公開FERデータセット(FER13,RAF-DB,KDEF)に対して,最近の8つのアクティブラーニング手法を実装し,検討する。その結果、既存のアクティブラーニング手法はferの文脈ではうまく機能せず、ラベル付きサンプルの初期セットがデータセット全体をよく表していない場合に発生する「コールドスタート」と呼ばれる現象に苦しむことが判明した。この問題に対処するために,まず,ラベルなしデータセット全体に基づいて基礎となる表現を学習する,自己教師付き事前学習を提案する。次に、アクティブラーニング手法を用いてこれに従い、2段階のアプローチがランダムサンプリングよりも最大9.2%改善され、事前トレーニングなしで既存の最良のアクティブラーニングベースラインよりも最大6.7%改善されていることを観察する。この研究のコードは、github.com/ShuvenduRoy/ActiveFERで公開します。 Deep learning has played a significant role in the success of facial expression recognition (FER), thanks to large models and vast amounts of labelled data. However, obtaining labelled data requires a tremendous amount of human effort, time, and financial resources. Even though some prior works have focused on reducing the need for large amounts of labelled data using different unsupervised methods, another promising approach called active learning is barely explored in the context of FER. This approach involves selecting and labelling the most representative samples from an unlabelled set to make the best use of a limited 'labelling budget'. In this paper, we implement and study 8 recent active learning methods on three public FER datasets, FER13, RAF-DB, and KDEF. Our findings show that existing active learning methods do not perform well in the context of FER, likely suffering from a phenomenon called 'Cold Start', which occurs when the initial set of labelled samples is not well representative of the entire dataset. 
To address this issue, we propose contrastive self-supervised pre-training, which first learns the underlying representations based on the entire unlabelled dataset. We then follow this with the active learning methods and observe that our 2-step approach shows up to 9.2% improvement over random sampling and up to 6.7% improvement over the best existing active learning baseline without the pre-training. We will make the code for this study public upon publication at: github.com/ShuvenduRoy/ActiveFER.	翻訳日:2023-07-07 15:25:22 公開日:2023-07-06
# ターゲットドメイン記述を用いたDense Retrieval Adaptation Dense Retrieval Adaptation using Target Domain Description ( http://arxiv.org/abs/2307.02740v1 ) ライセンス: Link先を確認	Helia Hashemi, Yong Zhuang, Sachith Sri Ram Kothur, Srivas Prasad, Edgar Meij, W. Bruce Croft	(参考訳) 情報検索(ir)において、ドメイン適応とは、データ分布がソースドメインとは異なる新しいドメインに検索モデルを適用するプロセスである。この領域の既存の手法は、対象のドキュメントコレクションへのアクセスや、対象のドメイン内のラベル付き(制限された)データへのアクセスを監督(しばしば数ショット)するドメイン適応に焦点をあてている。適応性のない検索モデルのゼロショット性能向上に関する研究もある。本稿では、未探索のIRにおける領域適応の新しいカテゴリを紹介する。ここではゼロショット設定と同様、検索モデルが対象文書コレクションにアクセスできないと仮定する。対照的に、ターゲットドメインを説明する短いテキスト記述にアクセスすることができる。検索タスクにおいて、対象ドメインに適合可能なソースドメインの異なる特性を理解するために、ドメイン属性の分類を定義する。本稿では,テキストドメイン記述を前提として,合成文書コレクション,クエリセット,擬似関連ラベルを生成する新しい自動データ構築パイプラインを提案する。 5つの多様な対象領域に関する広範囲な実験により,構築した合成データを用いた高密度検索モデルの適用により,対象領域での効果的な検索性能が得られた。 In information retrieval (IR), domain adaptation is the process of adapting a retrieval model to a new domain whose data distribution is different from the source domain. Existing methods in this area focus on unsupervised domain adaptation where they have access to the target document collection or supervised (often few-shot) domain adaptation where they additionally have access to (limited) labeled data in the target domain. There also exists research on improving zero-shot performance of retrieval models with no adaptation. This paper introduces a new category of domain adaptation in IR that is as-yet unexplored. Here, similar to the zero-shot setting, we assume the retrieval model does not have access to the target document collection. In contrast, it does have access to a brief textual description that explains the target domain. We define a taxonomy of domain attributes in retrieval tasks to understand different properties of a source domain that can be adapted to a target domain. We introduce a novel automatic data construction pipeline that produces a synthetic document collection, query set, and pseudo relevance labels, given a textual domain description. 
Extensive experiments on five diverse target domains show that adapting dense retrieval models using the constructed synthetic data leads to effective retrieval performance on the target domain.	翻訳日:2023-07-07 15:24:54 公開日:2023-07-06
# RecallM: 時間的コンテキスト理解と質問応答のためのアーキテクチャ RecallM: An Architecture for Temporal Context Understanding and Question Answering ( http://arxiv.org/abs/2307.02738v1 ) ライセンス: Link先を確認	Brandon Kynoch, Hugo Latapie	(参考訳) 大規模言語モデル(LLM)ベースのチャットボットのための理想的な長期記憶メカニズムは、継続的な学習と複雑な推論の基盤となり、シーケンシャルおよび時間的な依存関係の学習を可能にする。このタイプのメモリメカニズムを作成することは、非常に難しい問題である。本稿では、長期記憶の効果を達成するための様々な方法を検討する。本稿では,AGIシステムのための適応可能かつ更新可能な長期メモリの構築を目的とした新しいアーキテクチャを提案する。様々な実験を通じて、RecallMアーキテクチャの利点、特に時間的理解の向上を実証する。 The ideal long-term memory mechanism for Large Language Model (LLM) based chatbots would lay the foundation for continual learning and complex reasoning, and would allow sequential and temporal dependencies to be learnt. Creating this type of memory mechanism is an extremely challenging problem. In this paper, we explore different methods of achieving the effect of long-term memory. We propose a new architecture focused on creating adaptable and updatable long-term memory for AGI systems. We demonstrate through various experiments the benefits of the RecallM architecture, particularly the improved temporal understanding it provides.	翻訳日:2023-07-07 15:24:35 公開日:2023-07-06
# Liver $T_1\rho$マッピングと分析のための不確かさ支援フレームワーク An Uncertainty Aided Framework for Learning based Liver $T_1\rho$ Mapping and Analysis ( http://arxiv.org/abs/2307.02736v1 ) ライセンス: Link先を確認	Chaoxing Huang, Vincent Wai Sun Wong, Queen Chan, Winnie Chiu Wing Chu, Weitian Chen	(参考訳) 目的:定量的$T_1\rho$イメージングは肝疾患の生化学的変化を評価できる可能性がある。定量的$T_1\rho$イメージングを加速するために深層学習法が用いられている。複雑な臨床環境において人工知能を用いた定量的イメージング手法を採用するためには,予測された$T_1\rho$値の不確かさを推定し,定量化結果の信頼性レベルを提供することが重要である。この不確実性は、ポストホックな定量的分析とモデル学習タスクを支援するためにも活用されるべきである。アプローチ:このニーズに対処するために、学習ベースの$T_1\rho$マッピングのためのパラメトリックマップリファインメントアプローチを提案し、不確かさをモデル化するための確率的方法でモデルを訓練する。また,不確実性マップを利用して,改良された$T_1\rho$マッピングネットワークのトレーニングを空間的に重み付けしてマッピング性能をさらに向上させるとともに,関心領域内の信頼できない$T_1\rho$値の画素を除去することを提案する。この枠組みは肝線維化段階の異なる51例のデータセットでテストされた。主な結果: 学習に基づくマップリファインメント手法は, 相対的マッピング誤差が3%未満となり, 不確実性推定を同時に行なえることを示す。推定された不確実性は実際のエラーレベルを反映しており、相対的な$T_1\rho$マッピング誤差をさらに2.60%に削減し、関心領域の信頼できないピクセルを効果的に除去するために使用できる。意義:本研究は,提案手法が,信頼できる肝の$T_1\rho$マッピングのための学習に基づく定量的MRIシステムを提供できる可能性を示した。 Objective: Quantitative $T_1\rho$ imaging has potential for assessment of biochemical alterations of liver pathologies. Deep learning methods have been employed to accelerate quantitative $T_1\rho$ imaging. To employ artificial intelligence-based quantitative imaging methods in complicated clinical environments, it is valuable to estimate the uncertainty of the predicted $T_1\rho$ values to provide the confidence level of the quantification results. The uncertainty should also be utilized to aid the post-hoc quantitative analysis and model learning tasks. Approach: To address this need, we propose a parametric map refinement approach for learning-based $T_1\rho$ mapping and train the model in a probabilistic way to model the uncertainty. We also propose to utilize the uncertainty map to spatially weight the training of an improved $T_1\rho$ mapping network to further improve the mapping performance and to remove pixels with unreliable $T_1\rho$ values in the region of interest. 
The framework was tested on a dataset of 51 patients with different liver fibrosis stages. Main results: Our results indicate that the learning-based map refinement method leads to a relative mapping error of less than 3% and provides uncertainty estimation simultaneously. The estimated uncertainty reflects the actual error level, and it can be used to further reduce the relative $T_1\rho$ mapping error to 2.60% as well as to remove unreliable pixels in the region of interest effectively. Significance: Our studies demonstrate that the proposed approach has the potential to provide a learning-based quantitative MRI system for trustworthy $T_1\rho$ mapping of the liver.	翻訳日:2023-07-07 15:24:25 公開日:2023-07-06
# MMNet:シークエンシャルディープフェイク検出のためのマルチコラボレーションとマルチスーパービジョンネットワーク MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection ( http://arxiv.org/abs/2307.02733v1 ) ライセンス: Link先を確認	Ruiyang Xia, Decheng Liu, Jie Li, Lin Yuan, Nannan Wang, Xinbo Gao	(参考訳) 高度な操作技術は、偽造顔画像などの偽造メディアの生成を通じて、犯罪者に社会的なパニックや不正な利益を得る機会を与えてきた。画像の信頼性を評価するため,様々なディープフェイク検出手法が提案されている。ディープフェイク検出の拡張であるシークエンシャルディープフェイク検出は、回復のための正しいシーケンスで偽の顔領域を特定することを目的としている。それにもかかわらず、空間的およびシーケンシャルな操作の異なる組み合わせにより、偽造顔画像は検出性能に重大な影響を及ぼすかなりの相違点を示す。さらに、偽画像の復元には、逆変換を実装するために操作モデルの知識が必要であるため、関連する技術が攻撃者によって隠蔽されることが多いため、確認は困難である。これらの課題に対処するために, 顔画像の様々な空間スケールや逐次的な順応を扱うマルチコラボレーション・マルチスーパービジョンネットワーク (MMNet) を提案し, 対応する操作方法の知識を必要とせずに回復を実現する。さらに, 既存の評価基準では, 連続的複数ステップにおける地盤との一致度を考慮せず, 単一の推定ステップで検出精度のみを考慮に入れている。この制限を克服するために,複数の推論ステップにおける検出精度を考慮した完全系列マッチング(CSM)と呼ばれる新しい評価指標を提案する。いくつかの典型的なデータセットに対する大規模な実験は、MMNetが最先端検出性能と独立回復性能を達成することを示した。 Advanced manipulation techniques have provided criminals with opportunities to make social panic or gain illicit profits through the generation of deceptive media, such as forged face images. In response, various deepfake detection methods have been proposed to assess image authenticity. Sequential deepfake detection, which is an extension of deepfake detection, aims to identify forged facial regions with the correct sequence for recovery. Nonetheless, due to the different combinations of spatial and sequential manipulations, forged face images exhibit substantial discrepancies that severely impact detection performance. Additionally, the recovery of forged images requires knowledge of the manipulation model to implement inverse transformations, which is difficult to ascertain as relevant techniques are often concealed by attackers. 
To address these issues, we propose the Multi-Collaboration and Multi-Supervision Network (MMNet), which handles various spatial scales and sequential permutations in forged face images and achieves recovery without requiring knowledge of the corresponding manipulation method. Furthermore, existing evaluation metrics only consider detection accuracy at a single inferring step, without accounting for the matching degree with ground-truth under continuous multiple steps. To overcome this limitation, we propose a novel evaluation metric called Complete Sequence Matching (CSM), which considers the detection accuracy at multiple inferring steps, reflecting the ability to detect integrally forged sequences. Extensive experiments on several typical datasets demonstrate that MMNet achieves state-of-the-art detection performance and independent recovery performance.	翻訳日:2023-07-07 15:23:55 公開日:2023-07-06
# 評価器の評価: 現在のFew-Shot学習ベンチマークは目的に合っているか? Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose? ( http://arxiv.org/abs/2307.02732v1 ) ライセンス: Link先を確認	Luísa Shimabucoro, Timothy Hospedales, Henry Gouk	(参考訳) Few-Shot Learningのための多くのベンチマークがここ10年間提案されている。しかし、これらのベンチマークはすべて多くのタスクで平均した性能に重点を置いており、この設定において個々のタスク向けに訓練されたモデルをどのように確実に評価しチューニングするかという問題は扱われていない。本稿では,モデルをデプロイする際の基本的なステップであるタスクレベルの評価について,最初の調査を行う。数ショット設定における性能推定器の精度を計測し,モデル選択の戦略を検討し,通常ロバストであると考えられている評価器が失敗する理由を考察する。その結果,モデルの性能を直接推定するには少数のフォールドによるクロスバリデーションが最適であり,一方でモデル選択の目的にはブートストラップまたは多数のフォールドによるクロスバリデーションの方が適している,という結論を得た。全体として、既存の数ショット学習のベンチマークは、個々のタスクでメソッドがいかに効果的に使えるかについて信頼性の高い像を得られるようには設計されていないことがわかった。 Numerous benchmarks for Few-Shot Learning have been proposed in the last decade. However, all of these benchmarks focus on performance averaged over many tasks, and the question of how to reliably evaluate and tune models trained for individual tasks in this regime has not been addressed. This paper presents the first investigation into task-level evaluation -- a fundamental step when deploying a model. We measure the accuracy of performance estimators in the few-shot setting, consider strategies for model selection, and examine the reasons for the failure of evaluators usually thought of as being robust. We conclude that cross-validation with a low number of folds is the best choice for directly estimating the performance of a model, whereas using bootstrapping or cross-validation with a large number of folds is better for model selection purposes. Overall, we find that existing benchmarks for few-shot learning are not designed in such a way that one can get a reliable picture of how effectively methods can be used on individual tasks.	翻訳日:2023-07-07 15:23:29 公開日:2023-07-06
# 細粒度アクション分析:フィギュアスケートのマルチモダリティとマルチタスクデータセット Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating ( http://arxiv.org/abs/2307.02730v1 ) ライセンス: Link先を確認	Sheng-Lan Liu, Yu-Ning Ding, Si-Fan Zhang, Wen-Yue Chen, Ning Zhou, Hao Liu, Gui-Hong Lao	(参考訳) 既存のアクションデータセットにおけるきめ細かいアクション分析は、不十分なアクションカテゴリ、低い粒度、限られたモダリティとタスクによって困難になっている。本稿では,世界フィギュアスケート選手権から収集したフィギュアスケートのマルチモダリティ・マルチタスクデータセット(MMFS)を提案する。行動認識と行動品質評価に対応するMMFSは、RGBとスケルトンのモダリティを含み、空間ラベルや時間ラベルを含む256のカテゴリを持つ11671クリップからアクションのスコアを収集している。私たちのデータセットの主な貢献は、以下の3つの側面に分類できる。 (1) 独立した空間的・時間的カテゴリーを初めて提案し, より詳細な行動認識と品質評価を探求する。 (2) MMFSは, 複雑なきめ細かい動作品質評価のためのスケルトンモダリティを初めて導入する。 (3) マルチモダリティ・マルチタスクのデータセットは、より多くのアクション分析モデルを促進する。データセットをベンチマークするために、アクション認識とアクション品質評価のためのRGBおよびスケルトンベースのベースライン手法を採用した。 Fine-grained action analysis on existing action datasets is challenged by insufficient action categories, low granularity, and limited modalities and tasks. In this paper, we propose a Multi-modality and Multi-task dataset of Figure Skating (MMFS) which was collected from the World Figure Skating Championships. MMFS, which supports action recognition and action quality assessment, captures RGB and skeleton modalities and collects action scores from 11671 clips with 256 categories, including spatial and temporal labels. The key contributions of our dataset fall into three aspects as follows. (1) Independent spatial and temporal categories are proposed for the first time to further explore fine-grained action recognition and quality assessment. (2) MMFS is the first to introduce the skeleton modality for complex fine-grained action quality assessment. (3) Our multi-modality and multi-task dataset encourages more action analysis models. To benchmark our dataset, we adopt RGB-based and skeleton-based baseline methods for action recognition and action quality assessment.	翻訳日:2023-07-07 15:23:11 公開日:2023-07-06
# テキストアライメントは大規模NLPタスクのための効率的な統一モデル Text Alignment Is An Efficient Unified Model for Massive NLP Tasks ( http://arxiv.org/abs/2307.02729v1 ) ライセンス: Link先を確認	Yuheng Zha, Yichi Yang, Ruichen Li, Zhiting Hu	(参考訳) 大きな言語モデル(LLM)は、通常、次の単語予測の関数として設計され、広範なNLPタスクに優れていた。一般性にもかかわらず、次の単語予測は多くのタスクにおいて効率的な定式化ではないことが多く、極端な規模のモデルパラメータ(数百億から数千億)を必要とし、時には準最適性能をもたらす。実際には、より効率的なモデルを構築することが望ましいことが多い -- 汎用性は低いが、問題のかなりのサブセットに適用され、はるかに小さいモデルサイズで同等あるいは優れたパフォーマンスを提供する。本稿では,テキストの含意,類似性,質問応答(と回答可能性),事実整合性などを含む幅広い重要なタスクに対して,テキストアライメントを効率的な統一モデルとして提案する。一対のテキストが与えられると、モデルはその情報間のアライメントの度合いを測定する。 28データセットの5.9M例を用いて,RoBERTa(355Mパラメータ)の軽量微調整によりアライメントモデル(Align)をインスタンス化する。コンパクトなサイズにもかかわらず、広範な実験によりモデルの効率性と高い性能が示された。 (1) 上記の多様なタスクの20以上のデータセットにおいて、このモデルは約2倍または10倍のパラメータを持つFLAN-T5モデルに匹敵するか上回り、単一の統一モデルは個々のデータセットで微調整されたタスク固有モデルも上回る。 (2) 23データセットで言語生成の事実整合性の評価に適用した場合、このモデルは、はるかに大きいGPT-3.5(ChatGPT)や、時にはGPT-4を含む様々なベースラインを上回る。 (3) この軽量モデルは、質問応答タスクにおいてGPT-3.5などのLLMのアドオンコンポーネントとしても機能し、回答不能な質問を特定することで、平均完全一致(EM)スコアを17.94、F1スコアを15.05向上させる。 Large language models (LLMs), typically designed as a function of next-word prediction, have excelled across extensive NLP tasks. Despite the generality, next-word prediction is often not an efficient formulation for many of the tasks, demanding an extreme scale of model parameters (10s or 100s of billions) and sometimes yielding suboptimal performance. 
In practice, it is often desirable to build more efficient models -- despite being less versatile, they still apply to a substantial subset of problems, delivering on par or even superior performance with much smaller model sizes. In this paper, we propose text alignment as an efficient unified model for a wide range of crucial tasks involving text entailment, similarity, question answering (and answerability), factual consistency, and so forth. Given a pair of texts, the model measures the degree of alignment between their information. We instantiate an alignment model (Align) through lightweight finetuning of RoBERTa (355M parameters) using 5.9M examples from 28 datasets. Despite its compact size, extensive experiments show the model's efficiency and strong performance: (1) On over 20 datasets of aforementioned diverse tasks, the model matches or surpasses FLAN-T5 models that have around 2x or 10x more parameters; the single unified model also outperforms task-specific models finetuned on individual datasets; (2) When applied to evaluate factual consistency of language generation on 23 datasets, our model improves over various baselines, including the much larger GPT-3.5 (ChatGPT) and sometimes even GPT-4; (3) The lightweight model can also serve as an add-on component for LLMs such as GPT-3.5 in question answering tasks, improving the average exact match (EM) score by 17.94 and F1 score by 15.05 through identifying unanswerable questions.	翻訳日:2023-07-07 15:22:54 公開日:2023-07-06
# 階層的エンパワーメント:扱いやすいエンパワーメントに基づくスキル学習に向けて Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill-Learning ( http://arxiv.org/abs/2307.02728v1 ) ライセンス: Link先を確認	Andrew Levy, Sreehari Rammohan, Alessandro Allievi, Scott Niekum, George Konidaris	(参考訳) 汎用エージェントには大規模なスキルのレパートリーが必要である。スキルと状態の間の最大の相互情報量であるエンパワーメントは、異なるスキルの大規模なコレクションを学ぶための経路を提供するが、相互情報量の最適化は困難である。我々は,目標条件付き階層強化学習の概念を統合することにより,エンパワーメントの計算をより扱いやすくする新しいフレームワークである階層的エンパワーメントを導入する。このフレームワークは2つの具体的な貢献をする。第1に,短いホライズンでのエンパワーメントの計算に使用可能な,相互情報量に対する新しい変分下界を導入する。第2に,指数的に長い時間スケールでエンパワーメントを計算するための階層アーキテクチャを導入する。一連のシミュレーションロボットタスクでフレームワークの貢献を検証する。一般的なAntナビゲーション領域では、我々の4レベルのエージェントは、先行研究よりも2桁以上大きい表面積をカバーするスキルを学ぶことができる。 General purpose agents will require large repertoires of skills. Empowerment -- the maximum mutual information between skills and the states -- provides a pathway for learning large collections of distinct skills, but mutual information is difficult to optimize. We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable by integrating concepts from Goal-Conditioned Hierarchical Reinforcement Learning. Our framework makes two specific contributions. First, we introduce a new variational lower bound on mutual information that can be used to compute empowerment over short horizons. Second, we introduce a hierarchical architecture for computing empowerment over exponentially longer time scales. We verify the contributions of the framework in a series of simulated robotics tasks. In a popular ant navigation domain, our four level agents are able to learn skills that cover a surface area over two orders of magnitude larger than prior work.	翻訳日:2023-07-07 15:22:26 公開日:2023-07-06
# 3分間の人間フィードバックを用いた拡散モデルの検閲サンプリング Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback ( http://arxiv.org/abs/2307.02770v1 ) ライセンス: Link先を確認	TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu	(参考訳) 拡散モデルは最近、高品質な画像生成で顕著な成功を収めている。しかし、事前学習された拡散モデルは、良い画像を生成できる一方で望ましくない画像を出力することもあるという意味で、部分的な不整合を示すことがある。その場合は、悪い画像の生成を防ぐだけでよく、このタスクを検閲と呼ぶ。本研究では,最小限の人間フィードバックで学習した報酬モデルを用いて,事前学習した拡散モデルによる検閲付き生成を提案する。検閲は極めて高い人的フィードバック効率で達成でき、ほんの数分の人間フィードバックで生成されたラベルだけで十分であることを示す。コードは https://github.com/tetrzim/diffusion-human-feedback で利用可能。 Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images, but it sometimes outputs undesirable images. If so, we simply need to prevent the generation of the bad images, and we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.	翻訳日:2023-07-07 15:14:22 公開日:2023-07-06
# 役に立たない思考の生成、認識、リフレーミングのためのモデルの訓練 Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts ( http://arxiv.org/abs/2307.02768v1 ) ライセンス: Link先を確認	Mounica Maddela, Megan Ung, Jing Xu, Andrea Madotto, Heather Foran, Y-Lan Boureau	(参考訳) 役に立たない思考を認識してリフレーミングするなど、ウェルビーイングに対する多くの認知的アプローチは、過去数十年にわたってかなりの実証的支持を得てきたが、セルフヘルプの形式では依然として広く普及していない。普及の障壁の一つは、十分に具体的で多様な専用の練習教材がないことである。本研究は,現在の言語モデルを活用して,与えられた特定の文脈に合う標準的な役に立たない思考パターンを示す練習教材を事実上無制限に生成し,かつ適切な肯定的リフレーミング案を生成できるかを検討する。 PATTERNREFRAMEは、与えられたペルソナに条件付けされた、役に立たない思考パターンを含む約10kの思考例と、約27kの肯定的リフレーミングからなる新しいデータセットである。このデータセットを用いて現在のモデルを訓練および/または評価することにより、既存のモデルは、追加のモデル訓練なし、または最小限の訓練で、個別に調整された豊富な練習教材と仮説の生成を助ける強力なツールになり得ることを示す。 Many cognitive approaches to well-being, such as recognizing and reframing unhelpful thoughts, have received considerable empirical support over the past decades, yet still lack truly widespread adoption in self-help format. A barrier to that adoption is a lack of adequately specific and diverse dedicated practice material. This work examines whether current language models can be leveraged to both produce a virtually unlimited quantity of practice material illustrating standard unhelpful thought patterns matching specific given contexts, and generate suitable positive reframing proposals. We propose PATTERNREFRAME, a novel dataset of about 10k examples of thoughts containing unhelpful thought patterns conditioned on a given persona, accompanied by about 27k positive reframes. By using this dataset to train and/or evaluate current models, we show that existing models can already be powerful tools to help generate an abundance of tailored practice material and hypotheses, with no or minimal additional model training required.	翻訳日:2023-07-07 15:14:11 公開日:2023-07-06
# ジャンプを伴う高次元PIDEの時間差分学習 Temporal Difference Learning for High-Dimensional PIDEs with Jumps ( http://arxiv.org/abs/2307.02766v1 ) ライセンス: Link先を確認	Liwei Lu, Hailong Guo, Xu Yang, Yi Zhu	(参考訳) 本稿では,時間差分学習に基づいて高次元の偏積分微分方程式(PIDE)を解くための深層学習フレームワークを提案する。一連のLévy過程を導入し、それに対応する強化学習モデルを構築する。過程全体をシミュレートするために、ディープニューラルネットワークを使用して、方程式の解と非局所項を表現する。その後,時間差分誤差,終端条件,および非局所項の性質を損失関数としてネットワークを訓練する。この方法の相対誤差は、100次元実験ではO(10^{-3})、一次元の純粋ジャンプ問題ではO(10^{-4})に達する。さらに, 本手法は計算コストの低さとロバスト性という利点を示し, ジャンプの形態や強度が異なる問題への対処に適している。 In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on the temporal difference learning. We introduce a set of Levy processes and construct a corresponding reinforcement learning model. To simulate the entire process, we use deep neural networks to represent the solutions and non-local terms of the equations. Subsequently, we train the networks using the temporal difference error, termination condition, and properties of the non-local terms as the loss function. The relative error of the method reaches O(10^{-3}) in 100-dimensional experiments and O(10^{-4}) in one-dimensional pure jump problems. Additionally, our method demonstrates the advantages of low computational cost and robustness, making it well-suited for addressing problems with different forms and intensities of jumps.	翻訳日:2023-07-07 15:13:52 公開日:2023-07-06
# 信頼度に基づくカスケードの委譲(deferral)はいつ十分か? When Does Confidence-Based Cascade Deferral Suffice? ( http://arxiv.org/abs/2307.02764v1 ) ライセンス: Link先を確認	Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar	(参考訳) カスケードは、一連の分類器を順番に呼び出すことで、推論コストをサンプルごとに適応的に変化させる古典的な戦略である。委譲(deferral)ルールは、シーケンス内の次の分類器を呼び出すか、予測を終了するかを決定する。単純な委譲ルールの一つは、例えば予測されたソフトマックス確率の最大値に基づいて、現在の分類器の信頼度を利用する。カスケードの構造を考慮していない(例えば、下流モデルの誤りをモデル化しない)にもかかわらず、このような信頼度に基づく委譲は、実際には非常にうまく機能することが多い。本稿では,信頼度に基づく委譲が失敗しうる条件と,代替の委譲戦略の方がうまく機能する状況について、より深く理解することを目指す。まず、最適な委譲ルールの理論的特徴付けを示し、信頼度に基づく委譲が不利になりうる設定を正確に特徴づける。次に, 事後的(post-hoc)な委譲メカニズムを検討し, 次の設定において信頼度に基づく委譲を大幅に改善できることを実証する。 (i)下流モデルが入力の一部でのみうまく機能するスペシャリストである場合、 (ii)サンプルにラベルノイズがある場合、 (iii)訓練セットとテストセットの間に分布シフトがある場合。 Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade -- e.g., not modelling the errors of downstream models -- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. 
We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.	翻訳日:2023-07-07 15:13:41 公開日:2023-07-06
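The confidence-based deferral rule summarised in the entry above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`softmax`, `cascade_predict`), the toy classifiers, and the threshold values are all hypothetical, and the rule shown is only the simple maximum-softmax-probability baseline that the abstract describes, not the post-hoc deferral mechanisms the paper studies.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def cascade_predict(x, classifiers, thresholds):
    """Run classifiers in order; stop at the first one that is confident enough.

    classifiers: list of callables mapping an input to a vector of logits
                 (hypothetical stand-ins for small-to-large models).
    thresholds:  one confidence threshold per non-final classifier.
    Returns (predicted_class, index_of_classifier_that_answered).
    """
    last = len(classifiers) - 1
    for i, clf in enumerate(classifiers):
        probs = softmax(np.asarray(clf(x), dtype=float))
        confidence = probs.max()  # maximum predicted softmax probability
        # The final classifier always answers; earlier ones defer when unsure.
        if i == last or confidence >= thresholds[i]:
            return int(probs.argmax()), i


# Toy usage: a cheap model that is only ~88% confident, and a larger model.
small = lambda x: [2.0, 0.0]
large = lambda x: [0.0, 5.0]
print(cascade_predict(None, [small, large], thresholds=[0.9]))
```

Lowering the threshold makes the cascade cheaper, because the small model answers more often, at the cost of trusting its confidence more. The abstract's point is that this confidence alone ignores how well the downstream model would do on the same input.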
# あなたの配偶者には専門家の助けが必要だ: 社会的関係のモデル化によるメッセージの文脈的適切性の判定 Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships ( http://arxiv.org/abs/2307.02763v1 ) ライセンス: Link先を確認	David Jurgens, Agrima Seth, Jackson Sargent, Athena Aghighi, Michael Geraci	(参考訳) 対人コミュニケーションを理解するには、メッセージが発せられる社会的文脈と規範を理解することも必要である。しかし、このようなコミュニケーションにおける攻撃的コンテンツを識別する現在の手法は、大部分が文脈とは無関係に動作しており、コミュニティの規範や先行する会話を文脈として考慮する手法はわずかである。本稿では,個人間の社会的関係を明示的にモデル化することにより,不適切なコミュニケーションを識別する新しいアプローチを提案する。文脈に位置づけられた適切性判断の新しいデータセットを導入し,大規模言語モデルが関係情報を容易に取り込み,与えられた文脈における適切性を正確に識別できることを示す。オンライン会話と映画の対話のデータを用いて、関係そのものが暗黙の規範としてどのように機能するかについての洞察を示し、異なる会話設定で文脈への感度がどの程度必要かを定量化する。さらに, 文脈的適切性の判断が, 見下し(condescension)や丁寧さといった、言語に表れる他の社会的要因を予測できることを示す。 Understanding interpersonal communication requires, in part, understanding the social context and norms in which a message is said. However, current methods for identifying offensive content in such communication largely operate independent of context, with only a few approaches considering community norms or prior conversation as context. Here, we introduce a new approach to identifying inappropriate communication by explicitly modeling the social relationship between the individuals. We introduce a new dataset of contextually-situated judgments of appropriateness and show that large language models can readily incorporate relationship information to accurately identify appropriateness in a given context. Using data from online conversations and movie dialogues, we provide insight into how the relationships themselves function as implicit norms and quantify the degree to which context-sensitivity is needed in different conversation settings. Further, we also demonstrate that contextual-appropriateness judgments are predictive of other social factors expressed in language such as condescension and politeness.	翻訳日:2023-07-07 15:13:22 公開日:2023-07-06
# PRD: 大規模言語モデルに基づく評価を改善するピアランクと考察 PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations ( http://arxiv.org/abs/2307.02762v1 ) ライセンス: Link先を確認	Ruosen Li, Teerth Patel, Xinya Du	(参考訳) 現在、様々な現代大言語モデル(LLM)が生成する応答の質は、自動で評価・比較することが困難である。近年の研究では、LLMをオープンエンド質問応答の基準自由度として主に用いている。より具体的には、彼らは「最も強い」llmを評価器として使用し、候補モデルの答えをペアで比較し、ランキングスコアを提供する。しかし、この直感的な手法には、自己強調(自身の答えを好む)や位置バイアスなど、複数の問題がある。教育領域(Cho and MacArthur, 2011; Walsh, 2014)からLLMに基づく評価を改善するための洞察と教訓を引き出す。具体的には,(1)ピア・ランク(pr)アルゴリズムを提案し,各ピア・llmの対方向選好を考慮し,モデルの最終的なランキングを出力し,(2)ピア・ディベーション(pd)により,2つの回答の選好について議論し,相互合意に達するように促す。我々は2つのベンチマークデータセットで実験を行う。私たちのアプローチは、より高い精度を達成し、それぞれ人間の判断とよりよく一致していることが分かりました。興味深いことに、prは匿名設定の下で比較的正確なモデルの自己組織化を誘導することができる。私たちの研究は、人間と比較しにくいモデルを評価するスペースを提供する。 Nowadays, the quality of responses generated by different modern large language models (LLMs) are hard to evaluate and compare automatically. Recent studies suggest and predominantly use LLMs as a reference-free metric for open-ended question answering. More specifically, they use the recognized "strongest" LLM as the evaluator, which conducts pairwise comparisons of candidate models' answers and provides a ranking score. However, this intuitive method has multiple problems, such as bringing in self-enhancement (favoring its own answers) and positional bias. We draw insights and lessons from the educational domain (Cho and MacArthur, 2011; Walsh, 2014) to improve LLM-based evaluations. Specifically, we propose the (1) peer rank (PR) algorithm that takes into account each peer LLM's pairwise preferences of all answer pairs, and outputs a final ranking of models; and (2) peer discussion (PD), where we prompt two LLMs to discuss and try to reach a mutual agreement on preferences of two answers. We conduct experiments on two benchmark datasets. We find that our approaches achieve higher accuracy and align better with human judgments, respectively. 
Interestingly, PR can induce a relatively accurate self-ranking of models under the anonymous setting, where each model's name is unrevealed. Our work provides space to explore evaluating models that are hard to compare for humans.	翻訳日:2023-07-07 15:13:04 公開日:2023-07-06
# 推薦のための知識グラフ自己教師あり合理化 Knowledge Graph Self-Supervised Rationalization for Recommendation ( http://arxiv.org/abs/2307.02759v1 ) ライセンス: Link先を確認	Yuhao Yang, Chao Huang, Lianghao Xia, Chunzhen Huang	(参考訳) 本稿では,知識認識型レコメンダシステムのための,KGRecと呼ばれる新しい自己教師あり合理化手法を提案する。情報量の多い知識接続を効果的に識別するために,知識トリプレットに対する有理スコアを生成する注意的知識合理化機構を提案する。これらのスコアにより、KGRecは有理マスキングを通じて、推薦のための生成的およびコントラスト的な自己教師ありタスクを統合する。知識グラフ内の根拠(rationale)を強調するために,マスキングと再構築という形の新しい生成タスクを設計する。有理スコアの高い重要な知識をマスクすることで、KGRecは根拠として役立つ有用な知識接続を再構築し強調するように訓練される。知識グラフ学習における協調的相互作用の効果をさらに合理化するために,知識ビューとユーザ・アイテム相互作用ビューからの信号を整合させるコントラスト学習タスクを導入する。ノイズに強いコントラストを確保するため、有理スコアによりノイズの可能性があると判断された両グラフのエッジをマスクする。 3つの実世界のデータセットに対する大規模な実験は、KGRecが最先端の手法より優れていることを示した。実装コードは https://github.com/HKUDS/KGRec で公開している。 In this paper, we introduce a new self-supervised rationalization method, called KGRec, for knowledge-aware recommender systems. To effectively identify informative knowledge connections, we propose an attentive knowledge rationalization mechanism that generates rational scores for knowledge triplets. With these scores, KGRec integrates generative and contrastive self-supervised tasks for recommendation through rational masking. To highlight rationales in the knowledge graph, we design a novel generative task in the form of masking-reconstructing. By masking important knowledge with high rational scores, KGRec is trained to rebuild and highlight useful knowledge connections that serve as rationales. To further rationalize the effect of collaborative interactions on knowledge graph learning, we introduce a contrastive learning task that aligns signals from knowledge and user-item interaction views. To ensure noise-resistant contrasting, potential noisy edges in both graphs judged by the rational scores are masked. Extensive experiments on three real-world datasets demonstrate that KGRec outperforms state-of-the-art methods. We also provide the implementation codes for our approach at https://github.com/HKUDS/KGRec.	翻訳日:2023-07-07 15:12:41 公開日:2023-07-06
# オンラインコミュニティにおける言語スタイルマッチングの探求 : 社会的文脈と会話ダイナミクスの役割 Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics ( http://arxiv.org/abs/2307.02758v1 ) ライセンス: Link先を確認	Aparna Ananthasubramaniam, Hong Chen, Jason Yan, Kenan Alkiek, Jiaxin Pei, Agrima Seth, Lavinia Dunagan, Minje Choi, Benjamin Litterer, David Jurgens	(参考訳) 会話における言語スタイルマッチング(LSM)は、力や説得といった社会的影響のいくつかの側面を反映することができる。しかし、LSMがRedditのようなプラットフォーム上でのオンラインコミュニケーションの結果とどのように関係しているのかは不明な疑問である。本研究では,Redditにおける二者会話スレッドの大規模コーパスを分析し,機能語の使用と形式性という2種類のスタイルを用いて,LSMのすべての発生を識別する。このフレームワークを用いて、Reddit内のいくつかの社会的要因(ポストとサブレディット機能、会話深度、ユーザ在任率、コメントの議論)によって、LSMのレベルが会話でどのように異なるかを検討する。最後に,コミュニティ禁止後の身分喪失に伴うlsmの変化を測定した。その結果,Redditでの会話におけるLSMの相互作用が,コミュニティのダイナミクスを理解する上での会話の関与を理解することの重要性が示唆された。 Linguistic style matching (LSM) in conversations can be reflective of several aspects of social influence such as power or persuasion. However, how LSM relates to the outcomes of online communication on platforms such as Reddit is an unknown question. In this study, we analyze a large corpus of two-party conversation threads in Reddit where we identify all occurrences of LSM using two types of style: the use of function words and formality. Using this framework, we examine how levels of LSM differ in conversations depending on several social factors within Reddit: post and subreddit features, conversation depth, user tenure, and the controversiality of a comment. Finally, we measure the change of LSM following loss of status after community banning. Our findings reveal the interplay of LSM in Reddit conversations with several community metrics, suggesting the importance of understanding conversation engagement when understanding community dynamics.	翻訳日:2023-07-07 15:12:22 公開日:2023-07-06
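To make the function-word side of linguistic style matching (LSM) in the entry above more concrete, here is a minimal sketch. It is an illustration under assumptions rather than the paper's method: the abstract does not state the exact formula, so the score below follows one commonly used LSM formulation (one minus the normalised absolute difference of usage rates), and the tiny `FUNCTION_WORDS` set, the whitespace tokenisation, and the function names are hypothetical simplifications.

```python
# Illustrative, deliberately tiny function-word list (an assumption for this sketch).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "but", "or", "of", "in", "on", "to",
    "i", "you", "he", "she", "it", "we", "they", "is", "are", "not",
}


def function_word_rate(text):
    """Fraction of whitespace-separated tokens that are function words."""
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in FUNCTION_WORDS for t in tokens) / len(tokens)


def lsm_score(text_a, text_b, eps=1e-4):
    """Style-matching score in [0, 1]; 1 means identical function-word rates."""
    rate_a = function_word_rate(text_a)
    rate_b = function_word_rate(text_b)
    return 1.0 - abs(rate_a - rate_b) / (rate_a + rate_b + eps)


# Toy usage on a comment and its reply.
print(lsm_score("I think the answer is in the docs.", "It is not in the docs."))
```

In practice such scores are usually computed per function-word category and then averaged. The single pooled rate used here only keeps the sketch short.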
# CityTrack: 位置認識トラッキングとボックス粒度マッチングによる都市規模マルチカメラマルチターゲットトラッキングの改善 CityTrack: Improving City-Scale Multi-Camera Multi-Target Tracking by Location-Aware Tracking and Box-Grained Matching ( http://arxiv.org/abs/2307.02753v1 ) ライセンス: Link先を確認	Jincheng Lu, Xipeng Yang, Jin Ye, Yifu Zhang, Zhikang Zou, Wei Zhang, Xiao Tan	(参考訳) Multi-Camera Multi-Target Tracking (MCMT)は、複数のカメラにまたがって複数のターゲットを同時に追跡するコンピュータビジョン技術である。都市交通の視覚分析におけるMCMTは、視野や視点の異なる複数のカメラで都市規模の広い領域をカバーすることが多く、都市交通シーンの複雑でダイナミックな性質のために大きな課題に直面している。都市交通シーンのターゲットはしばしば遮蔽、照明の変化、視点の変化を受け、異なるカメラ間でターゲットを正確に関連付けることが困難になる。これらの課題を克服するために,CityTrackと呼ばれる新しい体系的なMCMTフレームワークを提案する。具体的には、MCMTタスクでの有効性を高めるために様々な高度な技術を統合した位置認識SCMTトラッカーを提示し、上記の問題を解決するために、ICAモジュールのための新しいボックス粒度マッチング(BGM)手法を提案する。 CityFlowV2データセットの公開テストセットでアプローチを評価し、IDF1 84.91%を達成して2022 AI CITY CHALLENGEで1位となった。実験結果は,都市交通シーンがもたらす課題を克服する上での本アプローチの有効性を示している。 Multi-Camera Multi-Target Tracking (MCMT) is a computer vision technique that involves tracking multiple targets simultaneously across multiple cameras. MCMT in urban traffic visual analysis faces great challenges due to the complex and dynamic nature of urban traffic scenes, where multiple cameras with different views and perspectives are often used to cover a large city-scale area. Targets in urban traffic scenes often undergo occlusion, illumination changes, and perspective changes, making it difficult to associate targets across different cameras accurately. To overcome these challenges, we propose a novel systematic MCMT framework, called CityTrack. Specifically, we present a Location-Aware SCMT tracker which integrates various advanced techniques to improve its effectiveness in the MCMT task and propose a novel Box-Grained Matching (BGM) method for the ICA module to solve the aforementioned problems. We evaluated our approach on the public test set of the CityFlowV2 dataset and achieved an IDF1 of 84.91%, ranking 1st in the 2022 AI CITY CHALLENGE. Our experimental results demonstrate the effectiveness of our approach in overcoming the challenges posed by urban traffic scenes.	翻訳日:2023-07-07 15:12:06 公開日:2023-07-06
# 不均衡データセットを用いたオフライン強化学習 Offline Reinforcement Learning with Imbalanced Datasets ( http://arxiv.org/abs/2307.02752v1 ) ライセンス: Link先を確認	Li Jiang, Sijie Chen, Jielin Qiu, Haoran Xu, Wai Kin Chan, Zhao Ding	(参考訳) 現在のオフライン強化学習(RL)研究ではベンチマークが広く使われているため、モデル開発において実世界のデータセット分布の不均衡が見過ごされてきた。実世界のオフラインRLデータセットは、探索の難しさや安全性への配慮のため、状態空間上で不均衡になることが多い。本稿では、オフラインRLにおける不均衡データセットの特性を規定する。そこでは、状態カバレッジは、偏ったポリシーを特徴とするべき乗則分布に従う。理論的および実証的に、Conservative Q-Learning(CQL)のような分布制約に基づく典型的なオフラインRL手法は、不均衡データセットの下でポリシーを抽出するのに効果がないことを示す。自然知能に着想を得て,過去の関連する経験を思い出すための検索(retrieval)プロセスでCQLを拡張し,不均衡データセットによって生じる課題を効果的に軽減する新しいオフラインRL手法を提案する。 D4RLの変種を利用して,不均衡の程度が異なるデータセットの文脈で、複数のタスクに対して本手法を評価した。実験結果は,本手法が他のベースラインよりも優れていることを示している。 The prevalent use of benchmarks in current offline reinforcement learning (RL) research has led to a neglect of the imbalance of real-world dataset distributions in the development of models. The real-world offline RL dataset is often imbalanced over the state space due to the challenge of exploration or safety considerations. In this paper, we specify properties of imbalanced datasets in offline RL, where the state coverage follows a power law distribution characterized by skewed policies. Theoretically and empirically, we show that typically offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective in extracting policies under the imbalanced dataset. Inspired by natural intelligence, we propose a novel offline RL method that utilizes the augmentation of CQL with a retrieval process to recall past related experiences, effectively alleviating the challenges posed by imbalanced datasets. We evaluate our method on several tasks in the context of imbalanced datasets with varying levels of imbalance, utilizing the variant of D4RL. Empirical results demonstrate the superiority of our method over other baselines.	翻訳日:2023-07-07 15:11:48 公開日:2023-07-06
# OLR-WA: 重み付き平均によるオンライン回帰 OLR-WA Online Regression with Weighted Average ( http://arxiv.org/abs/2307.02804v1 ) ライセンス: Link先を確認	Mohammad Abu-Shaira and Greg Speegle	(参考訳) 機械学習は正確なモデルを構築するために大量のトレーニングデータを必要とする。データが時間とともに到着する場合、大きなストレージスペースが必要になり、新しいデータを反映するためにモデルを再計算しなければならない。オンライン学習は、データが到着するたびにモデルを漸進的に修正し、その後データを破棄することで、これらの問題に対処する。本研究では,新しいオンライン線形回帰手法を提案する。このアプローチでは、新たに到着したデータと既存のモデルを組み合わせて、新しいモデルを作成する。 OLR-WA (OnLine Regression with Weighted Average) と名付けられたこのモデルは,ユーザ定義の重みを使用することで、変化するデータに対して柔軟に対応し,結果を古いデータ寄りにも新しいデータ寄りにも調整できる。我々は,OLR-WAをデータセット全体を用いた静的モデルと比較する2次元および3次元の実験を行った。その結果、一貫性のあるデータの場合、OLR-WAと静的バッチモデルは同様の性能を示し、変化するデータの場合、ユーザーはOLR-WAをより迅速に適応させるか、変化に抵抗させるかを設定できる。 Machine Learning requires a large amount of training data in order to build accurate models. Sometimes the data arrives over time, requiring significant storage space and recalculating the model to account for the new data. On-line learning addresses these issues by incrementally modifying the model as data is encountered, and then discarding the data. In this study we introduce a new online linear regression approach. Our approach combines newly arriving data with a previously existing model to create a new model. The introduced model, named OLR-WA (OnLine Regression with Weighted Average) uses user-defined weights to provide flexibility in the face of changing data to bias the results in favor of old or new data. We have conducted 2-D and 3-D experiments comparing OLR-WA to a static model using the entire data set. The results show that for consistent data, OLR-WA and the static batch model perform similarly and for varying data, the user can set the OLR-WA to adapt more quickly or to resist change.	翻訳日:2023-07-07 15:07:07 公開日:2023-07-06
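The idea in the OLR-WA entry above, combining an existing model with newly arriving data through user-defined weights and then discarding the data, can be sketched as follows. This is a simplified illustration and not the paper's algorithm: the abstract does not say how the two models are combined, so averaging the coefficient vectors is an assumption made here, and `fit_linear`, `weighted_update`, `w_old`, and `w_new` are hypothetical names.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least-squares fit; returns [slope_1, ..., slope_d, intercept]."""
    design = np.column_stack([X, np.ones(len(X))])  # append intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def weighted_update(old_coef, X_new, y_new, w_old=0.5, w_new=0.5):
    """Blend the existing model with a model fitted on the new batch only.

    Assumption for this sketch: the blend is a weighted average of the
    coefficient vectors. A larger w_old resists change; a larger w_new
    adapts faster. The new batch can be discarded after the update.
    """
    new_coef = fit_linear(X_new, y_new)
    return (w_old * old_coef + w_new * new_coef) / (w_old + w_new)


# Toy usage: the underlying relationship drifts from y = 2x + 1 to y = 4x + 3.
X = np.array([[0.0], [1.0], [2.0]])
model = fit_linear(X, 2 * X[:, 0] + 1)
model = weighted_update(model, X, 4 * X[:, 0] + 3, w_old=0.5, w_new=0.5)
print(model)  # slope and intercept move halfway toward the new batch
```

Only the current coefficient vector is kept between updates, which is what lets an online method avoid storing past data.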
# テンソル回帰を用いた構造的グローバル情報保存のためのFew-ShotパーソナライズSaliency予測 Few-Shot Personalized Saliency Prediction Using Tensor Regression for Preserving Structural Global Information ( http://arxiv.org/abs/2307.02799v1 ) ライセンス: Link先を確認	Yuya Moroto, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama	(参考訳) 本稿では,パーソナライズド・サリエンシーマップ(PSM)の構造的グローバル情報を保存するために,テンソルから行列への回帰を用いたFew-Shotのパーソナライズド・サリエンシー予測を提案する。一般のサリエンシーマップとは対照的に、PSMは、注視領域の不均一性から個人の視覚的嗜好を得るのに有用な人物固有の視覚的注意を示すため、大きな可能性を秘めている。 PSM予測は、未見の画像に対するPSMを取得するために必要であるが、個人の視線パターンの複雑さのため、依然として難しい課題である。限られた量の視線追跡データから個人の視線パターンを認識するために、従来の手法は人物間の視線傾向の類似性を利用する。しかし、従来の手法では、予測モデルのためにPSMがベクトル化される。そのため、画像に対応するPSMの構造的グローバル情報が無視される。 PSM間の関係を自動的に明らかにするために,PSMの構造情報を保存できるテンソルに基づく回帰モデルに着目し,予測精度の向上を実現する。実験の結果,テンソルベース回帰を含む提案手法が比較手法より優れていることを確認した。 This paper presents a few-shot personalized saliency prediction using tensor-to-matrix regression for preserving the structural global information of personalized saliency maps (PSMs). In contrast to a general saliency map, a PSM has been great potential since its map indicates the person-specific visual attention that is useful for obtaining individual visual preferences from heterogeneity of gazed areas. The PSM prediction is needed for acquiring the PSM for the unseen image, but its prediction is still a challenging task due to the complexity of individual gaze patterns. For recognizing individual gaze patterns from the limited amount of eye-tracking data, the previous methods adopt the similarity of gaze tendency between persons. However, in the previous methods, the PSMs are vectorized for the prediction model. In this way, the structural global information of the PSMs corresponding to the image is ignored. For automatically revealing the relationship between PSMs, we focus on the tensor-based regression model that can preserve the structural information of PSMs, and realize the improvement of the prediction accuracy. 
In the experimental results, we confirm the proposed method including the tensor-based regression outperforms the comparative methods.	翻訳日:2023-07-07 15:06:46 公開日:2023-07-06
# 一貫性正則化と分離型コントラスト学習による半教師付き領域適応型医用画像分割 Semi-supervised Domain Adaptive Medical Image Segmentation through Consistency Regularized Disentangled Contrastive Learning ( http://arxiv.org/abs/2307.02798v1 ) ライセンス: Link先を確認	Hritam Basak, Zhaozheng Yin	(参考訳) 教師なしドメイン適応(UDA)は、ドメインシフトを軽減するための有望な方向であるが、教師ありの手法には及ばない。本研究では,比較的研究の少ない医用画像セグメンテーションのための半教師付き領域適応 (SSDA) を検討し,少数のラベル付き対象サンプルを利用できれば適応性能が大幅に向上しうることを示す。具体的には,2段階のトレーニングプロセスを提案する。まず、ドメインとコンテンツを分離した(disentangled)新しいコントラスト学習(CL)と画素レベルの特徴一貫性制約を用いて、自己学習パラダイムでエンコーダを事前学習する。提案したCLは,ソース画像とターゲット画像から、識別的でコンテンツ固有かつドメイン不変なセマンティクスをグローバルスケールで学習するようエンコーダに強制し,一方で一貫性正則化は,空間感度を維持することで局所的な画素レベルの情報のマイニングを強制する。この事前学習済みエンコーダはデコーダと共に、半教師付き設定を用いて下流タスク(画素レベルのセグメンテーション)に対してさらに微調整される。さらに,提案手法がUDA設定にも容易に拡張できることを実験的に検証し,提案戦略の優位性を補強する。 2つの領域適応画像セグメンテーションタスクでの評価において、提案手法は,SSDAおよびUDA設定の両方でSoTA法よりも優れている。コードは https://github.com/hritam-98/GFDA-disentangled で入手できる。 Although unsupervised domain adaptation (UDA) is a promising direction to alleviate domain shift, they fall short of their supervised counterparts. In this work, we investigate relatively less explored semi-supervised domain adaptation (SSDA) for medical image segmentation, where access to a few labeled target samples can improve the adaptation performance substantially. Specifically, we propose a two-stage training process. First, an encoder is pre-trained in a self-learning paradigm using a novel domain-content disentangled contrastive learning (CL) along with a pixel-level feature consistency constraint. The proposed CL enforces the encoder to learn discriminative content-specific but domain-invariant semantics on a global scale from the source and target images, whereas consistency regularization enforces the mining of local pixel-level information by maintaining spatial sensitivity. This pre-trained encoder, along with a decoder, is further fine-tuned for the downstream task, (i.e. pixel-level segmentation) using a semi-supervised setting. 
Furthermore, we experimentally validate that our proposed method can easily be extended for UDA settings, adding to the superiority of the proposed strategy. Upon evaluation on two domain adaptive image segmentation tasks, our proposed method outperforms the SoTA methods, both in SSDA and UDA settings. Code is available at https://github.com/hritam-98/GFDA-disentangled	翻訳日:2023-07-07 15:06:12 公開日:2023-07-06
# bheisr: バイアスからバランスへ - 知識に基づくレコメンデーションにおけるイデオロギー分離を排除することによって信念調和を促進する BHEISR: Nudging from Bias to Balance -- Promoting Belief Harmony by Eliminating Ideological Segregation in Knowledge-based Recommendations ( http://arxiv.org/abs/2307.02797v1 ) ライセンス: Link先を確認	Mengyan Wang, Yuxuan Hu, Zihan Yuan, Chenting Jiang, Weihua Li, Shiqing Wu and Quan Bai	(参考訳) パーソナライズされたレコメンデーションシステムの領域では、信念の不均衡とユーザのバイアスの増幅が懸念されている。そこで本研究では,既存のレコメンデーションシステムにおけるフィルタバブル効果の悪影響を軽減させるため,ユーザと既存レコメンデーションシステム間の革新的な中間機関(bheisr)を提案する。主な目的は,フィルタバブルによる有害な影響を最小限に抑えつつ,ユーザの信念バランスを打つことである。 BHEISRモデルは、民主的かつ透明な原則を支持しながら、ナッジ理論から原則を取り入れている。ユーザー固有のカテゴリー情報を利用して好奇心を刺激する。新たなカテゴリーへの関心を徐々に刺激することで、このモデルはユーザーが信念の地平を広げ、通常見落としている情報を探索することを奨励する。我々のモデルは時間に敏感であり、ユーザのフィードバックループで動作する。モデルの既存のレコメンデーションアルゴリズムを使用し、事前の時間フレームからのユーザフィードバックを組み込む。このアプローチは、フィルターバブルの制約を超越し、レコメンデーションの多様性を高め、ユーザー間の信念バランスを保ちつつ、ユーザの好みやシステム固有のビジネス要件にも応えます。 BHEISRモデルの有効性と信頼性を検証するため,実世界のデータセットを用いた総合実験を行った。これらの実験は、200人近いフィルターバブルユーザを試験対象として、bheisrモデルの性能をいくつかのベースラインモデルと比較した。実験結果は,フィルタバブルの緩和とユーザ視点のバランスをとる上で,BHEISRモデルの優れた性能を示すものである。 In the realm of personalized recommendation systems, the increasing concern is the amplification of belief imbalance and user biases, a phenomenon primarily attributed to the filter bubble. Addressing this critical issue, we introduce an innovative intermediate agency (BHEISR) between users and existing recommendation systems to attenuate the negative repercussions of the filter bubble effect in extant recommendation systems. The main objective is to strike a belief balance for users while minimizing the detrimental influence caused by filter bubbles. The BHEISR model amalgamates principles from nudge theory while upholding democratic and transparent principles. It harnesses user-specific category information to stimulate curiosity, even in areas users might initially deem uninteresting. 
By progressively stimulating interest in novel categories, the model encourages users to broaden their belief horizons and explore the information they typically overlook. Our model is time-sensitive and operates on a user feedback loop. It utilizes the existing recommendation algorithm of the model and incorporates user feedback from the prior time frame. This approach endeavors to transcend the constraints of the filter bubble, enrich recommendation diversity, and strike a belief balance among users while also catering to user preferences and system-specific business requirements. To validate the effectiveness and reliability of the BHEISR model, we conducted a series of comprehensive experiments with real-world datasets. These experiments compared the performance of the BHEISR model against several baseline models using nearly 200 filter bubble-impacted users as test subjects. Our experimental results conclusively illustrate the superior performance of the BHEISR model in mitigating filter bubbles and balancing user perspectives.	翻訳日:2023-07-07 15:05:29 公開日:2023-07-06
# VerifAI: 検証された生成AI VerifAI: Verified Generative AI ( http://arxiv.org/abs/2307.02796v1 ) ライセンス: Link先を確認	Nan Tang and Chenyu Yang and Ju Fan and Lei Cao	(参考訳) 生成AIは大きな進歩を遂げているが、アウトプットの正確性と信頼性に関する懸念は拡大を続けている。このような不正確さは、不正確な意思決定、誤った情報の拡散、プライバシー侵害、法的負債など、重大な結果をもたらす可能性がある。透明性、プライバシ保護、バイアス軽減、社会的および環境的責任といった、説明可能なAIと責任あるAIプラクティスを含む、これらのリスクに対処する努力が進行中である。データ管理の観点から生成AIの出力を検証することは、生成AIの新たな課題である。これには、テキストファイル、テーブル、ナレッジグラフを含むマルチモーダルデータレイクの基盤となるデータを分析し、その品質と一貫性を評価することが含まれる。これにより、生成AIモデルの出力を評価するためのより強力な基盤を確立することができる。このようなアプローチは、生成AIの正確性を確保し、透明性を促進し、より信頼性の高い意思決定を可能にする。私たちのビジョンは、検証可能な生成AIの開発を促進し、より信頼性が高く責任あるAIの利用に貢献することです。 Generative AI has made significant strides, yet concerns about the accuracy and reliability of its outputs continue to grow. Such inaccuracies can have serious consequences such as inaccurate decision-making, the spread of false information, privacy violations, legal liabilities, and more. Although efforts to address these risks are underway, including explainable AI and responsible AI practices such as transparency, privacy protection, bias mitigation, and social and environmental responsibility, misinformation caused by generative AI will remain a significant challenge. We propose that verifying the outputs of generative AI from a data management perspective is an emerging issue for generative AI. This involves analyzing the underlying data from multi-modal data lakes, including text files, tables, and knowledge graphs, and assessing its quality and consistency. By doing so, we can establish a stronger foundation for evaluating the outputs of generative AI models. Such an approach can ensure the correctness of generative AI, promote transparency, and enable decision-making with greater confidence. Our vision is to promote the development of verifiable generative AI and contribute to a more trustworthy and responsible use of AI.	翻訳日:2023-07-07 15:04:51 公開日:2023-07-06
# データサイエンス教育は大規模言語モデルで何をすべきか? What Should Data Science Education Do with Large Language Models? ( http://arxiv.org/abs/2307.02792v1 ) ライセンス: Link先を確認	Xinming Tu, James Zou, Weijie J. Su, Linjun Zhang	(参考訳) ChatGPTのような大規模言語モデル(LLM)の急速な進歩は、データサイエンスと統計学に革命をもたらしている。これらの最先端ツールは複雑なプロセスを合理化する。その結果、データサイエンティストの役割が再認識される。 LLMはデータサイエンティストの責務を転換し、手作業によるコーディング、データラングリング、標準分析から、これらの自動化AIによる分析の評価と管理へと焦点を移している、と私たちは主張する。この役割の進化は、ソフトウェアエンジニアからプロダクトマネージャへの移行を思い起こさせる。本稿では, LLMを用いた具体的なデータサイエンスケーススタディを用いて, この変遷を説明する。これらの発展は、データサイエンス教育において有意義な進化を必要とする。教育は、LLMインフォームドクリエイティビティ、批判的思考、AI誘導プログラミングなど、学生の間で多様なスキルセットの育成に重点を置く必要がある。 LLMは教室でインタラクティブな教育と学習ツールとして重要な役割を担い、パーソナライズされた教育に寄与する。本稿では,これら各方向性に対する機会,資源,オープンな課題について論じる。あらゆるトランスフォーメーション技術と同様に、教育にllmを統合するには慎重に検討する必要がある。 LLMは反復作業を効率的に行うことができますが、その役割は人間の知性と創造性を補うことであり、それを置き換えることではありません。したがって、データサイエンス教育の新しい時代は、人間の専門知識とイノベーションを補完しながら、llmの利点のバランスをとるべきである。結論として、LLMの台頭はデータサイエンスとその教育の転換期を告げている。本稿は,このパラダイムシフトに伴う新たなトレンド,潜在的な機会,課題を浮き彫りにし,エキサイティングで未解決な領域に関するさらなる談話や調査のきっかけとなることを願っている。 The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art tools can streamline complex processes. As a result, it reshapes the role of data scientists. We argue that LLMs are transforming the responsibilities of data scientists, shifting their focus from hands-on coding, data-wrangling and conducting standard analyses to assessing and managing analyses performed by these automated AIs. This evolution of roles is reminiscent of the transition from a software engineer to a product manager. We illustrate this transition with concrete data science case studies using LLMs in this paper. These developments necessitate a meaningful evolution in data science education. Pedagogy must now place greater emphasis on cultivating diverse skillsets among students, such as LLM-informed creativity, critical thinking, AI-guided programming. 
LLMs can also play a significant role in the classroom as interactive teaching and learning tools, contributing to personalized education. This paper discusses the opportunities, resources and open challenges for each of these directions. As with any transformative technology, integrating LLMs into education calls for careful consideration. While LLMs can perform repetitive tasks efficiently, it's crucial to remember that their role is to supplement human intelligence and creativity, not to replace it. Therefore, the new era of data science education should balance the benefits of LLMs while fostering complementary human expertise and innovations. In conclusion, the rise of LLMs heralds a transformative period for data science and its education. This paper seeks to shed light on the emerging trends, potential opportunities, and challenges accompanying this paradigm shift, hoping to spark further discourse and investigation into this exciting, uncharted territory.	翻訳日:2023-07-07 15:04:31 公開日:2023-07-06
# グループフェア医療画像分類におけるサブグループ分離性の役割 The Role of Subgroup Separability in Group-Fair Medical Image Classification ( http://arxiv.org/abs/2307.02791v1 ) ライセンス: Link先を確認	Charles Jones, Mélanie Roschewitz, Ben Glocker	(参考訳) 深層分類器の性能差を調査した。分類器が個人をサブグループに分ける能力は, 医用画像のモダリティや保護特性によって大きく異なっており, この特性がアルゴリズムバイアスの予測であることを示す。理論解析と広範な経験的評価を通じて,下位診断などの体系的バイアスのあるデータを用いてモデルが訓練された場合,サブグループ分離可能性,サブグループ格差,パフォーマンス低下の関係を見出した。私たちの発見は、モデルがどのように偏見を抱くかという問題に新たな光を当て、公正な医療画像AIの開発に重要な洞察を与えました。 We investigate performance disparities in deep classifiers. We find that the ability of classifiers to separate individuals into subgroups varies substantially across medical imaging modalities and protected characteristics; crucially, we show that this property is predictive of algorithmic bias. Through theoretical analysis and extensive empirical evaluation, we find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis. Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.	翻訳日:2023-07-07 15:04:03 公開日:2023-07-06
# MEDVQA-GI 2023 における UIT-Saviors: 画像強調によるマルチモーダル学習の改善 UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering ( http://arxiv.org/abs/2307.02783v1 ) ライセンス: Link先を確認	Triet M. Thai, Anh T. Vo, Hao K. Tieu, Linh N.P. Bui, Thien T.B. Nguyen	(参考訳) 近年、人工知能は医学や疾患の診断において重要な役割を担い、その1つはMedVQA(MedVQA)である。コンピュータビジョンと自然言語処理を組み合わせることで、MedVQAシステムは、与えられた質問に基づいて医療画像から関連情報を抽出し、正確な診断回答を提供する専門家を支援することができる。 ImageCLEFmed-MEDVQA-GI-2023は胃内視鏡および大腸内視鏡画像を含む消化管領域の視覚的質問応答タスクを実行した。我々のチームは,胃腸画像上のVQA性能を改善するために,画像強調によるマルチモーダル学習手法を提案することで課題1にアプローチした。マルチモーダルアーキテクチャは、BERTエンコーダと、畳み込みニューラルネットワーク(CNN)とトランスフォーマーアーキテクチャに基づいて、質問や内視鏡画像から特徴抽出のための様々な事前訓練されたビジョンモデルを備える。本研究は,CNN上でのトランスフォーマーベース視覚モデルの優位性を強調し,F1スコアが向上した8つの視覚モデルのうち6つを用いて,画像強調処理の有効性を示した。 BERT+BEiT融合と画像強調の利点を生かし, 開発テストセット上で最大87.25%の精度と91.85%のF1スコアを達成するとともに, 82.01%の精度でプライベートテストセット上で良好な結果が得られる。 In recent years, artificial intelligence has played an important role in medicine and disease diagnosis, with many applications to be mentioned, one of which is Medical Visual Question Answering (MedVQA). By combining computer vision and natural language processing, MedVQA systems can assist experts in extracting relevant information from medical image based on a given question and providing precise diagnostic answers. The ImageCLEFmed-MEDVQA-GI-2023 challenge carried out visual question answering task in the gastrointestinal domain, which includes gastroscopy and colonoscopy images. Our team approached Task 1 of the challenge by proposing a multimodal learning method with image enhancement to improve the VQA performance on gastrointestinal images. The multimodal architecture is set up with BERT encoder and different pre-trained vision models based on convolutional neural network (CNN) and Transformer architecture for features extraction from question and endoscopy image. 
The result of this study highlights the dominance of Transformer-based vision models over the CNNs and demonstrates the effectiveness of the image enhancement process, with six out of the eight vision models achieving better F1-Score. Our best method, which takes advantages of BERT+BEiT fusion and image enhancement, achieves up to 87.25% accuracy and 91.85% F1-Score on the development test set, while also producing good result on the private test set with accuracy of 82.01%.	翻訳日:2023-07-07 15:03:51 公開日:2023-07-06
# 大規模言語モデルによるコネクテッドインテリジェンスのための自律エッジAI Large Language Models Empowered Autonomous Edge AI for Connected Intelligence ( http://arxiv.org/abs/2307.02779v1 ) ライセンス: Link先を確認	Yifei Shen, Jiawei Shao, Xinjie Zhang, Zehong Lin, Hao Pan, Dongsheng Li, Jun Zhang, Khaled B. Letaief	(参考訳) ワイヤレスネットワークの進化は、超接続されたサイバー物理世界における人間、物体、および知性のシームレスな相互接続を想定した、コネクテッド・インテリジェンス(connected intelligence)へと向かっている。エッジAIは、ネットワークエッジで高品質で低レイテンシ、プライバシ保護のAIサービスを提供することで、コネクテッドインテリジェンスを実現するための有望なソリューションとして登場します。本稿では,ユーザの多様な要件を満たすために,自動編成,適応,最適化を行う自律エッジAIシステムを紹介する。このシステムはクラウド・エッジ・クライアントの階層アーキテクチャを採用しており、大きな言語モデル、すなわちジェネレーティブ・プレトレーニング・トランスフォーマー(GPT)がクラウドに存在し、他のAIモデルがデバイスやエッジサーバで共同デプロイされる。言語理解,計画,コード生成におけるGPTの強力な能力を活用することで,エッジAIモデルを効率的にコーディネートし,ユーザの個人的要求に応えるとともに,エッジフェデレーション学習を通じて新たなモデルをトレーニングするためのコードを自動的に生成する,汎用的なフレームワークを提案する。実験結果は,ユーザの要求を正確に理解し,最小限のコストでaiモデルを効率的に実行し,連合学習による高性能aiモデルを効果的に作成するシステムの驚くべき能力を示している。 The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge AI emerges as a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving AI services at the network edge. In this article, we introduce an autonomous edge AI system that automatically organizes, adapts, and optimizes itself to meet users' diverse requirements. The system employs a cloud-edge-client hierarchical architecture, where the large language model, i.e., Generative Pretrained Transformer (GPT), resides in the cloud, and other AI models are co-deployed on devices and edge servers. By leveraging the powerful abilities of GPT in language understanding, planning, and code generation, we present a versatile framework that efficiently coordinates edge AI models to cater to users' personal demands while automatically generating code to train new models via edge federated learning. 
Experimental results demonstrate the system's remarkable ability to accurately comprehend user demands, efficiently execute AI models with minimal cost, and effectively create high-performance AI models through federated learning.	翻訳日:2023-07-07 15:03:23 公開日:2023-07-06
# SeLiNet:画像の感情認識のための高密度軽量ネットワーク SeLiNet: Sentiment enriched Lightweight Network for Emotion Recognition in Images ( http://arxiv.org/abs/2307.02773v1 ) ライセンス: Link先を確認	Tuneer Khargonkar, Shwetank Choudhary, Sumit Kumar, Barath Raj KR	(参考訳) 本稿では,感情に富んだ軽量ネットワークSeLiNetと,画像の文脈的感情認識のためのエンド・ツー・エンド・デバイス・パイプラインを提案する。 SeLiNetモデルは、身体特徴抽出器、画像美学特徴抽出器、学習ベース融合ネットワークから構成され、個別の感情と人間の感情を共同で推定する。 EMOTICデータセットでは,ベースラインAPスコアの27.38に対して平均精度(AP)スコアの27.17を達成し,モデルサイズを85%削減する。さらに,ベースラインと比較してモデルサイズが93%以上減少する26.42点のオンデバイスapスコアを報告した。 In this paper, we propose a sentiment-enriched lightweight network SeLiNet and an end-to-end on-device pipeline for contextual emotion recognition in images. SeLiNet model consists of body feature extractor, image aesthetics feature extractor, and learning-based fusion network which jointly estimates discrete emotion and human sentiments tasks. On the EMOTIC dataset, the proposed approach achieves an Average Precision (AP) score of 27.17 in comparison to the baseline AP score of 27.38 while reducing the model size by >85%. In addition, we report an on-device AP score of 26.42 with reduction in model size by >93% when compared to the baseline.	翻訳日:2023-07-07 15:02:55 公開日:2023-07-06
# 逆プロンプトによるクロスドメインスロット充足のためのゼロショットプロンプト学習 Generative Zero-Shot Prompt Learning for Cross-Domain Slot Filling with Inverse Prompting ( http://arxiv.org/abs/2307.02830v1 ) ライセンス: Link先を確認	Xuefeng Li, Liwen Wang, Guanting Dong, Keqing He, Jinzheng Zhao, Hao Lei, Jiachi Liu, Weiran Xu	(参考訳) ゼロショットクロスドメインスロットフィリングは、ラベル付きソースドメインからラベルなしターゲットドメインへの知識の転送を目的としている。既存のモデルはスロット記述や例をエンコードするか、ヒューリスティックなルールを使って手作りの質問テンプレートを設計する。本稿では,クロスドメインスロット充填のための生成的ゼロショットプロンプト学習フレームワークを提案する。さらに,複数の予測問題を回避するために,異なるスロットタイプを識別する新しい逆プロンプト戦略と,プロンプトパラメータの少ないトレーニングだけで高いパフォーマンスを向上させる効率的なプロンプトチューニング戦略を導入する。実験と解析により提案手法の有効性が示され、特に未確認スロットにおける大幅な改善(+13.44% F1)が示された。 Zero-shot cross-domain slot filling aims to transfer knowledge from the labeled source domain to the unlabeled target domain. Existing models either encode slot descriptions and examples or design handcrafted question templates using heuristic rules, suffering from poor generalization capability or robustness. In this paper, we propose a generative zero-shot prompt learning framework for cross-domain slot filling, both improving generalization and robustness than previous work. Besides, we introduce a novel inverse prompting strategy to distinguish different slot types to avoid the multiple prediction problem, and an efficient prompt-tuning strategy to boost higher performance by only training fewer prompt parameters. Experiments and analysis demonstrate the effectiveness of our proposed framework, especially huge improvements (+13.44% F1) on the unseen slots.	翻訳日:2023-07-07 14:56:10 公開日:2023-07-06
# 政策コントラスト模倣学習 Policy Contrastive Imitation Learning ( http://arxiv.org/abs/2307.02829v1 ) ライセンス: Link先を確認	Jialei Huang, Zhaoheng Yin, Yingdong Hu, Yang Gao	(参考訳) 逆模倣学習(英: Adversarial mimicion learning, AIL)は、最近多くの成功を収めた人気手法である。しかしながら、AILのパフォーマンスは、より困難なタスクにはまだ満足できません。主な原因の1つは、AIL識別器の低品質化によるものである。 AIL判別器は、必ずしも専門家から政策を有意義に区別するとは限らないバイナリ分類によって訓練されるので、結果として得られる報酬も意味のあるものではないかもしれない。この問題を解決するために,政策コントラスト模倣学習(PCIL)と呼ばれる新しい手法を提案する。 PCILは異なるポリシーを固定することでコントラスト表現空間を学び、スムーズなコサイン類似性に基づく報酬を生成する。提案する表現学習目標は,ail目標のより強固なバージョンと見なすことができ,エージェントとポリシーのより有意義な比較を行うことができる。理論的観点から,見習い学習フレームワークを用いた手法の有効性を示す。さらに,DeepMind Control スイートの実証評価により,PCIL が最先端の性能を達成できることが実証された。最後に、定性的な結果は、PCILが模倣学習のためのより滑らかで意味のある表現空間を構築することを示唆している。 Adversarial imitation learning (AIL) is a popular method that has recently achieved much success. However, the performance of AIL is still unsatisfactory on the more challenging tasks. We find that one of the major reasons is due to the low quality of AIL discriminator representation. Since the AIL discriminator is trained via binary classification that does not necessarily discriminate the policy from the expert in a meaningful way, the resulting reward might not be meaningful either. We propose a new method called Policy Contrastive Imitation Learning (PCIL) to resolve this issue. PCIL learns a contrastive representation space by anchoring on different policies and generates a smooth cosine-similarity-based reward. Our proposed representation learning objective can be viewed as a stronger version of the AIL objective and provide a more meaningful comparison between the agent and the policy. From a theoretical perspective, we show the validity of our method using the apprenticeship learning framework. Furthermore, our empirical evaluation on the DeepMind Control suite demonstrates that PCIL can achieve state-of-the-art performance. Finally, qualitative results suggest that PCIL builds a smoother and more meaningful representation space for imitation learning.	
翻訳日:2023-07-07 14:55:55 公開日:2023-07-06
# サンプリング型高速勾配再スケーリング法による高転送性逆襲攻撃 Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks ( http://arxiv.org/abs/2307.02828v1 ) ライセンス: Link先を確認	Xu Han, Anmin Liu, Chenxuan Yao, Yanbo Fan, Kun He	(参考訳) 深層ニューラルネットワークは、人間の知覚できない摂動を良心的な入力に加えることで、敵の例に弱いことが知られている。ホワイトボックス設定で100%近い攻撃成功率を達成した後、ブラックボックス攻撃に焦点を移し、敵の事例の転送可能性に大きな注目を集めている。いずれの場合も、一般的な勾配法は一般に手動関数を用いて勾配更新の摂動を生成するが、これは概ね正しい方向を与え、大きな成功を収めた。しかし、その限界に注意を払う仕事はほとんどない。本研究では,元の勾配と発生する雑音との偏差が不正確な勾配更新推定と逆移動可能性に対する最適解をもたらすことを観測する。そこで本研究では,サンプリングに基づくFast Gradient Rescaling Method (S-FGRM)を提案する。具体的には、余分な計算コストを伴わずに手話関数を置換するためにデータ再スケーリングを用いる。さらに,再スケーリングの変動を解消し,勾配更新を安定化するDepth First Smpling法を提案する。本手法は任意の勾配に基づく攻撃に適用可能であり, 様々な入力変換やアンサンブル手法と統合して, 対向移動性の向上を図ることができる。標準のImageNetデータセットに対する大規模な実験により、我々の手法は勾配に基づく攻撃の転送可能性を大幅に向上し、最先端のベースラインよりも優れることが示された。 Deep neural networks are known to be vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to the benign input. After achieving nearly 100% attack success rates in white-box setting, more focus is shifted to black-box attacks, of which the transferability of adversarial examples has gained significant attention. In either case, the common gradient-based methods generally use the sign function to generate perturbations on the gradient update, that offers a roughly correct direction and has gained great success. But little work pays attention to its possible limitation. In this work, we observe that the deviation between the original gradient and the generated noise may lead to inaccurate gradient update estimation and suboptimal solutions for adversarial transferability. To this end, we propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM). Specifically, we use data rescaling to substitute the sign function without extra computational cost. We further propose a Depth First Sampling method to eliminate the fluctuation of rescaling and stabilize the gradient update. 
Our method can be used in any gradient-based attack and can be integrated with various input transformation or ensemble methods to further improve adversarial transferability. Extensive experiments on the standard ImageNet dataset show that our method significantly boosts the transferability of gradient-based attacks and outperforms the state-of-the-art baselines.	翻訳日:2023-07-07 14:55:35 公開日:2023-07-06
# 高次流線微分方程式を用いた束特異的道図分布推定 Bundle-specific Tractogram Distribution Estimation Using Higher-order Streamline Differential Equation ( http://arxiv.org/abs/2307.02825v1 ) ライセンス: Link先を確認	Yuanjing Feng, Lei Xie, Jingqiang Wang, Jianzhong He, Fei Gao	(参考訳) トラクトグラフィーは、拡散方向と繊維幾何学との間の不明瞭な空間的対応に苦しむ繊維配向分布(FOD)から抽出されたピーク方向をトレースする。ピークに基づくトラクトグラフィ手法は「局所的」に復元された流線を「単一」にすることで,繊維束全体の傾向に関する全体的情報に欠ける。本研究では,「クラスターからクラスタへの」方法で流線束を再構成する高次流線微分方程式を用いて,束特異的な気道分布関数に基づく新しい気道図法を提案する。任意の高階ストリームライン微分方程式の統一的フレームワークを示し、拡散テンソルベクトル場に基づいて定義される不整合ストリームラインを持つファイバーバンドルを記述する。大域的なレベルでは、エネルギー最適化モデルを最小化することにより、束特異的なトラクトグラム分布(BTD)係数の推定を簡略化し、トラクトグラムバンドル情報を導入して解剖学的先行情報を提供することにより、事前指導の下でBTDと拡散テンソルベクトルの関係を特徴づける。シミュレートハフ、サイン、円データ、ismrm 2015路面図チャレンジデータ、fibercupデータ、およびヒトコネクトームプロジェクト(hcp)データからのin vivoデータを用いて、質的、定量的評価を行う実験を行った。その結果,本手法は複雑な大域繊維束を直接再構成できることがわかった。 BTDは、局所レベルでの誤差の偏差と蓄積を低減し、長距離、ねじれ、大きなファンニングトラクトを再構築するより良い結果を示す。 Tractography traces the peak directions extracted from fiber orientation distribution (FOD) suffering from ambiguous spatial correspondences between diffusion directions and fiber geometry, which is prone to producing erroneous tracks while missing true positive connections. The peaks-based tractography methods 'locally' reconstructed streamlines in 'single to single' manner, thus lacking of global information about the trend of the whole fiber bundle. In this work, we propose a novel tractography method based on a bundle-specific tractogram distribution function by using a higher-order streamline differential equation, which reconstructs the streamline bundles in 'cluster to cluster' manner. A unified framework for any higher-order streamline differential equation is presented to describe the fiber bundles with disjoint streamlines defined based on the diffusion tensor vector field. 
At the global level, the tractography process is simplified as the estimation of bundle-specific tractogram distribution (BTD) coefficients by minimizing the energy optimization model, and is used to characterize the relations between BTD and diffusion tensor vector under the prior guidance by introducing the tractogram bundle information to provide anatomic priors. Experiments are performed on simulated Hough, Sine, Circle data, ISMRM 2015 Tractography Challenge data, FiberCup data, and in vivo data from the Human Connectome Project (HCP) data for qualitative and quantitative evaluation. The results demonstrate that our approach can reconstruct the complex global fiber bundles directly. BTD reduces the error deviation and accumulation at the local level and shows better results in reconstructing long-range, twisting, and large fanning tracts.	翻訳日:2023-07-07 14:55:11 公開日:2023-07-06
# 音声感情認識のためのディープラーニングフレームワークによる生波形の評価 Evaluating raw waveforms with deep learning frameworks for speech emotion recognition ( http://arxiv.org/abs/2307.02820v1 ) ライセンス: Link先を確認	Zeynep Hilal Kilimci, Ulku Bayraktar, Ayhan Kucukmanisa	(参考訳) 音声感情認識は音声処理分野における課題である。このため,特徴抽出プロセスは音声信号の実証と処理において重要な役割を担っている。本研究では、EMO-DB、RAVDESS、TESS、CREMA、SAVEE、TESS+RAVDESSの6つの異なるデータセットを利用した感情の認識のための特徴抽出段階なしで、生オーディオファイルをディープニューラルネットワークに直接供給するモデルを示す。提案モデルの寄与を実証するために,メルスケールスペクトログラム,メル周波数ケプストラム係数といった従来の特徴抽出技術の性能を,機械学習アルゴリズム,アンサンブル学習手法,深層・ハイブリッド深層学習技術とブレンドする。サポートベクターマシン,決定木,ナイーブベイズ,ランダムフォレストモデルを機械学習アルゴリズムとして評価し,多数決とスタッキング法をアンサンブル学習手法として評価する。さらに,畳み込みニューラルネットワーク,長短期記憶(LSTM)ネットワーク,ハイブリッドCNN-LSTMモデルをディープラーニング手法として評価し,機械学習やアンサンブル学習法と比較した。提案モデルの有効性を示すため,最新研究との比較を行った。実験結果に基づき、CNNモデルは、生のオーディオファイルを用いたTESS+RAVDESSデータセットの95.86%の精度で既存のアプローチに優れている。 CNNモデルによるEMO-DBの精度は90.34%、CNNモデルによるRAVDESSの精度は90.42%、LSTMモデルによるTESSの精度は99.48%、CNNモデルによるCREMAの精度は69.72%、CNNモデルによるSAVEEの精度は85.76%である。 Speech emotion recognition is a challenging task in speech processing field. For this reason, feature extraction process has a crucial importance to demonstrate and process the speech signals. In this work, we present a model, which feeds raw audio files directly into the deep neural networks without any feature extraction stage for the recognition of emotions utilizing six different data sets, EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS. To demonstrate the contribution of proposed model, the performance of traditional feature extraction techniques namely, mel-scale spectrogram, mel-frequency cepstral coefficients, are blended with machine learning algorithms, ensemble learning methods, deep and hybrid deep learning techniques. Support vector machine, decision tree, naive Bayes, random forests models are evaluated as machine learning algorithms while majority voting and stacking methods are assessed as ensemble learning techniques. 
Moreover, convolutional neural networks, long short-term memory networks, and a hybrid CNN-LSTM model are evaluated as deep learning techniques and compared with the machine learning and ensemble learning methods. To demonstrate the effectiveness of the proposed model, it is compared with state-of-the-art studies. Based on the experimental results, the CNN model outperforms existing approaches with 95.86% accuracy on the TESS+RAVDESS data set using raw audio files, setting a new state of the art. The proposed model achieves 90.34% accuracy on EMO-DB with the CNN model, 90.42% on RAVDESS with the CNN model, 99.48% on TESS with the LSTM model, 69.72% on CREMA with the CNN model, and 85.76% on SAVEE with the CNN model in speaker-independent audio categorization problems.	翻訳日:2023-07-07 14:54:38 公開日:2023-07-06
# 機械学習と脳波(EEG)の動向 Trends in Machine Learning and Electroencephalogram (EEG): A Review for Undergraduate Researchers ( http://arxiv.org/abs/2307.02819v1 ) ライセンス: Link先を確認	Nathan Koome Murungi, Michael Vinh Pham, Xufeng Dai, Xiaodong Qu	(参考訳) 本稿では,機械学習の文脈における脳-コンピュータインタフェース(BCI)に関する体系的な文献レビューを行う。私たちの焦点は脳波(EEG)研究であり、2023年現在の最新の傾向を浮き彫りにしている。目標は、bciフィールドのアクセス可能な概要を提供し、タスク、アルゴリズム、データセットをカバーすることにある。近年の知見を合成することにより,bci研究の基本的な理解を提供し,今後の研究に有望な道を見いだすことが目的である。 This paper presents a systematic literature review on Brain-Computer Interfaces (BCIs) in the context of Machine Learning. Our focus is on Electroencephalography (EEG) research, highlighting the latest trends as of 2023. The objective is to provide undergraduate researchers with an accessible overview of the BCI field, covering tasks, algorithms, and datasets. By synthesizing recent findings, our aim is to offer a fundamental understanding of BCI research, identifying promising avenues for future investigations.	翻訳日:2023-07-07 14:53:59 公開日:2023-07-06
# 高次ネットワークにおけるDegree Heterogeneity: Inference in the Hypergraph $\boldsymbol{\beta}$-Model Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model ( http://arxiv.org/abs/2307.02818v1 ) ライセンス: Link先を確認	Sagnik Nandy and Bhaswar B. Bhattacharya	(参考訳) ランダムグラフに対する$\boldsymbol{\beta}$-model は、次数の不均質なネットワーク内の対関係を表現するのによく用いられる。 stasi et al. (2014) は双対相互作用を超えて、高次(多方向)相互作用を持つネットワークの次数の不均一性を捉えるハイパーグラフ $\boldsymbol{\beta}$-モデルを導入した。本稿では,複数の層を持つハイパーグラフ $\boldsymbol{\beta}$-model の厳密な研究を開始する。まず,最大確率(ml)推定値の収束率を導出し,最小速度の最適性を確立する。また,ML推定の限界分布を導出し,モデルパラメータに対する漸近的に有効な信頼区間を構築する。次に、hypergraph $\boldsymbol{\beta}$-modelにおける適合性の問題を考察する。具体的には,ヌル仮説の下での度数比(lr)検定の漸近正規性を確立し,その検出しきい値と閾値での制限パワーを導出する。興味深いことに、LRテストの検出しきい値はこのしきい値以下で漸近的に無力である、最小限の最適値であることが判明した。理論的結果は数値実験でさらに検証される。ハイパーグラフ$\boldsymbol{\beta}$-モデルの推定と推論のための理論的フレームワークの開発に加えて、上記の結果は、ml推定の最小最適性やlrテストの非null性など、グラフ$\boldsymbol{\beta}$-モデル文献の多くのギャップを埋めている。 The $\boldsymbol{\beta}$-model for random graphs is commonly used for representing pairwise interactions in a network with degree heterogeneity. Going beyond pairwise interactions, Stasi et al. (2014) introduced the hypergraph $\boldsymbol{\beta}$-model for capturing degree heterogeneity in networks with higher-order (multi-way) interactions. In this paper we initiate the rigorous study of the hypergraph $\boldsymbol{\beta}$-model with multiple layers, which allows for hyperedges of different sizes across the layers. To begin with, we derive the rates of convergence of the maximum likelihood (ML) estimate and establish their minimax rate optimality. We also derive the limiting distribution of the ML estimate and construct asymptotically valid confidence intervals for the model parameters. Next, we consider the goodness-of-fit problem in the hypergraph $\boldsymbol{\beta}$-model. 
Specifically, we establish the asymptotic normality of the likelihood ratio (LR) test under the null hypothesis, derive its detection threshold, and also its limiting power at the threshold. Interestingly, the detection threshold of the LR test turns out to be minimax optimal, that is, all tests are asymptotically powerless below this threshold. The theoretical results are further validated in numerical experiments. In addition to developing the theoretical framework for estimation and inference for hypergraph $\boldsymbol{\beta}$-models, the above results fill a number of gaps in the graph $\boldsymbol{\beta}$-model literature, such as the minimax optimality of the ML estimates and the non-null properties of the LR test, which, to the best of our knowledge, have not been studied before.	翻訳日:2023-07-07 14:53:51 公開日:2023-07-06
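For reference, the edge probabilities behind the model can be written out. This is a sketch in the standard form of the $\boldsymbol{\beta}$-model from the literature, not a formula quoted from the abstract above; the symbols $n$, $k$, and $\beta_i$ are notation assumed here. With one degree parameter $\beta_i$ per node $i \in \{1,\dots,n\}$, the graph $\boldsymbol{\beta}$-model includes each edge independently with probability

$$
\mathbb{P}\big(\{i,j\} \in E\big) = \frac{e^{\beta_i + \beta_j}}{1 + e^{\beta_i + \beta_j}}, \qquad 1 \le i < j \le n,
$$

and the $k$-uniform hypergraph version replaces the pair by a set of $k$ distinct nodes:

$$
\mathbb{P}\big(\{i_1,\dots,i_k\} \in E\big) = \frac{\exp\!\big(\beta_{i_1} + \cdots + \beta_{i_k}\big)}{1 + \exp\!\big(\beta_{i_1} + \cdots + \beta_{i_k}\big)}, \qquad 1 \le i_1 < \cdots < i_k \le n.
$$

Setting $k = 2$ recovers the graph model. A multi-layer model allows several hyperedge sizes $k$ at once; how the parameters are indexed across layers is not stated in the abstract and is left open here.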
# 条件拡散を用いた単一画像LDRからHDRへの変換 Single Image LDR to HDR Conversion using Conditional Diffusion ( http://arxiv.org/abs/2307.02814v1 ) ライセンス: Link先を確認	Dwip Dalal, Gautam Vashishtha, Prajwal Singh, Shanmuganathan Raman	(参考訳) デジタル・イメージングは写実的なシーンを再現することを目的としているが、低ダイナミックレンジ(ldr)カメラは実際のシーンの広いダイナミックレンジを表現できない。本稿では,ハイダイナミックレンジ(hdr)画像を再構成しながら,影やハイライトから複雑な詳細を復元する深層学習に基づくアプローチを提案する。我々は,イメージ・ツー・イメージ(I2I)翻訳タスクとして問題を定式化し,分類器フリーガイダンスを用いた条件付き拡散確率モデル(DDPM)に基づくフレームワークを提案する。提案するフレームワークに深層CNNベースのオートエンコーダを組み込んで,コンディショニングに使用する入力LDR画像の潜時表現の質を高める。さらに,ldr-hdr翻訳タスクにおける新たな損失関数「露光損失」を導入する。この損失は飽和の反対方向の直接勾配を助け、結果の品質をさらに向上させる。定量的および定性的実験により,提案手法の有効性を効果的に実証した。以上の結果から,複雑なカメラパイプラインアーキテクチャを置き換える条件付き拡散法が提案されている。 Digital imaging aims to replicate realistic scenes, but Low Dynamic Range (LDR) cameras cannot represent the wide dynamic range of real scenes, resulting in under-/overexposed images. This paper presents a deep learning-based approach for recovering intricate details from shadows and highlights while reconstructing High Dynamic Range (HDR) images. We formulate the problem as an image-to-image (I2I) translation task and propose a conditional Denoising Diffusion Probabilistic Model (DDPM) based framework using classifier-free guidance. We incorporate a deep CNN-based autoencoder in our proposed framework to enhance the quality of the latent representation of the input LDR image used for conditioning. Moreover, we introduce a new loss function for LDR-HDR translation tasks, termed Exposure Loss. This loss helps direct gradients in the opposite direction of the saturation, further improving the results' quality. By conducting comprehensive quantitative and qualitative experiments, we have effectively demonstrated the proficiency of our proposed method. The results indicate that a simple conditional diffusion-based method can replace the complex camera pipeline-based architectures.	翻訳日:2023-07-07 14:53:20 公開日:2023-07-06
# cpdg : 動的グラフニューラルネットワークのためのコントラスト事前学習法 CPDG: A Contrastive Pre-Training Method for Dynamic Graph Neural Networks ( http://arxiv.org/abs/2307.02813v1 ) ライセンス: Link先を確認	Yuanchen Bei, Hao Xu, Sheng Zhou, Huixuan Chi, Mengdi Zhang, Zhao Li, Jiajun Bu	(参考訳) 動的グラフデータマイニングは, 動的グラフに含まれる豊富な情報と実世界で広く利用されているため, 近年普及している。動的グラフニューラルネットワーク(DGNN)の進歩にもかかわらず、豊富な情報と多様な下流タスクは、産業シナリオにおけるDGNNの実用化に重大な困難をもたらしている。そこで本稿では,この課題を事前学習によって解決し,動的グラフニューラルネットワーク(cpdg)のためのコントラスト事前学習法を提案する。 CPDGは、構造的時間的コントラスト付き事前学習スキームとともに、柔軟な構造的時間的サブグラフサンプリング器を通じて、一般化と長期モデリング機能を含むDGNNの事前訓練の課題に取り組む。大規模研究と産業用動的グラフデータセットの両方で実施された大規模な実験により、CPDGは3つの転送条件下での様々な下流タスクに対する動的グラフ事前学習において、既存の手法よりも優れた性能を示した。 Dynamic graph data mining has gained popularity in recent years due to the rich information contained in dynamic graphs and their widespread use in the real world. Despite the advances in dynamic graph neural networks (DGNNs), the rich information and diverse downstream tasks have posed significant difficulties for the practical application of DGNNs in industrial scenarios. To this end, in this paper, we propose to address them by pre-training and present the Contrastive Pre-Training Method for Dynamic Graph Neural Networks (CPDG). CPDG tackles the challenges of pre-training for DGNNs, including generalization and long-short term modeling capability, through a flexible structural-temporal subgraph sampler along with structural-temporal contrastive pre-training schemes. Extensive experiments conducted on both large-scale research and industrial dynamic graph datasets show that CPDG outperforms existing methods in dynamic graph pre-training for various downstream tasks under three transfer settings.	翻訳日:2023-07-07 14:53:01 公開日:2023-07-06
# テキストプロンプト評価によるゼロショットデジタル品質評価の改善 Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation ( http://arxiv.org/abs/2307.02808v1 ) ライセンス: Link先を確認	Zicheng Zhang, Wei Sun, Yingjie Zhou, Haoning Wu, Chunyi Li, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin	(参考訳) デジタル人間は様々な領域で広範囲の応用を目撃し、関連する品質評価研究を必要としている。しかし、包括的なデジタル人間質評価(DHQA)データベースは存在しない。このギャップに対処するため,本研究では,全身デジタル人間を対象とした主観的品質評価データベースsjtu-h3dを提案する。 40人の高品質の基準デジタル人間と、1,120個のラベル付き歪みが7種類の歪みで生成される。 SJTU-H3DデータベースはDHQA研究のベンチマークとして機能し、処理アルゴリズムの評価と改善を可能にする。さらに、データベースバイアスを緩和しながら一般化機能を確保するため、ノン参照(NR)シナリオに焦点を当てたゼロショットDHQAアプローチを提案する。提案手法は,投影から抽出した意味的・歪み的特徴と,デジタル人間のメッシュ構造から抽出した幾何学的特徴を利用する。具体的には,コントラスト言語-画像事前学習(clip)モデルを用いて意味親和性を測定し,自然性画像品質評価器(niqe)モデルを用いて低レベルの歪み情報を取り込む。さらに,ディヘドラル角度を幾何ディスクリプタとしてメッシュ特徴を抽出する。これらの指標を集約することにより、ゼロショット性能の大幅な改善を示すDHQI(Digital Human Quality Index)を導入する。 DHQIはDHQAタスクの堅牢なベースラインとしても機能し、この分野の進歩を促進する。データベースとコードはhttps://github.com/zzc-1998/SJTU-H3Dで入手できる。 Digital humans have witnessed extensive applications in various domains, necessitating related quality assessment studies. However, there is a lack of comprehensive digital human quality assessment (DHQA) databases. To address this gap, we propose SJTU-H3D, a subjective quality assessment database specifically designed for full-body digital humans. It comprises 40 high-quality reference digital humans and 1,120 labeled distorted counterparts generated with seven types of distortions. The SJTU-H3D database can serve as a benchmark for DHQA research, allowing evaluation and refinement of processing algorithms. Further, we propose a zero-shot DHQA approach that focuses on no-reference (NR) scenarios to ensure generalization capabilities while mitigating database bias. Our method leverages semantic and distortion features extracted from projections, as well as geometry features derived from the mesh structure of digital humans. 
Specifically, we employ the Contrastive Language-Image Pre-training (CLIP) model to measure semantic affinity and incorporate the Naturalness Image Quality Evaluator (NIQE) model to capture low-level distortion information. Additionally, we utilize dihedral angles as geometry descriptors to extract mesh features. By aggregating these measures, we introduce the Digital Human Quality Index (DHQI), which demonstrates significant improvements in zero-shot performance. The DHQI can also serve as a robust baseline for DHQA tasks, facilitating advancements in the field. The database and the code are available at https://github.com/zzc-1998/SJTU-H3D.	翻訳日:2023-07-07 14:52:43 公開日:2023-07-06
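The abstract above names dihedral angles as the geometry descriptor extracted from the mesh. A minimal sketch of that quantity for two triangles sharing an edge follows, in plain Python. The function names, the vertex ordering, and the choice to return the unsigned interior angle are assumptions made for illustration; the paper's own feature pipeline is not described in the abstract and may differ.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(u: Vec, v: Vec) -> Vec:
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def dihedral_angle(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Unsigned interior dihedral angle (radians) along the shared edge a-b.

    The two faces are the triangles (a, b, c) and (a, d, b). Coplanar faces
    on opposite sides of the edge give pi; a right-angle fold gives pi / 2.
    Convex and concave folds are not distinguished (hypothetical convention).
    """
    # Normals of the two faces, oriented consistently around the shared edge.
    n1 = _cross(_sub(b, a), _sub(c, a))
    n2 = _cross(_sub(d, a), _sub(b, a))
    len1 = math.sqrt(_dot(n1, n1))
    len2 = math.sqrt(_dot(n2, n2))
    if len1 == 0.0 or len2 == 0.0:
        raise ValueError("degenerate triangle: zero-area face")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_normals = max(-1.0, min(1.0, _dot(n1, n2) / (len1 * len2)))
    return math.pi - math.acos(cos_normals)
```

Collecting this value over every interior edge of a mesh gives a distribution of angles, which could then be summarised (for example by mean and variance) as a geometry feature; that aggregation step is likewise an assumption.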
# 集中認識タスクにおける基礎モデルの利用状況に関する批判的考察 A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task ( http://arxiv.org/abs/2307.02862v1 ) ライセンス: Link先を確認	Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku	(参考訳) 近年, 画像認識や生成など多くの分野において, 膨大なモダリティデータを学習した大規模モデルは, 基礎モデルと呼ばれることが多いが, 顕著な達成を達成している。当初のアプリケーションでは大きな成功を収めたものの、これらの基盤モデルが他のダウンストリームタスクにも適用できるかどうかはまだ不明である。本稿では,事前学習した基礎モデルに基づく識別的高密度化タスクの手法に関する簡単な調査を行う。また,Stable Diffusionに基づく既存の開語彙セグメンテーション手法の予備的検討を行い,セグメンテーションのための拡散モデルの展開方法が最適でないことを示す。これは、下流タスクに基礎モデルを採用するための将来の研究のための洞察を提供することを目的としている。 In recent years large model trained on huge amount of cross-modality data, which is usually be termed as foundation model, achieves conspicuous accomplishment in many fields, such as image recognition and generation. Though achieving great success in their original application case, it is still unclear whether those foundation models can be applied to other different downstream tasks. In this paper, we conduct a short survey on the current methods for discriminative dense recognition tasks, which are built on the pretrained foundation model. And we also provide some preliminary experimental analysis of an existing open-vocabulary segmentation method based on Stable Diffusion, which indicates the current way of deploying diffusion model for segmentation is not optimal. This aims to provide insights for future research on adopting foundation model for downstream task.	翻訳日:2023-07-07 14:46:57 公開日:2023-07-06
# フレームスキップによるフェイスアンチスプーフィングのための深層アンサンブル学習 Deep Ensemble Learning with Frame Skipping for Face Anti-Spoofing ( http://arxiv.org/abs/2307.02858v1 ) ライセンス: Link先を確認	Usman Muhammad, Md Ziaul Hoque, Mourad Oussalah and Jorma Laaksonen	(参考訳) スプーフィング攻撃(spoofing attack)とも呼ばれる顔提示攻撃は、アクセス制御システム、モバイル支払いシステム、身元確認システムといった顔認識システムに依存する生体認証システムに重大な脅威をもたらす。スプーフィングを防止するため、連続するビデオフレームで顔の動きを分析するいくつかのビデオベースの手法が文献に提示されている。しかし、隣接するフレーム間の動きを推定することは困難であり、計算コストが高い。本稿では,顔の反スプーフィング課題を運動予測問題として再構成し,フレームスキップ機構を備えた深層アンサンブル学習モデルを提案する。提案するフレームスキップは,オリジナル映像を一定サイズのビデオクリップに分割する一様サンプリング手法に基づいている。このようにして、3つの異なるリカレントニューラルネットワーク(rnn)のトレーニング中に、時間パターンを容易に認識できるように、クリップのn番目のフレームが選択される。各RNNの性能に動機づけられたメタモデルは、個々のRNNの予測を組み合わせることにより、全体的な認識性能を向上させる。 MSU-MFSD (3.12%)、Replay-Attack (11.19%)、OULU-NPU (12.23%)の4つのデータセットで実験を行い、最も難しいクロスデータセットテストシナリオでは、半総誤差率 (HTER) を使用した。 Face presentation attacks, also known as spoofing attacks, pose a significant threat to biometric systems that rely on facial recognition systems, such as access control systems, mobile payments, and identity verification systems. To prevent spoofing, several video-based methods have been presented in the literature that analyze facial motion in successive video frames. However, estimating the motion between adjacent frames is a challenging task and requires high computational cost. In this paper, we reformulate the face anti-spoofing task as a motion prediction problem and introduce a deep ensemble learning model with a frame skipping mechanism. The proposed frame skipping is based on a uniform sampling approach where the original video is divided into fixed size video clips. In this way, every nth frame of the clip is selected to ensure that the temporal patterns can easily be perceived during the training of three different recurrent neural networks (RNNs). Motivated by the performance of each RNNs, a meta-model is developed to improve the overall recognition performance by combining the predictions of the individual RNNs. 
Extensive experiments were conducted on four datasets, and state-of-the-art performance is reported for MSU-MFSD (3.12%), Replay-Attack (11.19%), and OULU-NPU (12.23%) using half total error rate (HTER) in the most challenging cross-dataset test scenario.	翻訳日:2023-07-07 14:46:44 公開日:2023-07-06
# お金だけでなく、ランサムウェア攻撃による現実世界の被害 It's more than just money: The real-world harms from ransomware attacks ( http://arxiv.org/abs/2307.02855v1 ) ライセンス: Link先を確認	Nandita Pattnaik, Jason R. C. Nurse, Sarah Turner, Gareth Mott, Jamie MacColl, Pia Huesch, James Sullivan	(参考訳) サイバー攻撃の頻度と洗練度が高まるにつれて、組織はインシデントの現実に直面する準備が整う必要がある。セキュリティリスクの管理に成功しようとする組織計画は、攻撃の余波によって影響を受ける害(すなわち負の影響)と様々な当事者を明確に理解しなければならない。この目的のために,本稿では,サイバー攻撃によって引き起こされる多数の現実世界の害について,特にランサムウェア事件を中心に,新たな調査を行っている。この調査は、このような事故による損害をモデル化するための、新しい堅牢な方法論の提案にも繋がる。ランサムウェア攻撃後の様々な段階で発生する害の種類や被害(例えば、オフラインのエンタープライズサーバ)が、利害関係者(例えば、顧客が社会福祉給付や銀行口座にアクセスできないこと)に悪影響を及ぼす可能性を秘めているかどうかを調べるために、ランサムウェアインシデントに関する公開可能なケースデータを作成します。私たちの分析で顕著な発見は、ビジネスそのものを超えて(身代金の支払い以外の)社会的・人的被害の顕著なセットの特定と、産業セクターに関係なく攻撃後に生じる複雑な害の網である。また,完全なデータがないため,害の完全な範囲とシーケンスを解読することは困難な作業であることも確認した。この論文は、ランサムウェアの害に対する透明性の向上を論じ、これらの事件の現実をよりよく理解し、組織や社会の利益をより一般的にする。 As cyber-attacks continue to increase in frequency and sophistication, organisations must be better prepared to face the reality of an incident. Any organisational plan that intends to be successful at managing security risks must clearly understand the harm (i.e., negative impact) and the various parties affected in the aftermath of an attack. To this end, this article conducts a novel exploration into the multitude of real-world harms that can arise from cyber-attacks, with a particular focus on ransomware incidents given their current prominence. This exploration also leads to the proposal of a new, robust methodology for modelling harms from such incidents. We draw on publicly-available case data on high-profile ransomware incidents to examine the types of harm that emerge at various stages after a ransomware attack and how harms (e.g., an offline enterprise server) may trigger other negative, potentially more substantial impacts for stakeholders (e.g., the inability for a customer to access their social welfare benefits or bank account). 
Prominent findings from our analysis include the identification of a notable set of social/human harms beyond the business itself (and beyond the financial payment of a ransom) and a complex web of harms that emerge after attacks regardless of the industry sector. We also observed that deciphering the full extent and sequence of harms can be a challenging undertaking because of the lack of complete data available. This paper consequently argues for more transparency on ransomware harms, as it would lead to a better understanding of the realities of these incidents to the benefit of organisations and society more generally.	翻訳日:2023-07-07 14:46:14 公開日:2023-07-06
# 窒素空孔中心における光子放出統計の特性評価 Characterization of the photon emission statistics in nitrogen-vacancy centers ( http://arxiv.org/abs/2307.02854v1 ) ライセンス: Link先を確認	Iván Panadero, Hilario Espinós, Lucas Tsunaki, Kseniia Volkova, Ander Tobalina, Jorge Casanova, Pablo Acedo, Boris Naydenov, Ricardo Puebla, and Erik Torrontegui	(参考訳) 非共鳴レーザー励起と共鳴マイクロ波制御の下で、ダイヤモンド中の単一の窒素空孔(NV)中心から放出される光子の完全な時間依存計数統計をモデル化し、実験的に実証した。NV中心の高速な固有ダイナミクスに関与する7つの電子状態に対して量子ジャンプ形式を一般化することで、NV中心の発光を特徴づけ、量子系の内部状態と測定可能な検出光子数との関係を明確にする自己完結したモデルが得られる。このモデルにより、磁場測定に対する系の感度を最大化しつつ、エネルギーと時間の資源を最適化する検出プロトコルを設計できる。 We model and experimentally demonstrate the full time-dependent counting statistics of photons emitted by a single nitrogen-vacancy (NV) center in diamond under non-resonant laser excitation and resonant microwave control. A generalization of the quantum jump formalism for the seven electronic states involved in the fast intrinsic dynamics of an NV center provides a self-contained model that allows for the characterization of its emission and clarifies the relation between the quantum system's internal states and the measurable detected photon counts. The model allows the elaboration of detection protocols to optimize the energy and time resources while maximizing the system sensitivity to magnetic-field measurements.	翻訳日:2023-07-07 14:45:45 公開日:2023-07-06
# 誇大広告に抵抗せよ! Résumé-Driven Development に対処するための実践的提案 Resist the Hype! Practical Recommendations to Cope With Résumé-Driven Development ( http://arxiv.org/abs/2307.02850v1 ) ライセンス: Link先を確認	Jonas Fritzsch, Marvin Wyrich, Justus Bogner, Stefan Wagner	(参考訳) 技術トレンドは、ソフトウェアおよびITの専門職の採用プロセスにおいて重要な役割を果たす。採用担当(130人)と技術職(558人)の両方を含む591人のソフトウェア専門家を対象とした最近の調査で、履歴書(résumé)と応募プロセスにおいて技術トレンドが過度に強調される傾向を裏付ける経験的証拠を得た。採用担当者の60%は、こうしたトレンドが自身の求人広告に影響すると回答した。ソフトウェア専門家の82%は、日々の業務でトレンドの技術を使うことで、将来の雇用主にとって自分がより魅力的になると考えていた。この現象はこれまで、Résumé-Driven Development (RDD) という名称で、逸話的に、またやや冗談めかして報告されてきた。本稿は、RDDがソフトウェア開発の実践に及ぼす影響について、より真剣な議論を始めることを目指す。この現象が有害な自己持続的力学をいかに形成しうるかを説明し、現状を改善するために、採用側と応募側の双方に向けた実践的な提言を示す。 Technology trends play an important role in the hiring process for software and IT professionals. In a recent study of 591 software professionals in both hiring (130) and technical (558) roles, we found empirical support for a tendency to overemphasize technology trends in résumés and the application process. 60% of the hiring professionals agreed that such trends would influence their job advertisements. Among the software professionals, 82% believed that using trending technologies in their daily work would make them more attractive for potential future employers. This phenomenon has previously been reported anecdotally and somewhat humorously under the label Résumé-Driven Development (RDD). Our article seeks to initiate a more serious debate about the consequences of RDD on software development practice. We explain how the phenomenon may constitute a harmful self-sustaining dynamic, and provide practical recommendations for both the hiring and applicant perspectives to change the current situation for the better.	翻訳日:2023-07-07 14:45:33 公開日:2023-07-06
# NatLogAttack:自然言語推論モデルを自然言語論理で攻撃するフレームワーク NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic ( http://arxiv.org/abs/2307.02849v1 ) ライセンス: Link先を確認	Zi'ou Zheng and Xiaodan Zhu	(参考訳) 推論は、当初から人工知能の中心的な話題だった。分散表現とニューラルネットワークの最近の進歩は、自然言語推論の最先端性能を改善し続けている。しかし、モデルが結論に達するための真の推論を行うのか、あるいは急激な相関に依存するのかは、まだ明らかな疑問である。敵の攻撃は、アキレスの犠牲者モデルのヒールを評価する重要なツールであることが証明されている。本研究では,論理形式に基づく攻撃モデルの開発に関する基礎的問題を検討する。自然論理を中心とする体系的攻撃を行うnatlogattackを提案する。natlogattackは、アリストテレスのシルロジズムに遡り、自然言語推論のために密接に開発された古典論理形式である。提案するフレームワークは,ラベル保存攻撃とラベルフリッピング攻撃の両方をレンダリングする。既存の攻撃モデルと比較して、NatLogAttackは、犠牲者モデルへの訪問が少なく、より良い敵例を生成する。被害者のモデルはラベルフライピング設定でより脆弱であることが分かる。 NatLogAttackは、キーの観点から既存のNLIモデルのキャパシティを調査するためのツールを提供しています。 Reasoning has been a central topic in artificial intelligence from the beginning. The recent progress made on distributed representation and neural networks continues to improve the state-of-the-art performance of natural language inference. However, it remains an open question whether the models perform real reasoning to reach their conclusions or rely on spurious correlations. Adversarial attacks have proven to be an important tool to help evaluate the Achilles' heel of the victim models. In this study, we explore the fundamental problem of developing attack models based on logic formalism. We propose NatLogAttack to perform systematic attacks centring around natural logic, a classical logic formalism that is traceable back to Aristotle's syllogism and has been closely developed for natural language inference. The proposed framework renders both label-preserving and label-flipping attacks. We show that compared to the existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models. The victim models are found to be more vulnerable under the label-flipping setting. 
NatLogAttack provides a tool to probe the existing and future NLI models' capacity from a key viewpoint and we hope more logic-based attacks will be further explored for understanding the desired property of reasoning.	翻訳日:2023-07-07 14:45:16 公開日:2023-07-06
# コンピュータ支援結核診断の再検討 Revisiting Computer-Aided Tuberculosis Diagnosis ( http://arxiv.org/abs/2307.02848v1 ) ライセンス: Link先を確認	Yun Liu, Yu-Huan Wu, Shi-Chen Zhang, Li Liu, Min Wu, and Ming-Ming Cheng	(参考訳) 結核(TB)は世界的な健康上の脅威であり、毎年数百万人が死亡している。早期診断と治療は生存率を大幅に向上させるが、特に発展途上国では依然として大きな課題である。近年,深層学習による結核診断 (ctd) が期待されているが, 限られたトレーニングデータによって進歩が妨げられている。そこで本研究では,結核X線(TBX11K)データセットを大規模に構築し,TB領域に対応する境界ボックスアノテーションを備えた胸部X線(CXR)画像を含む。このデータセットは高品質ctdのための高度な検出器のトレーニングを可能にする。さらに,CXR画像の同時分類とTB感染領域検出のための強力なベースラインであるSymFormerを提案する。 SymFormerはSymmetric Search Attention(SymAttention)を導入し、CXR画像の左右対称特性に取り組み、識別的特徴を学習する。 cxr画像は左右対称性に厳密に従わないため,特徴リカバリレーションによるシンマテンションを容易にする対称位置符号化 (spe) も提案する。今後のctd研究を促進するために,評価指標の導入,既存の検出器から再構成したベースラインモデルの評価,オンラインチャレンジの実施により,ベンチマークを構築する。 SymFormerはTBX11Kデータセット上で最先端のパフォーマンスを実現する。データ、コード、モデルがリリースされます。 Tuberculosis (TB) is a major global health threat, causing millions of deaths annually. Although early diagnosis and treatment can greatly improve the chances of survival, it remains a major challenge, especially in developing countries. Recently, computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data. To address this, we establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas. This dataset enables the training of sophisticated detectors for high-quality CTD. Furthermore, we propose a strong baseline, SymFormer, for simultaneous CXR image classification and TB infection area detection. SymFormer incorporates Symmetric Search Attention (SymAttention) to tackle the bilateral symmetry property of CXR images for learning discriminative features. Since CXR images may not strictly adhere to the bilateral symmetry property, we also propose Symmetric Positional Encoding (SPE) to facilitate SymAttention through feature recalibration. 
To promote future research on CTD, we build a benchmark by introducing evaluation metrics, evaluating baseline models reformed from existing detectors, and running an online challenge. Experiments show that SymFormer achieves state-of-the-art performance on the TBX11K dataset. The data, code, and models will be released.	翻訳日:2023-07-07 14:44:55 公開日:2023-07-06
# 関数近似を用いた証明可能に効率的な反復CVaR強化学習 Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation ( http://arxiv.org/abs/2307.02842v1 ) ライセンス: Link先を確認	Yu Chen, Yihan Du, Pihe Hu, Siwei Wang, Desheng Wu, Longbo Huang	(参考訳) リスク感応型強化学習(RL)は、期待報酬とリスクのバランスをとる方策を最適化することを目的とする。本稿では、線形および一般の関数近似の下で、反復条件付きバリュー・アット・リスク(Iterated CVaR)を目的とするリスク感応型RLの新たな定式化を検討する。関数近似付きICVaR-RLと呼ぶこの定式化は、各決定ステップで安全性を保証するための原理的な方法を与える。線形関数近似付きICVaR-RLに対しては、計算効率のよいアルゴリズムICVaR-Lを提案し、$\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ のリグレットを達成する。ここで $\alpha$ はリスクレベル、$d$ は状態行動特徴の次元、$H$ は各エピソードの長さ、$K$ はエピソード数である。また、対応する下界 $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$ を示し、$d$ と $K$ に関するICVaR-Lの最適性を確認する。一般関数近似付きICVaR-RLに対しては、アルゴリズムICVaR-Gを提案し、$\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ のリグレットを達成する。ここで $D$ はeluder次元と被覆数に依存する次元パラメータである。さらに本解析は、CVaR作用素の効率的な近似、CVaRに適応した特徴量を用いる新しいリッジ回帰、精緻化された楕円ポテンシャル補題など、リスク感応型RLのための新しい手法を与える。 Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximations. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm ICVaR-L, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ regret, where $\alpha$ is the risk level, $d$ is the dimension of state-action features, $H$ is the length of each episode, and $K$ is the number of episodes. We also establish a matching lower bound $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$ to validate the optimality of ICVaR-L with respect to $d$ and $K$. 
For ICVaR-RL with general function approximation, we propose algorithm ICVaR-G, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ regret, where $D$ is a dimensional parameter that depends on the eluder dimension and covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.	翻訳日:2023-07-07 14:44:32 公開日:2023-07-06
# ニュース要約生成のための進化的微調整によるLLMの強化 Enhancing LLM with Evolutionary Fine Tuning for News Summary Generation ( http://arxiv.org/abs/2307.02839v1 ) ライセンス: Link先を確認	Le Xiao and Xiaolin Chen	(参考訳) ニュース要約生成はインテリジェンス分析の分野で重要なタスクであり、人々が複雑な現実世界の出来事を理解し、反応するのに役立つ正確で包括的な情報を提供する。しかし、従来のニュース要約生成手法では、モデル自体やトレーニングデータの量、テキストノイズの影響に制限があるため、信頼性の高い情報を正確に生成することは困難である。本稿では,自然言語理解と生成能力を備えたllmを用いたニュース要約生成のための新しいパラダイムを提案する。 LLMを用いて、ニュース段落に含まれる事象から複数の構造化イベントパターンを抽出し、遺伝的アルゴリズムを用いてイベントパターンの集団を進化させ、LLMに入力する最も適応性の高いイベントパターンを選択し、ニュース要約を生成する。ニュース概要生成装置(NSG)は、イベントパターンの集団を選択し、進化させ、ニュース要約を生成するように設計されている。実験の結果,ニュース要約生成器は,一般化能力を備えた正確で信頼性の高いニュース要約を生成できることがわかった。 News summary generation is an important task in the field of intelligence analysis, which can provide accurate and comprehensive information to help people better understand and respond to complex real-world events. However, traditional news summary generation methods face some challenges, which are limited by the model itself and the amount of training data, as well as the influence of text noise, making it difficult to generate reliable information accurately. In this paper, we propose a new paradigm for news summary generation using LLM with powerful natural language understanding and generative capabilities. We use LLM to extract multiple structured event patterns from the events contained in news paragraphs, evolve the event pattern population with genetic algorithm, and select the most adaptive event pattern to input into the LLM to generate news summaries. A News Summary Generator (NSG) is designed to select and evolve the event pattern populations and generate news summaries. The experimental results show that the news summary generator is able to generate accurate and reliable news summaries with some generalization ability.	翻訳日:2023-07-07 14:43:50 公開日:2023-07-06
# 産業異常検出と位置推定のためのノイズ・ツー・ノルム再構成 Noise-to-Norm Reconstruction for Industrial Anomaly Detection and Localization ( http://arxiv.org/abs/2307.02836v1 ) ライセンス: Link先を確認	Shiqi Deng and Zhiyu Sun and Ruiyan Zhuang and Jun Gong	(参考訳) 異常検出には幅広い応用があり、特に工業製品の品質検査において重要である。現在、高性能な異常検出モデルの多くは特徴埋め込み法に依存している。しかし、これらの手法は、物体の位置の変動が大きいデータセットではうまく機能しない。再構成に基づく手法は、サンプル間の位置の違いを考慮せずに、再構成誤差を用いて異常を検出する。本研究では、ノイズ・ツー・ノルムのパラダイムを用いた再構成ベースの手法を提案する。この手法は異常領域がそのまま再構成されることを防ぐ。再構成ネットワークはM-netをベースとし、マルチスケール融合と残差アテンションのモジュールを組み込むことで、エンドツーエンドの異常検出と位置推定を可能にする。実験により、本手法が異常領域を正常なパターンに再構成し、正確な異常検出と位置推定を達成するのに有効であることが示された。MPDDデータセットとVisAデータセットでは、提案手法は最新の手法より競争力のある結果を達成し、MPDDデータセットでは新たな最先端の水準を示した。 Anomaly detection has a wide range of applications and is especially important in industrial quality inspection. Currently, many top-performing anomaly-detection models rely on feature-embedding methods. However, these methods do not perform well on datasets with large variations in object locations. Reconstruction-based methods use reconstruction errors to detect anomalies without considering positional differences between samples. In this study, a reconstruction-based method using the noise-to-norm paradigm is proposed, which avoids the invariant reconstruction of anomalous regions. Our reconstruction network is based on M-net and incorporates multiscale fusion and residual attention modules to enable end-to-end anomaly detection and localization. Experiments demonstrate that the method is effective in reconstructing anomalous regions into normal patterns and achieving accurate anomaly detection and localization. On the MPDD and VisA datasets, our proposed method achieved more competitive results than the latest methods, and it set a new state-of-the-art standard on the MPDD dataset.	翻訳日:2023-07-07 14:43:32 公開日:2023-07-06
# 後知恵での複数観測によるPOMDPのサンプル効率的な学習 Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight ( http://arxiv.org/abs/2307.02884v1 ) ライセンス: Link先を確認	Jiacheng Guo, Minshuo Chen, Huan Wang, Caiming Xiong, Mengdi Wang, Yu Bai	(参考訳) 本稿では、最悪の場合に指数的に困難であることが知られている強化学習の難問、部分観測マルコフ決定過程(POMDP)における学習のサンプル効率を研究する。ゲームプレイにおけるロード(loading)のような実世界の設定に着想を得て、「後知恵での複数観測(multiple observations in hindsight)」と呼ぶ拡張フィードバックモデルを提案する。このモデルでは、POMDPとの対話の各エピソードの後に、学習者は遭遇した潜在状態から放出される追加の観測を複数収集できるが、潜在状態そのものは観測できない。このフィードバックモデルの下で、POMDPの2つの新しいサブクラス、すなわち multi-observation revealing POMDP と distinguishable POMDP に対してサンプル効率的な学習が可能であることを示す。いずれのサブクラスも、標準的な軌跡フィードバックの下でサンプル効率的な学習が可能なサブクラスとして広く研究されている revealing POMDP を一般化し、その条件を大幅に緩和する。特に distinguishable POMDP は、異なる潜在状態からの放出分布が、revealing POMDP で要求される線形独立ではなく、単に互いに異なることだけを要求する。 This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called "multiple observations in hindsight", where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: multi-observation revealing POMDPs and distinguishable POMDPs. Both subclasses generalize and substantially relax revealing POMDPs, a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be different, instead of linearly independent as required in revealing POMDPs.	翻訳日:2023-07-07 14:35:59 公開日:2023-07-06
# コントラストが必要なのは Contrast Is All You Need ( http://arxiv.org/abs/2307.02882v1 ) ライセンス: Link先を確認	Burak Kilic, Florix Bex, Albert Gatt	(参考訳) 本研究では,ラベル付き法定データが小さく,不均衡であり,結果の質を損なう可能性のある,データスカース分類シナリオを分析する。本研究では,SetFit(Sentence Transformer Finetuning),コントラスト学習設定,法定規定分類タスクにおけるバニラ微調整設定の2つに着目した。さらに,lime (local interpretable model-agnostic explanations) で抽出された特徴を比較し,モデルの分類決定にどの特徴が寄与したかを確認する。その結果,SetFitのコントラスト設定は,トレーニングサンプルのごく一部を使用しながら,バニラファインタニングよりも優れていた。 LIMEの結果から, 比較学習アプローチは, 法的に有意な肯定的特徴と否定的特徴の両方を増強し, 分類結果に寄与することが示唆された。このように、対照的な目的によって微調整されたモデルは、その決定を法的に情報的特徴に基づいてより自信を持って下すように思われる。 In this study, we analyze data-scarce classification scenarios, where available labeled legal data is small and imbalanced, potentially hurting the quality of the results. We focused on two finetuning objectives; SetFit (Sentence Transformer Finetuning), a contrastive learning setup, and a vanilla finetuning setup on a legal provision classification task. Additionally, we compare the features that are extracted with LIME (Local Interpretable Model-agnostic Explanations) to see which particular features contributed to the model's classification decisions. The results show that a contrastive setup with SetFit performed better than vanilla finetuning while using a fraction of the training samples. LIME results show that the contrastive learning approach helps boost both positive and negative features which are legally informative and contribute to the classification results. Thus a model finetuned with a contrastive objective seems to base its decisions more confidently on legally informative features.	翻訳日:2023-07-07 14:35:39 公開日:2023-07-06
# 画像多様体の確率的・意味的記述とその応用 Probabilistic and Semantic Descriptions of Image Manifolds and Their Applications ( http://arxiv.org/abs/2307.02881v1 ) ライセンス: Link先を確認	Peter Tu, Zhaoyuan Yang, Richard Hartley, Zhiwei Xu, Jing Zhang, Dylan Campbell, Jaskirat Singh, Tianyu Wang	(参考訳) 本稿では,高次元画像空間の制限領域内に存在するように制限されているという観測結果を反映した画像の確率密度関数を推定する手法について記述することから始める。画像は高次元空間の低次元多様体上にあると言うのが一般的である。しかし、像はそのような低次元多様体上に存在するかもしれないが、多様体上のすべての点が同じ確率で像になるとは限らない。画像は多様体上に不均一に分布し、この分布を確率分布としてモデル化する方法を考案する。この目標を追求するために、AIやコンピュータビジョンコミュニティで人気のある生成モデルを検討する。我々の目的のために、生成的・確率的モデルは性質を持つべきである 1)サンプル生成:モデル化された密度関数に従ってこの分布からサンプルを採取できなければならない。 2) 確率計算: 興味のあるデータセットから以前に見つからなかったサンプルが与えられた場合、少なくとも正規化定数までサンプルの確率を計算することができる。そこで本研究では,流れの正規化や拡散モデルなどの手法について検討する。次に,このような確率的記述を,敵の攻撃に対する防御構築に利用できることを示す。密度の観点で多様体を記述することに加えて、多様体上の点を記述するために意味論的解釈をどのように利用できるかを考える。この目的のために, 変分エンコーダを用いて与えられた多様体上に存在する点の不等角表現を生成する, 創発的言語フレームワークを考える。多様体上の点間の軌道は、進化する意味記述によって記述することができる。 This paper begins with a description of methods for estimating probability density functions for images that reflects the observation that such data is usually constrained to lie in restricted regions of the high-dimensional image space - not every pattern of pixels is an image. It is common to say that images lie on a lower-dimensional manifold in the high-dimensional space. However, although images may lie on such lower-dimensional manifolds, it is not the case that all points on the manifold have an equal probability of being images. Images are unevenly distributed on the manifold, and our task is to devise ways to model this distribution as a probability distribution. In pursuing this goal, we consider generative models that are popular in AI and computer vision community. 
For our purposes, generative/probabilistic models should have the properties of 1) sample generation: it should be possible to sample from this distribution according to the modelled density function, and 2) probability computation: given a previously unseen sample from the dataset of interest, one should be able to compute the probability of the sample, at least up to a normalising constant. To this end, we investigate the use of methods such as normalising flow and diffusion models. We then show that such probabilistic descriptions can be used to construct defences against adversarial attacks. In addition to describing the manifold in terms of density, we also consider how semantic interpretations can be used to describe points on the manifold. To this end, we consider an emergent language framework which makes use of variational encoders to produce a disentangled representation of points that reside on a given manifold. Trajectories between points on a manifold can then be described in terms of evolving semantic descriptions.	翻訳日:2023-07-07 14:35:19 公開日:2023-07-06
# 大規模LiDAR点雲における高精度インスタンスセグメンテーションに向けて Towards accurate instance segmentation in large-scale LiDAR point clouds ( http://arxiv.org/abs/2307.02877v1 ) ライセンス: Link先を確認	Binbin Xiang, Torben Peters, Theodora Kontogianni, Frawa Vetterli, Stefano Puliti, Rasmus Astrup, Konrad Schindler	(参考訳) パンオプティカルセグメンテーション(panoptic segmentation)は、セマンティックとインスタンスセグメンテーションの組み合わせである。 3dポイントクラウド内のポイントをセマンティックカテゴリに割り当て、それらを別々のオブジェクトインスタンスに分割する。都市地図から森林管理まで、屋外の景観理解に多くの明白な応用がある。既存のメソッドは、隣接するストリート家具や隣接するツリーのような、同じセマンティックなカテゴリの近隣のインスタンスを分割するのに苦労している。本研究では,オブジェクトインスタンスへのクラスタリングポイントに関するpanoptic segmentation pipelineのステップを調査し,ボトルネックの緩和を目標とする。複数のタイプの学習点埋め込みを利用する注意深く設計されたクラスタリング戦略は、インスタンスのセグメンテーションを大幅に改善する。 npm3d urban mobile mapping datasetとfor-instance forest datasetの実験は、提案手法の有効性と汎用性を示している。 Panoptic segmentation is the combination of semantic and instance segmentation: assign the points in a 3D point cloud to semantic categories and partition them into distinct object instances. It has many obvious applications for outdoor scene understanding, from city mapping to forest management. Existing methods struggle to segment nearby instances of the same semantic category, like adjacent pieces of street furniture or neighbouring trees, which limits their usability for inventory- or management-type applications that rely on object instances. This study explores the steps of the panoptic segmentation pipeline concerned with clustering points into object instances, with the goal to alleviate that bottleneck. We find that a carefully designed clustering strategy, which leverages multiple types of learned point embeddings, significantly improves instance segmentation. Experiments on the NPM3D urban mobile mapping dataset and the FOR-instance forest dataset demonstrate the effectiveness and versatility of the proposed strategy.	翻訳日:2023-07-07 14:34:56 公開日:2023-07-06
# 参照に基づく動きぼけ除去:参照画像のシャープネスを利用する学習 Reference-based Motion Blur Removal: Learning to Utilize Sharpness in the Reference Image ( http://arxiv.org/abs/2307.02875v1 ) ライセンス: Link先を確認	Han Zou, Masanori Suganuma, Takayuki Okatani	(参考訳) 画像中の動きぼけを除去する研究は近年進展しているが、強いぼけを扱うことは依然として難しい。単一の画像からのぼけ除去には限界があるが、複数の画像を使う方法、たとえばぼけた画像を鮮明化するための参照として追加の画像を使う方法には、より大きな可能性がある。典型的な設定は、ビデオのぼけ除去の研究のように、ビデオシーケンス中の近傍の鮮明な画像を用いて画像のぼけを除去するものである。本稿では、参照画像に含まれる情報をより適切に利用する方法を提案する。この方法は参照画像に強い仮定を必要としない。ビデオのぼけ除去のように同一シーンの別のショットを使うこともできるし、別のシーンの異なる画像を使うことさえできる。提案手法は、まず対象画像と参照画像の局所パッチを対応付け、次にそれらの特徴を融合して鮮明な画像を推定する。ぼけた画像と鮮明な参照画像を対応付けるという難しい問題を解くために、パッチベースの特徴マッチング戦略を採用する。本手法は、単一画像のぼけ除去用に設計された既存のネットワークに組み込むことができる。実験結果は提案手法の有効性を示している。 Despite the recent advancement in the study of removing motion blur in an image, it is still hard to deal with strong blurs. While there are limits in removing blurs from a single image, using multiple images has more potential, e.g., using an additional image as a reference to deblur a blurry image. A typical setting is deblurring an image using nearby sharp image(s) in a video sequence, as in the studies of video deblurring. This paper proposes a better method to use the information present in a reference image. The method does not need a strong assumption on the reference image. We can utilize an alternative shot of the identical scene, just like in video deblurring, or we can even employ a distinct image from another scene. Our method first matches local patches of the target and reference images and then fuses their features to estimate a sharp image. We employ a patch-based feature matching strategy to solve the difficult problem of matching the blurry image with the sharp reference. Our method can be integrated into pre-existing networks designed for single image deblurring. The experimental results show the effectiveness of the proposed method.	翻訳日:2023-07-07 14:34:42 公開日:2023-07-06
# momentdiff: ランダムからリアルへの生成的ビデオモーメント検索 MomentDiff: Generative Video Moment Retrieval from Random to Real ( http://arxiv.org/abs/2307.02869v1 ) ライセンス: Link先を確認	Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang	(参考訳) ビデオモーメント検索は、与えられた言語記述に対応する未トリミングビデオ内の特定の時間的セグメントを識別するための効率的で一般化されたソリューションを追求する。この目的を達成するために、momentdiffと呼ばれる生成拡散ベースのフレームワークを提供し、ランダムブラウジングから漸進的ローカライゼーションまでの典型的な人間の検索プロセスをシミュレートする。具体的には、まず実空間をランダムノイズに拡散させ、テキストとビデオの類似性のガイダンスを用いてランダムノイズを元の空間に分解する。これにより、モデルは任意のランダムな場所から実際のモーメントへのマッピングを学習でき、ランダムな初期化からセグメントを見つけることができる。トレーニングが完了すると、MomentDiffはランダムな時間セグメントを初期推定としてサンプリングし、それらを反復的に洗練して正確な時間境界を生成する。識別作業(例えば学習可能な提案やクエリに基づく)とは異なり、ランダムな初期化スパンを持つmomentdiffはデータセットからの時間的位置バイアスに抵抗する可能性がある。時間的位置バイアスの影響を評価するために,Charades-STA-Len と Charades-STA-Mom という2つの反バイアスデータセットを提案する。実験の結果,提案手法は3つのベンチマークで常に最先端手法を上回っており,提案するアンチバイアスデータセットの一般化とロバスト性が向上していることがわかった。コード、モデル、アンチバイアス評価データセットはhttps://github.com/IMCCretrieval/MomentDiffで入手できる。 Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description. To achieve this goal, we provide a generative diffusion-based framework called MomentDiff, which simulates a typical human retrieval process from random browsing to gradual localization. Specifically, we first diffuse the real span to random noise, and learn to denoise the random noise to the original span with the guidance of similarity between text and video. This allows the model to learn a mapping from arbitrary random locations to real moments, enabling the ability to locate segments from random initialization. Once trained, MomentDiff could sample random temporal segments as initial guesses and iteratively refine them to generate an accurate temporal boundary. 
Different from discriminative works (e.g., based on learnable proposals or queries), MomentDiff with random initialized spans could resist the temporal location biases from datasets. To evaluate the influence of the temporal location biases, we propose two anti-bias datasets with location distribution shifts, named Charades-STA-Len and Charades-STA-Mom. The experimental results demonstrate that our efficient framework consistently outperforms state-of-the-art methods on three public benchmarks, and exhibits better generalization and robustness on the proposed anti-bias datasets. The code, model, and anti-bias evaluation datasets are available at https://github.com/IMCCretrieval/MomentDiff.	翻訳日:2023-07-07 14:34:23 公開日:2023-07-06
# deep-learning balanced homodyne detectionを用いたカオスによる増幅量子ノイズの高速光子相関モニタリング High-speed photon correlation monitoring of amplified quantum noise by chaos using deep-learning balanced homodyne detection ( http://arxiv.org/abs/2307.02868v1 ) ライセンス: Link先を確認	Yanqiang Guo, Zinan Hu, Jianchao Zhang, Chenyu Zhu, Xiaomin Guo	(参考訳) 光子相関の精密な実験的な決定には大量のデータと膨大な測定時間が必要である。広帯域平衡ホモダイン検出とディープラーニング加速度に基づく増幅量子雑音の2次光子相関を$g^{(2)}(0)$でモニタする手法を提案する。弱いカオスレーザーの注入により量子ノイズを効果的に増幅し、増幅された量子ノイズの$g^{(2)}(0)$をリアルタイムサンプルレート1.4GHzで測定する。また,光子相関畳み込みニューラルネットワークを用いて,数次ゆらぎを用いて相関データを加速し,様々なカオス注入強度と有効帯域幅に対して$g^{(2)}(0)$の並列処理を行う。深層学習法は、$g^{(2)}(0)$実験的な取得を高精度に加速し、平均2乗誤差0.002の光子相関データの6107セットを22秒で推定し、データ取得時間で3桁の大加速度を達成する。この技術は、セキュア通信および量子イメージングにおけるエントロピー源の高速かつ高精度なコヒーレンス評価に寄与する。 Precision experimental determination of photon correlation requires the massive amounts of data and extensive measurement time. We present a technique to monitor second-order photon correlation $g^{(2)}(0)$ of amplified quantum noise based on wideband balanced homodyne detection and deep-learning acceleration. The quantum noise is effectively amplified by an injection of weak chaotic laser and the $g^{(2)}(0)$ of the amplified quantum noise is measured with a real-time sample rate of 1.4 GHz. We also exploit a photon correlation convolutional neural network accelerating correlation data using a few quadrature fluctuations to perform a parallel processing of the $g^{(2)}(0)$ for various chaos injection intensities and effective bandwidths. The deep-learning method accelerates the $g^{(2)}(0)$ experimental acquisition with a high accuracy, estimating 6107 sets of photon correlation data with a mean square error of 0.002 in 22 seconds and achieving a three orders of magnitude acceleration in data acquisition time. This technique contributes to a high-speed and precision coherence evaluation of entropy source in secure communication and quantum imaging.	翻訳日:2023-07-07 14:33:57 公開日:2023-07-06
# 鉄道分野におけるml系システムの継続的開発と安全性確保のための安全mlopsプロセスに向けて Towards a safe MLOps Process for the Continuous Development and Safety Assurance of ML-based Systems in the Railway Domain ( http://arxiv.org/abs/2307.02867v1 ) ライセンス: Link先を確認	Marc Zeller, Thomas Waschulzik, Reiner Schmid, Claus Bahlmann	(参考訳) 従来の自動化技術だけでは、非制限のインフラ上での無人運転を可能にするには不十分である。現在、必要な知覚タスクは機械学習(ML)を使用して実現されており、開発とデプロイを確実かつ効率的に行う必要がある。これを実現するための重要な側面の1つは、改善された再現性、トレーサビリティ、コラボレーション、変更条件へのドライバレス操作の継続的適応にMLOpsプロセスを使用することである。 MLOpsはMLアプリケーション開発と運用(Ops)を混在させ、運用からのフィードバックに基づいて、高周波ソフトウェアリリースと継続的イノベーションを可能にする。本稿では,鉄道分野におけるMLベースシステムの継続的開発と安全性保証のための安全MLOpsプロセスの概要について述べる。システムエンジニアリング、安全性保証、MLライフサイクルを包括的なワークフローに統合する。プロセスの個々の段階とその相互作用を示す。さらに,safe mlopsプロセスの異なる段階を自動化するための,関連する課題について述べる。 Traditional automation technologies alone are not sufficient to enable driverless operation of trains (called Grade of Automation (GoA) 4) on non-restricted infrastructure. The required perception tasks are nowadays realized using Machine Learning (ML) and thus need to be developed and deployed reliably and efficiently. One important aspect to achieve this is to use an MLOps process for tackling improved reproducibility, traceability, collaboration, and continuous adaptation of a driverless operation to changing conditions. MLOps mixes ML application development and operation (Ops) and enables high frequency software releases and continuous innovation based on the feedback from operations. In this paper, we outline a safe MLOps process for the continuous development and safety assurance of ML-based systems in the railway domain. It integrates system engineering, safety assurance, and the ML life-cycle in a comprehensive workflow. We present the individual stages of the process and their interactions. Moreover, we describe relevant challenges to automate the different stages of the safe MLOps process.	翻訳日:2023-07-07 14:33:38 公開日:2023-07-06
# PLIERS: オンラインソーシャルネットワークにおけるコンテンツ拡散のための人気ベースのレコメンデーションシステム PLIERS: a Popularity-Based Recommender System for Content Dissemination in Online Social Networks ( http://arxiv.org/abs/2307.02865v1 ) ライセンス: Link先を確認	Valerio Arnaboldi, Mattia Giovanni Campana, Franca Delmastro, Elena Pagani	(参考訳) 本稿では,ユーザがすでに持っているものと同様の人気を持つアイテムやタグに関心を持っているという仮定に基づく,新しいタグベースのレコメンダシステムPLIERSを提案する。 PLIERSは、アルゴリズムの複雑さと推奨項目のパーソナライズレベルとの良好なトレードオフを達成することを目的としている。PLIERSを評価するために,我々は実際のOSNデータセットに関する一連の実験を行い,パーソナライゼーション,関連性,レコメンデーションの新規性といった面で最先端のソリューションを上回ることを示した。 In this paper, we propose a novel tag-based recommender system called PLIERS, which relies on the assumption that users are mainly interested in items and tags with similar popularity to those they already own. PLIERS is aimed at reaching a good tradeoff between algorithmic complexity and the level of personalization of recommended items. To evaluate PLIERS, we performed a set of experiments on real OSN datasets, demonstrating that it outperforms state-of-the-art solutions in terms of personalization, relevance, and novelty of recommendations.	翻訳日:2023-07-07 14:33:23 公開日:2023-07-06
# ValiTex -- 社会科学構成の計算テキストに基づく測定のための一様検証フレームワーク ValiTex -- a uniform validation framework for computational text-based measures of social science constructs ( http://arxiv.org/abs/2307.02863v1 ) ライセンス: Link先を確認	Lukas Birkenmaier and Clemens Lechner and Claudia Wagner	(参考訳) 社会科学構造に関する計算テキストに基づく尺度の検証方法に関するガイダンスが断片化されている。研究者は一般的に、テキストベースの尺度を検証することの重要性を認めているが、それらはしばしば共通の用語や統一的な枠組みを欠いている。本稿では,テキストデータに基づく社会科学構造の測定を支援するために,ValiTexという新たな検証フレームワークを提案する。このフレームワークは、計算テキスト分析の目的のためにフレームワークを拡張しながら、心理測定において長年確立されてきた伝統に基づいている。 ValiTexは概念モデルと動的チェックリストという2つのコンポーネントで構成されている。概念モデルがバリデーションへのアプローチ方法に関する異なるフェーズに沿って一般的な構造を提供するのに対して、動的チェックリストは特定の検証手順を定義し、推奨可能なステップ(つまり、関連する検証証拠と必要な検証証拠を提供する)またはオプション(つまり、追加の検証証拠を提供するのに役立ちます)についてガイダンスを提供する。ソーシャルメディアデータから性差別を検出するユースケースに適用することにより、フレームワークの有用性を実証する。 Guidance on how to validate computational text-based measures of social science constructs is fragmented. Whereas scholars are generally acknowledging the importance of validating their text-based measures, they often lack common terminology and a unified framework to do so. This paper introduces a new validation framework called ValiTex, designed to assist scholars to measure social science constructs based on textual data. The framework draws on a long-established tradition within psychometrics while extending the framework for the purpose of computational text analysis. ValiTex consists of two components, a conceptual model, and a dynamic checklist. Whereas the conceptual model provides a general structure along distinct phases on how to approach validation, the dynamic checklist defines specific validation steps and provides guidance on which steps might be considered recommendable (i.e., providing relevant and necessary validation evidence) or optional (i.e., useful for providing additional supporting validation evidence). The utility of the framework is demonstrated by applying it to a use case of detecting sexism from social media data.	翻訳日:2023-07-07 14:33:10 公開日:2023-07-06
# センサを用いた人間行動認識における最適センサ配置のためのリアルタイム人文推定手法 A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition ( http://arxiv.org/abs/2307.02906v1 ) ライセンス: Link先を確認	Orhan Konak, Alexander Wischmann, Robin van de Water, Bert Arnrich	(参考訳) センサベースのヒューマンアクティビティ認識は、人間の動きの邪魔にならない監視を容易にする。しかし,最適分類性能に最も効果的なセンサ配置の決定は依然として困難である。本稿では,対象行動の映像記録から推定した実時間2次元ポーズを用いて,この問題を解決する新しい手法を提案する。得られた骨格データは、最適なセンサ位置を特定するためのユニークな戦略を提供する。提案手法の有効性を検証し,慣性センサを用いて被験者10名を対象に13種類の活動を監視する。以上の結果から,視覚に基づくセンサ配置法は従来のディープラーニング手法と同等の結果を示し,その効果を示す。本研究は,センサ配置の最適決定のための軽量なオンデバイスソリューションを提供し,データの匿名化を促進し,マルチモーダル分類アプローチをサポートすることにより,ヒューマンアクティビティ認識の分野を著しく進歩させる。 Sensor-based Human Activity Recognition facilitates unobtrusive monitoring of human movements. However, determining the most effective sensor placement for optimal classification performance remains challenging. This paper introduces a novel methodology to resolve this issue, using real-time 2D pose estimations derived from video recordings of target activities. The derived skeleton data provides a unique strategy for identifying the optimal sensor location. We validate our approach through a feasibility study, applying inertial sensors to monitor 13 different activities across ten subjects. Our findings indicate that the vision-based method for sensor placement offers comparable results to the conventional deep learning approach, demonstrating its efficacy. This research significantly advances the field of Human Activity Recognition by providing a lightweight, on-device solution for determining the optimal sensor placement, thereby enhancing data anonymization and supporting a multimodal classification approach.	翻訳日:2023-07-07 14:26:57 公開日:2023-07-06
# パーシステンスランク関数機械学習のための計算安定性 Computable Stability for Persistence Rank Function Machine Learning ( http://arxiv.org/abs/2307.02904v1 ) ライセンス: Link先を確認	Qiquan Wang, Inés García-Redondo, Pierre Faugère, Anthea Monod, Gregory Henselman-Petrusek	(参考訳) 永続的ホモロジーバーコードとダイアグラムは、トポロジカルデータ分析の基盤である。多くの実データ設定で広く使われているが、トポロジ情報の変化(細胞ホモロジーによって測定される)とデータの変動を関連付けるが、複雑な幾何学的構造のために統計的設定での使用は困難である。本稿では、永続ホモロジーのランク関数を再検討する。これはバーコードやパーシステンス図よりも前に導入された「形状」の不変測度であり、同じ情報をデータや計算により適した形で捉える。特に、ランク関数は関数であるため、関数データ解析(関数に適応した統計学の一分野)の手法が、ランク関数で表現された永続ホモロジーに直接適用できる。しかし、ランク関数はバーコードほど普及していない。データ分析での利用を正当化するうえで不可欠な性質である安定性を保証することが、主にランク関数空間上の計量の問題のために難しいからである。一方で、ランク関数はより自然に、多パラメータ持続ホモロジーの人気の高まりと重要なケースにまで拡張される。本稿では,シミュレーションデータと実データの両方,および単一および多パラメータ持続ホモロジーにおいて,関数推論統計および機械学習におけるランク関数の性能について検討する。階数関数によって捕捉される永続的ホモロジーの使用は、既存のアプローチよりも明らかな改善をもたらす。次に, 計算可能性と解釈可能性という基礎的な目的から, 各種指標を用いた単パラメータ・多パラメータ永続ランク関数の安定性を導出し, 数値実験とデータへの適用を理論的に正当化する。 Persistent homology barcodes and diagrams are a cornerstone of topological data analysis. Widely used in many real data settings, they relate variation in topological information (as measured by cellular homology) with variation in data, however, they are challenging to use in statistical settings due to their complex geometric structure. In this paper, we revisit the persistent homology rank function -- an invariant measure of "shape" that was introduced before barcodes and persistence diagrams and captures the same information in a form that is more amenable to data and computation. 
In particular, since they are functions, techniques from functional data analysis -- a domain of statistics adapted for functions -- apply directly to persistent homology when represented by rank functions. Rank functions, however, have been less popular than barcodes because they face the challenge that stability -- a property that is crucial to validate their use in data analysis -- is difficult to guarantee, mainly due to metric concerns on rank function space. However, rank functions extend more naturally to the increasingly popular and important case of multiparameter persistent homology. In this paper, we study the performance of rank functions in functional inferential statistics and machine learning on both simulated and real data, and in both single and multiparameter persistent homology. We find that the use of persistent homology captured by rank functions offers a clear improvement over existing approaches. We then provide theoretical justification for our numerical experiments and applications to data by deriving several stability results for single- and multiparameter persistence rank functions under various metrics with the underlying aim of computational feasibility and interpretability.	翻訳日:2023-07-07 14:26:43 公開日:2023-07-06
# puffin:蒸気圧予測のためのパス統一フィードフォワードインタフェースネットワーク PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction ( http://arxiv.org/abs/2307.02903v1 ) ライセンス: Link先を確認	Vinicius Viena Santana, Carine Menezes Rebello, Luana P. Queiroz, Ana Mafalda Ribeiro, Nadia Shardta, and Idelfonso B. R. Nogueira	(参考訳) 蒸気圧の正確な予測は、様々な産業・環境用途に不可欠である。しかし, 実験の資源と労働力の強さから, 興味のあるすべての化合物の正確な測定は不可能である。蒸気圧を予測するための温度依存関係が要求されるとき、資源と労働の需要はさらに増加する。本稿では,移動学習とドメイン知識(アントワーヌ方程式)にインスパイアされた新しい帰納バイアスノードを組み合わせることで,蒸気圧予測を改善する機械学習フレームワークPUFFINを提案する。グラフ埋め込みを用いたインダクティブバイアスとトランスファーラーニングを活用することで、puffinはインダクティブバイアスを使用しない、あるいは化合物の汎用記述子を使用する代替戦略よりも優れている。このフレームワークは、データ可用性の限界を克服するためにドメイン固有の知識を組み込むことによって、他の物理化学的性質の予測を含む化学化合物分析の幅広い応用の可能性を示している。インダクティブアントインノードはネットワーク由来アントイン方程式係数を生成するため,提案する機械学習フレームワークは部分的に解釈可能である。すると、得られた分析表現を直接プロセス設計ソフトウェアに組み込んで、産業や環境で発生するプロセスの予測と制御を改善することができる。 Accurately predicting vapor pressure is vital for various industrial and environmental applications. However, obtaining accurate measurements for all compounds of interest is not possible due to the resource and labor intensity of experiments. The demand for resources and labor further multiplies when a temperature-dependent relationship for predicting vapor pressure is desired. In this paper, we propose PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning with a new inductive bias node inspired by domain knowledge (the Antoine equation) to improve vapor pressure prediction. By leveraging inductive bias and transfer learning using graph embeddings, PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic descriptors of compounds. The framework's incorporation of domain-specific knowledge to overcome the limitation of poor data availability shows its potential for broader applications in chemical compound analysis, including the prediction of other physicochemical properties. 
Importantly, our proposed machine learning framework is partially interpretable, because the inductive Antoine node yields network-derived Antoine equation coefficients. It would then be possible to directly incorporate the obtained analytical expression in process design software for better prediction and control of processes occurring in industry and the environment.	翻訳日:2023-07-07 14:26:21 公開日:2023-07-06
# NMR量子プロセッサ上のパウリ半群の凸混合による量子非マルコフ性の実験的実現 Experimental realization of quantum non-Markovianity through the convex mixing of Pauli semigroups on an NMR quantum processor ( http://arxiv.org/abs/2307.02899v1 ) ライセンス: Link先を確認	Vaishali Gulati and Vinayak Jagadish and R. Srikanth and Kavita Dorai	(参考訳) この実験は、任意の混合パラメータを持つパウリ半群の凸結合を調べ、結果の動的写像がマルコフ的あるいは非マルコフ的挙動を示すかどうかを決定することを目的としている。具体的には、2つのパウリ半群の同値かつ不等混合を考慮し、結果の写像が常に非マルコフ写像であることを示す。さらに、3つのパウリ半群の3方向混合の3つのケースを調査し、結果の写像のマルコビアン性または非マルコビアン性を決定する。NMR量子プロセッサ上でパウリ半群の異なる混合結合を持つ単一量子ビット系の非ユニタリダイナミクスをシミュレートするために、2つのアンシラリー量子ビットを含むアルゴリズムを用いる。実験結果は理論的な予測と一致した。 This experimental study aims to investigate the convex combinations of Pauli semigroups with arbitrary mixing parameters to determine whether the resulting dynamical map exhibits Markovian or non-Markovian behavior. Specifically, we consider the cases of equal as well as unequal mixing of two Pauli semigroups, and demonstrate that the resulting map is always non-Markovian. Additionally, we study three cases of three-way mixing of the three Pauli semigroups and determine the Markovianity or non-Markovianity of the resulting maps by experimentally determining the decay rates. To simulate the non-unitary dynamics of a single qubit system with different mixing combinations of Pauli semigroups on an NMR quantum processor, we use an algorithm involving two ancillary qubits. The experimental results align with the theoretical predictions.	翻訳日:2023-07-07 14:26:00 公開日:2023-07-06
# RefVSR++: 参照ベースのビデオ超解像のための参照入力の爆発 RefVSR++: Exploiting Reference Inputs for Reference-based Video Super-resolution ( http://arxiv.org/abs/2307.02897v1 ) ライセンス: Link先を確認	Han Zou, Masanori Suganuma, Takayuki Okatani	(参考訳) 異なる視野(fov)を持つ複数のカメラからなるマルチカメラシステムを備えたスマートフォンが普及している。これらのカメラ構成は、参照ベースのSRとビデオSRと互換性があり、デバイス上でビデオを録画しながら同時に実行できる。これにより、これら2つのsr法を組み合わせることで画質が向上する。近年、LeeらはRefVSRという方法を提示している。本稿では,低解像度 (LR) ビデオやレファレンス (Ref) ビデオなどの観測結果を最適に活用する方法を検討する。 RefVSRは従来のビデオSRを非常に簡単に拡張し、LRとRefの入力を1つの双方向ストリームで時間とともに集約する。しかし,FoVによるLR画像とRef画像のコンテンツ差を考慮すると,時間方向に独立して集約することで,二つの画像列から最大情報を導き出すことができる。そこで,本研究では,融合lr入力とref入力を集約する手法と,時間とともにref入力を集約する手法であるrefvsr++を提案する。さらに,ビデオSRの成功の鍵となる画像特徴を時間とともに整列させる機構をRefVSR++に装備する。実験により、RefVSR++はPSNRにおいて1dB以上でRefVSRを上回る性能を示し、新しい最先端を実現する。 Smartphones equipped with a multi-camera system comprising multiple cameras with different field-of-view (FoVs) are becoming more prevalent. These camera configurations are compatible with reference-based SR and video SR, which can be executed simultaneously while recording video on the device. Thus, combining these two SR methods can improve image quality. Recently, Lee et al. have presented such a method, RefVSR. In this paper, we consider how to optimally utilize the observations obtained, including input low-resolution (LR) video and reference (Ref) video. RefVSR extends conventional video SR quite simply, aggregating the LR and Ref inputs over time in a single bidirectional stream. However, considering the content difference between LR and Ref images due to their FoVs, we can derive the maximum information from the two image sequences by aggregating them independently in the temporal direction. Then, we propose an improved method, RefVSR++, which can aggregate two features in parallel in the temporal direction, one for aggregating the fused LR and Ref inputs and the other for Ref inputs over time. 
Furthermore, we equip RefVSR++ with enhanced mechanisms to align image features over time, which is the key to the success of video SR. We experimentally show that RefVSR++ outperforms RefVSR by over 1dB in PSNR, achieving the new state-of-the-art.	翻訳日:2023-07-07 14:25:44 公開日:2023-07-06
# Free Bits:エッジ上の混合精度量子ニューラルネットワークのレイテンシ最適化 Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge ( http://arxiv.org/abs/2307.02894v1 ) ライセンス: Link先を確認	Georg Rutishauser, Francesco Conti, Luca Benini	(参考訳) ディープニューラルネットワークの層が異なる精度で量子化される混合精度量子化(mixed-precision quantization)は、均質なビット幅量子化によって達成できる以上のモデルサイズ、レイテンシ、統計的精度の間のトレードオフを最適化する機会を提供する。与えられたネットワークに対する混合精度構成の難解な探索空間をナビゲートするために,ハイブリッド検索手法を提案する。ハードウェアに依存しない微分可能な検索アルゴリズムからなり、ハードウェア認識のヒューリスティック最適化により、特定のハードウェアターゲットに対して遅延最適化された混合精度設定を見つける。提案アルゴリズムはMobileNetV1およびMobileNetV2上で評価し,ハードウェア特性の異なるマルチコアRISC-Vマイクロコントローラ群上にネットワークを配置する。我々は、1000クラスのImageNetデータセットの完全精度ベースラインから無視できない精度で8ビットモデルと比較して、エンドツーエンドのレイテンシを最大28.6%削減する。我々は8ビットのベースラインに対して,ハードウェアサポートのないシステムでも,無視可能な精度低下時に高速化を実証する。さらに、レイテンシーのプロキシとして、二項演算数を減らした微分可能な探索に対して、我々のアプローチの優位性を示す。 Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. 
Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.	翻訳日:2023-07-07 14:25:19 公開日:2023-07-06
# 抑うつ状態における音声特徴の関係--抑うつ検出の速度と性能向上のための特徴相関- The Relationship Between Speech Features Changes When You Get Depressed: Feature Correlations for Improving Speed and Performance of Depression Detection ( http://arxiv.org/abs/2307.02892v1 ) ライセンス: Link先を確認	Fuxiang Tao, Wei Ma, Xuri Ge, Anna Esposito, Alessandro Vinciarelli	(参考訳) この研究は、抑うつが音声から抽出した特徴間の相関を変化させることを示す。さらに、このような知見を用いることで、SVMとLSTMに基づく抑うつ検知器の訓練速度と性能を向上させることができることを示す。実験は、プロの精神科医によってうつ病と診断された58人を含む112人の話者を含む公開データセットであるAndroids Corpus上で実施された。その結果,実験で使用したモデルは,特徴ベクトルではなく特徴相関行列を入力すると,学習速度と性能が向上した。誤差率の相対的な減少はモデルによって23.1%から26.6%の範囲である。その原因として考えられるのは,うつ病の話者では特徴相関行列の変動がより大きいと見られることである。それに応じて、このような現象は抑うつマーカーと考えることができる。 This work shows that depression changes the correlation between features extracted from speech. Furthermore, it shows that using such an insight can improve the training speed and performance of depression detectors based on SVMs and LSTMs. The experiments were performed over the Androids Corpus, a publicly available dataset involving 112 speakers, including 58 people diagnosed with depression by professional psychiatrists. The results show that the models used in the experiments improve in terms of training speed and performance when fed with feature correlation matrices rather than with feature vectors. The relative reduction of the error rate ranges between 23.1% and 26.6% depending on the model. The probable explanation is that feature correlation matrices appear to be more variable in the case of depressed speakers. Correspondingly, such a phenomenon can be thought of as a depression marker.	翻訳日:2023-07-07 14:24:59 公開日:2023-07-06
# BaBE: 遅延説明変数の推定によるフェアネスの向上 BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables ( http://arxiv.org/abs/2307.02891v1 ) ライセンス: Link先を確認	Ruta Binkyte, Daniele Gorla, Catuscia Palamidessi	(参考訳) 両グループ間の不公平な差別の問題を検討し,公平性を達成するための前処理法を提案する。統計的パリティのような補正法は通常、不正確さを生じさせ、敏感な属性sと正当な属性e(説明変数)との間に相関がある場合、実際には公平性が得られない。これらの欠点を克服するために、他の公平性の概念、特に条件付き統計パリティと平等機会が提案されている。しかし、E はデータの中で直接観測できないことが多い、すなわち潜時変数である。 E を表す他の変数 Z も観測できるが、問題は Z が S に影響される可能性があり、したがって Z 自身はバイアスを受けることができることである。この問題に対処するため、ベイズ推論と期待最大化法の組み合わせに基づくアプローチであるBaBE(Bayesian Bias Elimination)を提案し、各群に対して与えられたZに対してEの最も可能性の高い値を推定する。合成および実データ集合の実験によって、我々のアプローチは、高い正確性とともに十分な公平性を提供することが示された。 We consider the problem of unfair discrimination between two groups and propose a pre-processing method to achieve fairness. Corrective methods like statistical parity usually lead to bad accuracy and do not really achieve fairness in situations where there is a correlation between the sensitive attribute S and the legitimate attribute E (explanatory variable) that should determine the decision. To overcome these drawbacks, other notions of fairness have been proposed, in particular, conditional statistical parity and equal opportunity. However, E is often not directly observable in the data, i.e., it is a latent variable. We may observe some other variable Z representing E, but the problem is that Z may also be affected by S, hence Z itself can be biased. To deal with this problem, we propose BaBE (Bayesian Bias Elimination), an approach based on a combination of Bayes inference and the Expectation-Maximization method, to estimate the most likely value of E for a given Z for each group. The decision can then be based directly on the estimated E. We show, by experiments on synthetic and real data sets, that our approach provides a good level of fairness as well as high accuracy.	翻訳日:2023-07-07 14:24:45 公開日:2023-07-06
# 蛍光光子の登録に基づくイオン量子ビットの高精度トモグラフィー High-precision tomography of ion qubits based on registration of fluorescent photons ( http://arxiv.org/abs/2307.02890v1 ) ライセンス: Link先を確認	Yu. I. Bogdanov, I.A. Dmitriev, B.I. Bantysh, N.A. Bogdanova, V.F. Lukichev	(参考訳) イオン量子ビットレジスタの論理状態の識別性が限定された条件下での高精度トモグラフィー法を開発した。励起レベル、光子散乱、暗騒音、低い数値開口などの有限寿命によるイオン量子ビットの量子状態の読み出し中に低い誤差率を達成することは必ずしも不可能である。しかし、ファジィ量子測定のモデルは、量子状態の正確なトモグラフィーを確保することができる。そこで我々は,蛍光光子の数を数えるファジィ測定モデルを開発した。ファジィ測定演算子に基づくイオン量子ビットレジスタの量子状態再構成のための統計的に適切なアルゴリズムを提案する。このアルゴリズムは実験で利用可能な完全な情報を使用し、イオン量子ビットの論理状態の限定的な識別性に関連する系統的な測定誤差を考慮できる。開発したモデルでは,計算量は複雑ではあるが,量子ビットの状態に関する情報がかなり多く,閾値アルゴリズムに基づくモデルよりも精度が高いことが判明した。 We develop a new method for high-precision tomography of ion qubit registers under conditions of limited distinguishability of its logical states. It is not always possible to achieve low error rates during the readout of the quantum states of ion qubits due to the finite lifetime of excited levels, photon scattering, dark noise, low numerical aperture, etc. However, the model of fuzzy quantum measurements makes it possible to ensure precise tomography of quantum states. To do this, we developed a fuzzy measurement model based on counting the number of fluorescent photons. A statistically adequate algorithm for the reconstruction of quantum states of ion qubit registers based on fuzzy measurement operators is proposed. The algorithm uses the complete information available in the experiment and makes it possible to account for systematic measurement errors associated with the limited distinguishability of the logical states of ion qubits. We show that the developed model, although computationally more complex, contains significantly more information about the state of the qubit and provides a higher accuracy of state reconstruction compared to the model based on the threshold algorithm.	翻訳日:2023-07-07 14:24:24 公開日:2023-07-06
# 先行行動の探索による課題解決の学習 Learning to Solve Tasks with Exploring Prior Behaviours ( http://arxiv.org/abs/2307.02889v1 ) ライセンス: Link先を確認	Ruiqi Zhu, Siyuan Li, Tianhong Dai, Chongjie Zhang, Oya Celiktutan	(参考訳) デモは深層強化学習(DRL)で広く使われ、報酬がスパースなタスクの解決を容易にする。しかし、実世界のシナリオにおけるタスクは、しばしばデモから様々な初期条件を持つことができ、追加の事前動作が必要になる。例えば、「開いた引き出しから物体を取り出す」というタスクのデモンストレーションが与えられるが、訓練時には引き出しが閉じている場合を考える。引き出しを開ける事前の動作が得られなければ、ロボットがそのタスクを解決する可能性は低い。これを解決するために,本論文では,内在的リワード駆動例に基づく制御(IRDEC)を提案する。提案手法は,必要な先行行動の探索と取得をエージェントに行うことができ,その上でタスク固有の動作に接続してスパース・リワードタスクを解決することができる。提案手法の性能は3つのナビゲーションタスクと1つのロボット操作タスクにおける他のベースラインよりも高い。コードはhttps://github.com/Ricky-Zhu/IRDECで入手できる。 Demonstrations are widely used in Deep Reinforcement Learning (DRL) for facilitating solving tasks with sparse rewards. However, the tasks in real-world scenarios can often have varied initial conditions from the demonstration, which would require additional prior behaviours. For example, consider we are given the demonstration for the task of picking up an object from an open drawer, but the drawer is closed in the training. Without acquiring the prior behaviours of opening the drawer, the robot is unlikely to solve the task. To address this, in this paper we propose an Intrinsic Rewards Driven Example-based Control (IRDEC). Our method can endow agents with the ability to explore and acquire the required prior behaviours and then connect to the task-specific behaviours in the demonstration to solve sparse-reward tasks without requiring additional demonstration of the prior behaviours. The performance of our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Codes are available at https://github.com/Ricky-Zhu/IRDEC.	翻訳日:2023-07-07 14:24:08 公開日:2023-07-06
# 離散非線形シュレーディンガー方程式における創発的SSH物理、ソリトンおよび凝縮を誘導する密度依存ゲージ場 Density dependent gauge field inducing emergent SSH physics, solitons and condensates in a discrete nonlinear Schrödinger equation ( http://arxiv.org/abs/2307.02952v1 ) ライセンス: Link先を確認	William N. Faugno, Mario Salerno, Tomoki Ozawa	(参考訳) 動的密度差依存ゲージ場を持つ離散非線形シュレーディンガー方程式について検討する。平面波凝縮状態から局所ソリトン状態への基底状態遷移は、ゲージ結合が変化するにつれて起こる。興味深いことに、凝縮物とソリトンが安定している状態が見つかる。創発的なキラル対称性を同定し、対称性が保護されたゼロエネルギーエッジモードの存在につながる。創発的なキラル対称性は、低エネルギーソリトンと高エネルギーソリトンを関連付ける。これらの状態は、相互作用が反発的かつ魅力的に作用することを示している。 We investigate a discrete non-linear Schrödinger equation with dynamical, density-difference-dependent, gauge fields. We find a ground-state transition from a plane wave condensate to a localized soliton state as the gauge coupling is varied. Interestingly we find a regime in which the condensate and soliton are both stable. We identify an emergent chiral symmetry, which leads to the existence of a symmetry protected zero energy edge mode. The emergent chiral symmetry relates low and high energy solitons. These states indicate that the interaction acts both repulsively and attractively.	翻訳日:2023-07-07 14:17:50 公開日:2023-07-06
# 実数値観測からの強化学習のためのニューロモルフィックアーキテクチャ A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations ( http://arxiv.org/abs/2307.02947v1 ) ライセンス: Link先を確認	Sergio F. Chevtchenko, Yeshwanth Bethi, Teresa B. Ludermir, Saeed Afshar	(参考訳) 強化学習(RL)は複雑な環境における意思決定のための強力なフレームワークを提供する。しかし、ハードウェア効率とバイオインスパイアされた方法でRLを実装することは依然として課題である。本稿では,実測値を用いてRL問題を解くための新しいスパイキングニューラルネットワーク(SNN)アーキテクチャを提案する。提案モデルは,td(temporal difference)-error modulation)とeligibility tracesを追加して,事前作業に基づいて多層イベントベースクラスタリングを組み込んだものである。アブレーション研究は、これらの成分がモデルの性能に与える影響を裏付けるものである。適応性トレースを持つ表型アクター批判アルゴリズムと最先端のPPOアルゴリズムをベンチマークとして使用する。当社のネットワークは,従来型のRL環境(マウンテンカー,カートポール,アクロボット)における安定的な制御ポリシの発見に成功した。提案モデルは,計算およびハードウェア実装要件の観点から,魅力的なトレードオフを提供する。このモデルは外部メモリバッファやグローバルエラー勾配計算を必要とせず、ローカル学習ルールと放送されたtd-error信号によってオンラインにシナプス更新が行われる。したがって、この研究はよりハードウェア効率の良いRLソリューションの開発に寄与する。 Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations. The proposed model incorporates multi-layered event-based clustering, with the addition of Temporal Difference (TD)-error modulation and eligibility traces, building upon prior work. An ablation study confirms the significant impact of these components on the proposed model's performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our network consistently outperforms the tabular approach and successfully discovers stable control policies on classic RL environments: mountain car, cart-pole, and acrobot. The proposed model offers an appealing trade-off in terms of computational and hardware implementation requirements. 
The model does not require an external memory buffer nor a global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcasted TD-error signal. Thus, this work contributes to the development of more hardware-efficient RL solutions.	翻訳日:2023-07-07 14:17:42 公開日:2023-07-06
# DisAsymNet:自己逆学習を用いた両側乳房X線像の非対称異常の解離 DisAsymNet: Disentanglement of Asymmetrical Abnormality on Bilateral Mammograms using Self-adversarial Learning ( http://arxiv.org/abs/2307.02935v1 ) ライセンス: Link先を確認	Xin Wang, Tao Tan, Yuan Gao, Luyi Han, Tianyu Zhang, Chunyao Lu, Regina Beets-Tan, Ruisheng Su, Ritse Mann	(参考訳) 非対称性は異常発生時の両側マンモグラム(Bi-MG)の重要な特徴である。放射線医が診断に広く利用している。「非対称な異常が除去されたとき、左右対称のBi-MGはどのように見えるか?」という疑問は、まだマンモグラムのアルゴリズムの開発において大きな注目を集めていない。この疑問に対処することで、マンモグラフィー解剖学の貴重な洞察と診断の解釈を助けることができる。そこで本論文では,非対称異常トランスフォーマーを用いた自己敵学習を応用した新しい枠組みであるDisAsymNetを提案する。同時に,提案手法はランダムに合成された異常によって部分的に導かれる。提案手法は,3つのパブリックデータセットと1つの社内データセットを用いて実験を行い,異常分類,セグメンテーション,ローカライゼーションタスクにおいて既存の手法よりも優れていることを示す。さらに、再建された正常マンモグラムは、臨床診断のためのより良い解釈可能な視覚的手がかりへの洞察を与える。コードは一般公開される予定だ。 Asymmetry is a crucial characteristic of bilateral mammograms (Bi-MG) when abnormalities are developing. It is widely utilized by radiologists for diagnosis. The question of 'what the symmetrical Bi-MG would look like when the asymmetrical abnormalities have been removed?' has not yet received strong attention in the development of algorithms on mammograms. Addressing this question could provide valuable insights into mammographic anatomy and aid in diagnostic interpretation. Hence, we propose a novel framework, DisAsymNet, which utilizes asymmetrical abnormality transformer guided self-adversarial learning for disentangling abnormalities and symmetric Bi-MG. At the same time, our proposed method is partially guided by randomly synthesized abnormalities. We conduct experiments on three public and one in-house dataset, and demonstrate that our method outperforms existing methods in abnormality classification, segmentation, and localization tasks. Additionally, reconstructed normal mammograms can provide insights toward better interpretable visual cues for clinical diagnosis. The code will be accessible to the public.	翻訳日:2023-07-07 14:17:26 公開日:2023-07-06
# 時間と空間:補助ロボットアームの適応制御を目指して In Time and Space: Towards Usable Adaptive Control for Assistive Robotic Arms ( http://arxiv.org/abs/2307.02933v1 ) ライセンス: Link先を確認	Max Pascher and Kirill Kronhardt and Felix Ferdinand Goldau and Udo Frese and Jens Gerken	(参考訳) ロボットのソリューション、特にロボットアームは、製造業や家庭の医療環境など、人間との密接なコラボレーションのために頻繁にデプロイされている。これらのロボットアームは、主に物体の把握と操作を含むいくつかの自由度(DoF)を制御する必要がある。標準入力デバイスは主に2つのDoFを持ち、個々のDoFを選択するのに時間を要する。現代の適応型DoFマッピング制御(ADMC)は、必要なモードスイッチ数を削減できたが、これまでは認識された作業負荷を大幅に削減できなかった。ユーザは今でも、ワークフローに抽象モードを切り替える、というメンタルなワークロードを抱えている。我々はADMCのリコメンデーションを更新してフィードフォワードのマルチモーダルフィードバックを提供することにより、ユーザが現在と提案したマッピングをリアルタイムで視覚的に比較できるようにする。 2つの新しいアプローチの効果とは対照的に a) 継続的に更新されたDoFの組み合わせを推奨する b) 現在のロボットの動きと新しい推奨の間で、個別のしきい値を使用する。両者は、古典的な制御方法に対する個人によるVR(Virtual Reality)研究で比較される。タスク完了時間を短縮し、モードスイッチを減らし、認識されたワークロードを減らし、フィードフォワードと組み合わせることで、ADMC法は古典的なモード切替よりも優れていることを確定した。連続性としきい値の間の明らかな定量的な違いの欠如は、ユーザ中心のカスタマイズオプションの重要性を明らかにしている。これらの影響を開発プロセスに含めることで、ユーザビリティが向上し、高いユーザ受け入れを持つロボット技術の実現に欠かせないものとなる。 Robotic solutions, in particular robotic arms, are becoming more frequently deployed for close collaboration with humans, for example in manufacturing or domestic care environments. These robotic arms require the user to control several Degrees-of-Freedom (DoFs) to perform tasks, primarily involving grasping and manipulating objects. Standard input devices predominantly have two DoFs, requiring time-consuming and cognitively demanding mode switches to select individual DoFs. Contemporary Adaptive DoF Mapping Controls (ADMCs) have shown to decrease the necessary number of mode switches but were up to now not able to significantly reduce the perceived workload. Users still bear the mental workload of incorporating abstract mode switching into their workflow. We address this by providing feed-forward multimodal feedback using updated recommendations of ADMC, allowing users to visually compare the current and the suggested mapping in real-time. 
We contrast the effectiveness of two new approaches that a) continuously recommend updated DoF combinations or b) use discrete thresholds between current robot movements and new recommendations. Both are compared in a Virtual Reality (VR) in-person study against a classic control method. Significant results for lowered task completion time, fewer mode switches, and reduced perceived workload conclusively establish that in combination with feedforward, ADMC methods can indeed outperform classic mode switching. A lack of apparent quantitative differences between Continuous and Threshold reveals the importance of user-centered customization options. Including these implications in the development process will improve usability, which is essential for successfully implementing robotic technologies with high user acceptance.	翻訳日:2023-07-07 14:17:08 公開日:2023-07-06
# 拒絶を伴う回帰にノー・リジェクション学習が最適である場合 When No-Rejection Learning is Optimal for Regression with Rejection ( http://arxiv.org/abs/2307.02932v1 ) ライセンス: Link先を確認	Xiaocheng Li, Shang Liu, Chunlin Sun, Hanzhao Wang	(参考訳) 拒絶による学習は、予測タスクにおける人間とAIの相互作用を研究するための原型モデルである。モデルには2つのコンポーネント、予測器とリジェクタがある。サンプルが到着すると、拒絶者はまずそれを受け入れるかどうかを判断し、受理された場合、予測タスクを遂行し、拒否された場合は、予測を人間に延期する。学習問題は、予測者と拒絶者を同時に学習する必要がある。これは従来の損失関数の構造を変え、しばしば非凸性や不整合の問題を引き起こす。拒絶問題のある分類では、いくつかの作品が、証明可能な一貫性保証を持つ共同学習のための代理的損失を発生させる。本稿では,レグレッションをリジェクション問題(RwR)を用いて検討し,レグレッションタスクとしてRwR問題を扱うノリジェクション学習戦略について検討する。文献で観察される非退行学習戦略の準最適性は、予測器の関数クラスを拡大することにより緩和できることを示す。そこで我々は,予測器の学習を独り占めにするために,乱れた損失を導入し,予測器と拒絶器を共同で行うよりも,予測器に対して一貫したサロゲート特性を個別に確立できることを示す。本研究は,まず全てのデータを用いて予測器を学習し,その予測損失を校正する2段階の学習手順を提案する。より多くのデータサンプルがより良い予測器に結びつくという一般的な直感と一致し、リジェクタを学ぶためのキャリブレーションアルゴリズムのより良い設計にもっと努力する必要がある。我々の議論は主に回帰問題に焦点をあてるが、理論的結果と洞察は分類問題にも一般化する。 Learning with rejection is a prototypical model for studying the interaction between humans and AI on prediction tasks. The model has two components, a predictor and a rejector. Upon the arrival of a sample, the rejector first decides whether to accept it; if accepted, the predictor fulfills the prediction task, and if rejected, the prediction will be deferred to humans. The learning problem requires learning a predictor and a rejector simultaneously. This changes the structure of the conventional loss function and often results in non-convexity and inconsistency issues. For the classification with rejection problem, several works develop surrogate losses for the jointly learning with provable consistency guarantees; in parallel, there has been less work for the regression counterpart. We study the regression with rejection (RwR) problem and investigate the no-rejection learning strategy which treats the RwR problem as a standard regression task to learn the predictor. 
We establish that the suboptimality of the no-rejection learning strategy observed in the literature can be mitigated by enlarging the function class of the predictor. We then introduce the truncated loss to isolate the learning of the predictor, and we show that a consistent surrogate property can be established for the predictor alone more easily than for the predictor and the rejector jointly. Our findings advocate a two-step learning procedure that first uses all the data to learn the predictor and then calibrates the prediction loss for the rejector. This procedure aligns better with the common intuition that more data samples lead to a better predictor, and it calls for more effort on the design of calibration algorithms for learning the rejector. While our discussion focuses mainly on the regression problem, the theoretical results and insights generalize to the classification problem as well.	翻訳日:2023-07-07 14:16:24 公開日:2023-07-06
# MIP=RE後の物理の論理的可能性 Logical possibilities for physics after MIP=RE ( http://arxiv.org/abs/2307.02920v1 ) ライセンス: Link先を確認	Ad\'an Cabello, Marco T\'ulio Quintino, Matthias Kleinmann	(参考訳) MIP=RE は C_{qa} (テンソル積相関の集合の閉包) と C_{qc} (可換相関の集合) を超平面(ベルのような不等式)で分離することができ、有限次元テンソル積相関の列で近似できない無限次元量子系上の可換測度(有限個および有限個の結果)によって生成される相関が存在することを意味する。この結果から、論理的に可能な宇宙は4つあると指摘する。それぞれの可能性は、受け入れられた物理理論の限界または自然の重要な側面をテストする機会を明らかにするため興味深い。私たちは、これらの宇宙のどれかを学ぶために、道路マップを設計するのに役立つ、いくつかのオープンな問題をリストアップします。 MIP=RE implies that C_{qa} (the closure of the set of tensor product correlations) and C_{qc} (the set of commuting correlations) can be separated by a hyperplane (i.e., a Bell-like inequality) and that there are correlations produced by commuting measurements (a finite number of them and with a finite number of outcomes) on an infinite-dimensional quantum system which cannot be approximated by sequences of finite-dimensional tensor product correlations. We point out that there are four logically possible universes after this result. Each possibility is interesting because it reveals either limitations in accepted physical theories or opportunities to test crucial aspects of nature. We list some open problems that may help us to design a road map to learn in which of these universes we are.	翻訳日:2023-07-07 14:15:20 公開日:2023-07-06
# 従業員の心理的契約違反が情報セキュリティポリシーの遵守に及ぼす影響--内発的・外発的動機づけ The impact of an employee's psychological contract breach on compliance with information security policies: intrinsic and extrinsic motivation ( http://arxiv.org/abs/2307.02916v1 ) ライセンス: Link先を確認	Daeun Lee and Harjinder Singh Lallie and Nadine Michaelides	(参考訳) ソーシャルエンジニアリング攻撃の急速な増加にもかかわらず、すべての従業員が、組織が期待するほど情報セキュリティポリシー(ISP)に準拠しているわけではない。 ISPの非準拠は、様々な心理的動機によって引き起こされる。本研究では、計画行動理論(TPB)と一般抑止理論(GDT)を用いて動機を内発的動機と外発的動機に分け、従業員の心理的契約違反(PCB)がISPコンプライアンス意図(ICI)に及ぼす影響について検討した。英国人従業員 (n=206) のデータ分析の結果, PCB が高いほどICI が低くなることがわかった。また,PCBはICIに対する内発的動機(態度と公正感)を有意に低下させた一方,外発的動機(制裁の重大さと制裁の確実性)とICIの関係を調整しなかった。その結果、ISセキュリティ分野におけるPCBのリスクに対処し、高いPCBを持つ従業員に対して効果的な解決策を提案する。 Despite the rapid rise in social engineering attacks, not all employees comply with information security policies (ISPs) to the extent that organisations expect. ISP non-compliance is caused by a variety of psychological motivations. This study investigates the effect of employees' psychological contract breach (PCB) on ISP compliance intention (ICI), dividing motivation into intrinsic and extrinsic components using the theory of planned behaviour (TPB) and the general deterrence theory (GDT). Data analysis from UK employees (n=206) showed that the higher the PCB, the lower the ICI. The study also found that PCBs significantly reduced intrinsic motivation (attitude and perceived fairness) for ICI, whereas PCBs did not moderate the relationship between extrinsic motivation (sanction severity and sanction certainty) and ICI. As a result, this study successfully addresses the risks of PCBs in the field of IS security and proposes effective solutions for employees with high PCBs.	翻訳日:2023-07-07 14:14:51 公開日:2023-07-06
# lea: 語彙的注意バイアスを用いたタイプミスに対する文類似性の改善 LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias ( http://arxiv.org/abs/2307.02912v1 ) ライセンス: Link先を確認	Mario Almagro, Emilio Almaz\'an, Diego Ortego, David Jim\'enez	(参考訳) タイプミスや略語などのテキストノイズは、下流のタスクでバニラトランスフォーマをペナライズする有名な問題である。これは、複数の領域における基本的なタスクである文類似性(例えば、マッチング、検索、言い換えなど)のケースでもある。文の類似性はクロスエンコーダを用いてアプローチすることができ、2つの文が入力に連結され、モデルがそれらの間の関係を利用することができる。ノイズ問題に対処する以前の研究は、主にデータ拡張戦略に依存しており、トレーニングに使用されるものと類似した破損したサンプルを扱う際の堅牢性が向上している。しかし、これらの手法はすべて依然としてtyposによって引き起こされるトークン分布シフトに苦しむ。本稿では,両文の単語間の語彙類似性を組み込んだ新しい語彙認識アテンションモジュール(lea)をクロスエンコーダに実装し,テキスト雑音に対処することを提案する。テキストの類似性を利用してトークン化シフト問題を回避し,ロバスト性を向上させる。 LEAによって導入された注意バイアスは、特に短文記述と限られたコンテキストを持つドメインにおいて、テキストノイズを伴う複雑なシナリオにクロスエンコーダが取り組むのに役立つことを実証する。製品マッチングのために5つのeコマースデータセットに3つの人気のあるTransformerエンコーダを使用した実験によると、LEAはノイズの存在下でパフォーマンスを継続的に向上する一方で、元の(クリーン)分割に競争力を維持する。また,本手法を2つのデータセットで評価し,LEAが文の長い領域やより自然な文脈でタイポに頑健であることを示す。さらに,本手法における設計選択を徹底的に分析し,意思決定の影響について考察し,タイポスを扱うクロスエンコーダの今後の研究を促進する。 Textual noise, such as typos or abbreviations, is a well-known issue that penalizes vanilla Transformers for most downstream tasks. We show that this is also the case for sentence similarity, a fundamental task in multiple domains, e.g. matching, retrieval or paraphrasing. Sentence similarity can be approached using cross-encoders, where the two sentences are concatenated in the input allowing the model to exploit the inter-relations between them. Previous works addressing the noise issue mainly rely on data augmentation strategies, showing improved robustness when dealing with corrupted samples that are similar to the ones used for training. However, all these methods still suffer from the token distribution shift induced by typos. In this work, we propose to tackle textual noise by equipping cross-encoders with a novel LExical-aware Attention module (LEA) that incorporates lexical similarities between words in both sentences. 
By using raw text similarities, our approach avoids the tokenization shift problem and obtains improved robustness. We demonstrate that the attention bias introduced by LEA helps cross-encoders to tackle complex scenarios with textual noise, especially in domains with short-text descriptions and limited context. Experiments using three popular Transformer encoders on five e-commerce datasets for product matching show that LEA consistently boosts performance in the presence of noise, while remaining competitive on the original (clean) splits. We also evaluate our approach on two datasets for textual entailment and paraphrasing, showing that LEA is robust to typos in domains with longer sentences and more natural context. Additionally, we thoroughly analyze several design choices in our approach, providing insights about the impact of the decisions made and fostering future research in cross-encoders dealing with typos.	翻訳日:2023-07-07 14:14:32 公開日:2023-07-06
# GilBERToにおける動作主性と終結性: 認知的含意 Agentività e telicità in GilBERTo: implicazioni cognitive ( http://arxiv.org/abs/2307.02910v1 ) ライセンス: Link先を確認	Agnese Lombardi, Alessandro Lenci	(参考訳) 本研究の目的は,Transformerベースのニューラル言語モデルが語彙意味論を推論し,その情報を形態統語パターンの補完に利用するかどうかを検討することである。対象とする意味特性は、終結性(telicity、定性との組み合わせも含む)と動作主性(agentivity)である。どちらも意味論と形態統語論のインターフェイスで作用し、意味的に決定され、統語的に符号化される。タスクは、計算モデルとイタリア語母語話者のグループの両方に課された。 2つのデータ群の比較により、ニューラル言語モデルが人間の意味的能力の重要な側面をどの程度捉えているかを調べることができる。 The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics and uses this information for the completion of morphosyntactic patterns. The semantic properties considered are telicity (also combined with definiteness) and agentivity. Both act at the interface between semantics and morphosyntax: they are semantically determined and syntactically encoded. The tasks were submitted to both the computational model and a group of native speakers of Italian. The comparison between the two groups of data allows us to investigate to what extent neural language models capture significant aspects of human semantic competence.	翻訳日:2023-07-07 14:14:02 公開日:2023-07-06
# 音声-視覚的エンドツーエンドのマルチチャンネル音声分離・デバーベレーション・認識 Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition ( http://arxiv.org/abs/2307.02909v1 ) ライセンス: Link先を確認	Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu	(参考訳) 重なり合う話者、騒音、残響を含むカクテルパーティー音声の正確な認識は、現在でも非常に難しい課題である。本稿では、音響信号の劣化に対する視覚的モダリティの不変性、音声-視覚的多チャンネル音声分離、全システムコンポーネントに視覚情報をフルに組み込んだデバーベーションと認識アプローチを提案する。ビデオ入力の有効性は、マスクベースのMVDR音声分離、DNN-WPEまたはスペクトルマッピング(SpecM)ベースの音声デバーベレーションフロントエンド、コンフォーマーASRバックエンドで一貫して実証される。マスクを用いたWPDによるパイプライン化, 共同方式による音声分離, 残響処理を行うフロントエンドアーキテクチャについて検討した。音声強調フロントエンドとASRバックエンドコンポーネント間の誤差コストのミスマッチは、ASRコスト関数のみを用いたエンドツーエンドの微調整や、音声強調損失の補間によって最小化する。オックスフォードLSS2データセットのシミュレーションや再生を用いて合成した重畳および残響音声データについて実験を行った。提案された音声-視覚的多チャンネル音声分離、収差認識システムは、同等の音声のみのベースラインを9.1%、絶対値6.2%(41.7%、相対値36.0%)のワードエラー率(WER)で一貫して上回った。 PESQ, STOI, SRMRでは, 音声強調の改善が得られた。 Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all system components is proposed in this paper. The efficacy of the video input is consistently demonstrated in mask-based MVDR speech separation, DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-end and Conformer ASR back-end. Audio-visual integrated front-end architectures performing speech separation and dereverberation in a pipelined or joint fashion via mask-based WPD are investigated. The error cost mismatch between the speech enhancement front-end and ASR back-end components is minimized by end-to-end jointly fine-tuning using either the ASR cost function alone, or its interpolation with the speech enhancement loss. 
Experiments were conducted on mixed overlapped and reverberant speech data constructed using simulation or replay of the Oxford LRS2 dataset. The proposed audio-visual multi-channel speech separation, dereverberation and recognition systems consistently outperformed the comparable audio-only baseline, with word error rate (WER) reductions of 9.1% and 6.2% absolute (41.7% and 36.0% relative). Consistent speech enhancement improvements were also obtained on PESQ, STOI and SRMR scores.	翻訳日:2023-07-07 14:13:51 公開日:2023-07-06
# CNNと意思決定レベル融合を用いたマルチモーダル・マルチクラスパーキンソン病分類 Multi-modal multi-class Parkinson disease classification using CNN and decision level fusion ( http://arxiv.org/abs/2307.02978v1 ) ライセンス: Link先を確認	Sushanta Kumar Sahu, Ananda S. Chowdhury	(参考訳) パーキンソン病(PD)は、世界保健機関(WHO)の報告によると2番目に多い神経変性疾患である。本稿では,MRIとDTIの2つのモダリティを用いた直接3クラスPD分類を提案する。分類に用いる3つのクラスは、PD、ドーパミン欠乏の証拠のないスキャン(SWEDD)、健常対照(HC)である。目的を達成するために,MRIから白質と灰白質を、DTIから分数異方性と平均拡散率を用いる。上記の4種類のデータに基づいて、4つの別々のCNNを訓練する。決定レベルでは、4つのCNNモデルの出力を最適な重み付き平均融合技術によって融合する。公開されているPPMIデータベース上で,PD,HC,SWEDDの直接3クラス分類において95.53%の精度を達成した。一連のアブレーション研究を含む広範な比較は,提案法の有効性を明確に示している。 Parkinson disease (PD) is the second most common neurodegenerative disorder, as reported by the World Health Organization. In this paper, we propose a direct three-class PD classification using two different modalities, namely MRI and DTI. The three classes used for classification are PD, Scans Without Evidence of Dopamine Deficit (SWEDD), and Healthy Control (HC). We use white matter and gray matter from the MRI and fractional anisotropy and mean diffusivity from the DTI to achieve our goal. We train four separate CNNs on the above four types of data. At the decision level, the outputs of the four CNN models are fused with an optimal weighted average fusion technique. We achieve an accuracy of 95.53% for the direct three-class classification of PD, HC and SWEDD on the publicly available PPMI database. Extensive comparisons including a series of ablation studies clearly demonstrate the effectiveness of our proposed solution.	翻訳日:2023-07-07 14:08:14 公開日:2023-07-06
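The decision-level fusion described in the entry above can be illustrated with a minimal Python sketch. This is an assumption-laden toy, not the authors' implementation: the function name `fuse_predictions`, the example probabilities, and the weights are all hypothetical, and the abstract does not say how the "optimal" weights are found, so they are fixed by hand here.

```python
import numpy as np

def fuse_predictions(probs, weights):
    """Weighted-average fusion of per-model class probabilities.

    probs:   shape (n_models, n_samples, n_classes)
    weights: shape (n_models,), non-negative (hypothetical values)
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()               # normalise so the weights sum to 1
    fused = np.tensordot(weights, probs, axes=1)    # -> (n_samples, n_classes)
    return fused, fused.argmax(axis=1)

# Hypothetical softmax outputs of four models for two scans and three classes.
model_probs = [
    [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.2, 0.5, 0.3], [0.3, 0.3, 0.4]],
    [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]],
]
fused, labels = fuse_predictions(model_probs, weights=[0.4, 0.3, 0.1, 0.2])
print(fused)
print(labels)
```

Fusing at the decision level keeps the four networks independent, so each can be trained and replaced separately; only the weight vector couples them.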
# スマートフォン音声データからCOVID-19の効率的な検出のためのトランスファー学習 Transfer Learning for the Efficient Detection of COVID-19 from Smartphone Audio Data ( http://arxiv.org/abs/2307.02975v1 ) ライセンス: Link先を確認	Mattia Giovanni Campana, Franca Delmastro, Elena Pagani	(参考訳) スマートフォンデータから病気を検出することは、モバイル健康(m-health)システムにおけるオープンな研究課題である。新型コロナウイルスとその呼吸器症状は、この分野で重要なケーススタディであり、その早期発見は、パンデミックの状況に対処するための潜在的な手段である。このソリューションの有効性は主に、収集されたデータに適用されたAIアルゴリズムのパフォーマンスと、ユーザのモバイルデバイスに直接適用可能な実装に依存する。本稿では,これらの課題と限られたデータ量を考慮し,手作りの特徴と比較した3種類の深層学習モデルの実験的評価と,特徴抽出と微調整という転移学習の2つの主なアプローチの評価について述べる。具体的には、VGGish、YAMNET、L³-Net(12の異なる構成を含む)を4つの異なるデータセット(合計13,447のサンプル)でユーザに依存しない実験によって評価した。その結果、すべての実験設定においてL³-Netの利点が明確に示され、特徴抽出器として用いた場合はPrecision-Recall AUCで他のソリューションを12.3%上回り、モデルを微調整した場合は10%上回った。さらに、事前学習したモデルの完全連結層のみを微調整すると、一般的に性能が低下し、特徴抽出に対して平均6.6%低下する。最後に,様々なモデルのメモリフットプリントを評価し,商用モバイルデバイス上での利用の可能性について検討した。 Disease detection from smartphone data represents an open research challenge in mobile health (m-health) systems. COVID-19 and its respiratory symptoms are an important case study in this area, and their early detection is a potential instrument to counteract the pandemic situation. The efficacy of this solution mainly depends on the performance of AI algorithms applied to the collected data and their possible implementation directly on the users' mobile devices. Considering these issues, and the limited amount of available data, in this paper we present the experimental evaluation of 3 different deep learning models, compared also with hand-crafted features, and of the two main approaches to transfer learning in the considered scenario: feature extraction and fine-tuning. Specifically, we considered VGGish, YAMNET, and L³-Net (including 12 different configurations) evaluated through user-independent experiments on 4 different datasets (13,447 samples in total). 
Results clearly show the advantages of L³-Net in all the experimental settings: it outperforms the other solutions by 12.3% in terms of Precision-Recall AUC as a feature extractor, and by 10% when the model is fine-tuned. Moreover, we note that fine-tuning only the fully-connected layers of the pre-trained models generally leads to worse performance, with an average drop of 6.6% with respect to feature extraction. Finally, we evaluate the memory footprints of the different models for their possible application on commercial mobile devices.	翻訳日:2023-07-07 14:08:02 公開日:2023-07-06
# リモートセンシング画像超解像のためのクロス空間画素統合とクロスステージ機能融合型トランスネットワーク Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution ( http://arxiv.org/abs/2307.02974v1 ) ライセンス: Link先を確認	Yuting Lu, Lingtong Min, Binglu Wang, Le Zheng, Xiaoxu Wang, Yongqiang Zhao, Teng Long	(参考訳) リモートセンシング画像スーパーレゾリューション(RSISR)は、空間ディテールの強化と衛星画像の品質向上に重要な役割を果たす。近年、TransformerベースのモデルはRSISRの競争性能を示している。グローバルな自己注意による二次計算の複雑さを軽減するため、様々な手法が局所的な窓に注意を拘束し、効率を高める。その結果、単一の注意層における受容場は不十分であり、コンテキストモデリングが不十分となる。さらに、ほとんどのTransformerベースのアプローチは、スキップ接続を通じて浅い特徴を再利用するが、これらの接続のみに依存することによって、浅い特徴と深い特徴を等しく扱い、モデルの特徴付け能力を妨げる。これらの課題に対処するため,RSISR 用 Cross-Spatial Pixel Integration と Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) と呼ばれる新しいトランスフォーマアーキテクチャを提案する。提案モデルは,画像全体のグローバル認知と理解を効果的に促進し,機能統合の効率化を図る。本モデルでは,CSPIA (Cross-spatial pixel Integration attention) を用いて局所窓にコンテキスト情報を導入し,CSFFA (Cross-stage feature fusion attention) は前段階の特徴を適応的に融合させ,現行の要件に則って特徴表現を改善する。本研究では,複数のベンチマークデータセットを対象とした総合的な実験を行い,提案するSPIFFNetの性能を,最先端手法と比較して定量的指標と視覚品質の両面で実証した。 Remote sensing image super-resolution (RSISR) plays a vital role in enhancing spatial details and improving the quality of satellite imagery. Recently, Transformer-based models have shown competitive performance in RSISR. To mitigate the quadratic computational complexity resulting from global self-attention, various methods constrain attention to a local window, enhancing its efficiency. Consequently, the receptive fields in a single attention layer are inadequate, leading to insufficient context modeling. Furthermore, while most Transformer-based approaches reuse shallow features through skip connections, relying solely on these connections treats shallow and deep features equally, impeding the model's ability to characterize them. To address these issues, we propose a novel transformer architecture called Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) for RSISR. 
Our proposed model effectively enhances global cognition and understanding of the entire image, facilitating efficient integration of features across stages. The model incorporates cross-spatial pixel integration attention (CSPIA) to introduce contextual information into a local window, while cross-stage feature fusion attention (CSFFA) adaptively fuses features from the previous stage to improve feature expression in line with the requirements of the current stage. We conducted comprehensive experiments on multiple benchmark datasets, demonstrating the superior performance of our proposed SPIFFNet in terms of both quantitative metrics and visual quality when compared to state-of-the-art methods.	翻訳日:2023-07-07 14:07:36 公開日:2023-07-06
# プルーニング対量子化:どちらが良いか Pruning vs Quantization: Which is Better? ( http://arxiv.org/abs/2307.02973v1 ) ライセンス: Link先を確認	Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort	(参考訳) ニューラルネットワークのプルーニングと量子化技術は、ニューラルネットワーク自体と同じくらい古い。しかし、現在では両者のアドホックな比較しか発表されていない。本稿では,ニューラルネットワークの量子化とプルーニングのどちらがよいのか,という問いに答える。この質問に答えることで、今後ニューラルネットワークハードウェアに関する設計決定が下されることを期待します。ディープニューラルネットワークを圧縮する2つの手法を広範囲に比較した。まず、一般的なデータ分布に対する期待量子化とプルーニング誤差の分析比較を行う。次に,学習ネットワークにおける層毎のプルーニングと量子化誤差の上限を低くし,最適化後の経験的誤差と比較する。最後に,8つの大規模モデルを3つのタスクでトレーニングするための実験的な比較を行った。その結果,ほとんどの場合,量子化はプルーニングよりも優れていた。圧縮比が非常に高いいくつかのシナリオでのみ、プルーニングは精度の観点から有益である。 Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question on which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with very high compression ratio, pruning might be beneficial from an accuracy standpoint.	翻訳日:2023-07-07 14:07:06 公開日:2023-07-06
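To make the comparison in the entry above concrete, the following Python sketch measures the mean squared error of magnitude pruning and of uniform quantization on synthetic Gaussian weights. It is a toy under stated assumptions, not the paper's analysis: the helper names are hypothetical, the weights are random rather than trained, and "equal compression" is taken loosely as 50% sparsity versus 8-bit storage relative to a 16-bit baseline, ignoring sparse-index overhead.

```python
import numpy as np

def prune_magnitude(w, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w).ravel())[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(w) <= threshold, 0.0, w)

def quantize_uniform(w, bits):
    """Symmetric uniform quantization over [-max|w|, max|w|]."""
    max_abs = np.abs(w).max()
    n_steps = 2 ** (bits - 1) - 1                   # 127 positive levels for 8 bits
    scale = max_abs / n_steps
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal(10_000)               # synthetic stand-in for a layer

pruned = prune_magnitude(weights, sparsity=0.5)
quantized = quantize_uniform(weights, bits=8)

prune_mse = float(np.mean((weights - pruned) ** 2))
quant_mse = float(np.mean((weights - quantized) ** 2))
print(f"pruning MSE:      {prune_mse:.6f}")
print(f"quantization MSE: {quant_mse:.6f}")
```

On this Gaussian toy the quantization error comes out smaller, which is in line with the abstract's claim for most cases; heavy-tailed weight distributions or much higher compression ratios can change the picture.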
# テキスト対画像生成における文化的ギャップについて On the Cultural Gap in Text-to-Image Generation ( http://arxiv.org/abs/2307.02971v1 ) ライセンス: Link先を確認	Bingshuai Liu, Longyue Wang, Chenyang Lyu, Yong Zhang, Jinsong Su, Shuming Shi, Zhaopeng Tu	(参考訳) テキスト・トゥ・イメージ(T2I)生成における課題の1つは、トレーニングデータに存在する文化ギャップの意図しない反映であり、入力テキストの文化的要素がトレーニングセットにほとんど収集されない場合に生成された画像品質の相違を示す。様々なT2Iモデルは印象的だが任意の例を示しているが、T2Iモデルが異文化間画像を生成する能力を体系的に評価するベンチマークは存在しない。このギャップを埋めるために、モデルが対象文化にどの程度適しているかを評価するための総合的な評価基準を備えたChallenging Cross-Cultural (C3)ベンチマークを提案する。 C3ベンチマークでStable Diffusionモデルによって生成された欠陥画像を解析することにより、そのモデルが特定の文化オブジェクトを生成するのに失敗することが多いことが分かる。そこで本稿では,T2Iモデルを微調整して異文化生成を改善するために,対象文化の微調整データをフィルタするために,オブジェクト・テキストアライメントを考慮した新しいマルチモーダル・メトリックを提案する。実験結果から,我々のマルチモーダル・メトリックは既存の指標よりもC3ベンチマーク上でより強力なデータ選択性能を提供することが示された。文化的に多様なT2I生成の今後の研究を促進するために、ベンチマーク、データ、コード、生成した画像を公開する(https://github.com/longyuewangdcu/C3-Bench)。 One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data, which signifies the disparity in generated image quality when the cultural elements of the input text are rarely collected in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-tuning data in the target culture, which is used to fine-tune a T2I model to improve cross-cultural generation. 
Experimental results show that our multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, in which the object-text alignment is crucial. We release the benchmark, data, code, and generated images to facilitate future research on culturally diverse T2I generation (https://github.com/longyuewangdcu/C3-Bench).	翻訳日:2023-07-07 14:06:53 公開日:2023-07-06
# DPM: 分離による機密データのクラスタリング DPM: Clustering Sensitive Data through Separation ( http://arxiv.org/abs/2307.02969v1 ) ライセンス: Link先を確認	Yara Schütt, Johannes Liebenow, Tanya Braun, Marcel Gehrke, Florian Thaeter, Esfandiar Mohammadi	(参考訳) プライバシ保存型クラスタリングは、機密情報の保護を保証しつつ、教師なしの方法でデータポイントをグループ化する。以前のプライバシ保存クラスタリングは、ポイントクラウドの集中度を特定することに重点を置いていた。本稿では,データセットを分割する適切な分離子を特定することに着目する。本稿では,差分プライベートな方法で正確なデータポイントセパレータを探索する,差分プライベートクラスタリングアルゴリズムDPMを提案する。 DPMは、正確なセパレータを見つけるための2つの重要な課題に対処する。すなわち、クラスタ内の小さなギャップではなくクラスタ間の大きなギャップであるセパレータを特定すること、およびプライバシ予算を効率的に使用するために、データを大きなサブパートに分割するセパレータを優先することである。差分プライベートな指数メカニズムを用いて、DPMは証明可能な高いユーティリティを持つクラスタセパレータをランダムに選択する: データセット$D$に対して、中央の$60\%$分位範囲に広い低密度セパレータがある場合、DPMは確率$1 - \exp(-\sqrt{\|D\|})$でそのセパレータを見つける。実験の結果,DPMはクラスタリング指標である慣性(inertia)において有意な改善を達成した。非プライベートなKMeans++の慣性をベースラインとすると、$\varepsilon = 1$、$\delta=10^{-5}$において、DPMはChangとKamathによる最先端クラスタリングアルゴリズムと比較して、ベースラインとの差を合成データセットで最大$50\%$、実世界のデータセットで最大$62\%$改善する。 Privacy-preserving clustering groups data points in an unsupervised manner whilst ensuring that sensitive information remains protected. Previous privacy-preserving clustering focused on identifying concentrations of point clouds. In this paper, we take another path and focus on identifying appropriate separators that split a data set. We introduce the novel differentially private clustering algorithm DPM that searches for accurate data point separators in a differentially private manner. DPM addresses two key challenges for finding accurate separators: identifying separators that are large gaps between clusters instead of small gaps within a cluster and, to efficiently spend the privacy budget, prioritising separators that split the data into large subparts. Using the differentially private Exponential Mechanism, DPM randomly chooses cluster separators with provably high utility: For a data set $D$, if there is a wide low-density separator in the central $60\%$ quantile, DPM finds that separator with probability $1 - \exp(-\sqrt{\|D\|})$. 
Our experimental evaluation demonstrates that DPM achieves significant improvements in terms of the clustering metric inertia. With the inertia results of the non-private KMeans++ as a baseline, for $\varepsilon = 1$ and $\delta=10^{-5}$ DPM improves upon the difference to the baseline by up to $50\%$ for a synthetic data set and by up to $62\%$ for a real-world data set compared to a state-of-the-art clustering algorithm by Chang and Kamath.	翻訳日:2023-07-07 14:06:26 公開日:2023-07-06
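The Exponential Mechanism that the DPM entry above relies on can be sketched in a few lines of Python. This is the generic textbook mechanism, not DPM itself: the function name, the candidate utilities, and the sensitivity value are hypothetical, and the abstract does not specify the utility function DPM uses to score separators.

```python
import numpy as np

def exponential_mechanism(utilities, epsilon, sensitivity, rng):
    """Sample an index with probability proportional to exp(eps * u / (2 * sensitivity))."""
    utilities = np.asarray(utilities, dtype=float)
    scores = epsilon * utilities / (2.0 * sensitivity)
    scores = scores - scores.max()      # shift for numerical stability; probabilities unchanged
    probs = np.exp(scores)
    probs = probs / probs.sum()
    index = int(rng.choice(len(utilities), p=probs))
    return index, probs

# Hypothetical utilities for three candidate separators (higher is better).
candidate_utilities = [1.0, 5.0, 2.0]
chosen, probs = exponential_mechanism(
    candidate_utilities, epsilon=1.0, sensitivity=1.0, rng=np.random.default_rng(0)
)
print(chosen, probs)
```

The mechanism favours high-utility candidates exponentially while still giving every candidate non-zero probability, which is what lets the selection satisfy differential privacy.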
# 空間スペクトルベクトルビーム Spatio-Spectral Vector Beams ( http://arxiv.org/abs/2307.02965v1 ) ライセンス: Link先を確認	Lea Kopf, Rafael Barros, Robert Fickler	(参考訳) 自由度(dof)の高度な操作によって光場の複雑さが増すことは、基礎研究や技術にとって新たな機会となる。光の空間的またはスペクトル的な形状に関連する偏光は、完全に偏光され、空間的またはスペクトル的に変化する偏光構造を持ついわゆる空間的またはスペクトル的ベクトルビームをもたらす。ここでは、両方のアプローチを組み合わせることでベクトルビームの一般的な考え方を拡張し、空間、波長、偏光の3つの非分離性DoFにおける新しい光状態を構築する。我々は、それらの複素偏光構造を詳細に研究し、場の偏光の度合いは、空間と波長が狭く定義されているときにのみ明らかにすることを示し、非分離量子系におけるコヒーレンス損失の類似性を実証する。このような光場は、古典的な光場の非分離性や新しい技術機会、例えばイメージングや分光の応用に関する基礎研究を可能にする。 Increasing the complexity of a light field through the advanced manipulation of its degrees of freedom (DoF) provides new opportunities for fundamental studies and technologies. Correlating polarization with the light's spatial or spectral shape results in so-called spatial or spectral vector beams that are fully polarized and have a spatially or spectrally varying polarization structure. Here, we extend the general idea of vector beams by combining both approaches and structuring a novel state of light in three non-separable DoF's, i.e. space, wavelength, and polarization. We study in detail their complex polarization structure, show that the degree of polarization of the field is only unveiled when the field is narrowly defined in space and wavelength, and demonstrate the analogy to the loss of coherence in non-separable quantum systems. Such light fields allow fundamental studies on the non-separable nature of a classical light field and new technological opportunities, e.g. through applications in imaging or spectroscopy.	翻訳日:2023-07-07 14:05:50 公開日:2023-07-06
# 局所パウリ雑音流路の構造とパラメータの効率的な学習 Efficient learning of the structure and parameters of local Pauli noise channels ( http://arxiv.org/abs/2307.02959v1 ) ライセンス: Link先を確認	Cambyse Rouz\'e, Daniel Stilck Fran\c{c}a	(参考訳) ノイズの避けられない存在は、大規模量子コンピュータの開発にとって重要な障害であり、量子ノイズを高精度で確実かつ効率的に特徴づける能力は、量子技術のさらなる拡張に不可欠である。任意の量子チャネルを推定するには指数的な資源を必要とするが、物理的に関連するノイズはいくつかの局所構造を持つことが期待されている。前回の研究では、既知の条件付き独立構造から外れても、状態の準備や測定エラーに頑健な方法で、効率的なサンプル数でポーリノイズチャネルを推定できることが示されている。本稿では,n量子ビット上でポーリノイズチャネルを学習する新しい手法を提案する。条件付き独立構造を持つ学習係数に着目した先行研究とは異なり,本手法は係数と基礎構造の両方を学習する。我々は,Gibs測度を効率よく学習するためにBreslerによる画期的な結果を利用して,O(log(n))の最適なサンプル複雑性を求め,n量子ビットに作用する雑音の未知構造を学習する。この情報を利用すれば、O(poly(n))サンプルからダイヤモンド距離に近いチャネルの記述を得ることができる。さらに,本手法は,SPAMロバストネスなどの他の望ましい特徴を諦めることなく,サンプル数と後処理の両面で効率的であり,単一キュービットクリフォードの実装しか必要としない。これを踏まえ, 量子デバイスにおけるポーリノイズの大規模キャラクタリゼーションを, 最小実験条件と仮定下で実現している。 The unavoidable presence of noise is a crucial roadblock for the development of large-scale quantum computers and the ability to characterize quantum noise reliably and efficiently with high precision is essential to scale quantum technologies further. Although estimating an arbitrary quantum channel requires exponential resources, it is expected that physically relevant noise has some underlying local structure, for instance that errors across different qubits have a conditional independence structure. Previous works showed how it is possible to estimate Pauli noise channels with an efficient number of samples in a way that is robust to state preparation and measurement errors, albeit departing from a known conditional independence structure. We present a novel approach for learning Pauli noise channels over n qubits that addresses this shortcoming. Unlike previous works that focused on learning coefficients with a known conditional independence structure, our method learns both the coefficients and the underlying structure. 
We achieve our results by leveraging a groundbreaking result by Bresler for efficiently learning Gibbs measures and obtain an optimal sample complexity of O(log(n)) to learn the unknown structure of the noise acting on n qubits. This information can then be leveraged to obtain a description of the channel that is close in diamond distance from O(poly(n)) samples. Furthermore, our method is efficient both in the number of samples and postprocessing without giving up on other desirable features such as SPAM-robustness, and only requires the implementation of single qubit Cliffords. In light of this, our novel approach enables the large-scale characterization of Pauli noise in quantum devices under minimal experimental requirements and assumptions.	翻訳日:2023-07-07 14:05:32 公開日:2023-07-06
# スピン軌道結合二重量子ドットの分類とマジック磁場方向 Classification and magic magnetic-field directions for spin-orbit-coupled double quantum dots ( http://arxiv.org/abs/2307.02958v1 ) ライセンス: Link先を確認	Aritra Sen, Gy\"orgy Frank, Baksa Kolok, Jeroen Danon, Andr\'as P\'alyi	(参考訳) 半導体量子ドットに閉じ込められた単一電子のスピンは自然量子ビット候補である。スピンベースの量子コンピューティングの基本構成要素は、スピン軌道結合の大きい二重量子ドットで実証されている。ここで、スピン軌道結合二重量子ドットは、その$g$-tensorの多次元空間の分割により、6つのクラスに分類できることを示す。このクラスは二重点の物理的特性、すなわち、輸送、分光、コヒーレンス測定の特徴、および量子ビット制御、シャットリング、読み出し実験などを決定する。特に、スピン物理学は、外部磁場が特殊方向(‘マジック方向’)を指しているときに、シュードスピン保存のために高度に単純化され、特殊方向の数はクラスによって決定される。また,等局所ゼーマン分割に対応する磁場方向空間におけるマジックループの存在と関連性を解析した。これらの結果は、強いスピン軌道結合を持つ材料におけるスピンベースの量子コンピューティング実験の正確な解釈と効率的な設計に向けた重要なステップを示す。 The spin of a single electron confined in a semiconductor quantum dot is a natural qubit candidate. Fundamental building blocks of spin-based quantum computing have been demonstrated in double quantum dots with significant spin-orbit coupling. Here, we show that spin-orbit-coupled double quantum dots can be categorised in six classes, according to a partitioning of the multi-dimensional space of their $g$-tensors. The class determines physical characteristics of the double dot, i.e., features in transport, spectroscopy and coherence measurements, as well as qubit control, shuttling, and readout experiments. In particular, we predict that the spin physics is highly simplified due to pseudospin conservation, whenever the external magnetic field is pointing to special directions (`magic directions'), where the number of special directions is determined by the class. We also analyze the existence and relevance of magic loops in the space of magnetic-field directions, corresponding to equal local Zeeman splittings. These results present an important step toward precise interpretation and efficient design of spin-based quantum computing experiments in materials with strong spin-orbit coupling.	翻訳日:2023-07-07 14:05:04 公開日:2023-07-06
# SegNetr:U字型ネットワークにおけるローカル-グローバルインタラクションとスキップ接続の再考 SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks ( http://arxiv.org/abs/2307.02953v1 ) ライセンス: Link先を確認	Junlong Cheng, Chengrui Gao, Fengjie Wang, Min Zhu	(参考訳) 近年,U字型ネットワークは,シンプルで手軽に調整可能な構造であるため,医用画像セグメンテーションの分野を支配している。しかし、既存のu字型セグメンテーションネットワーク: 1) 主に、畳み込み操作に基づく長期依存の欠如を補う複雑な自己注意モジュールの設計に焦点が当てられ、ネットワークのパラメータの総数と計算複雑性が増大する。 2) 単にエンコーダとデコーダの特徴を融合させ, 空間的位置の接続を無視する。本稿では、上記の問題を再考し、SegNetrと呼ばれる軽量な医用画像分割ネットワークを構築する。具体的には,任意の段階で動的に局所的・局所的相互作用を行なえる新しいSegNetrブロックを提案する。同時に、エンコーダ特徴の空間的位置情報を保存し、デコーダ特徴との正確な融合を実現するための汎用情報保持スキップ接続(IRSC)を設計する。我々は,4つの主流な医用画像セグメンテーションデータセットに対するSegNetrの有効性を検証し,59 %,76 %のパラメータとGFLOPをバニラU-Netよりも少なくし,最先端の手法に匹敵するセグメンテーション性能を実現した。特に,本論文で提案するコンポーネントを他のU字型ネットワークに適用し,セグメンテーション性能を向上させる。 Recently, U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure. However, existing U-shaped segmentation networks: 1) mostly focus on designing complex self-attention modules to compensate for the lack of long-term dependence based on convolution operation, which increases the overall number of parameters and computational complexity of the network; 2) simply fuse the features of encoder and decoder, ignoring the connection between their spatial locations. In this paper, we rethink the above problem and build a lightweight medical image segmentation network, called SegNetr. Specifically, we introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity. At the same time, we design a general information retention skip connection (IRSC) to preserve the spatial location information of encoder features and achieve accurate fusion with the decoder features. 
We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59% and 76% fewer parameters and GFLOPs than vanilla U-Net, while achieving segmentation performance comparable to state-of-the-art methods. Notably, the components proposed in this paper can be applied to other U-shaped networks to improve their segmentation performance.	翻訳日:2023-07-07 14:04:47 公開日:2023-07-06
# ラベル効率のよい3D-to-2Dセグメンテーションのためのモード間再構成と特徴投影ネットワークによる自己教師あり学習 Self-supervised learning via inter-modal reconstruction and feature projection networks for label-efficient 3D-to-2D segmentation ( http://arxiv.org/abs/2307.03008v1 ) ライセンス: Link先を確認	José Morano, Guilherme Aresta, Dmitrii Lachinov, Julia Mai, Ursula Schmidt-Erfurth, Hrvoje Bogunović	(参考訳) 深層学習は、特定の医用画像セグメンテーションタスクを自動化し、医療専門家の作業量を大幅に軽減する貴重なツールとなっている。これらのタスクのいくつかは、入力次元のサブセットでセグメンテーションを行う必要があり、最も一般的なケースは3D-to-2Dである。しかし、転移学習のようなデータ効率のよい手法で、これらのタスクで検証されたものが現時点で存在しないため、既存の手法の性能はラベル付きデータの量によって強く左右される。本研究では,ラベル効率のよい3D-to-2Dセグメンテーションのための新しい畳み込みニューラルネットワーク(CNN)と自己教師付き学習(SSL)手法を提案する。 CNNは、3Dエンコーダと2Dデコーダからなり、新しい3D-to-2Dブロックで接続される。 SSL法は次元の異なるモダリティのイメージペアを再構成する。本手法は,光コヒーレンス・トモグラフィーにおける地図状萎縮と網状偽ドルーゼンのen-faceセグメンテーションという,臨床的に重要な2つのタスクで検証された。異なるデータセット上の結果から,提案するCNNは,ラベル付きデータが限られたシナリオにおいて,Diceスコアで最大8%,最先端技術を大幅に改善することが示された。さらに,提案するSSL手法により,この性能を最大23%さらに向上でき,ネットワークアーキテクチャに関係なくSSLが有効であることを示す。 Deep learning has become a valuable tool for the automation of certain medical image segmentation tasks, significantly relieving the workload of medical specialists. Some of these tasks require segmentation to be performed on a subset of the input dimensions, the most common case being 3D-to-2D. However, the performance of existing methods is strongly conditioned by the amount of labeled data available, as there is currently no data-efficient method, e.g. transfer learning, that has been validated on these tasks. In this work, we propose a novel convolutional neural network (CNN) and self-supervised learning (SSL) method for label-efficient 3D-to-2D segmentation. The CNN is composed of a 3D encoder and a 2D decoder connected by novel 3D-to-2D blocks. The SSL method consists of reconstructing image pairs of modalities with different dimensionality. The approach has been validated in two tasks with clinical relevance: the en-face segmentation of geographic atrophy and reticular pseudodrusen in optical coherence tomography. 
Results on different datasets demonstrate that the proposed CNN significantly improves the state of the art in scenarios with limited labeled data by up to 8% in Dice score. Moreover, the proposed SSL method allows further improvement of this performance by up to 23%, and we show that the SSL is beneficial regardless of the network architecture.	翻訳日:2023-07-07 13:57:10 公開日:2023-07-06
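The abstract above reports its gains in Dice score. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient on flat binary masks. The function name and the empty-mask convention are assumptions made for illustration; this is not the authors' evaluation code.

```python
def dice_score(pred, target):
    """Dice = 2 * |P and T| / (|P| + |T|) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count positions that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground size of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    # One overlapping pixel, two foreground pixels in each mask: 2 * 1 / 4 = 0.5
    print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

A 2D segmentation map can be passed in after flattening it row by row.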
# 解剖学的特徴と反復学習を用いた手ポーズ推定の自己教師あり最適化 Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning ( http://arxiv.org/abs/2307.03007v1 ) ライセンス: Link先を確認	Christian Jauch, Timo Leitritz, Marco F. Huber	(参考訳) 手動組立作業員は、仕事の複雑さが増しています。人間中心の補助システムは役に立つが、オブジェクト認識を可能にする技術は、これらのシステムの洗練された人間中心の設計を妨げる。同時に、手のポーズに基づく行動認識は、手袋を着用するなどの複雑な使用シナリオにおいて、ポーズ推定の貧弱さに苦しむ。本稿では,最小限の人的介入で手ポーズ推定を特定のユースケースに適応させるための自己教師付きパイプラインを提案する。これにより、安価でロバストな手ポーズベースのアクティビティ認識が可能になる。このパイプラインは、一般的なデータセットでトレーニングされた手のポーズ推定のための一般的な機械学習モデルと、手の解剖学的制約を考慮した空間的および時間的フィルタリングと、モデルを改善するための再トレーニングステップで構成されている。異なるパラメータの組み合わせは、公開および注釈付きデータセットで評価される。最適なパラメータとモデルの組み合わせは、手動のアセンブリシナリオからラベルなしのビデオに適用されます。パイプラインの有効性は、手動のアセンブリシナリオで下流タスクとしてアクティビティ認識をトレーニングすることで実証される。 Manual assembly workers face increasing complexity in their work. Human-centered assistance systems could help, but object recognition as an enabling technology hinders sophisticated human-centered design of these systems. At the same time, activity recognition based on hand poses suffers from poor pose estimation in complex usage scenarios, such as wearing gloves. This paper presents a self-supervised pipeline for adapting hand pose estimation to specific use cases with minimal human interaction. This enables cheap and robust hand pose-based activity recognition. The pipeline consists of a general machine learning model for hand pose estimation trained on a generalized dataset, spatial and temporal filtering to account for anatomical constraints of the hand, and a retraining step to improve the model. Different parameter combinations are evaluated on a publicly available and annotated dataset. The best parameter and model combination is then applied to unlabelled videos from a manual assembly scenario. The effectiveness of the pipeline is demonstrated by training an activity recognition model as a downstream task in the manual assembly scenario.	翻訳日:2023-07-07 13:56:47 公開日:2023-07-06
# Human-in-the-Loopシステムの効率性向上: 人間の専門家に人工の専門家を加える Improving the Efficiency of Human-in-the-Loop Systems: Adding Artificial to Human Experts ( http://arxiv.org/abs/2307.03003v1 ) ライセンス: Link先を確認	Johannes Jakubik, Daniel Weber, Patrick Hemmer, Michael Vössing, Gerhard Satzger	(参考訳) 情報システムは人工知能(AI)と機械学習(ML)を活用して、膨大なデータから価値を生み出す。しかし、MLモデルは不完全であり、誤った分類を生成することができる。したがって、MLモデルへのHuman-in-the-loop(HITL)拡張は、分類が難しいインスタンスのヒューマンレビューを追加する。この研究は、難しいモデルの分類を扱うために人間の専門家を継続的に頼りにすることは、限られた資源を圧迫する人間の努力を強力に増加させると主張している。この問題に対処するために,人間専門家が以前にレビューした未知のクラスからデータインスタンスを分類することを学ぶ人工専門家を作成するハイブリッドシステムを提案する。我々のハイブリッドシステムは、未知のクラスからインスタンスを分類するのに適した人工専門家を評価し、自動的に割り当てる。時間が経つにつれ、人間の労力が減り、システムの効率が向上します。提案手法は,画像分類のベンチマークにおいて,従来のHITLシステムよりも優れていることを示す。 Information systems increasingly leverage artificial intelligence (AI) and machine learning (ML) to generate value from vast amounts of data. However, ML models are imperfect and can generate incorrect classifications. Hence, human-in-the-loop (HITL) extensions to ML models add a human review for instances that are difficult to classify. This study argues that continuously relying on human experts to handle difficult model classifications leads to a strong increase in human effort, which strains limited resources. To address this issue, we propose a hybrid system that creates artificial experts that learn to classify data instances from unknown classes previously reviewed by human experts. Our hybrid system assesses which artificial expert is suitable for classifying an instance from an unknown class and automatically assigns it. Over time, this reduces human effort and increases the efficiency of the system. Our experiments demonstrate that our approach outperforms traditional HITL systems for several benchmarks on image classification.	翻訳日:2023-07-07 13:56:32 公開日:2023-07-06
# Fourier-Net+: 効率的な3次元医用画像登録のための帯域制限表現の活用 Fourier-Net+: Leveraging Band-Limited Representation for Efficient 3D Medical Image Registration ( http://arxiv.org/abs/2307.02997v1 ) ライセンス: Link先を確認	Xi Jia, Alexander Thorley, Alberto Gomez, Wenqi Lu, Dipak Kotecha and Jinming Duan	(参考訳) U-Netスタイルのネットワークは、教師なし画像登録において密な変位場を予測するために一般的に利用されるが、これは高解像度のボリューム画像データに対してはリソース集約的かつ時間のかかるタスクである。この課題に取り組むために、まず、コストのかかるU-Netスタイルの拡張パスをパラメータフリーのモデル駆動デコーダに置き換えるFourier-Netを提案する。フルレゾリューション変位場を直接予測する代わりに、Fourier-Netは、帯域制限フーリエ領域における変位場の低次元表現を学習し、モデル駆動デコーダがそれを空間領域のフルレゾリューション変位場に変換する。Fourier-Netを拡張してFourier-Net+を導入し、さらに画像の帯域制限された空間表現を入力とし、U-Netスタイルのネットワークの収縮経路における畳み込み層数をさらに減らす。最後に、登録性能を向上させるために、Fourier-Net+のカスケード版を提案する。提案手法を3つのデータセット上で評価し,Fourier-Netとその派生版は現在の最先端手法と同等の結果を得るとともに,高速な推論速度,メモリフットプリントの低減,積和演算の削減を実現している。このような計算コストの小さなFourier-Net+は、低VRAMのGPU上での大規模3D登録の効率的なトレーニングを可能にします。私たちのコードは https://github.com/xi-jia/Fourier-Net で公開されている。 U-Net style networks are commonly utilized in unsupervised image registration to predict dense displacement fields, which for high-resolution volumetric image data is a resource-intensive and time-consuming task. To tackle this challenge, we first propose Fourier-Net, which replaces the costly U-Net style expansive path with a parameter-free model-driven decoder. Instead of directly predicting a full-resolution displacement field, our Fourier-Net learns a low-dimensional representation of the displacement field in the band-limited Fourier domain which our model-driven decoder converts to a full-resolution displacement field in the spatial domain. Expanding upon Fourier-Net, we then introduce Fourier-Net+, which additionally takes the band-limited spatial representation of the images as input and further reduces the number of convolutional layers in the U-Net style network's contracting path. Finally, to enhance the registration performance, we propose a cascaded version of Fourier-Net+. 
We evaluate our proposed methods on three datasets, on which our proposed Fourier-Net and its variants achieve results comparable to current state-of-the-art methods, while exhibiting faster inference, a lower memory footprint, and fewer multiply-add operations. With such a small computational cost, Fourier-Net+ enables efficient training of large-scale 3D registration on low-VRAM GPUs. Our code is publicly available at https://github.com/xi-jia/Fourier-Net.	翻訳日:2023-07-07 13:56:16 公開日:2023-07-06
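The idea of a parameter-free decoder that turns a band-limited Fourier representation into a full-resolution field can be illustrated with a toy 1D sketch: keep only the low-frequency DFT coefficients, then zero-pad them back to full length and apply the inverse DFT. This is only an assumed illustration of the general principle on a 1D signal with a naive DFT; the function names are hypothetical and it is not the authors' 3D implementation.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def encode_band_limited(x, cutoff):
    """Keep only coefficients with |frequency| <= cutoff (2 * cutoff + 1 values)."""
    X = dft(x)
    n = len(x)
    return [X[k % n] for k in range(-cutoff, cutoff + 1)]


def decode_band_limited(coeffs, n):
    """Parameter-free decoding: zero-pad the low frequencies to length n, then invert."""
    cutoff = (len(coeffs) - 1) // 2
    X = [0j] * n
    for i, k in enumerate(range(-cutoff, cutoff + 1)):
        X[k % n] = coeffs[i]
    return [v.real for v in idft(X)]


if __name__ == "__main__":
    n = 8
    # A smooth signal whose spectrum lives entirely at frequencies -1 and +1.
    signal = [math.cos(2 * math.pi * t / n) for t in range(n)]
    low = encode_band_limited(signal, cutoff=1)   # 3 coefficients instead of 8
    restored = decode_band_limited(low, n)
    print(max(abs(a - b) for a, b in zip(signal, restored)))  # numerically ~0
```

For a signal that is truly band-limited, as in this example, the reconstruction is exact up to floating-point error; for other signals it returns a smoothed approximation.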
# 非エルミート系における生体直交動的量子相転移 Biorthogonal dynamical quantum phase transitions in non-Hermitian systems ( http://arxiv.org/abs/2307.02993v1 ) ライセンス: Link先を確認	Yecheng Jing, Jian-Jun Dong, Yu-Yu Zhang, and Zi-Xiang Hu	(参考訳) 生物直交基底を用いて、非エルミート系における生物直交動的量子相転移の完全な枠組みを構築する。これまで見過ごされていた関連状態の助けを借りて, 自動正規化バイオノゴナルロスシュミットエコーを定義する。このアプローチは、複素固有値を持つ任意の非エルミート系を扱うことができ、生体直交基底なしで得られるロスシュミットレートの負の値を自然に取り除くことができる。非エルミート的なSu-Schrieffer-Heegerモデルを具体例として、伝統的な量子相転移を超越した、生物直交の動的トポロジカル秩序パラメータの1/2$の特別な変化が観察される。また、臨界運動量における2段階のサブシステムが振動するか、定常状態に達するかによって、生物直交の動的量子相転移の周期性も分かる。 By using biorthogonal bases, we construct a complete framework for biorthogonal dynamical quantum phase transitions in non-Hermitian systems. With the help of associated state which is overlooked previously, we define the automatically normalized biorthogonal Loschmidt echo. This approach is capable of handling arbitrary non-Hermitian systems with complex eigenvalues, which naturally eliminates the negative value of Loschmidt rate obtained without the biorthogonal bases. Taking the non-Hermitian Su-Schrieffer-Heeger model as a concrete example, a peculiar $1/2$ change in biorthogonal dynamical topological order parameter, which is beyond the traditional dynamical quantum phase transitions is observed. We also find the periodicity of biorthogonal dynamical quantum phase transitions depend on whether the two-level subsystem at the critical momentum oscillates or reaches a steady state.	翻訳日:2023-07-07 13:55:50 公開日:2023-07-06
# ContainerGym: リソース割り当てのための実世界の強化学習ベンチマーク ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation ( http://arxiv.org/abs/2307.02991v1 ) ライセンス: Link先を確認	Abhijeet Pendyala, Justin Dettmer, Tobias Glasmachers, Asma Atamna	(参考訳) 本稿では,実世界の産業資源配分タスクに触発された強化学習のベンチマークであるContainerGymを紹介する。提案したベンチマークは、不確実性など、現実のシーケンシャルな意思決定問題でよく遭遇する様々な課題をエンコードする。これは、例えば、可変次元の観点で、様々な難易度の問題をインスタンス化するように構成することができる。我々のベンチマークは他の強化学習ベンチマークと異なり、実世界の難易度をエンコードすることを目的としており、それは最小限の単純化と合理化を行った実世界の産業問題から直接導かれるものである。リソース割り当てフレームワークに適合する実世界の問題に対して、強化学習アルゴリズムを評価するのに十分便利です。標準ベースライン方式の結果を提供する。通常のトレーニング報酬曲線を超えて、我々の結果とそれらの解釈に使用される統計ツールは、よく知られた深層強化学習アルゴリズム(PPO、TRPO、DQN)の興味深い制限を強調します。 We present ContainerGym, a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task. The proposed benchmark encodes a range of challenges commonly encountered in real-world sequential decision making problems, such as uncertainty. It can be configured to instantiate problems of varying degrees of difficulty, e.g., in terms of variable dimensionality. Our benchmark differs from other reinforcement learning benchmarks, including the ones aiming to encode real-world difficulties, in that it is directly derived from a real-world industrial problem, which underwent minimal simplification and streamlining. It is sufficiently versatile to evaluate reinforcement learning algorithms on any real-world problem that fits our resource allocation framework. We provide results of standard baseline methods. Going beyond the usual training reward curves, our results and the statistical tools used to interpret them allow to highlight interesting limitations of well-known deep reinforcement learning algorithms, namely PPO, TRPO and DQN.	翻訳日:2023-07-07 13:55:35 公開日:2023-07-06
# 講義ノート:ランダムユニタリ回路の導入と測定誘起絡み合い相転移 Lecture Notes: Introduction to random unitary circuits and the measurement-induced entanglement phase transition ( http://arxiv.org/abs/2307.02986v1 ) ライセンス: Link先を確認	Brian Skinner	(参考訳) これらはミネソタ大学の2023 condensed matter summer schoolの短い講義シリーズにまとめられた講義ノートである。会話的で楽しいようにデザインされており、物事を正確に述べ、文学を徹底的に引用する真剣な仕事を行う記事のレビューに取って代わるものではない。このノートの目的は、ランダムユニタリ回路における絡み合いダイナミクスと測定誘起絡み合い相転移の研究の基礎となるいくつかの中心的なアイデアを導入することである。特に注目されるのは、「最小カット」の概念とその関連する統計メカマッピングである。 These are lecture notes compiled for a short lecture series at the 2023 Condensed Matter Summer School at the University of Minnesota. They are designed to be conversational and fun, and not to take the place of review articles that do a serious job of stating things precisely and citing literature thoroughly. The goal of the notes is to introduce some central ideas underlying the study of entanglement dynamics in random unitary circuits and the measurement-induced entanglement phase transition. A particular focus is the concept of the "minimal cut" and its associated stat mech mappings.	翻訳日:2023-07-07 13:55:17 公開日:2023-07-06
# 3次元フェルミオン物質中の渦ループダイナミクスと動的量子相転移 Vortex loop dynamics and dynamical quantum phase transitions in 3D fermion matter ( http://arxiv.org/abs/2307.02985v1 ) ライセンス: Link先を確認	Arkadiusz Kosior, Markus Heyl	(参考訳) 本研究では, 一般非相互作用フェルミオン格子モデルにおけるグリーン関数の相における渦特異点の挙動を, 瞬時クエンチ後の3次元で検討する。渦の全集合が1次元の動的物体を形成しており、渦ループと呼ばれる。そのような渦ループの数は、異なる非平衡位相を区別する量子化順序パラメータとして解釈できる。この順序パラメータの変化は、動的量子相転移(DQPT)と関連していることを示す。この結果は3次元の一般格子モデルに適用できる。具体的には、単純二バンドワイル半金属の文脈でそれらを示す。また, 渦ループが弱い相互作用系で生き残ることを示す。最後に, 渦ループは, バンドタッチワイルノードの存在により運動量空間の複雑な動的パターンを形成することが観察された。本研究は,非平衡系における動的順序パラメータの定義の開発に有用な知見を提供する。 In this study, we investigate the behavior of vortex singularities in the phase of the Green's function of a general non-interacting fermionic lattice model in three dimensions after an instantaneous quench. We find that the full set of vortices form one-dimensional dynamical objects, which we call vortex loops. The number of such vortex loops can be interpreted as a quantized order parameter that distinguishes between different non-equilibrium phases. We show that changes in this order parameter are related to dynamical quantum phase transitions (DQPTs). Our results are applicable to general lattice models in three dimensions. For concreteness, we present them in the context of a simple two-band Weyl semimetal. We also show that the vortex loops survive in weakly interacting systems. Finally, we observe that vortex loops can form complex dynamical patterns in momentum space due to the existence of band touching Weyl nodes. Our findings provide valuable insights for developing definitions of dynamical order parameters in non-equilibrium systems.	翻訳日:2023-07-07 13:55:06 公開日:2023-07-06
# 医療用生成モデルの潜在空間におけるプライバシ保護ウォーク A Privacy-Preserving Walk in the Latent Space of Generative Models for Medical Applications ( http://arxiv.org/abs/2307.02984v1 ) ライセンス: Link先を確認	Matteo Pennisi, Federica Proietto Salanitri, Giovanni Bellitto, Simone Palazzo, Ulas Bagci, Concetto Spampinato	(参考訳) GAN(Generative Adversarial Networks)は、ターゲット分布に一致する合成サンプルを生成する能力を実証している。しかし、プライバシーの観点からは、gansをデータ共有のプロキシとして使うことは安全な解決策ではない。 k-匿名性原理に触発された最近の研究は、潜在空間におけるサンプルアグリゲーションを通じてこの問題に対処し、データセットをkの係数で減らすという欠点がある。本研究の目的は、奥行きモデルの効果的なトレーニングを支援するために、プライバシー問題に原則的に対処しながら、多様な合成サンプルを生成できる潜航型宇宙航行戦略を提案することである。提案手法では,潜在空間の点間を非線形に歩いたり,実際のサンプルと衝突するリスクを最小限に抑えるための補助的等式分類器を利用する。潜在空間における無作為な一対の点を考えると、我々の歩行戦略は線形補間よりも安全である。次に,k-same法と併用したパス探索戦略を検証し,結核と糖尿病網膜症分類の2つの指標を用いて,プライバシ保護を維持しつつ,そのモデルを用いたトレーニングにより性能低下を軽減できることを示した。 Generative Adversarial Networks (GANs) have demonstrated their ability to generate synthetic samples that match a target distribution. However, from a privacy perspective, using GANs as a proxy for data sharing is not a safe solution, as they tend to embed near-duplicates of real samples in the latent space. Recent works, inspired by k-anonymity principles, address this issue through sample aggregation in the latent space, with the drawback of reducing the dataset by a factor of k. Our work aims to mitigate this problem by proposing a latent space navigation strategy able to generate diverse synthetic samples that may support effective training of deep models, while addressing privacy concerns in a principled way. Our approach leverages an auxiliary identity classifier as a guide to non-linearly walk between points in the latent space, minimizing the risk of collision with near-duplicates of real samples. We empirically demonstrate that, given any random pair of points in the latent space, our walking strategy is safer than linear interpolation. 
We then test our path-finding strategy combined with k-same methods and demonstrate, on two benchmarks for tuberculosis and diabetic retinopathy classification, that training a model on samples generated by our approach mitigates drops in performance while preserving privacy.	翻訳日:2023-07-07 13:54:54 公開日:2023-07-06
# 効率的なセミリング重み付きEarleyパース Efficient Semiring-Weighted Earley Parsing ( http://arxiv.org/abs/2307.02982v1 ) ライセンス: Link先を確認	Andreas Opedal, Ran Zmigrod, Tim Vieira, Ryan Cotterell, Jason Eisner	(参考訳) 本稿では,さまざまな高速化を施したEarley (1970) の文脈自由構文解析アルゴリズムの参照記述を,演繹システムの形で提供する。我々の提示には,Earleyの $O(N^3\|G\|\|R\|)$ (自然言語処理で現れる大規模な文法では実用的でない)から,二項化した文法 $G$ 上のCKYの実行時間に一致する $O(N^3\|G\|)$ への,既知の最悪時実行時間の改善が含まれる。ここで $N$ は文の長さ,$\|R\|$ は $G$ の生成規則の数,$\|G\|$ はそれらの生成規則の長さの合計である。また,文法が単一の有限状態オートマトン $M$ としてコンパクトに表現される場合に,$\|M\| \leq \|G\|$ で $O(N^3\|M\|)$ の実行時間を達成するバージョンも提供する(これは部分的に新規である)。セミリング重み付き演繹への一般化を慎重に扱い,Stolcke (1995) と同様に文法を前処理して演繹サイクルを除去し,さらに文の接頭辞の重みを計算するStolckeの手法を一般化する。また,効率的な実行のための実装の詳細も示し,前処理された文法上では,我々の手法のセミリング重み付き版が,一部の文法での準3次実行時間を含め,重みなし手法と同じ漸近的な実行時間と空間要件を持つことを保証する。 This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups. Our presentation includes a known worst-case runtime improvement from Earley's $O (N^3\|G\|\|R\|)$, which is unworkable for the large grammars that arise in natural language processing, to $O (N^3\|G\|)$, which matches the runtime of CKY on a binarized version of the grammar $G$. Here $N$ is the length of the sentence, $\|R\|$ is the number of productions in $G$, and $\|G\|$ is the total length of those productions. We also provide a version that achieves runtime of $O (N^3\|M\|)$ with $\|M\| \leq \|G\|$ when the grammar is represented compactly as a single finite-state automaton $M$ (this is partly novel). We carefully treat the generalization to semiring-weighted deduction, preprocessing the grammar like Stolcke (1995) to eliminate deduction cycles, and further generalize Stolcke's method to compute the weights of sentence prefixes. 
We also provide implementation details for efficient execution, ensuring that on a preprocessed grammar, the semiring-weighted versions of our methods have the same asymptotic runtime and space requirements as the unweighted methods, including sub-cubic runtime on some grammars.	翻訳日:2023-07-07 13:54:28 公開日:2023-07-06
# Chamfer距離のほぼ線形時間アルゴリズム A Near-Linear Time Algorithm for the Chamfer Distance ( http://arxiv.org/abs/2307.03043v1 ) ライセンス: Link先を確認	Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, Erik Waingarten	(参考訳) サイズが高々 $n$ の任意の二つの点集合 $A,B \subset \mathbb{R}^d$ に対して、$A$ から $B$ までのChamfer距離は $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$ と定義される。ここで $d_X$ は基礎となる距離尺度(ユークリッド距離やマンハッタン距離など)である。Chamfer距離は点群間の非類似度の一般的な尺度であり、多くの機械学習、コンピュータビジョン、グラフィックスのアプリケーションで使われ、単純な $O(d n^2)$ 時間の総当たりアルゴリズムで計算できる。さらに、Chamfer距離は、より計算コストの高いEarth-Mover(最適輸送)距離の代替としてしばしば用いられる。しかし、実行時間が $n$ に二次的に依存するため、素朴な手法は大規模なデータセットでは扱えない。我々はこのボトルネックを克服し、ほぼ線形の実行時間でChamfer距離を推定する最初の $(1+\varepsilon)$ 近似アルゴリズムを示す。具体的には、我々のアルゴリズムは $O(nd \log (n)/\varepsilon^2)$ 時間で動作し、実装可能である。我々の実験は、大規模な高次元データセットにおいて、これが正確かつ高速であることを示した。我々は,我々のアルゴリズムが大規模な高次元点群を解析するための新たな道を開くと信じている。また、目標が(単なる値ではなく)$A$ から $B$ への $(1+\varepsilon)$ 近似写像を報告することである場合、準二次時間のアルゴリズムは存在しそうにないという証拠も示す。 For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, and graphics applications, and admits a straightforward $O(d n^2)$-time brute force algorithm. Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth-Mover (Optimal Transport) Distance. However, the quadratic dependence on $n$ in the running time makes the naive approach intractable for large datasets. We overcome this bottleneck and present the first $(1+\varepsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time $O(nd \log (n)/\varepsilon^2)$ and is implementable. 
Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large high-dimensional point clouds. We also give evidence that if the goal is to report a $(1+\varepsilon)$-approximate mapping from $A$ to $B$ (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.	翻訳日:2023-07-07 13:48:49 公開日:2023-07-06
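The definition $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$ and the $O(d n^2)$ brute-force baseline mentioned in the abstract can be written down directly. The sketch below assumes the Euclidean distance for $d_X$ and a hypothetical function name; it is the naive baseline only, not the paper's near-linear approximation algorithm.

```python
import math


def chamfer_distance(A, B):
    """Brute-force CH(A, B): for each a in A, add the distance to its nearest b in B.

    Runs in O(d * |A| * |B|) time; d_X is assumed to be the Euclidean distance.
    """
    if not B:
        raise ValueError("B must contain at least one point")
    total = 0.0
    for a in A:
        # Nearest-neighbour distance from a to the set B.
        total += min(math.dist(a, b) for b in B)
    return total


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(0.0, 1.0), (5.0, 0.0)]
    print(chamfer_distance(A, B))  # 1 + sqrt(2), about 2.4142
    print(chamfer_distance(B, A))  # 1 + 4 = 5.0
```

The two printed values differ because the sum runs over the first argument only, so the Chamfer distance as defined here is not symmetric.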
# 臨床領域におけるLLaMAのパラメータ効率向上 Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain ( http://arxiv.org/abs/2307.03042v1 ) ライセンス: Link先を確認	Aryo Gema, Luke Daines, Pasquale Minervini, Beatrice Alex	(参考訳) 臨床応用のような新しい領域に事前訓練された言語モデルを適用するには、伝統的にパラメータの集合全体をトレーニングする必要がある。しかし、このような大規模言語モデルの訓練に関係するかなりの計算要求のため、このアプローチは実用的でないことがますます証明されている。この問題に対処するために、パラメータ効率の良いファインチューニング(peft)技術は、追加のパラメータの小さなサブセットを選択的に微調整することで、実行可能なソリューションを提供する。本研究では,オープンソースのLLaMAモデルに基づくPEFTアダプタ層である臨床LLaMA-LoRAを提案する。 MIMIC-IVデータベースから得られた臨床ノートを用いて臨床LLaMA-LoRAを訓練し、臨床領域用に設計された特別なアダプタを作成する。さらに,2段階のPEFTフレームワークを提案する。このフレームワークは,下流タスクに特化した2段階のPEFTアダプタであるLLaMA-LoRAと臨床LLaMA-LoRAを融合する。本稿では,複数の臨床結果予測データセットについて,臨床訓練言語モデルと比較した。提案フレームワークは,すべての臨床下流タスクにおいて,最先端のaurocスコアを実現する。診断や手順分類などの大規模多ラベル分類タスクにおいて,6-9%のAUROCスコアの大幅な改善が観察された。 Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. However, this approach is increasingly proven to be impractical owing to the substantial computational requirements associated with training such large language models. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) techniques offer a viable solution by selectively fine-tuning a small subset of additional parameters, significantly reducing the computational requirements for domain adaptation. In this study, we propose Clinical LLaMA-LoRA, a PEFT adapter layer built upon the open-sourced LLaMA model. Clinical LLaMA-LoRA is trained using clinical notes obtained from the MIMIC-IV database, thereby creating a specialised adapter designed for the clinical domain. Additionally, we propose a two-step PEFT framework which fuses Clinical LLaMA-LoRA with Downstream LLaMA-LoRA, another PEFT adapter specialised for downstream tasks. We evaluate this framework on multiple clinical outcome prediction datasets, comparing it to clinically trained language models. 
Our proposed framework achieves a state-of-the-art AUROC score averaged across all clinical downstream tasks. We observe substantial improvements of 6-9% AUROC score in the large-scale multilabel classification tasks, such as diagnoses and procedures classification.	翻訳日:2023-07-07 13:48:17 公開日:2023-07-06
# 視覚トランスフォーマによるアート認証 Art Authentication with Vision Transformers ( http://arxiv.org/abs/2307.03039v1 ) ライセンス: Link先を確認	Ludovica Schaerf, Carina Popovici, Eric Postma	(参考訳) 近年では、言語用に開発されたTransformersが視覚タスクにうまく適用されている。視覚トランスフォーマーは、画像分類、オブジェクト検出、セマンティックセグメンテーションなど、最先端のタスクを幅広いタスクで推進することが示されている。本研究は,畳み込みニューラルネットワークを用いたアートアトリビューションとアート認証の課題において有望な結果が得られたが,視覚トランスフォーマーの優位性がアート認証に拡張され,コンピュータベースのアート認証の信頼性が向上するかどうかを検証した。ヴィンセント・ファン・ゴッホ(vincent van gogh)と2つのコントラストデータセットによる真正な絵画の注意深くコンパイルされたデータセットを用いて、スウィントランスフォーマのアート認証性能と効率性を比較した。模倣とプロキシを含む標準的なコントラストセット(ファン・ゴッホと密接に関連するスタイルを持つ画家による作品)を用いて、EfficientNetは全体として最高のパフォーマンスを達成する。模倣のみで構成されたコントラストセットでは、認証精度が85%を超えることにより、Swin TransformerはEfficientNetよりも優れていることが分かる。これらの結果から,視覚変換器は,特にコンピュータによる芸術的模倣検出能力の向上において,芸術的認証において強力かつ有望な競争相手である,という結論に至った。 In recent years, Transformers, initially developed for language, have been successfully applied to visual tasks. Vision Transformers have been shown to push the state-of-the-art in a wide range of tasks, including image classification, object detection, and semantic segmentation. While ample research has shown promising results in art attribution and art authentication tasks using Convolutional Neural Networks, this paper examines if the superiority of Vision Transformers extends to art authentication, improving, thus, the reliability of computer-based authentication of artworks. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, we compare the art authentication performances of Swin Transformers with those of EfficientNet. Using a standard contrast set containing imitations and proxies (works by painters with styles closely related to van Gogh), we find that EfficientNet achieves the best performance overall. With a contrast set that only consists of imitations, we find the Swin Transformer to be superior to EfficientNet by achieving an authentication accuracy of over 85%. 
These results lead us to conclude that Vision Transformers represent a strong and promising contender in art authentication, particularly in enhancing the computer-based ability to detect artistic imitations.	翻訳日:2023-07-07 13:47:54 公開日:2023-07-06
# 一般観測モデルを用いたレストレスバンディットのPCL-IndexabilityとWhittle Index PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models ( http://arxiv.org/abs/2307.03034v1 ) ライセンス: Link先を確認	Keqin Liu and Chengzhong Zhang	(参考訳) 本稿では,restless multi-armed bandit問題に対する一般的な観測モデルについて検討する。プレイヤーの操作は、リソースの制約や環境や内在的なノイズによってエラーが発生しやすいフィードバック機構に基づく必要がある。フィードバック・観測のダイナミクスの一般的な確率モデルを確立することにより、任意の初期信念(事前情報)から始まる可算な信念状態空間を持つレストレス・バンディットとして問題を定式化する。部分保存法則(PCL)を用いた達成可能な領域法を無限状態問題に適用し,そのインデックス可能性と優先度インデックス(Whittle index)を分析する。最後に、有限状態問題に対するNiño-Mora と Bertsimas の AG アルゴリズムを適用可能な問題に変換する近似法を提案する。数値実験により,このアルゴリズムは優れた性能を示す。 In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on a certain feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noises. By establishing a general probabilistic model for dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process to transform the problem into one to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm performs excellently.	翻訳日:2023-07-07 13:47:25 公開日:2023-07-06
# データ重要度学習による検索型大規模言語モデルの改善 Improving Retrieval-Augmented Large Language Models via Data Importance Learning ( http://arxiv.org/abs/2307.03027v1 ) ライセンス: Link先を確認	Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang	(参考訳) Retrieval Augmentationは、例えば質問応答やデータ補完といったタスクにおいて、大きな言語モデルで外部の知識を活用できるようにする。しかし,このような検索拡張モデルの性能は,検索コーパスのデータ品質によって制限される。本稿では,検索したデータポイントのデータ重要度を評価するためのマルチリニア拡張に基づくアルゴリズムを提案する。マルチリニア拡張には指数関数的に多くの項があり、本論文の重要な貢献の一つは、付加効用関数と検証セットを備えた検索拡張モデルが与えられたとき、モデルユーティリティ関数のマルチリニア拡張を用いた検索コーパスにおけるデータポイントの重要度を正確に計算する多項式時間アルゴリズムである。さらに,より効率的な $(\epsilon, \delta)$ 近似アルゴリズムを提案する。実験結果から,追加の学習を必要とせず,検索コーパスの刈り込みや再重み付けのみで,大規模言語モデルの性能を向上させることができることがわかった。一部のタスクでは、小さなモデル(例えば、GPT-JT)を検索エンジンAPIで拡張し、GPT-3.5を(検索拡張なしで)上回ることができる。さらに,マルチリニア拡張に基づく重みは,実際に効率的に計算できることを示す(例えば,1億要素のコーパスに対して10分以内で計算できる)。 Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of retrieved data points. There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial time algorithm that computes exactly, given a retrieval-augmented model with an additive utility function and a validation set, the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function. We further propose an even more efficient $(\epsilon, \delta)$-approximation algorithm. Our experimental results illustrate that we can enhance the performance of large language models by only pruning or reweighting the retrieval corpus, without requiring further training. 
For some tasks, this even allows a small model (e.g., GPT-JT), augmented with a search engine API, to outperform GPT-3.5 (without retrieval augmentation). Moreover, we show that weights based on multilinear extension can be computed efficiently in practice (e.g., in less than ten minutes for a corpus with 100 million elements).	翻訳日:2023-07-07 13:47:09 公開日:2023-07-06
# style over substance: 大規模言語モデルに対する評価バイアス Style Over Substance: Evaluation Biases for Large Language Models ( http://arxiv.org/abs/2307.03025v1 ) ライセンス: Link先を確認	Minghao Wu, Alham Fikri Aji	(参考訳) 大きな言語モデル(LLM)が進歩を続けるにつれ、そのパフォーマンスを正確かつ包括的に評価することはますます困難になっている。従来、人間の評価は自然言語生成における金の標準とみなされていた。近年の進歩は、評価過程における人間の判断のためのプロキシとして最先端のLSMを取り入れている。それでも、人間とLLMがどの程度の能力を持つかは、まだ不明である。本研究では,クラウドソース型人間とLCMベースの審査員の,異なるモデルからのアウトプットを比較する際の行動について検討する。これを実現するために、意図的に欠陥のある機械生成回答からなるデータセットをキュレートする。その結果, 事実誤りによる潜在的に大きな危険があるにもかかわらず, 事実誤りによる回答は, 短すぎる, 文法的誤りを含む回答に比べ, いまだに好意的に評価されていた。これは評価プロセスにおける関連するバイアスを強調します。この問題に対処するために,評価面を1つのスコアにマージするのではなく,複数の次元にまたがるマシン生成テキストを独立して評価することを提案する。このアイデアをeloレーティングシステムでインスタンス化し,マルチeloレーティングシステムを実現する。本研究から得られた実験結果から,本手法はLLMによる評価,特に実測精度を著しく向上させることが明らかとなった。しかし、クラウドソースによる評価では顕著な改善は見られず、さらなる調査と改善の必要性が示唆されている。 As large language models (LLMs) continue to advance, accurately and comprehensively evaluating their performance becomes increasingly challenging. Conventionally, human evaluations are considered the gold standard in natural language generation. Recent advancements incorporate state-of-the-art LLMs as proxies for human judges in evaluation processes. Nonetheless, the extent to which humans and LLMs are capable evaluators remains uncertain. This study aims to investigate the behavior of both crowd-sourced human and LLM-based judges when comparing outputs from different models. To accomplish this, we curate a dataset comprising intentionally flawed machine-generated answers. Our findings indicate that despite the potentially greater danger posed by factual errors, answers with factual errors were still rated more favorably compared to answers that were too short or contained grammatical errors. This highlights a concerning bias in the evaluation process. To address this issue, we propose to independently evaluate machine-generated text across multiple dimensions, rather than merging all the evaluation aspects into a single score. 
We instantiate this idea with the Elo rating system, resulting in the Multi-Elo Rating System. Empirical results from our study reveal that this proposed approach significantly enhances the quality of LLM-based evaluations, particularly in terms of factual accuracy. However, no notable improvement is observed in crowd-sourced evaluations, suggesting the need for further investigation and refinement.	翻訳日:2023-07-07 13:46:47 公開日:2023-07-06
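The idea of rating each evaluation dimension separately, instead of merging everything into one score, can be sketched with the standard Elo update applied once per dimension. The sketch below is an assumed illustration: the dimension names, the starting rating of 1000, and the K-factor of 32 are hypothetical choices, and the code is not the authors' Multi-Elo implementation.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return updated (A, B) ratings; score_a is 1 for a win, 0.5 draw, 0 loss."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


def multi_elo(comparisons, dimensions, initial=1000.0, k=32.0):
    """Keep one independent Elo table per evaluation dimension.

    comparisons: iterable of (model_a, model_b, outcomes), where outcomes maps
    each dimension name to model_a's score (1, 0.5 or 0) on that dimension.
    """
    ratings = {dim: {} for dim in dimensions}
    for model_a, model_b, outcomes in comparisons:
        for dim in dimensions:
            table = ratings[dim]
            r_a = table.get(model_a, initial)
            r_b = table.get(model_b, initial)
            table[model_a], table[model_b] = elo_update(r_a, r_b, outcomes[dim], k)
    return ratings


if __name__ == "__main__":
    # Hypothetical judgement: m1 is more accurate, m2 is better written.
    result = multi_elo(
        [("m1", "m2", {"accuracy": 1.0, "language": 0.0})],
        dimensions=["accuracy", "language"],
    )
    print(result["accuracy"])  # {'m1': 1016.0, 'm2': 984.0}
    print(result["language"])  # {'m1': 984.0, 'm2': 1016.0}
```

Keeping the tables separate means a fluent but factually wrong answer can rank high on the language dimension without inflating its accuracy rating.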
# 光磁気双極子遷移の強いパーセル増強 Strong Purcell enhancement of an optical magnetic dipole transition ( http://arxiv.org/abs/2307.03022v1 ) ライセンス: Link先を確認	Sebastian P. Horvath, Christopher M. Phenicie, Salim Ourari, Mehmet T. Uysal, Songtao Chen, Łukasz Dusanowski, Mouktik Raha, Paul Stevenson, Adam T. Turflinger, Robert J. Cava, Nathalie P. de Leon, and Jeff D. Thompson	(参考訳) ナノフォトニック構造を用いて局所状態密度を設計することは、パーセル効果を介して光-物質相互作用を制御する強力なツールである。光周波数では、電場の状態密度の制御は通常、電気双極子遷移に結合してそれを操作するために使用される。しかし、磁気双極子遷移を制御するために磁場の状態密度を設計することもできる。本研究では, ナノフォトニックキャビティに結合した単一の希土類イオンを用いた光磁気パーセル効果を実験的に実証した。我々は、MgO中のEr$^{3+}$という新しい単一光子エミッタを設計した。このエミッタでは、電気双極子崩壊速度が立方体サイト対称性によって強く抑制され、ほぼ純粋な磁気双極子光遷移が得られる。これにより、磁気パーセル因子 $P_m=1040 \pm 30$ を曖昧さなく決定できる。さらに、この技術を拡張して磁気双極子スピン-光子界面を実現し、単一のEr$^{3+}$電子スピンの光スピン初期化と読み出しを行う。この研究は、電気的および磁気的な状態密度エンジニアリングの基本的な等価性を実証し、より広いクラスのエミッタに対する光-物質相互作用を制御するための新しいツールを提供する。 Engineering the local density of states with nanophotonic structures is a powerful tool to control light-matter interactions via the Purcell effect. At optical frequencies, control over the electric field density of states is typically used to couple to and manipulate electric dipole transitions. However, it is also possible to engineer the magnetic density of states to control magnetic dipole transitions. In this work, we experimentally demonstrate the optical magnetic Purcell effect using a single rare earth ion coupled to a nanophotonic cavity. We engineer a new single photon emitter, Er$^{3+}$ in MgO, where the electric dipole decay rate is strongly suppressed by the cubic site symmetry, giving rise to a nearly pure magnetic dipole optical transition. This allows the unambiguous determination of a magnetic Purcell factor $P_m=1040 \pm 30$. We further extend this technique to realize a magnetic dipole spin-photon interface, performing optical spin initialization and readout of a single Er$^{3+}$ electron spin. 
This work demonstrates the fundamental equivalence of electric and magnetic density of states engineering, and provides a new tool for controlling light-matter interactions for a broader class of emitters.	翻訳日:2023-07-07 13:46:26 公開日:2023-07-06
# EffLiFe:階層スパースグラディエント蛍光による高効率光電界発生 EffLiFe: Efficient Light Field Generation via Hierarchical Sparse Gradient Descent ( http://arxiv.org/abs/2307.03017v1 ) ライセンス: Link先を確認	Yijie Deng, Lei Han, Tianpeng Lin, Lin Li, Jinzhi Zhang, and Lu Fang	(参考訳) 拡張現実感(XR)技術の台頭に伴い、スパースビューの入力からリアルタイムの光場生成の必要性が高まっている。既存の手法は、高品質なノベルビューを生成することができるが、長い推論/トレーニングのコストがかかるオフライン技術と、一般化性に欠けるか、不満足な結果を生み出すオンライン手法に分類することができる。しかし,Multi-plane Images (MPI) の固有スパース多様体は,レンダリング品質を維持しつつ,光電場生成の大幅な加速を可能にした。この知見に基づいて,提案した階層スパース勾配Descent (HSGD) を利用して,スパース画像から高品質な光フィールドをリアルタイムで生成する光場最適化手法であるEffLiFeを紹介する。技術的には、シーンの粗いMPIはまず3D CNNを使用して生成され、数回のイテレーションで重要なMPI勾配のみに焦点をあてることで、より疎く最適化される。それでも、最適化のみに依存することは、咬合境界でのアーティファクトにつながる可能性がある。そこで本研究では,入力を反復的にフィルタリングすることで,隠蔽領域の視覚的アーティファクトを除去するオクルージョン対応イテレーティブリファインメントモジュールを提案する。大規模な実験により,従来のオフライン手法に比べて平均100倍高速で視覚的品質を達成でき,他のオンライン手法に比べて性能(PSNRでは約2dB高い)が向上した。 With the rise of Extended Reality (XR) technology, there is a growing need for real-time light field generation from sparse view inputs. Existing methods can be classified into offline techniques, which can generate high-quality novel views but at the cost of long inference/training time, and online methods, which either lack generalizability or produce unsatisfactory results. However, we have observed that the intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field generation while maintaining rendering quality. Based on this insight, we introduce EffLiFe, a novel light field optimization method, which leverages the proposed Hierarchical Sparse Gradient Descent (HSGD) to produce high-quality light fields from sparse view images in real time. Technically, the coarse MPI of a scene is first generated using a 3D CNN, and it is further sparsely optimized by focusing only on important MPI gradients in a few iterations. Nevertheless, relying solely on optimization can lead to artifacts at occlusion boundaries. 
Therefore, we propose an occlusion-aware iterative refinement module that removes visual artifacts in occluded regions by iteratively filtering the input. Extensive experiments demonstrate that our method achieves comparable visual quality while being 100x faster on average than state-of-the-art offline methods and delivering better performance (about 2 dB higher in PSNR) compared to other online approaches.	翻訳日:2023-07-07 13:46:05 公開日:2023-07-06
# スケーラブルな動的障害物回避のための逐次ニューラルネットワークバリア Sequential Neural Barriers for Scalable Dynamic Obstacle Avoidance ( http://arxiv.org/abs/2307.03015v1 ) ライセンス: Link先を確認	Hongzhan Yu, Chiaki Hirayama, Chenning Yu, Sylvia Herbert, Sicun Gao	(参考訳) 障害物の複雑な相互作用のダイナミクスを解析的にモデル化することは困難であり、障害物の数で計画と制御の複雑さが指数関数的に増加する。したがって、この文脈では、データ駆動および学習に基づく手法が特に有用である。しかし、データ駆動手法は分布のドリフトに敏感であり、異なる障害物密度にわたる学習モデルの訓練と一般化が困難である。本稿では,SNCBF(Sequential Neural Control Barrier Model)の合成学習手法を提案する。複数の動的障害物の空間的相互作用パターンを分解し,各障害物の状態列を通じて予測することができる。分解により、少数の障害でのみ訓練された制御ポリシーを、障害密度が100倍の環境に一般化することができる。提案手法は, ポテンシャル場, エンド・ツー・エンド強化学習, モデル予測制御など既存の手法と比較して, 動的衝突回避の改善に有効であることを示す。また,ハードウェア実験を行い,補足映像におけるアプローチの有効性を示した。 There are two major challenges for scaling up robot navigation around dynamic obstacles: the complex interaction dynamics of the obstacles can be hard to model analytically, and the complexity of planning and control grows exponentially in the number of obstacles. Data-driven and learning-based methods are thus particularly valuable in this context. However, data-driven methods are sensitive to distribution drift, making it hard to train and generalize learned models across different obstacle densities. We propose a novel method for compositional learning of Sequential Neural Control Barrier models (SNCBFs) to achieve scalability. Our approach exploits an important observation: the spatial interaction patterns of multiple dynamic obstacles can be decomposed and predicted through temporal sequences of states for each obstacle. Through decomposition, we can generalize control policies trained only with a small number of obstacles, to environments where the obstacle density can be 100x higher. We demonstrate the benefits of the proposed methods in improving dynamic collision avoidance in comparison with existing methods including potential fields, end-to-end reinforcement learning, and model-predictive control. 
We also perform hardware experiments and show the practical effectiveness of the approach in the supplementary video.	翻訳日:2023-07-07 13:45:39 公開日:2023-07-06
# 熱平衡外におけるナノ粒子とグラフェンの分散相互作用に及ぼす質量ギャップの影響 Impact of Mass-Gap on the Dispersion Interaction of Nanoparticles with Graphene out of Thermal Equilibrium ( http://arxiv.org/abs/2307.03009v1 ) ライセンス: Link先を確認	Galina L. Klimchitskaya, Constantine C. Korikov, Vladimir M. Mostepanenko and Oleg Yu. Tsybin	(参考訳) ギャップ付きグラフェンシートのソース側にあるナノ粒子に作用する非平衡分散力について検討する。ナノ粒子は環境温度に保たれるが、グラフェンシートは環境より低温または高温でありうる。質量ギャップパラメータの異なる値に対し、距離の関数としての分散力を、基本的なリフシッツ理論を熱平衡外の条件へ一般化したものを用いて計算する。電磁場の量子および熱揺らぎに対するギャップ付きグラフェンの応答は、ディラック模型の枠組みにおける(2+1)次元時空の分極テンソルによって記述される。エバネッセント波の領域におけるこのテンソルの成分の明示的な表式を示す。グラフェンの質量ギャップパラメータが非平衡分散力に及ぼす非自明な影響を、平衡状態の力と比較して明らかにする。純粋な(pristine)グラフェンの場合とは異なり、非平衡力は引力的な性質を保つことが示される。ナノ粒子とグラフェンシートを組み込んだマイクロデバイスおよびナノデバイスの設計において、得られた結果を利用する可能性について論じる。 We consider the nonequilibrium dispersion force acting on nanoparticles on the source side of a gapped graphene sheet. Nanoparticles are kept at the environmental temperature, whereas the graphene sheet may be either cooler or hotter than the environment. Calculation of the dispersion force as a function of separation at different values of the mass-gap parameter is performed using the generalization of the fundamental Lifshitz theory to the out-of-thermal-equilibrium conditions. The response of gapped graphene to quantum and thermal fluctuations of the electromagnetic field is described by the polarization tensor in (2+1)-dimensional space-time in the framework of the Dirac model. The explicit expressions for the components of this tensor in the area of evanescent waves are presented. The nontrivial impact of the mass-gap parameter of graphene on the nonequilibrium dispersion force, as compared to the equilibrium one, is determined. It is shown that, unlike the case of pristine graphene, the nonequilibrium force preserves an attractive character. The possibilities of using the obtained results in the design of micro- and nanodevices incorporating nanoparticles and graphene sheets for their functionality are discussed.	翻訳日:2023-07-07 13:45:21 公開日:2023-07-06
# 社会的仮定を伴わない有向グラフにおける不連続表現の学習 Learning Disentangled Representations in Signed Directed Graphs without Social Assumptions ( http://arxiv.org/abs/2307.03077v1 ) ライセンス: Link先を確認	Geonwoo Ko and Jinhong Jung	(参考訳) 符号付きグラフは、様々な領域における信頼関係や嗜好を表す複雑なシステムである。このようなグラフでノード表現を学ぶことは、多くのマイニングタスクにとって重要です。現実世界のサイン付き関係は複数の潜在要因に影響される可能性があるが、ほとんどの既存の手法は、しばしば社会理論に頼り、それらを単純化された要因として扱うことによって、署名付き関係のモデリングを単純化する。これは、それらの表現力とそれらの関係を形作る多様な要因を捉える能力を制限する。本稿では,符号付き有向グラフにおける不連続ノード表現を社会的仮定なしに学習する新しい手法である dines を提案する。それぞれの埋め込みを異なる要因に分けて、複数の潜伏要因をキャプチャする、アンタングル化フレームワークを採用しています。また,社会理論によらず,サインや方向のみに焦点を当てた軽量グラフ畳み込みについても検討した。さらに,因子間の相関を考慮し,エッジの符号を効果的に分類するデコーダを提案する。さらに, エンコーダとデコーダを併用して, 自己教師付き因子判別器の訓練を行った。実世界の符号付き有向グラフに関する広範な実験を通して、DINESは効果的に絡み合ったノード表現を学習し、符号予測タスクにおいて競合相手よりも大幅に優れていることを示す。 Signed graphs are complex systems that represent trust relationships or preferences in various domains. Learning node representations in such graphs is crucial for many mining tasks. Although real-world signed relationships can be influenced by multiple latent factors, most existing methods often oversimplify the modeling of signed relationships by relying on social theories and treating them as simplistic factors. This limits their expressiveness and their ability to capture the diverse factors that shape these relationships. In this paper, we propose DINES, a novel method for learning disentangled node representations in signed directed graphs without social assumptions. We adopt a disentangled framework that separates each embedding into distinct factors, allowing for capturing multiple latent factors. We also explore lightweight graph convolutions that focus solely on sign and direction, without depending on social theories. Additionally, we propose a decoder that effectively classifies an edge's sign by considering correlations between the factors. To further enhance disentanglement, we jointly train a self-supervised factor discriminator with our encoder and decoder. 
Through extensive experiments on real-world signed directed graphs, we show that DINES effectively learns disentangled node representations and significantly outperforms its competitors in the sign prediction task.	翻訳日:2023-07-07 13:37:42 公開日:2023-07-06
# 分類された経路計算 Categorified Path Calculus ( http://arxiv.org/abs/2307.03075v1 ) ライセンス: Link先を確認	Simon Burton	(参考訳) パス計算(Path calculus)またはグラフィカル線型代数は、基底環上の行列の圏に対する弦図式である。これは対称モノイド圏の通常の弦図計算であり、モノイド積は行列の直和である。我々はこの物語を分類し、基底双モノイド圏上の行列の2カテゴリーのための曲面図表を開発する。これにより、任意のビモノイド圏の表面ダイアグラムは 1x1 行列のダイアグラムに制限される。両積、双対、短剣といった基底圏上の追加構造が、結果として得られる計算にどのように構造を加えるかを示す。圏量子力学に適用すると、テレポーテーションプロトコルの新しいグラフィカルな証明が得られる。 Path calculus, or graphical linear algebra, is a string diagram calculus for the category of matrices over a base ring. It is the usual string diagram calculus for a symmetric monoidal category, where the monoidal product is the direct sum of matrices. We categorify this story to develop a surface diagram calculus for the bicategory of matrices over a base bimonoidal category. This yields a surface diagram calculus for any bimonoidal category by restricting to diagrams for 1x1 matrices. We show how additional structure on the base category, such as biproducts, duals and the dagger, adds structure to the resulting calculus. Applied to categorical quantum mechanics this yields a new graphical proof of the teleportation protocol.	翻訳日:2023-07-07 13:37:19 公開日:2023-07-06
# Proto-CLIP:Few-Shot Learningのためのビジョン言語プロトタイプネットワーク Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning ( http://arxiv.org/abs/2307.03073v1 ) ライセンス: Link先を確認	Jishnu Jaykumar P, Kamalesh Palanisamy, Yu-Wei Chao, Xinya Du, Yu Xiang	(参考訳) 本稿では,CLIPのような大規模視覚言語モデルを活用することで,数ショット学習のための新しいフレームワークを提案する。初歩学習のためのユニモーダルな原型的ネットワークに動機づけられ,初歩学習に画像プロトタイプとテキストプロトタイプを利用するproto-clipを導入した。具体的には、PROTO-CLIPは、CLIP内の画像エンコーダとテキストエンコーダを、少数の例を用いて共同で適応させる。 2つのエンコーダは、分類のための画像クラスのプロトタイプを計算するために使用される。適応中に、対応するクラスの画像とテキストのプロトタイプの整列を提案する。このようなアライメントは、両タイプのプロトタイプからの貢献により、少数ショットの分類に有用である。本手法の有効性を,数発の学習のためのベンチマークデータセットと,ロボットの知覚のための実世界で実験することで実証する。 We propose a novel framework for few-shot learning by leveraging large-scale vision-language models such as CLIP. Motivated by the unimodal prototypical networks for few-shot learning, we introduce PROTO-CLIP that utilizes image prototypes and text prototypes for few-shot learning. Specifically, PROTO-CLIP adapts the image encoder and text encoder in CLIP in a joint fashion using few-shot examples. The two encoders are used to compute prototypes of image classes for classification. During adaptation, we propose aligning the image and text prototypes of corresponding classes. Such a proposed alignment is beneficial for few-shot classification due to the contributions from both types of prototypes. We demonstrate the effectiveness of our method by conducting experiments on benchmark datasets for few-shot learning as well as in the real world for robot perception.	翻訳日:2023-07-07 13:37:09 公開日:2023-07-06
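The prototype-based classification described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. Embeddings are toy 2-D vectors rather than CLIP encoder outputs, image prototypes are taken as the mean of the support embeddings (as in standard prototypical networks), and the blending weight `alpha` and sharpness `beta` are hypothetical parameters.

```python
import math

# Toy prototype classifier: blends class probabilities computed from
# image prototypes and text prototypes. All names are illustrative.

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (the class prototype)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(query, image_protos, text_protos, alpha=0.5, beta=1.0):
    """Return class probabilities as a convex blend of both prototype types."""
    p_img = softmax([-beta * sq_dist(query, c) for c in image_protos])
    p_txt = softmax([-beta * sq_dist(query, c) for c in text_protos])
    return [alpha * pi + (1.0 - alpha) * pt for pi, pt in zip(p_img, p_txt)]


if __name__ == "__main__":
    support = {0: [[1.0, 0.0], [0.8, 0.2]], 1: [[0.0, 1.0], [0.2, 0.8]]}
    image_protos = [mean_vector(support[c]) for c in sorted(support)]
    text_protos = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for text embeddings
    print(classify([0.9, 0.1], image_protos, text_protos))
```

When the image and text prototypes of a class lie close together, both terms of the blend agree, which is one way to read the abstract's motivation for aligning them during adaptation.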
# 脳波感情認識のためのグラフスムース信号を用いたハイブリッド・エンド・エンド時空間アテンションニューラルネットワーク A Hybrid End-to-End Spatio-Temporal Attention Neural Network with Graph-Smooth Signals for EEG Emotion Recognition ( http://arxiv.org/abs/2307.03068v1 ) ライセンス: Link先を確認	Shadi Sartipi and Mastaneh Torkamani-Azar and Mujdat Cetin	(参考訳) 近年,脳波(EEG)信号などの生理データが感情コンピューティングにおいて大きな注目を集めている。この文脈での主な目標は、感情状態を評価する自動化モデルを設計することである。近年、ディープニューラルネットワークは感情認識タスクにおいて有望なパフォーマンスを示している。しかし、生データから実用的な情報を抽出する深層アーキテクチャの設計は依然として課題である。本稿では,時空間符号化とリカレントアテンションネットワークブロックのハイブリッド構造により,解釈可能な生理学的表現を得るディープニューラルネットワークを提案する。さらに、グラフ信号処理ツールを用いて生データに前処理ステップを適用し、空間領域でグラフ平滑化を行う。提案するアーキテクチャは,公開のDEAPデータセットにおける感情分類の最先端結果を超えることを実証する。また,学習モデルの汎用性を検討するために,モデルパラメータを特定のソースから他のターゲットドメインに転送することで,転移学習(TL)に対するアーキテクチャの性能を評価する。DEAPをソースデータセットとして,異なる刺激を用いた脳波に基づく感情分類タスクであるDREAMERおよびEmotional English Word(EEWD)データセットにおいて,モダリティ間のTLの実行と感情分類精度の向上における本モデルの有効性を実証する。 Recently, physiological data such as electroencephalography (EEG) signals have attracted significant attention in affective computing. In this context, the main goal is to design an automated model that can assess emotional states. Lately, deep neural networks have shown promising performance in emotion recognition tasks. However, designing a deep architecture that can extract practical information from raw data is still a challenge. Here, we introduce a deep neural network that acquires interpretable physiological representations by a hybrid structure of spatio-temporal encoding and recurrent attention network blocks. Furthermore, a preprocessing step is applied to the raw data using graph signal processing tools to perform graph smoothing in the spatial domain. We demonstrate that our proposed architecture exceeds state-of-the-art results for emotion classification on the publicly available DEAP dataset. To explore the generality of the learned model, we also evaluate the performance of our architecture towards transfer learning (TL) by transferring the model parameters from a specific source to other target domains. 
Using DEAP as the source dataset, we demonstrate the effectiveness of our model in performing cross-modality TL and improving emotion classification accuracy on DREAMER and the Emotional English Word (EEWD) datasets, which involve EEG-based emotion classification tasks with different stimuli.	翻訳日:2023-07-07 13:36:57 公開日:2023-07-06
# DeepOnto: ディープラーニングによるオントロジーエンジニアリングのためのPythonパッケージ DeepOnto: A Python Package for Ontology Engineering with Deep Learning ( http://arxiv.org/abs/2307.03067v1 ) ライセンス: Link先を確認	Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, Brahmananda Sapkota	(参考訳) オントロジー工学におけるディープラーニング技術、特に言語モデル(LM)の適用は、広く注目を集めている。しかし、PyTorchやTensorFlowのようなディープラーニングフレームワークは主にPythonプログラミング用に開発されており、OWL APIやJenaといった広く使われているオントロジーAPIは主にJavaベースである。これらのフレームワークとAPIのシームレスな統合を容易にするため、オントロジーエンジニアリング用に設計されたPythonパッケージであるDeepOntoを紹介する。このパッケージには、広く認識され信頼性の高いOWL API上に構築されたコアオントロジー処理モジュールが含まれており、基本的な機能をよりPython的な方法でカプセル化し、推論、言語化(verbalisation)、正規化、投影など、他の必須コンポーネントを含むように拡張している。このモジュール上に構築されているDeepOntoは、ディープラーニング手法(主に事前学習済みLM)を活用して、オントロジーアライメントや補完といった様々なオントロジーエンジニアリングタスクをサポートする一連のツール、リソース、アルゴリズムを提供する。本稿では,Samsung Research UKのDigital Health Coachingと,Ontology Alignment Evaluation Initiative(OAEI)のBio-MLトラックの2つのユースケースを通じて,DeepOntoの実用性を実証する。 Applying deep learning techniques, particularly language models (LMs), in ontology engineering has raised widespread attention. However, deep learning frameworks like PyTorch and TensorFlow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present DeepOnto, a Python package designed for ontology engineering. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to include other essential components including reasoning, verbalisation, normalisation, projection, and more. Building on this module, DeepOnto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methodologies, primarily pre-trained LMs. 
In this paper, we also demonstrate the practical utility of DeepOnto through two use-cases: the Digital Health Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).	翻訳日:2023-07-07 13:36:38 公開日:2023-07-06
# 離散対数と関連問題の量子複雑性 Quantum Complexity for Discrete Logarithms and Related Problems ( http://arxiv.org/abs/2307.03065v1 ) ライセンス: Link先を確認	Minki Hhan, Takashi Yamakawa, Aaram Yun	(参考訳) 本稿では,ジェネリックアルゴリズム,すなわち群の符号化の性質を一切利用しないアルゴリズムの文脈において,離散対数(DL)および関連する群論的問題の量子計算複雑性を検討する。我々は、群論的問題に対する量子計算のジェネリックなモデルを構築し、これを量子ジェネリック群モデルと呼ぶ。DL問題に対するShorのアルゴリズムおよび関連アルゴリズムは、このモデルで記述することができる。このモデルにおいて、DLおよび関連問題の量子複雑性の下界と、それにほぼ一致するアルゴリズムを示す。より正確には、素数位数の巡回群 $G$ に対して以下の結果を証明する。 - 任意のジェネリック量子DLアルゴリズムは、深さ $\Omega(\log |G|)$ の群演算を行わなければならない。これは、Shorのアルゴリズムが、並列アルゴリズムを考慮しても、ジェネリック量子アルゴリズムの中で漸近的に最適であることを示している。 - Shorのアルゴリズムの変種が古典計算を活用して量子群演算の数を減らせることを観察する。ジェネリックなハイブリッド量子古典アルゴリズムのモデルを導入し、これらのアルゴリズムがこのモデルでほぼ最適であることを示す。群演算の総数が $Q$ であるDL問題の任意のジェネリックハイブリッドアルゴリズムは、深さ $\Omega(\log\log |G| - \log\log Q)$ の量子群演算を $\Omega(\log |G|/\log Q)$ 回行わなければならない。 - 量子メモリが $t$ 個の群要素しか格納できず、$r$ 個の群要素の量子ランダムアクセスメモリを使用する場合、任意のジェネリックハイブリッドアルゴリズムは、合計 $\Omega(\sqrt{|G|})$ 回の群演算または $\Omega(\log |G|/\log (tr))$ 回の量子群演算を行わなければならない。副次的貢献として、複数のDL問題が各インスタンスを1つずつ解くよりも優れたアルゴリズムを持つことを示し、パスワード認証鍵交換プロトコルの文脈で示唆されていた quantum annoying 性の強い形を反証する。 This paper studies the quantum computational complexity of the discrete logarithm (DL) and related group-theoretic problems in the context of generic algorithms -- that is, algorithms that do not exploit any properties of the group encoding. We establish a generic model of quantum computation for group-theoretic problems, which we call the quantum generic group model. Shor's algorithm for the DL problem and related algorithms can be described in this model. We show the quantum complexity lower bounds and almost matching algorithms of the DL and related problems in this model. More precisely, we prove the following results for a cyclic group $G$ of prime order. - Any generic quantum DL algorithm must make $\Omega(\log |G|)$ depth of group operations. This shows that Shor's algorithm is asymptotically optimal among the generic quantum algorithms, even considering parallel algorithms. 
- We observe that variations of Shor's algorithm can take advantage of classical computations to reduce the number of quantum group operations. We introduce a model for generic hybrid quantum-classical algorithms and show that these algorithms are almost optimal in this model. Any generic hybrid algorithm for the DL problem with a total number of group operations $Q$ must make $\Omega(\log |G|/\log Q)$ quantum group operations of depth $\Omega(\log\log |G| - \log\log Q)$. - When the quantum memory can only store $t$ group elements and use quantum random access memory of $r$ group elements, any generic hybrid algorithm must make either $\Omega(\sqrt{|G|})$ group operations in total or $\Omega(\log |G|/\log (tr))$ quantum group operations. As a side contribution, we show a multiple DL problem admits a better algorithm than solving each instance one by one, refuting a strong form of the quantum annoying property suggested in the context of password-authenticated key exchange protocol.	翻訳日:2023-07-07 13:36:14 公開日:2023-07-06
# 勾配に基づく解釈可能性のためのバックプロパゲーションの一般化 Generalizing Backpropagation for Gradient-Based Interpretability ( http://arxiv.org/abs/2307.03056v1 ) ライセンス: Link先を確認	Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, Ryan Cotterell	(参考訳) ディープニューラルネットワークを解釈するための多くの一般的な特徴属性法は、入力に対するモデルの出力の勾配の計算に依存する。これらの手法はモデルの予測にどの入力特徴が重要であるかを示すことができるが、モデル自体の内部動作についてはほとんど明らかにしない。本稿では,モデルの勾配計算が半環を用いたより一般的な定式化の特別な場合であることを示す。この観測により、バックプロパゲーションアルゴリズムを一般化し、最も重み付きパスやエントロピーのようなニューラルネットワークの勾配グラフに関する他の解釈可能な統計を効率的に計算することができる。本稿では、この一般化アルゴリズムを実装し、計算した統計をよりよく理解するために合成データセット上で評価し、SVAにおけるBERTの挙動の研究に応用する。この方法により、我々は (a)モデルの構成要素を流れる勾配の量は、その予測の重要性を反映していることを検証する。 b) 自己保持機構のどの経路が最も重要であるかを特定する。 Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs. While these methods can indicate which input features may be important for the model's prediction, they reveal little about the inner workings of the model itself. In this paper, we observe that the gradient computation of a model is a special case of a more general formulation using semirings. This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics about the gradient graph of a neural network, such as the highest-weighted path and entropy. We implement this generalized algorithm, evaluate it on synthetic datasets to better understand the statistics it computes, and apply it to study BERT's behavior on the subject-verb number agreement task (SVA). With this method, we (a) validate that the amount of gradient flow through a component of a model reflects its importance to a prediction and (b) for SVA, identify which pathways of the self-attention mechanism are most important.	翻訳日:2023-07-07 13:35:40 公開日:2023-07-06
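The semiring view of backpropagation described in this abstract can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' implementation. The computation graph is a hand-written DAG whose edge weights are local derivatives, and the names `Semiring`, `aggregate`, `EDGES`, and `ORDER` are hypothetical. Swapping the (sum, product) semiring for (max, product) turns the ordinary gradient into the weight of the highest-weighted path. The max-product variant assumes non-negative edge weights (for example, absolute local gradients), since multiplication only distributes over max on the non-negative reals.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # combines alternative paths
    times: Callable[[float, float], float]  # extends a path by one edge
    zero: float                             # identity for plus
    one: float                              # identity for times

SUM_PRODUCT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
# Valid only for non-negative weights (assumption stated in the lead-in).
MAX_PRODUCT = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def aggregate(edges: List[Tuple[str, str, float]], order: List[str],
              source: str, sink: str, sr: Semiring) -> float:
    """Combine edge weights over all source->sink paths of a DAG.

    edges holds (u, v, weight) triples; order lists nodes topologically.
    """
    incoming = {node: [] for node in order}
    for u, v, w in edges:
        incoming[v].append((u, w))
    value = {node: sr.zero for node in order}
    value[source] = sr.one
    for node in order:  # predecessors are final before node is visited
        for u, w in incoming[node]:
            value[node] = sr.plus(value[node], sr.times(value[u], w))
    return value[sink]


# y = x*x + 3*x evaluated at x = 2. The product node "a" receives x twice,
# each edge carrying the local derivative 2 (the other factor).
EDGES = [("x", "a", 2.0), ("x", "a", 2.0), ("x", "b", 3.0),
         ("a", "y", 1.0), ("b", "y", 1.0)]
ORDER = ["x", "a", "b", "y"]

if __name__ == "__main__":
    print(aggregate(EDGES, ORDER, "x", "y", SUM_PRODUCT))  # dy/dx = 2x + 3
    print(aggregate(EDGES, ORDER, "x", "y", MAX_PRODUCT))  # strongest single path
```

Only the semiring object changes between the two calls. The traversal itself is the same dynamic program, which is the sense in which the gradient is a special case of a more general computation.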
# ハイブリッド量子古典リザバーコンピューティングによる2次元乱流レイリー・ベナール流れの低次元モデリング Reduced-order modeling of two-dimensional turbulent Rayleigh-Bénard flow by hybrid quantum-classical reservoir computing ( http://arxiv.org/abs/2307.03053v1 ) ライセンス: Link先を確認	Philipp Pfeffer, Florian Heyder and Jörg Schumacher	(参考訳) レイリー数Ra=1e5,プラントル数Pr=10における2次元乱流Rayleigh-Bénard対流の低次統計特性を再現する,2つのハイブリッド量子古典リザバーコンピューティングモデルを提示する。2つの量子アルゴリズムは、量子リザバーの回路層、特にエンタングルメント層の配置が異なる。2つのアーキテクチャのうち2番目はH2と呼ばれ、リザバーの更新を量子回路内で完全に実行することができる。その性能は古典的なリザバーコンピューティングモデルと比較される。3つのモデルはすべて、最もエネルギーの大きい16個の固有直交分解(POD)モードの時系列で張られる低次元の潜在データ空間において、流れの非線形かつカオス的な力学を学ぶ必要がある。これらのトレーニングデータは、乱流からPODスナップショット解析によって生成される。全てのリザバーコンピューティングモデルは再構成モード(開ループモード)で動作し、各ステップで入力として3つのPODモードを受け取り、欠落した13のモードを再構成する。本研究では,量子の場合に特有のハイパーパラメータや,リザバーサイズやリーク率など古典的な対応モデルと共通のハイパーパラメータに対する再構成誤差の依存性を解析する。両量子アルゴリズムは, 乱流対流の本質的な統計特性を, n<=9の少数の量子ビットで再構成できることを示す。これらの特性は, 速度および温度の変動プロファイルと, とりわけ乱流対流熱流束からなる。後者は層を横切る乱流熱伝達を定量化し, コヒーレントな高温上昇および低温下降のサーマルプルームとして現れる。 Two hybrid quantum-classical reservoir computing models are presented to reproduce low-order statistical properties of a two-dimensional turbulent Rayleigh-Bénard convection flow at a Rayleigh number Ra=1e5 and a Prandtl number Pr=10. Both quantum algorithms differ by the arrangement of the circuit layers in the quantum reservoir, in particular the entanglement layers. The second of the two architectures, denoted as H2, enables a complete execution of the reservoir update inside the quantum circuit. Their performance is compared with that of a classical reservoir computing model. All three models have to learn the nonlinear and chaotic dynamics of the flow in a lower-dimensional latent data space which is spanned by the time series of the 16 most energetic Proper Orthogonal Decomposition (POD) modes. These training data are generated by a POD snapshot analysis from the turbulent flow. All reservoir computing models are operated in the reconstruction or open-loop mode, i.e., they receive 3 POD modes as an input at each step and reconstruct the missing 13 ones. 
We analyse the reconstruction error in dependence on the hyperparameters which are specific for the quantum cases or shared with the classical counterpart, such as the reservoir size and the leaking rate. We show that both quantum algorithms are able to reconstruct essential statistical properties of the turbulent convection flow successfully with a small number of qubits of n<=9. These properties comprise the velocity and temperature fluctuation profiles and, in particular, the turbulent convective heat flux, which quantifies the turbulent heat transfer across the layer and manifests in coherent hot rising and cold falling thermal plumes.	翻訳日:2023-07-07 13:35:22 公開日:2023-07-06
# 地図ベースのサービスのための出発地・目的地間旅行時間オラクル Origin-Destination Travel Time Oracle for Map-based Services ( http://arxiv.org/abs/2307.03048v1 ) ライセンス: Link先を確認	Yan Lin, Huaiyu Wan, Jilin Hu, Shengnan Guo, Bin Yang, Youfang Lin, Christian S. Jensen	(参考訳) 出発地(O)、目的地(D)、出発時刻(T)が与えられると、Origin-Destination(OD)旅行時間オラクル(ODT-Oracle)は、時刻Tに出発した場合のOからDへの移動に要する時間の推定値を返す。ODT-Oracleはマップベースのサービスにおいて重要な役割を果たす。このようなオラクルの構築を可能にするため,過去の軌跡を利用してODペアの時間変化する旅行時間を推定する旅行時間推定(TTE)ソリューションを提案する。この問題は、異なる旅行時間を持つ複数の過去の軌跡が同じODペアを結びうること、また軌跡同士が互いに異なりうることによって複雑になる。この問題を解決するためには、将来の問い合わせに対して旅行時間推定を行う際に、外れ値の軌跡を除去することが重要である。そこで本稿では,Diffusion-based Origin-destination Travel Time Estimation (DOT)と呼ばれる新しい2段階のフレームワークを提案する。第一に、DOTは、ODペアと過去の軌跡との相関を学習することで拡散に基づくPiT推論プロセスの構築を可能にする、条件付きPixelated Trajectories (PiT)デノイザを採用している。具体的には、ODペアと出発時刻が与えられたときに、PiTを推論することを目的とする。次に、DOTは、推論されたPiTに基づいて効果的かつ効率的に旅行時間を推定するMasked Vision Transformer(MViT)を含む。我々は,DOTが精度,スケーラビリティ,説明可能性の点でベースライン手法を上回りうることを示す,2つの実世界のデータセットに関する広範な実験を報告する。 Given an origin (O), a destination (D), and a departure time (T), an Origin-Destination (OD) travel time oracle (ODT-Oracle) returns an estimate of the time it takes to travel from O to D when departing at T. ODT-Oracles serve important purposes in map-based services. To enable the construction of such oracles, we provide a travel-time estimation (TTE) solution that leverages historical trajectories to estimate time-varying travel times for OD pairs. The problem is complicated by the fact that multiple historical trajectories with different travel times may connect an OD pair, while trajectories may vary from one another. To solve the problem, it is crucial to remove outlier trajectories when doing travel time estimation for future queries. We propose a novel, two-stage framework called Diffusion-based Origin-destination Travel Time Estimation (DOT), that solves the problem. 
First, DOT employs a conditioned Pixelated Trajectories (PiT) denoiser that enables building a diffusion-based PiT inference process by learning correlations between OD pairs and historical trajectories. Specifically, given an OD pair and a departure time, we aim to infer a PiT. Next, DOT encompasses a Masked Vision Transformer (MViT) that effectively and efficiently estimates a travel time based on the inferred PiT. We report on extensive experiments on two real-world datasets that offer evidence that DOT is capable of outperforming baseline methods in terms of accuracy, scalability, and explainability.	翻訳日:2023-07-07 13:34:53 公開日:2023-07-06
# Transformerを用いた音楽ストリーミングサービスにおけるトラックミックス生成 Track Mix Generation on Music Streaming Services using Transformers ( http://arxiv.org/abs/2307.03045v1 ) ライセンス: Link先を確認	Walid Bendada and Théo Bontempelli and Mathieu Morlon and Benjamin Chapus and Thibault Cador and Thomas Bouabça and Guillaume Salha-Galvan	(参考訳) 本稿では,2022年に音楽ストリーミングサービスDeezerでリリースされたパーソナライズされたプレイリスト生成システムであるTrack Mixを紹介する。 Track Mixは、起点となる楽曲にインスパイアされた「ミックス」プレイリストを自動的に生成し、ユーザーはお気に入りのコンテンツに似た音楽を発見することができる。これらのミックスを生成するために,ユーザプレイリストから得た数百万のトラックシーケンスで学習したTransformerモデルを検討する。近年、トランスフォーマーの人気が高まりつつあることを踏まえて、従来のコラボレーティブフィルタリングアプローチと比較して、このようなモデルを用いたサービスでのミックス生成の利点、欠点、技術的な課題を分析した。 Track Mixはリリース以来、毎日何百万ものユーザー向けにプレイリストを作成し、Deezerで音楽発見体験を強化してきた。 This paper introduces Track Mix, a personalized playlist generation system released in 2022 on the music streaming service Deezer. Track Mix automatically generates "mix" playlists inspired by initial music tracks, allowing users to discover music similar to their favorite content. To generate these mixes, we consider a Transformer model trained on millions of track sequences from user playlists. In light of the growing popularity of Transformers in recent years, we analyze the advantages, drawbacks, and technical challenges of using such a model for mix generation on the service, compared to a more traditional collaborative filtering approach. Since its release, Track Mix has been generating playlists for millions of users daily, enhancing their music discovery experience on Deezer.	翻訳日:2023-07-07 13:34:21 公開日:2023-07-06
# プライバシーとユーティリティのトレードオフに対する量子ソリューション Quantum Solutions to the Privacy vs. Utility Tradeoff ( http://arxiv.org/abs/2307.03118v1 ) ライセンス: Link先を確認	Sagnik Chatterjee and Vyacheslav Kungurtsev	(参考訳) 本稿では,生成モデルに対するメンバシップ推論攻撃に関するプライバシとセキュリティ保証の証明が可能な量子暗号プリミティブに基づく,新たなアーキテクチャ(およびそのいくつかの変種)を提案する。私たちのアーキテクチャは、既存の古典的または量子的生成モデル上で使用できます。我々は、ユニタリ演算子に関連する量子ゲートの使用は、すべての多項式時間敵からの保証されたセキュリティを確立するための標準微分プライバシベースの技術と比較して本質的に有利であると主張している。 In this work, we propose a novel architecture (and several variants thereof) based on quantum cryptographic primitives with provable privacy and security guarantees regarding membership inference attacks on generative models. Our architecture can be used on top of any existing classical or quantum generative models. We argue that the use of quantum gates associated with unitary operators provides inherent advantages compared to standard Differential Privacy based techniques for establishing guaranteed security from all polynomial-time adversaries.	翻訳日:2023-07-07 13:28:52 公開日:2023-07-06
# KoRC: 深層テキスト理解のための知識指向読解ベンチマーク KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding ( http://arxiv.org/abs/2307.03115v1 ) ライセンス: Link先を確認	Zijun Yao, Yantao Liu, Xin Lv, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li	(参考訳) 与えられた文書とテキスト外の事前知識との結び付けを必要とする深いテキスト理解は、近年多くのベンチマークによって強調されている。しかし、これらのベンチマークには2つの大きな制限がある。一方では、そのほとんどが人手による知識アノテーションを必要としており、知識のカバー範囲が限られている。他方では、通常、テキスト中の選択肢やスパンを答えとして使用するため、回答空間が狭くなる。これらの制限を克服するために、我々はKoRCという新しい挑戦的なベンチマークを構築した。以前のベンチマークと比較すると、KoRCには幅広い知識カバレッジと柔軟な回答形式という2つの利点がある。具体的には,大規模な知識ベースを用いてアノテータや大規模言語モデル(LLM)を誘導し,知識を要する質問を構築する。さらに、最終回答としてスパンや選択肢ではなく、知識ベース中のラベルを使用する。実験結果から, 最強のベースラインでも, 分布内および分布外のテストセットにおいて, それぞれ68.3%, 30.0%のF1値しか得られないことが判明した。これらの結果は、深いテキスト理解はまだ未解決の課題であることを示している。ベンチマークデータセット、リーダーボード、ベースラインメソッドは https://github.com/THU-KEG/KoRC で公開されている。 Deep text understanding, which requires the connections between a given document and prior knowledge beyond its text, has been highlighted by many benchmarks in recent years. However, these benchmarks have encountered two major limitations. On the one hand, most of them require human annotation of knowledge, which leads to limited knowledge coverage. On the other hand, they usually use choices or spans in the texts as the answers, which results in narrow answer space. To overcome these limitations, we build a new challenging benchmark named KoRC in this paper. Compared with previous benchmarks, KoRC has two advantages, i.e., broad knowledge coverage and flexible answer format. Specifically, we utilize massive knowledge bases to guide annotators or large language models (LLMs) to construct knowledgeable questions. Moreover, we use labels in knowledge bases rather than spans or choices as the final answers. We test state-of-the-art models on KoRC and the experimental results show that the strongest baseline only achieves 68.3% and 30.0% F1 measure in the in-distribution and out-of-distribution test sets, respectively. 
These results indicate that deep text understanding is still an unsolved challenge. The benchmark dataset, leaderboard, and baseline methods are released in https://github.com/THU-KEG/KoRC.	翻訳日:2023-07-07 13:28:45 公開日:2023-07-06
# LISSNAS: ニューラルネットワーク検索のための局所性に基づく反復探索空間収縮 LISSNAS: Locality-based Iterative Search Space Shrinkage for Neural Architecture Search ( http://arxiv.org/abs/2307.03110v1 ) ライセンス: Link先を確認	Bhavna Gopal, Arjun Sridhar, Tunhou Zhang and Yiran Chen	(参考訳) 探索空間はニューラルアーキテクチャサーチ(NAS)の進歩を示す。汎用的な建築オペレーターと構造を持つ大規模で複雑な探索空間は、有望なアーキテクチャを造る機会を提供するが、効率的な探索と搾取には厳しい課題が生じる。その後、いくつかの検索空間縮小法は、性能の良いネットワークを含む単一のサブリージョンを選択することで最適化される。これらの手法では, 少ない性能と効率向上が観察されるが, 探索性能を著しく向上させる余地は残っており, アーキテクチャの多様性を維持するには有効ではない。我々は,大規模な空間をSOTA検索性能を持つ多種多様な小さな探索空間に縮小する自動アルゴリズム LISSNAS を提案する。提案手法は, 局所性, 構造的類似性と性能類似性の関係を利用して, 性能の良いネットワークのポケットを効率的に抽出する。様々なサイズとデータセットにまたがる探索空間の配列に本手法を示す。 2つの異なる検索空間において、最良top-1精度を達成することにより、ワンショット検索における縮小空間の有効性を強調する。提案手法は,モバイル制約下でのイメージネットのSOTA Top-1精度77.6\%,最良クラスKendal-Tau,アーキテクチャ多様性,検索空間サイズを実現している。 Search spaces hallmark the advancement of Neural Architecture Search (NAS). Large and complex search spaces with versatile building operators and structures provide more opportunities to brew promising architectures, yet pose severe challenges on efficient exploration and exploitation. Subsequently, several search space shrinkage methods optimize by selecting a single sub-region that contains some well-performing networks. Small performance and efficiency gains are observed with these methods but such techniques leave room for significantly improved search performance and are ineffective at retaining architectural diversity. We propose LISSNAS, an automated algorithm that shrinks a large space into a diverse, small search space with SOTA search performance. Our approach leverages locality, the relationship between structural and performance similarity, to efficiently extract many pockets of well-performing networks. We showcase our method on an array of search spaces spanning various sizes and datasets. We accentuate the effectiveness of our shrunk spaces when used in one-shot search by achieving the best Top-1 accuracy in two different search spaces. 
Our method achieves a SOTA Top-1 accuracy of 77.6% on ImageNet under mobile constraints, along with best-in-class Kendall tau, architectural diversity, and search space size.	翻訳日:2023-07-07 13:28:24 公開日:2023-07-06
# 大規模言語モデルの評価に関する調査 A Survey on Evaluation of Large Language Models ( http://arxiv.org/abs/2307.03109v1 ) ライセンス: Link先を確認	Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie	(参考訳) 大規模言語モデル(LLM)は、様々なアプリケーションにおける前例のない性能のため、学術と産業の両方で人気が高まっている。 LLMは研究と日常利用の両方において重要な役割を担い続けており、その評価はタスクレベルだけでなく、潜在的なリスクをよりよく理解するために社会レベルでもますます重要になっている。過去数年間、様々な観点からLLMを調べるための重要な努力が続けられてきた。本稿では, これらのLLMの評価手法を総合的に検討し, 何を評価するか, どこで評価するか, どのように評価するかの3つの重要な側面に着目した。まず,一般的な自然言語処理タスク,推論,医療利用,倫理,教育,自然科学,社会科学,エージェント応用など,評価タスクの観点から概観する。第2に,LLMの性能評価において重要な要素である評価手法とベンチマークに踏み込むことで,'where' と 'how' の質問に答える。次に、異なるタスクにおけるLLMの成功事例と失敗事例を要約する。最後に、LLM評価の先にあるいくつかの将来の課題に光を当てた。我々の目的は、LLMの評価の領域における研究者に貴重な洞察を提供することであり、それによってより熟練したLLMの開発を支援することである。我々のキーポイントは、LLMの開発を支援するために、評価を必須の規律として扱うべきであるということです。関連したオープンソース資料は、https://github.com/MLGroupJLU/LLM-eval-surveyで継続的に保守しています。 Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. 
Secondly, we answer the `where' and `how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey.	翻訳日:2023-07-07 13:28:00 公開日:2023-07-06
# テキスト・画像拡散モデルにおける不正データ利用の検出方法 How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models ( http://arxiv.org/abs/2307.03108v1 ) ライセンス: Link先を確認	Zhenting Wang, Chen Chen, Yuchen Liu, Lingjuan Lyu, Dimitris Metaxas, Shiqing Ma	(参考訳) 最近のテキストから画像への拡散モデルは、高品質な画像を生成するのに驚くべき性能を示している。しかし、トレーニングプロセス中に不正なデータの使用が懸念されている。例えば、モデルトレーナーが特定のアーティストによって作成された画像の集合を収集し、アーティストの許可を得ずに類似の画像を生成することができるモデルを訓練しようとする場合である。この問題に対処するには、不正なデータ利用を検出することが不可欠になる。本稿では,保護されたデータセット上で訓練されたテキスト間拡散モデルに,インジェクトした記憶を植え付けることで,そのような不正なデータ利用を検出する手法を提案する。具体的には,人間の視力に影響されないが拡散モデルにより捉え,記憶できるステルス画像ラッピング機能などの画像に,ユニークな内容を加えることで,保護された画像データセットを変更する。モデルがインジェクトされたコンテンツ(すなわち、生成された画像が選択された後処理機能によって処理されているかどうか)を記憶しているかどうかを解析することにより、不正に使用したモデルを検出することができる。安定拡散とloraモデルを用いた実験により,提案手法の有効性が実証された。 Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized usage of data during the training process. One example is when a model trainer collects a set of images created by a particular artist and attempts to train a model capable of generating similar images without obtaining permission from the artist. To address this issue, it becomes crucial to detect unauthorized data usage. In this paper, we propose a method for detecting such unauthorized data usage by planting injected memorization into the text-to-image diffusion models trained on the protected dataset. Specifically, we modify the protected image dataset by adding unique contents on the images such as stealthy image wrapping functions that are imperceptible to human vision but can be captured and memorized by diffusion models. By analyzing whether the model has memorization for the injected content (i.e., whether the generated images are processed by the chosen post-processing function), we can detect models that had illegally utilized the unauthorized data. 
Our experiments on Stable Diffusion and LoRA models demonstrate the effectiveness of the proposed method in detecting unauthorized data usage.	翻訳日:2023-07-07 13:27:36 公開日:2023-07-06
# 一般化量子共分散行列からの非エルミート親ハミルトニアン Non-Hermitian Parent Hamiltonian from Generalized Quantum Covariance Matrix ( http://arxiv.org/abs/2307.03107v1 ) ライセンス: Link先を確認	Yin Tang, W. Zhu	(参考訳) 量子逆問題とは、1つの固有状態から局所ハミルトニアンをどのように決定するかという問題である。この問題はエルミート系だけでなく非エルミート系においても有効である。これまでのところ、ほとんどの試みはエルミート系に限定されており、非エルミート解は未解決のままである。本研究では,量子共分散行列法を非エルミート系に適用可能な形に一般化し,非エルミート親ハミルトニアンを任意の双直交固有状態の対から明示的に再構成することができる。具体例として、リー・ヤン特異点を持つスピン鎖と非エルミート相互作用フェルミオン模型に本手法をうまく適用する。このアプローチの一般化とさらなる応用についても論じる。我々の研究は、一対の双直交固有状態から非エルミートハミルトニアンを構築するための体系的で効率的な方法を提供し、非エルミート物理学の将来の探索に光を当てる。 The quantum inverse problem asks how to determine a local Hamiltonian from a single eigenstate. This question is valid not only in Hermitian systems but also in non-Hermitian systems. So far, most attempts have been limited to Hermitian systems, while the possible non-Hermitian solution remains outstanding. In this work, we generalize the quantum covariance matrix method to cases applicable to non-Hermitian systems, through which we are able to explicitly reconstruct the non-Hermitian parent Hamiltonian from an arbitrary pair of biorthogonal eigenstates. As concrete examples, we successfully apply our approach to a spin chain with a Lee-Yang singularity and to a non-Hermitian interacting fermion model. Some generalizations and further applications of our approach are also discussed. Our work provides a systematic and efficient way to construct a non-Hermitian Hamiltonian from a single pair of biorthogonal eigenstates and sheds light on future exploration of non-Hermitian physics.	翻訳日:2023-07-07 13:27:18 公開日:2023-07-06
# アダプタを用いた文埋め込みの効率的なドメイン適応 Efficient Domain Adaptation of Sentence Embeddings using Adapters ( http://arxiv.org/abs/2307.03104v1 ) ライセンス: Link先を確認	Tim Schopf, Dennis Schneider, Florian Matthes	(参考訳) 文埋め込みにより、短いテキストの意味的類似性を捉えることができる。ほとんどの文埋め込みモデルはsts(general semantic textual similarity)タスクのために訓練される。したがって、特定のドメインに文を埋め込むには、良い結果を得るためにモデルを適用する必要がある。通常、これは関心領域の文埋め込みモデル全体を微調整することによって行われる。このアプローチは最先端の結果をもたらすが、モデルの重みはすべて微調整中に更新され、このメソッドはリソース集約的になる。したがって,各対象領域の文埋め込みモデル全体を個別に微調整するのではなく,軽量アダプタのトレーニングを提案する。これらのドメイン固有のアダプタは、基礎となるすべての文埋め込みモデルパラメータを微調整する必要はない。代わりに、基礎となる文埋め込みモデルの重みを固定しながら、少数の追加パラメータのみをトレーニングします。ドメイン固有のアダプタのトレーニングでは、常に同じベースモデルを使用することができ、特定のドメインに文の埋め込みを適用するためにのみドメイン固有のアダプタを交換することができる。文埋め込みのパラメータ効率のよいドメイン適応のためのアダプタを用いることで、約3.6%のパラメータをトレーニングしながら、ドメイン適応された完全に微調整された文埋め込みモデルの1%以内の競争性能が得られることを示す。 Sentence embeddings enable us to capture the semantic similarity of short texts. Most sentence embedding models are trained for general semantic textual similarity (STS) tasks. Therefore, to use sentence embeddings in a particular domain, the model must be adapted to it in order to achieve good results. Usually, this is done by fine-tuning the entire sentence embedding model for the domain of interest. While this approach yields state-of-the-art results, all of the model's weights are updated during fine-tuning, making this method resource-intensive. Therefore, instead of fine-tuning entire sentence embedding models for each target domain individually, we propose to train lightweight adapters. These domain-specific adapters do not require fine-tuning all underlying sentence embedding model parameters. Instead, we only train a small number of additional parameters while keeping the weights of the underlying sentence embedding model fixed. Training domain-specific adapters allows always using the same base model and only exchanging the domain-specific adapters to adapt sentence embeddings to a specific domain. 
We show that using adapters for parameter-efficient domain adaptation of sentence embeddings yields competitive performance within 1% of a domain-adapted, entirely fine-tuned sentence embedding model while only training approximately 3.6% of the parameters.	翻訳日:2023-07-07 13:27:00 公開日:2023-07-06
# 画像異常検出のためのコンテクストアフィニティ蒸留 Contextual Affinity Distillation for Image Anomaly Detection ( http://arxiv.org/abs/2307.03101v1 ) ライセンス: Link先を確認	Jie Zhang, Masanori Suganuma, Takayuki Okatani	(参考訳) 従来、非監督的産業異常検出の研究は、主にひび割れや色汚染などの局所的な構造異常に焦点を当てていた。この種の異常に対する検出性能は著しく向上するが、通常の物体が間違った位置に置かれているような長距離依存に反する論理的異常に直面している。本稿では,過去の知識蒸留研究に基づいて,生徒2名(地域とグローバル)を用いて,教師の行動を模倣する手法を提案する。先行研究で用いられる地域学生は主に構造異常の検出に焦点をあて、グローバル学生は論理異常に注意を払っている。さらに,学生の長期的依存を捉えるための学習を促すために,GCCB(Global context condensing block)を設計し,学生のトレーニングと異常スコアに対する文脈親和性損失を提案する。実験結果から,提案手法は煩雑なトレーニング手法を必要とせず,MVTec LOCO ADデータセット上で新たな最先端性能を実現する。 Previous works on unsupervised industrial anomaly detection mainly focus on local structural anomalies such as cracks and color contamination. While achieving significantly high detection performance on this kind of anomaly, they are faced with logical anomalies that violate the long-range dependencies such as a normal object placed in the wrong position. In this paper, based on previous knowledge distillation works, we propose to use two students (local and global) to better mimic the teacher's behavior. The local student, which is used in previous studies mainly focuses on structural anomaly detection while the global student pays attention to logical anomalies. To further encourage the global student's learning to capture long-range dependencies, we design the global context condensing block (GCCB) and propose a contextual affinity loss for the student training and anomaly scoring. Experimental results show the proposed method doesn't need cumbersome training techniques and achieves a new state-of-the-art performance on the MVTec LOCO AD dataset.	翻訳日:2023-07-07 13:26:26 公開日:2023-07-06
# gpsを現実世界のデータに適用するためのフレームワーク、intuition Beyond Intuition, a Framework for Applying GPs to Real-World Data ( http://arxiv.org/abs/2307.03093v1 ) ライセンス: Link先を確認	Kenza Tazi, Jihao Andreas Lin, Ross Viljoen, Alex Gardner, Ti John, Hong Ge, Richard E. Turner	(参考訳) Gaussian Processs (GP) は、小さな、構造化された、相関したデータセットに対する回帰の魅力的な方法を提供する。しかし、それらの展開は計算コストと単純な低次元データセットを超えてGPを適用する方法に関する限られたガイドラインによって妨げられている。本稿では,ある問題に対するGPの適合性を同定する枠組みと,頑健で明確なGPモデルの構築方法を提案する。このガイドラインは、経験豊富なGP実践者の決定を定式化し、カーネル設計と計算スケーラビリティのオプションに重点を置いている。この枠組みは氷河の標高変化のケーススタディに適用され、テスト時により正確な結果が得られる。 Gaussian Processes (GPs) offer an attractive method for regression over small, structured and correlated datasets. However, their deployment is hindered by computational costs and limited guidelines on how to apply GPs beyond simple low-dimensional datasets. We propose a framework to identify the suitability of GPs to a given problem and how to set up a robust and well-specified GP model. The guidelines formalise the decisions of experienced GP practitioners, with an emphasis on kernel design and options for computational scalability. The framework is then applied to a case study of glacier elevation change yielding more accurate results at test time.	翻訳日:2023-07-07 13:25:56 公開日:2023-07-06
# 実在論と因果性は測定による情報消去を示唆する Realism and causality imply information erasure by measurements ( http://arxiv.org/abs/2307.03134v1 ) ライセンス: Link先を確認	Alberto Montina, Stefan Wolf	(参考訳) 量子測定は一般的に、測定された系のその後の進化に摂動をもたらす。さらに、射影測定は、結果が無視された場合、系の不確実性を減少させることはできず、つまり、フォン・ノイマンエントロピーは減少できない。しかし、ある妥当な仮定のもとで、Leggett-Garg不等式の量子的破れを用いると、この性質は測定過程の忠実な古典的因果シミュレーションには継承されないことを示す。シミュレーションにおいて、測定は、システムに部分リセットを行うことで、前の情報を消去する。これにより、測定装置は、測定対象の系からエントロピーを吸収する低温浴として機能する。情報消去は、Spekkensの準備文脈性の一形態である。我々の証明は、量子状態の最大無知が古典状態の最大無知と両立すると仮定すれば単純である。また、より弱い仮説も用いる。情報消去は、時間対称性が逆因果性を含意するというLeiferとPuseyの定理と関連している。以上の結果を踏まえて,Spekkensの準備文脈性と,Leifer と Pusey が定義した時間対称性仮説の弱点について考察した。 Quantum measurements generally introduce perturbations into the subsequent evolution of the measured system. Furthermore, a projective measurement cannot decrease the uncertainty on the system if the outcome is ignored; that is, the von Neumann entropy cannot decrease. However, under certain sound assumptions and using the quantum violation of Leggett-Garg inequalities, we demonstrate that this property is not inherited by a faithful classical causal simulation of a measurement process. In the simulation, a measurement erases previous information by performing a partial reset on the system. Thus, the measuring device acts as a low-temperature bath absorbing entropy from the measured system. Information erasure is a form of Spekkens' preparation contextuality. Our proof is straightforward if one assumes that maximal ignorance of the quantum state is compatible with maximal ignorance of the classical state. We also employ a weaker hypothesis. Information erasure is related to a theorem of Leifer and Pusey, which states that time symmetry implies retrocausality. In light of our findings, we discuss Spekkens' preparation contextuality, as well as a weakness in the hypothesis of time symmetry as defined by Leifer and Pusey.	翻訳日:2023-07-07 13:18:29 公開日:2023-07-06
# 画像分類における分布シフトに対するテスト時間適応ベンチマーク Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification ( http://arxiv.org/abs/2307.03133v1 ) ライセンス: Link先を確認	Yongcan Yu, Lijun Sheng, Ran He, Jian Liang	(参考訳) テスト時間適応(TTA)は、予測時にのみラベルのないサンプルを活用することにより、モデルの一般化性能を向上させる技術である。分布シフトに直面するニューラルネットワークシステムのロバスト性を考慮すると,近年,数多くのtta法が提案されている。しかしながら、これらの手法の評価は、分散シフトやバックボーン、シナリオの設計など、異なる設定の下で行われることが多いため、その効果を検証するための一貫性と公正なベンチマークが欠如している。そこで本研究では,CIFAR-10-C,CIFAR-100-C,ImageNet-C,DomainNet,Office-Homeの5つの画像分類データセットに対して,13のTTAメソッドとその変種を体系的に評価するベンチマークを提案する。これらの手法は、幅広い適応シナリオ(例えば、オンライン適応 v.s.オフライン適応、インスタンス適応 v.s.バッチ適応 v.s.ドメイン適応)をカバーする。さらに,ネットワークバックボーンの異なるTTA手法の互換性についても検討する。このベンチマークを実装するために、PyTorchで統一されたフレームワークを開発し、異なるデータセットとネットワークアーキテクチャにわたるTTAメソッドの一貫性のある評価と比較を可能にした。本ベンチマークの確立により、モデルロバスト性および一般化性能を向上させる上でのTTA手法の有効性を評価し、比較する信頼性の高い手段を研究者や実践者に提供することを目指している。私たちのコードはhttps://github.com/yuyongcan/Benchmark-TTAで利用可能です。 Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction. Given the need for robustness in neural network systems when faced with distribution shifts, numerous TTA methods have recently been proposed. However, evaluating these methods is often done under different settings, such as varying distribution shifts, backbones, and designing scenarios, leading to a lack of consistent and fair benchmarks to validate their effectiveness. To address this issue, we present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets: CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. These methods encompass a wide range of adaptation scenarios (e.g. online adaptation v.s. offline adaptation, instance adaptation v.s. batch adaptation v.s. domain adaptation). Furthermore, we explore the compatibility of different TTA methods with diverse network backbones. 
To implement this benchmark, we have developed a unified framework in PyTorch, which allows for consistent evaluation and comparison of the TTA methods across the different datasets and network architectures. By establishing this benchmark, we aim to provide researchers and practitioners with a reliable means of assessing and comparing the effectiveness of TTA methods in improving model robustness and generalization performance. Our code is available at https://github.com/yuyongcan/Benchmark-TTA.	翻訳日:2023-07-07 13:18:10 公開日:2023-07-06
# T-MARS:テキスト特徴学習による視覚表現の改善 T-MARS: Improving Visual Representations by Circumventing Text Feature Learning ( http://arxiv.org/abs/2307.03132v1 ) ライセンス: Link先を確認	Pratyush Maini, Sachin Goyal, Zachary C. Lipton, J. Zico Kolter, Aditi Raghunathan	(参考訳) 大規模なWebソースによるマルチモーダルデータセットは、汎用的な視覚表現の学習、コンピュータビジョンの最先端化、ゼロショットと少数ショットの認識の革新など、数多くの新しい手法を駆使した。実践者が直面する重要な決定の1つは、いかにして、いつまでも大きなデータセットをキュレートするかである。例えば、LAION-5Bデータセットの作成者は、CLIPの類似度スコアが指定された閾値を超えたイメージキャプチャペアのみを保持することを選んだ。本稿では,LAIONの画像の40%近くが字幕と重なるテキストを含んでいるという観察を動機とした,最新のデータフィルタリング手法を提案する。直感的には、このようなデータは視覚的特徴を学習するのではなく、光学的文字認識を行うモデルにインセンティブを与えるため、無駄になる可能性がある。しかし、視覚的特徴を含む画像を(重なり合うテキストに加えて)捨ててしまうため、こうしたデータを全て取り除くのは無駄になる可能性がある。私たちのシンプルでスケーラブルなアプローチであるT-MARS(Text Masking and Re-Scoring)は、テキストが残りの視覚的特徴を支配しているペアのみをフィルタリングします。実験的に、T-MARSは、DataCompの"medium scale"(データフィルタリングベンチマーク)において、ImageNetの6.5%、VTABの4.7%のマージンでトップランクの手法より優れている。さらに, 2M から 64M までのデータプールサイズを系統的に評価した結果,T-MARS による精度向上はデータや計算が指数関数的に大きくなるにつれて線形的に増加することが示された。コードはhttps://github.com/locuslab/T-MARSで入手できる。 Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only image-caption pairs whose CLIP similarity score exceeded a designated threshold. In this paper, we propose a new state-of-the-art data filtering approach motivated by our observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption. Intuitively, such data could be wasteful as it incentivizes models to perform optical character recognition rather than learning visual features. However, naively removing all such data could also be wasteful, as it throws away images that contain visual features (in addition to overlapping text). 
Our simple and scalable approach, T-MARS (Text Masking and Re-Scoring), filters out only those pairs where the text dominates the remaining visual features -- by first masking out the text and then filtering out those with a low CLIP similarity score of the masked image. Experimentally, T-MARS outperforms the top-ranked method on the "medium scale" of DataComp (a data filtering benchmark) by a margin of 6.5% on ImageNet and 4.7% on VTAB. Additionally, our systematic evaluation on various data pool sizes from 2M to 64M shows that the accuracy gains enjoyed by T-MARS linearly increase as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS.	翻訳日:2023-07-07 13:17:44 公開日:2023-07-06
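The mask-then-re-score filter described in the T-MARS abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `detect_text` and `clip_score` are hypothetical callables standing in for a real text detector and a real CLIP similarity model, images are toy grayscale grids, detected text is filled with the image mean, and the sketch assumes the original (unmasked) pair is what gets retained.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Image = List[List[float]]        # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def mask_text(image: Image, boxes: Sequence[Box]) -> Image:
    """Return a copy of the image with every text box filled by the image mean."""
    height, width = len(image), len(image[0])
    fill = sum(map(sum, image)) / (height * width)
    masked = [row[:] for row in image]
    for top, left, bottom, right in boxes:
        # Clip each box to the image bounds before filling it.
        for r in range(max(top, 0), min(bottom, height)):
            for c in range(max(left, 0), min(right, width)):
                masked[r][c] = fill
    return masked


def tmars_filter(
    pairs: Iterable[Tuple[Image, str]],
    detect_text: Callable[[Image], Sequence[Box]],  # hypothetical text detector
    clip_score: Callable[[Image, str], float],      # hypothetical CLIP similarity
    threshold: float,
) -> List[Tuple[Image, str]]:
    """Keep pairs whose masked image still matches the caption well enough."""
    kept = []
    for image, caption in pairs:
        masked = mask_text(image, detect_text(image))
        # Score the masked image, but retain the original pair (an assumption).
        if clip_score(masked, caption) >= threshold:
            kept.append((image, caption))
    return kept
```

With a real detector and a real CLIP model plugged in, only pairs whose remaining visual content still matches the caption survive the filter; the threshold is a free parameter here.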
# BLEURTにはユニバーサル翻訳がある:最小限のリスクトレーニングによる自動メトリクスの分析 BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training ( http://arxiv.org/abs/2307.03131v1 ) ライセンス: Link先を確認	Yiming Yan, Tao Wang, Chengqi Zhao, Shujian Huang, Jiajun Chen, Mingxuan Wang	(参考訳) 自動メトリクスは機械翻訳において重要な役割を果たす。 n-gramベースのメトリクスが広く使用されているにもかかわらず、文の意味論の計測に焦点を当てた事前学習されたモデルベースのメトリクスの開発が最近急増している。しかしながら、これらの神経メトリクスは、人間の評価と高い相関性を達成する一方で、検出が難しい潜在的なバイアスを持つブラックボックスと見なされることが多い。本研究では,機械翻訳システムの学習指導の観点から,各種の主流・最先端自動メトリクスを体系的に分析・比較する。最小リスクトレーニング(MRT)を通じて、BLEURTやBARTScoreに普遍的な敵対的翻訳が存在するなど、ある種の指標が堅牢性欠陥を示すことがわかった。詳細な分析からは、トレーニングデータセットにおける分散バイアスと、メトリックパラダイムの傾向の2つの大きな原因が示唆されている。トークンレベルの制約を取り入れることで,評価指標のロバスト性が向上し,機械翻訳システムの性能が向上する。コードは https://github.com/powerpuffpomelo/fairseq_mrt で入手できる。 Automatic metrics play a crucial role in machine translation. Despite the widespread use of n-gram-based metrics, there has been a recent surge in the development of pre-trained model-based metrics that focus on measuring sentence semantics. However, these neural metrics, while achieving higher correlations with human evaluations, are often considered to be black boxes with potential biases that are difficult to detect. In this study, we systematically analyze and compare various mainstream and cutting-edge automatic metrics from the perspective of their guidance for training machine translation systems. Through Minimum Risk Training (MRT), we find that certain metrics exhibit robustness defects, such as the presence of universal adversarial translations in BLEURT and BARTScore. In-depth analysis suggests two main causes of these robustness deficits: distribution biases in the training datasets, and the tendency of the metric paradigm. By incorporating token-level constraints, we enhance the robustness of evaluation metrics, which in turn leads to an improvement in the performance of machine translation systems. Code is available at https://github.com/powerpuffpomelo/fairseq_mrt.	翻訳日:2023-07-07 13:17:09 公開日:2023-07-06
# VisKoP:対話型知識ベース質問応答のための視覚的知識指向プログラミング VisKoP: Visual Knowledge oriented Programming for Interactive Knowledge Base Question Answering ( http://arxiv.org/abs/2307.03130v1 ) ライセンス: Link先を確認	Zijun Yao, Yuanyong Chen, Xin Lv, Shulin Cao, Amy Xin, Jifan Yu, Hailong Jin, Jianjun Xu, Peng Zhang, Lei Hou, Juanzi Li	(参考訳) 本稿では,人間をループに統合し,知識ベース(kb)クエリの編集とデバッグを行う,知識ベース質問応答(kbqa)システムであるviskopを提案する。 VisKoPは、自然言語質問を知識指向プログラム言語(KoPL)に変換するニューラルプログラム誘導モジュールを提供するだけでなく、KoPLプログラムをグラフィカル要素にマッピングする。 KoPLプログラムは、知識演算子を追加するドラッグや演算子引数を指定するスロットフィリングなど、単純なグラフィカル演算子で編集できる。さらに、VisKoPは知識ベーススキーマの自動補完を提供し、ユーザは中間結果をチェックすることで、簡単にKoPLプログラムをデバッグできる。 100万単位のKB上での実用的なKBQAを実現するために,バックエンド用の高効率なKoPL実行エンジンを設計する。実験結果から,VisKoPは高効率であり,ユーザインタラクションによって間違ったKoPLプログラムの大部分が修正され,正解が得られることがわかった。 viskop online demo https://demoviskop.xlore.cn (stable release of this paper)とhttps://viskop.xlore.cn (beta release with new features)、高効率kopl engine https://pypi.org/project/kopl-engine、スクリーンキャストビデオhttps://youtu.be/zabjtxfptxoが公開された。 We present Visual Knowledge oriented Programming platform (VisKoP), a knowledge base question answering (KBQA) system that integrates human into the loop to edit and debug the knowledge base (KB) queries. VisKoP not only provides a neural program induction module, which converts natural language questions into knowledge oriented program language (KoPL), but also maps KoPL programs into graphical elements. KoPL programs can be edited with simple graphical operators, such as dragging to add knowledge operators and slot filling to designate operator arguments. Moreover, VisKoP provides auto-completion for its knowledge base schema and users can easily debug the KoPL program by checking its intermediate results. To facilitate the practical KBQA on a million-entity-level KB, we design a highly efficient KoPL execution engine for the back-end. Experiment results show that VisKoP is highly efficient and user interaction can fix a large portion of wrong KoPL programs to acquire the correct answer. 
The VisKoP online demo (stable release for this paper: https://demoviskop.xlore.cn; beta release with new features: https://viskop.xlore.cn), the highly efficient KoPL engine (https://pypi.org/project/kopl-engine), and a screencast video (https://youtu.be/zAbJtxFPTXo) are now publicly available.	翻訳日:2023-07-07 13:16:50 公開日:2023-07-06
# 次元減少のための主サブバンドル Principal subbundles for dimension reduction ( http://arxiv.org/abs/2307.03128v1 ) ライセンス: Link先を確認	Morten Akh{\o}j, James Benn, Erlend Grong, Stefan Sommer, Xavier Pennec	(参考訳) 本稿では, 点雲の局所線型近似を組み合わせて低次元束を得ることにより, 多様体学習と表面再構成にサブリーマン幾何学をどのように利用できるかを示す。局所的pcasによって得られる局所近似は、主部分バンドル ( principal subbundle) と呼ばれる$\mathbb{r}^d$, $k<d$ 上の接部分バンドル (tangent subbundle) に集められる。これは$\mathbb{R}^d$ 上の部分リーマン計量を決定する。この距離に関する準リーマン測地学は、近似部分多様体 $m$ の明示的な構成、\mathbb{r}^k$ における点クラウドの表現の構成、観測間の距離の計算、学習された幾何学を考慮に入れるなど、いくつかの重要な問題にうまく適用できることが示されている。再構成は、接空間を正確に推定する極限の場合の真の部分多様体に等しいことが保証される。シミュレーションにより,ノイズの多いデータに適用した場合,フレームワークが堅牢であることを示す。さらに、このフレームワークは、既知リーマン多様体上の観測に一般化される。 In this paper we demonstrate how sub-Riemannian geometry can be used for manifold learning and surface reconstruction by combining local linear approximations of a point cloud to obtain lower dimensional bundles. Local approximations obtained by local PCAs are collected into a rank $k$ tangent subbundle on $\mathbb{R}^d$, $k<d$, which we call a principal subbundle. This determines a sub-Riemannian metric on $\mathbb{R}^d$. We show that sub-Riemannian geodesics with respect to this metric can successfully be applied to a number of important problems, such as: explicit construction of an approximating submanifold $M$, construction of a representation of the point-cloud in $\mathbb{R}^k$, and computation of distances between observations, taking the learned geometry into account. The reconstruction is guaranteed to equal the true submanifold in the limit case where tangent spaces are estimated exactly. Via simulations, we show that the framework is robust when applied to noisy data. Furthermore, the framework generalizes to observations on an a priori known Riemannian manifold.	翻訳日:2023-07-07 13:16:20 公開日:2023-07-06
# 実機会ネットワークのためのWiFiダイレクトグループのコンテキストアウェア構成と管理 Context-Aware Configuration and Management of WiFi Direct Groups for Real Opportunistic Networks ( http://arxiv.org/abs/2307.03126v1 ) ライセンス: Link先を確認	Valerio Arnaboldi, Mattia Giovanni Campana, Franca Delmastro	(参考訳) Wi-Fi Directは、商用モバイルデバイス上でデバイス間通信(D2D)をサポートするための有望な技術である。しかし、標準のas-it-isは、日和見的ネットワークのようなd2dに基づくネットワークソリューションの実際の展開をサポートするのに十分ではない。実際、WiFi Directは、ユーザのパーソナルデバイス間でのD2D接続の自律的生成を制限するいくつかの特徴を示している。具体的には、この標準は2つ以上のデバイス間の接続を確立するために、ユーザの承認を明示的に要求し、グループ間通信を限定的にサポートする。場合によっては、互いに通信できないノードの孤立したグループを作るのに繋がる場合もある。本稿では、WiFiダイレクトグループ(WFD-GM)の効率的な構成と管理のための新しいミドルウェア層プロトコルを提案し、自律的な接続とグループ間通信を実現する。これにより、実環境(例えば、可変モビリティとネットワークサイズ)における機会ネットワークが可能になる。 WFD-GMは、ノードの安定性と電力水準の指標を含む特定の時間窓において、最適なグループ構成を作成するための異種パラメータを考慮に入れたコンテキスト関数を定義する。異なるモビリティモデル,地理的領域,ノード数を含む3つの参照シナリオをシミュレートして,プロトコルの性能を評価する。シミュレーションはまた、関連するコンテキストパラメータの実際のテストベッドにおける評価に関する実験結果によっても支持される。我々はWFD-GMを最先端のソリューションと比較し、中低モビリティのシナリオではベースラインアプローチよりもはるかに優れた性能を示し、さらにオーバーヘッドを伴わずに高モビリティの場合と同等であることを示した。 Wi-Fi Direct is a promising technology for the support of device-to-device communications (D2D) on commercial mobile devices. However, the standard as-it-is is not sufficient to support the real deployment of networking solutions entirely based on D2D such as opportunistic networks. In fact, WiFi Direct presents some characteristics that could limit the autonomous creation of D2D connections among users' personal devices. Specifically, the standard explicitly requires the user's authorization to establish a connection between two or more devices, and it provides a limited support for inter-group communication. In some cases, this might lead to the creation of isolated groups of nodes which cannot communicate among each other. In this paper, we propose a novel middleware-layer protocol for the efficient configuration and management of WiFi Direct groups (WiFi Direct Group Manager, WFD-GM) to enable autonomous connections and inter-group communication. 
This enables opportunistic networks under real conditions (e.g., variable mobility and network size). WFD-GM defines a context function that combines heterogeneous parameters, including an index of node stability and power levels, to create the best group configuration for a specific time window. We evaluate the protocol's performance by simulating three reference scenarios with different mobility models, geographical areas, and numbers of nodes. The simulations are complemented by experimental results from a real testbed that evaluate the context parameters involved. We compare WFD-GM with state-of-the-art solutions and show that it performs significantly better than a baseline approach in scenarios with medium or low mobility and comparably in scenarios with high mobility, without introducing additional overhead.	翻訳日:2023-07-07 13:15:59 公開日:2023-07-06
# 大標準結晶構造の予測のためのアニーリング:n体原子間相互作用の効率的な実装 Annealing for prediction of grand canonical crystal structures: Efficient implementation of n-body atomic interactions ( http://arxiv.org/abs/2307.03123v1 ) ライセンス: Link先を確認	Yannick Couzinie, Yusuke Nishiya, Hirofumi Nishi, Taichi Kosugi, Yu-ichiro Matsushita	(参考訳) 本稿では, 一般的なn体原子間相互作用, 特に共有結合をシミュレートするために必要な3体相互作用を考慮した, 結晶構造予測(CSP)のためのアニーリング手法を提案する。結晶構造は、実空間をメッシュで離散化し、各格子点上の原子の存在または非存在を表す二値変数を配置することで表される。二次非拘束二元最適化(QUBO)または高次非拘束二元最適化(HUBO)問題においてn体原子相互作用を実装し,シミュレーテッドアニーリングによりCSPを行う。本研究では,MoS2結晶のHUBO定式化において,3体相互作用を実装するために必要なビット数を削減することに成功した。さらに, 粒子密度と結晶構造をシミュレーテッドアニーリングを用いて同時に最適化できることを示すことにより, グランドカノニカルシミュレーションが可能であることがわかった。特に、希ガス、すなわちレナード・ジョーンズ(LJ)固体にCSPを適用することで、グランドカノニカル計算がマイクロカノニカル計算よりも求解時間のスケーリングに優れていることを示す。 We propose an annealing scheme for crystal structure prediction (CSP) that takes into account general n-body atomic interactions, and in particular the three-body interactions that are necessary to simulate covalent bonds. The crystal structure is represented by discretizing real space with a mesh and placing binary variables, which express the existence or non-existence of an atom, on every grid point. We implement n-body atomic interactions in quadratic unconstrained binary optimization (QUBO) or higher-order unconstrained binary optimization (HUBO) problems and perform CSP by simulated annealing. In this study we successfully reduce the number of bits necessary to implement three-body interactions within the HUBO formulation of MoS2 crystals. Further, we find that grand canonical simulation is possible by showing that we can simultaneously optimize for the particle density as well as the crystal structure using simulated annealing. In particular, we apply CSP to noble gases, i.e. Lennard-Jones (LJ) solids, and show that the grand canonical calculation has a better time-to-solution scaling than its microcanonical counterpart.	翻訳日:2023-07-07 13:15:32 公開日:2023-07-06
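To make the binary-variable encoding in the abstract above concrete, here is a minimal toy sketch, not the authors' code. It assumes a one-dimensional mesh, pairwise Lennard-Jones interactions only (no three-body HUBO terms), and a grand canonical objective of the form E(x) = Σ_{i<j} V(r_ij) x_i x_j − μ Σ_i x_i, where x_i ∈ {0, 1} marks an occupied grid point. The sign convention for μ, the geometric cooling schedule, and all function names are assumptions made for illustration.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Tuple


def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Pair energy 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def build_qubo(positions: List[float], mu: float):
    """Linear term -mu per occupied site; quadratic term = LJ energy per site pair."""
    linear = [-mu] * len(positions)
    quadratic: Dict[Tuple[int, int], float] = {
        (i, j): lennard_jones(abs(positions[i] - positions[j]))
        for i, j in combinations(range(len(positions)), 2)
    }
    return linear, quadratic


def energy(x: List[int], linear, quadratic) -> float:
    """Evaluate the QUBO objective for an occupation vector x."""
    e = sum(h * xi for h, xi in zip(linear, x))
    e += sum(w * x[i] * x[j] for (i, j), w in quadratic.items())
    return e


def anneal(linear, quadratic, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Single-bit-flip simulated annealing; returns the best state seen."""
    rng = random.Random(seed)
    n = len(linear)
    x = [0] * n  # start from the empty lattice
    e = energy(x, linear, quadratic)
    best_x, best_e = x[:], e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        x[i] ^= 1  # propose adding or removing one atom
        e_new = energy(x, linear, quadratic)
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
            if e < best_e:
                best_x, best_e = x[:], e
        else:
            x[i] ^= 1  # reject: undo the flip
    return best_x, best_e
```

Because each bit flip adds or removes an atom, the particle number is optimized together with the arrangement, which is the grand canonical idea in miniature. Under the assumed sign convention, a strongly negative μ leaves the lattice empty, while μ = 0 on a mesh spaced at the LJ minimum fills it.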
# 言語モデルから多値関係を抽出する Extracting Multi-valued Relations from Language Models ( http://arxiv.org/abs/2307.03122v1 ) ライセンス: Link先を確認	Sneha Singhania, Simon Razniewski, Gerhard Weikum	(参考訳) 事前学習言語モデル(lms)による潜在言語表現の普及は、それらが構造化知識の有望な源であることを示唆している。しかし、既存の手法では、複数のオブジェクトが正しい場合が多いにもかかわらず、対象-関係ペア当たりの1つのオブジェクトにのみフォーカスする。この制限を克服するために、我々はこれらの表現を分析して、物質化された多目的関係知識を得る。我々はこの問題をランク選択タスクとして定式化する。候補オブジェクトのランク付けには,既存のプロンプト技術を評価し,ドメイン知識を取り入れた新しい手法を提案する。選択法のうち、学習された関係性特異しきい値よりも高い確率で対象を選択すると、49.5%のF1スコアが得られる。本研究は,多値スロット充足作業におけるlmsの活用の難しさを浮き彫りにし,潜在言語表現から関係知識を抽出するためのさらなる研究の道を開く。 The widespread usage of latent language representations via pre-trained language models (LMs) suggests that they are a promising source of structured knowledge. However, existing methods focus only on a single object per subject-relation pair, even though often multiple objects are correct. To overcome this limitation, we analyze these representations for their potential to yield materialized multi-object relational knowledge. We formulate the problem as a rank-then-select task. For ranking candidate objects, we evaluate existing prompting techniques and propose new ones incorporating domain knowledge. Among the selection methods, we find that choosing objects with a likelihood above a learned relation-specific threshold gives a 49.5% F1 score. Our results highlight the difficulty of employing LMs for the multi-valued slot-filling task and pave the way for further research on extracting relational knowledge from latent language representations.	翻訳日:2023-07-07 13:15:08 公開日:2023-07-06
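The rank-then-select idea in the abstract above, with a learned relation-specific threshold, can be sketched as below. This is a hedged illustration rather than the paper's method: it assumes the candidate objects and their LM likelihoods are already available, treats "above the threshold" as `>=`, and learns the threshold by a simple grid search that maximizes mean set-level F1 on held-out examples of a single relation. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Scored = List[Tuple[str, float]]  # (candidate object, LM likelihood)


def select_objects(candidates: Scored, threshold: float) -> Set[str]:
    """Keep every candidate whose likelihood is at least the threshold."""
    return {obj for obj, prob in candidates if prob >= threshold}


def f1(predicted: Set[str], gold: Set[str]) -> float:
    """Set-level F1; two empty sets count as a perfect match."""
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def learn_threshold(dev: List[Tuple[Scored, Set[str]]]) -> float:
    """Pick the threshold that maximizes mean F1 on dev examples of ONE relation."""
    # Only observed likelihoods can change the selection, so they serve as the grid.
    grid = sorted({prob for cands, _ in dev for _, prob in cands})
    best_t, best_score = 1.0, -1.0
    for t in grid:
        score = sum(f1(select_objects(c, t), g) for c, g in dev) / len(dev)
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

In use, one threshold would be learned per relation and then applied to the ranked candidates of unseen subjects of that relation.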
# 金融における最適マルチオーダー実行のためのマルチエージェント意図認識コミュニケーションの学習 Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance ( http://arxiv.org/abs/2307.03119v1 ) ライセンス: Link先を確認	Yuchen Fang, Zhenggang Tang, Kan Ren, Weiqing Liu, Li Zhao, Jiang Bian, Dongsheng Li, Weinan Zhang, Yong Yu, Tie-Yan Liu	(参考訳) 注文実行は、特定の資産の取引注文の取得または清算を完了することを目的とした、量的金融の基本的なタスクである。モデルフリー強化学習(RL)の最近の進歩は、注文実行問題に対するデータ駆動型ソリューションを提供する。しかしながら、既存の作業は常に個々の順序の実行を最適化し、複数の順序が同時に実行されるように指定されているプラクティスを見越して、亜最適性とバイアスをもたらす。本稿では,まず,現実的な制約を考慮したマルチオーダー実行のためのマルチエージェントRL(MARL)手法を提案する。具体的には、すべてのエージェントを個々のオペレータとして扱い、互いにコミュニケーションを保ちながら、全体の利益を最大化するために協力します。それにもかかわらず、既存のmarlアルゴリズムは、複雑な金融市場では非効率である部分的観測に関する情報のみを交換することで、エージェント間のコミュニケーションを組み込むことが多い。協調性を向上させるために,学習可能なマルチラウンド通信プロトコルを提案する。元の学習目標と確実に一致するが、より効率的である新規な行動値帰属法によって最適化される。実世界の2つの市場におけるデータを用いた実験により,本手法によるコラボレーションの有効性が著しく向上した。 Order execution is a fundamental task in quantitative finance, aiming at finishing acquisition or liquidation for a number of trading orders of the specific assets. Recent advance in model-free reinforcement learning (RL) provides a data-driven solution to the order execution problem. However, the existing works always optimize execution for an individual order, overlooking the practice that multiple orders are specified to execute simultaneously, resulting in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints. Specifically, we treat every agent as an individual operator to trade one specific order, while keeping communicating with each other and collaborating for maximizing the overall profits. Nevertheless, the existing MARL algorithms often incorporate communication among agents by exchanging only the information of their partial observations, which is inefficient in complicated financial market. 
To improve collaboration, we then propose a learnable multi-round communication protocol in which the agents communicate their intended actions to each other and refine them accordingly. It is optimized through a novel action value attribution method that is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets show superior performance, with significantly more effective collaboration achieved by our method.	翻訳日:2023-07-07 13:14:54 公開日:2023-07-06
# 消去検出論理測定による超伝導二重レール空洞量子ビットの実証 Demonstrating a superconducting dual-rail cavity qubit with erasure-detected logical measurements ( http://arxiv.org/abs/2307.03169v1 ) ライセンス: Link先を確認	Kevin S. Chou, Tali Shemma, Heather McCarrick, Tzu-Chiao Chien, James D. Teoh, Patrick Winkel, Amos Anderson, Jonathan Chen, Jacob Curtis, Stijn J. de Graaf, John W. O. Garmon, Benjamin Gudlewski, William D. Kalfus, Trevor Keen, Nishaad Khedkar, Chan U Lei, Gangqiang Liu, Pinlei Lu, Yao Lu, Aniket Maiti, Luke Mastalli-Kelly, Nitish Mehta, Shantanu O. Mundhada, Anirudh Narla, Taewan Noh, Takahiro Tsunoda, Sophia H. Xue, Joseph O. Yuan, Luigi Frunzio, Jose Aumentado, Shruti Puri, Steven M. Girvin, S. Harvey Moseley, Jr., Robert J. Schoelkopf	(参考訳) スケーラブルな誤り訂正量子システムを開発する上で重要な課題は、操作と測定をしながらエラーの蓄積である。有望なアプローチの1つは、エラーを検出して消去できるシステムを設計することである。最近の提案では、超伝導キャビティを用いたデュアルレール符号化を目標としている。本研究では,このような二重レールキャビティ量子ビットを実装し,消去検出を伴う投影的論理計測の実証を行う。論理状態の生成と測定誤差を$0.01\%$レベルで測定し,$99\%$以上の空洞崩壊事象を消去として検出する。新しい測定プロトコルの精度を使って、このシステムにおける異なる種類のエラーを識別し、崩壊エラーは確率$\sim 0.2\%$ perマイクロ秒で発生するが、位相エラーは6倍少なく、ビットフリップは少なくとも170倍少なくなることを発見した。これらの結果は,2重レール消去量子ビットを高効率な消去符号に結合するために必要な誤差階層を初めて確認したことを示す。 A critical challenge in developing scalable error-corrected quantum systems is the accumulation of errors while performing operations and measurements. One promising approach is to design a system where errors can be detected and converted into erasures. A recent proposal aims to do this using a dual-rail encoding with superconducting cavities. In this work, we implement such a dual-rail cavity qubit and use it to demonstrate a projective logical measurement with erasure detection. We measure logical state preparation and measurement errors at the $0.01\%$-level and detect over $99\%$ of cavity decay events as erasures. 
We use the precision of this new measurement protocol to distinguish different types of errors in this system, finding that while decay errors occur with probability $\sim 0.2\%$ per microsecond, phase errors occur 6 times less frequently and bit flips occur at least 170 times less frequently. These findings represent the first confirmation of the expected error hierarchy necessary to concatenate dual-rail erasure qubits into a highly efficient erasure code.	翻訳日:2023-07-07 13:09:10 公開日:2023-07-06
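The error hierarchy quoted in the abstract above can be made concrete with a small worked calculation. This is only arithmetic on the rounded figures in the abstract (about 0.2% decay probability per microsecond, phase errors 6 times rarer, bit flips at least 170 times rarer). The variable names are illustrative, and the bit-flip figure is an upper bound because the abstract states "at least".

```python
# Approximate per-microsecond error probabilities implied by the abstract.
decay_per_us = 0.2e-2                     # ~0.2% per microsecond (rounded figure)
phase_per_us = decay_per_us / 6           # phase errors: 6 times less frequent
bitflip_upper_per_us = decay_per_us / 170 # bit flips: at least 170 times less frequent

print(f"decay    ~ {decay_per_us:.2e} per us")
print(f"phase    ~ {phase_per_us:.2e} per us")
print(f"bit flip <= {bitflip_upper_per_us:.2e} per us")
```

Under these rounded inputs, phase errors come out near 0.03% per microsecond and bit flips at or below roughly 0.001% per microsecond, which is the ordering (erasures dominant, then phase errors, then bit flips) that the abstract refers to as the expected hierarchy.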
# VideoGLUE: 基礎モデルの総合的評価 VideoGLUE: Video General Understanding Evaluation of Foundation Models ( http://arxiv.org/abs/2307.03166v1 ) ライセンス: Link先を確認	Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong	(参考訳) 本研究では,3つのホールマークタスク(動作認識,時間的局所化,時空間的局所化),コミュニティが受け取りやすい8つのデータセット,下流タスクのための基盤モデル(fm)を調整した4つの適応手法を用いて,既存の基礎モデルビデオ理解能力を評価した。さらに,一般的な映像理解タスクに適応する際のfmsの有効性と効率を測定するためのスカラービデオグルスコア(vgs)を提案する。主な発見は以下の通りである。第一に、タスク特化モデルは、自然言語や画像理解においてFMが達成したものとは対照的に、本研究で研究した6つのFMよりも著しく優れている。第2に、動画モダリティを含む事前トレーニングデータを持つビデオネイティブfmsは、モーションリッチビデオの分類、時間内のアクションのローカライズ、複数のアクションのビデオの理解において、画像ネイティブfmsよりも一般的に優れている。第3に、ビデオネイティブFMは、ダウンストリームタスク(例えば、FMバックボーンの凍結)に光順応したビデオタスクでうまく機能し、画像ネイティブFMは、完全なエンドツーエンドの微調整で勝利する。最初の2つの観察により、ビデオ中心のfmsの研究を行う必要性と膨大な機会が明らかとなり、最後に、fmsの評価に関してタスクと適応方法の両方が重要であることが確認された。 We evaluate the video understanding capabilities of existing foundation models using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task. Moreover, we propose a scalar VideoGLUE score (VGS) to measure an FM's efficacy and efficiency when adapting to general video understanding tasks. Our main findings are as follows. First, task-specialized models significantly outperform the six FMs studied in this work, in sharp contrast to what FMs have achieved in natural language and image understanding. Second, video-native FMs, whose pretraining data contains the video modality, are generally better than image-native FMs in classifying motion-rich videos, localizing actions in time, and understanding a video of more than one action. 
Third, the video-native FMs can perform well on video tasks under light adaptations to downstream tasks (e.g., freezing the FM backbones), while image-native FMs win in full end-to-end finetuning. The first two observations reveal the need and tremendous opportunities to conduct research on video-focused FMs, and the last confirms that both tasks and adaptation methods matter when it comes to the evaluation of FMs.	翻訳日:2023-07-07 13:08:50 公開日:2023-07-06
# BrickPal: Brickモデルのための拡張現実ベースのアセンブリ命令 BrickPal: Augmented Reality-based Assembly Instructions for Brick Models ( http://arxiv.org/abs/2307.03162v1 ) ライセンス: Link先を確認	Yao Shi, Xiaofeng Zhang, Ran zhang, Zhou Yang, Xiao Tang, Hongni Ye, Yi Wu	(参考訳) 本研究は,自然言語処理(nlp)技術を用いて実現可能なアセンブリシーケンスを生成し,arヘッドセットにおけるリアルタイムガイダンスを提供する。さらに、nlpアルゴリズムが生成するアセンブリシーケンスは、手動で適応したシーケンスで同じユーザビリティを実現する。 The assembly instruction is a mandatory component of Lego-like brick sets. The conventional production of assembly instructions requires a considerable amount of manual fine-tuning, which is intractable for casual users and customized brick sets. Moreover, the traditional paper-based instructions lack expressiveness and interactivity. To tackle the two problems above, we present BrickPal, an augmented reality-based system, which visualizes assembly instructions in an augmented reality head-mounted display. It utilizes Natural Language Processing (NLP) techniques to generate plausible assembly sequences, and provides real-time guidance in the AR headset. Our user study demonstrates BrickPal's effectiveness at assisting users in brick assembly compared to traditional assembly methods. Additionally, the NLP algorithm-generated assembly sequences achieve the same usability as manually adapted sequences.	翻訳日:2023-07-07 13:08:22 公開日:2023-07-06
# ドメイン適応は皮膚病変分類の精度と公平性を改善するか? Can Domain Adaptation Improve Accuracy and Fairness of Skin Lesion Classification? ( http://arxiv.org/abs/2307.03157v1 ) ライセンス: Link先を確認	Janet Wang, Yunbei Zhang, Zhengming Ding, Jihun Hamm	(参考訳) 深層学習に基づく診断システムは、ラベル付きトレーニング例が豊富にある皮膚がんの病態を分類する可能性を示している。しかし、皮膚の病変解析はしばしばラベル付きデータの不足に悩まされ、正確で信頼性の高い診断システムの開発を妨げる。本研究は,複数の皮膚病変データセットを活用し,非教師なし領域適応法(UDA)の2値および多値の皮膚病変分類への応用について検討する。特に,シングル,コンバインド,マルチソースの3つのudaトレーニングスキームを評価した。実験の結果,UDAは二分分類に有効であり,不均衡が緩和された場合にはさらなる改善が見られた。多クラスタスクでは、その性能はさほど目立たず、また、上記のベースライン精度を達成するために不均衡の問題に対処する必要がある。定量的解析により,マルチクラスタスクのテストエラーはラベルシフトと強く相関し,機能レベルのudaメソッドには不均衡データセットを扱う際の制限があることが分かった。最後に,本研究では,少数派に対する偏見を効果的に低減し,公平性を重視したテクニックを明示的に用いなくても公平性を促進できることを示した。 Deep learning-based diagnostic systems have demonstrated potential in classifying skin cancer conditions when labeled training examples are abundant. However, skin lesion analysis often suffers from a scarcity of labeled data, hindering the development of an accurate and reliable diagnostic system. In this work, we leverage multiple skin lesion datasets and investigate the feasibility of various unsupervised domain adaptation (UDA) methods in binary and multi-class skin lesion classification. In particular, we assess three UDA training schemes: single-, combined-, and multi-source. Our experimental results show that UDA is effective in binary classification, with further improvement being observed when imbalance is mitigated. In the multi-class task, its performance is less prominent, and the imbalance problem again needs to be addressed to achieve above-baseline accuracy. Through our quantitative analysis, we find that the test error of multi-class tasks is strongly correlated with label shift, and feature-level UDA methods have limitations when handling imbalanced datasets. Finally, our study reveals that UDA can effectively reduce bias against minority groups and promote fairness, even without the explicit use of fairness-focused techniques.	翻訳日:2023-07-07 13:08:11 公開日:2023-07-06
# MultiVENT: 自然文を付加したイベントの多言語ビデオ MultiVENT: Multilingual Videos of Events with Aligned Natural Text ( http://arxiv.org/abs/2307.03153v1 ) ライセンス: Link先を確認	Kate Sanders, David Etter, Reno Kriz, Benjamin Van Durme	(参考訳) ニュースの報道は、従来の放送から、手書きで未編集のビデオ映像など、幅広いプレゼンテーション形式に移行している。オンラインで利用可能な多言語多言語ニュースソースの多種多様な配列を反映したデータセットは、このシフトの恩恵を受けるモデルを教えるのに使用できるが、既存のニュースビデオデータセットは、英語話者向けの伝統的なニュースブロードキャストに焦点を当てている。この制限に対処するため、5つのターゲット言語にまたがるテキスト文書に基づく多言語・イベント中心ビデオのデータセットであるMultiVENTを構築した。 MultiVENTには、ニュースブロードキャストビデオとプロでないイベント映像の両方が含まれており、オンラインニュースビデオの状態を分析し、それらを利用して、堅牢で事実的に正確なモデルを構築することができる。最後に,MultiVENTを用いた情報検索のベースラインとして,複雑な多言語ビデオ検索のためのモデルを提案する。 Everyday news coverage has shifted from traditional broadcasts towards a wide range of presentation formats such as first-hand, unedited video footage. Datasets that reflect the diverse array of multimodal, multilingual news sources available online could be used to teach models to benefit from this shift, but existing news video datasets focus on traditional news broadcasts produced for English-speaking audiences. We address this limitation by constructing MultiVENT, a dataset of multilingual, event-centric videos grounded in text documents across five target languages. MultiVENT includes both news broadcast videos and non-professional event footage, which we use to analyze the state of online news videos and how they can be leveraged to build robust, factually accurate models. Finally, we provide a model for complex, multilingual video retrieval to serve as a baseline for information retrieval using MultiVENT.	翻訳日:2023-07-07 13:07:53 公開日:2023-07-06
# 共有モビリティによるアクセシビリティの計算について On the Computation of Accessibility Provided by Shared Mobility ( http://arxiv.org/abs/2307.03148v1 ) ライセンス: Link先を確認	Severin Diepolder, Andrea Araldo, Tarek Chouaki, Santa Maiti, Sebastian Hörl, Costantinos Antoniou	(参考訳) シェアード・モビリティ・サービス(SMS)、例えばデマンド・レスポンシブ・トランジット(DRT)やライドシェアリングは、低密度領域におけるモビリティを改善することができる。このような改善は、主に待ち時間や旅行時間といった基本的なパフォーマンス指標によって定量化される。しかし、アクセシビリティ指標は、周囲の機会(例えば、仕事、学校、店など)にたどり着くことの容易さを測定することで、より包括的な指標となる。現在、経験的測定に基づいてSMSのアクセシビリティを定量化する方法は存在しない。実際、アクセシビリティは一般的にptネットワークのグラフ表現で計算されるが、smsは動的であり、事前定義されたネットワークに従わない。本研究では,ptのフィーダとして作用するsmsの入力観測トリップをグラフにまとめた空間-時間統計手法を提案する。このようなグラフでは、古典的なアクセシビリティ指標を計算する。本手法をパリ・サクレーにおけるDRTに関するMATSimシミュレーション研究に適用する。 Shared Mobility Services (SMS), e.g., Demand-Responsive Transit (DRT) or ride-sharing, can improve mobility in low-density areas, often poorly served by conventional Public Transport (PT). Such improvement is mostly quantified via basic performance indicators, like wait or travel time. However, accessibility indicators, measuring the ease of reaching surrounding opportunities (e.g., jobs, schools, shops, ...), would be a more comprehensive indicator. To date, no method exists to quantify the accessibility of SMS based on empirical measurements. Indeed, accessibility is generally computed on graph representations of PT networks, but SMS are dynamic and do not follow a predefined network. We propose a spatial-temporal statistical method that takes as input observed trips of an SMS acting as a feeder for PT and summarizes such trips in a graph. On such a graph, we compute classic accessibility indicators. We apply our method to a MATSim simulation study concerning DRT in Paris-Saclay.	翻訳日:2023-07-07 13:07:37 公開日:2023-07-06
# 双対ユニタリティの階層的一般化 Hierarchical generalization of dual unitarity ( http://arxiv.org/abs/2307.03138v1 ) ライセンス: Link先を確認	Xie-Hang Yu, Zhiyuan Wang and Pavel Kos	(参考訳) 格子モデルにおける局所的な相互作用を伴う量子力学は、リッチな物理学を示すが、研究は困難である。二重単位回路は、1次元または高次元の量子系における興味深い物理問題に対する正確な答えを可能にする。しかし、このモデル群は、光円錐内における相関の消失や、局所的な可観測物の瞬時熱化など、普遍的な特徴を示す。本研究では, 正確な計算可能な空間-時間相関関数がよりリッチな振る舞いを示し, 局所観測可能な非自明な熱化を持つデュアルユニタリ回路の一般化を提案する。これは、単一ゲート条件をマルチゲート条件の階層に一般化することで実現され、第1レベルがデュアルユニタリモデルを復元し、第2レベルがこれら新しい興味深い特徴を示す。また、議論を拡張して、わずかなサイトオブザーバブルを持つコリエータに正確なソリューションを提供し、量子クエンチ後のものを含む高階について議論する。さらに、量子ビットの場合の徹底的なパラメトリゼーションを提供し、また、2より大きい局所次元のモデルの新しいファミリーを提案し、また二元単位モデルの新しいファミリーを提供する。 Quantum dynamics with local interactions in lattice models display rich physics, but is notoriously hard to study. Dual-unitary circuits allow for exact answers to interesting physical questions in clean or disordered one- and higher-dimensional quantum systems. However, this family of models shows some non-universal features, like vanishing correlations inside the light-cone and instantaneous thermalization of local observables. In this work we propose a generalization of dual-unitary circuits where the exactly calculable spatial-temporal correlation functions display richer behavior, and have non-trivial thermalization of local observables. This is achieved by generalizing the single-gate condition to a hierarchy of multi-gate conditions, where the first level recovers dual-unitary models, and the second level exhibits these new interesting features. We also extend the discussion and provide exact solutions to correlators with few-site observables and discuss higher-orders, including the ones after a quantum quench. In addition, we provide exhaustive parametrizations for qubit cases, and propose a new family of models for local dimensions larger than two, which also provides a new family of dual-unitary models.	翻訳日:2023-07-07 13:07:18 公開日:2023-07-06
# ct画像における大動脈および大血管分節のトポロジー認識損失 Topology-Aware Loss for Aorta and Great Vessel Segmentation in Computed Tomography Images ( http://arxiv.org/abs/2307.03137v1 ) ライセンス: Link先を確認	Seher Ozcelik, Sinan Unver, Ilke Ali Gurses, Rustu Turkay, and Cigdem Gunduz-Demir	(参考訳) セグメンテーションネットワークは、標準的な損失関数で訓練された場合、オブジェクトの形状や複数のオブジェクト間の幾何など、画像のグローバル不変性を学ぶために明示的に強制されない。一方,このような不変性をネットワークトレーニングに組み込むことで,分割対象の固有特性である様々なセグメンテーションタスクの性能を向上させることができる。例えば、CT画像における大動脈と大血管の分節化では、人間の解剖学により体内の特定の形状に血管が見出され、2次元CT画像上の丸い物体のように見える。本稿では, 基底的真理と持続的ホモロジーによる予測とのトポロジの相違を罰する新たなトポロジ認識損失関数を導入することにより, この問題に対処する。予測写像の確率関数と基底真理のベッチ数にしきい値濾過を適用した従来提案されていた分節ネットワーク設計とは違って, ヴィトリス・リップス濾過を適用し, 基底真理と予測写像の持続性図を取得し, 対応する持続性図間のワッサースタイン距離との差を計算することを提案する。この濾過を用いると、形状と形状を同時にモデル化する利点があるが、しきい値濾過が適用されるとは起こり得ない。 24名の被験者の4327ct画像を用いた実験により,提案するトポロジー認識損失関数が,提案手法よりも優れた結果をもたらすことが明らかとなった。 Segmentation networks are not explicitly imposed to learn global invariants of an image, such as the shape of an object and the geometry between multiple objects, when they are trained with a standard loss function. On the other hand, incorporating such invariants into network training may help improve performance for various segmentation tasks when they are the intrinsic characteristics of the objects to be segmented. One example is segmentation of aorta and great vessels in computed tomography (CT) images where vessels are found in a particular geometry in the body due to the human anatomy and they mostly seem as round objects on a 2D CT image. This paper addresses this issue by introducing a new topology-aware loss function that penalizes topology dissimilarities between the ground truth and prediction through persistent homology. 
Different from the previously suggested segmentation network designs, which apply the threshold filtration on a likelihood function of the prediction map and the Betti numbers of the ground truth, this paper proposes to apply the Vietoris-Rips filtration to obtain persistence diagrams of both ground truth and prediction maps and calculate the dissimilarity with the Wasserstein distance between the corresponding persistence diagrams. The use of this filtration has the advantage of modeling shape and geometry at the same time, which may not happen when the threshold filtration is applied. Our experiments on 4327 CT images of 24 subjects reveal that the proposed topology-aware loss function leads to better results than its counterparts, indicating the effectiveness of this use.	翻訳日:2023-07-07 13:06:57 公開日:2023-07-06
# 対称円錐上のオンライン凸最適化のための乗法的更新 Multiplicative Updates for Online Convex Optimization over Symmetric Cones ( http://arxiv.org/abs/2307.03136v1 ) ライセンス: Link先を確認	Ilayda Canyakmaz, Wayne Lin, Georgios Piliouras, Antonios Varvitsiotis	(参考訳) オンライン凸最適化(オンライン凸最適化)について検討し、可能なアクションは対称円錐内のトレース1要素であり、広く研究されている専門家のセットアップとその量子対応を一般化する。対称円錐は、線形、二階錐、半定値最適化を含むいくつかの重要な最適化モデルの統一フレームワークを提供する。ユークリッドジョルダン代数の分野のツールを用いて、任意の対称錐のトレースワンスライス上でのオンライン最適化のための投影なしアルゴリズムであるscmwu(symmetric-cone multiplicative weights update)を導入する。 SCMWUは, 対称錐負エントロピーを正則化器とするFollow-the-Regularized-LeaderおよびOnline Mirror Descentと等価であることを示す。この構造的結果を用いて、scmwuは非回帰アルゴリズムであり、広範な実験により理論結果を検証する。本研究では,確率的単純度を用いた乗法重み更新法と,密度行列の集合上の行列乗法重み更新法を統合し,一般化する。 We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices.	翻訳日:2023-07-07 13:06:27 公開日:2023-07-06
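The abstract above states that SCMWU generalizes the Multiplicative Weights Update method over the probability simplex. As a reference point, that classical special case can be sketched as below in its exponential-weights (Hedge) form. This is not the SCMWU algorithm itself, which operates on general symmetric cones via Euclidean Jordan algebras. The function names, the step size `eta`, and the loss sequence are illustrative assumptions.

```python
import math
from typing import List


def mwu_step(weights: List[float], losses: List[float], eta: float) -> List[float]:
    """One multiplicative update: scale each weight by exp(-eta * loss), then renormalize."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    # Renormalizing keeps the iterate on the probability simplex (no projection needed).
    return [s / total for s in scaled]


def run_mwu(loss_rounds: List[List[float]], eta: float = 0.5) -> List[float]:
    """Start from the uniform distribution and apply one update per round of losses."""
    n = len(loss_rounds[0])
    p = [1.0 / n] * n
    for losses in loss_rounds:
        p = mwu_step(p, losses, eta)
    return p


# Hypothetical losses for three experts, repeated for 20 rounds; expert 1 never incurs loss.
rounds = [[1.0, 0.0, 0.5]] * 20
final = run_mwu(rounds)
print(final)
```

The normalization step plays the role of staying on the trace-one slice: on the simplex the "trace" is simply the sum of the weights, which is why no separate projection is required.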
# アウト・オブ・ディストリビューション・ジェネリザビリティを持つ大規模視覚言語モデルの蒸留 Distilling Large Vision-Language Model with Out-of-Distribution Generalizability ( http://arxiv.org/abs/2307.03135v1 ) ライセンス: Link先を確認	Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su	(参考訳) 大きなビジョン言語モデルは優れた性能を達成しているが、そのサイズと計算要件により、リソースに制約のあるデバイスや時間に敏感なタスクへのデプロイは現実的ではない。モデル蒸留は、より大きなモデルの性能を維持する、より小さくより高速なモデルを作成するプロセスであり、ソリューションに向けた有望な方向である。本稿では,大規模教師の視覚モデルから軽度学生モデルへの視覚表現の蒸留について,小規模または中規模データセットを用いて検討する。本研究は,従来モデル蒸留の文献では見過ごされてきた課題であるオープン・ボキャブラリー・アウト・オブ・ディストリビューション(ood)の一般化に焦点を当てたものである。 1) 教師の視覚表現空間を模倣し, 教師との視覚・言語連携を慎重に促進すること, (2) 教師の言語表現を情報的かつ細かな意味的属性で豊かにすることで, 異なるラベルを効果的に区別することである。我々は,いくつかの指標を提案し,その手法を検討するために広範囲な実験を行う。その結果,オープン・ボカブラリー・アウト・オブ・ディストリビューション分類におけるゼロショットと少数ショットの学生成績が有意に改善し,提案手法の有効性が示された。私たちのコードはhttps://github.com/xuanlinli17/large_vlm_distillation_oodでリリースされる。 Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance student's OOD generalization: (1) by better imitating teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher's language representations with informative and finegrained semantic attributes to effectively distinguish between different labels. 
We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Our code will be released at https://github.com/xuanlinli17/large_vlm_distillation_ood	翻訳日:2023-07-07 13:06:04 公開日:2023-07-06
# テキストからのアートシネマグラフの合成 Synthesizing Artistic Cinemagraphs from Text ( http://arxiv.org/abs/2307.03190v1 ) ライセンス: Link先を確認	Aniruddha Mahapatra, Aliaksandr Siarohin, Hsin-Ying Lee, Sergey Tulyakov, Jun-Yan Zhu	(参考訳) 本稿では,これらの画像の意味や動きを複雑に解釈することを考えると,特徴的想像的要素や芸術的スタイルを推し進める上で,特に困難な作業である,テキスト記述からシネマグラフを作成する完全自動化手法であるArttic Cinemagraphを紹介する。既存の単一画像アニメーション手法は芸術的な入力に不足しており、最近のテキストベースのビデオ手法は時間的不整合をしばしば導入し、特定の領域を静的に保つのに苦労している。これらの課題に対処するために,1つのテキストプロンプトから画像双生児を合成する手法を提案する。芸術的なイメージはテキストに詳述されたスタイルや外観を描写するが、リアルなイメージはレイアウトや動きの分析を大幅に単純化する。既存の自然画像と映像データセットを利用して、現実のイメージを正確に分割し、その意味情報に基づいて、妥当な動きを予測できる。予測された動きは芸術的イメージに転送され、最終的なシネマグラフが作成される。本手法は,自然景観のシネマグラフ作成における既存の手法と,自動計測とユーザ研究によって検証された芸術的・異世界的なシーンに匹敵する手法である。最後に,既存の絵画のアニメーション化と,テキストによる動き方向制御の2つの拡張を示す。 We introduce Artistic Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions - an especially challenging task when prompts feature imaginary elements and artistic styles, given the complexity of interpreting the semantics and motions of these images. Existing single-image animation methods fall short on artistic inputs, and recent text-based video methods frequently introduce temporal inconsistencies, struggling to keep certain regions static. To address these challenges, we propose an idea of synthesizing image twins from a single text prompt - a pair of an artistic image and its pixel-aligned corresponding natural-looking twin. While the artistic image depicts the style and appearance detailed in our text prompt, the realistic counterpart greatly simplifies layout and motion analysis. Leveraging existing natural image and video datasets, we can accurately segment the realistic image and predict plausible motion given the semantic information. The predicted motion can then be transferred to the artistic image to create the final cinemagraph. 
Our method outperforms existing approaches in creating cinemagraphs for natural landscapes as well as artistic and other-worldly scenes, as validated by automated metrics and user studies. Finally, we demonstrate two extensions: animating existing paintings and controlling motion directions using text.	翻訳日:2023-07-07 12:58:30 公開日:2023-07-06
# ネルソン量子場理論のシミュレーション Simulating Nelsonian Quantum Field Theory ( http://arxiv.org/abs/2307.03188v1 ) ライセンス: Link先を確認	Andrea Carosso	(参考訳) 我々は、エドワード・ネルソンの確率力学が量子場理論に一般化する際に示唆する物理過程の全体像を、その理論を水素原子に適用した入門的考察の後に記述する。関連する確率過程の数値シミュレーションを行うことで、ネルソンの理論は、格子上で正規化された自由場理論の場合、ジョン・S・ベル(英語版)のフレーズを使うために、基礎となる場から粒子がどのように生じるかという直感的な説明を与える。すると、相互作用するスカラー場理論に一般化すると、この図は質的に似ていると論じる。最後に、Nelsonian フレームワークと QFT の他の様々な提案されたオントロジーを比較し、実効場理論のパラダイムに照らしてそれらの相対的なメリットについて述べる。 We describe the picture of physical processes suggested by Edward Nelson's stochastic mechanics when generalized to quantum field theory, after an introductory review of his theory applied to the hydrogen atom. By performing numerical simulations of the relevant stochastic processes, we observe that Nelson's theory provides an intuitive account of how particles can arise from an underlying field ``beable'' -- to use a phrase of John S. Bell -- in the case of free field theory, regularized on a lattice. We then argue that this picture looks qualitatively similar when generalized to interacting scalar field theory. Lastly, we compare the Nelsonian framework to various other proposed ontologies for QFT, and remark upon their relative merits in light of the effective field theory paradigm.	翻訳日:2023-07-07 12:58:07 公開日:2023-07-06
# TGRL:教師指導強化学習のためのアルゴリズム TGRL: An Algorithm for Teacher Guided Reinforcement Learning ( http://arxiv.org/abs/2307.03186v1 ) ライセンス: Link先を確認	Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal	(参考訳) 報酬(強化学習またはrl)から学び、教師を模倣する学習(教師・学生学習)は、逐次的な意思決定問題を解決するために確立された2つのアプローチである。これらの学習形態の利点を組み合わせるために、強化と教師-学生の学習目標の組合せを最大化するための政策を訓練することが一般的である。しかしながら、これらの目的のバランスをとるための原則的な方法がなければ、以前の研究は2つの目的のバランスをとるためにヒューリスティックスと問題固有のハイパーパラメーターサーチを使用した。私たちは、$\textit{principled}$アプローチと、$\textit{dynamically}$と$\textit{automatically}$ balanceingの近似実装を示します。主な考え方は,教師の指導を伴わず,報酬のみから,エージェントのパフォーマンスとエージェント学習の反事実シナリオを比較して,教師の監督の重要性を調整することである。教師の指導が向上すると、教師の監督の重要性が増し、それ以外は低下する。我々のメソッドである$\textit{Teacher Guided Reinforcement Learning}$ (TGRL)は、ハイパーパラメータチューニングなしで様々なドメインで強いベースラインを上回ります。 Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without a principled method to balance these objectives, prior work used heuristics and problem-specific hyperparameter searches to balance the two objectives. We present a $\textit{principled}$ approach, along with an approximate implementation for $\textit{dynamically}$ and $\textit{automatically}$ balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased and otherwise it is decreased. 
Our method, $\textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyper-parameter tuning.	翻訳日:2023-07-07 12:57:52 公開日:2023-07-06
# チェシャー弦の位相相における弦作用素 String operators for Cheshire strings in topological phases ( http://arxiv.org/abs/2307.03180v1 ) ライセンス: Link先を確認	Nathanan Tantivasadakarn, Xie Chen	(参考訳) 3+1D位相相の初等点電荷励起は線に沿って凝縮し、チェシャー弦と呼ばれる子孫励起を形成する。系の基本的なフラックスループ励起とは異なり、チェシャー弦は2次元円板の境界として現れなくても開線セグメント上に存在する。一方、チェシャー弦は、0dの局所ユニタリと1d以上の有限深さ量子回路で生成できる自明な励起とは異なる。本稿では,チェシャー弦を生成するためには,弦の長さに沿って順次作用する線形深度回路が必要であることを示す。チェシャー弦が生成されると、その変形、運動、融合は有限深度回路によって実現される。この回路深度要件は、対称保護トポロジカル鎖やマヨラナ鎖を含むすべての非自明な子孫励起に適用される。 Elementary point charge excitations in 3+1D topological phases can condense along a line and form a descendant excitation called the Cheshire string. Unlike the elementary flux loop excitations in the system, Cheshire strings do not have to appear as the boundary of a 2D disc and can exist on open line segments. On the other hand, Cheshire strings are different from trivial excitations that can be created with local unitaries in 0d and finite depth quantum circuits in 1d and higher. In this paper, we show that to create a Cheshire string, one needs a linear depth circuit that acts sequentially along the length of the string. Once a Cheshire string is created, its deformation, movement and fusion can be realized by finite depths circuits. This circuit depth requirement applies to all nontrivial descendant excitations including symmetry-protected topological chains and the Majorana chain.	翻訳日:2023-07-07 12:57:26 公開日:2023-07-06
# IPO-LDM:潜伏拡散モデルによる深度360度の室内RGBパノラマ画 IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model ( http://arxiv.org/abs/2307.03177v1 ) ライセンス: Link先を確認	Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham	(参考訳) 狭視野画像から完全な360度パノラマを生成することは、全方位RGBデータが容易に利用できないため、現在進行中である。既存のGANベースのアプローチは、高品質な出力を実現するための障壁に直面し、異なるマスクタイプに対する一般化性能が劣る。本稿では,潜伏拡散モデル (LDM) を用いた360度室内RGBパノラマ露光モデルであるIPO-LDMを提案する。トレーニング中にRGBと深度パノラマデータの両方を利用する新しいバイモーダル潜伏拡散構造を導入するが、推定時に正常な深度のないRGB画像よりも驚くほどよく機能する。さらに,拡散分別ステップ毎にプログレッシブカメラ回転を導入する新しい手法を提案する。その結果、当社のIPO-LDMは、RGBパノラマのパノラマ画における最先端の手法よりも優れており、さまざまな種類のマスクに対して、多様かつ多様に構造化された結果を得ることができることがわかった。 Generating complete 360-degree panoramas from narrow field of view images is ongoing research as omnidirectional RGB data is not readily available. Existing GAN-based approaches face some barriers to achieving higher quality output, and have poor generalization performance over different mask types. In this paper, we present our 360-degree indoor RGB panorama outpainting model using latent diffusion models (LDM), called IPO-LDM. We introduce a new bi-modal latent diffusion structure that utilizes both RGB and depth panoramic data during training, but works surprisingly well to outpaint normal depth-free RGB images during inference. We further propose a novel technique of introducing progressive camera rotations during each diffusion denoising step, which leads to substantial improvement in achieving panorama wraparound consistency. Results show that our IPO-LDM not only significantly outperforms state-of-the-art methods on RGB panorama outpainting, but can also produce multiple and diverse well-structured results for different types of masks.	翻訳日:2023-07-07 12:57:13 公開日:2023-07-06
# 不均一な特徴サブサンプルリッジアンサンブルの学習曲線 Learning Curves for Heterogeneous Feature-Subsampled Ridge Ensembles ( http://arxiv.org/abs/2307.03176v1 ) ライセンス: Link先を確認	Benjamin S. Ruben, Cengiz Pehlevan	(参考訳) 特徴バッキング(feature bagging)は、ランダムなサブサンプルや特徴の投影のアンサンブルにおいて推定子を訓練することで予測分散を減らすことを目的とした、確立されたセンスリング手法である。通常、アンサンブルは均質であると選択されるが、この意味では、エスティメータが利用できる特徴次元の数はアンサンブル全体で一様である。本稿では,様々な特徴次元に基づいて推定器を組み込んだ不均一な特徴アンサンブルを導入し,その性能を線形回帰条件で検討する。線形予測器のアンサンブルについて検討し,利用可能な特徴のサブセットにリッジ回帰を用いた。これらのサブセットに含まれる機能の数を変更できるようにします。統計物理学からのレプリカのトリックを用いて、決定論的線形マスクを用いたリッジアンサンブルの学習曲線を導出する。等方性特徴雑音を伴う等相関データの場合,学習曲線の明示的な表現を求める。導出表現を用いてサブサンプリングとアンサンブルの効果を調査し,ノイズレベル,データ相関,データ-タスクアライメントのパラメータ空間における最適なアンサンブル戦略の急激な遷移を見出した。最後に,頑健な機械学習のための二重降下を緩和するための戦略として,可変次元特徴バッキングを提案する。 Feature bagging is a well-established ensembling method which aims to reduce prediction variance by training estimators in an ensemble on random subsamples or projections of features. Typically, ensembles are chosen to be homogeneous, in the sense that the number of feature dimensions available to an estimator is uniform across the ensemble. Here, we introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, and consider its performance in a linear regression setting. We study an ensemble of linear predictors, each fit using ridge regression on a subset of the available features. We allow the number of features included in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles with deterministic linear masks. We obtain explicit expressions for the learning curves in the case of equicorrelated data with an isotropic feature noise. Using the derived expressions, we investigate the effect of subsampling and ensembling, finding sharp transitions in the optimal ensembling strategy in the parameter space of noise level, data correlations, and data-task alignment. 
Finally, we suggest variable-dimension feature bagging as a strategy to mitigate double descent for robust machine learning in practice.	翻訳日:2023-07-07 12:56:41 公開日:2023-07-06
# 緑を追い越す - 植物葉を移動して見ることを学ぶ Push Past Green: Learning to Look Behind Plant Foliage by Moving It ( http://arxiv.org/abs/2307.03175v1 ) ライセンス: Link先を確認	Xiaoyu Zhang, Saurabh Gupta	(参考訳) 自律農業の応用(例えば検査、表現型、摘み果物)には、葉と枝の後ろを見るために植物葉を操作する必要がある。部分的な可視性、極端に粗い構造、植物のための未知の幾何学と力学は、そのような操作を困難にしている。データ駆動方式でこれらの課題に取り組む。 SRPNetは、特定の植物に対する候補アクションの実行時に、どの空間が露呈しているかを予測するニューラルネットワークである。我々は,srpnet とクロスエントロピー法を用いて,葉下空間の解明に有効な行動を予測した。さらに、SRPNetは、どれだけの空間が露光されるかだけでなく、どこで露光されるかを予測するだけでなく、植物葉の下の空間を徐々に明らかにする一連の行動を実行することができる。本研究は, 人工植物(Dracaena) と実植物(Dracaena) を, 新しい植物構成への一般化をテストする2つの設定を含む5つの物理的テストベッド上で実験した。本研究は,手作り力学モデルと関連するアブレーションに対するsrpnetの有効性と,競合する手作り探索法に対する総合的手法であるppgの有効性を明らかにした。 Autonomous agriculture applications (e.g., inspection, phenotyping, plucking fruits) require manipulating the plant foliage to look behind the leaves and the branches. Partial visibility, extreme clutter, thin structures, and unknown geometry and dynamics for plants make such manipulation challenging. We tackle these challenges through data-driven methods. We use self-supervision to train SRPNet, a neural network that predicts what space is revealed on execution of a candidate action on a given plant. We use SRPNet with the cross-entropy method to predict actions that are effective at revealing space beneath plant foliage. Furthermore, as SRPNet does not just predict how much space is revealed but also where it is revealed, we can execute a sequence of actions that incrementally reveal more and more space beneath the plant foliage. We experiment with a synthetic (vines) and a real plant (Dracaena) on a physical test-bed across 5 settings including 2 settings that test generalization to novel plant configurations. Our experiments reveal the effectiveness of our overall method, PPG, over a competitive hand-crafted exploration method, and the effectiveness of SRPNet over a hand-crafted dynamics model and relevant ablations.	翻訳日:2023-07-07 12:56:18 公開日:2023-07-06
# 中間の損失:言語モデルが長い文脈をどのように使うか Lost in the Middle: How Language Models Use Long Contexts ( http://arxiv.org/abs/2307.03172v1 ) ライセンス: Link先を確認	Nelson F. Liu and Kevin Lin and John Hewitt and Ashwin Paranjape and Michele Bevilacqua and Fabio Petroni and Percy Liang	(参考訳) 最近の言語モデルは、長いコンテキストを入力として扱うことができるが、言語モデルがいかに長いコンテキストを使用するかは、比較的分かっていない。入力コンテキスト内の関連情報を識別する必要のある2つのタスクにおける言語モデルのパフォーマンスを分析する。入力コンテキストの開始時や終了時に関連情報が生じた場合、性能が最も高く、長いコンテキストの途中でモデルが関連する情報にアクセスしなければならない場合、大幅に低下する。さらに、明示的な長期コンテキストモデルであっても、入力コンテキストが長くなるにつれてパフォーマンスが大幅に低下する。分析は、言語モデルが入力コンテキストをどのように利用するかをよりよく理解し、将来のロングコンテキストモデルのための新しい評価プロトコルを提供する。 While recent language models have the ability to take long contexts as input, relatively little is known about how well the language models use longer context. We analyze language model performance on two tasks that require identifying relevant information within their input contexts: multi-document question answering and key-value retrieval. We find that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts. Furthermore, performance substantially decreases as the input context grows longer, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context models.	翻訳日:2023-07-07 12:55:56 公開日:2023-07-06
# LEO:多目的2値決定図のための効率的な順序付け学習 LEO: Learning Efficient Orderings for Multiobjective Binary Decision Diagrams ( http://arxiv.org/abs/2307.03171v1 ) ライセンス: Link先を確認	Rahul Patel, Elias B. Khalil	(参考訳) 双対決定図(BDD)に基づくアプローチは、最近、多目的整数プログラミング問題に対する最先端の結果を得た。 BDDの構築に使用される変数の順序付けは、そのサイズや、単一目的の最適化問題に対する緩和あるいは制限されたBDDから派生したバウンダリの品質に大きな影響を与える可能性がある。我々はまず,多目的ナップサック問題に対するpareto frontier(pf)列挙時間に対する変数順序付けの類似性を示し,多目的bddアプローチのスケーラビリティを向上させる変数順序付けメソッドの導出の必要性を示唆する。そこで我々は,小さな解釈可能かつ容易に計算可能な変数特徴セットにおいて線形な変数スコアリング関数に基づいて,新しいパラメータ構成空間を導出する。ブラックボックス最適化を用いて構成空間を効率的に探索し、次元の呪いを回避し(変数数と目的数)、PF列挙時間を短縮する優れた順序付けを見つける方法を示す。しかし、ブラックボックス最適化アプローチは、良好な変数順序付けによる時間の削減よりも大きい計算オーバーヘッドを伴います。この問題を軽減するために、列挙時間を削減する効率的な変数順序付けを見つけるための教師付き学習手法LEOを提案する。クナプサック問題の3～7の目的と最大80の変数によるベンチマークセットの実験では、LEOは一般的な順序付け戦略やアルゴリズム構成よりも30～300%、PF列挙では10～200%高速であることが示されている。私たちのコードとインスタンスはhttps://github.com/khalil-research/leoで利用可能です。 Approaches based on Binary decision diagrams (BDDs) have recently achieved state-of-the-art results for multiobjective integer programming problems. The variable ordering used in constructing BDDs can have a significant impact on their size and on the quality of bounds derived from relaxed or restricted BDDs for single-objective optimization problems. We first showcase a similar impact of variable ordering on the Pareto frontier (PF) enumeration time for the multiobjective knapsack problem, suggesting the need for deriving variable ordering methods that improve the scalability of the multiobjective BDD approach. To that end, we derive a novel parameter configuration space based on variable scoring functions which are linear in a small set of interpretable and easy-to-compute variable features. We show how the configuration space can be efficiently explored using black-box optimization, circumventing the curse of dimensionality (in the number of variables and objectives), and finding good orderings that reduce the PF enumeration time. 
However, black-box optimization approaches incur a computational overhead that outweighs the reduction in time due to good variable ordering. To alleviate this issue, we propose LEO, a supervised learning approach for finding efficient variable orderings that reduce the enumeration time. Experiments on benchmark sets from the knapsack problem with 3-7 objectives and up to 80 variables show that LEO is ~30-300% and ~10-200% faster at PF enumeration than common ordering strategies and algorithm configuration, respectively. Our code and instances are available at https://github.com/khalil-research/leo.	翻訳日:2023-07-07 12:55:43 公開日:2023-07-06
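The variable scoring idea in the LEO abstract above can be illustrated with a minimal sketch. It assumes each variable is described by two hypothetical features (item weight and mean value over objectives) and a hand-picked weight vector; the features and weights are illustrative assumptions, not the paper's feature set or learned parameters.

```python
import numpy as np

def order_variables(features, weights):
    """Order variables by a linear score of their features, highest score first."""
    scores = features @ weights  # one linear score per variable
    return np.argsort(-scores, kind="stable").tolist()

# Hypothetical features per variable: [item weight, mean value over objectives].
features = np.array([
    [4.0, 8.0],
    [2.0, 7.0],
    [5.0, 5.0],
    [1.0, 3.0],
])
# Hypothetical scoring weights: penalize heavy items, reward valuable ones.
weights = np.array([-1.0, 1.0])

order = order_variables(features, weights)
print(order)
```

In this toy setting the ordering is fully determined by the weight vector, which is why searching or learning over such weights can stand in for searching over orderings directly.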
# Focused Transformer: コンテキストスケーリングのためのコントラストトレーニング Focused Transformer: Contrastive Training for Context Scaling ( http://arxiv.org/abs/2307.03170v1 ) ライセンス: Link先を確認	Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś	(参考訳) 大規模言語モデルは、文脈的に新しい情報を組み込む特別な能力を持っている。しかし、そのようなアプローチの完全なポテンシャルは、有効文脈長の制限のためにしばしば抑制される。この問題の解決策の1つは、(キー、値)ペアからなる外部メモリへのアクセスを持つ注意層を提供することである。しかし、文書の数が増えるにつれて、関連するキーの無関係なキーに対する割合が減少し、無関係なキーにもっと集中するようになる。そこでは、異なるセマンティックな値に関連付けられたキーが重複し、区別が困難になる可能性がある。そこで,本研究では,コントラスト学習に触発された学習プロセスを用いる手法であるFocused Transformer (FoT)を提案する。この新しいアプローチは(キー、値)空間の構造を強化し、コンテキスト長の拡張を可能にする。提案手法では,既存の大規模モデルを微調整して有効コンテキストを延長することができる。これは$3B$と$7B$のOpenLLaMAチェックポイントの微調整で示されています。結果として得られたモデルはLongLLaMAと呼ばれ、長いコンテキストを必要とするタスクの進歩を示す。さらに,我々のLongLLaMAモデルではパスキー検索のコンテキスト長が$256k$であることを示す。 Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of such an approach is often restrained due to a limitation in the effective context length. One solution to this issue is to endow an attention layer with access to an external memory, which comprises (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus more on the irrelevant keys. We identify a significant challenge, dubbed the distraction issue, where keys linked to different semantic values might overlap, making them hard to distinguish. To tackle this problem, we introduce the Focused Transformer (FoT), a technique that employs a training process inspired by contrastive learning. This novel approach enhances the structure of the (key, value) space, enabling an extension of the context length. Our method allows for fine-tuning pre-existing, large-scale models to lengthen their effective context. This is demonstrated by our fine-tuning of $3B$ and $7B$ OpenLLaMA checkpoints. 
The resulting models, which we name LongLLaMA, exhibit advancements in tasks requiring a long context. We further illustrate that our LongLLaMA models adeptly manage a $256k$ context length for passkey retrieval.	翻訳日:2023-07-07 12:55:14 公開日:2023-07-06
# リカレントトレンド予測ニューラルネットワークに基づく予測組込みスケジューリングによるスマートホーム環境の再生可能エネルギー管理 Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network ( http://arxiv.org/abs/2307.01622v2 ) ライセンス: Link先を確認	Mert Nakıp, Onur Çopur, Emrah Biyik, Cüneyt Güzeliş	(参考訳) スマートホームエネルギー管理システムは、配電網をより効率的かつ確実に運用し、分散型再生可能エネルギー源の効果的な普及を可能にする。これらのシステムは、需要と再生可能生成の不確実性を扱うことのできる堅牢な予測、最適化、制御/スケジューリングアルゴリズムに依存している。本稿では,Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES)と呼ばれるMLアルゴリズムを提案する。 rTPNN-FESは、再生可能エネルギーの発生と家電のスケジュールを同時に予測する新しいニューラルネットワークアーキテクチャである。組込み構造により、rTPNN-FESは予測とスケジューリングのための別々のアルゴリズムの使用を排除し、予測エラーに対して堅牢なスケジュールを生成する。本稿では,IoT対応スマートホームにおける提案アルゴリズムの性能評価も行う。評価結果から, rTPNN-FESは最適化よりも37.5倍高速にほぼ最適なスケジューリングを提供し, 最先端の予測技術を上回ることがわかった。 Smart home energy management systems help the distribution grid operate more efficiently and reliably, and enable effective penetration of distributed renewable energy sources. These systems rely on robust forecasting, optimization, and control/scheduling algorithms that can handle the uncertain nature of demand and renewable generation. This paper proposes an advanced ML algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), to provide efficient residential demand control. rTPNN-FES is a novel neural network architecture that simultaneously forecasts renewable energy generation and schedules household appliances. By its embedded structure, rTPNN-FES eliminates the utilization of separate algorithms for forecasting and scheduling and generates a schedule that is robust against forecasting errors. This paper also evaluates the performance of the proposed algorithm for an IoT-enabled smart home. The evaluation results reveal that rTPNN-FES provides near-optimal scheduling $37.5$ times faster than the optimization while outperforming state-of-the-art forecasting techniques.	翻訳日:2023-07-07 11:11:26 公開日:2023-07-06
# イントロスペクティブロボット組立のための正規化フローを用いた密度ベースフィージビリティ学習 Density-based Feasibility Learning with Normalizing Flows for Introspective Robotic Assembly ( http://arxiv.org/abs/2307.01317v2 ) ライセンス: Link先を確認	Jianxiang Feng, Matan Atad, Ismael Rodríguez, Maximilian Durner, Stephan Günnemann, Rudolph Triebel	(参考訳) ロボットアセンブリシーケンスプランニング(RASP)における機械学習(ML)モデルは、予測されたソリューション、すなわち、潜在的効率劣化を回避するために、イントロスペクティブである必要がある。以前の作業では、トレーニング中に実現可能な例と実行不可能な例の両方が必要です。しかし、新しい製品に素早く適応するために再トレーニングが必要な場合、実現不可能なものは十分な収集が困難である。本研究では,実例のみを必要とする密度ベース実現可能性学習手法を提案する。具体的には,複雑な確率分布を推定するための強力な生成モデルである正規化フロー(nf)を用いて,分散(ood)検出として実現可能性学習問題を定式化する。実証的に,提案手法はロボットアセンブリのユースケースで実証され,実現不可能なアセンブリの検出において,他の単一クラスベースラインよりも優れる。さらに,本手法の内部動作機構について検討し,NFの高度変種に基づいて大きなメモリ節約が得られることを示す。 Machine Learning (ML) models in Robotic Assembly Sequence Planning (RASP) need to be introspective on the predicted solutions, i.e. whether they are feasible or not, to circumvent potential efficiency degradation. Previous works need both feasible and infeasible examples during training. However, the infeasible ones are hard to collect sufficiently when re-training is required for swift adaptation to new product variants. In this work, we propose a density-based feasibility learning method that requires only feasible examples. Concretely, we formulate the feasibility learning problem as Out-of-Distribution (OOD) detection with Normalizing Flows (NF), which are powerful generative models for estimating complex probability distributions. Empirically, the proposed method is demonstrated on robotic assembly use cases and outperforms other single-class baselines in detecting infeasible assemblies. We further investigate the internal working mechanism of our method and show that a large memory saving can be obtained based on an advanced variant of NF.	翻訳日:2023-07-07 11:11:08 公開日:2023-07-06
# 信頼性の高いAI:次世代の量子コンピューティングは必要か? Reliable AI: Does the Next Generation Require Quantum Computing? ( http://arxiv.org/abs/2307.01301v2 ) ライセンス: Link先を確認	Aras Bacho, Holger Boche, Gitta Kutyniok	(参考訳) 本研究では、次世代の人工知能が量子コンピューティングを必要とするかどうかという根本的な疑問を探究する。人工知能は私たちの日常生活において重要な役割を担っており、第4次産業革命の中心となっている。したがって、人工知能が信頼性と信頼性を持つことが必須である。しかし、自動運転、医療、ロボティクスなどの分野において、プライバシ、責任、安全性、セキュリティなど、人工知能の信頼性にはまだ多くの問題がある。これらの問題には、不十分なデータ、バイアス、堅牢性問題、およびデジタルハードウェアにおける計算可能性問題など、様々な原因がある。これらの計算可能性問題の原因は、デジタルハードウェアが本質的に離散的なチューリングマシンの計算モデルに基づいているという事実にある。特に,デジタルハードウェアは最適化,深層学習,微分方程式の問題解決に本質的に制約されている。したがって、これらの制限は人工知能の分野、特に機械学習に重大な意味を持つ。さらに、量子コンピュータがある種の問題に対して量子的優位性を示すことはよく知られているが、量子回路や量子チューリングマシンのパラダイムに基づく量子コンピューティングモデルを使用する場合、これらの制限の一部は持続する。対照的に、Blum-Shub-Smale マシンのようなアナログコンピューティングモデルは、これらの制限を克服する可能性を示している。 In this survey, we aim to explore the fundamental question of whether the next generation of artificial intelligence requires quantum computing. Artificial intelligence is increasingly playing a crucial role in many aspects of our daily lives and is central to the fourth industrial revolution. It is therefore imperative that artificial intelligence is reliable and trustworthy. However, there are still many issues with reliability of artificial intelligence, such as privacy, responsibility, safety, and security, in areas such as autonomous driving, healthcare, robotics, and others. These problems can have various causes, including insufficient data, biases, and robustness problems, as well as fundamental issues such as computability problems on digital hardware. The cause of these computability problems is rooted in the fact that digital hardware is based on the computing model of the Turing machine, which is inherently discrete. Notably, our findings demonstrate that digital hardware is inherently constrained in solving problems about optimization, deep learning, or differential equations. 
Therefore, these limitations carry substantial implications for the field of artificial intelligence, in particular for machine learning. Furthermore, although it is well known that the quantum computer shows a quantum advantage for certain classes of problems, our findings establish that some of these limitations persist when employing quantum computing models based on the quantum circuit or the quantum Turing machine paradigm. In contrast, analog computing models, such as the Blum-Shub-Smale machine, exhibit the potential to surmount these limitations.	翻訳日:2023-07-07 11:10:48 公開日:2023-07-06
# 医用画像合成のための3次元潜伏拡散モデルにおけるデータ記憶の検討 Investigating Data Memorization in 3D Latent Diffusion Models for Medical Image Synthesis ( http://arxiv.org/abs/2307.01148v2 ) ライセンス: Link先を確認	Salman Ul Hassan Dar, Arman Ghanaat, Jannik Kahmann, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Sandy Engelhardt	(参考訳) 生成潜在拡散モデルはデータ生成の最先端として確立されている。有望な応用の1つは、患者のプライバシーを損なうことなく、オープンデータ共有のための現実的な合成医療画像データを生成することである。それにもかかわらず、敏感な患者のトレーニングデータを記憶し、トレーニングデータによく似たサンプルを合成するモデルの能力は、比較的未調査である。本稿では, 冠動脈造影および膝磁気共鳴画像データセットを用いた3次元潜時拡散モデルの記憶能力の評価を行った。トレーニングサンプルの潜在的な暗記を検出するために,コントラスト学習に基づく自己教師型モデルを用いる。以上の結果から,このような潜伏拡散モデルがトレーニングデータを記憶し,記憶化を緩和するための戦略を考案する必要があることが示唆された。 Generative latent diffusion models have been established as state-of-the-art in data generation. One promising application is generation of realistic synthetic medical imaging data for open data sharing without compromising patient privacy. Despite the promise, the capacity of such models to memorize sensitive patient training data and synthesize samples showing high resemblance to training data samples is relatively unexplored. Here, we assess the memorization capacity of 3D latent diffusion models on photon-counting coronary computed tomography angiography and knee magnetic resonance imaging datasets. To detect potential memorization of training samples, we utilize self-supervised models based on contrastive learning. Our results suggest that such latent diffusion models indeed memorize training data, and there is a dire need for devising strategies to mitigate memorization.	翻訳日:2023-07-07 11:10:24 公開日:2023-07-06
# REAL: アクティブラーニングのための代表的エラー駆動アプローチ REAL: A Representative Error-Driven Approach for Active Learning ( http://arxiv.org/abs/2307.00968v2 ) ライセンス: Link先を確認	Cheng Chen, Yong Wang, Lizi Liao, Yueguo Chen, Xiaoyong Du	(参考訳) ラベル付け予算が限られているため、active learning(al)はラベルのないプールから最も有益なインスタンスをサンプリングし、その後のモデルトレーニングのためにラベルを取得することを目的としている。これを達成するため、ALは通常、不確実性と多様性に基づいてラベルなしのインスタンスの情報性を測定する。しかし、モデルの性能を向上させる大きな可能性を持つ近傍誤差密度の誤例は考慮していない。この制限に対処するために、$REAL$という新しいアプローチを提案し、$\underline{R}$epresentative $\underline{E}$rrors for $\underline{A}$ctive $\underline{L}$earning。クラスタ内の少数派予測を 'emph{pseudo error} と識別し、推定エラー密度に基づいてクラスタの適応的なサンプリング予算を割り当てる。 5つのテキスト分類データセットの大規模な実験により、$REAL$は、幅広いハイパーパラメータ設定における精度とF1-macroスコアに関するすべての最高のパフォーマンスベースラインを一貫して上回ります。我々の分析によると、$REAL$は決定境界に沿った地道誤差の分布と一致する最も代表的な擬似エラーを選択する。私たちのコードはhttps://github.com/withchencheng/ECML_PKDD_23_Realで公開されています。 Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose $REAL$, a novel approach to select data instances with $\underline{R}$epresentative $\underline{E}$rrors for $\underline{A}$ctive $\underline{L}$earning. It identifies minority predictions as \emph{pseudo errors} within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that $REAL$ consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. 
Our analysis also shows that $REAL$ selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.	翻訳日:2023-07-07 11:10:10 公開日:2023-07-06
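The pseudo-error idea in the REAL abstract above can be sketched as follows. The sketch treats any prediction that disagrees with its cluster's majority prediction as a pseudo error, as the abstract describes; the proportional budget split is an assumed allocation rule chosen for illustration, and the cluster ids and predictions are made-up toy data.

```python
from collections import Counter

def pseudo_errors(cluster_ids, predictions):
    """Map each cluster to the indices whose prediction differs from the cluster majority."""
    by_cluster = {}
    for idx, (cluster, pred) in enumerate(zip(cluster_ids, predictions)):
        by_cluster.setdefault(cluster, []).append((idx, pred))
    errors = {}
    for cluster, members in by_cluster.items():
        majority = Counter(pred for _, pred in members).most_common(1)[0][0]
        errors[cluster] = [idx for idx, pred in members if pred != majority]
    return errors

def allocate_budget(errors, cluster_sizes, budget):
    """Split a labeling budget in proportion to pseudo-error density (assumed rule).

    Rounding means the shares need not sum exactly to the budget in general.
    """
    density = {c: len(errors[c]) / cluster_sizes[c] for c in errors}
    total = sum(density.values())
    if total == 0:
        return {c: 0 for c in errors}
    return {c: round(budget * d / total) for c, d in density.items()}

# Toy data: two clusters of four unlabeled instances each, with model predictions.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = ["a", "a", "a", "b", "b", "b", "a", "c"]

errors = pseudo_errors(cluster_ids, predictions)
shares = allocate_budget(errors, {0: 4, 1: 4}, budget=3)
print(errors, shares)
```

Cluster 1 has twice the pseudo-error density of cluster 0 in this toy data, so it receives twice the share of the budget.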
# 超高分解能セグメンテーションのための空間整合性誘導パッチグルーピングウェーブレット変換器 Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation ( http://arxiv.org/abs/2307.00711v2 ) ライセンス: Link先を確認	Deyi Ji, Feng Zhao, Hongtao Lu	(参考訳) 既存の超高分解能(UHR)セグメンテーション手法は、メモリコストと局所特性のバランスをとるジレンマに常に苦労している。この研究において、gpwformerはtransform($\mathcal{t}$)-cnn($\mathcal{c}$)相互傾きフレームワークであり、$\mathcal{t}$はuhrイメージ全体を入力として、局所的な詳細と細かな長距離のコンテキスト依存性の両方を収集する。高い推論速度と計算の複雑さのために、$\mathcal{t}$ は元の uhr 画像をパッチに分割し、動的にグループ化し、軽量の multi-head wavelet transformer (wformer) ネットワークで低レベルなローカル詳細を学ぶ。一方で、このプロセスでは、空間領域から遠く離れたパッチを同じグループに割り当てることもできるため、細かな長距離のコンテキスト依存性もキャプチャされる。さらに、$\mathcal{c}$で生成されるマスクを使用してパッチグループ化プロセスをガイドし、ヒューリスティックス決定を提供する。さらに、パッチ間の空間的一貫性を維持するために、2つのブランチ間の共役制約も活用する。全体としては、マルチステージのプロセスをピラミッド的な方法で積み重ねます。 GPWFormerは5つのベンチマークデータセットで大幅に改善され、既存のメソッドよりも優れていた。 Most existing ultra-high resolution (UHR) segmentation methods always struggle in the dilemma of balancing memory cost and local characterization accuracy, which are both taken into account in our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer) that achieves impressive performances. In this work, GPWFormer is a Transformer ($\mathcal{T}$)-CNN ($\mathcal{C}$) mutual leaning framework, where $\mathcal{T}$ takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies, while $\mathcal{C}$ takes downsampled image as input for learning the category-wise deep context. For the sake of high inference speed and low computation complexity, $\mathcal{T}$ partitions the original UHR image into patches and groups them dynamically, then learns the low-level local details with the lightweight multi-head Wavelet Transformer (WFormer) network. Meanwhile, the fine-grained long-range contextual dependencies are also captured during this process, since patches that are far away in the spatial domain can also be assigned to the same group. 
In addition, masks produced by $\mathcal{C}$ are utilized to guide the patch grouping process, providing a heuristic decision. Moreover, the congruence constraints between the two branches are also exploited to maintain the spatial consistency among the patches. Overall, we stack the multi-stage process in a pyramid fashion. Experiments show that GPWFormer outperforms existing methods with significant improvements on five benchmark datasets.	翻訳日:2023-07-07 11:09:42 公開日:2023-07-06
# 一般量子マルコフ過程のヒット時間について On Hitting Times for General Quantum Markov Processes ( http://arxiv.org/abs/2210.10188v3 ) ライセンス: Link先を確認	Lorenzo Laneve, Francesco Tacchino, Ivano Tavernelli	(参考訳) ランダムウォーク(英: Random walk、またはMarkov chains)は、理論計算機科学で広く使われているモデルである。到達時間（hitting time）や混合時間などの量の分析を含むいくつかのツールは、ランダム化されたアルゴリズムを考案するのに役立ちます。注目すべき例は充足可能性（SAT）問題に対するSchöningのアルゴリズムである。本研究では,古典的ウォークを直接一般化する量子マルコフ連鎖モデルを定義するために密度行列形式を用い,古典的理論で見られるものと同様の公式で到達時間のような一般的なツールが計算できることを示し,グロバーのアルゴリズムのような既知の量子的設定に適用する。 Random walks (or Markov chains) are models extensively used in theoretical computer science. Several tools, including analysis of quantities such as hitting and mixing times, are helpful for devising randomized algorithms. A notable example is Schöning's algorithm for the satisfiability (SAT) problem. In this work, we use the density-matrix formalism to define a quantum Markov chain model which directly generalizes classical walks, and we show that common tools such as hitting times can be computed with a formula similar to the one found in the classical theory, which we then apply to known quantum settings such as Grover's algorithm.	翻訳日:2023-07-07 11:09:09 公開日:2023-07-06
# 自然言語証明計画のための帰納的加法 Deductive Additivity for Planning of Natural Language Proofs ( http://arxiv.org/abs/2307.02472v2 ) ライセンス: Link先を確認	Zayne Sprague, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett	(参考訳) マルチステップのクレーム検証のために設計された現在の自然言語システムは、2つのフェーズで運用される: ヒューリスティック(計画)を用いて関連する前提文の集合を検索し、大きな言語モデル(推論)を使用してそれらのステートメントから新しい結論を生成する。計画ステップは、しばしば高価なトランスフォーマー操作を必要とし、任意の数の前提ステートメントにスケールしない。本稿では,帰納的推論に適合する埋め込み空間を通じて,効率的な計画ヒューリスティックが可能かどうかを検討する。具体的には、埋め込み空間が帰納的加法 (deductive additivity) と呼ばれる性質を示すかどうかを評価する: 前提文の和は、それらの前提に基づく結論の埋め込みに近いべきである。我々は,GPT3からの細調整された埋め込みやBM25からのスパース埋め込みに加えて,既成の密着な埋め込みの複数の源を探究する。本研究は, 帰納的加法の性質が持つか, 極端なか, 自然言語証明生成における計画支援に利用するか, 両方の組込みモデルを本質的に検討した。最後に,Single-Step Reasoning Contrast(SSRC)というデータセットを作成し,さまざまな推論タイプのパフォーマンスを調査する。以上より,標準組込み手法は,前提の和に近い結論をしばしば埋め込むが,それらは効果的なヒューリスティックであり,推論の特定のカテゴリをモデル化する能力に欠けることが示唆された。 Current natural language systems designed for multi-step claim validation typically operate in two phases: retrieve a set of relevant premise statements using heuristics (planning), then generate novel conclusions from those statements using a large language model (deduction). The planning step often requires expensive Transformer operations and does not scale to arbitrary numbers of premise statements. In this paper, we investigate whether an efficient planning heuristic is possible via embedding spaces compatible with deductive reasoning. Specifically, we evaluate whether embedding spaces exhibit a property we call deductive additivity: the sum of premise statement embeddings should be close to embeddings of conclusions based on those premises. We explore multiple sources of off-the-shelf dense embeddings in addition to fine-tuned embeddings from GPT3 and sparse embeddings from BM25. We study embedding models both intrinsically, evaluating whether the property of deductive additivity holds, and extrinsically, using them to assist planning in natural language proof generation. 
Lastly, we create a dataset, Single-Step Reasoning Contrast (SSRC), to further probe performance on various reasoning types. Our findings suggest that while standard embedding methods frequently embed conclusions near the sums of their premises, they fall short of being effective heuristics and lack the ability to model certain categories of reasoning.	翻訳日:2023-07-07 11:03:15 公開日:2023-07-06
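The deductive additivity property described in the abstract above (the sum of premise embeddings should lie close to the embedding of the conclusion) can be checked with a few lines of NumPy. This is a minimal sketch on hand-made toy vectors; real use would substitute embeddings from an actual encoder, and the choice of cosine similarity as the closeness measure is an assumption made here for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def additivity_score(premises, conclusion):
    """Similarity between the summed premise embeddings and a conclusion embedding."""
    return cosine(np.sum(premises, axis=0), conclusion)

# Toy embeddings (hypothetical): two premises and two candidate conclusions.
p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 1.0, 0.0])
entailed = np.array([1.0, 1.0, 0.0])   # points the same way as p1 + p2
unrelated = np.array([0.0, 0.0, 1.0])  # orthogonal to both premises

score_entailed = additivity_score([p1, p2], entailed)
score_unrelated = additivity_score([p1, p2], unrelated)
print(score_entailed, score_unrelated)
```

A planning heuristic built on this property would rank candidate premise sets by such a score against a goal statement, avoiding a Transformer call per candidate.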
# マルチコントラストMRIにおけるDual Arbitrary Scale Super-Resolution Dual Arbitrary Scale Super-Resolution for Multi-Contrast MRI ( http://arxiv.org/abs/2307.02334v2 ) ライセンス: Link先を確認	Jiamiao Zhang, Yichen Chi, Jun Lyu, Wenming Yang, Yapeng Tian	(参考訳) イメージングシステムによって制限された部分的計測からMRI画像の再構成は、医療画像研究に不可欠である。異なる撮像モードのマルチコントラストmr画像の多様かつ相補的な情報から、マルチコントラストスーパーレゾリューション(sr)再構成は高品質のsr画像が得られると期待されている。医学的シナリオでは、多くのMRI SR法で用いられるように、病変を完全に可視化するために、放射線医は固定スケールではなく任意のスケールでMRI画像を拡大することに慣れている。さらに、既存のマルチコントラストMRI SR法では、参照画像の固定解像度を必要とすることが多く、参照画像の取得が困難になり、任意のスケールの SR タスクに制限が課される。これらの問題に対処するため,我々はDual-ArbNetと呼ばれる2軸マルチコントラストMRI超解像法を提案する。まず,対象画像と参照画像の解像度を特徴エンコーダで分離し,ネットワークが任意のスケールで対象画像と参照画像を入力できるようにする。そして、暗黙の融合復号器がマルチコントラスト特徴を融合し、インプリシット復号関数~(IDF)を用いて最終的なMRI SR結果を得る。さらに,我々のネットワークをトレーニングするためのカリキュラム学習戦略を導入し,dual-arbnetの一般化と性能を向上させる。 2つの公開MRIデータセットにおける広範囲な実験により、我々の手法は異なるスケール要因下で最先端のアプローチよりも優れており、臨床実践において大きな可能性を秘めていることが示された。 Limited by imaging systems, the reconstruction of Magnetic Resonance Imaging (MRI) images from partial measurement is essential to medical imaging research. Benefiting from the diverse and complementary information of multi-contrast MR images in different imaging modalities, multi-contrast Super-Resolution (SR) reconstruction is promising to yield SR images with higher quality. In the medical scenario, to fully visualize the lesion, radiologists are accustomed to zooming the MR images at arbitrary scales rather than using a fixed scale, as used by most MRI SR methods. In addition, existing multi-contrast MRI SR methods often require a fixed resolution for the reference image, which makes acquiring reference images difficult and imposes limitations on arbitrary scale SR tasks. To address these issues, we proposed an implicit neural representations based dual-arbitrary multi-contrast MRI super-resolution method, called Dual-ArbNet. 
First, we decouple the resolution of the target and reference images by a feature encoder, enabling the network to input target and reference images at arbitrary scales. Then, an implicit fusion decoder fuses the multi-contrast features and uses an Implicit Decoding Function~(IDF) to obtain the final MRI SR results. Furthermore, we introduce a curriculum learning strategy to train our network, which improves the generalization and performance of our Dual-ArbNet. Extensive experiments in two public MRI datasets demonstrate that our method outperforms state-of-the-art approaches under different scale factors and has great potential in clinical practice.	翻訳日:2023-07-07 11:02:47 公開日:2023-07-06
# ChatGPT生成データを用いたソーシャルメディアからの抑うつ症状の検索 Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media ( http://arxiv.org/abs/2307.02313v2 ) ライセンス: Link先を確認	Ana-Maria Bucur	(参考訳) 本稿では,抑うつ症状の検索におけるeRisk LabタスクにおけるBLUEチームの貢献について述べる。このタスクは、BDI-IIアンケートからうつ病の症状を伝えるRedditのソーシャルメディア文の検索とランキングから成り立っている。 llmsが提供した合成データがデータ拡張と下流モデルの微調整の信頼できる方法であることが証明されていることから,bdi-iiアンケートの症状ごとにchatgptを用いて合成データを生成する方法を選択した。生成したデータは各質問に対するBDI-II応答よりもリッチでセマンティックな多様性を含み、同時にReddit上でのより親密な体験共有に特有な感情的・逸話的体験を含むようにプロンプトを設計した。意味探索を行い,コサイン類似性により文のBDI-II症状との関連をランク付けする。 MentalRoBERTaとMPNetの変種である2つの最先端トランスフォーマーモデルを用いて,BDI-IIのオリジナルおよび生成された応答であるソーシャルメディアポストを埋め込んだ。その結果,意味探索用に設計されたモデルからの文の埋め込みは,メンタルヘルスデータに基づいて事前学習したモデルからの埋め込みよりも優れていることがわかった。さらに、生成した合成データは、このタスクにあまり具体的でないことが証明され、bdi-ii応答に依存するアプローチが最良の性能を示した。 In this work, we present the contribution of the BLUE team in the eRisk Lab task on searching for symptoms of depression. The task consists of retrieving and ranking Reddit social media sentences that convey symptoms of depression from the BDI-II questionnaire. Given that synthetic data provided by LLMs have been proven to be a reliable method for augmenting data and fine-tuning downstream models, we chose to generate synthetic data using ChatGPT for each of the symptoms of the BDI-II questionnaire. We designed a prompt such that the generated data contains more richness and semantic diversity than the BDI-II responses for each question and, at the same time, contains emotional and anecdotal experiences that are specific to the more intimate way of sharing experiences on Reddit. We perform semantic search and rank the sentences' relevance to the BDI-II symptoms by cosine similarity. We used two state-of-the-art transformer-based models (MentalRoBERTa and a variant of MPNet) for embedding the social media posts, the original and generated responses of the BDI-II. 
Our results show that using sentence embeddings from a model designed for semantic search outperforms the approach using embeddings from a model pre-trained on mental health data. Furthermore, the generated synthetic data proved too specific for this task; the approach relying solely on the BDI-II responses performed best.	翻訳日:2023-07-07 11:02:15 公開日:2023-07-06
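The retrieval step described in the abstract above, ranking sentences by cosine similarity to a symptom description, can be sketched as below. The two-dimensional vectors are toy stand-ins for sentence embeddings; no particular embedding model or library API is assumed.

```python
import numpy as np

def rank_by_cosine(query, candidates):
    """Return candidate indices from most to least similar to the query, plus the similarities."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of each candidate row with the query
    order = np.argsort(-sims).tolist()
    return order, sims

# Toy embeddings (hypothetical): one symptom description and three candidate sentences.
symptom = np.array([1.0, 0.0])
sentences = np.array([
    [0.0, 1.0],   # unrelated direction
    [2.0, 0.1],   # nearly parallel to the symptom vector
    [1.0, 1.0],   # partially related
])

ranking, similarities = rank_by_cosine(symptom, sentences)
print(ranking)
```

Normalizing both sides first makes the dot product equal to cosine similarity, so vector length (for example, from longer sentences) does not affect the ranking.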
# 動的アイソメトリに基づく厳密に公平なニューラルアーキテクチャ探索 Dynamical Isometry based Rigorous Fair Neural Architecture Search ( http://arxiv.org/abs/2307.02263v2 ) ライセンス: Link先を確認	Jianxiang Luo, Junyi Hu, Tianji Pang, Weihao Huang, Chuang Liu	(参考訳) 近年,重み共有技術により,ニューラルアーキテクチャ探索のトレーニングと評価が大幅に高速化されている。しかし、既存の重み共有戦略のほとんどは経験や観察のみに基づいており、その結果は解釈可能性や合理性に欠ける。また, 公正性の欠如により, モジュール評価の誤判断が生じる傾向にある。これらの問題に対処するために,動的アイソメトリに基づくニューラルアーキテクチャ探索アルゴリズムを提案する。平均場理論の固定点解析法を用いて、定常状態のランダムニューラルネットワークにおける動的挙動を解析し、動的アイソメトリが重み共有に基づくNASの公平性をどのように保証するかを示す。一方,条件の良いヤコビアンを持つすべてのモジュールの一般化誤差を推定することにより,モジュール選択戦略が厳密に公平であることを示す。大規模な実験により,提案手法で探索したアーキテクチャは,同じサイズで,ImageNet分類における最先端のTop-1検証精度を実現することができた。また,本手法は一般性を損なうことなく,より良く,より安定したトレーニング性能を実現することができることを示した。 Recently, the weight-sharing technique has significantly sped up the training and evaluation procedure of neural architecture search. However, most existing weight-sharing strategies are solely based on experience or observation, which makes the search results lack interpretability and rationality. In addition, due to the neglect of fairness, current methods are prone to misjudgments in module evaluation. To address these problems, we propose a novel neural architecture search algorithm based on dynamical isometry. We use the fixed-point analysis method from mean field theory to analyze the dynamical behavior of steady-state random neural networks, and show how dynamical isometry guarantees the fairness of weight-sharing based NAS. Meanwhile, we prove that our module selection strategy is rigorously fair by estimating the generalization error of all modules with well-conditioned Jacobians. Extensive experiments show that, with the same size, the architecture searched by the proposed method can achieve state-of-the-art top-1 validation accuracy on ImageNet classification. In addition, we demonstrate that our method is able to achieve better and more stable training performance without loss of generality.	翻訳日:2023-07-07 11:01:48 公開日:2023-07-06
# 混合量子状態に対する強い量子速度制限 Stronger Quantum Speed Limit For Mixed Quantum States ( http://arxiv.org/abs/2307.02215v2 ) ライセンス: Link先を確認	Shrobona Bagchi, Dimpi Thakuria, Arun Kumar Pati	(参考訳) 混合量子状態とユニタリ進化の強い不確実性関係を用いて、混合量子状態に対する量子速度制限を導出する。また、この境界は、より良い境界を得るための演算子の異なる選択に対して最適化可能であることも示している。このバウンダリをいくつかの例で説明し、以前のバウンダリよりも優れたパフォーマンスを示します。 We derive a quantum speed limit for mixed quantum states using the stronger uncertainty relation for mixed quantum states and unitary evolution. We also show that this bound can be optimized over different choices of operators for obtaining a better bound. We illustrate this bound with some examples and show its better performance with respect to some earlier bounds.	翻訳日:2023-07-07 11:01:30 公開日:2023-07-06
# 自由空間BBM92量子鍵分配プロトコルにおける非最大絡み合い状態の利用 Use of Non-Maximal entangled state for free space BBM92 quantum key distribution protocol ( http://arxiv.org/abs/2307.02149v2 ) ライセンス: Link先を確認	Ayan Biswas, Sarika Mishra, Satyajeet Patil, Anindya Banerji, Shashi Prabhakar, and Ravindra P. Singh	(参考訳) セキュアな鍵配布のための衛星ベースの量子通信は、破壊不可能なセキュリティのために、より要求の高い研究分野になりつつある。 BB84のようなプレパアプロトコルや測定プロトコルは、衛星を信頼できる装置とみなし、衛星ベースの光通信の現在の傾向を危険視している。したがって、遠距離制限を克服すると共に、衛星を信頼できない機器とみなすことができるため、絡み合いに基づくプロトコルが望ましい。 e91プロトコルは衛星ベースの量子通信のよい候補であるが、eveに対するセキュリティを確保するためにベル・チェシュの不等式を検証するために測定された量子ビットのほとんどを利用するため、鍵レートは低い。エンタングルメントベースのプロトコルは、よりセキュアな鍵分散のために最大エンタングル状態を必要とする。本稿では,セキュアな鍵分布に対する非最大性の影響について述べる。これは、セキュアキーを抽出できない非最大性条件の下限を確立する。 BBM92プロトコルは,Bell-CHSHの不等式に対する違反の程度と,与えられた設定に対する量子ビット誤り率との間に線形接続があることから,鍵分布にとってより有益である。 Satellite-based quantum communication for secure key distribution is becoming a more demanding field of research due to its unbreakable security. Prepare and measure protocols such as BB84 consider the satellite as a trusted device, fraught with danger looking at the current trend for satellite-based optical communication. Therefore, entanglement-based protocols must be preferred since, along with overcoming the distance limitation, one can consider the satellite as an untrusted device too. E91 protocol is a good candidate for satellite-based quantum communication; but the key rate is low as most of the measured qubits are utilized to verify a Bell-CHSH inequality to ensure security against Eve. An entanglement-based protocol requires a maximally entangled state for more secure key distribution. The current work discusses the effect of non-maximality on secure key distribution. It establishes a lower bound on the non-maximality condition below which no secure key can be extracted. BBM92 protocol will be more beneficial for key distribution as we found a linear connection between the extent of violation for Bell-CHSH inequality and the quantum bit error rate for a given setup.	
翻訳日:2023-07-07 11:01:23 公開日:2023-07-06
# 計算社会科学における再現性 Computational Reproducibility in Computational Social Science ( http://arxiv.org/abs/2307.01918v2 ) ライセンス: Link先を確認	David Schoch, Chung-hong Chan, Claudia Wagner, Arnim Bleier	(参考訳) 過去10年間で、再現性と再現性の危機が科学界を揺るがしている。潜在的な解決策として、オープンサイエンスの実践は深く議論され、様々な分野で様々な成功を収めた。しかしながら,計算社会科学などの計算X分野における再現性のバイナリ定義は,結果が再現できるエージェントや条件について明示的でないため不十分である,と我々は主張する。本研究では, 理論的再現性を創出するが, 実用的, 検証された再現性をサポートしない「オープン洗浄」を避けるための定義を拡張し, 検証可能性の概念に基づく計算再現性の階層システムを導入する。検証可能な計算再現性、特に計算社会科学の分野における共通の障壁を特定し、共通アクセスや計算障壁を回避する方法について提案する。 In the last decade, replication and reproducibility crises have shaken the scientific landscape. As potential solutions, open science practices were heavily discussed and have been implemented with varying success in different disciplines. We argue, however, that the binary definition of reproducibility, specifically for computational-X disciplines such as computational social science, is insufficient since it is not explicit about the agents and conditions under which results can be reproduced. We expand the definition to avoid "open washing", the practice of fabricating theoretical reproducibility but not supporting practical or verified reproducibility, and introduce a tier system of computational reproducibility based on the concept of verifiability. We identify common barriers to verifiable computational reproducibility, specifically in the field of computational social science, and provide suggestions on how to circumvent common access and computational barriers.	翻訳日:2023-07-07 11:01:04 公開日:2023-07-06
# 変換プロトフォーム再構成 Transformed Protoform Reconstruction ( http://arxiv.org/abs/2307.01896v2 ) ライセンス: Link先を確認	Young Min Kim, Kalvin Chang, Chenxuan Cui and David Mortensen	(参考訳) プロトフォームの再構築は、一連の娘言語の祖先言語において形態素や単語がどのような形であったかを推測する作業である。 Meloni et al. (2021)は、注意機構付きのRNNベースのエンコーダ・デコーダモデルを用いて、ラテン語のプロトフォーム再構築の最先端を達成した。我々は彼らのモデルを最新のseq2seqモデルであるTransformerで更新する。我々のモデルは、5言語にまたがる8,000の同源語からなる彼らのロマンス語データと、39変種にまたがる800以上の同源語からなる中国語データセット(Hou 2004)という2つの異なるデータセットにおいて、一連の異なる指標で彼らのモデルを上回る。また,本モデルに含まれる可能性のある系統信号についても検討する。私たちのコードはhttps://github.com/cmu-llab/acl-2023で公開されています。 Protoform reconstruction is the task of inferring what morphemes or words appeared like in the ancestral languages of a set of daughter languages. Meloni et al. (2021) achieved the state-of-the-art on Latin protoform reconstruction with an RNN-based encoder-decoder with attention model. We update their model with the state-of-the-art seq2seq model: the Transformer. Our model outperforms their model on a suite of different metrics on two different datasets: their Romance data of 8,000 cognates spanning 5 languages and a Chinese dataset (Hou 2004) of 800+ cognates spanning 39 varieties. We also probe our model for potential phylogenetic signal contained in the model. Our code is publicly available at https://github.com/cmu-llab/acl-2023.	翻訳日:2023-07-07 11:00:51 公開日:2023-07-06
# Align with Purpose: General Plug-and-Play Frameworkを用いたCTCモデルにおけるDesiredプロパティの最適化 Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework ( http://arxiv.org/abs/2307.01715v2 ) ライセンス: Link先を確認	Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua and Tal Rosenwein	(参考訳) コネクショニスト時間分類(ctc)は、教師付きシーケンシャル・ツー・シークエンス(seq2seq)モデルの訓練に広く用いられている基準である。これは不完全なアライメントを犠牲にして、完全なアライメント(基礎となる真実を生み出す)を余分にすることで、入力シーケンスと出力シーケンスの関係を学習することができる。完全かつ不完全なアライメントのこの二項微分は、他の現実世界の応用において重要な重要なアライメント特性を捉えていない。ここでは、CTC基準でトレーニングされたモデルにおいて、所望のプロパティを強化するために、$\textbf{ general Plug-and-Play framework}$を提案する。我々は、所望の特性に応じてアライメントを優先順位付けする追加の損失項でCTCを補完する。本手法はctc損失関数への干渉を一切必要とせず,様々な特性の最適化を容易にし,完全アライメントと不完全アライメントの区別を可能にする。我々は,ASR(Automatic Speech Recognition)の領域にフレームワークを適用し,その特性選択,アーキテクチャ選択,トレーニングデータセットのスケール(最大280,000時間)において,その汎用性を示す。本フレームワークの有効性を実証するため, 出力時間と単語誤り率(WER)の2つの非関連特性に適用した。前者については、WERの小さな削減によるレイテンシ最適化の最大570msの改善を報告し、後者については、ベースラインモデルよりも4.5%WERの相対的な改善を報告した。私たちの知る限りでは、これらのアプリケーションは我々のものほど大規模なデータを扱うことが実証されたことはない。特に,本手法は数行のコードだけで実装可能であり,アライメントフリーな損失関数やASR以外の領域にも拡張可能である。 Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose $\textit{Align With Purpose}$, a $\textbf{general Plug-and-Play framework}$ for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. Our method does not require any intervention in the CTC loss function, enables easy optimization of a variety of properties, and allows differentiation between both perfect and imperfect alignments. We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset (up to 280,000 hours). To demonstrate the effectiveness of our framework, we apply it to two unrelated properties: emission time and word error rate (WER). For the former, we report an improvement of up to 570ms in latency optimization with a minor reduction in WER, and for the latter, we report a relative improvement of 4.5% WER over the baseline models. To the best of our knowledge, these applications have never been demonstrated to work on a scale of data as large as ours. Notably, our method can be implemented using only a few lines of code, and can be extended to other alignment-free loss functions and to domains other than ASR.	翻訳日:2023-07-07 11:00:36 公開日:2023-07-06
# オンライン手書き署名検証のためのトランスフォーマーの検討 Exploring Transformers for On-Line Handwritten Signature Verification ( http://arxiv.org/abs/2307.01663v2 ) ライセンス: Link先を確認	Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Paula Delgado-Santos, Giuseppe Stragapede, Julian Fierrez, Javier Ortega-Garcia	(参考訳) 近年,ユーザフレンドリーな認証手法としてのモバイルバイオメトリックスの利用が増加している。近年の研究では、トランスフォーマーに基づく新しい行動バイオメトリック認識システムを提案している。オンライン手書き署名検証は、タブレットやスマートフォンなどの電子機器を用いて取得した生体認証に基づいて、被験者の身元を確認することを目的としている。本稿では,オンライン署名検証のための最近のトランスフォーマーに基づくアーキテクチャの適合性について検討する。特に4つの異なる構成が研究され、そのうち2つはVanilla Transformerエンコーダに依存し、他の2つは歩行と行動認識のタスクにうまく適用されている。提案する4つの構成をsvc-ongoing competitionで提案された実験プロトコルに従って評価する。実験の結果は有望であり,オンライン署名検証におけるトランスフォーマーの利用を促進する。 The application of mobile biometrics as a user-friendly authentication method has increased in the last years. Recent studies have proposed novel behavioral biometric recognition systems based on Transformers, which currently outperform the state of the art in several application scenarios. On-line handwritten signature verification aims to verify the identity of subjects, based on their biometric signatures acquired using electronic devices such as tablets or smartphones. This paper investigates the suitability of architectures based on recent Transformers for on-line signature verification. In particular, four different configurations are studied, two of them rely on the Vanilla Transformer encoder, and the two others have been successfully applied to the tasks of gait and activity recognition. We evaluate the four proposed configurations according to the experimental protocol proposed in the SVC-onGoing competition. The results obtained in our experiments are promising, and promote the use of Transformers for on-line signature verification.	翻訳日:2023-07-07 11:00:00 公開日:2023-07-06
# 見ることは信じない: 人間の視覚のプライバシー保護のためのアイデンティティ・ハイダー Seeing is not Believing: An Identity Hider for Human Vision Privacy Protection ( http://arxiv.org/abs/2307.00481v2 ) ライセンス: Link先を確認	Tao Wang, Yushu Zhang, Zixuan Yang, Hua Zhang, and Zhongyun Hua	(参考訳) 大量の撮像された顔画像は、個人を特定するためにデータベースに格納される。しかし、保存された画像は、個人の意思ではなく、プライバシー侵害を引き起こす可能性があるデータマネージャによって、意図的に、または意図せず観察される。既存の保護は、顔の視覚的な内容をわずかに変えるだけで、識別の効用を保ちながら、人間の視覚による真のアイデンティティの推論に影響を受けやすい。本稿では,顔認識器の高識別性を維持しつつ,人間の視力に対する視覚的変化を顕著に抑制するアイデンティティ隠蔽器を提案する。まず、idハイダは、stylegan2の潜在空間を操作して、新たな視覚コンテンツを持つ仮想顔を生成する。特に、仮想顔は、例えばポーズや表現など、元の顔と同じ無関係な属性を持つ。次に、仮想顔の視覚内容が元の顔に転送され、背景が元の顔に置き換えられる。さらに、アイデンティティハイダは、強い転送性を有し、任意の顔認識器が良好な精度を達成できる。適切な実験により,提案手法はプライバシ保護と識別性保存において優れた性能を発揮することが示された。 Massive captured face images are stored in the database for the identification of individuals. However, the stored images can be observed intentionally or unintentionally by data managers, which is not at the will of individuals and may cause privacy violations. Existing protection works only slightly change the visual content of the face while maintaining the utility of identification, making it susceptible to the inference of the true identity by human vision. In this paper, we propose an identity hider that enables significant visual content change for human vision while preserving high identifiability for face recognizers. Firstly, the identity hider generates a virtual face with new visual content by manipulating the latent space in StyleGAN2. In particular, the virtual face has the same irrelevant attributes as the original face, e.g., pose and expression. Secondly, the visual content of the virtual face is transferred into the original face and then the background is replaced with the original one. In addition, the identity hider has strong transferability, which ensures an arbitrary face recognizer can achieve satisfactory accuracy. Adequate experiments show that the proposed identity hider achieves excellent performance on privacy protection and identifiability preservation.	翻訳日:2023-07-07 09:16:06 公開日:2023-07-06

# 有用なコードレビューコメントの特定における進歩を探る

Exploring the Advances in Identifying Useful Code Review Comments ( http://arxiv.org/abs/2307.00692v2 )

ライセンス: Link先を確認

Sharif Ahmed and Nasir U. Eisty

(参考訳) 協調ソフトウェア開発における効果的な相互コードレビューは、有用なレビュアーコメントとサポート的な自動化ツールを必要とする。コードレビューのコメントは、業界とオープンソース開発におけるModern Code Reviewプロセスの中心的なコンポーネントである。したがって、これらのコメントがその目的を達成することが重要である。本稿では,コードレビューコメントの有用性に関する研究の進化を振り返る。具体的には、コードレビューコメントの有用性の定義、データセットのマイニングとアノテーション、開発者の認識の調査、さまざまな側面からの要因分析、機械学習分類器によるコードレビューコメントの有用性の自動予測を扱った論文を検討する。最後に、将来の研究に向けて、有用なコードレビューコメントを認識する際のオープンな問題と課題について論じる。

Effective peer code review in collaborative software development necessitates useful reviewer comments and supportive automated tools. Code review comments are a central component of the Modern Code Review process in the industry and open-source development. Therefore, it is important to ensure these comments serve their purposes. This paper reflects the evolution of research on the usefulness of code review comments. It examines papers that define the usefulness of code review comments, mine and annotate datasets, study developers' perceptions, analyze factors from different aspects, and use machine learning classifiers to automatically predict the usefulness of code review comments. Finally, it discusses the open problems and challenges in recognizing useful code review comments for future research.

翻訳日:2023-10-23 18:36:25 公開日:2023-07-06

# PersonaGen: ユーザフィードバックからペルソナを生成するツール

PersonaGen: A Tool for Generating Personas from User Feedback ( http://arxiv.org/abs/2307.00390v2 )

ライセンス: Link先を確認

Xishuo Zhang, Lin Liu, Yi Wang, Xiao Liu, Hailong Wang, Anqi Ren, Chetan Arora

(参考訳) ペルソナはソフトウェア開発プロセス、特にアジャイル環境では不可欠です。しかしながら、アジャイルソフトウェア開発プロセスにおけるユーザからのフィードバックからペルソナを生成する効果的なツールはありません。このギャップを埋めるために、GPT-4モデルとナレッジグラフを使用して、よく処理されたユーザフィードバックからペルソナテンプレートを生成し、アジャイルソフトウェア開発プロセスにおける要求分析を容易にする新しいツールを提案する。我々はPersonaGenというツールを開発した。学生ソフトウェアプロジェクトを対象とした小規模なユーザスタディから得た質的なフィードバックを用いてPersonaGenを評価した。結果は賛否が分かれ,ペルソナに基づく教育実践と非機能要件への対応における課題が浮き彫りになった。

Personas are crucial in software development processes, particularly in agile settings. However, no effective tools are available for generating personas from user feedback in agile software development processes. To fill this gap, we propose a novel tool that uses the GPT-4 model and knowledge graph to generate persona templates from well-processed user feedback, facilitating requirement analysis in agile software development processes. We developed a tool called PersonaGen. We evaluated PersonaGen using qualitative feedback from a small-scale user study involving student software projects. The results were mixed, highlighting challenges in persona-based educational practice and addressing non-functional requirements.

翻訳日:2023-10-23 18:34:37 公開日:2023-07-06

# TEASER: シミュレーションに基づく自動運転車ソフトウェアのCANバス回帰テスト

TEASER: Simulation-based CAN Bus Regression Testing for Self-driving Cars Software ( http://arxiv.org/abs/2307.03279v1 )

ライセンス: Link先を確認

Christian Birchler, Cyrill Rohrbach, Hyeongkyun Kim, Alessio Gambi, Tianhai Liu, Jens Horneber, Timo Kehrer, Sebastiano Panichella

(参考訳) 自動運転車(SDC)のような安全クリティカルなシステムのためのソフトウェアシステムは厳格にテストする必要がある。特に、SDCの電子制御ユニット(ECU)は、現実的な入力データでテストする必要がある。この文脈では、一般的に、コントローラエリアネットワーク(CAN)と呼ばれる通信プロトコルが、センサーデータをSDC制御ユニットに転送するために使用される。 SDCメンテナとテスタにとっての課題は、現実の世界におけるSDCの状態を現実的に表現するCANインプットを手動で定義する必要があることだ。この課題に対処するため,我々は,最先端の自動車シミュレータのセンサから取得したデータをもとに,SDC向けの現実的なCAN信号を生成するツールであるTEASERを開発した。自動車分野の企業であるaicas GmbHのDevOpsパイプラインへの統合機能に基づいてTEASERを評価した。具体的には、Jenkinsで構成されたContinuous Integration(CI)パイプラインにTEASERを統合しました。パイプラインは、シミュレーション環境でテストケースを実行し、CANバス上のセンサデータを、テスト対象である物理CANデバイスに送信する。 TEASERは,シミュレーションに基づく故障(回帰戦略を用いて)を公開するCIテストケースの生成と実行が可能であり,実世界におけるSDCの実態を現実的に表現するCAN入力を生成する。この結果は,SDCソフトウェアにおけるシミュレーションベースCANバス回帰テストの自動化と有効性を高める上で重要である。ツール: https://doi.org/10.5281/zenodo.7964890 github: https://github.com/christianbirchler-org/sdc-scissor/releases/tag/v2.2.0-rc.1 ドキュメント: https://sdc-scissor.readthedocs.io

Software systems for safety-critical systems like self-driving cars (SDCs) need to be tested rigorously. Especially electronic control units (ECUs) of SDCs should be tested with realistic input data. In this context, a communication protocol called Controller Area Network (CAN) is typically used to transfer sensor data to the SDC control units. A challenge for SDC maintainers and testers is the need to manually define the CAN inputs that realistically represent the state of the SDC in the real world. To address this challenge, we developed TEASER, which is a tool that generates realistic CAN signals for SDCs obtained from sensors from state-of-the-art car simulators. We evaluated TEASER based on its integration capability into a DevOps pipeline of aicas GmbH, a company in the automotive sector. Concretely, we integrated TEASER in a Continuous Integration (CI) pipeline configured with Jenkins. The pipeline executes the test cases in simulation environments and sends the sensor data over the CAN bus to a physical CAN device, which is the test subject. Our evaluation shows the ability of TEASER to generate and execute CI test cases that expose simulation-based faults (using regression strategies); the tool produces CAN inputs that realistically represent the state of the SDC in the real world. This result is of critical importance for increasing automation and effectiveness of simulation-based CAN bus regression testing for SDC software. Tool: https://doi.org/10.5281/zenodo.7964890 GitHub: https://github.com/christianbirchler-org/sdc-scissor/releases/tag/v2.2.0-rc.1 Documentation: https://sdc-scissor.readthedocs.io

翻訳日:2023-10-23 18:15:27 公開日:2023-07-06
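
To make the data flow described in the TEASER abstract more concrete, the following is a minimal sketch of how one simulated sensor reading could be packed into the 8-byte payload of a classic CAN data frame and decoded again. The arbitration ID `0x123`, the signal layout (speed as an unsigned 16-bit value in steps of 0.01 m/s, steering angle as a signed 16-bit value in steps of 0.1 degree), and the function names are all hypothetical; the abstract does not describe TEASER's actual encoding.

```python
import struct

# All three constants are illustrative assumptions, not values from TEASER.
FRAME_ID = 0x123        # hypothetical arbitration ID
SPEED_SCALE = 0.01      # hypothetical: 1 raw unit = 0.01 m/s
STEERING_SCALE = 0.1    # hypothetical: 1 raw unit = 0.1 degree


def encode_frame(speed_mps: float, steering_deg: float) -> tuple[int, bytes]:
    """Pack simulator readings into an (arbitration ID, 8-byte payload) pair."""
    raw_speed = round(speed_mps / SPEED_SCALE)
    raw_steering = round(steering_deg / STEERING_SCALE)
    # "<Hh": little-endian unsigned 16-bit, then signed 16-bit.
    payload = struct.pack("<Hh", raw_speed, raw_steering)
    # A classic CAN data field holds at most 8 bytes; pad the rest with zeros.
    return FRAME_ID, payload.ljust(8, b"\x00")


def decode_frame(payload: bytes) -> tuple[float, float]:
    """Recover (speed in m/s, steering angle in degrees) from a payload."""
    raw_speed, raw_steering = struct.unpack_from("<Hh", payload)
    return raw_speed * SPEED_SCALE, raw_steering * STEERING_SCALE
```

A regression test in this style would compare the decoded values received by the device under test against the values the simulator produced, within the resolution of the chosen scale factors.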

# ニューラルネットワークをガイドする芸術的戦略

Artistic Strategies to Guide Neural Networks ( http://arxiv.org/abs/2307.07521v1 )

ライセンス: Link先を確認

Varvara Guljajeva, Mar Canet Sola, Isaac Joseph Clarke

(参考訳) 人工知能は文化の生成と分布に存在している。アーティストはどのようにニューラルネットワークを利用するのか? これらのアルゴリズムが芸術的実践に与える影響は? 本稿では,現在のai技術,より正確にはディープニューラルネットワークの可能性と限界について,画像,テキスト,フォーム,および記号空間の翻訳の文脈で検討する。比較的短時間で高解像度画像と3Dオブジェクトの生成が達成された。 CLIPやtext2meshのような、出力と同じ種類のメディア入力を必要としないモデルがあります。このようなツイストはクリエイティビティの刺激に寄与し、アートの実践で現れ、開発者のパイプラインにフィードバックします。またしても、アートワークがテクノロジー開発の触媒となる様子が見られます。これらの創造的なシナリオとプロセスは、AIモデルだけでなく、これらの新技術の実装の背後にある懸命な努力によって実現されます。 AIは'プッシュ・ア・ボタン'の傑作を作るのではなく、その背後にある技術と創造的で批判的な考え方を深く理解する必要があります。このように、AIはインスピレーションのための新しい道を開き、新しいツールセットを提供する。

Artificial Intelligence is present in the generation and distribution of culture. How do artists exploit neural networks? What impact do these algorithms have on artistic practice? Through a practice-based research methodology, this paper explores the potentials and limits of current AI technology, more precisely deep neural networks, in the context of image, text, form and translation of semiotic spaces. In a relatively short time, the generation of high-resolution images and 3D objects has been achieved. There are models, like CLIP and text2mesh, that do not need the same kind of media input as the output; we call them translation models. Such a twist contributes toward creativity arousal, which manifests itself in art practice and feeds back to the developers' pipeline. Yet again, we see how artworks act as catalysts for technology development. Those creative scenarios and processes are enabled not solely by AI models, but by the hard work behind implementing these new technologies. AI does not create a 'push-a-button' masterpiece but requires a deep understanding of the technology behind it, and a creative and critical mindset. Thus, AI opens new avenues for inspiration and offers novel tool sets, and yet again the question of authorship is asked.

翻訳日:2023-07-23 12:15:00 公開日:2023-07-06

# 大規模言語モデルの限界・損害・リスクの増幅

Amplifying Limitations, Harms and Risks of Large Language Models ( http://arxiv.org/abs/2307.04821v1 )

ライセンス: Link先を確認

Michael O'Neill and Mark Connor

(参考訳) 本稿では、人工知能(AI)とその能力に関する誇大広告の急増と、AIが知覚的かつ超知的なものになると起こりうるSFシナリオに関する話がもたらす混乱に対抗すべく、小さなジェスチャーとして提示する。また、この分野外の人たちがAI技術の制限についてもっと情報を得るのに役立つかもしれない。一般的な談話の現在の文脈では、AIのデフォルトは、ChatGPTの作成に使用されるような基盤モデルや大規模言語モデル(LLM)を意味する。これはそれ自体、AIの分野を真に表している多様性、深さ、研究の量、研究者、技術の誤表現である。 AIは、少なくとも1950年代からソフトウェアアーチファクトに存在した研究分野である。私たちはLLMのいくつかの制限を強調することにしました。そのために、すでに害が発生しており、これらの制限のために引き続き起こり続けることを強調しました。その過程で私たちは、この技術を使用する個人や組織に関連するリスクについても強調しています。

We present this article as a small gesture in an attempt to counter what appears to be exponentially growing hype around Artificial Intelligence (AI) and its capabilities, and the distraction provided by the associated talk of science-fiction scenarios that might arise if AI should become sentient and super-intelligent. It may also help those outside of the field to become more informed about some of the limitations of AI technology. In the current context of popular discourse AI defaults to mean foundation and large language models (LLMs) such as those used to create ChatGPT. This in itself is a misrepresentation of the diversity, depth and volume of research, researchers, and technology that truly represents the field of AI. AI is a field of research that has existed in software artefacts since at least the 1950s. We set out to highlight a number of limitations of LLMs, and in so doing highlight that harms have already arisen and will continue to arise due to these limitations. Along the way we also highlight some of the associated risks for individuals and organisations in using this technology.

翻訳日:2023-07-16 04:05:12 公開日:2023-07-06

# S2vNTM: 半教師付きvMFニューラルトピックモデリング

S2vNTM: Semi-supervised vMF Neural Topic Modeling ( http://arxiv.org/abs/2307.04804v1 )

ライセンス: Link先を確認

Weijie Xu, Jay Desai, Srinivasan Sengamedu, Xiaoyu Jiang, Francis Iannacci

(参考訳) 言語モデルに基づく手法はテキスト分類の強力な手法である。しかし、モデルにはいくつかの欠点がある。 (1) キーワードなどの人的知識を統合することは困難である。 (2) モデルをトレーニングするには多くのリソースが必要です。 (3) 事前学習に大規模なテキストデータを必要とする。本稿では,これらの課題を克服するためのセミスーパービジョンvMFニューラルトピックモデリング(S2vNTM)を提案する。 S2vNTMはいくつかのシードキーワードをトピックの入力として取り込む。 S2vNTMはキーワードのパターンを利用して潜在的なトピックを特定し、トピックのキーワードセットの品質を最適化する。様々なデータセットにおいて、S2vNTMは、限定キーワードによる分類精度において、既存の半教師付きトピックモデリング手法よりも優れている。 S2vNTMはベースラインの少なくとも2倍の速度である。

Language model based methods are powerful techniques for text classification. However, the models have several shortcomings. (1) It is difficult to integrate human knowledge such as keywords. (2) It needs a lot of resources to train the models. (3) It relies on large text data to pretrain. In this paper, we propose Semi-Supervised vMF Neural Topic Modeling (S2vNTM) to overcome these difficulties. S2vNTM takes a few seed keywords as input for topics. S2vNTM leverages the pattern of keywords to identify potential topics, as well as optimize the quality of topics' keywords sets. Across a variety of datasets, S2vNTM outperforms existing semi-supervised topic modeling methods in classification accuracy with limited keywords provided. S2vNTM is at least twice as fast as baselines.

翻訳日:2023-07-16 04:03:44 公開日:2023-07-06

# UniCoRN:認知信号と人間の言語をブリッジする統合認知信号再構成

UniCoRN: Unified Cognitive Signal ReconstructioN bridging cognitive signals and human language ( http://arxiv.org/abs/2307.05355v1 )

ライセンス: Link先を確認

Nuwa Xi, Sendong Zhao, Haochun Wang, Chi Liu, Bing Qin and Ting Liu

(参考訳) 認知信号(例えばfMRI)からテキスト刺激を復号することで、人間の言語システムに対する理解を深め、汎用的なBrain-Computer Interfaceを構築する道を開く。しかし、既存の研究の多くは、個々の単語レベルのfMRIボリュームを制限された語彙から復号することに重点を置いており、これは実世界での応用にはあまりに理想化されている。本稿では,fMRI時系列と人間の言語を橋渡しすることを目指す最初のオープン語彙課題であるfMRI2textを提案する。さらに,この新しい課題の可能性を探究するために,ベースラインソリューションであるUniCoRN (Unified Cognitive Signal ReconstructioN for Brain Decoding) を提示する。個々の時点と時系列の両方を再構成することで、UniCoRNは認知信号(fMRIとEEG)のためのロバストなエンコーダを確立する。事前訓練された言語モデルをデコーダとして活用することにより、UniCoRNは様々な分割設定でfMRI系列からコヒーレントなテキストを復号する有効性を示している。このモデルは、fMRI2text上で34.77%のBLEUスコアを達成し、EEG-to-textデコーディングに一般化すると37.04%のBLEUを達成し、以前のベースラインを上回った。実験結果から, 連続fMRIボリュームの復号化の実現可能性, 統合構造を用いた異なる認知信号の復号化の有効性が示唆された。

Decoding text stimuli from cognitive signals (e.g. fMRI) enhances our understanding of the human language system, paving the way for building versatile Brain-Computer Interface. However, existing studies largely focus on decoding individual word-level fMRI volumes from a restricted vocabulary, which is far too idealized for real-world application. In this paper, we propose fMRI2text, the first open-vocabulary task aiming to bridge fMRI time series and human language. Furthermore, to explore the potential of this new task, we present a baseline solution, UniCoRN: the Unified Cognitive Signal ReconstructioN for Brain Decoding. By reconstructing both individual time points and time series, UniCoRN establishes a robust encoder for cognitive signals (fMRI & EEG). Leveraging a pre-trained language model as decoder, UniCoRN proves its efficacy in decoding coherent text from fMRI series across various split settings. Our model achieves a 34.77% BLEU score on fMRI2text, and a 37.04% BLEU when generalized to EEG-to-text decoding, thereby surpassing the former baseline. Experimental results indicate the feasibility of decoding consecutive fMRI volumes, and the effectiveness of decoding different cognitive signals using a unified structure.

翻訳日:2023-07-16 03:54:59 公開日:2023-07-06

# 解釈可能かつ効率的なPPG信号品質評価とアーティファクトセグメンテーションのための学習カーネル

Learned Kernels for Interpretable and Efficient PPG Signal Quality Assessment and Artifact Segmentation ( http://arxiv.org/abs/2307.05385v1 )

ライセンス: Link先を確認

Sully F. Chen, Zhicheng Guo, Cheng Ding, Xiao Hu, Cynthia Rudin

(参考訳) Photoplethysmography (PPG) は、様々な心血管パラメータを継続的に監視する、低コストで非侵襲的な方法を提供する。 PPG信号はウェアラブルデバイスによって生成され、人体の動きなどの外部要因によって引き起こされる大きなアーティファクトを頻繁に含む。生理学的パラメータのロバストで正確な抽出を確保するために、信号の破損領域を識別し適切に処理する必要がある。従来の方法論は、準最適な性能しか得られない手作りの特徴検出器や信号メトリクスに依存するか、あるいは解釈性に欠け、計算量とメモリ消費の大きいディープニューラルネットワーク(DNN)のような機械学習技術に依存していた。本研究では,最先端のDNNアプローチと同等,しばしばそれ以上の性能を数桁少ないパラメータで達成する,解釈可能な畳み込みカーネルの小さな集合を学習する新しい手法を提案する。この研究により、低消費電力デバイス上で、効率的で堅牢かつ解釈可能な信号品質評価とアーティファクトセグメンテーションが可能になる。

Photoplethysmography (PPG) provides a low-cost, non-invasive method to continuously monitor various cardiovascular parameters. PPG signals are generated by wearable devices and frequently contain large artifacts caused by external factors, such as motion of the human subject. In order to ensure robust and accurate extraction of physiological parameters, corrupted areas of the signal need to be identified and handled appropriately. Previous methodology relied either on handcrafted feature detectors or signal metrics which yield sub-optimal performance, or relied on machine learning techniques such as deep neural networks (DNN) which lack interpretability and are computationally and memory intensive. In this work, we present a novel method to learn a small set of interpretable convolutional kernels that has performance similar to -- and often better than -- the state-of-the-art DNN approach with several orders of magnitude fewer parameters. This work allows for efficient, robust, and interpretable signal quality assessment and artifact segmentation on low-power devices.

翻訳日:2023-07-16 03:44:09 公開日:2023-07-06

# 動的およびパーソナライズされたカープーリングサービスのための機械学習ランキングアルゴリズム

A Machine-Learned Ranking Algorithm for Dynamic and Personalised Car Pooling Services ( http://arxiv.org/abs/2307.05697v1 )

ライセンス: Link先を確認

Mattia Giovanni Campana, Franca Delmastro, Raffaele Bruno

(参考訳) カープーリングは、ドライバーが旅程や時間帯の似た旅行者と車を共有できるようにすることで、都市の交通渋滞や大気汚染の低減に大きく貢献すると期待されている。多くのカープールマッチングサービスが、ドライバーと潜在的な乗客のプール内で、効率的にライドマッチを見つけるために設計されている。しかし現在では、単純なモビリティニーズ以外の多くの非金銭的側面や社会的配慮が、相乗りに対する個々人の意思に影響を及ぼす可能性があり、それらは予測が難しいと認識されている。そこで本研究では,カープーリングサービスのためのレコメンダシステムであるGoTogetherを提案する。このシステムでは,ランキング学習(learning-to-rank)技術を活用して,選択履歴(受け入れた、または拒否した相乗りの種類)から各ユーザのパーソナライズされたランキングモデルを自動的に導出する。次にGoTogetherは、提示されたマッチの成功率を最大化するように推奨する相乗りのリストを構築する。提案手法の性能を検証するため,TwitterやFoursquareの情報源からの実際のデータを利用して,大都市圏におけるもっともらしい移動パターンや乗車要求のデータセットを生成する。提案手法は,静的条件と動的条件の両方において,パーソナライズされたユーザの選択モデルの正確な予測を迅速に得られることを示す。

Car pooling is expected to significantly help in reducing traffic congestion and pollution in cities by enabling drivers to share their cars with travellers with similar itineraries and time schedules. A number of car pooling matching services have been designed in order to efficiently find successful ride matches in a given pool of drivers and potential passengers. However, it is now recognised that many non-monetary aspects and social considerations, besides simple mobility needs, may influence the individual willingness of sharing a ride, which are difficult to predict. To address this problem, in this study we propose GoTogether, a recommender system for car pooling services that leverages learning-to-rank techniques to automatically derive the personalised ranking model of each user from the history of her choices (i.e., the type of accepted or rejected shared rides). Then, GoTogether builds the list of recommended rides in order to maximise the success rate of the offered matches. To test the performance of our scheme we use real data from Twitter and Foursquare sources in order to generate a dataset of plausible mobility patterns and ride requests in a metropolitan area. The results show that the proposed solution quickly obtains an accurate prediction of the personalised user's choice model both in static and dynamic conditions.

翻訳日:2023-07-16 03:25:27 公開日:2023-07-06
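
The GoTogether abstract describes deriving a per-user ranking model from accepted and rejected rides. As a rough illustration of that idea, the sketch below trains a pairwise perceptron ranker, a deliberately simple stand-in, since the abstract does not say which learning-to-rank algorithm the authors use. The feature vectors (for example detour minutes and departure-time gap) and the function names are hypothetical.

```python
def train_pairwise_ranker(accepted, rejected, epochs=50, lr=0.1):
    """Learn weights w so that every accepted ride scores above every rejected one.

    accepted / rejected: lists of equal-length numeric feature vectors
    taken from one user's (hypothetical) choice history.
    """
    dim = len(accepted[0])
    w = [0.0] * dim
    for _ in range(epochs):
        for a in accepted:
            for r in rejected:
                # Score difference between the accepted and the rejected ride.
                margin = sum(wi * (ai - ri) for wi, ai, ri in zip(w, a, r))
                if margin <= 0:
                    # Pair is mis-ordered or tied: perceptron update
                    # towards the accepted ride.
                    w = [wi + lr * (ai - ri) for wi, ai, ri in zip(w, a, r)]
    return w


def rank_rides(w, candidates):
    """Return candidate rides ordered from most to least preferred."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return sorted(candidates, key=score, reverse=True)
```

Because the model is trained per user, two users with different histories can receive different orderings of the same candidate rides, which is the personalisation the abstract refers to.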

# 高速広帯域分光器を用いたSPDC光源のスペクトル特性評価

Spectral characterization of a SPDC source with a fast broadband spectrometer ( http://arxiv.org/abs/2307.06843v1 )

ライセンス: Link先を確認

Brianna Farella, Gregory Medwig, Raphael A. Abrahao, and Andrei Nomerotski

(参考訳) Spontaneous Parametric Down-Conversion (SPDC) 光源で生成された単一光子の特性を知ることは、特定のアプリケーションや用途に不可欠である。特に、スペクトル特性は極めて重要である。本稿では,我々の高速広帯域分光器を用いて商用SPDC光源を調べる。我々の分析は、他のSPDC光源や他の単一光子生成技術にも有効な手法であり、この分光器設計の使い方のよい例を提供する。我々はアルゴン発光スペクトルの既知の線を用いて分光器を校正する。 SPDC光源からの2つの下方変換光子がポンプパワーに依存して異なるスペクトル特性を持つこと、およびどの条件でスペクトル的に類似した下方変換光子が測定されたかを示す。最後に,ポンプ光子のスペクトル情報を再構成し,検討することができた。

Knowing the properties of the single photons produced in a Spontaneous Parametric Down-Conversion (SPDC) source can be crucial for specific applications and uses. In particular, the spectral properties are of key relevance. Here, we investigate a commercial SPDC source using our fast broadband spectrometer. Our analysis is a valid method for other SPDC sources, as well as other single-photon generation techniques, thus providing a good example of how to use this spectrometer design. We calibrate the spectrometer using known lines of the argon emission spectrum. We show that the two down-converted photons from the SPDC source have different spectral properties depending on the pump power, and in which condition we measured spectrally similar down-converted photons. Lastly, we were able to reconstruct and investigate the spectral information for the pump photon.

翻訳日:2023-07-16 03:16:33 公開日:2023-07-06
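
Two steps mentioned in the SPDC abstract lend themselves to a small worked example: calibrating a spectrometer against known argon emission lines, and reconstructing the pump wavelength from the two down-converted photons. The sketch below assumes a linear pixel-to-wavelength relation and uses energy conservation, 1/λ_pump = 1/λ_signal + 1/λ_idler; whether the authors use exactly these procedures is not stated in the abstract. The pixel positions passed to the fit are hypothetical, and the function names are illustrative.

```python
def fit_linear_calibration(pixels, wavelengths_nm):
    """Least-squares fit of wavelength = slope * pixel + offset.

    pixels: measured peak positions of reference lines (hypothetical values).
    wavelengths_nm: the known wavelengths of those lines.
    """
    n = len(pixels)
    mean_p = sum(pixels) / n
    mean_w = sum(wavelengths_nm) / n
    cov = sum((p - mean_p) * (w - mean_w) for p, w in zip(pixels, wavelengths_nm))
    var = sum((p - mean_p) ** 2 for p in pixels)
    slope = cov / var
    return slope, mean_w - slope * mean_p


def pump_wavelength_nm(signal_nm, idler_nm):
    """Energy conservation in SPDC: 1/lambda_pump = 1/lambda_signal + 1/lambda_idler."""
    return 1.0 / (1.0 / signal_nm + 1.0 / idler_nm)
```

For a degenerate pair the pump wavelength is half the down-converted wavelength, so two photons at 810 nm correspond to a pump near 405 nm.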

# ソーシャルメディア上の摂食障害コンテンツの同定のためのサイト非依存型マルチモーダル深層学習モデル

A Novel Site-Agnostic Multimodal Deep Learning Model to Identify Pro-Eating Disorder Content on Social Media ( http://arxiv.org/abs/2307.06775v1 )

ライセンス: Link先を確認

Jonathan Feldman

(参考訳) 過去10年間で、摂食障害の診断や摂食障害による死亡が急増し、新型コロナウイルス(Covid-19)のパンデミックで絶頂期を迎えた。この大きな増加は、パンデミックのストレス要因だけでなく、摂食障害を助長するコンテンツに溢れるソーシャルメディアへの露出の増加にも起因している。このような内容は視聴者の摂食障害を引き起こすことがある。本研究では,視覚データとテキストデータの組み合わせに基づいて,あるソーシャルメディア投稿が摂食障害を助長するかどうかを判断できるマルチモーダル深層学習モデルの構築を目的とした。ツイートのラベル付きデータセットがTwitterから収集され、12のディープラーニングモデルがトレーニングされ、テストされた。モデル性能に基づいて、最も効果的なディープラーニングモデルは、RoBERTa自然言語処理モデルとMaxViT画像分類モデルのマルチモーダル融合であり、精度とF1スコアはそれぞれ95.9%と0.959であった。 RoBERTaとMaxViTの融合モデルは、ソーシャルメディアサイトTumblrとRedditの未ラベルの投稿のデータセットを分類するためにデプロイされ、人工知能を使わない以前の研究と同様の分類を生成し、人工知能が研究者と一致する洞察を得られることを示した。さらに、このモデルは、Twitterの8つのハッシュタグの未見のツイートの時系列分析に使われ、摂食障害を助長するコンテンツの相対的な量が大幅に減少していることが判明した。しかし、2018年ごろから、摂食障害を助長するコンテンツは減少が止まったか、あるいは再び増加している。

Over the last decade, there has been a vast increase in eating disorder diagnoses and eating disorder-attributed deaths, reaching their zenith during the Covid-19 pandemic. This immense growth derived in part from the stressors of the pandemic but also from increased exposure to social media, which is rife with content that promotes eating disorders. Such content can induce eating disorders in viewers. This study aimed to create a multimodal deep learning model capable of determining whether a given social media post promotes eating disorders based on a combination of visual and textual data. A labeled dataset of Tweets was collected from Twitter, upon which twelve deep learning models were trained and tested. Based on model performance, the most effective deep learning model was the multimodal fusion of the RoBERTa natural language processing model and the MaxViT image classification model, attaining accuracy and F1 scores of 95.9% and 0.959 respectively. The RoBERTa and MaxViT fusion model, deployed to classify an unlabeled dataset of posts from the social media sites Tumblr and Reddit, generated similar classifications as previous research studies that did not employ artificial intelligence, showing that artificial intelligence can develop insights congruent to those of researchers. Additionally, the model was used to conduct a time-series analysis of yet unseen Tweets from eight Twitter hashtags, uncovering that the relative abundance of pro-eating disorder content has decreased drastically. However, since approximately 2018, pro-eating disorder content has either stopped its decline or risen once more in ampleness.

翻訳日:2023-07-16 03:15:37 公開日:2023-07-06

# 口腔上皮性異形成症における悪性度予測のための完全自動説明可能アルゴリズム

A Fully Automated and Explainable Algorithm for the Prediction of Malignant Transformation in Oral Epithelial Dysplasia ( http://arxiv.org/abs/2307.03757v1 )

ライセンス: Link先を確認

Adam J Shephard, Raja Muhammad Saad Bashir, Hanya Mahmood, Mostafa Jahanifar, Fayyaz Minhas, Shan E Ahmed Raza, Kris D McCombe, Stephanie G Craig, Jacqueline James, Jill Brooks, Paul Nankivell, Hisham Mehanna, Syed Ali Khurram, Nasir M Rajpoot

(参考訳) 口腔上皮性異形成症(OED)は,口腔の病変に対して下される前がん性の病理組織学的診断である。そのグレード判定は観察者間・観察者内の変動が大きく、悪性化への進行を確実に予測できないため、最適でない治療決定につながる可能性がある。そこで我々は,Haematoxylin and Eosin染色の全スライド画像における組織学的パターンに基づいて口腔悪性転化(OMT)のリスクスコアを割り当て,OED進行のリスクを定量化する,新しい人工知能アルゴリズムを開発した。このアルゴリズムは、内製のセグメンテーションモデルを用いた上皮内(および周辺)の核の検出とセグメンテーションに基づいている。次に,組織学的マーカーを模した解釈可能な形態的・空間的特徴を入力とする浅層ニューラルネットワークを用いた。開発コホート (Sheffield; n = 193例) について内部交差検証を行い, 2つの外部コホート (Birmingham and Belfast; n = 92例) について独立した検証を行った。提案された OMTscore は、OED が悪性化するか否かの予測において AUROC = 0.74 を得る。生存分析の結果,手動で付与されたWHOグレードおよびバイナリグレードと比較して,悪性転化の予測におけるOMTscoreの予後的価値が示された。正しく予測された症例の解析により,悪性転化した症例の最も予測力の高いパッチに上皮周囲および上皮内浸潤リンパ球が存在することが明らかになった(p < 0.0001)。これは、外部データセットで検証しつつ、解釈可能な核の特徴に基づいてOEDの悪性転化を予測する完全に自動化されたアルゴリズムを提案する最初の研究である。本アルゴリズムは,OEDの悪性転化の予測において人間のレベルを上回る性能を示し,通常の臨床実践におけるOEDのグレード判定の課題に対して,有望な解決策を提供する。

Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. Its grading suffers from significant inter-/intra-observer variability, and does not reliably predict malignancy progression, potentially leading to suboptimal treatment decisions. To address this, we developed a novel artificial intelligence algorithm that can assign an Oral Malignant Transformation (OMT) risk score, based on histological patterns in Haematoxylin and Eosin stained whole slide images, to quantify the risk of OED progression. The algorithm is based on the detection and segmentation of nuclei within (and around) the epithelium using an in-house segmentation model. We then employed a shallow neural network fed with interpretable morphological/spatial features, emulating histological markers. We conducted internal cross-validation on our development cohort (Sheffield; n = 193 cases) followed by independent validation on two external cohorts (Birmingham and Belfast; n = 92 cases). The proposed OMTscore yields an AUROC = 0.74 in predicting whether an OED progresses to malignancy or not. Survival analyses showed the prognostic value of our OMTscore for predicting malignant transformation, when compared to the manually-assigned WHO and binary grades. Analysis of the correctly predicted cases elucidated the presence of peri-epithelial and epithelium-infiltrating lymphocytes in the most predictive patches of cases that transformed (p < 0.0001). This is the first study to propose a completely automated algorithm for predicting OED transformation based on interpretable nuclear features, whilst being validated on external datasets. The algorithm shows better-than-human-level performance for prediction of OED malignant transformation and offers a promising solution to the challenges of grading OED in routine clinical practice.

翻訳日:2023-07-11 17:47:05 公開日:2023-07-06

# FITS: $10k$パラメータによる時系列モデリング

FITS: Modeling Time Series with $10k$ Parameters ( http://arxiv.org/abs/2307.03756v1 )

ライセンス: Link先を確認

Zhijian Xu, Ailing Zeng, Qiang Xu

(参考訳) 本稿では,時系列解析のための軽量かつ強力なモデルであるFITSを紹介する。生の時間領域データを直接処理する既存のモデルとは異なり、FITSは複素周波数領域での補間によって時系列を操作できるという原理に基づいている。時系列データにほとんど影響を与えない高周波成分を廃棄することにより、FITSは、約$10k$のパラメータしか持たないにもかかわらず、時系列予測や異常検出タスクの最先端モデルに匹敵する性能を達成する。このような軽量なモデルは、簡単にトレーニングしてエッジデバイスにデプロイでき、さまざまなアプリケーションのための機会を生み出す。匿名のコードリポジトリは https://anonymous.4open.science/r/FITS で公開されている。

In this paper, we introduce FITS, a lightweight yet powerful model for time series analysis. Unlike existing models that directly process raw time-domain data, FITS operates on the principle that time series can be manipulated through interpolation in the complex frequency domain. By discarding high-frequency components with negligible impact on time series data, FITS achieves performance comparable to state-of-the-art models for time series forecasting and anomaly detection tasks, while having a remarkably compact size of only approximately $10k$ parameters. Such a lightweight model can be easily trained and deployed in edge devices, creating opportunities for various applications. The anonymous code repo is available at: https://anonymous.4open.science/r/FITS
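The frequency-domain idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how FITS performs the interpolation, so plain zero-padding of the spectrum is swapped in here as an assumption. The names `dft`, `idft`, `lowpass_interpolate` and the `cutoff` parameter are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (no external dependencies)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spec):
    """Inverse of dft(), including the 1/n normalisation."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def lowpass_interpolate(x, out_len, cutoff):
    """Resample the real series x to out_len points, keeping only bins 0..cutoff."""
    n = len(x)
    if not 0 <= cutoff < min(n, out_len) / 2:
        raise ValueError("cutoff must lie below the Nyquist bin of both lengths")
    spec = dft(x)
    padded = [0j] * out_len
    for k in range(cutoff + 1):
        padded[k] = spec[k]                    # non-negative frequency
        if k > 0:
            padded[out_len - k] = spec[n - k]  # matching negative frequency
    scale = out_len / n                        # compensates for the longer inverse transform
    return [(v * scale).real for v in idft(padded)]


# A slow cosine plus a small fast component, sampled at 8 points.
series = [
    math.cos(2 * math.pi * t / 8) + 0.1 * math.cos(2 * math.pi * 3 * t / 8)
    for t in range(8)
]
# Doubling the length with cutoff=2 drops the fast component (bin 3).
upsampled = lowpass_interpolate(series, out_len=16, cutoff=2)
```

In this toy case the 16-point output follows the slow cosine alone, because the component in bin 3 lies above the cutoff and is discarded before the inverse transform.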

翻訳日:2023-07-11 17:46:30 公開日:2023-07-06

# 読むか、見るか、聞くか? マルチモーダルデータセットの解決に必要なこと

Read, Look or Listen? What's Needed for Solving a Multimodal Dataset ( http://arxiv.org/abs/2307.04532v1 )

ライセンス: Link先を確認

Netta Madvil, Yonatan Bitton, Roy Schwartz

(参考訳) 大規模マルチモーダルデータセットの普及は,データセットの品質を評価する上で,ユニークな課題を示す。マルチモーダル・インスタンスを処理に必要なモダリティにマップするために、人間のアノテーションの小さなシードを利用するマルチモーダル・データセットを2段階解析する手法を提案する。提案手法は,データセットにおける異なるモダリティの重要性と,それらの関係に光を当てる。ビデオ質問応答データセットであるTVQAに我々のアプローチを適用し、ほとんどの質問が特定のモダリティに対して実質的な偏見を伴わずに単一のモダリティで答えられることを発見した。さらに、ビデオを見たり、音声を聴いたりして、テレビQAにおける複数のモダリティの限定的な統合を強調したりすることで、70%以上の質問が、いくつかの異なる単一モダリティ戦略を用いて解決可能であることがわかった。我々はアノテーションを利用してMERLOTリザーブを解析し、テキストや音声よりも画像に基づく質問に苦しむが、聴覚話者の識別にも苦しむことを発見した。そこで本研究では,複数のモダリティを必要とする新しいテストセットを導入し,モデル性能の劇的な低下を観測する。我々の方法論は、マルチモーダルデータセットに関する貴重な洞察を提供し、より堅牢なモデルの開発の必要性を強調します。

The prevalence of large-scale multimodal datasets presents unique challenges in assessing dataset quality. We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it. Our method sheds light on the importance of different modalities in datasets, as well as the relationship between them. We apply our approach to TVQA, a video question-answering dataset, and discover that most questions can be answered using a single modality, without a substantial bias towards any specific modality. Moreover, we find that more than 70% of the questions are solvable using several different single-modality strategies, e.g., by either looking at the video or listening to the audio, highlighting the limited integration of multiple modalities in TVQA. We leverage our annotation and analyze the MERLOT Reserve, finding that it struggles with image-based questions compared to text and audio, but also with auditory speaker identification. Based on our observations, we introduce a new test set that necessitates multiple modalities, observing a dramatic drop in model performance. Our methodology provides valuable insights into multimodal datasets and highlights the need for the development of more robust models.

翻訳日:2023-07-11 13:01:50 公開日:2023-07-06

# ChatGPTの反応は従来の自然言語処理を促進するか?

Can ChatGPT's Responses Boost Traditional Natural Language Processing? ( http://arxiv.org/abs/2307.04648v1 )

ライセンス: Link先を確認

Mostafa M. Amin, Erik Cambria, Björn W. Schuller

(参考訳) 基礎モデルの利用は、特にChatGPTの公開と他の基礎モデルのリリースにより、着実に拡大している。これらのモデルは、問題解決のために特別に訓練されることなく、創発的な能力によって問題を解く可能性を示してきた。先行研究では、情動計算タスクにおいてこうした創発的能力が示されており、その性能は従来の自然言語処理(NLP)技術と同程度であったが、RoBERTa言語モデルの微調整のような特化して訓練されたモデルには及ばなかった。本研究は,ChatGPTが、既存の特化モデルと融合した場合にそれらを強化する新しい知識を持つかどうかを探索することによってこれを拡張する。これは,ダウンストリームタスクの解決に関するChatGPTの冗長な応答の有用性を調べ,さらにそれを既存のNLP手法と融合することの有用性を検討することで実現される。本研究は,感情分析,自殺傾向検出,ビッグファイブパーソナリティ評価という3つの情動計算問題について行った。その結果,ChatGPTは,早期融合であれ後期融合であれ,融合によって既存のNLP技術を改善できる新たな知識を実際に持っていると結論づけられた。

The employment of foundation models is steadily expanding, especially with the launch of ChatGPT and the release of other foundation models. These models have shown the potential of emerging capabilities to solve problems without being specifically trained to solve them. A previous work demonstrated these emerging capabilities in affective computing tasks; the performance quality was similar to traditional Natural Language Processing (NLP) techniques, but fell short of specialised trained models, such as a fine-tuned RoBERTa language model. In this work, we extend this by exploring whether ChatGPT has novel knowledge that would enhance existing specialised models when they are fused together. We achieve this by investigating the utility of verbose responses from ChatGPT about solving a downstream task, in addition to studying the utility of fusing them with existing NLP methods. The study is conducted on three affective computing problems, namely sentiment analysis, suicide tendency detection, and big-five personality assessment. The results conclude that ChatGPT indeed has novel knowledge that can improve existing NLP techniques by way of fusion, be it early or late fusion.

翻訳日:2023-07-11 12:31:36 公開日:2023-07-06

# core-gpt: オープンアクセス研究と大規模言語モデルを組み合わせた信頼性の高い質問応答

CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering ( http://arxiv.org/abs/2307.04683v1 )

ライセンス: Link先を確認

David Pride, Matteo Cancellieri and Petr Knoth

(参考訳) 本稿では,GPTに基づく言語モデルと3200万件以上の全文オープンアクセス科学論文を組み合わせた質問応答プラットフォームであるCORE-GPTを提案する。まず、GPT3.5とGPT4は、生成されたテキストへの参照や引用を頼りにできないことを示す。次に,質問に対するエビデンスに基づく回答を提示するcore-gptと引用論文への引用とリンクを導入し,回答の信頼性を大幅に向上させ,幻覚のリスクを低減させる。 CORE-GPTのパフォーマンスは、COREの上位20の科学領域をカバーする100の質問のデータセットで評価され、100の回答と500の関連記事へのリンクが得られた。得られた回答の質とリンクの関連性は2つのアノテータで評価した。以上の結果から,CORE-GPTは科学的領域の大部分を包括的で信頼性の高い回答が得られ,真に関連のある科学論文へのリンクが得られた。

In this paper, we present CORE-GPT, a novel question-answering platform that combines GPT-based language models and more than 32 million full-text open access scientific articles from CORE. We first demonstrate that GPT3.5 and GPT4 cannot be relied upon to provide references or citations for generated text. We then introduce CORE-GPT which delivers evidence-based answers to questions, along with citations and links to the cited papers, greatly increasing the trustworthiness of the answers and reducing the risk of hallucinations. CORE-GPT's performance was evaluated on a dataset of 100 questions covering the top 20 scientific domains in CORE, resulting in 100 answers and links to 500 relevant articles. The quality of the provided answers and the relevance of the links were assessed by two annotators. Our results demonstrate that CORE-GPT can produce comprehensive and trustworthy answers across the majority of scientific domains, complete with links to genuine, relevant scientific articles.

翻訳日:2023-07-11 12:22:56 公開日:2023-07-06

# トルコ語音声テキスト学習モデルの性能比較:Whisper-SmallとWav2Vec2-XLS-R-300M

Performance Comparison of Pre-trained Models for Speech-to-Text in Turkish: Whisper-Small and Wav2Vec2-XLS-R-300M ( http://arxiv.org/abs/2307.04765v1 )

ライセンス: Link先を確認

Oyku Berfin Mercan, Sercan Cepni, Davut Emre Tasar, Sukru Ozan

(参考訳) 本研究では,事前学習された2つの音声からテキストへの多言語モデルであるwhisper-smallとwav2vec2-xls-r-300mモデルの性能について検討した。 Mozilla Common Voiceバージョン11.0はトルコ語で準備されており、オープンソースのデータセットである。多言語モデルであるWhisper-SmallとWav2Vec2-XLS-R-300Mは、少量のデータを含むこのデータセットで微調整された。 2つのモデルの音声とテキストのパフォーマンスを比較した。 WER値は、それぞれWav2Vec2-XLS-R-300MとWhisper-Smallモデルの0.28と0.16と計算される。さらに、トレーニングおよび検証データセットに含まれていないコールセンターレコードを作成したテストデータを用いて、モデルの性能について検討した。

In this study, the performance of Whisper-Small and Wav2Vec2-XLS-R-300M, two pre-trained multilingual speech-to-text models, was examined for the Turkish language. The Turkish portion of Mozilla Common Voice version 11.0, an open-source dataset, was used in the study. The multilingual models, Whisper-Small and Wav2Vec2-XLS-R-300M, were fine-tuned with this dataset, which contains a small amount of data. The speech-to-text performance of the two models was compared. WER values were calculated as 0.28 and 0.16 for the Wav2Vec2-XLS-R-300M and the Whisper-Small models respectively. In addition, the performance of the models was examined with test data prepared from call center records that were not included in the training and validation datasets.
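For readers unfamiliar with the WER (word error rate) figures quoted above, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is an illustration only; the abstract does not say which tool the authors used, and the function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# One of four reference words is missing from the transcript.
score = word_error_rate("bugün hava çok güzel", "bugün hava güzel")
```

Under this definition, a WER of 0.16 corresponds to roughly 16 word-level errors per 100 reference words.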

翻訳日:2023-07-11 12:05:13 公開日:2023-07-06

# 単語意味論と音韻学がアルツハイマー病患者の手書きにどのように影響するか : 機械学習による分析

How word semantics and phonology affect handwriting of Alzheimer's patients: a machine learning based analysis ( http://arxiv.org/abs/2307.04762v1 )

ライセンス: Link先を確認

Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Sabato Marco Siniscalchi

(参考訳) 神経変性疾患の診断を支援するために手書き文字のキネマティックな特性を利用することは、真の課題である。非侵襲的な検出技術と機械学習アプローチの組み合わせは、この研究分野における大きな前進を約束する。文献で提案されてきた課題は,筆跡運動を誘発するための様々な認知的スキルに着目したものである。特に、書き写す単語の意味と音韻は、筆記の流暢さを損なう可能性がある。本稿では,アルツハイマー病の影響を受ける人の筆跡に単語意味論と音韻学がどのように影響するかを検討した。この目的のために,6つの手書き課題から得られたデータを用いた。各課題では,規則語(予測可能な音素-書記素対応を持つ,例えば cat),非規則語(非定型的な音素-書記素対応を持つ、例えば laugh),非単語(音素-書記素変換規則に準拠した,意味を持たない発音可能な文字列)のいずれかのカテゴリに属する単語を書き写す必要がある。我々は、よく知られ広く使われている4つの分類器と特徴選択を実装することで、機械学習アプローチを用いてデータを分析した。実験の結果,特徴選択により,単語タイプごとに識別性の高い異なる特徴集合を導出できることがわかった。さらに、非規則語は、平均してより多くの特徴を必要としたが、優れた分類性能を達成した: 最良の結果は非規則語で得られ、90%近い精度に達した。

Using kinematic properties of handwriting to support the diagnosis of neurodegenerative disease is a real challenge: non-invasive detection techniques combined with machine learning approaches promise big steps forward in this research field. In the literature, the proposed tasks have focused on different cognitive skills to elicit handwriting movements. In particular, the meaning and phonology of the words to copy can compromise writing fluency. In this paper, we investigated how word semantics and phonology affect the handwriting of people affected by Alzheimer's disease. To this aim, we used the data from six handwriting tasks, each requiring copying a word belonging to one of the following categories: regular (having a predictable phoneme-grapheme correspondence, e.g., cat), non-regular (having an atypical phoneme-grapheme correspondence, e.g., laugh), and non-word (non-meaningful pronounceable letter strings that conform to phoneme-grapheme conversion rules). We analyzed the data using a machine learning approach by implementing four well-known and widely-used classifiers and feature selection. The experimental results showed that the feature selection allowed us to derive a different set of highly distinctive features for each word type. Furthermore, non-regular words needed, on average, more features but achieved excellent classification performance: the best result was obtained on a non-regular word, reaching an accuracy close to 90%.

翻訳日:2023-07-11 12:04:56 公開日:2023-07-06

# Heteroscedastic noise modelによる患者特異的ルートの同定

Identifying Patient-Specific Root Causes with the Heteroscedastic Noise Model ( http://arxiv.org/abs/2205.13085v2 )

ライセンス: Link先を確認

Eric V. Strobl, Thomas A. Lasko

(参考訳) 複雑な疾患は、同一の診断カテゴリー内でも患者によって異なる様々な要因によって引き起こされる。根底にあるいくつかの原因は、それぞれの患者で疾患の発生を引き起こす可能性がある。そこで我々は,構造方程式モデルにおける外因性誤り項の標本特異的な予測値に類似した疾患の患者固有の根本原因の同定に焦点をあてた。 y = m(x) + \varepsilon\sigma(x)$ で条件付き平均と平均絶対偏差を表す非線型関数 $m(x)$ と $\sigma(x)$ を持つような、線形設定からヘテロシドスティックノイズモデルへ一般化する。このモデルは識別可能性を保持しますが、エラー項を正しく抽出するために一般化ルート因果推論(grci)と呼ばれるカスタマイズアルゴリズムを必要とする非自明な課題を導入します。 GRCIは、既存の代替品よりも患者固有の根本原因を正確に回復する。

Complex diseases are caused by a multitude of factors that may differ between patients even within the same diagnostic category. A few underlying root causes may nevertheless initiate the development of disease within each patient. We therefore focus on identifying patient-specific root causes of disease, which we equate to the sample-specific predictivity of the exogenous error terms in a structural equation model. We generalize from the linear setting to the heteroscedastic noise model where $Y = m(X) + \varepsilon\sigma(X)$ with non-linear functions $m(X)$ and $\sigma(X)$ representing the conditional mean and mean absolute deviation, respectively. This model preserves identifiability but introduces non-trivial challenges that require a customized algorithm called Generalized Root Causal Inference (GRCI) to extract the error terms correctly. GRCI recovers patient-specific root causes more accurately than existing alternatives.
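The heteroscedastic noise model $Y = m(X) + \varepsilon\sigma(X)$ can be made concrete with a small simulation. This sketch is not GRCI: it assumes, purely for illustration, that $m$ and $\sigma$ are known, whereas the algorithm described in the abstract has to estimate the error terms from data. The specific functions `m`, `sigma` and the helper names are hypothetical. The noise is rescaled so that $E|\varepsilon| = 1$, which lets $\sigma(X)$ play the role of a conditional mean absolute deviation.

```python
import math
import random


def m(x):
    """Hypothetical non-linear conditional mean."""
    return math.sin(x)


def sigma(x):
    """Hypothetical strictly positive scale function."""
    return 0.5 + x * x


def simulate(n, seed=0):
    """Draw (x, y, eps) triples from Y = m(X) + eps * sigma(X)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Standard normal rescaled so that E|eps| = 1.
        eps = rng.gauss(0.0, 1.0) * math.sqrt(math.pi / 2)
        rows.append((x, m(x) + eps * sigma(x), eps))
    return rows


def recover_error(x, y):
    """Invert the model: eps = (y - m(x)) / sigma(x)."""
    return (y - m(x)) / sigma(x)


rows = simulate(100)
max_gap = max(abs(recover_error(x, y) - eps) for x, y, eps in rows)
```

With the true functions in hand, the error term is recovered up to floating-point rounding. Per the abstract, the hard part is extracting these terms correctly when $m$ and $\sigma$ are not given.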

翻訳日:2023-07-10 16:14:27 公開日:2023-07-06

# エンドツーエンドのマルチモーダルファクトチェックと説明生成: 挑戦的なデータセットとモデル

End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models ( http://arxiv.org/abs/2205.12487v2 )

ライセンス: Link先を確認

Barry Menglong Yao (1), Aditya Shah (1), Lichao Sun (2), Jin-Hee Cho (1), Lifu Huang (1) ((1) Virginia Tech, (2) Lehigh University)

(参考訳) 本稿では,エンドツーエンドのマルチモーダルなファクトチェックと説明生成を提案する。入力はクレームと,記事,画像,ビデオ,ツイートを含む大量のWebソースの集合であり,関連する証拠を検索して真実性ラベル(例えば,支持,反証,情報不足)を予測することでクレームの真実性を評価し,さらに推論と裁定の過程を要約・説明する文を生成することを目的とする。この研究を支援するために,各クレームに真実性ラベルと裁定文を付した15,601件のクレームと,証拠となる合計33,880段落のテキストおよび12,112枚の画像からなる大規模データセットであるMochegを構築した。Mocheg上でのベースライン性能を確立するため,マルチモーダル証拠検索,クレーム検証,説明生成という3つのパイプラインサブタスクについて複数の最先端ニューラルアーキテクチャで実験を行い,最先端のエンドツーエンド・マルチモーダルファクトチェックの性能が満足できる結果にならないことを示した。私たちの知る限りでは、エンドツーエンドのマルチモーダルファクトチェックと説明生成のためのベンチマークデータセットとソリューションを構築したのは我々が初めてである。データセット、ソースコード、モデルチェックポイントは https://github.com/VT-NLP/Mocheg で入手できる。

We propose end-to-end multimodal fact-checking and explanation generation, where the input is a claim and a large collection of web sources, including articles, images, videos, and tweets, and the goal is to assess the truthfulness of the claim by retrieving relevant evidence and predicting a truthfulness label (e.g., support, refute or not enough information), and to generate a statement to summarize and explain the reasoning and ruling process. To support this research, we construct Mocheg, a large-scale dataset consisting of 15,601 claims where each claim is annotated with a truthfulness label and a ruling statement, and 33,880 textual paragraphs and 12,112 images in total as evidence. To establish baseline performances on Mocheg, we experiment with several state-of-the-art neural architectures on the three pipelined subtasks: multimodal evidence retrieval, claim verification, and explanation generation, and demonstrate that the performance of the state-of-the-art end-to-end multimodal fact-checking does not provide satisfactory outcomes. To the best of our knowledge, we are the first to build the benchmark dataset and solutions for end-to-end multimodal fact-checking and explanation generation. The dataset, source code and model checkpoints are available at https://github.com/VT-NLP/Mocheg.

翻訳日:2023-07-10 16:14:08 公開日:2023-07-06

# ブレッドスファーストパイプライン並列処理

Breadth-First Pipeline Parallelism ( http://arxiv.org/abs/2211.05953v2 )

ライセンス: Link先を確認

Joel Lamy-Poirier

(参考訳) パイプラインとデータ並列性の組み合わせを最適化する,新たなトレーニングスケジュールであるBreadth-First Pipeline Parallelismを導入する。 Breadth-First Pipeline Parallelismは、高いGPU使用率とGPUあたりの小さなバッチサイズを両立させ、完全シャードデータ並列性を利用することで、トレーニング時間、コスト、メモリ使用量を低減する。実験では、GPUあたりのバッチサイズが小さい520億パラメータのモデルにおいて、Megatron-LMと比較してトレーニングスループットが最大43%向上することを確認した。これにより、大規模なGPUクラスタではトレーニング時間とコストが同じだけ削減されることになる。

We introduce Breadth-First Pipeline Parallelism, a novel training schedule which optimizes the combination of pipeline and data parallelism. Breadth-First Pipeline Parallelism lowers training time, cost and memory usage by combining a high GPU utilization with a small batch size per GPU, and by making use of fully sharded data parallelism. Experimentally, we observed an increase of up to 43% in training throughput for a 52 billion-parameter model using a small batch size per GPU compared to Megatron-LM, which would reduce the training time and cost by the same amount on a large GPU cluster.

翻訳日:2023-07-10 16:05:54 公開日:2023-07-06

# 優先マトロイド中央値に対するbicriteria近似アルゴリズム

Bicriteria Approximation Algorithms for Priority Matroid Median ( http://arxiv.org/abs/2210.01888v2 )

ライセンス: Link先を確認

Tanvi Bajpai and Chandra Chekuri

(参考訳) フェアネスの考慮は近年、新しいクラスタリング問題とアルゴリズムを動機付けている。本稿では,最近研究されている優先度 $k$-median 問題を一般化した優先度マトロイド中央値問題を考える。入力は、計量空間$(\mathcal{F} \cup \mathcal{C},d)$にある施設の集合$\mathcal{F}$とクライアントの集合$\mathcal{C}$、および施設上のマトロイド$\mathcal{M}=(\mathcal{F},\mathcal{I})$からなる。さらに、各クライアント$j$ は指定された半径 $r_j \ge 0$ を持ち、各施設 $i \in \mathcal{F}$ は開設コスト $f_i$ を持つ。目的は、次の2つの制約のもとで $\sum_{i \in S} f_i + \sum_{j \in \mathcal{C}} d(j,S)$ を最小化する施設の部分集合 $S \subseteq \mathcal{F}$ を選択することである。 (i) $S$は$\mathcal{M}$の独立集合である(つまり$S \in \mathcal{I}$)。 (ii) 各クライアント$j$に対して、開設された施設までの距離は高々$r_j$である(つまり$d(j,S) \le r_j$)。この問題に対して、固定定数$c_1,c_2$に対する最初のbicriteria $(c_1,c_2)$近似を与える。すなわち、クライアントの半径制約の違反は高々$c_1$倍に抑えられ、目的コストは最適コストの高々$c_2$倍である。また、一様半径設定($r_j := L$ $\forall j \in \mathcal{C}$)に対する既知のbicriteria近似も改善する。

Fairness considerations have motivated new clustering problems and algorithms in recent years. In this paper we consider the Priority Matroid Median problem which generalizes the Priority $k$-Median problem that has recently been studied. The input consists of a set of facilities $\mathcal{F}$ and a set of clients $\mathcal{C}$ that lie in a metric space $(\mathcal{F} \cup \mathcal{C},d)$, and a matroid $\mathcal{M}=(\mathcal{F},\mathcal{I})$ over the facilities. In addition each client $j$ has a specified radius $r_j \ge 0$ and each facility $i \in \mathcal{F}$ has an opening cost $f_i$. The goal is to choose a subset $S \subseteq \mathcal{F}$ of facilities to minimize $\sum_{i \in S} f_i + \sum_{j \in \mathcal{C}} d(j,S)$ subject to two constraints: (i) $S$ is an independent set in $\mathcal{M}$ (that is $S \in \mathcal{I}$) and (ii) for each client $j$, its distance to an open facility is at most $r_j$ (that is, $d(j,S) \le r_j$). For this problem we describe the first bicriteria $(c_1,c_2)$ approximations for fixed constants $c_1,c_2$: the radius constraints of the clients are violated by at most a factor of $c_1$ and the objective cost is at most $c_2$ times the optimum cost. We also improve the previously known bicriteria approximation for the uniform radius setting ($r_j := L$ $\forall j \in \mathcal{C}$).
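To make the problem statement concrete, the sketch below evaluates a candidate solution $S$ on a toy instance: it computes the objective (opening costs of the opened facilities plus client connection distances), tests matroid independence, and checks the radius constraints relaxed by a factor $c_1$ as in the bicriteria guarantee. It only evaluates solutions and is not the approximation algorithm from the paper. The instance, the uniform matroid of rank `k` (which corresponds to the Priority $k$-Median special case) and all function names are assumptions made for illustration.

```python
def dist_to_set(j, S, d):
    """d(j, S): distance from client j to its nearest opened facility."""
    return min(d[j][i] for i in S)


def objective(S, clients, d, f):
    """Opening costs of the facilities in S plus all connection distances."""
    return sum(f[i] for i in S) + sum(dist_to_set(j, S, d) for j in clients)


def is_independent_uniform(S, k):
    """Independence oracle of a uniform matroid of rank k: at most k facilities."""
    return len(S) <= k


def satisfies_radii(S, clients, d, r, c1=1.0):
    """True if every client j has an opened facility within c1 * r_j."""
    return all(dist_to_set(j, S, d) <= c1 * r[j] for j in clients)


# Toy instance: two facilities, three clients (distances given directly).
f = {"a": 1, "b": 2}
clients = [1, 2, 3]
d = {1: {"a": 1, "b": 5}, 2: {"a": 2, "b": 2}, 3: {"a": 6, "b": 1}}
r = {1: 2, 2: 2, 3: 3}

cost_a = objective({"a"}, clients, d, f)
cost_ab = objective({"a", "b"}, clients, d, f)
```

On this instance, opening only facility `a` leaves client 3 at distance 6, which violates its radius of 3 exactly but is allowed once the radii are relaxed by $c_1 = 2$. Opening both facilities satisfies every radius but is independent only when the rank is at least 2.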

翻訳日:2023-07-10 16:05:02 公開日:2023-07-06

# キャプション生成のための視覚的セマンティック類似表現:学習した教訓

Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned ( http://arxiv.org/abs/2209.12817v2 )

ライセンス: Link先を確認

Ahmed Sabir

(参考訳) 本稿では,画像キャプチャ生成システムによって生成されるキャプションの強化に着目する。本稿では,モデルが生成する最も可能性の高い出力ではなく,最も関連性の高い出力を選択することでキャプション生成システムを改善する手法を提案する。我々のモデルは視覚的文脈の観点から言語生成出力ビーム探索を改訂する。画像中の関連情報と適切なキャプションを一致させるために,単語と文レベルの視覚的意味尺度を用いる。提案手法は後処理に基づく手法として任意の字幕システムに適用できる。

This paper focuses on enhancing the captions generated by image-caption generation systems. We propose an approach for improving caption generation systems by choosing the most closely related output to the image rather than the most likely output produced by the model. Our model revises the language generation output beam search from a visual context perspective. We employ a visual semantic measure in a word and sentence level manner to match the proper caption to the related information in the image. The proposed approach can be applied to any caption system as a post-processing based method.

翻訳日:2023-07-10 16:03:53 公開日:2023-07-06

# ボース・アインシュタイン凝縮体における集合励起を用いた量子レジスタ

A quantum register using collective excitations in a Bose-Einstein condensate ( http://arxiv.org/abs/2211.09252v3 )

ライセンス: Link先を確認

Elisha Haber (1), Zekai Chen (1 and 2), Nicholas P. Bigelow (1) ((1) University of Rochester, (2) University of Innsbruck)

(参考訳) 原子の集合からなる量子ビットは、原子の損失に対する耐性から魅力的であり、そのような量子ビットを実現するための多くの提案は、リドベルク封鎖効果に基づいている。代わりに、空間的に重なり合うボース-アインシュタイン凝縮体からスピン依存光学格子をコヒーレントにロードする実験的に実現可能なプロトコルを考える。各格子サイトを量子ビットとして同定し, 空あるいは充填されたサイトを量子ビットとして, 高忠実度単一量子ビット演算, 任意の量子ビット間の2量子ゲート, 非破壊測定を行う方法について検討した。この設定では、原子損失の影響は緩和され、原子は基底状態多様体から取り除かれる必要はなく、量子ビットの別個の記憶と計算の基盤は不要であり、これらは全て他の多くの種類の原子量子ビットにおいて重要なデコヒーレンスの原因となる。

A qubit made up of an ensemble of atoms is attractive due to its resistance to atom losses, and many proposals to realize such a qubit are based on the Rydberg blockade effect. In this work, we instead consider an experimentally feasible protocol to coherently load a spin-dependent optical lattice from a spatially overlapping Bose--Einstein condensate. Identifying each lattice site as a qubit, with an empty or filled site as the qubit basis, we discuss how high-fidelity single-qubit operations, two-qubit gates between arbitrary pairs of qubits, and nondestructive measurements could be performed. In this setup, the effect of atom losses has been mitigated, the atoms never need to be removed from the ground state manifold, and separate storage and computational bases for the qubits are not required, all of which can be significant sources of decoherence in many other types of atomic qubits.

翻訳日:2023-07-10 15:53:57 公開日:2023-07-06

# Calibrated Interpretation:Semantic Parsingにおける信頼度推定

Calibrated Interpretation: Confidence Estimation in Semantic Parsing ( http://arxiv.org/abs/2211.07443v6 )

ライセンス: Link先を確認

Elias Stengel-Eskin and Benjamin Van Durme

(参考訳) シーケンス生成モデルは、自然言語をプログラムに変換するために、すなわち実行可能なセマンティック解析を実行するために、ますます使われている。セマンティック解析は、現実世界で実行されるアクションにつながるプログラムを予測することを目的としているという事実は、安全なシステムの開発を動機付けている。これにより、特に安全性の中心的な要素であるキャリブレーションの測定が重要になる。一般的な4つのセマンティックパーシングデータセットのキャリブレーションを調査し、モデルやデータセットによって異なることを発見した。次に、キャリブレーションエラーに関連する要因を分析し、2つの解析データセットの新しい信頼度に基づく課題分割をリリースする。セマンティック解析評価にキャリブレーションを組み込むことを容易にするため,キャリブレーションメトリクスを計算するためのライブラリをリリースする。

Sequence generation models are increasingly being used to translate natural language into programs, i.e. to perform executable semantic parsing. The fact that semantic parsing aims to predict programs that can lead to executed actions in the real world motivates developing safe systems. This in turn makes measuring calibration -- a central component to safety -- particularly important. We investigate the calibration of popular generation models across four popular semantic parsing datasets, finding that it varies across models and datasets. We then analyze factors associated with calibration error and release new confidence-based challenge splits of two parsing datasets. To facilitate the inclusion of calibration in semantic parsing evaluations, we release a library for computing calibration metrics.
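As an illustration of what measuring calibration involves, here is a minimal sketch of expected calibration error (ECE), a commonly used calibration metric: predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged with weights proportional to bin size. The abstract does not name the metrics the paper uses, and this is not the API of the library the authors release. The function name, the equal-width binning and the sample numbers are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need two non-empty sequences of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Four predicted programs, each with confidence 0.9, of which three are correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

Here the model is overconfident: it claims 90% confidence while being right 75% of the time, so the error is 0.15.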

翻訳日:2023-07-10 15:53:38 公開日:2023-07-06

# 量子論におけるリレーショナルシズムを超えて--量子論の新しい不確定性に基づく解釈

Beyond relationalism in quantum theory: A new indeterminacy-based interpretation of quantum theory ( http://arxiv.org/abs/2304.00608v5 )

ライセンス: Link先を確認

Francisco Pipa

(参考訳) 物理学の基礎と哲学における受け入れられた見解は、ある隠れ変数を持つ補足量子論(QT)を拒絶し、ユニタリQTが正しいとみなすならば、QTのリレーショナルな解釈を採用するべきであるというものである。関係論的な解釈は、測定結果、例えば世界、システム、エージェント、参照フレームに相対化する。それは多世界解釈、関係量子力学、QBismを含んでいる。これらの解釈は、それらのリレーショナルな解釈と結びつく潜在的なコストを持つ。したがって、非リレーショナルな非隠れ変数 QT の普遍的 intepretations が存在するなら、真剣に考えるべきである。環境決定性理論(environmental determinacy-based or end quantum theory,endqt)と呼ばれる、リレーショナル主義と受け取られた見解を超えた解釈を提示する。 endqtは、ユニタリな非隠れ変数ユニバーサルqtを維持しながら、関係性を持たない不確定値と決定値と基礎となる量子特性の考慮を構築することによって、関係性を回避する。 EnDQTによると、関係論者が、拡張されたウィグナーの友人シナリオのような測定結果が相対化されると仮定する場合、決定的な結果ではなく、非関係的な不決定値を持つシステムが存在する。このアプローチでは、特定のシステムを通じてある時点の値が発生し、特定のネットワークで表現された特定の相互作用を通じてその値が持続する。これらのネットワークに属する他のシステム、例えば拡張ウィグナーの友人のシナリオにおける友人の研究室内部から隔離された場合、非関係的な値が内部で発生する。ベル相関の局所的因果説明や、これらのネットワークで表現された新しい実証的実証例など、endqtを採用する他の独立した正当な理由について論じる。

The received view in foundations and philosophy of physics holds that if we reject supplementing quantum theory (QT) with certain hidden variables and consider that unitary QT is correct and universal, we should adopt a relationalist interpretation of QT. Relationalist interpretations relativize measurement outcomes to, for example, worlds, systems, agents, or reference frames. They include the Many-Worlds Interpretation, Relational Quantum Mechanics, and QBism. These interpretations have potential costs connected with their relationalism that make them unattractive. Thus, if there exists a non-relational non-hidden variable universal interpretation of QT, it should be taken seriously. I will present an interpretation of this kind called Environmental Determinacy-based or EnD Quantum Theory (EnDQT), which goes beyond relationalism and the received view. EnDQT circumvents relationalism by constructing an account of indeterminate and determinate values and underlying quantum properties that is not relational while maintaining unitary non-hidden variable universal QT. In situations where a relationalist assumes that measurement outcomes are relativized, such as in the extended Wigner's friend scenarios, according to EnDQT there aren't determinate outcomes but systems with non-relational indeterminate values. In this approach, determinate values arose at some point in time through certain systems and persist due to them via certain interactions represented by certain networks. When there is isolation from the rest of the systems that belong to these networks, such as inside the friend's lab in the extended Wigner's friend scenarios, indeterminate values non-relationally arise inside. I will discuss other independent good reasons for adopting EnDQT, including providing a local causal explanation of Bell correlations and novel empirical posits represented by these networks.

翻訳日:2023-07-10 15:36:53 公開日:2023-07-06

# 情報検索のための埋め込みAPIの評価

Evaluating Embedding APIs for Information Retrieval ( http://arxiv.org/abs/2305.06300v2 )

ライセンス: Link先を確認

Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, Nandan Thakur, David Alfonso-Hermelo, Mehdi Rezagholizadeh, Jimmy Lin

(参考訳) 言語モデルのサイズが拡大するにつれ、コミュニティへの普及が加速し、多くの企業がAPIを通じて大きな言語モデルにアクセスできるようになる。密集検索に適した1つの特定のタイプは、入力テキストのベクトル表現を構築するセマンティック埋め込みサービスである。公開apiの数が増えているため,本論文の目標は,既存の提供物を現実的な検索シナリオで分析し,実践者や研究者がニーズに応じて適切なサービスを見つけるのを支援することである。具体的には、ドメインの一般化と多言語検索における既存のセマンティック埋め込みAPIの機能について検討する。本研究では,BEIR と MIRACL の2つの標準ベンチマークでこれらのサービスを評価した。このAPIを用いてBM25の結果を再評価することは予算に優しいアプローチであり、第1段階のレトリバーとして使用する標準的なプラクティスとは対照的に、英語でもっとも効果的である。非英語検索の場合、再ランク付けは結果を改善するが、bm25のハイブリッドモデルの方が高いコストで機能する。我々は,情報アクセスにおいて,検索において重要なセマンティック埋め込みAPIを評価するための基礎を築き上げたい。

The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs. One particular type, suitable for dense retrieval, is a semantic embedding service that builds vector representations of input text. With a growing number of publicly available APIs, our goal in this paper is to analyze existing offerings in realistic retrieval scenarios, to assist practitioners and researchers in finding suitable services according to their needs. Specifically, we investigate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval. For this purpose, we evaluate these services on two standard benchmarks, BEIR and MIRACL. We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English, in contrast to the standard practice of employing them as first-stage retrievers. For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost. We hope our work lays the groundwork for evaluating semantic embedding APIs that are critical in search and more broadly, for information access.
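The two strategies compared in the abstract, re-ranking BM25 candidates with embeddings and combining both signals in a hybrid, can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the vectors stand in for whatever an embedding API would return (no real API is called), the hybrid is applied to the BM25 candidate list only, and the min-max normalisation and the weight `alpha` are hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def min_max(values):
    """Scale values to [0, 1]; all-equal inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(query_vec, candidates):
    """Reorder first-stage BM25 candidates by embedding similarity alone."""
    ordered = sorted(candidates, key=lambda c: cosine(query_vec, c[2]), reverse=True)
    return [doc_id for doc_id, _, _ in ordered]


def hybrid(query_vec, candidates, alpha=0.5):
    """Rank by alpha * normalised BM25 + (1 - alpha) * normalised cosine."""
    bm25 = min_max([score for _, score, _ in candidates])
    dense = min_max([cosine(query_vec, vec) for _, _, vec in candidates])
    fused = [alpha * b + (1 - alpha) * s for b, s in zip(bm25, dense)]
    order = sorted(range(len(candidates)), key=lambda i: fused[i], reverse=True)
    return [candidates[i][0] for i in order]


# (doc_id, BM25 score, embedding): toy values, not real API output.
query = [1.0, 0.0]
candidates = [("d1", 12.0, [0.0, 1.0]), ("d2", 9.0, [1.0, 0.0]), ("d3", 5.0, [1.0, 1.0])]
reranked = rerank(query, candidates)
fused_order = hybrid(query, candidates)
```

Re-ranking needs embeddings only for the short candidate list. A hybrid that also retrieves with embeddings would need them for the whole corpus, which is one plausible reading of the higher cost the abstract mentions.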

翻訳日:2023-07-10 15:25:37 公開日:2023-07-06

# 特徴空間密度マッチングによる医用画像分割のための教師なし領域適応

Unsupervised Domain Adaptation for Medical Image Segmentation via Feature-space Density Matching ( http://arxiv.org/abs/2305.05789v2 )

ライセンス: Link先を確認

Tushar Kataria, Beatrice Knudsen, and Shireen Elhabian

(参考訳) セマンティックセグメンテーション(セマンティックセグメンテーション、Semantic segmentation)は、画像の自動解釈と解析において重要なステップである。セマンティックセグメンテーションのためのディープラーニングアプローチは、アノテーション付き画像のパワーを利用して、これらのセマンティッククラスを示す特徴を学習する。それでも、トレーニング(すなわち、ソース)データとデプロイ時に遭遇するデータセット(すなわち、ターゲット)の間に重要なドメイン(すなわち、分散)シフトがある場合、ターゲットデータに対して手動アノテーションを必要とせず、許容可能なパフォーマンスを達成することがしばしばある。異なる画像モダリティは、プロトコールとベンダーの変動性により、サイト内およびサイト間において大きな変動をもたらすため、医療画像において特に重要である。現在の技術はハイパーパラメータチューニングとターゲットデータセットサイズに敏感である。本稿では,対象データのアノテートの必要性を緩和する意味セグメンテーションのための教師なしドメイン適応手法を提案する。カーネル密度推定を用いて,対象データ分布を特徴空間のソース,特に対象サンプル数(対象データセットサイズの3%)が限られている場合と一致させる。提案手法の有効性を2つのデータセット,多部位前立腺MRI,病理組織像に示す。

Semantic segmentation is a critical step in automated image interpretation and analysis where pixels are classified into one or more predefined semantically meaningful classes. Deep learning approaches for semantic segmentation rely on harnessing the power of annotated images to learn features indicative of these semantic classes. Nonetheless, they often fail to generalize when there is a significant domain (i.e., distributional) shift between the training (i.e., source) data and the dataset(s) encountered when deployed (i.e., target), necessitating manual annotations for the target data to achieve acceptable performance. This is especially important in medical imaging because different image modalities have significant intra- and inter-site variations due to protocol and vendor variability. Current techniques are sensitive to hyperparameter tuning and target dataset size. This paper presents an unsupervised domain adaptation approach for semantic segmentation that alleviates the need for annotating target data. Using kernel density estimation, we match the target data distribution to the source in the feature space, particularly when the number of target samples is limited (3% of the target dataset size). We demonstrate the efficacy of our proposed approach on 2 datasets, multisite prostate MRI and histopathology images.
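The density-matching idea can be illustrated with a one-dimensional toy: estimate the source feature density with a Gaussian kernel density estimate, then score how well target features fit it. The abstract does not give the paper's loss, so this sketch substitutes a mean negative log-likelihood as the mismatch score; real features are high-dimensional network activations. The function names, the bandwidth and the sample values are all hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of 1-D samples with Gaussian kernels."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def mismatch(target_feats, source_feats, bandwidth=0.5):
    """Mean negative log-likelihood of target features under the source KDE."""
    density = gaussian_kde(source_feats, bandwidth)
    floor = 1e-12  # avoids log(0) for targets far from every source sample
    return -sum(math.log(density(x) + floor) for x in target_feats) / len(target_feats)


source = [0.0, 0.2, -0.1, 0.1]
aligned_target = [0.05, -0.05]   # features that overlap the source density
shifted_target = [3.0, 3.2]      # features after a simulated domain shift
aligned_score = mismatch(aligned_target, source)
shifted_score = mismatch(shifted_target, source)
```

A lower score means the target features sit where the source density is high. An adaptation procedure of this kind would adjust the feature extractor to reduce the score without needing target labels.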

翻訳日:2023-07-10 15:25:19 公開日:2023-07-06

# ESPnet-ST-v2:多目的音声翻訳ツールキット

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit ( http://arxiv.org/abs/2304.04596v3 )

ライセンス: Link先を確認

Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe

(参考訳) ESPnet-ST-v2はオープンソースのESPnet-STツールキットを改良したものである。 ESPnet-ST-v2 のサポート 1)オフライン音声テキスト翻訳(ST) 2)同時音声テキスト翻訳(SST)、及び 3) オフライン音声音声翻訳(S2ST) -- 各タスクは、ESPnet-ST-v2と他のオープンソースの音声翻訳ツールキットを区別して、幅広いアプローチでサポートされている。このツールキットはトランスデューサ、ハイブリッドCTC/アテンション、検索可能な中間子を持つマルチデコーダ、時間同期ブロックワイドCTC/アテンション、トランスラトトロンモデル、直接離散単位モデルなどの最先端アーキテクチャを提供する。本稿では,https://github.com/espnet/espnetで公開されているespnet-st-v2の背後にある全体的な設計,各タスクのモデル,パフォーマンスベンチマークについて述べる。

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech translation (S2ST) -- each task is supported with a wide variety of approaches, differentiating ESPnet-ST-v2 from other open source spoken language translation toolkits. This toolkit offers state-of-the-art architectures such as transducers, hybrid CTC/attention, multi-decoders with searchable intermediates, time-synchronous blockwise CTC/attention, Translatotron models, and direct discrete unit models. In this paper, we describe the overall design, example models for each task, and performance benchmarking behind ESPnet-ST-v2, which is publicly available at https://github.com/espnet/espnet.

翻訳日:2023-07-10 15:23:28 公開日:2023-07-06

# Einstein-Podolsky-Rosen ステアリングの一方向フィルタ

Filtering one-way Einstein-Podolsky-Rosen steering ( http://arxiv.org/abs/2304.04210v2 )

ライセンス: Link先を確認

Ze-Yan Hao, Yan Wang, Jia-Kun Li, Yu Xiang, Qiong-Yi He, Zheng-Hao Liu, Mu Yang, Kai Sun, Jin-Shi Xu, Chuan-Feng Li, and Guang-Can Guo

(参考訳) EPR(Einstein-Podolsky-Rosen)ステアリング(EPR)は、量子非局所性の基本概念であり、ある観測者が別の観測者の状態に局所的な測定でリモートで影響する能力を記述する。対称量子相関と関連する量子絡み合いやベル非局所性とは異なり、EPRステアリングは量子非局所性のユニークな非対称性を表す。システム成分が廃棄される局所フィルタ演算により、量子非局所性を蒸留して非局所相関を強化することができ、隠れた非局所性も活性化することができる。しかしながら、フィルタ演算における非対称な量子非局所性は、特に量子非局所相関が確率で存在する可能性のある破棄された部分を考えると、十分に取り調べられた研究を欠いている。ここでは,EPRステアリングに対する局所フィルタの効果について,理論と実験の両方において検討する。 EPRステアリングのすべての構成を同時に観察し、一方方向のEPRステアリングの方向を反転させるなど、非対称な量子非局所性の興味深い進化を観察する。この研究は、非対称量子非局所性を理解するための補完的な視点を提供し、非対称量子システムを量子情報タスクに有意な応用で操作するための実用的なツールボックスを示す。

Einstein-Podolsky-Rosen (EPR) steering, a fundamental concept of quantum nonlocality, describes one observer's capability to remotely affect another distant observer's state by local measurements. Unlike quantum entanglement and Bell nonlocality, both associated with the symmetric quantum correlation, EPR steering depicts the unique asymmetric property of quantum nonlocality. With the local filter operation in which some system components are discarded, quantum nonlocality can be distilled to enhance the nonlocal correlation, and even the hidden nonlocality can be activated. However, asymmetric quantum nonlocality in the filter operation still lacks a well-rounded investigation, especially considering the discarded parts, where quantum nonlocal correlations may still exist with some probability. Here, in both theory and experiment, we investigate the effect of the local filter on EPR steering. We observe all configurations of EPR steering simultaneously, as well as other intriguing evolutions of asymmetric quantum nonlocality, such as reversing the direction of one-way EPR steering. This work provides a complementary perspective to understand the asymmetric quantum nonlocality and demonstrates a practical toolbox for manipulating asymmetric quantum systems with significant potential applications in quantum information tasks.

翻訳日:2023-07-10 15:23:10 公開日:2023-07-06

# 大規模言語モデルにおけるオープンドメイン質問応答の評価

Evaluating Open-Domain Question Answering in the Era of Large Language Models ( http://arxiv.org/abs/2305.06984v3 )

ライセンス: Link先を確認

Ehsan Kamalloo, Nouha Dziri, Charles L. A. Clarke, Davood Rafiei

(参考訳) 語彙マッチングは、オープンドメイン質問応答(QA)のデファクト評価方法として残っている。残念なことに、論理的マッチングは、金の答えリストにプラウチブル候補の答えが現れない場合に完全に失敗し、抽出モデルから生成モデルへ移行するにつれて、ますますその傾向が増す。近年の大規模言語モデル (LLMs) の成功により、候補解が長くなると語彙的マッチングの失敗が増加し、ゴールド解とのマッチングはさらに困難になる。正確な評価がなければ、オープンドメインQAの真の進歩は分かっていない。本稿では,一般的なベンチマークであるNQ-openのサブセットを手動で評価することにより,LLMを含む様々なオープンドメインQAモデルの徹底的な分析を行う。私たちの評価では、すべてのモデルの真のパフォーマンスは著しく過小評価されているものの、instructgpt (zero-shot) llmのパフォーマンスは60%近く向上し、既存のトップモデルと同等になり、instructgpt (few-shot) モデルはnq-openの新たな最先端を実際に達成しています。また、語彙マッチング失敗の50%以上が意味論的に等価な答えによるものであることが判明した。さらに、不必要な厳密さに悩まされているにもかかわらず、人間の判断と整合したランクQAモデルを示す。最後に, 自動評価モデルは, LLM が生成する長文解に対してではなく, 語彙マッチングのための合理的なサロゲートであることを示す。自動モデルはLLM回答の幻覚を検出するのに苦労し、LLMを評価することができない。現段階では、人間の評価に代わるものはないようである。

Lexical matching remains the de facto evaluation method for open-domain question answering (QA). Unfortunately, lexical matching fails completely when a plausible candidate answer does not appear in the list of gold answers, which is increasingly the case as we shift from extractive to generative models. The recent success of large language models (LLMs) for QA aggravates lexical matching failures since candidate answers become longer, thereby making matching with the gold answers even more challenging. Without accurate evaluation, the true progress in open-domain QA remains unknown. In this paper, we conduct a thorough analysis of various open-domain QA models, including LLMs, by manually evaluating their answers on a subset of NQ-open, a popular benchmark. Our assessments reveal that while the true performance of all models is significantly underestimated, the performance of the InstructGPT (zero-shot) LLM increases by nearly 60%, making it on par with existing top models, and the InstructGPT (few-shot) model actually achieves a new state-of-the-art on NQ-open. We also find that more than 50% of lexical matching failures are attributed to semantically equivalent answers. We further demonstrate that regex matching ranks QA models consistently with human judgments, although it still suffers from unnecessary strictness. Finally, we demonstrate that automated evaluation models are a reasonable surrogate for lexical matching in some circumstances, but not for long-form answers generated by LLMs. The automated models struggle to detect hallucinations in LLM answers and are thus unable to evaluate LLMs. At this time, there appears to be no substitute for human evaluation.
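
As a rough illustration of the lexical matching discussed in this abstract, the sketch below implements a SQuAD-style exact-match check in Python. The function names `normalize_answer` and `exact_match` and the normalization steps (lowercasing, stripping punctuation and English articles) are assumptions made for illustration; this is not the paper's evaluation script.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(candidate: str, gold_answers: list[str]) -> bool:
    """Return True if the normalized candidate equals any normalized gold answer."""
    cand = normalize_answer(candidate)
    return any(cand == normalize_answer(gold) for gold in gold_answers)


if __name__ == "__main__":
    gold = ["Paris"]
    # A short, extractive-style answer matches the gold list.
    print(exact_match("paris", gold))
    # A correct but longer, generative-style answer does not match.
    print(exact_match("The capital of France is Paris.", gold))
```

Under this scheme, `exact_match("The capital of France is Paris.", ["Paris"])` is false even though the answer is correct, which is the kind of failure the abstract attributes to longer, generative answers.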

翻訳日:2023-07-10 15:14:09 公開日:2023-07-06

# 視線を信じないで - 機能の可視化の信頼性について

Don't trust your eyes: on the (un)reliability of feature visualizations ( http://arxiv.org/abs/2306.04719v4 )

ライセンス: Link先を確認

Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau, Wieland Brendel, Been Kim

(参考訳) ニューラルネットワークはどのようにピクセルからパターンを抽出するか? 機能の可視化は、最適化によって非常に活性化したパターンを視覚化することで、この重要な質問に答えようとしている。今日、可視化手法は、機械的な解釈可能性の一種として、ニューラルネットワークの内部動作に関する我々の知識の基礎を形成している。機能可視化はどの程度信頼できるのか? 我々は,自然入力上での通常のネットワーク動作から完全に切り離された任意のパターンを示すために,特徴可視化を騙すネットワーク回路の開発に着手する。特徴視覚化は標準入力とは全く異なる処理を受けており、ニューラルネットワークが自然言語をどのように処理するかを「説明」する能力に疑問を呈している。特徴視覚化によって確実に理解できる関数の集合は極めて小さく、一般的なブラックボックスニューラルネットワークを含まないことを証明した理論によるこの経験的発見を裏付ける。そのため、より信頼性の高い特徴視覚化を実現するために、特定の構造を強制するネットワークの開発が期待できる。

How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. We underpin this empirical finding by theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.

翻訳日:2023-07-10 15:05:53 公開日:2023-07-06

# 励起状態量子相転移を利用した精密磁気計測

Precision magnetometry exploiting excited state quantum phase transitions ( http://arxiv.org/abs/2306.01126v2 )

ライセンス: Link先を確認

Qian Wang, Ugo Marzolino

(参考訳) 相転移における臨界挙動は精密計測の資源である。理由は、フィッシャー情報として知られるこの関数が臨界点において超指数関数であり、同時にメトロジープロトコルのパフォーマンスを定量化するからである。したがって、位相遷移におけるメロジカルプローブの作成により、遷移制御パラメータの測定精度が向上する。我々は、異なる磁場で励起状態量子相転移を示すリプキン-メシュコフ-グリックモデルに焦点を当てる。モデルスペクトル特性に基づき、フィッシャー情報の広いピークを示し、高精度磁力計の効率的なスキームを提案する。 lipkin-meshkov-glickモデルは、超伝導と核系のために初めて導入され、最近いくつかの凝縮物プラットフォームで実現された。上記のメトロロジースキームは、リプキン-メシュコフ-グリック模型をシミュレートできるシステムの微視的性質を測定するためにも利用できる。

Critical behaviour in phase transitions is a resource for enhanced precision metrology. The reason is that the Fisher information, which quantifies the performance of metrological protocols, is superextensive at critical points. Therefore, preparing metrological probes at phase transitions provides enhanced precision in measuring the transition control parameter. We focus on the Lipkin-Meshkov-Glick model, which exhibits excited state quantum phase transitions at different magnetic fields. Based on the model's spectral properties, we show broad peaks of the Fisher information and propose efficient schemes for precision magnetometry. The Lipkin-Meshkov-Glick model was first introduced for superconductivity and for nuclear systems, and has recently been realised in several condensed matter platforms. The above metrological schemes can also be exploited to measure microscopic properties of systems able to simulate the Lipkin-Meshkov-Glick model.

翻訳日:2023-07-10 15:04:16 公開日:2023-07-06

# KoLA: 大規模言語モデルのワールドナレッジを慎重にベンチマークする

KoLA: Carefully Benchmarking World Knowledge of Large Language Models ( http://arxiv.org/abs/2306.09296v2 )

ライセンス: Link先を確認

Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-Li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan Yao, Ning Ding, Lei Hou, Zhiyuan Liu, Bin Xu, Jie Tang, Juanzi Li

(参考訳) 大規模言語モデル(LLM)の先例のない性能は、評価の改善を必要とする。単にLLM能力の広さを探求するだけでなく、綿密で思慮深い設計が、徹底的で偏見がなく、適用可能な評価に不可欠であると信じている。 LLMに対する世界的知識の重要性を考慮し、知識指向LLMアセスメントベンチマーク(KoLA)を構築し、(1)能力モデリングでは、人間の認知を模倣して知識関連能力の4段階の分類を作成し、19のタスクをカバーしている。 (2)データを公平に比較するためには,LLMが事前学習したコーパスであるウィキペディアと,未確認データを扱う能力と知識の進化を評価することを目的とした,新たなコーパスを併用する。 (3) 評価基準には,タスクやモデル間の数値比較可能性向上のための総合的な基準スコアと,知識幻覚の自動評価のための独自の自己コントラスト尺度が採用されている。21のオープンソースおよび商用LLMを評価し,興味深い結果を得た。 KoLAデータセットとオープン参加型リーダボードはhttps://kola.xlore.cnで公開されており、LLMとナレッジ関連のシステムを開発するためのリファレンスを提供するために継続的に更新される。

The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks. (2) For data, to ensure fair comparisons, we use both Wikipedia, a corpus on which LLMs are prevalently pre-trained, and continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. (3) For evaluation criteria, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models and a unique self-contrast metric for automatically evaluating knowledge hallucination. We evaluate 21 open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset and open-participation leaderboard are publicly released at https://kola.xlore.cn and will be continuously updated to provide references for developing LLMs and knowledge-related systems.

翻訳日:2023-07-10 14:55:09 公開日:2023-07-06

# 深部変動クラスタリングを用いたエキスパート非依存超音波画像品質評価

Expert-Agnostic Ultrasound Image Quality Assessment using Deep Variational Clustering ( http://arxiv.org/abs/2307.02462v2 )

ライセンス: Link先を確認

Deepak Raina, Dimitrios Ntentia, SH Chandrashekhara, Richard Voyles, Subir Kumar Saha

(参考訳) 超音波イメージングは、いくつかの診断および治療の手順で一般的に用いられるモダリティである。しかし超音波による診断は、超音波撮影者が手動で評価した画像の品質に大きく依存しており、診断の客観性を低下させ、操作者に依存している。自動品質評価のための教師付き学習ベースの手法は、手動で注釈付きデータセットを必要とする。これらの超音波画像は品質が低く、オブザーバ間の知覚変化によるノイズの多いアノテーションに苦しむため、学習効率が損なわれる。我々は,手動アノテーションの負担と不確実性を解消するUnSupervised UltraSound Image Quality Assessment Network (US2QNet)を提案する。 US2QNetは、前処理、クラスタリング、後処理の3つのモジュールに埋め込まれた変分オートエンコーダを使用して、超音波画像の品質特徴表現を共同で強化、抽出、クラスタリング、可視化する。プリプロセッシングモジュールはイメージのフィルタリングを使用して、ノイズに注意をそらすのではなく、ネットワークの注意を優れた品質機能に向ける。 2次元空間における特徴表現のクラスタを可視化するための後処理を提案する。提案する膀胱超音波画像の品質評価の枠組みを検証した。提案手法は,最先端クラスタリング手法よりも精度が78%,性能が優れている。

Ultrasound imaging is a commonly used modality for several diagnostic and therapeutic procedures. However, the diagnosis by ultrasound relies heavily on the quality of images assessed manually by sonographers, which diminishes the objectivity of the diagnosis and makes it operator-dependent. The supervised learning-based methods for automated quality assessment require manually annotated datasets, which are highly labour-intensive to acquire. These ultrasound images are low in quality and suffer from noisy annotations caused by inter-observer perceptual variations, which hampers learning efficiency. We propose an UnSupervised UltraSound image Quality assessment Network, US2QNet, that eliminates the burden and uncertainty of manual annotations. US2QNet uses a variational autoencoder embedded with three modules (pre-processing, clustering, and post-processing) to jointly enhance, extract, cluster, and visualize the quality feature representation of ultrasound images. The pre-processing module filters the images to direct the network's attention towards salient quality features, rather than letting it be distracted by noise. Post-processing is proposed for visualizing the clusters of feature representations in 2D space. We validated the proposed framework for quality assessment of urinary bladder ultrasound images. The proposed framework achieved 78% accuracy and superior performance to state-of-the-art clustering methods.

翻訳日:2023-07-10 14:36:26 公開日:2023-07-06

# 都市部埋め込みのための地域意識多視点表現学習

Region-Wise Attentive Multi-View Representation Learning for Urban Region Embeddings ( http://arxiv.org/abs/2307.03212v1 )

ライセンス: Link先を確認

Weiliang Chan and Qianqian Ren

(参考訳) 都市領域の埋め込みは、複雑さと都市データの性質が絶えず変化するため、重要かつ非常に困難な問題である。この課題に対処するため,我々は,都市域の多視点依存を捉えるための領域ワイズ多視点表現学習(ROMER)を提案し,厳密な地域条件の制約を伴わずに都市域の表現表現を学習する。本モデルでは,多元都市データから都市域表現を学ぶことに注力する。まず,モビリティフローパターン,poiセマンティクス,チェックインダイナミクスから多視点相関を捉える。次に,グラフ内の2つの頂点の類似性を学習するために,グローバルグラフアテンションネットワークを採用する。複数ビューの特徴を包括的に検討し共有するために,2段階の融合モジュールを提案し,外部の注意を払って重みを学習し,多視点埋め込みを実現する。実世界のデータセット上での2つの下流タスクに対する大規模な実験により、我々のモデルは最先端の手法を最大17%改善することを示した。

Urban region embedding is an important and yet highly challenging issue due to the complexity and constantly changing nature of urban data. To address the challenges, we propose Region-Wise Multi-View Representation Learning (ROMER) to capture multi-view dependencies and learn expressive representations of urban regions without the constraints of rigid neighbourhood region conditions. Our model focuses on learning urban region representations from multi-source urban data. First, we capture the multi-view correlations from mobility flow patterns, POI semantics, and check-in dynamics. Then, we adopt global graph attention networks to learn the similarity between any two vertices in the graphs. To comprehensively consider and share features of multiple views, a two-stage fusion module is further proposed to learn weights with external attention to fuse multi-view embeddings. Extensive experiments for two downstream tasks on real-world datasets demonstrate that our model outperforms state-of-the-art methods by up to 17%.

翻訳日:2023-07-10 14:29:03 公開日:2023-07-06

# PseudoCell:深層学習によるセントロブラスト細胞検出のためのPseudo Labelingとしてのハードネガティブマイニング

PseudoCell: Hard Negative Mining as Pseudo Labeling for Deep Learning-Based Centroblast Cell Detection ( http://arxiv.org/abs/2307.03211v1 )

ライセンス: Link先を確認

Narongrid Seesawad, Piyalitt Ittichaiwong, Thapanun Sudhawiyangkul, Phattarapong Sawangjai, Peti Thuwajit, Paisarn Boonsakan, Supasan Sripodok, Kanyakorn Veerakanjana, Phoomraphee Luenam, Komgrid Charngkaew, Ananya Pongpaibul, Napat Angkathunyakul, Narit Hnoohom, Sumeth Yuenyong, Chanitra Thuwajit, Theerawit Wilaiprasitporn

(参考訳) 深層学習に基づくパッチ分類モデルはH&E染色組織試料の全スライディング画像(WSI)に利用され, 悪性リンパ腫の診断に有用である。しかし、これらのアプローチはいまだに病理学者が中心芽細胞を手動で同定し、最適な性能のラベルを提供する必要がある。これに対処するために、wsi(ソースコードはhttps://github.com/iobt-vistec/pseudocell.gitで利用可能)でcentroblast検出を自動化するオブジェクト検出フレームワークであるpseudocellを提案する。この枠組みは、病理学者のセントロブラストラベルを組み込んで、細胞の形態学的特徴を用いた偽陽性予測から得られた偽陰性のラベルと組み合わせている。 PseudoCellを用いることで、病理学者の作業量を削減し、組織を調べる際に注意を要する領域を正確に絞り込むことができる。信頼性のしきい値に応じて、pseudocellはwsiの非中心芽球組織領域の58.18-99.35%を除去できる。本研究は, 病理学者が改良のために洗練されたラベルを必要としない, 実用的な遠心細胞前スクリーニング法を提案する。 PseudoCellの実践に関する詳細なガイダンスが議論のセクションで提供されている。

Patch classification models based on deep learning have been utilized in whole-slide images (WSI) of H&E-stained tissue samples to assist pathologists in grading follicular lymphoma patients. However, these approaches still require pathologists to manually identify centroblast cells and provide refined labels for optimal performance. To address this, we propose PseudoCell, an object detection framework to automate centroblast detection in WSI (source code is available at https://github.com/IoBT-VISTEC/PseudoCell.git). This framework incorporates centroblast labels from pathologists and combines them with pseudo-negative labels obtained from undersampled false-positive predictions using the cell's morphological features. By employing PseudoCell, pathologists' workload can be reduced, as it accurately narrows down the areas requiring their attention when examining tissue. Depending on the confidence threshold, PseudoCell can eliminate 58.18-99.35% of non-centroblast tissue areas on WSI. This study presents a practical centroblast prescreening method that does not require pathologists' refined labels for improvement. Detailed guidance on the practical implementation of PseudoCell is provided in the discussion section.

翻訳日:2023-07-10 14:28:44 公開日:2023-07-06

# スパースなグラフィカル線形力学系

Sparse Graphical Linear Dynamical Systems ( http://arxiv.org/abs/2307.03210v1 )

ライセンス: Link先を確認

Emilie Chouzenoux and Victor Elvira

(参考訳) 時系列データセットは、生物医学、地球観測、ネットワーク分析など、科学と工学の多くの分野の中心である。状態空間モデル(SSM)は、時系列上で確率的かつ解釈可能な学習を可能にする強力な数学的ツールである。モデルパラメータをSSMで推定することは、おそらく最も複雑なタスクの1つであり、事前知識の含みは、解釈の容易さだけでなく、推論タスクを複雑にすることが知られている。非常に最近の研究は、これらのモデルパラメータのいくつかにグラフィカルな視点を組み込もうと試みているが、これらは、この作業が対処する顕著な制限を示している。より一般的に、既存のグラフィカルモデリングツールは静的情報、独立した確率変数間の統計的依存関係(例えば、グラフィカルラッソアプローチ)、または時系列サンプル間の因果関係を強調する動的情報(例えば、グラフィカルグランガーアプローチ)のいずれかを組み込むように設計されている。しかし、SSMのコンテキスト内で静的および動的グラフィカルモデリングを組み合わせた共同アプローチは存在しない。本研究では,静的グラフィカルラッソモデルと線形ガウスSSMに対する因果的グラフィカルアプローチを橋渡しする共同グラフィカルモデリングフレームワークを導入することにより,このギャップを埋めるための新しいアプローチを提案する。本稿では,このフレームワークにおける新しい推論手法であるdglasso(dynamic graphical lasso)を提案する。アルゴリズムの収束は、非線形解析から現代のツールから離れることによって確立される。合成および実気象変動データの実験的検証は,提案したモデルと推論アルゴリズムの有効性を示す。

Time-series datasets are central in numerous fields of science and engineering, such as biomedicine, Earth observation, and network analysis. Extensive research exists on state-space models (SSMs), which are powerful mathematical tools that allow for probabilistic and interpretable learning on time series. Estimating the model parameters in SSMs is arguably one of the most complicated tasks, and the inclusion of prior knowledge is known both to ease the interpretation and to complicate the inferential tasks. Very recent works have attempted to incorporate a graphical perspective on some of those model parameters, but they present notable limitations that this work addresses. More generally, existing graphical modeling tools are designed to incorporate either static information, focusing on statistical dependencies among independent random variables (e.g., graphical Lasso approach), or dynamic information, emphasizing causal relationships among time series samples (e.g., graphical Granger approaches). However, there are no joint approaches combining static and dynamic graphical modeling within the context of SSMs. This work proposes a novel approach to fill this gap by introducing a joint graphical modeling framework that bridges the static graphical Lasso model and a causal-based graphical approach for the linear-Gaussian SSM. We present DGLASSO (Dynamic Graphical Lasso), a new inference method within this framework that implements an efficient block alternating majorization-minimization algorithm. The algorithm's convergence is established using modern tools from nonlinear analysis. Experimental validation on synthetic and real weather variability data showcases the effectiveness of the proposed model and inference algorithm.

翻訳日:2023-07-10 14:28:06 公開日:2023-07-06

# DENCLUEの最適帯域選択

Optimal Bandwidth Selection for DENCLUE ( http://arxiv.org/abs/2307.03206v1 )

ライセンス: Link先を確認

Hao Wang

(参考訳) 現代の業界では、クラスタリングアルゴリズムはアルゴリズムエンジニアの日常的なルーチンである。クラスタリングアルゴリズムは2010年以前に急速に成長した。ディープラーニングが機械学習アプリケーションのデファクト産業標準となった後、研究トピックに関連するイノベーションは停滞している。 2007年、非線形データ構造に対するクラスタリング問題を解決するために密度に基づくクラスタリングアルゴリズムDENCLUEが発明された。しかし、パラメータ選択問題は2011年までほとんど無視された。本稿では,denclueアルゴリズムの最適パラメータを計算する新しい手法を提案し,その性能を実験部で検討する。

In modern-day industry, clustering algorithms are part of the daily routine of algorithm engineers. Although clustering algorithms experienced rapid growth before 2010, innovation related to the research topic has stagnated since deep learning became the de facto industrial standard for machine learning applications. In 2007, a density-based clustering algorithm named DENCLUE was invented to solve the clustering problem for nonlinear data structures. However, its parameter selection problem was largely neglected until 2011. In this paper, we propose a new approach to compute the optimal parameters for the DENCLUE algorithm, and discuss its performance in the experiment section.
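
DENCLUE builds on kernel density estimation, so the parameter at stake in this abstract is the kernel bandwidth. The minimal sketch below evaluates a one-dimensional Gaussian kernel density estimate for a given bandwidth `h`. The function name `gaussian_kde_1d` and the sample values are hypothetical, and the sketch does not implement the paper's bandwidth selection rule, which the abstract does not describe.

```python
import math


def gaussian_kde_1d(x: float, samples: list[float], h: float) -> float:
    """Gaussian kernel density estimate at x with bandwidth h (h > 0)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


if __name__ == "__main__":
    # Hypothetical data: two loose groups of points on the real line.
    data = [0.0, 0.2, 0.4, 3.0, 3.1, 3.3]
    for h in (0.1, 0.5, 2.0):
        # Density halfway between the groups; it rises as h grows,
        # which smooths the two modes into one.
        print(h, gaussian_kde_1d(1.7, data, h))
```

A small bandwidth keeps the two groups as separate density peaks, while a large one merges them, which is why the choice of bandwidth changes the clusters a density-based method reports.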

翻訳日:2023-07-10 14:27:05 公開日:2023-07-06

# セルフリー大量MIMOのためのハイブリッド知識駆動チャネルセマンティック獲得とビームフォーミング

Hybrid Knowledge-Data Driven Channel Semantic Acquisition and Beamforming for Cell-Free Massive MIMO ( http://arxiv.org/abs/2307.03070v1 )

ライセンス: Link先を確認

Zhen Gao, Shicong Liu, Yu Su, Zhongxiang Li, Dezhi Zheng

(参考訳) 本稿では,ユビキタスな拡張現実(XR)アプリケーションをサポートし,現在の屋内無線通信能力とのギャップを埋めるため,屋外無線システムの進歩に焦点をあてる。セルレス大規模マルチインプットマルチアウトプット(MIMO)システムにおけるチャネル意味獲得とマルチユーザビームフォーミングのためのハイブリッド知識データ駆動方式を提案する。具体的には、まず、パイロット信号、チャネルセマンティック埋め込みのためのCSI量子化器、チャネルセマンティック抽出のためのCSI再構成をエンドツーエンドで共同で最適化する、チャネルセマンティック取得のためのデータ駆動多重層パーセプトロン(MLP)ベースの自動エンコーダを提案する。さらに、取得したチャネルセマンティクスに基づいて、屋外XRシナリオにおけるCSIの完全性に優れたスペクトル効率を実現することができる知識駆動型深層展開型マルチユーザビームフォーマを提案する。従来の逐次オーバーリラクシエーション(sor)に基づく線形ビームフォーミングスキームをディープラーニングで展開することにより,最適なパラメータを適応的に学習し,収束を加速し,不完全csiに対するロバスト性を向上させることができる。提案手法は,完全ディジタルアレーを用いたアクセスポイント (aps) と,アナログ-デジタルアレーのハイブリッド構造を持つapsに対して使用可能である。シミュレーションの結果,提案手法がチャネル獲得精度の向上に有効であり,csi取得とビームフォーマ設計の複雑さを低減できることを示した。提案手法は,ダウンリンク伝送を3回繰り返しただけで,収束スペクトル効率の約96%を達成し,その効果とアウトドアxr応用の可能性を示した。

This paper focuses on advancing outdoor wireless systems to better support ubiquitous extended reality (XR) applications, and close the gap with current indoor wireless transmission capabilities. We propose a hybrid knowledge-data driven method for channel semantic acquisition and multi-user beamforming in cell-free massive multiple-input multiple-output (MIMO) systems. Specifically, we first propose a data-driven multi-layer perceptron (MLP)-Mixer-based auto-encoder for channel semantic acquisition, where the pilot signals, CSI quantizer for channel semantic embedding, and CSI reconstruction for channel semantic extraction are jointly optimized in an end-to-end manner. Moreover, based on the acquired channel semantic, we further propose a knowledge-driven deep-unfolding multi-user beamformer, which is capable of achieving good spectral efficiency with robustness to imperfect CSI in outdoor XR scenarios. By unfolding the conventional successive over-relaxation (SOR)-based linear beamforming scheme with deep learning, the proposed beamforming scheme is capable of adaptively learning the optimal parameters to accelerate convergence and improve the robustness to imperfect CSI. The proposed deep unfolding beamforming scheme can be used for access points (APs) with fully-digital arrays and APs with hybrid analog-digital array structures. Simulation results demonstrate the effectiveness of our proposed scheme in improving the accuracy of channel acquisition, as well as reducing complexity in both CSI acquisition and beamformer design. The proposed beamforming method achieves approximately 96% of the converged spectral efficiency after only three iterations in downlink transmission, demonstrating its efficacy and potential to improve outdoor XR applications.
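
The beamformer in this abstract unfolds successive over-relaxation (SOR) iterations. For reference, a plain textbook SOR solver for a linear system Ax = b is sketched below with a fixed relaxation factor `omega`. The learned per-iteration parameters of the deep-unfolded version are not reproduced here, and the function name `sor_solve` and the example matrix are illustrative assumptions.

```python
def sor_solve(A, b, omega=1.2, num_iters=50):
    """Solve A x = b by successive over-relaxation, starting from x = 0.

    A is a square matrix given as a list of rows with nonzero diagonal.
    For symmetric positive-definite A the iteration converges when
    0 < omega < 2; omega = 1 reduces to Gauss-Seidel.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(num_iters):
        for i in range(n):
            # Uses already-updated entries x[j] for j < i (in-place sweep).
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
    return x


if __name__ == "__main__":
    # Small symmetric positive-definite example system.
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(sor_solve(A, b))
```

In a deep-unfolded variant, each sweep would typically become one network layer with its own trainable relaxation factor, which is how a method of this kind could reach good accuracy in a handful of iterations.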

翻訳日:2023-07-10 14:25:22 公開日:2023-07-06

# PSDR-Room:微分レンダリングによる写真からシーンまで

PSDR-Room: Single Photo to Scene using Differentiable Rendering ( http://arxiv.org/abs/2307.03244v1 )

ライセンス: Link先を確認

Kai Yan, Fujun Luan, Miloš Hašan, Thibault Groueix, Valentin Deschaintre, Shuang Zhao

(参考訳) 3dデジタルシーンにはライト、素材、ジオメトリなど多くの要素が含まれており、望ましい外観に達するために相互作用する。このようなシーンのステージングには時間がかかり、芸術と技術の両方のスキルが必要です。そこで本研究では,PSDR-Roomを提案する。PSDR-Roomは,室内シーンのターゲット画像を最小限のユーザ入力でマッチングするための,個々のオブジェクトのポーズや素材を最適化するシステムである。この目的のために、我々は最近の経路空間の微分可能なレンダリング手法を活用し、幾何学、照明、手続き材料に対するレンダリングの偏りのない勾配を提供し、これらすべてのコンポーネントを勾配勾配を用いて最適化し、入力された写真外観と視覚的に一致させることができる。我々は,最近のシーン理解手法を用いて,最適化を初期化し,適切な3次元モデルや材料を探索する。本手法を屋内シーンの実際の写真上で評価し,得られたシーンコンポーネントの編集性を示す。

A 3D digital scene contains many components: lights, materials and geometries, interacting to reach the desired appearance. Staging such a scene is time-consuming and requires both artistic and technical skills. In this work, we propose PSDR-Room, a system that optimizes lighting as well as the pose and materials of individual objects to match a target image of a room scene, with minimal user input. To this end, we leverage a recent path-space differentiable rendering approach that provides unbiased gradients of the rendering with respect to geometry, lighting, and procedural materials, allowing us to optimize all of these components using gradient descent to visually match the input photo appearance. We use recent single-image scene understanding methods to initialize the optimization and search for appropriate 3D models and materials. We evaluate our method on real photographs of indoor scenes and demonstrate the editability of the resulting scene components.

翻訳日:2023-07-10 14:18:38 公開日:2023-07-06

# BAD:局所的特徴クラスタリングによるブラインド異常検出

That's BAD: Blind Anomaly Detection by Implicit Local Feature Clustering ( http://arxiv.org/abs/2307.03243v1 )

ライセンス: Link先を確認

Jie Zhang, Masanori Suganuma, Takayuki Okatani

(参考訳) 産業用物体・テクスチャの視覚異常検出(AD)に関する最近の研究は、非常に優れた成果を上げている。彼らは教師なしの設定、特に1つのクラス設定を考慮し、トレーニングのための正規(すなわち異常のない)イメージセットが利用可能であると仮定する。本稿では,通常のサンプルと異常サンプルの両方を含む可能性のある画像の集合における異常を検出する,教師なしADのより困難なシナリオについて考察する。この設定は、既知の正規データの可用性を前提とせず、最近の研究で考慮されている標準ADとは全く異なる人間のアノテーションから完全に解放されている。明確にするために、この設定をblind anomaly detection (BAD)と呼ぶ。本稿では,BADを局所的異常検出問題に変換できることを示すとともに,画像および画素レベルの異常を正確に検出できるPatchClusterという新しい手法を提案する。実験結果から、PatchClusterは通常のデータを知ることなく有望な性能を示し、必要な1クラス設定で適用されるSOTAメソッドに匹敵する性能を示した。

Recent studies on visual anomaly detection (AD) of industrial objects/textures have achieved quite good performance. They consider an unsupervised setting, specifically the one-class setting, in which we assume the availability of a set of normal (i.e., anomaly-free) images for training. In this paper, we consider a more challenging scenario of unsupervised AD, in which we detect anomalies in a given set of images that might contain both normal and anomalous samples. The setting does not assume the availability of known normal data and thus is completely free from human annotation, which differs from the standard AD considered in recent studies. For clarity, we call the setting blind anomaly detection (BAD). We show that BAD can be converted into a local outlier detection problem and propose a novel method named PatchCluster that can accurately detect image- and pixel-level anomalies. Experimental results show that PatchCluster achieves promising performance without knowledge of normal data, even comparable to the SOTA methods applied in the one-class setting, which require such data.

翻訳日:2023-07-10 14:18:22 公開日:2023-07-06

# 可視赤外人物再同定のための原始中間情報の適応生成

Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification ( http://arxiv.org/abs/2307.03240v1 )

ライセンス: Link先を確認

Mahdi Alehdaghi, Arthur Josi, Pourya Shamsolmoali, Rafael M. O. Cruz, and Eric Granger

(参考訳) 可視赤外線の人物再識別は、RGBと赤外線センサーの分散ネットワーク上で撮影された同一人物の画像の検索を試みる。いくつかのV-I ReIDアプローチは、VとIのモダリティを直接統合して、共有表現空間内の人物を識別する。しかしながら、V と I のモダリティ間のデータ分布の重大なギャップを考えると、V-I ReID は依然として困難である。最近のアプローチでは、v と i のモダリティを橋渡しできる中間空間を活用することで一般化が改善されているが、そのような情報領域のデータの選択や生成には効果的な方法が必要である。本稿では,V と I のモダリティ間の識別情報をブリッジする仮想ドメインを適応し,生成するための適応型中間情報学習手法を提案する。 AGPI^2の背後にある重要な動機は、付加情報を提供する特権画像を生成することによって、深いV-I ReIDバックボーンのトレーニングを強化することである。これらの特権画像は、オリジナルのVまたはIモードでのみアクセスできない共有識別特徴をキャプチャする。この目的に向けて、非線形生成モジュールは、逆対象で訓練され、V 画像をより小さな領域シフト w.r.t.I ドメインで中間空間に変換する。一方、AGPI^2内の埋め込みモジュールは、Vと生成された画像の両方に類似した特徴を生成し、すべてのモダリティに共通する特徴の抽出を促進する。これらの貢献に加えて、AGPI^2 は中間画像に適応するための敵の目的を採用しており、V と I ドメイン間の大きなドメインシフトに対処する非モダリティ固有の空間を作る上で重要な役割を果たす。 V-I ReIDデータセットを用いた実験結果から,AGPI^2は推論中に余分な計算資源を使わずにマッチング精度を向上させることが示された。

Visible-infrared person re-identification (V-I ReID) seeks to retrieve images of the same individual captured over a distributed network of RGB and IR sensors. Several V-I ReID approaches directly integrate both V and I modalities to discriminate persons within a shared representation space. However, given the significant gap in data distributions between V and I modalities, cross-modal V-I ReID remains challenging. Some recent approaches improve generalization by leveraging intermediate spaces that can bridge V and I modalities, yet effective methods are required to select or generate data for such informative domains. In this paper, the Adaptive Generation of Privileged Intermediate Information (AGPI^2) training approach is introduced to adapt and generate a virtual domain that bridges discriminant information between the V and I modalities. The key motivation behind AGPI^2 is to enhance the training of a deep V-I ReID backbone by generating privileged images that provide additional information. These privileged images capture shared discriminative features that are not easily accessible within the original V or I modalities alone. Towards this goal, a non-linear generative module is trained with an adversarial objective, translating V images into intermediate spaces with a smaller domain shift w.r.t. the I domain. Meanwhile, the embedding module within AGPI^2 aims to produce similar features for both V and generated images, encouraging the extraction of features that are common to all modalities. In addition to these contributions, AGPI^2 employs adversarial objectives for adapting the intermediate images, which play a crucial role in creating a non-modality-specific space to address the large domain shifts between V and I domains. Experimental results conducted on challenging V-I ReID datasets indicate that AGPI^2 increases matching accuracy without extra computational resources during inference.

翻訳日:2023-07-10 14:18:03 公開日:2023-07-06

# 高エネルギー物理学のための量子コンピューティング:最先端の技術と課題 QC4HEPワーキンググループの概要

Quantum Computing for High-Energy Physics: State of the Art and Challenges. Summary of the QC4HEP Working Group ( http://arxiv.org/abs/2307.03236v1 )

ライセンス: Link先を確認

Alberto Di Meglio, Karl Jansen, Ivano Tavernelli, Constantia Alexandrou, Srinivasan Arunachalam, Christian W. Bauer, Kerstin Borras, Stefano Carrazza, Arianna Crippa, Vincent Croft, Roland de Putter, Andrea Delgado, Vedran Dunjko, Daniel J. Egger, Elias Fernandez-Combarro, Elina Fuchs, Lena Funcke, Daniel Gonzalez-Cuadra, Michele Grossi, Jad C. Halimeh, Zoe Holmes, Stefan Kuhn, Denis Lacroix, Randy Lewis, Donatella Lucchesi, Miriam Lucio Martinez, Federico Meloni, Antonio Mezzacapo, Simone Montangero, Lento Nagano, Voica Radescu, Enrique Rico Ortega, Alessandro Roggero, Julian Schuhmacher, Joao Seixas, Pietro Silvi, Panagiotis Spentzouris, Francesco Tacchino, Kristan Temme, Koji Terashi, Jordi Tura, Cenk Tuysuz, Sofia Vallecorsa, Uwe-Jens Wiese, Shinjae Yoo, Jinglei Zhang

(参考訳) 量子コンピュータは、自然科学やその他の分野における計算のパラダイム転換に向けた興味深い道筋を提供し、いわゆる量子優位性、すなわち数値シミュレーションの大幅な(場合によっては指数関数的な)高速化を達成する可能性を秘めている。量子ビットのさまざまな実現方式によるハードウェアデバイスの急速な発展により、小規模ながら代表的なアプリケーションを量子コンピュータ上で実行できるようになっている。特に、高エネルギー物理学は困難な計算問題を生み出す原動力となる分野であるため、そのコミュニティは量子コンピューティングの力を引き出す上で極めて重要な役割を果たす。これは、理論面では古典的な手法では扱うことが非常に困難または不可能なモデルの探索に関わり、実験面では大型ハドロン衝突型加速器のアップグレードのような新たな実験がもたらす膨大なデータの課題に関わる。CERN、DESY、IBMが主導するこのロードマップ論文では、高エネルギー物理学における量子計算の現状を示し、近い将来に取り組むことができる理論的および実験的な目標ベンチマーク応用の例を挙げる。IBMの100 x 100チャレンジを念頭に置き、可能な場合には、誤り緩和量子コンピューティングを用いた各例のリソース推定も示す。

Quantum computers offer an intriguing path for a paradigmatic change of computing in the natural sciences and beyond, with the potential for achieving a so-called quantum advantage, namely a significant (in some cases exponential) speed-up of numerical simulations. The rapid development of hardware devices with various realizations of qubits enables the execution of small scale but representative applications on quantum computers. In particular, the high-energy physics community plays a pivotal role in accessing the power of quantum computing, since the field is a driving source for challenging computational problems. This concerns, on the theoretical side, the exploration of models which are very hard or even impossible to address with classical techniques and, on the experimental side, the enormous data challenge of newly emerging experiments, such as the upgrade of the Large Hadron Collider. In this roadmap paper, led by CERN, DESY and IBM, we provide the status of high-energy physics quantum computations and give examples for theoretical and experimental target benchmark applications, which can be addressed in the near future. Having the IBM 100 x 100 challenge in mind, where possible, we also provide resource estimates for the examples given using error mitigated quantum computing.

翻訳日:2023-07-10 14:17:31 公開日:2023-07-06

# 量子誤り訂正プリミティブへの単純な化学応用のコンパイル

Compilation of a simple chemistry application to quantum error correction primitives ( http://arxiv.org/abs/2307.03233v1 )

ライセンス: Link先を確認

Nick S. Blunt, Gy\"orgy P. Geh\'er, Alexandra E. Moylett

(参考訳) 量子誤り訂正の分野では、近年多くの注目すべき結果が得られている。これには、現在の量子ハードウェアにおける誤り訂正の初期の実証や、実世界のアプリケーション向けに大規模量子アルゴリズムを実行するための要件の理解を深めるリソース推定が含まれる。本研究では、最小限の化学例に対してフォールトトレラントに量子位相推定(QPE)を行うために必要なリソースを注意深く推定することで、この2つの発展の間のギャップを橋渡しする。具体的には、最小基底系における水素分子を対象に、回転表面符号の格子手術(lattice surgery)操作へのQPE回路の詳細なコンパイルについて述べる。アルゴリズムと誤り訂正の両レベルにおける複数の最適化についても述べる。単純な化学回路の実装でさえ900量子ビットと2,300ラウンドの量子誤り訂正を必要とすることがわかり、初期のフォールトトレラント領域を特に対象とした誤り訂正技術の改善の必要性が浮き彫りになった。

A number of exciting recent results have been seen in the field of quantum error correction. These include initial demonstrations of error correction on current quantum hardware, and resource estimates which improve understanding of the requirements to run large-scale quantum algorithms for real-world applications. In this work, we bridge the gap between these two developments by performing careful estimation of the resources required to fault-tolerantly perform quantum phase estimation (QPE) on a minimal chemical example. Specifically, we describe a detailed compilation of the QPE circuit to lattice surgery operations for the rotated surface code, for a hydrogen molecule in a minimal basis set. We describe a number of optimisations at both the algorithmic and error correction levels. We find that implementing even a simple chemistry circuit requires 900 qubits and 2,300 quantum error correction rounds, emphasising the need for improved error correction techniques specifically targeting the early fault-tolerant regime.

翻訳日:2023-07-10 14:17:12 公開日:2023-07-06

# 高忠実度仮想2量子ビットゲートの実験的実証

Experimental demonstration of a high-fidelity virtual two-qubit gate ( http://arxiv.org/abs/2307.03232v1 )

ライセンス: Link先を確認

Akhil Pratap Singh (1), Kosuke Mitarai (2), Yasunari Suzuki (3), Kentaro Heya (4), Yutaka Tabuchi (4), Keisuke Fujii (2 and 4) and Yasunobu Nakamura (1 and 4) ((1) Department of Applied Physics, Graduate School of Engineering, The University of Tokyo, (2) Graduate School of Engineering Science, Osaka University, (3) NTT Computer and Data Science Laboratories, NTT Corporation, (4) RIKEN Center for Quantum Computing)

(参考訳) 仮想2量子ビットゲートを実験的に実証し、量子プロセストモグラフィー(QPT)を用いて特徴付ける。仮想2量子ビットゲートは、期待値推定のための量子回路において、実際の2量子ビットゲートを単一量子ビット演算と射影測定に分解する。射影測定は回路途中(mid-circuit)の分散読み出しによって実装した。決定論的サンプリング方式により、仮想2量子ビットゲートの分解に必要な実験回路の評価回数が減少する。また、測定誤差緩和を適用して読み出し誤差の影響を抑制し、仮想制御$Z$(CZ)ゲートの平均ゲート忠実度を$f_{\rm av} = 0.9975 \pm 0.0028$に改善する。本結果は、高忠実度の仮想2量子ビットゲートを実装する実用的な方法を示しており、より少ない量子ビットでの量子回路のシミュレーションや、離れた量子ビット対への2量子ビットゲートの実装に有用である。

We experimentally demonstrate a virtual two-qubit gate and characterize it using quantum process tomography (QPT). The virtual two-qubit gate decomposes an actual two-qubit gate into single-qubit operations and projective measurements in quantum circuits for expectation-value estimation. We implement projective measurements via mid-circuit dispersive readout. The deterministic sampling scheme reduces the number of experimental circuit evaluations required for decomposing a virtual two-qubit gate. We also apply measurement error mitigation to suppress the effect of readout errors and improve the average gate fidelity of a virtual controlled-$Z$ (CZ) gate to $f_{\rm av} = 0.9975 \pm 0.0028$. Our results highlight a practical approach to implement virtual two-qubit gates with high fidelities, which are useful for simulating quantum circuits using fewer qubits and implementing two-qubit gates on a distant pair of qubits.

翻訳日:2023-07-10 14:16:58 公開日:2023-07-06

# 適応型射影変分量子ダイナミクス

Adaptive projected variational quantum dynamics ( http://arxiv.org/abs/2307.03229v1 )

ライセンス: Link先を確認

David Linteau, Stefano Barison, Netanel Lindner, Giuseppe Carleo

(参考訳) 本稿では、正確な変分時間発展波動関数を準備するための適応型量子アルゴリズムを提案する。この手法は、変分パラメータ数に対して線形にスケールする大域的最適化を行う、射影変分量子ダイナミクス(pVQD)アルゴリズムに基づいている。シミュレーションの開始時に変分アンザッツを固定する代わりに、回路を時間発展の過程で系統的に成長させる。さらに、適応ステップは補助量子ビットを必要とせず、ゲート探索は異なる量子デバイス上で並列に実行できる。Adaptive pVQDと名付けたこの新しいアルゴリズムを、駆動スピンモデルとフェルミオン系のシミュレーションに適用し、トロッター化回路と非適応変分法の両方に対する優位性を示す。最後に、Adaptive pVQDアルゴリズムで準備したより浅い回路を用いて、ハードウェア上で量子系の物理的性質をより正確に測定する。

We propose an adaptive quantum algorithm to prepare accurate variational time evolved wave functions. The method is based on the projected Variational Quantum Dynamics (pVQD) algorithm, that performs a global optimization with linear scaling in the number of variational parameters. Instead of fixing a variational ansatz at the beginning of the simulation, the circuit is grown systematically during the time evolution. Moreover, the adaptive step does not require auxiliary qubits and the gate search can be performed in parallel on different quantum devices. We apply the new algorithm, named Adaptive pVQD, to the simulation of driven spin models and fermionic systems, where it shows an advantage when compared to both Trotterized circuits and non-adaptive variational methods. Finally, we use the shallower circuits prepared using the Adaptive pVQD algorithm to obtain more accurate measurements of physical properties of quantum systems on hardware.

翻訳日:2023-07-10 14:16:40 公開日:2023-07-06

# ニューラルネットワーク場の理論:非ガウス性、作用、局所性

Neural Network Field Theories: Non-Gaussianity, Actions, and Locality ( http://arxiv.org/abs/2307.03223v1 )

ライセンス: Link先を確認

Mehmet Demirtas, James Halverson, Anindita Maiti, Matthew D. Schwartz, Keegan Stoner

(参考訳) 場の理論における経路積分測度とニューラルネットワークのアンサンブルは、どちらも関数上の分布を記述する。無限幅(無限$N$)極限で中心極限定理が適用できるとき、ネットワークのアンサンブルは自由場の理論に対応する。$1/N$展開は場の理論における相互作用に対応するが、ネットワークパラメータの統計的独立性のわずかな破れに関する展開など、他の展開も相互作用する理論を導きうる。これらの他の展開は、例えば普遍近似定理に関する振る舞いが改善されるという点で、$1/N$展開よりも有利になりうる。場の理論の連結相関関数が与えられれば、頂点が連結相関関数であるような新しいファインマン図の規則を用いて、展開パラメータの次数ごとに作用を系統的に再構成できる。この方法はエッジワース展開に動機付けられており、ニューラルネットワーク場の理論の作用を導出できる。逆に、この対応により、作用の変形をニューラルネットワークのパラメータ密度の変形として表現することで、与えられた場の理論を実現するアーキテクチャを設計できる。例として、$\phi^4$理論を無限$N$ニューラルネットワーク場の理論として実現する。

Both the path integral measure in field theory and ensembles of neural networks describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-$N$) limit, the ensemble of networks corresponds to a free field theory. Although an expansion in $1/N$ corresponds to interactions in the field theory, others, such as in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the $1/N$-expansion, for example by improved behavior with respect to the universal approximation theorem. Given the connected correlators of a field theory, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for neural network field theories. Conversely, the correspondence allows one to engineer architectures realizing a given field theory by representing action deformations as deformations of neural network parameter densities. As an example, $\phi^4$ theory is realized as an infinite-$N$ neural network field theory.

翻訳日:2023-07-10 14:16:25 公開日:2023-07-06

# 敵対的モデルによる不確実性の定量化

Quantification of Uncertainty with Adversarial Models ( http://arxiv.org/abs/2307.03217v1 )

ライセンス: Link先を確認

Kajetan Schweighofer, Lukas Aichberger, Mykyta Ielanskyi, G\"unter Klambauer, Sepp Hochreiter

(参考訳) 不確実性の定量化は、実世界のアプリケーションで実行可能な予測を行う上で重要である。予測的不確実性の定量化において重要な部分は、発散(ダイバージェンス)関数と事後分布の積の積分として定義される認識論的不確実性(epistemic uncertainty)の推定である。Deep EnsemblesやMC dropoutのような現在の手法は、モデルをサンプリングする際に主に事後分布のみを考慮するため、認識論的不確実性の推定性能が十分でない。認識論的不確実性をよりよく推定するために、敵対的モデルによる不確実性の定量化(QUAM)を提案する。QUAMは、事後分布だけでなく、積分の中の積全体が大きい領域を特定する。その結果、QUAMは従来の手法に比べて認識論的不確実性の近似誤差が小さい。積が大きいモデルは、敵対的モデル(敵対的事例ではない)に対応する。敵対的モデルは、高い事後確率と、その予測と参照モデルの予測との間の大きな発散の両方を持つ。実験の結果、QUAMは深層学習モデルの認識論的不確実性を捉えることに優れ、視覚領域の困難なタスクにおいて従来の手法を上回ることが示された。

Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior as well as a high divergence between their predictions and that of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.
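
As a rough sketch of the definition quoted above, the epistemic uncertainty can be written as a posterior-weighted integral of a divergence. The symbols below (reference model $w$, candidate model $\tilde{w}$, dataset $\mathcal{D}$, divergence $\mathrm{D}$, model space $\mathcal{W}$) are notation assumed here for illustration; the abstract does not fix them.

```latex
\[
  u_{\text{epistemic}}(x)
  \;=\;
  \int_{\mathcal{W}}
    \mathrm{D}\!\left( p(y \mid x, w) \,\middle\|\, p(y \mid x, \tilde{w}) \right)
    \, p(\tilde{w} \mid \mathcal{D})
  \, \mathrm{d}\tilde{w}
\]
```

Under this reading, drawing models only where the posterior factor is high can miss regions where the divergence factor makes the whole product large, which is the gap the abstract attributes to Deep Ensembles and MC dropout.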

翻訳日:2023-07-10 14:16:07 公開日:2023-07-06

# PREADD: 制御されたテキスト生成のためのPrefix-Adaptive Decoding

PREADD: Prefix-Adaptive Decoding for Controlled Text Generation ( http://arxiv.org/abs/2307.03214v1 )

ライセンス: Link先を確認

Jonathan Pei, Kevin Yang, and Dan Klein

(参考訳) 制御されたテキスト生成のための柔軟な手法であるPrefix-Adaptive Decoding(PREADD)を提案する。補助的なエキスパートモデルを用いて属性を制御する既存の手法とは異なり、PREADDは外部モデルを必要とせず、複数のプロンプトから得られる出力ロジットの線形結合に基づく。具体的には、PREADDは生のプロンプトを用いて生成された出力ロジットと、プレフィックスを前置したプロンプトを用いて生成された出力ロジットを対比し、プレフィックスが表す任意の属性について正負両方向の制御を可能にする。有害出力の緩和、ジェンダーバイアスの低減、感情制御の3つのタスクでPREADDを評価した結果、PREADDはプロンプトによるベースラインだけでなく、補助エキスパートを用いる制御手法も上回り、各タスクの主要指標で12%以上の相対的改善を示した。

We propose Prefix-Adaptive Decoding (PREADD), a flexible method for controlled text generation. Unlike existing methods that use auxiliary expert models to control for attributes, PREADD does not require an external model, instead relying on linearly combining output logits from multiple prompts. Specifically, PREADD contrasts the output logits generated using a raw prompt against those generated using a prefix-prepended prompt, enabling both positive and negative control with respect to any attribute encapsulated by the prefix. We evaluate PREADD on three tasks -- toxic output mitigation, gender bias reduction, and sentiment control -- and find that PREADD outperforms not only prompting baselines, but also an auxiliary-expert control method, by 12% or more in relative gain on our main metrics for each task.
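
The idea of linearly combining logits from a raw prompt and a prefix-prepended prompt can be illustrated with a small sketch. The exact combination rule is not given in the abstract, so the interpolation below, the strength parameter `alpha`, the function names, and the toy logits are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(raw_logits, prefixed_logits, alpha):
    """Linearly combine logits from the raw and the prefix-prepended prompt.

    alpha is a hypothetical control strength (an assumption, not from the source):
      alpha = 0  reproduces the raw-prompt logits,
      alpha = 1  reproduces the prefixed-prompt logits,
      alpha > 1  amplifies the prefix's effect (positive control),
      alpha < 0  pushes away from the prefix's effect (negative control).
    """
    return [r + alpha * (p - r) for r, p in zip(raw_logits, prefixed_logits)]


# Toy next-token logits over a 3-token vocabulary (made-up numbers).
raw = [2.0, 1.0, 0.0]
prefixed = [0.0, 1.0, 2.0]  # the prefix shifts probability mass toward token 2

probs_raw = softmax(combine_logits(raw, prefixed, 0.0))
probs_amplified = softmax(combine_logits(raw, prefixed, 2.0))
probs_suppressed = softmax(combine_logits(raw, prefixed, -1.0))
```

In this toy setting, a larger `alpha` raises the probability of the token favored by the prefix, and a negative `alpha` lowers it below the raw-prompt value, which mirrors the positive and negative control described in the abstract.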

翻訳日:2023-07-10 14:15:48 公開日:2023-07-06

# OmniBoost: マルチDNNワークロード下における異種組み込みデバイスのスループット向上

OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload ( http://arxiv.org/abs/2307.03290v1 )

ライセンス: Link先を確認

Andreas Karatzas and Iraklis Anagnostopoulos

(参考訳) 現代のディープニューラルネットワーク(DNN)は、高い効率性と精度を示す。これにより、複数のDNNアプリケーションで構成されるアプリケーションワークロードが導入され、ワークロードの分散に関する新たな課題が提起された。多様なアクセラレーターを備えた新しい組込みシステムは、現在のランタイムコントローラが完全に利用できないアーキテクチャ上の不均一性を示す。マルチDNNワークロードで高いスループットを実現するために、このようなコントローラは、基礎となる不均一性を活用するために、数十万の可能なソリューションを探索する必要がある。本稿では,異種組み込みデバイスのための軽量かつ拡張可能なマルチDNNマネージャであるOmniBoostを提案する。我々は確率空間探索を活用し、それを高精度な性能推定器と組み合わせて、他の最先端手法と比較してx4.6平均スループット向上を観測する。評価はHiKey970開発ボードで行われた。

Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded systems present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller ought to explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We leverage stochastic space exploration and combine it with a highly accurate performance estimator to observe a 4.6x average throughput boost compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.

翻訳日:2023-07-10 14:08:45 公開日:2023-07-06

# サブ線形ハイパーボリュームレグレットの最適スカラー化

Optimal Scalarizations for Sublinear Hypervolume Regret ( http://arxiv.org/abs/2307.03288v1 )

ライセンス: Link先を確認

Qiuyi Zhang (Richard)

(参考訳) スカラー化は、複数の目的を1つにまとめるために任意の多目的設定で利用できる一般的な手法であり、最近では人間の選好に合わせる報酬モデルを訓練するRLHFなどで用いられている。しかし、線形スカラー化はパレートフロンティアの凹領域を取りこぼすことが知られているため、この古典的なアプローチを退ける向きもある。そこで我々は、支配超体積(dominated hypervolume)で測ったときに、パレートフロンティア上の$k$個の目的の多様な集合を探索できる単純な非線形スカラー化を見つけることを目指す。一様ランダムな重みを持つ超体積スカラー化は、超体積リグレットを証明可能な形で最小化する上で驚くほど最適であり、$O(T^{-1/k})$という最適な劣線形リグレット上界を達成し、どのアルゴリズムも漸近的にこれを上回れないことを示す一致する下界も得られることを示す。理論的なケーススタディとして、多目的確率的線形バンディット問題を考察し、超体積スカラー化の劣線形リグレット上界を利用することで、$\tilde{O}(d T^{-1/2} + T^{-1/k})$という改善された超体積リグレット上界を与える新しい非ユークリッド解析を導出できることを示す。単純な超体積スカラー化が、線形スカラー化とチェビシェフスカラー化の両方、さらにEHVIのようなベイズ最適化における標準的な多目的アルゴリズムを一貫して上回るという強い実験的性能によって、我々の理論を裏付ける。

Scalarization is a general technique that can be deployed in any multiobjective setting to reduce multiple objectives into one, such as recently in RLHF for training reward models that align with human preferences. Yet some have dismissed this classical approach because linear scalarizations are known to miss concave regions of the Pareto frontier. To that end, we aim to find simple non-linear scalarizations that can explore a diverse set of $k$ objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights are surprisingly optimal for provably minimizing the hypervolume regret, achieving an optimal sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds that preclude any algorithm from doing better asymptotically. As a theoretical case study, we consider the multiobjective stochastic linear bandits problem and demonstrate that by exploiting the sublinear regret bounds of the hypervolume scalarizations, we can derive a novel non-Euclidean analysis that produces improved hypervolume regret bounds of $\tilde{O}( d T^{-1/2} + T^{-1/k})$. We support our theory with the strong empirical performance of simple hypervolume scalarizations, which consistently outperform both the linear and Chebyshev scalarizations, as well as standard multiobjective algorithms in Bayesian optimization, such as EHVI.
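
The abstract does not spell out the scalarization itself. The sketch below assumes the form commonly used in the random-scalarization literature, $s_\lambda(y) = \min_i \max(0, (y_i - z_i)/\lambda_i)^k$ with reference point $z$ and a normalizing constant $c_k$, and uses it to estimate a dominated hypervolume by averaging over uniformly random weight directions. The function names, the sample count, and the toy points are hypothetical.

```python
import math
import random


def hypervolume_scalarization(y, weights, reference):
    """Assumed form: min_i max(0, (y_i - z_i) / lambda_i) ** k."""
    k = len(y)
    ratios = [max(0.0, (yi - zi) / wi) for yi, zi, wi in zip(y, reference, weights)]
    return min(ratios) ** k


def random_positive_direction(k, rng):
    """Uniform sample from the positive orthant of the unit sphere in R^k."""
    while True:
        g = [abs(rng.gauss(0.0, 1.0)) for _ in range(k)]
        if min(g) > 0.0:  # avoid a zero weight (division by zero above)
            norm = math.sqrt(sum(v * v for v in g))
            return [v / norm for v in g]


def estimate_hypervolume(points, reference, num_samples=20000, seed=0):
    """Monte Carlo estimate: c_k * E_lambda[ max_y s_lambda(y) ]."""
    rng = random.Random(seed)
    k = len(reference)
    # c_k is the volume of the unit ball restricted to the positive orthant.
    c_k = math.pi ** (k / 2) / (2 ** k * math.gamma(k / 2 + 1))
    total = 0.0
    for _ in range(num_samples):
        lam = random_positive_direction(k, rng)
        total += max(hypervolume_scalarization(y, lam, reference) for y in points)
    return c_k * total / num_samples


# One point (1, 1) dominates the unit square, whose area is 1.
single = estimate_hypervolume([(1.0, 1.0)], (0.0, 0.0))
# Two points dominate the union of two rectangles, whose area is 0.75.
front = estimate_hypervolume([(1.0, 0.5), (0.5, 1.0)], (0.0, 0.0))
```

Because each random weight vector turns the multiobjective problem into a single scalar maximization, the same estimator suggests why optimizing randomly weighted scalarizations can cover the frontier; the regret analysis itself is in the paper and is not reproduced here.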

翻訳日:2023-07-10 14:08:31 公開日:2023-07-06

# 接続制限付き量子符号の速度-距離トレードオフの改善

Improved rate-distance trade-offs for quantum codes with restricted connectivity ( http://arxiv.org/abs/2307.03283v1 )

ライセンス: Link先を確認

Nou\'edyn Baspin, Venkatesan Guruswami, Anirudh Krishna, Ray Li

(参考訳) 量子誤り訂正符号が実現可能であるためには、符号の制約を受ける量子ビットが何らかの形で限定的な接続性を持つことが重要である。Bravyi & Terhal (BT) と Bravyi, Poulin & Terhal (BPT) の研究は、幾何学的局所性が符号の特性を制約することを示した。例えば、$D$次元格子上の局所チェックによって定義される$[[n,k,d]]$量子符号は、$k d^{2/(D-1)} \le O(n)$に従わなければならない。BaspinとKrishnaは、量子符号に付随する接続グラフが符号パラメータをどのように制約するかという、より一般的な問題を研究した。これらのトレードオフは、幾何学的に局所的な符号のみを対象とするBPTおよびBTの限界に比べて、より広いクラスの符号に適用される。我々はこの研究を拡張・改善し、接続グラフにおけるセパレータの大きさの関数として、より厳密な次元と距離のトレードオフを確立する。また、LDPC符号だけでなく、特定の分離プロファイルを持つすべてのスタビライザー符号をカバーする距離の限界も得る。

For quantum error-correcting codes to be realizable, it is important that the qubits subject to the code constraints exhibit some form of limited connectivity. The works of Bravyi & Terhal (BT) and Bravyi, Poulin & Terhal (BPT) established that geometric locality constrains code properties -- for instance $[[n,k,d]]$ quantum codes defined by local checks on the $D$-dimensional lattice must obey $k d^{2/(D-1)} \le O(n)$. Baspin and Krishna studied the more general question of how the connectivity graph associated with a quantum code constrains the code parameters. These trade-offs apply to a richer class of codes compared to the BPT and BT bounds, which only capture geometrically-local codes. We extend and improve this work, establishing a tighter dimension-distance trade-off as a function of the size of separators in the connectivity graph. We also obtain a distance bound that covers all stabilizer codes with a particular separation profile, rather than only LDPC codes.
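
To make the quoted bound concrete, the following worked instance substitutes small values of $D$ into $k d^{2/(D-1)} \le O(n)$. The rotated surface code parameters used as a sanity check are standard; everything else is arithmetic on the bound as stated above.

```latex
\begin{align*}
  D = 2:&\quad k\, d^{2/(2-1)} = k\, d^{2} \le O(n), \\
  D = 3:&\quad k\, d^{2/(3-1)} = k\, d \le O(n).
\end{align*}
% Sanity check for D = 2: the rotated surface code has [[n, k, d]] = [[d^2, 1, d]],
% so k d^2 = d^2 = n, which meets the bound up to a constant factor.
```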

翻訳日:2023-07-10 14:08:01 公開日:2023-07-06

# 近未来の表面符号実験のためのニューラルネットワークデコーダ

Neural network decoder for near-term surface-code experiments ( http://arxiv.org/abs/2307.03280v1 )

ライセンス: Link先を確認

Boris M. Varbanov, Marc Serra-Peralta, David Byfield, Barbara M. Terhal

(参考訳) ニューラルネットワークデコーダは、表面符号を復号する際に、最小重み完全マッチングのような従来のデコーダよりも低い論理誤り率を達成できる。さらに、これらのデコーダは物理誤り率に関する事前情報を必要としないため、高い適応性を持つ。本研究では、距離の小さい表面符号に焦点を当て、シミュレーションデータとトランズモン量子ビットプロセッサから得られた実験データの両方を用いて、このようなデコーダの性能を調べる。まず、$Y$誤りのように複数の相関したシンドローム欠陥を引き起こす誤りをよりうまく扱えるため、ニューラルネットワークは通常マッチングデコーダを上回ることを示す。[Google Quantum AI, Nature 614, 676 (2023)]の実験データに適用すると、ニューラルネットワークデコーダは最小重み完全マッチングよりも約$25\%$低い論理誤り率を達成し、最尤デコーダの性能に近づく。このデコーダの柔軟性を示すために、トランズモン量子ビットのアナログ読み出しで得られるソフト情報を組み込み、対称ガウスノイズモデルを用いたシミュレーションでその性能を評価する。ソフト情報を考慮すると、測定誤りの確率に応じて、論理誤り率が約$10\%$低下する。良好な論理性能、柔軟性、計算効率により、ニューラルネットワークデコーダは量子メモリの近未来の実証に適している。

Neural-network decoders can achieve a lower logical error rate compared to conventional decoders, like minimum-weight perfect matching, when decoding the surface code. Furthermore, these decoders require no prior information about the physical error rates, making them highly adaptable. In this study, we investigate the performance of such a decoder using both simulated and experimental data obtained from a transmon-qubit processor, focusing on small-distance surface codes. We first show that the neural network typically outperforms the matching decoder due to better handling errors leading to multiple correlated syndrome defects, such as $Y$ errors. When applied to the experimental data of [Google Quantum AI, Nature 614, 676 (2023)], the neural network decoder achieves logical error rates approximately $25\%$ lower than minimum-weight perfect matching, approaching the performance of a maximum-likelihood decoder. To demonstrate the flexibility of this decoder, we incorporate the soft information available in the analog readout of transmon qubits and evaluate the performance of this decoder in simulation using a symmetric Gaussian-noise model. Considering the soft information leads to an approximately $10\%$ lower logical error rate, depending on the probability of a measurement error. The good logical performance, flexibility, and computational efficiency make neural network decoders well-suited for near-term demonstrations of quantum memories.

翻訳日:2023-07-10 14:07:34 公開日:2023-07-06

# 事前訓練するか、事前訓練しないか? 病理組織学におけるセマンティクスセグメンテーションのためのドメイン特化前訓練の事例研究

To pretrain or not to pretrain? A case study of domain-specific pretraining for semantic segmentation in histopathology ( http://arxiv.org/abs/2307.03275v1 )

ライセンス: Link先を確認

Tushar Kataria, Beatrice Knudsen and Shireen Elhabian

(参考訳) 医用画像データセットのアノテートは費用がかかるため、細調整(あるいは伝達学習)は疾患分類やセマンティックセグメンテーションなどのデジタル病理ビジョン応用において最も効果的な方法である。しかし、実際の画像に基づいて訓練されたモデルのテクスチャバイアスにより、転送学習は、未ラベルの病理学データと自己教師によるドメイン固有の特徴の発見を必要とするような、パフォーマンスの低いモデルをもたらす可能性がある。そこで我々は,病理組織特異的な事前訓練モデルが,病理視覚,すなわち腺と細胞セグメンテーションにより良い初期化をもたらすという前提を検証した。本研究では,腺と細胞セグメンテーションタスクのパフォーマンスを,ドメイン特異的および非ドメイン特異的な事前訓練重量と比較した。さらに,ドメイン固有事前学習が統計的に有意な性能差をもたらすデータサイズについて検討する。さらに,ドメイン固有の初期化によって,異なるデータセット上でのドメイン外テストの有効性が向上するかどうかを検討した。その結果、ドメイン固有の事前トレーニングによるパフォーマンス向上は、タスクとトレーニングデータセットのサイズの両方に依存することがわかった。データセットサイズが限定されたインスタンスでは腺分節性能が著しく向上するのに対し,細胞分節データセットでトレーニングしたモデルでは改善は見られなかった。

Annotating medical imaging datasets is costly, so fine-tuning (or transfer learning) is the most effective method for digital pathology vision applications such as disease classification and semantic segmentation. However, due to texture bias in models trained on real-world images, transfer learning for histopathology applications might result in underperforming models, which necessitates the need for using unlabeled histopathology data and self-supervised methods to discover domain-specific characteristics. Here, we tested the premise that histopathology-specific pretrained models provide better initializations for pathology vision tasks, i.e., gland and cell segmentation. In this study, we compare the performance of gland and cell segmentation tasks with domain-specific and non-domain-specific pretrained weights. Moreover, we investigate the data size at which domain-specific pretraining produces a statistically significant difference in performance. In addition, we investigated whether domain-specific initialization improves the effectiveness of out-of-domain testing on distinct datasets but the same task. The results indicate that performance gain using domain-specific pretraining depends on both the task and the size of the training dataset. In instances with limited dataset sizes, a significant improvement in gland segmentation performance was also observed, whereas models trained on cell segmentation datasets exhibit no improvement.

翻訳日:2023-07-10 14:07:09 公開日:2023-07-06

# 性的に示唆的ではなく、教育的である。TikTok動画における性教育と性的示唆コンテンツの分離

It is not Sexually Suggestive, It is Educative. Separating Sex Education from Suggestive Content on TikTok Videos ( http://arxiv.org/abs/2307.03274v1 )

ライセンス: Link先を確認

Enfa George, Mihai Surdeanu

(参考訳) SexTokは、(アノテーターの視点から)性的に示唆的なコンテンツ、性教育的なコンテンツ、あるいはそのどちらでもないものとしてラベル付けされたTikTok動画からなるマルチモーダルデータセットである。このようなデータセットは、TikTok上の性的に示唆的なコンテンツとバーチャル性教育動画を区別するという課題に対処するために必要である。子どもが性的に示唆的な動画に触れることは、その発達に悪影響を及ぼすことが示されている。一方、バーチャル性教育、特にLGBTQIA+コミュニティにより関連の深いテーマに関するものは、非常に価値がある。プラットフォームの現在のシステムは、両者が異なる目的を果たしているにもかかわらず、どちらの種類の動画も一部を削除したりペナルティを科したりしている。我々のデータセットには動画のURLが含まれ、音声の書き起こしも付与されている。その重要性を検証するために、動画を分類する2つのトランスフォーマーベースのモデルを検討する。予備的な結果は、これらの種類の動画を区別するタスクは学習可能であるが難しいことを示唆している。これらの実験は、このデータセットが有意義であることを示唆しており、このテーマに関するさらなる研究を促すものである。

We introduce SexTok, a multi-modal dataset composed of TikTok videos labeled as sexually suggestive (from the annotator's point of view), sex-educational content, or neither. Such a dataset is necessary to address the challenge of distinguishing between sexually suggestive content and virtual sex education videos on TikTok. Children's exposure to sexually suggestive videos has been shown to have adverse effects on their development. Meanwhile, virtual sex education, especially on subjects that are more relevant to the LGBTQIA+ community, is very valuable. The platform's current system removes or penalizes some of both types of videos, even though they serve different purposes. Our dataset contains video URLs together with audio transcriptions. To validate its importance, we explore two transformer-based models for classifying the videos. Our preliminary results suggest that the task of distinguishing between these types of videos is learnable but challenging. These experiments suggest that this dataset is meaningful and invites further study on the subject.

翻訳日:2023-07-10 14:06:45 公開日:2023-07-06

# ADASSM:画像からの統計的形状モデルにおける逆データ拡張

ADASSM: Adversarial Data Augmentation in Statistical Shape Models From Images ( http://arxiv.org/abs/2307.03273v1 )

ライセンス: Link先を確認

Mokshagna Sai Teja Karanam, Tushar Kataria and Shireen Elhabian

(参考訳) 統計的形状モデル (SSM) は, 個体群全体の解剖学的変化を識別するための優れたツールとして確立されている。形状モデルは、与えられたコホート内のすべてのサンプルに対して一貫した形状表現を使用し、形状を比較し、病理を検出できるバリエーションを特定し、治療計画を定式化するのに役立ちます。医用画像では、これらの形状表現をCT/MRIスキャンから計算するには、解剖学的セグメンテーションアノテーション、登録、テクスチャデノイングを含む時間集約的な前処理操作が必要となる。深層学習モデルは、容積画像から直接形状表現を学習する際、例外的な能力を示し、高効率で効率的な画像からSSMへと導く。それでもこれらのモデルはデータ不足であり、医療データの入手が限られているため、ディープラーニングモデルは過度に適合する傾向にある。形状拡張されたサンプルを生成するためにカーネル密度推定(KDE)法を用いるオフラインデータ拡張技術は、従来のSSM法と同等の精度で画像からSSMネットワークを支援することに成功した。しかし,これらの拡張手法は形状向上に重点を置いているのに対し,深層学習モデルは準最適モデルにおけるテクスチャバイアスの結果を示す。本稿では,データ依存型ノイズ生成やテクスチャ拡張を利用して,画像間SSMフレームワークのオンザフライデータ拡張のための新しい戦略を提案する。提案するフレームワークは,画像対ssmネットワークの敵として訓練され,多様で難解なサンプルを補完する。提案手法は,画素値のみに頼らず,基礎となる幾何学に焦点をあてることにより,精度の向上を実現する。

Statistical shape models (SSM) have been well-established as an excellent tool for identifying variations in the morphology of anatomy across the underlying population. Shape models use a consistent shape representation across all the samples in a given cohort, which helps to compare shapes and identify the variations that can detect pathologies and help in formulating treatment plans. In medical imaging, computing these shape representations from CT/MRI scans requires time-intensive preprocessing operations, including but not limited to anatomy segmentation annotations, registration, and texture denoising. Deep learning models have demonstrated exceptional capabilities in learning shape representations directly from volumetric images, giving rise to highly effective and efficient Image-to-SSM networks. Nevertheless, these models are data-hungry, and due to the limited availability of medical data, deep learning models tend to overfit. Offline data augmentation techniques that use kernel density estimation (KDE) based methods for generating shape-augmented samples have successfully aided Image-to-SSM networks in achieving accuracy comparable to traditional SSM methods. However, these augmentation methods focus on shape augmentation, whereas deep learning models exhibit texture bias, which results in sub-optimal models. This paper introduces a novel strategy for on-the-fly data augmentation for the Image-to-SSM framework by leveraging data-dependent noise generation or texture augmentation. The proposed framework is trained as an adversary to the Image-to-SSM network, augmenting diverse and challenging noisy samples. Our approach achieves improved accuracy by encouraging the model to focus on the underlying geometry rather than relying solely on pixel values.

翻訳日:2023-07-10 14:06:27 公開日:2023-07-06

# 近未来量子プロセッサのためのハイブリッド量子古典敵対的生成ネットワーク

A Hybrid Quantum-Classical Generative Adversarial Network for Near-Term Quantum Processors ( http://arxiv.org/abs/2307.03269v1 )

ライセンス: Link先を確認

Albha O'Dwyer Boyle and Reza Nikandish

(参考訳) 本稿では、近未来の量子プロセッサのためのハイブリッド量子古典敵対的生成ネットワーク(GAN)を提案する。ハイブリッドGANは、生成器と識別器の量子ニューラルネットワーク(QNN)からなる。生成器ネットワークは、角度符号化量子回路と変分量子アンザッツを用いて実現される。識別器ネットワークは、多段の学習可能な符号化量子回路を用いて実現される。QNNに対して、深さを制御して精度と回路の複雑さの折り合いをつけられるモジュール設計手法を提案する。生成器と識別器のネットワークの損失関数の勾配は、それらの実装に用いるのと同じ量子回路を用いて導出される。これにより、追加の量子回路や補助量子ビットが不要になる。量子シミュレーションはIBM Qiskitオープンソースソフトウェア開発キット(SDK)を用いて行い、ハイブリッド量子古典GANの訓練は古典コンピュータ上でミニバッチ確率的勾配降下法(SGD)による最適化を用いて行う。ハイブリッド量子古典GANは、異なる識別器ネットワーク構造を持つ2量子ビット系を用いて実装される。5段の識別器ネットワークを用いて実現したハイブリッドGANは、63個の量子ゲートと31個の学習可能パラメータからなり、実データ分布と生成データ分布の類似度について、Kullback-Leibler(KL)およびJensen-Shannon(JS)ダイバージェンスのスコアでそれぞれ0.39および4.16を達成する。

In this article, we present a hybrid quantum-classical generative adversarial network (GAN) for near-term quantum processors. The hybrid GAN comprises a generator and a discriminator quantum neural network (QNN). The generator network is realized using an angle encoding quantum circuit and a variational quantum ansatz. The discriminator network is realized using multi-stage trainable encoding quantum circuits. A modular design approach is proposed for the QNNs, which enables control over their depth to trade off accuracy against circuit complexity. Gradients of the loss functions for the generator and discriminator networks are derived using the same quantum circuits used for their implementation. This avoids the need for extra quantum circuits or auxiliary qubits. The quantum simulations are performed using the IBM Qiskit open-source software development kit (SDK), while the training of the hybrid quantum-classical GAN is conducted using mini-batch stochastic gradient descent (SGD) optimization on a classical computer. The hybrid quantum-classical GAN is implemented using a two-qubit system with different discriminator network structures. The hybrid GAN realized using a five-stage discriminator network comprises 63 quantum gates and 31 trainable parameters, and achieves Kullback-Leibler (KL) and Jensen-Shannon (JS) divergence scores of 0.39 and 4.16, respectively, for similarity between the real and generated data distributions.
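
As a minimal illustration of angle encoding, the sketch below builds the statevector obtained by applying one single-qubit rotation per classical feature. The abstract only says "angle encoding", so the choice of an $R_Y$ rotation, the function name `angle_encode`, and the qubit ordering are assumptions made here; the example does not use Qiskit and is not the authors' circuit.

```python
import math


def angle_encode(features):
    """Return the statevector of RY(x_1)|0> (x) RY(x_2)|0> (x) ...

    Each feature x is mapped to cos(x/2)|0> + sin(x/2)|1>. Using RY is an
    assumption for illustration. The first feature corresponds to the
    leftmost qubit in the basis label (a convention chosen here).
    """
    state = [1.0]
    for x in features:
        qubit = [math.cos(x / 2.0), math.sin(x / 2.0)]
        # Tensor product of the state built so far with the new qubit.
        state = [a * b for a in state for b in qubit]
    return state


# Two features encoded on a two-qubit system: the first qubit stays in |0>,
# the second is rotated by pi, so nearly all weight sits on basis state |01>.
state = angle_encode([0.0, math.pi])
probabilities = [amplitude * amplitude for amplitude in state]
```

In a full generator, a trainable variational ansatz would follow this encoding layer; that part depends on design details the abstract does not give.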

翻訳日:2023-07-10 14:05:57 公開日:2023-07-06

# 前立腺イメージングにおけるセグメンテーション基礎モデルの実証解析

Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging ( http://arxiv.org/abs/2307.03266v1 )

ライセンス: Link先を確認

Heejong Kim, Victor Ion Butoi, Adrian V. Dalca, Mert R. Sabuncu

(参考訳) 医療画像セグメンテーションの最先端技術のほとんどは、ディープラーニングモデルに依存している。しかしながら、これらのモデルは、しばしば、高価なラベル付きデータセットを必要とする教師付き方法で、狭義のタスクで訓練される。自然言語生成などの機械学習領域の最近の進歩は、ラベル付きデータはほとんどなく、下流の様々なタスクにカスタマイズ可能な基礎モデルの構築の実現可能性と有用性を示している。これは、基礎モデルがこの分野の未来を形作ることを期待する医療画像のパラダイムシフトである可能性が高い。本稿では,最近開発された医用画像分割の基礎モデル universeg について述べる。本研究では,前立腺画像の文脈で経験的評価を行い,従来のタスク固有セグメンテーションモデルの訓練手法と比較する。本研究は, 医用画像セグメンテーションの基礎モデルの開発と導入において重要となるいくつかの重要な要因について考察した。

Most state-of-the-art techniques for medical image segmentation rely on deep-learning models. These models, however, are often trained on narrowly-defined tasks in a supervised fashion, which requires expensive labeled datasets. Recent advances in several machine learning domains, such as natural language generation have demonstrated the feasibility and utility of building foundation models that can be customized for various downstream tasks with little to no labeled data. This likely represents a paradigm shift for medical imaging, where we expect that foundation models may shape the future of the field. In this paper, we consider a recently developed foundation model for medical image segmentation, UniverSeg. We conduct an empirical evaluation study in the context of prostate imaging and compare it against the conventional approach of training a task-specific segmentation model. Our results and discussion highlight several important factors that will likely be important in the development and adoption of foundation models for medical image segmentation.

翻訳日:2023-07-10 14:05:29 公開日:2023-07-06

# Vision Language Transformers: 調査

Vision Language Transformers: A Survey ( http://arxiv.org/abs/2307.03254v1 )

ライセンス: Link先を確認

Clayton Fields, Casey Kennington

(参考訳) 画像に関する質問に答えたり、画像を説明するキャプションを生成したりするといった視覚言語タスクは、コンピュータにとって実行が難しいタスクである。比較的最近の一連の研究は、\citet{vaswani2017attention}で導入された事前学習済みトランスフォーマーアーキテクチャを視覚言語モデリングに適用している。トランスフォーマーモデルは、以前の視覚言語モデルに比べて性能と汎用性を大幅に向上させた。これは、大規模な汎用データセットでモデルを事前学習し、アーキテクチャやパラメータ値にわずかな変更を加えるだけで、その学習を新しいタスクに転移することによる。この種の転移学習は、自然言語処理とコンピュータビジョンの両方において標準的なモデリング手法となっている。視覚言語トランスフォーマーは、視覚と言語の両方を必要とするタスクでも同様の進歩をもたらすことが期待される。本稿では、現在利用可能な視覚言語トランスフォーマーモデルに関する研究を幅広く整理し、その強み、限界、残された未解決の問題について分析する。

Vision language tasks, such as answering questions about or generating captions that describe an image, are difficult tasks for computers to perform. A relatively recent body of research has adapted the pretrained transformer architecture introduced in \citet{vaswani2017attention} to vision language modeling. Transformer models have greatly improved performance and versatility over previous vision language models. They do so by pretraining models on large generic datasets and transferring their learning to new tasks with minor changes in architecture and parameter values. This type of transfer learning has become the standard modeling practice in both natural language processing and computer vision. Vision language transformers offer the promise of producing similar advancements in tasks which require both vision and language. In this paper, we provide a broad synthesis of the currently available research on vision language transformer models and offer some analysis of their strengths, limitations and some open questions that remain.

翻訳日:2023-07-10 14:05:16 公開日:2023-07-06

# InfoSync:多言語半構造化テーブル間の情報同期

InfoSync: Information Synchronization across Multilingual Semi-structured Tables ( http://arxiv.org/abs/2307.03313v1 )

ライセンス: Link先を確認

Siddharth Khincha, Chelsi Jain, Vivek Gupta, Tushar Kataria, Shuo Zhang

(参考訳) 言語間の半構造化データの情報同期は困難である。例えば、ある言語のウィキペディアテーブルは言語間で同期する必要がある。この問題に対処するために,新しいデータセットInfoSyncと2段階の表同期手法を導入する。 InfoSyncには14言語にまたがる100Kのエンティティ中心テーブル(Wikipedia Infobox)が含まれており、サブセット(3.5Kペア)が手動で注釈付けされている。提案手法は 1)行を対応付ける情報アライメントと 2)アライメント済みの多言語テーブル間で欠落した情報や古い情報を更新する情報更新からなる。 InfoSyncで評価すると、情報アライメントはF1スコア87.91(en <-> non-en)を達成する。情報更新を評価するため,603のテーブル対に対してInfoboxesで人手によるウィキペディア編集を行う。本手法はウィキペディア上で77.28%の受け入れ率を示し,提案手法の有効性を示した。

Information synchronization of semi-structured data across languages is challenging. For instance, Wikipedia tables in one language should be synchronized across languages. To address this problem, we introduce a new dataset, InfoSync, and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (3.5K pairs) is manually annotated. The proposed method includes 1) Information Alignment to map rows and 2) Information Update for updating missing/outdated information for aligned tables across multilingual tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method.

翻訳日:2023-07-10 13:59:01 公開日:2023-07-06

# スカラーおよびベクトルデータに対する球面調和表現の不変性、等分散、相関および畳み込みについて

On Invariance, Equivariance, Correlation and Convolution of Spherical Harmonic Representations for Scalar and Vectorial Data ( http://arxiv.org/abs/2307.03311v1 )

ライセンス: Link先を確認

Janis Keuper

(参考訳) Spherical Harmonic (SH) ドメインにおけるデータの数学的表現は、最近、機械学習コミュニティへの関心が高まっている。この技術報告では、SH表現の理論的基礎と実践的な実装について詳細に紹介し、回転不変および同変特性に関する研究を要約するとともに、球面上の信号の畳み込みと正確な相関について述べる。拡張において、これらの手法はスカラーSH表現からベクトル調和(VH)へ一般化され、球面上の3次元ベクトル場にも同様の機能を与える。

The mathematical representation of data in the Spherical Harmonic (SH) domain has recently regained increasing interest in the machine learning community. This technical report gives an in-depth introduction to the theoretical foundation and practical implementation of SH representations, summarizing works on rotation invariant and equivariant features, as well as convolutions and exact correlations of signals on spheres. By extension, these methods are then generalized from scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3D vector fields on spheres.
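The rotation invariance mentioned above can be illustrated with the standard band-wise energy of an SH expansion. This is a textbook sketch, not the report's own derivation; the symbols $\hat f_{lm}$ (SH coefficients), $D^{l}(R)$ (Wigner D-matrix) and the rotation convention are assumptions made for illustration.

```latex
% SH expansion of a scalar signal f on the sphere S^2
f(\theta,\phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \hat f_{lm}\, Y_{lm}(\theta,\phi),
\qquad
\hat f_{lm} = \int_{S^2} f(\theta,\phi)\, \overline{Y_{lm}(\theta,\phi)}\, d\Omega .

% A rotation R mixes coefficients only within each band l
\widehat{(Rf)}_{lm} = \sum_{m'=-l}^{l} D^{l}_{mm'}(R)\, \hat f_{lm'} .

% D^l(R) is unitary, so the energy of each band is unchanged by R
\sum_{m=-l}^{l} \bigl| \widehat{(Rf)}_{lm} \bigr|^{2}
  = \sum_{m=-l}^{l} \bigl| \hat f_{lm} \bigr|^{2} .
```

The vector of band energies is therefore a simple rotation-invariant feature, at the cost of discarding the phase information within each band.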

翻訳日:2023-07-10 13:58:45 公開日:2023-07-06

# 機械学習による積分可能な量子多体系のダイナミクスの探索

Finding the Dynamics of an Integrable Quantum Many-Body System via Machine Learning ( http://arxiv.org/abs/2307.03310v1 )

ライセンス: Link先を確認

Victor Wei, Alev Orfi, Felix Fehse, W. A. Coish

(参考訳) 学習手法を用いて,ガウディン磁石(中心スピンモデル)の力学について検討する。このモデルは、例えば、環境スピンの大きな浴と相互作用する中心スピンの非マルコフ非コヒーレンスダイナミクスの研究や非平衡超伝導の研究など、実用上重要なものである。ガウディン磁石もまた可積分であり、多くの保存量を認めている:$N$スピンに対して、モデルハミルトニアンは$N$独立通勤作用素の和として書くことができる。この高次対称性にもかかわらず、この多体問題の力学に対する一般閉形式解析解はいまだ解明されていない。機械学習手法は、明示的な解析解が明らかでない場合でも、可積分問題における高次対称性を利用するのに適している。この直観に動機づけられ、モデルハミルトニアンの各変分固有状態に対してニューラルネットワーク表現(制限ボルツマン機械)を用いる。次に、変分モンテカルロ計算により、ガウディン・マグネットハミルトニアンの基底状態と低次励起状態の正確な表現を得る。低次固有状態から、スピン浴の存在下での時間変化する横磁場に対する中心スピンの線形応答を記述する非摂動動的横スピン感受性を求める。この感受性を効率的に記述することは、量子2レベルシステムの環境と相互作用する量子ビットのキャラクタリゼーションと量子制御手順を改善するための扉を開く。これらのシステムには、超微粒子相互作用を介して環境核スピンと相互作用する電子スピンおよびホールスピン量子ビットや、コヒーレント電荷または常磁性不純物と相互作用する自由度を持つ量子ビットが含まれる。

We study the dynamics of the Gaudin magnet ("central-spin model") using machine-learning methods. This model is of practical importance, e.g., for studying non-Markovian decoherence dynamics of a central spin interacting with a large bath of environmental spins and for studies of nonequilibrium superconductivity. The Gaudin magnet is also integrable, admitting many conserved quantities: For $N$ spins, the model Hamiltonian can be written as the sum of $N$ independent commuting operators. Despite this high degree of symmetry, a general closed-form analytic solution for the dynamics of this many-body problem remains elusive. Machine-learning methods may be well suited to exploiting the high degree of symmetry in integrable problems, even when an explicit analytic solution is not obvious. Motivated in part by this intuition, we use a neural-network representation (restricted Boltzmann machine) for each variational eigenstate of the model Hamiltonian. We then obtain accurate representations of the ground state and of the low-lying excited states of the Gaudin-magnet Hamiltonian through a variational Monte Carlo calculation. From the low-lying eigenstates, we find the non-perturbative dynamic transverse spin susceptibility, describing the linear response of a central spin to a time-varying transverse magnetic field in the presence of a spin bath. Having an efficient description of this susceptibility opens the door to improved characterization and quantum control procedures for qubits interacting with an environment of quantum two-level systems. These systems include electron-spin and hole-spin qubits interacting with environmental nuclear spins via hyperfine interactions or qubits with charge or flux degrees of freedom interacting with coherent charge or paramagnetic impurities.

翻訳日:2023-07-10 13:58:34 公開日:2023-07-06

# 高協調光学系における熱的相互変調バックアクション

Thermal intermodulation backaction in a high-cooperativity optomechanical system ( http://arxiv.org/abs/2307.03309v1 )

ライセンス: Link先を確認

Christian M. Pluchar, Aman R. Agrawal, Dalziel J. Wilson

(参考訳) テザリングナノメカニカル共振器を用いた室温量子光力学の追求は、外部の機械的自由度による厳しい課題に直面している。重要な例は熱相互変調ノイズ(TIN)であり、熱雑音ピークの混合によって生じる余分な光学ノイズの一種である。 TINは光場の位相から切り離すことができるが、放射圧を介して間接的に結合したままであり、ショットノイズを圧倒しうる隠れたバックアクションの源を示唆している。本稿では,Fabry-Pérot型キャビティに結合した音響周波数Si$_3$N$_4$トランポリンからなる高協調性の室温キャビティ光機械系におけるTINのバックアクションの観測を報告する。観測したバックアクションは, 熱運動がキャビティ線幅の10分の1であるにもかかわらず, 熱雑音を20 dB, 放射圧ショットノイズを40 dB上回る。この結果は、TINの緩和が、様々な現代の光機械系において室温から量子状態に到達する上で重要であることを示唆している。

The pursuit of room temperature quantum optomechanics with tethered nanomechanical resonators faces stringent challenges owing to extraneous mechanical degrees of freedom. An important example is thermal intermodulation noise (TIN), a form of excess optical noise produced by mixing of thermal noise peaks. While TIN can be decoupled from the phase of the optical field, it remains indirectly coupled via radiation pressure, implying a hidden source of backaction that might overwhelm shot noise. Here we report observation of TIN backaction in a high-cooperativity, room temperature cavity optomechanical system consisting of an acoustic-frequency Si$_3$N$_4$ trampoline coupled to a Fabry-Pérot cavity. The backaction we observe exceeds thermal noise by 20 dB and radiation pressure shot noise by 40 dB, despite the thermal motion being 10 times smaller than the cavity linewidth. Our results suggest that mitigating TIN may be critical to reaching the quantum regime from room temperature in a variety of contemporary optomechanical systems.

翻訳日:2023-07-10 13:58:06 公開日:2023-07-06

# 公正な分類がノイズ保護属性と出会うとき

When Fair Classification Meets Noisy Protected Attributes ( http://arxiv.org/abs/2307.03306v1 )

ライセンス: Link先を確認

Avijit Ghosh, Pablo Kvitca, Christo Wilson

(参考訳) アルゴリズムの公平性の運用には、データセットの保護属性の可用性や信頼性など、いくつかの実用的な課題が伴う。現実の文脈では、実用的および法的障害は人口統計データの収集と使用を妨げ、アルゴリズムの公平性を保証することが困難になる。初期フェアネスアルゴリズムはこれらの制限を考慮しなかったが、最近の提案は保護属性にノイズを組み込むか、保護属性を全く使わないことで分類のアルゴリズム的フェアネスを達成することを目的としている。我々の知る限りでは、これは、予測と公正性の二重軸に沿った属性耐性、耐雑音性、および属性ブラインドアルゴリズムを比較するための、公平な分類アルゴリズムの直接的研究である。これらのアルゴリズムを実世界の4つのデータセットと合成摂動のケーススタディを通じて評価した。本研究は,保護された属性がノイズである場合でも,属性依存型アルゴリズムと同等の性能を達成できることを示す。しかし、実際に実施するには注意深いニュアンスが必要である。本研究は,保護属性がうるさく,部分的に使用可能なシナリオにおいて,公平な分類アルゴリズムを使用することの実際的な意義について考察する。

The operationalization of algorithmic fairness comes with several practical challenges, not the least of which is the availability or reliability of protected attributes in datasets. In real-world contexts, practical and legal impediments may prevent the collection and use of demographic data, making it difficult to ensure algorithmic fairness. While initial fairness algorithms did not consider these limitations, recent proposals aim to achieve algorithmic fairness in classification by incorporating noisiness in protected attributes or not using protected attributes at all. To the best of our knowledge, this is the first head-to-head study of fair classification algorithms to compare attribute-reliant, noise-tolerant and attribute-blind algorithms along the dual axes of predictivity and fairness. We evaluated these algorithms via case studies on four real-world datasets and synthetic perturbations. Our study reveals that attribute-blind and noise-tolerant fair classifiers can potentially achieve a similar level of performance as attribute-reliant algorithms, even when protected attributes are noisy. However, implementing them in practice requires careful nuance. Our study provides insights into the practical implications of using fair classification algorithms in scenarios where protected attributes are noisy or partially available.
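To make the effect of noisy protected attributes concrete, the sketch below measures a demographic parity gap twice: once with the true group labels and once with labels in which two entries are flipped. The metric choice, the toy predictions and the flipped labels are all assumptions for illustration; the abstract does not state which fairness metrics or noise models the study uses.

```python
def positive_rate(preds, groups, g):
    """Fraction of positive predictions among members of group g."""
    members = [p for p, a in zip(preds, groups) if a == g]
    return sum(members) / len(members)


def demographic_parity_gap(preds, groups):
    """Absolute difference in positive rates between groups 0 and 1."""
    return abs(positive_rate(preds, groups, 0) - positive_rate(preds, groups, 1))


# Hypothetical binary predictions and protected-attribute labels.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
true_groups = [0, 0, 0, 0, 1, 1, 1, 1]
noisy_groups = [0, 0, 1, 0, 1, 1, 0, 1]  # labels at positions 2 and 6 flipped

gap_true = demographic_parity_gap(preds, true_groups)    # 0.5
gap_noisy = demographic_parity_gap(preds, noisy_groups)  # 0.0
```

In this toy case the noisy labels hide the disparity entirely, which is one reason auditing with unreliable attributes can mislead.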

翻訳日:2023-07-10 13:57:47 公開日:2023-07-06

# プレソフトマックススコアを用いた属性法の脆弱性

A Vulnerability of Attribution Methods Using Pre-Softmax Scores ( http://arxiv.org/abs/2307.03305v1 )

ライセンス: Link先を確認

Miguel Lerma and Mirtha Lucas

(参考訳) 分類器として動作する畳み込みニューラルネットワークの出力に関する説明を提供するために使用される帰属方法のカテゴリを含む脆弱性について検討する。このタイプのネットワークは、入力の知覚できない摂動がモデルの出力を変える可能性のある敵攻撃に弱いことが知られている。対照的に、モデル内の小さな修正がモデル出力を変更することなく帰属法に影響を及ぼす影響に焦点を当てる。

We discuss a vulnerability involving a category of attribution methods used to provide explanations for the outputs of convolutional neural networks working as classifiers. It is known that networks of this type are vulnerable to adversarial attacks, in which imperceptible perturbations of the input may alter the outputs of the model. In contrast, here we focus on the effects that small modifications in the model may have on the attribution method without altering the model outputs.
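One way to see why pre-softmax scores can be fragile is that softmax is unchanged when the same quantity is added to every logit, while gradients of the individual logits are not. The sketch below uses a toy linear model with hypothetical weights `W` and a hypothetical shift `V`; it illustrates the general idea and is not the authors' construction.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Toy linear "network": logit_k = sum_i W[k][i] * x[i] (hypothetical weights).
W = [[1.0, -2.0], [0.5, 3.0]]
# Hypothetical model modification: add the same linear term V.x to every logit.
V = [10.0, -7.0]


def logits(x, modified=False):
    shift = sum(v * xi for v, xi in zip(V, x)) if modified else 0.0
    return [sum(w * xi for w, xi in zip(row, x)) + shift for row in W]


def presoftmax_gradient(k, modified=False):
    """Gradient of logit k with respect to the input (constant for a linear model)."""
    return [W[k][i] + (V[i] if modified else 0.0) for i in range(len(V))]


x = [0.3, -1.2]
p_orig = softmax(logits(x))
p_mod = softmax(logits(x, modified=True))
# p_orig and p_mod agree up to rounding, yet the pre-softmax gradient changes:
grad_orig = presoftmax_gradient(0)                # [1.0, -2.0]
grad_mod = presoftmax_gradient(0, modified=True)  # [11.0, -9.0]
```

Any attribution method that reads gradients of the pre-softmax scores would report very different explanations for these two models, even though their predictions are indistinguishable.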

翻訳日:2023-07-10 13:57:25 公開日:2023-07-06

# データ効率・高性能医用画像処理のための同変球面CNN

Equivariant Spherical CNN for Data Efficient and High-Performance Medical Image Processing ( http://arxiv.org/abs/2307.03298v1 )

ライセンス: Link先を確認

Amirreza Hashemi, Yuemeng Feng, Hamid Sabet

(参考訳) 本研究は,トモグラフィ応用における等価ネットワークの効率的かつ高性能なアプローチとしての重要性を強調する。本研究は,様々な医用画像処理システムの後処理において有望である畳み込みニューラルネットワーク(CNN)の限界を基礎にしている。しかし、従来のCNNの効率性は、未完成で適切なトレーニングセットに大きく依存している。そこで本研究では,CNNが特定のトレーニングセットへの依存を減らすことを目的とした同変ネットワークを提案する。断層画像診断における球面信号に対する同変CNNの有効性について検討した。この結果から, ベンチマーク問題の解法と再構成において, 球状CNN(SCNN)の精度と計算効率が優れていた。さらに,従来の画像再構成ツールの補完としてSCNNを用いる新たな手法を提案する。いずれの場合も,CNNと比較して,SCNNと同等あるいは高画質の画像処理を継続しながら,計算コストの大幅な低下を観察する。さらに,このネットワークの広範なトモグラフィ応用,特に全方位表現を必要とするネットワークの可能性について検討する。

This work highlights the significance of equivariant networks as efficient and high-performance approaches for tomography applications. Our study addresses the limitations of Convolutional Neural Networks (CNNs), which have shown promise in post-processing various medical imaging systems. However, the efficiency of conventional CNNs relies heavily on a complete and appropriate training set. To tackle this issue, we introduce an equivariant network, aiming to reduce the CNN's dependency on specific training sets. We evaluate the efficacy of equivariant CNNs on spherical signals for tomographic medical imaging problems. Our results demonstrate superior quality and computational efficiency of spherical CNNs (SCNNs) in denoising and reconstructing benchmark problems. Furthermore, we propose a novel approach to employ SCNNs as a complement to conventional image reconstruction tools, enhancing the outcomes while reducing reliance on the training set. Across all cases, we observe a significant decrease in computational costs while maintaining the same or higher quality of image processing using SCNNs compared to CNNs. Additionally, we explore the potential of this network for broader tomography applications, particularly those requiring omnidirectional representation.

翻訳日:2023-07-10 13:57:18 公開日:2023-07-06

# 顎関節終末音声処理タスクにおけるガンマトネグラムの表現:音声認識,話者識別,知能度評価

Gammatonegram Representation for End-to-End Dysarthric Speech Processing Tasks: Speech Recognition, Speaker Identification, and Intelligibility Assessment ( http://arxiv.org/abs/2307.03296v1 )

ライセンス: Link先を確認

Aref Farhadipour and Hadi Veisi

(参考訳) 失語症(Dysarthria)は、人間の音声システムに障害を引き起こし、人の音声の品質と知性を減らす障害である。この効果により、正常な音声処理システムは、障害のある音声に対して適切に動作できない。この障害は通常身体障害と関連している。したがって、スマートホームで音声コマンドを受信することでタスクを遂行できるシステムを設計することは重要な成果である。本研究では,畳み込みニューラルネットワークの入力として使用される識別的詳細を持つ音声ファイルの効率的な表現法としてガンマトングラムを導入する。言い換えると、各音声ファイルを画像に変換し、異なるシナリオで音声を分類する画像認識システムを提案する。提案するcnnは、事前学習されたalexnet上の転送学習法に基づいている。本研究では,音声認識,話者識別,インテリジェンス評価のためのシステムの有効性を評価する。 uaデータセットの結果によると、提案する音声認識システムは話者依存モードでは91.29%、話者識別システムは87.74%、明瞭度評価システムは2クラスモードで96.47%の精度を達成した。最後に,完全自動動作するマルチネットワーク音声認識システムを提案する。このシステムは、二級知性評価システムと共にカスケード配置され、このシステムの出力は、音声認識ネットワークの各々の1つを活性化する。このアーキテクチャは92.3%のWRRを達成している。本論文のソースコードは利用可能である。

Dysarthria is a disability that causes a disturbance in the human speech system and reduces the quality and intelligibility of a person's speech. Because of this effect, normal speech processing systems cannot work properly on impaired speech. This disability is usually associated with physical disabilities. Therefore, designing a system that can perform some tasks by receiving voice commands in the smart home can be a significant achievement. In this work, we introduce the gammatonegram as an effective method to represent audio files with discriminative details, which is used as input for the convolutional neural network. In other words, we convert each speech file into an image and propose an image recognition system to classify speech in different scenarios. The proposed CNN is based on the transfer learning method applied to the pre-trained AlexNet. In this research, the efficiency of the proposed system for speech recognition, speaker identification, and intelligibility assessment is evaluated. According to the results on the UA dataset, the proposed speech recognition system achieved 91.29% accuracy in speaker-dependent mode, the speaker identification system acquired 87.74% accuracy in text-dependent mode, and the intelligibility assessment system achieved 96.47% accuracy in two-class mode. Finally, we propose a multi-network speech recognition system that works fully automatically. This system is located in a cascade arrangement with the two-class intelligibility assessment system, and the output of this system activates each one of the speech recognition networks. This architecture achieves a WRR of 92.3%. The source code of this paper is available.

翻訳日:2023-07-10 13:57:02 公開日:2023-07-06

# chexmask: 胸部x線画像のための解剖学的セグメンテーションマスクの大規模データセット

CheXmask: a large-scale dataset of anatomical segmentation masks for multi-center chest x-ray images ( http://arxiv.org/abs/2307.03293v1 )

ライセンス: Link先を確認

Nicolás Gaggion, Candelaria Mosquera, Lucas Mansilla, Martina Aineseder, Diego H. Milone, Enzo Ferrante

(参考訳) 胸部X線分析のための人工知能モデルの開発は、高品質なアノテーションを持つ大規模で多様なデータセットに依存している。胸部X線画像のデータベースがいくつか公開されているが、そのほとんどは疾患診断ラベルを含んでいるが、詳細なピクセルレベルの解剖学的分類ラベルがない。このギャップに対処するため,CANDID-PTX,ChestX-ray8,Chexpert,MIMIC-CXR-JPG,Padchest,VinDr-CXRの6つの公開データベースから得られる画像に対して,均一かつ微細な解剖学的アノテーションを付加した胸部X線多中心セグメンテーションデータセットを導入する。提案手法はHybridGNetモデルを用いて,全データセットの一貫性と高品質なセグメンテーションを保証する。専門医の評価と自動品質管理を含む厳密な検証を行い、その結果のマスクを検証する。さらに,マスク毎の個別品質指標とデータセット毎の全体的な品質推定も提供する。このデータセットは、胸部x線分析における革新的な方法論の開発と評価を合理化し、より広い科学コミュニティにとって貴重な資源となっている。 CheXmaskデータセットは、 https://physionet.org/content/chexmask-cxr-segmentation-data/ で公開されている。

The development of successful artificial intelligence models for chest X-ray analysis relies on large, diverse datasets with high-quality annotations. While several databases of chest X-ray images have been released, most include disease diagnosis labels but lack detailed pixel-level anatomical segmentation labels. To address this gap, we introduce an extensive chest X-ray multi-center segmentation dataset with uniform and fine-grained anatomical annotations for images coming from six well-known publicly available databases: CANDID-PTX, ChestX-ray8, CheXpert, MIMIC-CXR-JPG, PadChest, and VinDr-CXR, resulting in 676,803 segmentation masks. Our methodology utilizes the HybridGNet model to ensure consistent and high-quality segmentations across all datasets. Rigorous validation, including expert physician evaluation and automatic quality control, was conducted to validate the resulting masks. Additionally, we provide individualized quality indices per mask and an overall quality estimation per dataset. This dataset serves as a valuable resource for the broader scientific community, streamlining the development and assessment of innovative methodologies in chest X-ray analysis. The CheXmask dataset is publicly available at: https://physionet.org/content/chexmask-cxr-segmentation-data/.

翻訳日:2023-07-10 13:56:36 公開日:2023-07-06

# 量子回路ボルニングマシンにおける過パラメータ化の同定

Identifying overparameterization in Quantum Circuit Born Machines ( http://arxiv.org/abs/2307.03292v1 )

ライセンス: Link先を確認

Andrea Delgado, Francisco Rios, Kathleen E. Hamilton

(参考訳) 機械学習では、過剰パラメータ化は経験的リスク環境の質的変化と関連しており、より効率的なトレーニングダイナミクスにつながる可能性がある。統計学習で用いられる多くのパラメータ化モデルでは、モデルが構築され、過剰パラメータ化環境下で訓練される、臨界数のパラメータ(またはモデルサイズ)が存在する。過パラメータ化ロスランドスケープには多くの特徴がある。最も重要な点は、低損失のグローバルまたはローカルミニマへの標準勾配降下の収束である。本研究では,非逆勾配法を用いて学習した生成モデルであるBornマシンの過パラメータ化遷移の開始について検討する。数値解析に基づく境界は, 一般に, オーバーパラメータ化遷移において良好な下限である。しかし、量子回路の代数的構造に基づく境界は非常にゆるい上界である。以上の結果から,これらのモデルのトレーサビリティを完全に理解することは,まだ未解決の課題であることが示唆された。

In machine learning, overparameterization is associated with qualitative changes in the empirical risk landscape, which can lead to more efficient training dynamics. For many parameterized models used in statistical learning, there exists a critical number of parameters, or model size, above which the model is constructed and trained in the overparameterized regime. There are many characteristics of overparameterized loss landscapes. The most significant is the convergence of standard gradient descent to global or local minima of low loss. In this work, we study the onset of overparameterization transitions for quantum circuit Born machines, generative models that are trained using non-adversarial gradient-based methods. We observe that bounds based on numerical analysis are in general good lower bounds on the overparameterization transition. However, bounds based on the quantum circuit's algebraic structure are very loose upper bounds. Our results indicate that fully understanding the trainability of these models remains an open question.

翻訳日:2023-07-10 13:56:08 公開日:2023-07-06

# ACDNet:効果的な医薬勧告のための注意誘導協調決定ネットワーク

ACDNet: Attention-guided Collaborative Decision Network for Effective Medication Recommendation ( http://arxiv.org/abs/2307.03332v1 )

ライセンス: Link先を確認

Jiacong Mi, Yi Zu, Zhuoyuan Wang, Jieyue He

(参考訳) 複雑な医療データのためにElectronic Health Records(EHR)を用いた治療勧告は困難である。最近のアプローチでは、患者eerから縦断情報を抽出して推奨事項をパーソナライズする。しかし、既存のモデルは十分な患者表現を欠くことが多く、患者の薬の記録と特定の薬との類似性を考慮することの重要性を見落としている。そこで本論文では,医薬品推奨のための注意誘導協調決定ネットワーク(ACDNet)を提案する。具体的には、adcnetはアテンション機構とトランスフォーマーを使用して、グローバルレベルとローカルレベルの両方での歴史的な訪問をモデル化し、患者の健康状態と薬物記録を効果的に捉えている。 ACDNetはまた、医薬品記録と医薬品表現の類似性を利用して推奨プロセスを促進する共同決定フレームワークも採用している。 MIMIC-IIIとMIMIC-IVの2つの広範囲な医学データセット実験の結果、ACDNetはJaccard、PR-AUC、F1スコアで最先端モデルよりも優れており、その優位性を再確認している。さらに, アブレーション実験により, acdnetにおける各モジュールの有効性の確証が得られ, 全体的な性能への寄与が確認された。さらに、詳細なケーススタディでは、ERHデータに基づく医薬品推奨におけるACDNetの有効性を強化し、現実の医療シナリオにおけるその実用的価値を示す。

Medication recommendation using Electronic Health Records (EHR) is challenging due to complex medical data. Current approaches extract longitudinal information from patient EHR to personalize recommendations. However, existing models often lack sufficient patient representation and overlook the importance of considering the similarity between a patient's medication records and specific medicines. Therefore, an Attention-guided Collaborative Decision Network (ACDNet) for medication recommendation is proposed in this paper. Specifically, ACDNet utilizes an attention mechanism and a Transformer to effectively capture patient health conditions and medication records by modeling their historical visits at both global and local levels. ACDNet also employs a collaborative decision framework, utilizing the similarity between medication records and medicine representation to facilitate the recommendation process. The experimental results on two extensive medical datasets, MIMIC-III and MIMIC-IV, demonstrate that ACDNet outperforms state-of-the-art models in terms of Jaccard, PR-AUC, and F1 score. Moreover, the ablation experiments provide evidence of the effectiveness of each module in ACDNet, validating their contribution to the overall performance. Furthermore, a detailed case study reinforces the effectiveness of ACDNet in medication recommendation based on EHR data, showcasing its practical value in real-world healthcare scenarios.
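Two of the metrics named above, Jaccard and F1, can be computed for a single visit by treating the recommended and prescribed medications as sets. This is a minimal sketch under that assumption; the medication names are hypothetical, and the paper's exact averaging over visits and patients is not given in the abstract.

```python
def jaccard(pred, true):
    """Jaccard similarity between predicted and true medication sets."""
    pred, true = set(pred), set(true)
    union = pred | true
    return len(pred & true) / len(union) if union else 1.0


def f1(pred, true):
    """Set-based F1: harmonic mean of precision and recall."""
    pred, true = set(pred), set(true)
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


# Hypothetical recommendation and ground truth for one visit.
recommended = ["aspirin", "metformin", "lisinopril"]
prescribed = ["aspirin", "metformin", "atorvastatin", "insulin"]

visit_jaccard = jaccard(recommended, prescribed)  # 2 shared / 5 total = 0.4
visit_f1 = f1(recommended, prescribed)            # approximately 0.571
```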

翻訳日:2023-07-10 13:47:53 公開日:2023-07-06

# MOBIOデータベースによる顔のランドマーク検出評価

Facial Landmark Detection Evaluation on MOBIO Database ( http://arxiv.org/abs/2307.03329v1 )

ライセンス: Link先を確認

Na Zhang

(参考訳) MOBIOはバイモーダルなデータベースで、ほとんど携帯電話でしか撮れなかった。バイオメトリック技術をモバイルデバイスに展開する研究を改善することを目的としている。モバイル環境では顔認識や話者認識が可能であることが研究で示されている。顔のランドマークの局所化は、2次元顔画像のための予め定義されたキーポイントの集合の座標を見つけることを目的としている。顔ランドマークは通常、鼻先や眼中心といった特定の意味の意味を持ち、顔認識、感情推定、3d顔再構成などの他の顔分析タスクにリッチな幾何学的情報を提供する。 300W, AFW, AFLW, COFWなどの顔データベースを用いた顔のランドマーク検出手法はほとんどないが, モバイルデータはほとんど使われていない。筆者らはまず,MOBIOデータベースからの顔画像を用いて,移動体静止データに対する顔のランドマーク検出評価を行う。約20,600枚の顔画像がこの視聴覚データベースから抽出され、手作業で22のランドマークが基幹としてラベル付けされている。これらのデータ上での性能を評価するために,最先端の顔ランドマーク検出手法がいくつか採用されている。その結果、MOBIOデータベースのデータはかなり難しいことがわかった。このデータベースは、顔のランドマーク検出評価に新たな挑戦となる可能性がある。

MOBIO is a bi-modal database that was captured almost exclusively on mobile phones. It aims to improve research into deploying biometric techniques to mobile devices. Research has shown that face and speaker recognition can be performed in a mobile environment. Facial landmark localization aims at finding the coordinates of a set of pre-defined key points for 2D face images. A facial landmark usually has specific semantic meaning, e.g. nose tip or eye centre, which provides rich geometric information for other face analysis tasks such as face recognition, emotion estimation and 3D face reconstruction. Most facial landmark detection methods adopt still face databases, such as 300W, AFW, AFLW, or COFW, for evaluation, but seldom use mobile data. Our work is the first to perform facial landmark detection evaluation on mobile still data, i.e., face images from the MOBIO database. About 20,600 face images have been extracted from this audio-visual database and manually labeled with 22 landmarks as the ground truth. Several state-of-the-art facial landmark detection methods are adopted to evaluate their performance on these data. The results show that the data from the MOBIO database is quite challenging. This database can serve as a new, challenging benchmark for facial landmark detection evaluation.
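Landmark detectors are commonly scored with a normalized mean error (NME): the average Euclidean distance between predicted and ground-truth points, divided by a reference length such as the inter-ocular distance. The abstract does not say which metric this evaluation uses, so the metric, the normalizer and the coordinates below are all assumptions for illustration.

```python
import math


def normalized_mean_error(pred, truth, norm_dist):
    """Mean point-to-point error divided by a reference distance."""
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(errors) / (len(errors) * norm_dist)


# Hypothetical two-landmark face: left and right eye centres, in pixels.
truth = [(0.0, 0.0), (10.0, 0.0)]
pred = [(3.0, 4.0), (10.0, 0.0)]

inter_ocular = math.dist(truth[0], truth[1])             # 10.0
nme = normalized_mean_error(pred, truth, inter_ocular)   # 0.25
```

Normalizing by a face-dependent length keeps scores comparable across images taken at different distances, which matters for handheld mobile captures.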

翻訳日:2023-07-10 13:47:30 公開日:2023-07-06

# ディジタルアンテナアレイ上の自己教師付き事前学習および下流信号帯域回帰のためのエンコーダデコーダネットワーク

Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays ( http://arxiv.org/abs/2307.03327v1 )

ライセンス: Link先を確認

Rajib Bhattacharjea, Nathan West

(参考訳) 本研究は,デジタルアンテナアレイのデータに適用された自己教師あり学習の最初の応用について述べる。エンコーダ・デコーダネットワークは、デジタルアレイデータ上に事前トレーニングされ、チャネル・イン・ペイントと呼ばれる自己教師付きノイズ・再構成タスクを実行する。自己管理のステップでは、人間のラベル付きデータを必要としない。エンコーダのアーキテクチャと事前訓練からの重みはタスク固有のデコーダを持つ新しいネットワークに転送され、新しいネットワークはラベル付きデータの少ない量でトレーニングされる。ラベル付きデータに対する事前トレーニングにより、新しいネットワークは、ランダム初期化から同じラベル付きデータに基づいてトレーニングされた等価ネットワークよりも、デジタルアレイデータ上で帯域幅回帰のタスクを実行できることを示す。

This work presents the first applications of self-supervised learning applied to data from digital antenna arrays. Encoder-decoder networks are pretrained on digital array data to perform a self-supervised noisy-reconstruction task called channel in-painting, in which the network infers the contents of array data that has been masked with zeros. The self-supervised step requires no human-labeled data. The encoder architecture and weights from pretraining are then transferred to a new network with a task-specific decoder, and the new network is trained on a small volume of labeled data. We show that pretraining on the unlabeled data allows the new network to perform the task of bandwidth regression on the digital array data better than an equivalent network that is trained on the same labeled data from random initialization.
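The pretext task described above can be sketched as building (masked input, original target) pairs. The sketch below assumes that whole antenna channels are zeroed and that the reconstruction loss is a mean squared error over the masked channels; both choices, and the function names, are hypothetical, since the abstract only says the data is masked with zeros.

```python
import random


def mask_channels(array_data, num_masked, rng):
    """Zero out `num_masked` randomly chosen channels (rows) of the array data."""
    masked_idx = set(rng.sample(range(len(array_data)), num_masked))
    masked = [
        [0.0] * len(channel) if i in masked_idx else list(channel)
        for i, channel in enumerate(array_data)
    ]
    return masked, sorted(masked_idx)


def masked_mse(predicted, target, masked_idx):
    """Mean squared error computed only over the masked channels."""
    squared = [
        (p - t) ** 2
        for i in masked_idx
        for p, t in zip(predicted[i], target[i])
    ]
    return sum(squared) / len(squared)


# Hypothetical 4-channel array snapshot with 2 samples per channel.
rng = random.Random(0)
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked_input, masked_idx = mask_channels(data, 2, rng)

# An untrained "network" that just echoes its input scores poorly;
# a perfect reconstruction scores zero.
loss_echo = masked_mse(masked_input, data, masked_idx)
loss_perfect = masked_mse(data, data, masked_idx)
```

No labels appear anywhere in this loop, which is what makes the pretraining step self-supervised.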

翻訳日:2023-07-10 13:47:11 公開日:2023-07-06

# サイバー攻撃を検出して電力系統障害のタイプを識別する機械学習

Machine Learning to detect cyber-attacks and discriminating the types of power system disturbances ( http://arxiv.org/abs/2307.03323v1 )

ライセンス: Link先を確認

Diane Tuyizere and Remy Ihabwikuzo

(参考訳) 本研究では,電力系統を対象とした機械学習による攻撃検出モデルを提案する。 Phasor Measurementing Devices(PMU)から収集したデータとログを利用することで、システムの振る舞いを学習し、潜在的なセキュリティ境界を効果的に識別することを目指している。提案手法は,データセット前処理,特徴選択,モデル生成,評価などの重要な段階を含む。このアプローチを検証するために、異なるPMUから得られた15のデータセットと、スノート警報とログをリレーするデータセットを使用した。ランダムフォレスト、ロジスティック回帰、K-Nearest Neighbourの3つの機械学習モデルを構築し、さまざまなパフォーマンス指標を用いて評価した。その結果, 無作為林モデルは, 電力系統外乱の検出において90.56%の精度で最高性能を達成でき, 意思決定過程におけるオペレーター支援の可能性も示唆された。

This research proposes a machine learning-based attack detection model for power systems, specifically targeting smart grids. By utilizing data and logs collected from Phasor Measuring Devices (PMUs), the model aims to learn system behaviors and effectively identify potential security boundaries. The proposed approach involves crucial stages including dataset pre-processing, feature selection, model creation, and evaluation. To validate our approach, we used a collection of 15 separate datasets obtained from different PMUs, relay snort alarms and logs. Three machine learning models (Random Forest, Logistic Regression, and K-Nearest Neighbour) were built and evaluated using various performance metrics. The findings indicate that the Random Forest model achieves the highest performance, with an accuracy of 90.56% in detecting power system disturbances, and has the potential to assist operators in decision-making processes.

翻訳日:2023-07-10 13:46:55 公開日:2023-07-06

# BiPhone:テキストにおける言語間音声の影響のモデル化

BiPhone: Modeling Inter Language Phonetic Influences in Text ( http://arxiv.org/abs/2307.03322v1 )

ライセンス: Link先を確認

Abhirut Gupta, Ananya B. Sai, Richard Sproat, Yuri Vasilevski, James S. Ren, Ambarish Jash, Sukhdeep S. Sodhi, and Aravindan Raghuveer

(参考訳) 多くの人々が、テクノロジーの非対称性のために、リテラシーの低い言語でwebを使わざるを得ない。このようなユーザから第2言語(L2)で書かれたテキストには、ネイティブ言語(L1)の影響を受けている大量のエラーが含まれていることが多い。本稿ではL1とL2のペアに対して音素混同(L2ではL1話者が強調される可能性が高い)を抽出する方法を提案する。これらの混乱を生成モデル(Bi-Phone)にプラグインし、合成されたL2テキストを生成する。人的評価を通して, ビフォネはL1ごとに異なる, ウェブ上で広く報道される, もっともらしい汚職を発生させることを示す。また,一般的な言語理解ベンチマークであるSuperGLUEを,我々の手法(FunGLUE for Phonetically Noised GLUE)で劣化させ,SoTA言語基盤モデルの性能が低いことを示す。我々はまた,SuperGLUEに近い性能の回復を支援する新しい音素予測事前学習タスクも導入した。最後に,音声にロバストな言語モデルのさらなる研究を促進するために,funglueベンチマークもリリースします。我々の知る限り、FunGLUEはテキストにL1-L2インタラクションを導入した最初のベンチマークです。

A large number of people are forced to use the Web in a language they have low literacy in due to technology asymmetries. Written text in the second language (L2) from such users often contains a large number of errors that are influenced by their native language (L1). We propose a method to mine phoneme confusions (sounds in L2 that an L1 speaker is likely to conflate) for pairs of L1 and L2. These confusions are then plugged into a generative model (Bi-Phone) for synthetically producing corrupted L2 text. Through human evaluations, we show that Bi-Phone generates plausible corruptions that differ across L1s and also have widespread coverage on the Web. We also corrupt the popular language understanding benchmark SuperGLUE with our technique (FunGLUE for Phonetically Noised GLUE) and show that SoTA language understanding models perform poorly. We also introduce a new phoneme prediction pre-training task which helps byte models to recover performance close to SuperGLUE. Finally, we also release the FunGLUE benchmark to promote further research in phonetically robust language models. To the best of our knowledge, FunGLUE is the first benchmark to introduce L1-L2 interactions in text.
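The idea of turning mined confusions into corrupted text can be sketched with a toy substitution table. The version below works on letters rather than phonemes and enumerates every single-substitution variant instead of sampling from a learned model, so it is a deliberately simplified stand-in for Bi-Phone; the table `CONFUSIONS` is hypothetical.

```python
# Hypothetical confusion table: letter sequences an L1 speaker might conflate
# when writing L2 text. A real system would mine these at the phoneme level.
CONFUSIONS = {"v": ["w"], "th": ["d", "t"]}


def corruptions(word, confusions):
    """Return all variants of `word` with exactly one confusable span replaced."""
    variants = set()
    for src, targets in confusions.items():
        start = word.find(src)
        while start != -1:
            for tgt in targets:
                variants.add(word[:start] + tgt + word[start + len(src):])
            start = word.find(src, start + 1)
    return sorted(variants)


variants_they = corruptions("they", CONFUSIONS)  # ['dey', 'tey']
variants_very = corruptions("very", CONFUSIONS)  # ['wery']
```

Applying such substitutions to a benchmark's inputs, while keeping the labels fixed, is the general recipe behind a noised evaluation set like FunGLUE.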

翻訳日:2023-07-10 13:46:37 公開日:2023-07-06

# 量子絡み合いと純度テスト:グラフゼータ関数の観点から

Quantum Entanglement & Purity Testing: A Graph Zeta Function Perspective ( http://arxiv.org/abs/2307.03321v1 )

ライセンス: Link先を確認

Zachary P. Bradshaw and Margarite L. LaBorde

(参考訳) 我々は、任意の密度行列を重み付きグラフに割り当て、それを、イハラゼータ関数の一般化とエッジゼータ関数の特別な場合の両方であるグラフゼータ関数に関連付ける。最近開発された対称群に基づく双分極純状態分離性アルゴリズムは、このゼータ関数の指数展開における係数がユニティであるという条件に等価であることを示す。さらに、密度行列の非零固有値とゼータ関数の特異点との間には1対1の対応がある。これらの発見を説明するためにいくつかの例がある。

We assign an arbitrary density matrix to a weighted graph and associate to it a graph zeta function that is both a generalization of the Ihara zeta function and a special case of the edge zeta function. We show that a recently developed bipartite pure state separability algorithm based on the symmetric group is equivalent to the condition that the coefficients in the exponential expansion of this zeta function are unity. Moreover, there is a one-to-one correspondence between the nonzero eigenvalues of a density matrix and the singularities of its zeta function. Several examples are given to illustrate these findings.
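For context, the textbook criterion behind purity-based separability tests is that a bipartite pure state is a product state exactly when its reduced density matrix has purity $\mathrm{Tr}(\rho_A^2) = 1$. The sketch below checks this for two-qubit states using plain Python complex numbers; it illustrates that criterion only and does not implement the zeta-function construction of the paper. The function names are hypothetical.

```python
import math


def reduced_density_matrix(c):
    """rho_A = C C^dagger for amplitudes c[i][j] of the state sum_ij c_ij |i>|j>."""
    rows, cols = len(c), len(c[0])
    return [
        [sum(c[i][k] * c[j][k].conjugate() for k in range(cols)) for j in range(rows)]
        for i in range(rows)
    ]


def purity(rho):
    """Tr(rho^2) for a square matrix given as nested lists."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n)).real


s = 1 / math.sqrt(2)
product_state = [[complex(s), complex(s)], [0j, 0j]]  # |0> tensor (|0> + |1>)/sqrt(2)
bell_state = [[complex(s), 0j], [0j, complex(s)]]     # (|00> + |11>)/sqrt(2)

purity_product = purity(reduced_density_matrix(product_state))  # close to 1
purity_bell = purity(reduced_density_matrix(bell_state))        # close to 0.5
```

A purity below one for the reduced state signals entanglement; for two qubits the minimum value of 0.5 is reached by maximally entangled states such as the Bell state above.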

翻訳日:2023-07-10 13:46:17 公開日:2023-07-06

# 調査対象の非共通点:調査対象のギャップに焦点をあてた質問生成

Covering Uncommon Ground: Gap-Focused Question Generation for Answer Assessment ( http://arxiv.org/abs/2307.03319v1 )

ライセンス: Link先を確認

Roni Rabin, Alexandre Djerbetian, Roee Engelberg, Lidan Hackmon, Gal Elidan, Reut Tsarfaty, Amir Globerson

(参考訳) 人間のコミュニケーションには、しばしば対話者間の情報ギャップが伴う。例えば、教育的な対話では、生徒は不完全な答えをしばしば提供し、この答えと教師が期待する完璧な答えの間にはギャップがある。成功した対話は、教師が効果的にこのギャップについて質問することで、リッチでインタラクティブな教育体験を生み出す。このようなギャップに着目した質問(GFQ)を自動生成する問題に着目する。我々はタスクを定義し、優れたgfqの望ましい側面を強調し、これらを満たすモデルを提案する。最後に,人間生成の質問に対する人間の注釈者による評価を行い,競争性を示す。

Human communication often involves information gaps between the interlocutors. For example, in an educational dialogue, a student often provides an answer that is incomplete, and there is a gap between this answer and the perfect one expected by the teacher. Successful dialogue then hinges on the teacher asking about this gap in an effective manner, thus creating a rich and interactive educational experience. We focus on the problem of generating such gap-focused questions (GFQs) automatically. We define the task, highlight key desired aspects of a good GFQ, and propose a model that satisfies these. Finally, we provide an evaluation by human annotators of our generated questions compared against human-generated ones, demonstrating competitive performance.

翻訳日:2023-07-10 13:46:07 公開日:2023-07-06

# 解離型潜伏表現による難治性治療の臨床的評価

Assisting Clinical Decisions for Scarcely Available Treatment via Disentangled Latent Representation ( http://arxiv.org/abs/2307.03315v1 )

ライセンス: Link先を確認

Bing Xue, Ahmed Sameh Said, Ziqi Xu, Hanyang Liu, Neel Shah, Hanqing Yang, Philip Payne, Chenyang Lu

(参考訳) 体外膜酸素化(ECMO)は、従来の治療法に耐性がある新型コロナウイルス患者にとって必須の生命維持モーメントである。しかし、適切な治療決定は重要な議論の対象であり、この希少で技術的に複雑な治療オプションの利点についてはまだ議論が続いている。臨床判断を支援するためには,治療ニーズと治療の可能性,無治療反応を予測する必要がある。この臨床課題を対象とし,個別化分析のための新しいアプローチである治療変動オートエンコーダ(tvae)を提案する。 TVAEは、ECMOのようなモデリング上の課題に、強力な治療選択バイアスと不十分な治療ケースで対処するように設計されている。 TVAEは治療決定をマルチスケール問題として概念化している。本研究は,患者の潜在的治療課題と,深層潜伏変数モデルで表現できる本質的な特徴の一部として,現実的および非現実的結果をモデル化する。半スーパービジョンと共に再構成正規化スキームにより事実と反事実の予測誤差を軽減し、異方性と分布整合潜在空間とラベルバランス生成戦略により、選択バイアスと治療ケースの不足を軽減させる。我々は、63カ国1651の病院から収集された国際データセットと、15の病院から収集された機関データセットの2つの実世界のCOVID-19データセットについてTVAEを評価した。その結果、TVAEは、不均一なCOVID-19データセットの妥当性スコアと事実結果の両方を予測するために、最先端の治療効果モデルより優れていることが示された。追加実験では、合成したIHDPベンチマークデータセット上で、個別処理効果の推定において、TVAEが最高の既存のモデルより優れていることも示している。

Extracorporeal membrane oxygenation (ECMO) is an essential life-supporting modality for COVID-19 patients who are refractory to conventional therapies. However, the proper treatment decision has been the subject of significant debate, and it remains controversial who benefits from this scarcely available and technically complex treatment option. To support clinical decisions, it is critical to predict the treatment need and the potential treatment and no-treatment responses. Targeting this clinical challenge, we propose Treatment Variational AutoEncoder (TVAE), a novel approach for individualized treatment analysis. TVAE is specifically designed to address the modeling challenges of treatments like ECMO, which involve strong treatment selection bias and scarce treatment cases. TVAE conceptualizes the treatment decision as a multi-scale problem. We model a patient's potential treatment assignment and the factual and counterfactual outcomes as part of their intrinsic characteristics that can be represented by a deep latent variable model. The factual and counterfactual prediction errors are alleviated via a reconstruction regularization scheme together with semi-supervision, and the selection bias and the scarcity of treatment cases are mitigated by the disentangled and distribution-matched latent space and the label-balancing generative strategy. We evaluate TVAE on two real-world COVID-19 datasets: an international dataset collected from 1651 hospitals across 63 countries, and an institutional dataset collected from 15 hospitals. The results show that TVAE outperforms state-of-the-art treatment effect models in predicting both the propensity scores and factual outcomes on heterogeneous COVID-19 datasets. Additional experiments also show TVAE outperforms the best existing models in individual treatment effect estimation on the synthesized IHDP benchmark dataset.

翻訳日:2023-07-10 13:45:56 公開日:2023-07-06

# 知識グラフ推論のための構造誘導マルチモーダル事前学習トランス

Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning ( http://arxiv.org/abs/2307.03591v1 )

ライセンス: Link先を確認

Ke Liang, Sihang Zhou, Yue Liu, Lingyuan Meng, Meng Liu, Xinwang Liu

(参考訳) 様々なモダリティで情報を直感的に整理するマルチモーダル知識グラフ(MKG)は、レコメンデーションシステムや視覚的質問応答など、複数の下流タスクに役立てることができる。しかし、ほとんどのMKGは完成には程遠いため、MKG推論モデルが盛んに研究される動機となっている。近年,汎用的な人工アーキテクチャの発展に伴い,特にマルチモーダルシナリオにおいて,事前学習型トランスフォーマーモデルに注目が集まっている。しかし、知識グラフ推論(KGR)のためのマルチモーダル事前学習トランスフォーマー(MPT)の研究はまだ初期段階にある。 MKGと他のマルチモーダルデータとの最大の違いである、MKGの基盤となる豊富な構造情報は、既存のMPTモデルでは十分に活用できていない。それらの多くは、同じエンティティに接続された画像とテキストをマッチングするための検索マップとして、グラフ構造のみを使用する。このやり方は推論性能を妨げる。そこで,本研究では知識グラフ推論のためのグラフ構造誘導マルチモーダル事前学習トランスフォーマー(SGMPT)を提案する。具体的には、構造特徴の符号化にグラフ構造エンコーダを用いる。次に、重み付き和とアライメント制約という2つの異なる戦略を持つ構造誘導型融合モジュールを初めて設計し、構造情報をテキスト特徴と視覚特徴の両方に注入する。我々の知る限り、SGMPTは知識グラフの基盤となる構造情報をマイニングするマルチモーダルKGRのための最初のMPTモデルである。 FB15k-237-IMGとWN18-IMGの大規模な実験により、SGMPTが既存の最先端モデルより優れていることを示し、設計した戦略の有効性を証明した。

Multimodal knowledge graphs (MKGs), which intuitively organize information in various modalities, can benefit multiple practical downstream tasks, such as recommendation systems and visual question answering. However, most MKGs are still far from complete, which motivates the flourishing of MKG reasoning models. Recently, with the development of general artificial architectures, pretrained transformer models have drawn increasing attention, especially for multimodal scenarios. However, research on multimodal pretrained transformers (MPT) for knowledge graph reasoning (KGR) is still at an early stage. The rich structural information underlying the MKG, which is the biggest difference between MKGs and other multimodal data, still cannot be fully leveraged by existing MPT models. Most of them only utilize the graph structure as a retrieval map for matching images and texts connected with the same entity. This practice hinders their reasoning performance. To this end, we propose the graph Structure Guided Multimodal Pretrained Transformer for knowledge graph reasoning, termed SGMPT. Specifically, a graph structure encoder is adopted for structural feature encoding. Then, a structure-guided fusion module with two different strategies, i.e., weighted summation and alignment constraint, is designed for the first time to inject the structural information into both the textual and visual features. To the best of our knowledge, SGMPT is the first MPT model for multimodal KGR that mines the structural information underlying the knowledge graph. Extensive experiments on FB15k-237-IMG and WN18-IMG demonstrate that our SGMPT outperforms existing state-of-the-art models and prove the effectiveness of the designed strategies.

翻訳日:2023-07-10 12:20:58 公開日:2023-07-06

# セキュリティ改善と異言語化における単語埋め込み意味境界オートエンコーダのための未決定ウェーブレット変換

Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages ( http://arxiv.org/abs/2307.03679v1 )

ライセンス: Link先を確認

Shreyanth S

(参考訳) 本研究は,Word Embedded Semantic Marginal Autoencoder (WESMA) に未決定ウェーブレット変換を組み込むことで,セキュリティ対策の改善と複数言語のノイズ除去を行うための新たな戦略を提供する。これらの戦略の組み入れは、データ処理アプリケーションにおける堅牢性、プライバシー、多言語性の問題に対処することを目的としている。未決定ウェーブレット変換は、入力データの顕著な言語パターンと構造的性質を識別するための特徴抽出ツールとして使用される。提案手法は,この変換を用いて時間的および地理的な関連を保ちつつ,重要な情報を取り込むことができる。これにより、システムの異常検出能力を高め、隠れたパターンを発見し、正当な内容と危険な脅威を区別することで、セキュリティ対策が改善される。 Word Embedded Semantic Marginal Autoencoderは次元と雑音の低減のためのインテリジェントなフレームワークとしても機能する。オートエンコーダは、データの基盤となるセマンティクスを効果的に学習し、単語埋め込みとセマンティクスコンテキストを利用してノイズ成分を削減する。その結果、後続の処理段階において、データ品質と精度が向上する。提案手法は、複数の言語とセキュリティシナリオを含む多様なデータセットを使用してテストされる。実験の結果,提案手法は,複数の言語にまたがるセキュリティ強化とノイズ除去機能の実現に有効であることがわかった。このシステムは言語のばらつきを扱うのに強く、使用する言語に関係なく一貫した結果を生み出す。さらに、未決定ウェーブレット変換を組み込むことで、複雑なセキュリティ問題に効率的に対処するシステムの能力が大幅に向上する。

By incorporating the undecimated wavelet transform into a Word Embedded Semantic Marginal Autoencoder (WESMA), this research study provides a novel strategy for improving security measures and denoising multiple languages. The incorporation of these strategies is intended to address the issues of robustness, privacy, and multilingualism in data processing applications. The undecimated wavelet transform is used as a feature extraction tool to identify prominent language patterns and structural qualities in the input data. By employing this transform, the proposed system can capture significant information while preserving the temporal and geographical links within the data. This improves security measures by increasing the system's ability to detect abnormalities, discover hidden patterns, and distinguish between legitimate content and dangerous threats. The Word Embedded Semantic Marginal Autoencoder also functions as an intelligent framework for dimensionality and noise reduction. The autoencoder effectively learns the underlying semantics of the data and reduces noise components by exploiting word embeddings and semantic context. As a result, data quality and accuracy are increased in subsequent processing stages. The suggested methodology is tested using a diversified dataset that includes several languages and security scenarios. The experimental results show that the proposed approach is effective in attaining security enhancement and denoising capabilities across multiple languages. The system is robust to linguistic variation, producing consistent outcomes regardless of the language used. Furthermore, incorporating the undecimated wavelet transform considerably improves the system's ability to efficiently address complex security concerns.

翻訳日:2023-07-10 12:01:48 公開日:2023-07-06
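
The abstract above uses an undecimated wavelet transform for feature extraction and denoising but gives no implementation details. As a rough illustration only, the following sketch shows one level of an undecimated Haar transform with periodic boundaries, followed by soft thresholding of the detail coefficients. The function names, the Haar filter, and the threshold value are assumptions made for this example and are not taken from the paper.

```python
def undecimated_haar_level(signal):
    """One level of an undecimated Haar transform (periodic boundary).

    Unlike the decimated transform, nothing is downsampled: both outputs
    have the same length as the input.
    """
    n = len(signal)
    approx = [(signal[i] + signal[(i + 1) % n]) / 2 for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) / 2 for i in range(n)]
    return approx, detail


def soft_threshold(values, t):
    """Shrink each coefficient towards zero by t (illustrative denoising step)."""
    return [max(abs(v) - t, 0.0) * (1 if v >= 0 else -1) for v in values]


def reconstruct(approx, detail):
    """Exact inverse for this filter pair: approx[i] + detail[i] == signal[i]."""
    return [a + d for a, d in zip(approx, detail)]


def denoise(signal, t):
    """Transform, shrink the detail coefficients, and reconstruct."""
    approx, detail = undecimated_haar_level(signal)
    return reconstruct(approx, soft_threshold(detail, t))
```

With a threshold of zero, `denoise` returns the input unchanged. Larger thresholds suppress small sample-to-sample fluctuations while keeping the local averages.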

# フロンティアAI規制 - 公共安全に対する新たなリスクの管理

Frontier AI Regulation: Managing Emerging Risks to Public Safety ( http://arxiv.org/abs/2307.03718v1 )

ライセンス: Link先を確認

Markus Anderljung, Joslyn Barnhart, Jade Leung, Anton Korinek, Cullen O'Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, Kevin Wolf

(参考訳) 高度なAIモデルは人類にとって大きな利益をもたらす可能性を秘めているが、社会はそれに伴うリスクを積極的に管理する必要がある。本稿では,公共の安全に重大なリスクをもたらすのに十分な危険能力を有しうる,高度な能力を持つ基盤モデルを「フロンティアAI」モデルと呼び,これに焦点を当てる。フロンティアAIモデルは固有の規制上の課題をもたらす。すなわち、危険な能力が予期せず出現する可能性があり、デプロイされたモデルが誤用されることを堅牢に防止することは困難であり、モデルの能力が広範囲に拡散することを止めるのは難しい。これらの課題に対処するには、少なくとも、(1)フロンティアAI開発者の適切な要件を特定するための標準設定プロセス、(2)フロンティアAI開発プロセスの可視性を規制当局に提供するための登録および報告要件、(3)フロンティアAIモデルの開発と展開のための安全基準の遵守を保証するメカニズムの3つの構成要素が必要である。業界の自己規制は重要な第一歩である。しかし、標準の作成とその遵守の確保のためには、より広範な社会的な議論と政府の介入が必要となる。我々は、監督当局への執行権限の付与やフロンティアAIモデルのライセンス制度など、この目的へのいくつかの選択肢を検討する。最後に,安全基準の最初のセットを提案する。これには、デプロイ前のリスクアセスメントの実行、モデルの振る舞いの外部的検査、デプロイメント決定にリスクアセスメントを使用すること、モデルの能力とデプロイ後の使用に関する新しい情報の監視とそれへの対応が含まれる。この議論が、AI開発のフロンティアにおける公衆安全のリスクとイノベーションのメリットのバランスのとり方に関する幅広い議論に貢献できることを願っている。

Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term "frontier AI" models: highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety. Frontier AI models pose a distinct regulatory challenge: dangerous capabilities can arise unexpectedly; it is difficult to robustly prevent a deployed model from being misused; and, it is difficult to stop a model's capabilities from proliferating broadly. To address these challenges, at least three building blocks for the regulation of frontier models are needed: (1) standard-setting processes to identify appropriate requirements for frontier AI developers, (2) registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and (3) mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models. Industry self-regulation is an important first step. However, wider societal discussions and government intervention will be needed to create standards and to ensure compliance with them. We consider several options to this end, including granting enforcement powers to supervisory authorities and licensure regimes for frontier AI models. Finally, we propose an initial set of safety standards. These include conducting pre-deployment risk assessments; external scrutiny of model behavior; using risk assessments to inform deployment decisions; and monitoring and responding to new information about model capabilities and uses post-deployment. We hope this discussion contributes to the broader conversation on how to balance public safety risks and innovation benefits from advances at the frontier of AI development.

翻訳日:2023-07-10 11:52:51 公開日:2023-07-06

# レーザと機械学習モデルを用いた鋼表面粗さパラメータ計算

Steel Surface Roughness Parameter Calculations Using Lasers and Machine Learning Models ( http://arxiv.org/abs/2307.03723v1 )

ライセンス: Link先を確認

Alex Milne, Xianghua Xie

(参考訳) 帯鋼の表面性状の制御は, 亜鉛めっきおよび調質圧延プロセスにおける顧客の要求を満たすために不可欠である。従来の方法は製造後のスタイラス測定に依存するのに対し、オンライン技術はストリップ全体の非接触およびリアルタイム計測を提供する。しかし, 製造パイプラインでこれらを有効に利用するには, 正確な測定の確保が不可欠である。さらに、正確なオンライン測定により製造工程パラメータのリアルタイム調整が可能となり、一貫性のある品質とテンパーミルのクローズドループ制御が可能となる。本研究では,最先端の機械学習モデルを用いて,オンライン計測値からはるかに高精度なRa表面粗さ指標への変換を改善する。深層学習法と非深層学習法の両方を含むデータ駆動型アプローチの選択を閉形式の変換と比較することにより, 調質圧延帯鋼製造における表面テクスチャ制御の改善の可能性を評価する。

Control of surface texture in strip steel is essential to meet customer requirements during galvanizing and temper rolling processes. Traditional methods rely on post-production stylus measurements, while on-line techniques offer non-contact and real-time measurements of the entire strip. However, ensuring accurate measurement is imperative for their effective utilization in the manufacturing pipeline. Moreover, accurate on-line measurements enable real-time adjustments of manufacturing processing parameters during production, ensuring consistent quality and the possibility of closed-loop control of the temper mill. In this study, we leverage state-of-the-art machine learning models to enhance the transformation of on-line measurements into a significantly more accurate Ra surface roughness metric. By comparing a selection of data-driven approaches, including both deep learning and non-deep learning methods, to the closed-form transformation, we evaluate their potential for improving surface texture control in temper strip steel manufacturing.

翻訳日:2023-07-10 11:40:51 公開日:2023-07-06
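
For readers unfamiliar with the Ra metric mentioned above: Ra is the arithmetic mean of the absolute deviations of a height profile from its mean line. The sketch below computes Ra from a list of heights and fits a one-variable least-squares line as the simplest conceivable data-driven calibration from an on-line reading to a stylus Ra value. The linear fit is only an illustrative stand-in. The abstract does not describe the models or the closed-form transformation used by the authors, and all function names here are hypothetical.

```python
def roughness_ra(heights):
    """Ra: mean absolute deviation of the profile heights from their mean line."""
    mean_line = sum(heights) / len(heights)
    return sum(abs(z - mean_line) for z in heights) / len(heights)


def fit_linear(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (single feature)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_ra(online_value, slope, intercept):
    """Map an on-line reading to an estimated stylus Ra (assumed linear model)."""
    return slope * online_value + intercept
```

In practice the reference Ra values for fitting would come from stylus measurements taken on the same strip positions as the on-line readings.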

# エアフォイル GAN:空力形状最適化のためのエアフォイルのエンコードと合成

Airfoil GAN: Encoding and Synthesizing Airfoils for Aerodynamic Shape Optimization ( http://arxiv.org/abs/2101.04757v2 )

ライセンス: Link先を確認

Yuyang Wang, Kenji Shimada, Amir Barati Farimani

(参考訳) エアフォイルのような空力形状の現在の設計は、可能な設計空間を探索するための計算集約的なシミュレーションを伴う。通常、このような設計は設計パラメータの事前定義に依存し、新しい形状の合成に制限を課す。本研究では,既存の翼から表現を自動的に学習し,学習した表現を用いて新しい翼を生成するデータ駆動型形状符号化・生成法を提案する。これらの表現は、空気力学的性能に基づいて合成翼形状の最適化に使用される。我々のモデルは、変分オートエンコーダとジェネレーティブ・アドバーサリアル・ネットワークを組み合わせたニューラルネットワークであるVAEGANに基づいて構築されており、勾配に基づく手法で訓練されている。本モデルでは,(1)既存のエアフォイルを潜在ベクトルにエンコードし,それからエアフォイルを再構築し,(2)潜在ベクトルをランダムにサンプリングしてエアフォイル座標領域にマッピングすることで新しいエアフォイルを生成し,(3)学習した特徴を遺伝的アルゴリズムにより最適化し,所望の空力特性を有するエアフォイルを合成する。実験の結果,学習された特徴は,事前定義された設計パラメータを使わずに,形状情報を網羅的かつ包括的に符号化していることがわかった。特徴ベクトルの内挿/外挿またはガウス雑音からのサンプリングにより、モデルは新しい翼形状を自動的に合成することができ、その一部はモデル訓練のために使用される翼と同等、あるいはより優れた空力特性を持つ。遺伝的アルゴリズムによって学習された潜在領域の形状を最適化することで、合成された翼は目標とする空力特性へと進化することができる。これは効率のよい学習ベースの翼設計の枠組みを示し、潜在領域で翼を符号化し最適化し、要求される空力性能に対して有望な翼候補を合成する。

The current design of aerodynamic shapes, like airfoils, involves computationally intensive simulations to explore the possible design space. Usually, such design relies on the prior definition of design parameters and places restrictions on synthesizing novel shapes. In this work, we propose a data-driven shape encoding and generating method, which automatically learns representations from existing airfoils and uses the learned representations to generate new airfoils. The representations are then used in the optimization of synthesized airfoil shapes based on their aerodynamic performance. Our model is built upon VAEGAN, a neural network that combines a Variational Autoencoder with a Generative Adversarial Network and is trained by a gradient-based technique. Our model can (1) encode the existing airfoil into a latent vector and reconstruct the airfoil from that, (2) generate novel airfoils by randomly sampling the latent vectors and mapping the vectors to the airfoil coordinate domain, and (3) synthesize airfoils with desired aerodynamic properties by optimizing learned features via a genetic algorithm. Our experiments show that the learned features encode shape information thoroughly and comprehensively without predefined design parameters. By interpolating/extrapolating feature vectors or sampling from Gaussian noise, the model can automatically synthesize novel airfoil shapes, some of which possess competitive or even better aerodynamic properties compared to airfoils used for model training purposes. By optimizing shapes on the learned latent domain via a genetic algorithm, synthesized airfoils can evolve to target aerodynamic properties. This demonstrates an efficient learning-based airfoil design framework, which encodes and optimizes the airfoil on the latent domain and synthesizes promising airfoil candidates for required aerodynamic performance.

翻訳日:2023-07-07 19:04:53 公開日:2023-07-06
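
The abstract above mentions interpolating and extrapolating feature vectors to obtain new shapes. A minimal sketch of that single step, assuming latent vectors are plain lists of floats, is given below. The encoder and decoder networks are not shown, and the function names are hypothetical rather than taken from the authors' code.

```python
def blend_latent(z_a, z_b, t):
    """Linear blend of two latent vectors.

    0 <= t <= 1 interpolates between z_a and z_b; t outside that range
    extrapolates beyond one of them.
    """
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]


def latent_path(z_a, z_b, steps):
    """Evenly spaced latent vectors from z_a to z_b (inclusive), steps >= 2."""
    return [blend_latent(z_a, z_b, i / (steps - 1)) for i in range(steps)]
```

Each vector on such a path would then be passed through the trained decoder to obtain airfoil coordinates.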

# ShadowNet:畳み込みニューラルネットワークのためのセキュアで効率的なオンデバイスモデル推論システム

ShadowNet: A Secure and Efficient On-device Model Inference System for Convolutional Neural Networks ( http://arxiv.org/abs/2011.05905v4 )

ライセンス: Link先を確認

Zhichuang Sun, Ruimin Sun, Changming Liu, Amrita Roy Chowdhury, Long Lu, Somesh Jha

(参考訳) モバイルとエッジデバイスにおけるAIアクセラレータの使用が増加し、オンデバイス機械学習(ML)が人気を集めている。何千ものプロプライエタリなMLモデルが今日、何十億もの信頼できないデバイスにデプロイされている。これはモデルプライバシに関する深刻なセキュリティ上の懸念を引き起こす。しかし、信頼できないAIアクセラレーターへのアクセスを失うことなくモデルのプライバシを保護することは難しい問題である。本稿では,デバイス上での新たなモデル推論システムであるShadowNetを提案する。 ShadowNetはモデルプライバシをTrusted Execution Environment(TEE)で保護するとともに、モデルの重い線形層を信頼できないハードウェアアクセラレータに安全にアウトソーシングする。 ShadowNetは、アウトソーシングする前に線形層の重みを変換し、TEE内で結果を復元することで、これを実現する。非線形層もTEE内で安全に保たれる。 ShadowNetの設計は、重みの効率的な変換とその後の結果の復元を保証する。 TensorFlow LiteをベースにShadowNetのプロトタイプを構築し、MobileNet、ResNet-44、MiniVGG、ResNet-404、YOLOv4-tinyという5つの人気のあるCNNで評価する。評価の結果,ShadowNetは適切な性能で強力なセキュリティ保証を実現し,デバイス上での安全なモデル推論のための実用的なソリューションを提供する。

With the increased usage of AI accelerators on mobile and edge devices, on-device machine learning (ML) is gaining popularity. Thousands of proprietary ML models are being deployed today on billions of untrusted devices. This raises serious security concerns about model privacy. However, protecting model privacy without losing access to the untrusted AI accelerators is a challenging problem. In this paper, we present a novel on-device model inference system, ShadowNet. ShadowNet protects the model privacy with Trusted Execution Environment (TEE) while securely outsourcing the heavy linear layers of the model to the untrusted hardware accelerators. ShadowNet achieves this by transforming the weights of the linear layers before outsourcing them and restoring the results inside the TEE. The non-linear layers are also kept secure inside the TEE. ShadowNet's design ensures efficient transformation of the weights and the subsequent restoration of the results. We build a ShadowNet prototype based on TensorFlow Lite and evaluate it on five popular CNNs, namely, MobileNet, ResNet-44, MiniVGG, ResNet-404, and YOLOv4-tiny. Our evaluation shows that ShadowNet achieves strong security guarantees with reasonable performance, offering a practical solution for secure on-device model inference.

翻訳日:2023-07-07 19:04:21 公開日:2023-07-06
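
The abstract above states that linear-layer weights are transformed before outsourcing and that results are restored inside the TEE, without specifying the transformation. The toy sketch below illustrates the general "transform, outsource, restore" pattern with a secret row permutation and per-row scaling. This particular scheme is an assumption chosen for brevity. It is not ShadowNet's actual transformation and carries no security claim.

```python
import random


def obfuscate(weights, rng):
    """Inside the trusted side: permute rows and scale each row by a secret factor."""
    n = len(weights)
    perm = list(range(n))
    rng.shuffle(perm)
    scales = [rng.uniform(0.5, 2.0) for _ in range(n)]
    transformed = [[scales[src] * w for w in weights[src]] for src in perm]
    return transformed, perm, scales


def untrusted_matvec(matrix, x):
    """Work done by the untrusted accelerator: a plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def restore(outputs, perm, scales):
    """Back inside the trusted side: undo the scaling and the permutation."""
    restored = [0.0] * len(outputs)
    for pos, src in enumerate(perm):
        restored[src] = outputs[pos] / scales[src]
    return restored
```

The restored output matches the product with the original weights up to floating-point rounding, while the untrusted side only ever sees the transformed matrix.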

# SAT解決のためのタイムリープチャレンジ

A Time Leap Challenge for SAT Solving ( http://arxiv.org/abs/2008.02215v2 )

ライセンス: Link先を確認

Johannes K. Fichte, Markus Hecher, Stefan Szeider

(参考訳) 我々は過去20年間のSAT解決におけるハードウェアの進歩とアルゴリズムの進歩の影響を比較した。特に,新しいコンピュータハードウェア上の20年前のSATソルバと,20年前のハードウェア上の最新のSATソルバを比較した。以上の結果から,アルゴリズム面での進歩は,ハードウェア面での進歩と少なくとも同程度の影響があることがわかった。

We compare the impact of hardware advancement and algorithm advancement for SAT solving over the last two decades. In particular, we compare 20-year-old SAT-solvers on new computer hardware with modern SAT-solvers on 20-year-old hardware. Our findings show that the progress on the algorithmic side has at least as much impact as the progress on the hardware side.

翻訳日:2023-07-07 19:03:58 公開日:2023-07-06

# 半デバイス依存ブラインド量子トモグラフィ

Semi-device-dependent blind quantum tomography ( http://arxiv.org/abs/2006.03069v2 )

ライセンス: Link先を確認

Ingo Roth, Jadwiga Wilkens, Dominik Hangleiter, Jens Eisert

(参考訳) 量子状態に関するトモグラフィー情報を抽出することは、高精度量子デバイスを開発するための重要な課題である。現在のスキームは通常、あらかじめ高精度に校正されたトモグラフィ用の測定装置を必要とする。皮肉なことに、測定校正の精度は、状態準備の精度によって根本的に制限され、悪循環が確立される。そこで本研究では, このサイクルを破ることができ, 測定装置のキャリブレーションに対する依存性を著しく緩和できることを証明する。対象とする量子状態の自然な低ランク構造を利用すれば, 古典的に効率的な後処理アルゴリズムを備えた高度にスケーラブルな「ブラインド」トモグラフィ法が得られることを示す。さらに,キャリブレーションのスパース構造を利用することで,提案手法の効率をさらに向上させる。これは、ブラインド量子トモグラフィー問題を低ランク行列のスパース和のデミキシングに緩和することで達成される。測定モデルが制限等長性を示すならば,提案アルゴリズムが低ランクな量子状態とキャリブレーションを復元することを証明する。一般的な測定に対しては,最適に近い数の測定設定で十分であることを示す。これらの概念的および数学的知見を補完し、トラップイオンの実装にインスパイアされた実用的な環境で、ロバストなブラインド量子トモグラフィーが可能であることを数値的に示す。

Extracting tomographic information about quantum states is a crucial task in the quest towards devising high-precision quantum devices. Current schemes typically require measurement devices for tomography that are a priori calibrated to high precision. Ironically, the accuracy of the measurement calibration is fundamentally limited by the accuracy of state preparation, establishing a vicious cycle. Here, we prove that this cycle can be broken and the dependence on the measurement device's calibration significantly relaxed. We show that exploiting the natural low-rank structure of quantum states of interest suffices to arrive at a highly scalable `blind' tomography scheme with a classically efficient post-processing algorithm. We further improve the efficiency of our scheme by making use of the sparse structure of the calibrations. This is achieved by relaxing the blind quantum tomography problem to the de-mixing of a sparse sum of low-rank matrices. We prove that the proposed algorithm recovers a low-rank quantum state and the calibration provided that the measurement model exhibits a restricted isometry property. For generic measurements, we show that it requires a close-to-optimal number of measurement settings. Complementing these conceptual and mathematical insights, we numerically demonstrate that robust blind quantum tomography is possible in a practical setting inspired by an implementation of trapped ions.

翻訳日:2023-07-07 19:03:53 公開日:2023-07-06

# 非可換代数を用いた畳み込みフィルタとニューラルネットワーク

Convolutional Filtering and Neural Networks with Non Commutative Algebras ( http://arxiv.org/abs/2108.09923v3 )

ライセンス: Link先を確認

Alejandro Parada-Mayorga, Landon Butler and Alejandro Ribeiro

(参考訳) 本稿では,非可換畳み込みニューラルネットワークの代数的一般化を導入し,研究する。代数的信号処理の理論を畳み込み型非可換アーキテクチャのモデル化に活用し、可換畳み込みニューラルネットワークの文献で得られたものを拡張する具体的な安定性境界を導出する。非可換畳み込みアーキテクチャは作用素空間上の変形に対して安定になりうることを示す。我々は非可換信号モデルのスペクトル表現を開発し、非可換フィルタが互いに独立してフーリエ成分を処理することを示す。特に、非可換モデルにおける信号のスペクトル分解は次元が 1 より大きい固有空間に関連付けられるが、安定性と選択性の間にはトレードオフが存在し、それが低次元行列空間における行列多項式関数によって制御されることを証明する。このトレードオフは、代数のフィルタが安定であるよう制限されているとき識別可能性が失われ、それがネットワーク内でポイントワイズ非線形性によって補償されることを示している。本稿で得られた結果は,群ニューラルネットワーク,マルチグラフニューラルネットワーク,四元数ニューラルネットワークなどの非可換畳み込みアーキテクチャに直接適用でき,これらについて摂動発生時の挙動を示す数値実験を行った。

In this paper we introduce and study the algebraic generalization of non commutative convolutional neural networks. We leverage the theory of algebraic signal processing to model convolutional non commutative architectures, and we derive concrete stability bounds that extend those obtained in the literature for commutative convolutional neural networks. We show that non commutative convolutional architectures can be stable to deformations on the space of operators. We develop the spectral representation of non commutative signal models to show that non commutative filters process Fourier components independently of each other. In particular we prove that although the spectral decompositions of signals in non commutative models are associated to eigenspaces of dimension larger than one, there exists a trade-off between stability and selectivity, which is controlled by matrix polynomial functions in spaces of matrices of low dimension. This tradeoff shows how when the filters in the algebra are restricted to be stable, there is a loss in discriminability that is compensated in the network by the pointwise nonlinearities. The results derived in this paper have direct applications and implications in non commutative convolutional architectures such as group neural networks, multigraph neural networks, and quaternion neural networks, for which we provide a set of numerical experiments showing their behavior when perturbations are present.

翻訳日:2023-07-07 18:59:56 公開日:2023-07-06

# 高次元コヒーレント一方向量子鍵分布

High-dimensional coherent one-way quantum key distribution ( http://arxiv.org/abs/2105.04733v4 )

ライセンス: Link先を確認

Kfir Sulimany, Guy Pelc, Rom Dudkiewicz, Simcha Korenblit, Hagai S. Eisenberg, Yaron Bromberg, Michael Ben-Or

(参考訳) 高次元量子鍵分布(QKD)は、2次元符号化によるQKDプロトコルでは得られないセキュアな鍵レートによる究極のセキュアな通信を提供する。しかし、既存の高次元QKDプロトコルは、マルチポート干渉計や複数の検出器などの追加の実験資源を必要とするため、実用的な高次元システムのコストが上がり、使用が制限される。本稿では,標準的な2次元システムのハードウェアのみを必要とする任意次元QKDのための新しいプロトコルを提示し,解析する。個別攻撃およびコヒーレント攻撃に対するセキュリティ証明を提供し、セキュアな鍵レートの上界と下界を設定する。そして,40kmのファイバーリンク上の標準2次元QKDシステムにおいて,新しい高次元プロトコルをテストする。新しいプロトコルは、ハードウェアの変更をシステムに導入することなく、標準の2次元コヒーレント一方向プロトコルと比較して、セキュアな鍵レートを2倍に向上させる。したがってこの研究は、ソフトウェアアップデートだけで、既にデプロイされているタイムビンQKDシステムの性能を向上させる大きな可能性を秘めている。さらに、その応用はQKDキューディットの様々な符号化スキームにまたがる。

High-dimensional quantum key distribution (QKD) provides ultimate secure communication with secure key rates that cannot be obtained by QKD protocols with two-dimensional encoding. However, existing high-dimensional QKD protocols require additional experimental resources, such as multiport interferometers and multiple detectors, thus raising the cost of practical high-dimensional systems and limiting their use. Here, we present and analyze a novel protocol for arbitrary-dimensional QKD that requires only the hardware of a standard two-dimensional system. We provide security proofs against individual attacks and coherent attacks, setting upper and lower bounds on the secure key rates. Then, we test the new high-dimensional protocol in a standard two-dimensional QKD system over a 40 km fiber link. The new protocol yields a two-fold enhancement of the secure key rate compared to the standard two-dimensional coherent one-way protocol, without introducing any hardware modifications to the system. This work therefore holds great potential to enhance the performance of already deployed time-bin QKD systems through a software update alone. Furthermore, its applications extend across different encoding schemes of QKD qudits.

翻訳日:2023-07-07 18:59:34 公開日:2023-07-06

# 補間テンソル積ウェーブレットに基づく電子構造計算

Electronic structure calculations with interpolating tensor product wavelet basis ( http://arxiv.org/abs/2101.05540v7 )

ライセンス: Link先を確認

Tommi Höynälänmaa and Tapio T. Rantala

(参考訳) 本稿では,3次元Deslauriers-Dubucウェーブレットからなる基底集合を導入し,HおよびHe原子ならびに$\mathrm{H}_2$,$\mathrm{H}_2^+$,$\mathrm{LiH}$分子のSchrödinger方程式をHF法とDFT法で数値的に解く。水素の2sと2pの励起状態も計算する。核のクーロン特異性は擬ポテンシャルを用いて処理される。固有値問題をArnoldi法とLanczos法で,Poisson方程式をGMRES法とCGNR法で解き,補間ウェーブレットの双直交関係を用いて行列要素を計算する。性能はCCCBDBやBigDFTと比較される。

We introduce a basis set consisting of three-dimensional Deslauriers-Dubuc wavelets and numerically solve the Schrödinger equations of H and He atoms and molecules $\mathrm{H}_2$, $\mathrm{H}_2^+$, and $\mathrm{LiH}$ with HF and DFT methods. We also compute the 2s and 2p excited states of hydrogen. The Coulomb singularity at the nucleus is handled by using a pseudopotential. The eigenvalue problem is solved with Arnoldi and Lanczos methods, the Poisson equation with GMRES and CGNR methods, and matrix elements are computed using the biorthogonality relations of the interpolating wavelets. Performance is compared with those of CCCBDB and BigDFT.

翻訳日:2023-07-07 18:57:14 公開日:2023-07-06

# 訓練可能な重量平均化:サブスペーストレーニングのための一般的なアプローチ

Trainable Weight Averaging: A General Approach for Subspace Training ( http://arxiv.org/abs/2205.13104v2 )

ライセンス: Link先を確認

Tao Li, Zhehao Huang, Qinghua Tao, Yingwen Wu, Xiaolin Huang

(参考訳) 低次元部分空間におけるディープニューラルネットワーク(DNN)のトレーニングは、効率的なトレーニングとより良い一般化性能を達成する上で有望な方向である。従来の研究は、ランダムな射影やトレーニング軌道上の次元削減手法を用いて部分空間を抽出するが、これらの手法は次元性や数値演算の点で非効率または不安定になりうる。本稿では,部分空間トレーニングを重み平均化に結びつけ,従来の取り組みを一般化した部分空間トレーニングの一般的なアプローチであるTrainable Weight Averaging(TWA)を提案する。 TWAは次元の点で効率的であり、使用も容易であり、部分空間トレーニングのための有望な新しい方法である。さらに,複数のノードにわたる並列トレーニングを可能とし,各ノードにメモリと計算負荷を均等に分散させる,大規模な問題に対処する部分空間トレーニングの効率的なスキームを設計する。我々は、TWAを効率的なニューラルネットワークトレーニングとファインチューニング性能の改善というタスクに適用し、我々のアプローチの優れた効率と有効性を示す。我々は、様々なアーキテクチャを用いて、様々なベンチマークコンピュータビジョンとニューラル言語処理タスクをカバーする広範な実験を行う。実装コードは https://github.com/nblt/TWA で公開されている。

Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better generalization performance. Previous works extract the subspaces by using random projection or performing dimensionality reduction method on the training trajectory, but these methods can be inefficient or unstable in terms of dimensionality and numerical operations. In this paper, we connect subspace training to weight averaging and propose Trainable Weight Averaging (TWA), a general approach for subspace training that generalizes the previous efforts. TWA is efficient in terms of dimensionality and also easy to use, making it a promising new method for subspace training. We further design an efficient scheme for subspace training to cope with large-scale problems, which allows parallel training across multiple nodes and evenly distributing the memory and computation burden to each node. We apply TWA to efficient neural network training and improving fine-tuning performance tasks to demonstrate the great efficiency and effectiveness of our approach. We conduct extensive experiments that cover various benchmark computer vision and neural language processing tasks with various architectures. The code of implementation is available at https://github.com/nblt/TWA.

翻訳日:2023-07-07 18:50:34 公開日:2023-07-06
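
The idea of training inside the subspace spanned by saved checkpoints can be sketched as follows: the model weights are written as a linear combination of checkpoints, and only the combination coefficients are trained. The code below is a minimal illustration with plain gradient descent on the coefficients, starting from uniform averaging. It is a sketch under stated assumptions and not the authors' implementation (see the repository linked in the abstract for that). The function names, the toy loss, and the hyperparameters are all hypothetical.

```python
def combine(checkpoints, coeffs):
    """Weights as a linear combination of checkpoint weight vectors."""
    dim = len(checkpoints[0])
    return [sum(c * ckpt[j] for c, ckpt in zip(coeffs, checkpoints))
            for j in range(dim)]


def train_coefficients(checkpoints, grad_fn, steps=200, lr=0.1):
    """Gradient descent on the coefficients only.

    grad_fn(weights) must return the loss gradient with respect to the weights.
    By the chain rule, the gradient for coefficient i is the dot product of
    that gradient with checkpoint i.
    """
    k = len(checkpoints)
    coeffs = [1.0 / k] * k  # start from plain weight averaging
    for _ in range(steps):
        grad_w = grad_fn(combine(checkpoints, coeffs))
        grad_c = [sum(g * w for g, w in zip(grad_w, ckpt)) for ckpt in checkpoints]
        coeffs = [c - lr * g for c, g in zip(coeffs, grad_c)]
    return coeffs
```

The number of trainable parameters equals the number of checkpoints rather than the number of model weights, which is what makes the subspace low-dimensional.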

# 等価性と包摂のオントロジーマッチングのための機械学習フレンドリーなバイオメディカルデータセット

Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching ( http://arxiv.org/abs/2205.03447v7 )

ライセンス: Link先を確認

Yuan He, Jiaoyan Chen, Hang Dong, Ernesto Jiménez-Ruiz, Ali Hadian, Ian Horrocks

(参考訳) オントロジーマッチング(OM)はバイオインフォマティクスやセマンティックウェブなど多くの分野において重要な役割を担い、特に機械学習(ML)技術の適用によってその研究はますます人気が高まっている。オントロジーアライメント評価イニシアチブ(OAEI)は,OMシステムの体系的評価に多大な努力を払っているものの,包摂マッピングの限定的な評価,最適でない参照マッピング,MLベースのシステム評価の限定的なサポートなど,いくつかの制限に悩まされている。これらの制約に対処するために,Mondo と UMLS から抽出したオントロジーを含む5つの新しいバイオメディカル OM タスクを導入する。各タスクは等価性マッチングと包摂マッチングの両方を含み、参照マッピングの品質は人間のキュレーションやオントロジープルーニングなどで保証される。 MLベースのOMシステムと非MLベースのOMシステムの両方において,様々な観点からOM性能を測定するための総合評価フレームワークを提案する。これらのリソースの利用方法を示すため,異なるタイプのOMシステムの評価結果を報告する。これらのリソースはすべて,OAEI 2022における新たなBioMLトラックの一部として公開されている。

Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new BioML track at OAEI 2022.

翻訳日:2023-07-07 18:50:12 公開日:2023-07-06

# ニューラルマシン翻訳における非自己回帰生成に関する調査研究

A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond ( http://arxiv.org/abs/2204.09269v2 )

ライセンス: Link先を確認

Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-yan Liu

(参考訳) 推論を高速化するためにニューラル機械翻訳(NMT)で最初に提案された非自己回帰(NAR)生成は、機械学習と自然言語処理のコミュニティの両方で注目を集めている。 NAR生成は機械翻訳の推論速度を大幅に高速化するが、高速化は自己回帰(AR)生成と比較して翻訳精度を犠牲にするコストがかかる。近年,NAR生成とAR生成の精度ギャップを埋めるために,多くの新しいモデルやアルゴリズムが設計・提案されている。本稿では,様々な側面から非自己回帰翻訳(NAT)モデルの比較と議論を体系的に実施する。具体的には,NATの取り組みを,データ操作,モデリング手法,トレーニング基準,デコードアルゴリズム,事前学習モデルのメリットなど,いくつかのグループに分類した。さらに, 文法的誤り訂正, テキスト要約, テキストスタイル変換, 対話, 意味解析, 自動音声認識など, 機械翻訳以外のNARモデルの応用についても, 簡単なレビューを行った。さらに、KDへの依存からの解放、合理的なトレーニング目標、NARの事前トレーニング、より広範なアプリケーションなど、今後の探索の方向性についても論じる。この調査は、研究者が最新のNAR生成の進歩を捉え、先進的なNARモデルとアルゴリズムの設計を刺激し、業界関係者がアプリケーションに適切なソリューションを選択できるようにするのに役立つことを願っている。このサーベイのWebページは https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications にある。

Non-autoregressive (NAR) generation, which was first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both machine learning and natural language processing communities. While NAR generation can significantly accelerate inference speed for machine translation, the speedup comes at the cost of sacrificed translation accuracy compared to its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been designed/proposed to bridge the accuracy gap between NAR generation and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize the efforts of NAT into several groups, including data manipulation, modeling methods, training criterion, decoding algorithms, and the benefit from pre-trained models. Furthermore, we briefly review other applications of NAR models beyond machine translation, such as grammatical error correction, text summarization, text style transfer, dialogue, semantic parsing, automatic speech recognition, and so on. In addition, we also discuss potential directions for future exploration, including removing the dependency on KD, reasonable training objectives, pre-training for NAR, and wider applications, etc. We hope this survey can help researchers capture the latest progress in NAR generation, inspire the design of advanced NAR models and algorithms, and enable industry practitioners to choose appropriate solutions for their applications. The web page of this survey is at https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications.

翻訳日:2023-07-07 18:49:49 公開日:2023-07-06

# 高次元雑音データから低次元非線形構造を学習する:積分演算子アプローチ

Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach ( http://arxiv.org/abs/2203.00126v2 )

ライセンス: Link先を確認

Xiucai Ding and Rong Ma

(参考訳) 本研究では,高次元で雑音を含む観測から低次元非線形構造を学習するためのカーネルスペクトル埋め込みアルゴリズムを提案する。ここでデータセットは,本質的に低次元の多様体からサンプリングされ,高次元の雑音によって乱されていると仮定する。このアルゴリズムは、基礎となる多様体の事前知識に依存しない適応的帯域幅選択手順を用いる。得られた低次元埋め込みは、データ可視化、クラスタリング、予測などの下流目的にさらに活用することができる。我々の方法は理論的に正当化され、実用上も解釈可能である。具体的には,サンプルの次元とサイズが同程度に大きい場合に,最終的な埋め込みが雑音のない場合の埋め込みに収束することを示し,信号対雑音比が収束率と相転移に与える影響を特徴付ける。また、ある再生核ヒルベルト空間の核写像によって定義され,基礎となる非線形構造を捉える積分作用素の固有関数へ,埋め込みが収束することを証明する。数値シミュレーションと3つの実データセットの解析により,多様な応用における様々な多様体の学習において,提案手法が多くの既存手法と比較して優れた実証的性能を持つことを示す。

We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.

翻訳日:2023-07-07 18:48:51 公開日:2023-07-06

# データセットバイアスの潜在的発生源 : 機械学習アルゴリズムによる診断不足の検討

Potential sources of dataset bias complicate investigation of underdiagnosis by machine learning algorithms ( http://arxiv.org/abs/2201.07856v2 )

ライセンス: Link先を確認

Mélanie Bernhardt, Charles Jones, Ben Glocker

(参考訳) 機械学習アルゴリズムがトレーニングデータに埋め込まれたバイアスによって、健康格差を増幅するリスクを懸念する報告が増えている。 seyyed-kalantariらは、3つの胸部x線データセットで訓練されたモデルが'no-finding'ラベルのサブグループ間で偽陽性率(fpr)の差をもたらすことを発見した。これらのモデルは、歴史的に保存されていないことが知られているサブグループにおいて、常に高いFPRをもたらす。本研究における実験装置は,アルゴリズム下診断の研究には不十分である。データセットバイアスの程度と性質に関する特定の知識(または仮定)がないため、モデルバイアスを調査することは困難である。重要なことに、トレーニングデータ(ランダム分割による)と同じバイアスを示すテストデータの使用は、報告された格差の解釈を著しく複雑にする。

An increasing number of reports raise concerns about the risk that machine learning algorithms could amplify health disparities due to biases embedded in the training data. Seyyed-Kalantari et al. find that models trained on three chest X-ray datasets yield disparities in false-positive rates (FPR) across subgroups on the 'no-finding' label (indicating the absence of disease). The models consistently yield higher FPR on subgroups known to be historically underserved, and the study concludes that the models exhibit and potentially even amplify systematic underdiagnosis. We argue that the experimental setup in the study is insufficient to study algorithmic underdiagnosis. In the absence of specific knowledge (or assumptions) about the extent and nature of the dataset bias, it is difficult to investigate model bias. Importantly, their use of test data exhibiting the same bias as the training data (due to random splitting) severely complicates the interpretation of the reported disparities.

翻訳日:2023-07-07 18:48:34 公開日:2023-07-06
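
The quantity at the centre of the abstract above, the false-positive rate (FPR) on the 'no-finding' label per subgroup, can be made concrete with a minimal Python sketch. The toy records, the subgroup names `A` and `B`, and the function `fpr_by_group` are hypothetical illustrations and are not data or code from either study.

```python
from collections import defaultdict


def fpr_by_group(records):
    """Return the false-positive rate of the 'no-finding' prediction per subgroup.

    Each record is (group, true_no_finding, predicted_no_finding).
    A false positive is a patient whose label says a finding is present
    (true_no_finding is False) but whom the model labels 'no finding',
    i.e. a potential underdiagnosis.
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for group, true_nf, pred_nf in records:
        if not true_nf:          # reference label: disease present
            negatives[group] += 1
            if pred_nf:          # model output: 'no finding'
                false_pos[group] += 1
    return {g: false_pos[g] / n for g, n in negatives.items() if n > 0}


# Hypothetical toy records, invented purely for illustration.
records = [
    ("A", False, True), ("A", False, False), ("A", False, False),
    ("A", False, False), ("A", True, True),
    ("B", False, True), ("B", False, True), ("B", False, False),
    ("B", False, False), ("B", True, True),
]
rates = fpr_by_group(records)
print(rates)
```

Note that the rates are computed against the reference labels of the test set. If those labels carry the same bias as the training labels, as happens under random splitting, a gap between subgroups in this number cannot by itself be attributed to the model, which is the concern the abstract raises.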

# データ解析のための完全適応ベイズアルゴリズム, FABADA

Fully Adaptive Bayesian Algorithm for Data Analysis, FABADA ( http://arxiv.org/abs/2201.05145v2 )

ライセンス: Link先を確認

Pablo M Sanchez-Alarcon and Yago Ascasibar Sequeiros

(参考訳) 本研究の目的は,1次元と2次元のデータ,例えば天文学的な画像やスペクトルの信号対雑音比を自動的に改善する,ベイズ推定の観点から,新しい非パラメトリックノイズ低減手法を記述することである。このアルゴリズムはデータの平滑化可能なバージョンである平滑化モデルを反復的に評価し、ノイズ測定と統計的に互換性のある信号の推定を得る。繰り返しは、最後の滑らかなモデルのエビデンスと$\chi^2$統計量に基づいて停止し、スムーズなモデルの集合全体の重み付き平均として信号の期待値を計算する。本稿では,アルゴリズムの数学的形式化と数値的実装について述べるとともに,実天体観測のバッテリを用いて,ピーク信号と雑音比,構造的類似度指数,時間ペイロードを用いてその性能を評価する。データ解析のための完全適応ベイズアルゴリズム(fabada)は、パラメータチューニングなしでは、実際のアプリケーションでは不可能である真の信号に基づいてパラメータを最適化した標準的な画像処理アルゴリズムに匹敵する結果をもたらす。 bm3dのような最先端の非パラメトリックな手法は高い信号対雑音比で少し性能が向上するが、超ノイズデータではアルゴリズムの方がかなり正確である(相対誤差が20～40ドル以上であり、天文学の分野に特に関心がある状況である)。この範囲では, 復元によって得られた残留物の標準偏差は, 元の測定値よりも1桁以上小さくなる可能性がある。このレポートで提示された結果をすべて再現するために必要なソースコードは、メソッドの実装を含めて、https://github.com/PabloMSanAla/fabadaで公開されている。

The aim of this paper is to describe a novel non-parametric noise reduction technique from the point of view of Bayesian inference that may automatically improve the signal-to-noise ratio of one- and two-dimensional data, such as astronomical images and spectra. The algorithm iteratively evaluates possible smoothed versions of the data, the smooth models, obtaining an estimation of the underlying signal that is statistically compatible with the noisy measurements. Iterations stop based on the evidence and the $\chi^2$ statistic of the last smooth model, and we compute the expected value of the signal as a weighted average of the whole set of smooth models. In this paper, we explain the mathematical formalism and numerical implementation of the algorithm, and we evaluate its performance in terms of the peak signal-to-noise ratio, the structural similarity index, and the time payload, using a battery of real astronomical observations. Our Fully Adaptive Bayesian Algorithm for Data Analysis (FABADA) yields results that, without any parameter tuning, are comparable to standard image processing algorithms whose parameters have been optimized based on the true signal to be recovered, something that is impossible in a real application. State-of-the-art non-parametric methods, such as BM3D, offer slightly better performance at high signal-to-noise ratio, while our algorithm is significantly more accurate for extremely noisy data (higher than $20-40\%$ relative errors, a situation of particular interest in the field of astronomy). In this range, the standard deviation of the residuals obtained by our reconstruction may become more than an order of magnitude lower than that of the original measurements. The source code needed to reproduce all the results presented in this report, including the implementation of the method, is publicly available at https://github.com/PabloMSanAla/fabada

翻訳日:2023-07-07 18:48:15 公開日:2023-07-06

# AGMの信条改正、セマンティカルに

AGM Belief Revision, Semantically ( http://arxiv.org/abs/2112.13557v2 )

ライセンス: Link先を確認

Faiq Miftakhul Falakh, Sebastian Rudolph, Kai Sauerwald

(参考訳) 第2のコアコントリビューションとして、我々は結果が(K&Mのオリジナルの研究のように)推移的な選好関係を生み出す代入に強化されるような全ての論理の特徴づけを提供する。これらの主な貢献とともに、我々の発見の多様な変種と、他の信念修正理論の分野への影響について論じる。

We establish a generic, model-theoretic characterization of belief revision operators implementing the paradigm of minimal change according to the seminal work by Alchourrón, Gärdenfors, and Makinson (AGM). Our characterization applies to all Tarskian logics, that is, all logics with a classical model-theoretic semantics, and hence a wide variety of formalisms used in knowledge representation and beyond, including many for which a model-theoretic characterization has hitherto been lacking. Our starting point is the approach by Katsuno and Mendelzon (K&M), who provided such a characterization for propositional logic over finite signatures. We generalize K&M's approach to the setting of AGM-style revision over bases in arbitrary Tarskian logics, where base may refer to one of the various ways of representing an agent's beliefs (such as belief sets, arbitrary or finite sets of sentences, or single sentences). Our first core result is a representation theorem providing a two-way correspondence between AGM-style revision operators and specific assignments: functions associating every base to a "preference" relation over interpretations, which must be total but is - in contrast to prior approaches - not always transitive. As our second core contribution, we provide a characterization of all logics for which our result can be strengthened to assignments producing transitive preference relations (as in K&M's original work). Alongside these main contributions, we discuss diverse variants of our findings as well as ramifications for other areas of belief revision theory.

翻訳日:2023-07-07 18:47:44 公開日:2023-07-06

# ネットワークにおける記述的対推論的コミュニティ検出:落とし穴、神話、半真実

Descriptive vs. inferential community detection in networks: pitfalls, myths, and half-truths ( http://arxiv.org/abs/2112.00183v7 )

ライセンス: Link先を確認

Tiago P. Peixoto

(参考訳) コミュニティ検出はネットワーク科学における最も重要な方法論の1つであり、過去数十年でかなりの注目を集めてきた。この領域は、ネットワークを基本的なビルディングブロックに分割し、その大規模構造の要約を提供することを目的としている。その重要性と広く採用されているにもかかわらず、最先端技術と、実際に様々な分野で実際に使われている方法との間には、明らかなギャップがある。ここでは、既存のメソッドが「記述的」か「推論的」かに応じて分割することで、この相違に対処しようと試みる。記述的手法は、コミュニティ構造の文脈依存的な概念に基づくネットワーク内のパターンを見つけるが、推論的手法は生成モデルを明確にし、それらをデータに適合させようとする。このようにして、彼らはネットワーク形成のメカニズムに関する洞察を与え、統計的証拠によって支持される方法でランダム性から構造を分離することができる。我々は,推論目的による記述的手法の導入が,落とし穴や誤解を招く解答に悩まされており,一般的には避けるべきであることを示す。我々は、推論法はより明確な科学的質問と一致し、より強固な結果をもたらし、多くの場合好まれるべきであると主張する。我々は,コミュニティ検出が実際に行われている場合によく信じられる神話や半真実を,そのような手法の使用と結果の解釈の両方を改善するために,取り除こうとしている。

Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state-of-the-art and the methods that are actually used in practice in a variety of fields. Here we attempt to address this discrepancy by dividing existing methods according to whether they have a "descriptive" or an "inferential" goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate generative models, and attempt to fit them to data. In this way, they are able to provide insights into the mechanisms of network formation, and separate structure from randomness in a manner supported by statistical evidence. We review how employing descriptive methods with inferential aims is riddled with pitfalls and misleading answers, and thus should in general be avoided. We argue that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should in many cases be preferred. We attempt to dispel some myths and half-truths often believed when community detection is employed in practice, in an effort to improve both the use of such methods and the interpretation of their results.

翻訳日:2023-07-07 18:47:24 公開日:2023-07-06

# 金融予測のためのエキスパートアグリゲーション

Expert Aggregation for Financial Forecasting ( http://arxiv.org/abs/2111.15365v4 )

ライセンス: Link先を確認

Carl Remlinger, Brière Marie, Alasseur Clémence, Joseph Mikael

(参考訳) 金融時系列予測専用の機械学習アルゴリズムは、多くの関心を集めている。しかし、その推定精度は時間とともに不安定になるため、いくつかのアルゴリズムを選択することは難しい。専門家のオンラインアグリゲーションは、モデルについて仮定することなく、一つのアプローチで有限のモデルの予測を組み合わせる。本稿では,Bernstein Online Aggregation (BOA) 手法を用いて,異なる機械学習モデルから得られる個々のストックリターン予測から構築したロングショート戦略を構築する。オンライン専門家の混在は、非定常性によって特徴づけられる環境においても魅力的なポートフォリオパフォーマンスをもたらす。このアグリゲーションは個々のアルゴリズムより優れており、より高いポートフォリオ Sharpe Ratio、低い不足率、同様のターンオーバーを提供する。専門家や専門家の専門化への拡張も提案されており、ポートフォリオ評価指標のファミリー全体の混合を改善する。

Machine learning algorithms dedicated to financial time series forecasting have gained a lot of interest. But choosing between several algorithms can be challenging, as their estimation accuracy may be unstable over time. Online aggregation of experts combines the forecasts of a finite set of models in a single approach without making any assumption about the models. In this paper, a Bernstein Online Aggregation (BOA) procedure is applied to the construction of long-short strategies built from individual stock return forecasts coming from different machine learning models. The online mixture of experts leads to attractive portfolio performance even in environments characterised by non-stationarity. The aggregation outperforms individual algorithms, offering a higher portfolio Sharpe ratio and a lower shortfall, with a similar turnover. Extensions to expert and aggregation specialisations are also proposed to improve the overall mixture on a family of portfolio evaluation metrics.

翻訳日:2023-07-07 18:46:50 公開日:2023-07-06
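
To illustrate what online aggregation of experts means in practice, the following Python sketch applies one commonly cited form of the BOA weight update with a fixed learning rate. It is an assumption-laden illustration: the squared-error loss, the learning rate `eta`, the toy forecasts, and the function names are all hypothetical, and the paper's actual variant (for example adaptive learning rates or the mapping from forecasts to long-short positions) may differ.

```python
import math


def boa_update(weights, losses, eta):
    """One BOA-style weight update with a fixed learning rate (illustrative)."""
    # Loss of the current mixture under the current weights.
    mixture_loss = sum(w * l for w, l in zip(weights, losses))
    # Instantaneous regret of each expert relative to the mixture.
    regrets = [l - mixture_loss for l in losses]
    # Exponential weighting with a second-order correction term.
    raw = [w * math.exp(-eta * r * (1.0 + eta * r))
           for w, r in zip(weights, regrets)]
    total = sum(raw)
    return [x / total for x in raw]


def aggregate(forecasts, weights):
    """Weighted combination of the experts' forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical data: three experts forecast a return, then it is realized.
weights = [1 / 3, 1 / 3, 1 / 3]
history = [
    ([0.010, 0.030, -0.020], 0.012),
    ([0.000, 0.020, -0.010], 0.001),
    ([0.015, 0.040, -0.030], 0.014),
]
for forecasts, realized in history:
    combined = aggregate(forecasts, weights)  # forecast issued before the outcome
    losses = [(f - realized) ** 2 for f in forecasts]
    weights = boa_update(weights, losses, eta=50.0)

print(weights)
```

In this toy run the first expert is closest to the realized value in every round, so its weight grows at the expense of the others, without any assumption about how the experts produce their forecasts.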

# 光パラメトリック増幅によるウィグナー機能トモグラフィ

Wigner Function Tomography via Optical Parametric Amplification ( http://arxiv.org/abs/2207.10030v4 )

ライセンス: Link先を確認

Mahmoud Kalash and Maria V. Chekhova

(参考訳) ウィグナー関数トモグラフィーは量子状態の特徴付けには不可欠であるが、その一般的なバージョンであるバランスの取れたホモダイン検出はいくつかの弱点に苦しむ。まず、非ガウス状態、特に明るい状態を測定するのに重要な効率的な検出が必要である。第二に、テスト中の状態の時空間特性に合わせて調整された局所発振器が必要であり、マルチモード状態やブロードバンド状態では失敗する。本稿では,光学パラメトリック増幅に基づくWigner関数トモグラフィーと直接検出手法を提案する。この方法は、非効率性と損失の検出に免疫を持ち、ブロードバンド、空間的および時間的に多モードの量子状態に適している。この原理を証明するため,強い多重モード状態の単一モードを占有する圧縮真空のウィグナー関数を実験的に再構成した。フィルタにより97%以上の損失が生じるにもかかわらず、$-7.5\pm 0.4$ dBと$0.91^{+0.09}_{-0.08}$の純度を得る。理論的には、圧縮された単一光子(明るい非ガウス状態)の再構成も検討する。強力なマルチモードパラメトリック増幅のため、この方法は複数のモードを同時にトモグラフィーできる。これにより、光学量子情報処理の強力なツールとなる。

Wigner function tomography is indispensable for characterizing quantum states, but its commonly used version, balanced homodyne detection, suffers from several weaknesses. First, it requires efficient detection, which is critical for measuring fragile non-Gaussian states, especially bright ones. Second, it needs a local oscillator, tailored to match the spatiotemporal properties of the state under test, and fails for multimode and broadband states. Here we propose Wigner function tomography based on optical parametric amplification followed by direct detection. The method is immune to detection inefficiency and loss, and suitable for broadband, spatially and temporally multimode quantum states. To prove the principle, we experimentally reconstruct the Wigner function of squeezed vacuum occupying a single mode of a strongly multimode state. We obtain a squeezing of $-7.5\pm 0.4$ dB and a purity of $0.91^{+0.09}_{-0.08}$ despite more than $97\%$ loss caused mainly by filtering. Theoretically, we also consider the reconstruction of a squeezed single photon - a bright non-Gaussian state. Due to strong multimode parametric amplification, the method allows for the simultaneous tomography of multiple modes. This makes it a powerful tool for optical quantum information processing.

翻訳日:2023-07-07 18:41:37 公開日:2023-07-06

# 摂動パラメトリック量子進化の微分に対する"プロパ"シフト則

"Proper" Shift Rules for Derivatives of Perturbed-Parametric Quantum Evolutions ( http://arxiv.org/abs/2207.01587v3 )

ライセンス: Link先を確認

Dirk Oliver Theis

(参考訳) Banchi & Crooks (Quantum, 2021) は「摂動」量子進化(英語版) $x\mapsto e^{i(x A + B)/\hbar}$ と呼ばれるパラメータによって期待値の微分を推定する方法を与えている。彼らのメソッドは、単にパラメータを変更するだけでなく、現れるユニタリへの修正を必要とする。さらに、$b$項が避けられない場合、この微分の正確な方法(偏りのない推定法)は知られていないようである: banchi & crooks の手法は近似を与える。本稿では、このタイプのパラメータ化期待値の導関数を推定するために、シフトパラメータのみを必要とせず、量子進化の他の変更(「適切な」シフト規則)も必要としない方法を提案する。本手法は, 解析的導関数, 偏りのない推定値を与える手法であり, バンチ・クルックス法と同じ最悪の場合のばらつきを持つ。さらに、摂動パラメトリック量子進化のフーリエ解析に基づいて、適切なシフト規則を取り巻く理論について議論し、その結果、フーリエ変換の観点から適切なシフト規則が特徴づけられ、結果としてシフトの指数的な集中を伴う適切なシフト規則が存在しない結果となる。近似誤差を示す切り抜き法を導出し、予備数値シミュレーションに基づいてBanchi-Crooks法と比較する。

Banchi & Crooks (Quantum, 2021) have given methods to estimate derivatives of expectation values depending on a parameter that enters via what we call a "perturbed" quantum evolution $x\mapsto e^{i(x A + B)/\hbar}$. Their methods require modifications, beyond merely changing parameters, to the unitaries that appear. Moreover, in the case when the $B$-term is unavoidable, no exact method (unbiased estimator) for the derivative seems to be known: Banchi & Crooks's method gives an approximation. In this paper, for estimating the derivatives of parameterized expectation values of this type, we present a method that only requires shifting parameters, no other modifications of the quantum evolutions (a "proper" shift rule). Our method is exact (i.e., it gives analytic derivatives, unbiased estimators), and it has the same worst-case variance as Banchi-Crooks's. Moreover, we discuss the theory surrounding proper shift rules, based on Fourier analysis of perturbed-parametric quantum evolutions, resulting in a characterization of the proper shift rules in terms of their Fourier transforms, which in turn leads us to non-existence results of proper shift rules with exponential concentration of the shifts. We derive truncated methods that exhibit approximation errors, and compare to Banchi-Crooks's based on preliminary numerical simulations.

翻訳日:2023-07-07 18:40:26 公開日:2023-07-06
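
For orientation, the Python sketch below checks the textbook two-term parameter-shift rule in the unperturbed case, where the parameter enters through a single-qubit rotation and no additional $B$ term is present. This is not the method of the paper; the choice of rotation, observable, and function names are assumptions made for illustration.

```python
import math


def expectation_z(x):
    """<Z> after applying exp(-i x X / 2) to |0>; analytically equal to cos(x)."""
    # Amplitudes of the rotated state: (cos(x/2), -i sin(x/2)).
    amp0 = complex(math.cos(x / 2), 0.0)
    amp1 = complex(0.0, -math.sin(x / 2))
    return abs(amp0) ** 2 - abs(amp1) ** 2


def shift_rule_derivative(f, x):
    """Two-term shift rule: only the parameter is shifted, the circuit is unchanged."""
    return 0.5 * (f(x + math.pi / 2) - f(x - math.pi / 2))


x0 = 0.7
exact = -math.sin(x0)                               # d/dx cos(x)
estimate = shift_rule_derivative(expectation_z, x0)
print(exact, estimate)
```

The rule is exact here because the expectation value is a single-frequency trigonometric function of the parameter. Once a fixed term $B$ is added to the generator, the dependence on $x$ is in general no longer of this form, which is the setting the abstract addresses.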

# 遷移確率に対する資源可変近距離量子アルゴリズムの改良と物理学および変分量子線形代数への応用

Improved resource-tunable near-term quantum algorithms for transition probabilities, with applications in physics and variational quantum linear algebra ( http://arxiv.org/abs/2206.14213v2 )

ライセンス: Link先を確認

Nicolas PD Sawaya, Joonsuk Huh

(参考訳) 遷移振幅と遷移確率は、応答特性と相関関数の計算を含む物理学シミュレーションの多くの領域に関係している。これらの量もまた方程式の線形系を解くことに関係している。ここでは遷移確率を計算するための3つの関連アルゴリズムを提案する。まず、2つの入力状態が非直交的になるように,前述した短距離アルゴリズムを拡張する。この第1の手順に基づいて、回路評価の少ないトロッター化とリチャードソン外挿に基づくより深いアルゴリズムを導出する。第3に、回路の深さと測定の複雑さをトレードオフできるチューナブルアルゴリズムを導入し、特定のハードウェア特性に合わせて調整可能なアルゴリズムを導出する。最後に、物理学および化学のモデルおよび変分量子線形解法(vqls)のサブルーチンに対する原理証明数値を実装した。私たちのアプローチの一番の利点は (a) 任意の非直交状態は、量子資源のわずかな増加と共に用いられる。 b) 我々は(最近提案された他の方法と同様に)3ビットゲートの分解を必要とするハダマール試験のようなサブルーチンを完全に回避し、 c) 遷移確率に対するnisqアルゴリズムの以前の状態と比較して、量子回路評価がより少ない場合も少なくなる。

Transition amplitudes and transition probabilities are relevant to many areas of physics simulation, including the calculation of response properties and correlation functions. These quantities can also be related to solving linear systems of equations. Here we present three related algorithms for calculating transition probabilities. First, we extend a previously published short-depth algorithm, allowing for the two input states to be non-orthogonal. Building on this first procedure, we then derive a higher-depth algorithm based on Trotterization and Richardson extrapolation that requires fewer circuit evaluations. Third, we introduce a tunable algorithm that allows for trading off circuit depth and measurement complexity, yielding an algorithm that can be tailored to specific hardware characteristics. Finally, we implement proof-of-principle numerics for models in physics and chemistry and for a subroutine in variational quantum linear solving (VQLS). The primary benefits of our approaches are that (a) arbitrary non-orthogonal states may now be used with small increases in quantum resources, (b) we (like another recently proposed method) entirely avoid subroutines such as the Hadamard test that may require three-qubit gates to be decomposed, and (c) in some cases fewer quantum circuit evaluations are required as compared to the previous state-of-the-art in NISQ algorithms for transition probabilities.

翻訳日:2023-07-07 18:40:00 公開日:2023-07-06
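
The abstract mentions Richardson extrapolation as an ingredient of the higher-depth algorithm. The Python sketch below shows only the generic extrapolation idea on a classical toy problem: a step-wise product approximation of an exponential, whose error shrinks with the number of steps in a way loosely analogous to a first-order product formula. The functions and numbers are hypothetical and do not reproduce the paper's quantum algorithm.

```python
import math


def product_estimate(x, n):
    """n-step product approximation (1 + x/n)**n of exp(x); error shrinks like 1/n."""
    return (1.0 + x / n) ** n


def richardson(estimate, x, n, order=1):
    """Combine estimates at n and 2n steps to cancel the leading O(1/n**order) error."""
    coarse = estimate(x, n)
    fine = estimate(x, 2 * n)
    factor = 2 ** order
    return (factor * fine - coarse) / (factor - 1)


x, n = 1.0, 10
fine_only = product_estimate(x, 2 * n)
extrapolated = richardson(product_estimate, x, n)
error_fine = abs(fine_only - math.exp(x))
error_extrapolated = abs(extrapolated - math.exp(x))
print(error_fine, error_extrapolated)
```

Combining two inexpensive estimates removes the leading error term, so the extrapolated value is closer to the target than the finer estimate alone.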

# 単純立方体格子における2レベル原子の力学の量子平均場処理

Quantum mean-field treatment of the dynamics of a two-level atom in a simple cubic lattice ( http://arxiv.org/abs/2206.14156v3 )

ライセンス: Link先を確認

Yamen Hamdouni

(参考訳) 平均場近似はキュリー温度に近い強磁性格子中の2レベル原子の動力学の一般的な特徴を調べるために用いられる。解析的および数値的な結果が得られる。まず、格子ハミルトニアンを線形化し、磁場の任意の方向に対する相転移の次数パラメータに対する自己抵抗方程式を導出する。還元されたダイナミクスは格子の自由度を辿り出し、格子の単位セルのサイズに等しい大きさの有効スピン浴における原子のダイナミクスを減少させる。特定の方向に沿って磁場を印加することにより, 劣化や励起状態の占有確率が向上する可能性が示唆された。また,温度変化とスピンの大きさに対する依存性についても検討した。熱揺らぎの増加は励起状態の占有確率を減少させる可能性があることが判明した。非隣接細胞を占有する2つのそのような原子の絡み合いを研究し、その時間の変化は磁場の方向にあまり敏感でないことが判明した。絡み合いによる突然の死亡と再生は臨界温度近くで起こることが示されている。

The mean field approximation is used to investigate the general features of the dynamics of a two-level atom in a ferromagnetic lattice close to the Curie temperature. Various analytical and numerical results are obtained. We first linearize the lattice Hamiltonian, and we derive the self-consistency equation for the order parameter of the phase transition for arbitrary direction of the magnetic field. The reduced dynamics is deduced by tracing out the degrees of freedom of the lattice, which results in the reduction of the dynamics to that of an atom in an effective spin bath whose size is equal to the size of a unit cell of the lattice. It is found that the dephasing and the excited state occupation probability may be enhanced by applying the magnetic field along some specific directions. The dependence on the change of the temperature and the magnitude of spin is also investigated. It turns out that the increase of thermal fluctuations may reduce the occupation probability of the excited state. The entanglement of two such atoms that occupy non-adjacent cells is studied, and its variation in time is found to be only weakly sensitive to the direction of the magnetic field. Entanglement sudden death and revival are shown to occur close to the critical temperature.

翻訳日:2023-07-07 18:39:40 公開日:2023-07-06

# アルゴリズム付きプラットフォームによるコンテンツクリエータインセンティブのモデリング

Modeling Content Creator Incentives on Algorithm-Curated Platforms ( http://arxiv.org/abs/2206.13102v2 )

ライセンス: Link先を確認

Jiri Hron, Karl Krauth, Michael I. Jordan, Niki Kilbertus, Sarah Dean

(参考訳) コンテンツクリエイターはユーザーの注意を競います。彼らのリーチは、オンラインプラットフォーム上で開発者が行うアルゴリズムの選択に大きく依存する。露出を最大化するために、多くのクリエーターは、スプロールする検索エンジン最適化産業のような例によって証明されているように、戦略的に適応する。これは有限ユーザアテンションプールの競争を招きます。我々はこれらのダイナミクスを、現代の因数分解や(ディープ)2towerアーキテクチャを含むアルゴリズムによって誘導されるインセンティブのモデルである露光ゲームと呼ぶ形で形式化する。非負対非拘束因子化のような一見無害なアルゴリズム選択は、露出ゲームにおける(nash)平衡の存在と特性に大きな影響を与えることが証明される。エクスポージャーゲームのようなクリエーターの行動モデルを使って、(以前の)デプロイ前の監査を行います。このような監査は、望ましいコンテンツとインセンティブのあるコンテンツのミスアライメントを特定し、コンテンツフィルタリングやモデレーションといったポストホックな措置を補完する。そこで本研究では,露出ゲームにおける平衡を数値的に検出するツールを提案し,MovieLensおよびLastFMデータセットの監査結果を示す。さらに, 戦略的に生成したコンテンツは, アルゴリズム探索とコンテンツの多様性, モデル表現率とジェンダーベースユーザとクリエーターグループへの偏見に強く依存していることが判明した。

Content creators compete for user attention. Their reach crucially depends on algorithmic choices made by developers on online platforms. To maximize exposure, many creators adapt strategically, as evidenced by examples like the sprawling search engine optimization industry. This begets competition for the finite user attention pool. We formalize these dynamics in what we call an exposure game, a model of incentives induced by algorithms, including modern factorization and (deep) two-tower architectures. We prove that seemingly innocuous algorithmic choices, e.g., non-negative vs. unconstrained factorization, significantly affect the existence and character of (Nash) equilibria in exposure games. We advocate the use of creator behavior models, like exposure games, for an (ex-ante) pre-deployment audit. Such an audit can identify misalignment between desirable and incentivized content, and thus complement post-hoc measures like content filtering and moderation. To this end, we propose tools for numerically finding equilibria in exposure games, and illustrate results of an audit on the MovieLens and LastFM datasets. Among other things, we find that the strategically produced content exhibits strong dependence between algorithmic exploration and content diversity, and between model expressivity and bias towards gender-based user and creator groups.

翻訳日:2023-07-07 18:39:23 公開日:2023-07-06

# 視覚観察からのオフライン強化学習の課題と機会

Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations ( http://arxiv.org/abs/2206.04779v3 )

ライセンス: Link先を確認

Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh

(参考訳) オフライン強化学習は、ポリシー学習に大規模な事前に収集されたデータセットを活用する上で大きな可能性を秘めている。しかしながら、連続的なアクションスペースによる視覚的観察からのオフライン強化学習は、この複雑なドメインにおける重要な課題に対する理解が限られているため、未検討のままである。本稿では、視覚領域における連続的な制御のための単純なベースラインを確立し、実世界のオフラインRL問題に存在するデータ分布をより良く表現するために設計された視覚観測からオフライン強化学習のための一連のベンチマークタスクを導入する。このベンチマークタスクを用いて、DreamerV2とDrQ-v2という2つの人気のある視覚ベースのオンライン強化学習アルゴリズムに簡単な修正を加えることで、既存のオフラインRLメソッドを上回り、視覚領域における継続的な制御のための競争的ベースラインを確立することができることを示す。我々はこれらのアルゴリズムを厳密に評価し、視覚的観察から連続制御するための最先端モデルベースとモデルなしオフラインRL法の違いを実証的に評価する。この評価で使用されるコードとデータは、この領域の進歩を促進するためにオープンソース化されている。

Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we establish simple baselines for continuous control in the visual domain and introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations designed to better represent the data distributions present in real-world offline RL problems and guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and visually identifiable changes in dynamics. Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain. We rigorously evaluate these algorithms and perform an empirical evaluation of the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.

翻訳日:2023-07-07 18:38:26 公開日:2023-07-06

# 部分的誤り訂正の時代における清潔で汚れたキュービットの戦い

The battle of clean and dirty qubits in the era of partial error correction ( http://arxiv.org/abs/2205.13454v3 )

ライセンス: Link先を確認

Daniel Bultrini, Samson Wang, Piotr Czarnik, Max Hunter Gordon, M. Cerezo, Patrick J. Coles, Lukasz Cincio

(参考訳) 誤り訂正が可能になった場合、各論理量子ビットに多数の物理量子ビットを割り当てる必要がある。誤り訂正はより深い回路を動作させることができるが、それぞれの物理量子ビットは計算空間の指数的な増加に寄与しうるため、誤り訂正のためにキュービットを使用するか、ノイズの多いキュービットとして使用するかのトレードオフがある。本研究では、ノイズのない量子ビット(誤り訂正された量子ビットの理想的なモデル)とともにノイズの多い量子ビットを使うことの効果を考察し、これを「クリーンで汚い」構成と呼ぶ。この設定を特徴付けるために解析モデルと数値シミュレーションを用いる。イジングモデルであるハミルトン変分アンサッツ回路において,ノイズ誘起不規則高原(nibps)の出現,すなわちノイズによる観測可能物質の指数関数濃度を示す。一つの量子ビットだけがノイズがあり、十分な回路が与えられたとしても、NIBPは単に量子ビットのサブセットを誤り訂正することによって完全に克服できないことを示唆する。正の面では、回路内のすべてのノイズレスキュービットに対して、勾配可観測物の濃度が指数関数的に抑制され、部分誤差補正の利点が示される。最後に, 解析モデルにより, 観測対象が汚染量子ビットの比に関連する指数のスケーリングに集中していることを示し, これらの知見を裏付ける。

When error correction becomes possible it will be necessary to dedicate a large number of physical qubits to each logical qubit. Error correction allows for deeper circuits to be run, but each additional physical qubit can potentially contribute an exponential increase in computational space, so there is a trade-off between using qubits for error correction or using them as noisy qubits. In this work we look at the effects of using noisy qubits in conjunction with noiseless qubits (an idealized model for error-corrected qubits), which we call the "clean and dirty" setup. We employ analytical models and numerical simulations to characterize this setup. Numerically we show the appearance of Noise-Induced Barren Plateaus (NIBPs), i.e., an exponential concentration of observables caused by noise, in an Ising model Hamiltonian variational ansatz circuit. We observe this even if only a single qubit is noisy and given a deep enough circuit, suggesting that NIBPs cannot be fully overcome simply by error-correcting a subset of the qubits. On the positive side, we find that for every noiseless qubit in the circuit, there is an exponential suppression in concentration of gradient observables, showing the benefit of partial error correction. Finally, our analytical models corroborate these findings by showing that observables concentrate with a scaling in the exponent related to the ratio of dirty-to-total qubits.

翻訳日:2023-07-07 18:38:02 公開日:2023-07-06

# マルチドメイン物理インフォームドニューラルネットワークにおけるインタフェース条件のメタ学習

Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks ( http://arxiv.org/abs/2210.12669v2 )

ライセンス: Link先を確認

Shibo Li, Michael Penwarden, Yiming Xu, Conor Tillinghast, Akil Narayan, Robert M. Kirby, Shandian Zhe

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、偏微分方程式(PDE)の一般的なメッシュフリーな解法として登場している。最近の拡張では、ドメインを分解し、異なるPINNを適用して各サブドメインの問題を解決し、サブドメインをインターフェースで縫い合わせる。これにより、問題の複雑性をさらに軽減し、計算コストを削減し、並列化を可能にする。しかし,マルチドメインPINNの性能は,インターフェース条件の選択に敏感である。かなり多くの条件が提案されているが、特定の問題に応じて条件を選択する方法が提案されていない。このギャップに対処するために、パラメトリックPDEのファミリーを解くための適切なインタフェース条件を動的に決定するための、シンプルで効率的かつ強力なアプローチであるMETALIC(META Learning of Interface Conditions)を提案する。具体的には,2つの文脈的マルチアームバンディット(mab)モデルを開発した。最初のものはトレーニングコース全体に適用され、PDEパラメータとインターフェース条件がパフォーマンスを予測するガウシアンプロセス(GP)の報酬をオンライン更新する。我々は,UCBサンプリングとトンプソンサンプリングの両方に対して,サブ線形後悔を証明し,理論的にはMABの有効性を保証している。第2の段階は2段階に分割し、第1の段階は確率相と第2の段階であり、各段階のGP報酬を更新し、2段階の異なる条件選択を可能にし、柔軟性と性能をさらに向上させる。我々は4つのベンチマークpdeファミリーにおいてMETALICの利点を示した。

Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, apply different PINNs to solve the problem in each subdomain, and stitch the subdomains at the interface. Thereby, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of multi-domain PINNs is sensitive to the choice of the interface conditions. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit (MAB) models. The first one applies to the entire training course, and updates online a Gaussian process (GP) reward that, given the PDE parameters and interface conditions, predicts the performance. We prove a sub-linear regret bound for both UCB and Thompson sampling, which in theory guarantees the effectiveness of our MAB. The second one partitions the training into two stages, a stochastic phase and a deterministic phase; we update a GP reward for each phase to enable different condition selections at the two stages, further bolstering flexibility and performance. We have shown the advantage of METALIC on four benchmark PDE families.

翻訳日:2023-07-07 18:31:06 公開日:2023-07-06
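
As a rough illustration of the bandit view of choosing interface conditions, the Python sketch below runs a plain UCB1 rule over three arms. It is a deliberate simplification and not METALIC itself: there is no context (PDE parameters), no Gaussian process reward model, and the arm names and noise-free reward values are hypothetical placeholders.

```python
import math


def ucb_select(counts, means, t, c=2.0):
    """Pick an arm by the UCB1 rule; untried arms are played first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
    )


def run_bandit(reward_fn, num_arms, rounds):
    """Play the bandit for a number of rounds and track running mean rewards."""
    counts = [0] * num_arms
    means = [0.0] * num_arms
    for t in range(1, rounds + 1):
        arm = ucb_select(counts, means, t)
        reward = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Hypothetical arms and rewards (e.g. a score derived from solution accuracy).
conditions = ["solution continuity", "flux continuity", "residual continuity"]
true_reward = [0.60, 0.90, 0.75]
counts, means = run_bandit(lambda a: true_reward[a], len(conditions), rounds=200)
best = conditions[max(range(len(conditions)), key=lambda a: counts[a])]
print(counts, best)
```

The rule keeps sampling every arm occasionally but concentrates its pulls on the arm with the highest observed reward. The approach described in the abstract replaces the running means with a GP prediction conditioned on the PDE parameters.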

# 強度三重相関によるab initio空間位相検索

Ab Initio Spatial Phase Retrieval via Intensity Triple Correlations ( http://arxiv.org/abs/2210.03793v3 )

ライセンス: Link先を確認

Nolan Peard, Kartik Ayyer, and Henry N. Chapman

(参考訳) 非コヒーレントエミッタからの2次強度相関は、空間分布のフーリエ変換係数を明らかにすることができるが、実空間への完全一般フーリエ変換を可能にするための位相の検索は依然として困難である。 3階の強度相関による位相検索は、計算において未対応の符号問題を単純化する特別なエミッタ構成に依存している。この符号問題の完全な処理がなければ、エミッターの真に任意の配置からフーリエ位相を検索する一般的なケースは不可能である。本稿では, 強度三重相関を用いた ab initio 相の一般検索法について述べる。シミュレーションは、撮像星や蛍光原子や分子に応用できる非コヒーレントエミッターのクラスターの正確な位相検索を示す。この研究により、フーリエ変換を直接実行し、遠方界の強度相関のみを通して任意の独立したエミッター配列の画像を再構成することができるようになった。

Second-order intensity correlations from incoherent emitters can reveal the Fourier transform modulus of their spatial distribution, but retrieving the phase to enable completely general Fourier inversion to real space remains challenging. Phase retrieval via the third-order intensity correlations has relied on special emitter configurations which simplified an unaddressed sign problem in the computation. Without a complete treatment of this sign problem, the general case of retrieving the Fourier phase from a truly arbitrary configuration of emitters is not possible. In this paper, a general method for ab initio phase retrieval via the intensity triple correlations is described. Simulations demonstrate accurate phase retrieval for clusters of incoherent emitters which could be applied to imaging stars or fluorescent atoms and molecules. With this work, it is now finally tractable to perform Fourier inversion directly and reconstruct images of arbitrary arrays of independent emitters via far-field intensity correlations alone.

翻訳日:2023-07-07 18:30:12 公開日:2023-07-06

# adaptive sparse vit: セルフアテンションをフル活用した学習可能な適応トークンプルーニング

Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention ( http://arxiv.org/abs/2209.13802v2 )

ライセンス: Link先を確認

Xiangcheng Liu, Tianyi Wu, Guodong Guo

(参考訳) ビジョントランスフォーマーはコンピュータビジョンの新しいパラダイムとして登場し、高価な計算コストを伴う優れた性能を示している。画像トークンのプルーニングは、トークン数に対して複雑さが二次的であること、背景領域のみを含む多くのトークンが最終的な予測に真に寄与しないという事実から、ViT圧縮の主要なアプローチの1つである。既存の作業は、個々のトークンの重要性を評価するために追加モジュールに依存するか、異なる入力インスタンスに対して固定比率プルーニング戦略を実装している。本研究では,最小限のコストで適応的なスパーストークンプルーニングフレームワークを提案する。具体的には,まず,安価な注意頭部重要度重み付けクラス注意得点機構を提案する。そして、学習可能なパラメータをしきい値として挿入して、重要でないトークンと情報を区別する。トークンアテンションスコアとしきい値を比較することで、不要なトークンを階層的に破棄し、推論を加速することができる。学習可能なしきい値は、精度と複雑さのバランスをとるために予算対応トレーニングに最適化され、異なる入力インスタンスに対して対応するプルーニング設定を実行する。大規模な実験は我々のアプローチの有効性を実証する。提案手法はdeit-sのスループットを50%向上させ,top-1精度が0.2%低下しただけで,従来の手法よりも精度とレイテンシのトレードオフが向上した。

Vision transformer has emerged as a new paradigm in computer vision, showing excellent performance while accompanied by expensive computational cost. Image token pruning is one of the main approaches for ViT compression, due to the facts that the complexity is quadratic with respect to the token number, and many tokens containing only background regions do not truly contribute to the final prediction. Existing works either rely on additional modules to score the importance of individual tokens, or implement a fixed ratio pruning strategy for different input instances. In this work, we propose an adaptive sparse token pruning framework with a minimal cost. Specifically, we firstly propose an inexpensive attention head importance weighted class attention scoring mechanism. Then, learnable parameters are inserted as thresholds to distinguish informative tokens from unimportant ones. By comparing token attention scores and thresholds, we can discard useless tokens hierarchically and thus accelerate inference. The learnable thresholds are optimized in budget-aware training to balance accuracy and complexity, performing the corresponding pruning configurations for different input instances. Extensive experiments demonstrate the effectiveness of our approach. Our method improves the throughput of DeiT-S by 50% and brings only 0.2% drop in top-1 accuracy, which achieves a better trade-off between accuracy and latency than the previous methods.

翻訳日:2023-07-07 18:29:36 公開日:2023-07-06
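
A minimal Python sketch of the threshold-based pruning step described in the Adaptive Sparse ViT abstract above: per-head class-attention scores are combined with head-importance weights, and tokens whose combined score falls below a threshold are discarded. The function name `prune_tokens`, the data layout, the weighted-average formula, and the constant threshold are all assumptions made for illustration; in the paper the thresholds are learnable parameters optimized in budget-aware training.

```python
def prune_tokens(cls_attention, head_weights, threshold):
    """Score tokens by head-weighted class attention and keep those above a threshold.

    cls_attention[h][t] is the attention the class token pays to token t in head h
    (hypothetical layout); head_weights[h] is an importance weight for head h.
    """
    total_weight = sum(head_weights)
    num_tokens = len(cls_attention[0])
    scores = [
        sum(w * head[t] for w, head in zip(head_weights, cls_attention)) / total_weight
        for t in range(num_tokens)
    ]
    # The paper learns this threshold; here it is a plain constant.
    kept = [t for t, score in enumerate(scores) if score >= threshold]
    return scores, kept


attention = [[0.5, 0.3, 0.2],   # head 0
             [0.1, 0.1, 0.8]]   # head 1
scores, kept = prune_tokens(attention, head_weights=[1.0, 3.0], threshold=0.18)
print(kept)
```

Because the number of surviving tokens depends on the scores of each input, the pruning ratio adapts per instance rather than being fixed in advance.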

# アンドロイドは電気羊を笑うのか? New Yorkerのキャプションコンテストにおけるユーモアの「理解」ベンチマーク

Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest ( http://arxiv.org/abs/2209.06293v2 )

ライセンス: Link先を確認

Jack Hessel and Ana Marasović and Jena D. Hwang and Lillian Lee and Jeff Da and Rowan Zellers and Robert Mankoff and Yejin Choi

(参考訳) 大規模なニューラルネットワークがジョークを生成できるようになったが、ユーモアを“理解”できるのだろうか? 我々は、New Yorker Cartoon Caption Contestから派生した3つのタスクでAIモデルに挑戦する: ジョークと漫画をマッチングし、勝利したキャプションを特定し、勝利したキャプションが面白い理由を説明する。重要な要素は、画像とキャプションの間の複雑な、しばしば驚くべき関係と、間接的で遊びに満ちた説明が人間の経験や文化に頻繁に含まれることである。我々は,マルチモーダルモデルと言語のみのモデルの両方について検討する。前者は漫画イメージに直接挑戦し,後者は人間レベルの視覚的理解をシミュレートするために視覚シーンの多面的記述を与える。どちらのモデルも3つのタスクすべてで苦労しています。例えば、当社のベストマルチモーダルモデルは、マッチングタスクにおいて人間のパフォーマンスよりも30ポイント遅れており、たとえ地上の視覚シーン記述子が提供されたとしても、人間による説明は、2/3以上のケースで最高の機械によって認可されたモデル(gpt-4)よりも優先されます。モデル、コード、リーダボード、コーパスをリリースし、画像の位置や関係、シーンで珍しいもの、ジョークの説明を新たに収集したアノテーションが含まれています。

Large neural networks can now generate jokes, but do they really "understand" humor? We challenge AI models with three tasks derived from the New Yorker Cartoon Caption Contest: matching a joke to a cartoon, identifying a winning caption, and explaining why a winning caption is funny. These tasks encapsulate progressively more sophisticated aspects of "understanding" a cartoon; key elements are the complex, often surprising relationships between images and captions and the frequent inclusion of indirect and playful allusions to human experience and culture. We investigate both multimodal and language-only models: the former are challenged with the cartoon images directly, while the latter are given multifaceted descriptions of the visual scene to simulate human-level visual understanding. We find that both types of models struggle at all three tasks. For example, our best multimodal models fall 30 accuracy points behind human performance on the matching task, and, even when provided ground-truth visual scene descriptors, human-authored explanations are preferred head-to-head over the best machine-authored ones (few-shot GPT-4) in more than 2/3 of cases. We release models, code, leaderboard, and corpus, which includes newly-gathered annotations describing the image's locations/entities, what's unusual in the scene, and an explanation of the joke.

翻訳日:2023-07-07 18:29:14 公開日:2023-07-06

# 衛星からのウイルス検出:グラフニューラルネットワークによる西ナイルウイルスの循環のモデル化

Spotting Virus from Satellites: Modeling the Circulation of West Nile Virus Through Graph Neural Networks ( http://arxiv.org/abs/2209.05251v2 )

ライセンス: Link先を確認

Lorenzo Bonicelli, Angelo Porrello, Stefano Vincenzi, Carla Ippoliti, Federica Iapaolo, Annamaria Conte, Simone Calderara

(参考訳) 西ナイルウイルス(英語: West Nile Virus、WNV)は、蚊が媒介する動物病ウイルスの1つである。その循環は通常、ベクター増殖とウイルスの複製に適した気候および環境条件と関連している。その上、wnv循環の形状と予測のためにいくつかの統計モデルが開発されており、特に最近の地球観測(eo)データの大量利用と人工知能の分野における継続的な進歩は、貴重な機会を提供している。本稿では,広範に環境・気候特性を有する衛星画像を用いた深部ニューラルネットワーク(DNN)によるWNV循環予測を提案する。特に,各地形を個別に解析する従来の手法では,近接する場所の特性を考慮した空間認識手法を提案する。具体的には,グラフニューラルネットワーク(gnn)を基盤として,隣接する場所から特徴を集約し,これらのモジュールをさらに拡張して,温度や土壌水分の差,地理的距離など,複数の関係を考察する。さらに,ウイルス拡散の季節性を考慮するため,時間関連情報をモデルに直接注入する。我々は、ランドサットとセンチネルのミッションの衛星画像と、イタリアにおけるWNV循環の地上観測を組み合わせた実験的な設定を設計する。提案するマルチアドバンシーグラフアテンションネットワーク (magat) は, 適切な事前学習段階と組み合わせると, 一貫して高い性能が得られることを示す。最後に,Ablation研究におけるMAGATの各成分の重要性について検討した。

The occurrence of West Nile Virus (WNV) represents one of the most common mosquito-borne zoonosis viral infections. Its circulation is usually associated with climatic and environmental conditions suitable for vector proliferation and virus replication. On top of that, several statistical models have been developed to shape and forecast WNV circulation: in particular, the recent massive availability of Earth Observation (EO) data, coupled with the continuous advances in the field of Artificial Intelligence, offer valuable opportunities. In this paper, we seek to predict WNV circulation by feeding Deep Neural Networks (DNNs) with satellite images, which have been extensively shown to hold environmental and climatic features. Notably, while previous approaches analyze each geographical site independently, we propose a spatial-aware approach that considers also the characteristics of close sites. Specifically, we build upon Graph Neural Networks (GNN) to aggregate features from neighbouring places, and further extend these modules to consider multiple relations, such as the difference in temperature and soil moisture between two sites, as well as the geographical distance. Moreover, we inject time-related information directly into the model to take into account the seasonality of virus spread. We design an experimental setting that combines satellite images - from Landsat and Sentinel missions - with ground truth observations of WNV circulation in Italy. We show that our proposed Multi-Adjacency Graph Attention Network (MAGAT) consistently leads to higher performance when paired with an appropriate pre-training stage. Finally, we assess the importance of each component of MAGAT in our ablation studies.

翻訳日:2023-07-07 18:28:50 公開日:2023-07-06

# バルク-バウンダリ対応によるトポロジカル量子系に対するストレンジ相関器

Strange correlators for topological quantum systems from bulk-boundary correspondence ( http://arxiv.org/abs/2209.04283v3 )

ライセンス: Link先を確認

Luca Lepori and Michele Burrello and Andrea Trombettoni and Simone Paganelli

(参考訳) ストレンジ」相関器は、調査中の状態と自明な参照状態の間の適切な2点相関の行列要素を計算することにより、多体モデルで生じる位相位相を検出するツールを提供する。その効果は、採用されているオペレータの選択に依存する。本稿では,この選択に対する体系的な手順を提案し,監視下のシステムのバルク・バウンダリ対応を用いた演算子選択の利点について論じる。スケーリング指数を用いて、奇妙な相関子の代数的減衰とギャップレスエッジモード作用素のスケーリング次元を直接関連付ける。対称性を保護した位相位相を包含する格子モデルを用いて解析を開始し、奇妙な相関子の和を解析し、それらのモジュラーを統合することでキャンセルや有限サイズ効果が大幅に減少することを示した。また,非自明なトポロジを持つ状態間の奇妙な相関関係だけでなく,内在的トポロジ秩序をホストするシステムも分析する。翻訳的および非翻訳的不変例,およびオンサイト障害や長距離結合の存在下では, トポロジカル位相の診断に奇妙な相関器を用いた手法の有効性を拡張し, 最適選択のための一般的な手順を示す。

"Strange" correlators provide a tool to detect topological phases arising in many-body models by computing the matrix elements of suitably defined two-point correlations between the states under investigation and trivial reference states. Their effectiveness depends on the choice of the adopted operators. In this paper we give a systematic procedure for this choice, discussing the advantages of choosing operators using the bulk-boundary correspondence of the systems under scrutiny. Via the scaling exponents, we directly relate the algebraic decay of the strange correlators with the scaling dimensions of gapless edge modes operators. We begin our analysis with lattice models hosting symmetry-protected topological phases and we analyze the sums of the strange correlators, pointing out that integrating their moduli substantially reduces cancellations and finite-size effects. We also analyze instances of systems hosting intrinsic topological order, as well as strange correlators between states with different nontrivial topologies. Our results for both translational and non-translational invariant cases, and in presence of on-site disorder and long-range couplings, extend the validity of the strange correlators approach for the diagnosis of topological phases of matter, and indicate a general procedure for their optimal choice.

翻訳日:2023-07-07 18:28:26 公開日:2023-07-06

# 複雑なネットワーク理論を用いた分散型エネルギー資源を用いた配電システムの計画と運用のレジリエンス評価

Evaluating the Planning and Operational Resilience of Electrical Distribution Systems with Distributed Energy Resources using Complex Network Theory ( http://arxiv.org/abs/2208.11543v4 )

ライセンス: Link先を確認

Divyanshi Dwivedi, Pradeep Kumar Yemula, Mayukha Pal

(参考訳) 電気系統は分散エネルギー資源(ders)によって広範囲に浸透し、エネルギー需要にシステムのレジリエンスを高めるという一般的な認識を満たしている。しかし、dersの統合はグリッド操作に悪影響を与え、その断続的な可用性、気象条件のダイナミクス、非線形性、複雑さ、悪意のある脅威の数、消費者の信頼性要求の改善といった様々な要因によってシステムのレジリエンスに影響を与える可能性がある。本稿では,極端事象下での配電系統の計画と運用のレジリエンスを評価する手法を提案し,電力系統の耐久能力について検討する。提案手法は複雑なネットワーク理論を効果的に活用して開発された。電力ネットワークのノードで監視されるアクティブ電力の時系列データから、望ましくない構成のための関連ネットワークを開発する。これらの相関ネットワークに対しては,クラスタリング係数,アソシエイト係数,平均度,電力法指数などのネットワークパラメータを計算し,極端な条件下でのネットワークの耐力判定のためのパーコレーション閾値を算出した。提案手法は, 異なる条件下でレジリエンスを維持しつつ, システム内のソーラーパネルのホスト容量を同定し, システムの非レジリエンス化に寄与する最重要ノードを特定するのにも適している。このフレームワークは、シミュレーションソフトウェアGridLAB-Dを用いて、様々な電気条件のアクティブ電力時系列データを生成することにより、IEEE 123ノードテストフィード上で実証される。パーコレーション閾値は配電システムの計画と運用のレジリエンスの決定に有効な指標となった。

Electrical Distribution Systems are extensively penetrated with Distributed Energy Resources (DERs) to cater to energy demands, with the general perception that this enhances the system's resilience. However, integration of DERs may adversely affect grid operation and system resilience due to various factors such as their intermittent availability, the dynamics of weather conditions, non-linearity, complexity, the number of malicious threats, and the improved reliability requirements of consumers. This paper proposes a methodology to evaluate the planning and operational resilience of power distribution systems under extreme events and to determine the withstand capability of the electrical network. The proposed framework is developed by employing complex network theory. Correlated networks for undesirable configurations are developed from the time series data of active power monitored at nodes of the electrical network. For these correlated networks, we compute network parameters such as the clustering coefficient, assortativity coefficient, average degree, and power-law exponent for anticipation, and the percolation threshold for determining the network's withstand capability under extreme conditions. The proposed methodology is also suitable for identifying the hosting capacity of solar panels in the system while maintaining resilience under different unfavourable conditions, and for identifying the most critical nodes of the system that could drive the system into non-resilience. This framework is demonstrated on the IEEE 123-node test feeder by generating active power time-series data for a variety of electrical conditions using the simulation software GridLAB-D. The percolation threshold proved to be an effective metric for determining the planning and operational resilience of the power distribution system.

翻訳日:2023-07-07 18:28:05 公開日:2023-07-06
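
The pipeline in the abstract above (active-power time series, then a correlation network, then network parameters and a percolation threshold) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the cutoff on the absolute Pearson correlation, the toy series, and all function names are hypothetical, and the percolation threshold uses the textbook Molloy-Reed estimate for random node removal in uncorrelated networks, which may differ from the computation used in the paper.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_network(series, cutoff):
    """Link two nodes when the |correlation| of their series reaches the cutoff."""
    n = len(series)
    adjacency = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(series[i], series[j])) >= cutoff:
                adjacency[i][j] = adjacency[j][i] = True
    return adjacency


def percolation_threshold(degree_list):
    """Molloy-Reed estimate f_c = 1 - 1 / (kappa - 1), with kappa = <k^2> / <k>."""
    mean_k = sum(degree_list) / len(degree_list)
    mean_k2 = sum(k * k for k in degree_list) / len(degree_list)
    if mean_k == 0:
        return 0.0
    kappa = mean_k2 / mean_k
    return max(0.0, 1.0 - 1.0 / (kappa - 1.0)) if kappa > 1 else 0.0


# Toy active-power series for four nodes (made-up numbers).
power = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4],
]
adjacency = correlation_network(power, cutoff=0.7)
node_degrees = [sum(row) for row in adjacency]
f_c = percolation_threshold(node_degrees)
print(node_degrees, f_c)
```

Under this estimate, a larger threshold means a larger fraction of nodes must fail before the network fragments, which is the sense in which it can serve as a resilience indicator.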

# リレーショナルメッセージパッシングニューラルネットワークを用いた不均一シーングラフ生成

Unbiased Heterogeneous Scene Graph Generation with Relation-aware Message Passing Neural Network ( http://arxiv.org/abs/2212.00443v4 )

ライセンス: Link先を確認

Kanghoon Yoon, Kibum Kim, Jinyoung Moon, Chanyoung Park

(参考訳) 最近のシーングラフ生成(SGG)フレームワークは、画像内の複数のオブジェクト間の複雑な関係を学習することに焦点を当てている。オブジェクトとその隣接するオブジェクト間の高次相互作用をモデル化するメッセージパッシングニューラルネットワーク(MPNN)の性質のおかげで、SGGの代表的な表現学習モジュールとなっている。しかし、既存のMPNNベースのフレームワークはシーングラフを均質なグラフとみなし、オブジェクト間の視覚的関係の文脈認識を制限する。つまり、関係が関連している対象に大きく依存する傾向があるという事実を、彼らは見落としている。本稿では,メッセージパッシングニューラルネットワークを用いて関係認識コンテキストをキャプチャする不偏不均一シーングラフ生成(hetsgg)フレームワークを提案する。本稿では,オブジェクト間の述語型を考慮した画像の文脈情報を集約する,rmp(relation-aware message passing neural network)と呼ばれる新しいメッセージパッシング層を考案する。以上の結果から,HetSGGは最先端の手法,特に尾部述語クラスでは性能に優れていた。

Recent scene graph generation (SGG) frameworks have focused on learning complex relationships among multiple objects in an image. Thanks to the nature of the message passing neural network (MPNN), which models high-order interactions between objects and their neighboring objects, MPNNs are the dominant representation learning modules for SGG. However, existing MPNN-based frameworks treat the scene graph as a homogeneous graph, which restricts the context-awareness of visual relations between objects. That is, they overlook the fact that the relations tend to be highly dependent on the objects with which the relations are associated. In this paper, we propose an unbiased heterogeneous scene graph generation (HetSGG) framework that captures relation-aware context using message passing neural networks. We devise a novel message passing layer, called relation-aware message passing neural network (RMP), that aggregates the contextual information of an image considering the predicate type between objects. Our extensive evaluations demonstrate that HetSGG outperforms state-of-the-art methods, especially on tail predicate classes.

翻訳日:2023-07-07 18:21:25 公開日:2023-07-06

# Fourier-Net:バンド制限変形による高速画像登録

Fourier-Net: Fast Image Registration with Band-limited Deformation ( http://arxiv.org/abs/2211.16342v2 )

ライセンス: Link先を確認

Xi Jia, Joseph Bartlett, Wei Chen, Siyang Song, Tianyang Zhang, Xinxing Cheng, Wenqi Lu, Zhaowen Qiu, Jinming Duan

(参考訳) 教師なし画像登録では、全解像度空間領域における密な変位場を予測するためにU-Netスタイルのネットワークが一般的に用いられる。しかし、高解像度のボリューム画像データの場合、このプロセスはリソース集約的で時間を要する。そこで本研究では、U-Netスタイルのネットワークにおける拡張パスをパラメータフリーのモデル駆動デコーダに置き換えたFourier-Netを提案する。具体的には、Fourier-Netは空間領域における全解像度の変位場を直接出力するのではなく、帯域制限されたフーリエ領域におけるその低次元表現を学習する。この表現は、我々が考案したモデル駆動デコーダ(ゼロパディング層と逆離散フーリエ変換層からなる)によって、空間領域における密な全解像度変位場にデコードされる。これらの変更により、教師なしのFourier-Netはパラメータ数と計算量が少なくなり、推論速度が向上する。Fourier-Netは、2つの公開3D脳データセット上で、さまざまな最先端のアプローチと比較評価される。例えば、TransMorphという最近のトランスフォーマーベースの手法と比較して、Fourier-Netはパラメータの2.2%と乗算加算演算の6.66%しか使用せずに、0.5%高いDiceスコアと11.48倍の推論速度を達成する。コードは https://github.com/xi-jia/Fourier-Net で入手できる。

Unsupervised image registration commonly adopts U-Net style networks to predict dense displacement fields in the full-resolution spatial domain. For high-resolution volumetric image data, this process is however resource-intensive and time-consuming. To tackle this problem, we propose the Fourier-Net, replacing the expansive path in a U-Net style network with a parameter-free model-driven decoder. Specifically, instead of our Fourier-Net learning to output a full-resolution displacement field in the spatial domain, we learn its low-dimensional representation in a band-limited Fourier domain. This representation is then decoded by our devised model-driven decoder (consisting of a zero padding layer and an inverse discrete Fourier transform layer) to the dense, full-resolution displacement field in the spatial domain. These changes allow our unsupervised Fourier-Net to contain fewer parameters and computational operations, resulting in faster inference speeds. Fourier-Net is then evaluated on two public 3D brain datasets against various state-of-the-art approaches. For example, when compared to a recent transformer-based method, named TransMorph, our Fourier-Net, which only uses 2.2% of its parameters and 6.66% of the multiply-add operations, achieves a 0.5% higher Dice score and an 11.48 times faster inference speed. Code is available at https://github.com/xi-jia/Fourier-Net.

翻訳日:2023-07-07 18:21:04 公開日:2023-07-06
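
The parameter-free decoder described in the Fourier-Net abstract above (a zero padding layer followed by an inverse discrete Fourier transform) can be illustrated in one dimension with the standard library alone. This is a hedged sketch, not the released code: the real model works on 3D displacement fields, and the helper names and the coefficient ordering (non-negative frequencies first, then negative ones) are assumptions chosen for this example.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def zero_pad_spectrum(low, full_len):
    """Embed 2b+1 low-frequency coefficients, ordered [0, 1..b, -b..-1], in zeros."""
    b = (len(low) - 1) // 2
    full = [0j] * full_len
    for k in range(b + 1):
        full[k] = low[k]                             # frequencies 0..b
    for k in range(1, b + 1):
        full[full_len - k] = low[len(low) - k]       # frequencies -1..-b
    return full


def decode(low, full_len):
    """Zero padding followed by an inverse DFT gives the full-resolution signal."""
    return [value.real for value in idft(zero_pad_spectrum(low, full_len))]


# A band-limited "displacement" signal of length 16 is fully described by 5 coefficients.
signal = [1.0 + math.cos(2 * math.pi * t / 16) for t in range(16)]
spectrum = dft(signal)
low_band = spectrum[:3] + spectrum[-2:]
reconstructed = decode(low_band, 16)
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
print(max_error)
```

The point of the design is visible even in this toy: a smooth field needs only a few low-frequency coefficients, so a network can predict the small band-limited representation and leave the upsampling to a fixed transform.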

# 反復アルゴリズム学習のための再帰的リカレントニューラルネットワーク(R2N2)アーキテクチャ

A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms ( http://arxiv.org/abs/2211.12386v2 )

ライセンス: Link先を確認

Danimir T. Doncevic, Alexander Mitsos, Yue Guo, Qianxiao Li, Felix Dietrich, Manuel Dahmen, Ioannis G. Kevrekidis

(参考訳) 与えられたタスクに対する数値アルゴリズムのメタラーニングは、アルゴリズム構造と関連するハイパーパラメータのデータ駆動識別と適応からなる。メタラーニング問題の複雑さを制限するために、有利なアルゴリズム構造に対するある種の帰納的バイアスを持つニューラルアーキテクチャを使用できる。我々は,前回導入したrunge-kuttaニューラルネットワークを再帰的再帰的ニューラルネットワーク(r2n2)スーパー構造に一般化した。既成のディープラーニングアプローチとは対照的に、情報生成のためのモジュールと、それに続くソリューションへの情報の組み立てのためのモジュールの分離が特徴である。サブスペースの形での局所情報は、現在の外部イテレートから始まる繰り返し関数評価の下位、内部、イテレーションによって生成される。次の外部イテレートへの更新は、これらの評価の線形結合として計算され、この空間の残余を低減し、ネットワークの出力を構成する。様々な計算問題クラスの入出力データに対して,提案構造内の重みパラメータを正規にトレーニングすることで,線形方程式系ではクリロフソルバ,非線形方程式系ではニュートン・クリロフソルバ,常微分方程式ではルンゲ・クッタ積分器のような反復が得られることを示す。モジュラリティのため、スーパー構造はテイラー級数展開に基づいて伝統的に反復アルゴリズムのより一般的なクラスを表現するのに必要な関数で容易に拡張できる。

Meta-learning of numerical algorithms for a given task consists of the data-driven identification and adaptation of an algorithmic structure and the associated hyperparameters. To limit the complexity of the meta-learning problem, neural architectures with a certain inductive bias towards favorable algorithmic structures can, and should, be used. We generalize our previously introduced Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms. In contrast to off-the-shelf deep learning approaches, it features a distinct division into modules for generation of information and for the subsequent assembly of this information towards a solution. Local information in the form of a subspace is generated by subordinate, inner, iterations of recurrent function evaluations starting at the current outer iterate. The update to the next outer iterate is computed as a linear combination of these evaluations, reducing the residual in this space, and constitutes the output of the network. We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta integrators for ordinary differential equations. Due to its modularity, the superstructure can be readily extended with functionalities needed to represent more general classes of iterative algorithms traditionally based on Taylor series expansions.

翻訳日:2023-07-07 18:20:40 公開日:2023-07-06
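
The two-module structure described in the R2N2 abstract above (inner recurrent function evaluations that generate information, then a linear combination that forms the next outer iterate) can be sketched for a scalar problem. In this sketch the weights are not trained: they are fixed by hand to Heun's second-order Runge-Kutta method so the result can be checked. The function name `r2n2_step` and the weight layout are assumptions for illustration only.

```python
def r2n2_step(f, x, h, inner_weights, outer_weights):
    """One outer iteration: recurrent inner evaluations, then a linear combination.

    inner_weights[i] mixes the earlier evaluations into the input of evaluation i;
    outer_weights mixes all evaluations into the update of the outer iterate.
    """
    evaluations = []
    for row in inner_weights:
        shifted = x + h * sum(a * k for a, k in zip(row, evaluations))
        evaluations.append(f(shifted))
    return x + h * sum(b * k for b, k in zip(outer_weights, evaluations))


# Weights fixed to Heun's method (an assumption for this demo, not trained values).
heun_inner = [[], [1.0]]
heun_outer = [0.5, 0.5]

# One step of size 0.1 for dx/dt = -x starting from x = 1.
x_next = r2n2_step(lambda x: -x, 1.0, 0.1, heun_inner, heun_outer)
print(x_next)
```

Treating `inner_weights` and `outer_weights` as trainable parameters is what would turn this fixed integrator into a learnable family of iterative schemes.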

# 正規変換器:視覚意味論によるLiDAR点からの表面形状の抽出

Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics ( http://arxiv.org/abs/2211.10580v2 )

ライセンス: Link先を確認

Ancheng Lin, Jun Li

(参考訳) 表面ノーマルの高品質な推定は、衝突回避や咬合推定のような多くの幾何学的理解問題において曖昧さを減らすのに役立つ。本稿では,3次元点雲と2次元カラー画像から正規分布を推定する手法を提案する。本研究では,視覚意味と3次元幾何学データのハイブリッド情報と効果的な学習戦略を活用すべく,トランスフォーマーニューラルネットワークを開発した。既存の手法と比較して,提案手法の情報融合はより効果的であり,実験によって支援されている。また、3次元レンダリングエンジンに屋外交通シーンのシミュレーション環境を構築し、通常の推定器を訓練するための注釈付きデータを得た。合成データに基づいてトレーニングされたモデルは、KITTIデータセットの実際のシーンでテストされる。 KITTIデータセットの通常の方向を推定したタスクは、提案した推定器が既存の手法よりも優れていることを示す。

High-quality estimation of surface normals can help reduce ambiguity in many geometry understanding problems, such as collision avoidance and occlusion inference. This paper presents a technique for estimating the normal from 3D point clouds and 2D colour images. We have developed a transformer neural network that learns to utilise the hybrid information of visual semantic and 3D geometric data, as well as effective learning strategies. Compared to existing methods, the information fusion of the proposed method is more effective, which is supported by experiments. We have also built a simulation environment of outdoor traffic scenes in a 3D rendering engine to obtain annotated data to train the normal estimator. The model trained on synthetic data is tested on the real scenes in the KITTI dataset. Subsequent tasks built upon the estimated normal directions in the KITTI dataset show that the proposed estimator has an advantage over existing methods.

翻訳日:2023-07-07 18:19:51 公開日:2023-07-06

# スタイン変分勾配降下のための有限粒子収束速度

A Finite-Particle Convergence Rate for Stein Variational Gradient Descent ( http://arxiv.org/abs/2211.09721v4 )

ライセンス: Link先を確認

Jiaxin Shi and Lester Mackey

(参考訳) 粒子の集合で確率分布を近似する一般的なアルゴリズムであるスタイン変分勾配降下(SVGD)に対する最初の有限粒子収束速度を提供する。具体的には、ターゲット分布がリプシッツなスコアを持つ劣ガウス分布である場合、n個の粒子と適切なステップサイズ列を用いたSVGDは、カーネルスタイン不一致(kernel Stein discrepancy)を 1/sqrt(log log n) のオーダーでゼロに収束させる。nへの依存性は改善できると考えており、我々の明示的で非漸近的な証明戦略が将来の改良のテンプレートになることを期待している。

We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with n particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order 1/sqrt(log log n) rate. We suspect that the dependence on n can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements.

翻訳日:2023-07-07 18:19:38 公開日:2023-07-06
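
For readers unfamiliar with the algorithm analyzed in the SVGD abstract above, here is a minimal one-dimensional sketch of the standard SVGD update with an RBF kernel. It illustrates the algorithm only, not the convergence proof. The bandwidth, step size, number of particles, and initial positions are arbitrary assumptions.

```python
import math


def svgd_step(particles, score, step_size, bandwidth=1.0):
    """One SVGD update for 1D particles with an RBF kernel."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            grad_k = -(xj - xi) / bandwidth ** 2 * k   # derivative of k(xj, xi) in xj
            phi += k * score(xj) + grad_k              # attraction plus repulsion
        updated.append(xi + step_size * phi / n)
    return updated


# Target: standard normal, whose score is d/dx log p(x) = -x.
particles = [2.0 + 0.2 * i for i in range(10)]
for _ in range(1000):
    particles = svgd_step(particles, lambda x: -x, step_size=0.1)
mean = sum(particles) / len(particles)
spread = max(particles) - min(particles)
print(mean, spread)
```

The kernel-weighted score term pulls particles toward high-density regions, while the kernel gradient term pushes them apart, so the particles should settle around zero without collapsing onto a single point.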

# DiffusionDB: テキストから画像生成モデルのための大規模プロンプトギャラリーデータセット

DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models ( http://arxiv.org/abs/2210.14896v4 )

ライセンス: Link先を確認

Zijie J. Wang, Evan Montoya, David Munechika, Haoyang Yang, Benjamin Hoover, Duen Horng Chau

(参考訳) 最近の拡散モデルの進歩により、ユーザーは自然言語でテキストプロンプトを書くことで高品質な画像を生成することができる。しかし、所望の詳細を持つ画像を生成するには適切なプロンプトが必要であり、モデルが異なるプロンプトにどのように反応するか、最良のプロンプトが何であるかはしばしば不明である。これらの重要な課題に研究者が取り組めるように、DiffusionDBを紹介する。DiffusionDBは、Stable Diffusionが生成した1400万枚の画像、180万件のユニークなプロンプト、および実際のユーザが指定したハイパーパラメータを含む、総容量6.5TBの最初の大規模なテキスト・画像プロンプトデータセットである。我々はプロンプトの構文的特徴と意味的特徴を分析する。モデルエラーにつながる可能性のある特定のハイパーパラメータ値とプロンプトスタイルを特定し、誤情報の生成のような潜在的に有害なモデル使用の証拠を示す。人間の操作に基づくこのデータセットの前例のない規模と多様性は、プロンプトと生成モデルの相互作用の理解、ディープフェイクの検出、ユーザがこれらのモデルをより簡単に使えるようにするヒューマン・AIインタラクションツールの設計において、エキサイティングな研究機会を提供する。DiffusionDBは https://poloclub.github.io/diffusiondb で公開されている。

With recent advancements in diffusion models, users can generate high-quality images by writing text prompts in natural language. However, generating images with desired details requires proper prompts, and it is often unclear how a model reacts to different prompts or what the best prompts are. To help researchers tackle these critical challenges, we introduce DiffusionDB, the first large-scale text-to-image prompt dataset totaling 6.5TB, containing 14 million images generated by Stable Diffusion, 1.8 million unique prompts, and hyperparameters specified by real users. We analyze the syntactic and semantic characteristics of prompts. We pinpoint specific hyperparameter values and prompt styles that can lead to model errors and present evidence of potentially harmful model usage, such as the generation of misinformation. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models. DiffusionDB is publicly available at: https://poloclub.github.io/diffusiondb.

翻訳日:2023-07-07 18:19:07 公開日:2023-07-06

# 逆コントラスト学習に基づく中国語スペルチェックフレームワーク

A Chinese Spelling Check Framework Based on Reverse Contrastive Learning ( http://arxiv.org/abs/2210.13823v2 )

ライセンス: Link先を確認

Nankai Lin, Hongyan Wu, Sihui Fu, Shengyi Jiang, Aimin Yang

(参考訳) 中国語のスペルチェックは、漢字のスペルミスを検出し、訂正するタスクである。既存の研究は、テキスト表現を強化し、マルチソース情報を用いてモデルの検出と修正能力を向上させることを目的としているが、不明瞭な単語を区別する能力にはあまり注意を払わない。類似したサンプルペア間の表現空間距離を最小化することを目的としたコントラスト学習は,近年,自然言語処理において主流となっている。コントラスト学習にインスパイアされた中国語のスペルチェックのための新しいフレームワークを提案し,言語表現,スペルチェック,逆コントラスト学習の3つのモジュールからなる。具体的には,類似した例,すなわち音韻的および視覚的に表現可能な文字間の一致を最小限に抑えるための逆コントラスト学習戦略を提案する。実験の結果,我々のフレームワークはモデルに依存しず,既存の中国語綴りチェックモデルと組み合わせて,最先端のパフォーマンスを実現することができた。

Chinese spelling check is a task to detect and correct spelling mistakes in Chinese text. Existing research aims to enhance the text representation and use multi-source information to improve the detection and correction capabilities of models, but does not pay too much attention to improving their ability to distinguish between confusable words. Contrastive learning, whose aim is to minimize the distance in representation space between similar sample pairs, has recently become a dominant technique in natural language processing. Inspired by contrastive learning, we present a novel framework for Chinese spelling checking, which consists of three modules: language representation, spelling check and reverse contrastive learning. Specifically, we propose a reverse contrastive learning strategy, which explicitly forces the model to minimize the agreement between the similar examples, namely, the phonetically and visually confusable characters. Experimental results show that our framework is model-agnostic and could be combined with existing Chinese spelling check models to yield state-of-the-art performance.

翻訳日:2023-07-07 18:18:43 公開日:2023-07-06

# 複雑な問合せ応答に対するニューラルリンク予測器の適応

Adapting Neural Link Predictors for Complex Query Answering ( http://arxiv.org/abs/2301.12313v2 )

ライセンス: Link先を確認

Erik Arakelyan, Pasquale Minervini, Isabelle Augenstein

(参考訳) 不完全な知識グラフ上の複雑なクエリに答えることは、欠落した知識が存在する状況でモデルが複雑な論理クエリに答える必要がある難しいタスクである。最近、Arakelyan et al. (2021) と Minervini et al. (2022) は、ニューラルリンク予測器が複雑なクエリへの応答にも使用できることを示した。彼らのContinuous Query Decomposition (CQD) 法は、複雑なクエリをアトミックなサブクエリに分解し、ニューラルリンク予測器でそれらに答え、t-ノルムを介してスコアを集約することで、各複雑クエリの回答をランク付けする。しかし、CQDは否定を扱えず、アトミックなトレーニングクエリからのトレーニング信号のみを使用する。すなわち、ニューラルリンク予測のスコアは、複雑なクエリ応答においてファジィ論理のt-ノルムを介して相互に作用するように較正されていない。本稿では、パラメータ効率のよいスコア適応モデルを訓練してニューラルリンク予測スコアを再較正することで、この問題に対処することを提案する。この新しいコンポーネントは、複雑なクエリ応答プロセスを通して誤差逆伝播を行うことで、複雑なクエリ上で訓練される。我々の手法であるCQD$^{A}$は、現在の最先端の手法よりもはるかに正確な結果を生成し、利用可能なトレーニングクエリタイプの$\leq 35\%$のみを使用しながら、すべてのデータセットとクエリタイプで平均した平均逆順位(Mean Reciprocal Rank)を$34.4$から$35.1$に改善した。さらに、CQD$^{A}$はデータ効率が高く、トレーニングデータのわずか$1\%$で競争力のある結果が得られ、ドメイン外の評価においても頑健であることを示す。

Answering complex queries on incomplete knowledge graphs is a challenging task where a model needs to answer complex logical queries in the presence of missing knowledge. Recently, Arakelyan et al. (2021); Minervini et al. (2022) showed that neural link predictors could also be used for answering complex queries: their Continuous Query Decomposition (CQD) method works by decomposing complex queries into atomic sub-queries, answers them using neural link predictors and aggregates their scores via t-norms for ranking the answers to each complex query. However, CQD does not handle negations and only uses the training signal from atomic training queries: neural link prediction scores are not calibrated to interact together via fuzzy logic t-norms during complex query answering. In this work, we propose to address this problem by training a parameter-efficient score adaptation model to re-calibrate neural link prediction scores: this new component is trained on complex queries by back-propagating through the complex query-answering process. Our method, CQD$^{A}$, produces significantly more accurate results than current state-of-the-art methods, improving from $34.4$ to $35.1$ Mean Reciprocal Rank values averaged across all datasets and query types while using $\leq 35\%$ of the available training query types. We further show that CQD$^{A}$ is data-efficient, achieving competitive results with only $1\%$ of the training data, and robust in out-of-domain evaluations.

翻訳日:2023-07-07 18:11:31 公開日:2023-07-06
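
The decomposition idea in the CQD abstract above can be sketched for a two-hop query: atomic link scores are combined with a t-norm, and the existential variable is handled by taking the best intermediate entity. The score tables below are made-up numbers standing in for a neural link predictor, and the affine `adapt` function is only a hypothetical stand-in for the score adaptation model; the paper's actual adaptation function and training procedure are not reproduced here.

```python
def product_t_norm(a, b):
    """Fuzzy conjunction of two scores in [0, 1]."""
    return a * b


def adapt(score, scale=1.0, shift=0.0):
    """Hypothetical affine re-calibration, clipped back to [0, 1]."""
    return min(1.0, max(0.0, scale * score + shift))


def answer_two_hop(anchor, first_hop, second_hop, entities, scale=1.0, shift=0.0):
    """Score answers A to the query: exists V such that r1(anchor, V) and r2(V, A)."""
    results = {}
    for answer in entities:
        results[answer] = max(
            product_t_norm(
                adapt(first_hop.get((anchor, v), 0.0), scale, shift),
                adapt(second_hop.get((v, answer), 0.0), scale, shift),
            )
            for v in entities
        )
    return results


entities = ["a", "b", "c"]
first_hop = {("a", "b"): 0.9, ("a", "c"): 0.4}                     # scores for r1
second_hop = {("b", "c"): 0.5, ("c", "c"): 0.8, ("b", "a"): 0.2}   # scores for r2
scores = answer_two_hop("a", first_hop, second_hop, entities)
best = max(scores, key=scores.get)
print(best, scores[best])
```

Because every step from atomic score to final ranking is differentiable almost everywhere, parameters such as `scale` and `shift` could in principle be fitted on complex queries, which is the kind of end-to-end calibration the abstract describes.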

# 散逸的自発波動関数崩壊に対する線形摩擦多体方程式について

On the linear friction many-body equation for dissipative spontaneous wavefunction collapse ( http://arxiv.org/abs/2301.07661v2 )

ライセンス: Link先を確認

Giovanni Di Bartolomeo and Matteo Carlesso and Kristian Piscicchia and Catalina Curceanu and Maaneli Derakhshani and Lajos Diósi

(参考訳) 我々は、自然界における基本的な自発的デコヒーレンスと自発的波動関数崩壊に関する既存の非相対論的理論の新たな散逸的拡張を目的として、多体系に対する最も単純な普遍的散逸リンドブラッドマスター方程式を構築し、研究する。この方程式は、第二量子化された質量密度 $\hat \rho$ と電流 $\hat J$ で書かれているため普遍的であり、物質構造とそのパラメータに依存しない。電流に線形な摩擦を仮定すると、散逸構造は厳密に制約されることがわかる。この散逸リンドブラッド方程式の一般構造に従って、最もよく知られた2つの自発的波動関数崩壊モデルであるDiósi-Penroseモデルと連続的自発局在化(continuous spontaneous localization)モデルの散逸的拡張を導出し、解析する。

We construct and study the simplest universal dissipative Lindblad master equation for many-body systems with the purpose of a new dissipative extension of existing nonrelativistic theories of fundamental spontaneous decoherence and spontaneous wave function collapse in nature. It is universal as it is written in terms of second-quantized mass density $\hat \rho$ and current $\hat J$, thus making it independent of the material structure and its parameters. Assuming linear friction in the current, we find that the dissipative structure is strictly constrained. Following the general structure of our dissipative Lindblad equation, we derive and analyze the dissipative extensions of the two most known spontaneous wave function collapse models, the Diósi-Penrose and the continuous spontaneous localization models.

翻訳日:2023-07-07 18:10:59 公開日:2023-07-06

# ドメインシフトとラベルノイズ下での病理像を用いた共通不確実性推定手法のベンチマーク

Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise ( http://arxiv.org/abs/2301.01054v2 )

ライセンス: Link先を確認

Hendrik A. Mehrtens, Alexander Kurz, Tabea-Clara Bucher, Titus J. Brinker

(参考訳) 近年, 病理組織学的応用分野における深層学習の利用が増加している。しかし、これらのアプローチは大きな可能性を示しているが、リスクの高い環境では、ディープラーニングモデルは不確実性を判断し、誤分類の可能性がかなり高い場合に入力を拒否できる必要がある。本研究は,スライド画像全体の分類において,最も一般的に用いられる不確かさとロバストネスの手法を厳密に評価し,不確かでない状況においてモデルが分類を拒絶すべき選択分類のタスクに着目した。我々は、ドメインシフトやラベルノイズの面からタイルレベルの実験を行い、スライドレベルの実験も行います。実験では,Deep Ensembles,Monte-Carlo Dropout,Stochastic Variational Inference,Test-Time Data Augmentation,および後者のアプローチのアンサンブルを比較した。従来のコンピュータビジョンベンチマークの結果とは対照的に,一般に手法のアンサンブルが不確実性評価を向上し,ドメインシフトやラベルノイズに対するロバスト性も向上するが,他の手法の体系的な利得は示さない。方法全体では、最も不確実なサンプルの拒絶は、分布内および分布外データの両方の分類精度を確実に向上させる。さらに,これらの手法をラベルノイズの異なる条件下で比較する実験を行った。最後に,病理組織学的データに対する不確実性推定のさらなる研究を促進するために,コードフレームワークを公開する。

In the past years, deep learning has seen an increase in usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole Slide Images, with a focus on the task of selective classification, where the model should reject the classification in situations in which it is uncertain. We conduct our experiments on tile-level under the aspects of domain shift and label noise, as well as on slide-level. In our experiments, we compare Deep Ensembles, Monte-Carlo Dropout, Stochastic Variational Inference, Test-Time Data Augmentation as well as ensembles of the latter approaches. We observe that ensembles of methods generally lead to better uncertainty estimates as well as an increased robustness towards domain shifts and label noise, while contrary to results from classical computer vision benchmarks no systematic gain of the other methods can be shown. Across methods, a rejection of the most uncertain samples reliably leads to a significant increase in classification accuracy on both in-distribution as well as out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.

翻訳日:2023-07-07 18:10:41 公開日:2023-07-06
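
The selective-classification protocol described in the benchmarking abstract above (reject the most uncertain samples, then measure accuracy on the rest) can be sketched with an averaged ensemble. The probabilities and labels below are made up, and using the maximum averaged class probability as the confidence score is one common choice assumed here; the paper compares several uncertainty estimators.

```python
def ensemble_mean(member_probs):
    """Average class probabilities over ensemble members for each sample.

    member_probs[m][i] is the probability vector of member m for sample i.
    """
    num_members = len(member_probs)
    return [
        [sum(p) / num_members for p in zip(*sample_members)]
        for sample_members in zip(*member_probs)
    ]


def selective_accuracy(probs, labels, coverage):
    """Accuracy on the most confident fraction `coverage` of the samples."""
    confidence = [max(p) for p in probs]
    order = sorted(range(len(probs)), key=lambda i: confidence[i], reverse=True)
    kept = order[:max(1, round(coverage * len(order)))]
    correct = sum(1 for i in kept if probs[i].index(max(probs[i])) == labels[i])
    return correct / len(kept)


member_a = [[0.9, 0.1], [0.7, 0.3], [0.3, 0.7], [0.6, 0.4]]
member_b = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.5, 0.5]]
labels = [0, 1, 1, 1]

probs = ensemble_mean([member_a, member_b])
full = selective_accuracy(probs, labels, coverage=1.0)
half = selective_accuracy(probs, labels, coverage=0.5)
print(full, half)
```

In this toy case the two misclassified samples are also the least confident ones, so rejecting half of the inputs raises accuracy, which mirrors the qualitative finding reported in the abstract.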

# 高次例外点の非線形摂動--スキン離散ブリーザーと階層的べき乗則スケーリング

Nonlinear perturbation of a high-order exceptional point: skin discrete breathers and the hierarchical power-law scaling ( http://arxiv.org/abs/2212.13765v2 )

ライセンス: Link先を確認

Hui Jiang, Enhong Cheng, Ziyu Zhou, and Li-Jun Lang

(参考訳) 本研究では、一方向ホッピングとKerr非線形性を持つHatano-Nelsonモデルにおいて、系のサイト数$L$に等しい次数を持つ高次例外点(EP)の非線形摂動を調べる。特に、一方の境界に集積する離散ブリーザーのクラスを見出し、これを「skin discrete breathers」(SDB)と名づける。これらのSDBの非線形スペクトルは、EPの近傍で階層的なべき乗則スケーリングを示す。具体的には、摂動に対する非線形エネルギーの応答は $E_m\propto \varGamma^{\alpha_{m}}$ で与えられ、ここで $\alpha_m=3^{m-1}$ はべき指数、$m=1,\cdots,L$ は非線形エネルギーバンドのラベルである。これは、一般に$L$乗根となる線形摂動の場合とは著しく対照的である。これらのSDBは、指数関数的に減衰する線形系のエッジ状態やスキンモードとは異なり、二重指数関数的に減衰する。さらに、これらのSDBは非線形性強度の全範囲にわたって存続でき、非線形性が大きい極限では自己トラップ状態に連続的に接続される。また、安定性解析に基づいて定義した断熱発展の非線形忠実度により、これらが安定であることも確認される。非相反な非線形モデルは、Kerr非線形性が自然に存在する光導波路の古典的プラットフォームや、ボース・アインシュタイン凝縮体を用いた光格子の量子プラットフォームなど、さまざまなプラットフォームで実験的に実現できる可能性があるため、我々の解析的結果は、非線形性と非エルミート性の相互作用、特に高次EPに関するさらなる探究を促し、関連するシミュレーションのベンチマークとなりうる。

We study the nonlinear perturbation of a high-order exceptional point (EP), whose order equals the number of system sites $L$, in a Hatano-Nelson model with unidirectional hopping and Kerr nonlinearity. Notably, we find a class of discrete breathers that aggregate at one boundary, named here skin discrete breathers (SDBs). The nonlinear spectrum of these SDBs shows a hierarchical power-law scaling near the EP. Specifically, the response of the nonlinear energy to the perturbation is given by $E_m\propto \varGamma^{\alpha_{m}}$, where $\alpha_m=3^{m-1}$ is the power and $m=1,\cdots,L$ labels the nonlinear energy bands. This is in sharp contrast to the $L$-th-root response to a generic linear perturbation. These SDBs decay in a double-exponential manner, unlike the edge states or skin modes in linear systems, which decay exponentially. Furthermore, these SDBs can survive over the full range of nonlinearity strength and are continuously connected to the self-trapped states in the limit of large nonlinearity. They are also stable, as confirmed in the stability analysis by a nonlinear fidelity defined for an adiabatic evolution. As nonreciprocal nonlinear models may be experimentally realized in various platforms, such as the classical platform of optical waveguides, where Kerr nonlinearity is naturally present, and the quantum platform of optical lattices with Bose-Einstein condensates, our analytical results may inspire further exploration of the interplay between nonlinearity and non-Hermiticity, particularly on high-order EPs, and benchmark the relevant simulations.
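
To make the stated scaling concrete, the following sketch tabulates the exponents $\alpha_m = 3^{m-1}$ from the abstract next to the $1/L$ exponent of a generic linear perturbation of an order-$L$ EP. The function names are hypothetical, and the proportionality constants (which the abstract does not give) are simply set to 1.

```python
def sdb_exponents(L):
    """Exponents alpha_m = 3**(m-1) for bands m = 1..L, as stated in the abstract."""
    return [3 ** (m - 1) for m in range(1, L + 1)]

def linear_ep_exponent(L):
    """Generic linear perturbation of an order-L EP: response ~ Gamma**(1/L)."""
    return 1.0 / L

def scaled_response(gamma, alpha):
    """Power-law response gamma**alpha with the unknown prefactor set to 1."""
    return gamma ** alpha

L = 4
gamma = 0.1  # illustrative small perturbation strength
for m, alpha in enumerate(sdb_exponents(L), start=1):
    print(m, alpha, scaled_response(gamma, alpha))
print("linear:", scaled_response(gamma, linear_ep_exponent(L)))
```

For a small perturbation the nonlinear bands respond with integer powers that grow quickly with $m$, whereas the linear $L$-th-root response is much larger than the perturbation itself.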

翻訳日:2023-07-07 18:10:12 公開日:2023-07-06

# コンピュータは「ノー」と言う:共感的会話型AIへの反論

Computer says "No": The Case Against Empathetic Conversational AI ( http://arxiv.org/abs/2212.10983v2 )

ライセンス: Link先を確認

Alba Curry, Amanda Cercas Curry

(参考訳) 感情は人間の認知に不可欠な部分であり、世界に対する私たちの理解だけでなく、その中での行動も導く。したがって、感情をなだめるか煽るかは些細な問題ではない。会話型AIにおける最近の研究は、ユーザーに共感的に応答し、実際の根拠なしに感情を肯定し、なだめることに重点を置いてきた。このAI支援による感情調整は、ユーザーや社会に負の結果をもたらしうるものであり、「ネガティブ」な感情がないことだけで定義される単調な幸福へと向かいがちである。我々は、ユーザーの感情に応答すべきかどうか、またどのように応答すべきかを慎重に検討しなければならないと主張する。

Emotions are an integral part of human cognition, and they guide not only our understanding of the world but also our actions within it. As such, whether we soothe or inflame an emotion is not inconsequential. Recent work in conversational AI has focused on responding empathetically to users, validating and soothing their emotions without a real basis. This AI-aided emotional regulation can have negative consequences for users and society, tending towards a one-note happiness defined as only the absence of "negative" emotions. We argue that we must carefully consider whether and how to respond to users' emotions.

翻訳日:2023-07-07 18:09:38 公開日:2023-07-06

# 2-2散乱におけるベル不等式

Bell inequalities in 2-2 scattering ( http://arxiv.org/abs/2212.10213v3 )

ライセンス: Link先を確認

Aninda Sinha, Ahmadullah Zahed

(参考訳) 我々は,光子,重力子,フェルミオン,パイオンの2-2散乱におけるベルの不等式を考える。最大エンタングル状態に対してベル不等式の破れが最大となる測定設定を選び,これらの過程に対応するベル不等式を計算する。低エネルギーでの光子散乱では、QEDは小さな横方向の領域を除くすべての散乱角でベル不等式の破れを示す。これは微調整の問題を引き起こす。軽いアクシオン/アクシオン様粒子(ALP)を組み込むことで、微調整問題が解消され、アクシオン結合とアクシオン質量のパラメータが制約される。重力子交換を許し、光子散乱におけるベル不等式の破れを要求すると、弱い重力予想(Weak Gravity Conjecture)が満たされることが分かる。アクシオン結合に対する量子重力効果についても論じる。2-2重力子散乱の場合、CEMZ境界が許すベル不等式の破れは高々小さいことが分かる。ワインバーグ角に対する制限は、Bhabha散乱におけるベル不等式の破れを要求することで得られる。パイオンと光子に対する近年のS行列ブートストラップのデータを用いて,許容されるS行列の空間におけるベルパラメータを調べる。光子の場合、ベルパラメータをエネルギーの関数として調べ、EFTでの観察を支持する結果を得る。キュートリットであるパイオンのS行列に対するベルパラメータについて論じる。パイオンについては、Regge的な振る舞いを示すS行列に対して、適切なベルパラメータが最小化されることが分かる。

We consider Bell inequalities in 2-2 scattering of photons, gravitons, fermions and pions. We choose measurement settings that give maximum Bell violation for maximally entangled states and calculate the relevant Bell inequalities for these processes. For photon scattering at low energies, QED exhibits Bell violation for all scattering angles except for a small transverse region. This leads to a fine-tuning problem. Incorporating a light axion/axion-like particle (ALP) removes the fine-tuning problem and constrains the axion-coupling and axion-mass parameters. Allowing for graviton exchange and demanding Bell violation in photon scattering, we find that the Weak Gravity Conjecture is satisfied. We discuss the effect of quantum gravity on the axion coupling. For 2-2 graviton scattering, we find that the CEMZ bounds allow for at most small Bell violations. A restriction on the Weinberg angle is found by demanding Bell violation in Bhabha scattering. We use recent S-matrix bootstrap data for pions and photons to study the Bell parameter in the space of allowed S-matrices. In the photon case, we study the Bell parameters as a function of energy and find support for the EFT observations. We discuss the Bell parameter for pion S-matrices, which are qutrits. For pions, we find that a suitable Bell parameter is minimized for S-matrices that exhibit Regge behaviour.
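
As background for "measurement settings that give maximum Bell violation for maximally entangled states", the sketch below evaluates the textbook CHSH combination for a two-qubit singlet state, whose correlation function is $E(a,b) = -\cos(a-b)$. This is the standard quantum-mechanics example, assumed here purely for illustration; it is not the scattering-amplitude computation performed in the paper, and the function names are hypothetical.

```python
import math

def singlet_correlation(a, b):
    """Spin correlation E(a, b) = -cos(a - b) for the two-qubit singlet state."""
    return -math.cos(a - b)

def chsh(a, a_prime, b, b_prime, corr=singlet_correlation):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (corr(a, b) - corr(a, b_prime)
            + corr(a_prime, b) + corr(a_prime, b_prime))

# Standard optimal analyser angles for the singlet state.
S = chsh(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
print(abs(S), 2 * math.sqrt(2))
```

Any local hidden-variable model obeys $|S| \le 2$, while these settings reach the Tsirelson value $2\sqrt{2}$.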

翻訳日:2023-07-07 18:09:27 公開日:2023-07-06

# メモリ効率の高いNLLB-200:多言語機械翻訳モデルの言語特化

Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model ( http://arxiv.org/abs/2212.09811v2 )

ライセンス: Link先を確認

Yeskendir Koishekenov, Vassilina Nikoulina, Alexandre Berard

(参考訳) 従来のバイリンガル翻訳システムと比較して、単一のモデルが複数の言語に翻訳でき、低リソース言語に対する知識伝達の恩恵を受けるため、多言語機械翻訳は魅力的である。一方、多言語モデルは、そのサイズを大規模にスケーリングし、トレーニングと推論コストを増大させない限り、多言語性の呪いに悩まされる。 Sparse Mixture-of-Expertsモデルは、比例計算を必要とせずに、モデル容量を大幅に増やす方法である。最近リリースされたnllb-200は、そのようなモデルの例である。 202言語をカバーするが、推論には少なくとも4つの32GB GPUが必要である。そこで本研究では, 翻訳品質を損なうことなく, 最大80\%のエキスパートを除去し, 単一の32gb gpu上でモデルを実行することが可能なプルーニング手法を提案する。さらに分析した結果,言語固有の専門家を識別し,特定の言語ペアに関連のない専門家を特定できることが示唆された。

Compared to conventional bilingual translation systems, massively multilingual machine translation is appealing because a single model can translate into multiple languages and benefit from knowledge transfer for low-resource languages. On the other hand, massively multilingual models suffer from the curse of multilinguality unless their size is scaled up massively, which increases their training and inference costs. Sparse Mixture-of-Experts models are a way to drastically increase model capacity without the need for a proportional amount of computing. The recently released NLLB-200 is an example of such a model. It covers 202 languages but requires at least four 32GB GPUs just for inference. In this work, we propose a pruning method that allows the removal of up to 80% of experts with a negligible loss in translation quality, which makes it feasible to run the model on a single 32GB GPU. Further analysis suggests that our pruning metrics make it possible to identify language-specific experts and prune non-relevant experts for a given language pair.
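
The general shape of metric-based expert pruning can be sketched as follows. The importance scores, the `select_experts` helper, and the choice of "fraction of tokens routed to each expert" as the metric are all assumptions made for illustration; the abstract does not specify the authors' pruning metrics.

```python
def select_experts(importance, keep_ratio):
    """Return the sorted indices of the experts to keep (highest scores win)."""
    n_keep = max(1, round(len(importance) * keep_ratio))
    ranked = sorted(range(len(importance)),
                    key=lambda e: importance[e], reverse=True)
    return sorted(ranked[:n_keep])

# Hypothetical per-expert scores for one language pair, e.g. the fraction
# of tokens routed to each expert on a small validation set.
toy_importance = [0.02, 0.40, 0.01, 0.30, 0.05, 0.02, 0.15, 0.03, 0.01, 0.01]

# Removing 80% of the experts corresponds to keep_ratio = 0.2.
kept = select_experts(toy_importance, keep_ratio=0.2)
print(kept)
```

Computing the scores separately per language pair is what would let such a scheme keep different, language-specific experts for different translation directions.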

翻訳日:2023-07-07 18:09:07 公開日:2023-07-06

# LOANet:UAV空中リモートセンシング画像から建物や道路を抽出するオブジェクト注意を用いた軽量ネットワーク

LOANet: A Lightweight Network Using Object Attention for Extracting Buildings and Roads from UAV Aerial Remote Sensing Images ( http://arxiv.org/abs/2212.08490v6 )

ライセンス: Link先を確認

Xiaoxiang Han, Yiman Liu, Gang Liu, Yuanjie Lin, Qiaohong Liu

(参考訳) 深層学習による無人航空機(uav)リモートセンシング画像から建物や道路を抽出するセマンティックセグメンテーションは、測量やマッピングの分野で従来の手動セグメンテーションよりも効率的で便利である。モデルを軽量化し,モデルの精度を向上させるために,uav空中リモートセンシング画像から建物や道路にオブジェクト・アテンション(loanet)を用いた軽量ネットワークを提案する。提案するネットワークは,軽量Densely Connected Network (LDCNet) をエンコーダとして開発したエンコーダデコーダアーキテクチャを採用している。復号器部では、Atrous Space Pyramid Pooling Module (ASPP) と Object Attention Module (OAM) から構成される2つのマルチスケールコンテキストモジュールが、UAVリモートセンシング画像の特徴マップからより多くのコンテキスト情報を取得するように設計されている。 ASPPとOAMの間には、ASPPから抽出したマルチスケール機能を融合するために、FPN(Feature Pyramid Network)モジュールが使用されている。 2431のトレーニングセット、945の検証セット、および475のテストセットを含むUAVが撮影するリモートセンシング画像のプライベートデータセットを構築する。提案する基本モデルは1.4mパラメータと5.48g浮動小数点演算(flops)しか持たず、優れた平均交叉結合(miou)を達成している。 LoveDAとCITY-OSMデータセットのさらなる実験を行い、提案した基本モデルと大規模モデルの有効性をさらに検証し、優れたmIoU結果を得た。すべてのコードはhttps://github.com/GtLinyer/LOANetで入手できる。

Semantic segmentation by deep learning for extracting buildings and roads from uncrewed aerial vehicle (UAV) remote sensing images is more efficient and convenient than traditional manual segmentation in the surveying and mapping fields. To make the model lightweight while improving its accuracy, we propose a Lightweight Network Using Object Attention (LOANet) for extracting buildings and roads from UAV aerial remote sensing images. The proposed network adopts an encoder-decoder architecture in which a Lightweight Densely Connected Network (LDCNet) is developed as the encoder. In the decoder, dual multi-scale context modules, consisting of the Atrous Spatial Pyramid Pooling module (ASPP) and the Object Attention Module (OAM), are designed to capture more context information from the feature maps of UAV remote sensing images. Between ASPP and OAM, a Feature Pyramid Network (FPN) module is used to fuse the multi-scale features extracted from ASPP. A private dataset of remote sensing images taken by UAV, which contains 2431 training sets, 945 validation sets, and 475 test sets, is constructed. The proposed basic model performs well on this dataset, with only 1.4M parameters and 5.48G floating point operations (FLOPs), achieving excellent mean Intersection-over-Union (mIoU). Further experiments on the publicly available LoveDA and CITY-OSM datasets validate the effectiveness of the proposed basic and large models, and outstanding mIoU results are achieved. All code is available at https://github.com/GtLinyer/LOANet.
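
Since the results above are reported as mean Intersection-over-Union, a minimal reference computation of mIoU on flattened label maps may help. This is the common definition (classes absent from both prediction and ground truth are skipped); whether the authors' evaluation script handles absent classes the same way is an assumption.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy flattened label maps: 0 = background, 1 = building, 2 = road.
toy_pred = [0, 0, 1, 1, 2, 2]
toy_target = [0, 1, 1, 1, 2, 0]
print(mean_iou(toy_pred, toy_target, num_classes=3))
```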

翻訳日:2023-07-07 18:08:47 公開日:2023-07-06

# 量子隠れマルコフモデルの実装と学習

Implementation and Learning of Quantum Hidden Markov Models ( http://arxiv.org/abs/2212.03796v2 )

ライセンス: Link先を確認

Vanio Markov, Vladimir Rastunkov, Amol Deshmukh, Daniel Fry, Charlee Stefanski

(参考訳) 本稿では、量子チャネルとオープン量子システムの理論を用いて、量子隠れマルコフモデル(QHMM)として知られる確率的生成系の効率的なユニタリ特性を提供する。ユニタリ・キャラクタリゼーションを利用して、任意のQHMMを中間回路計測による量子回路として実装できることを実証する。従来の隠れマルコフモデル (HMM) と比較して, QHMM は確率過程言語のより効率的な定義であることを示す。 QHMMを量子チャネルとして定式化することから始め、Stinespring の構成を用いて、これらのモデルを中間回路測定によるユニタリ量子回路として表現する。 QHMMの単位パラメータ化を利用して,形式的QHMM学習モデルを定義する。このモデルは、対象の確率過程言語の経験的分布を定式化し、量子回路の仮説空間を定義し、学習の成功基準として経験的確率的発散測度ハイポテーゼ適合性を導入する。学習モデルは,Stinespringの拡張の連続性により,スムーズな探索環境を有することを示す。仮説と適合空間の滑らかなマッピングは、効率的なヒューリスティックおよび勾配降下学習アルゴリズムの開発を可能にする。本稿では,QHMMのための2つの実践的学習アルゴリズムを提案する。最初のアルゴリズムはハイパーパラメータ適応進化探索である。第2のアルゴリズムは、マルチパラメータ非線形最適化手法を用いてqhmmを量子アンサッツ回路として学習する。

In this article we use the theory of quantum channels and open quantum systems to provide an efficient unitary characterization of a class of stochastic generators known as quantum hidden Markov models (QHMMs). By utilizing the unitary characterization, we demonstrate that any QHMM can be implemented as a quantum circuit with mid-circuit measurement. We prove that QHMMs are more efficient definitions of stochastic process languages than the equivalent classical hidden Markov models (HMMs). Starting with the formulation of QHMMs as quantum channels, we employ Stinespring's construction to represent these models as unitary quantum circuits with mid-circuit measurement. By utilizing the unitary parameterization of QHMMs, we define a formal QHMM learning model. The model formalizes the empirical distributions of target stochastic process languages, defines a hypothesis space of quantum circuits, and introduces an empirical stochastic divergence measure (hypothesis fitness) as a success criterion for learning. We demonstrate that the learning model has a smooth search landscape due to the continuity of Stinespring's dilation. The smooth mapping between the hypothesis and fitness spaces enables the development of efficient heuristic and gradient descent learning algorithms. We propose two practical learning algorithms for QHMMs. The first algorithm is a hyperparameter-adaptive evolutionary search. The second algorithm learns the QHMM as a quantum ansatz circuit using a multi-parameter non-linear optimization technique.

翻訳日:2023-07-07 18:08:16 公開日:2023-07-06

# マルチラベルテキスト分類におけるコントラスト学習の有効活用

An Effective Employment of Contrastive Learning in Multi-label Text Classification ( http://arxiv.org/abs/2212.00552v2 )

ライセンス: Link先を確認

Nankai Lin, Guanqiu Qin, Jigang Wang, Aimin Yang, Dong Zhou

(参考訳) 自然言語処理タスクにおけるコントラスト学習技術の有効性はまだ探究・分析されていない。正と負のサンプルを正しくかつ合理的に構築する方法は、コントラスト学習の核となる課題である。複数ラベルのテキスト分類タスクで対照的なオブジェクトを見つけるのはさらに難しい。以前提案された対照的な損失はほとんどない。本稿では,複数ラベルのテキスト分類タスクに対して,新しいコントラスト損失を5つ提案することにより,問題を異なる角度から検討する。これらは、SCL(Strict Contrastive Loss)、ICL(Intra-label Contrastive Loss)、JSCL(Jaccard similarity Contrastive Loss)、JSPCL(Jaccard similarity Probability Contrastive Loss)、SLCL(Stepwise Label Contrastive Loss)である。本稿では,これら新たな損失の雇用によるマルチラベルテキスト分類タスクにおけるコントラスト学習の有効性について検討し,コントラスト学習手法を特定のタスクに展開するためのベースラインモデルを提案する。さらに,このアプローチの解釈可能な分析を行い,コントラスト学習損失の異なる要素がどのように役割を担っているかを示す。実験結果から,提案したコントラスト損失は,複数ラベルテキスト分類タスクの改善につながることが示された。また,マルチラベルテキスト分類タスクにコントラスト学習をどのように適用すべきかについても検討した。

The effectiveness of contrastive learning techniques in natural language processing tasks is yet to be explored and analyzed. How to construct positive and negative samples correctly and reasonably is the core challenge of contrastive learning. It is even harder to discover contrastive objects in multi-label text classification tasks. Very few contrastive losses have been proposed previously. In this paper, we investigate the problem from a different angle by proposing five novel contrastive losses for multi-label text classification tasks. These are Strict Contrastive Loss (SCL), Intra-label Contrastive Loss (ICL), Jaccard Similarity Contrastive Loss (JSCL), Jaccard Similarity Probability Contrastive Loss (JSPCL), and Stepwise Label Contrastive Loss (SLCL). We explore the effectiveness of contrastive learning for multi-label text classification tasks by employing these novel losses and provide a set of baseline models for deploying contrastive learning techniques on specific tasks. We further perform an interpretable analysis of our approach to show how different components of contrastive learning losses play their roles. The experimental results show that our proposed contrastive losses can bring improvement to multi-label text classification tasks. Our work also explores how contrastive learning should be adapted for multi-label text classification tasks.
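
Two of the losses named above are built on Jaccard similarity between label sets. The sketch below shows only that building block: the Jaccard similarity of two samples' label sets and the resulting pairwise matrix for a batch. How JSCL and JSPCL actually turn these similarities into a loss is not stated in the abstract, so no loss is implemented here; the helper names and toy labels are hypothetical.

```python
def jaccard(labels_a, labels_b):
    """Jaccard similarity |A & B| / |A | B| between two label sets."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 1.0  # convention: two empty label sets count as identical
    return len(a & b) / len(a | b)

def pairwise_jaccard(batch_labels):
    """Matrix of label-set similarities for every pair of samples in a batch."""
    return [[jaccard(x, y) for y in batch_labels] for x in batch_labels]

# Toy multi-label batch (hypothetical labels).
toy_batch = [{"sports", "politics"}, {"sports"}, {"science"}]
for row in pairwise_jaccard(toy_batch):
    print(row)
```

Unlike the single-label case, where two samples are either a positive or a negative pair, these graded similarities allow partially overlapping samples to count as partial positives.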

翻訳日:2023-07-07 18:07:57 公開日:2023-07-06

# FederatedTrust: 信頼できるフェデレーション学習のためのソリューション

FederatedTrust: A Solution for Trustworthy Federated Learning ( http://arxiv.org/abs/2302.09844v2 )

ライセンス: Link先を確認

Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Ning Xie, Gérôme Bovet, Gregorio Martínez Pérez, Burkhard Stiller

(参考訳) IoT(Internet of Things)とエッジコンピューティングの急速な拡張により、センシティブな情報を保持する分散データサイロの存在により、集中型機械学習(ML/DL)手法の課題が提示された。データプライバシに関する懸念に対処するため、フェデレートラーニング(FL)のようなML/DL技術が登場している。しかし、モデル予測に対する信頼を確立する必要性が高まっているため、データのプライバシとパフォーマンスの確保だけでは不十分である。既存の文献では、信頼できるML/DL(データプライバシを除く)に対して、堅牢性、公正性、説明可能性、説明責任を重要な柱として特定する様々なアプローチが提案されている。それでも、FLモデルに関連する信頼性柱と評価指標を同定し、FLモデルの信頼性レベルを計算できるソリューションを開発するためには、さらなる研究が必要である。本研究は、flの信頼性を評価するための既存の要件を検証し、6つの柱(プライバシー、ロバスト性、公平性、説明可能性、説明責任、連邦)と、flモデルの信頼性を計算するための30以上の指標からなる包括的分類法を導入する。その後、FederatedTrustというアルゴリズムが分類学で同定された柱とメトリクスに基づいて設計され、FLモデルの信頼性スコアが計算される。 FederatedTrustのプロトタイプが実装され、十分に確立されたFLフレームワークであるFederatedScopeの学習プロセスに統合される。最後に,federatedscopeの異なる構成を用いて5つの実験を行い,flモデルの信頼性計算におけるfederatedtrustの有用性を実証した。 3つの実験では、FEMNISTデータセットを使用し、2つは、実際のIoTセキュリティユースケースを考慮したN-Ba IoTデータセットを使用する。

The rapid expansion of the Internet of Things (IoT) and Edge Computing has presented challenges for centralized Machine and Deep Learning (ML/DL) methods due to the presence of distributed data silos that hold sensitive information. To address concerns regarding data privacy, collaborative and privacy-preserving ML/DL techniques like Federated Learning (FL) have emerged. However, ensuring data privacy and performance alone is insufficient since there is a growing need to establish trust in model predictions. Existing literature has proposed various approaches on trustworthy ML/DL (excluding data privacy), identifying robustness, fairness, explainability, and accountability as important pillars. Nevertheless, further research is required to identify trustworthiness pillars and evaluation metrics specifically relevant to FL models, as well as to develop solutions that can compute the trustworthiness level of FL models. This work examines the existing requirements for evaluating trustworthiness in FL and introduces a comprehensive taxonomy consisting of six pillars (privacy, robustness, fairness, explainability, accountability, and federation), along with over 30 metrics for computing the trustworthiness of FL models. Subsequently, an algorithm named FederatedTrust is designed based on the pillars and metrics identified in the taxonomy to compute the trustworthiness score of FL models. A prototype of FederatedTrust is implemented and integrated into the learning process of FederatedScope, a well-established FL framework. Finally, five experiments are conducted using different configurations of FederatedScope to demonstrate the utility of FederatedTrust in computing the trustworthiness of FL models. Three experiments employ the FEMNIST dataset, and two utilize the N-BaIoT dataset considering a real-world IoT security use case.
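
One simple way to picture a trustworthiness score built from the six pillars listed above is a weighted average of per-pillar scores. The aggregation rule, the `[0, 1]` score range, and the example values below are assumptions for illustration only; the abstract does not describe how FederatedTrust combines its metrics.

```python
PILLARS = ("privacy", "robustness", "fairness",
           "explainability", "accountability", "federation")

def trust_score(pillar_scores, weights=None):
    """Weighted average of pillar scores, each assumed to lie in [0, 1]."""
    if weights is None:
        weights = {p: 1.0 for p in PILLARS}  # equal weighting by default
    total_weight = sum(weights[p] for p in PILLARS)
    return sum(weights[p] * pillar_scores[p] for p in PILLARS) / total_weight

# Hypothetical pillar scores for one federated-learning run.
example_scores = {"privacy": 0.8, "robustness": 0.6, "fairness": 0.7,
                  "explainability": 0.5, "accountability": 0.9,
                  "federation": 0.7}
privacy_heavy = {p: (2.0 if p == "privacy" else 1.0) for p in PILLARS}

print(trust_score(example_scores))
print(trust_score(example_scores, privacy_heavy))
```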

翻訳日:2023-07-07 18:02:40 公開日:2023-07-06

# ポインタジェネレータネットワークとSciBERT埋め込みを用いた研究論文からのハイライト生成

Generation of Highlights from Research Papers Using Pointer-Generator Networks and SciBERT Embeddings ( http://arxiv.org/abs/2302.07729v2 )

ライセンス: Link先を確認

Tohida Rehman, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban Kumar Bhowmick, Partha Pratim Das

(参考訳) 近年,本論文の主な知見を要約する研究論文が多数発表されている。ハイライトは、研究者が論文のコントリビューションを正確かつ迅速に特定するのに役立つだけでなく、検索エンジンによる発見可能性を高める。研究論文の特定の部分について,研究ハイライトを自動的に作成することを目的としている。我々は,入力トークンをSciBERT埋め込みにエンコードする入力に,カバレッジ機構を備えたポインタジェネレータネットワークとコンテキスト埋め込み層を使用する。我々は、ベンチマークデータセットCSPubSumでモデルをテストし、また、自動研究ハイライト生成のための新しい論文の多分野コーパスであるMixSubを提示する。 CSPubSum と MixSub の両モデルにおいて,提案モデルが関連する変種や文献で提案する他のモデルと比較して,最高の性能を達成できることを示した。 CSPubSumデータセットでは,入力が紙の抽象的な部分のみである場合に,他の部分に対して最高の性能が得られる。 ROUGE-1、ROUGE-2、ROUGE-L F1スコアは38.26、14.26、35.51、METEORスコアは32.62、BERTScore F1は86.65で、全てのベースラインを上回っている。新しいMixSubデータセットにおいて,提案したモデル(対象カテゴリを区別せずにトレーニングコーパス全体をトレーニングした場合)は,それぞれ31.78,9.76,29.3のROUGE-1,ROUGE-2,ROUGE-L F1スコア,METEORスコア24.00,BERTScore F1,85.25のそれぞれを達成する。

Nowadays many research articles are prefaced with research highlights to summarize the main findings of the paper. Highlights not only help researchers precisely and quickly identify the contributions of a paper, they also enhance the discoverability of the article via search engines. We aim to automatically construct research highlights given certain segments of a research paper. We use a pointer-generator network with coverage mechanism and a contextual embedding layer at the input that encodes the input tokens into SciBERT embeddings. We test our model on a benchmark dataset, CSPubSum, and also present MixSub, a new multi-disciplinary corpus of papers for automatic research highlight generation. For both CSPubSum and MixSub, we have observed that the proposed model achieves the best performance compared to related variants and other models proposed in the literature. On the CSPubSum dataset, our model achieves the best performance when the input is only the abstract of a paper as opposed to other segments of the paper. It produces ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 38.26, 14.26 and 35.51, respectively, METEOR score of 32.62, and BERTScore F1 of 86.65 which outperform all other baselines. On the new MixSub dataset, where only the abstract is the input, our proposed model (when trained on the whole training corpus without distinguishing between the subject categories) achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 31.78, 9.76 and 29.3, respectively, METEOR score of 24.00, and BERTScore F1 of 85.25.
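
For readers unfamiliar with the ROUGE-1 F1 scores quoted above, the following simplified sketch computes unigram-overlap precision, recall, and F1 between a generated highlight and a reference. It lowercases and splits on whitespace only; the official ROUGE tooling applies its own tokenization and optional stemming, so its numbers can differ. The example sentences are invented.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

generated = "the model generates highlights"
reference = "the model produces research highlights"
print(rouge1_f1(generated, reference))
```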

翻訳日:2023-07-07 18:02:08 公開日:2023-07-06

# その部分の合計:操作対象の慣性パラメータ識別のための視覚部分分割

The Sum of Its Parts: Visual Part Segmentation for Inertial Parameter Identification of Manipulated Objects ( http://arxiv.org/abs/2302.06685v2 )

ライセンス: Link先を確認

Philippe Nadeau, Matthew Giamou, Jonathan Kelly

(参考訳) 作業者と共に安全かつ効率的に作業するためには,協調ロボット(cobots)は,操作対象のダイナミックスを迅速に理解する能力が必要である。しかしながら、慣性パラメータの完全なセットを推定する従来の方法は、必ずしも高速で安全でない動き(十分な信号対雑音比を達成するために)に依存する。本研究では,視覚と力のねじれを組み合わせることで,動きの遅さや「ストップ・アンド・ゴー」のみを必要とする慣性パラメータ同定アルゴリズムを開発した。この手法は均質部分分割 (hps) と呼ばれ, 人工物は異なる均質な部分から構成されていることが多いという観察を生かしている。我々は,表面に基づく点クラスタリング法と体積形状分割アルゴリズムを組み合わせることで,操作対象の部分レベルセグメンテーションを高速に生成し,そのセグメンテーション表現をHPSにより精度よくオブジェクトの慣性パラメータを推定するために利用する。アルゴリズムをベンチマークするために、20の共通ワークショップツールに対して、現実的なメッシュ、セグメント化されたポイントクラウド、慣性パラメータからなる新しいデータセットを作成し、利用する。最後に,低コストの協調ロボットアームを用いて,複雑な「ハンマーバランス法」を自律的かつオンラインで実施することにより,HPSの実際の性能と精度を実証する。私たちのコードとデータセットはオープンソースで、自由に利用できます。

To operate safely and efficiently alongside human workers, collaborative robots (cobots) require the ability to quickly understand the dynamics of manipulated objects. However, traditional methods for estimating the full set of inertial parameters rely on motions that are necessarily fast and unsafe (to achieve a sufficient signal-to-noise ratio). In this work, we take an alternative approach: by combining visual and force-torque measurements, we develop an inertial parameter identification algorithm that requires slow or 'stop-and-go' motions only, and hence is ideally tailored for use around humans. Our technique, called Homogeneous Part Segmentation (HPS), leverages the observation that man-made objects are often composed of distinct, homogeneous parts. We combine a surface-based point clustering method with a volumetric shape segmentation algorithm to quickly produce a part-level segmentation of a manipulated object; the segmented representation is then used by HPS to accurately estimate the object's inertial parameters. To benchmark our algorithm, we create and utilize a novel dataset consisting of realistic meshes, segmented point clouds, and inertial parameters for 20 common workshop tools. Finally, we demonstrate the real-world performance and accuracy of HPS by performing an intricate 'hammer balancing act' autonomously and online with a low-cost collaborative robotic arm. Our code and dataset are open source and freely available.
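
The observation that objects are composed of distinct homogeneous parts means that whole-object inertial parameters follow from per-part quantities. The sketch below shows just that composition step for mass and centre of mass, assuming each part's volume, density, and centroid are already known. The part values are invented, and this is not the HPS estimation procedure itself, which the abstract says relies on visual and force-torque measurements.

```python
def combine_parts(parts):
    """Total mass and centre of mass of an object made of homogeneous parts.

    parts: list of (volume_m3, density_kg_per_m3, centroid_xyz) tuples.
    """
    total_mass = sum(volume * density for volume, density, _ in parts)
    com = tuple(
        sum(volume * density * centroid[k] for volume, density, centroid in parts)
        / total_mass
        for k in range(3)
    )
    return total_mass, com

# Hypothetical hammer: a steel head and a wooden handle (illustrative values).
hammer = [
    (1.0e-4, 7850.0, (0.30, 0.0, 0.0)),  # head
    (2.0e-4, 700.0, (0.15, 0.0, 0.0)),   # handle
]
mass, com = combine_parts(hammer)
print(mass, com)
```

The dense head pulls the centre of mass far towards one end, which is why treating the object as a single homogeneous body would give a poor estimate.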

翻訳日:2023-07-07 18:01:36 公開日:2023-07-06

# 心電図のパワーを解き放つ : 心電図信号を用いた医療システムにおける新しい患者同定法

Unleashing the Power of Electrocardiograms: A novel approach for Patient Identification in Healthcare Systems with ECG Signals ( http://arxiv.org/abs/2302.06529v2 )

ライセンス: Link先を確認

Caterina Fuster-Barceló, Carmen Cámara, Pedro Peris-López

(参考訳) 過去20年間に渡り、心臓のシグナルを生体計測のモダリティとして活用する可能性についてかなりの研究が続けられてきた。本稿では心電図信号を用いた医療システムにおける患者識別のための新しいアプローチを提案する。畳み込みニューラルネットワークは、ECG信号から抽出された画像に基づいてユーザを分類するために使用される。提案する識別システムは複数のデータベースで評価され,実世界のシナリオにおけるその可能性の包括的理解を提供する。心臓血管疾患の一般ユーザ識別への影響は、これまでの研究では概ね見過ごされてきた。本手法は, 患者の心血管状態を考慮し, 得られた結果が偏りや制限がないことを保証する。さらに、得られた結果は、広範囲な実験によって示されるように、低いエラー率と高い精度のメトリクスで、一貫性と信頼性がある。これらの機能はすべて、医療システムにおける患者識別の分野において、提案手法が貴重な貢献となり、実用的応用の強力な候補となる。

Over the course of the past two decades, a substantial body of research has substantiated the viability of utilising cardiac signals as a biometric modality. This paper presents a novel approach for patient identification in healthcare systems using electrocardiogram signals. A convolutional neural network is used to classify users based on images extracted from ECG signals. The proposed identification system is evaluated in multiple databases, providing a comprehensive understanding of its potential in real-world scenarios. The impact of Cardiovascular Diseases on generic user identification has been largely overlooked in previous studies. The presented method takes into account the cardiovascular condition of the patients, ensuring that the results obtained are not biased or limited. Furthermore, the results obtained are consistent and reliable, with lower error rates and higher accuracy metrics, as demonstrated through extensive experimentation. All these features make the proposed method a valuable contribution to the field of patient identification in healthcare systems, and make it a strong contender for practical applications.

翻訳日:2023-07-07 18:01:11 公開日:2023-07-06

# 仮想量子エラー検出

Virtual quantum error detection ( http://arxiv.org/abs/2302.02626v2 )

ライセンス: Link先を確認

Kento Tsubouchi, Yasunari Suzuki, Yuuki Tokunaga, Nobuyuki Yoshioka, Suguru Endo

(参考訳) 量子誤差補正と量子誤差検出は、エラーを検出するために症候群の測定を必要とする。各安定化器発電機のシンドローム測定は、現在の量子ハードウェアにおける読み出し忠実度が一般的にゲート忠実度よりも低いという事実を考慮すると、大きなオーバーヘッドとなる。本稿では,対称性拡張と呼ばれる量子エラー緩和手法を一般化することにより,仮想量子エラー検出(VQED)と呼ばれるプロトコルを提案する。この方法では、回路実行中の量子誤差検出により得られた後選択量子状態に対応する計算結果を、シンドローム測定を実装せずに、事実上評価することができる。安定化器発生器毎のアダマール試験回路の実装を必要とする従来の量子誤り検出とは異なり、我々のVQEDプロトコルは、安定化器発生器の数に関係なく、アンシラ量子ビットを持つ一定の深さの浅い量子回路で実行することができる。また,vqedを用いた計算結果は,vqedの動作中に発生する雑音に対して頑健であり,本手法は他の誤差緩和手法と完全互換であり,計算精度のさらなる向上と高忠実性量子コンピューティングの容易化が図れる。

Quantum error correction and quantum error detection necessitate syndrome measurements to detect errors. Performing syndrome measurements for each stabilizer generator can be a significant overhead, given that the readout fidelity of current quantum hardware is generally lower than the gate fidelity. Here, by generalizing a quantum error mitigation method known as symmetry expansion, we propose a protocol called virtual quantum error detection (VQED). This method virtually allows for evaluating computation results corresponding to post-selected quantum states obtained through quantum error detection during circuit execution, without implementing syndrome measurements. Unlike conventional quantum error detection, which requires the implementation of Hadamard test circuits for each stabilizer generator, our VQED protocol can be performed with a constant-depth shallow quantum circuit with an ancilla qubit, irrespective of the number of stabilizer generators. Furthermore, the computation results obtained using VQED are robust against noise that occurs during the operation of VQED, and our method is fully compatible with other error mitigation schemes, enabling further improvements in computation accuracy and facilitating high-fidelity quantum computing.

翻訳日:2023-07-07 18:00:27 公開日:2023-07-06

# マージナルコントリビューションを伴わないシェープリー値の近似

Approximating the Shapley Value without Marginal Contributions ( http://arxiv.org/abs/2302.00736v2 )

ライセンス: Link先を確認

Patrick Kolpaczki, Viktor Bengs, Maximilian Muschalik, Eyke Hüllermeier

(参考訳) Shapley値は、最近説明可能な人工知能で集中的に使用されている協調ゲームにおいて、プレイヤーに有意義な貢献価値を割り当てる最も一般的なアプローチであることは間違いない。意味性は、シャプリー値のみが満足する公理的性質によるものであるが、エージェントの数で指数関数的に増加する正確な計算を犠牲にしている。したがって、多くの研究がシェープリーの値の効率的な近似に費やされており、そのほとんどはエージェントの限界貢献の概念に反するものである。本稿では,余剰貢献の概念から分離されたShapley値の表現に基づいて,SVARM と Stratified SVARM の2つのパラメータフリーおよびドメイン非依存近似アルゴリズムを提案する。我々は,その近似的品質に関する不一致の理論的保証を証明し,合成ゲームを含む経験的結果と,最先端手法と比較する一般的な説明可能性ユースケースを提供する。

The Shapley value is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, and it has recently been used intensively in explainable artificial intelligence. The meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, comes at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley value, most of which revolve around the notion of an agent's marginal contribution. In this paper, we propose two parameter-free and domain-independent approximation algorithms, SVARM and Stratified SVARM, based on a representation of the Shapley value detached from the notion of marginal contributions. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results, including synthetic games as well as common explainability use cases, comparing our algorithms with state-of-the-art methods.
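
A representation "detached from marginal contributions" can be illustrated with a standard identity: if a coalition size is drawn uniformly from $0,\dots,n-1$ and then a coalition $S$ of that size is drawn uniformly from the other players, the Shapley value of player $i$ equals $\mathbb{E}[v(S\cup\{i\})] - \mathbb{E}[v(S)]$, so the two terms can be estimated separately. The sketch below evaluates both expectations exactly by enumeration on a toy glove game. It is an illustration of this identity under those assumptions, not the authors' SVARM sampling algorithm, and all names are hypothetical.

```python
from itertools import combinations
from math import comb

def shapley_two_term(value, n, i):
    """Exact Shapley value of player i written as phi_plus - phi_minus."""
    others = [j for j in range(n) if j != i]
    phi_plus = phi_minus = 0.0
    for size in range(n):  # coalition sizes 0 .. n-1
        # Probability of one coalition: uniform size, then uniform subset.
        weight = 1.0 / (n * comb(n - 1, size))
        for subset in combinations(others, size):
            s = frozenset(subset)
            phi_plus += weight * value(s | {i})   # coalitions containing i
            phi_minus += weight * value(s)        # coalitions without i
    return phi_plus - phi_minus

def glove_game(coalition):
    """Player 0 owns a left glove, players 1 and 2 own right gloves."""
    return 1.0 if 0 in coalition and (1 in coalition or 2 in coalition) else 0.0

values = [shapley_two_term(glove_game, 3, i) for i in range(3)]
print(values)
```

Because each expectation is an average of plain coalition values, a single sampled coalition can in principle inform the estimates of several players at once, rather than being spent on one player's marginal contribution.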

翻訳日:2023-07-07 18:00:07 公開日:2023-07-06

# ESC:ゼロショットオブジェクトナビゲーションのためのソフトコモンセンス制約による探索

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation ( http://arxiv.org/abs/2301.13166v3 )

ライセンス: Link先を確認

Kaiwen Zhou, Kaizhi Zheng, Connor Pryor, Yilin Shen, Hongxia Jin, Lise Getoor, Xin Eric Wang

(参考訳) 特定のオブジェクトを正確に見つけてナビゲートする能力は、現実世界で動作し、タスクを完了させるためにオブジェクトと対話するエージェントにとって重要な能力である。このようなオブジェクトナビゲーションタスクは、通常、ラベル付きオブジェクトを持つ視覚環境において大規模なトレーニングを必要とする。本研究では,事前学習モデルにおける常識知識を,ナビゲーション経験や視覚環境でのトレーニングなしにオープンワールドオブジェクトナビゲーションに伝達する,ソフト・コモンセンス制約(esc)を用いた新たなゼロショットオブジェクトナビゲーション手法を提案する。第一に、ESCは、オープンワールドのプロンプトベースのグラウンドリングのための事前学習されたビジョンと言語モデルと、ルームおよびオブジェクト推論のための事前学習されたコモンセンス言語モデルを利用する。そして、ESCはコモンセンス知識を、効率的な探索のためのソフトロジック述語としてモデル化することで、ナビゲーション行動に変換する。 MP3D, HM3D, および RoboTHOR ベンチマークの大規模な実験により、我々のESC法はベースラインよりも大幅に改善され、ゼロショットオブジェクトナビゲーションのための新しい最先端結果が得られる(例えば、MP3D の CoW よりも288% の相対的継承率の向上)。

The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience or any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on the MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines and achieves new state-of-the-art results for zero-shot object navigation (e.g., a 288% relative Success Rate improvement over CoW on MP3D).

翻訳日:2023-07-07 17:59:50 公開日:2023-07-06

# グラフ表現学習による効率的かつ実現可能なロボット組立シーケンス計画

Efficient and Feasible Robotic Assembly Sequence Planning via Graph Representation Learning ( http://arxiv.org/abs/2303.10135v3 )

ライセンス: Link先を確認

Matan Atad, Jianxiang Feng, Ismael Rodríguez, Maximilian Durner, Rudolph Triebel

(参考訳) 自動ロボット組立シーケンス計画(RASP)は、製品カスタマイズの必要性が高まるとともに、現代製造業における生産性とレジリエンスを大幅に向上させることができる。このような自動化を実現する上での最大の課題のひとつは、ますます複雑なアセンブリの潜在的なシーケンスの数が増えることによるソリューションの効率的な発見にある。さらに、ロボットシステムにはコストのかかる実現性チェックが常に必要です。そこで本研究では,製品アセンブリのためのグラフ表現であるアセンブリグラフと,アセンブリシーケンス生成のためのGRACEと呼ばれるポリシアーキテクチャであるGraph Assembly Processing Networkを提案する。次に、GRACEを用いてグラフ入力から意味のある情報を抽出し、ステップバイステップでアセンブリシーケンスを予測する。実験では、両腕ロボットシステムのシミュレーションで収集したデータに基づいて、アルミニウムプロファイルの製品変種間で実現可能な組立シーケンスを予測できることを示す。さらに,本手法は, 偽予測による望ましくない影響を著しく軽減し, 現実の展開を容易にすることができることを示す。コードとトレーニングデータはhttps://github.com/DLR-RM/GRACEで公開されている。

Automatic Robotic Assembly Sequence Planning (RASP) can significantly improve productivity and resilience in modern manufacturing, along with the growing need for greater product customization. One of the main challenges in realizing such automation lies in efficiently finding solutions among a growing number of potential sequences for increasingly complex assemblies. Besides, costly feasibility checks are always required for the robotic system. To address this, we propose a holistic graphical approach that includes a graph representation for product assemblies, called Assembly Graph, and a policy architecture for assembly sequence generation, the Graph Assembly Processing Network, dubbed GRACE. We then use GRACE to extract meaningful information from the graph input and predict assembly sequences in a step-by-step manner. In experiments, we show that our approach can predict feasible assembly sequences across product variants of aluminum profiles based on data collected in simulation of a dual-armed robotic system. We further demonstrate that our method is capable of detecting infeasible assemblies, substantially alleviating the undesirable impacts from false predictions, and hence facilitating real-world deployment in the near future. Code and training data are available at https://github.com/DLR-RM/GRACE.

翻訳日:2023-07-07 17:51:17 公開日:2023-07-06

# NeRF固有の4つ: 逆内在カメラパラメータと外在カメラパラメータの同時最適化

NeRFtrinsic Four: An End-To-End Trainable NeRF Jointly Optimizing Diverse Intrinsic and Extrinsic Camera Parameters ( http://arxiv.org/abs/2303.09412v3 )

ライセンス: Link先を確認

Hannah Schieber, Fabian Deuser, Bernhard Egger, Norbert Oswald, Daniel Roth

(参考訳) ニューラル放射場(NeRF)を用いた新規視点合成は、新しい視点から高品質な画像を生成する最先端技術である。既存の手法では、外部および内部カメラパラメータに関する事前知識が必要である。このため、適用先は合成シーン、または前処理ステップを必要とする現実世界のシナリオに限られる。カメラパラメータとNeRFの同時最適化に関する最近の研究は、ノイズのある外部カメラパラメータの精緻化に重点を置いており、内部カメラパラメータについては前処理に依存することが多い。その他のアプローチも、単一のカメラ内部パラメータしか扱えない。これらの制約に対処するため、我々はNeRFtrinsic Fourと呼ばれる新しいエンドツーエンド学習可能なアプローチを提案する。ガウスフーリエ特徴を用いて外部カメラパラメータを推定し、投影誤差による教師信号を通じて、変化する内部カメラパラメータを動的に予測する。提案手法はLLFFとBLEFFにおいて既存の同時最適化手法を上回る。これら既存のデータセットに加えて、内部カメラパラメータが変化するiFFと呼ばれる新しいデータセットも導入する。NeRFtrinsic Fourは、NeRFベースの視点合成における同時最適化を前進させるものであり、カメラパラメータが変化する現実世界のシナリオにおいて、よりリアルで柔軟なレンダリングを可能にする。

Novel view synthesis using neural radiance fields (NeRF) is the state-of-the-art technique for generating high-quality images from novel viewpoints. Existing methods require a priori knowledge about extrinsic and intrinsic camera parameters. This limits their applicability to synthetic scenes, or real-world scenarios with the necessity of a preprocessing step. Current research on the joint optimization of camera parameters and NeRF focuses on refining noisy extrinsic camera parameters and often relies on the preprocessing of intrinsic camera parameters. Further approaches are limited to cover only one single camera intrinsic. To address these limitations, we propose a novel end-to-end trainable approach called NeRFtrinsic Four. We utilize Gaussian Fourier features to estimate extrinsic camera parameters and dynamically predict varying intrinsic camera parameters through the supervision of the projection error. Our approach outperforms existing joint optimization methods on LLFF and BLEFF. In addition to these existing datasets, we introduce a new dataset called iFF with varying intrinsic camera parameters. NeRFtrinsic Four is a step forward in joint optimization NeRF-based view synthesis and enables more realistic and flexible rendering in real-world scenarios with varying camera parameters.
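
The Gaussian Fourier features mentioned above can be illustrated with the generic random Fourier feature mapping $\gamma(v) = [\cos(2\pi B v), \sin(2\pi B v)]$, where the entries of $B$ are drawn from $\mathcal{N}(0, \sigma^2)$. This is only a minimal sketch: the function name, `num_features`, `sigma`, and the 3-vector input are illustrative assumptions, and the abstract does not say how NeRFtrinsic Four wires these features into its camera-parameter prediction.

```python
import math
import random


def gaussian_fourier_features(v, num_features=8, sigma=1.0, seed=0):
    """Map a vector v to [cos(2*pi*B v), sin(2*pi*B v)] with B ~ N(0, sigma^2).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    rng = random.Random(seed)
    # B is a fixed random projection matrix of shape (num_features, len(v)).
    B = [[rng.gauss(0.0, sigma) for _ in v] for _ in range(num_features)]
    # Project the input onto each random direction and scale by 2*pi.
    proj = [2.0 * math.pi * sum(b * x for b, x in zip(row, v)) for row in B]
    # Concatenate the cosine and sine responses.
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]


# Example input: an arbitrary 3-vector (e.g. a camera translation).
features = gaussian_fourier_features([0.1, -0.4, 0.7])
```

A larger `sigma` produces higher-frequency features, which is the usual knob for trading smoothness against detail in such encodings.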

翻訳日:2023-07-07 17:51:01 公開日:2023-07-06

# 複雑なタイムスタンプ付きイベントストリームの高速・マルチアスペクトマイニング

Fast and Multi-aspect Mining of Complex Time-stamped Event Streams ( http://arxiv.org/abs/2303.03789v2 )

ライセンス: Link先を確認

Kota Nakamura, Yasuko Matsubara, Koki Kawabata, Yuhei Umeda, Yuichiro Wada and Yasushi Sakurai

(参考訳) オンラインショッピングログ (item, price, brand, time) やローカルモビリティアクティビティ (pick-up and drop-off locations, time) など、複数の属性を持ち時間とともに変化するイベントの巨大なオンラインストリームが与えられたとき、大規模で動的な高次テンソルストリームをどのように要約すればよいか? 隠れたパターンやルール、異常をどのように見つけるのか? 我々の答えは「レジーム」と「コンポーネント」という2種類のパターンに注目することであり、そのために高次テンソルストリームに対する効率的かつ効果的な手法であるCubeScopeを提案する。具体的には、突然の不連続を識別し、異なる動的パターンである「レジーム」(例えば、平日、週末、休日のパターン)を認識する。各レジームでは、すべての属性(アイテム、価格、ブランド、時間など)に対して多方向の要約を行い、潜在グループ(アイテム/ブランドグループなど)とその関係を表す隠れた「コンポーネント」を発見する。簡潔だが効果的な要約のおかげで、CubeScopeは異常の突然の出現を検出し、実際に発生する異常の種類を特定することもできる。提案手法は以下の特性を有する。 (a) 効果的: 動的なマルチアスペクトパターン、すなわちレジームとコンポーネントを捉え、すべてのイベントを統計的に要約する。 (b) 汎用的: さまざまな種類のテンソルストリームに対するデータ圧縮、パターン発見、異常検出に実用的に適用できる。 (c) スケーラブル: 我々のアルゴリズムは、データストリームの長さとその次元数に依存しない。実データセットに関する広範な実験により、CubeScopeが有意義なパターンや異常を正しく発見し、精度と実行速度に関して最先端の手法を一貫して上回ることを示す。

Given a huge, online stream of time-evolving events with multiple attributes, such as online shopping logs: (item, price, brand, time), and local mobility activities: (pick-up and drop-off locations, time), how can we summarize large, dynamic high-order tensor streams? How can we see any hidden patterns, rules, and anomalies? Our answer is to focus on two types of patterns, i.e., ''regimes'' and ''components'', for which we present CubeScope, an efficient and effective method over high-order tensor streams. Specifically, it identifies any sudden discontinuity and recognizes distinct dynamical patterns, ''regimes'' (e.g., weekday/weekend/holiday patterns). In each regime, it also performs multi-way summarization for all attributes (e.g., item, price, brand, and time) and discovers hidden ''components'' representing latent groups (e.g., item/brand groups) and their relationship. Thanks to its concise but effective summarization, CubeScope can also detect the sudden appearance of anomalies and identify the types of anomalies that occur in practice. Our proposed method has the following properties: (a) Effective: it captures dynamical multi-aspect patterns, i.e., regimes and components, and statistically summarizes all the events; (b) General: it is practical for successful application to data compression, pattern discovery, and anomaly detection on various types of tensor streams; (c) Scalable: our algorithm does not depend on the length of the data stream and its dimensionality. Extensive experiments on real datasets demonstrate that CubeScope finds meaningful patterns and anomalies correctly, and consistently outperforms the state-of-the-art methods as regards accuracy and execution speed.

翻訳日:2023-07-07 17:50:22 公開日:2023-07-06

# 高精度・伝達可能なニューラルポテンシャルのための非平衡分子のDenoise Pretraining

Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials ( http://arxiv.org/abs/2303.02216v2 )

ライセンス: Link先を確認

Yuyang Wang, Changwen Xu, Zijie Li, Amir Barati Farimani

(参考訳) 同変グラフニューラルネットワーク(GNN)の最近の進歩により、分子ポテンシャル予測において、高価な第一原理(ab initio)量子力学(QM)アプローチに代わる高速なサロゲートモデルを深層学習で開発できるようになった。しかし、GNNを用いた正確で転移可能なポテンシャルモデルの構築は、特に大規模で複雑な分子システムにおいて、QM法の高い計算コストと理論レベルによってデータが大きく制限されるため、依然として困難である。本研究では、非平衡分子配座に対するノイズ除去(denoise)事前学習により、より正確かつ転移可能なGNNポテンシャル予測を実現することを提案する。具体的には、サンプリングした非平衡配座の原子座標をランダムノイズで摂動させ、摂動された分子配座からノイズを除去して元の座標を復元するようにGNNを事前学習する。複数のベンチマークでの厳密な実験により、事前学習がニューラルポテンシャルの精度を大幅に向上させることを示す。さらに、提案する事前学習手法はモデル非依存であり、さまざまな不変および同変GNNの性能を向上させることを示す。特に、小分子で事前学習したモデルは顕著な転移性を示し、異なる元素、荷電分子、生体分子、より大きな系を含む多様な分子系でファインチューニングした際の性能を向上させる。これらの結果は、複雑な分子系に対してより汎化可能なニューラルポテンシャルを構築するために、ノイズ除去事前学習アプローチを活用できる可能性を強調している。

Recent advances in equivariant graph neural networks (GNNs) have made deep learning amenable to developing fast surrogate models to expensive ab initio quantum mechanics (QM) approaches for molecular potential predictions. However, building accurate and transferable potential models using GNNs remains challenging, as the data is greatly limited by the expensive computational costs and level of theory of QM methods, especially for large and complex molecular systems. In this work, we propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions. Specifically, atomic coordinates of sampled nonequilibrium conformations are perturbed by random noises and GNNs are pretrained to denoise the perturbed molecular conformations which recovers the original coordinates. Rigorous experiments on multiple benchmarks reveal that pretraining significantly improves the accuracy of neural potentials. Furthermore, we show that the proposed pretraining approach is model-agnostic, as it improves the performance of different invariant and equivariant GNNs. Notably, our models pretrained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems, including different elements, charged molecules, biomolecules, and larger systems. These results highlight the potential for leveraging denoise pretraining approaches to build more generalizable neural potentials for complex molecular systems.
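
The denoising objective described above can be sketched in a few lines: perturb each atomic coordinate with Gaussian noise and keep the noise as the regression target, so that subtracting a correct prediction recovers the original conformation. This is a minimal sketch under stated assumptions: the helper names, the noise scale `sigma`, the mean-squared-error loss, and the toy water-like geometry are all illustrative, and the abstract does not state whether the authors' GNN predicts the noise or the clean coordinates directly.

```python
import random


def make_denoising_pair(coords, sigma=0.1, seed=0):
    """Return (perturbed coordinates, noise target) for one conformation.

    Hypothetical helper; `sigma` is an assumed noise scale.
    """
    rng = random.Random(seed)
    # One independent Gaussian offset per coordinate of every atom.
    noise = [[rng.gauss(0.0, sigma) for _ in atom] for atom in coords]
    perturbed = [
        [c + n for c, n in zip(atom, eps)] for atom, eps in zip(coords, noise)
    ]
    return perturbed, noise


def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and true per-coordinate noise."""
    diffs = [
        (p - t) ** 2
        for pa, ta in zip(predicted_noise, true_noise)
        for p, t in zip(pa, ta)
    ]
    return sum(diffs) / len(diffs)


# Toy three-atom geometry (coordinates are illustrative, not from the paper).
water = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
perturbed, noise = make_denoising_pair(water)
```

In a real pipeline the `predicted_noise` argument would come from the GNN evaluated on `perturbed`; here the loss function only shows what such a model would be trained to minimize.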

翻訳日:2023-07-07 17:49:27 公開日:2023-07-06

# 混合スパース線形回帰における統計計算的トレードオフ

Statistical-Computational Tradeoffs in Mixed Sparse Linear Regression ( http://arxiv.org/abs/2303.02118v2 )

ライセンス: Link先を確認

Gabriel Arpino and Ramji Venkataramanan

(参考訳) 2成分の混合スパース線形回帰問題を考える。この問題では、2つの実$k$スパース信号 $\beta_1, \beta_2$ を、$n$個のラベルなしノイズ付き線形測定から復元する。スパース性は次元に対して劣線形であることが許され、加法ノイズは分散 $\sigma^2$ を持つ独立なガウスノイズと仮定する。先行研究により、この問題には $\frac{k}{SNR^2}$ 対 $\frac{k^2}{SNR^2}$ の統計・計算ギャップが存在し、スパースPCAやロバストスパース平均推定のような計算上困難な他の高次元推論問題に類似していることが示されている。ここで $SNR$ は信号対雑音比である。我々は低次多項式の手法により、この問題に対するより広範な計算障壁の存在を確立する一方で、この問題が計算的に困難なのは非常に狭い対称パラメータ領域に限られることを示す。この困難な領域において、任意のランダム化アルゴリズムに対するサンプル複雑性 $n$ と実行時間の間の滑らかな情報・計算トレードオフを同定する。単純な帰着により、サンプル複雑性 $n = \tilde{o}(k^2)$ でのスパース位相回復における厳密なサポート復元に計算障壁が存在することを示す、新しい厳密な証拠が得られる。第2の貢献は、単純な閾値処理アルゴリズムの解析である。このアルゴリズムは、問題が困難な狭い領域の外側では、関連する混合回帰検出問題を $O(np)$ 時間で、サンプル数の平方根で解き、(非混合)スパース線形回帰に必要なサンプル複雑性と一致する。これにより、復元問題はその後、密な場合の最先端技術で解けるようになる。本結果の特別な場合として、この単純なアルゴリズムが、スパース線形回帰における厳密な符号付きサポート復元を解く大規模なアルゴリズム族の中でオーダー最適であることを示す。

We consider the problem of mixed sparse linear regression with two components, where two real $k$-sparse signals $\beta_1, \beta_2$ are to be recovered from $n$ unlabelled noisy linear measurements. The sparsity is allowed to be sublinear in the dimension, and additive noise is assumed to be independent Gaussian with variance $\sigma^2$. Prior work has shown that the problem suffers from a $\frac{k}{SNR^2}$-to-$\frac{k^2}{SNR^2}$ statistical-to-computational gap, resembling other computationally challenging high-dimensional inference problems such as Sparse PCA and Robust Sparse Mean Estimation; here $SNR$ is the signal-to-noise ratio. We establish the existence of a more extensive computational barrier for this problem through the method of low-degree polynomials, but show that the problem is computationally hard only in a very narrow symmetric parameter regime. We identify a smooth information-computation tradeoff between the sample complexity $n$ and runtime for any randomized algorithm in this hard regime. Via a simple reduction, this provides novel rigorous evidence for the existence of a computational barrier to solving exact support recovery in sparse phase retrieval with sample complexity $n = \tilde{o}(k^2)$. Our second contribution is to analyze a simple thresholding algorithm which, outside of the narrow regime where the problem is hard, solves the associated mixed regression detection problem in $O(np)$ time with square-root the number of samples and matches the sample complexity required for (non-mixed) sparse linear regression; this allows the recovery problem to be subsequently solved by state-of-the-art techniques from the dense case. As a special case of our results, we show that this simple algorithm is order-optimal among a large family of algorithms in solving exact signed support recovery in sparse linear regression.

翻訳日:2023-07-07 17:49:01 公開日:2023-07-06

# 少数量子ビット配列における長時間持続する反相関

Long persistent anticorrelations in few-qubit arrays ( http://arxiv.org/abs/2303.02085v2 )

ライセンス: Link先を確認

Danil Kornovan, Alexander Poddubny, and Alexander Poshakinskiy

(参考訳) 一般電磁環境における2レベル原子配列に散在する光子間のアンチバンチングを実現する機構を理論的に検討する。私たちの目標は、個々の原子の自発的放出寿命よりもはるかに長い時間持続するアンチバンチングです。このような持続的なアンチバンチングのメカニズムを2つ挙げる。 1つは原子配列のサブラジアント状態に基づいており、もう1つはサブラジアント状態を必要としない。我々は,自由空間の配列と導波路に結合した配列に基づいて,最適化されたアンチバンチを持つ配列パラメータの具体例を2つ提案した。

We consider theoretically the mechanisms to realize antibunching between the photons scattered on the array of two-level atoms in a general electromagnetic environment. Our goal is the antibunching that persists for the times much longer than the spontaneous emission lifetime of an individual atom. We identify two mechanisms for such persistent antibunching. The first one is based on subradiant states of the atomic array, and the second one does not require any subradiant states. We provided two specific examples of array parameters with optimized antibunching, based on an array in a free space and an array coupled to a waveguide.

翻訳日:2023-07-07 17:48:22 公開日:2023-07-06

# ディープラーニングを用いた光リモートセンシング画像における指向性物体検出

Oriented Object Detection in Optical Remote Sensing Images using Deep Learning: A Survey ( http://arxiv.org/abs/2302.10473v2 )

ライセンス: Link先を確認

Kun Wang, Zi Wang, Zhang Li, Ang Su, Xichao Teng, Minhao Liu and Qifeng Yu

(参考訳) 指向性物体検出は、リモートセンシングにおける最も基本的かつ挑戦的なタスクの1つであり、多数の事前定義された物体カテゴリに属する指向性物体の位置を特定することを目的としている。近年、光リモートセンシング画像における指向性物体の検出において、深層学習に基づく手法が顕著な成果を上げている。しかし、リモートセンシング分野の文献に関する徹底的なレビューはまだ行われていない。そこで我々は、近年の進歩を包括的に調査し、問題定義、一般的なデータセット、評価プロトコル、検出フレームワーク、指向性物体の表現、特徴表現など、指向性物体検出の多くの側面を取り上げる。さらに、最先端の手法を分析し、考察する。最後に、今後の研究の方向性を議論し、有用な研究指針を示す。この調査は、学界や産業界の研究者にとって価値あるものになると考えている。

Oriented object detection is one of the most fundamental and challenging tasks in remote sensing, aiming at locating the oriented objects of numerous predefined object categories. Recently, deep learning based methods have achieved remarkable performance in detecting oriented objects in optical remote sensing imagery. However, a thorough review of the literature in remote sensing has not yet emerged. Therefore, we give a comprehensive survey of recent advances and cover many aspects of oriented object detection, including problem definition, commonly used datasets, evaluation protocols, detection frameworks, oriented object representations, and feature representations. Besides, the state-of-the-art methods are analyzed and discussed. We finally discuss future research directions to put forward some useful research guidance. We believe that this survey shall be valuable to researchers across academia and industry.

翻訳日:2023-07-07 17:48:12 公開日:2023-07-06

# Caption Anything: 多様なマルチモーダル制御を備えたインタラクティブな画像記述

Caption Anything: Interactive Image Description with Diverse Multimodal Controls ( http://arxiv.org/abs/2305.02677v3 )

ライセンス: Link先を確認

Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao

(参考訳) 制御可能な画像キャプション生成(controllable image captioning)は、指定した領域に注目する、特定の文体で記述するなど、人間の目的に従って自然言語で画像を記述することを目的とした、新たなマルチモーダルトピックである。最先端の手法は、アノテーション付きの入力制御と出力キャプションのペアで訓練される。しかし、このような十分にアノテーションされたマルチモーダルデータの不足は、対話型AIシステムにおけるユーザビリティとスケーラビリティを大きく制限する。ユニモーダルの命令追従基盤モデルを活用することは、より幅広いデータソースの恩恵を受けられる有望な代替手段である。本稿では、幅広いマルチモーダル制御をサポートする、基盤モデルで拡張した画像キャプションフレームワークであるCaption AnyThing(CAT)を提案する。 1) 点、ボックス、軌跡を含む視覚制御 2) 感情、長さ、言語、事実性などの言語制御。 Segment Anything Model(SAM)とChatGPTを用いて、視覚プロンプトと言語プロンプトをモジュール化されたフレームワークに統合し、異なる制御の柔軟な組み合わせを可能にする。広範なケーススタディにより、このフレームワークのユーザ意図への整合能力を実証し、視覚言語アプリケーションにおける効果的なユーザインタラクションのモデリングに示唆を与える。コードは https://github.com/ttengwang/Caption-Anything で公開されている。

Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, e.g., looking at the specified regions or telling in a particular text style. State-of-the-art methods are trained on annotated pairs of input controls and output captions. However, the scarcity of such well-annotated multimodal data largely limits their usability and scalability for interactive AI systems. Leveraging unimodal instruction-following foundation models is a promising alternative that benefits from broader sources of data. In this paper, we present Caption AnyThing (CAT), a foundation model augmented image captioning framework supporting a wide range of multimodal controls: 1) visual controls, including points, boxes, and trajectories; 2) language controls, such as sentiment, length, language, and factuality. Powered by Segment Anything Model (SAM) and ChatGPT, we unify the visual and language prompts into a modularized framework, enabling the flexible combination between different controls. Extensive case studies demonstrate the user intention alignment capabilities of our framework, shedding light on effective user interaction modeling in vision-language applications. Our code is publicly available at https://github.com/ttengwang/Caption-Anything.

翻訳日:2023-07-07 17:42:07 公開日:2023-07-06

# FedVS: 分割モデルのためのストラグラー耐性とプライバシ保護による垂直的フェデレーション学習

FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models ( http://arxiv.org/abs/2304.13407v3 )

ライセンス: Link先を確認

Songze Li, Duanyi Yao, Jin Liu

(参考訳) 中央サーバと多数の分散クライアントからなる垂直連合学習(VFL)システムでは、トレーニングデータは垂直に分割され、異なる特徴が異なるクライアントにプライベートに格納される。分割VFLの問題は、サーバとクライアントの間で分割されたモデルを訓練することである。本稿は、分割VFLにおける2つの主要な課題に対処することを目的とする。 1) 訓練中のストラグラー(処理の遅いクライアント)による性能低下 2) クライアントがアップロードするデータ埋め込みからのデータおよびモデルのプライバシ漏洩。我々はこれら2つの課題に同時に対処するFedVSを提案する。 FedVSの鍵となるアイデアは、ローカルデータとモデルに対する秘密分散方式を設計することであり、これにより、結託するクライアントと好奇心の強いサーバに対する情報理論的なプライバシが保証され、ストラグラーでないクライアントからの計算シェアを復号することで、全クライアントの埋め込みの集約が損失なく再構成される。さまざまな種類のVFLデータセット(表形式、CV、マルチビューを含む)に対する広範な実験により、ベースラインプロトコルに対する、ストラグラー緩和とプライバシ保護におけるFedVSの普遍的な利点を示す。

In a vertical federated learning (VFL) system consisting of a central server and many distributed clients, the training data are vertically partitioned such that different features are privately stored on different clients. The problem of split VFL is to train a model split between the server and the clients. This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients' uploaded data embeddings. We propose FedVS to simultaneously address these two challenges. The key idea of FedVS is to design secret sharing schemes for the local data and models, such that information-theoretical privacy against colluding clients and curious server is guaranteed, and the aggregation of all clients' embeddings is reconstructed losslessly, via decrypting computation shares from the non-straggling clients. Extensive experiments on various types of VFL datasets (including tabular, CV, and multi-view) demonstrate the universal advantages of FedVS in straggler mitigation and privacy protection over baseline protocols.
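
To see why secret sharing can give both privacy and straggler resilience, consider Shamir's threshold scheme: a value is hidden in the constant term of a random polynomial, and any `threshold` shares reconstruct it, so shares from slow clients are simply not needed. The sketch below is an illustration of that general idea only. The abstract does not specify FedVS's actual sharing scheme, and the field size, function names, and the use of a seeded non-cryptographic generator are assumptions made for a reproducible demo.

```python
import random

# 2^31 - 1 is a Mersenne prime; the field size is an illustrative choice.
PRIME = 2_147_483_647


def make_shares(secret, threshold, num_shares, seed=0):
    """Split `secret` so that any `threshold` shares reconstruct it."""
    # Seeded PRNG for reproducibility; NOT cryptographically secure.
    rng = random.Random(seed)
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        return sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME

    # Share i is the polynomial evaluated at x = i (x = 0 would leak the secret).
    return [(x, poly(x)) for x in range(1, num_shares + 1)]


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0 over the field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Five clients hold shares; any three of them suffice to reconstruct.
shares = make_shares(secret=123456, threshold=3, num_shares=5)
```

With these parameters, two of the five clients can straggle without blocking reconstruction, while any two colluding clients learn nothing about the secret in the information-theoretic sense.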

翻訳日:2023-07-07 17:41:13 公開日:2023-07-06

# 部分観測からナビゲーションパターンを予測する学習

Learning to Predict Navigational Patterns from Partial Observations ( http://arxiv.org/abs/2304.13242v2 )

ライセンス: Link先を確認

Robin Karlsson, Alexander Carballo, Francisco Lepe-Salazar, Keisuke Fujii, Kento Ohtani, Kazuya Takeda

(参考訳) 人間は、相互に知られたナビゲーションパターンに従うことで、規則に制約された環境を協調的に移動する。こうしたパターンは、方向付きの経路や道路レーンとして表現できる。不完全にしか観測されていない環境からこれらのナビゲーションパターンを推論することは、地図化されていない場所で動作する知的移動ロボットに必要である。しかし、これらのナビゲーションパターンをアルゴリズム的に定義することは自明ではない。本稿では、実環境におけるナビゲーションパターンを部分観測のみから推論することを学習する、最初の自己教師付き学習(SSL)手法を提案する。幾何学的データ拡張、予測的世界モデリング、情報理論的正則化項によって、無限データの極限においてバイアスのない局所的な方向付きソフトレーン確率(DSLP)場をモデルが予測できるようになる仕組みを説明する。 DSLP場に最尤グラフを当てはめることで、大域的なナビゲーションパターンを推論する方法を示す。実験の結果、我々のSSLモデルは、nuScenesデータセットにおいて2つのSOTA教師ありレーングラフ予測モデルを上回った。我々は、知覚によるナビゲーションのための、スケーラブルで解釈可能な継続学習パラダイムとしてSSL手法を提案する。コードはhttps://github.com/robin-karlsson0/dslpで入手できる。

Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This paper presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enable our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception. Code is available at https://github.com/robin-karlsson0/dslp.

翻訳日:2023-07-07 17:40:54 公開日:2023-07-06

# DETRはリアルタイム物体検出でYOLOに勝る

DETRs Beat YOLOs on Real-time Object Detection ( http://arxiv.org/abs/2304.08069v2 )

ライセンス: Link先を確認

Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu

(参考訳) 近年、エンドツーエンドのトランスフォーマーベース検出器(DETR)は優れた性能を達成している。しかし、DETRの高い計算コストの問題は効果的に解決されておらず、実用上の応用が制限され、非最大抑制(NMS)などの後処理が不要であるという利点を十分に活かせていない。本稿ではまず、現代のリアルタイム物体検出器においてNMSが推論速度に与える影響を分析し、エンドツーエンドの速度ベンチマークを確立する。 NMSによる推論遅延を回避するため、我々の知る限り最初のリアルタイム・エンドツーエンド物体検出器であるReal-Time DEtection TRansformer (RT-DETR)を提案する。具体的には、スケール内相互作用とスケール間融合を分離することでマルチスケール特徴を効率的に処理するハイブリッドエンコーダを設計し、オブジェクトクエリの初期化を改善するIoU対応クエリ選択を提案する。また、提案する検出器は、使用するデコーダ層を変えることで、再訓練なしに推論速度を柔軟に調整でき、リアルタイム物体検出器の実用化を容易にする。 RT-DETR-LはCOCO val2017で53.0% AP、T4 GPUで114 FPSを達成し、RT-DETR-Xは54.8% APと74 FPSを達成して、同じスケールのすべてのYOLO検出器を速度と精度の両面で上回る。さらに、RT-DETR-R50は53.1% APと108 FPSを達成し、DINO-Deformable-DETR-R50を精度で2.2% AP、FPSで約21倍上回る。ソースコードと事前学習済みモデルは https://github.com/lyuwenyu/RT-DETR で公開されている。

Recently, end-to-end transformer-based detectors (DETRs) have achieved remarkable performance. However, the issue of the high computational cost of DETRs has not been effectively addressed, limiting their practical application and preventing them from fully exploiting the benefits of no post-processing, such as non-maximum suppression (NMS). In this paper, we first analyze the influence of NMS in modern real-time object detectors on inference speed, and establish an end-to-end speed benchmark. To avoid the inference delay caused by NMS, we propose a Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to the best of our knowledge. Specifically, we design an efficient hybrid encoder that processes multi-scale features by decoupling the intra-scale interaction and cross-scale fusion, and propose IoU-aware query selection to improve the initialization of object queries. In addition, our proposed detector supports flexible adjustment of the inference speed by using different decoder layers without the need for retraining, which facilitates the practical application of real-time object detectors. Our RT-DETR-L achieves 53.0% AP on COCO val2017 and 114 FPS on T4 GPU, while RT-DETR-X achieves 54.8% AP and 74 FPS, outperforming all YOLO detectors of the same scale in both speed and accuracy. Furthermore, our RT-DETR-R50 achieves 53.1% AP and 108 FPS, outperforming DINO-Deformable-DETR-R50 by 2.2% AP in accuracy and by about 21 times in FPS. Source code and pre-trained models are available at https://github.com/lyuwenyu/RT-DETR.

翻訳日:2023-07-07 17:40:30 公開日:2023-07-06

# シュミットランクと行列ランクによるエンタングルメント蒸留

Entanglement distillation in terms of Schmidt rank and matrix rank ( http://arxiv.org/abs/2304.05563v2 )

ライセンス: Link先を確認

Tianyi Ding, Lin Chen

(参考訳) エンタングルメント蒸留は量子情報処理において重要なタスクである。本稿では、与えられたシュミットランクと行列ランクを持つ非正部分転置(NPT)二部状態の蒸留を扱う。シュミットランク2のすべての二部状態は古典-古典状態と局所的に等価であり、シュミットランク3のすべての二部状態は1-蒸留不可能(1-undistillable)であることを示す。続いて、値域が積ベクトルを含む低ランクのB-既約(B-irreducible)NPT状態が蒸留可能であることを証明することで、低ランクのB-既約NPT状態は、縮約密度演算子のランクが大きい場合に蒸留可能であることを示す。最後に、ランク $\max\{M,N\}+1$ の $M\times N$ 二部状態を蒸留するための等価条件を示す。

Entanglement distillation is a key task in quantum-information processing. In this paper, we distill non-positive-partial-transpose (NPT) bipartite states of some given Schmidt rank and matrix rank. We show that all bipartite states of Schmidt rank two are locally equivalent to classical-classical states, and all bipartite states of Schmidt rank three are 1-undistillable. Subsequently, we show that low-rank B-irreducible NPT states are distillable for large-rank reduced density operators by proving low-rank B-irreducible NPT state whose range contains a product vector is distillable. Eventually, we present an equivalent condition to distill $M\times N$ bipartite states of rank $\max\{M,N\}+1$.

翻訳日:2023-07-07 17:39:57 公開日:2023-07-06

# 臍帯動脈ドプラ画像の自動誘導と品質評価システム

An Automatic Guidance and Quality Assessment System for Doppler Imaging of Umbilical Artery ( http://arxiv.org/abs/2304.05463v2 )

ライセンス: Link先を確認

Chun Kit Wong and Manxi Lin and Alberto Raheli and Zahra Bashir and Morten Bo Søndergaard Svendsen and Martin Grønnebæk Tolsgaard and Aasa Feragen and Anders Nymark Christensen

(参考訳) 超音波ドプラ法による臍帯動脈の検査は、臍帯を通じた胎児への血液供給を調べるために行われ、胎児の健康状態のモニタリングに不可欠である。このような検査には、臍帯動脈上で測定に適した部位を特定すること、ドプラスペクトルの形で血流曲線を取得すること、一連の品質基準への準拠を確認すること、といった正しく実施すべきいくつかのステップが含まれる。これらのステップは操作者のスキルに大きく依存しており、経験豊富な超音波検査技師の不足が機械支援の需要を生み出している。本研究では、このギャップを埋める自動システムを提案する。改良したFaster R-CNNネットワークを用いて、ドプラ測定に適した位置を提案するアルゴリズムを得る。また、ドプラスペクトルの品質を評価する手法も開発した。提案システムは、全国規模の超音波検診データベースから得た657枚の画像で検証され、その結果はガイダンスシステムとしての可能性を示している。

Examination of the umbilical artery with Doppler ultrasonography is performed to investigate blood supply to the fetus through the umbilical cord, which is vital for the monitoring of fetal health. Such examination involves several steps that must be performed correctly: identifying suitable sites on the umbilical artery for the measurement, acquiring the blood flow curve in the form of a Doppler spectrum, and ensuring compliance to a set of quality standards. These steps rely heavily on the operator's skill, and the shortage of experienced sonographers has thus created a demand for machine assistance. In this work, we propose an automatic system to fill the gap. By using a modified Faster R-CNN network, we obtain an algorithm that can suggest locations suitable for Doppler measurement. Meanwhile, we have also developed a method for assessment of the Doppler spectrum's quality. The proposed system is validated on 657 images from a national ultrasound screening database, with results demonstrating its potential as a guidance system.

翻訳日:2023-07-07 17:39:44 公開日:2023-07-06

# SLPerf: スプリット学習のベンチマークのための統一フレームワーク

SLPerf: a Unified Framework for Benchmarking Split Learning ( http://arxiv.org/abs/2304.01502v2 )

ライセンス: Link先を確認

Tianchen Zhou, Zhanyi Hu, Bingzhe Wu, Cen Chen

(参考訳) データプライバシへの懸念により、サイロに分散したデータを集中的に訓練することは実現不可能となり、協調学習フレームワークの必要性が高まっている。これに対処するため、フェデレーション学習(FL)とスプリット学習(SL)という2つの著名なフレームワークが登場した。 FLにはさまざまなベンチマークフレームワークや研究ライブラリが確立されているが、SLは、ラベル共有、モデル集約、カット層選択の点で多様性があるにもかかわらず、現時点では統一ライブラリを欠いている。この標準化の欠如は、SLパラダイムの比較を困難にしている。そこで我々は、SLのための統一的な研究フレームワーク兼オープンな研究ライブラリであるSLPerfを提案し、IIDおよび非IIDのデータ設定のもと、広く使われている4つのデータセットで広範な実験を行う。我々の貢献には、最近提案されたSLパラダイムの包括的な調査、さまざまな状況における各種SLパラダイムの詳細なベンチマーク比較、SLパラダイムを改善するための豊富なエンジニアリング上の教訓と研究上の知見が含まれる。 SLPerfはSLアルゴリズムの開発と公正な性能比較を容易にする。コードはhttps://github.com/Rainysponge/Split-learning-Attacksで入手できる。

Data privacy concerns have made centralized training of data, which is scattered across silos, infeasible, leading to the need for collaborative learning frameworks. To address that, two prominent frameworks emerged, i.e., federated learning (FL) and split learning (SL). While FL has established various benchmark frameworks and research libraries, SL currently lacks a unified library despite its diversity in terms of label sharing, model aggregation, and cut layer choice. This lack of standardization makes comparing SL paradigms difficult. To address this, we propose SLPerf, a unified research framework and open research library for SL, and conduct extensive experiments on four widely-used datasets under both IID and Non-IID data settings. Our contributions include a comprehensive survey of recently proposed SL paradigms, a detailed benchmark comparison of different SL paradigms in different situations, and rich engineering take-away messages and research insights for improving SL paradigms. SLPerf can facilitate SL algorithm development and fair performance comparisons. The code is available at https://github.com/Rainysponge/Split-learning-Attacks.

翻訳日:2023-07-07 17:39:30 公開日:2023-07-06

# 非マルコフ進化が量子熱力学のキャラクタリゼーションに及ぼす影響

Impact of non-Markovian evolution on characterizations of quantum thermodynamics ( http://arxiv.org/abs/2305.10622v2 )

ライセンス: Link先を確認

Devvrat Tiwari and Subhashish Banerjee

(参考訳) 本研究では,非マルコフ進化がエルゴトロピーやパワーといった量子熱力学の顕著な特性に与える影響について考察する。これらは量子速度制限時間の挙動によってベンチマークされる。本稿では,幾何学的,特に量子フィッシャーとウィグナー・ヤナゼ情報量測定と物性に基づく測定,特に相対純度測度とコヒーレンス測度の相対エントロピーを用いて,量子速度制限時間を計算する。非マルコフ振幅減衰進化を示すボソニック浴中の量子ビットの単純な非マルコフ模型は、有限な初期エルゴトロピーを持つ量子熱力学の観点から量子バッテリーとして観察することができる。この目的のために,量子速度制限時間の物理特性に基づく測定値とエルゴトロピーのコヒーレント成分との関係を考察する。非マルコフ進化は量子電池の充電過程に影響を与えることが示されている。さらに、量子電池の放電充電サイクルと、量子速度制限時間の幾何学的測定との接続を観測する。

Here we study the impact of non-Markovian evolution on prominent characteristics of quantum thermodynamics, such as ergotropy and power. These are benchmarked by the behavior of the quantum speed limit time. We make use of both geometric-based, particularly quantum Fisher and Wigner-Yanase information metric, and physical properties based-measures, particularly relative purity measure and relative entropy of coherence measure, to compute the quantum speed limit time. A simple non-Markovian model of a qubit in a bosonic bath exhibiting non-Markovian amplitude damping evolution is considered, which, from the quantum thermodynamic perspective with finite initial ergotropy, can be envisaged as a quantum battery. To this end, we explore the connections between the physical properties-based measures of quantum speed limit time and the coherent component of ergotropy. The non-Markovian evolution is shown to impact the recharging process of the quantum battery. Further, a connection between the discharging-charging cycle of the quantum battery and the geometric measures of quantum speed limit time is observed.

翻訳日:2023-07-07 17:31:53 公開日:2023-07-06

# ランダム畳み込み核を用いた時系列クラスタリング

Time Series Clustering With Random Convolutional Kernels ( http://arxiv.org/abs/2305.10457v2 )

ライセンス: Link先を確認

Jorge Marco-Blanco, Rubén Cuevas

(参考訳) 気候学からファイナンス、医療まで幅広いアプリケーションにわたる時系列データは、その大きさと複雑さのためにデータマイニングにおいて重大な課題を呈する。ひとつは時系列クラスタリングであり、ラベルなしの時系列データの大量処理と貴重な洞察の解放に不可欠である。しかし、伝統的かつ近代的な分析手法は、しばしばこれらの複雑さに苦しむ。これらの制約に対処するために、ランダムに選択されたパラメータを持つ畳み込みアーキテクチャを利用するR-Clusteringを導入する。大規模な評価を通じて、R-Clusteringはクラスタリングの精度、計算効率、スケーラビリティの観点から、既存の手法よりも優れた性能を示す。 UCRアーカイブを用いて得られた実験結果は,様々な時系列データセットにまたがるアプローチの有効性を示した。この結果は、様々な領域やアプリケーションにおけるRクラスタリングの重要性を強調し、時系列データマイニングの進歩に寄与している。

Time series data, spanning applications ranging from climatology to finance to healthcare, presents significant challenges in data mining due to its size and complexity. One open issue lies in time series clustering, which is crucial for processing large volumes of unlabeled time series data and unlocking valuable insights. Traditional and modern analysis methods, however, often struggle with these complexities. To address these limitations, we introduce R-Clustering, a novel method that utilizes convolutional architectures with randomly selected parameters. Through extensive evaluations, R-Clustering demonstrates superior performance over existing methods in terms of clustering accuracy, computational efficiency and scalability. Empirical results obtained using the UCR archive demonstrate the effectiveness of our approach across diverse time series datasets. The findings highlight the significance of R-Clustering in various domains and applications, contributing to the advancement of time series data mining.
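
The idea of convolutional kernels with randomly selected parameters can be sketched as a fixed, untrained feature extractor whose outputs are then handed to any clustering algorithm. This is a minimal sketch under stated assumptions: the function name, the Gaussian weights, the uniform bias, and the pooling into a maximum and a proportion of positive values are choices borrowed from common random-kernel feature extractors, and the abstract does not state which features or which clustering step R-Clustering actually uses.

```python
import random


def random_kernel_features(series, num_kernels=4, kernel_len=3, seed=0):
    """Extract two features per random kernel: max response and share of positives.

    Hypothetical helper; kernel count and length are illustrative defaults.
    """
    rng = random.Random(seed)
    features = []
    for _ in range(num_kernels):
        # Kernel weights and bias are drawn at random and never trained.
        weights = [rng.gauss(0.0, 1.0) for _ in range(kernel_len)]
        bias = rng.uniform(-1.0, 1.0)
        # Valid (no padding) 1-D convolution over the series.
        outputs = [
            sum(w * x for w, x in zip(weights, series[i:i + kernel_len])) + bias
            for i in range(len(series) - kernel_len + 1)
        ]
        # Proportion of positive values in the convolution output.
        ppv = sum(1 for o in outputs if o > 0) / len(outputs)
        features.extend([max(outputs), ppv])
    return features


series = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0, 1.0, 0.5]
features = random_kernel_features(series)
```

Because the same seed yields the same kernels, every series in a dataset is mapped into the same feature space, which is what makes the resulting vectors comparable for clustering.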

翻訳日:2023-07-07 17:31:34 公開日:2023-07-06

# Leniaにおける創発的な複雑性の捕捉

Capturing Emerging Complexity in Lenia ( http://arxiv.org/abs/2305.09378v2 )

ライセンス: Link先を確認

Sanyam Jain, Aarati Shrestha and Stefano Nichele

(参考訳) この研究プロジェクトは、デジタル生物の生態系をシミュレートする人工生命プラットフォームLeniaを調査する。Leniaの生態系は、移動、消費、成長、繁殖ができる単純な人工生物から構成される。このプラットフォームは、さまざまな能力と行動を持つ多様な生物を生み出すためのスケーラブルで柔軟な環境を提供するため、人工生命と進化を研究するツールとして重要である。Leniaにおける複雑性の測定は本研究の重要な側面であり、まだ発見されていない、より優れたLeniaの行動を進化させることを目的として、ルールの長期的で複雑な創発的行動を測定するための指標を特定する。遺伝的アルゴリズムは、近傍すなわちカーネルを遺伝子型として用い、成長関数などLeniaの残りのパラメータは固定したまま、個体群に応じた異なる行動を生成し、得られた行動の複雑性を判定するために適応度を測定する。第1に、フレーム間の分散が大きいほど報酬が与えられる「時間変動(Variation over Time)」を適応度関数として用いる。第2に、フレームの再構成損失のリストの変動に報酬が与えられる、オートエンコーダベースの適応度を用いる。第3に、再構成フレームのピクセル密度の変動が大きいほど報酬が与えられる、複合適応度を用いる。 3つの実験はすべて、ピクセル生存閾値と使用フレーム数を調整して行う。最後に、各適応度について9つの実験を500世代ずつ行った後、さらなる進化の余地があるような構成をすべての実験から選び、2500世代にわたって実行する。結果は、カーネルの質量中心が特定のピクセル集合とともに増加し、境界とともにカーネルがガウス分布を達成しようとすることを示している。

This research project investigates Lenia, an artificial life platform that simulates ecosystems of digital creatures. Lenia's ecosystem consists of simple, artificial organisms that can move, consume, grow, and reproduce. The platform is important as a tool for studying artificial life and evolution, as it provides a scalable and flexible environment for creating a diverse range of organisms with varying abilities and behaviors. Measuring complexity in Lenia is a key aspect of the study, which identifies the metrics for measuring long-term complex emerging behavior of rules, with the aim of evolving better Lenia behaviors that have not yet been discovered. The Genetic Algorithm uses neighborhoods or kernels as the genotype while keeping the rest of Lenia's parameters fixed, for example the growth function, to produce different behaviors for the population, and then measures a fitness value to decide the complexity of the resulting behavior. First, we use Variation over Time as a fitness function, where higher variance between the frames is rewarded. Second, we use Auto-encoder based fitness, where variation of the list of reconstruction loss for the frames is rewarded. Third, we perform combined fitness, where higher variation of the pixel density of reconstructed frames is rewarded. All three experiments are tweaked with pixel alive threshold and frames used. Finally, after performing nine experiments of each fitness for 500 generations, we pick configurations from all experiments such that there is a scope of further evolution, and run them for 2500 generations. Results show that the kernel's center of mass increases with a specific set of pixels and that, together with the borders, the kernel tries to achieve a Gaussian distribution.
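
One way to read the first fitness function is as the variance, across frames, of how many pixels are alive. The sketch below takes that reading as an assumption: the abstract does not define the statistic precisely, so the function name, the per-frame alive-pixel count, the default `alive_threshold`, and the toy 2x2 frames are all illustrative rather than the authors' definition.

```python
from statistics import pvariance


def variation_over_time(frames, alive_threshold=0.1):
    """Fitness sketch: population variance of the alive-pixel count per frame.

    A pixel counts as alive when its value exceeds `alive_threshold`.
    """
    alive_counts = [
        sum(1 for row in frame for pixel in row if pixel > alive_threshold)
        for frame in frames
    ]
    return pvariance(alive_counts)


# A pattern that never changes versus one that grows over three frames.
static = [[[0.0, 1.0], [1.0, 0.0]]] * 3
changing = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
```

Under this reading a frozen pattern scores zero, so a genetic algorithm that maximizes the score is pushed away from static or extinct configurations.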

翻訳日:2023-07-07 17:30:55 公開日:2023-07-06

# 量子ゲートの物理的誤差寄与の高速推定

Fast Estimation of Physical Error Contributions of Quantum Gates ( http://arxiv.org/abs/2305.08916v2 )

ライセンス: Link先を確認

Miha Papić, Adrian Auer, Inés de Vega

(参考訳) 大規模量子計算では、実装された量子ゲートの主なエラー源を高速に評価する必要がある。そこで本研究では,各物理ノイズ源の寄与を,少数の実験的な測定値を用いて,一連のゲートの不確かさから抽出する学習ベースのフレームワークを提案する。本手法を説明するために,超伝導トランスモンアーキテクチャを例として,可変カプラを用いたCZゲートのダイアバティック実装に着目する。この文脈では、非マルコフ雑音、電子的不完全性、可変カプラによる計算誤差の影響など、関連する全てのノイズ源を考慮に入れる。

Large-scale quantum computation requires a fast assessment of the main sources of error in the implemented quantum gates. To this end, we provide a learning-based framework that allows one to extract the contribution of each physical noise source to the infidelity of a series of gates with a small number of experimental measurements. To illustrate this method, we consider the case of superconducting transmon architectures, where we focus on the diabatic implementation of the CZ gate with tunable couplers. In this context, we account for all relevant noise sources, including non-Markovian noise, electronics imperfections and the effect of tunable couplers on the error of the computation.

翻訳日:2023-07-07 17:30:31 公開日:2023-07-06

# 量子力学におけるパラドックスとその解法について

On a paradox in quantum mechanics and its resolution ( http://arxiv.org/abs/2305.08556v2 )

ライセンス: Link先を確認

Padtarapan Banyadsin and Salvatore De Vincenzo

(参考訳) ディリクレ境界条件によって特徴づけられる壁のある区間内の自由シュル=オディンガー粒子を考える。この境界条件を満たす粒子の正規化状態として放物線を選択する。その状態におけるハミルトニアンの分散を計算するには、ハミルトニアンの平均値とその正方形の値を計算する必要がある。これらの平均値を計算するのに標準式を使用すると、両者の結果は困難なく得られるが、その差分は予想外に虚偽値を取る。これらの平均値を計算するのに同じ式を使うが、まず各固有関数と固有値の項でハミルトニアンとその平方を書けば、ハミルトニアンの平均値に対して上と同じ結果が得られるが、ハミルトニアンの平均値は異なる(実際にはゼロではない)ので、分散は許容できる値となる。この矛盾した結果がいつから起こるのか? 後者のパラドックスは、ヒルベルト空間における線型作用素の一般理論の中で、ある基本的な概念を使用することでのみ適切に解決できる問題の例として文献に提示されている。ここでは、これらの概念を慎重に検討し、パラドックスを解決するための詳細な方法で適用する。我々の結果は波動力学の自然な枠組みの中で定式化され、ディラックの象徴的形式主義がもたらす不便さを避けるために、記事全体を通してその形式主義の使用を避ける。さらに、関係する演算子の領域によって課される制約に対処することなく、完全に形式的な方法でパラドックスの解決を得る。本論文の内容は,大学院生や大学院生,インストラクターにとって有用であると考えられる。

Consider a free Schrödinger particle inside an interval with walls characterized by the Dirichlet boundary condition. Choose a parabola as the normalized state of the particle that satisfies this boundary condition. To calculate the variance of the Hamiltonian in that state, one needs to calculate the mean value of the Hamiltonian and that of its square. If one uses the standard formula to calculate these mean values, one obtains both results without difficulty, but the variance unexpectedly takes an imaginary value. If one uses the same expression to calculate these mean values but first writes the Hamiltonian and its square in terms of their respective eigenfunctions and eigenvalues, one obtains the same result as above for the mean value of the Hamiltonian but a different value for its square (in fact, it is not zero); hence, the variance takes an acceptable value. From whence do these contradictory results arise? The latter paradox has been presented in the literature as an example of a problem that can only be properly solved by making use of certain fundamental concepts within the general theory of linear operators in Hilbert spaces. Here, we carefully review those concepts and apply them in a detailed way to resolve the paradox. Our results are formulated within the natural framework of wave mechanics, and to avoid inconveniences that the use of Dirac's symbolic formalism could bring, we avoid the use of that formalism throughout the article. In addition, we obtain a resolution of the paradox in an entirely formal way without addressing the restrictions imposed by the domains of the operators involved. We think that the content of this paper will be useful to undergraduate and graduate students as well as to their instructors.
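
To make the two conflicting calculations concrete, here is a short worked version under assumed specifics that the abstract does not state: the interval is taken as $[0, a]$, the particle mass is $m$, and the parabola is $\psi(x) = \sqrt{30/a^5}\,x(a-x)$. The numbers below follow from those assumptions and are a sketch, not a reproduction of the paper's derivation.

```latex
\begin{align*}
\hat H &= -\frac{\hbar^2}{2m}\frac{d^2}{dx^2}, \qquad
\psi(x) = \sqrt{\frac{30}{a^5}}\,x(a-x), \qquad
\hat H\psi = \frac{\hbar^2}{m}\sqrt{\frac{30}{a^5}} \quad (\text{a constant}) \\[4pt]
\langle \hat H \rangle &= \int_0^a \psi\,\hat H\psi\,dx
  = \frac{\hbar^2}{m}\,\frac{30}{a^5}\int_0^a x(a-x)\,dx
  = \frac{\hbar^2}{m}\,\frac{30}{a^5}\,\frac{a^3}{6}
  = \frac{5\hbar^2}{m a^2} \\[4pt]
\text{formal route:}\quad
\hat H^2\psi &= 0 \;\Rightarrow\; \langle \hat H^2 \rangle = 0
  \;\Rightarrow\; (\Delta H)^2 = 0 - \frac{25\hbar^4}{m^2 a^4} < 0 \\[4pt]
\text{norm route:}\quad
\langle \hat H^2 \rangle &= \int_0^a |\hat H\psi|^2\,dx
  = \frac{\hbar^4}{m^2}\,\frac{30}{a^5}\,a
  = \frac{30\hbar^4}{m^2 a^4}
  \;\Rightarrow\; (\Delta H)^2 = \frac{5\hbar^4}{m^2 a^4} > 0
\end{align*}
```

The formal route gives a negative variance, so the standard deviation comes out imaginary. The second route gives a positive variance. The two differ because $\hat H\psi$ is a nonzero constant and therefore does not vanish at the walls, so applying $\hat H$ a second time is not covered by the boundary condition that $\psi$ itself satisfies.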

翻訳日:2023-07-07 17:30:21 公開日:2023-07-06

# 事前データから言語モデル、下流タスクへ:不公平なNLPモデルによる政治的バイアスの軌跡を追跡する

From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models ( http://arxiv.org/abs/2305.08283v3 )

ライセンス: Link先を確認

Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov

(参考訳) 言語モデル(LM)は、ニュース、ディスカッションフォーラム、書籍、オンライン百科事典など、さまざまなデータソースで事前訓練されている。このデータの大部分には、民主主義とアイデアの多様性を祝福する意見と視点が含まれており、一方で本質的に社会的に偏っている。本研究は,(1)そのようなコーパスで訓練されたLMの社会的偏見を社会的・経済的軸に沿って測定し,(2)政治的偏見のあるLM上で訓練された下流NLPモデルの公平さを測定するための新しい手法を開発する。我々はヘイトスピーチと誤情報検出に注目し、ハイテイクなソーシャル指向タスクの公平性に関する事前学習データにおける政治的(社会的、経済的)バイアスの効果を実証的に定量化することを目的としている。以上の結果から, 事前学習されたLMは, コーパスの偏極性を高める政治的傾向を示し, 社会的バイアスをヘイトスピーチ予測や誤情報検知器に伝播させることがわかった。我々は,nlp研究の意義を議論し,不公平を緩和するための今後の方向性を提案する。

Language models (LMs) are pretrained on diverse data sources, including news, discussion forums, books, and online encyclopedias. A significant portion of this data includes opinions and perspectives which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure political biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks. Our findings reveal that pretrained LMs do have political leanings that reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.

翻訳日:2023-07-07 17:29:55 公開日:2023-07-06

# ブラウンドワーフモデルグリッドの相互比較と機械学習による大気検索

Intercomparison of Brown Dwarf Model Grids and Atmospheric Retrieval Using Machine Learning ( http://arxiv.org/abs/2305.07719v2 )

ライセンス: Link先を確認

Anna Lueber, Daniel Kitzmann, Chloe E. Fisher, Brendan P. Bowler, Adam J. Burgasser, Mark Marley, Kevin Heng

(参考訳) サブステラースペクトルデータとモデルの違いを理解することは、特にブラウンドワーフ大気の徹底的な調査に必要な自己整合モデルグリッドにおいて、大きな課題であることが証明されている。ランダム林の教師付き機械学習手法を用いて,1997年から2021年までのブラウンドロームの14個のモデルグリッドの情報量について検討した。ランダムフォレスト法により,モデルグリッドの予測力を解析し,近似ベイズ計算(abc)の枠組み内でデータを解釈することができる。我々のキュレートされたデータセットには、3つのベンチマークブラウンドローム(Gl 570D, {\epsilon} Indi Ba, Bb)と19個のLおよびTドロームのサンプルが含まれており、このサンプルは従来型のベイズ法(ネステッドサンプリング)を用いてLueber et al. (2022)で分析された。この解釈のために選択されたモデルグリッドとは無関係に、ブラウンドロームの有効温度を頑健に予測できることが判明した。しかし、表面重力の推論はモデルに依存します。具体的には、BT-Settl, Sonora Bobcat および Sonora Cholla モデルグリッドは 1.2 {\mu}m のデータブルーワードが不完全なアルカリ線の形状に関する知識を緩和するために無視されているにもかかわらず、logg ~3-4 (cgs unit) を予測する傾向にある。ブラウンドワーフの大気における雲の影響を理解することに関連する2つの大きな、長い間の課題は、次の原則からそれらをモデル化できないことと、これらのモデルを堅牢に検証することである。

Understanding differences between sub-stellar spectral data and models has proven to be a major challenge, especially for self-consistent model grids that are necessary for a thorough investigation of brown dwarf atmospheres. Using the supervised machine learning method of the random forest, we study the information content of 14 previously published model grids of brown dwarfs (from 1997 to 2021). The random forest method allows us to analyze the predictive power of these model grids, as well as interpret data within the framework of Approximate Bayesian Computation (ABC). Our curated dataset includes 3 benchmark brown dwarfs (Gl 570D, ε Indi Ba and Bb) as well as a sample of 19 L and T dwarfs; this sample was previously analyzed in Lueber et al. (2022) using traditional Bayesian methods (nested sampling). We find that the effective temperature of a brown dwarf can be robustly predicted independent of the model grid chosen for the interpretation. However, inference of the surface gravity is model-dependent. Specifically, the BT-Settl, Sonora Bobcat and Sonora Cholla model grids tend to predict log g ~ 3-4 (cgs units) even after data blueward of 1.2 μm have been disregarded to mitigate our incomplete knowledge of the shapes of alkali lines. Two major, longstanding challenges associated with understanding the influence of clouds in brown dwarf atmospheres remain: our inability to model them from first principles and also to robustly validate these models.

翻訳日:2023-07-07 17:29:34 公開日:2023-07-06

# 機械学習を用いた意思決定システムにおけるデータ時間ラグの豚肉価格予測に及ぼす影響

Effects of data time lag in a decision-making system using machine learning for pork price prediction ( http://arxiv.org/abs/2305.05677v2 )

ライセンス: Link先を確認

Mario Suaza-Medina, F. Javier Zarazaga-Soria, Jorge Pinilla-Lopez, Francisco J. López-Pellicer, Javier Lacasta

(参考訳) スペインは世界第3位の豚肉生産国であり、いくつかの地域の多くの農場はこの市場の進化に依存している。しかし、現在の価格体系は不公平であり、一部の俳優は他の業者よりも優れた市場情報を持っている。この文脈では、歴史的価格設定は簡単で手頃な価格なデータソースであり、すべてのエージェントにより良い情報を提供するのに役立つ。しかし、データ取得の遅れが価格決定に影響を及ぼす可能性がある。本稿では,複数の予測アルゴリズムを用いて,データ取得遅延が価格予測システムに与える影響について検討する。本稿では,最適な提案を意思決定支援システムのプロトタイプに統合し,実際のシナリオでテストする。具体的には、農務省が発行したスペインの最も重要な地域豚肉市場の公開データを用いて、同日に取得した同市場の2週間の遅延とサブスクリプションベースのデータを用いている。その結果,最高のパブリックモデルとデータサブスクリプションモデルとの誤差差は0.6ユーロであり,遅延のないデータに有利であることがわかった。市場規模はこれらの違いをサプライチェーンにおいて重要なものにし、市場価格を交渉するためのより良いツールを提供する。

Spain is the third-largest producer of pork meat in the world, and many farms in several regions depend on the evolution of this market. However, the current pricing system is unfair, as some actors have better market information than others. In this context, historical pricing is an easy-to-find and affordable data source that can help all agents to be better informed. However, the time lag in data acquisition can affect their pricing decisions. In this paper, we study the effect that data acquisition delay has on a price prediction system using multiple prediction algorithms. We describe the integration of the best proposal into a decision support system prototype and test it in a real-case scenario. Specifically, we use public data from the most important regional pork meat markets in Spain published by the Ministry of Agriculture with a two-week delay and subscription-based data of the same markets obtained on the same day. The results show that the error difference between the best public and data subscription models is 0.6 Euro cents in favor of the data without delay. The market dimension makes these differences significant in the supply chain, giving pricing agents a better tool to negotiate market prices.
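
The effect of an acquisition delay on forecast error can be illustrated with a toy experiment. The sketch below is not the paper's method: it swaps in a naive persistence forecast (predict next week's price with the newest price that is actually available) and a made-up linear price series, purely to show how the same model degrades when its newest input is two weeks old. The function name `lagged_persistence_mae` and the data are hypothetical.

```python
def lagged_persistence_mae(prices, lag_weeks):
    """Mean absolute error of a persistence forecast with delayed data.

    The forecast for week t is the price of week t - 1 - lag_weeks,
    i.e. the newest observation available when data arrive
    `lag_weeks` late.
    """
    errors = []
    for week in range(lag_weeks + 1, len(prices)):
        forecast = prices[week - 1 - lag_weeks]
        errors.append(abs(prices[week] - forecast))
    return sum(errors) / len(errors)


# Hypothetical weekly prices rising by one unit per week.
weekly_prices = [100.0 + week for week in range(10)]

same_day_error = lagged_persistence_mae(weekly_prices, lag_weeks=0)
delayed_error = lagged_persistence_mae(weekly_prices, lag_weeks=2)
print(same_day_error, delayed_error)
```

On a trending series the delayed forecast is systematically further from the target, which is the kind of gap the study quantifies with real market data and stronger prediction algorithms.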

翻訳日:2023-07-07 17:29:00 公開日:2023-07-06

# セマンティックセグメンテーションのための構造的および統計的テクスチャ知識蒸留

Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation ( http://arxiv.org/abs/2305.03944v2 )

ライセンス: Link先を確認

Deyi Ji, Haoran Wang, Mingyuan Tao, Jianqiang Huang, Xian-Sheng Hua, Hongtao Lu

(参考訳) 既存の知識蒸留は、主に教師から学生への高度な文脈知識の伝達に焦点を当てている。しかし、低レベルのテクスチャ知識は、高レベルの深い特徴に対処できない境界、滑らかさ、規則性、色コントラストといった、局所的な構造パターンとグローバルな統計特性を特徴付ける上でも不可欠である。本稿では,構造的・統計的テクスチャ知識を最大限に活用し,意味的セグメント化のための新しい構造的・統計的テクスチャ知識蒸留(sstkd)フレームワークを提案する。具体的には,構造テクスチャ知識のために,構造テクスチャ知識をマイニングするために,ラプラシアンピラミッドと指向性フィルタバンクで低レベル特徴を分解するContourlet Decomposition Module (CDM)を導入する。統計的知識については,統計テクスチャ知識を適応的に抽出し,ヒューリスティックス反復量子化と復号化操作により拡張するDenoized Texture Intensity Equalization Module (DTIEM)を提案する。最後に、各知識学習は個々の損失関数によって監督され、学生ネットワークはより広い視点から教師をよりよく模倣する。実験の結果,提案手法はCityscapes, Pascal VOC 2012, ADE20Kデータセット上での最先端性能を実現することがわかった。

Existing knowledge distillation work on semantic segmentation mainly focuses on transferring high-level contextual knowledge from teacher to student. However, low-level texture knowledge is also of vital importance for characterizing the local structural pattern and global statistical property, such as boundary, smoothness, regularity and color contrast, which may not be well addressed by high-level deep features. In this paper, we intend to take full advantage of both structural and statistical texture knowledge and propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation. Specifically, for structural texture knowledge, we introduce a Contourlet Decomposition Module (CDM) that decomposes low-level features with an iterative Laplacian pyramid and a directional filter bank to mine the structural texture knowledge. For statistical knowledge, we propose a Denoised Texture Intensity Equalization Module (DTIEM) to adaptively extract and enhance statistical texture knowledge through heuristic iterative quantization and a denoising operation. Finally, each kind of knowledge learning is supervised by an individual loss function, forcing the student network to mimic the teacher better from a broader perspective. Experiments show that the proposed method achieves state-of-the-art performance on the Cityscapes, Pascal VOC 2012 and ADE20K datasets.
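
The Laplacian pyramid mentioned above splits a signal into band-pass detail layers plus a coarse residual, and the original can be rebuilt exactly from those parts. The following is a deliberately simplified one-dimensional sketch with box filters (pair averaging and sample repetition). It is not the paper's CDM, it omits the directional filter bank entirely, and all function names are illustrative assumptions.

```python
def downsample(signal):
    """Halve the length by averaging adjacent pairs (length must be even)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the length by repeating every sample."""
    return [value for value in signal for _ in range(2)]


def laplacian_pyramid(signal, levels):
    """Return (detail_bands, coarse): finest band first, coarse residual last."""
    bands = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        # Detail band = what the coarse approximation fails to explain.
        bands.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    return bands, current


def reconstruct(bands, coarse):
    """Invert laplacian_pyramid by adding the bands back, coarsest first."""
    current = list(coarse)
    for band in reversed(bands):
        current = [u + d for u, d in zip(upsample(current), band)]
    return current


signal = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
bands, coarse = laplacian_pyramid(signal, levels=2)
print(bands, coarse)
print(reconstruct(bands, coarse) == signal)
```

The detail bands carry the edge-like, high-frequency content at each scale, which is why this kind of decomposition is a natural starting point for extracting structural texture from low-level features.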

翻訳日:2023-07-07 17:28:43 公開日:2023-07-06

# 表面から見る:試料効率の良いオフラインRLの基礎対称性の爆発

Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL ( http://arxiv.org/abs/2306.04220v3 )

ライセンス: Link先を確認

Peng Cheng, Xianyuan Zhan, Zhihao Wu, Wenjia Zhang, Shoucheng Song, Han Wang, Youfang Lin, Li Jiang

(参考訳) オフライン強化学習(rl)は、事前収集されたデータセットから環境と対話することなくポリシーを学習することで、現実世界のタスクに魅力的なアプローチを提供する。しかし、既存のオフラインRLアルゴリズムの性能はデータセットのスケールと状態-アクション空間カバレッジに大きく依存する。現実世界のデータ収集は、しばしば高価で制御不能であり、小規模で狭い範囲のデータセットにつながり、オフラインrlの実用的なデプロイに重大な課題をもたらす。本稿では,システムダイナミクスの基本的な対称性を活用することで,小規模データセット下でのオフラインrl性能が大幅に向上することを示す。具体的には,tdm(time-reversal symmetry)強制動力学モデル(t-symmetry enforced dynamics model, tdm)を提案する。 TDMは、小さなデータセットに対する良好な表現と、T対称性の遵守に基づくOODサンプルに対する新しい信頼性尺度の両方を提供する。これらは、保守的なポリシー制約の少ない新しいオフラインRLアルゴリズム(TSRL)の構築や、信頼性の高い遅延空間データ拡張手順に容易に使用できる。大規模な実験に基づいて、TSRLは、原サンプルの1%に満たない小さなベンチマークデータセットで優れた性能を発揮し、データ効率と一般化性の観点から、最近のオフラインRLアルゴリズムを著しく上回っている。

Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expensive and uncontrollable, leading to small and narrowly covered datasets and posing significant challenges for practical deployments of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on compliance with the T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability.

翻訳日:2023-07-07 17:23:04 公開日:2023-07-06

# OSPC:オンライン連続測光校正

OSPC: Online Sequential Photometric Calibration ( http://arxiv.org/abs/2305.17673v2 )

ライセンス: Link先を確認

Jawad Haidar, Douaa Khalil, Daniel Asmar

(参考訳) 測光キャリブレーションは多くのコンピュータビジョンアプリケーションに必須である。主な利点の1つは、特に標準のKLTアルゴリズムのようなトラッキングの直接的な方法に依存する場合、Visual SLAMの性能を向上させることである。もうひとつの利点は、測定された強度からセンサーの照射値を取得することであり、シェーディングの形状のような視覚アルゴリズムの事前処理ステップである。現在の測光キャリブレーションシステムは、共同最適化の問題に頼り、推定値の曖昧さに遭遇する。本稿では, 逐次推定手法を用いて, 測光パラメータを求める新しい手法を提案する。提案手法は,すべてのパラメータを高精度に推定でき,さらに定式化は線形かつ凸であり,その解を高速かつオンラインアプリケーションに適したものにしている。提案手法を検証し,その利点を実証するビジュアルオドメトリーシステムの実験を行った。

Photometric calibration is essential to many computer vision applications. One of its key benefits is enhancing the performance of Visual SLAM, especially when it depends on a direct method for tracking, such as the standard KLT algorithm. Another advantage could be in retrieving the sensor irradiance values from measured intensities, as a pre-processing step for some vision algorithms, such as shape-from-shading. Current photometric calibration systems rely on a joint optimization problem and encounter an ambiguity in the estimates, which can only be resolved using ground truth information. We propose a novel method that solves for photometric parameters using a sequential estimation approach. Our proposed method achieves high accuracy in estimating all parameters; furthermore, the formulations are linear and convex, which makes the solution fast and suitable for online applications. Experiments on a Visual Odometry system validate the proposed method and demonstrate its advantages.

翻訳日:2023-07-07 17:21:54 公開日:2023-07-06

# ランダム化SVDの雑音感度について

On the Noise Sensitivity of the Randomized SVD ( http://arxiv.org/abs/2305.17435v2 )

ライセンス: Link先を確認

Elad Romanov

(参考訳) ランダム化特異値分解(R-SVD)は、大きな行列の部分的なSVDを効率的に計算するためのスケッチベースアルゴリズムである。行列が低ランクの場合、R-SVDはその部分SVDを正確に生成するが、ランクが大きいと近似しか得られない。データサイエンスと主成分分析(PCA)の応用により、低ランク信号と雑音測定モデルの下でR-SVDを解析する。 R-SVD が生成した特異値は BBP のような相転移を示すことが示され、SNR が特定の検出可能性閾値を超えると、寸法減少係数に依存する最大の特異値は外れ値となる。さらに、基底真理信号特異ベクトルとR-SVDによる近似との重なり合いに関する漸近公式を計算する。次元の減少は、ノイズを非常に非線形に増幅する悪影響がある。以上の結果から,R-SVDの信号検出と推定の両面での統計的優位性を示すとともに,スケッチ寸法が小さい場合には特に顕著である。我々の分析は漸近的に正確であり、R-SVDの既存の作用素-ノルム誤差境界よりもかなり微細である。これは、ガウスのi.d.スケッチ、ランダム・プロジェクション、サブサンプラート・アダマール変換など、以前に文献で考えられていたスケッチ行列の幅広いファミリーに適用される。最後に、r-svd によって得られる特異値とベクトルに対する最適特異値縮小器を導出し、行列の除算への応用に有用である。

The randomized singular value decomposition (R-SVD) is a popular sketching-based algorithm for efficiently computing the partial SVD of a large matrix. When the matrix is low-rank, the R-SVD produces its partial SVD exactly; but when the rank is large, it only yields an approximation. Motivated by applications in data science and principal component analysis (PCA), we analyze the R-SVD under a low-rank signal plus noise measurement model; specifically, when its input is a spiked random matrix. The singular values produced by the R-SVD are shown to exhibit a BBP-like phase transition: when the SNR exceeds a certain detectability threshold, which depends on the dimension reduction factor, the largest singular value is an outlier; below the threshold, no outlier emerges from the bulk of singular values. We further compute asymptotic formulas for the overlap between the ground truth signal singular vectors and the approximations produced by the R-SVD. Dimensionality reduction has the adverse effect of amplifying the noise in a highly nonlinear manner. Our results demonstrate the statistical advantage -- in both signal detection and estimation -- of the R-SVD over more naive sketched PCA variants; the advantage is especially dramatic when the sketching dimension is small. Our analysis is asymptotically exact, and substantially more fine-grained than existing operator-norm error bounds for the R-SVD, which largely fail to give meaningful error estimates in the moderate SNR regime. It applies to a broad family of sketching matrices previously considered in the literature, including Gaussian i.i.d. sketches, random projections, and the sub-sampled Hadamard transform, among others. Lastly, we derive an optimal singular value shrinker for singular values and vectors obtained through the R-SVD, which may be useful for applications in matrix denoising.

翻訳日:2023-07-07 17:21:34 公開日:2023-07-06

# 多視点制限カーネルマシンにおける双対性

Duality in Multi-View Restricted Kernel Machines ( http://arxiv.org/abs/2305.17251v2 )

ライセンス: Link先を確認

Sonny Achten, Arun Pandey, Hannes De Meulemeester, Bart De Moor, Johan A. K. Suykens

(参考訳) 本稿では,既存の制限付きカーネルマシンメソッドを,教師なし設定と教師なし設定の両方においてカーネル主成分分析のための単一のプリミラル・ディアル・マルチビュー・フレームワークに結合した統一設定を提案する。フレームワークの一次表現と双対表現を導出し、理論的な観点から異なるトレーニングと推論アルゴリズムを関連づける。一次変数を再スケーリングすることで、原始変数と双対変数の完全同値性を実現する方法を示す。最後に,不確定なテストデータを再帰的に予測し,学習した特徴を可視化することにより,複数の時系列データセットにおける異なる手法間の関係を実験的に検証し,考察する。

We propose a unifying setting that combines existing restricted kernel machine methods into a single primal-dual multi-view framework for kernel principal component analysis in both supervised and unsupervised settings. We derive the primal and dual representations of the framework and relate different training and inference algorithms from a theoretical perspective. We show how to achieve full equivalence in primal and dual formulations by rescaling primal variables. Finally, we experimentally validate the equivalence and provide insight into the relationships between different methods on a number of time series data sets by recursively forecasting unseen test data and visualizing the learned features.

翻訳日:2023-07-07 17:20:59 公開日:2023-07-06

# 音声による抑うつ検出における自己教師付き表現

Self-supervised representations in speech-based depression detection ( http://arxiv.org/abs/2305.12263v2 )

ライセンス: Link先を確認

Wen Wu, Chao Zhang, Philip C. Woodland

(参考訳) 本稿では,自己教師付き学習(ssl)による基礎モデルを用いた音声自動抑うつ検出(sdd)における学習データのスパーシティの取り扱いを提案する。予め訓練された基礎モデルの異なる層から派生したSSL表現をSDDで解析し、うつ病検出に適した指標の洞察を提供する。次に、基礎モデルの微調整により、自動音声認識(ASR)と感情認識からSDDへの知識伝達を行う。その結果,asrモデルの隠れた表現とasrのテキスト情報とが組み合わさった場合,oracle と asr の書き起こしが同様の sdd 性能をもたらすことがわかった。複数の基礎モデルから表現を統合することで、DAIC-WOZデータセット上で実際のASRに基づく最先端SDD結果が得られた。

This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). An analysis of SSL representations derived from different layers of pre-trained foundation models is first presented for SDD, which provides insight into suitable indicators for depression detection. Knowledge transfer is then performed from automatic speech recognition (ASR) and emotion recognition to SDD by fine-tuning the foundation models. Results show that oracle and ASR transcriptions yield similar SDD performance when the hidden representations of the ASR model are incorporated along with the ASR textual information. By integrating representations from multiple foundation models, state-of-the-art SDD results based on real ASR were achieved on the DAIC-WOZ dataset.

翻訳日:2023-07-07 17:19:55 公開日:2023-07-06

# 冷原子不純物モデルを用いたフェルミオン物質波量子光学

Fermionic matter-wave quantum optics with cold-atom impurity models ( http://arxiv.org/abs/2305.11610v2 )

ライセンス: Link先を確認

Bennet Windt, Miguel Bello, Eugene Demler, J. Ignacio Cirac

(参考訳) 物質-波導波路QEDの近年のコールド原子実現により、簡単なフェルミオン不純物モデルが研究され、非自明な境界状態の形成、(マター波)放出ダイナミクス、集団散逸など、量子光学におけるいくつかのパラダイム現象のフェルミオン類似が議論される。単一不純物の場合、特に不純物スクリーニングクラウドに関連する創発的長さスケールの実際の空間シグネチャに焦点を当て、興味深い基底状態の特徴を強調します。また,単重および多重不純物系のクエンチダイナミクスにおいて,フェルミ準位付近の分数減衰や連続体の束縛状態による多重励起集団トラップを含む,新しい非マルコフ多体効果を示す。

Motivated by recent cold-atom realisations of matter-wave waveguide QED, we study simple fermionic impurity models and discuss fermionic analogues of several paradigmatic phenomena in quantum optics, including formation of non-trivial bound states, (matter-wave) emission dynamics, and collective dissipation. For a single impurity, we highlight interesting ground-state features, focusing in particular on real-space signatures of an emergent length scale associated with an impurity screening cloud. We also present novel non-Markovian many-body effects in the quench dynamics of single- and multiple-impurity systems, including fractional decay around the Fermi level and multi-excitation population trapping due to bound states in the continuum.

翻訳日:2023-07-07 17:19:43 公開日:2023-07-06

# ビデオ異常検出のためのマルチスケール時空間インタラクションネットワーク

Multi-scale Spatial-temporal Interaction Network for Video Anomaly Detection ( http://arxiv.org/abs/2306.10239v2 )

ライセンス: Link先を確認

Zhiyuan Ning, Zhangxun Li, Zhengliang Guo, Zile Wang, Liang Song

(参考訳) video anomaly detection (vad)は信号処理において必須だが困難なタスクである。時間的または空間的情報の分離分析では特定の異常は検出できないため、これらの2種類のデータ間の相互作用はvadにとって重要であると考えられている。しかし、現在のデュアルストリームアーキテクチャでは、この積分相互作用をオートエンコーダのボトルネックに限定するか、異常に非関連な背景画素をインタラクティブなプロセスに導入することで、VADの精度を損なう。これらの欠陥に対処するために,VADのためのマルチスケール空間時間相互作用ネットワーク(MSTI-Net)を提案する。まず,移動物体の検出を優先し,2種類のデータ間の意味的相違を調和させるため,従来の直接核融合の代替として,アテンションに基づく時空間融合モジュール(ASTFM)を提案する。さらに、両ストリームネットワークの出現と動きをブリッジするマルチASTFMベースの接続を注入し、マルチスケールの時空間相互作用を促進する。最後に,正常な動作と異常な動作の関連性を高めるため,メモリモジュール内の正規情報を記録する。 3つのベンチマークデータセットにおける実験結果から,ucsd ped2,cuhk avenue,上海テックデータセットでそれぞれ96.8%,87.6%,73.9%のaucsを達成した。

Video Anomaly Detection (VAD) is an essential yet challenging task in signal processing. Since certain anomalies cannot be detected by isolated analysis of either temporal or spatial information, the interaction between these two types of data is considered crucial for VAD. However, current dual-stream architectures either confine this integral interaction to the bottleneck of the autoencoder or introduce anomaly-irrelevant background pixels into the interactive process, hindering the accuracy of VAD. To address these deficiencies, we propose a Multi-scale Spatial-Temporal Interaction Network (MSTI-Net) for VAD. First, to prioritize the detection of moving objects in the scene and harmonize the substantial semantic discrepancies between the two types of data, we propose an Attention-based Spatial-Temporal Fusion Module (ASTFM) as a substitute for the conventional direct fusion. Furthermore, we inject multi-ASTFM-based connections that bridge the appearance and motion streams of the dual-stream network, thus fostering multi-scale spatial-temporal interaction. Finally, to bolster the delineation between normal and abnormal activities, our system records the regular information in a memory module. Experimental results on three benchmark datasets validate the effectiveness of our approach, which achieves AUCs of 96.8%, 87.6%, and 73.9% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively.

翻訳日:2023-07-07 17:12:45 公開日:2023-07-06

# wasserstein barycentersによるマルチタスク学習の公平性

Fairness in Multi-Task Learning via Wasserstein Barycenters ( http://arxiv.org/abs/2306.10155v2 )

ライセンス: Link先を確認

François Hu, Philipp Ratz, Arthur Charpentier

(参考訳) アルゴリズムフェアネスは、データのバイアスを減らすことを目的とした機械学習の確立された分野である。近年の進歩は、単一タスクの非バイアス化を目標とする単変量環境における公平性を確保するための様々な方法が提案されている。しかし、複数の目的が共有表現を用いて最適化されるマルチタスク設定への公平性の拡張は、未探索のままである。このギャップを埋めるため,マルチマルジナルなワッサーシュタイン・バリセンタを用いたマルチタスク学習に,Strong Demographic Parityの定義を拡張した手法を開発した。提案手法は回帰および二項分類タスクを含む最適フェアマルチタスク予測器に対する閉形式解を提供する。本研究では,データ駆動型解推定手法を開発し,合成データと実データの両方について数値実験を行う。実験結果は, 公平な意思決定を促進する上で, 処理後方法論の実際的価値を浮き彫りにするものである。

Algorithmic Fairness is an established field in machine learning that aims to reduce biases in data. Recent advances have proposed various methods to ensure fairness in a univariate environment, where the goal is to de-bias a single task. However, extending fairness to a multi-task setting, where more than one objective is optimised using a shared representation, remains underexplored. To bridge this gap, we develop a method that extends the definition of Strong Demographic Parity to multi-task learning using multi-marginal Wasserstein barycenters. Our approach provides a closed form solution for the optimal fair multi-task predictor including both regression and binary classification tasks. We develop a data-driven estimation procedure for the solution and run numerical experiments on both synthetic and real datasets. The empirical results highlight the practical value of our post-processing methodology in promoting fair decision-making.
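
For a single task with one-dimensional scores, the Wasserstein-barycenter idea has a simple form: map each score to its rank within its own group, then read off the group-size-weighted average of every group's quantile at that rank. The sketch below implements only that single-task, empirical special case as an illustration. It is not the paper's multi-task estimator, and the function names and toy data are assumptions.

```python
import math
from bisect import bisect_right


def empirical_cdf(sorted_scores, value):
    """Fraction of scores in the group that are <= value."""
    return bisect_right(sorted_scores, value) / len(sorted_scores)


def empirical_quantile(sorted_scores, level):
    """Smallest order statistic whose empirical CDF is >= level."""
    count = len(sorted_scores)
    # The small tolerance guards against floating-point overshoot.
    index = math.ceil(level * count - 1e-12) - 1
    return sorted_scores[min(max(index, 0), count - 1)]


def fair_scores(scores, groups):
    """Post-process scores so every group shares one score distribution."""
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    for members in by_group.values():
        members.sort()
    weights = {g: len(m) / len(scores) for g, m in by_group.items()}

    adjusted = []
    for score, group in zip(scores, groups):
        level = empirical_cdf(by_group[group], score)
        adjusted.append(sum(weights[g] * empirical_quantile(by_group[g], level)
                            for g in by_group))
    return adjusted


scores = [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(fair_scores(scores, groups))
```

After the adjustment both groups receive the same set of scores while the ordering inside each group is preserved, which is the sense in which the post-processed predictor no longer depends on group membership in distribution.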

翻訳日:2023-07-07 17:12:21 公開日:2023-07-06

# タッグ符号化問題に対するハイパーパラメータ調整モデルの重ね合わせ

Stacking of Hyperparameter Tuned Models for Tagging Coding Problems ( http://arxiv.org/abs/2306.10077v2 )

ライセンス: Link先を確認

Sathya Krishnan TS, S. Lakshmana Pandian and P. Shunmugapriya

(参考訳) 符号化問題は、コンピュータプログラムの形で解を必要とする問題である。コーディングの問題は、学生やプロの間で人気があり、スキルやキャリアの機会を高める。コーディング問題を実践する人たちを助けるAIシステムは、非常に有用であり、そのようなシステムには大きな可能性がある。本研究では,ハイパーパラメータの積み重ねによって77.8%の精度と0.815pr-aucの印象的なメトリックスコアを,codeforcesとleetcodeから抽出したデータセット上で達成するモデルを提案する。この作業のために開発されたデータセットとモデルをオープンソースにしています。

Coding problems are problems that require a solution in the form of a computer program. Coding problems are popular among students and professionals because they enhance skills and career opportunities. An AI system that helps those who practice coding problems would be highly useful, and there is huge potential for such a system. In this work, we propose a model which uses stacking of hyperparameter-tuned boosting models to achieve impressive metric scores of 77.8% accuracy and 0.815 PR-AUC on a dataset that was scraped from Codeforces and Leetcode. We open source the dataset and the models developed for this work.

翻訳日:2023-07-07 17:11:41 公開日:2023-07-06

# ChatGPT と LLM の医療イメージ保有者への影響 : 展望とユースケース

The Impact of ChatGPT and LLMs on Medical Imaging Stakeholders: Perspectives and Use Cases ( http://arxiv.org/abs/2306.06767v2 )

ライセンス: Link先を確認

Jiancheng Yang, Hongwei Bran Li, Donglai Wei

(参考訳) 本研究では,医療画像におけるOpenAI ChatGPTなどの大規模言語モデル(LLM)の変換可能性について検討する。公衆データの助けを借りて、これらのモデルは優れた言語理解と生成能力を持ち、放射線科医の解釈スキルを増強し、患者と物理学者のコミュニケーションを強化し、臨床ワークフローを合理化する。本稿では,企業,保険法人,政府,研究機関,病院(通称BIGR-H)など医療画像利害関係者の複雑な相互作用を示すための分析枠組みについて紹介する。この視点は、詳細な分析、説明的ユースケース、より広範な意味と今後の方向性に関する議論を通じて、AI対応ヘルスケアの時代における戦略的計画と意思決定に関する議論を提起することを目指している。

This study investigates the transformative potential of Large Language Models (LLMs), such as OpenAI ChatGPT, in medical imaging. With the aid of public data, these models, which possess remarkable language understanding and generation capabilities, are augmenting the interpretive skills of radiologists, enhancing patient-physician communication, and streamlining clinical workflows. The paper introduces an analytic framework for presenting the complex interactions between LLMs and the broader ecosystem of medical imaging stakeholders, including businesses, insurance entities, governments, research institutions, and hospitals (nicknamed BIGR-H). Through detailed analyses, illustrative use cases, and discussions on the broader implications and future directions, this perspective seeks to raise discussion in strategic planning and decision-making in the era of AI-enabled healthcare.

翻訳日:2023-07-07 17:11:24 公開日:2023-07-06

# スパース観測による日次予測の深層学習

Deep Learning for Day Forecasts from Sparse Observations ( http://arxiv.org/abs/2306.06079v3 )

ライセンス: Link先を確認

Marcin Andrychowicz, Lasse Espeholt, Di Li, Samier Merchant, Alexander Merose, Fred Zyda, Shreya Agrawal, Nal Kalchbrenner

(参考訳) 深層ニューラルネットワークは、気象条件をモデル化するための代替パラダイムを提供する。データが利用可能になったら1秒未満で予測できる神経モデルの能力と、非常に高い時間分解能と空間分解能、そして大気観測から直接学習できる能力は、これらのモデルのユニークな利点のほんの一部にすぎない。最新の確率的数値気象予報モデルと比較すると,大気観測で訓練された最も高い忠実度と最低遅延データであるニューラルモデルは,最大12時間のリードタイムを達成でき,降水量の唯一の変数に限られる。本稿では,観測に基づくニューラルモデルによって予測可能な,リードタイム範囲と変数の両方を大きく拡張するMetNet-3を提案する。 MetNet-3は、密度とスパースの両方のデータセンサーから学習し、降水、風、温度、露点を最大24時間前に予測する。 MetNet-3は、極端にスパースなターゲットでのネットワークトレーニングにもかかわらず、暗黙的にデータ同化を捉え、空間的に密度の高い予測を生成するキーデンシフィケーション技術を導入している。 MetNet-3は、それぞれ2分と1kmまでの時間分解能と空間分解能が高く、運用遅延も低い。 MetNet-3は、観測ベースのニューラルモデルに新たなパフォーマンスマイルストーンが設定される前に、最大24時間、CONUS領域上でHRRRやENSのような最も優れたシングルおよびマルチメンバNWPを上回ります。 metnet-3は運用中であり、予測は他のモデルとともにgoogle検索で提供される。

Deep neural networks offer an alternative paradigm for modeling weather conditions. The ability of neural models to make a prediction in less than a second once the data is available and to do so with very high temporal and spatial resolution, and the ability to learn directly from atmospheric observations, are just some of these models' unique advantages. Neural models trained using atmospheric observations, the highest fidelity and lowest latency data, have to date achieved good performance only up to twelve hours of lead time when compared with state-of-the-art probabilistic Numerical Weather Prediction models and only for the sole variable of precipitation. In this paper, we present MetNet-3 that extends significantly both the lead time range and the variables that an observation based neural model can predict well. MetNet-3 learns from both dense and sparse data sensors and makes predictions up to 24 hours ahead for precipitation, wind, temperature and dew point. MetNet-3 introduces a key densification technique that implicitly captures data assimilation and produces spatially dense forecasts in spite of the network training on extremely sparse targets. MetNet-3 has a high temporal and spatial resolution of, respectively, up to 2 minutes and 1 km as well as a low operational latency. We find that MetNet-3 is able to outperform the best single- and multi-member NWPs such as HRRR and ENS over the CONUS region for up to 24 hours ahead setting a new performance milestone for observation based neural models. MetNet-3 is operational and its forecasts are served in Google Search in conjunction with other models.

翻訳日:2023-07-07 17:10:50 公開日:2023-07-06
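
The MetNet-3 abstract above mentions training on extremely sparse targets while still producing dense forecasts. The abstract does not say how the densification technique works, so the following is only a generic sketch of one ingredient that such a setup typically needs: a loss evaluated only at grid cells that actually have an observation. The function name, the flattened grid, and the toy values are all assumptions made for illustration, not details from the paper.

```python
def masked_mse(predictions, targets, mask):
    """Mean squared error over grid cells that have an observation.

    predictions, targets: flat lists of floats, one value per grid cell.
    mask: flat list of 0/1 flags; 1 marks a cell with a station reading.
    """
    squared_errors = [
        (p - t) ** 2
        for p, t, m in zip(predictions, targets, mask)
        if m
    ]
    if not squared_errors:
        return 0.0  # no observed cells, so there is nothing to penalise
    return sum(squared_errors) / len(squared_errors)


# Hypothetical 2x3 temperature grid, flattened; only two cells have stations.
dense_forecast = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
station_values = [10.5, 0.0, 0.0, 0.0, 13.0, 0.0]  # zeros are placeholders
station_mask = [1, 0, 0, 0, 1, 0]

loss = masked_mse(dense_forecast, station_values, station_mask)
print(loss)
```

The model still outputs a value for every cell; the mask only decides which cells contribute to the training signal.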

# 近位誘導によるチューニングフリー実画像編集の改善

Improving Tuning-Free Real Image Editing with Proximal Guidance ( http://arxiv.org/abs/2306.05414v3 )

ライセンス: Link先を確認

Ligong Han, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei Ren, Ruijiang Gao, Anastasis Stathopoulos, Xiaoxiao He, Yuxiao Chen, Di Liu, Qilong Zhangli, Jindong Jiang, Zhaoyang Xia, Akash Srivastava, Dimitris Metaxas

(参考訳) DDIMインバージョンは、拡散ベースの手法における実画像編集の顕著な可能性を明らかにした。しかし、編集を強化するためにより大きな分類器フリーガイダンス(CFG)スケールを用いると、DDIM再構成の精度は劣化する。 null-text inversion (NTI) は、より大きなCFGスケールで再構成とインバージョンの軌道を一致させるためにnull埋め込みを最適化し、クロスアテンション制御による実画像編集を可能にする。負のプロンプト反転(NPI)はさらに、NTIのトレーニング不要な閉形式解を提供する。しかし、NPIはアーティファクトを導入する可能性があり、依然としてDDIMの再構成品質に制約されている。これらの制限を克服するため,我々は近位誘導(proximal guidance)を提案し,クロスアテンション制御を伴うNPIに組み込む。我々は、NPIを正則化項と再構成ガイダンスで強化し、トレーニングフリーな性質を生かしながらアーティファクトを減らす。さらに,この概念を拡張して相互自己注意制御を組み込むことにより,編集プロセスにおける形状およびレイアウトの変更を可能にする。提案手法は効率的で単純なアプローチであり,最小限の計算オーバーヘッドで実画像編集タスクに効果的に対処する。

DDIM inversion has revealed the remarkable potential of real image editing within diffusion-based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier-free guidance (CFG) scales are used for enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales, enabling real image editing with cross-attention control. Negative-prompt inversion (NPI) further offers a training-free closed-form solution of NTI. However, it may introduce artifacts and is still constrained by DDIM reconstruction quality. To overcome these limitations, we propose proximal guidance and incorporate it into NPI with cross-attention control. We enhance NPI with a regularization term and reconstruction guidance, which reduces artifacts while capitalizing on its training-free nature. Additionally, we extend the concepts to incorporate mutual self-attention control, enabling geometry and layout alterations in the editing process. Our method provides an efficient and straightforward approach, effectively addressing real image editing tasks with minimal computational overhead.

翻訳日:2023-07-07 17:10:21 公開日:2023-07-06
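
The proximal-guidance abstract above refers to classifier-free guidance (CFG) scales and to a regularization term. As a rough illustration, the sketch below shows the standard CFG combination of unconditional and conditional noise predictions, followed by a variant that shrinks the guidance difference with soft thresholding before scaling. Soft thresholding is the proximal operator of an L1 penalty; whether the paper uses exactly this operator is not stated in the abstract, so treat that choice, the function names, and the toy numbers as assumptions.

```python
def cfg_noise(eps_uncond, eps_cond, scale):
    """Standard CFG: eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def soft_threshold(values, lam):
    """Proximal operator of lam * ||x||_1: shrink each entry towards zero."""
    shrunk = []
    for v in values:
        if v > lam:
            shrunk.append(v - lam)
        elif v < -lam:
            shrunk.append(v + lam)
        else:
            shrunk.append(0.0)  # small differences are removed entirely
    return shrunk


def cfg_noise_with_prox(eps_uncond, eps_cond, scale, lam):
    """CFG where the guidance difference is regularised before scaling."""
    diff = [c - u for u, c in zip(eps_uncond, eps_cond)]
    return [u + scale * d for u, d in zip(eps_uncond, soft_threshold(diff, lam))]


# Toy noise predictions for three latent entries (hypothetical values).
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [0.5, 1.1, -2.0]

plain = cfg_noise(eps_uncond, eps_cond, scale=2.0)
regularised = cfg_noise_with_prox(eps_uncond, eps_cond, scale=2.0, lam=0.2)
print(plain, regularised)
```

In this toy case the second entry, where the conditional and unconditional predictions nearly agree, is left at its unconditional value, so a large scale no longer amplifies that small difference.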

# 統計学者としてのトランスフォーマー:in-contextアルゴリズム選択によるコンテキスト内学習の実現

Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection ( http://arxiv.org/abs/2306.04637v2 )

ライセンス: Link先を確認

Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, Song Mei

(参考訳) トランスフォーマーアーキテクチャに基づくニューラルシーケンスモデルは、訓練例とテスト例をプロンプトとして与えられると、モデルのパラメータを更新することなく新たなタスクを実行できるという、注目すべきin-context learning(ICL)能力を示している。この研究はまず、トランスフォーマーがICLを実行するための包括的な統計理論を提供する。具体的には、トランスフォーマーが最小二乗、リッジ回帰、Lasso、一般化線形モデルの学習、二層ニューラルネットワーク上の勾配降下といった幅広い標準的な機械学習アルゴリズムをコンテキスト内で実装でき、様々なコンテキスト内データ分布に対してほぼ最適な予測力を持つことを示す。コンテキスト内勾配降下の効率的な実装を基礎的な機構として用いることで、我々のトランスフォーマーの構成は穏やかなサイズ境界を持ち、多項式個の事前学習シーケンスで学習できる。これらの「ベース」ICLアルゴリズムに基づいて、興味深いことに、トランスフォーマーはコンテキスト内アルゴリズム選択を伴う、より複雑なICL手続きを実装できることを示す。これは統計学者が実生活で行えることに似ている。すなわち、単一のトランスフォーマーが、正しいアルゴリズムやタスクを明示的にプロンプトされることなく、異なる入力シーケンスに対して異なるベースICLアルゴリズムを適応的に選択し、さらには質的に異なるタスクを実行することができる。我々は,この現象を明示的な構成によって理論的に確立し,実験的にも観察する。理論的には,ICL前テスト(pre-ICL testing)とICL後検証(post-ICL validation)という2つの一般的なアルゴリズム選択機構を具体例とともに構築する。例えば、ICL後検証機構を用いて、ノイズレベルが混在する雑音のある線形モデルという難しいタスクにおいて、ほぼベイズ最適なICLを実行できるトランスフォーマーを構築する。実験により,標準的なトランスフォーマーアーキテクチャの強力なコンテキスト内アルゴリズム選択能力を示す。

Neural sequence models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences. Building on these "base" ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving in-context algorithm selection, akin to what a statistician can do in real life: a single transformer can adaptively select different base ICL algorithms, or even perform qualitatively different tasks, on different input sequences, without any explicit prompting of the right algorithm or task. We both establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: pre-ICL testing, and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task: noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.

翻訳日:2023-07-07 17:10:01 公開日:2023-07-06
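
The abstract above names post-ICL validation as one mechanism for in-context algorithm selection. The sketch below is plain Python, not a transformer construction, and reflects only one plausible reading of the term: fit several base algorithms (here one-dimensional ridge regression with different penalties) on part of the in-context examples, score each on the held-out remainder, and answer the query with the best one. The penalty grid, the split size, and all function names are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for y ~ w * x (no intercept).

    Assumes sum(x^2) + lam is non-zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the linear predictor w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_and_predict(xs, ys, query_x, lams=(0.0, 1.0, 10.0), n_val=2):
    """Pick the penalty with the lowest validation error, then predict."""
    train_x, val_x = xs[:-n_val], xs[-n_val:]
    train_y, val_y = ys[:-n_val], ys[-n_val:]
    best_lam = min(
        lams,
        key=lambda lam: mse(fit_ridge_1d(train_x, train_y, lam), val_x, val_y),
    )
    w = fit_ridge_1d(train_x, train_y, best_lam)
    return best_lam, w * query_x


# Noiseless in-context examples from y = 2x: no shrinkage should win.
context_x = [1.0, 2.0, 3.0, 4.0, 5.0]
context_y = [2.0, 4.0, 6.0, 8.0, 10.0]
best_lam, prediction = select_and_predict(context_x, context_y, query_x=6.0)
print(best_lam, prediction)
```

With noisier examples a larger penalty can win the validation step, which is the kind of adaptivity the abstract describes for mixed noise levels.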

# バナッハ空間の誘導系におけるダイナミクスの収束

Convergence of Dynamics on Inductive Systems of Banach Spaces ( http://arxiv.org/abs/2306.16063v2 )

ライセンス: Link先を確認

Lauritz van Luijk, Alexander Stottmeister, Reinhard F. Werner

(参考訳) 定性的かつ定量的な物理系の多くの特徴は、ある限定的な状況下でのみ、鋭く定義されるか、抽出可能である。例えば、熱力学極限における相転移、量子論からの大きな作用における古典力学の出現、再正規化群固定点から生じる連続量子場理論である。このような多様なアプリケーションで有効な方法がほとんどないように思える。しかし、ここでは理論の極限に対する柔軟なモデリングツールを示す:バナッハ空間の帰納的極限の一般化を構成するソフトインダクティブ極限。この文脈では、ダイナミクスの収束に関する一般的な基準が定式化され、これらの基準が前述の状況に適用されることが示される。

Many features of physical systems, both qualitative and quantitative, become sharply defined or tractable only in some limiting situation. Examples are phase transitions in the thermodynamic limit, the emergence of classical mechanics from quantum theory at large action, and continuum quantum field theory arising from renormalization group fixed points. It would seem that few methods can be useful in such diverse applications. However, we here present a flexible modeling tool for the limit of theories: soft inductive limits constituting a generalization of inductive limits of Banach spaces. In this context, general criteria for the convergence of dynamics will be formulated, and these criteria will be shown to apply in the situations mentioned and more.

翻訳日:2023-07-07 17:03:14 公開日:2023-07-06

# オープンボキャブラリ学習に向けて:調査

Towards Open Vocabulary Learning: A Survey ( http://arxiv.org/abs/2306.15880v2 )

ライセンス: Link先を確認

Jianzong Wu, Xiangtai Li, Shilin Xu, Haobo Yuan, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao

(参考訳) 視覚シーン理解の分野では、ディープニューラルネットワークはセグメンテーション、トラッキング、検出など、さまざまなコアタスクにおいて驚くべき進歩を遂げている。しかし、ほとんどのアプローチはクローズセットの仮定に基づいており、トレーニングセットに存在する事前定義されたカテゴリのみを識別できる。近年、視覚言語事前学習の急速な進歩により、オープンな語彙設定が提案されている。これらの新しいアプローチは、注釈付きラベル空間を超えてカテゴリを見つけ、認識することを目指している。オープン語彙のアプローチは、弱教師付きおよびゼロショット設定に比べて、より一般的で実用的で効果的である。本稿では,その分野における最近の発展を要約し,分析し,オープンな語彙学習の徹底的なレビューを行う。特に,ゼロショット学習,オープンセット認識,分布外検出といった関連する概念と比較することから始める。次に, セグメンテーションと検出に関して, ロングテール問題, 少数ショット設定, ゼロショット設定など, 密接に関連するタスクをいくつか検討する。手法の調査では,まず,予備知識としてクローズセットにおける検出とセグメンテーションの基本的な知識を提示する。次に,オープン語彙学習を用いた様々なシナリオについて検討し,共通設計要素とコアアイデアを同定する。次に、一般的なデータセットとベンチマークにおける最近の検出とセグメンテーションのアプローチを比較した。最後に,今後の研究方向性に関する洞察,課題,議論をまとめる。私たちの知る限り、オープンな語彙学習に関する総合的な文献レビューはこれが初めてである。関連する研究を https://github.com/jianzongwu/Awesome-Open-Vocabulary で継続的に追跡している。

In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the close-set assumption, meaning that the model can only identify pre-defined categories that are present in the training set. Recently, open vocabulary settings were proposed due to the rapid progress of vision language pre-training. These new approaches seek to locate and recognize categories beyond the annotated label space. The open vocabulary approach is more general, practical, and effective compared to weakly supervised and zero-shot settings. This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field. In particular, we begin by comparing it to related concepts such as zero-shot learning, open-set recognition, and out-of-distribution detection. Then, we review several closely related tasks in the case of segmentation and detection, including long-tail problems, few-shot, and zero-shot settings. For the method survey, we first present the basic knowledge of detection and segmentation in close-set as the preliminary knowledge. Next, we examine various scenarios in which open vocabulary learning is used, identifying common design elements and core ideas. Then, we compare the recent detection and segmentation approaches in commonly used datasets and benchmarks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To our knowledge, this is the first comprehensive literature review of open vocabulary learning. We keep tracing related works at https://github.com/jianzongwu/Awesome-Open-Vocabulary.

翻訳日:2023-07-07 17:03:03 公開日:2023-07-06
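
The survey abstract above ties open vocabulary settings to vision-language pre-training and to recognizing categories beyond the annotated label space. A common pattern in that line of work, shown here only as an assumed illustration and not as something the abstract spells out, is to compare a region embedding with text embeddings of arbitrary category names and pick the closest one. The three-dimensional embeddings below are made-up toy values; a real system would obtain them from pretrained image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_open_vocab(region_embedding, text_embeddings):
    """Return the category name whose text embedding is most similar."""
    return max(
        text_embeddings,
        key=lambda name: cosine(region_embedding, text_embeddings[name]),
    )


# Hypothetical text embeddings; "zebra" stands for a category that was never
# annotated in the detector's training set.
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "zebra": [0.0, 0.0, 1.0],
}
region_embedding = [0.1, 0.2, 0.9]

label = classify_open_vocab(region_embedding, text_embeddings)
print(label)
```

Extending the vocabulary then means adding another name and its text embedding to the dictionary, with no change to the classifier itself.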

# UTRNet: 印刷文書における高解像度ウルドゥー文字認識

UTRNet: High-Resolution Urdu Text Recognition In Printed Documents ( http://arxiv.org/abs/2306.15782v2 )

ライセンス: Link先を確認

Abdur Rahman, Arjun Ghosh, and Chetan Arora

(参考訳) 本稿では,高解像度・マルチスケールな意味的特徴抽出を用いて,印刷されたウルドゥー語テキスト認識の課題に対処する新しい手法を提案する。提案するハイブリッドCNN-RNNモデルであるUTRNetアーキテクチャは,ベンチマークデータセット上で最先端の性能を示す。ウルドゥー文字の複雑さへの汎化に苦戦し、十分な注釈付き実世界データも欠いていた従来研究の限界に対処するために,我々は,11,000行以上からなる大規模な注釈付き実世界データセットUTRSet-Realと,実世界によく似た2万行の合成データセットUTRSet-Synthを導入し,さらに既存のIIITHデータセットの正解データ(ground truth)を訂正して,将来の研究のためのより信頼性の高いリソースとした。また、スキャンした文書におけるウルドゥー語テキスト行検出のためのベンチマークデータセットであるUrduDocも提供する。さらに,UTRNetをテキスト検出モデルと統合することにより,印刷文書からエンドツーエンドでウルドゥー語OCRを行うオンラインツールを開発した。我々の研究は、現在のウルドゥー語OCRの限界に対処するだけでなく、この領域における今後の研究の道を開き、ウルドゥー語OCR技術の継続的な進歩を促進する。ソースコード、データセット、アノテーション、訓練済みモデル、オンラインツールを備えたプロジェクトページは、abdur75648.github.io/UTRNet で入手できる。

In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of previous works, which struggle to generalize to the intricacies of the Urdu script and the lack of sufficient annotated real-world data, we have introduced the UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines and UTRSet-Synth, a synthetic dataset with 20,000 lines closely resembling real-world and made corrections to the ground truth of the existing IIITH dataset, making it a more reliable resource for future research. We also provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR from printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research in this area and facilitates the continued advancement of Urdu OCR technology. The project page with source code, datasets, annotations, trained models, and online tool is available at abdur75648.github.io/UTRNet.

翻訳日:2023-07-07 17:02:41 公開日:2023-07-06

# 注意型生成型adversarial networkのための単純かつ効果的なベースライン

A Simple and Effective Baseline for Attentional Generative Adversarial Networks ( http://arxiv.org/abs/2306.14708v2 )

ライセンス: Link先を確認

Mingyu Jin, Chong Zhang, Qinkai Yu, Haochen Xue, Xiaobo Jin, Xi Yang

(参考訳) テキスト記述を通じて生成モデルを導き、高品質な画像を合成するテキスト対画像生成は、革新的で挑戦的なタスクである。近年、アテンション機構に基づいてGANの訓練を導くAttnGAN、自己蒸留技術を採用してジェネレータの性能と画像生成の質を向上させるSD-GAN、複数のジェネレータと識別器を積み重ねることで画像の細部と品質を徐々に改善するStack-GAN++などが提案されている。しかし、こうした一連のGANの改良はいずれもある程度の冗長性を持ち、生成性能と複雑性にある程度影響を及ぼす。我々は、シンプルで効果的という一般的な考え方を用いて、(1) 冗長な構造を除去してAttnGANのバックボーンネットワークを改善し、(2) DAMSMの複数の損失を統合・再構築する。我々の改良は、モデルの性能を変えないことを保証しつつ、モデルサイズと訓練効率を大幅に改善し、最終的にSEAttnGANを提案する。コードは https://github.com/jmyissb/SEAttnGAN で入手可能である。

Synthesising high-quality images from a text description by guiding a generative model is an innovative and challenging task. In recent years, several models have been proposed: AttnGAN, which uses an attention mechanism to guide GAN training; SD-GAN, which adopts a self-distillation technique to improve the performance of the generator and the quality of image generation; and Stack-GAN++, which gradually improves the details and quality of the image by stacking multiple generators and discriminators. However, these improvements to GANs all carry some redundancy, which affects generation performance and complexity to a certain extent. We use the popular simple-and-effective idea to (1) remove redundant structure and improve the backbone network of AttnGAN, and (2) integrate and reconstruct the multiple losses of DAMSM. Our improvements significantly improve model size and training efficiency while ensuring that the model's performance is unchanged, and we finally propose SEAttnGAN. Code is available at https://github.com/jmyissb/SEAttnGAN.

翻訳日:2023-07-07 17:01:58 公開日:2023-07-06

# 歌声変換チャレンジ2023

The Singing Voice Conversion Challenge 2023 ( http://arxiv.org/abs/2306.14422v2 )

ライセンス: Link先を確認

Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda

(参考訳) 本稿では,共通データセットに基づいて異なる音声変換(VC)システムを比較・理解することを目的とした,隔年開催の科学イベントであるVoice Conversion Challenge(VCC)シリーズの最新版を紹介する。今年は歌声変換(singing voice conversion, SVC)に焦点を移し、このチャレンジをSinging Voice Conversion Challenge(SVCC)と命名した。ドメイン内SVCとドメイン間SVCという2つのタスクのために新しいデータベースを構築した。チャレンジは2ヶ月間実施され、2つのベースラインを含む合計26件の応募があった。クラウドソースによる大規模なリスニングテストを通じて,どちらのタスクでも人間レベルの自然性はトップシステムによって達成されたものの,目標話者ほど高い類似度スコアを得られたチームはなかったことを確認した。また、予想通り、ドメイン間SVCは、特に類似性の観点から、ドメイン内SVCよりも難しい。さらに,既存の客観的指標が知覚的な性能を予測できるかを調査したところ,有意な相関に達したものはごくわずかであった。

We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual scientific event aiming to compare and understand different voice conversion (VC) systems based on a common dataset. This year we shifted our focus to singing voice conversion (SVC), and thus named the challenge the Singing Voice Conversion Challenge (SVCC). A new database was constructed for two tasks, namely in-domain and cross-domain SVC. The challenge was run for two months, and in total we received 26 submissions, including 2 baselines. Through a large-scale crowd-sourced listening test, we observed that for both tasks, although human-level naturalness was achieved by the top system, no team was able to obtain a similarity score as high as that of the target speakers. Also, as expected, cross-domain SVC is harder than in-domain SVC, especially in the similarity aspect. We also investigated whether existing objective measurements were able to predict perceptual performance, and found that only a few of them could reach a significant correlation.

翻訳日:2023-07-07 17:01:38 公開日:2023-07-06

# 大規模言語モデルによる中国のきめ細かな金融感情分析

Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models ( http://arxiv.org/abs/2306.14096v2 )

ライセンス: Link先を確認

Yinyu Lan, Yanru Wu, Wang Xu, Weiqiang Feng, Youhao Zhang

(参考訳) 金融ドメインにおけるエンティティレベルのきめ細かい感情分析は、感情分析の重要なサブタスクであり、現在多くの課題に直面している。主な課題は、財務的なテキスト感情分析用に特別に設計された高品質で大規模な注釈付きコーパスが欠如していることであり、それによって効果的なテキスト処理技術を開発するために必要なデータの利用が制限される。大規模言語モデル(llm)の最近の進歩は、自然言語処理タスクにおいて、主に言語パターンマッチングを中心に顕著なパフォーマンスをもたらした。本稿では,企業早期警戒のための中国における財務感情分析データセットFinChina SAを提案する。我々のデータセットを用いて、よく知られたオープンソースのLCMを徹底的に評価し、実験した。我々は、我々のデータセットが、将来の研究の焦点となる実世界の財務感情分析タスクの探索を進めるための貴重なリソースとなると強く信じている。私たちのデータセットと実験結果を複製するすべてのコードがリリースされます。

Entity-level fine-grained sentiment analysis in the financial domain is a crucial subtask of sentiment analysis and currently faces numerous challenges. The primary challenge stems from the lack of high-quality and large-scale annotated corpora specifically designed for financial text sentiment analysis, which in turn limits the availability of data necessary for developing effective text processing techniques. Recent advancements in large language models (LLMs) have yielded remarkable performance in natural language processing tasks, primarily centered around language pattern matching. In this paper, we propose a novel and extensive Chinese fine-grained financial sentiment analysis dataset, FinChina SA, for enterprise early warning. We thoroughly evaluate and experiment with well-known existing open-source LLMs using our dataset. We firmly believe that our dataset will serve as a valuable resource to advance the exploration of real-world financial sentiment analysis tasks, which should be the focus of future research. Our dataset and all code to replicate the experimental results will be released.

翻訳日:2023-07-07 17:01:21 公開日:2023-07-06

# 新型コロナウイルスワクチン接種に関するトピックとパブリックスタンスの関係の可視化

Visualizing Relation Between (De)Motivating Topics and Public Stance toward COVID-19 Vaccine ( http://arxiv.org/abs/2306.12118v2 )

ライセンス: Link先を確認

Ashiqur Rahman and Hamed Alhoori

(参考訳) 現代のソーシャルメディアはコミュニケーションにおいて重要な役割を担っているが、誤った情報や荒らしが簡単に会話を引き継ぎ、これらのプラットフォームで世論を操ることができる。新型コロナウイルス(covid-19)パンデミックの際には、公衆衛生当局が国民にワクチン接種の動機付けを図りながら大きな反発を受けた。緊急時の現在および将来の脅威に対処し、共通の目標に向けて国民を動機付けるためには、公共のモチベーションがどのように変化し、どのトピックが一般市民の間で共鳴しているかを理解することが不可欠である。本研究では、新型コロナウイルス(covid-19)パンデミック時にtwitter圏内で共鳴した話題を検査・分析し、公衆の予防接種に対するスタンスを変えた重要な要因を理解するためのインタラクティブな可視化ツールを提案する。このツールは、視覚分析のあらゆるシナリオに対して容易に一般化することができ、研究者や一般大衆のソーシャルメディアデータの透明性を高めることができる。

While social media plays a vital role in communication nowadays, misinformation and trolls can easily take over the conversation and steer public opinion on these platforms. We saw the effect of misinformation during the COVID-19 pandemic, when public health officials faced significant push-back while trying to motivate the public to vaccinate. To tackle current and future threats in emergencies and motivate the public towards a common goal, it is essential to understand how public motivation shifts and which topics resonate among the general population. In this study, we propose an interactive visualization tool to inspect and analyze the topics that resonated in the Twitter-sphere during the COVID-19 pandemic and to understand the key factors that shifted public stance on vaccination. This tool can easily be generalized to any visual-analysis scenario and can increase the transparency of social media data for researchers and the general population alike.

翻訳日:2023-07-07 17:01:04 公開日:2023-07-06

# 熱2次元混合スピン1/2系の幾何学的位相

Geometric phases for a thermal two-dimensional mixed spin 1/2 system ( http://arxiv.org/abs/2306.11752v3 )

ライセンス: Link先を確認

Y. Ben-Aryeh

(参考訳) 混合状態に対する幾何位相を得るための量子力学的手法を解析する。純粋状態に対する並列輸送方程式は、動的位相を排除した混合状態に一般化される。混合状態の幾何学的位相はパンチャラトナム相として得られ、これは開サイクルにも有効である。幾何相は、NMRや中性子干渉実験で用いられるものと異なる混合熱状態のSU(2)変換によって引き起こされる。ゼロ次ハミルトニアンは、z方向における磁気モーメントと定磁場の相互作用によって与えられるが、本論文で想定される高次摂動は同じz方向の2つの振動磁場からなる。これらの仮定は、幾何相および干渉強度に関する結果が導出される混合熱状態のSU(2)ユニタリ変換の特別な形式をもたらす。

Quantum mechanical methods for getting geometric phases for mixed states are analyzed. Parallel transport equations for pure states are generalized to mixed states by which dynamical phases are eliminated. The geometric phases of mixed states are obtained as Pancharatnam phases which are valid also for open cycles. The geometric phases are derived here by SU(2) transformations of mixed thermal states which are different from those used in NMR and neutron interference experiments. While the zeroth order Hamiltonian is given by the interaction of a magnetic moment and constant magnetic field in the z direction, the high order perturbations assumed in the present article are composed of two oscillating magnetic fields in the same z direction. These assumptions lead to a special form of the SU(2) unitary transformation of the mixed thermal states by which results for geometric phase and for interference intensity are derived.

翻訳日:2023-07-07 17:00:45 公開日:2023-07-06

# ベクトル探索のための共設計ハードウェアとアルゴリズム

Co-design Hardware and Algorithm for Vector Search ( http://arxiv.org/abs/2306.11182v3 )

ライセンス: Link先を確認

Wenqi Jiang and Shigang Li and Yu Zhu and Johannes de Fine Licht and Zhenhao He and Runbin Shi and Cedric Renggli and Shuai Zhang and Theodoros Rekatsinas and Torsten Hoefler and Gustavo Alonso

(参考訳) ベクトル検索は大規模な情報検索と機械学習システムの基盤として現れ、GoogleやBingといった検索エンジンは、エンコードされたクエリテキストとWebドキュメント間のベクトル類似性を評価することによって、ペタバイト規模のドキュメントデータセットで毎秒数万のクエリを処理している。ベクトル検索システムの性能要求が急増するにつれて、アクセラレータハードウェアはポスト・ムーアの法則時代において有望な解決策を提供する。 FPGA上のエンドツーエンドでスケーラブルなベクトル検索フレームワークであるFANNSを紹介する。データセットに対するユーザ指定のリコール要件とハードウェアリソースの予算が与えられると、FANNSはハードウェアとアルゴリズムを自動的に協調設計し、対応するアクセラレータを生成する。このフレームワークは、ハードウェアTCP/IPスタックをアクセラレータに組み込むことでスケールアウトもサポートする。 FANNSは、FPGAおよびCPUのベースラインと比較してそれぞれ最大23.0倍および37.2倍の高速化を達成し、GPUに対して優れたスケーラビリティを示し、8アクセラレータ構成において中央値レイテンシで5.5倍、95パーセンタイル(P95)レイテンシで7.6倍の高速化を達成する。 FANNSの顕著な性能は、データセンターとAIスーパーコンピュータにおける将来のFPGA統合の堅牢な基盤となる。

Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce FANNS, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, FANNS automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. FANNS attains up to 23.0x and 37.2x speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5x and 7.6x speedup in median and 95th percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of FANNS lays a robust groundwork for future FPGA integration in data centers and AI supercomputers.

翻訳日:2023-07-07 17:00:33 公開日:2023-07-06
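
The FANNS abstract above is phrased in terms of a user-provided recall requirement. To make that quantity concrete, the sketch below compares an exact brute-force search with an approximate search that only scans the partitions closest to the query, then reports recall@k. The partitioned (IVF-like) index is an assumption chosen for illustration, since the abstract does not name the index structure, and the vectors, centroids, and the `nprobe` parameter are toy values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, vectors, k):
    """Brute-force search: ids of the k nearest vectors."""
    order = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))
    return order[:k]


def partitioned_top_k(query, vectors, centroids, assignments, k, nprobe):
    """Approximate search: scan only the nprobe partitions nearest the query."""
    cells = sorted(
        range(len(centroids)), key=lambda c: sq_dist(query, centroids[c])
    )[:nprobe]
    candidates = [i for i, cell in enumerate(assignments) if cell in cells]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbours that the approximation found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy database split into two partitions (hypothetical values).
vectors = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [4, 4]]
centroids = [[0.5, 0.5], [10, 10]]
assignments = [0, 0, 0, 1, 1, 0]
query = [6, 6]

truth = exact_top_k(query, vectors, k=2)
recall_1 = recall_at_k(
    partitioned_top_k(query, vectors, centroids, assignments, 2, nprobe=1), truth
)
recall_2 = recall_at_k(
    partitioned_top_k(query, vectors, centroids, assignments, 2, nprobe=2), truth
)
print(recall_1, recall_2)
```

Scanning more partitions raises recall at the cost of more distance computations, which is the trade-off a hardware and algorithm co-design has to settle for a given recall target.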

# Image Matters:マルチモーダル誇張検出のための新しいデータセットと実証的研究

Image Matters: A New Dataset and Empirical Study for Multimodal Hyperbole Detection ( http://arxiv.org/abs/2307.00209v2 )

ライセンス: Link先を確認

Huixuan Zhang, Xiaojun Wan

(参考訳) 誇張法(hyperbole)、すなわち誇張は、一般的な言語現象である。誇張表現の検出は、人間の表現を理解する上で重要な部分である。誇張検出の研究はいくつかあるが、そのほとんどはテキストのモダリティのみに焦点を当てている。しかし、ソーシャルメディアの発展によって、人々はテキスト、画像、ビデオなど、さまざまなモダリティを用いて誇張表現を作成できるようになった。本稿では,マルチモーダル誇張検出に注目する。Weibo(中国のソーシャルメディア)からマルチモーダル検出データセットを作成し(データセットはコミュニティに公開される予定である)、それに関するいくつかの研究を行った。 Weiboの1件の投稿に含まれるテキストと画像を2つのモダリティとして扱い,誇張検出におけるテキストと画像の役割について検討する。このダウンストリームタスクでは、さまざまな事前学習済みマルチモーダルエンコーダも評価し、その性能を示す。さらに、このデータセットは5つの異なるトピックから構築されているため、異なるモデルのクロスドメイン性能も評価する。これらの研究は、ベンチマークとして機能し、マルチモーダル誇張検出に関するさらなる研究の方向性を指摘することができる。

Hyperbole, or exaggeration, is a common linguistic phenomenon. The detection of hyperbole is an important part of understanding human expression. There have been several studies on hyperbole detection, but most of them focus on the text modality only. However, with the development of social media, people can create hyperbolic expressions with various modalities, including text, images, videos, etc. In this paper, we focus on multimodal hyperbole detection. We create a multimodal detection dataset from Weibo (a Chinese social media platform) and carry out some studies on it; the dataset will be released to the community. We treat the text and image from a Weibo post as two modalities and explore the role of text and image for hyperbole detection. Different pre-trained multimodal encoders are also evaluated on this downstream task to show their performance. Besides, since this dataset is constructed from five different topics, we also evaluate the cross-domain performance of different models. These studies can serve as a benchmark and point out the direction of further study on multimodal hyperbole detection.

翻訳日:2023-07-07 16:51:50 公開日:2023-07-06

# バイオメディカル言語モデルは準最適トークン化にロバストである

Biomedical Language Models are Robust to Sub-optimal Tokenization ( http://arxiv.org/abs/2306.17649v2 )

ライセンス: Link先を確認

Bernal Jim\'enez Guti\'errez, Huan Sun, Yu Su

(参考訳) 一般英語とは対照的に、バイオメディカル用語学の多くの概念は、正確で簡潔なことを目標として、近年のバイオメディカル専門家によって設計された。これはしばしば、意味のある生体形態を結合して新しい意味単位を作成することで達成される。しかしながら、現代のほとんどのバイオメディカル言語モデル(LM)は、バイオメディカル言語の凝集特性を明示的に活用することなく、大規模バイオメディカルコーパス統計から派生した標準ドメイン固有のトークン化剤を用いて事前訓練されている。本研究では,バイオメディカルな用語を意味のある構成要素に分割できない標準オープンドメインとバイオメディカルなトークン化剤について述べる。そこで, バイオメディカル用語をより正確に区分するトークン化装置を用いることで, 下流のバイオメディカルNLPタスク, 特に名前付きエンティティ認識(NER)やエンティティリンクなどのバイオメディカル用語を直接含むタスクにおいて, バイオメディカルLMの性能を向上させることができると仮定した。驚くべきことに、より正確なバイオメディカルトークンを使用して生体医学的lmを事前トレーニングすることは、マスク言語モデリング予測(mlm)の精度やnerおよびエンティティリンクのパフォーマンスといったいくつかの本質的および極端的な尺度で測定されるように、言語モデルのエンティティ表現品質を改善するものではない。これらの定量的研究は、実体表現の質をより直接的に探求するケーススタディとともに、生物医学的な事前学習プロセスが準最適トークン化の事例に対して非常に堅牢であることを示している。

As opposed to general English, many concepts in biomedical terminology have been designed in recent history by biomedical professionals with the goal of being precise and concise. This is often achieved by concatenating meaningful biomedical morphemes to create new semantic units. Nevertheless, most modern biomedical language models (LMs) are pre-trained using standard domain-specific tokenizers derived from large scale biomedical corpus statistics without explicitly leveraging the agglutinating nature of biomedical language. In this work, we first find that standard open-domain and biomedical tokenizers are largely unable to segment biomedical terms into meaningful components. Therefore, we hypothesize that using a tokenizer which segments biomedical terminology more accurately would enable biomedical LMs to improve their performance on downstream biomedical NLP tasks, especially ones which involve biomedical terms directly such as named entity recognition (NER) and entity linking. Surprisingly, we find that pre-training a biomedical LM using a more accurate biomedical tokenizer does not improve the entity representation quality of a language model as measured by several intrinsic and extrinsic measures such as masked language modeling prediction (MLM) accuracy as well as NER and entity linking performance. These quantitative findings, along with a case study which explores entity representation quality more directly, suggest that the biomedical pre-training process is quite robust to instances of sub-optimal tokenization.

翻訳日:2023-07-07 16:51:35 公開日:2023-07-06
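
The tokenization abstract above contrasts tokenizers derived from corpus statistics with tokenizers that segment biomedical terms into meaningful morphemes. The sketch below shows how the same greedy longest-match procedure (WordPiece-style, without continuation markers) can split one term very differently depending on the vocabulary. Both vocabularies are tiny hypothetical examples made up for illustration; they are not taken from any real tokenizer or from the paper.

```python
def greedy_tokenize(word, vocab):
    """Greedy longest-match segmentation of a single word.

    Returns ["[UNK]"] if some position cannot be matched by any vocab entry.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            return ["[UNK]"]
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical vocabularies: one with frequency-driven fragments, one with
# morpheme-like units ("nephro" for kidney, "pathy" for disease).
corpus_vocab = {"neph", "rop", "athy"}
morpheme_vocab = {"nephro", "pathy"}

corpus_split = greedy_tokenize("nephropathy", corpus_vocab)
morpheme_split = greedy_tokenize("nephropathy", morpheme_vocab)
print(corpus_split, morpheme_split)
```

The abstract's finding is that the second kind of split, although linguistically cleaner, did not improve the downstream measures the authors examined.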

# 古典的および量子力学による原子表面散乱の物理過程の比較

Comparison of physical processes of atom-surface scattering computed by classical and quantum dynamics ( http://arxiv.org/abs/2306.17483v2 )

ライセンス: Link先を確認

Tapas Sahoo and Eli Pollak

(参考訳) 我々は,原子表面散乱の物理過程に関する動的量,例えば波状の(corrugated)熱表面から散乱した粒子のトラップ確率,平均エネルギー損失,最終的な角分布を計算するために,古典的および量子力学的シミュレーションを行った。ここでは面内散乱に限定し,粒子の自由度として垂直距離 z と水平座標 x の2つだけを考慮すればよいようにした。さらに, 表面の熱フォノン浴との相互作用により垂直座標のみが揺らぐと仮定した。古典的シミュレーションのための系と浴の初期位相空間変数は,量子力学計算の初期波動関数から導かれたウィグナー分布関数に従って生成した。非常に低い入射エネルギーでは、波状の熱表面から脱出した粒子の量子力学的な平均エネルギー損失は、特定の表面温度において古典的な値よりも小さいことが分かった。古典シミュレーションにより得られた散乱粒子の脱出率は、表面温度の上昇とともに増加することに注意が必要である。一方、量子的な脱出率は粒子の入射エネルギー2 meVではほぼ温度に依存しないが、5 meVでは古典的な結果と同じ傾向を示し、かつ古典的な脱出率よりも低い。また、散乱粒子の最終的な角分布は古典力学と量子力学とで定性的に異なるが、いずれもおおむね温度に依存しないことも分かった。

We have performed classical and quantum dynamical simulations to calculate dynamical quantities for physical processes of atom-surface scattering, e.g., the trapping probability, the average energy loss, and the final angular distribution of a particle scattered from a corrugated thermal surface. Here we have restricted ourselves to in-plane scattering, so that only two degrees of freedom of the particle have to be considered: the vertical distance z and the horizontal coordinate x. Moreover, we assumed that only the vertical coordinate fluctuates due to interaction with the thermal phonon bath of the surface. Initial phase-space variables of the system and the bath for our classical simulations were generated according to Wigner distribution functions, which were derived from the initial wavefunctions of our quantum dynamics. At very low incident energy, we have found that the quantum mechanical average energy loss of the particle escaping from the corrugated thermal surface is smaller than the classical one at a particular surface temperature. It is important to note that the escape rate of the scattered particle obtained by classical simulation increases with increasing surface temperature. On the other hand, the quantum rate is almost temperature independent at 2 meV incident energy of the particle, whereas at 5 meV it shows the same trend as the classical results, and the quantum rate is lower than the classical rate. We have also noticed that the final angular distributions of the scattered particle for classical and quantum dynamics are qualitatively different, but the quantities are more or less temperature independent.

翻訳日:2023-07-07 16:51:06 公開日:2023-07-06

# 情報的非平衡の動的資源理論

The Dynamical Resource Theory of Informational Non-Equilibrium ( http://arxiv.org/abs/2306.16848v2 )

ライセンス: Link先を確認

Benjamin Stratton, Chung-Yun Hsieh, Paul Skrzypczyk

(参考訳) 情報は熱力学の理解に欠かせない。彼らの相互作用は、熱力学変換への情報的貢献を分離できる完全に縮退したハミルトニアンを通じて研究されている。この設定では、最大混合状態以外の全ての状態は情報非平衡状態であると考えられる。情報的非平衡を維持するために量子力学の能力をどのように特徴付けるか? ここでは, 情報的非平衡可観測性に関する動的資源理論を導入し, この問いへの答えを述べる。許容される演算のキャラクタリゼーションは、キュービットチャネルとn次元ワイル共変チャネル(一般チャネルの物理的関連部分集合)に対して与えられる。ベル状態測定を伴う状態識別ゲームの操作解釈が与えられる。最後に、チャネルの古典的容量と情報非平衡を維持する能力との明示的なリンクを作る。

Information is instrumental in our understanding of thermodynamics. Their interplay has been studied through completely degenerate Hamiltonians whereby the informational contributions to thermodynamic transformations can be isolated. In this setting, all states other than the maximally mixed state are considered to be in informational non-equilibrium. An important yet still open question is: how to characterise the ability of quantum dynamics to maintain informational non-equilibrium? Here, the dynamical resource theory of informational non-equilibrium preservability is introduced to begin providing an answer to this question. A characterisation of the allowed operations is given for qubit channels and the n-dimensional Weyl-covariant channels, a physically relevant subset of the general channels. An operational interpretation of a state discrimination game with Bell state measurements is given. Finally, an explicit link between a channel's classical capacity and its ability to maintain informational non-equilibrium is made.

翻訳日:2023-07-07 16:50:42 公開日:2023-07-06

# KITE:セマンティックマニピュレーションのためのキーポイント型ポリシー

KITE: Keypoint-Conditioned Policies for Semantic Manipulation ( http://arxiv.org/abs/2306.16605v2 )

ライセンス: Link先を確認

Priya Sundaresan, Suneel Belkhale, Dorsa Sadigh, Jeannette Bohg

(参考訳) 自然言語は人間とロボットに便利な共有インターフェースを提供するが、ロボットが言語コマンドを解釈し従わせることは、操作において長年の課題である。動作指示追従ロボットを実現するための重要なステップは、ロボットが「ぬいぐるみを拾い上げる」といった高レベルな指示から「象の左耳を磨く」といったより詳細な入力まで、異なる特異性で言語を解釈する意味操作を実現することである。そこで我々は,シーンセマンティクス(視覚的場面における異なるオブジェクトの識別)とオブジェクトセマンティクス(正確にはオブジェクトインスタンス内の異なる部分のローカライズ)の両方に対応する意味操作のための2段階のフレームワークであるKeypoints + Instructions to Execution (KITE)を提案する。 KITEは、まず2次元画像キーポイントを通して視覚シーンに入力命令を接地し、下流アクション推論のための高精度なオブジェクト中心バイアスを提供する。 KITEはRGB-Dシーンの観察を行い、学習されたキーポイント条件のスキルを実行して命令を実行する。キーポイントの精度とパラメータ化スキルを組み合わせることで、シーンやオブジェクトのバリエーションを一般化したきめ細かい操作が可能になる。実世界の3つの環境 – 長距離6-DoFテーブルトップ操作,意味的把握,高精度コーヒー製造タスク – において,KITEを実証した。これらの設定では、KITEはそれぞれ75%、70%、全体の71%の成功率を達成している。 KITEは、キーポイントベースのグラウンドよりも事前訓練されたビジュアル言語モデルを選択するフレームワークや、エンドツーエンドのビジュモータコントロールを優先して省略スキルを向上する。追加資料、データセット、コード、ビデオは、私たちのWebサイトにある。

While natural language offers a convenient shared interface for humans and robots, enabling robots to interpret and follow language commands remains a longstanding challenge in manipulation. A crucial step to realizing a performant instruction-following robot is achieving semantic manipulation, where a robot interprets language at different specificities, from high-level instructions like "Pick up the stuffed animal" to more detailed inputs like "Grab the left ear of the elephant." To tackle this, we propose Keypoints + Instructions to Execution (KITE), a two-step framework for semantic manipulation which attends to both scene semantics (distinguishing between different objects in a visual scene) and object semantics (precisely localizing different parts within an object instance). KITE first grounds an input instruction in a visual scene through 2D image keypoints, providing a highly accurate object-centric bias for downstream action inference. Provided an RGB-D scene observation, KITE then executes a learned keypoint-conditioned skill to carry out the instruction. The combined precision of keypoints and parameterized skills enables fine-grained manipulation with generalization to scene and object variations. Empirically, we demonstrate KITE in 3 real-world environments: long-horizon 6-DoF tabletop manipulation, semantic grasping, and a high-precision coffee-making task. In these settings, KITE achieves a 75%, 70%, and 71% overall success rate for instruction-following, respectively. KITE outperforms frameworks that opt for pre-trained visual language models over keypoint-based grounding, or omit skills in favor of end-to-end visuomotor control, all while being trained from fewer or comparable amounts of demonstrations. Supplementary material, datasets, code, and videos can be found on our website: http://tinyurl.com/kite-site.

翻訳日:2023-07-07 16:50:30 公開日:2023-07-06

# Pareto Optimal Self-supervisionによるLLM校正と幻覚自動検出

LLM Calibration and Automatic Hallucination Detection via Pareto Optimal Self-supervision ( http://arxiv.org/abs/2306.16564v2 )

ライセンス: Link先を確認

Theodore Zhao, Mu Wei, J. Samuel Preston, Hoifung Poon

(参考訳) 大規模言語モデル (LLM) は、そのままの状態で幅広い応用において目覚ましい能力を示しているが、精度には依然として大きな改善の余地があり、特にバイオメディシンのようなミッションクリティカルな領域で顕著である。 LLM応答の信頼度を校正する効果的な方法は、エラーを自動的に検出し、人間参加型(human-in-the-loop)の検証を容易にするために不可欠である。校正信号の重要な情報源は、専門家が定めるプログラム的な教師信号である。これは低コストで利用できることが多いが、ノイズやカバレッジといった独自の限界がある。本稿では、利用可能なプログラム的な教師信号を活用し、追加の手作業なしに各応答のリスクスコアを生成することで、LLM応答を体系的に校正できるパレート最適な自己教師ありフレームワークを提案する。これは、LLMの出力を他の利用可能な教師信号源と整合させるハーモナイザモデルを学習することで達成され、このモデルはより不確実なLLM応答により高いリスクスコアを割り当て、エラー修正を容易にする。生物医学領域および一般領域における標準的な関係抽出タスクの実験は本手法の有望性を示しており、提案するリスクスコアはLLMの実際の誤り率と高い相関を示した。最も不確実なテスト事例に対しては、提案するリスクスコアに基づく動的プロンプトにより既製のLLMの精度が大幅に向上し、難易度の高い評価データセットにおいて、GPT-3の結果は最先端(SOTA)の弱教師あり手法を、GPT-4の結果はSOTAの教師あり手法の結果を上回った。

Large language models (LLMs) have demonstrated remarkable capabilities out of the box for a wide range of applications, yet accuracy remains a major growth area, especially in mission-critical domains such as biomedicine. An effective method to calibrate the confidence level on LLM responses is essential to automatically detect errors and facilitate human-in-the-loop verification. An important source of calibration signals stems from expert-stipulated programmatic supervision, which is often available at low cost but has its own limitations such as noise and coverage. In this paper, we introduce a Pareto optimal self-supervision framework that can leverage available programmatic supervision to systematically calibrate LLM responses by producing a risk score for every response, without any additional manual effort. This is accomplished by learning a harmonizer model to align LLM output with other available supervision sources, which would assign higher risk scores to more uncertain LLM responses and facilitate error correction. Experiments on standard relation extraction tasks in biomedical and general domains demonstrate the promise of this approach, with our proposed risk scores highly correlated with the real error rate of LLMs. For the most uncertain test instances, dynamic prompting based on our proposed risk scores results in significant accuracy improvement for off-the-shelf LLMs, boosting GPT-3 results past state-of-the-art (SOTA) weak supervision and GPT-4 results past SOTA supervised results on challenging evaluation datasets.

翻訳日:2023-07-07 16:49:53 公開日:2023-07-06

# ランダム文および自然言語文によるストラーラー数の統計力学

Statistical Mechanics of Strahler Number via Random and Natural Language Sentences ( http://arxiv.org/abs/2307.02697v1 )

ライセンス: Link先を確認

Kumiko Tanaka-Ishii and Akira Tanaka

(参考訳) ストラーラー数は当初、河川分岐の複雑さを特徴付けるために提案され、様々な応用が見出されてきた。本稿では、自然言語文の木構造に対するストラーラー数の上限と下限の計算を提案する。こうした木構造は大規模データセットとして利用可能であり、統計力学的な解析が可能である。文法的に注釈付けされたデータに対する経験的な測定により、自然言語文のストラーラー数は、Strahler (1957) と Horton (1945) が報告した河川分岐の場合と同様に、ほぼ常に3または4であることが示された。この数の背後にある理論から、それが特定のモデルの下で文を処理するのに必要なメモリ量の下限であることを示す。ランダム木の数学的解析は、ストラーラー数の性質に関するさらなる予想を与え、それが定数ではなく対数的に増大することを明らかにする。この発見は、一般的な木構造の特徴量としてのストラーラー数の背後にある統計的な基礎を明らかにする。

The Strahler number was originally proposed to characterize the complexity of river bifurcation and has found various applications. This article proposes computation of the Strahler number's upper and lower limits for natural language sentence tree structures, which are available in a large dataset allowing for statistical mechanics analysis. Through empirical measurements across grammatically annotated data, the Strahler number of natural language sentences is shown to be almost always 3 or 4, similar to the case of river bifurcation as reported by Strahler (1957) and Horton (1945). From the theory behind the number, we show that it is the lower limit of the amount of memory required to process sentences under a particular model. A mathematical analysis of random trees provides a further conjecture on the nature of the Strahler number, revealing that it is not a constant but grows logarithmically. This finding uncovers the statistical basics behind the Strahler number as a characteristic of a general tree structure target.
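
As a rough illustration of the quantity discussed above, the following sketch computes the standard Strahler number of a rooted tree. The nested-list tree encoding and the function name `strahler` are assumptions made for this example; the paper's own upper and lower limits for sentence trees are not reproduced here.

```python
def strahler(tree):
    """Strahler number of a rooted tree given as nested lists.

    A leaf is the empty list []; an internal node is a list of its children.
    This encoding is a hypothetical choice for illustration only.
    """
    if not tree:
        # A leaf has Strahler number 1.
        return 1
    child_values = [strahler(child) for child in tree]
    top = max(child_values)
    # The number increases by one only when the maximum is attained
    # by at least two children; otherwise the maximum is passed up unchanged.
    return top + 1 if child_values.count(top) >= 2 else top


if __name__ == "__main__":
    chain = [[[]]]                     # a path: no branching
    balanced = [[[], []], [[], []]]    # perfect binary tree with 4 leaves
    print(strahler(chain), strahler(balanced))
```

A path-like tree keeps the value 1 however long it is, while a perfectly balanced binary tree gains one level of the number per doubling of its leaves, which is consistent with the logarithmic growth mentioned in the abstract.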

翻訳日:2023-07-07 15:45:09 公開日:2023-07-06

# フェアネスレンズを通して:エンティティマッチングの実験解析と評価

Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching ( http://arxiv.org/abs/2307.02726v1 )

ライセンス: Link先を確認

Nima Shahbazi, Nikola Danevski, Fatemeh Nargesian, Abolfazl Asudeh, Divesh Srivastava

(参考訳) エンティティマッチング(EM)は、さまざまなコミュニティが半世紀以上にわたって研究してきた困難な問題である。アルゴリズムの公平性も、機械のバイアスとその社会的影響に対処するための時宜を得たトピックとなっている。これら2つのトピックに関する広範な研究にもかかわらず、エンティティマッチングの公平性にはほとんど注意が払われてこなかった。このギャップに対処するため、本論文では様々なEM手法を広範囲に実験的に評価する。公平性のレンズを通してEMを監査するために、公開データセットから2つのソーシャルデータセットを生成した。本研究の結果は、実社会でよく見られる2つの条件下での潜在的な不公平性を浮き彫りにする。 (i)一部の人口集団が過剰に代表されている場合、 (ii)一部のグループで他のグループよりも名前が互いに似ている場合である。多くの発見の中でも特筆すべきは、様々な公平性の定義はそれぞれ異なる設定で有用であるものの、EMのクラス不均衡という性質のため、陽性的中率パリティや真陽性率パリティといった尺度のほうが、一般にEMの不公平性を明らかにする能力が高いことである。

Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on these two topics, little attention has been paid to the fairness of entity matching. Towards addressing this gap, we perform an extensive experimental evaluation of a variety of EM techniques in this paper. We generated two social datasets from publicly available datasets for the purpose of auditing EM through the lens of fairness. Our findings underscore potential unfairness under two common conditions in real-world societies: (i) when some demographic groups are overrepresented, and (ii) when names are more similar in some groups compared to others. Among our many findings, it is noteworthy to mention that while various fairness definitions are valuable for different settings, due to EM's class imbalance nature, measures such as positive predictive value parity and true positive rate parity are, in general, more capable of revealing EM unfairness.
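
To make the two measures named above concrete, here is a minimal sketch that computes positive predictive value (PPV) and true positive rate (TPR) per group and reports the largest gap between groups. The record format `(group, true_label, predicted_label)` and the function names are assumptions for this example, not the paper's evaluation code.

```python
def ppv_tpr(labels, preds):
    """Return (PPV, TPR) for binary labels/predictions (1 = match)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, tpr


def parity_gaps(records):
    """records: iterable of (group, true_label, predicted_label) tuples.

    Returns per-group (PPV, TPR) plus the max-min gap of each measure.
    """
    by_group = {}
    for group, y, p in records:
        ys, ps = by_group.setdefault(group, ([], []))
        ys.append(y)
        ps.append(p)
    per_group = {g: ppv_tpr(ys, ps) for g, (ys, ps) in by_group.items()}
    ppvs = [v[0] for v in per_group.values()]
    tprs = [v[1] for v in per_group.values()]
    return per_group, max(ppvs) - min(ppvs), max(tprs) - min(tprs)


if __name__ == "__main__":
    toy = [("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 1, 0),
           ("B", 1, 1), ("B", 0, 0), ("B", 1, 0), ("B", 1, 0)]
    print(parity_gaps(toy))
```

Neither PPV nor TPR uses the count of true negatives, which is one plausible reason they remain informative when non-matches vastly outnumber matches, as is typical in entity matching.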

翻訳日:2023-07-07 15:34:08 公開日:2023-07-06

# 知識蒸留によるキーワードスポッティングのためのオンデバイス制約付き自己教師付き音声表現学習

On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation ( http://arxiv.org/abs/2307.02720v1 )

ライセンス: Link先を確認

Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu

(参考訳) 大規模な自己教師ありモデルは効果的な特徴抽出器であるが、オンデバイスの予算制約や偏ったデータセット収集の下では、特にキーワードスポッティングにおいて、その適用は困難である。そこで我々は、オンデバイスキーワードスポッティングのための、知識蒸留に基づく自己教師あり音声表現学習(S3RL)アーキテクチャを提案した。提案手法では、教師・生徒の枠組みを用い、デュアルビュー相互相関蒸留と教師のコードブックを学習目標として、より大規模で複雑なモデルからより小型で軽量なモデルへ知識を転移した。 16.6k時間の社内データセットを用いて、Alexaのキーワードスポッティング検出タスクでモデルの性能を評価した。本手法は通常条件および雑音条件で卓越した性能を示し、オンデバイスのリソース制約の範囲内で、キーワードスポッティングタスク向けの自己教師ありモデルを構築する上での知識蒸留法の有効性を示した。

Large self-supervised models are effective feature extractors, but their application is challenging under on-device budget constraints and biased dataset collection, especially in keyword spotting. To address this, we proposed a knowledge distillation-based self-supervised speech representation learning (S3RL) architecture for on-device keyword spotting. Our approach used a teacher-student framework to transfer knowledge from a larger, more complex model to a smaller, light-weight model using dual-view cross-correlation distillation and the teacher's codebook as learning objectives. We evaluated our model's performance on an Alexa keyword spotting detection task using a 16.6k-hour in-house dataset. Our technique showed exceptional performance in normal and noisy conditions, demonstrating the efficacy of knowledge distillation methods in constructing self-supervised models for keyword spotting tasks while working within on-device resource constraints.

翻訳日:2023-07-07 15:33:53 公開日:2023-07-06

# 不確かさサンプリングを理解する

Understanding Uncertainty Sampling ( http://arxiv.org/abs/2307.02719v1 )

ライセンス: Link先を確認

Shang Liu, Xiaocheng Li

(参考訳) 不確実性サンプリングは、現在の予測モデルが不確実であるデータサンプルのアノテーションを逐次的に問い合わせる、広く使われている能動学習アルゴリズムである。しかし、不確実性サンプリングの利用は概ねヒューリスティックであった。 (i)特定の損失の下での特定のタスクに対する「不確実性」の適切な定義について合意がない。 (ii)アルゴリズムを実装するための標準的な手順、例えば、確率的勾配降下法のような最適化アルゴリズムの枠組みの下で逐次的に到着する注釈付きデータをどう扱うかを規定する理論的保証がない。本研究では、ストリームベースとプールベースの両方の能動学習の下で不確実性サンプリングアルゴリズムを体系的に検討する。用いる不確実性尺度と元の損失関数に依存する等価損失の概念を提案し、不確実性サンプリングアルゴリズムが本質的にこの等価損失を最適化していることを示す。この観点は、既存の不確実性尺度の適切性を、代理性(surrogate property)と損失の凸性という2つの側面から検証する。さらに、不確実性尺度を設計するための新しい概念「loss as uncertainty」を提案する。その考え方は、特徴が与えられたときの条件付き期待損失を不確実性尺度として用いることである。このような不確実性尺度は、優れた解析的性質と、分類問題と回帰問題の両方をカバーする一般性を備えており、これにより、基礎となるモデルと問題について完全に一般的な形で、ストリームベースとプールベースの両方の設定における不確実性サンプリングアルゴリズムに対する最初の汎化誤差上界を与えることができる。最後に、不確実性サンプリングアルゴリズムのある種の変種と、リスクに敏感な目的関数および分布的ロバスト性との間の関係を確立する。これは、サンプルサイズが小さい場合の不確実性サンプリングアルゴリズムの利点を部分的に説明しうる。

Uncertainty sampling is a prevalent active learning algorithm that queries sequentially the annotations of data samples which the current prediction model is uncertain about. However, the usage of uncertainty sampling has been largely heuristic: (i) There is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) There is no theoretical guarantee that prescribes a standard protocol to implement the algorithm, for example, how to handle the sequentially arriving annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss which depends on the used uncertainty measure and the original loss function and establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. The perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new notion for designing uncertainty measures called "loss as uncertainty". The idea is to use the conditional expected loss given the features as the uncertainty measure. Such an uncertainty measure has nice analytical properties and generality to cover both classification and regression problems, which enable us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in the full generality of the underlying model and problem. Lastly, we establish connections between certain variants of the uncertainty sampling algorithms with risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.
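
The pool-based setting described above can be sketched in a few lines. The example below ranks unlabeled samples by predictive entropy and returns the indices to query; the function names, the `budget` parameter, and the choice of entropy as the uncertainty measure are assumptions for illustration. Entropy equals the expected log loss when the label is drawn from the model's own predictive distribution, so it is in the spirit of "loss as uncertainty", but it is not claimed to be the paper's estimator.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predictive class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, budget):
    """Pick the `budget` most uncertain pool samples.

    pool_probs: list of class-probability lists, one per unlabeled sample,
                as produced by the current model (hypothetical input format).
    Returns the indices of the samples whose labels should be requested.
    """
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.99, 0.01]]
    print(select_queries(pool, budget=2))
```

In a stream-based variant, the same score would instead be compared against a threshold (or used as a query probability) as each sample arrives, rather than ranking a whole pool.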

翻訳日:2023-07-07 15:33:36 公開日:2023-07-06

# TL-nvSRAM-CIM: DC-Power Free Restore と Ternary MAC 操作による超高密度3レベル ReRAM-Assisted Computing-in-nvSRAM

TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations ( http://arxiv.org/abs/2307.02717v1 )

ライセンス: Link先を確認

Dengfeng Wang, Liukai Xu, Songyuan Liu, Zhi Li, Yiming Chen, Weifeng He, Xueqing Li and Yanan Su

(参考訳) 大規模NNのすべての重みをオンチップに収容することは、オンチップ容量が限られたSRAMベースのコンピューティングインメモリ(SRAM-CIM)にとって、依然として大きな課題である。従来の不揮発性SRAM-CIM(nvSRAM-CIM)は、重みの格納用に高密度のシングルレベルReRAMを高効率SRAM-CIMの上に集積することでこの問題に対処し、オフチップメモリアクセスを不要にした。しかし、従来のSL-nvSRAM-CIMは、SL-ReRAMの数が増えた場合のスケーラビリティの低さと、計算効率の制約という問題を抱えている。これらの課題を克服するために、本研究は大規模NNモデル向けの超高密度3レベルReRAM支援コンピューティング・イン・不揮発性SRAM(TL-nvSRAM-CIM)方式を提案する。クラスタ化されたn-selector-n-ReRAM (cluster-nSnRs) を用いて、DC電力を不要にした信頼性の高い重みリストアを行う。さらに、高いNN精度を維持しつつエネルギー効率のよい三値MAC演算を行うために、差動計算方式を用いた三値SRAM-CIM機構を提案する。提案するTL-nvSRAM-CIMは、最先端の研究と比較して7.8倍高いストレージ密度を達成する。さらに、TL-nvSRAM-CIMは、SRAM-CIMおよびReRAM-CIMのベースライン設計と比較して、それぞれ最大2.9倍および1.9倍高いエネルギー効率を示す。

Accommodating all the weights on-chip for large-scale NNs remains a great challenge for SRAM-based computing-in-memory (SRAM-CIM) with limited on-chip capacity. Previous non-volatile SRAM-CIM (nvSRAM-CIM) addresses this issue by integrating high-density single-level ReRAMs on top of high-efficiency SRAM-CIM for weight storage to eliminate off-chip memory access. However, previous SL-nvSRAM-CIM suffers from poor scalability for an increased number of SL-ReRAMs and limited computing efficiency. To overcome these challenges, this work proposes an ultra-high-density three-level ReRAMs-assisted computing-in-nonvolatile-SRAM (TL-nvSRAM-CIM) scheme for large NN models. The clustered n-selector-n-ReRAM (cluster-nSnRs) is employed for reliable weight-restore with eliminated DC power. Furthermore, a ternary SRAM-CIM mechanism with a differential computing scheme is proposed for energy-efficient ternary MAC operations while preserving high NN accuracy. The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density compared with state-of-the-art works. Moreover, TL-nvSRAM-CIM shows up to 2.9x and 1.9x enhanced energy efficiency compared to the baseline designs of SRAM-CIM and ReRAM-CIM, respectively.
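
The arithmetic behind a ternary multiply-accumulate (MAC) with a differential scheme can be illustrated in software. The sketch below only shows the numerical idea, assuming weights in {-1, 0, +1} and a result formed as the difference of a "positive" and a "negative" partial sum; the function name is hypothetical and nothing here models the circuit proposed in the paper.

```python
def ternary_mac(weights, activations):
    """Multiply-accumulate with ternary weights, computed differentially.

    weights:     sequence of values in {-1, 0, +1}
    activations: sequence of numbers of the same length
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    if any(w not in (-1, 0, 1) for w in weights):
        raise ValueError("weights must be ternary: -1, 0 or +1")
    # Accumulate the +1 and -1 branches separately, then take the difference.
    positive = sum(a for w, a in zip(weights, activations) if w == 1)
    negative = sum(a for w, a in zip(weights, activations) if w == -1)
    return positive - negative


if __name__ == "__main__":
    print(ternary_mac([1, -1, 0, 1], [3, 2, 7, 4]))
```

Because no true multiplication is needed, each weight only decides which accumulator an activation joins (or whether it is skipped), which is the property that makes ternary weights attractive for in-memory computing.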

翻訳日:2023-07-07 15:33:09 公開日:2023-07-06

# CFSum:マルチモーダル要約のための粗から細への寄与ネットワーク

CFSum: A Coarse-to-Fine Contribution Network for Multimodal Summarization ( http://arxiv.org/abs/2307.02716v1 )

ライセンス: Link先を確認

Min Xiao, Junnan Zhu, Haitao Lin, Yu Zhou, Chengqing Zong

(参考訳) マルチモーダル要約は通常、視覚モダリティの寄与が不明確であるという問題を抱えている。既存のマルチモーダル要約手法は、異なるモダリティの融合方法の設計に重点を置く一方で、視覚モダリティが有用となる適応的な条件を無視している。そこで本研究では、要約に対する画像の異なる寄与を考慮するための、マルチモーダル要約向けの新しい粗から細への(Coarse-to-Fine)寄与ネットワーク CFSum を提案する。まず、無用な画像による干渉を排除するため、無用な画像を捨てるプリフィルタモジュールを提案する。次に、有用な画像を的確に利用するため、単語レベルと句レベルという2つのレベルの視覚補完モジュールを提案する。具体的には、画像の寄与を計算し、それを用いてテキストと視覚の両モダリティの注意を導く。実験の結果、CFSumは標準ベンチマークにおいて複数の強力なベースラインを大幅に上回ることが示された。さらに分析により、有用な画像は、画像中に暗黙的に表現されている非視覚的な単語の生成にも役立ちうることが確認された。

Multimodal summarization usually suffers from the problem that the contribution of the visual modality is unclear. Existing multimodal summarization approaches focus on designing the fusion methods of different modalities, while ignoring the adaptive conditions under which visual modalities are useful. Therefore, we propose a novel Coarse-to-Fine contribution network for multimodal Summarization (CFSum) to consider different contributions of images for summarization. First, to eliminate the interference of useless images, we propose a pre-filter module to abandon useless images. Second, to make accurate use of useful images, we propose two levels of visual complement modules, word level and phrase level. Specifically, image contributions are calculated and are adopted to guide the attention of both textual and visual modalities. Experimental results have shown that CFSum significantly outperforms multiple strong baselines on the standard benchmark. Furthermore, the analysis verifies that useful images can even help generate non-visual words which are implicitly represented in the image.

翻訳日:2023-07-07 15:32:42 公開日:2023-07-06

# 多類似度コントラスト学習

Multi-Similarity Contrastive Learning ( http://arxiv.org/abs/2307.02712v1 )

ライセンス: Link先を確認

Emily Mu, John Guttag, Maggie Makar

(参考訳) 類似度の尺度が与えられたとき、コントラスト学習の手法は、類似する例が互いに近づけられ、類似しない例が引き離される表現を学習する。コントラスト学習の技術は、画像分類からキャプション生成に至るまで、さまざまなタスクの表現を学習するために広く利用されてきた。しかし、既存のコントラスト学習のアプローチは、異なる類似性関係が存在する可能性を考慮していないため、汎化に失敗することがある。本稿では、複数の類似度尺度からの教師信号を同時に活用することで汎化可能な埋め込みを学習する、新しいマルチ類似度コントラスト損失(MSCon)を提案する。提案手法は、対応する類似度の不確実性に基づいてコントラスト学習における類似度の重み付けを自動的に学習し、不確実なタスクの重みを下げることで、新しいタスクへのドメイン外汎化を改善する。 MSConで学習したネットワークが、ドメイン内およびドメイン外の設定で最先端のベースラインを上回ることを実験的に示す。

Given a similarity metric, contrastive methods learn a representation in which examples that are similar are pushed together and examples that are dissimilar are pulled apart. Contrastive learning techniques have been utilized extensively to learn representations for tasks ranging from image classification to caption generation. However, existing contrastive learning approaches can fail to generalize because they do not take into account the possibility of different similarity relations. In this paper, we propose a novel multi-similarity contrastive loss (MSCon), that learns generalizable embeddings by jointly utilizing supervision from multiple metrics of similarity. Our method automatically learns contrastive similarity weightings based on the uncertainty in the corresponding similarity, down-weighting uncertain tasks and leading to better out-of-domain generalization to new tasks. We show empirically that networks trained with MSCon outperform state-of-the-art baselines on in-domain and out-of-domain settings.

翻訳日:2023-07-07 15:32:23 公開日:2023-07-06

# 不正確な正解ラベルを用いた評価のための論理的評価式の実用性検証

Validation of the Practicability of Logical Assessment Formula for Evaluations with Inaccurate Ground-Truth Labels ( http://arxiv.org/abs/2307.02709v1 )

ライセンス: Link先を確認

Yongquan Yang and Hong Bu

(参考訳) 論理的評価式 (LAF) は、様々な人工知能応用の予測モデルを評価するために、不正確な正解ラベル (IAGTL) を用いた評価向けに提案された新しい理論である。しかし、IAGTLを用いた評価におけるLAFの実用性は、実世界の実践においてまだ検証されていない。本稿では、この課題に対処するため、医用病理組織学的全スライド画像解析(MHWSIA)における乳がんの腫瘍セグメンテーション(TSfBC)にLAFを適用した。実験結果と解析は、TSfBCの場合におけるIAGTLを用いた評価に対するLAFの妥当性を示し、MHWSIAに適用したLAFの可能性を反映している。

Logical assessment formula (LAF) is a new theory proposed for evaluations with inaccurate ground-truth labels (IAGTLs) to assess the predictive models for various artificial intelligence applications. However, the practicability of LAF for evaluations with IAGTLs has not yet been validated in real-world practice. In this paper, to address this issue, we applied LAF to tumour segmentation for breast cancer (TSfBC) in medical histopathology whole slide image analysis (MHWSIA). Experimental results and analysis show the validity of LAF for evaluations with IAGTLs in the case of TSfBC and reflect the potentials of LAF applied to MHWSIA.

翻訳日:2023-07-07 15:32:05 公開日:2023-07-06

# 対称性を考慮した周期材料の生成に向けて

Towards Symmetry-Aware Generation of Periodic Materials ( http://arxiv.org/abs/2307.02707v1 )

ライセンス: Link先を確認

Youzhi Luo, Chengkai Liu, Shuiwang Ji

(参考訳) 深層モデルを用いた周期材料の生成問題を考える。対称性を考慮した分子生成は広く研究されているが、周期材料は異なる対称性を持ち、それらは既存の方法では完全には捉えられていない。本稿では、周期的な材料構造の物理的対称性を捉えられる新しい材料生成手法 SyMat を提案する。 SyMatは、変分オートエンコーダモデルを用いて原子タイプの集合、格子長、格子角を生成することで、材料の原子タイプと格子を生成する。さらに、SyMatはスコアベースの拡散モデルを用いて材料の原子座標を生成し、その座標拡散過程では対称性を考慮した新しい確率モデルを用いる。 SyMatが材料に対するすべての対称性変換について理論的に不変であることを示し、ランダム生成および特性最適化タスクにおいてSyMatが有望な性能を達成することを実証する。

We consider the problem of generating periodic materials with deep models. While symmetry-aware molecule generation has been studied extensively, periodic materials possess different symmetries, which have not been completely captured by existing methods. In this work, we propose SyMat, a novel material generation approach that can capture physical symmetries of periodic material structures. SyMat generates atom types and lattices of materials through generating atom type sets, lattice lengths and lattice angles with a variational auto-encoder model. In addition, SyMat employs a score-based diffusion model to generate atom coordinates of materials, in which a novel symmetry-aware probabilistic model is used in the coordinate diffusion process. We show that SyMat is theoretically invariant to all symmetry transformations on materials and demonstrate that SyMat achieves promising performance on random generation and property optimization tasks.

翻訳日:2023-07-07 15:31:51 公開日:2023-07-06

# 積分ゆらぎ定理とトレース保存写像

Integral fluctuation theorems and trace-preserving map ( http://arxiv.org/abs/2307.02705v1 )

ライセンス: Link先を確認

Zhiqiang Huang

(参考訳) 詳細なゆらぎ定理はエントロピー生成確率の生成関数に関する対称性を意味する。積分ゆらぎ定理は、この対称性と確率の正規化からすぐに従う。本稿では,生成関数を完全正の写像で書き直し,積分 ft が構築した写像のトレース保存特性によって決定されることを示す。固有状態変動定理と2つの系間の熱交換を議論することで,この枠組みの利便性を実証する。この手法は準確率生成関数にも適用でき、petzリカバリ写像はこの枠組みから自然に生じることが分かる。また、変動散逸定理の一般化の研究に役立つであろうマルチタイムプロセスの関数生成についても、簡潔に論じる。

The detailed fluctuation theorem implies the symmetry on the generating function of the entropy production probability. The integral fluctuation theorem follows immediately from this symmetry and normalization of the probability. In this paper, we rewrite the generating function with completely positive maps and show that the integral FT is determined by the trace-preserving property of constructed maps. We demonstrate the convenience of this framework by discussing the eigenstate fluctuation theorem and the heat exchange between two systems. This set of methods is also applicable to the generating function of quasi-probability, and we find that the Petz recovery map can arise naturally from this framework. We also briefly discuss generating functions for multitime processes, which may be helpful in studying the generalization of the fluctuation-dissipation theorem.
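
The step from the detailed to the integral fluctuation theorem stated in the first two sentences can be written out in the classical textbook form. The notation below is an assumption made for this sketch ($P_F$ and $P_B$ for the forward and backward entropy-production distributions, $G(\lambda)$ for a real-exponent generating function); the paper's own conventions and its operator-level formulation may differ.

```latex
% Assumed notation: P_F, P_B are the forward/backward distributions of the
% entropy production \sigma, and G(\lambda) = \langle e^{-\lambda\sigma} \rangle.
\begin{align}
  \frac{P_F(\sigma)}{P_B(-\sigma)} &= e^{\sigma}
    && \text{(detailed fluctuation theorem)} \\
  G_F(\lambda) \equiv \int \mathrm{d}\sigma\, P_F(\sigma)\, e^{-\lambda\sigma}
    &= \int \mathrm{d}\sigma\, P_B(-\sigma)\, e^{(1-\lambda)\sigma}
     = G_B(1-\lambda)
    && \text{(symmetry of the generating function)} \\
  \langle e^{-\sigma} \rangle_F = G_F(1) &= G_B(0)
     = \int \mathrm{d}\sigma\, P_B(\sigma) = 1
    && \text{(integral fluctuation theorem)}
\end{align}
```

The last line uses only the normalization of $P_B$, which is the classical counterpart of the trace-preserving property that the abstract identifies as the origin of the integral FT.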

翻訳日:2023-07-07 15:31:37 公開日:2023-07-06

# 拡散モデルを用いた局所制御によるカラーパレットの適用

Applying a Color Palette with Local Control using Diffusion Models ( http://arxiv.org/abs/2307.02698v1 )

ライセンス: Link先を確認

Vaibhav Vavilala and David Forsyth

(参考訳) ファンタジーカードアートの文脈で、2つの新しい編集手順を実証する。パレット転送は、指定された参照パレットを所与のカードに適用する。ファンタジーアートでは、望まれるパレットの変化が非常に大きくなることがあり、アートの「見た目」に大きな変化をもたらす。ベクトル量子化、マッチング、および(拡散モデルを用いた)「ベクトル逆量子化」からなるパイプラインが、極端なパレット転送を成功させることを示す。セグメント制御により、アーティストは1つ以上の画像セグメントを移動でき、必要に応じて結果の色を指定できる。これら2種類の編集を組み合わせることで、セグメントを移動してから再着色する、再着色してから一部のセグメントに所定の色を強制するといった、有用なワークフローが得られる。難易度の高いYu-Gi-Ohカードアートデータセットで提案手法を実証する。

We demonstrate two novel editing procedures in the context of fantasy card art. Palette transfer applies a specified reference palette to a given card. For fantasy art, the desired change in palette can be very large, leading to huge changes in the "look" of the art. We demonstrate that a pipeline of vector quantization; matching; and "vector dequantization" (using a diffusion model) produces successful extreme palette transfers. Segment control allows an artist to move one or more image segments, and to optionally specify the desired color of the result. The combination of these two types of edit yields valuable workflows, including: move a segment, then recolor; recolor, then force some segments to take a prescribed color. We demonstrate our methods on the challenging Yu-Gi-Oh card art dataset.

翻訳日:2023-07-07 15:31:25 公開日:2023-07-06

# ALPCAH:Tail Singular Value Regularizationを用いたサンプルワイズ・ヘテロスケダスティックPCA

ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization ( http://arxiv.org/abs/2307.02745v1 )

ライセンス: Link先を確認

Javier Salazar Cavazos, Jeffrey A. Fessler, Laura Balzano

(参考訳) 主成分分析(PCA)はデータの次元削減における重要なツールであり、様々なデータサイエンスの問題に有用である。しかし、多くの応用では、データ源ごとに異なるノイズ特性のために品質にばらつきのある異種データを扱う。このような混合データセットを扱う手法は、ヘテロスケダスティック(不等分散)な手法として知られている。 HePPCATのような現在の手法は、基底係数にガウス性の仮定を置くが、これは実際には成り立たない可能性がある。重み付きPCA (WPCA) のような他の手法はノイズ分散が既知であると仮定するが、実際にはそれを知ることは難しい場合がある。本稿では、サンプルごとのノイズ分散を推定し、その情報をモデルで用いることで、データの低ランク構造に対応する部分空間基底の推定を改善できるPCA手法を開発する。これは、低ランク成分に関する分布の仮定も、ノイズ分散が既知であるという仮定も置かずに行われる。シミュレーションにより、データのこうした不等分散性を考慮することの有効性と、良質なデータだけを残す場合と比べて全データに対して本手法を用いることの利点を示し、PCA、Robust PCA (RPCA)、HePPCATなど文献で確立された他のPCA手法との比較を行う。コードは https://github.com/javiersc1/ALPCAH で利用可能。

Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction that is useful for various data science problems. However, many applications involve heterogeneous data that varies in quality due to noise characteristics associated with different sources of the data. Methods that deal with this mixed dataset are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions of the basis coefficients that may not hold in practice. Other methods such as Weighted PCA (WPCA) assume the noise variances are known, which may be difficult to know in practice. This paper develops a PCA method that can estimate the sample-wise noise variances and use this information in the model to improve the estimate of the subspace basis associated with the low-rank structure of the data. This is done without distributional assumptions of the low-rank component and without assuming the noise variances are known. Simulations show the effectiveness of accounting for such heteroscedasticity in the data, the benefits of using such a method with all of the data versus retaining only good data, and comparisons are made against other PCA methods established in the literature like PCA, Robust PCA (RPCA), and HePPCAT. Code available at https://github.com/javiersc1/ALPCAH

翻訳日:2023-07-07 15:25:44 公開日:2023-07-06

# 表情認識のためのコントラスト事前学習によるアクティブラーニング

Active Learning with Contrastive Pre-training for Facial Expression Recognition ( http://arxiv.org/abs/2307.02744v1 )

ライセンス: Link先を確認

Shuvendu Roy, Ali Etemad

(参考訳) ディープラーニングは、大規模なモデルと大量のラベル付きデータのおかげで、表情認識(FER)の成功に重要な役割を果たしてきた。しかし、ラベル付きデータを得るには膨大な人的労力、時間、資金が必要となる。いくつかの先行研究は、さまざまな教師なし手法を用いて大量のラベル付きデータの必要性を減らすことに焦点を当ててきたが、能動学習(アクティブラーニング)と呼ばれるもう1つの有望なアプローチは、FERの文脈ではほとんど検討されていない。このアプローチでは、限られた「ラベル付け予算」を最大限に活用するために、ラベルなし集合から最も代表的なサンプルを選択してラベル付けする。本稿では、3つの公開FERデータセット(FER13、RAF-DB、KDEF)に対して、最近の8つの能動学習手法を実装し、検討する。その結果、既存の能動学習手法はFERの文脈ではうまく機能しないことが分かった。これは、ラベル付きサンプルの初期集合がデータセット全体を十分に代表していない場合に生じる「コールドスタート」と呼ばれる現象によるものと考えられる。この問題に対処するために、まずラベルなしデータセット全体に基づいて基礎となる表現を学習する、コントラスト自己教師あり事前学習を提案する。その後に能動学習手法を適用すると、この2段階のアプローチは、ランダムサンプリングに対して最大9.2%、事前学習なしの既存の最良の能動学習ベースラインに対して最大6.7%の改善を示すことが観察された。本研究のコードは、論文の公開時に github.com/ShuvenduRoy/ActiveFER で公開する予定である。

Deep learning has played a significant role in the success of facial expression recognition (FER), thanks to large models and vast amounts of labelled data. However, obtaining labelled data requires a tremendous amount of human effort, time, and financial resources. Even though some prior works have focused on reducing the need for large amounts of labelled data using different unsupervised methods, another promising approach called active learning is barely explored in the context of FER. This approach involves selecting and labelling the most representative samples from an unlabelled set to make the best use of a limited 'labelling budget'. In this paper, we implement and study 8 recent active learning methods on three public FER datasets, FER13, RAF-DB, and KDEF. Our findings show that existing active learning methods do not perform well in the context of FER, likely suffering from a phenomenon called 'Cold Start', which occurs when the initial set of labelled samples is not well representative of the entire dataset. To address this issue, we propose contrastive self-supervised pre-training, which first learns the underlying representations based on the entire unlabelled dataset. We then follow this with the active learning methods and observe that our 2-step approach shows up to 9.2% improvement over random sampling and up to 6.7% improvement over the best existing active learning baseline without the pre-training. We will make the code for this study public upon publication at: github.com/ShuvenduRoy/ActiveFER.

翻訳日:2023-07-07 15:25:22 公開日:2023-07-06

# ターゲットドメイン記述を用いたDense Retrieval Adaptation

Dense Retrieval Adaptation using Target Domain Description ( http://arxiv.org/abs/2307.02740v1 )

ライセンス: Link先を確認

Helia Hashemi, Yong Zhuang, Sachith Sri Ram Kothur, Srivas Prasad, Edgar Meij, W. Bruce Croft

(参考訳) 情報検索(IR)において、ドメイン適応とは、データ分布がソースドメインとは異なる新しいドメインに検索モデルを適応させるプロセスである。この分野の既存手法は、対象の文書コレクションにアクセスできる教師なしドメイン適応、または、それに加えて対象ドメインの(限られた)ラベル付きデータにもアクセスできる教師あり(多くの場合 few-shot の)ドメイン適応に焦点を当てている。適応を行わない検索モデルのゼロショット性能の向上に関する研究も存在する。本稿では、IRにおいてこれまで検討されていない、ドメイン適応の新しいカテゴリを導入する。ここでは、ゼロショット設定と同様に、検索モデルは対象の文書コレクションにアクセスできないと仮定する。一方で、対象ドメインを説明する短いテキスト記述にはアクセスできる。対象ドメインに適応させうるソースドメインのさまざまな特性を理解するために、検索タスクにおけるドメイン属性の分類体系を定義する。テキストによるドメイン記述が与えられたときに、合成文書コレクション、クエリ集合、擬似関連性ラベルを生成する、新しい自動データ構築パイプラインを導入する。 5つの多様な対象ドメインに対する広範な実験により、構築した合成データを用いて高密度検索モデルを適応させることで、対象ドメインにおいて効果的な検索性能が得られることが示された。

In information retrieval (IR), domain adaptation is the process of adapting a retrieval model to a new domain whose data distribution is different from the source domain. Existing methods in this area focus on unsupervised domain adaptation where they have access to the target document collection or supervised (often few-shot) domain adaptation where they additionally have access to (limited) labeled data in the target domain. There also exists research on improving zero-shot performance of retrieval models with no adaptation. This paper introduces a new category of domain adaptation in IR that is as-yet unexplored. Here, similar to the zero-shot setting, we assume the retrieval model does not have access to the target document collection. In contrast, it does have access to a brief textual description that explains the target domain. We define a taxonomy of domain attributes in retrieval tasks to understand different properties of a source domain that can be adapted to a target domain. We introduce a novel automatic data construction pipeline that produces a synthetic document collection, query set, and pseudo relevance labels, given a textual domain description. Extensive experiments on five diverse target domains show that adapting dense retrieval models using the constructed synthetic data leads to effective retrieval performance on the target domain.

翻訳日:2023-07-07 15:24:54 公開日:2023-07-06

# RecallM: 時間的コンテキスト理解と質問応答のためのアーキテクチャ

RecallM: An Architecture for Temporal Context Understanding and Question Answering ( http://arxiv.org/abs/2307.02738v1 )

ライセンス: Link先を確認

Brandon Kynoch, Hugo Latapie

(参考訳) 大規模言語モデル(LLM)ベースのチャットボットのための理想的な長期記憶メカニズムは、継続学習や複雑な推論の基盤となり、逐次的および時間的な依存関係の学習を可能にする。このタイプの記憶メカニズムを作ることは、極めて難しい問題である。本稿では、長期記憶の効果を達成するためのさまざまな方法を検討する。 AGIシステムのための、適応可能かつ更新可能な長期記憶の構築に焦点を当てた新しいアーキテクチャを提案する。さまざまな実験を通じて、RecallMアーキテクチャの利点、特にそれがもたらす時間的理解の向上を実証する。

The ideal long-term memory mechanism for Large Language Model (LLM) based chatbots would lay the foundation for continual learning and complex reasoning, and allow sequential and temporal dependencies to be learnt. Creating this type of memory mechanism is an extremely challenging problem. In this paper, we explore different methods of achieving the effect of long-term memory. We propose a new architecture focused on creating adaptable and updatable long-term memory for AGI systems. We demonstrate through various experiments the benefits of the RecallM architecture, particularly the improved temporal understanding it provides.

翻訳日:2023-07-07 15:24:35 公開日:2023-07-06

# 学習に基づく肝臓 $T_1\rho$ マッピングと解析のための不確実性支援フレームワーク

An Uncertainty Aided Framework for Learning based Liver $T_1\rho$ Mapping and Analysis ( http://arxiv.org/abs/2307.02736v1 )

ライセンス: Link先を確認

Chaoxing Huang, Vincent Wai Sun Wong, Queen Chan, Winnie Chiu Wing Chu, Weitian Chen

(参考訳) 目的: 定量的 $T_1\rho$ イメージングは、肝疾患における生化学的変化を評価できる可能性がある。定量的 $T_1\rho$ イメージングを高速化するために、深層学習手法が用いられてきた。複雑な臨床環境で人工知能に基づく定量的イメージング手法を採用するには、予測された $T_1\rho$ 値の不確実性を推定し、定量化結果の信頼水準を示すことが有用である。この不確実性は、事後の定量的解析やモデル学習タスクの支援にも活用されるべきである。アプローチ: このニーズに対処するために、学習ベースの $T_1\rho$ マッピングのためのパラメトリックマップ精緻化アプローチを提案し、不確実性をモデル化するために確率的な方法でモデルを学習する。また、不確実性マップを用いて、改良版 $T_1\rho$ マッピングネットワークの学習を空間的に重み付けしてマッピング性能をさらに向上させること、および関心領域内で $T_1\rho$ 値が信頼できない画素を除去することを提案する。このフレームワークは、肝線維化ステージの異なる51名の患者のデータセットで検証された。主な結果: 学習ベースのマップ精緻化手法は、相対マッピング誤差を3%未満にし、同時に不確実性の推定値も提供することが示された。推定された不確実性は実際の誤差レベルを反映しており、相対 $T_1\rho$ マッピング誤差を2.60%までさらに低減し、関心領域内の信頼できない画素を効果的に除去するために利用できる。意義: 本研究は、提案手法が、肝臓の信頼できる $T_1\rho$ マッピングのための学習ベースの定量的MRIシステムを提供できる可能性を示している。

Objective: Quantitative $T_1\rho$ imaging has potential for assessment of biochemical alterations of liver pathologies. Deep learning methods have been employed to accelerate quantitative $T_1\rho$ imaging. To employ artificial intelligence-based quantitative imaging methods in complicated clinical environments, it is valuable to estimate the uncertainty of the predicted $T_1\rho$ values to provide the confidence level of the quantification results. The uncertainty should also be utilized to aid the post-hoc quantitative analysis and model learning tasks. Approach: To address this need, we propose a parametric map refinement approach for learning-based $T_1\rho$ mapping and train the model in a probabilistic way to model the uncertainty. We also propose to utilize the uncertainty map to spatially weight the training of an improved $T_1\rho$ mapping network to further improve the mapping performance and to remove pixels with unreliable $T_1\rho$ values in the region of interest. The framework was tested on a dataset of 51 patients with different liver fibrosis stages. Main results: Our results indicate that the learning-based map refinement method leads to a relative mapping error of less than 3% and provides uncertainty estimation simultaneously. The estimated uncertainty reflects the actual error level, and it can be used to further reduce relative $T_1\rho$ mapping error to 2.60% as well as removing unreliable pixels in the region of interest effectively. Significance: Our studies demonstrate the proposed approach has potential to provide a learning-based quantitative MRI system for trustworthy $T_1\rho$ mapping of the liver.
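
The post-hoc use of an uncertainty map described above, discarding unreliable pixels before computing a region-of-interest error, can be sketched as follows. The flat-list pixel representation, the `threshold` parameter, and the function name are hypothetical choices for this example; the paper's actual filtering rule and error definition are not given in the abstract.

```python
def masked_relative_error(predicted, reference, uncertainty, threshold):
    """Mean relative error over pixels whose uncertainty is <= threshold.

    predicted, reference, uncertainty: equal-length sequences of per-pixel
    values inside a region of interest (hypothetical flat representation).
    Returns (mean_relative_error, number_of_pixels_kept).
    """
    kept = [(p, r) for p, r, u in zip(predicted, reference, uncertainty)
            if u <= threshold]
    if not kept:
        raise ValueError("no pixel passes the uncertainty threshold")
    mean_error = sum(abs(p - r) / r for p, r in kept) / len(kept)
    return mean_error, len(kept)


if __name__ == "__main__":
    pred = [40.0, 44.0, 60.0]   # illustrative values only
    ref = [40.0, 40.0, 40.0]
    unc = [0.1, 0.2, 0.9]
    print(masked_relative_error(pred, ref, unc, threshold=0.5))
```

In this toy case the high-uncertainty pixel is also the one with the largest error, so filtering lowers the mean relative error, which mirrors the abstract's claim that the estimated uncertainty tracks the actual error level.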

翻訳日:2023-07-07 15:24:25 公開日:2023-07-06

# MMNet:シークエンシャルディープフェイク検出のためのマルチコラボレーションとマルチスーパービジョンネットワーク

MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection ( http://arxiv.org/abs/2307.02733v1 )

ライセンス: Link先を確認

Ruiyang Xia, Decheng Liu, Jie Li, Lin Yuan, Nannan Wang, Xinbo Gao

(参考訳) 高度な操作技術は、偽造顔画像などの偽造メディアの生成を通じて、犯罪者に社会的なパニックを引き起こしたり不正な利益を得たりする機会を与えてきた。これに対し,画像の信頼性を評価するため,様々なディープフェイク検出手法が提案されている。ディープフェイク検出の拡張であるシークエンシャルディープフェイク検出は、回復のために偽造された顔領域を正しい順序で特定することを目的としている。しかし、空間的およびシーケンシャルな操作の組み合わせが異なるため、偽造顔画像は検出性能に重大な影響を及ぼすかなりの相違を示す。さらに、偽造画像の復元には、逆変換を実装するために操作モデルの知識が必要であるが、関連する技術は攻撃者によって隠蔽されることが多いため、その把握は困難である。これらの課題に対処するために, 偽造顔画像における様々な空間スケールや順列を扱い, 対応する操作方法の知識を必要とせずに回復を実現するマルチコラボレーション・マルチスーパービジョンネットワーク (MMNet) を提案する。さらに, 既存の評価指標は単一の推論ステップでの検出精度のみを考慮しており, 連続する複数ステップにおける正解(ground-truth)との一致度を考慮していない。この制限を克服するために,複数の推論ステップにおける検出精度を考慮し,偽造シーケンス全体を検出する能力を反映する完全系列マッチング(CSM)と呼ばれる新しい評価指標を提案する。いくつかの典型的なデータセットに対する大規模な実験は、MMNetが最先端の検出性能と独立した回復性能を達成することを示した。

Advanced manipulation techniques have provided criminals with opportunities to cause social panic or gain illicit profits through the generation of deceptive media, such as forged face images. In response, various deepfake detection methods have been proposed to assess image authenticity. Sequential deepfake detection, which is an extension of deepfake detection, aims to identify forged facial regions with the correct sequence for recovery. Nonetheless, due to the different combinations of spatial and sequential manipulations, forged face images exhibit substantial discrepancies that severely impact detection performance. Additionally, the recovery of forged images requires knowledge of the manipulation model to implement inverse transformations, which is difficult to ascertain as relevant techniques are often concealed by attackers. To address these issues, we propose the Multi-Collaboration and Multi-Supervision Network (MMNet), which handles various spatial scales and sequential permutations in forged face images and achieves recovery without requiring knowledge of the corresponding manipulation method. Furthermore, existing evaluation metrics only consider detection accuracy at a single inferring step, without accounting for the matching degree with the ground truth under continuous multiple steps. To overcome this limitation, we propose a novel evaluation metric called Complete Sequence Matching (CSM), which considers the detection accuracy at multiple inferring steps, reflecting the ability to detect integrally forged sequences. Extensive experiments on several typical datasets demonstrate that MMNet achieves state-of-the-art detection performance and independent recovery performance.

翻訳日:2023-07-07 15:23:55 公開日:2023-07-06

# 評価器を評価する: 現在のFew-Shot学習ベンチマークは目的に適っているか?

Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose? ( http://arxiv.org/abs/2307.02732v1 )

ライセンス: Link先を確認

Luísa Shimabucoro, Timothy Hospedales, Henry Gouk

(参考訳) Few-Shot Learningのための多くのベンチマークがここ10年間提案されている。しかし、これらのベンチマークはすべて多数のタスクにわたって平均した性能に重点を置いており、個々のタスクのためにトレーニングされたモデルをどのように確実に評価しチューニングするかという問題は解決されていない。本稿では,タスクレベルの評価 - モデルをデプロイする上での基本的なステップについて,最初の調査を行う。数ショット設定における性能推定器の精度を計測し,モデル選択の戦略を検討し,通常ロバストであると考えられる評価器の故障の原因を考察する。その結果,モデルの性能を直接推定する上では分割数の少ないクロスバリデーションが最適であり,モデル選択の目的にはブートストラップや分割数の多いクロスバリデーションの方が適している,という結論を得た。全体として、既存の数ショット学習のベンチマークは、個々のタスクでメソッドがいかに効果的に使えるかの信頼性の高い図を得られるように設計されていない。

Numerous benchmarks for Few-Shot Learning have been proposed in the last decade. However, all of these benchmarks focus on performance averaged over many tasks, and the question of how to reliably evaluate and tune models trained for individual tasks in this regime has not been addressed. This paper presents the first investigation into task-level evaluation -- a fundamental step when deploying a model. We measure the accuracy of performance estimators in the few-shot setting, consider strategies for model selection, and examine the reasons for the failure of evaluators usually thought of as being robust. We conclude that cross-validation with a low number of folds is the best choice for directly estimating the performance of a model, whereas using bootstrapping or cross-validation with a large number of folds is better for model selection purposes. Overall, we find that existing benchmarks for few-shot learning are not designed in such a way that one can get a reliable picture of how effectively methods can be used on individual tasks.

翻訳日:2023-07-07 15:23:29 公開日:2023-07-06
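
The task-level evaluation described in this entry can be illustrated with a minimal sketch of k-fold cross-validation on a single few-shot support set. Everything below is an assumption made for illustration and not the authors' code: the nearest-centroid classifier, the helper names `nearest_centroid_predict` and `kfold_accuracy`, and the synthetic two-class data are all hypothetical.

```python
import numpy as np


def nearest_centroid_predict(train_x, train_y, test_x):
    """Label each test point with the class whose training mean is closest."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


def kfold_accuracy(x, y, n_folds, seed=0):
    """Estimate accuracy on one task by k-fold cross-validation over its support set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    folds = np.array_split(order, n_folds)
    correct = 0
    for held_out in folds:
        # Train on every example that is not in the held-out fold.
        train_idx = np.setdiff1d(order, held_out)
        pred = nearest_centroid_predict(x[train_idx], y[train_idx], x[held_out])
        correct += int(np.sum(pred == y[held_out]))
    return correct / len(x)


# Hypothetical 2-way 5-shot task with well-separated synthetic classes.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5.0, 0.1, size=(5, 2)),
                    rng.normal(5.0, 0.1, size=(5, 2))])
y = np.array([0] * 5 + [1] * 5)

print("2-fold estimate:", kfold_accuracy(x, y, n_folds=2))
print("10-fold (leave-one-out) estimate:", kfold_accuracy(x, y, n_folds=10))
```

Varying `n_folds` on the same support set is the knob the abstract refers to when it contrasts a low number of folds (performance estimation) with a large number of folds (model selection).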

# 細粒度アクション分析:フィギュアスケートのマルチモダリティとマルチタスクデータセット

Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating ( http://arxiv.org/abs/2307.02730v1 )

ライセンス: Link先を確認

Sheng-Lan Liu, Yu-Ning Ding, Si-Fan Zhang, Wen-Yue Chen, Ning Zhou, Hao Liu, Gui-Hong Lao

(参考訳) 既存のアクションデータセットのきめ細かいアクション分析は、不十分なアクションカテゴリ、低い粒度、限られたモダリティ、タスクによって挑戦される。本稿では,世界フィギュアスケート選手権から収集したフィギュアスケート(MMFS)のマルチモダリティとマルチタスクデータセットを提案する。行動認識と行動品質評価を持つMMFSは、RGB、スケルトンをキャプチャし、空間ラベルや時間ラベルを含む256のカテゴリを持つ11671クリップからアクションのスコアを収集する。私たちのデータセットの主な貢献は、以下の3つの側面に分類できます。 (1) 個別に空間的・時間的カテゴリーを提案し, より詳細な行動認識と品質評価を行う。 (2) MMFSはまず, 複雑なきめ細かい動作品質評価のための骨格モードを導入する。 (3) マルチモーダリティとマルチタスクデータセットは、より多くのアクション分析モデルを促進する。データセットをベンチマークするために、アクション認識とアクション品質評価のためのRGBおよびスケルトンベースのベースライン手法を採用した。

Fine-grained action analysis on existing action datasets is challenged by insufficient action categories, low granularity, and limited modalities and tasks. In this paper, we propose a Multi-modality and Multi-task dataset of Figure Skating (MMFS), which was collected from the World Figure Skating Championships. MMFS, which supports action recognition and action quality assessment, captures RGB and skeleton modalities and collects action scores from 11671 clips with 256 categories, including spatial and temporal labels. The key contributions of our dataset fall into three aspects. (1) Independent spatial and temporal categories are proposed for the first time to further explore fine-grained action recognition and quality assessment. (2) MMFS is the first to introduce the skeleton modality for complex fine-grained action quality assessment. (3) Our multi-modality and multi-task dataset encourages more action analysis models. To benchmark our dataset, we adopt RGB-based and skeleton-based baseline methods for action recognition and action quality assessment.

翻訳日:2023-07-07 15:23:11 公開日:2023-07-06

# テキストアライメントは大規模NLPタスクのための効率的な統一モデル

Text Alignment Is An Efficient Unified Model for Massive NLP Tasks ( http://arxiv.org/abs/2307.02729v1 )

ライセンス: Link先を確認

Yuheng Zha, Yichi Yang, Ruichen Li, Zhiting Hu

(参考訳) 大きな言語モデル(LLM)は、通常、次の単語予測の関数として設計され、広範なNLPタスクに優れていた。一般性にもかかわらず、次の単語予測は多くの場合、多くのタスクにおいて効率的な定式化ではなく、極端な規模のモデルパラメータ(数百億から数千億)を必要とし、時には準最適性能をもたらす。実際には、より効率的なモデルを構築することが望ましいことが多い -- 汎用性は低いが、問題のかなりのサブセットに適用され、はるかに小さいモデルサイズで同等あるいは優れたパフォーマンスを提供する。本稿では,テキスト含意,類似性,質問応答(と回答可能性),事実整合性などを含む幅広い重要なタスクに対して,テキストアライメントを効率的な統一モデルとして提案する。一対のテキストが与えられると、モデルはその情報間のアライメントの度合いを測定する。 28データセットの5.9M例を用いて,RoBERTa(355Mパラメータ)の軽量微調整によりアライメントモデル(Align)をインスタンス化する。コンパクトなサイズにもかかわらず、広範な実験によりモデルの効率性と高い性能が示された。 (1) 前述の多様なタスクの20以上のデータセットにおいて、本モデルは約2倍または10倍のパラメータを持つFLAN-T5モデルに匹敵するか上回り、単一の統一モデルは個々のデータセットで微調整されたタスク固有モデルも上回る。 (2) 23のデータセットで言語生成の事実整合性の評価に適用すると、本モデルは、はるかに大きなGPT-3.5(ChatGPT)や、場合によってはGPT-4を含む様々なベースラインを上回る。 (3) この軽量モデルは、質問応答タスクにおいてGPT-3.5などのLLMのアドオンコンポーネントとしても機能し、回答不能な質問を識別することで、平均の完全一致(EM)スコアを17.94、F1スコアを15.05改善する。

Large language models (LLMs), typically designed as a function of next-word prediction, have excelled across extensive NLP tasks. Despite the generality, next-word prediction is often not an efficient formulation for many of the tasks, demanding an extreme scale of model parameters (10s or 100s of billions) and sometimes yielding suboptimal performance. In practice, it is often desirable to build more efficient models -- despite being less versatile, they still apply to a substantial subset of problems, delivering on par or even superior performance with much smaller model sizes. In this paper, we propose text alignment as an efficient unified model for a wide range of crucial tasks involving text entailment, similarity, question answering (and answerability), factual consistency, and so forth. Given a pair of texts, the model measures the degree of alignment between their information. We instantiate an alignment model (Align) through lightweight finetuning of RoBERTa (355M parameters) using 5.9M examples from 28 datasets. Despite its compact size, extensive experiments show the model's efficiency and strong performance: (1) On over 20 datasets of aforementioned diverse tasks, the model matches or surpasses FLAN-T5 models that have around 2x or 10x more parameters; the single unified model also outperforms task-specific models finetuned on individual datasets; (2) When applied to evaluate factual consistency of language generation on 23 datasets, our model improves over various baselines, including the much larger GPT-3.5 (ChatGPT) and sometimes even GPT-4; (3) The lightweight model can also serve as an add-on component for LLMs such as GPT-3.5 in question answering tasks, improving the average exact match (EM) score by 17.94 and F1 score by 15.05 through identifying unanswerable questions.

翻訳日:2023-07-07 15:22:54 公開日:2023-07-06

# 階層的エンパワーメント:扱いやすいエンパワーメントに基づくスキル学習に向けて

Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill-Learning ( http://arxiv.org/abs/2307.02728v1 )

ライセンス: Link先を確認

Andrew Levy, Sreehari Rammohan, Alessandro Allievi, Scott Niekum, George Konidaris

(参考訳) 汎用エージェントには大量のスキルのレパートリーが必要である。スキルと状態の間の最大相互情報量であるエンパワーメントは、異なるスキルの大規模なコレクションを学ぶための経路を提供するが、相互情報量の最適化は困難である。我々は,目標条件付き階層強化学習の概念を統合することにより,エンパワーメントの計算をより扱いやすくする新しいフレームワークである階層的エンパワーメントを導入する。私たちのフレームワークは2つの具体的な貢献をする。まず,短い時間地平でのエンパワーメントの計算に使用可能な,相互情報量の新しい変分下界を導入する。第2に,指数的に長い時間スケールでエンパワーメントを計算するための階層アーキテクチャを導入する。一連のシミュレーションロボットタスクにおいてフレームワークの貢献を検証する。一般的なアリ(ant)ナビゲーション領域では、我々の4階層エージェントは、先行研究よりも2桁以上大きい表面積をカバーするスキルを学ぶことができる。

General purpose agents will require large repertoires of skills. Empowerment -- the maximum mutual information between skills and states -- provides a pathway for learning large collections of distinct skills, but mutual information is difficult to optimize. We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable by integrating concepts from Goal-Conditioned Hierarchical Reinforcement Learning. Our framework makes two specific contributions. First, we introduce a new variational lower bound on mutual information that can be used to compute empowerment over short horizons. Second, we introduce a hierarchical architecture for computing empowerment over exponentially longer time scales. We verify the contributions of the framework in a series of simulated robotics tasks. In a popular ant navigation domain, our four-level agents are able to learn skills that cover a surface area over two orders of magnitude larger than prior work.

翻訳日:2023-07-07 15:22:26 公開日:2023-07-06
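
The abstract of this entry defines empowerment as the maximum mutual information between skills and states. As a hedged sketch, in notation assumed here rather than taken from the paper (skill $Z$, start state $s_0$, resulting state $S_n$, skill distribution $\pi$, learned skill decoder $q_\psi$), that definition and the standard variational lower bound on mutual information can be written as below. The bound shown is the well-known one from the literature, not the new bound the paper introduces.

```latex
\mathcal{E}(s_0) = \max_{\pi} I(Z; S_n \mid s_0),
\qquad
I(Z; S_n \mid s_0) = H(Z \mid s_0) - H(Z \mid S_n, s_0)
\;\ge\; H(Z \mid s_0)
  + \mathbb{E}_{z \sim \pi,\; s_n \sim p(\cdot \mid s_0, z)}
    \big[ \log q_\psi(z \mid s_0, s_n) \big].
```

The inequality holds because replacing the true posterior over skills with any approximation $q_\psi$ can only lower the expected log-likelihood term.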

# 3分間の人間フィードバックを用いた拡散モデルの検閲サンプリング

Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback ( http://arxiv.org/abs/2307.02770v1 )

ライセンス: Link先を確認

TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu

(参考訳) 拡散モデルは最近、高品質な画像生成で顕著な成功を収めている。しかし、事前学習された拡散モデルは、良い画像を生成できる一方で、望ましくない画像を出力することもあるという意味で、部分的な不整合(misalignment)を示すことがある。その場合、悪い画像の生成を防ぐだけでよく、このタスクを検閲と呼ぶ。本研究では,最小限の人間フィードバックで学習した報酬モデルを用いて,事前学習した拡散モデルによる検閲付き生成を提案する。検閲は極めて高い人的フィードバック効率で達成でき、ほんの数分の人間フィードバックで生成されたラベルだけで十分であることを示す。コードは https://github.com/tetrzim/diffusion-human-feedback で利用可能。

Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images, but it sometimes outputs undesirable images. If so, we simply need to prevent the generation of the bad images, and we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.

翻訳日:2023-07-07 15:14:22 公開日:2023-07-06

# 役に立たない思考の生成、認識、リフレーミングのためのモデルのトレーニング

Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts ( http://arxiv.org/abs/2307.02768v1 )

ライセンス: Link先を確認

Mounica Maddela, Megan Ung, Jing Xu, Andrea Madotto, Heather Foran, Y-Lan Boureau

(参考訳) 役に立たない思考を認識してリフレーミングするなど、ウェルビーイングに対する多くの認知的アプローチは、過去数十年にわたってかなりの実証的支持を得てきたが、セルフヘルプの形式ではいまだに広く普及していない。その普及の障壁は、十分に具体的で多様な専用の練習教材がないことである。本研究は,現在の言語モデルを活用して,特定の文脈に合致する標準的な役に立たない思考パターンを示す練習教材を事実上無制限に生成し,適切な肯定的リフレーミング案を生成できるかを検討する。与えられたペルソナに条件付けされた役に立たない思考パターンを含む約10kの思考例と,約27kの肯定的リフレームからなる新しいデータセットPATTERNREFRAMEを提案する。このデータセットを使用して現在のモデルをトレーニングおよび/または評価することにより、既存のモデルは、追加のモデルトレーニングをまったく、あるいは最小限しか必要とせずに、個別に調整された豊富な練習教材と仮説の生成を支援する強力なツールになり得ることを示す。

Many cognitive approaches to well-being, such as recognizing and reframing unhelpful thoughts, have received considerable empirical support over the past decades, yet still lack truly widespread adoption in self-help format. A barrier to that adoption is a lack of adequately specific and diverse dedicated practice material. This work examines whether current language models can be leveraged to both produce a virtually unlimited quantity of practice material illustrating standard unhelpful thought patterns matching specific given contexts, and generate suitable positive reframing proposals. We propose PATTERNREFRAME, a novel dataset of about 10k examples of thoughts containing unhelpful thought patterns conditioned on a given persona, accompanied by about 27k positive reframes. By using this dataset to train and/or evaluate current models, we show that existing models can already be powerful tools to help generate an abundance of tailored practice material and hypotheses, with no or minimal additional model training required.

翻訳日:2023-07-07 15:14:11 公開日:2023-07-06

# ジャンプを伴う高次元PIDEの時間差学習

Temporal Difference Learning for High-Dimensional PIDEs with Jumps ( http://arxiv.org/abs/2307.02766v1 )

ライセンス: Link先を確認

Liwei Lu, Hailong Guo, Xu Yang, Yi Zhu

(参考訳) 本稿では,時間差学習に基づく高次元部分積分微分方程式(PIDE)を解くための深層学習フレームワークを提案する。一連のLévy過程を導入し、それに対応する強化学習モデルを構築する。プロセス全体をシミュレートするために、ディープニューラルネットワークを使用して、方程式の解と非局所項を表現する。その後,時間差誤差,終端条件,および非局所項の性質を損失関数としてネットワークを訓練する。この方法の相対誤差は、100次元実験では$O(10^{-3})$、一次元の純粋なジャンプ問題では$O(10^{-4})$に達する。さらに, 計算コストの低さとロバスト性という利点を実証し, ジャンプの強度や形状の異なる問題への対処に適していることを示す。

In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on temporal difference learning. We introduce a set of Lévy processes and construct a corresponding reinforcement learning model. To simulate the entire process, we use deep neural networks to represent the solutions and non-local terms of the equations. Subsequently, we train the networks using the temporal difference error, termination condition, and properties of the non-local terms as the loss function. The relative error of the method reaches $O(10^{-3})$ in 100-dimensional experiments and $O(10^{-4})$ in one-dimensional pure jump problems. Additionally, our method demonstrates the advantages of low computational cost and robustness, making it well-suited for addressing problems with different forms and intensities of jumps.

翻訳日:2023-07-07 15:13:52 公開日:2023-07-06

# 信頼度に基づくカスケードの委譲(deferral)はいつ十分か?

When Does Confidence-Based Cascade Deferral Suffice? ( http://arxiv.org/abs/2307.02764v1 )

ライセンス: Link先を確認

Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar

(参考訳) カスケードは、一連の分類器を順番に呼び出すことで、サンプルごとに推論コストを適応的に変化させる古典的な戦略である。委譲(deferral)ルールは、シーケンス内の次の分類器を呼び出すか、予測を終了するかを決定する。単純な委譲ルールの1つは、例えば最大予測ソフトマックス確率に基づく、現在の分類器の信頼度を利用するものである。カスケードの構造を考慮しない(例えば、下流モデルの誤りをモデル化しない)にもかかわらず、このような信頼度に基づく委譲は、実際には非常にうまく機能することが多い。本稿では,信頼度に基づく委譲が失敗しうる条件と,代替の委譲戦略がより良く機能する状況について,より深く理解することを目指す。まず、最適な委譲ルールの理論的特徴付けを示し、信頼度に基づく委譲がうまくいかない設定を正確に特徴づける。次に, ポストホックな委譲メカニズムについて検討し, 次の設定において信頼度に基づく委譲を大幅に改善できることを実証する。 (i) 下流モデルが入力のサブセットでのみうまく機能するスペシャリストである。 (ii) サンプルにラベルノイズがある。 (iii) 訓練セットとテストセットの間に分布シフトがある。

Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade -- e.g., not modelling the errors of downstream models -- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.

翻訳日:2023-07-07 15:13:41 公開日:2023-07-06
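
The confidence-based deferral rule described in this entry (stop at the current classifier when its maximum softmax probability is high enough, otherwise invoke the next one) can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' implementation: the function `cascade_predict`, the fixed per-stage thresholds, and the two toy classifiers are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def cascade_predict(x, classifiers, thresholds):
    """Run classifiers in order and return (predicted_class, stage_used).

    classifiers: list of callables mapping an input to a 1-D array of logits.
    thresholds:  one confidence threshold per classifier except the last,
                 which always answers because nothing is left to defer to.
    """
    for stage, classifier in enumerate(classifiers):
        probs = softmax(classifier(x))
        confidence = float(probs.max())
        is_last = stage == len(classifiers) - 1
        if is_last or confidence >= thresholds[stage]:
            return int(probs.argmax()), stage


# Hypothetical classifiers: a cheap one whose confidence depends on the input,
# and an expensive one that always predicts class 1.
def small_model(x):
    return np.array([x, 0.0])


def large_model(x):
    return np.array([0.0, 5.0])


easy = cascade_predict(4.0, [small_model, large_model], thresholds=[0.9])
hard = cascade_predict(0.1, [small_model, large_model], thresholds=[0.9])
print(easy, hard)
```

Note that the rule looks only at the current classifier's confidence. It never asks whether the downstream model would actually do better, which is the limitation the abstract examines.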

# あなたの配偶者には専門家の助けが必要だ:社会的関係のモデル化によるメッセージの文脈的適切性の判定

Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships ( http://arxiv.org/abs/2307.02763v1 )

ライセンス: Link先を確認

David Jurgens, Agrima Seth, Jackson Sargent, Athena Aghighi, Michael Geraci

(参考訳) 対人コミュニケーションを理解するには、メッセージが語られる社会的文脈と規範を理解することが必要である。しかし、このようなコミュニケーションにおける攻撃的コンテンツを識別する現在の手法は、大部分が文脈に依存せずに動作しており、コミュニティの規範や事前の会話を文脈として考慮する手法はわずかである。本稿では,個人間の社会的関係を明示的にモデル化することにより,不適切なコミュニケーションを識別する新しいアプローチを提案する。本稿では,文脈に位置づけられた適切性判断のデータセットを新たに導入し,大規模言語モデルが関係情報を容易に組み込んで,与えられた文脈における適切性を正確に識別できることを示す。オンライン会話と映画対話のデータを用いて、関係自体が暗黙の規範としてどのように機能するかについての洞察を提供し、異なる会話設定で文脈への感度がどの程度必要かを定量化する。さらに, 文脈的適切性の判断は, 見下し(condescension)や丁寧さといった,言語で表される他の社会的要因を予測できることを示す。

Understanding interpersonal communication requires, in part, understanding the social context and norms in which a message is said. However, current methods for identifying offensive content in such communication largely operate independent of context, with only a few approaches considering community norms or prior conversation as context. Here, we introduce a new approach to identifying inappropriate communication by explicitly modeling the social relationship between the individuals. We introduce a new dataset of contextually-situated judgments of appropriateness and show that large language models can readily incorporate relationship information to accurately identify appropriateness in a given context. Using data from online conversations and movie dialogues, we provide insight into how the relationships themselves function as implicit norms and quantify the degree to which context-sensitivity is needed in different conversation settings. Further, we also demonstrate that contextual-appropriateness judgments are predictive of other social factors expressed in language such as condescension and politeness.

翻訳日:2023-07-07 15:13:22 公開日:2023-07-06

# PRD: 大規模言語モデルに基づく評価を改善するピアランクと考察

PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations ( http://arxiv.org/abs/2307.02762v1 )

ライセンス: Link先を確認

Ruosen Li, Teerth Patel, Xinya Du

(参考訳) 現在、様々な現代の大規模言語モデル(LLM)が生成する応答の質を自動で評価・比較することは困難である。近年の研究では、オープンエンドな質問応答に対する参照不要(reference-free)の評価指標として、主にLLMが用いられている。より具体的には、「最も強い」と認められたLLMを評価器として使用し、候補モデルの答えをペアで比較し、ランキングスコアを提供する。しかし、この直感的な手法には、自己強化バイアス(自身の答えを好む)や位置バイアスなど、複数の問題がある。教育領域(Cho and MacArthur, 2011; Walsh, 2014)からLLMに基づく評価を改善するための洞察と教訓を引き出す。具体的には,(1)すべての回答ペアに対する各ピアLLMの対比較の選好を考慮し,モデルの最終的なランキングを出力するピア・ランク(PR)アルゴリズムと,(2)2つのLLMに2つの回答の選好について議論させ,相互合意に達するように促すピア・ディスカッション(PD)を提案する。我々は2つのベンチマークデータセットで実験を行う。私たちのアプローチは、それぞれより高い精度を達成し、人間の判断とよりよく一致することが分かった。興味深いことに、PRは各モデルの名前を明かさない匿名設定の下で、比較的正確なモデルの自己ランキングを誘導することができる。私たちの研究は、人間にとって比較が難しいモデルの評価を探求する余地を提供する。

Nowadays, the quality of responses generated by different modern large language models (LLMs) is hard to evaluate and compare automatically. Recent studies suggest and predominantly use LLMs as a reference-free metric for open-ended question answering. More specifically, they use the recognized "strongest" LLM as the evaluator, which conducts pairwise comparisons of candidate models' answers and provides a ranking score. However, this intuitive method has multiple problems, such as self-enhancement bias (favoring its own answers) and positional bias. We draw insights and lessons from the educational domain (Cho and MacArthur, 2011; Walsh, 2014) to improve LLM-based evaluations. Specifically, we propose (1) the peer rank (PR) algorithm, which takes into account each peer LLM's pairwise preferences over all answer pairs and outputs a final ranking of models; and (2) peer discussion (PD), where we prompt two LLMs to discuss and try to reach a mutual agreement on their preferences between two answers. We conduct experiments on two benchmark datasets. We find that our approaches achieve higher accuracy and align better with human judgments, respectively. Interestingly, PR can induce a relatively accurate self-ranking of models under the anonymous setting, where each model's name is unrevealed. Our work provides space to explore evaluating models that are hard for humans to compare.

翻訳日:2023-07-07 15:13:04 公開日:2023-07-06

# 推薦のための知識グラフ自己監督型合理化

Knowledge Graph Self-Supervised Rationalization for Recommendation ( http://arxiv.org/abs/2307.02759v1 )

ライセンス: Link先を確認

Yuhao Yang, Chao Huang, Lianghao Xia, Chunzhen Huang

(参考訳) 本稿では,知識認識リコメンデータシステムのための,KGRecと呼ばれる自己指導型合理化手法を提案する。情報的知識接続を効果的に識別するために,知識三重項に対する合理的スコアを生成する注意的知識合理化機構を提案する。これらのスコアにより、KGRecは有理マスクによる推薦のための生成的かつコントラスト的な自己監督タスクを統合する。知識グラフの有理性を強調するために,マスキング・再構築という新たな生成タスクを設計する。重要な知識を高い有理スコアで隠蔽することで、KGRecは有理値として役立つ有用な知識接続を再構築し強調するように訓練されている。知識グラフ学習における協調的相互作用の効果をさらに合理化するために,知識とユーザ・アイテムの相互作用ビューからの信号を整合させるコントラスト学習タスクを導入する。耐雑音コントラストを確保するため、有理スコアで判断される両グラフの潜在的なノイズエッジをマスキングする。 3つの実世界のデータセットに対する大規模な実験は、KGRecが最先端の手法より優れていることを示した。アプローチの実装コードも https://github.com/HKUDS/KGRec で公開しています。

In this paper, we introduce a new self-supervised rationalization method, called KGRec, for knowledge-aware recommender systems. To effectively identify informative knowledge connections, we propose an attentive knowledge rationalization mechanism that generates rational scores for knowledge triplets. With these scores, KGRec integrates generative and contrastive self-supervised tasks for recommendation through rational masking. To highlight rationales in the knowledge graph, we design a novel generative task in the form of masking-reconstructing. By masking important knowledge with high rational scores, KGRec is trained to rebuild and highlight useful knowledge connections that serve as rationales. To further rationalize the effect of collaborative interactions on knowledge graph learning, we introduce a contrastive learning task that aligns signals from knowledge and user-item interaction views. To ensure noise-resistant contrasting, potentially noisy edges in both graphs, as judged by the rational scores, are masked. Extensive experiments on three real-world datasets demonstrate that KGRec outperforms state-of-the-art methods. We also provide the implementation code for our approach at https://github.com/HKUDS/KGRec.

翻訳日:2023-07-07 15:12:41 公開日:2023-07-06

# オンラインコミュニティにおける言語スタイルマッチングの探求 : 社会的文脈と会話ダイナミクスの役割

Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics ( http://arxiv.org/abs/2307.02758v1 )

ライセンス: Link先を確認

Aparna Ananthasubramaniam, Hong Chen, Jason Yan, Kenan Alkiek, Jiaxin Pei, Agrima Seth, Lavinia Dunagan, Minje Choi, Benjamin Litterer, David Jurgens

(参考訳) 会話における言語スタイルマッチング(LSM)は、力や説得といった社会的影響のいくつかの側面を反映することができる。しかし、LSMがRedditのようなプラットフォーム上でのオンラインコミュニケーションの結果とどのように関係しているのかは不明な疑問である。本研究では,Redditにおける二者会話スレッドの大規模コーパスを分析し,機能語の使用と形式性という2種類のスタイルを用いて,LSMのすべての発生を識別する。このフレームワークを用いて、Reddit内のいくつかの社会的要因(ポストとサブレディット機能、会話深度、ユーザ在任率、コメントの議論)によって、LSMのレベルが会話でどのように異なるかを検討する。最後に,コミュニティ禁止後の身分喪失に伴うlsmの変化を測定した。その結果,Redditでの会話におけるLSMの相互作用が,コミュニティのダイナミクスを理解する上での会話の関与を理解することの重要性が示唆された。

Linguistic style matching (LSM) in conversations can be reflective of several aspects of social influence such as power or persuasion. However, how LSM relates to the outcomes of online communication on platforms such as Reddit is an unknown question. In this study, we analyze a large corpus of two-party conversation threads in Reddit where we identify all occurrences of LSM using two types of style: the use of function words and formality. Using this framework, we examine how levels of LSM differ in conversations depending on several social factors within Reddit: post and subreddit features, conversation depth, user tenure, and the controversiality of a comment. Finally, we measure the change of LSM following loss of status after community banning. Our findings reveal the interplay of LSM in Reddit conversations with several community metrics, suggesting the importance of understanding conversation engagement when understanding community dynamics.

翻訳日:2023-07-07 15:12:22 公開日:2023-07-06
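
As a rough illustration of how function-word style matching between two conversation turns can be scored, the sketch below uses the ratio formula commonly seen in the LSM literature, `1 - |a - b| / (a + b + eps)`, averaged over word categories. The abstract does not confirm that this is the exact computation the authors use. The tiny word lists, the whitespace tokenizer, and the names `category_rates` and `lsm_score` are placeholders assumed for this example.

```python
# Hypothetical, heavily abbreviated function-word categories.
FUNCTION_WORDS = {
    "articles": {"a", "an", "the"},
    "pronouns": {"i", "you", "we", "he", "she", "it", "they"},
    "prepositions": {"in", "on", "at", "of", "to", "with"},
}


def category_rates(text):
    """Fraction of tokens in each function-word category (naive whitespace tokens)."""
    tokens = text.lower().split()
    total = max(len(tokens), 1)
    return {
        name: sum(token in words for token in tokens) / total
        for name, words in FUNCTION_WORDS.items()
    }


def lsm_score(text_a, text_b, eps=1e-4):
    """Average per-category similarity of function-word usage, in [0, 1]."""
    rates_a = category_rates(text_a)
    rates_b = category_rates(text_b)
    per_category = [
        1.0 - abs(rates_a[c] - rates_b[c]) / (rates_a[c] + rates_b[c] + eps)
        for c in FUNCTION_WORDS
    ]
    return sum(per_category) / len(per_category)


post = "the cat sat on the mat"
print(lsm_score(post, post))            # identical style
print(lsm_score(post, "run fast now"))  # reply with no function words
```

A category that neither text uses counts as a perfect match under this formula, so scores over very short texts should be read with care.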

# CityTrack: 位置認識とボックスグレードマッチングによる都市規模マルチカメラマルチターゲットトラッキングの改善

CityTrack: Improving City-Scale Multi-Camera Multi-Target Tracking by Location-Aware Tracking and Box-Grained Matching ( http://arxiv.org/abs/2307.02753v1 )

ライセンス: Link先を確認

Jincheng Lu, Xipeng Yang, Jin Ye, Yifu Zhang, Zhikang Zou, Wei Zhang, Xiao Tan

(参考訳) Multi-Camera Multi-Target Tracking (MCMT)は、複数のカメラにまたがって複数のターゲットを同時に追跡するコンピュータビジョン技術である。都市交通の視覚分析におけるMCMTは、視点の異なる複数のカメラで都市規模の広い領域をカバーすることが多く、都市交通シーンの複雑でダイナミックな性質のために大きな課題に直面している。都市交通シーンのターゲットはしばしば閉塞、照明変更、視点変更を受け、異なるカメラ間でターゲットを正確に関連付けることが困難になる。これらの課題を克服するために,CityTrackと呼ばれる新しい体系的なMCMTフレームワークを提案する。具体的には、MCMTタスクでの有効性を高めるために様々な高度な技術を統合した位置認識SCMTトラッカーを提示し、上記の問題を解決するために、ICAモジュールのための新しいボックスグレードマッチング(BGM)手法を提案する。CityFlowV2データセットの公開テストセットでアプローチを評価し、84.91%のIDF1を達成して2022年のAI City Challengeで1位となった。実験結果は,都市交通シーンがもたらす課題を克服するうえでの本アプローチの有効性を示している。

Multi-Camera Multi-Target Tracking (MCMT) is a computer vision technique that involves tracking multiple targets simultaneously across multiple cameras. MCMT in urban traffic visual analysis faces great challenges due to the complex and dynamic nature of urban traffic scenes, where multiple cameras with different views and perspectives are often used to cover a large city-scale area. Targets in urban traffic scenes often undergo occlusion, illumination changes, and perspective changes, making it difficult to associate targets across different cameras accurately. To overcome these challenges, we propose a novel systematic MCMT framework, called CityTrack. Specifically, we present a Location-Aware SCMT tracker which integrates various advanced techniques to improve its effectiveness in the MCMT task and propose a novel Box-Grained Matching (BGM) method for the ICA module to solve the aforementioned problems. We evaluated our approach on the public test set of the CityFlowV2 dataset and achieved an IDF1 of 84.91%, ranking 1st in the 2022 AI CITY CHALLENGE. Our experimental results demonstrate the effectiveness of our approach in overcoming the challenges posed by urban traffic scenes.

翻訳日:2023-07-07 15:12:06 公開日:2023-07-06

# 不均衡データセットを用いたオフライン強化学習

Offline Reinforcement Learning with Imbalanced Datasets ( http://arxiv.org/abs/2307.02752v1 )

ライセンス: Link先を確認

Li Jiang, Sijie Chen, Jielin Qiu, Haoran Xu, Wai Kin Chan, Zhao Ding

(参考訳) 現在のオフライン強化学習(RL)研究におけるベンチマークの広範な利用は、モデル開発において実世界のデータセット分布の不均衡を無視することにつながっている。現実世界のオフラインRLデータセットは、探索の難しさや安全性の考慮のため、状態空間上で不均衡になることが多い。本稿では、オフラインRLにおける不均衡データセットの特性を規定する。そこでは、状態カバレッジは、歪んだポリシーを特徴とするべき乗則分布に従う。理論的および実証的に、保守的Q学習(CQL)のような分布的制約に基づく典型的なオフラインRL手法は、不均衡データセットの下でポリシーを抽出するのに効果がないことを示す。自然知性に触発されて,CQLを検索プロセスで拡張し,過去の関連する経験を思い出すことで,不均衡データセットによって生じる課題を効果的に軽減する,新しいオフラインRL手法を提案する。我々は,D4RLの変種を利用して,不均衡の度合いが異なる不均衡データセットの文脈における複数のタスクで本手法を評価した。実験により,本手法が他のベースラインよりも優れていることを示す。

The prevalent use of benchmarks in current offline reinforcement learning (RL) research has led to a neglect of the imbalance of real-world dataset distributions in the development of models. The real-world offline RL dataset is often imbalanced over the state space due to the challenge of exploration or safety considerations. In this paper, we specify properties of imbalanced datasets in offline RL, where the state coverage follows a power law distribution characterized by skewed policies. Theoretically and empirically, we show that typical offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective in extracting policies under the imbalanced dataset. Inspired by natural intelligence, we propose a novel offline RL method that augments CQL with a retrieval process to recall past related experiences, effectively alleviating the challenges posed by imbalanced datasets. We evaluate our method on several tasks in the context of imbalanced datasets with varying levels of imbalance, utilizing a variant of D4RL. Empirical results demonstrate the superiority of our method over other baselines.

翻訳日:2023-07-07 15:11:48 公開日:2023-07-06

# OLR-WA: 加重平均によるオンライン回帰

OLR-WA Online Regression with Weighted Average ( http://arxiv.org/abs/2307.02804v1 )

ライセンス: Link先を確認

Mohammad Abu-Shaira and Greg Speegle

(参考訳) 機械学習は正確なモデルを構築するために大量のトレーニングデータを必要とする。データが時間とともに到着することもあり、その場合は大きなストレージスペースと、新しいデータを反映するためのモデルの再計算が必要になる。オンライン学習は、データが到着するたびにモデルを漸進的に修正し、その後データを破棄することで、これらの問題に対処する。本研究では,新しいオンライン線形回帰手法を提案する。このアプローチでは、新たに到着したデータと既存のモデルを組み合わせて、新しいモデルを作成する。 OLR-WA (OnLine Regression with Weighted Average) と名付けられたこのモデルは,ユーザ定義の重みを使用することで,変化するデータに対して柔軟に対応し,結果を古いデータと新しいデータのどちらに寄せるかを調整できる。我々は,OLR-WAをデータセット全体を用いた静的モデルと比較する2次元および3次元の実験を行った。その結果、一貫性のあるデータの場合、OLR-WAと静的バッチモデルは同様に動作し、変化するデータの場合、ユーザーはOLR-WAをより迅速に適応させるか、変更に抵抗するように設定できる。

Machine learning requires a large amount of training data in order to build accurate models. Sometimes the data arrives over time, requiring significant storage space and recalculating the model to account for the new data. Online learning addresses these issues by incrementally modifying the model as data is encountered, and then discarding the data. In this study we introduce a new online linear regression approach. Our approach combines newly arriving data with a previously existing model to create a new model. The introduced model, named OLR-WA (OnLine Regression with Weighted Average), uses user-defined weights to provide flexibility in the face of changing data and to bias the results in favor of old or new data. We have conducted 2-D and 3-D experiments comparing OLR-WA to a static model using the entire data set. The results show that for consistent data, OLR-WA and the static batch model perform similarly, and for varying data, the user can set OLR-WA to adapt more quickly or to resist change.

翻訳日:2023-07-07 15:07:07 公開日:2023-07-06
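
The idea of combining an existing linear model with a model fitted on newly arrived data through user-defined weights can be sketched as follows. This is a deliberate simplification under assumptions: it averages regression coefficients directly, which may differ from how the paper actually merges the two models, and the names `fit_linear` and `update_model` as well as the weights `w_base` and `w_inc` are hypothetical.

```python
import numpy as np


def fit_linear(x, y):
    """Ordinary least squares with an intercept; returns [slope(s)..., intercept]."""
    design = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def update_model(base_coef, batch_x, batch_y, w_base, w_inc):
    """Blend the stored model with a model fitted only on the new mini-batch.

    A larger w_inc adapts faster to new data; a larger w_base resists change.
    The mini-batch can be discarded once this function returns.
    """
    inc_coef = fit_linear(batch_x, batch_y)
    return (w_base * base_coef + w_inc * inc_coef) / (w_base + w_inc)


# Consistent data stream: every batch follows y = 2x + 1.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=60)
y = 2.0 * x + 1.0

model = fit_linear(x[:20], y[:20])
for start in range(20, 60, 20):
    model = update_model(model, x[start:start + 20], y[start:start + 20],
                         w_base=0.5, w_inc=0.5)
print("coefficients after streaming consistent data:", model)

# Drifting data: a new batch follows y = -x, and a heavy w_inc favors it.
drifted = update_model(model, x[:20], -x[:20], w_base=0.1, w_inc=0.9)
print("coefficients after a drifted batch:", drifted)
```

Only the current coefficients are stored between updates, which reflects the storage argument the abstract makes for online learning.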

# テンソル回帰を用いた構造的グローバル情報保存のためのFew-ShotパーソナライズSaliency予測

Few-Shot Personalized Saliency Prediction Using Tensor Regression for Preserving Structural Global Information ( http://arxiv.org/abs/2307.02799v1 )

ライセンス: Link先を確認

Yuya Moroto, Keisuke Maeda, Takahiro Ogawa and Miki Haseyama

(参考訳) 本稿では,PSM(パーソナライズド・サリエンシー・マップ)の構造的グローバル情報を保存するために,テンソル・ツー・マトリックス回帰を用いた数ショットのパーソナライズド・サリエンシー予測を提案する。一般のサリエンシーマップとは対照的に、PSMは、注視領域の多様性から個々の視覚嗜好を得るのに有用な人物特有の視覚注意を示すので、大きな可能性を秘めている。 PSM予測は、未見の画像のPSMを取得するために必要であるが、個々の視線パターンの複雑さのため、その予測は依然として難しい課題である。視線追跡データの限られた量から個々の視線パターンを認識するために、従来の方法は人物間の視線傾向の類似性を利用する。しかし、従来の手法では、予測モデルに対してPSMはベクトル化される。このため、画像に対応するPSMの構造的グローバル情報が無視される。 PSM間の関係を自動的に明らかにするために,PSMの構造情報を保存できるテンソルに基づく回帰モデルに着目し,予測精度の向上を実現する。実験の結果,テンソルベース回帰を含む提案手法が比較法より優れていることを確認した。

This paper presents a few-shot personalized saliency prediction method using tensor-to-matrix regression to preserve the structural global information of personalized saliency maps (PSMs). In contrast to a general saliency map, a PSM has great potential because it indicates person-specific visual attention, which is useful for obtaining individual visual preferences from the heterogeneity of gazed areas. PSM prediction is needed to acquire the PSM for an unseen image, but it remains a challenging task due to the complexity of individual gaze patterns. To recognize individual gaze patterns from a limited amount of eye-tracking data, previous methods exploit the similarity of gaze tendencies between persons. However, these methods vectorize the PSMs for the prediction model, so the structural global information of the PSMs corresponding to the image is ignored. To automatically reveal the relationship between PSMs, we focus on a tensor-based regression model that can preserve the structural information of PSMs and thereby improve the prediction accuracy. In the experiments, we confirm that the proposed method, which includes the tensor-based regression, outperforms the comparative methods.

翻訳日:2023-07-07 15:06:46 公開日:2023-07-06

# 整合性正規化非整合性学習による半教師付き領域適応型医用画像分割

Semi-supervised Domain Adaptive Medical Image Segmentation through Consistency Regularized Disentangled Contrastive Learning ( http://arxiv.org/abs/2307.02798v1 )

ライセンス: Link先を確認

Hritam Basak, Zhaozheng Yin

(参考訳) 教師なしドメイン適応(UDA)は、ドメインシフトを軽減するための有望な方向であるが、教師なしドメイン適応(unsupervised domain adapt)には程遠い。本研究では,医療画像セグメント化のための半教師付き領域適応 (SSDA) を比較的少ない範囲で検討し,いくつかのラベル付き対象サンプルへのアクセスにより適応性能が大幅に向上することを示した。具体的には,2段階のトレーニングプロセスを提案する。まず、新しいドメイン内容の不整合型コントラスト学習(CL)と画素レベルの特徴整合性制約を用いて、自己学習パラダイムでエンコーダを事前学習する。提案したCLは,空間感度を維持することにより局所画素レベルの情報マイニングを強制する一方,ソース画像とターゲット画像からグローバルスケールで識別的コンテンツ固有のドメイン不変セマンティクスを学習するようエンコーダに強制する。このプリトレーニングエンコーダはデコーダと共に、半教師付き設定を用いて下流タスク(ピクセルレベルセグメンテーション)に対してさらに微調整される。さらに,提案手法がUDA設定で容易に拡張可能であることを実験的に検証し,提案手法の優位性を高める。提案手法は,2つの領域適応画像分割タスクの評価において,SSDAおよびUDA設定の両方において,SoTA法よりも優れている。コードはhttps://github.com/hritam-98/GFDA-disentangledで入手できる。

Although unsupervised domain adaptation (UDA) is a promising direction for alleviating domain shift, UDA methods fall short of their supervised counterparts. In this work, we investigate the relatively less explored semi-supervised domain adaptation (SSDA) for medical image segmentation, where access to a few labeled target samples can improve the adaptation performance substantially. Specifically, we propose a two-stage training process. First, an encoder is pre-trained in a self-learning paradigm using a novel domain-content disentangled contrastive learning (CL) along with a pixel-level feature consistency constraint. The proposed CL forces the encoder to learn discriminative content-specific but domain-invariant semantics on a global scale from the source and target images, whereas consistency regularization enforces the mining of local pixel-level information by maintaining spatial sensitivity. This pre-trained encoder, along with a decoder, is further fine-tuned for the downstream task (i.e., pixel-level segmentation) in a semi-supervised setting. Furthermore, we experimentally validate that our proposed method can easily be extended to UDA settings, adding to the superiority of the proposed strategy. Upon evaluation on two domain adaptive image segmentation tasks, our proposed method outperforms the SoTA methods in both SSDA and UDA settings. Code is available at https://github.com/hritam-98/GFDA-disentangled

翻訳日:2023-07-07 15:06:12 公開日:2023-07-06

# bheisr: バイアスからバランスへ - 知識に基づくレコメンデーションにおけるイデオロギー分離を排除することによって信念調和を促進する

BHEISR: Nudging from Bias to Balance -- Promoting Belief Harmony by Eliminating Ideological Segregation in Knowledge-based Recommendations ( http://arxiv.org/abs/2307.02797v1 )

ライセンス: Link先を確認

Mengyan Wang, Yuxuan Hu, Zihan Yuan, Chenting Jiang, Weihua Li, Shiqing Wu and Quan Bai

(参考訳) パーソナライズされたレコメンデーションシステムの領域では、信念の不均衡とユーザのバイアスの増幅が懸念されている。そこで本研究では,既存のレコメンデーションシステムにおけるフィルタバブル効果の悪影響を軽減させるため,ユーザと既存レコメンデーションシステム間の革新的な中間機関(bheisr)を提案する。主な目的は,フィルタバブルによる有害な影響を最小限に抑えつつ,ユーザの信念バランスを打つことである。 BHEISRモデルは、民主的かつ透明な原則を支持しながら、ナッジ理論から原則を取り入れている。ユーザー固有のカテゴリー情報を利用して好奇心を刺激する。新たなカテゴリーへの関心を徐々に刺激することで、このモデルはユーザーが信念の地平を広げ、通常見落としている情報を探索することを奨励する。我々のモデルは時間に敏感であり、ユーザのフィードバックループで動作する。モデルの既存のレコメンデーションアルゴリズムを使用し、事前の時間フレームからのユーザフィードバックを組み込む。このアプローチは、フィルターバブルの制約を超越し、レコメンデーションの多様性を高め、ユーザー間の信念バランスを保ちつつ、ユーザの好みやシステム固有のビジネス要件にも応えます。 BHEISRモデルの有効性と信頼性を検証するため,実世界のデータセットを用いた総合実験を行った。これらの実験は、200人近いフィルターバブルユーザを試験対象として、bheisrモデルの性能をいくつかのベースラインモデルと比較した。実験結果は,フィルタバブルの緩和とユーザ視点のバランスをとる上で,BHEISRモデルの優れた性能を示すものである。

In the realm of personalized recommendation systems, the increasing concern is the amplification of belief imbalance and user biases, a phenomenon primarily attributed to the filter bubble. Addressing this critical issue, we introduce an innovative intermediate agency (BHEISR) between users and existing recommendation systems to attenuate the negative repercussions of the filter bubble effect in extant recommendation systems. The main objective is to strike a belief balance for users while minimizing the detrimental influence caused by filter bubbles. The BHEISR model amalgamates principles from nudge theory while upholding democratic and transparent principles. It harnesses user-specific category information to stimulate curiosity, even in areas users might initially deem uninteresting. By progressively stimulating interest in novel categories, the model encourages users to broaden their belief horizons and explore the information they typically overlook. Our model is time-sensitive and operates on a user feedback loop. It utilizes the existing recommendation algorithm of the model and incorporates user feedback from the prior time frame. This approach endeavors to transcend the constraints of the filter bubble, enrich recommendation diversity, and strike a belief balance among users while also catering to user preferences and system-specific business requirements. To validate the effectiveness and reliability of the BHEISR model, we conducted a series of comprehensive experiments with real-world datasets. These experiments compared the performance of the BHEISR model against several baseline models using nearly 200 filter bubble-impacted users as test subjects. Our experimental results conclusively illustrate the superior performance of the BHEISR model in mitigating filter bubbles and balancing user perspectives.

翻訳日:2023-07-07 15:05:29 公開日:2023-07-06

# VerifAI: 検証された生成AI

VerifAI: Verified Generative AI ( http://arxiv.org/abs/2307.02796v1 )

ライセンス: Link先を確認

Nan Tang and Chenyu Yang and Ju Fan and Lei Cao

(参考訳) 生成AIは大きな進歩を遂げているが、アウトプットの正確性と信頼性に関する懸念は拡大を続けている。このような不正確さは、不正確な意思決定、誤った情報の拡散、プライバシー侵害、法的負債など、重大な結果をもたらす可能性がある。透明性、プライバシ保護、バイアス軽減、社会的および環境的責任といった、説明可能なAIと責任あるAIプラクティスを含む、これらのリスクに対処する努力が進行中である。データ管理の観点から生成AIの出力を検証することは、生成AIの新たな課題である。これには、テキストファイル、テーブル、ナレッジグラフを含むマルチモーダルデータレイクの基盤となるデータを分析し、その品質と一貫性を評価することが含まれる。これにより、生成AIモデルの出力を評価するためのより強力な基盤を確立することができる。このようなアプローチは、生成AIの正確性を確保し、透明性を促進し、より信頼性の高い意思決定を可能にする。私たちのビジョンは、検証可能な生成AIの開発を促進し、より信頼性が高く責任あるAIの利用に貢献することです。

Generative AI has made significant strides, yet concerns about the accuracy and reliability of its outputs continue to grow. Such inaccuracies can have serious consequences such as inaccurate decision-making, the spread of false information, privacy violations, legal liabilities, and more. Although efforts to address these risks are underway, including explainable AI and responsible AI practices such as transparency, privacy protection, bias mitigation, and social and environmental responsibility, misinformation caused by generative AI will remain a significant challenge. We propose that verifying the outputs of generative AI from a data management perspective is an emerging issue for generative AI. This involves analyzing the underlying data from multi-modal data lakes, including text files, tables, and knowledge graphs, and assessing its quality and consistency. By doing so, we can establish a stronger foundation for evaluating the outputs of generative AI models. Such an approach can ensure the correctness of generative AI, promote transparency, and enable decision-making with greater confidence. Our vision is to promote the development of verifiable generative AI and contribute to a more trustworthy and responsible use of AI.

翻訳日:2023-07-07 15:04:51 公開日:2023-07-06

# データサイエンス教育は大規模言語モデルで何をすべきか?

What Should Data Science Education Do with Large Language Models? ( http://arxiv.org/abs/2307.02792v1 )

ライセンス: Link先を確認

Xinming Tu, James Zou, Weijie J. Su, Linjun Zhang

(参考訳) ChatGPTのような大規模言語モデル(LLM)の急速な進歩は、データサイエンスと統計学に革命をもたらしている。これらの最先端ツールは複雑なプロセスを合理化する。その結果、データサイエンティストの役割が再認識される。 LLMはデータサイエンティストの責務を転換し、手作業によるコーディング、データラングリング、標準分析から、これらの自動化AIによる分析の評価と管理へと焦点を移している、と私たちは主張する。この役割の進化は、ソフトウェアエンジニアからプロダクトマネージャへの移行を思い起こさせる。本稿では, LLMを用いた具体的なデータサイエンスケーススタディを用いて, この変遷を説明する。これらの発展は、データサイエンス教育において有意義な進化を必要とする。教育は、LLMインフォームドクリエイティビティ、批判的思考、AI誘導プログラミングなど、学生の間で多様なスキルセットの育成に重点を置く必要がある。 LLMは教室でインタラクティブな教育と学習ツールとして重要な役割を担い、パーソナライズされた教育に寄与する。本稿では,これら各方向性に対する機会,資源,オープンな課題について論じる。あらゆるトランスフォーメーション技術と同様に、教育にllmを統合するには慎重に検討する必要がある。 LLMは反復作業を効率的に行うことができますが、その役割は人間の知性と創造性を補うことであり、それを置き換えることではありません。したがって、データサイエンス教育の新しい時代は、人間の専門知識とイノベーションを補完しながら、llmの利点のバランスをとるべきである。結論として、LLMの台頭はデータサイエンスとその教育の転換期を告げている。本稿は,このパラダイムシフトに伴う新たなトレンド,潜在的な機会,課題を浮き彫りにし,エキサイティングで未解決な領域に関するさらなる談話や調査のきっかけとなることを願っている。

The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art tools can streamline complex processes and, as a result, are reshaping the role of data scientists. We argue that LLMs are transforming the responsibilities of data scientists, shifting their focus from hands-on coding, data wrangling, and conducting standard analyses to assessing and managing analyses performed by these automated AIs. This evolution of roles is reminiscent of the transition from a software engineer to a product manager. We illustrate this transition with concrete data science case studies using LLMs in this paper. These developments necessitate a meaningful evolution in data science education. Pedagogy must now place greater emphasis on cultivating diverse skill sets among students, such as LLM-informed creativity, critical thinking, and AI-guided programming. LLMs can also play a significant role in the classroom as interactive teaching and learning tools, contributing to personalized education. This paper discusses the opportunities, resources, and open challenges for each of these directions. As with any transformative technology, integrating LLMs into education calls for careful consideration. While LLMs can perform repetitive tasks efficiently, it is crucial to remember that their role is to supplement human intelligence and creativity, not to replace it. Therefore, the new era of data science education should balance the benefits of LLMs while fostering complementary human expertise and innovations. In conclusion, the rise of LLMs heralds a transformative period for data science and its education. This paper seeks to shed light on the emerging trends, potential opportunities, and challenges accompanying this paradigm shift, hoping to spark further discourse and investigation into this exciting, uncharted territory.

翻訳日:2023-07-07 15:04:31 公開日:2023-07-06

# グループフェア医療画像分類におけるサブグループ分離性の役割

The Role of Subgroup Separability in Group-Fair Medical Image Classification ( http://arxiv.org/abs/2307.02791v1 )

ライセンス: Link先を確認

Charles Jones, Mélanie Roschewitz, Ben Glocker

(参考訳) 深層分類器の性能差を調査した。分類器が個人をサブグループに分ける能力は, 医用画像のモダリティや保護特性によって大きく異なっており, この特性がアルゴリズムバイアスの予測であることを示す。理論解析と広範な経験的評価を通じて,下位診断などの体系的バイアスのあるデータを用いてモデルが訓練された場合,サブグループ分離可能性,サブグループ格差,パフォーマンス低下の関係を見出した。私たちの発見は、モデルがどのように偏見を抱くかという問題に新たな光を当て、公正な医療画像AIの開発に重要な洞察を与えました。

We investigate performance disparities in deep classifiers. We find that the ability of classifiers to separate individuals into subgroups varies substantially across medical imaging modalities and protected characteristics; crucially, we show that this property is predictive of algorithmic bias. Through theoretical analysis and extensive empirical evaluation, we find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis. Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.

翻訳日:2023-07-07 15:04:03 公開日:2023-07-06

# MEDVQA-GI 2023 における UIT-Saviors: 画像強調によるマルチモーダル学習の改善

UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering ( http://arxiv.org/abs/2307.02783v1 )

ライセンス: Link先を確認

Triet M. Thai, Anh T. Vo, Hao K. Tieu, Linh N.P. Bui, Thien T.B. Nguyen

(参考訳) 近年、人工知能は医学や疾患の診断において重要な役割を担い、その1つはMedVQA(MedVQA)である。コンピュータビジョンと自然言語処理を組み合わせることで、MedVQAシステムは、与えられた質問に基づいて医療画像から関連情報を抽出し、正確な診断回答を提供する専門家を支援することができる。 ImageCLEFmed-MEDVQA-GI-2023は胃内視鏡および大腸内視鏡画像を含む消化管領域の視覚的質問応答タスクを実行した。我々のチームは,胃腸画像上のVQA性能を改善するために,画像強調によるマルチモーダル学習手法を提案することで課題1にアプローチした。マルチモーダルアーキテクチャは、BERTエンコーダと、畳み込みニューラルネットワーク(CNN)とトランスフォーマーアーキテクチャに基づいて、質問や内視鏡画像から特徴抽出のための様々な事前訓練されたビジョンモデルを備える。本研究は,CNN上でのトランスフォーマーベース視覚モデルの優位性を強調し,F1スコアが向上した8つの視覚モデルのうち6つを用いて,画像強調処理の有効性を示した。 BERT+BEiT融合と画像強調の利点を生かし, 開発テストセット上で最大87.25%の精度と91.85%のF1スコアを達成するとともに, 82.01%の精度でプライベートテストセット上で良好な結果が得られる。

In recent years, artificial intelligence has played an important role in medicine and disease diagnosis, with many applications, one of which is Medical Visual Question Answering (MedVQA). By combining computer vision and natural language processing, MedVQA systems can assist experts in extracting relevant information from medical images based on a given question and providing precise diagnostic answers. The ImageCLEFmed-MEDVQA-GI-2023 challenge posed a visual question answering task in the gastrointestinal domain, which includes gastroscopy and colonoscopy images. Our team approached Task 1 of the challenge by proposing a multimodal learning method with image enhancement to improve VQA performance on gastrointestinal images. The multimodal architecture combines a BERT encoder with different pre-trained vision models, based on convolutional neural network (CNN) and Transformer architectures, for feature extraction from the question and the endoscopy image. The results of this study highlight the dominance of Transformer-based vision models over CNNs and demonstrate the effectiveness of the image enhancement process, with six out of the eight vision models achieving a better F1-Score. Our best method, which takes advantage of BERT+BEiT fusion and image enhancement, achieves up to 87.25% accuracy and 91.85% F1-Score on the development test set, while also producing good results on the private test set with an accuracy of 82.01%.

翻訳日:2023-07-07 15:03:51 公開日:2023-07-06

# 大規模言語モデルによるコネクテッドインテリジェンスのための自律エッジAI

Large Language Models Empowered Autonomous Edge AI for Connected Intelligence ( http://arxiv.org/abs/2307.02779v1 )

ライセンス: Link先を確認

Yifei Shen, Jiawei Shao, Xinjie Zhang, Zehong Lin, Hao Pan, Dongsheng Li, Jun Zhang, Khaled B. Letaief

(参考訳) ワイヤレスネットワークの進化は、超接続されたサイバー物理世界における人間、物体、および知性のシームレスな相互接続を想定した、コネクテッド・インテリジェンス(connected intelligence)へと向かっている。エッジAIは、ネットワークエッジで高品質で低レイテンシ、プライバシ保護のAIサービスを提供することで、コネクテッドインテリジェンスを実現するための有望なソリューションとして登場します。本稿では,ユーザの多様な要件を満たすために,自動編成,適応,最適化を行う自律エッジAIシステムを紹介する。このシステムはクラウド・エッジ・クライアントの階層アーキテクチャを採用しており、大きな言語モデル、すなわちジェネレーティブ・プレトレーニング・トランスフォーマー(GPT)がクラウドに存在し、他のAIモデルがデバイスやエッジサーバで共同デプロイされる。言語理解,計画,コード生成におけるGPTの強力な能力を活用することで,エッジAIモデルを効率的にコーディネートし,ユーザの個人的要求に応えるとともに,エッジフェデレーション学習を通じて新たなモデルをトレーニングするためのコードを自動的に生成する,汎用的なフレームワークを提案する。実験結果は,ユーザの要求を正確に理解し,最小限のコストでaiモデルを効率的に実行し,連合学習による高性能aiモデルを効果的に作成するシステムの驚くべき能力を示している。

The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge AI emerges as a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving AI services at the network edge. In this article, we introduce an autonomous edge AI system that automatically organizes, adapts, and optimizes itself to meet users' diverse requirements. The system employs a cloud-edge-client hierarchical architecture, where the large language model, i.e., Generative Pretrained Transformer (GPT), resides in the cloud, and other AI models are co-deployed on devices and edge servers. By leveraging the powerful abilities of GPT in language understanding, planning, and code generation, we present a versatile framework that efficiently coordinates edge AI models to cater to users' personal demands while automatically generating code to train new models via edge federated learning. Experimental results demonstrate the system's remarkable ability to accurately comprehend user demands, efficiently execute AI models with minimal cost, and effectively create high-performance AI models through federated learning.

翻訳日:2023-07-07 15:03:23 公開日:2023-07-06

# SeLiNet:画像の感情認識のための高密度軽量ネットワーク

SeLiNet: Sentiment enriched Lightweight Network for Emotion Recognition in Images ( http://arxiv.org/abs/2307.02773v1 )

ライセンス: Link先を確認

Tuneer Khargonkar, Shwetank Choudhary, Sumit Kumar, Barath Raj KR

(参考訳) 本稿では,感情に富んだ軽量ネットワークSeLiNetと,画像の文脈的感情認識のためのエンド・ツー・エンド・デバイス・パイプラインを提案する。 SeLiNetモデルは、身体特徴抽出器、画像美学特徴抽出器、学習ベース融合ネットワークから構成され、個別の感情と人間の感情を共同で推定する。 EMOTICデータセットでは,ベースラインAPスコアの27.38に対して平均精度(AP)スコアの27.17を達成し,モデルサイズを85%削減する。さらに,ベースラインと比較してモデルサイズが93%以上減少する26.42点のオンデバイスapスコアを報告した。

In this paper, we propose SeLiNet, a sentiment-enriched lightweight network, and an end-to-end on-device pipeline for contextual emotion recognition in images. The SeLiNet model consists of a body feature extractor, an image aesthetics feature extractor, and a learning-based fusion network that jointly performs the discrete emotion and human sentiment estimation tasks. On the EMOTIC dataset, the proposed approach achieves an Average Precision (AP) score of 27.17, compared with the baseline AP score of 27.38, while reducing the model size by >85%. In addition, we report an on-device AP score of 26.42 with a reduction in model size of >93% compared to the baseline.

翻訳日:2023-07-07 15:02:55 公開日:2023-07-06

# 逆プロンプトによるクロスドメインスロット充足のためのゼロショットプロンプト学習

Generative Zero-Shot Prompt Learning for Cross-Domain Slot Filling with Inverse Prompting ( http://arxiv.org/abs/2307.02830v1 )

ライセンス: Link先を確認

Xuefeng Li, Liwen Wang, Guanting Dong, Keqing He, Jinzheng Zhao, Hao Lei, Jiachi Liu, Weiran Xu

(参考訳) ゼロショットクロスドメインスロットフィリングは、ラベル付きソースドメインからラベルなしターゲットドメインへの知識の転送を目的としている。既存のモデルはスロット記述や例をエンコードするか、ヒューリスティックなルールを使って手作りの質問テンプレートを設計する。本稿では,クロスドメインスロット充填のための生成的ゼロショットプロンプト学習フレームワークを提案する。さらに,複数の予測問題を回避するために,異なるスロットタイプを識別する新しい逆プロンプト戦略と,プロンプトパラメータの少ないトレーニングだけで高いパフォーマンスを向上させる効率的なプロンプトチューニング戦略を導入する。実験と解析により提案手法の有効性が示され、特に未確認スロットにおける大幅な改善(+13.44% F1)が示された。

Zero-shot cross-domain slot filling aims to transfer knowledge from the labeled source domain to the unlabeled target domain. Existing models either encode slot descriptions and examples or design handcrafted question templates using heuristic rules, and they suffer from poor generalization capability or robustness. In this paper, we propose a generative zero-shot prompt learning framework for cross-domain slot filling that improves both generalization and robustness over previous work. In addition, we introduce a novel inverse prompting strategy that distinguishes different slot types to avoid the multiple prediction problem, and an efficient prompt-tuning strategy that boosts performance by training only a few prompt parameters. Experiments and analysis demonstrate the effectiveness of our proposed framework, with especially large improvements (+13.44% F1) on unseen slots.

翻訳日:2023-07-07 14:56:10 公開日:2023-07-06

# 政策コントラスト模倣学習

Policy Contrastive Imitation Learning ( http://arxiv.org/abs/2307.02829v1 )

ライセンス: Link先を確認

Jialei Huang, Zhaoheng Yin, Yingdong Hu, Yang Gao

(参考訳) 逆模倣学習(英: Adversarial mimicion learning, AIL)は、最近多くの成功を収めた人気手法である。しかしながら、AILのパフォーマンスは、より困難なタスクにはまだ満足できません。主な原因の1つは、AIL識別器の低品質化によるものである。 AIL判別器は、必ずしも専門家から政策を有意義に区別するとは限らないバイナリ分類によって訓練されるので、結果として得られる報酬も意味のあるものではないかもしれない。この問題を解決するために,政策コントラスト模倣学習(PCIL)と呼ばれる新しい手法を提案する。 PCILは異なるポリシーを固定することでコントラスト表現空間を学び、スムーズなコサイン類似性に基づく報酬を生成する。提案する表現学習目標は,ail目標のより強固なバージョンと見なすことができ,エージェントとポリシーのより有意義な比較を行うことができる。理論的観点から,見習い学習フレームワークを用いた手法の有効性を示す。さらに,DeepMind Control スイートの実証評価により,PCIL が最先端の性能を達成できることが実証された。最後に、定性的な結果は、PCILが模倣学習のためのより滑らかで意味のある表現空間を構築することを示唆している。

Adversarial imitation learning (AIL) is a popular method that has recently achieved much success. However, the performance of AIL is still unsatisfactory on more challenging tasks. We find that one of the major reasons is the low quality of the AIL discriminator representation. Since the AIL discriminator is trained via binary classification, which does not necessarily discriminate the policy from the expert in a meaningful way, the resulting reward might not be meaningful either. We propose a new method called Policy Contrastive Imitation Learning (PCIL) to resolve this issue. PCIL learns a contrastive representation space by anchoring on different policies and generates a smooth cosine-similarity-based reward. Our proposed representation learning objective can be viewed as a stronger version of the AIL objective and provides a more meaningful comparison between the agent and the policy. From a theoretical perspective, we show the validity of our method using the apprenticeship learning framework. Furthermore, our empirical evaluation on the DeepMind Control suite demonstrates that PCIL can achieve state-of-the-art performance. Finally, qualitative results suggest that PCIL builds a smoother and more meaningful representation space for imitation learning.

翻訳日:2023-07-07 14:55:55 公開日:2023-07-06
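
The PCIL abstract above mentions a smooth cosine-similarity-based reward computed in a learned representation space. The sketch below only illustrates what such a reward could look like, assuming the embeddings have already been produced by some encoder; the function names and the choice of averaging over expert embeddings are hypothetical and not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_reward(agent_embedding, expert_embeddings):
    """Mean cosine similarity between one agent embedding and a set of
    expert embeddings (an illustrative reward, bounded in [-1, 1])."""
    sims = [cosine_similarity(agent_embedding, e) for e in expert_embeddings]
    return sum(sims) / len(sims)


expert = [[1.0, 0.0], [2.0, 0.0]]
print(similarity_reward([1.0, 0.0], expert))   # agent aligned with the expert
print(similarity_reward([0.0, 1.0], expert))   # agent orthogonal to the expert
```

Unlike a binary discriminator output, a reward of this form changes gradually as the agent's embedding rotates toward the expert's, which is one way to read the abstract's claim of smoothness.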

# サンプリング型高速勾配再スケーリング法による高転送性逆襲攻撃

Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks ( http://arxiv.org/abs/2307.02828v1 )

ライセンス: Link先を確認

Xu Han, Anmin Liu, Chenxuan Yao, Yanbo Fan, Kun He

(参考訳) 深層ニューラルネットワークは、人間の知覚できない摂動を良心的な入力に加えることで、敵の例に弱いことが知られている。ホワイトボックス設定で100%近い攻撃成功率を達成した後、ブラックボックス攻撃に焦点を移し、敵の事例の転送可能性に大きな注目を集めている。いずれの場合も、一般的な勾配法は一般に手動関数を用いて勾配更新の摂動を生成するが、これは概ね正しい方向を与え、大きな成功を収めた。しかし、その限界に注意を払う仕事はほとんどない。本研究では,元の勾配と発生する雑音との偏差が不正確な勾配更新推定と逆移動可能性に対する最適解をもたらすことを観測する。そこで本研究では,サンプリングに基づくFast Gradient Rescaling Method (S-FGRM)を提案する。具体的には、余分な計算コストを伴わずに手話関数を置換するためにデータ再スケーリングを用いる。さらに,再スケーリングの変動を解消し,勾配更新を安定化するDepth First Smpling法を提案する。本手法は任意の勾配に基づく攻撃に適用可能であり, 様々な入力変換やアンサンブル手法と統合して, 対向移動性の向上を図ることができる。標準のImageNetデータセットに対する大規模な実験により、我々の手法は勾配に基づく攻撃の転送可能性を大幅に向上し、最先端のベースラインよりも優れることが示された。

Deep neural networks are known to be vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to benign inputs. After attack success rates of nearly 100% were achieved in the white-box setting, more focus has shifted to black-box attacks, in which the transferability of adversarial examples has gained significant attention. In either case, common gradient-based methods generally use the sign function to generate perturbations on the gradient update, which offers a roughly correct direction and has achieved great success. However, little work pays attention to its possible limitations. In this work, we observe that the deviation between the original gradient and the generated noise may lead to inaccurate gradient update estimation and suboptimal solutions for adversarial transferability. To this end, we propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM). Specifically, we use data rescaling to substitute for the sign function without extra computational cost. We further propose a Depth First Sampling method to eliminate the fluctuation of rescaling and stabilize the gradient update. Our method can be used in any gradient-based attack and can be integrated with various input transformation or ensemble methods to further improve adversarial transferability. Extensive experiments on the standard ImageNet dataset show that our method significantly boosts the transferability of gradient-based attacks and outperforms the state-of-the-art baselines.

翻訳日:2023-07-07 14:55:35 公開日:2023-07-06
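
The S-FGRM abstract above contrasts the usual sign-based update with a rescaled gradient. The sketch below shows the difference on a toy gradient. The particular rescaling used here (log2 of the magnitude, standardization, a sigmoid, and a constant `c`) is an assumption chosen for illustration, since the abstract does not state the formula; the Depth First Sampling step is not sketched.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_step(grad):
    """Sign-based direction: every non-zero component becomes +1 or -1."""
    return [float(sign(g)) for g in grad]


def rescaled_step(grad, c=2.0, eps=1e-12):
    """Illustrative rescaling that keeps each sign but lets the step size
    grow with the relative magnitude of the gradient component."""
    logs = [math.log2(abs(g) + eps) for g in grad]
    mean = sum(logs) / len(logs)
    # Fall back to 1.0 when all magnitudes are identical (zero spread).
    std = math.sqrt(sum((v - mean) ** 2 for v in logs) / len(logs)) or 1.0
    return [c * sign(g) / (1.0 + math.exp(-(v - mean) / std))
            for g, v in zip(grad, logs)]


grad = [0.5, -0.001, 0.02]
print(sign_step(grad))       # magnitudes are discarded
print(rescaled_step(grad))   # magnitudes are compressed but still ordered
```

The sign step treats a component of 0.5 and one of 0.001 identically, which is the deviation from the original gradient that the abstract points to; the rescaled step preserves their ordering.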

# 高次流線微分方程式を用いた束特異的道図分布推定

Bundle-specific Tractogram Distribution Estimation Using Higher-order Streamline Differential Equation ( http://arxiv.org/abs/2307.02825v1 )

ライセンス: Link先を確認

Yuanjing Feng, Lei Xie, Jingqiang Wang, Jianzhong He, Fei Gao

(参考訳) トラクトグラフィーは、拡散方向と繊維幾何学との間の不明瞭な空間的対応に苦しむ繊維配向分布(FOD)から抽出されたピーク方向をトレースする。ピークに基づくトラクトグラフィ手法は「局所的」に復元された流線を「単一」にすることで,繊維束全体の傾向に関する全体的情報に欠ける。本研究では,「クラスターからクラスタへの」方法で流線束を再構成する高次流線微分方程式を用いて,束特異的な気道分布関数に基づく新しい気道図法を提案する。任意の高階ストリームライン微分方程式の統一的フレームワークを示し、拡散テンソルベクトル場に基づいて定義される不整合ストリームラインを持つファイバーバンドルを記述する。大域的なレベルでは、エネルギー最適化モデルを最小化することにより、束特異的なトラクトグラム分布(BTD)係数の推定を簡略化し、トラクトグラムバンドル情報を導入して解剖学的先行情報を提供することにより、事前指導の下でBTDと拡散テンソルベクトルの関係を特徴づける。シミュレートハフ、サイン、円データ、ismrm 2015路面図チャレンジデータ、fibercupデータ、およびヒトコネクトームプロジェクト(hcp)データからのin vivoデータを用いて、質的、定量的評価を行う実験を行った。その結果,本手法は複雑な大域繊維束を直接再構成できることがわかった。 BTDは、局所レベルでの誤差の偏差と蓄積を低減し、長距離、ねじれ、大きなファンニングトラクトを再構築するより良い結果を示す。

Tractography traces the peak directions extracted from the fiber orientation distribution (FOD). It suffers from ambiguous spatial correspondences between diffusion directions and fiber geometry and is therefore prone to producing erroneous tracks while missing true positive connections. Peak-based tractography methods reconstruct streamlines 'locally' in a 'single to single' manner and thus lack global information about the trend of the whole fiber bundle. In this work, we propose a novel tractography method based on a bundle-specific tractogram distribution function that uses a higher-order streamline differential equation and reconstructs streamline bundles in a 'cluster to cluster' manner. A unified framework for any higher-order streamline differential equation is presented to describe fiber bundles with disjoint streamlines defined on the diffusion tensor vector field. At the global level, the tractography process is simplified to the estimation of bundle-specific tractogram distribution (BTD) coefficients by minimizing an energy optimization model, which characterizes the relation between the BTD and the diffusion tensor vector under prior guidance, with tractogram bundle information introduced to provide anatomical priors. Experiments are performed on simulated Hough, Sine, and Circle data, ISMRM 2015 Tractography Challenge data, FiberCup data, and in vivo data from the Human Connectome Project (HCP) for qualitative and quantitative evaluation. The results demonstrate that our approach can reconstruct complex global fiber bundles directly. BTD reduces the error deviation and accumulation at the local level and shows better results in reconstructing long-range, twisting, and large fanning tracts.

翻訳日:2023-07-07 14:55:11 公開日:2023-07-06

# 音声感情認識のためのディープラーニングフレームワークによる生波形の評価

Evaluating raw waveforms with deep learning frameworks for speech emotion recognition ( http://arxiv.org/abs/2307.02820v1 )

ライセンス: Link先を確認

Zeynep Hilal Kilimci, Ulku Bayraktar, Ayhan Kucukmanisa

(参考訳) 音声認識は音声処理分野における課題である。このため,特徴抽出プロセスは音声信号の実証と処理において重要な役割を担っている。本研究では、EMO-DB、RAVDESS、TESS、CREMA、SAVEE、TESS+RAVDESSの6つの異なるデータセットを利用した感情の認識のための特徴抽出段階なしで、生オーディオファイルをディープニューラルネットワークに直接供給するモデルを示す。提案モデルの寄与を実証するために,メルスケールスペクトル,メル周波数ケプストラム係数といった従来の特徴抽出技術の性能を,機械学習アルゴリズム,アンサンブル学習手法,深層・ハイブリッド深層学習技術とブレンドする。サポートベクターマシン,決定木,ナイーブベイズ,ランダムフォレストモデルを機械学習アルゴリズムとして評価し,多数決と累積法をアンサンブル学習手法として評価する。さらに,畳み込みニューラルネットワーク,長期記憶ネットワーク,ハイブリッドCNN-LSTMモデルをディープラーニング手法として評価し,機械学習やアンサンブル学習法と比較した。提案モデルの有効性を示すため,最新研究との比較を行った。実験結果に基づき、cnnモデルは、生のオーディオファイルを用いたtess+ravdessデータセットの95.86%の精度で既存のアプローチに優れている。 CNNモデルによるEMO-DBの精度は90.34%、CNNモデルによるRAVDESSの精度は90.42%、LSTMモデルによるTESSの精度は99.48%、CNNモデルによるCREMAの精度は69.72%、CNNモデルによるSAVEEの精度は85.76%である。

Speech emotion recognition is a challenging task in the speech processing field. For this reason, the feature extraction process is crucial for representing and processing speech signals. In this work, we present a model that feeds raw audio files directly into deep neural networks, without any feature extraction stage, to recognize emotions using six different data sets: EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS. To demonstrate the contribution of the proposed model, traditional feature extraction techniques, namely the mel-scale spectrogram and mel-frequency cepstral coefficients, are combined with machine learning algorithms, ensemble learning methods, and deep and hybrid deep learning techniques, and their performance is evaluated. Support vector machine, decision tree, naive Bayes, and random forest models are evaluated as machine learning algorithms, while majority voting and stacking methods are assessed as ensemble learning techniques. Moreover, convolutional neural networks, long short-term memory networks, and a hybrid CNN-LSTM model are evaluated as deep learning techniques and compared with the machine learning and ensemble learning methods. To demonstrate the effectiveness of the proposed model, it is compared with state-of-the-art studies. Based on the experimental results, the CNN model outperforms existing approaches with 95.86% accuracy on the TESS+RAVDESS data set using raw audio files, thereby setting a new state of the art. The proposed model achieves 90.34% accuracy on EMO-DB with the CNN model, 90.42% on RAVDESS with the CNN model, 99.48% on TESS with the LSTM model, 69.72% on CREMA with the CNN model, and 85.76% on SAVEE with the CNN model in speaker-independent audio categorization problems.

翻訳日:2023-07-07 14:54:38 公開日:2023-07-06
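
The speech emotion recognition abstract above lists majority voting as one of the ensemble baselines. As a minimal sketch of that generic technique (not the authors' implementation), the function below picks the label predicted by the most classifiers; the label strings and the name `majority_vote` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often by the base classifiers.
    Ties are resolved in favor of the label that appears first."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of three base classifiers for one utterance.
print(majority_vote(["happy", "sad", "happy"]))
```

Stacking, the other ensemble method named in the abstract, would instead train a second-level model on the base classifiers' outputs.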

# 機械学習と脳波(EEG)の動向

Trends in Machine Learning and Electroencephalogram (EEG): A Review for Undergraduate Researchers ( http://arxiv.org/abs/2307.02819v1 )

ライセンス: Link先を確認

Nathan Koome Murungi, Michael Vinh Pham, Xufeng Dai, Xiaodong Qu

(参考訳) 本稿では,機械学習の文脈における脳-コンピュータインタフェース(BCI)に関する体系的な文献レビューを行う。私たちの焦点は脳波(EEG)研究であり、2023年現在の最新の傾向を浮き彫りにしている。目標は、bciフィールドのアクセス可能な概要を提供し、タスク、アルゴリズム、データセットをカバーすることにある。近年の知見を合成することにより,bci研究の基本的な理解を提供し,今後の研究に有望な道を見いだすことが目的である。

This paper presents a systematic literature review on Brain-Computer Interfaces (BCIs) in the context of Machine Learning. Our focus is on Electroencephalography (EEG) research, highlighting the latest trends as of 2023. The objective is to provide undergraduate researchers with an accessible overview of the BCI field, covering tasks, algorithms, and datasets. By synthesizing recent findings, our aim is to offer a fundamental understanding of BCI research, identifying promising avenues for future investigations.

翻訳日:2023-07-07 14:53:59 公開日:2023-07-06

# 高次ネットワークにおけるDegree Heterogeneity: Inference in the Hypergraph $\boldsymbol{\beta}$-Model

Degree Heterogeneity in Higher-Order Networks: Inference in the Hypergraph $\boldsymbol{\beta}$-Model ( http://arxiv.org/abs/2307.02818v1 )

ライセンス: Link先を確認

Sagnik Nandy and Bhaswar B. Bhattacharya

(参考訳) ランダムグラフに対する$\boldsymbol{\beta}$-model は、次数の不均質なネットワーク内の対関係を表現するのによく用いられる。 stasi et al. (2014) は双対相互作用を超えて、高次(多方向)相互作用を持つネットワークの次数の不均一性を捉えるハイパーグラフ $\boldsymbol{\beta}$-モデルを導入した。本稿では,複数の層を持つハイパーグラフ $\boldsymbol{\beta}$-model の厳密な研究を開始する。まず,最大確率(ml)推定値の収束率を導出し,最小速度の最適性を確立する。また,ML推定の限界分布を導出し,モデルパラメータに対する漸近的に有効な信頼区間を構築する。次に、hypergraph $\boldsymbol{\beta}$-modelにおける適合性の問題を考察する。具体的には,ヌル仮説の下での度数比(lr)検定の漸近正規性を確立し,その検出しきい値と閾値での制限パワーを導出する。興味深いことに、LRテストの検出しきい値はこのしきい値以下で漸近的に無力である、最小限の最適値であることが判明した。理論的結果は数値実験でさらに検証される。ハイパーグラフ$\boldsymbol{\beta}$-モデルの推定と推論のための理論的フレームワークの開発に加えて、上記の結果は、ml推定の最小最適性やlrテストの非null性など、グラフ$\boldsymbol{\beta}$-モデル文献の多くのギャップを埋めている。

The $\boldsymbol{\beta}$-model for random graphs is commonly used for representing pairwise interactions in a network with degree heterogeneity. Going beyond pairwise interactions, Stasi et al. (2014) introduced the hypergraph $\boldsymbol{\beta}$-model for capturing degree heterogeneity in networks with higher-order (multi-way) interactions. In this paper we initiate the rigorous study of the hypergraph $\boldsymbol{\beta}$-model with multiple layers, which allows for hyperedges of different sizes across the layers. To begin with, we derive the rates of convergence of the maximum likelihood (ML) estimate and establish their minimax rate optimality. We also derive the limiting distribution of the ML estimate and construct asymptotically valid confidence intervals for the model parameters. Next, we consider the goodness-of-fit problem in the hypergraph $\boldsymbol{\beta}$-model. Specifically, we establish the asymptotic normality of the likelihood ratio (LR) test under the null hypothesis, derive its detection threshold, and also its limiting power at the threshold. Interestingly, the detection threshold of the LR test turns out to be minimax optimal, that is, all tests are asymptotically powerless below this threshold. The theoretical results are further validated in numerical experiments. In addition to developing the theoretical framework for estimation and inference for hypergraph $\boldsymbol{\beta}$-models, the above results fill a number of gaps in the graph $\boldsymbol{\beta}$-model literature, such as the minimax optimality of the ML estimates and the non-null properties of the LR test, which, to the best of our knowledge, have not been studied before.

翻訳日:2023-07-07 14:53:51 公開日:2023-07-06

# 条件拡散を用いた単一画像LDRからHDRへの変換

Single Image LDR to HDR Conversion using Conditional Diffusion ( http://arxiv.org/abs/2307.02814v1 )

ライセンス: Link先を確認

Dwip Dalal, Gautam Vashishtha, Prajwal Singh, Shanmuganathan Raman

(参考訳) デジタルイメージングは現実のシーンを再現することを目的としているが、低ダイナミックレンジ(LDR)カメラは実際のシーンの広いダイナミックレンジを表現できず、露出不足や露出過多の画像が生じる。本稿では、ハイダイナミックレンジ(HDR)画像を再構成しながら、影やハイライトから複雑な詳細を復元する深層学習に基づくアプローチを提案する。我々は、この問題を画像から画像への(I2I)変換タスクとして定式化し、分類器フリーガイダンスを用いた条件付きデノイジング拡散確率モデル(DDPM)に基づくフレームワークを提案する。提案フレームワークに深層CNNベースのオートエンコーダを組み込み、条件付けに使用する入力LDR画像の潜在表現の質を高める。さらに、LDR-HDR変換タスクのための新たな損失関数「露光損失(Exposure Loss)」を導入する。この損失は、勾配を飽和と反対の方向に向けるのに役立ち、結果の品質をさらに向上させる。包括的な定量的・定性的実験により、提案手法の有効性を実証した。その結果は、単純な条件付き拡散ベースの手法が、複雑なカメラパイプラインベースのアーキテクチャを置き換えうることを示している。

Digital imaging aims to replicate realistic scenes, but Low Dynamic Range (LDR) cameras cannot represent the wide dynamic range of real scenes, resulting in under-/overexposed images. This paper presents a deep learning-based approach for recovering intricate details from shadows and highlights while reconstructing High Dynamic Range (HDR) images. We formulate the problem as an image-to-image (I2I) translation task and propose a conditional Denoising Diffusion Probabilistic Model (DDPM) based framework using classifier-free guidance. We incorporate a deep CNN-based autoencoder in our proposed framework to enhance the quality of the latent representation of the input LDR image used for conditioning. Moreover, we introduce a new loss function for LDR-HDR translation tasks, termed Exposure Loss. This loss helps direct gradients in the opposite direction of the saturation, further improving the results' quality. By conducting comprehensive quantitative and qualitative experiments, we have effectively demonstrated the proficiency of our proposed method. The results indicate that a simple conditional diffusion-based method can replace the complex camera pipeline-based architectures.
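
Classifier-free guidance, mentioned above, is commonly described as blending a conditional and an unconditional noise prediction. The sketch below shows one widely used form of that blend on plain Python lists. The scale convention (0 gives the unconditional prediction, 1 the conditional one) and the function name are assumptions, and the paper may use a different parametrisation.

```python
def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Move from the unconditional prediction towards the conditional one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    eps_u = [0.0, 1.0]   # hypothetical unconditional noise prediction
    eps_c = [1.0, 0.5]   # hypothetical prediction conditioned on the LDR image
    print(classifier_free_guidance(eps_u, eps_c, 2.0))
```

A scale above 1 extrapolates past the conditional prediction, which is how guidance strengthens the influence of the conditioning input.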

翻訳日:2023-07-07 14:53:20 公開日:2023-07-06

# cpdg : 動的グラフニューラルネットワークのためのコントラスト事前学習法

CPDG: A Contrastive Pre-Training Method for Dynamic Graph Neural Networks ( http://arxiv.org/abs/2307.02813v1 )

ライセンス: Link先を確認

Yuanchen Bei, Hao Xu, Sheng Zhou, Huixuan Chi, Mengdi Zhang, Zhao Li, Jiajun Bu

(参考訳) 動的グラフデータマイニングは, 動的グラフに含まれる豊富な情報と実世界で広く利用されているため, 近年普及している。動的グラフニューラルネットワーク(DGNN)の進歩にもかかわらず、豊富な情報と多様な下流タスクは、産業シナリオにおけるDGNNの実用化に重大な困難をもたらしている。そこで本稿では,この課題を事前学習によって解決し,動的グラフニューラルネットワーク(cpdg)のためのコントラスト事前学習法を提案する。 CPDGは、構造的時間的コントラスト付き事前学習スキームとともに、柔軟な構造的時間的サブグラフサンプリング器を通じて、一般化と長期モデリング機能を含むDGNNの事前訓練の課題に取り組む。大規模研究と産業用動的グラフデータセットの両方で実施された大規模な実験により、CPDGは3つの転送条件下での様々な下流タスクに対する動的グラフ事前学習において、既存の手法よりも優れた性能を示した。

Dynamic graph data mining has gained popularity in recent years due to the rich information contained in dynamic graphs and their widespread use in the real world. Despite the advances in dynamic graph neural networks (DGNNs), the rich information and diverse downstream tasks have posed significant difficulties for the practical application of DGNNs in industrial scenarios. To this end, in this paper, we propose to address them by pre-training and present the Contrastive Pre-Training Method for Dynamic Graph Neural Networks (CPDG). CPDG tackles the challenges of pre-training for DGNNs, including generalization and long-short term modeling capability, through a flexible structural-temporal subgraph sampler along with structural-temporal contrastive pre-training schemes. Extensive experiments conducted on both large-scale research and industrial dynamic graph datasets show that CPDG outperforms existing methods in dynamic graph pre-training for various downstream tasks under three transfer settings.

翻訳日:2023-07-07 14:53:01 公開日:2023-07-06

# テキストプロンプト評価によるゼロショットデジタル人間品質評価の前進

Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation ( http://arxiv.org/abs/2307.02808v1 )

ライセンス: Link先を確認

Zicheng Zhang, Wei Sun, Yingjie Zhou, Haoning Wu, Chunyi Li, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

(参考訳) デジタル人間は様々な領域で広範囲の応用を目撃し、関連する品質評価研究を必要としている。しかし、包括的なデジタル人間質評価(DHQA)データベースは存在しない。このギャップに対処するため,本研究では,全身デジタル人間を対象とした主観的品質評価データベースsjtu-h3dを提案する。 40人の高品質の基準デジタル人間と、1,120個のラベル付き歪みが7種類の歪みで生成される。 SJTU-H3DデータベースはDHQA研究のベンチマークとして機能し、処理アルゴリズムの評価と改善を可能にする。さらに、データベースバイアスを緩和しながら一般化機能を確保するため、ノン参照(NR)シナリオに焦点を当てたゼロショットDHQAアプローチを提案する。提案手法は,投影から抽出した意味的・歪み的特徴と,デジタル人間のメッシュ構造から抽出した幾何学的特徴を利用する。具体的には,コントラスト言語-画像事前学習(clip)モデルを用いて意味親和性を測定し,自然性画像品質評価器(niqe)モデルを用いて低レベルの歪み情報を取り込む。さらに,ディヘドラル角度を幾何ディスクリプタとしてメッシュ特徴を抽出する。これらの指標を集約することにより、ゼロショット性能の大幅な改善を示すDHQI(Digital Human Quality Index)を導入する。 DHQIはDHQAタスクの堅牢なベースラインとしても機能し、この分野の進歩を促進する。データベースとコードはhttps://github.com/zzc-1998/SJTU-H3Dで入手できる。

Digital humans have witnessed extensive applications in various domains, necessitating related quality assessment studies. However, there is a lack of comprehensive digital human quality assessment (DHQA) databases. To address this gap, we propose SJTU-H3D, a subjective quality assessment database specifically designed for full-body digital humans. It comprises 40 high-quality reference digital humans and 1,120 labeled distorted counterparts generated with seven types of distortions. The SJTU-H3D database can serve as a benchmark for DHQA research, allowing evaluation and refinement of processing algorithms. Further, we propose a zero-shot DHQA approach that focuses on no-reference (NR) scenarios to ensure generalization capabilities while mitigating database bias. Our method leverages semantic and distortion features extracted from projections, as well as geometry features derived from the mesh structure of digital humans. Specifically, we employ the Contrastive Language-Image Pre-training (CLIP) model to measure semantic affinity and incorporate the Naturalness Image Quality Evaluator (NIQE) model to capture low-level distortion information. Additionally, we utilize dihedral angles as geometry descriptors to extract mesh features. By aggregating these measures, we introduce the Digital Human Quality Index (DHQI), which demonstrates significant improvements in zero-shot performance. The DHQI can also serve as a robust baseline for DHQA tasks, facilitating advancements in the field. The database and the code are available at https://github.com/zzc-1998/SJTU-H3D.
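
To illustrate the geometry descriptor mentioned above, here is a minimal sketch of a dihedral angle between two triangles that share an edge, computed from their face normals. The convention used (interior angle equals pi minus the angle between consistently oriented normals, so coplanar faces give pi) is an assumption, and the abstract does not say which convention or aggregation the authors use.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def face_normal(p0, p1, p2):
    """Unnormalised normal of a triangle; its direction follows the vertex winding."""
    return _cross(_sub(p1, p0), _sub(p2, p0))


def dihedral_angle(tri_a, tri_b):
    """Pi minus the angle between the two face normals, in radians."""
    na, nb = face_normal(*tri_a), face_normal(*tri_b)
    cos = _dot(na, nb) / math.sqrt(_dot(na, na) * _dot(nb, nb))
    cos = max(-1.0, min(1.0, cos))   # guard against rounding just outside [-1, 1]
    return math.pi - math.acos(cos)


if __name__ == "__main__":
    flat_a = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
    flat_b = ((1, 0, 0), (1, 1, 0), (0, 1, 0))   # coplanar neighbour
    print(dihedral_angle(flat_a, flat_b))
```

A quality descriptor would then summarise these angles over all adjacent face pairs of the mesh, for example with a histogram or simple statistics.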

翻訳日:2023-07-07 14:52:43 公開日:2023-07-06

# 高密度認識タスクにおける基礎モデルの現在の利用法に関する批判的考察

A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task ( http://arxiv.org/abs/2307.02862v1 )

ライセンス: Link先を確認

Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku

(参考訳) 近年, 画像認識や生成など多くの分野において, 膨大なモダリティデータを学習した大規模モデルは, 基礎モデルと呼ばれることが多いが, 顕著な達成を達成している。当初のアプリケーションでは大きな成功を収めたものの、これらの基盤モデルが他のダウンストリームタスクにも適用できるかどうかはまだ不明である。本稿では,事前学習した基礎モデルに基づく識別的高密度化タスクの手法に関する簡単な調査を行う。また,Stable Diffusionに基づく既存の開語彙セグメンテーション手法の予備的検討を行い,セグメンテーションのための拡散モデルの展開方法が最適でないことを示す。これは、下流タスクに基礎モデルを採用するための将来の研究のための洞察を提供することを目的としている。

In recent years, large models trained on huge amounts of cross-modality data, usually termed foundation models, have achieved conspicuous success in many fields, such as image recognition and generation. Despite their great success in their original applications, it is still unclear whether these foundation models can be applied to other downstream tasks. In this paper, we conduct a short survey of current methods for discriminative dense recognition tasks that are built on pretrained foundation models. We also provide a preliminary experimental analysis of an existing open-vocabulary segmentation method based on Stable Diffusion, which indicates that the current way of deploying diffusion models for segmentation is not optimal. This aims to provide insights for future research on adopting foundation models for downstream tasks.

翻訳日:2023-07-07 14:46:57 公開日:2023-07-06

# フレームスキップによるフェイスアンチスプーフィングのための深層アンサンブル学習

Deep Ensemble Learning with Frame Skipping for Face Anti-Spoofing ( http://arxiv.org/abs/2307.02858v1 )

ライセンス: Link先を確認

Usman Muhammad, Md Ziaul Hoque, Mourad Oussalah and Jorma Laaksonen

(参考訳) スプーフィング攻撃(spoofing attack)とも呼ばれる顔提示攻撃は、アクセス制御システム、モバイル決済、本人確認システムなど、顔認識に依存する生体認証システムに重大な脅威をもたらす。スプーフィングを防止するため、連続するビデオフレームにおける顔の動きを分析するビデオベースの手法がいくつか文献で提案されている。しかし、隣接するフレーム間の動きを推定することは困難であり、計算コストが高い。本稿では、顔アンチスプーフィングタスクを動き予測問題として再定式化し、フレームスキップ機構を備えた深層アンサンブル学習モデルを提案する。提案するフレームスキップは、元のビデオを固定サイズのビデオクリップに分割する一様サンプリング手法に基づいている。これにより、クリップのnフレームごとに1フレームが選択され、3つの異なるリカレントニューラルネットワーク(RNN)の訓練中に時間パターンを容易に捉えられるようになる。各RNNの性能に動機づけられ、個々のRNNの予測を組み合わせることで全体的な認識性能を向上させるメタモデルを開発した。4つのデータセットで広範な実験を行い、最も難しいクロスデータセットテストシナリオにおいて、半総誤差率(HTER)でMSU-MFSD(3.12%)、Replay-Attack(11.19%)、OULU-NPU(12.23%)の最先端性能を報告する。

Face presentation attacks, also known as spoofing attacks, pose a significant threat to biometric systems that rely on facial recognition, such as access control systems, mobile payments, and identity verification systems. To prevent spoofing, several video-based methods have been presented in the literature that analyze facial motion in successive video frames. However, estimating the motion between adjacent frames is a challenging task and requires high computational cost. In this paper, we reformulate the face anti-spoofing task as a motion prediction problem and introduce a deep ensemble learning model with a frame skipping mechanism. The proposed frame skipping is based on a uniform sampling approach where the original video is divided into fixed-size video clips. In this way, every nth frame of the clip is selected to ensure that the temporal patterns can easily be perceived during the training of three different recurrent neural networks (RNNs). Motivated by the performance of each RNN, a meta-model is developed to improve the overall recognition performance by combining the predictions of the individual RNNs. Extensive experiments were conducted on four datasets, and state-of-the-art performance is reported for MSU-MFSD (3.12%), Replay-Attack (11.19%), and OULU-NPU (12.23%) using half total error rate (HTER) in the most challenging cross-dataset test scenario.
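
The frame skipping described above can be sketched on frame indices alone. This is an illustration under assumptions, not the authors' code: the helper names are hypothetical, an incomplete trailing clip is dropped, and sampling starts at the first frame of each clip. The last helper uses the standard definition of HTER as the mean of the false acceptance rate and the false rejection rate.

```python
def split_into_clips(num_frames, clip_size):
    """Frame indices of consecutive fixed-size clips; a short tail is dropped (assumption)."""
    return [
        list(range(start, start + clip_size))
        for start in range(0, num_frames - clip_size + 1, clip_size)
    ]


def skip_frames(clip, n):
    """Keep every n-th frame of a clip, starting from its first frame (assumption)."""
    return clip[::n]


def half_total_error_rate(far, frr):
    """HTER: mean of the false acceptance rate and the false rejection rate."""
    return (far + frr) / 2.0


if __name__ == "__main__":
    clips = split_into_clips(num_frames=10, clip_size=4)
    print([skip_frames(clip, 2) for clip in clips])
```

Each sub-sampled clip would then be passed to the recurrent models, which see longer-range motion for the same number of input frames.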

翻訳日:2023-07-07 14:46:44 公開日:2023-07-06

# お金だけでなく、ランサムウェア攻撃による現実世界の被害

It's more than just money: The real-world harms from ransomware attacks ( http://arxiv.org/abs/2307.02855v1 )

ライセンス: Link先を確認

Nandita Pattnaik, Jason R. C. Nurse, Sarah Turner, Gareth Mott, Jamie MacColl, Pia Huesch, James Sullivan

(参考訳) サイバー攻撃の頻度と洗練度が高まるにつれて、組織はインシデントの現実に直面する準備が整う必要がある。セキュリティリスクの管理に成功しようとする組織計画は、攻撃の余波によって影響を受ける害(すなわち負の影響)と様々な当事者を明確に理解しなければならない。この目的のために,本稿では,サイバー攻撃によって引き起こされる多数の現実世界の害について,特にランサムウェア事件を中心に,新たな調査を行っている。この調査は、このような事故による損害をモデル化するための、新しい堅牢な方法論の提案にも繋がる。ランサムウェア攻撃後の様々な段階で発生する害の種類や被害(例えば、オフラインのエンタープライズサーバ)が、利害関係者(例えば、顧客が社会福祉給付や銀行口座にアクセスできないこと)に悪影響を及ぼす可能性を秘めているかどうかを調べるために、ランサムウェアインシデントに関する公開可能なケースデータを作成します。私たちの分析で顕著な発見は、ビジネスそのものを超えて(身代金の支払い以外の)社会的・人的被害の顕著なセットの特定と、産業セクターに関係なく攻撃後に生じる複雑な害の網である。また,完全なデータがないため,害の完全な範囲とシーケンスを解読することは困難な作業であることも確認した。この論文は、ランサムウェアの害に対する透明性の向上を論じ、これらの事件の現実をよりよく理解し、組織や社会の利益をより一般的にする。

As cyber-attacks continue to increase in frequency and sophistication, organisations must be better prepared to face the reality of an incident. Any organisational plan that intends to be successful at managing security risks must clearly understand the harm (i.e., negative impact) and the various parties affected in the aftermath of an attack. To this end, this article conducts a novel exploration into the multitude of real-world harms that can arise from cyber-attacks, with a particular focus on ransomware incidents given their current prominence. This exploration also leads to the proposal of a new, robust methodology for modelling harms from such incidents. We draw on publicly-available case data on high-profile ransomware incidents to examine the types of harm that emerge at various stages after a ransomware attack and how harms (e.g., an offline enterprise server) may trigger other negative, potentially more substantial impacts for stakeholders (e.g., the inability for a customer to access their social welfare benefits or bank account). Prominent findings from our analysis include the identification of a notable set of social/human harms beyond the business itself (and beyond the financial payment of a ransom) and a complex web of harms that emerge after attacks regardless of the industry sector. We also observed that deciphering the full extent and sequence of harms can be a challenging undertaking because of the lack of complete data available. This paper consequently argues for more transparency on ransomware harms, as it would lead to a better understanding of the realities of these incidents to the benefit of organisations and society more generally.

翻訳日:2023-07-07 14:46:14 公開日:2023-07-06

# 窒素空孔中心における光子放出統計のキャラクタリゼーション

Characterization of the photon emission statistics in nitrogen-vacancy centers ( http://arxiv.org/abs/2307.02854v1 )

ライセンス: Link先を確認

Iván Panadero, Hilario Espinós, Lucas Tsunaki, Kseniia Volkova, Ander Tobalina, Jorge Casanova, Pablo Acedo, Boris Naydenov, Ricardo Puebla, and Erik Torrontegui

(参考訳) 非共鳴レーザー励起と共振マイクロ波制御の下でダイヤモンド中の窒素空孔(NV)中心から放出される光子の時間依存カウント統計をモデル化し,実験的に実証した。 nvセンターの高速固有ダイナミクスに関連する7つの電子状態に対する量子ジャンプ形式性の一般化は、その放出を特徴づけることができ、量子系の内部状態と測定可能な検出された光子数との関係を明確化する自己完結モデルを提供する。このモデルにより、検出プロトコルの開発により、磁場測定に対するシステムの感度を最大化しながら、エネルギーと時間資源を最適化することができる。

We model and experimentally demonstrate the full time-dependent counting statistics of photons emitted by a single nitrogen-vacancy (NV) center in diamond under non-resonant laser excitation and resonant microwave control. A generalization of the quantum jump formalism for the seven electronic states involved in the fast intrinsic dynamics of an NV center provides a self-contained model that allows for the characterization of its emission and clarifies the relation between the quantum system internal states and the measurable detected photon counts. The model allows the elaboration of detection protocols to optimize the energy and time resources while maximizing the system sensitivity to magnetic-field measurements.

翻訳日:2023-07-07 14:45:45 公開日:2023-07-06

# 誇大広告に抵抗しろ! Résumé-Driven Development に対処するための実践的提言

Resist the Hype! Practical Recommendations to Cope With Résumé-Driven Development ( http://arxiv.org/abs/2307.02850v1 )

ライセンス: Link先を確認

Jonas Fritzsch, Marvin Wyrich, Justus Bogner, Stefan Wagner

(参考訳) 技術トレンドは、ソフトウェアとitプロフェッショナルの雇用プロセスにおいて重要な役割を果たす。採用(130)職と技術(558)職の両方で591人のソフトウェア専門家を対象とした最近の研究で、r\'esum\e とアプリケーションプロセスにおける技術トレンドを過大に強調する傾向に対する経験的サポートを見出した。雇用者の60%は、こうした傾向が求人広告に影響を与えることに同意した。ソフトウェア専門家のうち、82%は、日々の仕事にトレンド技術を使うことは、将来の雇用者にとってより魅力的なものになると信じていた。この現象は以前、r\'esum\'e-driven development (rdd) というラベルで報告されたことがある。この記事では、RDDがソフトウェア開発プラクティスに与える影響について、より真剣な議論を始めようとしています。我々は,この現象が有害な自己維持動態をいかに生み出すかを説明し,雇用者と応募者の両方の視点で,現状を良く変えるための実践的なレコメンデーションを提供する。

Technology trends play an important role in the hiring process for software and IT professionals. In a recent study of 591 software professionals in both hiring (130) and technical (558) roles, we found empirical support for a tendency to overemphasize technology trends in résumés and the application process. 60% of the hiring professionals agreed that such trends would influence their job advertisements. Among the software professionals, 82% believed that using trending technologies in their daily work would make them more attractive for potential future employers. This phenomenon has previously been reported anecdotally and somewhat humorously under the label Résumé-Driven Development (RDD). Our article seeks to initiate a more serious debate about the consequences of RDD on software development practice. We explain how the phenomenon may constitute a harmful self-sustaining dynamic, and provide practical recommendations for both the hiring and applicant perspectives to change the current situation for the better.

翻訳日:2023-07-07 14:45:33 公開日:2023-07-06

# NatLogAttack:自然言語推論モデルを自然論理で攻撃するフレームワーク

NatLogAttack: A Framework for Attacking Natural Language Inference Models with Natural Logic ( http://arxiv.org/abs/2307.02849v1 )

ライセンス: Link先を確認

Zi'ou Zheng and Xiaodan Zhu

(参考訳) 推論は、当初から人工知能の中心的な話題であった。分散表現とニューラルネットワークの最近の進歩は、自然言語推論の最先端性能を改善し続けている。しかし、モデルが結論に達するために真の推論を行っているのか、それとも見せかけの相関に依存しているのかは、未解決の問題である。敵対的攻撃は、被害者モデルのアキレス腱を評価するための重要なツールであることが証明されている。本研究では、論理形式に基づく攻撃モデルの開発という基礎的問題を検討する。自然論理を中心とする体系的攻撃を行うNatLogAttackを提案する。自然論理は、アリストテレスの三段論法にまで遡ることができ、自然言語推論のために緊密に発展してきた古典的な論理形式である。提案するフレームワークは、ラベル保存攻撃とラベル反転攻撃の両方を行う。既存の攻撃モデルと比較して、NatLogAttackは被害者モデルへの訪問回数が少なく、より良い敵対的事例を生成することを示す。被害者モデルはラベル反転設定でより脆弱であることが分かった。NatLogAttackは、既存および将来のNLIモデルの能力を重要な観点から調査するためのツールを提供し、推論の望ましい特性を理解するために、論理に基づく攻撃がさらに探求されることを期待する。

Reasoning has been a central topic in artificial intelligence from the beginning. The recent progress made on distributed representation and neural networks continues to improve the state-of-the-art performance of natural language inference. However, it remains an open question whether the models perform real reasoning to reach their conclusions or rely on spurious correlations. Adversarial attacks have proven to be an important tool to help evaluate the Achilles' heel of the victim models. In this study, we explore the fundamental problem of developing attack models based on logic formalism. We propose NatLogAttack to perform systematic attacks centring around natural logic, a classical logic formalism that is traceable back to Aristotle's syllogism and has been closely developed for natural language inference. The proposed framework renders both label-preserving and label-flipping attacks. We show that compared to the existing attack models, NatLogAttack generates better adversarial examples with fewer visits to the victim models. The victim models are found to be more vulnerable under the label-flipping setting. NatLogAttack provides a tool to probe the existing and future NLI models' capacity from a key viewpoint and we hope more logic-based attacks will be further explored for understanding the desired property of reasoning.

翻訳日:2023-07-07 14:45:16 公開日:2023-07-06

# コンピュータ支援結核診断の再検討

Revisiting Computer-Aided Tuberculosis Diagnosis ( http://arxiv.org/abs/2307.02848v1 )

ライセンス: Link先を確認

Yun Liu, Yu-Huan Wu, Shi-Chen Zhang, Li Liu, Min Wu, and Ming-Ming Cheng

(参考訳) 結核(TB)は世界的な健康上の大きな脅威であり、毎年数百万人が死亡している。早期の診断と治療は生存の可能性を大きく高めるが、特に発展途上国では依然として大きな課題である。近年、深層学習を用いたコンピュータ支援結核診断(CTD)が有望視されているが、限られた訓練データによって進歩が妨げられている。そこで本研究では、TB領域に対応する境界ボックスアノテーションを備えた11,200枚の胸部X線(CXR)画像を含む大規模データセット、Tuberculosis X-ray(TBX11K)データセットを構築する。このデータセットは、高品質なCTDのための高度な検出器の訓練を可能にする。さらに、CXR画像分類とTB感染領域検出を同時に行うための強力なベースラインであるSymFormerを提案する。SymFormerは、Symmetric Search Attention(SymAttention)を組み込み、CXR画像の左右対称性に対処して識別的特徴を学習する。CXR画像は左右対称性に厳密に従わない場合があるため、特徴の再較正を通じてSymAttentionを促進するSymmetric Positional Encoding(SPE)も提案する。今後のCTD研究を促進するため、評価指標の導入、既存の検出器を改変したベースラインモデルの評価、オンラインチャレンジの実施により、ベンチマークを構築する。実験により、SymFormerはTBX11Kデータセット上で最先端の性能を達成することが示された。データ、コード、モデルは公開される予定である。

Tuberculosis (TB) is a major global health threat, causing millions of deaths annually. Although early diagnosis and treatment can greatly improve the chances of survival, it remains a major challenge, especially in developing countries. Recently, computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data. To address this, we establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas. This dataset enables the training of sophisticated detectors for high-quality CTD. Furthermore, we propose a strong baseline, SymFormer, for simultaneous CXR image classification and TB infection area detection. SymFormer incorporates Symmetric Search Attention (SymAttention) to tackle the bilateral symmetry property of CXR images for learning discriminative features. Since CXR images may not strictly adhere to the bilateral symmetry property, we also propose Symmetric Positional Encoding (SPE) to facilitate SymAttention through feature recalibration. To promote future research on CTD, we build a benchmark by introducing evaluation metrics, evaluating baseline models reformed from existing detectors, and running an online challenge. Experiments show that SymFormer achieves state-of-the-art performance on the TBX11K dataset. The data, code, and models will be released.

翻訳日:2023-07-07 14:44:55 公開日:2023-07-06

# 関数近似を用いた証明可能に効率的な反復CVaR強化学習

Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation ( http://arxiv.org/abs/2307.02842v1 )

ライセンス: Link先を確認

Yu Chen, Yihan Du, Pihe Hu, Siwei Wang, Desheng Wu, Longbo Huang

(参考訳) リスクセンシティブ強化学習(RL)は、期待報酬とリスクのバランスをとるポリシーを最適化することを目的としている。本稿では、線形および一般関数近似の下で、反復条件付きバリュー・アット・リスク(CVaR)目的を用いたリスクセンシティブRLの新しい定式化について検討する。関数近似を備えたICVaR-RLと呼ばれるこの新しい定式化は、各決定ステップにおける安全性を保証するための原理的な方法を提供する。線形関数近似を用いたICVaR-RLに対して、計算効率の良いアルゴリズムICVaR-Lを提案し、$\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ のリグレットを達成する。ここで $\alpha$ はリスクレベル、$d$ は状態行動特徴の次元、$H$ は各エピソードの長さ、$K$ はエピソード数である。また、一致する下界 $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$ を確立し、$d$ および $K$ に関するICVaR-Lの最適性を検証する。一般関数近似を用いたICVaR-RLに対しては、アルゴリズムICVaR-Gを提案し、$\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ のリグレットを達成する。ここで $D$ はeluder次元と被覆数に依存する次元パラメータである。さらに、本解析は、CVaR演算子の効率的な近似、CVaRに適応した特徴を用いた新しいリッジ回帰、改良された楕円ポテンシャル補題など、リスクセンシティブRLのためのいくつかの新しい手法を提供する。

Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximations. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm ICVaR-L, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ regret, where $\alpha$ is the risk level, $d$ is the dimension of state-action features, $H$ is the length of each episode, and $K$ is the number of episodes. We also establish a matching lower bound $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$ to validate the optimality of ICVaR-L with respect to $d$ and $K$. For ICVaR-RL with general function approximation, we propose algorithm ICVaR-G, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ regret, where $D$ is a dimensional parameter that depends on the eluder dimension and covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
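
As a rough illustration of the CVaR operator that the iterated objective applies at every step, the sketch below computes an empirical CVaR of equally weighted reward samples, taken here as the average of the lowest $\alpha$-fraction. The one-step backup helper is a simplified assumption about how such an operator enters a value recursion. It is not the ICVaR-L or ICVaR-G algorithm, and both function names are hypothetical.

```python
import math


def empirical_cvar(samples, alpha):
    """Average of the lowest alpha-fraction of equally weighted reward samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(samples)
    mass = alpha * len(ordered)       # tail mass, measured in numbers of samples
    whole = int(math.floor(mass))     # samples lying entirely inside the tail
    total = sum(ordered[:whole])
    if whole < len(ordered):
        # The boundary sample contributes only its fractional share of the tail.
        total += (mass - whole) * ordered[whole]
    return total / mass


def cvar_backup(reward, next_values, alpha):
    """One risk-averse backup: immediate reward plus the CVaR of sampled next-state values."""
    return reward + empirical_cvar(next_values, alpha)


if __name__ == "__main__":
    print(empirical_cvar([4.0, 1.0, 3.0, 2.0], 0.5))
```

With $\alpha = 1$ the operator reduces to the ordinary mean, and smaller $\alpha$ focuses on worse outcomes, which is consistent with the regret bounds above growing as $\alpha$ shrinks.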

翻訳日:2023-07-07 14:44:32 公開日:2023-07-06

# ニュース要約生成のための進化的微調整によるLLMの強化

Enhancing LLM with Evolutionary Fine Tuning for News Summary Generation ( http://arxiv.org/abs/2307.02839v1 )

ライセンス: Link先を確認

Le Xiao and Xiaolin Chen

(参考訳) ニュース要約生成はインテリジェンス分析の分野で重要なタスクであり、人々が複雑な現実世界の出来事を理解し、反応するのに役立つ正確で包括的な情報を提供する。しかし、従来のニュース要約生成手法では、モデル自体やトレーニングデータの量、テキストノイズの影響に制限があるため、信頼性の高い情報を正確に生成することは困難である。本稿では,自然言語理解と生成能力を備えたllmを用いたニュース要約生成のための新しいパラダイムを提案する。 LLMを用いて、ニュース段落に含まれる事象から複数の構造化イベントパターンを抽出し、遺伝的アルゴリズムを用いてイベントパターンの集団を進化させ、LLMに入力する最も適応性の高いイベントパターンを選択し、ニュース要約を生成する。ニュース概要生成装置(NSG)は、イベントパターンの集団を選択し、進化させ、ニュース要約を生成するように設計されている。実験の結果,ニュース要約生成器は,一般化能力を備えた正確で信頼性の高いニュース要約を生成できることがわかった。

News summary generation is an important task in the field of intelligence analysis, which can provide accurate and comprehensive information to help people better understand and respond to complex real-world events. However, traditional news summary generation methods face some challenges, which are limited by the model itself and the amount of training data, as well as the influence of text noise, making it difficult to generate reliable information accurately. In this paper, we propose a new paradigm for news summary generation using LLM with powerful natural language understanding and generative capabilities. We use LLM to extract multiple structured event patterns from the events contained in news paragraphs, evolve the event pattern population with genetic algorithm, and select the most adaptive event pattern to input into the LLM to generate news summaries. A News Summary Generator (NSG) is designed to select and evolve the event pattern populations and generate news summaries. The experimental results show that the news summary generator is able to generate accurate and reliable news summaries with some generalization ability.

翻訳日:2023-07-07 14:43:50 公開日:2023-07-06

# 産業異常検出と位置推定のためのノイズ・ノーム再構成

Noise-to-Norm Reconstruction for Industrial Anomaly Detection and Localization ( http://arxiv.org/abs/2307.02836v1 )

ライセンス: Link先を確認

Shiqi Deng and Zhiyu Sun and Ruiyan Zhuang and Jun Gong

(参考訳) 異常検出には幅広い応用があり、特に工業品質検査において重要である。現在、多くの最高の異常検出モデルは特徴埋め込み法に依存している。しかし、これらの手法は、オブジェクト位置の変動が大きいデータセットではうまく機能しない。再構成に基づく手法では、サンプルの位置差を考慮せずに再構成誤差を用いて異常を検出する。本研究では,異常領域の不変な再構成を回避し,ノイズ・ツー・ノルムパラダイムを用いた再構成手法を提案する。再構成ネットワークはM-netをベースとして,マルチスケールフュージョンと残留アテンションモジュールを組み込んで,エンドツーエンドの異常検出とローカライゼーションを実現している。実験により, 異常領域を正常なパターンに再構成し, 正確な異常検出と局所化を実現するのに有効であることが示された。 mpddデータセットとvisaデータセットでは,提案手法が最新の手法よりも高い競合性能を達成し,mpddデータセットに新たな最先端標準を設定した。

Anomaly detection has a wide range of applications and is especially important in industrial quality inspection. Currently, many top-performing anomaly-detection models rely on feature-embedding methods. However, these methods do not perform well on datasets with large variations in object locations. Reconstruction-based methods use reconstruction errors to detect anomalies without considering positional differences between samples. In this study, a reconstruction-based method using the noise-to-norm paradigm is proposed, which avoids the invariant reconstruction of anomalous regions. Our reconstruction network is based on M-net and incorporates multiscale fusion and residual attention modules to enable end-to-end anomaly detection and localization. Experiments demonstrate that the method is effective in reconstructing anomalous regions into normal patterns and achieving accurate anomaly detection and localization. On the MPDD and VisA datasets, our proposed method achieved more competitive results than the latest methods, and it set a new state-of-the-art standard on the MPDD dataset.

翻訳日:2023-07-07 14:43:32 公開日:2023-07-06

# 後知恵における複数観測を用いたPOMDPのサンプル効率的な学習

Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight ( http://arxiv.org/abs/2307.02884v1 )

ライセンス: Link先を確認

Jiacheng Guo, Minshuo Chen, Huan Wang, Caiming Xiong, Mengdi Wang, Yu Bai

(参考訳) 本稿では、強化学習における困難な問題であり、最悪の場合には指数関数的に困難であることが知られている、部分観測マルコフ決定過程(POMDP)における学習のサンプル効率について検討する。ゲームプレイにおけるロードなどの実世界の設定に動機づけられ、「後知恵における複数観測」(multiple observations in hindsight)と呼ぶ強化されたフィードバックモデルを提案する。このモデルでは、POMDPと対話する各エピソードの後、学習者は遭遇した潜在状態から放出される複数の追加観測を収集できるが、潜在状態そのものは観測できない。このフィードバックモデルの下で、POMDPの2つの新しいサブクラスである multi-observation revealing POMDP と distinguishable POMDP に対してサンプル効率の良い学習が可能であることを示す。両方のサブクラスは、標準的な軌跡フィードバックの下でサンプル効率の良い学習が可能な、広く研究されているサブクラスである revealing POMDP を一般化し、実質的に緩和する。特に、distinguishable POMDP は、異なる潜在状態からの放出分布が、revealing POMDP で要求されるように線形独立であることではなく、単に異なることのみを要求する。

This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called "multiple observations in hindsight", where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: multi-observation revealing POMDPs and distinguishable POMDPs. Both subclasses generalize and substantially relax revealing POMDPs, a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be different instead of linearly independent as required in revealing POMDPs.

翻訳日:2023-07-07 14:35:59 公開日:2023-07-06

# 必要なのはコントラストだけ

Contrast Is All You Need ( http://arxiv.org/abs/2307.02882v1 )

ライセンス: Link先を確認

Burak Kilic, Florix Bex, Albert Gatt

(参考訳) 本研究では,ラベル付き法定データが小さく,不均衡であり,結果の質を損なう可能性のある,データスカース分類シナリオを分析する。本研究では,SetFit(Sentence Transformer Finetuning),コントラスト学習設定,法定規定分類タスクにおけるバニラ微調整設定の2つに着目した。さらに,lime (local interpretable model-agnostic explanations) で抽出された特徴を比較し,モデルの分類決定にどの特徴が寄与したかを確認する。その結果,SetFitのコントラスト設定は,トレーニングサンプルのごく一部を使用しながら,バニラファインタニングよりも優れていた。 LIMEの結果から, 比較学習アプローチは, 法的に有意な肯定的特徴と否定的特徴の両方を増強し, 分類結果に寄与することが示唆された。このように、対照的な目的によって微調整されたモデルは、その決定を法的に情報的特徴に基づいてより自信を持って下すように思われる。

In this study, we analyze data-scarce classification scenarios, where available labeled legal data is small and imbalanced, potentially hurting the quality of the results. We focus on two finetuning objectives: SetFit (Sentence Transformer Finetuning), a contrastive learning setup, and a vanilla finetuning setup, both applied to a legal provision classification task. Additionally, we compare the features that are extracted with LIME (Local Interpretable Model-agnostic Explanations) to see which particular features contributed to the model's classification decisions. The results show that a contrastive setup with SetFit performed better than vanilla finetuning while using a fraction of the training samples. LIME results show that the contrastive learning approach helps boost both positive and negative features which are legally informative and contribute to the classification results. Thus a model finetuned with a contrastive objective seems to base its decisions more confidently on legally informative features.
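
A contrastive setup of the kind described above trains on sentence pairs rather than single labeled sentences, which is one reason it can work with few samples. The sketch below builds such pairs from a tiny labeled set. It enumerates all pairs for clarity, whereas SetFit itself samples pairs, and the function name, example sentences, and labels are made up for illustration.

```python
import itertools


def contrastive_pairs(texts, labels):
    """Return (text_a, text_b, target) triples: 1.0 for the same label, 0.0 otherwise."""
    pairs = []
    for (ta, la), (tb, lb) in itertools.combinations(zip(texts, labels), 2):
        pairs.append((ta, tb, 1.0 if la == lb else 0.0))
    return pairs


if __name__ == "__main__":
    # Hypothetical legal provisions with hypothetical class labels.
    texts = [
        "The tenant shall pay rent monthly.",
        "The lessee must pay rent each month.",
        "Either party may terminate with notice.",
    ]
    labels = ["payment", "payment", "termination"]
    for pair in contrastive_pairs(texts, labels):
        print(pair)
```

With n labeled sentences this yields n(n-1)/2 training pairs, so a handful of examples per class already produces a much larger contrastive training set.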

翻訳日:2023-07-07 14:35:39 公開日:2023-07-06

# 画像多様体の確率的・意味的記述とその応用

Probabilistic and Semantic Descriptions of Image Manifolds and Their Applications ( http://arxiv.org/abs/2307.02881v1 )

ライセンス: Link先を確認

Peter Tu, Zhaoyuan Yang, Richard Hartley, Zhiwei Xu, Jing Zhang, Dylan Campbell, Jaskirat Singh, Tianyu Wang

(参考訳) 本稿では,高次元画像空間の制限領域内に存在するように制限されているという観測結果を反映した画像の確率密度関数を推定する手法について記述することから始める。画像は高次元空間の低次元多様体上にあると言うのが一般的である。しかし、像はそのような低次元多様体上に存在するかもしれないが、多様体上のすべての点が同じ確率で像になるとは限らない。画像は多様体上に不均一に分布し、この分布を確率分布としてモデル化する方法を考案する。この目標を追求するために、AIやコンピュータビジョンコミュニティで人気のある生成モデルを検討する。我々の目的のために、生成的・確率的モデルは性質を持つべきである 1)サンプル生成:モデル化された密度関数に従ってこの分布からサンプルを採取できなければならない。 2) 確率計算: 興味のあるデータセットから以前に見つからなかったサンプルが与えられた場合、少なくとも正規化定数までサンプルの確率を計算することができる。そこで本研究では,流れの正規化や拡散モデルなどの手法について検討する。次に,このような確率的記述を,敵の攻撃に対する防御構築に利用できることを示す。密度の観点で多様体を記述することに加えて、多様体上の点を記述するために意味論的解釈をどのように利用できるかを考える。この目的のために, 変分エンコーダを用いて与えられた多様体上に存在する点の不等角表現を生成する, 創発的言語フレームワークを考える。多様体上の点間の軌道は、進化する意味記述によって記述することができる。

This paper begins with a description of methods for estimating probability density functions for images that reflect the observation that such data is usually constrained to lie in restricted regions of the high-dimensional image space: not every pattern of pixels is an image. It is common to say that images lie on a lower-dimensional manifold in the high-dimensional space. However, although images may lie on such lower-dimensional manifolds, it is not the case that all points on the manifold have an equal probability of being images. Images are unevenly distributed on the manifold, and our task is to devise ways to model this distribution as a probability distribution. In pursuing this goal, we consider generative models that are popular in the AI and computer vision communities. For our purposes, generative/probabilistic models should have the properties of 1) sample generation: it should be possible to sample from this distribution according to the modelled density function, and 2) probability computation: given a previously unseen sample from the dataset of interest, one should be able to compute the probability of the sample, at least up to a normalising constant. To this end, we investigate the use of methods such as normalising flow and diffusion models. We then show that such probabilistic descriptions can be used to construct defences against adversarial attacks. In addition to describing the manifold in terms of density, we also consider how semantic interpretations can be used to describe points on the manifold. To this end, we consider an emergent language framework which makes use of variational encoders to produce a disentangled representation of points that reside on a given manifold. Trajectories between points on a manifold can then be described in terms of evolving semantic descriptions.

翻訳日:2023-07-07 14:35:19 公開日:2023-07-06

# 大規模LiDAR点雲における高精度インスタンスセグメンテーションに向けて

Towards accurate instance segmentation in large-scale LiDAR point clouds ( http://arxiv.org/abs/2307.02877v1 )

ライセンス: Link先を確認

Binbin Xiang, Torben Peters, Theodora Kontogianni, Frawa Vetterli, Stefano Puliti, Rasmus Astrup, Konrad Schindler

(参考訳) パンオプティカルセグメンテーション(panoptic segmentation)は、セマンティックとインスタンスセグメンテーションの組み合わせである。 3dポイントクラウド内のポイントをセマンティックカテゴリに割り当て、それらを別々のオブジェクトインスタンスに分割する。都市地図から森林管理まで、屋外の景観理解に多くの明白な応用がある。既存のメソッドは、隣接するストリート家具や隣接するツリーのような、同じセマンティックなカテゴリの近隣のインスタンスを分割するのに苦労している。本研究では,オブジェクトインスタンスへのクラスタリングポイントに関するpanoptic segmentation pipelineのステップを調査し,ボトルネックの緩和を目標とする。複数のタイプの学習点埋め込みを利用する注意深く設計されたクラスタリング戦略は、インスタンスのセグメンテーションを大幅に改善する。 npm3d urban mobile mapping datasetとfor-instance forest datasetの実験は、提案手法の有効性と汎用性を示している。

Panoptic segmentation is the combination of semantic and instance segmentation: assign the points in a 3D point cloud to semantic categories and partition them into distinct object instances. It has many obvious applications for outdoor scene understanding, from city mapping to forest management. Existing methods struggle to segment nearby instances of the same semantic category, like adjacent pieces of street furniture or neighbouring trees, which limits their usability for inventory- or management-type applications that rely on object instances. This study explores the steps of the panoptic segmentation pipeline concerned with clustering points into object instances, with the goal to alleviate that bottleneck. We find that a carefully designed clustering strategy, which leverages multiple types of learned point embeddings, significantly improves instance segmentation. Experiments on the NPM3D urban mobile mapping dataset and the FOR-instance forest dataset demonstrate the effectiveness and versatility of the proposed strategy.

翻訳日:2023-07-07 14:34:56 公開日:2023-07-06

# 基準に基づく動きのぼけ除去:基準画像のシャープネスを利用した学習

Reference-based Motion Blur Removal: Learning to Utilize Sharpness in the Reference Image ( http://arxiv.org/abs/2307.02875v1 )

ライセンス: Link先を確認

Han Zou, Masanori Suganuma, Takayuki Okatani

(参考訳) 画像中の動きのぼかしを除去する研究の進歩にもかかわらず、強いぼかしを扱うことは依然として困難である。単一の画像からぼやけを取り除くには限界があるが、ぼやけた画像をデブラーする参照として追加画像を使用するなど、複数の画像を使用する可能性も高い。典型的な設定は、ビデオの劣化の研究のように、近くのシャープ画像を用いて映像をビデオシーケンスでデバリングすることである。本稿では,参照画像に存在する情報を利用する方法を提案する。この方法は参照画像に対する強い仮定を必要としない。ビデオのデブラリングのように、同じシーンの別のショットを使うこともできるし、別のシーンから別のイメージを使うこともできる。提案手法は,まずターゲット画像と参照画像の局所的パッチに一致し,その特徴を融合させてシャープ画像を推定する。我々は、ぼやけた画像と鋭い参照をマッチングする難しい問題を解決するために、パッチベースの特徴マッチング戦略を採用する。本手法は, 単一画像デブロアリング用に設計された既設ネットワークに組み込むことができる。実験の結果,提案手法の有効性が示された。

Despite recent advances in removing motion blur from an image, it is still hard to deal with strong blurs. While there are limits to removing blur from a single image, using multiple images has more potential, e.g., using an additional image as a reference to deblur a blurry image. A typical setting is deblurring an image using one or more nearby sharp images in a video sequence, as in the studies of video deblurring. This paper proposes a better method to use the information present in a reference image. The method does not need a strong assumption on the reference image. We can utilize an alternative shot of the identical scene, just like in video deblurring, or we can even employ a distinct image from another scene. Our method first matches local patches of the target and reference images and then fuses their features to estimate a sharp image. We employ a patch-based feature matching strategy to solve the difficult problem of matching the blurry image with the sharp reference. Our method can be integrated into pre-existing networks designed for single image deblurring. The experimental results show the effectiveness of the proposed method.

翻訳日:2023-07-07 14:34:42 公開日:2023-07-06

# momentdiff: ランダムからリアルへの生成的ビデオモーメント検索

MomentDiff: Generative Video Moment Retrieval from Random to Real ( http://arxiv.org/abs/2307.02869v1 )

ライセンス: Link先を確認

Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang

(参考訳) ビデオモーメント検索は、与えられた言語記述に対応する未トリミングビデオ内の特定の時間的セグメントを識別するための効率的で一般化されたソリューションを追求する。この目的を達成するために、momentdiffと呼ばれる生成拡散ベースのフレームワークを提供し、ランダムブラウジングから漸進的ローカライゼーションまでの典型的な人間の検索プロセスをシミュレートする。具体的には、まず実空間をランダムノイズに拡散させ、テキストとビデオの類似性のガイダンスを用いてランダムノイズを元の空間に分解する。これにより、モデルは任意のランダムな場所から実際のモーメントへのマッピングを学習でき、ランダムな初期化からセグメントを見つけることができる。トレーニングが完了すると、MomentDiffはランダムな時間セグメントを初期推定としてサンプリングし、それらを反復的に洗練して正確な時間境界を生成する。識別作業(例えば学習可能な提案やクエリに基づく)とは異なり、ランダムな初期化スパンを持つmomentdiffはデータセットからの時間的位置バイアスに抵抗する可能性がある。時間的位置バイアスの影響を評価するために,Charades-STA-Len と Charades-STA-Mom という2つの反バイアスデータセットを提案する。実験の結果,提案手法は3つのベンチマークで常に最先端手法を上回っており,提案するアンチバイアスデータセットの一般化とロバスト性が向上していることがわかった。コード、モデル、アンチバイアス評価データセットはhttps://github.com/IMCCretrieval/MomentDiffで入手できる。

Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description. To achieve this goal, we provide a generative diffusion-based framework called MomentDiff, which simulates a typical human retrieval process from random browsing to gradual localization. Specifically, we first diffuse the real span to random noise, and learn to denoise the random noise to the original span with the guidance of similarity between text and video. This allows the model to learn a mapping from arbitrary random locations to real moments, enabling the ability to locate segments from random initialization. Once trained, MomentDiff could sample random temporal segments as initial guesses and iteratively refine them to generate an accurate temporal boundary. Different from discriminative works (e.g., based on learnable proposals or queries), MomentDiff with random initialized spans could resist the temporal location biases from datasets. To evaluate the influence of the temporal location biases, we propose two anti-bias datasets with location distribution shifts, named Charades-STA-Len and Charades-STA-Mom. The experimental results demonstrate that our efficient framework consistently outperforms state-of-the-art methods on three public benchmarks, and exhibits better generalization and robustness on the proposed anti-bias datasets. The code, model, and anti-bias evaluation datasets are available at https://github.com/IMCCretrieval/MomentDiff.

翻訳日:2023-07-07 14:34:23 公開日:2023-07-06

# deep-learning balanced homodyne detectionを用いたカオスによる増幅量子ノイズの高速光子相関モニタリング

High-speed photon correlation monitoring of amplified quantum noise by chaos using deep-learning balanced homodyne detection ( http://arxiv.org/abs/2307.02868v1 )

ライセンス: Link先を確認

Yanqiang Guo, Zinan Hu, Jianchao Zhang, Chenyu Zhu, Xiaomin Guo

(参考訳) 光子相関の精密な実験的な決定には大量のデータと膨大な測定時間が必要である。広帯域平衡ホモダイン検出とディープラーニング加速度に基づく増幅量子雑音の2次光子相関を$g^{(2)}(0)$でモニタする手法を提案する。弱いカオスレーザーの注入により量子ノイズを効果的に増幅し、増幅された量子ノイズの$g^{(2)}(0)$をリアルタイムサンプルレート1.4GHzで測定する。また,光子相関畳み込みニューラルネットワークを用いて,数次ゆらぎを用いて相関データを加速し,様々なカオス注入強度と有効帯域幅に対して$g^{(2)}(0)$の並列処理を行う。深層学習法は、$g^{(2)}(0)$実験的な取得を高精度に加速し、平均2乗誤差0.002の光子相関データの6107セットを22秒で推定し、データ取得時間で3桁の大加速度を達成する。この技術は、セキュア通信および量子イメージングにおけるエントロピー源の高速かつ高精度なコヒーレンス評価に寄与する。

Precise experimental determination of photon correlation requires massive amounts of data and extensive measurement time. We present a technique to monitor the second-order photon correlation $g^{(2)}(0)$ of amplified quantum noise based on wideband balanced homodyne detection and deep-learning acceleration. The quantum noise is effectively amplified by injection of a weak chaotic laser, and the $g^{(2)}(0)$ of the amplified quantum noise is measured with a real-time sample rate of 1.4 GHz. We also exploit a photon correlation convolutional neural network accelerating correlation data using a few quadrature fluctuations to perform a parallel processing of the $g^{(2)}(0)$ for various chaos injection intensities and effective bandwidths. The deep-learning method accelerates the $g^{(2)}(0)$ experimental acquisition with a high accuracy, estimating 6107 sets of photon correlation data with a mean square error of 0.002 in 22 seconds and achieving a three-orders-of-magnitude acceleration in data acquisition time. This technique contributes to high-speed and precise coherence evaluation of entropy sources in secure communication and quantum imaging.
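
For orientation, the quantity being monitored can be written in its textbook photon-number form, $g^{(2)}(0) = \langle n(n-1)\rangle / \langle n\rangle^2$. The sketch below evaluates that formula on a list of photon counts. It is only an illustration of the definition: the paper obtains $g^{(2)}(0)$ from homodyne quadrature data and a neural network, which this example does not attempt to reproduce, and the function name `g2_zero` is hypothetical.

```python
def g2_zero(counts):
    """Estimate g2(0) = <n(n-1)> / <n>^2 from photon-number samples."""
    if not counts:
        raise ValueError("need at least one sample")
    mean_n = sum(counts) / len(counts)
    if mean_n == 0:
        raise ValueError("g2(0) is undefined when no photons were detected")
    # Second factorial moment <n(n-1)>.
    mean_pairs = sum(c * (c - 1) for c in counts) / len(counts)
    return mean_pairs / mean_n ** 2


print(g2_zero([1, 1, 1, 1]))   # exactly one photon per window
print(g2_zero([0, 2]))
print(g2_zero([0, 0, 0, 4]))   # bunched counts
```

Values below 1 indicate antibunching, 1 corresponds to Poissonian (coherent) statistics, and values above 1 indicate bunching.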

翻訳日:2023-07-07 14:33:57 公開日:2023-07-06

# 鉄道分野におけるml系システムの継続的開発と安全性確保のための安全mlopsプロセスに向けて

Towards a safe MLOps Process for the Continuous Development and Safety Assurance of ML-based Systems in the Railway Domain ( http://arxiv.org/abs/2307.02867v1 )

ライセンス: Link先を確認

Marc Zeller, Thomas Waschulzik, Reiner Schmid, Claus Bahlmann

(参考訳) 従来の自動化技術だけでは、非制限のインフラ上での無人運転を可能にするには不十分である。現在、必要な知覚タスクは機械学習(ML)を使用して実現されており、開発とデプロイを確実かつ効率的に行う必要がある。これを実現するための重要な側面の1つは、改善された再現性、トレーサビリティ、コラボレーション、変更条件へのドライバレス操作の継続的適応にMLOpsプロセスを使用することである。 MLOpsはMLアプリケーション開発と運用(Ops)を混在させ、運用からのフィードバックに基づいて、高周波ソフトウェアリリースと継続的イノベーションを可能にする。本稿では,鉄道分野におけるMLベースシステムの継続的開発と安全性保証のための安全MLOpsプロセスの概要について述べる。システムエンジニアリング、安全性保証、MLライフサイクルを包括的なワークフローに統合する。プロセスの個々の段階とその相互作用を示す。さらに,safe mlopsプロセスの異なる段階を自動化するための,関連する課題について述べる。

Traditional automation technologies alone are not sufficient to enable driverless operation of trains (called Grade of Automation (GoA) 4) on non-restricted infrastructure. The required perception tasks are nowadays realized using Machine Learning (ML) and thus need to be developed and deployed reliably and efficiently. One important aspect to achieve this is to use an MLOps process for tackling improved reproducibility, traceability, collaboration, and continuous adaptation of a driverless operation to changing conditions. MLOps mixes ML application development and operation (Ops) and enables high frequency software releases and continuous innovation based on the feedback from operations. In this paper, we outline a safe MLOps process for the continuous development and safety assurance of ML-based systems in the railway domain. It integrates system engineering, safety assurance, and the ML life-cycle in a comprehensive workflow. We present the individual stages of the process and their interactions. Moreover, we describe relevant challenges to automate the different stages of the safe MLOps process.

翻訳日:2023-07-07 14:33:38 公開日:2023-07-06

# PLIERS: オンラインソーシャルネットワークにおけるコンテンツ拡散のための人気ベースのレコメンデーションシステム

PLIERS: a Popularity-Based Recommender System for Content Dissemination in Online Social Networks ( http://arxiv.org/abs/2307.02865v1 )

ライセンス: Link先を確認

Valerio Arnaboldi, Mattia Giovanni Campana, Franca Delmastro, Elena Pagani

(参考訳) 本稿では,ユーザがすでに持っているものと同様の人気を持つアイテムやタグに関心を持っているという仮定に基づく,新しいタグベースのレコメンダシステムpliersを提案する。 PLIERSは、アルゴリズムの複雑さと推奨項目のパーソナライズレベルとの良好なトレードオフを達成することを目的としている。プライアーを評価するために,我々は実際のosnデータセットに関する一連の実験を行い,パーソナライゼーション,関連性,レコメンデーションのノベル性といった面で最先端のソリューションを上回ることを示した。

In this paper, we propose a novel tag-based recommender system called PLIERS, which relies on the assumption that users are mainly interested in items and tags with similar popularity to those they already own. PLIERS is aimed at reaching a good tradeoff between algorithmic complexity and the level of personalization of recommended items. To evaluate PLIERS, we performed a set of experiments on real OSN datasets, demonstrating that it outperforms state-of-the-art solutions in terms of personalization, relevance, and novelty of recommendations.

翻訳日:2023-07-07 14:33:23 公開日:2023-07-06

# ValiTex -- 社会科学構成の計算テキストに基づく測定のための一様検証フレームワーク

ValiTex -- a uniform validation framework for computational text-based measures of social science constructs ( http://arxiv.org/abs/2307.02863v1 )

ライセンス: Link先を確認

Lukas Birkenmaier and Clemens Lechner and Claudia Wagner

(参考訳) 社会科学構造に関する計算テキストに基づく尺度の検証方法に関するガイダンスが断片化されている。研究者は一般的に、テキストベースの尺度を検証することの重要性を認めているが、それらはしばしば共通の用語や統一的な枠組みを欠いている。本稿では,テキストデータに基づく社会科学構造の測定を支援するために,ValiTexという新たな検証フレームワークを提案する。このフレームワークは、計算テキスト分析の目的のためにフレームワークを拡張しながら、心理測定において長年確立されてきた伝統に基づいている。 ValiTexは概念モデルと動的チェックリストという2つのコンポーネントで構成されている。概念モデルがバリデーションへのアプローチ方法に関する異なるフェーズに沿って一般的な構造を提供するのに対して、動的チェックリストは特定の検証手順を定義し、推奨可能なステップ(つまり、関連する検証証拠と必要な検証証拠を提供する)またはオプション(つまり、追加の検証証拠を提供するのに役立ちます)についてガイダンスを提供する。ソーシャルメディアデータから性差別を検出するユースケースに適用することにより、フレームワークの有用性を実証する。

Guidance on how to validate computational text-based measures of social science constructs is fragmented. Whereas scholars generally acknowledge the importance of validating their text-based measures, they often lack common terminology and a unified framework to do so. This paper introduces a new validation framework called ValiTex, designed to assist scholars to measure social science constructs based on textual data. The framework draws on a long-established tradition within psychometrics while extending the framework for the purpose of computational text analysis. ValiTex consists of two components, a conceptual model and a dynamic checklist. Whereas the conceptual model provides a general structure along distinct phases on how to approach validation, the dynamic checklist defines specific validation steps and provides guidance on which steps might be considered recommendable (i.e., providing relevant and necessary validation evidence) or optional (i.e., useful for providing additional supporting validation evidence). The utility of the framework is demonstrated by applying it to a use case of detecting sexism from social media data.

翻訳日:2023-07-07 14:33:10 公開日:2023-07-06

# センサを用いた人間行動認識における最適センサ配置のためのリアルタイム人文推定手法

A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition ( http://arxiv.org/abs/2307.02906v1 )

ライセンス: Link先を確認

Orhan Konak, Alexander Wischmann, Robin van de Water, Bert Arnrich

(参考訳) センサベースのヒューマンアクティビティ認識は、人間の動きの邪魔にならない監視を容易にする。しかし,最適分類性能に最も効果的なセンサ配置の決定は依然として困難である。本稿では,対象行動の映像記録から推定した実時間2次元ポーズを用いて,この問題を解決する新しい手法を提案する。得られた骨格データは、最適なセンサ位置を特定するためのユニークな戦略を提供する。提案手法の有効性を検証し,慣性センサを用いて被験者10名を対象に13種類の活動を監視する。以上の結果から,視覚に基づくセンサ配置法は従来のディープラーニング手法と同等の結果を示し,その効果を示す。本研究は,センサ配置の最適決定のための軽量なオンデバイスソリューションを提供し,データの匿名化を促進し,マルチモーダル分類アプローチをサポートすることにより,ヒューマンアクティビティ認識の分野を著しく進歩させる。

Sensor-based Human Activity Recognition facilitates unobtrusive monitoring of human movements. However, determining the most effective sensor placement for optimal classification performance remains challenging. This paper introduces a novel methodology to resolve this issue, using real-time 2D pose estimations derived from video recordings of target activities. The derived skeleton data provides a unique strategy for identifying the optimal sensor location. We validate our approach through a feasibility study, applying inertial sensors to monitor 13 different activities across ten subjects. Our findings indicate that the vision-based method for sensor placement offers comparable results to the conventional deep learning approach, demonstrating its efficacy. This research significantly advances the field of Human Activity Recognition by providing a lightweight, on-device solution for determining the optimal sensor placement, thereby enhancing data anonymization and supporting a multimodal classification approach.

翻訳日:2023-07-07 14:26:57 公開日:2023-07-06

# パーシステンスランク関数機械学習のための計算安定性

Computable Stability for Persistence Rank Function Machine Learning ( http://arxiv.org/abs/2307.02904v1 )

ライセンス: Link先を確認

Qiquan Wang, Inés García-Redondo, Pierre Faugère, Anthea Monod, Gregory Henselman-Petrusek

(参考訳) 永続的ホモロジーバーコードとダイアグラムは、トポロジカルデータ分析の基盤である。多くの実データ設定で広く使われているが、トポロジ情報の変化(細胞ホモロジーによって測定される)とデータの変動を関連付けるが、複雑な幾何学的構造のために統計的設定での使用は困難である。本稿では、永続ホモロジーのランク関数を再検討する。これはバーコードやパーシステンス図よりも前に導入された「形状」の不変な尺度であり、同じ情報をデータや計算により適した形で捉える。特に、ランク関数は関数であるため、関数データ解析(関数に適応した統計学の一分野)の手法を、ランク関数で表現された永続ホモロジーに直接適用できる。しかし、ランク関数はバーコードほど普及していない。これは、データ解析での使用を正当化するうえで重要な性質である安定性の保証が、主にランク関数空間上の計量に関する問題のために難しいという課題に直面しているからである。一方で、ランク関数はより自然に、多パラメータ持続ホモロジーの人気の高まりと重要なケースにまで拡張される。本稿では,シミュレーションデータと実データの両方,および単一および多パラメータ持続ホモロジーにおいて,関数推論統計および機械学習におけるランク関数の性能について検討する。階数関数によって捕捉される永続的ホモロジーの使用は、既存のアプローチよりも明らかな改善をもたらす。次に, 計算可能性と解釈可能性という基礎的な目的から, 各種指標を用いた単パラメータ・多パラメータ永続ランク関数の安定性を導出し, 数値実験とデータへの適用を理論的に正当化する。

Persistent homology barcodes and diagrams are a cornerstone of topological data analysis. Widely used in many real data settings, they relate variation in topological information (as measured by cellular homology) with variation in data; however, they are challenging to use in statistical settings due to their complex geometric structure. In this paper, we revisit the persistent homology rank function -- an invariant measure of "shape" that was introduced before barcodes and persistence diagrams and captures the same information in a form that is more amenable to data and computation. In particular, since they are functions, techniques from functional data analysis -- a domain of statistics adapted for functions -- apply directly to persistent homology when represented by rank functions. Rank functions, however, have been less popular than barcodes because they face the challenge that stability -- a property that is crucial to validate their use in data analysis -- is difficult to guarantee, mainly due to metric concerns on rank function space. However, rank functions extend more naturally to the increasingly popular and important case of multiparameter persistent homology. In this paper, we study the performance of rank functions in functional inferential statistics and machine learning on both simulated and real data, and in both single and multiparameter persistent homology. We find that the use of persistent homology captured by rank functions offers a clear improvement over existing approaches. We then provide theoretical justification for our numerical experiments and applications to data by deriving several stability results for single- and multiparameter persistence rank functions under various metrics with the underlying aim of computational feasibility and interpretability.
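
To make the relationship between barcodes and rank functions concrete, here is a minimal single-parameter sketch. It assumes the usual convention that a bar is a half-open interval `[birth, death)`, so the rank at `(a, b)` with `a <= b` counts the bars that are alive at `a` and still alive at `b`. The function name and the toy barcode are hypothetical and are not taken from the paper's code.

```python
import math


def rank_function(barcode, a, b):
    """Number of bars [birth, death) with birth <= a and b < death, for a <= b."""
    if a > b:
        raise ValueError("the rank function is defined for a <= b")
    return sum(1 for birth, death in barcode if birth <= a and b < death)


# Toy barcode: one essential class and two finite bars.
barcode = [(0.0, math.inf), (0.0, 1.0), (0.5, 2.0)]

print(rank_function(barcode, 0.0, 0.0))
print(rank_function(barcode, 0.5, 0.5))
print(rank_function(barcode, 0.5, 1.0))
print(rank_function(barcode, 0.0, 3.0))
```

Because the result is an ordinary integer-valued function of `(a, b)`, it can be sampled on a grid and passed to standard functional data analysis tools, which is the practical appeal the abstract describes.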

翻訳日:2023-07-07 14:26:43 公開日:2023-07-06

# puffin:蒸気圧予測のためのパス統一フィードフォワードインタフェースネットワーク

PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction ( http://arxiv.org/abs/2307.02903v1 )

ライセンス: Link先を確認

Vinicius Viena Santana, Carine Menezes Rebello, Luana P. Queiroz, Ana Mafalda Ribeiro, Nadia Shardta, and Idelfonso B. R. Nogueira

(参考訳) 蒸気圧の正確な予測は、様々な産業・環境用途に不可欠である。しかし, 実験の資源と労働力の強さから, 興味のあるすべての化合物の正確な測定は不可能である。蒸気圧を予測するための温度依存関係が要求されるとき、資源と労働の需要はさらに増加する。本稿では,移動学習とドメイン知識(アントワーヌ方程式)にインスパイアされた新しい帰納バイアスノードを組み合わせることで,蒸気圧予測を改善する機械学習フレームワークPUFFINを提案する。グラフ埋め込みを用いたインダクティブバイアスとトランスファーラーニングを活用することで、puffinはインダクティブバイアスを使用しない、あるいは化合物の汎用記述子を使用する代替戦略よりも優れている。このフレームワークは、データ可用性の限界を克服するためにドメイン固有の知識を組み込むことによって、他の物理化学的性質の予測を含む化学化合物分析の幅広い応用の可能性を示している。インダクティブアントインノードはネットワーク由来アントイン方程式係数を生成するため,提案する機械学習フレームワークは部分的に解釈可能である。すると、得られた分析表現を直接プロセス設計ソフトウェアに組み込んで、産業や環境で発生するプロセスの予測と制御を改善することができる。

Accurately predicting vapor pressure is vital for various industrial and environmental applications. However, obtaining accurate measurements for all compounds of interest is not possible due to the resource and labor intensity of experiments. The demand for resources and labor further multiplies when a temperature-dependent relationship for predicting vapor pressure is desired. In this paper, we propose PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning with a new inductive bias node inspired by domain knowledge (the Antoine equation) to improve vapor pressure prediction. By leveraging inductive bias and transfer learning using graph embeddings, PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic descriptors of compounds. The framework's incorporation of domain-specific knowledge to overcome the limitation of poor data availability shows its potential for broader applications in chemical compound analysis, including the prediction of other physicochemical properties. Importantly, our proposed machine learning framework is partially interpretable, because the inductive Antoine node yields network-derived Antoine equation coefficients. It would then be possible to directly incorporate the obtained analytical expression in process design software for better prediction and control of processes occurring in industry and the environment.
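
The domain knowledge referred to here is the Antoine equation, $\log_{10} P = A - B / (C + T)$. The sketch below only evaluates that equation for hand-set coefficients. In PUFFIN the coefficients are described as network-derived, which this example does not model. The water coefficients used are commonly quoted values for pressure in mmHg and temperature in degrees Celsius (roughly 1 to 100 °C) and are included as an assumption for illustration.

```python
def antoine_pressure(temp_c, a, b, c):
    """Vapor pressure from the Antoine equation log10(P) = A - B / (C + T)."""
    if temp_c + c == 0:
        raise ValueError("T + C must be non-zero")
    return 10.0 ** (a - b / (c + temp_c))


# Commonly quoted Antoine coefficients for water (P in mmHg, T in deg C);
# treat them as illustrative rather than authoritative.
WATER = {"a": 8.07131, "b": 1730.63, "c": 233.426}

for temp in (25.0, 60.0, 100.0):
    print(temp, antoine_pressure(temp, **WATER))
```

Because the whole temperature dependence is carried by three coefficients per compound, predicting those coefficients gives a closed-form expression that can be dropped into other tools, which matches the interpretability argument in the abstract.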

翻訳日:2023-07-07 14:26:21 公開日:2023-07-06

# NMR量子プロセッサ上のパウリ半群の凸混合による量子非マルコフ性の実験的実現

Experimental realization of quantum non-Markovianity through the convex mixing of Pauli semigroups on an NMR quantum processor ( http://arxiv.org/abs/2307.02899v1 )

ライセンス: Link先を確認

Vaishali Gulati and Vinayak Jagadish and R. Srikanth and Kavita Dorai

(参考訳) この実験は、任意の混合パラメータを持つパウリ半群の凸結合を調べ、結果の動的写像がマルコフ的あるいは非マルコフ的挙動を示すかどうかを決定することを目的としている。具体的には、2つのパウリ半群の同値かつ不等混合を考慮し、結果の写像が常に非マルコフ写像であることを示す。さらに、3つのパウリ半群の3方向混合の3つのケースを調査し、結果の写像のマルコビアン性または非マルコビアン性を決定する。 nmr量子プロセッサ上でポーリ半群の異なる混合結合を持つ単一量子ビット系の非ユニタリダイナミクスをシミュレートするために、2つのアンシラリー量子ビットを含むアルゴリズムを用いる。実験結果は理論的な予測と一致した。

This experimental study aims to investigate the convex combinations of Pauli semigroups with arbitrary mixing parameters to determine whether the resulting dynamical map exhibits Markovian or non-Markovian behavior. Specifically, we consider the cases of equal as well as unequal mixing of two Pauli semigroups, and demonstrate that the resulting map is always non-Markovian. Additionally, we study three cases of three-way mixing of the three Pauli semigroups and determine the Markovianity or non-Markovianity of the resulting maps by experimentally determining the decay rates. To simulate the non-unitary dynamics of a single qubit system with different mixing combinations of Pauli semigroups on an NMR quantum processor, we use an algorithm involving two ancillary qubits. The experimental results align with the theoretical predictions.

翻訳日:2023-07-07 14:26:00 公開日:2023-07-06

# RefVSR++: 参照ベースのビデオ超解像のための参照入力の爆発

RefVSR++: Exploiting Reference Inputs for Reference-based Video Super-resolution ( http://arxiv.org/abs/2307.02897v1 )

ライセンス: Link先を確認

Han Zou, Masanori Suganuma, Takayuki Okatani

(参考訳) 異なる視野(fov)を持つ複数のカメラからなるマルチカメラシステムを備えたスマートフォンが普及している。これらのカメラ構成は、参照ベースのSRとビデオSRと互換性があり、デバイス上でビデオを録画しながら同時に実行できる。これにより、これら2つのsr法を組み合わせることで画質が向上する。近年、LeeらはRefVSRという方法を提示している。本稿では,低解像度 (LR) ビデオやレファレンス (Ref) ビデオなどの観測結果を最適に活用する方法を検討する。 RefVSRは従来のビデオSRを非常に簡単に拡張し、LRとRefの入力を1つの双方向ストリームで時間とともに集約する。しかし,FoVによるLR画像とRef画像のコンテンツ差を考慮すると,時間方向に独立して集約することで,二つの画像列から最大情報を導き出すことができる。そこで,本研究では,融合lr入力とref入力を集約する手法と,時間とともにref入力を集約する手法であるrefvsr++を提案する。さらに,ビデオSRの成功の鍵となる画像特徴を時間とともに整列させる機構をRefVSR++に装備する。実験により、RefVSR++はPSNRにおいて1dB以上でRefVSRを上回る性能を示し、新しい最先端を実現する。

Smartphones equipped with a multi-camera system comprising multiple cameras with different fields of view (FoVs) are becoming more prevalent. These camera configurations are compatible with reference-based SR and video SR, which can be executed simultaneously while recording video on the device. Thus, combining these two SR methods can improve image quality. Recently, Lee et al. have presented such a method, RefVSR. In this paper, we consider how to optimally utilize the observations obtained, including input low-resolution (LR) video and reference (Ref) video. RefVSR extends conventional video SR quite simply, aggregating the LR and Ref inputs over time in a single bidirectional stream. However, considering the content difference between LR and Ref images due to their FoVs, we can derive the maximum information from the two image sequences by aggregating them independently in the temporal direction. We therefore propose an improved method, RefVSR++, which can aggregate two features in parallel in the temporal direction, one for aggregating the fused LR and Ref inputs and the other for Ref inputs over time. Furthermore, we equip RefVSR++ with enhanced mechanisms to align image features over time, which is the key to the success of video SR. We experimentally show that RefVSR++ outperforms RefVSR by over 1 dB in PSNR, achieving a new state of the art.
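
The reported gain is stated in PSNR, defined as $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below computes it for flat lists of 8-bit pixel values. The function name and the sample pixels are hypothetical and unrelated to the paper's data.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [100, 100, 100, 100]
estimate = [110, 90, 110, 90]
print(psnr(reference, estimate))
```

Since the scale is logarithmic, a gain of 1 dB corresponds to reducing the mean squared error by a factor of about 1.26 (that is, $10^{0.1}$).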

翻訳日:2023-07-07 14:25:44 公開日:2023-07-06

# Free Bits:エッジ上の混合精度量子ニューラルネットワークのレイテンシ最適化

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge ( http://arxiv.org/abs/2307.02894v1 )

ライセンス: Link先を確認

Georg Rutishauser, Francesco Conti, Luca Benini

(参考訳) ディープニューラルネットワークの層が異なる精度で量子化される混合精度量子化(mixed-precision quantization)は、均質なビット幅量子化によって達成できる以上のモデルサイズ、レイテンシ、統計的精度の間のトレードオフを最適化する機会を提供する。与えられたネットワークに対する混合精度構成の難解な探索空間をナビゲートするために,ハイブリッド検索手法を提案する。ハードウェアに依存しない微分可能な検索アルゴリズムからなり、ハードウェア認識のヒューリスティック最適化により、特定のハードウェアターゲットに対して遅延最適化された混合精度設定を見つける。提案アルゴリズムはMobileNetV1およびMobileNetV2上で評価し,ハードウェア特性の異なるマルチコアRISC-Vマイクロコントローラ群上にネットワークを配置する。我々は、1000クラスのImageNetデータセットの完全精度ベースラインから無視できない精度で8ビットモデルと比較して、エンドツーエンドのレイテンシを最大28.6%削減する。我々は8ビットのベースラインに対して,ハードウェアサポートのないシステムでも,無視可能な精度低下時に高速化を実証する。さらに、レイテンシーのプロキシとして、二項演算数を減らした微分可能な探索に対して、我々のアプローチの優位性を示す。

Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.
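
The trade-off behind mixed precision can be seen with plain uniform quantization: fewer bits per weight means a coarser grid and a larger reconstruction error. The following sketch assumes symmetric per-tensor quantization and a hypothetical weight vector. It does not implement the paper's differentiable search or its hardware-aware heuristic.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed `bits`-bit levels, then dequantize."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax
    # Round to the nearest integer level and clamp to the representable range.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


weights = [0.9, -0.35, 0.12, -0.77, 0.05]  # hypothetical layer weights
errors = {bits: mean_squared_error(weights, quantize(weights, bits)) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(bits, err)
```

A mixed-precision configuration assigns a different `bits` value to each layer, so a search method has to weigh this per-layer error against the latency each bit width yields on the target hardware.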

翻訳日:2023-07-07 14:25:19 公開日:2023-07-06

# 抑うつ状態における音声特徴の関係--抑うつ検出の速度と性能向上のための特徴相関-

The Relationship Between Speech Features Changes When You Get Depressed: Feature Correlations for Improving Speed and Performance of Depression Detection ( http://arxiv.org/abs/2307.02892v1 )

ライセンス: Link先を確認

Fuxiang Tao, Wei Ma, Xuri Ge, Anna Esposito, Alessandro Vinciarelli

(参考訳) この研究は、抑うつが音声から抽出した特徴間の相関を変化させることを示す。さらに、このような知見を用いることで、SVMとLSTMに基づく抑うつ検知器の訓練速度と性能を向上させることができることを示す。実験は、プロの精神科医によってうつ病と診断された58人を含む112人の話者を含む公開データセットであるAndroids Corpus上で実施された。その結果,実験で使用したモデルでは,特徴ベクトルよりも特徴相関行列が与えられ,学習速度と性能が向上した。誤差率の相対的な減少はモデルによって23.1%から26.6%の範囲である。特徴相関行列は, 抑えられた話者の場合, より可変である可能性が示唆された。それに応じて、このような現象は抑うつマーカーと考えることができる。

This work shows that depression changes the correlation between features extracted from speech. Furthermore, it shows that using such an insight can improve the training speed and performance of depression detectors based on SVMs and LSTMs. The experiments were performed over the Androids Corpus, a publicly available dataset involving 112 speakers, including 58 people diagnosed with depression by professional psychiatrists. The results show that the models used in the experiments improve in terms of training speed and performance when fed with feature correlation matrices rather than with feature vectors. The relative reduction of the error rate ranges between 23.1% and 26.6% depending on the model. The probable explanation is that feature correlation matrices appear to be more variable in the case of depressed speakers. Correspondingly, such a phenomenon can be thought of as a depression marker.
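
The representation change described here, feeding a feature correlation matrix instead of the raw feature vectors, can be sketched as follows. The code computes a Pearson correlation matrix across frames in pure Python. The frame values are made up, and how the paper windows or normalises its speech features is not specified here, so treat the details as assumptions.

```python
import math


def correlation_matrix(frames):
    """Pearson correlation between feature dimensions across a list of frames."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(d)]
    centered = [[f[j] - means[j] for j in range(d)] for f in frames]
    norms = [math.sqrt(sum(row[j] ** 2 for row in centered)) for j in range(d)]
    corr = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if norms[i] == 0 or norms[j] == 0:
                continue  # constant feature: leave the entry at 0.0
            dot = sum(row[i] * row[j] for row in centered)
            corr[i][j] = dot / (norms[i] * norms[j])
    return corr


# Four frames with three hypothetical features each.
frames = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 0]]
corr = correlation_matrix(frames)
for row in corr:
    print(row)
```

The matrix has a fixed size of `d x d` regardless of how many frames a recording contains, which is one plausible reason such an input could shorten training compared with long sequences of feature vectors.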

翻訳日:2023-07-07 14:24:59 公開日:2023-07-06

# BaBE: 遅延説明変数の推定によるフェアネスの向上

BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables ( http://arxiv.org/abs/2307.02891v1 )

ライセンス: Link先を確認

Ruta Binkyte, Daniele Gorla, Catuscia Palamidessi

(参考訳) 両グループ間の不公平な差別の問題を検討し,公平性を達成するための前処理法を提案する。統計的パリティのような補正法は通常、不正確さを生じさせ、敏感な属性sと正当な属性e(説明変数)との間に相関がある場合、実際には公平性が得られない。これらの欠点を克服するために、他の公平性の概念、特に条件付き統計パリティと平等機会が提案されている。しかし、E はデータの中で直接観測できないことが多い、すなわち潜時変数である。 E を表す他の変数 Z も観測できるが、問題は Z が S に影響される可能性があり、したがって Z 自身はバイアスを受けることができることである。この問題に対処するため、ベイズ推論と期待最大化法の組み合わせに基づくアプローチであるBaBE(Bayesian Bias Elimination)を提案し、各群に対して与えられたZに対してEの最も可能性の高い値を推定する。合成および実データ集合の実験によって、我々のアプローチは、高い正確性とともに十分な公平性を提供することが示された。

We consider the problem of unfair discrimination between two groups and propose a pre-processing method to achieve fairness. Corrective methods like statistical parity usually lead to bad accuracy and do not really achieve fairness in situations where there is a correlation between the sensitive attribute S and the legitimate attribute E (explanatory variable) that should determine the decision. To overcome these drawbacks, other notions of fairness have been proposed, in particular, conditional statistical parity and equal opportunity. However, E is often not directly observable in the data, i.e., it is a latent variable. We may observe some other variable Z representing E, but the problem is that Z may also be affected by S, hence Z itself can be biased. To deal with this problem, we propose BaBE (Bayesian Bias Elimination), an approach based on a combination of Bayes inference and the Expectation-Maximization method, to estimate the most likely value of E for a given Z for each group. The decision can then be based directly on the estimated E. We show, by experiments on synthetic and real data sets, that our approach provides a good level of fairness as well as high accuracy.
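
The difference between statistical parity and conditional statistical parity can be checked on a toy table. In the sketch below each record is `(s, e, decision)`, where `s` is the sensitive attribute and `e` the explanatory variable. The data and function names are hypothetical, and `e` is assumed to be observed, which is exactly what BaBE does not assume; the Bayesian/EM estimation step is not shown.

```python
from collections import defaultdict


def positive_rates(records, key):
    """Fraction of positive decisions per group, where `key` maps a record to its group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for record in records:
        group = key(record)
        totals[group] += 1
        positives[group] += record[2]
    return {group: positives[group] / totals[group] for group in totals}


def statistical_parity_gap(records):
    rates = positive_rates(records, key=lambda r: r[0])
    return max(rates.values()) - min(rates.values())


def conditional_parity_gap(records):
    # Compare positive rates across s separately within each value of e.
    rates = positive_rates(records, key=lambda r: (r[1], r[0]))
    by_e = defaultdict(list)
    for (e, _s), rate in rates.items():
        by_e[e].append(rate)
    return max(max(v) - min(v) for v in by_e.values())


records = [
    ("a", "high", 1), ("a", "high", 1), ("a", "low", 0),
    ("b", "high", 1), ("b", "low", 0), ("b", "low", 0),
]
print(statistical_parity_gap(records))
print(conditional_parity_gap(records))
```

In this toy data the decision depends only on `e`, so the conditional gap is zero even though the two groups receive positive decisions at different overall rates, because `e` is correlated with `s`.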

翻訳日:2023-07-07 14:24:45 公開日:2023-07-06

# 蛍光光子の登録に基づくイオン量子ビットの高精度トモグラフィー

High-precision tomography of ion qubits based on registration of fluorescent photons ( http://arxiv.org/abs/2307.02890v1 )

ライセンス: Link先を確認

Yu. I. Bogdanov, I.A. Dmitriev, B.I. Bantysh, N.A. Bogdanova, V.F. Lukichev

(参考訳) イオン量子ビットレジスタの論理状態の識別性が限定された条件下での高精度トモグラフィー法を開発した。励起レベル、光子散乱、暗騒音、低い数値開口などの有限寿命によるイオン量子ビットの量子状態の読み出し中に低い誤差率を達成することは必ずしも不可能である。しかし、ファジィ量子測定のモデルは、量子状態の正確なトモグラフィーを確保することができる。そこで我々は,蛍光光子の数を数えるファジィ測定モデルを開発した。ファジィ測定演算子に基づくイオン量子ビットレジスタの量子状態再構成のための統計的に適切なアルゴリズムを提案する。このアルゴリズムは実験で利用可能な完全な情報を使用し、イオン量子ビットの論理状態の限定的な識別性に関連する系統的な測定誤差を考慮できる。開発したモデルでは,計算量は複雑ではあるが,量子ビットの状態に関する情報がかなり多く,閾値アルゴリズムに基づくモデルよりも精度が高いことが判明した。

We develop a new method for high-precision tomography of ion qubit registers under conditions of limited distinguishability of its logical states. It is not always possible to achieve low error rates during the readout of the quantum states of ion qubits due to the finite lifetime of excited levels, photon scattering, dark noise, low numerical aperture, etc. However, the model of fuzzy quantum measurements makes it possible to ensure precise tomography of quantum states. To do this, we developed a fuzzy measurement model based on counting the number of fluorescent photons. A statistically adequate algorithm for the reconstruction of quantum states of ion qubit registers based on fuzzy measurement operators is proposed. The algorithm uses the complete information available in the experiment and makes it possible to account for systematic measurement errors associated with the limited distinguishability of the logical states of ion qubits. We show that the developed model, although computationally more complex, contains significantly more information about the state of the qubit and provides a higher accuracy of state reconstruction compared to the model based on the threshold algorithm.

翻訳日:2023-07-07 14:24:24 公開日:2023-07-06

# 先行行動の探索による課題解決の学習

Learning to Solve Tasks with Exploring Prior Behaviours ( http://arxiv.org/abs/2307.02889v1 )

ライセンス: Link先を確認

Ruiqi Zhu, Siyuan Li, Tianhong Dai, Chongjie Zhang, Oya Celiktutan

(参考訳) デモは深層強化学習(DRL)で広く使われ、スパースな報酬を持つタスクの解決を容易にする。しかし、実世界のシナリオにおけるタスクは、しばしばデモとは異なる初期条件を持ち、追加の事前動作が必要になる。例えば、「開いた引き出しから物体を取り出す」というタスクのデモンストレーションが与えられているが、トレーニングでは引き出しが閉じているとする。引き出しを開ける事前の動作が得られなければ、ロボットがそのタスクを解決する可能性は低い。これを解決するために,本論文では,内在的報酬駆動の例に基づく制御(IRDEC)を提案する。提案手法は,必要な事前動作の探索と獲得をエージェントに可能にし,その上でデモ中のタスク固有の動作に接続することで,事前動作の追加デモなしにスパース報酬タスクを解決することができる。提案手法は,スパース報酬を持つ3つのナビゲーションタスクと1つのロボット操作タスクにおいて他のベースラインを上回る。コードはhttps://github.com/Ricky-Zhu/IRDECで入手できる。

Demonstrations are widely used in Deep Reinforcement Learning (DRL) to facilitate solving tasks with sparse rewards. However, tasks in real-world scenarios often have initial conditions that differ from those in the demonstration, which requires additional prior behaviours. For example, suppose we are given a demonstration for the task of picking up an object from an open drawer, but the drawer is closed during training. Without acquiring the prior behaviour of opening the drawer, the robot is unlikely to solve the task. To address this, we propose Intrinsic Rewards Driven Example-based Control (IRDEC). Our method endows agents with the ability to explore and acquire the required prior behaviours and then connect them to the task-specific behaviours in the demonstration, solving sparse-reward tasks without requiring additional demonstrations of the prior behaviours. Our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Code is available at https://github.com/Ricky-Zhu/IRDEC.

翻訳日:2023-07-07 14:24:08 公開日:2023-07-06

# 離散非線形Schrödinger方程式における創発的SSH物理、ソリトンおよび凝縮を誘導する密度依存ゲージ場

Density dependent gauge field inducing emergent SSH physics, solitons and condensates in a discrete nonlinear Schrödinger equation ( http://arxiv.org/abs/2307.02952v1 )

ライセンス: Link先を確認

William N. Faugno, Mario Salerno, Tomoki Ozawa

(参考訳) 動的な密度差依存ゲージ場を持つ離散非線形シュレーディンガー方程式について検討する。ゲージ結合を変化させると、平面波凝縮状態から局在ソリトン状態への基底状態遷移が起こる。興味深いことに、凝縮体とソリトンの両方が安定な領域が見つかる。創発的なカイラル対称性を同定し、これが対称性に保護されたゼロエネルギーエッジモードの存在につながることを示す。創発的なカイラル対称性は、低エネルギーソリトンと高エネルギーソリトンを関連付ける。これらの状態は、相互作用が反発的にも引力的にも作用することを示している。

We investigate a discrete non-linear Schrödinger equation with dynamical, density-difference-dependent, gauge fields. We find a ground-state transition from a plane wave condensate to a localized soliton state as the gauge coupling is varied. Interestingly, we find a regime in which the condensate and soliton are both stable. We identify an emergent chiral symmetry, which leads to the existence of a symmetry-protected zero energy edge mode. The emergent chiral symmetry relates low and high energy solitons. These states indicate that the interaction acts both repulsively and attractively.

翻訳日:2023-07-07 14:17:50 公開日:2023-07-06

# 実数値観測からの強化学習のためのニューロモルフィックアーキテクチャ

A Neuromorphic Architecture for Reinforcement Learning from Real-Valued Observations ( http://arxiv.org/abs/2307.02947v1 )

ライセンス: Link先を確認

Sergio F. Chevtchenko, Yeshwanth Bethi, Teresa B. Ludermir, Saeed Afshar

(参考訳) 強化学習(RL)は複雑な環境における意思決定のための強力なフレームワークを提供する。しかし、ハードウェア効率とバイオインスパイアされた方法でRLを実装することは依然として課題である。本稿では,実測値を用いてRL問題を解くための新しいスパイキングニューラルネットワーク(SNN)アーキテクチャを提案する。提案モデルは,td(temporal difference)-error modulation)とeligibility tracesを追加して,事前作業に基づいて多層イベントベースクラスタリングを組み込んだものである。アブレーション研究は、これらの成分がモデルの性能に与える影響を裏付けるものである。適応性トレースを持つ表型アクター批判アルゴリズムと最先端のPPOアルゴリズムをベンチマークとして使用する。当社のネットワークは,従来型のRL環境(マウンテンカー,カートポール,アクロボット)における安定的な制御ポリシの発見に成功した。提案モデルは,計算およびハードウェア実装要件の観点から,魅力的なトレードオフを提供する。このモデルは外部メモリバッファやグローバルエラー勾配計算を必要とせず、ローカル学習ルールと放送されたtd-error信号によってオンラインにシナプス更新が行われる。したがって、この研究はよりハードウェア効率の良いRLソリューションの開発に寄与する。

Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments. However, implementing RL in hardware-efficient and bio-inspired ways remains a challenge. This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations. The proposed model incorporates multi-layered event-based clustering, with the addition of Temporal Difference (TD)-error modulation and eligibility traces, building upon prior work. An ablation study confirms the significant impact of these components on the proposed model's performance. A tabular actor-critic algorithm with eligibility traces and a state-of-the-art Proximal Policy Optimization (PPO) algorithm are used as benchmarks. Our network consistently outperforms the tabular approach and successfully discovers stable control policies on classic RL environments: mountain car, cart-pole, and acrobot. The proposed model offers an appealing trade-off in terms of computational and hardware implementation requirements. The model does not require an external memory buffer nor a global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcasted TD-error signal. Thus, this work contributes to the development of more hardware-efficient RL solutions.
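The abstract names TD-error modulation and eligibility traces as the key learning signals. As a rough illustration only, the following is a minimal sketch of a standard tabular TD(λ) value update, not the paper's spiking network; the function name `td_lambda_step` and the hyperparameter values are assumptions chosen for the example.

```python
def td_lambda_step(values, traces, state, next_state, reward,
                   alpha=0.1, gamma=0.99, lam=0.9):
    """One tabular TD(lambda) update with accumulating eligibility traces.

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    """
    # TD error: how much better or worse the transition was than predicted.
    td_error = reward + gamma * values[next_state] - values[state]
    # Decay every trace, then mark the state that was just visited.
    for s in range(len(traces)):
        traces[s] *= gamma * lam
    traces[state] += 1.0
    # One scalar TD error is broadcast to all states, scaled by their traces.
    for s in range(len(values)):
        values[s] += alpha * td_error * traces[s]
    return td_error


values = [0.0, 0.0, 0.0]
traces = [0.0, 0.0, 0.0]
td_lambda_step(values, traces, state=0, next_state=1, reward=0.0)
delta = td_lambda_step(values, traces, state=1, next_state=2, reward=1.0)
print(delta, values)
```

The second call credits state 0 as well as state 1, because state 0 still carries a decayed trace. This is the sense in which a single broadcast error signal can drive purely local updates.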

翻訳日:2023-07-07 14:17:42 公開日:2023-07-06

# DisAsymNet:自己逆学習を用いた両側乳房X線像の非対称異常の解離

DisAsymNet: Disentanglement of Asymmetrical Abnormality on Bilateral Mammograms using Self-adversarial Learning ( http://arxiv.org/abs/2307.02935v1 )

ライセンス: Link先を確認

Xin Wang, Tao Tan, Yuan Gao, Luyi Han, Tianyu Zhang, Chunyao Lu, Regina Beets-Tan, Ruisheng Su, Ritse Mann

(参考訳) 非対称性は異常発生時の両側マンモグラム(Bi-MG)の重要な特徴である。放射線医が診断に広く利用している。左右対称のBi-MGは、非対称な異常が除去された時にどのように見えるか?」という疑問は、まだマンモグラムのアルゴリズムの開発において大きな注目を集めていない。この疑問に対処することで、マンモグラフィー解剖学の貴重な洞察と診断の解釈を助けることができる。そこで本論文では,非対称異常トランスフォーマーを用いた自己敵学習を応用した新しい枠組みであるdisasymnetを提案する。同時に,提案手法はランダムに合成された異常によって部分的に導かれる。提案手法は,3つのパブリックデータセットと1つの社内データセットを用いて実験を行い,異常分類,セグメンテーション,ローカライゼーションタスクにおいて既存の手法よりも優れていることを示す。さらに、再建された正常マンモグラムは、臨床診断のためのより良い解釈可能な視覚的手がかりへの洞察を与える。コードは一般公開される予定だ。

Asymmetry is a crucial characteristic of bilateral mammograms (Bi-MG) when abnormalities are developing. It is widely utilized by radiologists for diagnosis. The question of what the symmetrical Bi-MG would look like once the asymmetrical abnormalities have been removed has not yet received much attention in the development of algorithms for mammograms. Addressing this question could provide valuable insights into mammographic anatomy and aid in diagnostic interpretation. Hence, we propose a novel framework, DisAsymNet, which utilizes asymmetrical abnormality transformer guided self-adversarial learning for disentangling abnormalities and symmetric Bi-MG. At the same time, our proposed method is partially guided by randomly synthesized abnormalities. We conduct experiments on three public datasets and one in-house dataset, and demonstrate that our method outperforms existing methods in abnormality classification, segmentation, and localization tasks. Additionally, reconstructed normal mammograms can provide insights toward better interpretable visual cues for clinical diagnosis. The code will be accessible to the public.

翻訳日:2023-07-07 14:17:26 公開日:2023-07-06

# 時間と空間:補助ロボットアームの適応制御を目指して

In Time and Space: Towards Usable Adaptive Control for Assistive Robotic Arms ( http://arxiv.org/abs/2307.02933v1 )

ライセンス: Link先を確認

Max Pascher and Kirill Kronhardt and Felix Ferdinand Goldau and Udo Frese and Jens Gerken

(参考訳) ロボットのソリューション、特にロボットアームは、製造業や家庭の医療環境など、人間との密接なコラボレーションのために頻繁にデプロイされている。これらのロボットアームは、主に物体の把握と操作を含むいくつかの自由度(DoF)を制御する必要がある。標準入力デバイスは主に2つのDoFを持ち、個々のDoFを選択するのに時間を要する。現代の適応型DoFマッピング制御(ADMC)は、必要なモードスイッチ数を削減できたが、これまでは認識された作業負荷を大幅に削減できなかった。ユーザは今でも、ワークフローに抽象モードを切り替える、というメンタルなワークロードを抱えている。我々はADMCのリコメンデーションを更新してフィードフォワードのマルチモーダルフィードバックを提供することにより、ユーザが現在と提案したマッピングをリアルタイムで視覚的に比較できるようにする。 2つの新しいアプローチの効果とは対照的に a) 継続的に更新されたDoFの組み合わせを推奨する b) 現在のロボットの動きと新しい推奨の間で、個別のしきい値を使用する。両者は、古典的な制御方法に対する個人によるVR(Virtual Reality)研究で比較される。タスク完了時間を短縮し、モードスイッチを減らし、認識されたワークロードを減らし、フィードフォワードと組み合わせることで、ADMC法は古典的なモード切替よりも優れていることを確定した。連続性としきい値の間の明らかな定量的な違いの欠如は、ユーザ中心のカスタマイズオプションの重要性を明らかにしている。これらの影響を開発プロセスに含めることで、ユーザビリティが向上し、高いユーザ受け入れを持つロボット技術の実現に欠かせないものとなる。

Robotic solutions, in particular robotic arms, are becoming more frequently deployed for close collaboration with humans, for example in manufacturing or domestic care environments. These robotic arms require the user to control several Degrees-of-Freedom (DoFs) to perform tasks, primarily involving grasping and manipulating objects. Standard input devices predominantly have two DoFs, requiring time-consuming and cognitively demanding mode switches to select individual DoFs. Contemporary Adaptive DoF Mapping Controls (ADMCs) have been shown to decrease the necessary number of mode switches but have so far not been able to significantly reduce the perceived workload. Users still bear the mental workload of incorporating abstract mode switching into their workflow. We address this by providing feed-forward multimodal feedback using updated recommendations of ADMC, allowing users to visually compare the current and the suggested mapping in real time. We contrast the effectiveness of two new approaches that a) continuously recommend updated DoF combinations or b) use discrete thresholds between current robot movements and new recommendations. Both are compared in a Virtual Reality (VR) in-person study against a classic control method. Significant results for lowered task completion time, fewer mode switches, and reduced perceived workload conclusively establish that, in combination with feedforward, ADMC methods can indeed outperform classic mode switching. A lack of apparent quantitative differences between Continuous and Threshold reveals the importance of user-centered customization options. Including these implications in the development process will improve usability, which is essential for successfully implementing robotic technologies with high user acceptance.

翻訳日:2023-07-07 14:17:08 公開日:2023-07-06

# 拒絶を伴う回帰にノー・リジェクション学習が最適である場合

When No-Rejection Learning is Optimal for Regression with Rejection ( http://arxiv.org/abs/2307.02932v1 )

ライセンス: Link先を確認

Xiaocheng Li, Shang Liu, Chunlin Sun, Hanzhao Wang

(参考訳) 拒絶による学習は、予測タスクにおける人間とAIの相互作用を研究するための原型モデルである。モデルには2つのコンポーネント、予測器とリジェクタがある。サンプルが到着すると、拒絶者はまずそれを受け入れるかどうかを判断し、受理された場合、予測タスクを遂行し、拒否された場合は、予測を人間に延期する。学習問題は、予測者と拒絶者を同時に学習する必要がある。これは従来の損失関数の構造を変え、しばしば非凸性や不整合の問題を引き起こす。拒絶問題のある分類では、いくつかの作品が、証明可能な一貫性保証を持つ共同学習のための代理的損失を発生させる。本稿では,レグレッションをリジェクション問題(RwR)を用いて検討し,レグレッションタスクとしてRwR問題を扱うノリジェクション学習戦略について検討する。文献で観察される非退行学習戦略の準最適性は、予測器の関数クラスを拡大することにより緩和できることを示す。そこで我々は,予測器の学習を独り占めにするために,乱れた損失を導入し,予測器と拒絶器を共同で行うよりも,予測器に対して一貫したサロゲート特性を個別に確立できることを示す。本研究は,まず全てのデータを用いて予測器を学習し,その予測損失を校正する2段階の学習手順を提案する。より多くのデータサンプルがより良い予測器に結びつくという一般的な直感と一致し、リジェクタを学ぶためのキャリブレーションアルゴリズムのより良い設計にもっと努力する必要がある。我々の議論は主に回帰問題に焦点をあてるが、理論的結果と洞察は分類問題にも一般化する。

Learning with rejection is a prototypical model for studying the interaction between humans and AI on prediction tasks. The model has two components, a predictor and a rejector. Upon the arrival of a sample, the rejector first decides whether to accept it; if accepted, the predictor fulfills the prediction task, and if rejected, the prediction is deferred to humans. The learning problem requires learning a predictor and a rejector simultaneously. This changes the structure of the conventional loss function and often results in non-convexity and inconsistency issues. For the classification with rejection problem, several works develop surrogate losses for joint learning with provable consistency guarantees; in parallel, there has been less work on the regression counterpart. We study the regression with rejection (RwR) problem and investigate the no-rejection learning strategy, which treats the RwR problem as a standard regression task to learn the predictor. We establish that the suboptimality of the no-rejection learning strategy observed in the literature can be mitigated by enlarging the function class of the predictor. Then we introduce the truncated loss to single out the learning of the predictor, and we show that a consistent surrogate property can be established for the predictor individually more easily than for the predictor and the rejector jointly. Our findings advocate for a two-step learning procedure that first uses all the data to learn the predictor and then calibrates the prediction loss for the rejector. This is better aligned with the common intuition that more data samples lead to a better predictor, and it calls for more effort on the design of calibration algorithms for learning the rejector. While our discussion mainly focuses on the regression problem, the theoretical results and insights generalize to the classification problem as well.
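The two-step procedure described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the loss form (squared error when accepted, a fixed deferral cost when rejected), the constant-mean predictor, and the use of true squared errors as a stand-in for a learned error estimate. None of it is taken from the paper itself.

```python
def rwr_loss(preds, targets, accept, cost):
    """Average loss: squared error if accepted, fixed deferral cost otherwise."""
    total = 0.0
    for p, y, a in zip(preds, targets, accept):
        total += (p - y) ** 2 if a else cost
    return total / len(preds)


def fit_mean_predictor(targets):
    # Step 1 (no-rejection learning): fit on ALL samples, ignoring the rejector.
    return sum(targets) / len(targets)


def threshold_rejector(estimated_sq_errors, cost):
    # Step 2: accept a sample only when its estimated error is below the cost.
    return [e <= cost for e in estimated_sq_errors]


targets = [1.0, 1.2, 0.8, 5.0]
cost = 2.0
mean = fit_mean_predictor(targets)
preds = [mean] * len(targets)
# Placeholder: a real system would estimate these errors from features.
est_errors = [(p - y) ** 2 for p, y in zip(preds, targets)]
accept = threshold_rejector(est_errors, cost)

loss_with_rejection = rwr_loss(preds, targets, accept, cost)
loss_without_rejection = rwr_loss(preds, targets, [True] * len(targets), cost)
print(accept, loss_with_rejection, loss_without_rejection)
```

In this toy setup the predictor never sees the rejector, and only the outlier sample is deferred, which lowers the average loss relative to accepting everything.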

翻訳日:2023-07-07 14:16:24 公開日:2023-07-06

# MIP*=RE後の物理の論理的可能性

Logical possibilities for physics after MIP*=RE ( http://arxiv.org/abs/2307.02920v1 )

ライセンス: Link先を確認

Adán Cabello, Marco Túlio Quintino, Matthias Kleinmann

(参考訳) MIP*=RE は C_{qa} (テンソル積相関の集合の閉包) と C_{qc} (可換相関の集合) を超平面(ベルのような不等式)で分離することができ、有限次元テンソル積相関の列で近似できない無限次元量子系上の可換測度(有限個および有限個の結果)によって生成される相関が存在することを意味する。この結果から、論理的に可能な宇宙は4つあると指摘する。それぞれの可能性は、受け入れられた物理理論の限界または自然の重要な側面をテストする機会を明らかにするため興味深い。私たちは、これらの宇宙のどれかを学ぶために、道路マップを設計するのに役立つ、いくつかのオープンな問題をリストアップします。

MIP*=RE implies that C_{qa} (the closure of the set of tensor product correlations) and C_{qc} (the set of commuting correlations) can be separated by a hyperplane (i.e., a Bell-like inequality) and that there are correlations produced by commuting measurements (a finite number of them and with a finite number of outcomes) on an infinite-dimensional quantum system which cannot be approximated by sequences of finite-dimensional tensor product correlations. We point out that there are four logically possible universes after this result. Each possibility is interesting because it reveals either limitations in accepted physical theories or opportunities to test crucial aspects of nature. We list some open problems that may help us to design a road map to learn in which of these universes we are.

翻訳日:2023-07-07 14:15:20 公開日:2023-07-06

# 従業員の心理的契約違反が情報セキュリティポリシーの遵守に及ぼす影響--本質的・極端的動機づけ

The impact of an employee's psychological contract breach on compliance with information security policies: intrinsic and extrinsic motivation ( http://arxiv.org/abs/2307.02916v1 )

ライセンス: Link先を確認

Daeun Lee and Harjinder Singh Lallie and Nadine Michaelides

(参考訳) ソーシャルエンジニアリング攻撃の急速な増加にもかかわらず、すべての従業員が、組織が期待するほど情報セキュリティポリシー(ISP)に準拠しているわけではない。ISPの非準拠は、様々な心理的動機によって引き起こされる。本研究では、計画行動理論(TPB)と一般抑止理論(GDT)を用いて動機を内因性と外因性に分け、従業員の心理的契約違反(PCB)がISPコンプライアンス意図(ICI)に及ぼす影響について検討した。英国人従業員(n=206)のデータ分析の結果、PCBが高いほどICIが低くなることがわかった。また、PCBはICIに対する内因性動機(態度と公正感)を有意に低下させたが、外因性動機(制裁の重大さと制裁の確実性)とICIの関係は調整しなかった。その結果、本研究はISセキュリティ分野におけるPCBのリスクに対処し、PCBの高い従業員に対して効果的な解決策を提案する。

Despite the rapid rise in social engineering attacks, not all employees comply with information security policies (ISPs) to the extent that organisations expect. ISP non-compliance is caused by a variety of psychological motivations. This study investigates the effect of employees' psychological contract breach (PCB) on ISP compliance intention (ICI), dividing motivation into intrinsic and extrinsic using the theory of planned behaviour (TPB) and general deterrence theory (GDT). Data analysis from UK employees (n=206) showed that the higher the PCB, the lower the ICI. The study also found that PCB significantly reduced intrinsic motivation (attitude and perceived fairness) for ICI, whereas PCB did not moderate the relationship between extrinsic motivation (sanction severity and sanction certainty) and ICI. As a result, this study addresses the risks of PCB in the field of IS security and proposes effective solutions for employees with high PCB.

翻訳日:2023-07-07 14:14:51 公開日:2023-07-06

# lea: 語彙的注意バイアスを用いたタイプミスに対する文類似性の改善

LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias ( http://arxiv.org/abs/2307.02912v1 )

ライセンス: Link先を確認

Mario Almagro, Emilio Almazán, Diego Ortego, David Jiménez

(参考訳) タイプミスや略語などのテキストノイズは、下流のタスクでバニラトランスフォーマをペナライズする有名な問題である。これは、複数の領域における基本的なタスクである文類似性(例えば、マッチング、検索、言い換えなど)のケースでもある。文の類似性はクロスエンコーダを用いてアプローチすることができ、2つの文が入力に連結され、モデルがそれらの間の関係を利用することができる。ノイズ問題に対処する以前の研究は、主にデータ拡張戦略に依存しており、トレーニングに使用されるものと類似した破損したサンプルを扱う際の堅牢性が向上している。しかし、これらの手法はすべて依然としてtyposによって引き起こされるトークン分布シフトに苦しむ。本稿では,両文の単語間の語彙類似性を組み込んだ新しい語彙認識アテンションモジュール(lea)をクロスエンコーダに実装し,テキスト雑音に対処することを提案する。テキストの類似性を利用してトークン化シフト問題を回避し,ロバスト性を向上させる。 LEAによって導入された注意バイアスは、特に短文記述と限られたコンテキストを持つドメインにおいて、テキストノイズを伴う複雑なシナリオにクロスエンコーダが取り組むのに役立つことを実証する。製品マッチングのために5つのeコマースデータセットに3つの人気のあるTransformerエンコーダを使用した実験によると、LEAはノイズの存在下でパフォーマンスを継続的に向上する一方で、元の(クリーン)分割に競争力を維持する。また,本手法を2つのデータセットで評価し,LEAが文の長い領域やより自然な文脈でタイポに頑健であることを示す。さらに,本手法における設計選択を徹底的に分析し,意思決定の影響について考察し,タイポスを扱うクロスエンコーダの今後の研究を促進する。

Textual noise, such as typos or abbreviations, is a well-known issue that penalizes vanilla Transformers for most downstream tasks. We show that this is also the case for sentence similarity, a fundamental task in multiple domains, e.g. matching, retrieval or paraphrasing. Sentence similarity can be approached using cross-encoders, where the two sentences are concatenated in the input, allowing the model to exploit the inter-relations between them. Previous works addressing the noise issue mainly rely on data augmentation strategies, showing improved robustness when dealing with corrupted samples that are similar to the ones used for training. However, all these methods still suffer from the token distribution shift induced by typos. In this work, we propose to tackle textual noise by equipping cross-encoders with a novel LExical-aware Attention module (LEA) that incorporates lexical similarities between words in both sentences. By using raw text similarities, our approach avoids the tokenization shift problem, obtaining improved robustness. We demonstrate that the attention bias introduced by LEA helps cross-encoders to tackle complex scenarios with textual noise, especially in domains with short-text descriptions and limited context. Experiments using three popular Transformer encoders on five e-commerce datasets for product matching show that LEA consistently boosts performance in the presence of noise, while remaining competitive on the original (clean) splits. We also evaluate our approach on two datasets for textual entailment and paraphrasing, showing that LEA is robust to typos in domains with longer sentences and more natural context. Additionally, we thoroughly analyze several design choices in our approach, providing insights about the impact of the decisions made and fostering future research in cross-encoders dealing with typos.

翻訳日:2023-07-07 14:14:32 公開日:2023-07-06

# GilBERToにおける動作主性とテリシティ: 認知的含意

Agentività e telicità in GilBERTo: implicazioni cognitive ( http://arxiv.org/abs/2307.02910v1 )

ライセンス: Link先を確認

Agnese Lombardi, Alessandro Lenci

(参考訳) 本研究の目的は,トランスフォーマティブ・ニューラル・ランゲージモデルが語彙意味論を推論し,その情報を形態素合成パターンの完成に利用するかどうかを検討することである。セマンティクス特性は、テロシティ(これも定性と組み合わせている)とエージェント性である。どちらもセマンティクスとモルフォシンタックスのインターフェイスで動作し、意味的に決定され、構文的にエンコードされる。タスクは、計算モデルとイタリアのネイティブスピーカーのグループの両方に送信された。 2つのデータ群の比較により、ニューラルネットワークモデルが人間の意味的能力の重要な側面をどの程度捉えているかを調べることができる。

The goal of this study is to investigate whether a Transformer-based neural language model infers lexical semantics and uses this information for the completion of morphosyntactic patterns. The semantic properties considered are telicity (also combined with definiteness) and agentivity. Both act at the interface between semantics and morphosyntax: they are semantically determined and syntactically encoded. The tasks were submitted to both the computational model and a group of Italian native speakers. The comparison between the two groups of data allows us to investigate to what extent neural language models capture significant aspects of human semantic competence.

翻訳日:2023-07-07 14:14:02 公開日:2023-07-06

# 音声-視覚的エンドツーエンドのマルチチャンネル音声分離・デバーベレーション・認識

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition ( http://arxiv.org/abs/2307.02909v1 )

ライセンス: Link先を確認

Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu

(参考訳) 重なり合う話者、騒音、残響を含むカクテルパーティー音声の正確な認識は、現在でも非常に難しい課題である。本稿では、音響信号の劣化に対する視覚的モダリティの不変性、音声-視覚的多チャンネル音声分離、全システムコンポーネントに視覚情報をフルに組み込んだデバーベーションと認識アプローチを提案する。ビデオ入力の有効性は、マスクベースのMVDR音声分離、DNN-WPEまたはスペクトルマッピング(SpecM)ベースの音声デバーベレーションフロントエンド、コンフォーマーASRバックエンドで一貫して実証される。マスクを用いたWPDによるパイプライン化, 共同方式による音声分離, 残響処理を行うフロントエンドアーキテクチャについて検討した。音声強調フロントエンドとASRバックエンドコンポーネント間の誤差コストのミスマッチは、ASRコスト関数のみを用いたエンドツーエンドの微調整や、音声強調損失の補間によって最小化する。オックスフォードLSS2データセットのシミュレーションや再生を用いて合成した重畳および残響音声データについて実験を行った。提案された音声-視覚的多チャンネル音声分離、収差認識システムは、同等の音声のみのベースラインを9.1%、絶対値6.2%(41.7%、相対値36.0%)のワードエラー率(WER)で一貫して上回った。 PESQ, STOI, SRMRでは, 音声強調の改善が得られた。

Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all system components is proposed in this paper. The efficacy of the video input is consistently demonstrated in mask-based MVDR speech separation, DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-end and Conformer ASR back-end. Audio-visual integrated front-end architectures performing speech separation and dereverberation in a pipelined or joint fashion via mask-based WPD are investigated. The error cost mismatch between the speech enhancement front-end and ASR back-end components is minimized by end-to-end jointly fine-tuning using either the ASR cost function alone, or its interpolation with the speech enhancement loss. Experiments were conducted on the mixture overlapped and reverberant speech data constructed using simulation or replay of the Oxford LRS2 dataset. The proposed audio-visual multi-channel speech separation, dereverberation and recognition systems consistently outperformed the comparable audio-only baseline by 9.1% and 6.2% absolute (41.7% and 36.0% relative) word error rate (WER) reductions. Consistent speech enhancement improvements were also obtained on PESQ, STOI and SRMR scores.
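The abstract quotes each WER improvement both in absolute and in relative terms, which pins down the baseline. The sketch below is a back-of-the-envelope check, assuming the relative figure is simply the absolute reduction divided by the audio-only baseline WER; the function name is hypothetical, and the inputs are rounded values from the abstract, so the results are approximate.

```python
def implied_baseline_wer(absolute_reduction, relative_reduction):
    """Baseline WER (in %) implied by an absolute and a relative reduction."""
    return absolute_reduction / relative_reduction


# (absolute WER reduction in %, relative reduction as a fraction)
for abs_red, rel_red in [(9.1, 0.417), (6.2, 0.360)]:
    baseline = implied_baseline_wer(abs_red, rel_red)
    system = baseline - abs_red
    print(f"baseline ~ {baseline:.1f}% WER, audio-visual ~ {system:.1f}% WER")
```

Under this reading, the two audio-only baselines sit at roughly 21.8% and 17.2% WER.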

翻訳日:2023-07-07 14:13:51 公開日:2023-07-06

# CNNと意思決定レベル融合を用いたマルチモーダル・マルチクラスパーキンソン病分類

Multi-modal multi-class Parkinson disease classification using CNN and decision level fusion ( http://arxiv.org/abs/2307.02978v1 )

ライセンス: Link先を確認

Sushanta Kumar Sahu, Ananda S. Chowdhury

(参考訳) パーキンソン病は世界保健機関(WHO)が報告した2番目に一般的な神経変性疾患である。本稿では,MRIとDTIの2つのモードを用いた直接3クラスPD分類を提案する。分類に用いられる3つのクラスはpdであり、ドーパミン欠乏の証拠のないスキャンと健康管理である。目的を達成するために,mriおよび分数異方性から白色物質と灰色物質を用い,dtiからの平均拡散率を測定した。上記の4種類のデータに基づいて、4つの別々のCNNをトレーニングします。決定レベルでは、4つのcnnモデルの出力は最適な重み付き平均融合技術によって融合される。 PPMIデータベース上で,PD,HC,SWEDDの直接3クラス分類における95.53パーセントの精度を実現した。一連のアブレーション研究を含む広範な比較は,提案法の有効性を明確に示している。

Parkinson disease is the second most common neurodegenerative disorder, as reported by the World Health Organization. In this paper, we propose a direct three-class PD classification using two different modalities, namely, MRI and DTI. The three classes used for classification are PD, Scans Without Evidence of Dopamine Deficit (SWEDD) and Healthy Control (HC). We use white matter and gray matter from the MRI and fractional anisotropy and mean diffusivity from the DTI to achieve our goal. We train four separate CNNs on the above four types of data. At the decision level, the outputs of the four CNN models are fused with an optimal weighted average fusion technique. We achieve an accuracy of 95.53 percent for the direct three-class classification of PD, HC and SWEDD on the publicly available PPMI database. Extensive comparisons including a series of ablation studies clearly demonstrate the effectiveness of our proposed solution.
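Decision-level fusion by weighted averaging can be sketched as follows. This is only an illustration of the general technique: the class probabilities and the weights are made-up values, and the abstract does not say how its "optimal" weights are found, so the fixed weights here are purely an assumption.

```python
def fuse_decisions(prob_vectors, weights):
    """Weighted average of per-model class-probability vectors."""
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_vectors, weights):
        for k in range(n_classes):
            fused[k] += (w / total) * probs[k]
    return fused


CLASSES = ["PD", "HC", "SWEDD"]
# Hypothetical softmax outputs of the four CNNs for one subject.
outputs = [
    [0.6, 0.3, 0.1],  # CNN on MRI white matter
    [0.5, 0.4, 0.1],  # CNN on MRI gray matter
    [0.2, 0.7, 0.1],  # CNN on DTI fractional anisotropy
    [0.3, 0.5, 0.2],  # CNN on DTI mean diffusivity
]
weights = [0.4, 0.3, 0.2, 0.1]  # assumed, not from the paper

fused = fuse_decisions(outputs, weights)
predicted = CLASSES[max(range(len(fused)), key=fused.__getitem__)]
print(fused, predicted)
```

Two of the four models favour HC here, yet the fused decision is PD because the more heavily weighted models favour it. A simple majority vote cannot express that behaviour.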

翻訳日:2023-07-07 14:08:14 公開日:2023-07-06

# スマートフォン音声データからcovid-19の効率的な検出のためのトランスファー学習

Transfer Learning for the Efficient Detection of COVID-19 from Smartphone Audio Data ( http://arxiv.org/abs/2307.02975v1 )

ライセンス: Link先を確認

Mattia Giovanni Campana, Franca Delmastro, Elena Pagani

(参考訳) スマートフォンデータから病気を検出することは、モバイルヘルス(m-health)システムにおけるオープンな研究課題である。新型コロナウイルスとその呼吸器症状は、この分野で重要なケーススタディであり、その早期発見は、パンデミックの状況に対処するための潜在的な手段である。このソリューションの有効性は主に、収集されたデータに適用されるAIアルゴリズムの性能と、ユーザのモバイルデバイス上で直接実装できるかどうかに依存する。本稿では,これらの課題と利用可能なデータ量の少なさを考慮して,3種類の深層学習モデルを手作りの特徴とも比較しながら実験的に評価し,あわせて特徴抽出と微調整という転移学習の2つの主なアプローチを評価する。具体的には、VGGish、YAMNET、L³-Net(12の異なる構成を含む)を、4つの異なるデータセット(合計13,447サンプル)を用いたユーザ非依存の実験によって評価した。その結果、L³-Netはすべての実験設定において優位であり、Precision-Recall AUCにおいて、特徴抽出器として用いた場合は他のソリューションを12.3%、モデルを微調整した場合は10%上回った。さらに、事前学習したモデルの全結合層のみを微調整すると、一般的に性能が低下し、特徴抽出と比べて平均6.6%低下する。最後に,様々なモデルのメモリフットプリントを評価し,商用モバイルデバイス上での利用の可能性について検討した。

Disease detection from smartphone data represents an open research challenge in mobile health (m-health) systems. COVID-19 and its respiratory symptoms are an important case study in this area, and their early detection is a potential instrument to counteract the pandemic situation. The efficacy of this solution mainly depends on the performance of the AI algorithms applied to the collected data and on their possible implementation directly on the users' mobile devices. Considering these issues, and the limited amount of available data, in this paper we present the experimental evaluation of 3 different deep learning models, compared also with hand-crafted features, and of two main approaches to transfer learning in the considered scenario: feature extraction and fine-tuning. Specifically, we considered VGGish, YAMNET, and L³-Net (including 12 different configurations), evaluated through user-independent experiments on 4 different datasets (13,447 samples in total). Results clearly show the advantages of L³-Net in all the experimental settings, as it outperforms the other solutions by 12.3% in terms of Precision-Recall AUC as a feature extractor, and by 10% when the model is fine-tuned. Moreover, we note that fine-tuning only the fully-connected layers of the pre-trained models generally leads to worse performance, with an average drop of 6.6% with respect to feature extraction. Finally, we evaluate the memory footprints of the different models for their possible application on commercial mobile devices.

翻訳日:2023-07-07 14:08:02 公開日:2023-07-06

# リモートセンシング画像超解像のためのクロス空間画素統合とクロスステージ機能融合型トランスネットワーク

Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network for Remote Sensing Image Super-Resolution ( http://arxiv.org/abs/2307.02974v1 )

ライセンス: Link先を確認

Yuting Lu, Lingtong Min, Binglu Wang, Le Zheng, Xiaoxu Wang, Yongqiang Zhao, Teng Long

(参考訳) リモートセンシング画像スーパーレゾリューション(RSISR)は、空間デテールの強化と衛星画像の品質向上に重要な役割を果たす。近年、TransformerベースのモデルはRSISRの競争性能を示している。グローバルな自己注意による二次計算の複雑さを軽減するため、様々な手法が局所的な窓に注意を拘束し、効率を高める。その結果、単一の注意層における受容場は不十分であり、コンテキストモデリングが不十分となる。さらに、ほとんどの変換ベースのアプローチは、スキップ接続を通じて浅い機能を再利用するが、これらの接続のみに依存することによって、浅い特徴と深い特徴を等しく扱い、モデルの特徴付け能力を妨げる。これらの課題に対処するため,RSISR 用 Cross-Spatial Pixel Integration と Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) と呼ばれる新しいトランスフォーマアーキテクチャを提案する。提案モデルは,画像全体のグローバル認知と理解を効果的に促進し,機能統合の効率化を図る。本モデルでは,CSPIA (Cross-spatial pixel Integration attention) を用いて局所窓にコンテキスト情報を導入し,CSFFA (Cross-stage feature fusion attention) は前段階の特徴を適応的に融合させ,現行の要件に則って特徴表現を改善する。本研究では,複数のベンチマークデータセットを対象とした総合的な実験を行い,提案するspiffnetの性能を,最先端手法と比較して定量的指標と視覚品質の両面で実証した。

Remote sensing image super-resolution (RSISR) plays a vital role in enhancing spatial details and improving the quality of satellite imagery. Recently, Transformer-based models have shown competitive performance in RSISR. To mitigate the quadratic computational complexity resulting from global self-attention, various methods constrain attention to a local window, enhancing its efficiency. Consequently, the receptive fields in a single attention layer are inadequate, leading to insufficient context modeling. Furthermore, while most Transformer-based approaches reuse shallow features through skip connections, relying solely on these connections treats shallow and deep features equally, impeding the model's ability to characterize them. To address these issues, we propose a novel Transformer architecture called Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion Based Transformer Network (SPIFFNet) for RSISR. Our proposed model effectively enhances global cognition and understanding of the entire image, facilitating efficient integration of features across stages. The model incorporates cross-spatial pixel integration attention (CSPIA) to introduce contextual information into a local window, while cross-stage feature fusion attention (CSFFA) adaptively fuses features from the previous stage to improve feature expression in line with the requirements of the current stage. We conducted comprehensive experiments on multiple benchmark datasets, demonstrating the superior performance of our proposed SPIFFNet in terms of both quantitative metrics and visual quality when compared to state-of-the-art methods.

翻訳日:2023-07-07 14:07:36 公開日:2023-07-06

# プルーニング対量子化:どちらが良いか

Pruning vs Quantization: Which is Better? ( http://arxiv.org/abs/2307.02973v1 )

ライセンス: Link先を確認

Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort

(参考訳) ニューラルネットワークのプルーニングと量子化技術は、ニューラルネットワーク自体と同じくらい古い。しかし、現在では両者のアドホックな比較しか発表されていない。本稿では,ニューラルネットワークの量子化とプルーニングのどちらがよいのか,という問いに答える。この質問に答えることで、今後ニューラルネットワークハードウェアに関する設計決定が下されることを期待します。ディープニューラルネットワークを圧縮する2つの手法を広範囲に比較した。まず、一般的なデータ分布に対する期待量子化とプルーニング誤差の分析比較を行う。次に,学習ネットワークにおける層毎のプルーニングと量子化誤差の上限を低くし,最適化後の経験的誤差と比較する。最後に,8つの大規模モデルを3つのタスクでトレーニングするための実験的な比較を行った。その結果,ほとんどの場合,量子化はプルーニングよりも優れていた。圧縮比が非常に高いいくつかのシナリオでのみ、プルーニングは精度の観点から有益である。

Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to the empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with a very high compression ratio might pruning be beneficial from an accuracy standpoint.
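The kind of error comparison described above can be tried on synthetic data. The sketch below is a toy experiment, not the paper's analysis. It assumes standard-normal weights stored in 16 bits, so that 4-bit uniform quantization and keeping 25% of the weights both give a nominal 4x compression (index overhead for sparse storage is ignored). All function names are hypothetical.

```python
import random


def quantize_uniform(weights, bits):
    """Round each weight to the nearest point of a uniform min-max grid."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude fraction of weights."""
    n_keep = int(len(weights) * keep_fraction)
    threshold = sorted((abs(w) for w in weights), reverse=True)[n_keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]
quant_err = mse(weights, quantize_uniform(weights, bits=4))
prune_err = mse(weights, prune_by_magnitude(weights, keep_fraction=0.25))
print(f"4-bit quantization MSE: {quant_err:.4f}")
print(f"75% pruning MSE:        {prune_err:.4f}")
```

For this Gaussian toy case the quantization error comes out well below the pruning error. That agrees in direction with the abstract's headline result, though trained networks and heavier-tailed distributions may behave differently.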

翻訳日:2023-07-07 14:07:06 公開日:2023-07-06

# テキスト対画像生成における文化的ギャップについて

On the Cultural Gap in Text-to-Image Generation ( http://arxiv.org/abs/2307.02971v1 )

ライセンス: Link先を確認

Bingshuai Liu, Longyue Wang, Chenyang Lyu, Yong Zhang, Jinsong Su, Shuming Shi, Zhaopeng Tu

(参考訳) テキスト・トゥ・イメージ(T2I)生成における課題の1つは、トレーニングデータに存在する文化ギャップの意図しない反映であり、入力テキストの文化的要素がトレーニングセットにほとんど収集されない場合に生成された画像品質の相違を示す。様々なT2Iモデルは印象的だが任意の例を示しているが、T2Iモデルが異文化間画像を生成する能力を体系的に評価するベンチマークは存在しない。このギャップを埋めるために、モデルが対象文化にどの程度適しているかを評価するための総合的な評価基準を備えたChallenging Cross-Cultural (C3)ベンチマークを提案する。 C3ベンチマークで安定拡散モデルによって生成された欠陥画像を解析することにより、そのモデルが特定の文化オブジェクトを生成するのに失敗することが多いことが分かる。そこで本稿では,t2iモデルを微調整して異文化生成を改善するために,対象文化の微調整データをフィルタするために,オブジェクト・テキストアライメントを考慮した新しいマルチモーダル・メトリックを提案する。実験結果から,我々のマルチモーダル・メトリックは既存の指標よりもC3ベンチマーク上でより強力なデータ選択性能を提供することが示された。このベンチマーク、データ、コード、生成した画像は、文化的に多様なT2I世代(https://github.com/longyuewangdcu/C3-Bench.com)の今後の研究を促進する。

One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data, which signifies the disparity in generated image quality when the cultural elements of the input text are rarely collected in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-tuning data in the target culture, which is used to fine-tune a T2I model to improve cross-cultural generation. Experimental results show that our multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, in which the object-text alignment is crucial. We release the benchmark, data, code, and generated images to facilitate future research on culturally diverse T2I generation (https://github.com/longyuewangdcu/C3-Bench).

翻訳日:2023-07-07 14:06:53 公開日:2023-07-06

# DPM: 分離による機密データのクラスタリング

DPM: Clustering Sensitive Data through Separation ( http://arxiv.org/abs/2307.02969v1 )

ライセンス: Link先を確認

Yara Schütt, Johannes Liebenow, Tanya Braun, Marcel Gehrke, Florian Thaeter, Esfandiar Mohammadi

(参考訳) プライバシ保護クラスタリングは、機密情報の保護を保証しながら、教師なしの方法でデータポイントをグループ化する。従来のプライバシ保護クラスタリングは、点群の集中箇所を特定することに重点を置いていた。本稿では別の道をとり、データセットを分割する適切なセパレータを特定することに着目する。差分プライベートな方法で正確なデータポイントセパレータを探索する、新しい差分プライベートクラスタリングアルゴリズムDPMを提案する。DPMは、正確なセパレータを見つけるための2つの重要な課題に対処する。すなわち、クラスタ内の小さなギャップではなくクラスタ間の大きなギャップであるセパレータを特定すること、そしてプライバシ予算を効率的に使うために、データを大きな部分に分割するセパレータを優先することである。差分プライベートな指数メカニズムを用いて、DPMは証明可能な高い効用を持つクラスタセパレータをランダムに選択する。データセット $D$ に対して、中央の $60\%$ 分位の範囲に幅の広い低密度セパレータがある場合、DPMは確率 $1 - \exp(-\sqrt{|D|})$ でそのセパレータを見つける。実験による評価では、DPMがクラスタリング指標である慣性(inertia)において有意な改善を達成することを示す。非プライベートなKMeans++の慣性をベースラインとすると、$\varepsilon = 1$、$\delta=10^{-5}$ のとき、DPMはChangとKamathによる最先端のクラスタリングアルゴリズムと比較して、ベースラインとの差を合成データセットで最大 $50\%$、実世界のデータセットで最大 $62\%$ 改善する。

Privacy-preserving clustering groups data points in an unsupervised manner whilst ensuring that sensitive information remains protected. Previous privacy-preserving clustering focused on identifying concentration of point clouds. In this paper, we take another path and focus on identifying appropriate separators that split a data set. We introduce the novel differentially private clustering algorithm DPM that searches for accurate data point separators in a differentially private manner. DPM addresses two key challenges for finding accurate separators: identifying separators that are large gaps between clusters instead of small gaps within a cluster and, to efficiently spend the privacy budget, prioritising separators that split the data into large subparts. Using the differentially private Exponential Mechanism, DPM randomly chooses cluster separators with provably high utility: For a data set $D$, if there is a wide low-density separator in the central $60\%$ quantile, DPM finds that separator with probability $1 - \exp(-\sqrt{|D|})$. Our experimental evaluation demonstrates that DPM achieves significant improvements in terms of the clustering metric inertia. With the inertia results of the non-private KMeans++ as a baseline, for $\varepsilon = 1$ and $\delta=10^{-5}$ DPM improves upon the difference to the baseline by up to $50\%$ for a synthetic data set and by up to $62\%$ for a real-world data set compared to a state-of-the-art clustering algorithm by Chang and Kamath.
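
The abstract's use of the Exponential Mechanism to pick a separator can be illustrated with a minimal Python sketch. This is not the DPM algorithm itself: the one-dimensional data, the candidate split points, and the utility function `emptiness_utility` (minus the number of points near a split) are assumptions made only for illustration, and the very large `epsilon` is chosen so that the toy example behaves almost deterministically.

```python
import math
import random


def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0, rng=None):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility(candidate) / (2 * sensitivity))."""
    rng = rng or random.Random()
    scores = [utility(c) for c in candidates]
    top = max(scores)  # subtracting the maximum keeps exp() numerically stable
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def emptiness_utility(points, width):
    """Hypothetical utility (not from the paper): minus the number of points
    within `width` of a split. Adding or removing one point changes it by at
    most 1, so its sensitivity is 1."""
    def utility(split):
        return -sum(1 for p in points if abs(p - split) <= width)
    return utility


# Two well-separated 1D clusters with a wide empty gap between them.
points = [0.1, 0.3, 0.5, 0.7, 0.9, 9.1, 9.3, 9.5, 9.7, 9.9]
candidate_splits = [0.5, 5.0, 9.5]
score = emptiness_utility(points, width=1.0)

# epsilon=50 is far larger than a realistic privacy budget; it is used here
# only so that the low-density split is selected with overwhelming probability.
chosen = exponential_mechanism(candidate_splits, score, epsilon=50.0,
                               rng=random.Random(0))
```

With a smaller `epsilon` the choice becomes noisier, which is the privacy and utility trade-off that the probability bound in the abstract quantifies.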

翻訳日:2023-07-07 14:06:26 公開日:2023-07-06

# 空間スペクトルベクトルビーム

Spatio-Spectral Vector Beams ( http://arxiv.org/abs/2307.02965v1 )

ライセンス: Link先を確認

Lea Kopf, Rafael Barros, Robert Fickler

(参考訳) 自由度(dof)の高度な操作によって光場の複雑さが増すことは、基礎研究や技術にとって新たな機会となる。光の空間的またはスペクトル的な形状に関連する偏光は、完全に偏光され、空間的またはスペクトル的に変化する偏光構造を持ついわゆる空間的またはスペクトル的ベクトルビームをもたらす。ここでは、両方のアプローチを組み合わせることでベクトルビームの一般的な考え方を拡張し、空間、波長、偏光の3つの非分離性DoFにおける新しい光状態を構築する。我々は、それらの複素偏光構造を詳細に研究し、場の偏光の度合いは、空間と波長が狭く定義されているときにのみ明らかにすることを示し、非分離量子系におけるコヒーレンス損失の類似性を実証する。このような光場は、古典的な光場の非分離性や新しい技術機会、例えばイメージングや分光の応用に関する基礎研究を可能にする。

Increasing the complexity of a light field through the advanced manipulation of its degrees of freedom (DoF) provides new opportunities for fundamental studies and technologies. Correlating polarization with the light's spatial or spectral shape results in so-called spatial or spectral vector beams that are fully polarized and have a spatially or spectrally varying polarization structure. Here, we extend the general idea of vector beams by combining both approaches and structuring a novel state of light in three non-separable DoF's, i.e. space, wavelength, and polarization. We study in detail their complex polarization structure, show that the degree of polarization of the field is only unveiled when the field is narrowly defined in space and wavelength, and demonstrate the analogy to the loss of coherence in non-separable quantum systems. Such light fields allow fundamental studies on the non-separable nature of a classical light field and new technological opportunities, e.g. through applications in imaging or spectroscopy.

翻訳日:2023-07-07 14:05:50 公開日:2023-07-06

# 局所パウリノイズチャネルの構造とパラメータの効率的な学習

Efficient learning of the structure and parameters of local Pauli noise channels ( http://arxiv.org/abs/2307.02959v1 )

ライセンス: Link先を確認

Cambyse Rouzé, Daniel Stilck França

(参考訳) ノイズの避けられない存在は、大規模量子コンピュータの開発にとって重大な障害であり、量子ノイズを高精度で確実かつ効率的に特徴づける能力は、量子技術のさらなるスケールアップに不可欠である。任意の量子チャネルの推定には指数的な資源が必要だが、物理的に関連するノイズは何らかの局所構造を持つと期待されており、例えば異なる量子ビット間の誤りは条件付き独立構造を持つ。先行研究は、状態準備および測定(SPAM)誤差に対して頑健な形で、効率的なサンプル数でパウリノイズチャネルを推定できることを示したが、それらは既知の条件付き独立構造を出発点としていた。本稿では、この欠点に対処する、n量子ビット上のパウリノイズチャネルを学習する新しい手法を提案する。既知の条件付き独立構造のもとで係数を学習することに着目した先行研究とは異なり、本手法は係数と基礎構造の両方を学習する。Gibbs測度を効率的に学習するBreslerの画期的な結果を活用し、n量子ビットに作用するノイズの未知の構造を学習するための最適なサンプル複雑性 O(log(n)) を得る。この情報を利用すれば、O(poly(n)) 個のサンプルから、ダイヤモンド距離で近いチャネルの記述を得ることができる。さらに、本手法はSPAMロバスト性などの望ましい特徴を損なうことなく、サンプル数と後処理の両面で効率的であり、単一量子ビットのクリフォードゲートの実装しか必要としない。これにより、本手法は最小限の実験要件と仮定のもとで、量子デバイスにおけるパウリノイズの大規模な特性評価を可能にする。

The unavoidable presence of noise is a crucial roadblock for the development of large-scale quantum computers and the ability to characterize quantum noise reliably and efficiently with high precision is essential to scale quantum technologies further. Although estimating an arbitrary quantum channel requires exponential resources, it is expected that physically relevant noise has some underlying local structure, for instance that errors across different qubits have a conditional independence structure. Previous works showed how it is possible to estimate Pauli noise channels with an efficient number of samples in a way that is robust to state preparation and measurement errors, albeit departing from a known conditional independence structure. We present a novel approach for learning Pauli noise channels over n qubits that addresses this shortcoming. Unlike previous works that focused on learning coefficients with a known conditional independence structure, our method learns both the coefficients and the underlying structure. We achieve our results by leveraging a groundbreaking result by Bresler for efficiently learning Gibbs measures and obtain an optimal sample complexity of O(log(n)) to learn the unknown structure of the noise acting on n qubits. This information can then be leveraged to obtain a description of the channel that is close in diamond distance from O(poly(n)) samples. Furthermore, our method is efficient both in the number of samples and postprocessing without giving up on other desirable features such as SPAM-robustness, and only requires the implementation of single qubit Cliffords. In light of this, our novel approach enables the large-scale characterization of Pauli noise in quantum devices under minimal experimental requirements and assumptions.

翻訳日:2023-07-07 14:05:32 公開日:2023-07-06

# スピン軌道結合二重量子ドットの分類とマジック磁場方向

Classification and magic magnetic-field directions for spin-orbit-coupled double quantum dots ( http://arxiv.org/abs/2307.02958v1 )

ライセンス: Link先を確認

Aritra Sen, György Frank, Baksa Kolok, Jeroen Danon, András Pályi

(参考訳) 半導体量子ドットに閉じ込められた単一電子のスピンは自然量子ビット候補である。スピンベースの量子コンピューティングの基本構成要素は、スピン軌道結合の大きい二重量子ドットで実証されている。ここで、スピン軌道結合二重量子ドットは、その$g$-tensorの多次元空間の分割により、6つのクラスに分類できることを示す。このクラスは二重点の物理的特性、すなわち、輸送、分光、コヒーレンス測定の特徴、および量子ビット制御、シャットリング、読み出し実験などを決定する。特に、スピン物理学は、外部磁場が特殊方向(‘マジック方向’)を指しているときに、シュードスピン保存のために高度に単純化され、特殊方向の数はクラスによって決定される。また,等局所ゼーマン分割に対応する磁場方向空間におけるマジックループの存在と関連性を解析した。これらの結果は、強いスピン軌道結合を持つ材料におけるスピンベースの量子コンピューティング実験の正確な解釈と効率的な設計に向けた重要なステップを示す。

The spin of a single electron confined in a semiconductor quantum dot is a natural qubit candidate. Fundamental building blocks of spin-based quantum computing have been demonstrated in double quantum dots with significant spin-orbit coupling. Here, we show that spin-orbit-coupled double quantum dots can be categorised into six classes, according to a partitioning of the multi-dimensional space of their $g$-tensors. The class determines physical characteristics of the double dot, i.e., features in transport, spectroscopy and coherence measurements, as well as qubit control, shuttling, and readout experiments. In particular, we predict that the spin physics is highly simplified due to pseudospin conservation whenever the external magnetic field points in special directions ("magic directions"), where the number of special directions is determined by the class. We also analyze the existence and relevance of magic loops in the space of magnetic-field directions, corresponding to equal local Zeeman splittings. These results present an important step toward precise interpretation and efficient design of spin-based quantum computing experiments in materials with strong spin-orbit coupling.

翻訳日:2023-07-07 14:05:04 公開日:2023-07-06

# SegNetr:U字型ネットワークにおけるローカル-グローバルインタラクションとスキップ接続の再考

SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks ( http://arxiv.org/abs/2307.02953v1 )

ライセンス: Link先を確認

Junlong Cheng, Chengrui Gao, Fengjie Wang, Min Zhu

(参考訳) 近年、U字型ネットワークは、構造が単純で調整しやすいことから、医用画像セグメンテーションの分野を席巻している。しかし、既存のU字型セグメンテーションネットワークは、1) 畳み込み演算に起因する長期依存性の欠如を補うために複雑な自己注意モジュールを設計することに主眼を置いており、ネットワーク全体のパラメータ数と計算複雑性が増大する。2) エンコーダとデコーダの特徴を単純に融合し、それらの空間的位置の関係を無視している。本稿では、上記の問題を再考し、SegNetrと呼ばれる軽量な医用画像セグメンテーションネットワークを構築する。具体的には、任意の段階で、線形の計算複雑性のみで局所-大域的相互作用を動的に実行できる新しいSegNetrブロックを導入する。同時に、エンコーダ特徴の空間的位置情報を保存し、デコーダ特徴との正確な融合を実現する汎用的な情報保持スキップ接続(IRSC)を設計する。4つの主流な医用画像セグメンテーションデータセットでSegNetrの有効性を検証し、バニラU-Netと比べてパラメータ数を59%、GFLOPsを76%削減しながら、最先端手法に匹敵するセグメンテーション性能を達成した。特に、本論文で提案するコンポーネントは他のU字型ネットワークにも適用でき、そのセグメンテーション性能を向上させられる。

Recently, U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure. However, existing U-shaped segmentation networks: 1) mostly focus on designing complex self-attention modules to compensate for the lack of long-term dependence based on convolution operation, which increases the overall number of parameters and computational complexity of the network; 2) simply fuse the features of encoder and decoder, ignoring the connection between their spatial locations. In this paper, we rethink the above problem and build a lightweight medical image segmentation network, called SegNetr. Specifically, we introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity. At the same time, we design a general information retention skip connection (IRSC) to preserve the spatial location information of encoder features and achieve accurate fusion with the decoder features. We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59\% and 76\% fewer parameters and GFLOPs than vanilla U-Net, while achieving segmentation performance comparable to state-of-the-art methods. Notably, the components proposed in this paper can be applied to other U-shaped networks to improve their segmentation performance.

翻訳日:2023-07-07 14:04:47 公開日:2023-07-06

# ラベル効率の良い3D-to-2Dセグメンテーションのためのモード間再構成と特徴投影ネットワークによる自己教師あり学習

Self-supervised learning via inter-modal reconstruction and feature projection networks for label-efficient 3D-to-2D segmentation ( http://arxiv.org/abs/2307.03008v1 )

ライセンス: Link先を確認

José Morano, Guilherme Aresta, Dmitrii Lachinov, Julia Mai, Ursula Schmidt-Erfurth, Hrvoje Bogunović

(参考訳) 深層学習は、特定の医用画像セグメンテーションタスクを自動化し、医療専門家の作業負荷を大幅に軽減する貴重なツールとなっている。これらのタスクの一部では、入力次元の一部に対してセグメンテーションを行う必要があり、最も一般的なケースは3D-to-2Dである。しかし、転移学習のようなデータ効率の良い手法でこれらのタスクにおいて検証されたものが現時点で存在しないため、既存手法の性能は利用可能なラベル付きデータの量に強く左右される。本研究では、ラベル効率の良い3D-to-2Dセグメンテーションのための、新しい畳み込みニューラルネットワーク(CNN)と自己教師あり学習(SSL)手法を提案する。CNNは、新しい3D-to-2Dブロックで接続された3Dエンコーダと2Dデコーダから構成される。SSL手法は、次元の異なるモダリティの画像ペアを再構成することからなる。提案手法は、臨床的に重要な2つのタスク、すなわち光コヒーレンストモグラフィにおける地図状萎縮と網状偽ドルーゼンのen-faceセグメンテーションで検証された。異なるデータセットでの結果は、提案するCNNが、ラベル付きデータが限られたシナリオにおいて、Diceスコアで最大8%、最先端技術を大きく上回ることを示している。さらに、提案するSSL手法により、この性能を最大23%さらに改善でき、SSLがネットワークアーキテクチャによらず有益であることを示す。

Deep learning has become a valuable tool for the automation of certain medical image segmentation tasks, significantly relieving the workload of medical specialists. Some of these tasks require segmentation to be performed on a subset of the input dimensions, the most common case being 3D-to-2D. However, the performance of existing methods is strongly conditioned by the amount of labeled data available, as there is currently no data efficient method, e.g. transfer learning, that has been validated on these tasks. In this work, we propose a novel convolutional neural network (CNN) and self-supervised learning (SSL) method for label-efficient 3D-to-2D segmentation. The CNN is composed of a 3D encoder and a 2D decoder connected by novel 3D-to-2D blocks. The SSL method consists of reconstructing image pairs of modalities with different dimensionality. The approach has been validated in two tasks with clinical relevance: the en-face segmentation of geographic atrophy and reticular pseudodrusen in optical coherence tomography. Results on different datasets demonstrate that the proposed CNN significantly improves the state of the art in scenarios with limited labeled data by up to 8% in Dice score. Moreover, the proposed SSL method allows further improvement of this performance by up to 23%, and we show that the SSL is beneficial regardless of the network architecture.

翻訳日:2023-07-07 13:57:10 公開日:2023-07-06

# 解剖学的特徴と反復学習を用いた手ポーズ推定の自己教師あり最適化

Self-supervised Optimization of Hand Pose Estimation using Anatomical Features and Iterative Learning ( http://arxiv.org/abs/2307.03007v1 )

ライセンス: Link先を確認

Christian Jauch, Timo Leitritz, Marco F. Huber

(参考訳) 手作業による組立作業者は、作業の複雑さの増大に直面している。人間中心の支援システムは助けになりうるが、その基盤技術であるオブジェクト認識は、これらのシステムの洗練された人間中心設計を妨げている。同時に、手のポーズに基づく行動認識は、手袋の着用などの複雑な使用シナリオにおいて、ポーズ推定の精度の低さに悩まされる。本稿では、最小限の人的介入で手のポーズ推定を特定のユースケースに適応させる自己教師ありパイプラインを提案する。これにより、安価で頑健な手のポーズに基づく行動認識が可能になる。このパイプラインは、汎用的なデータセットで訓練された手のポーズ推定用の一般的な機械学習モデル、手の解剖学的制約を考慮した空間的・時間的フィルタリング、およびモデルを改善するための再学習ステップから構成される。異なるパラメータの組み合わせを、公開されているアノテーション付きデータセットで評価する。最良のパラメータとモデルの組み合わせを、手作業組立シナリオのラベルなし動画に適用する。パイプラインの有効性は、手作業組立シナリオにおいて下流タスクとして行動認識を学習させることで実証される。

Manual assembly workers face increasing complexity in their work. Human-centered assistance systems could help, but object recognition as an enabling technology hinders sophisticated human-centered design of these systems. At the same time, activity recognition based on hand poses suffers from poor pose estimation in complex usage scenarios, such as wearing gloves. This paper presents a self-supervised pipeline for adapting hand pose estimation to specific use cases with minimal human interaction. This enables cheap and robust hand pose-based activity recognition. The pipeline consists of a general machine learning model for hand pose estimation trained on a generalized dataset, spatial and temporal filtering to account for anatomical constraints of the hand, and a retraining step to improve the model. Different parameter combinations are evaluated on a publicly available and annotated dataset. The best parameter and model combination is then applied to unlabelled videos from a manual assembly scenario. The effectiveness of the pipeline is demonstrated by training an activity recognition model as a downstream task in the manual assembly scenario.

翻訳日:2023-07-07 13:56:47 公開日:2023-07-06

# Human-in-the-Loopシステムの効率性向上: 人間の専門家に人工の専門家を加える

Improving the Efficiency of Human-in-the-Loop Systems: Adding Artificial to Human Experts ( http://arxiv.org/abs/2307.03003v1 )

ライセンス: Link先を確認

Johannes Jakubik, Daniel Weber, Patrick Hemmer, Michael Vössing, Gerhard Satzger

(参考訳) 情報システムは人工知能(AI)と機械学習(ML)を活用して、膨大なデータから価値を生み出す。しかし、MLモデルは不完全であり、誤った分類を生成することができる。したがって、MLモデルへのHuman-in-the-loop(HITL)拡張は、分類が難しいインスタンスのヒューマンレビューを追加する。この研究は、難しいモデルの分類を扱うために人間の専門家を継続的に頼りにすることは、限られた資源を圧迫する人間の努力を強力に増加させると主張している。この問題に対処するために,人間専門家が以前にレビューした未知のクラスからデータインスタンスを分類することを学ぶ人工専門家を作成するハイブリッドシステムを提案する。我々のハイブリッドシステムは、未知のクラスからインスタンスを分類するのに適した人工専門家を評価し、自動的に割り当てる。時間が経つにつれ、人間の労力が減り、システムの効率が向上します。提案手法は,画像分類のベンチマークにおいて,従来のHITLシステムよりも優れていることを示す。

Information systems increasingly leverage artificial intelligence (AI) and machine learning (ML) to generate value from vast amounts of data. However, ML models are imperfect and can generate incorrect classifications. Hence, human-in-the-loop (HITL) extensions to ML models add a human review for instances that are difficult to classify. This study argues that continuously relying on human experts to handle difficult model classifications leads to a strong increase in human effort, which strains limited resources. To address this issue, we propose a hybrid system that creates artificial experts that learn to classify data instances from unknown classes previously reviewed by human experts. Our hybrid system assesses which artificial expert is suitable for classifying an instance from an unknown class and automatically assigns it. Over time, this reduces human effort and increases the efficiency of the system. Our experiments demonstrate that our approach outperforms traditional HITL systems for several benchmarks on image classification.

翻訳日:2023-07-07 13:56:32 公開日:2023-07-06

# Fourier-Net+: 効率的な3次元医用画像レジストレーションのための帯域制限表現の活用

Fourier-Net+: Leveraging Band-Limited Representation for Efficient 3D Medical Image Registration ( http://arxiv.org/abs/2307.02997v1 )

ライセンス: Link先を確認

Xi Jia, Alexander Thorley, Alberto Gomez, Wenqi Lu, Dipak Kotecha and Jinming Duan

(参考訳) U-Netスタイルのネットワークは、教師なし画像レジストレーションにおいて密な変位場を予測するために一般的に利用されるが、高解像度のボリューム画像データに対しては、これはリソース集約的で時間のかかるタスクである。この課題に取り組むために、まず、コストの高いU-Netスタイルの拡張パスをパラメータフリーのモデル駆動デコーダに置き換えるFourier-Netを提案する。Fourier-Netは、フル解像度の変位場を直接予測する代わりに、帯域制限されたフーリエ領域における変位場の低次元表現を学習し、モデル駆動デコーダがこれを空間領域のフル解像度の変位場に変換する。Fourier-Netを拡張してFourier-Net+を導入する。これは画像の帯域制限された空間表現も入力として受け取り、U-Netスタイルのネットワークの縮小パスにおける畳み込み層の数をさらに削減する。最後に、レジストレーション性能を向上させるために、Fourier-Net+のカスケード版を提案する。提案手法を3つのデータセットで評価したところ、Fourier-Netとその派生版は、現在の最先端手法と同等の結果を達成しつつ、より高速な推論、より小さいメモリフットプリント、より少ない積和演算を示した。このような小さな計算コストにより、Fourier-Net+は低VRAMのGPU上で大規模3Dレジストレーションの効率的な学習を可能にする。コードは https://github.com/xi-jia/Fourier-Net で公開されている。

U-Net style networks are commonly utilized in unsupervised image registration to predict dense displacement fields, which for high-resolution volumetric image data is a resource-intensive and time-consuming task. To tackle this challenge, we first propose Fourier-Net, which replaces the costly U-Net style expansive path with a parameter-free model-driven decoder. Instead of directly predicting a full-resolution displacement field, our Fourier-Net learns a low-dimensional representation of the displacement field in the band-limited Fourier domain, which our model-driven decoder converts to a full-resolution displacement field in the spatial domain. Expanding upon Fourier-Net, we then introduce Fourier-Net+, which additionally takes the band-limited spatial representation of the images as input and further reduces the number of convolutional layers in the U-Net style network's contracting path. Finally, to enhance the registration performance, we propose a cascaded version of Fourier-Net+. We evaluate our proposed methods on three datasets, on which our proposed Fourier-Net and its variants achieve comparable results with current state-of-the-art methods, while exhibiting faster inference speeds, lower memory footprint, and fewer multiply-add operations. With such small computational cost, our Fourier-Net+ enables the efficient training of large-scale 3D registration on low-VRAM GPUs. Our code is publicly available at https://github.com/xi-jia/Fourier-Net.
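
The idea of decoding a band-limited Fourier representation into a full-resolution field can be sketched in one dimension with the Python standard library. This sketch assumes that decoding amounts to zero-padding the retained low-frequency coefficients followed by an inverse DFT; the abstract does not spell out the decoder, and the helper names (`band_limit`, `zero_pad`) and the toy signal are illustrative only. In the paper a network predicts the low-dimensional coefficients, whereas here they are simply read off a known smooth signal.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def band_limit(X, keep):
    """Keep frequencies 0..keep and their negative counterparts -keep..-1."""
    n = len(X)
    return X[:keep + 1] + X[n - keep:]


def zero_pad(low, keep, n):
    """Place the kept coefficients back into a length-n spectrum, zeros elsewhere."""
    return low[:keep + 1] + [0j] * (n - 2 * keep - 1) + low[keep + 1:]


N, KEEP = 16, 2
# Toy 1D "displacement field" containing only frequencies 1 and 2.
field = [math.sin(2 * math.pi * t / N) + 0.5 * math.cos(4 * math.pi * t / N)
         for t in range(N)]

low = band_limit(dft(field), KEEP)          # 5 coefficients instead of 16
decoded = [v.real for v in idft(zero_pad(low, KEEP, N))]
max_error = max(abs(a - b) for a, b in zip(field, decoded))
```

Because the toy field contains no frequencies above `KEEP`, the five retained coefficients reconstruct it up to floating-point error; a field with higher-frequency content would be recovered only approximately.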

翻訳日:2023-07-07 13:56:16 公開日:2023-07-06

# 非エルミート系における生体直交動的量子相転移

Biorthogonal dynamical quantum phase transitions in non-Hermitian systems ( http://arxiv.org/abs/2307.02993v1 )

ライセンス: Link先を確認

Yecheng Jing, Jian-Jun Dong, Yu-Yu Zhang, and Zi-Xiang Hu

(参考訳) 双直交基底を用いて、非エルミート系における双直交動的量子相転移の完全な枠組みを構築する。これまで見過ごされてきた随伴状態(associated state)を用いて、自動的に規格化される双直交Loschmidtエコーを定義する。このアプローチは、複素固有値を持つ任意の非エルミート系を扱うことができ、双直交基底を用いない場合に得られるLoschmidtレートの負の値を自然に取り除く。非エルミートSu-Schrieffer-Heegerモデルを具体例として、双直交動的トポロジカル秩序パラメータに、従来の動的量子相転移の範囲を超える特異な $1/2$ の変化が観測される。また、双直交動的量子相転移の周期性は、臨界運動量における二準位サブシステムが振動するか定常状態に達するかに依存することも分かる。

By using biorthogonal bases, we construct a complete framework for biorthogonal dynamical quantum phase transitions in non-Hermitian systems. With the help of the associated state, which was previously overlooked, we define the automatically normalized biorthogonal Loschmidt echo. This approach is capable of handling arbitrary non-Hermitian systems with complex eigenvalues, and it naturally eliminates the negative values of the Loschmidt rate obtained without the biorthogonal bases. Taking the non-Hermitian Su-Schrieffer-Heeger model as a concrete example, we observe a peculiar $1/2$ change in the biorthogonal dynamical topological order parameter, which is beyond the traditional dynamical quantum phase transitions. We also find that the periodicity of biorthogonal dynamical quantum phase transitions depends on whether the two-level subsystem at the critical momentum oscillates or reaches a steady state.

翻訳日:2023-07-07 13:55:50 公開日:2023-07-06

# ContainerGym: リソース割り当てのための実世界の強化学習ベンチマーク

ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation ( http://arxiv.org/abs/2307.02991v1 )

ライセンス: Link先を確認

Abhijeet Pendyala, Justin Dettmer, Tobias Glasmachers, Asma Atamna

(参考訳) 本稿では,実世界の産業資源配分タスクに触発された強化学習のベンチマークであるContainerGymを紹介する。提案したベンチマークは、不確実性など、現実のシーケンシャルな意思決定問題でよく遭遇する様々な課題をエンコードする。これは、例えば、可変次元の観点で、様々な難易度の問題をインスタンス化するように構成することができる。我々のベンチマークは他の強化学習ベンチマークと異なり、実世界の難易度をエンコードすることを目的としており、それは最小限の単純化と合理化を行った実世界の産業問題から直接導かれるものである。リソース割り当てフレームワークに適合する実世界の問題に対して、強化学習アルゴリズムを評価するのに十分便利です。標準ベースライン方式の結果を提供する。通常のトレーニング報酬曲線を超えて、我々の結果とそれらの解釈に使用される統計ツールは、よく知られた深層強化学習アルゴリズム(PPO、TRPO、DQN)の興味深い制限を強調します。

We present ContainerGym, a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task. The proposed benchmark encodes a range of challenges commonly encountered in real-world sequential decision making problems, such as uncertainty. It can be configured to instantiate problems of varying degrees of difficulty, e.g., in terms of variable dimensionality. Our benchmark differs from other reinforcement learning benchmarks, including the ones aiming to encode real-world difficulties, in that it is directly derived from a real-world industrial problem, which underwent minimal simplification and streamlining. It is sufficiently versatile to evaluate reinforcement learning algorithms on any real-world problem that fits our resource allocation framework. We provide results of standard baseline methods. Going beyond the usual training reward curves, our results and the statistical tools used to interpret them allow us to highlight interesting limitations of well-known deep reinforcement learning algorithms, namely PPO, TRPO and DQN.

翻訳日:2023-07-07 13:55:35 公開日:2023-07-06

# 講義ノート:ランダムユニタリ回路の導入と測定誘起絡み合い相転移

Lecture Notes: Introduction to random unitary circuits and the measurement-induced entanglement phase transition ( http://arxiv.org/abs/2307.02986v1 )

ライセンス: Link先を確認

Brian Skinner

(参考訳) これはミネソタ大学の2023年Condensed Matter Summer Schoolでの短い講義シリーズのためにまとめられた講義ノートである。会話調で楽しく読めるように作られており、物事を正確に述べ、文献を徹底的に引用するという真剣な仕事をするレビュー論文に取って代わるものではない。このノートの目的は、ランダムユニタリ回路におけるエンタングルメントダイナミクスと測定誘起エンタングルメント相転移の研究の基礎となるいくつかの中心的なアイデアを紹介することである。特に焦点を当てるのは、「最小カット」の概念と、それに関連する統計力学へのマッピングである。

These are lecture notes compiled for a short lecture series at the 2023 Condensed Matter Summer School at the University of Minnesota. They are designed to be conversational and fun, and not to take the place of review articles that do a serious job of stating things precisely and citing literature thoroughly. The goal of the notes is to introduce some central ideas underlying the study of entanglement dynamics in random unitary circuits and the measurement-induced entanglement phase transition. A particular focus is the concept of the "minimal cut" and its associated stat mech mappings.

翻訳日:2023-07-07 13:55:17 公開日:2023-07-06

# 3次元フェルミオン物質中の渦ループダイナミクスと動的量子相転移

Vortex loop dynamics and dynamical quantum phase transitions in 3D fermion matter ( http://arxiv.org/abs/2307.02985v1 )

ライセンス: Link先を確認

Arkadiusz Kosior, Markus Heyl

(参考訳) 本研究では、3次元の一般的な相互作用のないフェルミオン格子モデルにおいて、瞬時クエンチ後のグリーン関数の位相に現れる渦特異点の振る舞いを調べる。渦の全体が1次元の動的な対象を形成することを見いだし、これを渦ループと呼ぶ。このような渦ループの数は、異なる非平衡相を区別する量子化された秩序パラメータと解釈できる。この秩序パラメータの変化が動的量子相転移(DQPT)と関係していることを示す。この結果は3次元の一般的な格子モデルに適用できる。具体例として、単純な2バンドのワイル半金属の文脈で結果を示す。また、渦ループが弱く相互作用する系でも存続することを示す。最後に、バンドが接触するワイルノードの存在により、渦ループが運動量空間で複雑な動的パターンを形成しうることを観察する。本研究の知見は、非平衡系における動的秩序パラメータの定義を発展させるうえで有益な洞察を与える。

In this study, we investigate the behavior of vortex singularities in the phase of the Green's function of a general non-interacting fermionic lattice model in three dimensions after an instantaneous quench. We find that the full set of vortices form one-dimensional dynamical objects, which we call vortex loops. The number of such vortex loops can be interpreted as a quantized order parameter that distinguishes between different non-equilibrium phases. We show that changes in this order parameter are related to dynamical quantum phase transitions (DQPTs). Our results are applicable to general lattice models in three dimensions. For concreteness, we present them in the context of a simple two-band Weyl semimetal. We also show that the vortex loops survive in weakly interacting systems. Finally, we observe that vortex loops can form complex dynamical patterns in momentum space due to the existence of band touching Weyl nodes. Our findings provide valuable insights for developing definitions of dynamical order parameters in non-equilibrium systems.

翻訳日:2023-07-07 13:55:06 公開日:2023-07-06

# 医療用生成モデルの潜在空間におけるプライバシ保護ウォーク

A Privacy-Preserving Walk in the Latent Space of Generative Models for Medical Applications ( http://arxiv.org/abs/2307.02984v1 )

ライセンス: Link先を確認

Matteo Pennisi, Federica Proietto Salanitri, Giovanni Bellitto, Simone Palazzo, Ulas Bagci, Concetto Spampinato

(参考訳) GAN(Generative Adversarial Networks)は、目標分布に一致する合成サンプルを生成する能力を示してきた。しかし、プライバシーの観点からは、GANをデータ共有の代替手段として使うことは安全な解決策ではない。GANは実サンプルのほぼ複製を潜在空間に埋め込む傾向があるためである。k-匿名性の原理に着想を得た最近の研究は、潜在空間でのサンプル集約によってこの問題に対処しているが、データセットがk分の1に減るという欠点がある。本研究は、プライバシーの懸念に原理的に対処しながら、深層モデルの効果的な学習を支えうる多様な合成サンプルを生成できる潜在空間ナビゲーション戦略を提案することで、この問題の緩和を目指す。提案手法は、補助的なアイデンティティ分類器をガイドとして利用して潜在空間内の点の間を非線形に歩き、実サンプルのほぼ複製と衝突するリスクを最小化する。潜在空間内の任意のランダムな点のペアに対して、我々の歩行戦略が線形補間よりも安全であることを実験的に示す。次に、k-same法と組み合わせた経路探索戦略を検証し、結核と糖尿病網膜症の分類に関する2つのベンチマークにおいて、本手法で生成したサンプルを用いてモデルを学習すると、プライバシー保護を維持しつつ性能低下が緩和されることを示す。

Generative Adversarial Networks (GANs) have demonstrated their ability to generate synthetic samples that match a target distribution. However, from a privacy perspective, using GANs as a proxy for data sharing is not a safe solution, as they tend to embed near-duplicates of real samples in the latent space. Recent works, inspired by k-anonymity principles, address this issue through sample aggregation in the latent space, with the drawback of reducing the dataset by a factor of k. Our work aims to mitigate this problem by proposing a latent space navigation strategy able to generate diverse synthetic samples that may support effective training of deep models, while addressing privacy concerns in a principled way. Our approach leverages an auxiliary identity classifier as a guide to non-linearly walk between points in the latent space, minimizing the risk of collision with near-duplicates of real samples. We empirically demonstrate that, given any random pair of points in the latent space, our walking strategy is safer than linear interpolation. We then test our path-finding strategy combined with k-same methods and demonstrate, on two benchmarks for tuberculosis and diabetic retinopathy classification, that training a model using samples generated by our approach mitigates drops in performance while preserving privacy.

翻訳日:2023-07-07 13:54:54 公開日:2023-07-06

# 効率的な半環重み付きEarley構文解析

Efficient Semiring-Weighted Earley Parsing ( http://arxiv.org/abs/2307.02982v1 )

ライセンス: Link先を確認

Andreas Opedal, Ran Zmigrod, Tim Vieira, Ryan Cotterell, Jason Eisner

(参考訳) 本稿では、様々な高速化を施したEarley (1970) の文脈自由構文解析アルゴリズムについて、演繹システムの形での参照記述を提供する。本稿の記述には、既知の最悪実行時間の改善が含まれる。すなわち、自然言語処理で現れる大規模な文法では実用にならないEarleyの $O (N^3|G||R|)$ から、二値化した文法 $G$ 上でのCKYの実行時間に一致する $O (N^3|G|)$ への改善である。ここで $N$ は文の長さ、$|R|$ は $G$ の生成規則の数、$|G|$ はそれらの生成規則の長さの総和である。また、文法が単一の有限状態オートマトン $M$ としてコンパクトに表現される場合に、$|M| \leq |G|$ として $O (N^3|M|)$ の実行時間を達成するバージョンも提供する(これは部分的に新規である)。半環重み付き演繹への一般化を注意深く扱い、Stolcke (1995) と同様に文法を前処理して演繹のサイクルを除去し、さらに文の接頭辞の重みを計算するStolckeの手法を一般化する。効率的な実行のための実装の詳細も提供し、前処理済みの文法上で、本手法の半環重み付きバージョンが、一部の文法での準3乗(sub-cubic)の実行時間を含め、重みなしの手法と同じ漸近的な実行時間および空間要件を持つことを保証する。

This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups. Our presentation includes a known worst-case runtime improvement from Earley's $O (N^3|G||R|)$, which is unworkable for the large grammars that arise in natural language processing, to $O (N^3|G|)$, which matches the runtime of CKY on a binarized version of the grammar $G$. Here $N$ is the length of the sentence, $|R|$ is the number of productions in $G$, and $|G|$ is the total length of those productions. We also provide a version that achieves runtime of $O (N^3|M|)$ with $|M| \leq |G|$ when the grammar is represented compactly as a single finite-state automaton $M$ (this is partly novel). We carefully treat the generalization to semiring-weighted deduction, preprocessing the grammar like Stolcke (1995) to eliminate deduction cycles, and further generalize Stolcke's method to compute the weights of sentence prefixes. We also provide implementation details for efficient execution, ensuring that on a preprocessed grammar, the semiring-weighted versions of our methods have the same asymptotic runtime and space requirements as the unweighted methods, including sub-cubic runtime on some grammars.

翻訳日:2023-07-07 13:54:28 公開日:2023-07-06

# Chamfer距離のための準線形時間アルゴリズム

A Near-Linear Time Algorithm for the Chamfer Distance ( http://arxiv.org/abs/2307.03043v1 )

ライセンス: Link先を確認

Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, Erik Waingarten

(参考訳) サイズが最大 $n$ の任意の2つの点集合 $A,B \subset \mathbb{R}^d$ に対して、$A$ から $B$ へのChamfer距離は $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$ と定義される。ここで $d_X$ は基礎となる距離尺度(例えばユークリッド距離やマンハッタン距離)である。Chamfer距離は点群間の非類似度の一般的な尺度であり、機械学習、コンピュータビジョン、グラフィックスの多くの応用で使われ、単純な $O(d n^2)$ 時間の総当たりアルゴリズムで計算できる。さらに、Chamfer距離は、より計算負荷の高いEarth-Mover(最適輸送)距離の代用としてしばしば用いられる。しかし、実行時間が $n$ に\emph{2次}で依存するため、大規模なデータセットでは素朴な手法は現実的でない。我々はこのボトルネックを克服し、準線形の実行時間でChamfer距離を推定する最初の $(1+\epsilon)$-近似アルゴリズムを提示する。具体的には、我々のアルゴリズムは $O(nd \log (n)/\varepsilon^2)$ 時間で動作し、実装可能である。実験により、大規模な高次元データセットで正確かつ高速であることを示す。我々のアルゴリズムは、大規模な高次元点群を解析するための新たな道を開くと考えている。また、目的が(値だけでなく)$A$ から $B$ への $(1+\varepsilon)$-近似写像を\emph{報告}することである場合、2次未満の時間のアルゴリズムは存在しそうにないという証拠も与える。

For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, and graphics applications, and admits a straightforward $O(d n^2)$-time brute force algorithm. Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth-Mover (Optimal Transport) Distance. However, the \emph{quadratic} dependence on $n$ in the running time makes the naive approach intractable for large datasets. We overcome this bottleneck and present the first $(1+\epsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time $O(nd \log (n)/\varepsilon^2)$ and is implementable. Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large high-dimensional point clouds. We also give evidence that if the goal is to \emph{report} a $(1+\varepsilon)$-approximate mapping from $A$ to $B$ (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.
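
The definition of $\text{CH}(A,B)$ and the $O(d n^2)$ brute-force baseline mentioned in the abstract can be written down directly. The Python sketch below implements only that naive baseline, not the near-linear approximation algorithm of the paper; the function name `chamfer_distance` and the toy point sets are illustrative assumptions.

```python
import math


def chamfer_distance(A, B, dist=math.dist):
    """Brute-force Chamfer distance from A to B:
    the sum over a in A of the distance from a to its nearest point in B.
    Runs in O(d * |A| * |B|) time for d-dimensional points."""
    return sum(min(dist(a, b) for b in B) for a in A)


def manhattan(p, q):
    """Manhattan (L1) distance, one of the measures d_X named in the abstract."""
    return sum(abs(x - y) for x, y in zip(p, q))


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (3.0, 0.0)]

ch_ab = chamfer_distance(A, B)             # Euclidean: 1 + sqrt(2)
ch_ba = chamfer_distance(B, A)             # Euclidean: 1 + 2 = 3
ch_ab_l1 = chamfer_distance(A, B, dist=manhattan)
```

The two directions give different values on this example, which reflects that the Chamfer distance as defined here is not symmetric in its arguments.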

翻訳日:2023-07-07 13:48:49 公開日:2023-07-06

# 臨床領域向けLLaMAのパラメータ効率の良いファインチューニング

Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain ( http://arxiv.org/abs/2307.03042v1 )

ライセンス: Link先を確認

Aryo Gema, Luke Daines, Pasquale Minervini, Beatrice Alex

(参考訳) 臨床応用のような新しい領域に事前訓練された言語モデルを適応させるには、従来、パラメータ全体を再訓練する必要があった。しかし、このような大規模言語モデルの訓練には相当な計算資源が必要となるため、このアプローチは実用的でないことがますます明らかになっている。この問題に対処するために、パラメータ効率の良いファインチューニング(PEFT)技術は、追加パラメータの小さなサブセットのみを選択的に微調整することで、ドメイン適応に必要な計算量を大幅に削減する実行可能な解決策を提供する。本研究では,オープンソースのLLaMAモデル上に構築したPEFTアダプタ層であるClinical LLaMA-LoRAを提案する。Clinical LLaMA-LoRAはMIMIC-IVデータベースから得られた臨床ノートを用いて訓練され、臨床領域向けに特化したアダプタとなる。さらに,Clinical LLaMA-LoRAを、下流タスクに特化した別のPEFTアダプタであるDownstream LLaMA-LoRAと融合する2段階のPEFTフレームワークを提案する。複数の臨床結果予測データセットでこのフレームワークを評価し,臨床データで訓練された言語モデルと比較した。提案フレームワークは,すべての臨床下流タスクの平均で最先端のAUROCスコアを達成する。診断や処置の分類などの大規模マルチラベル分類タスクにおいて,AUROCスコアで6-9%の大幅な改善が観察された。

Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. However, this approach is increasingly proven to be impractical owing to the substantial computational requirements associated with training such large language models. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) techniques offer a viable solution by selectively fine-tuning a small subset of additional parameters, significantly reducing the computational requirements for domain adaptation. In this study, we propose Clinical LLaMA-LoRA, a PEFT adapter layer built upon the open-sourced LLaMA model. Clinical LLaMA-LoRA is trained using clinical notes obtained from the MIMIC-IV database, thereby creating a specialised adapter designed for the clinical domain. Additionally, we propose a two-step PEFT framework which fuses Clinical LLaMA-LoRA with Downstream LLaMA-LoRA, another PEFT adapter specialised for downstream tasks. We evaluate this framework on multiple clinical outcome prediction datasets, comparing it to clinically trained language models. Our proposed framework achieves a state-of-the-art AUROC score averaged across all clinical downstream tasks. We observe substantial improvements of 6-9% AUROC score in the large-scale multilabel classification tasks, such as diagnoses and procedures classification.

翻訳日:2023-07-07 13:48:17 公開日:2023-07-06

# 視覚トランスフォーマによるアート認証

Art Authentication with Vision Transformers ( http://arxiv.org/abs/2307.03039v1 )

ライセンス: Link先を確認

Ludovica Schaerf, Carina Popovici, Eric Postma

(参考訳) 近年、もともと言語用に開発されたTransformerが視覚タスクにうまく適用されている。Vision Transformerは、画像分類、オブジェクト検出、セマンティックセグメンテーションなど、幅広いタスクで最先端の性能を押し上げることが示されている。畳み込みニューラルネットワークを用いた作者帰属(アートアトリビューション)やアート認証のタスクでは多くの研究が有望な結果を示しているが、本稿では、Vision Transformerの優位性がアート認証にも及び、コンピュータによる美術品認証の信頼性を向上させるかどうかを検証する。注意深く編纂されたヴィンセント・ファン・ゴッホ(Vincent van Gogh)の真作絵画のデータセットと2つのコントラストデータセットを用いて、Swin TransformerとEfficientNetのアート認証性能を比較した。模倣作とプロキシ(ファン・ゴッホと密接に関連するスタイルを持つ画家による作品)を含む標準的なコントラストセットでは、EfficientNetが全体として最高の性能を達成する。模倣作のみで構成されたコントラストセットでは、Swin Transformerが85%を超える認証精度を達成し、EfficientNetよりも優れていることが分かった。これらの結果から,Vision Transformerは,特にコンピュータによる芸術的模倣の検出能力の向上において,アート認証における強力かつ有望な候補である,という結論に至った。

In recent years, Transformers, initially developed for language, have been successfully applied to visual tasks. Vision Transformers have been shown to push the state-of-the-art in a wide range of tasks, including image classification, object detection, and semantic segmentation. While ample research has shown promising results in art attribution and art authentication tasks using Convolutional Neural Networks, this paper examines if the superiority of Vision Transformers extends to art authentication, improving, thus, the reliability of computer-based authentication of artworks. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, we compare the art authentication performances of Swin Transformers with those of EfficientNet. Using a standard contrast set containing imitations and proxies (works by painters with styles closely related to van Gogh), we find that EfficientNet achieves the best performance overall. With a contrast set that only consists of imitations, we find the Swin Transformer to be superior to EfficientNet by achieving an authentication accuracy of over 85%. These results lead us to conclude that Vision Transformers represent a strong and promising contender in art authentication, particularly in enhancing the computer-based ability to detect artistic imitations.

翻訳日:2023-07-07 13:47:54 公開日:2023-07-06

# 一般観測モデルを持つレストレスバンディットのPCLインデックス可能性とWhittleインデックス

PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models ( http://arxiv.org/abs/2307.03034v1 )

ライセンス: Link先を確認

Keqin Liu and Chengzhong Zhang

(参考訳) 本稿では,レストレス多腕バンディット問題に対する一般的な観測モデルについて検討する。プレイヤーの操作は、リソースの制約や環境ノイズ、内在的なノイズのために誤りを含みやすいフィードバック機構に基づく必要がある。フィードバック・観測のダイナミクスの一般的な確率モデルを確立することにより、任意の初期信念(事前情報)から始まる可算な信念状態空間を持つレストレスバンディットとして問題を定式化する。部分保存則(PCL)を用いた達成可能領域法を無限状態問題に適用し,そのインデックス可能性と優先度インデックス(Whittleインデックス)を分析する。最後に、有限状態問題に対するNi\~no-MoraとBertsimasのAGアルゴリズムを適用できる問題へと変換する近似法を提案する。数値実験により,このアルゴリズムが優れた性能を持つことを示す。

In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on a certain feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Ni\~no-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm has excellent performance.

翻訳日:2023-07-07 13:47:25 公開日:2023-07-06

# データ重要度学習による検索拡張大規模言語モデルの改善

Improving Retrieval-Augmented Large Language Models via Data Importance Learning ( http://arxiv.org/abs/2307.03027v1 )

ライセンス: Link先を確認

Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang

(参考訳) 検索拡張(Retrieval Augmentation)は、例えば質問応答やデータ補完といったタスクにおいて、大規模言語モデルが外部の知識を活用できるようにする。しかし,このような検索拡張モデルの性能は,基盤となる検索コーパスのデータ品質によって制限される。本稿では,検索されたデータポイントのデータ重要度を評価するための,多重線形拡張に基づくアルゴリズムを提案する。多重線形拡張には指数関数的に多くの項があり、本論文の重要な貢献の一つは、加法的な効用関数を持つ検索拡張モデルと検証セットが与えられたとき、モデルの効用関数の多重線形拡張を用いて検索コーパス内のデータポイントのデータ重要度を正確に計算する多項式時間アルゴリズムである。さらに,より効率的な({\epsilon}, {\delta})近似アルゴリズムも提案する。実験結果から,追加の訓練を必要とせず、検索コーパスのプルーニングや再重み付けを行うだけで大規模言語モデルの性能を向上できることがわかった。一部のタスクでは、検索エンジンAPIで拡張した小さなモデル(例えばGPT-JT)が、(検索拡張なしの)GPT-3.5を上回ることさえある。さらに,多重線形拡張に基づく重みは,実際に効率的に計算できることを示す(例えば,1億要素のコーパスに対して10分未満)。

Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of retrieved data points. There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial time algorithm that computes exactly, given a retrieval-augmented model with an additive utility function and a validation set, the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function. We further proposed an even more efficient ({\epsilon}, {\delta})-approximation algorithm. Our experimental results illustrate that we can enhance the performance of large language models by only pruning or reweighting the retrieval corpus, without requiring further training. For some tasks, this even allows a small model (e.g., GPT-JT), augmented with a search engine API, to outperform GPT-3.5 (without retrieval augmentation). Moreover, we show that weights based on multilinear extension can be computed efficiently in practice (e.g., in less than ten minutes for a corpus with 100 million elements).

翻訳日:2023-07-07 13:47:09 公開日:2023-07-06

# Style Over Substance: 大規模言語モデルに対する評価バイアス

Style Over Substance: Evaluation Biases for Large Language Models ( http://arxiv.org/abs/2307.03025v1 )

ライセンス: Link先を確認

Minghao Wu, Alham Fikri Aji

(参考訳) 大規模言語モデル(LLM)が進歩を続けるにつれ、その性能を正確かつ包括的に評価することはますます困難になっている。従来、人間による評価は自然言語生成におけるゴールドスタンダードとみなされてきた。近年では、評価過程における人間の判定者の代理として最先端のLLMが取り入れられている。それでも、人間とLLMがどの程度有能な評価者であるかは、まだ不明である。本研究では,クラウドソーシングによる人間の判定者とLLMベースの判定者が、異なるモデルの出力を比較する際の行動を調査する。そのために、意図的に欠陥を含めた機械生成回答からなるデータセットを構築する。その結果, 事実誤りの方が潜在的に大きな危険をもたらすにもかかわらず, 事実誤りを含む回答は, 短すぎる回答や文法的誤りを含む回答に比べ, 依然として好意的に評価されていた。これは評価プロセスにおける懸念すべきバイアスを浮き彫りにしている。この問題に対処するために,すべての評価側面を1つのスコアに統合するのではなく,機械生成テキストを複数の次元で独立に評価することを提案する。このアイデアをEloレーティングシステムで具体化し,Multi-Eloレーティングシステム(Multi-Elo Rating System)を得た。実験結果から,提案手法はLLMベースの評価の品質を、特に事実の正確性に関して著しく向上させることが明らかとなった。しかし、クラウドソーシングによる評価では顕著な改善は見られず、さらなる調査と改良の必要性が示唆される。

As large language models (LLMs) continue to advance, accurately and comprehensively evaluating their performance becomes increasingly challenging. Conventionally, human evaluations are considered the gold standard in natural language generation. Recent advancements incorporate state-of-the-art LLMs as proxies for human judges in evaluation processes. Nonetheless, the extent to which humans and LLMs are capable evaluators remains uncertain. This study aims to investigate the behavior of both crowd-sourced human and LLM-based judges when comparing outputs from different models. To accomplish this, we curate a dataset comprising intentionally flawed machine-generated answers. Our findings indicate that despite the potentially greater danger posed by factual errors, answers with factual errors were still rated more favorably compared to answers that were too short or contained grammatical errors. This highlights a concerning bias in the evaluation process. To address this issue, we propose to independently evaluate machine-generated text across multiple dimensions, rather than merging all the evaluation aspects into a single score. We instantiate this idea with the Elo rating system, resulting in the Multi-Elo Rating System. Empirical results from our study reveal that this proposed approach significantly enhances the quality of LLM-based evaluations, particularly in terms of factual accuracy. However, notable improvement is not observed in crowd-sourced-based evaluations, suggesting the need for further investigation and refinement.
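
To make the idea of rating each evaluation dimension separately more concrete, here is a minimal sketch that keeps one standard Elo table per dimension and updates them independently after a pairwise comparison. The dimension names, the model names, the starting rating of 1000 and the K-factor of 32 are illustrative assumptions, not values taken from the paper.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a, r_b, score_a, k=32.0):
    """Return updated (r_a, r_b); score_a is 1.0 (win), 0.5 (tie) or 0.0 (loss)."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def update_multi_elo(ratings, model_a, model_b, outcomes, k=32.0):
    """Update one independent Elo table per evaluation dimension.

    ratings:  {dimension: {model_name: rating}}
    outcomes: {dimension: score of model_a against model_b on that dimension}
    """
    for dim, score_a in outcomes.items():
        table = ratings[dim]
        table[model_a], table[model_b] = update_elo(
            table[model_a], table[model_b], score_a, k
        )


# Hypothetical dimensions and models, chosen only for illustration.
DIMENSIONS = ("accuracy", "helpfulness", "language")
ratings = {d: {"model_x": 1000.0, "model_y": 1000.0} for d in DIMENSIONS}

# One judged comparison: model_x is fluent and helpful but factually wrong.
update_multi_elo(
    ratings, "model_x", "model_y",
    {"accuracy": 0.0, "helpfulness": 1.0, "language": 0.5},
)
print(ratings)
```

Because each dimension has its own table, a factually wrong but fluent answer lowers the accuracy rating without being masked by gains on the other dimensions, which is the separation the abstract argues for.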

翻訳日:2023-07-07 13:46:47 公開日:2023-07-06

# 光磁気双極子遷移の強いパーセル増強

Strong Purcell enhancement of an optical magnetic dipole transition ( http://arxiv.org/abs/2307.03022v1 )

ライセンス: Link先を確認

Sebastian P. Horvath, Christopher M. Phenicie, Salim Ourari, Mehmet T. Uysal, Songtao Chen, {\L}ukasz Dusanowski, Mouktik Raha, Paul Stevenson, Adam T. Turflinger, Robert J. Cava, Nathalie P. de Leon, and Jeff D. Thompson

(参考訳) ナノフォトニック構造を用いた局所状態密度のエンジニアリングは、パーセル効果を介して光-物質相互作用を制御する強力なツールである。光周波数では、電場の状態密度の制御は通常、電気双極子遷移に結合しそれを操作するために使用される。しかし、磁気双極子遷移を制御するために磁気的な状態密度を設計することもできる。本研究では, ナノフォトニックキャビティに結合した単一の希土類イオンを用いて、光磁気パーセル効果を実験的に実証した。我々は、新しい単一光子エミッタであるMgO中のEr$^{3+}$を設計した。ここでは電気双極子崩壊速度が立方晶サイト対称性によって強く抑制され、ほぼ純粋な磁気双極子光遷移が生じる。これにより、磁気パーセル因子 $P_m=1040 \pm 30$ を曖昧さなく決定できる。さらに、この技術を拡張して磁気双極子スピン-光子界面を実現し、単一Er$^{3+}$電子スピンの光スピン初期化と読み出しを行う。この研究は、電気的および磁気的な状態密度エンジニアリングの基本的な等価性を実証し、より広いクラスのエミッタに対する光-物質相互作用を制御するための新しいツールを提供する。

Engineering the local density of states with nanophotonic structures is a powerful tool to control light-matter interactions via the Purcell effect. At optical frequencies, control over the electric field density of states is typically used to couple to and manipulate electric dipole transitions. However, it is also possible to engineer the magnetic density of states to control magnetic dipole transitions. In this work, we experimentally demonstrate the optical magnetic Purcell effect using a single rare earth ion coupled to a nanophotonic cavity. We engineer a new single photon emitter, Er$^{3+}$ in MgO, where the electric dipole decay rate is strongly suppressed by the cubic site symmetry, giving rise to a nearly pure magnetic dipole optical transition. This allows the unambiguous determination of a magnetic Purcell factor $P_m=1040 \pm 30$. We further extend this technique to realize a magnetic dipole spin-photon interface, performing optical spin initialization and readout of a single Er$^{3+}$ electron spin. This work demonstrates the fundamental equivalence of electric and magnetic density of states engineering, and provides a new tool for controlling light-matter interactions for a broader class of emitters.

翻訳日:2023-07-07 13:46:26 公開日:2023-07-06

# EffLiFe: 階層的スパース勾配降下法による効率的なライトフィールド生成

EffLiFe: Efficient Light Field Generation via Hierarchical Sparse Gradient Descent ( http://arxiv.org/abs/2307.03017v1 )

ライセンス: Link先を確認

Yijie Deng, Lei Han, Tianpeng Lin, Lin Li, Jinzhi Zhang, and Lu Fang

(参考訳) 拡張現実(XR)技術の台頭に伴い、スパースなビュー入力からのリアルタイムなライトフィールド生成の必要性が高まっている。既存の手法は、高品質な新規ビューを生成できるが推論や訓練に長い時間がかかるオフライン手法と、汎化性に欠けるか満足のいかない結果を生み出すオンライン手法に分類できる。しかし,我々は、Multi-plane Images (MPI) に内在するスパースな多様体が、レンダリング品質を維持しつつライトフィールド生成を大幅に高速化できることを見出した。この知見に基づいて,提案する階層的スパース勾配降下法 (Hierarchical Sparse Gradient Descent, HSGD) を利用して,スパースなビュー画像から高品質なライトフィールドをリアルタイムで生成する新しいライトフィールド最適化手法EffLiFeを紹介する。技術的には、まず3D CNNを使用してシーンの粗いMPIを生成し、次に重要なMPI勾配のみに注目して数回の反復でスパースに最適化する。しかし、最適化のみに依存すると、オクルージョン境界でアーティファクトが生じる可能性がある。そこで本研究では,入力を反復的にフィルタリングすることで,遮蔽領域の視覚的アーティファクトを除去するオクルージョン対応の反復的リファインメントモジュールを提案する。大規模な実験により,提案手法は最先端のオフライン手法と同等の視覚的品質を達成しつつ平均で100倍高速であり,他のオンライン手法と比べて優れた性能(PSNRで約2dB高い)を示すことが実証された。

With the rise of Extended Reality (XR) technology, there is a growing need for real-time light field generation from sparse view inputs. Existing methods can be classified into offline techniques, which can generate high-quality novel views but at the cost of long inference/training time, and online methods, which either lack generalizability or produce unsatisfactory results. However, we have observed that the intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field generation while maintaining rendering quality. Based on this insight, we introduce EffLiFe, a novel light field optimization method, which leverages the proposed Hierarchical Sparse Gradient Descent (HSGD) to produce high-quality light fields from sparse view images in real time. Technically, the coarse MPI of a scene is first generated using a 3D CNN, and it is further sparsely optimized by focusing only on important MPI gradients in a few iterations. Nevertheless, relying solely on optimization can lead to artifacts at occlusion boundaries. Therefore, we propose an occlusion-aware iterative refinement module that removes visual artifacts in occluded regions by iteratively filtering the input. Extensive experiments demonstrate that our method achieves comparable visual quality while being 100x faster on average than state-of-the-art offline methods and delivering better performance (about 2 dB higher in PSNR) compared to other online approaches.

翻訳日:2023-07-07 13:46:05 公開日:2023-07-06

# スケーラブルな動的障害物回避のための逐次ニューラルネットワークバリア

Sequential Neural Barriers for Scalable Dynamic Obstacle Avoidance ( http://arxiv.org/abs/2307.03015v1 )

ライセンス: Link先を確認

Hongzhan Yu, Chiaki Hirayama, Chenning Yu, Sylvia Herbert, Sicun Gao

(参考訳) 動的障害物の周囲でのロボットナビゲーションをスケールアップするには、2つの大きな課題がある。障害物の複雑な相互作用のダイナミクスは解析的にモデル化することが難しく、計画と制御の複雑さは障害物の数に対して指数関数的に増加する。したがって、この文脈では、データ駆動および学習に基づく手法が特に有用である。しかし、データ駆動手法は分布のドリフトに敏感であり、異なる障害物密度にわたって学習モデルを訓練し一般化することは難しい。本稿では,スケーラビリティを実現するために、Sequential Neural Control Barrierモデル(SNCBF)の構成的学習のための新しい手法を提案する。我々のアプローチは、複数の動的障害物の空間的相互作用パターンは、各障害物の状態の時系列を通じて分解し予測できる、という重要な観察を利用する。この分解により、少数の障害物のみで訓練された制御ポリシーを、障害物密度が100倍高くなりうる環境に一般化することができる。提案手法は, ポテンシャル場, エンド・ツー・エンド強化学習, モデル予測制御など既存の手法と比較して, 動的衝突回避の改善に有効であることを示す。また,ハードウェア実験を行い,補足映像でアプローチの実用的な有効性を示す。

There are two major challenges for scaling up robot navigation around dynamic obstacles: the complex interaction dynamics of the obstacles can be hard to model analytically, and the complexity of planning and control grows exponentially in the number of obstacles. Data-driven and learning-based methods are thus particularly valuable in this context. However, data-driven methods are sensitive to distribution drift, making it hard to train and generalize learned models across different obstacle densities. We propose a novel method for compositional learning of Sequential Neural Control Barrier models (SNCBFs) to achieve scalability. Our approach exploits an important observation: the spatial interaction patterns of multiple dynamic obstacles can be decomposed and predicted through temporal sequences of states for each obstacle. Through decomposition, we can generalize control policies trained only with a small number of obstacles, to environments where the obstacle density can be 100x higher. We demonstrate the benefits of the proposed methods in improving dynamic collision avoidance in comparison with existing methods including potential fields, end-to-end reinforcement learning, and model-predictive control. We also perform hardware experiments and show the practical effectiveness of the approach in the supplementary video.

翻訳日:2023-07-07 13:45:39 公開日:2023-07-06

# 熱平衡から外れたナノ粒子とグラフェンの分散相互作用に及ぼす質量ギャップの影響

Impact of Mass-Gap on the Dispersion Interaction of Nanoparticles with Graphene out of Thermal Equilibrium ( http://arxiv.org/abs/2307.03009v1 )

ライセンス: Link先を確認

Galina L. Klimchitskaya, Constantine C. Korikov, Vladimir M. Mostepanenko and Oleg Yu. Tsybin

(参考訳) ギャップのあるグラフェンシートのソース側にあるナノ粒子に作用する非平衡分散力について検討する。ナノ粒子は環境温度に保たれるが、グラフェンシートは環境より低温または高温でありうる。質量ギャップパラメータの異なる値における、距離の関数としての分散力の計算は、基本的なリフシッツ理論を熱平衡から外れた条件へ一般化したものを用いて行う。電磁場の量子ゆらぎと熱ゆらぎに対するギャップのあるグラフェンの応答は、ディラック模型の枠組みにおける(2+1)次元時空の分極テンソルによって記述される。エバネッセント波の領域におけるこのテンソルの成分に対する明示的な表現を示す。グラフェンの質量ギャップパラメータが非平衡分散力に与える非自明な影響を、平衡の場合と比較して明らかにする。純粋な(pristine)グラフェンの場合とは異なり、非平衡力は引力的な性質を保つことが示される。ナノ粒子とグラフェンシートを機能に利用するマイクロデバイスおよびナノデバイスの設計において、得られた結果を利用する可能性について論じる。

We consider the nonequilibrium dispersion force acting on nanoparticles on the source side of gapped graphene sheet. Nanoparticles are kept at the environmental temperature, whereas the graphene sheet may be either cooler or hotter than the environment. Calculation of the dispersion force as a function of separation at different values of the mass-gap parameter is performed using the generalization of the fundamental Lifshitz theory to the out-of-thermal-equilibrium conditions. The response of gapped graphene to quantum and thermal fluctuations of the electromagnetic field is described by the polarization tensor in (2+1)-dimensional space-time in the framework of the Dirac model. The explicit expressions for the components of this tensor in the area of evanescent waves are presented. The nontrivial impact of the mass-gap parameter of graphene on the nonequilibrium dispersion force, as compared to the equilibrium one, is determined. It is shown that, unlike the case of a pristine graphene, the nonequilibrium force preserves an attractive character. The possibilities of using the obtained results in the design of micro- and nanodevices incorporating nanoparticles and graphene sheets for their functionality are discussed.

翻訳日:2023-07-07 13:45:21 公開日:2023-07-06

# 社会的仮定を用いない符号付き有向グラフにおける分離された(disentangled)表現の学習

Learning Disentangled Representations in Signed Directed Graphs without Social Assumptions ( http://arxiv.org/abs/2307.03077v1 )

ライセンス: Link先を確認

Geonwoo Ko and Jinhong Jung

(参考訳) 符号付きグラフは、様々な領域における信頼関係や嗜好を表す複雑なシステムである。このようなグラフでノード表現を学習することは、多くのマイニングタスクにとって重要である。現実世界の符号付き関係は複数の潜在要因に影響されうるが、ほとんどの既存手法は社会理論に頼り、それらを単純な要因として扱うことで、符号付き関係のモデリングを過度に単純化している。これは、表現力と、これらの関係を形作る多様な要因を捉える能力を制限する。本稿では,社会的仮定を用いずに、符号付き有向グラフにおける分離された(disentangled)ノード表現を学習する新しい手法DINESを提案する。各埋め込みを異なる要因に分離する分離フレームワークを採用し、複数の潜在要因を捉えられるようにする。また,社会理論に依存せず,符号と方向のみに注目する軽量なグラフ畳み込みを検討する。さらに,要因間の相関を考慮してエッジの符号を効果的に分類するデコーダを提案する。分離をさらに強化するために, 自己教師付きの要因判別器をエンコーダおよびデコーダと共同で訓練する。実世界の符号付き有向グラフに関する広範な実験を通して、DINESが分離されたノード表現を効果的に学習し、符号予測タスクにおいて競合手法を大幅に上回ることを示す。

Signed graphs are complex systems that represent trust relationships or preferences in various domains. Learning node representations in such graphs is crucial for many mining tasks. Although real-world signed relationships can be influenced by multiple latent factors, most existing methods often oversimplify the modeling of signed relationships by relying on social theories and treating them as simplistic factors. This limits their expressiveness and their ability to capture the diverse factors that shape these relationships. In this paper, we propose DINES, a novel method for learning disentangled node representations in signed directed graphs without social assumptions. We adopt a disentangled framework that separates each embedding into distinct factors, allowing for capturing multiple latent factors. We also explore lightweight graph convolutions that focus solely on sign and direction, without depending on social theories. Additionally, we propose a decoder that effectively classifies an edge's sign by considering correlations between the factors. To further enhance disentanglement, we jointly train a self-supervised factor discriminator with our encoder and decoder. Throughout extensive experiments on real-world signed directed graphs, we show that DINES effectively learns disentangled node representations, and significantly outperforms its competitors in the sign prediction task.

翻訳日:2023-07-07 13:37:42 公開日:2023-07-06

# 圏化されたパス計算

Categorified Path Calculus ( http://arxiv.org/abs/2307.03075v1 )

ライセンス: Link先を確認

Simon Burton

(参考訳) パス計算(Path calculus)、すなわちグラフィカル線型代数は、基底環上の行列の圏に対するストリング図式計算である。これは対称モノイダル圏の通常のストリング図式計算であり、モノイダル積は行列の直和である。我々はこの枠組みを圏化し、基底となる双モノイダル圏上の行列の双圏に対する曲面図式計算を展開する。1x1行列の図式に制限することで、任意の双モノイダル圏に対する曲面図式計算が得られる。双積、双対、ダガーといった基底圏上の追加構造が、結果として得られる計算にどのように構造を加えるかを示す。圏論的量子力学に適用すると、テレポーテーションプロトコルの新しいグラフィカルな証明が得られる。

Path calculus, or graphical linear algebra, is a string diagram calculus for the category of matrices over a base ring. It is the usual string diagram calculus for a symmetric monoidal category, where the monoidal product is the direct sum of matrices. We categorify this story to develop a surface diagram calculus for the bicategory of matrices over a base bimonoidal category. This yields a surface diagram calculus for any bimonoidal category by restricting to diagrams for 1x1 matrices. We show how additional structure on the base category, such as biproducts, duals and the dagger, adds structure to the resulting calculus. Applied to categorical quantum mechanics this yields a new graphical proof of the teleportation protocol.

翻訳日:2023-07-07 13:37:19 公開日:2023-07-06

# Proto-CLIP:Few-Shot Learningのためのビジョン言語プロトタイプネットワーク

Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning ( http://arxiv.org/abs/2307.03073v1 )

ライセンス: Link先を確認

Jishnu Jaykumar P, Kamalesh Palanisamy, Yu-Wei Chao, Xinya Du, Yu Xiang

(参考訳) 本稿では,CLIPのような大規模視覚言語モデルを活用した、少数ショット学習(few-shot learning)のための新しいフレームワークを提案する。少数ショット学習のための単一モダリティのプロトタイプネットワークに動機づけられ,少数ショット学習に画像プロトタイプとテキストプロトタイプを利用するPROTO-CLIPを導入する。具体的には、PROTO-CLIPは、CLIP内の画像エンコーダとテキストエンコーダを、少数ショットの例を用いて共同で適応させる。2つのエンコーダは、分類のための画像クラスのプロトタイプを計算するために使用される。適応の際には、対応するクラスの画像プロトタイプとテキストプロトタイプを整列させることを提案する。このような整列は、両タイプのプロトタイプからの寄与により、少数ショット分類に有益である。ベンチマークデータセット上での少数ショット学習の実験と,実世界でのロボット知覚の実験により、本手法の有効性を実証する。

We propose a novel framework for few-shot learning by leveraging large-scale vision-language models such as CLIP. Motivated by the unimodal prototypical networks for few-shot learning, we introduce PROTO-CLIP that utilizes image prototypes and text prototypes for few-shot learning. Specifically, PROTO-CLIP adapts the image encoder and text encoder in CLIP in a joint fashion using few-shot examples. The two encoders are used to compute prototypes of image classes for classification. During adaptation, we propose aligning the image and text prototypes of corresponding classes. Such a proposed alignment is beneficial for few-shot classification due to the contributions from both types of prototypes. We demonstrate the effectiveness of our method by conducting experiments on benchmark datasets for few-shot learning as well as in the real world for robot perception.
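
The sketch below illustrates the general idea of classifying a query with both image and text prototypes. It is a toy under stated assumptions, not the PROTO-CLIP implementation: the 2-D vectors stand in for CLIP embeddings, the class names are made up, and the mixing weight `alpha` and the squared-distance scoring are hypothetical choices for illustration.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors (a class prototype)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(query, image_protos, text_protos, alpha=0.5):
    """Return the class whose prototypes are closest to the query.

    The score of a class is a weighted sum of squared Euclidean distances to
    its image prototype and its text prototype; `alpha` is a hypothetical
    mixing weight.
    """
    def score(cls):
        d_img = math.dist(query, image_protos[cls]) ** 2
        d_txt = math.dist(query, text_protos[cls]) ** 2
        return alpha * d_img + (1.0 - alpha) * d_txt

    return min(image_protos, key=score)


# Toy few-shot support set: two "image embeddings" per class.
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
# Toy "text embeddings" of the class names.
text_protos = {"cat": [0.9, 0.1], "dog": [0.1, 0.9]}

image_protos = {cls: mean_vector(vecs) for cls, vecs in support.items()}

print(classify([0.7, 0.3], image_protos, text_protos))
print(classify([0.2, 0.9], image_protos, text_protos))
```

In this toy the image and text prototypes of each class already lie close together; the alignment objective described in the abstract is what would encourage that property for real encoders.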

翻訳日:2023-07-07 13:37:09 公開日:2023-07-06

# 脳波による感情認識のためのグラフ平滑化信号を用いたハイブリッド・エンドツーエンド時空間アテンションニューラルネットワーク

A Hybrid End-to-End Spatio-Temporal Attention Neural Network with Graph-Smooth Signals for EEG Emotion Recognition ( http://arxiv.org/abs/2307.03068v1 )

ライセンス: Link先を確認

Shadi Sartipi and Mastaneh Torkamani-Azar and Mujdat Cetin

(参考訳) 近年,脳波(EEG)信号などの生理データがアフェクティブコンピューティングにおいて大きな注目を集めている。この文脈での主な目標は、感情状態を評価できる自動化モデルを設計することである。近年、ディープニューラルネットワークは感情認識タスクにおいて有望な性能を示している。しかし、生データから実用的な情報を抽出できる深層アーキテクチャの設計は依然として課題である。本稿では,時空間符号化とリカレントアテンションネットワークブロックのハイブリッド構造により,解釈可能な生理学的表現を獲得するディープニューラルネットワークを提案する。さらに、グラフ信号処理ツールを用いた前処理ステップを生データに適用し、空間領域でグラフ平滑化を行う。提案アーキテクチャが,公開されているDEAPデータセットにおける感情分類で最先端の結果を上回ることを実証する。また,学習モデルの汎用性を検討するために,特定のソースから他のターゲットドメインへモデルパラメータを転移することで,転移学習(TL)に対するアーキテクチャの性能を評価する。DEAPをソースデータセットとして用い、異なる刺激による脳波ベースの感情分類タスクを含むDREAMERおよびEmotional English Word (EEWD) データセットにおいて,クロスモダリティTLの実行と感情分類精度の向上に対するモデルの有効性を実証する。

Recently, physiological data such as electroencephalography (EEG) signals have attracted significant attention in affective computing. In this context, the main goal is to design an automated model that can assess emotional states. Lately, deep neural networks have shown promising performance in emotion recognition tasks. However, designing a deep architecture that can extract practical information from raw data is still a challenge. Here, we introduce a deep neural network that acquires interpretable physiological representations by a hybrid structure of spatio-temporal encoding and recurrent attention network blocks. Furthermore, a preprocessing step is applied to the raw data using graph signal processing tools to perform graph smoothing in the spatial domain. We demonstrate that our proposed architecture exceeds state-of-the-art results for emotion classification on the publicly available DEAP dataset. To explore the generality of the learned model, we also evaluate the performance of our architecture towards transfer learning (TL) by transferring the model parameters from a specific source to other target domains. Using DEAP as the source dataset, we demonstrate the effectiveness of our model in performing cross-modality TL and improving emotion classification accuracy on DREAMER and the Emotional English Word (EEWD) datasets, which involve EEG-based emotion classification tasks with different stimuli.

翻訳日:2023-07-07 13:36:57 公開日:2023-07-06

# DeepOnto: ディープラーニングによるオントロジーエンジニアリングのためのPythonパッケージ

DeepOnto: A Python Package for Ontology Engineering with Deep Learning ( http://arxiv.org/abs/2307.03067v1 )

ライセンス: Link先を確認

Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, Brahmananda Sapkota

(参考訳) オントロジー工学におけるディープラーニング技術、特に言語モデル(LM)の適用は、広く注目を集めている。しかし、PyTorchやTensorflowのようなディープラーニングフレームワークは主にPythonプログラミング向けに開発されている一方、OWL APIやJenaといった広く使われているオントロジーAPIは主にJavaベースである。これらのフレームワークとAPIのシームレスな統合を容易にするため、オントロジー工学用に設計されたPythonパッケージであるDeepontoを紹介する。このパッケージには、広く認知され信頼性の高いOWL API上に構築されたコアオントロジー処理モジュールが含まれており、その基本的な機能をより「Pythonic」な形でカプセル化し、推論、言語化(verbalisation)、正規化、射影(projection)などの他の必須コンポーネントを含むように拡張している。このモジュールの上に、Deepontoは、主に事前訓練されたLMを中心とするディープラーニング手法を活用して、オントロジーアライメントや補完といった様々なオントロジー工学タスクをサポートする一連のツール、リソース、アルゴリズムを提供する。本稿では,Samsung Research UKのDigital Health Coachingと,Ontology Alignment Evaluation Initiative(OAEI)のBio-MLトラックという2つのユースケースを通じて,Deepontoの実用性も実証する。

Applying deep learning techniques, particularly language models (LMs), in ontology engineering has raised widespread attention. However, deep learning frameworks like PyTorch and Tensorflow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present Deeponto, a Python package designed for ontology engineering. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to include other essential components including reasoning, verbalisation, normalisation, projection, and more. Building on this module, Deeponto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methodologies, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of Deeponto through two use-cases: the Digital Health Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).

翻訳日:2023-07-07 13:36:38 公開日:2023-07-06

# 離散対数と関連問題の量子計算量

Quantum Complexity for Discrete Logarithms and Related Problems ( http://arxiv.org/abs/2307.03065v1 )

ライセンス: Link先を確認

Minki Hhan, Takashi Yamakawa, Aaram Yun

(参考訳) 本稿では,ジェネリックアルゴリズム、すなわち群の符号化の性質を一切利用しないアルゴリズムの文脈において、離散対数(DL)問題および関連する群論的問題の量子計算量を研究する。群論的問題に対する量子計算のジェネリックなモデルを確立し、これを量子ジェネリック群モデルと呼ぶ。DL問題に対するShorのアルゴリズムおよび関連アルゴリズムは、このモデルで記述できる。このモデルにおいて、DLおよび関連問題の量子計算量の下界と、それにほぼ一致するアルゴリズムを示す。より正確には、素数位数の巡回群$G$に対して以下の結果を証明する。 - 任意のジェネリック量子DLアルゴリズムは、深さ$\Omega(\log |G|)$の群演算を行わなければならない。これは、並列アルゴリズムを考慮しても、Shorのアルゴリズムがジェネリック量子アルゴリズムの中で漸近的に最適であることを示している。 - Shorのアルゴリズムの変種は、古典計算を活用して量子群演算の数を減らせることを観察する。ジェネリックなハイブリッド量子古典アルゴリズムのモデルを導入し、これらのアルゴリズムがこのモデルでほぼ最適であることを示す。総群演算数が$Q$である、DL問題に対する任意のジェネリックハイブリッドアルゴリズムは、深さ$\Omega(\log\log |G| - \log\log Q)$の量子群演算を$\Omega(\log |G|/\log Q)$回行わなければならない。 - 量子メモリが$t$個の群要素しか格納できず、$r$個の群要素の量子ランダムアクセスメモリを使用する場合、任意のジェネリックハイブリッドアルゴリズムは、合計$\Omega(\sqrt{|G|})$回の群演算または$\Omega(\log |G|/\log (tr))$回の量子群演算を行わなければならない。副次的な貢献として、複数DL問題には各インスタンスを1つずつ解くよりも優れたアルゴリズムが存在することを示し、パスワード認証鍵交換プロトコルの文脈で提唱された量子アノイング性(quantum annoying property)の強い形を反証する。

This paper studies the quantum computational complexity of the discrete logarithm (DL) and related group-theoretic problems in the context of generic algorithms -- that is, algorithms that do not exploit any properties of the group encoding. We establish a generic model of quantum computation for group-theoretic problems, which we call the quantum generic group model. Shor's algorithm for the DL problem and related algorithms can be described in this model. We show the quantum complexity lower bounds and almost matching algorithms of the DL and related problems in this model. More precisely, we prove the following results for a cyclic group $G$ of prime order. - Any generic quantum DL algorithm must make $\Omega(\log |G|)$ depth of group operations. This shows that Shor's algorithm is asymptotically optimal among the generic quantum algorithms, even considering parallel algorithms. - We observe that variations of Shor's algorithm can take advantage of classical computations to reduce the number of quantum group operations. We introduce a model for generic hybrid quantum-classical algorithms and show that these algorithms are almost optimal in this model. Any generic hybrid algorithm for the DL problem with a total number of group operations $Q$ must make $\Omega(\log |G|/\log Q)$ quantum group operations of depth $\Omega(\log\log |G| - \log\log Q)$. - When the quantum memory can only store $t$ group elements and use quantum random access memory of $r$ group elements, any generic hybrid algorithm must make either $\Omega(\sqrt{|G|})$ group operations in total or $\Omega(\log |G|/\log (tr))$ quantum group operations. As a side contribution, we show a multiple DL problem admits a better algorithm than solving each instance one by one, refuting a strong form of the quantum annoying property suggested in the context of password-authenticated key exchange protocol.
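
For orientation on the $\sqrt{|G|}$ regime mentioned above, here is a minimal sketch of the classical baby-step giant-step method, a textbook generic DL algorithm that uses on the order of $\sqrt{|G|}$ group multiplications. It is not taken from the paper and says nothing about the quantum or hybrid bounds. The concrete group (the order-11 subgroup of $\mathbb{Z}_{23}^*$ generated by 2) is an illustrative assumption, and `pow` with a negative exponent and a modulus requires Python 3.8 or later.

```python
import math


def baby_step_giant_step(g, h, p, n):
    """Return x in [0, n) with g**x == h (mod p), or None if no such x exists.

    g generates a subgroup of order n in the multiplicative group mod p.
    The group is only touched through multiplication (plus one inverse
    power), using about 2 * sqrt(n) multiplications.
    """
    m = math.isqrt(n - 1) + 1          # m = ceil(sqrt(n)) for n >= 1

    # Baby steps: table of g**j for j = 0 .. m-1.
    table = {}
    current = 1
    for j in range(m):
        table.setdefault(current, j)
        current = current * g % p

    # Giant steps: compare h * g**(-i*m) against the table for i = 0 .. m-1.
    factor = pow(g, -m, p)             # modular inverse power (Python 3.8+)
    gamma = h % p
    for i in range(m):
        if gamma in table:
            return (i * m + table[gamma]) % n
        gamma = gamma * factor % p
    return None


# Toy cyclic group of prime order: <2> inside Z_23^*, which has order 11.
p, n, g = 23, 11, 2
h = pow(g, 7, p)
print(baby_step_giant_step(g, h, p, n))
```

At cryptographic sizes both the table of about $\sqrt{n}$ entries and the running time are prohibitive, which is why bounds on how far quantum and hybrid generic algorithms can improve on this are of interest.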

翻訳日:2023-07-07 13:36:14 公開日:2023-07-06

# 勾配に基づく解釈可能性のためのバックプロパゲーションの一般化

Generalizing Backpropagation for Gradient-Based Interpretability ( http://arxiv.org/abs/2307.03056v1 )

ライセンス: Link先を確認

Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, Ryan Cotterell

(参考訳) ディープニューラルネットワークを解釈するための一般的な特徴帰属法の多くは、入力に対するモデル出力の勾配の計算に依存している。これらの手法はモデルの予測にどの入力特徴が重要でありうるかを示すことができるが、モデル自体の内部動作についてはほとんど明らかにしない。本稿では,モデルの勾配計算が、半環を用いたより一般的な定式化の特別な場合であることを示す。この観察により、バックプロパゲーションアルゴリズムを一般化し、最大重みのパスやエントロピーといった、ニューラルネットワークの勾配グラフに関する他の解釈可能な統計量を効率的に計算できる。この一般化アルゴリズムを実装し、計算される統計量をよりよく理解するために合成データセット上で評価し、主語と動詞の数の一致タスク(SVA)におけるBERTの挙動の研究に適用する。この方法により、我々は (a) モデルの構成要素を流れる勾配の量が、予測に対するその重要性を反映していることを検証し、 (b) SVAについて、自己注意機構のどの経路が最も重要であるかを特定する。

Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs. While these methods can indicate which input features may be important for the model's prediction, they reveal little about the inner workings of the model itself. In this paper, we observe that the gradient computation of a model is a special case of a more general formulation using semirings. This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics about the gradient graph of a neural network, such as the highest-weighted path and entropy. We implement this generalized algorithm, evaluate it on synthetic datasets to better understand the statistics it computes, and apply it to study BERT's behavior on the subject-verb number agreement task (SVA). With this method, we (a) validate that the amount of gradient flow through a component of a model reflects its importance to a prediction and (b) for SVA, identify which pathways of the self-attention mechanism are most important.
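The semiring view described in this abstract can be illustrated with a minimal sketch. It assumes a toy computation graph whose edge weights stand for local derivatives; the function name `semiring_backprop`, the graph, and the weights are hypothetical and are not taken from the paper's code. With the sum-product semiring the backward pass returns the ordinary gradient; swapping in a max-product semiring (assuming nonnegative weights, for example gradient magnitudes) returns the weight of the highest-weighted path instead.

```python
from collections import defaultdict

# A semiring is represented here as (plus, times, zero, one).
SUM_PRODUCT = (lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)  # ordinary gradient
MAX_PRODUCT = (max, lambda a, b: a * b, 0.0, 1.0)  # highest-weighted path (weights >= 0)


def semiring_backprop(edges, order, output, semiring):
    """Aggregate products of edge weights over all paths from each node to `output`.

    edges: dict mapping (src, dst) -> local derivative d(dst)/d(src)
    order: nodes in topological order (inputs first, output last)
    """
    plus, times, zero, one = semiring
    value = defaultdict(lambda: zero)
    value[output] = one
    # Visit nodes in reverse topological order so every successor is final.
    for node in reversed(order):
        for (src, dst), weight in edges.items():
            if src == node:
                value[node] = plus(value[node], times(weight, value[dst]))
    return dict(value)


# Hypothetical graph: x feeds two hidden nodes, both feed y.
edges = {("x", "h1"): 2.0, ("x", "h2"): 3.0, ("h1", "y"): 0.5, ("h2", "y"): 4.0}
order = ["x", "h1", "h2", "y"]

grad = semiring_backprop(edges, order, "y", SUM_PRODUCT)
best = semiring_backprop(edges, order, "y", MAX_PRODUCT)
print(grad["x"])  # 13.0 = 2.0*0.5 + 3.0*4.0
print(best["x"])  # 12.0 = max(2.0*0.5, 3.0*4.0)
```

Only the `plus` operation changes between the two runs, which is the sense in which gradient computation is one instance of a more general path aggregation.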

Translated: 2023-07-07 13:35:40 Published: 2023-07-06

# Reduced-order modeling of two-dimensional turbulent Rayleigh-Bénard flow by hybrid quantum-classical reservoir computing

Reduced-order modeling of two-dimensional turbulent Rayleigh-Bénard flow by hybrid quantum-classical reservoir computing ( http://arxiv.org/abs/2307.03053v1 )

License: see the linked page

Philipp Pfeffer, Florian Heyder and Jörg Schumacher

Two hybrid quantum-classical reservoir computing models are presented to reproduce low-order statistical properties of a two-dimensional turbulent Rayleigh-Bénard convection flow at a Rayleigh number Ra=1e5 and a Prandtl number Pr=10. The two quantum algorithms differ in the arrangement of the circuit layers in the quantum reservoir, in particular the entanglement layers. The second of the two architectures, denoted as H2, enables a complete execution of the reservoir update inside the quantum circuit. Their performance is compared with that of a classical reservoir computing model. All three models have to learn the nonlinear and chaotic dynamics of the flow in a lower-dimensional latent data space which is spanned by the time series of the 16 most energetic Proper Orthogonal Decomposition (POD) modes. These training data are generated by a POD snapshot analysis from the turbulent flow. All reservoir computing models are operated in the reconstruction or open-loop mode, i.e., they receive 3 POD modes as an input at each step and reconstruct the missing 13 ones. We analyse the reconstruction error as a function of the hyperparameters that are specific to the quantum cases or shared with the classical counterpart, such as the reservoir size and the leaking rate. We show that both quantum algorithms are able to reconstruct essential statistical properties of the turbulent convection flow successfully with a small number of qubits, n<=9. These properties comprise the velocity and temperature fluctuation profiles and, in particular, the turbulent convective heat flux, which quantifies the turbulent heat transfer across the layer and manifests in coherent hot rising and cold falling thermal plumes.
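For orientation, the classical side of such a comparison can be sketched as a textbook leaky echo-state update driven in open-loop mode. This is an assumption-laden illustration, not the authors' implementation: the helper names `make_reservoir` and `reservoir_step`, the weight scaling, the leaking rate, and the synthetic input signals are all hypothetical, and the trained linear readout that would map the state to the 13 missing modes is omitted.

```python
import math
import random


def make_reservoir(n_inputs, n_reservoir, seed=0, scale=0.5):
    """Draw fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
            for _ in range(n_reservoir)]
    # Dividing by sqrt(n_reservoir) keeps the recurrent drive moderate.
    w = [[rng.uniform(-scale, scale) / math.sqrt(n_reservoir)
          for _ in range(n_reservoir)]
         for _ in range(n_reservoir)]
    return w_in, w


def reservoir_step(state, u, w_in, w, leak=0.3):
    """Leaky update: r <- (1 - leak) * r + leak * tanh(W_in u + W r)."""
    new_state = []
    for i in range(len(state)):
        pre = sum(w_in[i][j] * u[j] for j in range(len(u)))
        pre += sum(w[i][k] * state[k] for k in range(len(state)))
        new_state.append((1.0 - leak) * state[i] + leak * math.tanh(pre))
    return new_state


# Open-loop driving: three observed signals are fed in at every step.
w_in, w = make_reservoir(n_inputs=3, n_reservoir=20)
state = [0.0] * 20
for t in range(50):
    u = [math.sin(0.1 * t), math.cos(0.1 * t), math.sin(0.05 * t)]
    state = reservoir_step(state, u, w_in, w)
# A trained linear readout (omitted) would map `state` to the missing modes.
```

The reservoir size and the leaking rate in this sketch correspond to the hyperparameters the abstract describes as shared between the classical and quantum models.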

Translated: 2023-07-07 13:35:22 Published: 2023-07-06

# Origin-Destination Travel Time Oracle for Map-based Services

Origin-Destination Travel Time Oracle for Map-based Services ( http://arxiv.org/abs/2307.03048v1 )

License: see the linked page

Yan Lin, Huaiyu Wan, Jilin Hu, Shengnan Guo, Bin Yang, Youfang Lin, Christian S. Jensen

Given an origin (O), a destination (D), and a departure time (T), an Origin-Destination (OD) travel time oracle (ODT-Oracle) returns an estimate of the time it takes to travel from O to D when departing at T. ODT-Oracles serve important purposes in map-based services. To enable the construction of such oracles, we provide a travel-time estimation (TTE) solution that leverages historical trajectories to estimate time-varying travel times for OD pairs. The problem is complicated by the fact that multiple historical trajectories with different travel times may connect an OD pair, while trajectories may vary from one another. To solve the problem, it is crucial to remove outlier trajectories when doing travel time estimation for future queries. We propose a novel, two-stage framework called Diffusion-based Origin-destination Travel Time Estimation (DOT), that solves the problem. First, DOT employs a conditioned Pixelated Trajectories (PiT) denoiser that enables building a diffusion-based PiT inference process by learning correlations between OD pairs and historical trajectories. Specifically, given an OD pair and a departure time, we aim to infer a PiT. Next, DOT encompasses a Masked Vision Transformer (MViT) that effectively and efficiently estimates a travel time based on the inferred PiT. We report on extensive experiments on two real-world datasets that offer evidence that DOT is capable of outperforming baseline methods in terms of accuracy, scalability, and explainability.

Translated: 2023-07-07 13:34:53 Published: 2023-07-06

# Track Mix Generation on Music Streaming Services using Transformers

Track Mix Generation on Music Streaming Services using Transformers ( http://arxiv.org/abs/2307.03045v1 )

License: see the linked page

Walid Bendada and Théo Bontempelli and Mathieu Morlon and Benjamin Chapus and Thibault Cador and Thomas Bouabça and Guillaume Salha-Galvan

This paper introduces Track Mix, a personalized playlist generation system released in 2022 on the music streaming service Deezer. Track Mix automatically generates "mix" playlists inspired by initial music tracks, allowing users to discover music similar to their favorite content. To generate these mixes, we consider a Transformer model trained on millions of track sequences from user playlists. In light of the growing popularity of Transformers in recent years, we analyze the advantages, drawbacks, and technical challenges of using such a model for mix generation on the service, compared to a more traditional collaborative filtering approach. Since its release, Track Mix has been generating playlists for millions of users daily, enhancing their music discovery experience on Deezer.

Translated: 2023-07-07 13:34:21 Published: 2023-07-06

# Quantum Solutions to the Privacy vs. Utility Tradeoff

Quantum Solutions to the Privacy vs. Utility Tradeoff ( http://arxiv.org/abs/2307.03118v1 )

License: see the linked page

Sagnik Chatterjee and Vyacheslav Kungurtsev

In this work, we propose a novel architecture (and several variants thereof) based on quantum cryptographic primitives with provable privacy and security guarantees regarding membership inference attacks on generative models. Our architecture can be used on top of any existing classical or quantum generative models. We argue that the use of quantum gates associated with unitary operators provides inherent advantages compared to standard Differential Privacy based techniques for establishing guaranteed security from all polynomial-time adversaries.

Translated: 2023-07-07 13:28:52 Published: 2023-07-06

# KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding

KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding ( http://arxiv.org/abs/2307.03115v1 )

License: see the linked page

Zijun Yao, Yantao Liu, Xin Lv, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

Deep text understanding, which requires connections between a given document and prior knowledge beyond its text, has been highlighted by many benchmarks in recent years. However, these benchmarks have encountered two major limitations. On the one hand, most of them require human annotation of knowledge, which leads to limited knowledge coverage. On the other hand, they usually use choices or spans in the texts as the answers, which results in a narrow answer space. To overcome these limitations, we build a new challenging benchmark named KoRC in this paper. Compared with previous benchmarks, KoRC has two advantages, i.e., broad knowledge coverage and flexible answer format. Specifically, we utilize massive knowledge bases to guide annotators or large language models (LLMs) to construct knowledgeable questions. Moreover, we use labels in knowledge bases rather than spans or choices as the final answers. We test state-of-the-art models on KoRC and the experimental results show that the strongest baseline only achieves 68.3% and 30.0% F1 measure on the in-distribution and out-of-distribution test sets, respectively. These results indicate that deep text understanding is still an unsolved challenge. The benchmark dataset, leaderboard, and baseline methods are released in https://github.com/THU-KEG/KoRC.

Translated: 2023-07-07 13:28:45 Published: 2023-07-06

# LISSNAS: Locality-based Iterative Search Space Shrinkage for Neural Architecture Search

LISSNAS: Locality-based Iterative Search Space Shrinkage for Neural Architecture Search ( http://arxiv.org/abs/2307.03110v1 )

License: see the linked page

Bhavna Gopal, Arjun Sridhar, Tunhou Zhang and Yiran Chen

Search spaces hallmark the advancement of Neural Architecture Search (NAS). Large and complex search spaces with versatile building operators and structures provide more opportunities to brew promising architectures, yet pose severe challenges on efficient exploration and exploitation. Subsequently, several search space shrinkage methods optimize by selecting a single sub-region that contains some well-performing networks. Small performance and efficiency gains are observed with these methods but such techniques leave room for significantly improved search performance and are ineffective at retaining architectural diversity. We propose LISSNAS, an automated algorithm that shrinks a large space into a diverse, small search space with SOTA search performance. Our approach leverages locality, the relationship between structural and performance similarity, to efficiently extract many pockets of well-performing networks. We showcase our method on an array of search spaces spanning various sizes and datasets. We accentuate the effectiveness of our shrunk spaces when used in one-shot search by achieving the best Top-1 accuracy in two different search spaces. Our method achieves a SOTA Top-1 accuracy of 77.6% in ImageNet under mobile constraints, best-in-class Kendall-Tau, architectural diversity, and search space size.

Translated: 2023-07-07 13:28:24 Published: 2023-07-06

# A Survey on Evaluation of Large Language Models

A Survey on Evaluation of Large Language Models ( http://arxiv.org/abs/2307.03109v1 )

License: see the linked page

Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the 'where' and 'how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey.

Translated: 2023-07-07 13:28:00 Published: 2023-07-06

# How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models

How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models ( http://arxiv.org/abs/2307.03108v1 )

License: see the linked page

Zhenting Wang, Chen Chen, Yuchen Liu, Lingjuan Lyu, Dimitris Metaxas, Shiqing Ma

Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized usage of data during the training process. One example is when a model trainer collects a set of images created by a particular artist and attempts to train a model capable of generating similar images without obtaining permission from the artist. To address this issue, it becomes crucial to detect unauthorized data usage. In this paper, we propose a method for detecting such unauthorized data usage by planting injected memorization into the text-to-image diffusion models trained on the protected dataset. Specifically, we modify the protected image dataset by adding unique contents on the images such as stealthy image wrapping functions that are imperceptible to human vision but can be captured and memorized by diffusion models. By analyzing whether the model has memorization for the injected content (i.e., whether the generated images are processed by the chosen post-processing function), we can detect models that had illegally utilized the unauthorized data. Our experiments conducted on Stable Diffusion and LoRA model demonstrate the effectiveness of the proposed method in detecting unauthorized data usages.

Translated: 2023-07-07 13:27:36 Published: 2023-07-06

# Non-Hermitian Parent Hamiltonian from Generalized Quantum Covariance Matrix

Non-Hermitian Parent Hamiltonian from Generalized Quantum Covariance Matrix ( http://arxiv.org/abs/2307.03107v1 )

License: see the linked page

Yin Tang, W. Zhu

The quantum inverse problem asks how to determine a local Hamiltonian from a single eigenstate. This question is valid not only in Hermitian systems but also in non-Hermitian systems. So far, most attempts are limited to Hermitian systems, while the possible non-Hermitian solution remains outstanding. In this work, we generalize the quantum covariance matrix method to the cases that are applicable to non-Hermitian systems, through which we are able to explicitly reconstruct the non-Hermitian parent Hamiltonian from an arbitrary pair of biorthogonal eigenstates. As concrete examples, we successfully apply our approach to a spin chain with Lee-Yang singularity and a non-Hermitian interacting fermion model. Some generalization and further application of our approach are also discussed. Our work provides a systematic and efficient way to construct a non-Hermitian Hamiltonian from a single pair of biorthogonal eigenstates and sheds light on future exploration of non-Hermitian physics.

Translated: 2023-07-07 13:27:18 Published: 2023-07-06

# Efficient Domain Adaptation of Sentence Embeddings using Adapters

Efficient Domain Adaptation of Sentence Embeddings using Adapters ( http://arxiv.org/abs/2307.03104v1 )

License: see the linked page

Tim Schopf, Dennis Schneider, Florian Matthes

Sentence embeddings enable us to capture the semantic similarity of short texts. Most sentence embedding models are trained for general semantic textual similarity (STS) tasks. Therefore, to use sentence embeddings in a particular domain, the model must be adapted to it in order to achieve good results. Usually, this is done by fine-tuning the entire sentence embedding model for the domain of interest. While this approach yields state-of-the-art results, all of the model's weights are updated during fine-tuning, making this method resource-intensive. Therefore, instead of fine-tuning entire sentence embedding models for each target domain individually, we propose to train lightweight adapters. These domain-specific adapters do not require fine-tuning all underlying sentence embedding model parameters. Instead, we only train a small number of additional parameters while keeping the weights of the underlying sentence embedding model fixed. Training domain-specific adapters allows always using the same base model and only exchanging the domain-specific adapters to adapt sentence embeddings to a specific domain. We show that using adapters for parameter-efficient domain adaptation of sentence embeddings yields competitive performance within 1% of a domain-adapted, entirely fine-tuned sentence embedding model while only training approximately 3.6% of the parameters.
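The idea of training a few extra parameters while the base model stays frozen can be sketched with a generic bottleneck adapter. This is a hedged illustration only: the class `BottleneckAdapter`, the layer sizes, and the zero initialization of the up-projection are assumptions made for the example and do not describe the specific adapter architecture evaluated in the paper.

```python
import random


class BottleneckAdapter:
    """Residual bottleneck: h + W_up * relu(W_down * h + b_down) + b_up."""

    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = random.Random(seed)
        # Small random down-projection (d_bottleneck x d_model).
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                     for _ in range(d_bottleneck)]
        self.b_down = [0.0] * d_bottleneck
        # Zero up-projection (d_model x d_bottleneck): an assumed design choice
        # that makes the adapter an identity map before any training.
        self.up = [[0.0] * d_bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def num_parameters(self):
        """Count the trainable values; the frozen base model is not included."""
        return (len(self.down) * len(self.down[0]) + len(self.b_down)
                + len(self.up) * len(self.up[0]) + len(self.b_up))

    def __call__(self, h):
        z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.down, self.b_down)]
        delta = [sum(w * a for w, a in zip(row, z)) + b
                 for row, b in zip(self.up, self.b_up)]
        return [x + d for x, d in zip(h, delta)]


adapter = BottleneckAdapter(d_model=8, d_bottleneck=2)
hidden = [0.1 * i for i in range(8)]
print(adapter.num_parameters())   # 42 = 2*8 + 2 + 8*2 + 8
print(adapter(hidden) == hidden)  # True: identity before training
```

Because only the adapter's weights would be updated, switching domains amounts to swapping one small adapter object while the base model is reused unchanged.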

Translated: 2023-07-07 13:27:00 Published: 2023-07-06

# Contextual Affinity Distillation for Image Anomaly Detection

Contextual Affinity Distillation for Image Anomaly Detection ( http://arxiv.org/abs/2307.03101v1 )

License: see the linked page

Jie Zhang, Masanori Suganuma, Takayuki Okatani

Previous works on unsupervised industrial anomaly detection mainly focus on local structural anomalies such as cracks and color contamination. While achieving significantly high detection performance on this kind of anomaly, they are faced with logical anomalies that violate the long-range dependencies such as a normal object placed in the wrong position. In this paper, based on previous knowledge distillation works, we propose to use two students (local and global) to better mimic the teacher's behavior. The local student, which is used in previous studies, mainly focuses on structural anomaly detection, while the global student pays attention to logical anomalies. To further encourage the global student's learning to capture long-range dependencies, we design the global context condensing block (GCCB) and propose a contextual affinity loss for the student training and anomaly scoring. Experimental results show the proposed method doesn't need cumbersome training techniques and achieves a new state-of-the-art performance on the MVTec LOCO AD dataset.

Translated: 2023-07-07 13:26:26 Published: 2023-07-06

# Beyond Intuition, a Framework for Applying GPs to Real-World Data

Beyond Intuition, a Framework for Applying GPs to Real-World Data ( http://arxiv.org/abs/2307.03093v1 )

License: see the linked page

Kenza Tazi, Jihao Andreas Lin, Ross Viljoen, Alex Gardner, Ti John, Hong Ge, Richard E. Turner

Gaussian Processes (GPs) offer an attractive method for regression over small, structured and correlated datasets. However, their deployment is hindered by computational costs and limited guidelines on how to apply GPs beyond simple low-dimensional datasets. We propose a framework to identify the suitability of GPs to a given problem and how to set up a robust and well-specified GP model. The guidelines formalise the decisions of experienced GP practitioners, with an emphasis on kernel design and options for computational scalability. The framework is then applied to a case study of glacier elevation change yielding more accurate results at test time.

Translated: 2023-07-07 13:25:56 Published: 2023-07-06

# Realism and causality imply information erasure by measurements

Realism and causality imply information erasure by measurements ( http://arxiv.org/abs/2307.03134v1 )

License: see the linked page

Alberto Montina, Stefan Wolf

Quantum measurements generally introduce perturbations into the subsequent evolution of the measured system. Furthermore, a projective measurement cannot decrease the uncertainty on the system if the outcome is ignored; that is, the von Neumann entropy cannot decrease. However, under certain sound assumptions and using the quantum violation of Leggett-Garg inequalities, we demonstrate that this property is not inherited by a faithful classical causal simulation of a measurement process. In the simulation, a measurement erases previous information by performing a partial reset on the system. Thus, the measuring device acts as a low-temperature bath absorbing entropy from the measured system. Information erasure is a form of Spekkens' preparation contextuality. Our proof is straightforward if one assumes that maximal ignorance of the quantum state is compatible with maximal ignorance of the classical state. We also employ a weaker hypothesis. Information erasure is related to a theorem of Leifer and Pusey, which states that time symmetry implies retrocausality. In light of our findings, we discuss Spekkens' preparation contextuality, as well as a weakness in the hypothesis of time symmetry as defined by Leifer and Pusey.

Translated: 2023-07-07 13:18:29 Published: 2023-07-06

# Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification

Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification ( http://arxiv.org/abs/2307.03133v1 )

License: see the linked page

Yongcan Yu, Lijun Sheng, Ran He, Jian Liang

Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction. Given the need for robustness in neural network systems when faced with distribution shifts, numerous TTA methods have recently been proposed. However, evaluating these methods is often done under different settings, such as varying distribution shifts, backbones, and designing scenarios, leading to a lack of consistent and fair benchmarks to validate their effectiveness. To address this issue, we present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets: CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. These methods encompass a wide range of adaptation scenarios (e.g. online adaptation vs. offline adaptation, instance adaptation vs. batch adaptation vs. domain adaptation). Furthermore, we explore the compatibility of different TTA methods with diverse network backbones. To implement this benchmark, we have developed a unified framework in PyTorch, which allows for consistent evaluation and comparison of the TTA methods across the different datasets and network architectures. By establishing this benchmark, we aim to provide researchers and practitioners with a reliable means of assessing and comparing the effectiveness of TTA methods in improving model robustness and generalization performance. Our code is available at https://github.com/yuyongcan/Benchmark-TTA.

Translated: 2023-07-07 13:18:10 Published: 2023-07-06

# T-MARS: Improving Visual Representations by Circumventing Text Feature Learning

T-MARS: Improving Visual Representations by Circumventing Text Feature Learning ( http://arxiv.org/abs/2307.03132v1 )

License: see the linked page

Pratyush Maini, Sachin Goyal, Zachary C. Lipton, J. Zico Kolter, Aditi Raghunathan

Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only image-caption pairs whose CLIP similarity score exceeded a designated threshold. In this paper, we propose a new state-of-the-art data filtering approach motivated by our observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption. Intuitively, such data could be wasteful as it incentivizes models to perform optical character recognition rather than learning visual features. However, naively removing all such data could also be wasteful, as it throws away images that contain visual features (in addition to overlapping text). Our simple and scalable approach, T-MARS (Text Masking and Re-Scoring), filters out only those pairs where the text dominates the remaining visual features -- by first masking out the text and then filtering out those with a low CLIP similarity score of the masked image. Experimentally, T-MARS outperforms the top-ranked method on the "medium scale" of DataComp (a data filtering benchmark) by a margin of 6.5% on ImageNet and 4.7% on VTAB. Additionally, our systematic evaluation on various data pool sizes from 2M to 64M shows that the accuracy gains enjoyed by T-MARS linearly increase as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS.
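The mask-then-rescore filtering step described in this abstract can be sketched as follows. The function `tmars_filter` and its arguments are hypothetical names chosen for this example; in practice `mask_text` would be a text detector plus inpainting step and `clip_similarity` a real CLIP scorer, whereas the toy stand-ins below operate on sets of words purely so the sketch runs on its own. The threshold value is arbitrary.

```python
def tmars_filter(pairs, mask_text, clip_similarity, threshold):
    """Keep image-caption pairs whose text-masked image still matches the caption."""
    kept = []
    for image, caption in pairs:
        masked = mask_text(image)  # remove rendered text from the image
        if clip_similarity(masked, caption) >= threshold:  # re-score after masking
            kept.append((image, caption))
    return kept


# Toy stand-ins (assumptions): an "image" is a dict of visual and text content.
def toy_mask_text(image):
    return {"visual": image["visual"], "text": set()}


def toy_similarity(image, caption):
    words = set(caption.split())
    content = image["visual"] | image["text"]
    return len(words & content) / len(words)


pairs = [
    ({"visual": {"dog", "grass"}, "text": set()}, "a dog on grass"),
    ({"visual": set(), "text": {"sale", "50%", "off"}}, "sale 50% off"),
    ({"visual": {"mug"}, "text": {"coffee"}}, "coffee mug"),
]

kept = tmars_filter(pairs, toy_mask_text, toy_similarity, threshold=0.3)
print([caption for _, caption in kept])  # ['a dog on grass', 'coffee mug']
```

The text-only pair is dropped because nothing matches the caption once the text is masked, while the third pair survives because visual content remains after masking.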

Translated: 2023-07-07 13:17:44 Published: 2023-07-06

# BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training

BLEURT Has Universal Translations: An Analysis of Automatic Metrics by Minimum Risk Training ( http://arxiv.org/abs/2307.03131v1 )

License: see the linked page

Yiming Yan, Tao Wang, Chengqi Zhao, Shujian Huang, Jiajun Chen, Mingxuan Wang

(参考訳) 自動メトリクスは機械翻訳において重要な役割を果たす。 n-gramベースのメトリクスが広く使用されているにもかかわらず、文の意味論の計測に焦点を当てた事前学習されたモデルベースのメトリクスの開発が最近急増している。しかしながら、これらの神経メトリクスは、人間の評価と高い相関性を達成する一方で、検出が難しい潜在的なバイアスを持つブラックボックスと見なされることが多い。本研究では,機械翻訳システムの学習指導の観点から,各種の主流・最先端自動メトリクスを体系的に分析・比較する。最小リスクトレーニング(MRT)を通じて、BLEURTやBARTScoreに普遍的な逆変換が存在するなど、ある種の指標が堅牢性欠陥を示すことがわかった。詳細な分析からは、トレーニングデータセットにおける分散バイアスと、メトリックパラダイムの傾向の2つの大きな原因が示唆されている。トークンレベルの制約を取り入れることで,評価指標のロバスト性が向上し,機械翻訳システムの性能が向上する。コードは https://github.com/powerpuffpomelo/fairseq_mrt で入手できる。

Automatic metrics play a crucial role in machine translation. Despite the widespread use of n-gram-based metrics, there has been a recent surge in the development of pre-trained model-based metrics that focus on measuring sentence semantics. However, these neural metrics, while achieving higher correlations with human evaluations, are often considered to be black boxes with potential biases that are difficult to detect. In this study, we systematically analyze and compare various mainstream and cutting-edge automatic metrics from the perspective of their guidance for training machine translation systems. Through Minimum Risk Training (MRT), we find that certain metrics exhibit robustness defects, such as the presence of universal adversarial translations in BLEURT and BARTScore. In-depth analysis suggests two main causes of these robustness deficits: distribution biases in the training datasets, and the tendency of the metric paradigm. By incorporating token-level constraints, we enhance the robustness of evaluation metrics, which in turn leads to an improvement in the performance of machine translation systems. Code is available at https://github.com/powerpuffpomelo/fairseq_mrt.

翻訳日:2023-07-07 13:17:09 公開日:2023-07-06

# VisKoP:対話型知識ベース質問応答のための視覚的知識指向プログラミング

VisKoP: Visual Knowledge oriented Programming for Interactive Knowledge Base Question Answering ( http://arxiv.org/abs/2307.03130v1 )

ライセンス: Link先を確認

Zijun Yao, Yuanyong Chen, Xin Lv, Shulin Cao, Amy Xin, Jifan Yu, Hailong Jin, Jianjun Xu, Peng Zhang, Lei Hou, Juanzi Li

(参考訳) 本稿では,人間をループに統合し,知識ベース(kb)クエリの編集とデバッグを行う,知識ベース質問応答(kbqa)システムであるviskopを提案する。 VisKoPは、自然言語質問を知識指向プログラム言語(KoPL)に変換するニューラルプログラム誘導モジュールを提供するだけでなく、KoPLプログラムをグラフィカル要素にマッピングする。 KoPLプログラムは、知識演算子を追加するドラッグや演算子引数を指定するスロットフィリングなど、単純なグラフィカル演算子で編集できる。さらに、VisKoPは知識ベーススキーマの自動補完を提供し、ユーザは中間結果をチェックすることで、簡単にKoPLプログラムをデバッグできる。 100万単位のKB上での実用的なKBQAを実現するために,バックエンド用の高効率なKoPL実行エンジンを設計する。実験結果から,VisKoPは高効率であり,ユーザインタラクションによって間違ったKoPLプログラムの大部分が修正され,正解が得られることがわかった。 viskop online demo https://demoviskop.xlore.cn (stable release of this paper)とhttps://viskop.xlore.cn (beta release with new features)、高効率kopl engine https://pypi.org/project/kopl-engine、スクリーンキャストビデオhttps://youtu.be/zabjtxfptxoが公開された。

We present Visual Knowledge oriented Programming platform (VisKoP), a knowledge base question answering (KBQA) system that integrates humans into the loop to edit and debug knowledge base (KB) queries. VisKoP not only provides a neural program induction module, which converts natural language questions into knowledge oriented program language (KoPL), but also maps KoPL programs into graphical elements. KoPL programs can be edited with simple graphical operators, such as dragging to add knowledge operators and slot filling to designate operator arguments. Moreover, VisKoP provides auto-completion for its knowledge base schema, and users can easily debug a KoPL program by checking its intermediate results. To facilitate practical KBQA on a million-entity-level KB, we design a highly efficient KoPL execution engine for the back-end. Experimental results show that VisKoP is highly efficient and that user interaction can fix a large portion of wrong KoPL programs to acquire the correct answer. The VisKoP online demo https://demoviskop.xlore.cn (Stable release of this paper) and https://viskop.xlore.cn (Beta release with new features), highly efficient KoPL engine https://pypi.org/project/kopl-engine, and screencast video https://youtu.be/zAbJtxFPTXo are now publicly available.

翻訳日:2023-07-07 13:16:50 公開日:2023-07-06

# 次元減少のための主サブバンドル

Principal subbundles for dimension reduction ( http://arxiv.org/abs/2307.03128v1 )

ライセンス: Link先を確認

Morten Akhøj, James Benn, Erlend Grong, Stefan Sommer, Xavier Pennec

(参考訳) 本稿では, 点雲の局所線型近似を組み合わせて低次元束を得ることにより, 多様体学習と表面再構成にサブリーマン幾何学をどのように利用できるかを示す。局所PCAによって得られる局所近似は、$\mathbb{R}^d$ 上の階数 $k$ ($k<d$) の接部分バンドル (tangent subbundle) に集められ、これを主部分バンドル (principal subbundle) と呼ぶ。これは$\mathbb{R}^d$ 上の部分リーマン計量を決定する。この計量に関する準リーマン測地線は、近似部分多様体 $M$ の明示的な構成、$\mathbb{R}^k$ における点クラウドの表現の構成、学習された幾何学を考慮に入れた観測間の距離の計算など、いくつかの重要な問題にうまく適用できることが示されている。再構成は、接空間を正確に推定する極限の場合の真の部分多様体に等しいことが保証される。シミュレーションにより,ノイズの多いデータに適用した場合,フレームワークが堅牢であることを示す。さらに、このフレームワークは、既知リーマン多様体上の観測に一般化される。

In this paper we demonstrate how sub-Riemannian geometry can be used for manifold learning and surface reconstruction by combining local linear approximations of a point cloud to obtain lower dimensional bundles. Local approximations obtained by local PCAs are collected into a rank $k$ tangent subbundle on $\mathbb{R}^d$, $k<d$, which we call a principal subbundle. This determines a sub-Riemannian metric on $\mathbb{R}^d$. We show that sub-Riemannian geodesics with respect to this metric can successfully be applied to a number of important problems, such as: explicit construction of an approximating submanifold $M$, construction of a representation of the point-cloud in $\mathbb{R}^k$, and computation of distances between observations, taking the learned geometry into account. The reconstruction is guaranteed to equal the true submanifold in the limit case where tangent spaces are estimated exactly. Via simulations, we show that the framework is robust when applied to noisy data. Furthermore, the framework generalizes to observations on an a priori known Riemannian manifold.

翻訳日:2023-07-07 13:16:20 公開日:2023-07-06

# 実機会ネットワークのためのWiFiダイレクトグループのコンテキストアウェア構成と管理

Context-Aware Configuration and Management of WiFi Direct Groups for Real Opportunistic Networks ( http://arxiv.org/abs/2307.03126v1 )

ライセンス: Link先を確認

Valerio Arnaboldi, Mattia Giovanni Campana, Franca Delmastro

(参考訳) Wi-Fi Directは、商用モバイルデバイス上でデバイス間通信(D2D)をサポートするための有望な技術である。しかし、標準のas-it-isは、日和見的ネットワークのようなd2dに基づくネットワークソリューションの実際の展開をサポートするのに十分ではない。実際、WiFi Directは、ユーザのパーソナルデバイス間でのD2D接続の自律的生成を制限するいくつかの特徴を示している。具体的には、この標準は2つ以上のデバイス間の接続を確立するために、ユーザの承認を明示的に要求し、グループ間通信を限定的にサポートする。場合によっては、互いに通信できないノードの孤立したグループを作るのに繋がる場合もある。本稿では、WiFiダイレクトグループ(WFD-GM)の効率的な構成と管理のための新しいミドルウェア層プロトコルを提案し、自律的な接続とグループ間通信を実現する。これにより、実環境(例えば、可変モビリティとネットワークサイズ)における機会ネットワークが可能になる。 WFD-GMは、ノードの安定性と電力水準の指標を含む特定の時間窓において、最適なグループ構成を作成するための異種パラメータを考慮に入れたコンテキスト関数を定義する。異なるモビリティモデル,地理的領域,ノード数を含む3つの参照シナリオをシミュレートして,プロトコルの性能を評価する。シミュレーションはまた、関連するコンテキストパラメータの実際のテストベッドにおける評価に関する実験結果によっても支持される。我々はWFD-GMを最先端のソリューションと比較し、中低モビリティのシナリオではベースラインアプローチよりもはるかに優れた性能を示し、さらにオーバーヘッドを伴わずに高モビリティの場合と同等であることを示した。

Wi-Fi Direct is a promising technology for supporting device-to-device (D2D) communications on commercial mobile devices. However, the standard as it stands is not sufficient to support the real deployment of networking solutions based entirely on D2D, such as opportunistic networks. In fact, WiFi Direct presents some characteristics that could limit the autonomous creation of D2D connections among users' personal devices. Specifically, the standard explicitly requires the user's authorization to establish a connection between two or more devices, and it provides limited support for inter-group communication. In some cases, this might lead to the creation of isolated groups of nodes that cannot communicate with each other. In this paper, we propose a novel middleware-layer protocol for the efficient configuration and management of WiFi Direct groups (WiFi Direct Group Manager, WFD-GM) to enable autonomous connections and inter-group communication. This enables opportunistic networks in real conditions (e.g., variable mobility and network size). WFD-GM defines a context function that takes into account heterogeneous parameters for the creation of the best group configuration in a specific time window, including an index of nodes' stability and power levels. We evaluate the protocol's performance by simulating three reference scenarios including different mobility models, geographical areas and numbers of nodes. Simulations are also supported by experimental results related to the evaluation in a real testbed of the involved context parameters. We compare WFD-GM with state-of-the-art solutions and show that it performs significantly better than a Baseline approach in scenarios with medium/low mobility, and is comparable to it in the case of high mobility, without introducing additional overhead.

翻訳日:2023-07-07 13:15:59 公開日:2023-07-06

# 大標準結晶構造の予測のためのアナーリング:n体原子間相互作用の効率的な実装

Annealing for prediction of grand canonical crystal structures: Efficient implementation of n-body atomic interactions ( http://arxiv.org/abs/2307.03123v1 )

ライセンス: Link先を確認

Yannick Couzinie, Yusuke Nishiya, Hirofumi Nishi, Taichi Kosugi, Yu-ichiro Matsushita

(参考訳) 本稿では, 一般的なn-体原子間相互作用, 特に共有結合をシミュレートするために必要な3-体相互作用を考慮した結晶構造予測法を提案する。結晶構造は、実空間をメッシュで判別し、各格子点上の原子の存在または非存在を表す二項変数を配置することで表される。二次非拘束二元最適化(qubo)または高次非拘束二元最適化(hubo)問題においてn体原子相互作用を実装し,シミュレートアニーリングによりcspを行う。本研究では,MoS2結晶のHUBO定式化において,3体相互作用を実装するために必要なビット数を削減することに成功した。さらに, 粒子密度と結晶構造をシミュレートしたアニールを用いて同時に最適化できることを示すことにより, グランドカノニカルシミュレーションが可能であることがわかった。特に、希ガス、すなわちレナード・ジョーンズ(lj)固体にcspを適用することで、グランドカノニカル計算がそのマイクロカノニカル計算よりも解のスケーリングに適していることを示す。

We propose an annealing scheme for crystal structure prediction (CSP) that takes into account general n-body atomic interactions, and in particular the three-body interactions which are necessary to simulate covalent bonds. The crystal structure is represented by discretizing the real space with a mesh and placing binary variables which express the existence or non-existence of an atom on every grid point. We implement n-body atomic interactions in quadratic unconstrained binary optimization (QUBO) or higher-order unconstrained binary optimization (HUBO) problems and perform CSP by simulated annealing. In this study we successfully reduce the number of bits necessary to implement three-body interactions within the HUBO formulation of MoS2 crystals. Further, we find that grand canonical simulation is possible by showing that we can simultaneously optimize for the particle density as well as the crystal structure using simulated annealing. In particular, we apply CSP to noble gases, i.e. Lennard-Jones (LJ) solids, and show that the grand canonical calculation has a better time-to-solution scaling than its microcanonical counterpart.

翻訳日:2023-07-07 13:15:32 公開日:2023-07-06

# 言語モデルから多値関係を抽出する

Extracting Multi-valued Relations from Language Models ( http://arxiv.org/abs/2307.03122v1 )

ライセンス: Link先を確認

Sneha Singhania, Simon Razniewski, Gerhard Weikum

(参考訳) 事前学習言語モデル(lms)による潜在言語表現の普及は、それらが構造化知識の有望な源であることを示唆している。しかし、既存の手法では、複数のオブジェクトが正しい場合が多いにもかかわらず、対象-関係ペア当たりの1つのオブジェクトにのみフォーカスする。この制限を克服するために、我々はこれらの表現を分析して、物質化された多目的関係知識を得る。我々はこの問題をランク選択タスクとして定式化する。候補オブジェクトのランク付けには,既存のプロンプト技術を評価し,ドメイン知識を取り入れた新しい手法を提案する。選択法のうち、学習された関係性特異しきい値よりも高い確率で対象を選択すると、49.5%のF1スコアが得られる。本研究は,多値スロット充足作業におけるlmsの活用の難しさを浮き彫りにし,潜在言語表現から関係知識を抽出するためのさらなる研究の道を開く。

The widespread usage of latent language representations via pre-trained language models (LMs) suggests that they are a promising source of structured knowledge. However, existing methods focus only on a single object per subject-relation pair, even though often multiple objects are correct. To overcome this limitation, we analyze these representations for their potential to yield materialized multi-object relational knowledge. We formulate the problem as a rank-then-select task. For ranking candidate objects, we evaluate existing prompting techniques and propose new ones incorporating domain knowledge. Among the selection methods, we find that choosing objects with a likelihood above a learned relation-specific threshold gives a 49.5% F1 score. Our results highlight the difficulty of employing LMs for the multi-valued slot-filling task and pave the way for further research on extracting relational knowledge from latent language representations.
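The rank-then-select formulation in the abstract above can be sketched roughly as below. The candidate likelihoods, the relation name, and the threshold values are all hypothetical; in the paper the thresholds are learned per relation and the likelihoods come from prompting a language model, neither of which is shown here.

```python
def rank_then_select(candidates, thresholds, relation, default_threshold=0.5):
    """Rank candidate objects by likelihood, then keep those above the
    relation-specific threshold (falling back to a default if none is set)."""
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    threshold = thresholds.get(relation, default_threshold)
    return [obj for obj, likelihood in ranked if likelihood > threshold]


# Made-up likelihoods for a hypothetical (subject, relation) pair.
candidates = {"Italian": 0.48, "French": 0.62, "Spanish": 0.03, "German": 0.55}
# Hypothetical threshold for this relation, standing in for a learned value.
thresholds = {"official_language": 0.4}

selected = rank_then_select(candidates, thresholds, "official_language")
```

Because the cut-off is a threshold rather than a fixed top-k, the number of returned objects can differ from one subject-relation pair to the next, which is the point of the multi-valued setting.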

翻訳日:2023-07-07 13:15:08 公開日:2023-07-06

# 金融における最適マルチオーダー実行のためのマルチエージェント意図認識コミュニケーションの学習

Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance ( http://arxiv.org/abs/2307.03119v1 )

ライセンス: Link先を確認

Yuchen Fang, Zhenggang Tang, Kan Ren, Weiqing Liu, Li Zhao, Jiang Bian, Dongsheng Li, Weinan Zhang, Yong Yu, Tie-Yan Liu

(参考訳) 注文実行は、特定の資産の取引注文の取得または清算を完了することを目的とした、量的金融の基本的なタスクである。モデルフリー強化学習(RL)の最近の進歩は、注文実行問題に対するデータ駆動型ソリューションを提供する。しかしながら、既存の作業は常に個々の順序の実行を最適化し、複数の順序が同時に実行されるように指定されているプラクティスを見越して、亜最適性とバイアスをもたらす。本稿では,まず,現実的な制約を考慮したマルチオーダー実行のためのマルチエージェントRL(MARL)手法を提案する。具体的には、すべてのエージェントを個々のオペレータとして扱い、互いにコミュニケーションを保ちながら、全体の利益を最大化するために協力します。それにもかかわらず、既存のmarlアルゴリズムは、複雑な金融市場では非効率である部分的観測に関する情報のみを交換することで、エージェント間のコミュニケーションを組み込むことが多い。協調性を向上させるために,学習可能なマルチラウンド通信プロトコルを提案する。元の学習目標と確実に一致するが、より効率的である新規な行動値帰属法によって最適化される。実世界の2つの市場におけるデータを用いた実験により,本手法によるコラボレーションの有効性が著しく向上した。

Order execution is a fundamental task in quantitative finance, which aims to complete the acquisition or liquidation of a number of trading orders for specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works always optimize execution for an individual order, overlooking the practice that multiple orders are specified to execute simultaneously, resulting in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints. Specifically, we treat every agent as an individual operator that trades one specific order, while continuing to communicate with the others and collaborating to maximize the overall profits. Nevertheless, existing MARL algorithms often incorporate communication among agents by exchanging only the information of their partial observations, which is inefficient in complicated financial markets. To improve collaboration, we then propose a learnable multi-round communication protocol, in which the agents communicate their intended actions to each other and refine them accordingly. It is optimized through a novel action value attribution method which is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets show superior performance, with significantly better collaboration effectiveness achieved by our method.

翻訳日:2023-07-07 13:14:54 公開日:2023-07-06

# 消去検出論理測定による超伝導二重レール空洞量子ビットの実証

Demonstrating a superconducting dual-rail cavity qubit with erasure-detected logical measurements ( http://arxiv.org/abs/2307.03169v1 )

ライセンス: Link先を確認

Kevin S. Chou, Tali Shemma, Heather McCarrick, Tzu-Chiao Chien, James D. Teoh, Patrick Winkel, Amos Anderson, Jonathan Chen, Jacob Curtis, Stijn J. de Graaf, John W. O. Garmon, Benjamin Gudlewski, William D. Kalfus, Trevor Keen, Nishaad Khedkar, Chan U Lei, Gangqiang Liu, Pinlei Lu, Yao Lu, Aniket Maiti, Luke Mastalli-Kelly, Nitish Mehta, Shantanu O. Mundhada, Anirudh Narla, Taewan Noh, Takahiro Tsunoda, Sophia H. Xue, Joseph O. Yuan, Luigi Frunzio, Jose Aumentado, Shruti Puri, Steven M. Girvin, S. Harvey Moseley, Jr., Robert J. Schoelkopf

(参考訳) スケーラブルな誤り訂正量子システムを開発する上で重要な課題は、操作と測定をしながらエラーの蓄積である。有望なアプローチの1つは、エラーを検出して消去できるシステムを設計することである。最近の提案では、超伝導キャビティを用いたデュアルレール符号化を目標としている。本研究では,このような二重レールキャビティ量子ビットを実装し,消去検出を伴う投影的論理計測の実証を行う。論理状態の生成と測定誤差を0.01 %$レベルで測定し,99 %$以上の空洞崩壊事象を消去として検出する。新しい測定プロトコルの精度を使って、このシステムにおける異なる種類のエラーを識別し、崩壊エラーは確率$\sim 0.2\%$ perマイクロ秒で発生するが、位相エラーは6倍少なく、ビットフリップは少なくとも170倍少なくなることを発見した。これらの結果は,2重レール消去量子ビットを高効率な消去符号に結合するために必要な誤差階層を初めて確認したことを示す。

A critical challenge in developing scalable error-corrected quantum systems is the accumulation of errors while performing operations and measurements. One promising approach is to design a system where errors can be detected and converted into erasures. A recent proposal aims to do this using a dual-rail encoding with superconducting cavities. In this work, we implement such a dual-rail cavity qubit and use it to demonstrate a projective logical measurement with erasure detection. We measure logical state preparation and measurement errors at the $0.01\%$-level and detect over $99\%$ of cavity decay events as erasures. We use the precision of this new measurement protocol to distinguish different types of errors in this system, finding that while decay errors occur with probability $\sim 0.2\%$ per microsecond, phase errors occur 6 times less frequently and bit flips occur at least 170 times less frequently. These findings represent the first confirmation of the expected error hierarchy necessary to concatenate dual-rail erasure qubits into a highly efficient erasure code.

翻訳日:2023-07-07 13:09:10 公開日:2023-07-06

# VideoGLUE: 基礎モデルの総合的評価

VideoGLUE: Video General Understanding Evaluation of Foundation Models ( http://arxiv.org/abs/2307.03166v1 )

ライセンス: Link先を確認

Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

(参考訳) 本研究では,3つのホールマークタスク(動作認識,時間的局所化,時空間的局所化),コミュニティが受け取りやすい8つのデータセット,下流タスクのための基盤モデル(fm)を調整した4つの適応手法を用いて,既存の基礎モデルビデオ理解能力を評価した。さらに,一般的な映像理解タスクに適応する際のfmsの有効性と効率を測定するためのスカラービデオグルスコア(vgs)を提案する。主な発見は以下の通りである。第一に、タスク特化モデルは、自然言語や画像理解においてFMが達成したものとは対照的に、本研究で研究した6つのFMよりも著しく優れている。第2に、動画モダリティを含む事前トレーニングデータを持つビデオネイティブfmsは、モーションリッチビデオの分類、時間内のアクションのローカライズ、複数のアクションのビデオの理解において、画像ネイティブfmsよりも一般的に優れている。第3に、ビデオネイティブFMは、ダウンストリームタスク(例えば、FMバックボーンの凍結)に光順応したビデオタスクでうまく機能し、画像ネイティブFMは、完全なエンドツーエンドの微調整で勝利する。最初の2つの観察により、ビデオ中心のfmsの研究を行う必要性と膨大な機会が明らかとなり、最後に、fmsの評価に関してタスクと適応方法の両方が重要であることが確認された。

We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task. Moreover, we propose a scalar VideoGLUE score (VGS) to measure an FM's efficacy and efficiency when adapting to general video understanding tasks. Our main findings are as follows. First, task-specialized models significantly outperform the six FMs studied in this work, in sharp contrast to what FMs have achieved in natural language and image understanding. Second, video-native FMs, whose pretraining data contains the video modality, are generally better than image-native FMs in classifying motion-rich videos, localizing actions in time, and understanding a video of more than one action. Third, the video-native FMs can perform well on video tasks under light adaptations to downstream tasks (e.g., freezing the FM backbones), while image-native FMs win in full end-to-end finetuning. The first two observations reveal the need and tremendous opportunities to conduct research on video-focused FMs, and the last confirms that both tasks and adaptation methods matter when it comes to the evaluation of FMs.

翻訳日:2023-07-07 13:08:50 公開日:2023-07-06

# BrickPal: Brickモデルのための拡張現実ベースのアセンブリ命令

BrickPal: Augmented Reality-based Assembly Instructions for Brick Models ( http://arxiv.org/abs/2307.03162v1 )

ライセンス: Link先を確認

Yao Shi, Xiaofeng Zhang, Ran Zhang, Zhou Yang, Xiao Tang, Hongni Ye, Yi Wu

(参考訳) The assembly instruction is a mandatory component of Lego-like brick sets. The conventional production of assembly instructions requires a considerable amount of manual fine-tuning, which is intractable for casual users and customized brick sets. Moreover, the traditional paper-based instructions lack expressiveness and interactivity. To tackle the two problems above, we present BrickPal, an augmented reality-based system, which visualizes assembly instructions in an augmented reality head-mounted display. 本研究は,自然言語処理(nlp)技術を用いて実現可能なアセンブリシーケンスを生成し,arヘッドセットにおけるリアルタイムガイダンスを提供する。さらに、nlpアルゴリズムが生成するアセンブリシーケンスは、手動で適応したシーケンスで同じユーザビリティを実現する。

The assembly instruction is a mandatory component of Lego-like brick sets. The conventional production of assembly instructions requires a considerable amount of manual fine-tuning, which is intractable for casual users and customized brick sets. Moreover, the traditional paper-based instructions lack expressiveness and interactivity. To tackle the two problems above, we present BrickPal, an augmented reality-based system, which visualizes assembly instructions in an augmented reality head-mounted display. It utilizes Natural Language Processing (NLP) techniques to generate plausible assembly sequences, and provides real-time guidance in the AR headset. Our user study demonstrates BrickPal's effectiveness at assisting users in brick assembly compared to traditional assembly methods. Additionally, the NLP algorithm-generated assembly sequences achieve the same usability as manually adapted sequences.

翻訳日:2023-07-07 13:08:22 公開日:2023-07-06

# ドメイン適応は皮膚病変分類の精度と公平性を改善するか?

Can Domain Adaptation Improve Accuracy and Fairness of Skin Lesion Classification? ( http://arxiv.org/abs/2307.03157v1 )

ライセンス: Link先を確認

Janet Wang, Yunbei Zhang, Zhengming Ding, Jihun Hamm

(参考訳) 深層学習に基づく診断システムは、ラベル付きトレーニング例が豊富にある皮膚がんの病態を分類する可能性を示している。しかし、皮膚の病変解析はしばしばラベル付きデータの不足に悩まされ、正確で信頼性の高い診断システムの開発を妨げる。本研究は,複数の皮膚病変データセットを活用し,非教師なし領域適応法(UDA)の2値および多値の皮膚病変分類への応用について検討する。特に,シングル,コンバインド,マルチソースの3つのudaトレーニングスキームを評価した。実験の結果,UDAは二分分類に有効であり,不均衡が緩和された場合にはさらなる改善が見られた。多クラスタスクでは、その性能はさほど目立たず、また、上記のベースライン精度を達成するために不均衡の問題に対処する必要がある。定量的解析により,マルチクラスタスクのテストエラーはラベルシフトと強く相関し,機能レベルのudaメソッドには不均衡データセットを扱う際の制限があることが分かった。最後に,本研究では,少数派に対する偏見を効果的に低減し,公平性を重視したテクニックを明示的に用いなくても公平性を促進できることを示した。

Deep learning-based diagnostic systems have demonstrated potential in classifying skin cancer conditions when labeled training examples are abundant. However, skin lesion analysis often suffers from a scarcity of labeled data, hindering the development of an accurate and reliable diagnostic system. In this work, we leverage multiple skin lesion datasets and investigate the feasibility of various unsupervised domain adaptation (UDA) methods in binary and multi-class skin lesion classification. In particular, we assess three UDA training schemes: single-, combined-, and multi-source. Our experimental results show that UDA is effective in binary classification, with further improvement being observed when imbalance is mitigated. In the multi-class task, its performance is less prominent, and the imbalance problem again needs to be addressed to achieve above-baseline accuracy. Through our quantitative analysis, we find that the test error of multi-class tasks is strongly correlated with label shift, and feature-level UDA methods have limitations when handling imbalanced datasets. Finally, our study reveals that UDA can effectively reduce bias against minority groups and promote fairness, even without the explicit use of fairness-focused techniques.

翻訳日:2023-07-07 13:08:11 公開日:2023-07-06

# MultiVENT: 自然文を付加したイベントの多言語ビデオ

MultiVENT: Multilingual Videos of Events with Aligned Natural Text ( http://arxiv.org/abs/2307.03153v1 )

ライセンス: Link先を確認

Kate Sanders, David Etter, Reno Kriz, Benjamin Van Durme

(参考訳) ニュースの報道は、従来の放送から、手書きで未編集のビデオ映像など、幅広いプレゼンテーション形式に移行している。オンラインで利用可能な多言語多言語ニュースソースの多種多様な配列を反映したデータセットは、このシフトの恩恵を受けるモデルを教えるのに使用できるが、既存のニュースビデオデータセットは、英語話者向けの伝統的なニュースブロードキャストに焦点を当てている。この制限に対処するため、5つのターゲット言語にまたがるテキスト文書に基づく多言語・イベント中心ビデオのデータセットであるMultiVENTを構築した。 MultiVENTには、ニュースブロードキャストビデオとプロでないイベント映像の両方が含まれており、オンラインニュースビデオの状態を分析し、それらを利用して、堅牢で事実的に正確なモデルを構築することができる。最後に,MultiVENTを用いた情報検索のベースラインとして,複雑な多言語ビデオ検索のためのモデルを提案する。

Everyday news coverage has shifted from traditional broadcasts towards a wide range of presentation formats such as first-hand, unedited video footage. Datasets that reflect the diverse array of multimodal, multilingual news sources available online could be used to teach models to benefit from this shift, but existing news video datasets focus on traditional news broadcasts produced for English-speaking audiences. We address this limitation by constructing MultiVENT, a dataset of multilingual, event-centric videos grounded in text documents across five target languages. MultiVENT includes both news broadcast videos and non-professional event footage, which we use to analyze the state of online news videos and how they can be leveraged to build robust, factually accurate models. Finally, we provide a model for complex, multilingual video retrieval to serve as a baseline for information retrieval using MultiVENT.

翻訳日:2023-07-07 13:07:53 公開日:2023-07-06

# 共有モビリティによるアクセシビリティの計算について

On the Computation of Accessibility Provided by Shared Mobility ( http://arxiv.org/abs/2307.03148v1 )

ライセンス: Link先を確認

Severin Diepolder, Andrea Araldo, Tarek Chouaki, Santa Maiti, Sebastian Hörl, Costantinos Antoniou

(参考訳) シェアード・モビリティ・サービス(SMS)、例えばデマンド・レスポンシブ・トランジット(DRT)やライドシェアリングは、低密度領域におけるモビリティを改善することができる。このような改善は、主に待ち時間や旅行時間といった基本的なパフォーマンス指標によって定量化される。しかし、アクセシビリティ指標は、周囲の機会(例えば、仕事、学校、店など)にたどり着くことの容易さを測定することで、より包括的な指標となる。現在、経験的測定に基づいてSMSのアクセシビリティを定量化する方法は存在しない。実際、アクセシビリティは一般的にptネットワークのグラフ表現で計算されるが、smsは動的であり、事前定義されたネットワークに従わない。本研究では,ptのフィーダとして作用するsmsの入力観測トリップをグラフにまとめた空間-時間統計手法を提案する。このようなグラフでは、古典的なアクセシビリティ指標を計算する。本手法をパリ・サクレーにおけるDRTに関するMATSimシミュレーション研究に適用する。

Shared Mobility Services (SMS), e.g., Demand-Responsive Transit (DRT) or ride-sharing, can improve mobility in low-density areas, often poorly served by conventional Public Transport (PT). Such improvement is mostly quantified via basic performance indicators, like wait or travel time. However, accessibility indicators, measuring the ease of reaching surrounding opportunities (e.g., jobs, schools, shops, ...), would be a more comprehensive indicator. To date, no method exists to quantify the accessibility of SMS based on empirical measurements. Indeed, accessibility is generally computed on graph representations of PT networks, but SMS are dynamic and do not follow a predefined network. We propose a spatial-temporal statistical method that takes as input observed trips of an SMS acting as a feeder for PT and summarizes such trips in a graph. On such a graph, we compute classic accessibility indicators. We apply our method to a MATSim simulation study concerning DRT in Paris-Saclay.
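Once trips have been summarized as a graph, a classic cumulative-opportunities indicator can be computed along the following lines. This is a sketch under stated assumptions: the node names, edge travel times (in minutes), and opportunity counts are invented, and the paper's statistical procedure for building the graph from observed trips is not reproduced here.

```python
import heapq


def shortest_times(graph, origin):
    """Dijkstra's algorithm: shortest travel time from origin to each node."""
    times = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > times.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, cost in graph.get(node, []):
            candidate = elapsed + cost
            if candidate < times.get(neighbor, float("inf")):
                times[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return times


def cumulative_accessibility(graph, opportunities, origin, budget):
    """Count the opportunities reachable from origin within a time budget."""
    times = shortest_times(graph, origin)
    return sum(
        count
        for node, count in opportunities.items()
        if times.get(node, float("inf")) <= budget
    )


# Hypothetical feeder-plus-PT graph: node -> list of (neighbor, minutes).
graph = {
    "home": [("drt_stop", 8.0)],
    "drt_stop": [("station", 6.0)],
    "station": [("centre", 12.0), ("campus", 20.0)],
}
# Hypothetical number of opportunities (e.g., jobs) located at each node.
opportunities = {"station": 50, "centre": 400, "campus": 120}

score = cumulative_accessibility(graph, opportunities, "home", budget=30.0)
```

With these assumed values, the station (14 minutes) and the centre (26 minutes) fall within the 30-minute budget, while the campus (34 minutes) does not.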

翻訳日:2023-07-07 13:07:37 公開日:2023-07-06

# 双対ユニタリティの階層的一般化

Hierarchical generalization of dual unitarity ( http://arxiv.org/abs/2307.03138v1 )

ライセンス: Link先を確認

Xie-Hang Yu, Zhiyuan Wang and Pavel Kos

(参考訳) 格子モデルにおける局所的な相互作用を伴う量子力学は、リッチな物理学を示すが、研究は困難である。二重単位回路は、1次元または高次元の量子系における興味深い物理問題に対する正確な答えを可能にする。しかし、このモデル群は、光円錐内における相関の消失や、局所的な可観測物の瞬時熱化など、普遍的な特徴を示す。本研究では, 正確な計算可能な空間-時間相関関数がよりリッチな振る舞いを示し, 局所観測可能な非自明な熱化を持つデュアルユニタリ回路の一般化を提案する。これは、単一ゲート条件をマルチゲート条件の階層に一般化することで実現され、第1レベルがデュアルユニタリモデルを復元し、第2レベルがこれら新しい興味深い特徴を示す。また、議論を拡張して、わずかなサイトオブザーバブルを持つコリエータに正確なソリューションを提供し、量子クエンチ後のものを含む高階について議論する。さらに、量子ビットの場合の徹底的なパラメトリゼーションを提供し、また、2より大きい局所次元のモデルの新しいファミリーを提案し、また二元単位モデルの新しいファミリーを提供する。

Quantum dynamics with local interactions in lattice models displays rich physics, but is notoriously hard to study. Dual-unitary circuits allow for exact answers to interesting physical questions in clean or disordered one- and higher-dimensional quantum systems. However, this family of models shows some non-universal features, like vanishing correlations inside the light-cone and instantaneous thermalization of local observables. In this work we propose a generalization of dual-unitary circuits where the exactly calculable spatial-temporal correlation functions display richer behavior, and have non-trivial thermalization of local observables. This is achieved by generalizing the single-gate condition to a hierarchy of multi-gate conditions, where the first level recovers dual-unitary models, and the second level exhibits these new interesting features. We also extend the discussion and provide exact solutions to correlators with few-site observables and discuss higher orders, including the ones after a quantum quench. In addition, we provide exhaustive parametrizations for qubit cases, and propose a new family of models for local dimensions larger than two, which also provides a new family of dual-unitary models.

翻訳日:2023-07-07 13:07:18 公開日:2023-07-06

# ct画像における大動脈および大血管分節のトポロジー認識損失

Topology-Aware Loss for Aorta and Great Vessel Segmentation in Computed Tomography Images ( http://arxiv.org/abs/2307.03137v1 )

ライセンス: Link先を確認

Seher Ozcelik, Sinan Unver, Ilke Ali Gurses, Rustu Turkay, and Cigdem Gunduz-Demir

(参考訳) セグメンテーションネットワークは、標準的な損失関数で訓練された場合、オブジェクトの形状や複数のオブジェクト間の幾何など、画像のグローバル不変性を学ぶために明示的に強制されない。一方,このような不変性をネットワークトレーニングに組み込むことで,分割対象の固有特性である様々なセグメンテーションタスクの性能を向上させることができる。例えば、CT画像における大動脈と大血管の分節化では、人間の解剖学により体内の特定の形状に血管が見出され、2次元CT画像上の丸い物体のように見える。本稿では, 基底的真理と持続的ホモロジーによる予測とのトポロジの相違を罰する新たなトポロジ認識損失関数を導入することにより, この問題に対処する。予測写像の確率関数と基底真理のベッチ数にしきい値濾過を適用した従来提案されていた分節ネットワーク設計とは違って, ヴィトリス・リップス濾過を適用し, 基底真理と予測写像の持続性図を取得し, 対応する持続性図間のワッサースタイン距離との差を計算することを提案する。この濾過を用いると、形状と形状を同時にモデル化する利点があるが、しきい値濾過が適用されるとは起こり得ない。 24名の被験者の4327ct画像を用いた実験により,提案するトポロジー認識損失関数が,提案手法よりも優れた結果をもたらすことが明らかとなった。

Segmentation networks are not explicitly forced to learn global invariants of an image, such as the shape of an object and the geometry between multiple objects, when they are trained with a standard loss function. On the other hand, incorporating such invariants into network training may help improve performance for various segmentation tasks when they are the intrinsic characteristics of the objects to be segmented. One example is segmentation of the aorta and great vessels in computed tomography (CT) images, where vessels are found in a particular geometry in the body due to the human anatomy and mostly appear as round objects on a 2D CT image. This paper addresses this issue by introducing a new topology-aware loss function that penalizes topology dissimilarities between the ground truth and prediction through persistent homology. Different from the previously suggested segmentation network designs, which apply the threshold filtration on a likelihood function of the prediction map and the Betti numbers of the ground truth, this paper proposes to apply the Vietoris-Rips filtration to obtain persistence diagrams of both ground truth and prediction maps and calculate the dissimilarity with the Wasserstein distance between the corresponding persistence diagrams. The use of this filtration has the advantage of modeling shape and geometry at the same time, which may not happen when the threshold filtration is applied. Our experiments on 4327 CT images of 24 subjects reveal that the proposed topology-aware loss function leads to better results than its counterparts, indicating the effectiveness of this use.

翻訳日:2023-07-07 13:06:57 公開日:2023-07-06

# 対称円錐上のオンライン凸最適化のための乗法的更新

Multiplicative Updates for Online Convex Optimization over Symmetric Cones ( http://arxiv.org/abs/2307.03136v1 )

ライセンス: Link先を確認

Ilayda Canyakmaz, Wayne Lin, Georgios Piliouras, Antonios Varvitsiotis

(参考訳) オンライン凸最適化(オンライン凸最適化)について検討し、可能なアクションは対称円錐内のトレース1要素であり、広く研究されている専門家のセットアップとその量子対応を一般化する。対称円錐は、線形、二階錐、半定値最適化を含むいくつかの重要な最適化モデルの統一フレームワークを提供する。ユークリッドジョルダン代数の分野のツールを用いて、任意の対称錐のトレースワンスライス上でのオンライン最適化のための投影なしアルゴリズムであるscmwu(symmetric-cone multiplicative weights update)を導入する。 SCMWUは, 対称錐負エントロピーを正則化器とするFollow-the-Regularized-LeaderおよびOnline Mirror Descentと等価であることを示す。この構造的結果を用いて、scmwuは非回帰アルゴリズムであり、広範な実験により理論結果を検証する。本研究では,確率的単純度を用いた乗法重み更新法と,密度行列の集合上の行列乗法重み更新法を統合し,一般化する。

We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices.
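For orientation, the classical Multiplicative Weights Update over the probability simplex, which the abstract above names as a special case that SCMWU generalizes, can be sketched as follows. This is not SCMWU itself (no Jordan-algebra machinery is involved), and the loss vectors and learning rate `eta` are arbitrary illustrative values.

```python
import math


def mwu_step(weights, losses, eta):
    """One classical MWU step: scale each weight by exp(-eta * loss),
    then renormalize so the weights stay on the probability simplex."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three experts, starting from the uniform distribution.
weights = [1 / 3, 1 / 3, 1 / 3]
# Made-up per-round loss vectors; expert 1 consistently incurs the least loss.
loss_rounds = [[1.0, 0.0, 0.5], [1.0, 0.0, 0.5], [0.8, 0.1, 0.4]]
for losses in loss_rounds:
    weights = mwu_step(weights, losses, eta=0.5)
```

After these rounds the weight mass shifts toward the expert with the smallest cumulative loss while the weights still sum to one, mirroring the trace-one constraint in the general symmetric-cone setting.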

翻訳日:2023-07-07 13:06:27 公開日:2023-07-06

# アウト・オブ・ディストリビューション・ジェネリザビリティを持つ大規模視覚言語モデルの蒸留

Distilling Large Vision-Language Model with Out-of-Distribution Generalizability ( http://arxiv.org/abs/2307.03135v1 )

ライセンス: Link先を確認

Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su

(参考訳) 大きなビジョン言語モデルは優れた性能を達成しているが、そのサイズと計算要件により、リソースに制約のあるデバイスや時間に敏感なタスクへのデプロイは現実的ではない。モデル蒸留は、より大きなモデルの性能を維持する、より小さくより高速なモデルを作成するプロセスであり、ソリューションに向けた有望な方向である。本稿では,大規模教師の視覚モデルから軽度学生モデルへの視覚表現の蒸留について,小規模または中規模データセットを用いて検討する。本研究は,従来モデル蒸留の文献では見過ごされてきた課題であるオープン・ボキャブラリー・アウト・オブ・ディストリビューション(ood)の一般化に焦点を当てたものである。 1) 教師の視覚表現空間を模倣し, 教師との視覚・言語連携を慎重に促進すること, (2) 教師の言語表現を情報的かつ細かな意味的属性で豊かにすることで, 異なるラベルを効果的に区別することである。我々は,いくつかの指標を提案し,その手法を検討するために広範囲な実験を行う。その結果,オープン・ボカブラリー・アウト・オブ・ディストリビューション分類におけるゼロショットと少数ショットの学生成績が有意に改善し,提案手法の有効性が示された。私たちのコードはhttps://github.com/xuanlinli17/large_vlm_distillation_oodでリリースされる。

Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance the student's OOD generalization: (1) by better imitating the teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher's language representations with informative and fine-grained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Our code will be released at https://github.com/xuanlinli17/large_vlm_distillation_ood

翻訳日:2023-07-07 13:06:04 公開日:2023-07-06

# テキストからのアートシネマグラフの合成

Synthesizing Artistic Cinemagraphs from Text ( http://arxiv.org/abs/2307.03190v1 )

ライセンス: Link先を確認

Aniruddha Mahapatra, Aliaksandr Siarohin, Hsin-Ying Lee, Sergey Tulyakov, Jun-Yan Zhu

(参考訳) 本稿では,これらの画像の意味や動きを複雑に解釈することを考えると,特徴的想像的要素や芸術的スタイルを推し進める上で,特に困難な作業である,テキスト記述からシネマグラフを作成する完全自動化手法であるArttic Cinemagraphを紹介する。既存の単一画像アニメーション手法は芸術的な入力に不足しており、最近のテキストベースのビデオ手法は時間的不整合をしばしば導入し、特定の領域を静的に保つのに苦労している。これらの課題に対処するために,1つのテキストプロンプトから画像双生児を合成する手法を提案する。芸術的なイメージはテキストに詳述されたスタイルや外観を描写するが、リアルなイメージはレイアウトや動きの分析を大幅に単純化する。既存の自然画像と映像データセットを利用して、現実のイメージを正確に分割し、その意味情報に基づいて、妥当な動きを予測できる。予測された動きは芸術的イメージに転送され、最終的なシネマグラフが作成される。本手法は,自然景観のシネマグラフ作成における既存の手法と,自動計測とユーザ研究によって検証された芸術的・異世界的なシーンに匹敵する手法である。最後に,既存の絵画のアニメーション化と,テキストによる動き方向制御の2つの拡張を示す。

We introduce Artistic Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions - an especially challenging task when prompts feature imaginary elements and artistic styles, given the complexity of interpreting the semantics and motions of these images. Existing single-image animation methods fall short on artistic inputs, and recent text-based video methods frequently introduce temporal inconsistencies, struggling to keep certain regions static. To address these challenges, we propose an idea of synthesizing image twins from a single text prompt - a pair of an artistic image and its pixel-aligned corresponding natural-looking twin. While the artistic image depicts the style and appearance detailed in our text prompt, the realistic counterpart greatly simplifies layout and motion analysis. Leveraging existing natural image and video datasets, we can accurately segment the realistic image and predict plausible motion given the semantic information. The predicted motion can then be transferred to the artistic image to create the final cinemagraph. Our method outperforms existing approaches in creating cinemagraphs for natural landscapes as well as artistic and other-worldly scenes, as validated by automated metrics and user studies. Finally, we demonstrate two extensions: animating existing paintings and controlling motion directions using text.

翻訳日:2023-07-07 12:58:30 公開日:2023-07-06

# ネルソン量子場理論のシミュレーション

Simulating Nelsonian Quantum Field Theory ( http://arxiv.org/abs/2307.03188v1 )

ライセンス: Link先を確認

Andrea Carosso

(参考訳) 我々は、エドワード・ネルソンの確率力学が量子場理論に一般化する際に示唆する物理過程の全体像を、その理論を水素原子に適用した入門的考察の後に記述する。関連する確率過程の数値シミュレーションを行うことで、ネルソンの理論は、格子上で正規化された自由場理論の場合、ジョン・S・ベル(英語版)のフレーズを使うために、基礎となる場から粒子がどのように生じるかという直感的な説明を与える。すると、相互作用するスカラー場理論に一般化すると、この図は質的に似ていると論じる。最後に、Nelsonian フレームワークと QFT の他の様々な提案されたオントロジーを比較し、実効場理論のパラダイムに照らしてそれらの相対的なメリットについて述べる。

We describe the picture of physical processes suggested by Edward Nelson's stochastic mechanics when generalized to quantum field theory, after an introductory review of his theory applied to the hydrogen atom. By performing numerical simulations of the relevant stochastic processes, we observe that Nelson's theory provides an intuitive account of how particles can arise from an underlying field "beable" (to use a phrase of John S. Bell) in the case of free field theory, regularized on a lattice. We then argue that this picture looks qualitatively similar when generalized to interacting scalar field theory. Lastly, we compare the Nelsonian framework to various other proposed ontologies for QFT, and remark upon their relative merits in light of the effective field theory paradigm.

翻訳日:2023-07-07 12:58:07 公開日:2023-07-06

# TGRL:教師指導強化学習のためのアルゴリズム

TGRL: An Algorithm for Teacher Guided Reinforcement Learning ( http://arxiv.org/abs/2307.03186v1 )

ライセンス: Link先を確認

Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal

(参考訳) 報酬(強化学習またはrl)から学び、教師を模倣する学習(教師・学生学習)は、逐次的な意思決定問題を解決するために確立された2つのアプローチである。これらの学習形態の利点を組み合わせるために、強化と教師-学生の学習目標の組合せを最大化するための政策を訓練することが一般的である。しかしながら、これらの目的のバランスをとるための原則的な方法がなければ、以前の研究は2つの目的のバランスをとるためにヒューリスティックスと問題固有のハイパーパラメーターサーチを使用した。私たちは、$\textit{principled}$アプローチと、$\textit{dynamically}$と$\textit{automatically}$ balanceingの近似実装を示します。主な考え方は,教師の指導を伴わず,報酬のみから,エージェントのパフォーマンスとエージェント学習の反事実シナリオを比較して,教師の監督の重要性を調整することである。教師の指導が向上すると、教師の監督の重要性が増し、それ以外は低下する。我々のメソッドである$\textit{Teacher Guided Reinforcement Learning}$ (TGRL)は、ハイパーパラメータチューニングなしで様々なドメインで強いベースラインを上回ります。

Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without a principled method to balance these objectives, prior work used heuristics and problem-specific hyperparameter searches to balance the two objectives. We present a principled approach, along with an approximate implementation, for dynamically and automatically balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased and otherwise it is decreased. Our method, Teacher Guided Reinforcement Learning (TGRL), outperforms strong baselines across diverse domains without hyperparameter tuning.
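The balancing rule described in the abstract (raise the teacher's importance when teacher supervision helps, lower it otherwise) might be sketched as below. The abstract does not give the exact update, so the additive form, the `step_size`, the clipping range, and both function names are assumptions made for illustration.

```python
def update_teacher_weight(weight, return_with_teacher, return_reward_only,
                          step_size=0.1, max_weight=10.0):
    """Adjust the importance of teacher supervision.

    return_with_teacher: measured performance of the teacher-guided agent.
    return_reward_only:  performance of the counterfactual agent that
                         learns from rewards alone.
    The weight grows when the guided agent does better and shrinks
    otherwise; it is clipped to [0, max_weight] (an assumed range).
    """
    gap = return_with_teacher - return_reward_only
    new_weight = weight + step_size * gap
    return min(max(new_weight, 0.0), max_weight)


def combined_objective(rl_objective, imitation_objective, weight):
    """Weighted sum of the RL and teacher-student objectives."""
    return rl_objective + weight * imitation_objective


if __name__ == "__main__":
    w = 1.0
    w = update_teacher_weight(w, return_with_teacher=5.0, return_reward_only=3.0)
    print(w)  # the weight increased because the teacher helped
```

In this sketch the weight is driven by a measured performance gap instead of a hand-tuned schedule, which is the property the abstract emphasizes.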

翻訳日:2023-07-07 12:57:52 公開日:2023-07-06

# チェシャー弦の位相相における弦作用素

String operators for Cheshire strings in topological phases ( http://arxiv.org/abs/2307.03180v1 )

ライセンス: Link先を確認

Nathanan Tantivasadakarn, Xie Chen

(参考訳) 3+1D位相相の初等点電荷励起は線に沿って凝縮し、チェシャー弦と呼ばれる子孫励起を形成する。系の基本的なフラックスループ励起とは異なり、チェシャー弦は2次元円板の境界として現れなくても開線セグメント上に存在する。一方、チェシャー弦は、0dの局所ユニタリと1d以上の有限深さ量子回路で生成できる自明な励起とは異なる。本稿では,チェシャー弦を生成するためには,弦の長さに沿って順次作用する線形深度回路が必要であることを示す。チェシャー弦が生成されると、その変形、運動、融合は有限深度回路によって実現される。この回路深度要件は、対称保護トポロジカル鎖やマヨラナ鎖を含むすべての非自明な子孫励起に適用される。

Elementary point charge excitations in 3+1D topological phases can condense along a line and form a descendant excitation called the Cheshire string. Unlike the elementary flux loop excitations in the system, Cheshire strings do not have to appear as the boundary of a 2D disc and can exist on open line segments. On the other hand, Cheshire strings are different from trivial excitations that can be created with local unitaries in 0d and finite-depth quantum circuits in 1d and higher. In this paper, we show that to create a Cheshire string, one needs a linear-depth circuit that acts sequentially along the length of the string. Once a Cheshire string is created, its deformation, movement and fusion can be realized by finite-depth circuits. This circuit depth requirement applies to all nontrivial descendant excitations including symmetry-protected topological chains and the Majorana chain.

翻訳日:2023-07-07 12:57:26 公開日:2023-07-06

# IPO-LDM:潜伏拡散モデルによる深度360度の室内RGBパノラマ画

IPO-LDM: Depth-aided 360-degree Indoor RGB Panorama Outpainting via Latent Diffusion Model ( http://arxiv.org/abs/2307.03177v1 )

ライセンス: Link先を確認

Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham

(参考訳) 狭視野画像から完全な360度パノラマを生成することは、全方位RGBデータが容易に利用できないため、現在進行中である。既存のGANベースのアプローチは、高品質な出力を実現するための障壁に直面し、異なるマスクタイプに対する一般化性能が劣る。本稿では,潜伏拡散モデル (LDM) を用いた360度室内RGBパノラマ露光モデルであるIPO-LDMを提案する。トレーニング中にRGBと深度パノラマデータの両方を利用する新しいバイモーダル潜伏拡散構造を導入するが、推定時に正常な深度のないRGB画像よりも驚くほどよく機能する。さらに,拡散分別ステップ毎にプログレッシブカメラ回転を導入する新しい手法を提案する。その結果、当社のIPO-LDMは、RGBパノラマのパノラマ画における最先端の手法よりも優れており、さまざまな種類のマスクに対して、多様かつ多様に構造化された結果を得ることができることがわかった。

Generating complete 360-degree panoramas from narrow field of view images is ongoing research as omnidirectional RGB data is not readily available. Existing GAN-based approaches face some barriers to achieving higher quality output, and have poor generalization performance over different mask types. In this paper, we present our 360-degree indoor RGB panorama outpainting model using latent diffusion models (LDM), called IPO-LDM. We introduce a new bi-modal latent diffusion structure that utilizes both RGB and depth panoramic data during training, but works surprisingly well to outpaint normal depth-free RGB images during inference. We further propose a novel technique of introducing progressive camera rotations during each diffusion denoising step, which leads to substantial improvement in achieving panorama wraparound consistency. Results show that our IPO-LDM not only significantly outperforms state-of-the-art methods on RGB panorama outpainting, but can also produce multiple and diverse well-structured results for different types of masks.

翻訳日:2023-07-07 12:57:13 公開日:2023-07-06

# 不均一な特徴サブサンプルリッジアンサンブルの学習曲線

Learning Curves for Heterogeneous Feature-Subsampled Ridge Ensembles ( http://arxiv.org/abs/2307.03176v1 )

ライセンス: Link先を確認

Benjamin S. Ruben, Cengiz Pehlevan

(参考訳) 特徴バッキング(feature bagging)は、ランダムなサブサンプルや特徴の投影のアンサンブルにおいて推定子を訓練することで予測分散を減らすことを目的とした、確立されたセンスリング手法である。通常、アンサンブルは均質であると選択されるが、この意味では、エスティメータが利用できる特徴次元の数はアンサンブル全体で一様である。本稿では,様々な特徴次元に基づいて推定器を組み込んだ不均一な特徴アンサンブルを導入し,その性能を線形回帰条件で検討する。線形予測器のアンサンブルについて検討し,利用可能な特徴のサブセットにリッジ回帰を用いた。これらのサブセットに含まれる機能の数を変更できるようにします。統計物理学からのレプリカのトリックを用いて、決定論的線形マスクを用いたリッジアンサンブルの学習曲線を導出する。等方性特徴雑音を伴う等相関データの場合,学習曲線の明示的な表現を求める。導出表現を用いてサブサンプリングとアンサンブルの効果を調査し,ノイズレベル,データ相関,データ-タスクアライメントのパラメータ空間における最適なアンサンブル戦略の急激な遷移を見出した。最後に,頑健な機械学習のための二重降下を緩和するための戦略として,可変次元特徴バッキングを提案する。

Feature bagging is a well-established ensembling method which aims to reduce prediction variance by training estimators in an ensemble on random subsamples or projections of features. Typically, ensembles are chosen to be homogeneous, in the sense that the number of feature dimensions available to an estimator is uniform across the ensemble. Here, we introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, and consider its performance in a linear regression setting. We study an ensemble of linear predictors, each fit using ridge regression on a subset of the available features. We allow the number of features included in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles with deterministic linear masks. We obtain explicit expressions for the learning curves in the case of equicorrelated data with isotropic feature noise. Using the derived expressions, we investigate the effect of subsampling and ensembling, finding sharp transitions in the optimal ensembling strategy in the parameter space of noise level, data correlations, and data-task alignment. Finally, we suggest variable-dimension feature bagging as a strategy to mitigate double descent for robust machine learning in practice.

翻訳日:2023-07-07 12:56:41 公開日:2023-07-06

# 緑を追い越す - 植物葉を移動して見ることを学ぶ

Push Past Green: Learning to Look Behind Plant Foliage by Moving It ( http://arxiv.org/abs/2307.03175v1 )

ライセンス: Link先を確認

Xiaoyu Zhang, Saurabh Gupta

(参考訳) 自律農業の応用(例えば検査、表現型、摘み果物)には、葉と枝の後ろを見るために植物葉を操作する必要がある。部分的な可視性、極端に粗い構造、植物のための未知の幾何学と力学は、そのような操作を困難にしている。データ駆動方式でこれらの課題に取り組む。 SRPNetは、特定の植物に対する候補アクションの実行時に、どの空間が露呈しているかを予測するニューラルネットワークである。我々は,srpnet とクロスエントロピー法を用いて,葉下空間の解明に有効な行動を予測した。さらに、SRPNetは、どれだけの空間が露光されるかだけでなく、どこで露光されるかを予測するだけでなく、植物葉の下の空間を徐々に明らかにする一連の行動を実行することができる。本研究は, 人工植物(Dracaena) と実植物(Dracaena) を, 新しい植物構成への一般化をテストする2つの設定を含む5つの物理的テストベッド上で実験した。本研究は,手作り力学モデルと関連するアブレーションに対するsrpnetの有効性と,競合する手作り探索法に対する総合的手法であるppgの有効性を明らかにした。

Autonomous agriculture applications (e.g., inspection, phenotyping, plucking fruits) require manipulating the plant foliage to look behind the leaves and the branches. Partial visibility, extreme clutter, thin structures, and unknown geometry and dynamics for plants make such manipulation challenging. We tackle these challenges through data-driven methods. We use self-supervision to train SRPNet, a neural network that predicts what space is revealed on execution of a candidate action on a given plant. We use SRPNet with the cross-entropy method to predict actions that are effective at revealing space beneath plant foliage. Furthermore, as SRPNet does not just predict how much space is revealed but also where it is revealed, we can execute a sequence of actions that incrementally reveal more and more space beneath the plant foliage. We experiment with a synthetic (vines) and a real plant (Dracaena) on a physical test-bed across 5 settings including 2 settings that test generalization to novel plant configurations. Our experiments reveal the effectiveness of our overall method, PPG, over a competitive hand-crafted exploration method, and the effectiveness of SRPNet over a hand-crafted dynamics model and relevant ablations.
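The cross-entropy method mentioned in the abstract is a generic sampling-based optimizer, and a minimal version can be sketched as follows. The scoring function `toy_revealed_space` is a hypothetical stand-in for a learned predictor such as SRPNet, and the population size, elite count, and Gaussian parameterization are assumptions, not the paper's settings.

```python
import random
import statistics


def cross_entropy_method(score, dim, iterations=20, population=50,
                         n_elite=10, init_std=5.0, min_std=1e-3, seed=0):
    """Maximize `score` over R^dim with a diagonal-Gaussian sampler.

    Each iteration samples candidate actions, keeps the best-scoring
    ones (the elites), and refits the mean and standard deviation of
    the sampler to those elites.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    for _ in range(iterations):
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(population)]
        candidates.sort(key=score, reverse=True)
        elites = candidates[:n_elite]
        mean = [statistics.fmean(e[d] for e in elites) for d in range(dim)]
        # Keep a small floor on the spread so sampling never degenerates.
        std = [max(statistics.pstdev([e[d] for e in elites]), min_std)
               for d in range(dim)]
    return mean


def toy_revealed_space(action):
    """Hypothetical score: peaks at the action (3, -1)."""
    x, y = action
    return -((x - 3.0) ** 2 + (y + 1.0) ** 2)


if __name__ == "__main__":
    print(cross_entropy_method(toy_revealed_space, dim=2))
```

In the setting the abstract describes, the score would presumably come from the network's prediction of revealed space for each candidate action, so the optimizer needs no gradients of the predictor.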

翻訳日:2023-07-07 12:56:18 公開日:2023-07-06

# 中間の損失:言語モデルが長い文脈をどのように使うか

Lost in the Middle: How Language Models Use Long Contexts ( http://arxiv.org/abs/2307.03172v1 )

ライセンス: Link先を確認

Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

(参考訳) 最近の言語モデルは、長いコンテキストを入力として扱うことができるが、言語モデルがいかに長いコンテキストを使用するかは、比較的分かっていない。入力コンテキスト内の関連情報を識別する必要のある2つのタスクにおける言語モデルのパフォーマンスを分析する。入力コンテキストの開始時や終了時に関連情報が生じた場合、性能が最も高く、長いコンテキストの途中でモデルが関連する情報にアクセスしなければならない場合、大幅に低下する。さらに、明示的な長期コンテキストモデルであっても、入力コンテキストが長くなるにつれてパフォーマンスが大幅に低下する。分析は、言語モデルが入力コンテキストをどのように利用するかをよりよく理解し、将来のロングコンテキストモデルのための新しい評価プロトコルを提供する。

While recent language models have the ability to take long contexts as input, relatively little is known about how well the language models use longer context. We analyze language model performance on two tasks that require identifying relevant information within their input contexts: multi-document question answering and key-value retrieval. We find that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts. Furthermore, performance substantially decreases as the input context grows longer, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context models.
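A synthetic key-value retrieval probe of the kind the abstract describes could be built roughly as follows: a JSON object of random keys and values, with the queried pair placed at a chosen position in the context. The prompt wording, the use of UUID-like strings, and the function name `build_kv_probe` are assumptions for illustration, not the authors' exact protocol.

```python
import json
import random
import uuid


def build_kv_probe(num_pairs, target_position, seed=0):
    """Build a retrieval prompt whose answer sits at `target_position`.

    Returns the prompt, the serialized JSON context, the queried key,
    and the value a model is expected to return.
    """
    rng = random.Random(seed)

    def random_uuid():
        # Seeded 128-bit identifiers keep the probe reproducible.
        return str(uuid.UUID(int=rng.getrandbits(128)))

    pairs = [(random_uuid(), random_uuid()) for _ in range(num_pairs)]
    query_key, expected_value = pairs[target_position]
    # dict and json.dumps both preserve insertion order.
    context = json.dumps(dict(pairs), indent=1)
    prompt = (
        "Extract the value corresponding to the specified key "
        "in the JSON object below.\n\n"
        f"{context}\n\n"
        f'Key: "{query_key}"\nCorresponding value:'
    )
    return prompt, context, query_key, expected_value


if __name__ == "__main__":
    prompt, _, key, value = build_kv_probe(num_pairs=5, target_position=2)
    print(prompt)
```

Sweeping `target_position` from the first to the last index while holding `num_pairs` fixed is one way to measure how accuracy varies with the position of the relevant information.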

翻訳日:2023-07-07 12:55:56 公開日:2023-07-06

# LEO:多目的2値決定図のための効率的な順序付け学習

LEO: Learning Efficient Orderings for Multiobjective Binary Decision Diagrams ( http://arxiv.org/abs/2307.03171v1 )

ライセンス: Link先を確認

Rahul Patel, Elias B. Khalil

(参考訳) 双対決定図(BDD)に基づくアプローチは、最近、多目的整数プログラミング問題に対する最先端の結果を得た。 BDDの構築に使用される変数の順序付けは、そのサイズや、単一目的の最適化問題に対する緩和あるいは制限されたBDDから派生したバウンダリの品質に大きな影響を与える可能性がある。我々はまず,多目的ナップサック問題に対するpareto frontier(pf)列挙時間に対する変数順序付けの類似性を示し,多目的bddアプローチのスケーラビリティを向上させる変数順序付けメソッドの導出の必要性を示唆する。そこで我々は,小さな解釈可能かつ容易に計算可能な変数特徴セットにおいて線形な変数スコアリング関数に基づいて,新しいパラメータ構成空間を導出する。ブラックボックス最適化を用いて構成空間を効率的に探索し、次元の呪いを回避し(変数数と目的数)、PF列挙時間を短縮する優れた順序付けを見つける方法を示す。しかし、ブラックボックス最適化アプローチは、良好な変数順序付けによる時間の削減よりも大きい計算オーバーヘッドを伴います。この問題を軽減するために、列挙時間を削減する効率的な変数順序付けを見つけるための教師付き学習手法LEOを提案する。クナプサック問題の3～7の目的と最大80の変数によるベンチマークセットの実験では、LEOは一般的な順序付け戦略やアルゴリズム構成よりも30～300%、PF列挙では10～200%高速であることが示されている。私たちのコードとインスタンスはhttps://github.com/khalil-research/leoで利用可能です。

Approaches based on binary decision diagrams (BDDs) have recently achieved state-of-the-art results for multiobjective integer programming problems. The variable ordering used in constructing BDDs can have a significant impact on their size and on the quality of bounds derived from relaxed or restricted BDDs for single-objective optimization problems. We first showcase a similar impact of variable ordering on the Pareto frontier (PF) enumeration time for the multiobjective knapsack problem, suggesting the need for deriving variable ordering methods that improve the scalability of the multiobjective BDD approach. To that end, we derive a novel parameter configuration space based on variable scoring functions which are linear in a small set of interpretable and easy-to-compute variable features. We show how the configuration space can be efficiently explored using black-box optimization, circumventing the curse of dimensionality (in the number of variables and objectives), and finding good orderings that reduce the PF enumeration time. However, black-box optimization approaches incur a computational overhead that outweighs the reduction in time due to good variable ordering. To alleviate this issue, we propose LEO, a supervised learning approach for finding efficient variable orderings that reduce the enumeration time. Experiments on benchmark sets from the knapsack problem with 3-7 objectives and up to 80 variables show that LEO is ~30-300% and ~10-200% faster at PF enumeration than common ordering strategies and algorithm configuration, respectively. Our code and instances are available at https://github.com/khalil-research/leo.
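The idea of ordering variables by a scoring function that is linear in a few per-variable features can be illustrated for a multiobjective knapsack instance as follows. The three features used here (item weight, mean objective value, mean value-to-weight ratio) and the function names are hypothetical choices; the abstract does not list the actual feature set.

```python
def variable_features(weights, values_per_objective):
    """Compute a small feature tuple for each knapsack variable.

    weights: positive item weights, one per variable.
    values_per_objective: one list of item values per objective.
    """
    features = []
    for i, w in enumerate(weights):
        vals = [objective[i] for objective in values_per_objective]
        mean_val = sum(vals) / len(vals)
        features.append((w, mean_val, mean_val / w))
    return features


def order_variables(features, coefficients):
    """Return variable indices sorted by decreasing linear score."""
    scores = [sum(c * f for c, f in zip(coefficients, feat))
              for feat in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    feats = variable_features([2, 4, 1], [[4, 4, 3], [2, 6, 1]])
    # Score only by mean value-to-weight ratio.
    print(order_variables(feats, (0.0, 0.0, 1.0)))
```

Under this parameterization, searching for a good ordering reduces to searching over the coefficient vector, whose length depends on the number of features and not on the number of variables.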

翻訳日:2023-07-07 12:55:43 公開日:2023-07-06

# Focused Transformer: コンテキストスケーリングのためのコントラストトレーニング

Focused Transformer: Contrastive Training for Context Scaling ( http://arxiv.org/abs/2307.03170v1 )

ライセンス: Link先を確認

Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś

(参考訳) 大規模言語モデルは、文脈的に新しい情報を組み込む特別な能力を持っている。しかし、そのようなアプローチの完全なポテンシャルは、有効文脈長の制限のためにしばしば抑制される。この問題の解決策の1つは、(キー、値)ペアからなる外部メモリへのアクセスを持つ注意層を提供することである。しかし、文書の数が増えるにつれて、関連するキーの無関係なキーに対する割合が減少し、無関係なキーにもっと集中するようになる。そこでは、異なるセマンティックな値に関連付けられたキーが重複し、区別が困難になる可能性がある。そこで,本研究では,コントラスト学習に触発された学習プロセスを用いる手法であるフォーカストランスフォーマ(fot)を提案する。この新しいアプローチは(キー、値)空間の構造を強化し、コンテキスト長の拡張を可能にする。提案手法では,既存の大規模モデルを微調整して有効コンテキストを延長することができる。これは3b$と7b$ openllamaチェックポイントの微調整で示されています。結果として得られたモデルはLongLLaMAと呼ばれ、長いコンテキストを必要とするタスクの進歩を示す。さらに,我々のLongLLaMAモデルではパスキー検索のコンテキスト長が256k$であることを示す。

Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of such an approach is often restrained due to a limitation in the effective context length. One solution to this issue is to endow an attention layer with access to an external memory, which comprises (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus more on the irrelevant keys. We identify a significant challenge, dubbed the distraction issue, where keys linked to different semantic values might overlap, making them hard to distinguish. To tackle this problem, we introduce the Focused Transformer (FoT), a technique that employs a training process inspired by contrastive learning. This novel approach enhances the structure of the (key, value) space, enabling an extension of the context length. Our method allows for fine-tuning pre-existing, large-scale models to lengthen their effective context. This is demonstrated by our fine-tuning of 3B and 7B OpenLLaMA checkpoints. The resulting models, which we name LongLLaMA, exhibit advancements in tasks requiring a long context. We further illustrate that our LongLLaMA models adeptly manage a 256k context length for passkey retrieval.

翻訳日:2023-07-07 12:55:14 公開日:2023-07-06

# リカレントトレンド予測ニューラルネットワークに基づく予測組込みスケジューリングによるスマートホーム環境の再生可能エネルギー管理

Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network ( http://arxiv.org/abs/2307.01622v2 )

ライセンス: Link先を確認

Mert Nakıp, Onur Çopur, Emrah Biyik, Cüneyt Güzeliş

(参考訳) スマートホームエネルギー管理システムは、配電網をより効率的かつ確実に運用し、分散型再生可能エネルギー源の効果的な普及を可能にする。これらのシステムは、需要と再生可能生成の不確実性を扱うことのできる堅牢な予測、最適化、制御/スケジューリングアルゴリズムに依存している。本稿では,Recurrent Trends Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES)と呼ばれるMLアルゴリズムを提案する。 rTPNN-FESは、再生可能エネルギーの発生と家電のスケジュールを同時に予測する新しいニューラルネットワークアーキテクチャである。組込み構造により、rTPNN-FESは予測とスケジューリングのための別々のアルゴリズムの使用を排除し、予測エラーに対して堅牢なスケジュールを生成する。本稿では,iot対応スマートホームにおける提案アルゴリズムの性能評価も行う。評価結果から, rTPNN-FESは最適化よりも37.5ドルの速さで, 最先端予測技術より優れていることがわかった。

Smart home energy management systems help the distribution grid operate more efficiently and reliably, and enable effective penetration of distributed renewable energy sources. These systems rely on robust forecasting, optimization, and control/scheduling algorithms that can handle the uncertain nature of demand and renewable generation. This paper proposes an advanced ML algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), to provide efficient residential demand control. rTPNN-FES is a novel neural network architecture that simultaneously forecasts renewable energy generation and schedules household appliances. By its embedded structure, rTPNN-FES eliminates the utilization of separate algorithms for forecasting and scheduling and generates a schedule that is robust against forecasting errors. This paper also evaluates the performance of the proposed algorithm for an IoT-enabled smart home. The evaluation results reveal that rTPNN-FES provides near-optimal scheduling 37.5 times faster than the optimization while outperforming state-of-the-art forecasting techniques.

翻訳日:2023-07-07 11:11:26 公開日:2023-07-06

# イントロスペクティブロボット組立のための正規化フローを用いた密度ベースフィージビリティ学習

Density-based Feasibility Learning with Normalizing Flows for Introspective Robotic Assembly ( http://arxiv.org/abs/2307.01317v2 )

ライセンス: Link先を確認

Jianxiang Feng, Matan Atad, Ismael Rodríguez, Maximilian Durner, Stephan Günnemann, Rudolph Triebel

(参考訳) ロボットアセンブリシーケンスプランニング(RASP)における機械学習(ML)モデルは、予測されたソリューション、すなわち、潜在的効率劣化を回避するために、イントロスペクティブである必要がある。以前の作業では、トレーニング中に実現可能な例と実行不可能な例の両方が必要です。しかし、新しい製品に素早く適応するために再トレーニングが必要な場合、実現不可能なものは十分な収集が困難である。本研究では,実例のみを必要とする密度ベース実現可能性学習手法を提案する。具体的には,複雑な確率分布を推定するための強力な生成モデルである正規化フロー(nf)を用いて,分散(ood)検出として実現可能性学習問題を定式化する。実証的に,提案手法はロボットアセンブリのユースケースで実証され,実現不可能なアセンブリの検出において,他の単一クラスベースラインよりも優れる。さらに,本手法の内部動作機構について検討し,NFの高度変種に基づいて大きなメモリ節約が得られることを示す。

Machine Learning (ML) models in Robotic Assembly Sequence Planning (RASP) need to be introspective on the predicted solutions, i.e. whether they are feasible or not, to circumvent potential efficiency degradation. Previous works need both feasible and infeasible examples during training. However, the infeasible ones are hard to collect sufficiently when re-training is required for swift adaptation to new product variants. In this work, we propose a density-based feasibility learning method that requires only feasible examples. Concretely, we formulate the feasibility learning problem as Out-of-Distribution (OOD) detection with Normalizing Flows (NF), which are powerful generative models for estimating complex probability distributions. Empirically, the proposed method is demonstrated on robotic assembly use cases and outperforms other single-class baselines in detecting infeasible assemblies. We further investigate the internal working mechanism of our method and show that a large memory saving can be obtained based on an advanced variant of NF.

翻訳日:2023-07-07 11:11:08 公開日:2023-07-06

# 信頼性の高いAI:次世代の量子コンピューティングは必要か?

Reliable AI: Does the Next Generation Require Quantum Computing? ( http://arxiv.org/abs/2307.01301v2 )

ライセンス: Link先を確認

Aras Bacho, Holger Boche, Gitta Kutyniok

(参考訳) 本研究では、次世代の人工知能が量子コンピューティングを必要とするかどうかという根本的な疑問を探究する。人工知能は私たちの日常生活において重要な役割を担っており、第4次産業革命の中心となっている。したがって、人工知能が信頼性と信頼性を持つことが必須である。しかし、自動運転、医療、ロボティクスなどの分野において、プライバシ、責任、安全性、セキュリティなど、人工知能の信頼性にはまだ多くの問題がある。これらの問題には、不十分なデータ、バイアス、堅牢性問題、およびデジタルハードウェアにおける計算可能性問題など、様々な原因がある。これらの計算可能性問題の原因は、デジタルハードウェアが本質的に離散的なチューリングマシンの計算モデルに基づいているという事実にある。特に,デジタルハードウェアは最適化,深層学習,微分方程式の問題解決に本質的に制約されている。したがって、これらの制限は人工知能の分野、特に機械学習に重大な意味を持つ。さらに、量子コンピュータがある種の問題に対して量子的優位性を示すことはよく知られているが、量子回路や量子チューリングマシンのパラダイムに基づく量子コンピューティングモデルを使用する場合、これらの制限の一部は持続する。対照的に、Blum-Shub-Smale マシンのようなアナログコンピューティングモデルは、これらの制限を克服する可能性を示している。

In this survey, we aim to explore the fundamental question of whether the next generation of artificial intelligence requires quantum computing. Artificial intelligence is increasingly playing a crucial role in many aspects of our daily lives and is central to the fourth industrial revolution. It is therefore imperative that artificial intelligence is reliable and trustworthy. However, there are still many issues with reliability of artificial intelligence, such as privacy, responsibility, safety, and security, in areas such as autonomous driving, healthcare, robotics, and others. These problems can have various causes, including insufficient data, biases, and robustness problems, as well as fundamental issues such as computability problems on digital hardware. The cause of these computability problems is rooted in the fact that digital hardware is based on the computing model of the Turing machine, which is inherently discrete. Notably, our findings demonstrate that digital hardware is inherently constrained in solving problems about optimization, deep learning, or differential equations. Therefore, these limitations carry substantial implications for the field of artificial intelligence, in particular for machine learning. Furthermore, although it is well known that the quantum computer shows a quantum advantage for certain classes of problems, our findings establish that some of these limitations persist when employing quantum computing models based on the quantum circuit or the quantum Turing machine paradigm. In contrast, analog computing models, such as the Blum-Shub-Smale machine, exhibit the potential to surmount these limitations.

翻訳日:2023-07-07 11:10:48 公開日:2023-07-06

# 医用画像合成のための3次元潜伏拡散モデルにおけるデータ記憶の検討

Investigating Data Memorization in 3D Latent Diffusion Models for Medical Image Synthesis ( http://arxiv.org/abs/2307.01148v2 )

ライセンス: Link先を確認

Salman Ul Hassan Dar, Arman Ghanaat, Jannik Kahmann, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Sandy Engelhardt

(参考訳) 生成潜在拡散モデルはデータ生成の最先端として確立されている。有望な応用の1つは、患者のプライバシーを損なうことなく、オープンデータ共有のための現実的な合成医療画像データを生成することである。それにもかかわらず、敏感な患者のトレーニングデータを記憶し、トレーニングデータによく似たサンプルを合成するモデルの能力は、比較的未調査である。本稿では, 冠動脈造影および膝磁気共鳴画像データセットを用いた3次元潜時拡散モデルの記憶能力の評価を行った。トレーニングサンプルの潜在的な暗記を検出するために,コントラスト学習に基づく自己教師型モデルを用いる。以上の結果から,このような潜伏拡散モデルがトレーニングデータを記憶し,記憶化を緩和するための戦略を考案する必要があることが示唆された。

Generative latent diffusion models have been established as state-of-the-art in data generation. One promising application is generation of realistic synthetic medical imaging data for open data sharing without compromising patient privacy. Despite the promise, the capacity of such models to memorize sensitive patient training data and synthesize samples showing high resemblance to training data samples is relatively unexplored. Here, we assess the memorization capacity of 3D latent diffusion models on photon-counting coronary computed tomography angiography and knee magnetic resonance imaging datasets. To detect potential memorization of training samples, we utilize self-supervised models based on contrastive learning. Our results suggest that such latent diffusion models indeed memorize training data, and there is a dire need for devising strategies to mitigate memorization.

翻訳日:2023-07-07 11:10:24 公開日:2023-07-06

# REAL: アクティブラーニングのための代表的エラー駆動アプローチ

REAL: A Representative Error-Driven Approach for Active Learning ( http://arxiv.org/abs/2307.00968v2 )

ライセンス: Link先を確認

Cheng Chen, Yong Wang, Lizi Liao, Yueguo Chen, Xiaoyong Du

(参考訳) ラベル付け予算が限られているため、active learning(al)はラベルのないプールから最も有益なインスタンスをサンプリングし、その後のモデルトレーニングのためにラベルを取得することを目的としている。これを達成するため、ALは通常、不確実性と多様性に基づいてラベルなしのインスタンスの情報性を測定する。しかし、モデルの性能を向上させる大きな可能性を持つ近傍誤差密度の誤例は考慮していない。この制限に対処するために、$REAL$という新しいアプローチを提案し、$\underline{R}$epresentative $\underline{E}$rrors for $\underline{A}$ctive $\underline{L}$earning。クラスタ内の少数派予測を 'emph{pseudo error} と識別し、推定エラー密度に基づいてクラスタの適応的なサンプリング予算を割り当てる。 5つのテキスト分類データセットの大規模な実験により、$REAL$は、幅広いハイパーパラメータ設定における精度とF1-macroスコアに関するすべての最高のパフォーマンスベースラインを一貫して上回ります。我々の分析によると、$REAL$は決定境界に沿った地道誤差の分布と一致する最も代表的な擬似エラーを選択する。私たちのコードはhttps://github.com/withchencheng/ECML_PKDD_23_Realで公開されています。

Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose REAL, a novel approach to select data instances with Representative Errors for Active Learning. It identifies minority predictions as pseudo errors within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that REAL consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. Our analysis also shows that REAL selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.
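The two steps named in the abstract, treating minority predictions within a cluster as pseudo errors and then splitting the labeling budget according to estimated error density, can be sketched as below. The abstract does not state the allocation formula, so the proportional split with largest-remainder rounding is an assumption, as are the function names.

```python
from collections import Counter, defaultdict


def pseudo_errors_by_cluster(cluster_ids, predictions):
    """Map each cluster to the indices whose prediction is in the minority.

    The majority predicted label of a cluster is taken as its reference
    label; every member predicted differently counts as a pseudo error.
    """
    members = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        members[cluster].append(idx)
    result = {}
    for cluster, idxs in members.items():
        counts = Counter(predictions[i] for i in idxs)
        majority_label, _ = counts.most_common(1)[0]
        result[cluster] = [i for i in idxs if predictions[i] != majority_label]
    return result


def allocate_budget(pseudo_errors, cluster_sizes, budget):
    """Split `budget` across clusters in proportion to error density."""
    density = {c: len(errs) / cluster_sizes[c]
               for c, errs in pseudo_errors.items()}
    total = sum(density.values())
    if total == 0:
        return {c: 0 for c in density}
    raw = {c: budget * d / total for c, d in density.items()}
    alloc = {c: int(r) for c, r in raw.items()}
    # Hand out what rounding left over, largest fractional part first.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


if __name__ == "__main__":
    clusters = [0, 0, 0, 0, 1, 1, 1, 1]
    preds = ["a", "a", "a", "b", "x", "y", "z", "x"]
    errs = pseudo_errors_by_cluster(clusters, preds)
    print(errs, allocate_budget(errs, {0: 4, 1: 4}, budget=3))
```

This sketch does not cap a cluster's allocation at its number of pseudo errors; a fuller implementation would need some rule for redistributing any excess.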

翻訳日:2023-07-07 11:10:10 公開日:2023-07-06

# 超高分解能セグメンテーションのための空間整合性誘導パッチグルーピングウェーブレット変換器

Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation ( http://arxiv.org/abs/2307.00711v2 )

ライセンス: Link先を確認

Deyi Ji, Feng Zhao, Hongtao Lu

(参考訳) 既存の超高分解能(UHR)セグメンテーション手法は、メモリコストと局所特性のバランスをとるジレンマに常に苦労している。この研究において、gpwformerはtransform($\mathcal{t}$)-cnn($\mathcal{c}$)相互傾きフレームワークであり、$\mathcal{t}$はuhrイメージ全体を入力として、局所的な詳細と細かな長距離のコンテキスト依存性の両方を収集する。高い推論速度と計算の複雑さのために、$\mathcal{t}$ は元の uhr 画像をパッチに分割し、動的にグループ化し、軽量の multi-head wavelet transformer (wformer) ネットワークで低レベルなローカル詳細を学ぶ。一方で、このプロセスでは、空間領域から遠く離れたパッチを同じグループに割り当てることもできるため、細かな長距離のコンテキスト依存性もキャプチャされる。さらに、$\mathcal{c}$で生成されるマスクを使用してパッチグループ化プロセスをガイドし、ヒューリスティックス決定を提供する。さらに、パッチ間の空間的一貫性を維持するために、2つのブランチ間の共役制約も活用する。全体としては、マルチステージのプロセスをピラミッド的な方法で積み重ねます。 GPWFormerは5つのベンチマークデータセットで大幅に改善され、既存のメソッドよりも優れていた。

Most existing ultra-high resolution (UHR) segmentation methods struggle to balance memory cost against local characterization accuracy. Our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer) takes both into account and achieves impressive performance. GPWFormer is a Transformer ($\mathcal{T}$)-CNN ($\mathcal{C}$) mutual learning framework, where $\mathcal{T}$ takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies, while $\mathcal{C}$ takes a downsampled image as input for learning the category-wise deep context. For the sake of high inference speed and low computation complexity, $\mathcal{T}$ partitions the original UHR image into patches and groups them dynamically, then learns the low-level local details with the lightweight multi-head Wavelet Transformer (WFormer) network. Meanwhile, the fine-grained long-range contextual dependencies are also captured during this process, since patches that are far apart in the spatial domain can be assigned to the same group. In addition, masks produced by $\mathcal{C}$ are used to guide the patch grouping process, providing a heuristic decision. Moreover, the congruence constraints between the two branches are exploited to maintain the spatial consistency among the patches. Overall, we stack the multi-stage process in a pyramid way. Experiments show that GPWFormer outperforms the existing methods with significant improvements on five benchmark datasets.

翻訳日:2023-07-07 11:09:42 公開日:2023-07-06

# 一般量子マルコフ過程のヒット時間について

On Hitting Times for General Quantum Markov Processes ( http://arxiv.org/abs/2210.10188v3 )

ライセンス: Link先を確認

Lorenzo Laneve, Francesco Tacchino, Ivano Tavernelli

(参考訳) ランダムウォーク(英: Random walk、またはMarkov chains)は、理論計算機科学で広く使われているモデルである。打つ時間や混合時間などの量の分析を含むいくつかのツールは、ランダム化されたアルゴリズムを考案するのに役立ちます。注目すべき例はsch\"oning's algorithm for the satisfiability (sat) problemである。本研究では,古典的ウォークを直接一般化する量子マルコフ連鎖モデルを定義するために密度行列形式を用い,古典的理論で見られるものと同様の公式で時間を打つような共通ツールが計算できることを示し,グロバーのアルゴリズムのような既知の量子的設定に適用する。

Random walks (or Markov chains) are models extensively used in theoretical computer science. Several tools, including analysis of quantities such as hitting and mixing times, are helpful for devising randomized algorithms. A notable example is Schöning's algorithm for the satisfiability (SAT) problem. In this work, we use the density-matrix formalism to define a quantum Markov chain model which directly generalizes classical walks, and we show that common tools such as hitting times can be computed with a formula similar to the one found in the classical theory, which we then apply to known quantum settings such as Grover's algorithm.
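
As a point of reference for the classical formula mentioned in this abstract, expected hitting times of a classical Markov chain can be obtained from a linear system: if Q is the transition matrix restricted to the non-target states, then h = (I - Q)^{-1} 1. The sketch below is only an illustration of that classical fact. The cycle-walk example and the function names are assumptions made here, not taken from the paper, and the quantum generalization is not shown.

```python
import numpy as np

def hitting_times(P, target):
    """Expected number of steps to first reach `target` from every state."""
    n = P.shape[0]
    others = [s for s in range(n) if s != target]
    # Q is the chain restricted to non-target states.
    Q = P[np.ix_(others, others)]
    # Solve (I - Q) h = 1 for the non-target states.
    h_others = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    h = np.zeros(n)
    h[others] = h_others
    return h

def cycle_walk(n):
    """Simple symmetric random walk on a cycle with n nodes (illustrative chain)."""
    P = np.zeros((n, n))
    for s in range(n):
        P[s, (s - 1) % n] = 0.5
        P[s, (s + 1) % n] = 0.5
    return P

h = hitting_times(cycle_walk(6), target=0)
print(h)
```

For this walk the result can be checked against the closed form k(n - k) for the hitting time from node k to node 0.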

翻訳日:2023-07-07 11:09:09 公開日:2023-07-06

# 自然言語証明計画のための帰納的加法

Deductive Additivity for Planning of Natural Language Proofs ( http://arxiv.org/abs/2307.02472v2 )

ライセンス: Link先を確認

Zayne Sprague, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett

(参考訳) マルチステップのクレーム検証のために設計された現在の自然言語システムは、2つのフェーズで運用される: ヒューリスティック(計画)を用いて関連する前提文の集合を検索し、大きな言語モデル(推論)を使用してそれらのステートメントから新しい結論を生成する。計画ステップは、しばしば高価なトランスフォーマー操作を必要とし、任意の数の前提ステートメントにスケールしない。本稿では,帰納的推論に適合する埋め込み空間を通じて,効率的な計画ヒューリスティックが可能かどうかを検討する。具体的には、埋め込み空間が帰納的加法 (deductive additivity) と呼ばれる性質を示すかどうかを評価する: 前提文の和は、それらの前提に基づく結論の埋め込みに近いべきである。我々は,GPT3からの細調整された埋め込みやBM25からのスパース埋め込みに加えて,既成の密着な埋め込みの複数の源を探究する。本研究は, 帰納的加法の性質が持つか, 極端なか, 自然言語証明生成における計画支援に利用するか, 両方の組込みモデルを本質的に検討した。最後に,Single-Step Reasoning Contrast(SSRC)というデータセットを作成し,さまざまな推論タイプのパフォーマンスを調査する。以上より,標準組込み手法は,前提の和に近い結論をしばしば埋め込むが,それらは効果的なヒューリスティックであり,推論の特定のカテゴリをモデル化する能力に欠けることが示唆された。

Current natural language systems designed for multi-step claim validation typically operate in two phases: retrieve a set of relevant premise statements using heuristics (planning), then generate novel conclusions from those statements using a large language model (deduction). The planning step often requires expensive Transformer operations and does not scale to arbitrary numbers of premise statements. In this paper, we investigate whether an efficient planning heuristic is possible via embedding spaces compatible with deductive reasoning. Specifically, we evaluate whether embedding spaces exhibit a property we call deductive additivity: the sum of premise statement embeddings should be close to embeddings of conclusions based on those premises. We explore multiple sources of off-the-shelf dense embeddings in addition to fine-tuned embeddings from GPT3 and sparse embeddings from BM25. We study embedding models both intrinsically, evaluating whether the property of deductive additivity holds, and extrinsically, using them to assist planning in natural language proof generation. Lastly, we create a dataset, Single-Step Reasoning Contrast (SSRC), to further probe performance on various reasoning types. Our findings suggest that while standard embedding methods frequently embed conclusions near the sums of their premises, they fall short of being effective heuristics and lack the ability to model certain categories of reasoning.
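
The deductive additivity property described above can be sketched as a planning heuristic: score each candidate set of premises by how close the sum of their embeddings is to the embedding of the goal. The snippet below is a minimal illustration with hand-made toy vectors. The vectors, the premise names, and the function names are all hypothetical, and no real embedding model is used.

```python
from itertools import combinations

import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def additivity_score(premise_vecs, conclusion_vec):
    """Similarity between the summed premise embeddings and the conclusion embedding."""
    return cosine(np.sum(premise_vecs, axis=0), conclusion_vec)

def best_premise_pair(premises, goal_vec):
    """Pick the pair of premises whose embedding sum is closest to the goal."""
    scored = [
        ((a, b), additivity_score([premises[a], premises[b]], goal_vec))
        for a, b in combinations(sorted(premises), 2)
    ]
    return max(scored, key=lambda item: item[1])

# Toy embeddings (hypothetical); a real system would obtain these from an encoder.
premises = {
    "p1": np.array([1.0, 0.0, 0.0]),
    "p2": np.array([0.0, 1.0, 0.0]),
    "p3": np.array([0.0, 0.0, 1.0]),
}
goal = np.array([1.0, 1.0, 0.0])

best_pair, best_score = best_premise_pair(premises, goal)
print(best_pair, best_score)
```

Scoring by vector sums avoids running a Transformer over every premise combination, which is the efficiency motivation the abstract gives.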

翻訳日:2023-07-07 11:03:15 公開日:2023-07-06

# マルチコントラストMRIにおけるDual Arbitrary Scale Super-Resolution

Dual Arbitrary Scale Super-Resolution for Multi-Contrast MRI ( http://arxiv.org/abs/2307.02334v2 )

ライセンス: Link先を確認

Jiamiao Zhang, Yichen Chi, Jun Lyu, Wenming Yang, Yapeng Tian

(参考訳) イメージングシステムによって制限された部分的計測からMRI画像の再構成は、医療画像研究に不可欠である。異なる撮像モードのマルチコントラストmr画像の多様かつ相補的な情報から、マルチコントラストスーパーレゾリューション(sr)再構成は高品質のsr画像が得られると期待されている。医学的シナリオでは、多くのMRI SR法で用いられるように、病変を完全に可視化するために、放射線医は固定スケールではなく任意のスケールでMRI画像を拡大することに慣れている。さらに、既存のマルチコントラストMRI SR法では、参照画像の固定解像度を必要とすることが多く、参照画像の取得が困難になり、任意のスケールの SR タスクに制限が課される。これらの問題に対処するため,我々はDual-ArbNetと呼ばれる2軸マルチコントラストMRI超解像法を提案する。まず,対象画像と参照画像の解像度を特徴エンコーダで分離し,ネットワークが任意のスケールで対象画像と参照画像を入力できるようにする。そして、暗黙の融合復号器がマルチコントラスト特徴を融合し、インプリシット復号関数~(IDF)を用いて最終的なMRI SR結果を得る。さらに,我々のネットワークをトレーニングするためのカリキュラム学習戦略を導入し,dual-arbnetの一般化と性能を向上させる。 2つの公開MRIデータセットにおける広範囲な実験により、我々の手法は異なるスケール要因下で最先端のアプローチよりも優れており、臨床実践において大きな可能性を秘めていることが示された。

Limited by imaging systems, the reconstruction of Magnetic Resonance Imaging (MRI) images from partial measurement is essential to medical imaging research. Benefiting from the diverse and complementary information of multi-contrast MR images in different imaging modalities, multi-contrast Super-Resolution (SR) reconstruction is promising to yield SR images with higher quality. In the medical scenario, to fully visualize the lesion, radiologists are accustomed to zooming the MR images at arbitrary scales rather than using a fixed scale, as used by most MRI SR methods. In addition, existing multi-contrast MRI SR methods often require a fixed resolution for the reference image, which makes acquiring reference images difficult and imposes limitations on arbitrary-scale SR tasks. To address these issues, we propose a dual-arbitrary multi-contrast MRI super-resolution method based on implicit neural representations, called Dual-ArbNet. First, we decouple the resolution of the target and reference images with a feature encoder, enabling the network to take target and reference images at arbitrary scales as input. Then, an implicit fusion decoder fuses the multi-contrast features and uses an Implicit Decoding Function (IDF) to obtain the final MRI SR results. Furthermore, we introduce a curriculum learning strategy to train our network, which improves the generalization and performance of Dual-ArbNet. Extensive experiments on two public MRI datasets demonstrate that our method outperforms state-of-the-art approaches under different scale factors and has great potential in clinical practice.

翻訳日:2023-07-07 11:02:47 公開日:2023-07-06

# ChatGPT生成データを用いたソーシャルメディアからの抑うつ症状の検索

Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media ( http://arxiv.org/abs/2307.02313v2 )

ライセンス: Link先を確認

Ana-Maria Bucur

(参考訳) 本稿では,抑うつ症状の検索におけるeRisk LabタスクにおけるBLUEチームの貢献について述べる。このタスクは、BDI-IIアンケートからうつ病の症状を伝えるRedditのソーシャルメディア文の検索とランキングから成り立っている。 llmsが提供した合成データがデータ拡張と下流モデルの微調整の信頼できる方法であることが証明されていることから,bdi-iiアンケートの症状ごとにchatgptを用いて合成データを生成する方法を選択した。生成したデータは各質問に対するBDI-II応答よりもリッチでセマンティックな多様性を含み、同時にReddit上でのより親密な体験共有に特有な感情的・逸話的体験を含むようにプロンプトを設計した。意味探索を行い,コサイン類似性により文のBDI-II症状との関連をランク付けする。 MentalRoBERTaとMPNetの変種である2つの最先端トランスフォーマーモデルを用いて,BDI-IIのオリジナルおよび生成された応答であるソーシャルメディアポストを埋め込んだ。その結果,意味探索用に設計されたモデルからの文の埋め込みは,メンタルヘルスデータに基づいて事前学習したモデルからの埋め込みよりも優れていることがわかった。さらに、生成した合成データは、このタスクにあまり具体的でないことが証明され、bdi-ii応答に依存するアプローチが最良の性能を示した。

In this work, we present the contribution of the BLUE team in the eRisk Lab task on searching for symptoms of depression. The task consists of retrieving and ranking Reddit social media sentences that convey symptoms of depression from the BDI-II questionnaire. Given that synthetic data provided by LLMs have been proven to be a reliable method for augmenting data and fine-tuning downstream models, we chose to generate synthetic data using ChatGPT for each of the symptoms of the BDI-II questionnaire. We designed a prompt such that the generated data contain more richness and semantic diversity than the BDI-II responses for each question and, at the same time, contain emotional and anecdotal experiences that are specific to the more intimate way of sharing experiences on Reddit. We perform semantic search and rank the sentences' relevance to the BDI-II symptoms by cosine similarity. We used two state-of-the-art transformer-based models (MentalRoBERTa and a variant of MPNet) for embedding the social media posts and the original and generated responses of the BDI-II. Our results show that using sentence embeddings from a model designed for semantic search outperforms the approach using embeddings from a model pre-trained on mental health data. Furthermore, the generated synthetic data proved too specific for this task, and the approach relying solely on the BDI-II responses had the best performance.
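
The ranking step described above (semantic search by cosine similarity) can be sketched as follows. This is a minimal illustration with toy two-dimensional vectors. The vectors and the function name are hypothetical, and taking the maximum similarity over all responses of a symptom is an assumption made here, not a detail stated in the abstract.

```python
import numpy as np

def rank_by_cosine(sentence_vecs, response_vecs):
    """Rank sentences by their best cosine similarity to any response embedding."""
    s = sentence_vecs / np.linalg.norm(sentence_vecs, axis=1, keepdims=True)
    r = response_vecs / np.linalg.norm(response_vecs, axis=1, keepdims=True)
    # One score per sentence: similarity to the closest response (an assumption).
    scores = (s @ r.T).max(axis=1)
    order = np.argsort(-scores)  # indices of sentences, most relevant first
    return order, scores

# Toy embeddings (hypothetical); a real system would use a sentence encoder.
sentences = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
responses = np.array([[1.0, 0.1]])

order, scores = rank_by_cosine(sentences, responses)
print(order, scores)
```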

翻訳日:2023-07-07 11:02:15 公開日:2023-07-06

# 剛性フェアニューラルアーキテクチャ探索に基づく動的アイソメトリ

Dynamical Isometry based Rigorous Fair Neural Architecture Search ( http://arxiv.org/abs/2307.02263v2 )

ライセンス: Link先を確認

Jianxiang Luo, Junyi Hu, Tianji Pang, Weihao Huang, Chuang Liu

(参考訳) 近年,重み付け技術により,ニューラルネットワーク探索のトレーニングと評価が大幅に高速化されている。しかし、既存の重み共有戦略のほとんどは経験や観察のみに基づいており、その結果は解釈可能性や合理性に欠ける。また, 公正性の欠如により, モジュール評価の誤判断が生じる傾向にある。これらの問題に対処するために,動的アイソメトリに基づくニューラルアーキテクチャ探索アルゴリズムを提案する。固定点解析法を平均場理論に用いて、定常ランダムニューラルネットワークにおける動的挙動を解析し、動的等尺法が重み付けに基づくNASの公平性を保証するかを示す。一方,条件付きジャコビアンを持つすべてのモジュールの一般化誤差を推定することにより,モジュール選択戦略が厳密であることを示す。大規模な実験により,提案手法で探索したアーキテクチャは,画像ネット分類における最先端のTop-1検証精度を実現することができた。また,本手法は一般性を損なうことなく,より良く,より安定したトレーニング性能を実現することができることを示した。

Recently, the weight-sharing technique has significantly sped up the training and evaluation procedure of neural architecture search. However, most existing weight-sharing strategies are solely based on experience or observation, which makes the search results lack interpretability and rationality. In addition, because fairness is neglected, current methods are prone to misjudgments in module evaluation. To address these problems, we propose a novel neural architecture search algorithm based on dynamical isometry. We use the fixed-point analysis method from mean field theory to analyze the dynamical behavior of steady-state random neural networks, and show how dynamical isometry guarantees the fairness of weight-sharing based NAS. Meanwhile, we prove that our module selection strategy is rigorously fair by estimating the generalization error of all modules with well-conditioned Jacobians. Extensive experiments show that, with the same size, the architecture searched by the proposed method can achieve state-of-the-art top-1 validation accuracy on ImageNet classification. In addition, we demonstrate that our method is able to achieve better and more stable training performance without loss of generality.
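
Dynamical isometry refers to the singular values of a network's input-output Jacobian concentrating near 1. The sketch below illustrates the idea in the simplest possible setting, a deep linear network, where the Jacobian is just the product of the weight matrices. The linear setting, the widths and depths, and the function names are assumptions made for illustration. The paper's mean field analysis of nonlinear networks is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 32, 20

def random_orthogonal(n):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def jacobian_singular_values(weights):
    """Singular values of the input-output Jacobian of a deep linear network."""
    jac = np.eye(weights[0].shape[1])
    for w in weights:
        jac = w @ jac
    return np.linalg.svd(jac, compute_uv=False)

orthogonal_layers = [random_orthogonal(width) for _ in range(depth)]
gaussian_layers = [
    rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)
]

sv_orth = jacobian_singular_values(orthogonal_layers)
sv_gauss = jacobian_singular_values(gaussian_layers)

# Orthogonal layers keep every singular value at 1 (a well-conditioned Jacobian);
# Gaussian layers spread them out, so the Jacobian becomes ill-conditioned with depth.
print(sv_orth.min(), sv_orth.max())
print(sv_gauss.min(), sv_gauss.max())
```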

翻訳日:2023-07-07 11:01:48 公開日:2023-07-06

# 混合量子状態に対する強い量子速度制限

Stronger Quantum Speed Limit For Mixed Quantum States ( http://arxiv.org/abs/2307.02215v2 )

ライセンス: Link先を確認

Shrobona Bagchi, Dimpi Thakuria, Arun Kumar Pati

(参考訳) 混合量子状態とユニタリ進化の強い不確実性関係を用いて、混合量子状態に対する量子速度制限を導出する。また、この境界は、より良い境界を得るための演算子の異なる選択に対して最適化可能であることも示している。このバウンダリをいくつかの例で説明し、以前のバウンダリよりも優れたパフォーマンスを示します。

We derive a quantum speed limit for mixed quantum states using the stronger uncertainty relation for mixed quantum states and unitary evolution. We also show that this bound can be optimized over different choices of operators for obtaining a better bound. We illustrate this bound with some examples and show its better performance with respect to some earlier bounds.

翻訳日:2023-07-07 11:01:30 公開日:2023-07-06

# 自由空間BBM92量子鍵分配プロトコルにおける非最大絡み合い状態の利用

Use of Non-Maximal entangled state for free space BBM92 quantum key distribution protocol ( http://arxiv.org/abs/2307.02149v2 )

ライセンス: Link先を確認

Ayan Biswas, Sarika Mishra, Satyajeet Patil, Anindya Banerji, Shashi Prabhakar, and Ravindra P. Singh

(参考訳) セキュアな鍵配布のための衛星ベースの量子通信は、破壊不可能なセキュリティのために、より要求の高い研究分野になりつつある。 BB84のようなプレパアプロトコルや測定プロトコルは、衛星を信頼できる装置とみなし、衛星ベースの光通信の現在の傾向を危険視している。したがって、遠距離制限を克服すると共に、衛星を信頼できない機器とみなすことができるため、絡み合いに基づくプロトコルが望ましい。 e91プロトコルは衛星ベースの量子通信のよい候補であるが、eveに対するセキュリティを確保するためにベル・チェシュの不等式を検証するために測定された量子ビットのほとんどを利用するため、鍵レートは低い。エンタングルメントベースのプロトコルは、よりセキュアな鍵分散のために最大エンタングル状態を必要とする。本稿では,セキュアな鍵分布に対する非最大性の影響について述べる。これは、セキュアキーを抽出できない非最大性条件の下限を確立する。 BBM92プロトコルは,Bell-CHSHの不等式に対する違反の程度と,与えられた設定に対する量子ビット誤り率との間に線形接続があることから,鍵分布にとってより有益である。

Satellite-based quantum communication for secure key distribution is becoming an increasingly active field of research due to its unbreakable security. Prepare-and-measure protocols such as BB84 treat the satellite as a trusted device, which is risky given the current trend in satellite-based optical communication. Entanglement-based protocols should therefore be preferred since, along with overcoming the distance limitation, they allow the satellite to be treated as an untrusted device. The E91 protocol is a good candidate for satellite-based quantum communication, but its key rate is low because most of the measured qubits are used to verify a Bell-CHSH inequality to ensure security against Eve. An entanglement-based protocol requires a maximally entangled state for more secure key distribution. The current work discusses the effect of non-maximality on secure key distribution. It establishes a lower bound on the non-maximality condition below which no secure key can be extracted. The BBM92 protocol will be more beneficial for key distribution, as we found a linear connection between the extent of violation of the Bell-CHSH inequality and the quantum bit error rate for a given setup.
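
To make the notion of non-maximality concrete, the sketch below computes the largest achievable Bell-CHSH value for the two-qubit state cos(θ)|00> + sin(θ)|11> using the standard Horodecki criterion (S_max = 2 sqrt(m1 + m2), with m1 and m2 the two largest eigenvalues of T^T T, where T is the correlation matrix). This is a textbook calculation offered as an illustration. The state family and the function names are assumptions made here, and the paper's relation between CHSH violation and quantum bit error rate, as well as its key-rate bound, are not reproduced.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
]

def partially_entangled_state(theta):
    """Density matrix of cos(theta)|00> + sin(theta)|11>."""
    psi = np.zeros(4, dtype=complex)
    psi[0] = np.cos(theta)
    psi[3] = np.sin(theta)
    return np.outer(psi, psi.conj())

def max_chsh(rho):
    """Maximal CHSH value of a two-qubit state via the Horodecki criterion."""
    t = np.array([
        [np.real(np.trace(rho @ np.kron(a, b))) for b in PAULIS]
        for a in PAULIS
    ])
    eigs = np.sort(np.linalg.eigvalsh(t.T @ t))
    return 2.0 * np.sqrt(eigs[-1] + eigs[-2])

s_maximal = max_chsh(partially_entangled_state(np.pi / 4))  # maximally entangled
s_partial = max_chsh(partially_entangled_state(np.pi / 8))  # non-maximal
s_product = max_chsh(partially_entangled_state(0.0))        # no entanglement
print(s_maximal, s_partial, s_product)
```

For this state family the result matches the closed form 2 sqrt(1 + sin^2(2θ)), so the violation shrinks smoothly from 2 sqrt(2) toward the classical bound of 2 as the state becomes less entangled.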

翻訳日:2023-07-07 11:01:23 公開日:2023-07-06

# 計算社会科学における再現性

Computational Reproducibility in Computational Social Science ( http://arxiv.org/abs/2307.01918v2 )

ライセンス: Link先を確認

David Schoch, Chung-hong Chan, Claudia Wagner, Arnim Bleier

(参考訳) 過去10年間で、再現性と再現性の危機が科学界を揺るがしている。潜在的な解決策として、オープンサイエンスの実践は深く議論され、様々な分野で様々な成功を収めた。しかしながら,計算社会科学などの計算X分野における再現性のバイナリ定義は,結果が再現できるエージェントや条件について明示的でないため不十分である,と我々は主張する。本研究では, 理論的再現性を創出するが, 実用的, 検証された再現性をサポートしない「オープン洗浄」を避けるための定義を拡張し, 検証可能性の概念に基づく計算再現性の階層システムを導入する。検証可能な計算再現性、特に計算社会科学の分野における共通の障壁を特定し、共通アクセスや計算障壁を回避する方法について提案する。

In the last decade, replication and reproducibility crises have shaken the scientific landscape. As potential solutions, open science practices were heavily discussed and have been implemented with varying success in different disciplines. We argue, however, that the binary definition of reproducibility, specifically for computational-X disciplines such as computational social science, is insufficient since it is not explicit about the agents and conditions under which results can be reproduced. We expand the definition to avoid "open washing", the practice of fabricating theoretical reproducibility but not supporting practical or verified reproducibility, and introduce a tier system of computational reproducibility based on the concept of verifiability. We identify common barriers to verifiable computational reproducibility, specifically in the field of computational social science, and provide suggestions on how to circumvent common access and computational barriers.

翻訳日:2023-07-07 11:01:04 公開日:2023-07-06

# 変換プロトフォーム再構成

Transformed Protoform Reconstruction ( http://arxiv.org/abs/2307.01896v2 )

ライセンス: Link先を確認

Young Min Kim, Kalvin Chang, Chenxuan Cui and David Mortensen

(参考訳) プロトホルムの再構築は、娘言語の祖先言語における形態素や単語の出現を推測する作業である。 Meloni et al. (2021)は、RNNベースのエンコーダデコーダとアテンションモデルを用いて、ラテン文字のプロトフォーム再構築の最先端を達成した。我々は最新のseq2seqモデルであるtransformerでモデルを更新する。我々のモデルは,5言語にまたがる8,000コニャート,39種にまたがる800以上のコニャートからなる中国語データセット(Hou 2004)の2つの異なるデータセット上で,それらのモデルを比較した。また,本モデルに含まれる可能性のある系統信号についても検討する。私たちのコードはhttps://github.com/cmu-llab/acl-2023で公開されています。

Protoform reconstruction is the task of inferring what morphemes or words looked like in the ancestral languages of a set of daughter languages. Meloni et al. (2021) achieved the state of the art on Latin protoform reconstruction with an RNN-based encoder-decoder with attention model. We update their model with the state-of-the-art seq2seq model: the Transformer. Our model outperforms their model on a suite of different metrics on two different datasets: their Romance data of 8,000 cognates spanning 5 languages and a Chinese dataset (Hou 2004) of 800+ cognates spanning 39 varieties. We also probe our model for potential phylogenetic signal contained in the model. Our code is publicly available at https://github.com/cmu-llab/acl-2023.

翻訳日:2023-07-07 11:00:51 公開日:2023-07-06

# Align with Purpose: General Plug-and-Play Frameworkを用いたCTCモデルにおけるDesiredプロパティの最適化

Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework ( http://arxiv.org/abs/2307.01715v2 )

ライセンス: Link先を確認

Eliya Segev, Maya Alroy, Ronen Katsir, Noam Wies, Ayana Shenhav, Yael Ben-Oren, David Zar, Oren Tadmor, Jacob Bitterman, Amnon Shashua and Tal Rosenwein

(参考訳) コネクショニスト時間分類(ctc)は、教師付きシーケンシャル・ツー・シークエンス(seq2seq)モデルの訓練に広く用いられている基準である。これは不完全なアライメントを犠牲にして、完全なアライメント(基礎となる真実を生み出す)を余分にすることで、入力シーケンスと出力シーケンスの関係を学習することができる。完全かつ不完全なアライメントのこの二項微分は、他の現実世界の応用において重要な重要なアライメント特性を捉えていない。ここでは、CTC基準でトレーニングされたモデルにおいて、所望のプロパティを強化するために、$\textbf{ general Plug-and-Play framework}$を提案する。我々は、所望の特性に応じてアライメントを優先順位付けする追加の損失項でCTCを補完する。本手法はctc損失関数への干渉を一切必要とせず,様々な特性の最適化を容易にし,完全アライメントと不完全アライメントの区別を可能にする。我々は,ASR(Automatic Speech Recognition)の領域にフレームワークを適用し,その特性選択,アーキテクチャ選択,トレーニングデータセットのスケール(最大280,000時間)において,その汎用性を示す。本フレームワークの有効性を実証するため, 出力時間と単語誤り率(WER)の2つの非関連特性に適用した。前者については、WERの小さな削減によるレイテンシ最適化の最大570msの改善を報告し、後者については、ベースラインモデルよりも4.5%WERの相対的な改善を報告した。私たちの知る限りでは、これらのアプリケーションは我々のものほど大規模なデータを扱うことが実証されたことはない。特に,本手法は数行のコードだけで実装可能であり,アライメントフリーな損失関数やASR以外の領域にも拡張可能である。

Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose $\textit{Align With Purpose}$, a $\textbf{general Plug-and-Play framework}$ for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. Our method does not require any intervention in the CTC loss function, enables easy optimization of a variety of properties, and allows differentiation between both perfect and imperfect alignments. We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset (up to 280,000 hours). To demonstrate the effectiveness of our framework, we apply it to two unrelated properties: emission time and word error rate (WER). For the former, we report an improvement of up to 570ms in latency optimization with a minor reduction in WER, and for the latter, we report a relative improvement of 4.5% WER over the baseline models. To the best of our knowledge, these applications have never been demonstrated to work on a scale of data as large as ours. Notably, our method can be implemented using only a few lines of code, and can be extended to other alignment-free loss functions and to domains other than ASR.
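
The idea of complementing CTC with an additional loss term that prioritizes alignments can be sketched as below. The abstract does not specify the form of the extra term, so everything here is an assumption made for illustration: the hinge form, the per-frame scoring of an alignment, the toy probabilities, the placeholder CTC loss value, and all function names are hypothetical and should not be read as the authors' implementation.

```python
import numpy as np

def alignment_log_prob(log_probs, alignment):
    """Log-probability of one frame-level alignment under per-frame outputs."""
    frames = np.arange(len(alignment))
    return float(log_probs[frames, alignment].sum())

def property_hinge_loss(log_probs, alignment, preferred, margin=0.0):
    """Penalize the model when `alignment` is scored above the `preferred` one."""
    gap = alignment_log_prob(log_probs, alignment) - alignment_log_prob(log_probs, preferred)
    return max(0.0, gap + margin)

def total_loss(ctc_loss, property_loss, weight=0.1):
    """CTC loss left untouched, plus a weighted property term."""
    return ctc_loss + weight * property_loss

# Toy per-frame probabilities over {0: blank, 1: 'a'} for 4 frames (hypothetical).
log_probs = np.log(np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.6, 0.4],
]))
late = np.array([0, 0, 1, 0])     # token emitted at frame 2
earlier = np.array([0, 1, 0, 0])  # same token emitted one frame earlier

latency_term = property_hinge_loss(log_probs, late, earlier)
loss = total_loss(ctc_loss=1.5, property_loss=latency_term)  # 1.5 is a placeholder
print(latency_term, loss)
```

In this toy case the late alignment is currently more probable than the earlier one, so the property term is positive and would push probability mass toward earlier emission.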

翻訳日:2023-07-07 11:00:36 公開日:2023-07-06

# オンライン手書き署名検証のためのトランスフォーマーの検討

Exploring Transformers for On-Line Handwritten Signature Verification ( http://arxiv.org/abs/2307.01663v2 )

ライセンス: Link先を確認

Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Paula Delgado-Santos, Giuseppe Stragapede, Julian Fierrez, Javier Ortega-Garcia

(参考訳) 近年,ユーザフレンドリーな認証手法としてのモバイルバイオメトリックスの利用が増加している。近年の研究では、トランスフォーマーに基づく新しい行動バイオメトリック認識システムを提案している。オンライン手書き署名検証は、タブレットやスマートフォンなどの電子機器を用いて取得した生体認証に基づいて、被験者の身元を確認することを目的としている。本稿では,オンライン署名検証のための最近のトランスフォーマーに基づくアーキテクチャの適合性について検討する。特に4つの異なる構成が研究され、そのうち2つはVanilla Transformerエンコーダに依存し、他の2つは歩行と行動認識のタスクにうまく適用されている。提案する4つの構成をsvc-ongoing competitionで提案された実験プロトコルに従って評価する。実験の結果は有望であり,オンライン署名検証におけるトランスフォーマーの利用を促進する。

The application of mobile biometrics as a user-friendly authentication method has increased in recent years. Recent studies have proposed novel behavioral biometric recognition systems based on Transformers, which currently outperform the state of the art in several application scenarios. On-line handwritten signature verification aims to verify the identity of subjects based on their biometric signatures acquired using electronic devices such as tablets or smartphones. This paper investigates the suitability of architectures based on recent Transformers for on-line signature verification. In particular, four different configurations are studied: two rely on the Vanilla Transformer encoder, and the other two have been successfully applied to the tasks of gait and activity recognition. We evaluate the four proposed configurations according to the experimental protocol proposed in the SVC-onGoing competition. The results obtained in our experiments are promising, and promote the use of Transformers for on-line signature verification.

翻訳日:2023-07-07 11:00:00 公開日:2023-07-06

# 見ることは信じない: 人間の視覚のプライバシー保護のためのアイデンティティ・ハイダー

Seeing is not Believing: An Identity Hider for Human Vision Privacy Protection ( http://arxiv.org/abs/2307.00481v2 )

ライセンス: Link先を確認

Tao Wang, Yushu Zhang, Zixuan Yang, Hua Zhang, and Zhongyun Hua

(参考訳) 大量の撮像された顔画像は、個人を特定するためにデータベースに格納される。しかし、保存された画像は、個人の意思ではなく、プライバシー侵害を引き起こす可能性があるデータマネージャによって、意図的に、または意図せず観察される。既存の保護は、顔の視覚的な内容をわずかに変えるだけで、識別の効用を保ちながら、人間の視覚による真のアイデンティティの推論に影響を受けやすい。本稿では,顔認識器の高識別性を維持しつつ,人間の視力に対する視覚的変化を顕著に抑制するアイデンティティ隠蔽器を提案する。まず、idハイダは、stylegan2の潜在空間を操作して、新たな視覚コンテンツを持つ仮想顔を生成する。特に、仮想顔は、例えばポーズや表現など、元の顔と同じ無関係な属性を持つ。次に、仮想顔の視覚内容が元の顔に転送され、背景が元の顔に置き換えられる。さらに、アイデンティティハイダは、強い転送性を有し、任意の顔認識器が良好な精度を達成できる。適切な実験により,提案手法はプライバシ保護と識別性保存において優れた性能を発揮することが示された。

Massive numbers of captured face images are stored in databases for the identification of individuals. However, the stored images can be observed intentionally or unintentionally by data managers, without the individuals' consent, which may cause privacy violations. Existing protection methods only slightly change the visual content of the face while maintaining the utility of identification, leaving it susceptible to inference of the true identity by human vision. In this paper, we propose an identity hider that enables significant visual content change for human vision while preserving high identifiability for face recognizers. Firstly, the identity hider generates a virtual face with new visual content by manipulating the latent space in StyleGAN2. In particular, the virtual face has the same irrelevant attributes as the original face, e.g., pose and expression. Secondly, the visual content of the virtual face is transferred into the original face and then the background is replaced with the original one. In addition, the identity hider has strong transferability, which ensures an arbitrary face recognizer can achieve satisfactory accuracy. Experiments show that the proposed identity hider achieves excellent performance on privacy protection and identifiability preservation.

翻訳日:2023-07-07 09:16:06 公開日:2023-07-06

PDF登録状況（公開日: 20230706）