Fugu-MT: Translations of arXiv Papers

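This site translates into Japanese those arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with the Fugu-Machine Translator.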

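The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from use of the site (including all information and translations).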

Papers with a publication date of 20230726.

Title	Authors	Abstract	Publication date / Translation date
# Rethinking People Analytics With Inverse Transparency by Design ( http://arxiv.org/abs/2305.09813v2 ) License: see link	Valentin Zieglmeier and Alexander Pretschner	Employees work in increasingly digital environments that enable advanced analytics. Yet, they lack oversight over the systems that process their data. That means that potential analysis errors or hidden biases are hard to uncover. Recent data protection legislation tries to tackle these issues, but it is inadequate. It does not prevent data misuse while at the same time stifling sensible use cases for data. We think the conflict between data protection and increasingly data-driven systems should be solved differently. When access to an employee's data is given, all usages should be made transparent to them, according to the concept of inverse transparency. This allows individuals to benefit from sensible data usage while addressing the potentially harmful consequences of data misuse. To accomplish this, we propose a new design approach for workforce analytics we refer to as inverse transparency by design. To understand the developer and user perspectives on the proposal, we conduct two exploratory studies with students. First, we let small teams of developers implement analytics tools with inverse transparency by design to uncover how they judge the approach and how it materializes in their developed tools. We find that architectural changes are made without inhibiting core functionality. The developers consider our approach valuable and technically feasible. Second, we conduct a user study over three months to let participants experience the provided inverse transparency and reflect on their experience. The study models a software development workplace where most work processes are already digital. Participants perceive the transparency as beneficial and feel empowered by it. They unanimously agree that it would be an improvement for the workplace. We conclude that inverse transparency by design is a promising approach to realize accepted and responsible people analytics.	Translated: 2023-10-24 08:21:23 Published: 2023-07-26
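Inverse transparency, as the abstract above describes it, means that every use of an employee's data is made visible to that employee. The following is a minimal sketch of the idea. It assumes a simple in-memory store, and the names `TransparentStore`, `read`, and `usages_for` are hypothetical. It is not taken from the tools built in the study.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Usage:
    """One recorded use of an employee's data."""
    accessor: str
    owner: str
    field_name: str
    purpose: str
    timestamp: datetime


@dataclass
class TransparentStore:
    """Hypothetical store that logs every read so the data owner can review it."""
    records: dict = field(default_factory=dict)
    access_log: list = field(default_factory=list)

    def put(self, owner: str, field_name: str, value) -> None:
        self.records[(owner, field_name)] = value

    def read(self, accessor: str, owner: str, field_name: str, purpose: str):
        # Log first, so that even a failed lookup leaves a visible trace.
        self.access_log.append(
            Usage(accessor, owner, field_name, purpose, datetime.now(timezone.utc))
        )
        return self.records[(owner, field_name)]

    def usages_for(self, owner: str) -> list:
        """Everything the owner is entitled to see: who used their data, and why."""
        return [u for u in self.access_log if u.owner == owner]


store = TransparentStore()
store.put("alice", "commits_per_week", 42)
store.read("team_lead", "alice", "commits_per_week", "sprint planning")
for usage in store.usages_for("alice"):
    print(usage.accessor, usage.field_name, usage.purpose)
```

The design choice is to make logging part of the only read path, so analysts cannot obtain data without leaving a record. The study's tools may enforce this at a different architectural layer.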
# Demystifying Code Snippets in Code Reviews: A Study of the OpenStack and Qt Communities and A Practitioner Survey ( http://arxiv.org/abs/2307.14406v1 ) License: see link	Beiqi Zhang, Liming Fu, Peng Liang, Jiaxin Yu, Chong Wang	Code review is widely known as one of the best practices for software quality assurance in software development. In a typical code review process, reviewers check the code committed by developers to ensure the quality of the code, during which reviewers and developers communicate with each other in review comments to exchange necessary information. As a result, understanding the information in review comments is a prerequisite for reviewers and developers to conduct an effective code review. Code snippets, as a special form of code, can be used to convey necessary information in code reviews. For example, reviewers can use code snippets to make suggestions or elaborate their ideas to meet developers' information needs in code reviews. However, little research has focused on the practices of providing code snippets in code reviews. To bridge this gap, we conduct a mixed-methods study to mine information and knowledge related to code snippets in code reviews, which can help practitioners and researchers get a better understanding of using code snippets in code review. Specifically, our study includes two phases: mining code review data and conducting a practitioner survey. The study results highlight that reviewers can provide code snippets in appropriate scenarios to meet developers' specific information needs in code reviews, which will facilitate and accelerate the code review process.	Translated: 2023-10-23 16:10:00 Published: 2023-07-26
# Mining Reddit Data to Elicit Students' Requirements During COVID-19 Pandemic ( http://arxiv.org/abs/2307.14212v1 ) License: see link	Shadikur Rahman, Faiz Ahmed, Maleknaz Nayebi	Data-driven requirements engineering leverages the abundance of openly accessible and crowdsourced information on the web. By incorporating user feedback provided about a software product, such as reviews in mobile app stores, these approaches facilitate the identification of issues, bug fixes, and implementation of change requests. However, relying solely on user feedback about a software product limits the possibility of eliciting all requirements, as users may not always have a clear understanding of their exact needs from the software, despite their wealth of experience with the problems, events, or challenges they encounter and use the software to address. In this study, we propose a shift in requirements elicitation, focusing on gathering feedback related to the problem itself rather than relying solely on feedback about the software product. We conducted a case study on student requirements during the COVID-19 pandemic in a higher education institution. We gathered their communications from Reddit during the pandemic and employed multiple machine-learning and natural language processing techniques to identify requirement sentences. We achieved an F-score of 0.79 using Naive Bayes with TF-IDF when benchmarking multiple techniques. The results lead us to believe that mining requirements from communication about a problem is feasible. While we present preliminary results, we envision a future where these requirements complement conventionally elicited requirements and help to close the requirements gap.	Translated: 2023-10-23 16:09:19 Published: 2023-07-26
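The abstract above reports that Naive Bayes with TF-IDF features gave the best F-score. The following is a rough, self-contained illustration, not the authors' pipeline, which probably relied on a standard ML library. It trains a multinomial Naive Bayes classifier that treats TF-IDF weights as fractional counts, using a smoothed IDF of the form ln((1 + n) / (1 + df)) + 1. The toy sentences and the labels `requirement` and `other` are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1 for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Term frequency times IDF; terms unseen during training are dropped."""
    counts = Counter(tokenize(doc))
    return {term: c * idf[term] for term, c in counts.items() if term in idf}


def train_nb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes that treats TF-IDF weights as fractional counts."""
    vocab = list(idf)
    priors, log_likelihoods = {}, {}
    for label in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == label]
        priors[label] = math.log(len(class_docs) / len(docs))
        mass = Counter()
        for d in class_docs:
            mass.update(tfidf(d, idf))
        denom = sum(mass.values()) + alpha * len(vocab)  # Laplace smoothing
        log_likelihoods[label] = {t: math.log((mass[t] + alpha) / denom) for t in vocab}
    return priors, log_likelihoods


def predict(doc, idf, priors, log_likelihoods):
    """Return the label with the highest posterior log-score."""
    weights = tfidf(doc, idf)
    scores = {
        label: priors[label]
        + sum(w * log_likelihoods[label][t] for t, w in weights.items())
        for label in priors
    }
    return max(scores, key=scores.get)


# Invented toy sentences; the study itself used Reddit posts.
docs = [
    "we need recorded lectures online",
    "students need extended deadlines",
    "the library should provide remote access",
    "i watched a movie yesterday",
    "my cat is sleeping",
    "the weather is nice today",
]
labels = ["requirement"] * 3 + ["other"] * 3

idf = fit_idf(docs)
priors, log_likelihoods = train_nb(docs, labels, idf)
print(predict("we need online access to lectures", idf, priors, log_likelihoods))
```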
# Observe Locally, Classify Globally: Using GNNs to Identify Sparse Matrix Structure ( http://arxiv.org/abs/2309.02442v1 ) License: see link	Khaled Abdelaal and Richard Veras	The performance of sparse matrix computation depends heavily on matching the matrix format to the underlying structure of the data being computed on. Different sparse matrix formats are suitable for different structures of data. Therefore, the first challenge is identifying the matrix structure before the computation in order to match it with an appropriate data format. The second challenge is to avoid reading the entire dataset before classifying it. This can be done by identifying the matrix structure through samples and their features. Yet, it is possible that global features cannot be determined from a sampling set and must instead be inferred from local features. To address these challenges, we develop a framework that generates sparse matrix structure classifiers using graph convolutional networks. The framework can also be extended to other matrix structures using user-provided generators. The approach achieves 97% classification accuracy on a set of representative sparse matrix shapes.	Translated: 2023-10-23 09:04:33 Published: 2023-07-26
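The abstract above does not specify the classifier's architecture or features. The following sketch shows only the generic building block it names: a graph convolution applied to a sparse matrix's nonzero pattern, followed by mean pooling into one vector that a classifier head could consume. It assumes a symmetric pattern with one vertex per row or column index, and it uses the widely used symmetric-normalised rule ReLU(D^-1/2 (A + I) D^-1/2 H W) with untrained weights. All function names are hypothetical.

```python
import math


def sparsity_graph(nonzeros, n):
    """Undirected graph: one vertex per row/column index, one edge per off-diagonal nonzero."""
    adj = [set() for _ in range(n)]
    for i, j in nonzeros:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    deg = [len(neigh) + 1 for neigh in adj]  # +1 accounts for the self-loop
    n_in, n_out = len(weights), len(weights[0])
    out = []
    for v, neigh in enumerate(adj):
        agg = [0.0] * n_in
        for u in neigh | {v}:
            coeff = 1.0 / math.sqrt(deg[v] * deg[u])
            for f in range(n_in):
                agg[f] += coeff * features[u][f]
        # Project with W, then apply ReLU.
        out.append([max(0.0, sum(agg[f] * weights[f][o] for f in range(n_in)))
                    for o in range(n_out)])
    return out


def mean_readout(node_features):
    """Average node embeddings into one graph-level vector for a classifier head."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]


# Tridiagonal 3x3 pattern (a path graph 0-1-2) with constant input features.
adj = sparsity_graph([(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)], 3)
hidden = gcn_layer(adj, [[1.0], [1.0], [1.0]], [[1.0]])
print(mean_readout(hidden))
```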
# LieDetect: Detection of representation orbits of compact Lie groups from point clouds ( http://arxiv.org/abs/2309.03086v1 ) License: see link	Henrique Ennes, Raphaël Tinarrage	We suggest a new algorithm to estimate representations of compact Lie groups from finite samples of their orbits. Unlike other reported techniques, our method allows the retrieval of the precise representation type as a direct sum of irreducible representations. Moreover, knowledge of the representation type permits the reconstruction of its orbit, which is useful for identifying the Lie group that generates the action. Our algorithm is general for any compact Lie group, but only instantiations for SO(2), T^d, SU(2) and SO(3) are considered. Theoretical guarantees of robustness in terms of Hausdorff and Wasserstein distances are derived. Our tools are drawn from geometric measure theory, computational geometry, and optimization on matrix manifolds. The algorithm is tested on synthetic data up to dimension 16, as well as on real-life applications in image analysis, harmonic analysis, and classical mechanics systems, achieving very accurate results.	Translated: 2023-10-23 08:54:27 Published: 2023-07-26
# Harnessing Web3 on Carbon Offset Market for Sustainability: Framework and A Case Study ( http://arxiv.org/abs/2308.02039v1 ) License: see link	Chenyu Zhou, Hongzhou Chen, Shiman Wang, Xinyao Sun, Abdulmotaleb El Saddik, Wei Cai	Blockchain, pivotal in shaping the metaverse and Web3, often draws criticism for high energy consumption and carbon emission. The rise of sustainability-focused blockchains, especially when intersecting with innovative wireless technologies, is easing this predicament. To understand blockchain's role in sustainability, we propose a three-layer structure encapsulating four green utilities: Recording and Tracking, Wide Verification, Value Trading, and Concept Disseminating. Nori, a decentralized voluntary carbon offset project, serves as our case, illuminating these utilities. Our research unveils unique insights into the on-chain carbon market participants, the factors affecting the market, the value propositions of NFT-based carbon credits, and the role of social media in spreading the concept of carbon offset. We argue that blockchain's contribution to sustainability is significant, with carbon offsetting potentially evolving as a new standard within the blockchain sector.	Translated: 2023-08-14 01:59:34 Published: 2023-07-26
# Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings ( http://arxiv.org/abs/2308.02362v1 ) License: see link	Yuxi Mi, Hongquan Liu, Yewei Xia, Yiheng Sun, Jihong Guan, Shuigeng Zhou	The emergence of vertical federated learning (VFL) has stimulated concerns about the imperfection in privacy protection, as shared feature embeddings may reveal sensitive information under privacy attacks. This paper studies the delicate equilibrium between data privacy and task utility goals of VFL under differential privacy (DP). To address the generality issue of prior art, this paper advocates a flexible and generic approach that decouples the two goals and addresses them successively. Specifically, we initially derive a rigorous privacy guarantee by applying norm clipping on shared feature embeddings, which is applicable across various datasets and models. Subsequently, we demonstrate that task utility can be optimized via adaptive adjustments on the scale and distribution of feature embeddings in an accuracy-appreciative way, without compromising established DP mechanisms. We concretize our observation into the proposed VFL-AFE framework, which exhibits effectiveness against privacy attacks and the capacity to retain favorable task utility, as substantiated by extensive experiments.	Translated: 2023-08-14 01:47:28 Published: 2023-07-26
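The abstract above obtains its privacy guarantee by clipping the norm of shared feature embeddings. The following is a minimal sketch of that step, assuming a Gaussian mechanism. Calibrating `noise_multiplier` to a concrete (ε, δ) budget is not shown. The function names are hypothetical, and this is not the VFL-AFE implementation.

```python
import math
import random


def clip_by_norm(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm:
        return list(vec)
    return [x * (max_norm / norm) for x in vec]


def privatize_embedding(vec, max_norm, noise_multiplier, rng):
    """Clip the embedding, then add Gaussian noise with std = noise_multiplier * max_norm.

    Clipping caps how far any single embedding can move, which is what lets the
    Gaussian noise scale be calibrated to a DP budget (the calibration itself is
    not shown here).
    """
    clipped = clip_by_norm(vec, max_norm)
    sigma = noise_multiplier * max_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


rng = random.Random(0)  # seeded only to make the example reproducible
embedding = [3.0, 4.0]  # L2 norm 5
print(clip_by_norm(embedding, 1.0))  # approximately [0.6, 0.8]
print(privatize_embedding(embedding, 1.0, 0.5, rng))
```

Per the abstract, the paper's point is that utility can then be recovered by adjusting the embeddings' scale and distribution without touching this mechanism. That adjustment is not modelled here.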
# Acceptable risks in Europe's proposed AI Act: Reasonableness and other principles for deciding how much risk management is enough ( http://arxiv.org/abs/2308.02047v1 ) License: see link	Henry Fraser and Jose-Miguel Bello y Villarino	This paper critically evaluates the European Commission's proposed AI Act's approach to risk management and risk acceptability for high-risk AI systems that pose risks to fundamental rights and safety. The Act aims to promote "trustworthy" AI with a proportionate regulatory burden. Its provisions on risk acceptability require residual risks from high-risk systems to be reduced or eliminated "as far as possible", having regard to the "state of the art". This criterion, especially if interpreted narrowly, is unworkable and promotes neither proportionate regulatory burden nor trustworthiness. By contrast, the Parliament's most recent draft amendments to the risk management provisions introduce "reasonableness" and cost-benefit analysis, and are more transparent about the value-laden and contextual nature of risk acceptability judgements. This paper argues that the Parliament's approach is more workable and better balances the goals of proportionality and trustworthiness. It explains what reasonableness in risk acceptability judgments would entail, drawing on principles from negligence law and European medical devices regulation. And it contends that the approach to risk acceptability judgments needs a firm foundation of civic legitimacy, including detailed guidance or involvement from regulators and meaningful input from affected stakeholders.	Translated: 2023-08-14 01:47:10 Published: 2023-07-26
# The Role of ChatGPT in Democratizing Data Science: An Exploration of AI-facilitated Data Analysis in Telematics ( http://arxiv.org/abs/2308.02045v1 ) License: see link	Ryan Lingo	The realm of data science, once reserved for specialists, is undergoing a revolution with the rapid emergence of generative AI, particularly through tools like ChatGPT. This paper posits ChatGPT as a pivotal bridge, drastically lowering the steep learning curve traditionally associated with complex data analysis. By generating intuitive data narratives and offering real-time assistance, ChatGPT democratizes the field, enabling a wider audience to glean insights from intricate datasets. A notable illustration of this transformative potential is provided through the examination of a synthetically generated telematics dataset, wherein ChatGPT aids in distilling complex patterns and insights. However, the journey to democratization is not without its hurdles. The paper delves into challenges presented by such AI, from potential biases in analysis to ChatGPT's limited reasoning capabilities. While the promise of a democratized data science landscape beckons, it is imperative to approach this transition with caution, cognizance, and an ever-evolving understanding of the tool's capabilities and constraints.	Translated: 2023-08-14 01:46:47 Published: 2023-07-26
# Differentiable short-time Fourier transform with respect to the hop length ( http://arxiv.org/abs/2308.02421v1 ) License: see link	Maxime Leiber, Yosra Marnissi, Axel Barrau, Mohammed El Badaoui	In this paper, we propose a differentiable version of the short-time Fourier transform (STFT) that allows for gradient-based optimization of the hop length or the frame temporal position by making these parameters continuous. Our approach provides improved control over the temporal positioning of frames, as the continuous nature of the hop length allows for a more finely-tuned optimization. Furthermore, our contribution enables the use of optimization methods such as gradient descent, which are more computationally efficient than conventional discrete optimization methods. Our differentiable STFT can also be easily integrated into existing algorithms and neural networks. We present a simulated illustration to demonstrate the efficacy of our approach and to garner interest from the research community.	Translated: 2023-08-14 01:39:01 Published: 2023-07-26
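The core idea in the abstract above is that frame centres t_m = m · hop become real-valued, so a smooth window makes the transform differentiable with respect to the hop length. The following sketch illustrates that principle under stated assumptions. It uses a Gaussian window and total spectrogram energy as the quantity being differentiated. It uses a central finite difference in place of automatic differentiation. The paper's window, loss, and implementation may differ, and the function names are hypothetical.

```python
import cmath
import math


def frame_power(signal, center, sigma, n_fft):
    """Power spectrum of one frame whose Gaussian window is centred at a real-valued time."""
    windowed = [x * math.exp(-0.5 * ((n - center) / sigma) ** 2)
                for n, x in enumerate(signal)]
    power = []
    for k in range(n_fft):
        acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, v in enumerate(windowed))
        power.append(abs(acc) ** 2)
    return power


def spectrogram_energy(signal, hop, sigma, n_frames, n_fft):
    """Total energy of frames centred at t_m = m * hop; hop may be fractional."""
    return sum(sum(frame_power(signal, m * hop, sigma, n_fft)) for m in range(n_frames))


def d_energy_d_hop(signal, hop, sigma, n_frames, n_fft, eps=1e-4):
    """Central finite difference standing in for automatic differentiation."""
    return (spectrogram_energy(signal, hop + eps, sigma, n_frames, n_fft)
            - spectrogram_energy(signal, hop - eps, sigma, n_frames, n_fft)) / (2 * eps)


# A short burst in an otherwise silent signal: the energy captured depends on
# where the frames land, so it varies smoothly with the hop length.
signal = [math.sin(0.8 * n) if 20 <= n < 30 else 0.0 for n in range(64)]
print(spectrogram_energy(signal, 7.5, 4.0, 6, 64))
print(d_energy_d_hop(signal, 7.5, 4.0, 6, 64))
```

With a rectangular window and integer hop, the same quantity would be piecewise constant in the hop, and its gradient would be zero almost everywhere. The smooth window is what makes gradient descent usable.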
# Differentiable adaptive short-time Fourier transform with respect to the window length ( http://arxiv.org/abs/2308.02418v1 ) License: see link	Maxime Leiber, Yosra Marnissi, Axel Barrau, Mohammed El Badaoui	This paper presents a gradient-based method for on-the-fly optimization of both the per-frame and per-frequency window length of the short-time Fourier transform (STFT). It builds on previous work in which we developed a differentiable version of the STFT by making the window length a continuous parameter. The resulting differentiable adaptive STFT possesses commendable properties, such as the ability to adapt in the same time-frequency representation to both transient and stationary components, while being easily optimized by gradient descent. We validate the performance of our method in vibration analysis.	Translated: 2023-08-14 01:38:24 Published: 2023-07-26
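The following sketch illustrates what treating the window length as a continuous parameter buys. It optimizes the width `sigma` of a Gaussian window by gradient descent for a single frame. The objective is an assumption on our part: the entropy of the normalised power spectrum, where lower means more concentrated. The paper's actual loss and its per-frame and per-frequency adaptation are not reproduced. A finite difference again stands in for automatic differentiation, and all names are hypothetical.

```python
import cmath
import math


def windowed_power(signal, sigma, n_fft):
    """|DFT|^2 of the signal under a Gaussian window of real-valued width sigma."""
    center = (len(signal) - 1) / 2
    windowed = [x * math.exp(-0.5 * ((n - center) / sigma) ** 2)
                for n, x in enumerate(signal)]
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, v in enumerate(windowed))) ** 2
            for k in range(n_fft)]


def spectral_entropy(signal, sigma, n_fft):
    """Entropy of the normalised power spectrum; lower means more concentrated."""
    power = windowed_power(signal, sigma, n_fft)
    total = sum(power)
    probs = [p / total for p in power]
    return -sum(q * math.log(q) for q in probs if q > 0)


def optimize_sigma(signal, sigma, n_fft, lr=10.0, steps=5, eps=1e-3):
    """Gradient descent on sigma using a central finite-difference gradient."""
    for _ in range(steps):
        grad = (spectral_entropy(signal, sigma + eps, n_fft)
                - spectral_entropy(signal, sigma - eps, n_fft)) / (2 * eps)
        sigma -= lr * grad
    return sigma


N = 128
tone = [math.cos(2 * math.pi * 16 * n / N) for n in range(N)]  # stationary tone on bin 16
sigma0 = 6.0
sigma_opt = optimize_sigma(tone, sigma0, N)
print(sigma0, "->", round(sigma_opt, 2))
```

For a stationary tone, a wider window concentrates the spectral peak, so the descent lengthens the window. A transient would favour a shorter one, which is the trade-off an adaptive STFT resolves locally.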
# Visual Saliency Detection in Advanced Driver Assistance Systems ( http://arxiv.org/abs/2308.03770v1 ) License: see link	Francesco Rundo, Michael Sebastian Rundo, Concetto Spampinato	Visual Saliency refers to the innate human mechanism of focusing on and extracting important features from the observed environment. Recently, there has been a notable surge of interest in the field of automotive research regarding the estimation of visual saliency. While operating a vehicle, drivers naturally direct their attention towards specific objects, employing brain-driven saliency mechanisms that prioritize certain elements over others. In this investigation, we present an intelligent system that combines a drowsiness detection system for drivers with a scene comprehension pipeline based on saliency. To achieve this, we have implemented a specialized 3D deep network for semantic segmentation, which has been pretrained and tailored for processing the frames captured by an automotive-grade external camera. The proposed pipeline was hosted on an embedded platform utilizing the STA1295 core, featuring ARM A7 dual-cores, and embeds a hardware accelerator. Additionally, we employ an innovative biosensor embedded on the car steering wheel to monitor the driver's drowsiness, gathering the PhotoPlethysmoGraphy (PPG) signal of the driver. A dedicated 1D temporal deep convolutional network has been devised to classify the collected PPG time-series, enabling us to assess the driver's level of attentiveness. Ultimately, we compare the determined attention level of the driver with the corresponding saliency-based scene classification to evaluate the overall safety level. The efficacy of the proposed pipeline has been validated through extensive experimental results.	Translated: 2023-08-14 00:41:27 Published: 2023-07-26
# Utilizing Large Language Models for Natural Interface to Pharmacology Databases ( http://arxiv.org/abs/2307.15717v1 ) License: see link	Hong Lu, Chuan Li, Yinheng Li, Jie Zhao	The drug development process necessitates that pharmacologists undertake various tasks, such as reviewing literature, formulating hypotheses, designing experiments, and interpreting results. Each stage requires accessing and querying vast amounts of information. In this abstract, we introduce a Large Language Model (LLM)-based Natural Language Interface designed to interact with structured information stored in databases. Our experiments demonstrate the feasibility and effectiveness of the proposed framework. This framework can generalize to query a wide range of pharmaceutical data and knowledge bases.	Translated: 2023-08-06 11:35:37 Published: 2023-07-26
# Data Augmentation for Neural Machine Translation using Generative Language Model ( http://arxiv.org/abs/2307.16833v1 ) License: see link	Seokjin Oh, Su ah Lee and Woohwan Jung	Despite the rapid growth in model architecture, the scarcity of large parallel corpora remains the main bottleneck in Neural Machine Translation. Data augmentation is a technique that enhances the performance of data-hungry models by generating synthetic data instead of collecting new data. We explore prompt-based data augmentation approaches that leverage large-scale language models such as ChatGPT. To create a synthetic parallel corpus, we compare 3 methods using different prompts. We employ two assessment metrics to measure the diversity of the generated synthetic data. This approach requires no further model training cost, unlike other augmentation methods such as back-translation, where that cost is unavoidable. The proposed method improves on the unaugmented baseline by 0.68 BLEU points.	Translated: 2023-08-06 11:22:22 Published: 2023-07-26
# The flow of ideas in word embeddings ( http://arxiv.org/abs/2307.16819v1 ) License: see link	Debayan Dasgupta	The flow of ideas has been extensively studied by physicists, psychologists, and machine learning engineers. This paper adopts specific tools from microrheology to investigate the similarity-based flow of ideas. We introduce a random walker in word embeddings and study its behavior. Such similarity-mediated random walks through the embedding space show signatures of anomalous diffusion commonly observed in complex structured systems such as biological cells and complex fluids. The paper concludes by proposing the application of popular tools employed in the study of random walks and diffusion of particles under Brownian motion to assess quantitatively the incorporation of diverse ideas in a document. Overall, this paper presents a self-referenced method combining microrheology and machine learning concepts to explore the meandering tendencies of language models and their potential association with creativity.	Translated: 2023-08-06 11:22:13 Published: 2023-07-26
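The "signatures of anomalous diffusion" mentioned in the abstract above are usually read off the mean squared displacement (MSD). If MSD grows as lag^α, then α < 1 is subdiffusive, α = 1 is normal diffusion, and α > 1 is superdiffusive. The following sketch is a standard estimate of α and is not code from the paper. The embedding-space walker itself is not shown; its trajectory, the sequence of visited word vectors, is assumed to be available as a list of equal-length tuples.

```python
import math


def mean_squared_displacement(trajectory, lag):
    """Time-averaged MSD of a trajectory of d-dimensional points at a given lag."""
    sq = [
        sum((a - b) ** 2 for a, b in zip(trajectory[t + lag], trajectory[t]))
        for t in range(len(trajectory) - lag)
    ]
    return sum(sq) / len(sq)


def diffusion_exponent(trajectory, lags):
    """Least-squares slope alpha of log MSD versus log lag (MSD ~ lag**alpha).

    alpha < 1 indicates subdiffusion, alpha = 1 normal diffusion,
    alpha > 1 superdiffusion.
    """
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(mean_squared_displacement(trajectory, lag)) for lag in lags]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Ballistic motion in 2-D: displacement grows linearly, so MSD ~ lag**2.
ballistic = [(0.5 * t, 0.25 * t) for t in range(100)]
print(round(diffusion_exponent(ballistic, [1, 2, 4, 8, 16]), 6))  # 2.0
```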
# Building and Testing a General Intelligence Embodied in a Humanoid Robot ( http://arxiv.org/abs/2307.16770v1 ) License: see link	Suzanne Gildert and Geordie Rose	Machines with human-level intelligence should be able to do most economically valuable work. This aligns a major economic incentive with the scientific grand challenge of building a human-like mind. Here we describe our approach to building and testing such a system. Our approach comprises a physical humanoid robotic system; a software-based control system for robots of this type; a performance metric, which we call g+, designed to be a measure of human-like intelligence in humanoid robots; and an evolutionary algorithm for incrementally increasing scores on this performance metric. We introduce and describe the current status of each of these. We report on current and historical measurements of the g+ metric on the systems described here.	Translated: 2023-08-06 11:21:33 Published: 2023-07-26
# Three Bricks to Consolidate Watermarks for Large Language Models ( http://arxiv.org/abs/2308.00113v1 ) License: see link	Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, Teddy Furon	The task of discerning between generated and natural texts is increasingly challenging. In this context, watermarking emerges as a promising technique for ascribing generated text to a specific model. It alters the sampling generation process so as to leave an invisible trace in the generated output, facilitating later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees which remain valid even at low false-positive rates (less than 10^-6). Second, we compare the effectiveness of watermarks using classical benchmarks in the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.	Translated: 2023-08-06 11:13:34 Published: 2023-07-26
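The abstract above stresses tests that stay valid at very low false-positive rates. The following sketch shows the kind of exact test involved. It assumes a green-list watermark in which, under the null hypothesis of unwatermarked text, each scored token independently lands in the green list with probability `gamma`. The green-token count is then Binomial(n, γ), and its exact upper tail gives the p-value. The common z-score, shown for comparison, relies on a normal approximation that is least reliable far in the tail. This is an illustration, not necessarily the paper's exact procedure.

```python
import math


def binomial_tail_pvalue(n_tokens, n_green, gamma):
    """Exact P(X >= n_green) for X ~ Binomial(n_tokens, gamma).

    Under the null hypothesis (text not watermarked), each scored token lands in
    the green list independently with probability gamma. Plain floats are fine
    for a few hundred tokens; long texts need a log-space computation.
    """
    return sum(
        math.comb(n_tokens, k) * gamma ** k * (1 - gamma) ** (n_tokens - k)
        for k in range(n_green, n_tokens + 1)
    )


def z_score(n_tokens, n_green, gamma):
    """Normal-approximation score often used for the same test."""
    expected = gamma * n_tokens
    return (n_green - expected) / math.sqrt(n_tokens * gamma * (1 - gamma))


# 200 scored tokens, green-list fraction 0.25: about 50 green tokens expected by chance.
print(binomial_tail_pvalue(200, 150, 0.25), z_score(200, 150, 0.25))
print(binomial_tail_pvalue(200, 50, 0.25), z_score(200, 50, 0.25))
```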
# A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Human Language? ( http://arxiv.org/abs/2308.00109v1 ) License: see link	Gary Marcus, Evelina Leivada, Elliot Murphy	Artificial Intelligence applications show great potential for language-related tasks that rely on next-word prediction. The current generation of large language models has been linked to claims about human-like linguistic performance, and their applications are hailed both as a key step towards Artificial General Intelligence and as a major advance in understanding the cognitive, and even neural, basis of human language. We analyze the contribution of large language models as theoretically informative representations of a target system vs. atheoretical powerful mechanistic tools, and we identify the key abilities that are still missing from the current state of development and exploitation of these models.	Translated: 2023-08-06 11:13:21 Published: 2023-07-26
# DPBERT: Efficient Inference for BERT based on Dynamic Planning ( http://arxiv.org/abs/2308.00108v1 ) License: see link	Weixin Wu and Hankz Hankui Zhuo	Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making them difficult to apply on mobile devices where computing power is limited. In this paper we aim to address the weakness of existing input-adaptive inference methods, which fail to take full advantage of the structure of BERT. We propose Dynamic Planning in BERT, a novel fine-tuning strategy that accelerates the inference process of BERT by selecting a subsequence of the backbone's transformer layers as a computational path for an input sample. To do this, our approach adds a planning module to the original BERT model to determine whether a layer is included or bypassed during inference. Experimental results on the GLUE benchmark show that our method reduces latency to 75% while maintaining 98% accuracy, yielding a better accuracy-speed trade-off compared to state-of-the-art input-adaptive methods.	Translated: 2023-08-06 11:13:10 Published: 2023-07-26
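The abstract above describes a planning module that decides, per input, whether each transformer layer is run or bypassed. The following sketch shows only that control flow. The "layers" are toy scalar functions, and `toy_planner` is a hypothetical hand-written rule standing in for the learned planning module. Neither the planner's architecture nor its training is taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def run_with_plan(
    x: float,
    layers: Sequence[Callable[[float], float]],
    planner: Callable[[int, float], float],
    threshold: float = 0.5,
) -> Tuple[float, List[int]]:
    """Run each layer only if the planner's score for it exceeds the threshold."""
    executed: List[int] = []
    for i, layer in enumerate(layers):
        # The planner sees the current hidden state, so the path is input-dependent.
        if planner(i, x) > threshold:
            x = layer(x)
            executed.append(i)
        # Otherwise the layer is bypassed and x passes through unchanged.
    return x, executed


# Six toy "layers": layer i adds i + 1 to a scalar hidden state.
layers = [lambda h, i=i: h + i + 1 for i in range(6)]


def toy_planner(i: int, h: float) -> float:
    """Hypothetical rule: always run even layers; run odd layers only for small h."""
    return 1.0 if i % 2 == 0 or h < 2.0 else 0.0


print(run_with_plan(0.0, layers, toy_planner))  # (11.0, [0, 1, 2, 4])
print(run_with_plan(5.0, layers, toy_planner))  # (14.0, [0, 2, 4])
```

Unlike early exiting, which always runs a prefix of the layers, a planner of this kind can skip non-contiguous layers, as the two different paths above show.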
# How User Language Affects Conflict Fatality Estimates in ChatGPT ( http://arxiv.org/abs/2308.00072v1 ) License: see link	Daniel Kazenwadel and Christoph V. Steinert	OpenAI's ChatGPT language model has gained popularity as a powerful tool for complex problem-solving and information retrieval. However, concerns arise about the reproduction of biases present in the language-specific training data. In this study, we address this issue in the context of the Israeli-Palestinian and Turkish-Kurdish conflicts. Using GPT-3.5, we employed an automated query procedure to inquire about casualties in specific airstrikes, in both Hebrew and Arabic for the former conflict and in Turkish and Kurdish for the latter. Our analysis reveals that GPT-3.5 provides 27$\pm$11 percent lower fatality estimates when queried in the language of the attacker than in the language of the targeted group. Evasive answers denying the existence of such attacks further increase the discrepancy, creating a novel bias mechanism not present in regular search engines. This language bias has the potential to amplify existing media biases and contribute to information bubbles, ultimately reinforcing conflicts.	Translated: 2023-08-06 11:12:09 Published: 2023-07-26
# 3:1 Nesting Rules in Redistricting ( http://arxiv.org/abs/2308.00605v1 ) License: see link	Christopher Donnay	In legislative redistricting, most states draw their House and Senate maps separately. Ohio and Wisconsin require that their Senate districts be made with a 3:1 nesting rule, i.e., out of triplets of adjacent House districts. We seek to study the impact of this requirement on redistricting, specifically on the number of seats won by a particular political party. We compare two Markov Chain Monte Carlo simulations: one uses the ReCom chain to generate Senate maps without a nesting requirement, and the other uses a novel chain that generates Senate maps with a 3:1 nesting requirement. Moreover, we implement Ohio's constitutional county splitting requirements in both chains. We find that requiring a 3:1 nesting rule has minimal impact on the distribution of seats won. On the other hand, enforcing Ohio's county splitting requirements severely restricts this distribution.	Translated: 2023-08-06 11:02:11 Published: 2023-07-26
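The 3:1 rule can be stated as a validity check on a candidate Senate map. This sketch assumes that "adjacent" means each triplet of House districts forms a connected piece of the House adjacency graph. The function name and the toy map are hypothetical, and this is only a check, not the ReCom chain or the authors' nested chain.

```python
from collections import deque

def is_valid_3to1_nesting(adjacency, triplets):
    """True if every House district lies in exactly one Senate triplet and
    each triplet is connected in the House adjacency graph (assumed reading of 'adjacent')."""
    used = [d for t in triplets for d in t]
    if sorted(used) != sorted(adjacency):  # every district used exactly once
        return False
    for t in triplets:
        if len(t) != 3:
            return False
        members, reached, queue = set(t), {t[0]}, deque([t[0]])
        while queue:  # breadth-first search restricted to the triplet
            d = queue.popleft()
            for nb in adjacency[d]:
                if nb in members and nb not in reached:
                    reached.add(nb)
                    queue.append(nb)
        if reached != members:
            return False
    return True

# Hypothetical example: six House districts in a row, 1-2-3-4-5-6.
row = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(is_valid_3to1_nesting(row, [(1, 2, 3), (4, 5, 6)]))  # True
print(is_valid_3to1_nesting(row, [(1, 2, 4), (3, 5, 6)]))  # False
```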
# OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images ( http://arxiv.org/abs/2304.10266v2 ) License: see link	Bingchen Zhao, Jiahao Wang, Wufei Ma, Artur Jesslen, Siwei Yang, Shaozuo Yu, Oliver Zendel, Christian Theobalt, Alan Yuille, Adam Kortylewski	Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV-v2, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and weather conditions, and enables benchmarking of models for image classification, object detection, and 3D pose estimation. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods, which reveal that: 1) Some nuisance factors have a much stronger negative effect on performance than others, also depending on the vision task. 2) Current approaches to enhance robustness have only marginal effects, and can even reduce robustness. 3) We do not observe significant differences between convolutional and transformer architectures. We believe our dataset provides a rich test bed to study robustness and will help push forward research in this area. Our dataset can be accessed from https://bzhao.me/OOD-CV/	Translated: 2023-07-31 15:50:48 Published: 2023-07-26
# Common Diffusion Noise Schedules and Sample Steps are Flawed ( http://arxiv.org/abs/2305.08891v2 ) License: see link	Shanchuan Lin, Bingchen Liu, Jiashi Li, Xiao Yang	We discover that common diffusion noise schedules do not force the last timestep to have zero signal-to-noise ratio (SNR), and some implementations of diffusion samplers do not start from the last timestep. Such designs are flawed and do not reflect the fact that the model is given pure Gaussian noise at inference, creating a discrepancy between training and inference. We show that the flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to generating images of medium brightness and prevents it from generating very bright or very dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure the diffusion process is congruent between training and inference and allow the model to generate samples more faithful to the original data distribution.	Translated: 2023-07-31 15:42:12 Published: 2023-07-26
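Fix (1), rescaling the schedule to zero terminal SNR, can be sketched in pure Python. This sketch assumes the rescaling is an affine shift and scale of $\sqrt{\bar\alpha_t}$ that keeps the first timestep unchanged and sends the last one to zero, with $\mathrm{SNR}_t = \bar\alpha_t/(1-\bar\alpha_t)$. The input "scaled linear" schedule (β from 0.00085 to 0.012 over 1000 steps) is an assumed example.

```python
import math

def rescale_zero_terminal_snr(betas):
    """Return (new_betas, new_alphas_bar) whose final timestep has SNR = 0."""
    # alpha_bar_t = prod_{s<=t} (1 - beta_s)
    alphas_bar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alphas_bar.append(prod)
    sqrt_ab = [math.sqrt(a) for a in alphas_bar]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value becomes 0, then scale so the first value is unchanged.
    sqrt_ab = [(s - last) * first / (first - last) for s in sqrt_ab]
    alphas_bar = [s * s for s in sqrt_ab]
    # Convert the cumulative products back to per-step betas.
    new_betas = [1.0 - alphas_bar[0]]
    for t in range(1, len(alphas_bar)):
        new_betas.append(1.0 - alphas_bar[t] / alphas_bar[t - 1])
    return new_betas, alphas_bar

def snr(alpha_bar):
    return alpha_bar / (1.0 - alpha_bar)

# Assumed example input: a "scaled linear" schedule over 1000 steps.
T = 1000
lo, hi = math.sqrt(0.00085), math.sqrt(0.012)
betas = [(lo + (hi - lo) * i / (T - 1)) ** 2 for i in range(T)]
new_betas, new_ab = rescale_zero_terminal_snr(betas)
print(snr(new_ab[-1]))  # 0.0
```

After rescaling, the last beta equals 1, so the final timestep is pure noise. This is why the abstract pairs the fix with v prediction: ε prediction becomes uninformative at zero SNR.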
# Neural Schrödinger Bridge with Sinkhorn Losses: Application to Data-driven Minimum Effort Control of Colloidal Self-assembly ( http://arxiv.org/abs/2307.14442v1 ) License: see link	Iman Nodozi, Charlie Yan, Mira Khare, Abhishek Halder, Ali Mesbah	We show that the minimum effort control of colloidal self-assembly can be naturally formulated in the order-parameter space as a generalized Schrödinger bridge problem, a class of fixed-horizon stochastic optimal control problems that originated in the works of Erwin Schrödinger in the early 1930s. In recent years, this class of problems has seen a resurgence of research activity in the control and machine learning communities. Unlike in the existing literature on the theory and computation of such problems, the controlled drift and diffusion coefficients for colloidal self-assembly are typically non-affine in control and are difficult to obtain from physics-based modeling. We deduce the conditions of optimality for such generalized problems and show that the resulting system of equations is structurally very different from the existing results, such that standard computational approaches no longer apply. Thus motivated, we propose a data-driven learning and control framework, named "neural Schrödinger bridge", to solve such generalized Schrödinger bridge problems by building on recent advances in neural networks. We illustrate the effectiveness of the proposed framework using a numerical case study of colloidal self-assembly. We learn the controlled drift and diffusion coefficients as two neural networks using molecular dynamics simulation data, and then use these two to train a third network with Sinkhorn losses designed for distributional endpoint constraints, specific to this class of control problems.	Translated: 2023-07-31 15:02:51 Published: 2023-07-26
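The sketch below illustrates a Sinkhorn-type loss between two sample sets. It runs plain Sinkhorn scaling iterations on 1-D point clouds with uniform weights, then debiases the resulting transport cost by subtracting the two self-comparison terms. The point-cloud setting, the squared-distance cost, and the values of `eps` and `iters` are assumptions. The authors' loss for distributional endpoint constraints may be defined differently.

```python
import math

def entropic_transport_cost(xs, ys, eps=0.1, iters=200):
    """Transport cost <P, C> of the entropic OT plan between uniform 1-D point clouds."""
    n, m = len(xs), len(ys)
    a, b = [1.0 / n] * n, [1.0 / m] * m
    C = [[(x - y) ** 2 for y in ys] for x in xs]          # squared-distance cost
    K = [[math.exp(-c / eps) for c in row] for row in C]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):                                 # Sinkhorn scaling updates
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m))

def debiased_sinkhorn_loss(xs, ys, eps=0.1, iters=200):
    """Cross term minus the average of the two self terms; zero when xs == ys."""
    return (entropic_transport_cost(xs, ys, eps, iters)
            - 0.5 * entropic_transport_cost(xs, xs, eps, iters)
            - 0.5 * entropic_transport_cost(ys, ys, eps, iters))

samples = [0.0, 0.5, 1.0]
target = [2.0, 2.5, 3.0]
print(debiased_sinkhorn_loss(samples, samples))  # 0.0
print(debiased_sinkhorn_loss(samples, target))   # roughly 4 (every point shifted by 2)
```

Subtracting the self terms removes the entropic bias, so identical sample sets give zero loss.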
# Information Gained Subgroup Discovery in Datasets ( http://arxiv.org/abs/2307.15089v1 ) License: see link	Daniel Gómez-Bravo, Aaron García, Guillermo Vigueras, Belén Ríos, Mariano Provencio, Alejandro Rodríguez-González	Lung cancer is the leading cause of cancer death. More than 238,340 new cases of lung cancer are expected in 2023, with an estimated 127,070 or more deaths. Choosing the correct treatment is an important element in enhancing the probability of survival and improving the patient's quality of life. Cancer treatments might provoke secondary effects. These toxicities cause different health problems that impact the patient's quality of life. Hence, reducing treatment toxicities while maintaining or improving their effectiveness is an important goal from the clinical perspective. On the other hand, clinical guidelines include general knowledge about cancer treatment recommendations to assist clinicians. Although they provide treatment recommendations based on cancer disease aspects and individual patient features, they do not provide a statistical analysis that takes treatment outcomes into account. Therefore, comparing clinical guidelines with the treatment patterns found in clinical data would allow validating those patterns as well as discovering alternative treatment patterns. In this work, we present Information Gained Subgroup Discovery, a Subgroup Discovery algorithm that aims to find the most relevant patterns by taking into account information gain and odds ratio. To this end, we analyze a dataset containing lung cancer patients' information, including patient data, prescribed treatments and their outcomes. The obtained results are validated by clinicians and compared with clinical guidelines. We conclude that this new algorithm achieves the highest acceptance of found patterns in this dataset, while also improving Subgroup Discovery indices.	Translated: 2023-07-31 14:52:00 Published: 2023-07-26
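For reference, the two measures named above can be computed from a 2×2 table of subgroup membership against a binary outcome. This is a minimal sketch using the standard definitions. The cell names `tp/fp/fn/tn` are our own labels, and how the authors combine the two scores into a single ranking is not shown here.

```python
import math

def entropy(pos, neg):
    """Binary entropy (bits) of a group with `pos` positive and `neg` negative outcomes."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(tp, fp, fn, tn):
    """Entropy reduction from splitting patients by subgroup membership.
    tp/fp: outcome positive/negative inside the subgroup; fn/tn: outside it."""
    n = tp + fp + fn + tn
    before = entropy(tp + fn, fp + tn)
    after = ((tp + fp) / n) * entropy(tp, fp) + ((fn + tn) / n) * entropy(fn, tn)
    return before - after

def odds_ratio(tp, fp, fn, tn):
    """Odds of the outcome inside the subgroup over the odds outside it.
    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe correction)."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return (tp * tn) / (fp * fn)

print(information_gain(10, 0, 0, 10))  # 1.0: the subgroup perfectly separates outcomes
print(odds_ratio(20, 10, 10, 20))      # 4.0
```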
# Equitable Time-Varying Pricing Tariff Design: A Joint Learning and Optimization Approach ( http://arxiv.org/abs/2307.15088v1 ) License: see link	Liudong Chen and Bolun Xu	Time-varying pricing tariffs incentivize consumers to shift their electricity demand and reduce costs, but may increase the energy burden for consumers with limited response capability. The utility must thus balance affordability and response incentives when designing these tariffs by considering consumers' response expectations. This paper proposes a joint learning-based identification and optimization method to design equitable time-varying tariffs. Our proposed method encodes historical prices and demand response data into a recurrent neural network (RNN) to capture high-dimensional and non-linear consumer price response behaviors. We then embed the RNN into the tariff design optimization, formulating a non-linear optimization problem with a quadratic objective. We propose a gradient-based solution method that achieves fast and scalable computation. Simulation using real-world consumer data shows that our equitable tariffs protect low-income consumers from price surges while effectively motivating consumers to reduce peak demand. The method also ensures revenue recovery for the utility company and achieves robust performance against demand response uncertainties and prediction errors.	Translated: 2023-07-31 14:51:39 Published: 2023-07-26
# Two-dimensional optomechanical crystal resonator in gallium arsenide ( http://arxiv.org/abs/2307.15087v1 ) License: see link	Rhys G. Povey, Ming-Han Chou, Gustav Andersson, Christopher R. Conner, Joel Grebel, Yash J. Joshi, Jacob M. Miller, Hong Qiao, Xuntao Wu, Haoxiong Yan, Andrew N. Cleland	In the field of quantum computation and communication there is a compelling need for quantum-coherent frequency conversion between microwave electronics and infra-red optics. A promising platform for this is an optomechanical crystal resonator that uses simultaneous photonic and phononic crystals to create a co-localized cavity coupling an electromagnetic mode to an acoustic mode, which can then undergo direct transduction to electronics via electromechanical interactions. The majority of work in this area has been on one-dimensional nanobeam resonators, which provide strong optomechanical couplings but, due to their geometry, cannot effectively dissipate the heat produced by the laser pumping required for operation. Recently, a quasi-two-dimensional optomechanical crystal cavity was developed in silicon, exhibiting similarly strong coupling with better thermalization, but at a mechanical frequency above optimal qubit operating frequencies. Here we adapt this design to gallium arsenide, a natural thin-film single-crystal piezoelectric that can incorporate electromechanical interactions, obtaining a mechanical resonant mode at f_m ≈ 4.5 GHz, ideal for superconducting qubits, and demonstrating an optomechanical coupling g_om/2π ≈ 650 kHz.	Translated: 2023-07-31 14:51:20 Published: 2023-07-26
# Mathematical Modeling of BCG-based Bladder Cancer Treatment Using Socio-Demographics ( http://arxiv.org/abs/2307.15084v1 ) License: see link	Elizaveta Savchenko, Ariel Rosenfeld, Svetlana Bunimovich-Mendrazitsky	Cancer is one of the most widespread diseases around the world, with millions of new patients each year. Bladder cancer (BC) is one of the most prevalent types of cancer, affecting all individuals alike with no obvious prototypical patient. The current standard treatment for BC follows a routine weekly Bacillus Calmette-Guerin (BCG) immunotherapy-based therapy protocol, which is applied to all patients alike. The clinical outcomes associated with BCG treatment vary significantly among patients due to the biological and clinical complexity of the interaction between the immune system, treatments, and cancer cells. In this study, we take advantage of the patient's socio-demographics to offer a personalized mathematical model that describes the clinical dynamics associated with BCG-based treatment. To this end, we adopt a well-established BCG treatment model and integrate a machine learning component to temporally adjust and reconfigure key parameters within the model, thus promoting its personalization. Using real clinical data, we show that our personalized model compares favorably with the original one in predicting the number of cancer cells at the end of the treatment, with an average improvement of 14.8%.	Translated: 2023-07-31 14:50:57 Published: 2023-07-26
# Deep Serial Number: Computational Watermarking for DNN Intellectual Property Protection ( http://arxiv.org/abs/2011.08960v3 ) License: see link	Ruixiang Tang, Mengnan Du, Xia Hu	In this paper, we present DSN (Deep Serial Number), a simple yet effective watermarking algorithm designed specifically for deep neural networks (DNNs). Unlike traditional methods that incorporate identification signals into DNNs, our approach explores a novel Intellectual Property (IP) protection mechanism for DNNs that effectively prevents adversaries from using stolen networks. Inspired by the success of serial numbers in safeguarding conventional software IP, we propose the first implementation of serial number embedding within DNNs. To achieve this, DSN is integrated into a knowledge distillation framework in which a private teacher DNN is first trained. Its knowledge is then distilled and imparted to a series of customized student DNNs. Each customer DNN functions correctly only upon input of a valid serial number. Experimental results across various applications demonstrate DSN's efficacy in preventing unauthorized usage without compromising the original DNN performance. The experiments further show that DSN is resistant to different categories of watermark attacks.	Translated: 2023-07-28 21:07:59 Published: 2023-07-26
# On the Generalization Effects of Linear Transformations in Data Augmentation ( http://arxiv.org/abs/2005.00695v3 ) License: see link	Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré	Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transformations that preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations that mix data can improve estimation through a regularization effect. Finally, we validate our theoretical insights on MNIST. Based on these insights, we propose an augmentation scheme that searches over the space of transformations according to how uncertain the model is about the transformed data. We validate our proposed scheme on image and text datasets. For example, our method outperforms random sampling methods by 1.24% on CIFAR-100 using Wide-ResNet-28-10. Furthermore, we achieve accuracy comparable to the SoTA Adversarial AutoAugment on the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.	Translated: 2023-07-28 21:07:43 Published: 2023-07-26
# Compositional federated learning: Applications in distributionally robust averaging and meta learning ( http://arxiv.org/abs/2106.11264v3 ) License: see link	Feihu Huang, Junyi Li	In this paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many data mining and machine learning problems with a hierarchical structure, such as distributionally robust FL and model-agnostic meta learning (MAML). Moreover, we study the convergence of our ComFedL algorithm under some mild conditions and prove that it achieves a convergence rate of $O(\frac{1}{\sqrt{T}})$, where $T$ denotes the number of iterations. To the best of our knowledge, our new Compositional FL framework is the first work to bridge federated learning with compositional stochastic optimization. In particular, we first transform the distributionally robust FL (i.e., a minimax optimization problem) into a simple composition optimization problem by using KL divergence regularization. At the same time, we also transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple yet effective composition optimization problem. Finally, we apply our algorithm to two popular machine learning tasks, i.e., distributionally robust FL and MAML, to demonstrate its effectiveness.	Translated: 2023-07-28 20:58:35 Published: 2023-07-26
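A standard identity shows how KL regularization turns the minimax problem into a composition. Assume $N$ clients with losses $f_i(x)$, a uniform reference distribution, and a weight $\lambda > 0$. Then

$$\max_{p \in \Delta_N} \sum_i p_i f_i(x) - \lambda\, \mathrm{KL}\big(p \,\|\, \tfrac{1}{N}\mathbf{1}\big) = \lambda \log\Big(\frac{1}{N}\sum_i e^{f_i(x)/\lambda}\Big),$$

with maximizer $p_i \propto e^{f_i(x)/\lambda}$. The right-hand side is an outer $\lambda\log(\cdot)$ applied to an inner average over clients. The Python check below assumes this uniform-prior form, and the client losses are hypothetical. The paper's exact formulation may differ.

```python
import math

def composed_objective(losses, lam):
    """Right-hand side: lam * log(mean(exp(loss / lam))), shifted by the max for stability."""
    m = max(losses)
    inner = sum(math.exp((l - m) / lam) for l in losses) / len(losses)  # inner average
    return m + lam * math.log(inner)                                     # outer log

def regularized_worst_case(losses, lam):
    """Left-hand side evaluated at the optimal weights p_i proportional to exp(loss_i / lam)."""
    n, m = len(losses), max(losses)
    w = [math.exp((l - m) / lam) for l in losses]
    total = sum(w)
    p = [wi / total for wi in w]
    kl = sum(pi * math.log(pi * n) for pi in p if pi > 0)  # KL(p || uniform)
    return sum(pi * l for pi, l in zip(p, losses)) - lam * kl

client_losses = [0.3, 1.2, 0.7, 2.0]  # hypothetical per-client losses
print(composed_objective(client_losses, 0.5))
print(regularized_worst_case(client_losses, 0.5))  # same value up to rounding
```

As $\lambda$ grows the objective approaches the plain average loss (FedAvg-style). As $\lambda \to 0$ it approaches the worst client's loss.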
# On the non-efficient PAC learnability of conjunctive queries ( http://arxiv.org/abs/2208.10255v2 ) License: see link	Balder ten Cate, Maurice Funk, Jean Christoph Jung, Carsten Lutz	This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of "acyclicity"; (iii) we show that CQs (and UCQs) are efficiently PAC learnable with membership queries.	Translated: 2023-07-28 20:50:56 Published: 2023-07-26
# Neural Networks for Scalar Input and Functional Output ( http://arxiv.org/abs/2208.05776v2 ) License: see link	Sidi Wu, Cédric Beaulac and Jiguo Cao	The regression of a functional response on a set of scalar predictors can be a challenging task, especially if there is a large number of predictors or the relationship between those predictors and the response is nonlinear. In this work, we propose a solution to this problem: a feed-forward neural network (NN) designed to predict a functional response using scalar inputs. First, we transform the functional response to a finite-dimensional representation and construct an NN that outputs this representation. Then, we propose to modify the output of an NN via the objective function and introduce different objective functions for network training. The proposed models are suited for both regularly and irregularly spaced data, and a roughness penalty can be further applied to control the smoothness of the predicted curve. The difficulty in implementing both of those features lies in defining objective functions that can be back-propagated. In our experiments, we demonstrate that our model outperforms the conventional function-on-scalar regression model in multiple scenarios while scaling better computationally with the dimension of the predictors.	Translated: 2023-07-28 20:50:43 Published: 2023-07-26
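One common way to give a functional response a finite-dimensional representation is a basis expansion. The network outputs coefficients, and the curve is the weighted sum of basis functions. This sketch assumes a Fourier basis on [0, 1] and approximates the roughness penalty $\int (f'')^2\,dt$ with second differences on an even grid. The paper's choice of basis, objective and penalty may differ, and the network producing `coeffs` is omitted.

```python
import math

def fourier_basis(t, k):
    """Assumed basis on [0, 1]: 1, sin(2*pi*t), cos(2*pi*t), sin(4*pi*t), cos(4*pi*t), ..."""
    if k == 0:
        return 1.0
    freq = (k + 1) // 2
    angle = 2 * math.pi * freq * t
    return math.sin(angle) if k % 2 == 1 else math.cos(angle)

def curve_from_coeffs(coeffs, grid):
    """Map a coefficient vector (e.g. a network output) to curve values on a grid."""
    return [sum(c * fourier_basis(t, k) for k, c in enumerate(coeffs)) for t in grid]

def roughness(values, h):
    """Second-difference approximation of the integral of (f'')^2 on an even grid."""
    return sum(((values[i - 1] - 2 * values[i] + values[i + 1]) / h ** 2) ** 2 * h
               for i in range(1, len(values) - 1))

def penalized_loss(coeffs, grid, observed, lam):
    """Mean squared error against an observed curve plus lam times the roughness."""
    pred = curve_from_coeffs(coeffs, grid)
    mse = sum((p - o) ** 2 for p, o in zip(pred, observed)) / len(grid)
    return mse + lam * roughness(pred, grid[1] - grid[0])

grid = [i / 200 for i in range(201)]
h = grid[1] - grid[0]
print(roughness(curve_from_coeffs([1.0], grid), h))       # 0.0 for a constant curve
print(roughness(curve_from_coeffs([0.0, 1.0], grid), h))  # close to 8 * pi**4
```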
# A lab-based test of the gravitational redshift with a miniature clock network ( http://arxiv.org/abs/2207.07145v2 ) License: see link	Xin Zheng, Jonathan Dolde, Matthew C. Cambria, Hong Ming Lim, Shimon Kolkowitz	Einstein's theory of general relativity predicts that a clock at a higher gravitational potential will tick faster than an otherwise identical clock at a lower potential, an effect known as the gravitational redshift. Here we perform a laboratory-based, blinded test of the gravitational redshift using differential clock comparisons within an evenly spaced array of 5 atomic ensembles spanning a height difference of 1 cm. We measure a fractional frequency gradient of $[-12.4\pm0.7_{\rm{(stat)}}\pm2.5_{\rm{(sys)}}]\times10^{-19}/$cm, consistent with the expected redshift gradient of $-10.9\times10^{-19}/$cm. Our results can also be viewed as relativistic gravitational potential difference measurements with sensitivity to mm-scale changes in height on the surface of the Earth. These results highlight the potential of local-oscillator-independent differential clock comparisons for emerging applications of optical atomic clocks, including geodesy, searches for new physics, gravitational wave detection, and explorations of the interplay between quantum mechanics and gravity.	Translated: 2023-07-28 20:49:51 Published: 2023-07-26
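The expected gradient quoted above follows from the weak-field redshift formula $\Delta f/f \approx g\,\Delta h/c^2$. The check below assumes standard gravity $g = 9.80665$ m/s² (the local value at the lab differs slightly). It also combines the statistical and systematic uncertainties in quadrature, which is our assumption and not necessarily the authors' procedure. The sign of the gradient depends on the direction of comparison, so the signs are used as quoted.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact)
G = 9.80665        # standard gravity in m/s^2 (assumed; local g differs slightly)

def redshift_per_height(dh_m, g=G):
    """Weak-field gravitational redshift: fractional frequency shift g*dh/c^2."""
    return g * dh_m / C ** 2

expected = redshift_per_height(0.01)  # 1 cm height difference
print(f"{expected:.2e} per cm")       # 1.09e-18 per cm, i.e. 10.9e-19 per cm

# Reported gradient and uncertainties (units of 1e-19 per cm), signs as quoted.
measured, stat, syst = -12.4, 0.7, 2.5
predicted = -10.9
sigma = math.hypot(stat, syst)        # quadrature sum (assumption)
print(f"{abs(measured - predicted) / sigma:.2f} sigma")
```

Under these assumptions, the measurement lies well within one combined standard deviation of the prediction.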
# Runtime Analysis for Permutation-based Evolutionary Algorithms ( http://arxiv.org/abs/2207.04045v3 ) License: see link	Benjamin Doerr, Yassine Ghannane, Marouane Ibn Brahim	While the theoretical analysis of evolutionary algorithms (EAs) has made significant progress for pseudo-Boolean optimization problems in the last 25 years, only sporadic theoretical results exist on how EAs solve permutation-based problems. To overcome the lack of permutation-based benchmark problems, we propose a general way to transfer the classic pseudo-Boolean benchmarks into benchmarks defined on sets of permutations. We then conduct a rigorous runtime analysis of the permutation-based $(1+1)$ EA proposed by Scharnow, Tinnefeld, and Wegener (2004) on the analogues of the LeadingOnes and Jump benchmarks. The latter shows that, unlike for bit-strings, it is not only the Hamming distance that determines how difficult it is to mutate a permutation $\sigma$ into another one $\tau$, but also the precise cycle structure of $\sigma \tau^{-1}$. For this reason, we also consider the more symmetric scramble mutation operator. We observe that it not only leads to simpler proofs, but also reduces the runtime on jump functions with odd jump size by a factor of $\Theta(n)$. Finally, we show that a heavy-tailed version of the scramble operator, as in the bit-string case, leads to a speed-up of order $m^{\Theta(m)}$ on jump functions with jump size $m$. A short empirical analysis confirms these findings, but also reveals that small implementation details like the rate of void mutations can make an important difference.	Translated: 2023-07-28 20:49:35 Published: 2023-07-26
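The permutation analogue of LeadingOnes and a swap-based $(1+1)$ EA can be sketched as follows. This sketch rests on two assumptions, both our own reading rather than code from the paper. First, LeadingOnes on a permutation counts the longest prefix with `perm[i] == i`, so the identity is the optimum. Second, each mutation applies $1 + \mathrm{Pois}(1)$ random transpositions. The scramble operator is not shown.

```python
import math
import random

def leading_ones_perm(perm):
    """Length of the longest prefix with perm[i] == i (identity permutation is optimal)."""
    k = 0
    while k < len(perm) and perm[k] == k:
        k += 1
    return k

def poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) sample (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def one_plus_one_ea(n, max_iters=100_000, seed=0):
    """Elitist (1+1) EA on permutations; returns (best permutation, iterations used)."""
    rng = random.Random(seed)
    parent = list(range(n))
    rng.shuffle(parent)
    fit = leading_ones_perm(parent)
    for it in range(max_iters):
        if fit == n:
            return parent, it
        child = parent[:]
        for _ in range(1 + poisson(1.0, rng)):  # assumed: 1 + Pois(1) random swaps
            i, j = rng.sample(range(n), 2)
            child[i], child[j] = child[j], child[i]
        child_fit = leading_ones_perm(child)
        if child_fit >= fit:  # accept equal or better offspring
            parent, fit = child, child_fit
    return parent, max_iters

best, iters = one_plus_one_ea(6)
print(best, iters)
```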
# Visual Pre-training for Navigation: What Can We Learn from Noise? ( http://arxiv.org/abs/2207.00052v3 ) License: see link	Yanwei Wang, Ching-Yun Ko, Pulkit Agrawal	One powerful paradigm in visual navigation is to predict actions from observations directly. Training such an end-to-end system allows representations useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes such a system data-inefficient. We hypothesize that a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such random crop prediction in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. The code is available at https://yanweiw.github.io/noise2ptz	Translated: 2023-07-28 20:49:09 Published: 2023-07-26
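The pretext task can be pictured as generating (current view, goal crop, target) triples from pure noise. This minimal sketch assumes i.i.d. uniform noise, square crops, and a target encoded as the normalised crop centre and side length. The paper's noise images, crop parameterisation, and the network that regresses the target are not reproduced here.

```python
import random

def make_noise_image(height, width, rng):
    """Synthetic 'current view': i.i.d. uniform noise (the noise type is an assumption)."""
    return [[rng.random() for _ in range(width)] for _ in range(height)]

def random_crop_example(image, rng, min_frac=0.2, max_frac=0.8):
    """Sample a square crop as the 'goal view' and return (crop, target), where
    target = (centre_x, centre_y, side), each normalised to [0, 1]."""
    h, w = len(image), len(image[0])
    side = max(1, int(min(h, w) * rng.uniform(min_frac, max_frac)))
    top = rng.randrange(0, h - side + 1)
    left = rng.randrange(0, w - side + 1)
    crop = [row[left:left + side] for row in image[top:top + side]]
    target = ((left + side / 2) / w, (top + side / 2) / h, side / min(h, w))
    return crop, target

rng = random.Random(0)
view = make_noise_image(64, 64, rng)
goal, (cx, cy, size) = random_crop_example(view, rng)
```

A model trained on such triples would receive `view` and `goal` and regress `(cx, cy, size)`. The abstract reports that this kind of training on noise transfers to natural home images.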
# Quantum Locally Testable Code with Constant Soundness ( http://arxiv.org/abs/2209.11405v2 ) License: see link	Andrew Cross, Zhiyang He, Anand Natarajan, Mario Szegedy, Guanyu Zhu	In this paper, we present two constructions of quantum locally testable codes (QLTC) with constant soundness. In the first approach, we introduce an operation called check product, and show how this operation gives rise to QLTCs of constant soundness, constant rate, and distance scaling with locality. In the second approach, we consider the hypergraph product of a quantum code and a classical repetition code, and observe a special case in which the soundness of the component codes is preserved. This insight leads us to construct QLTCs of constant soundness, scalable rate and distance, and constant average locality. Our work marks a step towards constructing QLTCs of high soundness and distance, which would give an alternative construction for the No Low-Energy Trivial States (NLTS) theorem.	Translated: 2023-07-28 20:40:58 Published: 2023-07-26
# Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis ( http://arxiv.org/abs/2209.10825v3 ) License: see link	Jiajin Li, Linglingzhi Zhu and Anthony Man-Cho So	Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Łojasiewicz (KL) property with exponent $\theta \in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $\epsilon$-game-stationary points and $\epsilon$-optimization-stationary points of the problems of interest in $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ iterations. Furthermore, when $\theta \in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(\epsilon^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that certain max-structured problems possess the KL property with exponent $\theta=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.	Translated: 2023-07-28 20:40:43 Published: 2023-07-26
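Splitting the iteration bound in the abstract by the KL exponent shows where the optimal rate comes from. For $\theta \le \frac{1}{2}$ we have $2\theta \le 1$, so the maximum equals 1:

$$
\max\{2\theta, 1\} =
\begin{cases}
1, & \theta \in [0, \tfrac{1}{2}],\\
2\theta, & \theta \in (\tfrac{1}{2}, 1),
\end{cases}
\qquad\Longrightarrow\qquad
\mathcal{O}\big(\epsilon^{-2\max\{2\theta,1\}}\big) =
\begin{cases}
\mathcal{O}(\epsilon^{-2}), & \theta \in [0, \tfrac{1}{2}],\\
\mathcal{O}(\epsilon^{-4\theta}), & \theta \in (\tfrac{1}{2}, 1).
\end{cases}
$$

For example, $\theta = \tfrac{3}{4}$ gives $\mathcal{O}(\epsilon^{-3})$. As $\theta \to 1$ the bound approaches $\mathcal{O}(\epsilon^{-4})$.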
# A Deep Perceptual Measure for Lens and Camera Calibration ( http://arxiv.org/abs/2208.12300v2 ) License: see link	Yannick Hold-Geoffroy, Dominique Piché-Meunier, Kalyan Sunkavalli, Jean-Charles Bazin, François Rameau and Jean-François Lalonde	Image editing and compositing have become ubiquitous in entertainment, from digital art to AR and VR experiences. To produce beautiful composites, the camera needs to be geometrically calibrated, which can be tedious and requires a physical calibration target. In place of the traditional multi-image calibration process, we propose to infer the camera calibration parameters, such as pitch, roll, field of view, and lens distortion, directly from a single image using a deep convolutional neural network. We train this network using automatically generated samples from a large-scale panorama dataset, yielding competitive accuracy in terms of standard $\ell_2$ error. However, we argue that minimizing such standard error metrics might not be optimal for many applications. In this work, we investigate human sensitivity to inaccuracies in geometric camera calibration. To this end, we conduct a large-scale human perception study where we ask participants to judge the realism of 3D objects composited with correct and biased camera calibration parameters. Based on this study, we develop a new perceptual measure for camera calibration and demonstrate that our deep calibration network outperforms previous single-image based calibration methods both on standard metrics and on this novel perceptual measure. Finally, we demonstrate the use of our calibration network for several applications, including virtual object insertion, image retrieval, and compositing. A demonstration of our approach is available at https://lvsn.github.io/deepcalib .	Translated: 2023-07-28 20:39:20 Published: 2023-07-26
# Learning Task Automata for Reinforcement Learning using Hidden Markov Models ( http://arxiv.org/abs/2208.11838v3 ) License: see link	Alessandro Abate (1), Yousif Almulla (1), James Fox (1), David Hyland (1), Michael Wooldridge (1) ((1) University of Oxford)	Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state "task automata" from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt coherent tasks with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well-suited for use in transfer learning. Finally, we provide experimental results compared with two baselines to illustrate our algorithm's performance in different environments and tasks.	Translated: 2023-07-28 20:38:54 Published: 2023-07-26
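The pipeline above relies on the Baum-Welch algorithm. For reference, here is a textbook Baum-Welch (EM) sketch for a discrete-observation hidden Markov model, using scaled forward-backward passes. It is deliberately simpler than the paper's setting: it has no actions, no product-MDP structure and no automaton distillation. The function names and the random initialization are illustrative assumptions. The property worth checking is that each EM iteration never decreases the log-likelihood.

```python
import numpy as np

def forward_backward(A, B, pi, obs):
    """Scaled forward-backward pass for a discrete HMM.

    A: (S, S) transitions, B: (S, O) emissions, pi: (S,) initial distribution.
    Returns state posteriors gamma (T, S), pairwise posteriors xi (T-1, S, S)
    and the log-likelihood log P(obs).
    """
    T, S = len(obs), A.shape[0]
    alpha, beta, scale = np.zeros((T, S)), np.ones((T, S)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):                      # forward pass with per-step scaling
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):             # backward pass reusing the same scales
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta
    xi = (alpha[:-1, :, None] * A[None]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / scale[1:, None, None]
    return gamma, xi, np.log(scale).sum()

def baum_welch(obs, S, O, n_iter=20, seed=0):
    """Fit an S-state HMM with O symbols by EM; returns (A, B, pi, log_liks)."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs)
    A = rng.random((S, S)); A /= A.sum(1, keepdims=True)
    B = rng.random((S, O)); B /= B.sum(1, keepdims=True)
    pi = np.full(S, 1.0 / S)
    log_liks = []
    for _ in range(n_iter):
        gamma, xi, ll = forward_backward(A, B, pi, obs)   # E-step
        log_liks.append(ll)
        pi = gamma[0]                                     # M-step updates
        A = xi.sum(0) / gamma[:-1].sum(0)[:, None]
        B = np.stack([gamma[obs == k].sum(0) for k in range(O)], axis=1)
        B /= gamma.sum(0)[:, None]
    return A, B, pi, log_liks
```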
# Reasons for the Superiority of Stochastic Estimators over Deterministic Ones: Robustness, Consistency and Perceptual Quality ( http://arxiv.org/abs/2211.08944v3 ) License: see link	Guy Ohayon, Theo Adrai, Michael Elad, Tomer Michaeli	Stochastic restoration algorithms make it possible to explore the space of solutions that correspond to the degraded input. In this paper we reveal additional fundamental advantages of stochastic methods over deterministic ones, which further motivate their use. First, we prove that any restoration algorithm that attains perfect perceptual quality and whose outputs are consistent with the input must be a posterior sampler, and is thus required to be stochastic. Second, we illustrate that while deterministic restoration algorithms may attain high perceptual quality, this can be achieved only by filling up the space of all possible source images using an extremely sensitive mapping, which makes them highly vulnerable to adversarial attacks. Indeed, we show that forcing deterministic models to be robust to such attacks profoundly hinders their perceptual quality, while robustifying stochastic models hardly influences their perceptual quality and improves their output variability. These findings provide a motivation to foster progress in stochastic restoration methods, paving the way to better recovery algorithms.	Translated: 2023-07-28 20:28:35 Published: 2023-07-26
# Bohm - de Broglie Cycles ( http://arxiv.org/abs/2301.13251v2 ) License: see link	Olivier Piguet	In the de Broglie-Bohm quantum theory, particles describe trajectories determined by the flux associated with their wave function. These trajectories are studied here for relativistic spin-one-half particles. Based on explicit numerical calculations for the case of a massless particle in three-dimensional space-time, it is shown that if the wave function is an eigenfunction of the total angular momentum, the trajectories begin as circles of slowly increasing radius until a transition time at which they tend to follow straight lines. Arrival times at a detector, as well as their probability distribution, are also calculated. The chosen energy and momentum parameters are of the orders of magnitude encountered in graphene physics.	Translated: 2023-07-28 20:20:39 Published: 2023-07-26
# Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver ( http://arxiv.org/abs/2301.01913v2 ) License: see link	Tom Marty, Tristan François, Pierre Tessier, Louis Gauthier, Louis-Martin Rousseau, Quentin Cappart	Constraint programming is known for being an efficient approach for solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, this remains an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are more scarce. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This has been achieved thanks to the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture. Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework, while remaining generic, is able to find better solutions close to optimality without requiring a large number of backtracks.	Translated: 2023-07-28 20:19:43 Published: 2023-07-26
# DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation ( http://arxiv.org/abs/2212.13504v3 ) License: see link	Reza Azad, René Arimond, Ehsan Khodapanah Aghdam, Amirhossein Kazerouni, Dorit Merhof	Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, which is the core part of the Transformer model, usually suffers from quadratic computational complexity with respect to the number of tokens. Many architectures attempt to reduce model complexity by limiting the self-attention mechanism to local regions or by redesigning the tokenization process. In this paper, we propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism. More specifically, we reformulate the self-attention mechanism to capture both spatial and channel relations across the whole feature dimension while staying computationally efficient. Furthermore, we redesign the skip connection path by including a cross-attention module to ensure feature reusability and enhance the localization power. Our method outperforms state-of-the-art methods on multi-organ, cardiac, and skin lesion segmentation datasets without requiring pre-training weights. The code is publicly available at https://github.com/mindflow-institue/DAEFormer.	Translated: 2023-07-28 20:19:25 Published: 2023-07-26
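The abstract does not spell out how DAE-Former reformulates self-attention. As a generic illustration of how attention can avoid the quadratic cost in the number of tokens $N$, here is the "efficient attention" factorization known from the literature. It applies softmax to queries and keys separately and multiplies $K^\top V$ first. This gives an $N \times d$ result at $\mathcal{O}(N d^2)$ cost and never forms an $N \times N$ matrix. It is an assumed stand-in, not a claim about DAE-Former's exact module.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def efficient_attention(Q, K, V):
    """Q, K: (N, d_k), V: (N, d_v) -> (N, d_v) without an N x N attention map."""
    q = softmax(Q, axis=1)   # each query normalized over its d_k features
    k = softmax(K, axis=0)   # each key feature normalized over the N tokens
    context = k.T @ V        # (d_k, d_v) global context, cost O(N * d_k * d_v)
    return q @ context       # (N, d_v)
```

Because every row of `q` and every column of `k` sums to one, a constant value matrix passes through unchanged. This makes a quick sanity check.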
# Quark pair angular correlations in the proton: entropy versus entanglement negativity ( http://arxiv.org/abs/2303.07408v2 ) License: see link	Adrian Dumitru and Eric Kolbusz	Two-particle correlations in the proton on the light-front are described by a mixed density matrix obtained by tracing over all other, unobserved, degrees of freedom. We quantify genuinely quantum quark azimuthal correlations in terms of the entanglement negativity measure of Quantum Information Theory. While the two-quark state in color space is one of high entropy and weak quantum correlation, we find that a standard three-quark model wave function from the literature predicts an azimuthally correlated state of low entropy and high entanglement negativity. Low entropy is consistent with expectations for many colors (at fixed 't Hooft coupling $g^2 N_c$), but high negativity indicates substantial two-particle quantum correlations at $N_c=3$. Suppressing quantum correlations associated with entanglement negativity strongly modifies quark pair azimuthal moments $\langle \zeta^n \rangle$, $\zeta = \exp(i (\phi_1-\phi_2))$, intrinsic to the proton state. We also describe how to account for the leading ${\cal O}(g^2)$ correction to the density matrix from light-cone perturbation theory, which is due to the presence (or exchange) of a gluon in the proton. This correction increases the entropy and reduces the negativity of the density matrix for quark pair azimuthal correlations. Hence, the entanglement negativity measure may provide novel insight into the structure of the proton state of QCD.	Translated: 2023-07-28 20:11:27 Published: 2023-07-26
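Entanglement negativity is computed from the partial transpose of a bipartite density matrix: $\mathcal{N}(\rho) = (\lVert \rho^{T_B} \rVert_1 - 1)/2$, which equals the sum of the absolute values of the negative eigenvalues of $\rho^{T_B}$. Below is a generic finite-dimensional sketch. It does not build the paper's quark-pair azimuthal density matrix. The check uses a two-qubit Werner state $p\,|\Phi^+\rangle\langle\Phi^+| + (1-p)\,I/4$, whose negativity is $\max(0, (3p-1)/4)$.

```python
import numpy as np

def partial_transpose_B(rho, dA, dB):
    """Partial transpose over subsystem B of a (dA*dB) x (dA*dB) density matrix."""
    r = rho.reshape(dA, dB, dA, dB)              # indices (a, b, a', b')
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)  # swap b <-> b'

def negativity(rho, dA, dB):
    """N(rho) = (trace norm of rho^{T_B} - 1) / 2."""
    eig = np.linalg.eigvalsh(partial_transpose_B(rho, dA, dB))  # Hermitian input
    return float(np.abs(eig).sum() - 1.0) / 2.0

def werner(p):
    """Two-qubit Werner state p |Phi+><Phi+| + (1 - p) I / 4."""
    phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
    return p * np.outer(phi, phi) + (1.0 - p) * np.eye(4) / 4.0
```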
# Spectral learning of Bernoulli linear dynamical systems models ( http://arxiv.org/abs/2303.02060v2 ) License: see link	Iris R. Stone, Yotam Sagiv, Il Memming Park, Jonathan W. Pillow	Latent linear dynamical systems with Bernoulli observations provide a powerful modeling framework for identifying the temporal dynamics underlying binary time series data, which arise in a variety of contexts such as binary decision-making and discrete stochastic processes (e.g., binned neural spike trains). Here we develop a spectral learning method for fast, efficient fitting of probit-Bernoulli latent linear dynamical system (LDS) models. Our approach extends traditional subspace identification methods to the Bernoulli setting via a transformation of the first and second sample moments. This results in a robust, fixed-cost estimator that avoids the hazards of local optima and the long computation time of iterative fitting procedures like the expectation-maximization (EM) algorithm. In regimes where data are limited or assumptions about the statistical structure of the data are not met, we demonstrate that the spectral estimate provides a good initialization for Laplace-EM fitting. Finally, we show that the estimator provides substantial benefits in real-world settings by analyzing data from mice performing a sensory decision-making task.	Translated: 2023-07-28 20:10:54 Published: 2023-07-26
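The abstract says binary data are handled "via a transformation of the first and second sample moments". Under a probit link, one standard way such a transformation can work is sketched here. This is an assumption for illustration, not necessarily the authors' construction. Assume $y_i = \mathbf{1}[u_i > -h_i]$ with $(u_1, u_2)$ standard bivariate normal with correlation $\rho$. Then $P(y_i = 1) = \Phi(h_i)$ and $P(y_1 = 1, y_2 = 1) = \Phi_2(h_1, h_2; \rho)$. The thresholds therefore follow from the first moments, and $\rho$ follows from the second moment by a 1-D root search, since $\Phi_2$ increases in $\rho$. The helper names are hypothetical. The subsequent subspace-identification step on the latent moments is not shown.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def bivariate_cdf(h1, h2, rho, n=4000, lower=-10.0):
    """Phi_2(h1, h2; rho) via the trapezoidal rule on
    integral_{lower}^{h1} phi(x) * Phi((h2 - rho*x) / sqrt(1 - rho^2)) dx."""
    s = math.sqrt(1.0 - rho * rho)
    step = (h1 - lower) / n
    ys = [STD.pdf(lower + i * step) * STD.cdf((h2 - rho * (lower + i * step)) / s)
          for i in range(n + 1)]
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def latent_moments(p1, p2, p11, tol=1e-6):
    """Map binary moments (P(y1=1), P(y2=1), P(y1=1, y2=1)) to (h1, h2, rho)."""
    h1, h2 = STD.inv_cdf(p1), STD.inv_cdf(p2)   # first-moment inversion
    lo, hi = -0.999, 0.999
    while hi - lo > tol:                        # bisection: Phi_2 increases in rho
        mid = 0.5 * (lo + hi)
        if bivariate_cdf(h1, h2, mid) < p11:
            lo = mid
        else:
            hi = mid
    return h1, h2, 0.5 * (lo + hi)
```

As a check, with zero thresholds Sheppard's formula gives $\Phi_2(0, 0; \rho) = \tfrac{1}{4} + \arcsin(\rho)/(2\pi)$. So $p_1 = p_2 = \tfrac{1}{2}$ and $p_{11} = \tfrac{1}{3}$ should map back to $\rho \approx 0.5$.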
# Atmospheric Turbulence Correction via Variational Deep Diffusion ( http://arxiv.org/abs/2305.05077v2 ) License: see link	Xijun Wang, Santiago López-Tapia, Aggelos K. Katsaggelos	Atmospheric Turbulence (AT) correction is a challenging restoration task as it involves two distortions: geometric distortion and spatially variant blur. Diffusion models have shown impressive accomplishments in photo-realistic image synthesis and beyond. In this paper, we propose a novel deep conditional diffusion model under a variational inference framework to solve the AT correction problem. We use this framework to improve performance by learning latent prior information from the input and degradation processes. We use the learned information to further condition the diffusion model. Experiments are conducted on a comprehensive synthetic AT dataset. We show that the proposed framework achieves good quantitative and qualitative results.	Translated: 2023-07-28 20:02:39 Published: 2023-07-26
# Video-Specific Query-Key Attention Modeling for Weakly-Supervised Temporal Action Localization ( http://arxiv.org/abs/2305.04186v2 ) License: see link	Xijun Wang, Aggelos K. Katsaggelos	Weakly-supervised temporal action localization aims to identify and localize action instances in untrimmed videos with only video-level action labels. When humans watch videos, we can adapt our abstract-level knowledge about actions to different video scenarios and detect whether some actions are occurring. In this paper, we mimic this human ability and bring a new perspective for locating and identifying multiple actions in a video. We propose a network named VQK-Net with video-specific query-key attention modeling that learns a unique query for each action category of each input video. The learned queries not only contain the actions' knowledge features at the abstract level but can also fit this knowledge to the target video scenario, and they are used to detect the presence of the corresponding action along the temporal dimension. To better learn these action category queries, we exploit not only the features of the current input video but also the correlation between different videos, through a novel video-specific action category query learner that works together with a query similarity loss. Finally, we conduct extensive experiments on three commonly used datasets (THUMOS14, ActivityNet1.2, and ActivityNet1.3) and achieve state-of-the-art performance.	Translated: 2023-07-28 20:02:31 Published: 2023-07-26
# Context-aware attention layers coupled with optimal transport domain adaptation and multimodal fusion methods for recognizing dementia from spontaneous speech ( http://arxiv.org/abs/2305.16406v2 ) License: see link	Loukas Ilias, Dimitris Askounis	Alzheimer's disease (AD) is a complex neurocognitive disease and the main cause of dementia. Although many studies have targeted the diagnosis of dementia from spontaneous speech, they still have limitations. Existing state-of-the-art approaches that propose multimodal methods train language and acoustic models separately, employ majority-vote approaches, and concatenate the representations of the different modalities either at the input level, i.e., early fusion, or during training. Also, some of them employ self-attention layers, which calculate the dependencies between representations without considering the contextual information. In addition, no prior work has taken model calibration into consideration. To address these limitations, we propose new methods for detecting AD patients, which capture the intra- and cross-modal interactions. First, we convert the audio files into log-Mel spectrograms, their delta, and delta-delta, and in this way create one three-channel image per audio file. Next, we pass each transcript and image through BERT and DeiT models, respectively. After that, context-based self-attention layers, self-attention layers with a gate model, and optimal transport domain adaptation methods are employed to capture the intra- and inter-modal interactions. Finally, we exploit two methods for fusing the self- and cross-attention features. To take model calibration into account, we apply label smoothing. We use both performance and calibration metrics. Experiments conducted on the ADReSS and ADReSSo Challenge datasets indicate the efficacy of our approaches over existing research initiatives, with our best-performing model reaching an accuracy of up to 91.25% and an F1-score of up to 91.06%.	Translated: 2023-07-28 19:51:36 Published: 2023-07-26
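Two preprocessing ideas from this abstract can be sketched briefly: stacking a log-Mel spectrogram with its delta and delta-delta into a three-channel "image", and label smoothing. In the sketch below the log-Mel array is assumed to be given, with shape `(n_mels, T)`. The delta uses the common regression formula $d_t = \sum_{n=1}^{N} n (c_{t+n} - c_{t-n}) / (2 \sum_{n=1}^{N} n^2)$ with edge padding. Audio libraries ship their own delta functions with different windowing defaults, so this is an assumed variant, not the authors' exact preprocessing. The function names are hypothetical.

```python
import numpy as np

def delta(feat, N=2):
    """Regression delta along the time axis (axis=1) of an (n_mels, T) array."""
    T = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (N, N)), mode="edge")  # repeat edge frames
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feat, dtype=float)
    for n in range(1, N + 1):
        # padded[:, N + t] is frame t, so these slices are c_{t+n} and c_{t-n}
        out += n * (padded[:, N + n:N + n + T] - padded[:, N - n:N - n + T])
    return out / denom

def three_channel_image(log_mel):
    """Stack log-Mel, delta and delta-delta into a (3, n_mels, T) array."""
    d1 = delta(log_mel)
    return np.stack([log_mel, d1, delta(d1)], axis=0)

def smooth_labels(y, num_classes, eps=0.1):
    """Label smoothing: (1 - eps) * one_hot + eps / K."""
    return (1.0 - eps) * np.eye(num_classes)[y] + eps / num_classes
```

On a linear ramp the interior delta equals the slope. With two classes and `eps=0.1`, the smoothed targets are 0.95 and 0.05.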
# Science in the Era of ChatGPT, Large Language Models and Generative AI: Challenges for Research Ethics and How to Respond ( http://arxiv.org/abs/2305.15299v3 ) License: see link	Evangelos Pournaras	Large language models of artificial intelligence (AI), such as ChatGPT, have found remarkable but controversial applications in science and research. This paper reviews epistemological challenges and ethical and integrity risks in the conduct of science with the advent of generative AI. The aim is to lay new, timely foundations for high-quality research ethics review. The role of AI language models as a research instrument and subject is scrutinized, along with ethical implications for scientists, participants and reviewers. New emerging practices for research ethics review are discussed, concluding with ten recommendations that shape a response for more responsible research conduct in the era of AI.	Translated: 2023-07-28 19:51:03 Published: 2023-07-26
# Unbounded Quantum Advantage in One-Way Strong Communication Complexity of a Distributed Clique Labelling Relation ( http://arxiv.org/abs/2305.10372v2 ) License: see link	Sumit Rout, Nitica Sakharwade, Some Sankar Bhattacharya, Ravishankar Ramanathan, Paweł Horodecki	We investigate the one-way zero-error classical and quantum communication complexities for a class of relations induced by a distributed clique labelling problem. We consider two variants: 1) the receiver outputs an answer satisfying the relation, i.e., the traditional communication complexity of relations (CCR); and 2) the receiver has a non-zero probability of outputting every valid answer satisfying the relation (equivalently, the relation can be fully reconstructed), which we call the strong communication complexity of the relation (S-CCR). We prove that, for the specific class of relations considered here, when the players do not share any resources there is no quantum advantage in the CCR task for any graph. On the other hand, we show that there exist classes of graphs for which the separation between one-way classical and quantum communication in the S-CCR task grows with the order $m$ of the graph; specifically, the quantum complexity is $O(1)$ while the classical complexity is $\Omega(\log m)$. Secondly, we prove a lower bound (linear in the number of cliques) on the amount of shared randomness necessary to overcome the separation in the scenario of fixed restricted communication, and connect this to the existence of Orthogonal Arrays. Finally, we highlight some applications of this task to semi-device-independent dimension witnessing as well as to the detection of Mutually Unbiased Bases.	Translated: 2023-07-28 19:49:57 Published: 2023-07-26
# Interactions and dynamics of one-dimensional droplets, bubbles and kinks ( http://arxiv.org/abs/2306.07055v2 ) License: see link	G. C. Katsimiga, S. I. Mistakidis, B. A. Malomed, D. J. Frantzeskakis, R. Carretero-González and P. G. Kevrekidis	We explore the dynamics and interactions of multiple bright droplets and bubbles, as well as the interactions of kinks with droplets and with antikinks, in the extended one-dimensional Gross-Pitaevskii model including the Lee-Huang-Yang correction. Existence regions are identified for the one-dimensional droplets and bubbles in terms of their chemical potential, verifying the stability of the droplets and exposing the instability of the bubbles. The limiting case of the droplet family is a stable kink. The interactions between droplets demonstrate in-phase (out-of-phase) attraction (repulsion), with the so-called Manton's method explaining the observed dynamical response, and mixed behavior for intermediate values of the phase shift. Droplets bearing different chemical potentials experience mass-exchange phenomena. Individual bubbles exhibit core expansion and mutual attraction prior to their destabilization. Droplets interacting with kinks are absorbed by them, a process accompanied by the emission of dispersive shock waves and gray solitons. Kink-antikink interactions are repulsive, generating counter-propagating shock waves. Our findings reveal dynamical features of droplets and kinks that can be detected in current experiments.	Translated: 2023-07-28 19:42:03 Published: 2023-07-26
# PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning ( http://arxiv.org/abs/2305.19472v2 ) License: see link	Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona J. Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi	Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g., "scheduling a doctor's appointment without a phone". While current approaches show encouraging results using large language models (LLMs), they are hindered by drawbacks such as costly API calls and reproducibility issues. In this paper, we advocate planning using smaller language models. We present PlaSma, a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities. More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models and an inference-time algorithm to facilitate more structured and accurate reasoning. In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation. In both the original and counterfactual settings, we show that orders-of-magnitude smaller models (770M-11B parameters) can compete with and often surpass their larger teacher models' capabilities.	Translated: 2023-07-28 19:39:47 Published: 2023-07-26
# Computational Reproducibility in Computational Social Science ( http://arxiv.org/abs/2307.01918v3 ) License: see link	David Schoch, Chung-hong Chan, Claudia Wagner, Arnim Bleier	In the last decade, replication and reproducibility crises have shaken the scientific landscape. As potential solutions, open science practices were heavily discussed and have been implemented with varying success in different disciplines. We argue, however, that the binary definition of reproducibility, specifically for computational-X disciplines such as computational social science, is insufficient since it is not explicit about the agents and conditions under which results can be reproduced. We expand the definition to avoid "open washing", the practice of fabricating theoretical reproducibility without supporting practical or verified reproducibility, and introduce a tier system of computational reproducibility based on the concept of verifiability. We identify common barriers to verifiable computational reproducibility, specifically in the field of computational social science, and provide suggestions on how to circumvent common access and computational barriers.	Translated: 2023-07-28 19:30:48 Published: 2023-07-26
# Is ChatGPT a Good Personality Recognizer? A Preliminary Study ( http://arxiv.org/abs/2307.03952v2 ) License: see link	Yu Ji, Wen Wu, Hong Zheng, Yi Hu, Xi Chen, Liang He	In recent years, personality has been regarded as a valuable personal factor and has been incorporated into numerous tasks such as sentiment analysis and product recommendation. This has drawn widespread attention to the text-based personality recognition task, which aims to identify an individual's personality based on given text. Considering that ChatGPT has recently exhibited remarkable abilities on various natural language processing tasks, we provide a preliminary evaluation of ChatGPT on the text-based personality recognition task for generating effective personality data. Concretely, we employ a variety of prompting strategies to explore ChatGPT's ability to recognize personality from given text, especially the level-oriented prompting strategy we designed to guide ChatGPT in analyzing given text at a specified level. The experimental results on two representative real-world datasets reveal that ChatGPT with zero-shot chain-of-thought prompting exhibits impressive personality recognition ability and is capable of providing natural language explanations through text-based logical reasoning. Furthermore, by employing the level-oriented prompting strategy to optimize zero-shot chain-of-thought prompting, the performance gap between ChatGPT and the corresponding state-of-the-art model is narrowed even further. However, we observe that ChatGPT shows unfairness towards certain sensitive demographic attributes such as gender and age. Additionally, we discover that eliciting the personality recognition ability of ChatGPT helps improve its performance on personality-related downstream tasks such as sentiment classification and stress prediction.	Translated: 2023-07-28 19:20:58 Published: 2023-07-26
# Fault-Tolerant Hastings-Haah Codes in the Presence of Dead Qubits ( http://arxiv.org/abs/2307.03715v2 ) License: see link	David Aasen, Jeongwan Haah, Parsa Bonderson, Zhenghan Wang, Matthew Hastings	We develop protocols for Hastings-Haah Floquet codes in the presence of dead qubits.	Translated: 2023-07-28 19:20:00 Published: 2023-07-26
# LAMP: Leveraging Language Prompts for Multi-person Pose Estimation ( http://arxiv.org/abs/2307.11934v2 ) License: see link	Shengnan Hu, Ce Zheng, Zixiang Zhou, Chen Chen, and Gita Sukthankar	Human-centric visual understanding is an important desideratum for effective human-robot interaction. In order to navigate crowded public places, social robots must be able to interpret the activity of the surrounding humans. This paper addresses one key aspect of human-centric visual understanding, multi-person pose estimation. Achieving good performance on multi-person pose estimation in crowded scenes is difficult due to the challenges of occluded joints and instance separation. In order to tackle these challenges and overcome the limitations of image features in representing invisible body parts, we propose a novel prompt-based pose inference strategy called LAMP (Language Assisted Multi-person Pose estimation). By utilizing the text representations generated by a well-trained language model (CLIP), LAMP can facilitate the understanding of poses on the instance and joint levels, and learn more robust visual representations that are less susceptible to occlusion. This paper demonstrates that language-supervised training boosts the performance of single-stage multi-person pose estimation, and that both instance-level and joint-level prompts are valuable for training. The code is available at https://github.com/shengnanh20/LAMP.	Translated: 2023-07-28 19:11:24 Published: 2023-07-26
# On the Vulnerability of Fairness Constrained Learning to Malicious Noise ( http://arxiv.org/abs/2307.11892v2 ) License: see link	Avrim Blum, Princewill Okoroafor, Aadirupa Saha, Kevin Stangl	We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and presented negative results showing that there exist data distributions where, for several fairness constraints, any proper learner will exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we show we can incur only a $\Theta(\alpha)$ loss in accuracy, where $\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an $O(\sqrt{\alpha})$ loss, and give a matching $\Omega(\sqrt{\alpha})$ lower bound. In contrast, Konstantinov and Lampert (2021) showed for proper learners the loss in accuracy for both notions is $\Omega(1)$. The key technical novelty of our work is how randomization can bypass simple "tricks" an adversary can use to amplify his power. We also consider additional fairness notions including Equalized Odds and Calibration. For these fairness notions, the excess loss in accuracy clusters into three natural regimes: $O(\alpha)$, $O(\sqrt{\alpha})$, and $O(1)$. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.	Translated: 2023-07-28 19:11:05 Published: 2023-07-26
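The two main fairness notions in the abstract have standard definitions. Both extend to randomized classifiers if each example's prediction is replaced by its probability of a positive prediction. Demographic Parity compares positive-prediction rates across groups, and Equal Opportunity compares true-positive rates. The helper below is a sketch with a hypothetical name and assumes two groups. It only measures the gaps and is not the paper's noise-robust learning procedure.

```python
import numpy as np


def fairness_gaps(p_pos, y, group):
    """Demographic Parity and Equal Opportunity gaps of a (possibly
    randomized) binary classifier over two groups labeled 0 and 1.

    p_pos : probability that the classifier outputs 1 on each example
            (0/1 values for a deterministic classifier)
    y     : true labels in {0, 1}
    group : group membership in {0, 1}
    """
    p_pos, y, group = map(np.asarray, (p_pos, y, group))
    # Demographic Parity: gap in expected positive-prediction rate
    dp = abs(p_pos[group == 0].mean() - p_pos[group == 1].mean())
    # Equal Opportunity: gap in expected true-positive rate (only Y = 1)
    tpr0 = p_pos[(group == 0) & (y == 1)].mean()
    tpr1 = p_pos[(group == 1) & (y == 1)].mean()
    return float(dp), float(abs(tpr0 - tpr1))


# The second example is classified positive with probability 0.5 (randomized)
dp, eo = fairness_gaps(p_pos=[1, 0.5, 0, 1, 1, 0],
                       y=[1, 1, 0, 1, 0, 0],
                       group=[0, 0, 0, 1, 1, 1])
print(round(dp, 3), round(eo, 3))  # 0.167 0.25
```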
# A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot ( http://arxiv.org/abs/2307.14397v1 ) License: see link	Milad Abdollahzadeh, Touba Malekzadeh, Christopher T. H. Teo, Keshigeyan Chandrasegaran, Guimeng Liu, Ngai-Man Cheung	In machine learning, generative modeling aims to learn to generate new data statistically similar to the training data distribution. In this paper, we survey learning generative models under limited data, few shots and zero shot, referred to as Generative Modeling under Data Constraint (GM-DC). This is an important topic when data acquisition is challenging, e.g., in healthcare applications. We discuss background and challenges, and propose two taxonomies: one on GM-DC tasks and another on GM-DC approaches. Importantly, we study interactions between different GM-DC tasks and approaches. Furthermore, we highlight research gaps, research trends, and potential avenues for future exploration. Project website: https://gmdc-survey.github.io.	Translated: 2023-07-28 17:08:57 Published: 2023-07-26
# Learning to simulate partially known spatio-temporal dynamics with trainable difference operators ( http://arxiv.org/abs/2307.14395v1 ) License: see link	Xiang Huang, Zhuoyuan Li, Hongsheng Liu, Zidong Wang, Hongye Zhou, Bin Dong, Bei Hua	Recently, using neural networks to simulate spatio-temporal dynamics has received a lot of attention. However, most existing methods adopt pure data-driven black-box models, which have limited accuracy and interpretability. By combining trainable difference operators with black-box models, we propose a new hybrid architecture, named PDE-Net++, that explicitly embeds partial prior knowledge of the underlying PDEs. Furthermore, we introduce two distinct options for the difference operators, called the trainable flipping difference layer (TFDL) and the trainable dynamic difference layer (TDDL). Numerous numerical experiments have demonstrated that PDE-Net++ has superior prediction accuracy and better extrapolation performance than black-box models.	Translated: 2023-07-28 17:08:46 Published: 2023-07-26
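The abstract does not give the form of the trainable difference operators. In PDE-Net-style models they are typically convolution stencils whose weights start at, or are constrained toward, a finite-difference approximation and are then trained. The sketch below is our assumption and not TFDL or TDDL. It applies a 3-point stencil, initialized to the central difference, to a field on a periodic grid. During training, the `weights` array would become a learnable parameter.

```python
import numpy as np


def apply_stencil(u, weights, h):
    """Apply a 3-point difference stencil to a periodic 1-D field u.

    `weights` act on (u[i-1], u[i], u[i+1]); dividing by h gives a
    first-derivative estimate. In a trainable model these weights would be
    learnable parameters initialized to a finite-difference stencil.
    """
    left, center, right = weights
    return (left * np.roll(u, 1) + center * u + right * np.roll(u, -1)) / h


n = 200
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
h = x[1] - x[0]
u = np.sin(x)

central = np.array([-0.5, 0.0, 0.5])  # central-difference initialization
du = apply_stencil(u, central, h)
# Max error equals 1 - sin(h)/h, about 1.6e-4 here (second-order accurate)
print(np.max(np.abs(du - np.cos(x))))
```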
# Hypergraph Isomorphism Computation ( http://arxiv.org/abs/2307.14394v1 ) License: see link	Yifan Feng, Jiashu Han, Shihui Ying, Yue Gao	The isomorphism problem is a fundamental problem in network analysis, which involves capturing both low-order and high-order structural information. For extracting low-order structural information, graph isomorphism algorithms analyze structural equivalence to reduce the dimension of the solver space, an approach that has proven powerful in many applications, such as protein design, chemical pathways, and community detection. For the high-order relationships that occur more commonly in real-life scenarios, the hypergraph isomorphism problem, which effectively captures these high-order structural relationships, cannot be straightforwardly addressed using graph isomorphism methods. Moreover, existing hypergraph kernel methods may suffer from high memory consumption or inaccurate sub-structure identification, yielding sub-optimal performance. In this paper, to address these problems, we first propose the hypergraph Weisfeiler-Lehman test algorithm for the hypergraph isomorphism test problem by generalizing the Weisfeiler-Lehman test algorithm from graphs to hypergraphs. Secondly, based on this algorithm, we propose a general hypergraph Weisfeiler-Lehman kernel framework and implement two instances: the Hypergraph Weisfeiler-Lehman Subtree Kernel and the Hypergraph Weisfeiler-Lehman Hyperedge Kernel. To evaluate them, we designed a comprehensive set of experiments on seven graph classification datasets and 12 hypergraph classification datasets. Results on the hypergraph classification datasets show significant improvements over other typical kernel-based methods, which demonstrates the effectiveness of the proposed methods. In our evaluation, our proposed methods also outperform the second-best method in runtime, running over 80 times faster when handling complex hypergraph structures.	Translated: 2023-07-28 17:08:32 Published: 2023-07-26
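The abstract does not spell out the update rule. We assume here one common way to lift Weisfeiler-Lehman color refinement to hypergraphs, which alternates two steps:

1. Color each hyperedge by the multiset of its vertices' colors.
2. Recolor each vertex by its old color plus the multiset of colors of its incident hyperedges.

The sketch below runs that refinement on two hypergraphs with a shared color palette. The function name is hypothetical and this is not the authors' code. Like the graph WL test, it is one-sided: differing color histograms prove non-isomorphism, while matching histograms are inconclusive.

```python
from collections import Counter


def wl_hypergraph_test(h1, h2, iterations=3):
    """Color-refinement test on two hypergraphs.

    Each hypergraph is (num_vertices, list_of_hyperedges), where a hyperedge
    is a set of vertex indices. Returns False if refinement proves the two
    are non-isomorphic, True if the test is inconclusive.
    """
    palette = {}  # shared signature -> integer color, so colors are comparable

    def relabel(signature):
        return palette.setdefault(signature, len(palette))

    colors = [[0] * n for n, _ in (h1, h2)]  # uniform initial vertex colors
    for _ in range(iterations):
        histograms = []
        for k, (n, edges) in enumerate((h1, h2)):
            vc = colors[k]
            # Hyperedge color: sorted multiset of its member-vertex colors
            ec = [relabel(("e", tuple(sorted(vc[v] for v in e)))) for e in edges]
            incident = [[] for _ in range(n)]
            for c, e in zip(ec, edges):
                for v in e:
                    incident[v].append(c)
            # Vertex color: old color plus multiset of incident-hyperedge colors
            colors[k] = [relabel(("v", vc[v], tuple(sorted(incident[v]))))
                         for v in range(n)]
            histograms.append((Counter(colors[k]), Counter(ec)))
        if histograms[0] != histograms[1]:
            return False
    return True


# Isomorphic pair (swap vertices 0<->3 and 1<->2): refinement cannot separate them
print(wl_hypergraph_test((4, [{0, 1, 2}, {2, 3}]), (4, [{0, 1}, {1, 2, 3}])))  # True
# Hyperedges overlapping in one vertex vs. two vertices: distinguished
print(wl_hypergraph_test((4, [{0, 1, 2}, {2, 3}]), (4, [{0, 1, 2}, {1, 2}])))  # False
```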
# Human-centric Scene Understanding for 3D Large-scale Scenarios ( http://arxiv.org/abs/2307.14392v1 ) License: see link	Yiteng Xu, Peishan Cong, Yichen Yao, Runnan Chen, Yuenan Hou, Xinge Zhu, Xuming He, Jingyi Yu, Yuexin Ma	Human-centric scene understanding is significant for real-world applications, but it is extremely challenging due to diverse human poses and actions, complex human-environment interactions, severe occlusions in crowds, etc. In this paper, we present a large-scale multi-modal dataset for human-centric scene understanding, dubbed HuCenLife, which is collected in diverse daily-life scenarios with rich and fine-grained annotations. Our HuCenLife can benefit many 3D perception tasks, such as segmentation, detection, and action recognition, and we also provide benchmarks for these tasks to facilitate related research. In addition, we design novel modules for LiDAR-based segmentation and action recognition, which are more applicable to large-scale human-centric scenarios and achieve state-of-the-art performance.	Translated: 2023-07-28 17:08:02 Published: 2023-07-26
# Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG ( http://arxiv.org/abs/2307.14389v1 ) License: see link	Soowon Kim, Young-Eun Lee, Seo-Hyun Lee, Seong-Whan Lee	Decoding EEG signals for imagined speech is a challenging task due to the high-dimensional nature of the data and its low signal-to-noise ratio. In recent years, denoising diffusion probabilistic models (DDPMs) have emerged as promising approaches for representation learning in various domains. Our study proposes a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional autoencoder, named Diff-E. Results indicate that Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models. Our findings suggest that DDPMs can be an effective tool for EEG signal decoding, with potential implications for the development of brain-computer interfaces that enable communication through imagined speech.	Translated: 2023-07-28 17:07:47 Published: 2023-07-26
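The abstract names DDPMs but does not detail the Diff-E architecture. The sketch below covers only the standard DDPM forward (noising) process that such models build on, not Diff-E itself. It samples $x_t \sim q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0, (1-\bar\alpha_t) I)$ in closed form. Two choices are illustrative assumptions: the linear beta schedule, and the stand-in "EEG trial" of 64 channels by 256 samples.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule (assumed)
alpha_bars = np.cumprod(1.0 - betas)  # bar{alpha}_t = prod_{s<=t} (1 - beta_s)


def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; returns (x_t, noise)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise


rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 256))  # hypothetical EEG trial: channels x samples
x_t, eps = q_sample(x0, t=500, rng=rng)
# A denoising network would be trained to predict `eps` from (x_t, t),
# e.g. by minimizing ((model(x_t, t) - eps) ** 2).mean().
```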
# Dual-Space Attacks against Random-Walk-based Anomaly Detection ( http://arxiv.org/abs/2307.14387v1 ) License: see link	Yuni Lai, Marcin Waniek, Yulin Zhu, Liying Li, Jingwen Wu, Tomasz P. Michalak, Talal Rahwan, Kai Zhou	Random Walks-based Anomaly Detection (RWAD) is commonly used to identify anomalous patterns in various applications. An intriguing characteristic of RWAD is that the input graph can either be pre-existing or constructed from raw features. Consequently, there are two potential attack surfaces against RWAD: graph-space attacks and feature-space attacks. In this paper, we explore this vulnerability by designing practical dual-space attacks, investigating the interplay between graph-space and feature-space attacks. To this end, we conduct a thorough complexity analysis, proving that attacking RWAD is NP-hard. Then, we formulate the graph-space attack as a bi-level optimization problem and propose two strategies to solve it: alternative iteration (alterI-attack) or utilizing the closed-form solution of the random walk model (cf-attack). Finally, we use the results of the graph-space attacks as guidance to design more powerful feature-space attacks (i.e., graph-guided attacks). Comprehensive experiments demonstrate that our proposed attacks are effective at enabling target nodes to evade RWAD with a limited attack budget. In addition, we conduct transfer attack experiments in a black-box setting, which show that our feature attack significantly decreases the anomaly scores of target nodes. Our study opens the door to studying dual-space attacks against graph anomaly detection in which the graph space relies on the feature space.	Translated: 2023-07-28 17:07:34 Published: 2023-07-26
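The abstract does not define RWAD's scoring rule. The sketch below is a generic random-walk detector built on our own assumptions, meant only to make the two attack surfaces concrete. Node features become a Gaussian-kernel similarity graph; this is the feature-to-graph step a feature-space attack would exploit. A random walk with teleportation then runs to its stationary distribution, and rarely visited nodes score as anomalous. A graph-space attack would instead edit `W` directly. The function name and parameters are hypothetical.

```python
import numpy as np


def rw_anomaly_scores(X, sigma=1.0, damping=0.85, iters=200):
    """Random-walk anomaly scores for the rows of X (higher = more anomalous).

    Hypothetical generic detector, not the RWAD variant studied in the paper.
    """
    # Feature space -> graph space: Gaussian-kernel similarity graph
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    n = len(X)
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):  # power iteration with uniform teleportation
        pi = (1 - damping) / n + damping * (pi @ P)
    # Nodes the walk rarely visits (weakly connected) get high scores
    return 1.0 - pi / pi.max()


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)), [[4.0, 4.0]]])  # 20 inliers + 1 outlier
scores = rw_anomaly_scores(X)
print(int(np.argmax(scores)))  # 20 (the outlier)
```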
# Leveraging Large Language Models for Mental Health Prediction via Online Text Data ( http://arxiv.org/abs/2307.14385v1 ) License: see link	Xuhai Xu, Bingshen Yao, Yuanzhe Dong, Hong Yu, James Hendler, Anind K. Dey, Dakuo Wang	The recent technology boost of large language models (LLMs) has empowered a variety of applications. However, there is very little research on understanding and improving LLMs' capability in the mental health domain. In this work, we present the first comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, and GPT-3.5, on various mental health prediction tasks via online text data. We conduct a wide range of experiments, covering zero-shot prompting, few-shot prompting, and instruction finetuning. The results indicate the promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction finetuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best-finetuned model, Mental-Alpaca, outperforms GPT-3.5 (25 times bigger) by 16.7% on balanced accuracy and performs on par with the state-of-the-art task-specific model. We summarize our findings into a set of action guidelines for future researchers, engineers, and practitioners on how to equip LLMs with better mental health domain knowledge and make them experts in mental health prediction tasks.	Translated: 2023-07-28 17:07:11 Published: 2023-07-26
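The 16.7% improvement is reported in balanced accuracy. Under the standard definition, balanced accuracy is the mean of per-class recall, so a model gains nothing by simply predicting the majority class. Here is a minimal NumPy sketch. scikit-learn's `sklearn.metrics.balanced_accuracy_score` computes the same quantity.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; robust to class imbalance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# 8 negatives, 2 positives: always predicting 0 gives 80% plain accuracy
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5
```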
# HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning ( http://arxiv.org/abs/2307.14384v1 ) License: see link	Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Huabin Zhu, Yanchao Tan, Jun Wang and Yue Qi	Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues: (1) class statistics shifting, (2) insufficient use of hierarchical information, and (3) inconsistency in aggregating clients. To address these issues, we propose HyperFed, which contains three main modules: hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data under the supervision of the shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of inconsistent deviations from clients to server. Extensive studies on four datasets show that HyperFed is effective in enhancing the performance of FL under non-IID settings.	Translated: 2023-07-28 17:06:47 Published: 2023-07-26
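HyperFed classifies against fixed class prototypes in hyperbolic space. One common model of hyperbolic space is the Poincaré ball. The sketch below computes the Poincaré geodesic distance and assigns an embedding to its nearest prototype. Several things here are assumptions, not taken from the abstract:

- which hyperbolic model HyperFed uses;
- its training loss;
- the prototypes themselves. These are placed evenly on a circle of radius 0.9 as a crude stand-in for Tammes-style uniform placement.

```python
import numpy as np


def poincare_distance(u, v):
    """Geodesic distance between points strictly inside the unit Poincaré ball."""
    sq = np.sum((u - v) ** 2, axis=-1)
    denom = (1 - np.sum(u ** 2, axis=-1)) * (1 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1 + 2 * sq / denom)


def nearest_prototype(z, prototypes):
    """Index of the class prototype closest to embedding z."""
    return int(np.argmin(poincare_distance(z[None, :], prototypes)))


# Hypothetical prototypes: 4 classes evenly spaced on a circle of radius 0.9
angles = 2 * np.pi * np.arange(4) / 4
prototypes = 0.9 * np.stack([np.cos(angles), np.sin(angles)], axis=1)

z = np.array([0.1, 0.6])  # an embedding pointing roughly "up"
print(nearest_prototype(z, prototypes))  # 1
```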
# Petz recovery from subsystems in conformal field theory ( http://arxiv.org/abs/2307.14434v1 ) License: see link	Shreya Vardhan, Annie Y. Wei, and Yijian Zou	We probe the multipartite entanglement structure of the vacuum state of a CFT in 1+1 dimensions, using recovery operations that attempt to reconstruct the density matrix in some region from its reduced density matrices on smaller subregions. We use an explicit recovery channel known as the twirled Petz map, and study distance measures such as the fidelity, relative entropy, and trace distance between the original state and the recovered state. One setup we study in detail involves three contiguous intervals $A$, $B$ and $C$ on a spatial slice, where we can view these quantities as measuring correlations between $A$ and $C$ that are not mediated by the region $B$ that lies between them. We show that each of the distance measures is both UV finite and independent of the operator content of the CFT, and hence depends only on the central charge and the cross-ratio of the intervals. We evaluate these universal quantities numerically using lattice simulations in critical spin chain models, and derive their analytic forms in the limit where $A$ and $C$ are close using the OPE expansion. In the case where $A$ and $C$ are far apart, we find a surprising non-commutativity of the replica trick with the OPE limit. For all values of the cross-ratio, the fidelity is strictly better than a general information-theoretic lower bound in terms of the conditional mutual information. We also compare the mutual information between various subsystems in the original and recovered states, which leads to a more qualitative understanding of the differences between them. Further, we introduce generalizations of the recovery operation to more than three adjacent intervals, for which the fidelity is again universal with respect to the operator content.	Translated: 2023-07-28 16:59:00 Published: 2023-07-26
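For reference, here are the standard definitions behind the recovery maps in this abstract; this is our summary, not the paper's notation. The Petz map reconstructs $\rho_{ABC}$ from $\rho_{AB}$ by acting only on $B$. The twirled Petz map averages rotated versions of it over $t$ with weight $\beta_0(t)$, which integrates to 1. Tensor products with identities on the remaining systems are implicit, and inverses are taken on the support. The last line is the general fidelity lower bound in terms of the conditional mutual information. It uses the root fidelity $F(\rho,\sigma)=\lVert\sqrt{\rho}\sqrt{\sigma}\rVert_1$ and logarithms in the same base as $I$, and we read it as the bound the abstract compares against.

$$
\mathcal{P}_{B\to BC}(X_{AB}) = \rho_{BC}^{1/2}\,\rho_B^{-1/2}\,X_{AB}\,\rho_B^{-1/2}\,\rho_{BC}^{1/2}
$$

$$
\mathcal{R}_{B\to BC}(X_{AB}) = \int_{-\infty}^{\infty} \beta_0(t)\,
\rho_{BC}^{\frac{1+it}{2}}\,\rho_B^{-\frac{1+it}{2}}\,X_{AB}\,\rho_B^{-\frac{1-it}{2}}\,\rho_{BC}^{\frac{1-it}{2}}\,dt,
\qquad \beta_0(t) = \frac{\pi}{2}\left(\cosh(\pi t) + 1\right)^{-1}
$$

$$
I(A:C\mid B)_\rho \;\ge\; -2\log F\big(\rho_{ABC},\, \mathcal{R}_{B\to BC}(\rho_{AB})\big)
$$

At $t = 0$ the integrand reduces to the Petz map $\mathcal{P}_{B\to BC}$.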
# ProtoASNet: Dynamic Prototypes for Inherently Interpretable and Uncertainty-Aware Aortic Stenosis Classification in Echocardiography ( http://arxiv.org/abs/2307.14433v1 ) License: see link	Hooman Vaseli, Ang Nan Gu, S. Neda Ahmadi Amiri, Michael Y. Tsang, Andrea Fung, Nima Kondori, Armin Saadat, Purang Abolmaesumi, Teresa S. M. Tsang	Aortic stenosis (AS) is a common heart valve disease that requires accurate and timely diagnosis for appropriate treatment. Most current automatic AS severity detection methods rely on black-box models with a low level of trustworthiness, which hinders clinical adoption. To address this issue, we propose ProtoASNet, a prototypical network that directly detects AS from B-mode echocardiography videos, while making interpretable predictions based on the similarity between the input and learned spatio-temporal prototypes. This approach provides supporting evidence that is clinically relevant, as the prototypes typically highlight markers such as calcification and restricted movement of aortic valve leaflets. Moreover, ProtoASNet uses an abstention loss to estimate aleatoric uncertainty by defining a set of prototypes that capture ambiguity and insufficient information in the observed data. This yields a reliable system that can detect and explain when it may fail. We evaluate ProtoASNet on a private dataset and the publicly available TMED-2 dataset, where it outperforms existing state-of-the-art methods with accuracies of 80.0% and 79.7%, respectively. Furthermore, ProtoASNet provides interpretability and an uncertainty measure for each prediction, which can improve transparency and facilitate the interactive use of deep networks to aid clinical decision-making. Our source code is available at: https://github.com/hooman007/ProtoASNet.	Translated: 2023-07-28 16:58:29 Published: 2023-07-26
# 時間相関ノイズを有する量子デバイスの圧縮ゲート特性評価 Compressed gate characterization for quantum devices with time-correlated noise ( http://arxiv.org/abs/2307.14432v1 ) ライセンス: Link先を確認	M. J. Gullans, M. Caranti, A. R. Mills, and J. R. Petta	(参考訳) 量子デバイスは、中間スケールとフォールトトレラントな量子コンピューティングに向けて着実に進歩するので、既知のノイズ源を説明する厳密で効率的な測定プロトコルを開発することが不可欠である。ゲートセットトモグラフィやランダム化ベンチマークのような既存の量子特徴づけプロトコルの多くは、量子ビットに作用するノイズがマルコビアンであると仮定する。しかし、1/fの電荷ノイズや超微細核スピンノイズの場合のように、この仮定はしばしば有効ではない。本稿では,時間関連ノイズの存在下での量子プロセストモグラフィ(QPT)の一般的な枠組みについて述べる。さらに,マルコフ音源と非マルコフノイズの相対強度を定量化する忠実度ベンチマークも導入する。本手法の適用例として,シリコンスピン量子ビットの比較理論的および実験的解析を行った。まず, 支配的雑音源を考慮した詳細なノイズモデルを開発し, 実験データに対する評価を行った。時間関連QPTの枠組みを適用すると、完全汎用の場合と比較して、1と2のキュービットゲートを特徴付けるのに必要な独立パラメータの数を10倍、100倍圧縮できることがわかった。これらの圧縮は実験に必要なトモグラフィ測定量を減少させると同時に、時間依存のハミルトニアンシミュレーションと比較してノイズ量子回路ダイナミクスの数値シミュレーションを著しく高速化する。この圧縮雑音モデルを用いて, シリコンスピン量子ビットに関する最近の実験において, 理論的に予測されたプロセスフィデリティと2つの量子ビット間ランダム化ベンチマークフィデリティの99.8%との一致が確認された。より広範に、我々のフォーマリズムは直接拡張することができ、非マルコフノイズを持つ大規模量子デバイスの高忠実性制御のための効率的でスケーラブルなチューニングプロトコルを開発することができる。 As quantum devices make steady progress towards intermediate scale and fault-tolerant quantum computing, it is essential to develop rigorous and efficient measurement protocols that account for known sources of noise. Most existing quantum characterization protocols such as gate set tomography and randomized benchmarking assume the noise acting on the qubits is Markovian. However, this assumption is often not valid, as for the case of 1/f charge noise or hyperfine nuclear spin noise. Here, we present a general framework for quantum process tomography (QPT) in the presence of time-correlated noise. We further introduce fidelity benchmarks that quantify the relative strength of different sources of Markovian and non-Markovian noise. As an application of our method, we perform a comparative theoretical and experimental analysis of silicon spin qubits. 
We first develop a detailed noise model that accounts for the dominant sources of noise and validate the model against experimental data. Applying our framework for time-correlated QPT, we find that the number of independent parameters needed to characterize one- and two-qubit gates can be compressed by 10x and 100x, respectively, when compared to the fully generic case. These compressions reduce the number of tomographic measurements needed in experiment, while also significantly speeding up numerical simulations of noisy quantum circuit dynamics compared to time-dependent Hamiltonian simulation. Using this compressed noise model, we find good agreement between our theoretically predicted process fidelities and two-qubit interleaved randomized benchmarking fidelities of 99.8% measured in recent experiments on silicon spin qubits. More broadly, our formalism can be directly extended to develop efficient and scalable tuning protocols for high-fidelity control of large arrays of quantum devices with non-Markovian noise.	Translated: 2023-07-28 16:58:06 Published: 2023-07-26
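The sketch below shows why time-correlated noise breaks the Markovian assumption. It is our own illustration, not the authors' noise model. It compares Ramsey coherence under quasi-static Gaussian detuning noise (the extreme, fully correlated limit of 1/f-like noise) with the exponential decay expected from white Markovian dephasing. The noise strength and time points are arbitrary assumptions.

```python
import math
import random

def quasi_static_coherence(t, sigma, shots=20000, seed=0):
    # Each shot draws one constant detuning delta ~ N(0, sigma): the noise is fully
    # correlated in time within a shot. The Ramsey signal is averaged over shots.
    rng = random.Random(seed)
    return sum(math.cos(rng.gauss(0.0, sigma) * t) for _ in range(shots)) / shots

def markovian_coherence(t, gamma):
    # White (Markovian) dephasing gives a simple exponential decay.
    return math.exp(-gamma * t)

sigma = 1.0
for t in (0.5, 1.0, 2.0):
    simulated = quasi_static_coherence(t, sigma)
    gaussian = math.exp(-(sigma * t) ** 2 / 2)  # analytic shot average for quasi-static Gaussian noise
    print(f"t={t}: simulated={simulated:.3f}  gaussian={gaussian:.3f}  "
          f"exponential={markovian_coherence(t, 1.0):.3f}")
```

The simulated curve follows the Gaussian envelope rather than the exponential. A single Markovian decay rate therefore cannot describe it.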
# Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models ( http://arxiv.org/abs/2307.14430v1 ) License: see link	Mayee F. Chen, Nicholas Roberts, Kush Bhatia, Jue Wang, Ce Zhang, Frederic Sala, Christopher Ré	The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of the associated data. First, using both synthetic and real data, we demonstrate that these ordered skill sets exist, and that their existence enables more advanced skills to be learned with less data when we train on their prerequisite skills.
Second, using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for both continual pre-training and fine-tuning regimes, where the objective is to efficiently learn multiple skills in the former and an individual skill in the latter. On the LEGO synthetic in the continual pre-training setting, Skill-It obtains 36.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, Skill-It reduces the validation loss on the target skill by 13.6% versus training on data associated with the target skill itself. We apply our skills framework on the recent RedPajama dataset to continually pre-train a 3B-parameter LM, achieving higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline approach of sampling uniformly over data sources with 3B tokens.	Translated: 2023-07-28 16:57:40 Published: 2023-07-26
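Here is a minimal sketch of an online mixture update in the spirit of Skill-It. It assumes an exponentiated-weights rule: each training skill's sampling weight grows with the current losses of the evaluation skills it connects to in a skills graph. The graph `A`, the step size `eta`, and the function name `update_mixture` are hypothetical. The paper's exact update rule may differ.

```python
import math

def update_mixture(weights, graph, losses, eta=0.5):
    """One exponentiated-weights step over training skills (hypothetical rule).

    weights: current sampling proportions over k training skills (sum to 1)
    graph:   graph[i][j] > 0 if training on skill i is assumed to help skill j
    losses:  current validation loss on each evaluation skill j
    """
    # Score each training skill by the loss it could still reduce downstream.
    scores = [sum(a_ij * l_j for a_ij, l_j in zip(row, losses)) for row in graph]
    unnorm = [w * math.exp(eta * s) for w, s in zip(weights, scores)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical 3-skill chain: skill 0 is a prerequisite of skill 1, which feeds skill 2.
A = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.0, 0.0, 1.0]]
p = [1 / 3] * 3
p = update_mixture(p, A, losses=[0.2, 1.0, 2.0])
print([round(x, 3) for x in p])
```

Skill 0 is already learned (low loss), so sampling weight shifts toward skills 1 and 2.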
# Large-scale quantum approximate optimization on non-planar graphs with machine learning noise mitigation ( http://arxiv.org/abs/2307.14427v1 ) License: see link	Stefan H. Sack and Daniel J. Egger	Quantum computers are increasing in size and quality but are still very noisy. Error mitigation extends the size of the quantum circuits that noisy devices can meaningfully execute. However, state-of-the-art error mitigation methods are hard to implement, and the limited qubit connectivity in superconducting qubit devices restricts most applications to the hardware's native topology. Here we demonstrate the quantum approximate optimization algorithm (QAOA) on non-planar random regular graphs with up to 40 nodes, enabled by machine-learning-based error mitigation. We use a swap network with careful decision-variable-to-qubit mapping and a feed-forward neural network to demonstrate optimization of a depth-two QAOA on up to 40 qubits. We observe a meaningful parameter optimization for the largest graph, which requires running quantum circuits with 958 two-qubit gates. Our work emphasizes the need to mitigate samples, and not only expectation values, in quantum approximate optimization. These results are a step towards executing quantum approximate optimization at a scale that is not classically simulable.
Reaching such system sizes is key to properly understanding the true potential of heuristic algorithms like QAOA.	Translated: 2023-07-28 16:57:09 Published: 2023-07-26
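The remark about mitigating samples, not only expectation values, can be made concrete. In QAOA for MaxCut, both the average cut and the best sampled cut are computed directly from measured bitstrings. The sketch assumes a toy graph (the complete graph K4, which is 3-regular) and made-up shot counts. It does not reproduce the paper's swap network or neural-network mitigation.

```python
from collections import Counter

def cut_value(bits, edges):
    """Number of edges whose endpoints land on different sides of the cut."""
    return sum(1 for u, v in edges if bits[u] != bits[v])

def summarize_samples(samples, edges):
    """Empirical mean cut (the usual QAOA objective) and best sampled cut."""
    values = [cut_value(s, edges) for s in samples]
    return sum(values) / len(values), max(values)

# Hypothetical 3-regular graph on 4 nodes (K4) and hypothetical measured bitstrings.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
shots = Counter({"0101": 40, "0011": 35, "0000": 25})
samples = [s for s, n in shots.items() for _ in range(n)]
mean_cut, best_cut = summarize_samples(samples, edges)
print(mean_cut, best_cut)   # 3.0 4
```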
# Code conversion with the quantum Golay code for a universal transversal gate set ( http://arxiv.org/abs/2307.14425v1 ) License: see link	Matthew Sullivan	The $[[7,1,3]]$ Steane code and the $[[23,1,7]]$ quantum Golay code have been identified as good candidates for fault-tolerant quantum computing via code concatenation. These two codes have transversal implementations of all Clifford gates but require some other scheme for fault-tolerant $T$ gates. Using magic states, Clifford operations, and measurements is one common scheme, but magic state distillation can have a large overhead. Code conversion is one avenue for implementing a universal gate set fault-tolerantly without the use of magic states. Analogously to how the $[[7,1,3]]$ Steane code can be fault-tolerantly converted to and from the $[[15,1,3]]$ Reed-Muller code, which has a transversal $T$ gate, the $[[23,1,7]]$ Golay code can be converted to a $[[95,1,7]]$ triorthogonal code with a transversal $T$ gate. A crucial ingredient of this procedure is the $[[49,1,5]]$ triorthogonal code, which can itself be seen as related to the self-dual $[[17,1,5]]$ 2D color code.	Translated: 2023-07-28 16:56:50 Published: 2023-07-26
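The triorthogonal codes above owe their transversal $T$ gate to triorthogonality. In the sense of Bravyi and Haah, this is a simple parity condition on a binary matrix: every pair and every triple of distinct rows must overlap on an even number of columns. The sketch checks the condition for a matrix built from the structure of the $[[15,1,3]]$ Reed-Muller code: four weight-8 rows of the [15,4] simplex code plus the all-ones row. The construction is our illustration, not taken from the paper.

```python
from itertools import combinations

def overlap(*rows):
    """Number of columns where every given row has a 1."""
    return sum(all(r[i] for r in rows) for i in range(len(rows[0])))

def is_triorthogonal(rows):
    """Every pair and every triple of distinct rows overlaps on an even number of columns."""
    pairs_even = all(overlap(a, b) % 2 == 0 for a, b in combinations(rows, 2))
    triples_even = all(overlap(a, b, c) % 2 == 0 for a, b, c in combinations(rows, 3))
    return pairs_even and triples_even

n = 15
# Column c (1..15) holds the binary digits of c, giving the four simplex-code rows.
simplex = [[(col >> bit) & 1 for col in range(1, n + 1)] for bit in range(4)]
G = simplex + [[1] * n]
print(is_triorthogonal(G))        # True
print([sum(r) for r in G])        # [8, 8, 8, 8, 15]
```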
# Verifiable measurement-based quantum random sampling with trapped ions ( http://arxiv.org/abs/2307.14424v1 ) License: see link	Martin Ringbauer, Marcel Hinsche, Thomas Feldker, Paul K. Faehrmann, Juani Bermejo-Vega, Claire Edmunds, Lukas Postler, Roman Stricker, Christian D. Marciniak, Michael Meth, Ivan Pogorelov, Rainer Blatt, Philipp Schindler, Jens Eisert, Thomas Monz, Dominik Hangleiter	Quantum computers are now on the brink of outperforming their classical counterparts. One way to demonstrate the advantage of quantum computation is through quantum random sampling performed on quantum computing devices. However, existing tools for verifying that a quantum device indeed performed the classically intractable sampling task are either impractical or not scalable to the quantum advantage regime. The verification problem thus remains an outstanding challenge. Here, we experimentally demonstrate efficiently verifiable quantum random sampling in the measurement-based model of quantum computation on a trapped-ion quantum processor. We create random cluster states, which are at the heart of measurement-based computing, up to a size of 4 x 4 qubits. Moreover, by exploiting the structure of these states, we are able to recycle qubits during the computation to sample from entangled cluster states that are larger than the qubit register.
We then efficiently estimate the fidelity to verify the prepared states (in single instances and on average) and compare our results to cross-entropy benchmarking. Finally, we study the effect of experimental noise on the certificates. Our results and techniques provide a feasible path toward a verified demonstration of a quantum advantage.	Translated: 2023-07-28 16:56:20 Published: 2023-07-26
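The authors compare their results with cross-entropy benchmarking, which is often summarized by the linear XEB estimate $F = 2^n \langle p_{\mathrm{ideal}}(x) \rangle - 1$, averaged over the measured bitstrings $x$. Below is a minimal sketch with a hypothetical one-qubit ideal distribution. It is not the paper's fidelity-estimation protocol.

```python
def linear_xeb(samples, ideal_probs, n_qubits):
    """Linear cross-entropy benchmarking estimate: 2^n * mean(p_ideal(x)) - 1."""
    mean_p = sum(ideal_probs[x] for x in samples) / len(samples)
    return 2 ** n_qubits * mean_p - 1

# Hypothetical ideal output distribution of a 1-qubit circuit.
ideal = {"0": 0.75, "1": 0.25}
print(linear_xeb(["0", "0", "1", "0"], ideal, 1))   # 0.25
print(linear_xeb(["0", "1"], ideal, 1))             # 0.0 (uniform-looking samples)
```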
# Optimization of Image Acquisition for Earth Observation Satellites via Quantum Computing ( http://arxiv.org/abs/2307.14419v1 ) License: see link	Antón Makarov, Márcio M. Taddei, Eneko Osaba, Giacomo Franceschetto, Esther Villar-Rodriguez, Izaskun Oregi	Satellite image acquisition scheduling is a problem that is omnipresent in the Earth observation field; its goal is to find the optimal subset of images to be taken during a given orbit pass under a set of constraints. This problem, which can be modeled via combinatorial optimization, has been addressed many times by the artificial intelligence and operations research communities. However, despite its inherent interest, it has scarcely been studied through the quantum computing paradigm. Taking this situation as motivation, we present in this paper two QUBO formulations for the problem, using different approaches to handle the non-trivial constraints. We compare the formulations experimentally over 20 problem instances using three quantum annealers currently available from D-Wave, as well as one of its hybrid solvers. Fourteen of the tested instances have been obtained from the well-known SPOT5 benchmark, while the remaining six have been generated ad hoc for this study. Our results show that the formulation and the ancilla handling technique are crucial to solving the problem successfully.
Finally, we also provide practical guidelines on the size limits of problem instances that can be realistically solved on current quantum computers.	Translated: 2023-07-28 16:56:03 Published: 2023-07-26
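Here is a toy QUBO formulation, not either of the paper's two formulations. Image selection is encoded with negative diagonal terms for image priorities and a positive penalty on each conflicting pair. The priorities, the conflict, and the penalty weight are assumptions. Constraints involving more than two images would need the ancilla techniques the abstract mentions. A three-variable instance is small enough to solve by brute force.

```python
from itertools import product

def qubo_energy(x, Q):
    """Energy sum of Q[i,j] * x_i * x_j over an upper-triangular QUBO stored as a dict."""
    return sum(coef * x[i] * x[j] for (i, j), coef in Q.items())

def brute_force(Q, n):
    """Exhaustively search all 2^n binary assignments for the minimum energy."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))

# Hypothetical instance: 3 candidate images with priorities; images 0 and 1 conflict.
priority = [3.0, 2.0, 1.5]
conflicts = [(0, 1)]
penalty = 10.0  # must exceed the priority that could be gained by violating a conflict
Q = {(i, i): -p for i, p in enumerate(priority)}
for i, j in conflicts:
    Q[(i, j)] = Q.get((i, j), 0.0) + penalty

best = brute_force(Q, 3)
print(best, qubo_energy(best, Q))   # (1, 0, 1) -4.5
```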
# Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity ( http://arxiv.org/abs/2307.14403v1 ) License: see link	Matteo Ciotola, Giovanni Poggi, Giuseppe Scarpa	In recent years, deep learning has gained a leading role in the pansharpening of multiresolution images. Given the lack of ground-truth data, most deep learning-based methods carry out supervised training in a reduced-resolution domain. However, models trained on downsized images tend to perform poorly on high-resolution target images. For this reason, several research groups are now turning to unsupervised training in the full-resolution domain, through the definition of appropriate loss functions and training paradigms. In this context, we have recently proposed a full-resolution training framework which can be applied to many existing architectures. Here, we propose a new deep learning-based pansharpening model that fully exploits the potential of this approach and provides cutting-edge performance. Besides architectural improvements with respect to previous work, such as the use of residual attention modules, the proposed model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data.
In addition, thanks to a new fine-tuning strategy, it improves inference-time adaptation to target images. Experiments on a large variety of test images, performed in challenging scenarios, demonstrate that the proposed method compares favorably with the state of the art both in terms of numerical results and visual output. Code is available online at https://github.com/matciotola/Lambda-PNN.	Translated: 2023-07-28 16:55:45 Published: 2023-07-26
# Non-Linear Self Augmentation Deep Pipeline for Cancer Treatment outcome Prediction ( http://arxiv.org/abs/2307.14398v1 ) License: see link	Francesco Rundo, Concetto Spampinato, Michael Rundo	Immunotherapy has emerged as a promising approach for treating cancer. Encouraging findings have validated the efficacy of immunotherapy medications in addressing tumors, resulting in prolonged survival rates and notable reductions in toxicity compared to conventional chemotherapy methods. However, the pool of eligible patients for immunotherapy remains relatively small, indicating a lack of comprehensive understanding regarding the physiological mechanisms responsible for favorable treatment response in certain individuals while others experience limited benefits. To tackle this issue, the authors present an innovative strategy that harnesses a non-linear cellular architecture in conjunction with a deep downstream classifier. This approach aims to carefully select and enhance 2D features extracted from chest-abdomen CT images, thereby improving the prediction of treatment outcomes. The proposed pipeline has been meticulously designed to seamlessly integrate with an advanced embedded Point of Care system. In this context, the authors present a compelling case study focused on Metastatic Urothelial Carcinoma (mUC), a particularly aggressive form of cancer. Performance evaluation of the proposed approach underscores its effectiveness, with an overall accuracy of approximately 93%.	Translated: 2023-07-28 16:55:27 Published: 2023-07-26
# MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation ( http://arxiv.org/abs/2307.14460v1 ) License: see link	Reiner Birkl, Diana Wofk, Matthias Müller	We release MiDaS v3.1 for monocular depth estimation, offering a variety of new models based on different encoder backbones. This release is motivated by the success of transformers in computer vision, with a large variety of pretrained vision transformers now available. We explore how using the most promising vision transformers as image encoders impacts depth estimation quality and runtime of the MiDaS architecture. Our investigation also includes recent convolutional approaches that achieve comparable quality to vision transformers in image classification tasks. While the previous release MiDaS v3.0 solely leverages the vanilla vision transformer ViT, MiDaS v3.1 offers additional models based on BEiT, Swin, SwinV2, Next-ViT and LeViT. These models offer different performance-runtime tradeoffs. The best model improves the depth estimation quality by 28%, while efficient models enable downstream tasks requiring high frame rates. We also describe the general process for integrating new backbones. A video summarizing the work can be found at https://youtu.be/UjaeNNFf9sE and the code is available at https://github.com/isl-org/MiDaS.	Translated: 2023-07-28 16:50:41 Published: 2023-07-26
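MiDaS predicts relative depth, so its outputs are usually compared with ground truth only after fitting a per-image scale and shift by least squares. The MiDaS papers describe this alignment in inverse-depth space. Below is a minimal sketch of that alignment with made-up values. The helper name `align_scale_shift` is hypothetical and not part of the MiDaS codebase.

```python
def align_scale_shift(pred, target):
    """Least-squares scale s and shift t minimizing sum((s*pred + t - target)^2).
    Relative (affine-invariant) depth is only defined up to such an s and t."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (q - mt) for p, q in zip(pred, target))
    var = sum((p - mp) ** 2 for p in pred)
    s = cov / var
    return s, mt - s * mp

pred = [0.1, 0.4, 0.7, 1.0]      # hypothetical network output (relative inverse depth)
target = [1.2, 2.1, 3.0, 3.9]    # hypothetical ground-truth disparity
s, t = align_scale_shift(pred, target)
aligned = [s * p + t for p in pred]
print(round(s, 3), round(t, 3))  # 3.0 0.9
```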
# Training Quantum Boltzmann Machines with Coresets ( http://arxiv.org/abs/2307.14459v1 ) License: see link	Joshua Viszlai, Teague Tomesh, Pranav Gokhale, Eric Anschuetz, Frederic T. Chong	Recent work has proposed and explored using coreset techniques for quantum algorithms that operate on classical data sets to accelerate the applicability of these algorithms on near-term quantum devices. We apply these ideas to Quantum Boltzmann Machines (QBM), where gradient-based steps that require Gibbs state sampling are the main computational bottleneck during training. By using a coreset in place of the full data set, we try to minimize the number of steps needed and accelerate the overall training time. In a regime where computational time on quantum computers is a precious resource, we propose that this might lead to substantial practical savings. We evaluate this approach on 6x6 binary images from an augmented bars-and-stripes data set using a QBM with 36 visible units and 8 hidden units. Using an Inception-score-inspired metric, we compare QBM training times with and without coresets.	Translated: 2023-07-28 16:50:04 Published: 2023-07-26
# Predictive Maintenance of Armoured Vehicles using Machine Learning Approaches ( http://arxiv.org/abs/2307.14453v1 ) License: see link	Prajit Sengupta, Anant Mehta, Prashant Singh Rana	Armoured vehicles are specialized and complex pieces of machinery designed to operate in high-stress environments, often in combat or tactical situations. This study proposes a predictive maintenance-based ensemble system that helps predict potential maintenance needs from sensor data collected from these vehicles. The proposed architecture combines several models, namely Light Gradient Boosting, Random Forest, Decision Tree, Extra Tree Classifier, and Gradient Boosting, to predict the maintenance requirements of the vehicles accurately. In addition, K-fold cross-validation, along with TOPSIS analysis, is employed to evaluate the proposed ensemble model's stability. The results indicate that the proposed system achieves an accuracy of 98.93%, a precision of 99.80%, and a recall of 99.03%. The algorithm can effectively predict maintenance needs, thereby reducing vehicle downtime and improving operational efficiency. Through comparisons between various algorithms and the suggested ensemble, this study highlights the potential of machine learning-based predictive maintenance solutions.	Translated: 2023-07-28 16:49:39 Published: 2023-07-26
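The study pairs k-fold cross-validation with TOPSIS. TOPSIS ranks alternatives by their relative closeness to an ideal point and an anti-ideal point, after vector normalization and weighting. The sketch below uses hypothetical scores (mean accuracy and cross-validation standard deviation) and equal weights. It does not reproduce the paper's data.

```python
import math

def topsis(matrix, weights, benefit):
    """Closeness score in [0, 1] for each alternative (row) over criteria (columns).
    benefit[j] is True if larger values are better for criterion j."""
    cols = list(zip(*matrix))
    norms = [math.sqrt(sum(x * x for x in c)) for c in cols]
    v = [[w * x / n for x, w, n in zip(row, weights, norms)] for row in matrix]
    vcols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(vcols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(vcols, benefit)]
    scores = []
    for row in v:
        d_ideal, d_anti = math.dist(row, ideal), math.dist(row, anti)
        scores.append(d_anti / (d_ideal + d_anti))
    return scores

# Hypothetical models: [mean accuracy, cross-validation std]; lower std is better.
models = ["ensemble", "random_forest", "decision_tree"]
matrix = [[0.95, 0.01], [0.93, 0.02], [0.90, 0.03]]
scores = topsis(matrix, weights=[0.5, 0.5], benefit=[True, False])
for name, s in sorted(zip(models, scores), key=lambda p: -p[1]):
    print(f"{name}: {s:.3f}")
```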
# Simulation of quantum algorithms using classical probabilistic bits and circuits ( http://arxiv.org/abs/2307.14452v1 ) License: see link	D. D. Yavuz and A. Yadav	We discuss a new approach to simulate quantum algorithms using classical probabilistic bits and circuits. Each qubit (a two-level quantum system) is initially mapped to a vector in an eight-dimensional probability space (equivalently, to a classical random variable with eight probabilistic outcomes). The key idea in this mapping is to store both the amplitude and phase information of the complex coefficients that describe the qubit state in the probabilities. Due to the identical tensor product structure of combining multiple quantum systems as well as multiple probability spaces, $n$ qubits are then mapped to a tensor product of $n$ 8-dimensional probabilistic vectors (i.e., the Hilbert space of dimension $2^n$ is mapped to a probability space of dimension $8^n$). After this initial mapping, we show how to implement the analogs of single-qubit and two-qubit gates in the probability space using correlation-inducing operations on these classical random variables.
The key defining feature of both the mapping to the probability space and the transformations in this space (i.e., operations on the random variables) is that they are not linear, but instead affine. Using this architecture, the evolution of the $2^n$ complex coefficients of the quantum system can be tracked in the joint fully-correlated probabilities of a polynomial number of random variables. We then give specific procedures for implementing (1) the Deutsch-Jozsa algorithm, and (2) the Quantum Fourier Transform in the probability space. As in the quantum case, simulating the Quantum Fourier Transform in the probability space requires $O(n)$ probabilistic bits and $O(n^2)$ (i.e., quadratic in the number of quantum bits) operations.	Translated: 2023-07-28 16:49:07 Published: 2023-07-26
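The $O(n^2)$ operation count mirrors the textbook QFT circuit: $n$ Hadamards, $n(n-1)/2$ controlled phase rotations, and $\lfloor n/2 \rfloor$ final swaps. The sketch only counts these quantum gates. It does not implement the paper's probabilistic-bit construction.

```python
def qft_gate_count(n):
    """Gates in the textbook n-qubit QFT circuit: n Hadamards,
    n(n-1)/2 controlled phase rotations, and floor(n/2) final swaps."""
    hadamards = n
    controlled_phases = n * (n - 1) // 2
    swaps = n // 2
    return hadamards + controlled_phases + swaps

for n in (4, 8, 16, 32):
    print(n, qft_gate_count(n))   # grows quadratically in n
```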
# VISPUR: Visual Aids for Identifying and Interpreting Spurious Associations in Data-Driven Decisions ( http://arxiv.org/abs/2307.14448v1 ) License: see link	Xian Teng, Yongsu Ahn, Yu-Ru Lin	Big data and machine learning tools have jointly empowered humans in making data-driven decisions. However, many of them capture empirical associations that might be spurious due to confounding factors and subgroup heterogeneity. The famous Simpson's paradox is such a phenomenon, where aggregated and subgroup-level associations contradict each other, causing cognitive confusion and difficulty in making adequate interpretations and decisions. Existing tools provide little insight for humans to locate, reason about, and prevent pitfalls of spurious association in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. These include a CONFOUNDER DASHBOARD, which can automatically identify possible confounding factors, and a SUBGROUP VIEWER, which allows for the visualization and comparison of diverse subgroup patterns that likely or potentially result in a misinterpretation of causality.
Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, as well as an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed "de-paradox" workflow and the designed visual analytic system are effective in helping human users to identify and understand spurious associations, as well as to make accountable causal decisions.	Translated: 2023-07-28 16:48:19 Published: 2023-07-26
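Simpson's paradox, the motivating example for VISPUR, is easy to reproduce numerically. The counts below come from the widely cited kidney-stone treatment example. Treatment A has the higher success rate within each stone-size subgroup, yet B has the higher rate overall. The reason is that stone size confounds which treatment was given: A was used mostly on large, harder cases.

```python
# (successes, total) per treatment and stone-size subgroup (kidney-stone example).
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    return successes / total

for group in ("small", "large"):
    ra = rate(*data["A"][group])
    rb = rate(*data["B"][group])
    print(f"{group} stones: A={ra:.3f}  B={rb:.3f}")   # A wins in both subgroups

# Aggregate by summing successes and totals over subgroups.
totals = {t: tuple(map(sum, zip(*groups.values()))) for t, groups in data.items()}
print("overall:", {t: round(rate(*totals[t]), 3) for t in totals})   # B wins overall
```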
# Self-supervised Few-shot Learning for Semantic Segmentation: An Annotation-free Approach ( http://arxiv.org/abs/2307.14446v1 ) License: see link	Sanaz Karimijafarbigloo and Reza Azad and Dorit Merhof	Few-shot semantic segmentation (FSS) offers immense potential in the field of medical image analysis, enabling accurate object segmentation with limited training data. However, existing FSS techniques heavily rely on annotated semantic classes, rendering them unsuitable for medical images due to the scarcity of annotations. To address this challenge, multiple contributions are proposed. First, inspired by spectral decomposition methods, the problem of image decomposition is reframed as a graph partitioning task. The eigenvectors of the Laplacian matrix, derived from the feature affinity matrix of self-supervised networks, are analyzed to estimate the distribution of the objects of interest from the support images. Second, we propose a novel self-supervised FSS framework that does not rely on any annotation. Instead, it adaptively estimates the query mask by leveraging the eigenvectors obtained from the support images.
This approach eliminates the need for manual annotation, making it particularly suitable for medical images with limited annotated data. Third, to further enhance the decoding of the query image based on the information provided by the support image, we introduce a multi-scale large kernel attention module. By selectively emphasizing relevant features and details, this module improves the segmentation process and contributes to better object delineation. Evaluations on both natural and medical image datasets demonstrate the efficiency and effectiveness of our method. Moreover, the proposed approach is characterized by its generality and model-agnostic nature, allowing for seamless integration with various deep architectures. The code is publicly available on GitHub: https://github.com/mindflow-institue/annotation_free_fewshot.	Translated: 2023-07-28 16:47:51 Published: 2023-07-26
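The graph-partitioning step above rests on a standard fact. The sign pattern of the Laplacian's second eigenvector (the Fiedler vector) splits a graph into two weakly connected parts. The stdlib-only sketch below finds that vector by power iteration on a hypothetical six-node affinity graph. In the paper, the affinities come from self-supervised features, which are not modeled here.

```python
def laplacian(n, weighted_edges):
    """Graph Laplacian L = D - W built from (u, v, weight) edges."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L

def fiedler_vector(L, iters=200):
    """Eigenvector of the second-smallest Laplacian eigenvalue, via power iteration
    on (c*I - L) while projecting out the constant (eigenvalue-0) eigenvector."""
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))   # upper bound on the largest eigenvalue
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]          # remove the constant component
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x

# Hypothetical affinity graph: two tight triangles joined by one weak edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)]
vec = fiedler_vector(laplacian(6, edges))
print([int(v > 0) for v in vec])   # [0, 0, 0, 1, 1, 1]
```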
# Dense outputs from quantum simulations ( http://arxiv.org/abs/2307.14441v1 ) License: see link	Jin-Peng Liu, Lin Lin	The quantum dense output problem is the process of evaluating time-accumulated observables from time-dependent quantum dynamics using quantum computers. This problem arises frequently in applications such as quantum control and spectroscopic computation. We present a range of algorithms designed to operate on both early and fully fault-tolerant quantum platforms. These methodologies draw upon techniques like amplitude estimation, Hamiltonian simulation, quantum linear Ordinary Differential Equation (ODE) solvers, and quantum Carleman linearization. We provide a comprehensive complexity analysis with respect to the evolution time $T$ and error tolerance $\epsilon$. Our results demonstrate that the linearization approach can nearly achieve optimal complexity $\mathcal{O}(T/\epsilon)$ for a certain type of low-rank dense outputs. Moreover, we provide a linearization of the dense output problem that yields an exact and finite-dimensional closure which encompasses the original states. This formulation is related to the Koopman Invariant Subspace theory and may be of independent interest in nonlinear control and scientific machine learning.	Translated: 2023-07-28 16:47:22 Published: 2023-07-26
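For a classical reference point on what a dense output is, consider a single qubit with $H = (\omega/2)\sigma_x$ starting in $|0\rangle$. Then $\langle\sigma_z(t)\rangle = \cos(\omega t)$, and the time-averaged observable is $\frac{1}{T}\int_0^T \cos(\omega t)\,dt = \sin(\omega T)/(\omega T)$. The sketch evaluates this average with the trapezoidal rule. The paper's quantum algorithms estimate such quantities without classically simulating the state.

```python
import math

def time_averaged_sz(omega, T, steps=1000):
    """(1/T) * integral_0^T <sigma_z(t)> dt for H = (omega/2) sigma_x and |psi(0)> = |0>,
    where <sigma_z(t)> = cos(omega t); trapezoidal rule on a uniform grid."""
    h = T / steps
    vals = [math.cos(omega * k * h) for k in range(steps + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return integral / T

omega, T = 2.0, 3.0
exact = math.sin(omega * T) / (omega * T)
print(time_averaged_sz(omega, T), exact)
```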
# Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking ( http://arxiv.org/abs/2307.14440v1 ) License: see link	Angela Ramirez and Karik Agarwal and Juraj Juraska and Utkarsh Garg and Marilyn A. Walker	Dialogue systems need to produce responses that realize multiple types of dialogue acts (DAs) with high semantic fidelity. In the past, natural language generators (NLGs) for dialogue were trained on large parallel corpora that map from a domain-specific DA and its semantic attributes to an output utterance. Recent work shows that pretrained language models (LLMs) offer new possibilities for controllable NLG using prompt-based learning. Here we develop a novel few-shot overgenerate-and-rank approach that achieves the controlled generation of DAs. We compare eight few-shot prompt styles that include a novel method of generating from textual pseudo-references using a textual style transfer approach. We develop six automatic ranking functions that identify outputs with both the correct DA and high semantic accuracy at generation time. We test our approach on three domains and four LLMs. To our knowledge, this is the first work on NLG for dialogue that automatically ranks outputs using both DA and attribute accuracy.
For completeness, we compare our results to fine-tuned few-shot models trained with 5 to 100 instances per DA. Our results show that several prompt settings achieve perfect DA accuracy, and near perfect semantic accuracy (99.81%) and perform better than few-shot fine-tuning.	翻訳日:2023-07-28 16:47:06 公開日:2023-07-26
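The overgenerate-and-rank idea can be sketched in a few lines:

1. Generate several candidate utterances.
2. Sort them by whether a classifier assigns the target DA.
3. Break ties by the fraction of attribute values that appear in the text.

This is a hypothetical illustration, not one of the paper's six ranking functions. `rank_candidates` and `toy_classifier` are invented names, and the candidates are hard-coded in place of LLM outputs.

```python
def rank_candidates(candidates, target_da, slots, classify_da):
    """Return candidates best-first, scored by (DA matches target, fraction of slot values realized)."""
    def score(text):
        da_ok = classify_da(text) == target_da
        lowered = text.lower()
        hits = sum(str(value).lower() in lowered for value in slots.values())
        slot_acc = hits / len(slots) if slots else 1.0
        return (da_ok, slot_acc)          # tuples compare DA correctness first
    return sorted(candidates, key=score, reverse=True)

def toy_classifier(text):
    # Hypothetical stand-in for a trained DA classifier: questions are "request", the rest "inform".
    return "request" if text.rstrip().endswith("?") else "inform"

candidates = [
    "Do you like Halo?",
    "Halo is a shooter released in 2001.",
    "Halo is a shooter.",
]
ranked = rank_candidates(candidates, "inform",
                         {"name": "Halo", "genre": "shooter", "year": 2001}, toy_classifier)
print(ranked[0])
```

Scoring with a tuple makes DA correctness a hard filter and attribute coverage a tie-breaker. This is one plausible way to combine the two accuracies the abstract mentions.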
# Fixed Integral Neural Networks ( http://arxiv.org/abs/2307.14439v1 ) License: see link	Ryan Kortvelesy	It is often useful to perform integration over learned functions represented by neural networks. However, this integration is usually performed numerically, as analytical integration over learned functions (especially neural networks) is generally viewed as intractable. In this work, we present a method for representing the analytical integral of a learned function $f$. This allows the exact integral of a neural network to be computed, and enables constrained neural networks to be parametrised by applying constraints directly to the integral. Crucially, we also introduce a method to constrain $f$ to be positive, a necessary condition for many applications (e.g., probability distributions and distance metrics). Finally, we introduce several applications where our fixed-integral neural network (FINN) can be utilised.	Translated: 2023-07-28 16:46:43 Published: 2023-07-26
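One way to make the integral of a learned function exact is to parametrise an antiderivative $F$ and define $f = dF/dx$. By the fundamental theorem of calculus, $\int_a^b f(x)\,dx = F(b) - F(a)$. This is offered as an assumption, not as the construction used in the paper.

With $F(x) = \sum_i v_i \tanh(w_i x + c_i)$, the derivative is $f(x) = \sum_i v_i w_i \bigl(1 - \tanh^2(w_i x + c_i)\bigr)$. Keeping every $v_i$ and $w_i$ nonnegative (here by squaring raw parameters) therefore makes $f \ge 0$. The class name and the two-unit network are hypothetical. A midpoint-rule sum is included only to check the closed form.

```python
import math

class TinyAntiderivative:
    """F(x) = sum_i v_i * tanh(w_i * x + c_i); the modelled function is f = dF/dx."""

    def __init__(self, a, b, c):
        # Squaring raw parameters keeps v_i, w_i >= 0, hence f(x) >= 0 everywhere.
        self.v = [ai * ai for ai in a]
        self.w = [bi * bi for bi in b]
        self.c = list(c)

    def F(self, x):
        return sum(v * math.tanh(w * x + c) for v, w, c in zip(self.v, self.w, self.c))

    def f(self, x):
        return sum(v * w * (1 - math.tanh(w * x + c) ** 2)
                   for v, w, c in zip(self.v, self.w, self.c))

    def integral(self, lo, hi):
        return self.F(hi) - self.F(lo)    # exact: no quadrature involved

model = TinyAntiderivative(a=[0.8, -1.2], b=[1.5, 0.7], c=[0.1, -0.3])
n = 10000
h = 4.0 / n
numeric = h * sum(model.f(-2.0 + (k + 0.5) * h) for k in range(n))   # midpoint-rule check on [-2, 2]
print(model.integral(-2.0, 2.0), numeric)
```

Because the integral comes from $F$, constraints can be imposed on it directly. For example, $F$ could be rescaled so that $F(b) - F(a) = 1$ for a density on $[a, b]$.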
# Phenotype-preserving metric design for high-content image reconstruction by generative inpainting ( http://arxiv.org/abs/2307.14436v1 ) License: see link	Vaibhav Sharma, Artur Yakimovich	In the past decades, automated high-content microscopy has demonstrated its ability to deliver large quantities of image-based data, powering versatile phenotypic drug screening and systems biology applications. However, as the sizes of image-based datasets grew, it became infeasible for humans to control, avoid and overcome the presence of imaging and sample preparation artefacts in the images. While novel techniques like machine learning and deep learning may address these shortcomings through generative image inpainting, when applied to sensitive research data this may come at the cost of undesired image manipulation. Undesired manipulation may be caused by phenomena such as neural hallucinations, to which some artificial neural networks are prone. To address this, here we evaluate the state-of-the-art inpainting methods for image restoration in a high-content fluorescence microscopy dataset of cultured cells with labelled nuclei. We show that architectures like DeepFill V2 and Edge Connect can faithfully restore microscopy images upon fine-tuning with relatively little data. 
Our results demonstrate that the area of the region to be restored matters more than its shape. Furthermore, to control for the quality of restoration, we propose a novel phenotype-preserving metric design strategy. In this strategy, the size and count of the restored biological phenotypes like cell nuclei are quantified to penalise undesirable manipulation. We argue that the design principles of our approach may also generalise to other applications.	Translated: 2023-07-28 16:46:32 Published: 2023-07-26
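The sketch below shows the idea of penalising changes in the number and size of nuclei after restoration. It assumes binary nucleus masks are available for the reference and the restored image. The specific penalty (relative count error plus relative mean-area error) and the function names are illustrative choices, not the paper's formula. Nuclei are taken to be 4-connected foreground regions.

```python
from collections import deque

def components(mask):
    """Areas of 4-connected foreground regions in a 2-D list of 0/1 values, in scan order."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, area = deque([(r, c)]), 0
                while queue:                      # breadth-first flood fill
                    y, x = queue.popleft()
                    area += 1
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                areas.append(area)
    return areas

def phenotype_penalty(reference, restored):
    """Relative error in nucleus count plus relative error in mean nucleus area."""
    ref, rec = components(reference), components(restored)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    count_err = abs(len(rec) - len(ref)) / max(len(ref), 1)
    area_err = abs(mean(rec) - mean(ref)) / max(mean(ref), 1e-9)
    return count_err + area_err

ref = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 1]]
restored_extra = [[1, 1, 0, 1],   # a hallucinated one-pixel "nucleus" at the top right
                  [1, 1, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 1]]
print(phenotype_penalty(ref, ref), phenotype_penalty(ref, restored_extra))
```

A pixel-level score such as PSNR would barely register the extra pixel. A count-based term flags it as a new nucleus, which is the kind of manipulation the metric is meant to penalise.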
# Improving Reliable Navigation under Uncertainty via Predictions Informed by Non-Local Information ( http://arxiv.org/abs/2307.14501v1 ) License: see link	Raihan Islam Arnob and Gregory J. Stein	We improve reliable, long-horizon, goal-directed navigation in partially-mapped environments by using non-locally available information to predict the goodness of temporally-extended actions that enter unseen space. Making predictions about where to navigate in general requires non-local information: any observations the robot has seen so far may provide information about the goodness of a particular direction of travel. Building on recent work in learning-augmented model-based planning under uncertainty, we present an approach that can both rely on non-local information to make predictions (via a graph neural network) and is reliable by design: it will always reach its goal, even when learning does not provide accurate predictions. We conduct experiments in three simulated environments in which non-local information is needed to perform well. In our large-scale university building environment, generated to scale from real-world floorplans, we demonstrate a 9.3% reduction in cost-to-go compared to a non-learned baseline and a 14.9% reduction compared to a learning-informed planner that can only use local information to inform its predictions.	Translated: 2023-07-28 16:39:13 Published: 2023-07-26
# A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing ( http://arxiv.org/abs/2307.14500v1 ) License: see link	Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan yang and Jennifer Romano	This study introduces and empirically tests a novel predictive model for digital information engagement (IE), the READ model, named for the four pivotal attributes of engaging information: Representativeness, Ease-of-use, Affect, and Distribution. Conceptualized within the theoretical framework of Cumulative Prospect Theory, the model integrates key cognitive biases with computational linguistics and natural language processing to develop a multidimensional perspective on information engagement. A rigorous testing protocol was implemented, involving 50 randomly selected pairs of synonymous words (100 words in total) from the WordNet database. These words' engagement levels were evaluated through a large-scale online survey (n = 80,500) to derive empirical IE metrics. The READ attributes for each word were then computed and their predictive efficacy examined. The findings affirm the READ model's robustness: it accurately predicts a word's IE level and distinguishes the more engaging word in a pair of synonyms with an 84% accuracy rate. 
The READ model's potential extends across various domains, including business, education, government, and healthcare, where it could enhance content engagement and inform AI language model development and generative text work. Future research should address the model's scalability and adaptability across different domains and languages, thereby broadening its applicability and efficacy.	Translated: 2023-07-28 16:38:53 Published: 2023-07-26
# HUGE: Huge Unsupervised Graph Embeddings with TPUs ( http://arxiv.org/abs/2307.14490v1 ) License: see link	Brandon Mayer, Anton Tsitsulin, Hendrik Fichtenberger, Jonathan Halcrow, Bryan Perozzi	Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is Graph Embedding, the process of creating a continuous representation of nodes in a graph. A continuous representation is often more amenable, especially at scale, for solving downstream machine learning tasks such as classification, link prediction, and clustering. We present a high-performance graph embedding architecture that leverages Tensor Processing Units (TPUs) with configurable amounts of high-bandwidth memory, simplifies the graph embedding problem, and can scale to graphs with billions of nodes and trillions of edges. We verify the embedding space quality on real and synthetic large-scale datasets.	Translated: 2023-07-28 16:38:26 Published: 2023-07-26
# SuperInpaint: Learning Detail-Enhanced Attentional Implicit Representation for Super-resolutional Image Inpainting ( http://arxiv.org/abs/2307.14489v1 ) License: see link	Canyu Zhang, Qing Guo, Xiaoguang Li, Renjie Wan, Hongkai Yu, Ivor Tsang, Song Wang	In this work, we introduce a challenging image restoration task, referred to as SuperInpaint, which aims to reconstruct missing regions in low-resolution images and generate completed images with arbitrarily higher resolutions. We have found that this task cannot be effectively addressed by stacking state-of-the-art super-resolution and image inpainting methods, as they amplify each other's flaws, leading to noticeable artifacts. To overcome these limitations, we propose the detail-enhanced attentional implicit representation (DEAR) that can achieve SuperInpaint with a single model, resulting in high-quality completed images with arbitrary resolutions. Specifically, we use a deep convolutional network to extract the latent embedding of an input image and then enhance the high-frequency components of the latent embedding via an adaptive high-pass filter. This yields a detail-enhanced semantic embedding. 
We further feed the semantic embedding into an unmask-attentional module that suppresses embeddings from ineffective masked pixels. Additionally, we extract a pixel-wise importance map that indicates which pixels should be used for image reconstruction. Given the coordinates of a pixel we want to reconstruct, we first collect its neighboring pixels in the input image and extract their detail-enhanced semantic embeddings, unmask-attentional semantic embeddings, importance values, and spatial distances to the desired pixel. Then, we feed all the above terms into an implicit representation and generate the color of the specified pixel. To evaluate our method, we extend three existing datasets for this new task and build 18 meaningful baselines using SOTA inpainting and super-resolution methods. Extensive experimental results demonstrate that our method outperforms all existing methods by a significant margin on four widely used metrics.	Translated: 2023-07-28 16:38:12 Published: 2023-07-26
# Technical note: ShinyAnimalCV: open-source cloud-based web application for object detection, segmentation, and three-dimensional visualization of animals using computer vision ( http://arxiv.org/abs/2307.14487v1 ) License: see link	Jin Wang, Yu Hu, Lirong Xiang, Gota Morota, Samantha A. Brooks, Carissa L. Wickens, Emily K. Miller-Cushon, and Haipeng Yu	Computer vision (CV), a non-intrusive and cost-effective technology, has furthered the development of precision livestock farming by enabling optimized decision-making through timely and individualized animal care. The availability of affordable two- and three-dimensional camera sensors, combined with various machine learning and deep learning algorithms, has provided a valuable opportunity to improve livestock production systems. However, despite the availability of various CV tools in the public domain, applying these tools to animal data can be challenging, often requiring users to have programming and data analysis skills, as well as access to computing resources. 
Moreover, the rapid expansion of precision livestock farming is creating a growing need to educate and train animal science students in CV. This presents educators with the challenge of efficiently demonstrating the complex algorithms involved in CV. Thus, the objective of this study was to develop ShinyAnimalCV, an open-source cloud-based web application. This application provides a user-friendly interface for performing CV tasks, including object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features. Nine pre-trained CV models using top-view animal data are included in the application. ShinyAnimalCV has been deployed online using cloud computing platforms. The source code of ShinyAnimalCV is available on GitHub, along with detailed documentation on training CV models using custom data and deploying ShinyAnimalCV locally to allow users to fully leverage the capabilities of the application. ShinyAnimalCV can contribute to CV research and teaching in the animal science community.	Translated: 2023-07-28 16:37:39 Published: 2023-07-26
# Role of Image Acquisition and Patient Phenotype Variations in Automatic Segmentation Model Generalization ( http://arxiv.org/abs/2307.14482v1 ) License: see link	Timothy L. Kline, Sumana Ramanathan, Harrison C. Gottlich, Panagiotis Korfiatis, Adriana V. Gregory	Purpose: This study evaluated the out-of-domain performance and generalization capabilities of automated medical image segmentation models, with a particular focus on adaptation to new image acquisitions and disease type. Materials: Datasets from both non-contrast and contrast-enhanced abdominal CT scans of healthy patients and those with polycystic kidney disease (PKD) were used. A total of 400 images (100 non-contrast controls, 100 contrast controls, 100 non-contrast PKD, 100 contrast PKD) were utilized for training/validation of models to segment kidneys, livers, and spleens, and the final models were then tested on 100 non-contrast CT images of patients affected by PKD. Performance was evaluated using Dice, Jaccard, true positive rate (TPR), and Precision. Results: Models trained on a diverse range of data showed no worse performance than models trained exclusively on in-domain data when tested on in-domain data. For instance, the Dice similarity of the model trained on 25% from each dataset was found to be non-inferior to the model trained purely on in-domain data. 
Conclusions: The results indicate that broader training examples significantly enhance model generalization and out-of-domain performance, thereby improving the applicability of automated segmentation tools in clinical settings. The study's findings provide a roadmap for future research to adopt a data-centric approach in medical image AI model development.	Translated: 2023-07-28 16:37:11 Published: 2023-07-26
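The four evaluation metrics named above have standard definitions in terms of true positives (TP), false positives (FP) and false negatives (FN) between a predicted and a reference mask:

- Dice $= 2TP/(2TP+FP+FN)$
- Jaccard $= TP/(TP+FP+FN)$
- TPR $= TP/(TP+FN)$
- Precision $= TP/(TP+FP)$

A minimal sketch on flattened 0/1 masks follows. The function name and the convention for empty denominators are assumptions, not taken from the study.

```python
def segmentation_metrics(truth, pred):
    """Overlap metrics for two flattened binary masks of equal length."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if p and not t)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)

    def ratio(num, den):
        # Assumed convention: an empty denominator (e.g. both masks empty) counts as perfect agreement.
        return num / den if den else 1.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "tpr": ratio(tp, tp + fn),            # also called recall or sensitivity
        "precision": ratio(tp, tp + fp),
    }

truth = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
m = segmentation_metrics(truth, pred)
print(m)
```

Dice and Jaccard are monotonically related ($J = D/(2-D)$), so they rank models identically. TPR and Precision separate under-segmentation from over-segmentation.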
# Limits to Reservoir Learning ( http://arxiv.org/abs/2307.14474v1 ) License: see link	Anthony M. Polloreno	In this work, we bound a machine's ability to learn based on computational limitations implied by physicality. We start by considering the information processing capacity (IPC), a normalized measure of the expected squared error of a collection of signals to a complete basis of functions. We use the IPC to measure the degradation under noise of the performance of reservoir computers, a particular kind of recurrent network, when constrained by physical considerations. First, we show that the IPC is at most a polynomial in the system size $n$, even when considering the collection of $2^n$ possible pointwise products of the $n$ output signals. Next, we argue that this degradation implies that the family of functions represented by the reservoir requires an exponential number of samples to learn in the presence of the reservoir's noise. Finally, we conclude with a discussion of the performance of the same collection of $2^n$ functions without noise when used for binary classification.	Translated: 2023-07-28 16:36:43 Published: 2023-07-26
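The capacity for a single target function $y$ is commonly written as $C[X, y] = 1 - \min_w \langle (y - Xw)^2\rangle / \langle y^2\rangle$. Here $X$ collects the readout signals, and $w$ is a linear readout fitted by least squares. The IPC then sums such capacities over a complete basis of target functions, matching the "normalized measure of expected squared error" described above. The sketch below computes one capacity term. The random states are a stand-in for reservoir outputs, and the names are hypothetical.

```python
import numpy as np

def capacity(states, target):
    """C = 1 - min_w mean((y - X w)^2) / mean(y^2), with w fitted by least squares.

    states: (T, n) array of readout signals; target: (T,) array.
    """
    w, *_ = np.linalg.lstsq(states, target, rcond=None)
    mse = np.mean((target - states @ w) ** 2)
    return 1.0 - mse / np.mean(target ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 4))                 # stand-in for n = 4 reservoir outputs
y_linear = X @ np.array([0.5, -1.0, 2.0, 0.3])     # exactly reachable by a linear readout
y_noise = rng.standard_normal(5000)                # independent of X
print(capacity(X, y_linear), capacity(X, y_noise))
```

A reachable target gives a capacity near 1 and an unrelated one gives a value near 0. Adding noise to `X` lowers every term, which is the degradation the paper bounds. The pointwise products of output signals mentioned in the abstract would enter as extra columns of `X`.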
# What Kinds of Contracts Do ML APIs Need? ( http://arxiv.org/abs/2307.14465v1 ) License: see link	Samantha Syeda Khairunnesa, Shibbir Ahmed, Sayem Mohammad Imtiaz, Hridesh Rajan, Gary T. Leavens	Recent work has shown that Machine Learning (ML) programs are error-prone and called for contracts for ML code. Contracts, as in the design by contract methodology, help document APIs and aid API users in writing correct code. The question is: what kinds of contracts would provide the most help to API users? We are especially interested in what kinds of contracts help API users catch errors at earlier stages in the ML pipeline. We describe an empirical study of posts on Stack Overflow about the four most often-discussed ML libraries: TensorFlow, Scikit-learn, Keras, and PyTorch. For these libraries, our study extracted 413 informal (English) API specifications. We used these specifications to understand the following questions. What are the root causes and effects behind ML contract violations? Are there common patterns of ML contract violations? When does understanding ML contracts require an advanced level of ML software expertise? 
Could checking contracts at the API level help detect the violations in early ML pipeline stages? Our key findings are that the most commonly needed contracts for ML APIs either check constraints on single arguments of an API or check the order of API calls. The software engineering community could employ existing contract mining approaches to mine these contracts to promote an increased understanding of ML APIs. We also noted a need to combine behavioral and temporal contract mining approaches. We report on categories of required ML contracts, which may help designers of contract languages.	Translated: 2023-07-28 16:36:29 Published: 2023-07-26
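The two most common contract kinds named above can both be expressed as runtime checks at the API boundary:

- a constraint on a single argument, checked as a precondition;
- a constraint on the order of API calls, checked as a temporal contract.

The class below is a hypothetical toy model, not TensorFlow, Keras, PyTorch or scikit-learn code. It only shows where each kind of check sits.

```python
class ContractViolation(Exception):
    """Raised when an API precondition or call-order contract is broken."""

class ToyModel:
    """Hypothetical estimator that predicts the mean of the training targets."""

    def __init__(self):
        self._fitted = False

    def fit(self, X, y):
        # Single-argument contract: X must be a non-empty 2-D sequence of rows.
        if not X or not all(isinstance(row, (list, tuple)) for row in X):
            raise ContractViolation("fit: X must be a non-empty 2-D sequence")
        self._mean = sum(y) / len(y)
        self._fitted = True
        return self

    def predict(self, X):
        # Temporal (call-order) contract: fit() must be called before predict().
        if not self._fitted:
            raise ContractViolation("predict called before fit")
        return [self._mean for _ in X]

def violates(fn, *args):
    """True if calling fn(*args) breaks one of the contracts above."""
    try:
        fn(*args)
    except ContractViolation:
        return True
    return False

print(violates(ToyModel().predict, [[1.0]]))          # call-order violation
print(violates(ToyModel().fit, [1.0, 2.0], [0, 1]))   # 1-D X violates the shape contract
```

Failing fast with a named contract surfaces the error at the call site. Without the check, a shape or ordering mistake would surface much later as an obscure numeric failure, which is the early-detection benefit the study asks about.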
# Single Channel Speech Enhancement Using U-Net Spiking Neural Networks ( http://arxiv.org/abs/2307.14464v1 ) License: see link	Abir Riahi and Éric Plourde	Speech enhancement (SE) is crucial for reliable communication devices or robust speech recognition systems. Although conventional artificial neural networks (ANN) have demonstrated remarkable performance in SE, they require significant computational power, along with high energy costs. In this paper, we propose a novel approach to SE using a spiking neural network (SNN) based on a U-Net architecture. SNNs are suitable for processing data with a temporal dimension, such as speech, and are known for their energy-efficient implementation on neuromorphic hardware. SNNs are therefore interesting candidates for real-time applications on devices with limited resources. The primary objective of the current work is to develop an SNN-based model with performance comparable to a state-of-the-art ANN model for SE. We train a deep SNN using surrogate-gradient-based optimization and evaluate its performance using perceptual objective tests under different signal-to-noise ratios and real-world noise conditions. 
Our results demonstrate that the proposed energy-efficient SNN model outperforms the Intel Neuromorphic Deep Noise Suppression Challenge (Intel N-DNS Challenge) baseline solution and achieves acceptable performance compared to an equivalent ANN model.	Translated: 2023-07-28 16:36:07 Published: 2023-07-26
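SNNs are commonly built from leaky integrate-and-fire (LIF) neurons. Surrogate-gradient training keeps the hard spike in the forward pass but replaces its derivative, which is zero almost everywhere, with a smooth stand-in in the backward pass. The sketch below shows both pieces. It assumes a discrete-time LIF neuron with decay `beta`, threshold 1 and subtractive reset, plus a fast-sigmoid surrogate. These are illustrative choices, not the paper's U-Net SNN.

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Simulate one LIF neuron; returns (membrane trace after reset, spike train)."""
    v, membrane, spikes = 0.0, [], []
    for x in inputs:
        v = beta * v + x                  # leaky integration of the input current
        s = 1 if v >= threshold else 0    # Heaviside spike: its true derivative is 0 almost everywhere
        v -= s * threshold                # subtractive reset after a spike
        membrane.append(v)
        spikes.append(s)
    return membrane, spikes

def fast_sigmoid_grad(v, threshold=1.0, slope=25.0):
    """Surrogate for dS/dv, used in the backward pass in place of the step's derivative."""
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2

membrane, spikes = lif_forward([0.6] * 5)
print(spikes)   # a constant input current produces a regular spike train
```

The surrogate peaks at the threshold and decays away from it. Gradients therefore flow mainly through neurons that were close to spiking, which is what makes backpropagation through time workable for SNNs.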
# Towards multi-modal anatomical landmark detection for ultrasound-guided brain tumor resection with contrastive learning ( http://arxiv.org/abs/2307.14523v1 ) License: see link	Soorena Salari, Amirhossein Rasoulian, Hassan Rivaz and Yiming Xiao	Homologous anatomical landmarks between medical scans are instrumental in quantitative assessment of image registration quality in various clinical applications, such as MRI-ultrasound registration for tissue shift correction in ultrasound-guided brain tumor resection. While manually identified landmark pairs between MRI and ultrasound (US) have greatly facilitated the validation of different registration algorithms for the task, the procedure requires significant expertise, labor, and time, and can be prone to inter- and intra-rater inconsistency. So far, many traditional and machine learning approaches have been presented for anatomical landmark detection, but they primarily focus on mono-modal applications. Unfortunately, despite the clinical needs, inter-modal/contrast landmark detection has very rarely been attempted. Therefore, we propose a novel contrastive learning framework to detect corresponding landmarks between MRI and intra-operative US scans in neurosurgery. 
Specifically, two convolutional neural networks were trained jointly to encode image features in MRI and US scans to help match the US image patch that contains the corresponding landmark in the MRI. We developed and validated the technique using the public RESECT database. With a mean landmark detection error of 5.88 ± 4.79 mm, compared with 18.78 ± 4.77 mm using SIFT features, the proposed method offers promising results for MRI-US landmark detection in neurosurgical applications for the first time.	Translated: 2023-07-28 16:30:26 Published: 2023-07-26
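A common contrastive objective for jointly training two encoders is the InfoNCE loss over a batch. It pulls matching MRI and US patch embeddings together and pushes non-matching ones apart. Whether the paper uses exactly this loss is an assumption.

The sketch assumes L2-normalised embeddings and a temperature `tau`. Row $i$ of each batch is a positive pair. It also shows the matching step: picking the US patch whose embedding is most similar to an MRI landmark patch. Random vectors stand in for CNN features, and all names are hypothetical.

```python
import numpy as np

def normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def info_nce(mri_emb, us_emb, tau=0.1):
    """Row i of each array is a positive pair; the other rows in the batch act as negatives."""
    logits = normalize(mri_emb) @ normalize(us_emb).T / tau
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def match(mri_query, us_candidates):
    """Index of the US patch embedding with the highest cosine similarity to the query."""
    sims = normalize(us_candidates) @ (mri_query / np.linalg.norm(mri_query))
    return int(np.argmax(sims))

rng = np.random.default_rng(1)
mri = rng.standard_normal((8, 16))
us_aligned = mri + 0.05 * rng.standard_normal((8, 16))   # encoders that agree on each landmark
us_rolled = np.roll(us_aligned, 1, axis=0)               # every pair mismatched
print(info_nce(mri, us_aligned), info_nce(mri, us_rolled), match(mri[3], us_aligned))
```

Training drives the loss toward the aligned case. At inference, matching reduces to a nearest-neighbour search in the shared embedding space, which is what lets two modalities with very different intensities be compared at all.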
# CliniDigest: A Case Study in Large Language Model Based Large-Scale Summarization of Clinical Trial Descriptions ( http://arxiv.org/abs/2307.14522v1 ) License: see link	Renee D. White (1), Tristan Peng (1), Pann Sripitak (1), Alexander Rosenberg Johansen (1), Michael Snyder (1) (1) Stanford University	A clinical trial is a study that evaluates new biomedical interventions. To design new trials, researchers draw inspiration from those current and completed. In 2022, there were on average more than 100 clinical trials submitted to ClinicalTrials.gov every day, with each trial having a mean of approximately 1500 words [1]. This makes it nearly impossible to keep up to date. To mitigate this issue, we have created a batch clinical trial summarizer called CliniDigest using GPT-3.5. CliniDigest is, to our knowledge, the first tool able to provide real-time, truthful, and comprehensive summaries of clinical trials. CliniDigest can reduce up to 85 clinical trial descriptions (approximately 10,500 words) into a concise 200-word summary with references and limited hallucinations. We have tested CliniDigest on its ability to summarize 457 trials divided across 27 medical subdomains. For each field, CliniDigest generates summaries of $\mu=153$, $\sigma=69$ words, each of which utilizes $\mu=54\%$, $\sigma=30\%$ of the sources. A more comprehensive evaluation is planned and outlined in this paper.	Translated: 2023-07-28 16:29:58 Published: 2023-07-26
# Patterns of Vehicle Lights: Addressing Complexities in Curation and Annotation of Camera-Based Vehicle Light Datasets and Metrics ( http://arxiv.org/abs/2307.14521v1 ) License: see link	Ross Greer, Akshay Gopalkrishnan, Maitrayee Keskar, Mohan Trivedi	This paper explores the representation of vehicle lights in computer vision and its implications for various tasks in the field of autonomous driving. Different specifications for representing vehicle lights, including bounding boxes, center points, corner points, and segmentation masks, are discussed in terms of their strengths and weaknesses. Three important tasks in autonomous driving that can benefit from vehicle light detection are identified: nighttime vehicle detection, 3D vehicle orientation estimation, and dynamic trajectory cues. Each task may require a different representation of the light. The challenges of collecting and annotating large datasets for training data-driven models are also addressed, leading to the introduction of the LISA Vehicle Lights Dataset and associated Light Visibility Model, which provides light annotations specifically designed for downstream applications in vehicle detection, intent and trajectory prediction, and safe path planning. A comparison of existing vehicle light datasets is provided, highlighting the unique features and limitations of each dataset. 
Overall, this paper provides insights into the representation of vehicle lights and the importance of accurate annotations for training effective detection models in autonomous driving applications. Our dataset and model are made available at https://cvrr.ucsd.edu/vehicle-lights-dataset	Translated: 2023-07-28 16:29:39 Published: 2023-07-26
# FocalErrorNet: Uncertainty-aware focal modulation network for inter-modal registration error estimation in ultrasound-guided neurosurgery ( http://arxiv.org/abs/2307.14520v1 ) License: see link	Soorena Salari, Amirhossein Rasoulian, Hassan Rivaz and Yiming Xiao	In brain tumor resection, accurate removal of cancerous tissues while preserving eloquent regions is crucial to the safety and outcomes of the treatment. However, intra-operative tissue deformation (called brain shift) can move the surgical target and render the pre-surgical plan invalid. Intra-operative ultrasound (iUS) has been adopted to provide real-time images to track brain shift, and inter-modal (i.e., MRI-iUS) registration is often required to update the pre-surgical plan. Quality control for the registration results during surgery is important to avoid adverse outcomes, but manual verification faces great challenges due to difficult 3D visualization and the low contrast of iUS. Automatic algorithms are urgently needed to address this issue, but the problem has rarely been addressed. Therefore, we propose a novel deep learning technique based on 3D focal modulation in conjunction with uncertainty estimation to accurately assess MRI-iUS registration errors for brain tumor surgery. Developed and validated with the public RESECT clinical database, the resulting algorithm can achieve an estimation error of 0.59 ± 0.57 mm.	Translated: 2023-07-28 16:29:18 Published: 2023-07-26
# The Co-12 Recipe for Evaluating Interpretable Part-Prototype Image Classifiers ( http://arxiv.org/abs/2307.14517v1 ) License: see link	Meike Nauta and Christin Seifert	Interpretable part-prototype models are computer vision models that are explainable by design. The models learn prototypical parts and recognise these components in an image, thereby combining classification and explanation. Despite the recent attention to intrinsically interpretable models, there is no comprehensive overview of how to evaluate the explanation quality of interpretable part-prototype models. Based on the Co-12 properties for explanation quality introduced in arXiv:2201.08164 (e.g., correctness, completeness, compactness), we review existing work that evaluates part-prototype models, reveal research gaps, and outline future approaches for evaluating the explanation quality of part-prototype models. This paper therefore contributes to the progress and maturity of this relatively new research field on interpretable part-prototype models. We additionally provide a "Co-12 cheat sheet" that serves as a concise summary of our findings on evaluating part-prototype models.	Translated: 2023-07-28 16:28:55 Published: 2023-07-26
# Bug Characterization in Machine Learning-based Systems ( http://arxiv.org/abs/2307.14512v1 ) License: see link	Mohammad Mehdi Morovati, Amin Nikanjam, Florian Tambon, Foutse Khomh, Zhen Ming (Jack) Jiang	The rapid growth in applying Machine Learning (ML) to different domains, especially safety-critical areas, increases the need for reliable ML components, i.e., software components that operate based on ML. Understanding the characteristics of bugs and the maintenance challenges in ML-based systems can help developers of these systems identify where to focus maintenance and testing efforts, by giving insights into the most error-prone components, the most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the differences between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we selected the top 300 repositories with the highest number of closed issues. We manually investigated the extracted repositories to exclude non-ML-based systems. Our investigation involved a manual inspection of 386 sampled reported issues in the identified ML-based systems to determine whether or not they affect ML components. Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. Next, we thoroughly examined 109 identified ML bugs to identify their root causes and symptoms and to calculate their required fixing time. The results also revealed that ML bugs have significantly different characteristics from non-ML bugs in terms of the complexity of bug fixing (number of commits, changed files, and changed lines of code). Based on our results, fixing ML bugs is more costly than fixing non-ML bugs, and ML components are more error-prone than non-ML components. Hence, paying significant attention to the reliability of ML components is crucial in ML-based systems.	Translated: 2023-07-28 16:28:37 Published: 2023-07-26
# Words That Stick: Predicting Decision Making and Synonym Engagement Using Cognitive Biases and Computational Linguistics ( http://arxiv.org/abs/2307.14511v1 ) License: see link	Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan Yang, Jennifer Romano	This research draws upon cognitive psychology and information systems studies to anticipate user engagement and decision-making on digital platforms. By employing natural language processing (NLP) techniques and insights from cognitive bias research, we delve into user interactions with synonyms within digital content. Our methodology synthesizes four cognitive biases (Representativeness, Ease-of-use, Affect, and Distribution) into the READ model. Through a comprehensive user survey, we assess the model's ability to predict user engagement, discovering that synonyms that accurately represent core ideas, are easy to understand, elicit emotional responses, and are commonly encountered promote greater user engagement. Crucially, our work offers a fresh lens on human-computer interaction, digital behaviors, and decision-making processes. Our results highlight the promise of cognitive biases as potent indicators of user engagement, underscoring their significance in designing effective digital content across fields like education and marketing.	Translated: 2023-07-28 16:28:06 Published: 2023-07-26
# Attention of Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control ( http://arxiv.org/abs/2307.14510v1 ) License: see link	Yijiong Lin, Mauro Comi, Alex Church, Dandan Zhang, Nathan F. Lepora	High-resolution tactile sensing can provide accurate information about local contact in contact-rich robotic tasks. However, the deployment of such tasks in unstructured environments remains under-investigated. To improve the robustness of tactile robot control in unstructured environments, we propose and study a new concept, tactile saliency for robot touch, inspired by the human touch attention mechanism from neuroscience and the visual saliency prediction problem from computer vision. By analogy with visual saliency, this concept involves identifying key information in tactile images captured by a tactile sensor. While visual saliency datasets are commonly annotated by humans, manually labelling tactile images is challenging due to their counterintuitive patterns. To address this challenge, we propose a novel approach consisting of three interrelated networks: 1) a Contact Depth Network (ConDepNet), which generates a contact depth map to localize deformation in a real tactile image that contains target and noise features; 2) a Tactile Saliency Network (TacSalNet), which predicts a tactile saliency map to describe the target areas for an input contact depth map; and 3) a Tactile Noise Generator (TacNGen), which generates noise features to train the TacSalNet. Experimental results in contact pose estimation and edge-following in the presence of distractors showcase accurate prediction of target features from real tactile images. Overall, our tactile saliency prediction approach enables robust sim-to-real tactile control in environments with unknown distractors. Project page: https://sites.google.com/view/tactile-saliency/.	Translated: 2023-07-28 16:27:49 Published: 2023-07-26
# Reconstructing Thermal Quantum Quench Dynamics from Pure States ( http://arxiv.org/abs/2307.14508v1 ) License: see link	Jason Saroni, Henry Lamm, Peter P. Orth, Thomas Iadecola	Simulating the nonequilibrium dynamics of thermal states is a fundamental problem across scales from high energy to condensed matter physics. Quantum computers may provide a way to solve this problem efficiently. Preparing a thermal state on a quantum computer is challenging, but there exist methods to circumvent this by computing a weighted sum of time-dependent matrix elements in a convenient basis. While the number of basis states can be large, in this work we show that it can be reduced by simulating only the largest density matrix elements by weight, capturing the density matrix to a specified precision. Leveraging Hamiltonian symmetries enables further reductions. This approach paves the way to more accurate thermal-state dynamics simulations on near-term quantum hardware.	Translated: 2023-07-28 16:27:21 Published: 2023-07-26
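A classical toy sketch of the truncation idea in NumPy, not the paper's quantum algorithm. Everything specific here is an assumption: the small random Hamiltonians, the computational basis as the "convenient basis", and the cutoff rule (keep the largest |rho_ij| until a fraction 1 - eps of their total magnitude is captured). Each kept element rho_ij weights one time-dependent matrix element <j|O(t)|i>, the quantity that would be estimated from pure-state evolutions on hardware.

```python
import numpy as np

def expm_hermitian(h, scale):
    """exp(scale * h) for a Hermitian matrix h via eigendecomposition (scale may be complex)."""
    evals, evecs = np.linalg.eigh(h)
    return (evecs * np.exp(scale * evals)) @ evecs.conj().T

def thermal_quench_expectation(h0, h1, obs, beta, t, eps=0.0):
    """<O(t)> after quenching the thermal state of h0 (inverse temperature beta) to h1.

    Uses Tr(rho O(t)) = sum_ij rho_ij <j|O(t)|i>, keeping only the largest |rho_ij|
    until a fraction (1 - eps) of sum |rho_ij| is captured (hypothetical cutoff rule).
    Returns the estimate and the number of matrix elements kept.
    """
    rho = expm_hermitian(h0, -beta)
    rho /= np.trace(rho).real                  # normalize the thermal state
    u = expm_hermitian(h1, -1j * t)            # U = exp(-i H1 t)
    o_t = u.conj().T @ obs @ u                 # Heisenberg-picture observable O(t)
    mags = np.abs(rho).ravel()
    order = np.argsort(mags)[::-1]             # largest weights first
    cum = np.cumsum(mags[order])
    n_keep = int(np.searchsorted(cum, (1.0 - eps) * cum[-1]) + 1)
    i_idx, j_idx = np.unravel_index(order[:n_keep], rho.shape)
    value = np.sum(rho[i_idx, j_idx] * o_t[j_idx, i_idx])  # rho_ij * O(t)_ji
    return value.real, n_keep
```

With eps = 0 the sum reproduces Tr(rho O(t)) exactly; raising eps trades accuracy for fewer simulated matrix elements.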
# Speed Reading Tool Powered by Artificial Intelligence for Students with ADHD, Dyslexia, or Short Attention Span ( http://arxiv.org/abs/2307.14544v1 ) License: see link	Megat Irfan Zackry Bin Ismail, Ahmad Nazran bin Yusri, Muhammad Hafizzul Bin Abdul Manap, Muhammad Muizzuddin Bin Kamarozaman	This paper presents a novel approach to help students with dyslexia, ADHD, and short attention spans digest text-based information more efficiently. The proposed solution uses the Multilayer Perceptron (MLP) algorithm for complex text processing and summarization tasks. The tool leverages the T5 (Text-to-Text Transfer Transformer) model from Hugging Face, which treats every NLP task as a text generation task. The model is fine-tuned on specific tasks using a smaller dataset. NLTK's Punkt Sentence Tokenizer is used to divide a text into a list of sentences. The application is served using Flask, a lightweight web server and framework. The tool also applies principles from Bionic Reading to enhance readability, including a bolding function and adjustments to line, word, and character spacing. The paper discusses the methodology, implementation, and results of the AI-based speed reading tool.	Translated: 2023-07-28 16:18:23 Published: 2023-07-26
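The abstract does not specify the bolding rule. A minimal sketch, assuming the common Bionic Reading heuristic of emphasizing roughly the first half of each word and HTML `<b>` tags for output (both assumptions; `bionic_bold` is a hypothetical name):

```python
import math
import re

WORD = re.compile(r"[A-Za-z0-9']+")  # runs of letters, digits, apostrophes count as words

def bionic_bold(text: str, fraction: float = 0.5) -> str:
    """Wrap the leading `fraction` of each word's characters in <b>...</b>."""
    def bold_word(match):
        word = match.group(0)
        k = max(1, math.ceil(len(word) * fraction))  # emphasize at least one character
        return f"<b>{word[:k]}</b>{word[k:]}"
    return WORD.sub(bold_word, text)

print(bionic_bold("Reading faster helps."))
# → <b>Read</b>ing <b>fas</b>ter <b>hel</b>ps.
```

Punctuation and whitespace pass through untouched, so the output can be dropped into an HTML template; spacing adjustments would be handled separately in CSS.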
# Plug and Pray: Exploiting off-the-shelf components of Multi-Modal Models ( http://arxiv.org/abs/2307.14539v1 ) License: see link	Erfan Shayegani, Yue Dong, Nael Abu-Ghazaleh	The rapid growth and increasing popularity of incorporating additional modalities (e.g., vision) into large language models (LLMs) have raised significant security concerns. This expansion of modality, akin to adding more doors to a house, unintentionally creates multiple access points for adversarial attacks. In this paper, by introducing adversarial embedding space attacks, we emphasize the vulnerabilities present in multi-modal systems that originate from incorporating off-the-shelf components, such as public pre-trained encoders, into these systems in a plug-and-play manner. In contrast to existing work, our approach does not require access to the multi-modal system's weights or parameters; instead, it relies on the huge, under-explored embedding space of such pre-trained encoders. Our proposed embedding space attacks involve seeking input images that reside within the dangerous or targeted regions of the extensive embedding space of these pre-trained components. These crafted adversarial images pose two major threats, 'Context Contamination' and 'Hidden Prompt Injection', both of which can compromise multi-modal models like LLaVA and fully change the behavior of the associated language model. Our findings emphasize the need for a comprehensive examination of the underlying components, particularly pre-trained encoders, before incorporating them into systems in a plug-and-play manner to ensure robust security.	Translated: 2023-07-28 16:18:08 Published: 2023-07-26
# Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum ( http://arxiv.org/abs/2307.14531v1 ) License: see link	Amnon Geifman, Daniel Barzilai, Ronen Basri and Meirav Galun	Wide neural networks are biased towards learning certain functions, influencing both the rate of convergence of gradient descent (GD) and the functions that are reachable with GD in finite training time. As such, there is a great need for methods that can modify this bias according to the task at hand. To that end, we introduce Modified Spectrum Kernels (MSKs), a novel family of constructed kernels that can be used to approximate kernels with desired eigenvalues for which no closed form is known. We leverage the duality between wide neural networks and Neural Tangent Kernels and propose a preconditioned gradient descent method, which alters the trajectory of GD. This allows a polynomial and, in some cases, exponential training speedup without changing the final solution. Our method is both computationally efficient and simple to implement.	Translated: 2023-07-28 16:17:44 Published: 2023-07-26
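A toy NumPy sketch of why reshaping a kernel's spectrum speeds up kernel gradient descent without moving its fixed point. It is not the paper's MSK construction: it eigendecomposes a small RBF Gram matrix directly, which is only feasible for tiny n, and the floored target spectrum `g` is a hypothetical choice. Along eigenvector k the training residual shrinks by a factor (1 - lr * g_k) per step, so lifting small eigenvalues accelerates the slow directions, while f = y remains the fixed point as long as every g_k > 0.

```python
import numpy as np

def rbf_kernel(x, lengthscale):
    """Gram matrix of a squared-exponential kernel on 1-D inputs."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * lengthscale**2))

def gd_residuals(kernel, y, lr, steps):
    """Kernel gradient descent on training predictions: f <- f - lr * kernel @ (f - y)."""
    f = np.zeros_like(y)
    residuals = []
    for _ in range(steps):
        f = f - lr * kernel @ (f - y)
        residuals.append(np.linalg.norm(f - y))
    return f, residuals

x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x)
k = rbf_kernel(x, lengthscale=0.2)

evals, evecs = np.linalg.eigh(k)
lam_max = evals.max()
# Hypothetical target spectrum: floor small eigenvalues at 10% of the largest one.
# In exact arithmetic (all eigenvalues > 0) this equals K @ P with P = V diag(g / lam) V^T.
g = np.maximum(evals, 0.1 * lam_max)
k_mod = (evecs * g) @ evecs.T

lr = 1.0 / lam_max
_, plain = gd_residuals(k, y, lr, steps=200)
_, precond = gd_residuals(k_mod, y, lr, steps=200)
print(f"plain GD residual: {plain[-1]:.2e}, preconditioned: {precond[-1]:.2e}")
```

Because max(lam_k, c) >= lam_k in every direction, the preconditioned residual can never be larger than the plain one at the same step.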
# Optimal Estimation in Mixed-Membership Stochastic Block Models ( http://arxiv.org/abs/2307.14530v1 ) License: see link	Fedor Noskov and Maxim Panov	Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers have studied the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider the Mixed-Membership Stochastic Block Model (MMSB), first proposed by Airoldi et al. (2008). MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is how to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.	Translated: 2023-07-28 16:17:29 Published: 2023-07-26
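A sketch of sampling from an MMSB-style model, useful for generating synthetic test graphs. Assumption: it uses the mean-field form P = Theta B Theta^T that is common in the estimation literature; Airoldi et al.'s original model instead draws per-pair community indicators, which yields the same expected edge probabilities. The function name and parameters are illustrative.

```python
import numpy as np

def sample_mmsb(n, alpha, b, rng):
    """Sample an undirected MMSB-style graph with n nodes.

    Each node i draws memberships theta_i ~ Dirichlet(alpha); edge probabilities are
    P = Theta B Theta^T, and A_ij ~ Bernoulli(P_ij) independently for i < j.
    """
    theta = rng.dirichlet(alpha, size=n)           # (n, K) membership vectors
    p = theta @ b @ theta.T                        # (n, n) edge probabilities in [0, 1]
    upper = np.triu(rng.random((n, n)) < p, k=1)   # sample each unordered pair once
    adj = (upper | upper.T).astype(int)            # symmetrize; diagonal stays 0
    return adj, theta, p

rng = np.random.default_rng(0)
b = np.array([[0.8, 0.1], [0.1, 0.6]])             # community connectivity matrix
adj, theta, p = sample_mmsb(50, [0.5, 0.5], b, rng)
```

An estimator of B (the "relations between communities") can then be checked against the known `b` used to generate the graph.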
# Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM ( http://arxiv.org/abs/2307.14528v1 ) License: see link	Guillaume Garrigos, Robert M. Gower, Fabian Schaipp	Here we develop variants of SGD (stochastic gradient descent) with an adaptive step size that make use of the sampled loss values. In particular, we focus on solving a finite sum-of-terms problem, also known as empirical risk minimization. We first detail an idealized adaptive method called $\texttt{SPS}_+$ that makes use of the sampled loss values and assumes knowledge of the sampled loss at optimality. This $\texttt{SPS}_+$ is a minor modification of the SPS (Stochastic Polyak Stepsize) method, where the step size is enforced to be positive. We then show that $\texttt{SPS}_+$ achieves the best known rates of convergence for SGD in the Lipschitz non-smooth setting. We then move on to develop $\texttt{FUVAL}$, a variant of $\texttt{SPS}_+$ where the loss values at optimality are gradually learned, as opposed to being given. We give three viewpoints of $\texttt{FUVAL}$: as a projection-based method, as a variant of the prox-linear method, and as a particular online SGD method. We then present a convergence analysis of $\texttt{FUVAL}$ and experimental results. One shortcoming of our work is that the convergence analysis of $\texttt{FUVAL}$ shows no advantage over SGD. Another shortcoming is that currently only the full-batch version of $\texttt{FUVAL}$ shows a minor advantage over GD (Gradient Descent) in terms of sensitivity to the step size. The stochastic version shows no clear advantage over SGD. We conjecture that large mini-batches are required to make $\texttt{FUVAL}$ competitive. Currently, the new $\texttt{FUVAL}$ method studied in this paper does not offer any clear theoretical or practical advantage. We have nonetheless chosen to make this draft available online because of some of the analysis techniques we use, such as the non-smooth analysis of $\texttt{SPS}_+$, and also to show an apparently interesting approach that currently does not work.	Translated: 2023-07-28 16:17:19 Published: 2023-07-26
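A minimal sketch of the SPS_+ step size described above, gamma_t = max(f_i(x_t) - f_i^*, 0) / ||grad f_i(x_t)||^2, applied to a consistent least-squares problem where every f_i^* = 0 is known. The problem setup and function name are assumptions for illustration; FUVAL, which learns the f_i^* values instead of taking them as given, is not shown.

```python
import numpy as np

def sps_plus(a, b, x0, f_star, steps, rng):
    """SGD with the SPS_+ step size on f_i(x) = 0.5 * (a_i @ x - b_i)**2.

    f_star[i] is the (assumed known) value of f_i at the optimum.
    """
    x = x0.astype(float).copy()
    for _ in range(steps):
        i = rng.integers(len(b))                 # sample one term uniformly
        r = a[i] @ x - b[i]
        loss = 0.5 * r**2
        grad = r * a[i]
        gnorm2 = grad @ grad
        if gnorm2 == 0.0:
            continue                             # this term is already at its minimum
        gamma = max(loss - f_star[i], 0.0) / gnorm2   # positive part enforces gamma >= 0
        x -= gamma * grad
    return x

rng = np.random.default_rng(0)
a = rng.normal(size=(30, 5))
x_true = rng.normal(size=5)
b = a @ x_true                                   # consistent system, so every f_i^* = 0
x = sps_plus(a, b, np.zeros(5), np.zeros(30), steps=3000, rng=rng)
```

For this quadratic loss the step reduces to a half-length Kaczmarz projection onto the sampled equation, which is why it converges on a consistent system without any tuned learning rate.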
# Open Problems in Computer Vision for Wilderness SAR and The Search for Patricia Wu-Murad ( http://arxiv.org/abs/2307.14527v1 ) License: see link	Thomas Manzini, Robin Murphy	This paper details the challenges in applying two computer vision systems, an EfficientDET supervised learning model and the unsupervised RX spectral classifier, to 98.9 GB of drone imagery from the Wu-Murad wilderness search and rescue (WSAR) effort in Japan, and identifies 3 directions for future research. There have been at least 19 proposed approaches and 3 datasets aimed at locating missing persons in drone imagery, but only 3 approaches (2 unsupervised and 1 of an unknown structure) are referenced in the literature as having been used in an actual WSAR operation. Of these proposed approaches, the EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting. The EfficientDET model was applied to the HERIDAL dataset, and despite achieving performance that is statistically equivalent to the state of the art, the model fails to translate to the real world in terms of false positives (e.g., identifying tree limbs and rocks as people) and false negatives (e.g., failing to identify members of the search team). The poor real-world results of algorithms that performed well on datasets suggest 3 areas of future research: more realistic datasets for wilderness SAR, computer vision models that can seamlessly handle the variety of imagery collected during actual WSAR operations, and better alignment on performance measures.	Translated: 2023-07-28 16:16:37 Published: 2023-07-26
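The RX classifier mentioned above is, in its textbook global form, a Mahalanobis-distance anomaly detector over pixel spectra. A minimal NumPy sketch of that form; the band layout, any local-window variant, and the detection threshold used in the paper are not specified in the abstract and are left out:

```python
import numpy as np

def rx_scores(image):
    """Global Reed-Xiaoli (RX) anomaly scores for an (H, W, C) multiband image.

    Each pixel's score is its squared Mahalanobis distance from the image-wide
    mean spectrum under the image-wide covariance.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(float)
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)          # (C, C) background covariance
    cov_inv = np.linalg.pinv(cov)                 # pseudo-inverse guards against singular covariance
    scores = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return scores.reshape(h, w)
```

Thresholding the score map (for example, flagging the top percentile of pixels) would give candidate detections; the threshold is a tuning choice.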
# Trace dynamics and its implications for my work of the last two decades ( http://arxiv.org/abs/2307.14524v1 ) License: see link	Stephen L. Adler	I review the basic ideas of "trace dynamics", as formulated in my 2004 Cambridge University Press book "Quantum Theory as an Emergent Phenomenon", and then discuss how they have influenced much of my work of the last two decades.	Translated: 2023-07-28 16:16:10 Published: 2023-07-26
# A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models ( http://arxiv.org/abs/2307.05946v3 ) License: see link	Agnimitra Sengupta, Sudeepta Mondal, Adway Das, S. Ilgin Guler	Deep-learning models for traffic data prediction can achieve superior performance in modeling complex functions using a multi-layer architecture. However, a major drawback is that most of these approaches do not offer forecasts with uncertainty estimates, which are essential for traffic operations and control. Without uncertainty estimates, it is difficult to place any level of trust in the model predictions, and operational strategies relying on overconfident predictions can lead to worsening traffic conditions. In this study, we propose a Bayesian recurrent neural network framework for uncertainty quantification in traffic prediction with higher generalizability, obtained by introducing spectral normalization to its hidden layers. We show that normalization alters the training process of deep neural networks by controlling the model's complexity and reducing the risk of overfitting to the training data. This, in turn, helps improve the generalization performance of the model on out-of-distribution datasets. Results demonstrate that spectral normalization improves uncertainty estimates and significantly outperforms both layer normalization and the model without normalization in single-step prediction horizons. This improved performance can be attributed to the ability of spectral normalization to better localize the feature space of the data under perturbations. Our findings are especially relevant to traffic management applications, where predicting traffic conditions across multiple locations is the goal but the availability of training data from multiple locations is limited. Spectral normalization therefore provides a more generalizable approach that can effectively capture the underlying patterns in traffic data without requiring location-specific models.	Translated: 2023-07-28 11:28:12 Published: 2023-07-26
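Spectral normalization divides a weight matrix by its largest singular value, usually estimated with power iteration. A NumPy sketch of that normalization step for one hidden-layer weight matrix; it does not reproduce the paper's Bayesian RNN, and iterating to convergence is a simplification (training-time implementations such as PyTorch's `spectral_norm` utility typically run one iteration per step with a persistent vector):

```python
import numpy as np

def spectral_normalize(weight, n_iter=50, rng=None):
    """Return (weight / sigma_max, sigma_max), estimating sigma_max by power iteration."""
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.normal(size=weight.shape[0])
    for _ in range(n_iter):
        v = weight.T @ u
        v /= np.linalg.norm(v)                   # right singular vector estimate
        u = weight @ v
        u /= np.linalg.norm(u)                   # left singular vector estimate
    sigma = u @ weight @ v                       # estimate of the largest singular value
    return weight / sigma, sigma
```

After normalization the layer's spectral norm is 1, which bounds how much the layer can stretch its input; this is the complexity control the abstract refers to.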
# AlphaNet: Improving Long-Tail Classification By Combining Classifiers ( http://arxiv.org/abs/2008.07073v2 ) License: see link	Nadine Chang, Jayanth Koushik, Aarti Singh, Martial Hebert, Yu-Xiong Wang, Michael J. Tarr	Methods in long-tail learning focus on improving performance for data-poor (rare) classes; however, performance for such classes remains much lower than performance for more data-rich (frequent) classes. Analyzing the predictions of long-tail methods for rare classes reveals that a large number of errors are due to misclassification of rare items as visually similar frequent classes. To address this problem, we introduce AlphaNet, a method that can be applied to existing models, performing post hoc correction on classifiers of rare classes. Starting with a pre-trained model, we find frequent classes that are closest to rare classes in the model's representation space and learn weights to update rare class classifiers with a linear combination of frequent class classifiers. AlphaNet, applied to several models, greatly improves test accuracy for rare classes in multiple long-tailed datasets, with very little change to overall accuracy. Our method also provides a way to control the trade-off between rare class and overall accuracy, making it practical for long-tail classification in the wild.	Translated: 2023-07-27 16:57:07 Published: 2023-07-26
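A sketch of the post hoc update described above, under stated assumptions: neighbours are found by Euclidean distance between class-mean features, each rare classifier keeps its own weight vector scaled by `alphas[:, 0]`, and the coefficients are passed in rather than learned as in the paper. Names and shapes are hypothetical.

```python
import numpy as np

def alphanet_update(w_rare, w_freq, rare_means, freq_means, alphas, k=3):
    """Combine each rare-class classifier with those of its k nearest frequent classes.

    w_rare: (R, D) rare-class classifier weights; w_freq: (F, D) frequent-class weights.
    rare_means, freq_means: (R, E) and (F, E) class-mean features used to find neighbours.
    alphas: (R, k + 1) coefficients; alphas[:, 0] scales the rare classifier itself.
    """
    dists = np.linalg.norm(rare_means[:, None, :] - freq_means[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]                   # (R, k) frequent-class indices
    neighbours = w_freq[nearest]                                  # (R, k, D)
    combo = np.einsum("rk,rkd->rd", alphas[:, 1:], neighbours)    # weighted sum of neighbours
    return alphas[:, :1] * w_rare + combo, nearest
```

Setting `alphas[:, 1:]` to zero recovers the original rare classifiers, which gives one knob for the rare-versus-overall accuracy trade-off mentioned in the abstract.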
# Compressible Spectral Mixture Kernels with Sparse Dependency Structures for Gaussian Processes ( http://arxiv.org/abs/1808.00560v9 ) License: see link	Kai Chen, Yijue Dai, Feng Yin, Elena Marchiori, and Sergios Theodoridis	Spectral mixture (SM) kernels comprise a powerful class of generalized kernels for Gaussian processes (GPs) to describe complex patterns. This paper introduces model compression and time- and phase (TP) modulated dependency structures to the original SM kernel for improved generalization of GPs. Specifically, by adopting Bienaymé's identity, we generalize the dependency structure through cross-covariance between the SM components. Then, we propose a novel SM kernel with a dependency structure (SMD) by using cross-convolution between the SM components. Furthermore, we improve the expressiveness of the dependency structure by parameterizing it with time and phase delays. The dependency structure has clear interpretations in terms of spectral density, covariance behavior, and sampling path. Finally, to enrich the SMD with effective hyperparameter initialization, compressible SM kernel components, and sparse dependency structures, we introduce a novel structure adaptation (SA) algorithm. A thorough comparative analysis of the SMD on both synthetic and real-life applications corroborates its efficacy.	Translated: 2023-07-27 16:56:46 Published: 2023-07-26
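For reference, the standard one-dimensional spectral mixture kernel (Wilson and Adams, 2013) that this work extends can be written in a few lines of NumPy. Only the base SM kernel is shown; the SMD cross-convolution terms with time and phase delays are not reproduced here.

```python
import numpy as np

def spectral_mixture_kernel(x1, x2, weights, means, variances):
    """Standard 1-D spectral mixture kernel.

    k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q), with tau = x1 - x2.
    Each component is a Gaussian in the frequency domain with mean mu_q and variance v_q.
    """
    tau = x1[:, None] - x2[None, :]
    k = np.zeros_like(tau, dtype=float)
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return k

x = np.linspace(0.0, 2.0, 15)
kmat = spectral_mixture_kernel(x, x, weights=[1.0, 0.5], means=[0.0, 1.5], variances=[0.3, 0.05])
```

At tau = 0 every cosine equals 1, so the prior variance is the sum of the component weights.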
# ニューラルネットワークにおける最適経路探索とタスク依存学習の組み合わせ Combining optimal path search with task-dependent learning in a neural network ( http://arxiv.org/abs/2201.11104v4 ) ライセンス: Link先を確認	Tomas Kulvicius, Minija Tamosiunaite and Florentin Wörgötter	(参考訳) 連結グラフの最適経路を見つけるには、グラフのエッジに沿って移動する際の最小の総コストを決定する必要がある。この問題は、通常すべてのエッジに対してコストが予め定義された、いくつかの古典的なアルゴリズムによって解決できる。したがって従来の計画手法は、あるタスクの要求に従って適応的にコストを変更したい場合には、通常使用できない。ここでは、コスト値をシナプス重みに変換することで経路探索問題のニューラルネットワーク表現を定義できることを示し、これによりネットワーク学習機構を用いたオンラインの重み適応が可能になる。初期活動値を1として開始すると、このネットワークにおける活動の伝播は、Bellman-Fordアルゴリズムで得られるものと同一の解をもたらす。このニューラルネットワークはBellman-Fordと同じアルゴリズム計算量を持ち、さらに、ネットワーク学習機構(例えばヘッブ学習)が当該タスクに応じてネットワーク内の重みを適応させ、得られる経路を拡張できることを示す。障害物のある環境でのナビゲーションの学習や、特定の経路ノードの系列に従う学習によってこれを実証する。したがって、ここで提示した新しいアルゴリズムは、(学習による)経路拡張が経路探索と自然な形で直接結び付いた、新たな種類の応用を開く可能性がある。 Finding optimal paths in connected graphs requires determining the smallest total cost for traveling along the graph's edges. This problem can be solved by several classical algorithms where, usually, costs are predefined for all edges. Conventional planning methods can, thus, normally not be used when wanting to change costs in an adaptive way following the requirements of some task. Here we show that one can define a neural network representation of path finding problems by transforming cost values into synaptic weights, which allows for online weight adaptation using network learning mechanisms. When starting with an initial activity value of one, activity propagation in this network will lead to solutions, which are identical to those found by the Bellman-Ford algorithm. The neural network has the same algorithmic complexity as Bellman-Ford and, in addition, we can show that network learning mechanisms (such as Hebbian learning) can adapt the weights in the network augmenting the resulting paths according to some task at hand. We demonstrate this by learning to navigate in an environment with obstacles as well as by learning to follow certain sequences of path nodes. Hence, the here-presented novel algorithm may open up a different regime of applications where path-augmentation (by learning) is directly coupled with path finding in a natural way.	翻訳日:2023-07-27 16:53:23 公開日:2023-07-26
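For reference, the classical Bellman-Ford algorithm whose solutions the network's activity propagation is reported to reproduce can be sketched as below. This is the textbook algorithm, not the authors' neural-network formulation; the example graph is a hypothetical one chosen for illustration.

```python
import math

def bellman_ford(n, edges, source):
    """Textbook Bellman-Ford: shortest-path costs from `source` over n nodes.

    edges: list of (u, v, cost) tuples for directed edges u -> v.
    Assumes the graph has no negative-cost cycle.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    # At most n - 1 rounds of edge relaxation are needed
    for _ in range(n - 1):
        updated = False
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                updated = True
        if not updated:  # early exit once costs have stabilised
            break
    return dist

edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
print(bellman_ford(4, edges, 0))  # [0.0, 3.0, 1.0, 4.0]
```

The $O(|V|\,|E|)$ relaxation loop is the complexity the abstract says the neural network matches; in the paper, task-dependent learning would additionally change the edge costs (weights) between runs.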
# 光ビームのモード構造に符号化されたパラメータの推定:量子論 Estimation of a parameter encoded in the modal structure of a light beam: a quantum theory ( http://arxiv.org/abs/2201.04050v2 ) ライセンス: Link先を確認	Manuel Gessner, Nicolas Treps, and Claude Fabre	(参考訳) 量子光は量子状態だけでなく、その状態が定義される電磁モードの形状によっても記述される。光精密測定では、周波数、時間形状、光場の空間分布などの特性を決定する「モードパラメータ」を推定することが多い。量子精度限界を導出することにより、モードパラメータ推定の基本限界を確立する。その結果、任意のモードパラメータを量子的に向上した精度で推定できる明示的なモード設計レシピが明らかになった。本手法は、空間的・時間的位置決め、分光、位相推定、超解像イメージングなどの応用を含む、モードパラメータ推定を最適化するための実用的な方法を提供する。 Quantum light is described not only by a quantum state but also by the shape of the electromagnetic modes on which the state is defined. Optical precision measurements often estimate a ``mode parameter'' that determines properties such as frequency, temporal shape and the spatial distribution of the light field. By deriving quantum precision limits, we establish the fundamental bounds for mode parameter estimation. Our results reveal explicit mode-design recipes that enable the estimation of any mode parameter with quantum enhanced precision. Our approach provides practical methods for optimizing mode parameter estimation with relevant applications, including spatial and temporal positioning, spectroscopy, phase estimation, and superresolution imaging.	翻訳日:2023-07-27 16:53:01 公開日:2023-07-26
# 自己教師付きビデオ表現学習のためのクロスモーダルマニフォールドカットミックス Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning ( http://arxiv.org/abs/2112.03906v2 ) ライセンス: Link先を確認	Srijan Das and Michael S. Ryoo	(参考訳) 本稿では、実世界のアプリケーションにおけるコントラスト表現学習のための大規模なラベルなしビデオデータセットの獲得という課題に取り組む。ビデオの異なるモダリティを組み合わせることで拡張サンプルを生成する、クロスモーダル多様体カットミックス(CMMC)と呼ばれる自己教師あり学習のための新しい映像拡張手法を提案する。特徴空間において、2つのモダリティにまたがってビデオテッセラクトを別のビデオに埋め込むことにより、学習される映像表現の品質を高める。行動認識とビデオ検索のタスクについて、2つの小規模ビデオデータセットUCF101とHMDB51で広範な実験を行った。また、本手法はドメイン知識が限られたNTUデータセットに対しても有効であることを示す。CMMCは、下流の両タスクでより少ない訓練データを使用しながら、他の自己教師あり手法と同等の性能を達成する。 In this paper, we address the challenge of obtaining large-scale unlabelled video datasets for contrastive representation learning in real-world applications. We present a novel video augmentation technique for self-supervised learning, called Cross-Modal Manifold Cutmix (CMMC), which generates augmented samples by combining different modalities in videos. By embedding a video tesseract into another across two modalities in the feature space, our method enhances the quality of learned video representations. We perform extensive experiments on two small-scale video datasets, UCF101 and HMDB51, for action recognition and video retrieval tasks. Our approach is also shown to be effective on the NTU dataset with limited domain knowledge. Our CMMC achieves comparable performance to other self-supervised methods while using less training data for both downstream tasks.	翻訳日:2023-07-27 16:52:50 公開日:2023-07-26
# 最良腕識別におけるレート最適ベイズ単純後悔 Rate-optimal Bayesian Simple Regret in Best Arm Identification ( http://arxiv.org/abs/2111.09885v3 ) ライセンス: Link先を確認	Junpei Komiyama, Kaito Ariu, Masahiro Kato and Chao Qin	(参考訳) 多腕バンディット問題における最良腕識別を考える。事前分布に一定の連続性条件を仮定し、ベイズ単純後悔のレートを特徴づける。ベイズ後悔最小化(Lai, 1987)とは異なり、ベイズ単純後悔の主要項は、最適腕と準最適腕の間のギャップが$\sqrt{\frac{\log T}{T}}$より小さい領域に由来する。主要項が定数倍を除いて下界に一致する、単純で計算が容易なアルゴリズムを提案する。シミュレーション結果は理論的知見を裏付けている。 We consider best arm identification in the multi-armed bandit problem. Assuming certain continuity conditions of the prior, we characterize the rate of the Bayesian simple regret. Differing from Bayesian regret minimization (Lai, 1987), the leading term in the Bayesian simple regret derives from the region where the gap between optimal and suboptimal arms is smaller than $\sqrt{\frac{\log T}{T}}$. We propose a simple and easy-to-compute algorithm with its leading term matching with the lower bound up to a constant factor; simulation results support our theoretical findings.	翻訳日:2023-07-27 16:52:35 公開日:2023-07-26
# QOptCraft: 線形光量子系の設計と研究のためのPythonパッケージ QOptCraft: A Python package for the design and study of linear optical quantum systems ( http://arxiv.org/abs/2108.06186v2 ) ライセンス: Link先を確認	Daniel Gómez Aguado, Vicent Gimeno, Julio José Moyano-Fernández, Juan Carlos Garcia-Escartin	(参考訳) 線形光学系における光の量子状態の操作は、量子光学と量子計算に複数の応用がある。 QOptCraftパッケージは、線形干渉計を用いた量子実験を設計する際に、最も一般的な問題のいくつかを解決する方法のコレクションを提供する。この方法には、システムの古典的な記述からn個の光子の量子進化行列を計算する関数と、任意の所望の量子進化のために、ユニタリ進化を実現する実験系の完全な記述を与えるか、あるいはそれが不可能である場合には、所望のユニタリを局所的に最小の誤差で近似する線形系の完全な記述を与える逆問題の手法が含まれる。パッケージ内の関数には、線形系の古典的な散乱行列をビームスプリッターと位相シフト器のリストに変換する異なる既知の分解の実装と、n光子を持つ状態の量子進化を記述する有効ハミルトニアンを計算する方法が含まれる。このパッケージはランダム線形光学系の生成や行列対数計算などの有用なタスクのためのルーチンで完結している。ルーチンは、線形系の記述に現れるユニタリ行列を扱うとき、通常の数値問題を避けるように選択されている。 The manipulation of the quantum states of light in linear optical systems has multiple applications in quantum optics and quantum computation. The package QOptCraft gives a collection of methods to solve some of the most usual problems when designing quantum experiments with linear interferometers. The methods include functions that compute the quantum evolution matrix for n photons from the classical description of the system and inverse methods that, for any desired quantum evolution, will either give the complete description of the experimental system that realizes that unitary evolution or, when this is impossible, the complete description of the linear system which approximates the desired unitary with a locally minimal error. The functions in the package include implementations of different known decompositions that translate the classical scattering matrix of a linear system into a list of beam splitters and phase shifters and methods to compute the effective Hamiltonian that describes the quantum evolution of states with n photons. The package is completed with routines for useful tasks like generating random linear optical systems and computing matrix logarithms. The routines are chosen to avoid usual numerical problems when dealing with the unitary matrices that appear in the description of linear systems.	翻訳日:2023-07-27 16:52:21 公開日:2023-07-26
# 表形式データのための深層学習モデルの再検討 Revisiting Deep Learning Models for Tabular Data ( http://arxiv.org/abs/2106.11959v3 ) ライセンス: Link先を確認	Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko	(参考訳) 表形式のデータに対するディープラーニングに関する既存の文献は、幅広い新しいアーキテクチャを提案し、様々なデータセットで競争力のある結果を報告している。しかしながら、提案されたモデルは通常互いに適切に比較されておらず、既存研究はしばしば異なるベンチマークと実験プロトコルを使用している。その結果、研究者と実践者の両方にとって、どのモデルが最も優れているかは明らかでない。さらに、この分野には効果的なベースライン、すなわち様々な問題にまたがって競争力のある性能を提供する使いやすいモデルがまだない。本研究では、表形式データに対するDLアーキテクチャの主要なファミリーを概観し、2つの単純かつ強力な深層アーキテクチャを特定することで、表形式DLにおけるベースラインの水準を引き上げる。ひとつはResNetのようなアーキテクチャで、先行研究でしばしば欠落している強力なベースラインであることが分かった。第2のモデルは、表形式データに対するTransformerアーキテクチャの簡単な適応であり、ほとんどのタスクにおいて他のソリューションよりも優れている。どちらのモデルも、同じ訓練およびチューニングプロトコルの下で、多様なタスクにおいて多くの既存アーキテクチャと比較される。また、最良のDLモデルとGradient Boosted Decision Treesを比較し、普遍的に優れたソリューションはまだ存在しないと結論づける。 The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports competitive results on various datasets. However, the proposed models are usually not properly compared to each other and existing works often use different benchmarks and experiment protocols. As a result, it is unclear for both researchers and practitioners what models perform best. Additionally, the field still lacks effective baselines, that is, the easy-to-use models that provide competitive performance across different problems. In this work, we perform an overview of the main families of DL architectures for tabular data and raise the bar of baselines in tabular DL by identifying two simple and powerful deep architectures. The first one is a ResNet-like architecture which turns out to be a strong baseline that is often missing in prior works. The second model is our simple adaptation of the Transformer architecture for tabular data, which outperforms other solutions on most tasks. Both models are compared to many existing architectures on a diverse set of tasks under the same training and tuning protocols. We also compare the best DL models with Gradient Boosted Decision Trees and conclude that there is still no universally superior solution.	翻訳日:2023-07-27 16:51:58 公開日:2023-07-26
# 事前学習言語モデルの包括的比較 A Comprehensive Comparison of Pre-training Language Models ( http://arxiv.org/abs/2106.11483v9 ) ライセンス: Link先を確認	Tong Guo	(参考訳) 近年、事前学習言語モデルの発展により、自然言語処理(NLP)タスクは新たな最先端水準に到達している。本稿では、様々な事前学習言語モデルの効率性について検討する。同じテキスト量と同じ訓練ステップ数で、Transformerベースのモデル群を事前学習する。実験結果から、元のBERTに対する最大の改善は、短文理解のためにより多くの文脈情報を捉えるRNN層を追加することであった。しかし結論としては、類似したBERT構造の間では短文理解に顕著な改善は見られない。データ中心の手法[12]はより良い性能を達成できる。 Recently, the development of pre-trained language models has brought natural language processing (NLP) tasks to the new state-of-the-art. In this paper we explore the efficiency of various pre-trained language models. We pre-train a list of transformer-based models with the same amount of text and the same training steps. The experimental results show that the most improvement upon the original BERT is adding the RNN-layer to capture more contextual information for short text understanding. But the conclusion is: there is no remarkable improvement for short text understanding for similar BERT structures. Data-centric method[12] can achieve better performance.	翻訳日:2023-07-27 16:51:40 公開日:2023-07-26
# ディープラーニングに基づく3次元セグメンテーション:調査 Deep Learning Based 3D Segmentation: A Survey ( http://arxiv.org/abs/2103.05423v3 ) ライセンス: Link先を確認	Yong He, Hongshan Yu, Xiaoyan Liu, Zhengeng Yang, Wei Sun, Ajmal Mian	(参考訳) 3Dセグメンテーションは、自動運転、ロボティクス、拡張現実、医療画像解析などの応用を含む、コンピュータビジョンにおける基本的かつ困難な問題である。コンピュータビジョン、グラフィックス、機械学習のコミュニティから大きな注目を集めている。手作り特徴と機械学習分類器に基づく従来の3Dセグメンテーション手法は、汎化能力に欠ける。 2Dコンピュータビジョンでの成功を受けて、ディープラーニング技術は最近、3Dセグメンテーションタスクの定番のツールとなっている。これにより、さまざまなベンチマークデータセットで評価された多くの手法が文献に登場した。 RGB-Dとポイントクラウドのセグメンテーションに関する調査論文は存在するが、すべての3Dデータモダリティとアプリケーションドメインをカバーする詳細かつ最新の調査は存在しない。本稿では、このギャップを埋め、ディープラーニングに基づく3Dセグメンテーションにおける最近の進歩を包括的に調査する。 180以上の研究を取り上げ、その強みと限界を分析し、ベンチマークデータセットにおける競争力のある結果について論じている。この調査は、最も一般的に使用されているパイプラインの概要を提供し、最後に将来有望な研究方向性を強調している。 3D segmentation is a fundamental and challenging problem in computer vision with applications in autonomous driving, robotics, augmented reality and medical image analysis. It has received significant attention from the computer vision, graphics and machine learning communities. Conventional methods for 3D segmentation, based on hand-crafted features and machine learning classifiers, lack generalization ability. Driven by their success in 2D computer vision, deep learning techniques have recently become the tool of choice for 3D segmentation tasks. This has led to an influx of a large number of methods in the literature that have been evaluated on different benchmark datasets. Whereas survey papers on RGB-D and point cloud segmentation exist, there is a lack of an in-depth and recent survey that covers all 3D data modalities and application domains. This paper fills the gap and provides a comprehensive survey of the recent progress made in deep learning based 3D segmentation. It covers over 180 works, analyzes their strengths and limitations and discusses their competitive results on benchmark datasets. The survey provides a summary of the most commonly used pipelines and finally highlights promising research directions for the future.	翻訳日:2023-07-27 16:51:30 公開日:2023-07-26
# 静的・動的シナリオにおけるモノガミーの出現 Emergence of Monogamy under Static and Dynamic Scenarios ( http://arxiv.org/abs/2102.04940v2 ) ライセンス: Link先を確認	Rivu Gupta, Saptarshi Roy, Shiladitya Mal, Aditi Sen De	(参考訳) 2者を超える多者間量子相関を特徴付けることは、最先端の量子技術を構築する上で極めて重要であるが、その全体像はまだ明らかになっていない。本稿では、多者系に存在する量子相関(QC)について、モノガミースコア(MS)、局在化可能量子相関(LQC)、および状態の真の多者間エンタングルメント(GME)の間の関係を探究する。高励起のディック状態に対するGMEの頻度分布が、ランダム状態の頻度分布に類似していることを見出した。GMEには、それを超えるとすべての状態がモノガミーになる臨界値が存在することを示し、様々な階層のモノガミー関係を与えるMSの異なるべきを考慮してこれを調べる。興味深いことに、LQCとMS、およびLQCとGMEの間にはこのような関係は成り立たない。非常に低いGME(正負いずれの場合も低いモノガミースコア)を持つ状態が、2者に多量のQCを局在化できる。また、ランダム状態に対して、LQCを含む2者間QC測度の和に対する上界を与え、実際の上界と代数的最大値の間のギャップを確立する。 Characterizing multipartite quantum correlations beyond two parties is of utmost importance for building cutting edge quantum technologies, although the comprehensive picture is still missing. Here we investigate quantum correlations (QCs) present in a multipartite system by exploring connections between monogamy score (MS), localizable quantum correlations (LQC), and genuine multipartite entanglement (GME) content of the state. We find that the frequency distribution of GME for Dicke states with higher excitations resembles that of random states. We show that there is a critical value of GME beyond which all states become monogamous and it is investigated by considering different powers of MS which provide various layers of monogamy relations. Interestingly, such a relation between LQC and MS as well as GME does not hold. States having a very low GME (low monogamy score, both positive and negative) can localize a high amount of QCs in two parties. We also provide an upper bound to the sum of bipartite QC measures including LQC for random states and establish a gap between the actual upper bound and the algebraic maximum.	翻訳日:2023-07-27 16:51:13 公開日:2023-07-26
# スーパーデンス符号化の剛性 Rigidity of superdense coding ( http://arxiv.org/abs/2012.01672v2 ) ライセンス: Link先を確認	Ashwin Nayak and Henry Yuen	(参考訳) BennettとWiesnerの有名なスーパーデンス符号化プロトコルは、共有EPRペアを用いることで、1つの量子ビットを送信するだけで2ビットの古典情報を伝えられることを示している。最初の結果は、このタスクを達成する任意のプロトコル(送信者の符号化操作や共有エンタングル状態の次元に何の仮定も置かない)が、標準的なBennett-Wiesnerプロトコルと局所的に等価であるということである。言い換えれば、スーパーデンス符号化タスクは剛的(rigid)である。特に、送信側と受信側は、(EPRペアを超える)追加のエンタングルメントを古典的ランダム性の源としてのみ使用することを示す。また、一般の次元$d$において、$d$次元の量子状態を送信することで$d^2$通りのメッセージの1つを伝えることを目標とする、高次元のスーパーデンス符号化に関するいくつかの問題についても検討する。$d=2$の場合(つまり1つの量子ビットを送信する場合)とは異なり、より大きな$d$に対しては非等価なスーパーデンス符号化プロトコルが存在しうる。すべての$d > 2$に対する非等価な直交ユニタリ基底の構成に基づき、非等価なプロトコルの具体的な構成を示す。最後に、符号化演算子がユニタリ群上のハール測度から独立にサンプリングされるスーパーデンス符号化プロトコルの性能を解析する。この解析は、ランダムな最大エンタングル状態の識別可能性の上界評価を伴い、それ自体独立した関心の対象となりうる。 The famous superdense coding protocol of Bennett and Wiesner demonstrates that it is possible to communicate two bits of classical information by sending only one qubit and using a shared EPR pair. Our first result is that an arbitrary protocol for achieving this task (where there are no assumptions on the sender's encoding operations or the dimension of the shared entangled state) is locally equivalent to the canonical Bennett-Wiesner protocol. In other words, the superdense coding task is rigid. In particular, we show that the sender and receiver only use additional entanglement (beyond the EPR pair) as a source of classical randomness. We also investigate several questions about higher-dimensional superdense coding, where the goal is to communicate one of $d^2$ possible messages by sending a $d$-dimensional quantum state, for general dimensions $d$. Unlike the $d=2$ case (i.e. sending a single qubit), there can be inequivalent superdense coding protocols for higher $d$. We present concrete constructions of inequivalent protocols, based on constructions of inequivalent orthogonal unitary bases for all $d > 2$. Finally, we analyze the performance of superdense coding protocols where the encoding operators are independently sampled from the Haar measure on the unitary group. Our analysis involves bounding the distinguishability of random maximally entangled states, which may be of independent interest.	翻訳日:2023-07-27 16:50:52 公開日:2023-07-26
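To make the canonical protocol concrete, here is a minimal state-vector simulation of Bennett-Wiesner superdense coding with NumPy: Alice applies one of four Pauli operators to her half of $|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$, and Bob recovers both bits by measuring in the Bell basis. It illustrates only the standard $d=2$ protocol, not the paper's rigidity argument; the bit-to-Pauli assignment is one common convention chosen for illustration.

```python
import numpy as np

# Single-qubit Pauli operators used as Alice's encodings
I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
ENCODE = {(0, 0): I, (0, 1): X, (1, 0): Z, (1, 1): Z @ X}

# Shared EPR pair |Phi+> = (|00> + |11>)/sqrt(2); the first qubit is Alice's
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)

# The four encoded states form the Bell basis Bob measures in
BELL = {bits: np.kron(U, I) @ phi_plus for bits, U in ENCODE.items()}

def send(bits):
    """Alice applies her Pauli to her half of the pair, then sends that one qubit."""
    return np.kron(ENCODE[bits], I) @ phi_plus

def decode(state):
    """Bob measures in the Bell basis; the outcome is deterministic here."""
    probs = {bits: abs(np.vdot(b, state)) ** 2 for bits, b in BELL.items()}
    return max(probs, key=probs.get)

for bits in ENCODE:
    print(bits, decode(send(bits)))  # each 2-bit message is recovered exactly
```

Because the four encoded states are mutually orthogonal, each message is identified with probability 1, which is why one transmitted qubit carries two classical bits.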
# 深層画像復元・強調における事前知識:調査 Priors in Deep Image Restoration and Enhancement: A Survey ( http://arxiv.org/abs/2206.02070v2 ) ライセンス: Link先を確認	Yunfan Lu, Yiqi Lin, Hao Wu, Yunhao Luo, Xu Zheng, Hui Xiong, Lin Wang	(参考訳) 画像の復元と強調は、ノイズ、ぼかし、解像度劣化などの劣化を取り除くことによって画質を改善するプロセスである。深層学習(DL)は近年、画像復元と強調に応用されている。その不良設定性のため、深層ニューラルネットワーク(DNN)の訓練を容易にする事前知識を探究した研究が数多く行われている。しかし、事前知識の重要性は、これまで研究コミュニティにおいて体系的に研究・分析されていない。そこで本稿は、深層画像復元・強調のための事前知識に関する最近の進歩を包括的に概観する最初の研究である。本研究は次の5つの主要な内容を含む:(1) 深層画像復元・強調のための事前知識の理論的分析、(2) DLベースの手法で一般的に用いられる事前知識の階層的・構造的分類、(3) 各事前知識の原理、可能性、応用に関する洞察に富んだ議論、(4) コミュニティにおける更なる研究を喚起するため、特に大規模基盤モデルを事前知識として採用することを含む将来の方向性を強調した重要課題のまとめ、(5) 言及したすべての研究の分類とコードへのリンクを提供するオープンソースのリポジトリ。 Image restoration and enhancement is a process of improving the image quality by removing degradations, such as noise, blur, and resolution degradation. Deep learning (DL) has recently been applied to image restoration and enhancement. Due to its ill-posed property, plenty of works have explored priors to facilitate training deep neural networks (DNNs). However, the importance of priors has not been systematically studied and analyzed so far in the research community. Therefore, this paper serves as the first study that provides a comprehensive overview of recent advancements in priors for deep image restoration and enhancement. Our work covers five primary contents: (1) A theoretical analysis of priors for deep image restoration and enhancement; (2) A hierarchical and structural taxonomy of priors commonly used in the DL-based methods; (3) An insightful discussion on each prior regarding its principle, potential, and applications; (4) A summary of crucial problems by highlighting the potential future directions, especially adopting the large-scale foundation models as prior, to spark more research in the community; (5) An open-source repository that provides a taxonomy of all mentioned works and code links.	翻訳日:2023-07-27 16:45:18 公開日:2023-07-26
# 推薦における公平性:基礎,方法,応用 Fairness in Recommendation: Foundations, Methods and Applications ( http://arxiv.org/abs/2205.13619v5 ) ライセンス: Link先を確認	Yunqi Li, Hanxiong Chen, Shuyuan Xu, Yingqiang Ge, Juntao Tan, Shuchang Liu, Yongfeng Zhang	(参考訳) 機械学習の最も普及している応用の1つとして、推薦システムは人間の意思決定を支援する上で重要な役割を果たしている。ユーザの満足度とプラットフォームの利益は、生成された推薦結果の品質と密接に関連している。しかし、高度にデータ駆動のシステムであるため、推薦システムはデータやアルゴリズムのバイアスの影響を受けて不公平な結果を生成し、システムへの信頼を損なう可能性がある。そのため、推薦設定における潜在的な不公平性の問題に対処することが重要である。近年、推薦システムにおける公平性への配慮が注目され、推薦の公平性を促進するアプローチに関する文献が増えている。しかし、研究はかなり断片化されており、体系的な整理を欠いているため、新たな研究者がこの分野に参入することは困難である。これが、推薦における公平性に関する既存研究の体系的なサーベイを行う動機となった。本調査は、推薦に関する文献における公平性の基盤に焦点を当てる。まず、公平性研究の全体像を示すため、分類やランキングといった基本的な機械学習タスクにおける公平性を簡単に紹介し、推薦システムにおける公平性を研究する際に考慮すべき、より複雑な状況と課題を紹介する。その後、現在の公平性定義の分類法、公平性を改善するための典型的な手法、推薦における公平性研究のためのデータセットに焦点を当てて、推薦における公平性を紹介する。また、公平性研究の課題と機会についても述べ、公平な推薦の研究分野とその先の発展を促すことを目指している。 As one of the most pervasive applications of machine learning, recommender systems are playing an important role in assisting human decision making. The satisfaction of users and the interests of platforms are closely related to the quality of the generated recommendation results. However, as a highly data-driven system, recommender system could be affected by data or algorithmic bias and thus generate unfair results, which could weaken the reliance on the systems. As a result, it is crucial to address the potential unfairness problems in recommendation settings. Recently, there has been growing attention on fairness considerations in recommender systems with more and more literature on approaches to promote fairness in recommendation. However, the studies are rather fragmented and lack a systematic organization, thus making it difficult for new researchers to enter the domain. This motivates us to provide a systematic survey of existing works on fairness in recommendation. This survey focuses on the foundations for fairness in recommendation literature. It first presents a brief introduction about fairness in basic machine learning tasks such as classification and ranking in order to provide a general overview of fairness research, as well as introduce the more complex situations and challenges that need to be considered when studying fairness in recommender systems. After that, the survey will introduce fairness in recommendation with a focus on the taxonomies of current fairness definitions, the typical techniques for improving fairness, as well as the datasets for fairness studies in recommendation. The survey also talks about the challenges and opportunities in fairness research with the hope of promoting the fair recommendation research area and beyond.	翻訳日:2023-07-27 16:45:02 公開日:2023-07-26
# フェデレート学習のためのロバスト量認識集約 Robust Quantity-Aware Aggregation for Federated Learning ( http://arxiv.org/abs/2205.10848v2 ) ライセンス: Link先を確認	Jingwei Yi, Fangzhao Wu, Huishuai Zhang, Bin Zhu, Tao Qi, Guangzhong Sun, Xing Xie	(参考訳) Federated Learning(FL)は、複数のクライアントがローカルデータを共有せずに協調的にモデルを訓練することを可能にし、重要なプライバシー保護機械学習フレームワークとなっている。しかし、古典的なFLは深刻なセキュリティと堅牢性の問題に直面しており、例えば、悪意のあるクライアントはモデル更新を汚染すると同時に、大きなデータ量を申告してモデル集約における自身のモデル更新の影響を増幅できる。FLの既存の防御手法は、いずれも悪意のあるモデル更新に対処する一方で、すべての量を良性とみなすか、すべてのクライアントの量を単に無視/切り捨てる。前者は量増強攻撃に脆弱であり、後者は、異なるクライアント上のローカルデータの規模が通常大きく異なるため、最適でない性能をもたらす。本稿では、ローカルデータ量を考慮して集約を行いつつ量増強攻撃を防御できる、FedRAと呼ばれるフェデレーション学習のためのロバストな量認識集約アルゴリズムを提案する。具体的には、異なるクライアントからアップロードされたモデル更新とデータ量を併せて考慮して悪意のあるクライアントを除外し、残りのクライアントのモデル更新に対して量を考慮した重み付き平均を行う手法を提案する。さらに、フェデレーション学習に参加する悪意のあるクライアントの数はラウンドごとに動的に変化しうるため、各ラウンドで除外すべき不審なクライアント数を予測する悪意のあるクライアント数推定器も提案する。4つの公開データセットを用いた実験により、量増強攻撃からFLを防御するFedRA法の有効性が実証された。 Federated learning (FL) enables multiple clients to collaboratively train models without sharing their local data, and becomes an important privacy-preserving machine learning framework. However, classical FL faces serious security and robustness problem, e.g., malicious clients can poison model updates and at the same time claim large quantities to amplify the impact of their model updates in the model aggregation. Existing defense methods for FL, while all handling malicious model updates, either treat all quantities benign or simply ignore/truncate the quantities of all clients. The former is vulnerable to quantity-enhanced attack, while the latter leads to sub-optimal performance since the local data on different clients is usually in significantly different sizes. In this paper, we propose a robust quantity-aware aggregation algorithm for federated learning, called FedRA, to perform the aggregation with awareness of local data quantities while being able to defend against quantity-enhanced attacks. More specifically, we propose a method to filter malicious clients by jointly considering the uploaded model updates and data quantities from different clients, and performing quantity-aware weighted averaging on model updates from remaining clients. Moreover, as the number of malicious clients participating in the federated learning may dynamically change in different rounds, we also propose a malicious client number estimator to predict how many suspicious clients should be filtered in each round. Experiments on four public datasets demonstrate the effectiveness of our FedRA method in defending FL against quantity-enhanced attacks.	翻訳日:2023-07-27 16:44:37 公開日:2023-07-26
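The vulnerability the abstract describes can be seen with plain quantity-weighted averaging (the FedAvg-style rule that weights each client by its reported sample count). The sketch below shows only that naive rule and how an inflated quantity claim lets one client dominate; it is not FedRA, whose filtering step and malicious-client-number estimator are not reproduced here. All numbers are hypothetical.

```python
import numpy as np

def quantity_weighted_average(updates, quantities):
    """Average client updates weighted by their *claimed* data quantities.

    updates: list of 1-D arrays (flattened model updates), one per client.
    quantities: positive sample counts reported by the clients (unverified).
    """
    weights = np.asarray(quantities, dtype=float)
    weights /= weights.sum()
    return np.sum([w * u for w, u in zip(weights, updates)], axis=0)

benign = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]
poisoned = np.array([-1.0, -1.0])

honest = quantity_weighted_average(benign + [poisoned], [100, 100, 100])
inflated = quantity_weighted_average(benign + [poisoned], [100, 100, 9800])
print(honest)    # about [0.33, 0.33]: benign majority still wins
print(inflated)  # about [-0.96, -0.96]: the inflated claim dominates
```

Ignoring quantities altogether would blunt this attack but, as the abstract notes, discards useful information when honest clients genuinely hold very different amounts of data; FedRA is proposed to keep the weighting while filtering suspicious clients first.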
# タブラルディープラーニングにおける数値的特徴の埋め込みについて On Embeddings for Numerical Features in Tabular Deep Learning ( http://arxiv.org/abs/2203.05556v3 ) ライセンス: Link先を確認	Yury Gorishniy and Ivan Rubachev and Artem Babenko	(参考訳) 近年、Transformerのような深層アーキテクチャは表形式データの問題に対して高い性能を示している。MLPのような従来のモデルとは異なり、これらのアーキテクチャは数値特徴のスカラー値を、メインのバックボーンで混合する前に高次元の埋め込みに写像する。本研究では、数値特徴の埋め込みは表形式DLにおいて十分に探究されていない自由度であり、これによってより強力なDLモデルを構築し、従来GBDTが得意としてきた一部のベンチマークでGBDTと競合できると主張する。まず、埋め込みモジュールを構築するための概念的に異なる2つのアプローチについて説明する。1つはスカラー値の区分線形符号化に基づくもので、もう1つは周期的活性化を利用する。次に、これら2つのアプローチが、線形層やReLU活性化といった従来のブロックに基づく埋め込みと比較して、大幅な性能向上につながりうることを実証する。重要なのは、数値特徴の埋め込みがTransformerだけでなく多くのバックボーンにとって有益であることも示す点である。具体的には、適切な埋め込みを用いれば、単純なMLPのようなモデルが注意機構に基づくアーキテクチャと同等の性能を発揮する。全体として、数値特徴の埋め込みを、表形式DLのさらなる改善の大きな可能性を持つ重要な設計要素として強調する。 Recently, Transformer-like deep architectures have shown strong performance on tabular data problems. Unlike traditional models, e.g., MLP, these architectures map scalar values of numerical features to high-dimensional embeddings before mixing them in the main backbone. In this work, we argue that embeddings for numerical features are an underexplored degree of freedom in tabular DL, which allows constructing more powerful DL models and competing with GBDT on some traditionally GBDT-friendly benchmarks. We start by describing two conceptually different approaches to building embedding modules: the first one is based on a piecewise linear encoding of scalar values, and the second one utilizes periodic activations. Then, we empirically demonstrate that these two approaches can lead to significant performance boosts compared to the embeddings based on conventional blocks such as linear layers and ReLU activations. Importantly, we also show that embedding numerical features is beneficial for many backbones, not only for Transformers. Specifically, after proper embeddings, simple MLP-like models can perform on par with the attention-based architectures. Overall, we highlight embeddings for numerical features as an important design aspect with good potential for further improvements in tabular DL.	翻訳日:2023-07-27 16:44:12 公開日:2023-07-26
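The two embedding families named in the abstract can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the piecewise linear encoding clips every component to $[0, 1]$ (the paper's exact handling of the outermost bins may differ), the bin edges are hypothetical, and the periodic coefficients are passed in as fixed values rather than learned parameters.

```python
import numpy as np

def piecewise_linear_encoding(x, edges):
    """Simplified piecewise linear encoding of scalar feature values.

    x: 1-D array of values; edges: sorted bin boundaries b_0 < ... < b_T.
    Component t is the fraction of bin t lying below x, clipped to [0, 1],
    so the output has shape (len(x), T).
    """
    x = np.asarray(x, dtype=float)[:, None]
    left = np.asarray(edges[:-1], dtype=float)
    right = np.asarray(edges[1:], dtype=float)
    return np.clip((x - left) / (right - left), 0.0, 1.0)

def periodic_embedding(x, coefficients):
    """Periodic embedding: concat(sin(2*pi*c*x), cos(2*pi*c*x)) over coefficients c."""
    v = 2 * np.pi * np.asarray(x, dtype=float)[:, None] * np.asarray(coefficients, dtype=float)[None, :]
    return np.concatenate([np.sin(v), np.cos(v)], axis=1)

edges = [0.0, 1.0, 2.0, 4.0]  # e.g. quantile-based bin boundaries (hypothetical)
print(piecewise_linear_encoding([0.5, 3.0], edges))
# rows: [0.5, 0.0, 0.0] and [1.0, 1.0, 0.5]
```

Either output would then typically be passed through a small learned layer before entering the backbone, which is where these embeddings replace a plain scalar input.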
# MICDIR: 自己構築グラフラテント付きUNetMSSを用いたマルチスケール逆整合デフォルマブルイメージレジストレーション MICDIR: Multi-scale Inverse-consistent Deformable Image Registration using UNetMSS with Self-Constructing Graph Latent ( http://arxiv.org/abs/2203.04317v2 ) ライセンス: Link先を確認	Soumick Chatterjee, Himanshi Bajaj, Istiyak H. Siddiquee, Nandish Bandi Subbarayappa, Steve Simon, Suraj Bangalore Shashidhar, Oliver Speck and Andreas Nürnberge	(参考訳) 画像レジストレーションとは、異なる画像を共通の座標系に揃える処理であり、リモートセンシング、画像検索、そして最も一般的には医用画像など、コンピュータビジョンの様々な応用で広く使われている技術である。深層学習に基づく技術は、医用画像レジストレーションを含む様々な複雑な医用画像処理問題への取り組みに成功している。長年にわたり、深層学習を用いた画像レジストレーション手法がいくつも提案されてきた。VoxelMorphのような変形可能な画像レジストレーション手法は、より細かい変化を捉え、より滑らかな変形を提供することに成功している。しかしながら、VoxelMorphはICNetやFIREと同様に、グローバルな依存関係(すなわち入力画像の全体的な解剖学的ビュー)を明示的にエンコードしておらず、そのため大きな変形を追跡できない。上記の問題に取り組むため、本稿ではVoxelMorphのアプローチを3つの方法で拡張する。小さな変形と大きな変形の両方で性能を向上させるため、マルチスケールのUNetを用いて異なる解像度でのモデルの監視を統合した。与えられた画像対の微細な構造的相関関係をネットワークが学習・符号化するのを助けるため、自己構築グラフネットワーク(SCGNet)をマルチスケールUNetの潜在表現として用いた。これによりモデルの学習過程が改善され、汎化性能の向上に役立ちうる。そして最後に、変形を逆整合にするために、サイクル一貫性損失を採用した。脳MRIのレジストレーションタスクにおいて、提案手法はANTsおよびVoxelMorphに比べて大幅な改善を達成し、Diceスコアはモダリティ内で$0.8013 \pm 0.0243$、モダリティ間で$0.6211 \pm 0.0309$であった(VoxelMorphはそれぞれ$0.7747 \pm 0.0260$、$0.6071 \pm 0.0510$)。 Image registration is the process of bringing different images into a common coordinate system - a technique widely used in various applications of computer vision, such as remote sensing, image retrieval, and, most commonly, medical imaging. Deep learning based techniques have been applied successfully to tackle various complex medical image processing problems, including medical image registration. Over the years, several image registration techniques have been proposed using deep learning. Deformable image registration techniques such as Voxelmorph have been successful in capturing finer changes and providing smoother deformations. However, Voxelmorph, as well as ICNet and FIRE, do not explicitly encode global dependencies (i.e. the overall anatomical view of the supplied image) and, therefore, cannot track large deformations. In order to tackle the aforementioned problems, this paper extends the Voxelmorph approach in three different ways. To improve the performance in case of small as well as large deformations, supervision of the model at different resolutions has been integrated using a multi-scale UNet. To support the network to learn and encode the minute structural correlations of the given image-pairs, a self-constructing graph network (SCGNet) has been used as the latent of the multi-scale UNet - which can improve the learning process of the model and help the model to generalise better. And finally, to make the deformations inverse-consistent, cycle consistency loss has been employed. On the task of registration of brain MRIs, the proposed method achieved significant improvements over ANTs and VoxelMorph, obtaining a Dice score of $0.8013 \pm 0.0243$ for intramodal and $0.6211 \pm 0.0309$ for intermodal, while VoxelMorph achieved $0.7747 \pm 0.0260$ and $0.6071 \pm 0.0510$, respectively.	翻訳日:2023-07-27 16:43:55 公開日:2023-07-26
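The Dice scores reported above measure overlap between segmentations after registration. A minimal sketch of the standard Dice coefficient $2|A \cap B| / (|A| + |B|)$ for binary masks follows; registration studies usually compute it per anatomical label and average, and whether the paper averages in exactly that way is not stated in the abstract. The masks here are toy values.

```python
import numpy as np

def dice_score(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect overlap (a convention)
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy 2-D masks: fixed-image label vs. warped moving-image label
fixed = np.array([[1, 1, 0], [0, 1, 0]])
warped = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_score(fixed, warped))  # 2 * 2 / (3 + 3), about 0.667
```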
# 20モードユニバーサル量子フォトニックプロセッサ 20-Mode Universal Quantum Photonic Processor ( http://arxiv.org/abs/2203.01801v5 ) ライセンス: Link先を確認	Caterina Taballione, Malaquias Correa Anguita, Michiel de Goede, Pim Venderbosch, Ben Kassenberg, Henk Snijders, Narasimhan Kannan, Ward L. Vleeshouwers, Devin Smith, Jörn P. Epping, Reinier van der Meer, Pepijn W. H. Pinkse, Hans van den Vlekkert, Jelmer J. Renema	(参考訳) 集積フォトニクスは光量子コンピューティングに不可欠な技術である。ユニバーサルで位相安定な再構成可能マルチモード干渉計(量子フォトニックプロセッサ)はフォトニック量子状態の操作を可能にし、様々なアーキテクチャにおけるフォトニック量子コンピュータの主要なコンポーネントの一つである。本稿では、これまでで最大の量子フォトニックプロセッサの実現について報告する。このプロセッサは20個の入力モードに対して任意のユニタリ変換を可能とし、振幅忠実度はHaarランダム行列と置換行列に対してそれぞれ$F_{\text{Haar}} = 97.4\%$、$F_{\text{Perm}} = 99.5\%$であり、全モード平均の光損失は2.9 dB、量子干渉の可視度は$V_{\text{HOM}}=98\%$と高い。プロセッサは$\mathrm{Si_3N_4}$導波路で実現され、ペルチェ素子によって能動的に冷却される。 Integrated photonics is an essential technology for optical quantum computing. Universal, phase-stable, reconfigurable multimode interferometers (quantum photonic processors) enable manipulation of photonic quantum states and are one of the main components of photonic quantum computers in various architectures. In this paper, we report the realization of the largest quantum photonic processor to date. The processor enables arbitrary unitary transformations on its 20 input modes with an amplitude fidelity of $F_{\text{Haar}} = 97.4\%$ and $F_{\text{Perm}} = 99.5\%$ for Haar-random and permutation matrices, respectively, an optical loss of 2.9 dB averaged over all modes, and high-visibility quantum interference with $V_{\text{HOM}}=98\%$. The processor is realized in $\mathrm{Si_3N_4}$ waveguides and is actively cooled by a Peltier element.	翻訳日:2023-07-27 16:43:18 公開日:2023-07-26
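The two families of test matrices mentioned above (Haar-random unitaries and permutation matrices) can be generated with NumPy as sketched below. The Haar sampler is the standard QR-with-phase-correction recipe (Mezzadri, 2007); it is an assumption that the authors sampled their targets this way, and the amplitude-fidelity metric itself is not implemented here.

```python
import numpy as np

def haar_random_unitary(n, rng=None):
    """Sample an n x n unitary from the Haar measure via QR decomposition.

    QR of a complex Ginibre matrix gives a unitary Q, but Q is only
    Haar-distributed after fixing the phases of R's diagonal.
    """
    rng = np.random.default_rng(rng)
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # multiply column j by the phase of r[j, j]

def random_permutation_matrix(n, rng=None):
    """Random n x n permutation matrix (rows of the identity, shuffled)."""
    rng = np.random.default_rng(rng)
    return np.eye(n)[rng.permutation(n)]

U = haar_random_unitary(20, rng=0)  # 20 modes, matching the processor size
P = random_permutation_matrix(20, rng=1)
print(np.allclose(U.conj().T @ U, np.eye(20)))  # unitarity check
```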
# MetaDT:解釈可能なFew-Shot学習のためのクラス階層を持つメタ決定木 MetaDT: Meta Decision Tree with Class Hierarchy for Interpretable Few-Shot Learning ( http://arxiv.org/abs/2203.01482v2 ) ライセンス: Link先を確認	Baoquan Zhang, Hao Jiang, Xutao Li, Shanshan Feng, Yunming Ye, Rui Ye	(参考訳) FSL(Few-Shot Learning)は、少数の例から新しいクラスを認識することを目的とした困難な課題である。近年、メタ学習や表現学習の観点から多くの手法が提案されている。しかし、FSLの決定プロセスの解釈可能性に焦点を当てた研究はほとんどない。本稿では、新しいメタ学習ベースの決定木フレームワークであるMetaDTを提案することで、解釈可能なFSLへの一歩を踏み出す。特に、FSLの解釈可能性は概念的側面と視覚的側面という2つの側面から達成される。概念面では、まずFSLの事前知識として木構造の概念階層を導入する。次に、この事前知識に基づいて、各few-shotタスクを異なる概念レベルを持つサブタスク群に分割し、決定木モデルを用いてクラス予測を行う。このような設計の利点は、最終的なクラス予測に至る一連の高レベルな概念決定が得られ、FSLの決定プロセスが明確になることである。視覚面では、視覚的注意機構を備えたサブタスク固有の分類器群を設計し、決定木の各ノードで決定を行う。その結果、サブタスク固有のヒートマップ可視化が得られ、各ツリーノードの決定の解釈可能性が得られる。最後に、FSLのデータ不足の問題を緩和するため、概念階層の事前知識を無向グラフとみなし、グラフ畳み込みに基づく決定木推論ネットワークをメタラーナーとして設計し、決定木のパラメータを推論する。性能比較および解釈可能性分析に関する大規模な実験は、MetaDTの優位性を示している。 Few-Shot Learning (FSL) is a challenging task, which aims to recognize novel classes with few examples. Recently, lots of methods have been proposed from the perspective of meta-learning and representation learning. However, few works focus on the interpretability of FSL decision process. In this paper, we take a step towards the interpretable FSL by proposing a novel meta-learning based decision tree framework, namely, MetaDT. In particular, the FSL interpretability is achieved from two aspects, i.e., a concept aspect and a visual aspect. On the concept aspect, we first introduce a tree-like concept hierarchy as FSL prior. Then, resorting to the prior, we split each few-shot task into a set of subtasks with different concept levels and then perform class prediction via a model of decision tree. The advantage of such design is that a sequence of high-level concept decisions that lead up to a final class prediction can be obtained, which clarifies the FSL decision process. On the visual aspect, a set of subtask-specific classifiers with visual attention mechanism is designed to perform decision at each node of the decision tree. As a result, a subtask-specific heatmap visualization can be obtained to achieve the decision interpretability of each tree node. At last, to alleviate the data scarcity issue of FSL, we regard the prior of concept hierarchy as an undirected graph, and then design a graph convolution-based decision tree inference network as our meta-learner to infer parameters of the decision tree. Extensive experiments on performance comparison and interpretability analysis show the superiority of our MetaDT.	翻訳日:2023-07-27 16:42:57 公開日:2023-07-26
# モデル比較と校正評価 : 機械学習とアクチュアリー実務における一致性のあるスコア関数のためのユーザガイド Model Comparison and Calibration Assessment: User Guide for Consistent Scoring Functions in Machine Learning and Actuarial Practice ( http://arxiv.org/abs/2202.12780v3 ) ライセンス: Link先を確認	Tobias Fissler, Christian Lorentzen, Michael Mayer	(参考訳) アクチュアリーとデータサイエンティストの主なタスクの1つは、保険におけるクレーム額やクレーム件数といった特定の現象に対する優れた予測モデルを構築することである。これらのモデルは与えられた特徴情報を理想的に活用し、予測の精度を高める。このユーザガイドは、一方ではモデルのキャリブレーションや妥当性を評価し、他方では異なるモデルを比較しランク付けするための統計的手法を再検討し、明確化する。その際、予測対象の汎関数（例えば平均や分位点）を事前に指定することと、モデル比較においてこの対象汎関数に沿ったスコア関数を選択することの重要性を強調する。スコア関数の実用的な選択のためのガイダンスが提供される。応用における科学と日常の実務のギャップを埋めることを目指し、主に既存の成果とベストプラクティスの教育的な提示に焦点を当てている。結果は、労災補償と顧客離反に関する2つの実データのケーススタディによって説明される。 One of the main tasks of actuaries and data scientists is to build good predictive models for certain phenomena such as the claim size or the number of claims in insurance. These models ideally exploit given feature information to enhance the accuracy of prediction. This user guide revisits and clarifies statistical techniques to assess the calibration or adequacy of a model on the one hand, and to compare and rank different models on the other hand. In doing so, it emphasises the importance of specifying the prediction target functional at hand a priori (e.g. the mean or a quantile) and of choosing the scoring function in model comparison in line with this target functional. Guidance for the practical choice of the scoring function is provided. Striving to bridge the gap between science and daily practice in application, it focuses mainly on the pedagogical presentation of existing results and of best practice. The results are accompanied and illustrated by two real data case studies on workers' compensation and customer churn.	翻訳日:2023-07-27 16:42:34 公開日:2023-07-26
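The point about choosing the scoring function to match the target functional can be checked numerically. Squared error is consistent for the mean. The pinball (quantile) loss with level alpha is consistent for the alpha-quantile. In the sketch below, each loss is minimized over constant predictions on a skewed sample, and each minimizer lands on "its" functional. The exponential sample is simulated data chosen for illustration, not data from the paper's case studies.

```python
import numpy as np


def squared_error(z: float, y: np.ndarray) -> float:
    # Strictly consistent scoring function for the mean.
    return float(np.mean((z - y) ** 2))


def pinball_loss(z: float, y: np.ndarray, alpha: float) -> float:
    # Consistent scoring function for the alpha-quantile.
    diff = y - z
    return float(np.mean(np.maximum(alpha * diff, (alpha - 1.0) * diff)))


rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=20_000)  # right-skewed, like claim sizes

# Brute-force search over constant predictions z.
grid = np.linspace(0.0, 10.0, 2001)
best_mean = grid[np.argmin([squared_error(z, y) for z in grid])]
best_q90 = grid[np.argmin([pinball_loss(z, y, 0.9) for z in grid])]

print(best_mean, y.mean())            # both close to 2
print(best_q90, np.quantile(y, 0.9))  # both close to 2 * ln(10), about 4.6
```

Ranking models by squared error therefore implicitly targets the mean. A model built to predict the 90% quantile would be unfairly penalized under that score.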
# SIMMC 2.0チャレンジにおける曖昧さ検出と共参照解決のためのマルチモーダル表現の探索 Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge ( http://arxiv.org/abs/2202.12645v2 ) ライセンス: Link先を確認	Javier Chiyah-Garcia and Alessandro Suglia and José Lopes and Arash Eshghi and Helen Hastie	(参考訳) 代名詞や指示記述などの照応表現は、先行するターンの言語的文脈や、その場の視覚環境に関連して位置づけられる。しかし、話者の指示記述が必ずしも指示対象を一意に特定するとは限らないため、曖昧さが生じ、その後の明確化のやり取りによる解決が必要になる。したがって、会話型AIにおけるタスク成功の鍵は、効果的な曖昧さ検出と共参照解決である。本稿では,SIMMC 2.0 チャレンジ (Kottur et al. 2021) の一環として,これら2つのタスクのモデルを提案する。具体的には,TOD-BERTとLXMERTをベースとしたモデルを用いて,多数のベースラインと比較し,アブレーション実験を行う。その結果,(1)言語モデルはデータ中の相関を活用して曖昧さを検出でき,(2)単一モーダルの共参照解決モデルは,スマートなオブジェクト表現を用いることで,視覚コンポーネントを不要にできることがわかった。 Anaphoric expressions, such as pronouns and referential descriptions, are situated with respect to the linguistic context of prior turns, as well as to the immediate visual environment. However, a speaker's referential descriptions do not always uniquely identify the referent, leading to ambiguities in need of resolution through subsequent clarificational exchanges. Thus, effective Ambiguity Detection and Coreference Resolution are key to task success in Conversational AI. In this paper, we present models for these two tasks as part of the SIMMC 2.0 Challenge (Kottur et al. 2021). Specifically, we use TOD-BERT and LXMERT based models, compare them to a number of baselines and provide ablation experiments. Our results show that (1) language models are able to exploit correlations in the data to detect ambiguity; and (2) unimodal coreference resolution models can avoid the need for a vision component, through the use of smart object representations.	翻訳日:2023-07-27 16:42:17 公開日:2023-07-26
# Universal Deep Domain Adaptation Frameworkを用いたクロスセッション運動イメージ分類のプライミング Priming Cross-Session Motor Imagery Classification with A Universal Deep Domain Adaptation Framework ( http://arxiv.org/abs/2202.09559v2 ) ライセンス: Link先を確認	Zhengqing Miao, Xin Zhang, Carlo Menon, Yelong Zheng, Meirong Zhao, Dong Ming	(参考訳) 運動イメージ(Motor imagery、MI)は、脳コンピュータインタフェース(BCI)の一般的なパラダイムである。脳波(EEG)は非定常でSN比が低く、また収録セッション間でデータ分布が大きく異なりうるため、同じ参加者の運動イメージタスクを異なる脳波記録セッションにわたって分類することは一般に困難である。クロスセッションMI分類をドメイン適応問題と考えるのは直感的であるが、その理論的根拠と実現可能なアプローチは解明されていない。本稿では,ドメイン適応理論の数学的モデルに基づくクロスセッションMI分類のための,シャム深層ドメイン適応(SDDA)フレームワークを提案する。提案フレームワークは,既存のニューラルネットワークの多くに対して,ネットワーク構造を変更せずに容易に適用でき,高い柔軟性と転用性を持つ。提案手法では,まずチャネル正規化とユークリッドアライメントを併用してドメイン不変量を構築した。次に、ソースドメインとターゲットドメインからの埋め込み特徴を再生核ヒルベルト空間(RKHS)に写像し、それに従って整列する。SDDAの汎化性を改善するために,コサインに基づく中心損失もフレームワークに統合された。提案フレームワークは、2つのMI-EEG公開データセット(BCI Competition IV IIA, IIB)において、BCI研究分野の古典的で一般的な2つの畳み込みニューラルネットワーク(EEGNetとConvNet)を用いて検証された。バニラのEEGNetとConvNetと比較して、提案されたSDDAフレームワークは、MI分類精度をIIAデータセットでそれぞれ15.2%、10.2%、IIBデータセットで5.5%、4.2%向上させることができた。最終的なMI分類精度はIIAデータセットで82.01%、IIBで87.52%に達し、文献中の最先端手法を上回った。 Motor imagery (MI) is a common brain computer interface (BCI) paradigm. EEG is non-stationary with a low signal-to-noise ratio, so classifying motor imagery tasks of the same participant from different EEG recording sessions is generally challenging, as the EEG data distribution may vary tremendously among different acquisition sessions. Although it is intuitive to consider the cross-session MI classification as a domain adaptation problem, the rationale and a feasible approach have not been elucidated. In this paper, we propose a Siamese deep domain adaptation (SDDA) framework for cross-session MI classification based on mathematical models in domain adaptation theory. The proposed framework can be easily applied to most existing artificial neural networks without altering the network structure, which gives our method great flexibility and transferability. 
In the proposed framework, domain invariants were first constructed jointly with channel normalization and Euclidean alignment. Then, embedding features from the source and target domains were mapped into the Reproducing Kernel Hilbert Space (RKHS) and aligned accordingly. A cosine-based center loss was also integrated into the framework to improve the generalizability of the SDDA. The proposed framework was validated with two classic and popular convolutional neural networks from the BCI research field (EEGNet and ConvNet) on two public MI-EEG datasets (BCI Competition IV IIA, IIB). Compared to the vanilla EEGNet and ConvNet, the proposed SDDA framework boosted the MI classification accuracy by 15.2% and 10.2%, respectively, on the IIA dataset, and by 5.5% and 4.2% on the IIB dataset. The final MI classification accuracy reached 82.01% on the IIA dataset and 87.52% on IIB, which outperformed the state-of-the-art methods in the literature.	翻訳日:2023-07-27 16:42:01 公開日:2023-07-26
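Euclidean alignment, one of the ingredients named above, has a commonly used closed form. Each session's trials are whitened by the inverse square root of that session's mean spatial covariance. After this step, every session has an identity mean covariance, which makes sessions more comparable. The sketch below assumes that standard formulation. Whether SDDA uses exactly this variant, and how it combines it with channel normalization, is not stated in the abstract.

```python
import numpy as np


def euclidean_alignment(trials: np.ndarray) -> np.ndarray:
    """Align one session's EEG trials so their mean spatial covariance is I.

    trials: array of shape (n_trials, n_channels, n_samples).
    """
    # Per-trial spatial covariance (channels x channels).
    covs = np.einsum("tcs,tds->tcd", trials, trials) / trials.shape[2]
    r_bar = covs.mean(axis=0)
    # Inverse matrix square root of the symmetric positive definite mean covariance.
    eigvals, eigvecs = np.linalg.eigh(r_bar)
    r_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    # Whiten every trial with the same reference matrix.
    return np.einsum("cd,tds->tcs", r_inv_sqrt, trials)


rng = np.random.default_rng(1)
mixing = rng.normal(size=(4, 4))  # simulated session-specific channel mixing
session = np.stack([mixing @ rng.normal(size=(4, 200)) for _ in range(30)])
aligned = euclidean_alignment(session)
mean_cov = np.einsum("tcs,tds->tcd", aligned, aligned).mean(axis=0) / aligned.shape[2]
print(np.round(mean_cov, 6))  # approximately the 4x4 identity matrix
```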
# 小さなサンプルから大規模な因果ポリツリーを推定する Estimating large causal polytrees from small samples ( http://arxiv.org/abs/2209.07028v2 ) ライセンス: Link先を確認	Sourav Chatterjee, Mathukumalli Vidyasagar	(参考訳) 比較的小さなi.i.d.サンプルから大規模な因果ポリツリーを推定する問題を考察する。これは、遺伝子制御ネットワークのように変数の数がサンプルサイズに比べて非常に大きい場合に因果構造を決定する問題に動機付けられている。このような設定で高い精度で木を復元するアルゴリズムを提案する。このアルゴリズムは、いくつかの軽度な非退化条件を除けば、分布やモデリングに関する仮定を本質的に必要とせずに機能する。 We consider the problem of estimating a large causal polytree from a relatively small i.i.d. sample. This is motivated by the problem of determining causal structure when the number of variables is very large compared to the sample size, such as in gene regulatory networks. We give an algorithm that recovers the tree with high accuracy in such settings. The algorithm works under essentially no distributional or modeling assumptions other than some mild non-degeneracy conditions.	翻訳日:2023-07-27 16:35:03 公開日:2023-07-26
# フォトニック量子回路の設計について On the design of photonic quantum circuits ( http://arxiv.org/abs/2209.06069v4 ) ライセンス: Link先を確認	Yuan Yao, Filippo Miatto, and Nicolás Quesada	(参考訳) 本稿では,ガウス的対象(純粋および混合ガウス状態,ガウスユニタリ,ガウスチャネル,ガウス測定)と光子数分解測定などの非ガウス的効果からなる一般的なフォトニック量子回路を設計・最適化するフレームワークを提案する。このフレームワークでは、シンプレクティック群(特別な場合にはユニタリ群や直交群)の要素を用いてガウス対象の位相空間表現をパラメータ化し、任意のガウス対象のフォック振幅を再帰的に計算する単一の線形漸化式を用いてフォック表現に変換する。また,漸化式を通して微分することにより,位相空間パラメータに対するフォック振幅の勾配を計算する。次に、シンプレクティック群上のリーマン最適化を使用してMモードのガウス対象を最適化でき、基本ゲートによる特定の実現に縛られる必要がなくなる。これにより、同じ回路の異なるゲートレベル実装をすべて同一視する（「mod out」する）ことができ、実装は最適化の完了後に選択できる。これは、状態や変換のクラスにわたってある性質の値を評価・制限するといった一般的な問いに答えたい場合や、ハードウェアの制約を回路最適化のステップとは別に扱いたい場合に特に有用である。最後に、状態がガウス変換を受けるときのグローバル位相の変化を明示的に計算することにより、ガウス対象の線形結合として記述できる非ガウス対象にもフレームワークを拡張できるようにする。我々はこれらの手法をすべて無料のオープンソースライブラリMrMustardに実装し、3つの例に適用した。すなわち、Borealisの216モード干渉計の最適化と、猫状態および3次位相状態を生成する（フォック測定付きの）2モードおよび3モード回路である。 We propose a framework to design and optimize generic photonic quantum circuits composed of Gaussian objects (pure and mixed Gaussian states, Gaussian unitaries, Gaussian channels, Gaussian measurements) as well as non-Gaussian effects such as photon-number-resolving measurements. In this framework, we parametrize a phase space representation of Gaussian objects using elements of the symplectic group (or the unitary or orthogonal group in special cases), and then we transform it into the Fock representation using a single linear recurrence relation that computes the Fock amplitudes of any Gaussian object recursively. We also compute the gradient of the Fock amplitudes with respect to phase space parameters by differentiating through the recurrence relation. We can then use Riemannian optimization on the symplectic group to optimize M-mode Gaussian objects, avoiding the need to commit to particular realizations in terms of fundamental gates. This allows us to "mod out" all the different gate-level implementations of the same circuit, which now can be chosen after the optimization has completed. 
This can be especially useful when looking to answer general questions, such as bounding the value of a property over a class of states or transformations, or when one would like to worry about hardware constraints separately from the circuit optimization step. Finally, we make our framework extendable to non-Gaussian objects that can be written as linear combinations of Gaussian ones, by explicitly computing the change in global phase when states undergo Gaussian transformations. We implemented all of these methods in the freely available open-source library MrMustard, which we use in three examples to optimize the 216-mode interferometer in Borealis, and 2- and 3-mode circuits (with Fock measurements) to produce cat states and cubic phase states.	翻訳日:2023-07-27 16:34:58 公開日:2023-07-26
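The simplest instance of computing Fock amplitudes of a Gaussian object with a linear recurrence is a single-mode coherent state |alpha>. For this state, <n+1|alpha> = alpha / sqrt(n+1) * <n|alpha>, starting from exp(-|alpha|^2 / 2). The sketch below is only this special case, written in plain NumPy. It does not use the MrMustard API. The paper's general recurrence covers arbitrary multimode Gaussian states, unitaries, channels and measurements.

```python
import numpy as np


def coherent_fock_amplitudes(alpha: complex, cutoff: int) -> np.ndarray:
    """Fock amplitudes <n|alpha> for n = 0..cutoff-1 via a linear recurrence."""
    psi = np.zeros(cutoff, dtype=complex)
    psi[0] = np.exp(-abs(alpha) ** 2 / 2)  # vacuum amplitude
    for n in range(cutoff - 1):
        psi[n + 1] = alpha / np.sqrt(n + 1) * psi[n]  # <n+1|alpha> from <n|alpha>
    return psi


psi = coherent_fock_amplitudes(1.0 + 0.5j, cutoff=40)
print(np.sum(np.abs(psi) ** 2))  # ~1.0: truncating at 40 photons loses almost nothing
```

Because every step is a differentiable arithmetic operation, gradients with respect to alpha can flow through the recurrence. This mirrors, in miniature, the differentiation-through-recurrence idea described above.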
# AudioLM: 音声生成のための言語モデリングアプローチ AudioLM: a Language Modeling Approach to Audio Generation ( http://arxiv.org/abs/2209.03143v2 ) ライセンス: Link先を確認	Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour	(参考訳) 本稿では,長期的な一貫性を備えた高品質な音声生成のためのフレームワークであるAudioLMを紹介する。AudioLMは入力音声を離散トークンの系列に写像し、この表現空間における言語モデリングタスクとして音声生成を定式化する。既存の音声トークナイザが再構成品質と長期構造との間で異なるトレードオフを提供することを示し、両方の目的を達成するためのハイブリッドなトークン化方式を提案する。すなわち,音声で事前学習したマスク付き言語モデルの離散化された活性化を利用して長期構造を捉え、ニューラル音声コーデックが生成する離散コードを利用して高品質な合成を実現する。生の音声波形の大規模なコーパスで学習することにより、AudioLMは短いプロンプトを与えられると自然で一貫した続きを生成することを学習する。発話音声で学習した場合、書き起こしやアノテーションなしで、AudioLMは構文的・意味的に妥当な発話の続きを生成すると同時に、未知の話者に対しても話者のアイデンティティと韻律を維持できる。さらに,音楽の記号的表現を一切用いずに学習したにもかかわらず,一貫したピアノ音楽の続きを生成することによって,我々のアプローチが発話音声を超えて拡張できることを示す。 We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis. By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts. When trained on speech, and without any transcript or annotation, AudioLM generates syntactically and semantically plausible speech continuations while also maintaining speaker identity and prosody for unseen speakers. 
Furthermore, we demonstrate how our approach extends beyond speech by generating coherent piano music continuations, despite being trained without any symbolic representation of music.	翻訳日:2023-07-27 16:34:24 公開日:2023-07-26
# 古典的モデルは、標的としたスクイーズド光モデルよりもJiuzhang 1.0ガウスボソンサンプラーのより良い説明であるかもしれない Classical models may be a better explanation of the Jiuzhang 1.0 Gaussian Boson Sampler than its targeted squeezed light model ( http://arxiv.org/abs/2207.10058v5 ) ライセンス: Link先を確認	Javier Martínez-Cifuentes, K. M. Fonseca-Romero, Nicolás Quesada	(参考訳) 最近、Zhongらは閾値検出器を用いて最大144モードの画期的なガウスボソンサンプリング実験を行った。著者らは、Jiuzhang 1.0およびJiuzhang 2.0と名付けられたこれらの実験の実施により量子計算上の優位性を達成したと主張している。これらの実験結果は、モード間の統計的相関の比較、ベイズ仮説検定、重出力生成(HOG)テストなどのテストを用いて、いくつかの古典的な仮説と敵対者に対して検証されている。本稿では, 損失のある干渉計に送られたコヒーレント状態の混合の確率分布を用いて, これらの実験を検証するための代替的な古典的仮説を提案する。我々がスカッシュ状態と呼ぶこれらの入力混合状態は、一方の直交位相成分に真空揺らぎを、他方に過剰な揺らぎを持つ。高光子数密度領域の構成については、統計的相関の比較では、実験の基底真理(干渉計に送られた2モードスクイーズド状態)と我々の代替仮説を区別できない。ベイズ検定は、Jiuzhang 1.0を除くすべての構成について、基底真理の方が我々の代替仮説よりも実験データのもっともらしい説明であることを示している。HOGテストでも同様の結果が得られた：Jiuzhang 2.0のすべての構成について、実験サンプルは我々の代替分布から得られたサンプルよりも高い基底真理確率を持つことが示され、Jiuzhang 1.0ではテストは決定的ではない。本結果は、今後のGBS実験の検証において考慮すべき新しい仮説を提供し、GBSの文脈で量子優位性を検証するための適切な指標を特定する必要性に光を当てている。また、量子的特徴を一切持たないJiuzhang 1.0実験の古典的な説明が排除されていないことも示している。 Recently, Zhong et al. performed landmark Gaussian boson sampling experiments with up to 144 modes using threshold detectors. The authors claim to have achieved quantum computational advantage with the implementation of these experiments, named Jiuzhang 1.0 and Jiuzhang 2.0. Their experimental results are validated against several classical hypotheses and adversaries using tests such as the comparison of statistical correlations between modes, Bayesian hypothesis testing and the Heavy Output Generation (HOG) test. We propose an alternative classical hypothesis for the validation of these experiments using the probability distribution of mixtures of coherent states sent into a lossy interferometer; these input mixed states, which we term squashed states, have vacuum fluctuations in one quadrature and excess fluctuations in the other. 
We find that for configurations in the high photon number density regime, the comparison of statistical correlations does not tell apart the ground truth of the experiment (two-mode squeezed states sent into an interferometer) from our alternative hypothesis. The Bayesian test indicates that, for all configurations except Jiuzhang 1.0, the ground truth is a more likely explanation of the experimental data than our alternative hypothesis. A similar result is obtained for the HOG test: for all configurations of Jiuzhang 2.0, the test indicates that the experimental samples have higher ground truth probability than the samples obtained from our alternative distribution; for Jiuzhang 1.0 the test is inconclusive. Our results provide a new hypothesis that should be considered in the validation of future GBS experiments, and shed light on the need to identify proper metrics to verify quantum advantage in the context of GBS. They also indicate that a classical explanation of the Jiuzhang 1.0 experiment, lacking any quantum features, has not been ruled out.	翻訳日:2023-07-27 16:33:39 公開日:2023-07-26
# FedIIC: クラス不均衡な医用画像分類のためのロバストな連合学習を目指して FedIIC: Towards Robust Federated Learning for Class-Imbalanced Medical Image Classification ( http://arxiv.org/abs/2206.13803v3 ) ライセンス: Link先を確認	Nannan Wu, Li Yu, Xin Yang, Kwang-Ting Cheng, and Zengqiang Yan	(参考訳) プライバシー漏洩なしに分散データから深層モデルを学習する連合学習(FL)は、近年、医用画像計算において大きな可能性を示している。しかし、医療データに遍在するクラス不均衡を考えると、FLは特に少数クラス(例えばまれな疾患)において性能低下を示しうる。この問題に対する既存の手法は主に、クラス事前分布によるバイアスを取り除くためにバランスの取れた分類器を訓練することに重点を置いているが、分類性能を高めるためのより良い表現の探求を怠っている。本稿では,特徴学習と分類器学習という2つの観点からクラス不均衡に対処する、FedIICというプライバシ保護FL手法を提案する。特徴学習では、FLにおいて不均衡なデータからより優れたクラス固有の特徴を抽出するために、2レベルの対照学習が設計されている。分類器学習では、クラスごとのマージンがリアルタイムの難易度とクラス事前分布に応じて動的に設定され、モデルが各クラスを均等に学習するのに役立つ。公開データセットに対する実験結果から,クラス不均衡下で実世界とシミュレーションの両方のマルチソース医用画像データを扱う上で,FedIICが優れた性能を示すことがわかった。コードはhttps://github.com/wnn2000/FedIICで入手できる。 Federated learning (FL), training deep models from decentralized data without privacy leakage, has shown great potential in medical image computing recently. However, considering the ubiquitous class imbalance in medical data, FL can exhibit performance degradation, especially for minority classes (e.g. rare diseases). Existing methods towards this problem mainly focus on training a balanced classifier to eliminate class prior bias among classes, but neglect to explore better representation to facilitate classification performance. In this paper, we present a privacy-preserving FL method named FedIIC to combat class imbalance from two perspectives: feature learning and classifier learning. In feature learning, two levels of contrastive learning are designed to extract better class-specific features with imbalanced data in FL. In classifier learning, per-class margins are dynamically set according to real-time difficulty and class priors, which helps the model learn classes equally. Experimental results on publicly-available datasets demonstrate the superior performance of FedIIC in dealing with both real-world and simulated multi-source medical imaging data under class imbalance. 
Code is available at https://github.com/wnn2000/FedIIC.	翻訳日:2023-07-27 16:32:41 公開日:2023-07-26
# 2段階の勾配更新による安定性の限界を超える Beyond the Edge of Stability via Two-step Gradient Updates ( http://arxiv.org/abs/2206.04172v3 ) ライセンス: Link先を確認	Lei Chen, Joan Bruna	(参考訳) 勾配降下法(GD)は、高次元空間におけるスケーラビリティと効率のおかげで、現代の機械学習の強力な主力手法である。局所的な最小点を見つける能力はリプシッツ連続な勾配を持つ損失に対してのみ保証されており、その場合GDは基礎となる勾配流の「正真正銘の」離散化とみなせる。しかし、過パラメータ化モデルを含む多くのML設定はこの問題クラスに属さない。このことが、ステップサイズが上記のリプシッツ定数に反比例する許容しきい値を超える、いわゆる「安定性の限界」(EoS)を超えた研究を動機付けてきた。おそらく驚くべきことに、GDは局所的な不安定性や振動的な挙動にもかかわらず依然として収束することが経験的に観察されている。この現象に関する初期の理論的分析は主に過パラメータ化された領域に焦点を当てており、そこでは大きな学習率を選ぶ効果が、適切な漸近極限の下で、最小点の多様体内での「シャープネス最小化」という暗黙的正則化と関連付けられうる。対照的に,本研究では,2段階の勾配更新の解析を通じて,単純だが代表的な学習問題に着目し,このような不安定な収束の条件を直接検討する。具体的には,2段階更新の固定点の存在とそこへの収束を保証する3階微分を含む局所条件を特徴付け,その性質を母集団損失の下での教師・生徒設定において活用する。最後に,行列分解から出発して,高次元設定におけるGDの周期2軌道の観察とそのダイナミクスの直観を与え,より一般的な設定の探索も行う。 Gradient Descent (GD) is a powerful workhorse of modern machine learning thanks to its scalability and efficiency in high-dimensional spaces. Its ability to find local minimisers is only guaranteed for losses with Lipschitz gradients, where it can be seen as a "bona fide" discretisation of an underlying gradient flow. Yet, many ML setups involving overparametrised models do not fall into this problem class, which has motivated research beyond the so-called "Edge of Stability" (EoS), where the step-size crosses the admissibility threshold inversely proportional to the Lipschitz constant above. Perhaps surprisingly, GD has been empirically observed to still converge regardless of local instability and oscillatory behavior. The incipient theoretical analysis of this phenomenon has mainly focused on the overparametrised regime, where the effect of choosing a large learning rate may be associated to a "Sharpness-Minimisation" implicit regularisation within the manifold of minimisers, under appropriate asymptotic limits. 
In contrast, in this work we directly examine the conditions for such unstable convergence, focusing on simple, yet representative, learning problems, via analysis of two-step gradient updates. Specifically, we characterize a local condition involving third-order derivatives that guarantees existence and convergence to fixed points of the two-step updates, and leverage such property in a teacher-student setting, under population loss. Finally, starting from Matrix Factorization, we provide observations of period-2 orbits of GD in high-dimensional settings with intuition of their dynamics, along with exploration into more general settings.	翻訳日:2023-07-27 16:32:18 公開日:2023-07-26
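A one-dimensional toy example makes "convergence to fixed points of the two-step update" concrete. This is our own illustration, not the paper's teacher-student setting. For f(x) = log(cosh(x)), the gradient is tanh(x) and f'' <= 1, so the classical stability threshold is eta < 2. With eta = 3, GD cannot settle at the minimizer 0. Instead it approaches a stable period-2 orbit {x*, -x*}, where x* is a fixed point of the two-step map satisfying 2x* = eta * tanh(x*).

```python
import math
from typing import List


def gd_trajectory(eta: float, x0: float, steps: int) -> List[float]:
    """Gradient descent on f(x) = log(cosh(x)), whose gradient is tanh(x).

    f''(x) = 1 - tanh(x)**2 <= 1, so the classical stability threshold is eta < 2.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - eta * math.tanh(xs[-1]))
    return xs


below = gd_trajectory(eta=1.5, x0=0.1, steps=200)   # below 2/L: converges to 0
beyond = gd_trajectory(eta=3.0, x0=0.1, steps=200)  # beyond 2/L: does not converge to 0
print(below[-1])
print(beyond[-3:])  # alternates between about +1.29 and -1.29: a period-2 orbit
```

In this example the loss oscillates but stays bounded. This is qualitatively the "unstable convergence" the abstract refers to, though the paper's conditions involve third-order derivatives in more general models.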
# TreeFlow: ツリーベースのガウス確率的回帰を超えて TreeFlow: Going beyond Tree-based Gaussian Probabilistic Regression ( http://arxiv.org/abs/2206.04140v2 ) ライセンス: Link先を確認	Patryk Wielopolski, Maciej Zięba	(参考訳) 木に基づくアンサンブルは、様々な範囲や領域の混合型変数で表される特徴ベクトルを特徴とする分類や回帰問題において優れた性能で知られている。しかし、回帰問題を考えると、主に決定論的応答を提供するか、ガウス分布やパラメトリック分布で出力の不確かさをモデル化するために設計されている。本研究では,ツリーアンサンブルの利点と,正規化フローを用いた柔軟な確率分布のモデル化機能を組み合わせたツリーベースアプローチであるTreeFlowを紹介する。この手法の主なアイデアは、木に基づくモデルを特徴抽出器として使用し、正規化フローの条件付き変種と組み合わせることである。その結果,本手法は回帰出力の複雑な分布をモデル化することができる。提案手法を、データ量、特徴の特性、目的変数の次元が異なる難易度の高い回帰ベンチマークで評価する。木に基づく回帰ベースラインと比較して、多峰性の目標分布を持つデータセットでは確率的指標と決定論的指標の両方でSOTAの結果を、単峰性のデータセットでは競争力のある結果を得た。 Tree-based ensembles are known for their outstanding performance in classification and regression problems characterized by feature vectors represented by mixed-type variables from various ranges and domains. However, considering regression problems, they are primarily designed to provide deterministic responses or model the uncertainty of the output with Gaussian or parametric distribution. In this work, we introduce TreeFlow, a tree-based approach that combines the benefits of using tree ensembles with the capabilities of modeling flexible probability distributions using normalizing flows. The main idea of the solution is to use a tree-based model as a feature extractor and combine it with a conditional variant of normalizing flow. Consequently, our approach is capable of modeling complex distributions for the regression outputs. We evaluate the proposed method on challenging regression benchmarks with varying volume, feature characteristics, and target dimensionality. We obtain SOTA results for both probabilistic and deterministic metrics on datasets with multi-modal target distributions and competitive results on unimodal ones compared to tree-based regression baselines.	翻訳日:2023-07-27 16:31:42 公開日:2023-07-26
# 3次元小分子と高分子複合体のための効率的かつ正確な物理を考慮した多重グラフニューラルネットワーク Efficient and Accurate Physics-aware Multiplex Graph Neural Networks for 3D Small Molecules and Macromolecule Complexes ( http://arxiv.org/abs/2206.02789v2 ) ライセンス: Link先を確認	Shuo Zhang, Yang Liu, Lei Xie	(参考訳) グラフニューラルネットワーク(GNN)を分子科学に適用する最近の進歩は、3次元(3D)構造表現をGNNで学習する能力を示している。しかし、既存のGNNのほとんどは、多様な相互作用のモデリング不足、計算コストの高い演算、ベクトル値の無視という限界に悩まされている。そこで我々は,新しいGNNモデルである物理を考慮した多重グラフニューラルネットワーク(PaxNet)を提案し,小さな有機化合物と高分子複合体の両方について3次元分子の表現を効率的かつ正確に学習する。PaxNetは、分子力学にインスパイアされて局所的および非局所的な相互作用のモデリングを分離し、高価な角度関連計算を減らす。スカラー特性の他に、PaxNetは各原子に関連付けられたベクトルを学習することでベクトル特性も予測できる。PaxNetの性能を評価するために,2つのタスクにおいて最先端のベースラインと比較する。量子化学特性を予測するための小分子データセットでは、PaxNetは最良のベースラインと比べて予測誤差を15%削減し、使用メモリを73%削減する。タンパク質-リガンド結合親和性を予測する高分子データセットでは、PaxNetはメモリ消費を33%、推論時間を85%削減しながら、最良のベースラインを上回っている。したがって、PaxNetは分子の大規模機械学習のための普遍的で堅牢かつ正確な方法を提供する。コードはhttps://github.com/zetayue/Physics-aware-Multiplex-GNNで利用可能である。 Recent advances in applying Graph Neural Networks (GNNs) to molecular science have showcased the power of learning three-dimensional (3D) structure representations with GNNs. However, most existing GNNs suffer from the limitations of insufficient modeling of diverse interactions, computationally expensive operations, and neglect of vectorial values. Here, we tackle these limitations by proposing a novel GNN model, Physics-aware Multiplex Graph Neural Network (PaxNet), to efficiently and accurately learn the representations of 3D molecules for both small organic compounds and macromolecule complexes. PaxNet separates the modeling of local and non-local interactions inspired by molecular mechanics, and reduces the expensive angle-related computations. Besides scalar properties, PaxNet can also predict vectorial properties by learning an associated vector for each atom. To evaluate the performance of PaxNet, we compare it with state-of-the-art baselines in two tasks. 
On a small-molecule dataset for predicting quantum chemical properties, PaxNet reduces the prediction error by 15% and uses 73% less memory than the best baseline. On a macromolecule dataset for predicting protein-ligand binding affinities, PaxNet outperforms the best baseline while reducing the memory consumption by 33% and the inference time by 85%. Thus, PaxNet provides a universal, robust and accurate method for large-scale machine learning of molecules. Our code is available at https://github.com/zetayue/Physics-aware-Multiplex-GNN.	翻訳日:2023-07-27 16:31:24 公開日:2023-07-26
# 光学フロー・ステレオ・深度推定の統一 Unifying Flow, Stereo and Depth Estimation ( http://arxiv.org/abs/2211.05783v3 ) ライセンス: Link先を確認	Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, Dacheng Tao, Andreas Geiger	(参考訳) 本稿では,光学フロー,平行化ステレオマッチング,姿勢付き画像からの非平行化ステレオ深度推定という3つの動きおよび3次元知覚タスクのための統一的な定式化とモデルを提案する。タスクごとに特化した従来のアーキテクチャとは異なり、我々は3つのタスクすべてを統一的な密な対応マッチング問題として定式化し、特徴の類似度を直接比較することで単一のモデルで解く。このような定式化には識別的な特徴表現が必要であり、我々はそれをTransformer、特にクロスアテンション機構を用いて実現する。クロスアテンションにより、ビュー間の相互作用を通じてもう一方の画像からの知識を統合でき、抽出される特徴の質が大幅に向上することを実証する。モデルのアーキテクチャとパラメータがタスク間で共有されるため、統一モデルは自然にタスク間の転移を可能にする。難易度の高いSintelデータセットにおいて統一モデルでRAFTを上回り、さらにいくつかのタスク固有の改善ステップを追加した最終モデルは、モデル設計と推論速度の点でより単純かつ効率的でありながら、10の一般的なフロー、ステレオ、深度データセットにおいて最近の最先端手法を上回るか同等の性能を示す。 We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. We outperform RAFT with our unified model on the challenging Sintel dataset, and our final model that uses a few additional task-specific refinement steps outperforms or compares favorably to recent state-of-the-art methods on 10 popular flow, stereo and depth datasets, while being simpler and more efficient in terms of model design and inference speed.	翻訳日:2023-07-27 16:25:08 公開日:2023-07-26
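"Solving correspondence by directly comparing feature similarities" can be sketched without any learned network. The steps are:

1. Correlate every pixel's feature with every pixel in the other image.
2. Turn each row into a matching distribution with a softmax.
3. Read off the expected matching location, which gives the displacement.

This is a simplified NumPy sketch under our own assumptions: hand-made one-hot features, no Transformer, no refinement. It is not the authors' model or code.

```python
import numpy as np


def softmax_matching_flow(feat1: np.ndarray, feat2: np.ndarray) -> np.ndarray:
    """Dense correspondence by comparing feature similarities.

    feat1, feat2: (H, W, C) feature maps of two images.
    Returns an (H, W, 2) array of displacements (dx, dy) from image 1 to image 2.
    """
    h, w, c = feat1.shape
    f1 = feat1.reshape(h * w, c)
    f2 = feat2.reshape(h * w, c)
    corr = f1 @ f2.T / np.sqrt(c)              # all-pairs similarity
    corr -= corr.max(axis=1, keepdims=True)    # numerical stability
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)    # matching distribution per pixel
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(float)
    matched = prob @ grid                      # expected matching location
    return (matched - grid).reshape(h, w, 2)


# Toy check: distinct one-hot features, image 2 = image 1 shifted right by one pixel.
h, w = 3, 4
feat1 = 10.0 * np.eye(h * w).reshape(h, w, h * w)
feat2 = np.zeros_like(feat1)
feat2[:, 1:] = feat1[:, :-1]
flow = softmax_matching_flow(feat1, feat2)
print(np.round(flow[:, :-1], 3))  # close to (1, 0) wherever a match exists
```

For rectified stereo, the same idea applies with the correlation restricted to the same row, so only a horizontal displacement (disparity) is estimated.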
# 連合学習におけるクライアント選択：原則、課題、機会 Client Selection in Federated Learning: Principles, Challenges, and Opportunities ( http://arxiv.org/abs/2211.01549v2 ) ライセンス: Link先を確認	Lei Fu and Huanle Zhang and Ge Gao and Mi Zhang and Xin Liu	(参考訳) 機械学習(ML)モデルを訓練するためのプライバシ保護パラダイムとして、連合学習(FL)は産業界と学術界の両方から大きな注目を集めている。典型的なFLシナリオでは、クライアントはデータ分布とハードウェア構成の点で大きな異質性を示す。したがって、各訓練ラウンドでクライアントをランダムにサンプリングすると、異質なクライアントからのローカル更新を十分に活用できない可能性があり、モデル精度の低下、収束速度の低下、公平性の悪化などを招く。FLクライアントの異質性問題に対処するため、様々なクライアント選択アルゴリズムが開発され、有望な性能改善を示している。本稿では、FLクライアント選択という新興分野における最近の進歩と、その課題および研究機会を体系的に提示する。実務者が自らのアプリケーションに最適なクライアント選択メカニズムを選ぶ助けとなり、また研究者や新規参入者がこのエキサイティングな研究トピックをより深く理解するきっかけとなることを願っている。 As a privacy-preserving paradigm for training Machine Learning (ML) models, Federated Learning (FL) has received tremendous attention from both industry and academia. In a typical FL scenario, clients exhibit significant heterogeneity in terms of data distribution and hardware configurations. Thus, randomly sampling clients in each training round may not fully exploit the local updates from heterogeneous clients, resulting in lower model accuracy, slower convergence rate, degraded fairness, etc. To tackle the FL client heterogeneity problem, various client selection algorithms have been developed, showing promising performance improvement. In this paper, we systematically present recent advances in the emerging field of FL client selection and its challenges and research opportunities. We hope to facilitate practitioners in choosing the most suitable client selection mechanisms for their applications, as well as inspire researchers and newcomers to better understand this exciting research topic.	翻訳日:2023-07-27 16:24:39 公開日:2023-07-26
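The contrast between uniform random client sampling and a heterogeneity-aware rule can be sketched in a few lines. The loss-biased rule below is a hypothetical illustration in the spirit of loss-based selection schemes. It draws d >= k candidates and keeps the k with the highest reported local loss. It is not any specific algorithm from the survey. Real schemes also weigh data size, hardware speed, fairness and privacy.

```python
import random
from typing import Dict, List


def random_selection(clients: List[str], k: int, rng: random.Random) -> List[str]:
    """Baseline: sample k distinct clients uniformly at random each round."""
    return rng.sample(clients, k)


def loss_biased_selection(local_loss: Dict[str, float], k: int, d: int,
                          rng: random.Random) -> List[str]:
    """Hypothetical rule: draw d >= k candidates, keep the k with highest local loss."""
    candidates = rng.sample(sorted(local_loss), d)
    return sorted(candidates, key=lambda c: local_loss[c], reverse=True)[:k]


rng = random.Random(0)
losses = {"a": 0.10, "b": 0.90, "c": 0.50, "d": 0.70, "e": 0.30}
print(random_selection(sorted(losses), k=2, rng=rng))
print(loss_biased_selection(losses, k=2, d=5, rng=rng))  # ['b', 'd']
```

With d equal to the number of clients, the rule is fully greedy. Smaller d keeps some randomness, trading bias toward high-loss clients against exploration.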
# 拡散に基づく生成モデリングに関する最適制御の視点 An optimal control perspective on diffusion-based generative modeling ( http://arxiv.org/abs/2211.01364v2 ) ライセンス: Link先を確認	Julius Berner, Lorenz Richter, Karen Ullrich	(参考訳) 確率最適制御と、近年開発された拡散確率モデルのような確率微分方程式(SDE)に基づく生成モデルとの間の関係を確立する。特に、基礎となるSDEの周辺分布の対数密度の時間発展を支配するハミルトン・ヤコビ・ベルマン方程式を導出する。この観点により、最適制御理論の手法を生成モデリングに転用できる。まず、証拠下界(ELBO)が制御理論でよく知られた検証定理の直接的な帰結であることを示す。さらに、拡散に基づく生成モデリングを、経路空間上の適切な測度間のKullback-Leiblerダイバージェンスの最小化として定式化できる。最後に、統計学や計算科学で頻繁に生じる問題である、正規化されていない密度からのサンプリングのための新しい拡散ベースの手法を開発する。我々の時間反転拡散サンプラー(DIS)が、複数の数値例において他の拡散ベースのサンプリング手法を上回りうることを示す。 We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs), such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton-Jacobi-Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows us to transfer methods from optimal control theory to generative modeling. First, we show that the evidence lower bound is a direct consequence of the well-known verification theorem from control theory. Further, we can formulate diffusion-based generative modeling as a minimization of the Kullback-Leibler divergence between suitable measures in path space. Finally, we develop a novel diffusion-based method for sampling from unnormalized densities -- a problem frequently occurring in statistics and computational sciences. We demonstrate that our time-reversed diffusion sampler (DIS) can outperform other diffusion-based sampling approaches on multiple numerical examples.	翻訳日:2023-07-27 16:24:21 公開日:2023-07-26
# Teal: WANトラフィックエンジニアリングの学習による高速化最適化 Teal: Learning-Accelerated Optimization of WAN Traffic Engineering ( http://arxiv.org/abs/2210.13763v3 ) ライセンス: Link先を確認	Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, Minlan Yu	(参考訳) グローバルなクラウドの広域ネットワーク(WAN)の急速な拡大により、商用最適化エンジンが大規模なネットワークトラフィックエンジニアリング(TE)問題を効率的に解くことが課題となっている。既存の高速化戦略はTE最適化を並行な部分問題に分解するが、実行時間と割り当て性能の間の本質的なトレードオフのため、限られた並列性しか実現できない。本稿では、GPUの並列処理能力を活用してTE制御を高速化する学習ベースのTEアルゴリズムTealを提案する。まず、Tealはフロー中心のグラフニューラルネットワーク(GNN)を設計してWANの接続性とネットワークフローを捉え、下流の割り当てへの入力としてフロー特徴を学習する。第2に、問題規模を縮小して学習を扱いやすくするため、Tealは中央のTE目的を最適化しながら各トラフィック需要を独立に割り当てるマルチエージェント強化学習(RL)アルゴリズムを用いる。最後に、Tealは、過剰利用されたリンクなどの制約違反を減らすために、高度に並列化可能な最適化アルゴリズムであるADMM(Alternating Direction Method of Multipliers)で割り当てを微調整する。MicrosoftのWANのトラフィック行列を用いてTealを評価する。1,700ノードを超える大規模なWANトポロジにおいて、Tealは本番の最適化エンジンより数桁高速に動作しながら、ほぼ最適なフロー割り当てを生成する。他のTE高速化方式と比較して、Tealは6～32%多くのトラフィック需要を満たし、197～625倍の高速化を実現する。 The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale. Existing acceleration strategies decompose TE optimization into concurrent subproblems but realize limited parallelism due to an inherent tradeoff between run time and allocation performance. We present Teal, a learning-based TE algorithm that leverages the parallel processing power of GPUs to accelerate TE control. First, Teal designs a flow-centric graph neural network (GNN) to capture WAN connectivity and network flows, learning flow features as inputs to downstream allocation. Second, to reduce the problem scale and make learning tractable, Teal employs a multi-agent reinforcement learning (RL) algorithm to independently allocate each traffic demand while optimizing a central TE objective. 
Finally, Teal fine-tunes allocations with ADMM (Alternating Direction Method of Multipliers), a highly parallelizable optimization algorithm for reducing constraint violations such as overutilized links. We evaluate Teal using traffic matrices from Microsoft's WAN. On a large WAN topology with >1,700 nodes, Teal generates near-optimal flow allocations while running several orders of magnitude faster than the production optimization engine. Compared with other TE acceleration schemes, Teal satisfies 6--32% more traffic demand and yields 197--625x speedups.	翻訳日:2023-07-27 16:23:53 公開日:2023-07-26
# リングトラップにおける分子イオンの量子論理制御と精密測定-基礎対称性試験のための新しいアプローチ Quantum logic control and precision measurements of molecular ions in a ring trap -- a new approach for testing fundamental symmetries ( http://arxiv.org/abs/2210.11613v2 ) ライセンス: Link先を確認	Yan Zhou, Joshua O. Island, Matt Grau	(参考訳) 本稿では、分割型リングイオントラップ中の極性分子イオンの量子論理制御を可能にし、精密測定への道を開く新しいプラットフォームを提案する。このアプローチは、ほぼ1の忠実度での状態準備と検出、および長いスピンコヒーレンスの達成に焦点を当てる。特徴的な点は、静止系で行う状態準備・検出を、回転系で行うパリティ選択的なスピン歳差運動から分離することにある。この方法は幅広いイオン種に適用でき、電子の電気双極子モーメントと核磁気四極子モーメントの探索に用いられる予定である。 We present a new platform facilitating quantum logic control of polar molecular ions in a segmented ring ion trap, paving the way for precision measurements. This approach focuses on achieving near-unity state preparation and detection, as well as long spin coherence. A distinctive aspect lies in separating state preparation and detection conducted in a static frame, from parity-selective spin-precession in a rotating frame. This method can be applied to a wide range of ion species and will be used to search for the electron's electric dipole moment and the nuclear magnetic quadrupole moment.	翻訳日:2023-07-27 16:23:27 公開日:2023-07-26
# セキュアなマルチパーティ量子最小公倍数計算プロトコル A Secure Multiparty Quantum Least Common Multiple Computation Protocol ( http://arxiv.org/abs/2210.08165v2 ) ライセンス: Link先を確認	Zixian Li and Wenjie Liu	(参考訳) 本稿では、Shorの量子周期発見アルゴリズム(QPA)に基づいて、最小公倍数(LCM)のためのセキュアなマルチパーティ計算(SMC)プロトコルを提案する。我々のプロトコルは次の原理に基づく: 複数の周期関数を接続したものも周期関数であり、その周期は各周期の最小公倍数に正確に一致する。また、QPAは確率的アルゴリズムであるため、提案したLCMプロトコルの結果を検証するために、既存のセキュアなマルチパーティ量子和プロトコルに基づく一票否決型投票プロトコルも提案する。セキュリティ分析により、セミオネストモデルの下で提案プロトコルは高い確率で安全であり、計算コストは多項式の複雑さに留まることが示される。本稿で提案するプロトコルは、LCMの効率的かつセキュアなマルチパーティ計算の問題を解決し、量子計算の可能性を示す。 In this paper, we present a secure multiparty computation (SMC) protocol for least common multiple (LCM) based on Shor's quantum period-finding algorithm (QPA). Our protocol is based on the following principle: the connection of multiple periodic functions is also a periodic function whose period is exactly the least common multiple of all the individual periods. Since QPA is a probabilistic algorithm, we also propose a one-vote-down vote protocol based on the existing secure multiparty quantum summation protocol, which is used to verify the results of the proposed LCM protocol. Security analysis shows that under the semi-honest model, the proposed protocol is secure with high probability, while the computational consumption remains at polynomial complexity. The protocol proposed in this paper solves the problem of efficient and secure multiparty computation of LCM, demonstrating the potential of quantum computation.	翻訳日:2023-07-27 16:23:14 公開日:2023-07-26
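The periodicity principle behind the LCM protocol can be checked classically: a combined function whose period is the least common multiple of the individual periods. The sketch below is a minimal classical illustration, not the quantum protocol. It assumes integer periods and models the "connected" function as the tuple of each party's phase `x mod p_i`. The names `combined_phase` and `smallest_period` are hypothetical.

```python
import math


def combined_phase(x: int, periods: list[int]) -> tuple[int, ...]:
    """Concatenate one periodic function per party: x -> (x mod p_1, ..., x mod p_k).
    This classical stand-in only illustrates the periodicity principle."""
    return tuple(x % p for p in periods)


def smallest_period(periods: list[int]) -> int:
    """Brute-force the smallest T > 0 with combined_phase(x + T) == combined_phase(x).
    (x + T) % p == x % p holds for every x exactly when p divides T, so checking
    x = 0 suffices; the search is bounded by the product of the periods."""
    bound = math.prod(periods)
    for t in range(1, bound + 1):
        if combined_phase(t, periods) == combined_phase(0, periods):
            return t
    return bound


print(smallest_period([4, 6, 10]), math.lcm(4, 6, 10))  # both equal 60
```

QPA recovers this combined period without an exhaustive search. The brute-force loop here only confirms that the period matches `math.lcm`.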
# リモートセンシングと機械学習によるキクイムシ被害の早期検出: レビュー Early Detection of Bark Beetle Attack Using Remote Sensing and Machine Learning: A Review ( http://arxiv.org/abs/2210.03829v2 ) ライセンス: Link先を確認	Seyed Mojtaba Marvasti-Zadeh, Devin Goodsman, Nilanjan Ray, Nadir Erbilgin	(参考訳) 本稿では、キクイムシ(bark beetle)による樹木枯死の早期検出に関する過去および現在の進展を、キクイムシと宿主の相互作用、リモートセンシング(RS)、機械学習/深層学習(ML/DL)という3つの主要な視点から包括的にレビューする。これまでの取り組みとは対照的に、本レビューは全てのRSシステムを網羅し、ML/DL手法に重点を置いてその長所と短所を調査する。マルチスペクトルまたはハイパースペクトル分析に基づいて既存の文献を分類し、以下の観点から知見を抽出する: キクイムシの種と攻撃段階(特に攻撃の初期段階に重点を置く)、宿主樹木、研究地域、RSプラットフォームとセンサ、スペクトル/空間/時間分解能、スペクトルシグネチャ、スペクトル植生指数(SVI)、MLアプローチ、学習スキーム、タスクカテゴリ、モデル、アルゴリズム、クラス/クラスタ、特徴量、DLネットワークとアーキテクチャ。DLベースの手法とランダムフォレスト(RF)アルゴリズムは有望な結果を示し、可視、熱、短波赤外(SWIR)スペクトル領域にわたる微妙な変化を検出する可能性を示したが、その有効性は依然として限定的であり、不確実性も高い。 This paper provides a comprehensive review of past and current advances in the early detection of bark beetle-induced tree mortality from three primary perspectives: bark beetle & host interactions, remote sensing (RS), and machine learning/deep learning (ML/DL). In contrast to prior efforts, this review encompasses all RS systems and emphasizes ML/DL methods to investigate their strengths and weaknesses. We parse existing literature based on multi- or hyper-spectral analyses and distill their knowledge based on: bark beetle species & attack phases with a primary emphasis on early stages of attacks, host trees, study regions, RS platforms & sensors, spectral/spatial/temporal resolutions, spectral signatures, spectral vegetation indices (SVIs), ML approaches, learning schemes, task categories, models, algorithms, classes/clusters, features, and DL networks & architectures. Although DL-based methods and the random forest (RF) algorithm showed promising results, highlighting their potential to detect subtle changes across visible, thermal, and short-wave infrared (SWIR) spectral regions, they still have limited effectiveness and high uncertainties. 
To inspire novel solutions to these shortcomings, we delve into the principal challenges & opportunities from different perspectives, enabling a deeper understanding of the current state of research and guiding future research directions.	翻訳日:2023-07-27 16:22:45 公開日:2023-07-26
# Factor Fields: ニューラルフィールドとその先のための統一フレームワーク Factor Fields: A Unified Framework for Neural Fields and Beyond ( http://arxiv.org/abs/2302.01226v2 ) ライセンス: Link先を確認	Anpei Chen, Zexiang Xu, Xinyue Wei, Siyu Tang, Hao Su, Andreas Geiger	(参考訳) 信号をモデル化し表現するための新しいフレームワークであるFactor Fieldsを提案する。Factor Fieldsは信号を因子の積に分解し、各因子は座標変換された入力信号に作用するニューラルまたは通常のフィールド表現で表される。この分解により、NeRF、Plenoxels、EG3D、Instant-NGP、TensoRFなどの最近の信号表現を一般化する統一フレームワークが得られることを示す。さらに、このフレームワークにより、本論文で提案するCoefficient-Basis Factorization(CoBaFa)のような強力な新しい信号表現を構築できる。実験で示されるように、CoBaFaは、ニューラル信号表現における3つの重要な目標である近似品質、コンパクト性、効率性の観点から、従来の高速再構成手法を改善する。実験により、従来の高速再構成手法と比較して、2次元画像回帰タスクではより高い画像近似品質を、3次元符号付き距離場の再構成ではより高い幾何学的品質を、放射場再構成タスクではより高いコンパクト性を達成することを実証する。さらに、CoBaFa表現は、学習中に信号間で基底を共有することで汎化を可能にし、疎な観測からの画像回帰や少数ショットの放射場再構成といった汎化タスクを実現する。プロジェクトページ: https://apchenstu.github.io/FactorFields/ We present Factor Fields, a novel framework for modeling and representing signals. Factor Fields decomposes a signal into a product of factors, each of which is represented by a neural or regular field representation operating on a coordinate-transformed input signal. We show that this decomposition yields a unified framework that generalizes several recent signal representations including NeRF, PlenOxels, EG3D, Instant-NGP, and TensoRF. Moreover, the framework allows for the creation of powerful new signal representations, such as the Coefficient-Basis Factorization (CoBaFa) which we propose in this paper. As evidenced by our experiments, CoBaFa leads to improvements over previous fast reconstruction methods in terms of the three critical goals in neural signal representation: approximation quality, compactness and efficiency. Experimentally, we demonstrate that our representation achieves better image approximation quality on 2D image regression tasks, higher geometric quality when reconstructing 3D signed distance fields and higher compactness for radiance field reconstruction tasks compared to previous fast reconstruction methods. 
Besides, our CoBaFa representation enables generalization by sharing the basis across signals during training, enabling generalization tasks such as image regression with sparse observations and few-shot radiance field reconstruction. Project Page: https://apchenstu.github.io/FactorFields/	翻訳日:2023-07-27 16:14:50 公開日:2023-07-26
# 量子力学の非線形拡張における無信号性 No-signaling in Nonlinear Extensions of Quantum Mechanics ( http://arxiv.org/abs/2301.11548v2 ) ライセンス: Link先を確認	Rohit Kishan Ray, Gian Paolo Beretta	(参考訳) 量子力学の非線形拡張を考案することは、超光速通信(シグナリング)のような非物理的特徴を排除する必要があるため、自明ではない。このレターでは、最急エントロピー上昇形式が実行可能な無信号拡張であることを示す。この形式は、部分系の局所的な時間発展が必ずしもその縮約状態のみに依存するとは限らない、より広いクラスの無信号非線形発展方程式に属する。さらに、局所的な縮約密度演算子に加えて、「局所知覚」と呼ばれる局所演算子の広いクラスが存在し、それらが他の非相互作用系内に局在したユニタリ操作の影響を受けないことを証明する。 Devising a nonlinear extension of quantum mechanics is nontrivial because unphysical features such as superluminal communication (signaling) are to be excluded. In this Letter, we show that the steepest entropy ascent formalism is a viable no-signaling extension belonging to a broader class of no-signaling nonlinear evolution equations for which the local evolution of a subsystem is not necessarily bound to depend only on its reduced state. We prove that, in addition to the local reduced density operator, there is a broad class of local operators called 'local perceptions', which are insensitive to unitary operations localized within other non-interacting systems.	翻訳日:2023-07-27 16:14:28 公開日:2023-07-26
# 量子コンピュータ上のスレーター行列式と相関状態の効率的な調製のための浅い量子回路 Shallow quantum circuits for efficient preparation of Slater determinants and correlated states on a quantum computer ( http://arxiv.org/abs/2301.07477v5 ) ライセンス: Link先を確認	Chong Hian Chee, Daniel Leykam, Adrian M. Mak, Dimitris G. Angelakis	(参考訳) フェルミオン・アンザッツ状態の調製は、量子化学や凝縮系物理への応用のための変分量子固有値ソルバーなど、多くの量子アルゴリズムにおける重要なサブルーチンである。スレーター行列式と相関状態の調製にこれまで必要とされてきた最も浅い回路深さでも、システムサイズ$N$に対して少なくとも線形にスケールする。量子機械学習のために開発されたデータローディング回路に着想を得て、$d$個のフェルミオンを含むそのような状態を、より浅くスケーラブルな${\mathcal{O}}(d \log_2^2 N)$の2量子ビットゲート深さの回路で調製する代替パラダイムを提案する。これは第二量子化における既存のアプローチに比べて$N$について準指数的な削減を与え、近未来の量子デバイス上で、より大きな基底関数系を用いた$d \ll {\mathcal{O}}(N / \log_2^2 N)$のフェルミオン系の高精度な研究を可能にする。 Fermionic ansatz state preparation is a critical subroutine in many quantum algorithms such as Variational Quantum Eigensolver for quantum chemistry and condensed matter applications. The shallowest circuit depth needed to prepare Slater determinants and correlated states to date scales at least linearly with respect to the system size $N$. Inspired by data-loading circuits developed for quantum machine learning, we propose an alternate paradigm that provides shallower, yet scalable ${\mathcal{O}}(d \log_2^2 N)$ two-qubit gate depth circuits to prepare such states with $d$ fermions, offering a subexponential reduction in $N$ over existing approaches in second quantization, enabling high-accuracy studies of $d \ll {\mathcal{O}}(N / \log_2^2 N)$ fermionic systems with larger basis sets on near-term quantum devices.	翻訳日:2023-07-27 16:14:14 公開日:2023-07-26
# 統計的推定における裾の重いデータの量子化:(ほぼ)ミニマックスレート、共変量量子化、一様回復 Quantizing Heavy-tailed Data in Statistical Estimation: (Near) Minimax Rates, Covariate Quantization, and Uniform Recovery ( http://arxiv.org/abs/2212.14562v2 ) ライセンス: Link先を確認	Junren Chen, Michael K. Ng, Di Wang	(参考訳) 本稿では、基礎となる分布がある次数までの有界なモーメントを持つ、いくつかの基本的な統計的推定問題における裾の重いデータの量子化について研究する。一様量子化の前にデータを切断し、適切にディザを加えることを提案する。我々の主要な主張は、提案手法が生成する量子化データのみから、推定誤差の(ほぼ)ミニマックスレートが達成可能であるということである。特に、共分散推定、圧縮センシング、行列補完について具体的な結果を導き、いずれも量子化が乗法的係数をわずかに悪化させるだけであることを示す。さらに、共変量(すなわちセンシングベクトル)と応答の両方が量子化される圧縮センシングについても研究する。共変量量子化の下では、共分散行列推定量が半正定値性を欠くため回復プログラムは非凸となるが、すべての局所最小解がほぼ最適な誤差限界を満たすことを証明する。さらに、積過程の集中不等式と被覆論法により、裾の重いノイズを伴う量子化圧縮センシングに対して、ほぼミニマックスな一様回復保証を確立する。 This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems, where the underlying distributions have bounded moments of some order. We propose to truncate and properly dither the data prior to a uniform quantization. Our major standpoint is that (near) minimax rates of estimation error are achievable merely from the quantized data produced by the proposed scheme. In particular, concrete results are worked out for covariance estimation, compressed sensing, and matrix completion, all agreeing that the quantization only slightly worsens the multiplicative factor. In addition, we study compressed sensing where both the covariate (i.e., the sensing vector) and the response are quantized. Under covariate quantization, although our recovery program is non-convex because the covariance matrix estimator lacks positive semi-definiteness, all local minimizers are proved to enjoy a near-optimal error bound. Moreover, by the concentration inequality of product processes and a covering argument, we establish a near-minimax uniform recovery guarantee for quantized compressed sensing with heavy-tailed noise.	翻訳日:2023-07-27 16:13:45 公開日:2023-07-26
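The abstract describes the general recipe: truncate the data, dither it, then apply a uniform quantizer. The sketch below makes three illustrative assumptions that are not taken from the paper. The threshold `tau` is chosen by hand. The quantizer is a mid-rise uniform quantizer with step `delta`. The dither is uniform on `[-delta/2, delta/2]`. Under this dither the quantizer output is unbiased for the truncated value, which is the property that lets estimators work from quantized data alone.

```python
import math
import random


def truncate(x: float, tau: float) -> float:
    """Clip x to [-tau, tau] so that heavy tails cannot dominate (tau is an assumption)."""
    return max(-tau, min(tau, x))


def dithered_quantize(x: float, delta: float, rng: random.Random) -> float:
    """Mid-rise uniform quantizer Q(y) = delta * (floor(y / delta) + 1/2) applied to
    y = x + u with u ~ Uniform[-delta/2, delta/2]; with this dither E[Q(x + u)] = x."""
    u = rng.uniform(-delta / 2, delta / 2)
    return delta * (math.floor((x + u) / delta) + 0.5)


def quantized_mean(samples: list[float], tau: float, delta: float, seed: int = 0) -> float:
    """Estimate the mean of the truncated samples using only their quantized values."""
    rng = random.Random(seed)
    q = [dithered_quantize(truncate(s, tau), delta, rng) for s in samples]
    return sum(q) / len(q)
```

Without the dither, every sample equal to 0.3 would be quantized to the same level, and the estimate would be biased. The random dither spreads these samples across the two neighboring levels in proportions that average back to 0.3.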
# カスケードLSTMネットワークを用いた深層強化学習に基づく新しい自動株式取引システム A Novel Deep Reinforcement Learning Based Automated Stock Trading System Using Cascaded LSTM Networks ( http://arxiv.org/abs/2212.02721v2 ) ライセンス: Link先を確認	Jie Zou, Jiashu Lou, Baohua Wang, Sixue Liu	(参考訳) 深層強化学習(DRL)アルゴリズムを用いた株式取引戦略はますます増えているが、もともとゲーム分野で広く使われてきたDRL手法は、信号対雑音比が低く不規則な金融データにそのままでは適応できず、性能上の欠点を抱えている。本稿では、隠れた情報を捉えるために、カスケードLSTMを用いたDRLベースの株式取引システムを提案する。このシステムは、まずLSTMを用いて株式の日次データから時系列特徴を抽出し、抽出した特徴をエージェントの学習に入力する。さらに、強化学習における戦略関数の学習にも別のLSTMを用いる。米国市場のDJIと中国株式市場のSSE50での実験から、提案モデルは累積リターンとシャープレシオにおいて従来のベースラインモデルを上回り、この優位性は新興市場である中国株式市場においてより顕著であることが示された。これは、提案手法が自動株式取引システムを構築するための有望な方法であることを示している。 More and more stock trading strategies are constructed using deep reinforcement learning (DRL) algorithms, but DRL methods originally widely used in the gaming community are not directly adaptable to financial data with low signal-to-noise ratios and unevenness, and thus suffer from performance shortcomings. In this paper, to capture the hidden information, we propose a DRL-based stock trading system using cascaded LSTMs: it first uses an LSTM to extract time-series features from daily stock data, then feeds the extracted features to the agent for training, while the strategy functions in reinforcement learning are trained with another LSTM. Experiments on the DJI in the US market and the SSE50 in the Chinese stock market show that our model outperforms previous baseline models in terms of cumulative returns and Sharpe ratio, and this advantage is more significant in the Chinese stock market, an emerging market. This indicates that our proposed method is a promising way to build an automated stock trading system.	翻訳日:2023-07-27 16:13:25 公開日:2023-07-26
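The two evaluation metrics named in the abstract are cumulative return and the Sharpe ratio, and both have standard definitions. The sketch below is a minimal version under three assumptions: simple daily returns, a 252-trading-day year for annualization, and a zero risk-free rate by default. The paper's exact conventions are not stated in the abstract.

```python
import math
import statistics


def cumulative_return(daily_returns: list[float]) -> float:
    """Compounded return over the whole period: prod(1 + r_t) - 1."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(daily_returns: list[float], risk_free_daily: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio: mean excess return divided by the sample standard
    deviation of excess returns, scaled by sqrt(periods_per_year)."""
    excess = [r - risk_free_daily for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)
```

A strategy can have a higher cumulative return but a lower Sharpe ratio if it reaches that return with more volatile daily P&L. This is why reports usually give both metrics.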
# 不確かなダイナミクスを持つマルコフジャンプ線形系に対する形式的制御器合成 Formal Controller Synthesis for Markov Jump Linear Systems with Uncertain Dynamics ( http://arxiv.org/abs/2212.00679v4 ) ライセンス: Link先を確認	Luke Rickard, Thom Badings, Licio Romao, Alessandro Abate	(参考訳) サイバーフィジカルシステムに対する証明可能に正しい制御器の自動合成は、安全性が重要なシナリオでの運用に不可欠である。しかし、ハイブリッドな特性や確率的または未知の振る舞いがこの問題を難しくしている。本稿では、サイバーフィジカルシステムのための離散時間モデルの一種であるマルコフジャンプ線形システム(MJLS)に対して、確率的計算木論理(PCTL)の論理式を保証付きで満たす制御器を合成する手法を提案する。MJLSは有限個の確率的線形ダイナミクスと、マルコフ決定過程(MDP)によって支配されるこれらのダイナミクス間の離散ジャンプからなる。本研究では、このMDPの遷移確率が区間の範囲でのみ既知である場合と、完全に未知である場合を考える。我々のアプローチは、MJLSの離散的(モードジャンプ)挙動と連続的(確率的線形)挙動の両方を捉える有限状態抽象化に基づいている。この抽象化を区間MDP(iMDP)として定式化し、いわゆる「シナリオアプローチ」のサンプリング手法を用いて遷移確率の区間を計算することで、確率的に健全な近似を得る。本手法を複数の現実的なベンチマーク問題、特に温度制御と航空機による配送問題に適用する。 Automated synthesis of provably correct controllers for cyber-physical systems is crucial for deployment in safety-critical scenarios. However, hybrid features and stochastic or unknown behaviours make this problem challenging. We propose a method for synthesising controllers for Markov jump linear systems (MJLSs), a class of discrete-time models for cyber-physical systems, so that they certifiably satisfy probabilistic computation tree logic (PCTL) formulae. An MJLS consists of a finite set of stochastic linear dynamics and discrete jumps between these dynamics that are governed by a Markov decision process (MDP). We consider the cases where the transition probabilities of this MDP are either known up to an interval or completely unknown. Our approach is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS. We formalise this abstraction as an interval MDP (iMDP) for which we compute intervals of transition probabilities using sampling techniques from the so-called 'scenario approach', resulting in a probabilistically sound approximation. We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.	
翻訳日:2023-07-27 16:13:07 公開日:2023-07-26
# エコーチェンバー効果を増幅するリツイート Retweets Amplify the Echo Chamber Effect ( http://arxiv.org/abs/2211.16480v2 ) ライセンス: Link先を確認	Ashwin Rao, Fred Morstatter and Kristina Lerman	(参考訳) 公共の言論におけるソーシャルメディアの存在感の高まりにより、オンライン情報の質と、それが政治的分極化の増幅に果たす役割がより厳しく精査されるようになった。しかし、Twitterのようなソーシャルメディアプラットフォームにおける分極化の研究は、ソーシャルグラフ、特にユーザーが参加するエコーチェンバーやタイムラインで目にするものを形作るフォローリンクに関するデータ収集の難しさによって妨げられてきた。フォロワーグラフの代理として研究者はリツイートを用いるが、この選択が分析にどう影響するかは明らかではない。Twitterのフォロワーグラフのサンプルとその中のユーザーが投稿したツイートを用いて、リツイートグラフを再構築し、それがエコーチェンバーと露出の指標に与える影響を定量化する。エコーチェンバーは両方のグラフに存在するが、リツイートグラフではより顕著である。ユーザーがフォロワーネットワークとリツイートネットワークを通じて目にする情報を比較し、リツイートされたアカウントが系統的により分極化したコンテンツを共有していることを示す。このバイアスは、ユーザー自身のフォロワーグラフ近傍における活動や分極化では説明できず、イデオロギー的に自身の見解と一致するアカウントにより多くの注意を払うことで説明される。以上の結果は、リツイートグラフに依拠する研究がエコーチェンバー効果や分極化した情報への露出を過大評価していることを示唆する。 The growing prominence of social media in public discourse has led to a greater scrutiny of the quality of online information and the role it plays in amplifying political polarization. However, studies of polarization on social media platforms like Twitter have been hampered by the difficulty of collecting data about the social graph, specifically follow links that shape the echo chambers users join as well as what they see in their timelines. As a proxy for the follower graph, researchers use retweets, although it is not clear how this choice affects analysis. Using a sample of the Twitter follower graph and the tweets posted by users within it, we reconstruct the retweet graph and quantify its impact on the measures of echo chambers and exposure. While we find that echo chambers exist in both graphs, they are more pronounced in the retweet graph. We compare the information users see via their follower and retweet networks to show that retweeted accounts share systematically more polarized content. 
This bias cannot be explained by the activity or polarization within users' own follower graph neighborhoods but by the increased attention they pay to accounts that are ideologically aligned with their own views. Our results suggest that studies relying on the retweet graphs overestimate the echo chamber effects and exposure to polarized information.	翻訳日:2023-07-27 16:12:46 公開日:2023-07-26
# FsaNet: セマンティックセグメンテーションのための周波数自己注意 FsaNet: Frequency Self-attention for Semantic Segmentation ( http://arxiv.org/abs/2211.15595v3 ) ライセンス: Link先を確認	Fengyu Zhang, Ashkan Panahi, Guangjun Gao	(参考訳) 画像のスペクトル特性を考慮し、計算複雑性を線形オーダーまで大幅に削減した新しい自己注意機構を提案する。物体内の類似性を促進しつつエッジをよりよく保存するため、異なる周波数帯域ごとに個別の処理を提案する。特に、処理を低周波成分のみに限定する場合について検討する。アブレーション研究により、低周波自己注意は、ネットワークを再学習しなくても、全周波数の場合に非常に近いかそれ以上の性能を達成できることを示す。そこで我々は、新しいプラグアンドプレイモジュールを設計してCNNネットワークのヘッドに組み込み、これをFsaNetと呼ぶ。周波数自己注意は、1) 入力として少数の低周波係数しか必要とせず、2) 線形構造を持つ空間領域の自己注意と数学的に等価になりえ、3) トークンマッピング($1\times1$畳み込み)段階とトークン混合段階を同時に単純化する。周波数自己注意は、通常の自己注意と比べて、メモリを$87.29\% \sim 90.04\%$、FLOPsを$96.13\% \sim 98.07\%$、実行時間を$97.56\% \sim 98.18\%$削減することを示す。他のResNet101ベースの自己注意ネットワークと比較して、FsaNetはCityscapesテストデータセットで新たな最先端の結果($83.0\%$ mIoU)を達成し、ADE20kとVOCaugでも競争力のある結果を得た。FsaNetはCOCO上のインスタンスセグメンテーションにおいてMask R-CNNを強化することもできる。さらに、提案モジュールを利用することで、スケールの異なる一連のSegformerモデルの性能を向上でき、Segformer-B5は再学習なしでも改善できる。コードは https://github.com/zfy-csu/FsaNet で公開されている。 Considering the spectral properties of images, we propose a new self-attention mechanism with highly reduced computational complexity, up to a linear rate. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study a case where the process is merely over low-frequency components. Through an ablation study, we show that low-frequency self-attention can achieve very close or better performance relative to full frequency even without retraining the network. Accordingly, we design and embed novel plug-and-play modules to the head of a CNN network that we refer to as FsaNet. The frequency self-attention 1) requires only a few low-frequency coefficients as input, 2) can be mathematically equivalent to spatial domain self-attention with linear structures, and 3) simplifies the token mapping ($1\times1$ convolution) stage and the token mixing stage simultaneously. 
We show that frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ fewer FLOPs, and $97.56\% \sim 98.18\%$ less run time than regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscapes test set and competitive results on ADE20k and VOCaug. FsaNet can also enhance Mask R-CNN for instance segmentation on COCO. In addition, utilizing the proposed module, Segformer can be boosted on a series of models with different scales, and Segformer-B5 can be improved even without retraining. Code is accessible at https://github.com/zfy-csu/FsaNet	翻訳日:2023-07-27 16:12:23 公開日:2023-07-26
# 非エルミート二バンドBCSモデルにおけるゼロ例外点におけるマイスナー効果の破壊 Breakdown of the Meissner effect at the zero exceptional point in non-Hermitian two-band BCS model ( http://arxiv.org/abs/2211.11422v2 ) ライセンス: Link先を確認	Takanobu Taira	(参考訳) 外部浴槽に結合した系を記述する非エルミート多体ハミルトニアンについて検討する。非エルミート平均場理論を用いて、ハミルトニアンの固有値はパラメータ空間において特異性を示し、例外点と呼ばれる相転移点が出現することを示した。この時点で、ギャップパラメータが有限である間、マイスナー効果は崩壊する。我々の研究は、非エルミート多体系における例外点の役割に関する洞察を提供する。 We investigate a non-Hermitian many-body Hamiltonian describing a system coupled to an external bath. Using non-Hermitian mean-field theory, we show that the Hamiltonian's eigenvalues exhibit a singularity in the parameter space, leading to the emergence of a phase transition point called the exceptional point. At this point, the Meissner effect breaks down while gap parameters remain finite. Our work provides insights into the role of an exceptional point in the non-Hermitian many-body systems.	翻訳日:2023-07-27 16:11:51 公開日:2023-07-26
# 視覚的位置認識のための支配集合によるデータベース選択 Dominating Set Database Selection for Visual Place Recognition ( http://arxiv.org/abs/2303.05123v2 ) ライセンス: Link先を確認	Anastasiia Kornilova, Ivan Moskalenko, Timofei Pushkin, Fakhriddin Tojiboev, Rahim Tariverdizadeh, Gonzalo Ferrer	(参考訳) 本稿では、RGBDスキャンシーケンスから、屋内環境での自己位置推定のための視覚的位置認識(VPR)データベースを作成する手法を提案する。提案手法は、空間情報から構築したグラフの支配集合に基づく最小化問題として定式化され、DominatingSetと呼ばれる。本アルゴリズムは、データベース作成に用いられる他の手法と比較して、より良いシーンカバレッジを示す。また、DominatingSetを用いると、テストシーケンスで80%以上の再現率を維持しつつ、データベースサイズを元のスキャンシーケンスの最大1/250～1/1400に縮小できることを示す。提案アルゴリズムを7-ScenesおよびBundleFusionデータセットと、反復構造の多いオフィス環境で追加収録したシーケンスで評価した。さらに、このデータベース選択は、ニューラル位置認識アルゴリズムを特定の環境に合わせて微調整するための弱教師ラベルを生成でき、その精度をさらに向上させる。また、RGBDスキャンシーケンスからVPRデータベースを作成する完全自動化パイプラインと、VPRデータベース評価のための一連の指標も提示する。コードと公開データは、私たちのWebページ https://prime-slam.github.io/place-recognition-db/ で利用可能である。 This paper presents an approach for creating a visual place recognition (VPR) database for localization in indoor environments from RGBD scanning sequences. The proposed approach is formulated as a minimization problem in terms of a dominating set of a graph constructed from spatial information, and is referred to as DominatingSet. Our algorithm shows better scene coverage in comparison to other methodologies that are used for database creation. Also, we demonstrate that using DominatingSet, the database size can be up to 250-1400 times smaller than the original scanning sequence while maintaining a recall rate of more than 80% on testing sequences. We evaluated our algorithm on the 7-Scenes and BundleFusion datasets and an additionally recorded sequence in a highly repetitive office setting. In addition, the database selection can produce weakly-supervised labels for fine-tuning neural place recognition algorithms to particular settings, further improving their accuracy. 
The paper also presents a fully automated pipeline for VPR database creation from RGBD scanning sequences, as well as a set of metrics for VPR database evaluation. The code and released data are available on our web page: https://prime-slam.github.io/place-recognition-db/	翻訳日:2023-07-27 16:05:22 公開日:2023-07-26
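The abstract treats database selection as a dominating-set problem on a graph built from spatial information, but it does not give the edge rule or the solver. The sketch below is illustrative and may differ from DominatingSet. It assumes that two frames are linked when their camera positions lie within a hypothetical `radius`. It then applies the textbook greedy approximation: repeatedly keep the frame that covers the most frames not yet represented.

```python
import math


def build_graph(positions: list[tuple[float, float, float]], radius: float) -> list[set[int]]:
    """Link two frames if their camera positions are within `radius` (hypothetical rule)."""
    n = len(positions)
    adj: list[set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_dominating_set(adj: list[set[int]]) -> list[int]:
    """Greedy approximation: pick the vertex whose closed neighbourhood covers the most
    not-yet-dominated vertices until every vertex is dominated."""
    uncovered = set(range(len(adj)))
    chosen: list[int] = []
    while uncovered:
        best = max(range(len(adj)), key=lambda v: len(({v} | adj[v]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | adj[best]
    return chosen
```

The frames in `chosen` would then make up the reduced VPR database. Every other frame has at least one selected neighbour within the assumed radius.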
# フロッケ電子相の光伝導率シグネチャ Optical Conductivity Signatures of Floquet Electronic Phases ( http://arxiv.org/abs/2303.02261v2 ) ライセンス: Link先を確認	Andrew Cupo, Joshuah T. Heath, Emilio Cobanera, James D. Whitfield, Chandrasekhar Ramanathan, Lorenza Viola	(参考訳) 光伝導率測定は、準エネルギーバンド構造によって理論的に記述されるフロッケ電子相の明確なシグネチャへのアクセスを与えうる。我々は、以前に導入したフロッケ・グラフェンアンチドット格子(FGAL)[Phys. Rev. B 104, 174304 (2021)]の実験観測量を特徴付ける。フロッケ線形応答理論に基づいて、縦光伝導率とホール光伝導率の実部と虚部をプローブ周波数の関数として計算する。応答関数におけるピークの数と位置が異なるフロッケ電子相に特有であることを見出し、平衡系に類似物を持たない複数の性質を同定する。第一に、いくつかのプローブ周波数区間において伝導率の実部が負になる。これは通常のジュール加熱機構が覆されていることを示すと我々は主張する: フロッケ駆動により物質がプローブのパワーを増幅し、利得が生じる。さらに、ホール応答は平衡状態では消えるが、フロッケ・ホール伝導率の実部と虚部はゼロではなく、縦成分と同程度の大きさになりうる。最後に、駆動誘起局在は光伝導率信号の全体的な大きさを減少させ、平坦化する傾向がある。実装の観点から、FGALの主な利点は、バンド幅を超える駆動の極限に、真性材料で必要とされるよりも少なくとも20倍低い光子エネルギーで到達でき、桁違いに小さい強度で大きなバンド再規格化を可能にすることである。本研究は、実験家がこの新材料の反射率データを特定のフロッケ相に対応付けるために必要なツールを提供する。 Optical conductivity measurements may provide access to distinct signatures of Floquet electronic phases, which are described theoretically by their quasienergy band structures. We characterize experimental observables of the Floquet graphene antidot lattice (FGAL), which we introduced previously [Phys. Rev. B 104, 174304 (2021)]. On the basis of Floquet linear response theory, the real and imaginary parts of the longitudinal and Hall optical conductivity are computed as a function of probe frequency. We find that the number and positions of peaks in the response function are distinctive of the different Floquet electronic phases, and identify multiple properties with no equilibrium analog. First, for several intervals of probe frequencies, the real part of the conductivity becomes negative. We argue this is indicative of a subversion of the usual Joule heating mechanism: The Floquet drive causes the material to amplify the power of the probe, resulting in gain. 
Additionally, while the Hall response vanishes at equilibrium, the real and imaginary parts of the Floquet Hall conductivity are non-zero and can be as large as the longitudinal components. Lastly, driving-induced localization tends to reduce the overall magnitude of and to flatten out the optical conductivity signal. From an implementation standpoint, a major advantage of the FGAL is that the above-bandwidth driving limit is reached with photon energies that are at least twenty times lower than that required for the intrinsic material, allowing for significant band renormalization at orders-of-magnitude smaller intensities. Our work provides the necessary tools for experimentalists to map reflectance data to particular Floquet phases for this novel material.	翻訳日:2023-07-27 16:04:46 公開日:2023-07-26
# FacEDiM: 牛の少数ショット生体認証のための顔埋め込み分布モデル FacEDiM: A Face Embedding Distribution Model for Few-Shot Biometric Authentication of Cattle ( http://arxiv.org/abs/2302.14831v2 ) ライセンス: Link先を確認	Meshia Cédric Oveneke, Rucha Vaishampayan, Deogratias Lukamba Nsadisa, Jenny Ambukiyenyi Onya	(参考訳) 本研究は、事前学習済みCNNを用いて得られた学習用埋め込みの多変量ガウス分布とテスト用埋め込みとのマハラノビス距離を計算することで、少数ショット生体認証の問題を解決することを提案する。実験の結果、ImageNetデータセットで事前学習したモデルは、人間の顔で事前学習したモデルを大きく上回ることが示された。VGG16モデルでは、20頭の牛の個体からなるデータセットにおいて、他人受入率(FAR)1.18%で本人拒否率(FRR)1.25%を得た。 This work proposes to solve the problem of few-shot biometric authentication by computing the Mahalanobis distance between testing embeddings and a multivariate Gaussian distribution of training embeddings obtained using pre-trained CNNs. Experimental results show that models pre-trained on the ImageNet dataset significantly outperform models pre-trained on human faces. With a VGG16 model, we obtain a false rejection rate (FRR) of 1.25% at a false acceptance rate (FAR) of 1.18% on a dataset of 20 cattle identities.	翻訳日:2023-07-27 16:04:17 公開日:2023-07-26
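The scoring step in the abstract compares a test embedding with a Gaussian fitted to training embeddings by measuring Mahalanobis distance. The numpy sketch below assumes that embeddings have already been extracted by a pre-trained CNN such as VGG16. It also assumes that acceptance is decided by a fixed distance threshold. With only a few shots the sample covariance is often singular, so the sketch adds a small ridge term `eps`. Both `eps` and `threshold` are hypothetical parameters, not values from the paper.

```python
import numpy as np


def fit_gaussian(train_emb: np.ndarray, eps: float = 1e-6):
    """Fit the mean and a ridge-regularized inverse covariance to one identity's
    training embeddings, given as an array of shape (num_samples, dim)."""
    mu = train_emb.mean(axis=0)
    cov = np.cov(train_emb, rowvar=False) + eps * np.eye(train_emb.shape[1])
    return mu, np.linalg.inv(cov)


def mahalanobis(x: np.ndarray, mu: np.ndarray, cov_inv: np.ndarray) -> float:
    """sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))


def authenticate(x: np.ndarray, mu: np.ndarray, cov_inv: np.ndarray, threshold: float) -> bool:
    """Accept the claimed identity if the test embedding is close enough."""
    return mahalanobis(x, mu, cov_inv) <= threshold
```

Raising `threshold` accepts more genuine animals, which lowers FRR, but it also accepts more impostors, which raises FAR. A reported FRR/FAR pair such as 1.25%/1.18% corresponds to one choice of this threshold.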
# オープンシステムのノイズ支援ディジタル量子シミュレーション Noise-assisted digital quantum simulation of open systems ( http://arxiv.org/abs/2302.14592v3 ) ライセンス: Link先を確認	José D. Guimarães, James Lim, Mikhail I. Vasilevskiy, Susana F. Huelga and Martin B. Plenio	(参考訳) 量子系は本質的にオープンであり環境ノイズの影響を受けやすく、そのノイズはダイナミクスに有害な効果と有益な効果の両方をもたらしうる。この現象は、ノイズが新しい機能を可能にする生体分子系で観察されており、そのダイナミクスのシミュレーションはデジタルおよびアナログ量子シミュレーションの重要なターゲットとなっている。それにもかかわらず、現在の量子デバイスの計算能力は、その固有のノイズのため、しばしば制限される。本研究では、オープン量子系のシミュレーションに必要な計算資源を削減するために、量子デバイス固有のノイズを利用する新しい手法を提案する。提案手法は、量子ノイズ特性評価法と量子エラー緩和法を組み合わせることで、量子回路における固有ノイズの操作と制御を可能にする。具体的には、開放系ダイナミクスの所望のシミュレーションを実現するために、量子回路のデコヒーレンス率を選択的に増減する。本手法の詳細を述べるとともに、実機およびエミュレートされたIBM量子コンピュータで実施したノイズ特性評価および量子エラー緩和実験の結果について報告する。さらに、本手法の実験的資源要件を推定する。提案手法は、固有ノイズを活用して量子計算を強化し、NISQ(Noisy Intermediate-Scale Quantum)デバイスにおける新しいシミュレーション手法を切り開く可能性を秘めている。 Quantum systems are inherently open and susceptible to environmental noise, which can have both detrimental and beneficial effects on their dynamics. This phenomenon has been observed in bio-molecular systems, where noise enables novel functionalities, making the simulation of their dynamics a crucial target for digital and analog quantum simulation. Nevertheless, the computational capabilities of current quantum devices are often limited due to their inherent noise. In this work, we present a novel approach that capitalizes on the intrinsic noise of quantum devices to reduce the computational resources required for simulating open quantum systems. Our approach combines quantum noise characterization methods with quantum error mitigation techniques, enabling us to manipulate and control the intrinsic noise in a quantum circuit. Specifically, we selectively enhance or reduce decoherence rates in the quantum circuit to achieve the desired simulation of open system dynamics. 
We provide a detailed description of our methods and report on the results of noise characterization and quantum error mitigation experiments conducted on both real and emulated IBM Quantum computers. Additionally, we estimate the experimental resource requirements for our techniques. Our approach holds the potential to unlock new simulation techniques in Noisy Intermediate-Scale Quantum (NISQ) devices, harnessing their intrinsic noise to enhance quantum computations.	翻訳日:2023-07-27 16:04:07 公開日:2023-07-26
# サブキューブ条件付けによるハイパーグリッド上の一様性検定 Uniformity Testing over Hypergrids with Subcube Conditioning ( http://arxiv.org/abs/2302.09013v2 ) ライセンス: Link先を確認	Xi Chen, Cassandra Marcussen	(参考訳) 本稿では、超格子$[m_1] \times \cdots \times [m_n]$上に台を持つ分布の一様性を検定するアルゴリズムを与える。このアルゴリズムは、$m=\max_i m_i$として、サブキューブ条件付きサンプリングオラクルに$\smash{\widetilde{O}(\text{poly}(m)\sqrt{n}/\epsilon^2)}$回のクエリを行う。$m$が定数のとき、我々のアルゴリズムはほぼ最適であり、同じクエリ複雑度を持つがハイパーキューブ$\{\pm 1\}^n$でしか機能しない[CCK+21]のアルゴリズムを強化する。我々のアルゴリズムの解析を支える重要な技術的貢献は、フーリエ解析を用いた、超格子上の関数に対するピシエの不等式の頑健版の証明である。 We give an algorithm for testing uniformity of distributions supported on hypergrids $[m_1] \times \cdots \times [m_n]$, which makes $\smash{\widetilde{O}(\text{poly}(m)\sqrt{n}/\epsilon^2)}$ many queries to a subcube conditional sampling oracle with $m=\max_i m_i$. When $m$ is a constant, our algorithm is nearly optimal and strengthens the algorithm of [CCK+21] which has the same query complexity but works for hypercubes $\{\pm 1\}^n$ only. A key technical contribution behind the analysis of our algorithm is a proof of a robust version of Pisier's inequality for functions over hypergrids using Fourier analysis.	翻訳日:2023-07-27 16:02:57 公開日:2023-07-26
# オフライン強化学習におけるデータ強化のための不確実性駆動トラジェクトリトランケーション Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning ( http://arxiv.org/abs/2304.04660v2 ) ライセンス: Link先を確認	Junjie Zhang, Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Jun Yang, Le Wan, Xiu Li	(参考訳) トレーニングされた環境ダイナミクスを備えたモデルベースオフライン強化学習(RL)アルゴリズムは、品質の低いデータセットでさえも、固定サイズのデータセットから優れたポリシをうまく学習することができる。しかし残念ながら、トレーニングされたダイナミクスモデルから生成されたサンプルが信頼できることは保証できない(例えば、いくつかの合成サンプルは静的データセットの支持領域の外側にあるかもしれない)。この問題に対処するため, 軌道に沿って蓄積された不確かさが大きすぎる場合, 合成軌道を適応的に切断するトラジェクトリトラニケーション (TATU) を提案する。理論的には、TATUの性能境界を示し、その利点を正当化する。 TATUの利点を実証的に示すために、まず2つの古典的モデルベースオフラインRLアルゴリズム、MOPOとCOMBOを組み合わせる。さらに、TATUを市販のモデルなしオフラインRLアルゴリズム、例えばBCQと統合する。 D4RLベンチマーク実験の結果、TATUは性能を著しく改善し、しばしば大きなマージンで改善した。コードはここにある。 Equipped with the trained environmental dynamics, model-based offline reinforcement learning (RL) algorithms can often successfully learn good policies from fixed-sized datasets, even some datasets with poor quality. Unfortunately, however, it can not be guaranteed that the generated samples from the trained dynamics model are reliable (e.g., some synthetic samples may lie outside of the support region of the static dataset). To address this issue, we propose Trajectory Truncation with Uncertainty (TATU), which adaptively truncates the synthetic trajectory if the accumulated uncertainty along the trajectory is too large. We theoretically show the performance bound of TATU to justify its benefits. To empirically show the advantages of TATU, we first combine it with two classical model-based offline RL algorithms, MOPO and COMBO. Furthermore, we integrate TATU with several off-the-shelf model-free offline RL algorithms, e.g., BCQ. Experimental results on the D4RL benchmark show that TATU significantly improves their performance, often by a large margin. Code is available here.	翻訳日:2023-07-27 15:54:32 公開日:2023-07-26
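The abstract states that TATU cuts a synthetic trajectory once the uncertainty accumulated along it becomes too large. The sketch below is a minimal version of that rule under two assumptions. First, per-step uncertainty is measured as the disagreement among an ensemble of learned dynamics models, a common choice in model-based offline RL. Second, the trajectory is cut when a running sum exceeds a fixed `threshold`. The paper's actual uncertainty measure and threshold rule may differ.

```python
import statistics


def ensemble_uncertainty(next_state_preds: list[list[float]]) -> float:
    """Disagreement of an ensemble of learned dynamics models for one step:
    the largest per-dimension (population) standard deviation across members."""
    return max(statistics.pstdev(dim) for dim in zip(*next_state_preds))


def truncate_rollout(step_uncertainties: list[float], threshold: float) -> int:
    """Number of synthetic steps to keep: cut the trajectory just before the step at
    which the accumulated uncertainty first exceeds `threshold`."""
    total = 0.0
    for t, u in enumerate(step_uncertainties):
        total += u
        if total > threshold:
            return t
    return len(step_uncertainties)
```

Only the kept prefix of each rollout would be added to the synthetic buffer. Long rollouts that drift away from the support of the static dataset therefore contribute only their early, more reliable steps.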
# 強い熱浴結合により性能が向上するサイクル型量子エンジン Cyclic quantum engines enhanced by strong bath coupling ( http://arxiv.org/abs/2304.03267v3 ) ライセンス: Link先を確認	Camille L. Latune, Graeme Pleasance, and Francesco Petruccione	(参考訳) 強いシステム-熱浴結合は豊かで興味深い現象を生み出すが、量子熱エンジンへの応用に関しては、これまで主に有害な効果が指摘されてきた。強結合による効率の損失と、より速い平衡化による出力の増加との微妙なトレードオフは認識されていたものの、平衡化時間を正確に評価することが難しいため、ほとんど調べられてこなかった。本研究では、階層型運動方程式(HEOM)形式に基づく厳密な数値シミュレーションを用いてこの障害を克服する。量子オットーサイクルは、効率と出力の積がこの領域で最大になるという意味で、強結合(ただし超強結合ではない)領域でより優れた性能を示しうることを示す。特に、強結合により、弱結合のエンジンと同じ出力を保ちながら、より高い効率を持つエンジンが得られることを示す。逆に、同じ効率を保ちながら、弱結合のエンジンより大きな出力を持つ強結合エンジンを設計することもできる。全体として、我々の結果は強結合が熱力学的操作の性能を直接向上させうる状況を示し、標準的な構成を超えて量子熱エンジンを研究することの重要性を改めて強調する。 While strong system-bath coupling produces rich and interesting phenomena, applications to quantum thermal engines have been so far pointing mainly at detrimental effects. The delicate trade-off between efficiency loss due to strong coupling and power increase due to faster equilibration, while acknowledged, remained largely unexplored owing to the challenge of assessing precisely the equilibration time. Here, we overcome this obstacle by exploiting exact numerical simulations based on the hierarchical equations of motion (HEOM) formalism. We show that a quantum Otto cycle can perform better at strong (but not ultrastrong) coupling in that the product of the efficiency and the output power is maximized in this regime. In particular, we show that strong coupling allows one to obtain engines with larger efficiency than their weakly coupled counterparts, while sharing the same output power. Conversely, one can design strongly coupled engines with larger power than their weakly coupled counterparts, while sharing the same efficiency. Overall, our results provide situations where strong coupling can directly enhance the performance of thermodynamic operations, reinforcing the importance of studying quantum thermal engines beyond standard configurations.	翻訳日:2023-07-27 15:54:13 公開日:2023-07-26
# Neglected Free Lunch -- Learning Image Classifiers Using Annotation Byproducts ( http://arxiv.org/abs/2303.17595v3 ) License: see link	Dongyoon Han, Junsuk Choe, Seonghyeok Chun, John Joon Young Chung, Minsuk Chang, Sangdoo Yun, Jean Y. Song, Seong Joon Oh	Supervised learning of image classifiers distills human knowledge into a parametric model through pairs of images and corresponding labels (X,Y). We argue that this simple and widely used representation of human knowledge neglects rich auxiliary information from the annotation procedure, such as the time series of mouse traces and clicks left after image selection. Our insight is that such annotation byproducts Z provide approximate human attention that weakly guides the model to focus on foreground cues, reducing spurious correlations and discouraging shortcut learning. To verify this, we create ImageNet-AB and COCO-AB: the ImageNet and COCO training sets enriched with sample-wise annotation byproducts, collected by replicating the respective original annotation tasks. We call this new paradigm of training models with annotation byproducts learning using annotation byproducts (LUAB). We show that a simple multitask loss for regressing Z together with Y already improves the generalisability and robustness of the learned models. Compared to the original supervised learning, LUAB requires no extra annotation cost. ImageNet-AB and COCO-AB are available at https://github.com/naver-ai/NeglectedFreeLunch.	Translated: 2023-07-27 15:53:52 Published: 2023-07-26
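A minimal sketch of the multitask objective described above, assuming the byproduct target Z is a fixed-length numeric vector (for example, normalized click coordinates) and that the two terms are combined with a hypothetical weight `weight`; the paper's exact loss terms and weighting may differ.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for the class label Y."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error for regressing the byproduct Z."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def luab_loss(
    logits: Sequence[float],
    label: int,
    z_pred: Sequence[float],
    z_target: Sequence[float],
    weight: float = 1.0,  # hypothetical trade-off between the two tasks
) -> float:
    """Classification loss on Y plus a weighted regression loss on Z."""
    return cross_entropy(logits, label) + weight * mse(z_pred, z_target)
```

Because Z is only needed at training time, inference uses the classifier head alone, which is consistent with the abstract's claim of no extra annotation cost.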
# Multimodal Manoeuvre and Trajectory Prediction for Automated Driving on Highways Using Transformer Networks ( http://arxiv.org/abs/2303.16109v2 ) License: see link	Sajjad Mozaffari, Mreza Alipour Sormoli, Konstantinos Koufos, and Mehrdad Dianati	Predicting the behaviour (i.e., manoeuvre/trajectory) of other road users, including vehicles, is critical for the safe and efficient operation of autonomous vehicles (AVs), also known as automated driving systems (ADSs). Because the future behaviour of vehicles is uncertain, multiple future behaviour modes are often plausible for a vehicle in a given driving scene. Multimodal prediction can therefore provide richer information than single-mode prediction, enabling AVs to perform better risk assessment. To this end, we propose a novel multimodal prediction framework that can predict multiple plausible behaviour modes and their likelihoods. The proposed framework includes a bespoke problem formulation for manoeuvre prediction, a novel transformer-based prediction model, and a tailored training method for multimodal manoeuvre and trajectory prediction. The framework's performance is evaluated on three public highway driving datasets, namely NGSIM, highD, and exiD. The results show that our framework outperforms state-of-the-art multimodal methods in terms of prediction error and is capable of predicting plausible manoeuvre and trajectory modes.	Translated: 2023-07-27 15:53:32 Published: 2023-07-26
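Multimodal predictors are commonly scored with a best-of-K error, in which each predicted mode is compared with the ground truth and only the closest mode counts. The sketch below computes this minimum average displacement error (minADE) for 2D trajectories; whether the paper reports exactly this metric is an assumption here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def average_displacement(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Mean Euclidean distance between matching time steps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def min_ade(modes: List[Sequence[Point]], truth: Sequence[Point]) -> float:
    """Best-of-K error: a multimodal prediction is scored by its closest mode."""
    return min(average_displacement(mode, truth) for mode in modes)
```

This metric rewards covering the true outcome with at least one mode, which is why it is usually paired with a likelihood-aware measure when mode probabilities are also predicted.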
# HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models ( http://arxiv.org/abs/2303.15786v3 ) License: see link	Shan Ning, Longtian Qiu, Yongfei Liu, Xuming He	Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential for providing interaction priors to HOI detectors via knowledge distillation. However, such approaches often rely on large-scale training data and perform poorly in few-shot and zero-shot scenarios. In this paper, we propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization. Specifically, we first introduce a novel interaction decoder that extracts informative regions from the visual feature map of CLIP via a cross-attention mechanism; these are then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection. In addition, prior knowledge in the CLIP text encoder is leveraged to generate a classifier by embedding HOI descriptions. To distinguish fine-grained interactions, we build a verb classifier from the training data via visual semantic arithmetic and a lightweight verb representation adapter. Furthermore, we propose a training-free enhancement that exploits global HOI predictions from CLIP. Extensive experiments demonstrate that our method outperforms the state of the art by a large margin in various settings, e.g., +4.04 mAP on HICO-Det. The source code is available at https://github.com/Artanic30/HOICLIP.	Translated: 2023-07-27 15:53:11 Published: 2023-07-26
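Two of the ingredients named above can be illustrated with toy vectors: classifying a visual feature by cosine similarity against text-derived class embeddings (as one would with a CLIP text encoder), and a simple form of visual semantic arithmetic that estimates a verb feature by subtracting an object feature from an interaction feature. The function names and plain-list embeddings are hypothetical; HOICLIP's actual decoder, adapter, and feature definitions are more involved.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def classify(feature: Sequence[float], class_embeddings: Dict[str, Sequence[float]]) -> str:
    """Pick the class whose (text-derived) embedding is most similar to the feature."""
    return max(class_embeddings, key=lambda name: cosine(feature, class_embeddings[name]))


def verb_feature(hoi_feature: Sequence[float], object_feature: Sequence[float]) -> List[float]:
    """Simplest 'visual semantic arithmetic': interaction minus object leaves the verb."""
    return [h - o for h, o in zip(hoi_feature, object_feature)]
```

Because the class embeddings come from text, new interaction classes can in principle be added without retraining the classifier weights, which is what makes this style of classifier attractive in zero-shot settings.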
# The Stable Signature: Rooting Watermarks in Latent Diffusion Models ( http://arxiv.org/abs/2303.15435v2 ) License: see link	Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze and Teddy Furon	Generative image modeling enables a wide range of applications but raises ethical concerns about responsible deployment. This paper introduces an active strategy combining image watermarking and Latent Diffusion Models. The goal is for all generated images to conceal an invisible watermark allowing for future detection and/or identification. The method quickly fine-tunes the latent decoder of the image generator, conditioned on a binary signature. A pre-trained watermark extractor recovers the hidden signature from any generated image, and a statistical test then determines whether the image comes from the generative model. We evaluate the invisibility and robustness of the watermarks on a variety of generation tasks, showing that Stable Signature works even after the images are modified. For instance, it detects the origin of an image generated from a text prompt and then cropped to keep $10\%$ of the content, with more than $90\%$ accuracy at a false positive rate below $10^{-6}$.	Translated: 2023-07-27 15:52:24 Published: 2023-07-26
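The statistical test can be illustrated under a common null hypothesis: for an image not produced by the watermarked model, each extracted bit matches the key independently with probability 1/2, so the false positive rate of flagging at least `threshold` matching bits is a binomial tail. The 48-bit signature length used in the usage check is an assumed value for illustration.

```python
from math import comb


def false_positive_rate(num_bits: int, threshold: int) -> float:
    """P(at least `threshold` of `num_bits` bits match) when each bit of an
    unwatermarked image matches the key independently with probability 1/2."""
    tail = sum(comb(num_bits, m) for m in range(threshold, num_bits + 1))
    return tail / 2 ** num_bits


def smallest_threshold(num_bits: int, target_fpr: float) -> int:
    """Smallest match count whose false positive rate is below the target."""
    for tau in range(num_bits + 1):
        if false_positive_rate(num_bits, tau) < target_fpr:
            return tau
    return num_bits + 1  # target unreachable with this many bits
```

For example, `smallest_threshold(48, 1e-6)` returns the number of matching bits an image must show before it is attributed to the model at the stated false positive rate.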
# Generative AI Assistants in Software Development Education ( http://arxiv.org/abs/2303.13936v2 ) License: see link	Christopher Bull, Ahmed Kharrufa	The software development industry is amid another disruptive paradigm change: the adoption of generative AI (GAI) assistants for programming. Whilst AI is already used in various areas of software engineering, GAI technologies such as GitHub Copilot and ChatGPT have ignited people's imaginations (and fears). It is unclear how the industry will adapt, but the move by large software companies such as Microsoft (GitHub, Bing) and Google (Bard) to integrate these technologies is a clear indication of intent and direction. We conducted exploratory interviews with industry professionals to understand current practice and challenges, which we incorporate into our vision of the future of software development education, and we make some pedagogical recommendations.	Translated: 2023-07-27 15:52:06 Published: 2023-07-26
# Scene Graph Generation from Hierarchical Relationship Reasoning ( http://arxiv.org/abs/2303.06842v2 ) License: see link	Bowen Jiang and Camillo J. Taylor	This paper presents a novel approach for inferring relationships between objects in visual scenes. It explicitly exploits an informative hierarchical structure that can be imposed to divide the object and relationship categories into disjoint super-categories. Specifically, the proposed method incorporates a Bayes prediction head, enabling joint prediction of the super-category, i.e., the type of relationship between the two objects, along with the detailed relationship within that super-category. This design reduces the impact of class imbalance. Furthermore, we modify supervised contrastive learning to fit our hierarchical classification scheme. Experimental evaluations on the Visual Genome and OpenImage V6 datasets demonstrate that this factorized approach allows a relatively simple model to achieve competitive performance, particularly in predicate classification and zero-shot tasks.	Translated: 2023-07-27 15:51:52 Published: 2023-07-26
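The factorization behind the Bayes prediction head can be sketched as P(relation) = P(super-category) × P(relation | super-category). The super-category and relation names in the usage check are hypothetical placeholders; in the model, the probabilities would come from the network's two prediction branches.

```python
from typing import Dict


def joint_relation_probs(
    super_probs: Dict[str, float],
    conditional_probs: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """P(relation) = P(super-category) * P(relation | super-category).

    Relation names are assumed to be unique across the disjoint super-categories."""
    joint: Dict[str, float] = {}
    for sup, p_sup in super_probs.items():
        for rel, p_rel in conditional_probs[sup].items():
            joint[rel] = p_sup * p_rel
    return joint
```

Since each conditional distribution is normalized within its super-category, a rare relation only competes with its siblings, which is one intuition for why the factorization can soften class imbalance.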
# Hierarchies of Frequentist Bounds for Quantum Metrology: From Cramér-Rao to Barankin ( http://arxiv.org/abs/2303.06108v2 ) License: see link	M. Gessner and A. Smerzi	We derive lower bounds on the variance of estimators in quantum metrology by choosing test observables that define constraints on the unbiasedness of the estimator. The quantum bounds are obtained by analytical optimization over all possible quantum measurements and estimators that satisfy the given constraints. We obtain hierarchies of increasingly tight bounds that include the quantum Cramér-Rao bound at the lowest order. In the opposite limit, the quantum Barankin bound is the variance of the locally best unbiased estimator in quantum metrology. Our results reveal generalizations of the quantum Fisher information that are able to avoid regularity conditions and identify threshold behavior in quantum measurements with mixed states, caused by finite data.	Translated: 2023-07-27 15:51:39 Published: 2023-07-26
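For reference, the lowest-order member of the hierarchy, the quantum Cramér-Rao bound, has the standard form below for an unbiased estimator built from μ independent repetitions. The pure-state expression for the quantum Fisher information F_Q, and its special case for a phase imprinted by a generator H, are textbook results rather than formulas taken from the paper.

```latex
(\Delta\hat{\theta})^2 \;\ge\; \frac{1}{\mu\, F_Q[\rho_\theta]},
\qquad
F_Q\big[\,|\psi_\theta\rangle\,\big]
  = 4\Big(\langle\partial_\theta\psi_\theta|\partial_\theta\psi_\theta\rangle
  - \big|\langle\psi_\theta|\partial_\theta\psi_\theta\rangle\big|^2\Big),
\qquad
|\psi_\theta\rangle = e^{-i\theta H}|\psi_0\rangle
\;\Rightarrow\;
F_Q = 4\big(\langle H^2\rangle - \langle H\rangle^2\big).
```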
# DNN-Compressed Domain Visual Recognition with Feature Adaptation ( http://arxiv.org/abs/2305.08000v2 ) License: see link	Yingpeng Deng and Lina J. Karam	Learning-based image compression has been shown to achieve performance competitive with state-of-the-art transform-based codecs. This has motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned with learning-based compression schemes whose compressed-domain representations can be used to perform visual processing and computer vision tasks directly in the compressed domain. In our work, we adopt a learning-based compressed-domain classification framework that performs visual recognition using the compressed-domain latent representation at varying bit rates. We propose a novel feature adaptation module that integrates a lightweight attention model to adaptively emphasize and enhance the key features within the extracted channel-wise information. We also design an adaptation training strategy that exploits the pretrained pixel-domain weights. For comparison, in addition to the results obtained with our proposed latent-based compressed-domain method, we also present results using compressed but fully decoded images in the pixel domain, as well as original uncompressed images. The results show that our proposed compressed-domain classification model distinctly outperforms existing compressed-domain classification models, and that it yields accuracy similar to pixel-domain models trained on fully decoded images at much higher computational efficiency.	Translated: 2023-07-27 15:46:26 Published: 2023-07-26
# High-dimensional monitoring and the emergence of realism via multiple observers ( http://arxiv.org/abs/2305.07919v2 ) License: see link	Alexandre C. Orthey Jr., Pedro R. Dieguez, Owidiusz Makuta, Remigiusz Augusiak	Unrevealed quantum measurements can be described by unitary evolutions followed by partial traces. Based on this, we address the problem of the emergence of physical reality from the quantum world by introducing a model that interpolates between weak and strong non-selective measurements for qudits. Our model, which is based on generalized observables and Heisenberg-Weyl operators, suggests that for high-dimensional qudits, full information about the observable of interest can only be obtained by making the system interact with not just one but several environmental qudits, following a Quantum Darwinism framework.	Translated: 2023-07-27 15:46:06 Published: 2023-07-26
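The Heisenberg-Weyl operators mentioned above are built from the standard shift (X) and clock (Z) matrices on a d-dimensional qudit, which satisfy ZX = ωXZ with ω = exp(2πi/d). The sketch constructs them with plain lists and checks this commutation relation; the paper's measurement and interaction model itself is not reproduced.

```python
import cmath
from typing import List

Matrix = List[List[complex]]


def shift(d: int) -> Matrix:
    """X|j> = |j+1 mod d>: column j has a single 1 in row (j + 1) mod d."""
    return [[1 if i == (j + 1) % d else 0 for j in range(d)] for i in range(d)]


def clock(d: int) -> Matrix:
    """Z|j> = omega^j |j> with omega = exp(2*pi*i/d)."""
    w = cmath.exp(2j * cmath.pi / d)
    return [[w ** i if i == j else 0 for j in range(d)] for i in range(d)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
```

Products X^a Z^b of these two matrices give the d² Weyl operators, which form an operator basis for a single qudit; for d = 2 they reduce to the Pauli matrices up to phases.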
# Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator ( http://arxiv.org/abs/2305.06710v3 ) License: see link	Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wanrong Huang, Wenjing Yang	Classifier-free guidance is an effective and widely adopted sampling technique in diffusion models. The main idea is to extrapolate the model in the direction of text guidance and away from null-text guidance. In this paper, we demonstrate that null-text guidance in diffusion models is secretly a cartoon-style creator, i.e., generated images can be efficiently transformed into cartoons simply by perturbing the null-text guidance. Specifically, we propose two disturbance methods, Rollback disturbance (Back-D) and Image disturbance (Image-D), which construct a misalignment in the sampling process between the noisy images used for predicting the null-text guidance and the text guidance (hereafter the null-text noisy image and the text noisy image, respectively). Back-D achieves cartoonization by altering the noise level of the null-text noisy image, replacing $x_t$ with $x_{t+\Delta t}$. Image-D, alternatively, produces high-fidelity, diverse cartoons by defining $x_t$ as a clean input image, which further improves the incorporation of finer image details. Through comprehensive experiments, we examine the principle behind disturbing the noise for null-text guidance and find that the efficacy of the disturbance depends on the correlation between the null-text noisy image and the source image. Moreover, our proposed techniques, which can generate cartoon images and cartoonize specific ones, are training-free and can easily be integrated as a plug-and-play component in any classifier-free guided diffusion model. The project page is available at https://nulltextforcartoon.github.io/.	Translated: 2023-07-27 15:45:37 Published: 2023-07-26
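A sketch of the classifier-free guidance update, written so that the null-text branch can receive a different noisy image from the text branch. Passing the same x_t to both branches gives standard guidance; passing x_{t+Δt} to the null-text branch mimics the Back-D misalignment described above. The `model` callable is a hypothetical noise predictor, and keeping the timestep argument at t for both branches is an assumption.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical noise predictor: (noisy image, timestep, prompt) -> predicted noise.
NoisePredictor = Callable[[Vector, int, str], Vector]


def guided_noise(
    model: NoisePredictor,
    x_t: Vector,     # text noisy image
    x_null: Vector,  # null-text noisy image (x_t for standard guidance)
    t: int,
    prompt: str,
    scale: float,
) -> List[float]:
    """eps = eps_null + scale * (eps_text - eps_null)."""
    eps_null = model(x_null, t, "")   # null-text branch
    eps_text = model(x_t, t, prompt)  # text-conditioned branch
    return [n + scale * (c - n) for n, c in zip(eps_null, eps_text)]
```

With a real sampler, `x_null` would be the latent from an earlier (noisier) step of the same trajectory for Back-D, or a noised version of a clean input image for Image-D.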
# Self-supervised dense representation learning for live-cell microscopy with time arrow prediction ( http://arxiv.org/abs/2305.05511v2 ) License: see link	Benjamin Gallusser, Max Stieber, and Martin Weigert	State-of-the-art object detection and segmentation methods for microscopy images rely on supervised machine learning, which requires laborious manual annotation of training data. Here we present a self-supervised method based on time arrow prediction pre-training that learns dense image representations from raw, unlabeled live-cell microscopy videos. Our method builds on the task of predicting the correct order of time-flipped image regions, using a single-image feature extractor followed by a time arrow prediction head that operates on the fused features. We show that the resulting dense representations capture inherently time-asymmetric biological processes such as cell divisions at the pixel level. We further demonstrate the utility of these representations on several live-cell microscopy datasets for the detection and segmentation of dividing cells, as well as for cell state classification. Our method outperforms supervised methods, particularly when only limited ground-truth annotations are available, as is commonly the case in practice. We provide code at https://github.com/weigertlab/tarrow.	Translated: 2023-07-27 15:45:01 Published: 2023-07-26
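The pretext task can be sketched as building labeled pairs of consecutive frames and flipping the order of roughly half of them at random. Here `frames` stands in for image crops, and the labels would supervise the time arrow prediction head; the paper's cropping, feature fusion, and network details are not modeled.

```python
import random
from typing import List, Sequence, Tuple


def time_arrow_examples(frames: Sequence, rng: random.Random) -> List[Tuple[tuple, int]]:
    """Build (pair, label) examples: label 1 = forward in time, 0 = time-flipped."""
    examples = []
    for t in range(len(frames) - 1):
        pair = (frames[t], frames[t + 1])
        if rng.random() < 0.5:
            examples.append(((pair[1], pair[0]), 0))  # reversed order
        else:
            examples.append((pair, 1))  # original order
    return examples
```

No manual labels are needed: the temporal order of the raw video is the supervision signal.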
# Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards ( http://arxiv.org/abs/2304.14989v2 ) License: see link	Hao Qin, Kwang-Sung Jun and Chicheng Zhang	We study $K$-armed bandit problems where the reward distributions of the arms are all supported on the $[0,1]$ interval. Designing regret-efficient randomized exploration algorithms in this setting has been a challenge. Maillard sampling [maillard13apprentissage], an attractive alternative to Thompson sampling, has recently been shown to achieve competitive regret guarantees in the sub-Gaussian reward setting [bian2022maillard] while maintaining closed-form action probabilities, which is useful for offline policy evaluation. In this work, we propose the Kullback-Leibler Maillard Sampling (KL-MS) algorithm, a natural extension of Maillard sampling that achieves a KL-style gap-dependent regret bound. We show that KL-MS is asymptotically optimal when the rewards are Bernoulli and has a worst-case regret bound of the form $O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$, where $\mu^*$ is the expected reward of the optimal arm and $T$ is the time horizon length.	Translated: 2023-07-27 15:44:46 Published: 2023-07-26
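A sketch of the KL-MS sampling distribution as we understand it: arm a is chosen with probability proportional to exp(-N_a * kl(mu_a, mu_max)), where N_a is the arm's pull count, mu_a its empirical mean, mu_max the largest empirical mean, and kl the Bernoulli KL divergence. The clipping constant `eps` is an implementation assumption, and the paper's handling of initial pulls and ties is not reproduced.

```python
import math
from typing import List, Sequence


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ms_probabilities(means: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Closed-form action probabilities proportional to exp(-N_a * kl(mu_a, mu_max))."""
    best = max(means)
    weights = [math.exp(-n * bernoulli_kl(m, best)) for m, n in zip(means, counts)]
    total = sum(weights)
    return [w / total for w in weights]
```

The closed form is what makes offline policy evaluation convenient: the probability of every logged action is known exactly.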
# Preparation of multiphoton high-dimensional GHZ state ( http://arxiv.org/abs/2304.12813v4 ) License: see link	Wen-Bo Xing, Xiao-Min Hu, Yu Guo, Bi-Heng Liu, Chuan-Feng Li and Guang-Can Guo	Multipartite high-dimensional entanglement exhibits different physics from multipartite two-dimensional entanglement. However, preparing multipartite high-dimensional entanglement with linear optics remains a challenge. In this paper, we propose a protocol for preparing multiphoton GHZ states of arbitrary dimension in optical systems. In this protocol, we use auxiliary entanglement to realize a high-dimensional entangling gate, so that high-dimensional entangled pairs can be connected into a multipartite high-dimensional GHZ state. Specifically, we give an example of using the photons' path degree of freedom to prepare a 4-particle 3-dimensional GHZ state. Our method can be extended to other degrees of freedom and can generate arbitrary GHZ entanglement in any dimension.	Translated: 2023-07-27 15:44:13 Published: 2023-07-26
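The target state (1/√d) Σ_j |j⟩^⊗N can be written down directly as a state vector. The sketch below does so for the 4-particle, 3-dimensional example; it is only a numerical illustration of the state, not of the optical preparation protocol.

```python
import math
from typing import List


def ghz_vector(num_particles: int, dim: int) -> List[float]:
    """Amplitudes of (1/sqrt(d)) * sum_j |j j ... j> in the computational basis."""
    vec = [0.0] * dim ** num_particles
    amp = 1.0 / math.sqrt(dim)
    for j in range(dim):
        # Index of the basis state |j j ... j> when the digits are read in base `dim`.
        index = sum(j * dim ** k for k in range(num_particles))
        vec[index] = amp
    return vec
```

For N = 4 and d = 3 the vector has 81 entries, of which only the three entries for |0000⟩, |1111⟩, and |2222⟩ are nonzero.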
# Few-shot Medical Image Segmentation via Cross-Reference Transformer ( http://arxiv.org/abs/2304.09630v4 ) License: see link	Yao Huang and Jianming Liu	Deep learning models have become the mainstream method for medical image segmentation, but they require a large manually labeled dataset for training and are difficult to extend to unseen categories. Few-shot segmentation (FSS) has the potential to address these challenges by learning new categories from a small number of labeled samples. Most current methods employ a prototype learning architecture, which expands support prototype vectors and concatenates them with query features to perform conditional segmentation. However, such a framework may focus more on query features while neglecting the correlation between support and query features. In this paper, we propose a novel self-supervised few-shot medical image segmentation network with a Cross-Reference Transformer, which addresses the lack of interaction between the support image and the query image. We first enhance the correlation features between the support-set image and the query image using a bidirectional cross-attention module. Then, we employ a cross-reference mechanism to mine and enhance the similar parts of the support and query features in high-dimensional channels. Experimental results show that the proposed model achieves good results on both a CT dataset and an MRI dataset.	Translated: 2023-07-27 15:44:02 Published: 2023-07-26
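The bidirectional cross-attention step can be sketched with scaled dot-product attention in which support tokens attend to query tokens and vice versa. This simplified version omits the learned query/key/value projections and multiple heads that a real transformer module would use.

```python
import math
from typing import List, Sequence, Tuple

Tokens = Sequence[Sequence[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Tokens, keys: Tokens, values: Tokens) -> List[List[float]]:
    """Each query token returns a softmax-weighted average of the values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))])
    return out


def bidirectional_cross_attention(support: Tokens, query: Tokens) -> Tuple[list, list]:
    """Support features attend to query features, and query features attend to support."""
    return cross_attention(support, query, query), cross_attention(query, support, support)
```

Each side's output is expressed in terms of the other image's features, which is the kind of support-query interaction the abstract says prototype-based pipelines lack.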
# UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer ( http://arxiv.org/abs/2304.08870v2 ) License: see link	Soon Yau Cheong, Armin Mustafa, Andrew Gilbert	Text-to-image models (T2I) such as StableDiffusion have been used to generate high-quality images of people. However, because of the random nature of the generation process, the person's appearance (e.g., pose, face, and clothing) differs between samples even when the same text prompt is used. This appearance inconsistency makes T2I unsuitable for pose transfer. We address this by proposing a multimodal diffusion model that accepts text, pose, and visual prompting. Our model is the first unified method to perform all person image tasks: generation, pose transfer, and mask-less editing. We also pioneer the direct use of low-dimensional 3D body model parameters to demonstrate a new capability: simultaneous pose and camera view interpolation while maintaining the person's appearance.	Translated: 2023-07-27 15:43:20 Published: 2023-07-26
# Interplay between lattice gauge theory and subsystem codes ( http://arxiv.org/abs/2304.05718v3 ) License: see link	Yoshihito Kuno, Ikuo Ichinose	It is now widely recognized that the toric code is a pure gauge-theory model governed by a projective Hamiltonian with topological order. In this work, we extend the interplay between quantum information systems and gauge-theory models from the viewpoint of subsystem codes, which are suitable for gauge systems that include matter fields. As an example, we show that the $Z_2$ lattice gauge-Higgs model in (2+1) dimensions with specific open boundary conditions is nothing but a kind of subsystem code. In this system, the Gauss-law constraints are stabilizers, and the order parameters identifying the Higgs and confinement phases exist and are nothing but logical operators of the subsystem code residing on the boundaries. Their mixed anomaly dictates the existence of boundary zero modes, a direct consequence of symmetry-protected topological order in the Higgs and confinement phases. After identifying the phase diagram, we embed subsystem codes in the Higgs and confinement phases. As our main finding, we give an explicit description of the code (encoded qubit) in the Higgs and confinement phases, which clarifies the duality between these two phases. The degenerate structure of the subsystem code in the Higgs and confinement phases persists even at very high energy levels, analogous to the notion of strong zero modes observed in some interesting condensed-matter systems. Numerical methods are used to corroborate the analytically obtained results, and the obtained spectrum structure supports the analytical description of various subsystem codes in the gauge-theory phases.	Translated: 2023-07-27 15:42:58 Published: 2023-07-26
# Towards Explainable and Language-Agnostic LLMs: Symbolic Reverse Engineering of Language at Scale ( http://arxiv.org/abs/2306.00017v3 ) License: see link	Walid S. Saba	Large language models (LLMs) have achieved a milestone that undeniably changed many commonly held beliefs in artificial intelligence (AI). However, these LLMs still have many limitations when it comes to true language understanding, limitations that are a byproduct of the underlying architecture of deep neural networks. Moreover, due to their subsymbolic nature, whatever knowledge these models acquire about how language works will always be buried in billions of microfeatures (weights), none of which is meaningful on its own, making such models hopelessly unexplainable. To address these limitations, we suggest combining the strength of symbolic representations with what we believe to be the key to the success of LLMs, namely a successful bottom-up reverse engineering of language at scale. As such, we argue for a bottom-up reverse engineering of language in a symbolic setting. Several authors have hinted at what this project amounts to, and here we discuss in some detail how it could be accomplished.	Translated: 2023-07-27 15:35:28 Published: 2023-07-26
# TD-GEM: Text-Driven Garment Editing Mapper ( http://arxiv.org/abs/2305.18120v2 ) License: see link	Reza Dadfar, Sanaz Sabzevari, Mårten Björkman, Danica Kragic	Language-based fashion image editing allows users to try out variations of desired garments through text prompts. Inspired by research on manipulating latent representations in StyleCLIP and HairCLIP, we focus on these latent spaces for editing fashion items in full-body human datasets. Currently, there is a gap in fashion image editing owing to the complexity of garment shapes and textures and the diversity of human poses. In this paper, we propose an editing optimization scheme called Text-Driven Garment Editing Mapper (TD-GEM), which aims to edit fashion items in a disentangled way. To this end, we first obtain a latent representation of an image through generative adversarial network inversion, such as Encoder for Editing (e4e) or Pivotal Tuning Inversion (PTI) for more accurate results. An optimization based on Contrastive Language-Image Pre-training (CLIP) is then used to guide the latent representation of a fashion image toward a target attribute expressed as a text prompt. TD-GEM manipulates the image accurately according to the target attribute while leaving the other parts of the image untouched. In the experiments, we evaluate TD-GEM on two attributes ("color" and "sleeve length"), showing that it effectively generates realistic images compared with recent manipulation schemes.	Translated: 2023-07-27 15:34:48 Published: 2023-07-26
# Space-Time-Matter: Some Notes on the Localization Problem in Relativistic Quantum Theory ( http://arxiv.org/abs/2305.18118v2 ) License: see link	Christian Beck	This work aims to shed some light on the meaning of the positive energy assumption in relativistic quantum theory and its relation to questions of localization of quantum systems. It is shown that the positive energy property of solutions of relativistic wave equations (such as the Dirac equation) is very fragile with respect to state transformations beyond free time evolution. Paying attention to the connection between negative energy Dirac wave functions and pair creation processes in second quantization, this analysis leads to a better understanding of a class of problems known as the localization problem of relativistic quantum theory (associated, for instance, with famous results of Newton and Wigner, Reeh and Schlieder, Hegerfeldt, or Malament). Finally, this analysis is reflected upon from the perspective of a Bohmian quantum field theory.	Translated: 2023-07-27 15:34:24 Published: 2023-07-26
# Fate of the "vacuum point" and of grey solitons in dispersive quantum shock waves in a one-dimensional Bose gas ( http://arxiv.org/abs/2305.17647v3 ) License: see link	S. A. Simmons, J. C. Pillay, and K. V. Kheruntsyan	We continue the study of dispersive quantum shock waves in a one-dimensional Bose gas beyond the mean-field approximation. In a recent work by Simmons et al. [Phys. Rev. Lett. 125, 180401 (2020)], the oscillatory shock wave train developing in this system from an initial localized density bump on a uniform background was interpreted as a result of quantum mechanical self-interference, wherein the interference contrast would diminish with the loss of matter-wave phase coherence. Such loss of coherence, relative to the mean-field Gross-Pitaevskii description, occurs due to either quantum or thermal fluctuations, as well as in the strongly interacting regime. In this work, we extend the analysis of dispersive quantum shock waves in this context to other dynamical scenarios. More specifically, the scenarios studied include the evolution of a sufficiently high-density bump, known to lead to the so-called "vacuum point" in the mean-field description, and the evolution of an initial density dip, known to shed a train of grey solitons in the same mean-field approximation. We study the fate of these nonlinear wave structures in the presence of quantum and thermal fluctuations, as well as at intermediate and strong interactions, and show that both the vacuum point and grey solitons cease to manifest themselves beyond the mean-field approach. On the other hand, we find that a vacuum point can occur in an ideal (noninteracting) Bose gas evolving from the ground state of a localized dimple potential. Given the ubiquity of dispersive shock waves in nature, our results should provide useful insights and perspectives for a variety of other physical systems known to display nonlinear wave phenomena.	Translated: 2023-07-27 15:34:09 Published: 2023-07-26
# Exploring Weight Balancing on Long-Tailed Recognition Problem ( http://arxiv.org/abs/2305.16573v3 ) License: see link	Naoya Hasegawa, Issei Sato	Recognition problems in long-tailed data, where the sample size per class is heavily skewed, have recently gained importance because the distribution of the sample size per class in a dataset is generally exponential unless the sample size is intentionally adjusted. Various approaches have been devised to address these problems. Recently, weight balancing, which combines well-known classical regularization techniques with two-stage training, has been proposed. Despite its simplicity, it is known for its high performance compared with existing methods devised in various ways. However, it is not well understood why this approach is effective for long-tailed data. In this study, we analyze the method focusing on neural collapse and the cone effect at each training stage, and find that its effect can be decomposed into (i) an increase in Fisher's discriminant ratio of the feature extractor, caused by weight decay and cross-entropy loss, and (ii) implicit logit adjustment, caused by weight decay and class-balanced loss. Our analysis shows that the training method can be further simplified by reducing the number of training stages to one while increasing accuracy.	Translated: 2023-07-27 15:33:39 Published: 2023-07-26
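To make the Fisher discriminant ratio mentioned above concrete, here is a minimal sketch assuming the common trace form tr(S_B) / tr(S_W), computed directly on feature vectors. The function name `fisher_ratio` is hypothetical, and the paper may define or normalize the ratio differently.

```python
from collections import defaultdict


def fisher_ratio(features, labels):
    """Trace form of Fisher's discriminant ratio: tr(S_B) / tr(S_W).

    features: list of equal-length feature vectors (lists of floats)
    labels:   class label for each feature vector
    """
    dim = len(features[0])
    n = len(features)
    global_mean = [sum(f[d] for f in features) / n for d in range(dim)]

    # Group the feature vectors by class.
    groups = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(f)

    between = 0.0  # tr(S_B): size-weighted spread of class means around the global mean
    within = 0.0   # tr(S_W): spread of samples around their own class mean
    for members in groups.values():
        mean = [sum(f[d] for f in members) / len(members) for d in range(dim)]
        between += len(members) * sum((mean[d] - global_mean[d]) ** 2 for d in range(dim))
        within += sum(sum((f[d] - mean[d]) ** 2 for d in range(dim)) for f in members)
    return between / within


# Well-separated classes give a much larger ratio than overlapping ones.
print(fisher_ratio([[0, 0], [0, 1], [10, 0], [10, 1]], [0, 0, 1, 1]))  # 100.0
print(fisher_ratio([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 0, 1, 1]))    # 1.0
```

A higher ratio means tighter classes with more widely separated means, which is the property the analysis attributes to training with weight decay and cross-entropy loss.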
# EgoVSR: Towards High-Quality Egocentric Video Super-Resolution ( http://arxiv.org/abs/2305.14708v2 ) License: see link	Yichen Chi, Junhao Gu, Jiamiao Zhang, Wenming Yang, Yapeng Tian	Due to the limitations of capture devices and scenarios, egocentric videos frequently have low visual quality, mainly caused by high compression and severe motion blur. With the increasing application of egocentric videos, there is an urgent need to enhance their quality through super-resolution. However, existing Video Super-Resolution (VSR) methods focus on third-person-view videos and are unsuitable for handling the blurring artifacts caused by rapid ego-motion and object motion in egocentric videos. To this end, we propose EgoVSR, a VSR framework specifically designed for egocentric videos. We explicitly tackle motion blur in egocentric videos using a Dual Branch Deblur Network (DB²Net) within the VSR framework. Meanwhile, a blurring mask is introduced to guide DB²Net learning and can also be used to localize blurred areas in video frames. We also design a MaskNet to predict the mask, as well as a mask loss to optimize the mask estimation. Additionally, an online motion blur synthesis model for common VSR training data is proposed to simulate the motion blur seen in egocentric videos. To validate the effectiveness of the proposed method, we introduce an EgoVSR dataset containing a large number of fast-motion egocentric video sequences. Extensive experiments demonstrate that our EgoVSR model can efficiently super-resolve low-quality egocentric videos and outperform strong comparison baselines. Our code, pre-trained models and data can be found at https://github.com/chiyich/EGOVSR/.	Translated: 2023-07-27 15:33:19 Published: 2023-07-26
# Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition ( http://arxiv.org/abs/2305.11467v3 ) License: see link	Fenglin Zhang, Junqiao Zhao, Yingfeng Cai, Gengxuan Tian, Wenjie Mu, Chen Ye	Visual Place Recognition (VPR) aims to retrieve frames from a geotagged database that are located at the same place as the query frame. To improve the robustness of VPR in perceptually aliased scenarios, sequence-based VPR methods have been proposed. These methods are based either on matching between frame sequences or on extracting sequence descriptors for direct retrieval. However, the former usually relies on a constant-velocity assumption, which rarely holds in practice, and is computationally expensive and sensitive to sequence length. Although the latter overcomes these problems, existing sequence descriptors are constructed only by aggregating features of multiple frames, without any interaction on temporal information, and thus cannot obtain descriptors with spatio-temporal discrimination. In this paper, we propose a sequence descriptor that effectively incorporates spatio-temporal information. Specifically, spatial attention within the same frame is used to learn spatial feature patterns, while attention across corresponding local regions of different frames is used to learn the persistence or change of features over time. We use a sliding window to control the temporal range of attention and use relative position encoding to construct sequential relationships between different features. This allows our descriptors to capture the intrinsic dynamics in a sequence of frames. Comprehensive experiments on challenging benchmark datasets show that the proposed approach outperforms recent state-of-the-art methods.	Translated: 2023-07-27 15:32:31 Published: 2023-07-26
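The sliding window that limits temporal attention can be illustrated with a small sketch. It assumes one token per frame, a symmetric window of `window` frames on each side, and a learned relative-position bias table of size 2 * window + 1; these choices and all function names are hypothetical and not taken from the paper.

```python
import math


def temporal_attention_mask(num_frames, window):
    """mask[i][j] is True when frame j lies within `window` frames of frame i."""
    return [[abs(i - j) <= window for j in range(num_frames)] for i in range(num_frames)]


def relative_position_index(num_frames, window):
    """Index into a (2 * window + 1)-entry bias table, or None where attention is masked."""
    return [
        [(j - i + window) if abs(i - j) <= window else None for j in range(num_frames)]
        for i in range(num_frames)
    ]


def masked_attention_weights(scores, mask):
    """Row-wise softmax over attention scores, giving zero weight to masked positions."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        top = max(s for s, m in zip(row_scores, row_mask) if m)
        exps = [math.exp(s - top) if m else 0.0 for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


mask = temporal_attention_mask(5, 1)
print(relative_position_index(5, 1)[2])  # [None, 0, 1, 2, None]
print(masked_attention_weights([[0.0] * 5 for _ in range(5)], mask)[0])  # [0.5, 0.5, 0.0, 0.0, 0.0]
```

In a full model, the bias looked up through `relative_position_index` would be added to the scores before the masked softmax, so frames at the same temporal offset share one learned bias.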
# Scaling Limits of Quantum Repeater Networks ( http://arxiv.org/abs/2305.08696v2 ) License: see link	Mahdi Chehimi, Shahrooz Pouryousef, Nitish K. Panigrahy, Don Towsley, and Walid Saad	Quantum networks (QNs) are a promising platform for secure communications, enhanced sensing, and efficient distributed quantum computing. However, due to the fragile nature of quantum states, these networks face significant challenges in terms of scalability. In this paper, the scaling limits of quantum repeater networks (QRNs) are analyzed. The goal of this work is to maximize the overall length, or scalability, of QRNs such that long-distance quantum communication is achieved while application-specific quality-of-service (QoS) requirements are satisfied. In particular, a novel joint optimization framework is proposed that maximizes QRN scalability while satisfying QoS constraints on the end-to-end fidelity and rate. The proposed approach optimizes the number of QRN repeater nodes, their separation distance, and the number of distillation rounds to be performed at both the link and end-to-end levels. Extensive simulations are conducted to analyze the tradeoffs between QRN scalability, rate, and fidelity under gate and measurement errors. The obtained results characterize the QRN scaling limits for a given QoS requirement. The proposed approach offers a promising solution and design guidelines for future QRN deployments.	Translated: 2023-07-27 15:32:07 Published: 2023-07-26
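The fidelity side of this trade-off can be sketched under strong simplifying assumptions: every link holds a Werner state of the same fidelity, entanglement swapping and distillation are noiseless (the paper additionally models gate and measurement errors), distillation uses the BBPSSW recurrence protocol with its output twirled back to Werner form (the abstract does not name a protocol), and the rate cost of distillation is ignored. All function names are illustrative.

```python
def swap(f1, f2):
    """Fidelity after ideal entanglement swapping of two Werner pairs."""
    return f1 * f2 + (1 - f1) * (1 - f2) / 3


def bbpssw(f):
    """Fidelity after one successful BBPSSW round on two identical Werner pairs."""
    e = (1 - f) / 3
    return (f * f + e * e) / (f * f + 2 * f * e + 5 * e * e)


def chain_fidelity(f_link, hops, rounds=0):
    """End-to-end fidelity of `hops` links after `rounds` of link-level distillation."""
    for _ in range(rounds):
        f_link = bbpssw(f_link)
    f = f_link
    for _ in range(hops - 1):
        f = swap(f, f_link)
    return f


def max_hops(f_link, f_target, rounds=0, limit=1000):
    """Largest number of links whose end-to-end fidelity still meets f_target."""
    best = 0
    for hops in range(1, limit + 1):
        if chain_fidelity(f_link, hops, rounds) < f_target:
            break
        best = hops
    return best


print(max_hops(0.95, 0.8, rounds=0))  # 4
print(max_hops(0.95, 0.8, rounds=1))  # 6
```

Each distillation round consumes at least two pairs per output pair, so the longer chain is paid for in rate, which is the kind of fidelity-versus-rate trade-off the abstract describes.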
# Derivative Free Weight-space Ensembling ( http://arxiv.org/abs/2307.03506v2 ) License: see link	Dean Ninalga	Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, few studies have explored interpolation between more than two models, each with a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework creates a set of diverse expert language models trained using a predefined set of source tasks. Next, we fine-tune each of the expert models on the target task, approaching the target task from several distinct knowledge bases. Finally, we linearly interpolate between the model weights, using a gradient-free optimization algorithm to efficiently find a good interpolation weighting. We demonstrate the effectiveness of the method on FETA-Friends, where it outperforms the standard pretrain-finetune approach.	Translated: 2023-07-27 15:26:13 Published: 2023-07-26
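The final step, choosing interpolation weights for several fine-tuned experts without gradients, can be sketched as random search over the probability simplex. Random search is a stand-in (the abstract does not name the gradient-free optimizer), weights are flattened to plain lists, and `score` stands for a hypothetical validation metric on the target task.

```python
import random


def interpolate(models, coeffs):
    """Weighted sum of flattened weight vectors (one list of floats per model)."""
    return [sum(c * m[i] for c, m in zip(coeffs, models)) for i in range(len(models[0]))]


def sample_simplex(k, rng):
    """Uniform sample from the (k-1)-simplex via normalized Gamma(1, 1) draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def search_coefficients(models, score, trials=200, seed=0):
    """Random search for interpolation coefficients that maximize `score`.

    Starts from uniform averaging, so the result is never worse than it.
    """
    rng = random.Random(seed)
    k = len(models)
    best = [1.0 / k] * k
    best_score = score(interpolate(models, best))
    for _ in range(trials):
        candidate = sample_simplex(k, rng)
        candidate_score = score(interpolate(models, candidate))
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy "experts" and a validation score that prefers weights near `target`.
experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [0.8, 0.6]
coeffs, best = search_coefficients(
    experts, lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
)
print([round(c, 3) for c in coeffs], round(best, 4))
```

Because only forward evaluations of `score` are needed, the same loop works when the metric is non-differentiable, which is the motivation for a derivative-free search.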
# MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets ( http://arxiv.org/abs/2307.02100v2 ) License: see link	Siyi Du, Nourhan Bayasi, Ghassan Harmarneh, Rafeef Garbi	Despite its clinical utility, medical image segmentation (MIS) remains a daunting task due to the inherent complexity and variability of images. Vision transformers (ViTs) have recently emerged as a promising solution to improve MIS; however, they require larger training datasets than convolutional neural networks. To overcome this obstacle, data-efficient ViTs were proposed, but they are typically trained using a single source of data, which overlooks the valuable knowledge that could be leveraged from other available datasets. Naively combining datasets from different domains can result in negative knowledge transfer (NKT), i.e., a decrease in model performance on some domains with non-negligible inter-domain heterogeneity. In this paper, we propose MDViT, the first multi-domain ViT, which includes domain adapters to mitigate data hunger and combat NKT by adaptively exploiting knowledge in multiple small data resources (domains). Further, to enhance representation learning across domains, we integrate a mutual knowledge distillation paradigm that transfers knowledge between a universal network (spanning all the domains) and auxiliary domain-specific branches. Experiments on 4 skin lesion segmentation datasets show that MDViT outperforms state-of-the-art algorithms, with superior segmentation performance and a model size at inference time that stays fixed even as more domains are added. Our code is available at https://github.com/siyi-wind/MDViT.	Translated: 2023-07-27 15:25:59 Published: 2023-07-26
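A symmetric KL term is one common way to implement mutual knowledge distillation between two networks. The sketch below assumes per-pixel class logits from the universal network and one domain-specific branch, a softmax temperature, and both KL directions summed; MDViT's exact formulation is not given in the abstract, and in training each direction would normally treat its target distribution as a constant (stop-gradient). Function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at the given temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_distillation_loss(universal_logits, branch_logits, temperature=2.0):
    """Symmetric KL between softened predictions of the universal network and a domain branch."""
    p = softmax(universal_logits, temperature)
    q = softmax(branch_logits, temperature)
    # Scaling by T^2 is a standard distillation convention that keeps gradient
    # magnitudes comparable across temperatures.
    return (kl_divergence(p, q) + kl_divergence(q, p)) * temperature ** 2


# Identical predictions give zero loss; disagreeing ones are penalized.
print(mutual_distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(mutual_distillation_loss([2.0, 0.0], [0.0, 2.0]) > 0)        # True
```

For segmentation, this per-pixel loss would be averaged over all pixels of an image.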
# DifFSS: Diffusion Model for Few-Shot Semantic Segmentation ( http://arxiv.org/abs/2307.00773v2 ) License: see link	Weimin Tan, Siyuan Chen, Bo Yan	Diffusion models have demonstrated excellent performance in image generation. Although various few-shot semantic segmentation (FSS) models with different network structures have been proposed, performance improvement has reached a bottleneck. This paper presents the first work to leverage the diffusion model for the FSS task, called DifFSS. DifFSS, a novel FSS paradigm, can further improve the performance of state-of-the-art FSS models by a large margin without modifying their network structure. Specifically, we utilize the powerful generation ability of diffusion models to generate diverse auxiliary support images, using the semantic mask, scribble or soft HED boundary of the support image as control conditions. This generation process simulates the variety within the class of the query image, such as variations in color, texture, and lighting. As a result, FSS models can refer to more diverse support images, yielding more robust representations and thereby achieving a consistent improvement in segmentation performance. Extensive experiments on three publicly available datasets based on existing advanced FSS models demonstrate the effectiveness of the diffusion model for the FSS task. Furthermore, we explore in detail the impact of different input settings of the diffusion model on segmentation performance. We hope this new paradigm will inspire further study of the FSS task integrated with AI-generated content.	Translated: 2023-07-27 15:25:33 Published: 2023-07-26
# FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis ( http://arxiv.org/abs/2306.17723v3 ) License: see link	Seunghyeon Seo, Yeonjin Chang, Nojun Kwak	Neural Radiance Field (NeRF) has become a mainstream approach to novel view synthesis thanks to the remarkable quality of its rendered images and its simple architecture. Although NeRF has been extended in various directions that continuously improve its performance, the need for a dense set of multi-view images remains a stumbling block for practical applications. In this work, we propose FlipNeRF, a novel regularization method for few-shot novel view synthesis that uses our proposed flipped reflection rays. The flipped reflection rays are explicitly derived from the input ray directions and estimated normal vectors; they serve as effective additional training rays while enabling more accurate surface-normal estimation and effective learning of the 3D geometry. Since the surface normal and the scene depth are both derived from the estimated densities along a ray, an accurate surface normal leads to more exact depth estimation, which is a key factor for few-shot novel view synthesis. Furthermore, with our proposed Uncertainty-aware Emptiness Loss and Bottleneck Feature Consistency Loss, respectively, FlipNeRF estimates more reliable outputs by effectively reducing floating artifacts across different scene structures, and enhances the feature-level consistency between pairs of rays cast toward photo-consistent pixels without any additional feature extractor. FlipNeRF achieves state-of-the-art performance on multiple benchmarks across all scenarios.	Translated: 2023-07-27 15:25:13 Published: 2023-07-26
# "Are you telling me to put glasses on the dog?" Content-Grounded Annotation of Instruction Clarification Requests in the CoDraw Dataset ( http://arxiv.org/abs/2306.02377v2 ) License: see link	Brielen Madureira and David Schlangen	Instruction clarification requests (iCRs) are a mechanism for solving communication problems and are highly functional in instruction-following interactions. Recent work has argued that the CoDraw dataset is a valuable source of naturally occurring iCRs. Beyond identifying when iCRs should be made, dialogue models should also be able to generate them with suitable form and content. In this work, we introduce CoDraw-iCR (v2), extending the existing iCR identifiers with fine-grained information grounded in the underlying dialogue game items and possible actions. Our annotation can serve to model and evaluate the repair capabilities of dialogue agents.	Translated: 2023-07-27 15:24:09 Published: 2023-07-26
# Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models ( http://arxiv.org/abs/2306.02115v2 ) License: see link	Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe	In this paper, we propose a table and image generation task to verify how the knowledge about entities acquired from natural language is retained in Vision & Language (V&L) models. This task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity with a caption and a table containing related knowledge of the entity. In both tasks, the model must know the entities in order to perform the generation properly. We created the Wikipedia Table and Image Generation (WikiTIG) dataset from about 200,000 infoboxes in English Wikipedia articles to perform the proposed tasks. We evaluated performance on these tasks with respect to the above research question using the V&L model OFA, which has achieved state-of-the-art results in multiple tasks. Experimental results show that OFA forgets part of its entity knowledge during pre-training, as a trade-off for improved performance on image-related tasks.	Translated: 2023-07-27 15:23:55 Published: 2023-07-26
# Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation ( http://arxiv.org/abs/2306.01415v2 ) License: see link	Federico Nocentini, Claudio Ferrari, Stefano Berretti	This paper presents a novel approach for generating 3D talking heads from raw audio inputs. Our method is grounded in the idea that speech-related movements can be comprehensively and efficiently described by the motion of a few control points located on the movable parts of the face, i.e., landmarks. The underlying musculoskeletal structure then allows us to learn how their motion influences the geometrical deformations of the whole face. The proposed method employs two distinct models to this end: the first learns to generate the motion of a sparse set of landmarks from the given audio; the second expands this landmark motion to a dense motion field, which is used to animate a given 3D mesh in a neutral state. Additionally, we introduce a novel loss function, named Cosine Loss, which minimizes the angle between the generated motion vectors and the ground-truth ones. Using landmarks in 3D talking head generation offers various advantages such as consistency, reliability, and obviating the need for manual annotation. Our approach is designed to be identity-agnostic, enabling high-quality facial animations for any user without additional data or training.	Translated: 2023-07-27 15:23:36 Published: 2023-07-26
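The Cosine Loss penalizes the angle between predicted and ground-truth motion vectors. A minimal sketch, assuming the common form 1 - cos(theta) averaged over vertices (the paper's exact normalization is not stated in the abstract); `cosine_loss` is a hypothetical name.

```python
import math


def cosine_loss(pred, target, eps=1e-8):
    """Mean of 1 - cos(angle) between paired 3D motion vectors.

    pred, target: lists of [dx, dy, dz] displacement vectors, one per vertex.
    The loss is 0 for parallel vectors, 1 for orthogonal ones, 2 for opposite ones.
    """
    total = 0.0
    for p, t in zip(pred, target):
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        total += 1.0 - dot / max(norm, eps)  # eps avoids division by zero for still vertices
    return total / len(pred)


print(cosine_loss([[1.0, 0.0, 0.0]], [[2.0, 0.0, 0.0]]))   # 0.0: same direction, different length
print(cosine_loss([[1.0, 0.0, 0.0]], [[-1.0, 0.0, 0.0]]))  # 2.0: opposite direction
```

Because the term ignores vector length, it is typically combined with a magnitude-sensitive loss such as mean squared error on the displacements.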
# Event-based Stereo Visual Odometry with Native Temporal Resolution via Continuous-time Gaussian Process Regression ( http://arxiv.org/abs/2306.01188v2 ) License: see link	Jianeng Wang, Jonathan D. Gammell	Event-based cameras asynchronously capture individual visual changes in a scene. This makes them more robust than traditional frame-based cameras to highly dynamic motions and poor illumination. It also means that every measurement in a scene can occur at a unique time. Handling these different measurement times is a major challenge of using event-based cameras. It is often addressed in visual odometry (VO) pipelines by approximating temporally close measurements as occurring at one common time. This grouping simplifies the estimation problem but, absent additional sensors, sacrifices the inherent temporal resolution of event-based cameras. This paper instead presents a complete stereo VO pipeline that estimates directly with individual event-measurement times without requiring any grouping or approximation in the estimation state. It uses continuous-time trajectory estimation to maintain the temporal fidelity and asynchronous nature of event-based cameras through Gaussian process regression with a physically motivated prior. Its performance is evaluated on the MVSEC dataset, where it achieves 7.9e-3 and 5.9e-3 RMS relative error on two independent sequences, outperforming the existing publicly available event-based stereo VO pipeline by two and four times, respectively.	Translated: 2023-07-27 15:23:17 Published: 2023-07-26
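Continuous-time estimation lets the trajectory be queried at any event timestamp instead of at grouped frame times. The toy sketch below assumes a one-dimensional position with a white-noise-on-acceleration (constant-velocity) GP prior plus broad priors on initial position and velocity; the actual pipeline estimates full poses and its prior may differ, so this only illustrates querying a GP posterior mean at arbitrary times. Names and hyperparameters are hypothetical.

```python
def wnoa_kernel(s, t, qc=1.0, var_p0=100.0, var_v0=100.0):
    """Prior covariance of position at times s and t (both >= 0).

    Position = p0 + v0 * t + (twice-integrated white noise of power qc), with
    independent Gaussian priors on the initial position p0 and velocity v0.
    """
    m = min(s, t)
    return var_p0 + var_v0 * s * t + qc * (m ** 3 / 3.0 + abs(s - t) * m ** 2 / 2.0)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gp_query(times, values, queries, noise=1e-4):
    """Posterior mean of the position at arbitrary query times."""
    k = [
        [wnoa_kernel(s, t) + (noise if i == j else 0.0) for j, t in enumerate(times)]
        for i, s in enumerate(times)
    ]
    alpha = solve(k, values)
    return [sum(wnoa_kernel(q, t) * a for t, a in zip(times, alpha)) for q in queries]


# Measurements of a body moving at 2 units/s, queried at "event" times between them.
print(gp_query([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [2.5, 3.7]))  # close to [5.0, 7.4]
```

The constant-velocity prior makes the posterior mean a smoothing cubic spline, which is why linear motion is reproduced between measurements without any time grouping.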
# Graph Representation of the Magnetic Field Topology in High-Fidelity Plasma Simulations for Machine Learning Applications ( http://arxiv.org/abs/2307.09469v2 ) License: see link	Ioanna Bouri, Fanni Franssila, Markku Alho, Giulia Cozzani, Ivan Zaitsev, Minna Palmroth, Teemu Roos	Topological analysis of the magnetic field in simulated plasmas allows the study of various physical phenomena in a wide range of settings. One such application is magnetic reconnection, a phenomenon related to the dynamics of the magnetic field topology, which is difficult to detect and characterize in three dimensions. We propose a scalable pipeline for topological data analysis and spatiotemporal graph representation of three-dimensional magnetic vector fields. We demonstrate our methods on simulations of the Earth's magnetosphere produced by Vlasiator, a supercomputer-scale Vlasov theory-based simulation for near-Earth space. The purpose of this work is to challenge the machine learning community to explore graph-based machine learning approaches to address a largely open scientific problem with wide-ranging potential impact.	Translated: 2023-07-27 15:16:17 Published: 2023-07-26
# スケール・アウェア Modulation Meet Transformer Scale-Aware Modulation Meet Transformer ( http://arxiv.org/abs/2307.08579v2 ) ライセンス: Link先を確認	Weifeng Lin, Ziheng Wu, Jiayu Chen, Jun Huang, Lianwen Jin	(参考訳) 本稿では,畳み込みネットワークと視覚トランスを組み合わせることで,様々な下流タスクを効率的に処理できる新しいビジョントランスであるスケールアウェア変調トランス(smt)を提案する。 SMT で提案されているスケール・アウェア・変調 (SAM) には2つの新しい設計が含まれている。まず,マルチヘッド混合畳み込み(mhmc)モジュールについて紹介する。次に,SAAモジュールを提案する。SAAモジュールは軽量だが有効であり,異なる頭部をまたいだ情報融合を可能にする。これら2つのモジュールを活用することで、畳み込み変調はさらに強化される。さらに,全段階にわたって変調を利用して注意を払わないネットワークを構築する先行研究とは対照的に,ネットワークの深化に伴って局所的依存からグローバル的依存へのシフトを効果的にシミュレートできる進化的ハイブリッドネットワーク(ehn)を提案する。大規模な実験により、SMTは様々な視覚的タスクにおいて既存の最先端モデルよりも大幅に優れていることが示された。具体的には、11.5M / 2.4GFLOPs と 32M / 7.7GFLOPs の SMT は ImageNet-1K の 82.2% と 84.3% のトップ-1 の精度が得られる。 imagenet-22kを224^2解像度で事前トレーニングした後、解像度224^2と384^2で微調整すると、87.1%と88.1%のtop-1精度が得られる。 Mask R-CNNによる物体検出では、1xと3xのスケジュールで訓練されたSMTベースがCOCOのSwin Transformerの4.2と1.3mAPを上回っている。 UPerNetとのセマンティックセグメンテーションでは、シングルスケールとマルチスケールのSMTベーステストがADE20Kでそれぞれ2.0mIoUと1.1mIoUを上回っている。 This paper presents a new vision Transformer, Scale-Aware Modulation Transformer (SMT), that can handle various downstream tasks efficiently by combining the convolutional network and vision Transformer. The proposed Scale-Aware Modulation (SAM) in the SMT includes two primary novel designs. Firstly, we introduce the Multi-Head Mixed Convolution (MHMC) module, which can capture multi-scale features and expand the receptive field. Secondly, we propose the Scale-Aware Aggregation (SAA) module, which is lightweight but effective, enabling information fusion across different heads. By leveraging these two modules, convolutional modulation is further enhanced. Furthermore, in contrast to prior works that utilized modulations throughout all stages to build an attention-free network, we propose an Evolutionary Hybrid Network (EHN), which can effectively simulate the shift from capturing local to global dependencies as the network becomes deeper, resulting in superior performance. 
Extensive experiments demonstrate that SMT significantly outperforms existing state-of-the-art models across a wide range of visual tasks. Specifically, SMT with 11.5M / 2.4GFLOPs and 32M / 7.7GFLOPs can achieve 82.2% and 84.3% top-1 accuracy on ImageNet-1K, respectively. After pretraining on ImageNet-22K at 224^2 resolution, it attains 87.1% and 88.1% top-1 accuracy when fine-tuned at resolutions 224^2 and 384^2, respectively. For object detection with Mask R-CNN, the SMT base model trained with 1x and 3x schedules outperforms its Swin Transformer counterpart by 4.2 and 1.3 mAP on COCO, respectively. For semantic segmentation with UPerNet, the SMT base model tested at single and multiple scales surpasses Swin by 2.0 and 1.1 mIoU, respectively, on ADE20K.	Translated: 2023-07-27 15:16:05 Published: 2023-07-26
# Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study (http://arxiv.org/abs/2307.08072v2) License: see link	Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen	Despite their superior performance, Large Language Models (LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs and to increase inference speed. However, a major challenge is that low-bit quantization methods often lead to performance degradation. It is important to understand how quantization impacts the capacity of LLMs. Unlike previous studies that focused on overall performance, this work investigates the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models. Specifically, we examine the abilities of in-context learning, chain-of-thought reasoning, and instruction following in quantized LLMs. Our empirical experiments show that these emergent abilities still exist in 4-bit quantized models, while 2-bit models suffer severe performance degradation on tests of these abilities.
To improve the performance of low-bit models, we conduct two special experiments: (1) a fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning. Our work derives a series of important findings for understanding the impact of quantization on emergent abilities, and sheds light on the possibilities of extremely low-bit quantization for LLMs.	Translated: 2023-07-27 15:15:31 Published: 2023-07-26
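The 4-bit versus 2-bit contrast above can be illustrated with a generic per-tensor symmetric uniform quantizer. This is a minimal sketch assuming NumPy and round-to-nearest quantization; it is not one of the quantization methods evaluated in the paper, and reconstruction error on random weights is only a rough proxy for the capability loss the paper measures.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int) -> np.ndarray:
    """Per-tensor symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit, 1 for 2-bit
    scale = max(float(np.abs(w).max()), 1e-12) / qmax  # largest magnitude maps to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)   # integer codes in [-2^(b-1), qmax]
    return q * scale                              # back to floating point

# Random "weights" stand in for a real LLM layer (illustrative assumption).
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

# Mean squared reconstruction error per bit width.
errors = {bits: float(np.mean((w - quantize_symmetric(w, bits)) ** 2)) for bits in (8, 4, 2)}
print(errors)
```

With only four representable levels at 2 bits, most small weights collapse to zero, which is one intuition for why 2-bit models degrade much more sharply than 4-bit ones.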
# Reinforced Disentanglement for Face Swapping without Skip Connection (http://arxiv.org/abs/2307.07928v3) License: see link	Xiaohang Ren, Xingyu Chen, Pengfei Yao, Heung-Yeung Shum, Baoyuan Wang	SOTA face swap models still suffer from either the target identity (i.e., shape) being leaked or the target non-identity attributes (e.g., background, hair) failing to be fully preserved in the final results. We show that this insufficient disentanglement is caused by two flawed designs commonly adopted in prior models: (1) relying on a single compressed encoder to represent both the semantic-level non-identity facial attributes (e.g., pose) and the pixel-level non-facial region details, two requirements that are contradictory to satisfy at the same time; (2) relying heavily on long skip connections between the encoder and the final generator, which leaks a certain amount of target face identity into the result.
To fix these issues, we introduce a new face swap framework called 'WSC-swap' that gets rid of skip connections and uses two target encoders to capture, respectively, the pixel-level non-facial region attributes and the semantic non-identity attributes in the face region. To further reinforce disentanglement learning for the target encoder, we employ both an identity removal loss via adversarial training (i.e., GAN) and a non-identity preservation loss via prior 3DMM models like [11]. Extensive experiments on both FaceForensics++ and CelebA-HQ show that our results significantly outperform previous works on a rich set of metrics, including a novel metric for measuring identity consistency that was completely neglected before.	Translated: 2023-07-27 15:15:08 Published: 2023-07-26
# No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models (http://arxiv.org/abs/2307.06440v2) License: see link	Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner	The computation necessary for training Transformer-based language models has skyrocketed in recent years. This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia). When pre-training BERT and T5 with a fixed computation budget using such methods, we find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate. We define an evaluation protocol that enables computation to be done on arbitrary machines by mapping all computation time to a reference machine, which we call reference system time. We discuss the limitations of our proposed protocol and release our code to encourage rigorous research in efficient training procedures: https://github.com/JeanKaddour/NoTrainNoGain.	Translated: 2023-07-27 15:14:40 Published: 2023-07-26
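The idea of mapping computation time onto a reference machine can be sketched as follows. This is a hypothetical simplification, not the paper's protocol: it assumes a single speed ratio, obtained by timing one shared benchmark workload on both machines, applies uniformly to every training step. The function names are illustrative only.

```python
from typing import Sequence

def to_reference_seconds(measured_s: float, local_bench_s: float, reference_bench_s: float) -> float:
    """Convert wall-clock seconds measured locally into reference-machine seconds.

    Hypothetical assumption: one benchmark timed on both machines gives a constant
    speed ratio that holds for all training steps.
    """
    if local_bench_s <= 0 or reference_bench_s <= 0:
        raise ValueError("benchmark times must be positive")
    return measured_s * (reference_bench_s / local_bench_s)

def steps_within_budget(step_times_s: Sequence[float], local_bench_s: float,
                        reference_bench_s: float, budget_s: float) -> int:
    """Count how many training steps fit into a budget expressed in reference seconds."""
    used = 0.0
    for n, t in enumerate(step_times_s):
        used += to_reference_seconds(t, local_bench_s, reference_bench_s)
        if used > budget_s:
            return n  # step n would exceed the budget, so only n steps fit
    return len(step_times_s)

# Local machine finishes the benchmark in 2.0 s, the reference machine in 3.0 s,
# so each 1.0 s local step "costs" 1.5 reference seconds.
print(steps_within_budget([1.0] * 10, local_bench_s=2.0, reference_bench_s=3.0, budget_s=6.0))
```

Charging budgets in reference seconds rather than local seconds is what lets runs on faster or slower hardware be compared under the same fixed budget.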
# MMBench: Is Your Multi-modal Model an All-around Player? (http://arxiv.org/abs/2307.06281v2) License: see link	Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin	Large vision-language models have recently achieved remarkable progress, exhibiting great perception and reasoning abilities concerning visual information. However, how to effectively evaluate these large vision-language models remains a major obstacle, hindering future model development. Traditional benchmarks like VQAv2 or COCO Caption provide quantitative performance measurements but suffer from a lack of fine-grained ability assessment and from non-robust evaluation metrics. Recent subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model's abilities by incorporating human labor, but they are not scalable and display significant bias. In response to these challenges, we propose MMBench, a novel multi-modality benchmark. MMBench methodically develops a comprehensive evaluation pipeline comprising two primary elements. The first element is a meticulously curated dataset that surpasses existing similar benchmarks in the number and variety of evaluation questions and abilities.
The second element introduces a novel CircularEval strategy and incorporates the use of ChatGPT. This implementation is designed to convert free-form predictions into pre-defined choices, thereby facilitating a more robust evaluation of the model's predictions. MMBench is a systematically designed objective benchmark for robustly evaluating the various abilities of vision-language models. We hope MMBench will assist the research community in better evaluating their models and encourage future advancements in this domain. Project page: https://opencompass.org.cn/mmbench.	Translated: 2023-07-27 15:14:21 Published: 2023-07-26
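A minimal sketch of the circular-evaluation idea follows, assuming the common description of CircularEval: a question with N choices is asked N times with the choices circularly shifted, and it counts as solved only if every pass is answered correctly. The `predict` callable is a hypothetical stand-in for a model that returns a choice index; the ChatGPT-based answer extraction step is not modeled here.

```python
from typing import Callable, Sequence

def circular_eval(question: str, choices: Sequence[str], answer_idx: int,
                  predict: Callable[[str, Sequence[str]], int]) -> bool:
    """Return True only if the model picks the right option under every circular shift."""
    n = len(choices)
    for shift in range(n):
        rotated = [choices[(i + shift) % n] for i in range(n)]
        # After rotation the correct option sits at position (answer_idx - shift) mod n.
        correct_pos = (answer_idx - shift) % n
        if predict(question, rotated) != correct_pos:
            return False
    return True

def always_first(question: str, choices: Sequence[str]) -> int:
    """A biased 'model' that always answers option A."""
    return 0

def oracle(question: str, choices: Sequence[str]) -> int:
    """A 'model' that actually knows the answer."""
    return list(choices).index("Paris")

options = ["Paris", "Rome", "Oslo", "Bern"]
print(circular_eval("Capital of France?", options, 0, always_first))
print(circular_eval("Capital of France?", options, 0, oracle))
```

The biased predictor is right on the first pass by luck but fails once the options rotate, which is the kind of position bias that single-pass multiple-choice scoring can hide.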
# Prompt Generate Train (PGT): Few-shot Domain Adaption of Retrieval Augmented Generation Models for Open Book Question-Answering (http://arxiv.org/abs/2307.05915v2) License: see link	C. S. Krishna	We propose a framework, Prompt, Generate, Train (PGT), to efficiently develop a generative question-answering model for open-book question answering over a proprietary collection of text documents. The framework adapts a retrieval-augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning with synthetic feedback in a few-shot setting. This, we hypothesize, will yield an aligned, uncertainty-calibrated model that is competitive with GPT-4-based in-context retrieval-augmented generation in generating relevant answers, at lower serving cost. The framework's synthetic generation pipeline will generate synthetic training data comprising <passage, question, answer> tuples using an open-source LLM and a novel consistency filtering scheme. The pipeline will be designed to generate both abstractive and extractive questions that span the entire corpus.
The framework proposes to fine-tune a smaller RAG model, comprising a dense retriever (ColBERTv2) and a smaller-sized LLM, on the synthetic dataset. In parallel, the framework will train a reward model to score domain-grounded answers higher than hallucinated answers, using an a priori relevance ordering of synthetically assembled samples. In the next phase, the framework will align the RAG model with the target domain using reinforcement learning (Proximal Policy Optimization). This step may improve the RAG model's ability to generate grounded answers and to ignore out-of-domain questions. In the final phase, the framework will calibrate the model's uncertainty for extractive question answering.	Translated: 2023-07-27 15:13:57 Published: 2023-07-26
# PKU-GoodsAD: A Supermarket Goods Dataset for Unsupervised Anomaly Detection and Segmentation (http://arxiv.org/abs/2307.04956v2) License: see link	Jian Zhang, Runwei Ding, Miaoju Ban, Ge Yang	Visual anomaly detection is essential and commonly used for many tasks in the field of computer vision. Recent anomaly detection datasets mainly focus on industrial automated inspection, medical image analysis, and video surveillance. To broaden the application and research of anomaly detection in unmanned supermarkets and smart manufacturing, we introduce the supermarket goods anomaly detection (GoodsAD) dataset. It contains 6124 high-resolution images of 484 goods with different appearances, divided into 6 categories. Each category contains several common types of anomalies, such as deformation, surface damage, and opening. The anomalies include both texture changes and structural changes. The dataset follows the unsupervised setting: only normal (defect-free) images are used for training. Pixel-precise ground-truth regions are provided for all anomalies. Moreover, we conduct a thorough evaluation of current state-of-the-art unsupervised anomaly detection methods. This initial benchmark indicates that some methods which perform well on industrial anomaly detection datasets (e.g., MVTec AD) perform poorly on our dataset.
This is a comprehensive, multi-object dataset for supermarket goods anomaly detection that focuses on real-world applications.	Translated: 2023-07-27 15:13:28 Published: 2023-07-26
# SAS Video-QA: Self-Adaptive Sampling for Efficient Video Question-Answering (http://arxiv.org/abs/2307.04192v2) License: see link	Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria	Video question answering is a fundamental task in the field of video understanding. Although current vision-language models (VLMs) equipped with Video Transformers have enabled temporal modeling and yielded superior results, they come at the cost of huge computational power and are thus too expensive to deploy in real-time application scenarios.
An economical workaround samples only a small portion of frames to represent the main content of a video and tunes an image-text model on these sampled frames. Recent video understanding models usually sample a set of frames or clips at random, regardless of the internal correlations between their visual contents or their relevance to the question. We argue that such aimless sampling may omit the key frames from which the correct answer can be deduced, and that the situation gets worse as sampling becomes sparser, which inevitably happens as video length increases. To mitigate this issue, we propose two frame sampling strategies, namely the most domain frames (MDF) and most implied frames (MIF), to maximally preserve the frames that are most likely vital to the given questions. MDF passively minimizes the risk of key frame omission in a bootstrap manner, while MIF actively searches for key frames customized for each video-question pair with the assistance of auxiliary models. Experimental results on three public datasets with three advanced VLMs (CLIP, GIT, and All-in-one) demonstrate that our proposed strategies can boost the performance of image-text pretrained models. The source code for the method proposed in this paper is publicly available at https://github.com/declare-lab/sas-vqa.	Translated: 2023-07-27 15:13:12 Published: 2023-07-26
# Experimental Solutions to the High-Dimensional Mean King's Problem (http://arxiv.org/abs/2307.12938v2) License: see link	Tareq Jaouni, Xiaoqin Gao, Sören Arlt, Mario Krenn and Ebrahim Karimi	In 1987, Vaidman, Aharonov, and Albert put forward a puzzle called the Mean King's Problem (MKP) that can be solved only by harnessing quantum entanglement. Solutions to the problem have been shown to exist in prime-power dimensions, but they have not yet been experimentally realized for any dimension beyond two. We propose a general first-of-its-kind experimental scheme for solving the MKP in prime dimensions (D). Our search is guided by the digital discovery framework PyTheus, which finds highly interpretable graph-based representations of quantum optical experimental setups; using it, we find specific solutions and generalize to higher dimensions through human insight. As proof of principle, we present a detailed investigation of our solution for the three-, five-, and seven-dimensional cases. We obtain maximum success probabilities of 72.8%, 45.8%, and 34.8%, respectively. We therefore posit that our computer-inspired scheme yields solutions that exceed the classical probability (1/D) twofold, demonstrating its promise for experimental implementation.	Translated: 2023-07-27 15:07:22 Published: 2023-07-26
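The "twofold" claim can be checked directly against the classical 1/D baseline using only the three success probabilities quoted in the abstract. This sketch restates those reported numbers; it does not model the optical setup or compute the probabilities itself.

```python
# Maximum success probabilities reported in the abstract, keyed by dimension D.
reported = {3: 0.728, 5: 0.458, 7: 0.348}

# Ratio of the reported probability to the classical guessing baseline 1/D.
ratios = {d: p * d for d, p in reported.items()}

for d, p in reported.items():
    print(f"D={d}: reported {p:.3f}, classical 1/D={1 / d:.3f}, ratio {ratios[d]:.2f}")
```

Every ratio lands slightly above 2, and the advantage over 1/D grows with D even though the absolute success probability falls.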
# Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification (http://arxiv.org/abs/2307.12917v2) License: see link	Haocong Rao, Cyril Leung, Chunyan Miao	With rapid advancements in depth sensors and deep learning, skeleton-based person re-identification (re-ID) models have recently achieved remarkable progress with many advantages. Most existing solutions learn single-level skeleton features from body joints under the assumption of equal skeleton importance, but they typically lack the ability to exploit more informative skeleton features from other levels, such as the limb level with its more global body patterns. The label dependency of these methods also limits their flexibility in learning more general skeleton representations. This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons. First, we construct hierarchical representations of skeletons to model coarse-to-fine body and motion features at the levels of body joints, components, and limbs.
Then a hierarchical meta-prototype contrastive learning model is proposed to cluster and contrast the most typical skeleton features ("prototypes") from different-level skeletons. By converting original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes and thus capture more effective skeleton features for person re-ID. Furthermore, we devise a hard skeleton mining mechanism that adaptively infers the informative importance of each skeleton, so as to focus on harder skeletons and learn more discriminative skeleton representations. Extensive evaluations on five datasets demonstrate that our approach outperforms a wide variety of state-of-the-art skeleton-based methods. We further show the general applicability of our method to cross-view person re-ID and to RGB-based scenarios with estimated skeletons.	Translated: 2023-07-27 15:07:03 Published: 2023-07-26
# AMAE: Adaptation of Pre-Trained Masked Autoencoder for Dual-Distribution Anomaly Detection in Chest X-Rays (http://arxiv.org/abs/2307.12721v2) License: see link	Behzad Bozorgtabar, Dwarikanath Mahapatra, Jean-Philippe Thiran	Unsupervised anomaly detection in medical images such as chest radiographs is stepping into the spotlight because it mitigates the scarcity of labor-intensive and costly expert annotation of anomaly data. However, nearly all existing methods are formulated as one-class classification trained only on representations from the normal class, discarding a potentially significant portion of the unlabeled data. This paper focuses on a more practical setting, dual-distribution anomaly detection for chest X-rays, which uses the entire training data, including both normal and unlabeled images. Inspired by modern self-supervised vision transformer models trained to reconstruct missing image regions from partial image inputs, we propose AMAE, a two-stage algorithm for adapting the pre-trained masked autoencoder (MAE). Starting from MAE initialization, AMAE first creates synthetic anomalies from only normal training images and trains a lightweight classifier on frozen transformer features.
Subsequently, we propose an adaptation strategy to leverage unlabeled images containing anomalies. The adaptation is accomplished by assigning pseudo-labels to unlabeled images and using two separate MAE-based modules to model the normative and anomalous distributions of the pseudo-labeled images. The effectiveness of the proposed adaptation strategy is evaluated with different anomaly ratios in the unlabeled training set. AMAE yields consistent performance gains over competing self-supervised and dual-distribution anomaly detection methods, setting a new state of the art on three public chest X-ray benchmarks: RSNA, NIH-CXR, and VinDr-CXR.	Translated: 2023-07-27 15:06:34 Published: 2023-07-26
# Entanglement-Assisted Quantum Networks: Mechanics, Enabling Technologies, Challenges, and Research Directions (http://arxiv.org/abs/2307.12490v2) License: see link	Zhonghui Li, Kaiping Xue, Jian Li, Lutong Chen, Ruidong Li, Zhaoying Wang, Nenghai Yu, David S.L. Wei, Qibin Sun, Jun Lu	Over the past few decades, significant progress has been made in quantum information technology, from theoretical studies to experimental demonstrations. Revolutionary quantum applications are now in the limelight, showcasing the advantages of quantum information technology and becoming a research hotspot in academia and industry. For quantum applications to have a more profound impact and wider use, interconnecting multiple quantum nodes through quantum channels becomes essential. The primary goal is to build an entanglement-assisted quantum network capable of transmitting quantum information between these nodes. However, entanglement-assisted quantum networks are governed by the unique laws of quantum mechanics, such as the superposition principle, the no-cloning theorem, and quantum entanglement, which set them apart from classical networks. Consequently, fundamental efforts are required to establish entanglement-assisted quantum networks.
While some insightful surveys have paved the way for entanglement-assisted quantum networks, most of these studies focus on enabling technologies and quantum applications, neglecting critical network issues. In response, this paper presents a comprehensive survey of entanglement-assisted quantum networks. Alongside reviewing fundamental mechanics and enabling technologies, the paper provides a detailed overview of the network structure, working principles, and development stages, highlighting the differences from classical networks. Additionally, the challenges of building wide-area entanglement-assisted quantum networks are addressed. Furthermore, the paper emphasizes open research directions, including architecture design, entanglement-based network issues, and standardization, to facilitate the implementation of future entanglement-assisted quantum networks.	Translated: 2023-07-27 15:05:38 Published: 2023-07-26
# Hybrid Genetic Search for Dynamic Vehicle Routing with Time Windows (http://arxiv.org/abs/2307.11800v2) License: see link	Mohammed Ghannam and Ambros Gleixner	The dynamic vehicle routing problem with time windows (DVRPTW) is a generalization of the classical VRPTW to an online setting, where customer data arrives in batches and real-time routing solutions are required. In this paper we adapt the Hybrid Genetic Search (HGS) algorithm, a successful heuristic for the VRPTW, to the dynamic variant. We discuss the affected components of the HGS algorithm, including the giant-tour representation, cost computation, initial population, crossover, and local search. Our approach modifies these components for the DVRPTW, attempting to balance solution quality against constraints on future customer arrivals. To this end, we devise methods for comparing different-sized solutions, normalizing costs, and accounting for future epochs that do not require any prior training. Despite this limitation, computational results on data from the EURO Meets NeurIPS Vehicle Routing Competition 2022 demonstrate significantly improved solution quality over the best-performing baseline algorithm.	Translated: 2023-07-27 15:04:47 Published: 2023-07-26
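The giant-tour representation mentioned above stores a solution as one customer sequence without depot visits, and a "split" step decodes it into routes. Below is a minimal sketch of a classic quadratic-time split by dynamic programming. It is a simplification for illustration: it enforces vehicle capacity only, ignores time windows and the dynamic setting, and is not the paper's adapted HGS component (HGS implementations typically use a faster linear-time split).

```python
import math
from typing import List, Sequence, Tuple

def split_giant_tour(tour: Sequence[int], demand: Sequence[float],
                     dist: Sequence[Sequence[float]], capacity: float) -> Tuple[float, List[List[int]]]:
    """Optimally cut a giant tour into capacity-feasible routes (depot is node 0)."""
    n = len(tour)
    best = [math.inf] * (n + 1)  # best[j]: min cost to serve the first j customers
    pred = [-1] * (n + 1)        # pred[j]: index where the last route of that solution starts
    best[0] = 0.0
    for i in range(n):           # a route starting at tour[i]
        if best[i] == math.inf:
            continue
        load, cost = 0.0, 0.0
        for j in range(i, n):    # that route serves tour[i..j]
            load += demand[tour[j]]
            if load > capacity:
                break
            cost = dist[0][tour[j]] if j == i else cost + dist[tour[j - 1]][tour[j]]
            total = best[i] + cost + dist[tour[j]][0]  # close the route at the depot
            if total < best[j + 1]:
                best[j + 1], pred[j + 1] = total, i
    if best[n] == math.inf:
        raise ValueError("some customer demand exceeds vehicle capacity")
    routes, j = [], n
    while j > 0:                 # walk predecessors back to reconstruct routes
        routes.append(list(tour[pred[j]:j]))
        j = pred[j]
    routes.reverse()
    return best[n], routes

# Toy instance: depot and four customers on a line at x = 0..4, unit demands.
positions = [0, 1, 2, 3, 4]
dist = [[abs(a - b) for b in positions] for a in positions]
demand = [0, 1, 1, 1, 1]
print(split_giant_tour([1, 2, 3, 4], demand, dist, capacity=2))
```

Keeping individuals as giant tours lets crossover operate on plain permutations, while the split step guarantees that each decoded solution is the best route partition for that customer order.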
# MediaGPT: A Large Language Model For Chinese Media (http://arxiv.org/abs/2307.10930v2) License: see link	Zhonghao Wang, Zijia Lu, Bo Jin, Haiying Deng	Large language models (LLMs) have shown remarkable capabilities in generating high-quality text and making predictions based on large amounts of data, including in the media domain. However, in practical applications, the differences between the media's use cases and the general-purpose applications of LLMs have become increasingly apparent, especially for Chinese. This paper examines the unique characteristics of media-domain-specific LLMs compared to general LLMs, designs a diverse set of task instruction types to cater to the specific requirements of the domain, and constructs unique datasets tailored to the media domain. Based on these, we propose MediaGPT, a domain-specific LLM for the Chinese media domain, trained on domain-specific data and expert SFT data. By performing human expert evaluation and strong-model evaluation on a validation set, this paper demonstrates that MediaGPT outperforms mainstream models on various Chinese media domain tasks and verifies the importance of domain data and domain-defined prompt types for building an effective domain-specific LLM.	Translated: 2023-07-27 15:04:30 Published: 2023-07-26
# Deceptive Alignment Monitoring (http://arxiv.org/abs/2307.10569v2) License: see link	Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo	As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.	Translated: 2023-07-27 15:04:12 Published: 2023-07-26
# TimeTuner: 反事実的説明による時系列予測のための時間表現の診断 TimeTuner: Diagnosing Time Representations for Time-Series Forecasting with Counterfactual Explanations ( http://arxiv.org/abs/2307.09916v2 ) ライセンス: Link先を確認	Jianing Hao, Qing Shi, Yilin Ye, and Wei Zeng	(参考訳) ディープラーニング(DL)アプローチは時系列予測にますます使われており、複雑なDLモデルの設計に多くの取り組みがなされている。近年の研究では、DLの成功は効果的なデータ表現に起因することが多いと示されており、これが特徴量エンジニアリングと表現学習の分野を育んでいる。しかし、特徴学習の自動化アプローチは通常、事前知識の導入、変数間の相互作用の特定、モデルの信頼性を保証するための評価指標の選択に関して制約がある。これらの制約を改善するために,本論文では,モデルの挙動が時系列表現の局所的相関,定常性,粒度とどのように関連しているかをアナリストが理解するのを支援する新しいビジュアル分析フレームワークであるTimeTunerを提案する。システムは主に次の2段階の手法から構成される。まず, 反事実的説明を利用して,時系列表現,多変量特徴,モデル予測の間の関係を結びつける。次に,分割型相関行列と並置された二変量ストライプを含む複数の協調ビューを設計し,ユーザが変換選択プロセスに踏み込み,特徴空間をナビゲートし,モデル性能について推論するためのインタラクションセットを提供する。平滑化とサンプリングの2つの変換方法でTimeTunerをインスタンス化し,実世界の単変量の太陽黒点と多変量の大気汚染物質の時系列予測への適用性を示す。ドメインエキスパートからのフィードバックは、我々のシステムが時系列表現を特徴づけ、特徴量エンジニアリングプロセスを導くのに役立つことを示している。 Deep learning (DL) approaches are being increasingly used for time-series forecasting, with many efforts devoted to designing complex DL models. Recent studies have shown that the DL success is often attributed to effective data representations, fostering the fields of feature engineering and representation learning. However, automated approaches for feature learning are typically limited with respect to incorporating prior knowledge, identifying interactions among variables, and choosing evaluation metrics to ensure that the models are reliable. To improve on these limitations, this paper contributes a novel visual analytics framework, namely TimeTuner, designed to help analysts understand how model behaviors are associated with localized correlations, stationarity, and granularity of time-series representations. The system mainly consists of the following two-stage technique: We first leverage counterfactual explanations to connect the relationships among time-series representations, multivariate features and model predictions. Next, we design multiple coordinated views including a partition-based correlation matrix and juxtaposed bivariate stripes, and provide a set of interactions that allow users to step into the transformation selection process, navigate through the feature space, and reason about the model performance. We instantiate TimeTuner with two transformation methods of smoothing and sampling, and demonstrate its applicability on real-world time-series forecasting of univariate sunspots and multivariate air pollutants. Feedback from domain experts indicates that our system can help characterize time-series representations and guide the feature engineering processes.	翻訳日:2023-07-27 15:03:57 公開日:2023-07-26
# GPT-3モデルは少数ショットの金融推論器である GPT-3 Models are Few-Shot Financial Reasoners ( http://arxiv.org/abs/2307.13617v2 ) ライセンス: Link先を確認	Raul Salles de Padua, Imran Qureshi and Mustafa U. Karakaplan	(参考訳) 財務分析は企業業績を評価する重要なツールである。実務家は、収益性のある投資決定を行うために財務上の質問に答えようとし、そのために高度な定量的分析を用いる。その結果、金融質問応答(QA)は、数値に関する深い推論を必要とする質問応答タスクとなっている。さらに、事前訓練された言語モデルが金融分野でどの程度推論できるかは不明である。現在の最先端手法では、テキストから財務上の質問に関する関連事実を収集する検索器と、有効な金融プログラムと最終回答を生成する生成器が必要である。しかし、GPT-3のような最近の大規模言語モデルは、わずかな例だけでさまざまなタスクにおいて最先端の性能を達成している。我々はGPT-3でいくつかの実験を行い、特に財務上の質問の厳密な性質や財務文書に格納されている複雑な情報のために、このタスクでSOTAの性能を達成するには、個別の検索モデルと論理エンジンが依然として不可欠な要素であることを発見した。この理解に基づき, GPT-3 に対する我々の改良されたプロンプトエンジニアリング手法は, 微調整なしでSOTAに近い精度を達成する。 Financial analysis is an important tool for evaluating company performance. Practitioners work to answer financial questions to make profitable investment decisions, and use advanced quantitative analyses to do so. As a result, Financial Question Answering (QA) is a question answering task that requires deep reasoning about numbers. Furthermore, it is unknown how well pre-trained language models can reason in the financial domain. The current state-of-the-art requires a retriever to collect relevant facts about the financial question from the text and a generator to produce a valid financial program and a final answer. However, large language models like GPT-3 have recently achieved state-of-the-art performance on a wide variety of tasks with just a few examples. We run several experiments with GPT-3 and find that a separate retrieval model and logic engine continue to be essential components to achieving SOTA performance in this task, particularly due to the precise nature of financial questions and the complex information stored in financial documents. With this understanding, our refined prompt-engineering approach on GPT-3 achieves near SOTA accuracy without any fine-tuning.	翻訳日:2023-07-27 14:55:10 公開日:2023-07-26
# FacTool: 生成AIにおける事実性検出 -- マルチタスクとマルチドメインシナリオのためのツール拡張フレームワーク FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios ( http://arxiv.org/abs/2307.13528v2 ) ライセンス: Link先を確認	I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu	(参考訳) 生成的事前学習モデルの出現は高品質テキストの合成を促進させたが、生成したテキストの事実的誤りを特定する上での課題も生じている。特に,(1)生成モデルで処理されるタスクの範囲が広がり,事実誤りを含むリスクが増大している。(2)生成テキストは長大になりがちであり,個々の事実について明確に定義された粒度が欠如している。(3)事実確認の過程で利用できる明確な証拠が不足している。上記の課題を念頭に,本稿では,大規模言語モデル(ChatGPTなど)が生成するテキストの事実誤りを検出するためのタスクおよびドメインに依存しないフレームワークであるFacToolを提案する。4つの異なるタスク(知識ベースQA、コード生成、数学的推論、科学的文献レビュー)の実験は、提案手法の有効性を示している。ChatGPTプラグインインターフェースに関連するFacToolのコードを https://github.com/GAIR-NLP/factool で公開する。 The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text. In particular: (1) A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models. (2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts. (3) There is a scarcity of explicit evidence available during the process of fact checking. With the above challenges in mind, in this paper, we propose FacTool, a task- and domain-agnostic framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool associated with ChatGPT plugin interface at https://github.com/GAIR-NLP/factool .	翻訳日:2023-07-27 14:54:28 公開日:2023-07-26
# Duet: 効率的でスケーラブルなハイブリッドニューラル関係理解 Duet: efficient and scalable hybriD neUral rElation undersTanding ( http://arxiv.org/abs/2307.13494v2 ) ライセンス: Link先を確認	Kaixin Zhang, Hongzhi Wang, Yabin Lu, Ziqi Li, Chang Shu, Yu Yan, Donghua Yang	(参考訳) 学習型カーディナリティ推定手法は従来の手法に比べて高い精度を達成している。学習型手法の中でも、クエリ駆動アプローチは長い間、データとワークロードのドリフトの問題に直面してきた。この問題を回避するためにクエリ駆動手法とハイブリッド手法の両方が提案されているが、それらの最先端技術でさえ、高い訓練コストと推定コスト、限られたスケーラビリティ、不安定性、高カーディナリティかつ高次元のテーブルにおけるロングテール分布問題に悩まされており、これは学習型カーディナリティ推定器の実用に深刻な影響を与えている。本稿では,これらの問題のほとんどが,広く用いられているプログレッシブサンプリングによって直接引き起こされていることを証明する。我々は、自己回帰モデルに述語を導入することでこの問題を解決し、サンプリングや微分不可能な処理なしにカーディナリティを直接推定する、安定かつ効率的でスケーラブルなハイブリッド手法であるDuetを提案する。Duetは、NaruやUAEと比較して推論の計算量を$O(n)$から$O(1)$に削減するだけでなく,高カーディナリティかつ高次元のテーブル上でより高い精度を実現する。実験の結果、Duetは上記のすべての設計目標を達成でき、はるかに実用的であり、CPU上での推論コストがGPU上のほとんどの学習型手法よりも低いことがわかった。 Learned cardinality estimation methods have achieved high precision compared to traditional methods. Among learned methods, query-driven approaches have long faced the data and workload drift problem. Although both query-driven and hybrid methods have been proposed to avoid this problem, even the state-of-the-art ones suffer from high training and estimation costs, limited scalability, instability, and the long-tailed distribution problem on high-cardinality and high-dimensional tables, which seriously affects the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicates into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method that estimates cardinality directly without sampling or any non-differentiable process. Compared to Naru and UAE, Duet not only reduces the inference complexity from $O(n)$ to $O(1)$ but also achieves higher accuracy on high-cardinality and high-dimensional tables. Experimental results show that Duet can achieve all the design goals above, is much more practical, and even has a lower inference cost on CPU than most learned methods have on GPU.	翻訳日:2023-07-27 14:53:59 公開日:2023-07-26
# 注意ネットワークの学習ダイナミクスについて On the Learning Dynamics of Attention Networks ( http://arxiv.org/abs/2307.13421v2 ) ライセンス: Link先を確認	Rahul Vashisht and Harish G. Ramaswamy	(参考訳) 注意モデルは一般的に、ソフトアテンション、ハードアテンション、潜在変数周辺尤度(latent variable marginal likelihood, LVML)アテンションなどと呼ばれる3つの標準的な損失関数のうちの1つを最適化することによって学習される。これら3つのパラダイムはいずれも、入力の適切なセグメントを「選択」する「フォーカス」モデルと、選択したセグメントを処理して目標ラベルを出力する「分類」モデルという2つのモデルを見つけるという同じ目標に動機づけられている。しかし、これらは選択されたセグメントを集約する方法が大きく異なり、異なるダイナミクスと最終的な結果をもたらす。これらのパラダイムを用いて学習したモデルに固有の特徴を観察し,フォーカスモデルを固定した場合の勾配降下下での分類モデルの変化の帰結としてこれを説明する。また,これらのパラダイムを簡単な設定で解析し,勾配流下のパラメータ軌跡の閉形式表現を導出する。ソフトアテンションの損失では、フォーカスモデルは初期化直後に急速に改善し、その後は停滞する。一方、ハードアテンションの損失は逆の挙動を示す。我々の観察に基づいて、異なる損失関数の利点を組み合わせた単純なハイブリッドアプローチを提案し、半合成および実世界のデータセットの集合上でそれを実証する。 Attention models are typically learned by optimizing one of three standard loss functions that are variously called soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models: a 'focus' model that 'selects' the right segment of the input and a 'classification' model that processes the selected segment into the target label. However, they differ significantly in the way the selected segments are aggregated, resulting in distinct dynamics and final results. We observe a unique signature of models learned using these paradigms and explain this as a consequence of the evolution of the classification model under gradient descent when the focus model is fixed. We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow. With the soft attention loss, the focus model improves quickly at initialization and splutters later on. On the other hand, hard attention loss behaves in the opposite fashion. Based on our observations, we propose a simple hybrid approach that combines the advantages of the different loss functions and demonstrate it on a collection of semi-synthetic and real-world datasets.	翻訳日:2023-07-27 14:53:36 公開日:2023-07-26
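The abstract says the three attention paradigms differ mainly in how they aggregate the selected segments. The following toy sketch shows that difference for a single binary example. It assumes softmax focus scores, a linear logistic classifier applied to each segment, and the textbook forms of the three losses. The function names and the toy numbers are our own illustration, not the paper's setup.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of focus-model scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def logistic_nll(y, z):
    """Stable -log sigmoid(y * z) for a label y in {-1, +1}."""
    t = -y * z
    if t > 0:
        return t + math.log1p(math.exp(-t))
    return math.log1p(math.exp(t))


def attention_losses(segments, focus_scores, theta, y):
    """Return (soft, hard, lvml) losses for one example (assumed forms).

    segments: list of feature vectors, one per input segment
    focus_scores: unnormalised focus-model scores, one per segment
    theta: weights of a linear (logistic) classification model
    y: target label in {-1, +1}
    """
    alpha = softmax(focus_scores)
    # Soft attention: classify the attention-weighted average of the segments.
    pooled = [sum(a * seg[k] for a, seg in zip(alpha, segments))
              for k in range(len(theta))]
    soft = logistic_nll(y, dot(theta, pooled))
    # Hard attention: expected per-segment loss when one segment is drawn from alpha.
    per_segment = [logistic_nll(y, dot(theta, seg)) for seg in segments]
    hard = sum(a * loss for a, loss in zip(alpha, per_segment))
    # LVML: negative log of the alpha-weighted marginal likelihood over segments.
    lvml = -math.log(sum(a * math.exp(-loss) for a, loss in zip(alpha, per_segment)))
    return soft, hard, lvml


segments = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
soft, hard, lvml = attention_losses(segments, [0.2, 1.0, -0.5], [2.0, -1.0], y=1)
print(f"soft={soft:.4f} hard={hard:.4f} lvml={lvml:.4f}")
```

The logistic loss is convex in the pooled score, and -log is convex. Jensen's inequality therefore gives soft ≤ hard and LVML ≤ hard for every example in this toy setting. When the focus distribution is one-hot, all three losses coincide.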
# 気候変動交渉のための動的グループ化:効果的な戦略による協力と均衡の促進 Dynamic Grouping for Climate Change Negotiation: Facilitating Cooperation and Balancing Interests through Effective Strategies ( http://arxiv.org/abs/2307.13886v1 ) ライセンス: Link先を確認	Duo Zhang, Yuren Pang, Yu Qin	(参考訳) 現在の気候変動交渉モデルの枠組みには、さらなる研究開発を必要とするいくつかの限界がある。本トラックでは,地理的影響と効用フレームワークを中心に,改善のための2つの重要な領域について主に論じる。地理的影響の面では,(1)地域的影響から世界的影響へのシフト,(2)地域間の気候変動の影響の多様性,(3)地理的な位置と政治構造の不均一性,(4)隣国間の協力,(5)気候交渉に影響を与える歴史的・文化的要因を含めることの重要性,という5つの重要な側面を考察する。さらに,貯蓄率の正の効果と全地域間の不均一性を報酬関数に統合することにより,均質性と気候緩和の過大評価の程度を低減するよう,効用と報酬の枠組みを洗練する必要性を強調する。これらの限界に対処することで、気候変動交渉モデルの正確性と有効性を向上し、政策立案者や利害関係者が、地域・世界レベルで気候変動に取り組むための的を絞った適切な戦略を策定できるようになることを期待する。 The current framework for climate change negotiation models presents several limitations that warrant further research and development. In this track, we mainly discuss two key areas for improvement, focusing on the geographical impacts and the utility framework. Regarding geographical impacts, we explore five critical aspects: (1) the shift from local to global impact, (2) variability in climate change effects across regions, (3) heterogeneity in geographical location and political structures, (4) collaborations between adjacent nations, and (5) the importance of including historical and cultural factors influencing climate negotiations. Furthermore, we emphasize the need to refine the utility and rewards framework to reduce homogeneity and the overestimation of climate mitigation by integrating into the reward function the positive effects of saving rates and the heterogeneity among all regions. By addressing these limitations, we hope to enhance the accuracy and effectiveness of climate change negotiation models, enabling policymakers and stakeholders to devise targeted and appropriate strategies to tackle climate change at both regional and global levels.	翻訳日:2023-07-27 14:08:31 公開日:2023-07-26
# 機械学習モデルの局所的ロバスト性の効率的な推定 Efficient Estimation of the Local Robustness of Machine Learning Models ( http://arxiv.org/abs/2307.13885v1 ) ライセンス: Link先を確認	Tessa Han, Suraj Srinivas, Himabindu Lakkaraju	(参考訳) 機械学習モデルは、しばしばノイズの多い入力データに対して頑健である必要がある。モデル予測に対する実世界のノイズ(しばしばランダムである)の影響は、モデルの局所的ロバスト性、すなわち入力周辺の局所領域におけるモデル予測の一貫性によって捉えられる。しかし、モンテカルロサンプリングに基づいて局所的ロバスト性を計算する素朴なアプローチは統計的に非効率であり、大規模アプリケーションでは計算コストが法外に高くなる。本研究では,局所線形関数近似と多変量正規CDFを用いて,多クラス判別モデルの局所的ロバスト性を効率的に計算する最初の解析的推定器を開発する。これらの推定器の導出を通じて,局所的ロバスト性がランダム化平滑化やソフトマックス確率といった概念とどのように結びついているかを示す。また、これらの推定器が標準的な深層学習モデルの局所的ロバスト性を正確かつ効率的に計算できることを実証的に確認する。さらに、ロバスト性バイアスの測定や、データセット中のノイズ摂動に弱い例の特定など、局所的ロバスト性に関わる様々なタスクに対するこれらの推定器の有用性を示す。これらの解析的推定器を開発することにより、本研究は局所的ロバスト性の概念的理解を深めるだけでなく、その計算を実用的なものにし、重要な下流アプリケーションにおける局所的ロバスト性の利用を可能にする。 Machine learning models often need to be robust to noisy input data. The effect of real-world noise (which is often random) on model predictions is captured by a model's local robustness, i.e., the consistency of model predictions in a local region around an input. However, the naïve approach to computing local robustness based on Monte-Carlo sampling is statistically inefficient, leading to prohibitive computational costs for large-scale applications. In this work, we develop the first analytical estimators to efficiently compute local robustness of multi-class discriminative models using local linear function approximation and the multivariate Normal CDF. Through the derivation of these estimators, we show how local robustness is connected to concepts such as randomized smoothing and softmax probability. We also confirm empirically that these estimators accurately and efficiently compute the local robustness of standard deep learning models. In addition, we demonstrate these estimators' usefulness for various tasks involving local robustness, such as measuring robustness bias and identifying examples that are vulnerable to noise perturbation in a dataset. By developing these analytical estimators, this work not only advances conceptual understanding of local robustness, but also makes its computation practical, enabling the use of local robustness in critical downstream applications.	翻訳日:2023-07-27 14:08:10 公開日:2023-07-26
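The abstract contrasts Monte Carlo estimation of local robustness with analytical estimators built on a local linear approximation and the Normal CDF. The sketch below is our own simplified illustration, not the authors' multi-class estimator. It assumes a binary linear model f(x) = w·x + b and isotropic Gaussian noise with standard deviation sigma. In that case the linear approximation is exact, and the probability that the predicted sign is unchanged is Φ(|w·x + b| / (sigma·‖w‖)). The sketch compares this closed form with a sampling estimate.

```python
import math
import random


def analytic_local_robustness(w, b, x, sigma):
    """P(prediction unchanged) for a binary linear model under N(0, sigma^2 I) noise.

    The noise term w . eps is N(0, sigma^2 ||w||^2), so the probability is the
    standard normal CDF evaluated at |margin| / (sigma * ||w||).
    """
    margin = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    z = abs(margin) / (sigma * norm_w)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z)


def monte_carlo_local_robustness(w, b, x, sigma, n_samples=20000, seed=0):
    """Naive sampling estimate of the same probability."""
    rng = random.Random(seed)
    base = sum(wi * xi for wi, xi in zip(w, x)) + b > 0
    hits = 0
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        pred = sum(wi * ni for wi, ni in zip(w, noisy)) + b > 0
        hits += pred == base
    return hits / n_samples


w, b, x, sigma = [1.0, -2.0, 0.5], 0.1, [0.3, 0.1, 0.4], 0.5
print(analytic_local_robustness(w, b, x, sigma))
print(monte_carlo_local_robustness(w, b, x, sigma))
```

The analytic value costs one dot product. The sampling estimate needs thousands of model evaluations to reach comparable precision, which is the inefficiency the abstract points to.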
# ExeDec: ニューラルプログラム合成における構成的一般化のための実行分解 ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis ( http://arxiv.org/abs/2307.13883v1 ) ライセンス: Link先を確認	Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton	(参考訳) プログラムを書くとき、人は新しい複雑なタスクを、より小さく馴染みのあるサブタスクに分解することで解決できる。ニューラルプログラム合成手法が同様の能力を持つかどうかを測定することは難しいが、構成的に一般化するかどうか、すなわち、より単純なサブタスクで訓練されたモデルがその後より複雑なタスクを解決できるかどうかは測定できる。本稿では,プログラム合成において望ましい複数の構成的一般化の形態を特徴づけてメタベンチマークとし,これを用いてRobustFillとDeepCoderという2つの一般的なデータセットに対する一般化タスクを作成する。次に,各ステップでのプログラム実行に基づいて実行サブゴールを予測し,問題を段階的に解く,新しい分解ベースの合成戦略であるExeDecを提案する。ExeDecはベースラインと比較して合成性能が高く,構成的一般化能力が大幅に向上した。 When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, we can measure whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve more complex tasks. In this paper, we characterize several different forms of compositional generalization that are desirable in program synthesis, forming a meta-benchmark which we use to create generalization tasks for two popular datasets, RobustFill and DeepCoder. We then propose ExeDec, a novel decomposition-based synthesis strategy that predicts execution subgoals to solve problems step-by-step informed by program execution at each step. ExeDec has better synthesis performance and greatly improved compositional generalization ability compared to baselines.	翻訳日:2023-07-27 14:07:46 公開日:2023-07-26
# 人間文化: 歴史に無関係で予測可能な経験 Human Culture: A History Irrelevant and Predictable Experience ( http://arxiv.org/abs/2307.13882v1 ) ライセンス: Link先を確認	Hao Wang	(参考訳) 人間の文化研究は、ビッグデータとソーシャルネットワーク革命のおかげで、革命の機会を目の当たりにしている。Douban.com、Goodreads.com、Pandora、IMDBなどのウェブサイトが文化研究者にとっての新たな金鉱となっている。2021年と2022年に、本論文の著者はAIのコールドスタート問題のための2つのデータフリー推薦システムを発明した。このアルゴリズムは、ユーザーの過去の嗜好を参照することなく、文化的および商業的な商品をユーザーに推薦できる。これらの新しい発明の社会的意味は、人間の文化的嗜好が、人間個人に関する情報なしに非常に正確に予測できるということである。本稿では,AI技術とその文化的意味を,他のAIアルゴリズムとともに分析する。人間の文化は(主に)歴史に無関係で予測可能な経験であることを示す。 Human culture research has witnessed an opportunity of revolution thanks to the big data and social network revolution. Websites such as Douban.com, Goodreads.com, Pandora and IMDB have become the new gold mine for cultural researchers. In 2021 and 2022, the author of this paper invented 2 data-free recommender systems for AI cold-start problem. The algorithms can recommend cultural and commercial products to users without reference to users' past preferences. The social implication of the new inventions is that human cultural tastes can be predicted very precisely without any information related to human individuals. In this paper, we analyze the AI technologies and their cultural implications together with other AI algorithms. We show that human culture is (mostly) a history irrelevant and predictable experience.	翻訳日:2023-07-27 14:07:29 公開日:2023-07-26
# 良い格子点訓練: 数論によって高速化された物理情報ニューラルネットワーク Good Lattice Training: Physics-Informed Neural Networks Accelerated by Number Theory ( http://arxiv.org/abs/2307.13869v1 ) ライセンス: Link先を確認	Takashi Matsubara, Takaharu Yaguchi	(参考訳) 物理情報ニューラルネットワーク(PINN)は、偏微分方程式(PDE)を解くための、新しく効率的なアプローチを提供する。その成功は物理情報損失にあり、これはニューラルネットワークが特定の点で与えられたPDEを満たし、解を近似するように訓練するものである。しかし、PDEの解は本質的に無限次元であり、出力と解の間の距離は領域上の積分によって定義される。したがって、物理情報損失は有限近似しか与えず、この側面はしばしば見過ごされてきたが、離散化誤差を抑制するためには適切なコロケーション点を選択することが重要となる。本稿では,数値解析における数論的手法に着想を得て,PINNのための良い格子点訓練(good lattice training, GLT)と呼ばれる新しい手法を提案する。GLTは、少数の点でも、また多次元空間でも有効なコロケーション点の集合を提供する。実験の結果,GLTは一様ランダムサンプリングやラテン超方格サンプリングと比べて2～20倍少ないコロケーション点(すなわち低い計算コスト)で,競争力のある性能を達成することが示された。 Physics-informed neural networks (PINNs) offer a novel and efficient approach to solving partial differential equations (PDEs). Their success lies in the physics-informed loss, which trains a neural network to satisfy a given PDE at specific points and to approximate the solution. However, the solutions to PDEs are inherently infinite-dimensional, and the distance between the output and the solution is defined by an integral over the domain. Therefore, the physics-informed loss only provides a finite approximation, and selecting appropriate collocation points becomes crucial to suppress the discretization errors, although this aspect has often been overlooked. In this paper, we propose a new technique called good lattice training (GLT) for PINNs, inspired by number theoretic methods for numerical analysis. GLT offers a set of collocation points that are effective even with a small number of points and for multi-dimensional spaces. Our experiments demonstrate that GLT requires 2--20 times fewer collocation points (resulting in lower computational cost) than uniformly random sampling or Latin hypercube sampling, while achieving competitive performance.	翻訳日:2023-07-27 14:07:21 公開日:2023-07-26
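Good lattice training draws its collocation points from the rank-1 "good lattice" point sets used in number-theoretic quadrature, where x_k = frac(k·z / N) for k = 0, …, N-1. The sketch below uses the classic two-dimensional Fibonacci lattice, with N = F_m and generating vector z = (1, F_{m-1}). That choice of lattice is our assumption, and the paper may use different generating vectors. The sketch also omits any periodization or random shifting a full implementation might apply. The function names are hypothetical.

```python
def fibonacci(m):
    """m-th Fibonacci number with F_1 = F_2 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a


def rank1_lattice(n, z):
    """Rank-1 lattice points x_k = frac(k * z / n), k = 0..n-1, in [0, 1)^d.

    Integer modular arithmetic avoids floating-point error in the frac().
    """
    return [[(k * zj % n) / n for zj in z] for k in range(n)]


def fibonacci_lattice(m):
    """2D Fibonacci lattice: N = F_m points, generating vector (1, F_{m-1})."""
    n, z2 = fibonacci(m), fibonacci(m - 1)
    return rank1_lattice(n, [1, z2])


def rescale(points, lower, upper):
    """Map points from [0, 1)^d onto the box [lower, upper] of the PDE domain."""
    return [[lo + p * (hi - lo) for p, lo, hi in zip(pt, lower, upper)]
            for pt in points]


# 13 collocation points for a PDE on (x, t) in [-1, 1] x [0, 1].
collocation = rescale(fibonacci_lattice(7), lower=[-1.0, 0.0], upper=[1.0, 1.0])
for x, t in collocation:
    print(f"{x:+.3f} {t:.3f}")
```

Each one-dimensional projection of the lattice hits every grid value k/N exactly once, which is part of why such sets cover the domain evenly with few points. In a PINN, these points would replace uniformly random collocation points in the physics-informed loss.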
# 高次元観察研究からの変動要因の学習 Learning sources of variability from high-dimensional observational studies ( http://arxiv.org/abs/2307.13868v1 ) ライセンス: Link先を確認	Eric W. Bridgeford, Jaewon Chung, Brian Gilbert, Sambit Panda, Adam Li, Cencheng Shen, Alexandra Badea, Brian Caffo, Joshua T. Vogelstein	(参考訳) 因果推論は、変数の存在が観測結果に影響を及ぼすかどうかを研究する。「平均処置効果」などの量で測定されるように、このパラダイムはワクチンや薬物開発から政策介入に至るまで、多くの生物学的分野で用いられている。残念なことに、これらの手法の大部分は、しばしば単変量の結果に限定される。我々の研究は、任意の次元または任意の可測空間に値をとる結果に対して因果推定量を一般化し、名義変数に対する従来の因果推定量を因果的不一致の検定として定式化する。本稿では,普遍的に一貫した条件付き独立性検定を調整する簡単な手法を提案し,これらの検定が普遍的に一貫した因果的不一致の検定であることを証明する。数値実験により,提案手法であるCausal CDcorrは,既存の手法と比較して有限標本における妥当性と検出力の両方を改善することを示す。我々の手法はすべてオープンソースであり、github.com/ebridge2/cdcorrで利用可能である。 Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the "average treatment effect," this paradigm is employed across numerous biological fields, from vaccine and drug development to policy interventions. Unfortunately, the majority of these methods are often limited to univariate outcomes. Our work generalizes causal estimands to outcomes with any number of dimensions or any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests. We propose a simple technique for adjusting universally consistent conditional independence tests and prove that these tests are universally consistent causal discrepancy tests. Numerical experiments illustrate that our method, Causal CDcorr, leads to improvements in both finite sample validity and power when compared to existing strategies. Our methods are all open source and available at github.com/ebridge2/cdcorr.	翻訳日:2023-07-27 14:07:00 公開日:2023-07-26
# Points-to-3D: スパースな点群と形状制御可能なテキストから3D生成とのギャップを埋める Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation ( http://arxiv.org/abs/2307.13908v1 ) ライセンス: Link先を確認	Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang	(参考訳) 数十億もの画像テキストペアで訓練された2D拡散モデルによって、テキストから3Dへの生成が近年大きな注目を集めている。既存の手法は主にスコア蒸留に依存して、2D拡散の事前知識を活用し3Dモデル(例えばNeRF)の生成を監督する。しかし、スコア蒸留は視点の不整合に悩まされやすく、暗黙的なNeRFモデリングもまた任意の形状につながりうるため、現実味に欠け制御不能な3D生成となる。本研究では,2Dと3Dの両方の拡散モデルから知識を蒸留することにより,スパースだが自由に入手可能な3D点と、現実的で形状制御可能な3D生成とのギャップを埋める柔軟なフレームワークPoints-to-3Dを提案する。Points-to-3Dの核となる考え方は、テキストから3D生成を導くために制御可能なスパース3D点を導入することである。具体的には、1枚の参照画像を条件として3D拡散モデルPoint-Eから生成したスパース点群を幾何学的事前情報として用いる。スパース3D点をよりよく活用するために,NeRFの形状をスパース3D点の形状に適応的に合わせる効率的な点群誘導損失を提案する。形状の制御に加えて,より視点一貫性のある外観が得られるようにNeRFを最適化することを提案する。具体的には,テキストと学習したコンパクトな形状の深度マップを条件として,公開されている2D画像拡散モデルControlNetでスコア蒸留を行う。定性的・定量的な比較により、Points-to-3Dが視点の一貫性を改善し、テキストから3D生成において良好な形状制御性を実現することを示す。Points-to-3Dは、テキストから3D生成を改善し制御する新しい方法をユーザーに提供する。 Text-to-3D generation has recently garnered significant attention, fueled by 2D diffusion models trained on billions of image-text pairs. Existing methods primarily rely on score distillation to leverage the 2D diffusion priors to supervise the generation of 3D models, e.g., NeRF. However, score distillation is prone to the view inconsistency problem, and implicit NeRF modeling can also lead to an arbitrary shape, thus leading to less realistic and uncontrollable 3D generation. In this work, we propose a flexible framework of Points-to-3D to bridge the gap between sparse yet freely available 3D points and realistic shape-controllable 3D generation by distilling the knowledge from both 2D and 3D diffusion models. The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide the text-to-3D generation. Specifically, we use the sparse point cloud generated from the 3D diffusion model, Point-E, as the geometric prior, conditioned on a single reference image. To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss to adaptively drive the NeRF's geometry to align with the shape of the sparse 3D points. In addition to controlling the geometry, we propose to optimize the NeRF for a more view-consistent appearance. To be specific, we perform score distillation with the publicly available 2D image diffusion model ControlNet, conditioned on text as well as the depth map of the learned compact geometry. Qualitative and quantitative comparisons demonstrate that Points-to-3D improves view consistency and achieves good shape controllability for text-to-3D generation. Points-to-3D provides users with a new way to improve and control text-to-3D generation.	翻訳日:2023-07-27 13:58:42 公開日:2023-07-26
# 可変長時系列入力を用いたスター集合ベースの到達可能性解析による深層ニューラルネットワークのロバスト性検証 Robustness Verification of Deep Neural Networks using Star-Based Reachability Analysis with Variable-Length Time Series Input ( http://arxiv.org/abs/2307.13907v1 ) ライセンス: Link先を確認	Neelanjana Pal, Diego Manzanas Lopez, and Taylor T Johnson	(参考訳) データ駆動型のニューラルネットワーク(NN)ベースの異常検出と予知保全は、新たな研究領域である。NNベースの時系列データ分析は、過去の挙動に関する貴重な洞察と、機器の残存耐用寿命(RUL)やバッテリーの充電状態(SOC)といった重要なパラメータの推定を提供する。しかし、入力時系列データはセンサーを通過する際に意図的または意図しないノイズにさらされる可能性があり、これらのNNの堅牢な妥当性確認と検証が必要となる。本稿では,集合ベースの形式手法を用いた時系列回帰NN(TSRegNN)のロバスト性検証手法のケーススタディを示す。本研究は、可変長入力データを利用して入力操作を効率化し、ネットワークアーキテクチャの汎用性を高めることに焦点を当てる。本手法を、故障予知・健全性管理(PHM)の応用領域における2つのデータセット、すなわち(1)リチウムイオン電池のSOC推定と(2)タービンエンジンのRUL推定に適用する。NNのロバスト性はスター集合ベースの到達可能性解析を用いて確認され、いくつかの性能指標により、入力における有界摂動がネットワーク出力、すなわち将来の結果に与える影響を評価する。全体として本論文は,実世界の応用における時系列データのNNベース分析の妥当性確認と検証のための包括的なケーススタディを提供し,特にノイズが将来の結果に与える影響を考慮して,正確で信頼性の高い予測のためのロバスト性テストの重要性を強調する。 Data-driven, neural network (NN) based anomaly detection and predictive maintenance are emerging research areas. NN-based analytics of time-series data offer valuable insights into past behaviors and estimates of critical parameters like remaining useful life (RUL) of equipment and state-of-charge (SOC) of batteries. However, input time series data can be exposed to intentional or unintentional noise when passing through sensors, necessitating robust validation and verification of these NNs. This paper presents a case study of the robustness verification approach for time series regression NNs (TSRegNN) using set-based formal methods. It focuses on utilizing variable-length input data to streamline input manipulation and enhance network architecture generalizability. The method is applied to two data sets in the Prognostics and Health Management (PHM) application areas: (1) SOC estimation of a Lithium-ion battery and (2) RUL estimation of a turbine engine. The NNs' robustness is checked using star-based reachability analysis, and several performance measures evaluate the effect of bounded perturbations in the input on network outputs, i.e., future outcomes. Overall, the paper offers a comprehensive case study for validating and verifying NN-based analytics of time-series data in real-world applications, emphasizing the importance of robustness testing for accurate and reliable predictions, especially considering the impact of noise on future outcomes.	翻訳日:2023-07-27 13:58:08 公開日:2023-07-26
# 汚損にロバストなリプシッツ文脈探索 Corruption-Robust Lipschitz Contextual Search ( http://arxiv.org/abs/2307.13903v1 ) ライセンス: Link先を確認	Shiliang Zuo	(参考訳) 汚損されたバイナリ信号を用いてリプシッツ関数を学習する問題を研究する。学習者は、敵が選択するリプシッツ関数$f$を学習しようとする。各ラウンドにおいて、敵は入力空間内の文脈ベクトル$x_t$を選択し、学習者は真の関数値$f(x_t)$を推測して、その推測が高すぎたか低すぎたかを示すバイナリ信号を受け取る。合計$C$ラウンドで信号が汚損される可能性があるが、$C$の値は学習者には未知である。学習者の目標は、小さな累積損失を被ることである。汚損にロバストなアルゴリズムの設計に有用であることが分かる、自然かつ強力な手法である健全性チェック(sanity check)を提示する。(リプシッツパラメータ$L$を定数として扱うと)対称損失に対しては、$d = 1$で後悔$O(C\log T)$、$d > 1$で$O_d(C\log T + T^{(d-1)/d})$を達成し、価格損失に対しては後悔$\widetilde{O}(T^{d/(d+1)} + C\cdot T^{1/(d+1)})$を達成するアルゴリズムを設計する。 I study the problem of learning a Lipschitz function with corrupted binary signals. The learner tries to learn a Lipschitz function $f$ that the adversary chooses. In each round, the adversary selects a context vector $x_t$ in the input space, and the learner makes a guess of the true function value $f(x_t)$ and receives a binary signal indicating whether the guess was high or low. In a total of $C$ rounds, the signal may be corrupted, though the value of $C$ is unknown to the learner. The learner's goal is to incur a small cumulative loss. I present a natural yet powerful technique, sanity check, which proves useful in designing corruption-robust algorithms. I design algorithms that (treating the Lipschitz parameter $L$ as constant) achieve the following: for the symmetric loss, the learner achieves regret $O(C\log T)$ with $d = 1$ and $O_d(C\log T + T^{(d-1)/d})$ with $d > 1$; for the pricing loss, the learner achieves regret $\widetilde{O} (T^{d/(d+1)} + C\cdot T^{1/(d+1)})$.	翻訳日:2023-07-27 13:57:45 公開日:2023-07-26
# YOLOBench: 組み込みシステム上での効率的なオブジェクト検出器のベンチマーク YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems ( http://arxiv.org/abs/2307.13901v1 ) ライセンス: Link先を確認	Ivan Lazarevich and Matteo Grimaldi and Ravish Kumar and Saptarshi Mitra and Shahrukh Khan and Sudhakar Sah	(参考訳) 本稿では、4つの異なるデータセットと4つの異なる組み込みハードウェアプラットフォーム(x86 CPU、ARM CPU、Nvidia GPU、NPU)上の550以上のYOLOベースの物体検出モデルで構成されるベンチマークYOLOBenchを提示する。固定された訓練環境(コードと訓練ハイパーパラメータ)の下でこれらの検出器を公正かつ制御された形で比較し、様々なモデルスケールにおける様々なYOLOベースの1段検出器の精度とレイテンシの値を収集する。収集したデータのパレート最適性分析により、現代的な検出ヘッドと訓練技術を学習プロセスに取り入れれば、YOLOv3やYOLOv4といった古いモデルを含むYOLOシリーズの複数のアーキテクチャが、良好な精度とレイテンシのトレードオフを実現することが明らかになった。また、ニューラルアーキテクチャ探索で使用される訓練不要の精度推定器をYOLOBench上で評価し、最先端のゼロコスト精度推定器のほとんどはMACカウントのような単純なベースラインに劣るものの、そのいくつかはパレート最適な検出モデルの予測に効果的に使用できることを示した。ゼロコストプロキシを用いて、Raspberry Pi 4 CPU上で最先端のYOLOv8モデルと競合するYOLOアーキテクチャを特定することで、これを実証する。コードとデータはhttps://github.com/Deeplite/deeplite-torch-zooで入手できる。 We present YOLOBench, a benchmark comprised of 550+ YOLO-based object detection models on 4 different datasets and 4 different embedded hardware platforms (x86 CPU, ARM CPU, Nvidia GPU, NPU). We collect accuracy and latency numbers for a variety of YOLO-based one-stage detectors at different model scales by performing a fair, controlled comparison of these detectors with a fixed training environment (code and training hyperparameters). Pareto-optimality analysis of the collected data reveals that, if modern detection heads and training techniques are incorporated into the learning process, multiple architectures of the YOLO series achieve a good accuracy-latency trade-off, including older models like YOLOv3 and YOLOv4. We also evaluate training-free accuracy estimators used in neural architecture search on YOLOBench and demonstrate that, while most state-of-the-art zero-cost accuracy estimators are outperformed by a simple baseline like MAC count, some of them can be effectively used to predict Pareto-optimal detection models. We showcase this by using a zero-cost proxy to identify a YOLO architecture competitive against a state-of-the-art YOLOv8 model on a Raspberry Pi 4 CPU. The code and data are available at https://github.com/Deeplite/deeplite-torch-zoo	翻訳日:2023-07-27 13:57:07 公開日:2023-07-26
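The Pareto-optimality analysis in the abstract keeps only the models that no other model beats on both axes at once, that is, no other model is at least as fast and at least as accurate while being strictly better on one of the two. The following is a minimal sketch of that filter. The model names and numbers are hypothetical and purely illustrative; they are not YOLOBench results.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    latency_ms: float  # lower is better
    accuracy: float    # e.g. mAP, higher is better


def pareto_front(results):
    """Return results not dominated in (latency, accuracy), sorted by latency."""
    front = []
    for r in results:
        dominated = any(
            o.latency_ms <= r.latency_ms
            and o.accuracy >= r.accuracy
            and (o.latency_ms < r.latency_ms or o.accuracy > r.accuracy)
            for o in results
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.latency_ms)


# Hypothetical, illustrative numbers only.
results = [
    Result("model_a", 12.0, 0.40),
    Result("model_b", 20.0, 0.46),
    Result("model_c", 25.0, 0.44),  # slower and less accurate than model_b
    Result("model_d", 35.0, 0.52),
]
print([r.name for r in pareto_front(results)])  # → ['model_a', 'model_b', 'model_d']
```

This quadratic scan is enough for a few hundred models. Sorting by latency first and tracking the best accuracy seen so far would give an O(n log n) version.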
# FinTree: 関係抽出のための金融データセット事前学習Transformerエンコーダ FinTree: Financial Dataset Pretrain Transformer Encoder for Relation Extraction ( http://arxiv.org/abs/2307.13900v1 ) ライセンス: Link先を確認	Hyunjong Ok	(参考訳) 関係抽出のための金融データセット事前学習Transformerエンコーダ(Financial Dataset Pretrain Transformer Encoder)であるFinTreeを提案する。エンコーダ言語モデルを用い、金融データセット上でFinTreeをさらに事前訓練することで、モデルを金融ドメインのタスクに適応させる。FinTreeの特徴は、Pattern Exploiting Trainingの方法論に着想を得て、従来の[CLS]トークンの代わりにマスクされたトークンを予測する新しい構造にある。この構造により、与えられた2つのエンティティ間のより正確な関係予測が可能になる。モデルは、対象エンティティに関する文脈情報と位置情報を与える独自の入力パターンで訓練され、後処理ステップによってエンティティタイプに沿った正確な予測が保証される。実験により、大規模金融関係抽出データセットREFinDにおいてFinTreeが優れた性能を示すことを実証する。コードと事前訓練済みモデルはhttps://github.com/HJ-Ok/FinTreeで入手できる。 We present FinTree, Financial Dataset Pretrain Transformer Encoder for Relation Extraction. Utilizing an encoder language model, we further pretrain FinTree on the financial dataset, adapting the model in financial domain tasks. FinTree stands out with its novel structure that predicts a masked token instead of the conventional [CLS] token, inspired by the Pattern Exploiting Training methodology. This structure allows for more accurate relation predictions between two given entities. The model is trained with a unique input pattern to provide contextual and positional information about the entities of interest, and a post-processing step ensures accurate predictions in line with the entity types. Our experiments demonstrate that FinTree achieves superior performance on REFinD, a large-scale financial relation extraction dataset. The code and pretrained models are available at https://github.com/HJ-Ok/FinTree.	翻訳日:2023-07-27 13:56:42 公開日:2023-07-26
# メタ学習生成モデルによるニューラルネットワークの正則化 Regularizing Neural Networks with Meta-Learning Generative Models ( http://arxiv.org/abs/2307.13899v1 ) ライセンス: Link先を確認	Shin'ya Yamaguchi, Daiki Chijiwa, Sekitoshi Kanai, Atsutoshi Kumagai, Hisashi Kashima	(参考訳) 本稿では,深層学習のための生成データ拡張を改善する手法について検討する。生成データ拡張は、生成モデルによって生成された合成サンプルを、小規模データセット設定における分類のための追加データセットとして活用する。生成データ拡張の重要な課題は、合成データに、精度を低下させる情報量の少ないサンプルが含まれることである。これは、合成サンプルが実データのクラスカテゴリを完全には表現しておらず、一様サンプリングが必ずしもタスクに有用なサンプルを提供するとは限らないためである。本稿では,メタ生成正則化(meta generative regularization, MGR)と呼ばれる新しい生成データ拡張戦略を提案する。生成データ拡張による劣化を避けるため、MGRは合成サンプルを、損失関数(例えばクロスエントロピー)ではなく、特徴抽出器の正則化項で利用する。これらの合成サンプルは、メタ学習によって検証損失を最小化するように動的に決定される。MGRは素朴な生成データ拡張による性能劣化を回避し、ベースラインを向上できることを確認した。6つのデータセットでの実験により、MGRは特にデータセットが小さい場合に有効であり、安定してベースラインを上回ることが示された。 This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation leverages the synthetic samples produced by generative models as an additional dataset for classification in small-dataset settings. A key challenge of generative data augmentation is that the synthetic data contain uninformative samples that degrade accuracy. This is because the synthetic samples do not perfectly represent class categories in real data and uniform sampling does not necessarily provide useful samples for tasks. In this paper, we present a novel strategy for generative data augmentation called meta generative regularization (MGR). To avoid the degradation of generative data augmentation, MGR utilizes synthetic samples in the regularization term for feature extractors instead of in the loss function, e.g., cross-entropy. These synthetic samples are dynamically determined to minimize the validation losses through meta-learning. We observed that MGR can avoid the performance degradation of naïve generative data augmentation and boost the baselines. Experiments on six datasets showed that MGR is effective particularly when datasets are smaller and stably outperforms baselines.	翻訳日:2023-07-27 13:56:25 公開日:2023-07-26
# AViT: 小規模皮膚病変セグメンテーションデータセットへのVision Transformerの適応 AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets ( http://arxiv.org/abs/2307.13897v1 ) ライセンス: Link先を確認	Siyi Du, Nourhan Bayasi, Ghassan Harmarneh, Rafeef Garbi	(参考訳) 皮膚病変セグメンテーション(SLS)は皮膚病変解析において重要な役割を担っている。Vision Transformer(ViT)はSLSの有望な解決策と考えられているが、パラメータの多い構造と一部の帰納バイアスの欠如により、畳み込みニューラルネットワーク(CNN)と比較してより多くの訓練データを必要とする。この問題を軽減するため、現在のアプローチでは事前訓練済みのViTバックボーンをSLSデータセット上で微調整し、より大規模な自然画像から学んだ知識を活用して、必要な皮膚画像の訓練データ量を減らすことを目指している。しかし、大規模なバックボーンの全パラメータを完全に微調整することは、計算コストが高くメモリ集約的である。本稿では、任意の事前訓練済みViTをSLSタスクに転移させることでViTのデータ需要を緩和する、新しい効率的な戦略であるAViTを提案する。具体的には、事前訓練済みの重みを更新せずにViTの特徴表現を変調する軽量モジュール(アダプタ)をTransformer層に統合する。さらに、浅いCNNをプロンプト生成器として用い、入力画像からプロンプト埋め込みを作成する。このプロンプト埋め込みは細粒度の情報とCNNの帰納バイアスを捉え、小規模データセットでのセグメンテーションタスクを導く。4つの皮膚病変データセットでの定量的実験により、AViTは訓練可能なパラメータを大幅に削減しつつ、SOTAと同等、時にはそれ以上の性能を達成することを示した。コードはhttps://github.com/siyi-wind/AViTで利用可能である。 Skin lesion segmentation (SLS) plays an important role in skin lesion analysis. Vision transformers (ViTs) are considered an auspicious solution for SLS, but they require more training data compared to convolutional neural networks (CNNs) due to their inherent parameter-heavy structure and lack of some inductive biases. To alleviate this issue, current approaches fine-tune pre-trained ViT backbones on SLS datasets, aiming to leverage the knowledge learned from a larger set of natural images to lower the amount of skin training data needed. However, fully fine-tuning all parameters of large backbones is computationally expensive and memory intensive. In this paper, we propose AViT, a novel efficient strategy to mitigate ViTs' data-hunger by transferring any pre-trained ViTs to the SLS task. Specifically, we integrate lightweight modules (adapters) within the transformer layers, which modulate the feature representation of a ViT without updating its pre-trained weights. In addition, we employ a shallow CNN as a prompt generator to create a prompt embedding from the input image, which grasps fine-grained information and CNN's inductive biases to guide the segmentation task on small datasets. Our quantitative experiments on 4 skin lesion datasets demonstrate that AViT achieves competitive, and at times superior, performance to SOTA but with significantly fewer trainable parameters. Our code is available at https://github.com/siyi-wind/AViT.	翻訳日:2023-07-27 13:56:10 公開日:2023-07-26
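Adapters of the kind described here (small trainable modules inside a frozen transformer) are commonly built as a residual bottleneck: down-project, apply a nonlinearity, up-project, add back to the input. The sketch below shows that generic bottleneck adapter in plain Python on a single token vector. The bottleneck width, the ReLU and the zero initialisation of the up-projection are assumptions in the style of common adapter designs, not necessarily AViT's exact module.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class BottleneckAdapter:
    """Residual bottleneck adapter: h + W_up * relu(W_down * h + b_down) + b_up.

    Only these parameters would be trained; the surrounding ViT weights stay frozen.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.w_down = [[rng.gauss(0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Zero-initialised up-projection: the adapter starts as the identity map,
        # so inserting it does not change the pre-trained model's outputs.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def num_params(self):
        dim, r = len(self.w_up), len(self.w_down)
        return 2 * dim * r + r + dim

    def __call__(self, h):
        z = [max(0.0, v + b) for v, b in zip(matvec(self.w_down, h), self.b_down)]
        delta = [v + b for v, b in zip(matvec(self.w_up, z), self.b_up)]
        return [hi + di for hi, di in zip(h, delta)]
```

For a ViT-Base width of 768 and a bottleneck of 64, one such adapter adds 2·768·64 + 64 + 768 = 99,136 trainable parameters, far fewer than the roughly 7M weights in one ViT-Base transformer block.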
# AI4GCC - チーム: Below Sea Level: 批判と改善 AI4GCC - Team: Below Sea Level: Critiques and Improvements ( http://arxiv.org/abs/2307.13894v1 ) ライセンス: Link先を確認	Bram Renting, Phillip Wozny, Robert Loftin, Claudia Wieners, Erman Acar	(参考訳) 本稿では、気候変動が経済に与える影響を評価するための統合評価モデル(IAM)であるシミュレーションフレームワークRICE-Nを批判的に分析する。我々は、アクションマスキングや無関係な行動を含むRICE-Nの主要な問題点を特定し、関税収入の活用や過剰生産へのペナルティなどの改善を提案する。また、IAM全般の特徴、すなわち過度に楽観的な損害関数と非現実的な削減費用関数についても批判的に検討する。本研究の知見は、シミュレーションを改善し政策立案者にとってより有用な示唆を与えられるよう、RICE-Nフレームワークをさらに発展させる継続的な取り組みに貢献する。 We present a critical analysis of the simulation framework RICE-N, an integrated assessment model (IAM) for evaluating the impacts of climate change on the economy. We identify key issues with RICE-N, including action masking and irrelevant actions, and suggest improvements such as utilizing tariff revenue and penalizing overproduction. We also critically engage with features of IAMs in general, namely overly optimistic damage functions and unrealistic abatement cost functions. Our findings contribute to the ongoing efforts to further develop the RICE-N framework in an effort to improve the simulation, making it more useful as an inspiration for policymakers.	翻訳日:2023-07-27 13:55:44 公開日:2023-07-26
# 気候変動交渉のための動的グループ化:効果的な戦略による協力の促進と利害の均衡 Dynamic Grouping for Climate Change Negotiation: Facilitating Cooperation and Balancing Interests through Effective Strategies ( http://arxiv.org/abs/2307.13893v1 ) ライセンス: Link先を確認	Yu Qin, Duo Zhang, Yuren Pang	(参考訳) 本稿では,実世界のビジネス交渉と政治交渉のプロトコルに基づく、気候変動緩和のための動的グループ化交渉モデルを提案する。AI4GCCコンペティションの枠組みの中で,グループ形成と更新,グループ内交渉,グループ間交渉という3段階のプロセスを開発する。本モデルは,グローバルな気候変動目標を達成するために,様々な利害関係者間の効率的かつ効果的な協力を促進する。グループ形成手法とグループ更新戦略を導入することで,多地域気候交渉における複雑さと不均衡に対処する。グループ内交渉は、すべてのメンバーが緩和努力に貢献することを保証し、グループ間交渉は、提案・評価フレームワークを用いて緩和率と貯蓄率を設定する。我々は、RICE-Nフレームワーク内で本交渉モデルを実証し、気候変動緩和に関する国際協力を促進するための有望なアプローチを示す。 In this paper, we propose a dynamic grouping negotiation model for climate mitigation based on real-world business and political negotiation protocols. Within the AI4GCC competition framework, we develop a three-stage process: group formation and updates, intra-group negotiation, and inter-group negotiation. Our model promotes efficient and effective cooperation between various stakeholders to achieve global climate change objectives. By implementing a group-forming method and group updating strategy, we address the complexities and imbalances in multi-region climate negotiations. Intra-group negotiations ensure that all members contribute to mitigation efforts, while inter-group negotiations use the proposal-evaluation framework to set mitigation and savings rates. We demonstrate our negotiation model within the RICE-N framework, illustrating a promising approach for facilitating international cooperation on climate change mitigation.	翻訳日:2023-07-27 13:55:32 公開日:2023-07-26
# AI4GCC - チーム: Below Sea Level: スコアと実世界との関連性 AI4GCC - Team: Below Sea Level: Score and Real World Relevance ( http://arxiv.org/abs/2307.13892v1 ) ライセンス: Link先を確認	Phillip Wozny, Bram Renting, Robert Loftin, Claudia Wieners, Erman Acar	(参考訳) AI for Global Climate Cooperation(AI4GCC)コンペティションのトラック3への提出物として,RICE-N気候経済シミュレーションで使用する交渉プロトコルを提案する。本提案は,炭素国境調整メカニズム(CBAM)と気候クラブ(CC)に着想を得た手法を用いて,炭素リーケージの課題に対処することを目指す。シミュレーション結果を代表濃度経路(RCP)および共有社会経済経路(SSP)と比較することで,本手法の有効性を示す。本プロトコルは、RCP 3.4/4.5およびSSP 2に匹敵する気温上昇をもたらす。さらに、本プロトコルの世界貿易機関(WTO)ルールへの準拠、行政的・政治的実現可能性、倫理的懸念について分析する。本提案には後発開発途上国に損害を与えるリスクがあることを認識しており、技術共有や富の再分配など、既存の不平等を悪化させないための具体的な是正措置を提案する。今後の研究では、RICE-Nの関税メカニズムを改善し、前述の是正措置を可能にする行動を実装すべきである。 As our submission for track three of the AI for Global Climate Cooperation (AI4GCC) competition, we propose a negotiation protocol for use in the RICE-N climate-economic simulation. Our proposal seeks to address the challenges of carbon leakage through methods inspired by the Carbon Border Adjustment Mechanism (CBAM) and Climate Clubs (CC). We demonstrate the effectiveness of our approach by comparing simulated outcomes to representative concentration pathways (RCP) and shared socioeconomic pathways (SSP). Our protocol results in a temperature rise comparable to RCP 3.4/4.5 and SSP 2. Furthermore, we provide an analysis of our protocol's World Trade Organization compliance, administrative and political feasibility, and ethical concerns. We recognize that our proposal risks hurting the least developed countries, and we suggest specific corrective measures to avoid exacerbating existing inequalities, such as technology sharing and wealth redistribution. Future research should improve the RICE-N tariff mechanism and implement actions allowing for the aforementioned corrective measures.	翻訳日:2023-07-27 13:55:17 公開日:2023-07-26
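The CBAM-inspired idea of pricing carbon at the border can be illustrated with the textbook form of a border carbon adjustment: the importer charges the gap between its own carbon price and the exporter's, multiplied by the emissions embedded in the imports. This is a simplified, hypothetical sketch with made-up numbers; it is not the tariff rule the team implemented in RICE-N.

```python
def border_adjustment(import_volume, emissions_intensity, importer_price, exporter_price):
    """Simplified border carbon adjustment (hypothetical formula for illustration).

    import_volume: units of goods imported
    emissions_intensity: tCO2 embedded per unit of goods
    importer_price / exporter_price: carbon price in each region (currency per tCO2)
    Only a positive price gap is charged; an exporter whose carbon price is at least
    as high as the importer's pays nothing.
    """
    price_gap = max(0.0, importer_price - exporter_price)
    return import_volume * emissions_intensity * price_gap
```

With 1000 units at 0.5 tCO2 each and carbon prices of 80 versus 30, the charge is 1000 · 0.5 · 50 = 25,000; the adjustment is what discourages relocating production to regions with weaker climate policy (carbon leakage).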
# EasyNet:3D産業異常検出のための簡易ネットワーク EasyNet: An Easy Network for 3D Industrial Anomaly Detection ( http://arxiv.org/abs/2307.13925v1 ) ライセンス: Link先を確認	Ruitao Chen, Guoyang Xie, Jiaqi Liu, Jinbao Wang, Ziqi Luo, Jinfan Wang, Feng Zheng	(参考訳) 3D異常検出は、産業製造(IM)におけるコンピュータビジョンの新たな重要課題である。近年、多くの高度なアルゴリズムが発表されているが、そのほとんどはIMのニーズを満たすことができない。その欠点はいくつかある。i) 大規模な事前訓練済みモデルに大きく依存するため、生産ラインへの導入が難しい。ii) メモリバンクの多用によりストレージのオーバーヘッドが大幅に増加する。iii) 推論速度がリアルタイムに達しない。これらの問題を克服するため、事前訓練済みモデルやメモリバンクを使用しない、簡単で導入しやすいネットワーク(EasyNet)を提案する。第一に、異常領域のセグメンテーションマップを正確に再構成し、RGB画像と深度画像の相互作用を促すマルチスケール・マルチモダリティ特徴エンコーダ・デコーダを設計する。第二に、精密な異常マップを得るためにマルチモダリティ異常セグメンテーションネットワークを採用する。第三に、推論時の特徴融合のための注意機構に基づく情報エントロピー融合モジュールを提案し、リアルタイム展開に適したものにする。大規模な実験により、EasyNetは事前訓練済みモデルやメモリバンクを使わずに92.6%の異常検出AUROCを達成することを示した。さらに、EasyNetは既存の手法よりも高速で、Tesla V100 GPU上で94.55 FPSのフレームレートを持つ。 3D anomaly detection is an emerging and vital computer vision task in industrial manufacturing (IM). Recently many advanced algorithms have been published, but most of them cannot meet the needs of IM. There are several disadvantages: i) difficult to deploy on production lines since their algorithms heavily rely on large pre-trained models; ii) hugely increase storage overhead due to overuse of memory banks; iii) the inference speed cannot be achieved in real-time. To overcome these issues, we propose an easy and deployment-friendly network (called EasyNet) without using pre-trained models and memory banks: firstly, we design a multi-scale multi-modality feature encoder-decoder to accurately reconstruct the segmentation maps of anomalous regions and encourage the interaction between RGB images and depth images; secondly, we adopt a multi-modality anomaly segmentation network to achieve a precise anomaly map; thirdly, we propose an attention-based information entropy fusion module for feature fusion during inference, making it suitable for real-time deployment. Extensive experiments show that EasyNet achieves an anomaly detection AUROC of 92.6% without using pre-trained models and memory banks. In addition, EasyNet is faster than existing methods, with a high frame rate of 94.55 FPS on a Tesla V100 GPU.	翻訳日:2023-07-27 13:49:36 公開日:2023-07-26
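The headline metric, image-level AUROC, equals the probability that a randomly chosen anomalous sample receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half (the Mann-Whitney statistic). A minimal sketch of that computation, using illustrative scores rather than EasyNet outputs:

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney statistic.

    scores: anomaly scores (higher = more anomalous)
    labels: 1 for anomalous, 0 for normal
    Runs in O(P * N) time; fine for small evaluation sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUROC of 92.6% therefore means that in about 92.6% of (anomalous, normal) pairs the anomalous sample is ranked higher; the metric is threshold-free, so it does not by itself fix an operating point for a production line.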
# trajdata: 複数の人間軌跡データセットに対する統一インターフェース trajdata: A Unified Interface to Multiple Human Trajectory Datasets ( http://arxiv.org/abs/2307.13924v1 ) ライセンス: Link先を確認	Boris Ivanovic, Guanyu Song, Igor Gilitschenski, Marco Pavone	(参考訳) 軌跡予測の分野は近年大きく成長しており、その一因は自動運転車(AV)や歩行者の動き追跡のための大規模で実世界の人間軌跡データセットが多数公開されたことにある。このようなデータセットはコミュニティにとって恩恵であるが、それぞれが独自のデータ形式とAPIを用いているため、研究者が複数のデータセットにまたがって手法を訓練・評価するのは煩雑である。これを改善するために、複数の人間軌跡データセットに対する統一インターフェースであるtrajdataを提案する。trajdataの中核は、軌跡データと地図データのためのシンプルで統一的かつ効率的な表現とAPIを提供する。その能力を示すため、本研究では既存の軌跡データセットの包括的な実証評価を行い、現在の歩行者およびAVの動き予測研究の多くを支えるデータについて豊かな理解をユーザに提供するとともに、これらの知見から将来のデータセットへの提案を示す。trajdataは寛容なライセンス(Apache 2.0)で提供され、https://github.com/NVlabs/trajdataからオンラインでアクセスできる。 The field of trajectory forecasting has grown significantly in recent years, partially owing to the release of numerous large-scale, real-world human trajectory datasets for autonomous vehicles (AVs) and pedestrian motion tracking. While such datasets have been a boon for the community, they each use custom and unique data formats and APIs, making it cumbersome for researchers to train and evaluate methods across multiple datasets. To remedy this, we present trajdata: a unified interface to multiple human trajectory datasets. At its core, trajdata provides a simple, uniform, and efficient representation and API for trajectory and map data. As a demonstration of its capabilities, in this work we conduct a comprehensive empirical evaluation of existing trajectory datasets, providing users with a rich understanding of the data underpinning much of current pedestrian and AV motion forecasting research, and proposing suggestions for future datasets from these insights. trajdata is permissively licensed (Apache 2.0) and can be accessed online at https://github.com/NVlabs/trajdata	翻訳日:2023-07-27 13:49:18 公開日:2023-07-26
# GrammarGPT: 教師ありファインチューニングによる中国語ネイティブ文法誤り訂正のためのオープンソースLLMの探索 GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning ( http://arxiv.org/abs/2307.13923v1 ) ライセンス: Link先を確認	Yaxin Fan, Feng Jiang, Peifeng Li, and Haizhou Li	(参考訳) 文法誤り訂正は、非文法的な文を自動的に修正することを目的とする。近年、いくつかの研究により、クローズドソースの大規模言語モデル(LLM、例えばChatGPT)の文法誤り訂正における優れた能力が実証されている。しかし、オープンソースLLMの可能性はまだ明らかにされていない。本稿では、オープンソースLLMであるGrammarGPTを導入し、中国語ネイティブの文法誤り訂正におけるその可能性を予備的に探る。GrammarGPTの核となるレシピは、ChatGPT生成データと人手アノテーションデータのハイブリッドデータセットを活用することである。手がかりのある文法誤りについては、その手がかりを与えてChatGPTに非文法的な文を生成させるヒューリスティックな手法を提案した。手がかりのない文法誤りについては、公開ウェブサイトから非文法的な文を収集し、手作業で修正した。さらに、中国語ネイティブの文法誤りを訂正するモデルの能力を高めるため、誤り不変拡張法を採用した。最終的に約1kの並列データを構築し、これらのデータを用いてオープンソースLLM(例えば香港中文大学(深セン)がリリースしたPhoenix)を命令チューニングで微調整した。実験の結果、GrammarGPTは既存のSOTAシステムを大きく上回った。モデルパラメータはSOTAベースラインの20倍大きいが、命令チューニングに必要なデータ量は1200分の1であり、中国語ネイティブの文法誤り訂正(CGEC)におけるオープンソースLLMの可能性を示している。GrammarGPTはNLPCC2023 SharedTask1で3位となり、本手法の有効性を示している。コードとデータは https://github.com/FreedomIntelligence/GrammarGPT で入手できる。 Grammatical error correction aims to correct ungrammatical sentences automatically. Recently, some work has demonstrated the excellent capabilities of closed-source Large Language Models (LLMs, e.g., ChatGPT) in grammatical error correction. However, the potential of open-source LLMs remains unexplored. In this paper, we introduced GrammarGPT, an open-source LLM, to preliminarily explore its potential for native Chinese grammatical error correction. The core recipe of GrammarGPT is to leverage a hybrid dataset of ChatGPT-generated and human-annotated data. For grammatical errors with clues, we proposed a heuristic method to guide ChatGPT to generate ungrammatical sentences by providing those clues. For grammatical errors without clues, we collected ungrammatical sentences from publicly available websites and manually corrected them. In addition, we employed an error-invariant augmentation method to enhance the ability of the model to correct native Chinese grammatical errors. We ultimately constructed about 1k parallel data and utilized these data to fine-tune open-source LLMs (e.g., Phoenix, released by The Chinese University of Hong Kong, Shenzhen) with instruction tuning. The experimental results show that GrammarGPT outperforms the existing SOTA system significantly. Although model parameters are 20x larger than the SOTA baseline, the required amount of data for instruction tuning is 1200x smaller, illustrating the potential of open-source LLMs on native CGEC. Our GrammarGPT ranks 3rd on NLPCC2023 SharedTask1, demonstrating our approach's effectiveness. The code and data are available at https://github.com/FreedomIntelligence/GrammarGPT.	翻訳日:2023-07-27 13:49:01 公開日:2023-07-26
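As we understand it, the error-invariant augmentation replaces named entities in a parallel (ungrammatical, corrected) pair with other entities of the same kind, applying the identical substitution to both sides so that the grammatical error and its correction are untouched and the model is pushed to attend to the error rather than the nouns. The sketch below assumes that reading; the entity pool, the tagging format and the English example are placeholders (the paper works on Chinese text).

```python
import random

# Placeholder substitution pool (hypothetical); keys are entity types.
ENTITY_POOL = {
    "CITY": ["Beijing", "Shanghai", "Shenzhen", "Chengdu"],
    "PERSON": ["Li Hua", "Wang Fang", "Zhang Wei"],
}


def error_invariant_augment(source, target, entities, rng):
    """Swap each tagged entity for another of the same type in BOTH sentences.

    entities: list of (surface_string, entity_type) found in the pair.
    Because source and target receive identical substitutions, the edit that turns
    the ungrammatical source into the corrected target is preserved.
    """
    for surface, etype in entities:
        options = [e for e in ENTITY_POOL.get(etype, []) if e != surface]
        if not options:
            continue
        new = rng.choice(options)
        source = source.replace(surface, new)
        target = target.replace(surface, new)
    return source, target
```

Each call with a different random generator state yields a new training pair carrying the same error pattern, which is one cheap way to stretch a parallel set of about 1k examples.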
# マルチエージェント学習の安定性:多数のプレイヤーを持つネットワークゲームにおける収束 Stability of Multi-Agent Learning: Convergence in Network Games with Many Players ( http://arxiv.org/abs/2307.13922v1 ) ライセンス: Link先を確認	Aamal Hussain, Dan Leonte, Francesco Belardinelli and Georgios Piliouras	(参考訳) 多人数ゲームにおけるマルチエージェント学習の振る舞いは、ネットワークゼロサムゲームのような限定的な例以外では複雑なダイナミクスを示すことが示されている。また、プレイヤー数が増えるほど収束的な振る舞いは起こりにくくなることも示されている。この問題の解決に向けて、我々はQ学習ダイナミクスを研究し、任意のネットワークゲームにおいてダイナミクスが一意な均衡に収束するための十分条件を導く。この条件はペアワイズな相互作用の性質とネットワーク構造に依存するが、ゲーム内のエージェントの総数には明示的に依存しないことがわかった。この結果を代表的なネットワークゲームで評価し、適切なネットワーク条件の下では任意の数のエージェントで安定した学習ダイナミクスを達成できることを示す。 The behaviour of multi-agent learning in many player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increases. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents.	翻訳日:2023-07-27 13:48:34 公開日:2023-07-26
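To make the setting concrete, the sketch below Euler-integrates continuous-time Q-learning dynamics (a replicator term plus an entropy/exploration term with temperature T) on a small ring in which every agent plays the same two-action game against each neighbour. For two actions and x = P(action 0), the dynamics reduce to dx/dt = x(1 − x)[(r0 − r1) − T(ln x − ln(1 − x))], whose interior fixed points are quantal response equilibria, x = σ((r0 − r1)/T). The game, ring, T and step size are illustrative assumptions and the paper's sufficient condition is not implemented; we only picked T = 2 so that the logit response to the two neighbours has Lipschitz constant at most 2 · (1/4) · (3/T) = 0.75 < 1, which makes the equilibrium unique.

```python
import math
import random

# Row player's payoff in the pairwise game (hypothetical): A[a][b] is the payoff for
# playing action a against a neighbour playing action b.
A = [[2.0, 0.0], [0.0, 1.0]]


def payoffs(i, x, neighbours):
    """Expected payoff of each action for agent i, summed over its neighbours."""
    r0 = sum(A[0][0] * x[j] + A[0][1] * (1 - x[j]) for j in neighbours[i])
    r1 = sum(A[1][0] * x[j] + A[1][1] * (1 - x[j]) for j in neighbours[i])
    return r0, r1


def q_learning_dynamics(n_agents=6, T=2.0, dt=0.01, steps=20000, seed=0):
    """Euler-integrate two-action Q-learning dynamics on a ring network.

    x[i] is the probability that agent i plays action 0; the update is
        dx/dt = x (1 - x) [ (r0 - r1) - T (ln x - ln(1 - x)) ].
    """
    rng = random.Random(seed)
    neighbours = [[(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)]
    x = [rng.uniform(0.05, 0.95) for _ in range(n_agents)]
    for _ in range(steps):
        grads = []
        for i in range(n_agents):
            r0, r1 = payoffs(i, x, neighbours)
            logit = math.log(x[i]) - math.log(1 - x[i])
            grads.append(x[i] * (1 - x[i]) * ((r0 - r1) - T * logit))
        x = [xi + dt * g for xi, g in zip(x, grads)]
    return x, neighbours


def qre_residual(x, neighbours, T):
    """Largest gap between x[i] and the logit (softmax) response to its payoffs."""
    worst = 0.0
    for i in range(len(x)):
        r0, r1 = payoffs(i, x, neighbours)
        target = 1.0 / (1.0 + math.exp(-(r0 - r1) / T))
        worst = max(worst, abs(x[i] - target))
    return worst
```

Starting from different random initial strategies, the agents settle at the same point, which is what "convergence to a unique equilibrium" looks like in this toy case; lowering T far enough would let the coordination game's multiple equilibria reappear.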
# ホーキング放射のエントロピーのゆらぎ Fluctuations in the Entropy of Hawking Radiation ( http://arxiv.org/abs/2307.13920v1 ) ライセンス: Link先を確認	Raphael Bousso, Masamichi Miyaji	(参考訳) Penington et al. が導入した2次元モデルにおいて、重力経路積分(GPI)を用いてPage曲線の周りのホーキング放射エントロピーのゆらぎを計算する。Page時間以前には $\delta S = e^{-S}/\sqrt{2}$ であることがわかった。ここで $S$ はブラックホールエントロピーである。この結果は2部系におけるHaar平均エントロピーゆらぎと一致し、これも我々は主要次数で計算する。Page時間以後には、マイクロカノニカルなエネルギー窓の幅に対数的に依存する前因子を除いて $\delta S \sim e^{-S}$ となる。これはサブシステムのサイズの交換に対して対称ではないため、ヒルベルト空間次元が固定されたサブシステムに対するHaar平均とは一致しない。この食い違いは、ブラックホールのヒルベルト空間次元が状態の準備によって固定されないことに起因しうる。トップハット型のスメアリング関数を持つマイクロカノニカルアンサンブルにおいても、GPIはブラックホール状態数に加法的なゆらぎをもたらす。この結果と、GPIによって計算されるPage曲線が滑らかであるという事実は、いずれもGPIのアンサンブル解釈を示唆している。 We use the gravitational path integral (GPI) to compute the fluctuations of the Hawking radiation entropy around the Page curve, in a two-dimensional model introduced by Penington et al. Before the Page time, we find that $\delta S = e^{-S}/\sqrt{2}$, where $S$ is the black hole entropy. This result agrees with the Haar-averaged entropy fluctuations of a bipartite system, which we also compute at leading order. After the Page time, we find that $\delta S \sim e^{-S}$, up to a prefactor that depends logarithmically on the width of the microcanonical energy window. This is not symmetric under exchange of subsystem sizes and so does not agree with the Haar average for a subsystem of fixed Hilbert space dimension. The discrepancy can be attributed to the fact that the black hole Hilbert space dimension is not fixed by the state preparation: even in a microcanonical ensemble with a top-hat smearing function, the GPI yields an additive fluctuation in the number of black hole states. This result, and the fact that the Page curve computed by the GPI is smooth, all point towards an ensemble interpretation of the GPI.	翻訳日:2023-07-27 13:48:21 公開日:2023-07-26
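The Haar comparison can be checked numerically. For a Haar-random pure state on a bipartite system with dimensions d_A ≤ d_B, a second-order expansion of S_A around the maximally mixed state gives, to leading order in 1/d_B, a mean of ln d_A − (d_A² − 1)/(2 d_A d_B) and a standard deviation of sqrt((d_A² − 1)/(2 d_A² d_B²)). For 1 ≪ d_A ≪ d_B the latter tends to 1/(√2 d_B), which is e^{−S}/√2 once S = ln d_B is identified with the black-hole entropy. These leading-order expressions are our own back-of-the-envelope derivation, not formulas quoted from the paper. The Monte Carlo sketch uses d_A = 2 so that ρ_A can be diagonalised in closed form in pure Python, and compares the sample mean with Page's exact Haar average.

```python
import math
import random
import statistics


def entropy_samples_qubit(d_b, samples, seed=0):
    """Entanglement entropy of a 2-level subsystem A for Haar-random states on C^2 (x) C^d_b.

    A normalised complex Gaussian vector is Haar-distributed, so we draw the 2 x d_b
    coefficient matrix G with i.i.d. complex Gaussian entries and use
    rho_A = G G^dagger / Tr(G G^dagger).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(samples):
        row0 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d_b)]
        row1 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d_b)]
        a = sum(abs(z) ** 2 for z in row0)
        d = sum(abs(z) ** 2 for z in row1)
        b = sum(u * v.conjugate() for u, v in zip(row0, row1))
        det = (a * d - abs(b) ** 2) / (a + d) ** 2  # det(rho_A), with Tr(rho_A) = 1
        disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
        lams = ((1.0 + disc) / 2.0, (1.0 - disc) / 2.0)
        out.append(-sum(lam * math.log(lam) for lam in lams if lam > 0.0))
    return out


def page_mean(d_a, d_b):
    """Page's exact Haar-average entropy for d_a <= d_b."""
    return sum(1.0 / k for k in range(d_b + 1, d_a * d_b + 1)) - (d_a - 1) / (2.0 * d_b)


def leading_order_std(d_a, d_b):
    """Leading-order fluctuation sqrt((d_a^2 - 1) / (2 d_a^2 d_b^2))."""
    return math.sqrt((d_a ** 2 - 1) / (2.0 * d_a ** 2 * d_b ** 2))


def summarize(d_b=64, samples=3000, seed=0):
    """Sample mean and standard deviation of S_A for d_A = 2."""
    s = entropy_samples_qubit(d_b, samples, seed)
    return statistics.mean(s), statistics.stdev(s)
```

With d_B = 64 the sample standard deviation comes out close to sqrt(3/32768) ≈ 0.0096, i.e. of order 1/d_B = e^{−S}, while the mean sits just below ln 2, as the Page curve requires before the Page time.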
# 心血管モデルのためのシミュレーションベース推論 Simulation-based Inference for Cardiovascular Models ( http://arxiv.org/abs/2307.13918v1 ) ライセンス: Link先を確認	Antoine Wehenkel, Jens Behrmann, Andrew C. Miller, Guillermo Sapiro, Ozan Sener, Marco Cuturi, Jörn-Henrik Jacobsen	(参考訳) 過去数十年間、血行動態シミュレータは着実に進化し、in silicoで心血管系を研究するための定番のツールとなった。このようなツールは生理的パラメータから全身の血行動態をシミュレートするために日常的に使われているが、波形をもっともらしい生理的パラメータへ写像し直す逆問題を解くことは、有望であると同時に困難なままである。シミュレーションベース推論(SBI)の進歩に動機づけられ、我々はこの逆問題を統計的推論として定式化する。他のアプローチとは対照的に、SBIは関心のあるパラメータの事後分布を与え、個々の測定に対する不確実性の多次元的な表現を提供する。我々はこの能力を示すため、複数の測定モダリティを比較しながら、臨床的に関心のある5つのバイオマーカーについてin silicoの不確実性解析を行った。心拍数推定の実現可能性など既知の事実の裏付けに加えて、本研究は標準治療の測定から新しいバイオマーカーを推定できる可能性を明らかにする。SBIは、パラメータ推定が異なる不確実性レジームを示す部分集団の存在など、標準的な感度分析では捉えられない実用上重要な知見を明らかにする。最後に、MIMIC-III波形データベースを用いてin vivoとin silicoのギャップを調べ、心血管シミュレーションが実世界のデータ解析にどのように役立ちうるかを批判的に議論する。 Over the past decades, hemodynamics simulators have steadily evolved and have become tools of choice for studying cardiovascular systems in-silico. While such tools are routinely used to simulate whole-body hemodynamics from physiological parameters, solving the corresponding inverse problem of mapping waveforms back to plausible physiological parameters remains both promising and challenging. Motivated by advances in simulation-based inference (SBI), we cast this inverse problem as statistical inference. In contrast to alternative approaches, SBI provides posterior distributions for the parameters of interest, providing a multi-dimensional representation of uncertainty for individual measurements. We showcase this ability by performing an in-silico uncertainty analysis of five biomarkers of clinical interest comparing several measurement modalities. Beyond the corroboration of known facts, such as the feasibility of estimating heart rate, our study highlights the potential of estimating new biomarkers from standard-of-care measurements. SBI reveals practically relevant findings that cannot be captured by standard sensitivity analyses, such as the existence of sub-populations for which parameter estimation exhibits distinct uncertainty regimes. Finally, we study the gap between in-vivo and in-silico with the MIMIC-III waveform database and critically discuss how cardiovascular simulations can inform real-world data analysis.	翻訳日:2023-07-27 13:47:57 公開日:2023-07-26
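The simplest member of the simulation-based inference family, rejection sampling (rejection ABC), already shows what a per-measurement, multi-dimensional posterior offers: a parameter the measurement pins down gets a narrow posterior, while a poorly identified one stays close to its prior. The sketch swaps in rejection ABC and a toy two-parameter simulator (one noisy measurement that depends strongly on heart rate and weakly on a second, hypothetical parameter `c`); the paper's hemodynamics simulator and its inference method are different and are not reproduced here.

```python
import random
import statistics


def simulator(hr, c, rng):
    """Toy forward model (hypothetical): one noisy scalar measurement."""
    return hr + 2.0 * c + rng.gauss(0.0, 1.0)


def rejection_abc(y_obs, n_draws=200000, eps=1.0, seed=0):
    """Keep prior draws whose simulated measurement lands within eps of y_obs."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        hr = rng.uniform(50.0, 120.0)  # prior on heart rate (bpm)
        c = rng.uniform(0.0, 1.0)      # prior on a weakly identified parameter
        if abs(simulator(hr, c, rng) - y_obs) < eps:
            accepted.append((hr, c))
    return accepted


def posterior_spread(accepted):
    """Posterior standard deviation of each parameter."""
    hrs = [p[0] for p in accepted]
    cs = [p[1] for p in accepted]
    return statistics.stdev(hrs), statistics.stdev(cs)
```

For an observation of 75, the heart-rate posterior shrinks from a prior standard deviation of about 20 to roughly 1 to 2 bpm, while the posterior for `c` stays near its prior spread of about 0.29, a small-scale version of the "distinct uncertainty regimes" the abstract mentions. Rejection ABC scales poorly with the number of parameters and measurements, which is why neural SBI methods are typically used in practice.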
# BayesDAG: 因果探索のための勾配ベース事後サンプリング BayesDAG: Gradient-Based Posterior Sampling for Causal Discovery ( http://arxiv.org/abs/2307.13917v1 ) ライセンス: Link先を確認	Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong	(参考訳) ベイズ因果探索は、観測データから因果モデルの事後分布を推定し、認識論的不確実性を定量化して下流タスクに役立てることを目的としている。しかし、有向非巡回グラフ(DAG)の組合せ空間と非線形関数にわたる同時推論のため、計算上の課題が生じる。DAGに対する効率的な事後推論に向けた最近の進展にもかかわらず、既存の手法は、線形因果モデルのノード置換行列に対する変分推論に限定されて推論精度が損なわれるか、DAG正則化器で制約された隣接行列の連続緩和に頼るため得られるグラフがDAGである保証がない。本研究では、これらの制約を克服する、確率的勾配マルコフ連鎖モンテカルロ(SG-MCMC)に基づくスケーラブルなベイズ因果探索フレームワークを提案する。本手法は、DAG正則化を必要とせずに事後分布から直接DAGをサンプリングし、同時に関数パラメータのサンプルも抽出し、線形・非線形の両方の因果モデルに適用できる。本手法を可能にするため、置換ベースのDAG学習との新たな等価性を導出し、置換上で定義された任意の緩和勾配推定器を使用する可能性を開く。我々の知る限り、これは勾配ベースのMCMCサンプリングを因果探索に適用した最初のフレームワークである。合成データセットと実世界データセットでの実証評価により、最先端のベースラインと比較した本手法の有効性を示す。 Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on stochastic gradient Markov Chain Monte Carlo (SG-MCMC) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to the permutation-based DAG learning, which opens up possibilities of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.	翻訳日:2023-07-27 13:47:30 公開日:2023-07-26
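The permutation-based view relies on a standard fact: if the nodes are ordered by a permutation and edges may only point from earlier to later nodes, every choice of the remaining edge indicators yields a DAG, so no acyclicity regulariser is needed. A minimal sketch of that parametrisation with an acyclicity check (Kahn's algorithm); the random order and edge mask stand in for the posterior samples BayesDAG would draw and do not implement its SG-MCMC sampler.

```python
import random
from collections import deque


def dag_from_order(order, edge_mask):
    """Adjacency matrix that keeps edge i->j only if i precedes j in the order.

    order: a permutation of range(d); edge_mask[i][j] in {0, 1} proposes edge i->j.
    """
    d = len(order)
    pos = {node: k for k, node in enumerate(order)}
    return [[1 if edge_mask[i][j] and pos[i] < pos[j] else 0 for j in range(d)] for i in range(d)]


def is_acyclic(adj):
    """Kahn's algorithm: a graph is a DAG iff every node can be topologically removed."""
    d = len(adj)
    indeg = [sum(adj[i][j] for i in range(d)) for j in range(d)]
    queue = deque(j for j in range(d) if indeg[j] == 0)
    removed = 0
    while queue:
        i = queue.popleft()
        removed += 1
        for j in range(d):
            if adj[i][j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    queue.append(j)
    return removed == d


def random_dag(d, rng):
    """Draw a random order and edge mask, as a stand-in for one posterior sample."""
    order = list(range(d))
    rng.shuffle(order)
    mask = [[rng.randint(0, 1) for _ in range(d)] for _ in range(d)]
    return dag_from_order(order, mask)
```

Because acyclicity holds by construction, a sampler over (order, mask) pairs never has to reject or penalise cyclic graphs, which is the property the abstract contrasts with regulariser-based continuous relaxations.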
# 予測された文脈を伴うバンディットにおけるオンライン学習 Online learning in bandits with predicted context ( http://arxiv.org/abs/2307.13916v1 ) ライセンス: Link先を確認	Yongyi Guo, Susan Murphy	(参考訳) 各時刻においてエージェントが、文脈のノイズを含むバージョンと誤差分散(またはその分散の推定量)にしかアクセスできない文脈付きバンディット問題を考える。この設定は、意思決定のための真の文脈が観測されず、潜在的に複雑な機械学習アルゴリズムによる文脈の予測しか利用できないという幅広い応用に動機づけられている。文脈誤差が減少しない場合、古典的なバンディットアルゴリズムは劣線形リグレットを達成できない。本稿では、この設定において、適切なベンチマークに対して劣線形リグレットを達成する初のオンラインアルゴリズムを提案する。鍵となるアイデアは、古典統計学における測定誤差モデルをオンライン意思決定の設定に拡張することであり、方策がノイズを含む文脈観測に依存するため、これは自明ではない。 We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications where the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available. When the context error is non-diminishing, classical bandit algorithms fail to achieve sublinear regret. We propose the first online algorithm in this setting with sublinear regret compared to the appropriate benchmark. The key idea is to extend the measurement error model in classical statistics to the online decision-making setting, which is nontrivial due to the policy being dependent on the noisy context observations.	翻訳日:2023-07-27 13:47:08 公開日:2023-07-26
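The classical measurement-error idea can be seen in one dimension: regressing rewards on a noisy context x̂ = x + e shrinks the fitted slope by the reliability ratio Var(x)/(Var(x) + σ_e²), and subtracting the known error variance from the Gram term removes that bias. The sketch shows only this method-of-moments correction on synthetic data; it is not the paper's bandit algorithm, which must additionally handle the policy's dependence on the noisy contexts.

```python
import random


def naive_and_corrected_slope(n=20000, beta=2.0, sigma_e=1.0, seed=0):
    """Fit y = beta * x from noisy contexts x_hat = x + e, with and without correction.

    Assumes x ~ N(0, 1), e ~ N(0, sigma_e^2) independent of x, and a known sigma_e.
    """
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        x_hat = x + rng.gauss(0.0, sigma_e)      # predicted (noisy) context
        y = beta * x + rng.gauss(0.0, 0.1)       # reward depends on the true context
        sxy += x_hat * y
        sxx += x_hat * x_hat
    naive = sxy / sxx                            # attenuated towards 0
    corrected = sxy / (sxx - n * sigma_e ** 2)   # subtract the error variance
    return naive, corrected
```

With unit context variance and unit error variance the naive slope is attenuated to about half of the true value 2, while the corrected slope recovers about 2. Because this bias does not shrink as more data arrive, a bandit that plugs in the naive estimate keeps making systematically wrong decisions, which is the source of the linear regret the abstract refers to.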
# 社会的目的関数によるソーシャルメディアAIへの民主的価値の埋め込み Embedding Democratic Values into Social Media AIs via Societal Objective Functions ( http://arxiv.org/abs/2307.13912v1 ) ライセンス: Link先を確認	Chenyan Jia, Michelle S. Lam, Minh Chau Mai, Jeff Hancock, Michael S. Bernstein	(参考訳) ソーシャルメディアのフィードをランク付けする人工知能(AI)システムを、党派的敵意の緩和といった民主的価値を目的関数の一部として考慮するように設計できるだろうか? 本稿では、確立され検証された社会科学的構成概念をAIの目的関数に変換する手法(これを社会的目的関数と呼ぶ)を導入し、反民主的態度という政治学の構成概念への適用によってこの手法を実証する。従来、このようなモデルの訓練に使える観測可能な結果は存在しなかったが、社会科学ではこれらの構成概念に対する調査票や質的コードブックが開発されており、その精密さによって大規模言語モデル向けの詳細なプロンプトへの変換が容易になる。我々はこの手法を適用して、ソーシャルメディアの投稿が反民主的態度をどの程度助長するかを推定する民主的態度モデルを作成し、3つの研究でこのモデルを検証する。研究1では、まずソーシャルメディア投稿に反民主的態度スコアを手作業で付与し(alpha=.895)、これらのスコアに基づく複数のフィードランキング条件をテストすることで、米国の党派層(N=1,380)における介入の態度的・行動的効果を検証した。投稿の削除(d=.20)とフィードでの順位引き下げ(d=.25)は、参加者の体験やエンゲージメントを損なうことなく党派的敵意を低減した。研究2では、民主的態度モデルを作成して手作業ラベルをスケールアップし、手作業ラベルとの強い一致を確認した(rho=.75)。最後に研究3では、手作業ラベルの代わりに民主的態度モデルを用いて研究1を再現し、その態度的・行動的影響を検証した(N=558)。その結果、社会的目的関数を用いたフィードの順位引き下げが再び党派的敵意を低減することがわかった(d=.25)。本手法は、社会科学の理論と手法を活用してソーシャルメディアAIにおける社会的害を軽減する新たな戦略を提示する。 Can we design artificial intelligence (AI) systems that rank our social media feeds to consider democratic values such as mitigating partisan animosity as part of their objective functions? We introduce a method for translating established, vetted social scientific constructs into AI objective functions, which we term societal objective functions, and demonstrate the method with application to the political science construct of anti-democratic attitudes. Traditionally, we have lacked observable outcomes to use to train such models; however, the social sciences have developed survey instruments and qualitative codebooks for these constructs, and their precision facilitates translation into detailed prompts for large language models. We apply this method to create a democratic attitude model that estimates the extent to which a social media post promotes anti-democratic attitudes, and test this democratic attitude model across three studies. In Study 1, we first test the attitudinal and behavioral effectiveness of the intervention among US partisans (N=1,380) by manually annotating (alpha=.895) social media posts with anti-democratic attitude scores and testing several feed ranking conditions based on these scores. Removal (d=.20) and downranking feeds (d=.25) reduced participants' partisan animosity without compromising their experience and engagement. In Study 2, we scale up the manual labels by creating the democratic attitude model, finding strong agreement with manual labels (rho=.75). Finally, in Study 3, we replicate Study 1 using the democratic attitude model instead of manual labels to test its attitudinal and behavioral impact (N=558), and again find that the feed downranking using the societal objective function reduced partisan animosity (d=.25). This method presents a novel strategy to draw on social science theory and methods to mitigate societal harms in social media AIs.	翻訳日:2023-07-27 13:46:56 公開日:2023-07-26
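The two interventions tested in Study 1, removal and downranking, can be sketched as post-processing on an existing feed ranking once each post carries an anti-democratic-attitude score (from manual labels or from the democratic attitude model). The scores, the threshold and the penalty weight below are hypothetical; the paper's exact ranking conditions and score scale are not reproduced.

```python
def rerank(posts, mode="downrank", threshold=0.5, penalty=1.0):
    """Re-rank a feed using a societal score.

    posts: list of (post_id, engagement_score, anti_democratic_score), scores in [0, 1]
    mode "remove": drop posts whose anti-democratic score exceeds the threshold,
                   then order the rest by engagement.
    mode "downrank": order by engagement minus penalty * anti-democratic score.
    """
    if mode == "remove":
        kept = [p for p in posts if p[2] <= threshold]
        return [p[0] for p in sorted(kept, key=lambda p: p[1], reverse=True)]
    if mode == "downrank":
        return [p[0] for p in sorted(posts, key=lambda p: p[1] - penalty * p[2], reverse=True)]
    raise ValueError(f"unknown mode: {mode}")
```

Downranking keeps every post in the feed but moves high-scoring ones lower, whereas removal hides them entirely; the `penalty` weight is where a societal objective trades off against the engagement objective.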
# 粒子破砕強度予測のためのグラフニューラルネットワークベースのハイブリッドフレームワーク Graph Neural Networks-based Hybrid Framework For Predicting Particle Crushing Strength ( http://arxiv.org/abs/2307.13909v1 ) ライセンス: Link先を確認	Tongya Zheng, Tianli Zhang, Qingzheng Guan, Wenjie Huang, Zunlei Feng, Mingli Song, Chun Chen	(参考訳) グラフニューラルネットワークは、異なる実体間の非ユークリッド的な関係をモデル化できるため、医薬品分子の分類や化学反応予測のような学際的タスクに有効な機械学習ツールとして登場した。土木工学の重要な分野である粒子破砕は、数値シミュレーションのモデル化のもとで粒子断片間の結合の破壊によって生じる粒状材料の破壊を記述するものであり、これが粒子断片の接続性を通じてグラフニューラルネットワーク(GNN)で粒子破砕の力学的挙動を特徴づける動機となった。しかし、室内試験や数値シミュレーションのコストが高いため、研究用のオープンソースの大規模粒子破砕データセットが存在しない。そこでまず、45,000件の数値シミュレーションと900種類の粒子からなるデータセットを生成し、粒子破砕のための機械学習研究の進展を促進する。次に、最先端のGNNの進歩を活かし、粒子断片の視点から粒子破砕強度を予測するGNNベースのハイブリッドフレームワークを考案する。最後に、従来の機械学習手法および素朴なMLPとハイブリッドフレームワークを比較し、その有効性を検証する。さらに、予測値に対する勾配帰属による説明を通じて、異なる特徴の有用性を議論する。データとコードはhttps://github.com/doujiang-zheng/GNN-For-Particle-Crushingで公開している。 Graph Neural Networks have emerged as an effective machine learning tool for multi-disciplinary tasks such as pharmaceutical molecule classification and chemical reaction prediction, because they can model non-euclidean relationships between different entities. Particle crushing, as a significant field of civil engineering, describes the breakage of granular materials caused by the breakage of particle fragment bonds under the modeling of numerical simulations, which motivates us to characterize the mechanical behaviors of particle crushing through the connectivity of particle fragments with Graph Neural Networks (GNNs). However, there is no open-source large-scale particle crushing dataset for research due to the expensive costs of laboratory tests or numerical simulations. Therefore, we firstly generate a dataset with 45,000 numerical simulations and 900 particle types to facilitate the research progress of machine learning for particle crushing. Secondly, we devise a hybrid framework based on GNNs to predict particle crushing strength in a particle fragment view with the advances of state of the art GNNs. Finally, we compare our hybrid framework against traditional machine learning methods and the plain MLP to verify its effectiveness. The usefulness of different features is further discussed through the gradient attribution explanation w.r.t. the predictions. Our data and code are released at https://github.com/doujiang-zheng/GNN-For-Particle-Crushing.	翻訳日:2023-07-27 13:46:19 公開日:2023-07-26
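Treating fragments as nodes and bonds as edges, a GNN predicts a particle-level quantity by aggregating neighbour features and then pooling over all nodes. The sketch below shows one round of mean-aggregation message passing followed by a sum readout to a scalar; the feature sizes, fixed weights and tanh update are hypothetical stand-ins for a trained model, not the paper's hybrid architecture.

```python
import math


def message_passing_readout(features, edges, w_self, w_neigh, w_out):
    """One message-passing layer plus a sum readout to a scalar prediction.

    features: per-fragment feature vectors (e.g. size, position)
    edges: undirected (i, j) bonds between fragments
    w_self, w_neigh: weight matrices (lists of rows) for own / mean-neighbour features
    w_out: readout weight vector
    """
    n = len(features)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dim = len(features[0])
    hidden = []
    for i in range(n):
        if neighbours[i]:
            agg = [sum(features[j][k] for j in neighbours[i]) / len(neighbours[i]) for k in range(dim)]
        else:
            agg = [0.0] * dim
        pre = [
            sum(ws * f for ws, f in zip(row_s, features[i])) + sum(wn * a for wn, a in zip(row_n, agg))
            for row_s, row_n in zip(w_self, w_neigh)
        ]
        hidden.append([math.tanh(v) for v in pre])
    # Summing over nodes makes the prediction independent of fragment numbering.
    pooled = [sum(h[k] for h in hidden) for k in range(len(hidden[0]))]
    return sum(w * p for w, p in zip(w_out, pooled))
```

The permutation-invariant readout matters here because fragment indices produced by a simulation are arbitrary; relabelling the fragments of the same particle must not change its predicted crushing strength.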
# 音声と顔の相関を再考する: 幾何学的視点 Rethinking Voice-Face Correlation: A Geometry View ( http://arxiv.org/abs/2307.13948v1 ) ライセンス: Link先を確認	Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj	(参考訳) 音声と顔のマッチングと音声誘導顔合成に関するこれまでの研究は、声と顔の間に強い相関関係を示すが、主に性別、年齢、感情などの粗い意味的手がかりに依存している。本稿では,音声から3次元顔形状を再構成する能力について,意味情報を用いずに幾何学的視点から検討する。音声から予測可能な顔の人体計測値(AM)を識別し,それを用いて3次元顔再構成を誘導する音声-人体計測値(AM)-顔パラダイムを提案する。音声と顔の形状をリンクするプロキシとしてAMを活用することで、予測不可能なAMの影響を排除し、顔の形状を扱いやすくする。提案手法は,真値の3次元顔スキャンと対応する音声記録からなる我々のデータセット上で評価し,鼻腔や頭蓋などの顔形状の特定の部分と音声との有意な相関を見出した。本研究は, 音声と顔の相関に関する新しい視点を提供し, 人体計測学の優れた実証研究として機能しうる。 Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion. In this paper, we aim to investigate the capability of reconstructing the 3D facial shape from voice from a geometry perspective without any semantic information. We propose a voice-anthropometric measurement (AM)-face paradigm, which identifies predictable facial AMs from the voice and uses them to guide 3D face reconstruction. By leveraging AMs as a proxy to link the voice and face geometry, we can eliminate the influence of unpredictable AMs and make the face geometry tractable. Our approach is evaluated on our proposed dataset with ground-truth 3D face scans and corresponding voice recordings, and we find significant correlations between voice and specific parts of the face geometry, such as the nasal cavity and cranium. Our work offers a new perspective on voice-face correlation and can serve as a good empirical study for anthropometry science.	翻訳日:2023-07-27 13:39:23 公開日:2023-07-26
# 病理画像における癌グレーディングのためのセントロイド認識特徴再較正 Centroid-aware feature recalibration for cancer grading in pathology images ( http://arxiv.org/abs/2307.13947v1 ) ライセンス: Link先を確認	Jaeung Lee, Keunho Byeon, and Jin Tae Kwak	(参考訳) がんのグレーディングは病理学において重要な課題である。計算病理学における人工ニューラルネットワークの最近の発展は、これらの手法ががん診断の精度と質を向上させる大きな可能性を持つことを示している。しかし、こうした手法の頑健性と信頼性に関する問題はまだ完全には解決されていない。本稿では、がんのグレーディングを高精度かつ頑健に行えるセントロイド認識特徴再較正ネットワークを提案する。提案ネットワークは入力病理画像を埋め込み空間に写像し、注意機構を介して異なるがんグレードのセントロイド埋め込みベクトルを用いてそれを調整する。再較正された埋め込みベクトルを用いて、提案ネットワークは入力病理画像を適切なクラスラベル、すなわちがんグレードに分類する。異なる環境で収集された大腸がんデータセットを用いて提案ネットワークを評価した。実験結果により、提案ネットワークはデータセットの環境変化にかかわらず、病理画像のがんグレーディングを高精度に行えることが確認された。 Cancer grading is an essential task in pathology. The recent developments of artificial neural networks in computational pathology have shown that these methods hold great potential for improving the accuracy and quality of cancer diagnosis. However, the issues with the robustness and reliability of such methods have not been fully resolved yet. Herein, we propose a centroid-aware feature recalibration network that can conduct cancer grading in an accurate and robust manner. The proposed network maps an input pathology image into an embedding space and adjusts it by using centroid embedding vectors of different cancer grades via attention mechanism. Equipped with the recalibrated embedding vector, the proposed network classifies the input pathology image into a pertinent class label, i.e., cancer grade. We evaluate the proposed network using colorectal cancer datasets that were collected under different environments. The experimental results confirm that the proposed network is able to conduct cancer grading in pathology images with high accuracy regardless of the environmental changes in the datasets.	翻訳日:2023-07-27 13:38:52 公開日:2023-07-26
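One natural reading of "adjusting an embedding with class-centroid vectors via attention" is scaled dot-product attention in which the image embedding is the query and the per-grade centroids act as keys and values, with the attended centroid mixture added back to the embedding. The sketch follows that reading; the residual addition, the single attention head and the centroid values are assumptions, not the paper's exact recalibration layer.

```python
import math


def recalibrate(embedding, centroids):
    """Attend from an embedding to class centroids and add the attended mixture.

    embedding: feature vector of the input image
    centroids: one vector per cancer grade (e.g. running means of training embeddings)
    Returns the recalibrated embedding and the attention weights over grades.
    """
    d = len(embedding)
    logits = [sum(e * c for e, c in zip(embedding, cen)) / math.sqrt(d) for cen in centroids]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    weights = [x / total for x in exps]
    mixture = [sum(w * cen[k] for w, cen in zip(weights, centroids)) for k in range(d)]
    return [e + mix for e, mix in zip(embedding, mixture)], weights
```

Under this reading, an embedding that lies closest to one grade's centroid is pulled further towards it, which is one way such a layer could make classification less sensitive to shifts in staining or scanner conditions between datasets.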
# Learning-based Control for PMSM Using Distributed Gaussian Processes with Optimal Aggregation Strategy ( http://arxiv.org/abs/2307.13945v1 ) License: see link	Zhenxiao Yin, Xiaobing Dai, Zewen Yang, Yang Shen, Georges Hattab, Hang Zhao	The growing demand for accurate control in varying and unknown environments has sparked a corresponding increase in the requirements for power supply components, including permanent magnet synchronous motors (PMSMs). To infer the unknown part of the system, machine learning techniques are widely employed, especially Gaussian process regression (GPR), owing to its flexibility in continuous system modeling and its guaranteed performance. For practical implementation, distributed GPR is adopted to alleviate the high computational complexity. However, the study of distributed GPR from a control perspective remains an open problem. In this paper, a control-aware optimal aggregation strategy of distributed GPR for PMSMs is proposed based on Lyapunov stability theory. This strategy exclusively leverages the posterior mean, thereby obviating the computationally intensive calculations associated with the posterior variance in alternative approaches. Moreover, the straightforward calculation process of the proposed strategy lends itself to seamless implementation in high-frequency PMSM control. The effectiveness of the proposed strategy is demonstrated in simulations.	Translated: 2023-07-27 13:38:09 Published: 2023-07-26
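To illustrate why a mean-only aggregation is cheap, here is a minimal sketch of distributed GPR. Each local expert stores a solved weight vector, so a prediction is one kernel-vector product, and the experts' posterior means are combined with weights. The aggregation weights here are placeholders supplied by the caller. They are not the Lyapunov-derived optimal weights of the paper, which the abstract does not give. The RBF kernel and the class and function names are assumptions for illustration.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * sq / lengthscale**2)


class LocalGP:
    """One GP expert trained on a local subset of the data (hypothetical)."""

    def __init__(self, X, y, noise_var=1e-2, lengthscale=1.0):
        self.X, self.ls = X, lengthscale
        K = rbf(X, X, lengthscale) + noise_var * np.eye(len(X))
        # Solve once offline so each online prediction is a matrix-vector product.
        self.alpha = np.linalg.solve(K, y)

    def mean(self, Xs):
        # Posterior mean only; the posterior variance is never computed.
        return rbf(Xs, self.X, self.ls) @ self.alpha


def aggregate_mean(experts, Xs, weights):
    """Weighted sum of expert posterior means; weights has shape (M,) or (M, n)."""
    means = np.stack([e.mean(Xs) for e in experts])  # (M, n)
    w = np.asarray(weights, dtype=float)
    if w.ndim == 1:
        w = w[:, None]
    w = w / w.sum(axis=0, keepdims=True)  # normalize over experts
    return (w * means).sum(axis=0)
```

In a control loop, `aggregate_mean` would be called at every sampling instant. Avoiding the variance computation removes one extra kernel solve per expert.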
# Entropy Neural Estimation for Graph Contrastive Learning ( http://arxiv.org/abs/2307.13944v1 ) License: see link	Yixuan Ma, Xiaolin Zhang, Peng Zhang, Kun Zhan	Contrastive learning on graphs aims at extracting distinguishable high-level representations of nodes. In this paper, we theoretically illustrate that the entropy of a dataset can be approximated by maximizing the lower bound of the mutual information across different views of a graph, i.e., the entropy is estimated by a neural network. Based on this finding, we propose a simple yet effective subset sampling strategy to contrast pairwise representations between views of a dataset. In particular, we randomly sample nodes and edges from a given graph to build the input subset for a view. Two views are fed into a parameter-shared Siamese network to extract the high-dimensional embeddings and estimate the information entropy of the entire graph. For the learning process, we propose to optimize the network using two objectives simultaneously. Concretely, the input of the contrastive loss function consists of positive and negative pairs. Our pair-selection strategy differs from previous works: we present a novel strategy to enhance the representation ability of the graph encoder by selecting nodes based on cross-view similarities.
We enrich the diversity of the positive and negative pairs by selecting highly similar samples and totally different data, respectively, with the guidance of cross-view similarity scores. We also introduce a cross-view consistency constraint on the representations generated from the different views. This objective guarantees that the learned representations are consistent across views from the perspective of the entire graph. We conduct extensive experiments on seven graph benchmarks, and the proposed approach achieves competitive performance compared to the current state-of-the-art methods. The source code will be publicly released once this paper is accepted.	Translated: 2023-07-27 13:37:47 Published: 2023-07-26
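The mutual-information lower bound that contrastive methods maximize is commonly implemented as the InfoNCE loss. The sketch below shows that standard loss for two views. It assumes row i of each view is the same node, so positives lie on the diagonal and all other nodes serve as negatives. It does not implement the paper's similarity-guided pair selection or its cross-view consistency term, and the function name and temperature value are illustrative.

```python
import numpy as np


def info_nce(z1, z2, tau=0.5):
    """Standard InfoNCE loss between two views of the same n nodes.

    z1, z2: (n, d) embeddings; row i in both views is the same node.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau  # (n, n) cross-view cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; every other node in view 2 is a negative.
    return -np.mean(np.diag(log_prob))
```

With perfectly aligned one-hot embeddings for four nodes and `tau=0.5`, the loss is log(1 + 3e^-2), about 0.341. Shuffling the second view's rows raises it to about 2.341.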
# Topology-aware Robust Optimization for Out-of-distribution Generalization ( http://arxiv.org/abs/2307.13943v1 ) License: see link	Fengchun Qiao, Xi Peng	Out-of-distribution (OOD) generalization is a challenging machine learning problem, yet highly desirable in many high-stakes applications. Existing methods suffer from overly pessimistic modeling with low generalization confidence. As generalizing to arbitrary test distributions is impossible, we hypothesize that further structure on the topology of distributions is crucial in developing strong OOD resilience. To this end, we propose topology-aware robust optimization (TRO), which seamlessly integrates distributional topology in a principled optimization framework. More specifically, TRO solves two optimization objectives: (1) Topology Learning, which explores the data manifold to uncover the distributional topology; (2) Learning on Topology, which exploits the topology to constrain robust optimization for tightly-bounded generalization risks. We theoretically demonstrate the effectiveness of our approach and empirically show that it significantly outperforms the state of the art in a wide range of tasks including classification, regression, and semantic segmentation. Moreover, we empirically find that the data-driven distributional topology is consistent with domain knowledge, enhancing the explainability of our approach.	Translated: 2023-07-27 13:37:24 Published: 2023-07-26
# Improving Semi-Supervised Semantic Segmentation with Dual-Level Siamese Structure Network ( http://arxiv.org/abs/2307.13938v1 ) License: see link	Zhibo Tain, Xiaolin Zhang, Peng Zhang, Kun Zhan	Semi-supervised semantic segmentation (SSS) is an important task that utilizes both labeled and unlabeled data to reduce the cost of labeling training examples. However, the effectiveness of SSS algorithms is limited by the difficulty of fully exploiting the potential of unlabeled data. To address this, we propose a dual-level Siamese structure network (DSSN) for pixel-wise contrastive learning. By aligning positive pairs with a pixel-wise contrastive loss using strong augmented views in both low-level image space and high-level feature space, the proposed DSSN is designed to maximize the utilization of available unlabeled data. Additionally, we introduce a novel class-aware pseudo-label selection strategy for weak-to-strong supervision, which addresses the limitations of most existing methods that either do not perform selection or apply a predefined threshold for all classes. Specifically, our strategy selects the top high-confidence predictions of the weak view for each class to generate pseudo labels that supervise the strong augmented views.
This strategy takes class imbalance into account and improves the performance of long-tailed classes. Our proposed method achieves state-of-the-art results on two datasets, PASCAL VOC 2012 and Cityscapes, outperforming other SSS algorithms by a significant margin.	Translated: 2023-07-27 13:37:03 Published: 2023-07-26
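The class-aware selection can be contrasted with a single global threshold. This sketch ranks weak-view confidences separately within each predicted class and keeps the top fraction of each. A rare class with lower confidences still contributes pseudo labels. The fraction `top_ratio` and the function name are assumptions; the abstract does not say how many predictions per class are kept.

```python
import numpy as np


def class_aware_select(probs, top_ratio=0.5):
    """Keep the most confident pixels separately for each predicted class.

    probs: (N, C) softmax outputs of the weak view for N pixels.
    Returns (labels, mask): argmax pseudo labels and a boolean mask of the
    pixels whose pseudo labels supervise the strong view.
    """
    labels = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    mask = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = max(1, int(np.ceil(top_ratio * len(idx))))
        # Rank only within class c, so minority classes are not crowded out.
        top = idx[np.argsort(conf[idx])[::-1][:k]]
        mask[top] = True
    return labels, mask
```

For four pixels with confidences 0.9, 0.8, and 0.7 for class 0 and 0.6 for class 1, a global threshold of 0.75 would drop the only class-1 pixel. The per-class ranking keeps it.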
# Multiparameter estimation with two qubit probes in noisy channels ( http://arxiv.org/abs/2307.13936v1 ) License: see link	Lorcan. O. Conlon, Ping Koy Lam and Syed. M. Assad	This work compares the performance of single and two qubit probes for estimating several phase rotations simultaneously under the action of different noisy channels. We compute the quantum limits for this simultaneous estimation using collective and individual measurements by evaluating the Holevo and Nagaoka-Hayashi Cramér-Rao bounds, respectively. Several quantum noise channels are considered, namely the decohering channel, the amplitude damping channel and the phase damping channel. For each channel we find the optimal single and two qubit probes. Where possible, we demonstrate an explicit measurement strategy which saturates the appropriate bound, and we investigate how closely the Holevo bound can be approached through collective measurements on multiple copies of the same probe. We find that under the action of the considered channels, two qubit probes show enhanced parameter estimation capabilities over single qubit probes for almost all non-identity channels, i.e., the achievable precision with a single qubit probe degrades faster with increasing exposure to the noisy environment than that of the two qubit probe.
However, in sufficiently noisy channels, we show that it is possible for single qubit probes to outperform maximally entangled two qubit probes. This work shows that, in order to reach the ultimate precision limits allowed by quantum mechanics, entanglement is required in both the state preparation and state measurement stages. It is hoped that the tutorial-style nature of this paper will make it easily accessible.	Translated: 2023-07-27 13:36:39 Published: 2023-07-26
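Two of the channels named above have standard Kraus forms. The sketch writes them out and applies them to a single-qubit state and to a Bell state, with the same channel acting on each qubit. Applying the channel to both qubits of the two-qubit probe is an assumption for illustration; the paper may apply it differently. The "decohering channel" is omitted because the abstract does not define it. Function names are hypothetical.

```python
import numpy as np


def amplitude_damping(gamma):
    """Standard Kraus operators of the amplitude damping channel."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [K0, K1]


def phase_damping(lam):
    """Standard Kraus operators of the phase damping channel."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - lam)]], dtype=complex)
    K1 = np.array([[0, 0], [0, np.sqrt(lam)]], dtype=complex)
    return [K0, K1]


def apply_channel(kraus, rho):
    """rho -> sum_k K rho K^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)


def apply_local(kraus, rho2):
    """Apply the same single-qubit channel independently to both qubits."""
    ops = [np.kron(A, B) for A in kraus for B in kraus]
    return apply_channel(ops, rho2)
```

Phase damping with strength `lam` shrinks the off-diagonal coherence of |+> from 0.5 to 0.5 * sqrt(1 - lam). That coherence is exactly what carries the phase information a probe is meant to estimate.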
# AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception ( http://arxiv.org/abs/2307.13933v1 ) License: see link	Dingkang Yang, Shuai Huang, Zhi Xu, Zhenpeng Li, Shunli Wang, Mingcheng Li, Yuzheng Wang, Yang Liu, Kun Yang, Zhaoyu Chen, Yan Wang, Jing Liu, Peixuan Zhang, Peng Zhai, Lihua Zhang	Driver distraction has become a significant cause of severe traffic accidents over the past decade. Despite the growing development of vision-driven driver monitoring systems, the lack of comprehensive perception datasets restricts progress on road safety and traffic security. In this paper, we present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle in naturalistic scenarios. AIDE facilitates holistic driver monitoring through three distinctive characteristics: multi-view settings of driver and scene, multi-modal annotations of face, body, posture, and gesture, and four pragmatic task designs for driving understanding. To thoroughly explore AIDE, we provide experimental benchmarks on three kinds of baseline frameworks via extensive methods. Moreover, two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations. We also systematically investigate the importance and rationality of the key components in AIDE and benchmarks.
The project link is https://github.com/ydk122024/AIDE.	Translated: 2023-07-27 13:36:15 Published: 2023-07-26
# Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception ( http://arxiv.org/abs/2307.13929v1 ) License: see link	Kun Yang, Dingkang Yang, Jingyu Zhang, Mingcheng Li, Yang Liu, Jing Liu, Hanqi Wang, Peng Sun, Liang Song	Multi-agent collaborative perception, as a potential application of vehicle-to-everything communication, could significantly improve the perception performance of autonomous vehicles over single-agent perception. However, several challenges remain in achieving pragmatic information sharing in this emerging research area. In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner. Specifically, SCOPE has three distinct strengths: i) it considers effective semantic cues of the temporal context to enhance current representations of the target agent; ii) it aggregates perceptually critical spatial information from heterogeneous agents and overcomes localization errors via multi-scale feature interactions; iii) it integrates multi-source representations of the target agent based on their complementary contributions by an adaptive fusion paradigm. To thoroughly evaluate SCOPE, we consider both real-world and simulated scenarios of collaborative 3D object detection tasks on three datasets. Extensive experiments demonstrate the superiority of our approach and the necessity of the proposed components.	Translated: 2023-07-27 13:35:56 Published: 2023-07-26
# DFR-Net: Density Feature Refinement Network for Image Dehazing Utilizing Haze Density Difference ( http://arxiv.org/abs/2307.13927v1 ) License: see link	Zhongze Wang, Haitao Zhao, Lujian Yao, Jingchao Peng, Kaijie Zhao	In the image dehazing task, haze density is a key feature that affects the performance of dehazing methods. However, some existing methods lack a comparative image to measure densities, while others create intermediate results but do not exploit their density differences, which can facilitate the perception of density. To address these deficiencies, we propose a density-aware dehazing method named Density Feature Refinement Network (DFR-Net) that extracts haze density features from density differences and leverages density differences to refine density features. In DFR-Net, we first generate a proposal image that has lower overall density than the hazy input, bringing in global density differences. Additionally, the dehazing residual of the proposal image reflects the level of dehazing performance and provides local density differences that indicate localized hard-to-dehaze or high-density areas. Subsequently, we introduce a Global Branch (GB) and a Local Branch (LB) to achieve density-awareness.
In GB, we use Siamese networks for feature extraction of hazy inputs and proposal images, and we propose a Global Density Feature Refinement (GDFR) module that refines features by pushing features with different global densities further apart. In LB, we explore local density features from the dehazing residuals between hazy inputs and proposal images and introduce an Intermediate Dehazing Residual Feedforward (IDRF) module to update local features and pull them closer to clear image features. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods on various datasets.	Translated: 2023-07-27 13:35:41 Published: 2023-07-26
# Enhanced Security against Adversarial Examples Using a Random Ensemble of Encrypted Vision Transformer Models ( http://arxiv.org/abs/2307.13985v1 ) License: see link	Ryota Iijima, Miki Tanaka, Sayaka Shiota, Hitoshi Kiya	Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). In addition, AEs have adversarial transferability, which means AEs generated for a source model can fool another black-box model (target model) with a non-trivial probability. In previous studies, it was confirmed that the vision transformer (ViT) is more robust against adversarial transferability than convolutional neural network (CNN) models such as ConvMixer, and moreover that encrypted ViT is more robust than ViT without any encryption. In this article, we propose a random ensemble of encrypted ViT models to achieve much more robust models. In experiments, the proposed scheme is verified to be more robust than conventional methods against not only black-box attacks but also white-box ones.	Translated: 2023-07-27 13:29:49 Published: 2023-07-26
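The sketch below illustrates the two ingredients named in the title: key-based encryption aligned with ViT patches, and a random choice of model per query. It assumes the encryption is block-wise pixel shuffling with a secret key. This is one of several block-wise transforms used in this line of work; the abstract does not specify which one is used. It also assumes each ensemble member was trained on images encrypted with its own key. All names are hypothetical, and `models` are stand-in callables.

```python
import numpy as np


def block_shuffle(img, block, key):
    """Hypothetical block-wise pixel shuffling with a secret key.

    The same key-derived permutation is applied inside every
    block x block patch, matching the patch structure of a ViT.
    img: (H, W, C) with H and W divisible by block.
    """
    H, W, C = img.shape
    perm = np.random.default_rng(key).permutation(block * block * C)
    patches = img.reshape(H // block, block, W // block, block, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(H // block, W // block, -1)
    patches = patches[..., perm]  # shuffle values within each patch
    out = patches.reshape(H // block, W // block, block, block, C)
    return out.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def random_ensemble_predict(img, models, keys, block, rng):
    """Per query, pick one (model, key) pair at random and use it."""
    i = rng.integers(len(models))
    return models[i](block_shuffle(img, block, keys[i]))
```

Because an attacker cannot know which key and model will answer a given query, gradients or transfer examples crafted against one member are less likely to match the member that is actually used.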
# Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models ( http://arxiv.org/abs/2307.13981v1 ) License: see link	Wei Sun and Wen Wen and Xiongkuo Min and Long Lan and Guangtao Zhai and Kede Ma	Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, BVQA has measured model improvements primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we mean that our family of BVQA models is built only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations.
By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy-dataset problem to varying degrees, and some even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by contrasting the generalizability of our models across these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices for constructing next-generation VQA datasets and models.	Translated: 2023-07-27 13:29:35 Published: 2023-07-26
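The four basic blocks listed above can be wired together in a few lines. This sketch assumes the preprocessor keeps roughly one frame per second and takes every fourth pixel. It also assumes temporal average pooling stands in for the optional temporal analyzer. The spatial analyzer and regressor are caller-supplied placeholders, not the models used in the paper.

```python
import numpy as np


def aggressive_downsample(video, fps, frames_per_second=1, stride=4):
    """Keep about frames_per_second frames per second and every stride-th pixel.

    video: (T, H, W, C) array.
    """
    step = max(1, int(round(fps / frames_per_second)))
    return video[::step, ::stride, ::stride]


def minimal_bvqa(video, fps, spatial_analyzer, regressor):
    """Preprocess, analyze each kept frame, pool over time, then regress."""
    frames = aggressive_downsample(video, fps)
    feats = np.stack([spatial_analyzer(f) for f in frames])  # (T', D)
    # Temporal average pooling stands in for the optional temporal analyzer.
    return float(regressor(feats.mean(axis=0)))
```

If a model this crude already predicts human ratings well on a dataset, the dataset rewards spatial, per-frame cues more than temporal modeling. That is the kind of "easy dataset" signal the analysis looks for.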
# Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation ( http://arxiv.org/abs/2307.13978v1 ) License: see link	Mahyar Abbasian, Taha Rajabzadeh, Ahmadreza Moradipari, Seyed Amir Hossein Aqajari, Hongsheng Lu, Amir Rahmani	Generative Adversarial Networks (GANs) have emerged as a formidable AI tool to generate realistic outputs based on training datasets. However, exerting control over the generation process of GANs remains a significant hurdle. In this paper, we propose a novel methodology to address this issue by integrating a reinforcement learning (RL) agent with a latent-space GAN (l-GAN), thereby facilitating the generation of desired outputs. More specifically, we have developed an actor-critic RL agent with a meticulously designed reward policy, enabling it to acquire proficiency in navigating the latent space of the l-GAN and generating outputs based on specified tasks. To substantiate the efficacy of our approach, we have conducted a series of experiments employing the MNIST dataset, including arithmetic addition as an illustrative task. The outcomes of these experiments serve to validate our methodology. Our pioneering integration of an RL agent with a GAN model represents a novel advancement, holding great potential for enhancing generative networks in the future.	Translated: 2023-07-27 13:29:09 Published: 2023-07-26
# Tracking Anything in High Quality ( http://arxiv.org/abs/2307.13974v1 ) License: see link	Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li	Visual object tracking is a fundamental video task in computer vision. Recently, the rapidly increasing power of perception algorithms has allowed the unification of single/multi-object and box/mask-based tracking. Among these algorithms, the Segment Anything Model (SAM) has attracted much attention. In this report, we propose HQTrack, a framework for High Quality Tracking anything in videos. HQTrack mainly consists of a video multi-object segmenter (VMOS) and a mask refiner (MR). Given the object to be tracked in the initial frame of a video, VMOS propagates the object masks to the current frame. The mask results at this stage are not accurate enough, since VMOS is trained on several close-set video object segmentation (VOS) datasets, which limits its ability to generalize to complex and corner scenes. To further improve the quality of tracking masks, a pretrained MR model is employed to refine the tracking results.
As a compelling testament to the effectiveness of our paradigm, without employing any tricks such as test-time data augmentation or model ensembles, HQTrack ranks 2nd in the Visual Object Tracking and Segmentation (VOTS2023) challenge. Code and models are available at https://github.com/jiawen-zhu/HQTrack.	Translated: 2023-07-27 13:28:51 Published: 2023-07-26
# Understanding Deep Neural Networks via Linear Separability of Hidden Layers ( http://arxiv.org/abs/2307.13962v1 ) License: see link	Chao Zhang, Xinyu Chen, Wensheng Li, Lixue Liu, Wei Wu, Dacheng Tao	In this paper, we measure the linear separability of hidden layer outputs to study the characteristics of deep neural networks. In particular, we first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the degree of linear separability of two point sets. Then, we demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance, i.e., if the updated weights enhance the linear separability degree of hidden layer outputs, the updated network achieves better training performance, and vice versa. Moreover, we study the effect of the activation function and network size (including width and depth) on the linear separability of hidden layers. Finally, we conduct numerical experiments to validate our findings on several popular deep networks, including the multilayer perceptron (MLP), convolutional neural network (CNN), deep belief network (DBN), ResNet, VGGNet, AlexNet, vision transformer (ViT) and GoogLeNet.	Translated: 2023-07-27 13:28:30 Published: 2023-07-26
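The Minkowski difference links separability of two finite sets A and B to a single-set condition. A and B are strictly linearly separable if and only if some direction w gives w · (a − b) > 0 for every pair. In that case min over A of w · a exceeds max over B of w · b, so a threshold exists. The sketch searches for such a w with a bias-free perceptron on the difference vectors. It then reports the fraction of differences on the positive side as a hypothetical separability score. This proxy is an assumption for illustration and is not the MD-LSM definition from the paper.

```python
import numpy as np


def minkowski_difference(A, B):
    """All pairwise differences a - b for a in A (n, d) and b in B (m, d)."""
    return (A[:, None, :] - B[None, :, :]).reshape(-1, A.shape[1])


def md_separability(A, B, epochs=100):
    """Hypothetical proxy score in [0, 1]; 1.0 means a separating w was found."""
    D = minkowski_difference(A, B)
    w = np.zeros(D.shape[1])
    for _ in range(epochs):
        updated = False
        for d in D:
            if w @ d <= 0:  # difference not yet strictly on the positive side
                w = w + d
                updated = True
        if not updated:  # every difference satisfies w . d > 0
            break
    return float(np.mean(D @ w > 0))
```

Applied to a hidden layer, A and B would be that layer's outputs for two classes. A rising score across training steps would mirror the synchronicity the abstract describes.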
# Decoherence of a tunable capacitively shunted flux qubit ( http://arxiv.org/abs/2307.13961v1 ) License: see link	R. Trappen, X. Dai, M. A. Yurtalan, D. Melanson, D. M. Tennant, A. J. Martinez, Y. Tang, J. Gibson, J. A. Grover, S. M. Disseler, J. I. Basham, R. Das, D. K. Kim, A. J. Melville, B. M. Niedzielski, C. F. Hirjibehedin, K. Serniak, S. J. Weber, J. L. Yoder, W. D. Oliver, D. A. Lidar, A. Lupascu	We present a detailed study of the coherence of a tunable capacitively-shunted flux qubit, designed for coherent quantum annealing applications. The measured relaxation at the qubit symmetry point is mainly due to intrinsic flux noise in the main qubit loop for qubit frequencies below ~3 GHz. At higher frequencies, thermal noise in the bias line makes a significant contribution to the relaxation, arising from the design choice to experimentally explore both fast annealing and high-frequency control. The measured dephasing rate is primarily due to intrinsic low-frequency flux noise in the two qubit loops, with an additional contribution from the low-frequency noise of the control electronics used for fast annealing. The flux-bias dependence of the dephasing time also reveals an apparent noise correlation between the two qubit loops, possibly due to non-local sources of flux noise or junction critical-current noise. Our results are relevant for ongoing efforts toward building superconducting quantum annealers with increased coherence.	Translated: 2023-07-27 13:28:08 Published: 2023-07-26
# Visual Prompt Flexible-Modal Face Anti-Spoofing ( http://arxiv.org/abs/2307.13958v1 ) License: see link	Zitong Yu, Rizhao Cai, Yawen Cui, Ajian Liu and Changsheng Chen	Recently, vision transformer based multimodal learning methods have been proposed to improve the robustness of face anti-spoofing (FAS) systems. However, multimodal face data collected from the real world is often imperfect due to missing modalities from various imaging sensors. Flexible-modal FAS [yu2023flexible] has recently attracted more attention; it aims to develop a unified multimodal FAS model that is trained on complete multimodal face data but is insensitive to missing modalities at test time. In this paper, we tackle one main challenge in flexible-modal FAS, i.e., missing modalities that occur either during training or during testing in real-world situations. Inspired by the recent success of prompt learning in language models, we propose Visual Prompt flexible-modal FAS (VP-FAS), which learns modal-relevant prompts to adapt a frozen pre-trained foundation model to the downstream flexible-modal FAS task.
Specifically, both vanilla visual prompts and residual contextual prompts are plugged into multimodal transformers to handle general missing-modality cases, while requiring less than 4% of the learnable parameters needed to train the entire model. Furthermore, a missing-modality regularization is proposed to force models to learn consistent multimodal feature embeddings when partial modalities are missing. Extensive experiments conducted on two multimodal FAS benchmark datasets demonstrate the effectiveness of our VP-FAS framework, which improves performance under various missing-modality cases while alleviating the need for heavy model re-training.	Translated: 2023-07-27 13:27:52 Published: 2023-07-26
# Heterogeneous Embodied Multi-Agent Collaboration ( http://arxiv.org/abs/2307.13957v1 ) License: see link	Xinzhu Liu, Di Guo, Huaping Liu	Multi-agent embodied tasks have recently been studied in complex indoor visual environments. Collaboration among multiple agents can improve work efficiency and has significant practical value. However, most existing research focuses on homogeneous multi-agent tasks. Compared with homogeneous agents, heterogeneous agents can leverage their different capabilities to allocate corresponding sub-tasks and cooperate to complete complex tasks. Heterogeneous multi-agent tasks are common in real-world scenarios, and the collaboration strategy among heterogeneous agents is a challenging and important problem. To study collaboration among heterogeneous agents, we propose the heterogeneous multi-agent tidying-up task, in which multiple heterogeneous agents with different capabilities collaborate with each other to detect misplaced objects and place them in reasonable locations. This is a demanding task, since it requires agents to make the best use of their different capabilities to conduct reasonable task planning and complete the whole task.
To solve this task, we build a heterogeneous multi-agent tidying-up benchmark dataset in a large number of houses with multiple rooms based on ProcTHOR-10K. We propose a hierarchical decision model based on misplaced object detection and reasonable receptacle prediction, together with a handshake-based group communication mechanism. Extensive experiments are conducted to demonstrate the effectiveness of the proposed model. The project's website and videos of experiments can be found at https://hetercol.github.io/.	Translated: 2023-07-27 13:27:14 Published: 2023-07-26
# The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features ( http://arxiv.org/abs/2307.13953v1 ) License: see link	Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj	This work unveils the enigmatic link between phonemes and facial features. Traditional studies on voice-face correlations typically use long voice inputs, for example to generate face images from voices or to reconstruct 3D face meshes from voices. However, in situations such as voice-based crimes, the available voice evidence may be short and limited. Additionally, from a physiological perspective, each segment of speech (a phoneme) corresponds to different types of airflow and movements in the face. Therefore, it is advantageous to discover the hidden link between phonemes and face attributes. In this paper, we propose an analysis pipeline to help us explore the voice-face relationship in a fine-grained manner, i.e., phonemes vs. facial anthropometric measurements (AMs). We build an estimator for each phoneme-AM pair and evaluate the correlation through hypothesis testing. Our results indicate that AMs are more predictable from vowels than from consonants, especially plosives. Additionally, we observe that if a specific AM exhibits more movement during phoneme pronunciation, it is more predictable.
Our findings support those in physiology regarding correlation and lay the groundwork for future research on speech-face multimodal learning.	Translated: 2023-07-27 13:26:52 Published: 2023-07-26
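The abstract evaluates each phoneme-AM estimator "through hypothesis testing" without naming the test. As one hypothetical illustration, a permutation test checks whether an estimator's predicted AM values correlate with the measured AMs more strongly than chance; the function names and the choice of Pearson correlation are assumptions, not the authors' procedure.

```python
import random
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def permutation_pvalue(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation test of H0: no association between xs and ys.

    Hypothetical use: xs = predicted AM values for held-out speakers,
    ys = their measured AM values, for one phoneme-AM pair.
    """
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # the +1 correction keeps the estimated p-value strictly positive
    return (hits + 1) / (n_perm + 1)
```

Running one such test per phoneme-AM pair would also call for a multiple-comparison correction, which this sketch leaves out.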
# How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data? ( http://arxiv.org/abs/2307.13949v1 ) License: see link	Huazheng Wang, Daixuan Cheng, Haifeng Sun, Jingyu Wang, Qi Qi, Jianxin Liao, Jing Wang, Cong Liu	Transformer-based pretrained language models (PLMs) have achieved great success in modern NLP. An important advantage of PLMs is good out-of-distribution (OOD) robustness. Recently, diffusion models have attracted a lot of work applying diffusion to PLMs. However, how diffusion influences PLMs on OOD data remains under-explored. The core of diffusion models is a forward diffusion process, which gradually applies Gaussian noise to inputs, and a reverse denoising process, which removes the noise. Reconstructing noised inputs is a fundamental ability of diffusion models. We directly analyze OOD robustness by measuring the reconstruction loss, including testing the ability to reconstruct OOD data and to detect OOD samples. Experiments are conducted by analyzing different training parameters and data statistical features on eight datasets. The results show that fine-tuning PLMs with diffusion degrades the reconstruction ability on OOD data. The comparison also shows that diffusion models can effectively detect OOD samples, achieving state-of-the-art performance on most of the datasets with an absolute accuracy improvement of up to 18%. These results indicate that diffusion reduces the OOD robustness of PLMs.	Translated: 2023-07-27 13:26:34 Published: 2023-07-26
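The forward process described above is commonly implemented in the standard DDPM closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The sketch below assumes a linear β schedule and treats inputs as plain float vectors (e.g. token embeddings); the schedule values and function names are illustrative, not the paper's configuration, and the learned denoiser is left out.

```python
import math
import random

def linear_alpha_bars(T, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / max(T - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar) * x_0, (1 - alpha_bar) * I)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * rng.gauss(0.0, 1.0) for x in x0]

def reconstruction_loss(x0, x0_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x0, x0_hat)) / len(x0)
```

In an OOD analysis of this kind, a (hypothetical) denoiser would map `q_sample(x0, ab[t], rng)` back to an estimate of `x0`, and a higher `reconstruction_loss` on a sample would be read as evidence that it is out of distribution.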
# Learning Snippet-to-Motion Progression for Skeleton-based Human Motion Prediction ( http://arxiv.org/abs/2307.14006v1 ) License: see link	Xinshun Wang, Qiongjie Cui, Chen Chen, Shen Zhao, Mengyuan Liu	Existing Graph Convolutional Networks for human motion prediction largely adopt a one-step scheme that outputs the prediction directly from the history input, failing to exploit human motion patterns. We observe that human motions have transitional patterns and can be split into snippets representative of each transition. Each snippet can be reconstructed from its starting and ending poses, referred to as the transitional poses. We propose a snippet-to-motion multi-stage framework that breaks motion prediction into sub-tasks that are easier to accomplish. Each sub-task integrates three modules: transitional pose prediction, snippet reconstruction, and snippet-to-motion prediction. Specifically, we propose to first predict only the transitional poses. Then we use them to reconstruct the corresponding snippets, obtaining a close approximation to the true motion sequence. Finally, we refine them to produce the final prediction output. To implement the network, we propose a novel unified graph modeling, which allows for direct and effective feature propagation compared to existing approaches that rely on separate space-time modeling.
Extensive experiments on the Human3.6M, CMU Mocap, and 3DPW datasets verify the effectiveness of our method, which achieves state-of-the-art performance.	Translated: 2023-07-27 13:19:38 Published: 2023-07-26
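The abstract says each snippet can be reconstructed from its two transitional poses. In the paper this step is learned. The sketch below substitutes the simplest possible stand-in, linear interpolation between consecutive transitional poses, only to show how snippets chain into a full motion. The pose layout (joints × 3 coordinates) and the function names are assumptions.

```python
from typing import List

Pose = List[List[float]]  # assumed layout: one [x, y, z] list per joint

def interpolate_snippet(start: Pose, end: Pose, length: int) -> List[Pose]:
    """Linearly interpolate `length` frames from start to end pose (both included)."""
    if length < 2:
        raise ValueError("a snippet needs at least its two transitional poses")
    frames = []
    for i in range(length):
        w = i / (length - 1)
        frames.append([[(1 - w) * a + w * b for a, b in zip(ja, jb)]
                       for ja, jb in zip(start, end)])
    return frames

def snippets_to_motion(transitional: List[Pose], length: int) -> List[Pose]:
    """Chain snippets between consecutive transitional poses into one sequence."""
    motion: List[Pose] = []
    for k in range(len(transitional) - 1):
        snippet = interpolate_snippet(transitional[k], transitional[k + 1], length)
        # drop the first frame after the first snippet: it repeats the previous end pose
        motion.extend(snippet if k == 0 else snippet[1:])
    return motion
```

In the framework described above, a refinement stage would then correct this coarse sequence toward the true motion.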
# Unsupervised extraction of local and global keywords from a single text ( http://arxiv.org/abs/2307.14005v1 ) License: see link	Lida Aleksanyan and Armen E. Allahverdyan	We propose an unsupervised, corpus-independent method to extract keywords from a single text. It is based on the spatial distribution of words and the response of this distribution to a random permutation of words. Compared with existing methods (e.g., YAKE), our method has three advantages. First, it is significantly more effective at extracting keywords from long texts. Second, it allows inference of two types of keywords: local and global. Third, it uncovers basic themes in texts. Additionally, our method is language-independent and applies to short texts. The results are obtained via human annotators with previous knowledge of texts from our database of classical literary works (the agreement between annotators is moderate to substantial). Our results are supported via human-independent arguments based on the average length of extracted content words and on the average number of nouns in extracted words. We discuss relations of keywords with higher-order textual features and reveal a connection between keywords and chapter divisions.	Translated: 2023-07-27 13:19:19 Published: 2023-07-26
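The abstract does not give the exact statistic. The sketch below illustrates the general idea of comparing a word's spatial distribution with its distribution under random permutations of the text. It uses one simple burstiness measure, the coefficient of variation (CV) of the gaps between consecutive occurrences, divided by its mean over shuffled copies of the text. This measure is an assumption for illustration, not necessarily the one the authors use.

```python
import random
from statistics import fmean, pstdev

def gap_cv(positions):
    """Coefficient of variation of the gaps between consecutive occurrences."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return pstdev(gaps) / fmean(gaps)

def occurrence_cvs(tokens, min_count):
    """Map each word occurring at least min_count times to its gap CV."""
    positions = {}
    for i, word in enumerate(tokens):
        positions.setdefault(word, []).append(i)
    return {w: gap_cv(p) for w, p in positions.items() if len(p) >= min_count}

def burstiness_scores(tokens, min_count=3, n_shuffles=50, seed=0):
    """Observed gap CV divided by its mean over random permutations of the text."""
    rng = random.Random(seed)
    observed = occurrence_cvs(tokens, min_count)
    baseline = dict.fromkeys(observed, 0.0)
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # word counts are preserved, positions randomized
        cvs = occurrence_cvs(shuffled, min_count)
        for w in baseline:
            baseline[w] += cvs[w] / n_shuffles
    # > 1: burstier than chance (keyword candidate); < 1: more evenly spread
    return {w: observed[w] / baseline[w] for w in observed if baseline[w] > 0}
```

A function word spread evenly through the text scores below 1. A content word that appears in a few dense clusters scores well above 1, which makes it a keyword candidate under this assumed measure.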
# Affective Natural Language Generation of Event Descriptions through Fine-grained Appraisal Conditions ( http://arxiv.org/abs/2307.14004v1 ) License: see link	Yarik Menchaca Resendiz and Roman Klinger	Models for affective text generation have shown remarkable progress, but they commonly rely only on basic emotion theories or valence/arousal values as conditions. This is appropriate when the goal is to create explicit emotion statements ("The kid is happy."). Emotions are, however, commonly communicated implicitly. For instance, the emotional interpretation of an event ("Their dog died.") often does not require an explicit emotion statement. In psychology, appraisal theories explain the link between a cognitive evaluation of an event and the potentially developed emotion. They put the assessment of the situation in focus, for instance regarding one's own control or the responsibility for what happens. We hypothesize and subsequently show that including appraisal variables as conditions in a generation framework comes with two advantages. (1) The generation model is informed in greater detail about what makes a specific emotion and what properties it has. This leads to text generation that better fulfills the condition.
(2) The appraisal variables allow a user to exercise more fine-grained control over the generated text, by stating properties of a situation instead of only providing the emotion category. Our BART- and T5-based experiments with 7 emotions (Anger, Disgust, Fear, Guilt, Joy, Sadness, Shame) and 7 appraisals (Attention, Responsibility, Control, Circumstance, Pleasantness, Effort, Certainty) show that (1) adding appraisals during training improves the accuracy of the generated texts by 10 pp in F1. Further, (2) the texts generated with appraisal variables are longer and contain more details. This exemplifies the greater control for users.	Translated: 2023-07-27 13:19:07 Published: 2023-07-26
# Fast algorithms for k-submodular maximization subject to a matroid constraint ( http://arxiv.org/abs/2307.13996v1 ) License: see link	Shuxian Niu and Qian Liu and Yang Zhou and Min Li	In this paper, we apply a Threshold-Decreasing Algorithm to maximize $k$-submodular functions under a matroid constraint, which reduces the query complexity of the algorithm compared to the greedy algorithm with little loss in approximation ratio. We give a $(\frac{1}{2} - \epsilon)$-approximation algorithm for monotone $k$-submodular function maximization, and a $(\frac{1}{3} - \epsilon)$-approximation algorithm for the non-monotone case, with complexity $O(\frac{n(k\cdot EO + IO)}{\epsilon} \log \frac{r}{\epsilon})$, where $r$ denotes the rank of the matroid, and $IO$ and $EO$ denote the number of oracles to evaluate whether a subset is an independent set and to compute the function value of $f$, respectively. Since a total size constraint can be viewed as a special matroid, called the uniform matroid, we present fast algorithms for maximizing $k$-submodular functions subject to a total size constraint as corollaries.	Translated: 2023-07-27 13:18:35 Published: 2023-07-26
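As a rough illustration of the threshold-decreasing idea (not the paper's exact algorithm or its analysis), the sketch below handles only the monotone case under a uniform matroid, i.e. a total size budget `r`. Elements are added to the best of the `k` types whenever their marginal gain clears a threshold `tau`, and `tau` shrinks by a factor `(1 - eps)` per pass. It uses only value-oracle calls to `f` and no full greedy scan per step. The dict-based solution encoding and the stopping threshold `eps * d / r` are assumptions.

```python
from typing import Callable, Dict, Hashable, Iterable

Assignment = Dict[Hashable, int]  # element -> type index in range(k)

def threshold_decreasing(ground: Iterable[Hashable], k: int,
                         f: Callable[[Assignment], float],
                         r: int, eps: float = 0.1) -> Assignment:
    """Threshold-decreasing greedy for a monotone k-submodular f with |support| <= r."""
    ground = list(ground)
    sol: Assignment = {}
    base = f(sol)
    d = max(f({e: i}) - base for e in ground for i in range(k))  # best singleton gain
    if d <= 0:
        return sol  # nothing has positive value
    tau = d
    while tau >= eps * d / r and len(sol) < r:
        for e in ground:
            if e in sol or len(sol) >= r:
                continue
            current = f(sol)
            # best type for e and its marginal gain
            best_gain, best_type = max((f({**sol, e: i}) - current, i) for i in range(k))
            if best_gain >= tau:
                sol[e] = best_type
        tau *= 1 - eps  # lower the acceptance threshold geometrically
    return sol
```

A simple test function is a sum of per-type weighted coverage functions. Because each type's term is monotone submodular and the types use disjoint element sets, the sum is monotone $k$-submodular.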
# Take Your Pick: Enabling Effective Personalized Federated Learning within Low-dimensional Feature Space ( http://arxiv.org/abs/2307.13995v1 ) License: see link	Guogang Zhu, Xuefeng Liu, Shaojie Tang, Jianwei Niu, Xinghao Wu, Jiaxing Shen	Personalized federated learning (PFL) is a popular framework that allows clients to have different models to address application scenarios where clients' data are in different domains. The typical model of a client in PFL features a global encoder trained by all clients to extract universal features from the raw data and personalized layers (e.g., a classifier) trained using the client's local data. Nonetheless, due to the differences between the data distributions of different clients (aka domain gaps), the universal features produced by the global encoder largely encompass numerous components irrelevant to a certain client's local task. Some recent PFL methods address this problem by personalizing specific parameters within the encoder. However, these methods encounter substantial challenges attributed to the high dimensionality and non-linearity of the neural network parameter space.
In contrast, the feature space has a lower dimensionality, providing greater intuitiveness and interpretability than the parameter space. To this end, we propose a novel PFL framework named FedPick. FedPick achieves PFL in the low-dimensional feature space by adaptively selecting, for each client, task-relevant features from those generated by the global encoder, based on the client's local data distribution. It presents a more accessible and interpretable implementation of PFL than methods working in the parameter space. Extensive experimental results show that FedPick can effectively select task-relevant features for each client and improve model performance in cross-domain FL.	Translated: 2023-07-27 13:17:59 Published: 2023-07-26
# BovineTalk: 負の影響下での乳牛の発声分析のための機械学習 BovineTalk: Machine Learning for Vocalization Analysis of Dairy Cattle under Negative Affective States ( http://arxiv.org/abs/2307.13994v1 ) ライセンス: Link先を確認	Dinu Gavojdian, Teddy Lazebnik, Madalina Mincu, Ariel Oren, Ioana Nicolae, Anna Zamansky	(参考訳) 家畜の正確な家畜養殖(PLF)ツールを利用することにより、家畜種における情動状態の非侵襲的な指標を開発し、検証する必要がある。そのような有望なアプローチの1つは、発声指示器の使用である。声化の音響構造とその機能は、豚、馬、鶏、ヤギなどの重要な家畜種で広く研究されたが、牛はこの文脈で現在まで検討されている。牛は, 口を閉じた, あるいは部分的に閉じた, 遠距離接触のための低周波発声 (LF) と, 遠距離通信のための開口発声 (HF) の2種類の発声を, 後者は負の感情状態と関連していると考えられた。さらに, 牛の発声には, 否定的, 肯定的, 幅広い文脈において, 個人性に関する情報が含まれていた。現在では、乳牛は典型的な生産サイクルにおいて一連のネガティブな課題やストレスに直面しており、研究に特に興味を持つネガティブな感情状態の中で声を鳴らしている。この研究の貢献の一つは、視覚隔離課題によって引き起こされるネガティブな感情状態の間、乳牛を授乳する成人の乳牛の、最大で最新の(ノイズからのクリーン)データセットを提供することである。本稿では,深層学習と説明可能な機械学習,高頻度および低周波の牛の鳴き声の分類,および個別の牛の音声認識の2つの計算フレームワークを提案する。両フレームワークのモデルでは, LF分類では87.2%, HF分類では89.4%, 牛個体識別では68.9%, 72.5%の精度であった。 There is a critical need to develop and validate non-invasive animal-based indicators of affective states in livestock species, in order to integrate them into on-farm assessment protocols, potentially via the use of precision livestock farming (PLF) tools. One such promising approach is the use of vocal indicators. The acoustic structure of vocalizations and their functions were extensively studied in important livestock species, such as pigs, horses, poultry and goats, yet cattle remain understudied in this context to date. Cows were shown to produce two types vocalizations: low-frequency calls (LF), produced with the mouth closed, or partially closed, for close distance contacts and open mouth emitted high-frequency calls (HF), produced for long distance communication, with the latter considered to be largely associated with negative affective states. Moreover, cattle vocalizations were shown to contain information on individuality across a wide range of contexts, both negative and positive. 
Nowadays, dairy cows face a series of negative challenges and stressors in a typical production cycle, making vocalizations during negative affective states of special interest for research. One contribution of this study is providing the largest pre-processed (noise-cleaned) dataset to date of lactating adult multiparous dairy cows during negative affective states induced by visual isolation challenges. Here we present two computational frameworks, one based on deep learning and one on explainable machine learning, to classify high- and low-frequency cattle calls and to recognize individual cow voices. Our models in these two frameworks reached 87.2% and 89.4% accuracy for LF and HF classification, with 68.9% and 72.5% accuracy rates for the cow individual identification, respectively.	Translated: 2023-07-27 13:17:33 Published: 2023-07-26
# Causal reasoning in typical computer vision tasks ( http://arxiv.org/abs/2307.13992v1 ) License: see link	Kexuan Zhang, Qiyu Sun, Chaoqiang Zhao, Yang Tang	Deep learning has revolutionized the field of artificial intelligence. Based on the statistical correlations uncovered by deep learning-based methods, computer vision technology has contributed to tremendous growth in areas such as autonomous driving and robotics. Despite being the basis of deep learning, such correlations are not stable and are susceptible to uncontrolled factors. In the absence of guidance from prior knowledge, statistical correlations can easily turn into spurious correlations caused by confounders. As a result, researchers are beginning to refine deep learning-based methods with causal theory. Causal theory models the intrinsic causal structure unaffected by data bias and is effective in avoiding spurious correlations. This paper aims to comprehensively review the existing causal methods in typical vision and vision-language tasks such as semantic segmentation, object detection, and image captioning. The advantages of causality and the approaches for building causal paradigms are summarized. Future roadmaps are also proposed, including facilitating the development of causal theory and its application in other complex scenes and systems.	Translated: 2023-07-27 13:17:01 Published: 2023-07-26
# METAVerse: Meta-Learning Traversability Cost Map for Off-Road Navigation ( http://arxiv.org/abs/2307.13991v1 ) License: see link	Junwon Seo, Taekyung Kim, Seongyong Ahn, Kiho Kwak	Autonomous navigation in off-road conditions requires an accurate estimation of terrain traversability. However, traversability estimation in unstructured environments is subject to high uncertainty due to the variability of numerous factors that influence vehicle-terrain interaction. Consequently, it is challenging to obtain a generalizable model that can accurately predict traversability in a variety of environments. This paper presents METAVerse, a meta-learning framework for learning a global model that accurately and reliably predicts terrain traversability across diverse environments. We train the traversability prediction network to generate a dense and continuous-valued cost map from a sparse LiDAR point cloud, leveraging vehicle-terrain interaction feedback in a self-supervised manner. Meta-learning is utilized to train a global model with driving data collected from multiple environments, effectively minimizing estimation uncertainty. During deployment, online adaptation is performed to rapidly adapt the network to the local environment by exploiting recent interaction experiences.
To conduct a comprehensive evaluation, we collect driving data from various terrains and demonstrate that our method can obtain a global model that minimizes uncertainty. Moreover, by integrating our model with a model predictive controller, we demonstrate that the reduced uncertainty results in safe and stable navigation in unstructured and unknown terrains.	Translated: 2023-07-27 13:16:47 Published: 2023-07-26
# This is not correct! Negation-aware Evaluation of Language Generation Systems ( http://arxiv.org/abs/2307.13989v1 ) License: see link	Miriam Anschütz, Diego Miguel Lozano and Georg Groh	Large language models underestimate how much negations change the meaning of a sentence. Therefore, learned evaluation metrics based on these models are insensitive to negations. In this paper, we propose NegBLEURT, a negation-aware version of the BLEURT evaluation metric. For that, we designed a rule-based sentence negation tool and used it to create the CANNOT negation evaluation dataset. Based on this dataset, we fine-tuned a sentence transformer and an evaluation metric to improve their negation sensitivity. Evaluating these models on existing benchmarks shows that our fine-tuned models outperform existing metrics on negated sentences by a wide margin while preserving their base models' performance on other perturbations.	Translated: 2023-07-27 13:16:25 Published: 2023-07-26
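The abstract does not list the rules used by the authors' rule-based negation tool. The toy below is hypothetical and only shows the general idea: it toggles "not" after the first auxiliary verb in a sentence. The auxiliary list and the function name are assumptions. Contractions such as "can't" are deliberately skipped, and a real tool would need many more rules.

```python
import re

# Hypothetical, deliberately small set of auxiliaries the toy rule recognizes.
_AUX = r"(is|are|was|were|can|could|will|would|should|does|do|did|has|have|had)"
# Whole-word auxiliary, not followed by an apostrophe, optionally followed by " not".
_PATTERN = re.compile(rf"\b{_AUX}\b(?!')( not\b)?", flags=re.IGNORECASE)

def negate(sentence: str) -> str:
    """Toggle negation at the first auxiliary: 'is' <-> 'is not'."""
    m = _PATTERN.search(sentence)
    if m is None:
        return sentence  # no rule applies
    aux, already_negated = m.group(1), m.group(2)
    replacement = aux if already_negated else f"{aux} not"
    return sentence[:m.start()] + replacement + sentence[m.end():]
```

A rule like this can create minimal pairs (a sentence and its negation) without any manual annotation, which is the kind of pair a negation-sensitivity dataset needs.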
# Hybrid Representation-Enhanced Sampling for Bayesian Active Learning in Musculoskeletal Segmentation of Lower Extremities ( http://arxiv.org/abs/2307.13986v1 ) License: see link	Ganping Li, Yoshito Otake, Mazen Soufi, Masashi Taniguchi, Masahide Yagi, Noriaki Ichihashi, Keisuke Uemura, Masaki Takao, Nobuhiko Sugano, Yoshinobu Sato	Purpose: Obtaining manual annotations to train deep learning (DL) models for auto-segmentation is often time-consuming. Uncertainty-based Bayesian active learning (BAL) is a widely adopted method to reduce annotation efforts. Based on BAL, this study introduces a hybrid representation-enhanced sampling strategy that integrates density and diversity criteria to save manual annotation costs by efficiently selecting the most informative samples. Methods: The experiments are performed on two lower extremity (LE) datasets of MRI and CT images by a BAL framework based on Bayesian U-net. Our method selects uncertain samples with high density and diversity for manual revision, optimizing for maximal similarity to unlabeled instances and minimal similarity to existing training data.
We assess accuracy using Dice and efficiency using a proposed metric called reduced annotation cost (RAC). We further evaluate the impact of various acquisition rules on BAL performance and design an ablation study to estimate effectiveness. Results: The proposed method was superior or non-inferior to other methods on both datasets across two acquisition rules, and quantitative results reveal the pros and cons of the acquisition rules. Our ablation study in volume-wise acquisition shows that combining the density and diversity criteria outperforms using either of them alone in musculoskeletal segmentation. Conclusion: Our sampling method is proven efficient in reducing annotation costs in image segmentation tasks. The combination of the proposed method and our BAL framework provides a semi-automatic way to efficiently annotate medical image datasets.	Translated: 2023-07-27 13:16:15 Published: 2023-07-26
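The selection rule above favors maximal similarity to unlabeled instances (density) and minimal similarity to existing training data (diversity). The greedy sketch below shows one hypothetical way to combine the two. It takes a shortlist of already-uncertain candidates (for example, the top MC-dropout samples) and scores each by mean cosine similarity to the unlabeled pool minus its maximum cosine similarity to the labeled and already-chosen samples. The feature vectors, the additive score, and all names are assumptions, not the authors' formulation.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (vectors are assumed to be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def select_batch(candidates: List[int], feats: List[List[float]],
                 unlabeled: List[int], labeled: List[int], budget: int) -> List[int]:
    """Greedily pick uncertain candidates that are dense but not redundant."""
    chosen: List[int] = []
    reference = list(labeled)  # samples a new pick should differ from
    pool = list(candidates)
    for _ in range(min(budget, len(pool))):
        def score(i: int) -> float:
            # density: mean similarity to the other unlabeled samples (i is assumed unlabeled)
            density = sum(cosine(feats[i], feats[j]) for j in unlabeled if j != i) \
                / max(len(unlabeled) - 1, 1)
            # redundancy: closeness to anything already labeled or selected
            redundancy = max((cosine(feats[i], feats[j]) for j in reference), default=0.0)
            return density - redundancy
        best = max(pool, key=score)
        chosen.append(best)
        reference.append(best)  # later picks should also differ from this one
        pool.remove(best)
    return chosen
```

Adding each pick to `reference` keeps a batch from filling up with near-duplicate volumes, which is the role the diversity criterion plays.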
# Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? ( http://arxiv.org/abs/2307.14023v1 ) License: see link	Tokio Kajitsuka and Issei Sato	Existing analyses of the expressive capacity of Transformer models have required excessively deep layers for data memorization, leading to a discrepancy with the Transformers actually used in practice. This is primarily due to the interpretation of the softmax function as an approximation of the hardmax function. By clarifying the connection between the softmax function and the Boltzmann operator, we prove that a single layer of self-attention with low-rank weight matrices possesses the capability to perfectly capture the context of an entire input sequence. As a consequence, we show that a single-layer Transformer has a memorization capacity for finite samples, and that Transformers consisting of one self-attention layer with two feed-forward neural networks are universal approximators for continuous functions on a compact domain.	Translated: 2023-07-27 13:10:11 Published: 2023-07-26
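For reference, the Boltzmann operator mentioned above is the softmax-weighted average $\mathrm{boltz}_\beta(x) = \sum_i x_i e^{\beta x_i} / \sum_j e^{\beta x_j}$. This has the same form as an attention output, with scores and values coinciding. It returns the plain mean at $\beta = 0$ and approaches $\max_i x_i$ (the hardmax limit) as $\beta \to \infty$. The sketch below is a minimal scalar version. It is only the textbook definition and does not reproduce the paper's construction.

```python
import math
from typing import Sequence

def boltzmann(x: Sequence[float], beta: float) -> float:
    """Boltzmann operator: average of x weighted by softmax(beta * x)."""
    z = [beta * xi for xi in x]
    m = max(z)  # shift by the largest logit for numerical stability
    w = [math.exp(zi - m) for zi in z]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
```

For example, `boltzmann([1.0, 2.0, 3.0], 0.0)` gives the mean 2.0, and with a large `beta` the result is essentially 3.0.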
# Efficiency Optimization in Quantum Computing: Balancing Thermodynamics and Computational Performance ( http://arxiv.org/abs/2307.14022v1 ) License: see link	Tomasz Śmierzchalski, Zakaria Mzaouali, Sebastian Deffner, Bartłomiej Gardas	We investigate the computational efficiency and thermodynamic cost of the D-Wave quantum annealer under reverse annealing with and without pausing. Our experimental results demonstrate that combining reverse annealing with pausing improves computational efficiency while minimizing the thermodynamic cost compared to reverse annealing alone. Moreover, we find that the magnetic field has a positive impact on the performance of the quantum annealer during reverse annealing but becomes detrimental when pausing is involved. Our results provide strategies for optimizing the performance and energy consumption of quantum annealing systems employing reverse-annealing protocols.	Translated: 2023-07-27 13:09:57 Published: 2023-07-26
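A reverse anneal starts from a classical state at anneal fraction s = 1. It ramps back to an intermediate s, optionally pauses there, and then ramps forward to s = 1 again. The helper below only builds such a piecewise-linear schedule as a list of [time in microseconds, s] points, the shape D-Wave documents for its `anneal_schedule` solver parameter. It does not contact any solver, and the target s and timings are arbitrary placeholders, not values from the paper.

```python
from typing import List

def reverse_anneal_schedule(s_target: float, ramp_us: float,
                            pause_us: float = 0.0) -> List[List[float]]:
    """Piecewise-linear reverse-anneal schedule as [time_us, s] points.

    Starts at s = 1, ramps back to s_target, optionally pauses there,
    then ramps forward to s = 1 again.
    """
    if not 0.0 <= s_target < 1.0:
        raise ValueError("s_target must lie in [0, 1)")
    schedule = [[0.0, 1.0], [ramp_us, s_target]]
    if pause_us > 0:
        schedule.append([ramp_us + pause_us, s_target])  # hold s constant: the pause
    schedule.append([schedule[-1][0] + ramp_us, 1.0])
    return schedule
```

Comparing runs with `pause_us=0` and `pause_us>0` corresponds to the "with and without pausing" conditions described above. Reverse annealing also requires an initial classical state, which this sketch does not cover.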
# Retinotopy Inspired Brain Encoding Model and the All-for-One Training Recipe ( http://arxiv.org/abs/2307.14021v1 ) License: see link	Huzheng Yang, Jianbo Shi, James Gee	Brain encoding models aim to predict brain voxel-wise responses to stimulus images, replicating brain signals captured by neuroimaging techniques. There is a large volume of publicly available data, but training a comprehensive brain encoding model is challenging. The main difficulties stem from a) diversity within an individual brain, with functionally heterogeneous brain regions; b) diversity of brains across subjects, due to genetic and developmental differences; and c) diversity of imaging modalities and processing pipelines. We use this diversity to our advantage by introducing the All-for-One training recipe, which divides the challenging one-big-model problem into multiple small models, with the small models aggregating the knowledge while preserving the distinction between the different functional regions. Independently of the training recipe, we use biological knowledge of the brain, specifically retinotopy, to introduce an inductive bias for learning a 3D brain-to-image mapping that ensures a) each neuron knows from which image regions and semantic levels to gather information, and b) no neurons are left behind in the model.
We pre-trained a brain encoding model using over one million data points from five public datasets spanning three imaging modalities. To the best of our knowledge, this is the most comprehensive brain encoding model to date. We demonstrate the effectiveness of the pre-trained model as a drop-in replacement for commonly used vision backbone models. Furthermore, we demonstrate the application of the model to brain decoding. Code and the model checkpoint will be made available.	Translated: 2023-07-27 13:09:45 Published: 2023-07-26
# One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration ( http://arxiv.org/abs/2307.14019v1 ) License: see link	Yongzhe Yuan, Yue Wu, Maoguo Gong, Qiguang Miao and A. K. Qin	The precision of unsupervised point cloud registration methods is typically limited by the lack of reliable inlier estimation and self-supervised signals, especially in partially overlapping scenarios. In this paper, we propose an effective inlier estimation method for unsupervised point cloud registration that captures the geometric structure consistency between the source point cloud and its corresponding reference point cloud copy. Specifically, to obtain a high-quality reference point cloud copy, a One-Nearest Neighborhood (1-NN) point cloud is generated from the input point cloud. This facilitates matching map construction and allows the dual neighborhood matching scores of the 1-NN point cloud and the input point cloud to be integrated to improve matching confidence. Benefiting from the high-quality reference copy, we argue that the neighborhood graph formed by an inlier and its neighborhood should be consistent between the source point cloud and its corresponding reference copy.
Based on this observation, we construct transformation-invariant geometric structure representations and capture geometric structure consistency to score the inlier confidence for estimated correspondences between source point cloud and its reference copy. This strategy can simultaneously provide the reliable self-supervised signal for model optimization. Finally, we further calculate transformation estimation by the weighted SVD algorithm with the estimated correspondences and corresponding inlier confidence. We train the proposed model in an unsupervised manner, and extensive experiments on synthetic and real-world datasets illustrate the effectiveness of the proposed method.	翻訳日:2023-07-27 13:09:23 公開日:2023-07-26
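The final step above, transformation estimation by weighted SVD, can be illustrated with a minimal NumPy sketch. It assumes the correspondences are already paired row by row and that the inlier confidences are non-negative weights. The function name `weighted_svd_transform` and the toy data are hypothetical, and this is the textbook weighted Kabsch (Procrustes) solution rather than the authors' code.

```python
import numpy as np


def weighted_svd_transform(src, ref, weights):
    """Rigid (R, t) minimising sum_i w_i * ||R @ src_i + t - ref_i||^2."""
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalise the confidences

    # Confidence-weighted centroids of both point sets
    src_c = w @ src
    ref_c = w @ ref

    # Weighted 3x3 cross-covariance of the centred correspondences
    H = (src - src_c).T @ (w[:, None] * (ref - ref_c))
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a rotation (det = +1), not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ref_c - R @ src_c
    return R, t


# Toy check: 40 exact correspondences plus 10 corrupted ones given zero confidence
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
ref = src @ R_true.T + t_true
ref[:10] += rng.normal(scale=5.0, size=(10, 3))      # outlier correspondences
conf = np.ones(50)
conf[:10] = 0.0
R_est, t_est = weighted_svd_transform(src, ref, conf)
```

Down-weighting a correspondence to zero removes it from both the centroids and the cross-covariance. This is why reliable inlier confidences matter so much for the final estimate.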
# Erbium emitters in commercially fabricated nanophotonic silicon waveguides ( http://arxiv.org/abs/2307.14017v1 ) License: see link	Stephan Rinner, Florian Burger, Andreas Gritsch, Jonas Schmitt, Andreas Reiserer	Quantum memories integrated into nanophotonic silicon devices are a promising platform for large quantum networks and scalable photonic quantum computers. In this context, erbium dopants are particularly attractive, as they combine optical transitions in the telecommunications frequency band with the potential for second-long coherence times. Here we show that these emitters can be reliably integrated into commercially fabricated low-loss waveguides. We investigate several integration procedures and obtain ensembles of many emitters with an inhomogeneous broadening of < 2 GHz and a homogeneous linewidth of < 30 kHz. We further observe the splitting of the electronic spin states in a magnetic field of up to 9 T that freezes paramagnetic impurities. Our findings are an important step towards long-lived quantum memories that can be fabricated at wafer scale using CMOS technology.	Translated: 2023-07-27 13:08:58 Published: 2023-07-26
# RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition ( http://arxiv.org/abs/2307.14016v1 ) License: see link	Lei Shen, Jianlong Jin, Ruixin Zhang, Huaen Li, Yingyi Zhang, Jingyun Zhang, Shouhong Ding, Yang Zhao, Wei Jia	Palmprints have recently shown great potential in recognition applications, as they are a privacy-friendly and stable biometric. However, the lack of large-scale public palmprint datasets limits further research and development of palmprint recognition. In this paper, we propose a novel realistic pseudo-palmprint generation (RPG) model to synthesize palmprints with massive numbers of identities. We first introduce a conditional modulation generator to improve intra-class diversity. Then an identity-aware loss is proposed to ensure identity consistency under unpaired training. We further improve the Bézier palm crease generation strategy to guarantee identity independence. Extensive experimental results demonstrate that synthetic pretraining significantly boosts recognition model performance. For example, our model improves on the state-of-the-art BézierPalm by more than 5% and 14% in terms of TAR@FAR=1e-6 under the 1:1 and 1:3 open-set protocols. When accessing only 10% of the real training data, our method still outperforms ArcFace trained with 100% of the real training data, indicating that we are closer to real-data-free palmprint recognition.	Translated: 2023-07-27 13:08:44 Published: 2023-07-26
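TAR@FAR=1e-6, the figure quoted for RPG-Palm, is the true accept rate at the decision threshold where the false accept rate is at most 10^-6. The sketch below shows that computation on similarity scores. It assumes a pair is accepted when its score is at or above the threshold. The toy scores are made up, and the target FAR is relaxed to 0.2 so the tiny example has enough impostor pairs. It is not the authors' evaluation code.

```python
def tar_at_far(genuine, impostor, target_far):
    """Return (threshold, TAR) for the loosest threshold whose FAR <= target_far.

    A pair is accepted when its similarity score is >= threshold.
    """
    # Candidate thresholds in ascending order; +inf accepts nothing (FAR = 0)
    candidates = sorted(set(genuine) | set(impostor)) + [float("inf")]
    for thr in candidates:
        far = sum(s >= thr for s in impostor) / len(impostor)
        if far <= target_far:
            # FAR only falls as the threshold rises, so the first hit keeps the most genuine pairs
            tar = sum(s >= thr for s in genuine) / len(genuine)
            return thr, tar


genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5, 0.7]
print(tar_at_far(genuine_scores, impostor_scores, 0.2))   # (0.6, 0.8)
```

At a target as strict as 1e-6, at least a million impostor comparisons are needed before a single false accept can be resolved. Large synthetic identity sets help with exactly this kind of evaluation.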
# MCMC-Correction of Score-Based Diffusion Models for Model Composition ( http://arxiv.org/abs/2307.14012v1 ) License: see link	Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand and Lennart Svensson	Diffusion models can be parameterised in terms of either a score or an energy function. The energy parameterisation has better theoretical properties, mainly that it enables an extended sampling procedure with a Metropolis–Hastings correction step based on the change in total energy of the proposed samples. However, it seems to yield slightly worse performance, and, more importantly, due to the widespread popularity of score-based diffusion, few off-the-shelf pre-trained energy-based models are available. This limitation undermines the purpose of model composition, which aims to combine pre-trained models to sample from new distributions. Our proposal, however, retains the score parameterisation and instead computes the energy-based acceptance probability through line integration of the score function. This allows us to re-use existing diffusion models and still combine the reverse process with various Markov chain Monte Carlo (MCMC) methods. We evaluate our method on a 2D experiment and find that it achieves similar or arguably better performance than the energy parameterisation.	Translated: 2023-07-27 13:08:25 Published: 2023-07-26
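The key identity behind the MCMC-correction proposal is that the log-density ratio needed for a Metropolis–Hastings test can be recovered from the score alone: $\log p(x') - \log p(x) = \int_0^1 \nabla_x \log p(x + s(x'-x)) \cdot (x'-x)\,ds$. The sketch below applies this with the trapezoidal rule inside a plain random-walk Metropolis–Hastings step. The toy Gaussian score, the symmetric proposal, the step size and the number of quadrature steps are all assumptions for illustration. The paper combines the idea with the reverse diffusion process and other MCMC samplers, which are not shown.

```python
import math
import random


def log_density_diff(score, x, x_new, n_steps=64):
    """Approximate log p(x_new) - log p(x) by integrating the score along x -> x_new."""
    dx = [b - a for a, b in zip(x, x_new)]
    total = 0.0
    for k in range(n_steps + 1):
        s = k / n_steps
        point = [a + s * d for a, d in zip(x, dx)]
        directional = sum(g * d for g, d in zip(score(point), dx))   # score . (x_new - x)
        total += (0.5 if k in (0, n_steps) else 1.0) * directional   # trapezoidal weights
    return total / n_steps


def mh_step(score, x, step, rng):
    """One random-walk Metropolis-Hastings step with a symmetric Gaussian proposal."""
    proposal = [xi + rng.gauss(0.0, step) for xi in x]
    accept_prob = math.exp(min(0.0, log_density_diff(score, x, proposal)))
    return proposal if rng.random() < accept_prob else x


def gaussian_score(x):
    # Score of a standard normal, grad log p(x) = -x (stand-in for a trained score network)
    return [-xi for xi in x]


rng = random.Random(0)
x = [3.0, -3.0]
chain = []
for _ in range(1000):
    x = mh_step(gaussian_score, x, 0.8, rng)
    chain.append(x)
```

No normalizing constant or energy network is needed: only score evaluations along the segment, at the cost of extra network calls per acceptance test.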
# ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution ( http://arxiv.org/abs/2307.14010v1 ) License: see link	Mingjin Zhang, Chi Zhang, Qiming Zhang, Jie Guo, Xinbo Gao, Jing Zhang	Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation. However, the prevailing CNN-based approaches have shown limitations in building long-range dependencies and capturing interaction information between spectral features. This results in inadequate utilization of spectral information and artifacts after upsampling. To address this issue, we propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure. Specifically, we first introduce a robust and spectral-friendly similarity metric, i.e., the spectral correlation coefficient of the spectrum (SCC), to replace the original attention matrix and incorporate inductive biases into the model to facilitate training. Building on it, we further use the kernelizable attention technique, with theoretical support, to form a novel efficient SCC-kernel-based self-attention (ESSA) and reduce attention computation to linear complexity. ESSA enlarges the receptive field for features after upsampling without adding much computation and allows the model to effectively utilize spatial-spectral information from different scales, resulting in more natural high-resolution images. Our experiments, which require no pretraining on large-scale datasets, demonstrate ESSA's effectiveness in terms of both visual quality and quantitative results.	Translated: 2023-07-27 13:08:09 Published: 2023-07-26
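ESSAformer's linear-complexity claim rests on a general property of kernelized attention. If similarities factor as $\phi(Q)\phi(K)^\top$, the product can be reassociated as $\phi(Q)(\phi(K)^\top V)$, so the $N \times N$ attention matrix is never formed. The NumPy sketch below checks that the two orderings agree. It uses the common elu(x)+1 feature map as a stand-in, because the abstract does not spell out the SCC-based kernel used by ESSA. The shapes are arbitrary.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: a positive feature map (a stand-in, not the SCC kernel)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def quadratic_attention(q, k, v):
    # Materialises the full N x N similarity matrix: O(N^2 d) time, O(N^2) memory
    sim = feature_map(q) @ feature_map(k).T
    return (sim @ v) / sim.sum(axis=1, keepdims=True)


def linear_attention(q, k, v):
    # Same result with the product reassociated: O(N d^2) time, no N x N matrix
    fq, fk = feature_map(q), feature_map(k)
    kv = fk.T @ v                      # (d, d_v) summary of all keys and values
    normaliser = fq @ fk.sum(axis=0)   # row sums of phi(Q) phi(K)^T, shape (N,)
    return (fq @ kv) / normaliser[:, None]


rng = np.random.default_rng(0)
n_tokens, dim = 256, 16
q, k, v = (rng.normal(size=(n_tokens, dim)) for _ in range(3))
out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)
```

For hyperspectral images, the token count grows with spatial resolution after upsampling. This is where avoiding the quadratic matrix pays off.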
# Car-Studio: Learning Car Radiance Fields from Single-View and Endless In-the-wild Images ( http://arxiv.org/abs/2307.14009v1 ) License: see link	Tianyu Liu, Hao Zhao, Yang Yu, Guyue Zhou, Ming Liu	Compositional neural scene graph studies have shown that radiance fields can be an efficient tool in an editable autonomous driving simulator. However, previous studies learned within a sequence of autonomous driving datasets, resulting in unsatisfactory blurring when the car is rotated in the simulator. In this letter, we propose a pipeline for learning from unconstrained images and building a dataset from the processed images. The simulator demands that the vehicle remain clear when the perspective changes and that its contour stay sharp against the background to avoid artifacts when editing. To meet these requirements, we design a radiance field of the vehicle, a crucial part of the urban scene foreground. Through experiments, we demonstrate that our model achieves competitive performance compared to baselines. Using the datasets built from in-the-wild images, our method further provides controllable appearance editing. We will release the dataset and code at https://lty2226262.github.io/car-studio/ to facilitate further research in the field.	Translated: 2023-07-27 13:07:43 Published: 2023-07-26
# Adaptive Frequency Filters As Efficient Global Token Mixers ( http://arxiv.org/abs/2307.14008v1 ) License: see link	Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu, Baining Guo	Recent vision transformers, large-kernel CNNs and MLPs have attained remarkable successes in broad vision tasks thanks to their effective information fusion in the global scope. However, their efficient deployment, especially on mobile devices, still faces notable challenges due to the heavy computational costs of self-attention mechanisms, large kernels, or fully connected layers. In this work, we apply the conventional convolution theorem to deep learning to address this and reveal that adaptive frequency filters can serve as efficient global token mixers. With this insight, we propose the Adaptive Frequency Filtering (AFF) token mixer. This neural operator transfers a latent representation to the frequency domain via a Fourier transform and performs semantic-adaptive frequency filtering via an elementwise multiplication, which is mathematically equivalent to a token mixing operation in the original latent space with a dynamic convolution kernel as large as the spatial resolution of this latent representation. We take AFF token mixers as the primary neural operators to build a lightweight neural network, dubbed AFFNet. Extensive experiments demonstrate the effectiveness of our proposed AFF token mixer and show that AFFNet achieves superior accuracy and efficiency trade-offs compared to other lightweight network designs on broad visual tasks, including visual recognition and dense prediction tasks.	Translated: 2023-07-27 13:07:22 Published: 2023-07-26
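The AFF equivalence follows from the convolution theorem. Multiplying the 2D Fourier transform of a feature map by a filter and transforming back equals a circular convolution whose kernel covers the whole spatial extent. The NumPy sketch below verifies this on one channel. The filter here is a fixed random kernel. In AFF, the frequency-domain filter is produced adaptively from the features, which this sketch does not model.

```python
import numpy as np


def frequency_mix(x, kernel):
    """Global token mixing as an elementwise product in the 2D frequency domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(kernel)))


def circular_conv2d(x, kernel):
    """Direct circular convolution with a kernel as large as the feature map."""
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            for m in range(h):
                for n in range(w):
                    out[i, j] += kernel[m, n] * x[(i - m) % h, (j - n) % w]
    return out


rng = np.random.default_rng(0)
feature = rng.normal(size=(8, 8))         # one channel of an 8 x 8 latent map
kernel = rng.normal(size=(8, 8))          # spatial kernel, same size as the map
fast = frequency_mix(feature, kernel)     # O(HW log HW)
slow = circular_conv2d(feature, kernel)   # O((HW)^2)
```

The FFT route gives a global receptive field at roughly O(HW log HW) cost. This is the efficiency argument behind using frequency filters as token mixers.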
# Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models ( http://arxiv.org/abs/2307.14061v1 ) License: see link	Dong Lu, Zhiqiang Wang, Teng Wang, Weili Guan, Hongchang Gao, Feng Zheng	Vision-language pre-training (VLP) models have shown vulnerability to adversarial examples in multimodal tasks. Furthermore, malicious adversaries can deliberately transfer such examples to attack other black-box models. However, existing work has mainly focused on investigating white-box attacks. In this paper, we present the first study to investigate the adversarial transferability of recent VLP models. We observe that existing methods exhibit much lower transferability than their strong attack performance in white-box settings. The transferability degradation is partly caused by the under-utilization of cross-modal interactions. In particular, unlike unimodal learning, VLP models rely heavily on cross-modal interactions, and the multimodal alignments are many-to-many; e.g., an image can be described in various natural languages. To this end, we propose a highly transferable Set-level Guidance Attack (SGA) that thoroughly leverages modality interactions and incorporates alignment-preserving augmentation with cross-modal guidance. Experimental results demonstrate that SGA can generate adversarial examples that transfer strongly across different VLP models on multiple downstream vision-language tasks. On image-text retrieval, SGA significantly enhances the attack success rate for transfer attacks from ALBEF to TCL by a large margin (at least 9.78% and up to 30.21%) compared to the state of the art.	Translated: 2023-07-27 12:59:36 Published: 2023-07-26
# Classification of data with a qudit, a geometric approach ( http://arxiv.org/abs/2307.14060v1 ) License: see link	A. Mandilara, B. Dellen, U. Jaekel, T. Valtinos, D. Syvridis	We propose a model for data classification using isolated quantum $d$-level systems, or qudits. The procedure consists of an encoding phase, where classical data are mapped onto the surface of the qudit's Bloch hyper-sphere via rotation encoding, followed by a rotation of the sphere and a projective measurement. The rotation is adjustable in order to control the operator to be measured, while additional weights are introduced in the encoding phase to adjust the mapping onto the Bloch hyper-surface. During the training phase, a cost function based on the average expectation value of the observable is minimized using gradient descent, thereby adjusting the weights. Using examples and a numerical estimation of the lossless memory dimension, we demonstrate that this geometrically inspired qudit model for classification is able to solve nonlinear classification problems using only a small number of parameters and without requiring entangling operations.	Translated: 2023-07-27 12:59:13 Published: 2023-07-26
# Towards Establishing Systematic Classification Requirements for Automated Driving ( http://arxiv.org/abs/2307.14058v1 ) License: see link	Ken T. Mori, Trent Brown, Steven Peters	Despite the presence of the classification task in many benchmark datasets for perception in the automotive domain, few efforts have been undertaken to define consistent classification requirements. This work addresses the topic by proposing a structured method to generate a classification structure. First, legal categories are identified based on behavioral requirements for the vehicle. This structure is further substantiated by considering two aspects: collision safety for objects and perceptual categories. A classification hierarchy is obtained by applying the method to an exemplary legal text. A comparison of the results with benchmark dataset categories shows limited agreement. This indicates the necessity for explicit consideration of legal requirements regarding perception.	Translated: 2023-07-27 12:58:59 Published: 2023-07-26
# Open Image Content Disarm And Reconstruction ( http://arxiv.org/abs/2307.14057v1 ) License: see link	Eli Belkind, Ran Dubin, Amit Dvir	With the advance of malware technology, attackers create new ways to hide their malicious code from antivirus services. One way to obfuscate an attack is to use common files as a cover to hide malicious scripts, so that the malware looks like a legitimate file. Although cutting-edge artificial intelligence and content signatures exist, evasive malware successfully bypasses next-generation malware detection using advanced methods such as steganography. Image files (e.g., JPEG) are among the files commonly used to hide malware. In addition, some malware uses steganography to hide malicious scripts or sensitive data in images. Steganography in images is difficult to detect even with specialized tools. Image-based attacks either try to attack the user's device using malicious payloads or use image steganography to hide sensitive data inside legitimate images and leak it outside the user's device. Therefore, in this paper, we present a novel Image Content Disarm and Reconstruction (ICDR) system. Our ICDR system removes potential malware with a zero-trust approach while maintaining high image quality and file usability. By extracting the image data, removing it from the rest of the file, and manipulating the image pixels, it is possible to disable or remove hidden malware inside the file.	Translated: 2023-07-27 12:58:49 Published: 2023-07-26
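Pixel manipulation can neutralize one common class of hidden payload. One simple approach is to rebuild the pixel buffer with the least significant bit (LSB) plane overwritten. This destroys LSB steganography while changing each 8-bit value by at most 1. The pure-Python sketch below shows the idea on a flat list of grayscale values. It is an illustrative assumption, not the ICDR algorithm described in the paper. It also covers only the pixel step. Writing a fresh file from the cleaned pixels, and nothing else, is what would discard metadata or trailing data in the original file.

```python
def embed_lsb(pixels, bits):
    """Toy LSB steganography: hide one bit in the lowest bit of each pixel."""
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read back the lowest bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


def disarm_pixels(pixels):
    """Rebuild pixel values with the LSB plane cleared (visual change of at most 1 level)."""
    return [p & 0xFE for p in pixels]


cover = [(37 * i + 11) % 256 for i in range(64)]   # toy 8 x 8 grayscale image, row-major
secret = [1, 0, 1, 1, 0, 0, 1, 0] * 4              # 32 hidden bits
stego = embed_lsb(cover, secret)
clean = disarm_pixels(stego)
```

Clearing (or randomizing) the LSB plane only defeats LSB-style embedding. Payloads hidden in other ways would need other pixel transformations, which is why the abstract speaks of manipulating pixels more generally.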
# Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy Dichotomous Image Segmentation ( http://arxiv.org/abs/2307.14052v1 ) License: see link	Jialun Pei, Zhangjun Zhou, Yueming Jin, He Tang, Pheng-Ann Heng	High-accuracy Dichotomous Image Segmentation (DIS) aims to pinpoint category-agnostic foreground objects from natural scenes. The main challenge for DIS involves identifying the highly accurate dominant area while rendering detailed object structure. However, directly using a general encoder-decoder architecture may result in an oversupply of high-level features and neglect the shallow spatial information necessary for partitioning meticulous structures. To fill this gap, we introduce a novel Unite-Divide-Unite Network (UDUN) that restructures and bipartitely arranges complementary features to simultaneously boost the effectiveness of trunk and structure identification. The proposed UDUN has several strengths. First, a dual-size input feeds into the shared backbone to produce more holistic and detailed features while keeping the model lightweight. Second, a simple Divide-and-Conquer Module (DCM) is proposed to decouple multiscale low- and high-level features into our structure decoder and trunk decoder to obtain structure and trunk information, respectively. Moreover, we design a Trunk-Structure Aggregation module (TSA) in our union decoder that performs cascade integration for uniform high-accuracy segmentation. As a result, UDUN performs favorably against state-of-the-art competitors on all six evaluation metrics on the overall DIS-TE, i.e., achieving a weighted F-measure of 0.772 and an HCE of 977. Using 1024×1024 input, our model enables real-time inference at 65.3 fps with ResNet-18.	Translated: 2023-07-27 12:58:29 Published: 2023-07-26
# 3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability ( http://arxiv.org/abs/2307.14051v1 ) License: see link	Ruowei Wang, Yu Liu, Pei Su, Jianwei Zhang, Qijun Zhao	Shape generation is the practice of producing 3D shapes in various representations for 3D content creation. Previous studies on 3D shape generation have focused on shape quality and structure while giving little or no consideration to semantic information. Consequently, such generative models often fail to preserve the semantic consistency of shape structure or to enable manipulation of the semantic attributes of shapes during generation. In this paper, we propose a novel semantic generative model named 3D Semantic Subspace Traverser that uses semantic attributes for category-specific 3D shape generation and editing. Our method uses implicit functions as the 3D shape representation and combines a novel latent-space GAN with a linear subspace model to discover semantic dimensions in the local latent space of 3D shapes. Each dimension of the subspace corresponds to a particular semantic attribute, and we can edit the attributes of generated shapes by traversing the coefficients of those dimensions. Experimental results demonstrate that our method can produce plausible shapes with complex structures and enables the editing of semantic attributes. The code and trained models are available at https://github.com/TrepangCat/3D_Semantic_Subspace_Traverser	Translated: 2023-07-27 12:57:58 Published: 2023-07-26
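In the 3D Semantic Subspace Traverser, editing by traversing subspace coefficients can be pictured with a linear model $z = \mu + U c$. Here the columns of $U$ are orthonormal directions and $c$ holds one coefficient per attribute. The NumPy sketch below shows only this linear part. The mean, the random basis and the dimensions are hypothetical placeholders for what the paper learns. The latent-space GAN and the implicit-function decoder are omitted.

```python
import numpy as np


def random_orthonormal_basis(latent_dim, n_attrs, seed=0):
    # Hypothetical basis: each column stands in for one learned semantic direction
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.normal(size=(latent_dim, n_attrs)))
    return basis


def to_coefficients(z, mean, basis):
    return basis.T @ (z - mean)          # project a latent code onto the subspace


def from_coefficients(coeffs, mean, basis):
    return mean + basis @ coeffs         # z = mu + U c


def traverse(coeffs, attr_index, delta):
    edited = coeffs.copy()
    edited[attr_index] += delta          # move along one semantic dimension only
    return edited


latent_dim, n_attrs = 32, 4
U = random_orthonormal_basis(latent_dim, n_attrs)
mu = np.zeros(latent_dim)
c = np.array([0.5, -1.0, 0.0, 2.0])
z = from_coefficients(c, mu, U)
z_edited = from_coefficients(traverse(c, attr_index=1, delta=1.5), mu, U)
```

Because the basis is orthonormal, changing one coefficient leaves the projections onto the other directions untouched. This is what makes per-attribute editing well defined in such a model.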
# Controllable Guide-Space for Generalizable Face Forgery Detection ( http://arxiv.org/abs/2307.14039v1 ) License: see link	Ying Guo, Cheng Zhen, Pengfei Yan	Recent studies on face forgery detection have shown satisfactory performance on the forgery methods included in the training datasets, but fall short on unknown domains. This motivates many works to improve generalization, but forgery-irrelevant information, such as image background and identity, still exists in the features of different domains and causes unexpected clustering, limiting generalization. In this paper, we propose a controllable guide-space (GS) method to enhance the discrimination of different forgery domains, so as to increase the forgery relevance of features and thereby improve generalization. The well-designed guide-space can simultaneously achieve both proper separation of forgery domains and a large distance between real and forgery domains in an explicit and controllable manner. Moreover, for better discrimination, we use a decoupling module to weaken the interference of forgery-irrelevant correlations between domains. Furthermore, we adjust the decision boundary manifold according to the clustering degree of same-domain features within the neighborhood. Extensive experiments in multiple in-domain and cross-domain settings confirm that our method can achieve state-of-the-art generalization.	Translated: 2023-07-27 12:57:37 Published: 2023-07-26
# Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems ( http://arxiv.org/abs/2307.14031v1 ) License: see link	Songbo Hu, Han Zhou, Mete Hergul, Milan Gritta, Guchun Zhang, Ignacio Iacobacci, Ivan Vulić, Anna Korhonen	Creating high-quality annotated data for task-oriented dialog (ToD) is notoriously difficult, and the challenges are amplified when the goal is to create equitable, culturally adapted, and large-scale ToD datasets for multiple languages. Therefore, current datasets are still very scarce and suffer from limitations such as translation-based non-native dialogs with translation artefacts, small scale, or lack of cultural adaptation, among others. In this work, we first take stock of the current landscape of multilingual ToD datasets, offering a systematic overview of their properties and limitations. Aiming to reduce all the detected limitations, we then introduce Multi3WOZ, a novel multilingual, multi-domain, multi-parallel ToD dataset. It is large-scale and offers culturally adapted dialogs in 4 languages to enable training and evaluation of multilingual and cross-lingual ToD systems. We describe a complex bottom-up data collection process that yielded the final dataset, and offer the first sets of baseline scores across different ToD-related tasks for future reference, also highlighting its challenging nature.	Translated: 2023-07-27 12:57:19 Published: 2023-07-26
# Consensus-Adaptive RANSAC ( http://arxiv.org/abs/2307.14030v1 ) License: see link	Luca Cavalli, Daniel Barath, Marc Pollefeys, Viktor Larsson	RANSAC and its variants are widely used for robust estimation; however, they commonly follow a greedy approach to finding the highest-scoring model while ignoring other model hypotheses. In contrast, Iteratively Reweighted Least Squares (IRLS) techniques gradually approach the model by iteratively updating the weight of each correspondence based on the residuals from previous iterations. Inspired by these methods, we propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer. The attention mechanism operates on a batch of point-to-model residuals and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer. This rich state then guides the minimal sampling between iterations as well as the model refinement. We evaluate the proposed approach on essential and fundamental matrix estimation on a number of indoor and outdoor datasets. It outperforms state-of-the-art estimators by a significant margin while adding only a small runtime overhead. Moreover, we demonstrate good generalization properties of our trained model, indicating its effectiveness across different datasets and tasks. The proposed attention mechanism and one-step transformer provide an adaptive behavior that enhances the performance of RANSAC, making it a more effective tool for robust estimation. Code is available at https://github.com/cavalli1234/CA-RANSAC.	Translated: 2023-07-27 12:56:57 Published: 2023-07-26
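The Consensus-Adaptive RANSAC abstract contrasts its method with a greedy baseline, which can be sketched as classic RANSAC for 2D line fitting. The algorithm draws minimal samples, keeps the single hypothesis with the largest consensus set, and then refits on its inliers. This pure-Python sketch is the textbook algorithm, not CA-RANSAC. The data, inlier threshold and iteration count are assumptions chosen for the toy example.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a * x + b."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)      # minimal sample: two points
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < threshold]
        if len(inliers) > len(best_inliers):           # greedy: only the best hypothesis survives
            best_inliers = inliers
    return fit_line(best_inliers), best_inliers         # refine on the consensus set


inlier_pts = [(float(x), 2.0 * x + 1.0) for x in range(40)]
outlier_pts = [(i + 0.5, 100.0 + 3.0 * i) for i in range(10)]
(slope, intercept), consensus = ransac_line(inlier_pts + outlier_pts)
```

Every hypothesis other than the current best is discarded, along with its residuals. Those discarded residuals are the information the abstract's attention layer is designed to exploit.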
# 赤血球疾患分類のためのトポロジカル・レギュラライズ・マルチインスタンス学習 Topologically-Regularized Multiple Instance Learning for Red Blood Cell Disease Classification ( http://arxiv.org/abs/2307.14025v1 ) ライセンス: Link先を確認	Salome Kazeminia, Ario Sadafi, Asya Makhro, Anna Bogdanova, Carsten Marr, Bastian Rieck	(参考訳) 顕微鏡画像を用いたまれな貧血の診断は熟練の専門医や機械学習の手法では困難である。単一の血液サンプルに数千の疾患関連細胞があるため、これは複雑な多重インスタンス学習(MIL)問題を構成する。赤血球の空間的近傍は、それ自体は意味がないが、トポロジー、すなわち血液サンプル全体の幾何学は、限られたデータでトレーニングする際の勾配の消失や過度な適合などの典型的なMIL問題を治療するための情報的特徴を含む。そこで我々は,単一赤血球画像の袋から多スケールなトポロジー特徴を抽出するトポロジーベースアプローチを開発した。トポロジカルな特徴はモデルの正則化に使われ、データの特徴的なトポロジカル特性の保存を強制される。 521個の赤血球の顕微鏡像を有する稀貧血患者71例のデータセットに適用し, 単細胞画像に基づく異常貧血自動分類において, 局所的正規化が3%以上の性能向上に繋がる有効な方法であることを示した。これは、MILプロセスの正則化に位相特性を使用する最初のアプローチである。 Diagnosing rare anemia disorders using microscopic images is challenging for skilled specialists and machine-learning methods alike. Due to thousands of disease-relevant cells in a single blood sample, this constitutes a complex multiple-instance learning (MIL) problem. While the spatial neighborhood of red blood cells is not meaningful per se, the topology, i.e., the geometry of blood samples as a whole, contains informative features to remedy typical MIL issues, such as vanishing gradients and overfitting when training on limited data. We thus develop a topology-based approach that extracts multi-scale topological features from bags of single red blood cell images. The topological features are used to regularize the model, enforcing the preservation of characteristic topological properties of the data. Applied to a dataset of 71 patients suffering from rare anemia disorders with 521 microscopic images of red blood cells, our experiments show that topological regularization is an effective method that leads to more than 3% performance improvements for the automated classification of rare anemia disorders based on single-cell images. This is the first approach that uses topological properties for regularizing the MIL process.	
Translation date: 2023-07-27 12:56:34 Published: 2023-07-26
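The sketch below shows one ingredient that topological regularizers commonly rely on. It assumes a Vietoris-Rips filtration over a point cloud, for example the feature vectors of the cells in one bag. Under that filtration, the finite death times of the 0-dimensional persistence diagram equal the edge lengths of a minimum spanning tree. The function names and the squared-difference `topo_loss` are hypothetical illustrations, not the authors' implementation. The code is also plain Python with no gradients, whereas a regularizer used during training would have to be differentiable.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Finite death times of 0-dim features in a Vietoris-Rips filtration.

    Every point is born at 0; each merge of two components kills one of
    them at the length of the merging edge, so the deaths are exactly the
    edge lengths of a minimum spanning tree (found here with Kruskal).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: one of them dies here
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 values, already in ascending order


def topo_loss(x_points, z_points):
    """Hypothetical regularizer: squared gap between sorted death times."""
    dx = zero_dim_persistence(x_points)
    dz = zero_dim_persistence(z_points)
    return sum((a - b) ** 2 for a, b in zip(dx, dz))
```

Comparing the diagram of the input space with that of a latent space is a common way to "enforce the preservation" of topological properties. Whether the paper uses this exact pairing is not stated in the abstract.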
# Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks ( http://arxiv.org/abs/2307.14085v1 ) License: see link	Siyu Chen, Mengdi Wang, Zhuoran Yang	We study reinforcement learning (RL) for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure. Specifically, at the outset of the game, the leader announces her policy to the follower and commits to it. The follower observes the leader's policy and, in turn, adopts a quantal response policy by solving an entropy-regularized policy optimization problem induced by the leader's policy. The goal of the leader is to find her optimal policy, which yields the optimal expected total return, by interacting with the follower and learning from data. A key challenge of this problem is that the leader cannot observe the follower's reward and needs to infer the follower's quantal response model from his actions against the leader's policies. We propose sample-efficient algorithms for both the online and offline settings, in the context of function approximation. 
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision-making problem, and we show that they achieve sublinear regret upper bounds. Moreover, we quantify the uncertainty of these estimators and leverage it to implement optimistic and pessimistic algorithms for the online and offline settings, respectively. In addition, when specialized to the linear and myopic setting, our algorithms are also computationally efficient. Our theoretical analysis features a novel performance-difference lemma that incorporates the error of the quantal response model, which might be of independent interest.	Translation date: 2023-07-27 12:52:09 Published: 2023-07-26
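Consider the one-step (bandit) special case. There, the follower's entropy-regularized problem, maximizing $\sum_a \pi(a) q(a) + \eta H(\pi)$ over the simplex, has a closed-form softmax solution. This closed form is what makes maximum likelihood estimation of the quantal response model tractable. The sketch assumes known follower values `q_values` and a temperature `eta`; both names are hypothetical. The paper's multi-step setting with function approximation is not modeled.

```python
import math


def quantal_response(q_values, eta):
    """Maximizer of sum_a pi(a) q(a) + eta * H(pi) over the simplex.

    The optimum is a softmax with temperature eta; subtracting the max
    before exponentiating keeps the computation numerically stable.
    """
    m = max(q_values)
    weights = [math.exp((q - m) / eta) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(observed_actions, q_values, eta):
    """Log-likelihood of observed follower actions under the model.

    Maximizing this over candidate q_values is the MLE step, in the
    simplified single-state case only.
    """
    probs = quantal_response(q_values, eta)
    return sum(math.log(probs[a]) for a in observed_actions)


# Example: values 0 and log(3) at temperature 1 give probabilities 1/4 and 3/4.
print(quantal_response([0.0, math.log(3)], 1.0))
```

As `eta` shrinks, the response approaches a best response (argmax). As it grows, the response approaches uniform play.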
# Convergence of Digitized-Counterdiabatic QAOA: circuit depth versus free parameters ( http://arxiv.org/abs/2307.14079v1 ) License: see link	Mara Vizzuso, Gianluca Passarelli, Giovanni Cantele, and Procolo Lucignano	Recently, the Digitized-Counterdiabatic (CD) Quantum Approximate Optimization Algorithm (QAOA) has been proposed, inspired by Trotterized counterdiabatic driving in continuous-time quantum annealing, to make QAOA converge to the solution of an optimization problem in fewer steps. In this paper, we critically revisit this approach by focusing on the paradigmatic weighted and unweighted one-dimensional MaxCut problem. We study two variants of QAOA with first- and second-order CD corrections. Our results show that higher-order CD corrections indeed allow for quicker convergence to the exact solution of the problem at hand by increasing the complexity of the variational cost function. Remarkably, however, the total number of free parameters needed to achieve this result is independent of the particular QAOA variant analyzed.	Translation date: 2023-07-27 12:51:43 Published: 2023-07-26
# VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet ( http://arxiv.org/abs/2307.14073v1 ) License: see link	Zhihao Hu, Dong Xu	Recently, diffusion models like StableDiffusion have achieved impressive image generation results. However, the generation process of such diffusion models is uncontrollable, which makes it hard to generate videos with continuous and consistent content. In this work, by using the diffusion model with ControlNet, we propose a new motion-guided video-to-video translation framework called VideoControlNet to generate various videos based on the given prompts and the condition from the input video. Inspired by video codecs that use motion information to reduce temporal redundancy, our framework uses motion information to avoid regenerating redundant areas, which keeps the content consistent. Specifically, we generate the first frame (i.e., the I-frame) by using the diffusion model with ControlNet. Then we generate the other key frames (i.e., the P-frames) from the previous I/P-frame by using our newly proposed motion-guided P-frame generation (MgPG) method, in which the P-frames are generated based on the motion information and the occluded areas are inpainted by using the diffusion model. 
Finally, the remaining frames (i.e., the B-frames) are generated by using our motion-guided B-frame interpolation (MgBI) module. Our experiments demonstrate that our proposed VideoControlNet inherits the generation capability of the pre-trained large diffusion model and extends the image diffusion model to a video diffusion model by using motion information. More results are provided on our project page.	Translation date: 2023-07-27 12:51:27 Published: 2023-07-26
# Spin-flip scattering engendered negative $\Delta_T$ noise ( http://arxiv.org/abs/2307.14072v1 ) License: see link	Tusaradri Mohapatra, Colin Benjamin	$\Delta_T$ noise generated by a temperature gradient in the absence of charge current has recently attracted a lot of interest. In this paper, for the first time, we derive spin-polarized charge $\Delta_T$ noise and spin $\Delta_T$ noise along with their shot noise-like and thermal noise-like contributions. Introducing a spin flipper at the interface of a bilayer metal junction with a temperature gradient, we examine the impact of spin-flip scattering. We analyze charge and spin $\Delta_T$ noise in detail in four distinct setups, combining two temperature regimes (first, one hot and one cold reservoir; second, reservoirs at comparable temperatures) with two bias voltage regimes (first, zero bias voltage; second, finite bias voltage). In all these regimes, we ensure that the net charge current transported is always zero. We find negative charge $\Delta_T$ noise for reservoirs at comparable temperatures, while for the case of one hot and one cold reservoir, charge $\Delta_T$ noise is positive. 
We also find that spin $\Delta_T$ noise and its thermal noise-like contribution are negative for the case of one hot and one cold reservoir. Recent work on the general bound for spin $\Delta_T$ shot noise with a spin-dependent bias suggests that it is always positive. In this paper, we find the spin $\Delta_T$ shot noise-like contribution to be negative, in contrast to the positive charge $\Delta_T$ shot noise contribution, even in the absence of any spin-dependent bias. Spin-flip scattering exhibits the intriguing effect of changing the sign of both charge and spin $\Delta_T$ noise, which can help probe spin-polarized transport.	Translation date: 2023-07-27 12:51:00 Published: 2023-07-26
# Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching ( http://arxiv.org/abs/2307.14071v1 ) License: see link	Junpeng Jing, Jiankun Li, Pengfei Xiong, Jiangyu Liu, Shuaicheng Liu, Yichen Guo, Xin Deng, Mai Xu, Lai Jiang, Leonid Sigal	Correlation-based stereo matching, which computes a cost volume between two feature maps, has achieved outstanding performance. Unfortunately, current methods with a fixed model do not work uniformly well across various datasets, which greatly limits their real-world applicability. To tackle this issue, this paper proposes a new perspective: dynamically calculating the correlation for robust stereo matching. A novel Uncertainty Guided Adaptive Correlation (UGAC) module is introduced to robustly adapt the same model to different scenarios. Specifically, a variance-based uncertainty estimate is used to adaptively adjust the sampling area during the warping operation. Additionally, we improve the traditional non-parametric warping with learnable parameters, so that position-specific weights can be learned. We show that by equipping the recurrent network with the UGAC module, stereo matching can be performed more robustly and effectively. 
Extensive experiments demonstrate that our method achieves state-of-the-art performance on the ETH3D, KITTI, and Middlebury datasets when the same fixed model is used across these datasets without any retraining. To target real-time applications, we further design a lightweight model based on UGAC, which also outperforms other methods on the KITTI benchmarks with only 0.6 M parameters.	Translation date: 2023-07-27 12:50:20 Published: 2023-07-26
# PNT-Edge: Towards Robust Edge Detection with Noisy Labels by Learning Pixel-level Noise Transitions ( http://arxiv.org/abs/2307.14070v1 ) License: see link	Wenjie Xuan, Shanshan Zhao, Yu Yao, Juhua Liu, Tongliang Liu, Yixin Chen, Bo Du, Dacheng Tao	Relying on large-scale training data with pixel-level labels, previous edge detection methods have achieved high performance. However, it is hard to label edges accurately by hand, especially for large datasets, so the datasets inevitably contain noisy labels. This label-noise issue has been studied extensively for classification but remains under-explored for edge detection. To address the label-noise issue for edge detection, this paper proposes to learn Pixel-level Noise Transitions to model the label-corruption process. To achieve this, we develop a novel Pixel-wise Shift Learning (PSL) module to estimate the transition from clean to noisy labels as a displacement field. Exploiting the estimated noise transitions, our model, named PNT-Edge, is able to fit the prediction to clean labels. In addition, a local edge density regularization term is devised to exploit local structure information for better transition learning. This term encourages learning large shifts for edges with complex local structures. Experiments on SBD and Cityscapes demonstrate the effectiveness of our method in relieving the impact of label noise. Code will be available on GitHub.	
Translation date: 2023-07-27 12:49:58 Published: 2023-07-26
# Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation ( http://arxiv.org/abs/2307.14068v1 ) License: see link	Long Liu, Bo Zhou, Zhipeng Zhao, Zening Liu	Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from related source domains to an unlabeled target domain. While recent MUDA methods have shown promising results, most focus on aligning the overall feature distributions across source domains, which can have negative effects due to redundant features within each domain. Moreover, there is a significant performance gap between MUDA and supervised methods. To address these challenges, we propose a novel approach called Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA). First, we establish a multi-source dynamic modulation mechanism during training, based on the degree of distribution difference between the source and target domains. This mechanism controls the alignment level of features between each source domain and the target domain, effectively leveraging the locally advantageous feature information within the source domains. 
Additionally, we propose a Multi-source Active Boundary Sample Selection (MABS) strategy, which uses a guided dynamic boundary loss to design an efficient query function for selecting important samples. This strategy improves generalization to the target domain with minimal sampling costs. We extensively evaluate our proposed method on commonly used domain adaptation datasets, comparing it against existing UDA and ADA methods. The experimental results demonstrate the superiority of our approach.	Translation date: 2023-07-27 12:49:40 Published: 2023-07-26
# Machine Learning Applications In Healthcare: The State Of Knowledge and Future Directions ( http://arxiv.org/abs/2307.14067v1 ) License: see link	Mrinmoy Roy, Sarwar J. Minar, Porarthi Dhar, A T M Omor Faruq	The ability to detect easily missed hidden patterns with fast processing makes machine learning (ML) indispensable to today's healthcare system. Although many ML applications have already been discovered and many more are under investigation, only a few have been adopted by current healthcare systems. As a result, there is an enormous opportunity for ML in healthcare. However, scattered information and a scarcity of well-organized, easily explainable documentation in the sector are major impediments that make ML applications difficult for healthcare professionals to use. This study aims to gather ML applications in different areas of healthcare concisely and effectively so that the necessary information, with relevant references, can be accessed immediately. We divided our study into five major groups: community-level work, risk management/preventive care, healthcare operation management, remote care, and early detection. Dividing these groups into subgroups, we provide relevant references with descriptions in tabular form for quick access. 
Our objective is to inform people about ML applicability in the healthcare industry, reduce clinicians' knowledge gap about ML applications, and motivate healthcare professionals towards more machine-learning-based healthcare systems.	Translation date: 2023-07-27 12:49:04 Published: 2023-07-26
# Pre-Training with Diffusion models for Dental Radiography segmentation ( http://arxiv.org/abs/2307.14066v1 ) License: see link	Jérémy Rousseau, Christian Alaka, Emma Covili, Hippolyte Mayard, Laura Misrachi, Willy Au	Medical radiography segmentation, and dental radiography segmentation in particular, is highly limited by the cost of labeling, which requires specific expertise and labor-intensive annotations. In this work, we propose a straightforward pre-training method for semantic segmentation that leverages Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our approach achieves remarkable performance in terms of label efficiency and does not require architectural modifications between pre-training and downstream tasks. We propose to first pre-train a Unet by exploiting the DDPM training objective, and then fine-tune the resulting model on a segmentation task. Our experimental results on the segmentation of dental radiographs demonstrate that the proposed method is competitive with state-of-the-art pre-training methods.	Translation date: 2023-07-27 12:48:10 Published: 2023-07-26
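For reference, the standard DDPM training objective noises a clean input as $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ and trains a network to predict $\epsilon$ from $(x_t, t)$. Below is a minimal plain-Python sketch of one loss sample. It assumes the commonly used linear $\beta$ schedule (1000 steps, $\beta$ from $10^{-4}$ to $0.02$) and a hypothetical `predict_noise` callable standing in for the Unet that would later be fine-tuned for segmentation. It is not the authors' training code.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def ddpm_loss(x0, predict_noise, alpha_bars, rng):
    """One Monte Carlo sample of the simplified (epsilon-prediction) loss."""
    t = rng.randrange(len(alpha_bars))           # uniform timestep
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]      # standard Gaussian noise
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    eps_hat = predict_noise(x_t, t)              # network output, same shape
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(x0)


# Usage with a trivial stand-in predictor that always outputs zeros.
ab = linear_alpha_bars()
print(ddpm_loss([0.5, -0.2, 1.0], lambda x_t, t: [0.0] * len(x_t), ab, random.Random(0)))
```

In the approach described above, the network trained with this objective would then be fine-tuned with a segmentation loss. The abstract states that no architectural change is needed between the two stages.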
# ECO: Ensembling Context Optimization for Vision-Language Models ( http://arxiv.org/abs/2307.14063v1 ) License: see link	Lorenzo Agnolucci, Alberto Baldrati, Francesco Todino, Federico Becattini, Marco Bertini, Alberto Del Bimbo	Image recognition has recently witnessed a paradigm shift, where vision-language models are now used to perform few-shot classification based on textual prompts. Among these, the CLIP model has shown remarkable capabilities for zero-shot transfer by matching an image and a custom textual prompt in its latent space. This has paved the way for several works that focus on engineering or learning textual contexts to maximize CLIP's classification capabilities. In this paper, we follow this trend by learning an ensemble of prompts for image classification. We show that learning diverse and possibly shorter contexts considerably and consistently improves the results compared with relying on a single trainable prompt. In particular, we report better few-shot capabilities with no additional cost at inference time. We demonstrate the capabilities of our approach on 11 different benchmarks.	Translation date: 2023-07-27 12:47:33 Published: 2023-07-26
# Violation of Bohigas-Giannoni-Schmit conjecture using an integrable many-body Floquet system ( http://arxiv.org/abs/2307.14122v1 ) License: see link	Harshit Sharma, Udaysinh T. Bhosale	Earlier studies have given ample evidence in support of the Bohigas-Giannoni-Schmit (BGS) conjecture, with a few exceptions violating it. Here, we provide one more counterexample using a many-body system popularly known as the quantum kicked top, consisting of $N$ qubits with all-to-all interaction and kicking strength $k=N\pi/2$. We show that it is quantum integrable even though the corresponding semiclassical phase space is chaotic, thus violating the BGS conjecture. We solve the cases of $N=5$ to $11$ qubits analytically, finding the eigensystem, the entanglement dynamics, and the unitary evolution operator. For the general case of $N>11$ qubits, we provide numerical evidence of integrability using the degenerate spectrum and the exact periodicity of both the time-evolved unitary evolution operator and the entanglement dynamics.	Translation date: 2023-07-27 12:39:30 Published: 2023-07-26
# Cliqueful graphs as a means of calculating the maximal number of maximum cliques of simple graphs ( http://arxiv.org/abs/2307.14120v1 ) License: see link	Dániel Pfeifer	A simple graph on $n$ vertices may contain many maximum cliques. But how many can it potentially contain? We will show that the maximum number of maximum cliques is attained by so-called cliqueful graphs. More specifically, we will show that it is attained by saturated composite cliqueful graphs if $n \ge 15$. Using this, we will show that the graph containing $3^{\lfloor n/3 \rfloor}c$ maxcliques has the largest number of maxcliques on $n$ vertices, where $c\in\{1,\frac{4}{3},2\}$, depending on $n \text{ mod } 3$.	Translation date: 2023-07-27 12:39:14 Published: 2023-07-26
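The count can be checked on small cases. The sketch assumes the classical extremal construction, the same one that attains the Moon-Moser bound for maximal cliques. That construction is a complete multipartite graph whose parts have size 3, except for one part of size 4 when $n \equiv 1 \pmod 3$ or one part of size 2 when $n \equiv 2 \pmod 3$. Every maximum clique takes exactly one vertex from each part, so the number of maximum cliques is the product of the part sizes. The brute-force counter below confirms that this graph reaches $3^{\lfloor n/3 \rfloor}c$ for $2 \le n \le 9$. It does not reproduce the paper's proof that no other graph does better.

```python
from itertools import combinations
from math import prod


def bound(n):
    """3^floor(n/3) * c with c in {1, 4/3, 2}, in exact integer arithmetic."""
    if n < 2:
        raise ValueError("this sketch states the formula for n >= 2")
    q, r = divmod(n, 3)
    if r == 0:
        return 3 ** q
    if r == 1:
        return 4 * 3 ** (q - 1)  # equals (4/3) * 3^q
    return 2 * 3 ** q


def part_sizes(n):
    """Part sizes of a complete multipartite graph attaining the bound."""
    q, r = divmod(n, 3)
    if r == 0:
        return [3] * q
    if r == 1:
        return [3] * (q - 1) + [4]
    return [3] * q + [2]


def count_maximum_cliques(n, adjacent):
    """Brute force: number of cliques of the largest size (small n only)."""
    for k in range(n, 0, -1):
        count = sum(
            1
            for subset in combinations(range(n), k)
            if all(adjacent(u, v) for u, v in combinations(subset, 2))
        )
        if count:
            return count
    return 0


# Vertices in different parts are adjacent; vertices in the same part are not.
for n in range(2, 10):
    sizes = part_sizes(n)
    part = [i for i, size in enumerate(sizes) for _ in range(size)]
    found = count_maximum_cliques(n, lambda u, v: part[u] != part[v])
    assert found == bound(n) == prod(sizes), (n, found)
```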
# A semantics-driven methodology for high-quality image annotation ( http://arxiv.org/abs/2307.14119v1 ) License: see link	Fausto Giunchiglia, Mayukh Bagchi and Xiaolei Diao	Recent work in Machine Learning and Computer Vision has highlighted the presence of various types of systematic flaws inside ground truth object recognition benchmark datasets. Our basic tenet is that these flaws are rooted in the many-to-many mappings that exist between the visual information encoded in images and the intended semantics of the labels annotating them. The net consequence is that the current annotation process is largely under-specified, leaving too much freedom to the subjective judgment of annotators. In this paper, we propose vTelos, an integrated Natural Language Processing, Knowledge Representation, and Computer Vision methodology whose main goal is to make explicit the (otherwise implicit) intended annotation semantics, thus minimizing the number and role of subjective choices. A key element of vTelos is the exploitation of the WordNet lexico-semantic hierarchy as the main means of providing the meaning of natural language labels and, as a consequence, of driving the annotation of images based on the objects and the visual properties they depict. The methodology is validated on images populating a subset of the ImageNet hierarchy.	Translation date: 2023-07-27 12:38:56 Published: 2023-07-26
# Leveraging Implicit Feedback from Deployment Data in Dialogue ( http://arxiv.org/abs/2307.14117v1 ) License: see link	Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston	We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals such as user response length, sentiment, and the reaction in future human utterances in the collected dialogue episodes. Our experiments use the publicly released deployment data from BlenderBot (Xu et al., 2023). Human evaluation indicates improvements of our new models over baseline responses; however, we find that some proxy signals can also lead to more generations with undesirable properties. For example, optimizing for conversation length can lead to more controversial or unfriendly generations compared to the baseline, whereas optimizing for positive sentiment or reaction can decrease these behaviors.	Translation date: 2023-07-27 12:38:35 Published: 2023-07-26
# Imaginarity of Gaussian states ( http://arxiv.org/abs/2307.14116v1 ) License: see link	Jianwei Xu	There has been a long-standing debate about why quantum mechanics uses complex numbers rather than only real numbers. To address this topic, imaginarity theory has been developed in recent years within the framework of quantum resource theory. However, the existing imaginarity theory mainly focuses on quantum systems with finite dimensions. Gaussian states are widely used in many fields of quantum physics, but they live in infinite-dimensional quantum systems. In this paper, we establish a resource theory of imaginarity for bosonic Gaussian states. To do so, we determine, in the Fock basis, the real Gaussian states and real Gaussian channels in terms of the means and covariance matrices of Gaussian states. We also provide two imaginarity measures for Gaussian states based on the fidelity.	Translation date: 2023-07-27 12:38:23 Published: 2023-07-26
# Periocular biometrics: databases, algorithms and directions ( http://arxiv.org/abs/2307.14111v1 ) License: see link	Fernando Alonso-Fernandez, Josef Bigun	Periocular biometrics has been established as an independent modality due to concerns about the performance of iris or face systems in uncontrolled conditions. Periocular refers to the facial region in the eye vicinity, including eyelids, lashes and eyebrows. It is available over a wide range of acquisition distances, representing a trade-off between the whole face (which can be occluded at close distances) and the iris texture (which does not have enough resolution at long distances). Since the periocular region appears in face or iris images, it can also be used in conjunction with these modalities. Features extracted from the periocular region have also been used successfully for gender and ethnicity classification, and to study the impact of gender transformation or plastic surgery on recognition performance. This paper presents a review of the state of the art in periocular biometric research, providing insight into the most relevant issues and giving thorough coverage of the existing literature. Future research trends are also briefly discussed.	Translation date: 2023-07-27 12:38:11 Published: 2023-07-26
# GraphRNN Revisited: An Ablation Study and Extensions for Directed Acyclic Graphs ( http://arxiv.org/abs/2307.14109v1 ) License: see link	Taniya Das, Mark Koch, Maya Ravichandran, Nikhil Khatri	GraphRNN is a deep learning-based architecture proposed by You et al. for learning generative models of graphs. We replicate the results of You et al. using a reproduced implementation of the GraphRNN architecture and evaluate it against baseline models using new metrics. Through an ablation study, we find that the BFS traversal suggested by You et al. to collapse representations of isomorphic graphs contributes significantly to model performance. Additionally, we extend GraphRNN to generate directed acyclic graphs by replacing the BFS traversal with a topological sort. We demonstrate that this method improves significantly over a directed-multiclass variant of GraphRNN on a real-world dataset.	Translation date: 2023-07-27 12:37:56 Published: 2023-07-26
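The sketch below illustrates the ordering change. It assumes Kahn's algorithm as the topological sort; the abstract does not say which algorithm or tie-breaking rule is used. Once nodes are in topological order, every edge runs from an earlier position to a later one. A GraphRNN-style generator therefore only needs to predict, for each new node, which previously generated nodes point to it, because the order already implies each edge's direction. `adjacency_sequence` is a hypothetical helper, not the authors' code.

```python
from collections import deque


def topological_order(n, edges):
    """Kahn's algorithm on nodes 0..n-1; raises ValueError on a cycle."""
    indegree = [0] * n
    successors = [[] for _ in range(n)]
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(i for i in range(n) if indegree[i] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("graph contains a cycle")
    return order


def adjacency_sequence(n, edges, order):
    """Row i lists, for each earlier node j in `order`, whether j -> i.

    Under a topological order no edge points backwards, so these
    lower-triangular rows encode the whole DAG, directions included.
    """
    edge_set = set(edges)
    return [
        [int((order[j], order[i]) in edge_set) for j in range(i)]
        for i in range(n)
    ]


edges = [(2, 0), (2, 1), (0, 3), (1, 3)]
order = topological_order(4, edges)              # [2, 0, 1, 3]
print(adjacency_sequence(4, edges, order))       # [[], [1], [1, 0], [0, 1, 1]]
```

A BFS ordering of an undirected graph also yields lower-triangular rows, but they do not carry edge direction. That is why a topological order is the natural replacement for DAGs.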
# Decoding ChatGPT: A Taxonomy of Existing Research, Current Challenges, and Possible Future Directions ( http://arxiv.org/abs/2307.14107v1 ) License: see link	Shahab Saquib Sohail, Faiza Farhat, Yassine Himeur, Mohammad Nadeem, Dag Øivind Madsen, Yashbir Singh, Shadi Atalla and Wathiq Mansoor	Chat Generative Pre-trained Transformer (ChatGPT) has gained significant interest and attention since its launch in November 2022. It has shown impressive performance in various domains, including passing exams and creative writing. However, challenges and concerns related to biases and trust persist. In this work, we present a comprehensive review of over 100 Scopus-indexed publications on ChatGPT, aiming to provide a taxonomy of ChatGPT research and explore its applications. We critically analyze the existing literature, identifying common approaches employed in the studies. Additionally, we investigate diverse application areas where ChatGPT has found utility, such as healthcare, marketing and financial services, software engineering, academic and scientific writing, research and education, environmental science, and natural language processing. By examining these applications, we gain valuable insights into the potential of ChatGPT in addressing real-world challenges. 
We also discuss crucial issues related to ChatGPT, including biases and trustworthiness, emphasizing the need for further research and development in these areas. Furthermore, we identify potential future directions for ChatGPT research, proposing solutions to current challenges and speculating on expected advancements. By fully leveraging the capabilities of ChatGPT, we can unlock its potential across various domains, leading to advancements in conversational AI and transformative impacts in society.	翻訳日:2023-07-27 12:37:45 公開日:2023-07-26
# Wigner Analysis of Particle Dynamics in Wide Nonharmonic Potentials ( http://arxiv.org/abs/2307.14106v1 ) License: see link	Andreu Riera-Campeny and Marc Roda-Llordes and Piotr T. Grochowski and Oriol Romero-Isart	We derive an analytical expression for a Wigner function that approximately describes the time evolution of the one-dimensional motion of a particle in a nonharmonic potential. Our result provides an excellent approximation in the regime of wide potentials and small fluctuations, namely potentials that enable spatial expansions orders of magnitude larger than that of the initial state but that remain small compared to the relevant dynamical length scale (e.g., the distance between turning points). Our analytical result elucidates the interplay between classical and quantum physics and the impact of decoherence during nonlinear dynamics. This analytical result is instrumental in designing, optimizing, and understanding proposals that use nonlinear dynamics to generate macroscopic quantum states of massive particles.	Translated: 2023-07-27 12:37:22 Published: 2023-07-26
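For reference, the standard definition of the Wigner function of a one-dimensional state with density operator $\hat\rho$ is given below; this is only the object being approximated, not the paper's time-dependent expression. For a pure state $\psi$ it reduces to $\frac{1}{\pi\hbar}\int \psi^*(x+y)\,\psi(x-y)\,e^{2ipy/\hbar}\,dy$.

```latex
W(x, p) = \frac{1}{2\pi\hbar} \int_{-\infty}^{\infty}
  \langle x + \tfrac{y}{2} \,|\, \hat\rho \,|\, x - \tfrac{y}{2} \rangle \,
  e^{-i p y/\hbar} \, dy,
\qquad
\int\!\!\int W(x, p)\, dx\, dp = \operatorname{Tr}\hat\rho = 1 .
```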
# Error channels in quantum nondemolition measurements on spin systems ( http://arxiv.org/abs/2307.14103v1 ) License: see link	Benjamin Joecker, Holly G. Stemp, Irene Fernández de Fuentes, Mark A. I. Johnson, Andrea Morello	Quantum nondemolition (QND) measurements are a precious resource for quantum information processing. Repetitive QND measurements can boost the fidelity of qubit preparation and measurement, even when the underlying single-shot measurements are of low fidelity. However, this fidelity boost is limited by the degree to which the physical system allows for a truly QND process: slight deviations from ideal QND measurement result in bit flip errors ('quantum jumps') if the measurement is repeated too often. Here, we develop a theoretical framework to understand and quantify the resulting error arising from deviation from perfect QND measurement in model spin qubit systems. We first develop our model on the ubiquitous example of exchange-coupled electron spin qubits tunnel-coupled to a charge reservoir. We then extend it to electron-nuclear spin systems, to illustrate the crucial similarities and differences between the two limits. Applied to the well-understood platform of a donor nuclear spin in silicon, the model shows excellent agreement with experiments. For added generality, we conclude the work by considering the effect of anisotropic spin couplings.	Translated: 2023-07-27 12:37:08 Published: 2023-07-26
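Why repetition helps can be seen in an idealized model: if every readout is perfectly QND (the state never flips between repeats) and each readout errs independently, a majority vote over an odd number of readouts beats a single shot whenever the single-shot fidelity exceeds 0.5. A minimal sketch under exactly those assumptions; the function name is hypothetical and this is not the paper's error model.

```python
from math import comb


def majority_vote_fidelity(single_shot_fidelity: float, repeats: int) -> float:
    """Probability that a majority vote over `repeats` readouts is correct,
    assuming an ideal QND measurement (no state flips between repeats) and
    independent readout errors with probability 1 - fidelity each."""
    if repeats < 1 or repeats % 2 == 0:
        raise ValueError("use an odd, positive number of repeats to avoid ties")
    f = single_shot_fidelity
    needed = repeats // 2 + 1  # smallest number of correct readouts that wins the vote
    return sum(
        comb(repeats, k) * f**k * (1 - f) ** (repeats - k)
        for k in range(needed, repeats + 1)
    )


for n in (1, 3, 11):
    print(n, round(majority_vote_fidelity(0.8, n), 4))
```

The abstract's point is that real measurements are not perfectly QND: each repetition carries a small chance of flipping the state, which this idealized model omits and which caps the achievable gain.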
# Toward Design of Synthetic Active Inference Agents by Mere Mortals ( http://arxiv.org/abs/2307.14145v1 ) License: see link	Bert de Vries	The theoretical properties of active inference agents are impressive, but how do we realize effective agents in working hardware and software on edge devices? This is an interesting problem because the computational load for policy exploration explodes exponentially, while the computational resources of edge devices are very limited. In this paper, we discuss the necessary features for a software toolbox that supports a competent non-expert engineer in developing working active inference agents. We introduce a toolbox-in-progress that aims to accelerate the democratization of active inference agents in the same way that TensorFlow propelled applications of deep learning technology.	Translated: 2023-07-27 12:30:56 Published: 2023-07-26
# LOIS: Looking Out of Instance Semantics for Visual Question Answering ( http://arxiv.org/abs/2307.14142v1 ) License: see link	Siyu Zhang, Yeming Chen, Yaoru Sun, Fang Wang, Haibo Shi, Haoran Wang	Visual question answering (VQA) has been intensively studied as a multimodal task that requires effort in bridging vision and language to infer answers correctly. Recent attempts have developed various attention-based modules for solving VQA tasks. However, the performance of model inference is largely bottlenecked by visual processing for semantics understanding. Most existing detection methods rely on bounding boxes, which makes it a serious challenge for VQA models to understand the causal nexus of object semantics in images and to correctly infer contextual information. To this end, we propose a finer model framework without bounding boxes, termed Looking Out of Instance Semantics (LOIS), to tackle this important issue. LOIS enables more fine-grained feature descriptions to produce visual facts. Furthermore, to overcome the label ambiguity caused by instance masks, two types of relation attention modules, 1) intra-modality and 2) inter-modality, are devised to infer the correct answers from the different multi-view features. Specifically, we implement a mutual relation attention module to model sophisticated and deeper visual semantic relations between instance objects and background information. In addition, our proposed attention model can further analyze salient image regions by focusing on important word-related questions. Experimental results on four benchmark VQA datasets show that our proposed method has favorable performance in improving visual reasoning capability.	Translated: 2023-07-27 12:30:45 Published: 2023-07-26
# Single-flux-quantum-based Qubit Control with Tunable Driving Strength ( http://arxiv.org/abs/2307.14140v1 ) License: see link	Kuang Liu, Yifan Wang, Bo Ji, Wanpeng Gao, Zhirong Lin, Zhen Wang	Single-flux-quantum (SFQ) circuits have great potential for building cryogenic quantum-classical interfaces to scale up superconducting quantum processors. SFQ-based quantum gates have been designed and realized. However, with current control schemes it is difficult to tune the driving strength applied to qubits, which restricts the gate length and usually induces leakage to unwanted levels. In this study, we design a scheme and a corresponding pulse generator circuit that continuously adjust the driving strength by coupling SFQ pulses with variable intervals. This scheme not only provides a way to adjust the SFQ-based gate length, but also opens the possibility of tuning the driving strength envelope. Simulations show that our scheme can suppress leakage to unwanted levels and reduce the error of SFQ-based Clifford gates by more than an order of magnitude.	Translated: 2023-07-27 12:30:23 Published: 2023-07-26
# 因果関係の報酬を伴う部分的定常組合せ半バンド Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards ( http://arxiv.org/abs/2307.14138v1 ) ライセンス: Link先を確認	Behzad Nourani-Koliji, Steven Bilaj, Amir Rezaei Balef, Setareh Maghsudi	(参考訳) 本稿では,因果関係の報酬を用いた定位半帯域問題について検討する。非定常環境では、ベースアームの分布の変化、報酬間の因果関係、またはその両方が報酬生成プロセスを変化させる。このような環境では、最適な意思決定者は、両方の変化源を従わなければならない。この問題は、意思決定者が選択された腕の束の結果のみを観察する組合せ半バンド設定において悪化する。提案するポリシの中核は、Upper Confidence Bound (UCB)アルゴリズムである。エージェントはこの課題を克服するために適応的なアプローチに依存していると仮定する。具体的には、GLR(Generalized Likelihood Ratio)テストに基づく変更点検出器を用いる。さらに、構造化環境における意思決定プロセスにおける新たな再起動戦略としてグループ再スタートの概念を導入する。最後に,提案アルゴリズムは,基礎となるグラフ構造の変動をトレースする機構を統合し,バンディット設定における報酬間の因果関係をキャプチャする。理論的には,構造および分布の変化が性能に与える影響を反映した,後悔の上限を確立する。実世界のシナリオにおける数値実験の結果から,提案手法の適用性と性能は,最先端ベンチマークと比較して良好であった。 We study the piecewise stationary combinatorial semi-bandit problem with causally related rewards. In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process. In such an environment, an optimal decision-maker must follow both sources of change and adapt accordingly. The problem becomes aggravated in the combinatorial semi-bandit setting, where the decision-maker only observes the outcome of the selected bundle of arms. The core of our proposed policy is the Upper Confidence Bound (UCB) algorithm. We assume the agent relies on an adaptive approach to overcome the challenge. More specifically, it employs a change-point detector based on the Generalized Likelihood Ratio (GLR) test. Besides, we introduce the notion of group restart as a new alternative restarting strategy in the decision making process in structured environments. Finally, our algorithm integrates a mechanism to trace the variations of the underlying graph structure, which captures the causal relationships between the rewards in the bandit setting. 
Theoretically, we establish a regret upper bound that reflects the effects of the number of structural- and distribution changes on the performance. The outcome of our numerical experiments in real-world scenarios exhibits applicability and superior performance of our proposal compared to the state-of-the-art benchmarks.	翻訳日:2023-07-27 12:30:08 公開日:2023-07-26
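The abstract names a GLR-based change-point detector but not its exact form. A minimal sketch, assuming Bernoulli rewards and the Bernoulli GLR statistic commonly used in piecewise-stationary bandit work: for every split point it compares a "two means" model against a "one mean" model. In a UCB policy, a detection would trigger a restart of the affected statistics (or, per the abstract, a group restart). The function names and the threshold value are illustrative assumptions.

```python
import math
from typing import Sequence


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(rewards: Sequence[float]) -> float:
    """Largest log-likelihood gain, over all split points s, of a 'change after s'
    model (separate means before and after s) versus a 'no change' model."""
    n = len(rewards)
    if n < 2:
        return 0.0
    prefix = [0.0]
    for r in rewards:
        prefix.append(prefix[-1] + r)
    overall = prefix[n] / n
    best = 0.0
    for s in range(1, n):
        left = prefix[s] / s
        right = (prefix[n] - prefix[s]) / (n - s)
        stat = s * bernoulli_kl(left, overall) + (n - s) * bernoulli_kl(right, overall)
        best = max(best, stat)
    return best


def change_detected(rewards: Sequence[float], threshold: float) -> bool:
    """Flag a change point when the GLR statistic exceeds `threshold` (a free parameter here)."""
    return glr_statistic(rewards) > threshold


stationary = [1.0, 0.0] * 25       # mean 0.5 throughout
shifted = [0.0] * 25 + [1.0] * 25  # mean jumps from 0 to 1
print(change_detected(stationary, 10.0), change_detected(shifted, 10.0))
```

In practice the threshold grows slowly with the number of observations and a confidence parameter; a fixed value is used here only to keep the sketch short.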
# Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models ( http://arxiv.org/abs/2307.14134v1 ) License: see link	Himmet Toprak Kesgin, Muzaffer Kaan Yuce, Mehmet Fatih Amasyali	This study introduces and evaluates tiny, mini, small, and medium-sized uncased Turkish BERT models, aiming to bridge the research gap in less-resourced languages. We trained these models on a diverse dataset encompassing over 75GB of text from multiple sources and tested them on several tasks, including mask prediction, sentiment analysis, news classification, and zero-shot classification. Despite their smaller size, our models exhibited robust performance, including on the zero-shot task, while ensuring computational efficiency and faster execution times. Our findings provide valuable insights into the development and application of smaller language models, especially in the context of the Turkish language.	Translated: 2023-07-27 12:29:51 Published: 2023-07-26
# Say Goodbye to RNN-T Loss: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition ( http://arxiv.org/abs/2307.14132v1 ) License: see link	Tian-Hao Zhang, Dinghao Zhou, Guiping Zhon, Baoxiang Li	RNN-T models are widely used in ASR and rely on the RNN-T loss to achieve length alignment between the input audio and the target sequence. However, the implementation complexity and the alignment-based optimization target of the RNN-T loss lead to computational redundancy and a reduced role for the predictor network, respectively. In this paper, we propose a novel model named CIF-Transducer (CIF-T), which incorporates the Continuous Integrate-and-Fire (CIF) mechanism into the RNN-T model to achieve efficient alignment. In this way, the RNN-T loss is abandoned, which reduces computation and gives the predictor network a more significant role. We also introduce Funnel-CIF, Context Blocks, a Unified Gating and Bilinear Pooling joint network, and an auxiliary training strategy to further improve performance. Experiments on the 178-hour AISHELL-1 and 10000-hour WenetSpeech datasets show that CIF-T achieves state-of-the-art results with lower computational overhead compared to RNN-T models.	Translated: 2023-07-27 12:29:37 Published: 2023-07-26
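CIF-T builds on the Continuous Integrate-and-Fire (CIF) mechanism, which aligns frames to tokens by accumulating a per-frame weight and emitting a token whenever the running sum reaches a threshold. A minimal sketch of that basic step, assuming the per-frame weights are already given (in a real model they are predicted from the encoder output) and using plain Python lists; it does not reproduce Funnel-CIF or any other CIF-T component, and the function name is hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def continuous_integrate_and_fire(
    frames: Sequence[Sequence[float]],
    alphas: Sequence[float],
    threshold: float = 1.0,
) -> List[Vector]:
    """Accumulate per-frame weights; each time the running sum reaches `threshold`,
    fire one token-level vector equal to the weighted sum of the frames it covers.
    The weight of the frame that crosses the threshold is split between the token
    that fires and the next one. A leftover partial token at the end is dropped."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if not frames:
        return []
    dim = len(frames[0])
    outputs: List[Vector] = []
    acc_weight = 0.0
    acc_vec = [0.0] * dim
    for h, alpha in zip(frames, alphas):
        remaining = alpha
        while acc_weight + remaining >= threshold:
            used = threshold - acc_weight  # part of alpha that completes the current token
            outputs.append([a + used * x for a, x in zip(acc_vec, h)])
            remaining -= used
            acc_weight, acc_vec = 0.0, [0.0] * dim
        acc_weight += remaining
        acc_vec = [a + remaining * x for a, x in zip(acc_vec, h)]
    return outputs


# Four 1-D encoder frames whose weights sum to 2.0, so two tokens fire.
print(continuous_integrate_and_fire([[1.0], [2.0], [3.0], [4.0]], [0.25, 0.5, 0.75, 0.5]))
# [[2.0], [3.5]]
```

Because the number of fired tokens equals the (integer part of the) summed weights, the output length is controlled by the weights rather than by a separate alignment loss.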
# Quasiparticle Dynamics in Superconducting Quantum-Classical Hybrid Circuits ( http://arxiv.org/abs/2307.14130v1 ) License: see link	Kuang Liu, Xiaoliang He, Zhengqi Niu, Hang Xue, Wenbing Jiang, Liliang Ying, Wei Peng, Masaaki Maezawa, Zhirong Lin, Xiaoming Xie, Zhen Wang	Single flux quantum (SFQ) circuitry is a promising candidate for a scalable and integratable cryogenic quantum control system. However, the operation of SFQ circuits introduces non-equilibrium quasiparticles (QPs), which are a significant source of qubit decoherence. In this study, we investigate QP behavior in a superconducting quantum-classical hybrid chip that comprises an SFQ circuit and a qubit circuit. By monitoring qubit relaxation time, we explore the dynamics of SFQ-circuit-induced QPs. Our findings reveal that the QP density near the qubit reaches its peak after several microseconds of SFQ circuit operation, which corresponds to the phonon-mediated propagation time of QPs in the hybrid circuits. This suggests that phonon-mediated propagation dominates the spreading of QPs in the hybrid circuits. Our results lay the foundation for suppressing QP poisoning in quantum-classical hybrid systems.	Translated: 2023-07-27 12:29:18 Published: 2023-07-26
# Creative Birds: Self-Supervised Single-View 3D Style Transfer ( http://arxiv.org/abs/2307.14127v1 ) License: see link	Renke Wang, Guimin Que, Shuo Chen, Xiang Li, Jun Li, Jian Yang	In this paper, we propose a novel method for single-view 3D style transfer that generates a unique 3D object with both shape and texture transfer. Our focus lies primarily on birds, a popular subject in 3D reconstruction, for which no existing single-view 3D transfer methods have been developed. The method we propose seeks to generate a 3D mesh shape and texture of a bird from two single-view images. To achieve this, we introduce a novel shape transfer generator that comprises a dual residual gated network (DRGNet) and a multi-layer perceptron (MLP). DRGNet extracts the features of the source and target images using a shared coordinate gate unit, while the MLP generates spatial coordinates for building a 3D mesh. We also introduce a semantic UV texture transfer module that implements textural style transfer using semantic UV segmentation, which ensures consistency in the semantic meaning of the transferred regions. This module can be widely adapted to many existing approaches. Finally, our method constructs a novel 3D bird using a differentiable renderer. Experimental results on the CUB dataset verify that our method achieves state-of-the-art performance on the single-view 3D style transfer task. Code is available at https://github.com/wrk226/2D-to-3D-Evolution-Transfer.	Translated: 2023-07-27 12:29:06 Published: 2023-07-26
# Multi-modal Learning with Missing Modality via Shared-Specific Feature Modelling ( http://arxiv.org/abs/2307.14126v1 ) License: see link	Hu Wang, Yuanhong Chen, Congbo Ma, Jodie Avery, Louise Hull, Gustavo Carneiro	The missing modality issue is critical but non-trivial for multi-modal models to solve. Current methods that handle the missing modality problem in multi-modal tasks either deal with missing modalities only during evaluation or train separate models to handle specific missing modality settings. In addition, these models are designed for specific tasks; for example, classification models are not easily adapted to segmentation tasks and vice versa. In this paper, we propose the Shared-Specific Feature Modelling (ShaSpec) method, which is considerably simpler and more effective than competing approaches that address the issues above. ShaSpec is designed to take advantage of all available input modalities during training and evaluation by learning shared and specific features to better represent the input data. This is achieved through a strategy that relies on auxiliary tasks based on distribution alignment and domain classification, in addition to a residual feature fusion procedure. Also, the design simplicity of ShaSpec enables its easy adaptation to multiple tasks, such as classification and segmentation. Experiments are conducted on both medical image segmentation and computer vision classification, with results indicating that ShaSpec outperforms competing methods by a large margin. For instance, on BraTS2018, ShaSpec improves the SOTA by more than 3% for enhancing tumour, 5% for tumour core and 3% for whole tumour.	Translated: 2023-07-27 12:28:43 Published: 2023-07-26
# Memory-Efficient Graph Convolutional Networks for Object Classification and Detection with Event Cameras ( http://arxiv.org/abs/2307.14124v1 ) License: see link	Kamil Jeziorek, Andrea Pinna, Tomasz Kryjak	Recent advances in event camera research emphasize processing data in its original sparse form, which allows the use of its unique features such as high temporal resolution, high dynamic range, low latency, and resistance to image blur. One promising approach for analyzing event data is through graph convolutional networks (GCNs). However, current research in this domain primarily focuses on optimizing computational costs, neglecting the associated memory costs. In this paper, we consider both factors together in order to achieve satisfying results and relatively low model complexity. For this purpose, we performed a comparative analysis of different graph convolution operations, considering factors such as execution time, the number of trainable model parameters, data format requirements, and training outcomes. Our results show a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation while maintaining a classification accuracy of 52.3%, which is 6.3% higher compared to the operation used in state-of-the-art approaches. To further evaluate performance, we implemented an object detection architecture and evaluated its performance on the N-Caltech101 dataset. The results showed an accuracy of 53.7% mAP@0.5 and an execution rate of 82 graphs per second.	Translated: 2023-07-27 12:28:20 Published: 2023-07-26
# AI and Education: An Investigation into the Use of ChatGPT for Systems Thinking ( http://arxiv.org/abs/2307.14206v1 ) License: see link	Holger Arndt	This exploratory study investigates the potential of the artificial intelligence tool ChatGPT to support systems thinking (ST) in various subjects. Using both general and subject-specific prompts, the study assesses the accuracy, helpfulness, and reliability of ChatGPT's responses across different versions of the tool. The results indicate that ChatGPT can provide largely correct and very helpful responses in various subjects, demonstrating its potential as a tool for enhancing ST skills. However, occasional inaccuracies highlight the need for users to remain critical of ChatGPT's responses. Despite some limitations, this study suggests that with careful use and attention to its idiosyncrasies, ChatGPT can be a valuable tool for teaching and learning ST.	Translated: 2023-07-27 12:20:13 Published: 2023-07-26
# Application of Random Forest and Support Vector Machine for Investigation of Pressure Filtration Performance, a Zinc Plant Filter Cake Modeling ( http://arxiv.org/abs/2307.14199v1 ) License: see link	Masoume Kazemi, Davood Moradkhani, Alireza Abbas Alipour	The hydrometallurgical method of zinc production involves leaching zinc from ore and then separating the solid residue from the liquid solution by pressure filtration. This separation process is very important since the solid residue contains some moisture that can reduce the amount of zinc recovered. This study modeled the pressure filtration process through Random Forest (RF) and Support Vector Machine (SVM). The models take continuous variables (extracted features) from the lab samples as inputs. Thus, regression models, namely Random Forest Regression (RFR) and Support Vector Regression (SVR), were chosen. The full dataset was obtained during the pressure filtration process under two conditions: 1) polypropylene (S1) and 2) polyester fabrics (S2). To predict the cake moisture, solids concentration (0.2 and 0.38), temperature (35 and 65 °C), pH (2, 3.5, and 5), pressure, cake thickness (14, 20, 26, and 34 mm), air-blow time (2, 10, and 15 min), and filtration time were applied as input variables. The models' predictive accuracy was evaluated by the coefficient of determination (R²). The results revealed that the RFR model is superior to the SVR model for cake moisture prediction.	Translated: 2023-07-27 12:19:59 Published: 2023-07-26
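The models above are compared by the coefficient of determination. For reference, a minimal sketch of the standard definition, R² = 1 − SS_res / SS_tot, in plain Python; it is the same quantity that scikit-learn's `sklearn.metrics.r2_score` computes for a single target, and no data from the study are used here.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares) / (total sum of squares)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 1 - 1/2 = 0.5
```

Values near 1 mean the predictions track the measured moisture closely; a model that always predicts the mean scores 0, and worse models can score below 0.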
# Efficient Learning of Discrete-Continuous Computation Graphs ( http://arxiv.org/abs/2307.14193v1 ) License: see link	David Friede and Mathias Niepert	Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is that of integrating discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph's execution paths. We analyze the behavior of more complex stochastic computation graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which one cannot train with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.	Translated: 2023-07-27 12:19:33 Published: 2023-07-26
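The abstract reports that raising the scale of the Gumbel noise perturbations during training improves learning. A minimal sketch of a Gumbel-softmax sample with an explicit noise scale, in plain Python; the parameter name `noise_scale` is a hypothetical label, and the abstract does not say how the scale is chosen or scheduled.

```python
import math
import random
from typing import List, Optional, Sequence


def gumbel_softmax_sample(
    logits: Sequence[float],
    tau: float = 1.0,
    noise_scale: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Relaxed one-hot sample softmax((logits + noise_scale * g) / tau), with g ~ Gumbel(0, 1).
    noise_scale = 1 is the usual Gumbel-softmax; larger values inject more noise."""
    rng = rng or random.Random()
    perturbed = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep -log(-log(u)) finite
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + noise_scale * gumbel) / tau)
    top = max(perturbed)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
print(gumbel_softmax_sample([2.0, 0.5, -1.0], tau=0.5, noise_scale=1.0, rng=rng))
print(gumbel_softmax_sample([2.0, 0.5, -1.0], tau=0.5, noise_scale=3.0, rng=rng))
```

With `noise_scale=0` the function reduces to a deterministic temperature softmax; the paper's dropout residual connections are not modeled here.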
# Unveiling Security, Privacy, and Ethical Concerns of ChatGPT ( http://arxiv.org/abs/2307.14192v1 ) License: see link	Xiaodong Wu, Ran Duan, Jianbing Ni	This paper delves into the realm of ChatGPT, an AI-powered chatbot that utilizes topic modeling and reinforcement learning to generate natural responses. Although ChatGPT holds immense promise across various industries, such as customer service, education, mental health treatment, personal productivity, and content creation, it is essential to address its security, privacy, and ethical implications. By exploring the upgrade path from GPT-1 to GPT-4, discussing the model's features, limitations, and potential applications, this study aims to shed light on the potential risks of integrating ChatGPT into our daily lives. Focusing on security, privacy, and ethics issues, we highlight the challenges these concerns pose for widespread adoption. Finally, we analyze the open problems in these areas, calling for concerted efforts to ensure the development of secure and ethically sound large language models.	Translated: 2023-07-27 12:19:18 Published: 2023-07-26
# ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation ( http://arxiv.org/abs/2307.14187v1 ) License: see link	Görkay Aydemir, Adil Kaan Akan, Fatma Güney	Forecasting future trajectories of agents in complex traffic scenes requires reliable and efficient predictions for all agents in the scene. However, existing methods for trajectory prediction are either inefficient or sacrifice accuracy. To address this challenge, we propose ADAPT, a novel approach for jointly predicting the trajectories of all agents in the scene with dynamic weight learning. Our approach outperforms state-of-the-art methods in both single-agent and multi-agent settings on the Argoverse and Interaction datasets, with a fraction of their computational overhead. We attribute the improvement in our performance, first, to the adaptive head augmenting the model capacity without increasing the model size and, second, to our design choices in the endpoint-conditioned prediction, reinforced by gradient stopping. Our analyses show that ADAPT can focus on each agent with adaptive prediction, allowing for accurate predictions efficiently. https://KUIS-AI.github.io/adapt	Translated: 2023-07-27 12:19:02 Published: 2023-07-26
# A comparison of machine learning surrogate models of street-scale flooding in Norfolk, Virginia ( http://arxiv.org/abs/2307.14185v1 ) License: see link	Diana McSpadden and Steven Goldenberg and Binata Roy and Malachi Schram and Jonathan L. Goodall and Heather Richter	Low-lying coastal cities, exemplified by Norfolk, Virginia, face the challenge of street flooding caused by rainfall and tides, which strain transportation and sewer systems and can lead to property damage. While high-fidelity, physics-based simulations provide accurate predictions of urban pluvial flooding, their computational complexity renders them unsuitable for real-time applications. Using data from Norfolk rainfall events between 2016 and 2018, this study compares the performance of a previous surrogate model based on a random forest algorithm with two deep learning models: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). This investigation underscores the importance of using a model architecture that supports the communication of prediction uncertainty and the effective integration of relevant, multi-modal features.	Translated: 2023-07-27 12:18:44 Published: 2023-07-26
# セマンティックセグメンテーションネットワークのためのatrousレートの解像度を考慮した設計 Resolution-Aware Design of Atrous Rates for Semantic Segmentation Networks ( http://arxiv.org/abs/2307.14179v1 ) ライセンス: Link先を確認	Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Sang Woo Kim	(参考訳) DeepLabはセマンティックセグメンテーションに広く使われているディープニューラルネットワークであり、その成功はatrous spatial pyramid pooling (ASPP)と呼ばれる並列アーキテクチャによるものとされる。ASPPは、局所情報と大域情報の両方を抽出するために、異なるatrousレートを持つ複数のatrous畳み込みを用いる。しかし、ASPPモジュールにはatrousレートの固定値が使われており、これが視野のサイズを制限している。原則として、atrousレートは対象のタスクやデータセットに応じて視野のサイズを変えるハイパーパラメータであるべきである。しかし、atrousレートの調整にはいかなるガイドラインも存在しない。本研究は、最適なatrousレートを得るための実践的ガイドラインを提案する。まず、セグメンテーションネットワークの内部挙動を分析するために、セマンティックセグメンテーションのための有効受容野を導入する。ASPPモジュールを用いると有効受容野に特有のパターンが現れることを観察し、そのパターンをたどることでモジュールの基盤となるメカニズムを明らかにした。これに基づき、入力画像のサイズに応じて制御すべき最適なatrousレートを得るための実践的ガイドラインを導出する。他の値と比較して、最適なatrousレートを用いると、STARE、CHASE_DB1、HRF、Cityscapes、iSAIDデータセットを含む複数のデータセットにわたってセグメンテーション結果が一貫して改善された。 DeepLab is a widely used deep neural network for semantic segmentation, whose success is attributed to its parallel architecture called atrous spatial pyramid pooling (ASPP). ASPP uses multiple atrous convolutions with different atrous rates to extract both local and global information. However, fixed values of atrous rates are used for the ASPP module, which restricts the size of its field of view. In principle, atrous rate should be a hyperparameter to change the field of view size according to the target task or dataset. However, the manipulation of atrous rate is not governed by any guidelines. This study proposes practical guidelines for obtaining an optimal atrous rate. First, an effective receptive field for semantic segmentation is introduced to analyze the inner behavior of segmentation networks. We observed that the use of ASPP module yielded a specific pattern in the effective receptive field, which was traced to reveal the module's underlying mechanism. Accordingly, we derive practical guidelines for obtaining the optimal atrous rate, which should be controlled based on the size of input image. Compared to other values, using the optimal atrous rate consistently improved the segmentation results across multiple datasets, including the STARE, CHASE_DB1, HRF, Cityscapes, and iSAID datasets.	翻訳日:2023-07-27 12:18:28 公開日:2023-07-26
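The abstract ties the atrous rate to the field of view. The sketch below makes that link concrete with the standard dilated-convolution extent formula: a k x k kernel with rate r spans k + (k - 1)(r - 1) pixels per axis. It is an illustration only, not the paper's effective-receptive-field guideline. Several things are assumptions for the example:
- The rates (6, 12, 18) are a commonly used ASPP setting, not values taken from this paper.
- The 65-pixel feature-map width is hypothetical.
- The function names are ours.

```python
from typing import Dict, Iterable


def atrous_extent(kernel_size: int, rate: int) -> int:
    """Pixels spanned along one axis by a single dilated (atrous) kernel.

    A k x k kernel with dilation rate r samples inputs r pixels apart,
    so it covers k + (k - 1) * (r - 1) pixels per axis.
    """
    if kernel_size < 1 or rate < 1:
        raise ValueError("kernel_size and rate must be positive")
    return kernel_size + (kernel_size - 1) * (rate - 1)


def aspp_extents(rates: Iterable[int], kernel_size: int = 3) -> Dict[int, int]:
    """Extent of each parallel 3x3 branch in an ASPP-style module."""
    return {r: atrous_extent(kernel_size, r) for r in rates}


if __name__ == "__main__":
    feature_width = 65  # hypothetical feature-map width, for illustration only
    for rate, extent in aspp_extents((6, 12, 18)).items():
        print(f"rate={rate}: extent={extent} px "
              f"({extent / feature_width:.0%} of a {feature_width}-px map)")
```

With a fixed rate, the branch covers a fixed number of feature-map pixels. Its share of the scene therefore changes with input resolution, which is the dependence the paper's guideline is said to address.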
# SoC FPGAデバイスを用いた高精細イベントフレーム生成 High-definition event frame generation using SoC FPGA devices ( http://arxiv.org/abs/2307.14177v1 ) ライセンス: Link先を確認	Krzysztof Blachut, Tomasz Kryjak	(参考訳) 本稿では、FPGAデバイス上で、高解像度イベントデータストリーム(HD、1280 x 720ピクセル)を蓄積して画像面へ投影する処理の実装を扱う。結果はこのアプローチの実現可能性を確認したが、考慮すべき課題、制限、トレードオフがいくつか存在する。選択したデータ表現(バイナリフレーム、イベントフレーム、指数関数的に減衰する時間表面、イベント周波数)に必要なハードウェアリソースを、AMD Xilinxの複数の一般的なプラットフォームで利用可能なリソースと比較した。得られたイベントフレームは、古典的手法とディープニューラルネットワーク手法の両方を用いた物体の分類や検出などの典型的な視覚アルゴリズムに利用できる。 In this paper we have addressed the implementation of the accumulation and projection of high-resolution event data stream (HD, 1280 x 720 pixels) onto the image plane in FPGA devices. The results confirm the feasibility of this approach, but there are a number of challenges, limitations and trade-offs to be considered. The required hardware resources of selected data representations, such as binary frame, event frame, exponentially decaying time surface and event frequency, were compared with those available on several popular platforms from AMD Xilinx. The resulting event frames can be used for typical vision algorithms, such as object classification and detection, using both classical and deep neural network methods.	翻訳日:2023-07-27 12:18:05 公開日:2023-07-26
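The four representations named in the abstract can be stated as a small software reference model, for example as a golden model against which hardware output is checked. The sketch below is a minimal version under several assumptions; the paper's exact definitions may differ:
- Events arrive as (x, y, t, polarity) tuples with t in seconds and polarity +1/-1.
- The "event frame" is taken to be the signed sum of polarities per pixel.
- The time surface is exp(-(t_ref - t_last)/tau).
- Event frequency is the per-pixel event count divided by the window length.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed event format: (x, y, t, polarity), t in seconds, polarity +1 or -1.
Event = Tuple[int, int, float, int]
Frame = List[List[float]]


def _zeros(width: int, height: int) -> Frame:
    return [[0.0] * width for _ in range(height)]


def accumulate_events(events: Iterable[Event], width: int, height: int,
                      t_ref: float, tau: float, window: float) -> Dict[str, Frame]:
    """Project one batch of events onto the image plane in four ways.

    t_ref  -- readout timestamp (end of the accumulation window)
    tau    -- decay constant of the time surface, in seconds
    window -- accumulation window length, in seconds
    """
    binary = _zeros(width, height)       # 1.0 where at least one event fired
    event_frame = _zeros(width, height)  # signed sum of polarities per pixel
    counts = _zeros(width, height)       # number of events per pixel
    last_t: List[List[Optional[float]]] = [[None] * width for _ in range(height)]

    for x, y, t, p in events:
        binary[y][x] = 1.0
        event_frame[y][x] += p
        counts[y][x] += 1
        prev = last_t[y][x]
        if prev is None or t > prev:
            last_t[y][x] = t  # keep the most recent timestamp per pixel

    time_surface = _zeros(width, height)
    frequency = _zeros(width, height)
    for row in range(height):
        for col in range(width):
            t_last = last_t[row][col]
            if t_last is not None:
                # Exponentially decaying time surface; stays 0.0 without events
                time_surface[row][col] = math.exp(-(t_ref - t_last) / tau)
            frequency[row][col] = counts[row][col] / window  # events per second

    return {"binary": binary, "event_frame": event_frame,
            "time_surface": time_surface, "frequency": frequency}
```

On an FPGA these would be per-pixel memories updated as events stream in. At 1280 x 720 the memory needed for each representation, and its bit width, is what drives the resource comparison described in the abstract.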
# 非古典光による多光子電子放出 Multi-photon electron emission with non-classical light ( http://arxiv.org/abs/2307.14153v1 ) ライセンス: Link先を確認	Jonas Heimerl, Alexander Mikhaylov, Stefan Meier, Henrick Höllerer, Ido Kaminer, Maria Chekhova and Peter Hommelhoff	(参考訳) 古典的および非古典的光源の光子数分布は広く研究されてきたが、それが光電子放出過程に与える影響はほとんど解明されていない。本稿では、異なる光子量子統計をもつ超短光パルスで照射した金属針先端からの電子数分布の測定結果を示す。励起光場の光子統計を古典(ポアソン)と量子(超ポアソン)の間で変化させることにより、測定される電子分布が大きく変化することを示す。単一モードの明るいスクイーズド真空光を用いると、1パルスあたり平均0.27個の電子という条件で、1つの光パルスから最大65個の電子が放出される極端な統計事象が測定された。ポアソン統計の下では、このような事象の確率は$10^{-128}$である。励起に用いる明るいスクイーズド真空光のモード数を変えることで、電子数分布を必要に応じて調整できる。最も重要な点として、本結果は光子統計が駆動光から放出電子へと転写されることを示しており、新しいセンサーデバイスや、量子光を用いた強電場量子光学への道を開くものである。 Photon number distributions from classical and non-classical light sources have been studied extensively, yet their impact on photoemission processes is largely unexplored. In this article, we present measurements of electron number-distributions from metal needle tips illuminated with ultrashort light pulses of different photon quantum statistics. By varying the photon statistics of the exciting light field between classical (Poissonian) and quantum (super-Poissonian), we demonstrate that the measured electron distributions are changed substantially. Using single-mode bright squeezed vacuum light, we measure extreme statistics events with up to 65 electrons from one light pulse at a mean of 0.27 electrons per pulse - the likelihood for such an event equals $10^{-128}$ with Poissonian statistics. Changing the number of modes of the exciting bright squeezed vacuum light, we can tailor the electron-number distribution on demand. Most importantly, our results demonstrate that the photon statistics is imprinted from the driving light to the emitted electrons, opening the door to new sensor devices and to strong-field quantum optics with quantum light.	翻訳日:2023-07-27 12:17:53 公開日:2023-07-26
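The $10^{-128}$ figure can be checked with plain Poisson arithmetic. The sketch below assumes the quoted likelihood means observing at least 65 electrons in one pulse when the count is Poisson with mean 0.27. For such a small mean, P(N = 65) and P(N >= 65) differ by well under 1%. The work is done in log space via `math.lgamma(k + 1) = ln(k!)` so the same code also handles larger counts.

```python
import math


def poisson_log10_pmf(k: int, mean: float) -> float:
    """log10 P(N = k) for N ~ Poisson(mean), evaluated in log space."""
    ln_p = -mean + k * math.log(mean) - math.lgamma(k + 1)  # lgamma(k+1) = ln(k!)
    return ln_p / math.log(10)


def poisson_log10_tail(k_min: int, mean: float, extra_terms: int = 200) -> float:
    """log10 P(N >= k_min), summing terms relative to the first one.

    When mean << k_min, successive terms shrink by a factor mean / (k + 1),
    so a couple of hundred terms is far more than enough.
    """
    first = poisson_log10_pmf(k_min, mean)
    rel = sum(10 ** (poisson_log10_pmf(k, mean) - first)
              for k in range(k_min, k_min + extra_terms))
    return first + math.log10(rel)


if __name__ == "__main__":
    log_p = poisson_log10_tail(65, 0.27)
    print(f"P(N >= 65 | mean 0.27) = 10^{log_p:.2f}")
```

The result comes out at roughly $10^{-128}$, consistent with the abstract. This shows how strongly super-Poissonian light departs from the Poissonian baseline.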
# もつれを解いた離散表現の学習 Learning Disentangled Discrete Representations ( http://arxiv.org/abs/2307.14151v1 ) ライセンス: Link先を確認	David Friede, Christian Reimers, Heiner Stuckenschmidt and Mathias Niepert	(参考訳) 画像生成、モデルベース強化学習、テキストからの画像生成における最近の成功は、離散的な潜在表現の経験的な利点を示しているが、その利点の背後にある理由は明らかになっていない。本稿では、標準的なガウス変分オートエンコーダ(VAE)を専用に設計したカテゴリカル変分オートエンコーダに置き換えることで、離散潜在空間と、もつれを解いた(disentangled)表現との関係を探る。カテゴリカル分布の根底にある格子構造は、多変量ガウス分布に伴う回転不変性の問題を緩和し、もつれを解いた表現に対する効率的な帰納的事前分布として機能することを示す。本研究では、もつれを解いた表現を学習するうえでの離散VAEの利点を示す解析的および実証的な知見を提供する。さらに、もつれを解いた表現を優先する初の教師なしモデル選択戦略を提案する。 Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear. We explore the relationship between discrete latent spaces and disentangled representations by replacing the standard Gaussian variational autoencoder (VAE) with a tailored categorical variational autoencoder. We show that the underlying grid structure of categorical distributions mitigates the problem of rotational invariance associated with multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations. We provide both analytical and empirical findings that demonstrate the advantages of discrete VAEs for learning disentangled representations. Furthermore, we introduce the first unsupervised model selection strategy that favors disentangled representations.	翻訳日:2023-07-27 12:17:35 公開日:2023-07-26
# 説明可能な人工知能(XAI)における性能と説明可能性のトレードオフの再検討 Revisiting the Performance-Explainability Trade-Off in Explainable Artificial Intelligence (XAI) ( http://arxiv.org/abs/2307.14239v1 ) ライセンス: Link先を確認	Barnaby Crook, Maximilian Schlüter, Timo Speith	(参考訳) 要求工学(RE)の分野では、AIを用いたシステムをユーザのニーズ、社会的期待、規制基準に整合させるうえで、説明可能な人工知能(XAI)の重要性が増していることが認識されている。一般に、説明可能性はシステム品質に影響を与える重要な非機能要件として浮上している。しかし、説明可能性と性能の間にあるとされるトレードオフは、説明可能性がもたらすと想定されてきた肯定的な影響に疑問を投げかける。説明可能性の要件を満たすことでシステム性能が低下するのであれば、これらの品質面のどちらを優先し、両者をどのように折り合わせるかを慎重に検討する必要がある。本稿では、このトレードオフとされるものを批判的に検討する。我々は、リソースの可用性、ドメインの特性、リスクの考慮を取り入れた、きめ細かなアプローチが最善であると主張する。本研究は、将来の研究とベストプラクティスの基礎を提供することで、AIのためのREの分野を前進させることを目指している。 Within the field of Requirements Engineering (RE), the increasing significance of Explainable Artificial Intelligence (XAI) in aligning AI-supported systems with user needs, societal expectations, and regulatory standards has garnered recognition. In general, explainability has emerged as an important non-functional requirement that impacts system quality. However, the supposed trade-off between explainability and performance challenges the presumed positive influence of explainability. If meeting the requirement of explainability entails a reduction in system performance, then careful consideration must be given to which of these quality aspects takes precedence and how to compromise between them. In this paper, we critically examine the alleged trade-off. We argue that it is best approached in a nuanced way that incorporates resource availability, domain characteristics, and considerations of risk. By providing a foundation for future research and best practices, this work aims to advance the field of RE for AI.	翻訳日:2023-07-27 12:11:09 公開日:2023-07-26
# ロボット群のための多目的ニューラルネットワークコントローラの進化 Evolving Multi-Objective Neural Network Controllers for Robot Swarms ( http://arxiv.org/abs/2307.14237v1 ) ライセンス: Link先を確認	Karl Mason, Sabine Hauert	(参考訳) スウォームロボティクスの多くのタスクは、互いに相反する複数の目的から成る。本研究では、ロボット群のコントローラを開発するための多目的進化型ニューラルネットワーク手法を提案する。スウォームロボットのコントローラは低忠実度のPythonシミュレータで学習され、Webotsを用いた高忠実度のシミュレーション環境でテストされる。次に、進化させた多目的ロボットコントローラが、より多数のロボットを含む環境へスケールできるかを検証するシミュレーションを行う。その結果、提案手法が各ロボットを効果的に制御できることが示された。各目的の重みを調整すると、ロボット群は異なる振る舞いを示す。また、低忠実度シミュレータで進化させた多目的ニューラルネットワークコントローラは高忠実度シミュレーション環境へ移行でき、再学習を必要とせずにより多数のロボットを含む環境へスケールできることも確認された。 Many swarm robotics tasks consist of multiple conflicting objectives. This research proposes a multi-objective evolutionary neural network approach to developing controllers for swarms of robots. The swarm robot controllers are trained in a low-fidelity Python simulator and then tested in a high-fidelity simulated environment using Webots. Simulations are then conducted to test the scalability of the evolved multi-objective robot controllers to environments with a larger number of robots. The results presented demonstrate that the proposed approach can effectively control each of the robots. The robot swarm exhibits different behaviours as the weighting for each objective is adjusted. The results also confirm that multi-objective neural network controllers evolved in a low-fidelity simulator can be transferred to high-fidelity simulated environments and that the controllers can scale to environments with a larger number of robots without further retraining needed.	翻訳日:2023-07-27 12:10:54 公開日:2023-07-26
# UnScientify: 学術論文全文における科学的不確実性の検出 UnScientify: Detecting Scientific Uncertainty in Scholarly Full Text ( http://arxiv.org/abs/2307.14236v1 ) ライセンス: Link先を確認	Panggih Kusuma Ningrum, Philipp Mayr, Iana Atanassova	(参考訳) 本デモ論文では、学術論文の全文から科学的不確実性を検出するために設計された対話型システムUnScientifyを紹介する。本システムは、きめ細かなアノテーションスキームを用いる弱教師あり手法により、科学テキスト中で言語的に表現された不確実性を文単位で特定する。システムのパイプラインは、パターンマッチング、複文チェック、著者参照チェックの組み合わせから成る。本手法は、さまざまな種類の科学的不確実性を考慮しながら、科学的不確実性を特定するためのラベル付けとアノテーション作業を自動化し、情報検索、テキストマイニング、学術文書処理などのさまざまな応用に役立つ。さらに、UnScientifyは解釈可能な結果を提供し、テキスト中で特定された科学的不確実性の事例の理解を支援する。 This demo paper presents UnScientify, an interactive system designed to detect scientific uncertainty in scholarly full text. The system utilizes a weakly supervised technique that employs a fine-grained annotation scheme to identify verbally formulated uncertainty at the sentence level in scientific texts. The pipeline for the system includes a combination of pattern matching, complex sentence checking, and authorial reference checking. Our approach automates labeling and annotation tasks for scientific uncertainty identification, taking into account different types of scientific uncertainty, that can serve various applications such as information retrieval, text mining, and scholarly document processing. Additionally, UnScientify provides interpretable results, aiding in the comprehension of identified instances of scientific uncertainty in text.	翻訳日:2023-07-27 12:10:40 公開日:2023-07-26
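The first pipeline stage named in the abstract, pattern matching at the sentence level, can be pictured with a regular-expression sketch. The cue list, the sentence splitter and the function name below are hypothetical and deliberately tiny. UnScientify's actual patterns, its complex-sentence check and its authorial-reference check are not reproduced here.

```python
import re
from typing import List

# Hypothetical, illustrative hedging cues; not UnScientify's pattern set.
HEDGE_PATTERNS = [
    r"\bmay\b", r"\bmight\b", r"\bcould\b", r"\bpossibly\b", r"\blikely\b",
    r"\bsuggests?\b", r"\bappears? to\b", r"\bremains? unclear\b",
]
_HEDGE_RE = re.compile("|".join(HEDGE_PATTERNS), re.IGNORECASE)
# Naive splitter: break after ., ! or ? followed by whitespace.
_SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")


def flag_uncertain_sentences(text: str) -> List[str]:
    """Return the sentences that contain at least one hedging cue."""
    sentences = [s.strip() for s in _SENT_SPLIT.split(text) if s.strip()]
    return [s for s in sentences if _HEDGE_RE.search(s)]


if __name__ == "__main__":
    sample = ("The model converges. These results suggest a causal link. "
              "It may fail on rare cases.")
    for sentence in flag_uncertain_sentences(sample):
        print(sentence)
```

A pure cue-matching stage over-flags sentences that merely report someone else's hedge. This is presumably why the abstract lists an authorial-reference check as a separate step.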
# コンピュータシステムにおける不透明性の源泉：包括的な分類体系に向けて Sources of Opacity in Computer Systems: Towards a Comprehensive Taxonomy ( http://arxiv.org/abs/2307.14232v1 ) ライセンス: Link先を確認	Sara Mann, Barnaby Crook, Lena Kästner, Astrid Schomäcker, Timo Speith	(参考訳) 現代のコンピュータシステムは今日の生活の至る所に存在するが、その多くは依然として不透明である。これは、公平性や説明責任といった要請(desiderata)が重要となる領域において大きな課題となる。我々は、システムの透明性を達成するための最善の戦略は、与えられた文脈で支配的な不透明性の具体的な源泉によって異なると考える。既存の議論を統合・拡張し、アーキテクチャ、分析、社会技術の3つの主要カテゴリに分類される8つの不透明性の源泉から成る分類体系を提案する。各源泉について、結果として生じる不透明性に実際にどう対処するかについての初期的な提案を示す。この分類体系は、要求エンジニアや他の実務者が、文脈上支配的な不透明性の源泉を理解し、それらを克服するための適切な戦略を選択または開発するための出発点となる。 Modern computer systems are ubiquitous in contemporary life yet many of them remain opaque. This poses significant challenges in domains where desiderata such as fairness or accountability are crucial. We suggest that the best strategy for achieving system transparency varies depending on the specific source of opacity prevalent in a given context. Synthesizing and extending existing discussions, we propose a taxonomy consisting of eight sources of opacity that fall into three main categories: architectural, analytical, and socio-technical. For each source, we provide initial suggestions as to how to address the resulting opacity in practice. The taxonomy provides a starting point for requirements engineers and other practitioners to understand contextually prevalent sources of opacity, and to select or develop appropriate strategies for overcoming them.	翻訳日:2023-07-27 12:10:27 公開日:2023-07-26
# 伝統的な中国絵画のための計算的アプローチ：「画の六法」の視点から Computational Approaches for Traditional Chinese Painting: From the "Six Principles of Painting" Perspective ( http://arxiv.org/abs/2307.14227v1 ) ライセンス: Link先を確認	Wei Zhang, Jian-Wei Zhang, Kam Kwai Wong, Yifang Wang, Yingchaojie Feng, Luwei Wang, and Wei Chen	(参考訳) 伝統的な中国絵画(TCP)は貴重な文化遺産であり、独特な視覚芸術様式である。近年、文化を保存・再生するためにTCPをデジタル化することへの関心が高まっている。その結果得られたデジタル複製により、TCPを構造的かつ体系的に理解するための計算手法が進歩した。このテーマを探るため、92点の文献を詳細に分析した。専門家との多数の対話に基づき、TCPにおけるコンピュータ技術の現状を3つの視点から検討した。第一に、「画の六法」の理論に照らして、論文を芸術的要素に関する研究の焦点に従って分類した。第二に、TCP応用の目的を示す4段階のフレームワークを作成した。第三に、TCPに適用されている代表的な計算技術を要約した。このフレームワークはまた、専門家の意見とともに、潜在的な応用と将来の展望に関する洞察を提供する。調査対象の出版物と関連情報の一覧はhttps://ca4tcp.com で公開されている。 Traditional Chinese Painting (TCP) is an invaluable cultural heritage resource and a unique visual art style. In recent years, increasing interest has been placed on digitalizing TCPs to preserve and revive the culture. The resulting digital copies have enabled the advancement of computational methods for structured and systematic understanding of TCPs. To explore this topic, we conducted an in-depth analysis of 92 pieces of literature. We examined the current use of computer technologies on TCPs from three perspectives, based on numerous conversations with specialists. First, in light of the "Six Principles of Painting" theory, we categorized the articles according to their research focus on artistic elements. Second, we created a four-stage framework to illustrate the purposes of TCP applications. Third, we summarized the popular computational techniques applied to TCPs. The framework also provides insights into potential applications and future prospects, with professional opinion. The list of surveyed publications and related information is available online at https://ca4tcp.com.	翻訳日:2023-07-27 12:10:16 公開日:2023-07-26
# 地域貿易組織に基づく気候交渉の進展の可能性を探る：RICE-Nに基づく研究 Explore the possibility of advancing climate negotiations on the basis of regional trade organizations: A study based on RICE-N ( http://arxiv.org/abs/2307.14226v1 ) ライセンス: Link先を確認	Wubo Dai	(参考訳) 気候問題は今日ますます重要になっている。世界各国の政府は一定の進展を遂げているものの、現時点では国際協力の見通しが明確でないという現実に直面している。統合評価モデル(IAM)の限界により、動的な交渉過程をシミュレートすることは難しい。そのため、深層学習を用いて新たなエージェントベースモデル(ABM)を構築することは、気候交渉に新たな理論的裏付けを与え得る。本研究はRICE-Nモデルを基に、既存の貿易グループに基づく気候交渉のアプローチを提案した。シミュレーションの結果、このスキームは有望であることが示された。 Climate issues have become more and more important now. Although global governments have made some progress, we are still facing the truth that the prospect of international cooperation is not clear at present. Due to the limitations of integrated assessment models (IAMs), it is difficult to simulate the dynamic negotiation process. Therefore, using deep learning to build a new agent-based model (ABM) might provide new theoretical support for climate negotiations. Building on the RICE-N model, this work proposed an approach to climate negotiations based on existing trade groups. Simulation results show that the scheme is promising.	翻訳日:2023-07-27 12:10:03 公開日:2023-07-26
# 大規模言語モデルは、言語およびアイテムに基づく選好に対して、コールドスタートに近い状況で競争力のある推薦器である Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences ( http://arxiv.org/abs/2307.14225v1 ) ライセンス: Link先を確認	Scott Sanner and Krisztian Balog and Filip Radlinski and Ben Wedin and Lucas Dixon	(参考訳) 従来の推薦システムは、ユーザのアイテム選好履歴を活用して、ユーザが好みそうな新しいコンテンツを推薦する。しかし、ユーザが言語による選好を表現できる現代の対話インタフェースは、選好入力としてまったく異なるモダリティを提供する。大規模言語モデル(LLM)に対するプロンプティングパラダイムの最近の成功に触発され、最先端のアイテムベース協調フィルタリング(CF)手法と比較しつつ、アイテムベースおよび言語ベースの選好の両方から推薦を行うためのLLMの利用を検討する。この調査を支えるため、ユーザから引き出したアイテムベースと言語ベースの選好に加え、さまざまな(バイアスのある)推薦アイテムと(バイアスのない)ランダムなアイテムに対するユーザの評価から成る新しいデータセットを収集した。多数の実験結果の中で、LLMは、この特定のタスクについて教師あり学習を行っていない(ゼロショット)か、わずかなラベルしか用いていない(フューショット)にもかかわらず、コールドスタートに近い状況において、純粋に言語ベースの選好(アイテム選好なし)に対して、アイテムベースCF手法と比べて競争力のある推薦性能を示すことがわかった。言語ベースの選好表現はアイテムベースやベクトルベースの表現よりも説明可能で精査しやすいため、これは特に有望である。 Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, modern dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.	翻訳日:2023-07-27 12:09:50 公開日:2023-07-26
# 量子コンピューティングのdyadic断片におけるsum-over-pathsの書き換えと完全性 Rewriting and Completeness of Sum-Over-Paths in Dyadic Fragments of Quantum Computing ( http://arxiv.org/abs/2307.14223v1 ) ライセンス: Link先を確認	Renaud Vilmart	(参考訳) 「sum-over-paths」形式は、量子系を記述する線形写像を記号的に操作する方法であり、そのような系の形式検証に用いられるツールである。本稿では、この形式のための新しい書き換え規則の集合を与え、それが量子力学の最も単純な近似的普遍断片である「Toffoli-Hadamard」に対して完全であることを示す。書き換えは停止するが合流性(confluence)を持たないこと(これは断片の普遍性から予想される)を示す。その際、sum-over-pathsとグラフィカル言語ZH計算との関係を利用し、公理化が後者にどのように翻訳されるかも示す。提示した書き換え規則の一般化を与え、実際に項を簡約する際に役立ち得ることを示すとともに、これらの新しい規則をグラフィカルに解釈する方法を示す。また、Toffoli-Hadamardゲート集合に$\pi$の二進有理数倍(dyadic multiple)の位相ゲートを加えて得られ、特に量子フーリエ変換で用いられる量子計算のdyadic断片に対しても完全となるよう、書き換えシステムを拡張する方法を示す。最後に、任意の項の和と連結を行う方法を示す。これはゲートベースの量子計算を解析するために設計されたシステムには本来備わっていないが、ハミルトニアンベースの量子計算を考える際には必要となる。 The "Sum-Over-Paths" formalism is a way to symbolically manipulate linear maps that describe quantum systems, and is a tool that is used in formal verification of such systems. We give here a new set of rewrite rules for the formalism, and show that it is complete for "Toffoli-Hadamard", the simplest approximately universal fragment of quantum mechanics. We show that the rewriting is terminating, but not confluent (which is expected from the universality of the fragment). We do so using the connection between Sum-over-Paths and graphical language ZH-calculus, and also show how the axiomatisation translates into the latter. We provide generalisations of the presented rewrite rules, that can prove useful when trying to reduce terms in practice, and we show how to graphically make sense of these new rules. We show how to enrich the rewrite system to reach completeness for the dyadic fragments of quantum computation, used in particular in the Quantum Fourier Transform, and obtained by adding phase gates with dyadic multiples of $\pi$ to the Toffoli-Hadamard gate-set. Finally, we show how to perform sums and concatenation of arbitrary terms, something which is not native in a system designed for analysing gate-based quantum computation, but necessary when considering Hamiltonian-based quantum computation.	翻訳日:2023-07-27 12:09:12 公開日:2023-07-26
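A small example shows what a path-sum term looks like and what "reducing terms" means. It uses the standard path-sum form of the Hadamard gate, not the paper's new rewrite rules. Composing two Hadamards introduces an internal path variable $y$. Summing it out removes the variable, since $\sum_{y\in\{0,1\}}(-1)^{y(x+z)}$ equals $2$ when $x=z$ and $0$ otherwise:

$$
H : |x\rangle \mapsto \frac{1}{\sqrt{2}} \sum_{y \in \{0,1\}} (-1)^{xy}\, |y\rangle,
\qquad
HH : |x\rangle \mapsto \frac{1}{2} \sum_{y,z \in \{0,1\}} (-1)^{xy + yz}\, |z\rangle
= \sum_{z} \delta_{x,z}\, |z\rangle = |x\rangle .
$$

Eliminating internal variables in this way is the kind of simplification that rewrite rules for sums over paths perform. Completeness means that any two terms denoting the same linear map in the fragment can be rewritten into each other.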
# 普遍量子フォン・ノイマン・アーキテクチャに関するサーベイ A survey of universal quantum von Neumann architecture ( http://arxiv.org/abs/2307.14219v1 ) ライセンス: Link先を確認	Y.-T. Liu, K. Wang, Y.-D. Liu, D.-S. Wang	(参考訳) 普遍量子コンピュータの存在は理論的に十分に確立されている。しかし、実際の量子コンピュータシステムを構築するには、普遍性の理論に頼るだけでなく、プログラム可能性、モジュール性、スケーラビリティなど、他の特性に関する要件を満たす方法も必要である。この目的のため、最近提案された量子フォン・ノイマン・アーキテクチャのモデルを、コンピュータシステムの階層的設計という実践的でより広い枠組みの中に位置づけて検討する。量子CPUと量子制御ユニットの構造を分析し、それらと計算上の優位性との関係を明らかにする。また、我々のモデルを最近の実験で実証するには20量子ビット未満で足りることも指摘する。 The existence of universal quantum computers has been theoretically well established. However, building up a real quantum computer system not only relies on the theory of universality, but also needs methods to satisfy requirements on other features, such as programmability, modularity, scalability, etc. To this end, we study the recently proposed model of quantum von Neumann architecture, by putting it in a practical and broader setting, namely, the hierarchical design of a computer system. We analyze the structures of quantum CPU and quantum control unit, and draw their connections with computational advantages. We also point out that a recent demonstration of our model would require less than 20 qubits.	翻訳日:2023-07-27 12:08:45 公開日:2023-07-26
# 資源制約下における従属プロセスのオンラインモデリングとモニタリング Online Modeling and Monitoring of Dependent Processes under Resource Constraints ( http://arxiv.org/abs/2307.14208v1 ) ライセンス: Link先を確認	Tanapol Kosolwattana, Huazheng Wang, Ying Lin	(参考訳) 限られた資源の下で依存するプロセスの集団を監視することは異常な事象の検出に重要である。リスクの高いプロセスの活用と依存動力学の探索のための資源を適応的に割り当てる新しいオンライン協調学習手法を提案する。提案手法の有効性は理論解析と実験によって証明される。 Monitoring a population of dependent processes under limited resources is critical for abnormal events detection. A novel online collaborative learning method is proposed to adaptively allocate the resources for exploitation of high-risk processes and exploration of dependent dynamics. Efficiency of the proposed method is proved through theoretical analysis and experiments.	翻訳日:2023-07-27 12:08:34 公開日:2023-07-26
# ディープフェイク画像生成による脳腫瘍セグメンテーションの改善 Deepfake Image Generation for Improved Brain Tumor Segmentation ( http://arxiv.org/abs/2307.14273v1 ) ライセンス: Link先を確認	Roa'a Al-Emaryeen, Sara Al-Nahhas, Fatima Himour, Waleed Mahafza and Omar Al-Kadi	(参考訳) 技術と医療の進歩に伴い、無症状の徴候を明らかにすることで病気への認識が高まっている。腫瘍は生命を脅かす可能性があるため、早期に検出し治療することが重要である。病気の診断における根強い限界を克服するためにコンピュータ支援技術が用いられているが、脳腫瘍のセグメンテーションは、特にマルチモダリティデータが関わる場合、依然として難しい処理である。これは主に、データとそれに対応するラベルの不足による非効率な学習に起因する。本研究は、効果的な脳腫瘍セグメンテーションのためにディープフェイク画像生成を用いることの実現可能性を検討する。この目的のため、データセットのサイズを増やすために敵対的生成ネットワーク(GAN)による画像間変換を行い、続いてディープフェイク画像で学習したU-Netベースの畳み込みニューラルネットワークを用いて画像セグメンテーションを行った。提案手法の性能を、4つの公開データセットのグラウンドトゥルースと比較した。その結果、画像セグメンテーションの品質指標において性能が向上しており、限られたデータで学習する際の助けとなる可能性がある。 As the world progresses in technology and health, awareness of disease by revealing asymptomatic signs improves. It is important to detect and treat tumors in early stage as it can be life-threatening. Computer-aided technologies are used to overcome lingering limitations facing disease diagnosis, while brain tumor segmentation remains a difficult process, especially when multi-modality data is involved. This is mainly attributed to ineffective training due to lack of data and corresponding labelling. This work investigates the feasibility of employing deep-fake image generation for effective brain tumor segmentation. To this end, a Generative Adversarial Network was used for image-to-image translation for increasing dataset size, followed by image segmentation using a U-Net-based convolutional neural network trained with deepfake images. Performance of the proposed approach is compared with ground truth of four publicly available datasets. Results show improved performance in terms of image segmentation quality metrics, and could potentially assist when training with limited data.	翻訳日:2023-07-27 12:00:48 公開日:2023-07-26
# 相互条件付き拘束コミットメントによる国際気候政策の改善 Improving International Climate Policy via Mutually Conditional Binding Commitments ( http://arxiv.org/abs/2307.14267v1 ) ライセンス: Link先を確認	Jobst Heitzig, Jörg Oechssler, Christoph Pröschel, Niranjana Ragavan, Yat Long Lo	(参考訳) 気候交渉における重要なマイルストーンとみなされるパリ協定は、国が決定する貢献(NDC)の大半が無条件であるため、気候変動に効果的に対処するうえで課題に直面してきた。その結果、主要排出国の間でフリーライド行動が広がり、NDCには具体的な条件性が欠けている。この問題に対処するため、条件付きコミットメント・メカニズムと呼ばれる分散型のボトムアップ手法の導入を提案する。このメカニズムは全米一般投票州際協定(National Popular Vote Interstate Compact)に着想を得たもので、早期参加者に柔軟性とインセンティブを提供し、国際気候政策における条件付き協力を形式化することを目指している。本稿では、このメカニズムの概要とAI4ClimateCooperationチャレンジにおけるその性能を示し、現実世界での実装に関わる側面について議論する。気候緩和における集団行動問題、基本的な経済原理、ゲーム理論の概念についての予備知識を前提とする。 The Paris Agreement, considered a significant milestone in climate negotiations, has faced challenges in effectively addressing climate change due to the unconditional nature of most Nationally Determined Contributions (NDCs). This has resulted in a prevalence of free-riding behavior among major polluters and a lack of concrete conditionality in NDCs. To address this issue, we propose the implementation of a decentralized, bottom-up approach called the Conditional Commitment Mechanism. This mechanism, inspired by the National Popular Vote Interstate Compact, offers flexibility and incentives for early adopters, aiming to formalize conditional cooperation in international climate policy. In this paper, we provide an overview of the mechanism, its performance in the AI4ClimateCooperation challenge, and discuss potential real-world implementation aspects. Prior knowledge of the climate mitigation collective action problem, basic economic principles, and game theory concepts are assumed.	翻訳日:2023-07-27 12:00:30 公開日:2023-07-26
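The core idea of mutually conditional commitments is "I commit only if enough others commit." It can be illustrated with a toy resolution procedure. The model below is hypothetical and is not the paper's mechanism or the RICE-N implementation:
- Each party states a pledge.
- Each party states the minimum total pledge by the other active parties that it requires before its own pledge binds.
- Starting from everyone and repeatedly dropping parties whose condition fails yields the largest mutually consistent set, because removing a party can only make the others' conditions harder to meet.

```python
from typing import Dict, Set


def resolve_conditional_commitments(thresholds: Dict[str, float],
                                    pledges: Dict[str, float]) -> Set[str]:
    """Largest set of parties whose conditions are satisfied by each other.

    Hypothetical model: party i's pledge binds only if the other active
    parties together pledge at least thresholds[i]. A threshold of 0 is an
    unconditional commitment.
    """
    active = set(thresholds)
    changed = True
    while changed:
        changed = False
        for party in sorted(active):  # iterate over a snapshot of the set
            others = sum(pledges[p] for p in active if p != party)
            if others < thresholds[party]:
                active.discard(party)  # condition unmet: pledge does not bind
                changed = True
    return active


if __name__ == "__main__":
    thresholds = {"A": 0, "B": 5, "C": 8, "D": 100}
    pledges = {"A": 4, "B": 4, "C": 4, "D": 4}
    print(sorted(resolve_conditional_commitments(thresholds, pledges)))
```

In this example, B and C join only because each other's conditional pledges (plus A's unconditional one) meet their thresholds. D's demand is too high, so D drops out. The example shows how conditionality can unlock cooperation that no single unconditional pledge would.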
# 相互条件付き拘束コミットメントによる国際気候政策の改善 Improving International Climate Policy via Mutually Conditional Binding Commitments ( http://arxiv.org/abs/2307.14266v1 ) ライセンス: Link先を確認	Jobst Heitzig, Jörg Oechssler, Christoph Pröschel, Niranjana Ragavan, Richie YatLong Lo	(参考訳) 本稿では、国際気候政策交渉の現実性を高めるため、RICE-Nシミュレーションとマルチエージェント強化学習フレームワークの改良を提案する。このフレームワークの価値を認めつつ、気候交渉のモデル化に関わる多様な要因に対処するための大幅な拡張の必要性を強調する。条件付きコミットメント・メカニズム(CCFメカニズム)に関するこれまでの研究を基に、シミュレーションと現実のギャップを埋める方法を議論する。協調を強化するためのレコメンダまたはプランナーエージェントの導入、社会的要因や非締約主体(non-party stakeholder)のサブエージェントを組み込むことによるReal2Simギャップへの対処、そして基盤となる強化学習の解法アルゴリズムの改良を提案する。これらの改善は、RICE-Nにおける、より効果的な国際気候政策の意思決定のための交渉プロトコルの評価と定式化を前進させることを目的としている。ただし、これらの提案の意義と有効性を判断するには、さらなる実験とテストが必要である。 This paper proposes enhancements to the RICE-N simulation and multi-agent reinforcement learning framework to improve the realism of international climate policy negotiations. Acknowledging the framework's value, we highlight the necessity of significant enhancements to address the diverse array of factors in modeling climate negotiations. Building upon our previous work on the "Conditional Commitments Mechanism" (CCF mechanism) we discuss ways to bridge the gap between simulation and reality. We suggest the inclusion of a recommender or planner agent to enhance coordination, address the Real2Sim gap by incorporating social factors and non-party stakeholder sub-agents, and propose enhancements to the underlying Reinforcement Learning solution algorithm. These proposed improvements aim to advance the evaluation and formulation of negotiation protocols for more effective international climate policy decision-making in Rice-N. However, further experimentation and testing are required to determine the implications and effectiveness of these suggestions.	翻訳日:2023-07-27 12:00:12 公開日:2023-07-26
# 拡散確率モデルを用いた組織像のアーティファクト復元 Artifact Restoration in Histology Images with Diffusion Probabilistic Models ( http://arxiv.org/abs/2307.14262v1 ) ライセンス: Link先を確認	Zhenqi He, Junjun He, Jin Ye, Yiqing Shen	(参考訳) 組織学的全スライド画像(WSI)は、組織の折れ重なりや気泡などのアーティファクトによって損なわれることが多く、病理医とコンピュータ支援診断(CAD)システムの双方にとって検査を難しくする。既存のアーティファクト画像の復元手法は敵対的生成ネットワーク(GAN)に限られ、復元過程は画像から画像への変換として定式化されている。これらの手法はモード崩壊や染色スタイルの予期しない誤変換に陥りやすく、満足のいかない非現実的な復元画像を生み出す。そこで我々は新たな試みとして、組織学的アーティファクト復元のためのノイズ除去拡散確率モデルArtiFusionを初めて提案する。具体的には、ArtiFusionはアーティファクト領域の復元を段階的なノイズ除去過程として定式化し、学習の複雑さを抑えるためにアーティファクトのない画像のみを用いて学習する。さらに、領域的なアーティファクト復元において局所と大域の相関を捉えるため、時間トークン方式とともに新しいSwin-Transformerノイズ除去アーキテクチャを設計する。広範な評価により、組織解析の前処理手法としてのArtiFusionの有効性が示され、復元の際にアーティファクトのない領域の組織構造と染色スタイルを保持できることが確認された。コードはhttps://github.com/zhenqi-he/ArtiFusion で入手できる。 Histological whole slide images (WSIs) can often be compromised by artifacts, such as tissue folding and bubbles, which will increase the examination difficulty for both pathologists and Computer-Aided Diagnosis (CAD) systems. Existing approaches to restoring artifact images are confined to Generative Adversarial Networks (GANs), where the restoration process is formulated as an image-to-image transfer. Those methods are prone to suffer from mode collapse and unexpected mistransfer in the stain style, leading to unsatisfactory and unrealistic restored images. Innovatively, we make the first attempt at a denoising diffusion probabilistic model for histological artifact restoration, namely ArtiFusion. Specifically, ArtiFusion formulates the artifact region restoration as a gradual denoising process, and its training relies solely on artifact-free images to simplify the training complexity. Furthermore, to capture local-global correlations in the regional artifact restoration, a novel Swin-Transformer denoising architecture is designed, along with a time token scheme. Our extensive evaluations demonstrate the effectiveness of ArtiFusion as a pre-processing method for histology analysis, which can successfully preserve the tissue structures and stain style in artifact-free regions during the restoration. Code is available at https://github.com/zhenqi-he/ArtiFusion.	翻訳日:2023-07-27 11:59:56 公開日:2023-07-26
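ArtiFusion is described as treating restoration as gradual denoising. The generic process such models reverse is the DDPM forward (noising) process: $x_t \sim \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0,\,(1-\bar\alpha_t)I)$ with $\bar\alpha_t = \prod_{s\le t}(1-\beta_s)$. The sketch below samples it for a flat list of pixel values. It assumes the linear $\beta$ schedule from $10^{-4}$ to $0.02$ used in the original DDPM work. ArtiFusion's actual schedule is not stated in the abstract, and neither is its Swin-Transformer denoiser, its time tokens, or how artifact regions are masked; none of these are modelled here.

```python
import math
import random
from typing import List


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: List[float], t: int, alpha_bars: List[float],
             rng: random.Random) -> List[float]:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one step."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    alpha_bars = linear_alpha_bars(1000)
    rng = random.Random(0)
    x0 = [1.0, -1.0, 0.5]  # stand-in for normalised pixel values
    for t in (0, 250, 999):
        print(t, round(alpha_bars[t], 5), [round(v, 3) for v in q_sample(x0, t, alpha_bars, rng)])
```

A restoration model learns the reverse of this chain. Training only on artifact-free images, as the abstract describes, means the learned reverse process pulls noised artifact regions toward plausible clean tissue.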
# 視覚トランスフォーマーにおけるスパース二重降下：現実の脅威か幻か？ Sparse Double Descent in Vision Transformers: real or phantom threat? ( http://arxiv.org/abs/2307.14253v1 ) ライセンス: Link先を確認	Victor Quétu, Marta Milovanovic and Enzo Tartaglione	(参考訳) 視覚トランスフォーマー(ViT)は、近年の理論的・実証的研究で広く関心を集めている。注意機構に基づくアプローチにより帰納バイアスを回避でき、画像内の重要な特徴やパターンの識別が促進されるため、非常に正確な画像解析を実現する最先端の手法となっている。一方、最近の研究では、極めて過剰にパラメータ化されたモデルでもよく汎化しうる現代の深層学習モデルにおいて、「スパース二重降下」現象が生じうることが報告されている。これは最適なモデルサイズについて実際的な問いを提起し、スパース性と性能の最良のトレードオフを探る試みが始まっている。視覚トランスフォーマーもスパース二重降下を起こしやすいのだろうか？このような現象を回避する方法はあるだろうか？本研究はViTにおけるスパース二重降下の発生に取り組む。ResNetのような従来のアーキテクチャがスパース二重降下現象を免れないことを示した研究がある一方で、ViTでは最適に調整された$\ell_2$正則化がこの現象を緩和することを観察した。しかし、何事にも代償がある。最適なラムダはViTの潜在的な圧縮を犠牲にする。 Vision transformers (ViT) have been of broad interest in recent theoretical and empirical works. They are state-of-the-art thanks to their attention-based approach, which boosts the identification of key features and patterns within images thanks to the capability of avoiding inductive bias, resulting in highly accurate image analysis. Meanwhile, neoteric studies have reported a "sparse double descent" phenomenon that can occur in modern deep-learning models, where extremely over-parametrized models can generalize well. This raises practical questions about the optimal size of the model and the quest over finding the best trade-off between sparsity and performance is launched: are Vision Transformers also prone to sparse double descent? Can we find a way to avoid such a phenomenon? Our work tackles the occurrence of sparse double descent on ViTs. Despite some works that have shown that traditional architectures, like ResNet, are condemned to the sparse double descent phenomenon, for ViTs we observe that an optimally-tuned $\ell_2$ regularization relieves such a phenomenon. However, everything comes at a cost: optimal lambda will sacrifice the potential compression of the ViT.	翻訳日:2023-07-27 11:59:34 公開日:2023-07-26
# 段差をもつ調和振動子と等スペクトル性 Harmonic Oscillator with a Step and Isospectrality ( http://arxiv.org/abs/2307.14251v1 ) ライセンス: Link先を確認	Yuta Nasuda, Nobuyuki Sawado	(参考訳) 原点に有限の段差$a$をもつ調和振動子ポテンシャルに対する一次元シュレーディンガー方程式を調べる。解は、通常の波動関数の接続法を用いて構成される。$a$の特別な選択 $a=4\ell$ ($\ell=1,2,\ldots$) に対して、波動関数はエルミート多項式で表すことができる。さらに、ダルブー変換によるポテンシャルの等スペクトル変形を調べる。この文脈で、通常の調和振動子と等スペクトルなハミルトニアンが無限個得られる。 We investigate the one-dimensional Schrödinger equation for a harmonic oscillator potential with a finite jump $a$ at the origin. The solution is constructed by employing the ordinary matching-of-wavefunctions technique. For the special choices $a=4\ell$ ($\ell=1,2,\ldots$), the wavefunctions can be expressed in terms of the Hermite polynomials. Moreover, we explore isospectral deformations of the potential via the Darboux transformation. In this context, infinitely many Hamiltonians isospectral to the ordinary harmonic oscillator are obtained.	翻訳日:2023-07-27 11:59:14 公開日:2023-07-26
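The block below sketches the matching-of-wavefunctions step. It assumes dimensionless units in which the Hamiltonian is $H=-\frac{d^2}{dx^2}+x^2+a\,\theta(x)$, with $\theta$ the Heaviside step. These units are our assumption, and the paper's normalization may differ. In these units the unshifted spectrum is $E=2n+1$. A jump $a=4\ell$ therefore shifts the level index by $2\ell$ without changing its parity, which is one way to see why these values admit Hermite-polynomial solutions.

```latex
% Assumed units: H = -d^2/dx^2 + x^2 + a\,\theta(x).
% Hermite functions \phi_n(x) = H_n(x)\,e^{-x^2/2} satisfy -\phi_n'' + x^2\phi_n = (2n+1)\phi_n.
\psi(x) =
\begin{cases}
  A\,\phi_m(x),         & x < 0,\\[2pt]
  B\,\phi_{m-2\ell}(x), & x > 0,
\end{cases}
\qquad
E = 2m+1,\quad E - a = 2(m-2\ell)+1 \quad (a = 4\ell,\ m \ge 2\ell).
```

Both pieces decay at $\pm\infty$ and have the same parity, so only one matching condition is non-trivial:

- For even $m$, both derivatives vanish at $x=0$, and continuity of $\psi$ fixes $B/A=\phi_m(0)/\phi_{m-2\ell}(0)$.
- For odd $m$, both functions vanish at $x=0$, and continuity of $\psi'$ fixes $B/A=\phi_m'(0)/\phi_{m-2\ell}'(0)$.

This shows only that such states exist for $a=4\ell$. It does not claim that they exhaust the spectrum.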
# 説明可能な人工知能(XAI)の評価手法の新しい展望 A New Perspective on Evaluation Methods for Explainable Artificial Intelligence (XAI) ( http://arxiv.org/abs/2307.14246v1 ) ライセンス: Link先を確認	Timo Speith, Markus Langer	(参考訳) 要求工学(RE)の分野では、AI支援システムをユーザのニーズ、社会的期待、規制基準に整合させるうえで、説明可能な人工知能(XAI)の重要性が増していることが認識されている。一般に、説明可能性はシステム品質に影響を与える重要な非機能要件として浮上している。しかし、説明可能性と性能の間にあるとされるトレードオフは、説明可能性が肯定的な影響をもつという前提に疑問を投げかける。説明可能性の要件を満たすことがシステム性能の低下を伴う場合、これらの品質面のどちらを優先し、両者の間でどのように妥協するかを慎重に検討する必要がある。本稿では、このいわゆるトレードオフを批判的に検討する。我々は、リソースの可用性、ドメインの特徴、リスクの考慮を取り入れた、きめ細かな方法でこの問題に取り組むのが最善であると論じる。この研究は、将来の研究とベストプラクティスの基礎を提供することで、AIのためのREの分野を前進させることを目指している。 Within the field of Requirements Engineering (RE), the increasing significance of Explainable Artificial Intelligence (XAI) in aligning AI-supported systems with user needs, societal expectations, and regulatory standards has garnered recognition. In general, explainability has emerged as an important non-functional requirement that impacts system quality. However, the supposed trade-off between explainability and performance challenges the presumed positive influence of explainability. If meeting the requirement of explainability entails a reduction in system performance, then careful consideration must be given to which of these quality aspects takes precedence and how to compromise between them. In this paper, we critically examine the alleged trade-off. We argue that it is best approached in a nuanced way that incorporates resource availability, domain characteristics, and considerations of risk. By providing a foundation for future research and best practices, this work aims to advance the field of RE for AI.	翻訳日:2023-07-27 11:59:04 公開日:2023-07-26
# 蛍光神経細胞 v2: 顕微鏡における深層学習のためのマルチタスク・マルチフォーマットアノテーション Fluorescent Neuronal Cells v2: Multi-Task, Multi-Format Annotations for Deep Learning in Microscopy ( http://arxiv.org/abs/2307.14243v1 ) ライセンス: Link先を確認	Luca Clissa, Antonio Macaluso, Roberto Morelli, Alessandra Occhinegro, Emiliana Piscitiello, Ludovico Taddei, Marco Luppi, Roberto Amici, Matteo Cerri, Timna Hitrec, Lorenzo Rinaldi, Antonio Zoccoli	(参考訳) Fluorescent Neuronal Cells v2(蛍光神経細胞v2)は、生命科学と深層学習の領域における革新的な研究を促進するために設計された、蛍光顕微鏡画像とそれに対応する正解(ground-truth)アノテーションのコレクションである。このデータセットは3つの画像コレクションからなる。いずれの画像でも、げっ歯類の神経細胞の核と細胞質が様々なマーカーで染色され、その解剖学的または機能的特徴が強調されている。画像に加えて、セマンティックセグメンテーション、物体検出、カウントなど、いくつかの学習タスクのための正解アノテーションを提供する。貢献は2つある。第1に、アノテーションの多様さと利用しやすい形式により、セグメンテーション、検出、特徴学習、教師なし・自己教師あり学習、転移学習、および関連分野におけるコンピュータビジョン手法の方法論的進歩が促進されることを期待している。第2に、広範な探索とベンチマークを可能にすることで、Fluorescent Neuronal Cells v2が蛍光顕微鏡解析におけるブレークスルーを促し、生命科学における最先端の発見を後押しすることを期待する。データは以下で入手できる: https://amsacta.unibo.it/id/eprint/7347 Fluorescent Neuronal Cells v2 is a collection of fluorescence microscopy images and the corresponding ground-truth annotations, designed to foster innovative research in the domains of Life Sciences and Deep Learning. This dataset encompasses three image collections in which rodent neuronal cells' nuclei and cytoplasm are stained with diverse markers to highlight their anatomical or functional characteristics. Alongside the images, we provide ground-truth annotations for several learning tasks, including semantic segmentation, object detection, and counting. The contribution is two-fold. First, given the variety of annotations and their accessible formats, we envision our work facilitating methodological advancements in computer vision approaches for segmentation, detection, feature learning, unsupervised and self-supervised learning, transfer learning, and related areas. Second, by enabling extensive exploration and benchmarking, we hope Fluorescent Neuronal Cells v2 will catalyze breakthroughs in fluorescence microscopy analysis and promote cutting-edge discoveries in life sciences. 
The data are available at: https://amsacta.unibo.it/id/eprint/7347	翻訳日:2023-07-27 11:58:50 公開日:2023-07-26
# 領域の局所化とインペインティングの統合による敵対的パッチの防御 Defending Adversarial Patches via Joint Region Localizing and Inpainting ( http://arxiv.org/abs/2307.14242v1 ) ライセンス: Link先を確認	Junwen Chen, Xingxing Wei	(参考訳) 深層ニューラルネットワークは様々なアプリケーションでうまく使われているが、敵対的サンプルに対して脆弱である。敵対的パッチの発展により物理的シーンにおける攻撃の実現可能性が高まり、パッチ攻撃に対する防御が緊急に必要とされている。しかし、このような敵対的パッチ攻撃の防御は依然として未解決の問題である。本稿では敵対的パッチの特性を解析し、次の2点を見出した。一方では、敵対的パッチは対象物体の外観または文脈の不整合を引き起こす。他方では、パッチ領域は、バックボーンネットワークが抽出した物体の高レベル特徴マップ上で異常な変化を示す。上記の2点を考慮し、入力例を前処理する「局所化とインペインティング(localizing and inpainting)」機構に基づく新たな防御手法を提案する。具体的には統一フレームワークを設計する。その「局所化」サブネットワークは、上記の2つの側面を表現する2分岐構造を用いて、画像中の敵対的パッチ領域を正確に検出する。「インペインティング」サブネットワークは、周囲の文脈的手がかりを利用して、敵対的パッチに覆われた元の内容を復元する。インペインティング後の画像の品質も、外観の一貫性と敵対的攻撃の影響を測定することで評価する。これら2つのサブネットワークは、反復最適化によって共同で訓練される。これにより、「局所化」モジュールと「インペインティング」モジュールは互いに密接に相互作用し、より良い解を学習できる。様々な敵対的パッチ攻撃に対する防御として、交通標識の分類・検出タスクで一連の実験を行った。 Deep neural networks are successfully used in various applications, but are vulnerable to adversarial examples. With the development of adversarial patches, the feasibility of attacks in physical scenes increases, and defenses against patch attacks are urgently needed. However, defending against such adversarial patch attacks is still an unsolved problem. In this paper, we analyse the properties of adversarial patches, and find that: on the one hand, adversarial patches lead to appearance or contextual inconsistency in the target objects; on the other hand, the patch region shows abnormal changes on the high-level feature maps of the objects extracted by a backbone network. Considering the above two points, we propose a novel defense method based on a "localizing and inpainting" mechanism to pre-process the input examples. Specifically, we design a unified framework, where the "localizing" sub-network utilizes a two-branch structure to represent the above two aspects to accurately detect the adversarial patch region in the image. 
The "inpainting" sub-network utilizes the surrounding contextual cues to recover the original content covered by the adversarial patch. The quality of inpainted images is also evaluated by measuring the appearance consistency and the effects of adversarial attacks. These two sub-networks are then jointly trained in an iterative optimization manner. In this way, the "localizing" and "inpainting" modules can interact closely with each other, and thus learn a better solution. A series of experiments on traffic sign classification and detection tasks is conducted to defend against various adversarial patch attacks.	翻訳日:2023-07-27 11:58:28 公開日:2023-07-26
# DisguisOR: 手術室のための全体的な顔匿名化 DisguisOR: Holistic Face Anonymization for the Operating Room ( http://arxiv.org/abs/2307.14241v1 ) ライセンス: Link先を確認	Lennart Bastian, Tony Danjun Wang, Tobias Czempiel, Benjamin Busam and Nassir Navab	(参考訳) 目的: 手術データサイエンス(SDS)の進歩により、病院環境からの映像記録が増加している。手術ワークフロー認識のような手法は患者ケアの質を高める可能性を示しているが、ビデオデータの量は手作業で画像を匿名化できる規模を超えている。既存の2次元自動匿名化手法は、遮蔽や障害物のために手術室(OR)では十分な性能を発揮しない。我々は、複数のカメラストリームからの3Dデータを用いて、マルチビューのOR記録を匿名化することを提案する。方法: 複数のカメラからのRGB画像と深度画像を、シーンの3D点群表現に融合する。次に、検出された3D人体キーポイントにパラメトリック人体メッシュモデルを回帰させ、顔メッシュを融合3D点群と整合させることで、各人物の顔を3Dで検出する。メッシュモデルは取得した各カメラビューにレンダリングされ、各人物の顔を置き換える。結果: 本手法は、既存のアプローチよりも高い割合で顔の位置を特定できる見込みを示す。DisguisORは各カメラビューに対して幾何学的に一貫した匿名化を生成する。これにより、下流タスクへの悪影響が少ない、より現実的な匿名化が可能になる。結論: 手術室では遮蔽や混雑が頻繁に生じるため、既製の匿名化手法には大きな改善の余地がある。DisguisORはシーンレベルでプライバシーに対処し、SDSにおけるさらなる研究を促進する可能性がある。 Purpose: Recent advances in Surgical Data Science (SDS) have contributed to an increase in video recordings from hospital environments. While methods such as surgical workflow recognition show potential in increasing the quality of patient care, the quantity of video data has surpassed the scale at which images can be manually anonymized. Existing automated 2D anonymization methods underperform in Operating Rooms (OR) due to occlusions and obstructions. We propose to anonymize multi-view OR recordings using 3D data from multiple camera streams. Methods: RGB and depth images from multiple cameras are fused into a 3D point cloud representation of the scene. We then detect each individual's face in 3D by regressing a parametric human mesh model onto detected 3D human keypoints and aligning the face mesh with the fused 3D point cloud. The mesh model is rendered into every acquired camera view, replacing each individual's face. Results: Our method shows promise in locating faces at a higher rate than existing approaches. DisguisOR produces geometrically consistent anonymizations for each camera view, enabling more realistic anonymization that is less detrimental to downstream tasks. 
Conclusion: Frequent obstructions and crowding in operating rooms leave significant room for improvement for off-the-shelf anonymization methods. DisguisOR addresses privacy on a scene level and has the potential to facilitate further research in SDS.	翻訳日:2023-07-27 11:57:56 公開日:2023-07-26
# 意見要約における意見の浸透度の自動評価 Automatically Evaluating Opinion Prevalence in Opinion Summarization ( http://arxiv.org/abs/2307.14305v1 ) ライセンス: Link先を確認	Christopher Malon	(参考訳) 多数の製品レビューに直面したとき、人間がそれらすべてを記憶し、意見を代表的に重み付けして良い参照要約を書けるかどうかは明らかではない。本稿では、要約が表現する意見の浸透度(prevalence)を検査する自動指標を提案する。この指標は、要約中の各文と整合するレビューの数を数え、自明な文や冗長な文は割り引いて扱う。この意見浸透度指標を定式化するために、個々のソースレビューに対する要約文の事実整合性をスコア付けする既存の手法をいくつか検討する。Amazon製品レビューのコーパスにおいて、意見整合性に関する複数の人間の判断を収集し、どの自動指標が製品レビューにおける整合性を最もよく表現するかを決定する。得られた意見浸透度指標を用いて、2つのことを示す。人間が書いた要約の意見浸透度は、ソースレビューからランダムに選んだ抽出文よりわずかに優れているにすぎない。また、従来の抽出型・抽象型の教師なし意見要約手法は人間より劣る。人間の2倍の意見浸透度を達成する抽出型要約を貪欲に構成することで、改善の余地があることを示す。最後に、ソースレビューを簡略化によって前処理すれば、既存の抽象型意見要約システムが達成する意見浸透度を人間の性能の水準まで引き上げられることを示す。 When faced with a large number of product reviews, it is not clear that a human can remember all of them and weight opinions representatively to write a good reference summary. We propose an automatic metric to test the prevalence of the opinions that a summary expresses, based on counting the number of reviews that are consistent with each statement in the summary, while discrediting trivial or redundant statements. To formulate this opinion prevalence metric, we consider several existing methods to score the factual consistency of a summary statement with respect to each individual source review. On a corpus of Amazon product reviews, we gather multiple human judgments of the opinion consistency, to determine which automatic metric best expresses consistency in product reviews. Using the resulting opinion prevalence metric, we show that a human-authored summary has only slightly better opinion prevalence than randomly selected extracts from the source reviews, and previous extractive and abstractive unsupervised opinion summarization methods perform worse than humans. We demonstrate room for improvement with a greedy construction of extractive summaries with twice the opinion prevalence achieved by humans. 
Finally, we show that preprocessing source reviews by simplification can raise the opinion prevalence achieved by existing abstractive opinion summarization systems to the level of human performance.	翻訳日:2023-07-27 11:52:31 公開日:2023-07-26
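The abstract describes the metric only in outline: count how many reviews are consistent with each summary statement, and discredit trivial or redundant statements. The following minimal Python sketch rests on stated assumptions and is not the paper's formula:

- `is_consistent` stands in for whichever factual-consistency scorer is chosen.
- `is_trivial` is a hypothetical triviality filter.
- Redundancy is reduced to exact repeats after case and whitespace normalization.
- Scores are averaged over every statement, so discredited statements still count in the denominator.

```python
from typing import Callable, List


def opinion_prevalence(
    summary_statements: List[str],
    reviews: List[str],
    is_consistent: Callable[[str, str], bool],
    is_trivial: Callable[[str], bool],
) -> float:
    """Mean fraction of reviews that support each summary statement.

    Trivial and repeated statements score 0 but still count in the
    denominator, so padding a summary with them lowers the result.
    """
    if not summary_statements or not reviews:
        return 0.0
    seen = set()
    total = 0.0
    for statement in summary_statements:
        key = " ".join(statement.lower().split())  # crude normalization for redundancy
        redundant = key in seen
        seen.add(key)
        if redundant or is_trivial(statement):
            continue  # discredited: contributes nothing
        support = sum(1 for review in reviews if is_consistent(statement, review))
        total += support / len(reviews)
    return total / len(summary_statements)
```

Averaging over all statements, rather than only the kept ones, is a design choice: it is what makes a summary padded with filler score lower than a concise one.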
# 最適エネルギー貯蔵システムディスパッチのための制約強制深層強化学習フレームワーク A Constraint Enforcement Deep Reinforcement Learning Framework for Optimal Energy Storage Systems Dispatch ( http://arxiv.org/abs/2307.14304v1 ) ライセンス: Link先を確認	Shengren Hou and Edgar Mauricio Salazar Duque and Peter Palensky and Pedro P. Vergara	(参考訳) エネルギー貯蔵システム(ESS)の最適ディスパッチは、深刻な課題を提起する。動的価格、需要消費、再生可能エネルギー発電の変動が不確実性をもたらすためである。深層ニューラルネットワーク(DNN)の汎化能力を活用することで、深層強化学習(DRL)アルゴリズムは、配電網の確率的性質に適応的に対応する良質な制御モデルを学習できる。しかし、現在のDRLアルゴリズムには運用上の制約を厳密に課す能力が欠けており、実行不可能な制御行動を出力することさえ多い。この問題に対処するために、DRLフレームワークを提案する。このフレームワークは連続的な行動空間を効果的に扱い、オンライン運用中には環境と行動空間の運用制約を厳密に課す。まず、提案フレームワークはDNNでモデル化した行動価値関数を訓練する。次に、この行動価値関数を混合整数計画(MIP)として定式化し、環境の運用制約を考慮できるようにする。包括的な数値シミュレーションにより、提案するMIP-DRLフレームワークの優れた性能が示される。比較対象は、最先端のDRLアルゴリズムと、確率変数の完全予測で得られる最適解である。提案フレームワークは、すべての制約を効果的に満たしつつ高品質なディスパッチ決定を行う。 The optimal dispatch of energy storage systems (ESSs) presents formidable challenges due to the uncertainty introduced by fluctuations in dynamic prices, demand consumption, and renewable-based energy generation. By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adaptively respond to distribution networks' stochastic nature. However, current DRL algorithms lack the capabilities to enforce operational constraints strictly, often even providing infeasible control actions. To address this issue, we propose a DRL framework that effectively handles continuous action spaces while strictly enforcing the operational constraints of the environment and action space during online operation. Firstly, the proposed framework trains an action-value function modeled using DNNs. Subsequently, this action-value function is cast as a mixed-integer programming (MIP) formulation, enabling the consideration of the environment's operational constraints. 
Comprehensive numerical simulations show the superior performance of the proposed MIP-DRL framework, effectively enforcing all constraints while delivering high-quality dispatch decisions when compared with state-of-the-art DRL algorithms and the optimal solution obtained with a perfect forecast of the stochastic variables.	翻訳日:2023-07-27 11:52:10 公開日:2023-07-26
# ホテル・ホスピタリティにおけるパーソナライズドレコメンデーションの管理と提供のためのChatGPTと説得技術 ChatGPT and Persuasive Technologies for the Management and Delivery of Personalized Recommendations in Hotel Hospitality ( http://arxiv.org/abs/2307.14298v1 ) ライセンス: Link先を確認	Manolis Remountakis, Konstantinos Kotis, Babis Kourtzis, and George E. Tsekouras	(参考訳) レコメンダシステムはホテル・ホスピタリティ業界で不可欠なツールとなり、ゲストにパーソナライズされた、個々に合わせた体験を可能にしている。近年、ChatGPTなどの大規模言語モデル(LLM)や説得技術の進歩により、これらのシステムの有効性を高める新たな道が開かれた。本稿では、ホテル・ホスピタリティのレコメンダシステムを自動化・改善するために、ChatGPTと説得技術を統合する可能性を探る。まず、ChatGPTの機能を詳しく調べる。ChatGPTは人間のようなテキストを理解・生成でき、より正確で文脈を考慮したレコメンデーションを可能にする。ChatGPTのレコメンダシステムへの統合について論じ、ユーザの好みを分析し、オンラインレビューから有益な洞察を抽出し、ゲストのプロフィールに基づいてパーソナライズされたレコメンデーションを生成する能力を強調する。第2に、ユーザの行動に影響を与え、ホテルのレコメンデーションの説得力を高めるうえでの説得技術の役割を調べる。社会的証明、希少性、パーソナライゼーションといった説得手法を取り入れれば、レコメンダシステムはユーザの意思決定に効果的に影響を与えることができる。その結果、特定のホテルの予約や部屋のアップグレードといった望ましい行動を促せる。ChatGPTと説得技術の有効性を調べるために、ホテルのレコメンダシステムを対象とした事例研究によるパイロット実験を示す。目的は、ChatGPTと説得手法の統合がユーザのエンゲージメント、満足度、コンバージョン率に与える影響を調べることである。予備的な結果は、これらの技術がゲスト体験全体とビジネスパフォーマンスを向上させる可能性を示している。全体として本稿は、レコメンダシステムにおけるLLMと説得技術の相乗関係を探ることで、ホテル・ホスピタリティ分野に貢献する。この相乗関係は、最終的にゲストの満足度とホテルの収益に影響を与える。 Recommender systems have become indispensable tools in the hotel hospitality industry, enabling personalized and tailored experiences for guests. Recent advancements in large language models (LLMs), such as ChatGPT, and persuasive technologies, have opened new avenues for enhancing the effectiveness of those systems. This paper explores the potential of integrating ChatGPT and persuasive technologies for automating and improving hotel hospitality recommender systems. First, we delve into the capabilities of ChatGPT, which can understand and generate human-like text, enabling more accurate and context-aware recommendations. We discuss the integration of ChatGPT into recommender systems, highlighting the ability to analyze user preferences, extract valuable insights from online reviews, and generate personalized recommendations based on guest profiles. 
Second, we investigate the role of persuasive technology in influencing user behavior and enhancing the persuasive impact of hotel recommendations. By incorporating persuasive techniques, such as social proof, scarcity and personalization, recommender systems can effectively influence user decision-making and encourage desired actions, such as booking a specific hotel or upgrading their room. To investigate the efficacy of ChatGPT and persuasive technologies, we present a pilot experiment with a case study involving a hotel recommender system. We aim to study the impact of integrating ChatGPT and persuasive techniques on user engagement, satisfaction, and conversion rates. The preliminary results demonstrate the potential of these technologies in enhancing the overall guest experience and business performance. Overall, this paper contributes to the field of hotel hospitality by exploring the synergistic relationship between LLMs and persuasive technology in recommender systems, ultimately influencing guest satisfaction and hotel revenue.	翻訳日:2023-07-27 11:51:47 公開日:2023-07-26
# 逐次データ分割の複雑さを解き放つ:ビデオと時系列分析における課題に取り組む Unraveling the Complexity of Splitting Sequential Data: Tackling Challenges in Video and Time Series Analysis ( http://arxiv.org/abs/2307.14294v1 ) ライセンス: Link先を確認	Diego Botache, Kristina Dingel, Rico Huhnstock, Arno Ehresmann, Bernhard Sick	(参考訳) ビデオや時系列などのシーケンシャルデータの分割は、オブジェクト追跡や異常検出など、さまざまなデータ分析タスクにおいて重要なステップである。しかし、逐次データを分割することは、その後の分析の正確性と信頼性に影響を与える様々な課題をもたらす。本稿では、逐次データ分割に関わる課題について考察する。具体的には、データ取得、データ表現、分割比の選択、品質基準の設定、適切な選択戦略の選択を扱う。これらの課題を、モーター試験台と液体中の粒子追跡という2つの実例を通して探求する。 Splitting of sequential data, such as videos and time series, is an essential step in various data analysis tasks, including object tracking and anomaly detection. However, splitting sequential data presents a variety of challenges that can impact the accuracy and reliability of subsequent analyses. This concept article examines the challenges associated with splitting sequential data, including data acquisition, data representation, split ratio selection, setting up quality criteria, and choosing suitable selection strategies. We explore these challenges through two real-world examples: motor test benches and particle tracking in liquids.	翻訳日:2023-07-27 11:51:12 公開日:2023-07-26
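The article lists split-ratio selection and selection strategies among the challenges. The sketch below shows one common baseline for sequential data. It is an illustration, not the article's recommendation. The split is contiguous and time-ordered, and a buffer of `gap` samples between subsets is discarded so that highly correlated neighbouring frames do not leak from training into evaluation. The function name and the `gap` parameter are hypothetical.

```python
from typing import List, Sequence


def chronological_split(n: int, ratios: Sequence[float], gap: int = 0) -> List[range]:
    """Split indices 0..n-1 into contiguous blocks in time order.

    `gap` indices are discarded between consecutive blocks so that strongly
    correlated neighbouring frames do not end up on both sides of a split.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    usable = n - gap * (len(ratios) - 1)
    if usable <= 0:
        raise ValueError("gap is too large for n")
    blocks: List[range] = []
    start = 0
    assigned = 0
    for i, ratio in enumerate(ratios):
        is_last = i == len(ratios) - 1
        # the last block absorbs any rounding remainder
        size = usable - assigned if is_last else round(usable * ratio)
        blocks.append(range(start, start + size))
        assigned += size
        start += size + gap
    return blocks
```

For example, `chronological_split(100, (0.8, 0.2), gap=10)` yields `range(0, 72)` and `range(82, 100)`. The ratios apply to the 90 indices left after the gap is removed.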
# 言語学における数学的拡散モデルの構築: イタリア北東部方言におけるドイツ語統語的特徴の事例研究 Founding a mathematical diffusion model in linguistics. The case study of German syntactic features in the North-Eastern Italian dialects ( http://arxiv.org/abs/2307.14291v1 ) ライセンス: Link先を確認	I. Lazzizzera	(参考訳) 本研究は、イタリア北東部のロマンス方言へのゲルマン語統語的特徴の拡散を事例研究として取り上げる。この拡散は、中世盛期にドイツ人がチロルへ移住した後に生じた。地理データサイエンスと呼ばれる分野のツールを用いてインタラクティブマップを作成する。滑らかな2次元曲面 $\mathcal{G}$ は、与えられたドイツ語の特徴を使用する領域の割合を局所的に表す。これは、調査した各地点でその特徴が使われているか否かを示す離散関数を補間して得られる。この曲面 $\mathcal{G}$ は、2次元の拡散対流現象(ここでは「潮汐(tidal)」モードと呼ぶ)を記述する関数の現在時刻における値とみなされる。その関数は、熱拡散など物理学で多くの現象に用いられるのと同じ方程式に、適切に文脈化したうえで、ごく自然に従う。現在時刻で評価したこの方程式の解は、$\mathcal{G}$ で補間したデータによく適合することが示される。したがって、単純化と近似はあるものの、事例研究の言語的特徴の拡散対流について説得力のある描像が得られる。非常に重要な点として、シュミットの「波」が拡散方程式の解に含まれることが示される: シュミットの「波」を「潮汐的な氾濫」に重ね合わせることで、実際の言語拡散現象の複雑さを再現できる。 We take as a case study the spread of Germanic syntactic features into Romance dialects of North-Eastern Italy, which occurred after the immigration of German people in the Tyrol during the High Middle Ages. An interactive map is produced using tools of what is called Geographic Data Science. A smooth two-dimensional surface $\mathcal{G}$ expresses locally which fraction of territory uses a given German language feature: it is obtained by interpolating a discrete function that says if at any surveyed locality that feature is used or not. This surface $\mathcal{G}$ is thought of as the value at the present time of a function describing a diffusion-convection phenomenon in two dimensions (here called the "tidal" mode), which is subject in a very natural way to the same equation, suitably contextualized, used in physics for a number of phenomena such as heat diffusion. 
It is shown that solutions of this equation, evaluated at the present time, fit well with the data as interpolated by $\mathcal{G}$, thus providing convincing pictures of diffusion-convection of the linguistic features of the case study, despite simplifications and approximations. Very importantly, it is shown that Schmidt's 'waves' can be counted among the solutions of the diffusion equation: superimposing Schmidt 'waves' on a 'tidal flooding' can reproduce the complexities of real linguistic diffusion events.	翻訳日:2023-07-27 11:51:02 公開日:2023-07-26
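For orientation, the block below writes out the standard two-dimensional diffusion-convection equation and its point-source solution on the unbounded plane. It assumes a constant diffusion coefficient $D$ and a constant drift velocity $\mathbf{v}$. Here $u$ plays the role that the surface $\mathcal{G}$ plays in the abstract. The paper's contextualized equation, coefficients and boundary conditions may differ.

```latex
% u(\mathbf{x}, t): local fraction of territory using the feature (0 <= u <= 1)
% D: diffusion coefficient, \mathbf{v}: drift velocity (both assumed constant)
% M: initial "mass" released at \mathbf{x}_0 at t = 0
\frac{\partial u}{\partial t} = D\,\nabla^{2}u - \mathbf{v}\cdot\nabla u,
\qquad
u(\mathbf{x},t) = \frac{M}{4\pi D t}\,
  \exp\!\left(-\frac{\lVert \mathbf{x}-\mathbf{x}_0-\mathbf{v}\,t\rVert^{2}}{4Dt}\right).
```

The Gaussian spreads with variance $2Dt$ per axis while its centre drifts with velocity $\mathbf{v}$. This is the kind of spreading-plus-drift picture the abstract describes.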
# 皮膚の共位置合わせに基づくUS・MR画像融合 US & MR Image-Fusion Based on Skin Co-Registration ( http://arxiv.org/abs/2307.14288v1 ) ライセンス: Link先を確認	Martina Paccini, Giacomo Paschina, Stefano De Beni, Giuseppe Patanè	(参考訳) 医用画像の高度な可視化、表現、解析のための革新的なソリューションの研究開発には、さまざまな研究の方向性がある。医用画像における現在の実践は、リアルタイムUSと、CT、MRI、PETなど内部解剖の取得を可能にする画像モダリティとを組み合わせることにある。画像融合アプローチは、介入中の手術器具や針のリアルタイム追跡に応用されている。そこで本研究では、3Dカメラセンサを活用して、CT画像およびMRI画像をリアルタイムのUS取得と位置合わせする融合画像システムを提案する。本研究の主な焦点は、システムの可搬性と、さまざまな解剖学的部位への適用性である。 The study and development of innovative solutions for the advanced visualisation, representation and analysis of medical images offer different research directions. Current practice in medical imaging consists in combining real-time US with imaging modalities that allow internal anatomy acquisitions, such as CT, MRI, PET or similar. Application of image-fusion approaches can be found in tracking surgical tools and/or needles in real time during interventions. Thus, this work proposes a fusion imaging system for the registration of CT and MRI images with real-time US acquisition leveraging a 3D camera sensor. The main focus of the work is the portability of the system and its applicability to different anatomical districts.	翻訳日:2023-07-27 11:50:34 公開日:2023-07-26
# 米国の都市における極端気温予測のための新たな統計的機械学習手法 Emerging Statistical Machine Learning Techniques for Extreme Temperature Forecasting in U.S. Cities ( http://arxiv.org/abs/2307.14285v1 ) ライセンス: Link先を確認	Kameron B. Kinast and Ernest Fokoué	(参考訳) 本稿では、新たな統計的機械学習手法を用いて、極端な気温パターンの包括的な解析を行う。本研究は、気候時系列予測における各種統計モデルの有効性の探索と比較に焦点をあてる。これらのモデルには、自己回帰和分移動平均、指数平滑化、多層パーセプトロン、ガウス過程が含まれる。これらの手法を、米国で最も人口の多い5都市の気候時系列データに適用する。その際、PythonとJuliaを用いて、気候変動とその影響を理解するうえでの統計計算の役割を示す。本研究は統計手法間の違いを明らかにし、最も効果的なアプローチとして多層パーセプトロンを特定した。さらに、この最良の手法を用いて2030年までの極端な気温を予測する。また、気温の変化が0より大きいかどうかを検証することで仮説を検定する。 In this paper, we present a comprehensive analysis of extreme temperature patterns using emerging statistical machine learning techniques. Our research focuses on exploring and comparing the effectiveness of various statistical models for climate time series forecasting. The models considered include Auto-Regressive Integrated Moving Average, Exponential Smoothing, Multilayer Perceptrons, and Gaussian Processes. We apply these methods to climate time series data from the five most populous U.S. cities, utilizing Python and Julia to demonstrate the role of statistical computing in understanding climate change and its impacts. Our findings highlight the differences between the statistical methods and identify Multilayer Perceptrons as the most effective approach. Additionally, we project extreme temperatures using this best-performing method, up to 2030, and examine whether the temperature changes are greater than zero, thereby testing a hypothesis.	翻訳日:2023-07-27 11:50:24 公開日:2023-07-26
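Exponential smoothing is one of the compared model families. The sketch below shows only its simplest, non-seasonal form, $s_t=\alpha x_t+(1-\alpha)s_{t-1}$, which produces one-step-ahead forecasts. The abstract does not say which variant the authors used (for example, with trend or seasonal terms) or which $\alpha$, so treat this as an illustration.

```python
from typing import List, Sequence


def simple_exponential_smoothing(series: Sequence[float], alpha: float) -> List[float]:
    """One-step-ahead forecasts from simple exponential smoothing.

    For a non-empty series, returns len(series) + 1 values: forecasts[t]
    predicts series[t], and the last value is the forecast for the next,
    unseen step.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    level = float(series[0])      # initialise the level with the first observation
    forecasts = [level]           # no history for x_0, so it is "forecast" as itself
    for x in series[1:]:
        forecasts.append(level)   # forecast for x, made before observing it
        level = alpha * x + (1 - alpha) * level
    forecasts.append(level)       # forecast for the step after the last observation
    return forecasts
```

For example, `simple_exponential_smoothing([10, 20, 30], 0.5)` returns `[10.0, 10.0, 15.0, 22.5]`. A larger $\alpha$ tracks recent values more closely at the cost of more noise.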
# 汎用人工知能システム(GPAIS): 特性、定義、分類、未解決の課題と意義 General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Open Challenges and Implications ( http://arxiv.org/abs/2307.14283v1 ) ライセンス: Link先を確認	Isaac Triguero, Daniel Molina, Javier Poyatos, Javier Del Ser, Francisco Herrera	(参考訳) 人工知能(AI)のほとんどの応用は、限定された特定のタスクのために設計されている。しかし、より汎用的なAIが求められるシナリオは多い。そのようなAIは、特にそのために設計されていなくても幅広いタスクを解決できる。汎用人工知能システム(General-Purpose Artificial Intelligence Systems, GPAIS)は、こうしたAIシステムを指す用語として定義されている。汎用人工知能(Artificial General Intelligence)とは、人間のようにあらゆる知的タスクを遂行でき、さらにはそれを上回りうるほど強力な知能である。その可能性は、これまでのところ願望やフィクションにとどまり、社会にとってのリスクと見なされてきた。それを達成するにはまだ程遠いかもしれないが、GPAISは現実のものであり、AI研究の最前線にある。本稿では、GPAISの既存の定義について論じ、その特性と限界に応じてGPAISの種類を段階的に区別できる新たな定義を提案する。クローズドワールドとオープンワールドのGPAISを区別し、いくつかの要因に基づいてその自律性と能力の程度を特徴づける。要因には、新しいタスクへの適応、意図的に訓練されていないドメインでの能力、少ないデータから学習する能力、自身の限界を積極的に認めることなどがある。次に、GPAISを実現するためのアプローチの分類を提案し、研究動向について述べる。その例として、別のAIを改善するためのAI技術の利用や基盤モデルがある。代表例として生成AIを詳しく取り上げ、分類で示した用語と概念に対応づける。汎用タスクに取り組むさまざまな分野は多くの共通点をもつ。そのため、提案した定義と分類を通じて、それらの分野間の研究協力を促進することを目指す。最後に、GPAISの全体像を示すことを目標に、いくつかの論点について議論する。具体的には、GPAISの現状、課題と展望、社会への影響、そして責任ある信頼できるAIシステムと規制の必要性である。 Most applications of Artificial Intelligence (AI) are designed for a confined and specific task. However, there are many scenarios that call for a more general AI, capable of solving a wide array of tasks without being specifically designed for them. The term General-Purpose Artificial Intelligence Systems (GPAIS) has been defined to refer to these AI systems. To date, the possibility of an Artificial General Intelligence powerful enough to perform any intellectual task as if it were human, or even to improve on it, has remained an aspiration and a fiction, and has been considered a risk for our society. Whilst we might still be far from achieving that, GPAIS is a reality and sits at the forefront of AI research. This work discusses existing definitions for GPAIS and proposes a new definition that allows for a gradual differentiation among types of GPAIS according to their properties and limitations. 
We distinguish between closed-world and open-world GPAIS, characterising their degree of autonomy and ability based on several factors such as adaptation to new tasks, competence in domains not intentionally trained for, ability to learn from few data, or proactive acknowledgment of their own limitations. We then propose a taxonomy of approaches to realise GPAIS, describing research trends such as the use of AI techniques to improve another AI, or foundation models. As a prime example, we delve into generative AI, aligning it with the terms and concepts presented in the taxonomy. Through the proposed definition and taxonomy, our aim is to facilitate research collaboration across different areas that are tackling general-purpose tasks, as they share many common aspects. Finally, we discuss the current state of GPAIS, its challenges and prospects, implications for our society, and the need for responsible and trustworthy AI systems and regulation, with the goal of providing a holistic view of GPAIS.	翻訳日:2023-07-27 11:50:09 公開日:2023-07-26
# 大規模完全教師なし再識別 Large-scale Fully-Unsupervised Re-Identification ( http://arxiv.org/abs/2307.14278v1 ) ライセンス: Link先を確認	Gabriel Bertocco, Fernanda Andaló, Terrance E. Boult, and Anderson Rocha	(参考訳) 完全教師なしの人物・車両再識別への注目が高まっている。手作業によるアノテーションを一切必要とせず、監視、法医学、イベント理解、スマートシティに広く適用できるためである。しかしながら、従来技術のほとんどは、わずか数千サンプルのデータセットで評価されている。このような小規模データの設定では、クラスタリング結果を改善するために、Re-Rankingのような時間とメモリの面でコストの高い手法を使えることが多い。さらに、以前の研究の中には、データセットごとに最適なクラスタリングのハイパーパラメータを事前に選択しているものさえある。これは大規模な完全教師なしのシナリオでは非現実的である。この文脈において、本研究はより現実的なシナリオに取り組み、大規模なラベルなしデータから学習するための2つの戦略を提案する。第1の戦略は局所近傍サンプリングを行い、近傍関係を損なうことなく各イテレーションのデータセットサイズを削減する。第2の戦略は新しいRe-Ranking手法を活用する。この手法は時間計算量の上界がより低く、メモリ計算量を O(n^2) から O(kn) (k << n) に削減する。また、クラスタリングアルゴリズムの特定のハイパーパラメータ値を事前に選択しなくて済むよう、新しいスケジューリングアルゴリズムを提示する。このアルゴリズムは訓練中に密度パラメータを調整し、サンプルの多様性を活用しつつ、ノイズの多いラベル付けに対して学習を頑健に保つ。最後に、異なるモデルが学習する相補的な知識を活かすため、共同学習(co-training)戦略を導入する。この戦略はバックボーン間で予測擬似ラベルを入れ替えることに基づき、ハイパーパラメータや重み付けの最適化を一切必要としない。提案手法は、よく知られたベンチマークおよび困難な大規模Veri-Wildデータセットにおいて、最先端の手法を上回る。その要因は、より高速でメモリ効率の良いRe-Ranking戦略と、大規模でノイズに頑健なアンサンブルベースの学習手法である。 Fully-unsupervised Person and Vehicle Re-Identification have received increasing attention due to their broad applicability in surveillance, forensics, event understanding, and smart cities, without requiring any manual annotation. However, most of the prior art has been evaluated in datasets that have just a couple thousand samples. Such small-data setups often allow the use of costly techniques in time and memory footprints, such as Re-Ranking, to improve clustering results. Moreover, some previous work even pre-selects the best clustering hyper-parameters for each dataset, which is unrealistic in a large-scale fully-unsupervised scenario. In this context, this work tackles a more realistic scenario and proposes two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships. 
A second strategy leverages a novel Re-Ranking technique, which has a lower upper bound on time complexity and reduces the memory complexity from $O(n^2)$ to $O(kn)$ with $k \ll n$. To avoid the pre-selection of specific hyper-parameter values for the clustering algorithm, we also present a novel scheduling algorithm that adjusts the density parameter during training, to leverage the diversity of samples and keep the learning robust to noisy labeling. Finally, due to the complementary knowledge learned by different models, we also introduce a co-training strategy that relies upon the permutation of predicted pseudo-labels, among the backbones, with no need for any hyper-parameters or weighting optimization. The proposed methodology outperforms the state-of-the-art methods in well-known benchmarks and in the challenging large-scale Veri-Wild dataset, with a faster and memory-efficient Re-Ranking strategy, and a large-scale, noisy-robust, and ensemble-based learning approach.	翻訳日:2023-07-27 11:49:38 公開日:2023-07-26
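The abstract does not describe the new Re-Ranking technique itself. The sketch below illustrates how re-ranking can stay within $O(kn)$ memory. It keeps only the $k$ nearest neighbours of each sample and builds k-reciprocal neighbour sets from them, the idea behind classic k-reciprocal re-ranking. It then computes a Jaccard distance only for the retained neighbour pairs. This is not the authors' algorithm, all names are hypothetical, and the naive neighbour search still takes $O(n^2)$ time.

```python
import heapq
import math
from typing import Dict, List, Sequence, Set, Tuple


def knn_lists(features: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """k nearest neighbours of each sample, computed one row at a time.

    Only k indices are kept per sample, so storage is O(kn) rather than the
    O(n^2) of a full distance matrix; this naive search still takes O(n^2) time.
    """
    neighbours = []
    for i, fi in enumerate(features):
        row = ((math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i)
        neighbours.append([j for _, j in heapq.nsmallest(k, row)])
    return neighbours


def reciprocal_sets(neighbours: List[List[int]]) -> List[Set[int]]:
    """R(i) = {i} plus every j in N(i) such that i is also in N(j)."""
    as_sets = [set(nb) for nb in neighbours]
    return [{i} | {j for j in as_sets[i] if i in as_sets[j]} for i in range(len(neighbours))]


def jaccard_rerank(neighbours: List[List[int]],
                   recip: List[Set[int]]) -> Dict[Tuple[int, int], float]:
    """Jaccard distance between reciprocal sets, only for (i, j) with j in N(i)."""
    dist = {}
    for i, nb in enumerate(neighbours):
        for j in nb:
            overlap = len(recip[i] & recip[j])
            dist[(i, j)] = 1.0 - overlap / len(recip[i] | recip[j])
    return dist
```

Restricting the Jaccard computation to stored neighbour pairs is what keeps the output at $kn$ entries. Pairs outside those lists are simply treated as far apart.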
# G2L: 測地線とゲーム理論による意味的に整合した一様なビデオグラウンディング G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory ( http://arxiv.org/abs/2307.14277v1 ) ライセンス: Link先を確認	Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou	(参考訳) 最近のビデオグラウンディング研究は、素朴な(vanilla)対照学習をビデオグラウンディングに導入しようと試みている。しかし、この素朴な解は最適ではないと我々は主張する。対照学習には2つの重要な性質が必要である。(1) 類似サンプルの特徴の整列(alignment)と、(2) 超球面上の正規化特徴が誘導する分布の一様性(uniformity)である。ビデオグラウンディングには2つの厄介な問題がある。(1) いくつかの視覚的実体が正解区間と他の区間の両方に共存すること(意味的重複)、(2) ビデオ中のごく一部の区間しかアノテーションされていないこと(疎なアノテーションのジレンマ)である。そのため、素朴な対照学習は時間的に離れた区間の相関をモデル化できず、一貫性のないビデオ表現を学習してしまう。これら2つの特性のため、素朴な対照学習はビデオグラウンディングに適さない。本稿では、測地線とゲーム理論による、意味的に整合した一様なビデオグラウンディングのフレームワークであるGeodesic and Game Localization (G2L) を提案する。測地線距離を利用して区間間の相関を定量化し、モデルが正しいクロスモーダル表現を学習するよう導く。さらに、ゲーム理論という新たな視点から、測地線距離サンプリングに基づく意味的Shapley相互作用を提案し、類似した区間における細粒度の意味的整合を学習する。3つのベンチマークでの実験により、本手法の有効性が示された。 Recent video grounding works attempt to introduce vanilla contrastive learning into video grounding. However, we claim that this naive solution is suboptimal. Contrastive learning requires two key properties: (1) alignment of features of similar samples, and (2) uniformity of the induced distribution of the normalized features on the hypersphere. Due to two troublesome issues in video grounding, namely (1) the co-existence of some visual entities in both the ground truth and other moments, i.e., semantic overlapping, and (2) the annotation of only a few moments in the video, i.e., the sparse annotation dilemma, vanilla contrastive learning is unable to model the correlations between temporally distant moments and learns inconsistent video representations. Both characteristics lead to vanilla contrastive learning being unsuitable for video grounding. In this paper, we introduce Geodesic and Game Localization (G2L), a semantically aligned and uniform video grounding framework via geodesic and game theory. 
We quantify the correlations among moments leveraging the geodesic distance that guides the model to learn the correct cross-modal representations. Furthermore, from the novel perspective of game theory, we propose semantic Shapley interaction based on geodesic distance sampling to learn fine-grained semantic alignment in similar moments. Experiments on three benchmarks demonstrate the effectiveness of our method.	翻訳日:2023-07-27 11:49:03 公開日:2023-07-26
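The two properties named in the abstract are often quantified with the alignment and uniformity measures of Wang and Isola (2020). The sketch below computes them in plain Python on $\ell_2$-normalized features, using the common defaults $\alpha=2$ and $t=2$. The abstract does not state that G2L uses these exact measures; the sketch only makes the two properties concrete.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment(pairs: List[Tuple[Vector, Vector]], alpha: float = 2.0) -> float:
    """Mean ||f(x) - f(y)||^alpha over positive pairs; lower means better aligned."""
    total = sum(sq_dist(normalize(a), normalize(b)) ** (alpha / 2) for a, b in pairs)
    return total / len(pairs)


def uniformity(features: List[Vector], t: float = 2.0) -> float:
    """Log of the mean of exp(-t ||f(x) - f(y)||^2) over distinct pairs;
    lower means the features are spread more uniformly on the hypersphere."""
    z = [normalize(f) for f in features]
    kernel = [math.exp(-t * sq_dist(z[i], z[j]))
              for i in range(len(z)) for j in range(i + 1, len(z))]
    return math.log(sum(kernel) / len(kernel))
```

Four features spread evenly around the unit circle give a uniformity of roughly -4.4. Four nearly identical features give a value close to 0, which is the collapse that the uniformity property penalizes.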
# ロボット操作のためのウェイポイントベース模倣学習 Waypoint-Based Imitation Learning for Robotic Manipulation ( http://arxiv.org/abs/2307.14326v1 ) ライセンス: Link先を確認	Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn	(参考訳) ロボット操作のための模倣学習手法への関心が再び高まっている。しかし、よく知られた誤差の累積(compounding errors)の問題は、依然として行動クローニング(BC)を悩ませている。ウェイポイントは、BCの学習問題のホライズンを短縮し、時間とともに累積する誤差を減らすことで、この問題への対処に役立つ。しかし、ウェイポイントのラベル付けは明確に定義されておらず、追加の人手による監督を必要とする。追加の人手による監督なしにウェイポイントを自動生成できるだろうか? 我々の重要な洞察は、軌道の一部が線形運動で近似できるなら、その端点をウェイポイントとして使えるということである。そこで、模倣学習のための自動ウェイポイント抽出(AWE)を提案する。AWEは、デモンストレーションを最小のウェイポイント集合に分解する前処理モジュールである。この集合を線形補間すると、指定した誤差閾値以内で軌道を近似できる。AWEは任意のBCアルゴリズムと組み合わせることができる。最先端アルゴリズムの成功率を、シミュレーションで最大25%、実世界の双腕操作タスクで4-28%向上させ、意思決定のホライズンを最大10分の1に短縮できることがわかった。ビデオとコードは https://lucys0.github.io/awe/ で入手できる。 While imitation learning methods have seen a resurgent interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus, the errors compounded over time. However, waypoint labeling is underspecified, and requires additional human supervision. Can we generate waypoints automatically without any additional human supervision? Our key insight is that if a trajectory segment can be approximated by linear motion, the endpoints can be used as waypoints. We propose Automatic Waypoint Extraction (AWE) for imitation learning, a preprocessing module to decompose a demonstration into a minimal set of waypoints which when interpolated linearly can approximate the trajectory up to a specified error threshold. AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision making horizon by up to a factor of 10. Videos and code are available at https://lucys0.github.io/awe/	翻訳日:2023-07-27 11:41:12 公開日:2023-07-26
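One way to obtain the minimal waypoint set described in the abstract is dynamic programming over trajectory indices. The sketch below rests on three assumptions. Distance between states is Euclidean. A segment's error is the largest deviation of an intermediate state from the point at the same fraction along the straight line. The first state is given rather than counted as a waypoint. AWE's actual error metric and search procedure may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def segment_error(traj: Sequence[Point], i: int, j: int) -> float:
    """Largest distance from each intermediate state traj[t] (i < t < j) to the
    point at the same fraction along the straight line from traj[i] to traj[j]."""
    worst = 0.0
    for t in range(i + 1, j):
        s = (t - i) / (j - i)
        interp = [a + s * (b - a) for a, b in zip(traj[i], traj[j])]
        worst = max(worst, math.dist(traj[t], interp))
    return worst


def extract_waypoints(traj: Sequence[Point], eps: float) -> List[int]:
    """Fewest waypoint indices (ending at the last state) such that linear
    interpolation between consecutive waypoints, starting from traj[0],
    stays within eps of every recorded state. O(n^3) time."""
    n = len(traj)
    best = [math.inf] * n   # best[j]: fewest waypoints reaching index j, with j a waypoint
    prev = [-1] * n         # predecessor waypoint used to reach j
    best[0] = 0             # the start state is given, not counted as a waypoint
    for j in range(1, n):
        for i in range(j):
            if best[i] + 1 < best[j] and segment_error(traj, i, j) <= eps:
                best[j] = best[i] + 1
                prev[j] = i
    waypoints: List[int] = []
    j = n - 1
    while j > 0:
        waypoints.append(j)
        j = prev[j]
    return waypoints[::-1]
```

A straight-line trajectory collapses to its final state as the single waypoint. An L-shaped path, such as `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]` with `eps=0.1`, yields the corner and the end, `[2, 4]`.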
# Simulation of Open Quantum Systems via Low-Depth Convex Unitary Evolutions ( http://arxiv.org/abs/2307.14325v1 ) License: see link	Joseph Peetz, Scott E. Smart, Spyros Tserkis, Prineha Narang	Simulating physical systems on quantum devices is one of the most promising applications of quantum technology. Current quantum approaches to simulating open quantum systems remain practically challenging on NISQ-era devices because they typically require ancilla qubits and extensive controlled sequences. In this work, we propose a hybrid quantum-classical approach for simulating a class of open system dynamics called random-unitary channels. These channels naturally decompose into a series of convex unitary evolutions, which can then be efficiently sampled and run as independent circuits. The method does not require deep ancilla frameworks and thus can be implemented with lower noise costs. We implement simulations of open quantum systems with up to dozens of qubits and large channel rank.	Translated: 2023-07-27 11:40:51 Published: 2023-07-26
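A random-unitary channel acts as $\rho \mapsto \sum_i p_i U_i \rho U_i^\dagger$, so sampling $U_i$ with probability $p_i$ and averaging the outcomes reproduces the channel. The sketch below shows that idea classically with density matrices, using single-qubit dephasing as an assumed example. On hardware, each draw would instead be a separate circuit whose measurement statistics are averaged.

```python
import numpy as np

def apply_random_unitary_channel(rho, unitaries, probs):
    """Exact channel: sum_i p_i U_i rho U_i^dagger."""
    return sum(p * U @ rho @ U.conj().T for p, U in zip(probs, unitaries))

def sampled_channel(rho, unitaries, probs, shots, seed=0):
    """Monte Carlo estimate: draw U_i with probability p_i, evolve rho
    unitarily (one independent 'circuit' per draw), then average."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(unitaries), size=shots, p=probs)
    out = np.zeros_like(rho, dtype=complex)
    for i in idx:
        U = unitaries[i]
        out += U @ rho @ U.conj().T
    return out / shots

# Single-qubit dephasing: rho -> (1 - p) rho + p Z rho Z, acting on |+><+|.
I = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
plus = np.full((2, 2), 0.5, dtype=complex)
p = 0.3
exact = apply_random_unitary_channel(plus, [I, Z], [1 - p, p])
estimate = sampled_channel(plus, [I, Z], [1 - p, p], shots=20000)
```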
# Evaluating the Moral Beliefs Encoded in LLMs ( http://arxiv.org/abs/2307.14324v1 ) License: see link	Nino Scherrer, Claudia Shi, Amir Feder and David M. Blei	This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) A statistical method for eliciting beliefs encoded in LLMs. We introduce statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. (2) We apply this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not kill"). We administer the survey to 28 open- and closed-source LLMs. We find that (a) in unambiguous scenarios, most models "choose" actions that align with commonsense. In ambiguous cases, most models express uncertainty. (b) Some models are uncertain about choosing the commonsense action because their responses are sensitive to the question wording.
(c) Some models reflect clear preferences in ambiguous scenarios. Specifically, closed-source models tend to agree with each other.	Translated: 2023-07-27 11:40:37 Published: 2023-07-26
# Modeling Inverse Demand Function with Explainable Dual Neural Networks ( http://arxiv.org/abs/2307.14322v1 ) License: see link	Zhiyu Cao, Zihan Chen, Prerna Mishra, Hamed Amini, Zachary Feinstein	Financial contagion has been widely recognized as a fundamental risk to the financial system. Particularly potent is price-mediated contagion, wherein forced liquidations by firms depress asset prices and propagate financial stress, enabling crises to proliferate across a broad spectrum of seemingly unrelated entities. Price impacts are currently modeled via exogenous inverse demand functions. However, in real-world scenarios, only the initial shocks and the final equilibrium asset prices are typically observable, leaving actual asset liquidations largely obscured. This missing data significantly limits the calibration of existing models. To address these challenges, we introduce a novel dual neural network structure that operates in two sequential stages: the first neural network maps initial shocks to predicted asset liquidations, and the second network uses these liquidations to derive the resulting equilibrium prices. This data-driven approach can capture both linear and non-linear forms without pre-specifying an analytical structure; furthermore, it functions effectively even in the absence of observable liquidation data.
Experiments with simulated datasets demonstrate that our model can accurately predict equilibrium asset prices based solely on initial shocks, while revealing a strong alignment between predicted and true liquidations. Our explainable framework contributes to the understanding and modeling of price-mediated contagion and provides valuable insights for financial authorities to construct effective stress tests and regulatory policies.	Translated: 2023-07-27 11:40:16 Published: 2023-07-26
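For reference, the sketch below shows two parametric inverse demand functions of the kind the dual network is meant to capture without pre-specifying a form: a linear one and an exponential one. These specific forms and parameter names are illustrative assumptions and are not taken from the paper. Each maps the number of units liquidated to the resulting asset price.

```python
import numpy as np

def linear_inverse_demand(x, p0, alpha):
    """Price after liquidating x units under a linear impact model, floored at 0."""
    return np.maximum(p0 * (1.0 - alpha * np.asarray(x, dtype=float)), 0.0)

def exponential_inverse_demand(x, p0, alpha):
    """Price after liquidating x units under an exponential impact model."""
    return p0 * np.exp(-alpha * np.asarray(x, dtype=float))

liquidations = np.array([0.0, 10.0, 50.0, 200.0])
linear_prices = linear_inverse_demand(liquidations, p0=100.0, alpha=0.01)
exp_prices = exponential_inverse_demand(liquidations, p0=100.0, alpha=0.01)
```

Both forms agree for small liquidations. They diverge for large ones, which is one reason a learned, form-free mapping can be useful.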
# Reinforcement Learning by Guided Safe Exploration ( http://arxiv.org/abs/2307.14316v1 ) License: see link	Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan	Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward so that it can adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are no longer allowed. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.	Translated: 2023-07-27 11:39:51 Published: 2023-07-26
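One way to picture "regularize the student towards the guide, then gradually remove the guide" is a KL penalty whose weight decays over training. In this minimal sketch, the KL direction, the linear schedule, and all function names are hypothetical choices, not the authors' objective.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-12):
    """KL(p || q) for discrete action distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def guide_weight(step, decay_steps):
    """Hypothetical linear schedule: full guide influence at step 0, none after decay_steps."""
    return max(0.0, 1.0 - step / decay_steps)

def student_loss(task_loss, student_probs, guide_probs, step, decay_steps):
    """Task loss plus a KL penalty pulling the student toward the guide;
    the penalty weight shrinks as training progresses."""
    beta = guide_weight(step, decay_steps)
    return task_loss + beta * kl_categorical(student_probs, guide_probs)
```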
# Exact distributed quantum algorithm for generalized Simon's problem ( http://arxiv.org/abs/2307.14315v1 ) License: see link	Hao Li, Daowen Qiu, Le Luo, Mateus Paulo	Simon's problem is one of the most important problems demonstrating the power of quantum algorithms, as it greatly inspired the proposal of Shor's algorithm. The generalized Simon's problem is a natural extension of Simon's problem and also a special hidden subgroup problem. In this paper, we present two key contributions. First, we characterize the structure of the generalized Simon's problem in the distributed scenario and introduce a corresponding distributed quantum algorithm. Second, we refine the algorithm to make it exact by applying the quantum amplitude amplification technique. Our algorithm offers exponential acceleration compared to the distributed classical algorithm. Compared with the centralized quantum algorithm for the generalized Simon's problem, our algorithm's oracle requires fewer qubits, which makes it easier to implement physically. In particular, the exact distributed quantum algorithm we develop for the generalized Simon's problem outperforms the best previously proposed distributed quantum algorithm for Simon's problem in terms of generalizability and exactness.	Translated: 2023-07-27 11:39:34 Published: 2023-07-26
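In the generalized Simon's problem, $f(x) = f(y)$ holds exactly when $x \oplus y$ lies in a hidden subgroup $S$ of $\mathbb{Z}_2^n$. The sketch below is a naive classical baseline: it builds a toy oracle from assumed generators and recovers $S$ by querying all $2^n$ inputs. It illustrates the exponential classical query cost the quantum algorithms avoid. It is not the paper's algorithm.

```python
def span(generators):
    """All XOR combinations of the generator bit masks (a subgroup of Z_2^n)."""
    S = {0}
    for g in generators:
        S |= {s ^ g for s in S}
    return S

def make_oracle(generators):
    """f(x) = smallest element of the coset x XOR S, so f(x) == f(y) iff x ^ y in S."""
    S = span(generators)
    return lambda x: min(x ^ s for s in S)

def classical_hidden_subgroup(f, n):
    """Query every input; XOR each x with the first input seen in its coset."""
    first_seen = {}
    S = {0}
    for x in range(2 ** n):
        y = first_seen.setdefault(f(x), x)
        S.add(x ^ y)
    return S

f = make_oracle([0b0011, 0b0101])   # hidden subgroup {0, 3, 5, 6} in Z_2^4
recovered = classical_hidden_subgroup(f, 4)
```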
# SQUWALS: A Szegedy QUantum WALks Simulator ( http://arxiv.org/abs/2307.14314v1 ) License: see link	Sergio A. Ortega, Miguel A. Martin-Delgado	Szegedy's quantum walk is an algorithm for quantizing a general Markov chain. It has many applications, such as numerous variants of optimization. To check its properties in an error-free environment, it is important to have a classical simulator. However, current simulation algorithms require a great deal of memory due to the particular formulation of this quantum walk. In this paper we propose a memory-saving algorithm that scales as $\mathcal{O}(N^2)$ with the size $N$ of the graph. We provide additional procedures for simulating Szegedy's quantum walk over mixed states and also the Semiclassical Szegedy walk. With these techniques we have built a classical simulator in Python called SQUWALS. We show that our simulator scales as $\mathcal{O}(N^2)$ in both time and memory resources. This package provides some high-level applications for algorithms based on Szegedy's quantum walk, such as the quantum PageRank.	Translated: 2023-07-27 11:39:20 Published: 2023-07-26
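The abstract does not spell out SQUWALS's memory-saving algorithm. One way to reach $\mathcal{O}(N^2)$ memory is to store the walk state as an $N \times N$ amplitude array and apply the reflection and swap directly, without building the $N^2 \times N^2$ unitary. This minimal sketch assumes the common convention $U = S(2\Pi - I)$ with $|\psi_i\rangle = |i\rangle \otimes \sum_j \sqrt{P_{ji}}\,|j\rangle$. It is not the package's code.

```python
import numpy as np

def szegedy_step(psi, P):
    """One step U = S (2*Pi - I) of Szegedy's walk on an N x N amplitude array.

    psi[i, j] is the amplitude of |i>|j>; P is column-stochastic
    (P[j, i] is the probability of moving from i to j).
    """
    A = np.sqrt(P).T                            # row i holds the amplitudes of |psi_i>
    overlaps = np.sum(A.conj() * psi, axis=1)   # <psi_i | psi> for each i
    projected = overlaps[:, None] * A           # Pi |psi>
    reflected = 2 * projected - psi             # (2 Pi - I) |psi>
    return reflected.T                          # swap S: |i>|j> -> |j>|i>

N = 3
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])                  # columns sum to 1
psi0 = np.sqrt(P).T.astype(complex) / np.sqrt(N) # uniform superposition of |psi_i>
psi = psi0.copy()
for _ in range(5):
    psi = szegedy_step(psi, P)
```

Each step touches only $N^2$ amplitudes. The uniform state used here is stationary because this $P$ is symmetric.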
# Comparative Analysis of Libraries for the Sentimental Analysis ( http://arxiv.org/abs/2307.14311v1 ) License: see link	Wendy Ccoya and Edson Pinto	The main goal of this study is to provide a comparison of libraries for sentiment analysis using machine learning methods. Experts in natural language processing (NLP) are becoming increasingly interested in sentiment analysis (SA) of text. The objective of employing NLP text analysis techniques is to recognize and categorize the feelings expressed in Twitter users' utterances. This study also examines issues with SA and the libraries used, and it presents a number of cooperative methods for classifying emotional polarity. According to recent research, the Naive Bayes Classifier, Decision Tree Classifier, Maxent Classifier, Sklearn Classifier, Sklearn Classifier MultinomialNB, and other conjoint learning algorithms are very effective. The study applies sentiment analysis techniques using five Python and R libraries: NLTK, TextBlob, Vader, Transformers (pretrained GPT and BERT), and Tidytext. Four machine learning models are also used: Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN).
To evaluate how well SA libraries perform in the social network environment, a comparative study was also carried out. The metrics used to identify the best algorithms in this experiment, which used a single dataset for each method, were precision, recall, and F1 score. We conclude that the BERT transformer method, with an accuracy of 0.973, is recommended for sentiment analysis.	Translated: 2023-07-27 11:39:04 Published: 2023-07-26
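The metrics named above (precision, recall, F1, and the reported accuracy) can be computed per class as shown in the following minimal sketch. The toy labels are made up for illustration. Libraries such as scikit-learn provide equivalent functions.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating all other labels as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
p, r, f = precision_recall_f1(y_true, y_pred, "pos")
acc = accuracy(y_true, y_pred)
```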
# Derivative Pricing using Quantum Signal Processing ( http://arxiv.org/abs/2307.14310v1 ) License: see link	Nikitas Stamatopoulos and William J. Zeng	Pricing financial derivatives on quantum computers typically includes quantum arithmetic components which contribute heavily to the quantum resources required by the corresponding circuits. In this manuscript, we introduce a method based on Quantum Signal Processing (QSP) to encode financial derivative payoffs directly into quantum amplitudes, relieving the quantum circuits of the burden of costly quantum arithmetic. Compared to current state-of-the-art approaches in the literature, we find that for derivative contracts of practical interest, the application of QSP significantly reduces the required resources across all metrics considered, most notably the total number of T-gates by $\sim 16$x and the number of logical qubits by $\sim 4$x. Additionally, we estimate that the logical clock rate needed for quantum advantage is also reduced by a factor of $\sim 5$x. Overall, we find that quantum advantage will require $4.7$k logical qubits, and quantum devices that can execute $10^9$ T-gates at a rate of $45$MHz. While in this work we focus specifically on the payoff component of the derivative pricing process, where the method we present is most readily applicable, similar techniques can be employed to further reduce the resources in other applications, such as state preparation.	Translated: 2023-07-27 11:38:38 Published: 2023-07-26
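QSP interleaves a fixed signal rotation with tunable phase rotations, so that a polynomial of the signal appears directly as an amplitude. No arithmetic circuit is needed. This minimal single-qubit sketch uses the $W_x$ convention, $W(x) = e^{i \arccos(x) X}$. With all phases set to zero it yields the Chebyshev polynomial $T_d(x)$. Finding phases that encode an actual payoff, as the paper does, is not shown here.

```python
import numpy as np

def signal_operator(x):
    """Single-qubit signal rotation W(x) = exp(i * arccos(x) * X)."""
    s = np.sqrt(1.0 - x ** 2)
    return np.array([[x, 1j * s], [1j * s, x]])

def z_phase(phi):
    """Phase rotation exp(i * phi * Z)."""
    return np.diag([np.exp(1j * phi), np.exp(-1j * phi)])

def qsp_polynomial(x, phases):
    """<0| e^{i phi_0 Z} prod_k [W(x) e^{i phi_k Z}] |0> for phases phi_0..phi_d."""
    U = z_phase(phases[0])
    for phi in phases[1:]:
        U = U @ signal_operator(x) @ z_phase(phi)
    return U[0, 0]

# Zero phases with d = 3 signal calls give T_3(x) = 4x^3 - 3x.
xs = np.linspace(-1.0, 1.0, 7)
values = [qsp_polynomial(x, [0.0] * 4) for x in xs]
```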
# QPLEX: Realizing the Integration of Quantum Computing into Combinatorial Optimization Software ( http://arxiv.org/abs/2307.14308v1 ) License: see link	Juan Giraldo, José Ossorio, Norha M. Villegas, Gabriel Tamura, Ulrike Stege	Quantum computing has the potential to surpass the capabilities of current classical computers when solving complex problems. Combinatorial optimization has emerged as one of the key target areas for quantum computers, as problems found in this field play a critical role in many different industrial application sectors (e.g., enhancing manufacturing operations or improving decision processes). Currently, there are different types of high-performance optimization software (e.g., ILOG CPLEX and Gurobi) that support engineers and scientists in solving optimization problems using classical computers. To use quantum resources, users require domain-specific knowledge of quantum algorithms, SDKs, and libraries, which can be a limiting factor for any practitioner who wants to integrate this technology into their workflows. Our goal is to add software infrastructure to a classical optimization package so that application developers can interface with quantum platforms readily when setting up their workflows.
This paper presents a tool for the seamless use of quantum resources through a classical interface. Our approach consists of a Python library extension that provides a backend to facilitate access to multiple quantum providers. Our pipeline enables optimization software developers to experiment with quantum resources selectively and assess performance improvements of hybrid quantum-classical optimization solutions.	Translated: 2023-07-27 11:38:13 Published: 2023-07-26
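The idea of one modelling interface that dispatches to interchangeable backends can be sketched as follows, using a QUBO as the problem format. Every name here (`Backend`, `BruteForceBackend`, `BACKENDS`, `solve`) is hypothetical and is not QPLEX's API. In this sketch, a quantum provider would register another class exposing the same `solve` method.

```python
from itertools import product
from typing import Dict, Protocol, Tuple

Qubo = Dict[Tuple[int, int], float]   # (i, j) -> coefficient

def qubo_energy(Q: Qubo, x) -> float:
    """Energy sum_{(i,j)} Q_ij x_i x_j of a binary assignment x."""
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())

class Backend(Protocol):
    def solve(self, Q: Qubo, n: int) -> Tuple[Tuple[int, ...], float]: ...

class BruteForceBackend:
    """Classical reference solver that enumerates all 2^n assignments."""
    def solve(self, Q: Qubo, n: int) -> Tuple[Tuple[int, ...], float]:
        best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
        return best, qubo_energy(Q, best)

BACKENDS: Dict[str, Backend] = {"brute_force": BruteForceBackend()}

def solve(Q: Qubo, n: int, backend: str = "brute_force"):
    """Dispatch to a named backend so the modelling code never changes."""
    return BACKENDS[backend].solve(Q, n)

Q = {(0, 0): -1.0, (1, 1): -1.0, (0, 1): 2.0}   # prefers exactly one variable set
assignment, energy = solve(Q, 2)
```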
# Virtual Mirrors: Non-Line-of-Sight Imaging Beyond the Third Bounce ( http://arxiv.org/abs/2307.14341v1 ) License: see link	Diego Royo and Talha Sultan and Adolfo Muñoz and Khadijeh Masumnia-Bisheh and Eric Brandt and Diego Gutierrez and Andreas Velten and Julio Marco	Non-line-of-sight (NLOS) imaging methods are capable of reconstructing complex scenes that are not visible to an observer using indirect illumination. However, they assume only third-bounce illumination, so they are currently limited to single-corner configurations and offer limited visibility when imaging surfaces at certain orientations. To reason about and tackle these limitations, we make the key observation that planar diffuse surfaces behave specularly at wavelengths used in the computational wave-based NLOS imaging domain. We call such surfaces virtual mirrors. We leverage this observation to expand the capabilities of NLOS imaging using illumination beyond the third bounce, addressing two problems: imaging single-corner objects at limited visibility angles, and imaging objects hidden behind two corners. To image objects at limited visibility angles, we first analyze the reflections of the known illuminated point on surfaces of the scene as an estimator of the position and orientation of objects with limited visibility.
We then image those limited visibility objects by computationally building secondary apertures at other surfaces that observe the target object from a direct visibility perspective. Beyond single-corner NLOS imaging, we exploit the specular behavior of virtual mirrors to image objects hidden behind a second corner by imaging the space behind such virtual mirrors, where the mirror image of objects hidden around two corners is formed. No specular surfaces were involved in the making of this paper.	Translated: 2023-07-27 11:33:38 Published: 2023-07-26
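If a planar diffuse wall behaves as a mirror at the virtual wavelengths, an object hidden around two corners appears at its reflection across that wall's plane. That reflected location is where the mirror image forms. The sketch below shows only this plane-reflection geometry, $x' = x - 2((x - p) \cdot \hat{n})\hat{n}$. It is not the paper's reconstruction pipeline.

```python
import numpy as np

def mirror_image(point, plane_point, plane_normal):
    """Reflect a 3D point across the plane through plane_point with normal plane_normal."""
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)                  # normal need not be unit length
    p = np.asarray(point, dtype=float)
    d = np.dot(p - np.asarray(plane_point, dtype=float), n)   # signed distance to plane
    return p - 2.0 * d * n
```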
# Non-Hermitian tearing by dissipation ( http://arxiv.org/abs/2307.14340v1 ) License: see link	Qian Du and Su-Peng Kou	In this paper, we study a non-Hermitian system under dissipation in which the energy band shows an imaginary line gap and energy eigenstates are bound to a specific region. To describe these phenomena, we propose the concept of "non-Hermitian tearing", in which the degree of tearing we define reveals a continuous phase transition at the exceptional point. Non-Hermitian tearing manifests in two forms: bulk state separation and boundary state decoupling. For a deeper understanding of non-Hermitian tearing, we derive the effective 2×2 Hamiltonian in k-space by reducing the N×N Hamiltonian in real space. In addition, we explore non-Hermitian tearing in the one-dimensional Su-Schrieffer-Heeger model and the Qi-Wu-Zhang model. Our results provide a theoretical approach for studying non-Hermitian tearing in more complex systems.	Translated: 2023-07-27 11:33:10 Published: 2023-07-26
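The abstract does not specify the dissipation. This sketch assumes on-site loss $-i\gamma$ on one sublattice of the SSH chain, a common choice, and diagonalizes the resulting $2 \times 2$ Bloch Hamiltonian. Its eigenvalues are $-i\gamma/2 \pm \sqrt{|h(k)|^2 - \gamma^2/4}$, so an exceptional point occurs where $|h(k)| = \gamma/2$. Beyond it, the two bands share a real part and separate along the imaginary axis.

```python
import numpy as np

def ssh_bloch_hamiltonian(k, t1, t2, gamma):
    """2x2 SSH Bloch Hamiltonian with a hypothetical on-site loss -i*gamma on sublattice B."""
    h = t1 + t2 * np.exp(-1j * k)
    return np.array([[0.0, h], [np.conj(h), -1j * gamma]])

def band_energies(k, t1, t2, gamma):
    """Complex eigenvalues of the Bloch Hamiltonian at momentum k."""
    return np.linalg.eigvals(ssh_bloch_hamiltonian(k, t1, t2, gamma))

# |h(pi)| = 0.1 < gamma / 2: bands split along the imaginary axis.
E_pi = band_energies(np.pi, 1.0, 0.9, 1.0)
# |h(0)| = 1.9 > gamma / 2: bands split along the real axis.
E_0 = band_energies(0.0, 1.0, 0.9, 1.0)
```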
# TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning ( http://arxiv.org/abs/2307.14338v1 ) License: see link	Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, Artem Babenko	Deep learning (DL) models for tabular data problems are receiving increasing attention, while the algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution. Following the recent trends in other domains, such as natural language processing and computer vision, several retrieval-augmented tabular DL models have been recently proposed. For a given target object, a retrieval-based model retrieves other relevant objects, such as the nearest neighbors, from the available (training) data and uses their features or even labels to make a better prediction. However, we show that the existing retrieval-based tabular DL solutions provide only minor, if any, benefits over the properly tuned simple retrieval-free baselines. Thus, it remains unclear whether the retrieval-based approach is a worthy direction for tabular DL. In this work, we give a strong positive answer to this question.
We start by incrementally augmenting a simple feed-forward architecture with an attention-like retrieval component similar to those of many (tabular) retrieval-based models. Then, we highlight several details of the attention mechanism that turn out to have a massive impact on the performance on tabular data problems, but that were not explored in prior work. As a result, we design TabR, a simple retrieval-based tabular DL model which, on a set of public benchmarks, demonstrates the best average performance among tabular DL models, becomes the new state-of-the-art on several datasets, and even outperforms GBDT models on the recently proposed "GBDT-friendly" benchmark (see the first figure).	Translated: 2023-07-27 11:32:41 Published: 2023-07-26
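The retrieval idea (find the nearest training rows and combine their labels with attention-like weights) can be sketched without any learned parts. This is a deliberately simplified illustration, not TabR. It omits the trained encoders and the attention-mechanism details the paper identifies as important. The function name and the `k`/`temperature` parameters are hypothetical.

```python
import numpy as np

def retrieval_predict(x, train_X, train_y, k=3, temperature=1.0):
    """Predict by softmax-weighting the labels of the k nearest training rows.

    Similarity is the negative squared Euclidean distance, so closer
    neighbours receive larger attention-like weights.
    """
    d2 = np.sum((train_X - x) ** 2, axis=1)
    nn = np.argsort(d2)[:k]
    logits = -d2[nn] / temperature
    w = np.exp(logits - logits.max())   # subtract the max for numerical stability
    w /= w.sum()
    return float(w @ train_y[nn])

train_X = np.array([[0.0], [1.0], [10.0]])
train_y = np.array([0.0, 1.0, 100.0])
pred = retrieval_predict(np.array([0.0]), train_X, train_y, k=2)
```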
# MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation ( http://arxiv.org/abs/2307.14336v1 ) License: see link	Rajeev Yasarla, Hong Cai, Jisoo Jeong, Yunxiao Shi, Risheek Garrepalli, Fatih Porikli	We propose MAMo, a novel memory and attention framework for monocular video depth estimation. MAMo can augment and improve any single-image depth estimation network into a video depth estimation model, enabling it to take advantage of temporal information to predict more accurate depth. In MAMo, we augment the model with memory, which aids depth prediction as the model streams through the video. Specifically, the memory stores learned visual and displacement tokens of the previous time instances. This allows the depth network to cross-reference relevant features from the past when predicting depth on the current frame. We introduce a novel scheme to continuously update the memory, optimizing it to keep tokens that correspond with both the past and the present visual information. We adopt an attention-based approach to process memory features, where we first learn the spatio-temporal relation among the resultant visual and displacement memory tokens using a self-attention module. Further, the output features of self-attention are aggregated with the current visual features through cross-attention.
The cross-attended features are finally given to a decoder to predict depth on the current frame. Through extensive experiments on several benchmarks, including KITTI, NYU-Depth V2, and DDAD, we show that MAMo consistently improves monocular depth estimation networks and sets new state-of-the-art (SOTA) accuracy. Notably, our MAMo video depth estimation provides higher accuracy with lower latency when compared to SOTA cost-volume-based video depth models.	Translated: 2023-07-27 11:31:47 Published: 2023-07-26
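The cross-attention step described above, in which current-frame features attend to stored memory tokens, can be sketched as single-head scaled dot-product attention. This sketch uses random weights. It leaves out MAMo's preceding self-attention over memory and its memory-update scheme, and the token counts and dimensions are arbitrary assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, memory, Wq, Wk, Wv):
    """Scaled dot-product cross-attention: current-frame tokens (queries)
    attend to memory tokens stored from earlier frames."""
    Q, K, V = queries @ Wq, memory @ Wk, memory @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 8
current = rng.normal(size=(4, d))    # 4 visual tokens from the current frame
memory = rng.normal(size=(6, d))     # 6 stored visual/displacement tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(current, memory, Wq, Wk, Wv)
```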
# WavJourney: Compositional Audio Creation with Large Language Models ( http://arxiv.org/abs/2307.14335v1 ) License: see link	Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang	Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their potential in intelligent audio content creation remains unexplored. In this work, we tackle the problem of creating audio content with storylines encompassing speech, music, and sound effects, guided by text instructions. We present WavJourney, a system that leverages LLMs to connect various audio models for audio content generation. Given a text description of an auditory scene, WavJourney first prompts LLMs to generate a structured script dedicated to audio storytelling. The audio script incorporates diverse audio elements, organized based on their spatio-temporal relationships.
As a conceptual representation of audio, the audio script provides an interactive and interpretable rationale for human engagement. Afterward, the audio script is fed into a script compiler, converting it into a computer program. Each line of the program calls a task-specific audio generation model or computational operation function (e.g., concatenate, mix). The computer program is then executed to obtain an explainable solution for audio generation. We demonstrate the practicality of WavJourney across diverse real-world scenarios, including science fiction, education, and radio play. The explainable and interactive design of WavJourney fosters human-machine co-creation in multi-round dialogues, enhancing creative control and adaptability in audio production. WavJourney audiolizes the human imagination, opening up new avenues for creativity in multimedia content creation.	Translated: 2023-07-27 11:31:23 Published: 2023-07-26
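The compile-then-execute step can be illustrated with a toy script. In this sketch, each script line becomes a call to a generator, and the results are combined with concatenate and mix operations. Sine tones stand in for real speech, music, or sound-effect models. The script schema, sample rate, and function names are hypothetical and are not WavJourney's format.

```python
import numpy as np

SR = 16000  # hypothetical sample rate

def tone(freq, seconds, amp=0.1):
    """Placeholder 'generator' standing in for a task-specific audio model."""
    t = np.arange(int(SR * seconds)) / SR
    return amp * np.sin(2 * np.pi * freq * t)

GENERATORS = {"tone": tone}

def concatenate(*clips):
    return np.concatenate(clips)

def mix(*clips):
    """Overlay clips, padding shorter ones with silence."""
    n = max(len(c) for c in clips)
    out = np.zeros(n)
    for c in clips:
        out[:len(c)] += c
    return out

# Toy script: two sequential foreground events mixed over a background bed.
script = {
    "foreground": [("tone", 440, 1.0), ("tone", 660, 0.5)],
    "background": [("tone", 110, 1.5, 0.05)],
}

def compile_and_run(script):
    """Turn each script line into a generator call, then concatenate and mix tracks."""
    tracks = []
    for name in ("foreground", "background"):
        clips = [GENERATORS[kind](*params) for kind, *params in script[name]]
        tracks.append(concatenate(*clips))
    return mix(*tracks)

audio = compile_and_run(script)
```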
# Towards Generalist Biomedical AI ( http://arxiv.org/abs/2307.14334v1 ) License: see link	Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral, Dale Webster, Greg S. Corrado, Yossi Matias, Karan Singhal, Pete Florence, Alan Karthikesalingam, Vivek Natarajan	Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark.
MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.	Translated: 2023-07-27 11:30:54 Published: 2023-07-26
# イベントベースビジョンによるマニピュレーション動作の早期予測 Event-based Vision for Early Prediction of Manipulation Actions ( http://arxiv.org/abs/2307.14332v1 ) License: see link	Daniel Deniz and Cornelia Fermuller and Eduardo Ros and Manuel Rodriguez-Alvarez and Francisco Barranco	(参考訳) ニューロモルフィックな視覚センサーは、シーンで明るさが変化するときに非同期イベントのシーケンスを出力する人工網膜である。これらのセンサーは、非常に高時間分解能、動きのぼやけがなく、リアルタイム処理に理想的なスマートデータ圧縮など、多くの利点を提供している。本研究では,微粒な操作動作に関するイベントベースデータセットを導入し,イベントを伴う動作予測にトランスフォーマーを使用する実験を行った。認知ロボティクスや人間とロボットの相互作用の分野では、人間の行動の理解と予測にできる限り早く関心がある。早期予測は、計画のための複雑な段階を予測し、効果的かつリアルタイムなインタラクションを可能にする。我々のTransformerネットワークでは,オンライン推論を用いてイベントを使用して操作動作の予測を行っている。このモデルは、早期に行動を予測することに成功し、時間とともに信頼性を高め、最先端の分類を達成する。さらに,注意に基づくトランスフォーマアーキテクチャにより,モデルによって選択された時空間パターンの役割を考察できる。実験の結果,Transformer ネットワークはビデオベースのアプローチよりも優れた動作ダイナミックな特徴を捉え,アクション間の差異が極めて微妙な方法で発生するシナリオに成功していることがわかった。最後に,新たなイベントデータセットをリリースする。このデータセットは,アクション認識の操作に関する文献の中で最初のものだ。コードは https://github.com/DaniDeniz/EventVisionTransformer から入手できる。 Neuromorphic visual sensors are artificial retinas that output sequences of asynchronous events when brightness changes occur in the scene. These sensors offer many advantages including very high temporal resolution, no motion blur and smart data compression ideal for real-time processing. In this study, we introduce an event-based dataset on fine-grained manipulation actions and perform an experimental study on the use of transformers for action prediction with events. There is enormous interest in the fields of cognitive robotics and human-robot interaction on understanding and predicting human actions as early as possible. Early prediction allows anticipating complex stages for planning, enabling effective and real-time interaction. Our Transformer network uses events to predict manipulation actions as they occur, using online inference. The model succeeds at predicting actions early on, building up confidence over time and achieving state-of-the-art classification. 
Moreover, the attention-based transformer architecture allows us to study the role of the spatio-temporal patterns selected by the model. Our experiments show that the Transformer network captures action-dynamics features, outperforming video-based approaches and succeeding in scenarios where the differences between actions lie in very subtle cues. Finally, we release the new event dataset, which is the first in the literature for manipulation action recognition. Code will be available at https://github.com/DaniDeniz/EventVisionTransformer.	Translated: 2023-07-27 11:30:17 Published: 2023-07-26
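
The abstract describes the sensor output as an asynchronous stream of brightness-change events. One common way to feed such a stream to a model during online inference is to bin the events into fixed time windows. The sketch below shows only that binning step. The `Event` fields, the signed-polarity accumulation and the `window` length are assumptions made for illustration; they are not necessarily the event representation the authors use.

```python
from typing import List, NamedTuple

class Event(NamedTuple):
    x: int      # pixel column
    y: int      # pixel row
    t: float    # timestamp in seconds
    p: int      # polarity: +1 brightness increase, -1 decrease

def events_to_frames(events: List[Event], width: int, height: int,
                     window: float) -> List[List[List[int]]]:
    """Accumulate a time-sorted event stream into frames of `window` seconds.

    Each frame is a height x width grid holding the signed sum of polarities
    of the events that fell into that time window.
    """
    if not events:
        return []
    t0 = events[0].t
    n_frames = int((events[-1].t - t0) // window) + 1
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for ev in events:
        k = int((ev.t - t0) // window)  # index of the time window this event falls into
        frames[k][ev.y][ev.x] += ev.p
    return frames
```

Each frame can be passed to the classifier as soon as its window closes. That is what allows a prediction after every window, with confidence accumulating as more of the action is observed.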
# Visual Instruction Inversion: Visual Promptingによる画像編集 Visual Instruction Inversion: Image Editing via Visual Prompting ( http://arxiv.org/abs/2307.14331v1 ) License: see link	Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee	(参考訳) テキスト条件の画像編集は画像編集の強力なツールとして登場した。しかし、多くの場合、言語は曖昧で、特定の画像編集を記述するのに役に立たない。このような課題に直面した場合、視覚的なプロンプトは、アイデアを伝えるためのより情報的で直感的な方法になり得る。本稿では,視覚的プロンプトによる画像編集手法を提案する。編集の「前」と「後」の画像を表す一対の例が与えられた場合、我々のゴールは、新しい画像で同じ編集を行うために使用できるテキストベースの編集方向を学ぶことである。テキストから画像への拡散モデルのリッチで事前訓練された編集機能を利用して、視覚的プロンプトを編集命令に変換する。この結果から,一対の例では,最先端のテキストコンディショニング画像編集フレームワークと比較して,競合的な結果が得られることがわかった。 Text-conditioned image editing has emerged as a powerful tool for editing images. However, in many situations, language can be ambiguous and ineffective in describing specific image edits. When faced with such challenges, visual prompts can be a more informative and intuitive way to convey ideas. We present a method for image editing via visual prompting. Given example pairs that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that with just one example pair, we can achieve competitive results compared to state-of-the-art text-conditioned image editing frameworks.	Translated: 2023-07-27 11:29:53 Published: 2023-07-26
# 有効-ハミルトン理論:開量子系の平衡状態への近似 Effective-Hamiltonian theory: An approximation to the equilibrium state of open quantum systems ( http://arxiv.org/abs/2307.14330v1 ) License: see link	Nicholas Anto-Sztrikacs, Brett Min, Marlon Brenes, and Dvira Segal	(参考訳) 熱浴との強いカップリングにおける量子系の平衡状態(平均力ギブス状態)の近似として,最近開発された実効ハミルトニアン(EFFH)法 [PRX Quantum $\bf{4}$, 020307 (2023)] を拡張してベンチマークを行った。 EFFH法は近似フレームワークである。反応-配位写像、ポーラロン変換、制御された切断の組み合わせにより、系-バスカップリングパラメータをシステムのハミルトニアンにインプリントする。まず、$\textit{variational}$ EFFH 技術を開発する。本手法では,システムバス結合パラメータ(元のEFFH法のように)と熱浴の温度の両方で,系のパラメータを繰り込む。次に,一般化スピン-ボーソンモデルを適用し,数値的に厳密なシミュレーションに対してEFFH法から得た平衡状態のベンチマークを行い,ブラウンスペクトル関数を用いた偏極とコヒーレンスの両方について良好な一致を示す。第3に, EFFH法と慣れ親しんだ (正規および変動) ポーラロン法を対比した。両手法が平衡状態の類似構造を予測することを示し,EFFH法は簡単な計算と閉形式解析結果の利点を提供する。総じて、系の周波数に匹敵する温度では、EFFH法は、極弱から超強までのシステム-バス結合の完全な範囲において、平均力ギブズ状態に対して良好な近似を提供する。 We extend and benchmark the recently-developed Effective-Hamiltonian (EFFH) method [PRX Quantum $\bf{4}$, 020307 (2023)] as an approximation to the equilibrium state ("mean-force Gibbs state") of a quantum system at strong coupling to a thermal bath. The EFFH method is an approximate framework. Through a combination of the reaction-coordinate mapping, a polaron transformation and a controlled truncation, it imprints the system-bath coupling parameters into the system's Hamiltonian. First, we develop a $\textit{variational}$ EFFH technique. In this method, the system's parameters are renormalized by both the system-bath coupling parameters (as in the original EFFH approach) and the bath's temperature. Second, adopting the generalized spin-boson model, we benchmark the equilibrium state from the EFFH treatment against numerically-exact simulations and demonstrate good agreement for both polarization and coherences using the Brownian spectral function. Third, we contrast the (normal and variational) EFFH approach with the familiar (normal and variational) polaron treatment. 
We show that the two methods predict a similar structure for the equilibrium state, although the EFFH approach offers the advantage of simpler calculations and closed-form analytical results. Altogether, we argue that for temperatures comparable to the system's frequencies, the EFFH methodology provides a good approximation for the mean-force Gibbs state in the full range of system-bath coupling, from ultraweak to ultrastrong.	Translated: 2023-07-27 11:29:42 Published: 2023-07-26
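
In broad terms, the method approximates the equilibrium state by a Gibbs state of an effective system Hamiltonian whose parameters absorb the system-bath coupling. The minimal sketch below shows how polarization and coherence are read off such a state. It builds $\rho = e^{-\beta H}/Z$ for a single spin with $H = \frac{\epsilon}{2}\sigma_z + \frac{\Delta}{2}\sigma_x$. The arguments `eps` and `delta` are placeholders for whatever effective (renormalized) parameters the method would supply; the EFFH renormalization formulas themselves are not reproduced here.

```python
import numpy as np

SX = np.array([[0.0, 1.0], [1.0, 0.0]])   # Pauli sigma_x
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])  # Pauli sigma_z

def gibbs_state(H: np.ndarray, beta: float) -> np.ndarray:
    """Return exp(-beta H) / Z for a Hermitian H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    w = np.exp(-beta * (evals - evals.min()))  # shift by the ground energy for numerical stability
    rho = (evecs * w) @ evecs.conj().T         # V diag(w) V^dagger
    return rho / np.trace(rho)

def spin_observables(eps: float, delta: float, beta: float):
    """Polarization <sigma_z> and coherence <sigma_x> of H = eps/2 sz + delta/2 sx."""
    H = 0.5 * eps * SZ + 0.5 * delta * SX
    rho = gibbs_state(H, beta)
    return float(np.trace(rho @ SZ).real), float(np.trace(rho @ SX).real)
```

For this two-level Hamiltonian the result has a closed form that makes a convenient check: with $\omega = \sqrt{\epsilon^2 + \Delta^2}$, $\langle\sigma_z\rangle = -(\epsilon/\omega)\tanh(\beta\omega/2)$ and $\langle\sigma_x\rangle = -(\Delta/\omega)\tanh(\beta\omega/2)$.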
# MHz周波数フラクソニウム量子ビットを用いた高感度交流電荷検出 High-sensitivity AC-charge detection with a MHz-frequency fluxonium qubit ( http://arxiv.org/abs/2307.14329v1 ) License: see link	B.-L. Najera-Santos, R. Rousseau, K. Gerashchenko, H. Patange, A. Riva, M. Villiers, T. Briant, P.-F. Cohadon, A. Heidmann, J. Palomo, M. Rosticher, H. le Sueur, A. Sarlette, W. C. Smith, Z. Leghtas, E. Flurin, T. Jacqmin, S. Deléglise	(参考訳) 強い双極子モーメントと長いコヒーレンス時間により、超伝導量子ビットはハイブリッド量子回路において顕著な成功を収めた。しかし、ほとんどの量子ビットアーキテクチャはGHz周波数範囲に限定されており、相互作用可能なシステムのクラスを厳しく制限している。一方、フラクソニウム量子ビットは、標準的なマイクロ波技術で操作され読み出されながら、非常に低い周波数にバイアスすることができる。ここでは、$1.8~\mathrm{MHz}$という前例のない低い遷移周波数を持つヘビーフラクソニウムを設計し、運用する。最終基底状態占有率が $97.7~\%$ の ‘hot’ 量子ビット遷移のサイドバンド冷却は, 有効温度 $23~\mu\mathrm{K}$ に対応する。さらに,コヒーレンス時間$T_1=34~\mu\mathrm{s}$, $T_2^*=39~\mu\mathrm{s}$,シングルショットのqubit状態の読み出しによるコヒーレント操作も示す。重要なことは、量子ビット遷移を容量結合導波路で直接処理することにより、高周波場に対する高い感度を示すことである。量子ビットの周期的な準備と問合せにより、この低周波量子ビットを周波数分解電荷センサに変換する。この方法により、電荷感度は$33~\mu\mathrm{e}/\sqrt{\mathrm{Hz}}$、エネルギー感度(ヘルツあたりジュール)は$2.8~\hbar$となる。この方法は、直流電荷ノイズに対する固有の非感度を維持しつつ、最先端のトランスポートベースデバイスに匹敵する。高電荷感度と大きな静電容量シャントが組み合わさって、$1-10~\mathrm{MHz}$範囲の量子現象を探索するための新しい経路を解き放つ。 Owing to their strong dipole moment and long coherence times, superconducting qubits have demonstrated remarkable success in hybrid quantum circuits. However, most qubit architectures are limited to the GHz frequency range, severely constraining the class of systems they can interact with. The fluxonium qubit, on the other hand, can be biased to very low frequency while being manipulated and read out with standard microwave techniques. Here, we design and operate a heavy fluxonium with an unprecedentedly low transition frequency of $1.8~\mathrm{MHz}$. We demonstrate resolved sideband cooling of the "hot" qubit transition with a final ground state population of $97.7~\%$, corresponding to an effective temperature of $23~\mu\mathrm{K}$. 
We further demonstrate coherent manipulation with coherence times $T_1=34~\mu\mathrm{s}$, $T_2^*=39~\mu\mathrm{s}$, and single-shot readout of the qubit state. Importantly, by directly addressing the qubit transition with a capacitively coupled waveguide, we showcase its high sensitivity to a radio-frequency field. Through cyclic qubit preparation and interrogation, we transform this low-frequency fluxonium qubit into a frequency-resolved charge sensor. This method results in a charge sensitivity of $33~\mu\mathrm{e}/\sqrt{\mathrm{Hz}}$, or an energy sensitivity (in joules per hertz) of $2.8~\hbar$. This method rivals state-of-the-art transport-based devices, while maintaining inherent insensitivity to DC charge noise. The high charge sensitivity combined with a large capacitive shunt unlocks new avenues for exploring quantum phenomena in the $1-10~\mathrm{MHz}$ range, such as the strong-coupling regime with a resonant macroscopic mechanical resonator.	Translated: 2023-07-27 11:29:12 Published: 2023-07-26
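
The quoted effective temperature follows from the ground-state population if the qubit populations are assumed to obey a Boltzmann ratio at the transition frequency:

$$T_\mathrm{eff} = \frac{hf}{k_B \ln(p_0/p_1)}$$

The short check below plugs in $f = 1.8~\mathrm{MHz}$ and $p_0 = 0.977$. Treating the fluxonium as a pure two-level system, with all remaining population in the first excited state, is an assumption of this sketch.

```python
import math

H_PLANCK = 6.62607015e-34   # J s (exact in the 2019 SI)
K_BOLTZMANN = 1.380649e-23  # J/K (exact in the 2019 SI)

def effective_temperature(freq_hz: float, ground_population: float) -> float:
    """Temperature of a two-level system whose populations follow a Boltzmann ratio.

    p1 / p0 = exp(-h f / (k_B T))  =>  T = h f / (k_B ln(p0 / p1)).
    Assumes all population outside the ground state sits in the first excited state.
    """
    p0 = ground_population
    p1 = 1.0 - p0
    return H_PLANCK * freq_hz / (K_BOLTZMANN * math.log(p0 / p1))

T = effective_temperature(1.8e6, 0.977)
print(f"{T * 1e6:.1f} uK")  # about 23 uK, consistent with the abstract
```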

# デザインによる逆透過性による人間分析の再考

Rethinking People Analytics With Inverse Transparency by Design ( http://arxiv.org/abs/2305.09813v2 )

License: see link

Valentin Zieglmeier and Alexander Pretschner

(参考訳) 従業員は高度な分析を可能にする、ますますデジタル環境で働く。しかし、データを処理するシステムに対する監視は欠如している。つまり、潜在的な分析エラーや隠れバイアスは発見が難しいということだ。最近のデータ保護法はこれらの問題に取り組みますが、不十分です。データに対する適切なユースケースを省略しながらも、データの誤用を防ぎません。データ保護とデータ駆動システムとの対立は、異なる方法で解決すべきだと考えています。従業員のデータにアクセスする際には、逆透過性の概念に従って、すべての使用方法を透過的にする必要がある。これにより個人は、データ誤用による潜在的に有害な結果に対処しながら、賢明なデータ使用の恩恵を受けることができる。これを実現するために、我々は、デザインによる逆透明性と呼ばれる労働分析のための新しい設計手法を提案する。本提案の開発者およびユーザ視点を理解するために,学生を対象に2つの探索研究を行った。まず、小さな開発者のチームが逆透明性を備えた分析ツールを設計して、アプローチの判断方法と、それが開発ツールでどのように実現されているかを明らかにする。アーキテクチャの変更はコア機能を阻害することなく行われます。開発者は我々のアプローチを価値があり技術的に実現可能であると考えている。第2に,3ヶ月以上にわたってユーザ調査を実施し,参加者が提供された逆透過性を体験し,その経験を反映させる。この研究は、ほとんどの作業プロセスがすでにディジタルであるソフトウェア開発の作業場をモデル化する。参加者は透明性を有益と認識し、その権限を行使する。彼らは全会一致で職場の改善だと同意した。設計による逆透過性は、受け入れられた、責任ある人の分析を実現するための有望なアプローチであると結論づける。

Employees work in increasingly digital environments that enable advanced analytics. Yet, they lack oversight over the systems that process their data. That means that potential analysis errors or hidden biases are hard to uncover. Recent data protection legislation tries to tackle these issues, but it is inadequate. It does not prevent data misuse while at the same time stifling sensible use cases for data. We think the conflict between data protection and increasingly data-driven systems should be solved differently. When access to an employee's data is given, all usages should be made transparent to them, according to the concept of inverse transparency. This allows individuals to benefit from sensible data usage while addressing the potentially harmful consequences of data misuse. To accomplish this, we propose a new design approach for workforce analytics we refer to as inverse transparency by design. To understand the developer and user perspectives on the proposal, we conduct two exploratory studies with students. First, we let small teams of developers implement analytics tools with inverse transparency by design to uncover how they judge the approach and how it materializes in their developed tools. We find that architectural changes are made without inhibiting core functionality. The developers consider our approach valuable and technically feasible. Second, we conduct a user study over three months to let participants experience the provided inverse transparency and reflect on their experience. The study models a software development workplace where most work processes are already digital. Participants perceive the transparency as beneficial and feel empowered by it. They unanimously agree that it would be an improvement for the workplace. We conclude that inverse transparency by design is a promising approach to realize accepted and responsible people analytics.

Translated: 2023-10-24 08:21:23 Published: 2023-07-26
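
Inverse transparency, as described above, means that every usage of an employee's data is made visible to that employee. The minimal sketch below assumes a simple in-memory store. All reads go through a single method that records the accessor, the declared purpose and the fields read, and the data owner can list those records. The class and method names (`TransparentStore`, `read`, `usages_for`) are hypothetical and are not taken from the authors' tools.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

@dataclass(frozen=True)
class AccessRecord:
    accessor: str         # who used the data
    purpose: str          # declared reason for the usage
    fields: tuple         # which attributes were read
    timestamp: datetime   # when the read happened (UTC)

@dataclass
class TransparentStore:
    """Hypothetical store: every successful read is logged and visible to the data owner."""
    _data: Dict[str, Dict[str, object]] = field(default_factory=dict)
    _log: Dict[str, List[AccessRecord]] = field(default_factory=dict)

    def put(self, owner: str, record: Dict[str, object]) -> None:
        self._data[owner] = dict(record)

    def read(self, owner: str, accessor: str, purpose: str,
             fields: List[str]) -> Dict[str, object]:
        result = {f: self._data[owner][f] for f in fields}  # raises KeyError if absent
        # There is no unlogged read path: the usage is recorded before data is returned.
        entry = AccessRecord(accessor, purpose, tuple(fields), datetime.now(timezone.utc))
        self._log.setdefault(owner, []).append(entry)
        return result

    def usages_for(self, owner: str) -> List[AccessRecord]:
        """What the data owner sees: who accessed their data, what, and why."""
        return list(self._log.get(owner, []))
```

Routing every read through one method is a design choice that makes logging hard to bypass. A real system would also need authentication and tamper-evident storage for the log, which this sketch leaves out.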

# コードレビューにおけるコードスニペットの最小化 - OpenStackコミュニティとQtコミュニティの検討と実践者調査

Demystifying Code Snippets in Code Reviews: A Study of the OpenStack and Qt Communities and A Practitioner Survey ( http://arxiv.org/abs/2307.14406v1 )

License: see link

Beiqi Zhang, Liming Fu, Peng Liang, Jiaxin Yu, Chong Wang

(参考訳) コードレビューはソフトウェア開発におけるソフトウェア品質保証のベストプラクティスの1つとして広く知られている。典型的なコードレビュープロセスでは、レビュー担当者が開発者がコミットしたコードをチェックして、コードの品質を保証する。結果として、レビューコメントの情報を理解することは、レビュアーや開発者が効果的なコードレビューを行うための前提条件となる。コードスニペットは、特別なコード形式として、コードレビューに必要な情報を伝えるために使用できる。例えば、レビュアはコードスニペットを使って提案したり、アイデアを精巧にすることで、コードレビューで開発者に必要な情報を満たすことができる。しかし、コードレビューにコードスニペットを提供するプラクティスに注目した研究はほとんどない。このギャップを埋めるために、コードレビューのコードスニペットに関する情報と知識をマイニングする混合手法の研究を行い、実践者や研究者がコードレビューでコードスニペットを使用することについて理解を深めるのに役立つ。具体的には,コードレビューデータのマイニングと実践者の調査の2段階を含む。調査の結果は、レビュー担当者がコードレビューで開発者が必要とする特定の情報を満たすために、適切なシナリオでコードスニペットを提供することができる点を強調している。

Code review is widely known as one of the best practices for software quality assurance in software development. In a typical code review process, reviewers check the code committed by developers to ensure the quality of the code, during which reviewers and developers communicate with each other in review comments to exchange necessary information. As a result, understanding the information in review comments is a prerequisite for reviewers and developers to conduct an effective code review. Code snippets, as a special form of code, can be used to convey necessary information in code reviews. For example, reviewers can use code snippets to make suggestions or elaborate on their ideas to meet developers' information needs in code reviews. However, little research has focused on the practice of providing code snippets in code reviews. To bridge this gap, we conduct a mixed-methods study to mine information and knowledge related to code snippets in code reviews, which can help practitioners and researchers get a better understanding of using code snippets in code review. Specifically, our study includes two phases: mining code review data and conducting a practitioner survey. The study results highlight that reviewers can provide code snippets in appropriate scenarios to meet developers' specific information needs in code reviews, which will facilitate and accelerate the code review process.

Translated: 2023-10-23 16:10:00 Published: 2023-07-26
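
Mining snippets from review comments requires some way to recognize code inside free text. The abstract does not say how the study identified snippets. The heuristic below is an assumption for illustration only. It treats Markdown-style fenced blocks, and runs of lines indented by at least four spaces, as code snippets. The fence string is built programmatically so that the example itself renders cleanly.

```python
import re
from typing import List

FENCE = "`" * 3  # three backticks, built programmatically

# A fenced block: FENCE, optional info string (e.g. a language), newline, body, FENCE.
_FENCED = re.compile(re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE), re.DOTALL)

def extract_snippets(comment: str) -> List[str]:
    """Return code snippets found in a review comment (hypothetical heuristic)."""
    snippets = [m.group(1).rstrip("\n") for m in _FENCED.finditer(comment)]
    remainder = _FENCED.sub("", comment)
    block: List[str] = []
    for line in remainder.splitlines() + [""]:  # empty sentinel flushes the last block
        if line.startswith("    ") and line.strip():
            block.append(line[4:])               # drop the four-space indent marker
        elif block:
            snippets.append("\n".join(block))
            block = []
    return snippets
```

A heuristic like this misses inline code and code pasted without formatting, so a study would typically validate the detector on a manually labeled sample.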

# redditのデータマイニングが新型コロナパンデミックの学生の要求に応える

Mining Reddit Data to Elicit Students' Requirements During COVID-19 Pandemic ( http://arxiv.org/abs/2307.14212v1 )

License: see link

Shadikur Rahman, Faiz Ahmed, Maleknaz Nayebi

(参考訳) データ駆動要件エンジニアリングは、web上のオープンアクセスとクラウドソースの情報を豊富に活用する。モバイルアプリストアのレビューのようなソフトウェア製品に関するユーザフィードバックを取り入れることで、これらのアプローチは問題の特定、バグ修正、変更要求の実装を容易にする。しかしながら、ソフトウェア製品に関するユーザからのフィードバックにのみ依存することは、ソフトウェアが遭遇し、それを支援するために使用する問題、イベント、課題に関する豊富な経験にもかかわらず、ユーザがソフトウェアから正確なニーズを常に明確に理解しているとは限らないため、すべての要件を引き出す可能性を制限する。本研究では,ソフトウェア製品に対するフィードバックにのみ依存するのではなく,問題自体に関するフィードバックを集めることに着目し,要件適用のシフトを提案する。高等教育機関における新型コロナウイルスパンデミック時の学生要件に関するケーススタディを行った。パンデミックの間、Redditからコミュニケーションを集め、複数の機械学習と自然言語処理技術を使って要求文を特定しました。 TF-IDFを用いて複数の手法のベンチマークを行った結果,0.79のFスコアが得られた。その結果,問題に関するコミュニケーションからのマイニングの要件が実現可能であると考えることができた。予備的な結果を示す一方で、これらの要件が従来の要求を補完し、要求ギャップを埋める未来を想定する。

Data-driven requirements engineering leverages the abundance of openly accessible and crowdsourced information on the web. By incorporating user feedback provided about a software product, such as reviews in mobile app stores, these approaches facilitate the identification of issues, bug fixes, and implementation of change requests. However, relying solely on user feedback about a software product limits the possibility of eliciting all requirements, as users may not always have a clear understanding of their exact needs from the software, despite their wealth of experience with the problems, events, or challenges that they encounter and use the software to address. In this study, we propose a shift in requirements elicitation, focusing on gathering feedback related to the problem itself rather than relying solely on feedback about the software product. We conducted a case study on student requirements during the COVID-19 pandemic in a higher education institution. We gathered their communications from Reddit during the pandemic and employed multiple machine-learning and natural language processing techniques to identify requirement sentences. We achieved an F-score of 0.79 using Naive Bayes with TF-IDF when benchmarking multiple techniques. The results lead us to believe that mining requirements from communication about a problem is feasible. While we present the preliminary results, we envision a future where these requirements complement conventionally elicited requirements and help to close the requirements gap.

Translated: 2023-10-23 16:09:19 Published: 2023-07-26
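
The best result above comes from Naive Bayes over TF-IDF features. The self-contained sketch below implements that combination in plain Python so the mechanics are visible. TF-IDF weights act as fractional word counts in a multinomial Naive Bayes model with Laplace smoothing. The tokenizer, the smoothing constant `alpha`, the labels and the toy sentences are assumptions. In practice a library such as scikit-learn would more likely be used.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())

class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights (minimal sketch, Laplace smoothing)."""

    def fit(self, docs: List[str], labels: List[str], alpha: float = 1.0) -> "TfidfNaiveBayes":
        tokenized = [tokenize(d) for d in docs]
        n = len(docs)
        df = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed IDF, the same form as scikit-learn's default: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vocab = set(self.idf)
        weight: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
        class_count = Counter(labels)
        for toks, y in zip(tokenized, labels):
            for t, tf in Counter(toks).items():
                weight[y][t] += tf * self.idf[t]   # TF-IDF mass per class and term
        self.log_prior = {y: math.log(c / n) for y, c in class_count.items()}
        V = len(self.vocab)
        self.log_lik = {}
        for y in class_count:
            total = sum(weight[y].values())
            self.log_lik[y] = {t: math.log((weight[y][t] + alpha) / (total + alpha * V))
                               for t in self.vocab}
        return self

    def predict(self, text: str) -> str:
        counts = Counter(t for t in tokenize(text) if t in self.vocab)  # unseen words ignored
        def score(y: str) -> float:
            return self.log_prior[y] + sum(tf * self.idf[t] * self.log_lik[y][t]
                                           for t, tf in counts.items())
        return max(self.log_prior, key=score)
```

Usage: `TfidfNaiveBayes().fit(sentences, labels).predict("we need recorded lectures")` returns the class label with the highest posterior score.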

# 局所的に観察し、グローバルに分類する: gnnを使ってスパースマトリックス構造を識別する

Observe Locally, Classify Globally: Using GNNs to Identify Sparse Matrix Structure ( http://arxiv.org/abs/2309.02442v1 )

License: see link

Khaled Abdelaal and Richard Veras

(参考訳) スパース行列計算の性能は、行列形式と計算されるデータの基盤構造との整合性に大きく依存する。異なるスパース行列形式は、データの異なる構造に適している。したがって、第一の課題は、計算の前に行列構造を識別して適切なデータ形式に適合させることである。 2つめの課題は、データセット全体を分類する前に読み込むのを避けることだ。これは、サンプルとその特徴を通してマトリックス構造を識別することで実現できる。しかし、グローバルな特徴はサンプリングセットから決定できず、代わりに局所的な特徴から推測する必要がある可能性がある。これらの課題に対処するために,グラフ畳み込みネットワークを用いたスパース行列構造分類器を生成するフレームワークを開発した。フレームワークは、ユーザが提供するジェネレータを使用して、他のマトリックス構造に拡張することもできる。提案手法は,代表的スパース行列形状の97%の分類精度を実現する。

The performance of sparse matrix computation highly depends on the matching of the matrix format with the underlying structure of the data being computed on. Different sparse matrix formats are suitable for different structures of data. Therefore, the first challenge is identifying the matrix structure before the computation to match it with an appropriate data format. The second challenge is to avoid reading the entire dataset before classifying it. This can be done by identifying the matrix structure through samples and their features. Yet, it is possible that global features cannot be determined from a sampling set and must instead be inferred from local features. To address these challenges, we develop a framework that generates sparse matrix structure classifiers using graph convolutional networks. The framework can also be extended to other matrix structures using user-provided generators. The approach achieves 97% classification accuracy on a set of representative sparse matrix shapes.

Translated: 2023-10-23 09:04:33 Published: 2023-07-26
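
The title's idea, observing locally and classifying globally, can be sketched in four steps:

1. Take a sampled square block of the matrix.
2. Treat the block's nonzero pattern as a graph.
3. Run graph-convolution layers over that graph.
4. Mean-pool the node outputs into a fixed-size vector for a structure classifier.

Several choices here are assumptions of this sketch: the node feature (nonzeros per row), the layer sizes, and the symmetric normalization of Kipf and Welling. The authors' framework may differ. The weights are random rather than trained.

```python
import numpy as np

def sample_block(A: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Observe locally: the square submatrix induced by sampled row/column indices."""
    return A[np.ix_(idx, idx)]

def normalized_adjacency(S: np.ndarray) -> np.ndarray:
    """D^-1/2 (P + I) D^-1/2, where P is the symmetrized sparsity pattern of S."""
    P = (S != 0) | (S != 0).T
    np.fill_diagonal(P, True)                      # self-loops
    A_hat = P.astype(float)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def gcn_embedding(S: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """Two graph-convolution layers followed by mean pooling over nodes."""
    A_norm = normalized_adjacency(S)
    H0 = (S != 0).sum(axis=1, keepdims=True).astype(float)  # node feature: nonzeros in the row
    H1 = np.maximum(A_norm @ H0 @ W1, 0.0)                   # ReLU
    H2 = np.maximum(A_norm @ H1 @ W2, 0.0)
    return H2.mean(axis=0)  # fixed-size vector for a downstream structure classifier
```

Because the classifier sees only the pooled embedding of a sample, the cost of reading the full matrix is avoided. That is the motivation the abstract gives for classifying from samples.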

# LieDetect: 点雲からのコンパクトリー群の表現軌道の検出

LieDetect: Detection of representation orbits of compact Lie groups from point clouds ( http://arxiv.org/abs/2309.03086v1 )

License: see link

Henrique Ennes, Rapha\"el Tinarrage

(参考訳) 我々は、その軌道の有限サンプルからコンパクトリー群の表現を推定する新しいアルゴリズムを提案する。提案手法は,他の報告手法と異なり,既約表現の直接和として正確な表現型の検索を可能にする。さらに、表現型の知識は、その軌道の再構成を可能にし、これは作用を生成するリー群を特定するのに役立つ。我々のアルゴリズムは任意のコンパクトリー群に対して一般化されるが、SO(2), T^d, SU(2), SO(3) のインスタンス化のみが考慮される。ハウスドルフとヴァッサーシュタイン距離の観点からのロバスト性の理論的な保証が導かれる。我々のツールは幾何学的測度論、計算幾何学、行列多様体の最適化から導かれる。このアルゴリズムは16次元までの合成データと、画像解析、調和解析、古典力学システムにおける実時間応用のためにテストされ、非常に正確な結果が得られる。

We suggest a new algorithm to estimate representations of compact Lie groups from finite samples of their orbits. Unlike other reported techniques, our method allows the retrieval of the precise representation type as a direct sum of irreducible representations. Moreover, knowing the representation type permits the reconstruction of its orbit, which is useful for identifying the Lie group that generates the action. Our algorithm is general for any compact Lie group, but only instantiations for SO(2), T^d, SU(2) and SO(3) are considered. Theoretical guarantees of robustness in terms of Hausdorff and Wasserstein distances are derived. Our tools are drawn from geometric measure theory, computational geometry, and optimization on matrix manifolds. The algorithm is tested for synthetic data up to dimension 16, as well as real-life applications in image analysis, harmonic analysis, and classical mechanics systems, achieving very accurate results.

Translated: 2023-10-23 08:54:27 Published: 2023-07-26
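
The input to this problem can be made concrete. Ignoring trivial components, a real representation of SO(2) on $\mathbb{R}^{2m}$ is, up to a change of basis, a direct sum of $2\times 2$ rotation blocks $R(k_j\theta)$ with integer weights $k_1, \dots, k_m$. An orbit is the image of one point under all rotations. The sketch below only generates such a point cloud. Recovering the weights from the point cloud is the task the algorithm solves, and it is not implemented here. The function name and the block-diagonal basis are assumptions.

```python
import numpy as np

def so2_orbit(weights, x0, n_samples: int, rng: np.random.Generator) -> np.ndarray:
    """Sample points on the orbit of x0 under theta -> diag(R(k_1 theta), ..., R(k_m theta)).

    weights: integer weights k_j of the 2x2 rotation irreps (the representation type)
    x0:      starting point in R^(2m), read as m consecutive coordinate pairs
    """
    thetas = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)
    x0 = np.asarray(x0, dtype=float).reshape(len(weights), 2)
    pts = np.empty((n_samples, 2 * len(weights)))
    for j, k in enumerate(weights):
        c, s = np.cos(k * thetas), np.sin(k * thetas)
        a, b = x0[j]
        pts[:, 2 * j] = c * a - s * b        # rotate the j-th coordinate pair by k * theta
        pts[:, 2 * j + 1] = s * a + c * b
    return pts
```

For example, weights `[1, 2]` in $\mathbb{R}^4$ give a closed curve on a torus. Its shape, not just its dimension, encodes the representation type that the algorithm estimates.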

# 炭素オフセット市場におけるweb3活用の枠組みとケーススタディ

Harnessing Web3 on Carbon Offset Market for Sustainability: Framework and A Case Study ( http://arxiv.org/abs/2308.02039v1 )

License: see link

Chenyu Zhou, Hongzhou Chen, Shiman Wang, Xinyao Sun, Abdulmotaleb El Saddik, Wei Cai

(参考訳) metaverseとweb3を形作る上で重要なブロックチェーンは、高エネルギー消費と二酸化炭素排出に対する批判をしばしば引き起こす。持続可能性を重視したブロックチェーンの台頭、特に革新的なワイヤレス技術と交差する場合、この状況は改善される。持続可能性におけるブロックチェーンの役割を理解するために,記録と追跡,広範な検証,バリュートレーディング,概念拡散という4つのグリーンユーティリティをカプセル化した3層構造を提案する。分権的自主的炭素オフセットプロジェクトであるノリ(Nori)が当社の事例として,これらのユーティリティを照らす。我々の研究は、オンチェーンの炭素市場参加者、市場に影響する要因、NFTベースの炭素クレジットの価値提案、そして炭素オフセットの概念を広めるためのソーシャルメディアの役割について、独自の洞察を明らかにする。ブロックチェーンのサステナビリティへの貢献は重要であり、ブロックチェーンセクターにおける新たな標準としてカーボンオフセットが進化する可能性がある、と私たちは主張しています。

Blockchain, pivotal in shaping the metaverse and Web3, often draws criticism for high energy consumption and carbon emissions. The rise of sustainability-focused blockchains, especially when intersecting with innovative wireless technologies, is changing this picture. To understand blockchain's role in sustainability, we propose a three-layer structure encapsulating four green utilities: Recording and Tracking, Wide Verification, Value Trading, and Concept Disseminating. Nori, a decentralized voluntary carbon offset project, serves as our case, illuminating these utilities. Our research unveils unique insights into on-chain carbon market participants, factors affecting the market, value propositions of NFT-based carbon credits, and the role of social media in spreading the concept of carbon offsetting. We argue that blockchain's contribution to sustainability is significant, with carbon offsetting potentially evolving as a new standard within the blockchain sector.

Translated: 2023-08-14 01:59:34 Published: 2023-07-26

# 適応的特徴埋め込みを用いたフレキシブルな個人用垂直学習

Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings ( http://arxiv.org/abs/2308.02362v1 )

License: see link

Yuxi Mi, Hongquan Liu, Yewei Xia, Yiheng Sun, Jihong Guan, Shuigeng Zhou

(参考訳) 垂直連合学習(vertical federated learning, vfl)の出現は、プライバシー保護の不完全性に関する懸念を刺激した。本稿では、データプライバシとVFLのタスクユーティリティ目標との微妙な均衡を、差分プライバシー(DP)下で検討する。先行技術の一般性問題に対処するため,本稿では,2つの目標を分離し,順次対処するフレキシブルで汎用的なアプローチを提唱する。具体的には、さまざまなデータセットやモデルに適用可能な共有機能埋め込みにノームクリップを適用することで、最初は厳格なプライバシー保証を導き出します。提案手法は,DP機構を改良することなく,機能埋め込みの規模や分布を精度よく調整することで,タスクユーティリティを最適化できることを実証する。提案するVFL-AFEフレームワークは,広範な実験によって実証されたように,プライバシ攻撃に対する有効性と,良好なタスクユーティリティを維持する能力を示す。

The emergence of vertical federated learning (VFL) has raised concerns about imperfect privacy protection, as shared feature embeddings may reveal sensitive information under privacy attacks. This paper studies the delicate equilibrium between data privacy and task utility goals of VFL under differential privacy (DP). To address the generality issue of prior art, this paper advocates a flexible and generic approach that decouples the two goals and addresses them successively. Specifically, we initially derive a rigorous privacy guarantee by applying norm clipping on shared feature embeddings, which is applicable across various datasets and models. Subsequently, we demonstrate that task utility can be optimized via adaptive adjustments on the scale and distribution of feature embeddings in an accuracy-appreciative way, without compromising established DP mechanisms. We concretize our observation into the proposed VFL-AFE framework, which exhibits effectiveness against privacy attacks and the capacity to retain favorable task utility, as substantiated by extensive experiments.

Translated: 2023-08-14 01:47:28 Published: 2023-07-26
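
The first step above, norm clipping on the shared feature embeddings, bounds each record's influence so that calibrated noise yields a DP guarantee. The minimal sketch below makes three assumptions:

- It uses the classic Gaussian mechanism, which is valid for $\varepsilon < 1$.
- It takes an L2 sensitivity of $2C$, the maximum change when one record's clipped embedding is replaced.
- It releases the embeddings once.

The paper's actual mechanism, its privacy accounting and its adaptive rescaling step are not reproduced here.

```python
import math
import numpy as np

def clip_rows(E: np.ndarray, C: float) -> np.ndarray:
    """Scale each embedding row so that its L2 norm is at most C."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    return E * np.minimum(1.0, C / np.maximum(norms, 1e-12))

def gaussian_sigma(C: float, eps: float, delta: float) -> float:
    """Classic Gaussian-mechanism noise scale: sigma = S * sqrt(2 ln(1.25/delta)) / eps."""
    if not 0.0 < eps < 1.0:
        raise ValueError("this textbook bound assumes 0 < eps < 1")
    sensitivity = 2.0 * C  # replacing one record moves its clipped embedding by at most 2C
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

def privatize(E: np.ndarray, C: float, eps: float, delta: float,
              rng: np.random.Generator) -> np.ndarray:
    """Clip, then add isotropic Gaussian noise before sharing the embeddings."""
    Ec = clip_rows(E, C)
    return Ec + rng.normal(0.0, gaussian_sigma(C, eps, delta), size=Ec.shape)
```

Because the guarantee depends only on the clipping norm `C`, it holds whatever the dataset or bottom model. This matches the abstract's point that the privacy step can be decoupled from later utility-oriented adjustments.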

# 欧州のai法で許容されるリスク - リスク管理がどの程度で十分かを決める他の原則

Acceptable risks in Europe's proposed AI Act: Reasonableness and other principles for deciding how much risk management is enough ( http://arxiv.org/abs/2308.02047v1 )

License: see link

Henry Fraser and Jose-Miguel Bello y Villarino

(参考訳) 本稿では、基本的権利と安全性にリスクをもたらすリスクの高いAIシステムに対して、欧州委員会が提案したリスク管理とリスク受容性に対するAI法のアプローチを批判的に評価する。この法律は「信頼できる」AIを規制の負担に比例して推進することを目的としている。そのリスク受容性に関する規定は、リスクの高いシステムからの残留リスクを「芸術の状態」に関して「可能な限り」削減または排除する必要がある。この基準、特に狭義に解釈される場合、作業不能であり、規制上の負担や信頼性を比例しない。これとは対照的に、欧州議会のリスク管理条項に関する最新の修正案は「合理的性」、コスト対効果の分析を導入し、リスク受容可能性判断の価値と文脈の性質をより透明にしている。この論文では、議会のアプローチはより機能的であり、比例性と信頼性の目標のバランスが良いと論じている。リスク受容性判断の合理的性は、無視法や欧州医療機器規制の原則に依拠して説明されている。また、リスク受容性判断のアプローチには、規制当局による詳細なガイダンスや関与、影響のある利害関係者からの有意義なインプットなど、市民の正当性の確固たる基礎が必要です。

This paper critically evaluates the European Commission's proposed AI Act's approach to risk management and risk acceptability for high-risk AI systems that pose risks to fundamental rights and safety. The Act aims to promote "trustworthy" AI with a proportionate regulatory burden. Its provisions on risk acceptability require residual risks from high-risk systems to be reduced or eliminated "as far as possible", having regard to the "state of the art". This criterion, especially if interpreted narrowly, is unworkable and promotes neither proportionate regulatory burden, nor trustworthiness. By contrast the Parliament's most recent draft amendments to the risk management provisions introduce "reasonableness", cost-benefit analysis, and are more transparent about the value-laden and contextual nature of risk acceptability judgements. This paper argues that the Parliament's approach is more workable, and better balances the goals of proportionality and trustworthiness. It explains what reasonableness in risk acceptability judgments would entail, drawing on principles from negligence law and European medical devices regulation. And it contends that the approach to risk acceptability judgments needs a firm foundation of civic legitimacy, including detailed guidance or involvement from regulators, and meaningful input from affected stakeholders.

Translated: 2023-08-14 01:47:10 Published: 2023-07-26

# データサイエンスの民主化におけるChatGPTの役割--テレマティクスにおけるAIによるデータ分析の探索

The Role of ChatGPT in Democratizing Data Science: An Exploration of AI-facilitated Data Analysis in Telematics ( http://arxiv.org/abs/2308.02045v1 )

License: see link

Ryan Lingo

(参考訳) データサイエンスの領域は、かつて専門家のために確保されていたもので、生成AIの急速な台頭、特にChatGPTのようなツールを通じて革命を繰り広げている。本稿では,chatgptを重要な橋として捉え,従来の複雑なデータ分析に伴う急な学習曲線を格段に下げる。直感的なデータナラティブを生成し、リアルタイムのアシストを提供することで、ChatGPTはフィールドを民主化し、複雑なデータセットからより広い聴衆が洞察を得られるようにする。この変換ポテンシャルの注目すべき例が、合成生成されたテレマティクスデータセットの検証を通じて示され、chatgptは複雑なパターンや洞察を蒸留するのに役立っている。しかし、民主化への旅にはハードルがないわけではない。この論文は、分析における潜在的なバイアスからChatGPTの限定的な推論能力まで、そのようなAIが提示する課題について論じている。民主化されたデータサイエンスの展望の約束は、この移行に注意と認識、そしてツールの能力と制約の絶え間なく進化し続ける理解を持って取り組むことが不可欠である。

The realm of data science, once reserved for specialists, is undergoing a revolution with the rapid emergence of generative AI, particularly through tools like ChatGPT. This paper posits ChatGPT as a pivotal bridge, drastically lowering the steep learning curve traditionally associated with complex data analysis. By generating intuitive data narratives and offering real-time assistance, ChatGPT democratizes the field, enabling a wider audience to glean insights from intricate datasets. A notable illustration of this transformative potential is provided through the examination of a synthetically generated telematics dataset, wherein ChatGPT aids in distilling complex patterns and insights. However, the journey to democratization is not without its hurdles. The paper delves into challenges presented by such AI, from potential biases in analysis to ChatGPT's limited reasoning capabilities. While the promise of a democratized data science landscape beckons, it is imperative to approach this transition with caution, cognizance, and an ever-evolving understanding of the tool's capabilities and constraints.

Translated: 2023-08-14 01:46:47 Published: 2023-07-26

# ホップ長に対する微分可能な短時間フーリエ変換

Differentiable short-time Fourier transform with respect to the hop length ( http://arxiv.org/abs/2308.02421v1 )

License: see link

Maxime Leiber, Yosra Marnissi, Axel Barrau, Mohammed El Badaoui

(参考訳) 本稿では,これらのパラメータを連続させることにより,ホップ長やフレーム時間位置の勾配に基づく最適化を可能にする,短時間フーリエ変換(STFT)の微分可能バージョンを提案する。ホップ長の連続的な性質により、より微調整された最適化が可能となり、フレームの時間的位置決めの制御を改善した。さらに,従来の離散最適化手法よりも計算効率がよい勾配降下法などの最適化手法の利用も可能である。私たちの差別化可能なSTFTは、既存のアルゴリズムやニューラルネットワークに簡単に統合することができます。本研究は,提案手法の有効性を実証し,研究コミュニティの関心を惹きつけるためのシミュレーションイラストを提示する。

In this paper, we propose a differentiable version of the short-time Fourier transform (STFT) that allows for gradient-based optimization of the hop length or the frame temporal position by making these parameters continuous. Our approach provides improved control over the temporal positioning of frames, as the continuous nature of the hop length allows for a more finely-tuned optimization. Furthermore, our contribution enables the use of optimization methods such as gradient descent, which are more computationally efficient than conventional discrete optimization methods. Our differentiable STFT can also be easily integrated into existing algorithms and neural networks. We present a simulated illustration to demonstrate the efficacy of our approach and to garner interest from the research community.

Translated: 2023-08-14 01:39:01 Published: 2023-07-26
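
The key move in the abstract is letting the frame positions $t_k = k \cdot \mathrm{hop}$ take real values. One way to see why this makes the STFT differentiable in the hop length: if the window is a smooth function evaluated at continuous offsets $n - t_k$, every coefficient varies smoothly with the hop. The NumPy sketch below checks that continuity with a finite difference. Several choices are assumptions of this sketch: the Gaussian window, the fixed number of frames (so the output shape does not jump) and phase referenced to each frame center. Actual gradient descent on the hop would need an autodiff framework.

```python
import numpy as np

def stft_continuous_hop(x: np.ndarray, hop: float, n_frames: int,
                        sigma: float, n_fft: int) -> np.ndarray:
    """STFT with frame centers t_k = k * hop, where hop may be any positive real.

    The Gaussian window is evaluated at continuous offsets n - t_k, so every
    coefficient is a smooth function of hop.
    Returns an array of shape (n_fft // 2 + 1, n_frames).
    """
    n = np.arange(len(x), dtype=float)
    centers = hop * np.arange(n_frames)                                  # (K,) continuous positions
    offsets = n[None, :] - centers[:, None]                              # (K, N)
    win = np.exp(-0.5 * (offsets / sigma) ** 2)                          # (K, N) Gaussian windows
    freqs = np.arange(n_fft // 2 + 1) / n_fft                            # cycles per sample
    phase = np.exp(-2j * np.pi * freqs[:, None, None] * offsets[None])   # (F, K, N)
    return np.einsum("fkn,kn,n->fk", phase, win, x)
```

An integer hop recovers ordinary frame spacing. A non-integer hop simply places the frames between samples, which is what makes the hop, or any individual frame position, a tunable continuous parameter.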

# ウィンドウ長に対する可変適応短時間フーリエ変換

Differentiable adaptive short-time Fourier transform with respect to the window length ( http://arxiv.org/abs/2308.02418v1 )

License: see link

Maxime Leiber, Yosra Marnissi, Axel Barrau, Mohammed El Badaoui

(参考訳) 本稿では,短時間フーリエ変換 (STFT) のフレーム単位と周波数単位のウィンドウ長を段階的に最適化する手法を提案する。結果として得られる微分可能適応STFTは、過渡成分と定常成分の両方に同じ時間周波数表現で適応できるなど、可換性を持っているが、勾配降下により容易に最適化できる。本手法の性能を振動解析で検証する。

This paper presents a gradient-based method for on-the-fly optimization of both the per-frame and per-frequency window lengths of the short-time Fourier transform (STFT). It builds on previous work in which we developed a differentiable version of the STFT by making the window length a continuous parameter. The resulting differentiable adaptive STFT has desirable properties, such as the ability to adapt, within the same time-frequency representation, to both transient and stationary components, while being easily optimized by gradient descent. We validate the performance of our method in vibration analysis.

Translated: 2023-08-14 01:38:24 Published: 2023-07-26
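
A continuous window length can be sketched in the same spirit by giving each frame its own Gaussian width $\sigma_k$. The coefficients are then smooth functions of each $\sigma_k$. This NumPy sketch is illustrative only: per-frequency widths are omitted, and windows are normalized to unit sum so that frames of different widths are comparable. The test demonstrates the trade-off that an adaptive window exploits. A narrow window (suited to transients) spreads a pure tone over more frequency bins than a wide window (suited to stationary parts).

```python
import numpy as np

def stft_adaptive_window(x: np.ndarray, centers, sigmas, n_fft: int) -> np.ndarray:
    """STFT where frame k is centered at centers[k] with Gaussian width sigmas[k].

    sigma enters only through exp(-0.5 * ((n - c) / sigma) ** 2), so each
    coefficient is differentiable with respect to its frame's window width.
    Returns an array of shape (n_fft // 2 + 1, len(centers)).
    """
    n = np.arange(len(x), dtype=float)
    offsets = n[None, :] - np.asarray(centers, dtype=float)[:, None]               # (K, N)
    win = np.exp(-0.5 * (offsets / np.asarray(sigmas, dtype=float)[:, None]) ** 2)
    win /= win.sum(axis=1, keepdims=True)                                          # unit-sum windows
    freqs = np.arange(n_fft // 2 + 1) / n_fft
    phase = np.exp(-2j * np.pi * freqs[:, None, None] * offsets[None])             # (F, K, N)
    return np.einsum("fkn,kn,n->fk", phase, win, x)
```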

# 先進運転支援システムにおける視力検出

Visual Saliency Detection in Advanced Driver Assistance Systems ( http://arxiv.org/abs/2308.03770v1 )

License: see link

Francesco Rundo, Michael Sebastian Rundo, Concetto Spampinato

(参考訳) ビジュアル・サリエンシ(Visual Saliency)とは、観察された環境から重要な特徴を集中して抽出する人間のメカニズムである。近年,視覚障害者の視力評価に関する自動車研究の分野への関心が高まっている。運転中、ドライバーは自然に特定の物体に注意を向け、他の要素よりも特定の要素を優先する脳駆動のサリエンシメカニズムを採用する。本研究では,ドライバの眠気検出システムと,サリエンシーに基づくシーン理解パイプラインを組み合わせたインテリジェントシステムを提案する。そこで本研究では,自動車グレードの外部カメラで捉えたフレームの処理を事前訓練し,調整した,セマンティックセグメンテーションのための3Dディープネットワークを実装した。提案されたパイプラインは、ARM A7デュアルコアを備えたSTA1295コアを使用した組み込みプラットフォーム上にホストされ、ハードウェアアクセラレータが組み込まれている。さらに,自動車ハンドルに埋め込まれた革新的なバイオセンサーを用いて運転者の眠気を監視し,運転者の光PlethysmoGraphy(PPG)信号を収集する。収集したppg時系列を分類する専用の1次元時間深層畳み込みネットワークが考案され,ドライバの注意度を評価することができた。最終的に、運転者の決定された注意レベルと対応する相性に基づくシーン分類を比較し、全体の安全レベルを評価する。提案したパイプラインの有効性は広範な実験結果によって検証された。

Visual Saliency refers to the innate human mechanism of focusing on and extracting important features from the observed environment. Recently, there has been a notable surge of interest in the field of automotive research regarding the estimation of visual saliency. While operating a vehicle, drivers naturally direct their attention towards specific objects, employing brain-driven saliency mechanisms that prioritize certain elements over others. In this investigation, we present an intelligent system that combines a drowsiness detection system for drivers with a scene comprehension pipeline based on saliency. To achieve this, we have implemented a specialized 3D deep network for semantic segmentation, which has been pretrained and tailored for processing the frames captured by an automotive-grade external camera. The proposed pipeline was hosted on an embedded platform utilizing the STA1295 core, featuring a dual-core ARM A7, and embeds a hardware accelerator. Additionally, we employ an innovative biosensor embedded in the car steering wheel to monitor driver drowsiness by gathering the driver's PhotoPlethysmoGraphy (PPG) signal. A dedicated 1D temporal deep convolutional network has been devised to classify the collected PPG time series, enabling us to assess the driver's level of attentiveness. Ultimately, we compare the driver's determined attention level with the corresponding saliency-based scene classification to evaluate the overall safety level. The efficacy of the proposed pipeline has been validated through extensive experimental results.

翻訳日:2023-08-14 00:41:27 公開日:2023-07-26

# 薬理学データベースへの自然インタフェースのための大規模言語モデルの利用

Utilizing Large Language Models for Natural Interface to Pharmacology Databases ( http://arxiv.org/abs/2307.15717v1 )

ライセンス: Link先を確認

Hong Lu, Chuan Li, Yinheng Li, Jie Zhao

(参考訳) 創薬プロセスでは、薬理学者が文献のレビュー、仮説の定式化、実験の設計、結果の解釈など、様々なタスクを担う必要がある。各段階では大量の情報にアクセスし、問い合わせる必要がある。本稿では,データベースに格納された構造化情報と対話するためのLarge Language Model (LLM)ベースの自然言語インタフェースを提案する。実験により、提案フレームワークの実現可能性と有効性を示す。このフレームワークは、幅広い薬学データと知識ベースへの問合せに一般化することができる。

The drug development process necessitates that pharmacologists undertake various tasks, such as reviewing literature, formulating hypotheses, designing experiments, and interpreting results. Each stage requires accessing and querying vast amounts of information. In this abstract, we introduce a Large Language Model (LLM)-based Natural Language Interface designed to interact with structured information stored in databases. Our experiments demonstrate the feasibility and effectiveness of the proposed framework. This framework can generalize to query a wide range of pharmaceutical data and knowledge bases.
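The abstract does not say how the LLM talks to the database. Below is a minimal sketch under one common assumption, a text-to-SQL design. The LLM receives the schema plus the user's question and returns SQL, and that SQL is run read-only. `build_prompt`, `fake_llm`, the `drugs` table and its rows are all hypothetical, and `fake_llm` stands in for a real model call.

```python
import sqlite3

# Hypothetical toy schema and rows; not taken from the paper.
SCHEMA = "CREATE TABLE drugs (name TEXT, target TEXT, ic50_nm REAL)"


def build_prompt(question: str, schema: str) -> str:
    """Assemble the text an LLM would receive: the schema plus the user's question."""
    return (
        "Translate the question into a single SQLite SELECT statement.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\nSQL:"
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; always returns the same query."""
    return "SELECT name FROM drugs WHERE target = 'EGFR' ORDER BY ic50_nm"


def answer(question: str, conn: sqlite3.Connection, llm=fake_llm):
    sql = llm(build_prompt(question, SCHEMA)).strip()
    # Guard: only execute read-only SELECT statements produced by the model.
    if not sql.lower().startswith("select"):
        raise ValueError(f"refusing non-SELECT query: {sql!r}")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO drugs VALUES (?, ?, ?)",
    [("drug_a", "EGFR", 33.0), ("drug_b", "EGFR", 2.0), ("drug_c", "ABL1", 25.0)],
)
print(answer("Which EGFR-targeting drugs are most potent?", conn))
# [('drug_b',), ('drug_a',)]
```

Keeping the model's output confined to a single validated SELECT is a design choice of this sketch. It keeps a mistaken or adversarial generation from modifying the data.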

翻訳日:2023-08-06 11:35:37 公開日:2023-07-26

# 生成言語モデルを用いたニューラルマシン翻訳のためのデータ拡張

Data Augmentation for Neural Machine Translation using Generative Language Model ( http://arxiv.org/abs/2307.16833v1 )

ライセンス: Link先を確認

Seokjin Oh, Su ah Lee and Woohwan Jung

(参考訳) モデルアーキテクチャの急速な発展にもかかわらず、大規模な並列コーパスの不足はニューラル機械翻訳の主要なボトルネックであり続けている。データ拡張(Data augmentation)は、新しいデータを集める代わりに合成データを生成することによって、大量のデータを必要とするモデルの性能を向上させる技術である。ChatGPTのような大規模言語モデルを活用したプロンプトベースのデータ拡張手法について検討する。合成並列コーパスを作成するために,異なるプロンプトを用いた3つの手法を比較する。生成した合成データの多様性を測定するために2つの評価指標を用いる。このアプローチは、逆翻訳(back-translation)のような他の拡張手法では必須となる追加のモデル訓練コストを必要としない。提案手法は、拡張なしのベースラインをBLEUスコアで0.68改善する。

Despite the rapid growth in model architecture, the scarcity of large parallel corpora remains the main bottleneck in Neural Machine Translation. Data augmentation is a technique that enhances the performance of data-hungry models by generating synthetic data instead of collecting new data. We explore prompt-based data augmentation approaches that leverage large-scale language models such as ChatGPT. To create a synthetic parallel corpus, we compare three methods using different prompts. We employ two assessment metrics to measure the diversity of the generated synthetic data. This approach incurs no additional model-training cost, unlike other augmentation methods such as back-translation, which require it. The proposed method improves the unaugmented baseline by 0.68 BLEU points.
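The abstract mentions two diversity metrics but does not name them. One widely used diversity measure for generated text is distinct-n, the fraction of n-grams that are unique across a corpus. Treat its use here as an assumption, not as the paper's choice. The minimal sketch below uses plain whitespace tokenization.

```python
def distinct_n(sentences, n=2):
    """Distinct-n: unique n-grams divided by total n-grams (whitespace tokens)."""
    total = 0
    unique = set()
    for sentence in sentences:
        tokens = sentence.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


synthetic = ["the cat sat", "the cat ran"]
print(distinct_n(synthetic, n=2))  # 0.75: 3 unique bigrams out of 4
```

Higher values indicate less repetitive synthetic data. A real evaluation would tokenize each language properly rather than splitting on spaces.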

翻訳日:2023-08-06 11:22:22 公開日:2023-07-26

# 単語埋め込みにおけるアイデアの流れ

The flow of ideas in word embeddings ( http://arxiv.org/abs/2307.16819v1 )

ライセンス: Link先を確認

Debayan Dasgupta

(参考訳) アイデアの流れは物理学者、心理学者、機械学習エンジニアによって広く研究されてきた。本稿では, マイクロレオロジーの特定のツールを用いて, 類似性に基づくアイデアの流れを考察する。単語埋め込みにランダムウォーカーを導入し,その振る舞いを調べる。埋め込み空間を通るこのような類似性を介したランダムウォークは、生体細胞や複雑流体のような複雑な構造を持つ系でよく見られる異常拡散の兆候を示す。本論文の結びとして,ランダムウォークやブラウン運動下の粒子拡散の研究で用いられる一般的なツールを応用し,文書への多様なアイデアの取り込みを定量的に評価することを提案する。全体として,マイクロレオロジーと機械学習の概念を組み合わせた自己参照的な手法を提示し,言語モデルの蛇行する傾向と,それが創造性と関連する可能性を探る。

The flow of ideas has been extensively studied by physicists, psychologists, and machine learning engineers. This paper adopts specific tools from microrheology to investigate the similarity-based flow of ideas. We introduce a random walker in word embeddings and study its behavior. Such similarity-mediated random walks through the embedding space show signatures of anomalous diffusion commonly observed in complex structured systems such as biological cells and complex fluids. The paper concludes by proposing the application of popular tools employed in the study of random walks and diffusion of particles under Brownian motion to assess quantitatively the incorporation of diverse ideas in a document. Overall, this paper presents a self-referenced method combining microrheology and machine learning concepts to explore the meandering tendencies of language models and their potential association with creativity.
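The abstract gives neither the walker's transition rule nor the analysis code. Below is a minimal sketch built on an assumed greedy rule: hop to the most cosine-similar word not yet visited. It then applies the standard microrheology diagnostic, the mean squared displacement MSD(τ) ∝ τ^α. Here α = 1 is ordinary diffusion and α ≠ 1 signals anomalous diffusion. The toy embeddings are random vectors, not a trained model.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_walk(emb, start, steps):
    """Assumed rule: hop to the most cosine-similar word not yet visited."""
    path, visited = [start], {start}
    for _ in range(steps):
        candidates = [w for w in emb if w not in visited]
        if not candidates:
            break
        current = emb[path[-1]]
        nxt = max(candidates, key=lambda w: cosine(current, emb[w]))
        path.append(nxt)
        visited.add(nxt)
    return [emb[w] for w in path]


def msd(traj, lag):
    """Mean squared displacement at one lag, averaged over all start times."""
    disp = [sum((a - b) ** 2 for a, b in zip(traj[t + lag], traj[t]))
            for t in range(len(traj) - lag)]
    return sum(disp) / len(disp)


def fit_alpha(traj, lags):
    """Least-squares slope of log MSD vs log lag (alpha = 1: normal diffusion)."""
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(msd(traj, lag)) for lag in lags]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


rng = random.Random(0)
emb = {f"w{i}": [rng.gauss(0, 1) for _ in range(8)] for i in range(200)}
trajectory = similarity_walk(emb, "w0", 100)
print(fit_alpha(trajectory, [1, 2, 4, 8, 16]))
```

As a sanity check, a straight-line (ballistic) trajectory has MSD(τ) = τ², so `fit_alpha` returns 2 for it.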

翻訳日:2023-08-06 11:22:13 公開日:2023-07-26

# ヒューマノイドロボットに具体化された汎用知性の構築とテスト

Building and Testing a General Intelligence Embodied in a Humanoid Robot ( http://arxiv.org/abs/2307.16770v1 )

ライセンス: Link先を確認

Suzanne Gildert and Geordie Rose

(参考訳) 人間レベルの知能を持つ機械は、経済的に価値のある仕事のほとんどをこなせるはずだ。これは、人間のような心を作るという科学的な大挑戦と、大きな経済的インセンティブとを一致させる。本稿では,このようなシステムの構築とテストへの我々のアプローチについて述べる。我々のアプローチは、物理的なヒューマノイドロボットシステム、このタイプのロボットのためのソフトウェアベースの制御システム、ヒューマノイドロボットにおける人間に似た知性を測定するために設計された性能指標(g+と呼ぶ)、およびこの性能指標のスコアを漸進的に高める進化的アルゴリズムから構成される。それぞれの現状を紹介し、説明する。本報告では, ここで述べたシステムにおけるg+指標の現在および過去の測定結果を報告する。

Machines with human-level intelligence should be able to do most economically valuable work. This aligns a major economic incentive with the scientific grand challenge of building a human-like mind. Here we describe our approach to building and testing such a system. Our approach comprises a physical humanoid robotic system; a software-based control system for robots of this type; a performance metric, which we call g+, designed to be a measure of human-like intelligence in humanoid robots; and an evolutionary algorithm for incrementally increasing scores on this performance metric. We introduce and describe the current status of each of these. We report on current and historical measurements of the g+ metric on the systems described here.

翻訳日:2023-08-06 11:21:33 公開日:2023-07-26

# 大規模言語モデルの透かしを強固にする3つのれんが

Three Bricks to Consolidate Watermarks for Large Language Models ( http://arxiv.org/abs/2308.00113v1 )

ライセンス: Link先を確認

Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, Teddy Furon

(参考訳) 生成テキストと自然テキストの区別はますます困難になっている。この文脈において、ウォーターマーキング(電子透かし)は、生成されたテキストを特定のモデルに帰属させるための有望な技術として浮上している。これはサンプリングによる生成プロセスを変更して生成出力に目に見えない痕跡を残し、後の検出を容易にする。本研究は,3つの理論的および経験的考察に基づいて,大規模言語モデルの透かしを強固にする。第1に、低い偽陽性率($10^{-6}$未満)でも有効な、堅牢な理論的保証を提供する新しい統計検定を導入する。第2に,自然言語処理の分野における古典的なベンチマークを用いて透かしの有効性を比較し,実世界への適用可能性についての知見を得る。第3に,LLMへのアクセスが可能なシナリオ向けの高度な検出手法と、マルチビット透かしを開発する。

The task of discerning between generated and natural texts is increasingly challenging. In this context, watermarking emerges as a promising technique for ascribing generated text to a specific model. It alters the sampling generation process so as to leave an invisible trace in the generated output, facilitating later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees which remain valid even at low false-positive rates (less than $10^{-6}$). Second, we compare the effectiveness of watermarks using classical benchmarks in the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.
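Why exact tests matter at a false-positive rate of $10^{-6}$ can be shown with a greenlist-style scheme. Assume, as an idealization, that each scored token of unwatermarked text lands in a "green" list of fraction γ independently. The green count over T tokens is then Binomial(T, γ). The exact tail probability below is a valid p-value. The common z-score shortcut relies on a Gaussian approximation that can be substantially off in the far tail. This is a sketch in the spirit of the first contribution, not the authors' specific tests, which also cover other watermarking schemes.

```python
import math


def exact_binomial_pvalue(num_tokens: int, num_green: int, gamma: float) -> float:
    """P[Binomial(num_tokens, gamma) >= num_green], summed exactly."""
    return sum(
        math.comb(num_tokens, k) * gamma**k * (1 - gamma) ** (num_tokens - k)
        for k in range(num_green, num_tokens + 1)
    )


def z_test_pvalue(num_tokens: int, num_green: int, gamma: float) -> float:
    """One-sided p-value from the Gaussian (z-score) approximation."""
    z = (num_green - gamma * num_tokens) / math.sqrt(num_tokens * gamma * (1 - gamma))
    return 0.5 * math.erfc(z / math.sqrt(2))


# Example: 50 scored tokens, 30 of them green, green-list fraction 0.25.
print(exact_binomial_pvalue(50, 30, 0.25), z_test_pvalue(50, 30, 0.25))
```

Comparing the two printed values for short texts shows how far the approximation can drift from the exact tail.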

翻訳日:2023-08-06 11:13:34 公開日:2023-07-26

# 一文は千枚の絵に値する: 大規模言語モデルは人間の言語を理解できるか?

A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Human Language? ( http://arxiv.org/abs/2308.00109v1 )

ライセンス: Link先を確認

Gary Marcus, Evelina Leivada, Elliot Murphy

(参考訳) 人工知能アプリケーションは、次単語予測に依存する言語関連タスクで大きな可能性を示している。現世代の大規模言語モデルは、人間のような言語的パフォーマンスに関する主張と結び付けられており、その応用は汎用人工知能への重要な一歩として、また人間の言語の認知的、さらには神経的基盤を理解する上での大きな進歩として称賛されている。我々は,大規模言語モデルの寄与を、対象システムの理論的に有益な表現としてのものか、理論を伴わない強力な機械的ツールとしてのものかという観点から分析し,これらのモデルの開発と活用の現状においてまだ欠けている重要な能力を特定する。

Artificial Intelligence applications show great potential for language-related tasks that rely on next-word prediction. The current generation of large language models has been linked to claims about human-like linguistic performance, and their applications are hailed both as a key step towards Artificial General Intelligence and as a major advance in understanding the cognitive, and even neural, basis of human language. We analyze the contribution of large language models as theoretically informative representations of a target system vs. atheoretical powerful mechanistic tools, and we identify the key abilities that are still missing from the current state of development and exploitation of these models.

翻訳日:2023-08-06 11:13:21 公開日:2023-07-26

# DPBERT:動的計画に基づくBERTの効率的な推論

DPBERT: Efficient Inference for BERT based on Dynamic Planning ( http://arxiv.org/abs/2308.00108v1 )

ライセンス: Link先を確認

Weixin Wu and Hankz Hankui Zhuo

(参考訳) BERTのような大規模事前訓練言語モデルは、NLPの発展に大きく貢献してきた。しかし、これらのモデルには膨大な計算資源が必要であり、計算能力に制限のあるモバイルデバイスに適用することは困難である。本稿では,BERTの構造を十分に活用できない既存の入力適応型推論手法の弱点に対処することを目的とする。入力サンプルの計算経路としてバックボーンのTransformer層の部分列を選択することで,BERTの推論プロセスを高速化する新しい微調整手法であるBERTの動的プランニング(Dynamic Planning in BERT)を提案する。これを実現するため、本手法では、推論中に各層を含めるかバイパスするかを判断する計画モジュールを元のBERTモデルに追加する。GLUEベンチマークでの実験の結果,98%の精度を維持しつつ遅延を75%に低減し,最先端の入力適応法と比較してより良い精度と速度のトレードオフが得られた。

Large-scale pre-trained language models such as BERT have contributed significantly to the development of NLP. However, those models require large computational resources, making them difficult to apply on mobile devices where computing power is limited. In this paper we aim to address the weakness of existing input-adaptive inference methods, which fail to take full advantage of the structure of BERT. We propose Dynamic Planning in BERT, a novel fine-tuning strategy that accelerates BERT inference by selecting a subsequence of the backbone's transformer layers as the computational path for each input sample. To do this, our approach adds a planning module to the original BERT model that determines whether each layer is included or bypassed during inference. Experimental results on the GLUE benchmark show that our method reduces latency to 75% while maintaining 98% accuracy, yielding a better accuracy-speed trade-off than state-of-the-art input-adaptive methods.
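In DPBERT the planning module is learned during fine-tuning, and the abstract does not give its architecture. The sketch below only illustrates the inference-time mechanics: a per-input plan decides which layers run, and bypassed layers pass their input through unchanged. `add_one_layer` and `toy_planner` are hypothetical placeholders, not BERT layers or the paper's planner.

```python
from typing import Callable, List, Tuple

Vector = List[int]
Layer = Callable[[Vector], Vector]


def add_one_layer(x: Vector) -> Vector:
    """Hypothetical stand-in for a transformer layer: adds 1 to every entry."""
    return [v + 1 for v in x]


def toy_planner(x: Vector, n_layers: int, threshold: float = 1.0) -> List[bool]:
    """Hypothetical stand-in for the learned planning module.

    'Easy' inputs (small norm) keep only even-indexed layers; others keep all.
    """
    easy = sum(v * v for v in x) ** 0.5 < threshold
    return [(i % 2 == 0) or not easy for i in range(n_layers)]


def run_with_plan(x: Vector, layers: List[Layer], planner) -> Tuple[Vector, int]:
    """Run only the layers the planner keeps; bypassed layers pass x through unchanged."""
    plan = planner(x, len(layers))
    executed = 0
    for layer, keep in zip(layers, plan):
        if keep:
            x = layer(x)
            executed += 1
    return x, executed


layers = [add_one_layer] * 6
print(run_with_plan([0], layers, toy_planner))  # ([3], 3): easy input, 3 of 6 layers
print(run_with_plan([2], layers, toy_planner))  # ([8], 6): hard input, all layers
```

The count of executed layers is the quantity that latency savings come from. The planner itself adds a small overhead, which a learned module would need to keep cheap.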

翻訳日:2023-08-06 11:13:10 公開日:2023-07-26

# ユーザの言語がChatGPTの紛争死者数推定に与える影響

How User Language Affects Conflict Fatality Estimates in ChatGPT ( http://arxiv.org/abs/2308.00072v1 )

ライセンス: Link先を確認

Daniel Kazenwadel and Christoph V. Steinert

(参考訳) OpenAIのChatGPT言語モデルは、複雑な問題解決と情報検索のための強力なツールとして人気を集めている。しかし、言語固有の訓練データに存在するバイアスの再現について懸念が生じている。本研究では、イスラエル・パレスチナ紛争とトルコ・クルド紛争の文脈でこの問題に取り組む。GPT-3.5を用い、特定の空爆による死傷者について、前者の紛争ではヘブライ語とアラビア語の両方で、後者ではトルコ語とクルド語の両方で問い合わせる自動クエリ手順を採用した。分析の結果,GPT-3.5は、標的となった集団の言語よりも攻撃側の言語で問い合わせた場合に、死者数の推定値を27±11%低く提示することがわかった。そのような攻撃の存在を否定する回避的な回答はこの食い違いをさらに拡大し、通常の検索エンジンには存在しない新たなバイアスのメカニズムを生み出している。この言語バイアスは、既存のメディアバイアスを増幅して情報バブルに寄与し、最終的には紛争を強化する可能性がある。

OpenAI's ChatGPT language model has gained popularity as a powerful tool for complex problem-solving and information retrieval. However, concerns arise about the reproduction of biases present in the language-specific training data. In this study, we address this issue in the context of the Israeli-Palestinian and Turkish-Kurdish conflicts. Using GPT-3.5, we employed an automated query procedure to inquire about casualties in specific airstrikes, in both Hebrew and Arabic for the former conflict and Turkish and Kurdish for the latter. Our analysis reveals that GPT-3.5 provides 27 ± 11 percent lower fatality estimates when queried in the language of the attacker than in the language of the targeted group. Evasive answers denying the existence of such attacks further increase the discrepancy, creating a novel bias mechanism not present in regular search engines. This language bias has the potential to amplify existing media biases and contribute to information bubbles, ultimately reinforcing conflicts.

翻訳日:2023-08-06 11:12:09 公開日:2023-07-26

# 3:1 Nesting Rules in Redistricting

3:1 Nesting Rules in Redistricting ( http://arxiv.org/abs/2308.00605v1 )

ライセンス: Link先を確認

Christopher Donnay

(参考訳) 立法府の選挙区再編では、ほとんどの州が下院と上院の地図を別々に描いている。オハイオ州とウィスコンシン州は、上院の選挙区を3:1のネスト規則、すなわち隣接する下院の3つの選挙区の組から作るよう求めている。我々は、この要件が再編に与える影響、特に特定の政党が獲得する議席数への影響を調べる。2つのマルコフ連鎖モンテカルロシミュレーションを比較する。1つはReCom連鎖を使ってネスト要件なしで上院地図を生成し、もう1つは3:1ネスト要件のもとで上院地図を生成する新しい連鎖を使用する。さらに、両方の連鎖でオハイオ州の憲法上の郡分割要件を実装している。3:1のネスト規則を課しても、獲得議席の分布への影響は最小限である。一方、オハイオ州の郡分割要件を課すと、この分布は大きく制限される。

In legislative redistricting, most states draw their House and Senate maps separately. Ohio and Wisconsin require that their Senate districts be made with a 3:1 nesting rule, i.e., out of triplets of adjacent House districts. We seek to study the impact of this requirement on redistricting, specifically on the number of seats won by a particular political party. We compare two Markov Chain Monte Carlo simulations, one which uses the ReCom chain to generate Senate maps without a nesting requirement, and the other which uses a novel chain that generates Senate maps with a 3:1 nesting requirement. Moreover, we implement Ohio's constitutional county splitting requirements in both chains. We find that requiring a 3:1 nesting rule has minimal impact on the distribution of seats won. On the other hand, enforcing Ohio's county splitting requirements severely restricts this distribution.
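The 3:1 nesting rule itself is easy to state as a check, even though sampling valid plans (the paper's novel Markov chain) is the hard part and is not reproduced here. The minimal sketch below tests three things: each Senate district consists of exactly three House districts, every House district is used once, and each triplet is contiguous in the House adjacency graph. The six-district example is hypothetical.

```python
from collections import deque


def is_valid_3to1_nesting(house_adj, senate_plan):
    """Check that a Senate plan is a 3:1 nesting of a House plan.

    house_adj: dict mapping each House district to its adjacent House districts.
    senate_plan: list of Senate districts, each a tuple of House district ids.
    """
    # Every House district must be used exactly once.
    used = [h for senate in senate_plan for h in senate]
    if sorted(used) != sorted(house_adj):
        return False
    for senate in senate_plan:
        if len(senate) != 3:
            return False
        # Breadth-first search restricted to the three members: they must be contiguous.
        members = set(senate)
        seen, queue = {senate[0]}, deque([senate[0]])
        while queue:
            h = queue.popleft()
            for nb in house_adj[h]:
                if nb in members and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if seen != members:
            return False
    return True


# Six House districts in a row: 1-2-3-4-5-6.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
print(is_valid_3to1_nesting(adj, [(1, 2, 3), (4, 5, 6)]))  # True
print(is_valid_3to1_nesting(adj, [(1, 2, 4), (3, 5, 6)]))  # False: 4 is not adjacent to 1 or 2
```

A sampler for nested plans would need to preserve exactly this invariant at every step, on top of population balance and the county-splitting rules mentioned in the abstract.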

翻訳日:2023-08-06 11:02:11 公開日:2023-07-26

# OOD-CV-v2: 自然画像における個々のニュアンス要因の分布外シフトに対するロバスト性のための拡張ベンチマーク

OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images ( http://arxiv.org/abs/2304.10266v2 )

ライセンス: Link先を確認

Bingchen Zhao, Jiahao Wang, Wufei Ma, Artur Jesslen, Siwei Yang, Shaozuo Yu, Oliver Zendel, Christian Theobalt, Alan Yuille, Adam Kortylewski

(参考訳) 現実のシナリオにおいて視覚アルゴリズムの堅牢性を高めることは難しい。その理由の一つは、既存の堅牢性ベンチマークが、合成データに依存するか、個々のニュアンス要因の影響を無視しているため、限定的であることだ。我々はOOD-CV-v2を導入する。これは、ポーズ、形状、テクスチャ、コンテキスト、気象条件の観点で10種類の物体カテゴリの分布外サンプルを含むベンチマークデータセットであり、画像分類、物体検出、3次元姿勢推定のモデルのベンチマークを可能にする。この新しいデータセットに加えて、一般的なベースライン手法を用いた広範な実験も提供し、次のことを明らかにする。1) 一部のニュアンス要因は、他の要因に比べて性能にはるかに強い負の影響を与え、その程度は視覚タスクにも依存する。2) 堅牢性を高める現在のアプローチは限定的な効果しか持たず、堅牢性を低下させることさえある。3) 畳み込みアーキテクチャとTransformerアーキテクチャの間に有意な差は見られない。我々のデータセットは堅牢性を研究するための豊富なテストベッドを提供し、この分野の研究の前進に役立つと考えている。データセットは https://bzhao.me/OOD-CV/ からアクセスできる。

Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV-v2, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context and weather conditions, and enables benchmarking of models for image classification, object detection, and 3D pose estimation. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods, which reveal that: 1) Some nuisance factors have a much stronger negative effect on the performance compared to others, also depending on the vision task. 2) Current approaches to enhance robustness have only marginal effects, and can even reduce robustness. 3) We do not observe significant differences between convolutional and transformer architectures. We believe our dataset provides a rich test bed to study robustness and will help push forward research in this area. Our dataset can be accessed from https://bzhao.me/OOD-CV/

翻訳日:2023-07-31 15:50:48 公開日:2023-07-26

# 一般的な拡散ノイズスケジュールとサンプリングステップには欠陥がある

Common Diffusion Noise Schedules and Sample Steps are Flawed ( http://arxiv.org/abs/2305.08891v2 )

ライセンス: Link先を確認

Shanchuan Lin, Bingchen Liu, Jiashi Li, Xiao Yang

(参考訳) 一般的な拡散ノイズスケジュールは、最後の時間ステップの信号対雑音比(SNR)をゼロにすることを強制しておらず、拡散サンプラーの実装の中には最後の時間ステップから開始しないものがある。このような設計には欠陥があり、推論時にモデルには純粋なガウスノイズが与えられるという事実を反映しておらず、訓練と推論の間に不一致を生んでいる。我々は、欠陥のある設計が既存の実装で実際の問題を引き起こすことを示す。Stable Diffusionでは、モデルが中程度の明るさの画像しか生成できないよう厳しく制限され、非常に明るいサンプルや暗いサンプルを生成できない。我々はいくつかの簡単な修正を提案する:(1) ノイズスケジュールを再スケールして終端SNRをゼロにする、(2) v予測でモデルを訓練する、(3) サンプラーを常に最後の時間ステップから開始するよう変更する、(4) 分類器フリーガイダンスを再スケールして露出過多を防ぐ。これらの簡単な変更により、訓練と推論の間で拡散プロセスが一致し、モデルは元のデータ分布により忠実なサンプルを生成できるようになる。

We discover that common diffusion noise schedules do not enforce the last timestep to have zero signal-to-noise ratio (SNR), and some implementations of diffusion samplers do not start from the last timestep. Such designs are flawed and do not reflect the fact that the model is given pure Gaussian noise at inference, creating a discrepancy between training and inference. We show that the flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to only generate images with medium brightness and prevents it from generating very bright and dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure the diffusion process is congruent between training and inference and allow the model to generate samples more faithful to the original data distribution.
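Fix (1) can be sketched in a few lines. Write $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$ and $\mathrm{SNR}(t) = \bar\alpha_t / (1-\bar\alpha_t)$. As we understand the rescaling, $\sqrt{\bar\alpha_t}$ is shifted so that its last value is exactly 0, then scaled so that its first value is unchanged, and the betas are recomputed. This is a minimal pure-Python sketch, not the authors' code. The linear schedule used below is a common DDPM example, not Stable Diffusion's own schedule.

```python
import math


def enforce_zero_terminal_snr(betas):
    """Rescale a beta schedule so the final step has alpha_bar = 0 (zero terminal SNR).

    sqrt(alpha_bar) is shifted so its last value is 0, then scaled so its first
    value is unchanged; the new betas are recovered from the new alpha_bar.
    """
    alpha_bar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bar.append(prod)
    s = [math.sqrt(a) for a in alpha_bar]
    s_first, s_last = s[0], s[-1]
    s = [(v - s_last) * s_first / (s_first - s_last) for v in s]
    alpha_bar = [v * v for v in s]
    alphas = [alpha_bar[0]] + [alpha_bar[t] / alpha_bar[t - 1] for t in range(1, len(alpha_bar))]
    return [1.0 - a for a in alphas]


def terminal_snr(betas):
    """SNR at the last timestep: alpha_bar_T / (1 - alpha_bar_T)."""
    alpha_bar = math.prod(1.0 - b for b in betas)
    return alpha_bar / (1.0 - alpha_bar)


T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # a common linear schedule
new_betas = enforce_zero_terminal_snr(betas)
print(terminal_snr(betas), terminal_snr(new_betas))  # small but nonzero, then 0.0
```

With zero terminal SNR the last step has $\beta_T = 1$, so the model must be trained with a parameterization that stays well defined there. This is why the abstract pairs fix (1) with v prediction.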

翻訳日:2023-07-31 15:42:12 公開日:2023-07-26

# シンクホーン損失を用いたニューラル・シュレーディンガー橋: コロイド自己組織化のデータ駆動型最小努力制御への応用

Neural Schrödinger Bridge with Sinkhorn Losses: Application to Data-driven Minimum Effort Control of Colloidal Self-assembly ( http://arxiv.org/abs/2307.14442v1 )

ライセンス: Link先を確認

Iman Nodozi, Charlie Yan, Mira Khare, Abhishek Halder, Ali Mesbah

(参考訳) コロイド自己組織化の最小努力制御は、秩序パラメータ空間において一般化シュレーディンガー橋問題として自然に定式化できることを示す。これは、1930年代初頭のエルヴィン・シュレーディンガーの研究に端を発する、固定ホライズンの確率的最適制御問題のクラスである。近年、この種の問題は制御と機械学習のコミュニティで研究活動が再燃している。このような問題の理論と計算に関する既存の文献とは異なり、コロイド自己組織化における制御されたドリフト係数と拡散係数は一般に制御に関して非アフィンであり、物理ベースのモデリングから得ることが難しい。このような一般化問題に対する最適性条件を導出し、得られる方程式系が既存の結果とは構造的に大きく異なり、標準的な計算手法がもはや適用できないことを示す。これを動機として、ニューラルネットワークの最近の進歩を取り入れ、このような一般化シュレーディンガー橋問題を解くためのデータ駆動型の学習・制御フレームワーク「ニューラル・シュレーディンガー橋(neural Schrödinger bridge)」を提案する。コロイド自己組織化の数値ケーススタディを用いて、提案フレームワークの有効性を示す。分子動力学シミュレーションデータを用いて、制御されたドリフト係数と拡散係数を2つのニューラルネットワークとして学習し、次にこの2つを用いて、この種の制御問題に特有の分布的終端制約向けに設計したシンクホーン損失によって第3のネットワークを訓練する。

We show that the minimum effort control of colloidal self-assembly can be naturally formulated in the order-parameter space as a generalized Schrödinger bridge problem, a class of fixed-horizon stochastic optimal control problems that originated in the works of Erwin Schrödinger in the early 1930s. In recent years, this class of problems has seen a resurgence of research activities in control and machine learning communities. Different from the existing literature on the theory and computation for such problems, the controlled drift and diffusion coefficients for colloidal self-assembly are typically non-affine in control, and are difficult to obtain from physics-based modeling. We deduce the conditions of optimality for such generalized problems, and show that the resulting system of equations is structurally very different from the existing results in a way that standard computational approaches no longer apply. Thus motivated, we propose a data-driven learning and control framework, named 'neural Schrödinger bridge', to solve such generalized Schrödinger bridge problems by innovating on recent advances in neural networks. We illustrate the effectiveness of the proposed framework using a numerical case study of colloidal self-assembly. We learn the controlled drift and diffusion coefficients as two neural networks using molecular dynamics simulation data, and then use these two to train a third network with Sinkhorn losses designed for distributional endpoint constraints, specific for this class of control problems.
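Sinkhorn losses are built on entropy-regularized optimal transport. A minimal sketch of the underlying Sinkhorn iterations between two discrete distributions follows. The abstract does not give the paper's exact loss (regularization strength, any debiasing, or how endpoint samples enter), so this shows only the generic building block. A smaller `eps` approximates unregularized transport more closely, but it converges more slowly and can underflow. For that reason practical implementations often work in the log domain.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, iters=1000):
    """Entropy-regularized OT between discrete distributions a and b.

    Returns the transport plan and its transport cost sum_ij P_ij * C_ij.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows to match a and columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total


# Two uniform point clouds on a line, squared-distance cost.
x, y = [0.0, 1.0], [1.0, 2.0]
cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
plan, total = sinkhorn([0.5, 0.5], [0.5, 0.5], cost, eps=0.5)
print(total)  # about 1 + 1/(1 + e^2) = 1.1192; the unregularized OT cost is 1.0
```

For this 2×2 case the regularized optimum has a closed form, which makes it a convenient check. As `eps` shrinks, the cost approaches the unregularized value of 1.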

翻訳日:2023-07-31 15:02:51 公開日:2023-07-26

# データセットにおける情報利得に基づくサブグループ発見

Information Gained Subgroup Discovery in Datasets ( http://arxiv.org/abs/2307.15089v1 )

ライセンス: Link先を確認

Daniel Gómez-Bravo, Aaron García, Guillermo Vigueras, Belén Ríos, Mariano Provencio, Alejandro Rodríguez-González

(参考訳) 肺がんは、がんによる死亡の主な原因である。2023年には238,340人以上の新規肺がん患者が見込まれ、死者は127,070人以上と推定されている。正しい治療を選択することは、生存確率を高め、患者の生活の質を改善するための重要な要素である。がん治療は副作用を引き起こす可能性がある。これらの毒性は、患者の生活の質に影響を与える様々な健康問題を引き起こす。したがって、治療効果を維持または改善しつつ毒性を低減することは、臨床的観点から追求すべき重要な目標である。一方、臨床ガイドラインには、臨床医を支援するためのがん治療の推奨に関する一般的な知識が含まれている。ガイドラインはがんの病態の側面と個々の患者の特徴に基づく治療推奨を提供するが、治療結果を考慮した統計分析は提供していない。したがって、臨床ガイドラインと臨床データに見られる治療パターンとを比較することで、発見されたパターンの検証と代替治療パターンの発見が可能になる。本研究では,情報利得とオッズ比を考慮して最も関連性の高いパターンを見つけることを目的としたサブグループ発見アルゴリズムである,情報利得サブグループ発見(Information Gained Subgroup Discovery)を提案する。そこで我々は,患者のデータ、処方された治療、およびその結果を含む肺がん患者の情報からなるデータセットを分析した。得られた結果は臨床医によって検証され、臨床ガイドラインと比較された。この新しいアルゴリズムは、本データセットにおいて発見パターンの受け入れ率が最も高く、サブグループ発見の指標も改善すると結論づける。

Lung cancer is the leading cause of cancer death. More than 238,340 new cases of lung cancer are expected in 2023, with an estimated 127,070 or more deaths. Choosing the correct treatment is an important element in enhancing the probability of survival and improving the patient's quality of life. Cancer treatments might provoke secondary effects. These toxicities cause different health problems that impact the patient's quality of life. Hence, reducing treatment toxicities while maintaining or improving their effectiveness is an important goal to pursue from the clinical perspective. On the other hand, clinical guidelines include general knowledge about cancer treatment recommendations to assist clinicians. Although they provide treatment recommendations based on cancer disease aspects and individual patient features, they do not provide a statistical analysis that takes treatment outcomes into account. Therefore, comparing clinical guidelines with the treatment patterns found in clinical data would allow validating the patterns found, as well as discovering alternative treatment patterns. In this work, we present Information Gained Subgroup Discovery, a Subgroup Discovery algorithm that aims to find the most relevant patterns by taking into account information gain and odds ratio. To this end, we analyze a dataset containing lung cancer patients' information, including patient data, prescribed treatments and their outcomes. The results obtained are validated by clinicians and compared with clinical guidelines. We conclude that this new algorithm achieves the highest acceptance of found patterns in this dataset, while also improving Subgroup Discovery indices.
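The abstract names the two quality criteria but does not say how the algorithm combines them. The minimal sketch below only computes both for one candidate subgroup against a binary outcome. Information gain is the drop in outcome entropy when the population is split into subgroup and rest. The odds ratio is built from the 2×2 table of subgroup membership by outcome. The records are hypothetical toy data, not patient data.

```python
import math


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (bits) of a group with pos/neg outcome counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def subgroup_scores(records, in_subgroup, outcome):
    """Return (information gain, odds ratio) of a subgroup for a binary outcome."""
    a = b = c = d = 0  # a/b: subgroup with/without outcome; c/d: rest with/without
    for r in records:
        if in_subgroup(r):
            if outcome(r):
                a += 1
            else:
                b += 1
        elif outcome(r):
            c += 1
        else:
            d += 1
    n = a + b + c + d
    ig = (entropy(a + c, b + d)
          - (a + b) / n * entropy(a, b)
          - (c + d) / n * entropy(c, d))
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return ig, odds_ratio


# Hypothetical toy records: 10 patients on treatment "A", 10 on treatment "B".
records = ([{"treatment": "A", "good": True}] * 8 + [{"treatment": "A", "good": False}] * 2
           + [{"treatment": "B", "good": True}] * 2 + [{"treatment": "B", "good": False}] * 8)
ig, odds = subgroup_scores(records, lambda r: r["treatment"] == "A", lambda r: r["good"])
print(round(ig, 4), odds)  # 0.2781 16.0
```

The two scores capture different things. Information gain rewards subgroups that explain much of the outcome variation across the whole dataset, while the odds ratio measures the strength of association regardless of subgroup size.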

翻訳日:2023-07-31 14:52:00 公開日:2023-07-26

# 公平な時間変動型電気料金の設計: 学習と最適化の統合アプローチ

Equitable Time-Varying Pricing Tariff Design: A Joint Learning and Optimization Approach ( http://arxiv.org/abs/2307.15088v1 )

ライセンス: Link先を確認

Liudong Chen and Bolun Xu

(参考訳) 時間変動型の料金体系は、消費者に電力需要のシフトとコスト削減を促すが、応答能力が限られた消費者のエネルギー負担を増加させる可能性がある。したがって電力会社は、これらの料金体系を設計する際に、消費者の応答の見込みを考慮して、手頃さと応答インセンティブのバランスをとらなければならない。本稿では,公平な時間変動料金を設計するための、学習に基づく同定と最適化を統合した手法を提案する。提案手法は,過去の価格と需要応答のデータをリカレントニューラルネットワーク(RNN)に符号化し,高次元かつ非線形な消費者の価格応答行動を捉える。次に、RNNを料金設計の最適化に組み込み、2次の目的関数をもつ非線形最適化問題として定式化する。また、高速かつスケーラブルな計算を実現する勾配ベースの解法を提案する。実世界の消費者データを用いたシミュレーションにより、我々の公平な料金体系が低所得の消費者を価格急騰から保護しつつ、消費者のピーク需要削減を効果的に動機づけることを示す。この手法は電力会社の収入回収も保証し、需要応答の不確実性や予測誤差に対して堅牢な性能を達成する。

Time-varying pricing tariffs incentivize consumers to shift their electricity demand and reduce costs, but may increase the energy burden for consumers with limited response capability. The utility must thus balance affordability and response incentives when designing these tariffs by considering consumers' response expectations. This paper proposes a joint learning-based identification and optimization method to design equitable time-varying tariffs. Our proposed method encodes historical prices and demand response data into a recurrent neural network (RNN) to capture high-dimensional and non-linear consumer price response behaviors. We then embed the RNN into the tariff design optimization, formulating a non-linear optimization problem with a quadratic objective. We propose a gradient-based solution method that achieves fast and scalable computation. Simulation using real-world consumer data shows that our equitable tariffs protect low-income consumers from price surges while effectively motivating consumers to reduce peak demand. The method also ensures revenue recovery for the utility company and achieves robust performance against demand response uncertainties and prediction errors.

翻訳日:2023-07-31 14:51:39 公開日:2023-07-26

# ガリウムヒ素による2次元光機械結晶共振器

Two-dimensional optomechanical crystal resonator in gallium arsenide ( http://arxiv.org/abs/2307.15087v1 )

ライセンス: Link先を確認

Rhys G. Povey, Ming-Han Chou, Gustav Andersson, Christopher R. Conner, Joel Grebel, Yash J. Joshi, Jacob M. Miller, Hong Qiao, Xuntao Wu, Haoxiong Yan, Andrew N. Cleland

(参考訳) 量子計算・通信の分野では、マイクロ波エレクトロニクスと赤外光学の間で量子コヒーレントな周波数変換が強く求められている。このための有望なプラットフォームは光機械結晶共振器であり、フォトニック結晶とフォノニック結晶を同時に用いて電磁モードと音響モードを結合する共局在キャビティを生成し、その音響モードは電気機械的相互作用を介してエレクトロニクスへ直接変換できる。この分野の研究の大半は1次元ナノビーム共振器に関するものであり、強い光機械結合を提供するが、その形状のため、動作に必要なレーザーポンピングによって生じる熱を散逸できないという問題がある。近年、準2次元光機械結晶キャビティがシリコンで開発され、同様に強い結合とより良い熱化を示したが、その機械周波数は最適な量子ビット動作周波数より高かった。本研究では、この設計を、電気機械的相互作用を取り込める天然の薄膜単結晶圧電体であるガリウムヒ素に適用し、超伝導量子ビットに理想的な $f_m \approx 4.5$ GHz の機械共振モードを得るとともに、$g_{om}/2\pi \approx 650$ kHz の光機械結合を実証する。

In the field of quantum computation and communication there is a compelling need for quantum-coherent frequency conversion between microwave electronics and infra-red optics. A promising platform for this is an optomechanical crystal resonator that uses simultaneous photonic and phononic crystals to create a co-localized cavity coupling an electromagnetic mode to an acoustic mode, which then via electromechanical interactions can undergo direct transduction to electronics. The majority of work in this area has been on one-dimensional nanobeam resonators which provide strong optomechanical couplings but, due to their geometry, suffer from an inability to dissipate heat produced by the laser pumping required for operation. Recently, a quasi-two-dimensional optomechanical crystal cavity was developed in silicon exhibiting similarly strong coupling with better thermalization, but at a mechanical frequency above optimal qubit operating frequencies. Here we adapt this design to gallium arsenide, a natural thin-film single-crystal piezoelectric that can incorporate electromechanical interactions, obtaining a mechanical resonant mode at $f_m \approx 4.5$ GHz ideal for superconducting qubits, and demonstrating optomechanical coupling $g_{om}/2\pi \approx 650$ kHz.

翻訳日:2023-07-31 14:51:20 公開日:2023-07-26

# 社会デモグラフィーを用いたBCGによる膀胱癌治療の数学的モデリング

Mathematical Modeling of BCG-based Bladder Cancer Treatment Using Socio-Demographics ( http://arxiv.org/abs/2307.15084v1 )

ライセンス: Link先を確認

Elizaveta Savchenko, Ariel Rosenfeld, Svetlana Bunimovich-Mendrazitsky

(参考訳) がんは、毎年何百万人もの新規患者が発生する、世界で最も広く見られる疾患の1つである。膀胱がん(BC)は最も一般的ながんの1つであり、明らかな典型的患者像がなく、あらゆる人に等しく影響する。BCに対する現在の標準治療は、毎週のカルメット・ゲラン桿菌(BCG)免疫療法に基づく治療プロトコルに従うもので、すべての患者に一律に適用される。BCG治療に関連する臨床結果は、免疫系、治療、がん細胞間の相互作用の生物学的・臨床的な複雑さにより、患者間で大きく異なる。本研究では、患者の社会人口統計学的情報を活用し、BCGに基づく治療に関連する臨床動態を記述する個別化された数理モデルを提供する。この目的のために、確立されたBCG治療モデルを採用し、機械学習コンポーネントを統合してモデル内の主要パラメータを時間的に調整・再構成し、個別化を促進する。実際の臨床データを用いて、我々の個別化モデルが、治療終了時のがん細胞数の予測において元のモデルを平均14.8%上回ることを示す。

Cancer is one of the most widespread diseases around the world, with millions of new patients each year. Bladder cancer (BC) is one of the most prevalent types of cancer, affecting all individuals alike with no obvious prototypical patient. The current standard treatment for BC follows a routine weekly Bacillus Calmette-Guérin (BCG) immunotherapy-based therapy protocol, which is applied to all patients alike. The clinical outcomes associated with BCG treatment vary significantly among patients due to the biological and clinical complexity of the interaction between the immune system, treatments, and cancer cells. In this study, we take advantage of the patient's socio-demographics to offer a personalized mathematical model that describes the clinical dynamics associated with BCG-based treatment. To this end, we adopt a well-established BCG treatment model and integrate a machine learning component to temporally adjust and reconfigure key parameters within the model, thus promoting its personalization. Using real clinical data, we show that our personalized model compares favorably with the original one in predicting the number of cancer cells at the end of the treatment, with a 14.8% improvement on average.

翻訳日:2023-07-31 14:50:57 公開日:2023-07-26

# ディープシリアルナンバー:DNN知的財産保護のための計算透かし

Deep Serial Number: Computational Watermarking for DNN Intellectual Property Protection ( http://arxiv.org/abs/2011.08960v3 )

ライセンス: Link先を確認

Ruixiang Tang, Mengnan Du, Xia Hu

(参考訳) 本稿では,ディープニューラルネットワーク(DNN)に特化した簡易かつ効果的な透かしアルゴリズムであるDSN(Deep Serial Number)を提案する。 DNNに識別信号を組み込む従来の手法とは異なり、我々はDNNの知的財産権(IP)保護機構を探索し、敵の盗難ネットワークの使用を効果的に阻止する。従来のソフトウェアIPの保護におけるシリアル番号の成功に触発されて,DNNに埋め込まれたシリアル番号の最初の実装を提案する。これを実現するために、DSNは知識蒸留フレームワークに統合され、個人教師DNNが最初に訓練される。その後、その知識は蒸留され、一連のカスタマイズされた学生DNNに付与される。各顧客DNNは、有効なシリアル番号の入力時にのみ正しく機能する。各種アプリケーションにまたがる実験結果から、元のDNN性能を損なうことなく、DSNが不正使用を防止する効果が示された。さらに実験により、DSNは異なるカテゴリーのウォーターマーク攻撃に耐性があることが示されている。

In this paper, we present DSN (Deep Serial Number), a simple yet effective watermarking algorithm designed specifically for deep neural networks (DNNs). Unlike traditional methods that incorporate identification signals into DNNs, our approach explores a novel Intellectual Property (IP) protection mechanism for DNNs, effectively thwarting adversaries from using stolen networks. Inspired by the success of serial numbers in safeguarding conventional software IP, we propose the first implementation of serial number embedding within DNNs. To achieve this, DSN is integrated into a knowledge distillation framework, in which a private teacher DNN is initially trained. Subsequently, its knowledge is distilled and imparted to a series of customized student DNNs. Each customer DNN functions correctly only upon input of a valid serial number. Experimental results across various applications demonstrate DSN's efficacy in preventing unauthorized usage without compromising the original DNN performance. The experiments further show that DSN is resistant to different categories of watermark attacks.

翻訳日:2023-07-28 21:07:59 公開日:2023-07-26

# データ拡張における線形変換の一般化効果について

On the Generalization Effects of Linear Transformations in Data Augmentation ( http://arxiv.org/abs/2005.00695v3 )

ライセンス: Link先を確認

Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré

(参考訳) データ拡張は、画像やテキストの分類タスクのようなアプリケーションの性能を向上させる強力な技術である。しかし、様々な拡張がなぜ、どのように機能するのかについての厳密な理解はほとんどない。本研究では,線形変換の族を考察し,過剰パラメータ化された線形回帰設定におけるリッジ推定量への影響を調べる。第一に,データのラベルを保存する変換は,訓練データの張る空間を広げることで推定を改善できることを示す。第二に、データを混合する変換は正則化効果をもたらすことで推定を改善できることを示す。最後に,MNISTで理論的知見を検証した。これらの知見に基づき,変換されたデータに対するモデルの不確実性の大きさによって変換空間を探索する拡張手法を提案する。提案手法を画像およびテキストのデータセットで検証する。例えば,CIFAR-100でWide-ResNet-28-10を用いた場合、本手法はランダムサンプリング法を1.24%上回った。さらに、CIFAR-10、CIFAR-100、SVHN、ImageNetデータセットにおいて、最先端のAdversarial AutoAugmentに匹敵する精度を達成する。

Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transformations that preserve the labels of the data can improve estimation by enlarging the span of the training data. Second, we show that transformations that mix data can improve estimation by having a regularization effect. Finally, we validate our theoretical insights on MNIST. Based on these insights, we propose an augmentation scheme that searches over the space of transformations by how uncertain the model is about the transformed data. We validate our proposed scheme on image and text datasets. For example, our method outperforms random sampling methods by 1.24% on CIFAR-100 using Wide-ResNet-28-10. Furthermore, we achieve comparable accuracy to the SoTA Adversarial AutoAugment on CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets.
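The proposed scheme ranks transformations by how uncertain the model is about the transformed data. A minimal sketch of that selection step is below. It assumes predictive entropy as the uncertainty measure, although the authors' measure may differ (for example, the training loss on the transformed sample). The logistic toy model and the four transforms are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Entropy (nats) of a predicted class distribution: higher = less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain_augmentations(x, transforms, predict_proba, k=1):
    """Apply every candidate transform and keep the k the model is least certain about."""
    scored = []
    for name, transform in transforms.items():
        xt = transform(x)
        scored.append((predictive_entropy(predict_proba(xt)), name, xt))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, xt) for _, name, xt in scored[:k]]


def toy_predict_proba(x):
    """Hypothetical 2-class logistic model on a scalar input."""
    p = 1.0 / (1.0 + math.exp(-x))
    return [p, 1.0 - p]


transforms = {
    "identity": lambda x: x,
    "negate": lambda x: -x,
    "shrink": lambda x: 0.1 * x,
    "shift": lambda x: x + 5.0,
}
print(most_uncertain_augmentations(2.0, transforms, toy_predict_proba, k=1))  # [('shrink', 0.2)]
```

Training on the selected samples focuses the augmentation budget on inputs near the decision boundary, rather than on transforms the model already handles confidently.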

翻訳日:2023-07-28 21:07:43 公開日:2023-07-26

# 合成的フェデレーテッド学習: 分布的にロバストな平均化とメタ学習への応用

Compositional federated learning: Applications in distributionally robust averaging and meta learning ( http://arxiv.org/abs/2106.11264v3 )

ライセンス: Link先を確認

Feihu Huang, Junyi Li

(参考訳) 本稿では,分散ロバストflやモデル非依存メタ学習(maml)といった階層構造を持つ多くのデータマイニング問題や機械学習問題で頻繁に発生する新しい構成的フェデレーション学習(fl)フレームワークの解法として,効率的かつ効率的な構成的フェデレーション学習(comfedl)アルゴリズムを提案する。さらに,いくつかの穏やかな条件下でのcomfedlアルゴリズムの収束解析を行い,$t$ が反復数を表す$o(\frac{1}{\sqrt{t}})$ の収束率を達成することを証明した。我々の知る限り、我々の新しいコンポジションFLフレームワークは、コンポジション確率最適化によるフェデレーション学習を橋渡しする最初の試みである。特に、分布的に堅牢なFL(ミニマックス最適化問題)をKL分散正規化を用いて単純な合成最適化問題に変換する。同時に、分布に依存しないMAML問題(ミニマックス最適化問題)も、単純で効果的な合成最適化問題に変換する。最後に、分布的に堅牢なFLとMAMLの2つの機械学習タスクを適用し、アルゴリズムの有効性を実証する。

In this paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many data mining and machine learning problems with a hierarchical structure, such as distributionally robust FL and model-agnostic meta learning (MAML). Moreover, we analyze the convergence of our ComFedL algorithm under some mild conditions and prove that it achieves a convergence rate of $O(\frac{1}{\sqrt{T}})$, where $T$ denotes the number of iterations. To the best of our knowledge, our new Compositional FL framework is the first work to bridge federated learning with compositional stochastic optimization. In particular, we first transform the distributionally robust FL (i.e., a minimax optimization problem) into a simple composition optimization problem by using KL divergence regularization. At the same time, we also first transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple yet effective composition optimization problem. Finally, we apply our algorithm to two popular machine learning tasks, distributionally robust FL and MAML, to demonstrate its effectiveness.
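
A standard identity (the Gibbs variational principle) shows how KL regularization can turn a minimax DRO objective into a composition problem: $\max_{p \in \Delta_n} \sum_i p_i f_i(x) - \lambda\,\mathrm{KL}(p \,\|\, u) = \lambda \log\big(\frac{1}{n}\sum_i e^{f_i(x)/\lambda}\big)$, where $u$ is the uniform distribution. The right-hand side is an outer function $\lambda \log(\cdot)$ applied to an inner average over clients. We assume, without confirming it from the paper, that its transformation takes this form. The numpy check below uses made-up per-client loss values.

```python
import numpy as np

def kl_dro_value(losses, lam):
    # Closed form: lam * log((1/n) * sum_i exp(f_i / lam)), computed stably via log-sum-exp
    z = np.asarray(losses, dtype=float) / lam
    m = z.max()
    return lam * (m + np.log(np.mean(np.exp(z - m))))

def kl_dro_objective(p, losses, lam):
    # Inner objective: sum_i p_i f_i - lam * KL(p || uniform); assumes all p_i > 0
    n = len(losses)
    return float(p @ losses - lam * np.sum(p * np.log(n * p)))

losses = np.array([0.3, 1.2, 0.7, 2.0])   # hypothetical per-client losses f_i(x)
lam = 0.5
w = np.exp(losses / lam)
p_star = w / w.sum()                       # maximizer: Gibbs weights p_i proportional to exp(f_i / lam)
print(kl_dro_objective(p_star, losses, lam), kl_dro_value(losses, lam))
```

Any other distribution on the simplex gives a value no larger than the closed form. This is why the inner maximization can be dropped.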

Translated: 2023-07-28 20:58:35 Published: 2023-07-26

# On the non-efficient PAC learnability of conjunctive queries

arXiv: http://arxiv.org/abs/2208.10255v2

License: see link

Balder ten Cate, Maurice Funk, Jean Christoph Jung, Carsten Lutz

This note serves three purposes: (i) we provide a self-contained exposition of the fact that conjunctive queries are not efficiently learnable in the Probably-Approximately-Correct (PAC) model, paying clear attention to the complicating fact that this concept class lacks the polynomial-size fitting property, a property that is tacitly assumed in much of the computational learning theory literature; (ii) we establish a strong negative PAC learnability result that applies to many restricted classes of conjunctive queries (CQs), including acyclic CQs for a wide range of notions of "acyclicity"; (iii) we show that CQs (and UCQs) are efficiently PAC learnable with membership queries.

Translated: 2023-07-28 20:50:56 Published: 2023-07-26

# Neural Networks for Scalar Input and Functional Output

arXiv: http://arxiv.org/abs/2208.05776v2

License: see link

Sidi Wu, Cédric Beaulac and Jiguo Cao

The regression of a functional response on a set of scalar predictors can be a challenging task, especially if there is a large number of predictors, or the relationship between those predictors and the response is nonlinear. In this work, we propose a solution to this problem: a feed-forward neural network (NN) designed to predict a functional response using scalar inputs. First, we transform the functional response to a finite-dimensional representation and construct an NN that outputs this representation. Then, we propose to modify the output of an NN via the objective function and introduce different objective functions for network training. The proposed models are suited for both regularly and irregularly spaced data, and a roughness penalty can be further applied to control the smoothness of the predicted curve. The difficulty in implementing both of these features lies in the definition of objective functions that can be back-propagated. In our experiments, we demonstrate that our model outperforms the conventional function-on-scalar regression model in multiple scenarios while computationally scaling better with the dimension of the predictors.
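
The abstract names neither the basis nor the penalty. The sketch below is a minimal numpy illustration under our own assumptions: a Fourier basis, and a roughness penalty $\int f''(t)^2\,dt$ approximated by second differences on a grid. It shows how a loss on the network's coefficient output can handle observation times that differ per subject. A real implementation would be written in an autodiff framework so the loss can be back-propagated.

```python
import numpy as np

def fourier_basis(t, K):
    # Columns: 1, sin(2*pi*k*t), cos(2*pi*k*t) for k = 1..K, with t in [0, 1] (assumed basis)
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    return np.stack(cols, axis=1)

def functional_loss(coef, t_obs, y_obs, K, lam, grid_size=200):
    # coef: basis coefficients predicted by the network for one subject.
    # Fit term is evaluated at that subject's own (possibly irregular) observation times.
    fit = np.mean((fourier_basis(t_obs, K) @ coef - y_obs) ** 2)
    # Roughness penalty: integral of f''(t)^2, approximated with second differences on a grid
    grid = np.linspace(0.0, 1.0, grid_size)
    f = fourier_basis(grid, K) @ coef
    h = grid[1] - grid[0]
    f2 = (f[2:] - 2 * f[1:-1] + f[:-2]) / h**2
    return fit + lam * np.sum(f2**2) * h
```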

Translated: 2023-07-28 20:50:43 Published: 2023-07-26

# A lab-based test of the gravitational redshift with a miniature clock network

arXiv: http://arxiv.org/abs/2207.07145v2

License: see link

Xin Zheng, Jonathan Dolde, Matthew C. Cambria, Hong Ming Lim, Shimon Kolkowitz

Einstein's theory of general relativity predicts that a clock at a higher gravitational potential will tick faster than an otherwise identical clock at a lower potential, an effect known as the gravitational redshift. Here we perform a laboratory-based, blinded test of the gravitational redshift using differential clock comparisons within an evenly spaced array of 5 atomic ensembles spanning a height difference of 1 cm. We measure a fractional frequency gradient of $[-12.4\pm0.7_{\rm{(stat)}}\pm2.5_{\rm{(sys)}}]\times10^{-19}/$cm, consistent with the expected redshift gradient of $-10.9\times10^{-19}/$cm. Our results can also be viewed as relativistic gravitational potential difference measurements with sensitivity to mm-scale changes in height on the surface of the Earth. These results highlight the potential of local-oscillator-independent differential clock comparisons for emerging applications of optical atomic clocks including geodesy, searches for new physics, gravitational wave detection, and explorations of the interplay between quantum mechanics and gravity.
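
A back-of-envelope check of the expected gradient magnitude: in the weak-field limit the fractional frequency shift per unit height is $g/c^2$. The sketch assumes standard gravity $g = 9.80665$ m/s² (the lab's local $g$ differs slightly). It also combines the quoted statistical and systematic uncertainties in quadrature, which is our assumption about how to combine them. It gives about $1.09\times10^{-18}$ per cm, i.e. $10.9\times10^{-19}$/cm, and puts the measurement roughly 0.6 combined standard deviations from the prediction.

```python
import math

g = 9.80665          # standard gravity in m/s^2 (assumed; local g in the lab differs slightly)
c = 299_792_458.0    # speed of light in m/s

# Weak-field redshift: fractional frequency shift per unit height is g / c^2
gradient_per_cm = g / c**2 * 0.01
print(f"expected |gradient| = {gradient_per_cm:.3e} per cm")

measured, expected = -12.4e-19, -10.9e-19   # per cm, values quoted in the abstract
stat, syst = 0.7e-19, 2.5e-19
total = math.hypot(stat, syst)               # quadrature sum (our assumption)
deviation = abs(measured - expected) / total
print(f"deviation = {deviation:.2f} sigma")
```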

Translated: 2023-07-28 20:49:51 Published: 2023-07-26

# Runtime Analysis for Permutation-based Evolutionary Algorithms

arXiv: http://arxiv.org/abs/2207.04045v3

License: see link

Benjamin Doerr, Yassine Ghannane, Marouane Ibn Brahim

While the theoretical analysis of evolutionary algorithms (EAs) has made significant progress for pseudo-Boolean optimization problems in the last 25 years, only sporadic theoretical results exist on how EAs solve permutation-based problems. To overcome the lack of permutation-based benchmark problems, we propose a general way to transfer the classic pseudo-Boolean benchmarks into benchmarks defined on sets of permutations. We then conduct a rigorous runtime analysis of the permutation-based $(1+1)$ EA proposed by Scharnow, Tinnefeld, and Wegener (2004) on the analogues of the LeadingOnes and Jump benchmarks. The latter shows that, unlike for bit-strings, it is not only the Hamming distance that determines how difficult it is to mutate a permutation $\sigma$ into another permutation $\tau$, but also the precise cycle structure of $\sigma \tau^{-1}$. For this reason, we also consider the more symmetric scramble mutation operator. We observe that it not only leads to simpler proofs but also reduces the runtime on jump functions with odd jump size by a factor of $\Theta(n)$. Finally, we show that a heavy-tailed version of the scramble operator, as in the bit-string case, leads to a speed-up of order $m^{\Theta(m)}$ on jump functions with jump size $m$. A short empirical analysis confirms these findings but also reveals that small implementation details, such as the rate of void mutations, can make an important difference.
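
To illustrate why Hamming distance alone is not enough, the plain-Python sketch below builds two permutations that both differ from $\tau$ in all four positions but whose $\sigma\tau^{-1}$ have different cycle types. The helper names are our own. A standard fact from permutation theory applies here: a permutation of $n$ elements with $c$ cycles needs exactly $n - c$ swaps to undo. So the $[2,2]$ case needs two swaps and the 4-cycle needs three.

```python
def compose(a, b):
    # (a o b)(i) = a[b[i]]
    return [a[b[i]] for i in range(len(b))]

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return inv

def cycle_type(p):
    # Cycle lengths of permutation p, sorted in descending order
    seen = [False] * len(p)
    lengths = []
    for start in range(len(p)):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = p[j]
                length += 1
            lengths.append(length)
    return sorted(lengths, reverse=True)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

tau = [0, 1, 2, 3]
sigma = [1, 0, 3, 2]   # sigma * tau^-1 consists of two transpositions
rho = [1, 2, 3, 0]     # rho * tau^-1 is a single 4-cycle
for p in (sigma, rho):
    print(hamming(p, tau), cycle_type(compose(p, inverse(tau))))
```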

Translated: 2023-07-28 20:49:35 Published: 2023-07-26

# Visual Pre-training for Navigation: What Can We Learn from Noise?

arXiv: http://arxiv.org/abs/2207.00052v3

License: see link

Yanwei Wang, Ching-Yun Ko, Pulkit Agrawal

One powerful paradigm in visual navigation is to predict actions directly from observations. Training such an end-to-end system allows representations useful for downstream tasks to emerge automatically. However, the lack of inductive bias makes such a system data-inefficient. We hypothesize that a sufficient representation of the current view and the goal view for a navigation policy can be learned by predicting the location and size of a crop of the current view that corresponds to the goal. We further show that training such random crop prediction in a self-supervised fashion purely on synthetic noise images transfers well to natural home images. The learned representation can then be bootstrapped to learn a navigation policy efficiently with little interaction data. The code is available at https://yanweiw.github.io/noise2ptz

Translated: 2023-07-28 20:49:09 Published: 2023-07-26

# Quantum Locally Testable Code with Constant Soundness

arXiv: http://arxiv.org/abs/2209.11405v2

License: see link

Andrew Cross, Zhiyang He, Anand Natarajan, Mario Szegedy, Guanyu Zhu

In this paper, we present two constructions of quantum locally testable codes (QLTC) with constant soundness. In the first approach, we introduce an operation called check product, and show how this operation gives rise to QLTCs of constant soundness, constant rate, and distance scaling with locality. In the second approach, we consider hypergraph product of a quantum code and a classical repetition code, and observe a special case in which the soundness of component codes is preserved. This insight leads us to construct QLTCs of constant soundness, scalable rate and distance, and constant average locality. Our work marks a step towards constructing QLTCs of high soundness and distance, which would give a different construction to the No Low-Energy Trivial States (NLTS) theorem.
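
For readers unfamiliar with the hypergraph product, here is a minimal numpy sketch of the standard classical-classical version (Tillich-Zémor). It builds $H_X = [H_1 \otimes I,\ I \otimes H_2^T]$ and $H_Z = [I \otimes H_2,\ H_1^T \otimes I]$ and checks that the X and Z checks commute modulo 2. The paper takes the product of a quantum code with a classical repetition code, a generalization not shown here. The product of two length-3 repetition codes gives a 13-qubit code, the small planar surface code.

```python
import numpy as np

def repetition_code_checks(n):
    # (n-1) x n parity-check matrix: row i checks bits i and i+1
    H = np.zeros((n - 1, n), dtype=np.uint8)
    for i in range(n - 1):
        H[i, i] = H[i, i + 1] = 1
    return H

def hypergraph_product(H1, H2):
    # Classical hypergraph product: returns (HX, HZ) over GF(2)
    m1, n1 = H1.shape
    m2, n2 = H2.shape
    HX = np.hstack([np.kron(H1, np.eye(n2, dtype=np.uint8)),
                    np.kron(np.eye(m1, dtype=np.uint8), H2.T)]) % 2
    HZ = np.hstack([np.kron(np.eye(n1, dtype=np.uint8), H2),
                    np.kron(H1.T, np.eye(m2, dtype=np.uint8))]) % 2
    return HX, HZ

H = repetition_code_checks(3)
HX, HZ = hypergraph_product(H, H)
# Qubit count is n1*n2 + m1*m2 = 9 + 4 = 13; X and Z checks must commute mod 2
print(HX.shape, HZ.shape, ((HX.astype(int) @ HZ.T.astype(int)) % 2).any())
```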

Translated: 2023-07-28 20:40:58 Published: 2023-07-26

# Nonsmooth Nonconvex-Nonconcave Minimax Optimization: Primal-Dual Balancing and Iteration Complexity Analysis

arXiv: http://arxiv.org/abs/2209.10825v3

License: see link

Jiajin Li, Linglingzhi Zhu and Anthony Man-Cho So

Nonconvex-nonconcave minimax optimization has gained widespread interest over the last decade. However, most existing works focus on variants of gradient descent-ascent (GDA) algorithms, which are only applicable to smooth nonconvex-concave settings. To address this limitation, we propose a novel algorithm named smoothed proximal linear descent-ascent (smoothed PLDA), which can effectively handle a broad range of structured nonsmooth nonconvex-nonconcave minimax problems. Specifically, we consider the setting where the primal function has a nonsmooth composite structure and the dual function possesses the Kurdyka-Lojasiewicz (KL) property with exponent $\theta \in [0,1)$. We introduce a novel convergence analysis framework for smoothed PLDA, the key components of which are our newly developed nonsmooth primal error bound and dual error bound. Using this framework, we show that smoothed PLDA can find both $\epsilon$-game-stationary points and $\epsilon$-optimization-stationary points of the problems of interest in $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ iterations. Furthermore, when $\theta \in [0,\frac{1}{2}]$, smoothed PLDA achieves the optimal iteration complexity of $\mathcal{O}(\epsilon^{-2})$. To further demonstrate the effectiveness and wide applicability of our analysis framework, we show that certain max-structured problems possess the KL property with exponent $\theta=0$ under mild assumptions. As a by-product, we establish algorithm-independent quantitative relationships among various stationarity concepts, which may be of independent interest.
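
Reading off the iteration bound for two values of the KL exponent shows where the optimal rate comes from. The value $\theta = \frac{3}{4}$ is our own illustrative choice. At $\epsilon = 10^{-2}$, and ignoring constants, this is the difference between roughly $10^4$ and $10^6$ iterations.

```latex
\begin{aligned}
N(\epsilon) &= \mathcal{O}\big(\epsilon^{-2\max\{2\theta,1\}}\big), \\
\theta \in [0,\tfrac{1}{2}] &\;\Rightarrow\; 2\theta \le 1 \;\Rightarrow\; N(\epsilon) = \mathcal{O}(\epsilon^{-2}), \\
\theta = \tfrac{3}{4} &\;\Rightarrow\; \max\{\tfrac{3}{2}, 1\} = \tfrac{3}{2} \;\Rightarrow\; N(\epsilon) = \mathcal{O}(\epsilon^{-3}).
\end{aligned}
```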

Translated: 2023-07-28 20:40:43 Published: 2023-07-26

# A Deep Perceptual Measure for Lens and Camera Calibration

arXiv: http://arxiv.org/abs/2208.12300v2

License: see link

Yannick Hold-Geoffroy, Dominique Piché-Meunier, Kalyan Sunkavalli, Jean-Charles Bazin, François Rameau and Jean-François Lalonde

Image editing and compositing have become ubiquitous in entertainment, from digital art to AR and VR experiences. To produce beautiful composites, the camera needs to be geometrically calibrated, which can be tedious and requires a physical calibration target. In place of the traditional multi-image calibration process, we propose to infer the camera calibration parameters such as pitch, roll, field of view, and lens distortion directly from a single image using a deep convolutional neural network. We train this network using automatically generated samples from a large-scale panorama dataset, yielding competitive accuracy in terms of standard $\ell_2$ error. However, we argue that minimizing such standard error metrics might not be optimal for many applications. In this work, we investigate human sensitivity to inaccuracies in geometric camera calibration. To this end, we conduct a large-scale human perception study where we ask participants to judge the realism of 3D objects composited with correct and biased camera calibration parameters. Based on this study, we develop a new perceptual measure for camera calibration and demonstrate that our deep calibration network outperforms previous single-image based calibration methods both on standard metrics as well as on this novel perceptual measure. Finally, we demonstrate the use of our calibration network for several applications, including virtual object insertion, image retrieval, and compositing. A demonstration of our approach is available at https://lvsn.github.io/deepcalib .

Translated: 2023-07-28 20:39:20 Published: 2023-07-26

# Learning Task Automata for Reinforcement Learning using Hidden Markov Models

arXiv: http://arxiv.org/abs/2208.11838v3

License: see link

Alessandro Abate (1), Yousif Almulla (1), James Fox (1), David Hyland (1), Michael Wooldridge (1) ((1) University of Oxford)

Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification, especially when the environment's dynamics are only partially known. This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state `task automata' from episodes of agent experience within unknown environments. We leverage two key algorithmic insights. First, we learn a product MDP, a model composed of the specification's automaton and the environment's MDP (both initially unknown), by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models. Second, we propose a novel method for distilling the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy. It also provides an interpretable encoding of high-level environmental and task features, so a human can readily verify that the agent has learnt coherent tasks with no misspecifications. In addition, we take steps towards ensuring that the learnt automaton is environment-agnostic, making it well-suited for use in transfer learning. Finally, we provide experimental results compared with two baselines to illustrate our algorithm's performance in different environments and tasks.
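
The Baum-Welch algorithm mentioned above is an EM procedure whose E-step is built on the forward-backward recursions. Below is a minimal numpy sketch of the scaled forward pass only, on a toy two-state HMM with made-up parameters. It is checked against brute-force enumeration of all hidden paths. This is not the paper's product-MDP learning pipeline.

```python
import itertools
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    # Scaled forward pass: log P(obs) for an HMM with initial distribution pi,
    # transitions A[i, j] = P(j | i) and emissions B[i, k] = P(symbol k | state i).
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()            # equals P(o_t | o_1..o_{t-1})
        log_lik += np.log(s)
        alpha = alpha / s
    return log_lik

def brute_force_likelihood(pi, A, B, obs):
    # Sum over every hidden-state path (only feasible for tiny examples)
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
obs = [0, 1, 1, 0, 1]
print(np.exp(forward_log_likelihood(pi, A, B, obs)), brute_force_likelihood(pi, A, B, obs))
```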

Translated: 2023-07-28 20:38:54 Published: 2023-07-26

# Reasons for the Superiority of Stochastic Estimators over Deterministic Ones: Robustness, Consistency and Perceptual Quality

arXiv: http://arxiv.org/abs/2211.08944v3

License: see link

Guy Ohayon, Theo Adrai, Michael Elad, Tomer Michaeli

Stochastic restoration algorithms allow one to explore the space of solutions that correspond to the degraded input. In this paper, we reveal additional fundamental advantages of stochastic methods over deterministic ones, which further motivate their use. First, we prove that any restoration algorithm that attains perfect perceptual quality and whose outputs are consistent with the input must be a posterior sampler, and is thus required to be stochastic. Second, we illustrate that while deterministic restoration algorithms may attain high perceptual quality, this can be achieved only by filling up the space of all possible source images using an extremely sensitive mapping, which makes them highly vulnerable to adversarial attacks. Indeed, we show that forcing deterministic models to be robust to such attacks profoundly hinders their perceptual quality, while robustifying stochastic models hardly influences their perceptual quality and improves their output variability. These findings provide a motivation to foster progress in stochastic restoration methods, paving the way to better recovery algorithms.
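
A toy illustration of the first claim, built by us and not taken from the paper. The source is either -1 or +1 and is observed in Gaussian noise, so the posterior is bimodal. The deterministic MMSE estimate $\tanh(y/\sigma^2)$ always lands strictly between the two modes, i.e. off the support of the source distribution. A posterior sampler outputs only valid values. The price is higher MSE: a standard result is that the posterior sampler's MSE is twice the MMSE.

```python
import numpy as np

sigma = 1.5   # noise level (arbitrary choice)

def posterior_plus(y):
    # P(x = +1 | y) for equal priors on {-1, +1} and Gaussian noise N(0, sigma^2)
    return 1.0 / (1.0 + np.exp(-2.0 * y / sigma**2))

def mmse_estimate(y):
    # Deterministic estimator: posterior mean E[x | y] = tanh(y / sigma^2)
    return 2.0 * posterior_plus(y) - 1.0

def posterior_sample(y, rng):
    # Stochastic estimator: draw x ~ p(x | y)
    return np.where(rng.random(np.shape(y)) < posterior_plus(y), 1.0, -1.0)

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=10_000)
y = x + sigma * rng.normal(size=x.shape)
x_det = mmse_estimate(y)
x_sto = posterior_sample(y, rng)
print(np.mean((x_det - x) ** 2), np.mean((x_sto - x) ** 2))
```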

Translated: 2023-07-28 20:28:35 Published: 2023-07-26

# Bohm - de Broglie Cycles

arXiv: http://arxiv.org/abs/2301.13251v2

License: see link

Olivier Piguet

In the de Broglie-Bohm quantum theory, particles follow trajectories determined by the flux associated with their wave function. These trajectories are studied here for relativistic spin-one-half particles. Based on explicit numerical calculations for the case of a massless particle in three-dimensional space-time, it is shown that if the wave function is an eigenfunction of the total angular momentum, the trajectories begin as circles of slowly increasing radius until a transition time, after which they tend to follow straight lines. Arrival times at a detector, as well as their probability distribution, are also calculated. The chosen energy and momentum parameters are of the orders of magnitude met in graphene physics.

Translated: 2023-07-28 20:20:39 Published: 2023-07-26

# Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver

arXiv: http://arxiv.org/abs/2301.01913v2

License: see link

Tom Marty, Tristan François, Pierre Tessier, Louis Gauthier, Louis-Martin Rousseau, Quentin Cappart

Constraint programming is known for being an efficient approach to solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, this is still an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are scarcer. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This is achieved by combining a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture. Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework, while remaining generic, is able to find better solutions close to optimality without requiring a large number of backtracks.
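
To show where a value-selection heuristic plugs into a search, here is a minimal sketch. It is not a CP solver and not the paper's architecture: it is plain depth-first backtracking for graph coloring with a `value_order` hook and a backtrack counter. The baseline `smallest_value_first` is hypothetical. A learned policy such as the paper's deep Q-network (not shown) would instead rank the feasible values by predicted Q-value.

```python
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]
# A value-ordering heuristic receives (variable, feasible values, partial assignment)
ValueOrder = Callable[[int, List[int], Dict[int, int]], List[int]]

def solve_coloring(graph: Graph, k: int,
                   value_order: ValueOrder) -> Tuple[Optional[Dict[int, int]], int]:
    # Depth-first search; returns (coloring or None, number of failed nodes)
    assignment: Dict[int, int] = {}
    backtracks = 0
    nodes = sorted(graph, key=lambda v: -len(graph[v]))  # static variable order: max degree first

    def dfs(i: int) -> bool:
        nonlocal backtracks
        if i == len(nodes):
            return True
        v = nodes[i]
        # Forward check: keep only colors not used by already-colored neighbors
        domain = [c for c in range(k) if all(assignment.get(u) != c for u in graph[v])]
        for c in value_order(v, domain, assignment):
            assignment[v] = c
            if dfs(i + 1):
                return True
            del assignment[v]
        backtracks += 1
        return False

    found = dfs(0)
    return (dict(assignment) if found else None), backtracks

def smallest_value_first(v: int, domain: List[int], assignment: Dict[int, int]) -> List[int]:
    # Hypothetical baseline heuristic; a learned policy would reorder `domain` instead
    return sorted(domain)
```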

Translated: 2023-07-28 20:19:43 Published: 2023-07-26

# DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation

arXiv: http://arxiv.org/abs/2212.13504v3

License: see link

Reza Azad, René Arimond, Ehsan Khodapanah Aghdam, Amirhossein Kazerouni, Dorit Merhof

Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, which is the core part of the Transformer model, usually suffers from quadratic computational complexity with respect to the number of tokens. Many architectures attempt to reduce model complexity by limiting the self-attention mechanism to local regions or by redesigning the tokenization process. In this paper, we propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism. More specifically, we reformulate the self-attention mechanism to capture both spatial and channel relations across the whole feature dimension while staying computationally efficient. Furthermore, we redesign the skip connection path by including a cross-attention module to ensure feature reusability and enhance localization power. Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights. The code is publicly available at https://github.com/mindflow-institue/DAEFormer.
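
The abstract does not spell out DAE-Former's attention modules. As an assumption, here is one well-known way to avoid quadratic cost in the number of tokens: the "efficient attention" factorization of Shen et al. It normalizes queries over features and keys over tokens, then forms the small $d_k \times d_v$ context $K^T V$ before multiplying by $Q$. This makes time and memory linear in the token count $n$. By associativity it matches the explicit $n \times n$ computation.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def efficient_attention(Q, K, V):
    # Linear-complexity attention: never materializes the n x n attention matrix
    q = softmax(Q, axis=1)      # (n, d_k): each token's query normalized over features
    k = softmax(K, axis=0)      # (n, d_k): each key feature normalized over tokens
    context = k.T @ V           # (d_k, d_v): cost O(n * d_k * d_v)
    return q @ context          # (n, d_v)

rng = np.random.default_rng(0)
n, d_k, d_v = 1024, 32, 32
Q, K, V = (rng.normal(size=(n, d)) for d in (d_k, d_k, d_v))
out = efficient_attention(Q, K, V)
```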

Translated: 2023-07-28 20:19:25 Published: 2023-07-26

# Quark pair angular correlations in the proton: entropy versus entanglement negativity

arXiv: http://arxiv.org/abs/2303.07408v2

License: see link

Adrian Dumitru and Eric Kolbusz

Two-particle correlations in the proton on the light-front are described by a mixed density matrix obtained by tracing over all other, unobserved, degrees of freedom. We quantify genuinely quantum quark azimuthal correlations in terms of the entanglement negativity measure of Quantum Information Theory. While the two-quark state in color space is one of high entropy and weak quantum correlation, we find that a standard three-quark model wave function from the literature predicts an azimuthally correlated state of low entropy and high entanglement negativity. Low entropy is consistent with expectations for many colors (at fixed 't Hooft coupling $g^2 N_c$) but high negativity indicates substantial two-particle quantum correlations at $N_c=3$. Suppressing quantum correlations associated with entanglement negativity strongly modifies quark pair azimuthal moments $\langle \zeta^n \rangle$, $\zeta = \exp(i (\phi_1-\phi_2))$, intrinsic to the proton state. We also describe how to account for the leading ${\cal O}(g^2)$ correction to the density matrix from light-cone perturbation theory which is due to the presence (or exchange) of a gluon in the proton. This correction increases the entropy and reduces the negativity of the density matrix for quark pair azimuthal correlations. Hence, the entanglement negativity measure may provide novel insight into the structure of the proton state of QCD.
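
To show that entropy and negativity measure different things, here is a generic two-qubit illustration, unrelated to the paper's quark wave functions. Negativity is $\mathcal{N}(\rho) = (\|\rho^{T_B}\|_1 - 1)/2$, i.e. the sum of the magnitudes of the negative eigenvalues of the partial transpose. A Bell state has zero entropy and negativity 1/2, the maximum for two qubits. The maximally mixed state has maximal entropy (2 bits) and zero negativity.

```python
import numpy as np

def partial_transpose_b(rho, d_a, d_b):
    # Transpose the second subsystem: rho[(i,j),(k,l)] -> rho[(i,l),(k,j)]
    r = rho.reshape(d_a, d_b, d_a, d_b)
    return r.transpose(0, 3, 2, 1).reshape(d_a * d_b, d_a * d_b)

def negativity(rho, d_a, d_b):
    # Sum of |negative eigenvalues| of the partial transpose
    eig = np.linalg.eigvalsh(partial_transpose_b(rho, d_a, d_b))
    return float(-eig[eig < 0].sum())

def von_neumann_entropy_bits(rho):
    eig = np.linalg.eigvalsh(rho)
    eig = eig[eig > 1e-12]       # drop zero (and round-off) eigenvalues
    return float(-(eig * np.log2(eig)).sum())

bell = np.zeros(4)
bell[[0, 3]] = 1 / np.sqrt(2)    # (|00> + |11>) / sqrt(2)
rho_bell = np.outer(bell, bell)
rho_mixed = np.eye(4) / 4
for rho in (rho_bell, rho_mixed):
    print(von_neumann_entropy_bits(rho), negativity(rho, 2, 2))
```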

Translated: 2023-07-28 20:11:27 Published: 2023-07-26

# Spectral learning of Bernoulli linear dynamical systems models

arXiv: http://arxiv.org/abs/2303.02060v2

License: see link

Iris R. Stone, Yotam Sagiv, Il Memming Park, Jonathan W. Pillow

Latent linear dynamical systems with Bernoulli observations provide a powerful modeling framework for identifying the temporal dynamics underlying binary time series data, which arise in a variety of contexts such as binary decision-making and discrete stochastic processes (e.g., binned neural spike trains). Here we develop a spectral learning method for fast, efficient fitting of probit-Bernoulli latent linear dynamical system (LDS) models. Our approach extends traditional subspace identification methods to the Bernoulli setting via a transformation of the first and second sample moments. This results in a robust, fixed-cost estimator that avoids the hazards of local optima and the long computation time of iterative fitting procedures like the expectation-maximization (EM) algorithm. In regimes where data is limited or assumptions about the statistical structure of the data are not met, we demonstrate that the spectral estimate provides a good initialization for Laplace-EM fitting. Finally, we show that the estimator provides substantial benefits in real-world settings by analyzing data from mice performing a sensory decision-making task.
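
For concreteness, here is a minimal numpy sketch of the generative model being fit. The spectral fitting procedure itself is not shown. Thresholding a latent Gaussian, $y_t = \mathbf{1}[C x_t + d + e_t > 0]$ with $e_t \sim N(0, I)$, is an equivalent way to write the probit link $P(y_t = 1 \mid x_t) = \Phi(C x_t + d)$. All parameter values below, and the initial state $x_0 = 0$, are arbitrary assumptions.

```python
import numpy as np

def simulate_probit_bernoulli_lds(A, Q_chol, C, d, T, rng):
    # Latent dynamics: x_{t+1} = A x_t + w_t, with w_t ~ N(0, Q) and Q = Q_chol Q_chol^T
    # Observations:   y_t = 1[C x_t + d + e_t > 0], with e_t ~ N(0, I) (probit link)
    n_latent, n_obs = A.shape[0], C.shape[0]
    x = np.zeros((T, n_latent))                 # x_0 = 0 (assumed)
    for t in range(1, T):
        x[t] = A @ x[t - 1] + Q_chol @ rng.normal(size=n_latent)
    y = (x @ C.T + d + rng.normal(size=(T, n_obs)) > 0).astype(int)
    return x, y

rng = np.random.default_rng(1)
theta = 0.1
A = 0.98 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])   # slowly decaying rotation
Q_chol = 0.1 * np.eye(2)
C = rng.normal(size=(5, 2))
d = np.zeros(5)
x, y = simulate_probit_bernoulli_lds(A, Q_chol, C, d, T=500, rng=rng)
```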

Translated: 2023-07-28 20:10:54 Published: 2023-07-26

# Atmospheric Turbulence Correction via Variational Deep Diffusion

arXiv: http://arxiv.org/abs/2305.05077v2

License: see link

Xijun Wang, Santiago López-Tapia, Aggelos K. Katsaggelos

Atmospheric Turbulence (AT) correction is a challenging restoration task because it involves two distortions: geometric distortion and spatially variant blur. Diffusion models have shown impressive results in photo-realistic image synthesis and beyond. In this paper, we propose a novel deep conditional diffusion model under a variational inference framework to solve the AT correction problem. We use this framework to improve performance by learning latent prior information from the input and the degradation process. We use the learned information to further condition the diffusion model. Experiments are conducted on a comprehensive synthetic AT dataset. We show that the proposed framework achieves good quantitative and qualitative results.
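
As background, here is the generic DDPM forward (noising) process that conditional diffusion models build on. The abstract does not describe the paper's schedule or its variational conditioning on learned latent priors, and neither is shown here. The linear $\beta$ schedule is a common default and our assumption. The closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ lets any noise level be sampled in one step.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Common default schedule (assumed, not taken from the paper)
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, rng):
    # Closed-form forward step: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

T = 1000
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(64, 64))     # stand-in for a clean image scaled to [-1, 1]
x_early, _ = q_sample(x0, 0, alpha_bar, rng)
x_late, _ = q_sample(x0, T - 1, alpha_bar, rng)
```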

Translated: 2023-07-28 20:02:39 Published: 2023-07-26

# Video-Specific Query-Key Attention Modeling for Weakly-Supervised Temporal Action Localization

arXiv: http://arxiv.org/abs/2305.04186v2

License: see link

Xijun Wang, Aggelos K. Katsaggelos

Weakly-supervised temporal action localization aims to identify and localize the action instances in the untrimmed videos with only video-level action labels. When humans watch videos, we can adapt our abstract-level knowledge about actions in different video scenarios and detect whether some actions are occurring. In this paper, we mimic what humans do and bring a new perspective for locating and identifying multiple actions in a video. We propose a network named VQK-Net with a video-specific query-key attention modeling that learns a unique query for each action category of each input video. The learned queries not only contain the actions' knowledge features at the abstract level but also have the ability to fit this knowledge into the target video scenario, and they will be used to detect the presence of the corresponding action along the temporal dimension. To better learn these action category queries, we exploit not only the features of the current input video but also the correlation between different videos through a novel video-specific action category query learner trained with a query similarity loss. Finally, we conduct extensive experiments on three commonly used datasets (THUMOS14, ActivityNet1.2, and ActivityNet1.3) and achieve state-of-the-art performance.

翻訳日:2023-07-28 20:02:31 公開日:2023-07-26
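As a rough illustration of the query-key idea in the VQK-Net abstract, the sketch below scores each temporal snippet against one query per action class with scaled dot-product attention, then groups above-threshold snippets into segments. The per-video queries are simply passed in here (an assumption); the paper's query learner, similarity loss, and feature backbone are not reproduced, and `temporal_class_activation` and `localize` are hypothetical helpers.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_class_activation(queries, keys):
    """queries: C vectors (one query per action class for this video).
    keys: T vectors (one key per temporal snippet), all of dimension d.
    Returns (scores, attn): C x T scaled dot-product scores and their softmax over time."""
    d = len(keys[0])
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        for q in queries
    ]
    attn = [softmax(row) for row in scores]
    return scores, attn


def localize(scores, threshold):
    """Group consecutive time steps whose score exceeds threshold into inclusive (start, end) segments."""
    segments, start = [], None
    for t, s in enumerate(scores):
        if s > threshold and start is None:
            start = t
        elif s <= threshold and start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))
    return segments


# Toy example: 2 action classes, 4 snippets with 2-D features.
scores, attn = temporal_class_activation(
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]],
)
print(localize(scores[0], 0.5))  # prints [(2, 2)]
```

Thresholding raw scores rather than the softmax keeps segment extraction independent of video length; a real model would learn where to set this threshold.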

# 自然発話からの認知症認識のための、最適輸送ドメイン適応とマルチモーダル融合手法を組み合わせた文脈認識型注意層

Context-aware attention layers coupled with optimal transport domain adaptation and multimodal fusion methods for recognizing dementia from spontaneous speech ( http://arxiv.org/abs/2305.16406v2 )

ライセンス: Link先を確認

Loukas Ilias, Dimitris Askounis

(参考訳) アルツハイマー病(AD)は複雑な神経認知疾患であり、認知症の主な原因である。自発話による認知症診断を目指す研究は数多く提案されているが、依然として限界がある。マルチモーダル手法を提案する既存の最先端アプローチは、言語モデルと音響モデルを個別に訓練し、多数決の手法を採用し、異なるモダリティの表現を入力レベル(すなわち早期融合)または訓練中に連結する。また、その一部は文脈情報を考慮せずに表現間の依存関係を計算する自己注意層を用いている。さらに、モデルのキャリブレーションを考慮した先行研究はない。これらの制約に対処するため、モダリティ内およびモダリティ間の相互作用を捉える、AD患者検出のための新しい手法を提案する。まず、音声ファイルをlog-Melスペクトログラム、そのデルタ、デルタデルタに変換し、3チャネルからなる画像を音声ファイルごとに作成する。次に、各書き起こしテキストと画像をそれぞれBERTモデルとDeiTモデルに入力する。その後、文脈ベースの自己注意層、ゲートモデル付き自己注意層、および最適輸送ドメイン適応手法を用いて、モダリティ内およびモダリティ間の相互作用を捉える。最後に、自己注意特徴と交差注意特徴を融合する2つの手法を用いる。モデルのキャリブレーションを考慮するため、ラベル平滑化を適用する。性能指標とキャリブレーション指標の両方を用いる。ADReSSおよびADReSSo Challengeのデータセットで行った実験は、既存の研究に対する提案手法の有効性を示しており、最良のモデルは精度とF1スコアでそれぞれ最大91.25%と91.06%に達した。

Alzheimer's disease (AD) is a complex neurocognitive disease and the main cause of dementia. Although many studies have targeted the diagnosis of dementia through spontaneous speech, limitations remain. Existing state-of-the-art approaches, which propose multimodal methods, train language and acoustic models separately, employ majority-vote approaches, and concatenate the representations of the different modalities either at the input level, i.e., early fusion, or during training. Also, some of them employ self-attention layers, which calculate the dependencies between representations without considering contextual information. In addition, no prior work has taken model calibration into consideration. To address these limitations, we propose some new methods for detecting AD patients, which capture the intra- and cross-modal interactions. First, we convert the audio files into log-Mel spectrograms, their delta, and delta-delta, and in this way create an image per audio file consisting of three channels. Next, we pass each transcript and image through BERT and DeiT models respectively. After that, context-based self-attention layers, self-attention layers with a gate model, and optimal transport domain adaptation methods are employed for capturing the intra- and inter-modal interactions. Finally, we exploit two methods for fusing the self- and cross-attention features. To take model calibration into account, we apply label smoothing. We use both performance and calibration metrics. Experiments conducted on the ADReSS and ADReSSo Challenge datasets indicate the efficacy of our introduced approaches over existing research initiatives, with our best-performing model reaching an Accuracy and F1-score of up to 91.25% and 91.06%, respectively.

翻訳日:2023-07-28 19:51:36 公開日:2023-07-26
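The abstract turns each recording into a three-channel image (log-Mel, delta, delta-delta) and uses label smoothing for calibration. A minimal sketch of both steps, assuming the log-Mel frames are already computed: the delta uses the common regression formula d_t = Σ n (c_{t+n} - c_{t-n}) / (2 Σ n²) with window width 2 and edge replication, and label smoothing uses the uniform form with ε = 0.1. The window width and ε are assumptions; the abstract does not state them.

```python
def delta(frames, width=2):
    """Regression-based delta over time for a T x F list of frames.
    Edge frames are replicated so the output has the same shape as the input."""
    T, F = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(T):
        row = []
        for f in range(F):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, T - 1)][f]
                prv = frames[max(t - n, 0)][f]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def three_channel_image(log_mel):
    """Stack log-Mel, delta and delta-delta into a 3 x T x F 'image'."""
    d1 = delta(log_mel)
    d2 = delta(d1)
    return [log_mel, d1, d2]


def smooth_labels(label, num_classes, eps=0.1):
    """Uniform label smoothing: (1 - eps) * one_hot + eps / K."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
            for k in range(num_classes)]


# A linear ramp in one Mel bin has a delta of 1 away from the edges.
ramp = [[float(t)] for t in range(10)]
image = three_channel_image(ramp)
print(image[1][5])               # prints [1.0]
print(smooth_labels(1, 2, 0.1))  # smoothed target for class 1 of 2
```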

# ChatGPT、大規模言語モデル、生成AI時代の科学:研究倫理の課題と対応のあり方

Science in the Era of ChatGPT, Large Language Models and Generative AI: Challenges for Research Ethics and How to Respond ( http://arxiv.org/abs/2305.15299v3 )

ライセンス: Link先を確認

Evangelos Pournaras

(参考訳) ChatGPTのような人工知能(AI)の大規模言語モデルは、科学と研究において顕著だが議論の余地のある適用可能性を持つ。本稿では、生成AIの出現に伴う科学的行為における認識論的課題と、倫理および研究公正上のリスクをレビューする。その目的は、質の高い研究倫理審査のための新たな時宜を得た基盤を築くことである。研究手段および研究対象としてのAI言語モデルの役割を、科学者、参加者、審査者に対する倫理的含意とともに精査する。研究倫理審査における新たな実践について議論し、AI時代のより責任ある研究行為に向けた対応を形作る10の提言をまとめる。

Large language models of artificial intelligence (AI), such as ChatGPT, find remarkable but controversial applicability in science and research. This paper reviews epistemological challenges, as well as ethical and integrity risks, in the conduct of science in the advent of generative AI. The aim is to lay new, timely foundations for high-quality research ethics review. The role of AI language models as a research instrument and subject is scrutinized along with ethical implications for scientists, participants and reviewers. New emerging practices for research ethics review are discussed, concluding with ten recommendations that shape a response for a more responsible research conduct in the era of AI.

翻訳日:2023-07-28 19:51:03 公開日:2023-07-26

# 分散クリークラベリング関係の一方向強通信複雑性における非有界な量子優位性

Unbounded Quantum Advantage in One-Way Strong Communication Complexity of a Distributed Clique Labelling Relation ( http://arxiv.org/abs/2305.10372v2 )

ライセンス: Link先を確認

Sumit Rout, Nitica Sakharwade, Some Sankar Bhattacharya, Ravishankar Ramanathan, Paweł Horodecki

(参考訳) 分散クリークラベリング問題によって誘導される関係のクラスに対する、一方向ゼロエラーの古典的および量子的通信複雑性を調べる。次の2つの変種を考える:1) 受信者が関係を満たす答えを出力するもの(従来の関係の通信複雑性、CCR)、2) 受信者が関係を満たすすべての有効な答えを非ゼロの確率で出力するもの(すなわち関係を完全に再構成できるもの)で、これを関係の強通信複雑性(S-CCR)と呼ぶ。ここで考える特定の関係クラスについて、プレイヤーが資源を一切共有しない場合、どのグラフに対してもCCRタスクに量子優位性がないことを証明する。一方、S-CCRタスクにおける一方向の古典的通信と量子的通信の分離がグラフの位数 $m$ とともに増大するグラフのクラスが存在することを示す。具体的には、量子的複雑性は $O(1)$ であるのに対し、古典的複雑性は $\Omega(\log m)$ である。第二に、通信量が固定・制限されたシナリオにおいてこの分離を克服するために必要な共有ランダムネスの量に対する下界(クリーク数に線形)を証明し、これを直交配列の存在と結びつける。最後に、このタスクの半デバイス独立な次元証明や相互不偏基底の検出への応用を示す。

We investigate the one-way zero-error classical and quantum communication complexities for a class of relations induced by a distributed clique labelling problem. We consider two variants: 1) the receiver outputs an answer satisfying the relation, i.e., the traditional communication complexity of relations (CCR), and 2) the receiver has non-zero probabilities of outputting every valid answer satisfying the relation (equivalently, the relation can be fully reconstructed), which we denote the strong communication complexity of the relation (S-CCR). We prove that for the specific class of relations considered here, when the players do not share any resources, there is no quantum advantage in the CCR task for any graph. On the other hand, we show that there exist classes of graphs for which the separation between one-way classical and quantum communication in the S-CCR task grows with the order $m$ of the graph; specifically, the quantum complexity is $O(1)$ while the classical complexity is $\Omega(\log m)$. Secondly, we prove a lower bound (that is linear in the number of cliques) on the amount of shared randomness necessary to overcome the separation in the scenario of fixed restricted communication and connect this to the existence of Orthogonal Arrays. Finally, we highlight some applications of this task to semi-device-independent dimension witnessing as well as to the detection of Mutually Unbiased Bases.

翻訳日:2023-07-28 19:49:57 公開日:2023-07-26
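The abstract lists the detection of Mutually Unbiased Bases as one application. As a reminder of the definition only (not the paper's one-way communication protocol), two orthonormal bases of C^d are mutually unbiased when every cross overlap satisfies |⟨u|v⟩|² = 1/d. The sketch checks this for the computational basis and the discrete Fourier basis, a standard pair chosen here for illustration.

```python
import cmath
import math


def inner(u, v):
    # Hermitian inner product <u|v>.
    return sum(a.conjugate() * b for a, b in zip(u, v))


def computational_basis(d):
    return [[1.0 + 0j if k == i else 0j for k in range(d)] for i in range(d)]


def fourier_basis(d):
    # f_j = (1/sqrt(d)) * sum_k w^(jk) e_k with w = exp(2*pi*i/d).
    w = cmath.exp(2j * math.pi / d)
    return [[w ** (j * k) / math.sqrt(d) for k in range(d)] for j in range(d)]


def is_orthonormal(basis, tol=1e-9):
    return all(
        abs(inner(u, v) - (1.0 if i == j else 0.0)) < tol
        for i, u in enumerate(basis)
        for j, v in enumerate(basis)
    )


def mutually_unbiased(b1, b2, tol=1e-9):
    # Unbiased iff every overlap |<u|v>|^2 equals 1/d.
    d = len(b1)
    return all(abs(abs(inner(u, v)) ** 2 - 1.0 / d) < tol for u in b1 for v in b2)


for d in (2, 3, 5):
    print(d, mutually_unbiased(computational_basis(d), fourier_basis(d)))
```

A basis is never unbiased with respect to itself, since its own overlaps are 0 or 1 rather than 1/d.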

# 一次元液滴, 気泡, キンクの相互作用とダイナミクス

Interactions and dynamics of one-dimensional droplets, bubbles and kinks ( http://arxiv.org/abs/2306.07055v2 )

ライセンス: Link先を確認

G. C. Katsimiga, S. I. Mistakidis, B. A. Malomed, D. J. Frantzeskakis, R. Carretero-González and P. G. Kevrekidis

(参考訳) Lee-Huang-Yang補正を含む拡張1次元Gross-Pitaevskiiモデルにおいて、複数の明るい液滴と気泡のダイナミクスと相互作用、ならびにキンクと液滴、およびキンクとアンチキンクの相互作用を調べる。化学ポテンシャルの観点から1次元の液滴と気泡の存在領域を特定し、液滴の安定性を確認するとともに気泡の不安定性を明らかにする。液滴族の極限は安定なキンクである。液滴間の相互作用は同位相では引力、逆位相では斥力を示し、いわゆるManton法によって観測された動的応答が説明される。位相差が中間の値では混合的な振る舞いを示す。異なる化学ポテンシャルを持つ液滴は質量交換現象を示す。個々の気泡は、不安定化する前にコアの膨張と相互引力を示す。キンクと相互作用する液滴はキンクに吸収され、その過程で分散衝撃波と灰色ソリトンが放出される。キンク-アンチキンク相互作用は斥力的であり、逆向きに伝播する衝撃波を生成する。本研究は、現在の実験で検出可能な液滴とキンクの動的特徴を明らかにする。

We explore the dynamics and interactions of multiple bright droplets and bubbles, as well as the interactions of kinks with droplets and with antikinks, in the extended one-dimensional Gross-Pitaevskii model including the Lee-Huang-Yang correction. Existence regions are identified for the one-dimensional droplets and bubbles in terms of their chemical potential, verifying the stability of the droplets and exposing the instability of the bubbles. The limiting case of the droplet family is a stable kink. The interactions between droplets demonstrate in-phase (out-of-phase) attraction (repulsion), with the so-called Manton's method explicating the observed dynamical response, and mixed behavior for intermediate values of the phase shift. Droplets bearing different chemical potentials experience mass-exchange phenomena. Individual bubbles exhibit core expansion and mutual attraction prior to their destabilization. Droplets interacting with kinks are absorbed by them, a process accompanied by the emission of dispersive shock waves and gray solitons. Kink-antikink interactions are repulsive, generating counter-propagating shock waves. Our findings reveal dynamical features of droplets and kinks that can be detected in current experiments.

翻訳日:2023-07-28 19:42:03 公開日:2023-07-26
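A quick way to see the droplet family and its kink limit, assuming the commonly used dimensionless form iψ_t = -½ψ_xx + |ψ|²ψ - |ψ|ψ of the 1D model with the LHY term (the paper's normalization may differ). Stationary states ψ = φ(x)e^{-iμt} then satisfy μφ = -½φ'' + φ³ - φ², and the profile φ(x) = -3μ / (1 + √(1 + 9μ/2) cosh(√(-2μ) x)) solves it for -2/9 < μ < 0. The sketch checks this with a finite-difference residual; as μ approaches -2/9 the peak flattens toward 2/3, which is the kink limit the abstract mentions.

```python
import math


def droplet(x, mu):
    """Analytic droplet of mu*psi = -psi''/2 + psi**3 - psi**2 (real, positive psi).
    Exists for -2/9 < mu < 0; as mu -> -2/9 the peak flattens at 2/3 (kink limit)."""
    if not -2.0 / 9.0 < mu < 0.0:
        raise ValueError("droplet branch requires -2/9 < mu < 0")
    b = math.sqrt(1.0 + 4.5 * mu)
    k = math.sqrt(-2.0 * mu)
    return -3.0 * mu / (1.0 + b * math.cosh(k * x))


def max_residual(mu, half_width=40.0, n=4001):
    """Largest |-psi''/2 + psi^3 - psi^2 - mu*psi| on a grid, using a central-difference psi''."""
    h = 2.0 * half_width / (n - 1)
    psi = [droplet(-half_width + i * h, mu) for i in range(n)]
    worst = 0.0
    for i in range(1, n - 1):
        lap = (psi[i + 1] - 2.0 * psi[i] + psi[i - 1]) / h ** 2
        res = -0.5 * lap + psi[i] ** 3 - psi[i] ** 2 - mu * psi[i]
        worst = max(worst, abs(res))
    return worst


for mu in (-0.05, -0.15, -0.2222):
    print(mu, round(droplet(0.0, mu), 4), max_residual(mu))
```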

# PlaSma: 小規模言語モデルを(反実仮想)計画のためのより優れた手続き的知識モデルにする

PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning ( http://arxiv.org/abs/2305.19472v2 )

ライセンス: Link先を確認

Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona J. Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi

(参考訳) 高レベルの目標を時間的に順序付けられた一連のステップに分解する手続き的計画は、機械にとって重要だが複雑なタスクである。これには、「電話を使わずに医師の予約を取る」のような、しばしば反実仮想的な、文脈に依存した複雑な状況について推論するために常識知識を統合することが必要となる。現在のアプローチは大規模言語モデル(LLM)を用いて有望な結果を示しているが、高コストなAPI呼び出しや再現性の問題といった欠点に妨げられている。本稿では、より小さな言語モデルを用いた計画を提唱する。小規模言語モデルに手続き的知識と(反実仮想)計画能力を付与するための、新しい二本柱のアプローチであるPlaSmaを提案する。より具体的には、小規模言語モデルの暗黙的知識を強化するための記号的手続き的知識蒸留と、より構造化された正確な推論を促す推論時アルゴリズムを開発する。さらに、反実仮想的な状況に対処するために計画の修正を必要とする新しいタスク、反実仮想計画(Counterfactual Planning)を導入する。元の設定と反実仮想の設定の両方において、桁違いに小さいモデル(770M-11Bパラメータ)が、より大きな教師モデルの能力に匹敵し、しばしばそれを上回ることを示す。

Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g., "scheduling a doctor's appointment without a phone". While current approaches show encouraging results using large language models (LLMs), they are hindered by drawbacks such as costly API calls and reproducibility issues. In this paper, we advocate planning using smaller language models. We present PlaSma, a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities. More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models and an inference-time algorithm to facilitate more structured and accurate reasoning. In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation. In both the original and counterfactual settings, we show that orders-of-magnitude smaller models (770M-11B parameters) can compete with and often surpass their larger teacher models' capabilities.

翻訳日:2023-07-28 19:39:47 公開日:2023-07-26

# 計算社会科学における計算再現性

Computational Reproducibility in Computational Social Science ( http://arxiv.org/abs/2307.01918v3 )

ライセンス: Link先を確認

David Schoch, Chung-hong Chan, Claudia Wagner, Arnim Bleier

(参考訳) 過去10年間で、追試と再現性の危機が科学界を揺るがしてきた。その解決策の候補として、オープンサイエンスの実践が盛んに議論され、さまざまな分野で程度の差はあれ実施されてきた。しかし我々は、計算社会科学のような計算X分野において、再現性の二値的な定義は、どのようなエージェントと条件のもとで結果が再現できるかを明示しないため不十分であると主張する。我々は、理論上の再現性を装いながら実際の、あるいは検証された再現性を支えない「オープンウォッシング」を避けるように定義を拡張し、検証可能性の概念に基づく計算再現性の階層システムを導入する。特に計算社会科学の分野における検証可能な計算再現性への共通の障壁を特定し、一般的なアクセス上および計算上の障壁を回避する方法を提案する。

In the last decade, replication and reproducibility crises have shaken the scientific landscape. As potential solutions, open science practices were heavily discussed and have been implemented with varying success in different disciplines. We argue, however, that the binary definition of reproducibility, specifically for computational-X disciplines such as computational social science, is insufficient since it is not explicit about the agents and conditions under which results can be reproduced. We expand the definition to avoid "open washing", the practice of fabricating theoretical reproducibility but not supporting practical or verified reproducibility, and introduce a tier system of computational reproducibility based on the concept of verifiability. We identify common barriers to verifiable computational reproducibility, specifically in the field of computational social science, and provide suggestions on how to circumvent common access and computational barriers.

翻訳日:2023-07-28 19:30:48 公開日:2023-07-26

# ChatGPTは人格認識に優れているか? 予備的研究

Is ChatGPT a Good Personality Recognizer? A Preliminary Study ( http://arxiv.org/abs/2307.03952v2 )

ライセンス: Link先を確認

Yu Ji, Wen Wu, Hong Zheng, Yi Hu, Xi Chen, Liang He

(参考訳) 近年、パーソナリティは、感情分析や商品推薦など多くのタスクに組み込まれる有用な個人的要因とみなされている。これにより、与えられたテキストに基づいて個人のパーソナリティを識別することを目的とした、テキストベースのパーソナリティ認識タスクが広く注目されている。ChatGPTが最近さまざまな自然言語処理タスクで顕著な能力を示していることを踏まえ、効果的なパーソナリティデータを生成する目的で、テキストベースのパーソナリティ認識タスクにおけるChatGPTの予備的評価を行う。具体的には、さまざまなプロンプト戦略を用いて、与えられたテキストからパーソナリティを認識するChatGPTの能力を探る。特に、指定したレベルで与えられたテキストを分析するようChatGPTを導くために設計したレベル指向プロンプト戦略を用いる。2つの代表的な実世界データセットでの実験結果から、ゼロショットChain-of-ThoughtプロンプトによるChatGPTは優れたパーソナリティ認識能力を示し、テキストに基づく論理的推論を通じて自然言語による説明を提供できることが明らかとなった。さらに、レベル指向プロンプト戦略を用いてゼロショットChain-of-Thoughtプロンプトを最適化することで、ChatGPTと対応する最先端モデルとの性能差はさらに縮まった。しかし、ChatGPTは性別や年齢などの特定のセンシティブな人口統計学的属性に対して不公平さを示す。加えて、ChatGPTのパーソナリティ認識能力を引き出すことが、感情分類やストレス予測といったパーソナリティ関連の下流タスクにおける性能向上に役立つことがわかった。

In recent years, personality has been regarded as a valuable personal factor being incorporated into numerous tasks such as sentiment analysis and product recommendation. This has led to widespread attention to the text-based personality recognition task, which aims to identify an individual's personality based on given text. Considering that ChatGPT has recently exhibited remarkable abilities on various natural language processing tasks, we provide a preliminary evaluation of ChatGPT on the text-based personality recognition task for generating effective personality data. Concretely, we employ a variety of prompting strategies to explore ChatGPT's ability in recognizing personality from given text, especially the level-oriented prompting strategy we designed for guiding ChatGPT in analyzing given text at a specified level. The experimental results on two representative real-world datasets reveal that ChatGPT with zero-shot chain-of-thought prompting exhibits impressive personality recognition ability and is capable of providing natural language explanations through text-based logical reasoning. Furthermore, by employing the level-oriented prompting strategy to optimize zero-shot chain-of-thought prompting, the performance gap between ChatGPT and the corresponding state-of-the-art model has been narrowed even further. However, we observe that ChatGPT shows unfairness towards certain sensitive demographic attributes such as gender and age. Additionally, we discover that eliciting the personality recognition ability of ChatGPT helps improve its performance on personality-related downstream tasks such as sentiment classification and stress prediction.

翻訳日:2023-07-28 19:20:58 公開日:2023-07-26

# デッド量子ビットが存在する場合のフォールトトレラントなHastings-Haah符号

Fault-Tolerant Hastings-Haah Codes in the Presence of Dead Qubits ( http://arxiv.org/abs/2307.03715v2 )

ライセンス: Link先を確認

David Aasen, Jeongwan Haah, Parsa Bonderson, Zhenghan Wang, Matthew Hastings

(参考訳) デッド量子ビットが存在する場合のHastings-Haah Floquet符号のためのプロトコルを開発する。

We develop protocols for Hastings-Haah Floquet codes in the presence of dead qubits.

翻訳日:2023-07-28 19:20:00 公開日:2023-07-26

# LAMP: 言語プロンプトを活用した多人数姿勢推定

LAMP: Leveraging Language Prompts for Multi-person Pose Estimation ( http://arxiv.org/abs/2307.11934v2 )

ライセンス: Link先を確認

Shengnan Hu, Ce Zheng, Zixiang Zhou, Chen Chen, and Gita Sukthankar

(参考訳) 人間中心の視覚理解は、効果的な人間とロボットのインタラクションにとって重要な要件である。混雑した公共の場所を移動するためには、ソーシャルロボットは周囲の人間の活動を解釈できなければならない。本稿では、人間中心の視覚理解の重要な側面の1つである多人数姿勢推定を扱う。混雑した場面で多人数姿勢推定の良好な性能を達成することは、遮蔽された関節やインスタンス分離の課題のために難しい。これらの課題に取り組み、見えない身体部位を表現する際の画像特徴の限界を克服するために、LAMP(Language Assisted Multi-person Pose estimation)と呼ばれる新しいプロンプトベースの姿勢推論戦略を提案する。十分に学習された言語モデル(CLIP)が生成するテキスト表現を利用することで、LAMPはインスタンスレベルおよび関節レベルでの姿勢の理解を促進し、遮蔽の影響を受けにくい、よりロバストな視覚表現を学習できる。本稿では、言語教師付き学習が単一段階の多人数姿勢推定の性能を向上させること、およびインスタンスレベルと関節レベルの両方のプロンプトが学習に有用であることを示す。コードはhttps://github.com/shengnanh20/LAMPで公開されている。

Human-centric visual understanding is an important desideratum for effective human-robot interaction. In order to navigate crowded public places, social robots must be able to interpret the activity of the surrounding humans. This paper addresses one key aspect of human-centric visual understanding, multi-person pose estimation. Achieving good performance on multi-person pose estimation in crowded scenes is difficult due to the challenges of occluded joints and instance separation. In order to tackle these challenges and overcome the limitations of image features in representing invisible body parts, we propose a novel prompt-based pose inference strategy called LAMP (Language Assisted Multi-person Pose estimation). By utilizing the text representations generated by a well-trained language model (CLIP), LAMP can facilitate the understanding of poses on the instance and joint levels, and learn more robust visual representations that are less susceptible to occlusion. This paper demonstrates that language-supervised training boosts the performance of single-stage multi-person pose estimation, and both instance-level and joint-level prompts are valuable for training. The code is available at https://github.com/shengnanh20/LAMP.

翻訳日:2023-07-28 19:11:24 公開日:2023-07-26

# 悪意のあるノイズに対する公平性制約付き学習の脆弱性について

On the Vulnerability of Fairness Constrained Learning to Malicious Noise ( http://arxiv.org/abs/2307.11892v2 )

ライセンス: Link先を確認

Avrim Blum, Princewill Okoroafor, Aadirupa Saha, Kevin Stangl

(参考訳) 訓練データに含まれる少量の悪意のあるノイズに対する、公平性制約付き学習の脆弱性を考察する。Konstantinov and Lampert (2021) はこの問題の研究を始め、いくつかの公平性制約について、グループサイズが不均衡な場合には任意のプロパー学習者が高い脆弱性を示すようなデータ分布が存在するという否定的な結果を示した。本稿ではより楽観的な見方を示し、ランダム化分類器を許すと状況がはるかに微妙になることを示す。例えば、人口統計学的パリティ(Demographic Parity)については、精度の損失を $\Theta(\alpha)$ に抑えられることを示す。ここで $\alpha$ は悪意のあるノイズ率であり、これは公平性制約がない場合に可能な最良の値と一致する。機会均等(Equal Opportunity)については、$O(\sqrt{\alpha})$ の損失に抑えられることを示し、それに一致する $\Omega(\sqrt{\alpha})$ の下界を与える。対照的に、Konstantinov and Lampert (2021) は、プロパー学習者の場合、両概念の精度損失が $\Omega(1)$ であることを示した。本研究の主要な技術的新規性は、敵対者が自らの力を増幅するために使える単純な「トリック」を、ランダム化によって回避できる点にある。また、均等化オッズ(Equalized Odds)やキャリブレーションを含む、追加の公平性概念も検討する。これらの公平性概念について、精度の超過損失は $O(\alpha)$、$O(\sqrt{\alpha})$、$O(1)$ の3つの自然な領域に分かれる。これらの結果は、訓練データ中の敵対的ノイズに対する公平性制約付き学習の感度について、よりきめ細かな見方を提供する。

We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and presented negative results showing that there exist data distributions where, for several fairness constraints, any proper learner will exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we show we can incur only a $\Theta(\alpha)$ loss in accuracy, where $\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an $O(\sqrt{\alpha})$ loss, and give a matching $\Omega(\sqrt{\alpha})$ lower bound. In contrast, Konstantinov and Lampert (2021) showed for proper learners the loss in accuracy for both notions is $\Omega(1)$. The key technical novelty of our work is how randomization can bypass simple "tricks" an adversary can use to amplify their power. We also consider additional fairness notions including Equalized Odds and Calibration. For these fairness notions, the excess accuracy loss clusters into three natural regimes: $O(\alpha)$, $O(\sqrt{\alpha})$, and $O(1)$. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.

翻訳日:2023-07-28 19:11:05 公開日:2023-07-26
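Demographic Parity and Equal Opportunity are usually measured as gaps in positive-prediction rates between groups: over everyone for Demographic Parity, and over examples with a positive true label (true-positive rates) for Equal Opportunity. A minimal sketch, assuming two groups coded 0/1. Predictions may be probabilities, i.e., the output of a randomized classifier, in which case the rates are expected rates. This only measures constraint violation; it does not implement the paper's noise-robust learner.

```python
def positive_rate(preds, mask):
    # Mean (expected) positive prediction over the selected examples.
    selected = [p for p, keep in zip(preds, mask) if keep]
    return sum(selected) / len(selected)


def demographic_parity_gap(preds, groups):
    r0 = positive_rate(preds, [g == 0 for g in groups])
    r1 = positive_rate(preds, [g == 1 for g in groups])
    return abs(r0 - r1)


def equal_opportunity_gap(preds, labels, groups):
    # Compare true-positive rates: only examples with label 1 count.
    r0 = positive_rate(preds, [g == 0 and y == 1 for g, y in zip(groups, labels)])
    r1 = positive_rate(preds, [g == 1 and y == 1 for g, y in zip(groups, labels)])
    return abs(r0 - r1)


preds = [1, 0, 1, 1, 0.5, 0, 0, 1]   # 0.5 is a randomized prediction
labels = [1, 1, 0, 1, 1, 0, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_gap(preds, groups))  # prints 0.375
print(equal_opportunity_gap(preds, labels, groups))
```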

# 限られたデータと少ないショットとゼロショットによる生成モデリングに関する調査

A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot ( http://arxiv.org/abs/2307.14397v1 )

ライセンス: Link先を確認

Milad Abdollahzadeh, Touba Malekzadeh, Christopher T. H. Teo, Keshigeyan Chandrasegaran, Guimeng Liu, Ngai-Man Cheung

(参考訳) 機械学習において、生成モデリングは、訓練データ分布と統計的に類似した新しいデータを生成することを学習することを目的とする。本稿では、限られたデータ、少数ショット、ゼロショットのもとでの生成モデルの学習、すなわちデータ制約下の生成モデリング(GM-DC)について調査する。これは、医療応用のようにデータ取得が難しい場合に重要なトピックである。背景と課題を議論し、GM-DCタスクに関する分類法とGM-DCアプローチに関する分類法の2つを提案する。重要な点として、異なるGM-DCタスクとアプローチの間の相互作用を調べる。さらに、研究のギャップ、研究動向、今後の探究に向けた潜在的な方向性を強調する。プロジェクトウェブサイト: https://gmdc-survey.github.io

In machine learning, generative modeling aims to learn to generate new data statistically similar to the training data distribution. In this paper, we survey learning generative models under limited data, few shots and zero shot, referred to as Generative Modeling under Data Constraint (GM-DC). This is an important topic when data acquisition is challenging, e.g., in healthcare applications. We discuss the background and challenges, and propose two taxonomies: one on GM-DC tasks and another on GM-DC approaches. Importantly, we study interactions between different GM-DC tasks and approaches. Furthermore, we highlight research gaps, research trends, and potential avenues for future exploration. Project website: https://gmdc-survey.github.io.

翻訳日:2023-07-28 17:08:57 公開日:2023-07-26

# 学習可能な差分演算子を用いた部分的に既知の時空間ダイナミクスのシミュレーション学習

Learning to simulate partially known spatio-temporal dynamics with trainable difference operators ( http://arxiv.org/abs/2307.14395v1 )

ライセンス: Link先を確認

Xiang Huang, Zhuoyuan Li, Hongsheng Liu, Zidong Wang, Hongye Zhou, Bin Dong, Bei Hua

(参考訳) 近年、ニューラルネットワークを用いて時空間ダイナミクスをシミュレートする手法が注目を集めている。しかし、既存手法の多くは純粋にデータ駆動型のブラックボックスモデルを採用しており、精度と解釈性が限られている。本稿では、学習可能な差分演算子とブラックボックスモデルを組み合わせることで、基礎となるPDEの部分的な事前知識を明示的に組み込んだ、PDE-Net++と呼ばれる新しいハイブリッドアーキテクチャを提案する。さらに、差分演算子として、学習可能な反転差分層(TFDL)と学習可能な動的差分層(TDDL)という2つの異なる選択肢を導入する。多数の数値実験により、PDE-Net++がブラックボックスモデルよりも優れた予測精度と外挿性能を持つことが示された。

Recently, using neural networks to simulate spatio-temporal dynamics has received a lot of attention. However, most existing methods adopt purely data-driven black-box models, which have limited accuracy and interpretability. By combining trainable difference operators with black-box models, we propose a new hybrid architecture, named PDE-Net++, that explicitly embeds partial prior knowledge of the underlying PDEs. Furthermore, we introduce two distinct options for the difference operators: the trainable flipping difference layer (TFDL) and the trainable dynamic difference layer (TDDL). Numerous numerical experiments demonstrate that PDE-Net++ achieves superior prediction accuracy and better extrapolation performance than black-box models.

翻訳日:2023-07-28 17:08:46 公開日:2023-07-26
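To make the idea of a trainable difference operator concrete, here is a generic sketch: a 3-point stencil on a periodic grid, initialised to the central difference so that it starts as a valid d/dx, whose weights are then updated by gradient descent against data. This is not TFDL or TDDL; the abstract does not describe how PDE-Net++ flips or dynamically adapts its stencils, and `DifferenceLayer` and `sgd_step` are hypothetical names.

```python
import math


class DifferenceLayer:
    """1-D stencil on a periodic grid whose weights play the role of trainable parameters.
    Initialised to the second-order central difference for d/dx."""

    def __init__(self, h):
        self.offsets = [-1, 0, 1]
        self.weights = [-0.5 / h, 0.0, 0.5 / h]

    def __call__(self, u):
        n = len(u)
        return [sum(w * u[(i + o) % n] for w, o in zip(self.weights, self.offsets))
                for i in range(n)]

    def sgd_step(self, u, target, lr):
        """Return the current mean squared error, then take one gradient-descent step on it."""
        n = len(u)
        pred = self(u)
        grads = [0.0] * len(self.weights)
        for i in range(n):
            err = pred[i] - target[i]
            for j, o in enumerate(self.offsets):
                grads[j] += 2.0 * err * u[(i + o) % n] / n
        self.weights = [w - lr * g for w, g in zip(self.weights, grads)]
        return sum((p - t) ** 2 for p, t in zip(pred, target)) / n


n = 200
h = 2 * math.pi / n
xs = [i * h for i in range(n)]
u = [math.sin(x) for x in xs]
du = [math.cos(x) for x in xs]
layer = DifferenceLayer(h)
print(max(abs(a - b) for a, b in zip(layer(u), du)))  # O(h^2) error of the initial stencil
```

Starting from a consistent stencil means the learned operator begins as a known derivative approximation, which is one common motivation for embedding difference operators instead of a free convolution.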

# ハイパーグラフ同型計算

Hypergraph Isomorphism Computation ( http://arxiv.org/abs/2307.14394v1 )

ライセンス: Link先を確認

Yifan Feng, Jiashu Han, Shihui Ying, Yue Gao

(参考訳) 同型問題(isomorphism problem)は、低次と高次の両方の構造情報を捉えることを伴う、ネットワーク解析における基本的な問題である。低次構造情報の抽出に関しては、グラフ同型アルゴリズムが構造的同値性を解析して解空間の次元を削減し、タンパク質設計、化学経路、コミュニティ検出など多くの応用でその威力を示している。実世界のシナリオでより一般的に現れる高次関係については、これらの高次構造関係を効果的に捉えるハイパーグラフ同型問題を、グラフ同型の手法で直接解くことはできない。さらに、既存のハイパーグラフカーネル法は、メモリ消費が大きい、あるいは部分構造の識別が不正確であるといった問題を抱える可能性があり、その結果、最適でない性能となる。本稿では、上記の問題に対処するため、まずWeisfeiler-Lehman検定アルゴリズムをグラフからハイパーグラフへ一般化することで、ハイパーグラフ同型検定問題のためのハイパーグラフWeisfeiler-Lehman検定アルゴリズムを提案する。次に、提示したアルゴリズムに基づいて一般的なハイパーグラフWeisfeiler-Lehmanカーネルフレームワークを提案し、Hypergraph Weisfeiler-Lehman Subtree KernelとHypergraph Weisfeiler-Lehman Hyperedge Kernelの2つのインスタンスを実装する。研究目的を達成するため、7つのグラフ分類データセットと12のハイパーグラフ分類データセットを含む包括的な実験を慎重に設計した。ハイパーグラフ分類データセットでの結果は、他の代表的なカーネルベース手法と比較して大幅な改善を示しており、提案手法の有効性を示している。評価の結果、提案手法は実行時間の点で2番目に優れた手法を上回り、複雑なハイパーグラフ構造を扱う場合には80倍以上高速に動作することがわかった。

The isomorphism problem is a fundamental problem in network analysis, which involves capturing both low-order and high-order structural information. In terms of extracting low-order structural information, graph isomorphism algorithms analyze the structural equivalence to reduce the solver space dimension, which demonstrates its power in many applications, such as protein design, chemical pathways, and community detection. For the more commonly occurring high-order relationships in real-life scenarios, the problem of hypergraph isomorphism, which effectively captures these high-order structural relationships, cannot be straightforwardly addressed using graph isomorphism methods. Besides, the existing hypergraph kernel methods may suffer from high memory consumption or inaccurate sub-structure identification, thus yielding sub-optimal performance. In this paper, to address the above-mentioned problems, we first propose the hypergraph Weisfeiler-Lehman test algorithm for the hypergraph isomorphism test problem by generalizing the Weisfeiler-Lehman test algorithm from graphs to hypergraphs. Secondly, based on the presented algorithm, we propose a general hypergraph Weisfeiler-Lehman kernel framework and implement two instances, which are the Hypergraph Weisfeiler-Lehman Subtree Kernel and the Hypergraph Weisfeiler-Lehman Hyperedge Kernel. In order to fulfill our research objectives, a comprehensive set of experiments was meticulously designed, including seven graph classification datasets and 12 hypergraph classification datasets. Results on hypergraph classification datasets show significant improvements compared to other typical kernel-based methods, which demonstrates the effectiveness of the proposed methods. In our evaluation, we found that our proposed methods outperform the second-best method in terms of runtime, running over 80 times faster when handling complex hypergraph structures.

翻訳日:2023-07-28 17:08:32 公開日:2023-07-26
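A minimal sketch of how Weisfeiler-Lehman colour refinement can be lifted from graphs to hypergraphs by passing colours through the vertex-hyperedge incidence structure. The update rule below is an assumption chosen for illustration and may differ from the paper's algorithm; a WL-style kernel would then compare such colour histograms (for example via dot products accumulated over iterations). As with graph WL, equal histograms do not prove isomorphism, but different histograms rule it out.

```python
from collections import Counter


def hwl_colour_histograms(hypergraphs, iterations=3):
    """WL-style colour refinement on hypergraphs given as (num_vertices, list_of_hyperedges).
    Hyperedges take the multiset of their vertices' colours; vertices take their own colour plus
    the multiset of incident hyperedge colours. A shared relabelling table keeps colours comparable
    across all input hypergraphs. Returns one Counter of final vertex colours per hypergraph."""
    table = {}

    def relabel(signature):
        if signature not in table:
            table[signature] = len(table)
        return table[signature]

    states = []
    for n, edges in hypergraphs:
        incident = [[] for _ in range(n)]
        for e_idx, edge in enumerate(edges):
            for v in edge:
                incident[v].append(e_idx)
        states.append(([relabel(("v",))] * n, edges, incident))

    for _ in range(iterations):
        refined = []
        for vcol, edges, incident in states:
            ecol = [relabel(("e", tuple(sorted(vcol[v] for v in edge)))) for edge in edges]
            new_vcol = [relabel(("v", vcol[v], tuple(sorted(ecol[i] for i in incident[v]))))
                        for v in range(len(vcol))]
            refined.append((new_vcol, edges, incident))
        states = refined
    return [Counter(vcol) for vcol, _, _ in states]


h1 = (4, [(0, 1, 2), (2, 3)])
h2 = (4, [(3, 2, 1), (1, 0)])   # h1 with vertices relabelled 0<->3, 1<->2
h3 = (4, [(0, 1, 2), (1, 2)])   # vertex 3 isolated: not isomorphic to h1
c1, c2, c3 = hwl_colour_histograms([h1, h2, h3])
print(c1 == c2, c1 == c3)  # prints True False
```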

# 3次元大規模シナリオのための人間中心シーン理解

Human-centric Scene Understanding for 3D Large-scale Scenarios ( http://arxiv.org/abs/2307.14392v1 )

ライセンス: Link先を確認

Yiteng Xu, Peishan Cong, Yichen Yao, Runnan Chen, Yuenan Hou, Xinge Zhu, Xuming He, Jingyi Yu, Yuexin Ma

(参考訳) 人間中心のシーン理解は実世界の応用において重要であるが、多様な人間の姿勢や動作、複雑な人間と環境の相互作用、群衆における深刻な遮蔽などが存在するため、非常に困難である。本稿では、人間中心のシーン理解のための大規模マルチモーダルデータセットHuCenLifeを提示する。HuCenLifeは多様な日常生活シナリオで収集され、豊富できめ細かなアノテーションを備えている。HuCenLifeは、セグメンテーション、検出、行動認識など多くの3D知覚タスクに役立ち、関連研究を促進するためにこれらのタスクのベンチマークも提供する。さらに、大規模な人間中心シナリオにより適し、最先端の性能を達成する、LiDARベースのセグメンテーションと行動認識のための新しいモジュールを設計する。

Human-centric scene understanding is significant for real-world applications, but it is extremely challenging due to the existence of diverse human poses and actions, complex human-environment interactions, severe occlusions in crowds, etc. In this paper, we present a large-scale multi-modal dataset for human-centric scene understanding, dubbed HuCenLife, which is collected in diverse daily-life scenarios with rich and fine-grained annotations. Our HuCenLife can benefit many 3D perception tasks, such as segmentation, detection, action recognition, etc., and we also provide benchmarks for these tasks to facilitate related research. In addition, we design novel modules for LiDAR-based segmentation and action recognition, which are more applicable for large-scale human-centric scenarios and achieve state-of-the-art performance.

翻訳日:2023-07-28 17:08:02 公開日:2023-07-26

# Diff-E: 拡散ベース学習による想像音声脳波のデコーディング

Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG ( http://arxiv.org/abs/2307.14389v1 )

ライセンス: Link先を確認

Soowon Kim, Young-Eun Lee, Seo-Hyun Lee, Seong-Whan Lee

(参考訳) 想像音声(imagined speech)の脳波信号のデコーディングは、データの高次元性と低い信号対雑音比のために難しい課題である。近年、ノイズ除去拡散確率モデル(DDPM)は、さまざまな領域における表現学習の有望なアプローチとして登場している。本研究では、DDPMとDiff-Eと名付けた条件付きオートエンコーダを用いて、想像音声の脳波信号をデコードする新しい手法を提案する。結果は、Diff-Eが従来の機械学習手法やベースラインモデルと比較して、想像音声の脳波信号のデコーディング精度を大幅に向上させることを示している。この結果は、DDPMが脳波信号のデコーディングに有効なツールとなりうることを示唆しており、想像音声によるコミュニケーションを可能にする脳-コンピュータインタフェースの開発に寄与する可能性がある。

Decoding EEG signals for imagined speech is a challenging task due to the high-dimensional nature of the data and low signal-to-noise ratio. In recent years, denoising diffusion probabilistic models (DDPMs) have emerged as promising approaches for representation learning in various domains. Our study proposes a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional autoencoder named Diff-E. Results indicate that Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models. Our findings suggest that DDPMs can be an effective tool for EEG signal decoding, with potential implications for the development of brain-computer interfaces that enable communication through imagined speech.

翻訳日:2023-07-28 17:07:47 公開日:2023-07-26
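Diff-E builds on DDPMs. For reference, the DDPM forward (noising) process has a closed form, x_t = √ᾱ_t x_0 + √(1 - ᾱ_t) ε with ᾱ_t = ∏_{s≤t} (1 - β_s). The sketch uses the linear β schedule from 1e-4 to 0.02 over 1000 steps popularised by the original DDPM paper; these values are an assumption, and Diff-E's own schedule, denoiser, and conditional autoencoder are not shown. The input vector is a made-up stand-in for EEG features.

```python
import math
import random


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linearly increasing noise variances beta_1..beta_T.
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    # alpha_bar_t = prod_{s<=t} (1 - beta_s)
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bar, noise):
    """Closed-form forward step: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


alpha_bar = cumulative_alphas(linear_beta_schedule())
rng = random.Random(0)
x0 = [0.3, -1.2, 0.8, 0.0]              # stand-in for one EEG feature vector
noise = [rng.gauss(0.0, 1.0) for _ in x0]
for t in (0, 250, 999):
    print(t, round(alpha_bar[t], 5), [round(v, 3) for v in q_sample(x0, t, alpha_bar, noise)])
```

By the last step ᾱ_t is close to zero, so x_t is almost pure noise; the learned reverse process is what a model such as Diff-E trains to undo.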

# ランダムウォークに基づく異常検出に対するデュアルスペース攻撃

Dual-Space Attacks against Random-Walk-based Anomaly Detection ( http://arxiv.org/abs/2307.14387v1 )

ライセンス: Link先を確認

Yuni Lai, Marcin Waniek, Yulin Zhu, Liying Li, Jingwen Wu, Tomasz P. Michalak, Talal Rahwan, Kai Zhou

(参考訳) ランダムウォークに基づく異常検出(RWAD)は、さまざまな応用で異常パターンを特定するために一般的に用いられている。RWADの興味深い特徴は、入力グラフが既存のものでも、生の特徴から構築されたものでもよいことである。その結果、RWADには2つの潜在的な攻撃面、すなわちグラフ空間攻撃と特徴空間攻撃が存在する。本稿では、実用的な二重空間攻撃を設計してこの脆弱性を探り、グラフ空間攻撃と特徴空間攻撃の相互作用を調べる。そのために、徹底的な計算複雑性解析を行い、RWADへの攻撃がNP困難であることを証明する。次に、グラフ空間攻撃を二段階最適化問題として定式化し、それを解くための2つの戦略、すなわち交互反復(alterI-attack)とランダムウォークモデルの閉形式解の利用(cf-attack)を提案する。最後に、グラフ空間攻撃の結果を指針として、より強力な特徴空間攻撃(すなわちグラフ誘導攻撃)を設計する。包括的な実験により、提案する攻撃が、限られた攻撃予算でターゲットノードがRWADによる検出を回避できるようにするうえで有効であることを示す。さらに、ブラックボックス設定で転移攻撃の実験を行い、特徴攻撃がターゲットノードの異常スコアを大幅に減少させることを示す。本研究は、グラフ空間が特徴空間に依存するグラフ異常検出に対する二重空間攻撃の研究への扉を開くものである。

Random Walks-based Anomaly Detection (RWAD) is commonly used to identify anomalous patterns in various applications. An intriguing characteristic of RWAD is that the input graph can either be pre-existing or constructed from raw features. Consequently, there are two potential attack surfaces against RWAD: graph-space attacks and feature-space attacks. In this paper, we explore this vulnerability by designing practical dual-space attacks, investigating the interplay between graph-space and feature-space attacks. To this end, we conduct a thorough complexity analysis, proving that attacking RWAD is NP-hard. Then, we proceed to formulate the graph-space attack as a bi-level optimization problem and propose two strategies to solve it: alternating iteration (alterI-attack) or utilizing the closed-form solution of the random walk model (cf-attack). Finally, we utilize the results from the graph-space attacks as guidance to design more powerful feature-space attacks (i.e., graph-guided attacks). Comprehensive experiments demonstrate that our proposed attacks are effective in enabling the target nodes to evade RWAD with a limited attack budget. In addition, we conduct transfer attack experiments in a black-box setting, which show that our feature attack significantly decreases the anomaly scores of target nodes. Our study opens the door to studying the dual-space attack against graph anomaly detection in which the graph space relies on the feature space.

翻訳日:2023-07-28 17:07:34 公開日:2023-07-26
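The abstract does not say which random-walk detector RWAD denotes. As an assumption, the sketch follows a common pattern: build an affinity graph from raw features and score each node by its stationary probability under a random walk with uniform teleportation, where lower probability means more anomalous. It illustrates why two attack surfaces exist (perturbing either `points`, the feature space, or `S`, the graph space, changes a target's score); it does not implement alterI-attack or cf-attack, and `sigma` and `damping` are arbitrary illustrative values.

```python
import math


def similarity_graph(points, sigma=1.0):
    # Dense RBF affinity graph built from raw features (no self-loops).
    n = len(points)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                S[i][j] = math.exp(-d2 / sigma ** 2)
    return S


def stationary_scores(S, damping=0.1, iters=200):
    """Power iteration for a random walk with uniform teleportation.
    Low stationary probability = weakly connected = more anomalous."""
    n = len(S)
    P = []
    for row in S:
        total = sum(row)
        P.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [damping / n + (1.0 - damping) * sum(pi[i] * P[i][j] for i in range(n))
              for j in range(n)]
    return pi


points = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [0.25, 0.25], [10.0, 10.0]]
pi = stationary_scores(similarity_graph(points))
print(min(range(len(pi)), key=lambda i: pi[i]))  # prints 5, the isolated point
```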

# オンラインテキストデータを用いた大規模言語モデルを用いたメンタルヘルス予測

Leveraging Large Language Models for Mental Health Prediction via Online Text Data ( http://arxiv.org/abs/2307.14385v1 )

ライセンス: Link先を確認

Xuhai Xu, Bingshen Yao, Yuanzhe Dong, Hong Yu, James Hendler, Anind K. Dey, Dakuo Wang

The recent technology boost of large language models (LLMs) has empowered a variety of applications. However, there is very little research on understanding and improving LLMs' capability for the mental health domain. In this work, we present the first comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, and GPT-3.5, on various mental health prediction tasks via online text data. We conduct a wide range of experiments, covering zero-shot prompting, few-shot prompting, and instruction finetuning. The results indicate the promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction finetuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best-finetuned model, Mental-Alpaca, outperforms GPT-3.5 (25 times larger) by 16.7% on balanced accuracy and performs on par with the state-of-the-art task-specific model. We summarize our findings into a set of action guidelines for future researchers, engineers, and practitioners on how to empower LLMs with better mental health domain knowledge and turn them into experts in mental health prediction tasks.
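
The headline result is reported as balanced accuracy. This metric averages per-class recall, so a model cannot score well just by predicting the majority class, which matters for imbalanced mental health labels. Below is a minimal sketch of the standard definition. The function name is our own choice.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; robust to class imbalance."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# A classifier that always predicts the majority class:
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.75
```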

Translated: 2023-07-28 17:07:11 | Published: 2023-07-26

# HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning

arXiv: http://arxiv.org/abs/2307.14384v1

License: see link

Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Huabin Zhu, Yanchao Tan, Jun Wang and Yue Qi

Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-independent and identically distributed (non-IID) data among clients hinders the performance of FL due to three issues, i.e., (1) class statistics shifting, (2) insufficient utilization of hierarchical information, and (3) inconsistency in aggregating clients. To address these issues, we propose HyperFed, which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). First, HPTI in the server constructs uniformly distributed and fixed class prototypes and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Second, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of inconsistent deviations from clients to server. Extensive experiments on four datasets show that HyperFed is effective in enhancing the performance of FL under non-IID settings.
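
To make "supervision of shared class prototypes in the hyperbolic model space" concrete, here is a minimal sketch that assumes the Poincaré ball model. The abstract does not name the model. Each embedding is assigned to the nearest fixed prototype under the hyperbolic geodesic distance. The prototype coordinates are hand-picked and hypothetical; they stand in for, and do not reproduce, the Tammes initialization.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    if nu >= 1 or nv >= 1:
        raise ValueError("points must lie strictly inside the unit ball")
    return math.acosh(1 + 2 * diff / ((1 - nu) * (1 - nv)))

def nearest_prototype(z, prototypes):
    """Index of the class prototype closest to embedding z."""
    return min(range(len(prototypes)),
               key=lambda k: poincare_distance(z, prototypes[k]))

# Hypothetical fixed prototypes for three classes, spread near the boundary.
protos = [(0.9, 0.0), (-0.45, 0.78), (-0.45, -0.78)]
print(nearest_prototype((0.5, 0.1), protos))  # 0
```

Distances grow rapidly near the boundary of the ball. This is why hyperbolic space is often used to embed hierarchical structure.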

Translated: 2023-07-28 17:06:47 | Published: 2023-07-26

# Petz recovery from subsystems in conformal field theory

arXiv: http://arxiv.org/abs/2307.14434v1

License: see link

Shreya Vardhan, Annie Y. Wei, and Yijian Zou

We probe the multipartite entanglement structure of the vacuum state of a CFT in 1+1 dimensions, using recovery operations that attempt to reconstruct the density matrix in some region from its reduced density matrices on smaller subregions. We use an explicit recovery channel known as the twirled Petz map, and study distance measures such as the fidelity, relative entropy, and trace distance between the original state and the recovered state. One setup we study in detail involves three contiguous intervals $A$, $B$ and $C$ on a spatial slice, where we can view these quantities as measuring correlations between $A$ and $C$ that are not mediated by the region $B$ that lies between them. We show that each of the distance measures is both UV finite and independent of the operator content of the CFT, and hence depends only on the central charge and the cross-ratio of the intervals. We evaluate these universal quantities numerically using lattice simulations in critical spin chain models, and derive their analytic forms in the limit where $A$ and $C$ are close using the OPE expansion. In the case where $A$ and $C$ are far apart, we find a surprising non-commutativity of the replica trick with the OPE limit. For all values of the cross-ratio, the fidelity is strictly better than a general information-theoretic lower bound in terms of the conditional mutual information. We also compare the mutual information between various subsystems in the original and recovered states, which leads to a more qualitative understanding of the differences between them. Further, we introduce generalizations of the recovery operation to more than three adjacent intervals, for which the fidelity is again universal with respect to the operator content.
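
Two of the distance measures named here, fidelity and trace distance, have standard closed forms for density matrices. The sketch below computes them with NumPy. It uses the squared Uhlmann convention for fidelity; some authors omit the square. It does not implement the twirled Petz map itself.

```python
import numpy as np

def _psd_sqrt(m):
    """Matrix square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho, sigma):
    """Uhlmann fidelity F = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = _psd_sqrt(rho)
    inner = s @ sigma @ s
    inner = (inner + inner.conj().T) / 2  # symmetrize against round-off
    w = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(w)) ** 2)

def trace_distance(rho, sigma):
    """T = (1/2) * sum of |eigenvalues| of (rho - sigma)."""
    return float(0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
sigma = np.eye(2, dtype=complex) / 2              # maximally mixed state
print(fidelity(rho, sigma), trace_distance(rho, sigma))  # both approximately 0.5
```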

Translated: 2023-07-28 16:59:00 | Published: 2023-07-26

# ProtoASNet: Dynamic Prototypes for Inherently Interpretable and Uncertainty-Aware Aortic Stenosis Classification in Echocardiography

arXiv: http://arxiv.org/abs/2307.14433v1

License: see link

Hooman Vaseli, Ang Nan Gu, S. Neda Ahmadi Amiri, Michael Y. Tsang, Andrea Fung, Nima Kondori, Armin Saadat, Purang Abolmaesumi, Teresa S. M. Tsang

Aortic stenosis (AS) is a common heart valve disease that requires accurate and timely diagnosis for appropriate treatment. Most current automatic AS severity detection methods rely on black-box models with a low level of trustworthiness, which hinders clinical adoption. To address this issue, we propose ProtoASNet, a prototypical network that directly detects AS from B-mode echocardiography videos, while making interpretable predictions based on the similarity between the input and learned spatio-temporal prototypes. This approach provides supporting evidence that is clinically relevant, as the prototypes typically highlight markers such as calcification and restricted movement of aortic valve leaflets. Moreover, ProtoASNet utilizes abstention loss to estimate aleatoric uncertainty by defining a set of prototypes that capture ambiguity and insufficient information in the observed data. This provides a reliable system that can detect and explain when it may fail. We evaluate ProtoASNet on a private dataset and the publicly available TMED-2 dataset, where it outperforms existing state-of-the-art methods with an accuracy of 80.0% and 79.7%, respectively. Furthermore, ProtoASNet provides interpretability and an uncertainty measure for each prediction, which can improve transparency and facilitate the interactive usage of deep networks to aid clinical decision-making. Our source code is available at: https://github.com/hooman007/ProtoASNet.

Translated: 2023-07-28 16:58:29 | Published: 2023-07-26

# Compressed gate characterization for quantum devices with time-correlated noise

arXiv: http://arxiv.org/abs/2307.14432v1

License: see link

M. J. Gullans, M. Caranti, A. R. Mills, and J. R. Petta

As quantum devices make steady progress towards intermediate-scale and fault-tolerant quantum computing, it is essential to develop rigorous and efficient measurement protocols that account for known sources of noise. Most existing quantum characterization protocols, such as gate set tomography and randomized benchmarking, assume that the noise acting on the qubits is Markovian. However, this assumption is often not valid, as in the case of 1/f charge noise or hyperfine nuclear spin noise. Here, we present a general framework for quantum process tomography (QPT) in the presence of time-correlated noise. We further introduce fidelity benchmarks that quantify the relative strength of different sources of Markovian and non-Markovian noise. As an application of our method, we perform a comparative theoretical and experimental analysis of silicon spin qubits. We first develop a detailed noise model that accounts for the dominant sources of noise and validate the model against experimental data. Applying our framework for time-correlated QPT, we find that the number of independent parameters needed to characterize one- and two-qubit gates can be compressed by 10x and 100x, respectively, when compared to the fully generic case. These compressions reduce the amount of tomographic measurements needed in experiment, while also significantly speeding up numerical simulations of noisy quantum circuit dynamics compared to time-dependent Hamiltonian simulation. Using this compressed noise model, we find good agreement between our theoretically predicted process fidelities and two-qubit interleaved randomized benchmarking fidelities of 99.8% measured in recent experiments on silicon spin qubits. More broadly, our formalism can be directly extended to develop efficient and scalable tuning protocols for high-fidelity control of large arrays of quantum devices with non-Markovian noise.
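
Comparing predicted process fidelities with interleaved randomized benchmarking (IRB) numbers involves two standard relations. The first is the IRB estimate of a gate's average fidelity from the reference and interleaved decay constants, `1 - (d-1)/d * (1 - p_int/p_ref)`. The second is the conversion between average gate fidelity and process fidelity, `F_avg = (d*F_pro + 1)/(d + 1)`. The sketch below encodes both. The abstract does not say exactly how the paper performs this comparison, so treat it as background only.

```python
def irb_gate_fidelity(p_ref, p_int, d):
    """Interleaved-RB estimate of a gate's average fidelity.

    p_ref, p_int : depolarizing decay constants from the reference and
                   interleaved RB fits
    d            : Hilbert-space dimension (4 for two qubits)
    """
    return 1 - (d - 1) / d * (1 - p_int / p_ref)

def process_from_average(f_avg, d):
    """Convert average gate fidelity to process (entanglement) fidelity."""
    return ((d + 1) * f_avg - 1) / d

# A two-qubit average gate fidelity of 99.8% as a process fidelity:
print(process_from_average(0.998, 4))  # about 0.9975
```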

Translated: 2023-07-28 16:58:06 | Published: 2023-07-26

# Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

arXiv: http://arxiv.org/abs/2307.14430v1

License: see link

Mayee F. Chen, Nicholas Roberts, Kush Bhatia, Jue Wang, Ce Zhang, Frederic Sala, Christopher Ré

The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of the associated data. First, using both synthetic and real data, we demonstrate that these ordered skill sets exist, and that their existence enables more advanced skills to be learned with less data when we train on their prerequisite skills. Second, using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for both continual pre-training and fine-tuning regimes, where the objective is to efficiently learn multiple skills in the former and an individual skill in the latter. On the LEGO synthetic dataset in the continual pre-training setting, Skill-It obtains 36.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, Skill-It reduces the validation loss on the target skill by 13.6% versus training on data associated with the target skill itself. We apply our skills framework on the recent RedPajama dataset to continually pre-train a 3B-parameter LM, achieving higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline approach of sampling uniformly over data sources with 3B tokens.
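
The abstract describes Skill-It as online sampling over skill mixtures guided by an ordering of skills. One way to picture such a scheme is a multiplicative-weights update. Skills that feed into high-loss skills, according to an assumed skills graph `A`, get upweighted each round. The sketch below is only in that spirit. The update rule, the step size `eta`, and the toy graph are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def update_mixture(p, A, losses, eta=0.5):
    """One multiplicative-weights step over skill mixture proportions (hypothetical rule).

    p      : current sampling proportions over k skills (sums to 1)
    A      : k x k matrix; A[i, j] > 0 if training on skill i is assumed to help skill j
    losses : current validation loss on each of the k skills
    """
    gain = A @ losses                # skills that feed many high-loss skills score higher
    w = p * np.exp(eta * gain)
    return w / w.sum()

# Hypothetical two-skill graph: skill 0 is a prerequisite of skill 1.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
p = np.array([0.5, 0.5])
losses = np.array([0.2, 0.9])
for _ in range(3):
    p = update_mixture(p, A, losses)  # in practice losses would be re-measured each round
print(p)  # skill 0 ends up with the larger share
```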

Translated: 2023-07-28 16:57:40 | Published: 2023-07-26

# Large-scale quantum approximate optimization on non-planar graphs with machine learning noise mitigation

arXiv: http://arxiv.org/abs/2307.14427v1

License: see link

Stefan H. Sack and Daniel J. Egger

Quantum computers are increasing in size and quality, but are still very noisy. Error mitigation extends the size of the quantum circuits that noisy devices can meaningfully execute. However, state-of-the-art error mitigation methods are hard to implement and the limited qubit connectivity in superconducting qubit devices restricts most applications to the hardware's native topology. Here we show a quantum approximate optimization algorithm (QAOA) on non-planar random regular graphs with up to 40 nodes enabled by a machine learning-based error mitigation. We use a swap network with careful decision-variable-to-qubit mapping and a feed-forward neural network to demonstrate optimization of a depth-two QAOA on up to 40 qubits. We observe a meaningful parameter optimization for the largest graph which requires running quantum circuits with 958 two-qubit gates. Our work emphasizes the need to mitigate samples, and not only expectation values, in quantum approximate optimization. These results are a step towards executing quantum approximate optimization at a scale that is not classically simulable. Reaching such system sizes is key to properly understanding the true potential of heuristic algorithms like QAOA.
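
The point about mitigating samples, and not only expectation values, implies that each measured bitstring is scored against the cost function. The abstract does not name the objective. QAOA on random regular graphs is commonly benchmarked on MaxCut, so the sketch below assumes MaxCut. It scores samples and computes an approximation ratio against a brute-force optimum. The brute force is only feasible for tiny graphs, and the graph and samples are hypothetical.

```python
from itertools import product

def cut_value(bits, edges):
    """Number of edges whose endpoints fall on opposite sides of the cut."""
    return sum(1 for u, v in edges if bits[u] != bits[v])

def max_cut_bruteforce(n, edges):
    """Exact optimum by enumeration; feasible only for small n (2**n bitstrings)."""
    return max(cut_value(b, edges) for b in product((0, 1), repeat=n))

def approximation_ratio(samples, edges, n):
    """Mean cut value of measured bitstrings divided by the exact optimum."""
    best = max_cut_bruteforce(n, edges)
    return sum(cut_value(s, edges) for s in samples) / (len(samples) * best)

# Hypothetical 3-regular graph on 4 nodes (the complete graph K4); its max cut is 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
samples = [(0, 1, 0, 1), (0, 0, 1, 1), (0, 0, 0, 1)]
print(approximation_ratio(samples, edges, 4))  # 11/12, about 0.917
```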

Translated: 2023-07-28 16:57:09 | Published: 2023-07-26

# Code conversion with the quantum Golay code for a universal transversal gate set

arXiv: http://arxiv.org/abs/2307.14425v1

License: see link

Matthew Sullivan

The $[[7,1,3]]$ Steane code and $[[23,1,7]]$ quantum Golay code have been identified as good candidates for fault-tolerant quantum computing via code concatenation. These two codes have transversal implementations of all Clifford gates, but require some other scheme for fault-tolerant $T$ gates. Using magic states, Clifford operations, and measurements is one common scheme, but magic state distillation can have a large overhead. Code conversion is one avenue for implementing a universal gate set fault-tolerantly without the use of magic states. Analogously to how the $[[7,1,3]]$ Steane code can be fault-tolerantly converted to and from the $[[15,1,3]]$ Reed-Muller code which has a transversal $T$ gate, the $[[23,1,7]]$ Golay code can be converted to a $[[95,1,7]]$ triorthogonal code with a transversal $T$ gate. A crucial ingredient to this procedure is the $[[49,1,5]]$ triorthogonal code, which can itself be seen as related to the self-dual $[[17,1,5]]$ 2D color code.
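
In the $[[n,k,d]]$ notation, the Steane code has 7 physical qubits, 1 logical qubit, and distance 3. It is built as a CSS code from the classical [7,4,3] Hamming code. The sketch below checks the two properties that make this work: the parity-check rows are mutually orthogonal mod 2, so the X- and Z-type stabilizers commute, and the classical minimum distance is 3. Triorthogonality, which the abstract ties to a transversal $T$ gate, is a stronger condition on triples of rows and is not checked here.

```python
from itertools import product

# Parity-check matrix of the classical [7,4,3] Hamming code; the Steane code
# uses it for both its X-type and Z-type stabilizers (a CSS construction).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def dot2(a, b):
    """Inner product over GF(2)."""
    return sum(x * y for x, y in zip(a, b)) % 2

# CSS condition: every pair of rows (including a row with itself) overlaps evenly,
# so the X- and Z-type stabilizers commute.
self_orthogonal = all(dot2(r, s) == 0 for r in H for s in H)

# Minimum distance: smallest weight of a nonzero codeword c with H c = 0 (mod 2).
codewords = [c for c in product((0, 1), repeat=7) if all(dot2(r, c) == 0 for r in H)]
d_min = min(sum(c) for c in codewords if any(c))

print(self_orthogonal, len(codewords), d_min)  # True 16 3
```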

Translated: 2023-07-28 16:56:50 | Published: 2023-07-26

# Verifiable measurement-based quantum random sampling with trapped ions

arXiv: http://arxiv.org/abs/2307.14424v1

License: see link

Martin Ringbauer, Marcel Hinsche, Thomas Feldker, Paul K. Faehrmann, Juani Bermejo-Vega, Claire Edmunds, Lukas Postler, Roman Stricker, Christian D. Marciniak, Michael Meth, Ivan Pogorelov, Rainer Blatt, Philipp Schindler, Jens Eisert, Thomas Monz, Dominik Hangleiter

Quantum computers are now on the brink of outperforming their classical counterparts. One way to demonstrate the advantage of quantum computation is through quantum random sampling performed on quantum computing devices. However, existing tools for verifying that a quantum device indeed performed the classically intractable sampling task are either impractical or not scalable to the quantum advantage regime. The verification problem thus remains an outstanding challenge. Here, we experimentally demonstrate efficiently verifiable quantum random sampling in the measurement-based model of quantum computation on a trapped-ion quantum processor. We create random cluster states, which are at the heart of measurement-based computing, up to a size of 4 x 4 qubits. Moreover, by exploiting the structure of these states, we are able to recycle qubits during the computation to sample from entangled cluster states that are larger than the qubit register. We then efficiently estimate the fidelity to verify the prepared states, both in single instances and on average, and compare our results to cross-entropy benchmarking. Finally, we study the effect of experimental noise on the certificates. Our results and techniques provide a feasible path toward a verified demonstration of a quantum advantage.
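
For reference, the cross-entropy benchmarking that the fidelity estimates are compared against is usually the linear XEB estimator, `2**n * mean(p_ideal(x)) - 1`, averaged over measured bitstrings `x`. Evaluating `p_ideal` requires classically simulating the ideal circuit, which is the scalability bottleneck the abstract mentions. The toy distribution below is hypothetical.

```python
def linear_xeb(samples, ideal_probs, n_qubits):
    """Linear cross-entropy benchmarking fidelity estimate.

    samples     : measured bitstrings from the device
    ideal_probs : dict mapping bitstring -> ideal output probability
    Returns 2**n * mean(p_ideal(x)) - 1, which is close to 1 for an ideal
    sampler of a Porter-Thomas-like distribution and 0 for uniform noise.
    """
    mean_p = sum(ideal_probs.get(s, 0.0) for s in samples) / len(samples)
    return 2 ** n_qubits * mean_p - 1

# Hypothetical 2-qubit ideal distribution.
ideal = {"00": 0.5, "01": 0.5, "10": 0.0, "11": 0.0}
print(linear_xeb(["00", "01", "00", "01"], ideal, 2))  # 1.0: samples follow the ideal
print(linear_xeb(["00", "01", "10", "11"], ideal, 2))  # 0.0: uniformly random output
```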

Translated: 2023-07-28 16:56:20 | Published: 2023-07-26

# Optimization of Image Acquisition for Earth Observation Satellites via Quantum Computing

arXiv: http://arxiv.org/abs/2307.14419v1

License: see link

Antón Makarov, Márcio M. Taddei, Eneko Osaba, Giacomo Franceschetto, Esther Villar-Rodriguez, Izaskun Oregi

Satellite image acquisition scheduling is a problem that is omnipresent in the Earth observation field; its goal is to find the optimal subset of images to be taken during a given orbit pass under a set of constraints. This problem, which can be modeled via combinatorial optimization, has been dealt with many times by the artificial intelligence and operations research communities. However, despite its inherent interest, it has been scarcely studied through the quantum computing paradigm. Taking this situation as motivation, we present in this paper two QUBO formulations for the problem, using different approaches to handle the non-trivial constraints. We compare the formulations experimentally over 20 problem instances using three quantum annealers currently available from D-Wave, as well as one of its hybrid solvers. Fourteen of the tested instances have been obtained from the well-known SPOT5 benchmark, while the remaining six have been generated ad hoc for this study. Our results show that the formulation and the ancilla handling technique are crucial to solving the problem successfully. Finally, we also provide practical guidelines on the size limits of problem instances that can be realistically solved on current quantum computers.
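
A QUBO folds constraints into the objective as penalty terms. The toy sketch below handles only the simplest case, pairwise conflicts between images. Each conflicting pair adds a quadratic penalty large enough to outweigh any priority gain. More complex constraints generally need auxiliary (ancilla) binary variables, which is the handling the abstract calls crucial and which this sketch does not cover. The instance and helper names are hypothetical, and brute force stands in for an annealer.

```python
from itertools import product

def qubo_energy(x, Q):
    """E(x) = sum over (i, j) in Q of Q[(i, j)] * x_i * x_j for a binary vector x."""
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())

def build_qubo(priorities, conflicts, penalty):
    """Maximize total priority of selected images; penalize conflicting pairs."""
    Q = {}
    for i, w in enumerate(priorities):
        Q[(i, i)] = -w                 # linear terms live on the diagonal (x_i**2 == x_i)
    for i, j in conflicts:
        Q[(i, j)] = Q.get((i, j), 0) + penalty
    return Q

def solve_bruteforce(Q, n):
    """Exhaustive minimization; a stand-in for an annealer on tiny instances."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(x, Q))

# Hypothetical toy instance: image 0 conflicts with images 1 and 2.
Q = build_qubo(priorities=[3, 2, 2], conflicts=[(0, 1), (0, 2)], penalty=10)
best = solve_bruteforce(Q, 3)
print(best, qubo_energy(best, Q))  # (0, 1, 1) -4
```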

Translated: 2023-07-28 16:56:03 | Published: 2023-07-26

# Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity

arXiv: http://arxiv.org/abs/2307.14403v1

License: see link

Matteo Ciotola, Giovanni Poggi, Giuseppe Scarpa

In recent years, deep learning has gained a leading role in the pansharpening of multiresolution images. Given the lack of ground truth data, most deep learning-based methods carry out supervised training in a reduced-resolution domain. However, models trained on downsized images tend to perform poorly on high-resolution target images. For this reason, several research groups are now turning to unsupervised training in the full-resolution domain, through the definition of appropriate loss functions and training paradigms. In this context, we have recently proposed a full-resolution training framework which can be applied to many existing architectures. Here, we propose a new deep learning-based pansharpening model that fully exploits the potential of this approach and provides cutting-edge performance. Besides architectural improvements with respect to previous work, such as the use of residual attention modules, the proposed model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data. In addition, thanks to a new fine-tuning strategy, it improves inference-time adaptation to target images. Experiments on a large variety of test images, performed in challenging scenarios, demonstrate that the proposed method compares favorably with the state of the art both in terms of numerical results and visual output. Code is available online at https://github.com/matciotola/Lambda-PNN.

Translated: 2023-07-28 16:55:45 | Published: 2023-07-26

# Non-Linear Self Augmentation Deep Pipeline for Cancer Treatment outcome Prediction

arXiv: http://arxiv.org/abs/2307.14398v1

License: see link

Francesco Rundo, Concetto Spampinato, Michael Rundo

Immunotherapy has emerged as a promising approach for treating cancer. Encouraging findings have validated the efficacy of immunotherapy medications in addressing tumors, resulting in prolonged survival rates and notable reductions in toxicity compared to conventional chemotherapy methods. However, the pool of eligible patients for immunotherapy remains relatively small, indicating a lack of comprehensive understanding of the physiological mechanisms responsible for a favorable treatment response in certain individuals while others experience limited benefits. To tackle this issue, the authors present an innovative strategy that harnesses a non-linear cellular architecture in conjunction with a deep downstream classifier. This approach aims to carefully select and enhance 2D features extracted from chest-abdomen CT images, thereby improving the prediction of treatment outcomes. The proposed pipeline has been designed to integrate seamlessly with an advanced embedded Point of Care system. In this context, the authors present a case study focused on Metastatic Urothelial Carcinoma (mUC), a particularly aggressive form of cancer. Performance evaluation of the proposed approach underscores its effectiveness, with an overall accuracy of approximately 93%.

Translated: 2023-07-28 16:55:27 | Published: 2023-07-26

# MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation

arXiv: http://arxiv.org/abs/2307.14460v1

License: see link

Reiner Birkl, Diana Wofk, Matthias Müller

We release MiDaS v3.1 for monocular depth estimation, offering a variety of new models based on different encoder backbones. This release is motivated by the success of transformers in computer vision, with a large variety of pretrained vision transformers now available. We explore how using the most promising vision transformers as image encoders impacts depth estimation quality and runtime of the MiDaS architecture. Our investigation also includes recent convolutional approaches that achieve comparable quality to vision transformers in image classification tasks. While the previous release MiDaS v3.0 solely leverages the vanilla vision transformer ViT, MiDaS v3.1 offers additional models based on BEiT, Swin, SwinV2, Next-ViT and LeViT. These models offer different performance-runtime tradeoffs. The best model improves the depth estimation quality by 28% while efficient models enable downstream tasks requiring high frame rates. We also describe the general process for integrating new backbones. A video summarizing the work can be found at https://youtu.be/UjaeNNFf9sE and the code is available at https://github.com/isl-org/MiDaS.
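
MiDaS predicts relative depth, specifically inverse depth, which is only defined up to an unknown scale and shift. Comparing a prediction with metric ground truth therefore first requires a least-squares alignment. The sketch below shows that alignment step. The function name `align_scale_shift` is hypothetical and is not part of the MiDaS repository's API. The toy arrays are illustrative.

```python
import numpy as np

def align_scale_shift(pred, target, mask=None):
    """Least-squares s, t minimizing sum((s * pred + t - target)**2) over valid pixels."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    if mask is None:
        mask = np.ones_like(pred, dtype=bool)
    else:
        mask = np.asarray(mask, dtype=bool).ravel()
    A = np.stack([pred[mask], np.ones(mask.sum())], axis=1)  # columns: [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, target[mask], rcond=None)
    return s, t

pred = np.array([[1.0, 2.0], [3.0, 4.0]])   # relative (up-to-affine) prediction
target = 2.0 * pred + 1.0                    # hypothetical ground-truth disparity
s, t = align_scale_shift(pred, target)
print(round(s, 6), round(t, 6))              # 2.0 1.0
```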

Translated: 2023-07-28 16:50:41 | Published: 2023-07-26

# Training Quantum Boltzmann Machines with Coresets

arXiv: http://arxiv.org/abs/2307.14459v1

License: see link

Joshua Viszlai, Teague Tomesh, Pranav Gokhale, Eric Anschuetz, Frederic T. Chong

Recent work has proposed and explored using coreset techniques for quantum algorithms that operate on classical data sets, to make these algorithms more practical on near-term quantum devices. We apply these ideas to Quantum Boltzmann Machines (QBM), where gradient-based steps, which require Gibbs state sampling, are the main computational bottleneck during training. By using a coreset in place of the full data set, we try to minimize the number of steps needed and accelerate the overall training time. In a regime where computational time on quantum computers is a precious resource, we argue that this could lead to substantial practical savings. We evaluate this approach on 6x6 binary images from an augmented bars and stripes data set using a QBM with 36 visible units and 8 hidden units. Using an Inception-score-inspired metric, we compare QBM training times with and without using coresets.

Translated: 2023-07-28 16:50:04 | Published: 2023-07-26

# Predictive Maintenance of Armoured Vehicles using Machine Learning Approaches

arXiv: http://arxiv.org/abs/2307.14453v1

License: see link

Prajit Sengupta, Anant Mehta, Prashant Singh Rana

Armoured vehicles are specialized and complex pieces of machinery designed to operate in high-stress environments, often in combat or tactical situations. This study proposes a predictive maintenance-based ensemble system that aids in predicting potential maintenance needs based on sensor data collected from these vehicles. The proposed ensemble combines models such as Light Gradient Boosting, Random Forest, Decision Tree, Extra Tree Classifier and Gradient Boosting to predict the maintenance requirements of the vehicles accurately. In addition, k-fold cross-validation, along with TOPSIS analysis, is employed to evaluate the proposed ensemble model's stability. The results indicate that the proposed system achieves an accuracy of 98.93%, precision of 99.80% and recall of 99.03%. The algorithm can effectively predict maintenance needs, thereby reducing vehicle downtime and improving operational efficiency. Through comparisons between various algorithms and the suggested ensemble, this study highlights the potential of machine learning-based predictive maintenance solutions.
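
TOPSIS ranks alternatives by their relative closeness to an ideal best point and their distance from an ideal worst point across several criteria. The sketch below is a minimal standard implementation. The abstract does not list the criteria or weights the study used, so the metric matrix and equal weights are hypothetical.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) over criteria (columns) with TOPSIS.

    benefit[j] is True if larger values of criterion j are better.
    Returns closeness scores in [0, 1]; higher is better.
    """
    X = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    benefit = np.asarray(benefit, dtype=bool)
    V = X / np.linalg.norm(X, axis=0) * w            # vector-normalize, then weight
    best = np.where(benefit, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_best = np.linalg.norm(V - best, axis=1)
    d_worst = np.linalg.norm(V - worst, axis=1)
    return d_worst / (d_best + d_worst)

# Hypothetical metrics for three models: accuracy, precision, recall.
scores = topsis([[0.99, 0.99, 0.98],
                 [0.95, 0.97, 0.96],
                 [0.90, 0.92, 0.91]],
                weights=[1, 1, 1], benefit=[True, True, True])
print(scores.argmax())  # 0
```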

翻訳日:2023-07-28 16:49:39 公開日:2023-07-26

# 古典確率ビットと回路を用いた量子アルゴリズムのシミュレーション

Simulation of quantum algorithms using classical probabilistic bits and circuits ( http://arxiv.org/abs/2307.14452v1 )

ライセンス: Link先を確認

D. D. Yavuz and A. Yadav

(参考訳) 古典確率ビットと回路を用いて量子アルゴリズムをシミュレートする新しい手法を提案する。各量子ビット(2準位量子系)は、まず8次元確率空間内のベクトル(すなわち8つの確率的結果を持つ古典確率変数)に写像される。この写像の鍵となる考え方は、量子ビット状態を記述する複素係数の振幅と位相の両方の情報を確率に格納することである。複数の量子系の結合と複数の確率空間の結合は同一のテンソル積構造を持つため、$n$ 量子ビットは $n$ 個の8次元確率ベクトルのテンソル積に写像される(すなわち、$2^n$ 次元のヒルベルト空間は $8^n$ 次元の確率空間に写像される)。この最初の写像の後、これらの古典確率変数に相関を誘起する演算を用いて、確率空間における単一量子ビットゲートおよび2量子ビットゲートの類似物を実装する方法を示す。確率空間への写像と、この空間における変換(すなわち確率変数に対する演算)の両方に共通する重要な特徴は、それらが線形ではなくアフィンであることである。このアーキテクチャを用いることで、量子系の $2^n$ 個の複素係数の時間発展を、多項式個の確率変数の完全に相関した同時確率で追跡できる。次に、(1) Deutsch-Jozsaアルゴリズム、(2) 量子フーリエ変換を確率空間で実装するための具体的な手順を与える。量子の場合と同様に、確率空間で量子フーリエ変換をシミュレートするには $O(n)$ 個の確率ビットと $O(n^2)$ (すなわち量子ビット数の2次)の演算が必要である。

We discuss a new approach to simulate quantum algorithms using classical probabilistic bits and circuits. Each qubit (a two-level quantum system) is initially mapped to a vector in an eight dimensional probability space (equivalently, to a classical random variable with eight probabilistic outcomes). The key idea in this mapping is to store both the amplitude and phase information of the complex coefficients that describe the qubit state in the probabilities. Due to the identical tensor product structure of combining multiple quantum systems as well as multiple probability spaces, $n$ qubits are then mapped to a tensor product of $n$ 8-dimensional probabilistic vectors (i.e., the Hilbert space of dimension $2^n$ is mapped to a probability space of dimension $8^n$). After this initial mapping, we show how to implement the analogs of single-qubit and two-qubit gates in the probability space using correlation-inducing operations on these classical random variables. The key defining feature of both the mapping to the probability space and the transformations in this space (i.e., operations on the random variables) is that they are not linear, but instead affine. Using this architecture, the evolution of the $2^n$ complex coefficients of the quantum system can be tracked in the joint fully-correlated probabilities of the polynomial number of random variables. We then give specific procedures for implementing (1) the Deutsch-Jozsa algorithm, and (2) the Quantum Fourier Transform in the probability space. Identical to the Quantum case, simulating the Quantum Fourier Transform in the probability space requires $O(n)$ probabilistic bits and $O(n^2)$ (i.e., quadratic in the number of quantum bits) operations.

翻訳日:2023-07-28 16:49:07 公開日:2023-07-26

# VISPUR: データ駆動型意思決定における見せかけの関連を特定・解釈するための視覚的支援

VISPUR: Visual Aids for Identifying and Interpreting Spurious Associations in Data-Driven Decisions ( http://arxiv.org/abs/2307.14448v1 )

ライセンス: Link先を確認

Xian Teng, Yongsu Ahn, Yu-Ru Lin

(参考訳) ビッグデータと機械学習のツールは、データ駆動型の意思決定において人間を支援してきた。しかし、その多くは、交絡因子やサブグループの異質性によって見せかけのものとなりうる経験的な関連を捉えている。有名なシンプソンのパラドックスは、集計レベルとサブグループレベルの関連が互いに矛盾し、認知的な混乱を招いて適切な解釈や意思決定を困難にする現象である。既存のツールは、人間が実際に見せかけの関連の落とし穴を見つけ、推論し、防ぐための洞察をほとんど提供しない。本稿では、見せかけの関連に取り組むための因果分析フレームワークと人間中心のワークフローを提供する視覚分析システムVISPURを提案する。これには、交絡因子の候補を自動的に特定するCONFOUNDER DASHBOARDと、因果関係の誤解釈につながりうる様々なサブグループのパターンを可視化・比較できるSUBGROUP VIEWERが含まれる。さらに,フローベースの手法を用いてパラドックス現象を説明するREASONING STORYBOARDと,説明責任のある意思決定を支援するインタラクティブなDECISION DIAGNOSISパネルを提案する。専門家インタビューと統制されたユーザ実験を通じて,提案する「脱パラドックス」ワークフローと視覚分析システムが,ユーザによる見せかけの関連の特定と理解,および説明責任のある因果的意思決定に有効であることを,定性的・定量的な結果により示す。

Big data and machine learning tools have jointly empowered humans in making data-driven decisions. However, many of them capture empirical associations that might be spurious due to confounding factors and subgroup heterogeneity. The famous Simpson's paradox is such a phenomenon where aggregated and subgroup-level associations contradict each other, causing cognitive confusion and difficulty in making adequate interpretations and decisions. Existing tools provide little insight for humans to locate, reason about, and prevent pitfalls of spurious association in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. These include a CONFOUNDER DASHBOARD, which can automatically identify possible confounding factors, and a SUBGROUP VIEWER, which allows for the visualization and comparison of diverse subgroup patterns that likely or potentially result in a misinterpretation of causality. Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, as well as an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed "de-paradox" workflow and the designed visual analytic system are effective in helping human users to identify and understand spurious associations, as well as to make accountable causal decisions.
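To make the Simpson's paradox reversal concrete, here is a small worked example using the textbook kidney-stone treatment figures, given as (successes, patients). This is a standard illustration, not data from VISPUR. Treatment A has the higher success rate within each stone-size subgroup, yet B has the higher aggregate rate, because A was mostly given the harder large-stone cases.

```python
# (successes, patients) per treatment and stone-size subgroup.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rates(groups):
    """Return per-subgroup success rates and the pooled (aggregate) rate."""
    per_group = {g: s / n for g, (s, n) in groups.items()}
    s_tot = sum(s for s, _ in groups.values())
    n_tot = sum(n for _, n in groups.values())
    return per_group, s_tot / n_tot

for treatment, groups in data.items():
    per_group, overall = rates(groups)
    print(treatment, {g: round(r, 3) for g, r in per_group.items()}, round(overall, 3))
```

Here stone size acts as the confounder: it influences both which treatment was chosen and how likely the outcome was to succeed.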

翻訳日:2023-07-28 16:48:19 公開日:2023-07-26

# 意味セグメンテーションのための自己教師付き少数ショット学習--アノテーションフリーアプローチ

Self-supervised Few-shot Learning for Semantic Segmentation: An Annotation-free Approach ( http://arxiv.org/abs/2307.14446v1 )

ライセンス: Link先を確認

Sanaz Karimijafarbigloo and Reza Azad and Dorit Merhof

(参考訳) Few-shot semantic segmentation (FSS)は、医療画像解析の分野で大きな可能性を秘めており、限られたトレーニングデータで正確なオブジェクトセグメンテーションを可能にする。しかし、既存のFSS技術は注釈付きセマンティッククラスに大きく依存しており、アノテーションの不足のため医学画像には適さない。この課題に対処するために、本研究では複数の貢献を提案する。まず、スペクトル分解法に着想を得て、画像分解の問題をグラフ分割タスクとして再定式化する。自己教師付きネットワークの特徴親和性行列から導出されるラプラシアン行列の固有ベクトルを分析し、サポート画像から関心対象の分布を推定する。次に,いかなるアノテーションにも依存しない新しい自己教師型FSSフレームワークを提案する。このフレームワークは代わりに、サポート画像から得られた固有ベクトルを利用してクエリマスクを適応的に推定する。このアプローチは手動のアノテーションの必要性を排除し、注釈付きデータに制限のある医療画像に特に適している。第3に,サポート画像が提供する情報に基づいてクエリ画像のデコードをさらに促進するために,マルチスケールの大規模カーネルアテンションモジュールを導入する。関連する特徴や詳細を選択的に強調することにより、このモジュールはセグメンテーションプロセスを改善し、より良い物体の輪郭抽出に寄与する。自然画像データセットと医用画像データセットでの評価は,本手法の効率性と有効性を示す。さらに,提案手法は汎用性とモデルに依存しない性質を特徴とし,様々な深層アーキテクチャとのシームレスな統合を実現する。コードはGitHub (https://github.com/mindflow-institue/annotation_free_fewshot) で公開されている。

Few-shot semantic segmentation (FSS) offers immense potential in the field of medical image analysis, enabling accurate object segmentation with limited training data. However, existing FSS techniques heavily rely on annotated semantic classes, rendering them unsuitable for medical images due to the scarcity of annotations. To address this challenge, multiple contributions are proposed: First, inspired by spectral decomposition methods, the problem of image decomposition is reframed as a graph partitioning task. The eigenvectors of the Laplacian matrix, derived from the feature affinity matrix of self-supervised networks, are analyzed to estimate the distribution of the objects of interest from the support images. Secondly, we propose a novel self-supervised FSS framework that does not rely on any annotation. Instead, it adaptively estimates the query mask by leveraging the eigenvectors obtained from the support images. This approach eliminates the need for manual annotation, making it particularly suitable for medical images with limited annotated data. Thirdly, to further enhance the decoding of the query image based on the information provided by the support image, we introduce a multi-scale large kernel attention module. By selectively emphasizing relevant features and details, this module improves the segmentation process and contributes to better object delineation. Evaluations on both natural and medical image datasets demonstrate the efficiency and effectiveness of our method. Moreover, the proposed approach is characterized by its generality and model-agnostic nature, allowing for seamless integration with various deep architectures. The code is publicly available on GitHub at https://github.com/mindflow-institue/annotation_free_fewshot.
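The graph-partitioning step can be illustrated with a minimal spectral bipartition sketch. It builds an affinity matrix over per-patch feature vectors, forms the unnormalised graph Laplacian L = D - W, and splits the patches by the sign of the eigenvector for the second-smallest eigenvalue (the Fiedler vector). Two choices here are assumptions rather than the paper's stated method: the Gaussian affinity, and the `sigma` bandwidth parameter, which is hypothetical. The paper may, for example, use cosine similarities of self-supervised features instead.

```python
import numpy as np

def spectral_bipartition(features, sigma=1.0):
    """Split items into two groups using the Fiedler vector of a graph Laplacian.

    features: (n, d) per-patch feature vectors (e.g. from a self-supervised backbone).
    Returns a boolean array marking one side of the partition.
    """
    F = np.asarray(features, dtype=float)
    sq = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))  # Gaussian feature-affinity matrix (assumed)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                            # unnormalised graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]              # eigenvector of the second-smallest eigenvalue
    return fiedler > 0
```

The sign of an eigenvector is arbitrary, so which group is labelled `True` can flip. A real pipeline would still need a rule for deciding which side is foreground.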

翻訳日:2023-07-28 16:47:51 公開日:2023-07-26

# 量子シミュレーションからの高密度出力

Dense outputs from quantum simulations ( http://arxiv.org/abs/2307.14441v1 )

ライセンス: Link先を確認

Jin-Peng Liu, Lin Lin

(参考訳) 量子密度出力問題(quantum dense output problem)とは、量子コンピュータを用いて、時間依存の量子ダイナミクスから時間累積された観測量を評価する過程である。この問題は量子制御や分光計算などの応用で頻繁に現れる。我々は、初期フォールトトレラントおよび完全フォールトトレラントな量子プラットフォームの両方で動作するよう設計された一連のアルゴリズムを提示する。これらの手法は振幅推定、ハミルトニアンシミュレーション、量子線形常微分方程式(ODE)ソルバー、量子カールマン線形化などの技術に基づいている。発展時間$T$と誤差許容度$\epsilon$に関する包括的な計算量解析を行う。その結果、線形化のアプローチは、ある種の低ランクな密度出力に対して、最適な計算量$\mathcal{O}(T/\epsilon)$をほぼ達成できることを示す。さらに、密度出力問題の線形化を与え、元の状態を包含する厳密かつ有限次元の閉包を得る。この定式化はクープマン不変部分空間の理論と関連しており、非線形制御や科学的機械学習において独立した関心の対象となりうる。

The quantum dense output problem is the process of evaluating time-accumulated observables from time-dependent quantum dynamics using quantum computers. This problem arises frequently in applications such as quantum control and spectroscopic computation. We present a range of algorithms designed to operate on both early and fully fault-tolerant quantum platforms. These methodologies draw upon techniques like amplitude estimation, Hamiltonian simulation, quantum linear Ordinary Differential Equation (ODE) solvers, and quantum Carleman linearization. We provide a comprehensive complexity analysis with respect to the evolution time $T$ and error tolerance $\epsilon$. Our results demonstrate that the linearization approach can nearly achieve optimal complexity $\mathcal{O}(T/\epsilon)$ for a certain type of low-rank dense outputs. Moreover, we provide a linearization of the dense output problem that yields an exact and finite-dimensional closure which encompasses the original states. This formulation is related to the Koopman Invariant Subspace theory and may be of independent interest in nonlinear control and scientific machine learning.
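For intuition about the target quantity, here is a minimal classical reference computation of a time-accumulated observable, $\int_0^T \langle\psi(t)|O|\psi(t)\rangle\,dt$ with $\psi(t) = e^{-iHt}\psi_0$, for a tiny system. It is not one of the quantum algorithms from the abstract, only a brute-force check that is feasible for small dimensions. The function name and the uniform trapezoidal grid are illustrative assumptions.

```python
import numpy as np

def time_accumulated_expectation(H, psi0, O, T, steps=2001):
    """Classically evaluate integral_0^T <psi(t)|O|psi(t)> dt for a small system.

    psi(t) = exp(-i H t) psi0 is obtained by diagonalising the Hermitian H;
    the time integral uses the trapezoidal rule on a uniform grid.
    """
    evals, V = np.linalg.eigh(H)
    c0 = V.conj().T @ psi0  # coefficients of psi0 in the eigenbasis of H
    ts = np.linspace(0.0, T, steps)
    vals = np.empty(steps)
    for k, t in enumerate(ts):
        psi_t = V @ (np.exp(-1j * evals * t) * c0)
        vals[k] = np.real(np.vdot(psi_t, O @ psi_t))
    dt = ts[1] - ts[0]
    return dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
```

As a check, take $H = (\omega/2)\sigma_z$, $\psi_0 = |+\rangle$ and $O = \sigma_x$. Then $\langle\sigma_x\rangle(t) = \cos\omega t$, so the integral should approach $\sin(\omega T)/\omega$.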

翻訳日:2023-07-28 16:47:22 公開日:2023-07-26

# 少数ショット応答生成とランキングによる対話システムのための対話行為の制御可能な生成

Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking ( http://arxiv.org/abs/2307.14440v1 )

ライセンス: Link先を確認

Angela Ramirez and Karik Agarwal and Juraj Juraska and Utkarsh Garg and Marilyn A. Walker

(参考訳) 対話システムは,複数種類の対話行為(DA)を高い意味的忠実度で実現する応答を生成する必要がある。これまで,対話用の自然言語生成器(NLG)は,ドメイン固有のDAとその意味属性を出力発話に対応づける大規模な並列コーパスで訓練されていた。最近の研究は、事前学習済み言語モデル(LLM)が、プロンプトベース学習を用いた制御可能なNLGに新たな可能性をもたらすことを示している。本研究では、DAの制御された生成を実現する、新しい少数ショットの過剰生成・ランキング(overgenerate-and-rank)手法を開発する。テキストスタイル変換の手法を用いてテキストの擬似参照から生成する新しい手法を含む、8つの少数ショットプロンプトスタイルを比較する。生成時に、正しいDAと高い意味的正確性の両方を備えた出力を識別する6つの自動ランキング関数を開発する。3つのドメインと4つのLLMでアプローチを検証する。我々の知る限り、DAと属性の正確性の両方を用いて出力を自動的にランク付けする対話用NLGの研究はこれが初めてである。完全性のため、DAごとに5から100のインスタンスで訓練した少数ショットの微調整モデルとも比較する。その結果,いくつかのプロンプト設定が完全なDA正確性とほぼ完全な意味的正確性(99.81%)を達成し,少数ショットの微調整よりも優れた性能を示した。

Dialogue systems need to produce responses that realize multiple types of dialogue acts (DAs) with high semantic fidelity. In the past, natural language generators (NLGs) for dialogue were trained on large parallel corpora that map from a domain-specific DA and its semantic attributes to an output utterance. Recent work shows that pretrained language models (LLMs) offer new possibilities for controllable NLG using prompt-based learning. Here we develop a novel few-shot overgenerate-and-rank approach that achieves the controlled generation of DAs. We compare eight few-shot prompt styles that include a novel method of generating from textual pseudo-references using a textual style transfer approach. We develop six automatic ranking functions that identify outputs with both the correct DA and high semantic accuracy at generation time. We test our approach on three domains and four LLMs. To our knowledge, this is the first work on NLG for dialogue that automatically ranks outputs using both DA and attribute accuracy. For completeness, we compare our results to fine-tuned few-shot models trained with 5 to 100 instances per DA. Our results show that several prompt settings achieve perfect DA accuracy, and near perfect semantic accuracy (99.81%) and perform better than few-shot fine-tuning.
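The abstract does not define its six ranking functions. As a toy illustration of the overgenerate-and-rank idea only, this sketch sorts candidates first by whether a dialogue-act classifier agrees with the target DA, then by the fraction of slot values realized verbatim. Everything here is hypothetical: `predicted_da` stands in for a real DA classifier, and verbatim matching is a deliberately crude proxy for semantic accuracy.

```python
def slot_accuracy(text, slots):
    """Fraction of slot values from the meaning representation that appear verbatim in text."""
    if not slots:
        return 1.0
    lowered = text.lower()
    hits = sum(1 for value in slots.values() if str(value).lower() in lowered)
    return hits / len(slots)

def rank_candidates(candidates, slots, predicted_da, target_da):
    """Order candidates by (correct DA, slot accuracy), best first.

    predicted_da: callable mapping a candidate string to a DA label
    (a hypothetical stand-in for a trained DA classifier).
    """
    def score(text):
        return (predicted_da(text) == target_da, slot_accuracy(text, slots))
    return sorted(candidates, key=score, reverse=True)

# Toy usage with a trivial rule-based "classifier".
slots = {"name": "Aurora", "genre": "strategy"}
candidates = ["Do you like strategy games?", "Aurora is a great game.", "Aurora is a strategy game."]
ranked = rank_candidates(candidates, slots, lambda t: "request" if t.endswith("?") else "inform", "inform")
```

Because the key is a tuple, a candidate with the wrong DA can never outrank one with the correct DA, however many slots it mentions.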

翻訳日:2023-07-28 16:47:06 公開日:2023-07-26

# 固定積分型ニューラルネットワーク

Fixed Integral Neural Networks ( http://arxiv.org/abs/2307.14439v1 )

ライセンス: Link先を確認

Ryan Kortvelesy

(参考訳) ニューラルネットワークで表現された学習済み関数に対して積分を行うことは、しばしば有用である。しかし、学習済み関数(特にニューラルネットワーク)に対する解析的な積分は一般に扱いにくいと見なされているため、この積分は通常数値的に行われる。本研究では、学習済み関数$f$の解析的な積分を表現する方法を提案する。これによりニューラルネットワークの厳密な積分を計算でき、積分に直接制約を課すことで制約付きニューラルネットワークをパラメータ化できる。重要な点として、多くの応用(例えば確率分布、距離関数など)で必要となる条件である、$f$を正に制約する方法も導入する。最後に,固定積分ニューラルネットワーク(FINN)を活用できるいくつかの応用を紹介する。

It is often useful to perform integration over learned functions represented by neural networks. However, this integration is usually performed numerically, as analytical integration over learned functions (especially neural networks) is generally viewed as intractable. In this work, we present a method for representing the analytical integral of a learned function $f$. This allows the exact integral of a neural network to be computed, and enables constrained neural networks to be parametrised by applying constraints directly to the integral. Crucially, we also introduce a method to constrain $f$ to be positive, a necessary condition for many applications (e.g. probability distributions, distance metrics, etc). Finally, we introduce several applications where our fixed-integral neural network (FINN) can be utilised.
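The abstract does not give the FINN construction. One simple way to get an exactly integrable, strictly positive model is to learn an antiderivative $F$ and define $f = F'$, so that $\int_a^b f = F(b) - F(a)$. This 1-D NumPy toy is an illustrative assumption, not the paper's architecture. It uses $F(x) = \sum_i a_i\,\mathrm{softplus}(w_i x + b_i)$ with $a_i, w_i > 0$, which makes $F$ monotone and therefore $f \ge 0$. The class name is hypothetical.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # log(1 + e^z), numerically stable

def sigmoid(z):
    return 0.5 * (1.0 + np.tanh(0.5 * z))  # equals 1 / (1 + e^-z), numerically stable

class PositiveIntegrableNet:
    """Toy 1-D model: f(x) = dF/dx with F(x) = sum_i a_i * softplus(w_i * x + b_i).

    a_i = exp(log_a_i) > 0 and w_i = exp(log_w_i) > 0 make F non-decreasing,
    so f(x) >= 0, and the integral of f over [lo, hi] is exactly F(hi) - F(lo).
    """

    def __init__(self, n_units=8, seed=0):
        rng = np.random.default_rng(seed)
        # Unconstrained parameters; exp() maps them to positive values.
        self.log_a = rng.normal(size=n_units)
        self.log_w = rng.normal(size=n_units)
        self.b = rng.normal(size=n_units)

    def antiderivative(self, x):
        a, w = np.exp(self.log_a), np.exp(self.log_w)
        x = np.asarray(x, dtype=float)[..., None]
        return np.sum(a * softplus(w * x + self.b), axis=-1)

    def f(self, x):
        # d/dx softplus(w x + b) = w * sigmoid(w x + b)
        a, w = np.exp(self.log_a), np.exp(self.log_w)
        x = np.asarray(x, dtype=float)[..., None]
        return np.sum(a * w * sigmoid(w * x + self.b), axis=-1)

    def integral(self, lo, hi):
        return self.antiderivative(hi) - self.antiderivative(lo)
```

Constraining the integral itself, for example normalising to 1 to obtain a density on an interval, then amounts to dividing by `integral(lo, hi)`.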

翻訳日:2023-07-28 16:46:43 公開日:2023-07-26

# 生成インパインティングによる高画質画像再構成のための表現型保存メトリック設計

Phenotype-preserving metric design for high-content image reconstruction by generative inpainting ( http://arxiv.org/abs/2307.14436v1 )

ライセンス: Link先を確認

Vaibhav Sharma, Artur Yakimovich

(参考訳) 過去数十年の間に、自動ハイコンテント顕微鏡は、表現型薬物スクリーニングやシステム生物学の応用の幅を支える大量の画像ベースデータを提供できることを示してきた。しかし、画像ベースのデータセットが大きくなるにつれて、画像中のイメージングやサンプル調製に由来するアーティファクトの存在を人間が管理し、回避し、克服することは不可能になった。機械学習や深層学習のような新しい技術は、生成的画像インペインティングによってこれらの欠点に対処できる可能性があるが、センシティブな研究データに適用すると、望ましくない画像操作という代償を伴う可能性がある。望ましくない操作は、一部の人工ニューラルネットワークが起こしやすいニューラル幻覚(hallucination)のような現象によって引き起こされうる。そこで本研究では、核を標識した培養細胞のハイコンテント蛍光顕微鏡データセットを用いて、画像修復のための最先端のインペインティング手法を評価する。DeepFill V2やEdge Connectのようなアーキテクチャは、比較的少ないデータで微調整することで顕微鏡画像を忠実に復元できることを示す。結果から、復元すべき領域の面積は形状よりも重要であることが示された。さらに,復元の品質を管理するために,新しい表現型保存メトリック設計戦略を提案する。この戦略では、細胞核のような復元された生物学的表現型のサイズと数を定量化し、望ましくない操作にペナルティを課す。このアプローチの設計原則は、他の応用にも一般化しうると考える。

In the past decades, automated high-content microscopy demonstrated its ability to deliver large quantities of image-based data powering the versatility of phenotypic drug screening and systems biology applications. However, as the sizes of image-based datasets grew, it became infeasible for humans to control, avoid and overcome the presence of imaging and sample preparation artefacts in the images. While novel techniques like machine learning and deep learning may address these shortcomings through generative image inpainting, when applied to sensitive research data this may come at the cost of undesired image manipulation. Undesired manipulation may be caused by phenomena such as neural hallucinations, to which some artificial neural networks are prone. To address this, here we evaluate the state-of-the-art inpainting methods for image restoration in a high-content fluorescence microscopy dataset of cultured cells with labelled nuclei. We show that architectures like DeepFill V2 and Edge Connect can faithfully restore microscopy images upon fine-tuning with relatively little data. Our results demonstrate that the area of the region to be restored is of higher importance than shape. Furthermore, to control for the quality of restoration, we propose a novel phenotype-preserving metric design strategy. In this strategy, the size and count of the restored biological phenotypes like cell nuclei are quantified to penalise undesirable manipulation. We argue that the design principles of our approach may also generalise to other applications.
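The abstract says the restored phenotypes' size and count are quantified to penalise undesirable manipulation, but it gives no formula. Here is a minimal sketch under stated assumptions. Nuclei are already segmented into binary masks for the reference and the restored image, objects are 4-connected components, and the penalty is the relative change in nucleus count plus the relative change in mean nucleus area. Both function names and the formula itself are hypothetical.

```python
import numpy as np
from collections import deque

def object_areas(mask):
    """Return the pixel areas of 4-connected foreground objects in a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    areas = []
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not seen[r, c]:
                seen[r, c] = True
                queue, area = deque([(r, c)]), 0
                while queue:  # breadth-first flood fill of one object
                    y, x = queue.popleft()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                areas.append(area)
    return areas

def phenotype_penalty(mask_reference, mask_restored):
    """Relative change in nucleus count plus relative change in mean nucleus area (0 = preserved)."""
    ref, res = object_areas(mask_reference), object_areas(mask_restored)
    if not ref:
        return float(len(res))  # objects hallucinated into an empty reference are penalised
    count_term = abs(len(res) - len(ref)) / len(ref)
    mean_ref = np.mean(ref)
    mean_res = np.mean(res) if res else 0.0
    area_term = abs(mean_res - mean_ref) / mean_ref
    return count_term + area_term
```

A hallucinated extra nucleus raises the count term, while a smeared or shrunken nucleus raises the area term.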

翻訳日:2023-07-28 16:46:32 公開日:2023-07-26

# 非局所情報に基づく予測による不確実性下での信頼性の高いナビゲーションの改善

Improving Reliable Navigation under Uncertainty via Predictions Informed by Non-Local Information ( http://arxiv.org/abs/2307.14501v1 )

ライセンス: Link先を確認

Raihan Islam Arnob and Gregory J. Stein

(参考訳) 未観測空間に入る時間的に拡張された行動の良さを、非局所的に利用可能な情報を用いて予測することで、部分的に地図化された環境における信頼性の高い長期的な目標指向ナビゲーションを改善する。一般に、どこへ移動すべきかを予測するには非局所情報が必要である。ロボットがこれまでに得たあらゆる観測が、特定の移動方向の良さに関する情報を与えうるからである。不確実性下での学習を援用したモデルベース計画に関する最近の研究に基づき、(グラフニューラルネットワークを介して)非局所情報に基づいて予測を行いつつ、設計上信頼性の高いアプローチを提案する。すなわち、学習が正確な予測を与えない場合でも、常にゴールに到達する。良好な性能に非局所情報が必要となる3つのシミュレーション環境で実験を行う。実際のフロアプランから実寸で生成した大規模な大学建物環境において,学習を用いないベースラインと比べてcost-to-goを9.3%削減し,局所情報のみを予測に利用できる学習型プランナーと比べて14.9%削減することを示す。

We improve reliable, long-horizon, goal-directed navigation in partially-mapped environments by using non-locally available information to predict the goodness of temporally-extended actions that enter unseen space. Making predictions about where to navigate in general requires non-local information: any observations the robot has seen so far may provide information about the goodness of a particular direction of travel. Building on recent work in learning-augmented model-based planning under uncertainty, we present an approach that can both rely on non-local information to make predictions (via a graph neural network) and is reliable by design: it will always reach its goal, even when learning does not provide accurate predictions. We conduct experiments in three simulated environments in which non-local information is needed to perform well. In our large scale university building environment, generated from real-world floorplans to the scale, we demonstrate a 9.3% reduction in cost-to-go compared to a non-learned baseline and a 14.9% reduction compared to a learning-informed planner that can only use local information to inform its predictions.

翻訳日:2023-07-28 16:39:13 公開日:2023-07-26

# デジタル情報エンゲージメントの予測モデル:認知バイアス、計算言語学、自然言語処理を組み込んだ英単語へのユーザエンゲージメントの予測

A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing ( http://arxiv.org/abs/2307.14500v1 )

ライセンス: Link先を確認

Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan Yang and Jennifer Romano

(参考訳) 本研究では,デジタル情報エンゲージメント(IE)の新しい予測モデルであるREADモデルを紹介し,実証的に検証する。READは,魅力的な情報の4つの重要な属性,すなわち代表性(Representativeness),使いやすさ(Ease-of-use),感情(Affect),分布(Distribution)の頭字語である。累積プロスペクト理論の理論的枠組みの中で概念化されたこのモデルは、主要な認知バイアスを計算言語学や自然言語処理と統合し、情報エンゲージメントに関する多次元的な視点を構築する。WordNetデータベースからランダムに選んだ50組の同義語(計100語)を用いる厳密なテストプロトコルを実施した。これらの単語のエンゲージメントレベルを大規模なオンライン調査(n = 80,500)で評価し,経験的なIE指標を導出した。次に各単語のREAD属性を計算し,その予測効果を検証した。その結果はREADモデルの頑健性を裏付けるもので,単語のIEレベルを正確に予測し,同義語のペアのうちより魅力的な単語を84%の精度で識別できた。READモデルの可能性は,ビジネス,教育,政府,医療など様々な領域に広がり,コンテンツのエンゲージメントを高め,AI言語モデルの開発や生成テキストの作業に知見を与えうる。今後の研究では,異なる領域や言語にわたるモデルのスケーラビリティと適応性に取り組み,その適用性と有効性を広げるべきである。

This study introduces and empirically tests a novel predictive model for digital information engagement (IE) - the READ model, an acronym for the four pivotal attributes of engaging information: Representativeness, Ease-of-use, Affect, and Distribution. Conceptualized within the theoretical framework of Cumulative Prospect Theory, the model integrates key cognitive biases with computational linguistics and natural language processing to develop a multidimensional perspective on information engagement. A rigorous testing protocol was implemented, involving 50 randomly selected pairs of synonymous words (100 words in total) from the WordNet database. These words' engagement levels were evaluated through a large-scale online survey (n = 80,500) to derive empirical IE metrics. The READ attributes for each word were then computed and their predictive efficacy examined. The findings affirm the READ model's robustness, accurately predicting a word's IE level and distinguishing the more engaging word from a pair of synonyms with an 84% accuracy rate. The READ model's potential extends across various domains, including business, education, government, and healthcare, where it could enhance content engagement and inform AI language model development and generative text work. Future research should address the model's scalability and adaptability across different domains and languages, thereby broadening its applicability and efficacy.

翻訳日:2023-07-28 16:38:53 公開日:2023-07-26

# HUGE: TPUを使った巨大な教師なしグラフ埋め込み

HUGE: Huge Unsupervised Graph Embeddings with TPUs ( http://arxiv.org/abs/2307.14490v1 )

ライセンス: Link先を確認

Brandon Mayer, Anton Tsitsulin, Hendrik Fichtenberger, Jonathan Halcrow, Bryan Perozzi

(参考訳) グラフは、オブジェクトの集合間の関係をキャプチャする構造化データの表現である。利用可能なネットワークデータの普及に伴い、数十億のノードと数兆のエッジを持つグラフを素早く分析する産業や学術的なニーズが高まっている。ネットワーク理解のための一般的な第一歩は、グラフ内のノードを連続的に表現するプロセスであるGraph Embeddingである。連続表現は、特に大規模において、分類、リンク予測、クラスタリングといった下流の機械学習タスクを解決するために、しばしばより効果的である。テンソル処理ユニット(TPU)と高帯域幅メモリを併用した高性能グラフ埋め込みアーキテクチャを提案し,グラフ埋め込み問題を単純化し,数十億のノードと数兆のエッジを持つグラフにスケール可能である。本研究では,実および合成大規模データセットの組込み空間品質を検証する。

Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is Graph Embedding, the process of creating a continuous representation of nodes in a graph. A continuous representation is often more amenable, especially at scale, for solving downstream machine learning tasks such as classification, link prediction, and clustering. A high-performance graph embedding architecture leveraging Tensor Processing Units (TPUs) with configurable amounts of high-bandwidth memory is presented that simplifies the graph embedding problem and can scale to graphs with billions of nodes and trillions of edges. We verify the embedding space quality on real and synthetic large-scale datasets.

翻訳日:2023-07-28 16:38:26 公開日:2023-07-26

# SuperInpaint: 超解像画像インペインティングのための詳細強調型注意付き暗黙的表現の学習

SuperInpaint: Learning Detail-Enhanced Attentional Implicit Representation for Super-resolutional Image Inpainting ( http://arxiv.org/abs/2307.14489v1 )

ライセンス: Link先を確認

Canyu Zhang, Qing Guo, Xiaoguang Li, Renjie Wan, Hongkai Yu, Ivor Tsang, Song Wang

(参考訳) 本研究では,低解像度画像の欠損領域を再構成し,任意のより高い解像度で補完画像を生成することを目的とした,SuperInpaintと呼ぶ挑戦的な画像復元タスクを導入する。このタスクは,最先端の超解像手法と画像インペインティング手法を積み重ねても効果的に解決できないことが分かった。両者が互いの欠陥を増幅し,目立つアーティファクトを生じるためである。これらの制約を克服するため,単一のモデルでSuperInpaintを実現し,任意の解像度で高品質な補完画像を生成する,詳細強調型注意付き暗黙的表現(DEAR)を提案する。具体的には,深層畳み込みネットワークを用いて入力画像の潜在埋め込みを抽出し,適応型ハイパスフィルタによって潜在埋め込みの高周波成分を強調する。これにより、詳細が強調された意味埋め込みが得られる。さらに,無効なマスク画素からの埋め込みを抑制するアンマスク注意モジュールに意味埋め込みを入力する。加えて,画像再構成にどの画素を使用すべきかを示す画素単位の重要度マップを抽出する。再構成したい画素の座標が与えられると、まず入力画像中のその近傍画素を集め、それらの詳細強調意味埋め込み、アンマスク注意意味埋め込み、重要度、および目的画素までの空間距離を抽出する。次に、これらすべてを暗黙的表現に入力し、指定された画素の色を生成する。提案手法を評価するため,既存の3つのデータセットをこの新しいタスク向けに拡張し,SOTAのインペインティング手法と超解像手法を用いて18の有意義なベースラインを構築した。広範な実験結果から,本手法は広く用いられている4つの指標において,既存のすべての手法を大きく上回ることが示された。

In this work, we introduce a challenging image restoration task, referred to as SuperInpaint, which aims to reconstruct missing regions in low-resolution images and generate completed images with arbitrarily higher resolutions. We have found that this task cannot be effectively addressed by stacking state-of-the-art super-resolution and image inpainting methods as they amplify each other's flaws, leading to noticeable artifacts. To overcome these limitations, we propose the detail-enhanced attentional implicit representation (DEAR) that can achieve SuperInpaint with a single model, resulting in high-quality completed images with arbitrary resolutions. Specifically, we use a deep convolutional network to extract the latent embedding of an input image and then enhance the high-frequency components of the latent embedding via an adaptive high-pass filter. This leads to detail-enhanced semantic embedding. We further feed the semantic embedding into an unmask-attentional module that suppresses embeddings from ineffective masked pixels. Additionally, we extract a pixel-wise importance map that indicates which pixels should be used for image reconstruction. Given the coordinates of a pixel we want to reconstruct, we first collect its neighboring pixels in the input image and extract their detail-enhanced semantic embeddings, unmask-attentional semantic embeddings, importance values, and spatial distances to the desired pixel. Then, we feed all the above terms into an implicit representation and generate the color of the specified pixel. To evaluate our method, we extend three existing datasets for this new task and build 18 meaningful baselines using SOTA inpainting and super-resolution methods. Extensive experimental results demonstrate that our method outperforms all existing methods by a significant margin on four widely used metrics.

翻訳日:2023-07-28 16:38:12 公開日:2023-07-26

# ShinyAnimalCV: オブジェクト検出、セグメンテーション、およびコンピュータビジョンを用いた動物の3次元可視化のためのオープンソースのクラウドベースのWebアプリケーション

Technical note: ShinyAnimalCV: open-source cloud-based web application for object detection, segmentation, and three-dimensional visualization of animals using computer vision ( http://arxiv.org/abs/2307.14487v1 )

ライセンス: Link先を確認

Jin Wang, Yu Hu, Lirong Xiang, Gota Morota, Samantha A. Brooks, Carissa L. Wickens, Emily K. Miller-Cushon, and Haipeng Yu

(参考訳) 非侵襲的で費用対効果の高い技術であるコンピュータビジョン(CV)は、タイムリーかつ個別化された動物ケアを通じて意思決定の最適化を可能にし、精密畜産の発展を推進してきた。安価な2次元および3次元カメラセンサーと様々な機械学習・深層学習アルゴリズムの組み合わせは、家畜生産システムを改善する貴重な機会をもたらした。しかし、パブリックドメインで様々なCVツールが利用可能であるにもかかわらず、これらのツールを動物データに適用することは難しく、多くの場合、ユーザにはプログラミングとデータ分析のスキル、そして計算資源へのアクセスが求められる。さらに、精密畜産の急速な拡大により、動物科学の学生にCVを教育・訓練する必要性が高まっている。これは、CVに関わる複雑なアルゴリズムを効率的に示すという課題を教育者に突きつける。そこで本研究では,オープンソースのクラウドベースWebアプリケーションであるShinyAnimalCVを開発した。本アプリケーションは,物体のセグメンテーション,検出,3次元表面の可視化,2次元および3次元の形態的特徴の抽出など,CVタスクを実行するためのユーザフレンドリーなインタフェースを提供する。本アプリケーションには、上方視点の動物データを用いて事前訓練された9つのCVモデルが含まれている。ShinyAnimalCVはクラウドコンピューティングプラットフォームを用いてオンラインで公開されている。ShinyAnimalCVのソースコードはGitHubで公開されており、カスタムデータを用いたCVモデルの訓練方法や、アプリケーションの機能を十分に活用するためにShinyAnimalCVをローカルにデプロイする方法に関する詳細なドキュメントも提供されている。ShinyAnimalCVは動物科学コミュニティにおけるCVの研究と教育に貢献できる。

Computer vision (CV), a non-intrusive and cost-effective technology, has furthered the development of precision livestock farming by enabling optimized decision-making through timely and individualized animal care. The availability of affordable two- and three-dimensional camera sensors, combined with various machine learning and deep learning algorithms, has provided a valuable opportunity to improve livestock production systems. However, despite the availability of various CV tools in the public domain, applying these tools to animal data can be challenging, often requiring users to have programming and data analysis skills, as well as access to computing resources. Moreover, the rapid expansion of precision livestock farming is creating a growing need to educate and train animal science students in CV. This presents educators with the challenge of efficiently demonstrating the complex algorithms involved in CV. Thus, the objective of this study was to develop ShinyAnimalCV, an open-source cloud-based web application. This application provides a user-friendly interface for performing CV tasks, including object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features. Nine pre-trained CV models using top-view animal data are included in the application. ShinyAnimalCV has been deployed online using cloud computing platforms. The source code of ShinyAnimalCV is available on GitHub, along with detailed documentation on training CV models using custom data and deploying ShinyAnimalCV locally to allow users to fully leverage the capabilities of the application. ShinyAnimalCV can contribute to CV research and teaching in the animal science community.

翻訳日:2023-07-28 16:37:39 公開日:2023-07-26

# 自動セグメンテーションモデルの汎化における画像取得と患者表現型の変動の役割

Role of Image Acquisition and Patient Phenotype Variations in Automatic Segmentation Model Generalization ( http://arxiv.org/abs/2307.14482v1 )

ライセンス: Link先を確認

Timothy L. Kline, Sumana Ramanathan, Harrison C. Gottlich, Panagiotis Korfiatis, Adriana V. Gregory

(参考訳) 目的: 医用画像セグメンテーションの自動化モデルのドメイン外性能と汎化能力を評価し、特に新たな画像取得条件と疾患タイプへの適応に焦点を当てた。材料: 健常者および多発性嚢胞腎(PKD)患者の非造影および造影腹部CTのデータセットを用いた。腎臓,肝臓,脾臓をセグメンテーションするモデルの訓練・検証に計400枚の画像(非造影対照100枚,造影対照100枚,非造影PKD100枚,造影PKD100枚)を使用し,最終モデルをPKD患者の非造影CT画像100枚で検証した。性能はDice, Jaccard, TPR, Precisionで評価した。結果: 多様なデータで訓練したモデルは、ドメイン内データでテストした場合、ドメイン内データのみで訓練したモデルより性能が劣ることはなかった。例えば、各データセットから25%ずつを用いて訓練したモデルのDice類似度は、ドメイン内データのみで訓練したモデルに対して非劣性であった。結論: より幅広い訓練例によってモデルの汎化とドメイン外性能が大幅に向上し、臨床現場における自動セグメンテーションツールの適用性が高まることが示された。本研究の知見は、医用画像AIモデル開発においてデータ中心のアプローチを採用する今後の研究の道筋を示す。

Purpose: This study evaluated the out-of-domain performance and generalization capabilities of automated medical image segmentation models, with a particular focus on adaptation to new image acquisitions and disease type. Materials: Datasets from both non-contrast and contrast-enhanced abdominal CT scans of healthy patients and those with polycystic kidney disease (PKD) were used. A total of 400 images (100 non-contrast controls, 100 contrast controls, 100 non-contrast PKD, 100 contrast PKD) were utilized for training/validation of models to segment kidneys, livers, and spleens, and the final models were then tested on 100 non-contrast CT images of patients affected by PKD. Performance was evaluated using Dice, Jaccard, TPR, and Precision. Results: Models trained on a diverse range of data showed no worse performance than models trained exclusively on in-domain data when tested on in-domain data. For instance, the Dice similarity of the model trained on 25% from each dataset was found to be non-inferior to the model trained purely on in-domain data. Conclusions: The results indicate that broader training examples significantly enhance model generalization and out-of-domain performance, thereby improving automated segmentation tools' applicability in clinical settings. The study's findings provide a roadmap for future research to adopt a data-centric approach in medical image AI model development.
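The four evaluation metrics named above have standard definitions in terms of true positives (TP), false positives (FP) and false negatives (FN) over a binary mask. Here is a minimal NumPy sketch. The function name is illustrative, and empty masks, where a denominator becomes zero, are not handled.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Dice, Jaccard, TPR (recall) and precision for binary segmentation masks."""
    p = np.asarray(pred, dtype=bool)
    t = np.asarray(truth, dtype=bool)
    tp = np.logical_and(p, t).sum()
    fp = np.logical_and(p, ~t).sum()
    fn = np.logical_and(~p, t).sum()
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "tpr": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }
```

Dice and Jaccard are monotonically related (Dice = 2J / (1 + J)), so they rank models identically. TPR and precision separate under-segmentation from over-segmentation.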

翻訳日:2023-07-28 16:37:11 公開日:2023-07-26

# リザバー学習の限界

Limits to Reservoir Learning ( http://arxiv.org/abs/2307.14474v1 )

ライセンス: Link先を確認

Anthony M. Polloreno

(参考訳) 本研究では,物理性から導かれる計算上の制約に基づいて,機械の学習能力に上界を与える。まず,信号の集まりの、完全な関数基底に対する期待二乗誤差の正規化尺度である情報処理能力(IPC)を検討する。IPCを用いて、物理的考察による制約下での、リカレントネットワークの一種であるリザバーコンピュータの性能のノイズによる劣化を測る。まず,$n$個の出力信号の点ごとの積として可能な$2^n$個の集まりを考えた場合でも,IPCは高々システムサイズ$n$の多項式であることを示す。次に,この劣化は,リザバーで表される関数族を学習するには,リザバーのノイズ存在下で指数関数的な数のサンプルが必要であることを意味すると論じる。最後に,ノイズのない場合に同じ$2^n$個の関数の集まりを二値分類に用いたときの性能について議論して締めくくる。

In this work, we bound a machine's ability to learn based on computational limitations implied by physicality. We start by considering the information processing capacity (IPC), a normalized measure of the expected squared error of a collection of signals to a complete basis of functions. We use the IPC to measure the degradation under noise of the performance of reservoir computers, a particular kind of recurrent network, when constrained by physical considerations. First, we show that the IPC is at most a polynomial in the system size $n$, even when considering the collection of $2^n$ possible pointwise products of the $n$ output signals. Next, we argue that this degradation implies that the family of functions represented by the reservoir requires an exponential number of samples to learn in the presence of the reservoir's noise. Finally, we conclude with a discussion of the performance of the same collection of $2^n$ functions without noise when being used for binary classification.
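To make the IPC concrete, here is a minimal sketch of the capacity of a linear readout for a single target signal, $C = 1 - \min_w \lVert z - Xw\rVert^2 / \lVert z\rVert^2$. Summing $C$ over a complete orthonormal set of targets gives the IPC. This finite-sample least-squares form is an assumed simplification, and the function name is illustrative. Under this definition the sum over an orthonormal basis equals the trace of the projection onto the column space of `states`, i.e. its rank. The total therefore cannot exceed the number of readout signals.

```python
import numpy as np

def capacity(states, target):
    """Capacity of a linear readout to reconstruct `target` from reservoir `states`.

    C = 1 - min_w ||target - states @ w||^2 / ||target||^2, which lies in [0, 1].
    states: (T, n) matrix of reservoir outputs; target: (T,) signal.
    """
    X = np.asarray(states, dtype=float)
    z = np.asarray(target, dtype=float)
    w, *_ = np.linalg.lstsq(X, z, rcond=None)  # least-squares readout weights
    residual = z - X @ w
    return 1.0 - residual @ residual / (z @ z)
```

For example, with 5 random readout signals over 50 time steps, summing `capacity` over the 50 standard basis targets returns 5 (up to rounding).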

翻訳日:2023-07-28 16:36:43 公開日:2023-07-26

# ML APIに必要な契約の種類

What Kinds of Contracts Do ML APIs Need? ( http://arxiv.org/abs/2307.14465v1 )

ライセンス: Link先を確認

Samantha Syeda Khairunnesa, Shibbir Ahmed, Sayem Mohammad Imtiaz, Hridesh Rajan, Gary T. Leavens

(参考訳) 最近の研究により、機械学習(ML)プログラムはエラーを起こしやすいことが示され、MLコードに対する契約(コントラクト)が求められている。契約は,契約による設計(design by contract)の方法論のように,APIのドキュメント化とAPIユーザによる正しいコードの記述を支援する。問題は、どのような種類の契約がAPIユーザにとって最も役に立つかである。我々は特に、MLパイプラインのより早い段階でAPIユーザがエラーを捕捉するのに役立つ契約の種類に関心がある。TensorFlow、Scikit-learn、Keras、PyTorchという最も頻繁に議論される4つのMLライブラリに関するStack Overflowの投稿の実証的研究について述べる。これらのライブラリについて、413件の非形式的な(英語の)API仕様を抽出した。これらの仕様を用いて次の問いを検討した。ML契約違反の背後にある根本原因と影響は何か? ML契約違反に共通のパターンはあるか? ML契約の理解に高度なMLソフトウェアの専門知識が必要になるのはどのような場合か? APIレベルで契約をチェックすることは、MLパイプラインの初期段階での違反の検出に役立つか? 我々の主な発見は、ML APIで最も一般的に必要とされる契約が、APIの単一引数に対する制約のチェック、またはAPI呼び出しの順序のチェックのいずれかであるということである。ソフトウェア工学コミュニティは、既存の契約マイニング手法を用いてこれらの契約をマイニングし、ML APIの理解を促進できる。また、振る舞い契約と時相契約のマイニング手法を組み合わせる必要性も指摘した。契約言語の設計者の助けとなりうる、必要なML契約のカテゴリについて報告する。

Recent work has shown that Machine Learning (ML) programs are error-prone and called for contracts for ML code. Contracts, as in the design by contract methodology, help document APIs and aid API users in writing correct code. The question is: what kinds of contracts would provide the most help to API users? We are especially interested in what kinds of contracts help API users catch errors at earlier stages in the ML pipeline. We describe an empirical study of posts on Stack Overflow of the four most often-discussed ML libraries: TensorFlow, Scikit-learn, Keras, and PyTorch. For these libraries, our study extracted 413 informal (English) API specifications. We used these specifications to understand the following questions. What are the root causes and effects behind ML contract violations? Are there common patterns of ML contract violations? When does understanding ML contracts require an advanced level of ML software expertise? Could checking contracts at the API level help detect the violations in early ML pipeline stages? Our key findings are that the most commonly needed contracts for ML APIs are either checking constraints on single arguments of an API or on the order of API calls. The software engineering community could employ existing contract mining approaches to mine these contracts to promote an increased understanding of ML APIs. We also noted a need to combine behavioral and temporal contract mining approaches. We report on categories of required ML contracts, which may help designers of contract languages.
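The two most common contract kinds reported above can be made concrete as runtime checks:

- a constraint on a single argument;
- a constraint on the order of API calls.

The class `ToyScaler` and the exception `ContractViolation` below are hypothetical. They are not part of TensorFlow, Scikit-learn, Keras or PyTorch. This sketch only shows how such checks raise an error at the offending call rather than later in the ML pipeline.

```python
class ContractViolation(Exception):
    """Raised when a caller breaks an API contract."""


class ToyScaler:
    """Hypothetical ML-style API illustrating argument and call-order contracts."""

    def __init__(self, n_features):
        # Single-argument contract: n_features must be a positive int.
        if not isinstance(n_features, int) or n_features <= 0:
            raise ContractViolation("n_features must be a positive integer")
        self.n_features = n_features
        self._means = None  # None means "not fitted yet"

    def fit(self, rows):
        # Single-argument contract: a non-empty batch of rows of the right width.
        if not rows or any(len(r) != self.n_features for r in rows):
            raise ContractViolation("rows must be non-empty with n_features values each")
        self._means = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def transform(self, rows):
        # Temporal (call-order) contract: fit() must precede transform().
        if self._means is None:
            raise ContractViolation("fit() must be called before transform()")
        return [[v - m for v, m in zip(r, self._means)] for r in rows]


scaler = ToyScaler(2)
try:
    scaler.transform([[1.0, 2.0]])
except ContractViolation as err:
    print("caught early:", err)
print(scaler.fit([[1.0, 2.0], [3.0, 4.0]]).transform([[1.0, 2.0]]))  # [[-1.0, -1.0]]
```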

翻訳日:2023-07-28 16:36:29 公開日:2023-07-26

# U-Net Spiking Neural Networkを用いた単一チャネル音声強調

Single Channel Speech Enhancement Using U-Net Spiking Neural Networks ( http://arxiv.org/abs/2307.14464v1 )

ライセンス: Link先を確認

Abir Riahi and Éric Plourde

(参考訳) 音声強調(SE)は、信頼性の高い通信デバイスや頑健な音声認識システムに不可欠である。従来の人工ニューラルネットワーク(ANN)はSEで顕著な性能を示してきたが、相当な計算能力と高いエネルギーコストを必要とする。本稿では、U-Netアーキテクチャに基づくスパイキングニューラルネットワーク(SNN)を用いたSEの新しいアプローチを提案する。SNNは音声のような時間次元を持つデータの処理に適しており、ニューロモルフィックハードウェア上でエネルギー効率よく実装できることで知られている。したがって、SNNはリソースが限られたデバイス上でのリアルタイムアプリケーションの有望な候補である。本研究の主な目的は、SEのための最先端のANNモデルと同等の性能を持つSNNベースのモデルを開発することである。代理勾配(surrogate gradient)に基づく最適化を用いて深層SNNを訓練し、異なる信号対雑音比および実環境の雑音条件の下で、知覚的客観評価テストによりその性能を評価する。その結果、提案するエネルギー効率の高いSNNモデルは、Intel Neuromorphic Deep Noise Suppression Challenge (Intel N-DNS Challenge)のベースライン手法を上回り、同等のANNモデルと比較して許容できる性能を達成することが示された。

Speech enhancement (SE) is crucial for reliable communication devices or robust speech recognition systems. Although conventional artificial neural networks (ANN) have demonstrated remarkable performance in SE, they require significant computational power, along with high energy costs. In this paper, we propose a novel approach to SE using a spiking neural network (SNN) based on a U-Net architecture. SNNs are suitable for processing data with a temporal dimension, such as speech, and are known for their energy-efficient implementation on neuromorphic hardware. SNNs are thus interesting candidates for real-time applications on devices with limited resources. The primary objective of the current work is to develop an SNN-based model with comparable performance to a state-of-the-art ANN model for SE. We train a deep SNN using surrogate-gradient-based optimization and evaluate its performance using perceptual objective tests under different signal-to-noise ratios and real-world noise conditions. Our results demonstrate that the proposed energy-efficient SNN model outperforms the Intel Neuromorphic Deep Noise Suppression Challenge (Intel N-DNS Challenge) baseline solution and achieves acceptable performance compared to an equivalent ANN model.
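Surrogate-gradient training keeps the non-differentiable spike function in the forward pass. In the backward pass it substitutes a smooth derivative for that function. The sketch below makes three assumptions that the abstract does not state:

- a leaky integrate-and-fire (LIF) neuron with soft reset;
- a fast-sigmoid surrogate;
- illustrative values for `beta`, `threshold` and `slope`.

The paper's actual neuron model and surrogate shape may differ.

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron over time.

    Returns (spikes, membrane potentials before reset).
    """
    v = 0.0
    spikes, potentials = [], []
    for x in inputs:
        v = beta * v + x                 # leaky integration of the input current
        potentials.append(v)
        s = 1 if v >= threshold else 0   # non-differentiable Heaviside step
        spikes.append(s)
        v -= s * threshold               # soft reset by subtraction after a spike
    return spikes, potentials


def surrogate_grad(v, threshold=1.0, slope=10.0):
    """Fast-sigmoid surrogate for d(spike)/dv, used in the backward pass."""
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2


spikes, _ = lif_forward([0.6, 0.6, 0.6])
print(spikes)  # [0, 1, 0]
```

The surrogate peaks at the threshold and decays away from it. Gradients can therefore flow through potentials that came close to spiking, even though the true derivative of the step function is zero almost everywhere.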

翻訳日:2023-07-28 16:36:07 公開日:2023-07-26

# 対照学習を用いた超音波ガイド下脳腫瘍切除術のためのマルチモーダル解剖学的ランドマーク検出に向けて

Towards multi-modal anatomical landmark detection for ultrasound-guided brain tumor resection with contrastive learning ( http://arxiv.org/abs/2307.14523v1 )

ライセンス: Link先を確認

Soorena Salari, Amirhossein Rasoulian, Hassan Rivaz and Yiming Xiao

(参考訳) 医用画像間の相同な解剖学的ランドマークは、超音波ガイド下脳腫瘍切除における組織偏位補正のためのMRI-超音波レジストレーションなど、さまざまな臨床応用において画像レジストレーションの品質を定量的に評価するのに役立つ。MRIと超音波(US)の間で手動で同定したランドマーク対は、このタスクにおけるさまざまなレジストレーションアルゴリズムの検証を大きく促進してきたが、この手順にはかなりの専門知識、労力、時間が必要であり、評価者間および評価者内の不一致が生じやすい。これまで、解剖学的ランドマーク検出のために多くの従来手法や機械学習手法が提案されてきたが、それらは主に単一モダリティの応用に焦点を当てている。残念ながら、臨床的なニーズがあるにもかかわらず、モダリティ間/コントラスト間のランドマーク検出が試みられることはほとんどなかった。そこで我々は、脳神経外科においてMRIと術中USスキャンの間で対応するランドマークを検出するための、新しい対照学習フレームワークを提案する。具体的には、2つの畳み込みニューラルネットワークを共同で訓練してMRIとUSスキャンの画像特徴を符号化し、MRI中のランドマークに対応するUS画像パッチの照合を支援する。公開RESECTデータベースを用いて手法の開発と検証を行った。平均ランドマーク検出精度は、SIFT特徴を用いた場合の18.78±4.77 mmに対して5.88±4.79 mmであり、提案手法は脳神経外科応用におけるMRI-USランドマーク検出に対して初めて有望な結果を示した。

Homologous anatomical landmarks between medical scans are instrumental in quantitative assessment of image registration quality in various clinical applications, such as MRI-ultrasound registration for tissue shift correction in ultrasound-guided brain tumor resection. While manually identified landmark pairs between MRI and ultrasound (US) have greatly facilitated the validation of different registration algorithms for the task, the procedure requires significant expertise, labor, and time, and can be prone to inter- and intra-rater inconsistency. So far, many traditional and machine learning approaches have been presented for anatomical landmark detection, but they primarily focus on mono-modal applications. Unfortunately, despite the clinical needs, inter-modal/contrast landmark detection has very rarely been attempted. Therefore, we propose a novel contrastive learning framework to detect corresponding landmarks between MRI and intra-operative US scans in neurosurgery. Specifically, two convolutional neural networks were trained jointly to encode image features in MRI and US scans to help match the US image patch that contains the corresponding landmarks in the MRI. We developed and validated the technique using the public RESECT database. With a mean landmark detection accuracy of 5.88 ± 4.79 mm against 18.78 ± 4.77 mm with SIFT features, the proposed method offers promising results for MRI-US landmark detection in neurosurgical applications for the first time.
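Contrastive matching of MRI and US patches is often trained with an InfoNCE-style loss. Under that loss, each MRI landmark patch should be most similar to its own US patch among all patches in the batch. The abstract does not state which contrastive loss the authors use. The loss, the cosine similarity and the `temperature` value below are therefore assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(mri_emb, us_emb, temperature=0.1):
    """Mean InfoNCE loss.

    mri_emb[i] and us_emb[i] are a matching landmark pair; every other
    US patch in the batch serves as a negative.
    """
    total = 0.0
    for i, q in enumerate(mri_emb):
        logits = [cosine(q, k) / temperature for k in us_emb]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true match
    return total / len(mri_emb)


mri = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(mri, [[1.0, 0.0], [0.0, 1.0]]) < info_nce(mri, [[0.0, 1.0], [1.0, 0.0]]))  # True
```

At inference, one would take as the match the US patch whose embedding has the highest cosine similarity to the MRI landmark embedding.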

翻訳日:2023-07-28 16:30:26 公開日:2023-07-26

# CliniDigest: 大規模言語モデルによる臨床試験記述の大規模要約の事例研究

CliniDigest: A Case Study in Large Language Model Based Large-Scale Summarization of Clinical Trial Descriptions ( http://arxiv.org/abs/2307.14522v1 )

ライセンス: Link先を確認

Renee D. White, Tristan Peng, Pann Sripitak, Alexander Rosenberg Johansen, Michael Snyder (Stanford University)

(参考訳) 臨床試験は新しい生物医学的介入を評価する研究である。新しい試験を設計するために、研究者は進行中および完了済みの試験から着想を得る。2022年には、ClinicalTrials.govに毎日平均100件以上の臨床試験が登録され、各試験の記述は平均約1500語であった[1]。このため、最新の状況を把握し続けることはほぼ不可能である。この問題を軽減するため、GPT-3.5を用いたバッチ型臨床試験要約ツールCliniDigestを作成した。CliniDigestは、我々の知る限り、臨床試験のリアルタイムで正確かつ包括的な要約を提供できる最初のツールである。CliniDigestは、最大85件の臨床試験記述(約10,500語)を、参照付きで幻覚の少ない簡潔な200語の要約にまとめることができる。27の医学サブドメインに分けた457件の臨床試験を要約する能力についてCliniDigestを評価した。各分野について、CliniDigestは$\mu=153,\ \sigma=69$語の要約を生成し、各要約は$\mu=54\%,\ \sigma=30\%$のソースを利用している。より包括的な評価を計画しており、本稿でその概要を述べる。

A clinical trial is a study that evaluates new biomedical interventions. To design new trials, researchers draw inspiration from current and completed trials. In 2022, there were on average more than 100 clinical trials submitted to ClinicalTrials.gov every day, with each trial having a mean of approximately 1500 words [1]. This makes it nearly impossible to keep up to date. To mitigate this issue, we have created a batch clinical trial summarizer called CliniDigest using GPT-3.5. CliniDigest is, to our knowledge, the first tool able to provide real-time, truthful, and comprehensive summaries of clinical trials. CliniDigest can reduce up to 85 clinical trial descriptions (approximately 10,500 words) into a concise 200-word summary with references and limited hallucinations. We have tested CliniDigest on its ability to summarize 457 trials divided across 27 medical subdomains. For each field, CliniDigest generates summaries of $\mu=153,\ \sigma=69$ words, each of which utilizes $\mu=54\%,\ \sigma=30\%$ of the sources. A more comprehensive evaluation is planned and outlined in this paper.

翻訳日:2023-07-28 16:29:58 公開日:2023-07-26

# 車両照明のパターン:カメラを用いた車両光データセットとメトリクスのキュレーションとアノテーションの複雑さに対処する

Patterns of Vehicle Lights: Addressing Complexities in Curation and Annotation of Camera-Based Vehicle Light Datasets and Metrics ( http://arxiv.org/abs/2307.14521v1 )

ライセンス: Link先を確認

Ross Greer, Akshay Gopalkrishnan, Maitrayee Keskar, Mohan Trivedi

(参考訳) 本稿では、コンピュータビジョンにおける車両灯火の表現と、それが自動運転分野のさまざまなタスクに与える影響について考察する。バウンディングボックス、中心点、コーナーポイント、セグメンテーションマスクなど、車両灯火を表現するためのさまざまな仕様について、その長所と短所の観点から論じる。車両灯火の検出の恩恵を受ける自動運転の重要なタスクとして、夜間車両検出、3次元車両向き推定、動的な軌跡の手がかりの3つを挙げる。各タスクでは、灯火の異なる表現が必要となる場合がある。データ駆動モデルの訓練のための大規模データセットの収集とアノテーションの課題にも取り組み、その結果としてLISA Vehicle Lights Datasetと関連するLight Visibility Modelを導入する。これらは、車両検出、意図・軌跡予測、安全な経路計画といった下流アプリケーション向けに特別に設計された灯火アノテーションを提供する。既存の車両灯火データセットの比較を行い、各データセットの固有の特徴と限界を明らかにする。全体として本稿は、車両灯火の表現と、自動運転アプリケーションにおいて効果的な検出モデルを訓練するための正確なアノテーションの重要性について洞察を与える。データセットとモデルはhttps://cvrr.ucsd.edu/vehicle-lights-datasetで公開している。

This paper explores the representation of vehicle lights in computer vision and its implications for various tasks in the field of autonomous driving. Different specifications for representing vehicle lights, including bounding boxes, center points, corner points, and segmentation masks, are discussed in terms of their strengths and weaknesses. Three important tasks in autonomous driving that can benefit from vehicle light detection are identified: nighttime vehicle detection, 3D vehicle orientation estimation, and dynamic trajectory cues. Each task may require a different representation of the light. The challenges of collecting and annotating large datasets for training data-driven models are also addressed, leading to the introduction of the LISA Vehicle Lights Dataset and associated Light Visibility Model, which provides light annotations specifically designed for downstream applications in vehicle detection, intent and trajectory prediction, and safe path planning. A comparison of existing vehicle light datasets is provided, highlighting the unique features and limitations of each dataset. Overall, this paper provides insights into the representation of vehicle lights and the importance of accurate annotations for training effective detection models in autonomous driving applications. Our dataset and model are made available at https://cvrr.ucsd.edu/vehicle-lights-dataset
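Two of the simpler representations listed above can be derived from corner points. The helper names below are hypothetical and the coordinates are made up for illustration. The sketch shows that an axis-aligned box, or a single center point, discards the slant and shape that the corner points retain.

```python
def corners_to_box(corners):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) enclosing a light's corner points."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def corners_to_center(corners):
    """Center point as the mean of the corner points."""
    n = len(corners)
    return (sum(x for x, _ in corners) / n, sum(y for _, y in corners) / n)


tail_light = [(10, 20), (30, 18), (32, 28), (12, 30)]  # a slanted quadrilateral, in pixels
print(corners_to_box(tail_light))     # (10, 18, 32, 30)
print(corners_to_center(tail_light))  # (21.0, 24.0)
```

The box is easy to annotate and to regress, but it cannot tell a slanted light from an upright one. That slant is information an orientation estimator may need.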

翻訳日:2023-07-28 16:29:39 公開日:2023-07-26

# FocalErrorNet: 不確実性を考慮した焦点変調ネットワークによる超音波ガイド下神経外科手術におけるモダリティ間レジストレーション誤差推定

FocalErrorNet: Uncertainty-aware focal modulation network for inter-modal registration error estimation in ultrasound-guided neurosurgery ( http://arxiv.org/abs/2307.14520v1 )

ライセンス: Link先を確認

Soorena Salari, Amirhossein Rasoulian, Hassan Rivaz and Yiming Xiao

(参考訳) 脳腫瘍切除では、機能的に重要な領域(eloquent領域)を温存しつつ癌組織を正確に除去することが、治療の安全性と成果にとって極めて重要である。しかし、術中の組織変形(脳シフトと呼ばれる)は手術標的を移動させ、術前計画を無効にしうる。脳シフトを追跡するためのリアルタイム画像を提供する目的で術中超音波(iUS)が採用されており、術前計画を更新するためにはモダリティ間(すなわちMRI-iUS)レジストレーションがしばしば必要となる。手術中のレジストレーション結果の品質管理は有害な結果を避けるために重要であるが、3次元可視化の難しさとiUSの低コントラストのため、手動による検証には大きな困難がある。この問題に対処する自動アルゴリズムが緊急に必要とされているが、この問題はほとんど取り組まれてこなかった。そこで我々は、脳腫瘍手術におけるMRI-iUSレジストレーション誤差を正確に評価するために、3次元焦点変調と不確実性推定を組み合わせた新しい深層学習手法を提案する。公開RESECT臨床データベースを用いて開発・検証したこのアルゴリズムは、0.59±0.57 mmの推定誤差を達成できる。

In brain tumor resection, accurate removal of cancerous tissues while preserving eloquent regions is crucial to the safety and outcomes of the treatment. However, intra-operative tissue deformation (called brain shift) can move the surgical target and render the pre-surgical plan invalid. Intra-operative ultrasound (iUS) has been adopted to provide real-time images to track brain shift, and inter-modal (i.e., MRI-iUS) registration is often required to update the pre-surgical plan. Quality control for the registration results during surgery is important to avoid adverse outcomes, but manual verification faces great challenges due to difficult 3D visualization and the low contrast of iUS. Automatic algorithms are urgently needed to address this issue, but the problem has rarely been addressed. Therefore, we propose a novel deep learning technique based on 3D focal modulation in conjunction with uncertainty estimation to accurately assess MRI-iUS registration errors for brain tumor surgery. Developed and validated with the public RESECT clinical database, the resulting algorithm can achieve an estimation error of 0.59 ± 0.57 mm.

翻訳日:2023-07-28 16:29:18 公開日:2023-07-26

# 解釈可能な部分プロトタイプ画像分類器の評価のためのCo-12レシピ

The Co-12 Recipe for Evaluating Interpretable Part-Prototype Image Classifiers ( http://arxiv.org/abs/2307.14517v1 )

ライセンス: Link先を確認

Meike Nauta and Christin Seifert

(参考訳) 解釈可能な部分プロトタイプモデルは、設計上説明可能なコンピュータビジョンモデルである。このモデルはプロトタイプ的な部分を学習し、画像中のこれらの構成要素を認識することで、分類と説明を組み合わせる。本質的に解釈可能なモデルへの近年の注目にもかかわらず、解釈可能な部分プロトタイプモデルの説明品質の評価に関する包括的な概観は存在しない。arXiv:2201.08164で導入された説明品質のCo-12特性(例えば、正しさ、完全性、コンパクト性)に基づいて、部分プロトタイプモデルを評価する既存研究をレビューし、研究ギャップを明らかにし、部分プロトタイプモデルの説明品質を評価するための今後のアプローチを概説する。したがって本稿は、解釈可能な部分プロトタイプモデルという比較的新しい研究分野の進展と成熟に寄与する。さらに、部分プロトタイプモデルの評価に関する知見を簡潔にまとめた「Co-12チートシート」も提供する。

Interpretable part-prototype models are computer vision models that are explainable by design. The models learn prototypical parts and recognise these components in an image, thereby combining classification and explanation. Despite the recent attention to intrinsically interpretable models, there is no comprehensive overview on evaluating the explanation quality of interpretable part-prototype models. Based on the Co-12 properties for explanation quality as introduced in arXiv:2201.08164 (e.g., correctness, completeness, compactness), we review existing work that evaluates part-prototype models, reveal research gaps and outline future approaches for evaluation of the explanation quality of part-prototype models. This paper, therefore, contributes to the progression and maturity of this relatively new research field on interpretable part-prototype models. We additionally provide a "Co-12 cheat sheet" that acts as a concise summary of our findings on evaluating part-prototype models.

翻訳日:2023-07-28 16:28:55 公開日:2023-07-26

# 機械学習ベースのシステムにおけるバグの特性分析

Bug Characterization in Machine Learning-based Systems ( http://arxiv.org/abs/2307.14512v1 )

ライセンス: Link先を確認

Mohammad Mehdi Morovati, Amin Nikanjam, Florian Tambon, Foutse Khomh, Zhen Ming (Jack) Jiang

(参考訳) さまざまな分野、特に安全性が重要な領域で機械学習(ML)の適用が急速に拡大するにつれ、信頼性の高いMLコンポーネント、すなわちMLに基づいて動作するソフトウェアコンポーネントの必要性が高まっている。MLベースのシステムにおけるバグの特性と保守上の課題を理解することは、最もエラーが発生しやすいコンポーネントや最も一般的なバグなどについての知見を提供し、これらのシステムの開発者が保守とテストの労力をどこに集中すべきかを特定する助けとなる。本稿では、MLベースのソフトウェアシステムにおけるバグの特性と、保守の観点から見たMLバグと非MLバグの違いを調査する。最も人気のある3つのMLフレームワーク、すなわちTensorFlow、Keras、PyTorchのいずれかを使用した447,948のGitHubリポジトリを抽出した。複数のフィルタリング段階を経て、クローズされたイシュー数が最も多い上位300のリポジトリを選択した。抽出したリポジトリを手作業で調査し、MLベースでないシステムを除外した。本調査では、特定したMLベースのシステムで報告された386件のサンプルイシューを手作業で検査し、それらがMLコンポーネントに影響するかどうかを判定した。分析の結果、MLベースのシステムで報告された実際の問題のほぼ半数がMLバグであり、MLコンポーネントが非MLコンポーネントよりもエラーを起こしやすいことが示された。次に、特定した109件のMLバグを詳細に調べ、その根本原因と症状を特定し、修正に要した時間を算出した。その結果、MLバグは、バグ修正の複雑さ(コミット数、変更ファイル数、変更コード行数)の観点から、非MLバグとは大きく異なる特性を持つことも明らかになった。結果から、MLバグの修正は非MLバグよりもコストが高く、MLコンポーネントは非MLコンポーネントよりもエラーを起こしやすい。したがって、MLベースのシステムでは、MLコンポーネントの信頼性に十分な注意を払うことが不可欠である。

The rapid growth in applying Machine Learning (ML) in different domains, especially in safety-critical areas, increases the need for reliable ML components, i.e., software components operating based on ML. Understanding bug characteristics and maintenance challenges in ML-based systems can help developers of these systems identify where to focus maintenance and testing efforts, by giving insights into the most error-prone components, most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the difference between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we selected the top 300 repositories with the highest number of closed issues. We manually investigated the extracted repositories to exclude non-ML-based systems. Our investigation involved a manual inspection of 386 sampled reported issues in the identified ML-based systems to determine whether they affect ML components or not. Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. Next, we thoroughly examined 109 identified ML bugs to identify their root causes and symptoms, and calculated their required fixing time. The results also revealed that ML bugs have significantly different characteristics compared to non-ML bugs, in terms of the complexity of bug-fixing (number of commits, changed files, and changed lines of code). Based on our results, fixing ML bugs is more costly and ML components are more error-prone, compared to non-ML bugs and non-ML components respectively. Hence, paying significant attention to the reliability of the ML components is crucial in ML-based systems.

翻訳日:2023-07-28 16:28:37 公開日:2023-07-26

# Words That Stick: 認知バイアスと計算言語学を用いた意思決定と同義語へのエンゲージメントの予測

Words That Stick: Predicting Decision Making and Synonym Engagement Using Cognitive Biases and Computational Linguistics ( http://arxiv.org/abs/2307.14511v1 )

ライセンス: Link先を確認

Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan Yang, Jennifer Romano

(参考訳) 本研究は、認知心理学と情報システム研究に基づき、デジタルプラットフォーム上でのユーザエンゲージメントと意思決定を予測する。自然言語処理(NLP)技術と認知バイアス研究の知見を用いて、デジタルコンテンツ内の同義語とのユーザインタラクションを探究する。本手法では、代表性(Representativeness)、使いやすさ(Ease-of-use)、感情(Affect)、分布(Distribution)という4つの認知バイアスをREADモデルに統合する。包括的なユーザ調査を通じて、ユーザエンゲージメントを予測するモデルの能力を評価し、中核となる考えを正確に表し、理解しやすく、感情的な反応を引き起こし、よく目にする同義語が、より大きなユーザエンゲージメントを促すことを発見した。重要なことに、本研究は人間とコンピュータのインタラクション、デジタル行動、意思決定プロセスに新たな視点を提供する。結果は、認知バイアスがユーザエンゲージメントの強力な指標となりうることを示し、教育やマーケティングなどの分野で効果的なデジタルコンテンツを設計する上でのその重要性を強調している。

This research draws upon cognitive psychology and information systems studies to anticipate user engagement and decision-making on digital platforms. By employing natural language processing (NLP) techniques and insights from cognitive bias research, we delve into user interactions with synonyms within digital content. Our methodology synthesizes four cognitive biases (Representativeness, Ease-of-use, Affect, and Distribution) into the READ model. Through a comprehensive user survey, we assess the model's ability to predict user engagement, discovering that synonyms that accurately represent core ideas, are easy to understand, elicit emotional responses, and are commonly encountered, promote greater user engagement. Crucially, our work offers a fresh lens on human-computer interaction, digital behaviors, and decision-making processes. Our results highlight the promise of cognitive biases as potent indicators of user engagement, underscoring their significance in designing effective digital content across fields like education and marketing.

翻訳日:2023-07-28 16:28:06 公開日:2023-07-26

# ロボットタッチの注意: 頑健なSim-to-Real触覚制御のための触覚サリエンシー予測

Attention of Robot Touch: Tactile Saliency Prediction for Robust Sim-to-Real Tactile Control ( http://arxiv.org/abs/2307.14510v1 )

ライセンス: Link先を確認

Yijiong Lin, Mauro Comi, Alex Church, Dandan Zhang, Nathan F. Lepora

(参考訳) 高解像度の触覚センシングは、接触の多いロボットタスクにおいて局所的な接触に関する正確な情報を提供できる。しかし、そのようなタスクを非構造化環境に展開することは、まだ十分に研究されていない。非構造化環境における触覚ロボット制御の頑健性を向上させるため、神経科学における人間の触覚注意メカニズムとコンピュータビジョンにおける視覚サリエンシー予測問題に着想を得た、ロボットタッチのための新しい概念である触覚サリエンシー(tactile saliency)を提案し、検討する。視覚サリエンシーと同様に、この概念は触覚センサーが捉えた触覚画像中の重要な情報を特定することを含む。視覚サリエンシーのデータセットは一般に人間によってアノテーションされるが、触覚画像はパターンが直観に反するため、手作業でラベル付けすることは困難である。この課題に対処するため、相互に関連する3つのネットワークからなる新しいアプローチを提案する。1) 目標特徴とノイズ特徴を含む実際の触覚画像における変形を局所化する接触深度マップを生成する接触深度ネットワーク(ConDepNet)、2) 入力された接触深度マップについて目標領域を記述する触覚サリエンシーマップを予測する触覚サリエンシーネットワーク(TacSalNet)、3) TacSalNetを訓練するためのノイズ特徴を生成する触覚ノイズ生成器(TacNGen)である。妨害物(ディストラクタ)が存在する状況での接触姿勢推定とエッジ追従の実験結果は、実際の触覚画像から目標特徴を正確に予測できることを示している。全体として、我々の触覚サリエンシー予測アプローチは、未知の妨害物がある環境での頑健なSim-to-Real触覚制御を可能にする。プロジェクトページ: https://sites.google.com/view/tactile-saliency/

High-resolution tactile sensing can provide accurate information about local contact in contact-rich robotic tasks. However, the deployment of such tasks in unstructured environments remains under-investigated. To improve the robustness of tactile robot control in unstructured environments, we propose and study a new concept: *tactile saliency* for robot touch, inspired by the human touch attention mechanism from neuroscience and the visual saliency prediction problem from computer vision. In analogy to visual saliency, this concept involves identifying key information in tactile images captured by a tactile sensor. While visual saliency datasets are commonly annotated by humans, manually labelling tactile images is challenging due to their counterintuitive patterns. To address this challenge, we propose a novel approach composed of three interrelated networks: 1) a Contact Depth Network (ConDepNet), which generates a contact depth map to localize deformation in a real tactile image that contains target and noise features; 2) a Tactile Saliency Network (TacSalNet), which predicts a tactile saliency map to describe the target areas for an input contact depth map; and 3) a Tactile Noise Generator (TacNGen), which generates noise features to train the TacSalNet. Experimental results in contact pose estimation and edge-following in the presence of distractors showcase the accurate prediction of target features from real tactile images. Overall, our tactile saliency prediction approach enables robust sim-to-real tactile control in environments with unknown distractors. Project page: https://sites.google.com/view/tactile-saliency/.

翻訳日:2023-07-28 16:27:49 公開日:2023-07-26

# 純状態からの熱量子クエンチダイナミクスの再構成

Reconstructing Thermal Quantum Quench Dynamics from Pure States ( http://arxiv.org/abs/2307.14508v1 )

ライセンス: Link先を確認

Jason Saroni, Henry Lamm, Peter P. Orth, Thomas Iadecola

(参考訳) 熱状態の非平衡ダイナミクスのシミュレーションは、高エネルギー物理から物性物理に至るスケールにわたる基本的な問題である。量子コンピュータはこの問題を効率的に解く手段を提供するかもしれない。量子コンピュータ上で熱状態を準備することは困難だが、都合のよい基底における時間依存行列要素の重み付き和を計算することでこれを回避する方法が存在する。基底状態の数は大きくなりうるが、本研究では、重みの大きい密度行列要素のみをシミュレートし、密度行列を指定した精度で捉えることで、その数を削減できることを示す。ハミルトニアンの対称性を活用すると、さらなる削減が可能になる。このアプローチは、近い将来の量子ハードウェア上でのより正確な熱状態ダイナミクスのシミュレーションへの道を開く。

Simulating the nonequilibrium dynamics of thermal states is a fundamental problem across scales from high energy to condensed matter physics. Quantum computers may provide a way to solve this problem efficiently. Preparing a thermal state on a quantum computer is challenging, but there exist methods to circumvent this by computing a weighted sum of time-dependent matrix elements in a convenient basis. While the number of basis states can be large, in this work we show that it can be reduced by simulating only the largest density matrix elements by weight, capturing the density matrix to a specified precision. Leveraging Hamiltonian symmetries enables further reductions. This approach paves the way to more accurate thermal-state dynamics simulations on near-term quantum hardware.
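The truncation idea can be sketched for a special case. Assume the thermal density matrix is diagonal in the chosen basis, for example the energy eigenbasis. Then each basis state carries a Boltzmann weight $e^{-\beta E}/Z$. The method in the abstract works more generally, with density-matrix elements in any convenient basis. The function below and its parameters (`beta`, `eps`) are illustrative assumptions, not the authors' algorithm.

```python
import math


def truncated_thermal_basis(energies, beta, eps):
    """Indices of basis states whose Boltzmann weights, taken largest first,
    sum to at least 1 - eps of the total weight.

    Assumes a density matrix that is diagonal in this basis, so each state's
    weight is exp(-beta * E) / Z.
    """
    e0 = min(energies)  # shift energies so exp() cannot overflow
    weights = [math.exp(-beta * (e - e0)) for e in energies]
    z = sum(weights)
    order = sorted(range(len(energies)), key=lambda i: weights[i], reverse=True)
    kept, captured = [], 0.0
    for i in order:
        kept.append(i)
        captured += weights[i] / z
        if captured >= 1.0 - eps:
            break
    return kept, captured


kept, captured = truncated_thermal_basis([0.0, 1.0, 2.0, 3.0, 4.0], beta=2.0, eps=0.01)
print(kept)  # [0, 1, 2]
```

At low temperature (large `beta`) only a few states are needed. As `beta` approaches 0, the weights flatten and nearly every state must be kept. This is why the savings depend on temperature.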

翻訳日:2023-07-28 16:27:21 公開日:2023-07-26

# ADHD、ディスレクシア、または注意持続時間の短い学生のための人工知能を活用した速読ツール

Speed Reading Tool Powered by Artificial Intelligence for Students with ADHD, Dyslexia, or Short Attention Span ( http://arxiv.org/abs/2307.14544v1 )

ライセンス: Link先を確認

Megat Irfan Zackry Bin Ismail, Ahmad Nazran bin Yusri, Muhammad Hafizzul Bin Abdul Manap, Muhammad Muizzuddin Bin Kamarozaman

(参考訳) 本稿では、ディスレクシア、ADHD、注意持続時間の短い学生が、あらゆるテキストベースの情報をより効率的に理解できるよう支援する新しいアプローチを提案する。提案手法は、複雑なテキスト処理と要約タスクに多層パーセプトロン(MLP)アルゴリズムを利用する。このツールは、すべてのNLPタスクをテキスト生成タスクとして扱うHugging FaceのT5(Text-to-Text Transfer Transformer)モデルを活用する。モデルはより小さなデータセットを用いて特定のタスク向けにファインチューニングされる。テキストを文のリストに分割するためにNLTKのPunkt Sentence Tokenizerを使用する。アプリケーションは、軽量なWebサーバ兼フレームワークであるFlaskを用いて提供される。また、このツールは読みやすさを高めるためにBionic Readingの原則を適用しており、太字化機能と、行・単語・文字間隔の調整を含む。本稿では、AIベースの速読ツールの方法論、実装、結果について論じる。

This paper presents a novel approach to assist students with dyslexia, ADHD, and short attention span in digesting any text-based information more efficiently. The proposed solution utilizes the Multilayer Perceptron (MLP) algorithm for complex text processing and summarization tasks. The tool leverages the T5 (Text-to-Text Transfer Transformer) model from Hugging Face, which treats every NLP task as a text generation task. The model is fine-tuned on specific tasks using a smaller dataset. NLTK's Punkt Sentence Tokenizer is used to divide a text into a list of sentences. The application is served using Flask, a lightweight web server and framework. The tool also applies principles from Bionic Reading to enhance readability, which includes a bolding function and adjustments to line, word, and character spacing. The paper discusses the methodology, implementation, and results of the AI-based speed reading tool.
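The Bionic Reading bolding function mentioned above can be approximated by emphasizing the first part of each word. The abstract does not specify the tool's exact rule. The following choices are assumptions:

- the fraction of letters bolded;
- the use of Markdown `**` markers;
- matching only ASCII letters.

```python
import math
import re


def bionic_bold(text, fraction=0.5):
    """Bold the leading part of each ASCII word with Markdown ** markers."""
    def bold_word(match):
        word = match.group(0)
        k = max(1, math.ceil(len(word) * fraction))  # number of letters to emphasize
        return f"**{word[:k]}**{word[k:]}"
    return re.sub(r"[A-Za-z]+", bold_word, text)


print(bionic_bold("Speed reading is fun."))  # **Spe**ed **read**ing **i**s **fu**n.
```

Punctuation and spacing pass through unchanged. Raising `fraction` bolds more of each word.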

翻訳日:2023-07-28 16:18:23 公開日:2023-07-26

# Plug and Pray: マルチモーダルモデルの既製コンポーネントの悪用

Plug and Pray: Exploiting off-the-shelf components of Multi-Modal Models ( http://arxiv.org/abs/2307.14539v1 )

ライセンス: Link先を確認

Erfan Shayegani, Yue Dong, Nael Abu-Ghazaleh

(参考訳) 大規模言語モデル(LLM)に追加のモダリティ(例えば視覚)を組み込むことが急速に広まり人気が高まっていることで、重大なセキュリティ上の懸念が生じている。このモダリティの拡張は、家にドアを増やすのと同様に、意図せずして敵対的攻撃のための複数の侵入口を生み出す。本稿では、敵対的埋め込み空間攻撃を導入することにより、公開されている事前学習済みエンコーダなどの既製コンポーネントをプラグアンドプレイ方式でマルチモーダルシステムに組み込むことに起因する脆弱性を明らかにする。既存の研究とは対照的に、本アプローチはマルチモーダルシステムの重みやパラメータへのアクセスを必要とせず、代わりにそのような事前学習済みエンコーダの、十分に探索されていない巨大な埋め込み空間を利用する。提案する埋め込み空間攻撃では、これらの事前学習済みコンポーネントの広大な埋め込み空間の中で、危険な領域や標的とする領域に位置する入力画像を探索する。こうして作られた敵対的画像は「コンテキスト汚染(Context Contamination)」と「隠れたプロンプトインジェクション(Hidden Prompt Injection)」という2つの大きな脅威をもたらし、いずれもLLaVAのようなマルチモーダルモデルを侵害し、関連する言語モデルの振る舞いを完全に変えうる。本研究の結果は、堅牢なセキュリティを確保するため、基盤となるコンポーネント、特に事前学習済みエンコーダを、プラグアンドプレイ方式でシステムに組み込む前に包括的に検査する必要があることを強調している。

The rapid growth and increasing popularity of incorporating additional modalities (e.g., vision) into large language models (LLMs) have raised significant security concerns. This expansion of modality, akin to adding more doors to a house, unintentionally creates multiple access points for adversarial attacks. In this paper, by introducing adversarial embedding space attacks, we emphasize the vulnerabilities present in multi-modal systems that originate from incorporating off-the-shelf components like public pre-trained encoders in a plug-and-play manner into these systems. In contrast to existing work, our approach does not require access to the multi-modal system's weights or parameters but instead relies on the huge under-explored embedding space of such pre-trained encoders. Our proposed embedding space attacks involve seeking input images that reside within the dangerous or targeted regions of the extensive embedding space of these pre-trained components. These crafted adversarial images pose two major threats, 'Context Contamination' and 'Hidden Prompt Injection', both of which can compromise multi-modal models like LLaVA and fully change the behavior of the associated language model. Our findings emphasize the need for a comprehensive examination of the underlying components, particularly pre-trained encoders, before incorporating them into systems in a plug-and-play manner to ensure robust security.
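At its core, an embedding space attack optimizes over the encoder's input space. It moves an image's embedding toward a chosen target region without touching the language model. The sketch below makes several simplifications, all of them illustrative assumptions:

- a toy linear map `W` on a three-pixel "image" replaces the real pre-trained vision encoder;
- the attack uses projected gradient descent on the squared embedding distance;
- the target, learning rate and step count are arbitrary.

A real attack would differentiate through the actual encoder with automatic differentiation.

```python
def encode(W, x):
    """Toy linear 'encoder': embedding = W @ x (stand-in for a frozen pre-trained encoder)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def embedding_attack(W, x0, target, lr=0.1, steps=500):
    """Projected gradient descent on ||W x - target||^2 with pixels clipped to [0, 1].

    Only the encoder is used; no downstream language-model weights are needed.
    """
    x = list(x0)
    for _ in range(steps):
        r = [e - t for e, t in zip(encode(W, x), target)]  # residual in embedding space
        # The gradient of ||r||^2 with respect to x is 2 * W^T r.
        grad = [2 * sum(W[k][j] * r[k] for k in range(len(W))) for j in range(len(x))]
        x = [min(1.0, max(0.0, xj - lr * gj)) for xj, gj in zip(x, grad)]
    return x


W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
target = encode(W, [0.8, 0.2, 0.6])  # embedding of some "target" image
x_adv = embedding_attack(W, [0.0, 0.0, 0.0], target)
print(sq_dist(encode(W, x_adv), target))  # close to 0
```

The resulting `x_adv` differs from the image that produced the target. Yet it maps to the same embedding, so anything downstream of the encoder cannot tell them apart.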

翻訳日:2023-07-28 16:18:08 公開日:2023-07-26

# カーネルスペクトルの修正による幅の広いニューラルネットワークの帰納バイアスの制御

Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum ( http://arxiv.org/abs/2307.14531v1 )

ライセンス: Link先を確認

Amnon Geifman, Daniel Barzilai, Ronen Basri and Meirav Galun

(参考訳) 幅の広いニューラルネットワークは特定の関数を学習するよう偏っており、これは勾配降下法(GD)の収束速度と、有限の訓練時間でGDにより到達可能な関数の両方に影響する。そのため、手元のタスクに応じてこのバイアスを修正できる手法が強く求められている。そこで我々は、閉形式が知られていない所望の固有値を持つカーネルを近似するために使用できる、新しい構成カーネルのファミリーである修正スペクトルカーネル(MSK)を導入する。幅の広いニューラルネットワークとニューラルタンジェントカーネルの間の双対性を利用し、GDの軌道を変化させる前処理付き勾配降下法を提案する。その結果、最終的な解を変えることなく、多項式的な、場合によっては指数的な訓練の高速化が可能になる。本手法は計算効率が高く、実装も容易である。

Wide neural networks are biased towards learning certain functions, influencing both the rate of convergence of gradient descent (GD) and the functions that are reachable with GD in finite training time. As such, there is a great need for methods that can modify this bias according to the task at hand. To that end, we introduce Modified Spectrum Kernels (MSKs), a novel family of constructed kernels that can be used to approximate kernels with desired eigenvalues for which no closed form is known. We leverage the duality between wide neural networks and Neural Tangent Kernels and propose a preconditioned gradient descent method, which alters the trajectory of GD. As a result, this allows for a polynomial and, in some cases, exponential training speedup without changing the final solution. Our method is both computationally efficient and simple to implement.

翻訳日:2023-07-28 16:17:44 公開日:2023-07-26

# 混合メンバーシップ確率的ブロックモデルにおける最適推定

Optimal Estimation in Mixed-Membership Stochastic Block Models ( http://arxiv.org/abs/2307.14530v1 )

ライセンス: Link先を確認

Fedor Noskov and Maxim Panov

(参考訳) コミュニティ検出は、現代のネットワーク科学において最も重要な問題の一つである。その応用は、タンパク質のモデリングからソーシャルネットワーク分析まで、さまざまな分野に見られる。近年、ネットワークの各ノードが複数のコミュニティに属しうる、重なりのあるコミュニティ検出の問題を研究する論文が多く現れている。本研究では、Airoldi et al. (2008) によって初めて提案された混合メンバーシップ確率的ブロックモデル(MMSB)を考える。MMSBは、グラフにおける重なりのあるコミュニティ構造をモデル化するための非常に一般的な枠組みを提供する。本論文の中心的な問いは、観測されたネットワークからコミュニティ間の関係を再構成することである。さまざまなアプローチを比較し、推定誤差に関するミニマックス下界を確立する。次に、この下界に一致する新しい推定量を提案する。理論的結果は、考慮するモデルに関するかなり一般的な条件の下で証明される。最後に、一連の実験で理論を例示する。

Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers appeared studying the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider the Mixed-Membership Stochastic Block Model (MMSB) first proposed by Airoldi et al. (2008). MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.
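A common way to write the MMSB generative process is as follows. Each node draws a membership vector $\theta_i$ from a Dirichlet distribution. An edge between nodes $i$ and $j$ then appears with probability $\theta_i^\top B \theta_j$, where $B$ holds the community-to-community connection probabilities. The sketch samples from this undirected simplification. The original Airoldi et al. (2008) formulation instead draws per-pair community indicators, and its marginal edge probability has the same form. The parameter values here are illustrative, not taken from the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a membership vector from Dirichlet(alpha) by normalizing Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def edge_probability(theta_i, theta_j, B):
    """Edge probability theta_i^T B theta_j."""
    K = len(B)
    return sum(theta_i[a] * B[a][b] * theta_j[b] for a in range(K) for b in range(K))


def sample_mmsb(n, alpha, B, seed=0):
    """Sample memberships and a symmetric adjacency matrix without self-loops."""
    rng = random.Random(seed)
    theta = [sample_dirichlet(alpha, rng) for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_probability(theta[i], theta[j], B):
                A[i][j] = A[j][i] = 1
    return theta, A


B = [[0.9, 0.1], [0.1, 0.8]]  # illustrative within/between-community probabilities
theta, A = sample_mmsb(10, [0.5, 0.5], B)
print(edge_probability([1.0, 0.0], [0.0, 1.0], B))  # 0.1
```

Estimating $B$ and the memberships from an observed `A` is the inverse problem that the paper studies.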

翻訳日:2023-07-28 16:17:29 公開日:2023-07-26

# 関数値学習: ERMにおけるPolyakステップサイズと関数分割に基づく適応学習率

Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM ( http://arxiv.org/abs/2307.14528v1 )

ライセンス: Link先を確認

Guillaume Garrigos, Robert M. Gower, Fabian Schaipp

(参考訳) 本稿では、サンプルの損失値を利用する適応ステップサイズを持つSGD(確率的勾配降下法)の変種を開発する。特に、経験的リスク最小化としても知られる有限和問題の解法に焦点を当てる。まず、サンプルの損失値を利用し、最適解におけるサンプル損失が既知であると仮定する、$\texttt{SPS}_+$と呼ぶ理想化された適応手法を詳述する。この$\texttt{SPS}_+$は、ステップサイズが正になるように強制したSPS(Stochastic Polyak Stepsize)法のわずかな修正である。次に、$\texttt{SPS}_+$がリプシッツ連続かつ非平滑な設定においてSGDの既知の最良の収束率を達成することを示す。次いで、最適解での損失値を与えられたものとするのではなく徐々に学習する$\texttt{SPS}_+$の変種である$\texttt{FUVAL}$を開発する。$\texttt{FUVAL}$について、射影に基づく手法として、prox-linear法の変種として、そして特定のオンラインSGD法としての3つの見方を与える。その後、$\texttt{FUVAL}$の収束解析と実験結果を示す。本研究の欠点は、$\texttt{FUVAL}$の収束解析がSGDに対する優位性を示さないことである。もう一つの欠点は、現在のところ$\texttt{FUVAL}$のフルバッチ版のみが、ステップサイズに対する感度の点でGD(勾配降下法)に対するわずかな利点を示すことである。確率的な版はSGDに対して明確な利点を示さない。$\texttt{FUVAL}$を競争力のあるものにするには大きなミニバッチが必要であると推測している。現時点では、本稿で研究した新しい$\texttt{FUVAL}$法は明確な理論的・実用的な利点を提供しない。それにもかかわらず、$\texttt{SPS}_+$の非平滑解析など、用いている解析手法の一部のため、また、現時点ではうまく機能しないものの興味深いと思われるアプローチを示すために、この草稿をオンラインで公開することにした。

Here we develop variants of SGD (stochastic gradient descent) with an adaptive step size that make use of the sampled loss values. In particular, we focus on solving a finite sum-of-terms problem, also known as empirical risk minimization. We first detail an idealized adaptive method called $\texttt{SPS}_+$ that makes use of the sampled loss values and assumes knowledge of the sampled loss at optimality. This $\texttt{SPS}_+$ is a minor modification of the SPS (Stochastic Polyak Stepsize) method, where the step size is enforced to be positive. We then show that $\texttt{SPS}_+$ achieves the best known rates of convergence for SGD in the Lipschitz non-smooth setting. We then move on to develop $\texttt{FUVAL}$, a variant of $\texttt{SPS}_+$ where the loss values at optimality are gradually learned, as opposed to being given. We give three viewpoints of $\texttt{FUVAL}$: as a projection-based method, as a variant of the prox-linear method, and as a particular online SGD method. We then present a convergence analysis of $\texttt{FUVAL}$ and experimental results. One shortcoming of our work is that the convergence analysis of $\texttt{FUVAL}$ shows no advantage over SGD. Another shortcoming is that currently only the full-batch version of $\texttt{FUVAL}$ shows a minor advantage over GD (Gradient Descent) in terms of sensitivity to the step size. The stochastic version shows no clear advantage over SGD. We conjecture that large mini-batches are required to make $\texttt{FUVAL}$ competitive. Currently the new $\texttt{FUVAL}$ method studied in this paper does not offer any clear theoretical or practical advantage. We have chosen to make this draft available online nonetheless because of some of the analysis techniques we use, such as the non-smooth analysis of $\texttt{SPS}_+$, and also to show an apparently interesting approach that currently does not work.
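The $\texttt{SPS}_+$ step size described above can be sketched as $\gamma_t = (f_i(w_t) - f_i^*)_+ / \|\nabla f_i(w_t)\|^2$. This is the SPS step with its numerator clipped at zero, without the extra constant factors that some SPS variants use. The example assumes a consistent least-squares problem. There, every per-sample optimal loss $f_i^*$ is 0, so the idealized assumption of knowing $f_i^*$ holds trivially. $\texttt{FUVAL}$, which learns these values, is not shown. The function name and problem data are illustrative.

```python
import random


def sps_plus_sgd(A, b, w0, f_star=None, iters=300, seed=0):
    """SGD with the SPS+ step size on f_i(w) = 0.5 * (a_i . w - b_i)^2.

    Step: gamma = max(f_i(w) - f_i^*, 0) / ||grad f_i(w)||^2.
    f_star[i] is the per-sample optimal loss; for a consistent linear system
    it is 0 for every sample, which is the default here.
    """
    rng = random.Random(seed)
    n = len(A)
    f_star = f_star or [0.0] * n
    w = list(w0)
    for _ in range(iters):
        i = rng.randrange(n)
        r = sum(a * x for a, x in zip(A[i], w)) - b[i]  # residual a_i . w - b_i
        loss = 0.5 * r * r
        grad = [r * a for a in A[i]]                    # gradient of f_i at w
        g2 = sum(g * g for g in grad)
        if g2 == 0.0:
            continue                                     # sample already fit exactly
        gamma = max(loss - f_star[i], 0.0) / g2          # the "+" keeps the step non-negative
        w = [x - gamma * g for x, g in zip(w, grad)]
    return w


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [2.0, -1.0, 1.0]  # consistent with the solution w* = (2, -1)
print(sps_plus_sgd(A, b, [0.0, 0.0]))  # approximately [2.0, -1.0]
```

On this problem each step halves the residual of the sampled equation. No step size has to be tuned.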

Translated: 2023-07-28 16:17:19, Published: 2023-07-26

# Open Problems in Computer Vision for Wilderness SAR and The Search for Patricia Wu-Murad ( http://arxiv.org/abs/2307.14527v1 )

License: check the linked page

Thomas Manzini, Robin Murphy

This paper details the challenges in applying two computer vision systems, an EfficientDET supervised learning model and the unsupervised RX spectral classifier, to 98.9 GB of drone imagery from the Wu-Murad wilderness search and rescue (WSAR) effort in Japan, and identifies three directions for future research. There have been at least 19 proposed approaches and 3 datasets aimed at locating missing persons in drone imagery, but only 3 approaches (2 unsupervised and 1 of an unknown structure) are referenced in the literature as having been used in an actual WSAR operation. Of these proposed approaches, the EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting. The EfficientDET model was applied to the HERIDAL dataset. Despite achieving performance statistically equivalent to the state of the art, the model fails to translate to the real world in terms of false positives (e.g., identifying tree limbs and rocks as people) and false negatives (e.g., failing to identify members of the search team). The poor results in practice of algorithms that showed good results on datasets suggest three areas of future research: more realistic datasets for wilderness SAR, computer vision models capable of seamlessly handling the variety of imagery that can be collected during actual WSAR operations, and better alignment on performance measures.
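The RX classifier mentioned above is, in its classic global form (the Reed-Xiaoli detector), a per-pixel anomaly score. The score is the squared Mahalanobis distance of each pixel's spectrum from the image-wide mean, using the image-wide covariance.

The sketch below shows that idea on a synthetic image. The image, the planted "person" pixel, and the function name are hypothetical. The exact RX variant and preprocessing used in the paper are not stated here.

```python
import numpy as np

def rx_scores(image):
    """Global RX anomaly scores for an (H, W, B) multiband image.

    Each pixel's score is its squared Mahalanobis distance from the
    image-wide mean spectrum, using the image-wide covariance.
    """
    h, w, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    cov = np.cov(centered, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    scores = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return scores.reshape(h, w)

# Synthetic 3-band "terrain" with one off-colour pixel standing in for a person.
rng = np.random.default_rng(0)
img = rng.normal(loc=[0.3, 0.5, 0.2], scale=0.02, size=(64, 64, 3))
img[10, 20] = [0.9, 0.1, 0.1]
scores = rx_scores(img)
row, col = np.unravel_index(np.argmax(scores), scores.shape)
```

Because the score only measures spectral rarity, anything unusual (a rock, a tree limb, a team member) can score highly. This is consistent with the kind of false positives the abstract reports.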

Translated: 2023-07-28 16:16:37, Published: 2023-07-26

# Trace dynamics and its implications for my work of the last two decades ( http://arxiv.org/abs/2307.14524v1 )

License: check the linked page

Stephen L. Adler

I review the basic ideas of "trace dynamics", as formulated in my 2004 Cambridge University Press book "Quantum Theory as an Emergent Phenomenon", and then discuss how they have influenced much of my work of the last two decades.

Translated: 2023-07-28 16:16:10, Published: 2023-07-26

# A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models ( http://arxiv.org/abs/2307.05946v3 )

License: check the linked page

Agnimitra Sengupta, Sudeepta Mondal, Adway Das, S. Ilgin Guler

Deep-learning models for traffic data prediction can achieve superior performance in modeling complex functions using a multi-layer architecture. However, a major drawback of most of these approaches is that they do not offer forecasts with uncertainty estimates, which are essential for traffic operations and control. Without uncertainty estimates, it is difficult to place any level of trust in the model predictions, and operational strategies relying on overconfident predictions can lead to worsening traffic conditions. In this study, we propose a Bayesian recurrent neural network framework for uncertainty quantification in traffic prediction, with higher generalizability obtained by introducing spectral normalization to its hidden layers. We show that normalization alters the training process of deep neural networks by controlling the model's complexity and reducing the risk of overfitting to the training data. This, in turn, helps improve the generalization performance of the model on out-of-distribution datasets. Results demonstrate that spectral normalization improves uncertainty estimates and significantly outperforms both layer normalization and a model without normalization for single-step prediction horizons. This improved performance can be attributed to the ability of spectral normalization to better localize the feature space of the data under perturbations. Our findings are especially relevant to traffic management applications, where the goal is to predict traffic conditions across multiple locations but training data from multiple locations are scarce. Spectral normalization therefore provides a more generalizable approach that can effectively capture the underlying patterns in traffic data without requiring location-specific models.
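Spectral normalization, as commonly implemented, divides a weight matrix by an estimate of its largest singular value. The estimate comes from power iteration, which caps the layer's Lipschitz constant at about 1.

The following is a minimal sketch on a single fixed matrix; the matrix and function name are illustrative assumptions. Training-time implementations usually run one power-iteration step per update and reuse the vector `u`. Here the iteration runs many times so the estimate converges. The Bayesian RNN itself is not shown.

```python
import numpy as np

def spectral_normalize(weight, n_power_iters=500, seed=0):
    """Return (weight / sigma_max, sigma_max), estimating sigma_max by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=weight.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_power_iters):
        v = weight.T @ u          # right singular vector estimate
        v /= np.linalg.norm(v)
        u = weight @ v            # left singular vector estimate
        u /= np.linalg.norm(u)
    sigma = u @ weight @ v        # Rayleigh-quotient estimate of the top singular value
    return weight / sigma, sigma

# Hypothetical hidden-layer weight matrix.
W = np.random.default_rng(0).normal(size=(32, 16))
W_sn, sigma = spectral_normalize(W)
```

After normalization the largest singular value (`np.linalg.norm(W_sn, 2)`) is 1, which bounds how much the layer can amplify a perturbation of its input.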

Translated: 2023-07-28 11:28:12, Published: 2023-07-26

# AlphaNet: Improving Long-Tail Classification By Combining Classifiers ( http://arxiv.org/abs/2008.07073v2 )

License: check the linked page

Nadine Chang, Jayanth Koushik, Aarti Singh, Martial Hebert, Yu-Xiong Wang, Michael J. Tarr

Methods in long-tail learning focus on improving performance for data-poor (rare) classes; however, performance for such classes remains much lower than for more data-rich (frequent) classes. Analyzing the predictions of long-tail methods for rare classes reveals that a large number of errors are due to misclassification of rare items as visually similar frequent classes. To address this problem, we introduce AlphaNet, a method that can be applied to existing models and performs post hoc correction on the classifiers of rare classes. Starting with a pre-trained model, we find the frequent classes that are closest to rare classes in the model's representation space and learn weights to update rare class classifiers with a linear combination of frequent class classifiers. AlphaNet, applied to several models, greatly improves test accuracy for rare classes on multiple long-tailed datasets, with very little change to overall accuracy. Our method also provides a way to control the trade-off between rare class and overall accuracy, making it practical for long-tail classification in the wild.
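The sketch below illustrates the post hoc update described above: find the frequent classes nearest to a rare class, then shift the rare classifier by a linear combination of their classifiers. Three choices are our assumptions, not AlphaNet's exact procedure:

- measuring closeness by cosine similarity between classifier weight vectors;
- the additive form `rare + sum(alpha_j * frequent_j)`;
- fixed coefficients (AlphaNet learns them).

```python
import numpy as np

def nearest_frequent(rare_w, freq_w, k):
    """Indices of the k frequent-class classifiers closest to rare_w (cosine similarity)."""
    sims = freq_w @ rare_w / (np.linalg.norm(freq_w, axis=1) * np.linalg.norm(rare_w))
    return np.argsort(-sims)[:k]

def alphanet_update(rare_w, freq_w, alphas, k):
    """Hypothetical update: rare_w plus an alpha-weighted sum of its k nearest frequent classifiers."""
    idx = nearest_frequent(rare_w, freq_w, k)
    return rare_w + alphas @ freq_w[idx]

# Toy example: four frequent classifiers (one-hot rows) and one rare classifier
# that points mostly towards frequent class 0.
freq_w = np.eye(4)
rare_w = np.array([0.9, 0.1, 0.0, 0.0])
new_w = alphanet_update(rare_w, freq_w, alphas=np.array([-0.5, 0.2]), k=2)
```

A negative coefficient moves the rare classifier away from its most confusable frequent neighbour. This is one way such a correction can reduce rare-to-frequent misclassifications.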

Translated: 2023-07-27 16:57:07, Published: 2023-07-26

# Compressible Spectral Mixture Kernels with Sparse Dependency Structures for Gaussian Processes ( http://arxiv.org/abs/1808.00560v9 )

License: check the linked page

Kai Chen, Yijue Dai, Feng Yin, Elena Marchiori, and Sergios Theodoridis

Spectral mixture (SM) kernels comprise a powerful class of generalized kernels for Gaussian processes (GPs) that can describe complex patterns. This paper introduces model compression and time- and phase-modulated (TP) dependency structures into the original SM kernel to improve the generalization of GPs. Specifically, by adopting Bienaymé's identity, we generalize the dependency structure through cross-covariance between the SM components. We then propose a novel SM kernel with a dependency structure (SMD) by using cross-convolution between the SM components. Furthermore, we improve the expressiveness of the dependency structure by parameterizing it with time and phase delays. The dependency structure has clear interpretations in terms of spectral density, covariance behavior, and sampling path. Finally, to enrich the SMD with effective hyperparameter initialization, compressible SM kernel components, and sparse dependency structures, we introduce a novel structure adaptation (SA) algorithm. A thorough comparative analysis of the SMD on both synthetic and real-life applications corroborates its efficacy.
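For reference, the original one-dimensional SM kernel that this work extends is

$k(\tau) = \sum_q w_q \exp(-2\pi^2 \tau^2 v_q) \cos(2\pi \tau \mu_q)$,

with weights $w_q$, spectral means $\mu_q$ and spectral variances $v_q$. The sketch below evaluates it and builds a Gram matrix. The component values are arbitrary examples, and the paper's cross-component dependency terms (SMD) are not reproduced.

```python
import numpy as np

def sm_kernel(tau, weights, means, variances):
    """1-D spectral mixture kernel evaluated at lag(s) tau."""
    tau = np.asarray(tau, dtype=float)[..., None]  # add an axis to broadcast over components
    w, mu, v = (np.asarray(a, dtype=float) for a in (weights, means, variances))
    terms = w * np.exp(-2 * np.pi**2 * tau**2 * v) * np.cos(2 * np.pi * tau * mu)
    return terms.sum(axis=-1)

# Two hypothetical components: a slow trend and a faster oscillation.
weights, means, variances = [0.5, 1.5], [0.1, 2.0], [0.01, 0.3]
x = np.linspace(0.0, 5.0, 50)
K = sm_kernel(x[:, None] - x[None, :], weights, means, variances)  # 50 x 50 Gram matrix
```

At zero lag the kernel equals the sum of the weights. Because the spectral density is a symmetric Gaussian mixture, the Gram matrix is positive semi-definite (up to rounding).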

Translated: 2023-07-27 16:56:46, Published: 2023-07-26

# Combining optimal path search with task-dependent learning in a neural network ( http://arxiv.org/abs/2201.11104v4 )

License: check the linked page

Tomas Kulvicius, Minija Tamosiunaite and Florentin Wörgötter

Finding optimal paths in connected graphs requires determining the smallest total cost for traveling along the graph's edges. This problem can be solved by several classical algorithms in which costs are usually predefined for all edges. Conventional planning methods therefore cannot normally be used when costs need to change adaptively according to the requirements of some task. Here we show that one can define a neural network representation of path-finding problems by transforming cost values into synaptic weights, which allows for online weight adaptation using network learning mechanisms. When starting with an initial activity value of one, activity propagation in this network leads to solutions that are identical to those found by the Bellman-Ford algorithm. The neural network has the same algorithmic complexity as Bellman-Ford. In addition, we show that network learning mechanisms (such as Hebbian learning) can adapt the weights in the network, augmenting the resulting paths according to the task at hand. We demonstrate this by learning to navigate in an environment with obstacles as well as by learning to follow certain sequences of path nodes. Hence, the novel algorithm presented here may open up a different regime of applications in which path augmentation (by learning) is directly and naturally coupled with path finding.
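The classical Bellman-Ford algorithm the abstract compares against is sketched below, alongside a hypothetical activity-propagation variant. In that variant, each cost is mapped to a weight `exp(-cost)` and activity starting at 1 is propagated by max-product. Products of weights then correspond to sums of costs, so `-log(activity)` reproduces the shortest distances. This cost-to-weight mapping and the example graph are our assumptions, not necessarily the paper's construction.

```python
import math

def bellman_ford(n_nodes, edges, source):
    """Classical Bellman-Ford; edges are (u, v, cost) tuples. Returns (dist, pred)."""
    dist = [math.inf] * n_nodes
    pred = [None] * n_nodes
    dist[source] = 0.0
    for _ in range(n_nodes - 1):
        updated = False
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                pred[v] = u
                updated = True
        if not updated:
            break  # distances are stable, so later passes would change nothing
    for u, v, cost in edges:
        if dist[u] + cost < dist[v]:
            raise ValueError("graph contains a negative-cost cycle")
    return dist, pred

def path_to(pred, target):
    """Follow predecessor links back from target to the source."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]

def propagate_activity(n_nodes, edges, source):
    """Hypothetical max-product propagation with weights exp(-cost); source activity is 1."""
    act = [0.0] * n_nodes
    act[source] = 1.0
    for _ in range(n_nodes - 1):
        for u, v, cost in edges:
            act[v] = max(act[v], act[u] * math.exp(-cost))
    return act

# Small directed example graph with non-negative costs.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
dist, pred = bellman_ford(4, edges, source=0)
activity = propagate_activity(4, edges, source=0)
```

Both routines make the same number of edge passes, in line with the abstract's remark that the network has the same complexity as Bellman-Ford.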

Translated: 2023-07-27 16:53:23, Published: 2023-07-26

# Estimation of a parameter encoded in the modal structure of a light beam: a quantum theory ( http://arxiv.org/abs/2201.04050v2 )

License: check the linked page

Manuel Gessner, Nicolas Treps, and Claude Fabre

Quantum light is described not only by a quantum state but also by the shape of the electromagnetic modes on which the state is defined. Optical precision measurements often estimate a "mode parameter" that determines properties such as the frequency, temporal shape, and spatial distribution of the light field. By deriving quantum precision limits, we establish the fundamental bounds for mode parameter estimation. Our results reveal explicit mode-design recipes that enable the estimation of any mode parameter with quantum-enhanced precision. Our approach provides practical methods for optimizing mode parameter estimation, with relevant applications including spatial and temporal positioning, spectroscopy, phase estimation, and superresolution imaging.

Translated: 2023-07-27 16:53:01, Published: 2023-07-26

# Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning ( http://arxiv.org/abs/2112.03906v2 )

License: check the linked page

Srijan Das and Michael S. Ryoo

In this paper, we address the challenge of obtaining large-scale unlabelled video datasets for contrastive representation learning in real-world applications. We present a novel video augmentation technique for self-supervised learning, called Cross-Modal Manifold Cutmix (CMMC), which generates augmented samples by combining different modalities in videos. By embedding a video tesseract into another across two modalities in the feature space, our method enhances the quality of learned video representations. We perform extensive experiments on two small-scale video datasets, UCF101 and HMDB51, for action recognition and video retrieval tasks. We also show that our approach is effective on the NTU dataset with limited domain knowledge. Our CMMC achieves performance comparable to other self-supervised methods while using less training data for both downstream tasks.
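The sketch below illustrates the core "cut and paste a tesseract" operation. A random time × height × width block, spanning all channels, of one modality's feature map is pasted into the other modality's feature map at the same location. The function also returns the replaced volume fraction, which CutMix-style methods typically use as the mixing coefficient.

The following are assumptions, not the paper's exact recipe:

- the (T, H, W, C) layout;
- the cube-root rule for choosing block sides from a target ratio;
- the function name.

How the mixed sample enters the contrastive loss is not shown.

```python
import numpy as np

def manifold_cutmix(feat_a, feat_b, ratio, rng):
    """Paste a random (T, H, W) sub-block of feat_b into feat_a (both shaped T x H x W x C).

    Returns the mixed features and the fraction of feat_a that was replaced.
    """
    assert feat_a.shape == feat_b.shape
    t, h, w, _ = feat_a.shape
    # Scale each side by ratio^(1/3) so the block volume is roughly `ratio` of the whole.
    scale = ratio ** (1.0 / 3.0)
    ct, ch, cw = (max(1, int(round(s * scale))) for s in (t, h, w))
    t0 = rng.integers(0, t - ct + 1)
    h0 = rng.integers(0, h - ch + 1)
    w0 = rng.integers(0, w - cw + 1)
    mixed = feat_a.copy()
    block = (slice(t0, t0 + ct), slice(h0, h0 + ch), slice(w0, w0 + cw))
    mixed[block] = feat_b[block]
    replaced = (ct * ch * cw) / (t * h * w)
    return mixed, replaced

# Hypothetical features: zeros for modality A (e.g. RGB), ones for modality B (e.g. flow).
rng = np.random.default_rng(0)
feat_rgb = np.zeros((8, 8, 8, 4))
feat_flow = np.ones((8, 8, 8, 4))
mixed, lam = manifold_cutmix(feat_rgb, feat_flow, ratio=0.125, rng=rng)
```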

Translated: 2023-07-27 16:52:50, Published: 2023-07-26

# Rate-optimal Bayesian Simple Regret in Best Arm Identification ( http://arxiv.org/abs/2111.09885v3 )

License: check the linked page

Junpei Komiyama, Kaito Ariu, Masahiro Kato and Chao Qin

We consider best arm identification in the multi-armed bandit problem. Assuming certain continuity conditions on the prior, we characterize the rate of the Bayesian simple regret. Unlike in Bayesian regret minimization (Lai, 1987), the leading term in the Bayesian simple regret derives from the region where the gap between the optimal and suboptimal arms is smaller than $\sqrt{\frac{\log T}{T}}$. We propose a simple, easy-to-compute algorithm whose leading term matches the lower bound up to a constant factor; simulation results support our theoretical findings.
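The Bayesian simple regret is the expected gap between the best arm's mean and the recommended arm's mean, averaged over instances drawn from the prior. The sketch below estimates it by Monte Carlo for a naive baseline: split the budget uniformly across arms, then recommend the empirical best.

This is not the paper's algorithm. The Uniform(0, 1) prior, the Bernoulli rewards and the function name are assumptions chosen for illustration.

```python
import numpy as np

def uniform_allocation_simple_regret(n_arms, budget, n_instances, seed=0):
    """Monte Carlo estimate of Bayesian simple regret for a uniform-allocation baseline.

    Prior (assumed): arm means i.i.d. Uniform(0, 1); rewards are Bernoulli(mean).
    Each arm is pulled budget // n_arms times and the empirically best arm is recommended.
    """
    rng = np.random.default_rng(seed)
    pulls = budget // n_arms
    regrets = np.empty(n_instances)
    for k in range(n_instances):
        means = rng.uniform(0.0, 1.0, size=n_arms)   # one bandit instance from the prior
        successes = rng.binomial(pulls, means)       # total reward observed per arm
        recommended = np.argmax(successes)           # ties go to the lowest index
        regrets[k] = means.max() - means[recommended]
    return regrets.mean()

r_small_budget = uniform_allocation_simple_regret(n_arms=3, budget=30, n_instances=2000)
r_large_budget = uniform_allocation_simple_regret(n_arms=3, budget=3000, n_instances=2000)
```

Most of the remaining regret at large budgets comes from instances where the top two means are nearly tied. This matches the abstract's point that the leading term comes from small-gap regions.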

Translated: 2023-07-27 16:52:35, Published: 2023-07-26

# QOptCraft: A Python package for the design and study of linear optical quantum systems ( http://arxiv.org/abs/2108.06186v2 )

License: check the linked page

Daniel Gómez Aguado, Vicent Gimeno, Julio José Moyano-Fernández, Juan Carlos Garcia-Escartin

The manipulation of the quantum states of light in linear optical systems has multiple applications in quantum optics and quantum computation. The package QOptCraft provides a collection of methods to solve some of the most common problems encountered when designing quantum experiments with linear interferometers. The methods include functions that compute the quantum evolution matrix for n photons from the classical description of the system, and inverse methods that, for any desired quantum evolution, either give the complete description of the experimental system that realizes that unitary evolution or, when this is impossible, the complete description of the linear system that approximates the desired unitary with a locally minimal error. The functions in the package include implementations of different known decompositions that translate the classical scattering matrix of a linear system into a list of beam splitters and phase shifters, and methods to compute the effective Hamiltonian that describes the quantum evolution of states with n photons. The package is completed with routines for useful tasks such as generating random linear optical systems and computing matrix logarithms. The routines are chosen to avoid the usual numerical problems that arise when dealing with the unitary matrices appearing in the description of linear systems.
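The first task listed above maps a classical m × m interferometer unitary to its n-photon evolution matrix. A standard way to do this uses matrix permanents:

$\langle t|\phi(U)|s\rangle = \mathrm{Perm}(U_{t,s}) / \sqrt{\prod_i s_i! \prod_j t_j!}$,

where $U_{t,s}$ repeats row $j$ $t_j$ times and column $i$ $s_i$ times.

The sketch below implements that textbook formula with a brute-force permanent. It does not use or reproduce QOptCraft's own API; the function names are hypothetical.

```python
import itertools
import math
import numpy as np

def permanent(mat):
    """Permanent by brute force over permutations (fine for a few photons)."""
    n = mat.shape[0]
    return sum(np.prod([mat[i, p[i]] for i in range(n)]) for p in itertools.permutations(range(n)))

def fock_basis(n_photons, n_modes):
    """All occupation tuples (s_1, ..., s_m) whose entries sum to n_photons."""
    return [s for s in itertools.product(range(n_photons + 1), repeat=n_modes) if sum(s) == n_photons]

def multiphoton_unitary(U, n_photons):
    """n-photon evolution matrix induced by the linear-optics unitary U (a_i^dag -> sum_j U[j, i] a_j^dag)."""
    basis = fock_basis(n_photons, U.shape[0])
    phi = np.zeros((len(basis), len(basis)), dtype=complex)
    for col, s in enumerate(basis):
        cols = [i for i, occ in enumerate(s) for _ in range(occ)]      # input modes, repeated
        for row, t in enumerate(basis):
            rows = [j for j, occ in enumerate(t) for _ in range(occ)]  # output modes, repeated
            norm = math.sqrt(math.prod(math.factorial(k) for k in s) *
                             math.prod(math.factorial(k) for k in t))
            phi[row, col] = permanent(U[np.ix_(rows, cols)]) / norm
    return basis, phi

# 50:50 beam splitter acting on two photons (the Hong-Ou-Mandel setting).
BS = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
basis, phi = multiphoton_unitary(BS, 2)
```

In this example the amplitude for |1,1> → |1,1> vanishes (two-photon interference), and the resulting 3 × 3 matrix is unitary, as expected for a linear optical evolution.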

Translated: 2023-07-27 16:52:21, Published: 2023-07-26

# Revisiting Deep Learning Models for Tabular Data ( http://arxiv.org/abs/2106.11959v3 )

License: check the linked page

Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko

The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports competitive results on various datasets. However, the proposed models are usually not properly compared to each other, and existing works often use different benchmarks and experimental protocols. As a result, it is unclear to both researchers and practitioners which models perform best. Additionally, the field still lacks effective baselines, that is, easy-to-use models that provide competitive performance across different problems. In this work, we give an overview of the main families of DL architectures for tabular data and raise the bar for baselines in tabular DL by identifying two simple and powerful deep architectures. The first is a ResNet-like architecture that turns out to be a strong baseline, one often missing in prior works. The second model is our simple adaptation of the Transformer architecture to tabular data, which outperforms other solutions on most tasks. Both models are compared to many existing architectures on a diverse set of tasks under the same training and tuning protocols. We also compare the best DL models with Gradient Boosted Decision Trees and conclude that there is still no universally superior solution.
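Below is a minimal forward-pass sketch of a ResNet-like model for tabular features, assuming the common pre-normalization residual layout `x + Linear(ReLU(Linear(BatchNorm(x))))` with a `Linear(ReLU(BatchNorm(h)))` prediction head.

The following are simplifications for illustration:

- dropout is omitted (inference);
- batch normalization uses batch statistics without learned scale or shift;
- the input layer has no bias;
- all sizes are arbitrary.

The paper's exact layer order, widths and regularization may differ.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch (no learned scale/shift in this sketch)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

def resnet_block(x, w1, b1, w2, b2):
    """Residual block: x + Linear(ReLU(Linear(BatchNorm(x)))); dropout omitted."""
    return x + relu(batch_norm(x) @ w1 + b1) @ w2 + b2

def resnet_forward(x, w_in, blocks, w_out, b_out):
    """Input projection, a stack of residual blocks, then a normalized prediction head."""
    h = x @ w_in
    for params in blocks:
        h = resnet_block(h, *params)
    return relu(batch_norm(h)) @ w_out + b_out

# Hypothetical sizes: 10 numeric features, width 16, hidden width 64, two blocks, one output.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 10))
d, d_hidden = 16, 64
w_in = rng.normal(scale=0.1, size=(10, d))
blocks = [(rng.normal(scale=0.1, size=(d, d_hidden)), np.zeros(d_hidden),
           rng.normal(scale=0.1, size=(d_hidden, d)), np.zeros(d)) for _ in range(2)]
w_out, b_out = rng.normal(scale=0.1, size=(d, 1)), np.zeros(1)
y = resnet_forward(x, w_in, blocks, w_out, b_out)
```

The residual connection means a block with zero weights is the identity. This is one reason such a model is easy to train as a baseline.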

Translated: 2023-07-27 16:51:58, Published: 2023-07-26

# A Comprehensive Comparison of Pre-training Language Models ( http://arxiv.org/abs/2106.11483v9 )

License: check the linked page

Tong Guo

Recently, the development of pre-trained language models has brought natural language processing (NLP) tasks to a new state of the art. In this paper, we explore the efficiency of various pre-trained language models. We pre-train a list of transformer-based models with the same amount of text and the same number of training steps. The experimental results show that the largest improvement over the original BERT comes from adding an RNN layer to capture more contextual information for short text understanding. Overall, however, similar BERT structures show no remarkable improvement in short text understanding. A data-centric method [12] can achieve better performance.

Translated: 2023-07-27 16:51:40, Published: 2023-07-26

# Deep Learning Based 3D Segmentation: A Survey ( http://arxiv.org/abs/2103.05423v3 )

License: check the linked page

Yong He, Hongshan Yu, Xiaoyan Liu, Zhengeng Yang, Wei Sun, Ajmal Mian

3D segmentation is a fundamental and challenging problem in computer vision, with applications in autonomous driving, robotics, augmented reality, and medical image analysis. It has received significant attention from the computer vision, graphics, and machine learning communities. Conventional methods for 3D segmentation, based on hand-crafted features and machine learning classifiers, lack generalization ability. Driven by their success in 2D computer vision, deep learning techniques have recently become the tool of choice for 3D segmentation tasks. This has led to an influx of a large number of methods in the literature that have been evaluated on different benchmark datasets. Although survey papers on RGB-D and point cloud segmentation exist, there is no recent, in-depth survey that covers all 3D data modalities and application domains. This paper fills that gap and provides a comprehensive survey of the recent progress in deep learning based 3D segmentation. It covers over 180 works, analyzes their strengths and limitations, and discusses their competitive results on benchmark datasets. The survey summarizes the most commonly used pipelines and finally highlights promising directions for future research.

Translated: 2023-07-27 16:51:30, Published: 2023-07-26

# Emergence of Monogamy under Static and Dynamic Scenarios ( http://arxiv.org/abs/2102.04940v2 )

License: check the linked page

Rivu Gupta, Saptarshi Roy, Shiladitya Mal, Aditi Sen De

Characterizing multipartite quantum correlations beyond two parties is of utmost importance for building cutting-edge quantum technologies, although the comprehensive picture is still missing. Here we investigate the quantum correlations (QCs) present in a multipartite system by exploring connections between the monogamy score (MS), localizable quantum correlations (LQC), and the genuine multipartite entanglement (GME) content of the state. We find that the frequency distribution of GME for Dicke states with higher excitations resembles that of random states. We show that there is a critical value of GME beyond which all states become monogamous, and we investigate it by considering different powers of MS, which provide various layers of monogamy relations. Interestingly, no such relation holds between LQC and either MS or GME. States with very low GME (and low monogamy scores, both positive and negative) can localize a high amount of QCs in two parties. We also provide an upper bound on the sum of bipartite QC measures, including LQC, for random states, and establish a gap between the actual upper bound and the algebraic maximum.
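As a concrete instance of a monogamy score, the sketch below uses the tangle (squared concurrence). For a three-qubit pure state, the score is

$\delta = \tau_{A:BC} - C^2_{AB} - C^2_{AC}$,

where $\tau_{A:BC} = 4\det\rho_A$ and $C$ is the Wootters concurrence. A non-negative $\delta$ means the state satisfies this monogamy relation.

The choice of measure and power here is just one example: the paper considers several QC measures and powers of MS. The function names are hypothetical.

```python
import numpy as np

SY = np.array([[0, -1j], [1j, 0]])

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix."""
    flip = np.kron(SY, SY)
    rho_tilde = flip @ rho.conj() @ flip
    eig = np.linalg.eigvals(rho @ rho_tilde)
    lam = np.sort(np.sqrt(np.clip(eig.real, 0.0, None)))[::-1]  # decreasing order
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])

def reduced(psi, keep):
    """Reduced density matrix of a 3-qubit pure state on the qubits listed in `keep`."""
    t = psi.reshape(2, 2, 2)
    traced = [q for q in range(3) if q not in keep]
    # Kept qubits become rows, traced qubits columns; then rho = M M^dagger.
    m = np.transpose(t, list(keep) + traced).reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def tangle_monogamy_score(psi):
    """delta = tau(A:BC) - C^2(AB) - C^2(AC), with qubit 0 as party A."""
    tau_a_bc = 4 * np.linalg.det(reduced(psi, [0])).real
    return tau_a_bc - concurrence(reduced(psi, [0, 1])) ** 2 - concurrence(reduced(psi, [0, 2])) ** 2

ghz = np.zeros(8)
ghz[[0, 7]] = 1 / np.sqrt(2)          # (|000> + |111>)/sqrt(2)
w_state = np.zeros(8)
w_state[[1, 2, 4]] = 1 / np.sqrt(3)   # (|001> + |010> + |100>)/sqrt(3)
ghz_score = tangle_monogamy_score(ghz)
w_score = tangle_monogamy_score(w_state)
```

The two familiar extremes come out as expected. The GHZ state has no pairwise entanglement, so $\delta = 1$. The W state saturates the relation, so $\delta = 0$.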

Translated: 2023-07-27 16:51:13, Published: 2023-07-26

# Rigidity of superdense coding ( http://arxiv.org/abs/2012.01672v2 )

License: check the linked page

Ashwin Nayak and Henry Yuen

The famous superdense coding protocol of Bennett and Wiesner demonstrates that it is possible to communicate two bits of classical information by sending only one qubit and using a shared EPR pair. Our first result is that an arbitrary protocol achieving this task (with no assumptions on the sender's encoding operations or the dimension of the shared entangled state) is locally equivalent to the canonical Bennett-Wiesner protocol. In other words, the superdense coding task is rigid. In particular, we show that the sender and receiver use any additional entanglement (beyond the EPR pair) only as a source of classical randomness. We also investigate several questions about higher-dimensional superdense coding, where the goal is to communicate one of $d^2$ possible messages by sending a $d$-dimensional quantum state, for general dimensions $d$. Unlike the $d=2$ case (i.e., sending a single qubit), there can be inequivalent superdense coding protocols for higher $d$. We present concrete constructions of inequivalent protocols, based on constructions of inequivalent orthogonal unitary bases for all $d > 2$. Finally, we analyze the performance of superdense coding protocols where the encoding operators are independently sampled from the Haar measure on the unitary group. Our analysis involves bounding the distinguishability of random maximally entangled states, which may be of independent interest.
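The canonical Bennett-Wiesner protocol referred to above works as follows:

1. The sender applies one of $I, X, Z, XZ$ to her half of $|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$.
2. The four resulting states form the orthonormal Bell basis.
3. The receiver distinguishes them perfectly with a Bell measurement.

The statevector sketch below checks this. The bit-to-Pauli assignment is one conventional choice, and the helper names are ours.

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
ENCODINGS = {(0, 0): I, (0, 1): X, (1, 0): Z, (1, 1): X @ Z}

# Shared EPR pair |Phi+> = (|00> + |11>)/sqrt(2); qubit 0 belongs to the sender.
PHI_PLUS = np.array([1, 0, 0, 1]) / np.sqrt(2)

def encode(bits):
    """Sender applies a Pauli to her half of the EPR pair, then sends that qubit."""
    return np.kron(ENCODINGS[bits], I) @ PHI_PLUS

# The four encoded states are exactly the Bell basis (up to global phase).
BELL_BASIS = {bits: encode(bits) for bits in ENCODINGS}

def decode(state):
    """Receiver's Bell measurement: return the message whose basis state has overlap 1."""
    probs = {bits: abs(np.vdot(b, state)) ** 2 for bits, b in BELL_BASIS.items()}
    return max(probs, key=probs.get)

decoded = {bits: decode(encode(bits)) for bits in ENCODINGS}
```

Every two-bit message is recovered with certainty, although only one qubit travels from sender to receiver.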

Translated: 2023-07-27 16:50:52, Published: 2023-07-26

# Priors in Deep Image Restoration and Enhancement: A Survey ( http://arxiv.org/abs/2206.02070v2 )

License: check the linked page

Yunfan Lu, Yiqi Lin, Hao Wu, Yunhao Luo, Xu Zheng, Hui Xiong, Lin Wang

Image restoration and enhancement is the process of improving image quality by removing degradations such as noise, blur, and resolution degradation. Deep learning (DL) has recently been applied to image restoration and enhancement. Because the problem is ill-posed, many works have explored priors to facilitate the training of deep neural networks (DNNs). However, the importance of priors has so far not been systematically studied and analyzed by the research community. Therefore, this paper serves as the first study to provide a comprehensive overview of recent advances in priors for deep image restoration and enhancement. Our work covers five primary contents: (1) a theoretical analysis of priors for deep image restoration and enhancement; (2) a hierarchical and structural taxonomy of priors commonly used in DL-based methods; (3) an insightful discussion of each prior regarding its principle, potential, and applications; (4) a summary of crucial problems that highlights potential future directions, especially adopting large-scale foundation models as priors, to spark more research in the community; and (5) an open-source repository that provides a taxonomy of all mentioned works and code links.

Translated: 2023-07-27 16:45:18, Published: 2023-07-26

# Fairness in Recommendation: Foundations, Methods and Applications ( http://arxiv.org/abs/2205.13619v5 )

License: check the linked page

Yunqi Li, Hanxiong Chen, Shuyuan Xu, Yingqiang Ge, Juntao Tan, Shuchang Liu, Yongfeng Zhang

As one of the most pervasive applications of machine learning, recommender systems play an important role in assisting human decision making. User satisfaction and platform interests are closely related to the quality of the generated recommendation results. However, as highly data-driven systems, recommender systems can be affected by data or algorithmic bias and thus generate unfair results, which could weaken users' reliance on the systems. As a result, it is crucial to address potential unfairness problems in recommendation settings. Recently, fairness considerations in recommender systems have received growing attention, with an expanding literature on approaches to promote fairness in recommendation. However, the studies are rather fragmented and lack a systematic organization, making the domain difficult for new researchers to enter. This motivates us to provide a systematic survey of existing works on fairness in recommendation. This survey focuses on the foundations of fairness in the recommendation literature. It first briefly introduces fairness in basic machine learning tasks such as classification and ranking, to give a general overview of fairness research. It also introduces the more complex situations and challenges that need to be considered when studying fairness in recommender systems. The survey then introduces fairness in recommendation, focusing on the taxonomies of current fairness definitions, the typical techniques for improving fairness, and the datasets for fairness studies in recommendation. It also discusses the challenges and opportunities in fairness research, with the hope of promoting the fair recommendation research area and beyond.

Translation date: 2023-07-27 16:45:02 Publication date: 2023-07-26

# Robust Quantity-Aware Aggregation for Federated Learning

Robust Quantity-Aware Aggregation for Federated Learning ( http://arxiv.org/abs/2205.10848v2 )

License: see link

Jingwei Yi, Fangzhao Wu, Huishuai Zhang, Bin Zhu, Tao Qi, Guangzhong Sun, Xing Xie

Federated learning (FL) enables multiple clients to collaboratively train models without sharing their local data, and has become an important privacy-preserving machine learning framework. However, classical FL faces serious security and robustness problems; e.g., malicious clients can poison model updates and at the same time claim large data quantities to amplify the impact of their model updates in the model aggregation. Existing defense methods for FL, while all handling malicious model updates, either treat all quantities as benign or simply ignore/truncate the quantities of all clients. The former is vulnerable to quantity-enhanced attacks, while the latter leads to sub-optimal performance, since the local datasets on different clients usually differ significantly in size. In this paper, we propose a robust quantity-aware aggregation algorithm for federated learning, called FedRA, to perform the aggregation with awareness of local data quantities while being able to defend against quantity-enhanced attacks. More specifically, we propose a method to filter malicious clients by jointly considering the uploaded model updates and data quantities from different clients, and performing quantity-aware weighted averaging on model updates from the remaining clients. Moreover, as the number of malicious clients participating in federated learning may change dynamically across rounds, we also propose a malicious client number estimator to predict how many suspicious clients should be filtered in each round. Experiments on four public datasets demonstrate the effectiveness of our FedRA method in defending FL against quantity-enhanced attacks.
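
To make the quantity-enhanced attack concrete, here is a minimal Python sketch, not FedRA itself: it drops the updates farthest from the coordinate-wise median and then averages the remaining updates weighted by their claimed data quantities. The distance-to-median filter, the function names, and the toy numbers are assumptions; FedRA's filter also takes the claimed quantities into account and estimates the number of suspicious clients per round, neither of which is reproduced here.

```python
from statistics import median
from typing import List


def coordinate_median(updates: List[List[float]]) -> List[float]:
    """Coordinate-wise median of the client updates, used as a robust reference point."""
    return [median(column) for column in zip(*updates)]


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filter_then_weight(updates: List[List[float]], quantities: List[float], num_suspicious: int) -> List[float]:
    """Drop the `num_suspicious` updates farthest from the coordinate-wise median,
    then average the remaining updates weighted by their claimed data quantities."""
    center = coordinate_median(updates)
    by_distance = sorted(range(len(updates)), key=lambda i: squared_distance(updates[i], center))
    kept = by_distance[: len(updates) - num_suspicious]
    total = sum(quantities[i] for i in kept)
    return [sum(quantities[i] * updates[i][d] for i in kept) / total for d in range(len(center))]


updates = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [50.0, -50.0]]  # last client is malicious
quantities = [100, 200, 100, 10_000]                            # and claims a huge quantity
print(filter_then_weight(updates, quantities, num_suspicious=1))  # close to [1.075, 0.925]
print(filter_then_weight(updates, quantities, num_suspicious=0))  # dominated by the attacker
```

Without filtering, the single client claiming 10,000 samples pulls the first coordinate of the aggregate to roughly 48, which is the amplification effect the abstract describes.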

Translation date: 2023-07-27 16:44:37 Publication date: 2023-07-26

# On Embeddings for Numerical Features in Tabular Deep Learning

On Embeddings for Numerical Features in Tabular Deep Learning ( http://arxiv.org/abs/2203.05556v3 )

License: see link

Yury Gorishniy and Ivan Rubachev and Artem Babenko

Recently, Transformer-like deep architectures have shown strong performance on tabular data problems. Unlike traditional models, e.g., MLP, these architectures map scalar values of numerical features to high-dimensional embeddings before mixing them in the main backbone. In this work, we argue that embeddings for numerical features are an underexplored degree of freedom in tabular DL, which allows constructing more powerful DL models and competing with GBDT on some traditionally GBDT-friendly benchmarks. We start by describing two conceptually different approaches to building embedding modules: the first one is based on a piecewise linear encoding of scalar values, and the second one utilizes periodic activations. Then, we empirically demonstrate that these two approaches can lead to significant performance boosts compared to the embeddings based on conventional blocks such as linear layers and ReLU activations. Importantly, we also show that embedding numerical features is beneficial for many backbones, not only for Transformers. Specifically, after proper embeddings, simple MLP-like models can perform on par with the attention-based architectures. Overall, we highlight embeddings for numerical features as an important design aspect with good potential for further improvements in tabular DL.
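
A minimal Python sketch of the two embedding families named above, for a single scalar feature. The bin edges and frequencies are hypothetical inputs; in the paper the bins are derived from data and the periodic frequencies are trainable, and the encodings are typically followed by further learned layers, none of which is shown. Clipping every component to [0, 1] is a simplification; the paper's handling of the outermost bins may differ.

```python
import math
from typing import List


def piecewise_linear_encoding(x: float, bins: List[float]) -> List[float]:
    """Encode x given sorted bin edges b_0 < ... < b_T.

    Component t is how much of bin [b_{t-1}, b_t) x has 'filled',
    clipped to [0, 1]: bins left of x are 1, bins right of x are 0.
    """
    encoding = []
    for left, right in zip(bins[:-1], bins[1:]):
        fraction = (x - left) / (right - left)
        encoding.append(min(1.0, max(0.0, fraction)))
    return encoding


def periodic_embedding(x: float, freqs: List[float]) -> List[float]:
    """Concatenate sin(2*pi*c*x) and cos(2*pi*c*x) for each frequency c."""
    return ([math.sin(2 * math.pi * c * x) for c in freqs]
            + [math.cos(2 * math.pi * c * x) for c in freqs])


print(piecewise_linear_encoding(2.5, [0.0, 1.0, 2.0, 4.0]))  # [1.0, 1.0, 0.25]
print(periodic_embedding(0.25, [1.0, 2.0]))
```

Unlike a single linear layer applied to the raw scalar, both encodings let a downstream MLP respond non-linearly to different value ranges of the same feature.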

Translation date: 2023-07-27 16:44:12 Publication date: 2023-07-26

# MICDIR: Multi-scale Inverse-consistent Deformable Image Registration using UNetMSS with Self-Constructing Graph Latent

MICDIR: Multi-scale Inverse-consistent Deformable Image Registration using UNetMSS with Self-Constructing Graph Latent ( http://arxiv.org/abs/2203.04317v2 )

License: see link

Soumick Chatterjee, Himanshi Bajaj, Istiyak H. Siddiquee, Nandish Bandi Subbarayappa, Steve Simon, Suraj Bangalore Shashidhar, Oliver Speck and Andreas Nürnberge

Image registration is the process of bringing different images into a common coordinate system, a technique widely used in various applications of computer vision, such as remote sensing, image retrieval, and, most commonly, medical imaging. Deep learning based techniques have been applied successfully to tackle various complex medical image processing problems, including medical image registration. Over the years, several image registration techniques have been proposed using deep learning. Deformable image registration techniques such as VoxelMorph have been successful in capturing finer changes and providing smoother deformations. However, VoxelMorph, as well as ICNet and FIRE, does not explicitly encode global dependencies (i.e., the overall anatomical view of the supplied image) and, therefore, cannot track large deformations. To tackle these problems, this paper extends the VoxelMorph approach in three ways. To improve performance for both small and large deformations, supervision of the model at different resolutions has been integrated using a multi-scale UNet. To help the network learn and encode the minute structural correlations of the given image pairs, a self-constructing graph network (SCGNet) has been used as the latent of the multi-scale UNet, which can improve the learning process of the model and help the model generalise better. Finally, to make the deformations inverse-consistent, a cycle consistency loss has been employed. On the task of registration of brain MRIs, the proposed method achieved significant improvements over ANTs and VoxelMorph, obtaining a Dice score of $0.8013 \pm 0.0243$ for intramodal and $0.6211 \pm 0.0309$ for intermodal registration, while VoxelMorph achieved $0.7747 \pm 0.0260$ and $0.6071 \pm 0.0510$, respectively.
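
Registration accuracy here is reported as a Dice score, which is usually computed between the warped moving segmentation and the fixed segmentation. A minimal Python sketch of the Dice coefficient, 2|A ∩ B| / (|A| + |B|), over flattened label maps; averaging over anatomical labels and returning 1.0 for two empty masks are conventions assumed here, not details taken from the paper.

```python
from typing import List, Sequence


def dice_score(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Dice coefficient of two binary masks given as flat sequences."""
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(seg_a: Sequence[int], seg_b: Sequence[int], labels: List[int]) -> float:
    """Average the per-label Dice over the given anatomical labels."""
    scores = [dice_score([v == lab for v in seg_a], [v == lab for v in seg_b]) for lab in labels]
    return sum(scores) / len(scores)


print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```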

Translation date: 2023-07-27 16:43:55 Publication date: 2023-07-26

# 20-Mode Universal Quantum Photonic Processor

20-Mode Universal Quantum Photonic Processor ( http://arxiv.org/abs/2203.01801v5 )

License: see link

Caterina Taballione, Malaquias Correa Anguita, Michiel de Goede, Pim Venderbosch, Ben Kassenberg, Henk Snijders, Narasimhan Kannan, Ward L. Vleeshouwers, Devin Smith, Jörn P. Epping, Reinier van der Meer, Pepijn W. H. Pinkse, Hans van den Vlekkert, Jelmer J. Renema

Integrated photonics is an essential technology for optical quantum computing. Universal, phase-stable, reconfigurable multimode interferometers (quantum photonic processors) enable manipulation of photonic quantum states and are one of the main components of photonic quantum computers in various architectures. In this paper, we report the realization of the largest quantum photonic processor to date. The processor enables arbitrary unitary transformations on its 20 input modes with an amplitude fidelity of $F_{\text{Haar}} = 97.4\%$ and $F_{\text{Perm}} = 99.5\%$ for Haar-random and permutation matrices, respectively, an optical loss of 2.9 dB averaged over all modes, and high-visibility quantum interference with $V_{\text{HOM}}=98\%$. The processor is realized in $\mathrm{Si_3N_4}$ waveguides and is actively cooled by a Peltier element.
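
The quoted fidelities compare a target unitary with the implemented one using only entry magnitudes. A minimal Python sketch, assuming the common convention F = (1/N) Σ_ij |t_ij| |m_ij| for N × N matrices (this equals 1 when the magnitudes agree, because the squared magnitudes of a unitary's entries sum to N), plus the conversion of the quoted 2.9 dB loss to a transmission fraction. The function names are hypothetical.

```python
from typing import Sequence


def amplitude_fidelity(target: Sequence[Sequence[complex]], measured: Sequence[Sequence[complex]]) -> float:
    """(1/N) * sum over entries of |t_ij| * |m_ij| (assumed convention)."""
    n = len(target)
    overlap = sum(abs(t) * abs(m)
                  for row_t, row_m in zip(target, measured)
                  for t, m in zip(row_t, row_m))
    return overlap / n


def db_to_transmission(loss_db: float) -> float:
    """Fraction of optical power transmitted for a loss given in dB."""
    return 10 ** (-loss_db / 10)


identity = [[1, 0], [0, 1]]
swap = [[0, 1], [1, 0]]
print(amplitude_fidelity(identity, identity), amplitude_fidelity(identity, swap))  # 1.0 0.0
print(round(db_to_transmission(2.9), 3))  # 0.513
```

With this conversion, 2.9 dB of loss corresponds to roughly 51% transmission.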

Translation date: 2023-07-27 16:43:18 Publication date: 2023-07-26

# MetaDT: Meta Decision Tree with Class Hierarchy for Interpretable Few-Shot Learning

MetaDT: Meta Decision Tree with Class Hierarchy for Interpretable Few-Shot Learning ( http://arxiv.org/abs/2203.01482v2 )

License: see link

Baoquan Zhang, Hao Jiang, Xutao Li, Shanshan Feng, Yunming Ye, Rui Ye

Few-Shot Learning (FSL) is a challenging task that aims to recognize novel classes from few examples. Recently, many methods have been proposed from the perspectives of meta-learning and representation learning. However, few works focus on the interpretability of the FSL decision process. In this paper, we take a step towards interpretable FSL by proposing a novel meta-learning based decision tree framework, namely, MetaDT. In particular, FSL interpretability is achieved from two aspects, i.e., a concept aspect and a visual aspect. On the concept aspect, we first introduce a tree-like concept hierarchy as an FSL prior. Then, resorting to the prior, we split each few-shot task into a set of subtasks with different concept levels and then perform class prediction via a decision tree model. The advantage of such a design is that a sequence of high-level concept decisions leading up to the final class prediction can be obtained, which clarifies the FSL decision process. On the visual aspect, a set of subtask-specific classifiers with a visual attention mechanism is designed to make the decision at each node of the decision tree. As a result, a subtask-specific heatmap visualization can be obtained, providing decision interpretability for each tree node. Finally, to alleviate the data scarcity issue of FSL, we regard the concept hierarchy prior as an undirected graph, and then design a graph convolution-based decision tree inference network as our meta-learner to infer the parameters of the decision tree. Extensive experiments on performance comparison and interpretability analysis show the superiority of MetaDT.
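
The concept-level interpretability comes from reading off the path of decisions from the root of the hierarchy to a leaf class. A toy Python sketch of that traversal, assuming each node's score is already available; in MetaDT those decisions come from subtask-specific classifiers with visual attention whose parameters a graph-convolutional meta-learner infers, none of which is modeled here. The hierarchy and scores are hypothetical.

```python
from typing import Dict, List


def predict_with_path(tree: Dict[str, List[str]], scores: Dict[str, float], root: str) -> List[str]:
    """Walk from the root to a leaf, choosing the highest-scoring child at each node.

    The returned path is the human-readable sequence of concept decisions.
    """
    path = [root]
    node = root
    while tree.get(node):  # leaves have no entry (or an empty child list)
        node = max(tree[node], key=lambda child: scores[child])
        path.append(node)
    return path


hierarchy = {"entity": ["animal", "vehicle"], "animal": ["dog", "cat"], "vehicle": ["car", "bus"]}
node_scores = {"animal": 0.8, "vehicle": 0.2, "dog": 0.3, "cat": 0.7, "car": 0.5, "bus": 0.5}
print(predict_with_path(hierarchy, node_scores, "entity"))  # ['entity', 'animal', 'cat']
```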

Translation date: 2023-07-27 16:42:57 Publication date: 2023-07-26

# Model Comparison and Calibration Assessment: User Guide for Consistent Scoring Functions in Machine Learning and Actuarial Practice

Model Comparison and Calibration Assessment: User Guide for Consistent Scoring Functions in Machine Learning and Actuarial Practice ( http://arxiv.org/abs/2202.12780v3 )

License: see link

Tobias Fissler, Christian Lorentzen, Michael Mayer

One of the main tasks of actuaries and data scientists is to build good predictive models for certain phenomena such as the claim size or the number of claims in insurance. These models ideally exploit given feature information to enhance the accuracy of prediction. This user guide revisits and clarifies statistical techniques to assess the calibration or adequacy of a model on the one hand, and to compare and rank different models on the other hand. In doing so, it emphasises the importance of specifying the prediction target functional at hand a priori (e.g. the mean or a quantile) and of choosing the scoring function in model comparison in line with this target functional. Guidance for the practical choice of the scoring function is provided. Striving to bridge the gap between science and daily practice in application, it focuses mainly on the pedagogical presentation of existing results and of best practice. The results are accompanied and illustrated by two real data case studies on workers' compensation and customer churn.
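
The guide's central point, that the scoring function must be consistent for the chosen target functional, can be checked numerically. In this minimal Python sketch, squared error is minimized by the sample mean and the pinball (quantile) loss at level 0.5 by the sample median, so the two scores favor different constant forecasts on a skewed sample. The sample and the candidate grid are made up for illustration.

```python
from statistics import mean
from typing import Callable, List


def squared_error(pred: float, y: float) -> float:
    """Consistent scoring function for the mean."""
    return (pred - y) ** 2


def pinball_loss(pred: float, y: float, alpha: float) -> float:
    """Consistent scoring function for the alpha-quantile."""
    indicator = 1.0 if y < pred else 0.0
    return (indicator - alpha) * (pred - y)


def best_constant(ys: List[float], score: Callable[[float, float], float], candidates: List[float]) -> float:
    """Constant forecast with the lowest average score on the sample."""
    return min(candidates, key=lambda c: mean(score(c, y) for y in ys))


ys = [1.0, 2.0, 2.0, 3.0, 12.0]             # skewed sample: mean 4.0, median 2.0
candidates = [i / 10 for i in range(151)]   # 0.0, 0.1, ..., 15.0
print(best_constant(ys, squared_error, candidates))                        # 4.0
print(best_constant(ys, lambda c, y: pinball_loss(c, y, 0.5), candidates))  # 2.0
```

Ranking models by a score that does not match the target functional (for example, judging a median forecast by squared error) can therefore favor the wrong model.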

Translation date: 2023-07-27 16:42:34 Publication date: 2023-07-26

# Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge

Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge ( http://arxiv.org/abs/2202.12645v2 )

License: see link

Javier Chiyah-Garcia and Alessandro Suglia and José Lopes and Arash Eshghi and Helen Hastie

Anaphoric expressions, such as pronouns and referential descriptions, are situated with respect to the linguistic context of prior turns as well as the immediate visual environment. However, a speaker's referential descriptions do not always uniquely identify the referent, leading to ambiguities in need of resolution through subsequent clarificational exchanges. Thus, effective Ambiguity Detection and Coreference Resolution are key to task success in Conversational AI. In this paper, we present models for these two tasks as part of the SIMMC 2.0 Challenge (Kottur et al. 2021). Specifically, we use TOD-BERT and LXMERT based models, compare them to a number of baselines and provide ablation experiments. Our results show that (1) language models are able to exploit correlations in the data to detect ambiguity; and (2) unimodal coreference resolution models can avoid the need for a vision component, through the use of smart object representations.

Translation date: 2023-07-27 16:42:17 Publication date: 2023-07-26

# Priming Cross-Session Motor Imagery Classification with A Universal Deep Domain Adaptation Framework

Priming Cross-Session Motor Imagery Classification with A Universal Deep Domain Adaptation Framework ( http://arxiv.org/abs/2202.09559v2 )

License: see link

Zhengqing Miao, Xin Zhang, Carlo Menon, Yelong Zheng, Meirong Zhao, Dong Ming

Motor imagery (MI) is a common brain computer interface (BCI) paradigm. Because EEG is non-stationary and has a low signal-to-noise ratio, classifying motor imagery tasks of the same participant from different EEG recording sessions is generally challenging, as the EEG data distribution may vary tremendously among acquisition sessions. Although it is intuitive to consider cross-session MI classification as a domain adaptation problem, the rationale and a feasible approach have not been elucidated. In this paper, we propose a Siamese deep domain adaptation (SDDA) framework for cross-session MI classification based on mathematical models in domain adaptation theory. The proposed framework can be easily applied to most existing artificial neural networks without altering the network structure, which gives our method great flexibility and transferability. In the proposed framework, domain invariants were first constructed jointly with channel normalization and Euclidean alignment. Then, embedding features from the source and target domains were mapped into the Reproducing Kernel Hilbert Space (RKHS) and aligned accordingly. A cosine-based center loss was also integrated into the framework to improve the generalizability of SDDA. The proposed framework was validated with two classic and popular convolutional neural networks from the BCI research field (EEGNet and ConvNet) on two public MI-EEG datasets (BCI Competition IV IIA, IIB). Compared to the vanilla EEGNet and ConvNet, the proposed SDDA framework boosted MI classification accuracy by 15.2% and 10.2%, respectively, on the IIA dataset, and by 5.5% and 4.2% on the IIB dataset. The final MI classification accuracy reached 82.01% on the IIA dataset and 87.52% on IIB, outperforming the state-of-the-art methods in the literature.
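
Aligning source and target embeddings in an RKHS is commonly done by minimizing the maximum mean discrepancy (MMD) between them. The sketch below computes a biased estimate of squared MMD with a Gaussian kernel in plain Python. The kernel, bandwidth, and function names are assumptions, since the abstract does not state which kernel or estimator SDDA uses; channel normalization and Euclidean alignment are not shown.

```python
import math
from typing import List, Sequence


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * sigma ** 2))


def mmd2(source: List[Sequence[float]], target: List[Sequence[float]], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD: E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]."""
    def avg_kernel(xs, ys):
        return sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return avg_kernel(source, source) + avg_kernel(target, target) - 2 * avg_kernel(source, target)


source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near = [[x + 0.5, y] for x, y in source]
far = [[x + 3.0, y + 3.0] for x, y in source]
print(mmd2(source, near), mmd2(source, far))  # the far target has the larger discrepancy
```

Used as a training loss, this quantity pushes the two sets of embeddings toward the same distribution in the kernel's feature space.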

Translation date: 2023-07-27 16:42:01 Publication date: 2023-07-26

# Estimating large causal polytrees from small samples

Estimating large causal polytrees from small samples ( http://arxiv.org/abs/2209.07028v2 )

License: see link

Sourav Chatterjee, Mathukumalli Vidyasagar

We consider the problem of estimating a large causal polytree from a relatively small i.i.d. sample. This is motivated by the problem of determining causal structure when the number of variables is very large compared to the sample size, such as in gene regulatory networks. We give an algorithm that recovers the tree with high accuracy in such settings. The algorithm works under essentially no distributional or modeling assumptions other than some mild non-degeneracy conditions.
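
The abstract does not describe the algorithm. For orientation only, this Python sketch shows a classic related step, the Chow-Liu skeleton: a maximum-weight spanning tree over pairwise dependence scores. It uses absolute Pearson correlation as the score (for jointly Gaussian data, mutual information is -½ log(1 - ρ²), which orders pairs the same way as |ρ|), it omits edge orientation, and the simulated chain data are hypothetical. It is not the authors' method.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spanning_tree(data: Dict[str, List[float]]) -> List[Tuple[str, str]]:
    """Kruskal's algorithm on |correlation| edge weights (undirected skeleton only)."""
    parent = {v: v for v in data}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = sorted(
        ((abs(pearson(data[a], data[b])), a, b) for a, b in combinations(data, 2)),
        reverse=True,
    )
    tree = []
    for _, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge does not create a cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# Simulated chain x -> y -> z with Gaussian noise (hypothetical data).
random.seed(0)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]
z = [yi + random.gauss(0, 1) for yi in y]
print(max_spanning_tree({"x": x, "y": y, "z": z}))  # expected skeleton: x - y - z
```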

Translation date: 2023-07-27 16:35:03 Publication date: 2023-07-26

# On the design of photonic quantum circuits

On the design of photonic quantum circuits ( http://arxiv.org/abs/2209.06069v4 )

License: see link

Yuan Yao, Filippo Miatto, and Nicolás Quesada

We propose a framework to design and optimize generic photonic quantum circuits composed of Gaussian objects (pure and mixed Gaussian states, Gaussian unitaries, Gaussian channels, Gaussian measurements) as well as non-Gaussian effects such as photon-number-resolving measurements. In this framework, we parametrize a phase space representation of Gaussian objects using elements of the symplectic group (or the unitary or orthogonal group in special cases), and then we transform it into the Fock representation using a single linear recurrence relation that computes the Fock amplitudes of any Gaussian object recursively. We also compute the gradient of the Fock amplitudes with respect to phase space parameters by differentiating through the recurrence relation. We can then use Riemannian optimization on the symplectic group to optimize M-mode Gaussian objects, avoiding the need to commit to particular realizations in terms of fundamental gates. This allows us to "mod out" all the different gate-level implementations of the same circuit, which now can be chosen after the optimization has completed. This can be especially useful when looking to answer general questions, such as bounding the value of a property over a class of states or transformations, or when one would like to worry about hardware constraints separately from the circuit optimization step. Finally, we make our framework extendable to non-Gaussian objects that can be written as linear combinations of Gaussian ones, by explicitly computing the change in global phase when states undergo Gaussian transformations. We implemented all of these methods in the freely available open-source library MrMustard, which we use in three examples: optimizing the 216-mode interferometer in Borealis, and optimizing 2- and 3-mode circuits (with Fock measurements) to produce cat states and cubic phase states.
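
The recurrence idea is easiest to see for a single mode. This minimal Python sketch computes the Fock amplitudes of a single-mode squeezed vacuum with real squeezing parameter r and zero phase, using the two-term recurrence c_{n+2} = -tanh(r) sqrt((n+1)/(n+2)) c_n starting from c_0 = 1/sqrt(cosh r). It is a special case chosen for illustration, not MrMustard's multimode recurrence or API, and the sign convention for squeezing is an assumption.

```python
import math
from typing import List


def squeezed_vacuum_amplitudes(r: float, cutoff: int) -> List[float]:
    """Fock amplitudes c_0 .. c_{cutoff-1} of a single-mode squeezed vacuum (real r, zero phase).

    Uses c_0 = 1/sqrt(cosh r) and c_{n+2} = -tanh(r) * sqrt((n + 1) / (n + 2)) * c_n;
    odd amplitudes stay zero because c_1 = 0. Requires cutoff >= 1.
    """
    c = [0.0] * cutoff
    c[0] = 1.0 / math.sqrt(math.cosh(r))
    for n in range(cutoff - 2):
        c[n + 2] = -math.tanh(r) * math.sqrt((n + 1) / (n + 2)) * c[n]
    return c


amps = squeezed_vacuum_amplitudes(0.5, cutoff=80)
probs = [a * a for a in amps]
print(sum(probs))                                                     # ~1.0: normalized
print(sum(n * p for n, p in enumerate(probs)), math.sinh(0.5) ** 2)  # both ~0.2715
```

The two checks, unit norm and a mean photon number of sinh²(r), are standard properties of the squeezed vacuum and confirm that the recurrence reproduces the closed-form state.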

Translation date: 2023-07-27 16:34:58 Publication date: 2023-07-26

# AudioLM: a Language Modeling Approach to Audio Generation

AudioLM: a Language Modeling Approach to Audio Generation ( http://arxiv.org/abs/2209.03143v2 )

License: see link

Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour

We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis. By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts. When trained on speech, and without any transcript or annotation, AudioLM generates syntactically and semantically plausible speech continuations while also maintaining speaker identity and prosody for unseen speakers. Furthermore, we demonstrate how our approach extends beyond speech by generating coherent piano music continuations, despite being trained without any symbolic representation of music.
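
The codec side of the hybrid scheme yields several discrete codes per frame. Assuming the codec uses residual vector quantization (the quantizer of neural codecs such as SoundStream; the abstract itself does not name the quantizer), the following Python sketch encodes a 2-D frame with two hand-picked codebooks, where each stage quantizes what the previous stages left over. Real codebooks are learned and much larger; the values here are hypothetical.

```python
from typing import List, Sequence

Codebook = List[Sequence[float]]


def nearest_index(codebook: Codebook, v: Sequence[float]) -> int:
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def rvq_encode(v: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Each stage quantizes the residual left over by the previous stages."""
    residual = list(v)
    codes = []
    for codebook in codebooks:
        idx = nearest_index(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: List[int], codebooks: List[Codebook]) -> List[float]:
    """Sum the selected codewords; using fewer stages gives a coarser reconstruction."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


codebooks = [
    [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)],  # coarse stage
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],  # refinement stage
]
frame = (5.4, 3.6)
codes = rvq_encode(frame, codebooks)
print(codes, rvq_decode(codes, codebooks))  # [3, 1] [5.0, 4.0]
```

Decoding only the first code gives the coarser [4.0, 4.0]; each additional stage refines the reconstruction.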

Translation date: 2023-07-27 16:34:24 Publication date: 2023-07-26

# Classical models may be a better explanation of the Jiuzhang 1.0 Gaussian Boson Sampler than its targeted squeezed light model

Classical models may be a better explanation of the Jiuzhang 1.0 Gaussian Boson Sampler than its targeted squeezed light model ( http://arxiv.org/abs/2207.10058v5 )

License: see link

Javier Martínez-Cifuentes, K. M. Fonseca-Romero, Nicolás Quesada

Recently, Zhong et al. performed landmark Gaussian boson sampling experiments with up to 144 modes using threshold detectors. The authors claim to have achieved quantum computational advantage with the implementation of these experiments, named Jiuzhang 1.0 and Jiuzhang 2.0. Their experimental results are validated against several classical hypotheses and adversaries using tests such as the comparison of statistical correlations between modes, Bayesian hypothesis testing and the Heavy Output Generation (HOG) test. We propose an alternative classical hypothesis for the validation of these experiments using the probability distribution of mixtures of coherent states sent into a lossy interferometer; these input mixed states, which we term squashed states, have vacuum fluctuations in one quadrature and excess fluctuations in the other. We find that for configurations in the high photon number density regime, the comparison of statistical correlations does not tell apart the ground truth of the experiment (two-mode squeezed states sent into an interferometer) from our alternative hypothesis. The Bayesian test indicates that, for all configurations except Jiuzhang 1.0, the ground truth is a more likely explanation of the experimental data than our alternative hypothesis. A similar result is obtained for the HOG test: for all configurations of Jiuzhang 2.0, the test indicates that the experimental samples have higher ground truth probability than the samples obtained from our alternative distribution; for Jiuzhang 1.0 the test is inconclusive. Our results provide a new hypothesis that should be considered in the validation of future GBS experiments, and shed light on the need to identify proper metrics to verify quantum advantage in the context of GBS. They also indicate that a classical explanation of the Jiuzhang 1.0 experiment, lacking any quantum features, has not been ruled out.
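
A small numerical picture of the difference between squeezed and squashed states, in plain Python. Assumptions: zero-mean single-mode states with diagonal covariance matrices in units where the vacuum variance is 1 (ħ = 2), and a squashed state whose excess variance is chosen so that its mean photon number matches the squeezed state's; the paper's exact parametrization may differ. A Gaussian state is a mixture of coherent states exactly when V - I is positive semidefinite, which for a diagonal V means both quadrature variances are at least 1.

```python
import math


def mean_photon_number(var_x: float, var_p: float) -> float:
    """Zero-mean single-mode Gaussian state, diagonal covariance, vacuum variance = 1."""
    return (var_x + var_p) / 4 - 0.5


def is_coherent_mixture(var_x: float, var_p: float) -> bool:
    """V - I >= 0 for a diagonal covariance: both quadrature variances >= vacuum."""
    return var_x >= 1.0 and var_p >= 1.0


r = 1.0
squeezed = (math.exp(-2 * r), math.exp(2 * r))  # below-vacuum noise in x
squashed = (1.0, 1.0 + 4 * math.sinh(r) ** 2)   # vacuum in x, excess in p

print(mean_photon_number(*squeezed), mean_photon_number(*squashed))  # both ~1.381 (= sinh(1)**2)
print(is_coherent_mixture(*squeezed), is_coherent_mixture(*squashed))  # False True
```

Both states carry the same mean photon number, but only the squashed one is classical, which is why it can serve as a classical alternative hypothesis.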

Translation date: 2023-07-27 16:33:39 Publication date: 2023-07-26

# FedIIC: Towards Robust Federated Learning for Class-Imbalanced Medical Image Classification

FedIIC: Towards Robust Federated Learning for Class-Imbalanced Medical Image Classification ( http://arxiv.org/abs/2206.13803v3 )

License: see link

Nannan Wu, Li Yu, Xin Yang, Kwang-Ting Cheng, and Zengqiang Yan

Federated learning (FL), training deep models from decentralized data without privacy leakage, has shown great potential in medical image computing recently. However, considering the ubiquitous class imbalance in medical data, FL can exhibit performance degradation, especially for minority classes (e.g. rare diseases). Existing methods towards this problem mainly focus on training a balanced classifier to eliminate class prior bias among classes, but neglect to explore better representation to facilitate classification performance. In this paper, we present a privacy-preserving FL method named FedIIC to combat class imbalance from two perspectives: feature learning and classifier learning. In feature learning, two levels of contrastive learning are designed to extract better class-specific features with imbalanced data in FL. In classifier learning, per-class margins are dynamically set according to real-time difficulty and class priors, which helps the model learn classes equally. Experimental results on publicly-available datasets demonstrate the superior performance of FedIIC in dealing with both real-world and simulated multi-source medical imaging data under class imbalance. Code is available at https://github.com/wnn2000/FedIIC.
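
FedIIC sets per-class margins from real-time difficulty and class priors. The sketch below shows only the class-prior part, using the label-distribution-aware margin (LDAM) rule, with margins proportional to n_c^(-1/4), as a swapped-in stand-in. The difficulty term, the federated setting, and the contrastive feature learning are not modeled, and the `max_margin` value is an assumption.

```python
import math
from typing import List


def ldam_margins(class_counts: List[int], max_margin: float = 0.5) -> List[float]:
    """Per-class margins proportional to n_c ** -0.25, scaled so the rarest class gets max_margin."""
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * m for m in raw]


def margin_cross_entropy(logits: List[float], label: int, margins: List[float]) -> float:
    """Softmax cross-entropy after subtracting the true class's margin from its logit."""
    adjusted = list(logits)
    adjusted[label] -= margins[label]
    top = max(adjusted)  # subtract the max for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in adjusted))
    return log_sum_exp - adjusted[label]


margins = ldam_margins([1000, 10])  # class 1 is the minority class
print(margins)
print(margin_cross_entropy([2.0, 2.0], 1, margins))  # larger than log(2) ~ 0.693
```

Because the minority class receives the larger margin, the model must separate its examples more confidently before the loss stops pushing, which counteracts the bias toward frequent classes.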

Translation date: 2023-07-27 16:32:41 Publication date: 2023-07-26

# Beyond the Edge of Stability via Two-step Gradient Updates

Beyond the Edge of Stability via Two-step Gradient Updates ( http://arxiv.org/abs/2206.04172v3 )

License: see link

Lei Chen, Joan Bruna

Gradient Descent (GD) is a powerful workhorse of modern machine learning thanks to its scalability and efficiency in high-dimensional spaces. Its ability to find local minimisers is only guaranteed for losses with Lipschitz gradients, where it can be seen as a 'bona fide' discretisation of an underlying gradient flow. Yet, many ML setups involving overparametrised models do not fall into this problem class, which has motivated research beyond the so-called "Edge of Stability" (EoS), where the step size crosses the admissibility threshold inversely proportional to the Lipschitz constant above. Perhaps surprisingly, GD has been empirically observed to still converge regardless of local instability and oscillatory behavior. The incipient theoretical analysis of this phenomenon has mainly focused on the overparametrised regime, where the effect of choosing a large learning rate may be associated with a 'Sharpness-Minimisation' implicit regularisation within the manifold of minimisers, under appropriate asymptotic limits. In contrast, in this work we directly examine the conditions for such unstable convergence, focusing on simple, yet representative, learning problems, via analysis of two-step gradient updates. Specifically, we characterize a local condition involving third-order derivatives that guarantees the existence of, and convergence to, fixed points of the two-step updates, and leverage this property in a teacher-student setting under population loss. Finally, starting from Matrix Factorization, we provide observations of period-2 orbits of GD in high-dimensional settings, with intuition for their dynamics, along with an exploration of more general settings.
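
The step-size threshold is easiest to see on a one-dimensional quadratic f(x) = (L/2)x², whose gradient has Lipschitz constant L. In this minimal Python sketch, GD contracts for η < 2/L, sits on an exact period-2 orbit at η = 2/L, and diverges above it; the two-step map x_{t+2} = (1 - ηL)² x_t makes this explicit. A quadratic has no third-order terms, so it cannot show the stabilized unstable convergence the paper analyzes; it only illustrates the threshold being crossed.

```python
from typing import List


def gd_on_quadratic(curvature: float, lr: float, x0: float, steps: int) -> List[float]:
    """Iterates of GD on f(x) = curvature / 2 * x**2, whose gradient is curvature * x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * curvature * xs[-1])
    return xs


def two_step_factor(curvature: float, lr: float) -> float:
    """x_{t+2} = (1 - lr * curvature)**2 * x_t for this quadratic."""
    return (1 - lr * curvature) ** 2


L = 1.0  # Lipschitz constant of the gradient, so the threshold is 2 / L = 2.0
print(gd_on_quadratic(L, 1.5, 1.0, 4))  # contracts: [1.0, -0.5, 0.25, -0.125, 0.0625]
print(gd_on_quadratic(L, 2.0, 1.0, 4))  # period-2 orbit: [1.0, -1.0, 1.0, -1.0, 1.0]
print(gd_on_quadratic(L, 2.5, 1.0, 4))  # diverges: magnitudes grow by 1.5 per step
```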

Translation date: 2023-07-27 16:32:18 Publication date: 2023-07-26

# TreeFlow: Going beyond Tree-based Gaussian Probabilistic Regression

TreeFlow: Going beyond Tree-based Gaussian Probabilistic Regression ( http://arxiv.org/abs/2206.04140v2 )

License: see link

Patryk Wielopolski, Maciej Zięba

Tree-based ensembles are known for their outstanding performance in classification and regression problems characterized by feature vectors represented by mixed-type variables from various ranges and domains. However, for regression problems, they are primarily designed to provide deterministic responses or to model the uncertainty of the output with a Gaussian or other parametric distribution. In this work, we introduce TreeFlow, a tree-based approach that combines the benefits of tree ensembles with the ability of normalizing flows to model flexible probability distributions. The main idea of the solution is to use a tree-based model as a feature extractor and combine it with a conditional variant of normalizing flow. Consequently, our approach is capable of modeling complex distributions for the regression outputs. We evaluate the proposed method on challenging regression benchmarks with varying volume, feature characteristics, and target dimensionality. We obtain state-of-the-art (SOTA) results for both probabilistic and deterministic metrics on datasets with multi-modal target distributions and competitive results on unimodal ones compared to tree-based regression baselines.

翻訳日:2023-07-27 16:31:42 公開日:2023-07-26

# 3次元小分子と高分子錯体のための効率的かつ正確な物理考慮型マルチプレックスグラフニューラルネットワーク

Efficient and Accurate Physics-aware Multiplex Graph Neural Networks for 3D Small Molecules and Macromolecule Complexes ( http://arxiv.org/abs/2206.02789v2 )

ライセンス: Link先を確認

Shuo Zhang, Yang Liu, Lei Xie

(参考訳) グラフニューラルネットワーク(GNN)を分子科学に適用する最近の進歩は、3次元(3D)構造表現をGNNで学習することの有効性を示している。しかし、既存のGNNのほとんどは、多様な相互作用のモデリング不足、計算コストの高い演算、ベクトル値の無視という限界を抱えている。そこで我々は、新しいGNNモデルである物理考慮型マルチプレックスグラフニューラルネットワーク(PaxNet)を提案し、低分子有機化合物と高分子錯体の両方について、3D分子の表現を効率的かつ正確に学習する。PaxNetは、分子力学に着想を得て局所的相互作用と非局所的相互作用のモデリングを分離し、コストの高い角度関連の計算を削減する。スカラー特性に加え、PaxNetは各原子に付随するベクトルを学習することでベクトル特性も予測できる。PaxNetの性能を評価するため、2つのタスクで最先端のベースラインと比較する。量子化学特性を予測する低分子データセットでは、PaxNetは最良のベースラインと比べて予測誤差を15%削減し、メモリ使用量を73%削減する。タンパク質-リガンド結合親和性を予測する高分子データセットでは、PaxNetはメモリ消費を33%、推論時間を85%削減しながら、最良のベースラインを上回る。したがって、PaxNetは分子の大規模機械学習のための汎用的で頑健かつ正確な手法を提供する。私たちのコードはhttps://github.com/zetayue/Physics-aware-Multiplex-GNNで利用可能です。

Recent advances in applying Graph Neural Networks (GNNs) to molecular science have showcased the power of learning three-dimensional (3D) structure representations with GNNs. However, most existing GNNs suffer from the limitations of insufficient modeling of diverse interactions, computationally expensive operations, and neglect of vectorial values. Here, we tackle these limitations by proposing a novel GNN model, Physics-aware Multiplex Graph Neural Network (PaxNet), to efficiently and accurately learn the representations of 3D molecules for both small organic compounds and macromolecule complexes. PaxNet separates the modeling of local and non-local interactions inspired by molecular mechanics, and reduces the expensive angle-related computations. Besides scalar properties, PaxNet can also predict vectorial properties by learning an associated vector for each atom. To evaluate the performance of PaxNet, we compare it with state-of-the-art baselines in two tasks. On a small-molecule dataset for predicting quantum chemical properties, PaxNet reduces the prediction error by 15% and uses 73% less memory than the best baseline. On a macromolecule dataset for predicting protein-ligand binding affinities, PaxNet outperforms the best baseline while reducing the memory consumption by 33% and the inference time by 85%. Thus, PaxNet provides a universal, robust and accurate method for large-scale machine learning of molecules. Our code is available at https://github.com/zetayue/Physics-aware-Multiplex-GNN.

翻訳日:2023-07-27 16:31:24 公開日:2023-07-26

# オプティカルフロー・ステレオ・深度推定の統一

Unifying Flow, Stereo and Depth Estimation ( http://arxiv.org/abs/2211.05783v3 )

ライセンス: Link先を確認

Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, Dacheng Tao, Andreas Geiger

(参考訳) 本稿では、オプティカルフロー、平行化ステレオマッチング、姿勢既知の画像からの非平行化ステレオ深度推定という3つの動きおよび3次元知覚タスクのための統一的な定式化とモデルを提案する。タスクごとに特化した従来のアーキテクチャとは異なり、我々は3つのタスクすべてを統一的な密な対応付け(マッチング)問題として定式化し、特徴の類似度を直接比較することで単一のモデルで解く。このような定式化には識別的な特徴表現が必要であり、我々はTransformer、特にクロスアテンション機構を用いてこれを実現する。クロスアテンションにより、ビュー間の相互作用を通じて他方の画像からの知識を統合でき、抽出される特徴の質が大幅に向上することを示す。モデルのアーキテクチャとパラメータはタスク間で共有されるため、統一モデルは自然にタスク間の転移を可能にする。統一モデルは難易度の高いSintelデータセットでRAFTを上回る。さらにタスク固有の改良ステップを少数追加した最終モデルは、モデル設計と推論速度の点でよりシンプルかつ効率的でありながら、10の一般的なフロー・ステレオ・深度データセットにおいて最新の最先端手法を上回るか同等の性能を示す。

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. We outperform RAFT with our unified model on the challenging Sintel dataset, and our final model that uses a few additional task-specific refinement steps outperforms or compares favorably to recent state-of-the-art methods on 10 popular flow, stereo and depth datasets, while being simpler and more efficient in terms of model design and inference speed.
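Here is a minimal sketch of "matching by directly comparing feature similarities" along a single rectified scanline. It uses hand-made one-hot features in place of the Transformer features the paper learns. Each left pixel takes a softmax over its similarities to every right pixel, and the expected matched position yields a disparity. The `temperature` parameter and the toy data are illustrative assumptions.

```python
import math


def softmax(xs: list) -> list:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def match_scanline(left_feats: list, right_feats: list, temperature: float = 1.0) -> list:
    # For each left pixel i: softmax over dot-product similarities with all right pixels,
    # then return the expected disparity i - E[j]
    disparities = []
    for i, f in enumerate(left_feats):
        scores = [sum(a * b for a, b in zip(f, g)) / temperature for g in right_feats]
        probs = softmax(scores)
        expected_j = sum(p * j for j, p in enumerate(probs))
        disparities.append(i - expected_j)
    return disparities


def one_hot(k: int, size: int) -> list:
    v = [0.0] * size
    v[k] = 1.0
    return v


if __name__ == "__main__":
    width, true_disp, n_points = 8, 2, 10
    left = [one_hot(i, n_points) for i in range(width)]               # left pixel i sees scene point i
    right = [one_hot(j + true_disp, n_points) for j in range(width)]  # right pixel j sees point j + 2
    disp = match_scanline(left, right, temperature=0.05)
    print([round(d, 3) for d in disp[true_disp:]])  # only pixels that have a valid match
```

Optical flow would use the same idea with a softmax over a 2D grid of candidate positions instead of one scanline.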

翻訳日:2023-07-27 16:25:08 公開日:2023-07-26

# 連合学習におけるクライアント選択：原則、課題、機会

Client Selection in Federated Learning: Principles, Challenges, and Opportunities ( http://arxiv.org/abs/2211.01549v2 )

ライセンス: Link先を確認

Lei Fu and Huanle Zhang and Ge Gao and Mi Zhang and Xin Liu

(参考訳) 機械学習(ML)モデルをトレーニングするためのプライバシ保護パラダイムとして、連合学習(FL)は産業界と学術界の両方から大きな注目を集めている。典型的なFLシナリオでは、クライアントはデータ分布とハードウェア構成の点で大きな異質性を示す。そのため、各トレーニングラウンドでクライアントをランダムにサンプリングすると、異質なクライアントからのローカル更新を十分に活用できず、モデル精度の低下、収束速度の低下、公平性の低下などを招く可能性がある。FLにおけるクライアントの異質性の問題に対処するため、様々なクライアント選択アルゴリズムが開発され、有望な性能改善が示されている。本稿では、FLクライアント選択という新興分野における最近の進歩と、その課題および研究機会を体系的に提示する。実務者がアプリケーションに最適なクライアント選択メカニズムを選ぶ助けとなり、また研究者や新規参入者がこのエキサイティングな研究トピックをより深く理解するきっかけとなることを期待する。

As a privacy-preserving paradigm for training Machine Learning (ML) models, Federated Learning (FL) has received tremendous attention from both industry and academia. In a typical FL scenario, clients exhibit significant heterogeneity in terms of data distribution and hardware configurations. Thus, randomly sampling clients in each training round may not fully exploit the local updates from heterogeneous clients, resulting in lower model accuracy, slower convergence rate, degraded fairness, etc. To tackle the FL client heterogeneity problem, various client selection algorithms have been developed, showing promising performance improvement. In this paper, we systematically present recent advances in the emerging field of FL client selection, along with its challenges and research opportunities. We hope to help practitioners choose the most suitable client selection mechanisms for their applications, and to inspire researchers and newcomers to better understand this exciting research topic.
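The sketch below contrasts uniform random sampling with one loss-aware heuristic from the literature, a "power-of-choice" style rule. It is one example of the algorithm family the survey covers, not a method the survey prescribes. The per-client losses are hypothetical and would in practice be reported by the clients.

```python
import random


def random_selection(clients: list, m: int, rng: random.Random) -> list:
    # Baseline: sample m clients uniformly at random without replacement
    return rng.sample(clients, m)


def power_of_choice_selection(clients: list, local_loss: dict, m: int, d: int,
                              rng: random.Random) -> list:
    # Sample d >= m candidates uniformly, then keep the m with the highest reported local loss,
    # biasing each round towards clients the current global model fits worst
    if not m <= d <= len(clients):
        raise ValueError("need m <= d <= number of clients")
    candidates = rng.sample(clients, d)
    return sorted(candidates, key=lambda c: local_loss[c], reverse=True)[:m]


if __name__ == "__main__":
    rng = random.Random(0)
    clients = list(range(100))
    local_loss = {c: rng.random() for c in clients}  # hypothetical per-client losses
    print("random:", sorted(random_selection(clients, 5, rng)))
    print("power-of-choice:", sorted(power_of_choice_selection(clients, local_loss, 5, 20, rng)))
```

The parameter `d` trades off randomness against exploitation. Setting `d = m` recovers random selection, while `d = len(clients)` always picks the highest-loss clients, which can hurt fairness.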

翻訳日:2023-07-27 16:24:39 公開日:2023-07-26

# 拡散に基づく生成モデリングに対する最適制御の視点

An optimal control perspective on diffusion-based generative modeling ( http://arxiv.org/abs/2211.01364v2 )

ライセンス: Link先を確認

Julius Berner, Lorenz Richter, Karen Ullrich

(参考訳) 確率的最適制御と、近年開発された拡散確率モデルのような確率微分方程式(SDE)に基づく生成モデルとの間の関係を確立する。特に、基礎となるSDEの周辺分布の対数密度の時間発展を支配するハミルトン・ヤコビ・ベルマン方程式を導出する。この視点により、最適制御理論の手法を生成モデリングへ転用できる。まず、証拠下界(ELBO)が制御理論でよく知られた検証定理の直接的な帰結であることを示す。さらに、拡散に基づく生成モデリングを、経路空間上の適切な測度間のKullback-Leiblerダイバージェンスの最小化として定式化できる。最後に、統計学や計算科学で頻繁に生じる問題である、正規化されていない密度からのサンプリングのための拡散に基づく新しい手法を開発する。我々の時間反転拡散サンプラー(DIS)が、複数の数値例において他の拡散に基づくサンプリング手法を上回り得ることを示す。

We establish a connection between stochastic optimal control and generative models based on stochastic differential equations (SDEs), such as recently developed diffusion probabilistic models. In particular, we derive a Hamilton-Jacobi-Bellman equation that governs the evolution of the log-densities of the underlying SDE marginals. This perspective allows us to transfer methods from optimal control theory to generative modeling. First, we show that the evidence lower bound is a direct consequence of the well-known verification theorem from control theory. Further, we can formulate diffusion-based generative modeling as a minimization of the Kullback-Leibler divergence between suitable measures in path space. Finally, we develop a novel diffusion-based method for sampling from unnormalized densities, a problem frequently occurring in statistics and computational sciences. We demonstrate that our time-reversed diffusion sampler (DIS) can outperform other diffusion-based sampling approaches on multiple numerical examples.

翻訳日:2023-07-27 16:24:21 公開日:2023-07-26

# Teal: WANトラフィックエンジニアリングの学習による高速化最適化

Teal: Learning-Accelerated Optimization of WAN Traffic Engineering ( http://arxiv.org/abs/2210.13763v3 )

ライセンス: Link先を確認

Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, Minlan Yu

(参考訳) グローバルなクラウド広域ネットワーク(WAN)の急速な拡大により、商用最適化エンジンが大規模なネットワークトラフィックエンジニアリング(TE)問題を効率的に解くことが困難になっている。既存の高速化戦略はTE最適化を並行して解ける部分問題に分解するが、実行時間と割り当て性能の間に本質的なトレードオフがあるため、実現できる並列性は限られる。本稿では、GPUの並列処理能力を活用してTE制御を高速化する学習ベースのTEアルゴリズムTealを提案する。第1に、TealはWANの接続性とネットワークフローを捉えるフロー中心のグラフニューラルネットワーク(GNN)を設計し、下流の割り当てへの入力となるフロー特徴を学習する。第2に、問題規模を縮小して学習を扱いやすくするため、Tealは中央のTE目的関数を最適化しながら各トラフィック需要を独立に割り当てるマルチエージェント強化学習(RL)アルゴリズムを用いる。最後に、Tealは高度に並列化可能な最適化アルゴリズムであるADMM(Alternating Direction Method of Multipliers)を用いて割り当てを微調整し、過剰利用リンクなどの制約違反を削減する。MicrosoftのWANのトラフィック行列を用いてTealを評価する。1,700ノードを超える大規模WANトポロジにおいて、Tealは本番環境の最適化エンジンより数桁高速に動作しながら、ほぼ最適なフロー割り当てを生成する。他のTE高速化手法と比較して、Tealは6～32%多くのトラフィック需要を満たし、197～625倍の高速化を達成する。

The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale. Existing acceleration strategies decompose TE optimization into concurrent subproblems but realize limited parallelism due to an inherent tradeoff between run time and allocation performance. We present Teal, a learning-based TE algorithm that leverages the parallel processing power of GPUs to accelerate TE control. First, Teal designs a flow-centric graph neural network (GNN) to capture WAN connectivity and network flows, learning flow features as inputs to downstream allocation. Second, to reduce the problem scale and make learning tractable, Teal employs a multi-agent reinforcement learning (RL) algorithm to independently allocate each traffic demand while optimizing a central TE objective. Finally, Teal fine-tunes allocations with ADMM (Alternating Direction Method of Multipliers), a highly parallelizable optimization algorithm for reducing constraint violations such as overutilized links. We evaluate Teal using traffic matrices from Microsoft's WAN. On a large WAN topology with >1,700 nodes, Teal generates near-optimal flow allocations while running several orders of magnitude faster than the production optimization engine. Compared with other TE acceleration schemes, Teal satisfies 6--32% more traffic demand and yields 197--625x speedups.

翻訳日:2023-07-27 16:23:53 公開日:2023-07-26

# リングトラップにおける分子イオンの量子論理制御と精密測定-基礎対称性試験のための新しいアプローチ

Quantum logic control and precision measurements of molecular ions in a ring trap -- a new approach for testing fundamental symmetries ( http://arxiv.org/abs/2210.11613v2 )

ライセンス: Link先を確認

Yan Zhou, Joshua O. Island, Matt Grau

(参考訳) 本稿では、分割型リングイオントラップにおいて極性分子イオンの量子論理制御を可能にし、精密測定への道を開く新しいプラットフォームを提案する。このアプローチは、ほぼ完全な(near-unity)状態準備と検出、および長いスピンコヒーレンスの実現に焦点を当てる。特徴的な点は、静止系で行う状態準備と検出を、回転系でのパリティ選択的なスピン歳差運動から分離することにある。この手法は幅広いイオン種に適用でき、電子の電気双極子モーメントと核の磁気四極子モーメントの探索に用いられる予定である。

We present a new platform facilitating quantum logic control of polar molecular ions in a segmented ring ion trap, paving the way for precision measurements. This approach focuses on achieving near-unity state preparation and detection, as well as long spin coherence. A distinctive aspect lies in separating state preparation and detection conducted in a static frame, from parity-selective spin-precession in a rotating frame. This method can be applied to a wide range of ion species and will be used to search for the electron's electric dipole moment and the nuclear magnetic quadrupole moment.

翻訳日:2023-07-27 16:23:27 公開日:2023-07-26

# 最小公倍数のためのセキュアなマルチパーティ量子計算プロトコル

A Secure Multiparty Quantum Least Common Multiple Computation Protocol ( http://arxiv.org/abs/2210.08165v2 )

ライセンス: Link先を確認

Zixian Li and Wenjie Liu

(参考訳) 本稿では、ShorのQPA(量子周期発見アルゴリズム)に基づく、最小公倍数(LCM)のためのセキュアなマルチパーティ計算(SMC)プロトコルを提案する。本プロトコルは次の原理に基づく：複数の周期関数を組み合わせた関数もまた周期関数であり、その周期は個々の周期すべての最小公倍数にちょうど等しい。また、QPAは確率的アルゴリズムであるため、既存のセキュアなマルチパーティ量子和プロトコルに基づく一票否決型の投票プロトコルを提案し、提案するLCMプロトコルの結果の検証に用いる。セキュリティ分析により、半正直(semi-honest)モデルの下で提案プロトコルは高い確率で安全であり、計算コストは多項式計算量にとどまることが示される。本稿で提案するプロトコルは、LCMの効率的かつセキュアなマルチパーティ計算の問題を解決し、量子計算の可能性を示すものである。

In this paper, we present a secure multiparty computation (SMC) protocol for the least common multiple (LCM) based on Shor's quantum period-finding algorithm (QPA). Our protocol is based on the following principle: combining multiple periodic functions yields another periodic function whose period is exactly the least common multiple of the individual periods. Since QPA is a probabilistic algorithm, we also propose a one-vote-down vote protocol based on the existing secure multi-party quantum summation protocol, which is used to verify the results of the proposed LCM protocol. Security analysis shows that, under the semi-honest model, the proposed protocol is secure with high probability, while its computational cost remains polynomial. The protocol proposed in this paper solves the problem of efficient and secure multiparty computation of the LCM, demonstrating the potential of quantum computation.
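The underlying principle can be checked classically. The sketch below is not secure and uses brute-force search as a stand-in for Shor's quantum period finding. The combined function $F(x) = (x \bmod p_1, \dots, x \bmod p_k)$ and the divisibility-based vote are illustrative assumptions, not the paper's protocol.

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def joint_period(periods: list, search_limit: int = 100_000) -> int:
    # Classical brute-force stand-in for quantum period finding on the combined function
    # F(x) = (x mod p1, ..., x mod pk): return the smallest r > 0 with F(x + r) == F(x)
    def F(x: int) -> tuple:
        return tuple(x % p for p in periods)

    for r in range(1, search_limit + 1):
        # Checked on one full range of inputs; for this F a single x would already suffice
        if all(F(x + r) == F(x) for x in range(max(periods))):
            return r
    raise ValueError("no period found below search_limit")


def one_vote_down(candidate: int, private_periods: list) -> bool:
    # Each party votes "yes" only if its own period divides the candidate; one "no" rejects it.
    # This only confirms a common multiple and, unlike the paper's protocol, reveals the votes.
    return all(candidate % p == 0 for p in private_periods)


if __name__ == "__main__":
    periods = [4, 6, 10]  # hypothetical private inputs of three parties
    r = joint_period(periods)
    print(r, reduce(lcm, periods), one_vote_down(r, periods))  # prints: 60 60 True
```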

翻訳日:2023-07-27 16:23:14 公開日:2023-07-26

# リモートセンシングと機械学習によるキクイムシ被害の早期検出：レビュー

Early Detection of Bark Beetle Attack Using Remote Sensing and Machine Learning: A Review ( http://arxiv.org/abs/2210.03829v2 )

ライセンス: Link先を確認

Seyed Mojtaba Marvasti-Zadeh, Devin Goodsman, Nilanjan Ray, Nadir Erbilgin

(参考訳) 本論文では、キクイムシによる樹木枯死の早期検出について、キクイムシと宿主の相互作用、リモートセンシング(RS)、機械学習/深層学習(ML/DL)という3つの主要な視点から、過去および現在の進展を包括的にレビューする。これまでの取り組みとは対照的に、本レビューはすべてのRSシステムを網羅し、ML/DL手法に重点を置いてその長所と短所を調査する。マルチスペクトルまたはハイパースペクトル解析に基づいて既存の文献を分析し、その知見を次の観点から整理した：キクイムシの種と攻撃段階(特に攻撃の初期段階に重点を置く)、宿主木、研究地域、RSプラットフォームとセンサ、スペクトル/空間/時間分解能、スペクトルシグネチャ、スペクトル植生指数(SVI)、MLアプローチ、学習スキーム、タスクカテゴリ、モデル、アルゴリズム、クラス/クラスタ、特徴量、DLネットワークとアーキテクチャ。DLベースの手法とランダムフォレスト(RF)アルゴリズムは有望な結果を示し、可視、熱、短波赤外(SWIR)スペクトル領域にわたる微妙な変化を検出する可能性を示した。しかし、その有効性はなお限定的で不確実性も高い。これらの欠点に対する新たな解決策を促すため、様々な視点から主要な課題と機会を掘り下げ、研究の現状をより深く理解し、今後の研究方向を導く。

This paper provides a comprehensive review of past and current advances in the early detection of bark beetle-induced tree mortality from three primary perspectives: bark beetle & host interactions, RS, and ML/DL. In contrast to prior efforts, this review encompasses all RS systems and emphasizes ML/DL methods to investigate their strengths and weaknesses. We parse existing literature based on multi- or hyper-spectral analyses and distill their knowledge based on: bark beetle species & attack phases with a primary emphasis on early stages of attacks, host trees, study regions, RS platforms & sensors, spectral/spatial/temporal resolutions, spectral signatures, spectral vegetation indices (SVIs), ML approaches, learning schemes, task categories, models, algorithms, classes/clusters, features, and DL networks & architectures. Although DL-based methods and the random forest (RF) algorithm showed promising results, highlighting their potential to detect subtle changes across visible, thermal, and short-wave infrared (SWIR) spectral regions, they still have limited effectiveness and high uncertainties. To inspire novel solutions to these shortcomings, we delve into the principal challenges & opportunities from different perspectives, enabling a deeper understanding of the current state of research and guiding future research directions.
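As a concrete example of the spectral vegetation indices (SVIs) the review catalogues, the two standard normalized-difference indices below use the red, near-infrared (NIR) and SWIR bands mentioned above. The reflectance values are hypothetical. Which indices work best for early attack stages varies across the reviewed studies.

```python
def normalized_difference(a: float, b: float) -> float:
    # Generic normalized-difference index (a - b) / (a + b); lies in [-1, 1] for non-negative reflectances
    if a + b == 0:
        raise ValueError("reflectances must not sum to zero")
    return (a - b) / (a + b)


def ndvi(nir: float, red: float) -> float:
    # Normalized Difference Vegetation Index: green vegetation vigour, from near-infrared and red bands
    return normalized_difference(nir, red)


def ndmi(nir: float, swir1: float) -> float:
    # Normalized Difference Moisture Index: canopy water content, from near-infrared and SWIR bands
    return normalized_difference(nir, swir1)


if __name__ == "__main__":
    # Hypothetical surface reflectances (NIR, red, SWIR1) for a healthy and a stressed crown
    crowns = {"healthy": (0.45, 0.05, 0.20), "stressed": (0.35, 0.08, 0.26)}
    for name, (nir, red, swir1) in crowns.items():
        print(f"{name}: NDVI={ndvi(nir, red):.3f}, NDMI={ndmi(nir, swir1):.3f}")
```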

翻訳日:2023-07-27 16:22:45 公開日:2023-07-26

# Factor Fields: ニューラルフィールドとその先のための統一フレームワーク

Factor Fields: A Unified Framework for Neural Fields and Beyond ( http://arxiv.org/abs/2302.01226v2 )

ライセンス: Link先を確認

Anpei Chen, Zexiang Xu, Xinyue Wei, Siyu Tang, Hao Su, Andreas Geiger

(参考訳) 信号のモデル化と表現のための新しいフレームワークであるFactor Fieldsを提案する。Factor Fieldsは信号を因子の積に分解し、各因子は座標変換された入力信号上で動作するニューラルまたは通常のフィールド表現で表される。この分解により、NeRF、Plenoxels、EG3D、Instant-NGP、TensoRFなど最近の信号表現を一般化する統一フレームワークが得られることを示す。さらに、このフレームワークにより、本論文で提案する係数-基底分解(Coefficient-Basis Factorization, CoBaFa)のような強力な新しい信号表現を作ることができる。実験で示すように、CoBaFaは、ニューラル信号表現における3つの重要な目標である近似品質、コンパクト性、効率の観点から、従来の高速再構成手法を改善する。実験により、従来の高速再構成手法と比較して、2D画像回帰タスクではより高い画像近似品質を、3D符号付き距離場の再構成ではより高い幾何的品質を、放射場再構成タスクではより高いコンパクト性を達成することを示す。さらに、CoBaFa表現はトレーニング中に信号間で基底を共有することで汎化を可能にし、疎な観測からの画像回帰やfew-shotの放射場再構成といった汎化タスクを実現する。プロジェクトページ: https://apchenstu.github.io/FactorFields/

We present Factor Fields, a novel framework for modeling and representing signals. Factor Fields decomposes a signal into a product of factors, each of which is represented by a neural or regular field representation operating on a coordinate-transformed input signal. We show that this decomposition yields a unified framework that generalizes several recent signal representations including NeRF, Plenoxels, EG3D, Instant-NGP, and TensoRF. Moreover, the framework allows for the creation of powerful new signal representations, such as the Coefficient-Basis Factorization (CoBaFa) which we propose in this paper. As evidenced by our experiments, CoBaFa leads to improvements over previous fast reconstruction methods in terms of the three critical goals in neural signal representation: approximation quality, compactness and efficiency. Experimentally, we demonstrate that our representation achieves better image approximation quality on 2D image regression tasks, higher geometric quality when reconstructing 3D signed distance fields and higher compactness for radiance field reconstruction tasks compared to previous fast reconstruction methods. In addition, our CoBaFa representation generalizes across signals by sharing the basis during training, which enables tasks such as image regression with sparse observations and few-shot radiance field reconstruction. Project Page: https://apchenstu.github.io/FactorFields/

翻訳日:2023-07-27 16:14:50 公開日:2023-07-26

# 量子力学の非線形拡張における非シグナリング

No-signaling in Nonlinear Extensions of Quantum Mechanics ( http://arxiv.org/abs/2301.11548v2 )

ライセンス: Link先を確認

Rohit Kishan Ray, Gian Paolo Beretta

(参考訳) 量子力学の非線形拡張を考案することは、超光速通信(シグナリング)のような非物理的な特徴を排除しなければならないため、自明ではない。このレターでは、最急エントロピー上昇形式が実行可能な非シグナリング拡張であることを示す。この形式は、部分系の局所的な時間発展が必ずしもその縮約状態のみに依存する必要はない、より広いクラスの非シグナリング非線形発展方程式に属する。局所的な縮約密度演算子に加えて、「局所知覚(local perceptions)」と呼ばれる幅広いクラスの局所演算子が存在し、それらは他の相互作用しない系に局在したユニタリ操作の影響を受けないことを証明する。

Devising a nonlinear extension of quantum mechanics is nontrivial because unphysical features such as superluminal communication (signaling) must be excluded. In this Letter, we show that the steepest entropy ascent formalism is a viable no-signaling extension belonging to a broader class of no-signaling nonlinear evolution equations for which the local evolution of a subsystem is not necessarily bound to depend only on its reduced state. We prove that, in addition to the local reduced density operator, there is a broad class of local operators called `local perceptions', which are insensitive to unitary operations localized within other non-interacting systems.

翻訳日:2023-07-27 16:14:28 公開日:2023-07-26

# 量子コンピュータ上のスレーター行列式と相関状態の効率的な調製のための浅量子回路

Shallow quantum circuits for efficient preparation of Slater determinants and correlated states on a quantum computer ( http://arxiv.org/abs/2301.07477v5 )

ライセンス: Link先を確認

Chong Hian Chee, Daniel Leykam, Adrian M. Mak, Dimitris G. Angelakis

(参考訳) フェルミオンのアンザッツ状態の調製は、量子化学や物性物理への応用のための変分量子固有値ソルバーなど、多くの量子アルゴリズムにおける重要なサブルーチンである。これまでのところ、スレーター行列式と相関状態の調製に必要な最も浅い回路の深さは、系のサイズ$N$に対して少なくとも線形にスケールする。量子機械学習のために開発されたデータローディング回路に着想を得て、我々は代替パラダイムを提案する。このパラダイムは、$d$個のフェルミオンからなるそのような状態を調製するための、より浅くかつスケーラブルな${\mathcal{O}}(d \log_2^2 N)$の2量子ビットゲート深さの回路を与える。これは第二量子化における既存の手法に対して$N$について劣指数的な削減をもたらし、近い将来の量子デバイス上で、より大きな基底関数系を用いた$d{\ll}{\mathcal{O}}{\left(N / \log_2^2 N\right)}$個のフェルミオン系の高精度な研究を可能にする。

Fermionic ansatz state preparation is a critical subroutine in many quantum algorithms such as the Variational Quantum Eigensolver for quantum chemistry and condensed matter applications. To date, the depth of the shallowest circuits for preparing Slater determinants and correlated states scales at least linearly with the system size $N$. Inspired by data-loading circuits developed for quantum machine learning, we propose an alternate paradigm that provides shallower, yet scalable ${\mathcal{O}}(d \log_2^2N)$ two-qubit gate depth circuits to prepare such states with $d$ fermions, offering a subexponential reduction in $N$ over existing approaches in second quantization, enabling high-accuracy studies of $d{\ll}{\mathcal{O}}{\left(N / \log_2^2 N\right)}$ fermionic systems with larger basis sets on near-term quantum devices.
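As a rough worked example of the scaling claim, take a hypothetical system size $N = 1024$. This ignores constant factors, which the asymptotic statement hides.

$$
N = 2^{10} = 1024:\qquad d\,\log_2^2 N = 10^2\,d = 100\,d,
\qquad 100\,d < N \iff d < 10.24 \approx \frac{N}{\log_2^2 N}.
$$

So at this size the $\mathcal{O}(d\log_2^2 N)$ depth undercuts a depth linear in $N$ only for up to about ten fermions. The advantage grows as $d$ falls well below $N/\log_2^2 N$, which is the regime named in the abstract.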

翻訳日:2023-07-27 16:14:14 公開日:2023-07-26

# 統計的推定における裾の重いデータの量子化：(ほぼ)ミニマックスレート、共変量の量子化、一様回復

Quantizing Heavy-tailed Data in Statistical Estimation: (Near) Minimax Rates, Covariate Quantization, and Uniform Recovery ( http://arxiv.org/abs/2212.14562v2 )

ライセンス: Link先を確認

Junren Chen, Michael K. Ng, Di Wang

(参考訳) 本稿では、基礎となる分布がある次数までの有界なモーメントを持つような、いくつかの基本的な統計的推定問題における裾の重いデータの量子化を研究する。一様量子化の前にデータを切り詰め(truncation)、適切にディザリングすることを提案する。我々の主な主張は、提案手法によって生成された量子化データのみから、推定誤差の(ほぼ)ミニマックスレートが達成可能であるということである。特に、共分散推定、圧縮センシング、行列補完について具体的な結果を導き、いずれも量子化が乗法的な係数をわずかに悪化させるだけであることを示す。さらに、共変量(すなわちセンシングベクトル)と応答の両方が量子化される圧縮センシングを研究する。共変量の量子化の下では、共分散行列推定量が半正定値性を欠くため回復プログラムは非凸となるが、すべての局所最小解がほぼ最適な誤差限界を満たすことを証明する。さらに、積過程の集中不等式と被覆論法により、裾の重いノイズを伴う量子化圧縮センシングに対して、ほぼミニマックスの一様回復保証を確立する。

This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems, where the underlying distributions have bounded moments of some order. We propose to truncate and properly dither the data prior to a uniform quantization. Our main message is that (near) minimax rates of estimation error are achievable merely from the quantized data produced by the proposed scheme. In particular, concrete results are worked out for covariance estimation, compressed sensing, and matrix completion, all showing that quantization only slightly worsens the multiplicative factor. Besides, we study compressed sensing where both the covariate (i.e., sensing vector) and the response are quantized. Under covariate quantization, although our recovery program is non-convex because the covariance matrix estimator lacks positive semi-definiteness, we prove that all local minimizers enjoy a near-optimal error bound. Moreover, using a concentration inequality for product processes and a covering argument, we establish a near-minimax uniform recovery guarantee for quantized compressed sensing with heavy-tailed noise.
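The truncate, dither, then quantize pipeline can be sketched for the simplest case, mean estimation. This is an assumption: the paper treats covariance estimation, compressed sensing and matrix completion, whose estimators need further corrections not shown here. The data distribution, the threshold `tau` and the step `delta` are hypothetical. In practice `tau` would be tuned to the sample size and moment bound; here it is fixed by hand.

```python
import math
import random


def truncate(x: float, tau: float) -> float:
    # Clip a heavy-tailed sample to [-tau, tau] before quantizing
    return max(-tau, min(tau, x))


def dithered_quantize(x: float, delta: float, rng: random.Random) -> float:
    # Add uniform dither u ~ U(-delta/2, delta/2), then apply a mid-rise uniform quantizer with
    # step delta; with this dither the output is unbiased: E[Q(x + u)] = x
    u = rng.uniform(-delta / 2, delta / 2)
    return delta * (math.floor((x + u) / delta) + 0.5)


def quantized_mean(samples: list, tau: float, delta: float, rng: random.Random) -> float:
    # Estimate the mean using only the truncated, dithered and quantized values
    q = [dithered_quantize(truncate(x, tau), delta, rng) for x in samples]
    return sum(q) / len(q)


def heavy_tailed_sample(mu: float, rng: random.Random) -> float:
    # Hypothetical data: mu plus symmetric Pareto(alpha=3) noise (finite variance, polynomial tails)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return mu + sign * (rng.paretovariate(3.0) - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [heavy_tailed_sample(1.0, rng) for _ in range(20_000)]
    print(f"estimate from quantized data: {quantized_mean(data, tau=10.0, delta=1.0, rng=rng):.3f}")
```

Without the dither, a coarse quantizer with `delta=1.0` would round every value to the nearest half-integer cell centre and bias the estimate. The uniform dither averages this rounding error out.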

翻訳日:2023-07-27 16:13:45 公開日:2023-07-26

# カスケードLSTMネットワークを用いた新しい深層強化学習型自動株式取引システム

A Novel Deep Reinforcement Learning Based Automated Stock Trading System Using Cascaded LSTM Networks ( http://arxiv.org/abs/2212.02721v2 )

ライセンス: Link先を確認

Jie Zou, Jiashu Lou, Baohua Wang, Sixue Liu

(参考訳) 深層強化学習(DRL)アルゴリズムを用いて構築される株式取引戦略はますます増えている。しかし、もともとゲームコミュニティで広く使われてきたDRL手法は、信号対雑音比が低く不均一な金融データにそのまま適用できないため、性能面での欠点を抱えている。本稿では、隠れた情報を捉えるために、カスケードLSTMを用いたDRLベースの株式取引システムを提案する。まずLSTMを用いて日次株価データから時系列特徴を抽出し、抽出した特徴をエージェントの学習に入力する。一方、強化学習における戦略関数の学習にも別のLSTMを用いる。米国市場のDJIと中国株式市場のSSE50における実験から、本モデルは累積リターンとシャープレシオで従来のベースラインモデルを上回り、その優位性は新興市場である中国株式市場でより顕著であることが示された。これは、提案手法が自動株式取引システムを構築する有望な方法であることを示している。

More and more stock trading strategies are constructed using deep reinforcement learning (DRL) algorithms, but DRL methods originally widely used in the gaming community are not directly adaptable to financial data with low signal-to-noise ratios and unevenness, and thus suffer from performance shortcomings. In this paper, to capture hidden information, we propose a DRL-based stock trading system using cascaded LSTMs. It first uses an LSTM to extract time-series features from daily stock data, and the extracted features are then fed to the agent for training, while the strategy functions in reinforcement learning use another LSTM for training. Experiments on the DJI in the US market and the SSE50 in the Chinese stock market show that our model outperforms previous baseline models in terms of cumulative returns and Sharpe ratio, and this advantage is more significant in the Chinese stock market, an emerging market. This indicates that our proposed method is a promising way to build an automated stock trading system.
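The two evaluation metrics named above can be computed as follows. This is a minimal sketch assuming simple daily returns, 252 trading days per year and a zero risk-free rate by default. The paper's exact annualization convention is not stated here, and the return series is hypothetical.

```python
import math
import statistics


def cumulative_return(daily_returns: list) -> float:
    # Compound simple daily returns: prod(1 + r_t) - 1
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(daily_returns: list, risk_free_daily: float = 0.0,
                 periods_per_year: int = 252) -> float:
    # Annualized Sharpe ratio: mean excess return / sample standard deviation * sqrt(periods per year)
    excess = [r - risk_free_daily for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


if __name__ == "__main__":
    returns = [0.01, -0.005, 0.02]  # hypothetical daily returns of a trading strategy
    print(f"cumulative return: {cumulative_return(returns):.6f}")  # 0.025049
    print(f"annualized Sharpe: {sharpe_ratio(returns):.3f}")
```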

翻訳日:2023-07-27 16:13:25 公開日:2023-07-26

# 不確かさを持つマルコフジャンプ線形系の形式制御器合成

Formal Controller Synthesis for Markov Jump Linear Systems with Uncertain Dynamics ( http://arxiv.org/abs/2212.00679v4 )

ライセンス: Link先を確認

Luke Rickard, Thom Badings, Licio Romao, Alessandro Abate

(参考訳) サイバーフィジカルシステムに対する、正しさが証明可能な制御器の自動合成は、安全性が重要なシナリオへの展開に不可欠である。しかし、ハイブリッドな特徴や確率的または未知の挙動がこの問題を困難にしている。本稿では、サイバーフィジカルシステムのための離散時間モデルの一種であるマルコフジャンプ線形システム(MJLS)に対し、確率的計算木論理(PCTL)式を証明可能に満たす制御器を合成する手法を提案する。MJLSは有限個の確率的線形ダイナミクスと、マルコフ決定過程(MDP)によって支配されるそれらのダイナミクス間の離散的なジャンプからなる。本研究では、このMDPの遷移確率が区間の範囲でのみ既知である場合と、完全に未知である場合を考える。我々のアプローチは、MJLSの離散的(モードジャンプ)挙動と連続的(確率的線形)挙動の両方を捉える有限状態抽象化に基づく。この抽象化を区間MDP(iMDP)として定式化し、いわゆる「シナリオアプローチ」のサンプリング手法を用いて遷移確率の区間を計算することで、確率的に健全な近似を得る。本手法を複数の現実的なベンチマーク問題、特に温度制御問題と航空機による配送問題に適用する。

Automated synthesis of provably correct controllers for cyber-physical systems is crucial for deployment in safety-critical scenarios. However, hybrid features and stochastic or unknown behaviours make this problem challenging. We propose a method for synthesising controllers for Markov jump linear systems (MJLSs), a class of discrete-time models for cyber-physical systems, so that they certifiably satisfy probabilistic computation tree logic (PCTL) formulae. An MJLS consists of a finite set of stochastic linear dynamics and discrete jumps between these dynamics that are governed by a Markov decision process (MDP). We consider the cases where the transition probabilities of this MDP are either known up to an interval or completely unknown. Our approach is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS. We formalise this abstraction as an interval MDP (iMDP) for which we compute intervals of transition probabilities using sampling techniques from the so-called 'scenario approach', resulting in a probabilistically sound approximation. We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.
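To make the model class concrete, here is a minimal simulation of a hypothetical two-mode scalar MJLS under a mode-dependent linear feedback. It only illustrates the dynamics described above. It does not build the paper's iMDP abstraction, apply the scenario approach, or synthesise a PCTL-satisfying controller. All coefficients, jump probabilities and gains are assumptions, and the jump MDP is reduced to a Markov chain by fixing one action.

```python
import random

# Hypothetical two-mode scalar MJLS: x[k+1] = a[m] * x[k] + b[m] * u[k] + w[k], w[k] ~ N(0, sigma[m]^2)
MODES = {
    0: {"a": 0.9, "b": 1.0, "sigma": 0.05},  # e.g. a nominal mode
    1: {"a": 1.1, "b": 0.5, "sigma": 0.10},  # e.g. a degraded, open-loop unstable mode
}
# Mode-jump probabilities for one fixed action, i.e. the MDP restricted to a Markov chain
JUMP_PROBS = {0: [0.95, 0.05], 1: [0.20, 0.80]}
# Hypothetical mode-dependent feedback gains: closed-loop factors 0.9 - 0.5 = 0.4 and 1.1 - 0.8 = 0.3
GAINS = {0: 0.5, 1: 1.6}


def step(x: float, mode: int, u: float, rng: random.Random) -> tuple:
    p = MODES[mode]
    x_next = p["a"] * x + p["b"] * u + rng.gauss(0.0, p["sigma"])
    mode_next = rng.choices([0, 1], weights=JUMP_PROBS[mode])[0]
    return x_next, mode_next


def simulate(x0: float, mode0: int, horizon: int, rng: random.Random) -> tuple:
    xs, modes = [x0], [mode0]
    for _ in range(horizon):
        u = -GAINS[modes[-1]] * xs[-1]  # mode-dependent linear state feedback
        x, m = step(xs[-1], modes[-1], u, rng)
        xs.append(x)
        modes.append(m)
    return xs, modes


if __name__ == "__main__":
    xs, modes = simulate(5.0, 0, horizon=50, rng=random.Random(0))
    print(f"final state {xs[-1]:.3f}, time in mode 1: {modes.count(1)} of {len(modes)} steps")
```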

翻訳日:2023-07-27 16:13:07 公開日:2023-07-26

# エコーチェンバー効果を増幅するリツイート

Retweets Amplify the Echo Chamber Effect ( http://arxiv.org/abs/2211.16480v2 )

ライセンス: Link先を確認

Ashwin Rao, Fred Morstatter and Kristina Lerman

(参考訳) 公共の議論におけるソーシャルメディアの存在感の高まりにより、オンライン情報の質と、それが政治的分極の増幅に果たす役割がより厳しく精査されるようになった。しかし、Twitterのようなソーシャルメディアプラットフォームにおける分極の研究は、ソーシャルグラフに関するデータ収集の難しさによって妨げられてきた。特に、ユーザーが加わるエコーチェンバーやタイムラインで目にする内容を形作るフォローリンクのデータ収集が難しい。フォロワーグラフの代替として研究者はリツイートを用いるが、この選択が分析にどう影響するかは明らかではない。Twitterのフォロワーグラフのサンプルとその中のユーザーが投稿したツイートを用いて、リツイートグラフを再構築し、それがエコーチェンバーと露出の指標に与える影響を定量化する。エコーチェンバーは両方のグラフに存在するが、リツイートグラフの方がより顕著であることがわかった。ユーザーがフォロワーネットワークとリツイートネットワークを通じて目にする情報を比較し、リツイートされたアカウントが系統的により分極したコンテンツを共有していることを示す。このバイアスは、ユーザー自身のフォロワーグラフ近傍での活動や分極では説明できず、イデオロギー的に自分の見解と一致するアカウントにより多くの注意を払うことで説明される。以上の結果は、リツイートグラフに依拠する研究がエコーチェンバー効果と分極した情報への露出を過大評価していることを示唆する。

The growing prominence of social media in public discourse has led to greater scrutiny of the quality of online information and the role it plays in amplifying political polarization. However, studies of polarization on social media platforms like Twitter have been hampered by the difficulty of collecting data about the social graph, specifically the follow links that shape the echo chambers users join as well as what they see in their timelines. As a proxy for the follower graph, researchers use retweets, although it is not clear how this choice affects analysis. Using a sample of the Twitter follower graph and the tweets posted by users within it, we reconstruct the retweet graph and quantify its impact on measures of echo chambers and exposure. While we find that echo chambers exist in both graphs, they are more pronounced in the retweet graph. We compare the information users see via their follower and retweet networks to show that retweeted accounts share systematically more polarized content. This bias is explained not by the activity or polarization within users' own follower graph neighborhoods but by the increased attention they pay to accounts that are ideologically aligned with their own views. Our results suggest that studies relying on retweet graphs overestimate echo chamber effects and exposure to polarized information.
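One common way to quantify an echo chamber is to correlate each user's own ideology with the average ideology of the content they are exposed to. The paper's exact measures may differ, so treat this metric, the ideology scores and both toy graphs as hypothetical. The sketch shows how the same users can look more or less chamber-like depending on whether exposure is computed from follow links or from retweets.

```python
import math
from collections import defaultdict


def exposure(ideology: dict, edges: list) -> dict:
    # edges are (user, source) pairs meaning `user` sees content from `source`;
    # exposure is the mean ideology of everything a user sees
    seen = defaultdict(list)
    for user, source in edges:
        seen[user].append(ideology[source])
    return {u: sum(v) / len(v) for u, v in seen.items()}


def pearson(xs: list, ys: list) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def echo_chamber_score(ideology: dict, edges: list) -> float:
    # Correlation between a user's own ideology and the average ideology they are exposed to
    exp = exposure(ideology, edges)
    users = sorted(exp)
    return pearson([ideology[u] for u in users], [exp[u] for u in users])


if __name__ == "__main__":
    ideology = {"a": -1.0, "b": -0.8, "c": 0.8, "d": 1.0}  # hypothetical scores in [-1, 1]
    follow = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "d"), ("c", "a"), ("d", "c"), ("d", "b")]
    retweet = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")]
    print(f"follower graph: {echo_chamber_score(ideology, follow):.3f}")
    print(f"retweet graph:  {echo_chamber_score(ideology, retweet):.3f}")
```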

翻訳日:2023-07-27 16:12:46 公開日:2023-07-26

# FsaNet: セマンティックセグメンテーションのための周波数自己注意

FsaNet: Frequency Self-attention for Semantic Segmentation ( http://arxiv.org/abs/2211.15595v3 )

ライセンス: Link先を確認

Fengyu Zhang, Ashkan Panahi, Guangjun Gao

(参考訳) 画像のスペクトル特性を考慮し、計算複雑性を線形オーダーまで大幅に削減した新しい自己注意機構を提案する。オブジェクト内の類似性を促進しつつエッジをよりよく保存するため、周波数帯域ごとに個別の処理を提案する。特に、処理を低周波成分のみに対して行う場合を検討する。アブレーション研究により、低周波の自己注意は、ネットワークを再学習しなくても、全周波数の場合と非常に近いか、それ以上の性能を達成できることを示す。これに基づき、新しいプラグアンドプレイモジュールを設計してCNNのヘッドに組み込み、これをFsaNetと呼ぶ。周波数自己注意は、1) 入力として少数の低周波係数しか必要とせず、2) 線形構造を持つ空間領域の自己注意と数学的に等価になり得、3) トークンマッピング($1\times1$畳み込み)段階とトークン混合段階を同時に簡素化する。周波数自己注意は通常の自己注意と比べて、メモリを$87.29\% \sim 90.04\%$、FLOPsを$96.13\% \sim 98.07\%$、実行時間を$97.56\% \sim 98.18\%$削減することを示す。他のResNet101ベースの自己注意ネットワークと比較して、FsaNetはCityscapesテストデータセットで新たな最先端の結果(mIoU $83.0\%$)を達成し、ADE20kとVOCaugでも競争力のある結果を示す。FsaNetはCOCOでのインスタンスセグメンテーションにおいてMask R-CNNを強化することもできる。さらに、提案モジュールを用いることで、スケールの異なる一連のSegformerモデルの性能を向上でき、Segformer-B5は再学習なしでも改善できる。コードは https://github.com/zfy-csu/FsaNet で公開されている。

Considering the spectral properties of images, we propose a new self-attention mechanism with greatly reduced computational complexity, down to linear. To better preserve edges while promoting similarity within objects, we propose individualized processes over different frequency bands. In particular, we study a case where the process operates only on low-frequency components. Through an ablation study, we show that low-frequency self-attention can achieve very close or better performance relative to full-frequency self-attention, even without retraining the network. Accordingly, we design novel plug-and-play modules and embed them in the head of a CNN; we refer to the resulting network as FsaNet. The frequency self-attention 1) requires only a few low-frequency coefficients as input, 2) can be mathematically equivalent to spatial-domain self-attention with linear structures, and 3) simplifies the token mapping ($1\times1$ convolution) stage and the token mixing stage simultaneously. We show that frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ fewer FLOPs, and $97.56\% \sim 98.18\%$ less run time than regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscapes test dataset and competitive results on ADE20k and VOCaug. FsaNet can also enhance Mask R-CNN for instance segmentation on COCO. In addition, using the proposed module, Segformer can be boosted across a series of models at different scales, and Segformer-B5 can be improved even without retraining. Code is accessible at https://github.com/zfy-csu/FsaNet
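Point 2) above, the equivalence with linear spatial attention, can be checked numerically in a simplified setting. The sketch assumes softmax-free (linear) attention over a 1D sequence of tokens and an orthonormal DCT along the token axis. FsaNet's actual module, transform and normalization may differ.

Because an orthonormal transform $D$ satisfies $D^\top D = I$, the summary $K^\top V = W_k^\top X^\top D^\top D X W_v$ is unchanged when computed from frequency coefficients. Keeping only the lowest `keep` frequencies then gives a cheaper approximation.

```python
import math


def matmul(A: list, B: list) -> list:
    # Plain list-of-lists matrix product
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: list) -> list:
    return [list(row) for row in zip(*A)]


def dct_matrix(n: int) -> list:
    # Orthonormal DCT-II matrix D (n x n): D @ D^T = I, row k holds frequency k
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)] for k in range(n)]


def linear_attention(X: list, Wq: list, Wk: list, Wv: list) -> list:
    # Softmax-free attention out = Q (K^T V); K^T V is only C x C, so cost is linear in the token count
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    return matmul(Q, matmul(transpose(K), V))


def low_freq_attention(X: list, Wq: list, Wk: list, Wv: list, keep: int) -> list:
    # Build the K^T V summary from only the `keep` lowest-frequency DCT coefficients along the tokens
    Xf = matmul(dct_matrix(len(X))[:keep], X)  # keep x C frequency coefficients
    context = matmul(transpose(matmul(Xf, Wk)), matmul(Xf, Wv))
    return matmul(matmul(X, Wq), context)


if __name__ == "__main__":
    n_tokens, channels = 8, 3
    X = [[math.sin(0.3 * t + c) for c in range(channels)] for t in range(n_tokens)]  # toy smooth features
    W = [[1.0 if i == j else 0.1 for j in range(channels)] for i in range(channels)]
    full = linear_attention(X, W, W, W)
    for keep in (n_tokens, 3, 1):
        approx = low_freq_attention(X, W, W, W, keep)
        err = max(abs(a - b) for ra, rb in zip(full, approx) for a, b in zip(ra, rb))
        print(f"keep={keep}: max abs difference {err:.2e}")
```

With `keep` equal to the token count, the difference is at floating-point rounding level. Smaller values trade accuracy for fewer coefficients, and the trade-off is favourable when the features are spatially smooth.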

翻訳日:2023-07-27 16:12:23 公開日:2023-07-26

# 非エルミート二バンドBCSモデルにおけるゼロ例外点におけるマイスナー効果の破壊

Breakdown of the Meissner effect at the zero exceptional point in non-Hermitian two-band BCS model ( http://arxiv.org/abs/2211.11422v2 )

ライセンス: Link先を確認

Takanobu Taira

(参考訳) 外部の熱浴に結合した系を記述する非エルミート多体ハミルトニアンを調べる。非エルミート平均場理論を用いて、ハミルトニアンの固有値がパラメータ空間で特異性を示し、例外点と呼ばれる相転移点が現れることを示す。この点では、ギャップパラメータが有限のままであるにもかかわらず、マイスナー効果が破綻する。本研究は、非エルミート多体系における例外点の役割についての知見を与える。

We investigate a non-Hermitian many-body Hamiltonian describing a system coupled to an external bath. Using non-Hermitian mean-field theory, we show that the Hamiltonian's eigenvalues exhibit a singularity in the parameter space, leading to the emergence of a phase transition point called the exceptional point. At this point, the Meissner effect breaks down while the gap parameters remain finite. Our work provides insights into the role of an exceptional point in non-Hermitian many-body systems.
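
For readers new to exceptional points, a generic textbook example helps. This is not the paper's two-band BCS Hamiltonian. The toy 2x2 matrix `H = [[i*gamma, kappa], [kappa, -i*gamma]]` has eigenvalues `±sqrt(kappa^2 - gamma^2)`, which coalesce at zero when `gamma = kappa`. At that point the two eigenvectors also merge, so `H` stops being diagonalizable. The function names below are hypothetical.

```python
import numpy as np


def toy_hamiltonian(kappa: float, gamma: float) -> np.ndarray:
    """Toy non-Hermitian dimer: gain/loss +/- i*gamma and real coupling kappa."""
    return np.array([[1j * gamma, kappa], [kappa, -1j * gamma]])


def eigenvalues(kappa: float, gamma: float) -> np.ndarray:
    # Analytically +/- sqrt(kappa^2 - gamma^2): real for gamma < kappa,
    # purely imaginary for gamma > kappa, and both zero at the EP gamma = kappa.
    return np.linalg.eigvals(toy_hamiltonian(kappa, gamma))


def eigenvector_overlap(kappa: float, gamma: float) -> float:
    """|<v1|v2>| of the two unit-norm eigenvectors; approaches 1 near the EP."""
    _, vecs = np.linalg.eig(toy_hamiltonian(kappa, gamma))
    return float(abs(np.vdot(vecs[:, 0], vecs[:, 1])))


for g in (0.0, 0.5, 0.9, 0.99, 0.999):
    print(g, np.round(np.sort(eigenvalues(1.0, g).real), 4), round(eigenvector_overlap(1.0, g), 4))
```

The growing eigenvector overlap is the numerical signature that distinguishes an exceptional point from an ordinary (Hermitian) degeneracy, where eigenvectors stay orthogonal.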

Translated: 2023-07-27 16:11:51 Published: 2023-07-26

# Dominating Set Database Selection for Visual Place Recognition

arXiv: http://arxiv.org/abs/2303.05123v2

License: see the arXiv link

Anastasiia Kornilova, Ivan Moskalenko, Timofei Pushkin, Fakhriddin Tojiboev, Rahim Tariverdizadeh, Gonzalo Ferrer

This paper presents an approach for creating a visual place recognition (VPR) database for localization in indoor environments from RGBD scanning sequences. The proposed approach, referred to as DominatingSet, is formulated as a minimization problem solved with a dominating set algorithm on a graph constructed from spatial information. Our algorithm shows better scene coverage than other methodologies used for database creation. We also demonstrate that with DominatingSet the database can be up to 250-1400 times smaller than the original scanning sequence while maintaining a recall rate of more than 80% on testing sequences. We evaluated our algorithm on the 7-scenes and BundleFusion datasets and on an additionally recorded sequence in a highly repetitive office setting. In addition, the database selection can produce weakly supervised labels for fine-tuning neural place recognition algorithms to particular settings, further improving their accuracy. The paper also presents a fully automated pipeline for VPR database creation from RGBD scanning sequences, as well as a set of metrics for VPR database evaluation. The code and released data are available on our web page: https://prime-slam.github.io/place-recognition-db/
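
The selection criterion can be illustrated with the classic greedy approximation for a minimum dominating set, in which every frame is either selected or adjacent to a selected frame. This is a minimal sketch under stated assumptions. Nodes are scan frames and edges link frames assumed to observe overlapping parts of the scene. The paper's actual graph construction and solver may differ, and `greedy_dominating_set` is a hypothetical name.

```python
from collections.abc import Hashable


def greedy_dominating_set(adj: dict) -> set:
    """Greedy approximation of a minimum dominating set.

    adj maps every node (e.g. a frame id) to the set of its neighbours
    (the graph is assumed undirected, with every node present as a key).
    Each round picks the node whose closed neighbourhood covers the most
    not-yet-dominated nodes.
    """
    undominated = set(adj)
    chosen: set = set()
    while undominated:
        best: Hashable = max(adj, key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen


# Hypothetical example: 7 frames along a corridor, each overlapping its neighbours.
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < 7} for i in range(7)}
selected = greedy_dominating_set(adj)
print(len(adj) / len(selected))  # compression ratio of the database
```

The loop always terminates, because any undominated node covers at least itself, and the greedy rule carries the standard logarithmic approximation guarantee for dominating sets.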

Translated: 2023-07-27 16:05:22 Published: 2023-07-26

# Optical Conductivity Signatures of Floquet Electronic Phases

arXiv: http://arxiv.org/abs/2303.02261v2

License: see the arXiv link

Andrew Cupo, Joshuah T. Heath, Emilio Cobanera, James D. Whitfield, Chandrasekhar Ramanathan, Lorenza Viola

Optical conductivity measurements may provide access to distinct signatures of Floquet electronic phases, which are described theoretically by their quasienergy band structures. We characterize experimental observables of the Floquet graphene antidot lattice (FGAL), which we introduced previously [Phys. Rev. B 104, 174304 (2021)]. On the basis of Floquet linear response theory, the real and imaginary parts of the longitudinal and Hall optical conductivity are computed as a function of probe frequency. We find that the number and positions of peaks in the response function are distinctive of the different Floquet electronic phases, and identify multiple properties with no equilibrium analog. First, for several intervals of probe frequencies, the real part of the conductivity becomes negative. We argue this is indicative of a subversion of the usual Joule heating mechanism: The Floquet drive causes the material to amplify the power of the probe, resulting in gain. Additionally, while the Hall response vanishes at equilibrium, the real and imaginary parts of the Floquet Hall conductivity are non-zero and can be as large as the longitudinal components. Lastly, driving-induced localization tends to reduce the overall magnitude of the optical conductivity signal and to flatten it out. From an implementation standpoint, a major advantage of the FGAL is that the above-bandwidth driving limit is reached with photon energies that are at least twenty times lower than those required for the intrinsic material, allowing for significant band renormalization at orders-of-magnitude smaller intensities. Our work provides the necessary tools for experimentalists to map reflectance data to particular Floquet phases for this novel material.

Translated: 2023-07-27 16:04:46 Published: 2023-07-26

# FacEDiM: A Face Embedding Distribution Model for Few-Shot Biometric Authentication of Cattle

arXiv: http://arxiv.org/abs/2302.14831v2

License: see the arXiv link

Meshia Cédric Oveneke, Rucha Vaishampayan, Deogratias Lukamba Nsadisa, Jenny Ambukiyenyi Onya

This work proposes to solve the problem of few-shot biometric authentication by computing the Mahalanobis distance between testing embeddings and a multivariate Gaussian distribution of training embeddings obtained using pre-trained CNNs. Experimental results show that models pre-trained on the ImageNet dataset significantly outperform models pre-trained on human faces. With a VGG16 model, we obtain a false rejection rate (FRR) of 1.25% at a false acceptance rate (FAR) of 1.18% on a dataset of 20 cattle identities.
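
The scoring rule can be sketched in NumPy: fit a Gaussian to an identity's enrollment embeddings, then accept a test embedding if its Mahalanobis distance to that Gaussian is below a threshold. The following are assumptions rather than details from the abstract. Embeddings are rows of an array and at least two shots are available. A small ridge term `eps` is added because the sample covariance of a few shots is singular. The threshold would be tuned for a target FAR/FRR. The CNN feature extractor is omitted, and `GaussianEnrollment` is a hypothetical name.

```python
import numpy as np


class GaussianEnrollment:
    """Fit N(mu, Sigma) to one identity's embeddings and score by Mahalanobis distance."""

    def __init__(self, embeddings: np.ndarray, eps: float = 1e-3):
        # embeddings: (n_shots, dim) with n_shots >= 2
        self.mu = embeddings.mean(axis=0)
        cov = np.cov(embeddings, rowvar=False)
        cov += eps * np.eye(cov.shape[0])  # ridge: few shots give a singular covariance
        self.precision = np.linalg.inv(cov)

    def distance(self, x: np.ndarray) -> float:
        d = x - self.mu
        return float(np.sqrt(d @ self.precision @ d))

    def authenticate(self, x: np.ndarray, threshold: float) -> bool:
        # Accept the claimed identity if the test embedding lies close enough.
        return self.distance(x) <= threshold


rng = np.random.default_rng(0)
enrollment = rng.normal(0.0, 1.0, size=(10, 4))  # stand-in for CNN embeddings
model = GaussianEnrollment(enrollment)
print(model.distance(enrollment[0]), model.authenticate(enrollment[0], threshold=3.0))
```

Sweeping `threshold` over genuine and impostor scores traces out the FAR/FRR trade-off that the reported 1.25% / 1.18% operating point sits on.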

Translated: 2023-07-27 16:04:17 Published: 2023-07-26

# Noise-assisted digital quantum simulation of open systems

arXiv: http://arxiv.org/abs/2302.14592v3

License: see the arXiv link

José D. Guimarães, James Lim, Mikhail I. Vasilevskiy, Susana F. Huelga and Martin B. Plenio

Quantum systems are inherently open and susceptible to environmental noise, which can have both detrimental and beneficial effects on their dynamics. This phenomenon has been observed in bio-molecular systems, where noise enables novel functionalities, making the simulation of their dynamics a crucial target for digital and analog quantum simulation. Nevertheless, the computational capabilities of current quantum devices are often limited due to their inherent noise. In this work, we present a novel approach that capitalizes on the intrinsic noise of quantum devices to reduce the computational resources required for simulating open quantum systems. Our approach combines quantum noise characterization methods with quantum error mitigation techniques, enabling us to manipulate and control the intrinsic noise in a quantum circuit. Specifically, we selectively enhance or reduce decoherence rates in the quantum circuit to achieve the desired simulation of open system dynamics. We provide a detailed description of our methods and report on the results of noise characterization and quantum error mitigation experiments conducted on both real and emulated IBM Quantum computers. Additionally, we estimate the experimental resource requirements for our techniques. Our approach holds the potential to unlock new simulation techniques in Noisy Intermediate-Scale Quantum (NISQ) devices, harnessing their intrinsic noise to enhance quantum computations.

Translated: 2023-07-27 16:04:07 Published: 2023-07-26

# Uniformity Testing over Hypergrids with Subcube Conditioning

arXiv: http://arxiv.org/abs/2302.09013v2

License: see the arXiv link

Xi Chen, Cassandra Marcussen

We give an algorithm for testing uniformity of distributions supported on hypergrids $[m_1] \times \cdots \times [m_n]$, which makes $\smash{\widetilde{O}(\text{poly}(m)\sqrt{n}/\epsilon^2)}$ many queries to a subcube conditional sampling oracle with $m=\max_i m_i$. When $m$ is a constant, our algorithm is nearly optimal and strengthens the algorithm of [CCK+21] which has the same query complexity but works for hypercubes $\{\pm 1\}^n$ only. A key technical contribution behind the analysis of our algorithm is a proof of a robust version of Pisier's inequality for functions over hypergrids using Fourier analysis.
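
To make the query model concrete, here is a toy subcube conditional sampling oracle for an explicitly given distribution on a small hypergrid. It illustrates the oracle only, not the paper's tester. The abstract does not spell out which restrictions count as subcubes. This sketch assumes a restriction assigns each coordinate a set of allowed values, so fixing a coordinate corresponds to a singleton set. The enumeration is exponential in `n`, and `subcube_sample` is a hypothetical name.

```python
import itertools
import random
from typing import Dict, Sequence, Tuple

Point = Tuple[int, ...]


def subcube_sample(pmf: Dict[Point, float], restriction: Sequence[frozenset],
                   rng: random.Random) -> Point:
    """Draw one sample from pmf conditioned on the subcube S_1 x ... x S_n.

    restriction[i] is the set of allowed values for coordinate i.
    """
    support = [p for p in pmf if all(p[i] in s for i, s in enumerate(restriction))]
    weights = [pmf[p] for p in support]
    if not support or sum(weights) == 0:
        raise ValueError("subcube has zero probability mass")
    return rng.choices(support, weights=weights, k=1)[0]


# Uniform distribution on the hypergrid [3] x [2] (values 0..2 and 0..1).
grid = list(itertools.product(range(3), range(2)))
uniform = {p: 1 / len(grid) for p in grid}
rng = random.Random(0)
restr = [frozenset(range(3)), frozenset({1})]  # leave x free, fix the second coordinate to 1
print(subcube_sample(uniform, restr, rng))     # a point of the form (x, 1)
```

A tester only sees such conditional samples. For the uniform distribution every subcube's conditional is again uniform, which is the kind of property a tester can probe.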

Translated: 2023-07-27 16:02:57 Published: 2023-07-26

# Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning

arXiv: http://arxiv.org/abs/2304.04660v2

License: see the arXiv link

Junjie Zhang, Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Jun Yang, Le Wan, Xiu Li

Equipped with the trained environmental dynamics, model-based offline reinforcement learning (RL) algorithms can often successfully learn good policies from fixed-size datasets, even from datasets of poor quality. Unfortunately, however, it cannot be guaranteed that the samples generated by the trained dynamics model are reliable (e.g., some synthetic samples may lie outside the support region of the static dataset). To address this issue, we propose Trajectory Truncation with Uncertainty (TATU), which adaptively truncates the synthetic trajectory if the accumulated uncertainty along the trajectory is too large. We theoretically derive a performance bound for TATU to justify its benefits. To empirically show the advantages of TATU, we first combine it with two classical model-based offline RL algorithms, MOPO and COMBO. Furthermore, we integrate TATU with several off-the-shelf model-free offline RL algorithms, e.g., BCQ. Experimental results on the D4RL benchmark show that TATU significantly improves their performance, often by a large margin. Code is available here.
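
The truncation rule can be sketched as follows. These assumptions are not taken from the abstract. Per-step uncertainty is the disagreement (maximum standard deviation) across an ensemble of learned dynamics models. Uncertainty is accumulated by summation, and the threshold is a fixed scalar. The paper's exact uncertainty measure and threshold schedule may differ, and `truncate_rollout` is a hypothetical helper.

```python
import numpy as np


def step_uncertainty(ensemble_next_states: np.ndarray) -> float:
    """Disagreement of an ensemble's next-state predictions, shape (n_models, state_dim)."""
    return float(ensemble_next_states.std(axis=0).max())


def truncate_rollout(ensemble_predictions, threshold: float) -> int:
    """Return how many synthetic transitions of a model rollout to keep.

    ensemble_predictions: sequence of (n_models, state_dim) arrays, one per step.
    The rollout is cut at the first step where accumulated uncertainty exceeds threshold.
    """
    accumulated = 0.0
    for t, preds in enumerate(ensemble_predictions):
        accumulated += step_uncertainty(preds)
        if accumulated > threshold:
            return t  # keep steps 0 .. t-1 only
    return len(ensemble_predictions)


# Two hypothetical models whose predictions drift apart as the rollout grows longer.
preds = [np.array([[0.0], [float(t)]]) for t in range(5)]
print(truncate_rollout(preds, threshold=1.0))
```

Only the kept prefix would be added to the synthetic replay buffer, so long, increasingly unreliable model rollouts do not leak out-of-support samples into training.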

Translated: 2023-07-27 15:54:32 Published: 2023-07-26

# Cyclic quantum engines enhanced by strong bath coupling

arXiv: http://arxiv.org/abs/2304.03267v3

License: see the arXiv link

Camille L. Latune, Graeme Pleasance, and Francesco Petruccione

While strong system-bath coupling produces rich and interesting phenomena, applications to quantum thermal engines have so far pointed mainly to detrimental effects. The delicate trade-off between efficiency loss due to strong coupling and power increase due to faster equilibration, while acknowledged, has remained largely unexplored owing to the challenge of assessing the equilibration time precisely. Here, we overcome this obstacle by exploiting exact numerical simulations based on the hierarchical equations of motion (HEOM) formalism. We show that a quantum Otto cycle can perform better at strong (but not ultrastrong) coupling, in the sense that the product of efficiency and output power is maximized in this regime. In particular, we show that strong coupling allows one to obtain engines with larger efficiency than their weakly coupled counterparts while sharing the same output power. Conversely, one can design strongly coupled engines with larger power than their weakly coupled counterparts while sharing the same efficiency. Overall, our results identify situations where strong coupling can directly enhance the performance of thermodynamic operations, reinforcing the importance of studying quantum thermal engines beyond standard configurations.
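
For reference, the figure of merit above (efficiency times power) can be evaluated for the textbook weak-coupling qubit Otto cycle, which serves as the baseline. This sketch does not model strong coupling. It assumes a two-level working medium with gap `omega_h` during the hot stroke and `omega_c` during the cold stroke, full thermalization at each bath contact, and units with ħ = k_B = 1. It also takes a user-supplied `cycle_time` in place of the equilibration time that the paper obtains from HEOM simulations.

```python
import math


def excited_population(beta: float, omega: float) -> float:
    """Thermal excited-state population of a qubit with energy gap omega."""
    return 1.0 / (1.0 + math.exp(beta * omega))


def otto_cycle(omega_h: float, omega_c: float, beta_h: float, beta_c: float,
               cycle_time: float):
    """Return (efficiency, power, efficiency * power) for an ideal weak-coupling qubit Otto cycle."""
    p_h = excited_population(beta_h, omega_h)  # population after the hot-bath stroke
    p_c = excited_population(beta_c, omega_c)  # population after the cold-bath stroke
    heat_in = omega_h * (p_h - p_c)            # heat absorbed from the hot bath
    work = (omega_h - omega_c) * (p_h - p_c)   # net work output per cycle
    if work <= 0:
        raise ValueError("parameters do not give an engine (no positive work)")
    efficiency = work / heat_in                # equals 1 - omega_c / omega_h
    power = work / cycle_time
    return efficiency, power, efficiency * power


eff, power, merit = otto_cycle(2.0, 1.0, beta_h=0.5, beta_c=2.0, cycle_time=10.0)
print(eff, power, merit)
```

In this idealized model, faster equilibration (smaller `cycle_time`) raises power at fixed efficiency. The trade-off studied in the paper arises because strong coupling changes both factors at once.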

Translated: 2023-07-27 15:54:13 Published: 2023-07-26

# Neglected Free Lunch -- Learning Image Classifiers Using Annotation Byproducts

arXiv: http://arxiv.org/abs/2303.17595v3

License: see the arXiv link

Dongyoon Han, Junsuk Choe, Seonghyeok Chun, John Joon Young Chung, Minsuk Chang, Sangdoo Yun, Jean Y. Song, Seong Joon Oh

Supervised learning of image classifiers distills human knowledge into a parametric model through pairs of images and corresponding labels (X,Y). We argue that this simple and widely used representation of human knowledge neglects rich auxiliary information from the annotation procedure, such as the time-series of mouse traces and clicks left after image selection. Our insight is that such annotation byproducts Z provide approximate human attention that weakly guides the model to focus on the foreground cues, reducing spurious correlations and discouraging shortcut learning. To verify this, we create ImageNet-AB and COCO-AB. They are ImageNet and COCO training sets enriched with sample-wise annotation byproducts, collected by replicating the respective original annotation tasks. We refer to the new paradigm of training models with annotation byproducts as learning using annotation byproducts (LUAB). We show that a simple multitask loss for regressing Z together with Y already improves the generalisability and robustness of the learned models. Compared to the original supervised learning, LUAB does not require extra annotation costs. ImageNet-AB and COCO-AB are at https://github.com/naver-ai/NeglectedFreeLunch.
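
The multitask objective can be sketched as a classification loss on Y plus a regression loss on the annotation byproducts Z. The following are assumptions, since the abstract gives no encoding details. Z is a fixed-length vector, for example a flattened mouse-trace or click heatmap. The regression term is mean squared error, and `lam` is a hypothetical weighting hyperparameter. The model that produces `logits` and `z_pred` is omitted.

```python
import numpy as np


def softmax_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy for integer class labels; logits have shape (batch, classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def luab_loss(logits, labels, z_pred, z_true, lam: float = 1.0) -> float:
    """Classification loss on Y plus lam * MSE regression loss on the byproducts Z."""
    mse = float(((z_pred - z_true) ** 2).mean())
    return softmax_cross_entropy(logits, labels) + lam * mse


logits = np.zeros((2, 4))            # two images, four classes, uninformative logits
labels = np.array([0, 3])
z = np.ones((2, 5))                  # hypothetical 5-dim byproduct targets
print(luab_loss(logits, labels, z_pred=np.zeros((2, 5)), z_true=z, lam=2.0))
```

Because Z is a regression target only at training time, inference uses the classification head alone and requires no extra annotation.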

Translated: 2023-07-27 15:53:52 Published: 2023-07-26

# Multimodal Manoeuvre and Trajectory Prediction for Automated Driving on Highways Using Transformer Networks

arXiv: http://arxiv.org/abs/2303.16109v2

License: see the arXiv link

Sajjad Mozaffari, Mreza Alipour Sormoli, Konstantinos Koufos, and Mehrdad Dianati

Predicting the behaviour (i.e., manoeuvre/trajectory) of other road users, including vehicles, is critical for the safe and efficient operation of autonomous vehicles (AVs), a.k.a., automated driving systems (ADSs). Due to the uncertain future behaviour of vehicles, multiple future behaviour modes are often plausible for a vehicle in a given driving scene. Therefore, multimodal prediction can provide richer information than single-mode prediction, enabling AVs to perform a better risk assessment. To this end, we propose a novel multimodal prediction framework that can predict multiple plausible behaviour modes and their likelihoods. The proposed framework includes a bespoke problem formulation for manoeuvre prediction, a novel transformer-based prediction model, and a tailored training method for multimodal manoeuvre and trajectory prediction. The performance of the framework is evaluated using three public highway driving datasets, namely NGSIM, highD, and exiD. The results show that our framework outperforms the state-of-the-art multimodal methods in terms of prediction error and is capable of predicting plausible manoeuvre and trajectory modes.
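
Multimodal trajectory predictors are commonly scored with best-of-K metrics such as minADE and minFDE. Under these metrics, only the predicted mode closest to the ground truth counts, while the mode probabilities are used when a single prediction is required. The sketch shows these standard metrics. Whether the paper uses exactly these definitions is an assumption. The shapes are also illustrative: `pred` is (K modes, T steps, 2) and `gt` is (T, 2).

```python
import numpy as np


def min_ade_fde(pred: np.ndarray, gt: np.ndarray):
    """Best-of-K average and final displacement errors.

    pred: (K, T, 2) predicted x/y trajectories, one per behaviour mode.
    gt:   (T, 2) ground-truth trajectory.
    """
    dist = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T) per-step errors
    ade = dist.mean(axis=1)                           # average error of each mode
    fde = dist[:, -1]                                 # final-step error of each mode
    return float(ade.min()), float(fde.min())


def most_likely_mode(pred: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """Trajectory of the highest-probability mode."""
    return pred[int(np.argmax(probs))]


# Ground truth drives straight; two hypothetical modes are offset laterally by 1 m and 3 m.
gt = np.stack([np.arange(5.0), np.zeros(5)], axis=1)
pred = np.stack([gt + [0.0, 1.0], gt + [0.0, 3.0]])
print(min_ade_fde(pred, gt))
```

Best-of-K metrics reward covering the plausible futures, which is why they are paired with the predicted likelihoods rather than used alone.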

Translated: 2023-07-27 15:53:32 Published: 2023-07-26

# HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

arXiv: http://arxiv.org/abs/2303.15786v3

License: see the arXiv link

Shan Ning, Longtian Qiu, Yongfei Liu, Xuming He

Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing an interaction prior for HOI detectors via knowledge distillation. However, such approaches often rely on large-scale training data and suffer from inferior performance under few-/zero-shot scenarios. In this paper, we propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization. In detail, we first introduce a novel interaction decoder to extract informative regions in the visual feature map of CLIP via a cross-attention mechanism, which is then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection. In addition, prior knowledge in the CLIP text encoder is leveraged to generate a classifier by embedding HOI descriptions. To distinguish fine-grained interactions, we build a verb classifier from training data via visual semantic arithmetic and a lightweight verb representation adapter. Furthermore, we propose a training-free enhancement to exploit global HOI predictions from CLIP. Extensive experiments demonstrate that our method outperforms the state of the art by a large margin in various settings, e.g., +4.04 mAP on HICO-Det. The source code is available at https://github.com/Artanic30/HOICLIP.
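
One reading of "visual semantic arithmetic" is that a verb embedding is what remains of an interaction embedding once the object embedding is subtracted. The sketch below is built on that reading and is not HOICLIP's implementation. It assumes CLIP-style features for each HOI region and its object, L2 normalization, per-verb averaging of the differences, and cosine-similarity classification. The adapter is omitted, and all names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def build_verb_classifier(hoi_feats, obj_feats, verb_ids, num_verbs):
    """Verb prototype = mean over training pairs of (HOI feature - object feature).

    Every verb id in range(num_verbs) must occur at least once in verb_ids.
    """
    diffs = l2_normalize(hoi_feats) - l2_normalize(obj_feats)
    protos = np.stack([diffs[verb_ids == v].mean(axis=0) for v in range(num_verbs)])
    return l2_normalize(protos)


def classify_verbs(verb_feats, classifier):
    """Cosine-similarity logits of shape (num_queries, num_verbs)."""
    return l2_normalize(verb_feats) @ classifier.T


# Toy 4-d features: axes 0/1 encode two verbs, axes 2/3 encode two objects.
e = np.eye(4)
hoi = np.stack([e[0] + e[2], e[0] + e[3], e[1] + e[2], e[1] + e[3]])
obj = np.stack([e[2], e[3], e[2], e[3]])
clf = build_verb_classifier(hoi, obj, np.array([0, 0, 1, 1]), num_verbs=2)
print(classify_verbs(np.stack([e[0], e[1]]), clf).argmax(axis=1))
```

Averaging over pairs with different objects helps the object-specific part cancel out, leaving a prototype that is shared across objects for the same verb.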

Translated: 2023-07-27 15:53:11 Published: 2023-07-26

# The Stable Signature: Rooting Watermarks in Latent Diffusion Models

arXiv: http://arxiv.org/abs/2303.15435v2

License: see the arXiv link

Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze and Teddy Furon

Generative image modeling enables a wide range of applications but raises ethical concerns about responsible deployment. This paper introduces an active strategy combining image watermarking and Latent Diffusion Models. The goal is for all generated images to conceal an invisible watermark allowing for future detection and/or identification. The method quickly fine-tunes the latent decoder of the image generator, conditioned on a binary signature. A pre-trained watermark extractor recovers the hidden signature from any generated image and a statistical test then determines whether it comes from the generative model. We evaluate the invisibility and robustness of the watermarks on a variety of generation tasks, showing that Stable Signature works even after the images are modified. For instance, it detects the origin of an image generated from a text prompt, then cropped to keep $10\%$ of the content, with $90$+$\%$ accuracy at a false positive rate below 10$^{-6}$.
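
The statistical test can be sketched as a bit-matching test. Under the null hypothesis that an image does not come from the watermarked model, each extracted bit is assumed to match the k-bit signature independently with probability 1/2. The number of matches then follows Binomial(k, 1/2), and the false positive rate of the rule "flag if at least tau bits match" is a binomial tail. This is a minimal sketch under that independence assumption. The 48-bit length used below is illustrative.

```python
from math import comb


def false_positive_rate(k: int, tau: int) -> float:
    """P[Binomial(k, 1/2) >= tau]: chance that a random k-bit string matches >= tau bits."""
    return sum(comb(k, i) for i in range(tau, k + 1)) / 2 ** k


def min_threshold(k: int, target_fpr: float) -> int:
    """Smallest number of matching bits whose false positive rate is below target_fpr."""
    for tau in range(k + 1):
        if false_positive_rate(k, tau) < target_fpr:
            return tau
    return k + 1  # target unreachable: never flag


def is_watermarked(extracted, signature, tau: int) -> bool:
    """Flag an image whose extracted bits agree with the signature in at least tau places."""
    matches = sum(a == b for a, b in zip(extracted, signature))
    return matches >= tau


tau = min_threshold(48, 1e-6)
print(tau, false_positive_rate(48, tau))
```

For a 48-bit signature and a target false positive rate of 1e-6, the threshold works out to 41 matching bits. An image can therefore survive several bit flips caused by cropping or compression and still be detected.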

Translated: 2023-07-27 15:52:24 Published: 2023-07-26

# Generative AI Assistants in Software Development Education

arXiv: http://arxiv.org/abs/2303.13936v2

License: see the arXiv link

Christopher Bull, Ahmed Kharrufa

The software development industry is amid another disruptive paradigm change: the adoption of generative AI (GAI) assistants for programming. Whilst AI is already used in various areas of software engineering, GAI technologies, such as GitHub Copilot and ChatGPT, have ignited people's imaginations (and fears). It is unclear how the industry will adapt, but the move by large software companies, such as Microsoft (GitHub, Bing) and Google (Bard), to integrate these technologies is a clear indication of intent and direction. We performed exploratory interviews with industry professionals to understand current practice and challenges, which we incorporate into our vision of the future of software development education, and we make some pedagogical recommendations.

Translated: 2023-07-27 15:52:06 Published: 2023-07-26

# Scene Graph Generation from Hierarchical Relationship Reasoning

arXiv: http://arxiv.org/abs/2303.06842v2

License: see the arXiv link

Bowen Jiang and Camillo J. Taylor

This paper presents a novel approach for inferring relationships between objects in visual scenes. It explicitly exploits an informative hierarchical structure that can be imposed to divide the object and relationship categories into disjoint super-categories. Specifically, our proposed method incorporates a Bayes prediction head, enabling joint prediction of the super-category, as the type of relationship between the two objects, along with the detailed relationship within that super-category. This design reduces the impact of class imbalance problems. Furthermore, we adapt supervised contrastive learning to our hierarchical classification scheme. Experimental evaluations on the Visual Genome and OpenImage V6 datasets demonstrate that this factorized approach allows a relatively simple model to achieve competitive performance, particularly in predicate classification and zero-shot tasks.
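
The factorized prediction can be written as P(r | pair) = P(s | pair) · P(r | s, pair) for a fine-grained relationship r in super-category s. The minimal NumPy sketch below assumes one softmax over super-categories and one softmax over the relations inside each super-category, with disjoint groups as the abstract states. The group sizes in the usage are hypothetical, and the name is illustrative rather than the paper's code.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def hierarchical_relation_probs(super_logits, sub_logits_per_super) -> np.ndarray:
    """Joint P(super, relation) from a super-category head and per-group relation heads.

    super_logits: (S,) logits over disjoint super-categories.
    sub_logits_per_super: list of S arrays, logits of the relations inside each group.
    Returns a flat probability vector over all fine-grained relations.
    """
    p_super = softmax(np.asarray(super_logits, dtype=float))
    joint = [p_s * softmax(np.asarray(sub, dtype=float))
             for p_s, sub in zip(p_super, sub_logits_per_super)]
    return np.concatenate(joint)


# Hypothetical: 2 super-categories holding 2 and 3 relations, all logits equal.
probs = hierarchical_relation_probs([0.0, 0.0], [[0.0, 0.0], [0.0, 0.0, 0.0]])
print(probs)
```

One intuition for the reduced class imbalance: a rare relation competes only with the siblings in its own super-category, not with every frequent relation in the label set.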

Translated: 2023-07-27 15:51:52 Published: 2023-07-26

# Hierarchies of Frequentist Bounds for Quantum Metrology: From Cramér-Rao to Barankin

arXiv: http://arxiv.org/abs/2303.06108v2

License: see the arXiv link

M. Gessner and A. Smerzi

We derive lower bounds on the variance of estimators in quantum metrology by choosing test observables that define constraints on the unbiasedness of the estimator. The quantum bounds are obtained by analytical optimization over all possible quantum measurements and estimators that satisfy the given constraints. We obtain hierarchies of increasingly tight bounds that include the quantum Cramér-Rao bound at the lowest order. In the opposite limit, the quantum Barankin bound is the variance of the locally best unbiased estimator in quantum metrology. Our results reveal generalizations of the quantum Fisher information that are able to avoid regularity conditions and identify threshold behavior in quantum measurements with mixed states, caused by finite data.
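
For orientation, the lowest level of the hierarchy is the standard quantum Cramér-Rao bound. In its textbook form, for $\nu$ independent repetitions, an unbiased estimator $\hat\theta$, and a parameter encoded in states $\rho_\theta$, it reads as follows. Here $F_Q$ is the quantum Fisher information defined through the symmetric logarithmic derivative $L_\theta$. The higher-order and Barankin-type bounds derived in the paper are not reproduced here.

$$
\operatorname{Var}(\hat\theta) \;\ge\; \frac{1}{\nu\, F_Q[\rho_\theta]},
\qquad
F_Q[\rho_\theta] = \operatorname{Tr}\!\left(\rho_\theta L_\theta^{2}\right),
\qquad
\partial_\theta \rho_\theta = \tfrac{1}{2}\left(L_\theta \rho_\theta + \rho_\theta L_\theta\right).
$$

For a pure state $\rho_\theta = e^{-i\theta H}|\psi\rangle\langle\psi|e^{i\theta H}$, this reduces to the familiar $F_Q = 4\,\operatorname{Var}_\psi(H)$.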

Translated: 2023-07-27 15:51:39 Published: 2023-07-26

# DNN-Compressed Domain Visual Recognition with Feature Adaptation

arXiv: http://arxiv.org/abs/2305.08000v2

License: see the arXiv link

Yingpeng Deng and Lina J. Karam

Learning-based image compression has been shown to achieve performance competitive with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned with learning-based compression schemes whose compressed-domain representations can be used to perform visual processing and computer vision tasks directly in the compressed domain. In our work, we adopt a learning-based compressed-domain classification framework for performing visual recognition using the compressed-domain latent representation at varying bit rates. We propose a novel feature adaptation module integrating a lightweight attention model to adaptively emphasize and enhance the key features within the extracted channel-wise information. We also design an adaptation training strategy to utilize the pretrained pixel-domain weights. For comparison, in addition to the performance results obtained using our proposed latent-based compressed-domain method, we also present performance results using compressed but fully decoded images in the pixel domain as well as original uncompressed images. The results show that our proposed compressed-domain classification model can distinctly outperform the existing compressed-domain classification models, and that it can also yield similar accuracy with much higher computational efficiency compared to pixel-domain models trained on fully decoded images.
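
The abstract describes a lightweight attention model that re-weights the channels of the compressed-domain latent. A common lightweight choice is squeeze-and-excitation style gating, sketched below in NumPy. Whether the paper's module has this exact form is an assumption: global average pooling, a two-layer bottleneck with reduction ratio `r`, and sigmoid gating. The weights are random placeholders rather than trained parameters, and the channel count is hypothetical.

```python
import numpy as np


def channel_attention(latent: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation style re-weighting of a latent of shape (C, H, W).

    w1: (C // r, C) bottleneck weights; w2: (C, C // r) expansion weights.
    """
    squeeze = latent.mean(axis=(1, 2))           # (C,) global average pool per channel
    hidden = np.maximum(w1 @ squeeze, 0.0)       # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # (C,) sigmoid weights in (0, 1)
    return latent * gate[:, None, None]          # scale each channel by its gate


rng = np.random.default_rng(0)
c, r = 192, 16                                   # hypothetical latent channels and reduction ratio
latent = rng.standard_normal((c, 16, 16))
w1 = 0.1 * rng.standard_normal((c // r, c))
w2 = 0.1 * rng.standard_normal((c, c // r))
out = channel_attention(latent, w1, w2)
print(out.shape)
```

The module adds only about `2 * C * C / r` parameters, which keeps it cheap relative to decoding the latent back to pixels.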

Translated: 2023-07-27 15:46:26 Published: 2023-07-26

# High-dimensional monitoring and the emergence of realism via multiple observers

arXiv: http://arxiv.org/abs/2305.07919v2

License: see the arXiv link

Alexandre C. Orthey Jr., Pedro R. Dieguez, Owidiusz Makuta, Remigiusz Augusiak

Unrevealed quantum measurements can be described by unitary evolutions followed by partial traces. Based on that, we address the problem of the emergence of physical reality from the quantum world by introducing a model that interpolates between weak and strong non-selective measurements for qudits. Our model, which is based on generalized observables and Heisenberg-Weyl operators, suggests that for high-dimensional qudits, full information about the observable of interest can only be obtained by making the system interact with not just one but several environmental qudits, following a Quantum Darwinism framework.
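
The Heisenberg-Weyl operators mentioned above generalize the Pauli matrices to qudits. They are built from the shift X|j⟩ = |j+1 mod d⟩ and the clock Z|j⟩ = ω^j|j⟩, with ω = e^{2πi/d}. The NumPy sketch below constructs them and checks the defining relation ZX = ωXZ. These are the standard textbook definitions. How the paper combines them into system-environment interactions is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def shift_and_clock(d: int):
    """Return the d-dimensional shift X, clock Z and the phase omega = exp(2*pi*i/d)."""
    omega = np.exp(2j * np.pi / d)
    x = np.roll(np.eye(d), 1, axis=0)   # X|j> = |j+1 mod d>
    z = np.diag(omega ** np.arange(d))  # Z|j> = omega^j |j>
    return x, z, omega


def weyl_operator(a: int, b: int, d: int) -> np.ndarray:
    """Heisenberg-Weyl operator X^a Z^b (one common phase convention)."""
    x, z, _ = shift_and_clock(d)
    return np.linalg.matrix_power(x, a) @ np.linalg.matrix_power(z, b)


x, z, w = shift_and_clock(3)
print(np.allclose(z @ x, w * x @ z))    # clock-shift commutation relation
```

For d = 2 these reduce to the Pauli X and Z matrices, since ω = -1. The d² operators X^a Z^b form an orthogonal operator basis, which is what makes them convenient for describing qudit observables.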

Translated: 2023-07-27 15:46:06 Published: 2023-07-26

# Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator

arXiv: http://arxiv.org/abs/2305.06710v3

License: see the arXiv link

Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wanrong Huang, Wenjing Yang

Classifier-free guidance is an effective and widely adopted sampling technique in diffusion models. The main idea is to extrapolate the model in the direction of text guidance and away from null-text guidance. In this paper, we demonstrate that null-text guidance in diffusion models is secretly a cartoon-style creator, i.e., the generated images can be efficiently transformed into cartoons by simply perturbing the null-text guidance. Specifically, we propose two disturbance methods, Rollback disturbance (Back-D) and Image disturbance (Image-D), to construct a misalignment between the noisy images used for predicting null-text guidance and text guidance (subsequently referred to as \textbf{null-text noisy image} and \textbf{text noisy image}, respectively) in the sampling process. Back-D achieves cartoonization by altering the noise level of the null-text noisy image, replacing $x_t$ with $x_{t+\Delta t}$. Image-D, alternatively, produces high-fidelity, diverse cartoons by defining $x_t$ as a clean input image, which further improves the incorporation of finer image details. Through comprehensive experiments, we examine the principle of disturbing the null-text noise and find that the efficacy of the disturbance depends on the correlation between the null-text noisy image and the source image. Moreover, our proposed techniques, which can generate cartoon images and cartoonize specific ones, are training-free and easily integrated as a plug-and-play component in any classifier-free guided diffusion model. The project page is available at \url{https://nulltextforcartoon.github.io/}.
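
The guidance rule at the heart of this idea combines two noise predictions: ε̂ = ε(x_null, ∅) + w · (ε(x_text, c) − ε(x_text... more precisely, ε̂ = ε_null + w · (ε_text − ε_null). Standard classifier-free guidance feeds the same noisy image x_t to both branches. The disturbances above feed the null-text branch a different image: a noisier latent x_{t+Δt} for Back-D, or a clean input image for Image-D. The sketch isolates that arithmetic. Reusing timestep `t` for the disturbed branch is an assumption, and `toy_eps` is a stand-in for a real denoiser, not a model.

```python
import numpy as np


def guided_noise(eps_model, x_text, x_null, t, cond, w):
    """Classifier-free guidance where the two branches may see different noisy images.

    Standard CFG passes the same x_t as x_text and x_null. A Back-D-style disturbance
    instead passes a noisier latent (e.g. x_{t+dt} kept from an earlier sampling step)
    as x_null; the timestep used for that branch is an assumption here.
    """
    eps_null = eps_model(x_null, t, None)  # null-text (unconditional) prediction
    eps_text = eps_model(x_text, t, cond)  # text-conditioned prediction
    return eps_null + w * (eps_text - eps_null)


def toy_eps(x, t, cond):
    """Toy stand-in for a denoiser, only to exercise the arithmetic."""
    return x * (2.0 if cond is not None else 1.0)


x_t = np.ones(3)
standard = guided_noise(toy_eps, x_t, x_t, t=10, cond="a photo", w=7.5)
disturbed = guided_noise(toy_eps, x_t, 3.0 * x_t, t=10, cond="a photo", w=7.5)
print(standard, disturbed)
```

Because no weights change, the disturbance is a sampling-time intervention only, which matches the training-free, plug-and-play claim.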

Translated: 2023-07-27 15:45:37, Published: 2023-07-26

# Self-supervised dense representation learning for live-cell microscopy with time arrow prediction

Self-supervised dense representation learning for live-cell microscopy with time arrow prediction ( http://arxiv.org/abs/2305.05511v2 )

License: see link

Benjamin Gallusser, Max Stieber, and Martin Weigert

State-of-the-art object detection and segmentation methods for microscopy images rely on supervised machine learning, which requires laborious manual annotation of training data. Here we present a self-supervised method based on time arrow prediction pre-training that learns dense image representations from raw, unlabeled live-cell microscopy videos. Our method builds upon the task of predicting the correct order of time-flipped image regions via a single-image feature extractor followed by a time arrow prediction head that operates on the fused features. We show that the resulting dense representations capture inherently time-asymmetric biological processes, such as cell divisions, at the pixel level. We furthermore demonstrate the utility of these representations on several live-cell microscopy datasets for detection and segmentation of dividing cells, as well as for cell state classification. Our method outperforms supervised methods, particularly when only limited ground-truth annotations are available, as is commonly the case in practice. We provide code at https://github.com/weigertlab/tarrow.
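
The pretext task needs no labels because the label comes from the frame order itself. A minimal sketch of how training pairs could be generated, assuming crops of consecutive frames have already been extracted (represented here as plain nested lists). The feature extractor and prediction head described above are not shown.

```python
import random
from typing import List, Sequence, Tuple

Patch = List[List[float]]  # a 2-D crop given as rows of pixel values


def time_arrow_pairs(frames: Sequence[Patch],
                     rng: random.Random) -> List[Tuple[Patch, Patch, int]]:
    """Build self-supervised time-arrow examples from consecutive crops.

    With probability 1/2 the two crops are swapped ("time-flipped").
    Label 0 means forward order and label 1 means flipped order.
    """
    examples = []
    for earlier, later in zip(frames, frames[1:]):
        if rng.random() < 0.5:
            examples.append((later, earlier, 1))   # flipped order
        else:
            examples.append((earlier, later, 0))   # true temporal order
    return examples
```

A model that predicts these labels well must pick up on processes that look different forward and backward in time, such as cell divisions.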

Translated: 2023-07-27 15:45:01, Published: 2023-07-26

# Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards

Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards ( http://arxiv.org/abs/2304.14989v2 )

License: see link

Hao Qin, Kwang-Sung Jun and Chicheng Zhang

We study $K$-armed bandit problems where the reward distributions of all arms are supported on the $[0,1]$ interval. Designing regret-efficient randomized exploration algorithms in this setting has been a challenge. Maillard sampling~\cite{maillard13apprentissage}, an attractive alternative to Thompson sampling, has recently been shown to achieve competitive regret guarantees in the sub-Gaussian reward setting~\cite{bian2022maillard} while maintaining closed-form action probabilities, which is useful for offline policy evaluation. In this work, we propose the Kullback-Leibler Maillard Sampling (KL-MS) algorithm, a natural extension of Maillard sampling designed to achieve a KL-style gap-dependent regret bound. We show that KL-MS is asymptotically optimal when the rewards are Bernoulli and has a worst-case regret bound of the form $O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$, where $\mu^*$ is the expected reward of the optimal arm and $T$ is the time horizon length.
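
The closed-form action probabilities are what make Maillard-style sampling convenient for offline policy evaluation. A minimal sketch, assuming the KL-MS rule has the form $p_a \propto \exp(-N_a \,\mathrm{kl}(\hat\mu_a, \hat\mu_{\max}))$. Here $N_a$ is the pull count of arm $a$ and $\mathrm{kl}$ is the Bernoulli KL divergence. The paper's exact definition and tie-breaking may differ.

```python
import math
from typing import List, Sequence


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bern(p) || Bern(q)), with p and q clipped away from 0 and 1 for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_maillard_probs(means: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Arm probabilities proportional to exp(-N_a * kl(mu_hat_a, mu_hat_max))."""
    best = max(means)
    weights = [math.exp(-n * bernoulli_kl(m, best)) for m, n in zip(means, counts)]
    total = sum(weights)
    return [w / total for w in weights]
```

Arms whose empirical mean is far from the best one, measured in KL rather than squared gap, lose probability exponentially in their pull count. Rarely pulled arms keep a meaningful chance of exploration.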

Translated: 2023-07-27 15:44:46, Published: 2023-07-26

# Preparation of multiphoton high-dimensional GHZ state

Preparation of multiphoton high-dimensional GHZ state ( http://arxiv.org/abs/2304.12813v4 )

License: see link

Wen-Bo Xing, Xiao-Min Hu, Yu Guo, Bi-Heng Liu, Chuan-Feng Li and Guang-Can Guo

Multipartite high-dimensional entanglement exhibits different physics from multipartite two-dimensional entanglement. However, preparing multipartite high-dimensional entanglement with linear optics remains a challenge. In this paper, we propose a protocol for preparing multiphoton GHZ states of arbitrary dimension in optical systems. In this protocol, we use auxiliary entanglement to realize a high-dimensional entanglement gate, so that high-dimensional entangled pairs can be connected into a multipartite high-dimensional GHZ state. Specifically, we give an example that uses the path degree of freedom of photons to prepare a 4-particle 3-dimensional GHZ state. Our method can be extended to other degrees of freedom and can generate arbitrary GHZ entanglement in any dimension.
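
For reference, the target state is $|\mathrm{GHZ}\rangle = \frac{1}{\sqrt{d}} \sum_{i=0}^{d-1} |i\rangle^{\otimes N}$. A small sketch that builds its amplitude vector in the computational basis, using base-$d$ indexing. It describes only the state being prepared, not the linear-optics protocol. For the 4-particle, 3-dimensional example, the nonzero amplitudes sit at $|0000\rangle$, $|1111\rangle$ and $|2222\rangle$.

```python
import math
from typing import List


def ghz_state(num_particles: int, dim: int) -> List[float]:
    """Amplitudes of (1/sqrt(d)) * sum_i |i>^{(x)N} with base-`dim` basis indexing.

    |i, i, ..., i> has index i * (1 + d + ... + d^(N-1)).
    """
    size = dim ** num_particles
    stride = sum(dim ** k for k in range(num_particles))  # index step between |i...i> terms
    amps = [0.0] * size
    for i in range(dim):
        amps[i * stride] = 1 / math.sqrt(dim)
    return amps
```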

Translated: 2023-07-27 15:44:13, Published: 2023-07-26

# Few-shot Medical Image Segmentation via Cross-Reference Transformer

Few-shot Medical Image Segmentation via Cross-Reference Transformer ( http://arxiv.org/abs/2304.09630v4 )

License: see link

Yao Huang and Jianming Liu

Deep learning models have become the mainstream method for medical image segmentation, but they require large manually labeled datasets for training and are difficult to extend to unseen categories. Few-shot segmentation (FSS) has the potential to address these challenges by learning new categories from a small number of labeled samples. Most current methods employ a prototype learning architecture, which expands support prototype vectors and concatenates them with query features to perform conditional segmentation. However, such frameworks may focus more on query features while neglecting the correlation between support and query features. In this paper, we propose a novel self-supervised few-shot medical image segmentation network with a Cross-Reference Transformer, which addresses the lack of interaction between the support image and the query image. We first enhance the correlation features between the support set image and the query image using a bidirectional cross-attention module. Then, we employ a cross-reference mechanism to mine and enhance the similar parts of support features and query features in high-dimensional channels. Experimental results show that the proposed model achieves good results on both CT and MRI datasets.
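
The bidirectional cross-attention step lets each feature set attend to the other. A minimal sketch using plain scaled dot-product attention on small feature matrices. It is written as an assumption for illustration. The module in the paper presumably adds learned projections, multiple heads and residual connections, all omitted here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over all key/value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
    return out


def bidirectional_cross_attention(support: Matrix, query: Matrix) -> Tuple[Matrix, Matrix]:
    """Query features attend to support features and vice versa (no learned weights)."""
    query_enhanced = cross_attention(query, support, support)
    support_enhanced = cross_attention(support, query, query)
    return support_enhanced, query_enhanced
```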

Translated: 2023-07-27 15:44:02, Published: 2023-07-26

# UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer ( http://arxiv.org/abs/2304.08870v2 )

License: see link

Soon Yau Cheong, Armin Mustafa, Andrew Gilbert

Text-to-image (T2I) models such as StableDiffusion have been used to generate high-quality images of people. However, due to the random nature of the generation process, the person's appearance (e.g. pose, face, and clothing) varies even when the same text prompt is used. This appearance inconsistency makes T2I models unsuitable for pose transfer. We address this by proposing a multimodal diffusion model that accepts text, pose, and visual prompting. Our model is the first unified method to perform all person image tasks: generation, pose transfer, and mask-less editing. We also pioneer the direct use of low-dimensional 3D body model parameters to demonstrate a new capability: simultaneous pose and camera view interpolation while maintaining the person's appearance.

Translated: 2023-07-27 15:43:20, Published: 2023-07-26

# Interplay between lattice gauge theory and subsystem codes

Interplay between lattice gauge theory and subsystem codes ( http://arxiv.org/abs/2304.05718v3 )

License: see link

Yoshihito Kuno, Ikuo Ichinose

It is now widely recognized that the toric code is a pure gauge-theory model governed by a projective Hamiltonian with topological order. In this work, we extend the interplay between quantum information systems and gauge-theory models from the viewpoint of subsystem codes, which are suitable for \textit{gauge systems including matter fields}. As an example, we show that the $Z_2$ lattice gauge-Higgs model in (2+1) dimensions with specific open boundary conditions is nothing but a kind of subsystem code. In this system, the Gauss-law constraints are stabilizers, and the order parameters that distinguish the Higgs and confinement phases are nothing but logical operators of subsystem codes residing on the boundaries. Their mixed anomaly dictates the existence of boundary zero modes, a direct consequence of symmetry-protected topological order in the Higgs and confinement phases. After identifying the phase diagram, we embed subsystem codes in the Higgs and confinement phases. As our main findings, we give an explicit description of the code (encoded qubit) in the Higgs and confinement phases, which clarifies the duality between the two phases. The degenerate structure of the subsystem code in the Higgs and confinement phases persists even at very high energy levels, analogous to the notion of strong zero modes observed in some interesting condensed-matter systems. Numerical methods corroborate the analytically obtained results, and the resulting spectrum structure supports the analytical description of various subsystem codes in the gauge-theory phases.
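
A Gauss-law constraint can act as a stabilizer because it commutes with every gauge-invariant term of the Hamiltonian. A toy check of this using the rule that two Pauli strings commute iff they anticommute on an even number of qubits. It assumes one common convention: the Gauss operator is $X$ on a matter site and its adjacent links, and gauge-invariant hopping is $Z$ on site, link, site. The example uses a 1D open chain for brevity, whereas the paper treats (2+1) dimensions, and its conventions may differ.

```python
from typing import Dict

Pauli = Dict[str, str]  # qubit label -> "X", "Y" or "Z" (identity on unlisted qubits)


def commute(a: Pauli, b: Pauli) -> bool:
    """Pauli strings commute iff they differ (anticommute) on an even number of qubits."""
    clashes = sum(1 for q in a.keys() & b.keys() if a[q] != b[q])
    return clashes % 2 == 0


# Toy 1-D chain: matter sites s0, s1, s2 and links l01, l12 (hypothetical labels).
gauss_s1 = {"s1": "X", "l01": "X", "l12": "X"}   # Gauss-law operator at site s1
hopping_01 = {"s0": "Z", "l01": "Z", "s1": "Z"}  # gauge-invariant matter hopping on l01
```

Here `commute(gauss_s1, hopping_01)` holds (two clashes), while a bare link operator such as `{"l01": "Z"}` anticommutes with `gauss_s1` and therefore is not gauge invariant.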

Translated: 2023-07-27 15:42:58, Published: 2023-07-26

# Towards Explainable and Language-Agnostic LLMs: Symbolic Reverse Engineering of Language at Scale

Towards Explainable and Language-Agnostic LLMs: Symbolic Reverse Engineering of Language at Scale ( http://arxiv.org/abs/2306.00017v3 )

License: see link

Walid S. Saba

Large language models (LLMs) have achieved a milestone that undeniably changed many held beliefs in artificial intelligence (AI). However, many limitations remain in these LLMs when it comes to true language understanding, limitations that are a byproduct of the underlying architecture of deep neural networks. Moreover, due to their subsymbolic nature, whatever knowledge these models acquire about how language works will always be buried in billions of microfeatures (weights), none of which is meaningful on its own, making such models hopelessly unexplainable. To address these limitations, we suggest combining the strength of symbolic representations with what we believe to be the key to the success of LLMs, namely a successful bottom-up reverse engineering of language at scale. As such, we argue for a bottom-up reverse engineering of language in a symbolic setting. Hints on what this project amounts to have been suggested by several authors, and here we discuss in some detail how this project could be accomplished.

Translated: 2023-07-27 15:35:28, Published: 2023-07-26

# TD-GEM: Text-Driven Garment Editing Mapper

TD-GEM: Text-Driven Garment Editing Mapper ( http://arxiv.org/abs/2305.18120v2 )

License: see link

Reza Dadfar, Sanaz Sabzevari, Mårten Björkman, Danica Kragic

Language-based fashion image editing allows users to try out variations of desired garments through text prompts. Inspired by research on manipulating latent representations in StyleCLIP and HairCLIP, we focus on these latent spaces for editing fashion items in full-body human datasets. Currently, there is a gap in handling fashion image editing due to the complexity of garment shapes and textures and the diversity of human poses. In this paper, we propose an optimization-based editing scheme called Text-Driven Garment Editing Mapper (TD-GEM), which aims to edit fashion items in a disentangled way. To this end, we first obtain a latent representation of an image through generative adversarial network (GAN) inversion methods such as Encoder for Editing (e4e) or Pivotal Tuning Inversion (PTI) to obtain more accurate results. An optimization based on Contrastive Language-Image Pre-training (CLIP) is then used to guide the latent representation of a fashion image toward a target attribute expressed as a text prompt. TD-GEM manipulates the image accurately according to the target attribute while keeping other parts of the image untouched. In the experiments, we evaluate TD-GEM on two different attributes (i.e., "color" and "sleeve length") and find that it effectively generates realistic images compared with recent manipulation schemes.

Translated: 2023-07-27 15:34:48, Published: 2023-07-26

# Space-Time-Matter: Some Notes on the Localization Problem in Relativistic Quantum Theory

Space-Time-Matter: Some Notes on the Localization Problem in Relativistic Quantum Theory ( http://arxiv.org/abs/2305.18118v2 )

License: see link

Christian Beck

This work aims to shed some light on the meaning of the positive energy assumption in relativistic quantum theory and its relation to questions of localization of quantum systems. It is shown that the positive energy property of solutions of relativistic wave equations (such as the Dirac equation) is very fragile with respect to state transformations beyond free time evolution. By paying attention to the connection between negative energy Dirac wave functions and pair creation processes in second quantization, this analysis leads to a better understanding of a class of problems known as the localization problem of relativistic quantum theory (associated, for instance, with famous results of Newton and Wigner, Reeh and Schlieder, Hegerfeldt, or Malament). Finally, this analysis is considered from the perspective of a Bohmian quantum field theory.

Translated: 2023-07-27 15:34:24, Published: 2023-07-26

# Fate of the "vacuum point" and of grey solitons in dispersive quantum shock waves in a one-dimensional Bose gas

Fate of the "vacuum point" and of grey solitons in dispersive quantum shock waves in a one-dimensional Bose gas ( http://arxiv.org/abs/2305.17647v3 )

License: see link

S. A. Simmons, J. C. Pillay, and K. V. Kheruntsyan

We continue the study of dispersive quantum shock waves in a one-dimensional Bose gas beyond the mean-field approximation. In recent work by Simmons et al. [Phys. Rev. Lett. 125, 180401 (2020)], the oscillatory shock wave train that develops in this system from an initially localized density bump on a uniform background was interpreted as a result of quantum mechanical self-interference, wherein the interference contrast would diminish with the loss of matter-wave phase coherence. Such loss of coherence, relative to the mean-field Gross-Pitaevskii description, occurs due to either quantum or thermal fluctuations, as well as in the strongly interacting regime. In this work, we extend the analysis of dispersive quantum shock waves in this context to other dynamical scenarios. More specifically, the scenarios studied include the evolution of a sufficiently high density bump, known to lead to the so-called "vacuum point" in the mean-field description, and the evolution of an initial density dip, known to shed a train of grey solitons in the same mean-field approximation. We study the fate of these nonlinear wave structures in the presence of quantum and thermal fluctuations, as well as at intermediate and strong interactions, and show that both the vacuum point and grey solitons cease to manifest themselves beyond the mean-field approach. On the other hand, we find that a vacuum point can occur in an ideal (noninteracting) Bose gas evolving from the ground state of a localized dimple potential. Given the ubiquity of dispersive shock waves in nature, our results should provide useful insights and perspectives for a variety of other physical systems known to display nonlinear wave phenomena.

Translated: 2023-07-27 15:34:09, Published: 2023-07-26

# Exploring Weight Balancing on Long-Tailed Recognition Problem

Exploring Weight Balancing on Long-Tailed Recognition Problem ( http://arxiv.org/abs/2305.16573v3 )

License: see link

Naoya Hasegawa, Issei Sato

Recognition problems in long-tailed data, where the sample size per class is heavily skewed, have recently gained importance because the distribution of sample sizes per class in a dataset is generally exponential unless the sample size is intentionally adjusted. Various approaches have been devised to address these problems. Recently, weight balancing, which combines well-known classical regularization techniques with two-stage training, has been proposed. Despite its simplicity, it is known to perform well compared with existing methods devised in various ways. However, it is not well understood why this approach is effective for long-tailed data. In this study, we analyze the method, focusing on neural collapse and the cone effect at each training stage, and find that it can be decomposed into (1) an increase in Fisher's discriminant ratio of the feature extractor, caused by weight decay and cross-entropy loss, and (2) implicit logit adjustment, caused by weight decay and class-balanced loss. Our analysis shows that the training method can be further simplified by reducing the number of training stages to one while increasing accuracy.
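
Fisher's discriminant ratio compares how far apart the class means are with how spread out each class is. A one-dimensional sketch of the quantity, assuming features are scalars grouped by class. The analysis in the paper works with multidimensional features, where the ratio is usually defined through between-class and within-class scatter matrices.

```python
from statistics import mean
from typing import Dict, List


def fisher_ratio(features_by_class: Dict[str, List[float]]) -> float:
    """1-D Fisher discriminant ratio: between-class variance / within-class variance."""
    all_values = [v for vals in features_by_class.values() for v in vals]
    grand = mean(all_values)
    n = len(all_values)
    between = sum(len(vals) * (mean(vals) - grand) ** 2
                  for vals in features_by_class.values()) / n
    within = sum(sum((v - mean(vals)) ** 2 for v in vals)
                 for vals in features_by_class.values()) / n
    return between / within
```

Well-separated, tight clusters such as `{"a": [0.0, 1.0], "b": [10.0, 11.0]}` give a ratio of 100. Heavily overlapping ones give a ratio well below 1.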

Translated: 2023-07-27 15:33:39, Published: 2023-07-26

# EgoVSR: Towards High-Quality Egocentric Video Super-Resolution

EgoVSR: Towards High-Quality Egocentric Video Super-Resolution ( http://arxiv.org/abs/2305.14708v2 )

License: see link

Yichen Chi, Junhao Gu, Jiamiao Zhang, Wenming Yang, Yapeng Tian

Due to the limitations of capture devices and scenarios, egocentric videos frequently have low visual quality, mainly caused by high compression and severe motion blur. With the increasing application of egocentric videos, there is an urgent need to enhance their quality through super-resolution. However, existing Video Super-Resolution (VSR) works focus on third-person view videos and are unsuitable for handling blurring artifacts caused by rapid ego-motion and object motion in egocentric videos. To this end, we propose EgoVSR, a VSR framework specifically designed for egocentric videos. We explicitly tackle motion blur in egocentric videos using a Dual Branch Deblur Network (DB$^2$Net) within the VSR framework. Meanwhile, a blurring mask is introduced to guide DB$^2$Net learning; it can also be used to localize blurred areas in video frames. We also design a MaskNet to predict the mask, as well as a mask loss to optimize the mask estimation. Additionally, we propose an online motion blur synthesis model for common VSR training data that simulates motion blur as it appears in egocentric videos. To validate the effectiveness of our method, we introduce an EgoVSR dataset containing a large number of fast-motion egocentric video sequences. Extensive experiments demonstrate that our EgoVSR model can efficiently super-resolve low-quality egocentric videos and outperforms strong comparison baselines. Our code, pre-trained models, and data can be found at https://github.com/chiyich/EGOVSR/.
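
Online motion blur synthesis degrades clean training frames on the fly. As a simplified stand-in, uniform linear motion can be modeled as a box (line) kernel along the motion direction. The sketch below applies such a kernel to a single row of pixels. It is an assumption for illustration only; the synthesis model in EgoVSR is likely richer, for example with random kernel lengths, directions and camera trajectories.

```python
from typing import List


def linear_motion_blur(row: List[float], length: int) -> List[float]:
    """Blur a 1-D row with a horizontal box kernel of `length` taps.

    Each output pixel averages the current pixel and the next length-1 pixels,
    clamping indices at the right edge (uniform motion along the row).
    """
    if length < 1:
        raise ValueError("length must be >= 1")
    n = len(row)
    return [sum(row[min(i + k, n - 1)] for k in range(length)) / length for i in range(n)]
```

A sharp step `[0.0, 0.0, 1.0, 1.0]` blurred with `length=2` becomes `[0.0, 0.5, 1.0, 1.0]`, smearing the edge as a moving camera would.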

Translated: 2023-07-27 15:33:19, Published: 2023-07-26

# Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition

Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition ( http://arxiv.org/abs/2305.11467v3 )

License: see link

Fenglin Zhang, Junqiao Zhao, Yingfeng Cai, Gengxuan Tian, Wenjie Mu, Chen Ye

Visual Place Recognition (VPR) aims to retrieve frames from a geotagged database that are located at the same place as the query frame. To improve the robustness of VPR in perceptually aliased scenarios, sequence-based VPR methods have been proposed. These methods are based either on matching between frame sequences or on extracting sequence descriptors for direct retrieval. However, the former usually relies on a constant-velocity assumption that is difficult to satisfy in practice, is computationally expensive, and is constrained by sequence length. Although the latter overcomes these problems, existing sequence descriptors are constructed only by aggregating features of multiple frames, without interaction on temporal information, and thus cannot yield descriptors with spatio-temporal discrimination. In this paper, we propose a sequence descriptor that effectively incorporates spatio-temporal information. Specifically, spatial attention within the same frame is used to learn spatial feature patterns, while attention between corresponding local regions of different frames is used to learn the persistence or change of features over time. We use a sliding window to control the temporal range of attention and use relative position encoding to construct sequential relationships between different features. This allows our descriptors to capture the intrinsic dynamics in a sequence of frames. Comprehensive experiments on challenging benchmark datasets show that the proposed approach outperforms recent state-of-the-art methods.
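
A sliding window and relative positions can be expressed as an attention mask plus an offset table. A minimal sketch, assuming the window is symmetric, so a frame may attend to frames at most `window` steps away. It also assumes relative offsets `j - i` index a learned bias table, as in common relative-position encodings. The paper's exact windowing may differ.

```python
from typing import List


def temporal_window_mask(num_frames: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when frame i may attend to frame j, i.e. |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(num_frames)] for i in range(num_frames)]


def relative_positions(num_frames: int) -> List[List[int]]:
    """Relative offsets j - i, usable as indices into a relative-position bias table."""
    return [[j - i for j in range(num_frames)] for i in range(num_frames)]
```

Because the offsets depend only on `j - i`, the same learned bias applies to every pair of frames at the same temporal distance, regardless of where the pair sits in the sequence.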

Translated: 2023-07-27 15:32:31, Published: 2023-07-26

# Scaling Limits of Quantum Repeater Networks

Scaling Limits of Quantum Repeater Networks ( http://arxiv.org/abs/2305.08696v2 )

License: see link

Mahdi Chehimi, Shahrooz Pouryousef, Nitish K. Panigrahy, Don Towsley, and Walid Saad

Quantum networks (QNs) are a promising platform for secure communications, enhanced sensing, and efficient distributed quantum computing. However, due to the fragile nature of quantum states, these networks face significant scalability challenges. In this paper, we analyze the scaling limits of quantum repeater networks (QRNs). The goal of this work is to maximize the overall length, or scalability, of QRNs such that long-distance quantum communication is achieved while application-specific quality-of-service (QoS) requirements are satisfied. In particular, we propose a novel joint optimization framework that maximizes QRN scalability while satisfying QoS constraints on the end-to-end fidelity and rate. The proposed approach optimizes the number of QRN repeater nodes, their separation distance, and the number of distillation rounds to be performed at both the link and end-to-end levels. Extensive simulations are conducted to analyze the tradeoffs between QRN scalability, rate, and fidelity under gate and measurement errors. The results characterize the QRN scaling limits for a given QoS requirement. The proposed approach offers a promising solution and design guidelines for future QRN deployments.
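
A fidelity-only toy model shows why a fidelity requirement caps the number of hops. It assumes every link shares a Werner state $w|\Phi^+\rangle\langle\Phi^+| + (1-w)I/4$ with fidelity $F = (1+3w)/4$, and that entanglement swapping is ideal, so Werner parameters multiply along the chain. It ignores distillation, gate and measurement errors, and rate, all of which the paper's framework includes.

```python
def werner_fidelity(w: float) -> float:
    """Fidelity of the Werner state w|Phi+><Phi+| + (1 - w) I/4 with |Phi+>."""
    return (1 + 3 * w) / 4


def end_to_end_fidelity(link_fidelity: float, num_links: int) -> float:
    """Chain num_links identical Werner links with ideal swaps (Werner parameters multiply)."""
    w_link = (4 * link_fidelity - 1) / 3
    return werner_fidelity(w_link ** num_links)


def max_links(link_fidelity: float, target_fidelity: float) -> int:
    """Largest number of links whose end-to-end fidelity still meets the target."""
    if not 0.25 < link_fidelity < 1:
        raise ValueError("link fidelity must lie strictly between 0.25 and 1")
    n = 0
    while end_to_end_fidelity(link_fidelity, n + 1) >= target_fidelity:
        n += 1
    return n
```

With a per-link fidelity of 0.95 and a target of 0.8, `max_links` returns 4, so at most 3 intermediate repeater nodes fit in this idealized model.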

Translated: 2023-07-27 15:32:07, Published: 2023-07-26

# Derivative Free Weight-space Ensembling

Derivative Free Weight-space Ensembling ( http://arxiv.org/abs/2307.03506v2 )

License: see link

Dean Ninalga

Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, few works have explored interpolation between more than two models, each with a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework creates a set of diverse expert language models trained on a predefined set of source tasks. Next, we finetune each expert model on the target task, approaching the target task from several distinct knowledge bases. Finally, we linearly interpolate between the model weights using a gradient-free optimization algorithm to efficiently find a good interpolation weighting. We demonstrate the effectiveness of the method on FETA-Friends, where it outperforms the standard pretrain-finetune approach.
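
The final interpolation step searches over mixing coefficients without gradients. A minimal sketch of that step, with flattened weight vectors as plain lists and a caller-supplied validation `score`. Plain random search over the probability simplex stands in for the gradient-free optimizer, which is not named here. This is an assumption, not the paper's algorithm.

```python
import random
from typing import Callable, List, Sequence, Tuple

Weights = List[float]


def interpolate(models: Sequence[Weights], alphas: Sequence[float]) -> Weights:
    """Linear interpolation in weight space: sum_i alpha_i * theta_i."""
    return [sum(a * m[j] for a, m in zip(alphas, models)) for j in range(len(models[0]))]


def random_simplex(k: int, rng: random.Random) -> List[float]:
    """Uniform sample from the probability simplex (normalized exponential draws)."""
    raw = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


def search_interpolation(models: Sequence[Weights], score: Callable[[Weights], float],
                         trials: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """Derivative-free random search for good interpolation coefficients."""
    rng = random.Random(seed)
    k = len(models)
    best_alphas = [1.0 / k] * k  # start from a uniform weight-space average
    best_score = score(interpolate(models, best_alphas))
    for _ in range(trials):
        alphas = random_simplex(k, rng)
        s = score(interpolate(models, alphas))
        if s > best_score:
            best_alphas, best_score = alphas, s
    return best_alphas, best_score
```

Only forward evaluations of `score` are needed, which matters when each evaluation means running a finetuned language model on a small validation set.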

Translated: 2023-07-27 15:26:13, Published: 2023-07-26

# MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets

MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets ( http://arxiv.org/abs/2307.02100v2 )

License: see link

Siyi Du, Nourhan Bayasi, Ghassan Harmarneh, Rafeef Garbi

Despite its clinical utility, medical image segmentation (MIS) remains a daunting task due to images' inherent complexity and variability. Vision transformers (ViTs) have recently emerged as a promising solution to improve MIS; however, they require larger training datasets than convolutional neural networks. To overcome this obstacle, data-efficient ViTs were proposed, but they are typically trained using a single source of data, which overlooks the valuable knowledge that could be leveraged from other available datasets. Naively combining datasets from different domains can result in negative knowledge transfer (NKT), i.e., a decrease in model performance on some domains with non-negligible inter-domain heterogeneity. In this paper, we propose MDViT, the first multi-domain ViT that includes domain adapters to mitigate data hunger and combat NKT by adaptively exploiting knowledge in multiple small data resources (domains). Further, to enhance representation learning across domains, we integrate a mutual knowledge distillation paradigm that transfers knowledge between a universal network (spanning all the domains) and auxiliary domain-specific branches. Experiments on 4 skin lesion segmentation datasets show that MDViT outperforms state-of-the-art algorithms, with superior segmentation performance and a fixed model size at inference time, even as more domains are added. Our code is available at https://github.com/siyi-wind/MDViT.

Translated: 2023-07-27 15:25:59, Published: 2023-07-26

# DifFSS: Diffusion Model for Few-Shot Semantic Segmentation

DifFSS: Diffusion Model for Few-Shot Semantic Segmentation ( http://arxiv.org/abs/2307.00773v2 )

License: see link

Weimin Tan, Siyuan Chen, Bo Yan

Diffusion models have demonstrated excellent performance in image generation. Although various few-shot semantic segmentation (FSS) models with different network structures have been proposed, their performance improvements have reached a bottleneck. This paper presents DifFSS, the first work to leverage diffusion models for the FSS task. As a novel FSS paradigm, DifFSS can further improve the performance of state-of-the-art FSS models by a large margin without modifying their network structures. Specifically, we utilize the powerful generation ability of diffusion models to generate diverse auxiliary support images, using the semantic mask, scribble, or soft HED boundary of the support image as control conditions. This generation process simulates the variety within the class of the query image, such as variations in color, texture, and lighting. As a result, FSS models can refer to more diverse support images, yielding more robust representations and thereby achieving a consistent improvement in segmentation performance. Extensive experiments on three publicly available datasets, based on existing advanced FSS models, demonstrate the effectiveness of diffusion models for the FSS task. Furthermore, we explore in detail the impact of different input settings of the diffusion model on segmentation performance. We hope this new paradigm will inspire research on FSS tasks integrated with AI-generated content.

翻訳日:2023-07-27 15:25:33 公開日:2023-07-26

# flipnerf: 反射光線を反射して、ノベル・ビュー・シンセサイザーを作る

FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis ( http://arxiv.org/abs/2306.17723v3 )

ライセンス: Link先を確認

Seunghyeon Seo, Yeonjin Chang, Nojun Kwak

(参考訳) ニューラル・ラミアンス・フィールド(nerf)は、レンダリングされた画像と単純なアーキテクチャの素晴らしい品質を持つ、新しいビュー合成の主流である。 NeRFは, 連続的な性能向上のために様々な方向に開発されてきたが, 多視点画像の高密度化の必要性は, 実用化に向けての停滞ブロックとして残っている。そこで本研究では,フリップ反射光を利用した数ショットの新規ビュー合成のための新しい正規化手法であるFlipNeRFを提案する。反射光は入力線方向と推定される正規ベクトルから明示的に導出され、より正確な表面の正常を推定し、3D幾何学を効果的に学習しながら効果的な追加の訓練線の役割を担っている。表面の正規度とシーンの深さはどちらも光線に沿った推定密度から導出されるため、正確な表面の正規度はより正確な深さ推定をもたらす。さらに,FlipNeRFは,不確実性を考慮した不確実性損失とボトルネック特徴整合性損失を推定することにより,複数のシーン構造にまたがって浮動小数点を効果的に低減し,新たな特徴抽出装置を使わずに,フォトコンシステント画素に投射される2つの画素間の特徴レベルの整合性を向上させることができる。我々のFlipNeRFは、すべてのシナリオにわたる複数のベンチマークでSOTAのパフォーマンスを達成する。

Neural Radiance Field (NeRF) has become a mainstream approach to novel view synthesis, owing to the remarkable quality of its rendered images and its simple architecture. Although NeRF has been extended in various directions that continuously improve its performance, its need for a dense set of multi-view images remains a stumbling block for practical applications. In this work, we propose FlipNeRF, a novel regularization method for few-shot novel view synthesis that utilizes our proposed flipped reflection rays. The flipped reflection rays are explicitly derived from the input ray directions and the estimated normal vectors. They serve as effective additional training rays while enabling more accurate surface normal estimation and effective learning of the 3D geometry. Since the surface normal and the scene depth are both derived from the estimated densities along a ray, an accurate surface normal leads to more exact depth estimation, which is a key factor for few-shot novel view synthesis. Furthermore, our proposed Uncertainty-aware Emptiness Loss and Bottleneck Feature Consistency Loss enable FlipNeRF, respectively, to estimate more reliable outputs by effectively reducing floating artifacts across different scene structures, and to enhance the feature-level consistency between pairs of rays cast toward photo-consistent pixels without any additional feature extractor. FlipNeRF achieves state-of-the-art performance on multiple benchmarks across all scenarios.
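The abstract says flipped reflection rays are derived from the input ray direction and the estimated surface normal. The sketch below uses the standard mirror-reflection formula r = d − 2(d·n)n. It then assumes that "flipping" means reversing r and placing a new origin a distance `t` along it, so the extra ray travels back toward the same surface point `x`. That interpretation, and the `t` parameter, are assumptions for illustration. The abstract does not specify FlipNeRF's exact parameterization or how normals are estimated.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def reflect(d, n):
    """Mirror direction d about unit normal n: r = d - 2 (d . n) n."""
    k = 2.0 * dot(d, n)
    return [di - k * ni for di, ni in zip(d, n)]


def flipped_reflection_ray(x, d, n, t):
    """Hypothetical construction of an extra training ray.

    x: estimated surface point hit by the input ray
    d: input ray direction; n: estimated surface normal at x
    t: distance (assumed hyperparameter) at which to place the new origin

    The reflected direction r points away from the surface. Reversing it (-r)
    gives a ray that travels back to x, probing the same surface point from a
    new viewing direction.
    """
    r = reflect(normalize(d), normalize(n))
    origin = [xi + t * ri for xi, ri in zip(x, r)]
    direction = [-ri for ri in r]
    return origin, direction


origin, direction = flipped_reflection_ray([0.0, 0.0, 0.0], [1.0, 0.0, -1.0], [0.0, 0.0, 1.0], 2.0)
print(origin, direction)
```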

翻訳日:2023-07-27 15:25:13 公開日:2023-07-26

# 「犬に眼鏡をかけろとおっしゃいますか?」CoDrawデータセットにおける教示明細書の内容

"Are you telling me to put glasses on the dog?" Content-Grounded Annotation of Instruction Clarification Requests in the CoDraw Dataset ( http://arxiv.org/abs/2306.02377v2 )

ライセンス: Link先を確認

Brielen Madureira and David Schlangen

(参考訳) 命令の明確化要求は通信問題を解決するメカニズムであり、命令追従相互作用において非常に機能する。最近の研究は、CoDrawデータセットは自然発生のiCRの貴重な情報源であると主張している。 iCRがいつ作成されるべきかを識別する以外に、対話モデルは適切なフォームとコンテンツで生成できる必要がある。本研究では,CoDraw-iCR (v2) を導入し,既存の iCR 識別子を,対話ゲームアイテムと可能なアクションを基盤としたきめ細かい情報で拡張する。我々のアノテーションは対話エージェントの修復能力のモデル化と評価に役立てることができる。

Instruction clarification requests (iCRs) are a mechanism for resolving communication problems and are highly functional in instruction-following interactions. Recent work has argued that the CoDraw dataset is a valuable source of naturally occurring iCRs. Beyond identifying when iCRs should be made, dialogue models should also be able to generate them with suitable form and content. In this work, we introduce CoDraw-iCR (v2), which extends the existing iCR identifiers with fine-grained information grounded in the underlying dialogue game items and possible actions. Our annotation can serve to model and evaluate the repair capabilities of dialogue agents.

翻訳日:2023-07-27 15:24:09 公開日:2023-07-26

# 事前学習された視覚と言語モデルにおけるエンティティの知識調査のためのテーブルと画像生成

Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models ( http://arxiv.org/abs/2306.02115v2 )

ライセンス: Link先を確認

Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe

(参考訳) 本稿では,自然言語から取得したエンティティに関する知識がvision & language(v&l)モデルに保持されているかを検証するための表と画像生成タスクを提案する。このタスクは2つの部分で構成される: 1つはエンティティとその関連イメージに関する知識を含むテーブルを生成し、もう1つは、キャプションを持つエンティティから画像を生成すること、そして、そのエンティティに関する知識を含むテーブルである。どちらのタスクでも、モデルは生成を適切に実行するために使用されるエンティティを知る必要があります。提案したタスクを実行するために、約20万のインフォボックスからウィキペディアテーブルと画像生成(WikiTIG)データセットを作成しました。複数のタスクで最新の結果を得たv&lモデルofaを用いて,上記の研究課題に対するタスクの性能評価を行った。実験の結果,OFAは画像関連タスクの性能向上のための補完として,事前学習によってエンティティ知識の一部を忘れていることがわかった。

In this paper, we propose a table and image generation task to verify how well Vision & Language (V&L) models retain knowledge about entities acquired from natural language. This task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity, given a caption and a table containing related knowledge of the entity. In both tasks, the model must know the entities involved to perform the generation properly. We created the Wikipedia Table and Image Generation (WikiTIG) dataset from about 200,000 infoboxes in English Wikipedia articles to perform the proposed tasks. We evaluated performance on these tasks with respect to the above research question using the V&L model OFA, which has achieved state-of-the-art results on multiple tasks. Experimental results show that OFA forgets part of its entity knowledge during pre-training, as a trade-off for improving performance on image-related tasks.

翻訳日:2023-07-27 15:23:55 公開日:2023-07-26

# 話者非依存3次元対話ヘッド生成のための音声からのランドマークの学習

Learning Landmarks Motion from Speech for Speaker-Agnostic 3D Talking Heads Generation ( http://arxiv.org/abs/2306.01415v2 )

ライセンス: Link先を確認

Federico Nocentini, Claudio Ferrari, Stefano Berretti

(参考訳) 本稿では,生音声入力から3次元音声頭を生成する新しい手法を提案する。本手法は,顔の可動部に位置するいくつかの制御点,すなわちランドマークの運動によって,音声関連運動を包括的かつ効率的に記述できるという考えに基づく。基礎となる筋骨格構造は、その動きが顔全体の幾何学的変形にどのように影響するかを学べる。提案手法はこの目的のために2つの異なるモデルを用いており、最初の1つは与えられたオーディオからスパースなランドマークの動作を生成することを学ぶ。第2のモデルは、そのようなランドマークの動きを密度の高い運動場に拡張し、与えられた3Dメッシュを中立状態にアニメーションするために使用される。さらに,生成した運動ベクトルと基底真理関数との角度を最小化する新しい損失関数Cosine Lossを導入する。 3D音声ヘッド生成におけるランドマークの使用は、一貫性、信頼性、手動アノテーションの必要性の回避など、さまざまなメリットを提供する。当社のアプローチは、アイデンティティ非依存で、追加のデータやトレーニングなしで、任意のユーザに対して高品質な顔アニメーションを可能にするように設計されている。

This paper presents a novel approach for generating 3D talking heads from raw audio inputs. Our method is grounded in the idea that speech-related movements can be comprehensively and efficiently described by the motion of a few control points located on the movable parts of the face, i.e., landmarks. The underlying musculoskeletal structure then allows us to learn how their motion influences the geometrical deformations of the whole face. The proposed method employs two distinct models to this end. The first learns to generate the motion of a sparse set of landmarks from the given audio. The second expands this landmark motion into a dense motion field, which is used to animate a given 3D mesh in a neutral state. Additionally, we introduce a novel loss function, named Cosine Loss, which minimizes the angle between the generated motion vectors and the ground-truth ones. Using landmarks in 3D talking head generation offers various advantages, such as consistency, reliability, and obviating the need for manual annotation. Our approach is designed to be identity-agnostic, enabling high-quality facial animations for any user without additional data or training.
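The Cosine Loss is described only as minimizing the angle between generated and ground-truth motion vectors. A common way to express that is the mean of 1 − cos θ over all vectors, as sketched below. The exact formulation used in the paper (e.g., whether it penalizes the angle directly) is an assumption here.

```python
import math


def cosine_loss(pred, target, eps=1e-8):
    """Mean of (1 - cos theta) between predicted and ground-truth motion vectors.

    The loss is 0 when every predicted vector points the same way as its target,
    1 when they are orthogonal, and 2 when they are opposite. eps prevents
    division by zero for zero-length vectors.
    """
    total = 0.0
    for p, g in zip(pred, target):
        num = sum(a * b for a, b in zip(p, g))
        den = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in g))
        total += 1.0 - num / (den + eps)
    return total / len(pred)


# Toy 3D displacement vectors for two landmarks.
pred = [[0.1, 0.0, 0.02], [0.0, -0.05, 0.0]]
target = [[0.12, 0.01, 0.02], [0.0, -0.04, 0.01]]
print(cosine_loss(pred, target))
```

Cosine similarity ignores vector length, so a term like this would normally be paired with a magnitude-sensitive loss such as L2. Whether the paper does so is not stated in the abstract.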

翻訳日:2023-07-27 15:23:36 公開日:2023-07-26

# 連続時間ガウス過程回帰による時間分解能を有するイベントベースステレオビジュアルオドメトリー

Event-based Stereo Visual Odometry with Native Temporal Resolution via Continuous-time Gaussian Process Regression ( http://arxiv.org/abs/2306.01188v2 )

ライセンス: Link先を確認

Jianeng Wang, Jonathan D. Gammell

(参考訳) イベントベースのカメラは、シーン内の個々の視覚変化を非同期に捉えます。これにより、従来のフレームベースのカメラよりも、非常にダイナミックな動きと照明が弱い。それはまた、シーン内のすべての測定が、ユニークなタイミングで起こりうることを意味する。これらの異なる測定時間を扱うことは、イベントベースのカメラを使用する上で大きな課題である。視覚計測(VO)パイプラインでは、時間的に近い測定を1つの共通の時間で行うように近似することで、しばしば対処される。このグルーピングは推定問題を単純化するが、追加センサーがないため、イベントベースカメラの時間分解能を犠牲にする。そこで本稿では,グループ化や近似を必要とせず,個々の事象計測時間を直接推定する完全ステレオVOパイプラインを提案する。連続時間軌道推定を用いて、物理的動機付け前のガウス過程の回帰を通じて、イベントベースのカメラの時間的忠実度と非同期性を維持する。その性能はMVSECデータセットで評価され、2つの独立したシーケンスで7.9e-3と5.9e-3の相対誤差を達成し、既存の公開イベントベースのステレオVOパイプラインをそれぞれ2回と4回上回る。

Event-based cameras asynchronously capture individual visual changes in a scene. This makes them more robust than traditional frame-based cameras to highly dynamic motion and poor illumination. It also means that every measurement in a scene can occur at a unique time. Handling these different measurement times is a major challenge of using event-based cameras. It is often addressed in visual odometry (VO) pipelines by approximating temporally close measurements as occurring at one common time. This grouping simplifies the estimation problem but, absent additional sensors, sacrifices the inherent temporal resolution of event-based cameras. This paper instead presents a complete stereo VO pipeline that estimates directly with individual event-measurement times, without requiring any grouping or approximation in the estimation state. It uses continuous-time trajectory estimation to maintain the temporal fidelity and asynchronous nature of event-based cameras through Gaussian process regression with a physically motivated prior. Its performance is evaluated on the MVSEC dataset, where it achieves 7.9e-3 and 5.9e-3 RMS relative error on two independent sequences, outperforming the existing publicly available event-based stereo VO pipeline by factors of two and four, respectively.
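The key idea above is that a continuous-time Gaussian process (GP) estimate can be queried at each event's own timestamp instead of snapping events to a shared time. The sketch below shows only that mechanism on a 1-D toy signal. It swaps in a generic squared-exponential kernel, so it is not the paper's estimator, which uses a physically motivated prior over full poses. All names and numbers are illustrative.

```python
import math


def rbf(t1, t2, length=0.1, variance=1.0):
    """Squared-exponential covariance between two timestamps."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_posterior_mean(train_t, train_y, query_t, noise=1e-6, length=0.1):
    """GP posterior mean (zero prior mean) evaluated at arbitrary query times."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(train_t)]
         for i, a in enumerate(train_t)]
    alpha = solve(K, train_y)  # alpha = K^{-1} y
    return [sum(rbf(q, t, length) * a for t, a in zip(train_t, alpha))
            for q in query_t]


# Toy 1-D state estimates at four state times, queried at individual event times.
state_times = [0.0, 0.1, 0.2, 0.3]
state_values = [0.0, 0.1, 0.2, 0.3]
event_times = [0.037, 0.15, 0.281]
print(gp_posterior_mean(state_times, state_values, event_times))
```

The dense solve costs O(n³) in the number of state times. Practical continuous-time estimators use priors whose structure keeps the problem sparse.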

翻訳日:2023-07-27 15:23:17 公開日:2023-07-26

# 機械学習のための高忠実性プラズマシミュレーションにおける磁場トポロジーのグラフ表現

Graph Representation of the Magnetic Field Topology in High-Fidelity Plasma Simulations for Machine Learning Applications ( http://arxiv.org/abs/2307.09469v2 )

ライセンス: Link先を確認

Ioanna Bouri, Fanni Franssila, Markku Alho, Giulia Cozzani, Ivan Zaitsev, Minna Palmroth, Teemu Roos

(参考訳) シミュレーションプラズマ中の磁場のトポロジカル解析は、様々な物理現象を幅広い設定で研究することができる。そのような応用の1つは、磁場トポロジーのダイナミクスに関連する現象である磁気リコネクションであり、3次元で検出および特徴づけが難しい。三次元磁気ベクトル場のトポロジカルデータ解析と時空間グラフ表現のためのスケーラブルパイプラインを提案する。我々は,地球近傍空間に対する超コンピュータスケールvlasov理論に基づくシミュレーションであるvlasiatorによって生成された地球磁気圏のシミュレーションについて,本手法を実証する。この研究の目的は、機械学習コミュニティに対して、グラフベースの機械学習アプローチを探求し、広範囲にわたる潜在的な影響に対処することである。

Topological analysis of the magnetic field in simulated plasmas allows the study of various physical phenomena in a wide range of settings. One such application is magnetic reconnection, a phenomenon related to the dynamics of the magnetic field topology, which is difficult to detect and characterize in three dimensions. We propose a scalable pipeline for topological data analysis and spatiotemporal graph representation of three-dimensional magnetic vector fields. We demonstrate our methods on simulations of the Earth's magnetosphere produced by Vlasiator, a supercomputer-scale Vlasov theory-based simulation for near-Earth space. The purpose of this work is to challenge the machine learning community to explore graph-based machine learning approaches to address a largely open scientific problem with wide-ranging potential impact.

翻訳日:2023-07-27 15:16:17 公開日:2023-07-26

# スケール・アウェア Modulation Meet Transformer

Scale-Aware Modulation Meet Transformer ( http://arxiv.org/abs/2307.08579v2 )

ライセンス: Link先を確認

Weifeng Lin, Ziheng Wu, Jiayu Chen, Jun Huang, Lianwen Jin

(参考訳) 本稿では,畳み込みネットワークと視覚トランスを組み合わせることで,様々な下流タスクを効率的に処理できる新しいビジョントランスであるスケールアウェア変調トランス(smt)を提案する。 SMT で提案されているスケール・アウェア・変調 (SAM) には2つの新しい設計が含まれている。まず,マルチヘッド混合畳み込み(mhmc)モジュールについて紹介する。次に,SAAモジュールを提案する。SAAモジュールは軽量だが有効であり,異なる頭部をまたいだ情報融合を可能にする。これら2つのモジュールを活用することで、畳み込み変調はさらに強化される。さらに,全段階にわたって変調を利用して注意を払わないネットワークを構築する先行研究とは対照的に,ネットワークの深化に伴って局所的依存からグローバル的依存へのシフトを効果的にシミュレートできる進化的ハイブリッドネットワーク(ehn)を提案する。大規模な実験により、SMTは様々な視覚的タスクにおいて既存の最先端モデルよりも大幅に優れていることが示された。具体的には、11.5M / 2.4GFLOPs と 32M / 7.7GFLOPs の SMT は ImageNet-1K の 82.2% と 84.3% のトップ-1 の精度が得られる。 imagenet-22kを224^2解像度で事前トレーニングした後、解像度224^2と384^2で微調整すると、87.1%と88.1%のtop-1精度が得られる。 Mask R-CNNによる物体検出では、1xと3xのスケジュールで訓練されたSMTベースがCOCOのSwin Transformerの4.2と1.3mAPを上回っている。 UPerNetとのセマンティックセグメンテーションでは、シングルスケールとマルチスケールのSMTベーステストがADE20Kでそれぞれ2.0mIoUと1.1mIoUを上回っている。

This paper presents a new vision Transformer, the Scale-Aware Modulation Transformer (SMT), which can handle various downstream tasks efficiently by combining a convolutional network and a vision Transformer. The proposed Scale-Aware Modulation (SAM) in SMT includes two primary novel designs. First, we introduce the Multi-Head Mixed Convolution (MHMC) module, which can capture multi-scale features and expand the receptive field. Second, we propose the Scale-Aware Aggregation (SAA) module, which is lightweight but effective, enabling information fusion across different heads. By leveraging these two modules, convolutional modulation is further enhanced. Furthermore, in contrast to prior works that used modulation throughout all stages to build an attention-free network, we propose an Evolutionary Hybrid Network (EHN). EHN can effectively simulate the shift from capturing local to global dependencies as the network becomes deeper, resulting in superior performance. Extensive experiments demonstrate that SMT significantly outperforms existing state-of-the-art models across a wide range of visual tasks. Specifically, SMT with 11.5M parameters / 2.4 GFLOPs and 32M / 7.7 GFLOPs achieves 82.2% and 84.3% top-1 accuracy on ImageNet-1K, respectively. After pretraining on ImageNet-22K at 224^2 resolution, it attains 87.1% and 88.1% top-1 accuracy when fine-tuned at resolutions of 224^2 and 384^2, respectively. For object detection with Mask R-CNN, the SMT base model trained with the 1x and 3x schedules outperforms its Swin Transformer counterpart by 4.2 and 1.3 mAP on COCO, respectively. For semantic segmentation with UPerNet, the SMT base model tested at single and multiple scales surpasses Swin by 2.0 and 1.1 mIoU, respectively, on ADE20K.

翻訳日:2023-07-27 15:16:05 公開日:2023-07-26

# 量子化大規模言語モデルにおける創発的能力--実証的研究

Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study ( http://arxiv.org/abs/2307.08072v2 )

ライセンス: Link先を確認

Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen

(参考訳) 優れた性能にもかかわらず、Large Language Models~(LLM)は、デプロイと使用のためにかなりの計算資源を必要とする。この問題を解決するために、LLMのメモリフットプリント削減や推論率の向上に量子化法が広く応用されている。しかし、大きな課題は、低ビット量子化法がしばしば性能劣化を引き起こすことである。量子化がLLMの容量に与える影響を理解することは重要である。全体的な性能に着目した以前の研究と異なり、本研究は、小言語モデルとllmを区別する重要な特徴である \emph{emergent ability} に対する量子化の影響を調べることを目的としている。特に,量子化llmにおける文脈内学習,連鎖的思考推論,命令追従の能力について検討する。実験により,4ビット量子化モデルにおいて,これらの創発能力は依然として存在することが示された。低ビットモデルの性能向上のために,(1) 部品(またはサブ構造)が量子化に敏感である場合の微視的影響解析,(2) モデル微視化による性能補償の2つの実験を行った。我々の研究は、量子化が創発能力に与える影響を理解するための重要な発見を導き、LLMの極低ビット量子化の可能性に光を放つ。

Despite their superior performance, Large Language Models (LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs and to increase inference speed. However, a major challenge is that low-bit quantization methods often lead to performance degradation. It is therefore important to understand how quantization impacts the capacity of LLMs. Unlike previous studies, which focused on overall performance, this work investigates the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models. Specifically, we examine the abilities of in-context learning, chain-of-thought reasoning, and instruction following in quantized LLMs. Our empirical experiments show that these emergent abilities still exist in 4-bit quantized models, while 2-bit models suffer severe performance degradation on tests of these abilities. To improve the performance of low-bit models, we conduct two special experiments: (1) a fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning. Our work derives a series of important findings for understanding the impact of quantization on emergent abilities and sheds light on the possibilities of extremely low-bit quantization for LLMs.
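To see why 2-bit models degrade much more than 4-bit ones, it helps to look at how few levels remain. The sketch below uses plain symmetric, per-tensor, round-to-nearest quantization on a toy weight list. The quantization methods evaluated in the study are more sophisticated, so this only illustrates the resolution gap. It is not the paper's procedure.

```python
def quantize_dequantize(weights, bits):
    """Symmetric per-tensor round-to-nearest quantization, mapped back to floats.

    Integer codes lie in [-qmax, qmax] with qmax = 2**(bits - 1) - 1. With this
    scheme, 4 bits give 15 levels and 2 bits only 3 (-1, 0, +1), each scaled by
    a shared factor.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return list(weights)
    # Python's round() rounds exact halves to the nearest even integer.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


weights = [0.9, -0.3, 0.05, 0.6, -0.75, 0.2]  # toy weight tensor
for bits in (4, 2):
    approx = quantize_dequantize(weights, bits)
    print(bits, "bits -> mean abs error", round(mean_abs_error(weights, approx), 4))
```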

翻訳日:2023-07-27 15:15:31 公開日:2023-07-26

# スキップ接続を伴わない顔スワップ用強化アンタングル

Reinforced Disentanglement for Face Swapping without Skip Connection ( http://arxiv.org/abs/2307.07928v3 )

ライセンス: Link先を確認

Xiaohang Ren, Xingyu Chen, Pengfei Yao, Heung-Yeung Shum, Baoyuan Wang

(参考訳) SOTAのフェイススワップモデルでは、ターゲットのアイデンティティ(形状)がリークされたり、ターゲットの非アイデンティティ属性(背景、毛髪)が最終結果に完全に保存されないという問題がまだ残っている。 We show that this insufficient disentanglement is caused by two flawed designs that were commonly adopted in prior models: (1) counting on only one compressed encoder to represent both the semantic-level non-identity facial attributes(i.e., pose) and the pixel-level non-facial region details, which is contradictory to satisfy at the same time; (2) highly relying on long skip-connections between the encoder and the final generator, leaking a certain amount of target face identity into the result. そこで我々は,2つのターゲットエンコーダを用いて,顔領域の画素レベルの非顔領域属性と意味的非顔領域属性をそれぞれキャプチャする「WSCスワップ」という新しい顔スワップフレームワークを提案する。対象エンコーダの絡み合い学習をさらに強化するために,逆訓練(gan)によるid消去損失と,[11]のような先行3dmmモデルによる非id化保存損失の両方を用いる。 faceforensics++ と celeba-hq の両方の広範な実験により、我々の結果は、以前完全に無視されたアイデンティティ一貫性を測定するための新しいメトリックを含む、リッチなメトリクスセットの以前の成果を大きく上回っていることが分かりました。

State-of-the-art (SOTA) face swap models still suffer from one of two problems: either the target identity (i.e., shape) leaks into the final results, or the target non-identity attributes (e.g., background, hair) are not fully preserved. We show that this insufficient disentanglement is caused by two flawed designs commonly adopted in prior models. (1) They rely on a single compressed encoder to represent both the semantic-level non-identity facial attributes (e.g., pose) and the pixel-level non-facial region details, two requirements that are contradictory to satisfy at the same time. (2) They rely heavily on long skip connections between the encoder and the final generator, which leak a certain amount of the target face identity into the result. To fix these, we introduce a new face swap framework called 'WSC-swap' that removes skip connections and uses two target encoders to capture, respectively, the pixel-level non-facial region attributes and the semantic non-identity attributes in the face region. To further reinforce disentanglement learning for the target encoder, we employ both an identity removal loss via adversarial training (i.e., GAN) and a non-identity preservation loss via prior 3DMM models like [11]. Extensive experiments on both FaceForensics++ and CelebA-HQ show that our results significantly outperform previous works on a rich set of metrics, including a novel metric for measuring identity consistency, which was previously neglected entirely.

翻訳日:2023-07-27 15:15:08 公開日:2023-07-26

# no train no gain: トランスフォーマーベースの言語モデルのための効率的なトレーニングアルゴリズムの再検討

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models ( http://arxiv.org/abs/2307.06440v2 )

ライセンス: Link先を確認

Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner

(参考訳) トランスフォーマーベースの言語モデルのトレーニングに必要な計算量は近年急増している。この傾向は、トレーニング、バリデーション、下流のパフォーマンスを標準トレーニングよりも高速に向上するために設計された効率的なトレーニングアルゴリズムの研究を動機付けている。本研究では,動的アーキテクチャ (レイヤスタック,レイヤドロップ),バッチ選択 (選択バックプロップ,rho損失),効率的な最適化 (lion,sophia) という3つのカテゴリを再検討する。このような手法を用いて, BERT と T5 を固定計算予算で事前学習すると, トレーニング, 検証, ダウンストリームのゲインが, 完全に遅延した学習率のベースラインに比べて消失することがわかった。我々は,すべての計算時間を参照システム時間と呼ぶ参照マシンにマッピングすることにより,任意のマシン上での計算を可能にする評価プロトコルを定義する。我々は提案するプロトコルの限界について議論し、効率的なトレーニング手順における厳密な研究を促進するためにコードをリリースした。

The computation necessary for training Transformer-based language models has skyrocketed in recent years. This trend has motivated research on efficient training algorithms designed to improve training, validation, and downstream performance faster than standard training. In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia). When pre-training BERT and T5 with a fixed computation budget using such methods, we find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate. We define an evaluation protocol that enables computation to be done on arbitrary machines by mapping all computation time to a reference machine which we call reference system time. We discuss the limitations of our proposed protocol and release our code to encourage rigorous research in efficient training procedures: https://github.com/JeanKaddour/NoTrainNoGain.
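The comparison above hinges on a fixed compute budget and a baseline whose learning rate is fully decayed by the end of that budget. Below is a minimal sketch of such a schedule. It assumes linear warmup, then linear decay reaching zero on the last step that fits in the budget. The warmup length, the linear shape, and the `seconds_per_step` figure (standing in for a time measured on a reference machine) are all assumptions. The paper's reference-system-time protocol is more involved.

```python
def steps_for_budget(budget_seconds, seconds_per_step):
    """Number of optimizer steps that fit in a fixed compute-time budget."""
    return int(budget_seconds // seconds_per_step)


def lr_at(step, total_steps, peak_lr, warmup_steps):
    """Linear warmup, then linear decay that reaches exactly 0 on the last step."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    last = total_steps - 1
    frac = (last - step) / max(1, last - warmup_steps)
    return peak_lr * max(0.0, frac)


total = steps_for_budget(budget_seconds=3600, seconds_per_step=0.5)  # toy numbers
schedule = [lr_at(s, total, peak_lr=1e-3, warmup_steps=100) for s in range(total)]
print(total, schedule[0], schedule[100], schedule[-1])
```

Tying `total_steps` to the budget is what makes the decay "full". A schedule set for a longer run and cut off early would end at a high learning rate and make the baseline look weaker than it is.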

翻訳日:2023-07-27 15:14:40 公開日:2023-07-26

# MMBench: マルチモーダルモデルはオールアラウンドプレイヤーか?

MMBench: Is Your Multi-modal Model an All-around Player? ( http://arxiv.org/abs/2307.06281v2 )

ライセンス: Link先を確認

Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin

(参考訳) 大規模視覚言語モデルは近年顕著な進歩を遂げており、視覚情報に関する認識と推論能力を示している。しかし、これらの大きな視覚言語モデルをどのように効果的に評価するかは大きな障害であり、将来のモデル開発を妨げる。 VQAv2やCOCO Captionのような従来のベンチマークは、定量的なパフォーマンス測定を提供するが、きめ細かい能力評価と非ロバスト評価の指標が欠如している。近年のOwlEvalのような主観的ベンチマークは、人間の労働を取り入れたモデル能力の包括的な評価を提供するが、それらはスケーラブルではなく、重大なバイアスを示す。これらの課題に対応するために,新しいマルチモーダリティベンチマークMMBenchを提案する。 MMBenchは、主に2つの要素からなる包括的な評価パイプラインを方法論的に開発する。第1の要素は厳密にキュレートされたデータセットで、既存の類似ベンチマークを、さまざまな評価質問や能力で上回る。第2の要素は、新しいCircularEval戦略を導入し、ChatGPTの使用を取り入れている。この実装は、フリーフォーム予測を事前定義された選択に変換するように設計されているので、モデルの予測をより堅牢な評価が容易になる。 mmbenchは視覚言語モデルの様々な能力を堅牢に評価するための体系的に設計された客観的ベンチマークである。 mmbenchが研究コミュニティのモデルの評価を改善し、この分野の今後の進歩を促進することを願っている。プロジェクトページ: https://opencompass.org.cn/mmbench

Large vision-language models have recently achieved remarkable progress, exhibiting strong perception and reasoning abilities concerning visual information. However, how to effectively evaluate these large vision-language models remains a major obstacle, hindering future model development. Traditional benchmarks like VQAv2 or COCO Caption provide quantitative performance measurements, but they lack fine-grained ability assessment and rely on non-robust evaluation metrics. Recent subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model's abilities by incorporating human labor, but they are not scalable and display significant bias. In response to these challenges, we propose MMBench, a novel multi-modality benchmark. MMBench methodically develops a comprehensive evaluation pipeline composed primarily of two elements. The first element is a meticulously curated dataset that surpasses existing similar benchmarks in the number and variety of evaluation questions and abilities. The second element introduces a novel CircularEval strategy and uses ChatGPT to convert free-form predictions into pre-defined choices, thereby facilitating a more robust evaluation of the model's predictions. MMBench is a systematically designed objective benchmark for robustly evaluating the various abilities of vision-language models. We hope MMBench will assist the research community in better evaluating their models and encourage future advancements in this domain. Project page: https://opencompass.org.cn/mmbench.
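The CircularEval strategy named above can be sketched as follows. Each multiple-choice question is posed once per circular rotation of its options, and the model counts as correct only if it picks the right option every time. This removes credit a model could earn from a position bias. The sketch assumes the model already returns an option letter. The ChatGPT-based step that maps free-form output to a choice is omitted, and `always_a` is a hypothetical toy model.

```python
LETTERS = "ABCDEFGHIJ"


def circular_shifts(choices):
    """All N rotations of the option list: [a, b, c] -> [a, b, c], [b, c, a], [c, a, b]."""
    return [choices[i:] + choices[:i] for i in range(len(choices))]


def circular_eval(model, question, choices, answer):
    """Solved only if the model picks the right option under every rotation.

    model(question, options) is assumed to return a single option letter.
    An invalid or out-of-range letter counts as a miss.
    """
    for rotated in circular_shifts(choices):
        letter = model(question, rotated)
        valid = set(LETTERS[: len(rotated)])
        if letter not in valid or rotated[LETTERS.index(letter)] != answer:
            return False
    return True


def always_a(question, options):
    # A position-biased "model" that always picks the first option.
    return "A"


options = ["cat", "dog", "fox", "owl"]
print(circular_eval(always_a, "Which animal is shown?", options, "cat"))  # False
```

Under a single pass, `always_a` would score this question as correct because "cat" happens to be listed first. Under rotation it fails on the second pass.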

翻訳日:2023-07-27 15:14:21 公開日:2023-07-26

# Prompt Generate Train (PGT):オープンブック質問応答のための検索拡張生成モデルのFew-shot Domain Adaption

Prompt Generate Train (PGT): Few-shot Domain Adaption of Retrieval Augmented Generation Models for Open Book Question-Answering ( http://arxiv.org/abs/2307.05915v2 )

ライセンス: Link先を確認

C. S. Krishna

(参考訳) 本稿では,オープンブック質問応答のための生成的質問応答モデルを開発するためのフレームワークであるPrompt, Generate, Train (PGT)を提案する。このフレームワークは、教師付き微調整および合成フィードバックによる強化学習を用いて、レトリバー拡張生成(RAG)モデルをターゲット領域に適用する。これを仮定すると,GPT-4をベースとしたテキスト内検索拡張生成と競合する整合的不確実性校正モデルが得られ,より低いサービスコストで関連する回答が生成される。フレームワークの合成生成パイプラインは、オープンソースのLCMと新しい一貫性フィルタリングスキームを使用して、<passage, question, answer>タプルからなる合成トレーニングデータを生成する。パイプラインは、コーパス全体にわたる抽象的および抽出的な質問を生成するように設計されている。このフレームワークでは,高密度検索器(ColBERTv2)と,合成データセット上に小型のLPMからなるRAGモデルを微調整することを提案する。並行して、このフレームワークは、合成されたサンプルの事前関連順序付けを用いて、幻覚された回答よりも高いドメイン基底回答をスコアするRewardモデルを訓練する。次のフェーズでは、RAGモデルを強化学習(Proximal Policy Optimization)を使用してターゲットドメインと整合させる。このステップは、RAGモデルの基底化された回答を生成し、ドメインの質問を無視する能力を改善する可能性がある。最終段階では、このフレームワークは抽出された質問に対するモデルの不確実性を校正する。

We propose a framework, Prompt, Generate, Train (PGT), to efficiently develop a generative question-answering model for open-book question answering over a proprietary collection of text documents. The framework adapts a retrieval-augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning with synthetic feedback in a few-shot setting. We hypothesize that this will yield an aligned, uncertainty-calibrated model that is competitive with GPT-4-based in-context retrieval-augmented generation in generating relevant answers, at lower serving costs. The framework's synthetic generation pipeline will generate synthetic training data comprising <passage, question, answer> tuples using an open-source LLM and a novel consistency filtering scheme. The pipeline will be designed to generate both abstractive and extractive questions that span the entire corpus. The framework proposes to fine-tune a smaller RAG model, comprising a dense retriever (ColBERTv2) and a smaller LLM, on the synthetic dataset. In parallel, the framework will train a reward model to score domain-grounded answers higher than hallucinated answers, using an a priori relevance ordering of synthetically assembled samples. In the next phase, the framework will align the RAG model with the target domain using reinforcement learning (Proximal Policy Optimization). This step may improve the RAG model's ability to generate grounded answers and to ignore out-of-domain questions. In the final phase, the framework will calibrate the model's uncertainty for extractive question answering.
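The abstract does not describe PGT's "novel consistency filtering scheme". The sketch below shows only the generic round-trip idea such filters often build on. A synthetic <passage, question, answer> tuple is kept only if an independent reader reproduces the answer from the passage. `answer_fn` and `toy_reader` are hypothetical stand-ins for a real QA model.

```python
def normalize_answer(text):
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def consistency_filter(samples, answer_fn):
    """Keep synthetic (passage, question, answer) tuples that survive a round trip.

    answer_fn(passage, question) stands in for an independent QA model
    (hypothetical). A tuple is kept only if that model reproduces the answer.
    """
    kept = []
    for passage, question, answer in samples:
        predicted = answer_fn(passage, question)
        if normalize_answer(predicted) == normalize_answer(answer):
            kept.append((passage, question, answer))
    return kept


def toy_reader(passage, question):
    # Toy stand-in for a QA model: answers with the passage's last two words.
    return " ".join(passage.rstrip(".").split()[-2:])


samples = [
    ("The warranty period is 24 months.", "How long is the warranty?", "24 Months"),
    ("Returns are accepted within 30 days.", "What is the return window?", "one month"),
]
print(consistency_filter(samples, toy_reader))
```

Exact-match comparison is strict. A real filter would more likely use a softer match (token F1 or a model-based judge), which is a design choice the abstract leaves open.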

翻訳日:2023-07-27 15:13:57 公開日:2023-07-26

# PKU-GoodsAD:教師なし異常検出とセグメンテーションのためのスーパーマーケットグッズデータセット

PKU-GoodsAD: A Supermarket Goods Dataset for Unsupervised Anomaly Detection and Segmentation ( http://arxiv.org/abs/2307.04956v2 )

ライセンス: Link先を確認

Jian Zhang, Runwei Ding, Miaoju Ban, Ge Yang

(参考訳) 視覚異常検出はコンピュータビジョンの分野で多くのタスクに必須であり、一般的に用いられる。最近の異常検出データセットは主に産業自動化検査、医療画像分析、ビデオ監視に焦点を当てている。無人のスーパーマーケットやスマート製造における異常検出の適用範囲を広げ,研究するために,スーパーマーケット商品の異常検出(GoodsAD)データセットを導入する。 484種類の外見品を6つのカテゴリに分けた6124枚の高解像度画像を含んでいる。各カテゴリには、変形、表面損傷、開口など、いくつかの一般的な種類の異常が含まれている。異常はテクスチャ変化と構造変化の両方を含む。教師なしの設定に従い、通常の(欠陥のない)画像のみをトレーニングに使用する。画素精度の基底真理領域は、全ての異常に対して提供される。また,現在最先端の教師なし異常検出手法を徹底的に評価する。この最初のベンチマークは、産業的異常検出データセット(例えばMVTec AD)でうまく機能するいくつかのメソッドが、我々のデータセットで性能が悪いことを示している。これは、現実世界のアプリケーションに焦点を当てたスーパーマーケット商品異常検出のための包括的でマルチオブジェクトデータセットである。

Visual anomaly detection is essential and commonly used for many tasks in computer vision. Recent anomaly detection datasets mainly focus on industrial automated inspection, medical image analysis, and video surveillance. To broaden the application and research of anomaly detection in unmanned supermarkets and smart manufacturing, we introduce the supermarket goods anomaly detection (GoodsAD) dataset. It contains 6124 high-resolution images of 484 goods with different appearances, divided into 6 categories. Each category contains several common types of anomalies, such as deformation, surface damage, and opened items. The anomalies include both texture changes and structural changes. The dataset follows the unsupervised setting, so only normal (defect-free) images are used for training. Pixel-precise ground-truth regions are provided for all anomalies. Moreover, we conduct a thorough evaluation of current state-of-the-art unsupervised anomaly detection methods. This initial benchmark indicates that some methods that perform well on industrial anomaly detection datasets (e.g., MVTec AD) perform poorly on our dataset. GoodsAD is a comprehensive, multi-object dataset for supermarket goods anomaly detection that focuses on real-world applications.
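The abstract does not name its evaluation metrics. AUROC is a common choice for unsupervised anomaly detection benchmarks of the MVTec AD kind, so the sketch below assumes it. Each image (or pixel) gets an anomaly score, and AUROC measures how well those scores separate anomalous from normal samples, independent of any threshold.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen anomalous sample (label 1)
    receives a higher anomaly score than a randomly chosen normal one
    (label 0). Ties count as half. Runs in O(P * N) time.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is fine for image-level scores. For pixel-level evaluation over millions of pixels, a sort-based implementation would be needed instead.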

翻訳日:2023-07-27 15:13:28 公開日:2023-07-26

# SAS Video-QA: 効率的なビデオ質問応答のための自己適応サンプリング

SAS Video-QA: Self-Adaptive Sampling for Efficient Video Question-Answering ( http://arxiv.org/abs/2307.04192v2 )

ライセンス: Link先を確認

Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria

(参考訳) ビデオ質問応答は、ビデオ理解の分野における基本的な課題である。ビデオ変換器を備えた現在の視覚言語モデル(VLM)では、時間的モデリングが可能であり、優れた結果が得られるが、計算能力の巨大なコストがかかるため、リアルタイムアプリケーションシナリオへのデプロイには高すぎる。 An economical workaround only samples a small portion of frames to represent the main content of that video and tune an image--text model on these sampled frames. Recent video understanding models usually randomly sample a set of frames or clips, regardless of internal correlations between their visual contents, nor their relevance to the problem. We argue that such kinds of aimless sampling may omit the key frames from which the correct answer can be deduced, and the situation gets worse when the sampling sparsity increases, which always happens as the video lengths increase. To mitigate this issue, we propose two frame sampling strategies, namely the most domain frames (MDF) and most implied frames (MIF), to maximally preserve those frames that are most likely vital to the given questions. MDF passively minimizes the risk of key frame omission in a bootstrap manner, while MIS actively searches key frames customized for each video--question pair with the assistance of auxiliary models. 3つの高度なVLM(CLIP, GIT, All-in-one)による3つの公開データセットに対する実験結果から,提案手法が画像テキスト事前学習モデルの性能を向上させることを示す。本論文で提案されている手法に関するソースコードはhttps://github.com/declare-lab/sas-vqa.comで公開されている。

Video question answering is a fundamental task in the field of video understanding. Although current vision-language models (VLMs) equipped with Video Transformers enable temporal modeling and yield superior results, they come at the cost of huge computational power and are thus too expensive to deploy in real-time application scenarios. An economical workaround samples only a small portion of frames to represent the main content of a video and tunes an image-text model on these sampled frames. Recent video understanding models usually sample a set of frames or clips randomly, without regard to the internal correlations between their visual contents or their relevance to the question. We argue that such aimless sampling may omit the key frames from which the correct answer can be deduced. The situation worsens as sampling becomes sparser, which inevitably happens as video length increases. To mitigate this issue, we propose two frame sampling strategies, namely most dominant frames (MDF) and most implied frames (MIF), to maximally preserve the frames that are most likely vital to the given questions. MDF passively minimizes the risk of key frame omission in a bootstrap manner, while MIF actively searches for key frames customized for each video-question pair with the assistance of auxiliary models. Experimental results on three public datasets with three advanced VLMs (CLIP, GIT, and All-in-one) demonstrate that our proposed strategies can boost the performance of image-text pretrained models. The source code for the method proposed in this paper is publicly available at https://github.com/declare-lab/sas-vqa.
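The contrast between content-blind and question-aware sampling can be illustrated with a toy relevance score per frame. The sketch below compares evenly spaced sampling with keeping the top-k most relevant frames in temporal order. The scores are assumed to come from some auxiliary image-text model. The scorer, the function names, and the numbers are hypothetical, and this is not the paper's MDF or MIF procedure.

```python
def uniform_frames(n_frames, k):
    """Baseline: k evenly spaced frame indices, ignoring content."""
    return [i * n_frames // k for i in range(k)]


def top_k_relevant_frames(relevance, k):
    """Indices of the k most question-relevant frames, returned in temporal order.

    relevance[i] is assumed to be a score from an auxiliary image-text model
    comparing frame i with the question (hypothetical scorer).
    """
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return sorted(ranked[:k])


scores = [0.05, 0.10, 0.82, 0.07, 0.04, 0.91, 0.03, 0.06, 0.02, 0.08]  # toy scores
print(uniform_frames(len(scores), 3))    # [0, 3, 6]
print(top_k_relevant_frames(scores, 3))  # [1, 2, 5]
```

With these toy scores, uniform sampling misses both high-relevance frames (2 and 5). This is the failure mode the abstract attributes to aimless sampling.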

Translated: 2023-07-27 15:13:12 Published: 2023-07-26

# Experimental Solutions to the High-Dimensional Mean King's Problem

Experimental Solutions to the High-Dimensional Mean King's Problem ( http://arxiv.org/abs/2307.12938v2 )

License: see link

Tareq Jaouni, Xiaoqin Gao, Sören Arlt, Mario Krenn and Ebrahim Karimi

In 1987, Vaidman, Aharonov, and Albert put forward a puzzle called the Mean King's Problem (MKP) that can be solved only by harnessing quantum entanglement. Prime-powered solutions to the problem have been shown to exist, but they have not yet been experimentally realized for any dimension beyond two. We propose a general first-of-its-kind experimental scheme for solving the MKP in prime dimensions ($D$). Our search is guided by the digital discovery framework PyTheus, which finds highly interpretable graph-based representations of quantum optical experimental setups; using it, we find specific solutions and generalize to higher dimensions through human insight. As proof of principle, we present a detailed investigation of our solution for the three-, five-, and seven-dimensional cases. We obtain maximum success probabilities of $72.8 \%$, $45.8\%$, and $34.8 \%$, respectively. We, therefore, posit that our computer-inspired scheme yields solutions that exceed the classical probability ($1/D$) twofold, demonstrating its promise for experimental implementation.
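
The closing claim can be checked directly from the reported numbers. A success probability $p$ exceeds the classical guessing probability $1/D$ at least twofold exactly when $pD \ge 2$. A minimal arithmetic check using only the three figures quoted above:

```python
# Reported maximum success probabilities, keyed by dimension D.
REPORTED = {3: 0.728, 5: 0.458, 7: 0.348}


def advantage_over_classical(dimension: int, success_prob: float) -> float:
    """Ratio of the success probability to the classical guessing probability 1/D."""
    return success_prob / (1.0 / dimension)


for d, p in REPORTED.items():
    print(f"D={d}: p={p:.3f}, classical={1 / d:.3f}, ratio={advantage_over_classical(d, p):.2f}")
```

The ratios come out to about 2.18, 2.29, and 2.44 for $D = 3, 5, 7$, which is consistent with the twofold claim.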

Translated: 2023-07-27 15:07:22 Published: 2023-07-26

# Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification

Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification ( http://arxiv.org/abs/2307.12917v2 )

License: see link

Haocong Rao, Cyril Leung, Chunyan Miao

With rapid advancements in depth sensors and deep learning, skeleton-based person re-identification (re-ID) models have recently achieved remarkable progress with many advantages. Most existing solutions learn single-level skeleton features from body joints with the assumption of equal skeleton importance, while they typically lack the ability to exploit more informative skeleton features from various levels such as limb level with more global body patterns. The label dependency of these methods also limits their flexibility in learning more general skeleton representations. This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons. Firstly, we construct hierarchical representations of skeletons to model coarse-to-fine body and motion features from the levels of body joints, components, and limbs. Then a hierarchical meta-prototype contrastive learning model is proposed to cluster and contrast the most typical skeleton features ("prototypes") from different-level skeletons. By converting original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes to capture more effective skeleton features for person re-ID. Furthermore, we devise a hard skeleton mining mechanism to adaptively infer the informative importance of each skeleton, so as to focus on harder skeletons to learn more discriminative skeleton representations. Extensive evaluations on five datasets demonstrate that our approach outperforms a wide variety of state-of-the-art skeleton-based methods. We further show the general applicability of our method to cross-view person re-ID and RGB-based scenarios with estimated skeletons.
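
The approach clusters and contrasts "prototypes". Prototype-based contrastive learning is commonly implemented in two steps. First, the normalized features of each (pseudo-)cluster are averaged into a prototype. Then an InfoNCE-style loss pulls each feature toward its own prototype and away from the others. The sketch below shows that generic loss in plain Python. The temperature, the toy 2-D features and the function names are assumptions. It also omits the hierarchical levels, meta-prototype transformations and hard skeleton mining described in the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # leave zero vectors unchanged
    return [x / norm for x in v]


def build_prototypes(features: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, Vector]:
    """Prototype of each cluster: normalized mean of its normalized features."""
    sums: Dict[int, Vector] = {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, x in enumerate(l2_normalize(f)):
            acc[i] += x
    # Normalizing the sum gives the same direction as normalizing the mean.
    return {y: l2_normalize(s) for y, s in sums.items()}


def prototype_nce_loss(feature: Sequence[float], label: int,
                       prototypes: Dict[int, Vector], temperature: float = 0.1) -> float:
    """-log softmax over cosine similarities to all prototypes, evaluated at `label`."""
    f = l2_normalize(feature)
    logits = {y: sum(a * b for a, b in zip(f, c)) / temperature for y, c in prototypes.items()}
    top = max(logits.values())
    log_partition = top + math.log(sum(math.exp(z - top) for z in logits.values()))
    return log_partition - logits[label]
```

A feature close to its own prototype yields a small loss. The same feature scored against a different cluster's label yields a large one, and that gap is the signal contrastive training exploits.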

Translated: 2023-07-27 15:07:03 Published: 2023-07-26

# AMAE: Adaptation of Pre-Trained Masked Autoencoder for Dual-Distribution Anomaly Detection in Chest X-Rays

AMAE: Adaptation of Pre-Trained Masked Autoencoder for Dual-Distribution Anomaly Detection in Chest X-Rays ( http://arxiv.org/abs/2307.12721v2 )

License: see link

Behzad Bozorgtabar, Dwarikanath Mahapatra, Jean-Philippe Thiran

Unsupervised anomaly detection in medical images such as chest radiographs is stepping into the spotlight as it mitigates the scarcity of labor-intensive and costly expert annotation of anomaly data. However, nearly all existing methods are formulated as one-class classification trained only on representations from the normal class, and they discard a potentially significant portion of the unlabeled data. This paper focuses on a more practical setting, dual-distribution anomaly detection for chest X-rays, using the entire training data, including both normal and unlabeled images. Inspired by a modern self-supervised vision transformer model trained using partial image inputs to reconstruct missing image regions, we propose AMAE, a two-stage algorithm for adaptation of the pre-trained masked autoencoder (MAE). Starting from MAE initialization, AMAE first creates synthetic anomalies from only normal training images and trains a lightweight classifier on frozen transformer features. Subsequently, we propose an adaptation strategy to leverage unlabeled images containing anomalies. The adaptation scheme is accomplished by assigning pseudo-labels to unlabeled images and using two separate MAE-based modules to model the normative and anomalous distributions of pseudo-labeled images. The effectiveness of the proposed adaptation strategy is evaluated with different anomaly ratios in an unlabeled training set. AMAE leads to consistent performance gains over competing self-supervised and dual-distribution anomaly detection methods, setting a new state of the art on three public chest X-ray benchmarks: RSNA, NIH-CXR, and VinDr-CXR.
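
The pseudo-labeling step can be pictured in two parts. Unlabeled images are ranked by an anomaly score, then split into two groups, one per MAE-based module. The sketch assumes the scores are already available (for example, from the stage-one classifier) and that an anomaly ratio is given. Both the top-fraction rule and the function names are illustrative assumptions, not AMAE's exact procedure.

```python
from typing import List, Sequence, Tuple


def assign_pseudo_labels(scores: Sequence[float], anomaly_ratio: float) -> List[int]:
    """Label the highest-scoring `anomaly_ratio` fraction as anomalous (1), the rest normal (0)."""
    if not 0.0 <= anomaly_ratio <= 1.0:
        raise ValueError("anomaly_ratio must lie in [0, 1]")
    n_anomalous = round(len(scores) * anomaly_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [0] * len(scores)
    for i in ranked[:n_anomalous]:
        labels[i] = 1
    return labels


def split_by_pseudo_label(items: Sequence[str], labels: Sequence[int]) -> Tuple[List[str], List[str]]:
    """Route items to the normative and anomalous groups, one per separately trained model."""
    normal = [x for x, y in zip(items, labels) if y == 0]
    anomalous = [x for x, y in zip(items, labels) if y == 1]
    return normal, anomalous
```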

Translated: 2023-07-27 15:06:34 Published: 2023-07-26

# Entanglement-Assisted Quantum Networks: Mechanics, Enabling Technologies, Challenges, and Research Directions

Entanglement-Assisted Quantum Networks: Mechanics, Enabling Technologies, Challenges, and Research Directions ( http://arxiv.org/abs/2307.12490v2 )

License: see link

Zhonghui Li, Kaiping Xue, Jian Li, Lutong Chen, Ruidong Li, Zhaoying Wang, Nenghai Yu, David S.L. Wei, Qibin Sun, Jun Lu

Over the past few decades, significant progress has been made in quantum information technology, from theoretical studies to experimental demonstrations. Revolutionary quantum applications are now in the limelight, showcasing the advantages of quantum information technology and becoming a research hotspot in academia and industry. To enable quantum applications to have a more profound impact and wider application, the interconnection of multiple quantum nodes through quantum channels becomes essential. Building an entanglement-assisted quantum network, capable of realizing quantum information transmission between these quantum nodes, is the primary goal. However, entanglement-assisted quantum networks are governed by the unique laws of quantum mechanics, such as the superposition principle, the no-cloning theorem, and quantum entanglement, setting them apart from classical networks. Consequently, fundamental efforts are required to establish entanglement-assisted quantum networks. While some insightful surveys have paved the way for entanglement-assisted quantum networks, most of these studies focus on enabling technologies and quantum applications, neglecting critical network issues. In response, this paper presents a comprehensive survey of entanglement-assisted quantum networks. Alongside reviewing fundamental mechanics and enabling technologies, the paper provides a detailed overview of the network structure, working principles, and development stages, highlighting the differences from classical networks. Additionally, the challenges of building wide-area entanglement-assisted quantum networks are addressed. Furthermore, the paper emphasizes open research directions, including architecture design, entanglement-based network issues, and standardization, to facilitate the implementation of future entanglement-assisted quantum networks.

Translated: 2023-07-27 15:05:38 Published: 2023-07-26

# Hybrid Genetic Search for Dynamic Vehicle Routing with Time Windows

Hybrid Genetic Search for Dynamic Vehicle Routing with Time Windows ( http://arxiv.org/abs/2307.11800v2 )

License: see link

Mohammed Ghannam and Ambros Gleixner

The dynamic vehicle routing problem with time windows (DVRPTW) is a generalization of the classical VRPTW to an online setting, where customer data arrives in batches and real-time routing solutions are required. In this paper we adapt the Hybrid Genetic Search (HGS) algorithm, a successful heuristic for VRPTW, to the dynamic variant. We discuss the affected components of the HGS algorithm including giant-tour representation, cost computation, initial population, crossover, and local search. Our approach modifies these components for DVRPTW, attempting to balance solution quality and constraints on future customer arrivals. To this end, we devise methods for comparing different-sized solutions, normalizing costs, and accounting for future epochs that do not require any prior training. Despite this limitation, computational results on data from the EURO meets NeurIPS Vehicle Routing Competition 2022 demonstrate significantly improved solution quality over the best-performing baseline algorithm.
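
The giant-tour representation mentioned above stores a solution as one customer sequence with no depot visits. In HGS it is decoded into routes with the Split procedure, a shortest-path dynamic program over the tour. The sketch below is the textbook capacity-only version for a static instance. It ignores time windows, the dynamic epochs and the cost normalization the paper adds. The line-shaped toy instance is made up for illustration.

```python
from typing import List, Sequence, Tuple

INF = float("inf")


def split(giant_tour: Sequence[int], demand: Sequence[int],
          dist: Sequence[Sequence[float]], capacity: int) -> Tuple[float, List[List[int]]]:
    """Optimally cut a giant tour into capacity-feasible routes (node 0 is the depot)."""
    n = len(giant_tour)
    best = [0.0] + [INF] * n  # best[j]: cheapest cost to serve the first j customers
    pred = [0] * (n + 1)      # pred[j]: start index of the last route in that solution
    for i in range(n):
        if best[i] == INF:
            continue
        load, route_cost = 0, 0.0
        for j in range(i, n):
            c = giant_tour[j]
            load += demand[c]
            if load > capacity:
                break
            if j == i:
                route_cost = dist[0][c] + dist[c][0]
            else:
                prev = giant_tour[j - 1]
                # Replace the return leg prev->depot with prev->c->depot.
                route_cost += dist[prev][c] + dist[c][0] - dist[prev][0]
            if best[i] + route_cost < best[j + 1]:
                best[j + 1] = best[i] + route_cost
                pred[j + 1] = i
    if best[n] == INF:
        raise ValueError("some customer demand exceeds vehicle capacity")
    routes, j = [], n
    while j > 0:  # walk predecessors back to rebuild the routes
        i = pred[j]
        routes.append(list(giant_tour[i:j]))
        j = i
    routes.reverse()
    return best[n], routes


if __name__ == "__main__":
    positions = [0, 1, 2, -1, -2]  # depot at 0; customers 1..4 on a line
    dist = [[abs(a - b) for b in positions] for a in positions]
    print(split([1, 2, 3, 4], [0, 1, 1, 1, 1], dist, capacity=2))
```

On the toy instance, Split returns the two routes [[1, 2], [3, 4]] with a total length of 8. Each route serves one side of the depot.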

Translated: 2023-07-27 15:04:47 Published: 2023-07-26

# MediaGPT: A Large Language Model For Chinese Media

MediaGPT : A Large Language Model For Chinese Media ( http://arxiv.org/abs/2307.10930v2 )

License: see link

Zhonghao Wang, Zijia Lu, Bo Jin, Haiying Deng

Large language models (LLMs) have shown remarkable capabilities in generating high-quality text and making predictions based on large amounts of data, including in the media domain. However, in practical applications, the differences between media use cases and general-purpose applications of LLMs have become increasingly apparent, especially for Chinese. This paper examines the unique characteristics of media-domain-specific LLMs compared to general LLMs, designs a diverse set of task instruction types to cater to the specific requirements of the domain, and constructs unique datasets tailored to the media domain. Based on these, we propose MediaGPT, a domain-specific LLM for the Chinese media domain, trained on domain-specific data and expert SFT data. By performing human expert evaluation and strong-model evaluation on a validation set, this paper demonstrates that MediaGPT outperforms mainstream models on various Chinese media domain tasks, and verifies the importance of domain data and domain-defined prompt types for building an effective domain-specific LLM.

Translated: 2023-07-27 15:04:30 Published: 2023-07-26

# Deceptive Alignment Monitoring

Deceptive Alignment Monitoring ( http://arxiv.org/abs/2307.10569v2 )

License: see link

Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.

Translated: 2023-07-27 15:04:12 Published: 2023-07-26

# TimeTuner: Diagnosing Time Representations for Time-Series Forecasting with Counterfactual Explanations

TimeTuner: Diagnosing Time Representations for Time-Series Forecasting with Counterfactual Explanations ( http://arxiv.org/abs/2307.09916v2 )

License: see link

Jianing Hao, Qing Shi, Yilin Ye, and Wei Zeng

Deep learning (DL) approaches are being increasingly used for time-series forecasting, with many efforts devoted to designing complex DL models. Recent studies have shown that the success of DL is often attributable to effective data representations, fostering the fields of feature engineering and representation learning. However, automated approaches for feature learning are typically limited in incorporating prior knowledge, identifying interactions among variables, and choosing evaluation metrics that ensure the models are reliable. To address these limitations, this paper contributes a novel visual analytics framework, namely TimeTuner, designed to help analysts understand how model behaviors are associated with localized correlations, stationarity, and granularity of time-series representations. The system mainly consists of the following two-stage technique: we first leverage counterfactual explanations to connect time-series representations, multivariate features, and model predictions. Next, we design multiple coordinated views, including a partition-based correlation matrix and juxtaposed bivariate stripes, and provide a set of interactions that allow users to step into the transformation selection process, navigate through the feature space, and reason about model performance. We instantiate TimeTuner with two transformation methods, smoothing and sampling, and demonstrate its applicability on real-world time-series forecasting of univariate sunspots and multivariate air pollutants. Feedback from domain experts indicates that our system can help characterize time-series representations and guide the feature engineering processes.
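
The two transformation families named above, smoothing and sampling, are easy to state concretely. The sketch below uses a sliding-window moving average and fixed-stride downsampling. It also computes lag-1 autocorrelation as one simple statistic an analyst might compare across representations. The window, stride and choice of statistic are assumptions, not TimeTuner's configuration.

```python
from statistics import mean
from typing import List, Sequence


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Smoothing: mean of each length-`window` sliding window (output is shorter by window - 1)."""
    if not 1 <= window <= len(series):
        raise ValueError("window must lie in [1, len(series)]")
    return [mean(series[i:i + window]) for i in range(len(series) - window + 1)]


def downsample(series: Sequence[float], step: int) -> List[float]:
    """Sampling: keep every `step`-th observation, starting from the first."""
    if step < 1:
        raise ValueError("step must be positive")
    return list(series[::step])


def lag1_autocorrelation(series: Sequence[float]) -> float:
    """Sample autocorrelation at lag 1."""
    m = mean(series)
    den = sum((x - m) ** 2 for x in series)
    if den == 0:
        raise ValueError("constant series has undefined autocorrelation")
    num = sum((series[t] - m) * (series[t - 1] - m) for t in range(1, len(series)))
    return num / den
```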

Translated: 2023-07-27 15:03:57 Published: 2023-07-26

# GPT-3 Models are Few-Shot Financial Reasoners

GPT-3 Models are Few-Shot Financial Reasoners ( http://arxiv.org/abs/2307.13617v2 )

License: see link

Raul Salles de Padua, Imran Qureshi and Mustafa U. Karakaplan

Financial analysis is an important tool for evaluating company performance. Practitioners work to answer financial questions to make profitable investment decisions, and use advanced quantitative analyses to do so. As a result, Financial Question Answering (QA) is a question answering task that requires deep reasoning about numbers. Furthermore, it is unknown how well pre-trained language models can reason in the financial domain. The current state of the art requires a retriever to collect relevant facts about the financial question from the text and a generator to produce a valid financial program and a final answer. However, recent large language models like GPT-3 have achieved state-of-the-art performance on a wide variety of tasks with just a few examples. We run several experiments with GPT-3 and find that a separate retrieval model and logic engine continue to be essential components for achieving SOTA performance in this task, particularly due to the precise nature of financial questions and the complex information stored in financial documents. With this understanding, our refined prompt-engineering approach on GPT-3 achieves near-SOTA accuracy without any fine-tuning.
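
The "financial program" produced by the generator can be pictured as a short chain of arithmetic steps whose later steps refer back to earlier results. The sketch below interprets a notation modeled on the FinQA dataset, such as `subtract(5829, 5735), divide(#0, 5735)`. The exact syntax, the four-operation subset and the numbers are assumptions for illustration, not the output format of the paper's GPT-3 prompts.

```python
import re
from typing import Callable, Dict, List

# One step looks like op(arg1, arg2); "#k" refers to the result of step k.
_STEP = re.compile(r"(\w+)\(([^()]*)\)")
_OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}


def _value(token: str, results: List[float]) -> float:
    token = token.strip()
    if token.startswith("#"):
        return results[int(token[1:])]  # reference to an earlier step's result
    return float(token)


def run_program(program: str) -> float:
    """Execute the steps left to right and return the last step's result."""
    results: List[float] = []
    for op, args in _STEP.findall(program):
        if op not in _OPS:
            raise ValueError(f"unsupported operation: {op}")
        a, b = (_value(t, results) for t in args.split(","))
        results.append(_OPS[op](a, b))
    if not results:
        raise ValueError("no steps found")
    return results[-1]


if __name__ == "__main__":
    # Relative change: (5829 - 5735) / 5735
    print(round(run_program("subtract(5829, 5735), divide(#0, 5735)"), 4))
```

Executing the program symbolically, rather than asking the model for the final number, is one way to keep arithmetic exact once the right facts have been retrieved.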

Translated: 2023-07-27 14:55:10 Published: 2023-07-26

# FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios ( http://arxiv.org/abs/2307.13528v2 )

License: see link

I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu

The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text. In particular: (1) A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models. (2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts. (3) There is a scarcity of explicit evidence available during the process of fact checking. With the above challenges in mind, in this paper, we propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool associated with ChatGPT plugin interface at https://github.com/GAIR-NLP/factool .

Translated: 2023-07-27 14:54:28 Published: 2023-07-26

# Duet: efficient and scalable hybriD neUral rElation undersTanding

Duet: efficient and scalable hybriD neUral rElation undersTanding ( http://arxiv.org/abs/2307.13494v2 )

License: see link

Kaixin Zhang, Hongzhi Wang, Yabin Lu, Ziqi Li, Chang Shu, Yu Yan, Donghua Yang

Learned cardinality estimation methods have achieved high precision compared to traditional methods. Among learned methods, query-driven approaches have long faced the data and workload drift problem. Although both query-driven and hybrid methods have been proposed to avoid this problem, even the state of the art among them suffers from high training and estimation costs, limited scalability, instability, and the long-tailed distribution problem on high-cardinality and high-dimensional tables, which seriously affects the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicates into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method that estimates cardinality directly without sampling or any non-differentiable process. Compared to Naru and UAE, Duet not only reduces the inference complexity from $O(n)$ to $O(1)$ but also achieves higher accuracy on high-cardinality and high-dimensional tables. Experimental results show that Duet achieves all the design goals above, is much more practical, and even has a lower inference cost on CPU than most learned methods have on GPU.
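
The instability attributed to progressive sampling can be illustrated on a toy table. For a two-column range query, an autoregressive estimator factors the selectivity as $P(A \in R_A)\,P(B \in R_B \mid A \in R_A)$. Progressive sampling approximates the second factor by drawing concrete values of $A$. The sketch below replaces the neural model with exact empirical frequencies from a toy table, so the only remaining error is sampling noise. It shows the baseline being criticized, not Duet's sampling-free method, and the table and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Sequence, Tuple

Row = Tuple[int, int]
Range = Tuple[int, int]


def in_range(v: int, r: Range) -> bool:
    return r[0] <= v <= r[1]


def exact_selectivity(rows: Sequence[Row], ra: Range, rb: Range) -> float:
    """Fraction of rows satisfying both range predicates (ground truth)."""
    hits = sum(1 for a, b in rows if in_range(a, ra) and in_range(b, rb))
    return hits / len(rows)


def progressive_sampling_selectivity(rows: Sequence[Row], ra: Range, rb: Range,
                                     num_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P(A in ra) * P(B in rb | A in ra) via sampled values of A."""
    n = len(rows)
    count_a = Counter(a for a, _ in rows)
    b_given_a = defaultdict(list)
    for a, b in rows:
        b_given_a[a].append(b)
    candidates = [a for a in count_a if in_range(a, ra)]
    p_a = sum(count_a[a] for a in candidates) / n  # first factor, computed exactly
    if p_a == 0:
        return 0.0
    weights = [count_a[a] for a in candidates]
    total = 0.0
    for _ in range(num_samples):
        a = rng.choices(candidates, weights=weights)[0]  # sample a ~ P(A | A in ra)
        bs = b_given_a[a]
        total += p_a * sum(1 for b in bs if in_range(b, rb)) / len(bs)
    return total / num_samples


if __name__ == "__main__":
    # Column B depends on column A: B is A, A + 1, or A + 2.
    rows = [(i % 10, i % 10 + (i // 10) % 3) for i in range(300)]
    print(exact_selectivity(rows, (0, 9), (0, 4)))
    print([round(progressive_sampling_selectivity(rows, (0, 9), (0, 4), 1, random.Random(s)), 3)
           for s in range(5)])
```

The estimator is unbiased. With one sample per query, however, its output jumps between 0.0, 0.333, 0.667 and 1.0 while the true selectivity is 0.4. Removing this kind of variance is what a sampling-free estimator aims for.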

Translated: 2023-07-27 14:53:59 Published: 2023-07-26

# On the Learning Dynamics of Attention Networks

On the Learning Dynamics of Attention Networks ( http://arxiv.org/abs/2307.13421v2 )

License: see link

Rahul Vashisht and Harish G. Ramaswamy

Attention models are typically learned by optimizing one of three standard loss functions, variously called soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models: a `focus' model that `selects' the right \textit{segment} of the input and a `classification' model that processes the selected segment into the target label. However, they differ significantly in the way the selected segments are aggregated, resulting in distinct dynamics and final results. We observe a unique signature of models learned using these paradigms and explain this as a consequence of the evolution of the classification model under gradient descent when the focus model is fixed. We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow. With the soft attention loss, the focus model improves quickly at initialization and splutters later on. Hard attention loss, on the other hand, behaves in the opposite fashion. Based on our observations, we propose a simple hybrid approach that combines the advantages of the different loss functions and demonstrate it on a collection of semi-synthetic and real-world datasets.
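
The three losses differ mainly in where the focus weights enter. Write $\alpha_i$ for the focus model's softmax weight on segment $i$ and $p_i(y)$ for the classifier's probability of the target label given segment $i$ alone. One common way to write the three losses is:

- Soft attention classifies the $\alpha$-weighted average of the segments.
- Hard attention takes the expected loss $\sum_i \alpha_i(-\log p_i(y))$.
- LVML uses $-\log \sum_i \alpha_i p_i(y)$.

These formulations, the linear classifier and the toy numbers are assumptions for illustration; the paper's exact definitions may differ.

```python
import math
from typing import Dict, List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def attention_losses(focus_scores: Sequence[float], segments: Sequence[Sequence[float]],
                     W: Sequence[Sequence[float]], y: int) -> Dict[str, float]:
    alpha = softmax(focus_scores)                     # focus model's weights
    p = [softmax(matvec(W, x))[y] for x in segments]  # per-segment label probability
    pooled = [sum(a * x[d] for a, x in zip(alpha, segments)) for d in range(len(segments[0]))]
    return {
        "soft": -math.log(softmax(matvec(W, pooled))[y]),          # classify the weighted average
        "hard": sum(a * -math.log(pi) for a, pi in zip(alpha, p)),  # expected per-segment loss
        "lvml": -math.log(sum(a * pi for a, pi in zip(alpha, p))),  # marginal likelihood
    }


if __name__ == "__main__":
    print(attention_losses([2.0, 0.0, -1.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                           [[2.0, -1.0], [-1.0, 2.0]], y=0))
```

By Jensen's inequality (convexity of $-\log$), the LVML value is never larger than the hard-attention value for the same weights.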

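Translated: 2023-07-27 14:53:36 Published: 2023-07-26

# Dynamic Grouping for Climate Change Negotiation: Facilitating Cooperation and Balancing Interests through Effective Strategies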

Dynamic Grouping for Climate Change Negotiation: Facilitating Cooperation and Balancing Interests through Effective Strategies ( http://arxiv.org/abs/2307.13886v1 )

License: see link

Duo Zhang, Yuren Pang, Yu Qin

The current framework for climate change negotiation models presents several limitations that warrant further research and development. In this track, we mainly discuss two key areas for improvement: geographical impacts and the utility framework. Regarding geographical impacts, we explore five critical aspects: (1) the shift from local to global impact, (2) variability in climate change effects across regions, (3) heterogeneity in geographical location and political structures, (4) collaborations between adjacent nations, and (5) the importance of including historical and cultural factors that influence climate negotiations. Furthermore, we emphasize the need to refine the utility and reward framework to reduce homogeneity and the overestimation of climate mitigation. This can be done by integrating the positive effects of saving rates into the reward function and accounting for heterogeneity among all regions. By addressing these limitations, we hope to enhance the accuracy and effectiveness of climate change negotiation models, enabling policymakers and stakeholders to devise targeted and appropriate strategies to tackle climate change at both regional and global levels.

Translated: 2023-07-27 14:08:31 Published: 2023-07-26

# Efficient Estimation of the Local Robustness of Machine Learning Models

Efficient Estimation of the Local Robustness of Machine Learning Models ( http://arxiv.org/abs/2307.13885v1 )

License: see link

Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

Machine learning models often need to be robust to noisy input data. The effect of real-world noise (which is often random) on model predictions is captured by a model's local robustness, i.e., the consistency of model predictions in a local region around an input. However, the naïve approach to computing local robustness based on Monte-Carlo sampling is statistically inefficient, leading to prohibitive computational costs for large-scale applications. In this work, we develop the first analytical estimators to efficiently compute local robustness of multi-class discriminative models using local linear function approximation and the multivariate Normal CDF. Through the derivation of these estimators, we show how local robustness is connected to concepts such as randomized smoothing and softmax probability. We also confirm empirically that these estimators accurately and efficiently compute the local robustness of standard deep learning models. In addition, we demonstrate these estimators' usefulness for various tasks involving local robustness, such as measuring robustness bias and identifying examples that are vulnerable to noise perturbation in a dataset. By developing these analytical estimators, this work not only advances conceptual understanding of local robustness, but also makes its computation practical, enabling the use of local robustness in critical downstream applications.
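
For intuition about why an analytical estimator can replace Monte Carlo sampling, consider the simplest case. Take a binary linear classifier $f(x) = w^\top x + b$ under isotropic Gaussian noise $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$. Then $f(x+\varepsilon) = f(x) + w^\top\varepsilon$ with $w^\top\varepsilon \sim \mathcal{N}(0, \sigma^2\lVert w\rVert^2)$. The probability that the prediction is unchanged is therefore $\Phi\big(|f(x)| / (\sigma\lVert w\rVert)\big)$. The sketch compares that closed form with sampling. It covers only this binary, exactly linear case. The paper's multi-class estimators (local linear approximation plus the multivariate normal CDF) are not reproduced, and the numbers are made up.

```python
import math
import random
from typing import Sequence


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def analytic_robustness(w: Sequence[float], b: float, x: Sequence[float], sigma: float) -> float:
    """P[sign(f(x + eps)) == sign(f(x))] for f(x) = w.x + b and eps ~ N(0, sigma^2 I)."""
    margin = dot(w, x) + b
    return normal_cdf(abs(margin) / (sigma * math.sqrt(dot(w, w))))


def monte_carlo_robustness(w: Sequence[float], b: float, x: Sequence[float], sigma: float,
                           num_samples: int, rng: random.Random) -> float:
    """The same quantity estimated by sampling noisy copies of x."""
    base = dot(w, x) + b >= 0
    same = 0
    for _ in range(num_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        same += (dot(w, noisy) + b >= 0) == base
    return same / num_samples


if __name__ == "__main__":
    w, b, x = [3.0, 4.0], 0.0, [0.2, 0.1]
    print(analytic_robustness(w, b, x, 0.2), monte_carlo_robustness(w, b, x, 0.2, 20000, random.Random(0)))
```

Here the margin is 1 and $\sigma\lVert w\rVert = 1$, so the closed form gives $\Phi(1) \approx 0.841$ in one evaluation. The sampler needs thousands of forward passes to get close to that value.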

Translated: 2023-07-27 14:08:10 Published: 2023-07-26

# ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis ( http://arxiv.org/abs/2307.13883v1 )

License: see link

Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, we can measure whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve more complex tasks. In this paper, we characterize several different forms of compositional generalization that are desirable in program synthesis, forming a meta-benchmark which we use to create generalization tasks for two popular datasets, RobustFill and DeepCoder. We then propose ExeDec, a novel decomposition-based synthesis strategy that predicts execution subgoals to solve problems step-by-step informed by program execution at each step. ExeDec has better synthesis performance and greatly improved compositional generalization ability compared to baselines.
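
The core loop of execution-guided decomposition can be illustrated with a toy string DSL. At each step a subgoal is given (the desired intermediate value for every example). A small search finds one operation that reaches the subgoal, and that operation is executed to produce the next state. In ExeDec the subgoals are predicted by a learned model. Here they are supplied by hand, and the operation set and function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical single-step operations of a toy string DSL.
OPS: Dict[str, Callable[[str], str]] = {
    "lower": str.lower,
    "upper": str.upper,
    "strip": str.strip,
    "first_word": lambda s: s.split()[0] if s.split() else "",
    "reverse": lambda s: s[::-1],
}


def synthesize_step(states: Sequence[str], subgoal: Sequence[str]) -> Optional[str]:
    """Find one operation that maps every current state to its subgoal value."""
    for name, fn in OPS.items():
        if all(fn(s) == g for s, g in zip(states, subgoal)):
            return name
    return None


def execute(program: Sequence[str], inputs: Sequence[str]) -> List[str]:
    states = list(inputs)
    for op in program:
        states = [OPS[op](s) for s in states]
    return states


def decompose_and_synthesize(inputs: Sequence[str],
                             subgoals: Sequence[Sequence[str]]) -> Optional[List[str]]:
    """Solve one subgoal at a time, executing each found step before the next search."""
    states = list(inputs)
    program: List[str] = []
    for subgoal in subgoals:
        op = synthesize_step(states, subgoal)
        if op is None:
            return None
        program.append(op)
        states = execute([op], states)  # the executed result informs the next step
    return program


if __name__ == "__main__":
    inputs = ["  Hello World ", " foo bar"]
    subgoals = [["Hello World", "foo bar"], ["Hello", "foo"], ["HELLO", "FOO"]]
    print(decompose_and_synthesize(inputs, subgoals))
```

Each search only has to solve a one-step problem. This is the intuition behind decomposition helping models generalize to longer compositions than they were trained on.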

Translated: 2023-07-27 14:07:46 Published: 2023-07-26

# Human Culture: A History Irrelevant and Predictable Experience

Human Culture: A History Irrelevant and Predictable Experience ( http://arxiv.org/abs/2307.13882v1 )

License: see link

Hao Wang

Human culture research has witnessed an opportunity for revolution thanks to the big data and social network revolution. Websites such as Douban.com, Goodreads.com, Pandora and IMDB have become new gold mines for cultural researchers. In 2021 and 2022, the author of this paper invented two data-free recommender systems for the AI cold-start problem. The algorithms can recommend cultural and commercial products to users without reference to users' past preferences. The social implication of these inventions is that human cultural tastes can be predicted very precisely without any information related to individual humans. In this paper, we analyze these AI technologies and their cultural implications together with other AI algorithms. We show that human culture is (mostly) a history-irrelevant and predictable experience.

翻訳日:2023-07-27 14:07:29 公開日:2023-07-26

# 良い格子による学習: 数論で加速される物理情報ニューラルネットワーク

Good Lattice Training: Physics-Informed Neural Networks Accelerated by Number Theory ( http://arxiv.org/abs/2307.13869v1 )

ライセンス: Link先を確認

Takashi Matsubara, Takaharu Yaguchi

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、偏微分方程式(PDE)を解くための、新しく効率的なアプローチを提供する。彼らの成功は、与えられたPDEを特定の点で満たし、解を近似するためにニューラルネットワークを訓練する物理インフォームド損失にある。しかし、PDEの解は本質的に無限次元であり、出力と解の間の距離は領域上の積分によって定義される。したがって、物理情報損失は有限近似しか得られず、離散化誤差を抑制するためには適切なコロケーション点を選択することが重要である。本稿では,数値解析の数値論的手法に触発されて,優れた格子学習(GLT)と呼ばれる新しい手法を提案する。 GLT は、少数の点や多次元空間に対しても有効であるコロケーション点の集合を提供する。実験の結果,GLTでは一様ランダムサンプリングやラテンハイパーキューブサンプリングよりも2～20倍のコロケーションポイント(計算コストの削減)が必要であり,競争性能が向上した。

Physics-informed neural networks (PINNs) offer a novel and efficient approach to solving partial differential equations (PDEs). Their success lies in the physics-informed loss, which trains a neural network to satisfy a given PDE at specific points and to approximate the solution. However, the solutions to PDEs are inherently infinite-dimensional, and the distance between the output and the solution is defined by an integral over the domain. Therefore, the physics-informed loss only provides a finite approximation, and selecting appropriate collocation points becomes crucial to suppress the discretization errors, although this aspect has often been overlooked. In this paper, we propose a new technique called good lattice training (GLT) for PINNs, inspired by number theoretic methods for numerical analysis. GLT offers a set of collocation points that are effective even with a small number of points and for multi-dimensional spaces. Our experiments demonstrate that GLT requires 2--20 times fewer collocation points (resulting in lower computational cost) than uniformly random sampling or Latin hypercube sampling, while achieving competitive performance.
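The sketch below shows roughly what a lattice-based collocation set looks like. It builds a classical rank-1 lattice, the Korobov/Fibonacci construction from quasi-Monte Carlo integration. This is an illustrative assumption. The abstract does not give GLT's generating vector, so the function names and the choice `n = 13, z = (1, 8)` (a 2-D Fibonacci lattice) are hypothetical and not taken from the paper.

```python
def rank1_lattice(n, z):
    """Return the n points {j * z / n mod 1 : j = 0..n-1} in the unit cube.

    n: number of collocation points; z: integer generating vector.
    """
    return [tuple(((j * zk) % n) / n for zk in z) for j in range(n)]


def scale_to_box(points, lower, upper):
    """Map points from [0, 1)^d into the box [lower, upper) used as the PDE domain."""
    return [
        tuple(lo + (hi - lo) * p for p, lo, hi in zip(pt, lower, upper))
        for pt in points
    ]


# 2-D Fibonacci lattice: n = F_7 = 13 and z = (1, F_6) = (1, 8).
points = rank1_lattice(13, (1, 8))
# Hypothetical space-time domain x in [-1, 1), t in [0, 1).
collocation = scale_to_box(points, lower=(-1.0, 0.0), upper=(1.0, 1.0))
```

Because gcd(8, 13) = 1, each coordinate takes every value j/13 exactly once. The points therefore project evenly onto each axis, which uniform random sampling does not guarantee.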

翻訳日:2023-07-27 14:07:21 公開日:2023-07-26

# 高次元観察研究からの変動要因の学習

Learning sources of variability from high-dimensional observational studies ( http://arxiv.org/abs/2307.13868v1 )

ライセンス: Link先を確認

Eric W. Bridgeford, Jaewon Chung, Brian Gilbert, Sambit Panda, Adam Li, Cencheng Shen, Alexandra Badea, Brian Caffo, Joshua T. Vogelstein

(参考訳) 因果推論は、変数の存在が観測結果に影響を及ぼすかどうかを研究する。平均治療効果」などの量によって測定されるように、このパラダイムはワクチンや薬物開発から政策介入に至るまで、多くの生物学的分野にまたがる。残念なことに、これらの手法の大部分は、しばしば単変量の結果に制限される。我々の研究は、任意の次元または可測空間を持つ結果に対する因果推定を一般化し、因果差検定として名目変数に対する従来の因果推定を定式化する。本稿では,一貫した条件付き独立性テストの簡易な調整手法を提案し,これらのテストが一貫した因果不一致性テストであることを証明した。数値実験により,提案手法であるcausal cdcorrは,既存の手法と比較して有限サンプルの妥当性とパワーが向上することを示す。私たちのメソッドはすべてオープンソースで、github.com/ebridge2/cdcorrで利用可能です。

Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the "average treatment effect," this paradigm is employed across numerous biological fields, from vaccine and drug development to policy interventions. Unfortunately, most of these methods are limited to univariate outcomes. Our work generalizes causal estimands to outcomes with any number of dimensions or any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests. We propose a simple technique for adjusting universally consistent conditional independence tests and prove that the adjusted tests are universally consistent causal discrepancy tests. Numerical experiments illustrate that our method, Causal CDcorr, improves both finite-sample validity and power compared to existing strategies. Our methods are all open source and available at github.com/ebridge2/cdcorr.
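Causal CDcorr builds on distance-based dependence testing. The sketch below covers only that building block. It computes the plain sample distance correlation (unconditional, biased V-statistic form) between two paired sets of multivariate observations. It omits the covariate conditioning and the permutation test that a causal discrepancy test needs. The function names are illustrative and are not the API of the `cdcorr` repository.

```python
import math


def _double_centered(points):
    """Pairwise Euclidean distances with row, column, and grand means removed."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in dist]  # equals column means (symmetric)
    grand = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
        for i in range(n)
    ]


def _dcov2(a, b):
    """Squared sample distance covariance from two double-centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation between paired observations x[i] and y[i]."""
    a, b = _double_centered(x), _double_centered(y)
    denom = math.sqrt(_dcov2(a, a) * _dcov2(b, b))
    if denom == 0:
        return 0.0  # one sample is constant, so there is no dependence to measure
    return math.sqrt(max(_dcov2(a, b), 0.0) / denom)


x = [(1.0,), (2.0,), (3.0,), (4.0,)]
y = [(2.0,), (4.0,), (6.0,), (8.0,)]  # y = 2x, a perfectly dependent pair
print(distance_correlation(x, y))
```

The statistic works on pairwise distances only, so each observation may be a vector of any length. This is what lets distance-based tests handle outcomes beyond the univariate case.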

翻訳日:2023-07-27 14:07:00 公開日:2023-07-26

# 点対3D: スパース点と形状制御可能なテキスト対3D生成のギャップを埋める

Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation ( http://arxiv.org/abs/2307.13908v1 )

ライセンス: Link先を確認

Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang

(参考訳) 数十億もの画像テキストペアでトレーニングされた2d拡散モデルによって、テキストから3dへの生成が注目されている。既存の方法は、主に2D拡散の先行を利用して3Dモデル、例えばNeRFの生成を監督するためにスコア蒸留に依存している。しかし、スコア蒸留は視界の不整合に悩まされがちであり、暗黙のNeRFモデリングもまた任意の形状につながり、現実的で制御不能な3D生成につながる。本研究では,3次元拡散モデルと2次元拡散モデルの両方から知識を抽出することにより,スパースで自由な3次元点と現実的な形状制御可能な3次元点とのギャップを埋めることのできるポイントツー3Dの柔軟な枠組みを提案する。 Points-to-3Dの基本的な考え方は、テキストから3D生成を導くために制御可能なスパース3Dポイントを導入することである。具体的には、3次元拡散モデルであるPoint-Eから生成されたスパース点雲を1つの参照画像に条件付き幾何学的先行として用いる。スパース3D点をよりよく活用するために,このスパース3D点の形状に合わせて,NeRFの形状を適応的に駆動する効率的な点雲誘導損失を提案する。幾何制御に加えて,より視界に一貫性のある外観に最適化することを提案する。具体的には,公開された2次元画像拡散モデル制御ネットにスコア蒸留を行い,テキストを条件とし,学習したコンパクト幾何の奥行きマップを作成する。定性的かつ定量的な比較は、Points-to-3Dがビューの一貫性を改善し、テキストから3D生成のための良好な形状制御を実現することを示す。 Points-to-3Dは、テキストから3D生成を改善し制御する新しい方法を提供する。

Text-to-3D generation has recently garnered significant attention, fueled by 2D diffusion models trained on billions of image-text pairs. Existing methods primarily rely on score distillation to leverage 2D diffusion priors to supervise the generation of 3D models, e.g., NeRF. However, score distillation is prone to the view inconsistency problem, and implicit NeRF modeling can also lead to arbitrary shapes, resulting in less realistic and uncontrollable 3D generation. In this work, we propose Points-to-3D, a flexible framework that bridges the gap between sparse yet freely available 3D points and realistic, shape-controllable 3D generation by distilling knowledge from both 2D and 3D diffusion models. The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide text-to-3D generation. Specifically, we use the sparse point cloud generated by the 3D diffusion model Point-E, conditioned on a single reference image, as the geometric prior. To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss that adaptively drives the NeRF's geometry to align with the shape of the sparse 3D points. In addition to controlling the geometry, we propose to optimize the NeRF for a more view-consistent appearance. Specifically, we perform score distillation with the publicly available 2D image diffusion model ControlNet, conditioned on text as well as on the depth map of the learned compact geometry. Qualitative and quantitative comparisons demonstrate that Points-to-3D improves view consistency and achieves good shape controllability for text-to-3D generation. Points-to-3D provides users with a new way to improve and control text-to-3D generation.

翻訳日:2023-07-27 13:58:42 公開日:2023-07-26

# 可変長時系列入力に対するスター集合ベースの到達可能性解析を用いた深層ニューラルネットワークのロバスト性検証

Robustness Verification of Deep Neural Networks using Star-Based Reachability Analysis with Variable-Length Time Series Input ( http://arxiv.org/abs/2307.13907v1 )

ライセンス: Link先を確認

Neelanjana Pal, Diego Manzanas Lopez, and Taylor T Johnson

(参考訳) データ駆動型ニューラルネットワーク(nn)ベースの異常検出と予測メンテナンスは、新たな研究領域である。 NNベースの時系列データの分析は、過去の行動に関する貴重な洞察と、機器の有用な寿命(RUL)やバッテリーの充電状態(SOC)といった重要なパラメータの推定を提供する。しかし、入力時系列データは、センサーを通過する際に意図的または意図しないノイズにさらされ、堅牢な検証とこれらのNNの検証が必要である。本稿では, 時系列回帰NN(TSRegNN)に対して, 集合に基づく形式的手法を用いたロバスト性検証手法を提案する。可変長入力データを利用して入力操作を効率化し、ネットワークアーキテクチャの一般化性を高める。本手法は,(1)リチウムイオン電池のsoc推定と(2)タービンエンジンのrul推定という,予後管理および健康管理(phm)適用領域の2つのデータセットに適用する。 nnsのロバスト性は、星ベースの到達可能性分析を用いてチェックされ、いくつかのパフォーマンス指標は、入力における有界摂動がネットワーク出力、すなわち将来の結果に与える影響を評価する。全体として本論文は,実世界における時系列データのnnベース分析の検証と検証のための包括的ケーススタディを提供し,特にノイズが将来の結果に与える影響を考慮して,正確で信頼性の高い予測に対するロバストネステストの重要性を強調する。

Data-driven, neural network (NN) based anomaly detection and predictive maintenance are emerging research areas. NN-based analytics of time-series data offer valuable insights into past behaviors and estimates of critical parameters like remaining useful life (RUL) of equipment and state-of-charge (SOC) of batteries. However, input time series data can be exposed to intentional or unintentional noise when passing through sensors, necessitating robust validation and verification of these NNs. This paper presents a case study of the robustness verification approach for time series regression NNs (TSRegNN) using set-based formal methods. It focuses on utilizing variable-length input data to streamline input manipulation and enhance network architecture generalizability. The method is applied to two data sets in the Prognostics and Health Management (PHM) application areas: (1) SOC estimation of a Lithium-ion battery and (2) RUL estimation of a turbine engine. The NNs' robustness is checked using star-based reachability analysis, and several performance measures evaluate the effect of bounded perturbations in the input on network outputs, i.e., future outcomes. Overall, the paper offers a comprehensive case study for validating and verifying NN-based analytics of time-series data in real-world applications, emphasizing the importance of robustness testing for accurate and reliable predictions, especially considering the impact of noise on future outcomes.
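Star-based reachability represents a set of inputs as a star set $\Theta = \{c + \sum_i \alpha_i v_i \mid C\alpha \le d\}$. Here $c$ is the center, the $v_i$ are generators, and $C\alpha \le d$ is a linear predicate. An affine layer $x \mapsto Wx + b$ maps a star exactly to another star: only the center and generators change, and the predicate is left untouched.

The minimal plain-Python sketch below shows only that affine step. It omits ReLU layers, which require splitting stars and solving linear programs. The class and method names are illustrative and do not come from any verification tool.

```python
def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class Star:
    """Set {center + sum_i alpha_i * basis[i] : C @ alpha <= d}."""

    def __init__(self, center, basis, C, d):
        self.center = center  # c
        self.basis = basis    # generator vectors v_i
        self.C, self.d = C, d  # linear predicate on alpha

    def affine_map(self, W, b):
        """Exact image under x -> W x + b; the predicate is unchanged."""
        center = [ci + bi for ci, bi in zip(matvec(W, self.center), b)]
        basis = [matvec(W, v) for v in self.basis]
        return Star(center, basis, self.C, self.d)

    def point(self, alpha):
        """Concrete point for coefficients alpha (assumed to satisfy C alpha <= d)."""
        p = list(self.center)
        for a, v in zip(alpha, self.basis):
            p = [pi + a * vi for pi, vi in zip(p, v)]
        return p


# Box around (1, 0) with half-width 0.5: |alpha_1| <= 1, |alpha_2| <= 1.
box = Star([1.0, 0.0], [[0.5, 0.0], [0.0, 0.5]],
           C=[[1, 0], [-1, 0], [0, 1], [0, -1]], d=[1, 1, 1, 1])
W, b = [[1.0, 2.0], [0.0, 1.0]], [0.0, 1.0]
image = box.affine_map(W, b)
```

Because the predicate is shared, a verifier can bound each output coordinate by optimizing over the same $\alpha$ polytope after mapping.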

翻訳日:2023-07-27 13:58:08 公開日:2023-07-26

# 破損に頑健なリプシッツ文脈探索

Corruption-Robust Lipschitz Contextual Search ( http://arxiv.org/abs/2307.13903v1 )

ライセンス: Link先を確認

Shiliang Zuo

(参考訳) リプシッツ関数を劣化したバイナリ信号で学習する問題について研究する。学習者は、敵が選択するリプシッツ関数をf$で学習しようとする。各ラウンドにおいて、敵は入力空間でコンテキストベクトル $x_t$ を選択し、学習者は真の関数値 $f(x_t)$ を推測し、その推測が高いか低いかを示すバイナリ信号を受信する。合計$C$ラウンドでは、信号は破損する可能性があるが、学習者には$C$の値が不明である。学習者の目標は、小さな累積損失を負うことである。汚損防止アルゴリズムを設計するのに有用な,自然かつ強力なテクニックの正当性チェックを提示する。 i は、(リプシッツパラメータ $l$ を定数として扱う)アルゴリズムを設計する: 対称損失に対して、学習者は、$d = 1$ で、$o_d(c\log t + t^{(d-1)/d})$ で、$d > 1$ で、学習者は$\widetilde{o} (t^{d/(d+1)} + c\cdot t^{1/(d+1)})$ で後悔する。

I study the problem of learning a Lipschitz function from corrupted binary signals. The learner tries to learn a Lipschitz function $f$ chosen by the adversary. In each round, the adversary selects a context vector $x_t$ in the input space, and the learner guesses the true function value $f(x_t)$ and receives a binary signal indicating whether the guess was too high or too low. In a total of $C$ rounds, the signal may be corrupted, though the value of $C$ is unknown to the learner. The learner's goal is to incur a small cumulative loss. I present a natural yet powerful technique, sanity check, which proves useful in designing corruption-robust algorithms. I design algorithms with the following guarantees (treating the Lipschitz parameter $L$ as a constant): for the symmetric loss, the learner achieves regret $O(C\log T)$ with $d = 1$ and $O_d(C\log T + T^{(d-1)/d})$ with $d > 1$; for the pricing loss, the learner achieves regret $\widetilde{O} (T^{d/(d+1)} + C\cdot T^{1/(d+1)})$.
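The abstract names two losses without defining them. The sketch below writes them out in the form that, as far as I know, is standard in the contextual search literature, so treat the definitions as an assumption. In the pricing reading, the guess is a posted price and $f(x_t)$ is the buyer's value, so a sale happens only when the price does not exceed the value. The `corrupt` flag in `signal` is a hypothetical stand-in for the adversary flipping the feedback in one of the $C$ corrupted rounds.

```python
def symmetric_loss(guess, value):
    """|guess - f(x_t)|: overshooting and undershooting cost the same."""
    return abs(guess - value)


def pricing_loss(guess, value):
    """Revenue lost relative to posting exactly f(x_t)."""
    if guess <= value:
        return value - guess  # sale happens, but at too low a price
    return value              # no sale: the whole value is lost


def signal(guess, value, corrupt=False):
    """True if the guess was too high; the adversary may flip it when corrupt."""
    too_high = guess > value
    return (not too_high) if corrupt else too_high


print(pricing_loss(0.25, 0.75), pricing_loss(0.75, 0.25))
```

One intuition for why the pricing loss is harder concerns overpricing. Under the pricing loss, overpricing by even a tiny amount forfeits the entire value. Under the symmetric loss, it costs only that tiny amount.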

翻訳日:2023-07-27 13:57:45 公開日:2023-07-26

# YOLOBench: 組み込みシステム上での効率的なオブジェクト検出器のベンチマーク

YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems ( http://arxiv.org/abs/2307.13901v1 )

ライセンス: Link先を確認

Ivan Lazarevich and Matteo Grimaldi and Ravish Kumar and Saptarshi Mitra and Shahrukh Khan and Sudhakar Sah

(参考訳) これは4つの異なるデータセットと4つの組み込みハードウェアプラットフォーム(x86 cpu, arm cpu, nvidia gpu, npu)上の550以上のyoloベースのオブジェクト検出モデルで構成されるベンチマークである。異なるモデルスケールで様々なヨーロベースの1段検出器の精度と待ち時間数を、固定されたトレーニング環境(コードとトレーニングハイパーパラメータ)との公正な比較により収集する。収集したデータのパレート最適分析により、現代の検出ヘッドとトレーニング技術が学習プロセスに組み込まれている場合、YOLOシリーズの複数のアーキテクチャは、YOLOv3やYOLOv4といった古いモデルを含む、良好な精度とレイテンシのトレードオフを実現することが明らかになった。また、yolobenchのニューラルアーキテクチャ探索で使用されるトレーニングフリー精度推定器を評価し、最先端のゼロコスト精度推定器はmacカウントのような単純なベースラインよりも優れているが、そのいくつかはパレート最適検出モデルの予測に効果的に使用できることを示した。 Raspberry Pi 4 CPU上での最先端のYOLOv8モデルと競合するYOLOアーキテクチャを,ゼロコストプロキシを用いて識別できることを示します。コードとデータはhttps://github.com/deeplite/deeplite-torch-zooで入手できる。

We present YOLOBench, a benchmark comprising 550+ YOLO-based object detection models on 4 different datasets and 4 different embedded hardware platforms (x86 CPU, ARM CPU, Nvidia GPU, NPU). We collect accuracy and latency numbers for a variety of YOLO-based one-stage detectors at different model scales by performing a fair, controlled comparison of these detectors with a fixed training environment (code and training hyperparameters). Pareto-optimality analysis of the collected data reveals that, if modern detection heads and training techniques are incorporated into the learning process, multiple architectures of the YOLO series achieve a good accuracy-latency trade-off, including older models like YOLOv3 and YOLOv4. We also evaluate training-free accuracy estimators used in neural architecture search on YOLOBench and demonstrate that, while most state-of-the-art zero-cost accuracy estimators are outperformed by a simple baseline like MAC count, some of them can be effectively used to predict Pareto-optimal detection models. We showcase this by using a zero-cost proxy to identify a YOLO architecture that is competitive against a state-of-the-art YOLOv8 model on a Raspberry Pi 4 CPU. The code and data are available at https://github.com/Deeplite/deeplite-torch-zoo
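A model is Pareto-optimal here if no other model is at least as accurate and at least as fast while being strictly better on one of the two. Below is a minimal sketch of that filter. It assumes each model is summarized by a single accuracy score (higher is better) and a single latency (lower is better) on one hardware target. The model names and numbers are made up and are not YOLOBench results.

```python
def pareto_front(models):
    """Return names of models not dominated in (accuracy up, latency down).

    models: list of (name, accuracy, latency_ms) tuples.
    """
    front = []
    for name, acc, lat in models:
        dominated = any(
            a >= acc and l <= lat and (a > acc or l < lat)
            for _, a, l in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical measurements, for illustration only.
candidates = [
    ("model_a", 0.50, 10.0),
    ("model_b", 0.60, 20.0),
    ("model_c", 0.55, 25.0),  # slower and less accurate than model_b
    ("model_d", 0.40, 5.0),
]
print(pareto_front(candidates))  # ['model_a', 'model_b', 'model_d']
```

The quadratic scan is adequate for a few hundred models. Because the front depends on the latency column, it has to be recomputed for each hardware platform.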

翻訳日:2023-07-27 13:57:07 公開日:2023-07-26

# FinTree: 関係抽出のための金融データセット事前学習Transformerエンコーダ

FinTree: Financial Dataset Pretrain Transformer Encoder for Relation Extraction ( http://arxiv.org/abs/2307.13900v1 )

ライセンス: Link先を確認

Hyunjong Ok

(参考訳) 関係抽出のためのFinTree, Financial Dataset Pretrain Transformer Encoderを提案する。エンコーダ言語モデルを用いることで、ファイナンシャルデータセット上でFinTreeをさらに事前訓練し、金融ドメインタスクにモデルを適用する。 FinTreeは、Pattern Exploiting Training方法論にインスパイアされた、従来の[CLS]トークンの代わりにマスク付きトークンを予測する新しい構造で際立っている。この構造により、2つの与えられたエンティティ間のより正確な関係予測が可能になる。モデルは、興味のあるエンティティに関する文脈的および位置的な情報を提供するために、ユニークな入力パターンで訓練され、後処理ステップはエンティティタイプに合わせて正確な予測を保証する。本研究では,FinTreeが大規模金融関係抽出データセットREFinDより優れていることを示す。コードと事前訓練されたモデルはhttps://github.com/HJ-Ok/FinTree.comで入手できる。

We present FinTree, Financial Dataset Pretrain Transformer Encoder for Relation Extraction. Utilizing an encoder language model, we further pretrain FinTree on the financial dataset, adapting the model to financial domain tasks. FinTree stands out with its novel structure that predicts a masked token instead of the conventional [CLS] token, inspired by the Pattern Exploiting Training methodology. This structure allows for more accurate relation predictions between two given entities. The model is trained with a unique input pattern to provide contextual and positional information about the entities of interest, and a post-processing step ensures accurate predictions in line with the entity types. Our experiments demonstrate that FinTree achieves superior performance on REFinD, a large-scale financial relation extraction dataset. The code and pretrained models are available at https://github.com/HJ-Ok/FinTree.

翻訳日:2023-07-27 13:56:42 公開日:2023-07-26

# メタ学習生成モデルによるニューラルネットワークの正規化

Regularizing Neural Networks with Meta-Learning Generative Models ( http://arxiv.org/abs/2307.13899v1 )

ライセンス: Link先を確認

Shin'ya Yamaguchi, Daiki Chijiwa, Sekitoshi Kanai, Atsutoshi Kumagai, Hisashi Kashima

(参考訳) 本稿では,深層学習のための生成データ向上手法について検討する。生成データ拡張は、生成モデルによって生成された合成サンプルを、小さなデータセット設定で分類するための追加データセットとして活用する。生成データ拡張の重要な課題は、合成データが精度を低下させる非変換サンプルを含むことである。これは、合成サンプルが実際のデータのクラスカテゴリを完全に表現しておらず、一様サンプリングが必ずしもタスクに有用なサンプルを提供していないためである。本稿では,メタ生成正則化(Meta Generative regularization, MGR)と呼ばれる新しい生成データ拡張戦略を提案する。生成データ拡張の劣化を避けるため、mgrは損失関数(例えばクロスエントロピー)ではなく、特徴抽出器の正規化用語で合成サンプルを利用する。これらの合成サンプルはメタラーニングによる検証損失を最小限に抑えるために動的に決定される。我々は,MGRが生合成データ強化の性能劣化を回避し,ベースラインを向上できることを示した。 6つのデータセットに関する実験は、特にデータセットがベースラインよりも小さく安定的に優れている場合にmgrが有効であることを示した。

This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation leverages the synthetic samples produced by generative models as an additional dataset for classification with small dataset settings. A key challenge of generative data augmentation is that the synthetic data contain uninformative samples that degrade accuracy. This is because the synthetic samples do not perfectly represent class categories in real data and uniform sampling does not necessarily provide useful samples for tasks. In this paper, we present a novel strategy for generative data augmentation called meta generative regularization (MGR). To avoid the degradation of generative data augmentation, MGR utilizes synthetic samples in the regularization term for feature extractors instead of in the loss function, e.g., cross-entropy. These synthetic samples are dynamically determined to minimize the validation losses through meta-learning. We observed that MGR can avoid the performance degradation of naïve generative data augmentation and boost the baselines. Experiments on six datasets showed that MGR is effective, particularly when datasets are smaller, and that it stably outperforms baselines.

翻訳日:2023-07-27 13:56:25 公開日:2023-07-26

# avit: 小さな皮膚病変分割データセットに対する視覚トランスフォーマーの適用

AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets ( http://arxiv.org/abs/2307.13897v1 )

ライセンス: Link先を確認

Siyi Du, Nourhan Bayasi, Ghassan Harmarneh, Rafeef Garbi

(参考訳) 皮膚病変セグメンテーション(SLS)は皮膚病変解析において重要な役割を担っている。視覚トランスフォーマー(ViT)は、SLSにとって注目に値するソリューションであるが、固有のパラメータ重構造と誘導バイアスの欠如により、畳み込みニューラルネットワーク(CNN)と比較して、より多くのトレーニングデータを必要とする。この問題を軽減するため、現在のSLSデータセット上で、微調整済みのViTバックボーンにアプローチすることで、より大規模な自然画像から学んだ知識を活用して、必要な皮膚トレーニングデータの量を減らすことを目指している。しかし、大きなバックボーンの全てのパラメータの完全な微調整は計算コストが高く、メモリ集約的である。本稿では,任意のトレーニング済みViTをSLSタスクに転送することで,ViTのデータハンガーを緩和する,新しい効率的な戦略であるAViTを提案する。具体的には、プレトレーニングされた重みを更新せずにvitの特徴表現を変調する軽量モジュール(アダプタ)をトランスフォーマー層に統合する。さらに,細粒度情報とcnnのインダクティブバイアスを把握し,小さなデータセットのセグメンテーションタスクをガイドする入力画像からプロンプト埋め込みを作成するためのプロンプトジェネレータとして,浅いcnnを用いる。 4つの皮膚病変データセットを定量的に検討した結果,avitはsomaと競合するが,訓練可能なパラメータは有意に少ない。私たちのコードはhttps://github.com/siyi-wind/avitで利用可能です。

Skin lesion segmentation (SLS) plays an important role in skin lesion analysis. Vision transformers (ViTs) are considered an auspicious solution for SLS, but they require more training data compared to convolutional neural networks (CNNs) due to their inherent parameter-heavy structure and lack of some inductive biases. To alleviate this issue, current approaches fine-tune pre-trained ViT backbones on SLS datasets, aiming to leverage the knowledge learned from a larger set of natural images to lower the amount of skin training data needed. However, fully fine-tuning all parameters of large backbones is computationally expensive and memory intensive. In this paper, we propose AViT, a novel efficient strategy to mitigate ViTs' data-hunger by transferring any pre-trained ViTs to the SLS task. Specifically, we integrate lightweight modules (adapters) within the transformer layers, which modulate the feature representation of a ViT without updating its pre-trained weights. In addition, we employ a shallow CNN as a prompt generator to create a prompt embedding from the input image, which grasps fine-grained information and CNN's inductive biases to guide the segmentation task on small datasets. Our quantitative experiments on 4 skin lesion datasets demonstrate that AViT achieves competitive, and at times superior, performance to SOTA but with significantly fewer trainable parameters. Our code is available at https://github.com/siyi-wind/AViT.

翻訳日:2023-07-27 13:56:10 公開日:2023-07-26

# AI4GCC - チーム: Below Sea Level: 批判と改善

AI4GCC - Team: Below Sea Level: Critiques and Improvements ( http://arxiv.org/abs/2307.13894v1 )

ライセンス: Link先を確認

Bram Renting, Phillip Wozny, Robert Loftin, Claudia Wieners, Erman Acar

(参考訳) 本稿では、気候変動が経済に与える影響を評価するための統合評価モデル(IAM)であるRICE-Nの批判的分析を行う。我々は、アクションマスキングや無関係な行動を含むRICE-Nの重要課題を特定し、関税収入の活用や過剰生産の処罰などの改善を提案する。また、概してIAMの特徴、すなわち過度に楽観的な損傷関数と非現実的な評価コスト関数に重きを置いている。本研究は, 政策立案者へのインスピレーションとして, シミュレーションを改善するため, RICE-N フレームワークをさらに発展させる取り組みに寄与する。

We present a critical analysis of the simulation framework RICE-N, an integrated assessment model (IAM) for evaluating the impacts of climate change on the economy. We identify key issues with RICE-N, including action masking and irrelevant actions, and suggest improvements such as utilizing tariff revenue and penalizing overproduction. We also critically engage with features of IAMs in general, namely overly optimistic damage functions and unrealistic abatement cost functions. Our findings contribute to the ongoing efforts to further develop the RICE-N framework in an effort to improve the simulation, making it more useful as an inspiration for policymakers.

翻訳日:2023-07-27 13:55:44 公開日:2023-07-26

Yu Qin, Duo Zhang, Yuren Pang

(参考訳) 本稿では,実世界ビジネスと政治交渉プロトコルに基づく気候緩和のための動的グループ化交渉モデルを提案する。 AI4GCCコンペティションフレームワーク内では,グループ形成と更新,グループ内交渉,グループ間交渉という3段階のプロセスを開発する。本モデルは,グローバルな気候変動目標を達成するために,様々な利害関係者間の効率的かつ効果的な協力を促進する。グループ形成手法とグループ更新戦略を導入することで,多地域気候交渉における複雑さと不均衡を解消する。グループ内交渉は、すべてのメンバーが緩和活動に貢献することを保証する一方、グループ間交渉は、緩和と貯蓄率を設定するために提案評価フレームワークを使用する。我々は、気候変動対策に関する国際協力を促進するための有望なアプローチとして、RIS-Nフレームワーク内での交渉モデルを実証する。

In this paper, we propose a dynamic grouping negotiation model for climate mitigation based on real-world business and political negotiation protocols. Within the AI4GCC competition framework, we develop a three-stage process: group formation and updates, intra-group negotiation, and inter-group negotiation. Our model promotes efficient and effective cooperation between various stakeholders to achieve global climate change objectives. By implementing a group-forming method and group updating strategy, we address the complexities and imbalances in multi-region climate negotiations. Intra-group negotiations ensure that all members contribute to mitigation efforts, while inter-group negotiations use the proposal-evaluation framework to set mitigation and savings rates. We demonstrate our negotiation model within the RICE-N framework, illustrating a promising approach for facilitating international cooperation on climate change mitigation.

翻訳日:2023-07-27 13:55:32 公開日:2023-07-26

# AI4GCC - チーム: Below Sea Level: スコアと実世界との関連性

AI4GCC - Team: Below Sea Level: Score and Real World Relevance ( http://arxiv.org/abs/2307.13892v1 )

ライセンス: Link先を確認

Phillip Wozny, Bram Renting, Robert Loftin, Claudia Wieners, Erman Acar

(参考訳) ai for global climate cooperation (ai4gcc) コンペティションのトラック3への提案として,米-n気候経済シミュレーションにおける使用のための交渉プロトコルを提案する。本提案では, 炭素境界調整機構 (CBAM) と気候クラブ (CC) にインスパイアされた手法を用いて, 炭素漏れの課題を解決することを目的とする。シミュレーション結果と代表集中経路(RCP)と共有社会経済経路(SSP)を比較し,本手法の有効性を実証した。我々のプロトコルは、RCP 3.4/4.5 と SSP 2 に匹敵する温度上昇をもたらす。さらに、我が国の国際貿易機関(WTO)のコンプライアンス、行政及び政治的実現可能性、倫理的懸念について分析する。我々は,我々の提案が発展途上国を損なうリスクがあることを認識し,技術共有や富の再分配といった既存の不平等を悪化させないための具体的な是正措置を提案する。今後の研究は、米-n関税機構を改善し、前述の是正措置を可能にする措置を講じるべきである。

As our submission for track three of the AI for Global Climate Cooperation (AI4GCC) competition, we propose a negotiation protocol for use in the RICE-N climate-economic simulation. Our proposal seeks to address the challenges of carbon leakage through methods inspired by the Carbon Border Adjustment Mechanism (CBAM) and Climate Clubs (CC). We demonstrate the effectiveness of our approach by comparing simulated outcomes to representative concentration pathways (RCP) and shared socioeconomic pathways (SSP). Our protocol results in a temperature rise comparable to RCP 3.4/4.5 and SSP 2. Furthermore, we provide an analysis of our protocol's World Trade Organization compliance, administrative and political feasibility, and ethical concerns. We recognize that our proposal risks hurting the least developed countries, and we suggest specific corrective measures to avoid exacerbating existing inequalities, such as technology sharing and wealth redistribution. Future research should improve the RICE-N tariff mechanism and implement actions allowing for the aforementioned corrective measures.

翻訳日:2023-07-27 13:55:17 公開日:2023-07-26

# EasyNet:3Dインダストリアル異常検出のための簡易ネットワーク

EasyNet: An Easy Network for 3D Industrial Anomaly Detection ( http://arxiv.org/abs/2307.13925v1 )

ライセンス: Link先を確認

Ruitao Chen, Guoyang Xie, Jiaqi Liu, Jinbao Wang, Ziqi Luo, Jinfan Wang, Feng Zheng

(参考訳) 3D異常検出は、産業製造(IM)におけるコンピュータビジョンの新たな重要課題である。近年、多くの高度なアルゴリズムが公表されているが、そのほとんどはIMのニーズを満たすことができない。欠点はいくつかある。i) アルゴリズムが大規模な事前学習モデルに大きく依存するため、生産ラインへの展開が難しい。ii) メモリバンクの多用によりストレージのオーバーヘッドが大幅に増加する。iii) 推論速度がリアルタイムに達しない。これらの問題を克服するために、事前学習モデルやメモリバンクを用いない、簡単で展開しやすいネットワーク(EasyNetと呼ぶ)を提案する。第一に、異常領域のセグメンテーションマップを正確に再構成し、RGB画像と深度画像の相互作用を促すマルチスケール・マルチモダリティ特徴エンコーダ・デコーダを設計する。第二に、精密な異常マップを得るためにマルチモダリティ異常セグメンテーションネットワークを採用する。第三に、推論時の特徴融合のためのアテンションベースの情報エントロピー融合モジュールを提案し、リアルタイム展開に適したものにする。大規模な実験により、EasyNetは事前学習モデルやメモリバンクを使わずに92.6%の異常検出AUROCを達成することが示された。さらに、EasyNetは既存の手法よりも高速で、Tesla V100 GPU上で94.55 FPSという高いフレームレートを持つ。

3D anomaly detection is an emerging and vital computer vision task in industrial manufacturing (IM). Recently many advanced algorithms have been published, but most of them cannot meet the needs of IM. There are several disadvantages: i) difficult to deploy on production lines since their algorithms heavily rely on large pre-trained models; ii) hugely increase storage overhead due to overuse of memory banks; iii) the inference speed cannot be achieved in real-time. To overcome these issues, we propose an easy and deployment-friendly network (called EasyNet) without using pre-trained models and memory banks: firstly, we design a multi-scale multi-modality feature encoder-decoder to accurately reconstruct the segmentation maps of anomalous regions and encourage the interaction between RGB images and depth images; secondly, we adopt a multi-modality anomaly segmentation network to achieve a precise anomaly map; thirdly, we propose an attention-based information entropy fusion module for feature fusion during inference, making it suitable for real-time deployment. Extensive experiments show that EasyNet achieves an anomaly detection AUROC of 92.6% without using pre-trained models and memory banks. In addition, EasyNet is faster than existing methods, with a high frame rate of 94.55 FPS on a Tesla V100 GPU.
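The 92.6% figure is an AUROC. In a minimal sketch, AUROC is the probability that a randomly chosen anomalous sample gets a higher anomaly score than a randomly chosen normal one, with ties counted as one half (the Mann-Whitney formulation). The quadratic-time version below is meant for clarity only. The abstract does not say how the authors computed the metric, and the scores in the usage line are made up.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparisons; labels are 1 (anomalous) or 0 (normal)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores: one anomalous sample is ranked below a normal one.
print(auroc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))  # 0.75
```

AUROC depends only on how the scores rank samples, not on a decision threshold. This makes it a common summary metric for anomaly detectors whose raw scores are not calibrated.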

翻訳日:2023-07-27 13:49:36 公開日:2023-07-26

# trajdata: 複数の人軌道データセットに対する統一インターフェース

trajdata: A Unified Interface to Multiple Human Trajectory Datasets ( http://arxiv.org/abs/2307.13924v1 )

ライセンス: Link先を確認

Boris Ivanovic, Guanyu Song, Igor Gilitschenski, Marco Pavone

(参考訳) 軌道予測の分野は近年大きく成長しており、自動運転車(AV)のための大規模で現実的な人間の軌道データセットの公開や歩行者の動き追跡が原因となっている。このようなデータセットはコミュニティにとって朗報だが、それぞれ独自のデータフォーマットとAPIを使用しているため、研究者が複数のデータセットをまたいだメソッドのトレーニングと評価が難しい。これを改善するために、複数の人間の軌跡データセットに統一されたインターフェースであるtrajdataを提案する。 trajdataの中核は、トラジェクトリとマップデータのためのシンプルで均一で効率的な表現とAPIを提供する。そこで本研究では,既存の軌跡データセットの包括的実験的評価を行い,現在の歩行者とavモーション予測研究の基盤となるデータを理解し,これらの知見から将来のデータセットの提案を提示する。 trajdataは許容ライセンス(apache 2.0)であり、https://github.com/nvlabs/trajdataでアクセスすることができる。

The field of trajectory forecasting has grown significantly in recent years, partially owing to the release of numerous large-scale, real-world human trajectory datasets for autonomous vehicles (AVs) and pedestrian motion tracking. While such datasets have been a boon for the community, they each use custom and unique data formats and APIs, making it cumbersome for researchers to train and evaluate methods across multiple datasets. To remedy this, we present trajdata: a unified interface to multiple human trajectory datasets. At its core, trajdata provides a simple, uniform, and efficient representation and API for trajectory and map data. As a demonstration of its capabilities, in this work we conduct a comprehensive empirical evaluation of existing trajectory datasets, providing users with a rich understanding of the data underpinning much of current pedestrian and AV motion forecasting research, and proposing suggestions for future datasets from these insights. trajdata is permissively licensed (Apache 2.0) and can be accessed online at https://github.com/NVlabs/trajdata

翻訳日:2023-07-27 13:49:18 公開日:2023-07-26

# GrammarGPT: 教師ありファインチューニングによる中国語母語話者の文法誤り訂正のためのオープンソースLLMの探索

GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning ( http://arxiv.org/abs/2307.13923v1 )

ライセンス: Link先を確認

Yaxin Fan, Feng Jiang, Peifeng Li, and Haizhou Li

(参考訳) 文法的誤り訂正は、非文法的文章を自動的に修正することを目的としている。近年、文法的誤り訂正において、クローズドソースの大規模言語モデル(llm、例えばchatgpt)の優れた能力が実証されている。しかし、オープンソース LLM の可能性はまだ明らかにされていない。本稿では,オープンソースのLLMであるGrammarGPTを導入し,中国語の文法的誤り訂正の可能性について検討した。 GrammarGPTの核となるレシピは、ChatGPT生成と人間アノテーションのハイブリッドデータセットを活用することである。手がかり付き文法的誤りに対しては,ChatGPTを誘導して非文法的文を生成するヒューリスティック手法を提案する。手がかりのない文法的誤りに対しては,公開ウェブサイトから非文法的文章を収集し,手作業で修正した。さらに,中国語の文法的誤りを訂正するモデルの能力を高めるために,誤り不変拡張法を採用した。最終的に約1kの並列データを構築し,これらのデータを用いて,香港大学深セン校がリリースしたPhoenixなどのオープンソースのLCMを微調整した。実験の結果,GrammarGPTは既存のSOTAシステムよりも優れていた。モデルパラメータはSOTAベースラインより20倍大きいが、命令チューニングに必要なデータ量は1200倍小さく、ネイティブCGEC上でのオープンソースLCMの可能性を示している。我々のGrammarGPTは、NLPCC2023 SharedTask1に$3^{rd}をランク付けし、我々のアプローチの有効性を示している。コードとデータは \url{https://github.com/freedomintelligence/grammargpt} で入手できる。

Grammatical error correction aims to correct ungrammatical sentences automatically. Recently, some work has demonstrated the excellent capabilities of closed-source Large Language Models (LLMs, e.g., ChatGPT) in grammatical error correction. However, the potential of open-source LLMs remains unexplored. In this paper, we introduce GrammarGPT, an open-source LLM, to conduct a preliminary exploration of its potential for native Chinese grammatical error correction. The core recipe of GrammarGPT is to leverage a hybrid dataset of ChatGPT-generated and human-annotated data. For grammatical errors with clues, we propose a heuristic method that guides ChatGPT to generate ungrammatical sentences by providing those clues. For grammatical errors without clues, we collected ungrammatical sentences from publicly available websites and manually corrected them. In addition, we employ an error-invariant augmentation method to enhance the ability of the model to correct native Chinese grammatical errors. We ultimately constructed about 1k parallel samples and used these data to fine-tune open-source LLMs (e.g., Phoenix, released by The Chinese University of Hong Kong, Shenzhen) with instruction tuning. The experimental results show that GrammarGPT significantly outperforms the existing SOTA system. Although its model parameters are 20x larger than those of the SOTA baseline, the amount of data required for instruction tuning is 1200x smaller, illustrating the potential of open-source LLMs on native CGEC. Our GrammarGPT ranks $3^{rd}$ on NLPCC2023 SharedTask1, demonstrating our approach's effectiveness. The code and data are available at \url{https://github.com/FreedomIntelligence/GrammarGPT}.

翻訳日:2023-07-27 13:49:01 公開日:2023-07-26

# マルチエージェント学習の安定性:多くのプレイヤーによるネットワークゲームにおける収束性

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players ( http://arxiv.org/abs/2307.13922v1 )

ライセンス: Link先を確認

Aamal Hussain, Dan Leonte, Francesco Belardinelli and Georgios Piliouras

(参考訳) 多くのプレイヤーゲームにおけるマルチエージェント学習の振る舞いは、ネットワークゼロサムゲームのような制限的な例以外で複雑なダイナミクスを示すことが示されている。また,プレイヤー数の増加に伴い,収束行動は生じにくいことが示されている。この問題を解くために,q-learning dynamics について検討し,ネットワークゲームにおいてダイナミクスが一意な均衡に収束するのに十分な条件を決定する。この条件は、ペアワイズ相互作用の性質とネットワーク構造に依存するが、ゲーム内のエージェントの総数とは明確に独立している。この結果を代表的ネットワークゲームで評価し、適切なネットワーク条件下では、任意の数のエージェントで安定した学習ダイナミクスを実現できることを示す。

The behaviour of multi-agent learning in many-player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increases. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents.
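The sketch below makes "Q-Learning dynamics" concrete for the simplest case. It Euler-integrates the usual continuous-time (Boltzmann) Q-learning dynamics for two players with two actions each, in Matching Pennies, a zero-sum game. With two actions, the probability $x$ of playing Heads evolves as $\dot x = x(1-x)\,[(r_H - r_T) - T \ln\frac{x}{1-x}]$. Here $r_H - r_T$ is the payoff advantage of Heads and $T$ is the exploration temperature.

This two-player toy is an illustrative assumption. It does not reproduce the paper's network-game condition, and the step size and horizon are arbitrary.

```python
import math


def q_learning_dynamics(x, y, temperature=1.0, dt=0.01, steps=2000):
    """Euler-integrate smooth Q-learning dynamics for Matching Pennies.

    x: probability player 1 plays Heads; y: probability player 2 plays Heads.
    Player 1 gets +1 when the actions match and -1 otherwise; player 2 gets the negative.
    """
    for _ in range(steps):
        # Payoff advantage of Heads over Tails for each player.
        adv1 = 2.0 * (2.0 * y - 1.0)
        adv2 = -2.0 * (2.0 * x - 1.0)
        # The exploration (entropy) term pulls each strategy toward uniform play.
        dx = x * (1 - x) * (adv1 - temperature * math.log(x / (1 - x)))
        dy = y * (1 - y) * (adv2 - temperature * math.log(y / (1 - y)))
        x, y = x + dt * dx, y + dt * dy
    return x, y


print(q_learning_dynamics(0.9, 0.2))
```

With $T = 1$, the trajectory from (0.9, 0.2) spirals into the unique equilibrium (0.5, 0.5). With `temperature=0.0`, the damping term vanishes and the trajectory no longer settles at the equilibrium.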

翻訳日:2023-07-27 13:48:34 公開日:2023-07-26

# ホーキング放射のエントロピーのゆらぎ

Fluctuations in the Entropy of Hawking Radiation ( http://arxiv.org/abs/2307.13920v1 )

ライセンス: Link先を確認

Raphael Bousso, Masamichi Miyaji

We use the gravitational path integral (GPI) to compute the fluctuations of the Hawking radiation entropy around the Page curve, in a two-dimensional model introduced by Penington et al. Before the Page time, we find that $\delta S = e^{-S}/\sqrt{2}$, where $S$ is the black hole entropy. This result agrees with the Haar-averaged entropy fluctuations of a bipartite system, which we also compute at leading order. After the Page time, we find that $\delta S \sim e^{-S}$, up to a prefactor that depends logarithmically on the width of the microcanonical energy window. This is not symmetric under exchange of subsystem sizes and so does not agree with the Haar average for a subsystem of fixed Hilbert space dimension. The discrepancy can be attributed to the fact that the black hole Hilbert space dimension is not fixed by the state preparation: even in a microcanonical ensemble with a top-hat smearing function, the GPI yields an additive fluctuation in the number of black hole states. This result and the fact that the Page curve computed by the GPI is smooth both point towards an ensemble interpretation of the GPI.

Translated: 2023-07-27 13:48:21; published: 2023-07-26
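The Hawking-radiation abstract above (2307.13920) compares its result with Haar-averaged entropies of a bipartite system. The sketch below estimates that Haar average numerically for a qubit coupled to an n-dimensional partner and checks it against Page's exact mean-entropy formula, $\langle S_A\rangle = \sum_{k=n+1}^{mn} 1/k - (m-1)/(2n)$ for $m \le n$.

It does not reproduce the paper's GPI calculation or its $\delta S$ scaling. The sample standard deviation it prints is only an empirical spread for these small, hypothetical dimensions (m = 2, n = 4).

```python
import math
import random


def haar_state(dim, rng):
    """Haar-random pure state: i.i.d. complex Gaussian amplitudes, normalised."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]


def qubit_entropy(n, rng):
    """Entanglement entropy (nats) of the qubit in a random state on C^2 (x) C^n."""
    psi = haar_state(2 * n, rng)
    row0, row1 = psi[:n], psi[n:]
    # Reduced density matrix rho_A = [[a, b], [conj(b), d]] of the qubit.
    a = sum(abs(c) ** 2 for c in row0)
    d = sum(abs(c) ** 2 for c in row1)
    b = sum(p * q.conjugate() for p, q in zip(row0, row1))
    gap = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2)
    eigs = [(a + d + gap) / 2, (a + d - gap) / 2]
    return -sum(p * math.log(p) for p in eigs if p > 1e-15)


def page_mean_entropy(m, n):
    """Page's exact Haar average of S_A for dim A = m <= n = dim B."""
    return sum(1.0 / k for k in range(n + 1, m * n + 1)) - (m - 1) / (2 * n)


rng = random.Random(1)
n, samples = 4, 20000
values = [qubit_entropy(n, rng) for _ in range(samples)]
mean = sum(values) / samples
std = math.sqrt(sum((v - mean) ** 2 for v in values) / (samples - 1))
print(mean, page_mean_entropy(2, n), std)
```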

# Simulation-based Inference for Cardiovascular Models ( http://arxiv.org/abs/2307.13918v1 )

License: see the linked page

Antoine Wehenkel, Jens Behrmann, Andrew C. Miller, Guillermo Sapiro, Ozan Sener, Marco Cuturi, Jörn-Henrik Jacobsen

Over the past decades, hemodynamics simulators have steadily evolved and have become tools of choice for studying cardiovascular systems in-silico. While such tools are routinely used to simulate whole-body hemodynamics from physiological parameters, solving the corresponding inverse problem of mapping waveforms back to plausible physiological parameters remains both promising and challenging. Motivated by advances in simulation-based inference (SBI), we cast this inverse problem as statistical inference. In contrast to alternative approaches, SBI provides *posterior distributions* for the parameters of interest, yielding a *multi-dimensional* representation of uncertainty for *individual* measurements. We showcase this ability by performing an in-silico uncertainty analysis of five biomarkers of clinical interest, comparing several measurement modalities. Beyond the corroboration of known facts, such as the feasibility of estimating heart rate, our study highlights the potential of estimating new biomarkers from standard-of-care measurements. SBI reveals practically relevant findings that cannot be captured by standard sensitivity analyses, such as the existence of sub-populations for which parameter estimation exhibits distinct uncertainty regimes. Finally, we study the gap between in-vivo and in-silico with the MIMIC-III waveform database and critically discuss how cardiovascular simulations can inform real-world data analysis.

Translated: 2023-07-27 13:47:57; published: 2023-07-26
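To make "posterior distributions for individual measurements" concrete for the SBI paper above (2307.13918), here is the simplest simulation-based inference scheme, rejection ABC, applied to a toy simulator. This is a swapped-in illustration; the paper itself uses more advanced SBI machinery.

The simulator, uniform prior, summary statistic and tolerance `epsilon` are all hypothetical stand-ins, not a hemodynamics model. The accepted parameter draws form an approximate posterior whose spread quantifies uncertainty for this one observation.

```python
import random
import statistics


def simulator(theta, rng, n_obs=20, noise=1.0):
    """Toy stand-in for a hemodynamics simulator: noisy readings around theta."""
    return [theta + rng.gauss(0, noise) for _ in range(n_obs)]


def summary(xs):
    """Summary statistic compared between simulated and observed data."""
    return statistics.fmean(xs)


def rejection_abc(observed, prior_low, prior_high, n_sims, epsilon, rng):
    """Keep prior draws whose simulated summary lies within epsilon of the observed one."""
    target = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(prior_low, prior_high)
        if abs(summary(simulator(theta, rng)) - target) < epsilon:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulator(3.0, rng)
posterior = rejection_abc(observed, 0.0, 10.0, n_sims=20000, epsilon=0.1, rng=rng)
print(len(posterior), statistics.fmean(posterior), statistics.stdev(posterior))
```

Rejection ABC scales poorly with parameter dimension, which is one motivation for the neural SBI methods the abstract alludes to.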

# BayesDAG: Gradient-Based Posterior Sampling for Causal Discovery ( http://arxiv.org/abs/2307.13917v1 )

License: see the linked page

Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong

Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over the combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or rely on continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure that the resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on stochastic gradient Markov Chain Monte Carlo (SG-MCMC) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples, and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to permutation-based DAG learning, which opens up the possibility of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework to apply gradient-based MCMC sampling to causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.

Translated: 2023-07-27 13:47:30; published: 2023-07-26
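The BayesDAG abstract above (2307.13917) relies on permutation-based DAG learning. The general idea behind that family is this: any node ordering, combined with edges that only point "forward" in that ordering, yields a graph that is acyclic by construction, so no DAG regularizer is needed.

The sketch below illustrates only that representation and checks acyclicity with Kahn's algorithm. It is not BayesDAG's parametrization or its SG-MCMC sampler, and the random ordering and mask are hypothetical.

```python
import random


def dag_from_permutation(order, mask):
    """Edges i -> j are allowed only when i precedes j in `order` and the mask says so.

    mask[p][q] (p < q) switches on the edge from order[p] to order[q];
    entries with p >= q are ignored, so the result is acyclic by construction.
    """
    d = len(order)
    adj = [[0] * d for _ in range(d)]
    for p in range(d):
        for q in range(p + 1, d):
            if mask[p][q]:
                adj[order[p]][order[q]] = 1
    return adj


def is_acyclic(adj):
    """Kahn's algorithm: a graph is a DAG iff every node can be removed in topological order."""
    d = len(adj)
    indegree = [sum(adj[i][j] for i in range(d)) for j in range(d)]
    ready = [j for j in range(d) if indegree[j] == 0]
    removed = 0
    while ready:
        i = ready.pop()
        removed += 1
        for j in range(d):
            if adj[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    return removed == d


rng = random.Random(0)
d = 6
order = list(range(d))
rng.shuffle(order)
mask = [[rng.random() < 0.5 for _ in range(d)] for _ in range(d)]
adj = dag_from_permutation(order, mask)
print(order, is_acyclic(adj))
```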

# Online learning in bandits with predicted context ( http://arxiv.org/abs/2307.13916v1 )

License: see the linked page

Yongyi Guo, Susan Murphy

We consider the contextual bandit problem in which, at each time step, the agent has access only to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications where the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available. When the context error is non-diminishing, classical bandit algorithms fail to achieve sublinear regret. We propose the first online algorithm in this setting with sublinear regret relative to the appropriate benchmark. The key idea is to extend the measurement error model in classical statistics to the online decision-making setting, which is nontrivial because the policy depends on the noisy context observations.

Translated: 2023-07-27 13:47:08; published: 2023-07-26
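The bandit paper above (2307.13916) builds on the classical measurement error model. The sketch below shows only that classical ingredient in one dimension, not the authors' online algorithm.

With a noisy context x_obs = x + e and a known error variance, ordinary least squares is biased towards zero, because the noise inflates the variance of the regressor. Subtracting the known error variance from that variance removes the bias.

The data-generating process (x ~ N(0, 1), slope 2, noise variance 1) is a hypothetical choice for illustration.

```python
import random


def slopes(n, true_slope, noise_var, seed=0):
    """Compare naive and error-corrected slopes when only a noisy context is observed.

    x ~ N(0, 1) is the true context, x_obs = x + e with Var(e) = noise_var known,
    and y = true_slope * x + small noise.
    """
    rng = random.Random(seed)
    xs_obs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        xs_obs.append(x + rng.gauss(0, noise_var ** 0.5))
        ys.append(true_slope * x + rng.gauss(0, 0.1))
    mx, my = sum(xs_obs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs_obs, ys)) / n
    var_obs = sum((a - mx) ** 2 for a in xs_obs) / n
    naive = cov / var_obs                      # attenuated towards zero
    corrected = cov / (var_obs - noise_var)    # subtract the known error variance
    return naive, corrected


naive, corrected = slopes(n=50000, true_slope=2.0, noise_var=1.0)
print(naive, corrected)
```

Here the naive slope lands near 1, half the true value, because Var(x) / (Var(x) + Var(e)) = 1/2. The corrected slope recovers roughly 2.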

# Embedding Democratic Values into Social Media AIs via Societal Objective Functions ( http://arxiv.org/abs/2307.13912v1 )

License: see the linked page

Chenyan Jia, Michelle S. Lam, Minh Chau Mai, Jeff Hancock, Michael S. Bernstein

Can we design artificial intelligence (AI) systems that rank our social media feeds to consider democratic values, such as mitigating partisan animosity, as part of their objective functions? We introduce a method for translating established, vetted social scientific constructs into AI objective functions, which we term societal objective functions, and demonstrate the method by applying it to the political science construct of anti-democratic attitudes. Traditionally, we have lacked observable outcomes with which to train such models; however, the social sciences have developed survey instruments and qualitative codebooks for these constructs, and their precision facilitates translation into detailed prompts for large language models. We apply this method to create a democratic attitude model that estimates the extent to which a social media post promotes anti-democratic attitudes, and we test this democratic attitude model across three studies. In Study 1, we first test the attitudinal and behavioral effectiveness of the intervention among US partisans (N=1,380) by manually annotating social media posts with anti-democratic attitude scores (alpha=.895) and testing several feed ranking conditions based on these scores. Removal (d=.20) and downranking (d=.25) feeds reduced participants' partisan animosity without compromising their experience and engagement. In Study 2, we scale up the manual labels by creating the democratic attitude model, finding strong agreement with manual labels (rho=.75). Finally, in Study 3, we replicate Study 1 using the democratic attitude model instead of manual labels to test its attitudinal and behavioral impact (N=558), and again find that feed downranking using the societal objective function reduced partisan animosity (d=.25). This method presents a novel strategy for drawing on social science theory and methods to mitigate societal harms in social media AIs.

Translated: 2023-07-27 13:46:56; published: 2023-07-26
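The societal-objective-function paper above (2307.13912) tests "removal" and "downranking" feed conditions. The sketch below illustrates those two manipulations as simple functions over a scored feed.

The `Post` fields, the scores and the 0.5 threshold are hypothetical. The exact rules used in the studies are not given in the abstract, so this is one plausible reading rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    engagement_rank: int      # position in the original engagement-based feed (0 = top)
    anti_democratic: float    # hypothetical score from a societal objective function


def remove(feed, threshold):
    """Removal condition: drop posts whose score reaches the threshold."""
    return [p for p in feed if p.anti_democratic < threshold]


def downrank(feed, threshold):
    """Downranking condition: move flagged posts below all others, keeping relative order."""
    ordered = sorted(feed, key=lambda p: p.engagement_rank)
    # sorted() is stable, so posts keep their engagement order within each group.
    return sorted(ordered, key=lambda p: p.anti_democratic >= threshold)


feed = [
    Post("a", 0, 0.9),
    Post("b", 1, 0.1),
    Post("c", 2, 0.7),
    Post("d", 3, 0.2),
]
print([p.post_id for p in remove(feed, 0.5)])
print([p.post_id for p in downrank(feed, 0.5)])
```

Downranking keeps every post available to the reader while changing exposure. Removal changes what the reader can see at all.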

# Graph Neural Networks-based Hybrid Framework For Predicting Particle Crushing Strength ( http://arxiv.org/abs/2307.13909v1 )

License: see the linked page

Tongya Zheng, Tianli Zhang, Qingzheng Guan, Wenjie Huang, Zunlei Feng, Mingli Song, Chun Chen

Graph Neural Networks have emerged as an effective machine learning tool for multi-disciplinary tasks such as pharmaceutical molecule classification and chemical reaction prediction, because they can model non-Euclidean relationships between different entities. Particle crushing, a significant field of civil engineering, describes the breakage of granular materials caused by the breakage of particle fragment bonds, as modeled in numerical simulations; this motivates us to characterize the mechanical behaviors of particle crushing through the connectivity of particle fragments with Graph Neural Networks (GNNs). However, there is no open-source large-scale particle crushing dataset for research, due to the high cost of laboratory tests and numerical simulations. Therefore, we first generate a dataset with 45,000 numerical simulations and 900 particle types to facilitate research on machine learning for particle crushing. Second, we devise a hybrid framework that leverages state-of-the-art GNNs to predict particle crushing strength from a particle-fragment view. Finally, we compare our hybrid framework against traditional machine learning methods and a plain MLP to verify its effectiveness. The usefulness of different features is further discussed through gradient attribution explanations with respect to the predictions. Our data and code are released at https://github.com/doujiang-zheng/GNN-For-Particle-Crushing.

Translated: 2023-07-27 13:46:19; published: 2023-07-26

# Rethinking Voice-Face Correlation: A Geometry View ( http://arxiv.org/abs/2307.13948v1 )

License: see the linked page

Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj

Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but they mainly rely on coarse semantic cues such as gender, age, and emotion. In this paper, we aim to investigate the capability of reconstructing the 3D facial shape from voice from a geometric perspective, without any semantic information. We propose a voice-anthropometric measurement (AM)-face paradigm, which identifies facial AMs that are predictable from the voice and uses them to guide 3D face reconstruction. By leveraging AMs as a proxy linking voice and face geometry, we can eliminate the influence of unpredictable AMs and make the face geometry tractable. We evaluate our approach on our proposed dataset with ground-truth 3D face scans and corresponding voice recordings, and we find significant correlations between voice and specific parts of the face geometry, such as the nasal cavity and cranium. Our work offers a new perspective on voice-face correlation and can serve as a useful empirical study for anthropometry science.

Translated: 2023-07-27 13:39:23; published: 2023-07-26

# Centroid-aware feature recalibration for cancer grading in pathology images ( http://arxiv.org/abs/2307.13947v1 )

License: see the linked page

Jaeung Lee, Keunho Byeon, and Jin Tae Kwak

Cancer grading is an essential task in pathology. Recent developments of artificial neural networks in computational pathology have shown that these methods hold great potential for improving the accuracy and quality of cancer diagnosis. However, issues with the robustness and reliability of such methods have not yet been fully resolved. Herein, we propose a centroid-aware feature recalibration network that can conduct cancer grading in an accurate and robust manner. The proposed network maps an input pathology image into an embedding space and adjusts it using the centroid embedding vectors of different cancer grades via an attention mechanism. Equipped with the recalibrated embedding vector, the proposed network classifies the input pathology image into a pertinent class label, i.e., cancer grade. We evaluate the proposed network using colorectal cancer datasets that were collected under different environments. The experimental results confirm that the proposed network is able to conduct cancer grading in pathology images with high accuracy regardless of the environmental changes in the datasets.

Translated: 2023-07-27 13:38:52; published: 2023-07-26
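The cancer-grading paper above (2307.13947) adjusts an image embedding using class-centroid embeddings via attention. The sketch below shows one simple form of that idea: scaled dot-product attention from the embedding to the centroids, with the attended centroid added back.

The exact attention form, the residual addition, the nearest-centroid classifier and the 2-D centroids are all assumptions for illustration. The network's actual layers are not described in the abstract.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recalibrate(embedding, centroids, temperature=1.0):
    """Attend from an embedding to class centroids and add the attended centroid.

    Scores are scaled dot products; the output is embedding + sum_k w_k * c_k.
    """
    scores = [sum(e * c for e, c in zip(embedding, cent)) / temperature for cent in centroids]
    weights = softmax(scores)
    attended = [sum(w * cent[i] for w, cent in zip(weights, centroids))
                for i in range(len(embedding))]
    return [e + a for e, a in zip(embedding, attended)], weights


def classify(embedding, centroids):
    """Nearest-centroid label (squared Euclidean distance), a simple stand-in classifier."""
    dists = [sum((e - c) ** 2 for e, c in zip(embedding, cent)) for cent in centroids]
    return min(range(len(centroids)), key=dists.__getitem__)


centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # hypothetical per-grade centroids
x = [0.8, 0.3]
x_recal, w = recalibrate(x, centroids)
print(w, classify(x_recal, centroids))
```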

# Learning-based Control for PMSM Using Distributed Gaussian Processes with Optimal Aggregation Strategy ( http://arxiv.org/abs/2307.13945v1 )

License: see the linked page

Zhenxiao Yin, Xiaobing Dai, Zewen Yang, Yang Shen, Georges Hattab, Hang Zhao

The growing demand for accurate control in varying and unknown environments has sparked a corresponding increase in the requirements for power supply components, including permanent magnet synchronous motors (PMSMs). To infer the unknown part of the system, machine learning techniques are widely employed, especially Gaussian process regression (GPR), owing to its flexibility in modeling continuous systems and its guaranteed performance. For practical implementation, distributed GPR is adopted to alleviate the high computational complexity. However, the study of distributed GPR from a control perspective remains an open problem. In this paper, a control-aware optimal aggregation strategy of distributed GPR for PMSMs is proposed based on Lyapunov stability theory. This strategy exclusively leverages the posterior mean, thereby obviating the computationally intensive calculations associated with the posterior variance in alternative approaches. Moreover, the straightforward calculation process of our proposed strategy lends itself to seamless implementation in high-frequency PMSM control. The effectiveness of the proposed strategy is demonstrated in simulations.

Translated: 2023-07-27 13:38:09; published: 2023-07-26

# Entropy Neural Estimation for Graph Contrastive Learning ( http://arxiv.org/abs/2307.13944v1 )

License: see the linked page

Yixuan Ma, Xiaolin Zhang, Peng Zhang, Kun Zhan

Contrastive learning on graphs aims at extracting distinguishable high-level representations of nodes. In this paper, we theoretically illustrate that the entropy of a dataset can be approximated by maximizing the lower bound of the mutual information across different views of a graph, i.e., entropy is estimated by a neural network. Based on this finding, we propose a simple yet effective subset sampling strategy to contrast pairwise representations between views of a dataset. In particular, we randomly sample nodes and edges from a given graph to build the input subset for a view. Two views are fed into a parameter-shared Siamese network to extract the high-dimensional embeddings and estimate the information entropy of the entire graph. For the learning process, we propose to optimize the network using two objectives simultaneously. Concretely, the input of the contrastive loss function consists of positive and negative pairs. Our pair selection strategy differs from previous works: we present a novel strategy that enhances the representation ability of the graph encoder by selecting nodes based on cross-view similarities. Guided by cross-view similarity scores, we enrich the diversity of the positive and negative pairs by selecting highly similar samples and totally different samples, respectively. We also introduce a cross-view consistency constraint on the representations generated from the different views. This objective guarantees that the learned representations are consistent across views from the perspective of the entire graph. We conduct extensive experiments on seven graph benchmarks, and the proposed approach achieves competitive performance compared to current state-of-the-art methods. The source code will be publicly released once this paper is accepted.

Translated: 2023-07-27 13:37:47; published: 2023-07-26
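The graph contrastive learning paper above (2307.13944) maximizes a lower bound on the mutual information between views. The sketch below implements the widely used InfoNCE loss, for which log N minus the loss lower-bounds the mutual information.

It is a generic illustration, not the paper's exact objective, Siamese encoder or similarity-guided pair selection. The random "node embeddings" and the Gaussian noise used to create two views are hypothetical.

```python
import math
import random


def normalise(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def info_nce(view1, view2, temperature=0.5):
    """InfoNCE loss: row i of view1 should match row i of view2 against all other rows.

    log(N) - loss is a lower bound on the mutual information between the views.
    """
    z1 = [normalise(v) for v in view1]
    z2 = [normalise(v) for v in view2]
    loss = 0.0
    for i, a in enumerate(z1):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in z2]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]   # cross-entropy with the positive at index i
    return loss / len(z1)


rng = random.Random(0)
nodes = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(32)]
view_a = [[x + rng.gauss(0, 0.1) for x in v] for v in nodes]   # two noisy "views"
view_b = [[x + rng.gauss(0, 0.1) for x in v] for v in nodes]
shuffled = view_b[1:] + view_b[:1]                              # misaligned pairs
print(info_nce(view_a, view_b), info_nce(view_a, shuffled))
```

Correctly aligned views give a clearly lower loss than misaligned ones, which is the signal a contrastive encoder is trained on.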

# Topology-aware Robust Optimization for Out-of-distribution Generalization ( http://arxiv.org/abs/2307.13943v1 )

License: see the linked page

Fengchun Qiao, Xi Peng

Out-of-distribution (OOD) generalization is a challenging machine learning problem, yet highly desirable in many high-stakes applications. Existing methods suffer from overly pessimistic modeling with low generalization confidence. As generalizing to arbitrary test distributions is impossible, we hypothesize that further structure on the topology of distributions is crucial in developing strong OOD resilience. To this end, we propose topology-aware robust optimization (TRO), which seamlessly integrates distributional topology in a principled optimization framework. More specifically, TRO solves two optimization objectives: (1) Topology Learning, which explores the data manifold to uncover the distributional topology; and (2) Learning on Topology, which exploits the topology to constrain robust optimization for tightly bounded generalization risks. We theoretically demonstrate the effectiveness of our approach and empirically show that it significantly outperforms the state of the art in a wide range of tasks including classification, regression, and semantic segmentation. Moreover, we empirically find that the data-driven distributional topology is consistent with domain knowledge, enhancing the explainability of our approach.

Translated: 2023-07-27 13:37:24; published: 2023-07-26

# Improving Semi-Supervised Semantic Segmentation with Dual-Level Siamese Structure Network ( http://arxiv.org/abs/2307.13938v1 )

License: see the linked page

Zhibo Tain, Xiaolin Zhang, Peng Zhang, Kun Zhan

Semi-supervised semantic segmentation (SSS) is an important task that uses both labeled and unlabeled data to reduce the cost of labeling training examples. However, the effectiveness of SSS algorithms is limited by the difficulty of fully exploiting the potential of unlabeled data. To address this, we propose a dual-level Siamese structure network (DSSN) for pixel-wise contrastive learning. By aligning positive pairs with a pixel-wise contrastive loss using strong augmented views in both low-level image space and high-level feature space, the proposed DSSN is designed to maximize the utilization of available unlabeled data. Additionally, we introduce a novel class-aware pseudo-label selection strategy for weak-to-strong supervision, which addresses a limitation of most existing methods: they either perform no selection or apply a predefined threshold for all classes. Specifically, our strategy selects the top high-confidence predictions of the weak view for each class to generate pseudo labels that supervise the strong augmented views. This strategy can take class imbalance into account and improve the performance of long-tailed classes. Our proposed method achieves state-of-the-art results on two datasets, PASCAL VOC 2012 and Cityscapes, outperforming other SSS algorithms by a significant margin.

Translated: 2023-07-27 13:37:03; published: 2023-07-26
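The DSSN abstract above (2307.13938) selects the top high-confidence weak-view predictions for each class instead of applying one global threshold. The sketch below illustrates that contrast on a handful of "pixels".

The `keep_fraction` parameter, the keep-at-least-one rule and the probabilities are hypothetical, not the paper's settings. Note that a global threshold of 0.7 would discard every class-1 pixel here, while per-class selection still yields a class-1 pseudo-label.

```python
def classwise_pseudo_labels(probs, keep_fraction=0.5):
    """Select pseudo-labels per class instead of with one global threshold.

    probs: list of per-pixel class-probability lists (weak-view predictions).
    For each predicted class, keep the `keep_fraction` most confident pixels.
    Returns {pixel_index: class} for the selected pixels.
    """
    by_class = {}
    for idx, p in enumerate(probs):
        cls = max(range(len(p)), key=p.__getitem__)
        by_class.setdefault(cls, []).append((p[cls], idx))
    selected = {}
    for cls, items in by_class.items():
        items.sort(reverse=True)                               # most confident first
        k = max(1, int(round(keep_fraction * len(items))))     # keep at least one per class
        for _, idx in items[:k]:
            selected[idx] = cls
    return selected


probs = [
    [0.95, 0.05], [0.90, 0.10], [0.80, 0.20], [0.70, 0.30],   # frequent class 0
    [0.35, 0.65], [0.45, 0.55],                               # rare class 1, low confidence
]
print(classwise_pseudo_labels(probs))
```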

# Multiparameter estimation with two qubit probes in noisy channels ( http://arxiv.org/abs/2307.13936v1 )

License: see the linked page

Lorcan O. Conlon, Ping Koy Lam and Syed M. Assad

This work compares the performance of single- and two-qubit probes for estimating several phase rotations simultaneously under the action of different noisy channels. We compute the quantum limits for this simultaneous estimation using collective and individual measurements by evaluating the Holevo and Nagaoka-Hayashi Cramér-Rao bounds, respectively. Several quantum noise channels are considered, namely the decohering channel, the amplitude damping channel and the phase damping channel. For each channel we find the optimal single- and two-qubit probes. Where possible, we demonstrate an explicit measurement strategy that saturates the appropriate bound, and we investigate how closely the Holevo bound can be approached through collective measurements on multiple copies of the same probe. We find that under the action of the considered channels, two-qubit probes show enhanced parameter estimation capabilities over single-qubit probes for almost all non-identity channels, i.e., the achievable precision with a single-qubit probe degrades faster with increasing exposure to the noisy environment than that of the two-qubit probe. However, in sufficiently noisy channels, we show that it is possible for single-qubit probes to outperform maximally entangled two-qubit probes. This work shows that, to reach the ultimate precision limits allowed by quantum mechanics, entanglement is required in both the state preparation and state measurement stages. We hope the tutorial-style nature of this paper will make it easily accessible.

Translated: 2023-07-27 13:36:39; published: 2023-07-26
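For the multiparameter-estimation paper above (2307.13936), the sketch below applies two of the named noise channels to a single-qubit phase probe, using the standard Kraus forms (Nielsen and Chuang convention). It is meant to illustrate how noise erodes the coherence that carries the phase.

It does not reproduce the paper's two-qubit probes, the decohering channel or the Holevo and Nagaoka-Hayashi bounds. The phase 0.3 and noise strengths 0.4 are hypothetical. Both channels preserve the trace and shrink the off-diagonal element by a factor of sqrt(1 - gamma), where gamma is the damping strength (0.4 here), while keeping its phase.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def apply_channel(rho, kraus):
    """rho -> sum_k K rho K^dagger."""
    out = [[0j, 0j], [0j, 0j]]
    for k in kraus:
        term = matmul(matmul(k, rho), dagger(k))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out


def amplitude_damping(gamma):
    return [[[1, 0], [0, math.sqrt(1 - gamma)]], [[0, math.sqrt(gamma)], [0, 0]]]


def phase_damping(lam):
    return [[[1, 0], [0, math.sqrt(1 - lam)]], [[0, 0], [0, math.sqrt(lam)]]]


def rotated_plus(phi):
    """(|0> + e^{i phi}|1>)/sqrt(2) as a density matrix; phi is the phase to estimate."""
    v = [1 / math.sqrt(2), complex(math.cos(phi), math.sin(phi)) / math.sqrt(2)]
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]


rho = rotated_plus(0.3)
for name, kraus in [("amplitude", amplitude_damping(0.4)), ("phase", phase_damping(0.4))]:
    out = apply_channel(rho, kraus)
    trace = (out[0][0] + out[1][1]).real
    print(name, round(trace, 12), abs(out[0][1]))
```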

# AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception ( http://arxiv.org/abs/2307.13933v1 )

License: see the linked page

Dingkang Yang, Shuai Huang, Zhi Xu, Zhenpeng Li, Shunli Wang, Mingcheng Li, Yuzheng Wang, Yang Liu, Kun Yang, Zhaoyu Chen, Yan Wang, Jing Liu, Peixuan Zhang, Peng Zhai, Lihua Zhang

Driver distraction has become a significant cause of severe traffic accidents over the past decade. Despite the growing development of vision-driven driver monitoring systems, the lack of comprehensive perception datasets limits progress on road safety and traffic security. In this paper, we present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle in naturalistic scenarios. AIDE facilitates holistic driver monitoring through three distinctive characteristics: multi-view settings of driver and scene, multi-modal annotations of face, body, posture, and gesture, and four pragmatic task designs for driving understanding. To thoroughly explore AIDE, we provide experimental benchmarks on three kinds of baseline frameworks using a wide range of methods. Moreover, two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations. We also systematically investigate the importance and rationality of the key components in AIDE and the benchmarks. The project link is https://github.com/ydk122024/AIDE.

Translated: 2023-07-27 13:36:15; published: 2023-07-26

# Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception ( http://arxiv.org/abs/2307.13929v1 )

License: see the linked page

Kun Yang, Dingkang Yang, Jingyu Zhang, Mingcheng Li, Yang Liu, Jing Liu, Hanqi Wang, Peng Sun, Liang Song

Multi-agent collaborative perception, a potential application of vehicle-to-everything communication, could significantly improve the perception performance of autonomous vehicles over single-agent perception. However, several challenges remain in achieving pragmatic information sharing in this emerging research area. In this paper, we propose SCOPE, a novel collaborative perception framework that aggregates the spatio-temporal awareness characteristics across on-road agents in an end-to-end manner. Specifically, SCOPE has three distinct strengths: i) it considers effective semantic cues of the temporal context to enhance current representations of the target agent; ii) it aggregates perceptually critical spatial information from heterogeneous agents and overcomes localization errors via multi-scale feature interactions; iii) it integrates multi-source representations of the target agent based on their complementary contributions by an adaptive fusion paradigm. To thoroughly evaluate SCOPE, we consider both real-world and simulated scenarios of collaborative 3D object detection tasks on three datasets. Extensive experiments demonstrate the superiority of our approach and the necessity of the proposed components.

Translated: 2023-07-27 13:35:56; published: 2023-07-26

# DFR-Net: Density Feature Refinement Network for Image Dehazing Utilizing Haze Density Difference ( http://arxiv.org/abs/2307.13927v1 )

License: see the linked page

Zhongze Wang, Haitao Zhao, Lujian Yao, Jingchao Peng, Kaijie Zhao

In the image dehazing task, haze density is a key feature that affects the performance of dehazing methods. However, some existing methods lack a comparative image with which to measure density. Others create intermediate results but do not exploit their density differences, which can facilitate the perception of density. To address these deficiencies, we propose a density-aware dehazing method named Density Feature Refinement Network (DFR-Net). It extracts haze density features from density differences and leverages density differences to refine density features. In DFR-Net, we first generate a proposal image that has a lower overall density than the hazy input, introducing global density differences. Additionally, the dehazing residual of the proposal image reflects the level of dehazing performance. It also provides local density differences that indicate localized hard-to-dehaze or high-density areas. Subsequently, we introduce a Global Branch (GB) and a Local Branch (LB) to achieve density awareness. In GB, we use Siamese networks to extract features from hazy inputs and proposal images. We also propose a Global Density Feature Refinement (GDFR) module that refines features by pushing features with different global densities further apart. In LB, we explore local density features from the dehazing residuals between hazy inputs and proposal images. We introduce an Intermediate Dehazing Residual Feedforward (IDRF) module to update local features and pull them closer to clear image features. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods on various datasets.
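The sketch below shows how a global density difference and a per-pixel dehazing residual can arise. It assumes the standard atmospheric scattering model I = J·t + A·(1 − t), with t = exp(−β·d). That model is common in the dehazing literature but is not stated in the abstract.

The proposal image is simulated here as the same scene rendered with a smaller haze coefficient β. In DFR-Net the proposal image is produced by a network. The pixel values and depths are made up for illustration.

```python
import math
from typing import List

Image = List[List[float]]


def hazy(clear: Image, depth: Image, beta: float, airlight: float = 1.0) -> Image:
    """Atmospheric scattering model: I = J*t + A*(1 - t), with t = exp(-beta * d)."""
    out = []
    for j_row, d_row in zip(clear, depth):
        row = []
        for j, d in zip(j_row, d_row):
            t = math.exp(-beta * d)  # transmission falls with depth and haze density
            row.append(j * t + airlight * (1.0 - t))
        out.append(row)
    return out


def residual(hazy_img: Image, proposal_img: Image) -> Image:
    """Per-pixel residual between the hazy input and a less hazy proposal image."""
    return [[h - p for h, p in zip(hr, pr)] for hr, pr in zip(hazy_img, proposal_img)]


clear = [[0.2, 0.4], [0.6, 0.8]]
depth = [[1.0, 3.0], [0.5, 2.0]]
hazy_input = hazy(clear, depth, beta=1.0)
proposal = hazy(clear, depth, beta=0.3)  # hypothetical stand-in for the proposal image
res = residual(hazy_input, proposal)
```

Under this model the residual equals (A − J)·(exp(−β_proposal·d) − exp(−β_input·d)). It is therefore positive wherever the scene is darker than the airlight, and it varies across pixels. That variation is the kind of local signal the LB branch is described as exploiting.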

Translated: 2023-07-27 13:35:41, Published: 2023-07-26

# Enhanced Security against Adversarial Examples Using a Random Ensemble of Encrypted Vision Transformer Models

Enhanced Security against Adversarial Examples Using a Random Ensemble of Encrypted Vision Transformer Models ( http://arxiv.org/abs/2307.13985v1 )

License: see link

Ryota Iijima, Miki Tanaka, Sayaka Shiota, Hitoshi Kiya

Deep neural networks (DNNs) are well known to be vulnerable to adversarial examples (AEs). In addition, AEs exhibit adversarial transferability: AEs generated for a source model can fool another black-box model (the target model) with non-trivial probability. Previous studies confirmed two findings:
- The vision transformer (ViT) is more robust against adversarial transferability than convolutional neural network (CNN) models such as ConvMixer.
- An encrypted ViT is more robust than a ViT without encryption.

In this article, we propose a random ensemble of encrypted ViT models to achieve much more robust models. Experiments verify that the proposed scheme is more robust than conventional methods against both black-box and white-box attacks.
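The abstract does not say how the models are encrypted or how the ensemble is sampled. The sketch below therefore rests on two assumptions, and should be read as illustration rather than the authors' scheme:
- **Encryption:** a block-wise pixel-shuffling transform, where one key-derived permutation is applied inside every block.
- **Random ensemble:** each query is routed to one randomly chosen (model, key) pair.

The toy "models" are hypothetical callables.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]


def _perm(block: int, key: int) -> List[int]:
    """Key-derived permutation of the block*block pixel positions inside one block."""
    p = list(range(block * block))
    random.Random(key).shuffle(p)
    return p


def block_shuffle(img: Image, block: int, key: int) -> Image:
    """Encrypt: apply the same secret permutation inside every block x block patch."""
    h, w = len(img), len(img[0])
    assert h % block == 0 and w % block == 0, "image size must be a multiple of block"
    perm = _perm(block, key)
    out = [row[:] for row in img]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            for dst, src in enumerate(perm):
                out[by + dst // block][bx + dst % block] = img[by + src // block][bx + src % block]
    return out


def block_unshuffle(img: Image, block: int, key: int) -> Image:
    """Decrypt: invert block_shuffle for the same block size and key."""
    h, w = len(img), len(img[0])
    perm = _perm(block, key)
    out = [row[:] for row in img]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            for dst, src in enumerate(perm):
                out[by + src // block][bx + src % block] = img[by + dst // block][bx + dst % block]
    return out


def random_ensemble_predict(members: Sequence[Tuple[Callable[[Image], int], int]],
                            img: Image, block: int, rng: random.Random) -> int:
    """Route the query to one randomly chosen (model, key) pair and encrypt with its key."""
    model, key = rng.choice(members)
    return model(block_shuffle(img, block, key))
```

Because an attacker does not know which member answers a given query, gradients or transfer examples crafted against one member need not match the next one. This is the intuition behind randomizing over differently keyed models.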

Translated: 2023-07-27 13:29:49, Published: 2023-07-26

# Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models ( http://arxiv.org/abs/2307.13981v1 )

License: see link

Wei Sun and Wen Wen and Xiongkuo Min and Long Lan and Guangtao Zhai and Kede Ma

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving end users' viewing experience in various real-world video-enabled media applications. Because BVQA is an experimental field, improvements in BVQA models have been measured primarily on a few human-rated VQA datasets. It is therefore crucial to better understand existing VQA datasets in order to properly evaluate current progress in BVQA. Toward this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we mean that our family of BVQA models is built only from basic blocks, all with the simplest possible instantiations:
- a video preprocessor (for aggressive spatiotemporal downsampling);
- a spatial quality analyzer;
- an optional temporal quality analyzer;
- a quality regressor.

We compare the quality prediction performance of different model variants on eight VQA datasets with realistic distortions. We find that nearly all datasets suffer from the easy-dataset problem to varying degrees, and some even admit blind image quality assessment (BIQA) solutions. We further justify our claims by contrasting our models' generalizability across these VQA datasets and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA. At the same time, they shed light on good practices for constructing next-generation VQA datasets and models.
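The sketch below illustrates the "video preprocessor (for aggressive spatiotemporal downsampling)" block, plus the simplest temporal aggregation. It keeps every `t_stride`-th frame and average-pools each kept frame, then averages per-frame scores.

Several details are assumptions rather than the paper's settings:
- `t_stride` and `s_factor` are hypothetical parameters; the paper's actual strides and resolutions are not given here.
- `spatial_score` is a hypothetical stand-in for the spatial quality analyzer.
- Temporal mean pooling stands in for the quality regressor.

```python
from typing import Callable, List

Frame = List[List[float]]


def downsample_video(frames: List[Frame], t_stride: int, s_factor: int) -> List[Frame]:
    """Keep every t_stride-th frame, then average-pool each frame by s_factor x s_factor."""
    out = []
    for f in frames[::t_stride]:
        h, w = len(f) // s_factor, len(f[0]) // s_factor
        pooled = [[sum(f[y * s_factor + dy][x * s_factor + dx]
                       for dy in range(s_factor) for dx in range(s_factor)) / (s_factor ** 2)
                   for x in range(w)]
                  for y in range(h)]
        out.append(pooled)
    return out


def minimal_bvqa(frames: List[Frame], spatial_score: Callable[[Frame], float],
                 t_stride: int = 4, s_factor: int = 2) -> float:
    """Preprocess, score each kept frame, and average the scores over time."""
    kept = downsample_video(frames, t_stride, s_factor)
    scores = [spatial_score(f) for f in kept]
    return sum(scores) / len(scores)


# Eight 4x4 frames whose pixels all equal the frame index.
frames = [[[float(t)] * 4 for _ in range(4)] for t in range(8)]
```

If a model this crude already ranks a dataset's videos well, the dataset may not require temporal modeling. This is the "easy dataset" concern the abstract raises.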

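Translated: 2023-07-27 13:29:35, Published: 2023-07-26

# Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation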

Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation ( http://arxiv.org/abs/2307.13978v1 )

License: see link

Mahyar Abbasian, Taha Rajabzadeh, Ahmadreza Moradipari, Seyed Amir Hossein Aqajari, Hongsheng Lu, Amir Rahmani

Generative Adversarial Networks (GANs) have emerged as a formidable AI tool for generating realistic outputs based on training datasets. However, controlling the generation process of GANs remains a significant hurdle. In this paper, we propose a novel methodology to address this issue by integrating a reinforcement learning (RL) agent with a latent-space GAN (l-GAN), thereby facilitating the generation of desired outputs. More specifically, we have developed an actor-critic RL agent with a meticulously designed reward policy. This policy enables the agent to learn to navigate the latent space of the l-GAN and to generate outputs based on specified tasks. To substantiate the efficacy of our approach, we have conducted a series of experiments on the MNIST dataset, including arithmetic addition as an illustrative task. The outcomes of these experiments validate our methodology. Our pioneering integration of an RL agent with a GAN model represents a novel advancement that holds great potential for enhancing generative networks in the future.

Translated: 2023-07-27 13:29:09, Published: 2023-07-26

# Tracking Anything in High Quality

Tracking Anything in High Quality ( http://arxiv.org/abs/2307.13974v1 )

License: see link

Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li

Visual object tracking is a fundamental video task in computer vision. Recently, the notably increasing power of perception algorithms has allowed the unification of single-/multi-object and box-/mask-based tracking. Among these algorithms, the Segment Anything Model (SAM) has attracted much attention. In this report, we propose HQTrack, a framework for High Quality Tracking of anything in videos. HQTrack mainly consists of a video multi-object segmenter (VMOS) and a mask refiner (MR). Given the object to be tracked in the initial frame of a video, VMOS propagates the object masks to the current frame. The mask results at this stage are not accurate enough. VMOS is trained on several close-set video object segmentation (VOS) datasets, which limits its ability to generalize to complex and corner-case scenes. To further improve the quality of the tracking masks, a pretrained MR model is employed to refine the tracking results. HQTrack ranks 2nd in the Visual Object Tracking and Segmentation (VOTS2023) challenge without any tricks such as test-time data augmentation or model ensembling, a compelling testament to the effectiveness of our paradigm. Code and models are available at https://github.com/jiawen-zhu/HQTrack.

Translated: 2023-07-27 13:28:51, Published: 2023-07-26

# Understanding Deep Neural Networks via Linear Separability of Hidden Layers

Understanding Deep Neural Networks via Linear Separability of Hidden Layers ( http://arxiv.org/abs/2307.13962v1 )

License: see link

Chao Zhang, Xinyu Chen, Wensheng Li, Lixue Liu, Wei Wu, Dacheng Tao

In this paper, we measure the linear separability of hidden layer outputs to study the characteristics of deep neural networks. In particular, we first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the degree of linear separability of two point sets. Then, we demonstrate that the degree of linear separability of hidden layer outputs is synchronized with network training performance. That is, if the updated weights enhance the linear separability of hidden layer outputs, the updated network achieves better training performance, and vice versa. Moreover, we study how the activation function and the network size (including width and depth) affect the linear separability of hidden layers. Finally, we conduct numerical experiments to validate our findings on several popular deep networks: the multilayer perceptron (MLP), convolutional neural network (CNN), deep belief network (DBN), ResNet, VGGNet, AlexNet, vision transformer (ViT), and GoogLeNet.
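The exact MD-LSM formulas are not given in the abstract, so the sketch below uses only the geometric fact such measures build on. Two finite point sets A and B are strictly linearly separable exactly when some weight vector w satisfies w·d > 0 for every d in the Minkowski difference A − B = {a − b}; equivalently, the origin lies outside the convex hull of A − B. A threshold between min w·a and max w·b then separates the sets.

The check below runs a perceptron through the origin on the difference vectors. It returns a yes/no answer, not the paper's degree measure. Returning False after `max_epochs` is only evidence of non-separability, since separable sets with a tiny margin may need more epochs.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def minkowski_difference(a: Sequence[Point], b: Sequence[Point]) -> List[Point]:
    """All pairwise differences a_i - b_j."""
    return [tuple(x - y for x, y in zip(p, q)) for p in a for q in b]


def linearly_separable(a: Sequence[Point], b: Sequence[Point], max_epochs: int = 1000) -> bool:
    """Perceptron through the origin on A - B: True once some w has w.d > 0 for every d."""
    diffs = minkowski_difference(a, b)
    w = [0.0] * len(diffs[0])
    for _ in range(max_epochs):
        updated = False
        for d in diffs:
            if sum(wi * di for wi, di in zip(w, d)) <= 0:  # misclassified difference vector
                w = [wi + di for wi, di in zip(w, d)]
                updated = True
        if not updated:  # every difference vector now has a positive projection
            return True
    return False
```

For the XOR configuration the difference set contains both (1, 0) and (−1, 0), so no w can work, and the perceptron never settles.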

Translated: 2023-07-27 13:28:30, Published: 2023-07-26

# Decoherence of a tunable capacitively shunted flux qubit

Decoherence of a tunable capacitively shunted flux qubit ( http://arxiv.org/abs/2307.13961v1 )

License: see link

R. Trappen, X. Dai, M. A. Yurtalan, D. Melanson, D. M. Tennant, A. J. Martinez, Y. Tang, J. Gibson, J. A. Grover, S. M. Disseler, J. I. Basham, R. Das, D. K. Kim, A. J. Melville, B. M. Niedzielski, C. F. Hirjibehedin, K. Serniak, S. J. Weber, J. L. Yoder, W. D. Oliver, D. A. Lidar, A. Lupascu

We present a detailed study of the coherence of a tunable capacitively-shunted flux qubit, designed for coherent quantum annealing applications. The measured relaxation at the qubit symmetry point is mainly due to intrinsic flux noise in the main qubit loop for qubit frequencies below $\sim3~\text{GHz}$. At higher frequencies, thermal noise in the bias line makes a significant contribution to the relaxation, arising from the design choice to experimentally explore both fast annealing and high-frequency control. The measured dephasing rate is primarily due to intrinsic low-frequency flux noise in the two qubit loops, with additional contribution from the low-frequency noise of control electronics used for fast annealing. The flux-bias dependence of the dephasing time also reveals apparent noise correlation between the two qubit loops, possibly due to non-local sources of flux noise or junction critical-current noise. Our results are relevant for ongoing efforts toward building superconducting quantum annealers with increased coherence.

Translated: 2023-07-27 13:28:08, Published: 2023-07-26

# Visual Prompt Flexible-Modal Face Anti-Spoofing

Visual Prompt Flexible-Modal Face Anti-Spoofing ( http://arxiv.org/abs/2307.13958v1 )

License: see link

Zitong Yu, Rizhao Cai, Yawen Cui, Ajian Liu and Changsheng Chen

Recently, vision transformer based multimodal learning methods have been proposed to improve the robustness of face anti-spoofing (FAS) systems. However, multimodal face data collected in the real world is often imperfect because of missing modalities from various imaging sensors. Flexible-modal FAS \cite{yu2023flexible} has recently attracted more attention. It aims to develop a unified multimodal FAS model that is trained on complete multimodal face data but is insensitive to missing modalities at test time. In this paper, we tackle one main challenge in flexible-modal FAS: missing modalities that occur during either training or testing in real-world situations. Inspired by the recent success of prompt learning in language models, we propose Visual Prompt flexible-modal FAS (VP-FAS). VP-FAS learns modal-relevant prompts to adapt a frozen pre-trained foundation model to the downstream flexible-modal FAS task. Specifically, both vanilla visual prompts and residual contextual prompts are plugged into multimodal transformers to handle general missing-modality cases. This requires fewer than 4% of the learnable parameters needed to train the entire model. Furthermore, we propose missing-modality regularization to force models to learn consistent multimodal feature embeddings when some modalities are missing. Extensive experiments on two multimodal FAS benchmark datasets demonstrate the effectiveness of our VP-FAS framework. It improves performance under various missing-modality cases while reducing the need for heavy model re-training.

Translated: 2023-07-27 13:27:52, Published: 2023-07-26

# Heterogeneous Embodied Multi-Agent Collaboration

Heterogeneous Embodied Multi-Agent Collaboration ( http://arxiv.org/abs/2307.13957v1 )

License: see link

Xinzhu Liu, Di Guo, Huaping Liu

Multi-agent embodied tasks have recently been studied in complex indoor visual environments. Collaboration among multiple agents can improve work efficiency and has significant practical value. However, most existing research focuses on homogeneous multi-agent tasks. Unlike homogeneous agents, heterogeneous agents can leverage their different capabilities to allocate corresponding sub-tasks and cooperate to complete complex tasks. Heterogeneous multi-agent tasks are common in real-world scenarios. The collaboration strategy among heterogeneous agents is a challenging and important problem to be solved. To study collaboration among heterogeneous agents, we propose the heterogeneous multi-agent tidying-up task. In this task, multiple heterogeneous agents with different capabilities collaborate to detect misplaced objects and place them in reasonable locations. The task is demanding because agents must make the best use of their different capabilities to plan reasonably and complete the whole task. To solve this task, we build a heterogeneous multi-agent tidying-up benchmark dataset based on ProcTHOR-10K, covering a large number of houses with multiple rooms. We propose a hierarchical decision model based on misplaced object detection, reasonable receptacle prediction, and a handshake-based group communication mechanism. Extensive experiments demonstrate the effectiveness of the proposed model. The project's website and videos of the experiments can be found at https://hetercol.github.io/.

Translated: 2023-07-27 13:27:14, Published: 2023-07-26

# The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features ( http://arxiv.org/abs/2307.13953v1 )

License: see link

Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

This work unveils the enigmatic link between phonemes and facial features. Traditional studies on voice-face correlations typically use long voice inputs, for example to generate face images from voices or to reconstruct 3D face meshes from voices. However, in situations such as voice-based crimes, the available voice evidence may be short and limited. Additionally, from a physiological perspective, each segment of speech (a phoneme) corresponds to different types of airflow and facial movements. It is therefore advantageous to discover the hidden link between phonemes and face attributes. In this paper, we propose an analysis pipeline to explore the voice-face relationship in a fine-grained manner, i.e., phonemes vs. facial anthropometric measurements (AMs). We build an estimator for each phoneme-AM pair and evaluate the correlation through hypothesis testing. Our results indicate that AMs are more predictable from vowels than from consonants, particularly plosives. Additionally, we observe that an AM is more predictable if it exhibits more movement during phoneme pronunciation. Our findings support those in physiology regarding this correlation and lay the groundwork for future research on speech-face multimodal learning.
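The abstract says each phoneme-AM estimator is evaluated "through hypothesis testing" but does not name the test. The sketch below shows one common option, assumed here rather than taken from the paper: a one-sided permutation test on the Pearson correlation between an estimator's predicted AM values and the measured AM values.

```python
import random
import statistics
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def permutation_test(pred: Sequence[float], truth: Sequence[float],
                     n_perm: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """One-sided test of H0: no positive correlation. Returns (observed r, p-value)."""
    rng = random.Random(seed)
    observed = pearson(pred, truth)
    shuffled = list(truth)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between prediction and measurement
        if pearson(pred, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

In a setting like the paper's, `pred` would hold one estimator's predictions for a given phoneme and AM across speakers. A small p-value would suggest that phoneme carries information about that AM.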

Translated: 2023-07-27 13:26:52, Published: 2023-07-26

# How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data?

How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data? ( http://arxiv.org/abs/2307.13949v1 )

License: see link

Huazheng Wang, Daixuan Cheng, Haifeng Sun, Jingyu Wang, Qi Qi, Jianxin Liao, Jing Wang, Cong Liu

Transformer-based pretrained language models (PLMs) have achieved great success in modern NLP. An important advantage of PLMs is good out-of-distribution (OOD) robustness. Recently, many studies have applied diffusion models to PLMs, yet how diffusion influences PLMs on OOD data remains under-explored. The core of diffusion models consists of two processes: a forward diffusion process that gradually adds Gaussian noise to inputs, and a reverse denoising process that removes the noise. Reconstructing noised inputs is a fundamental ability of diffusion models. We directly analyze OOD robustness by measuring the reconstruction loss, testing both the ability to reconstruct OOD data and the ability to detect OOD samples. Our experiments analyze different training parameters and statistical features of the data on eight datasets. They show that finetuning PLMs with diffusion degrades the reconstruction ability on OOD data. The comparison also shows that diffusion models can effectively detect OOD samples, achieving state-of-the-art performance on most of the datasets with an absolute accuracy improvement of up to 18%. These results indicate that diffusion reduces the OOD robustness of PLMs.
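The forward process described above has a standard closed form: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the cumulative product of (1 − β_s). The sketch below pairs it with a reconstruction loss.

The details are assumptions chosen for illustration:
- The linear β schedule (1e-4 to 0.02 over 1000 steps) is the common DDPM default, not necessarily the paper's.
- `denoise` is a hypothetical placeholder for the finetuned PLM denoiser.
- The identity "denoiser" in the example only shows how the loss grows with t.

```python
import math
import random
from typing import Callable, List, Sequence


def alpha_bars(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars


def q_sample(x0: Sequence[float], t: int, bars: Sequence[float],
             noise: Sequence[float]) -> List[float]:
    """Closed-form forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    s, sigma = math.sqrt(bars[t]), math.sqrt(1.0 - bars[t])
    return [s * x + sigma * e for x, e in zip(x0, noise)]


def reconstruction_loss(x0: Sequence[float], t: int, bars: Sequence[float],
                        denoise: Callable[[List[float], int], List[float]],
                        rng: random.Random) -> float:
    """Mean squared error between x0 and the denoiser's estimate from x_t."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_hat = denoise(q_sample(x0, t, bars, noise), t)
    return sum((a - b) ** 2 for a, b in zip(x_hat, x0)) / len(x0)


bars = alpha_bars()
data_rng = random.Random(1)
x0 = [data_rng.gauss(0.0, 1.0) for _ in range(256)]  # stand-in for an input embedding
identity = lambda x_t, t: x_t  # hypothetical placeholder denoiser
```

An OOD detector in this spirit would flag inputs whose reconstruction loss exceeds a threshold calibrated on in-distribution data. That threshold rule is also an assumption here.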

Translated: 2023-07-27 13:26:34, Published: 2023-07-26

# Learning Snippet-to-Motion Progression for Skeleton-based Human Motion Prediction

Learning Snippet-to-Motion Progression for Skeleton-based Human Motion Prediction ( http://arxiv.org/abs/2307.14006v1 )

License: see link

Xinshun Wang, Qiongjie Cui, Chen Chen, Shen Zhao, Mengyuan Liu

Existing graph convolutional networks for human motion prediction largely adopt a one-step scheme that outputs the prediction directly from the historical input, failing to exploit human motion patterns. We observe that human motions have transitional patterns and can be split into snippets, each representing one transition. Each snippet can be reconstructed from its starting and ending poses, referred to as the transitional poses. We propose a snippet-to-motion multi-stage framework that breaks motion prediction into sub-tasks that are easier to accomplish. Each sub-task integrates three modules: transitional pose prediction, snippet reconstruction, and snippet-to-motion prediction. Specifically, we propose to first predict only the transitional poses. Then we use them to reconstruct the corresponding snippets, obtaining a close approximation of the true motion sequence. Finally, we refine the snippets to produce the final prediction output. To implement the network, we propose a novel unified graph modeling approach. It allows more direct and effective feature propagation than existing approaches that rely on separate space-time modeling. Extensive experiments on the Human3.6M, CMU Mocap, and 3DPW datasets verify the effectiveness of our method, which achieves state-of-the-art performance.
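The sketch below illustrates the middle stage of the decomposition: reconstructing snippets from transitional poses and chaining them into a motion. Linear interpolation between the start and end poses is used as the simplest possible reconstructor. This is an assumption for illustration; the paper learns this step with its graph model and then refines the result.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]
Pose = List[Joint]


def reconstruct_snippet(start_pose: Sequence[Joint], end_pose: Sequence[Joint],
                        length: int) -> List[Pose]:
    """Linearly interpolate `length` frames from start_pose to end_pose (both included)."""
    frames = []
    for i in range(length):
        a = i / (length - 1)
        frames.append([tuple((1 - a) * s + a * e for s, e in zip(js, je))
                       for js, je in zip(start_pose, end_pose)])
    return frames


def snippets_to_motion(transitional_poses: Sequence[Sequence[Joint]],
                       snippet_len: int) -> List[Pose]:
    """Chain snippets between consecutive transitional poses into one motion sequence."""
    motion: List[Pose] = []
    for k in range(len(transitional_poses) - 1):
        snip = reconstruct_snippet(transitional_poses[k], transitional_poses[k + 1], snippet_len)
        motion.extend(snip if k == 0 else snip[1:])  # drop the pose shared with the previous snippet
    return motion


# Two joints, three transitional poses.
p0 = [(0, 0, 0), (1, 1, 1)]
p1 = [(2, 0, 0), (3, 1, 1)]
p2 = [(2, 2, 0), (3, 3, 1)]
motion = snippets_to_motion([p0, p1, p2], 5)
```

Only the transitional poses need to be predicted. Every intermediate frame follows from them, which is why predicting them first gives a coarse but complete draft of the motion.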

Translated: 2023-07-27 13:19:38, Published: 2023-07-26

# Unsupervised extraction of local and global keywords from a single text

Unsupervised extraction of local and global keywords from a single text ( http://arxiv.org/abs/2307.14005v1 )

License: see link

Lida Aleksanyan and Armen E. Allahverdyan

We propose an unsupervised, corpus-independent method to extract keywords from a single text. It is based on the spatial distribution of words and the response of this distribution to a random permutation of words. Compared with existing methods (e.g., YAKE), our method has three advantages. First, it is significantly more effective at extracting keywords from long texts. Second, it allows inference of two types of keywords: local and global. Third, it uncovers basic themes in texts. Additionally, our method is language-independent and also applies to short texts. The results are assessed by human annotators with prior knowledge of the texts in our database of classical literary works (agreement between annotators ranges from moderate to substantial). Our results are further supported by human-independent arguments based on the average length of the extracted content words and the average number of nouns among the extracted words. We discuss relations between keywords and higher-order textual features and reveal a connection between keywords and chapter divisions.
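The sketch below shows a generic version of the idea of comparing a word's spatial distribution with its distribution after a random permutation of the words. It is not the authors' statistic, which the abstract does not give.

For each word that occurs at least `min_count` times, the code computes the coefficient of variation of the gaps between its occurrences. It then divides that by the average value over several shuffled copies of the text. Scores well above 1 mark "bursty", topic-like words; evenly spread function words score near 0. The parameter names and defaults are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence


def gap_cv(positions: Sequence[int]) -> float:
    """Coefficient of variation of the gaps between successive occurrences."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return 0.0
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


def _cvs(seq: Sequence[str], min_count: int) -> Dict[str, float]:
    pos: Dict[str, List[int]] = {}
    for i, w in enumerate(seq):
        pos.setdefault(w, []).append(i)
    return {w: gap_cv(p) for w, p in pos.items() if len(p) >= min_count}


def clustering_scores(words: Sequence[str], min_count: int = 3,
                      n_shuffles: int = 20, seed: int = 0) -> Dict[str, float]:
    """Observed gap CV divided by its mean over randomly permuted copies of the text."""
    rng = random.Random(seed)
    observed = _cvs(words, min_count)
    baseline = {w: 0.0 for w in observed}
    shuffled = list(words)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # permutation keeps word counts but destroys positions
        for w, v in _cvs(shuffled, min_count).items():
            if w in baseline:
                baseline[w] += v / n_shuffles
    return {w: observed[w] / baseline[w] if baseline[w] > 0 else 0.0 for w in observed}


# "the" is spread evenly; "alpha" appears in two tight bursts; fillers occur once.
alpha_pos = {1, 3, 5, 7, 9, 151, 153, 155, 157, 159}
words = ["the" if i % 2 == 0 else ("alpha" if i in alpha_pos else f"w{i}") for i in range(200)]
scores = clustering_scores(words)
```

Because the baseline comes from the same text, no external corpus is needed. This matches the corpus-independent setting the abstract describes.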

Translated: 2023-07-27 13:19:19, Published: 2023-07-26

# Affective Natural Language Generation of Event Descriptions through Fine-grained Appraisal Conditions

Affective Natural Language Generation of Event Descriptions through Fine-grained Appraisal Conditions ( http://arxiv.org/abs/2307.14004v1 )

License: see link

Yarik Menchaca Resendiz and Roman Klinger

Models for affective text generation have shown remarkable progress, but they commonly rely only on basic emotion theories or valence/arousal values as conditions. This is appropriate when the goal is to create explicit emotion statements ("The kid is happy."). Emotions are, however, commonly communicated implicitly. For instance, the emotional interpretation of an event ("Their dog died.") often does not require an explicit emotion statement. In psychology, appraisal theories explain the link between the cognitive evaluation of an event and the emotion that may develop. They put the assessment of the situation in focus, for instance regarding one's own control over, or responsibility for, what happens. We hypothesize, and subsequently show, that including appraisal variables as conditions in a generation framework has two advantages. (1) The generation model is informed in greater detail about what makes a specific emotion and what properties it has. This leads to text generation that better fulfills the condition. (2) The appraisal variables allow a user to control the generated text in a more fine-grained way, by stating properties of a situation instead of only providing the emotion category. We ran BART- and T5-based experiments with 7 emotions (Anger, Disgust, Fear, Guilt, Joy, Sadness, Shame) and 7 appraisals (Attention, Responsibility, Control, Circumstance, Pleasantness, Effort, Certainty). They show that (1) adding appraisals during training improves the accuracy of the generated texts by 10 pp in F1. Further, (2) the texts generated with appraisal variables are longer and contain more details. This exemplifies the greater control for users.

Translated: 2023-07-27 13:19:07, Published: 2023-07-26

# Fast algorithms for k-submodular maximization subject to a matroid constraint

Fast algorithms for k-submodular maximization subject to a matroid constraint ( http://arxiv.org/abs/2307.13996v1 )

License: see link

Shuxian Niu and Qian Liu and Yang Zhou and Min Li

In this paper, we apply a Threshold-Decreasing Algorithm to maximize $k$-submodular functions under a matroid constraint. Compared with the greedy algorithm, this reduces the query complexity with little loss in approximation ratio. We give a $(\frac{1}{2} - \epsilon)$-approximation algorithm for monotone $k$-submodular function maximization and a $(\frac{1}{3} - \epsilon)$-approximation algorithm for the non-monotone case, both with complexity $O(\frac{n(k\cdot EO + IO)}{\epsilon} \log \frac{r}{\epsilon})$. Here $r$ denotes the rank of the matroid. $IO$ and $EO$ denote the oracles that evaluate whether a subset is an independent set and that compute the function value of $f$, respectively. A total size constraint can be viewed as a special matroid, called a uniform matroid. We therefore present fast algorithms for maximizing $k$-submodular functions subject to a total size constraint as corollaries.
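The sketch below shows a simplified threshold-decreasing greedy for the monotone case. A solution assigns each chosen element one of k positions.

The procedure works as follows:
1. Start the threshold τ at the largest single-element gain d.
2. In each pass, add an element at its best position if the matroid stays independent and the gain is at least τ.
3. Shrink τ by a factor (1 − ε), stopping once τ < ε·d/r or the solution has r elements.

The paper's exact thresholds and stopping rule may differ. The coverage-style objective is a toy monotone k-submodular function chosen for illustration, and the size-2 uniform matroid is the total-size special case.

```python
from typing import Callable, Dict, Hashable, Sequence, Set

Solution = Dict[Hashable, int]


def threshold_decreasing(ground: Sequence[Hashable], k: int,
                         f: Callable[[Solution], float],
                         is_independent: Callable[[Set[Hashable]], bool],
                         rank: int, eps: float = 0.1) -> Solution:
    """Simplified threshold-decreasing greedy for monotone k-submodular maximization."""
    sol: Solution = {}

    def gain(e: Hashable, i: int) -> float:
        return f({**sol, e: i}) - f(sol)  # marginal gain of putting e at position i

    d = max(gain(e, i) for e in ground for i in range(1, k + 1))
    if d <= 0:
        return sol
    tau = d
    while tau >= eps * d / rank and len(sol) < rank:
        for e in ground:
            if e in sol or not is_independent(set(sol) | {e}):
                continue
            best_i = max(range(1, k + 1), key=lambda i: gain(e, i))
            if gain(e, best_i) >= tau:
                sol[e] = best_i
        tau *= 1 - eps  # lower the bar for the next pass
    return sol


# Toy objective: element e at position i covers cover[(e, i)]; f counts covered items.
cover = {("a", 1): {1, 2, 3}, ("a", 2): {1}, ("b", 1): {3}, ("b", 2): {4, 5},
         ("c", 1): {1, 2}, ("c", 2): {6}, ("d", 1): {7}, ("d", 2): {7}}
f = lambda s: len(set().union(*(cover[(e, i)] for e, i in s.items())))
sol = threshold_decreasing(["a", "b", "c", "d"], 2, f, lambda s: len(s) <= 2, rank=2)
```

Each pass scans every element once. The number of passes is about log(r/ε)/ε, and that bound is where the logarithmic factor in the stated query complexity comes from.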

Translated: 2023-07-27 13:18:35, Published: 2023-07-26

# Take Your Pick: Enabling Effective Personalized Federated Learning within Low-dimensional Feature Space

Take Your Pick: Enabling Effective Personalized Federated Learning within Low-dimensional Feature Space ( http://arxiv.org/abs/2307.13995v1 )

License: see link

Guogang Zhu, Xuefeng Liu, Shaojie Tang, Jianwei Niu, Xinghao Wu, Jiaxing Shen

Personalized federated learning (PFL) is a popular framework that allows clients to have different models, addressing application scenarios where clients' data are in different domains. A typical client model in PFL has two parts. A global encoder, trained by all clients, extracts universal features from the raw data. Personalized layers (e.g., a classifier) are trained on the client's local data. Nonetheless, the data distributions of different clients differ (a.k.a. domain gaps). As a result, the universal features produced by the global encoder largely encompass numerous components irrelevant to a given client's local task. Some recent PFL methods address this problem by personalizing specific parameters within the encoder. However, these methods face substantial challenges due to the high dimensionality and non-linearity of the neural network parameter space. In contrast, the feature space has lower dimensionality and offers greater intuitiveness and interpretability than the parameter space. To this end, we propose a novel PFL framework named FedPick. FedPick achieves PFL in the low-dimensional feature space. For each client, it adaptively selects task-relevant features from those generated by the global encoder, based on the client's local data distribution. This gives a more accessible and interpretable implementation of PFL than methods working in the parameter space. Extensive experimental results show that FedPick can effectively select task-relevant features for each client and improve model performance in cross-domain FL.
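The sketch below illustrates feature-space personalization. It scores each feature dimension of the global encoder's output on one client's local data, keeps the top-m dimensions as a binary mask, and applies the mask before the client's personalized classifier.

The scoring rule is a hypothetical stand-in, not FedPick's selection mechanism, which is learned and not detailed in the abstract. It takes the spread of per-class means divided by the feature's standard deviation.

```python
import statistics
from typing import List, Sequence


def feature_relevance(features: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Per-feature score: spread of per-class means relative to the feature's overall std."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(features[0])):
        col = [x[j] for x in features]
        sd = statistics.pstdev(col) or 1e-12  # guard against constant features
        means = [statistics.fmean(x[j] for x, y in zip(features, labels) if y == c)
                 for c in classes]
        scores.append((max(means) - min(means)) / sd)
    return scores


def pick_mask(scores: Sequence[float], m: int) -> List[int]:
    """Binary mask keeping the m highest-scoring feature dimensions."""
    top = set(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m])
    return [1 if j in top else 0 for j in range(len(scores))]


def apply_mask(x: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out the features this client did not pick."""
    return [v * b for v, b in zip(x, mask)]


# Hypothetical global-encoder outputs on one client: feature 0 tracks the label,
# feature 1 is constant, feature 2 is weakly related.
local_feats = [[0.0, 5.0, 1.0], [0.1, 5.0, 2.0], [1.0, 5.0, 1.5], [1.1, 5.0, 1.8]]
local_labels = [0, 0, 1, 1]
mask = pick_mask(feature_relevance(local_feats, local_labels), 1)
```

The mask lives in the feature space, so it has one entry per feature dimension. It can also be read directly: it states which encoder features a client relies on. This is the interpretability argument made in the abstract.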

Translated: 2023-07-27 13:17:59 Published: 2023-07-26

# BovineTalk: Machine Learning for Vocalization Analysis of Dairy Cattle under Negative Affective States (http://arxiv.org/abs/2307.13994v1)

License: see the arXiv link

Dinu Gavojdian, Teddy Lazebnik, Madalina Mincu, Ariel Oren, Ioana Nicolae, Anna Zamansky

There is a critical need to develop and validate non-invasive animal-based indicators of affective states in livestock species, in order to integrate them into on-farm assessment protocols, potentially via precision livestock farming (PLF) tools. One such promising approach is the use of vocal indicators. The acoustic structure of vocalizations and their functions have been extensively studied in important livestock species, such as pigs, horses, poultry and goats, yet cattle remain understudied in this context to date. Cows have been shown to produce two types of vocalizations. Low-frequency calls (LF) are produced with the mouth closed or partially closed for close-distance contact. High-frequency calls (HF) are emitted with the mouth open for long-distance communication, and are considered to be largely associated with negative affective states. Moreover, cattle vocalizations have been shown to contain information on individuality across a wide range of contexts, both negative and positive. Nowadays, dairy cows face a series of negative challenges and stressors in a typical production cycle, making vocalizations during negative affective states of special interest for research. One contribution of this study is the largest pre-processed (noise-cleaned) dataset to date of lactating adult multiparous dairy cows during negative affective states induced by visual isolation challenges. Here we present two computational frameworks, one based on deep learning and one on explainable machine learning, to classify high- and low-frequency cattle calls and to recognize individual cow voices. Our models in these two frameworks reached 87.2% and 89.4% accuracy for LF and HF classification, with 68.9% and 72.5% accuracy rates for individual cow identification, respectively.

Translated: 2023-07-27 13:17:33 Published: 2023-07-26

# Causal reasoning in typical computer vision tasks (http://arxiv.org/abs/2307.13992v1)

License: see the arXiv link

Kexuan Zhang, Qiyu Sun, Chaoqiang Zhao, Yang Tang

Deep learning has revolutionized the field of artificial intelligence. Based on the statistical correlations uncovered by deep learning-based methods, computer vision technology has contributed to tremendous growth in areas such as autonomous driving and robotics. Despite being the basis of deep learning, such correlations are not stable and are susceptible to uncontrolled factors. Without the guidance of prior knowledge, statistical correlations can easily turn into spurious correlations and introduce confounders. As a result, researchers are beginning to refine deep learning-based methods with causal theory. Causal theory models the intrinsic causal structure unaffected by data bias and is effective in avoiding spurious correlations. This paper aims to comprehensively review the existing causal methods in typical vision and vision-language tasks such as semantic segmentation, object detection, and image captioning. The advantages of causality and the approaches for building causal paradigms will be summarized. Future roadmaps are also proposed, including facilitating the development of causal theory and its application in other complex scenes and systems.

Translated: 2023-07-27 13:17:01 Published: 2023-07-26

# METAVerse: Meta-Learning Traversability Cost Map for Off-Road Navigation (http://arxiv.org/abs/2307.13991v1)

License: see the arXiv link

Junwon Seo, Taekyung Kim, Seongyong Ahn, Kiho Kwak

Autonomous navigation in off-road conditions requires an accurate estimation of terrain traversability. However, traversability estimation in unstructured environments is subject to high uncertainty due to the variability of numerous factors that influence vehicle-terrain interaction. Consequently, it is challenging to obtain a generalizable model that can accurately predict traversability in a variety of environments. This paper presents METAVerse, a meta-learning framework for learning a global model that accurately and reliably predicts terrain traversability across diverse environments. We train the traversability prediction network to generate a dense and continuous-valued cost map from a sparse LiDAR point cloud, leveraging vehicle-terrain interaction feedback in a self-supervised manner. Meta-learning is utilized to train a global model with driving data collected from multiple environments, effectively minimizing estimation uncertainty. During deployment, online adaptation is performed to rapidly adapt the network to the local environment by exploiting recent interaction experiences. To conduct a comprehensive evaluation, we collect driving data from various terrains and demonstrate that our method can obtain a global model that minimizes uncertainty. Moreover, by integrating our model with a model predictive controller, we demonstrate that the reduced uncertainty results in safe and stable navigation in unstructured and unknown terrains.

Translated: 2023-07-27 13:16:47 Published: 2023-07-26

# This is not correct! Negation-aware Evaluation of Language Generation Systems (http://arxiv.org/abs/2307.13989v1)

License: see the arXiv link

Miriam Anschütz, Diego Miguel Lozano and Georg Groh

Large language models underestimate how much negations change the meaning of a sentence. Therefore, learned evaluation metrics based on these models are insensitive to negations. In this paper, we propose NegBLEURT, a negation-aware version of the BLEURT evaluation metric. To this end, we designed a rule-based sentence negation tool and used it to create the CANNOT negation evaluation dataset. Based on this dataset, we fine-tuned a sentence transformer and an evaluation metric to improve their negation sensitivity. Evaluating these models on existing benchmarks shows that our fine-tuned models outperform existing metrics on negated sentences by a wide margin while preserving their base models' performance on other perturbations.
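The CANNOT dataset was built with a rule-based sentence negation tool. The toy negator below only illustrates what such rules can look like. Its rule table and the `negate` function are hypothetical and far simpler than the authors' tool. It handles a few lowercase auxiliaries and rewrites only the first match, in either direction.

```python
import re

# Hypothetical rule table: affirmative auxiliary -> negated form.
_RULES = {
    "is": "is not", "are": "are not", "was": "was not", "were": "were not",
    "can": "cannot", "will": "will not", "should": "should not",
}
# Reverse table so an already negated sentence is turned back into an affirmative one.
_REVERSE = {neg: pos for pos, neg in _RULES.items()}

def negate(sentence: str) -> str:
    """Toggle negation on the first matching auxiliary; return the input if no rule applies."""
    # Try to remove an existing negation first (longest patterns first).
    for neg, pos in sorted(_REVERSE.items(), key=lambda kv: -len(kv[0])):
        pattern = r"\b" + re.escape(neg) + r"\b"
        if re.search(pattern, sentence):
            return re.sub(pattern, pos, sentence, count=1)
    # Otherwise insert a negation after the first known auxiliary.
    for pos, neg in _RULES.items():
        pattern = r"\b" + re.escape(pos) + r"\b"
        if re.search(pattern, sentence):
            return re.sub(pattern, neg, sentence, count=1)
    return sentence

pair = ("This is correct.", negate("This is correct."))  # a (sentence, negated) training pair
```

Pairs such as `pair` are the kind of minimally different, meaning-flipping examples a negation-aware metric should score as dissimilar.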

Translated: 2023-07-27 13:16:25 Published: 2023-07-26

# Hybrid Representation-Enhanced Sampling for Bayesian Active Learning in Musculoskeletal Segmentation of Lower Extremities (http://arxiv.org/abs/2307.13986v1)

License: see the arXiv link

Ganping Li, Yoshito Otake, Mazen Soufi, Masashi Taniguchi, Masahide Yagi, Noriaki Ichihashi, Keisuke Uemura, Masaki Takao, Nobuhiko Sugano, Yoshinobu Sato

Purpose: Obtaining manual annotations to train deep learning (DL) models for auto-segmentation is often time-consuming. Uncertainty-based Bayesian active learning (BAL) is a widely adopted method to reduce annotation effort. Building on BAL, this study introduces a hybrid representation-enhanced sampling strategy that integrates density and diversity criteria to save manual annotation costs by efficiently selecting the most informative samples. Methods: Experiments are performed on two lower extremity (LE) datasets of MRI and CT images using a BAL framework based on a Bayesian U-net. Our method selects uncertain samples with high density and diversity for manual revision, optimizing for maximal similarity to unlabeled instances and minimal similarity to existing training data. We assess accuracy and efficiency using Dice and a proposed metric called reduced annotation cost (RAC), respectively. We further evaluate the impact of various acquisition rules on BAL performance and design an ablation study to estimate effectiveness. Results: The proposed method was superior or non-inferior to other methods on both datasets across two acquisition rules, and quantitative results reveal the pros and cons of the acquisition rules. Our ablation study in volume-wise acquisition shows that combining the density and diversity criteria outperforms using either of them alone in musculoskeletal segmentation. Conclusion: Our sampling method proved efficient in reducing annotation costs in image segmentation tasks. The combination of the proposed method and our BAL framework provides a semi-automatic way to annotate medical image datasets efficiently.
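The selection rule described above (uncertain samples that are similar to the unlabeled pool but dissimilar to the training data) can be sketched as a greedy score. The sketch rests on several assumptions:

- Density is taken as the mean cosine similarity to the other unlabeled samples.
- Diversity is taken as a penalty on the closest labeled or already chosen sample.
- The scores are combined by plain subtraction.

The function names and the uncertainty pre-filter size are hypothetical, not the authors' exact formulation.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_batch(unlabeled: List[Sequence[float]], uncertainty: Sequence[float],
                 labeled: List[Sequence[float]], n_candidates: int, batch_size: int) -> List[int]:
    """Greedy hybrid selection: uncertainty pre-filter, then density minus redundancy."""
    # 1) Keep the n_candidates most uncertain samples (e.g. by MC-dropout variance).
    cand = sorted(range(len(unlabeled)), key=lambda i: -uncertainty[i])[:n_candidates]
    chosen: List[int] = []
    while len(chosen) < min(batch_size, len(cand)):
        best, best_score = -1, -math.inf
        for i in cand:
            if i in chosen:
                continue
            # 2) Density: mean similarity to all other unlabeled samples.
            sims = [cosine(unlabeled[i], unlabeled[j]) for j in range(len(unlabeled)) if j != i]
            density = sum(sims) / len(sims) if sims else 0.0
            # 3) Diversity: penalise the closest labeled or already chosen sample.
            ref = list(labeled) + [unlabeled[j] for j in chosen]
            redundancy = max((cosine(unlabeled[i], r) for r in ref), default=0.0)
            if density - redundancy > best_score:
                best, best_score = i, density - redundancy
        chosen.append(best)
    return chosen
```

In practice the vectors would be feature embeddings of images or volumes. Raw toy vectors are used here only to keep the example self-contained.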

Translated: 2023-07-27 13:16:15 Published: 2023-07-26

# Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? (http://arxiv.org/abs/2307.14023v1)

License: see the arXiv link

Tokio Kajitsuka and Issei Sato

Existing analyses of the expressive capacity of Transformer models have required excessively deep layers for data memorization, leading to a discrepancy with the Transformers actually used in practice. This is primarily due to interpreting the softmax function as an approximation of the hardmax function. By clarifying the connection between the softmax function and the Boltzmann operator, we prove that a single layer of self-attention with low-rank weight matrices can perfectly capture the context of an entire input sequence. As a consequence, we show that a single-layer Transformer has memorization capacity for finite samples, and that Transformers consisting of one self-attention layer with two feed-forward neural networks are universal approximators for continuous functions on a compact domain.
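The Boltzmann operator mentioned above is the softmax-weighted average boltz_beta(x) = sum_i x_i exp(beta x_i) / sum_j exp(beta x_j). The small sketch below (function names are illustrative) shows why softmax need not be read only as a hardmax approximation. With beta = 0 it returns the mean, a large positive beta approaches the max, and a large negative beta approaches the min. The sketch does not reproduce the paper's proof.

```python
import math
from typing import List, Sequence

def softmax(xs: Sequence[float], beta: float = 1.0) -> List[float]:
    scaled = [beta * x for x in xs]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def boltzmann(xs: Sequence[float], beta: float) -> float:
    """Boltzmann operator: the softmax(beta * x)-weighted average of x."""
    return sum(w * x for w, x in zip(softmax(xs, beta), xs))

# Interpolates smoothly between min (beta -> -inf), mean (beta = 0) and max (beta -> +inf).
values = [boltzmann([1.0, 2.0, 3.0], b) for b in (-50.0, 0.0, 50.0)]
```

Attention for a single query computes exactly this kind of softmax-weighted average, with values in place of the scores themselves.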

Translated: 2023-07-27 13:10:11 Published: 2023-07-26

# Efficiency Optimization in Quantum Computing: Balancing Thermodynamics and Computational Performance (http://arxiv.org/abs/2307.14022v1)

License: see the arXiv link

Tomasz Śmierzchalski, Zakaria Mzaouali, Sebastian Deffner, Bartłomiej Gardas

We investigate the computational efficiency and thermodynamic cost of the D-Wave quantum annealer under reverse-annealing with and without pausing. Our experimental results demonstrate that the combination of reverse-annealing and pausing leads to improved computational efficiency while minimizing the thermodynamic cost compared to reverse-annealing alone. Moreover, we find that the magnetic field has a positive impact on the performance of the quantum annealer during reverse-annealing but becomes detrimental when pausing is involved. Our results provide strategies for optimizing the performance and energy consumption of quantum annealing systems employing reverse-annealing protocols.
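Reverse annealing runs backwards from s = 1 (the classical end of the anneal) to some intermediate s_target, optionally pauses there, and then anneals forward to s = 1 again. Such schedules are commonly written as piecewise-linear lists of (time, s) points. The helper below is a hypothetical sketch that builds such a list with and without a pause. It does not call any vendor API, and the parameter values are arbitrary.

```python
from typing import List, Tuple

def reverse_anneal_schedule(s_target: float, ramp_us: float,
                            pause_us: float = 0.0) -> List[Tuple[float, float]]:
    """Piecewise-linear (time_us, s) points for a reverse anneal with an optional pause."""
    if not 0.0 <= s_target <= 1.0:
        raise ValueError("s_target must lie in [0, 1]")
    points = [(0.0, 1.0), (ramp_us, s_target)]  # start classical, ramp backwards
    if pause_us > 0:
        points.append((ramp_us + pause_us, s_target))  # hold s fixed (the pause)
    points.append((points[-1][0] + ramp_us, 1.0))  # ramp forward back to s = 1
    return points

with_pause = reverse_anneal_schedule(0.45, 5.0, 10.0)
without_pause = reverse_anneal_schedule(0.45, 5.0)
```

Comparing runs of the two schedules at the same s_target is the kind of "with and without pausing" contrast the abstract describes.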

Translated: 2023-07-27 13:09:57 Published: 2023-07-26

# Retinotopy Inspired Brain Encoding Model and the All-for-One Training Recipe (http://arxiv.org/abs/2307.14021v1)

License: see the arXiv link

Huzheng Yang, Jianbo Shi, James Gee

Brain encoding models aim to predict voxel-wise brain responses to stimulus images, replicating brain signals captured by neuroimaging techniques. There is a large volume of publicly available data, but training a comprehensive brain encoding model is challenging. The main difficulties stem from a) diversity within an individual brain, with functionally heterogeneous brain regions; b) diversity of brains across subjects, due to genetic and developmental differences; and c) diversity of imaging modalities and processing pipelines. We use this diversity to our advantage by introducing the All-for-One training recipe, which divides the challenging one-big-model problem into multiple small models, with the small models aggregating knowledge while preserving the distinction between different functional regions. Independently of the training recipe, we use biological knowledge of the brain, specifically retinotopy, to introduce an inductive bias for learning a 3D brain-to-image mapping that ensures a) each neuron knows from which image regions and semantic levels to gather information, and b) no neurons are left behind in the model. We pre-trained a brain encoding model using over one million data points from five public datasets spanning three imaging modalities. To the best of our knowledge, this is the most comprehensive brain encoding model to date. We demonstrate the effectiveness of the pre-trained model as a drop-in replacement for commonly used vision backbone models. Furthermore, we demonstrate the application of the model to brain decoding. Code and the model checkpoint will be made available.
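One simple way to encode "each neuron knows which image region to gather information from" is to give every voxel a 2D receptive-field location. The voxel's response is then read out by sampling backbone feature maps at that location, one weight per feature level (semantic level). The sketch below assumes bilinear sampling and a linear readout. The names `bilinear_sample` and `voxel_response` are hypothetical, and this is not the paper's exact architecture.

```python
from typing import List, Sequence

def bilinear_sample(fmap: List[List[float]], x: float, y: float) -> float:
    """Sample a 2D feature map at continuous (x, y); x indexes columns, y rows (clamped)."""
    h, w = len(fmap), len(fmap[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def voxel_response(fmaps: Sequence[List[List[float]]], x: float, y: float,
                   level_weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical readout: one retinotopic location per voxel, one weight per feature level."""
    return bias + sum(wt * bilinear_sample(f, x, y) for wt, f in zip(level_weights, fmaps))

fmap = [[0.0, 1.0], [2.0, 3.0]]
center = bilinear_sample(fmap, 0.5, 0.5)  # average of the four cells
```

In a trained model, (x, y) and the level weights would be learnable per voxel. Fixing them here only shows the readout mechanics.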

Translated: 2023-07-27 13:09:45 Published: 2023-07-26

# One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration (http://arxiv.org/abs/2307.14019v1)

License: see the arXiv link

Yongzhe Yuan, Yue Wu, Maoguo Gong, Qiguang Miao and A. K. Qin

The precision of unsupervised point cloud registration methods is typically limited by the lack of reliable inlier estimation and self-supervised signals, especially in partially overlapping scenarios. In this paper, we propose an effective inlier estimation method for unsupervised point cloud registration that captures geometric structure consistency between the source point cloud and its corresponding reference point cloud copy. Specifically, to obtain a high-quality reference point cloud copy, a One-Nearest Neighborhood (1-NN) point cloud is generated from the input point cloud. This facilitates matching map construction and allows the dual neighborhood matching scores of the 1-NN point cloud and the input point cloud to be integrated to improve matching confidence. Benefiting from the high-quality reference copy, we argue that the neighborhood graph formed by an inlier and its neighbors should be consistent between the source point cloud and its corresponding reference copy. Based on this observation, we construct transformation-invariant geometric structure representations and capture geometric structure consistency to score the inlier confidence of the estimated correspondences between the source point cloud and its reference copy. This strategy simultaneously provides a reliable self-supervised signal for model optimization. Finally, we compute the transformation with the weighted SVD algorithm using the estimated correspondences and their inlier confidences. We train the proposed model in an unsupervised manner, and extensive experiments on synthetic and real-world datasets illustrate the effectiveness of the proposed method.
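The final step, a weighted SVD over correspondences with inlier confidences as weights, minimizes sum_i w_i |R p_i + t - q_i|^2. In 3D this is the weighted Kabsch solution. To stay dependency-free, the sketch below uses the 2D analogue, where the SVD solution reduces to a closed-form angle from the weighted cross-covariance. It shows how a zero-confidence outlier drops out. The function name is hypothetical, and this is not the paper's 3D implementation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def weighted_rigid_2d(src: Sequence[Point], dst: Sequence[Point],
                      w: Sequence[float]) -> Tuple[float, Point]:
    """Angle and translation minimising sum_i w_i * |R(theta) src_i + t - dst_i|^2."""
    W = sum(w)
    # Weighted centroids of both point sets.
    cs = (sum(wi * p[0] for wi, p in zip(w, src)) / W, sum(wi * p[1] for wi, p in zip(w, src)) / W)
    cd = (sum(wi * q[0] for wi, q in zip(w, dst)) / W, sum(wi * q[1] for wi, q in zip(w, dst)) / W)
    c = s = 0.0
    for wi, p, q in zip(w, src, dst):
        px, py = p[0] - cs[0], p[1] - cs[1]
        qx, qy = q[0] - cd[0], q[1] - cd[1]
        c += wi * (px * qx + py * qy)  # weighted cross-covariance, "cosine" part
        s += wi * (px * qy - py * qx)  # weighted cross-covariance, "sine" part
    theta = math.atan2(s, c)
    ct, st = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    t = (cd[0] - (ct * cs[0] - st * cs[1]), cd[1] - (st * cs[0] + ct * cs[1]))
    return theta, t
```

Low inlier confidence shrinks a correspondence's weight, so a spurious match barely moves the estimate. At weight 0 it is ignored entirely.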

Translated: 2023-07-27 13:09:23 Published: 2023-07-26

# Erbium emitters in commercially fabricated nanophotonic silicon waveguides (http://arxiv.org/abs/2307.14017v1)

License: see the arXiv link

Stephan Rinner, Florian Burger, Andreas Gritsch, Jonas Schmitt, Andreas Reiserer

Quantum memories integrated into nanophotonic silicon devices are a promising platform for large quantum networks and scalable photonic quantum computers. In this context, erbium dopants are particularly attractive, as they combine optical transitions in the telecommunications frequency band with the potential for second-long coherence times. Here we show that these emitters can be reliably integrated into commercially fabricated low-loss waveguides. We investigate several integration procedures and obtain ensembles of many emitters with an inhomogeneous broadening of < 2 GHz and a homogeneous linewidth of < 30 kHz. We further observe the splitting of the electronic spin states in a magnetic field of up to 9 T that freezes paramagnetic impurities. Our findings are an important step towards long-lived quantum memories that can be fabricated at wafer scale using CMOS technology.

Translated: 2023-07-27 13:08:58 Published: 2023-07-26

# RPG-Palm: Realistic Pseudo-data Generation for Palmprint Recognition (http://arxiv.org/abs/2307.14016v1)

License: see the arXiv link

Lei Shen, Jianlong Jin, Ruixin Zhang, Huaen Li, Yingyi Zhang, Jingyun Zhang, Shouhong Ding, Yang Zhao, Wei Jia

Palmprint has recently shown great potential in recognition applications as a privacy-friendly and stable biometric. However, the lack of large-scale public palmprint datasets limits further research and development of palmprint recognition. In this paper, we propose a novel realistic pseudo-palmprint generation (RPG) model to synthesize palmprints with massive identities. We first introduce a conditional modulation generator to improve intra-class diversity. Then an identity-aware loss is proposed to ensure identity consistency under unpaired training. We further improve the Bézier palm crease generation strategy to guarantee identity independence. Extensive experimental results demonstrate that synthetic pretraining significantly boosts recognition model performance. For example, our model improves on the state-of-the-art BézierPalm by more than 5% and 14% in terms of TAR@FAR=1e-6 under the 1:1 and 1:3 open-set protocols. When accessing only 10% of the real training data, our method still outperforms ArcFace trained with 100% of the real training data, indicating that we are closer to real-data-free palmprint recognition.
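The metric quoted above, TAR@FAR, is the true accept rate on genuine pairs at the loosest similarity threshold whose false accept rate on impostor pairs does not exceed the target. A minimal sketch follows. It assumes "accept" means a score strictly above the threshold. The function name is hypothetical, and benchmarks may break ties differently.

```python
import math
from typing import Sequence

def tar_at_far(genuine: Sequence[float], impostor: Sequence[float], far: float) -> float:
    """True accept rate at the loosest threshold whose false accept rate is <= far."""
    imp = sorted(impostor, reverse=True)
    allowed = math.floor(far * len(imp))  # max number of impostor pairs we may accept
    # Threshold at the (allowed+1)-th highest impostor score; -inf if all may pass.
    thr = imp[allowed] if allowed < len(imp) else -math.inf
    return sum(1 for g in genuine if g > thr) / len(genuine)

genuine, impostor = [0.35, 0.5, 0.6], [0.1, 0.2, 0.3, 0.4]
```

For a target such as FAR = 1e-6 to be meaningful, `allowed` must be at least 1, which requires on the order of a million impostor pairs. That is one reason large synthetic identity sets are useful for evaluation.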

Translated: 2023-07-27 13:08:44 Published: 2023-07-26

# MCMC-Correction of Score-Based Diffusion Models for Model Composition (http://arxiv.org/abs/2307.14012v1)

License: see the arXiv link

Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand and Lennart Svensson

Diffusion models can be parameterised in terms of either a score or an energy function. The energy parameterisation has better theoretical properties, mainly that it enables an extended sampling procedure with a Metropolis–Hastings correction step based on the change in total energy in the proposed samples. However, it seems to yield slightly worse performance and, more importantly, due to the widespread popularity of score-based diffusion, few off-the-shelf pre-trained energy-based models are available. This limitation undermines the purpose of model composition, which aims to combine pre-trained models to sample from new distributions. Our proposal instead retains the score parameterisation and computes the energy-based acceptance probability through line integration of the score function. This allows us to re-use existing diffusion models and still combine the reverse process with various Markov chain Monte Carlo (MCMC) methods. We evaluate our method on a 2D experiment and find that it achieves similar or arguably better performance than the energy parameterisation.
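The line-integration idea rests on the identity log p(x') - log p(x) = integral from 0 to 1 of s(x + tau (x' - x)) . (x' - x) dtau, where s = grad log p is the score. The identity is exact when s is a true gradient field. A learned score need not be conservative, so the straight-line path is itself a modelling choice. The sketch approximates the integral with the trapezoidal rule and plugs it into a Metropolis–Hastings acceptance probability. Function names and the quadrature resolution are assumptions, and the diffusion time index is omitted.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

def log_density_diff(score: Callable[[Vector], List[float]], x: Vector, x_new: Vector,
                     n: int = 16) -> float:
    """Approximate log p(x_new) - log p(x) by integrating the score along the straight line."""
    d = [b - a for a, b in zip(x, x_new)]

    def integrand(tau: float) -> float:
        point = [a + tau * di for a, di in zip(x, d)]
        return sum(s * di for s, di in zip(score(point), d))

    h = 1.0 / n  # composite trapezoidal rule on n sub-intervals
    total = 0.5 * (integrand(0.0) + integrand(1.0))
    total += sum(integrand(k * h) for k in range(1, n))
    return total * h

def mh_accept_prob(score, x: Vector, x_new: Vector, log_q_ratio: float = 0.0) -> float:
    """min(1, p(x')q(x|x') / (p(x)q(x'|x))); log_q_ratio = log q(x|x') - log q(x'|x)."""
    log_alpha = log_density_diff(score, x, x_new) + log_q_ratio
    return 1.0 if log_alpha >= 0 else math.exp(log_alpha)

gauss_score = lambda v: [-vi for vi in v]  # score of a standard normal: -x
```

For the standard normal, moving from x = 1 to x' = 2 gives log p(2) - log p(1) = -1.5. A symmetric proposal is then accepted with probability exp(-1.5).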

Translated: 2023-07-27 13:08:25 Published: 2023-07-26

# ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution (http://arxiv.org/abs/2307.14010v1)

License: see the arXiv link

Mingjin Zhang, Chi Zhang, Qiming Zhang, Jie Guo, Xinbo Gao, Jing Zhang

Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation. However, the prevailing CNN-based approaches have shown limitations in building long-range dependencies and capturing interaction information between spectral features. This results in inadequate utilization of spectral information and artifacts after upsampling. To address this issue, we propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure. Specifically, we first introduce a robust and spectral-friendly similarity metric, i.e., the spectral correlation coefficient of the spectrum (SCC), to replace the original attention matrix and incorporate inductive biases into the model to facilitate training. Building on it, we further utilize the kernelizable attention technique, with theoretical support, to form a novel efficient SCC-kernel-based self-attention (ESSA) and reduce attention computation to linear complexity. ESSA enlarges the receptive field for features after upsampling without adding much computation and allows the model to effectively utilize spatial-spectral information from different scales, resulting in more natural high-resolution images. Without the need for pretraining on large-scale datasets, our experiments demonstrate ESSA's effectiveness in both visual quality and quantitative results.
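To illustrate what replacing dot-product similarity with a spectral correlation coefficient changes, the sketch assumes SCC takes the Pearson form: mean-centred, normalised correlation between two spectral vectors. That form is invariant to per-token scaling and offsets of the spectrum. The functions are hypothetical. They show quadratic softmax attention only, not the paper's kernelized linear-complexity ESSA.

```python
import math
from typing import List, Sequence

def scc(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Correlation coefficient between two spectra (Pearson form, assumed here)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db)) + eps
    return num / den

def scc_attention_weights(query: Sequence[float], keys: List[Sequence[float]]) -> List[float]:
    """Softmax over SCC similarities instead of scaled dot products (illustrative only)."""
    sims = [scc(query, k) for k in keys]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]

# A brighter copy of the same spectral shape is still maximally similar.
same_shape = scc([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
```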

Translated: 2023-07-27 13:08:09 Published: 2023-07-26

# Car-Studio: Learning Car Radiance Fields from Single-View and Endless In-the-wild Images (http://arxiv.org/abs/2307.14009v1)

License: see the arXiv link

Tianyu Liu, Hao Zhao, Yang Yu, Guyue Zhou, Ming Liu

Compositional neural scene graph studies have shown that radiance fields can be an efficient tool in an editable autonomous driving simulator. However, previous studies learned within a sequence of autonomous driving datasets, resulting in unsatisfactory blurring when rotating the car in the simulator. In this letter, we propose a pipeline for learning from unconstrained images and building a dataset from the processed images. The simulator requires that the vehicle remain clear when the viewpoint changes and that its contour stay sharp against the background, to avoid artifacts when editing. To meet these requirements, we design a radiance field for the vehicle, a crucial part of the urban scene foreground. Through experiments, we demonstrate that our model achieves competitive performance compared to baselines. Using the datasets built from in-the-wild images, our method gradually presents a controllable appearance editing function. We will release the dataset and code at https://lty2226262.github.io/car-studio/ to facilitate further research in the field.
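Radiance-field models of this kind render a pixel by compositing density and colour samples along the camera ray. The sketch below implements the standard NeRF-style quadrature: C = sum_i T_i (1 - exp(-sigma_i delta_i)) c_i, where T_i = exp(-sum_{j<i} sigma_j delta_j). It is the generic rendering step, not Car-Studio's model, and the sample values are made up.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]

def render_ray(sigmas: Sequence[float], colors: Sequence[Color],
               deltas: Sequence[float]) -> Color:
    """Alpha-composite samples along a ray (front to back) with the volume rendering quadrature."""
    out: List[float] = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: fraction of light that reaches sample i unoccluded
    for sigma, c, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * c[k]
        transmittance *= 1.0 - alpha  # equals multiplying by exp(-sigma * delta)
    return (out[0], out[1], out[2])

red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
opaque_front = render_ray([1e3, 1e3], [red, green], [1.0, 1.0])  # the front sample hides the back one
```

A sharp vehicle contour corresponds to density rising abruptly at the surface, so that `alpha` jumps from about 0 to about 1 between neighbouring samples.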

Translated: 2023-07-27 13:07:43 Published: 2023-07-26

# Adaptive Frequency Filters As Efficient Global Token Mixers (http://arxiv.org/abs/2307.14008v1)

License: see the arXiv link

Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Zheng-Jun Zha, Yan Lu, Baining Guo

Recent vision transformers, large-kernel CNNs and MLPs have attained remarkable successes in broad vision tasks thanks to their effective information fusion in the global scope. However, their efficient deployment, especially on mobile devices, still faces notable challenges due to the heavy computational costs of self-attention mechanisms, large kernels, or fully connected layers. In this work, we apply the conventional convolution theorem to deep learning to address this, and reveal that adaptive frequency filters can serve as efficient global token mixers. With this insight, we propose the Adaptive Frequency Filtering (AFF) token mixer. This neural operator transfers a latent representation to the frequency domain via a Fourier transform and performs semantic-adaptive frequency filtering via elementwise multiplication. This is mathematically equivalent to a token mixing operation in the original latent space with a dynamic convolution kernel as large as the spatial resolution of this latent representation. We take AFF token mixers as primary neural operators to build a lightweight neural network, dubbed AFFNet. Extensive experiments demonstrate the effectiveness of our proposed AFF token mixer and show that AFFNet achieves superior accuracy-efficiency trade-offs compared to other lightweight network designs on broad visual tasks, including visual recognition and dense prediction tasks.
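The equivalence the AFF mixer relies on is the convolution theorem. Multiplying a signal's DFT elementwise by a filter and transforming back equals a circular convolution with a kernel as long as the signal itself. The 1D sketch uses a naive O(n^2) DFT for clarity, where real code would use an FFT (2D for feature maps). The filter here is fixed, whereas AFF predicts it from the input. Function names are illustrative.

```python
import cmath
from typing import List, Sequence

def dft(x: Sequence[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X: Sequence[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]

def frequency_filter(tokens: Sequence[float], filt_freq: Sequence[complex]) -> List[float]:
    """Global token mixing: elementwise product with a filter in the frequency domain."""
    X = dft(tokens)
    return [v.real for v in idft([a * b for a, b in zip(X, filt_freq)])]

def circular_conv(x: Sequence[float], k: Sequence[float]) -> List[float]:
    """Spatial-domain circular convolution with a kernel as long as the input."""
    n = len(x)
    return [sum(x[(t - s) % n] * k[s] for s in range(n)) for t in range(n)]

tokens = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25, 0.0, -1.0]
y_freq = frequency_filter(tokens, dft(kernel))  # mixing via the frequency domain
y_conv = circular_conv(tokens, kernel)          # same result via global convolution
```

With an FFT the frequency route costs O(n log n), versus O(n^2) for the equivalent full-size spatial kernel. That is the efficiency argument behind the design.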

Translated: 2023-07-27 13:07:22 Published: 2023-07-26

# Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models (http://arxiv.org/abs/2307.14061v1)

License: see the arXiv link

Dong Lu, Zhiqiang Wang, Teng Wang, Weili Guan, Hongchang Gao, Feng Zheng

Vision-language pre-training (VLP) models have shown vulnerability to adversarial examples in multimodal tasks. Furthermore, malicious adversaries can deliberately transfer such examples to attack other black-box models. However, existing work has mainly focused on investigating white-box attacks. In this paper, we present the first study of the adversarial transferability of recent VLP models. We observe that existing methods exhibit much lower transferability than their strong attack performance in white-box settings. The transferability degradation is partly caused by the under-utilization of cross-modal interactions. In particular, unlike unimodal learning, VLP models rely heavily on cross-modal interactions, and the multimodal alignments are many-to-many, e.g., an image can be described in various natural-language ways. To this end, we propose a highly transferable Set-level Guidance Attack (SGA) that thoroughly leverages modality interactions and incorporates alignment-preserving augmentation with cross-modal guidance. Experimental results demonstrate that SGA can generate adversarial examples that transfer strongly across different VLP models on multiple downstream vision-language tasks. On image-text retrieval, SGA significantly enhances the attack success rate for transfer attacks from ALBEF to TCL by a large margin (at least 9.78% and up to 30.21%) compared to the state of the art.

Translated: 2023-07-27 12:59:36; published: 2023-07-26

# Classification of data with a qudit, a geometric approach

( http://arxiv.org/abs/2307.14060v1 )

License: see the linked page

A. Mandilara, B. Dellen, U. Jaekel, T. Valtinos, D. Syvridis

We propose a model for data classification using isolated quantum $d$-level systems, i.e., qudits. The procedure begins with an encoding phase, in which classical data are mapped onto the surface of the qudit's Bloch hypersphere via rotation encoding. The sphere is then rotated, and a projective measurement is performed. The rotation is adjustable in order to control the operator being measured, while additional weights introduced in the encoding phase adjust the mapping onto the Bloch hypersurface. During training, gradient descent minimizes a cost function based on the average expectation value of the observable, thereby adjusting the weights. Using examples and a numerical estimation of the lossless memory dimension, we demonstrate that this geometrically inspired qudit classification model can solve nonlinear classification problems using only a small number of parameters and without requiring entangling operations.
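
A minimal forward-pass sketch of a qudit classifier follows. It assumes the following form:

- The encoding exponentiates generalized Gell-Mann generators, each weighted by `weights[k] * x[k]`.
- A trainable rotation follows.
- A projective measurement onto the first basis state gives the score.

The generator choice, the `qudit_score`/`cost` names, and the squared-error cost are illustrative assumptions. Only NumPy is used, and gradient-descent training of `weights` and `theta` is left out.

```python
import numpy as np

def gell_mann(d):
    """Generalized Gell-Mann matrices: d*d - 1 traceless Hermitian generators of SU(d)."""
    mats = []
    for j in range(d):
        for k in range(j + 1, d):
            sym = np.zeros((d, d), dtype=complex)
            sym[j, k] = sym[k, j] = 1.0
            anti = np.zeros((d, d), dtype=complex)
            anti[j, k], anti[k, j] = -1j, 1j
            mats += [sym, anti]
    for l in range(1, d):
        diag = np.zeros(d)
        diag[:l] = 1.0
        diag[l] = -l
        mats.append(np.diag(diag * np.sqrt(2.0 / (l * (l + 1)))).astype(complex))
    return mats

def expm_hermitian(h):
    """exp(-i h) for a Hermitian matrix h, via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(h)
    return evecs @ np.diag(np.exp(-1j * evals)) @ evecs.conj().T

def qudit_score(x, weights, theta, d=3):
    """Probability of measuring |0> after encoding x and rotating by theta.

    Assumed form: |psi> = exp(-i sum_j theta_j G_j) exp(-i sum_k w_k x_k G_k) |0>."""
    gens = gell_mann(d)
    h_enc = sum(w * xi * g for w, xi, g in zip(weights, x, gens))
    h_rot = sum(t * g for t, g in zip(theta, gens))
    ket0 = np.zeros(d, dtype=complex)
    ket0[0] = 1.0
    psi = expm_hermitian(h_rot) @ expm_hermitian(h_enc) @ ket0
    return float(abs(psi[0]) ** 2)

def cost(xs, labels, weights, theta, d=3):
    # Mean squared error between |0>-probabilities and 0/1 labels (an assumption).
    preds = np.array([qudit_score(x, weights, theta, d) for x in xs])
    return float(np.mean((preds - np.asarray(labels)) ** 2))
```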

Translated: 2023-07-27 12:59:13; published: 2023-07-26

# Towards Establishing Systematic Classification Requirements for Automated Driving

( http://arxiv.org/abs/2307.14058v1 )

License: see the linked page

Ken T. Mori, Trent Brown, Steven Peters

Despite the presence of classification tasks in many benchmark datasets for perception in the automotive domain, few efforts have been made to define consistent classification requirements. This work addresses the topic by proposing a structured method for generating a classification structure. First, legal categories are identified based on behavioral requirements for the vehicle. This structure is then substantiated by considering two further aspects: collision safety for objects and perceptual categories. A classification hierarchy is obtained by applying the method to an exemplary legal text. Comparing the results with benchmark dataset categories shows limited agreement, which indicates the need to explicitly consider legal requirements regarding perception.

Translated: 2023-07-27 12:58:59; published: 2023-07-26

# Open Image Content Disarm And Reconstruction

( http://arxiv.org/abs/2307.14057v1 )

License: see the linked page

Eli Belkind, Ran Dubin, Amit Dvir

With advances in malware technology, attackers keep creating new ways to hide their malicious code from antivirus services. One way to obfuscate an attack is to use common files as a cover for malicious scripts, so that the malware looks like a legitimate file. Although cutting-edge artificial intelligence and content-signature detection exist, evasive malware successfully bypasses next-generation malware detection using advanced methods such as steganography. Image files (e.g., JPEG) are among the files commonly used to hide malware. In addition, some malware uses steganography to hide malicious scripts or sensitive data in images, and steganography in images is difficult to detect even with specialized tools. Image-based attacks either try to attack the user's device with malicious payloads or use image steganography to hide sensitive data inside legitimate images and leak it off the user's device. Therefore, in this paper we present a novel Image Content Disarm and Reconstruction (ICDR) system. Our ICDR system removes potential malware using a zero-trust approach while maintaining high image quality and file usability. By extracting the image data, separating it from the rest of the file, and manipulating the image pixels, it is possible to disable or remove malware hidden inside the file.
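
One common way to disarm LSB steganography is to rebuild an image from its decoded pixels only and re-randomize the least-significant bit plane. The sketch below assumes the image has already been decoded into an 8-bit array. `disarm_pixels` and `reconstruct` are hypothetical helpers. LSB randomization is a stand-in for the pixel manipulation used by ICDR, not a description of it.

```python
import numpy as np

def disarm_pixels(pixels, seed=None):
    """Return a new uint8 array whose least-significant bit plane is replaced
    by random bits. Any payload hidden in the LSBs is destroyed, while every
    channel value changes by at most 1."""
    pixels = np.asarray(pixels)
    if pixels.dtype != np.uint8:
        raise TypeError("expected 8-bit pixel data")
    rng = np.random.default_rng(seed)
    noise = rng.integers(0, 2, size=pixels.shape, dtype=np.uint8)
    return (pixels & np.uint8(0xFE)) | noise

def reconstruct(decoded_pixels, seed=None):
    """Hypothetical CDR step: only the decoded pixel array is carried over.
    Metadata, comments, and trailing bytes of the original file are never
    copied; a fresh image file would then be encoded from the result."""
    return np.ascontiguousarray(disarm_pixels(decoded_pixels, seed))
```

Keeping only the pixel array follows a zero-trust stance: nothing from the original container is trusted or copied forward.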

Translated: 2023-07-27 12:58:49; published: 2023-07-26

# Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy Dichotomous Image Segmentation

( http://arxiv.org/abs/2307.14052v1 )

License: see the linked page

Jialun Pei, Zhangjun Zhou, Yueming Jin, He Tang, Pheng-Ann Heng

High-accuracy Dichotomous Image Segmentation (DIS) aims to pinpoint category-agnostic foreground objects in natural scenes. The main challenge of DIS is to identify the dominant area with high accuracy while also rendering detailed object structure. However, directly using a general encoder-decoder architecture may produce an oversupply of high-level features and neglect the shallow spatial information needed to segment fine structures. To fill this gap, we introduce a novel Unite-Divide-Unite Network (UDUN). UDUN restructures complementary features and arranges them bipartitely to boost the effectiveness of both trunk and structure identification. The proposed UDUN has several strengths. First, dual-size inputs are fed into a shared backbone to produce more holistic and detailed features while keeping the model lightweight. Second, a simple Divide-and-Conquer Module (DCM) decouples multiscale low- and high-level features into our structure decoder and trunk decoder to obtain structure and trunk information, respectively. Moreover, we design a Trunk-Structure Aggregation module (TSA) in our union decoder that performs cascade integration for uniform high-accuracy segmentation. As a result, UDUN performs favorably against state-of-the-art competitors on all six evaluation metrics on overall DIS-TE, achieving a weighted F-measure of 0.772 and an HCE of 977. With 1024×1024 inputs, our model enables real-time inference at 65.3 fps with ResNet-18.

Translated: 2023-07-27 12:58:29; published: 2023-07-26

# 3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability

( http://arxiv.org/abs/2307.14051v1 )

License: see the linked page

Ruowei Wang, Yu Liu, Pei Su, Jianwei Zhang, Qijun Zhao

Shape generation is the practice of producing 3D shapes in various representations for 3D content creation. Previous studies on 3D shape generation have focused on shape quality and structure, with little or no consideration of semantic information. Consequently, such generative models often fail to preserve the semantic consistency of shape structure or to enable manipulation of the semantic attributes of shapes during generation. In this paper, we propose a novel semantic generative model named 3D Semantic Subspace Traverser that utilizes semantic attributes for category-specific 3D shape generation and editing. Our method uses implicit functions as the 3D shape representation. It combines a novel latent-space GAN with a linear subspace model to discover semantic dimensions in the local latent space of 3D shapes. Each dimension of the subspace corresponds to a particular semantic attribute, and the attributes of generated shapes can be edited by traversing the coefficients of those dimensions. Experimental results demonstrate that our method can produce plausible shapes with complex structures and enables the editing of semantic attributes. The code and trained models are available at https://github.com/TrepangCat/3D_Semantic_Subspace_Traverser
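
The linear-subspace editing step can be illustrated with a small NumPy class. The sketch makes three assumptions. Latent codes are plain vectors. The subspace is given by a mean and a basis, orthonormalized here with QR. Editing one attribute means moving along one basis direction. `LinearSubspace` and `traverse` are hypothetical names, and decoding the edited latents into shapes is out of scope.

```python
import numpy as np

class LinearSubspace:
    """Toy linear subspace in a latent space: z = mean + U @ c + residual."""

    def __init__(self, mean, basis):
        q, _ = np.linalg.qr(basis)   # orthonormal columns, shape (latent_dim, k)
        self.mean = np.asarray(mean, dtype=float)
        self.U = q

    def coeffs(self, z):
        # Coordinates of z along each subspace dimension.
        return self.U.T @ (z - self.mean)

    def traverse(self, z, dim, values):
        """Set coefficient `dim` to each value in `values`, leaving the other
        coefficients and the off-subspace residual of z unchanged."""
        current = self.coeffs(z)[dim]
        u = self.U[:, dim]
        return np.stack([z + (v - current) * u for v in values])
```

Moving only along `U[:, dim]` keeps every other coefficient fixed, so one attribute can change while the rest of the latent code stays put.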

Translated: 2023-07-27 12:57:58; published: 2023-07-26

# Controllable Guide-Space for Generalizable Face Forgery Detection

( http://arxiv.org/abs/2307.14039v1 )

License: see the linked page

Ying Guo, Cheng Zhen, Pengfei Yan

Recent face forgery detection methods perform satisfactorily on the domains covered by their training datasets but generalize poorly to unknown domains. This has motivated many works on improving generalization. However, forgery-irrelevant information, such as image background and identity, still remains in the features of different domains and causes unexpected clustering, which limits generalization. In this paper, we propose a controllable guide-space (GS) method that enhances the discrimination of different forgery domains, increasing the forgery relevance of features and thereby improving generalization. The guide-space is designed to achieve, explicitly and controllably, both a proper separation between forgery domains and a large distance between the real and forgery domains. Moreover, for better discrimination, we use a decoupling module to weaken the interference of forgery-irrelevant correlations between domains. Furthermore, we adjust the decision boundary manifold according to the degree of clustering of same-domain features within a neighborhood. Extensive experiments in multiple in-domain and cross-domain settings confirm that our method achieves state-of-the-art generalization.

Translated: 2023-07-27 12:57:37; published: 2023-07-26

# Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

( http://arxiv.org/abs/2307.14031v1 )

License: see the linked page

Songbo Hu, Han Zhou, Mete Hergul, Milan Gritta, Guchun Zhang, Ignacio Iacobacci, Ivan Vuli\'c, Anna Korhonen

Creating high-quality annotated data for task-oriented dialog (ToD) is notoriously difficult. The challenges are amplified when the goal is to create equitable, culturally adapted, large-scale ToD datasets for multiple languages. Consequently, such datasets remain very scarce and suffer from limitations such as translation-based non-native dialogs with translation artefacts, small scale, or lack of cultural adaptation, among others. In this work, we first take stock of the current landscape of multilingual ToD datasets, offering a systematic overview of their properties and limitations. Aiming to reduce all the detected limitations, we then introduce Multi3WOZ, a novel multilingual, multi-domain, multi-parallel ToD dataset. It is large-scale and offers culturally adapted dialogs in 4 languages to enable the training and evaluation of multilingual and cross-lingual ToD systems. We describe the complex bottom-up data collection process that yielded the final dataset. We also provide the first sets of baseline scores across different ToD-related tasks for future reference, which highlight the dataset's challenging nature.

Translated: 2023-07-27 12:57:19; published: 2023-07-26

# Consensus-Adaptive RANSAC

( http://arxiv.org/abs/2307.14030v1 )

License: see the linked page

Luca Cavalli, Daniel Barath, Marc Pollefeys, Viktor Larsson

RANSAC and its variants are widely used for robust estimation; however, they commonly follow a greedy approach that seeks the highest-scoring model while ignoring other model hypotheses. In contrast, Iteratively Reweighted Least Squares (IRLS) techniques gradually approach the model by iteratively updating the weight of each correspondence based on the residuals from previous iterations. Inspired by these methods, we propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer. The attention mechanism operates on a batch of point-to-model residuals. It updates a per-point estimation state that takes into account the consensus found, using a lightweight one-step transformer. This rich state then guides both the minimal sampling between iterations and the model refinement. We evaluate the proposed approach on essential and fundamental matrix estimation on a number of indoor and outdoor datasets. It outperforms state-of-the-art estimators by a significant margin while adding only a small runtime overhead. Moreover, we demonstrate good generalization properties of our trained model across different datasets and tasks. The proposed attention mechanism and one-step transformer provide adaptive behavior that enhances the performance of RANSAC, making it a more effective tool for robust estimation. Code is available at https://github.com/cavalli1234/CA-RANSAC.
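
To make the RANSAC/IRLS contrast concrete, here is a minimal IRLS line fit with Cauchy weights. Instead of keeping one best hypothesis, it updates every point's weight from that point's latest residual. The Cauchy weight function, the scale `c`, and the 2D line model are illustrative choices. CA-RANSAC itself replaces hand-crafted weighting of this kind with a learned attention layer.

```python
import numpy as np

def irls_line(x, y, iters=30, c=1.0):
    """Fit y = a*x + b by IRLS with Cauchy weights w = 1 / (1 + (r/c)^2)."""
    A = np.column_stack([x, np.ones_like(x)])
    w = np.ones(len(x))
    params = np.zeros(2)
    for _ in range(iters):
        sw = np.sqrt(w)
        # Weighted least squares: minimize sum_i w_i * (y_i - A_i @ params)^2.
        params, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = y - A @ params
        w = 1.0 / (1.0 + (r / c) ** 2)   # softly down-weight large residuals
    return params
```

The first iteration is ordinary least squares, so it is pulled toward the outliers. The following iterations shrink the outliers' weights until the fit settles on the inliers.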

Translated: 2023-07-27 12:56:57; published: 2023-07-26

# Topologically-Regularized Multiple Instance Learning for Red Blood Cell Disease Classification

( http://arxiv.org/abs/2307.14025v1 )

License: see the linked page

Salome Kazeminia, Ario Sadafi, Asya Makhro, Anna Bogdanova, Carsten Marr, Bastian Rieck

Diagnosing rare anemia disorders from microscopic images is challenging for skilled specialists and machine-learning methods alike. Because a single blood sample contains thousands of disease-relevant cells, this constitutes a complex multiple-instance learning (MIL) problem. The spatial neighborhood of red blood cells is not meaningful per se. However, the topology, i.e., the geometry of a blood sample as a whole, contains informative features. These features can remedy typical MIL issues such as vanishing gradients and overfitting when training on limited data. We thus develop a topology-based approach that extracts multi-scale topological features from bags of single red blood cell images. The topological features are used to regularize the model, enforcing the preservation of characteristic topological properties of the data. We apply the approach to a dataset of 71 patients suffering from rare anemia disorders, with 521 microscopic images of red blood cells. Our experiments show that topological regularization is an effective method, yielding performance improvements of more than 3% for the automated classification of rare anemia disorders from single-cell images. This is the first approach to use topological properties for regularizing the MIL process.
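
Multi-scale topological features are often built from persistent homology. As a minimal sketch, Kruskal's minimum-spanning-tree algorithm computes the 0-dimensional persistence of a Vietoris-Rips filtration, because each merge of two connected components ends one bar. The `topo_loss` penalty below is a hypothetical, simplified, non-differentiable stand-in for a topological regularizer, not the loss used in the paper.

```python
import numpy as np

def zero_dim_persistence(points):
    """Death times of the 0-dimensional homology classes of a Vietoris-Rips
    filtration (all births are 0), via Kruskal's algorithm with union-find."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = sorted((d[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # merging two components ends one bar
            parent[ri] = rj
            deaths.append(float(w))
    return np.array(deaths)                 # n - 1 finite bars, in ascending order

def topo_loss(inputs, latents):
    """Hypothetical penalty: squared difference between the scale-normalized
    0-dimensional death times of the inputs and of their latent codes."""
    a = zero_dim_persistence(inputs)
    b = zero_dim_persistence(latents)
    a = a / (a.max() + 1e-12)
    b = b / (b.max() + 1e-12)
    return float(np.mean((a - b) ** 2))
```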

Translated: 2023-07-27 12:56:34; published: 2023-07-26

# Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks

( http://arxiv.org/abs/2307.14085v1 )

License: see the linked page

Siyu Chen, Mengdi Wang, Zhuoran Yang

We study reinforcement learning (RL) for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure. Specifically, at the outset of the game, the leader announces her policy to the follower and commits to it. The follower observes the leader's policy and, in turn, adopts a quantal response policy by solving an entropy-regularized policy optimization problem induced by the leader's policy. The leader's goal is to find her optimal policy, which yields the optimal expected total return, by interacting with the follower and learning from data. A key challenge of this problem is that the leader cannot observe the follower's reward. She must therefore infer the follower's quantal response model from his actions in response to her policies. We propose sample-efficient algorithms for both the online and offline settings in the context of function approximation. Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision-making problem. We show that they achieve sublinear regret upper bounds. Moreover, we quantify the uncertainty of these estimators and leverage it to implement optimistic and pessimistic algorithms for the online and offline settings, respectively. In addition, when specialized to the linear and myopic setting, our algorithms are also computationally efficient. Our theoretical analysis features a novel performance-difference lemma that incorporates the error of the quantal response model, which might be of independent interest.
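
The two ingredients of step (i) can be sketched directly. The follower's quantal response is a softmax of his action values scaled by an inverse temperature `eta`. The leader can fit the follower's reward by maximum likelihood from observed actions. The sketch assumes a one-step (myopic) follower with a linear reward `features @ theta`. The function names and the plain gradient-ascent optimizer are illustrative.

```python
import numpy as np

def quantal_response(q_values, eta=1.0):
    """Entropy-regularized best response: pi(a) proportional to exp(eta * Q(a))."""
    z = eta * (q_values - np.max(q_values))   # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def fit_follower_reward(features, actions, eta=1.0, lr=0.5, iters=1000):
    """Maximum likelihood estimate of theta for a linear follower reward
    r(a) = features[n, a] @ theta, given the follower's observed actions.

    features: (n_obs, n_actions, dim); actions: (n_obs,) integer indices."""
    n, _, dim = features.shape
    theta = np.zeros(dim)
    chosen = features[np.arange(n), actions]            # (n, dim)
    for _ in range(iters):
        logits = eta * np.einsum("nad,d->na", features, theta)
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        expected = np.einsum("na,nad->nd", p, features)  # E_pi[phi]
        # Gradient of the mean log-likelihood of softmax(eta * phi @ theta).
        theta += lr * eta * (chosen - expected).mean(axis=0)
    return theta
```

The log-likelihood is concave in `theta`, so plain gradient ascent with a modest step size suffices for this toy setting.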

Translated: 2023-07-27 12:52:09; published: 2023-07-26

# Convergence of Digitized-Counterdiabatic QAOA: circuit depth versus free parameters

( http://arxiv.org/abs/2307.14079v1 )

License: see the linked page

Mara Vizzuso, Gianluca Passarelli, Giovanni Cantele, and Procolo Lucignano

Recently, the Digitized-Counterdiabatic (CD) Quantum Approximate Optimization Algorithm (QAOA) was proposed to make QAOA converge to the solution of an optimization problem in fewer steps, inspired by Trotterized counterdiabatic driving in continuous-time quantum annealing. In this paper, we critically revisit this approach, focusing on the paradigmatic weighted and unweighted one-dimensional MaxCut problems. We study two variants of QAOA with first- and second-order CD corrections. Our results show that higher-order CD corrections do allow quicker convergence to the exact solution of the problem at hand by increasing the complexity of the variational cost function. Remarkably, however, the total number of free parameters needed to achieve this result is independent of the particular QAOA variant analyzed.
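
A small statevector sketch shows where a first-order CD term enters a QAOA layer for unweighted MaxCut on a ring. It assumes the CD correction is a layer of single-qubit Y rotations with its own angle `alpha` per layer, applied after the usual X mixer. This is a common first-order choice, assumed here rather than taken from the paper. The code is plain NumPy and only practical for a handful of qubits.

```python
import numpy as np

def ring_edges(n):
    return [(i, (i + 1) % n) for i in range(n)]

def cut_values(n, edges):
    # Bit i of basis index z is the side of node i; returns the cut size per basis state.
    z = np.arange(2 ** n)
    bits = (z[:, None] >> np.arange(n)) & 1
    return sum((bits[:, i] != bits[:, j]).astype(float) for i, j in edges)

def apply_single_qubit(state, gate, qubit, n):
    # With index z = sum_i b_i 2^i, qubit i is axis n-1-i of the reshaped tensor.
    axis = n - 1 - qubit
    psi = np.moveaxis(state.reshape([2] * n), axis, 0)
    psi = np.tensordot(gate, psi, axes=([1], [0]))
    return np.moveaxis(psi, 0, axis).reshape(-1)

def rx(t):   # exp(-i t X / 2)
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(t):   # exp(-i t Y / 2)
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def qaoa_cd_expectation(gammas, betas, alphas, n=4):
    """<C> after p layers of cost phase, X mixer, and an assumed Y-rotation CD term."""
    cost = cut_values(n, ring_edges(n))
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)   # |+>^n
    for g, b, a in zip(gammas, betas, alphas):
        state = np.exp(-1j * g * cost) * state               # exp(-i gamma C), diagonal
        for q in range(n):
            state = apply_single_qubit(state, rx(2 * b), q, n)   # exp(-i beta X)
            state = apply_single_qubit(state, ry(2 * a), q, n)   # exp(-i alpha Y)
    return float(np.real(np.vdot(state, cost * state)))
```

In this sketch, each layer has three free angles (`gamma`, `beta`, `alpha`) instead of two. That is exactly the kind of trade-off between circuit depth and the number of free parameters that the paper examines.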

Translated: 2023-07-27 12:51:43; published: 2023-07-26

# VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet

( http://arxiv.org/abs/2307.14073v1 )

License: see the linked page

Zhihao Hu, Dong Xu

Recently, diffusion models such as StableDiffusion have achieved impressive image generation results. However, the generation process of such models is uncontrollable, which makes it hard to generate videos with continuous and consistent content. In this work, we use a diffusion model with ControlNet to build VideoControlNet, a new motion-guided video-to-video translation framework. It generates various videos based on the given prompts and the condition from the input video. Video codecs use motion information to reduce temporal redundancy. Inspired by them, our framework uses motion information to avoid regenerating redundant areas, which keeps the content consistent. Specifically, we generate the first frame (i.e., the I-frame) with the diffusion model and ControlNet. We then generate the other key frames (i.e., P-frames) from the previous I/P-frame using our newly proposed motion-guided P-frame generation (MgPG) method. In MgPG, P-frames are generated based on motion information, and occluded areas are inpainted with the diffusion model. Finally, the remaining frames (i.e., B-frames) are generated by our motion-guided B-frame interpolation (MgBI) module. Our experiments demonstrate that VideoControlNet inherits the generation capability of the pre-trained large diffusion model and, by using motion information, extends the image diffusion model to a video diffusion model. More results are provided on our project page.
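
The motion-guided P-frame idea can be sketched as backward warping of the previous key frame with an optical-flow field. Pixels whose source falls outside the frame are flagged for inpainting. Everything here is a simplifying assumption, not the MgPG or MgBI modules themselves: the nearest-neighbour warp, the flow convention, the out-of-bounds-only occlusion test, and the I/P/B schedule in `frame_types`.

```python
import numpy as np

def warp_backward(prev_frame, flow):
    """Warp prev_frame to the current frame with a backward flow field.

    flow[y, x] = (dy, dx) means current pixel (y, x) came from
    (y + dy, x + dx) in prev_frame (nearest-neighbour sampling).
    Returns the warped frame and a boolean mask of pixels to inpaint."""
    h, w = prev_frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.rint(ys + flow[..., 0]).astype(int)
    src_x = np.rint(xs + flow[..., 1]).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    warped = np.zeros_like(prev_frame)
    warped[valid] = prev_frame[src_y[valid], src_x[valid]]
    return warped, ~valid

def frame_types(n_frames, p_interval):
    """Hypothetical codec-style schedule: first frame I, every p_interval-th
    frame and the last frame P, everything in between B."""
    return ["I" if i == 0 else
            "P" if i % p_interval == 0 or i == n_frames - 1 else
            "B" for i in range(n_frames)]
```

Only the masked pixels would be handed to the diffusion model for inpainting. The rest are copied from the previous key frame, which is what keeps content consistent across frames.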

Translated: 2023-07-27 12:51:27; published: 2023-07-26

# Spin-flip scattering engendered negative $\Delta_T$ noise

( http://arxiv.org/abs/2307.14072v1 )

License: see the linked page

Tusaradri Mohapatra, Colin Benjamin

$\Delta_T$ noise, generated by a temperature gradient in the absence of a charge current, has recently attracted a lot of interest. In this paper, we derive for the first time the spin-polarized charge $\Delta_T$ noise and the spin $\Delta_T$ noise, along with their shot-noise-like and thermal-noise-like contributions. By introducing a spin flipper at the interface of a bilayer metal junction with a temperature gradient, we examine the impact of spin-flip scattering. We analyze charge and spin $\Delta_T$ noise in detail in four distinct setups, combining two temperature regimes with two bias-voltage regimes. The temperature regimes are one hot and one cold reservoir, and reservoirs at comparable temperatures. The bias-voltage regimes are zero bias voltage and finite bias voltage. In all these regimes, we ensure that the net charge current is always zero. We find negative charge $\Delta_T$ noise for reservoirs at comparable temperatures, whereas for one hot and one cold reservoir the charge $\Delta_T$ noise is positive. We also find that the spin $\Delta_T$ noise and its thermal-noise-like contribution are negative for one hot and one cold reservoir. Recent work on the general bound for spin $\Delta_T$ shot noise with a spin-dependent bias suggests that it is always positive. In this paper, however, we find the spin $\Delta_T$ shot-noise-like contribution to be negative, in contrast to the positive charge $\Delta_T$ shot-noise contribution, even in the absence of any spin-dependent bias. Spin-flip scattering thus produces the intriguing effect of a sign change in both charge and spin $\Delta_T$ noise, which can help probe spin-polarized transport.

Translated: 2023-07-27 12:51:00; published: 2023-07-26

# Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching

( http://arxiv.org/abs/2307.14071v1 )

License: see the linked page

Junpeng Jing, Jiankun Li, Pengfei Xiong, Jiangyu Liu, Shuaicheng Liu, Yichen Guo, Xin Deng, Mai Xu, Lai Jiang, Leonid Sigal

Correlation-based stereo matching, which computes a cost volume between two feature maps, has achieved outstanding performance. Unfortunately, current methods with a fixed model do not work uniformly well across datasets, which greatly limits their real-world applicability. To tackle this issue, this paper proposes a new perspective: dynamically calculating correlation for robust stereo matching. A novel Uncertainty Guided Adaptive Correlation (UGAC) module is introduced to robustly adapt the same model to different scenarios. Specifically, a variance-based uncertainty estimate is used to adaptively adjust the sampling area during the warping operation. Additionally, we improve the traditional non-parametric warping with learnable parameters, so that position-specific weights can be learned. We show that by equipping a recurrent network with the UGAC module, stereo matching can be performed more robustly and effectively. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the ETH3D, KITTI, and Middlebury datasets. It does so while using the same fixed model across these datasets, without any retraining. Targeting real-time applications, we further design a lightweight model based on UGAC, which also outperforms other methods on the KITTI benchmarks with only 0.6M parameters.
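
A variance-based uncertainty estimate can be read off a cost volume by treating the softmax over disparity candidates as a distribution. The sketch below computes its mean and variance per pixel. It then spreads sampling positions over a radius proportional to the standard deviation. The softmax form, the factor `k`, and `min_radius` are assumptions for illustration, not UGAC's exact formulation.

```python
import numpy as np

def disparity_uncertainty(cost_volume, disparities):
    """cost_volume: (D, H, W) matching scores (higher = better match).
    Returns the per-pixel mean and variance of the softmax disparity distribution."""
    logits = cost_volume - cost_volume.max(axis=0, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=0, keepdims=True)
    d = disparities[:, None, None]
    mean = (p * d).sum(axis=0)
    var = (p * (d - mean) ** 2).sum(axis=0)
    return mean, var

def adaptive_offsets(mean, var, n_samples=5, k=2.0, min_radius=0.5):
    """Per-pixel sampling positions centred on the mean, with radius k * std:
    confident pixels sample narrowly, uncertain pixels sample widely."""
    radius = np.maximum(k * np.sqrt(var), min_radius)
    steps = np.linspace(-1.0, 1.0, n_samples)[:, None, None]
    return mean[None] + steps * radius[None]
```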

Translated: 2023-07-27 12:50:20; published: 2023-07-26

# PNT-Edge: Towards Robust Edge Detection with Noisy Labels by Learning Pixel-level Noise Transitions

( http://arxiv.org/abs/2307.14070v1 )

License: see the linked page

Wenjie Xuan, Shanshan Zhao, Yu Yao, Juhua Liu, Tongliang Liu, Yixin Chen, Bo Du, Dacheng Tao

Relying on large-scale training data with pixel-level labels, previous edge detection methods have achieved high performance. However, it is hard to label edges accurately by hand, especially for large datasets, so the datasets inevitably contain noisy labels. This label-noise issue has been studied extensively for classification but remains under-explored for edge detection. To address it for edge detection, this paper proposes to learn Pixel-level Noise Transitions that model the label-corruption process. To this end, we develop a novel Pixel-wise Shift Learning (PSL) module that estimates the transition from clean to noisy labels as a displacement field. By exploiting the estimated noise transitions, our model, named PNT-Edge, is able to fit its predictions to clean labels. In addition, a local edge density regularization term is devised to exploit local structure information for better transition learning. This term encourages learning large shifts for edges with complex local structures. Experiments on SBD and Cityscapes demonstrate the effectiveness of our method in mitigating the impact of label noise. Code will be available on GitHub.
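
The label-corruption model can be pictured as moving each clean edge pixel by a per-pixel displacement. In this sketch, `apply_shift` applies an integer shift field to a binary edge map. `local_edge_density` computes a box-filtered edge density with a summed-area table. The density-weighted term `density_shift_reg` is a hypothetical form of the regularizer. It was chosen only so that minimizing it favors larger shifts where edges are dense.

```python
import numpy as np

def apply_shift(clean, shift):
    """Move each edge pixel of a binary clean label by its integer (dy, dx) shift."""
    h, w = clean.shape
    noisy = np.zeros_like(clean)
    ys, xs = np.nonzero(clean)
    ty = np.clip(ys + shift[ys, xs, 0], 0, h - 1)
    tx = np.clip(xs + shift[ys, xs, 1], 0, w - 1)
    noisy[ty, tx] = 1
    return noisy

def local_edge_density(label, r=1):
    """Fraction of edge pixels in the (2r+1) x (2r+1) window around each pixel,
    computed with a zero-padded summed-area table."""
    k = 2 * r + 1
    pad = np.pad(label.astype(float), r)
    s = np.pad(pad.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    total = s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]
    return total / (k * k)

def density_shift_reg(shift, clean, r=1):
    # Hypothetical regularizer: negative density-weighted shift magnitude on edge pixels.
    mask = clean > 0
    if not mask.any():
        return 0.0
    dens = local_edge_density(clean, r)
    mag = np.linalg.norm(shift.astype(float), axis=-1)
    return float(-(dens[mask] * mag[mask]).mean())
```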

Translated: 2023-07-27 12:49:58; published: 2023-07-26

# Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation

( http://arxiv.org/abs/2307.14068v1 )

License: see the linked page

Long Liu, Bo Zhou, Zhipeng Zhao, Zening Liu

Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from related source domains to an unlabeled target domain. While recent MUDA methods have shown promising results, most focus on aligning the overall feature distributions across source domains, which can lead to negative effects due to redundant features within each domain. Moreover, there is a significant performance gap between MUDA and supervised methods. To address these challenges, we propose a novel approach called Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA). First, we establish a multi-source dynamic modulation mechanism during training based on the degree of distribution difference between the source and target domains. This mechanism controls the level of feature alignment between each source domain and the target domain, effectively leveraging the locally advantageous feature information within the source domains. Additionally, we propose a Multi-source Active Boundary Sample Selection (MABS) strategy, which uses a guided dynamic boundary loss to design an efficient query function for selecting important samples. This strategy improves generalization to the target domain at minimal sampling cost. We extensively evaluate the proposed method on commonly used domain adaptation datasets, comparing it against existing UDA and ADA methods. The experimental results demonstrate the superiority of our approach.
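
Active selection of "boundary" samples is commonly done with a margin-based query function. The sketch below is that generic stand-in, not the paper's guided dynamic boundary loss: it ranks unlabeled target samples by the gap between their two largest predicted class probabilities and queries the smallest-margin ones for labeling. The function name `margin_query` and the `budget` parameter are hypothetical.

```python
import numpy as np


def margin_query(probs, budget):
    """Pick `budget` unlabeled target samples closest to the decision boundary.

    probs: (num_samples, num_classes) predicted class probabilities.
    The margin is the gap between the two largest probabilities; a small
    margin means the sample lies near a class boundary.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin, kind="stable")[:budget]


probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.6, 0.4]])
print(margin_query(probs, budget=2))  # [1 2]
```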

Translated: 2023-07-27 12:49:40 Published: 2023-07-26

# Machine Learning Applications In Healthcare: The State Of Knowledge and Future Directions

Machine Learning Applications In Healthcare: The State Of Knowledge and Future Directions ( http://arxiv.org/abs/2307.14067v1 )

License: see link

Mrinmoy Roy, Sarwar J. Minar, Porarthi Dhar, A T M Omor Faruq

(参考訳) 高速な処理能力で簡単に紛失した隠れパターンの検出は、今日の医療システムに機械学習(ML)が不可欠である。多くのMLアプリケーションがすでに発見されており、その多くはまだ調査中であるが、現在の医療システムで採用されているものはほとんどない。その結果、MLの医療システムには大きなチャンスがあるが、分散情報、適切に整理されたドキュメントの不足、関連分野における説明が容易なドキュメントが大きな障害となり、医療専門家にとってMLの応用が困難になる。本研究の目的は,医療分野のさまざまな分野のMLアプリケーションを簡潔かつ効果的に収集し,必要な情報を関連文献で即座にアクセスできるようにすることである。本研究は,地域レベルでの作業,リスク管理・予防ケア,医療運用管理,遠隔医療,早期発見の5つのグループに分けた。これらのグループをサブグループに分割し,簡単なアクセスのための表形式で記述した関連資料を提供した。我々の目標は、医療産業におけるML適用性について人々に知らせ、臨床医の機械学習応用に関する知識ギャップを減らし、より機械学習ベースの医療システムにヘルスケア専門家を動機付けることである。

Detecting easily missed hidden patterns with fast processing power makes machine learning (ML) indispensable to today's healthcare system. Although many ML applications have already been discovered and many more are still under investigation, only a few have been adopted by current healthcare systems. As a result, there is an enormous opportunity for ML in healthcare, but scattered information and a scarcity of well-organized, easily understandable documentation in the sector are major impediments that make ML applications difficult for healthcare professionals to use. This study aims to collect ML applications in different areas of healthcare concisely and effectively, so that the necessary information can be accessed immediately along with relevant references. We divided our study into five major groups: community-level work, risk management/preventive care, healthcare operation management, remote care, and early detection. Dividing these groups into subgroups, we provide relevant references with descriptions in tabular form for quick access. Our objective is to inform people about the applicability of ML in the healthcare industry, reduce clinicians' knowledge gap about ML applications, and motivate healthcare professionals toward more machine-learning-based healthcare systems.

Translated: 2023-07-27 12:49:04 Published: 2023-07-26

# Pre-Training with Diffusion models for Dental Radiography segmentation

Pre-Training with Diffusion models for Dental Radiography segmentation ( http://arxiv.org/abs/2307.14066v1 )

License: see link

Jérémy Rousseau, Christian Alaka, Emma Covili, Hippolyte Mayard, Laura Misrachi, Willy Au

(参考訳) 医用ラジオグラフィーのセグメンテーション、特に歯科用ラジオグラフィーは、特定の専門知識と労働集約的なアノテーションを必要とするラベル付けのコストによって非常に制限されている。本研究では,分散確率モデル(ddpm)を用いた意味セグメンテーションのための素早い事前学習手法を提案する。当社の直接的なアプローチはラベル効率の面で目覚ましいパフォーマンスを達成し,事前トレーニングとダウンストリームタスク間のアーキテクチャ変更は必要としない。 DDPMトレーニングの目的を利用して,まずUnetを事前訓練し,次にセグメント化タスクで得られたモデルを微調整する。歯科用ラジオグラフィーのセグメンテーション実験の結果,提案手法は最先端の事前訓練法と競合することが示された。

Medical radiography segmentation, and dental radiography segmentation in particular, is severely limited by the cost of labeling, which requires specific expertise and labor-intensive annotation. In this work, we propose a straightforward pre-training method for semantic segmentation that leverages Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our approach achieves remarkable performance in terms of label efficiency and requires no architectural modifications between pre-training and downstream tasks. We propose to first pre-train a Unet by exploiting the DDPM training objective, and then fine-tune the resulting model on a segmentation task. Our experimental results on the segmentation of dental radiographs demonstrate that the proposed method is competitive with state-of-the-art pre-training methods.
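
The DDPM training objective used for pre-training is the standard noise-prediction loss: sample a timestep $t$, form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, and regress the noise $\epsilon$. The NumPy sketch below computes that loss for one image under the linear $\beta$ schedule from the original DDPM paper; the schedule choice, the helper names, and the all-zeros placeholder standing in for the Unet are assumptions, not details taken from this abstract.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def ddpm_loss(x0, t, alpha_bar, eps_model, rng):
    """Noise-prediction loss ||eps - eps_theta(x_t, t)||^2 for one timestep t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - eps_model(xt, t)) ** 2)


rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = rng.uniform(-1, 1, size=(64, 64))        # stand-in for a normalized radiograph
zero_model = lambda xt, t: np.zeros_like(xt)  # placeholder for the Unet
print(ddpm_loss(x0, t=500, alpha_bar=alpha_bar, eps_model=zero_model, rng=rng))
```

After pre-training a network on this loss, the same network (same architecture) would be fine-tuned with a segmentation loss on the labeled radiographs.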

Translated: 2023-07-27 12:48:10 Published: 2023-07-26

# ECO: Ensembling Context Optimization for Vision-Language Models

ECO: Ensembling Context Optimization for Vision-Language Models ( http://arxiv.org/abs/2307.14063v1 )

License: see link

Lorenzo Agnolucci, Alberto Baldrati, Francesco Todino, Federico Becattini, Marco Bertini, Alberto Del Bimbo

(参考訳) 画像認識は、近ごろパラダイムシフトを目撃し、テキストのプロンプトに基づいた数ショットの分類に視覚言語モデルが使用されている。これらのうち、CLIPモデルは、画像と独自のテキストプロンプトを潜在空間でマッチングすることで、ゼロショット転送の顕著な機能を示している。これは、CLIPの分類能力を最大化するためのエンジニアリングやテキストコンテキストの学習に焦点を当てたいくつかの作業の道を開いた。本稿では,画像分類のためのプロンプトの集合を学習することで,この傾向に従う。トレーニング可能な1つのプロンプトに頼るのではなく,多様で,おそらく短いコンテキストでの学習が,結果を大幅に改善することを示す。特に、推論時に追加コストなしで、より優れたマイノリティを報告します。 11のベンチマークで、我々のアプローチの能力を実演します。

Image recognition has recently witnessed a paradigm shift, in which vision-language models are used to perform few-shot classification based on textual prompts. Among these, the CLIP model has shown remarkable zero-shot transfer capabilities by matching an image and a custom textual prompt in its latent space. This has paved the way for several works that focus on engineering or learning textual contexts to maximize CLIP's classification capabilities. In this paper, we follow this trend by learning an ensemble of prompts for image classification. We show that learning diverse and possibly shorter contexts considerably and consistently improves results compared with relying on a single trainable prompt. In particular, we report better few-shot capabilities at no additional cost at inference time. We demonstrate the capabilities of our approach on 11 different benchmarks.
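
One way a prompt ensemble can avoid extra inference cost, assuming text embeddings are averaged per class, is to precompute a single normalized weight vector per class; inference is then one matrix product, exactly as with a single prompt. The sketch below shows that averaging with toy 2-D arrays standing in for CLIP text and image embeddings; the function names are hypothetical, and ECO itself learns the context vectors rather than averaging hand-written prompts.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def ensemble_classifier(text_embs):
    """Average normalized per-prompt class embeddings.

    text_embs: (num_prompts, num_classes, dim) -> (num_classes, dim).
    Computed once, before inference.
    """
    return l2_normalize(l2_normalize(text_embs).mean(axis=0))


def classify(image_embs, class_weights):
    """Cosine-similarity classification, same cost as with a single prompt."""
    return np.argmax(l2_normalize(image_embs) @ class_weights.T, axis=1)


text_embs = np.array([[[1.0, 0.0], [0.0, 1.0]],    # prompt 1: class 0, class 1
                      [[1.0, 0.1], [0.1, 1.0]]])   # prompt 2: class 0, class 1
weights = ensemble_classifier(text_embs)
print(classify(np.array([[2.0, 0.1], [0.1, 3.0]]), weights))  # [0 1]
```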

Translated: 2023-07-27 12:47:33 Published: 2023-07-26

# Violation of Bohigas-Giannoni-Schmit conjecture using an integrable many-body Floquet system

Violation of Bohigas-Giannoni-Schmit conjecture using an integrable many-body Floquet system ( http://arxiv.org/abs/2307.14122v1 )

License: see link

Harshit Sharma, Udaysinh T. Bhosale

(参考訳) 初期の研究では、BGSの予想を支持する十分な証拠が得られており、例外は少ない。ここでは、量子キックトップのモデルとして知られる多体システムを用いて、全対一の相互作用とキック強度$k=N\pi/2$からなる量子キックトップのモデルを用いる。対応する半古典位相空間がカオスであっても量子可積分であることを示し、したがってBGS予想に反する。 n=5$ から $11$ qubits のケースを解析的に解き、固有系、絡み合いのダイナミクス、ユニタリ進化演算子を見つける。 N>11$ qubits の一般的な場合、縮退スペクトルを用いた積分可能性の数値的証拠と、時間発展ユニタリ進化作用素の正確な周期的性質と絡み合いのダイナミクスを提供する。

Earlier studies have provided ample evidence in support of the Bohigas-Giannoni-Schmit (BGS) conjecture, with only a few exceptions violating it. Here, we provide one more counterexample using a many-body system popularly known as the quantum kicked top, consisting of $N$ qubits with all-to-all interaction and kicking strength $k=N\pi/2$. We show that it is quantum integrable even though the corresponding semiclassical phase space is chaotic, thus violating the BGS conjecture. We solve the cases of $N=5$ to $11$ qubits analytically, finding the eigensystem, the entanglement dynamics, and the unitary evolution operator. For the general case of $N>11$ qubits, we provide numerical evidence of integrability based on the degenerate spectrum and on the exactly periodic behavior of both the unitary evolution operator and the entanglement dynamics.
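
The numerical periodicity check can be sketched as follows, assuming the standard kicked-top Floquet operator $U = e^{-i k J_z^2/(2j)}\,e^{-i p J_y}$ with $p=\pi/2$, acting on the symmetric subspace of $N$ qubits ($j=N/2$); for $k=N\pi/2$ the twist strength $k/(2j)$ equals $\pi/2$. The convention, the helper names, and the search limit `max_n` are assumptions; the script only searches for the smallest $n$ with $U^n$ proportional to the identity and does not reproduce the paper's analytic solution.

```python
import numpy as np


def spin_ops(j):
    """J_z and J_y for spin j in the basis m = j, j-1, ..., -j."""
    dim = int(round(2 * j)) + 1
    m = j - np.arange(dim)
    jz = np.diag(m).astype(complex)
    jp = np.zeros((dim, dim), dtype=complex)  # raising operator J_+
    for i in range(1, dim):
        jp[i - 1, i] = np.sqrt(j * (j + 1) - m[i] * (m[i] + 1))
    jy = (jp - jp.conj().T) / 2j
    return jz, jy


def expm_hermitian(h, t):
    """exp(-i t h) for Hermitian h via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * t * w)) @ v.conj().T


def kicked_top_floquet(n_qubits, k, p=np.pi / 2):
    """Assumed convention: U = exp(-i k J_z^2 / (2j)) exp(-i p J_y), j = N/2."""
    j = n_qubits / 2
    jz, jy = spin_ops(j)
    twist = np.diag(np.exp(-1j * k / (2 * j) * np.diag(jz).real ** 2))
    return twist @ expm_hermitian(jy, p)


def find_period(u, max_n=200, tol=1e-8):
    """Smallest n <= max_n with U^n equal to the identity up to a global phase."""
    dim = u.shape[0]
    power = np.eye(dim, dtype=complex)
    for n in range(1, max_n + 1):
        power = power @ u
        if np.allclose(power, power[0, 0] * np.eye(dim), atol=tol):
            return n
    return None


N = 5
U = kicked_top_floquet(N, k=N * np.pi / 2)
print(find_period(U))
```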

Translated: 2023-07-27 12:39:30 Published: 2023-07-26

# Cliqueful graphs as a means of calculating the maximal number of maximum cliques of simple graphs

Cliqueful graphs as a means of calculating the maximal number of maximum cliques of simple graphs ( http://arxiv.org/abs/2307.14120v1 )

License: see link

Dániel Pfeifer

(参考訳) n$頂点上の単純なグラフは、多くの最大傾きを含むことができる。しかし、その数はどれくらいあるのか? さらに、より具体的には、もし$n \ge 15$であれば、飽和した複合気候グラフの上に取り込まれることが示される。これを用いて、$3^{\lfloor n/3 \rfloor}c$ maxcliques を含むグラフは、$n$ vertices 上で最も多くの最大値を持ち、$c\in\{1,\frac{4}{3},2\}$ は $n \text{ mod } 3$ に依存する。

A simple graph on $n$ vertices may contain many maximum cliques. But how many can it contain at most? We show that the maximum number of maximum cliques is attained by so-called cliqueful graphs; more specifically, for $n \ge 15$, it is attained by saturated composite cliqueful graphs. Using this, we show that the graph containing $3^{\lfloor n/3 \rfloor}c$ maximum cliques has the largest number of maximum cliques among graphs on $n$ vertices, where $c\in\{1,\frac{4}{3},2\}$ depending on $n \bmod 3$.
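
The count $3^{\lfloor n/3 \rfloor}c$ matches the classical Moon-Moser value, with the usual pairing $c = 1, \frac{4}{3}, 2$ for $n \equiv 0, 1, 2 \pmod 3$ (an assumption about the abstract's unstated mapping). The brute-force sketch below checks, for small $n$, that complete multipartite graphs with parts of size 3 (plus parts of size 2 when $3 \nmid n$) attain this count. It verifies only that the construction reaches the value, not the upper bound, and whether these graphs coincide with the paper's saturated composite cliqueful graphs is not claimed. All function names are hypothetical.

```python
from itertools import combinations


def moon_moser_count(n):
    """3^{floor(n/3)} * c with c = 1, 4/3, 2 for n = 0, 1, 2 (mod 3); needs n >= 2."""
    q, r = divmod(n, 3)
    if r == 0:
        return 3 ** q
    if r == 1:
        return 4 * 3 ** (q - 1)
    return 2 * 3 ** q


def extremal_part_sizes(n):
    """Part sizes of a complete multipartite graph attaining the count."""
    q, r = divmod(n, 3)
    if r == 0:
        return [3] * q
    if r == 1:
        return [3] * (q - 1) + [2, 2]
    return [3] * q + [2]


def count_maximum_cliques(part_sizes):
    """Brute-force count of maximum cliques in a complete multipartite graph."""
    part_of = [p for p, size in enumerate(part_sizes) for _ in range(size)]
    vertices = range(len(part_of))
    for k in range(len(part_of), 0, -1):
        count = sum(
            1 for s in combinations(vertices, k)
            if all(part_of[u] != part_of[v] for u, v in combinations(s, 2))
        )
        if count:
            return count
    return 0


for n in range(2, 10):
    print(n, count_maximum_cliques(extremal_part_sizes(n)), moon_moser_count(n))
```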

Translated: 2023-07-27 12:39:14 Published: 2023-07-26

# A semantics-driven methodology for high-quality image annotation

A semantics-driven methodology for high-quality image annotation ( http://arxiv.org/abs/2307.14119v1 )

License: see link

Fausto Giunchiglia, Mayukh Bagchi and Xiaolei Diao

(参考訳) 機械学習とコンピュータビジョンにおける最近の研究は、ground truth object recognition benchmarkデータセット内に様々な種類の体系的欠陥があることを強調している。我々の基本的な特徴は、これらの欠陥は画像に符号化された視覚情報とそれらに注釈を付けるラベルの意図した意味との間に存在する多対多のマッピングに根ざしているということだ。その結果、現在のアノテーションプロセスはほとんど仕様が不明確であり、アノテータの主観的な判断に多くの自由が残されている。本稿では, 自然言語処理, 知識表現, コンピュータビジョンの方法論であるvTelosを提案する。その目的は, 暗黙的に意図する意味意味論を明確にすることであり, 主観的選択の数と役割を最小化することである。 vtelos の重要な要素は、自然言語ラベルの意味を提供する主要な手段として wordnet lexico-semantic hierarchy を活用し、結果として、オブジェクトと彼らが描いた視覚特性に基づいて画像のアノテーションを駆動することである。この方法論はimagenet階層のサブセットをポピュレートするイメージ上で検証される。

Recent work in Machine Learning and Computer Vision has highlighted the presence of various types of systematic flaws inside ground truth object recognition benchmark datasets. Our basic tenet is that these flaws are rooted in the many-to-many mappings which exist between the visual information encoded in images and the intended semantics of the labels annotating them. The net consequence is that the current annotation process is largely under-specified, thus leaving too much freedom to the subjective judgment of annotators. In this paper, we propose vTelos, an integrated Natural Language Processing, Knowledge Representation, and Computer Vision methodology whose main goal is to make explicit the (otherwise implicit) intended annotation semantics, thus minimizing the number and role of subjective choices. A key element of vTelos is the exploitation of the WordNet lexico-semantic hierarchy as the main means for providing the meaning of natural language labels and, as a consequence, for driving the annotation of images based on the objects and the visual properties they depict. The methodology is validated on images populating a subset of the ImageNet hierarchy.

Translated: 2023-07-27 12:38:56 Published: 2023-07-26

# Leveraging Implicit Feedback from Deployment Data in Dialogue

Leveraging Implicit Feedback from Deployment Data in Dialogue ( http://arxiv.org/abs/2307.14117v1 )

License: see link

Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston

(参考訳) 我々は,ユーザとデプロイモデルとの自然な対話から学習することで,追加のアノテーションを使わずに社会的会話エージェントを改善することを研究する。機械が生成した発話の質を暗黙的に測定するために,収集した対話エピソードにおけるユーザ応答長,感情,将来の人間の発話の反応などの信号を利用する。我々の実験では、BlenderBot(Xu et al., 2023)から公開されたデプロイメントデータを使用しました。人的評価は, ベースライン応答よりも新しいモデルの改良を示唆するが, プロキシ信号によっては, 望ましくない特性を持つ世代が増える可能性がある。例えば、会話長の最適化は、ベースラインよりも議論の的あるいは不フレンドリーな世代につながるが、ポジティブな感情や反応の最適化はこれらの行動を減少させる。

We study how to improve social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals such as the length, sentiment, and reaction of subsequent human utterances in the collected dialogue episodes. Our experiments use the publicly released deployment data from BlenderBot (Xu et al., 2023). Human evaluation indicates that our new models improve over the baseline responses; however, we find that some proxy signals can also lead to more generations with undesirable properties. For example, optimizing for conversation length can lead to more controversial or unfriendly generations than the baseline, whereas optimizing for positive sentiment or reaction can reduce these behaviors.
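
To make "implicit signal" concrete, the sketch below derives one binary proxy label per bot turn from the length of the human reply that follows it. The episode format, the function name `length_proxy_labels`, and the `min_words` threshold are illustrative assumptions; the paper's actual signal extraction (including the sentiment and reaction signals) is not reproduced.

```python
def length_proxy_labels(episode, min_words=5):
    """Label each bot turn by the length of the next human reply.

    episode: list of (speaker, text) pairs, speaker in {"bot", "human"}.
    Returns (turn_index, is_positive) pairs; a reply of at least
    `min_words` words is treated as a positive implicit signal.
    """
    labels = []
    for i, (speaker, _text) in enumerate(episode[:-1]):
        next_speaker, next_text = episode[i + 1]
        if speaker == "bot" and next_speaker == "human":
            labels.append((i, len(next_text.split()) >= min_words))
    return labels


episode = [
    ("human", "hi"),
    ("bot", "Hello! What do you like to read?"),
    ("human", "Mostly science fiction, especially older novels."),
    ("bot", "Nice."),
    ("human", "ok"),
]
print(length_proxy_labels(episode))  # [(1, True), (3, False)]
```

As the abstract notes, optimizing such a length signal alone can reward undesirable generations, which is why it is only one of several proxies.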

Translated: 2023-07-27 12:38:35 Published: 2023-07-26

# Imaginarity of Gaussian states

Imaginarity of Gaussian states ( http://arxiv.org/abs/2307.14116v1 )

License: see link

Jianwei Xu

(参考訳) 量子力学がなぜ複素数だけでなく実数を使うのかという長い議論があった。この問題に対処するため、近年では、量子資源理論の手法で想像力理論が開発されている。しかし、既存の想像力理論は、主に有限次元の量子系に焦点を当てている。ガウス状態は、量子物理学の多くの分野で広く使われているが、無限次元の量子系にある。本稿では,ボソニックなガウス状態に対する想像性の資源理論を確立する。そのために、フォック基底の下で、ガウス状態の平均と共分散行列の観点から、実ガウス状態と実ガウスチャネルを決定する。また,忠実性に基づくガウス国家に対する2つの想像上の尺度を提案する。

There has been a long-standing debate about why quantum mechanics uses complex numbers rather than only real numbers. To address this question, an imaginarity theory has been developed in recent years within the framework of quantum resource theories. However, existing imaginarity theory focuses mainly on finite-dimensional quantum systems. Gaussian states are widely used in many fields of quantum physics, but they live in infinite-dimensional quantum systems. In this paper, we establish a resource theory of imaginarity for bosonic Gaussian states. To do so, we determine, with respect to the Fock basis, the real Gaussian states and real Gaussian channels in terms of the means and covariance matrices of Gaussian states. We also provide two imaginarity measures for Gaussian states based on the fidelity.
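
A short derivation shows what "real in the Fock basis" means for a Gaussian state. The annihilation operator has real matrix elements in the Fock basis ($a|n\rangle=\sqrt{n}|n-1\rangle$), so complex conjugation leaves $x=(a+a^\dagger)/\sqrt2$ unchanged and maps $p=(a-a^\dagger)/(i\sqrt2)$ to $-p$. Hence a Gaussian state is real exactly when its mean $d$ and covariance $V$ are invariant under $F=\mathrm{diag}(1,-1,\dots,1,-1)$, i.e. all $p$-means and all $x$-$p$ covariances vanish. The sketch below encodes this check with quadrature ordering $(x_1,p_1,\dots,x_n,p_n)$ and vacuum covariance $I/2$; it follows from the standard facts above, and the paper's exact formulation is not reproduced. Function names are hypothetical.

```python
import numpy as np


def conjugation_map(n_modes):
    """Fock-basis complex conjugation acts on (x1, p1, ..., xn, pn) as p_k -> -p_k."""
    return np.diag([1.0, -1.0] * n_modes)


def is_real_gaussian(mean, cov, tol=1e-12):
    """True if the Gaussian state (mean, cov) has a real density matrix in the Fock basis."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    f = conjugation_map(len(mean) // 2)
    return bool(np.allclose(f @ mean, mean, atol=tol)
                and np.allclose(f @ cov @ f, cov, atol=tol))


# Coherent states |alpha> have mean sqrt(2) * (Re alpha, Im alpha) and cov I/2.
print(is_real_gaussian([np.sqrt(2), 0.0], np.eye(2) / 2))  # alpha = 1: True
print(is_real_gaussian([0.0, np.sqrt(2)], np.eye(2) / 2))  # alpha = i: False
```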

Translated: 2023-07-27 12:38:23 Published: 2023-07-26

# Periocular biometrics: databases, algorithms and directions

Periocular biometrics: databases, algorithms and directions ( http://arxiv.org/abs/2307.14111v1 )

License: see link

Fernando Alonso-Fernandez, Josef Bigun

(参考訳) 眼窩バイオメトリックス(periocular bioometrics)は、非制御状態における虹彩や顔のシステムの性能に関する懸念から、独立したモダリティとして確立されている。眼窩 (periocular) は、まぶた、裂け目、まぶたなど眼の周辺にある顔面の領域を指す。これは、顔全体(近距離では隠蔽できる)と虹彩テクスチャ(遠距離では十分な解像度を持たない)の間のトレードオフを表す、広範囲な取得距離で利用可能である。眼周囲領域は顔や虹彩画像に現れるため、これらのモダリティと併用して使用することもできる。眼周囲領域から抽出された特徴は、性別分類や民族分類にも有効であり、また、性別変換やプラスティック手術が認知能力に与える影響について研究している。本稿では, 近視バイオメトリックス研究における技術の現状を概観し, 最も関係の深い課題について考察し, 既存の文献を網羅的に紹介する。今後の研究動向についても概説する。

Periocular biometrics has been established as an independent modality due to concerns about the performance of iris or face systems in uncontrolled conditions. The periocular region is the facial region in the vicinity of the eye, including the eyelids, lashes, and eyebrows. It is available over a wide range of acquisition distances, representing a trade-off between the whole face (which can be occluded at close distances) and the iris texture (which does not have enough resolution at long distances). Since the periocular region appears in face or iris images, it can also be used in conjunction with these modalities. Features extracted from the periocular region have also been used successfully for gender and ethnicity classification, and to study the impact of gender transformation or plastic surgery on recognition performance. This paper reviews the state of the art in periocular biometric research, providing insight into the most relevant issues and giving thorough coverage of the existing literature. Future research trends are also briefly discussed.

Translated: 2023-07-27 12:38:11 Published: 2023-07-26

# GraphRNN Revisited: An Ablation Study and Extensions for Directed Acyclic Graphs

GraphRNN Revisited: An Ablation Study and Extensions for Directed Acyclic Graphs ( http://arxiv.org/abs/2307.14109v1 )

License: see link

Taniya Das, Mark Koch, Maya Ravichandran, Nikhil Khatri

(参考訳) GraphRNNは、Youらによって提案された、グラフ生成モデルを学ぶためのディープラーニングベースのアーキテクチャである。我々は、GraphRNNアーキテクチャの再現実装を用いて、Youらの結果を再現し、新しいメトリクスを使用してベースラインモデルに対して評価する。アブレーション研究により,同型グラフの表現を崩壊させるようなBFSトラバーサルがモデル性能に大きく寄与することを発見した。さらに、BFSトラバーサルをトポロジ的ソートに置き換えることで、グラフRNNを拡張して有向非巡回グラフを生成する。本手法は,現実のデータセット上でのグラフRNNの有向マルチクラス変種よりも大幅に改善されていることを示す。

GraphRNN is a deep learning-based architecture proposed by You et al. for learning generative models for graphs. We replicate the results of You et al. using a reproduced implementation of the GraphRNN architecture and evaluate this against baseline models using new metrics. Through an ablation study, we find that the BFS traversal suggested by You et al. to collapse representations of isomorphic graphs contributes significantly to model performance. Additionally, we extend GraphRNN to generate directed acyclic graphs by replacing the BFS traversal with a topological sort. We demonstrate that this method improves significantly over a directed-multiclass variant of GraphRNN on a real-world dataset.
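
The two node orderings discussed above can be sketched in plain Python: a BFS ordering (as used by GraphRNN for undirected graphs) and Kahn's topological sort for DAGs. In a topological order every edge points from an earlier node to a later one, so each node's incoming edges can be written as a 0/1 vector over the nodes already placed, which is the kind of sequence an autoregressive graph model emits. This is a simplified illustration (no bandwidth truncation, ties broken by node id); the function names are hypothetical and it is not the authors' implementation.

```python
from collections import deque


def bfs_order(adj, start):
    """BFS node ordering on an undirected adjacency dict (neighbors visited in sorted order)."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


def topological_order(succ):
    """Kahn's algorithm on a DAG given as {node: [successors]}; ties broken by node id."""
    indeg = {u: 0 for u in succ}
    for u in succ:
        for v in succ[u]:
            indeg[v] += 1
    ready = sorted(u for u, d in indeg.items() if d == 0)
    order = []
    while ready:
        u = ready.pop(0)
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
        ready.sort()
    if len(order) != len(succ):
        raise ValueError("graph has a cycle")
    return order


def adjacency_sequence(order, succ):
    """For node i in `order`, a 0/1 vector over earlier nodes (1 = edge into node i)."""
    return [[int(order[i] in succ[order[j]]) for j in range(i)]
            for i in range(1, len(order))]


dag = {0: [1, 2], 1: [3], 2: [3], 3: []}
order = topological_order(dag)
print(order)                           # [0, 1, 2, 3]
print(adjacency_sequence(order, dag))  # [[1], [1, 0], [0, 1, 1]]
```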

Translated: 2023-07-27 12:37:56 Published: 2023-07-26

# Decoding ChatGPT: A Taxonomy of Existing Research, Current Challenges, and Possible Future Directions

Decoding ChatGPT: A Taxonomy of Existing Research, Current Challenges, and Possible Future Directions ( http://arxiv.org/abs/2307.14107v1 )

License: see link

Shahab Saquib Sohail, Faiza Farhat, Yassine Himeur, Mohammad Nadeem, Dag Øivind Madsen, Yashbir Singh, Shadi Atalla and Wathiq Mansoor

(参考訳) Chat Generative Pre-trained Transformer (ChatGPT)は2022年11月の打ち上げ以来、大きな関心を集めている。合格試験やクリエイティビティ・ライティングなど、様々な分野で印象的なパフォーマンスを示している。しかし、バイアスや信頼に関する課題や懸念は続いている。本稿では、ChatGPT研究の分類学を提供し、その応用を探求することを目的として、ChatGPT上で100冊以上のScoopsをインデクシングした出版物を総合的にレビューする。既存の文献を批判的に分析し,研究に共通するアプローチを特定した。さらに, chatgpt が医療, マーケティング, 金融サービス, ソフトウェア工学, 学術的, 科学的な記述, 研究と教育, 環境科学, 自然言語処理など, 有用性を見出した多様な応用分野を調査した。これらのアプリケーションを調べることで、実世界の課題に対処するためのchatgptの可能性に関する貴重な洞察を得ることができます。また,これらの分野におけるさらなる研究開発の必要性を強調し,バイアスや信頼性など,chatgptに関わる重要な問題についても論じる。さらに,ChatGPT研究の今後の方向性を明らかにし,今後の課題への解決策を提案し,今後の展望を推測する。 ChatGPTの能力を十分に活用することで、さまざまな領域でその可能性を解き放つことができ、会話型AIの進歩と社会における変革的な影響につながります。

Chat Generative Pre-trained Transformer (ChatGPT) has gained significant interest and attention since its launch in November 2022. It has shown impressive performance in various domains, including passing exams and creative writing. However, challenges and concerns related to biases and trust persist. In this work, we present a comprehensive review of over 100 Scopus-indexed publications on ChatGPT, aiming to provide a taxonomy of ChatGPT research and explore its applications. We critically analyze the existing literature, identifying common approaches employed in the studies. Additionally, we investigate diverse application areas where ChatGPT has found utility, such as healthcare, marketing and financial services, software engineering, academic and scientific writing, research and education, environmental science, and natural language processing. Through examining these applications, we gain valuable insights into the potential of ChatGPT in addressing real-world challenges. We also discuss crucial issues related to ChatGPT, including biases and trustworthiness, emphasizing the need for further research and development in these areas. Furthermore, we identify potential future directions for ChatGPT research, proposing solutions to current challenges and speculating on expected advancements. By fully leveraging the capabilities of ChatGPT, we can unlock its potential across various domains, leading to advancements in conversational AI and transformative impacts in society.

Translated: 2023-07-27 12:37:45 Published: 2023-07-26

# Wigner Analysis of Particle Dynamics in Wide Nonharmonic Potentials

Wigner Analysis of Particle Dynamics in Wide Nonharmonic Potentials ( http://arxiv.org/abs/2307.14106v1 )

License: see link

Andreu Riera-Campeny and Marc Roda-Llordes and Piotr T. Grochowski and Oriol Romero-Isart

(参考訳) 非調和ポテンシャルにおける粒子の1次元運動の時間発展を概ね記述したウィグナー関数の解析的表現を導出する。この結果は、広いポテンシャルと小さなゆらぎ、すなわち初期状態の1つよりも大きな大きさの空間展開を可能にするが、関連する動的長さスケール(例えば、回転点間の距離)よりも小さく保たれるポテンシャルの配置において優れた近似を与える。解析結果は,古典物理学と量子物理学の相互作用と非線形力学におけるデコヒーレンスの影響を解明する。この解析結果は、非線形力学を用いて大規模粒子のマクロ量子状態を生成する提案を設計、最適化、理解するのに役立つ。

We derive an analytical expression for a Wigner function that approximately describes the time evolution of the one-dimensional motion of a particle in a nonharmonic potential. Our result provides an excellent approximation in the regime of wide potentials and small fluctuations, namely potentials that allow the state to expand spatially by orders of magnitude beyond its initial size while the expanded state remains small compared with the relevant dynamical length scale (e.g., the distance between turning points). Our analytical result elucidates the interplay between classical and quantum physics and the impact of decoherence during nonlinear dynamics. It is instrumental for designing, optimizing, and understanding proposals that use nonlinear dynamics to generate macroscopic quantum states of massive particles.

Translated: 2023-07-27 12:37:22 Published: 2023-07-26

# Error channels in quantum nondemolition measurements on spin systems

Error channels in quantum nondemolition measurements on spin systems ( http://arxiv.org/abs/2307.14103v1 )

License: see link

Benjamin Joecker, Holly G. Stemp, Irene Fernández de Fuentes, Mark A. I. Johnson, Andrea Morello

(参考訳) 量子非破壊測定(QND)は、量子情報処理の貴重な資源である。反復QND測定は、基礎となる単発測定が低忠実度であっても、キュービットの準備と測定の忠実度を高めることができる。しかし、この忠実度向上は、物理系が真にQND過程を許容する程度によって制限される - 理想的なQND測定から逸脱すると、測定が繰り返し過ぎるとビットフリップエラー(「量子ジャンプ」)が発生する。そこで我々は,モデルスピン量子ビット系における完全QND測定の偏差から生じる誤差を理解し,定量化する理論的枠組みを開発する。まず,交換結合電子スピン qubits tunnel-coupled to a charge reservoir のユビキタスな例に基づくモデルを開発した。次に電子-核スピン系に拡張し、2つの限界の間の重要な類似性と相違を説明する。シリコン中のドナー核スピンのよく理解されたプラットフォームに適用すると、このモデルは実験と良好な一致を示す。付加一般性については、異方性スピンカップリングの効果を考慮して研究を終える。

Quantum nondemolition (QND) measurements are a precious resource for quantum information processing. Repetitive QND measurements can boost the fidelity of qubit preparation and measurement, even when the underlying single-shot measurements have low fidelity. However, this fidelity boost is limited by the degree to which the physical system allows for a truly QND process: slight deviations from ideal QND measurement result in bit-flip errors ('quantum jumps') if the measurement is repeated too often. Here, we develop a theoretical framework to understand and quantify the errors arising from deviations from perfect QND measurement in model spin qubit systems. We first develop our model on the ubiquitous example of exchange-coupled electron spin qubits tunnel-coupled to a charge reservoir. We then extend it to electron-nuclear spin systems to illustrate the crucial similarities and differences between the two limits. Applied to the well-understood platform of a donor nuclear spin in silicon, the model shows excellent agreement with experiments. For added generality, we conclude by considering the effect of anisotropic spin couplings.
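
Why imperfect QND-ness caps the benefit of repetition can be seen in a toy model (an assumption, not the paper's framework): each single-shot readout reports the current qubit state correctly with probability `f`, and each measurement flips the qubit with probability `q` afterwards. The sketch computes exactly, by dynamic programming over (current state, number of readouts agreeing with the initial state), the probability that a majority vote over `n` readouts recovers the initial state. With `q = 0` the majority vote keeps improving with `n`; with `q > 0` accumulated flips erode the gain.

```python
def repeated_readout_fidelity(n, f, q):
    """P(majority of n readouts equals the initial qubit state).

    f: single-shot readout fidelity (symmetric errors).
    q: probability that one measurement flips the qubit (QND violation).
    Ties (possible for even n) count as failures.
    """
    # dist[(state_equals_initial, readouts_reporting_initial)] = probability
    dist = {(True, 0): 1.0}
    for _ in range(n):
        new = {}
        for (same, correct), p in dist.items():
            for reported_current, p_read in ((True, f), (False, 1 - f)):
                # The readout matches the initial state if it is right about an
                # unflipped qubit or wrong about a flipped one.
                hits = correct + (1 if reported_current == same else 0)
                for flipped, p_flip in ((False, 1 - q), (True, q)):
                    key = (same != flipped, hits)
                    new[key] = new.get(key, 0.0) + p * p_read * p_flip
        dist = new
    return sum(p for (_, hits), p in dist.items() if 2 * hits > n)


for n in (1, 5, 15, 45):
    print(n, repeated_readout_fidelity(n, 0.8, 0.0), repeated_readout_fidelity(n, 0.8, 0.02))
```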

Translated: 2023-07-27 12:37:08 Published: 2023-07-26

# Toward Design of Synthetic Active Inference Agents by Mere Mortals

Toward Design of Synthetic Active Inference Agents by Mere Mortals ( http://arxiv.org/abs/2307.14145v1 )

License: see link

Bert de Vries

(参考訳) アクティブ推論エージェントの理論的特性は印象的だが,エッジデバイス上での動作ハードウェアやソフトウェアにおいて有効なエージェントを実現するにはどうすればよいのか? これは、ポリシー探索の計算負荷が指数関数的に爆発するのに対して、計算リソースはエッジデバイスでは非常に限られているため、興味深い問題である。本稿では,能動型推論エージェントを開発するために,熟練者以外の技術者を支援するソフトウェアツールボックスに必要な機能について論じる。 tensorflowがディープラーニング技術の応用を促進するのと同じように、アクティブな推論エージェントの民主化を加速するツールボックス・イン・プログレッシブを導入する。

The theoretical properties of active inference agents are impressive, but how do we realize effective agents in working hardware and software on edge devices? This is an interesting problem because the computational load for policy exploration explodes exponentially, while the computational resources are very limited for edge devices. In this paper, we discuss the necessary features for a software toolbox that supports a competent non-expert engineer to develop working active inference agents. We introduce a toolbox-in-progress that aims to accelerate the democratization of active inference agents in a similar way as TensorFlow propelled applications of deep learning technology.

Translated: 2023-07-27 12:30:56 Published: 2023-07-26

# LOIS: Looking Out of Instance Semantics for Visual Question Answering

LOIS: Looking Out of Instance Semantics for Visual Question Answering ( http://arxiv.org/abs/2307.14142v1 )

License: see link

Siyu Zhang, Yeming Chen, Yaoru Sun, Fang Wang, Haibo Shi, Haoran Wang

(参考訳) 視覚的質問応答(VQA)は、視覚と言語を正しく推論するために、多モーダルなタスクとして集中的に研究されている。最近の試みでは、VQAタスクを解くための様々な注意ベースのモジュールが開発されている。しかし、モデル推論の性能は、セマンティックス理解のための視覚処理によってほとんどボトルネックとなる。既存の検出手法の多くはバウンディングボックスに依存しており、VQAモデルでは画像中のオブジェクトの意味論の因果関係を理解し、コンテキスト情報を正しく推測することが深刻な課題である。この目的のために,本研究では,この重要な問題に対処するため,LOIS(Looking Out of Instance Semantics)と呼ばれる,ボックス境界のないモデルフレームワークを提案する。 LOISにより、よりきめ細かい特徴記述が視覚的事実を生成する。さらに、インスタンスマスクによるラベルの曖昧さを克服するために、関係注意モジュールは2種類ある。 1)モダリティ内及びモダリティ 2) モーダリティは, 異なるマルチビュー特徴から正しい回答を推測するために考案された。具体的には、インスタンスオブジェクトと背景情報の間の高度な視覚的意味関係をモデル化するための相互関係注意モジュールを実装した。また,提案する注意モデルは,単語に関する重要な質問に注目することで,画像領域をさらに分析することができる。 4つのベンチマークvqaデータセットにおける実験結果から,提案手法は視覚的推論能力の向上に好適な性能を示す。

Visual question answering (VQA) has been intensively studied as a multimodal task that requires bridging vision and language to infer answers correctly. Recent attempts have developed various attention-based modules for solving VQA tasks. However, the performance of model inference is largely bottlenecked by visual processing for semantic understanding. Most existing detection methods rely on bounding boxes, which makes it a serious challenge for VQA models to understand the causal nexus of object semantics in images and to infer contextual information correctly. To this end, we propose a finer-grained model framework that does not use bounding boxes, termed Looking Out of Instance Semantics (LOIS), to tackle this important issue. LOIS enables more fine-grained feature descriptions to produce visual facts. Furthermore, to overcome the label ambiguity caused by instance masks, two types of relation attention modules, 1) intra-modality and 2) inter-modality, are devised to infer the correct answers from the different multi-view features. Specifically, we implement a mutual relation attention module to model sophisticated and deeper visual semantic relations between instance objects and background information. In addition, our proposed attention model can further analyze salient image regions by focusing on important words in the question. Experimental results on four benchmark VQA datasets show that our proposed method performs favorably in improving visual reasoning capability.
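
Inter-modality relation attention is typically built on scaled dot-product cross-attention. The sketch below shows that generic building block, with question-word features as queries attending over instance-level (mask-based) visual features; the feature shapes, random inputs, and function names are assumptions, and LOIS's specific intra-, inter-, and mutual relation modules are not reproduced.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    queries: (num_words, d) question-word features.
    keys, values: (num_instances, d) instance-level visual features.
    Returns attended features (num_words, d) and weights (num_words, num_instances).
    """
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)
words = rng.standard_normal((4, 8))      # 4 question words, 8-dim features
instances = rng.standard_normal((6, 8))  # 6 instance masks incl. background
attended, weights = cross_attention(words, instances, instances)
print(attended.shape, weights.shape)     # (4, 8) (4, 6)
```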

Translated: 2023-07-27 12:30:45 Published: 2023-07-26

# Single-flux-quantum-based Qubit Control with Tunable Driving Strength

Single-flux-quantum-based Qubit Control with Tunable Driving Strength ( http://arxiv.org/abs/2307.14140v1 )

License: see link

Kuang Liu, Yifan Wang, Bo Ji, Wanpeng Gao, Zhirong Lin, Zhen Wang

(参考訳) 単一磁束量子(SFQ)回路は超伝導量子プロセッサをスケールアップするための低温量子古典界面を構築する大きな可能性を持っている。 sfqベースの量子ゲートが設計・実現されている。しかし、現在の制御方式では駆動強度をqubitsに調整することは困難であり、ゲート長を制限し、通常不要なレベルへの漏洩を引き起こす。本研究では,sfqパルスと可変間隔を結合して駆動強度を連続的に調整する方式とパルス発生回路を設計する。このスキームは、SFQベースのゲート長を調整するだけでなく、駆動強度エンベロープを調整できる可能性も提案している。シミュレーションにより,提案手法は不要なレベルへの漏洩を抑制し,SFQベースのクリフォードゲートの誤差を1桁以上低減できることが示された。

Single-flux-quantum (SFQ) circuits have great potential for building cryogenic quantum-classical interfaces to scale up superconducting quantum processors. SFQ-based quantum gates have been designed and realized. However, with current control schemes it is difficult to tune the driving strength applied to qubits, which restricts the gate length and usually induces leakage to unwanted levels. In this study, we design a scheme and a corresponding pulse generator circuit that continuously adjust the driving strength by coupling SFQ pulses with variable intervals. This scheme not only provides a way to adjust the SFQ-based gate length but also opens the possibility of tuning the driving-strength envelope. Simulations show that our scheme can suppress leakage to unwanted levels and reduce the error of SFQ-based Clifford gates by more than an order of magnitude.
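
A common two-level picture of SFQ driving helps explain why pulse timing matters. Each SFQ pulse applies a small rotation $\delta\theta$, and in the frame rotating at the qubit frequency a pulse arriving at time $t$ rotates about an equatorial axis at phase $2\pi f_q t$. Pulses spaced by whole qubit periods add coherently, while off-period spacing partly cancels. The sketch below composes these rotations and returns the net rotation angle; it ignores higher transmon levels and is a generic illustration under these assumptions, not the paper's pulse generator circuit. Function names and parameter values are hypothetical.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]])
I2 = np.eye(2, dtype=complex)


def pulse(delta_theta, phase):
    """Rotation by delta_theta about the equatorial axis (cos phase, sin phase, 0)."""
    n_dot_sigma = np.cos(phase) * SX + np.sin(phase) * SY
    return np.cos(delta_theta / 2) * I2 - 1j * np.sin(delta_theta / 2) * n_dot_sigma


def net_rotation_angle(pulse_times, qubit_freq, delta_theta):
    """Net rotation angle in [0, pi] after a train of SFQ pulses (rotating frame)."""
    u = I2
    for t in pulse_times:
        u = pulse(delta_theta, 2 * np.pi * qubit_freq * t) @ u
    half_trace = np.clip(np.real(np.trace(u)) / 2, -1.0, 1.0)
    return 2 * np.arccos(abs(half_trace))


f_q = 5e9  # assumed qubit frequency in Hz
resonant = [k / f_q for k in range(20)]          # one pulse per qubit period
half_period = [k / (2 * f_q) for k in range(20)]  # pulses every half period
print(net_rotation_angle(resonant, f_q, np.pi / 40))     # about pi/2
print(net_rotation_angle(half_period, f_q, np.pi / 40))  # about 0 (cancellation)
```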

Translated: 2023-07-27 12:30:23 Published: 2023-07-26

# Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards

Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards ( http://arxiv.org/abs/2307.14138v1 )

License: see link

Behzad Nourani-Koliji, Steven Bilaj, Amir Rezaei Balef, Setareh Maghsudi

(参考訳) 本稿では,因果関係の報酬を用いた定位半帯域問題について検討する。非定常環境では、ベースアームの分布の変化、報酬間の因果関係、またはその両方が報酬生成プロセスを変化させる。このような環境では、最適な意思決定者は、両方の変化源を従わなければならない。この問題は、意思決定者が選択された腕の束の結果のみを観察する組合せ半バンド設定において悪化する。提案するポリシの中核は、Upper Confidence Bound (UCB)アルゴリズムである。エージェントはこの課題を克服するために適応的なアプローチに依存していると仮定する。具体的には、GLR(Generalized Likelihood Ratio)テストに基づく変更点検出器を用いる。さらに、構造化環境における意思決定プロセスにおける新たな再起動戦略としてグループ再スタートの概念を導入する。最後に,提案アルゴリズムは,基礎となるグラフ構造の変動をトレースする機構を統合し,バンディット設定における報酬間の因果関係をキャプチャする。理論的には,構造および分布の変化が性能に与える影響を反映した,後悔の上限を確立する。実世界のシナリオにおける数値実験の結果から,提案手法の適用性と性能は,最先端ベンチマークと比較して良好であった。

We study the piecewise-stationary combinatorial semi-bandit problem with causally related rewards. In our nonstationary environment, variations in the base arms' distributions, in the causal relationships between rewards, or in both change the reward generation process. In such an environment, an optimal decision-maker must track both sources of change and adapt accordingly. The problem is aggravated in the combinatorial semi-bandit setting, where the decision-maker only observes the outcome of the selected bundle of arms. The core of our proposed policy is the Upper Confidence Bound (UCB) algorithm. We assume the agent relies on an adaptive approach to overcome this challenge; more specifically, it employs a change-point detector based on the Generalized Likelihood Ratio (GLR) test. Besides, we introduce the notion of group restart as a new alternative restarting strategy for decision making in structured environments. Finally, our algorithm integrates a mechanism to trace variations in the underlying graph structure, which captures the causal relationships between the rewards in the bandit setting. Theoretically, we establish a regret upper bound that reflects the effect of the number of structural and distributional changes on performance. The outcomes of our numerical experiments in real-world scenarios demonstrate the applicability and superior performance of our proposal compared with state-of-the-art benchmarks.
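
A GLR change-point test for Bernoulli rewards compares, for every split point $s$, how much better two separate means explain the data than a single mean: $\max_s \big[s\,\mathrm{kl}(\hat\mu_{1:s},\hat\mu_{1:n}) + (n-s)\,\mathrm{kl}(\hat\mu_{s+1:n},\hat\mu_{1:n})\big] > \beta$, where $\mathrm{kl}$ is the Bernoulli KL divergence. The sketch below implements this statistic for one arm's reward history; the fixed `threshold` is a hypothetical placeholder (in practice $\beta$ depends on $n$ and a confidence level), and how the paper combines the detector with UCB and group restarts is not reproduced.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_change_detected(rewards, threshold):
    """Bernoulli GLR test: True if some split point explains the data much better."""
    n = len(rewards)
    if n < 2:
        return False
    total = sum(rewards)
    mu_all = total / n
    prefix = 0.0
    for s in range(1, n):
        prefix += rewards[s - 1]
        mu_before = prefix / s
        mu_after = (total - prefix) / (n - s)
        stat = s * bernoulli_kl(mu_before, mu_all) + (n - s) * bernoulli_kl(mu_after, mu_all)
        if stat > threshold:
            return True
    return False


print(glr_change_detected([0] * 50 + [1] * 50, threshold=10.0))  # True
print(glr_change_detected([1, 0] * 50, threshold=10.0))          # False
```

When the detector fires for an arm, a restart-based policy discards that arm's statistics (or, with group restart, those of a related group of arms) and lets UCB re-explore.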

Translated: 2023-07-27 12:30:08 Published: 2023-07-26

# Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models

Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models ( http://arxiv.org/abs/2307.14134v1 )

License: see link

Himmet Toprak Kesgin, Muzaffer Kaan Yuce, Mehmet Fatih Amasyali

(参考訳) 本研究では,小,小,小,中規模のトルコのBERTモデルを導入,評価し,低リソース言語における研究ギャップを埋めることを目的とした。我々は、複数の情報源から75GB以上のテキストを含む多様なデータセットでこれらのモデルをトレーニングし、マスク予測、感情分析、ニュース分類、ゼロショット分類などのタスクでテストした。モデルのサイズは小さいものの、ゼロショットタスクを含む堅牢な性能を示し、計算効率と実行時間の短縮を実現した。本研究は,特にトルコ語の文脈において,より小さな言語モデルの開発と適用に関する貴重な知見を提供する。

This study introduces and evaluates tiny, mini, small, and medium-sized uncased Turkish BERT models, aiming to bridge the research gap for less-resourced languages. We trained these models on a diverse dataset of over 75GB of text from multiple sources and tested them on several tasks, including mask prediction, sentiment analysis, news classification, and zero-shot classification. Despite their smaller size, our models exhibited robust performance, including on the zero-shot task, while ensuring computational efficiency and faster execution times. Our findings provide valuable insights into the development and application of smaller language models, especially for the Turkish language.

Translated: 2023-07-27 12:29:51 Published: 2023-07-26

# Say Goodbye to RNN-T Loss: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition

Say Goodbye to RNN-T Loss: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition ( http://arxiv.org/abs/2307.14132v1 )

License: see link

Tian-Hao Zhang, Dinghao Zhou, Guiping Zhon, Baoxiang Li

(参考訳) RNN-Tモデルは、入力オーディオとターゲットシーケンス間の長さアライメントを実現するために、RNN-T損失に依存するASRで広く使われている。しかし、実装の複雑さとrnn-t損失のアライメントに基づく最適化ターゲットは、それぞれ計算冗長性と予測ネットワークの役割を減少させる。本稿では,CIF(Continuous Integrate-and-Fire)機構をRNN-Tモデルに組み込んだCIF-Transducer(CIF-T)という新しいモデルを提案する。このようにして、RNN-T損失は放棄され、計算量が減少し、予測ネットワークがより重要な役割を果たす。また,Funnel-CIF,Context Blocks,Unified Gating and Bilinear Pooling joint network,およびパフォーマンス向上のための補助的トレーニング戦略についても紹介する。 178時間AISHELL-1と10000時間WnetSpeechデータセットの実験は、CIF-TがRNN-Tモデルと比較して計算オーバーヘッドの少ない最先端の結果を達成することを示した。

RNN-T models, which are widely used in ASR, rely on the RNN-T loss to achieve length alignment between input audio and target sequence. However, the implementation complexity and the alignment-based optimization target of RNN-T loss lead to computational redundancy and a reduced role for the predictor network, respectively. In this paper, we propose a novel model named CIF-Transducer (CIF-T) which incorporates the Continuous Integrate-and-Fire (CIF) mechanism with the RNN-T model to achieve efficient alignment. In this way, the RNN-T loss is abandoned, thus bringing a computational reduction and allowing the predictor network a more significant role. We also introduce Funnel-CIF, Context Blocks, Unified Gating and Bilinear Pooling joint network, and auxiliary training strategy to further improve performance. Experiments on the 178-hour AISHELL-1 and 10000-hour WenetSpeech datasets show that CIF-T achieves state-of-the-art results with lower computational overhead compared to RNN-T models.
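CIF is the component that takes over the alignment job of the RNN-T loss here. As a rough illustration of the integrate-and-fire idea in its commonly described form (not the paper's Funnel-CIF or its training code), the NumPy sketch below assumes each encoder frame carries a non-negative weight alpha, accumulates weights until a threshold of 1.0 is reached, and then "fires" one token-level vector as the alpha-weighted sum of the frames integrated so far, splitting the boundary frame's weight between the current and the next token. The function name `cif` and the threshold value are illustrative assumptions.

```python
import numpy as np


def cif(encoder_states: np.ndarray, alphas: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Continuous integrate-and-fire over one utterance.

    encoder_states: (T, D) frame-level features.
    alphas: (T,) non-negative per-frame weights (e.g. from a sigmoid head).
    Returns an (N, D) array with one token-level vector per firing.
    """
    T, D = encoder_states.shape
    fired = []
    acc_weight = 0.0          # weight integrated since the last firing
    acc_state = np.zeros(D)   # alpha-weighted sum of frames since the last firing
    for t in range(T):
        remaining = float(alphas[t])
        # Fire as many times as this frame's weight completes the threshold.
        while acc_weight + remaining >= threshold:
            used = threshold - acc_weight          # portion that closes the current token
            fired.append(acc_state + used * encoder_states[t])
            remaining -= used                      # leftover starts the next token
            acc_weight = 0.0
            acc_state = np.zeros(D)
        acc_weight += remaining
        acc_state = acc_state + remaining * encoder_states[t]
    # Weight left below the threshold at the end is dropped in this sketch.
    return np.stack(fired) if fired else np.zeros((0, D))
```

With this scheme the number of emitted vectors is roughly the sum of the weights divided by the threshold, which is what lets a CIF-style model map frame-level audio features to label-level positions without marginalizing over all alignments as the RNN-T loss does.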

翻訳日:2023-07-27 12:29:37 公開日:2023-07-26

# 超伝導量子古典ハイブリッド回路における準粒子ダイナミクス

Quasiparticle Dynamics in Superconducting Quantum-Classical Hybrid Circuits ( http://arxiv.org/abs/2307.14130v1 )

ライセンス: Link先を確認

Kuang Liu, Xiaoliang He, Zhengqi Niu, Hang Xue, Wenbing Jiang, Liliang Ying, Wei Peng, Masaaki Maezawa, Zhirong Lin, Xiaoming Xie, Zhen Wang

(参考訳) 単一磁束量子（SFQ）回路は、スケーラブルで集積可能な極低温量子制御システムの有望な候補である。しかし、SFQ回路の動作は、量子ビットのデコヒーレンスの主要な原因である非平衡準粒子（QP）を生成する。本研究では、SFQ回路と量子ビット回路から構成される超伝導量子古典ハイブリッドチップにおけるQPの振る舞いを調べる。量子ビットの緩和時間を監視することで、SFQ回路が誘起するQPのダイナミクスを探る。その結果、量子ビット近傍のQP密度はSFQ回路の動作から数マイクロ秒後にピークに達し、この時間はハイブリッド回路におけるQPのフォノン媒介伝播時間に対応することが分かった。これは、ハイブリッド回路におけるQPの拡散がフォノン媒介伝播によって支配されていることを示唆している。本結果は、量子古典ハイブリッドシステムにおけるQP中毒（QP poisoning）を抑制するための基礎を築くものである。

Single flux quantum (SFQ) circuitry is a promising candidate for a scalable and integratable cryogenic quantum control system. However, the operation of SFQ circuits introduces non-equilibrium quasiparticles (QPs), which are a significant source of qubit decoherence. In this study, we investigate QP behavior in a superconducting quantum-classical hybrid chip that comprises an SFQ circuit and a qubit circuit. By monitoring qubit relaxation time, we explore the dynamics of SFQ-circuit-induced QPs. Our findings reveal that the QP density near the qubit reaches its peak after several microseconds of SFQ circuit operation, which corresponds to the phonon-mediated propagation time of QPs in the hybrid circuits. This suggests that phonon-mediated propagation dominates the spreading of QPs in the hybrid circuits. Our results lay the foundation to suppress QP poisoning in quantum-classical hybrid systems.

翻訳日:2023-07-27 12:29:18 公開日:2023-07-26

# Creative Birds：自己教師あり単一視点3Dスタイル転送

Creative Birds: Self-Supervised Single-View 3D Style Transfer ( http://arxiv.org/abs/2307.14127v1 )

ライセンス: Link先を確認

Renke Wang, Guimin Que, Shuo Chen, Xiang Li, Jun Li, Jian Yang

(参考訳) 本稿では、形状とテクスチャの両方を転送してユニークな3Dオブジェクトを生成する、単一視点3Dスタイル転送の新しい手法を提案する。本研究では主に、3D再構成で一般的な題材でありながら既存の単一視点3D転送手法が開発されていない鳥に焦点を当てる。提案手法は、2枚の単一視点画像から鳥の3Dメッシュ形状とテクスチャを生成することを目的とする。そのために、dual residual gated network（DRGNet）と多層パーセプトロン（MLP）からなる新しい形状転送生成器を導入する。DRGNetは共有座標ゲートユニットを用いてソース画像とターゲット画像の特徴を抽出し、MLPは3Dメッシュを構築するための空間座標を生成する。また、セマンティックUVセグメンテーションを用いてテクスチャスタイル転送を行うセマンティックUVテクスチャ転送モジュールを導入し、転送された領域の意味の一貫性を保証する。このモジュールは多くの既存手法に広く適用できる。最後に、微分可能レンダラーを用いて新しい3Dの鳥を構築する。CUBデータセットでの実験結果から、本手法が単一視点3Dスタイル転送タスクにおいて最先端の性能を達成することが確認された。コードは https://github.com/wrk226/2D-to-3D-Evolution-Transfer で公開されている。

In this paper, we propose a novel method for single-view 3D style transfer that generates a unique 3D object with both shape and texture transfer. Our focus lies primarily on birds, a popular subject in 3D reconstruction, for which no existing single-view 3D transfer methods have been developed. The method we propose seeks to generate a 3D mesh shape and texture of a bird from two single-view images. To achieve this, we introduce a novel shape transfer generator that comprises a dual residual gated network (DRGNet) and a multi-layer perceptron (MLP). DRGNet extracts the features of source and target images using a shared coordinate gate unit, while the MLP generates spatial coordinates for building a 3D mesh. We also introduce a semantic UV texture transfer module that implements textural style transfer using semantic UV segmentation, which ensures consistency in the semantic meaning of the transferred regions. This module can be widely adapted to many existing approaches. Finally, our method constructs a novel 3D bird using a differentiable renderer. Experimental results on the CUB dataset verify that our method achieves state-of-the-art performance on the single-view 3D style transfer task. Code is available at https://github.com/wrk226/2D-to-3D-Evolution-Transfer.

翻訳日:2023-07-27 12:29:06 公開日:2023-07-26

# 共有・固有特徴モデリングによる欠損モダリティを伴うマルチモーダル学習

Multi-modal Learning with Missing Modality via Shared-Specific Feature Modelling ( http://arxiv.org/abs/2307.14126v1 )

ライセンス: Link先を確認

Hu Wang, Yuanhong Chen, Congbo Ma, Jodie Avery, Louise Hull, Gustavo Carneiro

(参考訳) モダリティの欠損は重要な問題であるが、マルチモーダルモデルで解決するのは容易ではない。マルチモーダルタスクにおける欠損モダリティ問題に対処する現在の手法は、評価時にのみ欠損モダリティを扱うか、特定の欠損モダリティ設定に対処するために個別のモデルを学習する。さらに、これらのモデルは特定のタスク向けに設計されており、例えば分類モデルをセグメンテーションタスクに適応させるのは容易ではなく、その逆も同様である。本稿では、上記の問題に対処する競合手法よりもはるかに単純かつ効果的な、共有・固有特徴モデリング（ShaSpec）手法を提案する。ShaSpecは、共有特徴と固有特徴を学習して入力データをより良く表現することで、学習時と評価時の両方で利用可能なすべての入力モダリティを活用するように設計されている。これは、分布アライメントとドメイン分類に基づく補助タスクに依存する戦略と、残差特徴融合手順によって実現される。また、ShaSpecは設計が単純であるため、分類やセグメンテーションなど複数のタスクに容易に適応できる。医用画像セグメンテーションとコンピュータビジョンの分類の両方で実験を行い、ShaSpecが競合手法を大きく上回ることを示した。例えばBraTS2018では、ShaSpecはSOTAを造影増強腫瘍で3%以上、腫瘍コアで5%、腫瘍全体で3%改善した。

The missing modality issue is critical, yet non-trivial for multi-modal models to solve. Current methods aiming to handle the missing modality problem in multi-modal tasks either deal with missing modalities only during evaluation or train separate models to handle specific missing modality settings. In addition, these models are designed for specific tasks, so, for example, classification models are not easily adapted to segmentation tasks and vice versa. In this paper, we propose the Shared-Specific Feature Modelling (ShaSpec) method that is considerably simpler and more effective than competing approaches that address the issues above. ShaSpec is designed to take advantage of all available input modalities during training and evaluation by learning shared and specific features to better represent the input data. This is achieved through a strategy that relies on auxiliary tasks based on distribution alignment and domain classification, in addition to a residual feature fusion procedure. Also, the design simplicity of ShaSpec enables its easy adaptation to multiple tasks, such as classification and segmentation. Experiments are conducted on both medical image segmentation and computer vision classification, with results indicating that ShaSpec outperforms competing methods by a large margin. For instance, on BraTS2018, ShaSpec improves the SOTA by more than 3% for enhancing tumour, 5% for tumour core and 3% for whole tumour.

翻訳日:2023-07-27 12:28:43 公開日:2023-07-26

# イベントカメラを用いた物体分類と検出のためのメモリ効率の高いグラフ畳み込みネットワーク

Memory-Efficient Graph Convolutional Networks for Object Classification and Detection with Event Cameras ( http://arxiv.org/abs/2307.14124v1 )

ライセンス: Link先を確認

Kamil Jeziorek, Andrea Pinna, Tomasz Kryjak

(参考訳) イベントカメラ研究の最近の進歩では、データを本来のスパースな形式のまま処理することが重視されており、これにより高い時間分解能、高ダイナミックレンジ、低レイテンシ、画像ぼけへの耐性といった独自の特徴を活用できる。イベントデータを解析する有望なアプローチの一つが、グラフ畳み込みネットワーク（GCN）である。しかし、この分野の現在の研究は主に計算コストの最適化に焦点を当てており、それに伴うメモリコストを見落としている。本稿では、満足できる結果と比較的低いモデル複雑度を達成するために、両方の要素を合わせて考慮する。そのために、実行時間、学習可能なモデルパラメータ数、データ形式の要件、学習結果などの要素を考慮して、さまざまなグラフ畳み込み演算の比較分析を行った。その結果、52.3%の分類精度を維持しながら、特徴抽出モジュールのパラメータ数を450分の1に、データ表現のサイズを4.5分の1に削減した。この精度は、最先端の手法で用いられている演算と比べて6.3%高い。さらに性能を評価するため、物体検出アーキテクチャを実装し、N-Caltech101データセットで評価した。その結果、精度は53.7% mAP@0.5、実行速度は毎秒82グラフに達した。

Recent advances in event camera research emphasize processing data in its original sparse form, which allows the use of its unique features such as high temporal resolution, high dynamic range, low latency, and resistance to image blur. One promising approach for analyzing event data is through graph convolutional networks (GCNs). However, current research in this domain primarily focuses on optimizing computational costs, neglecting the associated memory costs. In this paper, we consider both factors together in order to achieve satisfying results and relatively low model complexity. For this purpose, we performed a comparative analysis of different graph convolution operations, considering factors such as execution time, the number of trainable model parameters, data format requirements, and training outcomes. Our results show a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation while maintaining a classification accuracy of 52.3%, which is 6.3% higher than that of the operation used in state-of-the-art approaches. To further evaluate performance, we implemented an object detection architecture and evaluated its performance on the N-Caltech101 dataset. The results showed an accuracy of 53.7% mAP@0.5 and an execution rate of 82 graphs per second.

翻訳日:2023-07-27 12:28:20 公開日:2023-07-26

# AIと教育 : システム思考におけるChatGPTの利用に関する調査

AI and Education: An Investigation into the Use of ChatGPT for Systems Thinking ( http://arxiv.org/abs/2307.14206v1 )

ライセンス: Link先を確認

Holger Arndt

(参考訳) この探索的研究は、人工知能ツールChatGPTが様々な教科においてシステム思考（ST）を支援する可能性を調べる。一般的なプロンプトと教科固有のプロンプトの両方を用いて、ツールの異なるバージョンにわたってChatGPTの応答の正確性、有用性、信頼性を評価する。結果から、ChatGPTは様々な教科でおおむね正確で非常に有用な応答を提供でき、STスキルを高めるツールとしての可能性を示すことが分かった。しかし、時折見られる不正確さは、ユーザがChatGPTの応答を批判的に受け止め続ける必要性を浮き彫りにしている。いくつかの制限はあるものの、本研究は、注意深く使用しその特性に留意すれば、ChatGPTがSTの教育と学習に有用なツールとなり得ることを示唆している。

This exploratory study investigates the potential of the artificial intelligence tool, ChatGPT, to support systems thinking (ST) in various subjects. Using both general and subject-specific prompts, the study assesses the accuracy, helpfulness, and reliability of ChatGPT's responses across different versions of the tool. The results indicate that ChatGPT can provide largely correct and very helpful responses in various subjects, demonstrating its potential as a tool for enhancing ST skills. However, occasional inaccuracies highlight the need for users to remain critical of ChatGPT's responses. Despite some limitations, this study suggests that with careful use and attention to its idiosyncrasies, ChatGPT can be a valuable tool for teaching and learning ST.

翻訳日:2023-07-27 12:20:13 公開日:2023-07-26

# ランダム・フォレストとサポート・ベクター・マシンの圧力濾過性能調査への応用 : 亜鉛プラント・フィルタ・ケーキ・モデリング

Application of Random Forest and Support Vector Machine for Investigation of Pressure Filtration Performance, a Zinc Plant Filter Cake Modeling ( http://arxiv.org/abs/2307.14199v1 )

ライセンス: Link先を確認

Masoume Kazemi, Davood Moradkhani, Alireza Abbas Alipour

(参考訳) 湿式製錬による亜鉛の生産では、鉱石から亜鉛を浸出させた後、加圧ろ過によって固体残渣と溶液を分離する。固体残渣には亜鉛の回収量を減少させうる水分が含まれるため、この分離プロセスは非常に重要である。本研究では、ランダムフォレスト（RF）とサポートベクターマシン（SVM）により加圧ろ過プロセスをモデル化した。モデルは実験室のサンプルから得た連続変数（抽出された特徴量）を入力とする。そのため、回帰モデルであるランダムフォレスト回帰（RFR）とサポートベクター回帰（SVR）を選択した。データセット全体は、(1) ポリプロピレン（S1）と (2) ポリエステル（S2）のろ布という2つの条件で加圧ろ過プロセス中に取得した。ケーキ水分を予測するために、固体濃度（0.2および0.38）、温度（35および65℃）、pH（2、3.5、5）、圧力、ケーキ厚（14、20、26、34 mm）、エアブロー時間（2、10、15分）、ろ過時間を入力変数として用いた。モデルの予測精度は決定係数（R²）で評価した。その結果、ケーキ水分の予測においてRFRモデルがSVRモデルより優れていることが分かった。

The hydrometallurgical method of zinc production involves leaching zinc from ore and then separating the solid residue from the liquid solution by pressure filtration. This separation process is very important since the solid residue contains some moisture that can reduce the amount of zinc recovered. This study modeled the pressure filtration process through Random Forest (RF) and Support Vector Machine (SVM). The models take continuous variables (extracted features) from the lab samples as inputs. Thus, regression models, namely Random Forest Regression (RFR) and Support Vector Regression (SVR), were chosen. A total dataset was obtained during the pressure filtration process in two conditions: (1) polypropylene (S1) and (2) polyester (S2) fabrics. To predict the cake moisture, solids concentration (0.2 and 0.38), temperature (35 and 65 °C), pH (2, 3.5, and 5), pressure, cake thickness (14, 20, 26, and 34 mm), air-blow time (2, 10, and 15 min), and filtration time were applied as input variables. The models' predictive accuracy was evaluated by the coefficient of determination (R²). The results revealed that the RFR model is superior to the SVR model for cake moisture prediction.
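For reference, the coefficient of determination used to compare the RFR and SVR models is conventionally defined as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the measured values from their mean. A dependency-free sketch follows; it is a generic definition rather than the study's evaluation code, and scikit-learn's `sklearn.metrics.r2_score` computes the same quantity for non-constant targets.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean) ** 2 for t in y_true)                # total variation
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean measured moisture scores 0, a perfect model scores 1, and a model worse than the mean can score below 0, so a higher R² for RFR means it explains more of the moisture variance on the evaluated samples.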

翻訳日:2023-07-27 12:19:59 公開日:2023-07-26

# 離散連続計算グラフの効率的な学習

Efficient Learning of Discrete-Continuous Computation Graphs ( http://arxiv.org/abs/2307.14193v1 )

ライセンス: Link先を確認

David Friede and Mathias Niepert

(参考訳) 教師あり学習や強化学習の多くのモデルは、離散的なモデル要素と連続的なモデル要素の組み合わせから恩恵を受ける。エンドツーエンドで学習可能な離散連続モデルは合成的であり、汎化性能が高い傾向があり、解釈性も高い。離散連続計算グラフを構築する一般的なアプローチは、確率的ソフトマックストリックを用いて離散確率分布をニューラルネットワークに統合するものである。先行研究は主に、グラフの各実行パス上に離散要素を1つだけ持つ計算グラフに焦点を当ててきた。我々は、複数の逐次的な離散要素を持つ、より複雑な確率的計算グラフの挙動を解析する。これらのモデルのパラメータを最適化することは、主に勾配が小さいことと局所最小値のために困難であることを示す。次に、これらの課題を克服するための2つの新しい戦略を提案する。第一に、学習中にGumbelノイズ摂動のスケールパラメータを大きくすると学習挙動が改善することを示す。第二に、確率的離散連続計算グラフ専用に調整したドロップアウト残差接続を提案する。広範な実験により、標準的な確率的ソフトマックストリックでは学習できない複雑な離散連続モデルを学習できることを示す。また、複雑な離散確率モデルが、いくつかのベンチマークデータセットにおいて連続的なモデルよりも汎化性能が高いことを示す。

Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is that of integrating discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph's execution paths. We analyze the behavior of more complex stochastic computation graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which one cannot train with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.
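The first strategy can be pictured with a Gumbel-softmax sample in which the standard Gumbel noise is multiplied by a scale parameter beta before the temperature-scaled softmax. This is a minimal NumPy sketch under that assumption; the names `gumbel_softmax`, `argmax_frequency`, `tau` and `beta` are illustrative, and the abstract does not say how the paper schedules the scale during training.

```python
import numpy as np


def gumbel_softmax(logits, tau=1.0, beta=1.0, rng=None):
    """Relaxed categorical sample with Gumbel noise scaled by beta.

    logits: array of shape (..., K); tau: softmax temperature (> 0);
    beta: scale applied to standard Gumbel(0, 1) noise (>= 0).
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(low=1e-12, high=1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))                      # standard Gumbel(0, 1) samples
    z = (np.asarray(logits, dtype=float) + beta * g) / tau
    z = z - z.max(axis=-1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def argmax_frequency(logits, beta, n=2000, seed=0):
    """Fraction of samples whose argmax is the highest-logit category."""
    rng = np.random.default_rng(seed)
    batch = np.tile(np.asarray(logits, dtype=float), (n, 1))
    samples = gumbel_softmax(batch, tau=0.5, beta=beta, rng=rng)
    return float(np.mean(samples.argmax(axis=-1) == np.argmax(logits)))
```

Because the softmax is monotone, the argmax of a sample equals the argmax of logits + beta * g, which by the Gumbel-max trick is distributed as a draw from softmax(logits / beta). A larger beta therefore lets non-maximal categories win more often, keeping more discrete paths in play.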

翻訳日:2023-07-27 12:19:33 公開日:2023-07-26

# ChatGPTのセキュリティ、プライバシー、倫理的懸念を明らかにする

Unveiling Security, Privacy, and Ethical Concerns of ChatGPT ( http://arxiv.org/abs/2307.14192v1 )

ライセンス: Link先を確認

Xiaodong Wu, Ran Duan, Jianbing Ni

(参考訳) 本稿では、トピックモデリングと強化学習を利用して自然な応答を生成するAI搭載チャットボット、ChatGPTの領域を掘り下げる。ChatGPTはカスタマーサービス、教育、メンタルヘルス治療、個人の生産性、コンテンツ作成など様々な業界で大きな可能性を秘めているが、そのセキュリティ、プライバシー、倫理上の影響に対処することが不可欠である。本研究は、GPT-1からGPT-4へのアップグレードの過程をたどり、モデルの特徴、限界、潜在的な応用について議論することで、ChatGPTを日常生活に組み込むことの潜在的なリスクを明らかにすることを目的とする。セキュリティ、プライバシー、倫理の問題に焦点を当て、これらの懸念が広範な普及にもたらす課題を強調する。最後に、これらの領域における未解決の問題を分析し、安全で倫理的に健全な大規模言語モデルの開発を確実にするための協調的な取り組みを呼びかける。

This paper delves into the realm of ChatGPT, an AI-powered chatbot that utilizes topic modeling and reinforcement learning to generate natural responses. Although ChatGPT holds immense promise across various industries, such as customer service, education, mental health treatment, personal productivity, and content creation, it is essential to address its security, privacy, and ethical implications. By exploring the upgrade path from GPT-1 to GPT-4, discussing the model's features, limitations, and potential applications, this study aims to shed light on the potential risks of integrating ChatGPT into our daily lives. Focusing on security, privacy, and ethics issues, we highlight the challenges these concerns pose for widespread adoption. Finally, we analyze the open problems in these areas, calling for concerted efforts to ensure the development of secure and ethically sound large language models.

翻訳日:2023-07-27 12:19:18 公開日:2023-07-26

# ADAPT：適応による効率的なマルチエージェント軌道予測

ADAPT: Efficient Multi-Agent Trajectory Prediction with Adaptation ( http://arxiv.org/abs/2307.14187v1 )

ライセンス: Link先を確認

Görkay Aydemir, Adil Kaan Akan, Fatma Güney

(参考訳) 複雑な交通シーンにおけるエージェントの将来の軌道を予測するには、シーン内のすべてのエージェントについて信頼性が高く効率的な予測が必要である。しかし、既存の軌道予測手法は、効率が悪いか精度を犠牲にしている。この課題に対処するために、動的な重み学習によってシーン内の全エージェントの軌道を同時に予測する新しいアプローチADAPTを提案する。本手法は、ArgoverseおよびInteractionデータセットのシングルエージェント設定とマルチエージェント設定の両方において、最先端手法の計算オーバーヘッドのごく一部で、それらを上回る性能を達成する。性能向上の要因は、第一に、モデルサイズを増やさずにモデル容量を拡張する適応ヘッド、第二に、勾配停止によって強化された端点条件付き予測における設計上の選択にある。分析の結果、ADAPTは適応的な予測により各エージェントに焦点を当てることができ、正確な予測を効率的に行えることが分かった。https://KUIS-AI.github.io/adapt

Forecasting future trajectories of agents in complex traffic scenes requires reliable and efficient predictions for all agents in the scene. However, existing methods for trajectory prediction are either inefficient or sacrifice accuracy. To address this challenge, we propose ADAPT, a novel approach for jointly predicting the trajectories of all agents in the scene with dynamic weight learning. Our approach outperforms state-of-the-art methods in both single-agent and multi-agent settings on the Argoverse and Interaction datasets, with a fraction of their computational overhead. We attribute the improvement in our performance: first, to the adaptive head augmenting the model capacity without increasing the model size; second, to our design choices in the endpoint-conditioned prediction, reinforced by gradient stopping. Our analyses show that ADAPT can focus on each agent with adaptive prediction, allowing for accurate predictions efficiently. https://KUIS-AI.github.io/adapt

翻訳日:2023-07-27 12:19:02 公開日:2023-07-26

# バージニア州ノーフォークにおける街路規模の浸水に対する機械学習サロゲートモデルの比較

A comparison of machine learning surrogate models of street-scale flooding in Norfolk, Virginia ( http://arxiv.org/abs/2307.14185v1 )

ライセンス: Link先を確認

Diana McSpadden and Steven Goldenberg and Binata Roy and Malachi Schram and Jonathan L. Goodall and Heather Richter

(参考訳) バージニア州ノーフォークに代表される低地の沿岸都市は、降雨と潮汐による道路冠水という課題に直面しており、これは交通システムや下水道システムに負荷をかけ、財産被害につながることもある。物理に基づく高忠実度シミュレーションは都市の内水氾濫を正確に予測できるが、その計算の複雑さゆえにリアルタイム用途には適さない。本研究では、2016年から2018年のノーフォークの降雨イベントのデータを用いて、ランダムフォレストアルゴリズムに基づく既存のサロゲートモデルと、2つの深層学習モデルであるLong Short-Term Memory（LSTM）およびGated Recurrent Unit（GRU）の性能を比較する。本研究は、予測の不確実性の伝達と、関連するマルチモーダル特徴の効果的な統合を支援するモデルアーキテクチャを用いることの重要性を示している。

Low-lying coastal cities, exemplified by Norfolk, Virginia, face the challenge of street flooding caused by rainfall and tides, which strain transportation and sewer systems and can lead to property damage. While high-fidelity, physics-based simulations provide accurate predictions of urban pluvial flooding, their computational complexity renders them unsuitable for real-time applications. Using data from Norfolk rainfall events between 2016 and 2018, this study compares the performance of a previous surrogate model based on a random forest algorithm with two deep learning models: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). This investigation underscores the importance of using a model architecture that supports the communication of prediction uncertainty and the effective integration of relevant, multi-modal features.

翻訳日:2023-07-27 12:18:44 公開日:2023-07-26

# セマンティックセグメンテーションネットワークにおけるatrous rateの解像度を考慮した設計

Resolution-Aware Design of Atrous Rates for Semantic Segmentation Networks ( http://arxiv.org/abs/2307.14179v1 )

ライセンス: Link先を確認

Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Sang Woo Kim

(参考訳) DeepLabはセマンティックセグメンテーションに広く使われている深層ニューラルネットワークであり、その成功はatrous spatial pyramid pooling（ASPP）と呼ばれる並列アーキテクチャによるものとされる。ASPPは、局所情報と大域情報の両方を抽出するために、異なるatrous rateを持つ複数のatrous畳み込みを用いる。しかし、ASPPモジュールにはatrous rateの固定値が使われており、これが視野の大きさを制限している。原理的には、atrous rateは対象タスクやデータセットに応じて視野の大きさを変えるハイパーパラメータであるべきである。しかし、atrous rateの調整にはいかなる指針も存在しない。本研究は、最適なatrous rateを得るための実用的な指針を提案する。まず、セグメンテーションネットワークの内部挙動を解析するために、セマンティックセグメンテーションのための有効受容野を導入する。ASPPモジュールを用いると有効受容野に特定のパターンが生じることを観察し、それを追跡してモジュールの基本的なメカニズムを明らかにした。これに基づき、入力画像のサイズに応じて制御すべき最適なatrous rateを得るための実用的な指針を導出する。他の値と比較して、最適なatrous rateを用いると、STARE、CHASE_DB1、HRF、Cityscapes、iSAIDを含む複数のデータセットにわたってセグメンテーション結果が一貫して改善された。

DeepLab is a widely used deep neural network for semantic segmentation, whose success is attributed to its parallel architecture called atrous spatial pyramid pooling (ASPP). ASPP uses multiple atrous convolutions with different atrous rates to extract both local and global information. However, fixed values of atrous rates are used for the ASPP module, which restricts the size of its field of view. In principle, the atrous rate should be a hyperparameter that changes the field of view size according to the target task or dataset. However, the manipulation of atrous rate is not governed by any guidelines. This study proposes practical guidelines for obtaining an optimal atrous rate. First, an effective receptive field for semantic segmentation is introduced to analyze the inner behavior of segmentation networks. We observed that the use of the ASPP module yielded a specific pattern in the effective receptive field, which was traced to reveal the module's underlying mechanism. Accordingly, we derive practical guidelines for obtaining the optimal atrous rate, which should be controlled based on the size of input image. Compared to other values, using the optimal atrous rate consistently improved the segmentation results across multiple datasets, including the STARE, CHASE_DB1, HRF, Cityscapes, and iSAID datasets.
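To see why a fixed atrous rate ties the field of view to absolute pixel distances, recall that a k × k convolution with dilation rate r spans k + (k - 1)(r - 1) positions per side. The sketch below applies this to the rates (6, 12, 18) that DeepLabv3 uses in ASPP at output stride 16 and converts each span to approximate input pixels by multiplying by the output stride. It ignores the backbone's own receptive field and is not the resolution-aware guideline derived in the paper; the function names are illustrative.

```python
def dilated_kernel_extent(kernel_size: int, rate: int) -> int:
    """Positions spanned per side by a kernel with the given dilation rate."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def aspp_input_extents(rates=(6, 12, 18), kernel_size=3, output_stride=16):
    """Approximate input-pixel span of each ASPP branch.

    Multiplies the feature-map span by the output stride and ignores the
    backbone's own receptive field, so these are rough figures only.
    """
    return {r: dilated_kernel_extent(kernel_size, r) * output_stride for r in rates}


for width in (700, 2048):  # e.g. a STARE-sized and a Cityscapes-sized image width
    for rate, extent in aspp_input_extents().items():
        print(f"width={width} rate={rate}: span {extent}px = {extent / width:.2f} of width")
```

The widest branch spans roughly 592 input pixels regardless of image size, so if a 700-pixel-wide retinal image and a 2048-pixel-wide street scene were both processed at native resolution, that branch would cover very different fractions of each image. A resolution-dependent atrous rate is one way to address that mismatch.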

翻訳日:2023-07-27 12:18:28 公開日:2023-07-26

# SoC FPGAデバイスを用いた高精細イベントフレーム生成

High-definition event frame generation using SoC FPGA devices ( http://arxiv.org/abs/2307.14177v1 )

ライセンス: Link先を確認

Krzysztof Blachut, Tomasz Kryjak

(参考訳) 本稿では、FPGAデバイスにおいて、高解像度イベントデータストリーム（HD、1280 × 720ピクセル）を画像平面上に蓄積・投影する処理の実装について述べる。結果はこのアプローチの実現可能性を確認したが、考慮すべき課題、制限、トレードオフがいくつかある。選択したデータ表現（バイナリフレーム、イベントフレーム、指数関数的に減衰するタイムサーフェス、イベント頻度）に必要なハードウェアリソースを、AMD Xilinxのいくつかの一般的なプラットフォームで利用可能なリソースと比較した。得られたイベントフレームは、古典的手法と深層ニューラルネットワーク手法の両方を用いた、物体の分類や検出などの典型的な視覚アルゴリズムに利用できる。

In this paper we have addressed the implementation of the accumulation and projection of a high-resolution event data stream (HD, 1280 × 720 pixels) onto the image plane in FPGA devices. The results confirm the feasibility of this approach, but there are a number of challenges, limitations and trade-offs to be considered. The required hardware resources of selected data representations, such as binary frame, event frame, exponentially decaying time surface and event frequency, were compared with those available on several popular platforms from AMD Xilinx. The resulting event frames can be used for typical vision algorithms, such as object classification and detection, using both classical and deep neural network methods.
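The four representations compared in the paper can be sketched in NumPy for a 1280 × 720 sensor as below, assuming events arrive as parallel arrays of pixel coordinates `x`, `y`, timestamps `t` and polarities `p` (positive for ON). The exact definitions, the decay constant `tau`, and the per-pixel bit widths passed to `frame_bytes` are assumptions for illustration; the paper's implementation targets FPGA logic, not NumPy.

```python
import numpy as np

WIDTH, HEIGHT = 1280, 720  # HD event-sensor resolution named in the abstract


def binary_frame(x, y):
    """True where at least one event occurred at the pixel."""
    frame = np.zeros((HEIGHT, WIDTH), dtype=bool)
    frame[y, x] = True
    return frame


def event_frame(x, y, p):
    """Signed per-pixel polarity sum (+1 for ON events, -1 for OFF events)."""
    frame = np.zeros((HEIGHT, WIDTH), dtype=np.int16)
    np.add.at(frame, (y, x), np.where(p > 0, 1, -1).astype(np.int16))
    return frame


def time_surface(x, y, t, t_ref, tau):
    """exp(-(t_ref - t_last) / tau) per pixel; 0 where no event occurred."""
    last = np.full((HEIGHT, WIDTH), -np.inf)
    np.maximum.at(last, (y, x), t.astype(float))
    return np.exp((last - t_ref) / tau)  # exp(-inf) == 0 for silent pixels


def event_frequency(x, y, window):
    """Events per unit time at each pixel over an accumulation window."""
    counts = np.zeros((HEIGHT, WIDTH), dtype=np.int32)
    np.add.at(counts, (y, x), 1)
    return counts / window


def frame_bytes(bits_per_pixel):
    """Storage for one full-resolution frame at an assumed bit width."""
    return WIDTH * HEIGHT * bits_per_pixel // 8
```

Per-pixel storage is what drives on-chip memory at this resolution: a 1-bit binary frame needs 115,200 bytes, while a 16-bit counter or timestamp per pixel needs 1,843,200 bytes. This is why the choice of representation matters when it is compared against the memory available on a given device.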

翻訳日:2023-07-27 12:18:05 公開日:2023-07-26

# 非古典光による多光子電子放出

Multi-photon electron emission with non-classical light ( http://arxiv.org/abs/2307.14153v1 )

ライセンス: Link先を確認

Jonas Heimerl, Alexander Mikhaylov, Stefan Meier, Henrick Höllerer, Ido Kaminer, Maria Chekhova and Peter Hommelhoff

(参考訳) 古典的光源と非古典的光源からの光子数分布は広く研究されてきたが、それらが光電子放出過程に与える影響はほとんど解明されていない。本稿では、光子の量子統計が異なる超短光パルスで照射された金属針先端からの電子数分布の測定結果を示す。励起光場の光子統計を古典的（ポアソン）と量子的（超ポアソン）の間で変化させることで、測定される電子分布が大きく変化することを実証する。単一モードの明るいスクイーズド真空光を用いると、1パルスあたりの平均電子数が0.27であるにもかかわらず、1つの光パルスから最大65個の電子が放出される極端な統計事象を観測した。このような事象の確率は、ポアソン統計では$10^{-128}$に相当する。励起に用いる明るいスクイーズド真空光のモード数を変えることで、電子数分布を必要に応じて調整できる。最も重要なのは、本結果が、駆動光の光子統計が放出電子に刻み込まれることを示し、新しいセンサーデバイスや量子光による強電場量子光学への扉を開くことである。

Photon number distributions from classical and non-classical light sources have been studied extensively, yet their impact on photoemission processes is largely unexplored. In this article, we present measurements of electron number-distributions from metal needle tips illuminated with ultrashort light pulses of different photon quantum statistics. By varying the photon statistics of the exciting light field between classical (Poissonian) and quantum (super-Poissonian), we demonstrate that the measured electron distributions are changed substantially. Using single-mode bright squeezed vacuum light, we measure extreme statistics events with up to 65 electrons from one light pulse at a mean of 0.27 electrons per pulse; the likelihood for such an event equals $10^{-128}$ with Poissonian statistics. Changing the number of modes of the exciting bright squeezed vacuum light, we can tailor the electron-number distribution on demand. Most importantly, our results demonstrate that the photon statistics are imprinted from the driving light to the emitted electrons, opening the door to new sensor devices and to strong-field quantum optics with quantum light.
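The quoted likelihood can be checked with the Poisson probability mass function P(X = k) = e^(-λ) λ^k / k!. The sketch below evaluates it in log space with `math.lgamma` so that 65! does not overflow; the function name is illustrative. For k = 65 and λ = 0.27 it gives a base-10 logarithm of about -128, matching the order of magnitude in the abstract. The probability of 65 or more electrons is dominated by this single term, so the tail probability has the same order of magnitude.

```python
import math


def log10_poisson_pmf(k: int, mean: float) -> float:
    """log10 P(X = k) for X ~ Poisson(mean), computed in log space.

    math.lgamma(k + 1) equals ln(k!), which avoids overflowing k! for large k.
    """
    ln_p = -mean + k * math.log(mean) - math.lgamma(k + 1)
    return ln_p / math.log(10)


# 65 electrons in one pulse at a mean of 0.27 electrons per pulse
print(round(log10_poisson_pmf(65, 0.27)))  # prints -128
```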

翻訳日:2023-07-27 12:17:53 公開日:2023-07-26

# もつれの解けた（disentangled）離散表現の学習

Learning Disentangled Discrete Representations ( http://arxiv.org/abs/2307.14151v1 )

ライセンス: Link先を確認

David Friede, Christian Reimers, Heiner Stuckenschmidt and Mathias Niepert

(参考訳) 画像生成、モデルベース強化学習、テキストからの画像生成における最近の成功は、離散的な潜在表現の経験的な利点を示しているが、その利点の背後にある理由は明らかになっていない。本稿では、標準的なガウス変分オートエンコーダ（VAE）を、調整したカテゴリカル変分オートエンコーダに置き換えることで、離散潜在空間ともつれの解けた（disentangled）表現との関係を探る。カテゴリカル分布の基盤にある格子構造が、多変量ガウス分布に伴う回転不変性の問題を緩和し、もつれの解けた表現に対する効率的な帰納的事前分布として機能することを示す。もつれの解けた表現の学習において離散VAEが有利であることを示す、解析的および実験的な知見を提供する。さらに、もつれの解けた表現を選好する初の教師なしモデル選択戦略を提案する。

Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear. We explore the relationship between discrete latent spaces and disentangled representations by replacing the standard Gaussian variational autoencoder (VAE) with a tailored categorical variational autoencoder. We show that the underlying grid structure of categorical distributions mitigates the problem of rotational invariance associated with multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations. We provide both analytical and empirical findings that demonstrate the advantages of discrete VAEs for learning disentangled representations. Furthermore, we introduce the first unsupervised model selection strategy that favors disentangled representations.

翻訳日:2023-07-27 12:17:35 公開日:2023-07-26

# 説明可能な人工知能（XAI）における性能と説明可能性のトレードオフの再検討

Revisiting the Performance-Explainability Trade-Off in Explainable Artificial Intelligence (XAI) ( http://arxiv.org/abs/2307.14239v1 )

ライセンス: Link先を確認

Barnaby Crook, Maximilian Schlüter, Timo Speith

(参考訳) 要求工学（RE）の分野では、AIに支えられたシステムをユーザのニーズ、社会的期待、規制基準に整合させるうえで、説明可能な人工知能（XAI）の重要性が増していることが認識されつつある。一般に、説明可能性はシステム品質に影響を与える重要な非機能要件として浮上している。しかし、説明可能性と性能の間にあるとされるトレードオフは、説明可能性がもたらすと想定される好影響に疑問を投げかける。説明可能性の要件を満たすことがシステム性能の低下を伴うのであれば、これらの品質面のどちらを優先し、両者の間でどう折り合いをつけるかを慎重に検討しなければならない。本稿では、この主張されるトレードオフを批判的に検討する。我々は、このトレードオフには、リソースの可用性、ドメインの特性、リスクの考慮を取り入れた、きめ細かな方法で取り組むのが最善であると主張する。本研究は、将来の研究とベストプラクティスの基礎を提供することで、AIのためのREの分野を前進させることを目指す。

Within the field of Requirements Engineering (RE), the increasing significance of Explainable Artificial Intelligence (XAI) in aligning AI-supported systems with user needs, societal expectations, and regulatory standards has garnered recognition. In general, explainability has emerged as an important non-functional requirement that impacts system quality. However, the supposed trade-off between explainability and performance challenges the presumed positive influence of explainability. If meeting the requirement of explainability entails a reduction in system performance, then careful consideration must be given to which of these quality aspects takes precedence and how to compromise between them. In this paper, we critically examine the alleged trade-off. We argue that it is best approached in a nuanced way that incorporates resource availability, domain characteristics, and considerations of risk. By providing a foundation for future research and best practices, this work aims to advance the field of RE for AI.

翻訳日:2023-07-27 12:11:09 公開日:2023-07-26

# ロボット群のための多目的ニューラルネットワークコントローラの進化

Evolving Multi-Objective Neural Network Controllers for Robot Swarms ( http://arxiv.org/abs/2307.14237v1 )

ライセンス: Link先を確認

Karl Mason, Sabine Hauert

(参考訳) 多くの群ロボット（swarm robotics）タスクは、互いに相反する複数の目的から成り立っている。本研究では、ロボット群のコントローラを開発するための多目的進化型ニューラルネットワーク手法を提案する。群ロボットのコントローラは低忠実度のPythonシミュレータで学習され、その後Webotsを用いた高忠実度のシミュレーション環境でテストされる。次に、進化させた多目的ロボットコントローラが、より多くのロボットを含む環境にスケールできるかを検証するシミュレーションを行う。結果は、提案手法が各ロボットを効果的に制御できることを示している。ロボット群は、各目的の重みを調整するにつれて異なる振る舞いを示す。また、低忠実度シミュレータで進化させた多目的ニューラルネットワークコントローラは高忠実度のシミュレーション環境に移行でき、さらなる再学習を必要とせずに、より多くのロボットを含む環境にスケールできることも確認された。

Many swarm robotics tasks consist of multiple conflicting objectives. This research proposes a multi-objective evolutionary neural network approach to developing controllers for swarms of robots. The swarm robot controllers are trained in a low-fidelity Python simulator and then tested in a high-fidelity simulated environment using Webots. Simulations are then conducted to test the scalability of the evolved multi-objective robot controllers to environments with a larger number of robots. The results presented demonstrate that the proposed approach can effectively control each of the robots. The robot swarm exhibits different behaviours as the weighting for each objective is adjusted. The results also confirm that multi-objective neural network controllers evolved in a low-fidelity simulator can be transferred to high-fidelity simulated environments and that the controllers can scale to environments with a larger number of robots without further retraining needed.

翻訳日:2023-07-27 12:10:54 公開日:2023-07-26

# UnScientify：学術論文全文における科学的不確実性の検出

UnScientify: Detecting Scientific Uncertainty in Scholarly Full Text ( http://arxiv.org/abs/2307.14236v1 )

ライセンス: Link先を確認

Panggih Kusuma Ningrum, Philipp Mayr, Iana Atanassova

(参考訳) 本デモ論文では、学術論文の全文から科学的不確実性を検出するために設計された対話型システムUnScientifyを紹介する。このシステムは、きめ細かなアノテーションスキームを用いる弱教師あり手法により、科学テキスト中で言語的に表現された不確実性を文レベルで特定する。システムのパイプラインは、パターンマッチング、複文チェック、著者参照チェックの組み合わせで構成される。本手法は、さまざまな種類の科学的不確実性を考慮しながら、科学的不確実性の特定のためのラベル付けとアノテーションの作業を自動化するものであり、情報検索、テキストマイニング、学術文書処理などさまざまな応用に役立てることができる。さらに、UnScientifyは解釈可能な結果を提供し、テキスト中で特定された科学的不確実性の事例の理解を助ける。

This demo paper presents UnScientify, an interactive system designed to detect scientific uncertainty in scholarly full text. The system utilizes a weakly supervised technique that employs a fine-grained annotation scheme to identify verbally formulated uncertainty at the sentence level in scientific texts. The pipeline for the system includes a combination of pattern matching, complex sentence checking, and authorial reference checking. Our approach automates labeling and annotation tasks for scientific uncertainty identification, taking into account different types of scientific uncertainty, and can serve various applications such as information retrieval, text mining, and scholarly document processing. Additionally, UnScientify provides interpretable results, aiding in the comprehension of identified instances of scientific uncertainty in text.
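As a loose illustration of the pattern-matching stage only, the sketch below flags sentences that contain hedging cues. The cue list, the naive sentence splitter, and the function names are hypothetical and far simpler than UnScientify's fine-grained annotation scheme; the complex-sentence and authorial-reference checks are not modeled.

```python
import re

# Hypothetical cue list for illustration; not UnScientify's annotation scheme.
HEDGE_CUES = [
    r"\bmay\b", r"\bmight\b", r"\bcould\b", r"\bpossibly\b", r"\blikely\b",
    r"\bsuggests?\b", r"\bappears? to\b", r"\bremains? unclear\b",
]
HEDGE_RE = re.compile("|".join(HEDGE_CUES), flags=re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_uncertain_sentences(text: str) -> list[tuple[str, list[str]]]:
    """Return each sentence with the hedging cues found in it (empty if none)."""
    results = []
    for sentence in split_sentences(text):
        cues = [m.group(0).lower() for m in HEDGE_RE.finditer(sentence)]
        results.append((sentence, cues))
    return results
```

A rule layer like this is cheap and interpretable, since every flag points to a concrete cue, but on its own it would also flag non-hedging uses such as "may" expressing permission, which is one reason a fuller pipeline adds further checks.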

翻訳日:2023-07-27 12:10:40 公開日:2023-07-26

# コンピュータシステムにおける不透明性の源泉：包括的な分類体系に向けて

Sources of Opacity in Computer Systems: Towards a Comprehensive Taxonomy ( http://arxiv.org/abs/2307.14232v1 )

ライセンス: Link先を確認

Sara Mann, Barnaby Crook, Lena Kästner, Astrid Schomäcker, Timo Speith

(参考訳) 現代のコンピュータシステムは現代生活のいたるところに存在するが、その多くは不透明なままである。これは、公平性や説明責任といった要件が重要な領域において大きな課題となる。我々は、システムの透明性を達成するための最善の戦略は、与えられた文脈で支配的な不透明性の具体的な源泉によって異なると考える。既存の議論を統合・拡張し、アーキテクチャ的、分析的、社会技術的という3つの主要なカテゴリに分類される8つの不透明性の源泉からなる分類体系を提案する。各源泉について、結果として生じる不透明性に実際にどう対処するかについての初期的な提案を示す。この分類体系は、要求エンジニアやその他の実務者が、文脈上支配的な不透明性の源泉を理解し、それを克服するための適切な戦略を選択または開発するための出発点を提供する。

Modern computer systems are ubiquitous in contemporary life, yet many of them remain opaque. This poses significant challenges in domains where desiderata such as fairness or accountability are crucial. We suggest that the best strategy for achieving system transparency varies depending on the specific source of opacity prevalent in a given context. Synthesizing and extending existing discussions, we propose a taxonomy consisting of eight sources of opacity that fall into three main categories: architectural, analytical, and socio-technical. For each source, we provide initial suggestions as to how to address the resulting opacity in practice. The taxonomy provides a starting point for requirements engineers and other practitioners to understand contextually prevalent sources of opacity, and to select or develop appropriate strategies for overcoming them.

翻訳日:2023-07-27 12:10:27 公開日:2023-07-26

# 伝統的な中国絵画の計算的アプローチ:「絵画の6原則」の視点から

Computational Approaches for Traditional Chinese Painting: From the "Six Principles of Painting" Perspective ( http://arxiv.org/abs/2307.14227v1 )

ライセンス: Link先を確認

Wei Zhang, Jian-Wei Zhang, Kam Kwai Wong, Yifang Wang, Yingchaojie Feng, Luwei Wang, and Wei Chen

(参考訳) 伝統的な中国絵画(TCP)は貴重な文化遺産であり、独特な視覚芸術様式である。近年、文化の保存と再生のためにTCPをデジタル化することへの関心が高まっている。その結果得られたデジタル複製により、TCPを構造的かつ体系的に理解するための計算手法が進歩した。このテーマを探るため、本研究では92件の文献を詳細に分析した。専門家との多数の対話に基づき、TCPにおけるコンピュータ技術の現在の利用状況を3つの視点から検討した。第一に、「絵画の六原則」理論に照らして、芸術的要素に関する研究の焦点に従ってこれらの論文を分類した。第二に、TCPアプリケーションの目的を説明するための4段階のフレームワークを作成した。第三に、TCPに適用される一般的な計算手法を要約した。このフレームワークは、専門家の意見とともに、潜在的なアプリケーションと将来の展望に関する洞察も提供する。調査対象の出版物と関連情報の一覧は https://ca4tcp.com で公開されている。

Traditional Chinese Painting (TCP) is an invaluable cultural heritage resource and a unique visual art style. In recent years, there has been increasing interest in digitizing TCPs to preserve and revive the culture. The resulting digital copies have enabled the advancement of computational methods for structured and systematic understanding of TCPs. To explore this topic, we conducted an in-depth analysis of 92 pieces of literature. Based on numerous conversations with specialists, we examined the current use of computer technologies on TCPs from three perspectives. First, in light of the "Six Principles of Painting" theory, we categorized the articles according to their research focus on artistic elements. Second, we created a four-stage framework to illustrate the purposes of TCP applications. Third, we summarized the popular computational techniques applied to TCPs. Drawing on professional opinion, the framework also provides insights into potential applications and future prospects. The list of surveyed publications and related information is available online at https://ca4tcp.com.

翻訳日:2023-07-27 12:10:16 公開日:2023-07-26

# 地域貿易組織に基づく気候交渉の進展の可能性を探る:RICE-Nに基づく研究

Explore the possibility of advancing climate negotiations on the basis of regional trade organizations: A study based on RICE-N ( http://arxiv.org/abs/2307.14226v1 )

ライセンス: Link先を確認

Wubo Dai

(参考訳) 気候問題は今やますます重要になっている。世界各国の政府は一定の進展を遂げてきたが、現時点では国際協力の見通しが明確でないという現実に直面している。統合評価モデル(IAM)の限界のため、動的な交渉プロセスをシミュレートすることは難しい。したがって、深層学習を用いて新しいエージェントベースモデル(ABM)を構築することで、気候交渉に新たな理論的支援を提供できる可能性がある。本研究では、RICE-Nモデルに基づき、既存の貿易グループを基盤とした気候交渉へのアプローチを提案した。シミュレーションの結果、このスキームは有望であることが示された。

Climate issues have become increasingly important. Although governments worldwide have made some progress, the prospects for international cooperation remain unclear. Because of the limitations of integrated assessment models (IAMs), it is difficult to simulate the dynamic negotiation process. Therefore, using deep learning to build a new agent-based model (ABM) might provide new theoretical support for climate negotiations. Building on the RICE-N model, this work proposes an approach to climate negotiations based on existing trade groups. Simulation results indicate that the scheme is promising.

翻訳日:2023-07-27 12:10:03 公開日:2023-07-26

# 大規模言語モデルは言語ベースおよびアイテムベースの嗜好に対するコールドスタート近傍の推薦器として競争力がある

Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences ( http://arxiv.org/abs/2307.14225v1 )

ライセンス: Link先を確認

Scott Sanner and Krisztian Balog and Filip Radlinski and Ben Wedin and Lucas Dixon

(参考訳) 従来のレコメンダシステムは、ユーザのアイテム選好履歴を活用して、ユーザが好みそうな新しいコンテンツを推薦する。しかし、ユーザが言語ベースの嗜好を表現できる現代の対話インターフェースは、嗜好入力に対して根本的に異なるモダリティを提供する。大規模言語モデル(LLM)に対するプロンプティングパラダイムの近年の成功に触発され、我々は、アイテムベースおよび言語ベースの嗜好の両方から推薦を行うためのLLMの利用を、最先端のアイテムベース協調フィルタリング(CF)手法と比較して検討した。この調査を支援するために、ユーザから引き出したアイテムベースおよび言語ベースの嗜好と、さまざまな推薦アイテム(バイアスあり)およびランダムアイテム(バイアスなし)に対する評価とからなる新しいデータセットを収集した。多数の実験結果のうち、LLMは、この特定タスクに対する教師あり学習を行っていない(ゼロショット)か、ごく少数のラベルしか用いない(フューショット)にもかかわらず、コールドスタートに近い状況において、純粋に言語ベースの嗜好(アイテム嗜好なし)に対し、アイテムベースCF手法と比較して競争力のある推薦性能を示すことがわかった。言語ベースの嗜好表現はアイテムベースやベクトルベースの表現よりも説明可能で精査しやすいため、これは特に有望である。

Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, modern dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.
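
To make the zero-shot/few-shot distinction concrete, here is a minimal sketch of how language-based preferences might be turned into a ranking prompt. The template wording and the `build_recommendation_prompt` helper are hypothetical illustrations, not the prompts used in the paper, and no particular LLM API is assumed.

```python
def build_recommendation_prompt(preference_text, candidate_items, examples=None):
    """Assemble a zero-shot prompt (examples=None) or a few-shot prompt.

    examples: optional list of dicts with keys "preferences" (str) and
    "liked_items" (list of str), shown as demonstrations before the query.
    """
    lines = []
    for ex in examples or []:
        # Each few-shot demonstration pairs stated preferences with liked items.
        lines.append(f"Preferences: {ex['preferences']}")
        lines.append(f"Recommended: {', '.join(ex['liked_items'])}")
        lines.append("")
    # The query: the target user's language-based preferences, no item history.
    lines.append(f"Preferences: {preference_text}")
    lines.append("Rank the following candidates from most to least suitable:")
    lines.extend(f"- {item}" for item in candidate_items)
    lines.append("Recommended:")
    return "\n".join(lines)
```

The zero-shot prompt contains only the target user's description; the few-shot variant prepends a handful of labeled demonstrations, which is the only "training" such a recommender receives.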

翻訳日:2023-07-27 12:09:50 公開日:2023-07-26

# 量子コンピューティングのdyadicフラグメントにおけるSum-Over-Pathsの書き換えと完全性

Rewriting and Completeness of Sum-Over-Paths in Dyadic Fragments of Quantum Computing ( http://arxiv.org/abs/2307.14223v1 )

ライセンス: Link先を確認

Renaud Vilmart

(参考訳) 「Sum-Over-Paths」形式は、量子系を記述する線形写像を記号的に操作する方法であり、そのような系の形式検証に用いられるツールである。ここでは、この形式のための新しい書き換え規則の集合を与え、それが量子力学の最も単純な近似的に普遍な断片である「Toffoli-Hadamard」に対して完全であることを示す。書き換えは停止性を持つが合流性を持たないこと(これは断片の普遍性から予想される)を示す。そのために、Sum-over-Pathsとグラフィカル言語ZH-calculusとの関係を利用し、また公理化が後者にどのように翻訳されるかも示す。提示した書き換え規則の一般化を与え、実際に項を簡約しようとする際に有用となり得ることを示すとともに、これらの新しい規則をグラフィカルに理解する方法を示す。量子フーリエ変換で特に用いられ、Toffoli-Hadamardゲート集合に$\pi$のdyadic倍の位相ゲートを加えることで得られる、量子計算のdyadic断片に対する完全性を達成するために書き換えシステムを拡張する方法を示す。最後に、ゲートベースの量子計算を解析するために設計されたシステムではネイティブではないが、ハミルトニアンベースの量子計算を考える際に必要となる、任意の項の和と連結の方法を示す。

The "Sum-Over-Paths" formalism is a way to symbolically manipulate linear maps that describe quantum systems, and is a tool that is used in formal verification of such systems. We give here a new set of rewrite rules for the formalism, and show that it is complete for "Toffoli-Hadamard", the simplest approximately universal fragment of quantum mechanics. We show that the rewriting is terminating, but not confluent (which is expected from the universality of the fragment). We do so using the connection between Sum-over-Paths and the graphical language ZH-calculus, and also show how the axiomatisation translates into the latter. We provide generalisations of the presented rewrite rules, which can prove useful when trying to reduce terms in practice, and we show how to graphically make sense of these new rules. We show how to enrich the rewrite system to reach completeness for the dyadic fragments of quantum computation, used in particular in the Quantum Fourier Transform, and obtained by adding phase gates with dyadic multiples of $\pi$ to the Toffoli-Hadamard gate-set. Finally, we show how to perform sums and concatenation of arbitrary terms, something which is not native in a system designed for analysing gate-based quantum computation, but necessary when considering Hamiltonian-based quantum computation.
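
As a hedged illustration in the commonly used sum-over-paths notation (the paper's exact conventions may differ), a term writes a linear map as a normalized sum over Boolean path variables. The Hadamard gate, the Toffoli gate, and a phase gate with a dyadic multiple of $\pi$ (here called $R_k$, a name chosen only for illustration) read:

$$
H\,|x\rangle = \frac{1}{\sqrt{2}}\sum_{y\in\{0,1\}}(-1)^{xy}\,|y\rangle,\qquad
\mathrm{Tof}\,|x_1x_2x_3\rangle = |x_1,\,x_2,\,x_3\oplus x_1x_2\rangle,\qquad
R_k\,|x\rangle = e^{2\pi i\,x/2^{k}}\,|x\rangle .
$$

A typical reduction step eliminates a path variable that occurs only linearly in the phase. Composing two Hadamards gives

$$
HH\,|x\rangle = \frac{1}{2}\sum_{y,z\in\{0,1\}}(-1)^{xy+yz}\,|z\rangle
= \sum_{z\in\{0,1\}}\Big(\frac{1}{2}\sum_{y\in\{0,1\}}(-1)^{y(x\oplus z)}\Big)|z\rangle
= |x\rangle ,
$$

because the inner sum equals $1$ when $z=x$ and $0$ otherwise. The paper's specific rule set is not reproduced here.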

翻訳日:2023-07-27 12:09:12 公開日:2023-07-26

# 普遍量子フォン・ノイマン構造に関する調査研究

A survey of universal quantum von Neumann architecture ( http://arxiv.org/abs/2307.14219v1 )

ライセンス: Link先を確認

Y.-T. Liu, K. Wang, Y.-D. Liu, D.-S. Wang

(参考訳) 普遍量子コンピュータの存在は理論的によく確立されている。しかし、実際の量子コンピュータシステムを構築するには、普遍性の理論に頼るだけでなく、プログラム可能性、モジュール性、スケーラビリティといった他の特性に対する要件を満たす方法も必要である。この目的のために、最近提案された量子フォン・ノイマン・アーキテクチャのモデルを、コンピュータシステムの階層的設計という実用的でより広い文脈に置いて検討する。量子CPUと量子制御ユニットの構造を分析し、それらと計算上の優位性との関係を導く。また、我々のモデルの最近の実証は20量子ビット未満で済むことも指摘する。

The existence of universal quantum computers has been theoretically well established. However, building a real quantum computer system relies not only on the theory of universality but also on methods that satisfy requirements on other features, such as programmability, modularity, and scalability. To this end, we study the recently proposed model of quantum von Neumann architecture by putting it in a practical and broader setting, namely, the hierarchical design of a computer system. We analyze the structures of quantum CPU and quantum control unit, and draw their connections with computational advantages. We also point out that a recent demonstration of our model would require fewer than 20 qubits.

翻訳日:2023-07-27 12:08:45 公開日:2023-07-26

# 資源制約下における従属プロセスのオンラインモデリングとモニタリング

Online Modeling and Monitoring of Dependent Processes under Resource Constraints ( http://arxiv.org/abs/2307.14208v1 )

ライセンス: Link先を確認

Tanapol Kosolwattana, Huazheng Wang, Ying Lin

(参考訳) 限られた資源の下で依存するプロセスの集団を監視することは異常な事象の検出に重要である。リスクの高いプロセスの活用と依存動力学の探索のための資源を適応的に割り当てる新しいオンライン協調学習手法を提案する。提案手法の有効性は理論解析と実験によって証明される。

Monitoring a population of dependent processes under limited resources is critical for abnormal event detection. A novel online collaborative learning method is proposed to adaptively allocate the resources for exploitation of high-risk processes and exploration of dependent dynamics. The efficiency of the proposed method is demonstrated through theoretical analysis and experiments.
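
The exploitation/exploration trade-off described above can be illustrated with a generic upper-confidence-bound (UCB) rule that can observe only `budget` of the processes per round. This is a swapped-in textbook technique (a top-k UCB heuristic) on simulated independent Bernoulli processes, not the paper's collaborative learning method; in particular it ignores the dependence between processes that the paper models.

```python
import math
import random


def ucb_monitoring(true_risk, budget, rounds, seed=0):
    """Each round, observe only `budget` processes, chosen by a UCB score on their risk.

    true_risk: per-process probability that an observation is abnormal (simulation only).
    Returns how many times each process was monitored.
    """
    rng = random.Random(seed)
    n = len(true_risk)
    counts = [0] * n          # number of observations per process
    means = [0.0] * n         # running estimate of each process's abnormal rate
    for t in range(1, rounds + 1):
        scores = [
            float("inf") if counts[i] == 0                           # try every process once
            else means[i] + math.sqrt(2 * math.log(t) / counts[i])   # estimate + exploration bonus
            for i in range(n)
        ]
        chosen = sorted(range(n), key=scores.__getitem__, reverse=True)[:budget]
        for i in chosen:
            abnormal = 1.0 if rng.random() < true_risk[i] else 0.0
            counts[i] += 1
            means[i] += (abnormal - means[i]) / counts[i]            # incremental mean update
    return counts
```

With two high-risk processes among five and a budget of two, almost all observations end up on the high-risk pair, while the exploration bonus still samples the others occasionally.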

翻訳日:2023-07-27 12:08:34 公開日:2023-07-26

# ディープフェイク画像による脳腫瘍分画の改善

Deepfake Image Generation for Improved Brain Tumor Segmentation ( http://arxiv.org/abs/2307.14273v1 )

ライセンス: Link先を確認

Roa'a Al-Emaryeen, Sara Al-Nahhas, Fatima Himour, Waleed Mahafza and Omar Al-Kadi

(参考訳) 技術と医療が進歩するにつれて、無症状の徴候を明らかにすることによる病気への認識が高まっている。腫瘍は生命を脅かす可能性があるため、早期に検出・治療することが重要である。コンピュータ支援技術は病気の診断が直面する長年の限界を克服するために用いられているが、脳腫瘍のセグメンテーションは、特にマルチモダリティデータが関わる場合、依然として難しいプロセスである。これは主に、データとそれに対応するラベルの不足による非効率な学習に起因する。本研究は、効果的な脳腫瘍セグメンテーションのためにディープフェイク画像生成を用いることの実現可能性を検討する。この目的のために、画像から画像への変換に敵対的生成ネットワーク(GAN)を用いてデータセットのサイズを拡大し、続いてディープフェイク画像で学習したU-Netベースの畳み込みニューラルネットワークを用いて画像セグメンテーションを行った。提案手法の性能を、4つの公開データセットの正解データと比較した。その結果、画像セグメンテーションの品質指標の面で性能が向上し、限られたデータで学習する際の助けとなる可能性が示された。

As technology and health care progress, awareness of disease improves through the detection of asymptomatic signs. Because tumors can be life-threatening, it is important to detect and treat them at an early stage. Computer-aided technologies are used to overcome lingering limitations in disease diagnosis, yet brain tumor segmentation remains difficult, especially when multi-modality data are involved. This is mainly attributed to ineffective training due to a lack of data and corresponding labels. This work investigates the feasibility of employing deepfake image generation for effective brain tumor segmentation. To this end, a Generative Adversarial Network was used for image-to-image translation to increase the dataset size, followed by image segmentation using a U-Net-based convolutional neural network trained with deepfake images. The performance of the proposed approach is compared with the ground truth of four publicly available datasets. Results show improved performance in terms of image segmentation quality metrics, suggesting the approach could assist when training with limited data.
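
The abstract does not name its segmentation quality metrics; the Dice coefficient and intersection-over-union (IoU) are common choices, so they are used here as an assumption. A minimal numpy sketch for binary masks:

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-8):
    """Dice coefficient and IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps avoids division by zero when both masks are empty
    dice = 2.0 * intersection / (pred.sum() + target.sum() + eps)
    iou = intersection / (union + eps)
    return float(dice), float(iou)
```

For a prediction covering two pixels of which one overlaps a one-pixel ground truth, Dice is 2/3 and IoU is 1/2; Dice always weights the overlap more generously than IoU.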

翻訳日:2023-07-27 12:00:48 公開日:2023-07-26

# 相互条件付き拘束コミットメントによる国際気候政策の改善

Improving International Climate Policy via Mutually Conditional Binding Commitments ( http://arxiv.org/abs/2307.14267v1 )

ライセンス: Link先を確認

Jobst Heitzig, Jörg Oechssler, Christoph Pröschel, Niranjana Ragavan, Yat Long Lo

(参考訳) パリ協定は気候交渉における重要なマイルストーンと見なされているが、ほとんどの国が決定する貢献(NDC)が無条件であるため、気候変動に効果的に対処するうえで課題に直面してきた。その結果、主要な汚染国の間でフリーライド行動が蔓延し、NDCには具体的な条件性が欠如している。この問題に対処するため、条件付きコミットメント・メカニズムと呼ばれる分散型のボトムアップ手法の実装を提案する。このメカニズムは全米一般投票州間協定(National Popular Vote Interstate Compact)に着想を得たもので、早期採用者に柔軟性とインセンティブを提供し、国際気候政策における条件付き協力を形式化することを目指している。本稿では、このメカニズムの概要とAI4ClimateCooperationチャレンジにおけるその性能を示し、実世界での実装に関する側面について議論する。気候緩和の集団行動問題、基本的な経済原理、ゲーム理論の概念に関する予備知識を前提とする。

The Paris Agreement, considered a significant milestone in climate negotiations, has faced challenges in effectively addressing climate change due to the unconditional nature of most Nationally Determined Contributions (NDCs). This has resulted in a prevalence of free-riding behavior among major polluters and a lack of concrete conditionality in NDCs. To address this issue, we propose the implementation of a decentralized, bottom-up approach called the Conditional Commitment Mechanism. This mechanism, inspired by the National Popular Vote Interstate Compact, offers flexibility and incentives for early adopters, aiming to formalize conditional cooperation in international climate policy. In this paper, we provide an overview of the mechanism and its performance in the AI4ClimateCooperation challenge, and discuss potential real-world implementation aspects. Prior knowledge of the climate mitigation collective action problem, basic economic principles, and game theory concepts is assumed.
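
The National Popular Vote Interstate Compact only takes effect once its members jointly reach a threshold. A minimal sketch of the same idea for climate pledges, under a hypothetical activation rule: each party's commitment becomes binding only if the committed parties (itself included) cover at least its stated share of global emissions. Party names, shares, and the rule itself are illustrative assumptions, not the mechanism's specification.

```python
def binding_coalition(shares, thresholds):
    """Largest set of parties whose conditional commitments all hold simultaneously.

    shares: party -> share of global emissions (0..1)
    thresholds: party -> minimum total share that must be committed, including itself
    Each condition is monotone (more members never hurt), so a party whose condition
    fails in the current set also fails in every subset of it; dropping all such
    parties repeatedly therefore converges to the largest consistent set.
    """
    active = set(shares)
    while True:
        covered = sum(shares[p] for p in active)
        failing = {p for p in active if covered < thresholds[p]}
        if not failing:
            return active
        active -= failing
```

The iteration also shows the cascade risk of conditional pledges: if one demanding party drops out, others whose conditions depended on it may follow, possibly down to an empty coalition.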

翻訳日:2023-07-27 12:00:30 公開日:2023-07-26

# 相互条件付き拘束コミットメントによる国際気候政策の改善

Improving International Climate Policy via Mutually Conditional Binding Commitments ( http://arxiv.org/abs/2307.14266v1 )

ライセンス: Link先を確認

Jobst Heitzig, Jörg Oechssler, Christoph Pröschel, Niranjana Ragavan, Richie YatLong Lo

(参考訳) 本稿では、国際気候政策交渉の現実性を高めるため、RICE-Nシミュレーションおよびマルチエージェント強化学習フレームワークの強化を提案する。このフレームワークの価値を認めつつ、気候交渉のモデル化における多様な要因に対処するには大幅な拡張が必要であることを強調する。「条件付きコミットメント・メカニズム」(CCFメカニズム)に関する我々のこれまでの研究に基づき、シミュレーションと現実のギャップを埋める方法を論じる。調整を強化するための推薦者またはプランナーエージェントの導入、社会的要因と非締約者(non-party)の利害関係者サブエージェントを組み込むことによるReal2Simギャップへの対処、および基盤となる強化学習の解法アルゴリズムの強化を提案する。これらの改善は、RICE-Nにおけるより効果的な国際気候政策の意思決定に向けた交渉プロトコルの評価と定式化を前進させることを目的としている。ただし、これらの提案の意義と有効性を判断するには、さらなる実験と検証が必要である。

This paper proposes enhancements to the RICE-N simulation and multi-agent reinforcement learning framework to improve the realism of international climate policy negotiations. Acknowledging the framework's value, we highlight the necessity of significant enhancements to address the diverse array of factors in modeling climate negotiations. Building upon our previous work on the "Conditional Commitments Mechanism" (CCF mechanism), we discuss ways to bridge the gap between simulation and reality. We suggest the inclusion of a recommender or planner agent to enhance coordination, address the Real2Sim gap by incorporating social factors and non-party stakeholder sub-agents, and propose enhancements to the underlying Reinforcement Learning solution algorithm. These proposed improvements aim to advance the evaluation and formulation of negotiation protocols for more effective international climate policy decision-making in RICE-N. However, further experimentation and testing are required to determine the implications and effectiveness of these suggestions.

翻訳日:2023-07-27 12:00:12 公開日:2023-07-26

# 拡散確率モデルを用いた組織像のアーティファクト復元

Artifact Restoration in Histology Images with Diffusion Probabilistic Models ( http://arxiv.org/abs/2307.14262v1 )

ライセンス: Link先を確認

Zhenqi He, Junjun He, Jin Ye, Yiqing Shen

(参考訳) 組織学的全スライド画像(WSI)は、組織の折り畳みや気泡などのアーティファクトによってしばしば損なわれ、病理医とコンピュータ支援診断(CAD)システムの双方にとって検査を難しくする。既存のアーティファクト画像の復元手法は敵対的生成ネットワーク(GAN)に限られており、復元プロセスは画像から画像への変換として定式化されている。これらの手法はモード崩壊や染色スタイルの予期せぬ誤変換に陥りやすく、不満足で非現実的な復元画像を生み出す。Innovatively, we make the first attempt at a denoising diffusion probabilistic model for histological artifact restoration, namely ArtiFusion. Specifically, ArtiFusion formulates the artifact region restoration as a gradual denoising process, and its training relies solely on artifact-free images to simplify the training complexity. Furthermore, to capture local-global correlations in the regional artifact restoration, a novel Swin-Transformer denoising architecture is designed, along with a time token scheme. 本研究の広範な評価は、組織解析の前処理手法としてのArtiFusionの有効性を示しており、復元過程においてアーティファクトのない領域の組織構造と染色スタイルを保持することに成功している。コードはhttps://github.com/zhenqi-he/ArtiFusionで入手できる。

Histological whole slide images (WSIs) are often compromised by artifacts, such as tissue folding and bubbles, which increase the examination difficulty for both pathologists and Computer-Aided Diagnosis (CAD) systems. Existing approaches to restoring artifact images are confined to Generative Adversarial Networks (GANs), where the restoration process is formulated as an image-to-image transfer. These methods are prone to mode collapse and unexpected mistransfer of the stain style, leading to unsatisfactory and unrealistic restored images. We make the first attempt at a denoising diffusion probabilistic model for histological artifact restoration, namely ArtiFusion. Specifically, ArtiFusion formulates the artifact region restoration as a gradual denoising process, and its training relies solely on artifact-free images to reduce training complexity. Furthermore, to capture local-global correlations in the regional artifact restoration, a novel Swin-Transformer denoising architecture is designed, along with a time token scheme. Our extensive evaluations demonstrate the effectiveness of ArtiFusion as a pre-processing method for histology analysis, which can successfully preserve the tissue structures and stain style in artifact-free regions during the restoration. Code is available at https://github.com/zhenqi-he/ArtiFusion.
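
For readers new to diffusion models, the standard DDPM forward (noising) process and one common way to confine generation to a masked region are summarized below. The masked blending step follows a generic RePaint-style recipe and is an assumption for illustration; the abstract does not say how ArtiFusion conditions on the artifact-free area.

$$
x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\qquad
\epsilon\sim\mathcal{N}(0, I),\qquad
\bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s),
$$

$$
x_{t-1} = m\odot x_{t-1}^{\mathrm{gen}} + (1-m)\odot x_{t-1}^{\mathrm{known}},
$$

where $\beta_s$ is the noise schedule, $m$ is a binary mask equal to $1$ on artifact pixels, $x_{t-1}^{\mathrm{gen}}$ is drawn from the learned reverse step $p_\theta(x_{t-1}\mid x_t)$, and $x_{t-1}^{\mathrm{known}}$ is the observed image noised to level $t-1$ with the forward process. Under such a scheme the denoiser only ever needs to model clean tissue, which is one way training on artifact-free images alone can suffice.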

翻訳日:2023-07-27 11:59:56 公開日:2023-07-26

# 視覚トランスフォーマーのスパース・ダブル降下:リアルかファントムか?

Sparse Double Descent in Vision Transformers: real or phantom threat? ( http://arxiv.org/abs/2307.14253v1 )

ライセンス: Link先を確認

Victor Quétu, Marta Milovanovic and Enzo Tartaglione

(参考訳) 視覚変換器(ViT)は、近年の理論的・実証的研究において広く関心を集めている。ViTは注意機構に基づくアプローチにより最先端の性能を達成しており、帰納バイアスを回避できることで画像内の重要な特徴やパターンの識別が促進され、非常に正確な画像解析が実現される。一方、最近の研究では、現代の深層学習モデルで起こりうる「sparse double descent(スパース二重降下)」現象が報告されており、そこでは極端に過剰パラメータ化されたモデルがよく汎化しうる。これにより、モデルの最適なサイズについて実際的な疑問が生じ、スパース性と性能の最良のトレードオフを見つける探求が始まる。視覚変換器もスパース二重降下を起こしやすいのか? そのような現象を回避する方法はあるのか? 本研究はViTにおけるスパース二重降下の発生に取り組む。ResNetのような従来のアーキテクチャではスパース二重降下現象が避けられないことを示した研究があるにもかかわらず、ViTでは最適に調整された$\ell_2$正則化がそのような現象を緩和することを観察する。しかし、何事にも代償がある。最適なラムダはViTの潜在的な圧縮を犠牲にする。

Vision transformers (ViTs) have attracted broad interest in recent theoretical and empirical work. They are state-of-the-art thanks to their attention-based approach, which, by avoiding inductive bias, improves the identification of key features and patterns within images and yields highly accurate image analysis. Meanwhile, recent studies have reported a "sparse double descent" phenomenon that can occur in modern deep-learning models, where extremely over-parametrized models can generalize well. This raises practical questions about the optimal size of the model and motivates the search for the best trade-off between sparsity and performance: are Vision Transformers also prone to sparse double descent? Can we find a way to avoid such a phenomenon? Our work tackles the occurrence of sparse double descent on ViTs. Although some works have shown that traditional architectures, like ResNet, are condemned to the sparse double descent phenomenon, for ViTs we observe that an optimally tuned $\ell_2$ regularization relieves such a phenomenon. However, everything comes at a cost: the optimal lambda sacrifices the potential compression of the ViT.
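
Sparse double descent is typically traced by pruning a trained model to increasing sparsity levels and re-measuring test error. The numpy sketch below shows global magnitude pruning, a criterion commonly used in such studies (assumed here, not confirmed by the abstract), together with the $\ell_2$-regularized loss $\mathcal{L}(w) + \frac{\lambda}{2}\lVert w\rVert_2^2$ whose coefficient $\lambda$ the abstract says must be tuned. Training itself is omitted.

```python
import numpy as np


def global_magnitude_masks(weights, sparsity):
    """Masks that remove the `sparsity` fraction of smallest-magnitude weights globally.

    weights: list of arrays (one per layer); a single threshold is shared by all layers.
    """
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(round(sparsity * flat.size))  # number of weights to remove
    if k == 0:
        return [np.ones_like(w, dtype=bool) for w in weights]
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    # Weights tied with the threshold are pruned too, so slightly more than k may go.
    return [np.abs(w) > threshold for w in weights]


def l2_regularized_loss(data_loss, weights, lam):
    """Data loss plus (lam / 2) times the squared L2 norm of all weights."""
    return data_loss + 0.5 * lam * sum(float(np.sum(w ** 2)) for w in weights)
```

Stronger $\lambda$ shrinks weights and can smooth the error-versus-sparsity curve, but it also changes which weights survive pruning, which is consistent with the trade-off against compression noted above.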

翻訳日:2023-07-27 11:59:34 公開日:2023-07-26

# 段差をもつ調和振動子と等スペクトル性

Harmonic Oscillator with a Step and Isospectrality ( http://arxiv.org/abs/2307.14251v1 )

ライセンス: Link先を確認

Yuta Nasuda, Nobuyuki Sawado

(参考訳) 原点で有限の跳び$a$をもつ調和振動子ポテンシャルに対する一次元Schrödinger方程式を調べる。解は通常の波動関数接続の手法を用いて構成される。$a$の特別な選択 $a=4\ell$ ($\ell=1,2,\ldots$) に対して、波動関数はエルミート多項式で表すことができる。さらに、Darboux変換によるポテンシャルの等スペクトル変形についても検討する。この文脈において、通常の調和振動子と等スペクトルな無限個のハミルトニアンが得られる。

We investigate the one-dimensional Schrödinger equation for a harmonic oscillator with a finite jump $a$ at the origin. The solution is constructed by employing the ordinary matching-of-wavefunctions technique. For the special choices $a=4\ell$ ($\ell=1,2,\ldots$), the wavefunctions can be expressed by the Hermite polynomials. Moreover, we explore isospectral deformations of the potential via the Darboux transformation. In this context, infinitely many Hamiltonians isospectral to the ordinary harmonic oscillator are obtained.
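
A short argument under assumed conventions shows why $a=4\ell$ is special. Take units in which $H=-\frac{d^2}{dx^2}+x^2+a\,\Theta(x)$, so the ordinary oscillator has levels $2n+1$ with eigenfunctions $H_n(x)e^{-x^2/2}$, and put the step on the right (both choices are assumptions; the paper's conventions may differ). For $E=2n+1$ with odd $n$, the right half needs $E-a=2m+1$, i.e. $m=n-2\ell$, which is again odd. Odd Hermite functions vanish at the origin, so $\psi(0)=0$ matches automatically and $\psi'(0)$ is matched by rescaling one side; this yields at least one family of exact eigenstates built from Hermite polynomials. The numpy sketch below checks this numerically with a finite-difference Hamiltonian.

```python
import numpy as np


def step_oscillator_levels(a, half_width=8.0, n_points=1500, n_levels=10):
    """Lowest eigenvalues of H = -d^2/dx^2 + x^2 + a*Theta(x) (assumed units and step side).

    Second-order finite differences on [-half_width, half_width]; the wavefunction
    is implicitly set to zero just beyond both ends of the grid.
    """
    x = np.linspace(-half_width, half_width, n_points)
    h = x[1] - x[0]
    potential = x**2 + a * (x > 0)          # harmonic well plus a jump of height a for x > 0
    main_diag = 2.0 / h**2 + potential      # diagonal of -d^2/dx^2 plus the potential
    off_diag = -np.ones(n_points - 1) / h**2
    hamiltonian = np.diag(main_diag) + np.diag(off_diag, 1) + np.diag(off_diag, -1)
    return np.linalg.eigvalsh(hamiltonian)[:n_levels]   # eigvalsh returns ascending values
```

For $a=0$ the lowest levels approach $1, 3, 5$; for $a=4$ ($\ell=1$) the computed spectrum contains values close to $7$ and $11$, the $n=3$ ($m=1$) and $n=5$ ($m=3$) states predicted by the argument above.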

翻訳日:2023-07-27 11:59:14 公開日:2023-07-26

# 説明可能な人工知能(XAI)の評価手法の新しい展望

A New Perspective on Evaluation Methods for Explainable Artificial Intelligence (XAI) ( http://arxiv.org/abs/2307.14246v1 )

ライセンス: Link先を確認

Timo Speith, Markus Langer

翻訳日:2023-07-27 11:59:04 公開日:2023-07-26

# 蛍光ニューロン v2: 顕微鏡における深層学習のためのマルチタスク・マルチフォームアノテーション

Fluorescent Neuronal Cells v2: Multi-Task, Multi-Format Annotations for Deep Learning in Microscopy ( http://arxiv.org/abs/2307.14243v1 )

ライセンス: Link先を確認

Luca Clissa, Antonio Macaluso, Roberto Morelli, Alessandra Occhinegro, Emiliana Piscitiello, Ludovico Taddei, Marco Luppi, Roberto Amici, Matteo Cerri, Timna Hitrec, Lorenzo Rinaldi, Antonio Zoccoli

(参考訳) Fluorescent Neuronal Cells v2は、生命科学と深層学習の領域における革新的な研究を促進するために設計された、蛍光顕微鏡画像とそれに対応する正解(ground-truth)アノテーションのコレクションである。このデータセットは3つの画像コレクションを含み、そこではげっ歯類の神経細胞の核と細胞質が様々なマーカーで染色され、解剖学的または機能的特徴が強調されている。画像に加えて、セマンティックセグメンテーション、物体検出、計数を含むいくつかの学習タスクのための正解アノテーションを提供する。本研究の貢献は2つある。第一に、アノテーションの多様性と利用しやすい形式を踏まえ、セグメンテーション、検出、特徴学習、教師なし・自己教師あり学習、転移学習、および関連分野におけるコンピュータビジョン手法の方法論的進歩を促進することを想定している。第二に、広範な探索とベンチマークを可能にすることで、Fluorescent Neuronal Cells v2が蛍光顕微鏡解析におけるブレークスルーを促し、生命科学における最先端の発見を後押しすることを期待する。データは以下で入手可能である: https://amsacta.unibo.it/id/eprint/7347

Fluorescent Neuronal Cells v2 is a collection of fluorescence microscopy images and the corresponding ground-truth annotations, designed to foster innovative research in the domains of Life Sciences and Deep Learning. This dataset encompasses three image collections in which rodent neuronal cells' nuclei and cytoplasm are stained with diverse markers to highlight their anatomical or functional characteristics. Alongside the images, we provide ground-truth annotations for several learning tasks, including semantic segmentation, object detection, and counting. The contribution is two-fold. First, given the variety of annotations and their accessible formats, we envision our work facilitating methodological advancements in computer vision approaches for segmentation, detection, feature learning, unsupervised and self-supervised learning, transfer learning, and related areas. Second, by enabling extensive exploration and benchmarking, we hope Fluorescent Neuronal Cells v2 will catalyze breakthroughs in fluorescence microscopy analysis and promote cutting-edge discoveries in life sciences. The data are available at: https://amsacta.unibo.it/id/eprint/7347
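
The annotation types listed above are related: from a binary segmentation mask one can derive a count and bounding boxes by labeling connected components. The sketch assumes 4-connectivity and a mask where 1 marks cell pixels; the dataset's actual file formats, and any splitting of touching cells, are not modeled.

```python
from collections import deque

import numpy as np


def components_to_boxes(mask):
    """Label 4-connected foreground components; return (count, boxes).

    Each box is (row_min, col_min, row_max, col_max), inclusive, in discovery order.
    """
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    rows, cols = mask.shape
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r, c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return len(boxes), boxes
```

The same routine turns a segmentation label into detection boxes and a count label, which is one reason a single dataset can serve all three tasks.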

翻訳日:2023-07-27 11:58:50 公開日:2023-07-26

# ジョイント領域ローカライズとインパインティングによる敵パッチの防御

Defending Adversarial Patches via Joint Region Localizing and Inpainting ( http://arxiv.org/abs/2307.14242v1 )

ライセンス: Link先を確認

Junwen Chen, Xingxing Wei

(参考訳) ディープニューラルネットワークは様々なアプリケーションでうまく使われているが、敵対的事例に対する脆弱性を示している。敵対的パッチの発展により、物理的シーンにおける攻撃の実現可能性が高まっており、パッチ攻撃に対する防御が急務となっている。しかし、このような敵対的パッチ攻撃の防御は依然として未解決の問題である。本稿では敵対的パッチの性質を分析し、次の2点を見出した。一方で、敵対的パッチは対象物体の外観や文脈の不整合をもたらし、他方で、パッチ領域はバックボーンネットワークが抽出した物体の高レベル特徴マップ上に異常な変化を示す。これら2点を考慮し、入力例を前処理する「localizing and inpainting」機構に基づく新たな防御手法を提案する。具体的には統一フレームワークを設計し、「localizing」サブネットワークは上記2つの側面を表す2分岐構造を用いて画像中の敵対的パッチ領域を正確に検出する。「inpainting」サブネットワークは周囲の文脈的手がかりを利用して、敵対的パッチに覆われた元の内容を復元する。インペイントされた画像の品質は、外観の一貫性と敵対的攻撃の影響を測定することでも評価される。これら2つのサブネットワークは反復最適化により共同で学習される。これにより「localizing」モジュールと「inpainting」モジュールは互いに密接に相互作用し、より良い解を学習できる。様々な敵対的パッチ攻撃に対する防御として、交通標識の分類および検出タスクに関する一連の実験を行った。

Deep neural networks are successfully used in various applications, but they are vulnerable to adversarial examples. With the development of adversarial patches, the feasibility of attacks in physical scenes increases, and defenses against patch attacks are urgently needed. However, defending against such adversarial patch attacks is still an unsolved problem. In this paper, we analyse the properties of adversarial patches and find that, on the one hand, adversarial patches lead to appearance or contextual inconsistency in the target objects; on the other hand, the patch region shows abnormal changes on the high-level feature maps of the objects extracted by a backbone network. Considering these two points, we propose a novel defense method based on a "localizing and inpainting" mechanism to pre-process the input examples. Specifically, we design a unified framework in which the "localizing" sub-network uses a two-branch structure to represent the above two aspects and accurately detect the adversarial patch region in the image. The "inpainting" sub-network uses the surrounding contextual cues to recover the original content covered by the adversarial patch. The quality of the inpainted images is also evaluated by measuring the appearance consistency and the effects of adversarial attacks. These two sub-networks are then jointly trained in an iterative optimization manner. In this way, the "localizing" and "inpainting" modules can interact closely with each other and thus learn a better solution. A series of experiments on traffic sign classification and detection tasks is conducted to defend against various adversarial patch attacks.

翻訳日:2023-07-27 11:58:28 公開日:2023-07-26

# DisguisOR:手術室の全体的な顔匿名化

DisguisOR: Holistic Face Anonymization for the Operating Room ( http://arxiv.org/abs/2307.14241v1 )

ライセンス: Link先を確認

Lennart Bastian, Tony Danjun Wang, Tobias Czempiel, Benjamin Busam and Nassir Navab

(参考訳) 目的: 外科データサイエンス(SDS)の近年の進歩により、病院環境からのビデオ記録が増加している。手術ワークフロー認識などの手法は患者ケアの質を高める可能性を示しているが、ビデオデータの量は画像を手作業で匿名化できる規模を超えている。既存の2次元自動匿名化手法は、遮蔽や障害物のため手術室(OR)では十分な性能を発揮しない。我々は、複数のカメラストリームからの3Dデータを用いてマルチビューのOR記録を匿名化することを提案する。方法: 複数のカメラからのRGB画像と深度画像を、シーンの3D点群表現に融合する。次に、検出された3D人体キーポイントにパラメトリック人体メッシュモデルを回帰させ、顔メッシュを融合した3D点群と位置合わせすることで、各人物の顔を3Dで検出する。メッシュモデルは取得した各カメラビューにレンダリングされ、各人物の顔を置き換える。結果: 本手法は既存の手法よりも高い割合で顔の位置を特定できる可能性を示す。DisguisORは各カメラビューに対して幾何学的に一貫した匿名化を生成し、下流タスクへの悪影響が少ない、より現実的な匿名化を可能にする。結論: 手術室では遮蔽や混雑が頻繁に起こるため、既製の匿名化手法には大きな改善の余地がある。DisguisORはシーンレベルでプライバシーに対処し、SDSにおけるさらなる研究を促進する可能性がある。

Purpose: Recent advances in Surgical Data Science (SDS) have contributed to an increase in video recordings from hospital environments. While methods such as surgical workflow recognition show potential in increasing the quality of patient care, the quantity of video data has surpassed the scale at which images can be manually anonymized. Existing automated 2D anonymization methods underperform in Operating Rooms (OR) due to occlusions and obstructions. We propose to anonymize multi-view OR recordings using 3D data from multiple camera streams. Methods: RGB and depth images from multiple cameras are fused into a 3D point cloud representation of the scene. We then detect each individual's face in 3D by regressing a parametric human mesh model onto detected 3D human keypoints and aligning the face mesh with the fused 3D point cloud. The mesh model is rendered into every acquired camera view, replacing each individual's face. Results: Our method shows promise in locating faces at a higher rate than existing approaches. DisguisOR produces geometrically consistent anonymizations for each camera view, enabling more realistic anonymization that is less detrimental to downstream tasks. Conclusion: Frequent obstructions and crowding in operating rooms leave significant room for improvement for off-the-shelf anonymization methods. DisguisOR addresses privacy on a scene level and has the potential to facilitate further research in SDS.
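
The first step of the method, fusing RGB-D views into one point cloud, begins by back-projecting every depth pixel into a shared world frame. A minimal numpy sketch, assuming an ideal pinhole camera with intrinsics $(f_x, f_y, c_x, c_y)$, metric depth along the optical axis, and a known 4×4 camera-to-world matrix per camera; lens distortion, color, and outlier filtering are omitted, and this is not DisguisOR's code.

```python
import numpy as np


def depth_to_world_points(depth, fx, fy, cx, cy, cam_to_world):
    """Back-project an (H, W) depth map to an (N, 3) array of world-frame points.

    Pixels with depth <= 0 are treated as missing and skipped.
    """
    h, w = depth.shape
    v, u = np.indices((h, w))              # v: row index, u: column index
    z = depth.astype(float)
    valid = z > 0
    x = (u - cx) * z / fx                  # inverts the pinhole model u = fx * X / Z + cx
    y = (v - cy) * z / fy
    pts_cam = np.stack([x[valid], y[valid], z[valid]], axis=1)
    # Homogeneous coordinates, then apply the 4x4 rigid transform (row-vector form).
    pts_h = np.hstack([pts_cam, np.ones((pts_cam.shape[0], 1))])
    return (pts_h @ cam_to_world.T)[:, :3]


def fuse_views(views):
    """Concatenate points from several calibrated cameras into one cloud.

    views: list of dicts with the keyword arguments of depth_to_world_points.
    """
    return np.vstack([depth_to_world_points(**view) for view in views])
```

Because every camera's points land in the same frame, a face mesh fitted once in 3D can later be rendered back into each view consistently, which is the property the abstract emphasizes.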

翻訳日:2023-07-27 11:57:56 公開日:2023-07-26

# 意見要約における意見の有病率の自動評価

Automatically Evaluating Opinion Prevalence in Opinion Summarization ( http://arxiv.org/abs/2307.14305v1 )

ライセンス: Link先を確認

Christopher Malon

(参考訳) 大量の製品レビューに直面したとき、人間がそのすべてを記憶し、意見を代表的に重み付けして良い参照要約を書けるかどうかは明らかではない。本稿では、要約中の各文と整合するレビューの数を数え、自明な文や冗長な文を割り引くことで、要約が表現する意見の普及度(prevalence)を検証する自動尺度を提案する。この意見普及度尺度を定式化するために、個々のソースレビューに対する要約文の事実整合性をスコア化する既存手法をいくつか検討する。Amazonの製品レビューのコーパスについて、意見整合性に関する複数の人手判断を収集し、製品レビューにおける整合性をどの自動尺度が最もよく表すかを決定する。得られた意見普及度尺度を用いて、人間が書いた要約はソースレビューからランダムに選んだ抽出よりもわずかに高い意見普及度しか持たないこと、また従来の抽出型・抽象型の教師なし意見要約手法は人間よりも劣ることを示す。人間の2倍の意見普及度を達成する抽出要約の貪欲な構成により、改善の余地があることを示す。最後に、ソースレビューを簡略化によって前処理することで、既存の抽象型意見要約システムが達成する意見普及度を人間の水準まで引き上げられることを示す。

When faced with a large number of product reviews, it is not clear that a human can remember all of them and weight opinions representatively to write a good reference summary. We propose an automatic metric to test the prevalence of the opinions that a summary expresses, based on counting the number of reviews that are consistent with each statement in the summary, while discrediting trivial or redundant statements. To formulate this opinion prevalence metric, we consider several existing methods to score the factual consistency of a summary statement with respect to each individual source review. On a corpus of Amazon product reviews, we gather multiple human judgments of the opinion consistency, to determine which automatic metric best expresses consistency in product reviews. Using the resulting opinion prevalence metric, we show that a human authored summary has only slightly better opinion prevalence than randomly selected extracts from the source reviews, and previous extractive and abstractive unsupervised opinion summarization methods perform worse than humans. We demonstrate room for improvement with a greedy construction of extractive summaries with twice the opinion prevalence achieved by humans. Finally, we show that preprocessing source reviews by simplification can raise the opinion prevalence achieved by existing abstractive opinion summarization systems to the level of human performance.
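
One plausible reading of the prevalence metric, sketched under explicit assumptions: given a precomputed consistency matrix (summary statements × source reviews, produced by whichever factual-consistency scorer is chosen), each statement earns the fraction of reviews consistent with it, while trivial or redundant statements earn zero but still count toward the summary's length. How the paper detects trivial and redundant statements, and its exact normalization, are not given in the abstract, so those flags are simply inputs here.

```python
import numpy as np


def opinion_prevalence(consistent, discredited):
    """Mean fraction of reviews supporting each statement; discredited statements score 0.

    consistent: bool array (n_statements, n_reviews), True if a review supports a statement
    discredited: bool array (n_statements,), True for trivial or redundant statements
    """
    consistent = np.asarray(consistent, dtype=bool)
    if consistent.shape[0] == 0:
        return 0.0
    support = consistent.mean(axis=1)                    # share of reviews backing each statement
    support[np.asarray(discredited, dtype=bool)] = 0.0   # trivial/redundant statements earn nothing
    return float(support.mean())                         # ...but still count in the denominator
```

In a small example with three statements and four reviews, a statement backed by every review but flagged as trivial (e.g. "This is a phone") contributes nothing, so supports of 0.5, 0 and 0.25 give a prevalence of 0.25.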

翻訳日:2023-07-27 11:52:31 公開日:2023-07-26

# 最適エネルギー貯蔵システムディスパッチのための制約強制深層強化学習フレームワーク

A Constraint Enforcement Deep Reinforcement Learning Framework for Optimal Energy Storage Systems Dispatch ( http://arxiv.org/abs/2307.14304v1 )

ライセンス: Link先を確認

Shengren Hou and Edgar Mauricio Salazar Duque and Peter Palensky and Pedro P. Vergara

(参考訳) エネルギー貯蔵システム(ESS)の最適な運用(ディスパッチ)は、動的価格、需要消費、再生可能エネルギー発電の変動がもたらす不確実性のため、大きな課題となる。深層ニューラルネットワーク(DNN)の汎化能力を活用することで、深層強化学習(DRL)アルゴリズムは、配電網の確率的性質に適応的に対応する良質な制御モデルを学習できる。しかし、現在のDRLアルゴリズムは運用上の制約を厳密に強制する能力を欠いており、実行不可能な制御行動を出力することさえ多い。この問題に対処するため、オンライン運用中に環境と行動空間の運用制約を厳密に強制しつつ、連続行動空間を効果的に扱うDRLフレームワークを提案する。まず、提案フレームワークはDNNでモデル化された行動価値関数を学習する。次に、この行動価値関数を混合整数計画(MIP)として定式化し、環境の運用制約を考慮できるようにする。包括的な数値シミュレーションにより、提案したMIP-DRLフレームワークが、最先端のDRLアルゴリズムおよび確率変数を完全に予測した場合に得られる最適解と比較して、すべての制約を効果的に強制しつつ高品質なディスパッチ決定を行い、優れた性能を示すことを確認した。

The optimal dispatch of energy storage systems (ESSs) presents formidable challenges due to the uncertainty introduced by fluctuations in dynamic prices, demand consumption, and renewable-based energy generation. By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adaptively respond to distribution networks' stochastic nature. However, current DRL algorithms lack the capability to enforce operational constraints strictly, often even providing infeasible control actions. To address this issue, we propose a DRL framework that effectively handles continuous action spaces while strictly enforcing the environment's and action space's operational constraints during online operation. First, the proposed framework trains an action-value function modeled using DNNs. Subsequently, this action-value function is formulated as a mixed-integer program (MIP), enabling the consideration of the environment's operational constraints. Comprehensive numerical simulations show the superior performance of the proposed MIP-DRL framework, which effectively enforces all constraints while delivering high-quality dispatch decisions when compared with state-of-the-art DRL algorithms and the optimal solution obtained with a perfect forecast of the stochastic variables.
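
As background, one standard way to embed a trained ReLU network in a mixed-integer program (not necessarily the paper's exact encoding) replaces each hidden unit $h=\max(0,\,w^\top z+b)$ with a binary indicator $\sigma$ and big-M constraints, where $M$ bounds $|w^\top z+b|$ over the feasible inputs:

$$
h \ge w^\top z + b,\qquad h \ge 0,\qquad h \le w^\top z + b + M(1-\sigma),\qquad h \le M\sigma,\qquad \sigma\in\{0,1\}.
$$

If $\sigma=1$ the constraints force $h=w^\top z+b\ge 0$; if $\sigma=0$ they force $h=0$ and $w^\top z+b\le 0$. The dispatch action then becomes a decision variable constrained alongside the network, for example with an illustrative state-of-charge model (symbols assumed here: charging power $p_t$, efficiency $\eta$, capacity $E^{\max}$, time step $\Delta t$):

$$
\mathrm{SOC}_{t+1} = \mathrm{SOC}_t + \frac{\eta\,p_t\,\Delta t}{E^{\max}},\qquad
\underline{\mathrm{SOC}} \le \mathrm{SOC}_{t+1} \le \overline{\mathrm{SOC}},\qquad
-\bar p \le p_t \le \bar p .
$$

Maximizing the network's output $Q(s_t, p_t)$ subject to these constraints returns an action that is feasible by construction.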

翻訳日:2023-07-27 11:52:10 公開日:2023-07-26

# ホテル・ホスピタリティにおけるパーソナライズドレコメンデーションの管理と提供のためのChatGPTと説得技術

ChatGPT and Persuasive Technologies for the Management and Delivery of Personalized Recommendations in Hotel Hospitality ( http://arxiv.org/abs/2307.14298v1 )

ライセンス: Link先を確認

Manolis Remountakis, Konstantinos Kotis, Babis Kourtzis, and George E. Tsekouras

(参考訳) レコメンダシステムはホテル・ホスピタリティ業界に不可欠なツールとなり、ゲストにパーソナライズされ、個々に合わせた体験を提供している。近年、ChatGPTなどの大規模言語モデル(LLM)や説得技術の進歩により、これらのシステムの有効性を高める新たな道が開かれた。本稿では、ホテル・ホスピタリティのレコメンダシステムを自動化・改善するために、ChatGPTと説得技術を統合する可能性を検討する。第1に、人間のようなテキストを理解・生成でき、より正確で文脈を考慮したレコメンデーションを可能にするChatGPTの能力を詳しく調べる。ChatGPTのレコメンダシステムへの統合について論じ、ユーザの嗜好を分析し、オンラインレビューから有益な知見を抽出し、ゲストプロフィールに基づいてパーソナライズされたレコメンデーションを生成する能力を強調する。第2に、ユーザの行動に影響を与え、ホテルのレコメンデーションの説得効果を高めるうえでの説得技術の役割を検討する。社会的証明、希少性、パーソナライゼーションといった説得手法を取り入れることで、レコメンダシステムはユーザの意思決定に効果的に影響を与え、特定のホテルの予約や部屋のアップグレードといった望ましい行動を促すことができる。ChatGPTと説得技術の有効性を検証するため、ホテルレコメンダシステムを対象とした事例研究によるパイロット実験を示す。ChatGPTと説得技術の統合が、ユーザのエンゲージメント、満足度、コンバージョン率に与える影響を調べることを目的とする。予備的な結果は、これらの技術がゲスト体験全体とビジネスパフォーマンスを向上させる可能性を示している。全体として本稿は、レコメンダシステムにおけるLLMと説得技術の相乗関係を探ることで、最終的にゲストの満足度とホテルの収益に影響を与える、ホテル・ホスピタリティ分野に貢献する。

Recommender systems have become indispensable tools in the hotel hospitality industry, enabling personalized and tailored experiences for guests. Recent advancements in large language models (LLMs), such as ChatGPT, and in persuasive technologies have opened new avenues for enhancing the effectiveness of those systems. This paper explores the potential of integrating ChatGPT and persuasive technologies for automating and improving hotel hospitality recommender systems. First, we delve into the capabilities of ChatGPT, which can understand and generate human-like text, enabling more accurate and context-aware recommendations. We discuss the integration of ChatGPT into recommender systems, highlighting the ability to analyze user preferences, extract valuable insights from online reviews, and generate personalized recommendations based on guest profiles. Second, we investigate the role of persuasive technology in influencing user behavior and enhancing the persuasive impact of hotel recommendations. By incorporating persuasive techniques, such as social proof, scarcity, and personalization, recommender systems can effectively influence user decision-making and encourage desired actions, such as booking a specific hotel or upgrading their room. To investigate the efficacy of ChatGPT and persuasive technologies, we present a pilot experiment with a case study involving a hotel recommender system. We aim to study the impact of integrating ChatGPT and persuasive techniques on user engagement, satisfaction, and conversion rates. The preliminary results demonstrate the potential of these technologies in enhancing the overall guest experience and business performance. Overall, this paper contributes to the field of hotel hospitality by exploring the synergistic relationship between LLMs and persuasive technology in recommender systems, ultimately influencing guest satisfaction and hotel revenue.

翻訳日:2023-07-27 11:51:47 公開日:2023-07-26

# 逐次データ分割の複雑さを解き明かす:ビデオと時系列分析における課題に取り組む

Unraveling the Complexity of Splitting Sequential Data: Tackling Challenges in Video and Time Series Analysis ( http://arxiv.org/abs/2307.14294v1 )

ライセンス: Link先を確認

Diego Botache, Kristina Dingel, Rico Huhnstock, Arno Ehresmann, Bernhard Sick

(参考訳) ビデオや時系列などの逐次データの分割は、物体追跡や異常検出など、さまざまなデータ分析タスクにおいて不可欠なステップである。しかし、逐次データの分割には、その後の分析の正確性と信頼性に影響しうるさまざまな課題がある。本コンセプト論文では、データ取得、データ表現、分割比の選択、品質基準の設定、適切な選択戦略の選択など、逐次データの分割に関わる課題を考察する。これらの課題を、モーターテストベンチと液体中の粒子追跡という2つの実例を通して探る。

Splitting of sequential data, such as videos and time series, is an essential step in various data analysis tasks, including object tracking and anomaly detection. However, splitting sequential data presents a variety of challenges that can impact the accuracy and reliability of subsequent analyses. This concept article examines the challenges associated with splitting sequential data, including data acquisition, data representation, split ratio selection, setting up quality criteria, and choosing suitable selection strategies. We explore these challenges through two real-world examples: motor test benches and particle tracking in liquids.
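Split ratio selection is one of the challenges listed above. As an illustration only (not the authors' procedure), a common baseline for sequential data is a chronological split that keeps each block contiguous and leaves a gap between blocks, so that overlapping windows or adjacent video frames do not end up on both sides of a boundary. The function name, the default ratios, and the gap parameter are assumptions.

```python
# Illustrative sketch (assumption): chronological train/val/test split with a gap;
# not the splitting strategy proposed in the article.


def chronological_split(n_samples, ratios=(0.7, 0.15, 0.15), gap=0):
    """Split indices 0..n_samples-1 into contiguous blocks in time order.

    `gap` samples are dropped between consecutive blocks so that windows or
    frames straddling a boundary do not leak information across splits.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    usable = n_samples - gap * (len(ratios) - 1)
    if usable <= 0:
        raise ValueError("gap too large for the number of samples")
    sizes = [int(r * usable) for r in ratios]
    sizes[-1] = usable - sum(sizes[:-1])  # give rounding leftovers to the last block
    splits, start = [], 0
    for size in sizes:
        splits.append(list(range(start, start + size)))
        start += size + gap  # skip `gap` samples before the next block
    return splits
```

A random shuffle would instead mix past and future samples of the same sequence into training and test sets, which is one way the accuracy of subsequent analyses can be overstated.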

翻訳日:2023-07-27 11:51:12 公開日:2023-07-26

# 言語学における数学的拡散モデルの構築:イタリア北東部方言におけるドイツ語統語的特徴の事例研究

Founding a mathematical diffusion model in linguistics. The case study of German syntactic features in the North-Eastern Italian dialects ( http://arxiv.org/abs/2307.14291v1 )

ライセンス: Link先を確認

I. Lazzizzera

(参考訳) 中世盛期にドイツ系住民がチロルへ移住した後に生じた、イタリア北東部のロマンス諸方言へのゲルマン語統語的特徴の拡散を事例研究として取り上げる。地理データサイエンスと呼ばれる分野のツールを用いてインタラクティブマップを作成する。滑らかな2次元曲面 $\mathcal{G}$ は、ある領域のうちどの割合が特定のドイツ語的特徴を用いているかを局所的に表すものであり、調査した各地点でその特徴が用いられているか否かを示す離散関数を補間して得られる。この曲面 $\mathcal{G}$ は、2次元における拡散対流現象(ここでは「潮汐(tidal)」モードと呼ぶ)を記述する関数の現時点での値とみなされ、熱拡散など多くの現象論的事実に対して物理学で用いられるのと同じ方程式に、適切に文脈化したうえで、ごく自然な形で従う。この方程式の解を現時点で評価すると、$\mathcal{G}$ によって補間されたデータとよく一致し、単純化と近似を伴うものの、事例研究における言語的特徴の拡散対流について説得力のある描像が得られることを示す。さらに非常に重要な点として、シュミットの「波」が拡散方程式の解の中に数えられることを示す。すなわち、シュミットの「波」を「潮汐的な氾濫」に重ね合わせることで、実際の言語拡散事象の複雑さを再現できる。

We take as a case study the spread of Germanic syntactic features into Romance dialects of North-Eastern Italy, which occurred after the immigration of German people into the Tyrol during the High Middle Ages. An interactive map is produced using tools of what is called Geographic Data Science. A smooth two-dimensional surface $\mathcal{G}$ expresses locally which fraction of territory uses a given German language feature: it is obtained by interpolating a discrete function that says whether or not that feature is used at each surveyed locality. This surface $\mathcal{G}$ is thought of as the value at the present time of a function describing a diffusion-convection phenomenon in two dimensions (here called the *tidal* mode), which is subject in a very natural way to the same equation, suitably contextualized, that is used in physics for a number of phenomenological facts such as heat diffusion. It is shown that solutions of this equation, evaluated at the present time, fit well with the data as interpolated by $\mathcal{G}$, thus providing convincing pictures of the diffusion-convection of the linguistic features of the case study, albeit with simplifications and approximations. Very importantly, it is shown that Schmidt's 'waves' can be counted among the solutions of the diffusion equation: superimposing Schmidt 'waves' onto a 'tidal flooding' can reproduce the complexities of real linguistic diffusion events.
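The tidal mode is described as a two-dimensional diffusion-convection process governed by the same equation as heat diffusion, $\partial_t u = D\,\nabla^2 u - \mathbf{c}\cdot\nabla u$. A minimal numerical sketch of that equation, assuming constant diffusivity $D$, a constant non-negative drift $\mathbf{c} = (c_x, c_y)$, a square periodic grid, and an explicit finite-difference scheme; none of these modelling choices are taken from the paper, and the function name is hypothetical.

```python
# Illustrative sketch (assumption): explicit finite differences for
# u_t = D (u_xx + u_yy) - cx u_x - cy u_y on a periodic grid; not the paper's solver.


def diffusion_convection_step(u, D, cx, cy, dx, dt):
    """One explicit Euler step on a 2-D grid u[i][j] (i: y index, j: x index).

    Central differences for diffusion, first-order upwind differences for
    convection (valid for cx, cy >= 0). Negative Python indices give the
    periodic wrap-around on the low side; % n handles the high side.
    """
    ny, nx = len(u), len(u[0])
    new = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            c = u[i][j]
            left, right = u[i][j - 1], u[i][(j + 1) % nx]
            down, up = u[i - 1][j], u[(i + 1) % ny][j]
            lap = (left + right + down + up - 4 * c) / dx ** 2
            adv = cx * (c - left) / dx + cy * (c - down) / dx
            new[i][j] = c + dt * (D * lap - adv)
    return new
```

With periodic boundaries the scheme conserves the total of `u`, and it keeps `u` non-negative as long as `dt <= 1 / (4 * D / dx**2 + (cx + cy) / dx)`; a larger time step makes the explicit update unstable.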

翻訳日:2023-07-27 11:51:02 公開日:2023-07-26

# Skin Co-Registrationに基づくUS & MR画像融合

US & MR Image-Fusion Based on Skin Co-Registration ( http://arxiv.org/abs/2307.14288v1 )

ライセンス: Link先を確認

Martina Paccini, Giacomo Paschina, Stefano De Beni, Giuseppe Patanè

(参考訳) 医用画像の高度な可視化、表現、分析のための革新的な手法の研究開発には、さまざまな研究の方向性がある。医用画像の現在の実践では、リアルタイムの超音波(US)と、CT、MRI、PETなど体内の解剖構造を取得できる画像モダリティとを組み合わせている。画像融合手法の応用は、インターベンション中に手術器具や針をリアルタイムで追跡する場面に見られる。そこで本研究では、3次元カメラセンサを活用し、CT画像およびMRI画像をリアルタイムのUS取得と位置合わせする融合画像システムを提案する。本研究の主な焦点は、システムの可搬性と、さまざまな解剖学的部位への適用可能性である。

The study and development of innovative solutions for the advanced visualisation, representation and analysis of medical images offer different research directions. Current practice in medical imaging consists of combining real-time ultrasound (US) with imaging modalities that allow internal anatomy acquisitions, such as CT, MRI, PET or similar. Image-fusion approaches are applied, for example, to track surgical tools and/or needles in real time during interventions. Thus, this work proposes a fusion imaging system for the registration of CT and MRI images with real-time US acquisition, leveraging a 3D camera sensor. The main focus of the work is the portability of the system and its applicability to different anatomical regions.
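The abstract does not describe the registration algorithm itself. As a hedged illustration of the kind of step a co-registration pipeline needs, the sketch below solves the simplest case: least-squares rigid alignment of 2-D points with known one-to-one correspondences (the 2-D form of the Procrustes/Kabsch solution). The actual system registers 3-D skin surfaces captured by a camera, which this sketch does not handle; the function name is hypothetical.

```python
import math

# Illustrative sketch (assumption): 2-D rigid registration with known point
# correspondences; not the paper's skin-surface co-registration method.


def rigid_register_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst.

    Points are (x, y) tuples with known one-to-one correspondences.
    Returns (theta, tx, ty) such that R(theta) @ p + t approximates q.
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    mx = sum(q[0] for q in dst) / n
    my = sum(q[1] for q in dst) / n
    # Dot and cross sums of the centred point sets determine the optimal angle:
    # sum(b . R a) = cos(theta) * s_dot + sin(theta) * s_cross.
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - mx, qy - my
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = mx - (c * sx - s * sy)
    ty = my - (s * sx + c * sy)
    return theta, tx, ty
```

In 3-D the same least-squares objective is usually solved with an SVD of the cross-covariance matrix, and correspondences are typically estimated iteratively (for example with ICP) when they are not known.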

翻訳日:2023-07-27 11:50:34 公開日:2023-07-26

# 米国の都市における極端気温予測のための新たな統計的機械学習手法

Emerging Statistical Machine Learning Techniques for Extreme Temperature Forecasting in U.S. Cities ( http://arxiv.org/abs/2307.14285v1 )

ライセンス: Link先を確認

Kameron B. Kinast and Ernest Fokoué

(参考訳) 本稿では、新たな統計的機械学習手法を用いて、極端気温パターンの包括的な分析を行う。本研究は、気候時系列予測におけるさまざまな統計モデルの有効性を探り、比較することに焦点を当てる。検討するモデルは、自己回帰和分移動平均(ARIMA)、指数平滑化、多層パーセプトロン、ガウス過程である。これらの手法を米国で最も人口の多い5都市の気候時系列データに適用し、PythonとJuliaを用いて、気候変動とその影響を理解するうえでの統計計算の役割を示す。結果は統計手法間の違いを浮き彫りにし、多層パーセプトロンが最も有効なアプローチであることを示した。さらに、この最も性能の高い手法を用いて2030年までの極端気温を予測し、気温変化が0より大きいかどうかについて仮説検定を行った。

In this paper, we present a comprehensive analysis of extreme temperature patterns using emerging statistical machine learning techniques. Our research focuses on exploring and comparing the effectiveness of various statistical models for climate time series forecasting. The models considered include Auto-Regressive Integrated Moving Average, Exponential Smoothing, Multilayer Perceptrons, and Gaussian Processes. We apply these methods to climate time series data from the five most populated U.S. cities, using Python and Julia to demonstrate the role of statistical computing in understanding climate change and its impacts. Our findings highlight the differences between the statistical methods and identify Multilayer Perceptrons as the most effective approach. Additionally, we use this best-performing method to project extreme temperatures up to 2030 and examine, through a hypothesis test, whether the temperature changes are greater than zero.
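Exponential smoothing is one of the four model families compared. A minimal sketch of its simplest variant, simple exponential smoothing, to show the recursion involved; the paper may use trend or seasonal variants with fitted parameters, and the smoothing weight `alpha` and the initialisation here are assumptions.

```python
# Illustrative sketch (assumption): simple exponential smoothing with a fixed
# alpha and first-value initialisation; not the paper's fitted model.


def simple_exponential_smoothing(series, alpha, horizon=1):
    """level_t = alpha * y_t + (1 - alpha) * level_{t-1}; forecasts are flat.

    Returns the final smoothed level repeated `horizon` times.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return [level] * horizon
```

For example, the series `[10.0, 12.0, 14.0]` with `alpha=0.5` produces levels 10, 11, and 12.5, so every forecast step is 12.5. The flat forecast is why trend-aware variants are normally preferred for multi-year projections such as the 2030 horizon above.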

翻訳日:2023-07-27 11:50:24 公開日:2023-07-26

# 汎用人工知能システム(GPAIS):特性、定義、分類、未解決の課題と意義

General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Open Challenges and Implications ( http://arxiv.org/abs/2307.14283v1 )

ライセンス: Link先を確認

Isaac Triguero, Daniel Molina, Javier Poyatos, Javier Del Ser, Francisco Herrera

(参考訳) 人工知能(AI)の応用の多くは、限定された特定のタスクのために設計されている。しかし、特にそのために設計されていなくても幅広いタスクを解決できる、より汎用的なAIが求められるシナリオも多い。汎用人工知能システム(General-Purpose Artificial Intelligence Systems, GPAIS)という用語は、こうしたAIシステムを指すものとして定義されてきた。これまで、人間と同じように、あるいはそれ以上にあらゆる知的タスクを遂行できるほど強力な汎用人工知能の実現は、願望やフィクションにとどまり、社会にとってのリスクとも考えられてきた。その実現にはまだ遠いかもしれないが、GPAISはすでに現実のものであり、AI研究の最前線にある。本稿では、GPAISの既存の定義を検討し、その特性と限界に応じてGPAISの種類を段階的に区別できる新たな定義を提案する。クローズドワールドとオープンワールドのGPAISを区別し、新しいタスクへの適応、意図的に訓練されていないドメインでの能力、少量のデータから学習する能力、自らの限界を積極的に認める能力などの要因に基づいて、その自律性と能力の程度を特徴づける。次に、GPAISを実現するためのアプローチの分類体系を提案し、AI技術を用いて別のAIを改善する手法や基盤モデルなどの研究動向を述べる。代表例として生成AIを取り上げ、分類体系で提示した用語や概念と対応づける。提案する定義と分類体系を通じて、多くの共通点をもつ汎用的なタスクに取り組むさまざまな分野間の研究協力を促進することを目指す。最後に、GPAISの全体像を示すことを目標として、GPAISの現状、課題と展望、社会への影響、そして責任ある信頼できるAIシステムと規制の必要性について議論する。

Most applications of Artificial Intelligence (AI) are designed for a confined and specific task. However, there are many scenarios that call for a more general AI, capable of solving a wide array of tasks without being specifically designed for them. The term General-Purpose Artificial Intelligence Systems (GPAIS) has been defined to refer to these AI systems. To date, the possibility of an Artificial General Intelligence, powerful enough to perform any intellectual task as if it were human, or even better, has remained an aspiration and a fiction, and has been considered a risk for our society. Whilst we might still be far from achieving that, GPAIS are a reality and sit at the forefront of AI research. This work discusses existing definitions for GPAIS and proposes a new definition that allows for a gradual differentiation among types of GPAIS according to their properties and limitations. We distinguish between closed-world and open-world GPAIS, characterising their degree of autonomy and ability based on several factors such as adaptation to new tasks, competence in domains not intentionally trained for, ability to learn from few data, or proactive acknowledgment of their own limitations. We then propose a taxonomy of approaches to realise GPAIS, describing research trends such as the use of AI techniques to improve another AI or foundation models. As a prime example, we delve into generative AI, aligning it with the terms and concepts presented in the taxonomy. Through the proposed definition and taxonomy, our aim is to facilitate research collaboration across different areas that are tackling general-purpose tasks, as they share many common aspects. Finally, we discuss the current state of GPAIS, its challenges and prospects, implications for our society, and the need for responsible and trustworthy AI systems and regulation, with the goal of providing a holistic view of GPAIS.

翻訳日:2023-07-27 11:50:09 公開日:2023-07-26

# 大規模な完全教師なし再識別

Large-scale Fully-Unsupervised Re-Identification ( http://arxiv.org/abs/2307.14278v1 )

ライセンス: Link先を確認

Gabriel Bertocco, Fernanda Andaló, Terrance E. Boult, and Anderson Rocha

(参考訳) 完全教師なしの人物・車両再識別は、手作業によるアノテーションを一切必要とせず、監視、法医学、イベント理解、スマートシティに幅広く適用できるため、注目を集めている。しかし、従来手法の多くは、数千サンプル程度しかないデータセットで評価されてきた。このような小規模データの設定では、クラスタリング結果を改善するために、再ランキング(Re-Ranking)のような時間とメモリのコストが大きい手法を使えることが多い。さらに、データセットごとに最適なクラスタリングのハイパーパラメータを事前に選択している研究さえあり、これは大規模な完全教師なしシナリオでは非現実的である。このような背景から、本研究ではより現実的なシナリオに取り組み、大規模なラベルなしデータから学習するための2つの戦略を提案する。第1の戦略は、近傍関係を損なうことなく各イテレーションでのデータセットサイズを削減するために、局所近傍サンプリングを行う。第2の戦略は、時間計算量の上界がより低く、メモリ計算量を k << n のもとで O(n^2) から O(kn) に削減する新しい再ランキング手法を利用する。また、クラスタリングアルゴリズムの特定のハイパーパラメータ値を事前に選択することを避けるため、訓練中に密度パラメータを調整して、サンプルの多様性を活用し、ノイズを含むラベル付けに対して学習を頑健に保つ新しいスケジューリングアルゴリズムを提案する。最後に、異なるモデルが相補的な知識を学習することから、バックボーン間で予測擬似ラベルを入れ替えることに基づき、ハイパーパラメータや重み付けの最適化を一切必要としない共同学習(co-training)戦略も導入する。提案手法は、よく知られたベンチマークおよび難易度の高い大規模なVeri-Wildデータセットにおいて、より高速でメモリ効率の高い再ランキング戦略と、大規模でノイズに頑健なアンサンブルベースの学習アプローチにより、最先端の手法を上回る。

Fully-unsupervised Person and Vehicle Re-Identification have received increasing attention due to their broad applicability in surveillance, forensics, event understanding, and smart cities, without requiring any manual annotation. However, most of the prior art has been evaluated in datasets that have just a couple thousand samples. Such small-data setups often allow the use of techniques that are costly in time and memory footprint, such as Re-Ranking, to improve clustering results. Moreover, some previous work even pre-selects the best clustering hyper-parameters for each dataset, which is unrealistic in a large-scale fully-unsupervised scenario. In this context, this work tackles a more realistic scenario and proposes two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships. A second strategy leverages a novel Re-Ranking technique, which has a lower upper bound on time complexity and reduces the memory complexity from O(n^2) to O(kn) with k << n. To avoid the pre-selection of specific hyper-parameter values for the clustering algorithm, we also present a novel scheduling algorithm that adjusts the density parameter during training, to leverage the diversity of samples and keep the learning robust to noisy labeling. Finally, due to the complementary knowledge learned by different models, we also introduce a co-training strategy that relies upon the permutation of predicted pseudo-labels among the backbones, with no need for any hyper-parameters or weighting optimization. The proposed methodology outperforms the state-of-the-art methods in well-known benchmarks and in the challenging large-scale Veri-Wild dataset, with a faster and memory-efficient Re-Ranking strategy, and a large-scale, noise-robust, and ensemble-based learning approach.
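The second strategy cuts re-ranking memory from O(n^2) to O(kn) with k << n. The paper's exact re-ranking rule is not given in the abstract. As a hedged sketch of why restricting to k-nearest-neighbour lists gives O(kn) memory, the code below stores only each sample's k neighbours and computes a Jaccard-style distance between neighbour sets on demand, in the spirit of k-reciprocal re-ranking. The function names and the Jaccard rule are assumptions.

```python
import heapq

# Illustrative sketch (assumption): k-NN-set Jaccard distance with O(k n) storage;
# not the paper's Re-Ranking technique.


def knn_lists(features, k):
    """For each sample keep only the indices of its k nearest neighbours (itself included).

    Brute-force search takes O(n^2) time, but only O(k n) indices are stored.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    neighbours = []
    for fi in features:
        nearest = heapq.nsmallest(k, range(len(features)), key=lambda j: sqdist(fi, features[j]))
        neighbours.append(frozenset(nearest))
    return neighbours


def jaccard_distance(neighbours, i, j):
    """Re-ranked distance 1 - |N(i) & N(j)| / |N(i) | N(j)|, computed on demand."""
    a, b = neighbours[i], neighbours[j]
    return 1.0 - len(a & b) / len(a | b)
```

Because distances are computed only for the pairs a clustering step actually queries, the full n × n matrix is never materialised.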

翻訳日:2023-07-27 11:49:38 公開日:2023-07-26

# G2L:測地線とゲーム理論による意味的に整列した一様なビデオグラウンディング

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory ( http://arxiv.org/abs/2307.14277v1 )

ライセンス: Link先を確認

Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou

(参考訳) 最近のビデオグラウンディング研究は、バニラの対照学習をビデオグラウンディングに導入しようとしている。しかし、この素朴な解法は最適ではないと我々は主張する。対照学習には、(1) 類似したサンプルの特徴の整列(alignment)と、(2) 超球面上の正規化された特徴が誘導する分布の一様性(uniformity)という2つの重要な性質が必要である。ビデオグラウンディングには、(1) 一部の視覚的エンティティが正解区間と他のモーメントの両方に共存すること(すなわち意味的重複)、(2) 動画中のごく一部のモーメントにしか注釈が付けられていないこと(すなわち疎なアノテーションのジレンマ)という2つの厄介な問題があるため、バニラの対照学習は時間的に離れたモーメント間の相関をモデル化できず、一貫性のない動画表現を学習してしまう。これら2つの特性により、バニラの対照学習はビデオグラウンディングには適さない。本稿では、測地線とゲーム理論による、意味的に整列した一様なビデオグラウンディングのフレームワークであるGeodesic and Game Localization (G2L)を提案する。モデルが正しいクロスモーダル表現を学習するよう導く測地線距離を利用して、モーメント間の相関を定量化する。さらに、ゲーム理論という新たな視点から、測地線距離に基づくサンプリングによるセマンティック・シャープレイ相互作用を提案し、類似したモーメントにおける細粒度の意味的整列を学習する。3つのベンチマークでの実験により、本手法の有効性が示された。

Recent video grounding works attempt to introduce vanilla contrastive learning into video grounding. However, we claim that this naive solution is suboptimal. Contrastive learning requires two key properties: (1) *alignment* of features of similar samples, and (2) *uniformity* of the induced distribution of the normalized features on the hypersphere. Due to two annoying issues in video grounding, namely (1) the co-existence of some visual entities in both the ground truth and other moments, i.e., semantic overlapping, and (2) the annotation of only a few moments in the video, i.e., the sparse annotation dilemma, vanilla contrastive learning is unable to model the correlations between temporally distant moments and learns inconsistent video representations. Both characteristics make vanilla contrastive learning unsuitable for video grounding. In this paper, we introduce Geodesic and Game Localization (G2L), a semantically aligned and uniform video grounding framework via geodesic and game theory. We quantify the correlations among moments leveraging the geodesic distance that guides the model to learn the correct cross-modal representations. Furthermore, from the novel perspective of game theory, we propose semantic Shapley interaction based on geodesic distance sampling to learn fine-grained semantic alignment in similar moments. Experiments on three benchmarks demonstrate the effectiveness of our method.

翻訳日:2023-07-27 11:49:03 公開日:2023-07-26

# ロボットマニピュレーションのためのウェイポイントに基づく模倣学習

Waypoint-Based Imitation Learning for Robotic Manipulation ( http://arxiv.org/abs/2307.14326v1 )

ライセンス: Link先を確認

Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn

(参考訳) 模倣学習手法はロボットマニピュレーションの分野で再び関心を集めているが、誤差の累積というよく知られた問題が依然として行動クローニング(BC)を悩ませている。ウェイポイントは、BCの学習問題のホライズンを短縮し、それによって時間とともに累積する誤差を減らすことで、この問題への対処に役立つ。しかし、ウェイポイントのラベル付けは定義が曖昧であり、追加の人手による監督を必要とする。追加の人手による監督なしに、ウェイポイントを自動生成できないだろうか。我々の重要な洞察は、軌道のある区間が線形運動で近似できるなら、その端点をウェイポイントとして使えるということである。そこで、模倣学習のための自動ウェイポイント抽出(AWE)を提案する。これは、デモンストレーションを、線形補間したときに指定した誤差閾値以内で軌道を近似できる最小のウェイポイント集合に分解する前処理モジュールである。AWEは任意のBCアルゴリズムと組み合わせることができ、最先端アルゴリズムの成功率をシミュレーションでは最大25%、実世界の双腕操作タスクでは4~28%向上させ、意思決定のホライズンを最大10分の1に短縮できることがわかった。ビデオとコードは https://lucys0.github.io/awe/ で入手できる。

While imitation learning methods have seen a resurgent interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus the errors compounded over time. However, waypoint labeling is underspecified and requires additional human supervision. Can we generate waypoints automatically without any additional human supervision? Our key insight is that if a trajectory segment can be approximated by linear motion, the endpoints can be used as waypoints. We propose Automatic Waypoint Extraction (AWE) for imitation learning, a preprocessing module that decomposes a demonstration into a minimal set of waypoints which, when interpolated linearly, can approximate the trajectory up to a specified error threshold. AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision-making horizon by up to a factor of 10. Videos and code are available at https://lucys0.github.io/awe/
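The key insight above, replacing a segment that is well approximated by linear motion with its endpoints, can be sketched as a small dynamic program that finds the fewest waypoints meeting an error threshold. This assumes the reconstruction error is the maximum Euclidean distance between each recorded state and its time-linear interpolation; AWE's actual error measure and search procedure may differ, and the function names are hypothetical.

```python
# Illustrative sketch (assumption): fewest-waypoint extraction by dynamic
# programming with a max-Euclidean-error criterion; not AWE's released code.


def interpolation_error(traj, i, j):
    """Max Euclidean error when states i..j are replaced by linear interpolation between i and j."""
    worst = 0.0
    for t in range(i + 1, j):
        w = (t - i) / (j - i)
        interp = [(1 - w) * a + w * b for a, b in zip(traj[i], traj[j])]
        err = sum((x - y) ** 2 for x, y in zip(traj[t], interp)) ** 0.5
        worst = max(worst, err)
    return worst


def extract_waypoints(traj, threshold):
    """Fewest waypoints (always including the first and last state) such that every
    segment between consecutive waypoints stays within `threshold` of linear motion."""
    n = len(traj)
    best = [0] + [float("inf")] * (n - 1)  # best[j]: fewest segments needed to reach j
    prev = [None] * n
    for j in range(1, n):
        for i in range(j):
            if best[i] + 1 < best[j] and interpolation_error(traj, i, j) <= threshold:
                best[j], prev[j] = best[i] + 1, i
    # Walk back from the last state to recover the waypoint indices.
    waypoints, j = [], n - 1
    while j is not None:
        waypoints.append(j)
        j = prev[j]
    return waypoints[::-1]
```

A straight-line demonstration collapses to its two endpoints, while an L-shaped one keeps the corner. The cost is O(n^3) in the trajectory length, acceptable as an offline preprocessing step for short demonstrations.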

翻訳日:2023-07-27 11:41:12 公開日:2023-07-26

# 低深度凸ユニタリ進化によるオープン量子系のシミュレーション

Simulation of Open Quantum Systems via Low-Depth Convex Unitary Evolutions ( http://arxiv.org/abs/2307.14325v1 )

ライセンス: Link先を確認

Joseph Peetz, Scott E. Smart, Spyros Tserkis, Prineha Narang

(参考訳) 量子デバイス上で物理系をシミュレーションすることは、量子技術の最も有望な応用の1つである。オープン量子系をシミュレーションする現在の量子的アプローチは、通常、アンシラ量子ビットと多数の制御操作の系列を必要とするため、NISQ時代のデバイスでは依然として実用上困難である。本研究では、ランダムユニタリチャネルと呼ばれるオープン系ダイナミクスのクラスをシミュレーションするためのハイブリッド量子古典的手法を提案する。これらのチャネルは自然に凸ユニタリ進化の系列に分解され、それらを効率的にサンプリングして独立した回路として実行できる。この手法は深いアンシラ構成を必要としないため、より低いノイズコストで実装できる。我々は、数十量子ビットまでの、チャネルランクの大きなオープン量子系のシミュレーションを実装する。

Simulating physical systems on quantum devices is one of the most promising applications of quantum technology. Current quantum approaches to simulating open quantum systems are still practically challenging on NISQ-era devices, because they typically require ancilla qubits and extensive controlled sequences. In this work, we propose a hybrid quantum-classical approach for simulating a class of open system dynamics called random-unitary channels. These channels naturally decompose into a series of convex unitary evolutions, which can then be efficiently sampled and run as independent circuits. The method does not require deep ancilla frameworks and thus can be implemented with lower noise costs. We implement simulations of open quantum systems up to dozens of qubits and with large channel rank.
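Because a random-unitary channel is a convex combination of unitary evolutions, each term can be run as an independent circuit and the results averaged, either with exact weights or by sampling. A toy single-qubit illustration, assuming a dephasing channel with unitaries I and Z and a pure-state simulator in place of hardware; it is not the paper's multi-qubit method, and the helper names are hypothetical.

```python
import random

# Illustrative sketch (assumption): single-qubit dephasing channel
# rho -> (1 - p) rho + p Z rho Z, simulated as a mixture of unitary circuits.

I = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
X = [[0, 1], [1, 0]]


def apply(u, psi):
    """Multiply a 2x2 matrix by a single-qubit state vector."""
    return [u[0][0] * psi[0] + u[0][1] * psi[1], u[1][0] * psi[0] + u[1][1] * psi[1]]


def expect(op, psi):
    """<psi| op |psi> for a single-qubit pure state."""
    phi = apply(op, psi)
    return (psi[0].conjugate() * phi[0] + psi[1].conjugate() * phi[1]).real


def dephasing_expectation(p, psi, observable, shots=None, rng=None):
    """Expectation of `observable` after the channel.

    Each unitary term is an independent pure-state "circuit". With shots=None
    the terms are weighted exactly; otherwise unitaries are sampled with
    probabilities (1 - p, p) and the results are averaged.
    """
    unitaries, probs = [I, Z], [1.0 - p, p]
    if shots is None:
        return sum(w * expect(observable, apply(u, psi)) for u, w in zip(unitaries, probs))
    rng = rng or random.Random()
    picks = rng.choices(unitaries, weights=probs, k=shots)
    return sum(expect(observable, apply(u, psi)) for u in picks) / shots
```

For the state |+> and p = 0.2, the exact mode gives an X expectation of 1 - 2p = 0.6; the sampled mode fluctuates around that value with a spread that shrinks as the number of shots grows.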

翻訳日:2023-07-27 11:40:51 公開日:2023-07-26

# LLMに符号化された道徳的信念の評価

Evaluating the Moral Beliefs Encoded in LLMs ( http://arxiv.org/abs/2307.14324v1 )

ライセンス: Link先を確認

Nino Scherrer, Claudia Shi, Amir Feder and David M. Blei

(参考訳) 本稿では、大規模言語モデル(LLM)を対象とした調査の設計、実施、後処理、評価に関する事例研究を示す。本研究は2つの要素からなる。(1) LLMに符号化された信念を引き出すための統計的手法。LLMが「選択を行う」確率、それに伴う不確実性、およびその選択の一貫性を定量化する統計的尺度と評価指標を導入する。(2) この手法を適用し、異なるLLMにどのような道徳的信念が符号化されているかを、特に正しい選択が明らかでない曖昧なケースについて調べる。曖昧さの高い道徳的シナリオ680件(例:「罪のない嘘をつくべきか?」)と曖昧さの低い道徳的シナリオ687件(例:「道路上の歩行者のために止まるべきか?」)からなる大規模な調査を設計する。各シナリオには、説明、2つの可能な行動、違反するルール(例:「殺してはならない」)を示す補助ラベルが含まれる。この調査を、オープンソースおよびクローズドソースの28のLLMに対して実施する。その結果、(a) 曖昧さのないシナリオでは、ほとんどのモデルが常識に沿った行動を「選択」する。曖昧なケースでは、ほとんどのモデルが不確実性を示す。(b) 一部のモデルは、応答が質問の言い回しに敏感であるため、常識的な行動を選ぶことについて不確実である。(c) 一部のモデルは、曖昧なシナリオにおいて明確な選好を示す。特に、クローズドソースのモデルは互いに一致する傾向がある。

This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) A statistical method for eliciting beliefs encoded in LLMs. We introduce statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. (2) We apply this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not kill"). We administer the survey to 28 open- and closed-source LLMs. We find that (a) in unambiguous scenarios, most models "choose" actions that align with commonsense. In ambiguous cases, most models express uncertainty. (b) Some models are uncertain about choosing the commonsense action because their responses are sensitive to the question-wording. (c) Some models reflect clear preferences in ambiguous scenarios. Specifically, closed-source models tend to agree with each other.
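The method quantifies the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. A hedged sketch of simple versions of such quantities, assuming sampled answers are already mapped to action "A" or "B" per question template: the probability is averaged over templates, uncertainty is the binary entropy, and consistency is majority agreement across templates. These definitions are illustrative assumptions, not the paper's exact estimators, and the function name is hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch (assumption): simplified choice statistics for one
# two-action scenario; not the paper's exact measures.


def choice_statistics(responses_by_template):
    """Summarise sampled answers to one two-action scenario.

    responses_by_template maps a question template to a list of answers, each
    "A", "B", or anything else (treated as invalid and dropped). At least one
    template must contain a valid answer.
    Returns (p_a, entropy_bits, consistency):
      p_a          probability of choosing action A, averaged over templates
      entropy_bits binary entropy of p_a, a simple uncertainty measure
      consistency  fraction of templates whose majority matches the overall one
    """
    per_template = []
    for answers in responses_by_template.values():
        counts = Counter(a for a in answers if a in ("A", "B"))
        total = counts["A"] + counts["B"]
        if total:
            per_template.append(counts["A"] / total)
    p_a = sum(per_template) / len(per_template)
    if p_a in (0.0, 1.0):
        entropy = 0.0
    else:
        entropy = -(p_a * math.log2(p_a) + (1 - p_a) * math.log2(1 - p_a))
    overall = p_a >= 0.5
    consistency = sum((p >= 0.5) == overall for p in per_template) / len(per_template)
    return p_a, entropy, consistency
```

Averaging over templates is what exposes sensitivity to question wording: a model whose preference flips between templates gets a lower consistency score even when its pooled probability looks decisive.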

翻訳日:2023-07-27 11:40:37 公開日:2023-07-26

# 説明可能なデュアルニューラルネットワークを用いた逆需要関数のモデル化

Modeling Inverse Demand Function with Explainable Dual Neural Networks ( http://arxiv.org/abs/2307.14322v1 )

ライセンス: Link先を確認

Zhiyu Cao, Zihan Chen, Prerna Mishra, Hamed Amini, Zachary Feinstein

(参考訳) 金融危機の伝播(コンテイジョン)は、金融システムにとっての根本的なリスクとして広く認識されている。特に強力なのが価格を介した伝播であり、企業による強制清算が資産価格を押し下げて金融ストレスを伝播させ、一見無関係な幅広い主体に危機が拡大することを可能にする。価格への影響は現在、外生的な逆需要関数によってモデル化されている。しかし、現実のシナリオでは通常、初期ショックと最終的な均衡資産価格しか観測できず、実際の資産清算はほとんど把握できない。このデータの欠落は、既存モデルのキャリブレーションに大きな制約をもたらす。これらの課題に対処するため、2つの段階を順に実行する新しいデュアルニューラルネットワーク構造を導入する。第1のニューラルネットワークは初期ショックを予測資産清算に写像し、第2のネットワークはこれらの清算を用いて結果として生じる均衡価格を導出する。このデータ駆動型アプローチは、解析的な構造を事前に指定することなく線形・非線形の両方の形式を捉えることができ、さらに観測可能な清算データがない場合でも有効に機能する。シミュレーションデータセットを用いた実験により、本モデルが初期ショックのみに基づいて均衡資産価格を正確に予測できること、また予測された清算と真の清算が強く一致することが示された。我々の説明可能なフレームワークは、価格を介した伝播の理解とモデル化に貢献し、金融当局が効果的なストレステストと規制政策を構築するうえで有益な知見を提供する。

Financial contagion has been widely recognized as a fundamental risk to the financial system. Particularly potent is price-mediated contagion, wherein forced liquidations by firms depress asset prices and propagate financial stress, enabling crises to proliferate across a broad spectrum of seemingly unrelated entities. Price impacts are currently modeled via exogenous inverse demand functions. However, in real-world scenarios, only the initial shocks and the final equilibrium asset prices are typically observable, leaving actual asset liquidations largely obscured. This missing data presents significant limitations to calibrating the existing models. To address these challenges, we introduce a novel dual neural network structure that operates in two sequential stages: the first neural network maps initial shocks to predicted asset liquidations, and the second network utilizes these liquidations to derive resultant equilibrium prices. This data-driven approach can capture both linear and non-linear forms without pre-specifying an analytical structure; furthermore, it functions effectively even in the absence of observable liquidation data. Experiments with simulated datasets demonstrate that our model can accurately predict equilibrium asset prices based solely on initial shocks, while revealing a strong alignment between predicted and true liquidations. Our explainable framework contributes to the understanding and modeling of price-mediated contagion and provides valuable insights for financial authorities to construct effective stress tests and regulatory policies.

翻訳日:2023-07-27 11:40:16 公開日:2023-07-26

# ガイド付き安全探索による強化学習

Reinforcement Learning by Guided Safe Exploration ( http://arxiv.org/abs/2307.14316v1 )

ライセンス: Link先を確認

Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan

(参考訳) 安全性は、強化学習(RL)の応用範囲を広げるうえで極めて重要である。多くの場合、RLエージェントは実世界に展開する前に、実験室のような管理された環境で訓練される。しかし、実世界の目標タスクは展開前には未知かもしれない。報酬なしRLは、報酬が明らかになった時点で素早く適応できるよう、報酬なしでエージェントを訓練する。本研究では、エージェント(ガイド)が報酬信号なしで安全に探索することを学習する、制約付きの報酬なし設定を考える。このエージェントは、安全でない相互作用が許され、かつ安全信号が得られる管理された環境で訓練される。目標タスクが明らかになった後は、安全違反はもはや許されない。そこで、ガイドを活用して安全な行動方策を構成する。また転移学習に着想を得て、目標方策(生徒)が信頼できない間は生徒をガイドに近づけるよう正則化し、訓練が進むにつれてガイドの影響を徐々に取り除く。実験的分析により、この手法が安全な転移学習を実現でき、生徒が目標タスクをより速く解くのに役立つことが示された。

Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward so that it can adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.
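The student is regularized towards the guide while it is unreliable, and the guide's influence is removed gradually. A hedged sketch of that idea, assuming a KL penalty between discrete action distributions and a linear decay of its weight; the abstract does not specify the regularizer or the schedule, and all names here are hypothetical.

```python
import math

# Illustrative sketch (assumption): KL regularisation towards the guide with a
# linearly decaying weight; not the paper's exact objective or schedule.


def kl_divergence(p, q):
    """KL(p || q) for categorical action distributions (q must be > 0 where p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def guide_weight(step, total_steps, beta0=1.0):
    """Regularisation weight that decays linearly from beta0 to 0 over training."""
    return beta0 * max(0.0, 1.0 - step / total_steps)


def student_loss(task_loss, student_probs, guide_probs, step, total_steps, beta0=1.0):
    """Task loss plus a decaying penalty pulling the student policy towards the guide."""
    beta = guide_weight(step, total_steps, beta0)
    return task_loss + beta * kl_divergence(student_probs, guide_probs)
```

Early in training the penalty dominates and keeps the student close to the safe guide; once the weight reaches zero, the student optimizes the task loss alone.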

翻訳日:2023-07-27 11:39:51 公開日:2023-07-26

# 一般化サイモン問題に対する厳密な分散量子アルゴリズム

Exact distributed quantum algorithm for generalized Simon's problem ( http://arxiv.org/abs/2307.14315v1 )

ライセンス: Link先を確認

Hao Li, Daowen Qiu, Le Luo, Mateus Paulo

(参考訳) サイモンの問題は、ショアのアルゴリズムの提案に大きな影響を与えたことから、量子アルゴリズムの能力を示す最も重要な問題の1つである。一般化サイモン問題はサイモンの問題の自然な拡張であり、特殊な隠れ部分群問題でもある。本稿では2つの主要な貢献を示す。第1に、分散シナリオにおける一般化サイモン問題の構造を特徴づけ、対応する分散量子アルゴリズムを導入する。第2に、量子振幅増幅法を適用してアルゴリズムを改良し、厳密性を保証する。本アルゴリズムは、分散古典アルゴリズムと比べて指数関数的な高速化をもたらす。一般化サイモン問題に対する集中型量子アルゴリズムと比べると、本アルゴリズムのオラクルが必要とする量子ビット数は少なく、物理的な実装が容易になる。特に、一般化サイモン問題に対して我々が開発した厳密な分散量子アルゴリズムは、一般化可能性と厳密性の点で、これまでに提案されたサイモンの問題に対する最良の分散量子アルゴリズムを上回る。

Simon's problem is one of the most important problems demonstrating the power of quantum algorithms, as it greatly inspired the proposal of Shor's algorithm. The generalized Simon's problem is a natural extension of Simon's problem, and also a special hidden subgroup problem. In this paper, we present two key contributions. Firstly, we characterize the structure of the generalized Simon's problem in the distributed scenario and introduce a corresponding distributed quantum algorithm. Secondly, we refine the algorithm to ensure exactness by applying the quantum amplitude amplification technique. Our algorithm offers exponential acceleration compared to the distributed classical algorithm. When contrasted with the centralized quantum algorithm for the generalized Simon's problem, our algorithm's oracle requires fewer qubits, making it easier to implement physically. In particular, the exact distributed quantum algorithm we develop for the generalized Simon's problem outperforms the best previously proposed distributed quantum algorithm for Simon's problem in terms of generalizability and exactness.
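Simon-type algorithms end with a classical step: each quantum run yields a bit string y orthogonal (over GF(2)) to every element of the hidden subgroup S, and S is recovered as the null space of the collected y's. The sketch below simulates only that shared classical post-processing, with a random sampler standing in for the quantum measurement. It does not implement the paper's distributed algorithm or its amplitude-amplification refinement, and the helper names are hypothetical.

```python
import random
from itertools import product

# Illustrative sketch (assumption): classical GF(2) post-processing common to
# Simon-type algorithms; the quantum sampling step is simulated classically.


def dot2(a, b):
    """Inner product of two bit vectors (stored as ints) over GF(2)."""
    return bin(a & b).count("1") % 2


def sample_orthogonal(hidden_basis, n, rng):
    """Stand-in for one quantum run: a uniformly random y with y.s = 0 for all s in S."""
    while True:
        y = rng.randrange(2 ** n)
        if all(dot2(y, s) == 0 for s in hidden_basis):
            return y


def null_space_gf2(rows, n):
    """Basis (as ints) of {x : r.x = 0 for every r in rows} over GF(2)."""
    pivots = {}  # pivot bit -> row kept in reduced row echelon form
    for r in rows:
        for bit, pr in pivots.items():
            if r >> bit & 1:
                r ^= pr
        if r == 0:
            continue  # linearly dependent on rows already seen
        bit = r.bit_length() - 1
        for b in pivots:  # clear the new pivot bit from existing rows
            if pivots[b] >> bit & 1:
                pivots[b] ^= r
        pivots[bit] = r
    basis = []
    for f in (i for i in range(n) if i not in pivots):
        x = 1 << f  # set one free variable, solve for the pivot variables
        for bit, pr in pivots.items():
            if pr >> f & 1:
                x |= 1 << bit
        basis.append(x)
    return basis


def span(basis):
    """All GF(2) linear combinations of the basis vectors."""
    out = set()
    for coeffs in product((0, 1), repeat=len(basis)):
        v = 0
        for c, b in zip(coeffs, basis):
            if c:
                v ^= b
        out.add(v)
    return out
```

Once the sampled y's span the orthogonal complement of S, the null space returned here spans exactly S; an exact algorithm is one that guarantees this outcome rather than reaching it only with high probability.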

翻訳日:2023-07-27 11:39:34 公開日:2023-07-26

# SQUWALS: Szegedy QUantum Walks Simulator

SQUWALS: A Szegedy QUantum WALks Simulator ( http://arxiv.org/abs/2307.14314v1 )

ライセンス: Link先を確認

Sergio A. Ortega, Miguel A. Martin-Delgado

(参考訳) Szegedyの量子ウォークは、一般的なマルコフ連鎖を量子化するアルゴリズムである。さまざまな最適化の変種など、多くの応用がある。誤りのない環境でその性質を確認するためには、古典シミュレータを持つことが重要である。しかし、現在のシミュレーションアルゴリズムは、この量子ウォーク特有の定式化のために大量のメモリを必要とする。本稿では、グラフのサイズ $N$ に対して $\mathcal{O}(N^2)$ でスケールするメモリ節約型アルゴリズムを提案する。さらに、混合状態上のSzegedy量子ウォークと半古典的Szegedyウォークをシミュレーションするための追加の手順を示す。これらの手法を用いて、SQUWALSと呼ばれる古典シミュレータをPythonで構築した。本シミュレータは、時間とメモリの両方について $\mathcal{O}(N^2)$ でスケールすることを示す。このパッケージは、例えば量子PageRankのように、Szegedyの量子ウォークに基づくアルゴリズムの高レベルな応用をいくつか提供する。

Szegedy's quantum walk is an algorithm for quantizing a general Markov chain. It has many applications, such as numerous variants of optimization. In order to check its properties in an error-free environment, it is important to have a classical simulator. However, current simulation algorithms require a great deal of memory due to the particular formulation of this quantum walk. In this paper, we propose a memory-saving algorithm that scales as $\mathcal{O}(N^2)$ with the size $N$ of the graph. We provide additional procedures for simulating Szegedy's quantum walk over mixed states and also the Semiclassical Szegedy walk. With these techniques we have built a classical simulator in Python called SQUWALS. We show that our simulator scales as $\mathcal{O}(N^2)$ in both time and memory resources. This package provides some high-level applications for algorithms based on Szegedy's quantum walk, such as the quantum PageRank.
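A state of Szegedy's walk on an $N$-vertex graph has $N^2$ amplitudes, but the walk operator written as a matrix is $N^2 \times N^2$. A hedged sketch of how an $\mathcal{O}(N^2)$-memory simulation can work, not necessarily the way SQUWALS implements it: store the state as an $N \times N$ matrix, apply the reflection $2\Pi - I$ row by row, and apply the register swap as a transpose. It assumes real amplitudes, a column-stochastic transition matrix, and the convention $U = S(2\Pi - I)$; the function name is hypothetical.

```python
import math

# Illustrative sketch (assumption): one Szegedy walk step U = S (2 Pi - I) on a
# state stored as an N x N matrix; not SQUWALS's actual implementation.


def szegedy_step(psi, P):
    """Apply U = S (2 Pi - I) to a state psi[j][k] with real amplitudes.

    P is column-stochastic: P[k][j] is the probability of moving from j to k.
    Pi projects onto the states |psi_j> = |j> (x) sum_k sqrt(P[k][j]) |k>, which
    are orthonormal, so the reflection acts independently on each row j.
    The swap S exchanges the two registers, i.e. transposes the matrix.
    Memory stays O(N^2): the N^2 x N^2 operator is never built.
    """
    n = len(psi)
    reflected = []
    for j in range(n):
        amps = [math.sqrt(P[k][j]) for k in range(n)]
        overlap = sum(a * x for a, x in zip(amps, psi[j]))  # <psi_j | psi>
        reflected.append([2 * overlap * a - x for a, x in zip(amps, psi[j])])
    return [[reflected[k][j] for k in range(n)] for j in range(n)]
```

Each step costs $\mathcal{O}(N^2)$ time as well, since every amplitude is touched a constant number of times, matching the scaling the abstract reports for the package.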

翻訳日:2023-07-27 11:39:20 公開日:2023-07-26

# 感情分析のためのライブラリの比較分析

Comparative Analysis of Libraries for the Sentimental Analysis ( http://arxiv.org/abs/2307.14311v1 )

ライセンス: Link先を確認

Wendy Ccoya and Edson Pinto

(参考訳) 本研究の主な目的は、機械学習手法を用いたライブラリの比較を行うことである。自然言語処理(NLP)の専門家は、テキストの感情分析(SA)にますます関心を寄せている。NLPのテキスト分析技術を用いる目的は、Twitterユーザの発言に関する感情を認識し、分類することである。本研究では、SAと使用するライブラリの課題についても検討し、感情の極性を分類するための協調的な手法をいくつか示す。最近の研究によれば、ナイーブベイズ分類器、決定木分類器、Maxent分類器、Sklearn分類器、Sklearn分類器MultinomialNBなどの結合学習アルゴリズムは非常に効果的である。本研究では、感情分析技術を適用するために、PythonとRの5つのライブラリ、NLTK、TextBlob、Vader、Transformers(事前学習済みGPTおよびBERT)、Tidytextを用いる。また、決定木(DT)、サポートベクターマシン(SVM)、ナイーブベイズ(NB)、k近傍法(KNN)の4つの機械学習モデルも使用する。ソーシャルネットワーク環境においてSA用ライブラリがどの程度うまく機能するかを評価するため、比較研究も行った。各手法に単一のデータセットを用いたこの実験で最良のアルゴリズムを評価する尺度は、適合率、再現率、F1スコアであった。結論として、感情分析には正解率0.973のBERTトランスフォーマー法を推奨する。

The main goal of this study is to provide a comparison of libraries that use machine learning methods. Experts in natural language processing (NLP) are becoming more and more interested in sentiment analysis (SA) of text. The objective of employing NLP text analysis techniques is to recognize and categorize feelings related to Twitter users' utterances. In this examination, issues with SA and the libraries utilized are also looked at, and a number of cooperative methods to classify emotional polarity are provided. According to recent research, the Naive Bayes Classifier, Decision Tree Classifier, Maxent Classifier, Sklearn Classifier, Sklearn Classifier MultinomialNB, and other conjoint learning algorithms are very effective. The study uses five Python and R libraries, NLTK, TextBlob, Vader, Transformers (pretrained GPT and BERT), and Tidytext, to apply sentiment analysis techniques. Four machine learning models, Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN), are also used. A comparative study was also carried out to evaluate how well libraries for SA operate in the social network environment. The measures used to assess the best algorithms in this experiment, which used a single data set for each method, were precision, recall, and F1 score. We conclude that the BERT transformer method, with an accuracy of 0.973, is recommended for sentiment analysis.
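The libraries are compared using precision, recall, and F1 score. For reference, a minimal sketch of those metrics for one sentiment class; the label names are assumptions. scikit-learn's `precision_recall_fscore_support` computes the same quantities.

```python
# Illustrative sketch: precision, recall, and F1 for a single sentiment class;
# the "pos" label name is an assumption.


def precision_recall_f1(y_true, y_pred, positive="pos"):
    """Precision, recall, and F1 treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With true labels `["pos", "pos", "neg", "neg", "pos"]` and predictions `["pos", "neg", "pos", "neg", "pos"]` there are 2 true positives, 1 false positive, and 1 false negative, so all three metrics equal 2/3. Accuracy, the figure quoted for BERT, is a separate measure: correct predictions divided by all predictions.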

Translated: 2023-07-27 11:39:04, Published: 2023-07-26

# Derivative Pricing using Quantum Signal Processing

arXiv: http://arxiv.org/abs/2307.14310v1

License: see the linked page

Nikitas Stamatopoulos and William J. Zeng

(参考訳) 量子コンピュータ上の金融デリバティブの価格には一般的に量子算術要素が含まれ、対応する回路が必要とする量子リソースに大きく寄与する。本稿では,金融デリバティブ・ペイオフを量子振幅に直接エンコードする量子信号処理(qsp)に基づく手法を導入し,コストのかかる量子演算の負担から量子回路を緩和する。文献における現在の最先端のアプローチと比較すると、実用的関心のあるデリバティブ契約の場合、qspの適用により考慮されるすべての指標において必要なリソースが大幅に削減され、最も注目すべきなのは、$\sim 16$xのtゲートの総数と$\sim 4$xの論理量子ビット数である。さらに、量子優位性に必要な論理クロックレートも、$\sim 5$x の係数で低減されると推定する。全体として、量子アドバンテージは4.7$k論理量子ビットを必要とし、量子デバイスは45$MHzのレートで10^9$Tゲートを実行できる。本研究は,提案手法を最も容易に適用可能なデリバティブ価格プロセスのペイオフコンポーネントを特に重視する一方で,同様の手法を用いて,状態準備などの他のアプリケーションにおけるリソースの削減を図ることができる。

Pricing financial derivatives on quantum computers typically includes quantum arithmetic components which contribute heavily to the quantum resources required by the corresponding circuits. In this manuscript, we introduce a method based on Quantum Signal Processing (QSP) to encode financial derivative payoffs directly into quantum amplitudes, alleviating the quantum circuits from the burden of costly quantum arithmetic. Compared to current state-of-the-art approaches in the literature, we find that for derivative contracts of practical interest, the application of QSP significantly reduces the required resources across all metrics considered, most notably the total number of T-gates by $\sim 16$x and the number of logical qubits by $\sim 4$x. Additionally, we estimate that the logical clock rate needed for quantum advantage is also reduced by a factor of $\sim 5$x. Overall, we find that quantum advantage will require $4.7$k logical qubits, and quantum devices that can execute $10^9$ T-gates at a rate of $45$MHz. While in this work we focus specifically on the payoff component of the derivative pricing process where the method we present is most readily applicable, similar techniques can be employed to further reduce the resources in other applications, such as state preparation.
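
The primitive behind this approach is QSP: interleaving a single-qubit signal rotation W(x) with tunable Z-phase rotations makes a matrix element of the product a polynomial in x. The sketch below uses the common W_x convention (our choice, not necessarily the paper's) and checks the textbook special case that all-zero phases yield the Chebyshev polynomial T_d(x). Nonzero phases shape other bounded polynomials, which is the mechanism one would use to approximate a payoff function. The sketch does not reproduce the authors' phase finding or payoff encoding.

```python
import numpy as np


def signal_operator(x: float) -> np.ndarray:
    """W(x) = exp(i*arccos(x)*X): the single-qubit signal rotation (W_x convention)."""
    s = np.sqrt(1.0 - x * x)
    return np.array([[x, 1j * s], [1j * s, x]], dtype=complex)


def phase(phi: float) -> np.ndarray:
    """exp(i*phi*Z): the tunable signal-processing rotation."""
    return np.diag([np.exp(1j * phi), np.exp(-1j * phi)])


def qsp_polynomial(x: float, phis) -> complex:
    """Return <0|U|0> for U = e^{i phi_0 Z} * prod_k [W(x) e^{i phi_k Z}]."""
    u = phase(phis[0])
    for phi in phis[1:]:
        u = u @ signal_operator(x) @ phase(phi)
    return u[0, 0]


# With all d+1 phases set to zero, d signal applications give T_d(x) = cos(d*arccos(x)).
d = 5
xs = np.linspace(-1.0, 1.0, 11)
vals = np.array([qsp_polynomial(x, [0.0] * (d + 1)) for x in xs])
print(np.allclose(vals.real, np.cos(d * np.arccos(xs))))
```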

Translated: 2023-07-27 11:38:38, Published: 2023-07-26

# QPLEX: Realizing the Integration of Quantum Computing into Combinatorial Optimization Software

arXiv: http://arxiv.org/abs/2307.14308v1

License: see the linked page

Juan Giraldo, José Ossorio, Norha M. Villegas, Gabriel Tamura, Ulrike Stege

(参考訳) 量子コンピューティングは、複雑な問題を解決する際に現在の古典的コンピュータの能力を超える可能性がある。コンビネーション最適化は量子コンピュータの重要なターゲット領域の一つとして登場しており、この分野で見られる問題は、多くの異なる産業応用分野(例えば、製造業務の強化や意思決定プロセスの改善)において重要な役割を担っている。現在、様々なタイプの高性能最適化ソフトウェア(例えば、ILOG CPLEX や Gurobi)があり、技術者や科学者が古典的なコンピュータを用いて最適化問題を解くのを支援する。量子リソースを利用するには、ユーザーは量子アルゴリズム、SDK、ライブラリのドメイン固有の知識を必要とする。私たちの目標は、従来の最適化パッケージにソフトウェアインフラストラクチャを追加することで、アプリケーション開発者がワークフローのセットアップ時に簡単に量子プラットフォームとインターフェースできるようにすることです。本稿では,古典的インタフェースによる量子資源のシームレス利用のためのツールを提案する。このアプローチは、複数の量子プロバイダへのアクセスを容易にするバックエンドを提供するPythonライブラリ拡張で構成されています。我々のパイプラインは、最適化ソフトウェア開発者が量子リソースを選択的に実験し、ハイブリッド量子古典最適化ソリューションの性能改善を評価することを可能にする。

Quantum computing has the potential to surpass the capabilities of current classical computers when solving complex problems. Combinatorial optimization has emerged as one of the key target areas for quantum computers as problems found in this field play a critical role in many different industrial application sectors (e.g., enhancing manufacturing operations or improving decision processes). Currently, there are different types of high-performance optimization software (e.g., ILOG CPLEX and Gurobi) that support engineers and scientists in solving optimization problems using classical computers. In order to utilize quantum resources, users require domain-specific knowledge of quantum algorithms, SDKs and libraries, which can be a limiting factor for any practitioner who wants to integrate this technology into their workflows. Our goal is to add software infrastructure to a classical optimization package so that application developers can interface with quantum platforms readily when setting up their workflows. This paper presents a tool for the seamless utilization of quantum resources through a classical interface. Our approach consists of a Python library extension that provides a backend to facilitate access to multiple quantum providers. Our pipeline enables optimization software developers to experiment with quantum resources selectively and assess performance improvements of hybrid quantum-classical optimization solutions.

Translated: 2023-07-27 11:38:13, Published: 2023-07-26

# Virtual Mirrors: Non-Line-of-Sight Imaging Beyond the Third Bounce

arXiv: http://arxiv.org/abs/2307.14341v1

License: see the linked page

Diego Royo and Talha Sultan and Adolfo Muñoz and Khadijeh Masumnia-Bisheh and Eric Brandt and Diego Gutierrez and Andreas Velten and Julio Marco

(参考訳) 非視線撮像法(NLOS)は、間接照明を用いて観察者が見えない複雑なシーンを再構成することができる。しかし、彼らは3オンスの照明のみを仮定しており、現在は単角形状に制限されており、特定の方向での撮像面の視認性は限られている。これらの制約を推理し、対処するために、平面拡散面は計算波ベースのNLOSイメージング領域で用いられる波長で特異に振る舞うという重要な観察を行う。このような表面を仮想鏡と呼ぶ。我々は、この観察を利用して、第3のバウンスを超えた照明を用いて、nlosイメージングの能力を拡大し、2つの問題、すなわち、可視角の制限された単一角物体の撮影と、2つの角の背後に隠れた物体の撮像に対処した。対象物の視認角を限定した画像に対して,まず,物体の位置や方向の推定として,現場表面上の既知の照度点の反射を,視認性に乏しい範囲で解析する。次に,対象物体を直接視認する他の面の二次開口部を計算的に構築することにより,これら限られた可視性物体を可視化する。単一角nlosイメージング以外にも,2つの角に隠れた物体の鏡像が形成される仮想ミラーの背後にある空間をイメージングすることにより,仮想ミラーの鏡面挙動を利用して2つの角に隠れた物体を撮像する。この論文の作成には鏡面は関与していない。

Non-line-of-sight (NLOS) imaging methods are capable of reconstructing complex scenes that are not visible to an observer using indirect illumination. However, they assume only third-bounce illumination, so they are currently limited to single-corner configurations, and present limited visibility when imaging surfaces at certain orientations. To reason about and tackle these limitations, we make the key observation that planar diffuse surfaces behave specularly at wavelengths used in the computational wave-based NLOS imaging domain. We call such surfaces virtual mirrors. We leverage this observation to expand the capabilities of NLOS imaging using illumination beyond the third bounce, addressing two problems: imaging single-corner objects at limited visibility angles, and imaging objects hidden behind two corners. To image objects at limited visibility angles, we first analyze the reflections of the known illuminated point on surfaces of the scene as an estimator of the position and orientation of objects with limited visibility. We then image those limited visibility objects by computationally building secondary apertures at other surfaces that observe the target object from a direct visibility perspective. Beyond single-corner NLOS imaging, we exploit the specular behavior of virtual mirrors to image objects hidden behind a second corner by imaging the space behind such virtual mirrors, where the mirror image of objects hidden around two corners is formed. No specular surfaces were involved in the making of this paper.
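
If a planar wall behaves as a mirror at the imaging wavelength, light from a hidden point appears to come from that point's mirror image across the wall's plane. A minimal geometric sketch follows. It assumes the plane is given by a point on it and a normal vector, which is our parameterization and not the paper's code.

```python
import numpy as np


def mirror_point(p, plane_point, plane_normal):
    """Reflect point p across the plane through plane_point with normal plane_normal."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(plane_point, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)              # unit normal
    signed_distance = np.dot(p - q, n)     # signed distance of p from the plane
    return p - 2.0 * signed_distance * n   # p' = p - 2((p - q) . n) n


# Hypothetical scene: a "virtual mirror" wall at x = 1 and a hidden point at x = 0.25.
hidden = [0.25, 0.5, 0.0]
image = mirror_point(hidden, plane_point=[1.0, 0.0, 0.0], plane_normal=[1.0, 0.0, 0.0])
print(image)
```

Imaging the region behind the virtual mirror recovers image points such as `image`; reflecting them back with the same function gives the hidden point's true position, because a reflection is its own inverse.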

Translated: 2023-07-27 11:33:38, Published: 2023-07-26

# Non-Hermitian tearing by dissipation

arXiv: http://arxiv.org/abs/2307.14340v1

License: see the linked page

Qian Du and Su-Peng Kou

(参考訳) 本稿では,エネルギー帯域が虚線ギャップを示し,エネルギー固有状態が特定の領域に結合する散逸下での非エルミート系について検討する。これらの現象を説明するために、我々は「非エルミート的破断」の概念を提案し、我々が定義した破断の程度は例外的な点で連続的な相転移を示す。非エルミート的分解は、バルク状態分離と境界状態分離の2つの形態で表される。非エルミート断裂のより深い理解のために、実空間におけるN*Nハミルトニアンを減少させることにより、k-空間において有効2*2ハミルトニアンを与える。さらに,一次元Su-Schrieffer-HeegerモデルとQi-Wu-Zhangモデルにおける非エルミート断裂についても検討する。この結果は、より複雑なシステムにおける非エルミート断裂の研究に理論的アプローチを提供する。

In this paper, we study a non-Hermitian system under dissipation, in which the energy band shows an imaginary line gap and the energy eigenstates are bound to a specific region. To describe these phenomena, we propose the concept of "non-Hermitian tearing", and the degree of tearing we define reveals a continuous phase transition at the exceptional point. Non-Hermitian tearing manifests in two forms: bulk state separation and boundary state decoupling. For a deeper understanding of non-Hermitian tearing, we derive an effective 2×2 Hamiltonian in k-space by reducing the N×N Hamiltonian in real space. In addition, we explore non-Hermitian tearing in the one-dimensional Su-Schrieffer-Heeger model and the Qi-Wu-Zhang model. Our results provide a theoretical approach for studying non-Hermitian tearing in more complex systems.
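
The abstract does not give its effective Hamiltonian explicitly, so the following is only a generic illustration of an exceptional point. Assume a two-mode Hamiltonian with loss −iγ on one mode and real coupling t. Its eigenvalues are E± = −iγ/2 ± sqrt(t² − γ²/4), and they coalesce at the exceptional point t = γ/2. Below that point the eigenvalues share a real part and split along the imaginary axis, so one state decays faster than the other, loosely analogous to the imaginary gap described above. The model and the parameter values are our assumptions.

```python
import numpy as np


def two_site_hamiltonian(t: float, gamma: float) -> np.ndarray:
    """Two coupled modes; the first one loses amplitude at rate gamma."""
    return np.array([[-1j * gamma, t], [t, 0.0]], dtype=complex)


def analytic_eigenvalues(t: float, gamma: float) -> np.ndarray:
    """Roots of E^2 + i*gamma*E - t^2 = 0."""
    root = np.sqrt(complex(t * t - gamma * gamma / 4.0))
    return np.array([-1j * gamma / 2 + root, -1j * gamma / 2 - root])


def same_spectrum(a, b, tol=1e-6) -> bool:
    """True if every eigenvalue in b has a numerical match in a (order-independent)."""
    a, b = np.asarray(a), np.asarray(b)
    return len(a) == len(b) and all(np.min(np.abs(a - z)) < tol for z in b)


gamma = 1.0
for t in (0.2, 0.5, 1.0):  # below, at, and above the exceptional point t = gamma/2
    exact = analytic_eigenvalues(t, gamma)
    numeric = np.linalg.eigvals(two_site_hamiltonian(t, gamma))
    print(t, np.round(exact, 4), same_spectrum(numeric, exact))
```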

Translated: 2023-07-27 11:33:10, Published: 2023-07-26

# TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning

arXiv: http://arxiv.org/abs/2307.14338v1

License: see the linked page

Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, Artem Babenko

(参考訳) グラフデータ問題に対するディープラーニング(DL)モデルはますます注目を集めている一方、勾配ブースト決定木(GBDT)に基づくアルゴリズムは依然として強力なゴーツーソリューションである。自然言語処理やコンピュータビジョンといった他の領域の最近のトレンドに続き、検索拡張表型DLモデルが最近提案されている。与えられた対象オブジェクトに対して、検索ベースモデルは、利用可能な(トレーニング)データから、最も近い隣接オブジェクトなどの他の関連オブジェクトを検索し、それらの特徴やラベルを使用してより良い予測を行う。しかし,既存の検索ベースの表型DLソリューションは,適切に調整された単純な検索自由ベースラインよりも,マイナーなメリットしか得られないことがわかった。したがって、検索に基づくアプローチが表型DLにとって価値のある方向であるかどうかは不明である。本論では,この問題に対して強い肯定的な回答を与える。まず,単純なフィードフォワードアーキテクチャを,多くの(表型)検索ベースモデルと同様の注意深い検索コンポーネントで段階的に拡張することから始める。次に,表データ問題に対する性能に大きな影響を与える注意機構について,いくつかの詳細を強調するが,先行研究では検討されなかった。その結果、TabRは単純な検索ベースの表型DLモデルであり、一連の公開ベンチマークにおいて、表型DLモデルの中で最高の平均性能を示し、複数のデータセットで新しい最先端技術となり、最近提案された‘GBDTフレンドリ’ベンチマークではGBDTモデルよりも優れています(第1図参照)。

Deep learning (DL) models for tabular data problems are receiving increasingly more attention, while the algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution. Following the recent trends in other domains, such as natural language processing and computer vision, several retrieval-augmented tabular DL models have been recently proposed. For a given target object, a retrieval-based model retrieves other relevant objects, such as the nearest neighbors, from the available (training) data and uses their features or even labels to make a better prediction. However, we show that the existing retrieval-based tabular DL solutions provide only minor, if any, benefits over the properly tuned simple retrieval-free baselines. Thus, it remains unclear whether the retrieval-based approach is a worthy direction for tabular DL. In this work, we give a strong positive answer to this question. We start by incrementally augmenting a simple feed-forward architecture with an attention-like retrieval component similar to those of many (tabular) retrieval-based models. Then, we highlight several details of the attention mechanism that turn out to have a massive impact on the performance on tabular data problems, but that were not explored in prior work. As a result, we design TabR -- a simple retrieval-based tabular DL model which, on a set of public benchmarks, demonstrates the best average performance among tabular DL models, becomes the new state-of-the-art on several datasets, and even outperforms GBDT models on the recently proposed ``GBDT-friendly'' benchmark (see the first figure).
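
The retrieval step described above can be sketched as a soft k-nearest-neighbour read-out. Score the training objects by similarity to the target, keep the top k, and average their labels with softmax weights. This is a simplified, non-learned illustration under our own assumptions: negative squared Euclidean distance as the similarity, raw features, and a temperature parameter. TabR itself learns its encoder and attention-like retrieval module, and it also uses the neighbours' features.

```python
import numpy as np


def retrieve_and_predict(x, train_x, train_y, k=3, temperature=1.0):
    """Predict a target for x from its k most similar training objects."""
    x = np.asarray(x, dtype=float)
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    # Similarity = negative squared Euclidean distance (an assumption for this sketch).
    sims = -np.sum((train_x - x) ** 2, axis=1) / temperature
    top = np.argsort(sims)[-k:]                     # indices of the k most similar objects
    weights = np.exp(sims[top] - sims[top].max())   # numerically stable softmax
    weights /= weights.sum()
    return float(weights @ train_y[top]), top


# Hypothetical toy training set with two features per object.
train_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_y = [1.0, 1.0, 3.0, 10.0]
pred, neighbours = retrieve_and_predict([0.05, 0.0], train_x, train_y, k=2)
print(pred, sorted(neighbours.tolist()))
```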

Translated: 2023-07-27 11:32:41, Published: 2023-07-26

# MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation

arXiv: http://arxiv.org/abs/2307.14336v1

License: see the linked page

Rajeev Yasarla, Hong Cai, Jisoo Jeong, Yunxiao Shi, Risheek Garrepalli, Fatih Porikli

(参考訳) モノクロ映像深度推定のための新しいメモリとアテンションフレームであるMAMOを提案する。 MAMOは、任意の単一画像深度推定ネットワークをビデオ深度推定モデルに拡張し、改善し、時間的情報を利用してより正確な深度を予測できる。また,MAMoでは,映像を流すときの深度予測を支援するメモリによるモデル拡張を行う。具体的には、前回のインスタンスの視覚的および変位的トークンを記憶する。これにより、現在のフレームの深さを予測する際に、深度ネットワークが過去から関連する特徴を相互参照することができる。本稿では,過去と現在の両方の視覚情報に対応するトークンを保持するために,メモリを継続的に更新する新しい手法を提案する。本稿では,自己認識モジュールを用いた視覚的・変位的メモリトークン間の時空間的関係を初めて学習するプロセスメモリ特徴に対する注意に基づくアプローチを採用する。さらに、自己注意の出力特徴を、交差注意を通して現在の視覚特徴と集約する。交差した特徴は最終的にデコーダに与えられ、現在のフレームの深さを予測する。 KITTI,NYU-Depth V2,DDADなどのベンチマーク実験を通じて,MAMOは単分子深度推定ネットワークを一貫して改善し,新しいSOTA(State-of-the-art)の精度を設定することを示した。特に,当社のMAMoビデオ深度推定は,SOTAコストボリュームに基づくビデオ深度モデルに準じて,低レイテンシで高い精度を実現する。

We propose MAMo, a novel memory and attention framework for monocular video depth estimation. MAMo can augment and improve any single-image depth estimation network into a video depth estimation model, enabling it to take advantage of temporal information to predict more accurate depth. In MAMo, we augment the model with memory, which aids depth prediction as the model streams through the video. Specifically, the memory stores learned visual and displacement tokens of the previous time instances. This allows the depth network to cross-reference relevant features from the past when predicting depth on the current frame. We introduce a novel scheme to continuously update the memory, optimizing it to keep tokens that correspond to both past and present visual information. We adopt an attention-based approach to process memory features: we first learn the spatio-temporal relation among the resultant visual and displacement memory tokens using a self-attention module. Further, the output features of self-attention are aggregated with the current visual features through cross-attention. The cross-attended features are finally given to a decoder to predict depth on the current frame. Through extensive experiments on several benchmarks, including KITTI, NYU-Depth V2, and DDAD, we show that MAMo consistently improves monocular depth estimation networks and sets new state-of-the-art (SOTA) accuracy. Notably, our MAMo video depth estimation provides higher accuracy with lower latency when compared to SOTA cost-volume-based video depth models.
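
The memory-then-attention flow described above can be sketched with plain scaled dot-product attention: self-attention over the stored memory tokens, then cross-attention from the current frame's tokens to that output. This is a minimal sketch under our own assumptions. It uses random token values, no learned projections, a single head, only visual tokens (MAMo also stores displacement tokens), and a fixed-size first-in-first-out memory instead of MAMo's optimized update scheme.

```python
import numpy as np


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values


class TokenMemory:
    """FIFO memory of past tokens (a simplification of MAMo's update scheme)."""

    def __init__(self, capacity: int, dim: int):
        self.capacity = capacity
        self.tokens = np.zeros((0, dim))

    def update(self, new_tokens):
        self.tokens = np.vstack([self.tokens, new_tokens])[-self.capacity:]


rng = np.random.default_rng(0)
dim, tokens_per_frame = 8, 4
memory = TokenMemory(capacity=12, dim=dim)
for frame in range(5):
    current = rng.normal(size=(tokens_per_frame, dim))  # current-frame visual tokens
    if len(memory.tokens):
        mem = attention(memory.tokens, memory.tokens, memory.tokens)  # self-attention over memory
        fused = attention(current, mem, mem)  # cross-attention: current frame queries memory
    else:
        fused = current  # first frame: nothing to recall yet
    # `fused` would be passed to a depth decoder; here we only track shapes.
    memory.update(current)
print(fused.shape, memory.tokens.shape)
```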

Translated: 2023-07-27 11:31:47, Published: 2023-07-26

# WavJourney: Compositional Audio Creation with Large Language Models

arXiv: http://arxiv.org/abs/2307.14335v1

License: see the linked page

Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

(参考訳) 大規模言語モデル(LLM)は、複雑な言語とビジョンタスクに取り組むために多様な専門家モデルを統合することに大きな期待を示している。人工知能生成コンテンツ(AIGC: Artificial Intelligence Generated Content)の分野を推し進めることの重要性にもかかわらず、インテリジェントなオーディオコンテンツ作成におけるそのポテンシャルは未解明のままである。そこで本研究では,音声,音楽,音響効果を含むストーリーラインを用いたテキスト指示による音声コンテンツ作成の問題に取り組む。 llmを利用して様々なオーディオモデルを音声コンテンツ生成につなげるシステムwavjourneyを提案する。聴覚シーンのテキスト記述が与えられると、wavjourneyはまずllmsに音声ストーリーテリング専用の構造化スクリプトを生成するように促す。オーディオスクリプトは、その時空間関係に基づいて構成された多様なオーディオ要素を含む。音声の概念表現として、音声スクリプトは対話的で解釈可能な人間の関与の根拠を提供する。その後、オーディオスクリプトをスクリプトコンパイラに供給し、それをコンピュータプログラムに変換する。プログラムの各行はタスク固有の音声生成モデルまたは計算操作関数(例えば、連結、混合)を呼び出します。そして、コンピュータプログラムを実行し、音声生成のための説明可能な解を得る。我々は,sf,教育,ラジオプレイなど,現実世界のさまざまなシナリオにおけるwavjourneyの実用性を示す。 WavJourneyの説明可能なインタラクティブなデザインは、マルチラウンド対話における人間と機械の共創を促進し、オーディオ制作における創造的制御と適応性を高める。 WavJourneyは人間の想像力をオーディオ化し、マルチメディアコンテンツの創造性のための新たな道を開く。

Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their potential in intelligent audio content creation remains unexplored. In this work, we tackle the problem of creating audio content with storylines encompassing speech, music, and sound effects, guided by text instructions. We present WavJourney, a system that leverages LLMs to connect various audio models for audio content generation. Given a text description of an auditory scene, WavJourney first prompts LLMs to generate a structured script dedicated to audio storytelling. The audio script incorporates diverse audio elements, organized based on their spatio-temporal relationships. As a conceptual representation of audio, the audio script provides an interactive and interpretable rationale for human engagement. Afterward, the audio script is fed into a script compiler, converting it into a computer program. Each line of the program calls a task-specific audio generation model or computational operation function (e.g., concatenate, mix). The computer program is then executed to obtain an explainable solution for audio generation. We demonstrate the practicality of WavJourney across diverse real-world scenarios, including science fiction, education, and radio play. The explainable and interactive design of WavJourney fosters human-machine co-creation in multi-round dialogues, enhancing creative control and adaptability in audio production. WavJourney audiolizes the human imagination, opening up new avenues for creativity in multimedia content creation.
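
The script-to-program step can be illustrated with a toy compiler. Each script entry becomes one program line that calls a generator, and further lines combine the results with concatenation and mixing. All names here (the script fields, `generate_tone`, `compile_script`, `SAMPLE_RATE`) are hypothetical, and sine tones stand in for the task-specific speech, music, and sound-effect models that WavJourney calls.

```python
import numpy as np

SAMPLE_RATE = 16_000  # hypothetical sample rate (Hz)


def generate_tone(freq_hz: float, seconds: float) -> np.ndarray:
    """Stand-in for an audio generation model: a plain sine tone."""
    t = np.arange(int(seconds * SAMPLE_RATE)) / SAMPLE_RATE
    return 0.5 * np.sin(2 * np.pi * freq_hz * t)


def concatenate(*clips: np.ndarray) -> np.ndarray:
    return np.concatenate(clips)


def mix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Overlay two clips, padding the shorter one with silence."""
    out = np.zeros(max(len(a), len(b)))
    out[: len(a)] += a
    out[: len(b)] += b
    return out


OPS = {"generate": generate_tone, "concatenate": concatenate, "mix": mix}

# Hypothetical structured script: two foreground events over one background layer.
script = {
    "foreground": [{"freq": 440.0, "seconds": 1.0}, {"freq": 660.0, "seconds": 0.5}],
    "background": {"freq": 110.0, "seconds": 2.0},
}


def compile_script(script: dict) -> list:
    """Translate the script into a 'program' of (target, op, args) lines."""
    program, fg_names = [], []
    for i, event in enumerate(script["foreground"]):
        program.append((f"fg{i}", "generate", (event["freq"], event["seconds"])))
        fg_names.append(f"fg{i}")
    program.append(("fg", "concatenate", tuple(fg_names)))
    bg = script["background"]
    program.append(("bg", "generate", (bg["freq"], bg["seconds"])))
    program.append(("out", "mix", ("fg", "bg")))
    return program


def run(program: list) -> np.ndarray:
    """Execute the program line by line; string args name earlier results."""
    env = {}
    for target, op, args in program:
        resolved = [env[a] if isinstance(a, str) else a for a in args]
        env[target] = OPS[op](*resolved)
    return env["out"]


program = compile_script(script)
for line in program:
    print(line)
audio = run(program)
print(len(audio) / SAMPLE_RATE)
```

Because the intermediate program is plain data, a person can inspect or edit individual lines before execution. This is the kind of interpretability the abstract attributes to the audio script.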

Translated: 2023-07-27 11:31:23, Published: 2023-07-26

# Towards Generalist Biomedical AI

arXiv: http://arxiv.org/abs/2307.14334v1

License: see the linked page

Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral, Dale Webster, Greg S. Corrado, Yossi Matias, Karan Singhal, Pete Florence, Alan Karthikesalingam, Vivek Natarajan

(参考訳) 医学は本質的にマルチモーダルであり、テキスト、画像、ゲノムなど幅広いリッチなデータモダリティを持つ。このデータを柔軟にエンコードし、統合し、大規模に解釈する一般のバイオメディカル人工知能(AI)システムは、科学的発見からケアデリバリーまで、影響のあるアプリケーションを可能にする可能性がある。これらのモデルの開発を可能にするために,我々はまず,新しいマルチモーダルバイオメディカルベンチマークであるMultiMedBenchをキュレートする。 MultiMedBenchは、医学的質問応答、マンモグラフィーと皮膚科のイメージ解釈、放射線学レポートの生成と要約、ゲノム変異呼び出しなどの14のタスクを含む。次に、汎用バイオメディカルAIシステムの概念実証であるMed-PaLM Multimodal(Med-PaLM M)を紹介する。 med-palm mは、同じモデル重みを持つ臨床言語、画像、ゲノムを含む生体医学データを柔軟にエンコードし、解釈する大きなマルチモーダル生成モデルである。 Med-PaLM Mは、すべてのMultiMedBenchタスクにおける技術状況と競合するか、あるいは超越している。また,新しい医療概念や課題に対するゼロショット一般化,タスク間のポジティブトランスファー学習,創発的ゼロショット医療推論の例を報告する。我々は,Med-PaLM Mの能力と限界を更に探究するために,モデル生成(およびヒト)胸部X線検査の放射線学的評価を行い,モデルスケールでの性能向上を観察する。 246例の胸部X線を並べて評価すると、臨床医は放射線科医が最大40.50%の症例で作成したものよりも、Med-PaLM Mの報告を相互に好んでいる。実世界のユースケースでこれらのモデルを検証するには、かなりの作業が必要であるが、私たちの結果は、一般のバイオメディカルAIシステムの開発に向けたマイルストーンである。

Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.

Translated: 2023-07-27 11:30:54, Published: 2023-07-26

# Event-based Vision for Early Prediction of Manipulation Actions

arXiv: http://arxiv.org/abs/2307.14332v1

License: see the linked page

Daniel Deniz and Cornelia Fermuller and Eduardo Ros and Manuel Rodriguez-Alvarez and Francisco Barranco

(参考訳) ニューロモルフィックな視覚センサーは、シーンで明るさが変化するときに非同期イベントのシーケンスを出力する人工網膜である。これらのセンサーは、非常に高時間分解能、動きのぼやけがなく、リアルタイム処理に理想的なスマートデータ圧縮など、多くの利点を提供している。本研究では,微粒な操作動作に関するイベントベースデータセットを導入し,イベントを伴う動作予測にトランスフォーマーを使用する実験を行った。認知ロボティクスや人間とロボットの相互作用の分野では、人間の行動の理解と予測にできる限り早く関心がある。早期予測は、計画のための複雑な段階を予測し、効果的かつリアルタイムなインタラクションを可能にする。当社のTransformerネットワークでは,オンライン推論を用いてイベントを使用して操作動作の予測を行っている。このモデルは、早期に行動を予測することに成功し、時間とともに信頼性を高め、最先端の分類を達成する。さらに,注意に基づくトランスフォーマアーキテクチャにより,モデルによって選択された時空間パターンの役割を考察できる。実験の結果,Transformer ネットワークはビデオベースのアプローチよりも優れた動作ダイナミックな特徴を捉え,アクション間の差異が極めて微妙な方法で発生するシナリオに成功していることがわかった。最後に,新たなイベントデータセットをリリースする。このデータセットは,アクション認識の操作に関する文献の中で最初のものだ。コードはhttps://github.com/DaniDeniz/EventVisionTransformer.comから入手できる。

Neuromorphic visual sensors are artificial retinas that output sequences of asynchronous events when brightness changes occur in the scene. These sensors offer many advantages including very high temporal resolution, no motion blur and smart data compression ideal for real-time processing. In this study, we introduce an event-based dataset on fine-grained manipulation actions and perform an experimental study on the use of transformers for action prediction with events. There is enormous interest in the fields of cognitive robotics and human-robot interaction on understanding and predicting human actions as early as possible. Early prediction allows anticipating complex stages for planning, enabling effective and real-time interaction. Our Transformer network uses events to predict manipulation actions as they occur, using online inference. The model succeeds at predicting actions early on, building up confidence over time and achieving state-of-the-art classification. Moreover, the attention-based transformer architecture allows us to study the role of the spatio-temporal patterns selected by the model. Our experiments show that the Transformer network captures action dynamic features outperforming video-based approaches and succeeding with scenarios where the differences between actions lie in very subtle cues. Finally, we release the new event dataset, which is the first in the literature for manipulation action recognition. Code will be available at https://github.com/DaniDeniz/EventVisionTransformer.
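
A neuromorphic sensor emits an event at a pixel when that pixel's log intensity has changed by more than a contrast threshold since its last event. The sketch below emulates this from ordinary frames, which is a common simplification, because real sensors are asynchronous and respond per pixel in continuous time. The threshold value and the event tuple layout (t, y, x, polarity) are our assumptions, not the format of the released dataset.

```python
import numpy as np


def frames_to_events(frames, timestamps, threshold=0.2, eps=1e-6):
    """Emit (t, y, x, polarity) whenever log intensity moves by >= threshold
    from the reference level stored at that pixel's last event.
    Simplification: at most one event per pixel per frame."""
    frames = np.asarray(frames, dtype=float)
    reference = np.log(frames[0] + eps)
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_i = np.log(frame + eps)
        diff = log_i - reference
        ys, xs = np.nonzero(np.abs(diff) >= threshold)
        for y, x in zip(ys, xs):
            events.append((t, int(y), int(x), 1 if diff[y, x] > 0 else -1))
            reference[y, x] = log_i[y, x]  # reset reference only where an event fired
    return events


# Hypothetical 2x2 sensor: one pixel brightens, one darkens, two stay constant.
frames = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 1.0], [1.0, 0.5]],
]
print(frames_to_events(frames, timestamps=[0.0, 0.01]))
```

Static pixels produce no output at all, which is the source of the data compression mentioned above.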

Translated: 2023-07-27 11:30:17, Published: 2023-07-26

# Visual Instruction Inversion: Image Editing via Visual Prompting

arXiv: http://arxiv.org/abs/2307.14331v1

License: see the linked page

Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee

(参考訳) テキスト条件の画像編集は画像編集の強力なツールとして登場した。しかし、多くの場合、言語は曖昧で、特定の画像編集を記述するのに役に立たない。このような課題に直面した場合、視覚的なプロンプトは、アイデアを伝えるためのより情報的で直感的な方法になり得る。本稿では,視覚的プロンプトによる画像編集手法を提案する。編集の「前」と「後」の画像を表す一対の例が与えられた場合、我々のゴールは、新しい画像で同じ編集を行うために使用できるテキストベースの編集方向を学ぶことである。テキストと画像の拡散モデルのリッチで事前訓練された編集機能を利用して、視覚的プロンプトを編集命令に変換する。この結果から,一対の例では,最先端のテキストコンディショニング画像編集フレームワークと比較して,競合的な結果が得られることがわかった。

Text-conditioned image editing has emerged as a powerful tool for editing images. However, in many situations, language can be ambiguous and ineffective in describing specific image edits. When faced with such challenges, visual prompts can be a more informative and intuitive way to convey ideas. We present a method for image editing via visual prompting. Given an example pair that represents the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that with just one example pair, we can achieve results competitive with state-of-the-art text-conditioned image editing frameworks.

Translated: 2023-07-27 11:29:53, Published: 2023-07-26

# Effective-Hamiltonian theory: An approximation to the equilibrium state of open quantum systems

arXiv: http://arxiv.org/abs/2307.14330v1

License: see the linked page

Nicholas Anto-Sztrikacs, Brett Min, Marlon Brenes, and Dvira Segal

(参考訳) 熱浴との強いカップリングにおける量子系の平衡状態(平均力ギブス状態)の近似として,最近開発された実効ハミルトニアン(effh)法 [prx quantum $\bf{4}$, 020307 (2023)] を拡張してベンチマークを行った。 EFFH法は近似フレームワークである。反応-配位写像、ポーラロン変換、制御された切断の組み合わせにより、系-バスカップリングパラメータをシステムのハミルトニアンにインプリントする。まず、$\textit{variational}$ EFFH 技術を開発する。本手法では,システムバス結合パラメータ(元のEFFH法のように)と浴槽温度の両方で,系のパラメータを正規化する。次に,一般化スピン-ボーソンモデルを適用し,数値実効シミュレーションに対するeffh法からの平衡状態の評価を行い,ブラウンスペクトル関数を用いた偏光とコヒーレンスの両方について良好な一致を示す。第3に, EFFH法と慣れ親しんだ (正規および変動) ポーラロン法を対比した。両手法が平衡状態の類似構造を予測することを示し,EFFH法は簡単な計算と閉形式解析結果の利点を提供する。同様に、系の周波数に匹敵する温度では、EFFH法は、極弱から超強までのシステム-バス結合の完全な範囲において、平均力ギブズ状態に対して良好な近似を提供する。

We extend and benchmark the recently developed Effective-Hamiltonian (EFFH) method [PRX Quantum $\bf{4}$, 020307 (2023)] as an approximation to the equilibrium state ("mean-force Gibbs state") of a quantum system strongly coupled to a thermal bath. The EFFH method is an approximate framework: through a combination of the reaction-coordinate mapping, a polaron transformation, and a controlled truncation, it imprints the system-bath coupling parameters into the system's Hamiltonian. First, we develop a $\textit{variational}$ EFFH technique, in which the system's parameters are renormalized by both the system-bath coupling parameters (as in the original EFFH approach) and the bath's temperature. Second, adopting the generalized spin-boson model, we benchmark the equilibrium state from the EFFH treatment against numerically exact simulations and demonstrate good agreement for both polarization and coherences using the Brownian spectral function. Third, we contrast the (normal and variational) EFFH approach with the familiar (normal and variational) polaron treatment. We show that the two methods predict a similar structure for the equilibrium state, although the EFFH approach offers the advantage of simpler calculations and closed-form analytical results. Altogether, we argue that for temperatures comparable to the system's frequencies, the EFFH methodology provides a good approximation for the mean-force Gibbs state across the full range of system-bath coupling, from ultraweak to ultrastrong.
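
Whatever renormalized parameters an effective-Hamiltonian treatment produces, the last step is evaluating a Gibbs state ρ = exp(−βH_eff)/Z of a bath-free Hamiltonian. The sketch below does this for a generic two-level Hamiltonian H = (ε/2)σ_z + (Δ/2)σ_x and reads off the polarization ⟨σ_z⟩ and the coherence ⟨σ_x⟩. The values of ε and Δ are left as plain inputs because the abstract does not state the EFFH renormalization formulas. The numbers used are placeholders, in units with ħ = k_B = 1.

```python
import numpy as np

SIGMA_X = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def gibbs_state(h: np.ndarray, beta: float) -> np.ndarray:
    """rho = exp(-beta*H)/Z via the eigen-decomposition of a Hermitian H."""
    energies, vectors = np.linalg.eigh(h)
    weights = np.exp(-beta * (energies - energies.min()))  # shift for numerical stability
    rho = (vectors * weights) @ vectors.conj().T           # V diag(w) V^dagger
    return rho / np.trace(rho)


def spin_observables(eps: float, delta: float, beta: float):
    """Polarization <sigma_z> and coherence <sigma_x> in the Gibbs state."""
    h = 0.5 * eps * SIGMA_Z + 0.5 * delta * SIGMA_X
    rho = gibbs_state(h, beta)
    return np.trace(rho @ SIGMA_Z).real, np.trace(rho @ SIGMA_X).real


# Placeholder parameters; eps_eff and delta_eff stand in for renormalized EFFH values.
eps_eff, delta_eff, beta = 1.0, 0.5, 2.0
sz, sx = spin_observables(eps_eff, delta_eff, beta)
# Closed form for this two-level case: <sigma_z> = -(eps/E) tanh(beta*E/2), E = sqrt(eps^2 + delta^2)
e = np.hypot(eps_eff, delta_eff)
print(sz, -(eps_eff / e) * np.tanh(beta * e / 2))
```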

Translated: 2023-07-27 11:29:42, Published: 2023-07-26

# High-sensitivity AC-charge detection with a MHz-frequency fluxonium qubit

arXiv: http://arxiv.org/abs/2307.14329v1

License: see the linked page

B.-L. Najera-Santos, R. Rousseau, K. Gerashchenko, H. Patange, A. Riva, M. Villiers, T. Briant, P.-F. Cohadon, A. Heidmann, J. Palomo, M. Rosticher, H. le Sueur, A. Sarlette, W. C. Smith, Z. Leghtas, E. Flurin, T. Jacqmin, S. Deléglise

(参考訳) 強い双極子モーメントと長いコヒーレンス時間により、超伝導量子ビットはハイブリッド量子回路において顕著な成功を収めた。しかし、ほとんどの量子ビットアーキテクチャはGHz周波数範囲に限定されており、相互作用可能なシステムのクラスを厳しく制限している。一方、フラクソニウム量子ビットは、標準的なマイクロ波技術で操作され読み出されながら、非常に低い周波数にバイアスすることができる。ここでは、前例のない低い遷移周波数を1.8〜\mathrm{MHz}$で設計し、運用する。最終基底状態が 97.7~\%$ の ‘hot' 量子ビット遷移のサイドバンド冷却は, 有効温度が 23~\mu\mathrm{K}$ の値に対応する。さらに,コヒーレンス時間$t_1=34~\mu\mathrm{s}$,$t_2^*=39〜\mu\mathrm{s}$,シングルショットのqubit状態の読み出しによるコヒーレント操作も示す。重要なことは、量子ビット遷移を容量結合導波路で直接処理することにより、高周波磁場に対する高い感度を示すことである。周期量子ビット合成と問合せにより、この低周波量子ビットを周波数分解電荷センサに変換する。この方法により、電荷感度は33〜\mu\mathrm{e}/\sqrt{\mathrm{Hz}}$、エネルギー感度(ヘルツあたりジュール)は2.8〜\hbar$となる。この方法は、直流電荷ノイズに対する固有の非感度を維持しつつ、最先端のトランスポートベースデバイスに匹敵する。高電荷感度と大きな静電容量シャントが組み合わさって、1-10〜\mathrm{MHz}$範囲の量子現象を探索するための新しい経路を解き放つ。

Owing to their strong dipole moment and long coherence times, superconducting qubits have demonstrated remarkable success in hybrid quantum circuits. However, most qubit architectures are limited to the GHz frequency range, severely constraining the class of systems they can interact with. The fluxonium qubit, on the other hand, can be biased to very low frequency while being manipulated and read out with standard microwave techniques. Here, we design and operate a heavy fluxonium with an unprecedentedly low transition frequency of $1.8~\mathrm{MHz}$. We demonstrate resolved sideband cooling of the "hot" qubit transition with a final ground state population of $97.7~\%$, corresponding to an effective temperature of $23~\mu\mathrm{K}$. We further demonstrate coherent manipulation with coherence times $T_1=34~\mu\mathrm{s}$, $T_2^*=39~\mu\mathrm{s}$, and single-shot readout of the qubit state. Importantly, by directly addressing the qubit transition with a capacitively coupled waveguide, we showcase its high sensitivity to a radio-frequency field. Through cyclic qubit preparation and interrogation, we transform this low-frequency fluxonium qubit into a frequency-resolved charge sensor. This method results in a charge sensitivity of $33~\mu\mathrm{e}/\sqrt{\mathrm{Hz}}$, or an energy sensitivity (in joules per hertz) of $2.8~\hbar$. The sensor rivals state-of-the-art transport-based devices while maintaining inherent insensitivity to DC charge noise. The high charge sensitivity combined with a large capacitive shunt unlocks new avenues for exploring quantum phenomena in the $1-10~\mathrm{MHz}$ range, such as the strong-coupling regime with a resonant macroscopic mechanical resonator.
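
The quoted effective temperature follows from treating the qubit as a two-level system in thermal equilibrium. With ground population p_g and excited population 1 − p_g, the Boltzmann ratio (1 − p_g)/p_g = exp(−hf/(k_B T)) gives T = hf / (k_B ln(p_g/(1 − p_g))). Below is a short check with the abstract's numbers (f = 1.8 MHz, p_g = 0.977), assuming that only these two levels are populated.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s (exact SI value)
K_B = 1.380649e-23   # Boltzmann constant, J/K (exact SI value)


def effective_temperature(freq_hz: float, ground_population: float) -> float:
    """Two-level Boltzmann temperature (in kelvin) from the ground-state population."""
    excited = 1.0 - ground_population
    return H * freq_hz / (K_B * math.log(ground_population / excited))


t_eff = effective_temperature(1.8e6, 0.977)
print(f"{t_eff * 1e6:.1f} uK")  # prints 23.0 uK, matching the quoted 23 uK
```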

Translated: 2023-07-27 11:29:12, Published: 2023-07-26

PDF registration status (published: 20230726)