Fugu-MT: Translations of arXiv Papers

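This site provides Japanese translations of arXiv papers that are at most 30 pages long and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.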

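The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).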

Papers with a publication date of 2023-11-26.

Title	Authors	Abstract	Publication date / translation date
# DonationChain: A New Platform for Blockchain-Based Donation-Tracking System ( http://arxiv.org/abs/2311.03573v2 ) License: see linked page	Chaimaa Nairi, Murtaza Cicioglu, Ali Calhan	A donation-tracking system using smart contracts and blockchain technology has the potential to revolutionize the way charitable giving is tracked and managed. This article explores how smart contracts and blockchain can be used to create a transparent and secure ledger for tracking charitable donations. We discuss the limitations of traditional donation systems and how a blockchain-based system can help overcome these challenges. We describe how smart contracts work, how they can be used in donation tracking, and the benefits they offer, including automated processes, reduced transaction fees, and increased accountability. We also discuss how blockchain technology provides a decentralized and tamper-proof ledger that can increase transparency and help prevent fraud. Finally, we examine some of the challenges that must be addressed when implementing a smart contract-based donation tracking system, such as the need for technical expertise and the potential for security breaches. Overall, a donation-tracking system using smart contracts and blockchain has the potential to increase trust and accountability in the donation process, which can ultimately help ensure that donations are used for their intended purposes.	
Translated: 2024-03-25 13:36:10 Published: 2023-11-26
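The tamper-evident ledger idea mentioned in the DonationChain abstract can be illustrated with a minimal hash chain. This is only a sketch under stated assumptions, not the DonationChain implementation: the record fields (`donor`, `amount`, `purpose`) and the function names are hypothetical, and a real blockchain adds consensus, signatures, and smart-contract logic on top of this.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder hash for the first record


def entry_hash(prev_hash, donor, amount, purpose):
    """Hash a donation record together with the hash of the previous record."""
    payload = json.dumps(
        {"prev": prev_hash, "donor": donor, "amount": amount, "purpose": purpose},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_donation(ledger, donor, amount, purpose):
    """Append a record that is chained to the last record in the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "donor": donor,
        "amount": amount,
        "purpose": purpose,
        "prev": prev_hash,
        "hash": entry_hash(prev_hash, donor, amount, purpose),
    })


def verify(ledger):
    """Return True only if every record still matches its stored hash and link."""
    prev_hash = GENESIS
    for e in ledger:
        expected = entry_hash(prev_hash, e["donor"], e["amount"], e["purpose"])
        if e["prev"] != prev_hash or e["hash"] != expected:
            return False
        prev_hash = e["hash"]
    return True
```

Changing any stored field after the fact makes `verify` fail, which is the property the abstract refers to as tamper-proofing; here it is only tamper evidence on a single machine.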
# "Make Them Change it Every Week!": A Qualitative Exploration of Online Developer Advice on Usable and Secure Authentication ( http://arxiv.org/abs/2309.00744v2 ) License: see linked page	Jan H. Klemmer, Marco Gutfleisch, Christian Stransky, Yasemin Acar, M. Angela Sasse, Sascha Fahl	Usable and secure authentication on the web and beyond is mission-critical. While password-based authentication is still widespread, users have trouble dealing with potentially hundreds of online accounts and their passwords. Alternatives or extensions such as multi-factor authentication have their own challenges and find only limited adoption. Finding the right balance between security and usability is challenging for developers. Previous work found that developers use online resources to inform security decisions when writing code. Similar to other areas, lots of authentication advice for developers is available online, including blog posts, discussions on Stack Overflow, research papers, or guidelines by institutions like OWASP or NIST. 
We are the first to explore developer advice on authentication that affects usable security for end-users. Based on a survey with 18 professional web developers, we obtained 406 documents and qualitatively analyzed 272 contained pieces of advice in depth. We aim to understand the accessibility and quality of online advice and provide insights into how online advice might contribute to (in)secure and (un)usable authentication. We find that advice is scattered and that finding recommendable, consistent advice is a challenge for developers, among others. The most common advice is for password-based authentication, with little advice for more modern alternatives. Unfortunately, many pieces of advice are debatable (e.g., complex password policies), outdated (e.g., enforcing regular password changes), or contradictory and might lead to unusable or insecure authentication. Based on our findings, we make recommendations for developers, advice providers, official institutions, and academia on how to improve online advice for developers.	Translated: 2024-03-19 06:53:05 Published: 2023-11-26
# Challenges in Blockchain as a Solution for IoT Ecosystem Threats and Access Control: A Survey ( http://arxiv.org/abs/2311.15290v1 ) License: see linked page	Suranjeet Chowdhury Avik, Sujit Biswas, Md Atiqur Rahaman Ahad, Zohaib Latif, Abdullah Alghamdi, Hamad Abosaq, Anupam Kumar Bairagi	The Internet of Things (IoT) is increasingly influencing and transforming various aspects of our daily lives. Contrary to popular belief, it raises security and privacy issues as it is used to collect data from consumers or automated systems. Numerous published articles discuss issues like centralised control systems and potential alternatives like integration with blockchain. Although a few recent surveys focused on the challenges and solutions facing the IoT ecosystem, most of them did not concentrate on the threats, difficulties, or blockchain-based solutions. Additionally, none of them focused on blockchain and IoT integration challenges and attacks. In the context of the IoT ecosystem, overall security measures are very important to understand the overall challenges. This article summarises difficulties that have been outlined in numerous recent articles and articulates various attacks and security challenges in a variety of approaches, including blockchain-based solutions. 
More specifically, this contribution briefly consolidates threats, access control issues, and remedies. In addition, this research lists some attacks on public blockchain protocols, with real-life examples that can guide researchers in taking preventive measures for IoT use cases. Finally, it outlines future research directions, identifying research gaps by analysing contemporary research contributions.	Translated: 2024-03-18 15:51:52 Published: 2023-11-26
# Understanding the Utilization of Cryptocurrency in the Metaverse and Security Implications ( http://arxiv.org/abs/2311.15360v1 ) License: see linked page	Ayodeji Adeniran, Mohammed Alkinoon, David Mohaisen	We present our results on analyzing and understanding the behavior and security of various metaverse platforms incorporating cryptocurrencies. We obtained the top metaverse coins with a capitalization of at least 25 million US dollars and the top metaverse domains for the coins, augmented our data with name registration information (via whois), including the hosting DNS IP addresses, registrant location, registrar URL, DNS service provider, and expiry date, and checked each metaverse website for information on fiat currency for cryptocurrency. The result from virustotal.com includes the communication files, passive DNS, referrer files, and malicious detections for each metaverse domain. Among other insights, we discovered various incidents of malicious detection associated with metaverse websites. Our analysis highlights indicators of (in)security, in the correlation sense, with the files and other attributes that are potentially responsible for the malicious activities.	Translated: 2024-03-18 15:51:52 Published: 2023-11-26
# The Infrastructure Utilization of Free Contents Websites Reveal their Security Characteristics ( http://arxiv.org/abs/2311.15363v1 ) License: see linked page	Mohamed Alqadhi, David Mohaisen	Free Content Websites (FCWs) are a significant element of the Web, and understanding their use is essential. This study analyzes FCWs worldwide by studying how they correlate with different network sizes, cloud service providers, and countries, depending on the type of content they offer. Additionally, we compare these findings with those of premium content websites (PCWs). Our analysis concluded that FCWs correlate mainly with networks of medium size, which are associated with a higher concentration of malicious websites. Moreover, we found a strong correlation between PCWs, cloud, and country hosting patterns. At the same time, some correlations were also observed concerning FCWs but with distinct patterns contrasting each other for both types. Our investigation contributes to comprehending the FCW ecosystem through correlation analysis, and the indicative results point toward controlling the potential risks caused by these sites through adequate segregation and filtering due to their concentration.	Translated: 2024-03-18 15:42:08 Published: 2023-11-26
# Enhancing Trajectory Prediction through Self-Supervised Waypoint Noise Prediction ( http://arxiv.org/abs/2312.09466v1 ) License: see linked page	Pranav Singh Chib, Pravendra Singh	Trajectory prediction is an important task that involves modeling the indeterminate nature of traffic actors to forecast future trajectories given the observed trajectory sequences. However, current methods confine themselves to presumed data manifolds, assuming that trajectories strictly adhere to these manifolds, resulting in overly simplified predictions. To this end, we propose a novel approach called SSWNP (Self-Supervised Waypoint Noise Prediction). In our approach, we first create clean and noise-augmented views of past observed trajectories across the spatial domain of waypoints. We then compel the trajectory prediction model to maintain spatial consistency between predictions from these two views, in addition to the trajectory prediction task. Introducing the noise-augmented view mitigates the model's reliance on a narrow interpretation of the data manifold, enabling it to learn more plausible and diverse representations. We also predict the noise present in the two views of past observed trajectories as an auxiliary self-supervised task, enhancing the model's understanding of the underlying representation and future predictions. 
Empirical evidence demonstrates that the incorporation of SSWNP into the model learning process significantly improves performance, even in noisy environments, when compared to baseline methods. Our approach can complement existing trajectory prediction methods. To showcase the effectiveness of our approach, we conducted extensive experiments on three datasets: NBA Sports VU, ETH-UCY, and TrajNet++, with experimental results highlighting the substantial improvement achieved in trajectory prediction tasks.	Translated: 2024-01-15 14:13:37 Published: 2023-11-26
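The three ingredients named in the SSWNP abstract (a clean and a noise-augmented view, a consistency term between the two predictions, and an auxiliary noise-prediction term) can be sketched as a single loss. This is a rough illustration under assumptions, not the authors' code: the Gaussian noise model, the constant-velocity predictor, the zero noise head, and the loss weights are all hypothetical stand-ins for learned components.

```python
import random


def add_waypoint_noise(past, sigma, rng):
    """Return a noise-augmented copy of the past waypoints and the noise added."""
    noise = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in past]
    noisy = [(x + nx, y + ny) for (x, y), (nx, ny) in zip(past, noise)]
    return noisy, noise


def constant_velocity_predictor(past, horizon):
    """Hypothetical stand-in model: extrapolate the last observed step."""
    (x0, y0), (x1, y1) = past[-2], past[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def zero_noise_head(view):
    """Hypothetical stand-in noise head: always predicts zero noise."""
    return [(0.0, 0.0) for _ in view]


def mean_sq_dist(a, b):
    """Mean squared Euclidean distance between two equally long point lists."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def combined_loss(past, future, sigma, rng,
                  predictor=constant_velocity_predictor,
                  noise_head=zero_noise_head,
                  w_consistency=1.0, w_noise=1.0):
    """Task loss on both views + consistency between views + noise prediction."""
    noisy_past, true_noise = add_waypoint_noise(past, sigma, rng)
    horizon = len(future)
    pred_clean = predictor(past, horizon)
    pred_noisy = predictor(noisy_past, horizon)
    task = mean_sq_dist(pred_clean, future) + mean_sq_dist(pred_noisy, future)
    consistency = mean_sq_dist(pred_clean, pred_noisy)
    zero = [(0.0, 0.0)] * len(past)  # the clean view carries no added noise
    noise_term = (mean_sq_dist(noise_head(past), zero)
                  + mean_sq_dist(noise_head(noisy_past), true_noise))
    return task + w_consistency * consistency + w_noise * noise_term
```

In an actual training setup the predictor and noise head would be neural networks and this scalar would be minimized by gradient descent; the sketch only shows how the terms combine.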
# AI-driven E-Liability Knowledge Graphs: A Comprehensive Framework for Supply Chain Carbon Accounting and Emissions Liability Management ( http://arxiv.org/abs/2312.00045v1 ) License: see linked page	Olamide Oladeji, Seyed Shahabeddin Mousavi, Marc Roston	While carbon accounting plays a fundamental role in our fight against climate change, it is not without its challenges. We begin the paper with a critique of the conventional carbon accounting practices, after which we proceed to introduce the E-liability carbon accounting methodology and Emissions Liability Management (ELM) originally proposed by Kaplan and Ramanna, highlighting their strengths. Recognizing the immense value of this novel approach for real-world carbon accounting improvement, we introduce a novel data-driven integrative framework that leverages AI and computation, the E-Liability Knowledge Graph framework, to achieve real-world implementation of the E-liability carbon accounting methodology. In addition to providing a path-to-implementation, our proposed framework brings clarity to the complex environmental interactions within supply chains, thus enabling better informed and more responsible decision-making. We analyze the implementation aspects of this framework and conclude with a discourse on the role of this AI-aided knowledge graph in ensuring the transparency and decarbonization of global supply chains.	Translated: 2023-12-11 03:59:47 Published: 2023-11-26
# Advancing AI Audits for Enhanced AI Governance ( http://arxiv.org/abs/2312.00044v1 ) License: see linked page	Arisa Ema, Ryo Sato, Tomoharu Hase, Masafumi Nakano, Shinji Kamimura, Hiromu Kitamura	As artificial intelligence (AI) is integrated into various services and systems in society, many companies and organizations have proposed AI principles and policies and made related commitments. Conversely, some have proposed the need for independent audits, arguing that the voluntary principles adopted by the developers and providers of AI services and systems insufficiently address risk. This policy recommendation summarizes the issues related to the auditing of AI services and systems and presents three recommendations for promoting AI auditing that contribute to sound AI governance. Recommendation 1: Development of institutional design for AI audits. Recommendation 2: Training human resources for AI audits. Recommendation 3: Updating AI audits in accordance with technological progress. In this policy recommendation, AI is assumed to be that which recognizes and predicts data, with the last chapter outlining how generative AI should be audited.	Translated: 2023-12-11 03:59:27 Published: 2023-11-26
# Transforming organic chemistry research paradigms: moving from manual efforts to the intersection of automation and artificial intelligence ( http://arxiv.org/abs/2312.00808v1 ) License: see linked page	Chengchun Liu, Yuntian Chen, Fanyang Mo	Organic chemistry is undergoing a major paradigm shift, moving from a labor-intensive approach to a new era dominated by automation and artificial intelligence (AI). This transformative shift is being driven by technological advances, the ever-increasing demand for greater research efficiency and accuracy, and the burgeoning growth of interdisciplinary research. AI models, supported by computational power and algorithms, are drastically reshaping synthetic planning and introducing groundbreaking ways to tackle complex molecular synthesis. In addition, autonomous robotic systems are rapidly accelerating the pace of discovery by performing tedious tasks with unprecedented speed and precision. This article examines the multiple opportunities and challenges presented by this paradigm shift and explores its far-reaching implications. It provides valuable insights into the future trajectory of organic chemistry research, which is increasingly defined by the synergistic interaction of automation and AI.	Translated: 2023-12-11 03:31:10 Published: 2023-11-26
# Leveraging AI-derived Data for Carbon Accounting: Information Extraction from Alternative Sources ( http://arxiv.org/abs/2312.03722v1 ) License: see linked page	Olamide Oladeji, Seyed Shahabeddin Mousavi	Carbon accounting is a fundamental building block in our global path to emissions reduction and decarbonization, yet many challenges exist in achieving reliable and trusted carbon accounting measures. We argue that carbon accounting not only needs to be more data-driven, but also more methodologically sound. We discuss the need for alternative, more diverse data sources that can play a significant role on our path to trusted carbon accounting procedures. We elaborate on not only why, but how Artificial Intelligence (AI) in general and Natural Language Processing (NLP) in particular can unlock reasonable access to a treasure trove of alternative data sets, in light of recent advances in the field that better enable the utilization of unstructured data in this process. We present a case study of the recent developments on real-world data via an NLP-powered analysis using OpenAI's GPT API on financial and shipping data. We conclude the paper with a discussion on how these methods and approaches can be integrated into a broader framework for AI-enabled integrative carbon accounting.	Translated: 2023-12-11 03:22:33 Published: 2023-11-26
# Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability ( http://arxiv.org/abs/2312.03721v1 ) License: see linked page	Simon Lermen and Ondřej Kvapil	There has been increasing interest in evaluations of language models for a variety of risks and characteristics. Evaluations relying on natural language understanding for grading can often be performed at scale by using other language models. We test the robustness of these model-graded evaluations to injections on different datasets including a new Deception Eval. These injections resemble direct communication between the testee and the evaluator to change their grading. We extrapolate that future, more intelligent models might manipulate or cooperate with their evaluation model. We find significant susceptibility to these injections in state-of-the-art commercial models on all examined evaluations. Furthermore, similar injections can be used on automated interpretability frameworks to produce misleading model-written explanations. The results inspire future work and should caution against unqualified trust in evaluations and automated interpretability.	Translated: 2023-12-11 03:22:13 Published: 2023-11-26
# Negotiating with LLMS: Prompt Hacks, Skill Gaps, and Reasoning Deficits ( http://arxiv.org/abs/2312.03720v1 ) License: see linked page	Johannes Schneider, Steffi Haag, Leona Chandra Kruse	Large language models (LLMs) like ChatGPT have reached the 100 million user mark in record time and might increasingly enter all areas of our life, leading to a diverse set of interactions between those Artificial Intelligence models and humans. While many studies have discussed governance and regulations deductively from first-order principles, few studies provide an inductive, data-driven lens based on observing dialogues between humans and LLMs, especially when it comes to non-collaborative, competitive situations that have the potential to pose a serious threat to people. In this work, we conduct a user study engaging over 40 individuals across all age groups in price negotiations with an LLM. We explore how people interact with an LLM, investigating differences in negotiation outcomes and strategies. Furthermore, we highlight shortcomings of LLMs with respect to their reasoning capabilities and, in turn, their susceptibility to prompt hacking, which intends to manipulate the LLM to make agreements that are against its instructions or beyond any rationality. We also show that the negotiated prices humans manage to achieve span a broad range, which points to a literacy gap in effectively interacting with LLMs.	Translated: 2023-12-11 03:22:02 Published: 2023-11-26
# Assessing AI Chatbots Performance in Comprehensive Standardized Test Preparation; A Case Study with GRE ( http://arxiv.org/abs/2312.03719v1 ) License: see linked page	Mohammad Abu-Haifa, Bara'a Etawi, Huthaifa Alkhatatbeh, and Ayman Ababneh	This research paper presents a comprehensive evaluation of the performance of three artificial intelligence chatbots: Bing, ChatGPT, and GPT-4, in addressing standardized test questions. The Graduate Record Examination (GRE) serves as a case study in this paper, encompassing both quantitative reasoning and verbal skills. A total of 137 quantitative reasoning questions, featuring diverse styles, and 157 verbal questions, categorized into varying levels of difficulty (easy, medium, and hard), were administered to assess the chatbots' capabilities. This paper provides a detailed examination of the results and their implications for the utilization of artificial intelligence in standardized test preparation by presenting the performance of each chatbot across various skills and styles tested in the exam. Additionally, this paper explores the proficiency of artificial intelligence in addressing image-based questions and illustrates the uncertainty level of each chatbot. The results reveal varying degrees of success across the chatbots, demonstrating the influence of model sophistication and training data. 
GPT-4 emerged as the most proficient, especially in complex language understanding tasks, highlighting the evolution of artificial intelligence in language comprehension and its ability to pass the exam with a high score.	Translated: 2023-12-11 03:21:42 Published: 2023-11-26
# Large Language Models in Law: A Survey ( http://arxiv.org/abs/2312.03718v1 ) License: see linked page	Jinqi Lai, Wensheng Gan, Jiayang Wu, Zhenlian Qi, Philip S. Yu	The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI will drive transformation in the traditional judicial industry. However, the application of legal large language models (LLMs) is still in its nascent stage, and several challenges need to be addressed. In this paper, we aim to provide a comprehensive survey of legal LLMs. We not only conduct an extensive survey of LLMs, but also expose their applications in the judicial system. We first provide an overview of AI technologies in the legal field and showcase the recent research in LLMs. Then, we discuss the practical implementation presented by legal LLMs, such as providing legal advice to users and assisting judges during trials. In addition, we explore the limitations of legal LLMs, including data, algorithms, and judicial practice. Finally, we summarize practical recommendations and propose future development directions to address these challenges.	Translated: 2023-12-11 03:21:18 Published: 2023-11-26
# ChatGPT Application In Summarizing An Evolution Of Deep Learning Techniques In Imaging: A Qualitative Study ( http://arxiv.org/abs/2312.03723v1 ) License: see linked page	Arman Sarraf, Amirabbas Abbaspour	The pursuit of article or text summarization has captured the attention of natural language processing (NLP) practitioners, presenting itself as a formidable challenge. ChatGPT 3.5 exhibits the capacity to condense the content of up to 3000 tokens into a single page, aiming to retain pivotal information from a given text across diverse themes. In a qualitative study, we selected seven scientific articles and employed the publicly available ChatGPT service to generate summaries of these articles. Subsequently, we engaged six co-authors of the articles in a survey, presenting five questions to evaluate the quality of the summaries compared to the original content. The findings revealed that the summaries produced by ChatGPT effectively encapsulated the crucial information present in the articles, preserving the principal message of each manuscript. Nonetheless, there was a slight diminishment in the technical depth of the summaries as opposed to the original articles. As a result, our conclusion underscores ChatGPT's text summarization capability as a potent tool for extracting essential insights in a manner more aligned with reporting than purely scientific discourse.	Translated: 2023-12-11 03:05:53 Published: 2023-11-26
# 分布アルゴリズム推定における遺伝的ドリフトのシャープ境界 Sharp Bounds for Genetic Drift in Estimation of Distribution Algorithms ( http://arxiv.org/abs/1910.14389v2 ) ライセンス: Link先を確認	Benjamin Doerr, Weijie Zheng	(参考訳) 分布アルゴリズムの推定 (EDAs) は、人口ではなく確率的モデルを進化させるという広義の進化的アルゴリズム(EA)の一分野である。既存のアルゴリズムはこのカテゴリに分類される。 EAにおける遺伝的ドリフトと類似して、EDAは、適合性によって正当化されない確率モデルの更新がサンプリング周波数を境界値に移動する現象にも遭遇する。これによりパフォーマンスが大幅に低下する可能性がある。本稿では,複数の単変量EDAに対して中性ビットのサンプリング周波数の境界打点時間の最初の鋭い推定値を示す。それぞれの世代で$\lambda$ offspringから$\mu$のベストな個人を選択するUMDAに対して、中立ビットの周波数がミドルレンジ$[\tfrac 14 \tfrac 34]$と0または1に吸収されたときに期待される最初のイテレーションが$\Theta(\mu)$であることを示す。対応するヒットタイムは、仮説上の集団サイズが$K$のcGAに対して$\Theta(K^2)$である。さらに,$\mu$,$\lambda$,$\rho$ のパラメータを持つ pbil に対して,期待値の $\theta(\mu/\rho^2)$ を繰り返すことで,中性ビットのサンプリング周波数が区間 $[\theta(\rho/\mu),1-\theta(\rho/\mu)] を残し,そのビットに対して常に同じ値がサンプリングされ,その周波数が最大速度で対応する境界値に近づくことを証明した。これらのステートメントで暗黙的な下限に対しては、指数的テール境界も示される。ビットが中性ではなく、中性である場合、あるいはそれを好む場合、低周波値に達するための時間上の下限は依然として保持される。類似のステートメントは、中立あるいは0の値を好むビットに対して成り立つ。 Estimation of Distribution Algorithms (EDAs) are one branch of Evolutionary Algorithms (EAs) in the broad sense that they evolve a probabilistic model instead of a population. Many existing algorithms fall into this category. Analogous to genetic drift in EAs, EDAs also encounter the phenomenon that updates of the probabilistic model not justified by the fitness move the sampling frequencies to the boundary values. This can result in a considerable performance loss. This paper proves the first sharp estimates of the boundary hitting time of the sampling frequency of a neutral bit for several univariate EDAs. For the UMDA that selects $\mu$ best individuals from $\lambda$ offspring each generation, we prove that the expected first iteration when the frequency of the neutral bit leaves the middle range $[\tfrac 14, \tfrac 34]$ and the expected first time it is absorbed in 0 or 1 are both $\Theta(\mu)$. The corresponding hitting times are $\Theta(K^2)$ for the cGA with hypothetical population size $K$. 
This paper further proves that for PBIL with parameters $\mu$, $\lambda$, and $\rho$, in an expected number of $\Theta(\mu/\rho^2)$ iterations the sampling frequency of a neutral bit leaves the interval $[\Theta(\rho/\mu),1-\Theta(\rho/\mu)]$, after which the same value is always sampled for this bit, that is, the frequency approaches the corresponding boundary value with maximum speed. For the lower bounds implicit in these statements, we also show exponential tail bounds. If a bit is not necessarily neutral, but is either neutral or has a preference for ones, then the lower bounds on the times to reach a low frequency value still hold. An analogous statement holds for bits that are neutral or prefer the value zero.	翻訳日:2023-11-30 18:22:43 公開日:2023-11-26
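The drift effect described in this abstract can be illustrated with a small simulation. The sketch below is not code from the paper. It assumes a simplified UMDA without frequency margins in which selection ignores the neutral bit, so the new frequency is the fraction of ones among $\mu$ independent samples drawn with the current frequency. The function names `neutral_bit_exit_time` and `mean_exit_time` are hypothetical.

```python
import random


def neutral_bit_exit_time(mu, rng, low=0.25, high=0.75, max_iters=10**6):
    """Return (iteration, frequency) when the frequency first leaves [low, high].

    Assumption: the bit is neutral, so the mu selected individuals carry
    independent samples of the bit and the update is Binomial(mu, p) / mu.
    """
    p = 0.5  # initial sampling frequency
    for t in range(1, max_iters + 1):
        ones = sum(1 for _ in range(mu) if rng.random() < p)
        p = ones / mu
        if p < low or p > high:
            return t, p
    return max_iters, p


def mean_exit_time(mu, runs=200, seed=0):
    """Average exit time over independent runs with a fixed seed."""
    rng = random.Random(seed)
    return sum(neutral_bit_exit_time(mu, rng)[0] for _ in range(runs)) / runs
```

Calling `mean_exit_time` for a few values of `mu` gives a rough empirical feel for how the exit time grows with the selected population size; the simulation is only illustrative and proves nothing about the $\Theta(\mu)$ bound.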
# 拡張SRVF表現を用いた木状3次元物体の弾性形状解析 Elastic Shape Analysis of Tree-like 3D Objects using Extended SRVF Representation ( http://arxiv.org/abs/2110.08693v4 ) ライセンス: Link先を確認	Guan Wang, Hamid Laga, Anuj Srivastava	(参考訳) 複雑な幾何学的・トポロジカルな変動を示すニューロンや植物木といった詳細な3d生体オブジェクトをどうやって分析できるのか? 本稿では,木のような3次元オブジェクトの形状間の測地変形を表現,比較,計算するための新しい数学的枠組みを開発する。サブツリーの階層構造はこれらのオブジェクトを特徴付ける -- 各サブツリーはメインブランチを持ち、いくつかのサイドブランチが付属している -- 。まず,ユークリッド曲線向けに開発された正方根速度関数(srvf)を木形3dオブジェクトに拡張した新しい表現法を提案する。次に、一方の木の形の物体を他方に変形させるために必要な曲げ、伸展、分岐スライディングを定量化する新しい計量を定義する。 QED(Quotient Euclidean Distance)やTED(Tree Edit Distance)といった現在のメトリクスと比較すると、提案された表現とメトリクスは、枝の完全な弾力性(屈曲と伸張)と位相的変動(分岐死・産出・すべり)を捉えている。 QEDおよびTEDメトリクスのエッジ崩壊とノード分割操作による縮小を完全に回避する。本稿では,ニューロンや植物木などの生物オブジェクト間の測地学の比較,マッチング,計算において,このフレームワークの有用性を示す。このフレームワークは様々な形状分析タスクにも適用できる。 (i)木形3次元物体の対称性解析と対称性二木形3Dオブジェクトの集団の計算概要統計(意味と変動のモード) (iii)そのような集団にパラメトリック確率分布を適合させること。 (iv)推定確率分布からランダムサンプリングにより、新しい木形3dオブジェクトを合成する。 How can one analyze detailed 3D biological objects, such as neurons and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects -- each subtree has the main branch with some side branches attached -- and one needs to match these structures across objects for meaningful comparisons. We propose a novel representation that extends the Square-Root Velocity Function (SRVF), initially developed for Euclidean curves, to tree-shaped 3D objects. We then define a new metric that quantifies the bending, stretching, and branch sliding needed to deform one tree-shaped object into the other. 
Compared to the current metrics, such as the Quotient Euclidean Distance (QED) and the Tree Edit Distance (TED), the proposed representation and metric capture the full elasticity of the branches (i.e., bending and stretching) as well as the topological variations (i.e., branch death/birth and sliding). It completely avoids the shrinkage that results from the edge collapse and node split operations of the QED and TED metrics. We demonstrate the utility of this framework in comparing, matching, and computing geodesics between biological objects such as neurons and botanical trees. The framework is also applied to various shape analysis tasks: (i) symmetry analysis and symmetrization of tree-shaped 3D objects, (ii) computing summary statistics (means and modes of variations) of populations of tree-shaped 3D objects, (iii) fitting parametric probability distributions to such populations, and (iv) finally synthesizing novel tree-shaped 3D objects through random sampling from estimated probability distributions.	翻訳日:2023-11-30 18:17:54 公開日:2023-11-26
# 複雑性理論のヒントで正しいアルゴリズムを選ぶ Choosing the Right Algorithm With Hints From Complexity Theory ( http://arxiv.org/abs/2109.06584v2 ) ライセンス: Link先を確認	Shouda Wang and Weijie Zheng and Benjamin Doerr	(参考訳) 異なる探索ヒューリスティックのミリアードから適切なアルゴリズムを選択することは、新しい最適化問題に直面すると困難である。本研究では,ブラックボックスオプティマイザの幅広いクラスにおいて,どのようなアルゴリズムが最良かという純粋に学術的な疑問は,適切な最適化ヒューリスティックを探索する方向を示す実りある指標を与えることができると論じる。最近提案されたdlbベンチマークでこのアプローチを実証し、既知の結果はいくつかの古典的な進化アルゴリズムの$o(n^3)$ランタイムと、推定分布アルゴリズムの$o(n^2 \log n)$ランタイムのみである。単項ブラックボックスの複雑性が$O(n^2)$であることは、メトロポリスアルゴリズムを興味深い候補として提案し、二次時間でDLB問題を解くことを証明した。我々はまた、より良いランタイムが偏りのないアルゴリズムのクラスでは得られないことを証明するので、より多くの親の情報を使って新しいソリューションを生成するアルゴリズムに注意を移す。このタイプの人工アルゴリズムは、$O(n \log n)$ランタイムを持つので、意味に基づくコンパクトな遺伝的アルゴリズム(sig-cGA)は、高い確率で$O(n \log n)$の時間でもDLB問題を解くことができる。我々の実験はメトロポリスのアルゴリズムの優れた性能を示しており、明らかに妥当な問題サイズとみなす全てのアルゴリズムの中で最高のものである。 Choosing a suitable algorithm from the myriads of different search heuristics is difficult when faced with a novel optimization problem. In this work, we argue that the purely academic question of what could be the best possible algorithm in a certain broad class of black-box optimizers can give fruitful indications in which direction to search for good established optimization heuristics. We demonstrate this approach on the recently proposed DLB benchmark, for which the only known results are $O(n^3)$ runtimes for several classic evolutionary algorithms and an $O(n^2 \log n)$ runtime for an estimation-of-distribution algorithm. Our finding that the unary unbiased black-box complexity is only $O(n^2)$ suggests the Metropolis algorithm as an interesting candidate and we prove that it solves the DLB problem in quadratic time. Since we also prove that better runtimes cannot be obtained in the class of unary unbiased algorithms, we shift our attention to algorithms that use the information of more parents to generate new solutions. 
An artificial algorithm of this type having an $O(n \log n)$ runtime leads to the result that the significance-based compact genetic algorithm (sig-cGA) can solve the DLB problem also in time $O(n \log n)$ with high probability. Our experiments show a remarkably good performance of the Metropolis algorithm, clearly the best of all algorithms regarded for reasonable problem sizes.	翻訳日:2023-11-30 18:17:23 公開日:2023-11-26
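To make the benchmark and the Metropolis algorithm mentioned in this abstract concrete, here is a minimal sketch. It assumes the usual definition of DLB (deceptive leading blocks) on bit strings of even length: with $m$ leading `11` blocks, the value is $n$ if all blocks are `11`, otherwise $2m+1$ if the next block is `00` and $2m$ if it is `01` or `10`. It also assumes the textbook Metropolis acceptance rule with probability $\alpha^{f(y)-f(x)}$ for worsening moves. Both should be checked against the paper; the names `dlb` and `metropolis` and the parameter `alpha` are illustrative.

```python
import random


def dlb(x):
    """Deceptive Leading Blocks value of a bit list x of even length (assumed definition)."""
    n = len(x)
    for i in range(0, n, 2):
        block = (x[i], x[i + 1])
        if block == (1, 1):
            continue
        # i equals twice the number of leading 11 blocks;
        # the deceptive block 00 scores one more than 01 or 10.
        return i + 1 if block == (0, 0) else i
    return n


def metropolis(n, alpha, rng, max_iters=200_000):
    """Single-bit-flip Metropolis search on DLB; returns (solution, iterations used)."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = dlb(x)
    for t in range(max_iters):
        if fx == n:
            return x, t
        i = rng.randrange(n)
        x[i] ^= 1  # propose flipping one uniformly chosen bit
        fy = dlb(x)
        # Accept improvements and ties; accept a loss with probability alpha**(fy - fx) < 1.
        if fy >= fx or rng.random() < alpha ** (fy - fx):
            fx = fy
        else:
            x[i] ^= 1  # reject: undo the flip
    return x, max_iters
```

Accepting worsening moves is what lets the search leave a deceptive `00` block: it has to pass through the lower-valued `01` or `10` before it can reach `11`.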
# 量子揺らぎによって引き起こされる見かけの非線形減衰 Apparent nonlinear damping triggered by quantum fluctuations ( http://arxiv.org/abs/2104.06464v2 ) ライセンス: Link先を確認	Mario F. Gely, Adri\'an Sanz Mora, Shun Yanai, Rik van der Spek, Daniel Bothner, Gary A. Steele	(参考訳) 非線形減衰、すなわち振動振幅による減衰率の変化は、多くの電気的、機械的、生物学的振動子において重要な役割を果たす。カーボンナノチューブ、グラフェン膜、超伝導共振器などの新しい技術では、非線形減衰の起源はよく分かっていない。これは、減衰速度が極めて精密なセンサーや量子コンピュータへのこれらのシステムの適用におけるメリットの鍵となるため、問題である。超伝導共振器の測定により、量子揺らぎの相互作用とジョセフソン接合の非線形性から、非線形減衰によく似た共振器応答のパワー依存性が現れることを示す。この現象は位相空間における準確率の流れを通して理解され、可視化することができる。量子ゆらぎやその他のノイズ源は、ナノメカニカル振動子やマクロシステムのような同様の保存的な非線形性を持つ系において見かけの非線形減衰を引き起こすことを期待する。 Nonlinear damping, the change in damping rate with the amplitude of oscillations, plays an important role in many electrical, mechanical and even biological oscillators. In novel technologies such as carbon nanotubes, graphene membranes or superconducting resonators, the origin of nonlinear damping is sometimes unclear. This presents a problem, as the damping rate is a key figure of merit in the application of these systems to extremely precise sensors or quantum computers. Through measurements of a superconducting resonator, we show that from the interplay of quantum fluctuations and the nonlinearity of a Josephson junction emerges a power-dependence in the resonator response which closely resembles nonlinear damping. The phenomenon can be understood and visualized through the flow of quasi-probability in phase space where it reveals itself as dephasing. Crucially, the effect is not restricted to superconducting circuits: we expect that quantum fluctuations or other sources of noise give rise to apparent nonlinear damping in systems with a similar conservative nonlinearity, such as nano-mechanical oscillators or even macroscopic systems.	翻訳日:2023-11-30 18:15:20 公開日:2023-11-26
# 敵対的バンディットと非定常多腕バンディットの橋渡し Bridging Adversarial and Nonstationary Multi-armed Bandit ( http://arxiv.org/abs/2201.01628v3 ) ライセンス: Link先を確認	Ningyuan Chen, Shuoguang Yang, Hailun Zhang	(参考訳) 多腕バンディットの枠組みでは、時変報酬分布を扱うために一般的に使われる2つの定式化がある: 敵対的バンディットと非定常バンディットである。両者のオラクル、アルゴリズム、リグレット解析は大きく異なるが、本論文では、この2つを特殊ケースとして滑らかに橋渡しする統一的な定式化を示す。この定式化は、時間ウィンドウ内で最良の固定アームを選ぶオラクルを使用する。ウィンドウサイズに応じて、敵対的バンディットにおける後知恵のオラクル、あるいは非定常バンディットにおける動的オラクルになる。我々は、一致する下界を伴う最適なリグレットを達成するアルゴリズムを提供する。 In the multi-armed bandit framework, there are two formulations that are commonly employed to handle time-varying reward distributions: adversarial bandit and nonstationary bandit. Although their oracles, algorithms, and regret analysis differ significantly, we provide a unified formulation in this paper that smoothly bridges the two as special cases. The formulation uses an oracle that takes the best fixed arm within time windows. Depending on the window size, it turns into the oracle in hindsight in the adversarial bandit and dynamic oracle in the nonstationary bandit. We provide algorithms that attain the optimal regret with a matching lower bound.	翻訳日:2023-11-30 18:05:43 公開日:2023-11-26
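The windowed oracle in this abstract can be sketched in a few lines. The code below is an illustration, not the paper's definition. It assumes consecutive, non-overlapping windows of equal length and a full reward table `rewards[t][k]` known in hindsight. The function name `windowed_oracle_reward` is hypothetical.

```python
def windowed_oracle_reward(rewards, window):
    """Total reward of an oracle that plays the best fixed arm within each window.

    rewards[t][k] is the reward of arm k at round t (assumed known in hindsight).
    Windows are consecutive and non-overlapping; the last one may be shorter.
    """
    total = 0.0
    horizon = len(rewards)
    for start in range(0, horizon, window):
        chunk = rewards[start:start + window]
        num_arms = len(chunk[0])
        # Best single arm for this window, judged by its summed reward.
        total += max(sum(row[k] for row in chunk) for k in range(num_arms))
    return total
```

Under these assumptions, setting `window` to the full horizon gives the best fixed arm in hindsight (the adversarial benchmark), while `window=1` gives the dynamic oracle that picks the best arm in every round.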
# 有効信頼度推定による半教師付きサルエント物体検出 Semi-supervised Salient Object Detection with Effective Confidence Estimation ( http://arxiv.org/abs/2112.14019v2 ) ライセンス: Link先を確認	Jiawei Liu, Jing Zhang, Nick Barnes	(参考訳) 既存の顕著物体検出モデルの成功は、大きなピクセル単位でラベル付けされたトレーニングデータセットに依存している。我々は,少数のラベル付きサンプルと多数のラベルなしサンプルにアクセス可能な半教師付きサルエント物体検出について検討した。具体的には,条件付エネルギーベースモデルを用いた擬似ラベル学習フレームワークを提案する。条件付エネルギーベースモデルの確率的潜在変数を用いて,人間のサリエンシーラベルの確率的性質をモデル化する。さらに、未ラベルサンプルに対して生成された対応する擬似ラベルの信頼性を強調して、高品質な画素単位の不確かさマップを作成することができる。これにより、モデル最適化における低確かさの擬似ラベルの寄与を最小化し、エラーの伝播を防止できる。実験の結果,提案手法はラベルなしデータの寄与を効果的に探究できることがわかった。ラベル付きサンプルは1/16に過ぎず,最先端の完全教師付きモデルと比較して競争性能が向上する。 The success of existing salient object detection models relies on a large pixel-wise labeled training dataset, which is time-consuming and expensive to obtain. We study semi-supervised salient object detection, with access to a small number of labeled samples and a large number of unlabeled samples. Specifically, we present a pseudo label based learning framework with a Conditional Energy-based Model. We model the stochastic nature of human saliency labels using the stochastic latent variable of the Conditional Energy-based Model. It further enables generation of a high-quality pixel-wise uncertainty map, highlighting the reliability of the corresponding pseudo label generated for the unlabeled sample. This minimises the contribution of low-certainty pseudo labels in optimising the model, preventing error propagation. Experimental results show that the proposed strategy can effectively explore the contribution of unlabeled data. With only 1/16 labeled samples, our model achieves competitive performance compared with state-of-the-art fully-supervised models.	翻訳日:2023-11-30 18:05:32 公開日:2023-11-26
# 残留型物理インフォームド・トランスファー・ラーニング:深層学習による長期cfdシミュレーションの高速化 Residual-based physics-informed transfer learning: A hybrid method for accelerating long-term CFD simulations via deep learning ( http://arxiv.org/abs/2206.06817v3 ) ライセンス: Link先を確認	Joongoo Jeon, Juhyeong Lee, Ricardo Vinuesa, Sung Joong Kim	(参考訳) 人工知能(AI)の大きな波が計算流体力学(CFD)の加速研究の分野に伝播している一方で、最近の研究は、次の目標を再現するAI技術の開発が主要な課題であり、(1)長期CFDシミュレーションにおける未確認(将来の)時系列の正確な予測(2)シミュレーションの加速(3)複数のPDE条件下で許容されるトレーニングデータと時間(4)の量を予測することを強調している。本研究では、ML-CFDハイブリッド計算を用いて、これらの4つの目的を達成するための残差に基づく物理情報伝達学習(RePIT)戦略を提案する。我々の仮説は、CFDとAIが第1原理の残差を監視しながら時系列を交互に計算するハイブリッド手法により、長期CFDシミュレーションが実現可能であるというものである。自然対流のCFDケーススタディによりRePIT戦略の有効性を検証した。単一のトレーニングアプローチでは、残留スケールの変化が100回程度発生し、予測された時系列が非物理的パターンを示し、また基底の真実からかなりのずれが生じた。逆にRePITの戦略は、決定範囲内の残差を維持し、シミュレーション期間全体を通して良好な精度を示した。地上の真理からの最大誤差は、温度0.4K未満、速度0.024m/sである。さらに,ML-GPUとCFD-CPUの計算時間の平均は0.171秒,0.015秒であった。パラメータアップ時間を含めると、シミュレーションは1.9倍に加速された。結論として、我々のRePIT戦略は、業界におけるCFDシミュレーションのコストを削減するための有望な手法である。しかし、より活発な最適化と改善研究が必要である。 While a big wave of artificial intelligence (AI) has propagated to the field of computational fluid dynamics (CFD) acceleration studies, recent research has highlighted that the development of AI techniques that reconciles the following goals remains our primary task: (1) accurate prediction of unseen (future) time series in long-term CFD simulations (2) acceleration of simulations (3) an acceptable amount of training data and time (4) within a multiple PDEs condition. In this study, we propose a residual-based physics-informed transfer learning (RePIT) strategy to achieve these four objectives using ML-CFD hybrid computation. Our hypothesis is that long-term CFD simulation is feasible with the hybrid method where CFD and AI alternately calculate time series while monitoring the first principle's residuals. The feasibility of RePIT strategy was verified through a CFD case study on natural convection. 
In a single training approach, a residual scale change occurred around the 100th timestep, resulting in a predicted time series that exhibited non-physical patterns as well as significant deviations from the ground truth. Conversely, the RePIT strategy maintained the residuals within the defined range and demonstrated good accuracy throughout the entire simulation period. The maximum error from the ground truth was below 0.4 K for temperature and 0.024 m/s for x-axis velocity. Furthermore, the average time for 1 timestep by the ML-GPU and CFD-CPU calculations was 0.171 s and 0.015 s, respectively. Including the parameter-updating time, the simulation was accelerated by a factor of 1.9. In conclusion, our RePIT strategy is a promising technique to reduce the cost of CFD simulations in industry. However, more vigorous optimization and improvement studies are still necessary.	翻訳日:2023-11-30 17:56:41 公開日:2023-11-26
# 部分的参加設定における分散非凸問題の計算・通信効率化手法 A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting ( http://arxiv.org/abs/2205.15580v3 ) ライセンス: Link先を確認	Alexander Tyurin, Peter Richt\'arik	(参考訳) 本稿では,分散最適化と連合学習の3つの重要な要素,確率的勾配の分散低減,部分的参加,圧縮通信について述べる。本手法は, 部分参加環境において, 最適オラクル複雑性と最先端通信複雑性を有することを示す。通信圧縮機能にかかわらず,本手法は分散の低減と部分的参加をうまく組み合わせる:最適なオラクル複雑性を得る,全てのノードの参加を必要としない,有界勾配(異性性)の仮定を必要としない。 We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Regardless of the communication compression feature, our method successfully combines variance reduction and partial participation: we get the optimal oracle complexity, never need the participation of all nodes, and do not require the bounded gradients (dissimilarity) assumption.	翻訳日:2023-11-30 17:54:06 公開日:2023-11-26
# 深層学習モデルの関数型ニューラルネットワークの解析:関数型望遠鏡仮説 Analysis of functional neural codes of deep learning models: Functional Telescope Hypothesis ( http://arxiv.org/abs/2205.10952v3 ) ライセンス: Link先を確認	Jung Hoon Lee and Sujith Vijayan	(参考訳) ディープラーニング(DL)エージェントであるディープニューラルネットワーク(DNN)は、大量の並列/シーケンス操作を必要とする。これにより、DNNの動作を理解することが難しく、適切な診断を妨げる。内部プロセスに関するより詳しい知識がなければ、DNNを高い領域にデプロイすることは破滅的な失敗につながる可能性がある。したがって、より信頼性の高いDNN/DLを現実世界の高精細な問題に展開するためには、DNNの内部動作に関する洞察を得ることが不可欠である。本稿では、DNNの意思決定に関連するDLモデルの内部コードの解析に自己組織化マップ(SOM)を用いる。分析の結果,入力層近傍の浅層は特徴を凝縮空間に圧縮し,出力層近傍の深層は特徴空間を広げることが示唆された。また, 圧縮された特徴がDNNの障害を負う可能性を示唆する証拠も発見された。 Deep neural networks (DNNs), the agents of deep learning (DL), require a massive number of parallel/sequential operations. This makes it difficult to comprehend DNNs' operations and impedes proper diagnosis. Without better knowledge of their internal process, deploying DNNs in high-stakes domains can lead to catastrophic failures. Therefore, to build more reliable DNNs/DL to be deployed in high-stakes real-world problems, it is imperative that we gain insights into DNNs' internal operations underlying their decision-making. Here, we use the self-organizing map (SOM) to analyze DL models' internal codes associated with DNNs' decision-making. Our analyses suggest that shallow layers close to the input layer compress features into condensed space and that deep layers close to the output layer expand feature space. We also found evidence indicating that compressed features may underlie DNNs' vulnerabilities to adversarial perturbations.	翻訳日:2023-11-30 17:53:16 公開日:2023-11-26
# 2次情報を用いたモーメントベース政策グラディエント Momentum-Based Policy Gradient with Second-Order Information ( http://arxiv.org/abs/2205.08253v3 ) ライセンス: Link先を確認	Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran	(参考訳) 近年の強化学習において, 政策勾配法における分散低減勾配推定器は, 評価過程の加速を許容する主要な研究の焦点となっている。本稿では,時間変化学習率のモメンタムを用いて,2次情報を確率勾配降下(SGD)に組み込んだ分散低減方策勾配法SHARPを提案する。 SHARPアルゴリズムはパラメータフリーで$\epsilon$-approximate 1次定常点を$O(\epsilon^{-3})$の軌道数で達成し、各イテレーションで$O(1)$のバッチサイズを使用する。従来の研究と異なり,提案アルゴリズムでは,分散低減プロセスの利点を損なうような重要サンプリングを必要としない。さらに、推定誤差の分散は$O(1/t^{2/3})$の速さで減衰し、$t$は反復数である。提案手法が様々な制御課題に対して有効であることを示すとともに,実用上の最先端手法に対する優位性を示す。 Variance-reduced gradient estimators for policy gradient methods have been one of the main focuses of research in reinforcement learning in recent years, as they allow acceleration of the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. The SHARP algorithm is parameter-free, achieving an $\epsilon$-approximate first-order stationary point with $O(\epsilon^{-3})$ trajectories, while using a batch size of $O(1)$ at each iteration. Unlike most previous work, our proposed algorithm does not require importance sampling, which can compromise the advantage of the variance reduction process. Moreover, the variance of the estimation error decays at the fast rate of $O(1/t^{2/3})$, where $t$ is the number of iterations. Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.	翻訳日:2023-11-30 17:52:39 公開日:2023-11-26
# 自己整合性制約によるブートストラップ動作予測 Bootstrap Motion Forecasting With Self-Consistent Constraints ( http://arxiv.org/abs/2204.05859v4 ) ライセンス: Link先を確認	Maosheng Ye, Jiamiao Xu, Xunnong Xu, Tengfei Wang, Tongyi Cao, Qifeng Chen	(参考訳) 自己整合性制約(MISC)を用いた動き予測をブートストラップする新しいフレームワークを提案する。運動予測タスクは、過去の空間的・時間的情報を組み込むことで、車両の将来の軌跡を予測することを目的としている。 miscの鍵となる設計は、トレーニング中の空間的および時間的摂動の下で予測された軌道を規則化する双対一貫性制約である。また,運動予測におけるマルチモダリティをモデル化するために,教師のターゲットを正確に把握し,マルチモダリティを監督する新しいセルフセンシングスキームを設計する。複数の教師の目標からの明示的な制約を伴って,予測性能の明確な改善を観察する。 argoverse motion forecasting benchmarkとwaymo open motion datasetに関する広範な実験は、miscが最先端の手法を大きく上回っていることを示している。提案手法は一般的な手法であり,他の動き予測手法に容易に組み込むことができるため,提案手法は既存手法の予測性能を一貫して改善することを示す。 We present a novel framework to bootstrap Motion forecasting with Self-consistent Constraints (MISC). The motion forecasting task aims at predicting future trajectories of vehicles by incorporating spatial and temporal information from the past. A key design of MISC is the proposed Dual Consistency Constraints that regularize the predicted trajectories under spatial and temporal perturbation during training. Also, to model the multi-modality in motion forecasting, we design a novel self-ensembling scheme to obtain accurate teacher targets to enforce the self-constraints with multi-modality supervision. With explicit constraints from multiple teacher targets, we observe a clear improvement in the prediction performance. Extensive experiments on the Argoverse motion forecasting benchmark and Waymo Open Motion dataset show that MISC significantly outperforms the state-of-the-art methods. As the proposed strategies are general and can be easily incorporated into other motion forecasting approaches, we also demonstrate that our proposed scheme consistently improves the prediction performance of several existing methods.	翻訳日:2023-11-30 17:51:56 公開日:2023-11-26
# テンソル分解のためのニアリンアー時間と固定パラメータトラクタブルアルゴリズム Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor Decompositions ( http://arxiv.org/abs/2207.07417v3 ) ライセンス: Link先を確認	Arvind V. Mahankali, David P. Woodruff, Ziyu Zhang	(参考訳) 我々はテンソルの低位近似について研究し、テンソルトレインとタッカー分解、およびツリーテンソルネットワークとより一般的なテンソルネットワークとの近似に焦点を当てた。テンソルトレインの分解に対して、小さなビクリテリアランクを持つビクリテリア$(1 + \eps)$-approximationアルゴリズムと、低次項までのランニング時間を持つ$O(q \cdot \nnz(A))$を与え、これは \cite{huber2017randomized} の加算誤差アルゴリズムよりも改善する。 \cite{huber2017randomized} のアルゴリズムを相対誤差アルゴリズムに変換する方法を示すが、それらのアルゴリズムは、bicriteria ランク $r$ を持つ $(1 + \eps)$近似アルゴリズムに変換するとき、必ず $O(qr^2 \cdot \nnz(A)) + n \cdot \poly(qk/\eps)$ の計算時間を持つ。我々の知る限り、テンソル列車分解に対する多項式時間相対誤差近似を初めて達成した研究である。我々の鍵となる手法は、$q$のテンソルのテンソル列の平坦化である行列に対して、$q$の行数多項式を持つ部分空間埋め込みを得る方法である。我々はアルゴリズムをツリーテンソルネットワークに拡張する。さらに、このアルゴリズムを任意のグラフを持つテンソルネットワーク(一般テンソルネットワークと呼ぶ)に拡張し、 \cite{ms08_simulating_quantum_tensor_contraction} の結果を用いて、ランク$k$の一般的なテンソルネットワークをランク$k^{O(\deg(G)\tw(G))}$のバイナリツリーネットワークに縮約できることを示し、ツリーテンソルネットワークの場合の削減を可能にした。最後に、テンソルトレイン、タッカー、CP分解に対して、多項式系解法を使用しないため、より単純である新しい固定パラメータ扱い可能なアルゴリズムを与える。ちょうど$k$行のガウス部分空間埋め込みの技法(つまり指数関数的に小さい成功確率)は独立な興味を持つ。 We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions, as well as approximations with tree tensor networks and more general tensor networks. For tensor train decomposition, we give a bicriteria $(1 + \eps)$-approximation algorithm with a small bicriteria rank and $O(q \cdot \nnz(A))$ running time, up to lower order terms, which improves over the additive error algorithm of \cite{huber2017randomized}. We also show how to convert the algorithm of \cite{huber2017randomized} into a relative error algorithm, but their algorithm necessarily has a running time of $O(qr^2 \cdot \nnz(A)) + n \cdot \poly(qk/\eps)$ when converted to a $(1 + \eps)$-approximation algorithm with bicriteria rank $r$. 
To the best of our knowledge, our work is the first to achieve polynomial time relative error approximation for tensor train decomposition. Our key technique is a method for obtaining subspace embeddings with a number of rows polynomial in $q$ for a matrix which is the flattening of a tensor train of $q$ tensors. We extend our algorithm to tree tensor networks. In addition, we extend our algorithm to tensor networks with arbitrary graphs (which we refer to as general tensor networks), by using a result of \cite{ms08_simulating_quantum_tensor_contraction} and showing that a general tensor network of rank $k$ can be contracted to a binary tree network of rank $k^{O(\deg(G)\tw(G))}$, allowing us to reduce to the case of tree tensor networks. Finally, we give new fixed-parameter tractable algorithms for the tensor train, Tucker, and CP decompositions, which are simpler than those of \cite{swz19_tensor_low_rank} since they do not make use of polynomial system solvers. Our technique of Gaussian subspace embeddings with exactly $k$ rows (and thus exponentially small success probability) may be of independent interest.	翻訳日:2023-11-30 17:41:32 公開日:2023-11-26
# 量子確率過程からの予測的作業抽出のためのエンジン Engines for predictive work extraction from memoryful quantum stochastic processes ( http://arxiv.org/abs/2207.03480v3 ) ライセンス: Link先を確認	Ruo Cheng Huang, Paul M. Riechers, Mile Gu, and Varun Narasimhachar	(参考訳) 量子情報処理技術は、古典的な自由エネルギーに加えて、システムの本質的に量子的な特徴から仕事の抽出を可能にする。一方、計算力学の科学は、非マルコフ古典および量子確率過程の予測モデリングのためのツールを与える。これら2つの科学のツールを組み合わせて、量子出力を持つ非マルコフ確率過程から予測作業を抽出する手法を開発した。提案手法は,非予測的な量子ワーク抽出プロトコルよりも多くの作業を抽出することができ,また,量子情報処理を伴わない予測作業抽出が可能であることを実証する。古典的前例のない量子プロセスからの作業抽出において,メモリの有効性において相転移が認められる。我々の研究は、基本的に量子的、本質的に時間的に変化する形で環境自由エネルギーを利用する機械の展望を開放する。 Quantum information-processing techniques enable work extraction from a system's inherently quantum features, in addition to the classical free energy it contains. Meanwhile, the science of computational mechanics affords tools for the predictive modeling of non-Markovian classical and quantum stochastic processes. We combine tools from these two sciences to develop a technique for predictive work extraction from non-Markovian stochastic processes with quantum outputs. We demonstrate that this technique can extract more work than non-predictive quantum work extraction protocols, on one hand, and predictive work extraction without quantum information processing, on the other. We discover a phase transition in the efficacy of memory for work extraction from quantum processes, which is without classical precedent. Our work opens up the prospect of machines that harness environmental free energy in an essentially quantum, essentially time-varying form.	翻訳日:2023-11-30 17:40:43 公開日:2023-11-26
# PlanBench: 変更計画と推論に関する大規模言語モデル評価のための拡張可能なベンチマーク PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change ( http://arxiv.org/abs/2206.10498v4 ) ライセンス: Link先を確認	Karthik Valmeekam, Matthew Marquez, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati	(参考訳) 行動計画の作成と変化の推論は、長年、知的エージェントの中核的能力と見なされてきた。したがって、大規模言語モデル(LLM)の計画と推論能力を評価することが、研究のホットなトピックになっていることは驚くにあたらない。しかし、llm計画能力に関するほとんどの主張は、llmが計画しているのか、単に広大な世界の知識から取得しているだけなのかを知ることが難しい、常識的なタスクに基づいている。 LLMが本質的に計画能力を持っているかどうかを評価するのに十分な多様性を持つ体系的で拡張可能な計画ベンチマークが必要である。そこで本研究では,自動計画コミュニティ,特に国際計画コンペティションで使用されるドメインの種類に基づいた拡張可能なベンチマークスイートであるPlanBenchを提案する。 PlanBenchはタスクドメインと特定の計画機能の両方に十分な多様性を提供します。また本研究では,SOTAモデルにおいても,計画生成-LLM性能を含む多くの重要な機能について非常に短い結果が得られた。したがって、プランベンチは計画と推論におけるLLMの進歩の有用なマーカーとして機能する。 Generating plans of action, and reasoning about change have long been considered a core competence of intelligent agents. It is thus no surprise that evaluating the planning and reasoning capabilities of large language models (LLMs) has become a hot topic of research. Most claims about LLM planning capabilities are however based on common sense tasks-where it becomes hard to tell whether LLMs are planning or merely retrieving from their vast world knowledge. There is a strong need for systematic and extensible planning benchmarks with sufficient diversity to evaluate whether LLMs have innate planning capabilities. Motivated by this, we propose PlanBench, an extensible benchmark suite based on the kinds of domains used in the automated planning community, especially in the International Planning Competition, to test the capabilities of LLMs in planning or reasoning about actions and change. PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities. Our studies also show that on many critical capabilities-including plan generation-LLM performance falls quite short, even with the SOTA models. 
PlanBench can thus function as a useful marker of progress of LLMs in planning and reasoning.	翻訳日:2023-11-30 17:39:28 公開日:2023-11-26
# 拡張不変マニフォールド学習 Augmentation Invariant Manifold Learning ( http://arxiv.org/abs/2211.00460v2 ) ライセンス: Link先を確認	Shulei Wang	(参考訳) データ拡張は、近年の自己教師型表現学習の進歩において、広く使われている技法であり、重要な要素である。拡張データ間の類似性を維持することにより、結果として得られるデータ表現は、様々な下流解析を改善し、多くのアプリケーションで最先端のパフォーマンスを達成することができる。経験的効果にもかかわらず、既存のほとんどの手法は一般的な非線形条件下での理論的な理解を欠いている。このギャップを埋めるために、データ拡張変換をモデル化する低次元積多様体上の統計フレームワークを開発する。本フレームワークでは,拡張不変多様体学習と呼ばれる新しい表現学習手法を導入し,確率的最適化問題として再構成して計算効率の高いアルゴリズムを設計する。従来の自己教師付き手法と比較して、新しい手法は多様体の幾何構造と拡張データの不変性を同時に利用し、明確な理論的保証を有する。提案手法におけるデータ拡張の役割を考察し,より複雑なデータ拡張が下流分析の改善につながることを示すために,下流解析において拡張データから得られたデータ表現が$k$-nearest 隣の分類器を改善する方法と方法を明らかにする。最後に,シミュレーションおよび実データを用いた数値実験を行い,提案手法の有効性を示す。 Data augmentation is a widely used technique and an essential ingredient in the recent advance in self-supervised representation learning. By preserving the similarity between augmented data, the resulting data representation can improve various downstream analyses and achieve state-of-the-art performance in many applications. Despite the empirical effectiveness, most existing methods lack theoretical understanding under a general nonlinear setting. To fill this gap, we develop a statistical framework on a low-dimension product manifold to model the data augmentation transformation. Under this framework, we introduce a new representation learning method called augmentation invariant manifold learning and design a computationally efficient algorithm by reformulating it as a stochastic optimization problem. Compared with existing self-supervised methods, the new method simultaneously exploits the manifold's geometric structure and invariant property of augmented data and has an explicit theoretical guarantee. 
Our theoretical investigation characterizes the role of data augmentation in the proposed method and reveals why and how the data representation learned from augmented data can improve the $k$-nearest neighbor classifier in the downstream analysis, showing that a more complex data augmentation leads to more improvement in downstream analysis. Finally, numerical experiments on simulated and real datasets are presented to demonstrate the merit of the proposed method.	翻訳日:2023-11-30 17:33:01 公開日:2023-11-26
# 量子臨界における創発的連続対称性の検出 Detecting emergent continuous symmetries at quantum criticality ( http://arxiv.org/abs/2210.17539v4 ) ライセンス: Link先を確認	Mingru Yang, Bram Vanhecke, Norbert Schuch	(参考訳) 新しくあるいは拡大された対称性は、ハミルトン群の非正規化群フローにおいて対称性の破れ項が無関係である場合、対称性を持たないハミルトニアンの低エネルギースペクトルに現れる。本稿では,量子スピンチェーンの基底状態から創発的保存電流の格子作用素近似を数値的に抽出するテンソルネットワークに基づくアルゴリズムを提案する。スピン-1/2$J$-$Q$Heisenberg 連鎖と分解量子臨界点 (DQCP) の1次元バージョンに対する我々の結果は、創発格子 Kac-Moody 生成器を得るための方法の力を示している。これはまた、可積分モデルの局所的な運動積分と臨界ギャップのない基底状態の局所親ハミルトニアンを見つける方法として見ることもできる。 New or enlarged symmetries can emerge at the low-energy spectrum of a Hamiltonian that does not possess the symmetries, if the symmetry breaking terms in the Hamiltonian are irrelevant under the renormalization group flow. In this letter, we propose a tensor network based algorithm to numerically extract lattice operator approximation of the emergent conserved currents from the ground state of any quantum spin chains, without the necessity to have prior knowledge about its low-energy effective field theory. Our results for the spin-1/2 $J$-$Q$ Heisenberg chain and a one-dimensional version of the deconfined quantum critical points (DQCP) demonstrate the power of our method to obtain the emergent lattice Kac-Moody generators. It can also be viewed as a way to find the local integrals of motion of an integrable model and the local parent Hamiltonian of a critical gapless ground state.	翻訳日:2023-11-30 17:32:41 公開日:2023-11-26
# G-PECNet: 一般化可能な歩行者軌道予測システムを目指して G-PECNet: Towards a Generalizable Pedestrian Trajectory Prediction System ( http://arxiv.org/abs/2210.09846v2 ) ライセンス: Link先を確認	Aryan Garg, Renu M. Rameshan	(参考訳) 人的資産を妨害したり損傷させたりすることなく、ダイナミックな物理的環境をナビゲートすることは、社会ロボットにとって極めて重要である。本研究では,自律型ドローンナビゲーションのサブ課題である,ディープジェネレーティブモデルを用いて,ドメイン外人間およびエージェントの軌道の予測を行う。提案手法 General-PECNet (G-PECNet) は,2020年のベンチマークである PECNet に対して, 周期的アクティベーション関数にインスパイアされたアーキテクチャ改善と, 隠れマルコフモデル(HMM)と強化学習(RL)を用いた合成軌道(データ)拡張を併用して, 最終変位誤差(FDE)の9.5\%の改善を観測する。さらに,軌道の非線形性および外乱検出のための簡易な幾何学的インスピレーション付き計量法を提案する。コードは$\href{https://github.com/aryan-garg/pecnet-pedestrian-trajectory-prediction.git}{github}$で入手できる。 Navigating dynamic physical environments without obstructing or damaging human assets is of quintessential importance for social robots. In this work, we solve autonomous drone navigation's sub-problem of predicting out-of-domain human and agent trajectories using a deep generative model. Our method, General-PECNet (G-PECNet), achieves an improvement of 9.5\% in Final Displacement Error (FDE) over the 2020 benchmark PECNet through a combination of architectural improvements inspired by periodic activation functions and synthetic trajectory (data) augmentations using Hidden Markov Models (HMMs) and Reinforcement Learning (RL). Additionally, we propose a simple geometry-inspired metric for trajectory non-linearity and outlier detection, helpful for the task. Code available at $\href{https://github.com/Aryan-Garg/PECNet-Pedestrian-Trajectory-Prediction.git}{GitHub}$	翻訳日:2023-11-30 17:32:08 公開日:2023-11-26
# MA-RECON: 高速MRIk空間補間のためのマスク対応ディープニューラルネットワーク MA-RECON: Mask-aware deep-neural-network for robust fast MRI k-space interpolation ( http://arxiv.org/abs/2209.00462v2 ) ライセンス: Link先を確認	Nitzan Avidan and Moti Freiman	(参考訳) フーリエ領域にあるアンダーサンプリングされた「k空間」データからのMRI画像の高品質な再構成は、MRI取得時間を短縮し、時間分解能の優れた確保に不可欠である。近年,このプロセスに関連した複雑で不適切な逆問題に取り組むために,深層ニューラルネットワーク(dnn)手法が数多く登場している。しかし、獲得過程や解剖学的分布の変動に対する不安定さは、これらのDNNアーキテクチャ内の関連物理モデルの一般化に欠如している。本研究の目的は,新しいマスク対応DNNアーキテクチャであるMA-RECONを導入することで,k空間補間のためのDNN手法の一般化能力を向上することである。従来のアプローチとは異なり、MA-RECONアーキテクチャは観測データだけでなく、モデル構造内のアンダーサンプリングマスクも符号化している。様々なアンダーサンプリングマスクで生成されたデータを活用して、アンダーサンプリングされたMRI再構成問題の一般化を刺激する。したがって、関連する逆問題(古典的圧縮センシングアプローチ)を効果的に表現する。我々のMA-RECONアプローチの利点は、広くアクセス可能な高速MRIデータセットによる厳密なテストによって確認された。アンダーサンプリングマスク強化を訓練した標準DNN法とDNNと比較して,本手法は優れた一般化能力を示した。その結果、特に病理疾患のある地域では、獲得過程と解剖学的分布の両方の変化に対するロバスト性が大幅に向上した。結論として,我々のマスク認識戦略は,低サンプリングk空間データからMRI再構成のためのDNNベースの手法の一般化能力と堅牢性を高めることを約束する。 High-quality reconstruction of MRI images from under-sampled `k-space' data, which is in the Fourier domain, is crucial for shortening MRI acquisition times and ensuring superior temporal resolution. Over recent years, a wealth of deep neural network (DNN) methods have emerged, aiming to tackle the complex, ill-posed inverse problem linked to this process. However, their instability against variations in the acquisition process and anatomical distribution exposes a deficiency in the generalization of relevant physical models within these DNN architectures. The goal of our work is to enhance the generalization capabilities of DNN methods for k-space interpolation by introducing `MA-RECON', an innovative mask-aware DNN architecture and associated training method. Unlike preceding approaches, our `MA-RECON' architecture encodes not only the observed data but also the under-sampling mask within the model structure. 
It implements a tailored training approach that leverages data generated with a variety of under-sampling masks to stimulate the model's generalization of the under-sampled MRI reconstruction problem. It thereby effectively represents the associated inverse problem, akin to the classical compressed sensing approach. The benefits of our MA-RECON approach were affirmed through rigorous testing with the widely accessible fastMRI dataset. Compared to standard DNN methods and DNNs trained with under-sampling mask augmentation, our approach demonstrated superior generalization capabilities. This resulted in a considerable improvement in robustness against variations in both the acquisition process and anatomical distribution, especially in regions with pathology. In conclusion, our mask-aware strategy holds promise for enhancing the generalization capacity and robustness of DNN-based methodologies for MRI reconstruction from undersampled k-space data.	Translated: 2023-11-30 17:27:34 Published: 2023-11-26
# InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation ( http://arxiv.org/abs/2212.06373v7 ) License: see link	Guoqing Lv, Jiang Li, Xiaoping Wang, Zhigang Zeng	Current approaches to empathetic response generation typically encode the entire dialogue history directly and put the output into a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect capturing the direct intention of the speaker. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. Consequently, we propose a novel model named InferEM for empathetic response generation. We separately encode the last utterance and fuse it with the entire dialogue through a multi-head-attention-based intention fusion module to capture the speaker's intention. In addition, we use previous utterances to predict the last utterance, which simulates the human tendency to guess in advance what the interlocutor may say. To balance the optimizing rates of the utterance prediction and response generation, a multi-task learning strategy is designed for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.	Translated: 2023-11-30 17:19:37 Published: 2023-11-26
# Rigorous Assessment of Model Inference Accuracy using Language Cardinality ( http://arxiv.org/abs/2211.16587v2 ) License: see link	Donato Clun, Donghwan Shin, Antonio Filieri, Domenico Bianculli	Models such as finite state automata are widely used to abstract the behavior of software systems by capturing the sequences of events observable during their execution. Nevertheless, models rarely exist in practice and, when they do, get easily outdated; moreover, manually building and maintaining models is costly and error-prone. As a result, a variety of model inference methods that automatically construct models from execution traces have been proposed to address these issues. However, performing a systematic and reliable accuracy assessment of inferred models remains an open problem. Even when a reference model is given, most existing model accuracy assessment methods may return misleading and biased results. This is mainly due to their reliance on statistical estimators over a finite number of randomly generated traces, introducing avoidable uncertainty about the estimation and being sensitive to the parameters of the random trace generative process. 
This paper addresses this problem by developing a systematic approach based on analytic combinatorics that minimizes bias and uncertainty in model accuracy assessment by replacing statistical estimation with deterministic accuracy measures. We experimentally demonstrate the consistency and applicability of our approach by assessing the accuracy of models inferred by state-of-the-art inference tools against reference models from established specification mining benchmarks.	Translated: 2023-11-30 17:18:35 Published: 2023-11-26
# Adding Conditional Control to Text-to-Image Diffusion Models ( http://arxiv.org/abs/2302.05543v3 ) License: see link	Lvmin Zhang and Anyi Rao and Maneesh Agrawala	We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, e.g., edges, depth, segmentation, human pose, etc., with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.	Translated: 2023-11-30 17:07:34 Published: 2023-11-26
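The "zero convolution" idea in the abstract above can be illustrated with a toy sketch: when the connecting layer starts with all-zero weight and bias, the control branch contributes nothing, so the combined output initially equals that of the locked model. Everything below (the scalar 1x1 "convolution", the stand-in blocks, the function names) is a simplified assumption for illustration and is not the ControlNet implementation.

```python
def conv1x1(values, weight, bias):
    """Toy single-channel 1x1 convolution: scale and shift each value."""
    return [weight * v + bias for v in values]


def locked_block(values):
    """Stand-in for a frozen pretrained block (hypothetical: doubles input)."""
    return [2.0 * v for v in values]


def trainable_copy(values, condition):
    """Stand-in for the trainable copy, which also receives the condition."""
    return [v + c for v, c in zip(values, condition)]


def controlled_block(values, condition, zero_weight=0.0, zero_bias=0.0):
    """Locked output plus the control branch passed through a zero conv."""
    control = conv1x1(trainable_copy(values, condition), zero_weight, zero_bias)
    return [base + extra for base, extra in zip(locked_block(values), control)]


features = [1.0, -3.0, 0.5]
edge_map = [0.2, 0.9, 0.0]  # hypothetical conditioning signal
at_init = controlled_block(features, edge_map)
after_some_training = controlled_block(features, edge_map, zero_weight=0.1)
```

At initialization `at_init` matches `locked_block(features)`; once the zero-conv weight moves away from zero, the condition starts to influence the output.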
# DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely ( http://arxiv.org/abs/2212.10013v2 ) License: see link	Forrest Sheng Bao, Ruixuan Tu, Ge Luo, Yinfei Yang, Hebi Li, Minghui Qiu, Youbiao He, Cen Chen	Automated summary quality assessment falls into two categories: reference-based and reference-free. Reference-based metrics, historically deemed more accurate due to the additional information provided by human-written references, are limited by their reliance on human input. In this paper, we hypothesize that the comparison methodologies used by some reference-based metrics to evaluate a system summary against its corresponding reference can be effectively adapted to assess it against its source document, thereby transforming these metrics into reference-free ones. Experimental results support this hypothesis. After being repurposed reference-freely, the zero-shot BERTScore using the pretrained DeBERTa-large-MNLI model of <0.5B parameters consistently outperforms its original reference-based version across various aspects on the SummEval and Newsroom datasets. It also excels in comparison to most existing reference-free metrics and closely competes with zero-shot summary evaluators based on GPT-3.5.	Translated: 2023-11-30 17:03:29 Published: 2023-11-26
# Quantum Deep Hedging ( http://arxiv.org/abs/2303.16585v2 ) License: see link	El Amine Cherrat, Snehal Raj, Iordanis Kerenidis, Abhishek Shekhar, Ben Wood, Jon Dee, Shouvanik Chakrabarti, Richard Chen, Dylan Herman, Shaohan Hu, Pierre Minssen, Ruslan Shaydulin, Yue Sun, Romina Yalovetzky, Marco Pistoia	Quantum machine learning has the potential for a transformative impact across industry sectors and in particular in finance. In our work we look at the problem of hedging where deep reinforcement learning offers a powerful framework for real markets. We develop quantum reinforcement learning methods based on policy-search and distributional actor-critic algorithms that use quantum neural network architectures with orthogonal and compound layers for the policy and value functions. We prove that the quantum neural networks we use are trainable, and we perform extensive simulations that show that quantum models can reduce the number of trainable parameters while achieving comparable performance and that the distributional approach obtains better performance than other standard approaches, both classical and quantum. We successfully implement the proposed models on a trapped-ion quantum processor, utilizing circuits with up to 16 qubits, and observe performance that agrees well with noiseless simulation. Our quantum techniques are general and can be applied to other reinforcement learning problems beyond hedging.	Translated: 2023-11-30 16:44:31 Published: 2023-11-26
# Eliciting Latent Predictions from Transformers with the Tuned Lens ( http://arxiv.org/abs/2303.08112v4 ) License: see link	Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt	We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer. To do so, we train an affine probe for each block in a frozen pretrained model, making it possible to decode every hidden state into a distribution over the vocabulary. Our method, the tuned lens, is a refinement of the earlier "logit lens" technique, which yielded useful insights but is often brittle. We test our method on various autoregressive language models with up to 20B parameters, showing it to be more predictive, reliable and unbiased than the logit lens. With causal experiments, we show the tuned lens uses similar features to the model itself. We also find the trajectory of latent predictions can be used to detect malicious inputs with high accuracy. All code needed to reproduce our results can be found at https://github.com/AlignmentResearch/tuned-lens.	Translated: 2023-11-30 16:39:28 Published: 2023-11-26
# Weakly Supervised Detection of Baby Cry ( http://arxiv.org/abs/2304.10001v3 ) License: see link	Weijun Tan, Qi Yao, Jingfeng Liu	Detection of baby cries is an important part of baby monitoring and health care. Almost all existing methods use supervised SVM, CNN, or their variants. In this work, we propose to use weakly supervised anomaly detection to detect a baby cry. Under this weak supervision, we only need a weak annotation of whether there is a cry in an audio file. We design a data mining technique using the pre-trained VGGish feature extractor and an anomaly detection network on long untrimmed audio files. The obtained datasets are used to train a simple CNN feature network for cry/non-cry classification. This CNN is then used as a feature extractor in an anomaly detection framework to achieve better cry detection performance.	Translated: 2023-11-30 16:31:24 Published: 2023-11-26
# Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference ( http://arxiv.org/abs/2304.04947v2 ) License: see link	Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang	We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-weight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter approaches with moderate to no accuracy loss and the same parameter efficiency.	Translated: 2023-11-30 16:30:04 Published: 2023-11-26
# Time dilation of quantum clocks in a Newtonian gravitational field ( http://arxiv.org/abs/2304.04281v3 ) License: see link	Tommaso Favalli and Augusto Smerzi	We consider two non-relativistic quantum clocks interacting with a Newtonian gravitational field produced by a spherical mass. In the framework of the Page and Wootters approach, we derive a time dilation for the time states of the clocks. The delay agrees up to first order with the gravitational time dilation obtained from the Schwarzschild metric. This result can be extended by considering the relativistic gravitational potential: in this case we obtain agreement with the exact Schwarzschild solution.	Translated: 2023-11-30 16:29:50 Published: 2023-11-26
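For readers who want the reference formula behind "agreement up to first order", the standard Schwarzschild result for a static clock at radius $r$ outside a mass $M$ can be expanded as follows. This is textbook general relativity, not a derivation from the paper; the symbols $\tau$ (proper time of the clock), $t$ (time of a distant observer) and $\phi$ (Newtonian potential) are notation chosen here.

```latex
\frac{d\tau}{dt}
  = \sqrt{1 - \frac{2GM}{r c^{2}}}
  = 1 - \frac{GM}{r c^{2}} + \mathcal{O}\!\left(\left(\frac{GM}{r c^{2}}\right)^{2}\right)
  \approx 1 + \frac{\phi(r)}{c^{2}},
\qquad \phi(r) = -\frac{GM}{r}.
```

The first-order term depends only on the Newtonian potential, which is the regime in which a Newtonian treatment can be compared with the Schwarzschild metric.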
# A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms" ( http://arxiv.org/abs/2304.04258v2 ) License: see link	Jiachen T. Wang and Ruoxi Jia	Data valuation is a growing research field that studies the influence of individual data points for machine learning (ML) models. Data Shapley, inspired by cooperative game theory and economics, is an effective method for data valuation. However, it is well-known that the Shapley value (SV) can be computationally expensive. Fortunately, Jia et al. (2019) showed that for K-Nearest Neighbors (KNN) models, the computation of Data Shapley is surprisingly simple and efficient. In this note, we revisit the work of Jia et al. (2019) and propose a more natural and interpretable utility function that better reflects the performance of KNN models. We derive the corresponding calculation procedure for the Data Shapley of KNN classifiers/regressors with the new utility functions. Our new approach, dubbed soft-label KNN-SV, achieves the same time complexity as the original method. We further provide an efficient approximation algorithm for soft-label KNN-SV based on locality sensitive hashing (LSH). 
Our experimental results demonstrate that soft-label KNN-SV outperforms the original method on most datasets in the task of mislabeled data detection, making it a better baseline for future work on data valuation.	Translated: 2023-11-30 16:29:44 Published: 2023-11-26
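To make the "surprisingly simple and efficient" claim concrete, the sketch below implements the closed-form recursion for the original, unweighted KNN utility of Jia et al. (2019), written from memory of that result. It is not the soft-label variant proposed in this note, and the function name and input convention (labels already sorted by distance to one test point) are assumptions for illustration.

```python
from fractions import Fraction


def knn_shapley(sorted_labels, test_label, k):
    """Exact Shapley values for one test point under the unweighted KNN utility.

    sorted_labels: training labels ordered from nearest to farthest.
    Returns one value per training point, in the same order.
    """
    n = len(sorted_labels)
    match = [1 if label == test_label else 0 for label in sorted_labels]
    values = [Fraction(0)] * n
    # The farthest point is handled directly.
    values[n - 1] = Fraction(match[n - 1], n)
    # Walk inward from the farthest point to the nearest one.
    for i in range(n - 2, -1, -1):
        rank = i + 1  # 1-based distance rank of point i
        step = Fraction(match[i] - match[i + 1], k) * Fraction(min(k, rank), rank)
        values[i] = values[i + 1] + step
    return values


# Hypothetical example: three training points, nearest first, K = 2.
shapley = knn_shapley([1, 0, 1], test_label=1, k=2)
```

After the sort, the recursion is a single linear pass. The values sum to the utility of the full training set (here 1/2, since one of the two nearest neighbours matches the test label), which is the efficiency property of the Shapley value.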
# An Embedding-based Approach to Inconsistency-tolerant Reasoning with Inconsistent Ontologies ( http://arxiv.org/abs/2304.01664v2 ) License: see link	Keyu Wang, Site Li, Jiaye Li, Guilin Qi and Qiu Ji	Inconsistency handling is an important issue in knowledge management. Especially in ontology engineering, logical inconsistencies may occur during ontology construction. A natural way to reason with an inconsistent ontology is to utilize the maximal consistent subsets of the ontology. However, previous studies on selecting maximum consistent subsets have rarely considered the semantics of the axioms, which may result in irrational inference. In this paper, we propose a novel approach to reasoning with inconsistent ontologies in description logics based on the embeddings of axioms. We first give a method for turning axioms into distributed semantic vectors to compute the semantic connections between the axioms. We then define an embedding-based method for selecting the maximum consistent subsets and use it to define an inconsistency-tolerant inference relation. We show the rationality of our inference relation by considering some logical properties. Finally, we conduct experiments on several ontologies to evaluate the reasoning power of our inference relation. The experimental results show that our embedding-based method can outperform existing inconsistency-tolerant reasoning methods based on maximal consistent subsets.	Translated: 2023-11-30 16:27:53 Published: 2023-11-26
# Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning ( http://arxiv.org/abs/2304.01203v7 ) License: see link	Tongzhou Wang, Antonio Torralba, Phillip Isola, Amy Zhang	In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics, and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance, across both state-based and image-based observations.	Translated: 2023-11-30 16:27:34 Published: 2023-11-26
# AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction ( http://arxiv.org/abs/2305.09620v2 ) License: see link	Junsol Kim, Byungkyu Lee	Large language models (LLMs) that produce human-like responses have begun to revolutionize research practices in the social sciences. This paper shows how we can integrate LLMs and social surveys to accurately predict individual responses to survey questions that were not asked before. We develop a novel methodological framework to personalize LLMs by considering the meaning of survey questions derived from their text, the latent beliefs of individuals inferred from their response patterns, and the temporal contexts across different survey periods through fine-tuning LLMs with survey data. Using the General Social Survey from 1972 to 2021, we show that the fine-tuned model based on Alpaca-7b can predict individual responses to survey questions that are partially missing as well as entirely missing. The remarkable prediction capabilities allow us to fill in missing trends with high confidence and pinpoint when public attitudes changed, such as the rising support for same-sex marriage. We discuss practical constraints, socio-demographic representation, and ethical concerns regarding individual autonomy and privacy when using LLMs for opinion prediction. 
This study demonstrates that LLMs and surveys can mutually enhance each other's capabilities: LLMs broaden survey potential, while surveys improve the alignment of LLMs.	Translated: 2023-11-30 16:21:09 Published: 2023-11-26
# Image Matching by Bare Homography ( http://arxiv.org/abs/2305.08946v5 ) License: see link	Fabio Bellavia	This paper presents Slime, a novel non-deep image matching framework which models the scene as rough local overlapping planes. This intermediate representation sits between the local affine approximation of the keypoint patches and the global matching based on both spatial and similarity constraints, providing a progressive pruning of the correspondences, as planes are easier to handle than general scenes. Slime decomposes the images into overlapping regions at different scales and computes loose planar homographies. Planes are mutually extended by compatible matches and the images are split into fixed tiles, with only the best homographies retained for each pair of tiles. Stable matches are identified according to the consensus of the admissible stereo configurations provided by pairwise homographies. Within tiles, the rough planes are then merged according to their overlap in terms of matches and further consistent correspondences are extracted. The whole process only involves homography constraints. 
As a result, both the coverage and the stability of correct matches over the scene are amplified, together with the ability to spot matches in challenging scenes, allowing traditional hybrid matching pipelines to make up lost ground against recent end-to-end deep matching methods. In addition, the paper gives a thorough comparative analysis of the recent state of the art in image matching, represented by end-to-end deep networks and hybrid pipelines. The evaluation considers both planar and non-planar scenes, taking into account critical and challenging scenarios including abrupt temporal image changes and strong variations in relative image rotations. According to this analysis, despite the impressive progress made in this field, there is still ample room for improvement to be investigated in future research.	Translated: 2023-11-30 16:20:46 Published: 2023-11-26
# Designing Optimal Behavioral Experiments Using Machine Learning ( http://arxiv.org/abs/2305.07721v2 ) License: see link	Simon Valentin, Steven Kleinegesse, Neil R. Bramley, Peggy Seriès, Michael U. Gutmann, Christopher G. Lucas	Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely, and offer predictions that can be subtle and often counter-intuitive. However, this same richness and ability to surprise means our scientific intuitions and traditional tools are ill-suited to designing experiments to test and compare these models. To avoid these pitfalls and realize the full potential of computational modeling, we require tools to design experiments that provide clear answers about what models explain human behavior and the auxiliary assumptions those models must make. Bayesian optimal experimental design (BOED) formalizes the search for optimal experimental designs by identifying experiments that are expected to yield informative data. 
In this work, we provide a tutorial on leveraging recent advances in BOED and machine learning to find optimal experiments for any kind of model that we can simulate data from, and show how by-products of this procedure allow for quick and straightforward evaluation of models and their parameters against real experimental data. As a case study, we consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks. We validate the presented approach using simulations and a real-world experiment. As compared to experimental designs commonly used in the literature, we show that our optimal designs more efficiently determine which of a set of models best account for individual human behavior, and more efficiently characterize behavior given a preferred model. At the same time, formalizing a scientific question such that it can be adequately addressed with BOED can be challenging and we discuss several potential caveats and pitfalls that practitioners should be aware of. We provide code and tutorial notebooks to replicate all analyses.	Translated: 2023-11-30 16:19:20 Published: 2023-11-26
# Supporting single responsibility through automated extract method refactoring ( http://arxiv.org/abs/2305.03428v2 ) License: see link	Alireza Ardalani, Saeed Parsa, Morteza Zakeri-Nasrabadi, Alexander Chatzigeorgiou	The responsibility of a method/function is to perform some desired computations and disseminate the results to its caller through various deliverables, including object fields and variables in output instructions. Based on this definition of responsibility, this paper offers a new algorithm to refactor long methods to those with a single responsibility. We propose a backward slicing algorithm to decompose a long method into slightly overlapping slices. The slices are computed for each output instruction, representing the outcome of a responsibility delegated to the method. The slices will be non-overlapping if the slicing criteria address the same output variable. The slices are further extracted as independent methods, invoked by the original method if certain behavioral preservations are made. The proposed method has been evaluated on the GEMS extract method refactoring benchmark and three real-world projects. On average, our experiments demonstrate at least a 29.6% improvement in precision and a 12.1% improvement in the recall of uncovering refactoring opportunities compared to the state-of-the-art approaches. Furthermore, our tool improves method-level cohesion metrics by an average of 20% after refactoring. 
Experimental results confirm the applicability of the proposed approach in extracting methods with a single responsibility.	Translated: 2023-11-30 16:17:24 Published: 2023-11-26
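The general idea of computing one backward slice per output instruction can be sketched on straight-line code as below. This is a generic textbook-style backward slice over definition/use sets, with no control flow, and is only an assumption-laden illustration; it is not the algorithm from the paper, and the statement encoding and names are hypothetical.

```python
def backward_slice(statements, criterion_index):
    """Indices of statements that the criterion statement depends on.

    statements: list of (defined_vars, used_vars) set pairs in program order.
    Handles straight-line code only (no branches or loops).
    """
    _, used = statements[criterion_index]
    relevant = set(used)          # variables whose definitions we still need
    in_slice = {criterion_index}
    for i in range(criterion_index - 1, -1, -1):
        defined, used_here = statements[i]
        if defined & relevant:
            in_slice.add(i)
            # This statement satisfies the need; its own inputs become needed.
            relevant = (relevant - defined) | used_here
    return sorted(in_slice)


# Hypothetical long method with two output instructions (statements 4 and 5).
program = [
    ({"a"}, set()),           # 0: a = read()
    ({"b"}, set()),           # 1: b = read()
    ({"total"}, {"a", "b"}),  # 2: total = a + b
    ({"prod"}, {"a", "b"}),   # 3: prod = a * b
    (set(), {"total"}),       # 4: print(total)
    (set(), {"prod"}),        # 5: print(prod)
]
sum_slice = backward_slice(program, 4)
product_slice = backward_slice(program, 5)
```

The two slices share only the input-reading statements 0 and 1, which mirrors the "slightly overlapping slices" described in the abstract; each slice is a candidate for extraction as its own method.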
# GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes ( http://arxiv.org/abs/2305.16037v3 ) License: see link	Ibrahim Ethem Hamamci, Sezgin Er, Enis Simsar, Anjany Sekuboyina, Chinmay Prabhakar, Alperen Tezcan, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Furkan Almas, Irem Doğan, Muhammed Furkan Dasdelen, Hadrien Reynaud, Sarthak Pati, Christian Bluethgen, Mehmet Kemal Ozdemir, Bjoern Menze	In this paper, we introduce GenerateCT, a novel approach for generating CT volumes conditioned on free-form medical text prompts. GenerateCT includes a text encoder and three key components: a novel causal vision transformer for encoding CT volumes, a text-image transformer for aligning CT and text tokens, and a text-conditional super-resolution diffusion model. GenerateCT can produce realistic, high-resolution, and high-fidelity 3D chest CT volumes, validated by low FID and FVD scores. To explore GenerateCT's clinical applications, we evaluated its utility in a multi-abnormality classification task. 
First, we established a baseline by training a multi-abnormality classifier on our real dataset. To further assess the model's generalization to external datasets and its performance with unseen prompts in a zero-shot scenario, we employed an external dataset to train the classifier, setting an additional benchmark. We conducted two experiments in which we doubled the training datasets by synthesizing an equal number of volumes for each set using GenerateCT. The first experiment demonstrated an 11% improvement in the AP score when training the classifier jointly on real and generated volumes. The second experiment showed a 7% improvement when training on both real and generated volumes based on unseen prompts. Moreover, GenerateCT enables the scaling of synthetic training datasets to arbitrary sizes. As an example, we generated 100,000 CT volumes, fivefold the number in our real dataset, and trained the classifier exclusively on these synthetic volumes. Impressively, this classifier surpassed the performance of the one trained on all available real data by a margin of 8%. Lastly, domain experts evaluated the generated volumes, confirming a high degree of alignment with the text prompt. Our code and pre-trained models are available at: https://github.com/ibrahimethemhamamci/GenerateCT	翻訳日:2023-11-30 16:10:09 公開日:2023-11-26
# 大規模言語モデルの強みとバイアスを明らかにする文脈内なりすまし In-Context Impersonation Reveals Large Language Models' Strengths and Biases ( http://arxiv.org/abs/2305.14930v2 ) ライセンス: Link先を確認	Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, Zeynep Akata	(参考訳) 日常会話では、人間は異なる役割を担い、選択した役割に語彙を適応することができる。 LLMが文脈内でテキストを生成する際に,異なる役割を担う,すなわちなりすますことができるかどうかを検討する。我々は、視覚と言語タスクを解く前に、LLMに異なるペルソナを仮定するよう依頼する。私たちは、プロンプトに社会的なアイデンティティまたはドメインの専門知識に関連付けられたペルソナをプレフィックスすることでこれを行います。マルチアームバンディットタスクでは、異なる年齢の子どものふりをしたLLMが、人間のような発達段階の探索を回復する。言語に基づく推論タスクでは、ドメイン専門家を装うLLMが、非ドメイン専門家を装うLLMよりも優れた性能を発揮する。最後に,異なるカテゴリを記述する際に,LLMのなりすましが視覚情報を補完するかどうかを検証した。なりすましは性能を向上させることができる。鳥の専門家になるよう促されたLLMは、車の専門家になるよう促されたLLMよりも鳥をうまく説明する。しかし、なりすましはLLMのバイアスも明らかにする。男性であるように促されたLLMは、女性であるように促されたLLMよりも、車をうまく記述する。これらの結果から, LLMは多様な役割を担うことができ, この文脈内なりすましは, 隠れた強みやバイアスを明らかにするのに利用できることが示唆された。 In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is, impersonate, different roles when they generate text in-context. We ask LLMs to assume different personas before solving vision and language tasks. We do this by prefixing the prompt with a persona that is associated either with a social identity or domain expertise. In a multi-armed bandit task, we find that LLMs pretending to be children of different ages recover human-like developmental stages of exploration. In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts. Finally, we test whether LLMs' impersonations are complementary to visual information when describing different categories. We find that impersonation can improve performance: an LLM prompted to be a bird expert describes birds better than one prompted to be a car expert. However, impersonation can also uncover LLMs' biases: an LLM prompted to be a man describes cars better than one prompted to be a woman. 
These findings demonstrate that LLMs are capable of taking on diverse roles and that this in-context impersonation can be used to uncover their hidden strengths and biases.	翻訳日:2023-11-30 16:08:35 公開日:2023-11-26
# Moment Matching Denoisingギブズサンプリング Moment Matching Denoising Gibbs Sampling ( http://arxiv.org/abs/2305.11650v4 ) ライセンス: Link先を確認	Mingtian Zhang and Alex Hawkins-Hooker and Brooks Paige and David Barber	(参考訳) エネルギーベースモデル(EBM)は複雑なデータ分布をモデリングするための汎用フレームワークを提供する。しかし、EBMのトレーニングとEBMからのサンプリングは引き続き大きな課題を呈している。スケーラブルなEBMトレーニングのために広く使われているDenoising Score Matching (DSM) 法は不整合の問題に悩まされ、エネルギーモデルが「ノイズの多い」データ分布を学習する。そこで本研究では,DSMで十分に訓練された「ノイズの多い」モデルが与えられた場合に,基礎となるクリーンモデルからの効果的なサンプリングを可能にする,モーメントマッチングを用いた(擬似)ギブズサンプリングという効率的なサンプリングフレームワークを提案する。関連手法と比較して,本手法の利点を考察し,高次元データセットへの拡張方法を示す。 Energy-Based Models (EBMs) offer a versatile framework for modeling complex data distributions. However, training and sampling from EBMs continue to pose significant challenges. The widely-used Denoising Score Matching (DSM) method for scalable EBM training suffers from inconsistency issues, causing the energy model to learn a 'noisy' data distribution. In this work, we propose an efficient sampling framework: (pseudo)-Gibbs sampling with moment matching, which enables effective sampling from the underlying clean model when given a 'noisy' model that has been well-trained via DSM. We explore the benefits of our approach compared to related methods and demonstrate how to scale the method to high-dimensional datasets.	翻訳日:2023-11-30 16:05:49 公開日:2023-11-26
# アダマールパラメータ化下における政策勾配の線形収束について On the Linear Convergence of Policy Gradient under Hadamard Parameterization ( http://arxiv.org/abs/2305.19575v2 ) ライセンス: Link先を確認	Jiacai Liu, Jinchi Chen, and Ke Wei	(参考訳) アダマールパラメータ化の下での決定論的政策勾配の収束を表形式(tabular)の設定で研究し、アルゴリズムの線形収束を確立する。この目的のために、我々はまずすべての反復に対して、誤差が$O(\frac{1}{k})$のレートで減少することを示す。この結果に基づき、このアルゴリズムは、MDP問題と初期化のみに依存する定数である $k_0$ 回の反復後に、より高速な局所線形収束率を持つことを示す。アルゴリズムの局所的な線形収束を示すために、$k\ge k_0$のとき、準最適確率$b_s^k$(すなわち、出力ポリシ$\pi^k$が非最適行動に割り当てる確率)の縮小を実際に確立した。 The convergence of deterministic policy gradient under the Hadamard parameterization is studied in the tabular setting and the linear convergence of the algorithm is established. To this end, we first show that the error decreases at an $O(\frac{1}{k})$ rate for all the iterations. Based on this result, we further show that the algorithm has a faster local linear convergence rate after $k_0$ iterations, where $k_0$ is a constant that only depends on the MDP problem and the initialization. To show the local linear convergence of the algorithm, we have indeed established the contraction of the sub-optimal probability $b_s^k$ (i.e., the probability of the output policy $\pi^k$ on non-optimal actions) when $k\ge k_0$.	翻訳日:2023-11-30 15:58:06 公開日:2023-11-26
# 解釈可能な機械学習モデル発見のための並列座標 Parallel Coordinates for Discovery of Interpretable Machine Learning Models ( http://arxiv.org/abs/2305.18434v2 ) ライセンス: Link先を確認	Dustin Hayes, Boris Kovalerchuk	(参考訳) この研究は、並列座標における視覚的知識発見を用いて、解釈可能な機械学習の手法を前進させる。パラレル座標によるグラフィックデータ表現は、ハイパーキューブとハイパーブロック(hbs)の概念をエンドユーザにとって分かりやすくした。提案したデータ分類アルゴリズムであるHyperでは,混合および純粋なハイパーブロックを用いることが提案されている。ハイパーモデルは決定木を一般化する。アルゴリズムはいくつかの設定とオプションで表示され、インタラクティブ、自動オーバーラップ、非オーバーラップのハイパーブロックを検出する。さらに,視覚パターンの言語記述と連動してハイパーブロックの使用が実証された。 UCI MLリポジトリのベンチマークデータは、Hyperアルゴリズムを評価するために使用された。これにより、10倍のクロスバリデーションを用いて評価した混合HBと純粋なHBの発見が可能となった。ハイパーブロック間の接続、次元縮小、可視化が確立されている。エンドユーザーがハイパーブロックを見つけて観察する能力と、パターンを明確にするためのサイドバイサイドの可視化能力は、ハイパーブロック技術とハイパーアルゴリズムの大きな利点である。従来の並列座標ではサポートされていないが,不完全なn-Dデータを不完全な値で可視化する新しい手法を提案する。 HBが決定木上のデータの過一般化と過適合の両方を防止できる能力は、ハイパーブロックの別の利点として示される。ハイパーテクノロジーを実装するviscanvas 2.0ソフトウェアツールの特徴を紹介する。 This work uses visual knowledge discovery in parallel coordinates to advance methods of interpretable machine learning. The graphic data representation in parallel coordinates made the concepts of hypercubes and hyperblocks (HBs) simple to understand for end users. It is suggested to use mixed and pure hyperblocks in the proposed data classifier algorithm Hyper. It is shown that Hyper models generalize decision trees. The algorithm is presented in several settings and options to discover interactively or automatically overlapping or non-overlapping hyperblocks. Additionally, the use of hyperblocks in conjunction with language descriptions of visual patterns is demonstrated. The benchmark data from the UCI ML repository were used to evaluate the Hyper algorithm. It enabled the discovery of mixed and pure HBs evaluated using 10-fold cross validation. Connections among hyperblocks, dimension reduction and visualization have been established. 
The capability of end users to find and observe hyperblocks, as well as the ability of side-by-side visualizations to make patterns evident, are among the major advantages of hyperblock technology and the Hyper algorithm. A new method to visualize incomplete n-D data with missing values is proposed, which traditional parallel coordinates do not support. The ability of HBs to prevent both overgeneralization and overfitting of data better than decision trees is demonstrated as another benefit of hyperblocks. The features of the VisCanvas 2.0 software tool that implements Hyper technology are presented.	翻訳日:2023-11-30 15:56:35 公開日:2023-11-26
# 物理インフォームドニューラルネットワークにおける外挿故障の理解と緩和 Understanding and Mitigating Extrapolation Failures in Physics-Informed Neural Networks ( http://arxiv.org/abs/2306.09478v2 ) ライセンス: Link先を確認	Lukas Fesser, Luca D'Amico-Wong, Richard Qiu	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、ディープニューラルネットワーク(DNN)を用いた偏微分方程式(PDE)の効率的な近似によって最近人気を博している。しかし、それらのドメイン外振舞いはよく理解されておらず、以前の研究では、解関数に高周波成分が存在することが外挿性能の悪い原因になるかもしれないと推測されている。本稿では,高次元pdesを含む異なる種類のpdesの代表的な集合に対するピンの補間挙動について検討する。その結果,外挿障害は解関数の高周波数によるものではなく,フーリエスペクトルの時間的支持の変化によるものであることがわかった。本稿では、これらのスペクトルシフトを、WWF(Weighted Wasserstein-Fourier distance)を導入して定量化する。 WWFは、PINN外挿性能の予測に利用でき、重要なスペクトルシフトがない場合には、PINN外挿性能においても真の解に近づいたままであることを示す。最後に,より大きなスペクトルシフトの影響を緩和し,補間誤差を最大82%低減するトランスファー学習に基づく戦略を提案する。 Physics-informed Neural Networks (PINNs) have recently gained popularity due to their effective approximation of partial differential equations (PDEs) using deep neural networks (DNNs). However, their out of domain behavior is not well understood, with previous work speculating that the presence of high frequency components in the solution function might be to blame for poor extrapolation performance. In this paper, we study the extrapolation behavior of PINNs on a representative set of PDEs of different types, including high-dimensional PDEs. We find that failure to extrapolate is not caused by high frequencies in the solution function, but rather by shifts in the support of the Fourier spectrum over time. We term these spectral shifts and quantify them by introducing a Weighted Wasserstein-Fourier distance (WWF). We show that the WWF can be used to predict PINN extrapolation performance, and that in the absence of significant spectral shifts, PINN predictions stay close to the true solution even in extrapolation. Finally, we propose a transfer learning-based strategy to mitigate the effects of larger spectral shifts, which decreases extrapolation errors by up to 82%.	翻訳日:2023-11-30 15:46:00 公開日:2023-11-26
# 確率的プログラムを用いた大規模言語モデルの逐次モンテカルロステアリング Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs ( http://arxiv.org/abs/2306.03081v2 ) ライセンス: Link先を確認	Alexander K. Lew, Tan Zhi-Xuan, Gabriel Grand, and Vikash K. Mansinghka	(参考訳) 微調整と強化学習の後でも、大規模言語モデル(LLM)をプロンプトだけで確実に制御することは、不可能ではないにせよ困難である。逐次モンテカルロ(SMC)ステアリングと呼ばれる、LLMの出力に構文的および意味的制約を強制する新しい推論時手法を提案する。鍵となるアイデアは、言語生成タスクを離散確率系列モデルのクラスにおける事後推論問題として指定し、標準的な復号を逐次モンテカルロ推論に置き換えることである。ビームサーチと同程度の計算コストで、SMC は LLM を操り、穴埋め(infilling)、構文制約下での生成、プロンプトの交差(prompt intersection)など様々なタスクを解くことができる。 SMCステアリングの実験を容易にするために、新しい生成タスクを言語モデル確率プログラムとして簡潔に指定し、LLaMAファミリーのTransformerのステアリングを自動化する、確率的プログラミングライブラリLLaMPPL(https://github.com/probcomp/hfppl)を提案する。 Even after fine-tuning and reinforcement learning, large language models (LLMs) can be difficult, if not impossible, to control reliably with prompts alone. We propose a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of LLMs, called sequential Monte Carlo (SMC) steering. The key idea is to specify language generation tasks as posterior inference problems in a class of discrete probabilistic sequence models, and replace standard decoding with sequential Monte Carlo inference. For a computational cost similar to that of beam search, SMC can steer LLMs to solve diverse tasks, including infilling, generation under syntactic constraints, and prompt intersection. To facilitate experimentation with SMC steering, we present a probabilistic programming library, LLaMPPL (https://github.com/probcomp/hfppl), for concisely specifying new generation tasks as language model probabilistic programs, and automating steering of LLaMA-family Transformers.	翻訳日:2023-11-30 15:43:39 公開日:2023-11-26
# 失認者再確認のための消去・変換・通知防御ネットワーク Erasing, Transforming, and Noising Defense Network for Occluded Person Re-Identification ( http://arxiv.org/abs/2307.07187v3 ) ライセンス: Link先を確認	Neng Dong, Liyan Zhang, Shuanglin Yan, Hao Tang and Jinhui Tang	(参考訳) 排他的摂動は、人物の再識別(re-ID)において重大な課題を示し、外部の視覚的手がかりに依存する既存の手法では、追加の計算資源を必要とし、排他的情報の欠落の問題のみを考慮する。本稿では, 騒音障害としてオクルージョンを扱い, 敵防御の観点から隠蔽された人物のre-IDを解消する, 消去, トランスフォーミング, 騒音防御ネットワーク (ETNDNet) という, シンプルで効果的なフレームワークを提案する。提案するETNDNetでは,まず特徴マップをランダムに消去し,不完全な情報を持つ敵表現を生成する。第2に,オクルージョンによる位置ずれをシミュレートするランダムな変換を導入し,抽出器と分類器を逆さまに訓練し,不整合情報に対する堅牢な表現を学習する。第3に,障害物や非目標歩行者が導入した騒音情報に対処するために,ランダムな値で特徴マップを摂動させ,re-IDシステムにおいて敵ゲーミングを採用し,閉塞音に対する耐性を高める。 ETNDNetには3つの重要なハイライトがある。 (i)パラメータを持つ外部モジュールを一切必要としない。 (ii)障害物や非目標歩行者からの閉塞による諸問題を効果的に処理し、三隠蔽者再IDのための最初のGANベースの敵防衛パラダイムを設計する。 5つの公開データセットに対する大規模な実験は、提案したETNDNetの有効性、優位性、実用性を完全に証明している。コードは \url{https://github.com/nengdong96/ETNDNet} でリリースされる。 Occlusion perturbation presents a significant challenge in person re-identification (re-ID), and existing methods that rely on external visual cues require additional computational resources and only consider the issue of missing information caused by occlusion. In this paper, we propose a simple yet effective framework, termed Erasing, Transforming, and Noising Defense Network (ETNDNet), which treats occlusion as a noise disturbance and solves occluded person re-ID from the perspective of adversarial defense. In the proposed ETNDNet, we introduce three strategies: Firstly, we randomly erase the feature map to create an adversarial representation with incomplete information, enabling adversarial learning of identity loss to protect the re-ID system from the disturbance of missing information. Secondly, we introduce random transformations to simulate the position misalignment caused by occlusion, training the extractor and classifier adversarially to learn robust representations immune to misaligned information. 
Thirdly, we perturb the feature map with random values to address noisy information introduced by obstacles and non-target pedestrians, and employ adversarial gaming in the re-ID system to enhance its resistance to occlusion noise. Without bells and whistles, ETNDNet has three key highlights: (i) it does not require any external modules with parameters, (ii) it effectively handles various issues caused by occlusion from obstacles and non-target pedestrians, and (iii) it designs the first GAN-based adversarial defense paradigm for occluded person re-ID. Extensive experiments on five public datasets fully demonstrate the effectiveness, superiority, and practicality of the proposed ETNDNet. The code will be released at https://github.com/nengdong96/ETNDNet.	翻訳日:2023-11-30 15:36:29 公開日:2023-11-26
# WavJourney: 大きな言語モデルによる作曲オーディオ作成 WavJourney: Compositional Audio Creation with Large Language Models ( http://arxiv.org/abs/2307.14335v2 ) ライセンス: Link先を確認	Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang	(参考訳) 音声生成モデルの進歩にもかかわらず、その能力は音声の書き起こしや音声キャプションのようなドメイン固有の条件に限られることが多い。しかし、現実の音声生成は、音声、音楽、音響効果などの様々な要素を含む調和した音声を制御可能な条件で生成することを目的としており、既存の音声生成システムでは対処が難しい。本稿では,大規模言語モデル(llms)を活用した新しいフレームワークであるwavjourneyを提案する。 WavJourneyを使えば、ユーザーはテキストによる説明だけで様々なオーディオ要素でストーリーテリングオーディオコンテンツを作成できる。具体的には、テキスト命令が与えられた場合、WavJourney はまず LLM に対して、オーディオ要素の構造的意味表現として機能するオーディオスクリプトを生成するよう促す。音声スクリプトはコンピュータプログラムに変換され、プログラムの各行はタスク固有のオーディオ生成モデルまたは計算操作関数を呼び出す。そして、コンピュータプログラムを実行し、音声生成のための構成的で解釈可能なソリューションを得る。実験結果から,WavJourneyはテキスト記述された意味的,空間的,時間的条件に整合した現実的な音声を合成し,テキストから音声生成のベンチマークで最先端の結果が得られることが示唆された。さらに,新しいマルチジャンル・ストーリー・ベンチマークを導入する。主観評価はWavJourneyがテキストから魅力的なストーリーテリング音声コンテンツを制作する可能性を示している。さらにwavjourneyがマルチラウンド対話における人間と機械の共創を促進することを実証する。今後の研究を促進するため、コードと合成オーディオはhttps://audio-agi.github.io/wavjourney_demopage/で入手できる。 Despite breakthroughs in audio generation models, their capabilities are often confined to domain-specific conditions such as speech transcriptions and audio captions. However, real-world audio creation aims to generate harmonious audio containing various elements such as speech, music, and sound effects with controllable conditions, which is challenging to address using existing audio generation systems. We present WavJourney, a novel framework that leverages Large Language Models (LLMs) to connect various audio models for audio creation. WavJourney allows users to create storytelling audio content with diverse audio elements simply from textual descriptions. Specifically, given a text instruction, WavJourney first prompts LLMs to generate an audio script that serves as a structured semantic representation of audio elements. 
The audio script is then converted into a computer program, where each line of the program calls a task-specific audio generation model or computational operation function. The computer program is then executed to obtain a compositional and interpretable solution for audio creation. Experimental results suggest that WavJourney is capable of synthesizing realistic audio aligned with textually-described semantic, spatial and temporal conditions, achieving state-of-the-art results on text-to-audio generation benchmarks. Additionally, we introduce a new multi-genre story benchmark. Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text. We further demonstrate that WavJourney can facilitate human-machine co-creation in multi-round dialogues. To foster future research, the code and synthesized audio are available at: https://audio-agi.github.io/WavJourney_demopage/.	翻訳日:2023-11-30 15:24:28 公開日:2023-11-26
# RANSACを用いた教師なし画像異常検出 Unsupervised Image Outlier Detection using RANSAC ( http://arxiv.org/abs/2307.12301v2 ) ライセンス: Link先を確認	Chen-Han Tsai, Yu-Shao Peng	(参考訳) 画像異常検出(OD)は、コンピュータビジョンタスクで使用される画像データセットの品質と精度を保証するための重要なツールである。しかし、既存のアプローチのほとんどは、アウトレイラ予測に先立ってトレーニングのために、一連の分散データを必要とする。データの品質と量は、結果のパフォーマンスに影響を与える可能性がある。したがって、適切な分配集合を選択するには、しばしばかなりの労力を要する。本研究では,一級分類方式で汚染された集合内の外れ値を検出するための教師なし画像ODアルゴリズムであるRANSAC-NNを提案する。 RANSAC-NNはトレーニングなしで、様々なODベンチマークで確立された他の方法と比較して好適に機能する。さらに,本手法は,RANSAC-NNを前処理中に簡単に適用することで,既存のOD手法の堅牢性を高めることができることを示す。 Image outlier detection (OD) is an essential tool to ensure the quality and accuracy of image datasets used in computer vision tasks. Most existing approaches, however, require a set of in-distribution data for training prior to outlier prediction. The quality and quantity of the data can influence the resulting performance. Thus, selecting a suitable in-distribution set often requires considerable effort. In this work, we propose RANSAC-NN, an unsupervised image OD algorithm designed to detect outliers within contaminated sets in a one-class classification fashion. Without any training, RANSAC-NN performs favorably in comparison to other well-established methods in a variety of OD benchmarks. Furthermore, we show that our method can enhance the robustness of existing OD methods by simply applying RANSAC-NN during pre-processing.	翻訳日:2023-11-30 15:23:15 公開日:2023-11-26
# 銀行業務自動化のためのマルチモーダル文書分析 Multimodal Document Analytics for Banking Process Automation ( http://arxiv.org/abs/2307.11845v2 ) ライセンス: Link先を確認	Christopher Gerling, Stefan Lessmann	(参考訳) 従来の銀行は急速に発展する金融エコシステムにおいてフィンテックとの競争が激化している。この課題に対処するには,運用効率の向上が不可欠だ。本研究の目的は,銀行における文書集約型ビジネスプロセスの効率化である。そこで我々はまず,小売部門における業務文書の状況について概観する。バンキング文書はテキスト、レイアウト、視覚を含むことが多く、文書分析とプロセスの自動化には通常の自然言語処理(NLP)以上のものが必要であることを示唆している。これを検証し、ビジネス文書処理時の視覚的手がかりの漸進的価値を評価するために、最近提案されたLayoutXLMと呼ばれるマルチモーダルモデルと強力なテキスト分類器(例えばBERT)と大規模言語モデル(例えばGPT)を比較した。その結果,レイアウト情報をモデルに組み込むことで性能が大幅に向上することが確認された。興味深いことに、最高のモデルパフォーマンス(f1スコアの観点から)の75%以上が、トレーニングデータの30%以下で達成可能であることもわかりました。これは、マルチモーダルモデルを構築するためのラベル付きデータの要求が適度であることを示し、マルチモーダル文書分析の現実的な応用を単純化する。また,マルチモーダルバンキング文書分類器の校正範囲において,微調整の必要性を含め,より具体的な実践について考察した。本論文は,銀行業務における文書処理におけるマルチモデルモデルの有効性と効率に関する実証的証拠を提示し,この可能性を日々の業務において解き放つための実践的なガイダンスを提供する。 Traditional banks face increasing competition from FinTechs in the rapidly evolving financial ecosystem. Raising operational efficiency is vital to address this challenge. Our study aims to improve the efficiency of document-intensive business processes in banking. To that end, we first review the landscape of business documents in the retail segment. Banking documents often contain text, layout, and visuals, suggesting that document analytics and process automation require more than plain natural language processing (NLP). To verify this and assess the incremental value of visual cues when processing business documents, we compare a recently proposed multimodal model called LayoutXLM to powerful text classifiers (e.g., BERT) and large language models (e.g., GPT) in a case study related to processing company register extracts. The results confirm that incorporating layout information in a model substantially increases its performance. Interestingly, we also observed that more than 75% of the best model performance (in terms of the F1 score) can be achieved with as little as 30% of the training data. 
This shows that the demand for labeled data to set up a multimodal model can be moderate, which simplifies real-world applications of multimodal document analytics. Our study also sheds light on more specific practices in the scope of calibrating a multimodal banking document classifier, including the need for fine-tuning. In sum, the paper contributes original empirical evidence on the effectiveness and efficiency of multimodal models for document processing in the banking business and offers practical guidance on how to unlock this potential in day-to-day operations.	翻訳日:2023-11-30 15:22:22 公開日:2023-11-26
# 双方向積分近似による完全拡散反転 Exact Diffusion Inversion via Bi-directional Integration Approximation ( http://arxiv.org/abs/2307.10829v6 ) ライセンス: Link先を確認	Guoqiang Zhang and J. P. Lewis and W. Bastiaan Kleijn	(参考訳) 近年,EDICT[36]やNull-textインバージョン[22]など,画像編集を可能にするためにDDIMインバージョンの不整合問題に対処する様々な手法が提案されている。しかし、上記の手法は計算オーバーヘッドがかなり大きい。本稿では,無視できる計算オーバーヘッドで正確な拡散反転を行う,BDIA(bi-directional integration approximation)と呼ばれる新しい手法を提案する。履歴情報 $(i,\boldsymbol{z}_i)$ と $(i+1,\boldsymbol{z}_{i+1})$ を用いて、時刻 $t_i$ における次の拡散状態 $\boldsymbol{z}_{i-1}$ を推定したいとする。まず、推定されたガウスノイズ $\hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i)$ を取得し、次に、次の時間スロット$[t_i, t_{i-1}]$を前向きに、前の時間スロット$[t_i, t_{i+1}]$を後ろ向きに、それぞれODE積分を近似するためにDDIM更新手順を2回適用する。前の時間スロットのDDIMステップは、$\boldsymbol{z}_i$を計算する際に以前になされた積分近似を洗練するために使用される。 BDIA-DDIMのよい性質は、$\boldsymbol{z}_{i-1}$の更新式が$(\boldsymbol{z}_{i+1}, \boldsymbol{z}_i, \hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i))$の線形結合であることである。これにより、$(\boldsymbol{z}_i, \boldsymbol{z}_{i-1})$が与えられたときの$\boldsymbol{z}_{i+1}$の正確な逆計算が可能になり、正確な拡散反転をもたらす。 (ラウンドトリップ)BDIA-DDIMが特に画像編集に有効であることを実験により実証した。さらに,BDIA-DDIMはテキスト・ツー・イメージ生成において,DDIMよりも優れた画像サンプリング品質が得られることを示した。 BDIAはDDIMに加えて他のODEソルバの性能向上にも応用できる。本研究では,BDIAをEDMサンプリング手順に適用することにより,事前学習した4つのモデルにわたって一貫して優れた性能が得られることを示す。 Recently, various methods have been proposed to address the inconsistency issue of DDIM inversion to enable image editing, such as EDICT [36] and Null-text inversion [22]. However, the above methods introduce considerable computational overhead. In this paper, we propose a new technique, named bi-directional integration approximation (BDIA), to perform exact diffusion inversion with negligible computational overhead. Suppose we would like to estimate the next diffusion state $\boldsymbol{z}_{i-1}$ at timestep $t_i$ with the historical information $(i,\boldsymbol{z}_i)$ and $(i+1,\boldsymbol{z}_{i+1})$. 
We first obtain the estimated Gaussian noise $\hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i)$, and then apply the DDIM update procedure twice for approximating the ODE integration over the next time-slot $[t_i, t_{i-1}]$ in the forward manner and the previous time-slot $[t_i, t_{i+1}]$ in the backward manner. The DDIM step for the previous time-slot is used to refine the integration approximation made earlier when computing $\boldsymbol{z}_i$. A nice property of BDIA-DDIM is that the update expression for $\boldsymbol{z}_{i-1}$ is a linear combination of $(\boldsymbol{z}_{i+1}, \boldsymbol{z}_i, \hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i))$. This allows for exact backward computation of $\boldsymbol{z}_{i+1}$ given $(\boldsymbol{z}_i, \boldsymbol{z}_{i-1})$, thus leading to exact diffusion inversion. It is demonstrated with experiments that (round-trip) BDIA-DDIM is particularly effective for image editing. Our experiments further show that BDIA-DDIM produces markedly better image sampling qualities than DDIM for text-to-image generation. BDIA can also be applied to improve the performance of other ODE solvers in addition to DDIM. In our work, it is found that applying BDIA to the EDM sampling procedure produces consistently better performance over four pre-trained models.	翻訳日:2023-11-30 15:21:55 公開日:2023-11-26
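The invertibility argument in the BDIA abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the coefficients `a`, `b`, `c` and the function `eps_hat` are hypothetical placeholders (the abstract does not give the actual BDIA-DDIM coefficients or the noise predictor). The sketch only shows why an update that is linear in $(\boldsymbol{z}_{i+1}, \boldsymbol{z}_i, \hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i))$ can be solved exactly for $\boldsymbol{z}_{i+1}$: the noise estimate depends only on $\boldsymbol{z}_i$, which is known in both directions.

```python
import math

def eps_hat(z, i):
    """Hypothetical stand-in for the trained noise predictor (an assumption)."""
    return [math.tanh(v) * (0.1 + 0.01 * i) for v in z]

def step(z_ip1, z_i, i, a, b, c):
    """Sampling direction: z_{i-1} = a*z_{i+1} + b*z_i + c*eps_hat(z_i, i)."""
    e = eps_hat(z_i, i)
    return [a * p + b * q + c * r for p, q, r in zip(z_ip1, z_i, e)]

def invert(z_i, z_im1, i, a, b, c):
    """Inverse direction: solve the same linear relation for z_{i+1}.

    eps_hat is evaluated at z_i, which is available here, so the network
    itself never has to be inverted. Requires a != 0.
    """
    e = eps_hat(z_i, i)
    return [(m - b * q - c * r) / a for m, q, r in zip(z_im1, z_i, e)]

# Hypothetical coefficients and states, chosen only for illustration.
a, b, c = 0.5, 0.6, -0.2
z_ip1 = [0.3, -1.2, 2.0]
z_i = [0.25, -1.0, 1.7]

z_im1 = step(z_ip1, z_i, 3, a, b, c)
z_rec = invert(z_i, z_im1, 3, a, b, c)
max_err = max(abs(u - v) for u, v in zip(z_rec, z_ip1))
print(max_err)  # round-trip error, limited only by floating-point rounding
```

By contrast, a two-term update such as plain DDIM expresses the new state through a noise estimate evaluated at the state being solved for in the reverse direction, which is why its inversion is only approximate.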
# より表現力のあるグラフニューラルネットワークは生成タスクを改善するか? Will More Expressive Graph Neural Networks do Better on Generative Tasks? ( http://arxiv.org/abs/2308.11978v2 ) ライセンス: Link先を確認	Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao	(参考訳) グラフ生成は、与えられたラベルのみに基づいて、複数のノードとエッジを持つ完全なグラフを予測するため、大きな課題となる。この課題は、de novo創薬や分子設計を含む多くの現実世界の応用にも根本的な重要性を持っている。近年,グラフ生成分野においていくつかの手法が成功している。しかしながら、これらの手法は、(1)基礎となるグラフニューラルネットワーク(GNN)アーキテクチャが十分に検討されていないことが多く、(2)限られた数のメトリクスでしか評価されないことが多い、という2つの重大な欠点に悩まされている。このギャップを埋めるために、グラフ生成モデルの基盤となるGNNをより表現力のあるGNNに置き換えることで、分子グラフ生成タスクの文脈下でのGNNの表現性を調査する。具体的には、ZINC-250kデータセット上の6つの分子生成目標に対する6つのGNNの性能を、GCPNやGraphAFのような自己回帰生成モデルと、GraphEBMのような1ショット生成モデルという2つの異なる生成フレームワークで分析する。広範な実験を通じて、高度なGNNは分子生成タスクにおけるGCPN,GraphAF,GraphEBMの性能を実際に向上させることができるが,GNNの表現性は優れたGNNベース生成モデルの必要条件ではないことを示す。さらに,高度なGNNを用いたGCPNとGraphAFは,de novo分子設計の重要な指標である提案分子生成目標 (DRD2, Median1, Median2) において, 変分オートエンコーダやベイズ最適化モデルなど他の17の非GNNベースのグラフ生成手法と比較して最先端の結果が得られることを示す。 Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based on simply a given label. This task also carries fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs under the context of the molecular graph generation task, by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs on six different molecular generative objectives on the ZINC-250k dataset in two different generative frameworks: autoregressive generation models, such as GCPN and GraphAF, and one-shot generation models, such as GraphEBM. 
Through our extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN, GraphAF, and GraphEBM on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results across 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design.	翻訳日:2023-11-30 15:12:59 公開日:2023-11-26
# LLMエージェントに社会原理はあるか? Is There Any Social Principle for LLM-Based Agents? ( http://arxiv.org/abs/2308.11136v2 ) ライセンス: Link先を確認	Jitao Bai, Simiao Zhang, Zhonghao Chen	(参考訳) 大規模言語モデルに基づくエージェントは、人間中心のアライメントやアプリケーション以上のものを含むべきである。エージェント自体により多くの注意を払うべきであり、エージェントに適した社会科学を確立する可能性について議論すべきである。 Focus on Large Language Model based agents should involve more than "human-centered" alignment or application. We argue that more attention should be paid to the agent itself and discuss the potential of establishing tailored social sciences for agents.	翻訳日:2023-11-30 15:12:27 公開日:2023-11-26
# アンハーモニック・アライアンス:厳密WKBがEPTと出会う An anharmonic alliance: exact WKB meets EPT ( http://arxiv.org/abs/2309.02505v2 ) ライセンス: Link先を確認	Bruno Bucciotti, Tomas Reis, and Marco Serone	(参考訳) 離散スペクトルを持つある種の量子力学系において、可観測量が$\hbar$のトランス級数で与えられる場合、ボレル再総和可能な展開を持つ$\hbar_0$変形が存在し、$\hbar_0=\hbar$で元のモデルを再現することが示されている。このような展開はExact Perturbation Theory (EPT)と呼ばれた。本研究では, 多項式量子力学系のスペクトルを調べることにより, 厳密WKB法の枠組みの中で上記の結果がどのように得られるかを検討する。厳密WKBでは、エネルギー固有値はVoros記号 $a_{\gamma_i}$($\gamma_i$ は対応するサイクル)で定義される厳密な量子化条件によって決定され、一般に$\hbar$のトランス級数を生じる。4次の非調和ポテンシャルにおけるエネルギー固有値のボレル総和可能性が厳密WKBでどのように現れるかをレビューした後、量子補正を伴う高次の非調和ポテンシャルに拡張する。次に、任意の多項式ポテンシャルが、厳密な量子化条件が単に$a_\gamma=-1$となるモデルに$\hbar_0$変形でき、すべてのエネルギー固有値に対してEPTのボレル再総和可能な級数をもたらすことを示す。 Certain quantum mechanical systems with a discrete spectrum, whose observables are given by a transseries in $\hbar$, were shown to admit $\hbar_0$-deformations with Borel resummable expansions which reproduce the original model at $\hbar_0=\hbar$. Such expansions were dubbed Exact Perturbation Theory (EPT). We investigate how the above results can be obtained within the framework of the exact WKB method by studying the spectrum of polynomial quantum mechanical systems. Within exact WKB, energy eigenvalues are determined by exact quantization conditions defined in terms of Voros symbols $a_{\gamma_i}$, $\gamma_i$ being their associated cycles, and generally give rise to transseries in $\hbar$. After reviewing how the Borel summability of energy eigenvalues in the quartic anharmonic potential emerges in exact WKB, we extend it to higher order anharmonic potentials with quantum corrections. We then show that any polynomial potential can be $\hbar_0$-deformed to a model where the exact quantization condition reads simply $a_\gamma=-1$ and leads to the EPT Borel resummable series for all energy eigenvalues.	翻訳日:2023-11-30 15:02:55 公開日:2023-11-26
# Threshold KNN-Shapley: データ評価に対する線形時間とプライバシフレンドリなアプローチ Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation ( http://arxiv.org/abs/2308.15709v2 ) ライセンス: Link先を確認	Jiachen T. Wang, Yuqing Zhu, Yu-Xiang Wang, Ruoxi Jia, Prateek Mittal	(参考訳) データ評価は、トレーニング機械学習(ml)モデルにおける個々のデータソースの有用性を定量化することを目的としており、データ中心のml研究の重要な側面である。しかし、データのバリュエーションは、その重要性にもかかわらずプライバシー上の問題にしばしば見過ごされる。本稿では,近年最も実践的なデータ評価手法であるKNN-Shapleyに着目し,これらの課題について考察する。我々はまず、KNN-Shapleyの固有のプライバシーリスクを強調し、KNN-Shapleyを差分プライバシー(DP)に適合させる上で重要な技術的困難を実証する。これらの課題を克服するために、プライバシーに配慮したKNN-Shapleyの改良版であるTKNN-Shapleyを導入する。 DP-TKNN-Shapleyにはいくつかの利点があり、データ品質の差別化において、民営化されたKNN-Shapleyに比べ、プライバシー利用のトレードオフが優れていることを示す。さらに、プライベートでないTKNN-Shapleyでさえ、KNN-Shapleyと同等のパフォーマンスを実現している。全体としては、TKNN-ShapleyはKNN-Shapleyに代わる有望な代替手段であることを示している。 Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. However, data valuation faces significant yet frequently overlooked privacy challenges despite its importance. This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods nowadays. We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical difficulties in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate DP guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a superior privacy-utility tradeoff compared to naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves comparable performance as KNN-Shapley. 
Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data.	翻訳日:2023-11-30 14:59:42 公開日:2023-11-26
# Video Task Decathlon: 自動運転における画像とビデオタスクの統合 Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving ( http://arxiv.org/abs/2309.04422v2 ) ライセンス: Link先を確認	Thomas E. Huang, Yifan Liu, Luc Van Gool, Fisher Yu	(参考訳) 動的シーンで複数の異種視覚タスクを実行することは、人間の知覚能力の要点である。表現学習による画像およびビデオ認識の著しい進歩にもかかわらず、現在の研究は、タスクの特異性、均質性、あるいは単純な組み合わせのための特別なネットワークの設計に焦点を当てている。そこで我々は,様々な入出力構造を有する自律運転における主要画像および映像認識タスクのための統一モデルの構築について検討する。そこで本研究では,対象と画素の分類,セグメンテーション,局所化,関連付けにまたがる10の代表的な画像および映像タスクを含む,新たな課題であるvtd(video task decathlon)を設計した。 VTDでは,1つの構造と1組の重みを持つ統一ネットワークであるVTDNetを,全10タスクに対して開発する。 VTDNetは同様のタスクをグループ化し、タスクグループ内およびタスクグループ間で情報交換を行う。すべてのタスクにラベル付けする非現実性や,多数のタスクの共同トレーニングに伴うパフォーマンス劣化を考慮し,VTDNetの学習に成功し,性能損失を軽減するためのカリキュラムトレーニング,擬似ラベル付け,ファインチューニング(CPF)方式を設計する。 CPFで武装したVTDNetは、ほとんどのタスクにおいて、全体の20%しか計算できないシングルタスクよりも大幅に優れている。 vtdは、自動運転における知覚タスクの統一を探求するための有望な新しい方向である。 Performing multiple heterogeneous visual tasks in dynamic scenes is a hallmark of human perception capability. Despite remarkable progress in image and video recognition via representation learning, current research still focuses on designing specialized networks for singular, homogeneous, or simple combination of tasks. We instead explore the construction of a unified model for major image and video recognition tasks in autonomous driving with diverse input and output structures. To enable such an investigation, we design a new challenge, Video Task Decathlon (VTD), which includes ten representative image and video tasks spanning classification, segmentation, localization, and association of objects and pixels. On VTD, we develop our unified network, VTDNet, that uses a single structure and a single set of weights for all ten tasks. VTDNet groups similar tasks and employs task interaction stages to exchange information within and between task groups. 
Given the impracticality of labeling all tasks on all frames, and the performance degradation associated with joint training of many tasks, we design a Curriculum training, Pseudo-labeling, and Fine-tuning (CPF) scheme to successfully train VTDNet on all tasks and mitigate performance loss. Armed with CPF, VTDNet significantly outperforms its single-task counterparts on most tasks with only 20% overall computations. VTD is a promising new direction for exploring the unification of perception tasks in autonomous driving.	翻訳日:2023-11-30 14:47:35 公開日:2023-11-26
# 単一磁束量子回路から室温へのフォトニックリンク Photonic link from single flux quantum circuits to room temperature ( http://arxiv.org/abs/2309.03284v2 ) ライセンス: Link先を確認	Mohan Shen, Jiacheng Xie, Yuntao Xu, Sihao Wang, Risheng Cheng, Wei Fu, Yiyu Zhou, Hong X. Tang	(参考訳) 低温環境と室温環境の間の広帯域でエネルギー効率の高い信号伝達は、超伝導量子回路や古典論理回路において大きなボトルネックとなっている。フォトニックリンクは、高い帯域幅と低い熱負荷を同時に提供することで、この課題を克服することを約束している。しかし、電気信号のフォトニック読み出しの鍵となる極低温電気光学変調器の開発は、超伝導回路の厳しい要求により阻まれてきた。例えば、ラピッド単一磁束量子(RSFQ)回路は、従来の回路で使用されるボルトレベルの信号よりもはるかに低い数ミリボルト(mV)の小さな信号振幅で動作している。本稿では,長さ1mのSEOMで42mVという記録的に低い半波電圧$V_{\pi}$を持つ新しい超伝導電気光学変調器(SEOM)により,追加の電気増幅を行なわずにRSFQ回路を初めて直接光学的に読み出すことを示す。超伝導体の低オーミック損失を利用して、基本的な$V_{\pi}$と帯域幅のトレードオフを破り、低温で長さ0.2mのSEOMで最大17GHzの電気光学帯域を示す。本研究は,今後の大型超伝導回路と室温電子回路間の高帯域信号伝送を実現するための有効なソリューションを提案する。 Broadband, energy-efficient signal transfer between cryogenic and room-temperature environments has been a major bottleneck for superconducting quantum and classical logic circuits. Photonic links promise to overcome this challenge by offering simultaneous high bandwidth and low thermal load. However, the development of cryogenic electro-optic modulators -- a key component for photonic readout of electrical signals -- has been stifled by the stringent requirements of superconducting circuits. Rapid single flux quantum (RSFQ) circuits, for example, operate with a tiny signal amplitude of only a few millivolts (mV), far below the volt-level signal used in conventional circuits. Here, we demonstrate the first direct optical readout of an RSFQ circuit without additional electrical amplification, enabled by a novel superconducting electro-optic modulator (SEOM) featuring a record-low half-wave voltage $V_{\pi}$ of 42 mV on a 1 m-long SEOM. Leveraging the low ohmic loss of superconductors, we break the fundamental $V_{\pi}$-bandwidth trade-off and demonstrate electro-optic bandwidth up to 17 GHz on a 0.2 m-long SEOM at cryogenic temperatures. 
Our work presents a viable solution toward high-bandwidth signal transfer between future large-scale superconducting circuits and room-temperature electronics.	翻訳日:2023-11-30 14:47:07 公開日:2023-11-26
# プラグアンドプレイ演算子の収縮性について On the Contractivity of Plug-and-Play Operators ( http://arxiv.org/abs/2309.16899v2 ) ライセンス: Link先を確認	Chirayu D. Athalye, Kunal N. Chaudhury, and Bhartendu Kumar	(参考訳) プラグ・アンド・プレイ(PnP)正則化では、ISTAやADMMといったアルゴリズムの近似演算子を強力なデノイザに置き換える。この形式的な置換は実際驚くほどうまく機能する。実際、PnPは様々なイメージング応用に最先端の結果をもたらすことが示されている。 pnpの実証的な成功は、研究者がその理論的基盤、特に収束を理解する動機となった。先行研究において、非局所的な手段のようなカーネルのノイズに対して、pnp-istaは前方モデル上のいくつかの強い仮定の下で確実に収束することを示した。フォワードモデルにおける仮定を緩和できるか? 収束解析はPnP-ADMMに拡張できるのか? 収束率を推定できますか? 本文では, 縮尺写像定理を用いてこれらの問題を解く。 i) 対称雑音に対するPnP-ISTAとPnP-ADMMが線形収束を示すことを示す。 (II) カーネルデノイザでは, PnP-ISTA と PnP-ADMM がイメージインペイントに対して直線的に収束することを示す。再建実験を用いて理論的知見を検証した。 In plug-and-play (PnP) regularization, the proximal operator in algorithms such as ISTA and ADMM is replaced by a powerful denoiser. This formal substitution works surprisingly well in practice. In fact, PnP has been shown to give state-of-the-art results for various imaging applications. The empirical success of PnP has motivated researchers to understand its theoretical underpinnings and, in particular, its convergence. It was shown in prior work that for kernel denoisers such as the nonlocal means, PnP-ISTA provably converges under some strong assumptions on the forward model. The present work is motivated by the following questions: Can we relax the assumptions on the forward model? Can the convergence analysis be extended to PnP-ADMM? Can we estimate the convergence rate? In this letter, we resolve these questions using the contraction mapping theorem: (i) for symmetric denoisers, we show that (under mild conditions) PnP-ISTA and PnP-ADMM exhibit linear convergence; and (ii) for kernel denoisers, we show that PnP-ISTA and PnP-ADMM converge linearly for image inpainting. We validate our theoretical findings using reconstruction experiments.	翻訳日:2023-11-30 14:40:30 公開日:2023-11-26
# コンテキスト内学習に人間生成のデモンストレーションは必要か? Are Human-generated Demonstrations Necessary for In-context Learning? ( http://arxiv.org/abs/2309.14681v3 ) ライセンス: Link先を確認	Rui Li, Guoyin Wang, Jiwei Li	(参考訳) 大規模言語モデル(llm)の有望な少数ショット能力にもかかわらず、インコンテキスト学習(icl)の標準パラダイムは、選択されたデモンストレーションに対する感受性の欠点と、これらのデモを生成するための複雑さに苦しんでいる。本稿では,iclに人為的なデモンストレーションが必要かどうかという根本的な疑問を提起する。そこで本研究では,人間による実演を含まない自意識促進戦略 (sec) を提案する。 SECのキーポイントは、手作りの例をICLのデモとして使用する代わりに、SECは、最終出力がどの部分で生成されるかに基づいて、まず自身のデモを作成するようにLLMに求めていることだ。 secは柔軟なフレームワークであり、vailla iclとchain-of-thought(cot)の両方に対応できるが、より簡単である。算術推論、常識推論、マルチタスク言語理解、コード生成ベンチマークにおける広範な実験は、手作りのデモンストレーションを必要としないSECがゼロショット学習戦略を著しく上回り、手作りのデモでICLに匹敵する結果を達成していることを示している。これは、多くのタスクにおいて、現代のLLMは意思決定の能力にのみ依存し、外部のトレーニングデータの必要性を取り除くのに十分なレベルの能力を持っていることを示している。コードはhttps://github.com/ruili33/secで入手できる。 Despite the promising few-shot ability of large language models (LLMs), the standard paradigm of In-context Learning (ICL) suffers the disadvantages of susceptibility to selected demonstrations and the intricacy to generate these demonstrations. In this paper, we raise the fundamental question that whether human-generated demonstrations are necessary for ICL. To answer this question, we propose self-contemplation prompting strategy (SEC), a paradigm free from human-crafted demonstrations. The key point of SEC is that, instead of using hand-crafted examples as demonstrations in ICL, SEC asks LLMs to first create demonstrations on their own, based on which the final output is generated. SEC is a flexible framework and can be adapted to both the vanilla ICL and the chain-of-thought (CoT), but with greater ease: as the manual-generation process of both examples and rationale can be saved. 
Extensive experiments in arithmetic reasoning, commonsense reasoning, multi-task language understanding, and code generation benchmarks, show that SEC, which does not require hand-crafted demonstrations, significantly outperforms the zero-shot learning strategy, and achieves comparable results to ICL with hand-crafted demonstrations. This demonstrates that, for many tasks, contemporary LLMs possess a sufficient level of competence to exclusively depend on their own capacity for decision making, removing the need for external training data. Code is available at https://github.com/ruili33/SEC.	翻訳日:2023-11-30 14:38:44 公開日:2023-11-26
# 信頼の復号化:強化学習視点 Decoding trust: A reinforcement learning perspective ( http://arxiv.org/abs/2309.14598v2 ) ライセンス: Link先を確認	Guozhong Zheng, Jiqiang Zhang, Jing Zhang, Weiran Cai, and Li Chen	(参考訳) 信頼ゲームにおける行動実験は、信頼と信頼性が人間の間で普遍的であることを示し、正統派経済学において「ホモ・エコノミクス」を仮定することで予測と矛盾している。これは、何らかのメカニズムが彼らの出現を好む必要があることを意味する。しかし、以前の説明の多くは、ソーシャル学習の単純なバージョンである模倣学習に基づくいくつかの要因に頼る必要がある。ここでは、個人が蓄積した経験を通して長期的な回帰を評価することによって戦略を更新する強化学習のパラダイムに目を向ける。具体的には,q-learningアルゴリズムを用いて,受託者の意思決定を指導する2つのq-tableと関連づけた信頼ゲームについて検討する。両者のシナリオでは、個人が過去の経験と未来への回帰の両方を理解すれば、高いレベルの信頼と信頼感が生まれます。機械学的には、Qテーブルの進化は人間の心理的変化に似た交差を示す。また,ゲームパラメータの位相図も提供し,境界解析を行った。これらの発見は、シナリオが格子状個体群に拡張された場合、堅牢である。その結果,外部要因を伴わない信頼と信頼性の出現の自然な説明が得られた。さらに重要なことは、提案されたパラダイムは、人間の行動における多くのパズルを解読する可能性を示している。 Behavioral experiments on the trust game have shown that trust and trustworthiness are universal among human beings, contradicting the prediction by assuming \emph{Homo economicus} in orthodox Economics. This means some mechanism must be at work that favors their emergence. Most previous explanations however need to resort to some factors based upon imitative learning, a simple version of social learning. Here, we turn to the paradigm of reinforcement learning, where individuals update their strategies by evaluating the long-term return through accumulated experience. Specifically, we investigate the trust game with the Q-learning algorithm, where each participant is associated with two evolving Q-tables that guide one's decision making as trustor and trustee respectively. In the pairwise scenario, we reveal that high levels of trust and trustworthiness emerge when individuals appreciate both their historical experience and returns in the future. Mechanistically, the evolution of the Q-tables shows a crossover that resembles human's psychological changes. We also provide the phase diagram for the game parameters, where the boundary analysis is conducted. These findings are robust when the scenario is extended to a latticed population. 
Our results thus provide a natural explanation for the emergence of trust and trustworthiness without external factors involved. More importantly, the proposed paradigm shows the potential in deciphering many puzzles in human behaviors.	翻訳日:2023-11-30 14:38:15 公開日:2023-11-26
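The abstract above relies on Q-learning, where each participant keeps Q-tables that are updated from experience. As a minimal sketch only, the standard tabular Q-learning update looks like the following. The state and action encoding, the function name `q_update`, and the parameter values are assumptions for illustration and are not taken from the paper, which uses two Q-tables per participant (one as trustor, one as trustee).

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one standard tabular Q-learning update in place.

    q_table    : list of lists, q_table[s][a] is the current value estimate
    alpha      : learning rate (weight given to new experience)
    gamma      : discount factor (weight given to future returns)
    Returns the updated value of q_table[state][action].
    """
    # Best value achievable from the next state under the current estimates.
    best_next = max(q_table[next_state])
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state, two-action table used only for illustration.
q = [[0.0, 0.0], [1.0, 2.0]]
updated = q_update(q, state=0, action=1, reward=1.0, next_state=1,
                   alpha=0.5, gamma=0.9)
```

Here `alpha` controls how much weight past experience keeps and `gamma` how much future returns count, which are the two ingredients the abstract says individuals must both appreciate for trust to emerge.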
# 加速サンプリングのための自己調整型ハミルトンモンテカルロ Self-Tuning Hamiltonian Monte Carlo for Accelerated Sampling ( http://arxiv.org/abs/2309.13593v2 ) ライセンス: Link先を確認	Henrik Christiansen and Federico Errica and Francesco Alesiani	(参考訳) ハミルトニアンモンテカルロシミュレーションの性能は、積分の時間ステップと積分ステップ数の両方に大きく依存する。本稿では,位相空間の高速探索を促進する局所損失関数に基づいて,これらのパラメータを自動的にチューニングする適応型汎用フレームワークを提案する。損失と自己相関時間との良好な対応が確立できることを示し、完全に微分可能なセットアップを用いた勾配に基づく最適化を実現する。この損失は、積分ステップ数に関する分布の勾配駆動的な学習も可能にするように構成される。本手法を,1次元調和振動子と,シミュレーション手法のテストケースとして一般的な小さなタンパク質であるアラニンジペプチドで実証する。調和振動子への応用を通じて,局所極小の多い凹凸のある損失面を避けるために固定時間ステップを使わないことの重要性を強調する。アラニンジペプチドの場合、損失定義の唯一の自由パラメータをチューニングすることで、損失と自己相関時間との間に良い対応が得られ、グリッド探索と比較してシミュレーションパラメータの最適化において100倍以上の高速化が得られる。このシステムでは、インテグレータを拡張して原子依存のタイムステップを可能にし、自己相関時間をさらに25\%短縮する。 The performance of Hamiltonian Monte Carlo simulations crucially depends on both the integration timestep and the number of integration steps. We present an adaptive general-purpose framework to automatically tune such parameters, based on a local loss function which promotes the fast exploration of phase-space. We show that a good correspondence between loss and autocorrelation time can be established, allowing for gradient-based optimization using a fully-differentiable set-up. The loss is constructed in such a way that it also allows for gradient-driven learning of a distribution over the number of integration steps. Our approach is demonstrated for the one-dimensional harmonic oscillator and alanine dipeptide, a small protein common as a test case for simulation methods. Through the application to the harmonic oscillator, we highlight the importance of not using a fixed timestep to avoid a rugged loss surface with many local minima, otherwise trapping the optimization. In the case of alanine dipeptide, by tuning the only free parameter of our loss definition, we find a good correspondence between it and the autocorrelation times, resulting in a $>100$-fold speed-up in optimization of simulation parameters compared to a grid-search. 
For this system, we also extend the integrator to allow for atom-dependent timesteps, providing a further reduction of $25\%$ in autocorrelation times.	翻訳日:2023-11-30 14:35:22 公開日:2023-11-26
# MiCRO:分散DNNトレーニングのスケーリングと高速化のためのニアゼロコスト勾配スカラー化 MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training ( http://arxiv.org/abs/2310.00967v2 ) ライセンス: Link先を確認	Daegun Yoon, Sangyoon Oh	(参考訳) Gradient Sparsificationは、分散ディープニューラルネットワーク(DNN)トレーニングのスケーリングと高速化のための通信最適化技術である。これにより、グラデーション集約のための通信トラフィックが増加する。しかし、勾配選択や通信トラフィックの増加といった計算コストが高いため、既存のスパルサライザはスケーラビリティに乏しい。特に通信トラフィックの増加は勾配のビルドアップと勾配選択の不適切なしきい値によって引き起こされる。これらの課題に対処するため、我々はMiCROと呼ばれる新しい勾配スカラー化手法を提案する。 MiCROでは、勾配ベクトルは分割され、各パーティションは対応するワーカーに割り当てられる。各ワーカーはそのパーティションから勾配を選択し、集約された勾配は勾配のビルドから解放される。さらに、圧縮比誤差を最小にすることで、ユーザの要求に応じて通信トラフィックを維持するための正確な閾値を推定する。 MiCROは、分散DNNトレーニングのスケーラビリティと加速を妨げる既存の問題を解決することで、ほぼゼロのコスト勾配スカラー化を可能にする。我々の大規模な実験では、MiCROは優れた収束率を持つ最先端のスパリファイアよりも優れていた。 Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the increasing communication traffic for gradient aggregation. However, existing sparsifiers have poor scalability because of the high computational cost of gradient selection and/or increase in communication traffic. In particular, an increase in communication traffic is caused by gradient build-up and inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients from its partition, and the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates the accurate threshold to maintain the communication traffic as per user requirement by minimising the compression ratio error. MiCRO enables near-zero cost gradient sparsification by solving existing problems that hinder the scalability and acceleration of distributed DNN training. 
In our extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.	翻訳日:2023-11-30 14:25:26 公開日:2023-11-26
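The abstract above describes partitioning the gradient vector across workers, with each worker selecting gradients only from its own partition. The following is a rough sketch of that idea under stated assumptions: contiguous near-equal partitions and a fixed magnitude threshold. The function names are hypothetical, and MiCRO's threshold estimation (minimising the compression ratio error) is not reproduced here.

```python
def partition_bounds(num_elements, num_workers):
    """Split range(num_elements) into contiguous, near-equal (start, end) partitions."""
    base, extra = divmod(num_elements, num_workers)
    bounds, start = [], 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional element each.
        size = base + (1 if worker < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def select_gradients(gradient, bounds, worker, threshold):
    """Return (index, value) pairs from this worker's partition whose
    magnitude is at least `threshold` (assumed selection rule)."""
    start, end = bounds[worker]
    return [(i, gradient[i]) for i in range(start, end)
            if abs(gradient[i]) >= threshold]


# Illustrative gradient vector shared by three hypothetical workers.
gradient = [0.5, -0.01, 0.2, -0.9, 0.03, 0.0, 0.4]
bounds = partition_bounds(len(gradient), 3)
selected = [select_gradients(gradient, bounds, w, 0.1) for w in range(3)]
```

Because the partitions are disjoint, no index can be selected by two workers, which is the property the abstract credits with avoiding gradient build-up.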
# ドロップウェイポイントによる動的マルチエージェント環境における軌道予測の改善 Improving Trajectory Prediction in Dynamic Multi-Agent Environment by Dropping Waypoints ( http://arxiv.org/abs/2309.17338v2 ) ライセンス: Link先を確認	Pranav Singh Chib, Pravendra Singh	(参考訳) 本質的に多様性があり不確実な軌跡の性質は、それらを正確にモデル化する上で非常に難しい課題である。動作予測システムは、エージェントの将来の軌跡を予測するために、過去から空間的および時間的情報を効果的に学習する必要がある。既存の多くの手法は、時間的特徴を捉えるために、積み重ねられたモデル内の別々のコンポーネントを通して時間的動きを学ぶ。さらに、観測された軌道ウェイポイントシーケンスが完了したという仮定の下では、予測手法がしばしば動作し、値が不足するシナリオを無視して、その性能に影響を与える可能性がある。さらに、これらのモデルは予測を行う際に特定のウェイポイントシーケンスに偏りがある。軌道予測モデルのトレーニング中に時間依存を明示的に組み込む時間的経路点降下(twd)と呼ばれる新しい手法を提案する。過去の観測軌道から統計的にウェイポイントを落とすことにより、モデルは残りのウェイポイントから基礎となる時間的表現を学習せざるを得なくなり、モデルが改善される。確率的時間的ウェイポイントをモデル学習プロセスに組み込むことは、欠落した値のシナリオにおけるパフォーマンスを大幅に向上させる。実験の結果, 軌道予測能力の大幅な改善が示された。提案手法は,既存の軌道予測手法を補完し,予測精度を向上させる。 NBA Sports VU, ETH-UCY, TrajNet++の3つのデータセットに対する提案手法の評価を行った。 The inherently diverse and uncertain nature of trajectories presents a formidable challenge in accurately modeling them. Motion prediction systems must effectively learn spatial and temporal information from the past to forecast the future trajectories of the agent. Many existing methods learn temporal motion via separate components within stacked models to capture temporal features. Furthermore, prediction methods often operate under the assumption that observed trajectory waypoint sequences are complete, disregarding scenarios where missing values may occur, which can influence their performance. Moreover, these models may be biased toward particular waypoint sequences when making predictions. We propose a novel approach called Temporal Waypoint Dropping (TWD) that explicitly incorporates temporal dependencies during the training of a trajectory prediction model. By stochastically dropping waypoints from past observed trajectories, the model is forced to learn the underlying temporal representation from the remaining waypoints, resulting in an improved model. 
Incorporating stochastic temporal waypoint dropping into the model learning process significantly enhances its performance in scenarios with missing values. Experimental results demonstrate our approach's substantial improvement in trajectory prediction capabilities. Our approach can complement existing trajectory prediction methods to improve their prediction accuracy. We evaluate our proposed approach on three datasets: NBA Sports VU, ETH-UCY, and TrajNet++.	翻訳日:2023-11-30 14:24:00 公開日:2023-11-26
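The core idea in the abstract above, stochastically dropping waypoints from the observed trajectory during training, can be illustrated with a short sketch. This is an assumption-laden illustration, not the authors' implementation: the function name `drop_waypoints`, the independent per-waypoint drop probability, the use of `None` as a placeholder, and the choice to always keep the final observed waypoint are all hypothetical.

```python
import random


def drop_waypoints(trajectory, drop_prob, rng):
    """Return a copy of `trajectory` in which each waypoint except the last
    is independently replaced by None with probability `drop_prob`.

    Keeping the last observed waypoint is an assumption made here so that
    the model always sees the most recent position.
    """
    last = len(trajectory) - 1
    result = []
    for i, point in enumerate(trajectory):
        if i != last and rng.random() < drop_prob:
            result.append(None)   # waypoint dropped for this training sample
        else:
            result.append(point)  # waypoint kept unchanged
    return result


# Hypothetical observed trajectory of (x, y) positions.
observed = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (4.0, 2.0)]
augmented = drop_waypoints(observed, drop_prob=0.3, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible across runs.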
# 因果に準拠した説明のための深いバックトラッキング反事実 Deep Backtracking Counterfactuals for Causally Compliant Explanations ( http://arxiv.org/abs/2310.07665v2 ) ライセンス: Link先を確認	Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach	(参考訳) 反事実は、事実的な観察を条件として、変化した状況下で何が観察されたであろうかに答えることによって、貴重な洞察を与えることができる。反事実の古典的介入解釈が広く研究されている一方で、すべての因果法則をそのまま維持するバックトラッキングは、研究の少ない代替手段である。本研究では, 深部生成成分からなる構造因果モデルにおいて, バックトラッキング反事実を計算するための実践的手法を提案する。そこで我々は,因果モデルの構造化潜在空間におけるトラクタブルな制約付き最適化問題を解くことで,反事実の生成を可能にする条件を構造的割り当てに課す。また,本定式化は,反事実的説明の分野における手法との比較も促進する。これらと比較すると,本手法は汎用性,モジュール性,因果性に準拠した代替手段である。これらの特性をMNISTとCelebAの修正版で実験的に実証する。 Counterfactuals can offer valuable insights by answering what would have been observed under altered circumstances, conditional on a factual observation. Whereas the classical interventional interpretation of counterfactuals has been studied extensively, backtracking constitutes a less studied alternative in which all causal laws are kept intact. In the present work, we introduce a practical method for computing backtracking counterfactuals in structural causal models that consist of deep generative components. To this end, we impose conditions on the structural assignments that enable the generation of counterfactuals by solving a tractable constrained optimization problem in the structured latent space of a causal model. Our formulation also facilitates a comparison with methods in the field of counterfactual explanations. Compared to these, our method represents a versatile, modular and causally compliant alternative. We demonstrate these properties experimentally on a modified version of MNIST and CelebA.	翻訳日:2023-11-30 14:02:07 公開日:2023-11-26
# Fed-GraB: 自己調整型グラディエントバランサによる長期学習 Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer ( http://arxiv.org/abs/2310.07587v4 ) ライセンス: Link先を確認	Zikai Xiao, Zihan Chen, Songshang Liu, Hualiang Wang, Yang Feng, Jin Hao, Joey Tianyi Zhou, Jian Wu, Howard Hao Yang, Zuozhu Liu	(参考訳) データプライバシと長期分布は、多くの現実世界のタスクで例外ではなく、標準である。本稿では,各クライアントがローカルに異種データセットを持つフェデレーション・ロングテール・ラーニング(federated long-tailed learning, fed-lt)タスクについて検討する。このような条件下では、既存のフェデレーション最適化と/または集中型ロングテール学習法はほとんど適用されない。 (a)世界的長期分布をプライバシー制約下で特徴付けること (b)頭部の不均衡に対処するために局所学習戦略を調整すること。そこで本研究では,DPA(Direct Prior Analyzer)モジュールによって評価された大域的長期分布のフィードバックに基づいて,クライアントの勾配を閉ループで再重み付けする自己調整型グラディエント・バランサ(SGB)モジュールからなる,$\texttt{Fed-GraB}$という手法を提案する。クライアントは$\texttt{Fed-GraB}$を使用することで、モデルトレーニングプロセス中にデータの不均一性によって引き起こされる分散ドリフトを効果的に軽減し、多数派クラスのパフォーマンスを維持しながら、少数派クラスのパフォーマンスを向上したグローバルモデルを得ることができる。大規模な実験では、CIFAR-10-LT、CIFAR-100-LT、ImageNet-LT、iNaturalistなどの代表的なデータセットに対して、$\texttt{Fed-GraB}$が最先端のパフォーマンスを達成することが示されている。 Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed $\texttt{Fed-GraB}$, comprised of a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on the feedback of global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. 
Using $\texttt{Fed-GraB}$, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes. Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.	翻訳日:2023-11-30 14:01:51 公開日:2023-11-26
# ニューラルネットワークの特徴の類似性を超えて:ネットワークの特徴複雑性とそのカテゴリー理論による解釈 Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory ( http://arxiv.org/abs/2310.06756v2 ) ライセンス: Link先を確認	Yiting Chen, Zhanpeng Zhou, Junchi Yan	(参考訳) ニューラルネットワークの振舞いはいまだ不透明であり、最近広く知られる現象は、異なるランダムパラメータで初期化されると、ネットワークが同様のパフォーマンスを達成することである。この現象は、異なるネットワークによって学習された特徴間の類似性を測定することに大きな注目を集めている。しかし、同等の機能はほとんど存在しないため、同じ機能を記述することは曖昧である。本稿では、等価機能の概念を拡張し、機能的に等価機能と呼ぶものの定義を提供する。これらの特徴は特定の変換の下で等価な出力を生成する。この定義を用いて、ニューラルネットワークが各層で学習した特徴の冗長性に関して、いわゆる特徴複雑性のより内在的な指標を導出することを目指している。我々は、数学の発達した分野である圏論のレンズを通して、我々のアプローチの正式な解釈を提供する。さらに,特徴量の定量化のために,Iterative Feature Mergingというアルゴリズムを提案する。実験結果は、様々な観点から我々の考えと理論を検証した。実験により、同じニューラルネットワークで学習された異なる特徴間で機能的等価性が広く存在し、性能に影響を与えずにネットワークのパラメータ数を削減できることを実証し、ifmはデータ非依存モデルプルーネ法として大きな可能性を示している。定義された機能の複雑さに関する興味深い経験的な発見もいくつか出てきました。 The behavior of neural networks still remains opaque, and a recently widely noted phenomenon is that networks often achieve similar performance when initialized with different random parameters. This phenomenon has attracted significant attention in measuring the similarity between features learned by distinct networks. However, feature similarity could be vague in describing the same feature since equivalent features hardly exist. In this paper, we expand the concept of equivalent feature and provide the definition of what we call functionally equivalent features. These features produce equivalent output under certain transformations. Using this definition, we aim to derive a more intrinsic metric for the so-called feature complexity regarding the redundancy of features learned by a neural network at each layer. We offer a formal interpretation of our approach through the lens of category theory, a well-developed area in mathematics. To quantify the feature complexity, we further propose an efficient algorithm named Iterative Feature Merging. 
Our experimental results validate our ideas and theories from various perspectives. We empirically demonstrate that functional equivalence widely exists among different features learned by the same neural network, and that we can reduce the number of parameters of the network without affecting the performance. The IFM shows great potential as a data-agnostic model pruning method. We have also drawn several interesting empirical findings regarding the defined feature complexity.	翻訳日:2023-11-30 14:00:26 公開日:2023-11-26
# 多ユーザ遅延フィードバックを持つ逆帯域:理論と応用 Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application ( http://arxiv.org/abs/2310.11188v2 ) ライセンス: Link先を確認	Yandi Li, Jianxiong Guo, Yupeng Li, Tian Wang, Weijia Jia	(参考訳) マルチアームバンディット(MAB)モデルは、リソース割り当て、オンライン広告、動的価格設定など、様々な現実のシナリオに適用可能性や有効性から、研究の注目を集めている。重要な分野として,学習アルゴリズムに挑戦するために,概念敵が各アームに関連する報酬分布を戦略的に選択し,エージェントがアクションを取ると対応する報酬フィードバックを受け取るまでの遅延を経験する,多くの研究者によって,遅延フィードバックを伴う敵対的mab問題が提案され,研究されている。しかし、既存のモデルは1人のユーザーのみが生成するフィードバックを制限するため、複数のユーザーの一般的なシナリオ(例えば、一群のユーザーに対する広告推薦)にモデルは適用できない。本稿では,複数ユーザからのフィードバックが遅延し,内部分布が制限されないことを考察する。対照的に、フィードバック遅延は任意であり、予めプレイヤーに未知である。また、ラウンド内の異なるユーザにとって、フィードバックの遅延は遅延相関の仮定を持たない。そこで,マルチユーザによる遅延フィードバックを用いた逆MAB問題を定式化し,異なるユーザからのフィードバックの重み付けを考慮し,各ラウンドで決定を行うEXP3アルゴリズムを改良したMUD-EXP3を設計する。既知の端末ラウンドインデックス$T$, ユーザ数$M$, アーム数$N$, 遅延上限$d_{max}$の前提で、$\mathcal{O}(\sqrt{TM^2\ln{N}(N\mathrm{e}+4d_{max})})$ の後悔を証明する。さらに、未知の$T$のより一般的な場合、適応アルゴリズム AMUD-EXP3 は$T$に対するサブ線形後悔と共に提案される。最後に,アルゴリズムの正しさと有効性を示すため,広範な実験を行った。 The multi-armed bandit (MAB) models have attracted significant research attention due to their applicability and effectiveness in various real-world scenarios such as resource allocation, online advertising, and dynamic pricing. As an important branch, the adversarial MAB problems with delayed feedback have been proposed and studied by many researchers recently where a conceptual adversary strategically selects the reward distributions associated with each arm to challenge the learning algorithm and the agent experiences a delay between taking an action and receiving the corresponding reward feedback. However, the existing models restrict the feedback to be generated from only one user, which makes models inapplicable to the prevailing scenarios of multiple users (e.g. ad recommendation for a group of users). In this paper, we consider that the delayed feedback results are from multiple users and are unrestricted on internal distribution. 
In contrast, the feedback delay is arbitrary and unknown to the player in advance. Also, for different users in a round, the delays in feedback have no assumption of latent correlation. Thus, we formulate an adversarial MAB problem with multi-user delayed feedback and design a modified EXP3 algorithm MUD-EXP3, which makes a decision at each round by considering the importance-weighted estimator of the received feedback from different users. On the premise of known terminal round index $T$, the number of users $M$, the number of arms $N$, and upper bound of delay $d_{max}$, we prove a regret of $\mathcal{O}(\sqrt{TM^2\ln{N}(N\mathrm{e}+4d_{max})})$. Furthermore, for the more common case of unknown $T$, an adaptive algorithm AMUD-EXP3 is proposed with a sublinear regret with respect to $T$. Finally, extensive experiments are conducted to indicate the correctness and effectiveness of our algorithms.	翻訳日:2023-11-30 13:53:49 公開日:2023-11-26
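The abstract above builds on EXP3 and its importance-weighted estimator. As a hedged sketch, the standard single-user EXP3 step without delays is shown below. It does not implement MUD-EXP3 or AMUD-EXP3, whose handling of multi-user delayed feedback is not specified in the abstract, and the function names and the exploration parameter `gamma` are illustrative assumptions.

```python
import math


def exp3_probabilities(weights, gamma):
    """Mix the normalised weights with uniform exploration (standard EXP3)."""
    total = sum(weights)
    n = len(weights)
    return [(1 - gamma) * w / total + gamma / n for w in weights]


def exp3_update(weights, arm, reward, gamma):
    """Return new weights after observing `reward` in [0, 1] for `arm`."""
    probs = exp3_probabilities(weights, gamma)
    # Importance-weighted estimate: dividing by the selection probability
    # keeps the estimate unbiased even though only one arm is observed.
    estimate = reward / probs[arm]
    new_weights = list(weights)
    new_weights[arm] *= math.exp(gamma * estimate / len(weights))
    return new_weights


# Four arms with uniform initial weights; arm 2 is played and returns reward 1.
weights = [1.0, 1.0, 1.0, 1.0]
weights = exp3_update(weights, arm=2, reward=1.0, gamma=0.1)
```

Only the played arm's weight changes in each step, so arms that are rarely selected receive larger corrections when they are finally observed.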
# 公共インターネットデータを用いたマルチモーダル基礎モデルの不確かさの推定 Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data ( http://arxiv.org/abs/2310.09926v2 ) ライセンス: Link先を確認	Shiladitya Dutta, Hongbo Wei, Lars van der Laan, Ahmed M. Alaa	(参考訳) ファンデーションモデルは、自己教師付き学習を使用して大規模な大量のデータに基づいて訓練されており、幅広い下流タスクへの適応を可能にする。テスト時には、これらのモデルはゼロショット機能を示し、以前は目に見えない(ユーザ指定)カテゴリを分類することができる。本稿では,これらのゼロショット予測における不確かさを定量化する問題に対処する。ウェブデータとの共形予測を用いたゼロショット設定における不確実性推定のためのヒューリスティック手法を提案する。テスト時に一連のクラスが与えられると、プロンプトテンプレート("a image of a <category>"など)を使用してクリップスタイルのモデルでゼロショットの分類を行い、オープンwebからのキャリブレーションデータに対する検索クエリと同じテンプレートを使用する。 webベースのキャリブレーションセットが与えられた場合、検索されたwebデータの潜在的なエラーを考慮し、新しいコンフォメーションスコアにコンフォメーション予測を適用する。本研究は, 生物医学基礎モデルにおける提案手法の有用性を評価し, 様々な生体医学データセットにおいて, 対象範囲を満足できる効率で達成できることを予備的に示した。 Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we conduct zero-shot classification with CLIP-style models using a prompt template, e.g., "an image of a <category>", and use the same template as a search query to source calibration data from the open web. Given a web-based calibration set, we apply conformal prediction with a novel conformity score that accounts for potential errors in retrieved web data. We evaluate the utility of our proposed method in Biomedical foundation models; our preliminary results show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.	翻訳日:2023-11-30 13:49:55 公開日:2023-11-26
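The abstract above applies conformal prediction to zero-shot classifiers. For orientation, here is a minimal sketch of standard split conformal prediction for classification. It uses the common score "one minus the probability of the true label" and is not the paper's novel conformity score, which additionally accounts for errors in retrieved web data. The function names and sample values are hypothetical.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the calibration quantile used by split conformal prediction.

    cal_scores : nonconformity scores on the calibration set
                 (here assumed to be 1 - probability of the true label)
    alpha      : target miscoverage rate, e.g. 0.1 for 90% coverage
    """
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Return all labels whose nonconformity score is within the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]


# Hypothetical calibration scores and class probabilities for one test image.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold = conformal_threshold(scores, alpha=0.5)
labels = prediction_set({"a": 0.7, "b": 0.2, "c": 0.1}, threshold)
```

A smaller `alpha` raises the threshold and therefore yields larger prediction sets, which is the efficiency versus coverage trade-off the abstract refers to.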
# 事前学習拡散モデルのh空間における解釈方向の教師なし発見 Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models ( http://arxiv.org/abs/2310.09912v2 ) ライセンス: Link先を確認	Zijian Zhang, Luping Liu, Zhijie Lin, Yichen Zhu, Zhou Zhao	(参考訳) 本稿では,事前学習された拡散モデルのh空間における解釈可能な方向を識別する,教師なし学習に基づく最初の手法を提案する。提案手法は,GAN潜在空間で動作する既存の手法から導かれる。具体的には、事前学習した拡散モデルのh-スペースで動作するシフト制御モジュールを用いて、サンプルを自分自身のシフトバージョンに操作し、次いで再構成器を用いて操作のタイプと強度を再現する。それらを共同で最適化することで、モデルは自然に絡み合った解釈可能な方向を発見する。無意味かつ破壊的な方向の発見を防止するため、シフトサンプルの忠実性を維持するために識別器を用いる。拡散モデルの反復的生成過程のため、バックプロパゲート勾配に多くの中間テンソルを格納するために、我々のトレーニングは相当量のGPU VRAMを必要とする。この問題に対処するため, 勾配チェックポインティングに基づく一般的なVRAM効率トレーニングアルゴリズムを提案し, VRAMの占有を許容し, トレーニング効率を犠牲にしながら, 生成過程全体を通して勾配をバックプロパゲートする。拡散モデルに関する既存の研究と比較して,本手法は,他の複雑な手順を必要とせず,本質的にグローバルかつスケーラブルな方向を識別する。各種データセットに対する大規模な実験により,本手法の有効性が示された。 We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models. Our method is derived from an existing technique that operates on the GAN latent space. Specifically, we employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself, followed by a reconstructor to reproduce both the type and the strength of the manipulation. By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions. To prevent the discovery of meaningless and destructive directions, we employ a discriminator to maintain the fidelity of shifted samples. Due to the iterative generative process of diffusion models, our training requires a substantial amount of GPU VRAM to store numerous intermediate tensors for back-propagating gradients. To address this issue, we propose a general VRAM-efficient training algorithm based on gradient checkpointing technique to back-propagate any gradient through the whole generative process, with acceptable occupancy of VRAM and sacrifice of training efficiency. 
Compared with existing related works on diffusion models, our method inherently identifies global and scalable directions, without necessitating any other complicated procedures. Extensive experiments on various datasets demonstrate the effectiveness of our method.	翻訳日:2023-11-30 13:49:35 公開日:2023-11-26
# ハリケーントラジェクタの地理空間予測のためのグラフ変換器 GraphTransformers for Geospatial Forecasting of Hurricane Trajectories ( http://arxiv.org/abs/2310.20174v2 ) ライセンス: Link先を確認	Pallavi Banerjee, Satyaki Chakraborty	(参考訳) 本稿では,グラフトランスフォーマを用いた地理空間シーケンスの軌跡予測のための新しい枠組みを提案する。いくつかのシーケンスを見渡すと、そのようなシーケンスモデリングタスクを考慮せずに、異なる地理空間ポイント間でグラフ構造が自動的に現れるのが観察された。このグラフ構造を明示的に活用することで,地理空間的軌道予測を大幅に改善できることを示す。当社のGraphTransformerアプローチは,ハリケーンの軌跡を6時間単位で予測するデータセットであるHURDATに基づいて,最先端のTransformerベースのベースラインを大幅に改善する。 In this paper we introduce a novel framework for trajectory prediction of geospatial sequences using GraphTransformers. When viewed across several sequences, we observed that a graph structure automatically emerges between different geospatial points that is often not taken into account for such sequence modeling tasks. We show that by leveraging this graph structure explicitly, geospatial trajectory prediction can be significantly improved. Our GraphTransformer approach improves upon state-of-the-art Transformer based baseline significantly on HURDAT, a dataset where we are interested in predicting the trajectory of a hurricane on a 6 hourly basis.	翻訳日:2023-11-30 13:29:14 公開日:2023-11-26
# FLIP: CTR予測のためのIDベースモデルと事前学習言語モデルとの微粒なアライメントを目指して FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction ( http://arxiv.org/abs/2310.19453v2 ) ライセンス: Link先を確認	Hangyu Wang, Jianghao Lin, Xiangyang Li, Bo Chen, Chenxu Zhu, Ruiming Tang, Weinan Zhang, Yong Yu	(参考訳) クリックスルーレート(CTR)予測は、さまざまなパーソナライズされたオンラインサービスにおいてコア機能モジュールとして機能する。 CTR予測のための従来のIDベースのモデルは、特徴相互作用モデリングを通じて協調的な信号をキャプチャする表形式での1ホット符号化ID特徴を入力として捉えている。しかし、ワンホットエンコーディングは、元のフィーチャーテキストにある意味情報を破棄する。近年、PLM(Pretrained Language Models)の出現は、ハードプロンプトテンプレートによって得られるテキストモダリティの文を入力として、意味知識を抽出するためにPLMを採用する別のパラダイムを生み出している。しかし、一般的にPLMは入力されたテキストデータをサブワードトークンにトークン化し、フィールドワイドの協調信号を無視する。したがって、これらの2つの研究は、同じ入力データ(例えば、テキストと表のモダリティ)の異なる特性に焦点を当て、相互に相補的な関係を形成する。本稿では,CTR予測のためのIDベースモデルと事前学習言語モデル(FLIP)間の細粒度特徴レベルのアライメントを提案する。マスク型言語と表型モデリングの両方のための新しい統合再構築事前学習タスクをデザインする。具体的には、一方のモダリティ(トークンや特徴)のマスクされたデータは、他方のモダリティの助けを借りて復元され、双対モダリティ間の十分な相互情報抽出を通じて特徴レベルの相互作用とアライメントを確立する必要がある。さらに,下流のctr予測タスクに対して,idベースモデルとplmを共同で微調整し,両モデルの利点を組み合わせることにより,優れた性能を実現することを提案する。 3つの実世界のデータセットに対する大規模な実験により、FLIPはSOTAベースラインより優れており、様々なIDベースのモデルやPLMと高い互換性を持つことが示された。 Click-through rate (CTR) prediction plays as a core function module in various personalized online services. The traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality, which capture the collaborative signals via feature interaction modeling. But the one-hot encoding discards the semantic information conceived in the original feature texts. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality obtained by hard prompt templates and adopts PLMs to extract the semantic knowledge. However, PLMs generally tokenize the input text data into subword tokens and ignore field-wise collaborative signals. 
Therefore, these two lines of research focus on different characteristics of the same input data (i.e., textual and tabular modalities), forming a distinct complementary relationship with each other. In this paper, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. We design a novel joint reconstruction pretraining task for both masked language and tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment via sufficient mutual information extraction between dual modalities. Moreover, we propose to jointly finetune the ID-based model and PLM for downstream CTR prediction tasks, thus achieving superior performance by combining the advantages of both models. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines, and is highly compatible for various ID-based models and PLMs.	翻訳日:2023-11-30 13:27:38 公開日:2023-11-26
# 3次元不確かさ場の推定:神経放射場に対する不確かさの定量化 Estimating 3D Uncertainty Field: Quantifying Uncertainty for Neural Radiance Fields ( http://arxiv.org/abs/2311.01815v2 ) ライセンス: Link先を確認	Jianxiong Shen and Ruijie Ren and Adria Ruiz and Francesc Moreno-Noguer	(参考訳) ニューラル・ラジアンス・フィールド(NeRF)に基づく現在の手法では、特に隠蔽されたシーンや外部シーンの内容を含む見えない領域において、予測の不確かさを定量化する能力が著しく欠如している。この制限は、モデル予測の信頼性を未知の環境でのロボット探索や計画といったタスクに考慮しなければならないロボット工学の広範な応用を妨げる。そこで本研究では,これらの不完全領域を明示的に識別する学習不完全シーン幾何に基づく3次元不確かさ場を推定する新しい手法を提案する。各カメラ線に沿って蓄積された透過率を考慮すると、不確実性フィールドは2次元不確かさを推定し、シーン内容の内外に直接投射する光に対して高い値を示す。学習面上の不確実性を定量化するために,確率的放射場をモデル化する。近年の手法と比較して、3D未確認領域と2Dレンダリングピクセルの両方で高い不確実性について明確に推論できるのは,本手法のみであることを示す。さらに,我々が設計した不確実性分野は,次の視点選択のような実世界のロボット作業に理想的に適していることを示す。 Current methods based on Neural Radiance Fields (NeRF) significantly lack the capacity to quantify uncertainty in their predictions, particularly on the unseen space including the occluded and outside scene content. This limitation hinders their extensive applications in robotics, where the reliability of model predictions has to be considered for tasks such as robotic exploration and planning in unknown environments. To address this, we propose a novel approach to estimate a 3D Uncertainty Field based on the learned incomplete scene geometry, which explicitly identifies these unseen regions. By considering the accumulated transmittance along each camera ray, our Uncertainty Field infers 2D pixel-wise uncertainty, exhibiting high values for rays directly casting towards occluded or outside the scene content. To quantify the uncertainty on the learned surface, we model a stochastic radiance field. Our experiments demonstrate that our approach is the only one that can explicitly reason about high uncertainty both on 3D unseen regions and its involved 2D rendered pixels, compared with recent methods. Furthermore, we illustrate that our designed uncertainty field is ideally suited for real-world robotics tasks, such as next-best-view selection.	
翻訳日:2023-11-30 13:17:19 公開日:2023-11-26
# Epic EHR監査ログのエントロピー推定のための自己回帰型言語モデル Autoregressive Language Models For Estimating the Entropy of Epic EHR Audit Logs ( http://arxiv.org/abs/2311.06401v3 ) ライセンス: Link先を確認	Benjamin C. Warner, Thomas Kannampallil, Seunghwan Kim	(参考訳) EHR監査ログは、臨床医の活動を捉えた、非常にきめ細かい出来事のストリームであり、電子健康記録(EHR)で臨床医のワークフローを特徴づける研究において重要な領域である。 EHR監査ログ(監査ログ)を通じてワークフローの複雑さを測定する既存のテクニックには、EHRセッションの完全な複雑さを捉えることができない時間または周波数ベースの横断的な集約が含まれる。ワークフロー内の行動シーケンスのエントロピー(不規則性)を測定するために、トランスフォーマーベースの表形式言語モデル(tabular LM)の利用を簡潔に評価し、評価したモデルを公開する。 EHR audit logs are a highly granular stream of events that capture clinician activities, and are a significant area of interest for research in characterizing clinician workflow on the electronic health record (EHR). Existing techniques to measure the complexity of workflow through EHR audit logs (audit logs) involve time- or frequency-based cross-sectional aggregations that are unable to capture the full complexity of an EHR session. We briefly evaluate the usage of a transformer-based tabular language model (tabular LM) in measuring the entropy or disorderedness of action sequences within workflow and release the evaluated models publicly.	翻訳日:2023-11-30 13:07:45 公開日:2023-11-26
# 証明可能に訓練可能な回転同変量子機械学習 Provably Trainable Rotationally Equivariant Quantum Machine Learning ( http://arxiv.org/abs/2311.05873v2 ) ライセンス: Link先を確認	Maxwell T. West, Jamie Heredge, Martin Sevior and Muhammad Usman	(参考訳) 優れた機械学習アルゴリズムを実現するために量子計算のパワーを活用することは、近年では大きな研究の焦点となっているが、量子機械学習(QML)の展望は、かなりの技術的課題によって依然として陰りを見せている。特に重要な問題は、一般的なQMLモデルは、トレーニングランドスケープにおいていわゆる不毛の台地(barren plateaus)に悩まされていることだ。これは、コスト関数の勾配が使用する量子ビット数に対して指数関数的に消失する広い領域であり、大規模なモデルは事実上訓練不能となる。この効果に対抗するための主要な戦略は、ヒルベルト空間のより小さく関連する部分集合に集中するために、データの対称性を考慮した問題固有のモデルを構築することである。本研究では、量子フーリエ変換に基づいて構築された回転同変QMLモデルの族を導入し、リー代数的なQMLモデル研究の最近の知見を活用して、我々のモデル(のサブセット)が不毛の台地を示さないことを証明する。解析結果に加えて, 回転対称性が自然に生じる, シリコン中のリン不純物の模擬走査トンネル顕微鏡画像のデータセット上で回転同変モデルを数値的に検証し, それらが実用上, 汎用的なモデルを大幅に上回ることを見出した。 Exploiting the power of quantum computation to realise superior machine learning algorithms has been a major research focus of recent years, but the prospects of quantum machine learning (QML) remain dampened by considerable technical challenges. A particularly significant issue is that generic QML models suffer from so-called barren plateaus in their training landscapes -- large regions where cost function gradients vanish exponentially in the number of qubits employed, rendering large models effectively untrainable. A leading strategy for combating this effect is to build problem-specific models which take into account the symmetries of their data in order to focus on a smaller, relevant subset of Hilbert space. In this work, we introduce a family of rotationally equivariant QML models built upon the quantum Fourier transform, and leverage recent insights from the Lie-algebraic study of QML models to prove that (a subset of) our models do not exhibit barren plateaus. In addition to our analytical results we numerically test our rotationally equivariant models on a dataset of simulated scanning tunnelling microscope images of phosphorus impurities in silicon, where rotational symmetry naturally arises, and find that they dramatically outperform their generic counterparts in practice.	翻訳日:2023-11-30 13:07:14 公開日:2023-11-26
# 非定常テスト時間適応のための層間自動重み付け Layer-wise Auto-Weighting for Non-Stationary Test-Time Adaptation ( http://arxiv.org/abs/2311.05858v3 ) ライセンス: Link先を確認	Junyoung Park, Jin Kim, Hyeongjun Kwon, Ilhoon Yoon, Kwanghoon Sohn	(参考訳) 実世界のアプリケーションにおける推論中のドメインシフトの必然性を考えると、テスト時間適応(TTA)はデプロイ後のモデル適応に不可欠である。しかし、目標分布を継続的に変化させる現実のシナリオは、破滅的な忘れ込みやエラーの蓄積といった課題を呈している。非定常領域シフトのための既存のTTAメソッドは、有効ではあるが過剰な計算負荷を発生させ、デバイス上の設定では実用的ではない。本稿では,保存や集中的適応のための層を自律的に識別する連続的および漸進的ttaの自動重み付けアルゴリズムを提案する。 fim(fisher information matrix)を活用することで,まず学習重みを設計,無関係なものを保存しつつ,ログライクな変化に関連するレイヤを選択的に重視する。そこで我々はさらに,特定の層をほぼ凍結させる指数的min-maxスケーラを提案する。これにより、忘れとエラーの蓄積を最小限に抑え、非定常目標分布に効率よく適応する。 CIFAR-10C, CIFAR-100C, ImageNet-C を用いた実験により,本手法は従来の連続的および漸進的TTA手法より優れ, 計算負荷を著しく低減し, 連続的あるいは漸進的な目標領域への適応におけるFIMベースの学習重みの重要性を強調した。 Given the inevitability of domain shifts during inference in real-world applications, test-time adaptation (TTA) is essential for model adaptation after deployment. However, the real-world scenario of continuously changing target distributions presents challenges including catastrophic forgetting and error accumulation. Existing TTA methods for non-stationary domain shifts, while effective, incur excessive computational load, making them impractical for on-device settings. In this paper, we introduce a layer-wise auto-weighting algorithm for continual and gradual TTA that autonomously identifies layers for preservation or concentrated adaptation. By leveraging the Fisher Information Matrix (FIM), we first design the learning weight to selectively focus on layers associated with log-likelihood changes while preserving unrelated ones. Then, we further propose an exponential min-max scaler to make certain layers nearly frozen while mitigating outliers. This minimizes forgetting and error accumulation, leading to efficient adaptation to non-stationary target distribution. 
Experiments on CIFAR-10C, CIFAR-100C, and ImageNet-C show our method outperforms conventional continual and gradual TTA approaches while significantly reducing computational load, highlighting the importance of FIM-based learning weight in adapting to continuously or gradually shifting target domains.	翻訳日:2023-11-30 13:06:52 公開日:2023-11-26
# Mirror: さまざまな情報抽出タスクのためのユニバーサルフレームワーク Mirror: A Universal Framework for Various Information Extraction Tasks ( http://arxiv.org/abs/2311.05419v2 ) ライセンス: Link先を確認	Tong Zhu, Junfei Ren, Zijian Yu, Mengsong Wu, Guoliang Zhang, Xiaoye Qu, Wenliang Chen, Zhefeng Wang, Baoxing Huai, Min Zhang	(参考訳) 情報抽出タスク間の知識の共有は、さまざまなデータフォーマットとタスクのバリエーションのため、常に課題となっている。一方、この相違は情報の無駄を招き、実際のシナリオにおける複雑なアプリケーション構築の困難を増す。最近の研究は、しばしば三重項抽出問題としてIEタスクを定式化している。しかし、そのようなパラダイムはマルチスパンとn-ary抽出をサポートしておらず、汎用性が低い。この目的のために、我々はIE問題を統一されたマルチスロットタプルに再編成し、様々なIEタスクのための普遍的なフレームワークであるMirrorを提案する。具体的には、既存のIEタスクをマルチスパン巡回グラフ抽出問題として再定式化し、非自己回帰グラフ復号アルゴリズムを考案し、すべてのスパンを1ステップで抽出する。このグラフ構造は非常に汎用性が高く、複雑なIEタスクだけでなく、機械読解や分類タスクもサポートする。モデル事前学習のための57のデータセットを含むコーパスを手動で構築し、8つの下流タスクにわたる30のデータセットで実験を行う。実験結果から,本モデルは良好な互換性を示し,少数ショットおよびゼロショット設定においてSOTAシステムを上回る、または競合する性能を示した。コード、モデルの重み、事前トレーニングコーパスは https://github.com/Spico197/Mirror で入手できる。 Sharing knowledge between information extraction tasks has always been a challenge due to the diverse data formats and task variations. Meanwhile, this divergence leads to information waste and increases difficulties in building complex applications in real scenarios. Recent studies often formulate IE tasks as a triplet extraction problem. However, such a paradigm does not support multi-span and n-ary extraction, leading to weak versatility. To this end, we reorganize IE problems into unified multi-slot tuples and propose a universal framework for various IE tasks, namely Mirror. Specifically, we recast existing IE tasks as a multi-span cyclic graph extraction problem and devise a non-autoregressive graph decoding algorithm to extract all spans in a single step. It is worth noting that this graph structure is incredibly versatile, and it supports not only complex IE tasks, but also machine reading comprehension and classification tasks. We manually construct a corpus containing 57 datasets for model pretraining, and conduct experiments on 30 datasets across 8 downstream tasks. 
The experimental results demonstrate that our model has decent compatibility and outperforms or reaches competitive performance with SOTA systems under few-shot and zero-shot settings. The code, model weights, and pretraining corpus are available at https://github.com/Spico197/Mirror .	翻訳日:2023-11-30 13:05:05 公開日:2023-11-26
# 抽象・推論課題におけるヒト, GPT-4, GPT-4Vの比較 Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks ( http://arxiv.org/abs/2311.09247v2 ) ライセンス: Link先を確認	Melanie Mitchell, Alessandro B. Palmarini, Arseny Moskvichev	(参考訳) GPT-4のテキストのみおよびマルチモーダル版の抽象的推論能力について,コア知識の概念による堅牢な理解と推論の評価を目的としたConceptARCベンチマーク[10]を用いて検討する。我々は、ConceptARCタスクのテキスト版において(単純なゼロショットプロンプトではなく)より詳細なワンショットプロンプトでGPT-4を評価し、さらに最も単純なタスクの画像版を用いたゼロショットおよびワンショットプロンプトでGPT-4のマルチモーダル版であるGPT-4Vを評価することで、Moskvichevら[10]の研究を拡張する。実験結果から,GPT-4のどちらのバージョンも人間に近いレベルで頑健な抽象化能力を開発していないという結論が得られた。 We explore the abstract reasoning abilities of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark [10], which is designed to evaluate robust understanding and reasoning with core-knowledge concepts. We extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed, one-shot prompting (rather than simple, zero-shot prompts) with text versions of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4, on zero- and one-shot prompts using image versions of the simplest tasks. Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.	翻訳日:2023-11-30 12:56:24 公開日:2023-11-26
# テトレーション的にコンパクトなエンタングルメント精製 Tetrationally Compact Entanglement Purification ( http://arxiv.org/abs/2311.10971v2 ) ライセンス: Link先を確認	Craig Gidney	(参考訳) 本論文は, 唯一のノイズ源が絡み合いの共有に用いる量子チャネルにあると仮定すると, 非常に少ないストレージで絡み合いを精製できることを示す。目標不忠実度(infidelity) $\epsilon$ の絡み合ったペアは、$O(\log^{\ast} \frac{1}{\epsilon})$ のストレージ空間を用いて $\tilde{O}(\log \frac{1}{\epsilon})$ 時間で作成できる。ここで $\log^{\ast}$ は反復対数である。これは、複数段階のエラー検出を用い、各段階内でブースティングを行うことで実現される。例えば、11量子ビットのノイズのないストレージがあれば、不忠実度 $1/3$ の絡み合いを不忠実度 $10^{-1000000000000000000000000000}$ の絡み合いに変換するのに十分であることを示す。 This paper shows that entanglement can be purified using very little storage, assuming the only source of noise is in the quantum channel being used to share the entanglement. Entangled pairs with a target infidelity of $\epsilon$ can be created in $\tilde{O}(\log \frac{1}{\epsilon})$ time using $O(\log^{\ast} \frac{1}{\epsilon})$ storage space, where $\log^{\ast}$ is the iterated logarithm. This is achieved by using multiple stages of error detection, with boosting within each stage. For example, the paper shows that 11 qubits of noiseless storage is enough to turn entanglement with an infidelity of $1/3$ into entanglement with an infidelity of $10^{-1000000000000000000000000000}$.	翻訳日:2023-11-30 12:45:48 公開日:2023-11-26
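The storage bound in the abstract above is stated in terms of the iterated logarithm $\log^{\ast}$. As a rough illustration of how slowly that function grows, here is a minimal Python sketch. It assumes base-2 logarithms and the common convention that $\log^{\ast}(x)$ counts how many times the logarithm is applied until the value is at most 1; the function name `iterated_log2` is hypothetical and not taken from the paper, and the sketch says nothing about the constants hidden in the paper's $O(\cdot)$ bound.

```python
import math


def iterated_log2(x: float) -> int:
    """Return log*(x): how many times log2 must be applied until the value is <= 1."""
    count = 0
    while x > 1.0:
        x = math.log2(x)  # one application of the base-2 logarithm
        count += 1
    return count


if __name__ == "__main__":
    # Print a few inputs next to their iterated logarithm.
    for value in (2, 16, 65536, 2.0 ** 1000):
        print(value, iterated_log2(value))
```

For instance, 65536 reduces as 65536, 16, 4, 2, 1, so `iterated_log2(65536)` is 4. Even astronomically large inputs map to single-digit values, which is the intuition behind needing only a handful of storage qubits.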
# ケースリポジトリ:AIアライメントのためのケースベース推論に向けて Case Repositories: Towards Case-Based Reasoning for AI Alignment ( http://arxiv.org/abs/2311.10934v3 ) ライセンス: Link先を確認	K. J. Kevin Feng, Quan Ze Chen, Inyoung Cheong, King Xia, Amy X. Zhang	(参考訳) ケーススタディは、人間の価値観に基づく複雑で曖昧な社会的問題に直面する法学、倫理学、その他多くの領域において、一般に教育の基盤となっている。 AIを実際にどのようにアライメントすべきかを考えると、同じような複雑さと曖昧さが生まれる。異なる個人やコミュニティの、膨大で多様な(そして時には矛盾する)価値観に直面するとき、AIは誰の価値観に合わせるべきか、またどのように合わせるべきか? ケースベース推論(CBR)の考え方を基礎として,一組の事例に対する判断による政策構築に焦点を当てた,Constitutional AIによるアライメントを補完するアプローチを提案する。このようなケースリポジトリを構築するプロセスを示す。 1) 特定のドメインにおける「シード」ケース(AIシステムに尋ねうる質問)の集合を収集する。 2) ドメインの専門家とのワークショップを通じて、ケースのドメイン固有の主要な次元を引き出す。 3) LLMを用いて、実際には見られないケースのバリエーションを生成する。 4) 一般市民を巻き込んでケースを判断・改善する。次に、このようなケースリポジトリが、許容される行動の根拠となる先例として直接機能すること、および個人やコミュニティがAIをめぐる道徳的推論に関与するための媒体となることの両面から、AIアライメントにどのように役立ちうるかを論じる。 Case studies commonly form the pedagogical backbone in law, ethics, and many other domains that face complex and ambiguous societal questions informed by human values. Similar complexities and ambiguities arise when we consider how AI should be aligned in practice: when faced with vast quantities of diverse (and sometimes conflicting) values from different individuals and communities, with whose values is AI to align, and how should AI do so? We propose a complementary approach to constitutional AI alignment, grounded in ideas from case-based reasoning (CBR), that focuses on the construction of policies through judgments on a set of cases. We present a process to assemble such a case repository by: 1) gathering a set of "seed" cases -- questions one may ask an AI system -- in a particular domain, 2) eliciting domain-specific key dimensions for cases through workshops with domain experts, 3) using LLMs to generate variations of cases not seen in the wild, and 4) engaging with the public to judge and improve cases. 
We then discuss how such a case repository could assist in AI alignment, both through directly acting as precedents to ground acceptable behaviors, and as a medium for individuals and communities to engage in moral reasoning around AI.	翻訳日:2023-11-30 12:45:25 公開日:2023-11-26
# OCT2 Confocal: 3D CycleGANによる網膜OCT画像の共焦点顕微鏡への変換 OCT2Confocal: 3D CycleGAN based Translation of Retinal OCT Images to Confocal Microscopy ( http://arxiv.org/abs/2311.10902v2 ) ライセンス: Link先を確認	Xin Tian, Nantheera Anantrasirichai, Lindsay Nicholson, Alin Achim	(参考訳) 光コヒーレンス断層撮影(oct)と共焦点顕微鏡は網膜イメージングにおいて重要な役割を果たす。 in vivo octは高速で非侵襲的なイメージングを提供するが、明快な問題やモーションアーティファクトによって妨げられる。生体内共焦点顕微鏡は高解像度の細胞色像を提供するが、侵襲的であり、倫理的懸念と潜在的な組織損傷をもたらす。これらのモダリティを橋渡しするために,生体共焦点顕微鏡画像へのOCTの教師なし翻訳のための3D CycleGANフレームワークを開発した。 OCT2Confocalのデータセットに適用すると、このフレームワークは3Dの医療データドメイン間で効果的に翻訳され、血管、テクスチャ、細胞の詳細を精度良くキャプチャする。これは、octの固有の3d情報を活用し、共焦点顕微鏡のリッチで詳細な色領域に変換する最初の試みである。 3D CycleGANフレームワークは、量的および質的なメトリクスを通じて評価され、圧縮可能な画像の忠実さと品質を示し、制限されたデータの制約にもかかわらず既存の手法より優れている。この非侵襲的な網膜共焦点画像の生成は、眼科における診断とモニタリング機能をさらに強化する可能性がある。 Optical coherence tomography (OCT) and confocal microscopy are pivotal in retinal imaging, each presenting unique benefits and limitations. In vivo OCT offers rapid, non-invasive imaging but can be hampered by clarity issues and motion artifacts. Ex vivo confocal microscopy provides high-resolution, cellular detailed color images but is invasive and poses ethical concerns and potential tissue damage. To bridge these modalities, we developed a 3D CycleGAN framework for unsupervised translation of in vivo OCT to ex vivo confocal microscopy images. Applied to our OCT2Confocal dataset, this framework effectively translates between 3D medical data domains, capturing vascular, textural, and cellular details with precision. This marks the first attempt to exploit the inherent 3D information of OCT and translate it into the rich, detailed color domain of confocal microscopy. Assessed through quantitative and qualitative metrics, the 3D CycleGAN framework demonstrates commendable image fidelity and quality, outperforming existing methods despite the constraints of limited data. 
This non-invasive generation of retinal confocal images has the potential to further enhance diagnostic and monitoring capabilities in ophthalmology.	翻訳日:2023-11-30 12:45:01 公開日:2023-11-26
# 視覚分野における信頼できる大規模モデル:サーベイ Trustworthy Large Models in Vision: A Survey ( http://arxiv.org/abs/2311.09680v3 ) ライセンス: Link先を確認	Ziyan Guo and Li Xu and Jun Liu	(参考訳) 大規模モデル(LM)の急速な進歩は、最近、自然言語処理(NLP)からコンピュータビジョン(CV)まで、様々な分野の深層学習に革命をもたらした。しかし、LMは強力な性能を持つが信頼できない行動のため、学界や業界によってますます批判され、信頼性の高い方法によって緊急に緩和される必要がある。 NLPにおける信頼できるLMに関する文献が豊富にあるにもかかわらず、CVにおけるLMの信頼性を特に調査する体系的な調査はいまだ存在しない。このギャップを緩和するために,本調査では視覚分野におけるLMの信頼できる利用を妨げる4つの関連する懸念を要約する。 1) 人間による誤用、 2) 脆弱性、 3) 内在的な問題、 4) 解釈可能性。それぞれの課題、対策、議論を強調することにより、この調査が読者のこの分野に対する理解を促進し、LMと人間の期待との整合を促進し、信頼できるLMが人類社会にとって災害ではなく福祉として機能することを期待する。 The rapid progress of Large Models (LMs) has recently revolutionized various fields of deep learning with remarkable results, ranging from Natural Language Processing (NLP) to Computer Vision (CV). However, LMs are increasingly challenged and criticized by academia and industry due to their powerful performance but untrustworthy behavior, which urgently needs to be alleviated by reliable methods. Despite the abundance of literature on trustworthy LMs in NLP, a systematic survey specifically delving into the trustworthiness of LMs in CV remains absent. In order to mitigate this gap, we summarize four relevant concerns that obstruct the trustworthy usage of LMs in vision in this survey, including 1) human misuse, 2) vulnerability, 3) inherent issues and 4) interpretability. By highlighting the corresponding challenges, countermeasures, and discussion in each topic, we hope this survey will facilitate readers' understanding of this field, promote alignment of LMs with human expectations and enable trustworthy LMs to serve as welfare rather than disaster for human society.	翻訳日:2023-11-30 12:41:12 公開日:2023-11-26
# 光フローのないビデオフレーム補間のためのマルチインシングルアウトネットワーク A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow ( http://arxiv.org/abs/2311.11602v2 ) ライセンス: Link先を確認	Jaemin Lee, Minseok Seo, Sangwoo Lee, Hyobin Park, Dong-Geol Choi	(参考訳) 一般に、深層学習に基づくビデオフレーム補間(vfi)法は、主に2つの入力フレーム間の動きベクトルを推定し、それを目標時間にゆがめることに焦点を当てている。このアプローチは2つの入力フレーム間の線形運動に対して顕著な性能を示すが、オクルージョンや非線形運動を扱う際の限界を示す。近年,これらの問題に対処するための生成モデルがVFIに適用されている。しかしながら、VFIは可塑性画像の生成に重点を置いているのではなく、与えられた2つのフレーム間の正確な中間フレームの予測に重点を置いているため、性能制限は継続する。本稿では,動作ベクトル推定に依存しないマルチインシングルアウト(MISO)に基づくVFI手法を提案し,オクルージョンと非線形動作を効果的にモデル化する。さらに,MISO-VFIによりビデオフレーム内の時空間相関をよりよく捉えることができる新しい動き知覚損失を導入する。 MISO-VFI法は,VFIベンチマークのVimeo90K,Middlebury,UCF101において,既存手法と比較して高い性能差を示した。 In general, deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion vectors between two input frames and warping them to the target time. While this approach has shown impressive performance for linear motion between two input frames, it exhibits limitations when dealing with occlusions and nonlinear movements. Recently, generative models have been applied to VFI to address these issues. However, as VFI is not a task focused on generating plausible images, but rather on predicting accurate intermediate frames between two given frames, performance limitations still persist. In this paper, we propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation, allowing it to effectively model occlusions and nonlinear motion. Additionally, we introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames. Our MISO-VFI method achieves state-of-the-art results on VFI benchmarks Vimeo90K, Middlebury, and UCF101, with a significant performance gap compared to existing approaches.	翻訳日:2023-11-30 12:31:46 公開日:2023-11-26
# AKConv: 任意のサンプル形状と任意の数のパラメータを持つ畳み込みカーネル AKConv: Convolutional Kernel with Arbitrary Sampled Shapes and Arbitrary Number of Parameters ( http://arxiv.org/abs/2311.11587v2 ) ライセンス: Link先を確認	Xin Zhang, Yingze Song, Tingting Song, Degang Yang, Yichen Ye, Jie Zhou and Liming Zhang	(参考訳) 畳み込み操作に基づくニューラルネットワークは、ディープラーニングの分野で顕著な成果を上げているが、標準的な畳み込み操作には2つの固有の欠陥がある。一方で、畳み込み操作はローカルウィンドウに制限され、他の場所からの情報をキャプチャできず、サンプリングされる形状も固定されている。他方で、畳み込みカーネルのサイズは k $\times$ k という固定された正方形であり、パラメータの数はサイズの2乗で増加する傾向にある。ターゲットの形状とサイズが、異なるデータセットや異なる場所で異なることは明らかである。固定されたサンプル形状と正方形を持つ畳み込みカーネルは、ターゲットの変化にうまく適応しない。上記の問題に対して、本研究では Alterable Kernel Convolution (AKConv) を検討する。これは畳み込みカーネルに任意の数のパラメータと任意のサンプル形状を与え、ネットワークオーバヘッドとパフォーマンスのトレードオフのためのよりリッチなオプションを提供する。 AKConvでは、新しい座標生成アルゴリズムを用いて任意の大きさの畳み込みカーネルの初期位置を定義する。ターゲットの変化に適応するため,各位置におけるサンプルの形状を調整するためのオフセットを導入する。さらに、同じ大きさと異なる初期サンプル形状のAKConvを用いてニューラルネットワークの効果について検討する。 AKConvは、不規則な畳み込み操作による効率的な特徴抽出のプロセスを完了し、畳み込みサンプリング形状に対するさらなる探索オプションを提供する。代表的なデータセットCOCO2017、VOC 7+12、VisDrone-DET2021のオブジェクト検出実験は、AKConvの利点を十分に示している。 AKConvは、ネットワーク性能を改善するために畳み込み操作を置き換えるためのプラグアンドプレイ畳み込み操作として使用できる。関連するタスクのコードは https://github.com/CV-ZhangXin/AKConv で確認できる。 Neural networks based on convolutional operations have achieved remarkable results in the field of deep learning, but there are two inherent flaws in standard convolutional operations. On the one hand, the convolution operation is confined to a local window and cannot capture information from other locations, and its sampled shape is fixed. On the other hand, the size of the convolutional kernel is fixed to k $\times$ k, which is a fixed square shape, and the number of parameters tends to grow quadratically with size. It is obvious that the shape and size of targets are various in different datasets and at different locations. Convolutional kernels with fixed sample shapes and squares do not adapt well to changing targets. 
In response to the above questions, the Alterable Kernel Convolution (AKConv) is explored in this work, which gives the convolution kernel an arbitrary number of parameters and arbitrary sampled shapes to provide richer options for the trade-off between network overhead and performance. In AKConv, we define initial positions for convolutional kernels of arbitrary size by means of a new coordinate generation algorithm. To adapt to changes for targets, we introduce offsets to adjust the shape of the samples at each position. Moreover, we explore the effect of the neural network by using the AKConv with the same size and different initial sampled shapes. AKConv completes the process of efficient feature extraction by irregular convolutional operations and brings more exploration options for convolutional sampling shapes. Object detection experiments on representative datasets COCO2017, VOC 7+12 and VisDrone-DET2021 fully demonstrate the advantages of AKConv. AKConv can be used as a plug-and-play convolutional operation to replace convolutional operations to improve network performance. The code for the relevant tasks can be found at https://github.com/CV-ZhangXin/AKConv.	翻訳日:2023-11-30 12:31:25 公開日:2023-11-26
# 世論を明らかにする:インドにおける新型コロナウイルスワクチンの機械学習による感情分析 Unveiling Public Perceptions: Machine Learning-Based Sentiment Analysis of COVID-19 Vaccines in India ( http://arxiv.org/abs/2311.11435v2 ) ライセンス: Link先を確認	Milind Gupta and Abhishek Kaushik	(参考訳) 2020年3月、新型コロナウイルス感染症(COVID-19)がほぼすべての国に広がる中、世界保健機関(WHO)は世界的なパンデミックを宣言した。 2021年半ばまでに、インドはコビシールド、コバクシン、スプートニクの3つのワクチンを導入した。インドのような人口密度の高い国でワクチン接種を成功させるためには、大衆の感情を理解することが不可欠だった。ソーシャルメディア、特に4億3000万人以上のユーザーを抱えるRedditは、情報を広める上で重要な役割を果たした。この研究では、データマイニング技術を用いてRedditのデータを分析し、新型コロナウイルスワクチンに対するインド人の感情を測定する。 PythonのTextBlobライブラリを使って、コメントに注釈を付け、一般的な感情を評価する。結果として、インドのRedditユーザーのほとんどが予防接種について中立的な姿勢を示しており、これは人口のかなりの部分に予防接種を行おうとするインド政府の取り組みにとって課題となっている。 In March 2020, the World Health Organisation declared COVID-19 a global pandemic as it spread to nearly every country. By mid-2021, India had introduced three vaccines: Covishield, Covaxin, and Sputnik. To ensure successful vaccination in a densely populated country like India, understanding public sentiment was crucial. Social media, particularly Reddit with over 430 million users, played a vital role in disseminating information. This study employs data mining techniques to analyze Reddit data and gauge Indian sentiments towards COVID-19 vaccines. Using Python's TextBlob library, comments are annotated to assess general sentiments. Results show that most Reddit users in India expressed neutrality about vaccination, posing a challenge for the Indian government's efforts to vaccinate a significant portion of the population.	翻訳日:2023-11-30 12:30:11 公開日:2023-11-26
# LION : デュアルレベルビジュアル知識を用いたマルチモーダル大言語モデルの構築 LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ( http://arxiv.org/abs/2311.11860v2 ) ライセンス: Link先を確認	Gongwei Chen, Leyang Shen, Rui Shao, Xiang Deng, Liqiang Nie	(参考訳) MLLM(Multimodal Large Language Models)は、マルチモーダル信号の知覚と理解が可能なLLMを提供する。しかし、既存のmllmの多くは、粗い画像テキストペアに事前学習された視覚エンコーダを主に採用しており、視覚知識の抽出と推論が不十分である。この問題に対処するために,2段階の視覚的知識を注入することでMLLMを増強するデュアルレベルvIsual knOwledge eNhanced Multimodal Large Language Model (LION)を考案した。 1)細粒度空間認識視覚知識の進歩的導入我々は,領域レベルの視覚言語(VL)タスクと連携した視覚アグリゲータを設計し,細粒度空間認識視覚知識をMLLMに組み込む。組込み時の画像レベルと領域レベルのVLタスク間の衝突を軽減するため,適応の混合によるステージワイドな指導学習戦略を考案した。このプログレッシブな組み込み方式は、これらの2種類のVLタスク間の相互促進に寄与する。 2)ハイレベルな視覚的証拠のソフトプロンプト。多様な画像タグを活用することで,MLLMの高度な意味的視覚的エビデンスを実現する。予測タグの不完全による潜在的な影響を軽減するため,学習可能なトークンをテキスト命令に組み込むことにより,ソフトプロンプト手法を提案する。複数のマルチモーダルベンチマークに関する総合的な実験は、我々のモデルの優位性を示している(例:VSRでの5%精度の改善、InstructBLIP上のTextCapsでの3%CIDEr、Cosmos-2上のRefCOCOgでの5%精度)。 Multimodal Large Language Models (MLLMs) have endowed LLMs with the ability to perceive and understand multi-modal signals. However, most of the existing MLLMs mainly adopt vision encoders pretrained on coarsely aligned image-text pairs, leading to insufficient extraction and reasoning of visual knowledge. To address this issue, we devise a dual-Level vIsual knOwledge eNhanced Multimodal Large Language Model (LION), which empowers the MLLM by injecting visual knowledge in two levels. 1) Progressive incorporation of fine-grained spatial-aware visual knowledge. We design a vision aggregator cooperated with region-level vision-language (VL) tasks to incorporate fine-grained spatial-aware visual knowledge into the MLLM. To alleviate the conflict between image-level and region-level VL tasks during incorporation, we devise a dedicated stage-wise instruction-tuning strategy with mixture-of-adapters. This progressive incorporation scheme contributes to the mutual promotion between these two kinds of VL tasks. 
2) Soft prompting of high-level semantic visual evidence. We facilitate the MLLM with high-level semantic visual evidence by leveraging diverse image tags. To mitigate the potential influence caused by imperfect predicted tags, we propose a soft prompting method by embedding a learnable token into the tailored text instruction. Comprehensive experiments on several multi-modal benchmarks demonstrate the superiority of our model (e.g., improvement of 5% accuracy on VSR and 3% CIDEr on TextCaps over InstructBLIP, 5% accuracy on RefCOCOg over Kosmos-2).	翻訳日:2023-11-30 12:22:32 公開日:2023-11-26
# 変分探索モジュールVEM:地理空間モデリングとAIワークフローのためのクラウドネイティブ最適化と検証ツール Variational Exploration Module VEM: A Cloud-Native Optimization and Validation Tool for Geospatial Modeling and AI Workflows ( http://arxiv.org/abs/2311.16196v1 ) ライセンス: Link先を確認	Julian Kuehnert (1), Hiwot Tadesse (1), Chris Dearden (2), Rosie Lickorish (3), Paolo Fraccaro (3), Anne Jones (3), Blair Edwards (3), Sekou L. Remy (1), Peter Melling (4), Tim Culmer (4) ((1) IBM Research, Nairobi, Kenya, (2) STFC Hartree Centre, Warrington, UK, (3) IBM Research, Daresbury, UK, (4) Riskaware Ltd., Bristol, UK)	(参考訳) 地理空間観測と計算モデルが組み合わさって、我々の環境の物理的システムを理解し、社会的な害を軽減するためのベストプラクティスの設計を可能にしている。クラウドベースのデプロイメントは、これらのモデリングとAIワークフローのスケールアップに役立つ。しかし、実践者が堅牢な結論を出すためには、モデルチューニングとテストが不可欠であり、モデル入力変数のバリエーションを伴うリソース集約的なプロセスである。本研究では,ワークフロー実行のオーケストレーションとベイジアンおよび機械学習に基づくモデル動作解析手法を用いて,クラウドにデプロイされたモデリングワークフローの最適化と検証を容易にする変分探索モジュールを開発した。ユーザ設定は、マルチエージェント環境で多様なサンプリング戦略を組み合わせることができる。モデルに依存しないモジュールの柔軟性と堅牢性は実世界のアプリケーションを用いて実証される。 Geospatial observations combined with computational models have become key to understanding the physical systems of our environment and enable the design of best practices to reduce societal harm. Cloud-based deployments help to scale up these modeling and AI workflows. Yet, for practitioners to make robust conclusions, model tuning and testing is crucial, a resource intensive process which involves the variation of model input variables. We have developed the Variational Exploration Module which facilitates the optimization and validation of modeling workflows deployed in the cloud by orchestrating workflow executions and using Bayesian and machine learning-based methods to analyze model behavior. User configurations allow the combination of diverse sampling strategies in multi-agent environments. The flexibility and robustness of the model-agnostic module is demonstrated using real-world applications.	翻訳日:2023-11-29 21:44:44 公開日:2023-11-26
# 個別化された早期・適時診断のための基礎的枠組みと方法論 A Foundational Framework and Methodology for Personalized Early and Timely Diagnosis ( http://arxiv.org/abs/2311.16195v1 ) ライセンス: Link先を確認	Tim Schubert, Richard W Peck, Alexander Gimson, Camelia Davtyan, Mihaela van der Schaar	(参考訳) 病気の早期診断は、より良い治療オプションを可能にし、長期生存と生活の質を改善し、全体的なコストを下げることで、医療の深い変革の可能性を秘めている。医療用ビッグデータの出現、診断検査の進歩、および機械学習と統計学の進歩により、早期またはタイムリーな診断が手の届くところにあるように思われる。早期診断の研究は、個々の診断経路を最適化する可能性をしばしば無視する。パーソナライズされた早期診断を実現するためには, 診断過程を明確化し, 個々の患者の固有の特性を踏まえて, 各種診断検査の時間依存的な価値を体系的に同定する基盤的枠組みが必要である。本稿では,早期診断とタイムリー診断のための最初の基礎的枠組みを提案する。診断プロセスを概説する意思決定論的アプローチに基づいており、最適なパーソナライズされた診断パスを推定するために機械学習と統計的方法論を統合する。提案するフレームワークと,おそらく他のフレームワークを説明するために,本質的な定義を提供する。基礎的なフレームワークの開発は、いくつかの理由から必要である。 1) 形式化は、意思決定支援ツールの開発を明確にする。 2) 観察された情報を、将来の患者の軌跡の推定で補完できる。 3) 反事実的な診断パスの純利益と関連する不確実性を個人ごとにモデル化できる。 4) 「早期」および「適時」の診断を明確に定義できる。 5) パーソナライズされた早期診断、その結果としての健康アウトカム、および発生コストへの影響の観点から、技術の価値を評価するためのメカニズムが生まれる。最後に、この基盤となる枠組みが、待望のタイムリーな診断と介入の可能性を解き放ち、患者の成果を改善し、医療システムにより高い費用効果をもたらすことを期待する。 Early diagnosis of diseases holds the potential for deep transformation in healthcare by enabling better treatment options, improving long-term survival and quality of life, and reducing overall cost. With the advent of medical big data, advances in diagnostic tests as well as in machine learning and statistics, early or timely diagnosis seems within reach. Early diagnosis research often neglects the potential for optimizing individual diagnostic paths. To enable personalized early diagnosis, a foundational framework is needed that delineates the diagnosis process and systematically identifies the time-dependent value of various diagnostic tests for an individual patient given their unique characteristics. Here, we propose the first foundational framework for early and timely diagnosis. It builds on decision-theoretic approaches to outline the diagnosis process and integrates machine learning and statistical methodology for estimating the optimal personalized diagnostic path. 
To describe the proposed framework as well as possibly other frameworks, we provide essential definitions. The development of a foundational framework is necessary for several reasons: 1) formalism provides clarity for the development of decision support tools; 2) observed information can be complemented with estimates of the future patient trajectory; 3) the net benefit of counterfactual diagnostic paths and associated uncertainties can be modeled for individuals; 4) 'early' and 'timely' diagnosis can be clearly defined; 5) a mechanism emerges for assessing the value of technologies in terms of their impact on personalized early diagnosis, resulting health outcomes and incurred costs. Finally, we hope that this foundational framework will unlock the long-awaited potential of timely diagnosis and intervention, leading to improved outcomes for patients and higher cost-effectiveness for healthcare systems.	翻訳日:2023-11-29 21:44:31 公開日:2023-11-26
# BadCLIP:CLIPのバックドア攻撃のためのトリガー対応プロンプト学習 BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP ( http://arxiv.org/abs/2311.16194v1 ) ライセンス: Link先を確認	Jiawang Bai, Kuofeng Gao, Shaobo Min, Shu-Tao Xia, Zhifeng Li, Wei Liu	(参考訳) CLIPとして知られるコントラストビジョンランゲージ事前トレーニングは、下流の画像認識タスクに対処する上で有望な効果を示している。しかし、最近の研究により、CLIPモデルには下流タスク指向のバックドアを埋め込めることが明らかになった。下流タスクでは、被害者モデルはクリーンなサンプルでうまく機能するが、特定のトリガーが存在するたびに特定のターゲットクラスを予測する。バックドアを注入するには、既存の攻撃は、トレーニング済みのCLIPモデル全体を悪質に微調整するために大量の追加データに依存するため、データ制限のシナリオには適用できない。本研究は,学習可能なプロンプトの最近の成功に動機づけられ,プロンプト学習段階でCLIPモデルにバックドアを注入することでこの問題に対処する。 BadCLIP という手法は,CLIP に対するバックドア攻撃における新しい効果的な機構,すなわち画像エンコーダとテキストエンコーダの両方にトリガーを作用させる機構に基づいて構築されている。画像に適用される学習可能なトリガーとトリガー対応コンテキストジェネレータで構成されており、トリガーはトリガー対応プロンプトを通じてテキスト特徴を変更でき、これにより強力で一般化可能な攻撃をもたらす。 11のデータセットで実施された大規模な実験では、BadCLIPのクリーンな精度は先進的なプロンプト学習手法と同程度であり、ほとんどの場合、攻撃成功率は99%を超える。 BadCLIPはまた、未知のクラスにも一般化可能で、クロスデータセットとクロスドメイン設定の下で強力な一般化能力を示している。 Contrastive Vision-Language Pre-training, known as CLIP, has shown promising effectiveness in addressing downstream image recognition tasks. However, recent works revealed that the CLIP model can be implanted with a downstream-oriented backdoor. On downstream tasks, one victim model performs well on clean samples but predicts a specific target class whenever a specific trigger is present. For injecting a backdoor, existing attacks depend on a large amount of additional data to maliciously fine-tune the entire pre-trained CLIP model, which makes them inapplicable to data-limited scenarios. In this work, motivated by the recent success of learnable prompts, we address this problem by injecting a backdoor into the CLIP model in the prompt learning stage. Our method named BadCLIP is built on a novel and effective mechanism in backdoor attacks on CLIP, i.e., influencing both the image and text encoders with the trigger. 
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts, resulting in a powerful and generalizable attack. Extensive experiments conducted on 11 datasets verify that the clean accuracy of BadCLIP is similar to that of advanced prompt learning methods and the attack success rate is higher than 99% in most cases. BadCLIP is also generalizable to unseen classes, and shows a strong generalization capability under cross-dataset and cross-domain settings.	翻訳日:2023-11-29 21:44:04 公開日:2023-11-26
# 人工知能における知識獲得への学生の関心 Students' interest in knowledge acquisition in Artificial Intelligence ( http://arxiv.org/abs/2311.16193v1 ) ライセンス: Link先を確認	Manuela-Andreea Petrescu, Emilia-Loredana Pop and Tudor-Dan Mihoc	(参考訳) 本研究では,人工知能コースに関する学生の期待と視点を考察し,分析した。コンピュータサイエンス専攻に在籍する200人のうち58人の学部生から匿名で回答を収集した。回答はテーマ分析を用いて分析・解釈され、人工知能という学習テーマに関する学生の関心と、魅力的な側面および魅力のない側面を明らかにした。その流行性、応用可能性、その主題に対する情熱と関心、将来の成長の可能性、高い給与のために、学生は人工知能に興味を持っていると結論づけた。しかし、学生の期待は主に人工知能分野における中レベルの知識の獲得に関連しており、男性は女性よりも高度なスキルの獲得に関心があるようである。学生が楽しめなかった最も一般的な部分は、人工知能で使われる数学的側面であった。その一部(小さなグループ)は、人工知能が否定的な目的のために非倫理的な方法で使用されうることも認識していた。また,本研究ではデータベース・コースとの簡単な比較も行っており,そちらでは学生は中レベルの知識の習得にそれほど熱心でも関心があるわけでもなく,その関心はDBの利用と基本的な情報に関するものであった。 Some students' expectations and points of view related to the Artificial Intelligence course are explored and analyzed in this study. We anonymously collected answers from 58 undergraduate students out of 200 enrolled in the Computer Science specialization. The answers were analysed and interpreted using thematic analysis to find out their interests and attractive and unattractive aspects related to the Artificial Intelligence study topic. We concluded that students are interested in Artificial Intelligence due to its trendiness, applicability, their passion and interest in the subject, the potential for future growth, and high salaries. However, the students' expectations were mainly related to achieving medium knowledge in the Artificial Intelligence field, and men seem to be more interested in acquiring high-level skills than women. The most common part that wasn't enjoyed by the students was the mathematical aspect used in Artificial Intelligence. Some of them (a small group) were also aware of the Artificial Intelligence potential which could be used in an unethical manner for negative purposes. Our study also provides a short comparison to the Databases course, in which students were not that passionate or interested in achieving medium knowledge; their interest was related to DB usage and basic information.	
翻訳日:2023-11-29 21:43:37 公開日:2023-11-26
# 軸受の残存耐用寿命予測のための複数入力自己回帰モデルの利用 Utilizing Multiple Inputs Autoregressive Models for Bearing Remaining Useful Life Prediction ( http://arxiv.org/abs/2311.16192v1 ) ライセンス: Link先を確認	Junliang Wang, Qinghua Zhang, Guanhua Zhu, Guoxi Sun	(参考訳) 転がり軸受の残存耐用寿命(RUL)の正確な予測は工業生産において重要であるが、既存のモデルはすべての振動信号パターンを完全に処理できないため、限られた一般化能力に苦慮することが多い。軸受のRUL予測において,この課題に対処する新しい多入力自己回帰モデルを提案する。提案手法は, 振動信号と過去に予測されたHealth Indicator (HI) 値を統合し, 特徴融合を利用して現在のウィンドウのHI値を出力する。自己回帰反復により、モデルはグローバルな受容野を獲得し、一般化の限界を効果的に克服する。さらに,自己回帰モデルにおける誤差の蓄積を軽減するために,セグメント化手法と複数のトレーニングイテレーションを取り入れた。 PHM2012データセットの実証評価では, 同様の自己回帰アプローチを用いた他のバックボーンネットワークと比較して, 二乗平均平方根誤差(RMSE)とScoreが有意に低いことが示されている。特に、ラベル値を入力として使用する従来の自己回帰モデルや非自己回帰ネットワークよりも優れており、RMSEとScoreの指標において顕著なリードを持つ優れた一般化能力を示している。 Accurate prediction of the Remaining Useful Life (RUL) of rolling bearings is crucial in industrial production, yet existing models often struggle with limited generalization capabilities due to their inability to fully process all vibration signal patterns. We introduce a novel multi-input autoregressive model to address this challenge in RUL prediction for bearings. Our approach uniquely integrates vibration signals with previously predicted Health Indicator (HI) values, employing feature fusion to output current window HI values. Through autoregressive iterations, the model attains a global receptive field, effectively overcoming the limitations in generalization. Furthermore, we innovatively incorporate a segmentation method and multiple training iterations to mitigate error accumulation in autoregressive models. Empirical evaluation on the PHM2012 dataset demonstrates that our model, compared to other backbone networks using similar autoregressive approaches, achieves significantly lower Root Mean Square Error (RMSE) and Score. Notably, it outperforms traditional autoregressive models that use label values as inputs and non-autoregressive networks, showing superior generalization abilities with a marked lead in RMSE and Score metrics.	
翻訳日:2023-11-29 21:43:19 公開日:2023-11-26
# MACE:周波数領域における多重パターン調整および効率的な異常検出手法 MACE: A Multi-pattern Accommodated and Efficient Anomaly Detection Method in the Frequency Domain ( http://arxiv.org/abs/2311.16191v1 ) ライセンス: Link先を確認	Feiyi Chen, Yingying zhang, Zhen Qin, Lunting Fan, Renhe Jiang, Yuxuan Liang, Qingsong Wen, Shuiguang Deng	(参考訳) 異常検出は、クラウドシステムの堅牢性を大幅に向上させる。ニューラルネットワークベースの手法は、最近、強力なアドバンテージを示しているが、クラウド環境では実用的な課題に直面している。すなわち、各サービスに固有のモデルを維持することの非現実性と、統一モデルで多様な正常パターンを扱う能力の限界との矛盾、ならびに大量トラフィックのリアルタイム処理や短期的な異常に対する検出感度の問題である。そこで本研究では、時系列異常検出のための周波数領域におけるマルチパターン調整および効率的な異常検出手法であるMACEを提案する。そこには3つの新しい特徴がある。 (i) 多様な正常パターンの扱いに優れるパターン抽出機構。これにより、データサンプル自体にのみ注目するのではなく、データサンプルとそのサービスの正常パターンとの相関を調べることで、異常を識別することができる。 (ii) 時間領域における短期異常を増幅し、周波数領域における異常の再構成を阻害する双対的畳み込み機構。これにより、異常と正常との再構成誤差の差を拡大し、異常検出を容易にする。 (iii) 周波数領域のスパーシティと並列性を利用して、モデル効率を向上させる。フーリエ基底の戦略的に選択された部分集合を使うことは、完全なスペクトルを使う場合と比べて、計算オーバーヘッドを減少させるだけでなく、異常の識別にも有益であることを理論的および実験的に証明した。さらに、広範な実験により、多様な正常パターンを統一モデルで処理するMACEの有効性と、高い効率で最先端の性能を達成することを示す。 Anomaly detection significantly enhances the robustness of cloud systems. While neural network-based methods have recently demonstrated strong advantages, they encounter practical challenges in cloud environments: the contradiction between the impracticality of maintaining a unique model for each service and the limited ability of dealing with diverse normal patterns by a unified model, as well as issues with handling heavy traffic in real time and short-term anomaly detection sensitivity. Thus, we propose MACE, a Multi-pattern Accommodated and efficient Anomaly detection method in the frequency domain for time series anomaly detection. 
There are three novel characteristics of it: (i) a pattern extraction mechanism excelling at handling diverse normal patterns, which enables the model to identify anomalies by examining the correlation between the data sample and its service normal pattern, instead of solely focusing on the data sample itself; (ii) a dualistic convolution mechanism that amplifies short-term anomalies in the time domain and hinders the reconstruction of anomalies in the frequency domain, which enlarges the reconstruction error disparity between anomaly and normality and facilitates anomaly detection; (iii) leveraging the sparsity and parallelism of frequency domain to enhance model efficiency. We theoretically and experimentally prove that using a strategically selected subset of Fourier bases not only reduces computational overhead but also helps distinguish anomalies, compared to using the complete spectrum. Moreover, extensive experiments demonstrate MACE's effectiveness in handling diverse normal patterns with a unified model, and it achieves state-of-the-art performance with high efficiency.	翻訳日:2023-11-29 21:42:57 公開日:2023-11-26
# Q-Pilot:フライングアンシラを用いたフィールドプログラマブル量子アレイコンパイル Q-Pilot: Field Programmable Quantum Array Compilation with Flying Ancillas ( http://arxiv.org/abs/2311.16190v1 ) ライセンス: Link先を確認	Hanrui Wang and Bochen Tan and Pengyu Liu and Yilian Liu and Jiaqi Gu and Jason Cong and Song Han	(参考訳) 中性原子アレイは量子コンピューティングにとって有望なプラットフォームとなっており、特に原子を移動できるという独自の能力を持つ field programmable qubit array (FPQA) が注目されている。この機能は実行中のqubit接続の動的変更を可能にし、長距離ゲートの実行コストを削減し、並列性を改善する。しかし、この柔軟性の追加は、回路コンパイルに新たな課題をもたらす。 FPGAの配置とルーティング戦略に着想を得て,データキュービット間の2キュービットゲートのルーティングに可動原子を用いながら,すべてのデータキュービットを固定原子にマッピングすることを提案する。フライングアンシラ(flying ancillas)と名付けたこれらの移動原子は、ancilla qubitsとして機能し、実行中に動的に生成され、リサイクルされる。本稿では,回路の並列性を最大化するためにフライングアンシラを用いるFPQA用スケーラブルコンパイラQ-Pilotを提示する。量子シミュレーションと量子近似最適化アルゴリズム(QAOA)という2つの重要な量子応用について、ドメイン固有のルーティング戦略を考案する。超伝導デバイスや固定原子配列などの代替技術と比較して、Q-PilotはFPQAの柔軟性を効果的に活用し、100キュービットのランダム回路、量子シミュレーション回路、QAOA回路の回路深度でそれぞれ1.4$\times$, 27.7$\times$, 6.3$\times$の削減を実現している。 Neutral atom arrays have become a promising platform for quantum computing, especially the field programmable qubit array (FPQA) endowed with the unique capability of atom movement. This feature allows dynamic alterations in qubit connectivity during runtime, which can reduce the cost of executing long-range gates and improve parallelism. However, this added flexibility introduces new challenges in circuit compilation. Inspired by the placement and routing strategies for FPGAs, we propose to map all data qubits to fixed atoms while utilizing movable atoms to route for 2-qubit gates between data qubits. Coined flying ancillas, these mobile atoms function as ancilla qubits, dynamically generated and recycled during execution. We present Q-Pilot, a scalable compiler for FPQA employing flying ancillas to maximize circuit parallelism. For two important quantum applications, quantum simulation and the Quantum Approximate Optimization Algorithm (QAOA), we devise domain-specific routing strategies. 
In comparison to alternative technologies such as superconducting devices or fixed atom arrays, Q-Pilot effectively harnesses the flexibility of FPQA, achieving reductions of 1.4$\times$, 27.7$\times$, and 6.3$\times$ in circuit depth for 100-qubit random, quantum simulation, and QAOA circuits, respectively.	翻訳日:2023-11-29 21:42:28 公開日:2023-11-26
# 大規模視覚言語モデルを用いた人間と物体のインタラクション検出のための人間中心視覚手がかりの生成 Generating Human-Centric Visual Cues for Human-Object Interaction Detection via Large Vision-Language Models ( http://arxiv.org/abs/2311.16475v1 ) ライセンス: Link先を確認	Yu-Wei Zhan, Fan Liu, Xin Luo, Liqiang Nie, Xin-Shun Xu, Mohan Kankanhalli	(参考訳) human-object interaction (HOI) 検出は、人間とオブジェクトのペアを検出し、それらの相互作用を予測することを目的としている。しかし、人間の行動の複雑さと、これらの相互作用が起こる文脈の多様さが、この課題を困難にしている。直感的には、関与する参加者、ボディランゲージ、周囲の環境など、人間中心の視覚的手がかりは、これらの相互作用を形作る上で重要な役割を果たす。これらの手がかりは、特に未知の相互作用の解釈に不可欠である。本稿では,人間に関する複数の観点から画像内の人間中心の視覚的手がかりを生成するために,VLMに与える3つのプロンプトを提案する。このような豊富な人間中心の視覚的手がかり(Human-Centric Visual Cues)を活かすために, HCVC という新しいHOI検出手法を提案する。特に,視覚的手がかりの特徴をインスタンスデコーダとインタラクションデコーダに組み込むために,マルチタワーアーキテクチャを備えたトランスフォーマーベースのマルチモーダル融合モジュールを開発した。広範な実験と解析により,生成した人間中心の視覚的手がかりを用いたHOI検出の有効性が検証された。特に, 実験結果から, 広く使用されている2つのデータセットにおいて,提案モデルが既存の最先端手法よりも優れていることが示された。 Human-object interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions. However, the complexity of human behavior and the diverse contexts in which these interactions occur make it challenging. Intuitively, human-centric visual cues, such as the involved participants, the body language, and the surrounding environment, play crucial roles in shaping these interactions. These cues are particularly vital in interpreting unseen interactions. In this paper, we propose three prompts with VLM to generate human-centric visual cues within an image from multiple perspectives of humans. To capitalize on these rich Human-Centric Visual Cues, we propose a novel approach named HCVC for HOI detection. Particularly, we develop a transformer-based multimodal fusion module with multitower architecture to integrate visual cue features into the instance and interaction decoders. Our extensive experiments and analysis validate the efficacy of leveraging the generated human-centric visual cues for HOI detection. 
Notably, the experimental results indicate the superiority of the proposed model over the existing state-of-the-art methods on two widely used datasets.	翻訳日:2023-11-29 20:27:42 公開日:2023-11-26
# GS-IR:逆レンダリングのための3次元ガウススプラッティング GS-IR: 3D Gaussian Splatting for Inverse Rendering ( http://arxiv.org/abs/2311.16473v1 ) ライセンス: Link先を確認	Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia	(参考訳) 本稿では,3次元ガウススプラッティング(GS)に基づく新しい逆レンダリング手法であるGS-IRを提案する。これは前方マッピングによるボリュームレンダリングを活用し,フォトリアリスティックな新規ビュー合成と再照明を実現する。暗黙的なニューラル表現とボリュームレンダリング(例えば、NeRF)を用い、表現力の低さと計算複雑性の高さに悩まされる従来の研究とは異なり、新規ビュー合成において最高水準の性能を持つ表現であるGSを拡張し、未知の照明条件下で撮影されたマルチビュー画像からシーン形状、表面マテリアル、環境照明を推定する。 GSを逆レンダリングに導入する場合、主な問題は2つある。 1)GSは,妥当な法線の生成をネイティブにはサポートしない。 2)前方マッピング(ラスタ化やスプラッティングなど)は後方マッピング(レイトレーシングなど)のようにオクルージョンを追跡することはできない。これらの課題に対処するため,GS-IRは,法線推定のための深度微分に基づく正則化と,間接照明をモデル化するためのベイキングに基づくオクルージョンを組み込んだ効率的な最適化手法を提案する。柔軟で表現力のあるGS表現により、高速かつコンパクトな形状再構成、フォトリアリスティックな新規ビュー合成、効果的な物理ベースレンダリングを実現する。本手法は,様々な難しいシーンでの定性的および定量的評価を通じて,ベースライン手法よりも優れていることを示す。 We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (e.g. NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions. There are two main problems when introducing GS to inverse rendering: 1) GS does not support producing plausible normals natively; 2) forward mapping (e.g. rasterization and splatting) cannot trace the occlusion like backward mapping (e.g. ray tracing). To address these challenges, our GS-IR proposes an efficient optimization scheme that incorporates a depth-derivation-based regularization for normal estimation and a baking-based occlusion to model indirect lighting. The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering. 
We demonstrate the superiority of our method over baseline methods through qualitative and quantitative evaluations on various challenging scenes.	翻訳日:2023-11-29 20:27:07 公開日:2023-11-26
# Eye vs. AI: 映像記憶性における人間の視線とモデル注意 Eye vs. AI: Human Gaze and Model Attention in Video Memorability ( http://arxiv.org/abs/2311.16484v1 ) ライセンス: Link先を確認	Prajneya Kumar, Eshika Khandelwal, Makarand Tapaswi, Vishnu Sreekumar	(参考訳) ビデオの記憶可能性を決定する要因を理解することは、教育技術や広告などの分野で重要な応用を持つ。この目的に向けて,映像の記憶可能性を支える意味的および時間的注意機構について検討する。本研究では,大規模な自然映像データセットにおける映像記憶性予測においてSoTA性能に匹敵する,時空間的注意を持つTransformerベースのモデルを提案する。さらに重要なのは、自己注意パターンが、モデルが記憶可能性を予測するためにどこを見ているかを示してくれることである。人間が映像記憶課題を行う小規模な視線追跡実験で収集した人間の視線固定密度マップと、モデルの注意とを比較した。定量的なサリエンシー指標は、モデルの注意と人間の視線が類似したパターンに従うことを示している。さらに, パノプティックセグメンテーションにより,モデルと人間がthingクラスにより多く注意を向けていることが確認された一方で, 注意が増加または減少したstuffクラスは記憶可能性スコアが高い傾向にある。また,本モデルが人間に見られる時間的注意パターンを模倣し,初期フレームをより重視することも観察した。 Understanding the factors that determine video memorability has important applications in areas such as educational technology and advertising. Towards this goal, we investigate the semantic and temporal attention mechanisms underlying video memorability. We propose a Transformer-based model with spatio-temporal attention that matches SoTA performance on video memorability prediction on a large naturalistic video dataset. More importantly, the self-attention patterns show us where the model looks to predict memorability. We compare model attention against human gaze fixation density maps collected through a small-scale eye-tracking experiment where humans perform a video memory task. Quantitative saliency metrics show that the model attention and human gaze follow similar patterns. Furthermore, while panoptic segmentation confirms that the model and humans attend more to thing classes, stuff classes that receive increased/decreased attention tend to have higher memorability scores. We also observe that the model assigns greater importance to the initial frames, mimicking temporal attention patterns found in humans.	翻訳日:2023-11-29 20:13:14 公開日:2023-11-26
# PISA: ポイントクラウドベースのインストラクションシーン拡張 PISA: Point-cloud-based Instructed Scene Augmentation ( http://arxiv.org/abs/2311.16501v1 ) ライセンス: Link先を確認	Yiyang Luo and Ke Lin	(参考訳) 屋内シーン拡張は、拡張現実と仮想現実への応用を持つコンピュータビジョン分野の新たなトピックとなっている。しかし、既存のシーン拡張手法の多くは、事前構築されたオブジェクトデータベースと、配置先として与えられた位置を必要とする。本稿では,テキスト命令を条件として,周囲と整合した点群オブジェクトを生成可能な,最初のエンドツーエンドマルチモーダルディープニューラルネットワークを提案する。我々のモデルは、クエリとポイントクラウドの入力に基づいて、適切な位置にふさわしいオブジェクトを生成し、これにより、これまでにないオブジェクトレイアウトを含む新しいシナリオを作成することができる。事前に保存されたCADモデルのデータベースはもはや不要である。生成モデルとしてPoint-Eを用い,曖昧な言語記述による偽陰性問題を緩和するために,定量化位置予測とTop-K推定を含む手法を導入する。さらに,生成されたオブジェクトの多様性,指示の有効性,定量的な指標の結果を示すことでモデルの能力を評価し,本モデルが現実的な屋内オブジェクトを生成できることを総合的に示す。さらに詳細な評価のために、モデルによって生成されたシーンの品質を評価するためのメトリクスとして、視覚的グラウンディングも取り入れている。 Indoor scene augmentation has become an emerging topic in the field of computer vision with applications in augmented and virtual reality. However, existing scene augmentation methods mostly require a pre-built object database with a given position as the desired location. In this paper, we propose the first end-to-end multi-modal deep neural network that can generate point cloud objects consistent with their surroundings, conditioned on text instructions. Our model generates a seemly object in the appropriate position based on the inputs of a query and point clouds, thereby enabling the creation of new scenarios involving previously unseen layouts of objects. A database of pre-stored CAD models is no longer needed. We use Point-E as our generative model and introduce methods including quantified position prediction and Top-K estimation to mitigate the false negative problems caused by ambiguous language description. Moreover, we evaluate the ability of our model by demonstrating the diversity of generated objects, the effectiveness of instruction, and quantitative metric results, which collectively indicate that our model is capable of generating realistic indoor objects. 
For a more thorough evaluation, we also incorporate visual grounding as a metric to assess the quality of the scenes generated by our model.	翻訳日:2023-11-29 20:00:22 公開日:2023-11-26
# 知識誘導予測アーキテクチャによるSAR ATRの自己教師付き学習 Self-Supervised Learning for SAR ATR with a Knowledge-Guided Predictive Architecture ( http://arxiv.org/abs/2311.15153v1 ) ライセンス: Link先を確認	Weijie Li, Yang Wei, Tianpeng Liu, Yuenan Hou, Yongxiang Liu, Li Liu	(参考訳) 近年,多数のSAR(Synthetic Aperture Radar)センサやターゲットデータセットの出現により,下流タスクを自己教師付き学習技術で統一することが可能となり,SAR目標認識分野における基礎モデル構築の道を開いた。 SAR目標認識のための自己教師あり学習の主な課題は、データ品質の低さと雑音の下での一般化可能な表現学習にある。この問題に対処するために、局所マスクパッチを用いて未知のコンテキストのマルチスケールSAR特徴表現を予測する知識誘導型予測アーキテクチャを提案する。提案アーキテクチャの中核は、従来のSARドメインの特徴抽出と最先端のスケーラブルな自己教師付き学習を組み合わせることで、正確な一般化された特徴表現を実現することである。提案フレームワークは、様々な下流データセット(MSTAR、FUSAR-Ship、SAR-ACD、SSDD)で検証され、SARターゲット認識に一貫したパフォーマンス改善をもたらすことができる。実験結果は,SAR目標認識のための自己教師付き学習手法の多種多様な目標,シーン,センサに対する統一的な性能向上を強く実証した。 Recently, the emergence of a large number of Synthetic Aperture Radar (SAR) sensors and target datasets has made it possible to unify downstream tasks with self-supervised learning techniques, which can pave the way for building the foundation model in the SAR target recognition field. The major challenge of self-supervised learning for SAR target recognition lies in the generalizable representation learning in low data quality and noise. To address the aforementioned problem, we propose a knowledge-guided predictive architecture that uses local masked patches to predict the multiscale SAR feature representations of unseen context. The core of the proposed architecture lies in combining traditional SAR domain feature extraction with state-of-the-art scalable self-supervised learning for accurate generalized feature representations. The proposed framework is validated on various downstream datasets (MSTAR, FUSAR-Ship, SAR-ACD and SSDD), and can bring consistent performance improvement for SAR target recognition. The experimental results strongly demonstrate the unified performance improvement of the self-supervised learning technique for SAR target recognition across diverse targets, scenes and sensors.	翻訳日:2023-11-29 17:05:18 公開日:2023-11-26
# ユーザフィードバックとアプリ更新ログの対一致機能点における時系列利用によるユーザ貢献率の推定 Estimation of the User Contribution Rate by Leveraging Time Sequence in Pairwise Matching function-point between Users Feedback and App Updating Log ( http://arxiv.org/abs/2311.15179v1 ) ライセンス: Link先を確認	Shiqi Duan, Jianxun Liu, Yong Xiao, Xiangping Zhang	(参考訳) モバイルアプリケーションは、人々の日常生活の不可分な部分となっている。それでも市場競争は非常に激しく、ほとんどのユーザーの間で認識されていないアプリは市場排除の影響を受けやすい。この目的のためにデベロッパーは、より広いユーザー基盤の要求を迅速かつ正確に理解し、アプリの秩序と健全な進化を効果的に戦略化し促進する必要がある。一般的なユーザ要件が開発者によって採用される率、すなわちユーザ貢献率は、アプリケーション開発者やソフトウェアエンジニアリング研究者にとって、アプリ要件の進化を測ったり、洞察を得て、アプリのソフトウェアの進化を予測する上で重要なツールとなる、非常に価値のある指標である。残念なことに、この重要な指標には洗練された定量的分析アプローチやツールが欠けている。この問題に対処するために,本稿では,アプリの更新ログとユーザレビューに存在する時間的相関に基づく定量的分析手法を探索的に提案する。本手法の主な考え方は,有効なユーザレビューをユーザ要求,アプリの更新ログを開発者の対応とみなし,テキスト・コンピューティングによって両者の対応関係と時系列関係を抽出・解析し,ユーザ貢献率を定量的に計算する実現可能なアプローチを構築することである。このアプローチの実現可能性を示すため,本論文では,中国本土のApp Storeの4つの中国語アプリと,米国地域の英語アプリ1つから,2,178件の更新ログと4,236,417件のユーザレビューを含むデータを収集した。実験結果から,これらのアプリの機能のうち16.6%～43.2%が,オンライン上で多く寄せられたユーザ要件に後押しされたものと関連していることが判明した。 Mobile applications have become an inseparable part of people's daily life. Nonetheless, the market competition is extremely fierce, and apps lacking recognition among most users are susceptible to market elimination. To this end, developers must swiftly and accurately apprehend the requirements of the wider user base to effectively strategize and promote their apps' orderly and healthy evolution. The rate at which general user requirements are adopted by developers, or user contribution, is a very valuable metric that can be an important tool for app developers or software engineering researchers to measure or gain insight into the evolution of app requirements and predict the evolution of app software. Regrettably, the landscape lacks refined quantitative analysis approaches and tools for this pivotal indicator. 
To address this problem, this paper exploratively proposes a quantitative analysis approach based on the temporal correlation perception that exists in the app update log and user reviews, which provides a feasible solution for quantitatively obtaining the user contribution. The main idea of this scheme is to consider valid user reviews as user requirements and app update logs as developer responses, and to mine and analyze the pairwise and chronological relationships existing between the two by text computing, thus constructing a feasible approach for quantitatively calculating user contribution. To demonstrate the feasibility of the approach, this paper collects data from four Chinese apps in the App Store in mainland China and one English app in the U.S. region, including 2,178 update logs and 4,236,417 user reviews. The experimental results show that 16.6%-43.2% of the features of these apps were related to the drive from popular online user requirements.	翻訳日:2023-11-28 19:15:01 公開日:2023-11-26
# ディープラーニングに基づく非接触指紋のセグメンテーションと抽出 Deep Learning-Based Approaches for Contactless Fingerprints Segmentation and Extraction ( http://arxiv.org/abs/2311.15163v1 ) ライセンス: Link先を確認	M.G. Sarwar Murshed, Syed Konain Abbas, Sandip Purnapatra, Daqing Hou and Faraz Hussain	(参考訳) 指紋は、人間のアイデンティティの最もユニークで信頼できる特徴の1つとして広く認識されている。現代の指紋認証システムでは、認証プロセス中に指紋をキャプチャするために指紋スキャナーや指紋センサーを使用する必要がある。光、容量、超音波センサーなどの様々なタイプの指紋センサーは、指紋データを収集し分析するための異なる技術を採用している。この特定のハードウェアやセンサーへの依存は、指紋ベースの生体認証システムを採用するための障壁や課題を生み出す。この制限は、様々なアプリケーションやシナリオにおける指紋認証の普及を妨げる。国境管理、医療システム、教育機関、金融取引、空港のセキュリティは、指紋センサーが一般に利用できない場合、課題に直面している。追加ハードウェアへの依存を軽減するために、代替として非接触指紋の使用が登場した。堅牢な非接触指紋認証システムの実現には,正確な指紋分割法,正確な指紋抽出ツール,信頼性の高い指紋照合器の開発が不可欠である。本稿では,コンタクトレス指紋定位とセグメンテーションのための深層学習に基づくセグメンテーションツールの開発に着目する。本システムは,非接触指紋画像から高いセグメンテーション精度と確実な指紋抽出を実現するために,ディープラーニング技術を活用する。本評価では,平均平均絶対誤差(mae)が30ピクセル,角度予測誤差(eap)が5.92度,ラベリング精度97.46%を示した。これらの結果は,新しい非接触指紋セグメンテーションおよび抽出ツールの有効性を示す。 Fingerprints are widely recognized as one of the most unique and reliable characteristics of human identity. Most modern fingerprint authentication systems rely on contact-based fingerprints, which require the use of fingerprint scanners or fingerprint sensors for capturing fingerprints during the authentication process. Various types of fingerprint sensors, such as optical, capacitive, and ultrasonic sensors, employ distinct techniques to gather and analyze fingerprint data. This dependency on specific hardware or sensors creates a barrier or challenge for the broader adoption of fingerprint based biometric systems. This limitation hinders the widespread adoption of fingerprint authentication in various applications and scenarios. Border control, healthcare systems, educational institutions, financial transactions, and airport security face challenges when fingerprint sensors are not universally available. To mitigate the dependence on additional hardware, the use of contactless fingerprints has emerged as an alternative. 
Developing precise fingerprint segmentation methods, accurate fingerprint extraction tools, and reliable fingerprint matchers is crucial for the successful implementation of a robust contactless fingerprint authentication system. This paper focuses on the development of a deep learning-based segmentation tool for contactless fingerprint localization and segmentation. Our system leverages deep learning techniques to achieve high segmentation accuracy and reliable extraction of fingerprints from contactless fingerprint images. In our evaluation, our segmentation method demonstrated an average mean absolute error (MAE) of 30 pixels, an error in angle prediction (EAP) of 5.92 degrees, and a labeling accuracy of 97.46%. These results demonstrate the effectiveness of our novel contactless fingerprint segmentation and extraction tools.	翻訳日:2023-11-28 19:14:29 公開日:2023-11-26
# ベイズ型新素材探索におけるドメイン知識注入 Domain Knowledge Injection in Bayesian Search for New Materials ( http://arxiv.org/abs/2311.15162v1 ) ライセンス: Link先を確認	Zikai Xie, Xenophon Evangelopoulos, Joseph Thacker, Andrew Cooper	(参考訳) 本稿では,探索空間における探索を調整するためにドメイン知識を取り込むベイズ最適化(BO)アルゴリズムであるDKIBOを提案する。ベイズ最適化は、多くの難解な科学的問題に対するサンプル効率の高い最適化手法として最近登場した。既存の様々なBOフレームワークは、事前の信念を入力して探索空間を絞り込むことで探索を加速できるが、そのような知識を組み込むことは必ずしも簡単ではなく、バイアスを導入し、パフォーマンスの低下につながることが多い。本稿では,ガウス過程の近似能力を高めるために,追加の決定論的サロゲートモデルを用いて構造知識を獲得関数に組み込む簡単な手法を提案する。このモデルは、対象とする問題の構造情報に基づいて適切に選択され、より良い情報に基づくサンプリングに向けた補正項として機能する。材料設計タスクにドメイン知識をうまく注入することにより,提案手法の実用性を実証的に示す。さらに, 異なる実験設定およびアブレーション解析により, 提案手法の性能を検証した。 In this paper we propose DKIBO, a Bayesian optimization (BO) algorithm that accommodates domain knowledge to tune exploration in the search space. Bayesian optimization has recently emerged as a sample-efficient optimizer for many intractable scientific problems. While various existing BO frameworks allow the input of prior beliefs to accelerate the search by narrowing down the space, incorporating such knowledge is not always straightforward and can often introduce bias and lead to poor performance. Here we propose a simple approach to incorporate structural knowledge in the acquisition function by utilizing an additional deterministic surrogate model to enrich the approximation power of the Gaussian process. This is suitably chosen according to structural information of the problem at hand and acts as a corrective term towards a better-informed sampling. We empirically demonstrate the practical utility of the proposed method by successfully injecting domain knowledge in a materials design task. We further validate our method's performance on different experimental settings and ablation analyses.	翻訳日:2023-11-28 19:14:07 公開日:2023-11-26
# 連続学習のためのヘッセ行列を考慮した低ランク重み摂動 Hessian Aware Low-Rank Weight Perturbation for Continual Learning ( http://arxiv.org/abs/2311.15161v1 ) ライセンス: Link先を確認	Jiaqi Li, Rui Wang, Yuanhao Lai, Changjian Shui, Sabyasachi Sahoo, Charles X. Ling, Shichun Yang, Boyu Wang, Christian Gagn\'e, Fan Zhou	(参考訳) 連続学習は、以前のタスクから得られた知識を忘れることなく、一連のタスクを順次学習することを目的としている。本研究では,連続学習のためのHessian Aware Low-Rank Perturbationアルゴリズムを提案する。重み行列変換を用いて逐次タスクに沿ったパラメータ遷移をモデル化することにより、ニューラルネットワークの各層におけるタスク適応パラメータに低ランク近似を適用することを提案する。具体的には,ヘッセ行列と提案した低ランク近似の量的関係を理論的に示す。近似ランクは、層固有の勾配と低ランク近似誤差によって推定される経験的損失の限界増分に従って、大域的に決定される。さらに,パラメータの増加を抑えるために,重要度の低いパラメータを刈り込むことでモデル容量を制御する。大規模タスクを含むデータセットを含む様々なベンチマークについて広範な実験を行い,提案手法の有効性と拡張性を示すため,最近の最先端手法との比較を行った。実験結果から,本手法は異なるベンチマークにおいて,特にタスク順序に対する頑健性の実現と忘却問題の処理において,より優れた性能を示すことがわかった。デモコードはhttps://github.com/lijiaqi/HALRPで見ることができる。 Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones. In this work, we propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning. By modeling the parameter transitions along the sequential tasks with the weight matrix transformation, we propose to apply the low-rank approximation on the task-adaptive parameters in each layer of the neural networks. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian and the proposed low-rank approximation. The approximation ranks are then globally determined according to the marginal increment of the empirical loss estimated by the layer-specific gradient and low-rank approximation error. Furthermore, we control the model capacity by pruning less important parameters to diminish the parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against some recent state-of-the-art methods to demonstrate the effectiveness and scalability of our proposed method. 
Empirical results show that our method performs better on different benchmarks, especially in achieving task order robustness and handling the forgetting issue. A demo code can be found at https://github.com/lijiaqi/HALRP.	翻訳日:2023-11-28 19:13:50 公開日:2023-11-26
# グループ混合型視覚トランスフォーマーの進歩 Advancing Vision Transformers with Group-Mix Attention ( http://arxiv.org/abs/2311.15157v1 ) ライセンス: Link先を確認	Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo	(参考訳) 視覚変換器 (ViTs) は、MHSA (Multi-head Self-attention) による長距離依存をモデル化することで、視覚認識を強化することが示されている。しかし、Query and Keyから生成された注目マップは、1つの粒度でトークン間相関のみをキャプチャする。本稿では,表現能力を高めるために,トークンとグループ(すなわち複数の隣接トークン)間の相関を捉えるための,より包括的なメカニズムを持つべきである。そこで我々は,従来の自己注意の代替としてグループ・ミクス・アテンション(GMA)を提案し,トークン・ツー・トークン・ツー・グループ,グループ・ツー・グループ間の相関を様々なグループサイズで同時に捉えることができる。この目的のために、GMAはQuery、Key、Valueを一様にセグメントに分割し、グループプロキシを生成するために異なるグループアグリゲーションを実行する。アテンションマップはトークンとグループプロキシの混合に基づいて計算され、トークンとグループの値の再結合に使用される。 GMAに基づく強力なバックボーンであるGroupMixFormerを導入し、既存のモデルよりも少ないパラメータで画像分類、オブジェクト検出、セマンティックセグメンテーションにおける最先端のパフォーマンスを実現する。例えば、GroupMixFormer-L(70.3Mパラメータと384^2入力)はImageNet-1Kで86.2%、GroupMixFormer-B(45.8Mパラメータ)はADE20Kで51.2% mIoUに達する。 Vision Transformers (ViTs) have been shown to enhance visual recognition through modeling long-range dependencies with multi-head self-attention (MHSA), which is typically formulated as Query-Key-Value computation. However, the attention map generated from the Query and Key captures only token-to-token correlations at one single granularity. In this paper, we argue that self-attention should have a more comprehensive mechanism to capture correlations among tokens and groups (i.e., multiple adjacent tokens) for higher representational capacity. Thereby, we propose Group-Mix Attention (GMA) as an advanced replacement for traditional self-attention, which can simultaneously capture token-to-token, token-to-group, and group-to-group correlations with various group sizes. To this end, GMA splits the Query, Key, and Value into segments uniformly and performs different group aggregations to generate group proxies. The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value. 
Based on GMA, we introduce a powerful backbone, namely GroupMixFormer, which achieves state-of-the-art performance in image classification, object detection, and semantic segmentation with fewer parameters than existing models. For instance, GroupMixFormer-L (with 70.3M parameters and 384^2 input) attains 86.2% Top-1 accuracy on ImageNet-1K without external data, while GroupMixFormer-B (with 45.8M parameters) attains 51.2% mIoU on ADE20K.	翻訳日:2023-11-28 19:13:33 公開日:2023-11-26
# xTrimoGene:シングルセルRNA-Seqデータのための効率的でスケーラブルな表現学習者 xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data ( http://arxiv.org/abs/2311.15156v1 ) ライセンス: Link先を確認	Jing Gong, Minsheng Hao, Xingyi Cheng, Xin Zeng, Chiming Liu, Jianzhu Ma, Xuegong Zhang, Taifeng Wang, Le Song	(参考訳) 高スループットシークエンシング技術の進歩は、単一細胞レベルでの遺伝子発現の測定に大きな進歩をもたらした。公開されているシングルセルRNA-seq(scRNA-seq)の量は、すでに2万の遺伝子を計測したヒトの5000万レコードを超えている。これは教師なし表現学習の必要性を強調するものだが、古典的なトランスフォーマーアーキテクチャでは、計算とメモリの両方でそのようなデータをトレーニングすることは禁止されている。この課題に対処するため、我々は、xTrimoGene$^\alpha$(略してxTrimoGene)と呼ばれる、cRNA-seqデータのための新しい非対称エンコーダデコーダ変換器を提案する。 xTrimoGeneのこのスケーラブルな設計は、従来のトランスフォーマーに比べてFLOPを1～2桁削減し、高い精度を維持しながら、今日の最大のScRNA-seqデータセット上で最大のトランスフォーマーモデルをトレーニングすることができる。また,モデルサイズを拡大するにつれて,xTrimoGeneの性能が向上し,セルタイプアノテーションやパーターブシーク効果予測,薬物の組み合わせ予測など,様々な下流タスクにおけるSOTA性能も向上することを示した。 xTrimoGeneモデルは現在、以下のリンクを通じてサービスとして利用可能である。 Advances in high-throughput sequencing technology have led to significant progress in measuring gene expressions at the single-cell level. The amount of publicly available single-cell RNA-seq (scRNA-seq) data is already surpassing 50M records for humans with each record measuring 20,000 genes. This highlights the need for unsupervised representation learning to fully ingest these data, yet classical transformer architectures are prohibitive to train on such data in terms of both computation and memory. To address this challenge, we propose a novel asymmetric encoder-decoder transformer for scRNA-seq data, called xTrimoGene$^\alpha$ (or xTrimoGene for short), which leverages the sparse characteristic of the data to scale up the pre-training. This scalable design of xTrimoGene reduces FLOPs by one to two orders of magnitude compared to classical transformers while maintaining high accuracy, enabling us to train the largest transformer models over the largest scRNA-seq dataset today. 
Our experiments also show that the performance of xTrimoGene improves as the model size is scaled up, and that it achieves SOTA performance on various downstream tasks, such as cell type annotation, perturb-seq effect prediction, and drug combination prediction. The xTrimoGene model is now available for use as a service via the following link: https://api.biomap.com/xTrimoGene/apply.	翻訳日:2023-11-28 19:13:02 公開日:2023-11-26
# 非対称Bethe Ansatz Asymmetric Bethe Ansatz ( http://arxiv.org/abs/2311.15155v1 ) ライセンス: Link先を確認	Steven G. Jackson, Gregory E. Astrakharchik, and Maxim Olshanii	(参考訳) ハードウォールボックス内の質量比 $3\!:\!1$ の2つの $\delta$ 関数相互作用粒子に対して最近提案された厳密な量子解 [Y. Liu, F. Qi, Y. Zhang and S. Chen, iScience 22, 181 (2019)] は、半透明な $\delta$ 関数ミラーの系に対するベーテ仮設(Bethe Ansatz)可積分性の従来の必要条件に反するように見える。その条件とは、ベーテ仮設で可解なモデルの2つのミラーが二面角 $\pi/(\text{odd number})$ で交差する場合、これらのミラーには等しい結合定数を割り当てなければならない、というものである。本論文では、この条件を緩和する方法を見出した。従来の可積分系を取り、その半透明ミラーの一部を完全反射ミラーに置き換えることができる。後者の集合は、従来の系の対称性群の鏡映部分群のミラーで表されなければならない。この部分群は元の系の対称性に関して対称である必要はないため、提案手法を非対称ベーテ仮設(Asymmetric Bethe Ansatz, ABA)と呼ぶ。我々は、Liu-Qi-Zhang-Chen 問題の厳密解が ABA の特別な例であることを示す。 The recently proposed exact quantum solution for two $\delta$-function-interacting particles with a mass ratio $3\!:\!1$ in a hard-wall box [Y. Liu, F. Qi, Y. Zhang and S. Chen, iScience 22, 181 (2019)] seemingly violates the conventional necessary condition for Bethe Ansatz integrability of a system of semitransparent $\delta$-function mirrors: if two mirrors of a Bethe-Ansatz-solvable model cross at a dihedral angle $\pi/(\text{odd number})$, these mirrors must be assigned equal coupling constants. In our article, we find a way to relax this condition: it turns out that one can take a conventional integrable system and replace some of its semitransparent mirrors by perfectly reflecting ones. The latter set must be represented by the mirrors of a reflection subgroup of the symmetry group of the conventional system. This subgroup is not required to be symmetric with respect to the symmetries of the original system, hence the proposed name for the method: Asymmetric Bethe Ansatz (ABA). We show that the exact solution of the Liu-Qi-Zhang-Chen problem is a particular instance of the ABA.	翻訳日:2023-11-28 19:12:38 公開日:2023-11-26
# フラクショナル非線形Schr\"{o}ディンガー方程式におけるスペクトル分岐の観測 Observation of the spectral bifurcation in the Fractional Nonlinear Schr\"{o}dinger Equation ( http://arxiv.org/abs/2311.15150v1 ) ライセンス: Link先を確認	Shilong Liu, Yingwen Zhang, St\'ephane Virally, Ebrahim Karimi, Boris A. Malomed, Denis V. Seletskiy	(参考訳) 超高速ソリトンパルスのスペクトル分岐の包括的調査と実験的実現を報告する。これらの分岐は、分数非線形schr\"{o}dinger方程式の枠組みにおける分数群速度分散とkerr非線形性(自己相変調)の相互作用によって引き起こされる。分数分散と非線形の作用下でパルスのダイナミクスを捉えるために,周波数チャープに基づく効果的な「力」モデルを提案する。力'モデルを利用することで、スペクトル分岐 \{1\}$\rightarrow$ \{n\} を関連する非線形レベルで直接生成する分数分散プロファイルを設計する。これらの結果は、非線形性の成長に付随する伝統的な分岐の列 \{1\}$\rightarrow$ \{2\}$\rightarrow$ \{3\} ... $\rightarrow$ \{N\} を超えて拡張される。実験的な検証では、パルス整形器のセットアップ内で正確に調整されたホログラムが、変更可能な非線形媒体に結合される。特に、次列カスケードで必要となる非線形性の強度が著しく低い場合、最大で N=5 in \{1\}$\rightarrow$ \{N\} 分岐が得られる。工学的なスペクトル分岐パターンの提案は、超高速信号処理アプリケーションにとって大きな可能性を秘めている。実例として、これらの分岐モードを用いて、100kmの単モードファイバで光データをスキューズし、伝送する。 We report a comprehensive investigation and experimental realization of spectral bifurcations of ultrafast soliton pulses. These bifurcations are induced by the interplay between fractional group-velocity dispersion and Kerr nonlinearity (self-phase modulation) within the framework of the fractional nonlinear Schr\"{o}dinger equation. To capture the dynamics of the pulses under the action of the fractional dispersion and nonlinearity, we propose an effective `force' model based on the frequency chirp, which characterizes their interactions as either `repulsion', `attraction', or `equilibration'. By leveraging the `force' model, we design segmented fractional dispersion profiles that directly generate spectral bifurcations \{1\}$\rightarrow$ \{N\} at relevant nonlinearity levels. These results extend beyond the traditional sequence of bifurcations \{1\}$\rightarrow$ \{2\}$\rightarrow$ \{3\} ... $\rightarrow$ \{N\} associated with the growth of the nonlinearity. 
The experimental validation involves a precisely tailored hologram within a pulse shaper setup, coupled to an alterable nonlinear medium. Notably, we achieve up to N=5 in \{1\}$\rightarrow$ \{N\} bifurcations at a significantly lower strength of nonlinearity than would otherwise be required in a sequential cascade. The proposal for engineering spectral bifurcation patterns holds significant potential for ultrafast signal processing applications. As a practical illustration, we use these bifurcation modes to squeeze optical data and transmit it across a 100-km-long single-mode fiber.	翻訳日:2023-11-28 19:12:10 公開日:2023-11-26
# IBM量子プロセッサ上での人工ニューラルネットワークシンドロームデコード Artificial Neural Network Syndrome Decoding on IBM Quantum Processors ( http://arxiv.org/abs/2311.15146v1 ) ライセンス: Link先を確認	Brhyeton Hall, Spiro Gicev, Muhammad Usman	(参考訳) シンドローム復号法は、フォールトトレラント量子コンピューティングのための量子エラー補正の実装において、積分的だが計算的に要求されるステップである。本稿では,IBM量子プロセッサ上でのニューラルネットワーク(ANN)デコードの開発とベンチマークについて報告する。 ANNは重六角形コードアーキテクチャからシンドローム計測データを効率よく復号し、適切な修正を適用し、エラー保護を容易にする。 IBMデバイスの現在の物理的エラー率は、コードのしきい値を超え、論理的エラー率抑制のためにANNデコーダの範囲を制限する。しかし,本研究では,実験装置から取得したシンドロームデータのANN復号法の適用性を確認し,近日中にしきい値誤差率未満の量子デバイスが利用可能になると,機械学習を量子エラー訂正の有望な経路として確立する。 Syndrome decoding is an integral but computationally demanding step in the implementation of quantum error correction for fault-tolerant quantum computing. Here, we report the development and benchmarking of Artificial Neural Network (ANN) decoding on IBM Quantum Processors. We demonstrate that ANNs can efficiently decode syndrome measurement data from heavy-hexagonal code architecture and apply appropriate corrections to facilitate error protection. The current physical error rates of IBM devices are above the code's threshold and restrict the scope of our ANN decoder for logical error rate suppression. However, our work confirms the applicability of ANN decoding methods of syndrome data retrieved from experimental devices and establishes machine learning as a promising pathway for quantum error correction when quantum devices with below threshold error rates become available in the near future.	翻訳日:2023-11-28 19:11:44 公開日:2023-11-26
# 微妙な選択と深層学習:ドメイン一般化のためのCLIPによる選択的クロスモーダル蒸留 Choosing Wisely and Learning Deeply: Selective Cross-Modality Distillation via CLIP for Domain Generalization ( http://arxiv.org/abs/2311.15145v1 ) ライセンス: Link先を確認	Jixuan Leng, Yijiang Li, Haohan Wang	(参考訳) ドメインの一般化(DG)は重要な研究領域であり、複数のドメインにまたがるモデルをトレーニングし、目に見えない領域でテストすることを目指している。本稿では、ドメイン一般化のための選択的クロスモダリティ蒸留(scmd)という新しいアプローチを提案する。 SCMDは、大きな視覚言語モデル、特にCLIPモデルの能力を活用して、より効率的なモデルをトレーニングし、目に見えない領域にわたって堅牢な一般化能力を取得する。我々の主な貢献は、蒸留の難しいサンプルを特定するために戦略的に設計されたユニークな選択フレームワークである。並行して、新しいクロスモダリティモジュールを導入する。このモジュールは、学生モデルの投影された特徴とCLIPからのテキスト埋め込みをシームレスに組み合わせ、類似度分布のアライメントを保証する。 SCMDの性能を様々なベンチマークで評価し、ResNet50が既存のドメイン一般化手法を超越して最先端のパフォーマンスを提供できるようにします。さらに、我々は選択戦略の理論分析を行い、DG分野におけるその有効性と可能性について深い洞察を提供する。 Domain Generalization (DG), a crucial research area, seeks to train models across multiple domains and test them on unseen ones. In this paper, we introduce a novel approach, namely, Selective Cross-Modality Distillation for Domain Generalization (SCMD). SCMD leverages the capabilities of large vision-language models, specifically the CLIP model, to train a more efficient model, ensuring it acquires robust generalization capabilities across unseen domains. Our primary contribution is a unique selection framework strategically designed to identify hard-to-learn samples for distillation. In parallel, we introduce a novel cross-modality module. This module seamlessly combines the projected features of the student model with the text embeddings from CLIP, ensuring the alignment of similarity distributions. We assess SCMD's performance on various benchmarks, where it empowers a ResNet50 to deliver state-of-the-art performance, surpassing existing domain generalization methods. Furthermore, we provide a theoretical analysis of our selection strategy, offering deeper insight into its effectiveness and potential in the field of DG.	翻訳日:2023-11-28 19:11:30 公開日:2023-11-26
# ロングストーリー:コヒーレント、完全、そしてロングストーリーの生成を制御する LongStory: Coherent, Complete and Length Controlled Long story Generation ( http://arxiv.org/abs/2311.15208v1 ) ライセンス: Link先を確認	Kyeongman Park, Nakyeong Yang, Kyomin Jung	(参考訳) 人間の作者は、コヒーレンスを失うことなく、どんなストーリーでも書ける。また、彼らは常に適切な結末、現在の言語モデルに欠けている能力に物語をもたらします。本稿では,コヒーレントで完全かつ長さ制御の長いストーリー生成のためのLongStoryを提案する。 LongStoryは,(1)長期・短期の重み調整器(CWC)と(2)長期ストーリー構造位置(LSP)の2つの新しい手法を導入した。 cwcは長期的文脈記憶と短期的文脈の不正行為の重み付けを調整し、それぞれの役割を認めている。 LSPは長い物語の構造的位置を伝えるために談話トークンを使用している。平均ストーリーの長さの異なる3つのデータセットでトレーニングされたlongstoryは、強力なストーリージェネレータプロットマシン、一貫性、完全性、関連性、反復性を含む他のベースラインよりも優れている。また、各データセット上でゼロショットテストを実施し、トレーニングデータを超えた結果を予測するモデルの能力を評価し、そのパフォーマンスとモデルの変種を比較して方法論を検証する。 A human author can write any length of story without losing coherence. Also, they always bring the story to a proper ending, an ability that current language models lack. In this work, we present the LongStory for coherent, complete, and length-controlled long story generation. LongStory introduces two novel methodologies: (1) the long and short-term contexts weight calibrator (CWC) and (2) long story structural positions (LSP). The CWC adjusts weights for long-term context Memory and short-term context Cheating, acknowledging their distinct roles. The LSP employs discourse tokens to convey the structural positions of a long story. Trained on three datasets with varied average story lengths, LongStory outperforms other baselines, including the strong story generator Plotmachine, in coherence, completeness, relevance, and repetitiveness. We also perform zero-shot tests on each dataset to assess the model's ability to predict outcomes beyond its training data and validate our methodology by comparing its performance with variants of our model.	翻訳日:2023-11-28 19:02:03 公開日:2023-11-26
# 低次元ディスクリプタを用いた化合物空間における分子特性の効率的な補間 Efficient interpolation of molecular properties across chemical compound space with low-dimensional descriptors ( http://arxiv.org/abs/2311.15207v1 ) ライセンス: Link先を確認	Yun-Wen Mao and Roman V. Krems	(参考訳) 低次元ディスクリプタを持つ化合物空間における補間のための分子特性の正確なデータスターベドモデルを示す。我々の出発点は、クーロン行列の固有値の分布の性質から導かれた三次元、普遍的、物理的ディスクリプタに基づいている。分子の形状と構成を考慮し、これらの記述子とガーシュゴリンの円定理で示される6次元の特徴を組み合わせる。そこで,ガウス過程の回帰に対して,可変関数型カーネルを用いた9次元ディスクリプタを用いることにより,高効率な低次元補間モデルを実現する。 100分子で訓練されたモデルでは、エントロピーと温度 (s \times t$) とゼロ点振動エネルギー (zpve) の積を、絶対誤差が1 kcal mol$^{-1}$ for $> 78$ \%、テストデータ中の分子の1.3 kcal mol$^{-1}$ for $> 92$ \%で予測することができる。試験データは、3つの原子から29個の原子に変化する2万の分子と、それぞれ36 kcal mol$^{-1}$と161 kcal mol$^{-1}$をカバーする$S \times T$とZPVEの範囲からなる。また,ゲルシュゴリン環定理に基づく記述子は,分子の原子結合を明示的に考慮したグラフニューラルネットワークに基づく記述モデルよりも正確な分子エントロピーモデルが得られることを示す。 We demonstrate accurate data-starved models of molecular properties for interpolation in chemical compound spaces with low-dimensional descriptors. Our starting point is based on three-dimensional, universal, physical descriptors derived from the properties of the distributions of the eigenvalues of Coulomb matrices. To account for the shape and composition of molecules, we combine these descriptors with six-dimensional features informed by the Gershgorin circle theorem. We use the nine-dimensional descriptors thus obtained for Gaussian process regression based on kernels with variable functional form, leading to extremely efficient, low-dimensional interpolation models. The resulting models trained with 100 molecules are able to predict the product of entropy and temperature ($S \times T$) and zero point vibrational energy (ZPVE) with the absolute error under 1 kcal mol$^{-1}$ for $> 78$ \% and under 1.3 kcal mol$^{-1}$ for $> 92$ \% of molecules in the test data. The test data comprises 20,000 molecules with complexity varying from three atoms to 29 atoms and the ranges of $S \times T$ and ZPVE covering 36 kcal mol$^{-1}$ and 161 kcal mol$^{-1}$, respectively. 
We also illustrate that the descriptors based on the Gershgorin circle theorem yield more accurate models of molecular entropy than those based on graph neural networks that explicitly account for the atomic connectivity of molecules.	翻訳日:2023-11-28 19:01:43 公開日:2023-11-26
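To make the Gershgorin circle theorem mentioned above concrete, the following sketch builds a Coulomb matrix with the commonly used definition (0.5 Z_i^2.4 on the diagonal, Z_i Z_j / |R_i - R_j| off the diagonal) and lists its Gershgorin discs. The toy two-atom geometry is an assumption, and the sketch does not reproduce how the paper turns the discs into its six-dimensional features, which the abstract does not specify.

```python
import math


def coulomb_matrix(charges, positions):
    """Coulomb matrix: 0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / distance elsewhere."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return m


def gershgorin_discs(matrix):
    """Return one (center, radius) pair per row.

    Every eigenvalue of the matrix lies in at least one of these discs.
    """
    return [
        (row[i], sum(abs(x) for j, x in enumerate(row) if j != i))
        for i, row in enumerate(matrix)
    ]


# Toy example (hypothetical geometry): two unit charges separated by 2 length units.
discs = gershgorin_discs(coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```

For this toy matrix both discs are centered at 0.5 with radius 0.5, so they bound the eigenvalues (0 and 1) inside the interval [0, 1] without any diagonalization.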
# Insect-Foundation: Visual Insect Understandingのための基盤モデルと大規模100万データセット Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding ( http://arxiv.org/abs/2311.15206v1 ) ライセンス: Link先を確認	Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan Bac Nguyen, Ashley Dowling, Xin Li, Khoa Luu	(参考訳) 精密農業において、昆虫の検出と認識は、作物が健康に育ち、高品質な収量を生み出す能力において重要な役割を果たす。現在のマシンビジョンモデルは、高いパフォーマンスを達成するために大量のデータを必要とする。しかし、世界中で約550万種の昆虫が生息している。既存の昆虫のデータセットは、地理的に異なる場所と取得コストのために、そのわずかしかカバーできない。本稿では,昆虫に関する基礎モデルトレーニングに革命をもたらすゲーム変換リソースである'Insect-1M''データセットを紹介する。私たちのデータセットは昆虫の幅広い範囲をカバーしており、100万枚の画像に分類階層と昆虫の記述の密接な識別ラベルがあり、昆虫学のパノラマ的なビューを提供しています。そこで本研究では,昆虫画像間の微妙な相違を識別できるパッチワイド関連注意機構を備えた,微小機能自己教師型学習法を開発した。さらに,昆虫記述による微小機能モデリングを改善するために,記述一貫性損失を導入する。本研究は,昆虫モデルにおける提案手法の有効性を実証し,昆虫関連課題の標準ベンチマークにおける最新性能を実現する。当社の昆虫財団モデルとデータセットは、次世代昆虫関連視覚モデルに力を与え、精密農業の究極の目標に近付くことを約束しています。 In precision agriculture, the detection and recognition of insects play an essential role in the ability of crops to grow healthy and produce a high-quality yield. The current machine vision model requires a large volume of data to achieve high performance. However, there are approximately 5.5 million different insect species in the world. None of the existing insect datasets can cover even a fraction of them due to varying geographic locations and acquisition costs. In this paper, we introduce a novel ``Insect-1M'' dataset, a game-changing resource poised to revolutionize insect-related foundation model training. Covering a vast spectrum of insect species, our dataset, including 1 million images with dense identification labels of taxonomy hierarchy and insect descriptions, offers a panoramic view of entomology, enabling foundation models to comprehend visual and semantic information about insects like never before. 
Then, to efficiently establish an Insect Foundation Model, we develop a micro-feature self-supervised learning method with a Patch-wise Relevant Attention mechanism capable of discerning the subtle differences among insect images. In addition, we introduce Description Consistency loss to improve micro-feature modeling via insect descriptions. Through our experiments, we illustrate the effectiveness of our proposed approach in insect modeling and achieve state-of-the-art performance on standard benchmarks of insect-related tasks. Our Insect Foundation Model and Dataset promise to empower the next generation of insect-related vision models, bringing them closer to the ultimate goal of precision agriculture.	翻訳日:2023-11-28 19:01:15 公開日:2023-11-26
# SAR船舶分類のための手作り共同特徴ビュー付きデュアルストリームコントラスト予測ネットワーク Dual-stream contrastive predictive network with joint handcrafted feature view for SAR ship classification ( http://arxiv.org/abs/2311.15202v1 ) ライセンス: Link先を確認	Xianting Feng, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng	(参考訳) 既存の合成開口レーダー(SAR)の船種分類技術は、ラベルのないSARの船種画像の識別特性を無視して、正確なラベル付きデータに大きく依存している。研究者は従来の手作りの機能を取り入れてCNNベースの機能を充実させようとするが、既存の手法は情報冗長性を容易に引き起こし、それらの相互作用を捉えるのに失敗する。これらの問題に対処するために,2つの非対称なタスク設計と偽陰性サンプル除去モジュールからなる新しい二ストリームコントラスト予測ネットワーク(DCPNet)を提案する。最初のタスクは正のサンプルペアを構築し、コアエンコーダにより一般的な表現を学習させることである。第2の課題は, 深部特徴と手作り特徴との対応を適応的に把握し, モデル内での知識伝達を実現し, 特徴融合による冗長性を効果的に低減することである。クラスタ間の分離性を高めるため、クラスタレベルのタスクも設計する。 OpenSARShipとFUSAR-Shipデータセットの実験結果は、教師付きモデルの分類精度の向上を示し、DCPNetの効果的な表現の学習能力を確認する。 Most existing synthetic aperture radar (SAR) ship classification technologies rely heavily on correctly labeled data, ignoring the discriminative features of unlabeled SAR ship images. Even though researchers try to enrich CNN-based features by introducing traditional handcrafted features, existing methods easily cause information redundancy and fail to capture the interaction between them. To address these issues, we propose a novel dual-stream contrastive predictive network (DCPNet), which consists of two asymmetric task designs and a false negative sample elimination module. The first task is to construct positive sample pairs, guiding the core encoder to learn more general representations. The second task is to encourage adaptive capture of the correspondence between deep features and handcrafted features, achieving knowledge transfer within the model and effectively reducing the redundancy caused by feature fusion. To increase the separability between clusters, we also design a cluster-level task. The experimental results on the OpenSARShip and FUSAR-Ship datasets demonstrate improved classification accuracy for supervised models and confirm DCPNet's capability to learn effective representations.	翻訳日:2023-11-28 19:00:48 公開日:2023-11-26
# splicemix:マルチラベル画像分類のためのクロススケール・セマンティックブレンド拡張戦略 SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification ( http://arxiv.org/abs/2311.15200v1 ) ライセンス: Link先を確認	Lei Wang and Yibing Zhan and Leilei Ma and Dapeng Tao and Liang Ding and Chen Gong	(参考訳) 近年、ミックススタイルのデータ拡張手法(例えばmixupやcutmix)が様々なビジュアルタスクで有望なパフォーマンスを示している。しかし、これらの手法は主にシングルラベル画像のために設計されており、シングルラベル画像とマルチラベル画像のかなりの差を無視している。一方で、従来のマルチラベル画像分類(mlic)法は、複雑なモデルを設計する傾向があり、高価な計算をもたらす。本稿では,マルチラベル画像分類,すなわちSpliceMixの簡易かつ効果的な拡張戦略を提案する。私たちのメソッドのspliceは2倍です。 1) 混合画像は,複数のダウンサンプリングされた画像を格子状に分割し,混合に係わる画像の意味を,共起バイアスを緩和する対象の欠陥を伴わずにブレンドする。 2)混合画像とオリジナルのミニバッチをスプライシングし,新しいスプライス混合ミニバッチを形成した。さらに、SpliceMixedのミニバッチは、混合画像と元の正規画像との相互作用を可能にする。また,一貫性学習(splicemix-cl)に基づく簡易かつ非パラメトリックな拡張を提供し,splicemixの柔軟な拡張性を示す。様々なタスクに関する大規模な実験は、ベースラインモデル(例えばResNet)でSpliceMixを使用するだけで、最先端のメソッドよりも優れたパフォーマンスが得られることを示した。さらに、SpliceMixの一般化性は、SpliceMixとの結婚時に現在のMLICメソッドの改善によってさらに検証される。コードはhttps://github.com/zuiran/splicemixで入手できる。 Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have shown promising performance in various visual tasks. However, these methods are primarily designed for single-label images, ignoring the considerable discrepancies between single- and multi-label images, i.e., a multi-label image involves multiple co-occurred categories and fickle object scales. On the other hand, previous multi-label image classification (MLIC) methods tend to design elaborate models, bringing expensive computation. In this paper, we introduce a simple but effective augmentation strategy for multi-label image classification, namely SpliceMix. 
The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images arranged in a grid, where the semantics of the images involved in the mixing are blended without object deficiencies, which alleviates co-occurrence bias; 2) We splice the mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image at different scales to contribute to training together. Furthermore, this splicing in the SpliceMixed mini-batch enables interactions between mixed images and the original regular images. We also offer a simple, non-parametric extension based on consistency learning (SpliceMix-CL) to show the flexible extensibility of SpliceMix. Extensive experiments on various tasks demonstrate that using only SpliceMix with a baseline model (e.g., ResNet) achieves better performance than state-of-the-art methods. Moreover, the generalizability of SpliceMix is further validated by the improvements in current MLIC methods when they are combined with SpliceMix. The code is available at https://github.com/zuiran/SpliceMix.	翻訳日:2023-11-28 19:00:27 公開日:2023-11-26
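The grid splice described in point 1) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: images are plain 2D lists, downsampling is done by striding, and the mixed label is taken as the element-wise union of the multi-hot source labels. All three choices, along with the function names, are assumptions.

```python
def downsample(image, factor):
    """Keep every `factor`-th pixel in each dimension (nearest-neighbour style)."""
    return [row[::factor] for row in image[::factor]]


def splice_grid(images, labels, grid=2):
    """Splice grid*grid equally sized images into one mixed image.

    `images` are 2D lists (H x W, with H and W assumed divisible by `grid`);
    `labels` are multi-hot lists. The mixed label is assumed to be the
    element-wise union of the source labels.
    """
    assert len(images) == grid * grid == len(labels)
    small = [downsample(img, grid) for img in images]
    mixed = []
    for g_row in range(grid):
        tiles = small[g_row * grid:(g_row + 1) * grid]
        for r in range(len(tiles[0])):
            # Concatenate row r of every tile in this grid row.
            mixed.append([px for tile in tiles for px in tile[r]])
    mixed_label = [max(col) for col in zip(*labels)]
    return mixed, mixed_label
```

A SpliceMixed mini-batch in the sense of point 2) would then be the original images and labels with such mixed pairs appended, so that full-scale and downsampled views are trained together.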
# ChatGPTとBeyond: 教育における創造的AI革命 ChatGPT and Beyond: The Generative AI Revolution in Education ( http://arxiv.org/abs/2311.15198v1 ) ライセンス: Link先を確認	Mohammad AL-Smadi	(参考訳) 生成的人工知能(AI)モデル、特にChatGPTの普及と利用が、教育現場におけるその潜在的な応用を探求する研究の急増を引き起こした。本調査は,2022年11月から2023年7月までに発行された学術文献について,特にscopus-indexed q1およびq2ジャーナルのハイインパクト研究を対象とする。この調査は、様々な教育的文脈における生成AIモデルの実践的応用と意味を掘り下げるものである。近年の学術文献の包括的かつ厳密な評価を通じて、この調査は、教育における生成的AIモデル、特にChatGPTの進化的役割を解明することを目指している。このダイナミックな分野における潜在的利益、課題、そして新たなトレンドを振り返ることで、この調査は、人工知能と教育の橋渡しの理解に寄与することに努めている。このレビューの結果は、教育者、研究者、政策立案者に対して、AI技術の学習環境への統合に関する情報的な決定を下すよう促す。 The wide adoption and usage of generative artificial intelligence (AI) models, particularly ChatGPT, has sparked a surge in research exploring their potential applications in the educational landscape. This survey examines academic literature published between November, 2022, and July, 2023, specifically targeting high-impact research from Scopus-indexed Q1 and Q2 journals. This survey delves into the practical applications and implications of generative AI models across a diverse range of educational contexts. Through a comprehensive and rigorous evaluation of recent academic literature, this survey seeks to illuminate the evolving role of generative AI models, particularly ChatGPT, in education. By shedding light on the potential benefits, challenges, and emerging trends in this dynamic field, the survey endeavors to contribute to the understanding of the nexus between artificial intelligence and education. The findings of this review will empower educators, researchers, and policymakers to make informed decisions about the integration of AI technologies into learning environments.	翻訳日:2023-11-28 18:59:59 公開日:2023-11-26
# アンサンブル窒素空孔中心を用いた高感度広帯域マイクロ波センシングの実証 Demonstration of highly-sensitive wideband microwave sensing using ensemble nitrogen-vacancy centers ( http://arxiv.org/abs/2311.15196v1 ) ライセンス: Link先を確認	Kensuke Ogawa, Shunsuke Nishimura, Kento Sasaki, Kensuke Kobayasahi	(参考訳) マイクロ波磁気測定はマイクロ波技術の進歩に不可欠である。ダイヤモンド中のアンサンブル窒素空孔(NV)中心による交流ゼーマン効果を用いた広帯域マイクロ波センシングプロトコルを実証する。広視野顕微鏡により、マイクロ波共振器の周波数特性と非共鳴マイクロ波振幅の空間分布を可視化することができる。さらに、この手法を動的デカップリングと組み合わせることで、$2.77 \, \mathrm{\mu m} \times 2.77 \, \mathrm{\mu m} \times 30 \, \mathrm{nm}$ のセンシング体積において $5.2 \, \mathrm{\mu T} / \sqrt{\mathrm{Hz}}$ のマイクロ波振幅感度を達成した。これは先行研究のプロトコルで得られた $40.2 \, \mathrm{\mu T} / \sqrt{\mathrm{Hz}}$ の7.7倍の改善である。我々の成果は、広帯域かつ広視野のマイクロ波イメージングにアンサンブルNV中心を適用するための具体的なステップである。 Microwave magnetometry is essential for the advancement of microwave technologies. We demonstrate a broadband microwave sensing protocol using the AC Zeeman effect with ensemble nitrogen-vacancy (NV) centers in diamond. A widefield microscope can visualize the frequency characteristics of the microwave resonator and the spatial distribution of off-resonant microwave amplitude. Furthermore, by combining this method with dynamical decoupling, we achieve a microwave amplitude sensitivity of $5.2 \, \mathrm{\mu T} / \sqrt{\mathrm{Hz}}$, which is 7.7 times better than the $40.2 \, \mathrm{\mu T} / \sqrt{\mathrm{Hz}}$ obtained using the protocol in previous research over a sensing volume of $2.77 \, \mathrm{\mu m} \times 2.77 \, \mathrm{\mu m} \times 30 \, \mathrm{nm}$. Our achievement is a concrete step in adapting ensemble NV centers for wideband and widefield microwave imaging.	翻訳日:2023-11-28 18:59:42 公開日:2023-11-26
# 基本原理知識者になるためのニューラルネットワークモデル Neural Network Models of Becoming a Cardinal Principle Knower ( http://arxiv.org/abs/2311.15194v1 ) ライセンス: Link先を確認	Vima Gupta, Sashank Varma	(参考訳) 小学校に入ると、最初の50～100個の数字を記憶した数列から、後継関数を理解し、数え切れないほど無限となる数列の順序構造を理解するようになる。本研究では,N in (0, 98) のペア (N, N+1) における後継関数を学習する2つのニューラルネットワークモデルの発達変化について検討する。第1モデルは入力および出力値のワンホットエンコーディングを使用し、カウントリストを記憶する子供に対応し、第2モデルは位置値エンコーディングを使用し、命名番号の言語規則を学習する子供に対応する。位置-値モデルでは、十の境界を越えた表現的類似性の低下が予測された。テンス境界を越えた数え上げは、2次元空間におけるベクトル演算として理解でき、同じテンス配置の数値は線形に分離可能な方法で構成され、同じテンス配置の数字はグループ分けされる。カリキュラム学習シミュレーションは, 発達期児の発達する数値環境において, より少ない数の表現が, より大きい数の表現が学習され始めれば, より鋭くなり続けることを示す。これらのモデルは、後続関数の学習を超えて、より一般的な数え上げ過程をシミュレートし、可算無限を理解することの意味をより深く理解するために、再帰的アーキテクチャを用いた将来の作業の舞台となった。 As children enter elementary school, their understanding of the ordinal structure of numbers transitions from a memorized count list of the first 50-100 numbers to knowing the successor function and understanding the countably infinite. We investigate this developmental change in two neural network models that learn the successor function on the pairs (N, N+1) for N in (0, 98). The first uses a one-hot encoding of the input and output values and corresponds to children memorizing a count list, while the second model uses a place-value encoding and corresponds to children learning the language rules for naming numbers. The place-value model showed a predicted drop in representational similarity across tens boundaries. Counting across a tens boundary can be understood as a vector operation in 2D space, where the numbers with the same tens place are organized in a linearly separable manner, whereas those with the same ones place are grouped together. A curriculum learning simulation shows that, in the expanding numerical environment of the developing child, representations of smaller numbers continue to be sharpened even as larger numbers begin to be learned. 
These models set the stage for future work using recurrent architectures to move beyond learning the successor function to simulating the counting process more generally, and point towards a deeper understanding of what it means to understand the countably infinite.	翻訳日:2023-11-28 18:59:14 公開日:2023-11-26
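The difference between the two input encodings can be illustrated with a small sketch. The 100-unit one-hot vector and the 20-unit place-value layout (ten units for the tens digit, ten for the ones digit) are assumptions about how such encodings might look, not the paper's exact network inputs, and the overlap measure below concerns raw inputs rather than the learned representations the abstract analyzes.

```python
def one_hot(n, size=100):
    """Encode n as a vector with a single 1 at index n."""
    vec = [0] * size
    vec[n] = 1
    return vec


def place_value(n):
    """Encode n in 0..99 as a one-hot tens digit followed by a one-hot ones digit."""
    tens, ones = divmod(n, 10)
    return one_hot(tens, 10) + one_hot(ones, 10)


def overlap(a, b):
    """Count positions where both encodings are active."""
    return sum(x & y for x, y in zip(a, b))


# Successor pairs (N, N + 1) for N in 0..98, as described in the abstract.
pairs = [(n, n + 1) for n in range(99)]
```

Under the one-hot encoding, 28 and 29 share no active unit, whereas under the place-value encoding they share the tens unit. That shared unit disappears when counting from 29 to 30, which gives some intuition for why similarity might drop at tens boundaries.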
# IA-LSTM:歩行者軌道予測のための対話型LSTM IA-LSTM: Interaction-Aware LSTM for Pedestrian Trajectory Prediction ( http://arxiv.org/abs/2311.15193v1 ) ライセンス: Link先を確認	Yuehai Chen	(参考訳) 群衆シナリオにおける歩行者の軌道予測は、衝突を避けるための政策決定に有用であるため、自動運転や自律移動ロボット分野において不可欠である。人間は異なる歩行運動を持ち、現在の環境における人間と物体、特に人間自身との相互作用は複雑であるため、これは難しい問題である。しかし、従来の研究では人間と人間の相互作用をモデル化する方法に焦点が当てられていた。この問題に対処するために,人間と人間の相互作用の相対的重要性を計測できるだけでなく,歩行者ごとに個人的な空間を構築できるコレントロピーに基づく新しいメカニズムを導入する。さらに,シーン内の動的ヒューマンインタラクションの特徴表現を効果的に抽出し,対応する重みを計算し,異なるインタラクションの重要性を表現できる,このデータ駆動機構を含むインタラクションモジュールを提案する。このような社会的メッセージを歩行者間で共有するために、軌道予測のためのLong Short-Term Memory(LSTM)ネットワークに基づく対話型アーキテクチャを設計する。 2つの公開データセットでモデルの性能を実証し, 実験結果から, 従来の手法よりも優れた性能が得られることを示した。 Predicting the trajectory of pedestrians in crowd scenarios is indispensable in self-driving or autonomous mobile robot field because estimating the future locations of pedestrians around is beneficial for policy decision to avoid collision. It is a challenging issue because humans have different walking motions and the interactions between humans and objects in the current environment, especially between human themselves, are complex. Previous researches have focused on how to model the human-human interactions, however, neglecting the relative importance of interactions. In order to address this issue, we introduce a novel mechanism based on the correntropy, which not only can measure the relative importance of human-human interactions, but also can build personal space for each pedestrian. We further propose an Interaction Module including this data-driven mechanism that can effectively extract feature representations of dynamic human-human interactions in the scene and calculate corresponding weights to represent the importance of different interactions. To share such social messages among pedestrians, we design an interaction-aware architecture based on the Long Short-Term Memory (LSTM) network for trajectory prediction. 
We evaluate our model on two public datasets, and the experimental results show that it outperforms several recent, well-performing methods.	翻訳日:2023-11-28 18:58:46 公開日:2023-11-26
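As a rough sketch of how correntropy could score the relative importance of interactions, the code below uses the standard sample estimator with an unnormalised Gaussian kernel and normalises the scores over neighbours. Applying it to paired 2D positions, the normalisation step, and the function names are assumptions for illustration; the abstract does not give the exact form used in IA-LSTM.

```python
import math


def correntropy(traj_a, traj_b, sigma=1.0):
    """Sample correntropy with an unnormalised Gaussian kernel over paired 2D positions."""
    total = 0.0
    for pa, pb in zip(traj_a, traj_b):
        sq_dist = (pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / len(traj_a)


def interaction_weights(target, neighbours, sigma=1.0):
    """Normalise correntropy scores so the weights over neighbours sum to 1."""
    scores = [correntropy(target, nb, sigma) for nb in neighbours]
    norm = sum(scores)
    return [s / norm for s in scores]
```

In this sketch the kernel width `sigma` controls how quickly a neighbour's weight decays with distance, so a pedestrian walking one metre away receives far more weight than one five metres away.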
# 大規模言語モデルのボラティリティのベンチマーク Benchmarking Large Language Model Volatility ( http://arxiv.org/abs/2311.15180v1 ) ライセンス: Link先を確認	Boyang Yu	(参考訳) 大規模言語モデル(LLM)からの非決定論的アウトプットの影響は,財務テキスト理解タスクにおいて十分に検討されていない。ニュース感情分析による米国株式市場への投資に関する説得力のあるケーススタディを通じて、文レベルの感情分類結果の実質的な変動を明らかにし、llm出力の生来のボラティリティを強調する。これらの不確実性は下流に流れ込み、ポートフォリオの構築とリターンに大きな変化をもたらした。言語モデルデコーダの温度パラメータを微調整すると、潜在的な対策が提示されるが、創造性を損なうことになる。同様に、複数の出力をアンサンブルすることは揮発性出力の効果を緩和するが、注目すべき計算投資を必要とする。本研究は,LLMの金融意思決定への統合の不確実性,特に非決定論的情報によって決定されるシナリオにおいて,不確実性に対処するための貴重な洞察を実践者に与えている。 The impact of non-deterministic outputs from Large Language Models (LLMs) is not well examined for financial text understanding tasks. Through a compelling case study on investing in the US equity market via news sentiment analysis, we uncover substantial variability in sentence-level sentiment classification results, underscoring the innate volatility of LLM outputs. These uncertainties cascade downstream, leading to more significant variations in portfolio construction and return. While tweaking the temperature parameter in the language model decoder presents a potential remedy, it comes at the expense of stifled creativity. Similarly, while ensembling multiple outputs mitigates the effect of volatile outputs, it demands a notable computational investment. This work furnishes practitioners with invaluable insights for adeptly navigating uncertainty in the integration of LLMs into financial decision-making, particularly in scenarios dictated by non-deterministic information.	翻訳日:2023-11-28 18:58:23 公開日:2023-11-26
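The ensembling remedy mentioned above can be sketched as repeated sampling followed by a majority vote. The `noisy_sentiment` function is a hypothetical stand-in for a non-deterministic LLM call (its keyword rule and `flip_prob` are invented for illustration), and the disagreement rate is one simple way to quantify output volatility rather than the paper's metric.

```python
import random
from collections import Counter


def noisy_sentiment(sentence, rng, flip_prob=0.3):
    """Hypothetical stand-in for a non-deterministic LLM sentiment call."""
    base = "positive" if "beats" in sentence else "negative"
    if rng.random() < flip_prob:
        return "negative" if base == "positive" else "positive"
    return base


def majority_vote(labels):
    """Return the most common label among repeated runs."""
    return Counter(labels).most_common(1)[0][0]


def disagreement_rate(labels):
    """Fraction of runs that differ from the majority label."""
    top = majority_vote(labels)
    return sum(label != top for label in labels) / len(labels)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
runs = [noisy_sentiment("ACME beats earnings estimates", rng) for _ in range(25)]
ensemble_label = majority_vote(runs)
```

The trade-off noted in the abstract is visible here: the vote is more stable than any single run, but it costs 25 model calls per sentence instead of one.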
# humanrecon: 幾何学的手がかりと物理前兆を用いた動的ヒトの神経再構築 HumanRecon: Neural Reconstruction of Dynamic Human Using Geometric Cues and Physical Priors ( http://arxiv.org/abs/2311.15171v1 ) ライセンス: Link先を確認	Junhui Yin, Wei Yin, Hao Chen, Xuqian Ren, Zhanyu Ma, Jun Guo, Yifan Liu	(参考訳) 近年の動的再建法は有望な再建結果を得た。これらの手法の多くは、明示的な幾何学的制約を考慮せずにRGB色監視のみに依存している。これにより、既存の人間の再構築技術は色に過度にフィットしやすくなり、幾何学的に固有の曖昧さ、特に疎らなマルチビュー設定を引き起こす。分子形状予測の分野での最近の進歩に触発されて、動的人間の再構築のための暗黙表現の学習において、推定深度と正規度の幾何学的制約を考える。幾何正規化として、信頼できるが明示的な監視情報を提供し、再構築品質を向上させる。また,視覚方向へのノイズの付加やヒト表面の密度の最大化など,いくつかの物理的に有益な先行技術も活用する。これらの先行は、光線に沿って描画された色が方向を見るために堅牢であることを保証するとともに、光線に沿って推定される密度の本来のあいまいさを低減する。実験の結果,人間固有の単分子推定器によって予測される深度と正常な手がかりは,効果的な監視信号を提供し,より正確な画像の描画を可能にすることが示された。最後に,提案する物理プライオリティにより,過剰フィッティングが著しく減少し,新規ビュー合成の全体的な品質が向上することを示す。私たちのコードは、~\href{https://github.com/PRIS-CV/HumanRecon}{https://github.com/PRIS-CV/HumanRecon}で利用可能です。 Recent methods for dynamic human reconstruction have attained promising reconstruction results. Most of these methods rely only on RGB color supervision without considering explicit geometric constraints. This leads to existing human reconstruction techniques being more prone to overfitting to color and causes geometrically inherent ambiguities, especially in the sparse multi-view setup. Motivated by recent advances in the field of monocular geometry prediction, we consider the geometric constraints of estimated depth and normals in the learning of neural implicit representation for dynamic human reconstruction. As a geometric regularization, this provides reliable yet explicit supervision information, and improves reconstruction quality. We also exploit several beneficial physical priors, such as adding noise into view direction and maximizing the density on the human surface. These priors ensure the color rendered along rays to be robust to view direction and reduce the inherent ambiguities of density estimated along rays. 
Experimental results demonstrate that depth and normal cues, predicted by human-specific monocular estimators, can provide effective supervision signals and render more accurate images. Finally, we also show that the proposed physical priors significantly reduce overfitting and improve the overall quality of novel view synthesis. Our code is available at https://github.com/PRIS-CV/HumanRecon.	翻訳日:2023-11-28 18:58:07 公開日:2023-11-26
# 配電系統における高インピーダンス故障位置推定のためのデータ駆動手法 A Data-Driven Approach for High-Impedance Fault Localization in Distribution Systems ( http://arxiv.org/abs/2311.15168v1 ) ライセンス: Link先を確認	Yuqi Zhou, Yuqing Dong and Rui Yang	(参考訳) 配電系統の信頼性の高い運用には,高精度で迅速な障害同定が不可欠である。送電網の他の故障とは異なり、hifは低故障電流のため従来の過電流リレーでは検出が極めて困難である。 HIFは様々な要因によって影響を受けるが、電圧電流特性は、システムが障害にどう反応するかを著しく示唆し、HIFを効果的にローカライズする機会を与える。本研究では,HIFイベントの識別のためのデータ駆動型手法を提案する。まず、電圧電流軌道の非線形性に取り組むため、分割関数で軌道を近似する最適化問題を定式化する。次に,すべてのセグメントの機能特徴を入力として収集し,サポートベクターマシンアプローチを用いて異なる場所でのhifを効率的に識別する。 IEEE 123-node test feederの数値的研究により,実時間HIF識別のための提案手法の有効性と精度が示された。 Accurate and quick identification of high-impedance faults is critical for the reliable operation of distribution systems. Unlike other faults in power grids, HIFs are very difficult to detect by conventional overcurrent relays due to the low fault current. Although HIFs can be affected by various factors, the voltage current characteristics can substantially imply how the system responds to the disturbance and thus provides opportunities to effectively localize HIFs. In this work, we propose a data-driven approach for the identification of HIF events. To tackle the nonlinearity of the voltage current trajectory, first, we formulate optimization problems to approximate the trajectory with piecewise functions. Then we collect the function features of all segments as inputs and use the support vector machine approach to efficiently identify HIFs at different locations. Numerical studies on the IEEE 123-node test feeder demonstrate the validity and accuracy of the proposed approach for real-time HIF identification.	翻訳日:2023-11-28 18:57:42 公開日:2023-11-26
# スライス・ツー・スライス・レジストレーションと再構成による自己監督型OCT画像 Self-supervised OCT Image Denoising with Slice-to-Slice Registration and Reconstruction ( http://arxiv.org/abs/2311.15167v1 ) ライセンス: Link先を確認	Shijie Li, Palaiologos Alexopoulos, Anse Vellappally, Ronald Zambrano, Wollstein Gadi, Guido Gerig	(参考訳) 強いスペックルノイズは、光コヒーレンストモグラフィー(OCT)イメージングに固有のものであり、臨床診断と疾患のモニタリングの進歩の鍵となる網膜構造の正確な定量化のための重要な障害である。構造保存ノイズ低減のための学習に基づく自己教師手法は,従来の手法よりも優れた性能を示したが,OCTイメージングではユニークな課題に直面している。コヒーレントAスキャンビームによるボクセルの高相関は、独立画素雑音の仮定に反する自己教師付き学習法の有効性を損なう。この独立性の仮定による既存モデルの限界を示す実験を行う。次に,OCT画像に特化して,スライス・バイ・スライス・トレーニングと登録用モジュールをひとつのネットワークに統合した,エンドツーエンドの自己教師型学習フレームワークを提案する。提案手法に対して広範なアブレーション研究を行った。前述した自己教師付き推論モデルとの比較により,提案フレームワークの性能が向上し,セグメンテーション性能と定量的解析への前処理ステップとして機能する可能性が示唆された。 Strong speckle noise is inherent to optical coherence tomography (OCT) imaging and represents a significant obstacle for accurate quantitative analysis of retinal structures which is key for advances in clinical diagnosis and monitoring of disease. Learning-based self-supervised methods for structure-preserving noise reduction have demonstrated superior performance over traditional methods but face unique challenges in OCT imaging. The high correlation of voxels generated by coherent A-scan beams undermines the efficacy of self-supervised learning methods as it violates the assumption of independent pixel noise. We conduct experiments demonstrating limitations of existing models due to this independence assumption. We then introduce a new end-to-end self-supervised learning framework specifically tailored for OCT image denoising, integrating slice-by-slice training and registration modules into one network. An extensive ablation study is conducted for the proposed approach. Comparison to previously published self-supervised denoising models demonstrates improved performance of the proposed framework, potentially serving as a preprocessing step towards superior segmentation performance and quantitative analysis.	
翻訳日:2023-11-28 18:57:29 公開日:2023-11-26
# 混成分類器による精度・ロバスト性取引の軽減 Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off ( http://arxiv.org/abs/2311.15165v1 ) ライセンス: Link先を確認	Yatong Bai, Brendon G. Anderson, Somayeh Sojoudi	(参考訳) 機械学習モデルは、最近データ駆動制御システムで大きな成功を収めている。しかし、標準学習モデルは、高い性能と厳密な堅牢性保証を必要とする安全クリティカルなシステムの制御において克服されなければならない制限である精度・ロバスト性トレードオフに苦しむことが多い。本研究では,標準モデルから高い精度とロバストモデルから高いロバスト性を同時に継承する分類器を開発するため,近年の"局所偏り平滑化"法を基礎としている。具体的には、局所バイアススムーシングをマルチクラス設定に拡張し、定式化を一般化して標準ニューラルネットワークとロバストニューラルネットワークの出力を“混合”することで、パフォーマンスボトルネックを克服する。我々は、ロバストなベースモデルのロバスト性が証明可能であれば、閉じた形式の$\ell_p$半径内で、入力に対する変更や攻撃が混合分類器の誤分類をもたらすことはないことを証明する。さらに、CIFAR-10ベンチマークデータセット上で数値実験を行い、混合モデルが精度・損耗トレードオフを著しく改善することを確認した。 Machine learning models have recently found tremendous success in data-driven control systems. However, standard learning models often suffer from an accuracy-robustness trade-off, which is a limitation that must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we build upon the recent "locally biased smoothing" method to develop classifiers that simultaneously inherit high accuracy from standard models and high robustness from robust models. Specifically, we extend locally biased smoothing to the multi-class setting, and then overcome its performance bottleneck by generalizing the formulation to "mix" the outputs of a standard neural network and a robust neural network. We prove that when the robustness of the robust base model is certifiable, within a closed-form $\ell_p$ radius, no alteration or attack on an input can result in misclassification of the mixed classifier; the proposed model inherits the certified robustness. Moreover, we use numerical experiments on the CIFAR-10 benchmark dataset to verify that the mixed model noticeably improves the accuracy-robustness trade-off.	翻訳日:2023-11-28 18:57:09 公開日:2023-11-26
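The idea of "mixing" the outputs of a standard and a robust classifier can be sketched as a convex combination of their class probabilities. This is an illustrative assumption only: the fixed weight `alpha`, the mixing of softmax probabilities, and all names and logits below are hypothetical, and the paper's exact formulation and certified radius are not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_predictions(std_logits, robust_logits, alpha):
    """Convex combination of the two models' class probabilities.

    alpha = 0 recovers the standard model, alpha = 1 the robust model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_std = softmax(std_logits)
    p_rob = softmax(robust_logits)
    return [(1.0 - alpha) * s + alpha * r for s, r in zip(p_std, p_rob)]


def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical logits for one 3-class input.
std_logits = [2.0, 0.5, 0.1]     # standard model prefers class 0
robust_logits = [0.2, 1.5, 0.3]  # robust model prefers class 1
mixed = mix_predictions(std_logits, robust_logits, alpha=0.5)
print(mixed, predict(mixed))
```

Sweeping `alpha` between 0 and 1 moves the mixed classifier between the two base models, which is one simple way to explore an accuracy-robustness trade-off curve.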
# 非物理的擬似モードモデルと物理アンサンブルのモデル化 : 非マルコフ量子ノイズのシミュレーション、緩和、再構成 Modeling the unphysical pseudomode model with physical ensembles: simulation, mitigation, and restructuring of non-Markovian quantum noise ( http://arxiv.org/abs/2311.15240v1 ) ライセンス: Link先を確認	Mauro Cirio, Si Luo, Pengfei Liang, Franco Nori, Neill Lambert	(参考訳) ガウス環境が量子系に与える影響は、連続体を離散的な補助量子と古典的自由度の集合に効果的に置き換えることによって説明できる。これは、還元されたシステムダイナミクスを古典的にシミュレートするために使用できる擬モードモデルを定義する。ここでは、擬モードモデル自体のアナログまたはデジタル量子シミュレーションの潜在的な利点を、別の視点で検討し、分析する。表面的には、そのような直接的な実験的な実装は、一般に、有効自由度の非物理的性質のために不可能である。しかし,非物理的擬似モードモデルの効果は,補助調和モードと任意の確率的駆動場を含む物理系のアンサンブル上で測定結果を用いて再現できることを示した。これは測定データにおける不正確性に対する安定性によって効率が制限される補間手法を導入することで実現される。そのようなシミュレーションがいかに私たちを許すかを検討する。 (i)古典的シミュレーションに挑戦する体制における複雑な非摂動環境と非マルコフ環境の効果の正確な量子シミュレーションを行う。 (ii) 逆に、量子デバイスに存在する潜在的な非マルコフノイズを緩和し、 (iii) 所定の浴槽の温度などの性質のいくつかを再構成すること。 The influence of a Gaussian environment on a quantum system can be described by effectively replacing the continuum with a discrete set of ancillary quantum and classical degrees of freedom. This defines a pseudomode model which can be used to classically simulate the reduced system dynamics. Here, we consider an alternative point of view and analyze the potential benefits of an analog or digital quantum simulation of the pseudomode model itself. Superficially, such a direct experimental implementation is, in general, impossible due to the unphysical properties of the effective degrees of freedom involved. However, we show that the effects of the unphysical pseudomode model can still be reproduced using measurement results over an ensemble of physical systems involving ancillary harmonic modes and an optional stochastic driving field. This is done by introducing an extrapolation technique whose efficiency is limited by stability against imprecision in the measurement data. 
We examine how such a simulation would allow us to (i) perform accurate quantum simulation of the effects of complex non-perturbative and non-Markovian environments in regimes that are challenging for classical simulation, (ii) conversely, mitigate potential unwanted non-Markovian noise present in quantum devices, and (iii) restructure some of the properties of a given physical bath, such as its temperature.	翻訳日:2023-11-28 18:50:07 公開日:2023-11-26
# 一般関数近似を用いた強化学習のためのほぼ最適かつ低スイッチングアルゴリズム A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation ( http://arxiv.org/abs/2311.15238v1 ) ライセンス: Link先を確認	Heyang Zhao and Jiafan He and Quanquan Gu	(参考訳) 探索・活用のジレンマは、複雑なモデルクラスを持つ強化学習(RL)において中心的な課題となっている。本稿では,一般関数近似を用いたRLのための新しいアルゴリズム、Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB)を提案する。我々の主要なアルゴリズム設計は,(1)スイッチングコストを低く抑える一般的な決定論的ポリシー切り替え戦略,(2)注意深く制御された関数クラス複雑性を持つ単調値関数構造,(3)履歴軌跡を高いデータ効率で利用する分散重み付け回帰スキームである。 MQL-UCBは、$K$が十分大きいときに$\tilde{O}(d\sqrt{HK})$のミニマックス最適リグレットを達成し、ポリシー切り替えコストはほぼ最適な$\tilde{O}(dH)$である。ここで、$d$は関数クラスのeluder次元、$H$は計画ホライズン、$K$はエピソード数である。本研究は、非線形関数近似を用いた、証明可能な形でサンプル効率とデプロイメント効率のよいQ-learningの設計に光を当てる。 The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB), for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large and near-optimal policy switching cost of $\tilde{O}(dH)$, with $d$ being the eluder dimension of the function class, $H$ being the planning horizon, and $K$ being the number of episodes. Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.	翻訳日:2023-11-28 18:49:31 公開日:2023-11-26
# SARオブジェクト分類のための自己知識蒸留に基づく二重逆正規化ネットワーク Double Reverse Regularization Network Based on Self-Knowledge Distillation for SAR Object Classification ( http://arxiv.org/abs/2311.15231v1 ) ライセンス: Link先を確認	Bo Xu, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng	(参考訳) 現在の合成開口レーダ(SAR)オブジェクト分類では、制限データセット(few-shot)とノイズデータによる深刻な過剰フィッティングの問題が大きな課題の1つとなっている。本稿では,知識蒸留の利点を学習ラベル平滑化正規化として考慮し,自己知識蒸留に基づく新しい二重逆正規化ネットワーク(DRRNet-SKD)を提案する。具体的には, 蒸留重量が蒸留プロセスに与える影響を探索することで, オフラインとオンラインの蒸留を相補的に組み合わせることで, 効果的な正則化ネットワークを実現するために, 二重逆思考を採用することに着想を得た。次に、適応重み付け(AWA)モジュールは、ネットワーク性能に基づいて2つの逆転重みを適応的に割り当てるように設計され、学生ネットワークが両方の教師の恩恵を受けることができる。 OpenSARShipとFUSAR-Shipでの実験結果から、DRRNet-SKDは古典的なCNNにおいて顕著な性能向上を示し、最先端の自己知識蒸留法を上回ることが示された。 In current synthetic aperture radar (SAR) object classification, one of the major challenges is the severe overfitting issue due to the limited dataset (few-shot) and noisy data. Considering the advantages of knowledge distillation as a learned label smoothing regularization, this paper proposes a novel Double Reverse Regularization Network based on Self-Knowledge Distillation (DRRNet-SKD). Specifically, through exploring the effect of distillation weight on the process of distillation, we are inspired to adopt the double reverse thought to implement an effective regularization network by combining offline and online distillation in a complementary way. Then, the Adaptive Weight Assignment (AWA) module is designed to adaptively assign two reverse-changing weights based on the network performance, allowing the student network to better benefit from both teachers. The experimental results on OpenSARShip and FUSAR-Ship demonstrate that DRRNet-SKD exhibits remarkable performance improvement on classical CNNs, outperforming state-of-the-art self-knowledge distillation methods.	翻訳日:2023-11-28 18:48:57 公開日:2023-11-26
# GAIA:ゼロショットトーキングアバター世代 GAIA: Zero-shot Talking Avatar Generation ( http://arxiv.org/abs/2311.15230v1 ) ライセンス: Link先を確認	Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian	(参考訳) ゼロショットトークアバター生成は、音声と1つのポートレート画像から自然なトークビデオを合成することを目的としている。従来の手法は、ワーピングに基づく運動表現や3次元モルファブルモデルといったドメイン固有のヒューリスティックに依存しており、これは生成されたアバターの自然性と多様性を制限する。本稿では,対話型アバター生成におけるドメインプライオリティを解消するgaia(generative ai for avatar)を紹介する。音声がアバターの動きのみを駆動するのに対し、アバターの外観と背景はビデオ全体を通して同じままであるという観察に照らして、アプローチを2つの段階に分けた。 1) 各フレームを動作及び外観表現に分解する。 2) 音声および参照ポートレート画像に条件付き動作シーケンスを生成する。大規模な高品質な音声アバターデータセットを収集し、異なるスケール(最大2Bパラメータ)でモデルをトレーニングします。 GAIAの優位性,スケーラビリティ,柔軟性を検証した実験結果 1) 結果のモデルは,自然性,多様性,リップシンク品質,視覚的品質の点で,従来のベースラインモデルを上回る。 2) より大きなモデルはより良い結果をもたらすので、フレームワークはスケーラブルです。 3) 汎用的で,制御可能な発話アバター生成やテキスト指示アバター生成など,さまざまなアプリケーションを可能にする。 Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech and a single portrait image. Previous methods have relied on domain-specific heuristics such as warping-based motion representation and 3D Morphable Models, which limit the naturalness and diversity of the generated avatars. In this work, we introduce GAIA (Generative AI for Avatar), which eliminates the domain priors in talking avatar generation. In light of the observation that the speech only drives the motion of the avatar while the appearance of the avatar and the background typically remain the same throughout the entire video, we divide our approach into two stages: 1) disentangling each frame into motion and appearance representations; 2) generating motion sequences conditioned on the speech and reference portrait image. We collect a large-scale high-quality talking avatar dataset and train the model on it with different scales (up to 2B parameters). 
Experimental results verify the superiority, scalability, and flexibility of GAIA as 1) the resulting model beats previous baseline models in terms of naturalness, diversity, lip-sync quality, and visual quality; 2) the framework is scalable since larger models yield better results; 3) it is general and enables different applications like controllable talking avatar generation and text-instructed avatar generation.	翻訳日:2023-11-28 18:48:23 公開日:2023-11-26
# 画像分類のための1ビットスーパービジョン:問題、解決策、そしてそれ以上 One-bit Supervision for Image Classification: Problem, Solution, and Beyond ( http://arxiv.org/abs/2311.15225v1 ) ライセンス: Link先を確認	Hengtong Hu, Lingxi Xie, Xinyue Hue, Richang Hong, Qi Tian	(参考訳) 本稿では,画像分類のための新しい学習セットであるone-bit supervisorを提案する。各サンプルの正確なラベルを用いてモデルをトレーニングする代わりに、我々の設定では、各サンプルのクラスラベルを予測し、推測が正しいかどうかを答えから学習することで、情報の1ビット(yes or no)を提供するモデルが必要である。この設定の興味深い特性は、アノテーションの負担が正確なラベルを提供するよりも大幅に軽減されていることである。 1ビット監視には2つのキーがあります。一推測精度及び推定精度の向上 (ii)不正確な推測をうまく利用すること。これらの目標を達成するために,多段階学習パラダイムを提案し,既成の半教師付き学習アルゴリズムに負のラベル抑圧を組み込む。理論解析により,1ビットアノテーションは全ビットアノテーションよりも効率が高く,本手法とアクティブラーニングの併用条件が示唆された。これにより,より効率的なトレーニングスケジュールが得られる自己教師付き学習アルゴリズムに,ワンビット監視フレームワークをさらに統合する。自己指導型学習を初期化に用いた場合、スクラッチのトレーニングと異なり、ハードサンプルマイニングとクラスバランスの両方が学習性能の向上に有効である。しかし、これら2つのフレームワークは、初期段階ではフルビットラベルが必要である。この負担を軽減すべく、教師なしドメイン適応を用いて初期モデルをトレーニングし、ターゲットデータセット上で純粋な1ビットアノテーションを実行する。複数のベンチマークにおいて、提案手法の学習効率は、フルビットの半教師付き監視手法よりも優れている。 This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification. Instead of training model using the accurate label of each sample, our setting requires the model to interact with the system by predicting the class label of each sample and learn from the answer whether the guess is correct, which provides one bit (yes or no) of information. An intriguing property of the setting is that the burden of annotation largely alleviates in comparison to offering the accurate label. There are two keys to one-bit supervision, which are (i) improving the guess accuracy and (ii) making good use of the incorrect guesses. To achieve these goals, we propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm. Theoretical analysis shows that one-bit annotation is more efficient than full-bit annotation in most cases and gives the conditions of combining our approach with active learning. 
Inspired by this, we further integrate the one-bit supervision framework into the self-supervised learning algorithm which yields an even more efficient training schedule. Different from training from scratch, when self-supervised learning is used for initialization, both hard example mining and class balance are verified effective in boosting the learning performance. However, these two frameworks still need full-bit labels in the initial stage. To cast off this burden, we utilize unsupervised domain adaptation to train the initial model and conduct pure one-bit annotations on the target dataset. In multiple benchmarks, the learning efficiency of the proposed approach surpasses that using full-bit, semi-supervised supervision.	翻訳日:2023-11-28 18:47:55 公開日:2023-11-26
# 為替取引における決定木心理的リスク評価 Decision Tree Psychological Risk Assessment in Currency Trading ( http://arxiv.org/abs/2311.15222v1 ) ライセンス: Link先を確認	Jai Pal	(参考訳) 本研究は、AI(AI)を通貨トレーディングの世界に統合することに焦点を当て、個人トレーダの慣用性に合わせたインテリジェントなパーソナルアシスタントとして機能するパーソナライズされたAIモデルの開発を実証する。この論文は、AIモデルがトレーダの履歴データ内のニュアンスドパターンを識別し、通貨取引における心理的リスクダイナミクスをより正確かつ洞察に富んだ評価を容易にすることを示唆している。 PRIは、トレーダーの心理的脆弱性を促進する市場の状況に応じて変動を経験するダイナミックな指標である。高度な技術を利用することで、決定木を分類し、木構造内の決定境界を明確にすることができる。ユーザの時系列取引エントリを組み込むことで、心理的リスクが高められた場合の臨界点の特定に適している。リアルタイムの計算の性質は、心理的リスクの差し迫った瞬間についてトレーダーにタイムリーな警告を提供するプロアクティブツールとしてのモデルの実用性を高める。この研究の意味は通貨取引の制限を超えて広がり、パーソナライズされたモデリングの法的な適用が効率的かつ戦略的アプローチとして現れる他の産業の領域に到達した。本稿では,最先端技術と人間心理学の複雑なニュアンスを交点として,動的・高圧環境における意思決定支援のための変容パラダイムを提案する。 This research paper focuses on the integration of Artificial Intelligence (AI) into the currency trading landscape, positing the development of personalized AI models, essentially functioning as intelligent personal assistants tailored to the idiosyncrasies of individual traders. The paper posits that AI models are capable of identifying nuanced patterns within the trader's historical data, facilitating a more accurate and insightful assessment of psychological risk dynamics in currency trading. The PRI is a dynamic metric that experiences fluctuations in response to market conditions that foster psychological fragility among traders. By employing sophisticated techniques, a classifying decision tree is crafted, enabling clearer decision-making boundaries within the tree structure. By incorporating the user's chronological trade entries, the model becomes adept at identifying critical junctures when psychological risks are heightened. The real-time nature of the calculations enhances the model's utility as a proactive tool, offering timely alerts to traders about impending moments of psychological risks. 
The implications of this research extend beyond the confines of currency trading, reaching into the realms of other industries where the judicious application of personalized modeling emerges as an efficient and strategic approach. This paper positions itself at the intersection of cutting-edge technology and the intricate nuances of human psychology, offering a transformative paradigm for decision making support in dynamic and high-pressure environments.	翻訳日:2023-11-28 18:47:18 公開日:2023-11-26
# 限られたサンプルによる位相検索の局所的景観 The Local Landscape of Phase Retrieval Under Limited Samples ( http://arxiv.org/abs/2311.15221v1 ) ライセンス: Link先を確認	Kaizhao Liu, Zihao Wang, Lei Wu	(参考訳) 本稿では,限られたサンプルの条件下における位相検索の局所的景観を詳細に解析する。本研究の目的は,高次元においてグローバルミニマを取り巻く良質な局所景観を保証するために必要な最小のサンプルサイズを明らかにすることである。 $n$ と $d$ はそれぞれサンプルサイズと入力次元を表す。まず、局所凸性を探究し、$n=o(d\log d)$ のとき、局所球のほとんどすべての固定点に対して、ヘッセン行列は、$d$が十分大きい限り負の固有値を持つ必要があることを確かめる。そのため、局所景観は高度に非凸である。次に、一点強凸性を考えると、$n=\omega(d)$ である限り、高い確率で、そのランドスケープは局所環状領域 $\{w\in\mathbb{R}^d: o_d(1)\leqslant \|w-w^*\|\leqslant c\}$ において一点強凸である。ここで $w^*$ は基底真理であり、$c$ は絶対定数である。これは、この領域の任意の点から初期化された勾配降下が指数関数的に速く$o_d(1)$-loss解に収束することを意味する。さらに、$n=o(d\log d)$ のとき、半径 $\widetilde\Theta\left(\sqrt{1/d}\right)$ が存在し、対応するより小さな局所球では一点凸性が破れることを示す。これは、一点凸性のみに依存することで、限られたサンプルの下での勾配降下に対して正確な$w^*$への収束を確立することができないことを示している。 In this paper, we provide a fine-grained analysis of the local landscape of phase retrieval under the regime with limited samples. Our aim is to ascertain the minimal sample size necessary to guarantee a benign local landscape surrounding global minima in high dimensions. Let $n$ and $d$ denote the sample size and input dimension, respectively. We first explore the local convexity and establish that when $n=o(d\log d)$, for almost every fixed point in the local ball, the Hessian matrix must have negative eigenvalues as long as $d$ is sufficiently large. Consequently, the local landscape is highly non-convex. We next consider the one-point strong convexity and show that as long as $n=\omega(d)$, with high probability, the landscape is one-point strongly convex in the local annulus: $\{w\in\mathbb{R}^d: o_d(1)\leqslant \|w-w^*\|\leqslant c\}$, where $w^*$ is the ground truth and $c$ is an absolute constant. This implies that gradient descent initialized from any point in this domain can converge to an $o_d(1)$-loss solution exponentially fast. 
Furthermore, we show that when $n=o(d\log d)$, there is a radius of $\widetilde\Theta\left(\sqrt{1/d}\right)$ such that one-point convexity breaks in the corresponding smaller local ball. This indicates that convergence to the exact $w^*$ cannot be established for gradient descent under limited samples by relying solely on one-point convexity.	翻訳日:2023-11-28 18:46:53 公開日:2023-11-26
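For readers unfamiliar with the terminology, the notion of one-point strong convexity used in this abstract can be written out as follows. The quartic loss $L$, the measurement vectors $x_i$, the step size $\eta$, and the constant $\mu$ are assumptions chosen for illustration (a common real-valued phase retrieval setup). The abstract itself does not state them.

```latex
% Assumed setup: measurements y_i = (x_i^\top w^*)^2, i = 1, \dots, n
L(w) = \frac{1}{4n} \sum_{i=1}^{n} \left( (x_i^\top w)^2 - y_i \right)^2

% One-point strong convexity toward w^* with parameter \mu > 0 on a region S:
\langle \nabla L(w),\, w - w^* \rangle \;\geq\; \mu \, \| w - w^* \|^2
\quad \text{for all } w \in S

% Consequence for one gradient step w^+ = w - \eta \nabla L(w), with w \in S:
\| w^+ - w^* \|^2
  = \| w - w^* \|^2 - 2\eta \langle \nabla L(w),\, w - w^* \rangle + \eta^2 \| \nabla L(w) \|^2
  \;\leq\; (1 - 2\eta\mu)\, \| w - w^* \|^2 + \eta^2 \| \nabla L(w) \|^2
```

When the gradient norm is suitably bounded and $\eta$ is small, the last inequality yields a geometric contraction of the distance to $w^*$. This is the usual mechanism behind exponential convergence claims of this kind, and it explains why a region where the inequality fails cannot be handled by this argument alone.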
# 量的分析と質的データに基づく株式市場予測のためのデータセット Dataset for Stock Market Forecasting Based on Quantitative Analysis and Qualitative Data ( http://arxiv.org/abs/2311.15218v1 ) ライセンス: Link先を確認	Sai Akash Bathini, Dagli Cihan	(参考訳) 機械学習の金融への応用は、株式市場の予測よりもよく知られたアプローチになっている。株式市場は揮発性が高く、全世界で毎分大量のデータが生成される。このデータから効果的なインテリジェンスを抽出することが重要である。しかし,数値ストックデータと定性的テキストデータとの協調は難しい課題である。本研究は,前例のない,技術的かつ基本的なデータと,ニュースアーカイブやテレビニュースキャプション,ラジオの書き起こし,ツイート,日々の金融新聞などから収集した感情を備えたデータセットを提供する。感情抽出に使われるテキストデータエントリは合計で140万以上である。データセットは、2018年1月から2022年12月までの8つの異なる企業の日次エントリと、Dow Jones Index全体で構成されている。モデル学習とデプロイの準備が整った、ホロスティック基本および技術データを提供する。ディープラーニングモデルの予測力は、提供されるトレーニングデータによって大きく決定される。このデータセットは、株式市場の予測に質的なインテリジェンスをグローバルに取り入れた研究の恩恵を受けるだろう。データセットはhttps://github.com/batking24/Huge-Stock-Datasetで公開されている。 The application of Machine learning to finance has become a familiar approach, even more so in stock market forecasting. The stock market is highly volatile and huge amounts of data are generated every minute globally. The extraction of effective intelligence from this data is of critical importance. However, a collaboration of numerical stock data with qualitative text data can be a challenging task. In this work, we accomplish this and provide an unprecedented, publicly available dataset with technical and fundamental data, sentiment that we gathered from News Archives, TV news captions, Radio Transcripts, Tweets, Daily financial newspapers, etc. The text data entries used for sentiment extraction total more than 1.4 Million. The dataset comprises of daily entries from January 2018 to December 2022 for 8 different companies and Dow Jones Index as a whole. Holistic Fundamental and Technical data is provided training ready for Model learning and deployment. The predictive power of deep learning models is highly determined by the training data provided. This dataset would be of benefit for research globally incorporating qualitative intelligence for stock market forecasting. 
The dataset is made available at https://github.com/batking24/Huge-Stock-Dataset.	翻訳日:2023-11-28 18:46:16 公開日:2023-11-26
# 物理インフォームドグラフ学習による大規模単位コミットメント問題の解法 Solve Large-scale Unit Commitment Problems by Physics-informed Graph Learning ( http://arxiv.org/abs/2311.15216v1 ) ライセンス: Link先を確認	Jingtao Qin, Nanpeng Yu	(参考訳) 単位コミットメント(UC)問題は一般的に混合整数プログラム(MIP)として定式化され、分枝限定法(B&B)で解決される。グラフニューラルネットワーク(GNN)の最近の進歩により、最新のMIPソルバにおけるB&Bアルゴリズムを、ダイビングと分岐の学習によって強化することができる。 MIP問題に対処する既存のGNNモデルは、その多くが数学的定式化から構築されており、大規模なUC問題を扱う際に計算コストがかかる。本稿では,電力系統の様々な構成要素の基盤的特徴を活かし,高品質な変数割り当てを求めるニューラルダイビングのための物理情報に基づく階層型グラフ畳み込みネットワーク(PI-GCN)を提案する。さらに,MIPモデルに基づくグラフ畳み込みネットワーク(MB-GCN)をニューラル分岐に適用し,B&Bツリーの各ノードで分岐する最適な変数を選択する。最後に、ニューラルダイビングとニューラル分岐を現代のMIPソルバに統合し、大規模UC問題用に設計された新しいニューラルMIPソルバを確立する。数値実験により、PI-GCNはニューラルダイビングのベースラインMB-GCNよりも性能とスケーラビリティが優れていることが示されている。さらに,提案するニューラルダイビングモデルとベースラインニューラル分岐モデルを組み合わせた場合,ニューラルMIPソルバは運用コストが最も低く,最新のMIPソルバよりも優れた性能を発揮する。 Unit commitment (UC) problems are typically formulated as mixed-integer programs (MIP) and solved by the branch-and-bound (B&B) scheme. The recent advances in graph neural networks (GNN) make it possible to enhance the B&B algorithm in modern MIP solvers by learning to dive and branch. Existing GNN models that tackle MIP problems are mostly constructed from mathematical formulation, which is computationally expensive when dealing with large-scale UC problems. In this paper, we propose a physics-informed hierarchical graph convolutional network (PI-GCN) for neural diving that leverages the underlying features of various components of power systems to find high-quality variable assignments. Furthermore, we adopt the MIP model-based graph convolutional network (MB-GCN) for neural branching to select the optimal variables for branching at each node of the B&B tree. Finally, we integrate neural diving and neural branching into a modern MIP solver to establish a novel neural MIP solver designed for large-scale UC problems. Numerical studies show that PI-GCN has better performance and scalability than the baseline MB-GCN on neural diving. 
Moreover, the neural MIP solver yields the lowest operational cost and outperforms a modern MIP solver for all testing days after combining it with our proposed neural diving model and the baseline neural branching model.	翻訳日:2023-11-28 18:45:59 公開日:2023-11-26
# 隣り合う階層初期化を持つ新しい正規化カットソルバー A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization ( http://arxiv.org/abs/2311.15214v1 ) ライセンス: Link先を確認	Feiping Nie, Jitao Lu, Danyang Wu, Rong Wang, Xuelong Li	(参考訳) 正規化カット(N-Cut)は、スペクトルクラスタリングの有名なモデルである。従来のN-Cutソルバは2段階である。 1)正規化ラプラシアン行列の連続スペクトル埋め込みの計算 2)K$-meansまたはスペクトル回転による離散化。しかしこのパラダイムは2つの重大な問題をもたらします 1) 2段階法は元の問題の緩和版を解くため、元のN-Cut問題に対して良い解を得ることはできない。 2) 緩和された問題を解決するには,$\mathcal{o}(n^3)$ の時間複雑性 (n$ はノード数) を持つ固有値分解が必要である。この問題を解決するために,有名な座標降下法に基づく新しいN-Cut解法を提案する。バニラ座標降下法にも$\mathcal{o}(n^3)$ の時間複雑性があるので、時間複雑性を$\mathcal{o}(\|e\|)$ (\|e\|$ is the number of edges) に減らすための様々な加速戦略を設計する。クラスタリングに不確実性をもたらすランダム初期化への依存を避けるため,決定論的アウトプットを与える効率的な初期化手法を提案する。いくつかのベンチマークデータセットに対する大規模な実験により、提案手法は従来の解法と比較してクラスタリング性能が向上する一方、N-Cutの目的値が大きいことが示されている。 Normalized-Cut (N-Cut) is a famous model of spectral clustering. The traditional N-Cut solvers are two-stage: 1) calculating the continuous spectral embedding of normalized Laplacian matrix; 2) discretization via $K$-means or spectral rotation. However, this paradigm brings two vital problems: 1) two-stage methods solve a relaxed version of the original problem, so they cannot obtain good solutions for the original N-Cut problem; 2) solving the relaxed problem requires eigenvalue decomposition, which has $\mathcal{O}(n^3)$ time complexity ($n$ is the number of nodes). To address the problems, we propose a novel N-Cut solver designed based on the famous coordinate descent method. Since the vanilla coordinate descent method also has $\mathcal{O}(n^3)$ time complexity, we design various accelerating strategies to reduce the time complexity to $\mathcal{O}(\|E\|)$ ($\|E\|$ is the number of edges). To avoid reliance on random initialization which brings uncertainties to clustering, we propose an efficient initialization method that gives deterministic outputs. 
Extensive experiments on several benchmark datasets demonstrate that the proposed solver can obtain larger objective values of N-Cut, meanwhile achieving better clustering performance compared to traditional solvers.	翻訳日:2023-11-28 18:45:33 公開日:2023-11-26
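To make the N-Cut objective concrete, the following sketch evaluates a hard partition of a small weighted graph. It is not the coordinate descent solver from the paper. The function name `ncut_terms` and the toy graph are hypothetical. The abstract's mention of "larger objective values" suggests a maximization (association) form, which is assumed here alongside the classical cut form.

```python
def ncut_terms(adjacency, labels, num_clusters):
    """Return (association_sum, cut_sum) for a hard partition.

    association_sum = sum_k assoc(A_k, A_k) / vol(A_k)   (larger is better)
    cut_sum         = sum_k cut(A_k, rest)  / vol(A_k)   (smaller is better)
    Every cluster must have nonzero volume, otherwise a division by zero occurs.
    """
    n = len(adjacency)
    assoc = [0.0] * num_clusters
    vol = [0.0] * num_clusters
    for i in range(n):
        for j in range(n):
            w = adjacency[i][j]
            vol[labels[i]] += w              # vol(A_k): total degree of cluster k
            if labels[i] == labels[j]:
                assoc[labels[i]] += w        # weight staying inside cluster k
    association_sum = sum(a / v for a, v in zip(assoc, vol))
    cut_sum = sum((v - a) / v for a, v in zip(assoc, vol))
    return association_sum, cut_sum


def toy_graph():
    """Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = [[0.0] * 6 for _ in range(6)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    return adj


good = ncut_terms(toy_graph(), [0, 0, 0, 1, 1, 1], 2)
bad = ncut_terms(toy_graph(), [0, 1, 0, 1, 0, 1], 2)
print(good, bad)
```

For any partition with nonzero cluster volumes the two sums add up to the number of clusters, so maximizing the association form and minimizing the cut form are equivalent. This double loop costs $\mathcal{O}(n^2)$ on a dense matrix and is meant only to show the objective, not an efficient solver.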
# 気胸分節に対する不確かさを伴う解剖学的制約の緩和 Leveraging Anatomical Constraints with Uncertainty for Pneumothorax Segmentation ( http://arxiv.org/abs/2311.15213v1 ) ライセンス: Link先を確認	Han Yuan, Chuan Hong, Nguyen Tuan Anh Tran, Xinxing Xu, Nan Liu	(参考訳) 気胸は胸腔に異常な空気の蓄積(肺と胸壁の間の潜在的な空間)によって引き起こされる医学上の緊急事態である。 2D胸部X線写真では胸腔内および縦隔外側に気胸を認め,この領域を「lung+ space」と呼ぶ。深層学習(DL)は胸部X線写真における気胸病変の分画にますます利用されているが,既存のDLモデルの多くはエンドツーエンドアプローチを採用している。これらのモデルは胸部x線写真を直接臨床医に注釈された病変領域にマッピングし、気胸が本質的に位置に敏感であるという重要な領域知識を無視することが多い。 2次元胸部x線写真における気胸分画のdlモデルトレーニング中に肺+空間を制約として組み込む新しいアプローチを提案する。追加アノテーションの必要性を回避し,対象タスクにおける潜在的なラベルリークを防止するために,外部データセットと肺分節補助タスクを利用する。このアプローチは胸部X線写真ごとに肺+空間の特定の制約を生成する。さらに,補助データセットと対象データセット間のドメインシフトに起因する信頼できない制約を排除するために,判別器を組み込んだ。その結果,平均性能は4.6%,3.6%,3.3%向上し,iou(intersection over union),dsc(dice similarity coefficient)およびhd(hausdorff distance)が改善した。本研究は, 気胸の部位特異性に関する医学領域知識を取り入れ, dl-based lesion segmentationを増強する意義を強調する。 Pneumothorax is a medical emergency caused by abnormal accumulation of air in the pleural space - the potential space between the lungs and chest wall. On 2D chest radiographs, pneumothorax occurs within the thoracic cavity and outside of the mediastinum and we refer to this area as "lung+ space". While deep learning (DL) has increasingly been utilized to segment pneumothorax lesions in chest radiographs, many existing DL models employ an end-to-end approach. These models directly map chest radiographs to clinician-annotated lesion areas, often neglecting the vital domain knowledge that pneumothorax is inherently location-sensitive. We propose a novel approach that incorporates the lung+ space as a constraint during DL model training for pneumothorax segmentation on 2D chest radiographs. To circumvent the need for additional annotations and to prevent potential label leakage on the target task, our method utilizes external datasets and an auxiliary task of lung segmentation. This approach generates a specific constraint of lung+ space for each chest radiograph. 
Furthermore, we have incorporated a discriminator to eliminate unreliable constraints caused by the domain shift between the auxiliary and target datasets. Our results demonstrated significant improvements, with average performance gains of 4.6%, 3.6%, and 3.3% regarding Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and Hausdorff Distance (HD). Our research underscores the significance of incorporating medical domain knowledge about the location-specific nature of pneumothorax to enhance DL-based lesion segmentation.	翻訳日:2023-11-28 18:45:09 公開日:2023-11-26
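Two of the reported metrics, IoU and DSC, are simple to compute for binary masks, and a small sketch may help when interpreting the percentage gains. The function name `iou_and_dice`, the nested-list mask format, and the toy masks are assumptions made for illustration. The Hausdorff distance and the paper's training-time constraint are not covered.

```python
def iou_and_dice(pred, target):
    """IoU and Dice for two binary masks given as equally sized nested lists."""
    inter = union = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
            pred_sum += 1 if p else 0
            target_sum += 1 if t else 0
    # Convention (an assumption): two empty masks count as a perfect match.
    iou = inter / union if union else 1.0
    denom = pred_sum + target_sum
    dice = 2 * inter / denom if denom else 1.0
    return iou, dice


# Hypothetical 2x3 masks: 1 marks a pixel labeled as lesion.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, dice = iou_and_dice(pred, target)
print(iou, dice)
```

For the same pair of masks Dice is never smaller than IoU, since Dice equals 2·IoU / (1 + IoU). Gains reported on the two metrics are therefore related rather than independent.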
# OpenPerf: A Benchmarking Framework for the Sustainable Development of the Open-Source Ecosystem ( http://arxiv.org/abs/2311.15212v1 ) License: see link	Fenglin Bi, Fanyu Han, Shengyu Zhao, Jinlu Li, Yanbin Zhang, Wei Wang	Benchmarking involves designing scientific test methods, tools, and frameworks to quantitatively and comparably assess specific performance indicators of certain test subjects. With the development of artificial intelligence, AI benchmarking datasets such as ImageNet and DataPerf have gradually become consensus standards in both academic and industrial fields. However, constructing a benchmarking framework remains a significant challenge in the open-source domain due to the diverse range of data types, the wide array of research issues, and the intricate nature of collaboration networks. This paper introduces OpenPerf, a benchmarking framework designed for the sustainable development of the open-source ecosystem.
This framework defines 9 benchmarking tasks in open-source research, encompassing 3 data types (time series, text, and graphics), and addresses 6 research problems: regression, classification, recommendation, ranking, network building, and anomaly detection. Based on the above tasks, we implemented 3 data science task benchmarks, 2 index-based benchmarks, and 1 standard benchmark. Notably, the index-based benchmarks have been adopted by the China Electronics Standardization Institute as evaluation criteria for open-source community governance. Additionally, we have developed a comprehensive toolkit for OpenPerf, which not only offers robust data management, tool integration, and user interface capabilities but also adopts a Benchmarking-as-a-Service (BaaS) model to serve academic institutions, industries, and foundations. Through its application in renowned companies and institutions such as Alibaba, Ant Group, and East China Normal University, we have validated OpenPerf's pivotal role in the healthy evolution of the open-source ecosystem.	Translated: 2023-11-28 18:44:39 Published: 2023-11-26
# Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation ( http://arxiv.org/abs/2311.15211v1 ) License: see link	Haoyi Wu, Kewei Tu	Syntactic structures used to play a vital role in natural language processing (NLP), but since the deep learning revolution, NLP has been gradually dominated by neural models that do not consider syntactic structures in their design. One vastly successful class of neural models is transformers. When used as an encoder, a transformer produces contextual representations of the words in the input sentence. In this work, we propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective. Specifically, we design a conditional random field that models discrete latent representations of all words in a sentence as well as dependency arcs between them, and we use mean field variational inference for approximate inference. Strikingly, we find that the computation graph of our model resembles transformers, with correspondences between dependencies and self-attention and between distributions over latent representations and contextual embeddings of words. Experiments show that our model performs competitively with transformers on small to medium sized datasets.
We hope that our work could help bridge the gap between traditional syntactic and probabilistic approaches and cutting-edge neural approaches to NLP, and inspire more linguistically-principled neural approaches in the future.	Translated: 2023-11-28 18:44:12 Published: 2023-11-26
# Topology combined machine learning for consonant recognition ( http://arxiv.org/abs/2311.15210v1 ) License: see link	Pingyao Feng, Siheng Yi, Qingrui Qu, Zhiwang Yu, Yifei Zhu	In artificial-intelligence-aided signal processing, existing deep learning models often exhibit a black-box structure, and their validity and comprehensibility remain elusive. The integration of topological methods, despite its relatively nascent application, serves a dual purpose of making models more interpretable as well as extracting structural information from time-dependent data for smarter learning. Here, we provide a transparent and broadly applicable methodology, TopCap, to capture the most salient topological features inherent in time series for machine learning. Rooted in high-dimensional ambient spaces, TopCap is capable of capturing features rarely detected in datasets with low intrinsic dimensionality. Applying time-delay embedding and persistent homology, we obtain descriptors which encapsulate information such as the vibration of a time series, in terms of its variability of frequency, amplitude, and average line, demonstrated with simulated data. This information is then vectorised and fed into multiple machine learning algorithms such as k-nearest neighbours and support vector machines.
Notably, in classifying voiced and voiceless consonants, TopCap achieves an accuracy exceeding 96% and is geared towards designing topological convolutional layers for deep learning of speech and audio signals.	Translated: 2023-11-28 18:43:49 Published: 2023-11-26
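The time-delay embedding step mentioned in the TopCap abstract can be sketched as follows. This is a generic illustration of the technique, not the authors' implementation; the function name and the `dimension` and `delay` parameters are assumptions, and the persistent homology step that follows in the paper's pipeline is not shown because it would need a dedicated topology library.

```python
def time_delay_embedding(series, dimension, delay):
    """Map a 1D series to points (x[t], x[t+delay], ..., x[t+(dimension-1)*delay])."""
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    # Number of samples spanned by one embedded point, minus one.
    span = (dimension - 1) * delay
    return [
        tuple(series[t + k * delay] for k in range(dimension))
        for t in range(len(series) - span)
    ]


# Toy series (hypothetical data): six samples embedded in 3 dimensions with delay 2.
points = time_delay_embedding([0, 1, 2, 3, 4, 5], dimension=3, delay=2)
print(points)
```

A periodic signal embedded this way traces a closed loop in the embedding space, which is the kind of structure persistent homology can then detect.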
# See and Think: Embodied Agent in Virtual Environment ( http://arxiv.org/abs/2311.15209v1 ) License: see link	Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang	Large language models (LLMs) have achieved impressive progress on several open-world tasks. Recently, using LLMs to build embodied agents has been a hotspot. In this paper, we propose STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE consists of three key components: vision perception, language instruction, and code action. Vision perception involves the interpretation of visual information in the environment, which is then integrated into the LLMs component with agent state and task instruction. Language instruction is responsible for iterative reasoning and decomposing complex tasks into manageable guidelines. Code action generates executable skill actions based on retrieval in a skill database, enabling the agent to interact effectively within the Minecraft environment. We also collect the STEVE-21K dataset, which includes 600$+$ vision-environment pairs, 20K knowledge question-answering pairs, and 200$+$ skill-code pairs. We conduct continuous block search, knowledge question answering, and tech tree mastery to evaluate the performance.
Extensive experiments show that STEVE is up to $1.5 \times$ faster at unlocking key tech trees and up to $2.5 \times$ quicker in block search tasks compared to previous state-of-the-art methods.	Translated: 2023-11-28 18:43:27 Published: 2023-11-26
# Student's Interests Related to Web and Mobile Technologies Study ( http://arxiv.org/abs/2311.15293v1 ) License: see link	Manuela Petrescu, Adrian Sterca and Ioan Badarinza	We explore in this paper the interests and challenges of students regarding web and mobile technologies. Our study is based on a survey among undergraduate students who attend a Web Programming course. In particular, we study the challenges students have in following a successful career in web or mobile development, and we have found that the most important one is the large effort required for keeping up to date with the fast-changing web and mobile technologies. Overall, the attitude of the surveyed undergraduate students towards web development and mobile development is rather positive, as more than 60% of them said that they are interested in a career in web or mobile development. We also found out that most of them prefer working on back-end web technologies. As for the specific web technologies students are interested in, they are highly varied. Overall, our study provides valuable insights into the interests and challenges of students regarding web and mobile technologies, which can guide the development of effective teaching and learning approaches in this area.	Translated: 2023-11-28 18:36:00 Published: 2023-11-26
# Obj-NeRF: Extract Object NeRFs from Multi-view Images ( http://arxiv.org/abs/2311.15291v1 ) License: see link	Zhiyi Li, Lihe Ding, Tianfan Xue	Neural Radiance Fields (NeRFs) have demonstrated remarkable effectiveness in novel view synthesis within 3D environments. However, extracting a radiance field of one specific object from multi-view images encounters substantial challenges due to occlusion and background complexity, thereby presenting difficulties in downstream applications such as NeRF editing and 3D mesh extraction. To solve this problem, in this paper, we propose Obj-NeRF, a comprehensive pipeline that recovers the 3D geometry of a specific object from multi-view images using a single prompt. This method combines the 2D segmentation capabilities of the Segment Anything Model (SAM) with the 3D reconstruction ability of NeRF. Specifically, we first obtain multi-view segmentation for the indicated object using SAM with a single prompt. Then, we use the segmentation images to supervise NeRF construction, integrating several effective techniques. Additionally, we construct a large object-level NeRF dataset containing diverse objects, which can be useful in various downstream tasks. To demonstrate the practicality of our method, we also apply Obj-NeRF to various applications, including object removal, rotation, replacement, and recoloring.	Translated: 2023-11-28 18:35:44 Published: 2023-11-26
# Spatial and Temporal Characteristics of Freight Tours: A Data-Driven Exploratory Analysis ( http://arxiv.org/abs/2311.15287v1 ) License: see link	Ali Nadi, Lóránt Tavasszy, J.W.C. van Lint, Maaike Snelder	This paper presents a modeling approach to infer scheduling and routing patterns from digital freight transport activity data for different freight markets. We provide a complete modeling framework including a new discrete-continuous decision tree approach for extracting rules from the freight transport data. We apply these models to collected tour data for the Netherlands to understand departure time patterns and tour strategies, also allowing us to evaluate the effectiveness of the proposed algorithm. We find that spatial and temporal characteristics are important to capture the types of tours and time-of-day patterns of freight activities. Also, the empirical evidence indicates that carriers in most of the transport markets are sensitive to the level of congestion. Many of them adjust the type of tour, departure time, and the number of stops per tour when facing a congested zone. The results can be used by practitioners to get a better grip on transport markets and develop freight and traffic management measures.	Translated: 2023-11-28 18:35:26 Published: 2023-11-26
# Bias-Variance Trade-off in Physics-Informed Neural Networks with Randomized Smoothing for High-Dimensional PDEs ( http://arxiv.org/abs/2311.15283v1 ) License: see link	Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George Em Karniadakis, Kenji Kawaguchi	While physics-informed neural networks (PINNs) have been proven effective for low-dimensional partial differential equations (PDEs), the computational cost remains a hurdle in high-dimensional scenarios. This is particularly pronounced when computing high-order and high-dimensional derivatives in the physics-informed loss. Randomized Smoothing PINN (RS-PINN) introduces Gaussian noise for stochastic smoothing of the original neural net model, enabling Monte Carlo methods for derivative approximation and eliminating the need for costly auto-differentiation. Despite its computational efficiency in high dimensions, RS-PINN introduces biases in both loss and gradients, negatively impacting convergence, especially when coupled with stochastic gradient descent (SGD).
We present a comprehensive analysis of biases in RS-PINN, attributing them to the nonlinearity of the Mean Squared Error (MSE) loss and the PDE nonlinearity. We propose tailored bias correction techniques based on the order of PDE nonlinearity. The unbiased RS-PINN allows for a detailed examination of its pros and cons compared to the biased version. Specifically, the biased version has a lower variance and runs faster than the unbiased version, but it is less accurate due to the bias. To optimize the bias-variance trade-off, we combine the two approaches in a hybrid method that balances the rapid convergence of the biased version with the high accuracy of the unbiased version. In addition, we present an enhanced implementation of RS-PINN. Extensive experiments on diverse high-dimensional PDEs, including Fokker-Planck, HJB, viscous Burgers', Allen-Cahn, and Sine-Gordon equations, illustrate the bias-variance trade-off and highlight the effectiveness of the hybrid RS-PINN. Empirical guidelines are provided for selecting biased, unbiased, or hybrid versions, depending on the dimensionality and nonlinearity of the specific PDE problem.	Translated: 2023-11-28 18:35:11 Published: 2023-11-26
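The idea of replacing auto-differentiation with a Monte Carlo derivative of a Gaussian-smoothed function can be illustrated in one dimension. The sketch below uses the standard Gaussian identity d/dx E[f(x + σε)] = E[ε f(x + σε)] / σ with antithetic sampling; it is an assumption that this matches the spirit of RS-PINN, and the paper's exact estimator, network, and loss may differ. The function name and parameters are hypothetical.

```python
import random


def smoothed_derivative(f, x, sigma, num_samples, rng):
    """Monte Carlo estimate of d/dx E[f(x + sigma * eps)] with eps ~ N(0, 1).

    Uses E[eps * f(x + sigma * eps)] / sigma, evaluated on antithetic
    pairs (+eps, -eps) to reduce the variance of the estimate.
    """
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        total += eps * (f(x + sigma * eps) - f(x - sigma * eps)) / (2.0 * sigma)
    return total / num_samples


# For f(x) = x^2 the smoothed function is x^2 + sigma^2, so the exact derivative at x = 1 is 2.
rng = random.Random(0)
estimate = smoothed_derivative(lambda v: v * v, x=1.0, sigma=0.1, num_samples=20000, rng=rng)
print(estimate)
```

The estimate itself is unbiased, but squaring it inside an MSE-style loss is not: for any random estimate X, E[X^2] = (E[X])^2 + Var(X), so the squared residual is inflated by the estimator's variance. This is the kind of loss bias the abstract attributes to the nonlinearity of the MSE.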
# Efficient Rehearsal Free Zero Forgetting Continual Learning using Adaptive Weight Modulation ( http://arxiv.org/abs/2311.15276v1 ) License: see link	Yonatan Sverdlov, Shimon Ullman	Artificial neural networks encounter a notable challenge known as continual learning, which involves acquiring knowledge of multiple tasks over an extended period. This challenge arises due to the tendency of previously learned weights to be adjusted to suit the objectives of new tasks, resulting in a phenomenon called catastrophic forgetting. Most approaches to this problem seek a balance between maximizing performance on the new tasks and minimizing the forgetting of previous tasks. In contrast, our approach attempts to maximize the performance of the new task, while ensuring zero forgetting. This is accomplished by creating task-specific modulation parameters for each task. Only these are learnable parameters during the learning of consecutive tasks. Through comprehensive experimental evaluations, our model demonstrates superior performance in acquiring and retaining novel tasks that pose difficulties for other multi-task models. This emphasizes the efficacy of our approach in preventing catastrophic forgetting while accommodating the acquisition of new tasks.	Translated: 2023-11-28 18:34:35 Published: 2023-11-26
# An Intelligent-Detection Network for Handwritten Mathematical Expression Recognition ( http://arxiv.org/abs/2311.15273v1 ) License: see link	Ziqi Ye	The use of artificial intelligence technology in education is growing rapidly, with increasing attention being paid to handwritten mathematical expression recognition (HMER) by researchers. However, many existing methods for HMER may fail to accurately read formulas with complex structures, as the attention results can be inaccurate due to illegible handwriting or large variations in writing styles. Our proposed Intelligent-Detection Network (IDN) for HMER differs from traditional encoder-decoder methods by utilizing object detection techniques. Specifically, we have developed an enhanced YOLOv7 network that can accurately detect both digital and symbolic objects. The detection results are then integrated into the bidirectional gated recurrent unit (BiGRU) and the baseline symbol relationship tree (BSRT) to determine the relationships between symbols and numbers. The experiments demonstrate that the proposed method outperforms encoder-decoder networks in recognizing complex handwritten mathematical expressions. This is due to the precise detection of symbols and numbers. Our research has the potential to make valuable contributions to the field of HMER. This could be applied in various practical scenarios, such as assignment grading in schools and information entry of paper documents.	Translated: 2023-11-28 18:34:18 Published: 2023-11-26
# Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search ( http://arxiv.org/abs/2311.15269v1 ) License: see link	Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang	Increasingly complex and diverse deep neural network (DNN) models necessitate distributing the execution across multiple devices for training and inference tasks, and also require carefully planned schedules for performance. However, existing practices often rely on predefined schedules that may not fully exploit the benefits of emerging diverse model-aware operator placement strategies. Handcrafting high-efficiency schedules can be challenging due to the large and varying schedule space. This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies. To reduce search costs, Tessel leverages the insight that the most efficient schedules often exhibit a repetitive pattern (repetend) across different data inputs. This leads to a two-phase approach: repetend construction and schedule completion. By exploring schedules for various operator placement strategies, Tessel significantly improves both training and inference performance.
Experiments with representative DNN models demonstrate that Tessel achieves up to 5.5x training performance speedup and up to 38% inference latency reduction.	Translated: 2023-11-28 18:33:59 Published: 2023-11-26
# Unlearning via Sparse Representations ( http://arxiv.org/abs/2311.15268v1 ) License: see link	Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal	Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can prove costly and infeasible with existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the data set. We evaluate the proposed technique on the problem of class unlearning using three datasets: CIFAR-10, CIFAR-100, and LACUNA-100. We compare the proposed technique to SCRUB, a state-of-the-art approach which uses knowledge distillation for unlearning. Across all three datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost.	Translated: 2023-11-28 18:33:41 Published: 2023-11-26
# ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images ( http://arxiv.org/abs/2311.15264v1 ) License: see link	Nicolas Bourriez, Ihab Bendidi, Ethan Cohen, Gabriel Watkinson, Maxime Sanchez, Guillaume Bollot, Auguste Genovesio	Unlike color photography images, which are consistently encoded into RGB channels, biological images encompass various modalities, where the type of microscopy and the meaning of each channel varies with each experiment. Importantly, the number of channels can range from one to a dozen and their correlation is often comparatively much lower than RGB, as each of them brings specific information content. This aspect is largely overlooked by methods designed outside the bioimage field, and current solutions mostly focus on intra-channel spatial attention, often ignoring the relationship between channels, which is crucial in most biological applications. Importantly, the variable channel type and count prevent the projection of several experiments to a unified representation for large scale pre-training.
In this study, we propose ChAda-ViT, a novel Channel Adaptive Vision Transformer architecture employing an Inter-Channel Attention mechanism on images with an arbitrary number, order and type of channels. We also introduce IDRCell100k, a bioimage dataset with a rich set of 79 experiments covering 7 microscope modalities, with a multitude of channel types, and channel counts varying from 1 to 10 per experiment. Our proposed architecture, trained in a self-supervised manner, outperforms existing approaches in several biologically relevant downstream tasks. Additionally, it can be used to bridge the gap for the first time between assays with different microscopes, channel numbers or types by embedding various image and experimental modalities into a unified biological image representation. The latter should facilitate interdisciplinary studies and pave the way for better adoption of deep learning in biological image-based analyses. Code and Data to be released soon.	Translated: 2023-11-28 18:33:27 Published: 2023-11-26
# Revealing Cortical Layers In Histological Brain Images With Self-Supervised Graph Convolutional Networks Applied To Cell-Graphs ( http://arxiv.org/abs/2311.15262v1 ) License: see link	Valentina Vadori, Antonella Peruffo, Jean-Marie Graïc, Giulia Vadori, Livio Finos, Enrico Grisan	Identifying cerebral cortex layers is crucial for comparative studies of the cytoarchitecture aiming at providing insights into the relations between brain structure and function across species. The absence of extensive annotated datasets typically limits the adoption of machine learning approaches, leading to the manual delineation of cortical layers by neuroanatomists. We introduce a self-supervised approach to detect layers in 2D Nissl-stained histological slices of the cerebral cortex. It starts with the segmentation of individual cells and the creation of an attributed cell-graph. A self-supervised graph convolutional network generates cell embeddings that encode morphological and structural traits of the cellular environment and are exploited by a community detection algorithm for the final layering. Our method, the first self-supervised method of its kind with no spatial transcriptomics data involved, holds the potential to accelerate cytoarchitecture analyses, sidestepping annotation needs and advancing cross-species investigation.	Translated: 2023-11-28 18:33:02 Published: 2023-11-26
# NeuRAD: Neural Rendering for Autonomous Driving ( http://arxiv.org/abs/2311.15260v1 ) License: see link	Adam Tonderski, Carl Lindström, Georg Hess, William Ljungbergh, Lennart Svensson, Christoffer Petersson	Neural radiance fields (NeRFs) have gained popularity in the autonomous driving (AD) community. Recent methods show NeRFs' potential for closed-loop simulation, enabling testing of AD systems, and as an advanced training data augmentation technique. However, existing methods often require long training times, dense semantic supervision, or lack generalizability. This, in turn, hinders the application of NeRFs for AD at scale. In this paper, we propose NeuRAD, a robust novel view synthesis method tailored to dynamic AD data. Our method features simple network design, extensive sensor modeling for both camera and lidar -- including rolling shutter, beam divergence and ray dropping -- and is applicable to multiple datasets out of the box. We verify its performance on five popular AD datasets, achieving state-of-the-art performance across the board. To encourage further development, we openly release the NeuRAD source code. See https://github.com/georghess/NeuRAD .	Translated: 2023-11-28 18:32:45 Published: 2023-11-26
# Should I use metaverse or not? An investigation of university students behavioral intention to use MetaEducation technology ( http://arxiv.org/abs/2311.15251v1 ) License: see link	Nikolaos Misirlis, Yiannis Nikolaidis, Anna Sabidussi	Metaverse, a burgeoning technological trend that combines virtual and augmented reality, provides users with a fully digital environment where they can assume a virtual identity through a digital avatar and interact with others as if they were in the real world. Its applications span diverse domains such as economy (with its entry into the cryptocurrency field), finance, social life, working environment, healthcare, real estate, and education. During the COVID-19 and post-COVID-19 era, universities have rapidly adopted e-learning technologies to provide students with online access to learning content and platforms, rendering previous considerations on integrating such technologies or preparing institutional infrastructures virtually obsolete. In light of this context, the present study proposes a framework for analyzing university students' acceptance and intention to use metaverse technologies in education, drawing upon the Technology Acceptance Model (TAM).
The study aims to investigate the relationship between students' intention to use metaverse technologies in education, hereafter referred to as MetaEducation, and selected TAM constructs, including Attitude, Perceived Usefulness, Perceived Ease of Use, Self-efficacy of metaverse technologies in education, and Subjective Norm. Notably, Self-efficacy and Subjective Norm have a positive influence on Attitude and Perceived Usefulness, whereas Perceived Ease of Use does not exhibit a strong correlation with Attitude or Perceived Usefulness. The authors postulate that the weak associations between the study's constructs may be attributed to limited knowledge regarding MetaEducation and its potential benefits. Further investigation and analysis of the study's proposed model are warranted to comprehensively understand the complex dynamics involved in the acceptance and utilization of MetaEducation technologies in the realm of higher education	翻訳日:2023-11-28 18:32:31 公開日:2023-11-26
# 大規模言語モデルを用いたアルゴリズム進化 Algorithm Evolution Using Large Language Model ( http://arxiv.org/abs/2311.15249v1 ) ライセンス: Link先を確認	Fei Liu, Xialiang Tong, Mingxuan Yuan and Qingfu Zhang	(参考訳) 最適化は多くの現実のアプリケーションで見られる。特定の最適化問題に対して効果的なアルゴリズムを設計するには、通常、ドメイン知識とアルゴリズム設計スキルを持つ人間の専門家による多大な労力が必要となる。本稿では、Algorithm Evolution using Large Language Model(AEL)と呼ぶ新しい手法を提案する。AELは大規模言語モデル(LLM)を使用して、進化的フレームワークを通じて最適化アルゴリズムを自動生成する。 AELはモデルトレーニングなしでアルゴリズムレベルの進化を行う。人間の労力とドメイン知識への要求は大幅に削減できる。巡回セールスマン問題に対する構成的手法をテスト例として、AELにより得られた構成的アルゴリズムが、単純な手作りのヒューリスティックやLLM生成のヒューリスティックよりも優れていることを示す。他のドメイン深層学習モデルベースのアルゴリズムと比較して、これらの手法は様々な問題サイズにわたって優れたスケーラビリティを示す。 AELはまた、アルゴリズム内の探索演算子としてLLMを使用する以前の試みとは大きく異なる。 Optimization can be found in many real-life applications. Designing an effective algorithm for a specific optimization problem typically requires a tedious amount of effort from human experts with domain knowledge and algorithm design skills. In this paper, we propose a novel approach called Algorithm Evolution using Large Language Model (AEL). It utilizes a large language model (LLM) to automatically generate optimization algorithms via an evolutionary framework. AEL does algorithm-level evolution without model training. Human effort and requirements for domain knowledge can be significantly reduced. We take constructive methods for the traveling salesman problem as a test example and show that the constructive algorithm obtained by AEL outperforms simple hand-crafted and LLM-generated heuristics. Compared with other domain deep learning model-based algorithms, these methods exhibit excellent scalability across different problem sizes. AEL is also very different from previous attempts that utilize LLMs as search operators in algorithms.	翻訳日:2023-11-28 18:32:00 公開日:2023-11-26
# Few-Shot分布外検出のためのIDライクなプロンプト学習 ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection ( http://arxiv.org/abs/2311.15243v1 ) ライセンス: Link先を確認	Yichen Bai, Zongbo Han, Changqing Zhang, Bing Cao, Xiaoheng Jiang, Qinghua Hu	(参考訳) 分布外(OOD)検出法は、OODサンプルを識別するモデルをトレーニングするために補助的な外れ値を利用することが多い。しかし、これらの手法は、分布内(ID: in-distribution)データによく似た最も困難なOODサンプル、すなわちIDライクなサンプルを効果的に区別する点で依然として限界がある。そこで本研究では、IDサンプルの近傍空間からCLIPを用いてIDライクな外れ値を発見する新しいOOD検出フレームワークを提案する。次に、識別されたIDライクな外れ値を利用して、OOD検出のためのCLIPの能力をさらに活用するプロンプト学習フレームワークを提案する。強力なCLIPの恩恵により、他の補助的な外れ値データセットを用いることなく、少数のIDサンプルのみでモデルのプロンプトを学習できる。最も困難なIDライクなOODサンプルに着目し、CLIPの能力を巧みに活用することにより、実世界の様々な画像データセットにおいて優れた少数ショット学習性能を実現する(例えば、ImageNet-1kデータセットにおける4ショットOOD検出では、最先端手法と比較して平均FPR95を12.16%削減し、平均AUROCを2.76%改善した)。 Out-of-distribution (OOD) detection methods often exploit auxiliary outliers to train model identifying OOD samples, especially discovering challenging outliers from auxiliary outliers dataset to improve OOD detection. However, they may still face limitations in effectively distinguishing between the most challenging OOD samples that are much like in-distribution (ID) data, i.e., ID-like samples. To this end, we propose a novel OOD detection framework that discovers ID-like outliers using CLIP from the vicinity space of the ID samples, thus helping to identify these most challenging OOD samples. Then a prompt learning framework is proposed that utilizes the identified ID-like outliers to further leverage the capabilities of CLIP for OOD detection. Benefiting from the powerful CLIP, we only need a small number of ID samples to learn the prompts of the model without exposing other auxiliary outlier datasets. 
By focusing on the most challenging ID-like OOD samples and elegantly exploiting the capabilities of CLIP, our method achieves superior few-shot learning performance on various real-world image datasets (e.g., in 4-shot OOD detection on the ImageNet-1k dataset, our method reduces the average FPR95 by 12.16% and improves the average AUROC by 2.76%, compared to state-of-the-art methods).	翻訳日:2023-11-28 18:31:47 公開日:2023-11-26
# CalibFormer: トランスフォーマーによるLiDARカメラ自動校正ネットワーク CalibFormer: A Transformer-based Automatic LiDAR-Camera Calibration Network ( http://arxiv.org/abs/2311.15241v1 ) ライセンス: Link先を確認	Yuxuan Xiao, Yao Li, Chengzhen Meng, Xingchen Li and Yanyong Zhang	(参考訳) LiDARとカメラの融合は、自動運転の認識タスクにますます採用されている。このような融合に基づくアルゴリズムの性能は、センサーキャリブレーションの精度に大きく依存する。従来、多くのキャリブレーション手法は特定のターゲットや手動による介入を必要としていた。学習に基づくオンラインキャリブレーション手法も提案されているが、ほとんどの場合、その性能はかろうじて満足できる程度である。これらの手法は通常、疎な特徴マップ、信頼性の低いクロスモダリティの対応付け、不正確なキャリブレーションパラメータ回帰などの問題に苦しむ。本稿では、これらの問題に対処するために、自動LiDARカメラキャリブレーションのためのエンドツーエンドネットワークCalibFormerを提案する。高解像度表現を実現するために、カメラとLiDAR画像の特徴を複数層にわたって集約する。マルチヘッド相関モジュールを用いて特徴間の相関をより正確に識別する。最後に、相関情報から正確なキャリブレーションパラメータを推定するためにトランスフォーマーアーキテクチャを用いる。提案手法は、KITTIデータセット上で平均並進誤差$0.8751 \mathrm{cm}$、平均回転誤差$0.0562 ^{\circ}$を達成し、既存の最先端手法を上回り、高い頑健性、精度、汎化能力を示した。 The fusion of LiDARs and cameras has been increasingly adopted in autonomous driving for perception tasks. The performance of such fusion-based algorithms largely depends on the accuracy of sensor calibration, which is challenging due to the difficulty of identifying common features across different data modalities. Previously, many calibration methods involved specific targets and/or manual intervention, which has proven to be cumbersome and costly. Learning-based online calibration methods have been proposed, but their performance is barely satisfactory in most cases. These methods usually suffer from issues such as sparse feature maps, unreliable cross-modality association, inaccurate calibration parameter regression, etc. In this paper, to address these issues, we propose CalibFormer, an end-to-end network for automatic LiDAR-camera calibration. We aggregate multiple layers of camera and LiDAR image features to achieve high-resolution representations. A multi-head correlation module is utilized to identify correlations between features more accurately. Lastly, we employ transformer architectures to estimate accurate calibration parameters from the correlation information. 
Our method achieved a mean translation error of $0.8751 \mathrm{cm}$ and a mean rotation error of $0.0562 ^{\circ}$ on the KITTI dataset, surpassing existing state-of-the-art methods and demonstrating strong robustness, accuracy, and generalization capabilities.	翻訳日:2023-11-28 18:31:18 公開日:2023-11-26
# ASI:ディープラーニングモデル評価のための精度安定度指標 ASI: Accuracy-Stability Index for Evaluating Deep Learning Models ( http://arxiv.org/abs/2311.15332v1 ) ライセンス: Link先を確認	Wei Dai, Daniel Berleant	(参考訳) モデルの提案が絶えず続く深層学習研究の文脈では、効果的で効率的な評価の必要性が依然として最重要である。既存の手法は、しばしば精度の指標を重視し、安定性を見落としている。これを解決するために、深層学習モデルの評価に精度と安定性の両方を取り入れた定量的尺度であるASI(Accuracy-Stability Index)を提案する。実験結果によりASIの適用例を示し、ASI、平均精度、変動係数を可視化する3次元表面モデルを提示する。本稿は、深層学習モデルの定量的ベンチマーク指標という重要な課題に取り組み、深層学習モデルの精度と安定性を正確に評価するための新しいアプローチを提供する。最後に、潜在的な弱点について議論し、今後の研究の方向性を概説する。 In the context of deep learning research, where model introductions continually occur, the need for effective and efficient evaluation remains paramount. Existing methods often emphasize accuracy metrics, overlooking stability. To address this, the paper introduces the Accuracy-Stability Index (ASI), a quantitative measure incorporating both accuracy and stability for assessing deep learning models. Experimental results demonstrate the application of ASI, and a 3D surface model is presented for visualizing ASI, mean accuracy, and coefficient of variation. This paper addresses the important issue of quantitative benchmarking metrics for deep learning models, providing a new approach for accurately evaluating accuracy and stability of deep learning models. The paper concludes with discussions on potential weaknesses and outlines future research directions.	翻訳日:2023-11-28 18:24:00 公開日:2023-11-26
# 依存・忘却に関する時間ネットワークの複雑な凸領域上のオピニオンダイナミクスの展望 Perspective in Opinion Dynamics on Complex Convex Domains of Time Networks for Addiction, Forgetting ( http://arxiv.org/abs/2311.15318v1 ) ライセンス: Link先を確認	Yasuko Kawahata	(参考訳) 本稿では、先行研究を改訂し、時空間スケールの変化を導入する。時間とともに忘却と依存の度合いが変化するA層とB層を含むモデルを提示する。また、ある条件下での層A, A', B, B'における依存と忘却の変化をモデル化する。さらに、忘却や依存を強化または妨害する挙動を持ち、保守的、洗脳的、あるいは解毒的で、フィルターバブルに陥りにくい意見クラスターの形成について論じるため、時間とともに忘却や依存を推奨、妨害、ブロック、あるいは扇動する新しいクラスターCとDを導入する。この導入により、時間と空間の2次元における意見の拡大、意見空間の発展状況、世論の拡大に関する仮説を検証できる。コンセンサス構築における課題を取り上げ、意見の動的な性質と、異議、不信、メディアの影響といった要素を考慮する必要性を強調する。本稿では、コンセンサス構築モデルに信頼、不信、メディアの影響を取り入れた拡張フレームワークを提案する。より深い洞察を得る方法として、ダイマー化を用いたネットワーク分析を導入する。この文脈で、ネットワーククラスタリング、メディアの影響、コンセンサス構築について論じる。ダイマーの位置と分布を分析し、ネットワークの構造とダイナミクスについて洞察を得る。ダイマータイリングは、物理学や社会学など、ネットワーク分析以外の様々な分野に応用されてきた。最後に、コンセンサス構築における多様な視点、ネットワーク分析、影響力のあるエンティティの重要性を強調して結論づける。また、複雑なネットワーク構造の理解を助けるトーラスベースの可視化も導入する。 This paper revises previous work and introduces changes in spatio-temporal scales. The paper presents a model that includes layers A and B with varying degrees of forgetting and dependence over time. We also model changes in dependence and forgetting in layers A, A', B, and B' under certain conditions. In addition, to discuss the formation of opinion clusters that have reinforcing or obstructive behaviors of forgetting and dependence and are conservative or brainwashing or detoxifying and less prone to filter bubbling, new clusters C and D that recommend, obstruct, block, or incite forgetting and dependence over time are introduced. This introduction allows us to test hypotheses regarding the expansion of opinions in two dimensions over time and space, the state of development of opinion space, and the expansion of public opinion. Challenges in consensus building will be highlighted, emphasizing the dynamic nature of opinions and the need to consider factors such as dissent, distrust, and media influence. 
The paper proposes an extended framework that incorporates trust, distrust, and media influence into the consensus building model. We introduce network analysis using dimerizing as a method to gain deeper insights. In this context, we discuss network clustering, media influence, and consensus building. The location and distribution of dimers will be analyzed to gain insight into the structure and dynamics of the network. Dimer tiling has been applied in various fields other than network analysis, such as physics and sociology. The paper concludes by emphasizing the importance of diverse perspectives, network analysis, and influential entities in consensus building. It also introduces torus-based visualizations that aid in understanding complex network structures.	翻訳日:2023-11-28 18:23:41 公開日:2023-11-26
# 一般化グラフプロンプト:グラフ上の事前学習とダウンストリームタスクの統合に向けて Generalized Graph Prompt: Toward a Unification of Pre-Training and Downstream Tasks on Graphs ( http://arxiv.org/abs/2311.15317v1 ) ライセンス: Link先を確認	Xingtong Yu, Zhenghao Liu, Yuan Fang, Zemin Liu, Sihong Chen and Xinming Zhang	(参考訳) グラフニューラルネットワークはグラフ表現学習の強力なツールとして登場したが、そのパフォーマンスはタスク固有の監督に大きく依存している。ラベル付け要求を減らすため、"pre-train, prompt"パラダイムはますます一般的になっている。しかしながら、グラフ上でのプロンプトに関する既存の研究は限定的であり、異なる下流タスクにアピールするための普遍的な治療法が欠如している。本稿では,グラフの事前学習と促進のための新しいフレームワークであるGraphPromptを提案する。 graphpromptは、事前トレーニングとダウンストリームのタスクを共通のタスクテンプレートに統合するだけでなく、学習可能なプロンプトを使用して、事前トレーニングされたモデルから最も関連する知識をタスク固有の方法で特定する。この2つのステージでGraphPromptをさらに強化するために、GraphPrompt+に2つの大きな拡張を加えました。まず、単純なリンク予測以上のグラフ事前学習タスクを一般化し、タスクテンプレートとの互換性を広げる。次に,事前学習したグラフエンコーダの各層に一連のプロンプトベクトルを組み込んだ,より一般化されたプロンプト設計を提案する。最後に、GraphPromptとGraphPrompt+を評価し分析するために、5つの公開データセットに関する広範な実験を行う。 Graph neural networks have emerged as a powerful tool for graph representation learning, but their performance heavily relies on abundant task-specific supervision. To reduce labeling requirement, the "pre-train, prompt" paradigms have become increasingly common. However, existing study of prompting on graphs is limited, lacking a universal treatment to appeal to different downstream tasks. In this paper, we propose GraphPrompt, a novel pre-training and prompting framework on graphs. GraphPrompt not only unifies pre-training and downstream tasks into a common task template but also employs a learnable prompt to assist a downstream task in locating the most relevant knowledge from the pre-trained model in a task-specific manner. To further enhance GraphPrompt in these two stages, we extend it into GraphPrompt+ with two major enhancements. First, we generalize several popular graph pre-training tasks beyond simple link prediction to broaden the compatibility with our task template. 
Second, we propose a more generalized prompt design that incorporates a series of prompt vectors within every layer of the pre-trained graph encoder, in order to capitalize on the hierarchical information across different layers beyond just the readout layer. Finally, we conduct extensive experiments on five public datasets to evaluate and analyze GraphPrompt and GraphPrompt+.	翻訳日:2023-11-28 18:22:55 公開日:2023-11-26
# 預言コモンセンス推論による共感・感情支援対話生成の促進 Enhancing Empathetic and Emotion Support Dialogue Generation with Prophetic Commonsense Inference ( http://arxiv.org/abs/2311.15316v1 ) ライセンス: Link先を確認	Lanrui Wang, Jiangnan Li, Chenxu Yang, Zheng Lin, Weiping Wang	(参考訳) 共感的および感情的支援の会話に対する人々の関心は大幅に高まっている。より敏感で理解力のある回答を提供するために、常識的な知識を活用することは、心理的側面や因果性をよりよく理解するための共通の戦略となっている。しかし、そのような常識推論は文脈外であり、今後の対話のテーマを予測できないため、一貫性や共感が欠如している。本稿では,この問題を解決するために,コモンセンス知識を推論する革新的なパラダイムである予言コモンセンス推論を提案する。対話の理解と常識的推論に大規模言語モデルの能力を活用することで,過去と将来の対話のギャップを埋めるために,可変モデルの訓練を行う。共感的ダイアログと感情支援会話に関する広範な実験により,対話エージェントと提案する予言的コモンセンス推論を併用することで,反応の質が著しく向上することが示された。 The interest in Empathetic and Emotional Support conversations among the public has significantly increased. To offer more sensitive and understanding responses, leveraging commonsense knowledge has become a common strategy to better understand psychological aspects and causality. However, such commonsense inferences can be out of context and unable to predict upcoming dialogue themes, resulting in responses that lack coherence and empathy. To remedy this issue, we present Prophetic Commonsense Inference, an innovative paradigm for inferring commonsense knowledge. By harnessing the capabilities of Large Language Models in understanding dialogue and making commonsense deductions, we train tunable models to bridge the gap between past and potential future dialogues. Extensive experiments conducted on EmpatheticDialogues and Emotion Support Conversation show that equipping dialogue agents with our proposed prophetic commonsense inference significantly enhances the quality of their responses.	翻訳日:2023-11-28 18:22:18 公開日:2023-11-26
# マルチパーティ量子和プロトコルのノイズロバスト性 Noise robustness of a multiparty quantum summation protocol ( http://arxiv.org/abs/2311.15314v1 ) ライセンス: Link先を確認	Antón Rodríguez Otero and Niels M. P. Neumann and Ward van der Schoot and Robert Wezeman	(参考訳) 量子コンピュータを量子ネットワークに接続することで、分散データセット上でセキュアに計算を行うなど、幅広い新しいアプリケーションが可能になる。しかし、近い将来の量子ネットワークにはノイズが多いため、プロトコルの正しさとセキュリティは保証されない。ノイズの影響を調べるために、不完全な共有エンタングル状態を用いたマルチパーティ総和プロトコルを考える。このプロトコルに対する脱分極ノイズと位相緩和ノイズの両方の影響と、確率分布に生じるノイズパターンを解析的に調べる。最後に、Shamirの秘密分散を用いて、プロトコルにおける信頼できる第三者の必要性を排除する。 Connecting quantum computers to a quantum network opens a wide array of new applications, such as securely performing computations on distributed data sets. Near-term quantum networks are noisy, however, and hence correctness and security of protocols are not guaranteed. To study the impact of noise, we consider a multiparty summation protocol with imperfect shared entangled states. We study analytically the impact of both depolarising and dephasing noise on this protocol and the noise patterns arising in the probability distributions. We conclude by eliminating the need for a trusted third party in the protocol using Shamir's secret sharing.	翻訳日:2023-11-28 18:21:37 公開日:2023-11-26
# 周波数依存ミラーを用いた散逸・分散キャビティ光学 Dissipative and dispersive cavity optomechanics with a frequency-dependent mirror ( http://arxiv.org/abs/2311.15311v1 ) ライセンス: Link先を確認	Juliette Monsel, Anastasiia Ciers, Sushanth Kini Manjeshwar, Witlef Wieczorek, Janine Splettstoesser	(参考訳) 光学マイクロキャビティは、光をサブ波長ボリュームに閉じ込めることで、光と機械運動の相互作用を著しく向上させることができる。しかし、これは光学損失率の増加のコストがかかる。したがって、マイクロキャビティベースの光機械システムは未解決のサイドバンド方式に置かれ、サイドバンドベースの地中冷却が防止される。このようなシステムにおける光損失を減らす経路は、キャビティミラー、すなわち機械共振器と相互作用する光モードを設計することである。本研究では,このような光学系の解析を行い,鏡の1つは周波数依存性が強く,つまり懸濁したファノミラーである。この光学力学系は、懸濁したファノミラーの運動と結合する2つの光学モードからなる。我々は、標準分散光機械結合と散逸結合の両方を含む量子結合モード記述を定式化する。線形状態におけるシステム力学のランゲヴィン方程式を解くことにより, 空洞が分解側バンド状態では無くとも, 室温から基底状態の冷却が可能であることを示すが, 強い光モード結合により有効なサイドバンド分解能を実現することができる。さらに, キャビティ出力スペクトルは, 機械的共振器のフォノン占有率を推定するために, 効果的なレーザデチューニングに関して適切に解析する必要があることがわかった。また, ファノミラーの特性を解析することにより, ファノ系マイクロキャビティにおける非線形量子光力学の展開を予測した。 An optomechanical microcavity can considerably enhance the interaction between light and mechanical motion by confining light to a sub-wavelength volume. However, this comes at the cost of an increased optical loss rate. Therefore, microcavity-based optomechanical systems are placed in the unresolved-sideband regime, preventing sideband-based ground-state cooling. A pathway to reduce optical loss in such systems is to engineer the cavity mirrors, i.e., the optical modes that interact with the mechanical resonator. In our work, we analyze such an optomechanical system, whereby one of the mirrors is strongly frequency-dependent, i.e., a suspended Fano mirror. This optomechanical system consists of two optical modes that couple to the motion of the suspended Fano mirror. We formulate a quantum-coupled-mode description that includes both the standard dispersive optomechanical coupling as well as dissipative coupling. 
We solve the Langevin equations of the system dynamics in the linear regime showing that ground-state cooling from room temperature can be achieved even if the cavity is per se not in the resolved-sideband regime, but achieves effective sideband resolution through strong optical mode coupling. Importantly, we find that the cavity output spectrum needs to be properly analyzed with respect to the effective laser detuning to infer the phonon occupation of the mechanical resonator. Our work also predicts how to reach the regime of nonlinear quantum optomechanics in a Fano-based microcavity by engineering the properties of the Fano mirror.	翻訳日:2023-11-28 18:21:23 公開日:2023-11-26
# 低コストゼロ知識証明によるセキュアで検証可能なデータコラボレーション Secure and Verifiable Data Collaboration with Low-Cost Zero-Knowledge Proofs ( http://arxiv.org/abs/2311.15310v1 ) ライセンス: Link先を確認	Yizheng Zhu, Yuncheng Wu, Zhaojing Luo, Beng Chin Ooi, Xiaokui Xiao	(参考訳) 組織は、データ分析のためのデータコラボレーションの価値をますます認識している。しかし、厳格なデータ保護法は生データの直接交換を禁じている。データコラボレーションを容易にするために、フェデレートラーニング(FL)が実現可能なソリューションとして登場し、複数のクライアントが、その生データの機密性を確保しつつ、中央サーバの監督下で機械学習(ML)モデルを協調的にトレーニングすることができる。しかし、既存の研究は2つの主要なリスクを明らかにしている。 (i)サーバがクライアントのアップロードした更新(つまりモデル勾配)から機密情報を推測し、クライアントの入力プライバシを侵害する可能性があること。 (ii)悪意のあるクライアントが不正な形式の更新をアップロードしてFLモデルを汚染し、入力整合性を損なうリスクがあること。近年の研究では、ゼロ知識証明(ZKP)を伴うセキュアアグリゲーションを利用して、FLにおける入力プライバシーと整合性を保証している。しかし、それらは効率が極めて低く、実際の配備には実用的でない。本稿では、入力プライバシーと整合性を同時に確保する、安全かつ検証可能なデータコラボレーションのための新規かつ高効率なソリューションRiseFLを提案する。第1に、ZKPの生成と検証のコストを大幅に削減する確率的整合性チェック手法を考案する。第2に、性能を向上させつつビザンチン頑健性を満たすハイブリッドなコミットメントスキームを設計する。第3に、提案手法のセキュリティ保証を理論的に証明する。合成データセットと実世界のデータセットに関する広範な実験は、我々のソリューションが有効であり、クライアントの計算と通信の両方において非常に効率的であることを示唆している。例えばRiseFLは、クライアント計算において、3つの最先端ベースラインであるACORN, RoFL, EIFFeLよりもそれぞれ最大28倍、53倍、164倍高速である。 Organizations are increasingly recognizing the value of data collaboration for data analytics purposes. Yet, stringent data protection laws prohibit the direct exchange of raw data. To facilitate data collaboration, federated learning (FL) emerges as a viable solution, which enables multiple clients to collaboratively train a machine learning (ML) model under the supervision of a central server while ensuring the confidentiality of their raw data. However, existing studies have unveiled two main risks: (i) the potential for the server to infer sensitive information from the client's uploaded updates (i.e., model gradients), compromising client input privacy, and (ii) the risk of malicious clients uploading malformed updates to poison the FL model, compromising input integrity. Recent works utilize secure aggregation with zero-knowledge proofs (ZKP) to guarantee input privacy and integrity in FL. 
Nevertheless, they suffer from extremely low efficiency and, thus, are impractical for real deployment. In this paper, we propose a novel and highly efficient solution RiseFL for secure and verifiable data collaboration, ensuring input privacy and integrity simultaneously. Firstly, we devise a probabilistic integrity check method that significantly reduces the cost of ZKP generation and verification. Secondly, we design a hybrid commitment scheme to satisfy Byzantine robustness with improved performance. Thirdly, we theoretically prove the security guarantee of the proposed solution. Extensive experiments on synthetic and real-world datasets suggest that our solution is effective and is highly efficient in both client computation and communication. For instance, RiseFL is up to 28x, 53x and 164x faster than three state-of-the-art baselines ACORN, RoFL and EIFFeL for the client computation.	翻訳日:2023-11-28 18:20:48 公開日:2023-11-26
# AV-Deepfake1M:大規模LLM駆動型オーディオビジュアルディープフェイクデータセット AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset ( http://arxiv.org/abs/2311.15308v1 ) ライセンス: Link先を確認	Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Kalin Stefanov	(参考訳) 非常にリアルなディープフェイクの視聴覚コンテンツの検出とローカライズは、最先端の手法でも困難である。この領域の研究のほとんどは高品質なディープフェイク画像やビデオの検出に重点を置いており、実際のビデオに埋め込まれた視聴覚操作の小さなセグメントをローカライズする問題に取り組む研究はわずかである。本研究では、このようなコンテンツ生成の過程を模倣し、AV-Deepfake1Mデータセットを提案する。データセットには、2K以上の被写体に対するコンテンツ駆動の (i)ビデオ操作、 (ii)音声操作、 (iii)視聴覚操作が含まれ、合計100万以上のビデオからなる。本稿では、提案するデータ生成パイプラインの詳細な記述と、生成されたデータの品質の厳密な分析を示す。最先端のディープフェイク検出およびローカライズ手法を用いた提案データセットの包括的なベンチマークは、従来のデータセットと比較して大幅な性能低下を示している。提案データセットは、次世代のディープフェイクローカライズ手法を構築する上で重要な役割を果たすだろう。データセットと関連するコードは https://github.com/ControlNet/AV-Deepfake1M で公開されている。 The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake images and videos, only a few works address the problem of the localization of small segments of audio-visual manipulations embedded in real videos. In this research, we emulate the process of such content generation and propose the AV-Deepfake1M dataset. The dataset contains content-driven (i) video manipulations, (ii) audio manipulations, and (iii) audio-visual manipulations for more than 2K subjects resulting in a total of more than 1M videos. The paper provides a thorough description of the proposed data generation pipeline accompanied by a rigorous analysis of the quality of the generated data. The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets. The proposed dataset will play a vital role in building the next-generation deepfake localization methods. 
The dataset and associated code are available at https://github.com/ControlNet/AV-Deepfake1M .	翻訳日:2023-11-28 18:20:19 公開日:2023-11-26
# スケッチビデオ合成 Sketch Video Synthesis ( http://arxiv.org/abs/2311.15306v1 ) ライセンス: Link先を確認	Yudian Zheng, Xiaodong Cun, Menghan Xia, Chi-Man Pun	(参考訳) 意味的な複雑さや高レベルの概念を理解することは画像スケッチ生成に不可欠であり、この課題はビデオの領域に適用されるとさらに困難になる。そこで本稿では、フレームごとのBézier曲線で表現されるビデオスケッチのための新しい最適化ベースのフレームワークを提案する。具体的には、まず各曲線の位置と幅をウォームアップするためのクロスフレームストローク初期化手法を提案する。次に、CLIP特徴に基づく意味的損失と、自己分解型2Dアトラスネットワークを用いて新たに設計された一貫性損失を利用して、これらの曲線の位置を最適化する。これらの設計要素に基づいて得られるスケッチビデオは、印象的な視覚的抽象化と時間的コヒーレンスを示す。さらに、スケッチ作成プロセスを通じてビデオをSVGラインに変換することにより、ティーザーの例に示すように、ビデオ合成を通じたスケッチベースのビデオ編集やビデオ落書き(video doodling)の応用が可能になる。 Understanding semantic intricacies and high-level concepts is essential in image sketch generation, and this challenge becomes even more formidable when applied to the domain of videos. To address this, we propose a novel optimization-based framework for sketching videos represented by the frame-wise Bézier curve. In detail, we first propose a cross-frame stroke initialization approach to warm up the location and the width of each curve. Then, we optimize the locations of these curves by utilizing a semantic loss based on CLIP features and a newly designed consistency loss using the self-decomposed 2D atlas network. Built upon these design elements, the resulting sketch video showcases impressive visual abstraction and temporal coherence. Furthermore, by transforming a video into SVG lines through the sketching process, our method unlocks applications in sketch-based video editing and video doodling, enabled through video composition, as exemplified in the teaser.	翻訳日:2023-11-28 18:20:00 公開日:2023-11-26
# 概念蒸留:人間中心の説明をモデル改善に活用する Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement ( http://arxiv.org/abs/2311.15303v1 ) ライセンス: Link先を確認	Avani Gupta, Saurabh Saini, P J Narayanan	(参考訳) 人間は理解のために、ハードな特徴ではなく抽象的な概念を使う。近年の解釈可能性研究は、ニューラルネットワークの人間中心の概念説明に焦点を当てている。概念活性化ベクトル(CAV)は、与えられた概念に対するモデルの感度と潜在的バイアスを推定する。本稿では、CAVをポストホック解析からアンテホックトレーニングに拡張し、追加の概念損失を用いた微調整によりモデルバイアスを低減する。従来、概念はネットワークの最終層で定義されていた。我々はクラスプロトタイプを用いてこれを中間層に一般化する。これにより、最も情報量が多いことが知られている最後の畳み込み層でのクラス学習が促進される。また、事前学習済みの知識豊富なモデルを教師として用い、より豊かな概念を作り出す概念蒸留を導入する。提案手法は、概念に対するモデルの感度を高めることも下げることもできる。いくつかの分類問題のバイアス除去に対する概念感受性トレーニングの応用を示す。また、概念を用いて、再構成問題であるIIDに事前知識を導入する。概念感受性トレーニングは、モデルの解釈性を改善し、バイアスを減らし、事前知識を導入できる。コードと詳細は https://avani17101.github.io/Concept-Distilllation/ を参照。 Humans use abstract concepts for understanding instead of hard features. Recent interpretability research has focused on human-centered concept explanations of neural networks. Concept Activation Vectors (CAVs) estimate a model's sensitivity and possible biases to a given concept. In this paper, we extend CAVs from post-hoc analysis to ante-hoc training in order to reduce model bias through fine-tuning using an additional Concept Loss. Concepts were defined on the final layer of the network in the past. We generalize it to intermediate layers using class prototypes. This facilitates class learning in the last convolution layer, which is known to be most informative. We also introduce Concept Distillation to create richer concepts using a pre-trained knowledgeable model as the teacher. Our method can sensitize or desensitize a model towards concepts. We show applications of concept-sensitive training to debias several classification problems. We also use concepts to induce prior knowledge into IID, a reconstruction problem. Concept-sensitive training can improve model interpretability, reduce biases, and induce prior knowledge. Please visit https://avani17101.github.io/Concept-Distilllation/ for code and more details.	
翻訳日:2023-11-28 18:19:44 公開日:2023-11-26
# アンサンブル学習による眼疾患予測とoctスキャンによる注意 Eye Disease Prediction using Ensemble Learning and Attention on OCT Scans ( http://arxiv.org/abs/2311.15301v1 ) ライセンス: Link先を確認	Gauri Naik, Nandini Narvekar, Dimple Agarwal, Nishita Nandanwar, Himangi Pande	(参考訳) 眼疾患は何十年にもわたって大きな課題となっているが、技術の進歩により、その検出と治療のための新しい道が開かれた。機械学習とディープラーニングのアルゴリズムは、特に光コヒーレント技術(oct)イメージングと組み合わせることで、この領域で活用されている。 OCT画像から眼疾患を効率的に検出するための新しい手法を提案する。本手法は,脈絡膜新生血管 (cnv) , 糖尿病黄斑浮腫 (dme) , drusen などの特定の病態により, 患者を無疾患 (正常眼) に分類することを可能にする。本研究では,効率的な眼疾患予測に機械学習とディープラーニング技術を利用するエンド・ツー・エンドのWebアプリケーションを提案する。このアプリケーションは、訓練されたカスタムUNetモデルを使用してセグメンテーションを行うOCTスキャン画像の提出を可能にする。次に、セグメント画像は、自己注意層で強化されたInceptionV3とXceptionネットワークからなるアンサンブルモデルに入力される。この自己注意アプローチは、個々のモデルの特徴マップを活用して、分類精度を向上させる。アンサンブルモデルの出力を集約して様々な眼疾患を予測・分類する。アプリケーションの効率性と最適な性能を確保するため、大規模な実験と最適化が実施されている。本研究は眼疾患予測における提案手法の有効性を示す。開発したWebアプリケーションは早期発見やタイムリーな介入の可能性を秘めており、眼科医療の成果に寄与する。 Eye diseases have posed significant challenges for decades, but advancements in technology have opened new avenues for their detection and treatment. Machine learning and deep learning algorithms have become instrumental in this domain, particularly when combined with Optical Coherent Technology (OCT) imaging. We propose a novel method for efficient detection of eye diseases from OCT images. Our technique enables the classification of patients into disease free (normal eyes) or affected by specific conditions such as Choroidal Neovascularization (CNV), Diabetic Macular Edema (DME), or Drusen. In this work, we introduce an end to end web application that utilizes machine learning and deep learning techniques for efficient eye disease prediction. The application allows patients to submit their raw OCT scanned images, which undergo segmentation using a trained custom UNet model. The segmented images are then fed into an ensemble model, comprising InceptionV3 and Xception networks, enhanced with a self attention layer. 
This self attention approach leverages the feature maps of individual models to achieve improved classification accuracy. The ensemble model's output is aggregated to predict and classify various eye diseases. Extensive experimentation and optimization have been conducted to ensure the application's efficiency and optimal performance. Our results demonstrate the effectiveness of the proposed approach in accurate eye disease prediction. The developed web application holds significant potential for early detection and timely intervention, thereby contributing to improved eye healthcare outcomes.	翻訳日:2023-11-28 18:19:29 公開日:2023-11-26
# コンテナ端末におけるタイムスロット管理のためのデータ駆動型マルチエージェント意思決定支援システム:ロッテルダム港を事例として A Data-driven and multi-agent decision support system for time slot management at container terminals: A case study for the Port of Rotterdam ( http://arxiv.org/abs/2311.15298v1 ) ライセンス: Link先を確認	Ali Nadi, Maaike Snelder, J.W.C. van Lint, Lóránt Tavasszy	(参考訳) コンテナハブからのトラックの出発時間を制御することは、交通システムと物流システムの両方にとって重要である。しかしこれには、ターミナルゲートでのトラック到着時刻を制御し管理できるインテリジェントな意思決定支援システムが必要である。本稿では、港湾・後背地(port-hinterland)エコシステムにおける物流と交通の相互作用を理解し、予測し、制御するための統合モデルを提案する。このアプローチはコンテキスト対応であり、大規模な履歴データを使用してシステム状態を予測し、それに応じてトラックの流入と流出に制御ポリシーを適用する。制御ポリシーは、トラック会社、ターミナルオペレーター、道路交通機関を含む複数の利害関係者の満足を確保する。提案手法は、ゲート待ち時間の短縮とより費用対効果の高いスケジュールが見込まれるタイムスロットを選ぶようトラック運転手を体系的に誘導する、連携した5つの統合モジュールから構成される。シミュレーションは実世界のデータに裏付けられており、システム内で大きな改善が得られることを示す。 Controlling the departure time of the trucks from a container hub is important to both the traffic and the logistics systems. This, however, requires an intelligent decision support system that can control and manage truck arrival times at terminal gates. This paper introduces an integrated model that can be used to understand, predict, and control logistics and traffic interactions in the port-hinterland ecosystem. This approach is context-aware and makes use of big historical data to predict system states and apply control policies accordingly, on truck inflow and outflow. The control policies ensure multiple stakeholders' satisfaction including those of trucking companies, terminal operators, and road traffic agencies. The proposed method consists of five integrated modules orchestrated to systematically steer truckers toward choosing those time slots that are expected to result in lower gate waiting times and more cost-effective schedules. The simulation is supported by real-world data and shows that significant gains can be obtained in the system.	翻訳日:2023-11-28 18:19:04 公開日:2023-11-26
# 温間開始ガウス過程を用いた可制御性多目的最適化 Controllable Expensive Multi-objective Optimization with Warm-starting Gaussian Processes ( http://arxiv.org/abs/2311.15297v1 ) ライセンス: Link先を確認	Quang-Huy Nguyen, Long P. Hoang, Hoang V. Viet, Dung D. Le	(参考訳) Pareto Set Learning (PSL)は、多目的最適化(MOO)問題において、Paretoフロント全体を近似するための有望なアプローチである。しかしながら、既存の微分自由PSL法はしばしば不安定で非効率であり、特に、目的関数評価がコストがかかる高価なブラックボックスMOO問題に対して有効である。本研究では,Co-PSLと呼ばれる新しい制御可能なPSL法を用いて,既存のPSL法の不安定性と非効率性に対処することを提案する。特に、Co-PSLは、(1)ガウス過程の先行値を得るためのベイズ最適化をウォームスタートさせ、(2)制御可能なパレート集合学習により、好みから対応するパレート解へのパラメトリックマッピングを正確に取得する。前者はPSLプロセスの安定化と高価な機能評価の削減を支援することである。後者は、競合する目標間のリアルタイムのトレードオフ制御をサポートする。合成および実世界のMOO問題における性能は、高価な多目的最適化タスクにおけるCo-PSLの有効性を示す。 Pareto Set Learning (PSL) is a promising approach for approximating the entire Pareto front in multi-objective optimization (MOO) problems. However, existing derivative-free PSL methods are often unstable and inefficient, especially for expensive black-box MOO problems where objective function evaluations are costly. In this work, we propose to address the instability and inefficiency of existing PSL methods with a novel controllable PSL method, called Co-PSL. Particularly, Co-PSL consists of two stages: (1) warm-starting Bayesian optimization to obtain quality Gaussian Processes priors and (2) controllable Pareto set learning to accurately acquire a parametric mapping from preferences to the corresponding Pareto solutions. The former is to help stabilize the PSL process and reduce the number of expensive function evaluations. The latter is to support real-time trade-off control between conflicting objectives. Performances across synthesis and real-world MOO problems showcase the effectiveness of our Co-PSL for expensive multi-objective optimization tasks.	翻訳日:2023-11-28 18:18:46 公開日:2023-11-26
# UHGEval:制約なし生成による中国語大規模言語モデルの幻覚のベンチマーク UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation ( http://arxiv.org/abs/2311.15296v1 ) ライセンス: Link先を確認	Xun Liang, Shichao Song, Simin Niu, Zhiyu Li, Feiyu Xiong, Bo Tang, Zhaohui Wy, Dawei He, Peng Cheng, Zhonghao Wang, Haiying Deng	(参考訳) 大規模言語モデル(LLM)は、現代の自然言語処理において重要な貢献者として登場し、様々な産業に適用されつつある。しかし、これらの大規模確率論的統計モデルは、現在のところプロフェッショナルなコンテンツ生成に必要な品質を保証できない。これらのモデルはしばしば幻覚テキストを生成し、専門的な文脈での実用性を損なう。テキスト生成におけるLLMの信頼性を評価するために、多くの取り組みが幻覚現象のベンチマーク評価を開発してきた。しかしながら、これらのベンチマークはコストと時間の制約のため、しばしば制約付き生成技術を利用する。これらの技術には、指示による幻覚誘導や、幻覚を生み出すために真正のテキストを意図的に改変する戦略が含まれる。これらのアプローチは、現実世界のアプリケーションで求められる制約のないテキスト生成と一致しない。さらに、テキスト生成における幻覚評価に特化した確立された中国語データセットも、現在のところ存在しない。そこで我々は、LLMが最小限の制約で生成した出力を収集するUnconstrained Hallucination Generation Evaluation (UHGEval) ベンチマークを開発した。同時に、後続の研究者がスケーラブルで再現可能な実験を行えるよう、包括的なベンチマーク評価フレームワークを構築した。また、著名な中国語モデルとGPTシリーズモデルを評価する広範な実験を行い、幻覚の課題に関する専門的なパフォーマンス上の知見を導出した。 Large language models (LLMs) have emerged as pivotal contributors in contemporary natural language processing and are increasingly being applied across a diverse range of industries. However, these large-scale probabilistic statistical models cannot currently ensure the requisite quality in professional content generation. These models often produce hallucinated text, compromising their practical utility in professional contexts. To assess the authentic reliability of LLMs in text generation, numerous initiatives have developed benchmark evaluations for hallucination phenomena. Nevertheless, these benchmarks frequently utilize constrained generation techniques due to cost and temporal constraints. These techniques encompass the use of directed hallucination induction and strategies that deliberately alter authentic text to produce hallucinations. These approaches are not congruent with the unrestricted text generation demanded by real-world applications. 
Furthermore, a well-established Chinese-language dataset dedicated to the evaluation of hallucinations in text generation is presently lacking. Consequently, we have developed an Unconstrained Hallucination Generation Evaluation (UHGEval) benchmark, designed to compile outputs produced with minimal restrictions by LLMs. Concurrently, we have established a comprehensive benchmark evaluation framework to aid subsequent researchers in undertaking scalable and reproducible experiments. We have also executed extensive experiments, evaluating prominent Chinese language models and the GPT series models to derive professional performance insights regarding hallucination challenges.	翻訳日:2023-11-28 18:18:28 公開日:2023-11-26
# ロシアによるウクライナ侵攻におけるパルチザンニュース共有に関する研究 A Study of Partisan News Sharing in the Russian invasion of Ukraine ( http://arxiv.org/abs/2311.15294v1 ) ライセンス: Link先を確認	Yiming Zhu, Ehsan-Ul Haq, Gareth Tyson, Lik-Hang Lee, Yuyang Wang, Pan Hui	(参考訳) ロシアによるウクライナ侵攻以来、大量の偏見や党派的なニュースがソーシャルメディアを通じて拡散してきた。これはより広範な社会的問題につながる可能性があるため、オンラインコミュニティのより良いガバナンスには、パルチザン的なニュース共有がユーザのコミュニケーションにどのように影響するかを理解することが重要であると論じる。本稿では,パルチザンニュース共有の計測研究を行う。我々は,このような共有がユーザのコミュニケーションに与える影響を特徴付けることを目的とする。われわれの分析では、ロシアの侵略に関連するRedditの6つのコミュニティにわたる8ヶ月のデータセットをカバーしている。まず,パルチザンニュース共有の時間変化の分析を行った。我々は,この侵略が,パルチザンのニュース共有量の増加とともに,観察されたコミュニティの議論を刺激することを確認する。次に,このような共有に対するユーザの反応を特徴付ける。我々は、パルチザンバイアスがその伝播を狭める役割を担っていることを観察する。バイアスのあるメディアは、複数のサブレディットにまたがる可能性が低い。しかし、パルチザン的なニュース共有は、より多くのコメントを生成して、議論に参加するユーザーを惹きつけている。その後、パルチザンニュースを広める可能性のあるユーザーを特定するための予測モデルを構築しました。しかし、予測は平均で61.57%の精度で難しい。コメントネットワークの中央集権性分析により,パルチザンニュースを広める利用者は,中立ニュースを広める利用者に比べてネットワークの影響が小さいことが示唆された。 Since the Russian invasion of Ukraine, a large volume of biased and partisan news has been spread via social media platforms. As this may lead to wider societal issues, we argue that understanding how partisan news sharing impacts users' communication is crucial for better governance of online communities. In this paper, we perform a measurement study of partisan news sharing. We aim to characterize the role of such sharing in influencing users' communications. Our analysis covers an eight-month dataset across six Reddit communities related to the Russian invasion. We first perform an analysis of the temporal evolution of partisan news sharing. We confirm that the invasion stimulates discussion in the observed communities, accompanied by an increased volume of partisan news sharing. Next, we characterize users' response to such sharing. We observe that partisan bias plays a role in narrowing its propagation. More biased media is less likely to be spread across multiple subreddits. 
However, we find that partisan news sharing attracts more users to engage in the discussion, by generating more comments. We then built a predictive model to identify users likely to spread partisan news. The prediction is challenging though, with 61.57% accuracy on average. Our centrality analysis on the commenting network further indicates that the users who disseminate partisan news possess lower network influence in comparison to those who propagate neutral news.	翻訳日:2023-11-28 18:18:09 公開日:2023-11-26
# BatchNormに基づく弱教師付きビデオ異常検出 BatchNorm-based Weakly Supervised Video Anomaly Detection ( http://arxiv.org/abs/2311.15367v1 ) ライセンス: Link先を確認	Yixuan Zhou, Yi Qu, Xing Xu, Fumin Shen, Jingkuan Song, Hengtao Shen	(参考訳) 異常事象の有無を示すビデオレベルラベルのみが使用可能な弱教師付きビデオ異常検出(WVAD)では、異常発生の時間的アノテーションに内在する曖昧さが主な課題となっている。異常事象の時間的特徴がしばしば外れ値的な特性を示すという統計的知見に着想を得て、BatchNormをWVADに組み込んだ新しい手法BN-WVADを提案する。提案するBN-WVADでは、BatchNormの平均ベクトルからの特徴の乖離(DFM)を信頼性のある異常基準として活用し、異常ビデオ中の潜在的な異常スニペットを識別する。提案するDFM基準は、異常認識に対しても識別的であり、ラベルノイズに対する耐性も高く、ノイズラベルの影響を受けやすい異常分類器の予測を補正するための追加の異常スコアとして機能する。さらに、より多くの異常事象が発生するビデオでより多くの異常スニペットを選別するために、バッチレベルの選択戦略を考案した。提案するBN-WVADモデルは、UCF-CrimeでAUC 87.24%、XD-ViolenceでAP 84.93%に達し、最先端の性能を示す。コード実装は https://github.com/cool-xuan/BN-WVAD からアクセスできる。 In weakly supervised video anomaly detection (WVAD), where only video-level labels indicating the presence or absence of abnormal events are available, the primary challenge arises from the inherent ambiguity in temporal annotations of abnormal occurrences. Inspired by the statistical insight that temporal features of abnormal events often exhibit outlier characteristics, we propose a novel method, BN-WVAD, which incorporates BatchNorm into WVAD. In the proposed BN-WVAD, we leverage the Divergence of Feature from Mean vector (DFM) of BatchNorm as a reliable abnormality criterion to discern potential abnormal snippets in abnormal videos. The proposed DFM criterion is also discriminative for anomaly recognition and more resilient to label noise, serving as the additional anomaly score to amend the prediction of the anomaly classifier that is susceptible to noisy labels. Moreover, a batch-level selection strategy is devised to filter more abnormal snippets in videos where more abnormal events occur. The proposed BN-WVAD model demonstrates state-of-the-art performance on UCF-Crime with an AUC of 87.24%, and XD-Violence, where AP reaches up to 84.93%. Our code implementation is accessible at https://github.com/cool-xuan/BN-WVAD.	
翻訳日:2023-11-28 18:09:50 公開日:2023-11-26
# Seq2Seq変換による非ターゲット型コードオーサシップ回避 Untargeted Code Authorship Evasion with Seq2Seq Transformation ( http://arxiv.org/abs/2311.15366v1 ) ライセンス: Link先を確認	Soohyeon Choi and Rhongho Jang and DaeHun Nyang and David Mohaisen	(参考訳) コードオーサシップの帰属(Code Authorship Attribution)は、プログラミング言語コードの作者をコード内の文体的特徴を通じて識別する問題である。本稿では、StructCoderと呼ばれるSeq2Seqコードトランスフォーマーを利用する、コードオーサシップ難読化技術であるSCAEを紹介する。 SCAEは、ある言語から別の言語(例えばJavaからC#)への関数レベルのコード変換用に当初設計されたシステムであるStructCoderを、転移学習を使ってカスタマイズする。 SCAEは、既存の研究と比べて、わずかな精度低下と引き換えに効率を向上した。また、非ターゲット設定において85%の変換成功率と最大95.77%の回避成功率を維持しながら、処理時間を約68%削減した。 Code authorship attribution is the problem of identifying authors of programming language codes through the stylistic features in their codes, a topic that recently witnessed significant interest with outstanding performance. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE customizes StructCoder, a system designed initially for function-level code translation from one language to another (e.g., Java to C#), using transfer learning. SCAE improved the efficiency at a slight accuracy degradation compared to existing work. We also reduced the processing time by about 68% while maintaining an 85% transformation success rate and up to 95.77% evasion success rate in the untargeted setting.	翻訳日:2023-11-28 18:09:26 公開日:2023-11-26
# Łojasiewicz-Simon不等式による連続的なディープラーニングモデルの収束結果 A Convergence result of a continuous model of deep learning via Łojasiewicz-Simon inequality ( http://arxiv.org/abs/2311.15365v1 ) ライセンス: Link先を確認	Noboru Isobe	(参考訳) 本研究では、Deep Neural Network (DNN) の連続モデルの最適化プロセスを表すWasserstein型勾配流に着目する。まず、$L^2$-正則化の下で、モデルの平均損失に対する最小化元の存在を確立する。次に、損失の最大傾斜曲線の存在を示す。主な結果は、時間が無限大に向かうにつれて、流れが損失の臨界点へ収束することである。この結果を証明するうえで重要なのは、損失に対するŁojasiewicz-Simon勾配不等式を確立することである。 NNと損失関数の解析性を仮定することで、この不等式を導出する。本証明は、非凸汎関数に対するWasserstein型勾配流の漸近的挙動を解析するための新しい手法を提供する。 This study focuses on a Wasserstein-type gradient flow, which represents an optimization process of a continuous model of a Deep Neural Network (DNN). First, we establish the existence of a minimizer for an average loss of the model under $L^2$-regularization. Subsequently, we show the existence of a curve of maximal slope of the loss. Our main result is the convergence of the flow to a critical point of the loss as time goes to infinity. An essential aspect of proving this result involves the establishment of the Łojasiewicz-Simon gradient inequality for the loss. We derive this inequality by assuming the analyticity of NNs and loss functions. Our proofs offer a new approach for analyzing the asymptotic behavior of Wasserstein-type gradient flows for nonconvex functionals.	翻訳日:2023-11-28 18:09:11 公開日:2023-11-26
# 人間-ロボットインタラクションにおけるRGBカメラを用いた超長距離ジェスチャー認識 Ultra-Range Gesture Recognition using an RGB Camera in Human-Robot Interaction ( http://arxiv.org/abs/2311.15361v1 ) ライセンス: Link先を確認	Eran Bamani, Eden Nissinman, Inbar Meir, Lisa Koenigsberg, Avishai Sintov	(参考訳) ハンドジェスチャは、非言語的な意図、思考、命令が伝達される人間同士の相互作用において重要な役割を果たす。 HRI(Human-Robot Interaction)においても、ハンドジェスチャはロボットエージェントに明確で迅速な指示を伝達するための同様に効率的な媒体となる。しかし、ジェスチャ認識のための最先端の視覚ベース手法は、ユーザとカメラの距離が7mまででしか有効でないことが示されている。このように距離範囲が短いと、サービスロボット、捜索救助ロボット、ドローンなどを用いた実用的なHRIが制限される。本研究では、HRIの文脈で最大25mの認識距離を目指すことにより、Ultra-Range Gesture Recognition (URGR)問題に取り組む。シンプルなRGBカメラのみを用いたURGRのための新しいディープラーニングフレームワークを提案する。まず、HQ-Netと呼ばれる新しい超解像モデルを用いて、ユーザの低解像度画像を強調する。次に、強調した画像を入力とする新しいURGR分類器であるGraph Vision Transformer (GViT)を提案する。 GViTは、グラフ畳み込みネットワーク(GCN)と修正されたビジョントランスフォーマー(ViT)の利点を組み合わせたものである。多様なテストデータに対する提案フレームワークの評価では、98.1%という高い認識率が得られた。このフレームワークは、超長距離において人間による認識よりも優れた性能も示した。本フレームワークを用いて、複雑な屋内・屋外の超長距離環境下で人間のジェスチャーにより指示される自律四足歩行ロボットの性能を解析し、実演する。 Hand gestures play a significant role in human interactions where non-verbal intentions, thoughts and commands are conveyed. In Human-Robot Interaction (HRI), hand gestures offer a similar and efficient medium for conveying clear and rapid directives to a robotic agent. However, state-of-the-art vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters. Such a short distance range limits practical HRI with, for example, service robots, search and rescue robots and drones. In this work, we address the Ultra-Range Gesture Recognition (URGR) problem by aiming for a recognition distance of up to 25 meters and in the context of HRI. We propose a novel deep-learning framework for URGR using solely a simple RGB camera. First, a novel super-resolution model termed HQ-Net is used to enhance the low-resolution image of the user. Then, we propose a novel URGR classifier termed Graph Vision Transformer (GViT) which takes the enhanced image as input. 
GViT combines the benefits of a Graph Convolutional Network (GCN) and a modified Vision Transformer (ViT). Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%. The framework has also exhibited superior performance compared to human recognition in ultra-range distances. With the framework, we analyze and demonstrate the performance of an autonomous quadruped robot directed by human gestures in complex ultra-range indoor and outdoor environments.	翻訳日:2023-11-28 18:09:00 公開日:2023-11-26
# レーザー支援量子反射による原子-表面散乱の制御 Controlling Atom-Surface Scattering with Laser Assisted Quantum Reflection ( http://arxiv.org/abs/2311.15357v1 ) ライセンス: Link先を確認	A. L. Harris	(参考訳) 低エネルギー原子-表面散乱では、古典的な転回点を持たない引力ポテンシャルの領域で原子が反射されうる。この現象は量子反射(quantum reflection)として知られており、原子が表面に付着する確率を減少させるほか、原子トラップにも利用できる。我々は、印加されたレーザー場の存在下でモースポテンシャル内をゆっくり動く原子を持つ1次元モデルで量子反射過程をシミュレートする。レーザー支援量子反射の場合、レーザー場は原子にさらなる運動量と運動エネルギーを与える。これにより、原子と表面の間の最接近距離が減少する。その結果、レーザーパルスのタイミングや強度によって最接近距離を制御でき、付着確率の向上や量子反射確率の低減につながりうることがわかった。 In low energy atom-surface scattering, it is possible for the atom to be reflected in a region of attractive potential with no classical turning point. This phenomenon has come to be known as quantum reflection and it can reduce the sticking probability of atoms to surfaces, as well as be used for atom trapping. We simulate the quantum reflection process in a one-dimensional model with a slow atom moving in a Morse potential in the presence of an applied laser field. We show that in the case of laser-assisted quantum reflection, the laser field imparts additional momentum and kinetic energy to the atom. This results in a decreased distance of closest approach between the atom and surface. Our results show that the distance of closest approach can be controlled through the timing and intensity of the laser pulse, which may result in enhanced sticking probability and/or reduced quantum reflection probability.	翻訳日:2023-11-28 18:08:36 公開日:2023-11-26
# 第二の考えを持つか? 聞いてみましょう Having Second Thoughts? Let's hear it ( http://arxiv.org/abs/2311.15356v1 ) ライセンス: Link先を確認	Jung H. Lee and Sujith Vijayan	(参考訳) ディープラーニングモデルは、低次知覚領域から高次認知領域へのボトムアップ信号経路を緩く模倣する。訓練後、DLモデルはいくつかのドメイン固有のタスクにおいて人間より優れているが、意思決定プロセスは容易に破壊されることが知られている。人間の脳は複数の機能領域から構成されており、ボトムアップとトップダウン(高次から低次まで)の複雑な相互作用に依存しているため、トップダウン信号処理を取り入れることで、DLモデルをより堅牢にすることができると仮定する。この仮説に対処するため,我々は,DLモデルをより堅牢にできるかどうか,選択的注意を模倣した認証プロセスを提案する。実験的な評価から,新たに提案された認証により,DLモデルの精度が向上し,その脆弱性を人為的,自然的両面的な例で軽減する安全対策が構築できることが示唆された。 Deep learning models loosely mimic bottom-up signal pathways from low-order sensory areas to high-order cognitive areas. After training, DL models can outperform humans on some domain-specific tasks, but their decision-making process has been known to be easily disrupted. Since the human brain consists of multiple functional areas highly connected to one another and relies on intricate interplays between bottom-up and top-down (from high-order to low-order areas) processing, we hypothesize that incorporating top-down signal processing may make DL models more robust. To address this hypothesis, we propose a certification process mimicking selective attention and test if it could make DL models more robust. Our empirical evaluations suggest that this newly proposed certification can improve DL models' accuracy and help us build safety measures to alleviate their vulnerabilities with both artificial and natural adversarial examples.	翻訳日:2023-11-28 18:08:23 公開日:2023-11-26
# 強化学習における任意制約を伴う確率的行動の生成モデル Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning ( http://arxiv.org/abs/2311.15341v1 ) ライセンス: Link先を確認	Changyu Chen, Ramesha Karunasena, Thanh Hong Nguyen, Arunesh Sinha, Pradeep Varakantham	(参考訳) 強化学習(rl)の多くの問題は、大きな離散的多次元かつ無秩序なアクション空間を持つ最適方針を求めており、複数のセキュリティリソースの配置や緊急対応ユニットなどのリソースのランダム配置の問題を含んでいる。この設定の課題は、下層の作用空間が分類的(離散的かつ非順序的)で大きく、既存のRL法ではうまく機能しないことである。さらに、これらの問題は実効作用(配置)の妥当性を必要とし、この妥当性制約はしばしば閉じた数学的形式でコンパクトに表現することが困難である。問題の割り当ての性質は、もし存在するならば、確率的最適政策を好む。本稿では,(1)(状態)条件付き正規化フローを適用して確率的ポリシーをコンパクトに表現すること -- ネットワークが1つのサンプルアクションとそれに対応するアクションのログ確率を生成することによって生じるコンパクト性 -- をアクタ-クリティックな方法で使用すること,(2)ベースポリシーを更新するために無効なアクション拒否法(有効なアクションオラクルによる)を使用することによって,これらの課題に対処する。アクション拒否は、私たちが導出する変更されたポリシー勾配によって実現されます。最後に、従来の手法と比較して、我々のアプローチのスケーラビリティと、任意の状態におけるアクションの分布のサポートに任意の状態条件制約を適用する能力を示すための広範な実験を行う。 Many problems in Reinforcement Learning (RL) seek an optimal policy with large discrete multidimensional yet unordered action spaces; these include problems in randomized allocation of resources such as placements of multiple security resources and emergency response units, etc. A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large, for which existing RL methods do not perform well. Moreover, these problems require validity of the realized action (allocation); this validity constraint is often difficult to express compactly in a closed mathematical form. The allocation nature of the problem also prefers stochastic optimal policies, if one exists. 
In this work, we address these challenges by (1) applying a (state) conditional normalizing flow to compactly represent the stochastic policy -- the compactness arises due to the network only producing one sampled action and the corresponding log probability of the action, which is then used by an actor-critic method; and (2) employing an invalid action rejection method (via a valid action oracle) to update the base policy. The action rejection is enabled by a modified policy gradient that we derive. Finally, we conduct extensive experiments to show the scalability of our approach compared to prior methods and the ability to enforce arbitrary state-conditional constraints on the support of the distribution of actions in any state.	翻訳日:2023-11-28 18:08:06 公開日:2023-11-26
# 情報マスキングによる敵対的浄化 Adversarial Purification of Information Masking ( http://arxiv.org/abs/2311.15339v1 ) ライセンス: Link先を確認	Sitong Liu, Zhichao Lian, Shuangquan Zhang, Liang Xiao	(参考訳) 敵対的攻撃は、ニューラルネットワークを騙すために、画像に対して極小で知覚できない摂動を生成する。これに対抗して、敵対的浄化手法は、敵対的な入力サンプルをクリーンな出力画像に変換することで敵対的攻撃を防御しようとする。それでも、既存の生成モデルは敵対的摂動を効果的に除去できず、理想的とは言えない浄化結果をもたらす。我々は、ターゲットモデルに対する残留敵対的摂動の潜在的な脅威を強調し、摂動のスケールと攻撃能力の関係を定量的に確立する。特に、浄化画像上の残留摂動は、主に敵対的サンプルの同じ位置のパッチと類似したパッチに由来する。本稿では、敵対的摂動を広範に除去することを目的とした、Information Mask Purification (IMPure) と呼ばれる新たな敵対的浄化手法を提案する。敵対的サンプルに対して、まずパッチ情報の一部をマスクし、次にパッチを再構築することで、パッチに由来する敵対的摂動に対抗する。すべてのパッチを並列に再構築し、まとまりのある画像を得る。次に、類似する局所的な摂動から浄化サンプルを保護するため、特徴抽出ネットワークに入力する前に浄化サンプルと入力サンプルをランダムに混合することで、このリスクをシミュレートする。最後に、画素損失と知覚損失を組み合わせた制約を設け、モデルの再構成適応性を高める。 3つの分類器モデルを用いたImageNetデータセットでの広範な実験により、本手法は9つの敵対的攻撃手法に対して最先端の結果を達成することを示した。実装コードと事前学習済みの重みは https://github.com/NoWindButRain/IMPure でアクセスできる。 Adversarial attacks meticulously generate minuscule, imperceptible perturbations to images to deceive neural networks. Counteracting these, adversarial purification methods seek to transform adversarial input samples into clean output images to defend against adversarial attacks. Nonetheless, extant generative models fail to effectively eliminate adversarial perturbations, yielding less-than-ideal purification results. We emphasize the potential threat of residual adversarial perturbations to target models, quantitatively establishing a relationship between perturbation scale and attack capability. Notably, the residual perturbations on the purified image primarily stem from the same-position patch and similar patches of the adversarial sample. We propose a novel adversarial purification approach named Information Mask Purification (IMPure), which aims to extensively eliminate adversarial perturbations. To obtain an adversarial sample, we first mask part of the patches information, then reconstruct the patches to resist adversarial perturbations from the patches. We reconstruct all patches in parallel to obtain a cohesive image. 
Then, in order to protect the purified samples against potential similar regional perturbations, we simulate this risk by randomly mixing the purified samples with the input samples before inputting them into the feature extraction network. Finally, we establish a combined constraint of pixel loss and perceptual loss to augment the model's reconstruction adaptability. Extensive experiments on the ImageNet dataset with three classifier models demonstrate that our approach achieves state-of-the-art results against nine adversarial attack methods. Implementation code and pre-trained weights can be accessed at https://github.com/NoWindButRain/IMPure.	翻訳日:2023-11-28 18:07:41 公開日:2023-11-26
# 視覚変換器を用いた効率的なシーケンス推論のためのトークンリサイクル Token Recycling for Efficient Sequential Inference with Vision Transformers ( http://arxiv.org/abs/2311.15335v1 ) ライセンス: Link先を確認	Jan Olszewski and Dawid Rymarczyk and Piotr Wójcik and Mateusz Pach and Bartosz Zieliński	(参考訳) 視覚変換器(ViT)は、欠損値の補完を必要としないため、不完全な入力の処理において畳み込みニューラルネットワークを上回る。したがって、ViTは、例えばActive Visual Exploration問題のようなシーケンシャルな意思決定に適している。しかし、新しいシーケンシャル情報が到着するたびにフルフォワードパスを実行するため、計算的に非効率である。この計算非効率を軽減するために、任意のアーキテクチャで使用可能なViT推論のTOken REcycling (TORE)修正を導入する。 TOREはViTをイテレータとアグリゲータという2つの部分に分割する。イテレータはシーケンシャル情報を個別に処理して中間トークンに変換し、それをキャッシュする。アグリゲータは中間トークンをまとめて処理して予測を得る。これにより、イテレータによる計算結果を再利用することができる。効率的な逐次推論に加えて、逐次的意思決定に伴う計算負担を大幅に軽減しつつ最先端の精度を達成する、補完的な学習方針を提案する。 Vision Transformers (ViTs) surpass Convolutional Neural Networks in processing incomplete inputs because they do not require the imputation of missing values. Therefore, ViTs are well suited for sequential decision-making, e.g. in the Active Visual Exploration problem. However, they are computationally inefficient because they perform a full forward pass each time a piece of new sequential information arrives. To reduce this computational inefficiency, we introduce the TOken REcycling (TORE) modification for the ViT inference, which can be used with any architecture. TORE divides ViT into two parts, iterator and aggregator. An iterator processes sequential information separately into midway tokens, which are cached. The aggregator processes midway tokens jointly to obtain the prediction. This way, we can reuse the results of computations made by the iterator. Besides efficient sequential inference, we propose a complementary training policy, which significantly reduces the computational burden associated with sequential decision-making while achieving state-of-the-art accuracy.	翻訳日:2023-11-28 18:07:10 公開日:2023-11-26
# どれくらいのデータが必要ですか? 医療データに関する事例研究 How much data do I need? A case study on medical data ( http://arxiv.org/abs/2311.15331v1 ) ライセンス: Link先を確認	Ayse Betul Cengiz and A. Stephen McGough	(参考訳) ディープラーニングネットワークを訓練するためのデータ収集には、労力とリソースの面でコストがかかる。多くの場合、特に医療の文脈では、侵襲的な医療処置や、それ自体が医療上の害を引き起こしうるプロセスを必要とするなど、有害な影響を及ぼす可能性がある。しかし、ディープラーニングはデータを大量に必要とする手法だと見なされている。ここでは、広く信じられている2つの通説を検討する。 i) より多くのデータがより良い結果をもたらすこと、ii) 十分なデータがない場合には転移学習が役に立つこと、である。これらは広く真であると仮定され、ディープラーニングが関わる問題の解き方を選択する際の根拠として使われている。 6つの医療データセットと6つの一般データセットを評価した。これらのデータセットのさまざまなサブセット上でResNet18ネットワークを訓練し、「より多くのデータがより良い結果をもたらす」かどうかを評価する。転移学習が普遍的に有益かどうかを判断するために、これらのデータセットのうち11個を、12番目のデータセットであるChestのサブセットに対する転移学習のソースとして用いる。さらに、多段階の転移学習が一貫した利益をもたらすかどうかも調べる。分析の結果、実際の状況はこれらの単純な通説よりも複雑であることが分かった。より多くのデータは収穫逓減につながる可能性があり、転移学習のためのデータセットの選択を誤ると性能が悪化する可能性がある。しかも、Chestデータセットと非常に類似していると考えられるデータセットのほうが、類似度の低いデータセットよりも悪い結果を与えた。多段階の転移学習も同様に、データセット間の複雑な関係を明らかにする。 The collection of data to train a Deep Learning network is costly in terms of effort and resources. In many cases, especially in a medical context, it may have detrimental impacts, such as requiring invasive medical procedures or processes which could in themselves cause medical harm. However, Deep Learning is seen as a data hungry method. Here, we look at two commonly held adages i) more data gives better results and ii) transfer learning will aid you when you don't have enough data. These are widely assumed to be true and used as evidence for choosing how to solve a problem when Deep Learning is involved. We evaluate six medical datasets and six general datasets, training a ResNet18 network on varying subsets of these datasets to evaluate `more data gives better results'. We take eleven of these datasets as the sources for Transfer Learning on subsets of the twelfth dataset -- Chest -- in order to determine whether Transfer Learning is universally beneficial. We go further to see whether multi-stage Transfer Learning provides a consistent benefit. 
Our analysis shows that the real situation is more complex than these simple adages -- more data could lead to a case of diminishing returns and an incorrect choice of dataset for transfer learning can lead to worse performance, with datasets which we would consider highly similar to the Chest dataset giving worse results than datasets which are more dissimilar. Multi-stage transfer learning likewise reveals complex relationships between datasets.	翻訳日:2023-11-28 18:06:52 公開日:2023-11-26
# BS-Diff:胸部X線画像からの条件拡散モデルを用いた効果的な骨抑制 BS-Diff: Effective Bone Suppression Using Conditional Diffusion Models from Chest X-Ray Images ( http://arxiv.org/abs/2311.15328v1 ) ライセンス: Link先を確認	Zhanghao Chen, Yifei Sun, Wenjian Qin, Ruiquan Ge, Cheng Pan, Wenming Deng, Zhou Liu, Wenwen Min, Ahmed Elazab, Xiang Wan, Changmiao Wang	(参考訳) 胸部X線(CXR)は肺検診の低用量モードとして一般的に用いられる。しかし、肺領域の約75%が骨と重なり、疾患の検出と診断を妨げているため、CXRsの有効性は幾らか阻害されている。改善策として骨抑制技術が導入された。現在の病院のデュアルエネルギーサブトラクションイメージング技術では、高価な機器と被写体が高放射線にさらされる必要がある。これらの問題を回避すべく,深層学習に基づく画像生成アルゴリズムが提案されている。しかし, 既存の手法では, 高品質な画像が得られず, 特に肺血管のテクスチャの細部が捉えられにくい。これらの課題に対処するために,U-Netアーキテクチャとオートエンコーダを組み込むシンプルな拡張モジュールを備えた条件拡散モデルを備えた骨抑制フレームワークであるBS-Diffを提案する。提案するネットワークは骨抑制率の高い軟部組織像を生成するだけでなく,微細な画像の詳細を捉える能力も備えている。また,2010年以降で最大のデータセットを収集し,高精細度CXRと軟部組織像を関連病院で収集した120例のデータを収集した。広範囲な実験、比較分析、アブレーション研究、臨床評価は、提案されたBS-Diffが複数の指標でいくつかの骨圧モデルより優れていることを示している。 Chest X-rays (CXRs) are commonly utilized as a low-dose modality for lung screening. Nonetheless, the efficacy of CXRs is somewhat impeded, given that approximately 75% of the lung area overlaps with bone, which in turn hampers the detection and diagnosis of diseases. As a remedial measure, bone suppression techniques have been introduced. The current dual-energy subtraction imaging technique in the clinic requires costly equipment and subjects being exposed to high radiation. To circumvent these issues, deep learning-based image generation algorithms have been proposed. However, existing methods fall short in terms of producing high-quality images and capturing texture details, particularly with pulmonary vessels. To address these issues, this paper proposes a new bone suppression framework, termed BS-Diff, that comprises a conditional diffusion model equipped with a U-Net architecture and a simple enhancement module to incorporate an autoencoder. Our proposed network cannot only generate soft tissue images with a high bone suppression rate but also possesses the capability to capture fine image details. 
Additionally, we compiled the largest dataset since 2010, including data from 120 patients with high-definition, high-resolution paired CXRs and soft tissue images collected by our affiliated hospital. Extensive experiments, comparative analyses, ablation studies, and clinical evaluations indicate that the proposed BS-Diff outperforms several bone-suppression models across multiple metrics.	翻訳日:2023-11-28 18:06:28 公開日:2023-11-26
# FRAC-Q-Learning:ソーシャルロボットのための退屈回避プロセスを備えた強化学習 FRAC-Q-Learning: A Reinforcement Learning with Boredom Avoidance Processes for Social Robots ( http://arxiv.org/abs/2311.15327v1 ) ライセンス: Link先を確認	Akinari Onishi	(参考訳) 強化学習アルゴリズムはしばしばソーシャルロボットに適用されてきた。しかし、ほとんどの強化学習アルゴリズムはソーシャルロボット向けに最適化されておらず、その結果ユーザを退屈させる可能性がある。本研究では、ユーザの退屈を回避できる、ソーシャルロボットに特化した新しい強化学習手法FRAC-Q-learningを提案する。提案アルゴリズムは、ランダム化と分類のプロセスに加えて、忘却プロセスから構成される。本研究では、従来のQ-learningとの比較により、FRAC-Q-learningの関心度スコアと退屈しにくさスコアを評価した。 FRAC-Q-learningは従来のQ-learningに比べて関心度スコアが有意に高い傾向を示し、ユーザを退屈させにくいことが有意に示された。したがって、FRAC-Q-learningはユーザを退屈させないソーシャルロボットの開発に寄与しうる。提案アルゴリズムは、Webベースのコミュニケーションや教育システムにも応用できる。本稿では、FRAC-Q-learningのプロセス全体、詳細な実装、詳細な評価方法を初めて示す。 Reinforcement learning algorithms have often been applied to social robots. However, most reinforcement learning algorithms were not optimized for the use of social robots, and consequently they may bore users. We proposed a new reinforcement learning method specialized for the social robot, the FRAC-Q-learning, that can avoid user boredom. The proposed algorithm consists of a forgetting process in addition to randomizing and categorizing processes. This study evaluated interest and boredom hardness scores of the FRAC-Q-learning by a comparison with the traditional Q-learning. The FRAC-Q-learning showed a significantly higher trend of interest score, and was significantly less likely to bore users than the traditional Q-learning. Therefore, the FRAC-Q-learning can contribute to developing a social robot that will not bore users. The proposed algorithm can also find applications in Web-based communication and educational systems. This paper presents the entire process, detailed implementation and a detailed evaluation method of the FRAC-Q-learning for the first time.	翻訳日:2023-11-28 18:06:05 公開日:2023-11-26
# 軽量顔認識: 改良されたMobileFaceNetモデル Lightweight Face Recognition: An Improved MobileFaceNet Model ( http://arxiv.org/abs/2311.15326v1 ) ライセンス: Link先を確認	Ahmad Hassanpour, Yasamin Kowsari	(参考訳) 本稿では、MobileFaceNetとその修正版であるMMobileFaceNetに着目し、軽量顔認識(FR)モデルの広範な調査と比較分析を行う。計算資源が限られたデバイス上で効率的なFRモデルが必要とされることから、精度を犠牲にすることなくメモリフットプリントと計算要求を削減したモデルが開発されてきた。本研究は、データセットの選択、モデルアーキテクチャ、最適化アルゴリズムがFRモデルの性能に与える影響を考察する。 EFaR-2023コンペティションへの参加では、特にパラメータ数が制限されたカテゴリにおいて、我々のモデルが優れた性能を示した。 Webface42Mデータセットのサブセットを採用し、シャープネスを意識した最小化(SAM)最適化を統合することで、クロスポーズ、クロスエイジ、クロスエスニシティの性能を試すものを含むさまざまなベンチマークで精度を大幅に向上した。この結果は、計算効率が高いだけでなく、多様な条件下で高い精度を維持するモデルを構築するうえでの本アプローチの有効性を裏付けるものである。 This paper presents an extensive exploration and comparative analysis of lightweight face recognition (FR) models, specifically focusing on MobileFaceNet and its modified variant, MMobileFaceNet. The need for efficient FR models on devices with limited computational resources has led to the development of models with reduced memory footprints and computational demands without sacrificing accuracy. Our research delves into the impact of dataset selection, model architecture, and optimization algorithms on the performance of FR models. We highlight our participation in the EFaR-2023 competition, where our models showcased exceptional performance, particularly in categories restricted by the number of parameters. By employing a subset of the Webface42M dataset and integrating sharpness-aware minimization (SAM) optimization, we achieved significant improvements in accuracy across various benchmarks, including those that test for cross-pose, cross-age, and cross-ethnicity performance. The results underscore the efficacy of our approach in crafting models that are not only computationally efficient but also maintain high accuracy in diverse conditions.	翻訳日:2023-11-28 18:05:50 公開日:2023-11-26
# 集団効果を有するLEDの超熱的光子統計における占有数ゆらぎ機構 Population fluctuation mechanism of the super-thermal photon statistic of LEDs with collective effects ( http://arxiv.org/abs/2311.15324v1 ) ライセンス: Link先を確認	Igor E. Protsenko, Alexander V. Uskov	(参考訳) エミッタ数のゆらぎが、強いエミッタ-場結合と集団効果に有利なバッドキャビティを備えた小型LEDにおいて、線形領域での超熱的光子統計をもたらすことを見出した。 2次相関関数 g_2 の簡単な解析式が得られる。 2準位LEDモデルでは、g_2がg_2=6まで増大することが予測される。超熱的光子統計は、キャビティモードへの自然放出における占有数ゆらぎの増大に関連している。 We found that fluctuations in the number of emitters lead to super-thermal photon statistics of small LEDs in a linear regime, with a strong emitter-field coupling and a bad cavity favorable for collective effects. A simple analytical expression for the second-order correlation function g_2 is found. An increase of g_2 up to g_2=6 is predicted in the two-level LED model. The super-thermal photon statistics is related to the population fluctuation increase of the spontaneous emission to the cavity mode.	翻訳日:2023-11-28 18:05:33 公開日:2023-11-26
# まばらなポーリ・リンドブラッド雑音モデル学習手法 Techniques for learning sparse Pauli-Lindblad noise models ( http://arxiv.org/abs/2311.15408v1 ) ライセンス: Link先を確認	Ewout van den Berg, Pawel Wocjan	(参考訳) 確率的誤差キャンセルやゼロノイズ外挿のような誤差緩和技術は、正確なノイズモデルから恩恵を受ける。 sparse pauli-lindbladノイズモデルは、これらのアプリケーションでもっとも成功したモデルの1つです。既存の実装では、モデルは、キュービット位相に従う一項と二項の局所項を持つ一連の単純なパウリチャネルに分解される。このモデルは、現代の超伝導量子プロセッサの誤差軽減のためのノイズを正確に捉えることが示されているが、最寄りの相互作用を超えた高次項や効果を考慮することが重要である。しかし、そのような拡張モデルが実用的であり続けるためには、それらが効率的に学習できることを保証する必要がある。本研究では,これを実現する新しい手法を提案する。我々は,ポーリ回転に基づくtwirlingを導入することで,単一量子ビットの学習補正シーケンスを自動生成し,学習する必要のある独特なフィデリティの数を減らすことができる。さらに,学習ベース数を最小化するために,グラフカラー化と一様被覆配列を利用する基底選択戦略を提案する。これらの手法を組み合わせることで、拡張されたノイズモデルの学習が、複雑さが増しても効率的であることを保証する。 Error-mitigation techniques such as probabilistic error cancellation and zero-noise extrapolation benefit from accurate noise models. The sparse Pauli-Lindblad noise model is one of the most successful models for those applications. In existing implementations, the model decomposes into a series of simple Pauli channels with one- and two-local terms that follow the qubit topology. While the model has been shown to accurately capture the noise in contemporary superconducting quantum processors for error mitigation, it is important to consider higher-weight terms and effects beyond nearest-neighbor interactions. For such extended models to remain practical, however, we need to ensure that they can be learned efficiently. In this work we present new techniques that accomplish exactly this. We introduce twirling based on Pauli rotations, which enables us to automatically generate single-qubit learning correction sequences and reduce the number of unique fidelities that need to be learned. In addition, we propose a basis-selection strategy that leverages graph coloring and uniform covering arrays to minimize the number of learning bases. Taken together, these techniques ensure that the learning of the extended noise models remains efficient, despite their increased complexity.	
翻訳日:2023-11-28 17:58:21 公開日:2023-11-26
# 統計的学習理論を深層学習に適用する Applying statistical learning theory to deep learning ( http://arxiv.org/abs/2311.15404v1 ) ライセンス: Link先を確認	C\'edric Gerbelot, Avetik Karagulyan, Stefani Karp, Kavya Ravichandran, Menachem Stern, Nathan Srebro	(参考訳) 統計的学習理論は教師付き学習を理解するための強固な枠組みを提供するが、深層学習の多くの理論的な側面はいまだに不明であり、特に、異なるアーキテクチャが勾配に基づく方法で訓練された場合、どのように帰納的バイアスをもたらすかである。これらの講義の目的は、学習理論の観点から深層学習を理解しようとするときに生じる主な疑問の概要を提供することである。統計的学習理論と確率的最適化に関する簡単なリマインダーの後、良性過剰の文脈で暗黙のバイアスについて論じる。その後、ミラー降下アルゴリズムの一般的な説明に移り、与えられた学習問題に対するパラメータ空間と対応する関数空間の間の行き来や、学習問題の幾何が計量テンソルによってどのように表現されるかを示す。この枠組みに基づき,線形対角ネットワーク上の勾配降下の暗黙的バイアスを,様々な回帰タスクに対して詳細に検討し,損失関数,初期化時のパラメータスケール,ネットワークの深さが,暗黙的バイアス,特にカーネルや特徴学習間の遷移にどのようにつながるかを示す。 Although statistical learning theory provides a robust framework to understand supervised learning, many theoretical aspects of deep learning remain unclear, in particular how different architectures may lead to inductive bias when trained using gradient based methods. The goal of these lectures is to provide an overview of some of the main questions that arise when attempting to understand deep learning from a learning theory perspective. After a brief reminder on statistical learning theory and stochastic optimization, we discuss implicit bias in the context of benign overfitting. We then move to a general description of the mirror descent algorithm, showing how we may go back and forth between a parameter space and the corresponding function space for a given learning problem, as well as how the geometry of the learning problem may be represented by a metric tensor. Building on this framework, we provide a detailed study of the implicit bias of gradient descent on linear diagonal networks for various regression tasks, showing how the loss function, scale of parameters at initialization and depth of the network may lead to various forms of implicit bias, in particular transitioning between kernel or feature learning.	翻訳日:2023-11-28 17:58:04 公開日:2023-11-26
# マルチラベル文書分類のためのセクション重みの学習 Learning Section Weights for Multi-Label Document Classification ( http://arxiv.org/abs/2311.15402v1 ) ライセンス: Link先を確認	Maziar Moradi Fard, Paula Sorrolla Bayod, Kiomars Motarjem, Mohammad Alian Nejadi, Saber Akhondi, Camilo Thorne	(参考訳) マルチラベル文書分類は、NLPにおける伝統的なタスクである。シングルラベルの分類と比較すると、各文書は複数のクラスに割り当てられる。この問題は科学論文のタグ付けなど、様々な分野において極めて重要である。文書は、しばしばアブストラクトやタイトルなどのいくつかのセクションに分けられる。現在のアプローチでは、マルチラベル分類において異なるセクションを等しく扱う。これは現実的な仮定ではなく、準最適な結果をもたらすと我々は主張する。そこで我々は,マルチラベル分類における各セクションの寄与を利用する,LSW(Learning Section Weights)と呼ばれる新しい手法を提案する。複数のフィードフォワード層によって、LSWは各セクションに重みを割り当て、その重みを予測に組み込むことを学ぶ。我々は科学論文を対象に本手法を実証する。パブリック(arXiv)およびプライベート(Elsevier)データセットの実験結果は、最先端のマルチラベル文書分類法と比較して、LSWの優位性を確認する。特に、LSWは公開利用可能なarXivデータセットにおいて、マクロ平均F1スコアで1.3%、マクロ平均リコールで1.3%の改善を達成した。 Multi-label document classification is a traditional task in NLP. Compared to single-label classification, each document can be assigned multiple classes. This problem is crucially important in various domains, such as tagging scientific articles. Documents are often structured into several sections such as abstract and title. Current approaches treat different sections equally for multi-label classification. We argue that this is not a realistic assumption, leading to sub-optimal results. Instead, we propose a new method called Learning Section Weights (LSW), leveraging the contribution of each distinct section for multi-label classification. Via multiple feed-forward layers, LSW learns to assign weights to each section and to incorporate the weights in the prediction. We demonstrate our approach on scientific articles. Experimental results on public (arXiv) and private (Elsevier) datasets confirm the superiority of LSW, compared to state-of-the-art multi-label document classification methods. In particular, LSW achieves a 1.3% improvement in terms of macro averaged F1-score while it achieves 1.3% in terms of macro averaged recall on the publicly available arXiv dataset.	翻訳日:2023-11-28 17:57:42 公開日:2023-11-26
# 日常生活行動の現実的シミュレーションのための枠組み A Framework for Realistic Simulation of Daily Human Activity ( http://arxiv.org/abs/2311.15400v1 ) ライセンス: Link先を確認	Ifrah Idrees, Siddharth Singh, Kerui Xu, Dylan F. Glas	(参考訳) 家庭内のユーザの日常的な動きに反応し適応するAstroのようなソーシャルロボットにとって、機能開発とテストには、人間の活動の現実的なシミュレーションが必要である。本稿では,在宅環境における日常の行動パターンをシミュレーションし,異なるパーソナラや活動パターンの手動構成可能性,活動タイミングの変動,複数のホームレイアウトのテストを行うためのフレームワークを提案する。本稿では,スケジュールの日々の変動を特定する手法を提案し,テンプレートからスケジュールを生成する双方向制約伝搬アルゴリズムを提案する。ユースケースシナリオ分析を用いて、我々のフレームワークの表現力を検証するとともに、3つの公開データセットと自己収集データセットから人間の行動によく似たデータを生成することができることを示す。本研究の貢献は,社会ロボットの大規模行動の体系的テストを支援し,異なる家庭における人間の行動の合成データセットの手続き的生成を可能にし,トレーニングデータのバイアスを最小化し,家庭環境におけるより堅牢で効果的なロボットの実現に寄与する。 For social robots like Astro which interact with and adapt to the daily movements of users within the home, realistic simulation of human activity is needed for feature development and testing. This paper presents a framework for simulating daily human activity patterns in home environments at scale, supporting manual configurability of different personas or activity patterns, variation of activity timings, and testing on multiple home layouts. We introduce a method for specifying day-to-day variation in schedules and present a bidirectional constraint propagation algorithm for generating schedules from templates. We validate the expressive power of our framework through a use case scenario analysis and demonstrate that our method can be used to generate data closely resembling human behavior from three public datasets and a self-collected dataset. Our contribution supports systematic testing of social robot behaviors at scale, enables procedural generation of synthetic datasets of human movement in different households, and can help minimize bias in training data, leading to more robust and effective robots for home environments.	翻訳日:2023-11-28 17:57:20 公開日:2023-11-26
# 線形行動クローニングエージェントの最適指導 Optimally Teaching a Linear Behavior Cloning Agent ( http://arxiv.org/abs/2311.15399v1 ) ライセンス: Link先を確認	Shubham Kumar Bharti, Stephen Wright, Adish Singla, Xiaojin Zhu	(参考訳) 線形行動クローニング(LBC)学習者の最適指導について検討する。この設定では、教師はLBC学習者に示す状態を選択することができる。学習者は、デモと一致する無限線形仮説のバージョン空間を維持する。教師の目標は,最小限の数の状態デモを用いて,実現可能な目標方策を学習者に教えることである。この数はTeaching Dimension (TD)として知られている。本稿では,インスタンス最適なTDを実現する「Teach using Iterative Elimination (TIE)」という指導アルゴリズムを提案する。しかし、最適な教示集合の探索は計算上NP困難であることも示す。さらに、教示次元に対して$\log(\|A\|-1)$の近似比を保証する近似アルゴリズムを提供する。最後に,本アルゴリズムの効率と有効性を検証する実験結果を提供する。 We study optimal teaching of Linear Behavior Cloning (LBC) learners. In this setup, the teacher can select which states to demonstrate to an LBC learner. The learner maintains a version space of infinite linear hypotheses consistent with the demonstration. The goal of the teacher is to teach a realizable target policy to the learner using the minimum number of state demonstrations. This number is known as the Teaching Dimension (TD). We present a teaching algorithm called "Teach using Iterative Elimination (TIE)" that achieves instance-optimal TD. However, we also show that finding an optimal teaching set is computationally NP-hard. We further provide an approximation algorithm that guarantees an approximation ratio of $\log(\|A\|-1)$ on the teaching dimension. Finally, we provide experimental results to validate the efficiency and effectiveness of our algorithm.	翻訳日:2023-11-28 17:56:50 公開日:2023-11-26
# 半制約クラスタリングのための制約マッチング ConstraintMatch for Semi-constrained Clustering ( http://arxiv.org/abs/2311.15395v1 ) ライセンス: Link先を確認	Jann Goschenhofer, Bernd Bischl, Zsolt Kira	(参考訳) 制約付きクラスタリングによって、ペアワイズ制約のみを使用した分類モデルのトレーニングが可能になる。真の基盤となるクラスラベルがなくてもうまく機能するが、制約付きクラスタリングモデルはトレーニングに大量のバイナリ制約アノテーションを必要とする。本稿では,制約の小さなセットとともに大量の \textit{unconstrained} データを利用できる半教師付きコンテキストを提案し,そのような制約のないデータを活用するために \textit{ConstraintMatch} を提案する。完全なラベルを用いた半教師付き学習では、多くの進歩がなされているが、制約ベースのラベル設定において、結果のメソッドをナイーブに適用することを妨げる多くの課題がある。したがって、これらの課題、特にその理由と分析を行う。 1)疑似ラベルの主な弱点である確認バイアスを克服するための \textit{pseudo-constraining} メカニズムの提案 2) \textit{informative} unconstrainedサンプルの選択に向けた擬似ラベル法の開発 3) 半拘束型モデルトレーニングを容易にする初期損失と補助損失に対するペアワイズ損失関数の使用も可能であることを示す。大規模実験により,5つの難解なベンチマークにおいて,正規クラスタリングとオーバークラスタシナリオの両方において,関連するベースラインに対する制約マッチの有効性を実証し,いくつかのコンポーネントの分析を提供する。 Constrained clustering allows the training of classification models using pairwise constraints only, which are weak and relatively easy to mine, while still yielding full-supervision-level model performance. While they perform well even in the absence of the true underlying class labels, constrained clustering models still require large amounts of binary constraint annotations for training. In this paper, we propose a semi-supervised context whereby a large amount of \textit{unconstrained} data is available alongside a smaller set of constraints, and propose \textit{ConstraintMatch} to leverage such unconstrained data. While a great deal of progress has been made in semi-supervised learning using full labels, there are a number of challenges that prevent a naive application of the resulting methods in the constraint-based label setting. 
Therefore, we reason about and analyze these challenges, specifically 1) proposing a \textit{pseudo-constraining} mechanism to overcome the confirmation bias, a major weakness of pseudo-labeling, 2) developing new methods for pseudo-labeling towards the selection of \textit{informative} unconstrained samples, 3) showing that this also allows the use of pairwise loss functions for the initial and auxiliary losses which facilitates semi-constrained model training. In extensive experiments, we demonstrate the effectiveness of ConstraintMatch over relevant baselines in both the regular clustering and overclustering scenarios on five challenging benchmarks and provide analyses of its several components.	翻訳日:2023-11-28 17:56:34 公開日:2023-11-26
# 2層非線形回帰に対する近似ニュートン法の局所収束 Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression ( http://arxiv.org/abs/2311.15390v1 ) ライセンス: Link先を確認	Zhihang Li, Zhao Song, Zifan Wang, Junze Yin	(参考訳) 日常生活の様々な側面において,大規模言語モデル(LLM)による顕著な進歩があった。 LLMは自然言語処理における変革の原動力として機能し、テキスト生成、翻訳、感情分析、質問応答に応用されている。 LLMの成果は、この分野における研究努力の大幅な増加につながった。ある特定の2層回帰問題は先行研究においてよく研究されており、第1の層はReLUユニットによって活性化され、第2の層はsoftmaxユニットによって活性化される。先行研究は2層回帰の構築について堅固な分析を提供するが、2層を超える回帰問題の構築に関する分析には依然としてギャップがある。本稿では,この問題に対処するための重要なステップとして,二層回帰問題の解析を行う。先行研究とは対照的に、我々の第1の層はsoftmaxユニットによって活性化される。これにより、softmax関数に基づいてより多くの活性化関数を作成する将来の分析の土台が整う。softmax関数の配置を入れ替えると、大きく異なる分析が必要になる。主な結果として, 正則化された訓練損失を最小化するために用いられる近似ニュートン法の収束特性を解析する。損失関数のヘッセ行列が、ある仮定の下で正定値かつリプシッツ連続であることを証明する。これにより,提案する訓練アルゴリズムの局所収束保証を確立することができる。具体的には、適切な初期化と$O(\log(1/\epsilon))$回の反復の後、高い確率で訓練損失の$\epsilon$近似最小化解を見つけることができる。各反復はおよそ$O(\mathrm{nnz}(C) + d^\omega)$の時間を必要とし、$d$はモデルのサイズ、$C$は入力行列、$\omega < 2.374$は行列乗算指数である。 There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit, and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem. In contrast to previous works, our first layer is activated by a softmax unit. 
This sets the stage for future analyses of creating more activation functions based on the softmax function. Rearranging the softmax function leads to significantly different analyses. Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the Hessian matrix of the loss function is positive definite and Lipschitz continuous under certain assumptions. This enables us to establish local convergence guarantees for the proposed training algorithm. Specifically, with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, our algorithm can find an $\epsilon$-approximate minimizer of the training loss with high probability. Each iteration requires approximately $O(\mathrm{nnz}(C) + d^\omega)$ time, where $d$ is the model size, $C$ is the input matrix, and $\omega < 2.374$ is the matrix multiplication exponent.	翻訳日:2023-11-28 17:55:56 公開日:2023-11-26
# Spectro-ViT: スペクトログラムを用いたGABA編集MRS再構成のためのVision Transformerモデル Spectro-ViT: A Vision Transformer Model for GABA-edited MRS Reconstruction Using Spectrograms ( http://arxiv.org/abs/2311.15386v1 ) ライセンス: Link先を確認	Gabriel Dias, Rodrigo Pommot Berto, Mateus Oliveira, Lucas Ueda, Sergio Dertkigil, Paula D. P. Costa, Amirmohammad Shamaei, Roberto Souza, Ashley Harris, Leticia Rittner	(参考訳) 目的: 通常取得されるトランジェント数の4分の1から,スペクトログラムを用いてGABA編集磁気共鳴分光法(MRS)を再構成・ノイズ除去するためのVision Transformer (ViT) の利用を検討すること。理論と方法: GABA編集MRSスキャンで通常収集されるトランジェント数の4分の1を前処理し、短時間フーリエ変換(STFT)を用いてスペクトログラム画像表現に変換する。データの画像表現により、GABA編集MRSスペクトルを再構成するために事前訓練済みViTを適応させることが可能になる(Spectro-ViT)。 Spectro-ViTは微調整され、その後、 \textit{in vivo} GABA編集MRSデータを用いてテストされる。スペクトル品質指標と推定代謝物濃度値を用いて, Spectro-ViTの性能を文献中の他のモデルと比較した。結果: Spectro-ViTモデルは,5つの定量的指標のうち4つ(平均二乗誤差,形状スコア,GABA+/水フィット誤差,半値全幅)で,他のすべてのモデルを大きく上回った。推定された代謝物濃度(GABA+/水, GABA+/Cr, およびGlx/水)は, 通常収集される全トランジェントで再構成した典型的なGABA編集MRSスキャンから推定した代謝物濃度と一致した。結論: 提案したSpectro-ViTモデルはGABA編集MRSの再構成において最先端の結果を達成し,スキャンを最大4倍高速化できる可能性を示している。 Purpose: To investigate the use of a Vision Transformer (ViT) to reconstruct/denoise GABA-edited magnetic resonance spectroscopy (MRS) from a quarter of the typically acquired number of transients using spectrograms. Theory and Methods: A quarter of the typically acquired number of transients collected in GABA-edited MRS scans are pre-processed and converted to a spectrogram image representation using the Short-Time Fourier Transform (STFT). The image representation of the data allows the adaptation of a pre-trained ViT for reconstructing GABA-edited MRS spectra (Spectro-ViT). The Spectro-ViT is fine-tuned and then tested using \textit{in vivo} GABA-edited MRS data. The Spectro-ViT performance is compared against other models in the literature using spectral quality metrics and estimated metabolite concentration values. 
Results: The Spectro-ViT model significantly outperformed all other models in four out of five quantitative metrics (mean squared error, shape score, GABA+/water fit error, and full width at half maximum). The metabolite concentrations estimated (GABA+/water, GABA+/Cr, and Glx/water) were consistent with the metabolite concentrations estimated using typical GABA-edited MRS scans reconstructed with the full amount of typically collected transients. Conclusion: The proposed Spectro-ViT model achieved state-of-the-art results in reconstructing GABA-edited MRS, and the results indicate these scans could be up to four times faster.	翻訳日:2023-11-28 17:55:10 公開日:2023-11-26
# ロバストかつ自動的なデータクラスタリング: Dirichlet ProcessとMedian-of-Meansの出会い Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means ( http://arxiv.org/abs/2311.15384v1 ) ライセンス: Link先を確認	Supratik Basu, Jyotishka Ray Choudhury, Debolina Paul, Swagatam Das	(参考訳) クラスタリングは、教師なし機械学習の領域における最も顕著な課題の1つである。数あるセントロイドベースのクラスタリングアルゴリズムの中で、ロイドのヒューリスティックに根ざした古典的な$k$-meansアルゴリズムは、文献で広く使われている技法の1つとして中心的な位置を占めている。それでも、$k$-meansとその変種には注目すべき制限がある。すなわち、初期クラスタ中心に強く依存し、目的関数の局所最小値に収束しやすく、データの外れ値やノイズに敏感である。ノイズや外れ値を含むデータに対しては、Median-of-Means(MoM)推定器が任意のセントロイドベースのクラスタリングフレームワークを安定化させる手段となる。一方で、既存のクラスタリング方法論に共通する制約は、分析の前にクラスタ数に関する事前知識を必要とする点にある。ベイズ非パラメトリックモデルのようなモデルベース手法を利用することで、無限混合モデルの利点が得られるため、そのような要求を回避できる。本稿では,これらの事実に動機づけられて,モデルベース手法とセントロイドベース手法の原則を統合することにより,クラスタ数を事前に指定する必要なく,ノイズがクラスタリング品質に与える影響を緩和する,効率的かつ自動的なクラスタリング手法を提案する。クラスタリング誤差の上限に関する統計的保証と、シミュレーションおよび実データによる厳密な評価は、既存の最先端クラスタリングアルゴリズムに対する提案手法の利点を示唆している。 Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These encompass a heavy reliance on initial cluster centroids, susceptibility to converging into local minima of the objective function, and sensitivity to outliers and noise in the data. When confronted with data containing noisy or outlier-laden observations, the Median-of-Means (MoM) estimator emerges as a stabilizing force for any centroid-based clustering framework. On a different note, a prevalent constraint among existing clustering methodologies resides in the prerequisite knowledge of the number of clusters prior to analysis. 
Utilizing model-based methodologies, such as Bayesian nonparametric models, offers the advantage of infinite mixture models, thereby circumventing the need for such requirements. Motivated by these facts, in this article, we present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies that mitigates the effect of noise on the quality of clustering while ensuring that the number of clusters need not be specified in advance. Statistical guarantees on the upper bound of clustering error, and rigorous assessment through simulated and real datasets suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.	翻訳日:2023-11-28 17:54:40 公開日:2023-11-26
# ゼロショットオープン語彙3次元視覚グラウンドのためのビジュアルプログラミング Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding ( http://arxiv.org/abs/2311.15383v1 ) ライセンス: Link先を確認	Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li	(参考訳) 3Dビジュアルグラウンド(3DVG)はテキスト記述に基づく3Dオブジェクトのローカライズを目的としている。従来の3DVGの教師付き手法は、しばしば広範囲のアノテーションと事前定義された語彙を必要とする。この問題に対処するために,大規模言語モデル(LLM)の能力を活かしたゼロショットオープン語彙3DVGのための新しいビジュアルプログラミング手法を提案する。提案手法は,ゼロショット3DVGの基本的な理解を確立するため,LLMに係わるユニークなダイアログベースの手法から始まる。これに基づいて、ビュー非依存、ビュー依存、機能モジュールという3つのタイプのモジュールからなる視覚プログラムを設計する。これらのモジュールは、特に3Dシナリオに適したもので、複雑な推論と推論を実行するために協調して動作する。さらに,既存の3次元オブジェクト検出器の範囲をオープン語彙シナリオに拡張する言語オブジェクト相関モジュールを開発した。我々のゼロショットアプローチは、いくつかの教師付きベースラインより優れており、効果的な3DVGへの大きな前進を示している。 3D Visual Grounding (3DVG) aims at localizing 3D object based on textual descriptions. Conventional supervised methods for 3DVG often necessitate extensive annotations and a predefined vocabulary, which can be restrictive. To address this issue, we propose a novel visual programming approach for zero-shot open-vocabulary 3DVG, leveraging the capabilities of large language models (LLMs). Our approach begins with a unique dialog-based method, engaging with LLMs to establish a foundational understanding of zero-shot 3DVG. Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules. These modules, specifically tailored for 3D scenarios, work collaboratively to perform complex reasoning and inference. Furthermore, we develop an innovative language-object correlation module to extend the scope of existing 3D object detectors into open-vocabulary scenarios. Extensive experiments demonstrate that our zero-shot approach can outperform some supervised baselines, marking a significant stride towards effective 3DVG.	翻訳日:2023-11-28 17:54:07 公開日:2023-11-26
# フェデレーション学習のためのマルチグローバルサーバアーキテクチャの評価 Evaluating Multi-Global Server Architecture for Federated Learning ( http://arxiv.org/abs/2311.15382v1 ) ライセンス: Link先を確認	Asfia Kawnine, Hung Cao, Atah Nuh Mih, Monica Wachowicz	(参考訳) 単一のグローバルサーバフレームワークによるフェデレーション学習(fl)は現在、モバイルデバイスやエッジデバイスといった分散環境でマシンラーニングモデルをトレーニングするための一般的なアプローチである。しかしながら、集中型サーバアーキテクチャは、中央/グローバルサーバ上のあらゆる課題がシステム全体の障害を引き起こすため、リスクを負う。このリスクを最小限に抑えるために,複数のグローバルサーバのデプロイを活用する新しいフェデレーション学習フレームワークを提案する。フェデレーション学習における複数のグローバルサーバの実装は,局所的なコラボレーションと知識の集約を生かして効率を向上し,単一サーバフレームワークにおける通信障害に対するエラー耐性を処理できることを実証する。そこで我々は,複数のグローバルサーバの展開を利用する新しいフレームワークを提案する。複数駅における電気自動車(ev)充電の事象履歴を含むデータセットを用いて,一連の実験を行った。複数のグローバルサーバとクライアントサーバを連携させて,各クライアントサーバが異なるリージョンを戦略的に表現し,グローバルサーバがそれらのデバイスからローカル更新を集約する役割を担った。グローバルモデルの予備結果は、複数のサーバに起因するパフォーマンスの差が1%未満であることを示している。モデル効率が向上するという仮説は期待通りではなかったが、アルゴリズムに付加された通信課題を扱うための規則は、誤り耐性の問題を解決した。将来の研究は、複数のグローバルサーバをデプロイするための特定の用途を特定することに集中できる。 Federated learning (FL) with a single global server framework is currently a popular approach for training machine learning models on decentralized environment, such as mobile devices and edge devices. However, the centralized server architecture poses a risk as any challenge on the central/global server would result in the failure of the entire system. To minimize this risk, we propose a novel federated learning framework that leverages the deployment of multiple global servers. We posit that implementing multiple global servers in federated learning can enhance efficiency by capitalizing on local collaborations and aggregating knowledge, and the error tolerance in regard to communication failure in the single server framework would be handled. We therefore propose a novel framework that leverages the deployment of multiple global servers. We conducted a series of experiments using a dataset containing the event history of electric vehicle (EV) charging at numerous stations. 
We deployed a federated learning setup with multiple global servers and client servers, where each client-server strategically represented a different region and a global server was responsible for aggregating local updates from those devices. Our preliminary results of the global models demonstrate that the difference in performance attributed to multiple servers is less than 1%. While the hypothesis of enhanced model efficiency was not as expected, the rule for handling communication challenges added to the algorithm could resolve the error tolerance issue. Future research can focus on identifying specific uses for the deployment of multiple global servers.	翻訳日:2023-11-28 17:53:46 公開日:2023-11-26
# 計算効率の向上とAI能力の拡散 Increased Compute Efficiency and the Diffusion of AI Capabilities ( http://arxiv.org/abs/2311.15377v1 ) ライセンス: Link先を確認	Konstantin Pilz, Lennart Heim, Nicholas Brown	(参考訳) 高度なaiモデルのトレーニングには、計算リソースや計算に多大な投資が必要です。しかし、ハードウェアの革新が計算とアルゴリズムの進歩の価格を下げるにつれ、AIモデルを所定のパフォーマンスにトレーニングするコストは時間の経過とともに低下する。この現象を分析するために、計算投資のトレーニングと結果のAIモデルの性能を関連付ける計算(投資)効率を導入する。次に,計算効率の向上の概念モデルを提案し,社会的・統治的意味を評価する。アクセス効果は、時間とともにモデルをトレーニングできるアクターの数を増やすが、パフォーマンス効果は、大きな計算投資家が新しい機能を開拓し、能力が拡散してもパフォーマンス上の優位性を維持することができるように、アクターに利用可能なパフォーマンスを同時に向上させる。相対的なパフォーマンスの優位性はゼロサム競争において大きな利益をもたらすかもしれないが、パフォーマンスの天井はリーダーの優位性を減少させる可能性がある。それでも、最も深刻なリスクが最も先進的なモデルから生じた場合、大きな計算投資家は、まず危険な能力を発見すれば、特に精査を保証できる。そのため政府は、大規模な計算投資家に対して、危険な能力について警告し、適切な準備と、優れたモデルパフォーマンスと防御手段の計算アクセスを可能とするよう要求すべきである。過度なリスク、特に犯罪支配能力の場合、政府は完全に増殖を制限する必要があるかもしれない。 Training advanced AI models requires large investments in computational resources, or compute. Yet, as hardware innovation reduces the price of compute and algorithmic advances make its use more efficient, the cost of training an AI model to a given performance falls over time. To analyze this phenomenon, we introduce compute (investment) efficiency, which relates training compute investment to the resulting AI model performance. We then present a conceptual model of increases in compute efficiency and assess the social and governance implications. We find that while an access effect increases the number of actors who can train models to a given performance over time, a performance effect simultaneously increases the performance available to every actor - potentially enabling large compute investors to pioneer new capabilities and maintain a performance advantage even as capabilities diffuse. The market effects are multifaceted: while a relative performance advantage might grant outsized benefits in zero-sum competition, performance ceilings might reduce leaders' advantage. 
Nonetheless, we find that if the most severe risks arise from the most advanced models, large compute investors warrant particular scrutiny since they discover potentially dangerous capabilities first. Consequently, governments should require large compute investors to warn them about dangerous capabilities, thereby enabling timely preparation and potentially using their superior model performance and compute access for defensive measures. In cases of extreme risks, especially offense-dominant capabilities, the government might need to actively restrict the proliferation entirely.	翻訳日:2023-11-28 17:53:22 公開日:2023-11-26
# MI攻撃に必要なのは信頼だけ Confidence Is All You Need for MI Attacks ( http://arxiv.org/abs/2311.15373v1 ) ライセンス: Link先を確認	Abhishek Sinha, Himanshi Tibrewal, Mansi Gupta, Nikhar Waghela, Shivank Garg	(参考訳) 機械学習のセキュリティの進化期において、機密データの機密性に対する強力な脅威としてメンバーシップ推論攻撃が出現した。この攻撃では、敵はターゲットモデルのトレーニング中に特定のポイントが使用されたかどうかを判定する。本稿では,モデルのトレーニングセットにおけるデータポイントのメンバシップを計測する新しい手法を提案する。伝統的に行われているように、損失とメンバシップを関連付ける代わりに、トレーニング例が一般的に実際のクラスに分類された時に高い信頼度を示すという事実を活用しています。トレーニング中、モデルは基本的にトレーニングデータに適合しており、見えないデータに対する一般化において特に困難に直面する可能性がある。この非対称性は、トレーニングデータに存在する特定のパターンやノイズを利用するため、トレーニングデータに対する信頼性を高めるモデルにつながる。提案手法は,機械学習モデルが生成する信頼度値を活用する。これらの信頼度は、予測におけるモデルの確信度を確率論的に測定し、与えられたデータポイントのメンバシップを推測するためにさらに利用できる。さらに,与えられたデータポイントの基底真理(真のクラス)を知らずにこの攻撃を実行できる別の手法を導入することにより,既存のラベル依存型攻撃手法に対するエッジを提供する。 In this evolving era of machine learning security, membership inference attacks have emerged as a potent threat to the confidentiality of sensitive data. In this attack, adversaries aim to determine whether a particular point was used during the training of a target model. This paper proposes a new method to gauge a data point's membership in a model's training set. Instead of correlating loss with membership, as is traditionally done, we have leveraged the fact that training examples generally exhibit higher confidence values when classified into their actual class. During training, the model is essentially being 'fit' to the training data and might face particular difficulties in generalization to unseen data. This asymmetry leads to the model achieving higher confidence on the training data as it exploits the specific patterns and noise present in the training data. Our proposed approach leverages the confidence values generated by the machine learning model. These confidence values provide a probabilistic measure of the model's certainty in its predictions and can further be used to infer the membership of a given data point. 
Additionally, we introduce another variant of our method that allows us to carry out this attack without knowing the ground truth (true class) of a given data point, thus offering an edge over existing label-dependent attack methods.	翻訳日:2023-11-28 17:52:57 公開日:2023-11-26
# TD-Net : スパース・ビューCT再構成のためのトリドメインネットワーク TD-Net: A Tri-domain network for sparse-view CT reconstruction ( http://arxiv.org/abs/2311.15369v1 ) ライセンス: Link先を確認	Xinyuan Wang and Changqing Su and Bo Xiong	(参考訳) X線被曝リスクの低減を目的としたスパースビューCT再構成は、しばしばノイズやアーティファクトとして現れる画質劣化に悩まされる。既存のポストプロセッシングとデュアルドメイン技術は、放射線の低減に効果があるが、しばしば過度に平滑化された結果につながり、診断の明確さを損なう。そこで本研究では,シノグラム,画像,周波数領域の最適化を統一した先駆的な3領域アプローチであるTD-Netを提案する。周波数スーパービジョンモジュール(FSM)を組み込むことで、TD-Netは複雑な詳細を適切に保存し、過度な平滑化の問題を克服する。広範な評価は、スパースビューからの高画質CT画像の再構成におけるTD-Netの優れた性能を示し、放射線の安全性と画像の忠実度を効率的に両立している。様々なノイズシナリオにおけるTD-Netの性能向上は、医療画像におけるブレークスルーとしての可能性を強調している。 Sparse-view CT reconstruction, aimed at reducing X-ray radiation risks, frequently suffers from image quality degradation, manifested as noise and artifacts. Existing post-processing and dual-domain techniques, although effective in radiation reduction, often lead to over-smoothed results, compromising diagnostic clarity. Addressing this, we introduce TD-Net, a pioneering tri-domain approach that unifies sinogram, image, and frequency domain optimizations. By incorporating a Frequency Supervision Module (FSM), TD-Net adeptly preserves intricate details, overcoming the prevalent over-smoothing issue. Extensive evaluations demonstrate TD-Net's superior performance in reconstructing high-quality CT images from sparse views, efficiently balancing radiation safety and image fidelity. The enhanced capabilities of TD-Net in varied noise scenarios highlight its potential as a breakthrough in medical imaging.	翻訳日:2023-11-28 17:52:37 公開日:2023-11-26
# ビデオインペインティングのためのフローガイド拡散 Flow-Guided Diffusion for Video Inpainting ( http://arxiv.org/abs/2311.15368v1 ) ライセンス: Link先を確認	Bohai Gu, Yongsheng Yu, Heng Fan, Libo Zhang	(参考訳) ビデオインペインティングは、大きな動きや低照度条件といった複雑なシナリオにおいて困難に直面してきた。新たな拡散モデルを含む現在の手法は、品質と効率の限界に直面している。本稿では,既製の画像生成拡散モデルを再利用することで時間的一貫性とインペインティング品質を大幅に向上させる新しい手法,FGDVI(Flow-Guided Diffusion model for Video Inpainting)を提案する。我々は,高精度な1ステップの潜在伝播にオプティカルフローを用い,モデル非依存なフローガイド付き潜在補間手法を導入する。この手法はノイズ除去を高速化し、追加のトレーニングなしで任意のビデオ拡散モデル(VDM)とシームレスに統合できる。我々のFGDVIは、既存の最先端手法に比べて、フローワープ誤差E_warpが10%改善したことを示す。包括的な実験によりFGDVIの優れた性能が検証され,高度なビデオインペインティングに向けた有望な方向性が得られた。コードと詳細な結果はhttps://github.com/NevSNev/FGDVIで公開される予定である。 Video inpainting has been challenged by complex scenarios like large movements and low-light conditions. Current methods, including emerging diffusion models, face limitations in quality and efficiency. This paper introduces the Flow-Guided Diffusion model for Video Inpainting (FGDVI), a novel approach that significantly enhances temporal consistency and inpainting quality via reusing an off-the-shelf image generation diffusion model. We employ optical flow for precise one-step latent propagation and introduce a model-agnostic flow-guided latent interpolation technique. This technique expedites denoising, seamlessly integrating with any Video Diffusion Model (VDM) without additional training. Our FGDVI demonstrates a remarkable 10% improvement in flow warping error E_warp over existing state-of-the-art methods. Our comprehensive experiments validate the superior performance of FGDVI, offering a promising direction for advanced video inpainting. The code and detailed results will be publicly available at https://github.com/NevSNev/FGDVI.	翻訳日:2023-11-28 17:52:20 公開日:2023-11-26
# GGNN : 残差接続と重み付きメッセージパッシングを用いたGNNの一般化 GGNNs : Generalizing GNNs using Residual Connections and Weighted Message Passing ( http://arxiv.org/abs/2311.15448v1 ) ライセンス: Link先を確認	Abhinav Raghuvanshi and Kushal Sokke Malleshappa	(参考訳) 多くの実世界の現象はグラフとしてモデル化することができ、その普遍的存在のために非常に価値がある。 GNNはこれらのグラフ内の関係やパターンを捉え、効果的な学習と予測タスクを可能にする。 GNNはMulti-Layer Perceptrons (MLP)を使用して構築され、ノード間の特徴量の流れを容易にするためにメッセージパッシングのための追加レイヤが組み込まれている。一般に、GNNの一般化力は、ノードが隣人と情報を交換し、グラフのノード間で情報を効果的に取得し、伝播することができる層間のメッセージパッシング機構に起因していると考えられている。提案手法はこれらの結果を踏まえ、各ノードで集約する前にメッセージを重み付けすることと、残差接続を追加することの2点によって、メッセージパッシング機構をさらに改良する。この2つのメカニズムは学習の大幅な改善とより高速な収束を示す。 Many real-world phenomena can be modeled as a graph, making them extremely valuable due to their ubiquitous presence. GNNs excel at capturing the relationships and patterns within these graphs, enabling effective learning and prediction tasks. GNNs are constructed using Multi-Layer Perceptrons (MLPs) and incorporate additional layers for message passing to facilitate the flow of features among nodes. It is commonly believed that the generalizing power of GNNs is attributed to the message-passing mechanism between layers, where nodes exchange information with their neighbors, enabling them to effectively capture and propagate information across the nodes of a graph. Our technique builds on these results, modifying the message-passing mechanism further in two ways: first by weighting the messages before accumulating them at each node, and second by adding residual connections. These two mechanisms show significant improvements in learning and faster convergence.	翻訳日:2023-11-28 17:43:57 公開日:2023-11-26
# FLAIR: 顔ビデオ復元のための条件付き拡散フレームワーク FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration ( http://arxiv.org/abs/2311.15445v1 ) ライセンス: Link先を確認	Zihao Zou and Jiaming Liu and Shirin Shoushtari and Yubo Wang and Weijie Gan and Ulugbek S. Kamilov	(参考訳) 顔ビデオ復元(FVR)は、低品質の入力から知覚的にリアルな顔映像を復元しようとする、難しいが重要な問題である。拡散確率モデル(DPM)は顔画像の復元において顕著な性能を発揮することが示されているが、時間的に一貫した高品質な映像を保てないことが多く、再構成された顔の忠実さを損なう。我々は、FVRのためのFLAIRと呼ばれる新しい条件付き拡散フレームワークを提案する。 FLAIRは、従来の画像DPMをビデオDPMに変換することにより、フレーム間の時間的一貫性を計算効率よく確保する。提案する変換は、再帰的なビデオリファインメント層と、異なるスケールでの時間的自己アテンションを用いる。 FLAIRはまた、推論中に知覚品質と歪み品質のバランスをとるために条件付き反復リファインメントプロセスを使用する。このプロセスは、2つの重要なコンポーネントから構成される。生成されたビデオがその劣化した観測に正確に一致することを解析的に保証するデータ一貫性モジュールと、顔領域に特化した粗から細への画像強調モジュールである。ビデオの超解像、デブラーリング、JPEG復元、および時空間フレーム補間について、2つの高品質な顔ビデオデータセットでFLAIRが現在の最先端(SOTA)よりも優れていることを示す。 Face video restoration (FVR) is a challenging but important problem where one seeks to recover perceptually realistic face videos from a low-quality input. While diffusion probabilistic models (DPMs) have been shown to achieve remarkable performance for face image restoration, they often fail to preserve temporally coherent, high-quality videos, compromising the fidelity of reconstructed faces. We present a new conditional diffusion framework called FLAIR for FVR. FLAIR ensures temporal consistency across frames in a computationally efficient fashion by converting a traditional image DPM into a video DPM. The proposed conversion uses a recurrent video refinement layer and a temporal self-attention at different scales. FLAIR also uses a conditional iterative refinement process to balance the perceptual and distortion quality during inference. This process consists of two key components: a data-consistency module that analytically ensures that the generated video precisely matches its degraded observation and a coarse-to-fine image enhancement module specifically for facial regions. 
Our extensive experiments show superiority of FLAIR over the current state-of-the-art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets.	翻訳日:2023-11-28 17:43:41 公開日:2023-11-26
# 量子拡散モデル Quantum Diffusion Models ( http://arxiv.org/abs/2311.15444v1 ) ライセンス: Link先を確認	Andrea Cacioppo, Lorenzo Colantonio, Simone Bordoni and Stefano Giagu	(参考訳) 我々は生成拡散モデルの量子バージョンを提案する。このアルゴリズムでは、ニューラルネットワークは量子状態を直接生成するためにパラメータ化された量子回路に置き換えられる。我々はアルゴリズムの完全な量子バージョンと潜在量子バージョンの両方を示し、これらのモデルの条件付きバージョンも提示する。モデルの性能は質的評価によって補完される定量的指標を用いて評価されてきた。アルゴリズムの簡易版の実装は、実際のNISQ量子ハードウェア上で実行されている。 We propose a quantum version of a generative diffusion model. In this algorithm, artificial neural networks are replaced with parameterized quantum circuits, in order to directly generate quantum states. We present both a full quantum and a latent quantum version of the algorithm; we also present a conditioned version of these models. The models' performances have been evaluated using quantitative metrics complemented by qualitative assessments. An implementation of a simplified version of the algorithm has been executed on real NISQ quantum hardware.	翻訳日:2023-11-28 17:43:17 公開日:2023-11-26
# simplex 構造を用いたグラフィックプリミティブの効率的な符号化 Efficient Encoding of Graphics Primitives with Simplex-based Structures ( http://arxiv.org/abs/2311.15439v1 ) ライセンス: Link先を確認	Yibo Wen, Yunfan Yang	(参考訳) グリッドベースの構造は、実装が簡単なため、画像、符号付き距離関数(SDF)、ニューラルレイディアンスフィールド(NeRF)などのグラフィックプリミティブの明示的な特徴を符号化するのに一般的に用いられる。しかし、$n$次元空間では、サンプリングされた点の値を計算するには、その$2^n$個の隣接する頂点の値を補間する必要がある。次元による指数的スケーリングは、大きな計算オーバーヘッドをもたらす。この問題に対処するため,グラフィックプリミティブをエンコードするためのsimplexベースの手法を提案する。 simplexベースの構造における頂点の数は次元とともに線形に増加するので、グリッドベースの表現よりも効率的で一般化できる。非軸整合なsimplicial構造の性質を用いて、シンプレックスノイズ(simplex noise)アルゴリズムの変換手順に類似した、効率的なサンプリングのための座標変換、simplicial subdivision、barycentric interpolationスキームを導出し、証明する。最後に、ハッシュテーブルを使用して、simplicialグリッド上のすべての関心点の多重解像度の特徴を格納し、グラフィックプリミティブをパラメータ化するために、完全に接続された小さなニューラルネットワークに渡す。我々は,simplexベースの構造符号化アルゴリズムをC++とCUDAで実装した。 2次元画像フィッティングタスクにおいて,提案手法は,同じ品質と圧縮率を維持しつつ,instant-ngpで提案されたベースライン法に比べて9.4%短い時間でギガピクセル画像をフィッティングできる。ボリュームレンダリングでは、サンプルが十分に密な場合に最大41.2%のスピードアップを観測する。 Grid-based structures are commonly used to encode explicit features for graphics primitives such as images, signed distance functions (SDF), and neural radiance fields (NeRF) due to their simple implementation. However, in $n$-dimensional space, calculating the value of a sampled point requires interpolating the values of its $2^n$ neighboring vertices. The exponential scaling with dimension leads to significant computational overheads. To address this issue, we propose a simplex-based approach for encoding graphics primitives. The number of vertices in a simplex-based structure increases linearly with dimension, making it a more efficient and generalizable alternative to grid-based representations. Using the non-axis-aligned simplicial structure property, we derive and prove a coordinate transformation, simplicial subdivision, and barycentric interpolation scheme for efficient sampling, which resembles transformation procedures in the simplex noise algorithm. Finally, we use hash tables to store multiresolution features of all interest points in the simplicial grid, which are passed into a tiny fully connected neural network to parameterize graphics primitives. We implemented a detailed simplex-based structure encoding algorithm in C++ and CUDA using the methods outlined in our approach. In the 2D image fitting task, the proposed method is capable of fitting a giga-pixel image with 9.4% less time compared to the baseline method proposed by instant-ngp, while maintaining the same quality and compression rate. In the volumetric rendering setup, we observe a maximum 41.2% speedup when the samples are dense enough.	翻訳日:2023-11-28 17:43:11 公開日:2023-11-26
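The vertex-count argument in the simplex abstract ($2^n$ grid corners versus a linearly growing number of simplex vertices) can be made concrete with a small sketch. Note the substitution: the paper describes a non-axis-aligned simplicial structure with its own coordinate transformation, whereas the code below uses the simpler, axis-aligned Kuhn subdivision of the unit cube, chosen here only because it shows barycentric interpolation over $n + 1$ vertices in a few lines. The name `simplex_interpolate` and the callback `value_at` are illustrative assumptions.

```python
def simplex_interpolate(point, value_at):
    """Interpolate at `point` in the unit cube [0, 1]^n using n + 1 vertices.

    Uses the Kuhn subdivision (an assumption; not the paper's skewed grid):
    sorting the coordinates in descending order identifies the enclosing
    simplex, and the gaps between sorted coordinates are the barycentric weights.
    `value_at` maps an integer corner tuple to a stored value.
    """
    n = len(point)
    order = sorted(range(n), key=lambda i: point[i], reverse=True)
    coords = [point[i] for i in order]

    corner = [0] * n
    vertices = [tuple(corner)]           # start at the origin corner
    weights = [1.0 - coords[0]]          # barycentric weight of the origin
    for k, axis in enumerate(order):
        corner[axis] = 1                 # walk one axis at a time toward (1, ..., 1)
        vertices.append(tuple(corner))
        following = coords[k + 1] if k + 1 < n else 0.0
        weights.append(coords[k] - following)

    value = sum(w * value_at(v) for w, v in zip(weights, vertices))
    return value, len(vertices)


# A linear test function is reproduced exactly by barycentric interpolation.
value, used = simplex_interpolate((0.25, 0.75), lambda v: v[0] + 2 * v[1])
grid_corners = 2 ** 2  # a bilinear grid lookup would touch all four corners
```

In two dimensions the saving is only three lookups instead of four, but the gap widens quickly: in six dimensions the comparison is 7 against 64.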
# ProtoArgNet: Super-Prototypes and Argumentationによる解釈可能な画像分類 [技術報告] ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation [Technical Report] ( http://arxiv.org/abs/2311.15438v1 ) ライセンス: Link先を確認	Hamed Ayoobi, Nico Potyka, Francesca Toni	(参考訳) ProtoPNetなどに見られるプロトタイプ部分学習(prototypical-part-learning)の精神に基づく、画像分類のための新しい解釈可能なディープニューラルアーキテクチャであるProtoArgNetを提案する。以前のアプローチでは、すべてのクラスを複数のprototypical-partに関連付けるが、ProtoArgNetは、prototypical-partを単一のプロトタイプ的クラス表現に組み合わせたsuper-prototypeを使用する。さらに、以前のアプローチでは、ProtoPNetのロジスティック回帰のような解釈可能な分類層を使用していたが、ProtoArgNetは、ある種の議論(argumentation)に基づいた解釈可能な読み取りに依拠しながら、多層パーセプトロンによって精度を向上させる。 ProtoArgNetは、多層パーセプトロン/argumentationコンポーネントのスパース化のプロセスによって、ユーザの認知的要求に合わせてカスタマイズできる。また、他のprototypical-part-learningアプローチとは対照的に、ProtoArgNetは、CNNが前段の層で認識されたパターン間の関係を捉えるのと同様に、画像内の異なる領域にある異なるprototypical-part間の空間関係を認識できる。 We propose ProtoArgNet, a novel interpretable deep neural architecture for image classification in the spirit of prototypical-part-learning as found, e.g. in ProtoPNet. While earlier approaches associate every class with multiple prototypical-parts, ProtoArgNet uses super-prototypes that combine prototypical-parts into single prototypical class representations. Furthermore, while earlier approaches use interpretable classification layers, e.g. logistic regression in ProtoPNet, ProtoArgNet improves accuracy with multi-layer perceptrons while relying upon an interpretable reading thereof based on a form of argumentation. ProtoArgNet is customisable to user cognitive requirements by a process of sparsification of the multi-layer perceptron/argumentation component. Also, as opposed to other prototypical-part-learning approaches, ProtoArgNet can recognise spatial relations between different prototypical-parts that are from different regions in images, similar to how CNNs capture relations between patterns recognized in earlier layers.	翻訳日:2023-11-28 17:42:42 公開日:2023-11-26
# 緩和された自然景観統計モデルの下での品質モデリング Quality Modeling Under A Relaxed Natural Scene Statistics Model ( http://arxiv.org/abs/2311.15437v1 ) ライセンス: Link先を確認	Abhinau K. Venkataramanan and Alan C. Bovik	(参考訳) 視覚情報忠実度 (VIF) や時空間縮小参照エントロピー差 (ST-RRED) などの情報理論的画像品質評価 (IQA) モデルは,自然景観統計 (NSS) と情報理論をシームレスに統合することで大きな成功を収めている。自然画像のウェーブレットサブバンド係数を支配するガウススケール混合(GSM)モデルがこれらのアルゴリズムの基礎となっている。しかし、通常は多数の未知の劣化のうち1つ以上によって歪められているソーシャルメディア上のユーザー生成コンテンツの爆発的増加は、単純なGSMモデルに依存するNSSベースのIQAモデルの限界を明らかにした。本稿では,多変量一般化ガウス分布(MGGD)の有用な性質を導出し,それを用いて一般化GSM(GGSM)モデルの下でのVIFの挙動を調べることにより,VIF指標を精緻化する。 Information-theoretic image quality assessment (IQA) models such as Visual Information Fidelity (VIF) and Spatio-temporal Reduced Reference Entropic Differences (ST-RRED) have enjoyed great success by seamlessly integrating natural scene statistics (NSS) with information theory. The Gaussian Scale Mixture (GSM) model that governs the wavelet subband coefficients of natural images forms the foundation for these algorithms. However, the explosion of user-generated content on social media, which is typically distorted by one or more of many possible unknown impairments, has revealed the limitations of NSS-based IQA models that rely on the simple GSM model. Here, we seek to elaborate the VIF index by deriving useful properties of the Multivariate Generalized Gaussian Distribution (MGGD), and using them to study the behavior of VIF under a Generalized GSM (GGSM) model.	翻訳日:2023-11-28 17:42:20 公開日:2023-11-26
# 言語モデリングのためのスキップ学習 Learning to Skip for Language Modeling ( http://arxiv.org/abs/2311.15436v1 ) ライセンス: Link先を確認	Dewen Zeng, Nan Du, Tao Wang, Yuanzhong Xu, Tao Lei, Zhifeng Chen, Claire Cui	(参考訳) 過パラメータ化された大規模言語モデルは、文脈内数ショット学習の顕著な一般化性能を有する。しかし、ほとんどの言語モデルは、入力データの複雑さや重要性を無視して、各トークンに同じ量のパラメータや計算を割り当てている。言語モデルの事前訓練では、異なるトークンに可変量の計算を割り当てるべきであり、これは単純なルーティング機構によって効率的に実現できると論じる。トークンが初期レイヤのみの早期終了が可能な従来の早期停止技術とは異なり,バイナリルータを用いた任意の入力トークンに対するレイヤ(あるいはモジュール)の実行を動的にスキップする,より一般的な方法を提案する。提案手法は, 24 個の NLP タスクにまたがる広範囲な評価において, 提案手法は, 推論に軽度な余剰コストでのみ, 他の競合ベースラインと比較して1ショット性能を著しく向上させることができることを示した。 Overparameterized large-scale language models have impressive generalization performance of in-context few-shot learning. However, most language models allocate the same amount of parameters or computation to each token, disregarding the complexity or importance of the input data. We argue that in language model pretraining, a variable amount of computation should be assigned to different tokens, and this can be efficiently achieved via a simple routing mechanism. Different from conventional early stopping techniques where tokens can early exit at only early layers, we propose a more general method that dynamically skips the execution of a layer (or module) for any input token with a binary router. In our extensive evaluation across 24 NLP tasks, we demonstrate that the proposed method can significantly improve the 1-shot performance compared to other competitive baselines only at mild extra cost for inference.	翻訳日:2023-11-28 17:42:02 公開日:2023-11-26
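The per-token layer skipping described in the "Learning to Skip" abstract can be sketched as a loop in which a binary router decides, for each token and each layer, whether the layer runs. This is a toy illustration under stated assumptions: in the paper the routers are learned, while here they are hand-written predicates, and `run_with_skipping` is a hypothetical name.

```python
def run_with_skipping(tokens, layers, routers):
    """Apply each layer to a token only when that layer's router says so.

    tokens:  list of token states (plain floats in this toy example)
    layers:  list of callables, state -> state
    routers: list of callables, state -> bool (True = execute the layer)
             Hand-written stand-ins for what would be learned routers.
    Returns the final states and the number of layer executions performed.
    """
    outputs = []
    executed = 0
    for token in tokens:
        state = token
        for layer, router in zip(layers, routers):
            if router(state):
                state = layer(state)
                executed += 1
            # Otherwise the layer is skipped and the state passes through unchanged.
        outputs.append(state)
    return outputs, executed


layers = [lambda x: x + 1.0, lambda x: x * 2.0]
routers = [lambda x: x < 5.0, lambda x: True]
outputs, executed = run_with_skipping([1.0, 10.0], layers, routers)
```

Unlike early exit, where a token leaves the stack for good at some depth, a skipped token here can still be processed by later layers: the second token bypasses the first layer but runs the second.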
# 関数拡散 Functional Diffusion ( http://arxiv.org/abs/2311.15435v1 ) ライセンス: Link先を確認	Biao Zhang and Peter Wonka	(参考訳) 本稿では,関数拡散と呼ばれる新しいクラスの生成拡散モデルを提案する。以前の研究とは対照的に、関数拡散は連続領域を持つ関数で表されるサンプルに作用する。関数拡散は古典的拡散モデルの無限次元領域への拡張と見なすことができる。関数拡散は、画像、ビデオ、オーディオ、3D形状、変形などを最小限の変更で同じフレームワークで処理できるため、非常に汎用性が高い。さらに、関数拡散は不規則なデータや非標準領域で定義されたデータに特に適している。本研究では,関数拡散に必要な基礎を導出し,トランスフォーマーアーキテクチャに基づく最初の実装を提案する。 3次元面上で定義される複雑な符号付き距離関数と変形関数に対する生成結果を示す。 We propose a new class of generative diffusion models, called functional diffusion. In contrast to previous work, functional diffusion works on samples that are represented by functions with a continuous domain. Functional diffusion can be seen as an extension of classical diffusion models to an infinite-dimensional domain. Functional diffusion is very versatile as images, videos, audio, 3D shapes, deformations, etc., can be handled by the same framework with minimal changes. In addition, functional diffusion is especially suited for irregular data or data defined in non-standard domains. In our work, we derive the necessary foundations for functional diffusion and propose a first implementation based on the transformer architecture. We show generative results on complicated signed distance functions and deformation functions defined on 3D surfaces.	翻訳日:2023-11-28 17:41:49 公開日:2023-11-26
# デジタル性・生殖健康における集団プライバシの重要性 The Importance of Collective Privacy in Digital Sexual and Reproductive Health ( http://arxiv.org/abs/2311.15432v1 ) ライセンス: Link先を確認	Teresa Almeida, Maryam Mehrnezhad, Stephen Cook	(参考訳) デジタル性と生殖の健康技術は豊富にあり、その潜在的な機密データ漏洩に関する懸念を示している。我々は15のIoTデバイスを性的および生殖的追跡サービスで分析し、この絶え間なく続くデータの収集が、パートナー、子、家族を含む個人以上の多くの意味を持つことがわかった。結果は、デジタル性的および生殖的健康データプライバシーは個人的および集団的努力であることを示している。 There is an abundance of digital sexual and reproductive health technologies that presents a concern regarding their potential sensitive data breaches. We analyzed 15 Internet of Things (IoT) devices with sexual and reproductive tracking services and found this ever-extending collection of data implicates many beyond the individual including partner, child, and family. Results suggest that digital sexual and reproductive health data privacy is both an individual and collective endeavor.	翻訳日:2023-11-28 17:41:39 公開日:2023-11-26
# ディープラーニングを用いた機械生成テキストの検出 Machine-Generated Text Detection using Deep Learning ( http://arxiv.org/abs/2311.15425v1 ) ライセンス: Link先を確認	Raghav Gaggar, Ashish Bhagchandani, Harsh Oza	(参考訳) 本研究では,大規模言語モデル (LLM) が生成するテキストを人間の生成したテキストから識別するという重要な課題に焦点を当てる。この課題は様々な応用において重要である。このような機能を持つモデルの実現に関する議論が進行中であることを踏まえ,そのようなモデルの実現可能性を裏付ける証拠を提示する。我々は,Twitter Sentiment, Football Commentary, Project Gutenberg, PubMedQA, SQuADなど,複数のデータセットでモデルを評価し,強化した検出手法の有効性を確認した。これらのデータセットは、あらゆる可能性を含む複雑な制約の下でサンプリングされ、将来の研究の基礎となる。 GPT-3.5-TurboをSVM,RoBERTa-base,RoBERTa-largeなどの各種検出器に対して評価した。研究結果によれば、結果は主に文のシーケンス長に依存していた。 Our research focuses on the crucial challenge of discerning text produced by Large Language Models (LLMs) from human-generated text, which holds significance for various applications. With ongoing discussions about attaining a model with such functionality, we present supporting evidence regarding the feasibility of such models. We evaluated our models on multiple datasets, including Twitter Sentiment, Football Commentary, Project Gutenberg, PubMedQA, and SQuAD, confirming the efficacy of the enhanced detection approaches. These datasets were sampled with intricate constraints encompassing every possibility, laying the foundation for future research. We evaluate GPT-3.5-Turbo against various detectors such as SVM, RoBERTa-base, and RoBERTa-large. Based on the research findings, the results predominantly relied on the sequence length of the sentence.	翻訳日:2023-11-28 17:41:31 公開日:2023-11-26
# Wired Perspectives:マルチビューのワイヤーアートが生成AIを取り入れる Wired Perspectives: Multi-View Wire Art Embraces Generative AI ( http://arxiv.org/abs/2311.15421v1 ) ライセンス: Link先を確認	Zhiyu Qu and Lan Yang and Honggang Zhang and Tao Xiang and Kaiyue Pang and Yi-Zhe Song	(参考訳) 異なる視点から様々な解釈ができる静的な3D彫刻である多視点ワイヤーアート(MVWA、multi-view wire art)の制作は、熟練したアーティストにとっても複雑な作業である。そこで我々は,誰もがMVWAを容易に作成できるAIシステムDreamWireを紹介する。ユーザーはテキストプロンプトやスクリブルを通じてビジョンを表現でき、複雑な3Dワイヤー構成の作業から解放される。提案手法は,3次元Bézier曲線,Primのアルゴリズム,および拡散モデルあるいはそれらの変種(例えばControlNet)からの知識蒸留を組み合わせる。この組み合わせにより、システムは3Dワイヤーアートを表現でき、空間的連続性を確保し、データの不足を克服することができる。本システムの内部動作について洞察を得るため,接続性と視覚美学のトレードオフを含む広範な評価と分析を行った。 Creating multi-view wire art (MVWA), a static 3D sculpture with diverse interpretations from different viewpoints, is a complex task even for skilled artists. In response, we present DreamWire, an AI system enabling everyone to craft MVWA easily. Users express their vision through text prompts or scribbles, freeing them from intricate 3D wire organisation. Our approach synergises 3D Bézier curves, Prim's algorithm, and knowledge distillation from diffusion models or their variants (e.g., ControlNet). This blend enables the system to represent 3D wire art, ensuring spatial continuity and overcoming data scarcity. Extensive evaluation and analysis are conducted to shed insight on the inner workings of the proposed system, including the trade-off between connectivity and visual aesthetics.	翻訳日:2023-11-28 17:41:17 公開日:2023-11-26
# MCReSANetを用いた低電圧グリッドにおける高調波電流発生のためのデータ駆動モデリングと解釈可能性解析 Data-Driven Modelling for Harmonic Current Emission in Low-Voltage Grid Using MCReSANet with Interpretability Analysis ( http://arxiv.org/abs/2311.15420v1 ) ライセンス: Link先を確認	Jieyu Yao, Hao Yu, Paul Judge, Jiabin Jia, Sasa Djokic, Verner Püvi, Matti Lehtonen, Jan Meyer	(参考訳) パワーエレクトロニクス(PE)負荷の使用は電力変換効率と制御性を向上させるが、依然としてグリッドにおける高調波の主要な発生源である。配電系統に多様な負荷が接続されると、それらの相互作用により、高調波電圧と電流の関係に関する解析モデルの確立が複雑になる。そこで本論文では,MCReSANetを用いて高調波電圧と電流の間の高度に非線形な写像を構築するデータ駆動モデルを提案する。フィンランドとドイツのPCCから得られた2つのデータセットを用いて、選択したフィンランドとドイツのデータセットに様々なネットワーク特性が存在する場合でも、MCReSANetが正確な非線形写像を確立できることを実証する。 MCReSANetが構築したモデルは、フィンランドとドイツの両データセットについて、CNNと比較してMAEを10%と14%、MLPと比較して8%と17%改善し、モデルの不確実性も他のモデルよりはるかに低い。これは,本論文におけるモデル解釈可能性解析の手法である、SHAP値に基づくより正確な特徴重要度解析のための重要な前提条件である。特徴重要度解析の結果は,配電系統における高調波電圧と電流の各次数間の詳細な関係を示す。各次数の高調波電流には相互作用的な影響があるが、一部の次数の高調波電圧が高調波電流放出に支配的な影響を与えている: 正相および零相の高調波が、それぞれフィンランドとドイツのネットワークにおいて支配的な重要性を持ち、これは選択した2つのフィンランドとドイツのデータセットにおける接続負荷タイプのパターンと一致している。本論文は,配電系統における多様なPE負荷による高調波電流放出の理解と予測の可能性を高めるものであり,多様なグリッド環境における電力品質最適化のためのより効果的な管理に有益である。 Even though the use of power electronics (PE) loads offers enhanced electrical energy conversion efficiency and control, they remain the primary sources of harmonics in grids. When diverse loads are connected in the distribution system, their interactions complicate establishing analytical models for the relationship between harmonic voltages and currents. To solve this, our paper presents a data-driven model using MCReSANet to construct the highly nonlinear mapping between harmonic voltage and current. Two datasets from PCCs in Finland and Germany are utilized, which demonstrates that MCReSANet is capable of establishing accurate nonlinear mappings, even in the presence of various network characteristics for selected Finland and Germany datasets. The model built by MCReSANet can improve the MAE by 10% and 14% compared to the CNN, and by 8% and 17% compared to the MLP for both Finnish and German datasets, also showing much lower model uncertainty than others. This is a crucial prerequisite for more precise SHAP value-based feature importance analysis, which is a method for the model interpretability analysis in this paper. The results by feature importance analysis show the detailed relationships between each order of harmonic voltage and current in the distribution system. There is an interactive impact on each order of harmonic current, but some orders of harmonic voltages have a dominant influence on harmonic current emissions: positive sequence and zero sequence harmonics have the dominant importance in the Finnish and German networks, respectively, which conforms to the pattern of connected load types in two selected Finnish and German datasets. This paper enhances the potential for understanding and predicting harmonic current emissions by diverse PE loads in distribution systems, which is beneficial to more effective management for optimizing power quality in diverse grid environments.	翻訳日:2023-11-28 17:41:02 公開日:2023-11-26
# 行列と線形写像のフロベニウス型ノルムと内積とニューラルネットワークトレーニングへの応用 Frobenius-Type Norms and Inner Products of Matrices and Linear Maps with Applications to Neural Network Training ( http://arxiv.org/abs/2311.15419v1 ) ライセンス: Link先を確認	Roland Herzog and Frederik Köhne and Leonie Kreis and Anton Schiela	(参考訳) フロベニウスノルムは行列のノルムとしてよく選ばれる。特に、その基盤となるフロベニウス内積は、ニューラルネットワークのトレーニングで現れるような行列変数に関する目的関数の勾配を評価するために一般的に用いられる。我々は、線形写像や行列に対するフロベニウスノルムと内積についてより広い視点を提供し、それらが定義域空間と終域空間の内積に依存することを示す。これは、古典的なフロベニウスノルムが、より一般的なフロベニウス型ノルムの族に属する1つの特別な要素にすぎないことを示している。この認識によって得られる大きな追加の自由度は、とりわけ、ニューラルネットワークトレーニングの前処理(preconditioning)に利用できる。 The Frobenius norm is a frequent choice of norm for matrices. In particular, the underlying Frobenius inner product is typically used to evaluate the gradient of an objective with respect to matrix variables, such as those occurring in the training of neural networks. We provide a broader view on the Frobenius norm and inner product for linear maps or matrices, and establish their dependence on inner products in the domain and co-domain spaces. This shows that the classical Frobenius norm is merely one special element of a family of more general Frobenius-type norms. The significant extra freedom furnished by this realization can be used, among other things, to precondition neural network training.	翻訳日:2023-11-28 17:40:30 公開日:2023-11-26
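The dependence on domain and co-domain inner products mentioned in the Frobenius abstract can be written out explicitly in finite dimensions. The block below is the standard Hilbert-Schmidt construction, given as a sketch rather than in the paper's own notation: the symbols $M_X$ and $M_Y$ (symmetric positive definite matrices that define the inner products $\langle x, x' \rangle_X = x^\top M_X x'$ and $\langle y, y' \rangle_Y = y^\top M_Y y'$) are assumptions introduced here for illustration.

```latex
% Classical Frobenius inner product and norm of A, B \in \mathbb{R}^{m \times n}
\langle A, B \rangle_F = \operatorname{tr}(A^\top B),
\qquad
\|A\|_F^2 = \operatorname{tr}(A^\top A).

% Adjoint of A : (\mathbb{R}^n, M_X) \to (\mathbb{R}^m, M_Y),
% defined by \langle A x, y \rangle_Y = \langle x, A^* y \rangle_X
A^* = M_X^{-1} A^\top M_Y.

% Frobenius-type (Hilbert-Schmidt) inner product induced by M_X and M_Y
\langle A, B \rangle_{M_X, M_Y}
  = \operatorname{tr}(A^* B)
  = \operatorname{tr}\!\left( M_X^{-1} A^\top M_Y B \right).
```

Choosing $M_X = I$ and $M_Y = I$ recovers the classical Frobenius inner product, which matches the abstract's point that the classical norm is one member of a larger family. Because a gradient is defined relative to an inner product, other choices of $M_X$ and $M_Y$ change the gradient direction, which is where the preconditioning freedom comes from.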
# GANに基づくLiDAR強度シミュレーション GAN-Based LiDAR Intensity Simulation ( http://arxiv.org/abs/2311.15415v1 ) ライセンス: Link先を確認	Richard Marcus, Felix Gabel, Niklas Knoop and Marc Stamminger	(参考訳) 現実の車両センサシミュレーションは、自動運転を開発する上で重要な要素である。物理ベースのLiDARのような視覚センサーの実装は実際は複雑であるため、データベースのアプローチはソリューションを約束する。実際のテストドライブからのカメラ画像とLiDARスキャンを使って、GANはそれらの間の翻訳を訓練することができる。このプロセスには2つの追加点がある。まず、カメラ画像を利用して、セグメンテーションデータと深度マップをトレーニング用追加入力として取得する。第2に,物体検出ネットワークが実点群と合成点群の間でどのように一般化し,真理点群を含まない評価を可能にするかを検証することで,LiDARシミュレーションの性能を検証した。両方を組み合わせることで,LiDAR点雲をシミュレートし,現実性を実証する。 Realistic vehicle sensor simulation is an important element in developing autonomous driving. As physics-based implementations of visual sensors like LiDAR are complex in practice, data-based approaches promise solutions. Using pairs of camera images and LiDAR scans from real test drives, GANs can be trained to translate between them. For this process, we contribute two additions. First, we exploit the camera images, acquiring segmentation data and dense depth maps as additional input for training. Second, we test the performance of the LiDAR simulation by testing how well an object detection network generalizes between real and synthetic point clouds to enable evaluation without ground truth point clouds. Combining both, we simulate LiDAR point clouds and demonstrate their realism.	翻訳日:2023-11-28 17:40:18 公開日:2023-11-26
# KOPPA: Key-Query Orthogonal ProjectionとプロトタイプベースのOne-Versus-AllによるPromptベースの継続的学習の改善 KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All ( http://arxiv.org/abs/2311.15414v1 ) ライセンス: Link先を確認	Quyen Tran, Lam Tran, Khoat Than, Toan Tran, Dinh Phung, Trung Le	(参考訳) 大規模言語モデルに適用された即時チューニング技術からインスピレーションを得た最近のViTネットワークは,連続学習分野において顕著な成果を上げている。具体的には、一連のプロンプトを維持し、そのサブセットをキー-クエリマッチング戦略を用いて各タスクの学習に割り当てることを提案する。しかしながら、古いタスククエリと将来のタスクのキーとの相関性、潜在空間の特徴のシフト、独立したタスクで学習された潜在ベクトルの相対的分離の制御を欠くと、制限を受ける可能性がある。本研究では,モデルに依存しないメタラーニングにインスパイアされた直交投影に基づく新しいキークエリ学習戦略を導入する。さらに,OVA(One-Versus-All)のプロトタイプベースコンポーネントを導入し,分類ヘッドの区別を強化する。ベンチマークデータを用いた実験結果から,提案手法は,現在の最先端手法を最大20%超える結果が得られることを示した。 Drawing inspiration from prompt tuning techniques applied to Large Language Models, recent methods based on pre-trained ViT networks have achieved remarkable results in the field of Continual Learning. Specifically, these approaches propose to maintain a set of prompts and allocate a subset of them to learn each task using a key-query matching strategy. However, they may encounter limitations when lacking control over the correlations between old task queries and keys of future tasks, the shift of features in the latent space, and the relative separation of latent vectors learned in independent tasks. In this work, we introduce a novel key-query learning strategy based on orthogonal projection, inspired by model-agnostic meta-learning, to enhance prompt matching efficiency and address the challenge of shifting features. Furthermore, we introduce a One-Versus-All (OVA) prototype-based component that enhances the classification head distinction. Experimental results on benchmark datasets demonstrate that our method empowers the model to achieve results surpassing those of current state-of-the-art approaches by a large margin of up to 20%.	翻訳日:2023-11-28 17:40:05 公開日:2023-11-26
# DISYRE: 教師なし異常検出のための拡散に着想を得た合成復元 DISYRE: Diffusion-Inspired SYnthetic REstoration for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2311.15453v1 ) ライセンス: Link先を確認	Sergio Naval Marimont and Matthew Baugh and Vasilis Siomos and Christos Tzelepis and Bernhard Kainz and Giacomo Tarroni	(参考訳) 教師なし異常検出(Unsupervised Anomaly Detection, UAD)技術は、異常がないことが分かっているデータセットで訓練されたモデルのみを活用し、アノテーションに頼ることなく異常を識別し、ローカライズすることを目的としている。拡散モデルは、入力$x$が所望の分布に属する確率を高めるように入力を修正することを学習する。すなわち、スコア関数 $\nabla_x \log p(x)$ をモデル化する。このようなスコア関数は、$\nabla_x \log p(x)$ 自体がピクセル単位の異常スコアであるため、UADに潜在的に関係している。しかし,拡散モデルはガウス雑音に基づく劣化過程を逆転するように訓練されており,学習したスコア関数が医学的異常に一般化する可能性は低い。本研究は, UADに関連するスコア関数の学習方法の問題に対処し, DISYRE: Diffusion-Inspired SYnthetic REstorationを提案する。拡散型パイプラインは維持するが,ガウス雑音による劣化を段階的な合成異常による劣化に置き換えることで,学習したスコア関数が医学的な自然発生異常に一般化するようにする。我々は3つの一般的な脳MRI UADベンチマークでDISYREを評価し、3つのタスクのうち2つで他の方法を大幅に上回る。 Unsupervised Anomaly Detection (UAD) techniques aim to identify and localize anomalies without relying on annotations, only leveraging a model trained on a dataset known to be free of anomalies. Diffusion models learn to modify inputs $x$ to increase the probability of it belonging to a desired distribution, i.e., they model the score function $\nabla_x \log p(x)$. Such a score function is potentially relevant for UAD, since $\nabla_x \log p(x)$ is itself a pixel-wise anomaly score. However, diffusion models are trained to invert a corruption process based on Gaussian noise and the learned score function is unlikely to generalize to medical anomalies. This work addresses the problem of how to learn a score function relevant for UAD and proposes DISYRE: Diffusion-Inspired SYnthetic REstoration. We retain the diffusion-like pipeline but replace the Gaussian noise corruption with a gradual, synthetic anomaly corruption so the learned score function generalizes to medical, naturally occurring anomalies. We evaluate DISYRE on three common Brain MRI UAD benchmarks and substantially outperform other methods in two out of the three tasks.	翻訳日:2023-11-28 17:27:34 公開日:2023-11-26
# 選択質問応答のための不確かさ認識言語モデリング Uncertainty-aware Language Modeling for Selective Question Answering ( http://arxiv.org/abs/2311.15451v1 ) ライセンス: Link先を確認	Qi Yang, Shreya Ravikumar, Fynn Schmitt-Ulms, Satvik Lolla, Ege Demir, Iaroslav Elistratov, Alex Lavaee, Sadhana Lolla, Elaheh Ahmadi, Daniela Rus, Alexander Amini, Alejandro Perez	(参考訳) 本稿では,予測毎に不確実性を推定できる不確実性認識型LLMを生成する、大規模言語モデル(LLM)の自動変換手法を提案する。我々のアプローチはモデルとデータに依存せず、計算効率が高く、外部モデルやシステムに依存しない。選択的質問応答の設定、すなわち所与の精度を維持しながら可能な限り多くの質問に答え、必要に応じて予測を控える設定で、変換されたモデルを評価する。結果の一部として,SQuAD抽出型QAタスクとTruthfulQA生成型QAタスクでBERTおよびLlama 2のモデル変種をテストした。提案手法により得られた不確実性推定値を用いて選択的に質問に答えることで,モデル確率を直接用いる場合よりも精度が著しく向上することを示す。 We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs capable of estimating uncertainty with every prediction. Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems. We evaluate converted models on the selective question answering setting -- to answer as many questions as possible while maintaining a given accuracy, forgoing providing predictions when necessary. As part of our results, we test BERT and Llama 2 model variants on the SQuAD extractive QA task and the TruthfulQA generative QA task. We show that using the uncertainty estimates provided by our approach to selectively answer questions leads to significantly higher accuracy over directly using model probabilities.	翻訳日:2023-11-28 17:27:11 公開日:2023-11-26
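The selective question answering setting in the abstract above (answer as many questions as possible at a given accuracy, abstaining otherwise) amounts to thresholding an uncertainty score and then reporting coverage and accuracy on the answered subset. The sketch below assumes each prediction already carries an uncertainty value; how that value is produced is the paper's contribution and is not reproduced here. The record layout and the name `selective_accuracy` are hypothetical.

```python
def selective_accuracy(records, max_uncertainty):
    """Answer only when uncertainty is at or below the threshold.

    records: list of dicts with keys "uncertainty" (float) and "correct" (bool);
             this layout is an illustrative assumption.
    Returns (coverage, accuracy), where accuracy is None if nothing is answered.
    """
    answered = [r for r in records if r["uncertainty"] <= max_uncertainty]
    coverage = len(answered) / len(records) if records else 0.0
    if not answered:
        return coverage, None
    accuracy = sum(1 for r in answered if r["correct"]) / len(answered)
    return coverage, accuracy


records = [
    {"uncertainty": 0.1, "correct": True},
    {"uncertainty": 0.2, "correct": True},
    {"uncertainty": 0.6, "correct": False},
    {"uncertainty": 0.9, "correct": True},
]
strict = selective_accuracy(records, 0.5)   # abstain on the two uncertain items
lenient = selective_accuracy(records, 1.0)  # answer everything
```

In this made-up data, abstaining on the two most uncertain questions halves the coverage but raises accuracy on the answered subset from 0.75 to 1.0, which is the trade-off the evaluation setting measures.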

# DonationChain: ブロックチェーンベースの寄付追跡システムのための新しいプラットフォーム

DonationChain: A New Platform for Blockchain-Based Donation-Tracking System ( http://arxiv.org/abs/2311.03573v2 )

ライセンス: Link先を確認

Chaimaa Nairi, Murtaza Cicioglu, Ali Calhan,

(参考訳) スマートコントラクトとブロックチェーン技術を使用した寄付追跡システムは、慈善寄付の追跡と管理の方法に革命をもたらす可能性がある。この記事では、スマートコントラクトとブロックチェーンを使用して、慈善寄付を追跡するための透明でセキュアな台帳を作成する方法について説明する。従来の寄付システムの限界と、ブロックチェーンベースのシステムがこれらの課題を克服する上でどのように役立つかについて議論する。スマートコントラクトがどのように機能するか、寄付追跡でどのように使用できるのか、そしてプロセスの自動化、トランザクション手数料の削減、説明責任の向上など、それらが提供するメリットについて説明する。また、ブロックチェーン技術が、透明性を高め不正防止に役立つ、分散的で改ざん耐性のある台帳をどのように提供するかについても論じる。最後に、技術的専門知識の必要性やセキュリティ侵害の可能性など、スマートコントラクトベースの寄付追跡システムを実装する際に対処しなければならない課題について検討する。全体として、スマートコントラクトとブロックチェーンを使用した寄付追跡システムは、寄付プロセスにおける信頼と説明責任を高める可能性があり、最終的には寄付が本来の目的に使われることの保証に役立つ。

A donation-tracking system using smart contracts and blockchain technology has the potential to revolutionize the way charitable giving is tracked and managed. This article explores how smart contracts and blockchain can be used to create a transparent and secure ledger for tracking charitable donations. We discuss the limitations of traditional donation systems and how a blockchain-based system can help overcome these challenges. We describe how smart contracts work, how they can be used in donation tracking, and the benefits they offer, including automated processes, reduced transaction fees, and increased accountability. We also discuss how blockchain technology provides a decentralized and tamper-proof ledger that can increase transparency and help prevent fraud. Finally, we examine some of the challenges that must be addressed when implementing a smart contract-based donation tracking system, such as the need for technical expertise and the potential for security breaches. Overall, a donation-tracking system using smart contracts and blockchain has the potential to increase trust and accountability in the donation process, which can ultimately help ensure that donations are used for their intended purposes.

翻訳日:2024-03-25 13:36:10 公開日:2023-11-26

# Make Them Change It Every Week!:A Qualitative Exploration of Online Developer Advice on Usable and Secure Authentication

"Make Them Change it Every Week!": A Qualitative Exploration of Online Developer Advice on Usable and Secure Authentication ( http://arxiv.org/abs/2309.00744v2 )

ライセンス: Link先を確認

Jan H. Klemmer, Marco Gutfleisch, Christian Stransky, Yasemin Acar, M. Angela Sasse, Sascha Fahl,

(参考訳) ウェブやその他の場における使いやすく安全な認証は、極めて重要である。パスワードベースの認証はまだ普及しているが、ユーザーは数百にも及びうるオンラインアカウントとそのパスワードを扱うのに苦労している。多要素認証のような代替や拡張には独自の課題があり、限定的にしか採用されていない。セキュリティとユーザビリティの適切なバランスを見つけることは、開発者にとって難しい。以前の研究では、開発者はコードを書く際のセキュリティ上の決定にオンラインリソースを利用していることが分かっている。他の分野と同様、ブログ記事、Stack Overflowでの議論、研究論文、OWASPやNISTといった機関によるガイドラインなど、開発者向けの認証アドバイスがオンラインで多数公開されている。我々は、エンドユーザの使いやすいセキュリティに影響を及ぼす認証に関する開発者向けアドバイスを初めて調査する。 18名のプロのWeb開発者を対象とした調査に基づき、406件の文書を取得し、そこに含まれる272件のアドバイスを質的に詳細に分析した。我々は、オンラインアドバイスのアクセシビリティと品質を理解し、オンラインアドバイスが(非)安全で使いやすい(使いにくい)認証にどのように寄与しうるかについての洞察を提供することを目指している。アドバイスは散在しており、推奨できる一貫したアドバイスを見つけることは開発者にとって難しいことなどが分かった。最も一般的なアドバイスはパスワードベースの認証に関するものであり、より現代的な代替手段に関するものはほとんどない。残念ながら、多くのアドバイスは議論の余地がある(複雑なパスワードポリシーなど)、時代遅れである(定期的なパスワード変更の強制など)、あるいは矛盾しており、使いにくい、または安全でない認証につながる可能性がある。調査の結果に基づき、開発者、アドバイス提供者、公的機関、学界に対して、開発者向けオンラインアドバイスを改善する方法を提言する。

Usable and secure authentication on the web and beyond is mission-critical. While password-based authentication is still widespread, users have trouble dealing with potentially hundreds of online accounts and their passwords. Alternatives or extensions such as multi-factor authentication have their own challenges and find only limited adoption. Finding the right balance between security and usability is challenging for developers. Previous work found that developers use online resources to inform security decisions when writing code. Similar to other areas, lots of authentication advice for developers is available online, including blog posts, discussions on Stack Overflow, research papers, or guidelines by institutions like OWASP or NIST. We are the first to explore developer advice on authentication that affects usable security for end-users. Based on a survey with 18 professional web developers, we obtained 406 documents and qualitatively analyzed 272 contained pieces of advice in depth. We aim to understand the accessibility and quality of online advice and provide insights into how online advice might contribute to (in)secure and (un)usable authentication. We find that advice is scattered and that finding recommendable, consistent advice is a challenge for developers, among others. The most common advice is for password-based authentication, but little for more modern alternatives. Unfortunately, many pieces of advice are debatable (e.g., complex password policies), outdated (e.g., enforcing regular password changes), or contradicting and might lead to unusable or insecure authentication. Based on our findings, we make recommendations for developers, advice providers, official institutions, and academia on how to improve online advice for developers.

翻訳日:2024-03-19 06:53:05 公開日:2023-11-26

# IoTエコシステムの脅威とアクセス制御のソリューションとしてのブロックチェーンの課題:調査

Challenges in Blockchain as a Solution for IoT Ecosystem Threats and Access Control: A Survey ( http://arxiv.org/abs/2311.15290v1 )

ライセンス: Link先を確認

Suranjeet Chowdhury Avik, Sujit Biswas, Md Atiqur Rahaman Ahad, Zohaib Latif, Abdullah Alghamdi, Hamad Abosaq, Anupam Kumar Bairagi,

(参考訳) IoT(Internet of Things)は,私たちの日常生活のさまざまな側面に影響を与え,変革しています。一般的な信念とは対照的に、消費者や自動化システムからのデータ収集に使われるため、セキュリティやプライバシーの問題を提起する。集中制御システムのような問題やブロックチェーンとの統合のような潜在的な代替案について議論する記事が多数掲載されている。最近の調査ではIoTエコシステムが直面する課題とソリューションに焦点が当てられているが、そのほとんどが脅威や困難、ブロックチェーンベースのソリューションに集中していない。さらに、ブロックチェーンやIoT統合の課題やアタックにも焦点を絞ったものはありません。 IoTエコシステムの文脈では、全体的なセキュリティ対策は、全体的な課題を理解する上で非常に重要です。この記事では、最近の多くの記事で概説された困難を要約し、ブロックチェーンベースのソリューションなど、さまざまなアプローチにおけるさまざまな攻撃とセキュリティ上の課題を詳述する。より明確に言えば、このコントリビューションは脅威、アクセス制御の問題、簡潔な修正を集約する。さらに、この研究はパブリックブロックチェーンプロトコルに対するいくつかの攻撃をリストアップしており、研究者がIoTユースケースの予防措置を取るための実例をいくつか挙げている。最後に、今後の研究方向性は、現代の研究貢献を分析して研究ギャップを終わらせるものである。

The Internet of Things (IoT) is increasingly influencing and transforming various aspects of our daily lives. Contrary to popular belief, it raises security and privacy issues as it is used to collect data from consumers or automated systems. Numerous articles are published that discuss issues like centralised control systems and potential alternatives like integration with blockchain. Although a few recent surveys focused on the challenges and solutions facing the IoT ecosystem, most of them did not concentrate on the threats, difficulties, or blockchain-based solutions. Additionally, none of them focused on blockchain and IoT integration challenges and attacks. In the context of the IoT ecosystem, overall security measures are very important to understand the overall challenges. This article summarises difficulties that have been outlined in numerous recent articles and articulates various attacks and security challenges in a variety of approaches, including blockchain-based solutions and so on. More clearly, this contribution consolidates threats, access control issues, and remedies in brief. In addition, this research has listed some attacks on public blockchain protocols with some real-life examples that can guide researchers in taking preventive measures for IoT use cases. Finally, a future research direction concludes the research gaps by analysing contemporary research contributions.

Translation date: 2024-03-18 15:51:52. Publication date: 2023-11-26.

# Understanding the Utilization of Cryptocurrency in the Metaverse and Security Implications

http://arxiv.org/abs/2311.15360v1

License: see the linked page

Ayodeji Adeniran, Mohammed Alkinoon, David Mohaisen

We present our results on analyzing and understanding the behavior and security of various metaverse platforms incorporating cryptocurrencies. We obtained the top metaverse coins with a capitalization of at least 25 million US dollars and the top metaverse domains for the coins, and augmented our data with name registration information (via whois), including the hosting DNS IP addresses, registrant location, registrar URL, DNS service provider, and expiry date. We also checked each metaverse website for information on fiat currency for cryptocurrency. The results from virustotal.com include the communication files, passive DNS, referrer files, and malicious detections for each metaverse domain. Among other insights, we discovered various incidents of malicious detection associated with metaverse websites. Our analysis highlights indicators of (in)security, in the correlation sense, with the files and other attributes that are potentially responsible for the malicious activities.

Translation date: 2024-03-18 15:51:52. Publication date: 2023-11-26.

# The Infrastructure Utilization of Free Contents Websites Reveal their Security Characteristics

http://arxiv.org/abs/2311.15363v1

License: see the linked page

Mohamed Alqadhi, David Mohaisen

Free Content Websites (FCWs) are a significant element of the Web, and understanding their use is essential. This study analyzes FCWs worldwide by studying how they correlate with different network sizes, cloud service providers, and countries, depending on the type of content they offer. Additionally, we compare these findings with those of premium content websites (PCWs). Our analysis concluded that FCWs correlate mainly with networks of medium size, which are associated with a higher concentration of malicious websites. Moreover, we found a strong correlation between PCWs, cloud, and country hosting patterns. At the same time, some correlations were also observed for FCWs, but the patterns differ between the two types. Our investigation contributes to comprehending the FCW ecosystem through correlation analysis, and the indicative results point toward controlling the potential risks caused by these sites through adequate segregation and filtering due to their concentration.

Translation date: 2024-03-18 15:42:08. Publication date: 2023-11-26.

# Enhancing Trajectory Prediction through Self-Supervised Waypoint Noise Prediction

http://arxiv.org/abs/2312.09466v1

License: see the linked page

Pranav Singh Chib, Pravendra Singh

Trajectory prediction is an important task that involves modeling the indeterminate nature of traffic actors to forecast future trajectories given the observed trajectory sequences. However, current methods confine themselves to presumed data manifolds, assuming that trajectories strictly adhere to these manifolds, resulting in overly simplified predictions. To this end, we propose a novel approach called SSWNP (Self-Supervised Waypoint Noise Prediction). In our approach, we first create clean and noise-augmented views of past observed trajectories across the spatial domain of waypoints. We then compel the trajectory prediction model to maintain spatial consistency between predictions from these two views, in addition to the trajectory prediction task. Introducing the noise-augmented view mitigates the model's reliance on a narrow interpretation of the data manifold, enabling it to learn more plausible and diverse representations. We also predict the noise present in the two views of past observed trajectories as an auxiliary self-supervised task, enhancing the model's understanding of the underlying representation and future predictions. Empirical evidence demonstrates that the incorporation of SSWNP into the model learning process significantly improves performance, even in noisy environments, when compared to baseline methods. Our approach can complement existing trajectory prediction methods. To showcase the effectiveness of our approach, we conducted extensive experiments on three datasets: NBA Sports VU, ETH-UCY, and TrajNet++, with experimental results highlighting the substantial improvement achieved in trajectory prediction tasks.
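The two-view idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian noise model, the `sigma` value, the constant-velocity `predict` stand-in for a learned model, and the equal weighting of the three loss terms are all assumptions made for illustration.

```python
import random

def make_views(past, sigma=0.1, rng=None):
    """Return (clean view, noise-augmented view, added noise) for (x, y) waypoints."""
    rng = rng or random.Random(0)
    noise = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in past]
    noisy = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(past, noise)]
    return list(past), noisy, noise

def predict(past, horizon):
    """Hypothetical stand-in predictor: constant-velocity extrapolation."""
    (x0, y0), (x1, y1) = past[-2], past[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]

def mean_sq_dist(a, b):
    """Mean squared Euclidean distance between two equal-length point lists."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def sswnp_style_loss(past, future, noise_estimate, sigma=0.1, rng=None):
    """Prediction loss on both views + consistency between views + noise-prediction loss."""
    clean, noisy, noise = make_views(past, sigma, rng)
    pred_clean = predict(clean, len(future))
    pred_noisy = predict(noisy, len(future))
    task = mean_sq_dist(pred_clean, future) + mean_sq_dist(pred_noisy, future)
    consistency = mean_sq_dist(pred_clean, pred_noisy)  # spatial consistency term
    noise_loss = mean_sq_dist(noise_estimate, noise)    # auxiliary self-supervised term
    return task + consistency + noise_loss
```

In a real model, `noise_estimate` would come from a learned head rather than being passed in; here it is an argument so the three terms can be inspected in isolation.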

Translation date: 2024-01-15 14:13:37. Publication date: 2023-11-26.

# AI-driven E-Liability Knowledge Graphs: A Comprehensive Framework for Supply Chain Carbon Accounting and Emissions Liability Management

http://arxiv.org/abs/2312.00045v1

License: see the linked page

Olamide Oladeji, Seyed Shahabeddin Mousavi, Marc Roston

While carbon accounting plays a fundamental role in our fight against climate change, it is not without its challenges. We begin the paper with a critique of the conventional carbon accounting practices, after which we proceed to introduce the E-liability carbon accounting methodology and Emissions Liability Management (ELM) originally proposed by Kaplan and Ramanna, highlighting their strengths. Recognizing the immense value of this novel approach for real-world carbon accounting improvement, we introduce a novel data-driven integrative framework that leverages AI and computation - the E-Liability Knowledge Graph framework - to achieve real-world implementation of the E-liability carbon accounting methodology. In addition to providing a path-to-implementation, our proposed framework brings clarity to the complex environmental interactions within supply chains, thus enabling better informed and more responsible decision-making. We analyze the implementation aspects of this framework and conclude with a discourse on the role of this AI-aided knowledge graph in ensuring the transparency and decarbonization of global supply chains.

Translation date: 2023-12-11 03:59:47. Publication date: 2023-11-26.

# Advancing AI Audits for Enhanced AI Governance

http://arxiv.org/abs/2312.00044v1

License: see the linked page

Arisa Ema, Ryo Sato, Tomoharu Hase, Masafumi Nakano, Shinji Kamimura, Hiromu Kitamura

As artificial intelligence (AI) is integrated into various services and systems in society, many companies and organizations have proposed AI principles and policies and made related commitments. Conversely, some have proposed the need for independent audits, arguing that the voluntary principles adopted by the developers and providers of AI services and systems insufficiently address risk. This policy recommendation summarizes the issues related to the auditing of AI services and systems and presents three recommendations for promoting AI auditing that contributes to sound AI governance. Recommendation 1: Development of institutional design for AI audits. Recommendation 2: Training human resources for AI audits. Recommendation 3: Updating AI audits in accordance with technological progress. In this policy recommendation, AI is assumed to be that which recognizes and predicts data, with the last chapter outlining how generative AI should be audited.

Translation date: 2023-12-11 03:59:27. Publication date: 2023-11-26.

# Transforming organic chemistry research paradigms: moving from manual efforts to the intersection of automation and artificial intelligence

http://arxiv.org/abs/2312.00808v1

License: see the linked page

Chengchun Liu, Yuntian Chen, Fanyang Mo

Organic chemistry is undergoing a major paradigm shift, moving from a labor-intensive approach to a new era dominated by automation and artificial intelligence (AI). This transformative shift is being driven by technological advances, the ever-increasing demand for greater research efficiency and accuracy, and the burgeoning growth of interdisciplinary research. AI models, supported by computational power and algorithms, are drastically reshaping synthetic planning and introducing groundbreaking ways to tackle complex molecular synthesis. In addition, autonomous robotic systems are rapidly accelerating the pace of discovery by performing tedious tasks with unprecedented speed and precision. This article examines the multiple opportunities and challenges presented by this paradigm shift and explores its far-reaching implications. It provides valuable insights into the future trajectory of organic chemistry research, which is increasingly defined by the synergistic interaction of automation and AI.

Translation date: 2023-12-11 03:31:10. Publication date: 2023-11-26.

# Leveraging AI-derived Data for Carbon Accounting: Information Extraction from Alternative Sources

http://arxiv.org/abs/2312.03722v1

License: see the linked page

Olamide Oladeji, Seyed Shahabeddin Mousavi

Carbon accounting is a fundamental building block in our global path to emissions reduction and decarbonization, yet many challenges exist in achieving reliable and trusted carbon accounting measures. We argue that carbon accounting not only needs to be more data-driven, but also more methodologically sound. We discuss the need for alternative, more diverse data sources that can play a significant role on our path to trusted carbon accounting procedures, and elaborate on not only why, but how Artificial Intelligence (AI) in general and Natural Language Processing (NLP) in particular can unlock reasonable access to a treasure trove of alternative data sets, in light of recent advances in the field that better enable the utilization of unstructured data in this process. We present a case study of the recent developments on real-world data via an NLP-powered analysis using OpenAI's GPT API on financial and shipping data. We conclude the paper with a discussion on how these methods and approaches can be integrated into a broader framework for AI-enabled integrative carbon accounting.

Translation date: 2023-12-11 03:22:33. Publication date: 2023-11-26.

# Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability

http://arxiv.org/abs/2312.03721v1

License: see the linked page

Simon Lermen, Ondřej Kvapil

There has been increasing interest in evaluations of language models for a variety of risks and characteristics. Evaluations relying on natural language understanding for grading can often be performed at scale by using other language models. We test the robustness of these model-graded evaluations to injections on different datasets including a new Deception Eval. These injections resemble direct communication between the testee and the evaluator to change their grading. We extrapolate that future, more intelligent models might manipulate or cooperate with their evaluation model. We find significant susceptibility to these injections in state-of-the-art commercial models on all examined evaluations. Furthermore, similar injections can be used on automated interpretability frameworks to produce misleading model-written explanations. The results inspire future work and should caution against unqualified trust in evaluations and automated interpretability.

Translation date: 2023-12-11 03:22:13. Publication date: 2023-11-26.

# Negotiating with LLMS: Prompt Hacks, Skill Gaps, and Reasoning Deficits

http://arxiv.org/abs/2312.03720v1

License: see the linked page

Johannes Schneider, Steffi Haag, Leona Chandra Kruse

Large language models (LLMs) like ChatGPT have reached the 100 million user mark in record time and might increasingly enter all areas of our lives, leading to a diverse set of interactions between these artificial intelligence models and humans. While many studies have discussed governance and regulations deductively from first-order principles, few studies provide an inductive, data-driven lens based on observing dialogues between humans and LLMs, especially when it comes to non-collaborative, competitive situations that have the potential to pose a serious threat to people. In this work, we conduct a user study engaging over 40 individuals across all age groups in price negotiations with an LLM. We explore how people interact with an LLM, investigating differences in negotiation outcomes and strategies. Furthermore, we highlight shortcomings of LLMs with respect to their reasoning capabilities and, in turn, their susceptibility to prompt hacking, which aims to manipulate the LLM into making agreements that are against its instructions or beyond any rationality. We also show that the negotiated prices humans manage to achieve span a broad range, which points to a literacy gap in effectively interacting with LLMs.

Translation date: 2023-12-11 03:22:02. Publication date: 2023-11-26.

# Assessing AI Chatbots Performance in Comprehensive Standardized Test Preparation; A Case Study with GRE

http://arxiv.org/abs/2312.03719v1

License: see the linked page

Mohammad Abu-Haifa, Bara'a Etawi, Huthaifa Alkhatatbeh, Ayman Ababneh

This research paper presents a comprehensive evaluation of the performance of three artificial intelligence chatbots (Bing, ChatGPT, and GPT-4) in addressing standardized test questions. The Graduate Record Examination (GRE) serves as a case study in this paper, encompassing both quantitative reasoning and verbal skills. A total of 137 quantitative reasoning questions featuring diverse styles and 157 verbal questions categorized into varying levels of difficulty (easy, medium, and hard) were administered to assess the chatbots' capabilities. This paper provides a detailed examination of the results and their implications for the utilization of artificial intelligence in standardized test preparation by presenting the performance of each chatbot across various skills and styles tested in the exam. Additionally, this paper explores the proficiency of artificial intelligence in addressing image-based questions and illustrates the uncertainty level of each chatbot. The results reveal varying degrees of success across the chatbots, demonstrating the influence of model sophistication and training data. GPT-4 emerged as the most proficient, especially in complex language understanding tasks, highlighting the evolution of artificial intelligence in language comprehension and its ability to pass the exam with a high score.

Translation date: 2023-12-11 03:21:42. Publication date: 2023-11-26.

# Large Language Models in Law: A Survey

http://arxiv.org/abs/2312.03718v1

License: see the linked page

Jinqi Lai, Wensheng Gan, Jiayang Wu, Zhenlian Qi, Philip S. Yu

The advent of artificial intelligence (AI) has significantly impacted the traditional judicial industry. Moreover, recently, with the development of AI-generated content (AIGC), AI and law have found applications in various domains, including image recognition, automatic text generation, and interactive chat. With the rapid emergence and growing popularity of large models, it is evident that AI will drive transformation in the traditional judicial industry. However, the application of legal large language models (LLMs) is still in its nascent stage. Several challenges need to be addressed. In this paper, we aim to provide a comprehensive survey of legal LLMs. We not only conduct an extensive survey of LLMs, but also expose their applications in the judicial system. We first provide an overview of AI technologies in the legal field and showcase the recent research in LLMs. Then, we discuss the practical implementation presented by legal LLMs, such as providing legal advice to users and assisting judges during trials. In addition, we explore the limitations of legal LLMs, including data, algorithms, and judicial practice. Finally, we summarize practical recommendations and propose future development directions to address these challenges.

Translation date: 2023-12-11 03:21:18. Publication date: 2023-11-26.

# ChatGPT Application In Summarizing An Evolution Of Deep Learning Techniques In Imaging: A Qualitative Study

http://arxiv.org/abs/2312.03723v1

License: see the linked page

Arman Sarraf, Amirabbas Abbaspour

The pursuit of article or text summarization has captured the attention of natural language processing (NLP) practitioners, presenting itself as a formidable challenge. ChatGPT 3.5 exhibits the capacity to condense the content of up to 3000 tokens into a single page, aiming to retain pivotal information from a given text across diverse themes. In a conducted qualitative research endeavor, we selected seven scientific articles and employed the publicly available ChatGPT service to generate summaries of these articles. Subsequently, we engaged six co-authors of the articles in a survey, presenting five questions to evaluate the quality of the summaries compared to the original content. The findings revealed that the summaries produced by ChatGPT effectively encapsulated the crucial information present in the articles, preserving the principal message of each manuscript. Nonetheless, there was a slight diminishment in the technical depth of the summaries as opposed to the original articles. As a result, our conclusion underscores ChatGPT's text summarization capability as a potent tool for extracting essential insights in a manner more aligned with reporting than purely scientific discourse.

Translation date: 2023-12-11 03:05:53. Publication date: 2023-11-26.

# Sharp Bounds for Genetic Drift in Estimation of Distribution Algorithms

http://arxiv.org/abs/1910.14389v2

License: see the linked page

Benjamin Doerr, Weijie Zheng

Estimation of Distribution Algorithms (EDAs) are one branch of Evolutionary Algorithms (EAs) in the broad sense that they evolve a probabilistic model instead of a population. Many existing algorithms fall into this category. Analogous to genetic drift in EAs, EDAs also encounter the phenomenon that updates of the probabilistic model not justified by the fitness move the sampling frequencies to the boundary values. This can result in a considerable performance loss. This paper proves the first sharp estimates of the boundary hitting time of the sampling frequency of a neutral bit for several univariate EDAs. For the UMDA that selects $\mu$ best individuals from $\lambda$ offspring each generation, we prove that the expected first iteration when the frequency of the neutral bit leaves the middle range $[\tfrac 14, \tfrac 34]$ and the expected first time it is absorbed in 0 or 1 are both $\Theta(\mu)$. The corresponding hitting times are $\Theta(K^2)$ for the cGA with hypothetical population size $K$. This paper further proves that for PBIL with parameters $\mu$, $\lambda$, and $\rho$, in an expected number of $\Theta(\mu/\rho^2)$ iterations the sampling frequency of a neutral bit leaves the interval $[\Theta(\rho/\mu),1-\Theta(\rho/\mu)]$ and then always the same value is sampled for this bit, that is, the frequency approaches the corresponding boundary value with maximum speed. For the lower bounds implicit in these statements, we also show exponential tail bounds. If a bit is neutral or has a preference for ones, then the lower bounds on the times to reach a low frequency value still hold. An analogous statement holds for bits that are neutral or prefer the value zero.
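A small simulation makes the genetic-drift effect on a neutral bit concrete. The sketch below assumes the standard cGA update (sample two individuals, shift the frequency by $1/K$ toward the winner's bit when the two samples differ) and uses the fact that, for a neutral bit, the winner is decided independently of that bit. The function name and the integer step counter are illustrative choices, not taken from the paper.

```python
import random

def cga_neutral_bit_exit_time(K, rng, lo=0.25, hi=0.75):
    """Iterations until a neutral bit's frequency first leaves [lo, hi] under cGA updates."""
    steps_from_half = 0  # frequency is 1/2 + steps_from_half / K (integer avoids float drift)
    t = 0
    while True:
        p = 0.5 + steps_from_half / K
        if p < lo or p > hi:
            return t
        t += 1
        winner_bit = rng.random() < p  # bit value in the winning sample
        loser_bit = rng.random() < p   # bit value in the losing sample
        if winner_bit != loser_bit:
            # The bit is neutral, so which sample wins does not depend on it:
            # the frequency performs an unbiased random walk with step 1/K.
            steps_from_half += 1 if winner_bit else -1
```

Averaging the returned value over many seeded runs for `K` and `2 * K` should show roughly fourfold growth, which is the behaviour one would expect from the $\Theta(K^2)$ bound stated above.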

Translation date: 2023-11-30 18:22:43. Publication date: 2023-11-26.

# Elastic Shape Analysis of Tree-like 3D Objects using Extended SRVF Representation

http://arxiv.org/abs/2110.08693v4

License: see the linked page

Guan Wang, Hamid Laga, Anuj Srivastava

How can one analyze detailed 3D biological objects, such as neurons and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects -- each subtree has the main branch with some side branches attached -- and one needs to match these structures across objects for meaningful comparisons. We propose a novel representation that extends the Square-Root Velocity Function (SRVF), initially developed for Euclidean curves, to tree-shaped 3D objects. We then define a new metric that quantifies the bending, stretching, and branch sliding needed to deform one tree-shaped object into the other. Compared to the current metrics, such as the Quotient Euclidean Distance (QED) and the Tree Edit Distance (TED), the proposed representation and metric capture the full elasticity of the branches (i.e., bending and stretching) as well as the topological variations (i.e., branch death/birth and sliding). It completely avoids the shrinkage that results from the edge collapse and node split operations of the QED and TED metrics. We demonstrate the utility of this framework in comparing, matching, and computing geodesics between biological objects such as neurons and botanical trees. The framework is also applied to various shape analysis tasks: (i) symmetry analysis and symmetrization of tree-shaped 3D objects, (ii) computing summary statistics (means and modes of variations) of populations of tree-shaped 3D objects, (iii) fitting parametric probability distributions to such populations, and (iv) finally synthesizing novel tree-shaped 3D objects through random sampling from estimated probability distributions.

Translation date: 2023-11-30 18:17:54. Publication date: 2023-11-26.

# Choosing the Right Algorithm With Hints From Complexity Theory

http://arxiv.org/abs/2109.06584v2

License: see the linked page

Shouda Wang, Weijie Zheng, Benjamin Doerr

Choosing a suitable algorithm from the myriads of different search heuristics is difficult when faced with a novel optimization problem. In this work, we argue that the purely academic question of what could be the best possible algorithm in a certain broad class of black-box optimizers can give fruitful indications in which direction to search for good established optimization heuristics. We demonstrate this approach on the recently proposed DLB benchmark, for which the only known results are $O(n^3)$ runtimes for several classic evolutionary algorithms and an $O(n^2 \log n)$ runtime for an estimation-of-distribution algorithm. Our finding that the unary unbiased black-box complexity is only $O(n^2)$ suggests the Metropolis algorithm as an interesting candidate and we prove that it solves the DLB problem in quadratic time. Since we also prove that better runtimes cannot be obtained in the class of unary unbiased algorithms, we shift our attention to algorithms that use the information of more parents to generate new solutions. An artificial algorithm of this type having an $O(n \log n)$ runtime leads to the result that the significance-based compact genetic algorithm (sig-cGA) can solve the DLB problem also in time $O(n \log n)$ with high probability. Our experiments show a remarkably good performance of the Metropolis algorithm, clearly the best of all algorithms regarded for reasonable problem sizes.

Translation date: 2023-11-30 18:17:23. Publication date: 2023-11-26.

# Apparent nonlinear damping triggered by quantum fluctuations

http://arxiv.org/abs/2104.06464v2

License: see the linked page

Mario F. Gely, Adrián Sanz Mora, Shun Yanai, Rik van der Spek, Daniel Bothner, Gary A. Steele

Nonlinear damping, the change in damping rate with the amplitude of oscillations, plays an important role in many electrical, mechanical and even biological oscillators. In novel technologies such as carbon nanotubes, graphene membranes or superconducting resonators, the origin of nonlinear damping is sometimes unclear. This presents a problem, as the damping rate is a key figure of merit in the application of these systems to extremely precise sensors or quantum computers. Through measurements of a superconducting resonator, we show that from the interplay of quantum fluctuations and the nonlinearity of a Josephson junction emerges a power-dependence in the resonator response which closely resembles nonlinear damping. The phenomenon can be understood and visualized through the flow of quasi-probability in phase space where it reveals itself as dephasing. Crucially, the effect is not restricted to superconducting circuits: we expect that quantum fluctuations or other sources of noise give rise to apparent nonlinear damping in systems with a similar conservative nonlinearity, such as nano-mechanical oscillators or even macroscopic systems.

翻訳日:2023-11-30 18:15:20 公開日:2023-11-26

# 敵対的および非定常マルチアームバンディットの橋渡し

Bridging Adversarial and Nonstationary Multi-armed Bandit ( http://arxiv.org/abs/2201.01628v3 )

ライセンス: Link先を確認

Ningyuan Chen, Shuoguang Yang, Hailun Zhang

(参考訳) マルチアームのバンディットフレームワークでは、時変報酬分布を扱うために一般的に使われる2つの定式化がある: 逆バンディットと非定常バンディットである。本論文では, オーラクル, アルゴリズム, 後悔分析の相違について述べるが, この2つを特殊ケースとしてスムーズにブリッジする統一的な定式化について述べる。この定式化は、タイムウインドウ内で最高の固定アームを取るオラクルを使用します。ウィンドウサイズによっては、非定常バンディットの逆バンディットと動的オラクルにおいて後からオラクルになる。我々は、一致する下限で最適な後悔を得るアルゴリズムを提供する。

In the multi-armed bandit framework, there are two formulations that are commonly employed to handle time-varying reward distributions: adversarial bandit and nonstationary bandit. Although their oracles, algorithms, and regret analyses differ significantly, we provide a unified formulation in this paper that smoothly bridges the two as special cases. The formulation uses an oracle that takes the best fixed arm within time windows. Depending on the window size, it turns into the oracle in hindsight in the adversarial bandit and the dynamic oracle in the nonstationary bandit. We provide algorithms that attain the optimal regret with a matching lower bound.

翻訳日:2023-11-30 18:05:43 公開日:2023-11-26
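
The windowed oracle described in the abstract above can be illustrated with a small sketch. It assumes non-overlapping windows of equal length and a known table of expected rewards; the function name `windowed_oracle_reward` and the toy reward table are hypothetical and not taken from the paper.

```python
from typing import List


def windowed_oracle_reward(mean_rewards: List[List[float]], window: int) -> float:
    """Total expected reward of an oracle that plays the best fixed arm per window.

    mean_rewards[t][a] is the expected reward of arm a at round t.
    Windows are assumed to be consecutive, non-overlapping blocks of `window` rounds.
    """
    horizon = len(mean_rewards)
    n_arms = len(mean_rewards[0])
    total = 0.0
    for start in range(0, horizon, window):
        block = mean_rewards[start:start + window]
        # Best single arm for this block, judged by its summed expected reward.
        total += max(sum(row[arm] for row in block) for arm in range(n_arms))
    return total


# Toy instance: the better arm alternates every round.
rewards = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

hindsight = windowed_oracle_reward(rewards, window=len(rewards))  # one window: oracle in hindsight
dynamic = windowed_oracle_reward(rewards, window=1)               # window of one round: dynamic oracle
```

With a single window covering the whole horizon the benchmark is the best fixed arm in hindsight, while a window of one round recovers the dynamic oracle that may switch arms every round; intermediate window sizes interpolate between the two.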

# 有効信頼度推定による半教師付きサルエント物体検出

Semi-supervised Salient Object Detection with Effective Confidence Estimation ( http://arxiv.org/abs/2112.14019v2 )

ライセンス: Link先を確認

Jiawei Liu, Jing Zhang, Nick Barnes

(参考訳) 既存のサリエント物体検出モデルの成功は、大きなピクセル単位でラベル付けされたトレーニングデータセットに依存しているが、その取得には時間とコストがかかる。我々は,少数のラベル付きサンプルと多数のラベルなしサンプルにアクセス可能な半教師付きサリエント物体検出について検討した。具体的には,条件付エネルギーベースモデルを用いた擬似ラベル学習フレームワークを提案する。条件付エネルギーベースモデルの確率的潜在変数を用いて,人間のサリエンシーラベルの確率的性質をモデル化する。さらに、未ラベルサンプルに対して生成された対応する擬似ラベルの信頼性を強調して、高品質な画素単位の不確かさマップを作成することができる。これにより、モデル最適化における低確かさの擬似ラベルの寄与を最小化し、エラーの伝播を防止できる。実験の結果,提案手法はラベルなしデータの寄与を効果的に探究できることがわかった。ラベル付きサンプルは1/16に過ぎないが,最先端の完全教師付きモデルと比較して競争力のある性能を達成する。

The success of existing salient object detection models relies on a large pixel-wise labeled training dataset, which is time-consuming and expensive to obtain. We study semi-supervised salient object detection, with access to a small number of labeled samples and a large number of unlabeled samples. Specifically, we present a pseudo-label-based learning framework with a Conditional Energy-based Model. We model the stochastic nature of human saliency labels using the stochastic latent variable of the Conditional Energy-based Model. It further enables generation of a high-quality pixel-wise uncertainty map, highlighting the reliability of the corresponding pseudo label generated for the unlabeled sample. This minimises the contribution of low-certainty pseudo labels in optimising the model, preventing error propagation. Experimental results show that the proposed strategy can effectively explore the contribution of unlabeled data. With only 1/16 labeled samples, our model achieves competitive performance compared with state-of-the-art fully-supervised models.

翻訳日:2023-11-30 18:05:32 公開日:2023-11-26

# 残留型物理インフォームド・トランスファー・ラーニング:深層学習による長期cfdシミュレーションの高速化

Residual-based physics-informed transfer learning: A hybrid method for accelerating long-term CFD simulations via deep learning ( http://arxiv.org/abs/2206.06817v3 )

ライセンス: Link先を確認

Joongoo Jeon, Juhyeong Lee, Ricardo Vinuesa, Sung Joong Kim

(参考訳) 人工知能(AI)の大きな波が計算流体力学(CFD)の加速研究の分野に伝播している一方で、最近の研究は、次の目標を再現するAI技術の開発が主要な課題であり、(1)長期CFDシミュレーションにおける未確認(将来の)時系列の正確な予測(2)シミュレーションの加速(3)複数のPDE条件下で許容されるトレーニングデータと時間(4)の量を予測することを強調している。本研究では、ML-CFDハイブリッド計算を用いて、これらの4つの目的を達成するための残差に基づく物理情報伝達学習(RePIT)戦略を提案する。我々の仮説は、CFDとAIが第1原理の残差を監視しながら時系列を交互に計算するハイブリッド手法により、長期CFDシミュレーションが実現可能であるというものである。自然対流のCFDケーススタディによりRePIT戦略の有効性を検証した。単一のトレーニングアプローチでは、残留スケールの変化が100回程度発生し、予測された時系列が非物理的パターンを示し、また基底の真実からかなりのずれが生じた。逆にRePITの戦略は、決定範囲内の残差を維持し、シミュレーション期間全体を通して良好な精度を示した。地上の真理からの最大誤差は、温度0.4K未満、速度0.024m/sである。さらに,ML-GPUとCFD-CPUの計算時間の平均は0.171秒,0.015秒であった。パラメータアップ時間を含めると、シミュレーションは1.9倍に加速された。結論として、我々のRePIT戦略は、業界におけるCFDシミュレーションのコストを削減するための有望な手法である。しかし、より活発な最適化と改善研究が必要である。

While a big wave of artificial intelligence (AI) has propagated to the field of computational fluid dynamics (CFD) acceleration studies, recent research has highlighted that the development of AI techniques that reconcile the following goals remains our primary task: (1) accurate prediction of unseen (future) time series in long-term CFD simulations; (2) acceleration of simulations; (3) an acceptable amount of training data and time; and (4) within a multiple-PDE condition. In this study, we propose a residual-based physics-informed transfer learning (RePIT) strategy to achieve these four objectives using ML-CFD hybrid computation. Our hypothesis is that long-term CFD simulation is feasible with the hybrid method where CFD and AI alternately calculate time series while monitoring the first principle's residuals. The feasibility of the RePIT strategy was verified through a CFD case study on natural convection. In a single training approach, a residual scale change occurred around the 100th timestep, resulting in predicted time series exhibiting non-physical patterns as well as significant deviations from the ground truth. Conversely, the RePIT strategy maintained the residuals within the defined range and demonstrated good accuracy throughout the entire simulation period. The maximum error from the ground truth was below 0.4 K for temperature and 0.024 m/s for x-axis velocity. Furthermore, the average time for 1 timestep by the ML-GPU and CFD-CPU calculations was 0.171 s and 0.015 s, respectively. Including the parameter-updating time, the simulation was accelerated by a factor of 1.9. In conclusion, our RePIT strategy is a promising technique to reduce the cost of CFD simulations in industry. However, more vigorous optimization and improvement studies are still necessary.

翻訳日:2023-11-30 17:56:41 公開日:2023-11-26

# 部分的参加設定における分散非凸問題の計算・通信効率化手法

A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting ( http://arxiv.org/abs/2205.15580v3 )

ライセンス: Link先を確認

Alexander Tyurin, Peter Richtárik

(参考訳) 本稿では,分散最適化と連合学習の3つの重要な要素,確率的勾配の分散低減,部分的参加,圧縮通信について述べる。本手法は, 部分参加環境において, 最適オラクル複雑性と最先端通信複雑性を有することを示す。通信圧縮機能にかかわらず,本手法は分散の低減と部分的参加をうまく組み合わせる:最適なオラクル複雑性を得る,全てのノードの参加を必要としない,有界勾配(異性性)の仮定を必要としない。

We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Regardless of the communication compression feature, our method successfully combines variance reduction and partial participation: we get the optimal oracle complexity, never need the participation of all nodes, and do not require the bounded gradients (dissimilarity) assumption.

翻訳日:2023-11-30 17:54:06 公開日:2023-11-26

# 深層学習モデルの関数型ニューラルネットワークの解析:関数型望遠鏡仮説

Analysis of functional neural codes of deep learning models: Functional Telescope Hypothesis ( http://arxiv.org/abs/2205.10952v3 )

ライセンス: Link先を確認

Jung Hoon Lee and Sujith Vijayan

(参考訳) ディープラーニング(DL)の担い手であるディープニューラルネットワーク(DNN)は、大量の並列/逐次演算を必要とする。これにより、DNNの動作を理解することが難しく、適切な診断を妨げる。内部プロセスに関するより詳しい知識がなければ、DNNをハイステークスな領域にデプロイすることは破滅的な失敗につながる可能性がある。したがって、より信頼性の高いDNN/DLを現実世界のハイステークスな問題に展開するためには、意思決定の基礎となるDNNの内部動作に関する洞察を得ることが不可欠である。本稿では、DNNの意思決定に関連するDLモデルの内部コードの解析に自己組織化マップ(SOM)を用いる。分析の結果,入力層近傍の浅層は特徴を凝縮空間に圧縮し,出力層近傍の深層は特徴空間を広げることが示唆された。また, 圧縮された特徴が敵対的摂動に対するDNNの脆弱性の根底にある可能性を示唆する証拠も発見された。

Deep neural networks (DNNs), the agents of deep learning (DL), require a massive number of parallel/sequential operations. This makes it difficult to comprehend DNNs' operations and impedes proper diagnosis. Without better knowledge of their internal process, deploying DNNs in high-stakes domains can lead to catastrophic failures. Therefore, to build more reliable DNNs/DL to be deployed in high-stakes real-world problems, it is imperative that we gain insights into DNNs' internal operations underlying their decision-making. Here, we use the self-organizing map (SOM) to analyze DL models' internal codes associated with DNNs' decision-making. Our analyses suggest that shallow layers close to the input layer compress features into condensed space and that deep layers close to the output layer expand feature space. We also found evidence indicating that compressed features may underlie DNNs' vulnerabilities to adversarial perturbations.

翻訳日:2023-11-30 17:53:16 公開日:2023-11-26

# 2次情報を用いたモーメントベース政策グラディエント

Momentum-Based Policy Gradient with Second-Order Information ( http://arxiv.org/abs/2205.08253v3 )

ライセンス: Link先を確認

Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran

(参考訳) 近年の強化学習において, 方策勾配法における分散低減勾配推定器は, 推定過程の加速を可能にすることから主要な研究の焦点の一つとなっている。本稿では,時間変化する学習率を持つモメンタムを用いて,2次情報を確率勾配降下(SGD)に組み込んだ分散低減型の方策勾配法SHARPを提案する。 SHARPアルゴリズムはパラメータフリーで$\epsilon$-approximate 1次定常点を$O(\epsilon^{-3})$の軌道数で達成し、各イテレーションで$O(1)$のバッチサイズを使用する。従来の多くの研究と異なり,提案アルゴリズムでは,分散低減プロセスの利点を損ないうる重要度サンプリングを必要としない。さらに、推定誤差の分散は$O(1/t^{2/3})$の速さで減衰し、$t$は反復数である。広範な実験評価により,提案手法が様々な制御課題に対して有効であること,および実用上の最先端手法に対する優位性を示す。

Variance-reduced gradient estimators for policy gradient methods have been one of the main focuses of research in reinforcement learning in recent years, as they allow acceleration of the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. The SHARP algorithm is parameter-free, achieving an $\epsilon$-approximate first-order stationary point with $O(\epsilon^{-3})$ trajectories, while using a batch size of $O(1)$ at each iteration. Unlike most previous work, our proposed algorithm does not require importance sampling, which can compromise the advantage of the variance reduction process. Moreover, the variance of the estimation error decays at the fast rate of $O(1/t^{2/3})$, where $t$ is the number of iterations. Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.

翻訳日:2023-11-30 17:52:39 公開日:2023-11-26

# 自己整合性制約によるブートストラップ動作予測

Bootstrap Motion Forecasting With Self-Consistent Constraints ( http://arxiv.org/abs/2204.05859v4 )

ライセンス: Link先を確認

Maosheng Ye, Jiamiao Xu, Xunnong Xu, Tengfei Wang, Tongyi Cao, Qifeng Chen

(参考訳) 自己整合性制約(MISC)を用いた動き予測をブートストラップする新しいフレームワークを提案する。運動予測タスクは、過去の空間的・時間的情報を組み込むことで、車両の将来の軌跡を予測することを目的としている。 miscの鍵となる設計は、トレーニング中の空間的および時間的摂動の下で予測された軌道を規則化する双対一貫性制約である。また,運動予測におけるマルチモダリティをモデル化するために,教師のターゲットを正確に把握し,マルチモダリティを監督する新しいセルフセンシングスキームを設計する。複数の教師の目標からの明示的な制約を伴って,予測性能の明確な改善を観察する。 argoverse motion forecasting benchmarkとwaymo open motion datasetに関する広範な実験は、miscが最先端の手法を大きく上回っていることを示している。提案手法は一般的な手法であり,他の動き予測手法に容易に組み込むことができるため,提案手法は既存手法の予測性能を一貫して改善することを示す。

We present a novel framework to bootstrap Motion forecasting with Self-consistent Constraints (MISC). The motion forecasting task aims at predicting future trajectories of vehicles by incorporating spatial and temporal information from the past. A key design of MISC is the proposed Dual Consistency Constraints that regularize the predicted trajectories under spatial and temporal perturbation during training. Also, to model the multi-modality in motion forecasting, we design a novel self-ensembling scheme to obtain accurate teacher targets to enforce the self-constraints with multi-modality supervision. With explicit constraints from multiple teacher targets, we observe a clear improvement in the prediction performance. Extensive experiments on the Argoverse motion forecasting benchmark and Waymo Open Motion dataset show that MISC significantly outperforms the state-of-the-art methods. As the proposed strategies are general and can be easily incorporated into other motion forecasting approaches, we also demonstrate that our proposed scheme consistently improves the prediction performance of several existing methods.

翻訳日:2023-11-30 17:51:56 公開日:2023-11-26

# テンソル分解のためのニアリンアー時間と固定パラメータトラクタブルアルゴリズム

Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor Decompositions ( http://arxiv.org/abs/2207.07417v3 )

ライセンス: Link先を確認

Arvind V. Mahankali, David P. Woodruff, Ziyu Zhang

(参考訳) 我々はテンソルの低ランク近似について研究し、テンソルトレインとタッカー分解、およびツリーテンソルネットワークとより一般的なテンソルネットワークによる近似に焦点を当てた。テンソルトレインの分解に対して、小さなbicriteriaランクを持ち、低次項を除いて実行時間が$O(q \cdot \nnz(A))$であるbicriteria $(1 + \eps)$近似アルゴリズムを与え、これは \cite{huber2017randomized} の加算誤差アルゴリズムよりも改善する。また \cite{huber2017randomized} のアルゴリズムを相対誤差アルゴリズムに変換する方法を示すが、それらのアルゴリズムは、bicriteria ランク $r$ を持つ $(1 + \eps)$近似アルゴリズムに変換するとき、必ず $O(qr^2 \cdot \nnz(A)) + n \cdot \poly(qk/\eps)$ の計算時間を持つ。我々の知る限り、テンソルトレイン分解に対する多項式時間相対誤差近似を初めて達成した研究である。我々の鍵となる手法は、$q$個のテンソルからなるテンソルトレインの平坦化である行列に対して、行数が$q$の多項式である部分空間埋め込みを得る方法である。我々はアルゴリズムをツリーテンソルネットワークに拡張する。さらに、このアルゴリズムを任意のグラフを持つテンソルネットワーク(一般テンソルネットワークと呼ぶ)に拡張し、 \cite{ms08_simulating_quantum_tensor_contraction} の結果を用いて、ランク$k$の一般的なテンソルネットワークをランク$k^{O(\deg(G)\tw(G))}$のバイナリツリーネットワークに縮約できることを示し、ツリーテンソルネットワークの場合への帰着を可能にした。最後に、テンソルトレイン、タッカー、CP分解に対して、多項式系解法を使用しないため \cite{swz19_tensor_low_rank} のものよりも単純である新しい固定パラメータ扱い可能なアルゴリズムを与える。ちょうど$k$行のガウス部分空間埋め込みの技法(したがって成功確率は指数関数的に小さい)は独立な興味を持つ可能性がある。

We study low rank approximation of tensors, focusing on the tensor train and Tucker decompositions, as well as approximations with tree tensor networks and more general tensor networks. For tensor train decomposition, we give a bicriteria $(1 + \eps)$-approximation algorithm with a small bicriteria rank and $O(q \cdot \nnz(A))$ running time, up to lower order terms, which improves over the additive error algorithm of \cite{huber2017randomized}. We also show how to convert the algorithm of \cite{huber2017randomized} into a relative error algorithm, but their algorithm necessarily has a running time of $O(qr^2 \cdot \nnz(A)) + n \cdot \poly(qk/\eps)$ when converted to a $(1 + \eps)$-approximation algorithm with bicriteria rank $r$. To the best of our knowledge, our work is the first to achieve polynomial time relative error approximation for tensor train decomposition. Our key technique is a method for obtaining subspace embeddings with a number of rows polynomial in $q$ for a matrix which is the flattening of a tensor train of $q$ tensors. We extend our algorithm to tree tensor networks. In addition, we extend our algorithm to tensor networks with arbitrary graphs (which we refer to as general tensor networks), by using a result of \cite{ms08_simulating_quantum_tensor_contraction} and showing that a general tensor network of rank $k$ can be contracted to a binary tree network of rank $k^{O(\deg(G)\tw(G))}$, allowing us to reduce to the case of tree tensor networks. Finally, we give new fixed-parameter tractable algorithms for the tensor train, Tucker, and CP decompositions, which are simpler than those of \cite{swz19_tensor_low_rank} since they do not make use of polynomial system solvers. Our technique of Gaussian subspace embeddings with exactly $k$ rows (and thus exponentially small success probability) may be of independent interest.

翻訳日:2023-11-30 17:41:32 公開日:2023-11-26

# 量子確率過程からの予測的作業抽出のためのエンジン

Engines for predictive work extraction from memoryful quantum stochastic processes ( http://arxiv.org/abs/2207.03480v3 )

ライセンス: Link先を確認

Ruo Cheng Huang, Paul M. Riechers, Mile Gu, and Varun Narasimhachar

(参考訳) 量子情報処理技術は、古典的な自由エネルギーに加えて、システムの本質的に量子的な特徴から仕事の抽出を可能にする。一方、計算力学の科学は、非マルコフ古典および量子確率過程の予測モデリングのためのツールを与える。これら2つの科学のツールを組み合わせて、量子出力を持つ非マルコフ確率過程から予測作業を抽出する手法を開発した。提案手法は,非予測的な量子ワーク抽出プロトコルよりも多くの作業を抽出することができ,また,量子情報処理を伴わない予測作業抽出が可能であることを実証する。古典的前例のない量子プロセスからの作業抽出において,メモリの有効性において相転移が認められる。我々の研究は、基本的に量子的、本質的に時間的に変化する形で環境自由エネルギーを利用する機械の展望を開放する。

Quantum information-processing techniques enable work extraction from a system's inherently quantum features, in addition to the classical free energy it contains. Meanwhile, the science of computational mechanics affords tools for the predictive modeling of non-Markovian classical and quantum stochastic processes. We combine tools from these two sciences to develop a technique for predictive work extraction from non-Markovian stochastic processes with quantum outputs. We demonstrate that this technique can extract more work than non-predictive quantum work extraction protocols, on one hand, and predictive work extraction without quantum information processing, on the other. We discover a phase transition in the efficacy of memory for work extraction from quantum processes, which is without classical precedent. Our work opens up the prospect of machines that harness environmental free energy in an essentially quantum, essentially time-varying form.

翻訳日:2023-11-30 17:40:43 公開日:2023-11-26

# PlanBench: 変更計画と推論に関する大規模言語モデル評価のための拡張可能なベンチマーク

PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change ( http://arxiv.org/abs/2206.10498v4 )

ライセンス: Link先を確認

Karthik Valmeekam, Matthew Marquez, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati

(参考訳) 行動計画の作成と変化の推論は、長年、知的エージェントの中核的能力と見なされてきた。したがって、大規模言語モデル(LLM)の計画と推論能力を評価することが、研究のホットなトピックになっていることは驚くにあたらない。しかし、llm計画能力に関するほとんどの主張は、llmが計画しているのか、単に広大な世界の知識から取得しているだけなのかを知ることが難しい、常識的なタスクに基づいている。 LLMが本質的に計画能力を持っているかどうかを評価するのに十分な多様性を持つ体系的で拡張可能な計画ベンチマークが必要である。そこで本研究では,自動計画コミュニティ,特に国際計画コンペティションで使用されるドメインの種類に基づいた拡張可能なベンチマークスイートであるPlanBenchを提案する。 PlanBenchはタスクドメインと特定の計画機能の両方に十分な多様性を提供します。また本研究では,SOTAモデルにおいても,計画生成-LLM性能を含む多くの重要な機能について非常に短い結果が得られた。したがって、プランベンチは計画と推論におけるLLMの進歩の有用なマーカーとして機能する。

Generating plans of action and reasoning about change have long been considered core competences of intelligent agents. It is thus no surprise that evaluating the planning and reasoning capabilities of large language models (LLMs) has become a hot topic of research. Most claims about LLM planning capabilities are, however, based on common sense tasks, where it becomes hard to tell whether LLMs are planning or merely retrieving from their vast world knowledge. There is a strong need for systematic and extensible planning benchmarks with sufficient diversity to evaluate whether LLMs have innate planning capabilities. Motivated by this, we propose PlanBench, an extensible benchmark suite based on the kinds of domains used in the automated planning community, especially in the International Planning Competition, to test the capabilities of LLMs in planning or reasoning about actions and change. PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities. Our studies also show that on many critical capabilities, including plan generation, LLM performance falls quite short, even with the SOTA models. PlanBench can thus function as a useful marker of progress of LLMs in planning and reasoning.

翻訳日:2023-11-30 17:39:28 公開日:2023-11-26

# 拡張不変マニフォールド学習

Augmentation Invariant Manifold Learning ( http://arxiv.org/abs/2211.00460v2 )

ライセンス: Link先を確認

Shulei Wang

(参考訳) データ拡張は、近年の自己教師型表現学習の進歩において、広く使われている技法であり、重要な要素である。拡張データ間の類似性を維持することにより、結果として得られるデータ表現は、様々な下流解析を改善し、多くのアプリケーションで最先端のパフォーマンスを達成することができる。経験的効果にもかかわらず、既存のほとんどの手法は一般的な非線形条件下での理論的な理解を欠いている。このギャップを埋めるために、データ拡張変換をモデル化する低次元積多様体上の統計フレームワークを開発する。本フレームワークでは,拡張不変多様体学習と呼ばれる新しい表現学習手法を導入し,確率的最適化問題として再構成して計算効率の高いアルゴリズムを設計する。従来の自己教師付き手法と比較して、新しい手法は多様体の幾何構造と拡張データの不変性を同時に利用し、明確な理論的保証を有する。提案手法におけるデータ拡張の役割を考察し,より複雑なデータ拡張が下流分析の改善につながることを示すために,下流解析において拡張データから得られたデータ表現が$k$-nearest 隣の分類器を改善する方法と方法を明らかにする。最後に,シミュレーションおよび実データを用いた数値実験を行い,提案手法の有効性を示す。

Data augmentation is a widely used technique and an essential ingredient in the recent advance in self-supervised representation learning. By preserving the similarity between augmented data, the resulting data representation can improve various downstream analyses and achieve state-of-the-art performance in many applications. Despite the empirical effectiveness, most existing methods lack theoretical understanding under a general nonlinear setting. To fill this gap, we develop a statistical framework on a low-dimension product manifold to model the data augmentation transformation. Under this framework, we introduce a new representation learning method called augmentation invariant manifold learning and design a computationally efficient algorithm by reformulating it as a stochastic optimization problem. Compared with existing self-supervised methods, the new method simultaneously exploits the manifold's geometric structure and invariant property of augmented data and has an explicit theoretical guarantee. Our theoretical investigation characterizes the role of data augmentation in the proposed method and reveals why and how the data representation learned from augmented data can improve the $k$-nearest neighbor classifier in the downstream analysis, showing that a more complex data augmentation leads to more improvement in downstream analysis. Finally, numerical experiments on simulated and real datasets are presented to demonstrate the merit of the proposed method.

翻訳日:2023-11-30 17:33:01 公開日:2023-11-26

# 量子臨界における創発的連続対称性の検出

Detecting emergent continuous symmetries at quantum criticality ( http://arxiv.org/abs/2210.17539v4 )

ライセンス: Link先を確認

Mingru Yang, Bram Vanhecke, Norbert Schuch

(参考訳) 対称性を破る項がハミルトニアンの繰り込み群フローの下で無関係(irrelevant)である場合、その対称性を持たないハミルトニアンの低エネルギースペクトルに、新たな対称性や拡大された対称性が創発しうる。本稿では、低エネルギー有効場の理論に関する事前知識を必要とせずに、任意の量子スピン鎖の基底状態から創発的保存カレントの格子作用素近似を数値的に抽出する、テンソルネットワークに基づくアルゴリズムを提案する。スピン1/2の$J$-$Q$ハイゼンベルク鎖と脱閉じ込め量子臨界点(DQCP)の1次元版に対する結果は、創発的な格子Kac-Moody生成子を得る本手法の有効性を示している。これはまた、可積分モデルの局所的な運動の積分や、臨界ギャップレス基底状態の局所親ハミルトニアンを見つける方法と見なすこともできる。

New or enlarged symmetries can emerge in the low-energy spectrum of a Hamiltonian that does not possess the symmetries, if the symmetry-breaking terms in the Hamiltonian are irrelevant under the renormalization group flow. In this letter, we propose a tensor-network-based algorithm to numerically extract a lattice operator approximation of the emergent conserved currents from the ground state of any quantum spin chain, without the need for prior knowledge of its low-energy effective field theory. Our results for the spin-1/2 $J$-$Q$ Heisenberg chain and a one-dimensional version of the deconfined quantum critical points (DQCP) demonstrate the power of our method to obtain the emergent lattice Kac-Moody generators. It can also be viewed as a way to find the local integrals of motion of an integrable model and the local parent Hamiltonian of a critical gapless ground state.

翻訳日:2023-11-30 17:32:41 公開日:2023-11-26

# G-PECNet: 一般化可能な歩行者軌道予測システムを目指して

G-PECNet: Towards a Generalizable Pedestrian Trajectory Prediction System ( http://arxiv.org/abs/2210.09846v2 )

ライセンス: Link先を確認

Aryan Garg, Renu M. Rameshan

(参考訳) 人的資産を妨害したり損傷させたりすることなく、ダイナミックな物理的環境をナビゲートすることは、社会ロボットにとって極めて重要である。本研究では,自律型ドローンナビゲーションのサブ課題である,ドメイン外の人間およびエージェントの軌跡の予測を,ディープジェネレーティブモデルを用いて解く。提案手法であるGeneral-PECNet(G-PECNet)は, 周期的アクティベーション関数にインスパイアされたアーキテクチャ改善と, 隠れマルコフモデル(HMM)と強化学習(RL)を用いた合成軌道(データ)拡張を併用することで, 2020年のベンチマークであるPECNetに対して最終変位誤差(FDE)を9.5%改善する。さらに,軌道の非線形性および外れ値検出のための簡易な幾何学的インスピレーション付き計量法を提案する。コードはGitHub( https://github.com/Aryan-Garg/PECNet-Pedestrian-Trajectory-Prediction.git )で入手できる。

Navigating dynamic physical environments without obstructing or damaging human assets is of quintessential importance for social robots. In this work, we solve autonomous drone navigation's sub-problem of predicting out-of-domain human and agent trajectories using a deep generative model. Our method, General-PECNet or G-PECNet, observes an improvement of 9.5% on the Final Displacement Error (FDE) over the 2020 benchmark PECNet, through a combination of architectural improvements inspired by periodic activation functions and synthetic trajectory (data) augmentations using Hidden Markov Models (HMMs) and Reinforcement Learning (RL). Additionally, we propose a simple geometry-inspired metric for trajectory non-linearity and outlier detection, helpful for the task. Code available on GitHub ( https://github.com/Aryan-Garg/PECNet-Pedestrian-Trajectory-Prediction.git )

翻訳日:2023-11-30 17:32:08 公開日:2023-11-26

# MA-RECON: 高速MRIk空間補間のためのマスク対応ディープニューラルネットワーク

MA-RECON: Mask-aware deep-neural-network for robust fast MRI k-space interpolation ( http://arxiv.org/abs/2209.00462v2 )

ライセンス: Link先を確認

Nitzan Avidan and Moti Freiman

(参考訳) フーリエ領域にあるアンダーサンプリングされた「k空間」データからのMRI画像の高品質な再構成は、MRI取得時間を短縮し、時間分解能の優れた確保に不可欠である。近年,このプロセスに関連した複雑で不適切な逆問題に取り組むために,深層ニューラルネットワーク(dnn)手法が数多く登場している。しかし、獲得過程や解剖学的分布の変動に対する不安定さは、これらのDNNアーキテクチャ内の関連物理モデルの一般化に欠如している。本研究の目的は,新しいマスク対応DNNアーキテクチャであるMA-RECONを導入することで,k空間補間のためのDNN手法の一般化能力を向上することである。従来のアプローチとは異なり、MA-RECONアーキテクチャは観測データだけでなく、モデル構造内のアンダーサンプリングマスクも符号化している。様々なアンダーサンプリングマスクで生成されたデータを活用して、アンダーサンプリングされたMRI再構成問題の一般化を刺激する。したがって、関連する逆問題(古典的圧縮センシングアプローチ)を効果的に表現する。我々のMA-RECONアプローチの利点は、広くアクセス可能な高速MRIデータセットによる厳密なテストによって確認された。アンダーサンプリングマスク強化を訓練した標準DNN法とDNNと比較して,本手法は優れた一般化能力を示した。その結果、特に病理疾患のある地域では、獲得過程と解剖学的分布の両方の変化に対するロバスト性が大幅に向上した。結論として,我々のマスク認識戦略は,低サンプリングk空間データからMRI再構成のためのDNNベースの手法の一般化能力と堅牢性を高めることを約束する。

High-quality reconstruction of MRI images from under-sampled 'k-space' data, which is in the Fourier domain, is crucial for shortening MRI acquisition times and ensuring superior temporal resolution. Over recent years, a wealth of deep neural network (DNN) methods have emerged, aiming to tackle the complex, ill-posed inverse problem linked to this process. However, their instability against variations in the acquisition process and anatomical distribution exposes a deficiency in the generalization of relevant physical models within these DNN architectures. The goal of our work is to enhance the generalization capabilities of DNN methods for k-space interpolation by introducing 'MA-RECON', an innovative mask-aware DNN architecture and associated training method. Unlike preceding approaches, our 'MA-RECON' architecture encodes not only the observed data but also the under-sampling mask within the model structure. It implements a tailored training approach that leverages data generated with a variety of under-sampling masks to stimulate the model's generalization of the under-sampled MRI reconstruction problem. It therefore effectively represents the associated inverse problem, akin to the classical compressed sensing approach. The benefits of our MA-RECON approach were affirmed through rigorous testing with the widely accessible fastMRI dataset. Compared to standard DNN methods and DNNs trained with under-sampling mask augmentation, our approach demonstrated superior generalization capabilities. This resulted in a considerable improvement in robustness against variations in both the acquisition process and anatomical distribution, especially in regions with pathology. In conclusion, our mask-aware strategy holds promise for enhancing the generalization capacity and robustness of DNN-based methodologies for MRI reconstruction from undersampled k-space data.

翻訳日:2023-11-30 17:27:34 公開日:2023-11-26

# InferEM:共感的対話生成のための話者意図の推測

InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation ( http://arxiv.org/abs/2212.06373v7 )

ライセンス: Link先を確認

Guoqing Lv, Jiang Li, Xiaoping Wang, Zhigang Zeng

(参考訳) 共感応答生成に対する現在のアプローチは、一般的に対話履歴全体をエンコードし、出力をデコーダに入れてフレンドリーなフィードバックを生成する。これらの手法は文脈情報のモデル化に焦点をあてるが、話者の直接の意図を捉えることは無視する。我々は,対話の最後の発声が話者の意図を実証的に伝えることを主張する。そこで本研究では,共感応答生成のための新しいモデルInferEMを提案する。我々は,最後の発話を別々に符号化し,多面的注意に基づく意図融合モジュールを通して対話全体と融合し,話者の意図を捉える。さらに,先行した発話を用いて最後の発話を予測し,人間の心理をシミュレートし,対話者が事前に何を話すのかを推測する。発話予測と応答生成の最適化率のバランスをとるために,InferEMのためのマルチタスク学習戦略を設計する。実験の結果,inferemの共感性発現改善における可能性と妥当性が示された。

Current approaches to empathetic response generation typically encode the entire dialogue history directly and put the output into a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect capturing the direct intention of the speaker. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. Consequently, we propose a novel model named InferEM for empathetic response generation. We separately encode the last utterance and fuse it with the entire dialogue through a multi-head-attention-based intention fusion module to capture the speaker's intention. In addition, we utilize previous utterances to predict the last utterance, which simulates the human tendency to guess in advance what the interlocutor may say. To balance the optimization rates of utterance prediction and response generation, a multi-task learning strategy is designed for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.

翻訳日:2023-11-30 17:19:37 公開日:2023-11-26

# 言語濃度を用いたモデル推論精度の厳密な評価

Rigorous Assessment of Model Inference Accuracy using Language Cardinality ( http://arxiv.org/abs/2211.16587v2 )

ライセンス: Link先を確認

Donato Clun, Donghwan Shin, Antonio Filieri, Domenico Bianculli

(参考訳) 有限状態オートマトンのようなモデルは、実行中に観測可能なイベントのシーケンスをキャプチャすることでソフトウェアシステムの振る舞いを抽象化するために広く使われている。それでも、モデルが実際に存在することはめったになく、その場合には、容易に時代遅れになり、さらに、手動でモデルを構築し、メンテナンスすることは、コストがかかり、エラーが発生します。その結果、これらの問題に対処するために、実行トレースからモデルを自動的に構築する様々なモデル推論手法が提案されている。しかし、推論されたモデルの体系的かつ信頼性の高い精度評価を行うことは、未解決の問題である。参照モデルが与えられたとしても、既存のモデル精度評価手法のほとんどは、誤解を招く結果や偏った結果を返す可能性がある。これは主に、有限個のランダムに生成されたトレースに対する統計的推定子に依存しており、推定に関する避けられない不確実性をもたらし、ランダムなトレース生成プロセスのパラメータに敏感である。本稿では,モデル精度評価におけるバイアスと不確実性を最小限に抑え,統計的推定を決定論的精度尺度に置き換える,解析的組合せに基づく系統的アプローチを提案する。確立された仕様マイニングベンチマークから参照モデルに対する最先端推論ツールによって推定されるモデルの精度を評価することにより,提案手法の一貫性と妥当性を実験的に実証した。

Models such as finite state automata are widely used to abstract the behavior of software systems by capturing the sequences of events observable during their execution. Nevertheless, models rarely exist in practice and, when they do, get easily outdated; moreover, manually building and maintaining models is costly and error-prone. As a result, a variety of model inference methods that automatically construct models from execution traces have been proposed to address these issues. However, performing a systematic and reliable accuracy assessment of inferred models remains an open problem. Even when a reference model is given, most existing model accuracy assessment methods may return misleading and biased results. This is mainly due to their reliance on statistical estimators over a finite number of randomly generated traces, introducing avoidable uncertainty about the estimation and being sensitive to the parameters of the random trace generative process. This paper addresses this problem by developing a systematic approach based on analytic combinatorics that minimizes bias and uncertainty in model accuracy assessment by replacing statistical estimation with deterministic accuracy measures. We experimentally demonstrate the consistency and applicability of our approach by assessing the accuracy of models inferred by state-of-the-art inference tools against reference models from established specification mining benchmarks.

翻訳日:2023-11-30 17:18:35 公開日:2023-11-26
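
The idea in the abstract above, replacing sampled traces with exact counts over the accepted languages, can be sketched as follows. This is only an illustration under stated assumptions: it counts words up to a fixed length bound by dynamic programming and defines precision and recall as ratios of those counts, which may differ from the measures actually used in the paper. The automaton encoding and the names `count_accepted` and `intersect` are hypothetical.

```python
def count_accepted(dfa, alphabet, max_len):
    """Count the words of length <= max_len accepted by a (possibly partial) DFA.

    A DFA is a triple (start, accepting_states, transitions), where
    transitions maps (state, symbol) to the next state.
    """
    start, accepting, delta = dfa
    counts = {start: 1}  # number of words of the current length reaching each state
    total = 1 if start in accepting else 0
    for _ in range(max_len):
        nxt = {}
        for state, n_words in counts.items():
            for sym in alphabet:
                target = delta.get((state, sym))
                if target is not None:
                    nxt[target] = nxt.get(target, 0) + n_words
        counts = nxt
        total += sum(n for state, n in counts.items() if state in accepting)
    return total


def intersect(a, b, alphabet):
    """Product automaton accepting exactly the words accepted by both a and b."""
    (start_a, acc_a, d_a), (start_b, acc_b, d_b) = a, b
    start = (start_a, start_b)
    accepting, delta = set(), {}
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        p, q = state
        if p in acc_a and q in acc_b:
            accepting.add(state)
        for sym in alphabet:
            tp, tq = d_a.get((p, sym)), d_b.get((q, sym))
            if tp is None or tq is None:
                continue  # one automaton rejects, so the product has no move
            target = (tp, tq)
            delta[(state, sym)] = target
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return start, accepting, delta


alphabet = "ab"
reference = (0, {0}, {(0, "a"): 1, (1, "b"): 0})               # language (ab)*
inferred = (0, {0}, {(0, "a"): 1, (1, "a"): 1, (1, "b"): 0})   # language (a+b)*, over-general

bound = 4
n_ref = count_accepted(reference, alphabet, bound)
n_inf = count_accepted(inferred, alphabet, bound)
n_both = count_accepted(intersect(reference, inferred, alphabet), alphabet, bound)

precision = n_both / n_inf  # share of inferred behaviour that the reference allows
recall = n_both / n_ref     # share of reference behaviour that the inferred model keeps
```

Because every count is exact for the chosen length bound, repeating the computation always gives the same precision and recall, in contrast to estimates obtained from randomly generated traces.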

# テキスト・画像拡散モデルへの条件制御の追加

Adding Conditional Control to Text-to-Image Diffusion Models ( http://arxiv.org/abs/2302.05543v3 )

ライセンス: Link先を確認

Lvmin Zhang and Anyi Rao and Maneesh Agrawala

(参考訳) 大規模で事前訓練されたテキスト-画像拡散モデルに空間条件制御を追加するニューラルネットワークアーキテクチャであるControlNetを提案する。 ControlNetはプロダクション対応の大規模拡散モデルをロックし、数十億のイメージでトレーニングされた深層で堅牢なエンコーディング層を強力なバックボーンとして再利用して、さまざまな条件付きコントロールのセットを学ぶ。ニューラル・アーキテクチャは「ゼロ畳み込み」(ゼロ初期化畳み込み層)と接続され、パラメータを徐々にゼロから成長させ、有害なノイズが微調整に影響を与えないようにする。条件付制御,例えばエッジ,深さ,セグメンテーション,人間のポーズ等を,プロンプトの有無にかかわらず,単一または複数条件を用いてStable Diffusionでテストする。 ControlNetsのトレーニングは、小さな (<50k) データセットと大きな (>1m) データセットの両方で堅牢であることを示す。広範な結果は,ControlNetが画像拡散モデルを制御するより幅広い応用を促進する可能性を示している。

We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, e.g., edges, depth, segmentation, human pose, etc., with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.

Translated: 2023-11-30 17:07:34 Published: 2023-11-26

# DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely ( http://arxiv.org/abs/2212.10013v2 )

License: see the linked page

Forrest Sheng Bao, Ruixuan Tu, Ge Luo, Yinfei Yang, Hebi Li, Minghui Qiu, Youbiao He, Cen Chen

Automated summary quality assessment falls into two categories: reference-based and reference-free. Reference-based metrics, historically deemed more accurate due to the additional information provided by human-written references, are limited by their reliance on human input. In this paper, we hypothesize that the comparison methodologies used by some reference-based metrics to evaluate a system summary against its corresponding reference can be effectively adapted to assess it against its source document, thereby transforming these metrics into reference-free ones. Experimental results support this hypothesis. After being repurposed reference-freely, the zero-shot BERTScore using the pretrained DeBERTa-large-MNLI model of <0.5B parameters consistently outperforms its original reference-based version across various aspects on the SummEval and Newsroom datasets. It also excels in comparison to most existing reference-free metrics and closely competes with zero-shot summary evaluators based on GPT-3.5.

Translated: 2023-11-30 17:03:29 Published: 2023-11-26

# Quantum Deep Hedging ( http://arxiv.org/abs/2303.16585v2 )

License: see the linked page

El Amine Cherrat, Snehal Raj, Iordanis Kerenidis, Abhishek Shekhar, Ben Wood, Jon Dee, Shouvanik Chakrabarti, Richard Chen, Dylan Herman, Shaohan Hu, Pierre Minssen, Ruslan Shaydulin, Yue Sun, Romina Yalovetzky, Marco Pistoia

Quantum machine learning has the potential for a transformative impact across industry sectors and in particular in finance. In our work, we look at the problem of hedging, where deep reinforcement learning offers a powerful framework for real markets. We develop quantum reinforcement learning methods based on policy-search and distributional actor-critic algorithms that use quantum neural network architectures with orthogonal and compound layers for the policy and value functions. We prove that the quantum neural networks we use are trainable, and we perform extensive simulations that show that quantum models can reduce the number of trainable parameters while achieving comparable performance and that the distributional approach obtains better performance than other standard approaches, both classical and quantum. We successfully implement the proposed models on a trapped-ion quantum processor, utilizing circuits with up to $16$ qubits, and observe performance that agrees well with noiseless simulation. Our quantum techniques are general and can be applied to other reinforcement learning problems beyond hedging.

Translated: 2023-11-30 16:44:31 Published: 2023-11-26

# Eliciting Latent Predictions from Transformers with the Tuned Lens ( http://arxiv.org/abs/2303.08112v4 )

License: see the linked page

Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer. To do so, we train an affine probe for each block in a frozen pretrained model, making it possible to decode every hidden state into a distribution over the vocabulary. Our method, the tuned lens, is a refinement of the earlier "logit lens" technique, which yielded useful insights but is often brittle. We test our method on various autoregressive language models with up to 20B parameters, showing it to be more predictive, reliable and unbiased than the logit lens. With causal experiments, we show the tuned lens uses similar features to the model itself. We also find the trajectory of latent predictions can be used to detect malicious inputs with high accuracy. All code needed to reproduce our results can be found at https://github.com/AlignmentResearch/tuned-lens.
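The difference between the two lenses can be sketched in a few lines of plain Python. This is a simplified illustration, not the authors' implementation: it assumes the logit lens applies the unembedding matrix directly to an intermediate hidden state, and that the tuned lens first applies a per-layer affine map (here `probe_weight`, `probe_bias`). The final layer normalization is omitted, and all names and toy values are hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden, unembed):
    """Decode a hidden state with the model's own unembedding matrix."""
    return softmax(matvec(unembed, hidden))

def tuned_lens(hidden, probe_weight, probe_bias, unembed):
    """Apply a learned affine probe for this layer, then decode as above."""
    translated = [t + b for t, b in zip(matvec(probe_weight, hidden), probe_bias)]
    return softmax(matvec(unembed, translated))

# Toy sizes: hidden dimension 2, vocabulary of 3 tokens.
hidden = [0.5, -1.0]
unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

baseline = logit_lens(hidden, unembed)
with_identity_probe = tuned_lens(hidden, identity, [0.0, 0.0], unembed)
with_shifted_probe = tuned_lens(hidden, identity, [0.0, 2.0], unembed)
```

With an identity probe and zero bias the tuned lens reduces to the logit lens, so the probe can only help once it has been trained; in this sketch the training step itself is left out.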

Translated: 2023-11-30 16:39:28 Published: 2023-11-26

# Weakly Supervised Detection of Baby Cry ( http://arxiv.org/abs/2304.10001v3 )

License: see the linked page

Weijun Tan, Qi Yao, Jingfeng Liu

Detection of baby cries is an important part of baby monitoring and health care. Almost all existing methods use supervised SVMs, CNNs, or their variants. In this work, we propose to use weakly supervised anomaly detection to detect a baby cry. Under this weak supervision, we only need a weak annotation indicating whether an audio file contains a cry. We design a data mining technique using the pre-trained VGGish feature extractor and an anomaly detection network on long untrimmed audio files. The obtained datasets are used to train a simple CNN feature network for cry/non-cry classification. This CNN is then used as a feature extractor in an anomaly detection framework to achieve better cry detection performance.

Translated: 2023-11-30 16:31:24 Published: 2023-11-26

# Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference ( http://arxiv.org/abs/2304.04947v2 )

License: see the linked page

Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang

We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-weight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter approaches with moderate to no accuracy loss and the same parameter efficiency.

Translated: 2023-11-30 16:30:04 Published: 2023-11-26

# Time dilation of quantum clocks in a Newtonian gravitational field ( http://arxiv.org/abs/2304.04281v3 )

License: see the linked page

Tommaso Favalli and Augusto Smerzi

We consider two non-relativistic quantum clocks interacting with a Newtonian gravitational field produced by a spherical mass. In the framework of the Page and Wootters approach, we derive a time dilation for the time states of the clocks. The delay agrees up to first order with the gravitational time dilation obtained from the Schwarzschild metric. This result can be extended by considering the relativistic gravitational potential: in this case we obtain agreement with the exact Schwarzschild solution.
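As a worked check of what "agreement up to first order" means, the textbook Schwarzschild result for a static clock at radial coordinate $r$ outside a mass $M$ can be expanded in the small quantity $GM/(rc^2)$. The symbols $\tau$ (proper time), $t$ (coordinate time), and $\phi$ (Newtonian potential) are notation introduced here for illustration and are not taken from the paper.

$$
\frac{d\tau}{dt} = \sqrt{1 - \frac{2GM}{rc^2}} = 1 - \frac{GM}{rc^2} + O\!\left(\frac{G^2M^2}{r^2c^4}\right) = 1 + \frac{\phi(r)}{c^2} + O\!\left(\frac{\phi(r)^2}{c^4}\right), \qquad \phi(r) = -\frac{GM}{r}.
$$

For two clocks $A$ and $B$ at radii $r_A$ and $r_B$, the ratio of their rates is therefore, to first order,

$$
\frac{d\tau_A}{d\tau_B} \approx 1 + \frac{\phi(r_A) - \phi(r_B)}{c^2},
$$

so the clock deeper in the potential well (more negative $\phi$) runs slower. A derivation that uses only the Newtonian potential can be expected to reproduce this leading term but not the higher-order corrections.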

Translated: 2023-11-30 16:29:50 Published: 2023-11-26

# A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms" ( http://arxiv.org/abs/2304.04258v2 )

License: see the linked page

Jiachen T. Wang and Ruoxi Jia

Data valuation is a growing research field that studies the influence of individual data points for machine learning (ML) models. Data Shapley, inspired by cooperative game theory and economics, is an effective method for data valuation. However, it is well-known that the Shapley value (SV) can be computationally expensive. Fortunately, Jia et al. (2019) showed that for K-Nearest Neighbors (KNN) models, the computation of Data Shapley is surprisingly simple and efficient. In this note, we revisit the work of Jia et al. (2019) and propose a more natural and interpretable utility function that better reflects the performance of KNN models. We derive the corresponding calculation procedure for the Data Shapley of KNN classifiers/regressors with the new utility functions. Our new approach, dubbed soft-label KNN-SV, achieves the same time complexity as the original method. We further provide an efficient approximation algorithm for soft-label KNN-SV based on locality sensitive hashing (LSH). Our experimental results demonstrate that soft-label KNN-SV outperforms the original method on most datasets in the task of mislabeled data detection, making it a better baseline for future work on data valuation.
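To make "surprisingly simple and efficient" concrete, here is a sketch of the closed-form recursion for the original (hard-label) KNN Shapley value as I understand it from Jia et al. (2019), checked against a brute-force computation over all permutations. It is not the soft-label variant proposed in this note. The utility assumed here is the fraction of the K nearest available training points whose label matches the test label, for a single test point; the function names and toy data are hypothetical.

```python
from itertools import permutations

def knn_utility(matches, k):
    """Utility of a subset: matching labels among its K nearest points, divided by K."""
    return sum(matches[:k]) / k

def knn_shapley(matches, k):
    """Closed-form recursion; matches[i] is 1 if the (i+1)-th nearest label matches."""
    n = len(matches)
    values = [0.0] * n
    values[n - 1] = matches[n - 1] / n
    for i in range(n - 2, -1, -1):
        rank = i + 1  # 1-based distance rank of point i
        values[i] = (values[i + 1]
                     + (matches[i] - matches[i + 1]) / k * min(k, rank) / rank)
    return values

def brute_force_shapley(matches, k):
    """Average marginal contribution over every ordering (exponential cost)."""
    n = len(matches)
    values = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        present = []
        for point in order:
            before = knn_utility([matches[j] for j in sorted(present)], k)
            present.append(point)
            after = knn_utility([matches[j] for j in sorted(present)], k)
            values[point] += (after - before) / len(orders)
    return values

# Training points sorted nearest-first; 1 means the label equals the test label.
matches = [1, 0, 1, 1, 0]
fast = knn_shapley(matches, k=2)
slow = brute_force_shapley(matches, k=2)
```

The recursion needs one pass over the sorted points, so after sorting by distance the cost is linear in the number of training points, whereas the brute-force check enumerates all orderings and is only usable on toy inputs like this one.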

Translated: 2023-11-30 16:29:44 Published: 2023-11-26

# An Embedding-based Approach to Inconsistency-tolerant Reasoning with Inconsistent Ontologies ( http://arxiv.org/abs/2304.01664v2 )

License: see the linked page

Keyu Wang, Site Li, Jiaye Li, Guilin Qi and Qiu Ji

Inconsistency handling is an important issue in knowledge management. Especially in ontology engineering, logical inconsistencies may occur during ontology construction. A natural way to reason with an inconsistent ontology is to utilize the maximal consistent subsets of the ontology. However, previous studies on selecting maximal consistent subsets have rarely considered the semantics of the axioms, which may result in irrational inference. In this paper, we propose a novel approach to reasoning with inconsistent ontologies in description logics based on the embeddings of axioms. We first give a method for turning axioms into distributed semantic vectors to compute the semantic connections between the axioms. We then define an embedding-based method for selecting the maximal consistent subsets and use it to define an inconsistency-tolerant inference relation. We show the rationality of our inference relation by considering some logical properties. Finally, we conduct experiments on several ontologies to evaluate the reasoning power of our inference relation. The experimental results show that our embedding-based method can outperform existing inconsistency-tolerant reasoning methods based on maximal consistent subsets.

Translated: 2023-11-30 16:27:53 Published: 2023-11-26

# Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning ( http://arxiv.org/abs/2304.01203v7 )

License: see the linked page

Tongzhou Wang, Antonio Torralba, Phillip Isola, Amy Zhang

In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics, and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance, across both state-based and image-based observations.
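The quasimetric structure can be seen on a toy deterministic problem. The sketch below assumes a small directed graph with positive step costs, computes the optimal cost to reach every goal from every state with the Floyd-Warshall algorithm, and checks the quasimetric axioms (zero self-distance, non-negativity, triangle inequality, no symmetry requirement). The graph and the function names are hypothetical; the optimal goal-reaching value would be the negated cost.

```python
INF = float("inf")

def optimal_costs(num_states, edge_costs):
    """All-pairs minimal cost-to-goal via Floyd-Warshall."""
    cost = [[0.0 if s == g else INF for g in range(num_states)]
            for s in range(num_states)]
    for (s, g), c in edge_costs.items():
        cost[s][g] = min(cost[s][g], c)
    for mid in range(num_states):
        for s in range(num_states):
            for g in range(num_states):
                cost[s][g] = min(cost[s][g], cost[s][mid] + cost[mid][g])
    return cost

def is_quasimetric(d):
    """Check d(s, s) = 0, d >= 0, and the triangle inequality (symmetry not required)."""
    states = range(len(d))
    zero_self = all(d[s][s] == 0 for s in states)
    non_negative = all(d[s][g] >= 0 for s in states for g in states)
    triangle = all(d[s][g] <= d[s][m] + d[m][g]
                   for s in states for m in states for g in states)
    return zero_self and non_negative and triangle

# A one-way cycle: going 0 -> 1 -> 2 is cheap, returning 2 -> 0 is expensive.
edge_costs = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 5.0}
d = optimal_costs(3, edge_costs)
forward, backward = d[0][2], d[2][0]
```

Here reaching state 2 from state 0 costs 2.0 while the reverse costs 5.0, so the optimal cost is asymmetric yet still obeys the triangle inequality, which is the property a symmetric metric model could not represent.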

Translated: 2023-11-30 16:27:34 Published: 2023-11-26

# AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction ( http://arxiv.org/abs/2305.09620v2 )

License: see the linked page

Junsol Kim, Byungkyu Lee

Large language models (LLMs) that produce human-like responses have begun to revolutionize research practices in the social sciences. This paper shows how we can integrate LLMs and social surveys to accurately predict individual responses to survey questions that were not asked before. We develop a novel methodological framework to personalize LLMs by considering the meaning of survey questions derived from their text, the latent beliefs of individuals inferred from their response patterns, and the temporal contexts across different survey periods through fine-tuning LLMs with survey data. Using the General Social Survey from 1972 to 2021, we show that the fine-tuned model based on Alpaca-7b can predict individual responses to survey questions that are partially missing as well as entirely missing. The remarkable prediction capabilities allow us to fill in missing trends with high confidence and pinpoint when public attitudes changed, such as the rising support for same-sex marriage. We discuss practical constraints, socio-demographic representation, and ethical concerns regarding individual autonomy and privacy when using LLMs for opinion prediction. This study demonstrates that LLMs and surveys can mutually enhance each other's capabilities: LLMs broaden survey potential, while surveys improve the alignment of LLMs.

Translated: 2023-11-30 16:21:09 Published: 2023-11-26

# Image Matching by Bare Homography ( http://arxiv.org/abs/2305.08946v5 )

License: see the linked page

Fabio Bellavia

This paper presents Slime, a novel non-deep image matching framework which models the scene as rough local overlapping planes. This intermediate representation sits in between the local affine approximation of the keypoint patches and the global matching based on both spatial and similarity constraints, providing a progressive pruning of the correspondences, as planes are easier to handle than general scenes. Slime decomposes the images into overlapping regions at different scales and computes loose planar homographies. Planes are mutually extended by compatible matches and the images are split into fixed tiles, with only the best homographies retained for each pair of tiles. Stable matches are identified according to the consensus of the admissible stereo configurations provided by pairwise homographies. Within tiles, the rough planes are then merged according to their overlap in terms of matches and further consistent correspondences are extracted. The whole process only involves homography constraints. As a result, both the coverage and the stability of correct matches over the scene are amplified, together with the ability to spot matches in challenging scenes, allowing traditional hybrid matching pipelines to make up lost ground against recent end-to-end deep matching methods. In addition, the paper gives a thorough comparative analysis of the recent state of the art in image matching, represented by end-to-end deep networks and hybrid pipelines. The evaluation considers both planar and non-planar scenes, taking into account critical and challenging scenarios including abrupt temporal image changes and strong variations in relative image rotations. According to this analysis, despite the impressive progress made in this field, there is still ample room for improvement to be investigated in future research.
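The basic building block, testing whether a putative match is compatible with a planar homography, can be sketched as follows. This is a generic illustration and not Slime's actual criterion: the pixel `tolerance`, the function names, and the toy homography are assumptions made for the example.

```python
import math

def apply_homography(h, point):
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)

def transfer_error(h, source, target):
    """Distance between the mapped source point and the matched target point."""
    px, py = apply_homography(h, source)
    return math.hypot(px - target[0], py - target[1])

def compatible_matches(h, matches, tolerance):
    """Keep the matches whose transfer error is within the tolerance."""
    return [m for m in matches if transfer_error(h, m[0], m[1]) <= tolerance]

# A pure translation by (2, 3), written as a homography.
h = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0],
     [0.0, 0.0, 1.0]]
matches = [((1.0, 1.0), (3.0, 4.0)),    # agrees with the plane
           ((0.0, 0.0), (10.0, 10.0))]  # does not
kept = compatible_matches(h, matches, tolerance=1.5)
```

A "loose" planar model, in the sense used above, would correspond to a generous tolerance, so that nearby matches that only roughly lie on the plane are still collected before later pruning.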

Translated: 2023-11-30 16:20:46 Published: 2023-11-26

# Designing Optimal Behavioral Experiments Using Machine Learning ( http://arxiv.org/abs/2305.07721v2 )

License: see the linked page

Simon Valentin, Steven Kleinegesse, Neil R. Bramley, Peggy Seriès, Michael U. Gutmann, Christopher G. Lucas

Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely, and offer predictions that can be subtle and often counter-intuitive. However, this same richness and ability to surprise means our scientific intuitions and traditional tools are ill-suited to designing experiments to test and compare these models. To avoid these pitfalls and realize the full potential of computational modeling, we require tools to design experiments that provide clear answers about which models explain human behavior and the auxiliary assumptions those models must make. Bayesian optimal experimental design (BOED) formalizes the search for optimal experimental designs by identifying experiments that are expected to yield informative data. In this work, we provide a tutorial on leveraging recent advances in BOED and machine learning to find optimal experiments for any kind of model that we can simulate data from, and show how by-products of this procedure allow for quick and straightforward evaluation of models and their parameters against real experimental data. As a case study, we consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks. We validate the presented approach using simulations and a real-world experiment. As compared to experimental designs commonly used in the literature, we show that our optimal designs more efficiently determine which of a set of models best accounts for individual human behavior, and more efficiently characterize behavior given a preferred model. At the same time, formalizing a scientific question such that it can be adequately addressed with BOED can be challenging and we discuss several potential caveats and pitfalls that practitioners should be aware of. We provide code and tutorial notebooks to replicate all analyses.

Translated: 2023-11-30 16:19:20 Published: 2023-11-26

# Supporting single responsibility through automated extract method refactoring ( http://arxiv.org/abs/2305.03428v2 )

License: see the linked page

Alireza Ardalani, Saeed Parsa, Morteza Zakeri-Nasrabadi, Alexander Chatzigeorgiou

The responsibility of a method/function is to perform some desired computations and disseminate the results to its caller through various deliverables, including object fields and variables in output instructions. Based on this definition of responsibility, this paper offers a new algorithm to refactor long methods to those with a single responsibility. We propose a backward slicing algorithm to decompose a long method into slightly overlapping slices. The slices are computed for each output instruction, representing the outcome of a responsibility delegated to the method. The slices will be non-overlapping if the slicing criteria address the same output variable. The slices are further extracted as independent methods, invoked by the original method if certain behavioral preservations are made. The proposed method has been evaluated on the GEMS extract method refactoring benchmark and three real-world projects. On average, our experiments demonstrate at least a 29.6% improvement in precision and a 12.1% improvement in the recall of uncovering refactoring opportunities compared to the state-of-the-art approaches. Furthermore, our tool improves method-level cohesion metrics by an average of 20% after refactoring. Experimental results confirm the applicability of the proposed approach in extracting methods with a single responsibility.
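Backward slicing from an output instruction can be sketched on straight-line code. The sketch below is a deliberately reduced illustration, not the paper's algorithm: it tracks data dependencies only, ignores control flow, and represents each statement as a hypothetical `(defined_variable, used_variables)` pair.

```python
def backward_slice(statements, criterion_index):
    """Indices of the statements the criterion depends on (data dependencies only)."""
    needed = set(statements[criterion_index][1])
    in_slice = [criterion_index]
    # Walk upwards; a statement joins the slice if it defines a needed variable.
    for index in range(criterion_index - 1, -1, -1):
        target, used = statements[index]
        if target in needed:
            in_slice.append(index)
            needed.discard(target)  # this definition satisfies the need...
            needed.update(used)     # ...and creates needs for its own inputs
    return sorted(in_slice)

# A "long method" with two responsibilities: a running total and a count.
statements = [
    ("total", []),               # 0: total = 0
    ("count", []),               # 1: count = 0
    ("total", ["total", "x"]),   # 2: total = total + x
    ("count", ["count"]),        # 3: count = count + 1
    (None, ["total"]),           # 4: print(total)   <- output instruction
    (None, ["count"]),           # 5: print(count)   <- output instruction
]
total_slice = backward_slice(statements, 4)
count_slice = backward_slice(statements, 5)
```

In this toy input the slice for each output instruction contains only the statements feeding that output, so each slice is a candidate for extraction into its own method.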

Translated: 2023-11-30 16:17:24 Published: 2023-11-26

# GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes ( http://arxiv.org/abs/2305.16037v3 )

License: see the linked page

Ibrahim Ethem Hamamci, Sezgin Er, Enis Simsar, Anjany Sekuboyina, Chinmay Prabhakar, Alperen Tezcan, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Furkan Almas, Irem Doğan, Muhammed Furkan Dasdelen, Hadrien Reynaud, Sarthak Pati, Christian Bluethgen, Mehmet Kemal Ozdemir, Bjoern Menze

In this paper, we introduce GenerateCT, a novel approach for generating CT volumes conditioned on free-form medical text prompts. GenerateCT includes a text encoder and three key components: a novel causal vision transformer for encoding CT volumes, a text-image transformer for aligning CT and text tokens, and a text-conditional super-resolution diffusion model. GenerateCT can produce realistic, high-resolution, and high-fidelity 3D chest CT volumes, validated by low FID and FVD scores. To explore GenerateCT's clinical applications, we evaluated its utility in a multi-abnormality classification task. First, we established a baseline by training a multi-abnormality classifier on our real dataset. To further assess the model's generalization to external datasets and its performance with unseen prompts in a zero-shot scenario, we employed an external dataset to train the classifier, setting an additional benchmark. We conducted two experiments in which we doubled the training datasets by synthesizing an equal number of volumes for each set using GenerateCT. The first experiment demonstrated an 11% improvement in the AP score when training the classifier jointly on real and generated volumes. The second experiment showed a 7% improvement when training on both real and generated volumes based on unseen prompts. Moreover, GenerateCT enables the scaling of synthetic training datasets to arbitrary sizes. As an example, we generated 100,000 CT volumes, fivefold the number in our real dataset, and trained the classifier exclusively on these synthetic volumes. Impressively, this classifier surpassed the performance of the one trained on all available real data by a margin of 8%. Lastly, domain experts evaluated the generated volumes, confirming a high degree of alignment with the text prompt. Our code and pre-trained models are available at: https://github.com/ibrahimethemhamamci/GenerateCT

Translated: 2023-11-30 16:10:09 Published: 2023-11-26

# In-Context Impersonation Reveals Large Language Models' Strengths and Biases ( http://arxiv.org/abs/2305.14930v2 )

License: see the linked page

Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, Zeynep Akata

In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is, impersonate, different roles when they generate text in-context. We ask LLMs to assume different personas before solving vision and language tasks. We do this by prefixing the prompt with a persona that is associated either with a social identity or domain expertise. In a multi-armed bandit task, we find that LLMs pretending to be children of different ages recover human-like developmental stages of exploration. In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts. Finally, we test whether LLMs' impersonations are complementary to visual information when describing different categories. We find that impersonation can improve performance: an LLM prompted to be a bird expert describes birds better than one prompted to be a car expert. However, impersonation can also uncover LLMs' biases: an LLM prompted to be a man describes cars better than one prompted to be a woman. These findings demonstrate that LLMs are capable of taking on diverse roles and that this in-context impersonation can be used to uncover their hidden strengths and biases.

Translated: 2023-11-30 16:08:35 Published: 2023-11-26

# Moment Matching Denoising Gibbs Sampling ( http://arxiv.org/abs/2305.11650v4 )

License: see the linked page

Mingtian Zhang and Alex Hawkins-Hooker and Brooks Paige and David Barber

Energy-Based Models (EBMs) offer a versatile framework for modeling complex data distributions. However, training and sampling from EBMs continue to pose significant challenges. The widely-used Denoising Score Matching (DSM) method for scalable EBM training suffers from inconsistency issues, causing the energy model to learn a 'noisy' data distribution. In this work, we propose an efficient sampling framework: (pseudo)-Gibbs sampling with moment matching, which enables effective sampling from the underlying clean model when given a 'noisy' model that has been well-trained via DSM. We explore the benefits of our approach compared to related methods and demonstrate how to scale the method to high-dimensional datasets.
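Why a 'noisy' model still carries information about the clean distribution can be seen in a one-dimensional Gaussian toy case. The sketch assumes clean data x ~ N(0, data_var) and Gaussian corruption with variance noise_var, so the noisy marginal is N(0, data_var + noise_var). It then recovers the first two moments of the denoising posterior p(x | x_noisy) from the noisy score and its derivative using Tweedie-style identities. This is one possible reading of "moment matching", not the authors' implementation, and all names are hypothetical.

```python
def noisy_score(x_noisy, data_var, noise_var):
    """Score (derivative of the log-density) of the noisy marginal N(0, data_var + noise_var)."""
    return -x_noisy / (data_var + noise_var)

def noisy_score_derivative(data_var, noise_var):
    """Derivative of the noisy score; constant for a Gaussian."""
    return -1.0 / (data_var + noise_var)

def denoising_moments(x_noisy, score, score_derivative, noise_var):
    """Posterior mean and variance of the clean x given x_noisy."""
    mean = x_noisy + noise_var * score
    variance = noise_var + noise_var ** 2 * score_derivative
    return mean, variance

data_var, noise_var, x_noisy = 4.0, 1.0, 2.5
mean, variance = denoising_moments(
    x_noisy,
    noisy_score(x_noisy, data_var, noise_var),
    noisy_score_derivative(data_var, noise_var),
    noise_var,
)

# Closed-form Gaussian posterior, for comparison.
exact_mean = x_noisy * data_var / (data_var + noise_var)
exact_variance = data_var * noise_var / (data_var + noise_var)
```

In this toy case the moments obtained from the noisy score coincide with the exact posterior, which suggests how alternating a noising step with a moment-matched denoising step could form a Gibbs-style sampler; for non-Gaussian data the Gaussian fit would only be an approximation.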

Translated: 2023-11-30 16:05:49 Published: 2023-11-26

# On the Linear Convergence of Policy Gradient under Hadamard Parameterization ( http://arxiv.org/abs/2305.19575v2 )

License: see the linked page

Jiacai Liu, Jinchi Chen, and Ke Wei

The convergence of deterministic policy gradient under the Hadamard parameterization is studied in the tabular setting and the linear convergence of the algorithm is established. To this end, we first show that the error decreases at an $O(\frac{1}{k})$ rate for all the iterations. Based on this result, we further show that the algorithm has a faster local linear convergence rate after $k_0$ iterations, where $k_0$ is a constant that only depends on the MDP problem and the initialization. To show the local linear convergence of the algorithm, we have indeed established the contraction of the sub-optimal probability $b_s^k$ (i.e., the probability of the output policy $\pi^k$ on non-optimal actions) when $k\ge k_0$.

翻訳日:2023-11-30 15:58:06 公開日:2023-11-26

# 解釈可能な機械学習モデル発見のための並列座標

Parallel Coordinates for Discovery of Interpretable Machine Learning Models ( http://arxiv.org/abs/2305.18434v2 )

ライセンス: Link先を確認

Dustin Hayes, Boris Kovalerchuk

(参考訳) この研究は、並列座標における視覚的知識発見を用いて、解釈可能な機械学習の手法を前進させる。パラレル座標によるグラフィックデータ表現は、ハイパーキューブとハイパーブロック(hbs)の概念をエンドユーザにとって分かりやすくした。提案したデータ分類アルゴリズムであるHyperでは,混合および純粋なハイパーブロックを用いることが提案されている。ハイパーモデルは決定木を一般化する。アルゴリズムはいくつかの設定とオプションで表示され、インタラクティブ、自動オーバーラップ、非オーバーラップのハイパーブロックを検出する。さらに,視覚パターンの言語記述と連動してハイパーブロックの使用が実証された。 UCI MLリポジトリのベンチマークデータは、Hyperアルゴリズムを評価するために使用された。これにより、10倍のクロスバリデーションを用いて評価した混合HBと純粋なHBの発見が可能となった。ハイパーブロック間の接続、次元縮小、可視化が確立されている。エンドユーザーがハイパーブロックを見つけて観察する能力と、パターンを明確にするためのサイドバイサイドの可視化能力は、ハイパーブロック技術とハイパーアルゴリズムの大きな利点である。従来の並列座標ではサポートされていないが,不完全なn-Dデータを不完全な値で可視化する新しい手法を提案する。 HBが決定木上のデータの過一般化と過適合の両方を防止できる能力は、ハイパーブロックの別の利点として示される。ハイパーテクノロジーを実装するviscanvas 2.0ソフトウェアツールの特徴を紹介する。

This work uses visual knowledge discovery in parallel coordinates to advance methods of interpretable machine learning. The graphic data representation in parallel coordinates made the concepts of hypercubes and hyperblocks (HBs) simple to understand for end users. It is suggested to use mixed and pure hyperblocks in the proposed data classifier algorithm Hyper. It is shown that Hyper models generalize decision trees. The algorithm is presented in several settings and options to discover interactively or automatically overlapping or non-overlapping hyperblocks. Additionally, the use of hyperblocks in conjunction with language descriptions of visual patterns is demonstrated. The benchmark data from the UCI ML repository were used to evaluate the Hyper algorithm. It enabled the discovery of mixed and pure HBs evaluated using 10-fold cross validation. Connections among hyperblocks, dimension reduction and visualization have been established. The capability of end users to find and observe hyperblocks, as well as the ability of side-by-side visualizations to make patterns evident, are among major advantages of hyperblock technology and the Hyper algorithm. A new method to visualize incomplete n-D data with missing values is proposed, while the traditional parallel coordinates do not support it. The ability of HBs to better prevent both overgeneralization and overfitting of data over decision trees is demonstrated as another benefit of the hyperblocks. The features of VisCanvas 2.0 software tool that implements Hyper technology are presented.
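
To make the notion of pure and mixed hyperblocks concrete, here is a minimal sketch. It assumes a hyperblock is an axis-aligned box given by one `(low, high)` interval per attribute, and that a block is "pure" when the points it covers all share one class. The function names and the toy data are hypothetical and are not taken from the Hyper algorithm or VisCanvas.

```python
def contains(block, point):
    """True if every coordinate lies inside the block's [low, high] interval."""
    return all(lo <= v <= hi for (lo, hi), v in zip(block, point))


def block_classes(block, data):
    """Set of class labels of the data points that fall inside the block."""
    return {label for point, label in data if contains(block, point)}


def is_pure(block, data):
    """In this sketch a block is 'pure' if it covers points of exactly one class."""
    return len(block_classes(block, data)) == 1


# Hypothetical 2-D toy data: (attribute vector, class label).
data = [((0.2, 0.3), "A"), ((0.4, 0.1), "A"), ((0.8, 0.9), "B"), ((0.5, 0.6), "B")]

# One (low, high) interval per attribute.
pure_block = [(0.0, 0.45), (0.0, 0.4)]
mixed_block = [(0.0, 0.6), (0.0, 0.7)]
```

Under these assumptions a pure block reads directly as a rule, for example "if 0.0 <= x1 <= 0.45 and 0.0 <= x2 <= 0.4 then class A", which is the kind of interval pattern that parallel coordinates display as a band across the axes.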

翻訳日:2023-11-30 15:56:35 公開日:2023-11-26

# 物理インフォームドニューラルネットワークにおける外挿故障の理解と緩和

Understanding and Mitigating Extrapolation Failures in Physics-Informed Neural Networks ( http://arxiv.org/abs/2306.09478v2 )

ライセンス: Link先を確認

Lukas Fesser, Luca D'Amico-Wong, Richard Qiu

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、ディープニューラルネットワーク(DNN)を用いた偏微分方程式(PDE)の効率的な近似によって最近人気を博している。しかし、それらのドメイン外振舞いはよく理解されておらず、以前の研究では、解関数に高周波成分が存在することが外挿性能の悪い原因になるかもしれないと推測されている。本稿では,高次元pdesを含む異なる種類のpdesの代表的な集合に対するピンの補間挙動について検討する。その結果,外挿障害は解関数の高周波数によるものではなく,フーリエスペクトルの時間的支持の変化によるものであることがわかった。本稿では、これらのスペクトルシフトを、WWF(Weighted Wasserstein-Fourier distance)を導入して定量化する。 WWFは、PINN外挿性能の予測に利用でき、重要なスペクトルシフトがない場合には、PINN外挿性能においても真の解に近づいたままであることを示す。最後に,より大きなスペクトルシフトの影響を緩和し,補間誤差を最大82%低減するトランスファー学習に基づく戦略を提案する。

Physics-informed Neural Networks (PINNs) have recently gained popularity due to their effective approximation of partial differential equations (PDEs) using deep neural networks (DNNs). However, their out of domain behavior is not well understood, with previous work speculating that the presence of high frequency components in the solution function might be to blame for poor extrapolation performance. In this paper, we study the extrapolation behavior of PINNs on a representative set of PDEs of different types, including high-dimensional PDEs. We find that failure to extrapolate is not caused by high frequencies in the solution function, but rather by shifts in the support of the Fourier spectrum over time. We term these spectral shifts and quantify them by introducing a Weighted Wasserstein-Fourier distance (WWF). We show that the WWF can be used to predict PINN extrapolation performance, and that in the absence of significant spectral shifts, PINN predictions stay close to the true solution even in extrapolation. Finally, we propose a transfer learning-based strategy to mitigate the effects of larger spectral shifts, which decreases extrapolation errors by up to 82%.

翻訳日:2023-11-30 15:46:00 公開日:2023-11-26

# 確率的プログラムを用いた大規模言語モデルの逐次モンテカルロステアリング

Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs ( http://arxiv.org/abs/2306.03081v2 )

ライセンス: Link先を確認

Alexander K. Lew, Tan Zhi-Xuan, Gabriel Grand, and Vikash K. Mansinghka

(参考訳) 微調整と強化学習の後でも、大きな言語モデル(llm)は不可能ではないが、プロンプトだけで確実に制御することは困難である。連続モンテカルロステアリング(SMC)と呼ばれるLCMの出力に構文的および意味的制約を強制する新しい推論時手法を提案する。鍵となるアイデアは、言語生成タスクを離散確率系列モデルにおける後続推論問題として指定し、標準復号を逐次モンテカルロ推論に置き換えることである。ビームサーチと同様の計算コストのために、SMC は LLM を操り、埋め込み、構文制約による生成、交差点の促進など様々なタスクを解くことができる。 smcステアリングの実験を容易にするために、新しい世代のタスクを言語モデル確率プログラムとして簡潔に指定し、llamaファミリートランスフォーマーのステアリングを自動化する、確率的プログラミングライブラリllamppl(https://github.com/probcomp/hfppl)を提案する。

Even after fine-tuning and reinforcement learning, large language models (LLMs) can be difficult, if not impossible, to control reliably with prompts alone. We propose a new inference-time approach to enforcing syntactic and semantic constraints on the outputs of LLMs, called sequential Monte Carlo (SMC) steering. The key idea is to specify language generation tasks as posterior inference problems in a class of discrete probabilistic sequence models, and replace standard decoding with sequential Monte Carlo inference. For a computational cost similar to that of beam search, SMC can steer LLMs to solve diverse tasks, including infilling, generation under syntactic constraints, and prompt intersection. To facilitate experimentation with SMC steering, we present a probabilistic programming library, LLaMPPL (https://github.com/probcomp/hfppl), for concisely specifying new generation tasks as language model probabilistic programs, and automating steering of LLaMA-family Transformers.
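
The propose, weight, resample loop behind sequential Monte Carlo steering can be illustrated on a toy problem. The sketch below is not the LLaMPPL API: the "language model" is a hypothetical context-free token distribution, the constraint is a simple banned-word list acting as a 0/1 potential, and all names are assumptions made for illustration.

```python
import random

# Hypothetical toy "language model": a fixed, context-free token distribution.
VOCAB = ["the", "cat", "sat", "damn", "."]
PROBS = [0.3, 0.25, 0.2, 0.15, 0.1]
BANNED = {"damn"}  # hypothetical hard constraint on the output


def smc_steer(n_particles, n_steps, rng):
    """Toy SMC: extend particles, weight by the constraint, then resample."""
    particles = [[] for _ in range(n_particles)]
    for _ in range(n_steps):
        # Propose: extend each particle with a token sampled from the toy model.
        for p in particles:
            p.append(rng.choices(VOCAB, weights=PROBS)[0])
        # Weight: the constraint acts as a 0/1 potential on the newest token.
        weights = [0.0 if p[-1] in BANNED else 1.0 for p in particles]
        if sum(weights) == 0.0:
            raise RuntimeError("all particles violated the constraint")
        # Resample: draw particles in proportion to their weights.
        drawn = rng.choices(particles, weights=weights, k=n_particles)
        particles = [list(p) for p in drawn]
    return particles


samples = smc_steer(n_particles=20, n_steps=6, rng=random.Random(0))
```

Because the proposal here is the model itself and the potential is 0/1, the surviving particles are sequences that satisfy the constraint at every step. With a real LLM the proposal and the potentials would be richer, but the loop has the same shape.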

翻訳日:2023-11-30 15:43:39 公開日:2023-11-26

# 失認者再確認のための消去・変換・通知防御ネットワーク

Erasing, Transforming, and Noising Defense Network for Occluded Person Re-Identification ( http://arxiv.org/abs/2307.07187v3 )

ライセンス: Link先を確認

Neng Dong, Liyan Zhang, Shuanglin Yan, Hao Tang and Jinhui Tang

(参考訳) オクルージョンによる摂動は、人物の再識別(re-ID)において重大な課題を示し、外部の視覚的手がかりに依存する既存の手法では、追加の計算資源を必要とし、オクルージョンによる情報の欠落の問題のみを考慮する。本稿では, 騒音障害としてオクルージョンを扱い, 敵防御の観点から隠蔽された人物のre-IDを解消する, 消去, トランスフォーミング, 騒音防御ネットワーク (ETNDNet) という, シンプルで効果的なフレームワークを提案する。提案するETNDNetでは,まず特徴マップをランダムに消去し,不完全な情報を持つ敵表現を生成する。第2に,オクルージョンによる位置ずれをシミュレートするランダムな変換を導入し,抽出器と分類器を敵対的に訓練し,不整合情報に対する堅牢な表現を学習する。第3に,障害物や非目標歩行者が導入した騒音情報に対処するために,ランダムな値で特徴マップを摂動させ,re-IDシステムにおいて敵ゲーミングを採用し,閉塞音に対する耐性を高める。 ETNDNetには3つの重要なハイライトがある。 (i)パラメータを持つ外部モジュールを一切必要としない。 (ii)障害物や非目標歩行者からの閉塞による諸問題を効果的に処理する。 (iii)隠蔽された人物のre-IDのための最初のGANベースの敵防衛パラダイムを設計する。 5つの公開データセットに対する大規模な実験は、提案したETNDNetの有効性、優位性、実用性を完全に証明している。コードは https://github.com/nengdong96/ETNDNet でリリースされる。

Occlusion perturbation presents a significant challenge in person re-identification (re-ID), and existing methods that rely on external visual cues require additional computational resources and only consider the issue of missing information caused by occlusion. In this paper, we propose a simple yet effective framework, termed Erasing, Transforming, and Noising Defense Network (ETNDNet), which treats occlusion as a noise disturbance and solves occluded person re-ID from the perspective of adversarial defense. In the proposed ETNDNet, we introduce three strategies: Firstly, we randomly erase the feature map to create an adversarial representation with incomplete information, enabling adversarial learning of identity loss to protect the re-ID system from the disturbance of missing information. Secondly, we introduce random transformations to simulate the position misalignment caused by occlusion, training the extractor and classifier adversarially to learn robust representations immune to misaligned information. Thirdly, we perturb the feature map with random values to address noisy information introduced by obstacles and non-target pedestrians, and employ adversarial gaming in the re-ID system to enhance its resistance to occlusion noise. Without bells and whistles, ETNDNet has three key highlights: (i) it does not require any external modules with parameters, (ii) it effectively handles various issues caused by occlusion from obstacles and non-target pedestrians, and (iii) it designs the first GAN-based adversarial defense paradigm for occluded person re-ID. Extensive experiments on five public datasets fully demonstrate the effectiveness, superiority, and practicality of the proposed ETNDNet. The code will be released at https://github.com/nengdong96/ETNDNet.

翻訳日:2023-11-30 15:36:29 公開日:2023-11-26

# WavJourney: 大きな言語モデルによる作曲オーディオ作成

WavJourney: Compositional Audio Creation with Large Language Models ( http://arxiv.org/abs/2307.14335v2 )

ライセンス: Link先を確認

Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

(参考訳) 音声生成モデルの進歩にもかかわらず、その能力は音声の書き起こしや音声キャプションのようなドメイン固有の条件に限られることが多い。しかし、現実の音声生成は、音声、音楽、音響効果などの様々な要素を含む調和した音声を制御可能な条件で生成することを目的としており、既存の音声生成システムでは対処が難しい。本稿では,大規模言語モデル(llms)を活用した新しいフレームワークであるwavjourneyを提案する。 WavJourneyを使えば、ユーザーはテキストによる説明だけで様々なオーディオ要素でストーリーテリングオーディオコンテンツを作成できる。具体的には、テキスト命令が与えられた場合、WavJourney はまず LLM に対して、オーディオ要素の構造的意味表現として機能するオーディオスクリプトを生成するよう促す。音声スクリプトはコンピュータプログラムに変換され、プログラムの各行はタスク固有のオーディオ生成モデルまたは計算操作関数を呼び出す。そして、コンピュータプログラムを実行し、音声生成のための構成的で解釈可能なソリューションを得る。実験結果から,WavJourneyはテキスト記述された意味的,空間的,時間的条件に整合した現実的な音声を合成し,テキストから音声生成のベンチマークで最先端の結果が得られることが示唆された。さらに,新しいマルチジャンル・ストーリー・ベンチマークを導入する。主観評価はWavJourneyがテキストから魅力的なストーリーテリング音声コンテンツを制作する可能性を示している。さらにwavjourneyがマルチラウンド対話における人間と機械の共創を促進することを実証する。今後の研究を促進するため、コードと合成オーディオはhttps://audio-agi.github.io/wavjourney_demopage/で入手できる。

Despite breakthroughs in audio generation models, their capabilities are often confined to domain-specific conditions such as speech transcriptions and audio captions. However, real-world audio creation aims to generate harmonious audio containing various elements such as speech, music, and sound effects with controllable conditions, which is challenging to address using existing audio generation systems. We present WavJourney, a novel framework that leverages Large Language Models (LLMs) to connect various audio models for audio creation. WavJourney allows users to create storytelling audio content with diverse audio elements simply from textual descriptions. Specifically, given a text instruction, WavJourney first prompts LLMs to generate an audio script that serves as a structured semantic representation of audio elements. The audio script is then converted into a computer program, where each line of the program calls a task-specific audio generation model or computational operation function. The computer program is then executed to obtain a compositional and interpretable solution for audio creation. Experimental results suggest that WavJourney is capable of synthesizing realistic audio aligned with textually-described semantic, spatial and temporal conditions, achieving state-of-the-art results on text-to-audio generation benchmarks. Additionally, we introduce a new multi-genre story benchmark. Subjective evaluations demonstrate the potential of WavJourney in crafting engaging storytelling audio content from text. We further demonstrate that WavJourney can facilitate human-machine co-creation in multi-round dialogues. To foster future research, the code and synthesized audio are available at: https://audio-agi.github.io/WavJourney_demopage/.

翻訳日:2023-11-30 15:24:28 公開日:2023-11-26

# RANSACを用いた教師なし画像異常検出

Unsupervised Image Outlier Detection using RANSAC ( http://arxiv.org/abs/2307.12301v2 )

ライセンス: Link先を確認

Chen-Han Tsai, Yu-Shao Peng

(参考訳) 画像異常検出(OD)は、コンピュータビジョンタスクで使用される画像データセットの品質と精度を保証するための重要なツールである。しかし、既存のアプローチのほとんどは、アウトレイラ予測に先立ってトレーニングのために、一連の分散データを必要とする。データの品質と量は、結果のパフォーマンスに影響を与える可能性がある。したがって、適切な分配集合を選択するには、しばしばかなりの労力を要する。本研究では,一級分類方式で汚染された集合内の外れ値を検出するための教師なし画像ODアルゴリズムであるRANSAC-NNを提案する。 RANSAC-NNはトレーニングなしで、様々なODベンチマークで確立された他の方法と比較して好適に機能する。さらに,本手法は,RANSAC-NNを前処理中に簡単に適用することで,既存のOD手法の堅牢性を高めることができることを示す。

Image outlier detection (OD) is an essential tool to ensure the quality and accuracy of image datasets used in computer vision tasks. Most existing approaches, however, require a set of in-distribution data for training prior to outlier prediction. The quality and quantity of the data can influence the resulting performance. Thus, selecting a suitable in-distribution set often requires considerable effort. In this work, we propose RANSAC-NN, an unsupervised image OD algorithm designed to detect outliers within contaminated sets in a one-class classification fashion. Without any training, RANSAC-NN performs favorably in comparison to other well-established methods in a variety of OD benchmarks. Furthermore, we show that our method can enhance the robustness of existing OD methods by simply applying RANSAC-NN during pre-processing.

翻訳日:2023-11-30 15:23:15 公開日:2023-11-26

# 銀行業務自動化のためのマルチモーダル文書分析

Multimodal Document Analytics for Banking Process Automation ( http://arxiv.org/abs/2307.11845v2 )

ライセンス: Link先を確認

Christopher Gerling, Stefan Lessmann

(参考訳) 従来の銀行は急速に発展する金融エコシステムにおいてフィンテックとの競争が激化している。この課題に対処するには,運用効率の向上が不可欠だ。本研究の目的は,銀行における文書集約型ビジネスプロセスの効率化である。そこで我々はまず,小売部門における業務文書の状況について概観する。バンキング文書はテキスト、レイアウト、視覚を含むことが多く、文書分析とプロセスの自動化には通常の自然言語処理(NLP)以上のものが必要であることを示唆している。これを検証し、ビジネス文書処理時の視覚的手がかりの漸進的価値を評価するために、最近提案されたLayoutXLMと呼ばれるマルチモーダルモデルと強力なテキスト分類器(例えばBERT)と大規模言語モデル(例えばGPT)を比較した。その結果,レイアウト情報をモデルに組み込むことで性能が大幅に向上することが確認された。興味深いことに、最高のモデルパフォーマンス(f1スコアの観点から)の75%以上が、トレーニングデータの30%以下で達成可能であることもわかりました。これは、マルチモーダルモデルを構築するためのラベル付きデータの要求が適度であることを示し、マルチモーダル文書分析の現実的な応用を単純化する。また,マルチモーダルバンキング文書分類器の校正範囲において,微調整の必要性を含め,より具体的な実践について考察した。本論文は,銀行業務における文書処理におけるマルチモデルモデルの有効性と効率に関する実証的証拠を提示し,この可能性を日々の業務において解き放つための実践的なガイダンスを提供する。

Traditional banks face increasing competition from FinTechs in the rapidly evolving financial ecosystem. Raising operational efficiency is vital to address this challenge. Our study aims to improve the efficiency of document-intensive business processes in banking. To that end, we first review the landscape of business documents in the retail segment. Banking documents often contain text, layout, and visuals, suggesting that document analytics and process automation require more than plain natural language processing (NLP). To verify this and assess the incremental value of visual cues when processing business documents, we compare a recently proposed multimodal model called LayoutXLM to powerful text classifiers (e.g., BERT) and large language models (e.g., GPT) in a case study related to processing company register extracts. The results confirm that incorporating layout information in a model substantially increases its performance. Interestingly, we also observed that more than 75% of the best model performance (in terms of the F1 score) can be achieved with as little as 30% of the training data. This shows that the demand for labeled data to set up a multimodal model can be moderate, which simplifies real-world applications of multimodal document analytics. Our study also sheds light on more specific practices in the scope of calibrating a multimodal banking document classifier, including the need for fine-tuning. In sum, the paper contributes original empirical evidence on the effectiveness and efficiency of multimodal models for document processing in the banking business and offers practical guidance on how to unlock this potential in day-to-day operations.

翻訳日:2023-11-30 15:22:22 公開日:2023-11-26

# 双方向積分近似による完全拡散反転

Exact Diffusion Inversion via Bi-directional Integration Approximation ( http://arxiv.org/abs/2307.10829v6 )

ライセンス: Link先を確認

Guoqiang Zhang and J. P. Lewis and W. Bastiaan Kleijn

(参考訳) 近年,EDICT[36]やNull-textインバージョン[22]などの画像編集を可能にするために,DDIMインバージョンの不整合問題に対処する様々な手法が提案されている。しかし、上記の手法は計算オーバーヘッドがかなり大きい。本稿では,双方向積分近似(bi-directional integration approximation, BDIA)と呼ばれる新しい手法を提案し,無視できる計算オーバーヘッドで正確な拡散反転を行う。タイムステップ $t_i$ において,履歴情報 $(i,\boldsymbol{z}_i)$ と $(i+1,\boldsymbol{z}_{i+1})$ を用いて次の拡散状態 $\boldsymbol{z}_{i-1}$ を推定したいとする。まず、推定されたガウスノイズ $\hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i)$ を取得し、次にDDIM更新手順を2回適用して、次の時間スロット$[t_i, t_{i-1}]$ 上のODE積分を前方方向に、前の時間スロット$[t_i, t_{i+1}]$ 上のODE積分を後方方向に近似する。前の時間スロットのDDIMステップは、$\boldsymbol{z}_i$を計算する際に以前になされた積分近似を洗練するために使用される。 BDIA-DDIMのよい性質は、$\boldsymbol{z}_{i-1}$の更新式が$(\boldsymbol{z}_{i+1}, \boldsymbol{z}_i, \hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i))$の線形結合であることである。これにより、$(\boldsymbol{z}_i, \boldsymbol{z}_{i-1})$が与えられたときの$\boldsymbol{z}_{i+1}$の正確な逆計算が可能になり、正確な拡散反転をもたらす。 (ラウンドトリップ)BDIA-DDIMが特に画像編集に有効であることを実験により実証した。さらに,BDIA-DDIMはテキスト・ツー・イメージ生成において,DDIMよりも優れた画像サンプリング特性が得られることを示した。 BDIAはDDIMに加えて他のODEソルバの性能向上にも応用できる。本研究は,BDIAをEDMサンプリング手順に適用することにより,4つの事前学習モデルにおいて一貫して優れた性能が得られることを示す。

Recently, various methods have been proposed to address the inconsistency issue of DDIM inversion to enable image editing, such as EDICT [36] and Null-text inversion [22]. However, the above methods introduce considerable computational overhead. In this paper, we propose a new technique, named \emph{bi-directional integration approximation} (BDIA), to perform exact diffusion inversion with negligible computational overhead. Suppose we would like to estimate the next diffusion state $\boldsymbol{z}_{i-1}$ at timestep $t_i$ with the historical information $(i,\boldsymbol{z}_i)$ and $(i+1,\boldsymbol{z}_{i+1})$. We first obtain the estimated Gaussian noise $\hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i)$, and then apply the DDIM update procedure twice for approximating the ODE integration over the next time-slot $[t_i, t_{i-1}]$ in the forward manner and the previous time-slot $[t_i, t_{i+1}]$ in the backward manner. The DDIM step for the previous time-slot is used to refine the integration approximation made earlier when computing $\boldsymbol{z}_i$. A nice property of BDIA-DDIM is that the update expression for $\boldsymbol{z}_{i-1}$ is a linear combination of $(\boldsymbol{z}_{i+1}, \boldsymbol{z}_i, \hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i))$. This allows for exact backward computation of $\boldsymbol{z}_{i+1}$ given $(\boldsymbol{z}_i, \boldsymbol{z}_{i-1})$, thus leading to exact diffusion inversion. It is demonstrated with experiments that (round-trip) BDIA-DDIM is particularly effective for image editing. Our experiments further show that BDIA-DDIM produces markedly better image sampling qualities than DDIM for text-to-image generation. BDIA can also be applied to improve the performance of other ODE solvers in addition to DDIM. In our work, it is found that applying BDIA to the EDM sampling procedure produces consistently better performance over four pre-trained models.
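
The invertibility argument in the abstract rests only on the update for $\boldsymbol{z}_{i-1}$ being linear in $\boldsymbol{z}_{i+1}$ once $\boldsymbol{z}_i$ and the noise estimate are fixed. The sketch below checks that property numerically. The coefficients `A`, `B`, `C` and the function `eps_model` are hypothetical placeholders: the actual BDIA-DDIM coefficients depend on the noise schedule and are not reproduced here.

```python
import math

# Hypothetical constant coefficients; A must be nonzero for the inverse to exist.
A, B, C = 0.9, 0.2, -0.3


def eps_model(z, i):
    """Hypothetical stand-in for the noise estimate eps_hat(z_i, i)."""
    return [math.tanh(v) * (0.1 + 0.01 * i) for v in z]


def forward_step(z_next, z_cur, i):
    """z_{i-1} = A*z_{i+1} + B*z_i + C*eps_hat(z_i, i), elementwise."""
    e = eps_model(z_cur, i)
    return [A * zn + B * zc + C * ev for zn, zc, ev in zip(z_next, z_cur, e)]


def inverse_step(z_cur, z_prev, i):
    """Solve the same relation for z_{i+1}, given (z_i, z_{i-1})."""
    e = eps_model(z_cur, i)
    return [(zp - B * zc - C * ev) / A for zp, zc, ev in zip(z_prev, z_cur, e)]


def sample(z_top, z_below_top, n_steps):
    """Run the update from index n_steps down to index 0."""
    traj = {n_steps: z_top, n_steps - 1: z_below_top}
    for i in range(n_steps - 1, 0, -1):
        traj[i - 1] = forward_step(traj[i + 1], traj[i], i)
    return traj


def invert(z_0, z_1, n_steps):
    """Walk back up from (z_0, z_1) to index n_steps."""
    traj = {0: z_0, 1: z_1}
    for i in range(1, n_steps):
        traj[i + 1] = inverse_step(traj[i], traj[i - 1], i)
    return traj


traj = sample([0.5, -1.2, 2.0], [0.4, -1.0, 1.8], n_steps=10)
back = invert(traj[0], traj[1], n_steps=10)
max_err = max(abs(a - b) for a, b in zip(back[10], traj[10]))
```

In this sketch the round trip recovers the starting state up to floating-point rounding, because `eps_model` is evaluated at the same $\boldsymbol{z}_i$ in both directions. Note that the inverse pass needs the pair $(\boldsymbol{z}_0, \boldsymbol{z}_1)$; how the method obtains that pair in practice is outside what this sketch covers.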

翻訳日:2023-11-30 15:21:55 公開日:2023-11-26

# より表現力のあるグラフニューラルネットワークは生成タスクを改善するか?

Will More Expressive Graph Neural Networks do Better on Generative Tasks? ( http://arxiv.org/abs/2308.11978v2 )

ライセンス: Link先を確認

Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao

(参考訳) グラフ生成は、与えられたラベルに基づいて、複数のノードとエッジを持つ完全なグラフを予測するため、大きな課題となる。この課題は、デノボ薬や分子設計を含む多くの現実世界の応用にも根本的な重要性を持っている。近年,グラフ生成分野においていくつかの手法が成功している。しかしながら、これらの手法は、(1)基礎となるグラフニューラルネットワーク(GNN)アーキテクチャがしばしば過小評価され、(2)限られた数のメトリクスで評価されることの2つの重大な欠点に悩まされている。このギャップを埋めるために、グラフ生成モデルの基盤となるGNNをより表現力のあるGNNに置き換えることで、分子グラフ生成タスクの文脈下でのGNNの表現性を調査する。具体的には、ZINC-250kデータセット上の6つの分子生成目標に対する6つのGNNの性能を、GCPNやGraphAFのような自己回帰生成モデルと、GraphEBMのような1ショット生成モデルという2つの異なる生成フレームワークで分析する。 GNNは,分子生成タスクにおけるGCPN,GraphAF,GraphEBMの性能を向上させることができるが,GNN表現性は優れたGNN生成モデルに必要な条件ではない。さらに,提案する分子生成目標 (DRD2, Median1, Median2) に基づいて, 変分オートエンコーダやベイズ最適化モデルなどの非GNNグラフ生成手法を用いて, 高度GNNを用いたGCPNとGraphAFの最先端結果が得られることを示す。

Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based on simply a given label. This task also carries fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs under the context of the molecular graph generation task, by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs on six different molecular generative objectives on the ZINC-250k dataset in two different generative frameworks: autoregressive generation models, such as GCPN and GraphAF, and one-shot generation models, such as GraphEBM. Through our extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN, GraphAF, and GraphEBM on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results across 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design.

翻訳日:2023-11-30 15:12:59 公開日:2023-11-26

# LLMエージェントに社会原理はあるか?

Is There Any Social Principle for LLM-Based Agents? ( http://arxiv.org/abs/2308.11136v2 )

ライセンス: Link先を確認

Jitao Bai, Simiao Zhang, Zhonghao Chen

(参考訳) 大規模言語モデルに基づくエージェントは、人間中心のアライメントやアプリケーション以上のものを含むべきである。エージェント自体により多くの注意を払うべきであり、エージェントに適した社会科学を確立する可能性について議論すべきである。

Focus on Large Language Model based agents should involve more than "human-centered" alignment or application. We argue that more attention should be paid to the agent itself and discuss the potential of establishing tailored social sciences for agents.

翻訳日:2023-11-30 15:12:27 公開日:2023-11-26

# アンハーモニック・アライアンス:正確なWKBはETPと出会う

An anharmonic alliance: exact WKB meets EPT ( http://arxiv.org/abs/2309.02505v2 )

ライセンス: Link先を確認

Bruno Bucciotti, Tomas Reis, and Marco Serone

(参考訳) 離散スペクトルを持つある種の量子力学系において、可観測値は$\hbar$の半連続で与えられることが示され、ボレル再帰可能な拡張を持つ$\hbar_0$-deformationsは、元のモデルを$\hbar_0=\hbar$で再現する。このような拡張はExact Perturbation Theory (EPT)と呼ばれた。本研究では, 多項式量子力学系のスペクトルを調べることにより, 厳密な wkb 法の枠組みの中で, 上記の結果が得られるかを検討する。正確な wkb の中で、エネルギー固有値は voros の記号 $a_{\gamma_i}$, $\gamma_i$ で定義される正確な量子化条件によって決定され、一般に $\hbar$ で変換される。準調和ポテンシャルにおけるエネルギー固有値のボレル和が正確なWKBでどのように出現するかをレビューした後、量子補正で高次無調和ポテンシャルに拡張する。次に、任意の多項式ポテンシャルが、正確な量子化条件が単に$a_\gamma=-1$と読み取るモデルに$\hbar_0$-変形できることを示し、すべてのエネルギー固有値に対して EPT Borel 再帰級数をもたらす。

Certain quantum mechanical systems with a discrete spectrum, whose observables are given by a transseries in $\hbar$, were shown to admit $\hbar_0$-deformations with Borel resummable expansions which reproduce the original model at $\hbar_0=\hbar$. Such expansions were dubbed Exact Perturbation Theory (EPT). We investigate how the above results can be obtained within the framework of the exact WKB method by studying the spectrum of polynomial quantum mechanical systems. Within exact WKB, energy eigenvalues are determined by exact quantization conditions defined in terms of Voros symbols $a_{\gamma_i}$, $\gamma_i$ being their associated cycles, and generally give rise to transseries in $\hbar$. After reviewing how the Borel summability of energy eigenvalues in the quartic anharmonic potential emerges in exact WKB, we extend it to higher order anharmonic potentials with quantum corrections. We then show that any polynomial potential can be $\hbar_0$-deformed to a model where the exact quantization condition reads simply $a_\gamma=-1$ and leads to the EPT Borel resummable series for all energy eigenvalues.

翻訳日:2023-11-30 15:02:55 公開日:2023-11-26

# Threshold KNN-Shapley: データ評価に対する線形時間とプライバシフレンドリなアプローチ

Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation ( http://arxiv.org/abs/2308.15709v2 )

ライセンス: Link先を確認

Jiachen T. Wang, Yuqing Zhu, Yu-Xiang Wang, Ruoxi Jia, Prateek Mittal

(参考訳) データ評価は、トレーニング機械学習(ml)モデルにおける個々のデータソースの有用性を定量化することを目的としており、データ中心のml研究の重要な側面である。しかし、データのバリュエーションは、その重要性にもかかわらずプライバシー上の問題にしばしば見過ごされる。本稿では,近年最も実践的なデータ評価手法であるKNN-Shapleyに着目し,これらの課題について考察する。我々はまず、KNN-Shapleyの固有のプライバシーリスクを強調し、KNN-Shapleyを差分プライバシー(DP)に適合させる上で重要な技術的困難を実証する。これらの課題を克服するために、プライバシーに配慮したKNN-Shapleyの改良版であるTKNN-Shapleyを導入する。 DP-TKNN-Shapleyにはいくつかの利点があり、データ品質の差別化において、民営化されたKNN-Shapleyに比べ、プライバシー利用のトレードオフが優れていることを示す。さらに、プライベートでないTKNN-Shapleyでさえ、KNN-Shapleyと同等のパフォーマンスを実現している。全体としては、TKNN-ShapleyはKNN-Shapleyに代わる有望な代替手段であることを示している。

Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. However, data valuation faces significant yet frequently overlooked privacy challenges despite its importance. This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods nowadays. We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical difficulties in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate DP guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a superior privacy-utility tradeoff compared to naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves performance comparable to KNN-Shapley. Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data.

翻訳日:2023-11-30 14:59:42 公開日:2023-11-26

# Video Task Decathlon: 自動運転における画像とビデオタスクの統合

Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving ( http://arxiv.org/abs/2309.04422v2 )

ライセンス: Link先を確認

Thomas E. Huang, Yifan Liu, Luc Van Gool, Fisher Yu

(参考訳) 動的シーンで複数の異種視覚タスクを実行することは、人間の知覚能力の要点である。表現学習による画像およびビデオ認識の著しい進歩にもかかわらず、現在の研究は、タスクの特異性、均質性、あるいは単純な組み合わせのための特別なネットワークの設計に焦点を当てている。そこで我々は,様々な入出力構造を有する自律運転における主要画像および映像認識タスクのための統一モデルの構築について検討する。そこで本研究では,対象と画素の分類,セグメンテーション,局所化,関連付けにまたがる10の代表的な画像および映像タスクを含む,新たな課題であるvtd(video task decathlon)を設計した。 VTDでは,1つの構造と1組の重みを持つ統一ネットワークであるVTDNetを,全10タスクに対して開発する。 VTDNetは同様のタスクをグループ化し、タスクグループ内およびタスクグループ間で情報交換を行う。すべてのタスクにラベル付けする非現実性や,多数のタスクの共同トレーニングに伴うパフォーマンス劣化を考慮し,VTDNetの学習に成功し,性能損失を軽減するためのカリキュラムトレーニング,擬似ラベル付け,ファインチューニング(CPF)方式を設計する。 CPFで武装したVTDNetは、ほとんどのタスクにおいて、全体の20%しか計算できないシングルタスクよりも大幅に優れている。 vtdは、自動運転における知覚タスクの統一を探求するための有望な新しい方向である。

Performing multiple heterogeneous visual tasks in dynamic scenes is a hallmark of human perception capability. Despite remarkable progress in image and video recognition via representation learning, current research still focuses on designing specialized networks for singular, homogeneous, or simple combination of tasks. We instead explore the construction of a unified model for major image and video recognition tasks in autonomous driving with diverse input and output structures. To enable such an investigation, we design a new challenge, Video Task Decathlon (VTD), which includes ten representative image and video tasks spanning classification, segmentation, localization, and association of objects and pixels. On VTD, we develop our unified network, VTDNet, that uses a single structure and a single set of weights for all ten tasks. VTDNet groups similar tasks and employs task interaction stages to exchange information within and between task groups. Given the impracticality of labeling all tasks on all frames, and the performance degradation associated with joint training of many tasks, we design a Curriculum training, Pseudo-labeling, and Fine-tuning (CPF) scheme to successfully train VTDNet on all tasks and mitigate performance loss. Armed with CPF, VTDNet significantly outperforms its single-task counterparts on most tasks with only 20% overall computations. VTD is a promising new direction for exploring the unification of perception tasks in autonomous driving.

翻訳日:2023-11-30 14:47:35 公開日:2023-11-26

# 単一磁束量子回路から室温へのフォトニックリンク

Photonic link from single flux quantum circuits to room temperature ( http://arxiv.org/abs/2309.03284v2 )

ライセンス: Link先を確認

Mohan Shen, Jiacheng Xie, Yuntao Xu, Sihao Wang, Risheng Cheng, Wei Fu, Yiyu Zhou, Hong X. Tang

(参考訳) 低温環境と室温環境の間の広帯域でエネルギー効率の高い信号伝達は、超伝導量子回路や古典論理回路において大きなボトルネックとなっている。フォトニックリンクは、高い帯域幅と低い熱負荷を同時に提供することで、この課題を克服することを約束している。しかし、電気信号のフォトニック読み出しの鍵となる極低温電気光学変調器の開発は、超伝導回路の厳しい要求により阻まれてきた。例えば、ラピッド単一磁束量子(RSFQ)回路は、従来の回路で使用されるボルトレベルの信号よりもはるかに低い数ミリボルト(mV)の小さな信号振幅で動作している。本稿では,1m長のSEOMで42mVという記録的に低い半波電圧$V_{\pi}$を持つ新しい超伝導電気光学変調器(SEOM)により,追加の電気増幅を行なわずにRSFQ回路の直接光読み出しを初めて実証する。超伝導体の低オーミック損失を利用して、$V_{\pi}$と帯域幅の基本的なトレードオフを破り、低温で0.2m長のSEOMにおいて最大17GHzの電気光学帯域幅を示す。本研究は,今後の大型超伝導回路と室温電子回路間の高帯域信号伝送を実現するための有効なソリューションを提案する。

Broadband, energy-efficient signal transfer between cryogenic and room-temperature environment has been a major bottleneck for superconducting quantum and classical logic circuits. Photonic links promise to overcome this challenge by offering simultaneous high bandwidth and low thermal load. However, the development of cryogenic electro-optic modulators -- a key component for photonic readout of electrical signals -- has been stifled by the stringent requirements of superconducting circuits. Rapid single flux quantum circuits (RSFQ), for example, operate with a tiny signal amplitude of only a few millivolts (mV), far below the volt-level signal used in conventional circuits. Here, we demonstrate the first direct optical readout of an RSFQ circuit without additional electrical amplification enabled by a novel superconducting electro-optic modulator (SEOM) featuring a record-low half-wave voltage $V_{\pi}$ of 42 mV on a 1 m-long SEOM. Leveraging the low ohmic loss of superconductors, we break the fundamental $V_{\pi}$-bandwidth trade-off and demonstrate electro-optic bandwidth up to 17 GHz on a 0.2 m-long SEOM at cryogenic temperatures. Our work presents a viable solution toward high-bandwidth signal transfer between future large-scale superconducting circuits and room-temperature electronics.

翻訳日:2023-11-30 14:47:07 公開日:2023-11-26

# プラグアンドプレイ演算子の収縮性について

On the Contractivity of Plug-and-Play Operators ( http://arxiv.org/abs/2309.16899v2 )

ライセンス: Link先を確認

Chirayu D. Athalye, Kunal N. Chaudhury, and Bhartendu Kumar

(参考訳) プラグ・アンド・プレイ(PnP)正則化では、ISTAやADMMといったアルゴリズムの近似演算子を強力なデノイザに置き換える。この形式的な置換は実際驚くほどうまく機能する。実際、PnPは様々なイメージング応用に最先端の結果をもたらすことが示されている。 pnpの実証的な成功は、研究者がその理論的基盤、特に収束を理解する動機となった。先行研究において、非局所的な手段のようなカーネルのノイズに対して、pnp-istaは前方モデル上のいくつかの強い仮定の下で確実に収束することを示した。フォワードモデルにおける仮定を緩和できるか? 収束解析はPnP-ADMMに拡張できるのか? 収束率を推定できますか? 本文では, 縮尺写像定理を用いてこれらの問題を解く。 i) 対称雑音に対するPnP-ISTAとPnP-ADMMが線形収束を示すことを示す。 (II) カーネルデノイザでは, PnP-ISTA と PnP-ADMM がイメージインペイントに対して直線的に収束することを示す。再建実験を用いて理論的知見を検証した。

In plug-and-play (PnP) regularization, the proximal operator in algorithms such as ISTA and ADMM is replaced by a powerful denoiser. This formal substitution works surprisingly well in practice. In fact, PnP has been shown to give state-of-the-art results for various imaging applications. The empirical success of PnP has motivated researchers to understand its theoretical underpinnings and, in particular, its convergence. It was shown in prior work that for kernel denoisers such as the nonlocal means, PnP-ISTA provably converges under some strong assumptions on the forward model. The present work is motivated by the following questions: Can we relax the assumptions on the forward model? Can the convergence analysis be extended to PnP-ADMM? Can we estimate the convergence rate? In this letter, we resolve these questions using the contraction mapping theorem: (i) for symmetric denoisers, we show that (under mild conditions) PnP-ISTA and PnP-ADMM exhibit linear convergence; and (ii) for kernel denoisers, we show that PnP-ISTA and PnP-ADMM converge linearly for image inpainting. We validate our theoretical findings using reconstruction experiments.
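
The PnP-ISTA iteration for inpainting can be sketched in a few lines. The example below assumes the standard form $x_{k+1} = D\big(x_k - \gamma \nabla f(x_k)\big)$ with data term $f(x) = \tfrac{1}{2}\|M(x - y)\|^2$, where $M$ is the observation mask, and it uses a hypothetical symmetric linear smoother with weights (1/4, 1/2, 1/4) as the denoiser $D$. It is not the kernel denoiser analysed in the letter, and the signal, mask, and step size are illustrative choices.

```python
import math


def denoise(x):
    """Symmetric linear smoother with circular boundary, weights (1/4, 1/2, 1/4)."""
    n = len(x)
    return [0.25 * x[(j - 1) % n] + 0.5 * x[j] + 0.25 * x[(j + 1) % n]
            for j in range(n)]


def pnp_ista_inpaint(y, mask, step=1.0, iters=60):
    """Iterate x <- D(x - step * M(x - y)); return x and the per-iteration change."""
    x = [yv if m else 0.0 for yv, m in zip(y, mask)]
    changes = []
    for _ in range(iters):
        # Gradient of 0.5 * ||M(x - y)||^2: nonzero only on observed entries.
        grad = [(xv - yv) if m else 0.0 for xv, yv, m in zip(x, y, mask)]
        x_new = denoise([xv - step * g for xv, g in zip(x, grad)])
        changes.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x))))
        x = x_new
    return x, changes


# Hypothetical 1-D signal with every other sample missing (mask False = missing).
y = [1.0, 0.0, 3.0, 0.0, 2.0, 0.0, 4.0, 0.0]
mask = [True, False, True, False, True, False, True, False]
x_hat, changes = pnp_ista_inpaint(y, mask)
```

For this particular mask and smoother the iteration map is a contraction in the Euclidean norm, so the distance between successive iterates shrinks by a constant factor at every step. This is the qualitative behaviour that a linear convergence guarantee describes.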

翻訳日:2023-11-30 14:40:30 公開日:2023-11-26

# コンテキスト内学習に人間生成のデモンストレーションは必要か?

Are Human-generated Demonstrations Necessary for In-context Learning? ( http://arxiv.org/abs/2309.14681v3 )

ライセンス: Link先を確認

Rui Li, Guoyin Wang, Jiwei Li

(参考訳) 大規模言語モデル(llm)の有望な少数ショット能力にもかかわらず、インコンテキスト学習(icl)の標準パラダイムは、選択されたデモンストレーションに対する感受性の欠点と、これらのデモを生成するための複雑さに苦しんでいる。本稿では,iclに人為的なデモンストレーションが必要かどうかという根本的な疑問を提起する。そこで本研究では,人間による実演を含まない自意識促進戦略 (sec) を提案する。 SECのキーポイントは、手作りの例をICLのデモとして使用する代わりに、SECは、最終出力がどの部分で生成されるかに基づいて、まず自身のデモを作成するようにLLMに求めていることだ。 secは柔軟なフレームワークであり、vailla iclとchain-of-thought(cot)の両方に対応できるが、より簡単である。算術推論、常識推論、マルチタスク言語理解、コード生成ベンチマークにおける広範な実験は、手作りのデモンストレーションを必要としないSECがゼロショット学習戦略を著しく上回り、手作りのデモでICLに匹敵する結果を達成していることを示している。これは、多くのタスクにおいて、現代のLLMは意思決定の能力にのみ依存し、外部のトレーニングデータの必要性を取り除くのに十分なレベルの能力を持っていることを示している。コードはhttps://github.com/ruili33/secで入手できる。

Despite the promising few-shot ability of large language models (LLMs), the standard paradigm of in-context learning (ICL) suffers from two disadvantages: sensitivity to the selected demonstrations and the difficulty of generating these demonstrations. In this paper, we raise the fundamental question of whether human-generated demonstrations are necessary for ICL. To answer this question, we propose the self-contemplation prompting strategy (SEC), a paradigm free from human-crafted demonstrations. The key point of SEC is that, instead of using hand-crafted examples as demonstrations in ICL, SEC asks LLMs to first create demonstrations on their own, based on which the final output is generated. SEC is a flexible framework that can be adapted to both vanilla ICL and chain-of-thought (CoT) prompting, but with greater ease, since the manual generation of both examples and rationales can be skipped. Extensive experiments on arithmetic reasoning, commonsense reasoning, multi-task language understanding, and code generation benchmarks show that SEC, which does not require hand-crafted demonstrations, significantly outperforms the zero-shot learning strategy and achieves results comparable to ICL with hand-crafted demonstrations. This demonstrates that, for many tasks, contemporary LLMs possess a sufficient level of competence to depend exclusively on their own capacity for decision making, removing the need for external training data. Code is available at https://github.com/ruili33/SEC.
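The two-stage idea in this abstract (the model first writes its own demonstrations, then answers conditioned on them) can be sketched as follows. This is only an illustration: the prompt wording, the function name `self_contemplation_answer`, and the stub `fake_llm` are hypothetical and do not come from the authors' repository. Any callable that maps a prompt string to a completion string could stand in for `llm`.

```python
from typing import Callable, List

def self_contemplation_answer(question: str, llm: Callable[[str], str], num_demos: int = 2) -> str:
    # Stage 1: ask the model to produce its own demonstrations (hypothetical prompt).
    demo_prompt = (
        f"Write {num_demos} example questions similar to the one below, "
        "each with a worked answer.\n"
        f"Question: {question}"
    )
    demos = llm(demo_prompt)
    # Stage 2: answer the real question using the self-generated demonstrations.
    final_prompt = f"{demos}\n\nQuestion: {question}\nAnswer:"
    return llm(final_prompt)

# Stub model so the sketch runs without any external service.
calls: List[str] = []

def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("Write"):
        return "Q: 1+1? A: 2\nQ: 2+3? A: 5"
    return "7"

answer = self_contemplation_answer("3+4?", fake_llm)
```

The stub makes the control flow visible: two model calls are made, and the second prompt contains the demonstrations returned by the first.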

翻訳日:2023-11-30 14:38:44 公開日:2023-11-26

# 信頼の復号化:強化学習視点

Decoding trust: A reinforcement learning perspective ( http://arxiv.org/abs/2309.14598v2 )

ライセンス: Link先を確認

Guozhong Zheng, Jiqiang Zhang, Jing Zhang, Weiran Cai, and Li Chen

(参考訳) 信頼ゲームにおける行動実験は、信頼と信頼性が人間の間で普遍的であることを示し、正統派経済学において「ホモ・エコノミクス」を仮定することで予測と矛盾している。これは、何らかのメカニズムが彼らの出現を好む必要があることを意味する。しかし、以前の説明の多くは、ソーシャル学習の単純なバージョンである模倣学習に基づくいくつかの要因に頼る必要がある。ここでは、個人が蓄積した経験を通して長期的な回帰を評価することによって戦略を更新する強化学習のパラダイムに目を向ける。具体的には,q-learningアルゴリズムを用いて,受託者の意思決定を指導する2つのq-tableと関連づけた信頼ゲームについて検討する。両者のシナリオでは、個人が過去の経験と未来への回帰の両方を理解すれば、高いレベルの信頼と信頼感が生まれます。機械学的には、Qテーブルの進化は人間の心理的変化に似た交差を示す。また,ゲームパラメータの位相図も提供し,境界解析を行った。これらの発見は、シナリオが格子状個体群に拡張された場合、堅牢である。その結果,外部要因を伴わない信頼と信頼性の出現の自然な説明が得られた。さらに重要なことは、提案されたパラダイムは、人間の行動における多くのパズルを解読する可能性を示している。

Behavioral experiments on the trust game have shown that trust and trustworthiness are universal among human beings, contradicting the prediction obtained by assuming Homo economicus in orthodox economics. This means some mechanism must be at work that favors their emergence. Most previous explanations, however, need to resort to factors based upon imitative learning, a simple version of social learning. Here, we turn to the paradigm of reinforcement learning, where individuals update their strategies by evaluating the long-term return through accumulated experience. Specifically, we investigate the trust game with the Q-learning algorithm, where each participant is associated with two evolving Q-tables that guide one's decision making as trustor and trustee, respectively. In the pairwise scenario, we reveal that high levels of trust and trustworthiness emerge when individuals appreciate both their historical experience and returns in the future. Mechanistically, the evolution of the Q-tables shows a crossover that resembles human psychological changes. We also provide the phase diagram for the game parameters, where the boundary analysis is conducted. These findings are robust when the scenario is extended to a latticed population. Our results thus provide a natural explanation for the emergence of trust and trustworthiness without external factors involved. More importantly, the proposed paradigm shows potential for deciphering many puzzles in human behavior.
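As a rough illustration of the setup described above, the sketch below shows the standard tabular Q-learning update applied to two separate tables, one for the trustor role and one for the trustee role. The state label, action names, reward value, learning rate, and discount factor are all assumptions made for the example; the abstract does not specify them.

```python
def new_table(states, actions):
    # One row per state, one entry per action, initialised to zero.
    return {s: {a: 0.0 for a in actions} for s in states}

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Standard Q-learning: blend the old value with reward plus discounted best next value.
    best_next = max(q_table[next_state].values())
    old = q_table[state][action]
    q_table[state][action] = (1 - alpha) * old + alpha * (reward + gamma * best_next)
    return q_table[state][action]

# Hypothetical state and action labels, for illustration only.
states = ["start"]
trustor_q = new_table(states, ["trust", "withhold"])
trustee_q = new_table(states, ["return", "keep"])

first = q_update(trustor_q, "start", "trust", reward=1.0, next_state="start")
second = q_update(trustor_q, "start", "trust", reward=1.0, next_state="start")
```

Here `alpha` weights new experience against history and `gamma` weights future returns, which are the two ingredients the abstract says individuals must both appreciate for trust to emerge.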

翻訳日:2023-11-30 14:38:15 公開日:2023-11-26

# 加速サンプリングのための自己調整型ハミルトンモンテカルロ

Self-Tuning Hamiltonian Monte Carlo for Accelerated Sampling ( http://arxiv.org/abs/2309.13593v2 )

ライセンス: Link先を確認

Henrik Christiansen and Federico Errica and Francesco Alesiani

(参考訳) ハミルトニアンモンテカルロシミュレーションの性能は、積分の時間ステップと積分ステップ数の両方に大きく依存する。本稿では,位相空間の高速探索を促進する局所損失関数に基づいて,これらのパラメータを自動的にチューニングする適応型汎用フレームワークを提案する。損失と自己相関時間との間に良好な対応が確立できることを示し、完全に微分可能なセットアップを用いた勾配に基づく最適化を可能にする。この損失は、積分ステップ数に関する分布の勾配駆動的な学習も可能にするように構成される。本手法を,1次元調和振動子と,シミュレーション手法のテストケースとして一般的なアラニンジペプチドに適用する。調和振動子への適用を通じて,局所極小の多い起伏の激しい損失面を避けるために,固定時間ステップを使わないことの重要性を強調する。アラニンジペプチドの場合、損失定義の唯一の自由パラメータをチューニングすることで、損失と自己相関時間との間に良い対応が得られ、グリッド探索と比較してシミュレーションパラメータの最適化が100倍以上高速化される。このシステムでは、インテグレータを拡張して原子依存の時間ステップも可能にし、自己相関時間をさらに25%削減する。

The performance of Hamiltonian Monte Carlo simulations crucially depends on both the integration timestep and the number of integration steps. We present an adaptive general-purpose framework to automatically tune such parameters, based on a local loss function which promotes the fast exploration of phase-space. We show that a good correspondence between loss and autocorrelation time can be established, allowing for gradient-based optimization using a fully-differentiable set-up. The loss is constructed in such a way that it also allows for gradient-driven learning of a distribution over the number of integration steps. Our approach is demonstrated for the one-dimensional harmonic oscillator and alanine dipeptide, a small protein common as a test case for simulation methods. Through the application to the harmonic oscillator, we highlight the importance of not using a fixed timestep to avoid a rugged loss surface with many local minima, otherwise trapping the optimization. In the case of alanine dipeptide, by tuning the only free parameter of our loss definition, we find a good correspondence between it and the autocorrelation times, resulting in a $>100$ fold speed up in optimization of simulation parameters compared to a grid-search. For this system, we also extend the integrator to allow for atom-dependent timesteps, providing a further reduction of $25\%$ in autocorrelation times.
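The two parameters the abstract refers to, the integration timestep and the number of integration steps, are the arguments of the leapfrog integrator used inside Hamiltonian Monte Carlo. The sketch below shows a plain leapfrog trajectory for the one-dimensional harmonic oscillator with unit mass; it does not implement the paper's self-tuning loss, and the values `step_size=0.1` and `num_steps=10` are arbitrary illustrative choices.

```python
def leapfrog(q, p, grad_u, step_size, num_steps):
    # Half step in momentum, alternating full steps, then a final half step.
    # Assumes num_steps >= 1.
    p = p - 0.5 * step_size * grad_u(q)
    for i in range(num_steps):
        q = q + step_size * p
        if i < num_steps - 1:
            p = p - step_size * grad_u(q)
    p = p - 0.5 * step_size * grad_u(q)
    return q, p

def energy(q, p):
    # Harmonic oscillator: U(q) = q^2 / 2, kinetic energy p^2 / 2.
    return 0.5 * q * q + 0.5 * p * p

def grad_u(q):
    return q

q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, grad_u, step_size=0.1, num_steps=10)
# Time reversibility: flipping the momentum and integrating again returns to the start.
q_back, p_back = leapfrog(q1, -p1, grad_u, step_size=0.1, num_steps=10)
```

The product of the two parameters sets the trajectory length, which is why both matter for how quickly the sampler decorrelates.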

翻訳日:2023-11-30 14:35:22 公開日:2023-11-26

# MiCRO:分散DNNトレーニングのスケーリングと高速化のためのニアゼロコスト勾配スパース化

MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training ( http://arxiv.org/abs/2310.00967v2 )

ライセンス: Link先を確認

Daegun Yoon, Sangyoon Oh

(参考訳) 勾配スパース化(gradient sparsification)は、分散ディープニューラルネットワーク(DNN)トレーニングのスケーリングと高速化のための通信最適化技術であり、勾配集約のために増大する通信トラフィックを削減する。しかし、既存のスパーシファイアは、勾配選択の計算コストの高さや通信トラフィックの増加のため、スケーラビリティに乏しい。特に通信トラフィックの増加は、勾配のビルドアップと勾配選択の不適切なしきい値によって引き起こされる。これらの課題に対処するため、我々はMiCROと呼ばれる新しい勾配スパース化手法を提案する。 MiCROでは、勾配ベクトルは分割され、各パーティションは対応するワーカーに割り当てられる。各ワーカーは自身のパーティションから勾配を選択するため、集約された勾配には勾配のビルドアップが生じない。さらに、MiCROは圧縮比誤差を最小化することで、ユーザの要求どおりに通信トラフィックを維持するための正確なしきい値を推定する。 MiCROは、分散DNNトレーニングのスケーラビリティと高速化を妨げる既存の問題を解決することで、ほぼゼロコストの勾配スパース化を可能にする。我々の大規模な実験では、MiCROは優れた収束率で最先端のスパーシファイアを上回った。

Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the increasing communication traffic for gradient aggregation. However, existing sparsifiers have poor scalability because of the high computational cost of gradient selection and/or an increase in communication traffic. In particular, an increase in communication traffic is caused by gradient build-up and an inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients from its partition, and the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates an accurate threshold to maintain the communication traffic as per user requirement by minimising the compression ratio error. MiCRO enables near-zero cost gradient sparsification by solving existing problems that hinder the scalability and acceleration of distributed DNN training. In our extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.
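The partition-then-select idea can be sketched in a few lines. In this simplified illustration each worker holds its own local gradient vector, owns one contiguous slice of the index range, and keeps only the entries in its slice whose magnitude reaches a fixed threshold. The helper names, the contiguous partitioning, and the fixed threshold are assumptions for the example; the threshold estimation step of MiCRO is not reproduced here.

```python
def partition(num_params, num_workers):
    # Contiguous, nearly equal index ranges, one per worker.
    bounds = [num_params * w // num_workers for w in range(num_workers + 1)]
    return [range(bounds[w], bounds[w + 1]) for w in range(num_workers)]

def select(grad, part, threshold):
    # A worker only considers indices inside its own partition.
    return {i: grad[i] for i in part if abs(grad[i]) >= threshold}

local_grads = [
    [0.9, -0.1, 0.2, -0.8, 0.7, 0.0, 0.3, -0.6],   # worker 0
    [0.1, 0.5, -0.9, 0.2, -0.7, 0.05, 0.6, 0.1],   # worker 1
]
parts = partition(num_params=8, num_workers=2)
selected = [select(g, p, threshold=0.5) for g, p in zip(local_grads, parts)]

# Indices chosen by different workers never overlap, so the aggregate
# contains at most one selected entry per index.
all_indices = [i for s in selected for i in s]
density = len(all_indices) / 8
```

Because the partitions are disjoint, the number of aggregated entries equals the sum of the per-worker selections, which is what keeps the communicated volume from growing with the number of workers in this sketch.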

翻訳日:2023-11-30 14:25:26 公開日:2023-11-26

# ドロップウェイポイントによる動的マルチエージェント環境における軌道予測の改善

Improving Trajectory Prediction in Dynamic Multi-Agent Environment by Dropping Waypoints ( http://arxiv.org/abs/2309.17338v2 )

ライセンス: Link先を確認

Pranav Singh Chib, Pravendra Singh

(参考訳) 本質的に多様性があり不確実な軌跡の性質は、それらを正確にモデル化する上で非常に難しい課題である。動作予測システムは、エージェントの将来の軌跡を予測するために、過去から空間的および時間的情報を効果的に学習する必要がある。既存の多くの手法は、時間的特徴を捉えるために、積み重ねられたモデル内の別々のコンポーネントを通して時間的動きを学ぶ。さらに、観測された軌道ウェイポイントシーケンスが完了したという仮定の下では、予測手法がしばしば動作し、値が不足するシナリオを無視して、その性能に影響を与える可能性がある。さらに、これらのモデルは予測を行う際に特定のウェイポイントシーケンスに偏りがある。軌道予測モデルのトレーニング中に時間依存を明示的に組み込む時間的経路点降下(twd)と呼ばれる新しい手法を提案する。過去の観測軌道から統計的にウェイポイントを落とすことにより、モデルは残りのウェイポイントから基礎となる時間的表現を学習せざるを得なくなり、モデルが改善される。確率的時間的ウェイポイントをモデル学習プロセスに組み込むことは、欠落した値のシナリオにおけるパフォーマンスを大幅に向上させる。実験の結果, 軌道予測能力の大幅な改善が示された。提案手法は,既存の軌道予測手法を補完し,予測精度を向上させる。 NBA Sports VU, ETH-UCY, TrajNet++の3つのデータセットに対する提案手法の評価を行った。

The inherently diverse and uncertain nature of trajectories presents a formidable challenge in accurately modeling them. Motion prediction systems must effectively learn spatial and temporal information from the past to forecast the future trajectories of the agent. Many existing methods learn temporal motion via separate components within stacked models to capture temporal features. Furthermore, prediction methods often operate under the assumption that observed trajectory waypoint sequences are complete, disregarding scenarios where missing values may occur, which can influence their performance. Moreover, these models may be biased toward particular waypoint sequences when making predictions. We propose a novel approach called Temporal Waypoint Dropping (TWD) that explicitly incorporates temporal dependencies during the training of a trajectory prediction model. By stochastically dropping waypoints from past observed trajectories, the model is forced to learn the underlying temporal representation from the remaining waypoints, resulting in an improved model. Incorporating stochastic temporal waypoint dropping into the model learning process significantly enhances its performance in scenarios with missing values. Experimental results demonstrate our approach's substantial improvement in trajectory prediction capabilities. Our approach can complement existing trajectory prediction methods to improve their prediction accuracy. We evaluate our proposed approach on three datasets: NBA Sports VU, ETH-UCY, and TrajNet++.
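A minimal sketch of stochastic waypoint dropping is shown below. It assumes that a dropped waypoint is marked as missing (`None`) and reported through a boolean mask; the abstract does not say how dropped waypoints are represented, so this representation, the function name `drop_waypoints`, and the drop probability are illustrative assumptions.

```python
import random

def drop_waypoints(trajectory, drop_prob, rng):
    # Independently drop each observed waypoint with probability drop_prob.
    kept, mask = [], []
    for point in trajectory:
        dropped = rng.random() < drop_prob
        mask.append(not dropped)          # True where the waypoint is kept
        kept.append(None if dropped else point)
    return kept, mask

rng = random.Random(0)  # seeded for reproducibility
observed = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.3), (1.5, 0.6), (2.0, 1.0)]
augmented, mask = drop_waypoints(observed, drop_prob=0.3, rng=rng)

# Edge cases: probability 0 keeps everything, probability 1 drops everything.
all_kept, _ = drop_waypoints(observed, 0.0, random.Random(1))
none_kept, _ = drop_waypoints(observed, 1.0, random.Random(1))
```

Applying this at training time, with a fresh random draw per sample, exposes the model to many different subsets of the same observed trajectory.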

翻訳日:2023-11-30 14:24:00 公開日:2023-11-26

# 因果に準拠した説明のための深いバックトラッキング反事実

Deep Backtracking Counterfactuals for Causally Compliant Explanations ( http://arxiv.org/abs/2310.07665v2 )

ライセンス: Link先を確認

Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach

(参考訳) 反事実は、変化した状況下で観察されたであろうこと、事実的な観察を条件に答えることによって、貴重な洞察を与えることができる。反事実の古典的介入解釈が広く研究されている一方で、バックトラックは研究の少ない代替手段となっているが、バックトラック原理はすべての因果法がそのまま維持される代替哲学として出現している。本研究では, 深部生成成分からなる構造因果モデルにおいて, 逆追従反事実を計算するための実践的手法を提案する。そこで我々は,因果モデルの構造化潜在空間におけるトラクタブルな制約付き最適化問題を解くことで,対物生成を可能にする構造的割り当てに条件を課す。また,本定式化は,反事実的説明の分野における手法との比較も促進する。これらと比較すると,本手法は汎用性,モジュール性,因果性に準拠した代替手段である。これらの特性をmnistとcelebaの修正版で実験的に実証する。

Counterfactuals can offer valuable insights by answering what would have been observed under altered circumstances, conditional on a factual observation. Whereas the classical interventional interpretation of counterfactuals has been studied extensively, backtracking constitutes a less studied alternative philosophy in which all causal laws are kept intact. In the present work, we introduce a practical method for computing backtracking counterfactuals in structural causal models that consist of deep generative components. To this end, we impose conditions on the structural assignments that enable the generation of counterfactuals by solving a tractable constrained optimization problem in the structured latent space of a causal model. Our formulation also facilitates a comparison with methods in the field of counterfactual explanations. Compared to these, our method represents a versatile, modular and causally compliant alternative. We demonstrate these properties experimentally on a modified version of MNIST and CelebA.

翻訳日:2023-11-30 14:02:07 公開日:2023-11-26

# Fed-GraB: 自己調整型グラディエントバランサによる長期学習

Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer ( http://arxiv.org/abs/2310.07587v4 )

ライセンス: Link先を確認

Zikai Xiao, Zihan Chen, Songshang Liu, Hualiang Wang, Yang Feng, Jin Hao, Joey Tianyi Zhou, Jian Wu, Howard Hao Yang, Zuozhu Liu

(参考訳) データプライバシと長期分布は、多くの現実世界のタスクで例外ではなく、標準である。本稿では,各クライアントがローカルに異種データセットを持つフェデレーション・ロングテール・ラーニング(federated long-tailed learning, fed-lt)タスクについて検討する。このような条件下では、既存のフェデレーション最適化と/または集中型ロングテール学習法はほとんど適用されない。 (a)世界的長期分布をプライバシー制約下で特徴付けること (b)頭部の不均衡に対処するために局所学習戦略を調整すること。そこで本研究では,DPA(Direct Prior Analyzer)モジュールによって評価された大域的長期分布のフィードバックに基づいて,クライアントの勾配を閉ループで再重み付けする自己調整型グラディエント・バランサ(SGB)モジュールからなる,$\texttt{Fed-GraB}$という手法を提案する。クライアントは$\texttt{Fed-GraB}$を使用することで、モデルトレーニングプロセス中にデータの不均一性によって引き起こされる分散ドリフトを効果的に軽減し、多数派クラスのパフォーマンスを維持しながら、少数派クラスのパフォーマンスを向上したグローバルモデルを得ることができる。大規模な実験では、CIFAR-10-LT、CIFAR-100-LT、ImageNet-LT、iNaturalistなどの代表的なデータセットに対して、$\texttt{Fed-GraB}$が最先端のパフォーマンスを達成することが示されている。

Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed $\texttt{Fed-GraB}$, comprised of a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on the feedback of global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. Using $\texttt{Fed-GraB}$, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes. Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.

翻訳日:2023-11-30 14:01:51 公開日:2023-11-26

# ニューラルネットワークの特徴の類似性を超えて:ネットワークの特徴複雑性とそのカテゴリー理論による解釈

Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory ( http://arxiv.org/abs/2310.06756v2 )

ライセンス: Link先を確認

Yiting Chen, Zhanpeng Zhou, Junchi Yan

(参考訳) ニューラルネットワークの振舞いはいまだ不透明であり、最近広く知られる現象は、異なるランダムパラメータで初期化されると、ネットワークが同様のパフォーマンスを達成することである。この現象は、異なるネットワークによって学習された特徴間の類似性を測定することに大きな注目を集めている。しかし、同等の機能はほとんど存在しないため、同じ機能を記述することは曖昧である。本稿では、等価機能の概念を拡張し、機能的に等価機能と呼ぶものの定義を提供する。これらの特徴は特定の変換の下で等価な出力を生成する。この定義を用いて、ニューラルネットワークが各層で学習した特徴の冗長性に関して、いわゆる特徴複雑性のより内在的な指標を導出することを目指している。我々は、数学の発達した分野である圏論のレンズを通して、我々のアプローチの正式な解釈を提供する。さらに,特徴量の定量化のために,Iterative Feature Mergingというアルゴリズムを提案する。実験結果は、様々な観点から我々の考えと理論を検証した。実験により、同じニューラルネットワークで学習された異なる特徴間で機能的等価性が広く存在し、性能に影響を与えずにネットワークのパラメータ数を削減できることを実証し、ifmはデータ非依存モデルプルーネ法として大きな可能性を示している。定義された機能の複雑さに関する興味深い経験的な発見もいくつか出てきました。

The behavior of neural networks still remains opaque, and a recently widely noted phenomenon is that networks often achieve similar performance when initialized with different random parameters. This phenomenon has attracted significant attention to measuring the similarity between features learned by distinct networks. However, feature similarity could be vague in describing the same feature since equivalent features hardly exist. In this paper, we expand the concept of equivalent features and provide the definition of what we call functionally equivalent features. These features produce equivalent output under certain transformations. Using this definition, we aim to derive a more intrinsic metric for the so-called feature complexity regarding the redundancy of features learned by a neural network at each layer. We offer a formal interpretation of our approach through the lens of category theory, a well-developed area in mathematics. To quantify the feature complexity, we further propose an efficient algorithm named Iterative Feature Merging (IFM). Our experimental results validate our ideas and theories from various perspectives. We empirically demonstrate that functional equivalence widely exists among different features learned by the same neural network and that we could reduce the number of parameters of the network without affecting the performance. IFM shows great potential as a data-agnostic model pruning method. We have also drawn several interesting empirical findings regarding the defined feature complexity.

翻訳日:2023-11-30 14:00:26 公開日:2023-11-26

# 多ユーザ遅延フィードバックを持つ逆帯域:理論と応用

Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application ( http://arxiv.org/abs/2310.11188v2 )

ライセンス: Link先を確認

Yandi Li, Jianxiong Guo, Yupeng Li, Tian Wang, Weijia Jia

(参考訳) マルチアームバンディット(MAB)モデルは、リソース割り当て、オンライン広告、動的価格設定など、様々な現実のシナリオに適用可能性や有効性から、研究の注目を集めている。重要な分野として,学習アルゴリズムに挑戦するために,概念敵が各アームに関連する報酬分布を戦略的に選択し,エージェントがアクションを取ると対応する報酬フィードバックを受け取るまでの遅延を経験する,多くの研究者によって,遅延フィードバックを伴う敵対的mab問題が提案され,研究されている。しかし、既存のモデルは1人のユーザーのみが生成するフィードバックを制限するため、複数のユーザーの一般的なシナリオ(例えば、一群のユーザーに対する広告推薦)にモデルは適用できない。本稿では,複数ユーザからのフィードバックが遅延し,内部分布が制限されないことを考察する。対照的に、フィードバック遅延は任意であり、予めプレイヤーに未知である。また、ラウンド内の異なるユーザにとって、フィードバックの遅延は遅延相関の仮定を持たない。そこで,マルチユーザによる遅延フィードバックを用いた逆MAB問題を定式化し,異なるユーザからのフィードバックの重み付けを考慮し,各ラウンドで決定を行うEXP3アルゴリズムを改良したMUD-EXP3を設計する。既知の端末ラウンドインデックス$T$, ユーザ数$M$, アーム数$N$, 遅延上限$d_{max}$の前提で、$\mathcal{O}(\sqrt{TM^2\ln{N}(N\mathrm{e}+4d_{max})})$ の後悔を証明する。さらに、未知の$T$のより一般的な場合、適応アルゴリズム AMUD-EXP3 は$T$に対するサブ線形後悔と共に提案される。最後に,アルゴリズムの正しさと有効性を示すため,広範な実験を行った。

The multi-armed bandit (MAB) models have attracted significant research attention due to their applicability and effectiveness in various real-world scenarios such as resource allocation, online advertising, and dynamic pricing. As an important branch, adversarial MAB problems with delayed feedback have recently been proposed and studied by many researchers: a conceptual adversary strategically selects the reward distributions associated with each arm to challenge the learning algorithm, and the agent experiences a delay between taking an action and receiving the corresponding reward feedback. However, the existing models restrict the feedback to be generated from only one user, which makes them inapplicable to the prevailing scenarios of multiple users (e.g., ad recommendation for a group of users). In this paper, we consider that the delayed feedback results are from multiple users and are unrestricted on internal distribution. In addition, the feedback delay is arbitrary and unknown to the player in advance. Also, for different users in a round, the delays in feedback have no assumption of latent correlation. Thus, we formulate an adversarial MAB problem with multi-user delayed feedback and design a modified EXP3 algorithm MUD-EXP3, which makes a decision at each round by considering the importance-weighted estimator of the received feedback from different users. On the premise of known terminal round index $T$, the number of users $M$, the number of arms $N$, and upper bound of delay $d_{max}$, we prove a regret of $\mathcal{O}(\sqrt{TM^2\ln{N}(N\mathrm{e}+4d_{max})})$. Furthermore, for the more common case of unknown $T$, an adaptive algorithm AMUD-EXP3 is proposed with a sublinear regret with respect to $T$. Finally, extensive experiments are conducted to indicate the correctness and effectiveness of our algorithms.
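To make the importance-weighted estimator concrete, the sketch below shows a plain loss-based EXP3 update in which feedback from several users is absorbed whenever it arrives. It is a simplified stand-in, not MUD-EXP3 itself: the function names, the learning rate `eta`, and the way late feedback is batched are assumptions for illustration. Each piece of feedback is divided by the probability with which its arm was played, so that the loss estimate stays unbiased.

```python
import math

def exp3_probs(cum_loss, eta):
    # Exponential weights over estimated cumulative losses (shifted for stability).
    base = min(cum_loss)
    weights = [math.exp(-eta * (loss - base)) for loss in cum_loss]
    total = sum(weights)
    return [w / total for w in weights]

def absorb_feedback(cum_loss, feedback):
    # feedback: iterable of (arm, loss, probability of the arm when it was played).
    for arm, loss, prob in feedback:
        cum_loss[arm] += loss / prob  # importance-weighted loss estimate

cum_loss = [0.0, 0.0, 0.0]
p_before = exp3_probs(cum_loss, eta=0.1)

# Two users report losses for arm 0; both reports arrive after a delay.
delayed = [(0, 1.0, p_before[0]), (0, 0.5, p_before[0])]
absorb_feedback(cum_loss, delayed)
p_after = exp3_probs(cum_loss, eta=0.1)
```

After the delayed losses are absorbed, arm 0 receives a smaller selection probability than the two arms with no reported loss.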

翻訳日:2023-11-30 13:53:49 公開日:2023-11-26

# 公共インターネットデータを用いたマルチモーダル基礎モデルの不確かさの推定

Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data ( http://arxiv.org/abs/2310.09926v2 )

ライセンス: Link先を確認

Shiladitya Dutta, Hongbo Wei, Lars van der Laan, Ahmed M. Alaa

(参考訳) ファンデーションモデルは、自己教師付き学習を使用して大規模な大量のデータに基づいて訓練されており、幅広い下流タスクへの適応を可能にする。テスト時には、これらのモデルはゼロショット機能を示し、以前は目に見えない(ユーザ指定)カテゴリを分類することができる。本稿では,これらのゼロショット予測における不確かさを定量化する問題に対処する。ウェブデータとの共形予測を用いたゼロショット設定における不確実性推定のためのヒューリスティック手法を提案する。テスト時に一連のクラスが与えられると、プロンプトテンプレート("a image of a <category>"など)を使用してクリップスタイルのモデルでゼロショットの分類を行い、オープンwebからのキャリブレーションデータに対する検索クエリと同じテンプレートを使用する。 webベースのキャリブレーションセットが与えられた場合、検索されたwebデータの潜在的なエラーを考慮し、新しいコンフォメーションスコアにコンフォメーション予測を適用する。本研究は, 生物医学基礎モデルにおける提案手法の有用性を評価し, 様々な生体医学データセットにおいて, 対象範囲を満足できる効率で達成できることを予備的に示した。

Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we conduct zero-shot classification with CLIP-style models using a prompt template, e.g., "an image of a <category>", and use the same template as a search query to source calibration data from the open web. Given a web-based calibration set, we apply conformal prediction with a novel conformity score that accounts for potential errors in retrieved web data. We evaluate the utility of our proposed method in biomedical foundation models; our preliminary results show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.
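For readers unfamiliar with conformal prediction, the sketch below shows the standard split-conformal recipe with the common nonconformity score of one minus the predicted probability of the true class. It does not include the paper's novel score for noisy web data; the calibration scores, class names, and `alpha` are made-up illustrative values.

```python
import math

def conformal_quantile(cal_scores, alpha):
    # Finite-sample corrected quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(cal_scores)[k - 1]

def prediction_set(class_probs, qhat):
    # Keep every class whose nonconformity score (1 - probability) is within the threshold.
    return [c for c, p in class_probs.items() if 1 - p <= qhat]

# Nonconformity scores of the true class on a (hypothetical) calibration set.
cal_scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
qhat = conformal_quantile(cal_scores, alpha=0.25)

# Zero-shot class probabilities for one test input (illustrative values).
probs = {"cat": 0.5, "dog": 0.45, "bird": 0.05}
pred = prediction_set(probs, qhat)
```

The size of the returned set is the uncertainty signal: an ambiguous input yields several classes, a clear one yields a single class.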

翻訳日:2023-11-30 13:49:55 公開日:2023-11-26

# 事前学習拡散モデルのh空間における解釈方向の教師なし発見

Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models ( http://arxiv.org/abs/2310.09912v2 )

ライセンス: Link先を確認

Zijian Zhang, Luping Liu, Zhijie Lin, Yichen Zhu, Zhou Zhao

(参考訳) 本稿では,事前学習された拡散モデルのh空間における解釈可能な方向を識別する,教師なし学習に基づく最初の手法を提案する。提案手法は,GAN潜在空間で動作する既存の手法から導かれる。具体的には、事前学習した拡散モデルのh-スペースで動作するシフト制御モジュールを用いて、サンプルを自分自身のシフトバージョンに操作し、次いで再構成器を用いて操作のタイプと強度を再現する。それらを共同で最適化することで、モデルは自然に絡み合った解釈可能な方向を発見する。無意味かつ破壊的な方向の発見を防止するため、シフトサンプルの忠実性を維持するために識別器を用いる。拡散モデルの反復的生成過程のため、バックプロパゲート勾配に多くの中間テンソルを格納するために、我々のトレーニングは相当量のGPU VRAMを必要とする。この問題に対処するため, 勾配チェックポインティングに基づく一般的なVRAM効率トレーニングアルゴリズムを提案し, VRAMの占有を許容し, トレーニング効率を犠牲にしながら, 生成過程全体を通して勾配をバックプロパガントする。拡散モデルに関する既存の研究と比較して,本手法は,他の複雑な手順を必要とせず,本質的にグローバルかつスケーラブルな方向を識別する。各種データセットに対する大規模な実験により,本手法の有効性が示された。

We propose the first unsupervised and learning-based method to identify interpretable directions in the h-space of pre-trained diffusion models. Our method is derived from an existing technique that operates on the GAN latent space. Specifically, we employ a shift control module that works on the h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself, followed by a reconstructor to reproduce both the type and the strength of the manipulation. By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions. To prevent the discovery of meaningless and destructive directions, we employ a discriminator to maintain the fidelity of the shifted sample. Due to the iterative generative process of diffusion models, our training requires a substantial amount of GPU VRAM to store the numerous intermediate tensors needed for back-propagating gradients. To address this issue, we propose a general VRAM-efficient training algorithm based on the gradient checkpointing technique to back-propagate any gradient through the whole generative process, with acceptable VRAM occupancy at the cost of some training efficiency. Compared with existing related works on diffusion models, our method inherently identifies global and scalable directions, without necessitating any other complicated procedures. Extensive experiments on various datasets demonstrate the effectiveness of our method.

翻訳日:2023-11-30 13:49:35 公開日:2023-11-26

# ハリケーントラジェクタの地理空間予測のためのグラフ変換器

GraphTransformers for Geospatial Forecasting of Hurricane Trajectories ( http://arxiv.org/abs/2310.20174v2 )

ライセンス: Link先を確認

Pallavi Banerjee, Satyaki Chakraborty

(参考訳) 本稿では,グラフトランスフォーマを用いた地理空間シーケンスの軌跡予測のための新しい枠組みを提案する。いくつかのシーケンスを見渡すと、そのようなシーケンスモデリングタスクを考慮せずに、異なる地理空間ポイント間でグラフ構造が自動的に現れるのが観察された。このグラフ構造を明示的に活用することで,地理空間的軌道予測を大幅に改善できることを示す。当社のGraphTransformerアプローチは,ハリケーンの軌跡を6時間単位で予測するデータセットであるHURDATに基づいて,最先端のTransformerベースのベースラインを大幅に改善する。

In this paper, we introduce a novel framework for trajectory prediction of geospatial sequences using GraphTransformers. Looking across several sequences, we observe that a graph structure automatically emerges between different geospatial points, one that is often not taken into account in such sequence modeling tasks. We show that by leveraging this graph structure explicitly, geospatial trajectory prediction can be significantly improved. Our GraphTransformer approach significantly improves upon a state-of-the-art Transformer-based baseline on HURDAT, a dataset where we are interested in predicting the trajectory of a hurricane on a 6-hourly basis.

翻訳日:2023-11-30 13:29:14 公開日:2023-11-26

# FLIP: CTR予測のためのIDベースモデルと事前学習言語モデルとの微粒なアライメントを目指して

FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction ( http://arxiv.org/abs/2310.19453v2 )

ライセンス: Link先を確認

Hangyu Wang, Jianghao Lin, Xiangyang Li, Bo Chen, Chenxu Zhu, Ruiming Tang, Weinan Zhang, Yong Yu

(参考訳) クリックスルーレート(CTR)予測は、さまざまなパーソナライズされたオンラインサービスにおいてコア機能モジュールとして機能する。 CTR予測のための従来のIDベースのモデルは、特徴相互作用モデリングを通じて協調的な信号をキャプチャする表形式での1ホット符号化ID特徴を入力として捉えている。しかし、ワンホットエンコーディングは、元のフィーチャーテキストにある意味情報を破棄する。近年、PLM(Pretrained Language Models)の出現は、ハードプロンプトテンプレートによって得られるテキストモダリティの文を入力として、意味知識を抽出するためにPLMを採用する別のパラダイムを生み出している。しかし、一般的にPLMは入力されたテキストデータをサブワードトークンにトークン化し、フィールドワイドの協調信号を無視する。したがって、これらの2つの研究は、同じ入力データ(例えば、テキストと表のモダリティ)の異なる特性に焦点を当て、相互に相補的な関係を形成する。本稿では,CTR予測のためのIDベースモデルと事前学習言語モデル(FLIP)間の細粒度特徴レベルのアライメントを提案する。マスク型言語と表型モデリングの両方のための新しい統合再構築事前学習タスクをデザインする。具体的には、一方のモダリティ(トークンや特徴)のマスクされたデータは、他方のモダリティの助けを借りて復元され、双対モダリティ間の十分な相互情報抽出を通じて特徴レベルの相互作用とアライメントを確立する必要がある。さらに,下流のctr予測タスクに対して,idベースモデルとplmを共同で微調整し,両モデルの利点を組み合わせることにより,優れた性能を実現することを提案する。 3つの実世界のデータセットに対する大規模な実験により、FLIPはSOTAベースラインより優れており、様々なIDベースのモデルやPLMと高い互換性を持つことが示された。

Click-through rate (CTR) prediction serves as a core function module in various personalized online services. The traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of the tabular modality and capture collaborative signals via feature interaction modeling. However, the one-hot encoding discards the semantic information contained in the original feature texts. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as inputs the sentences of the textual modality obtained by hard prompt templates and adopts PLMs to extract the semantic knowledge. However, PLMs generally tokenize the input text data into subword tokens and ignore field-wise collaborative signals. Therefore, these two lines of research focus on different characteristics of the same input data (i.e., textual and tabular modalities), forming a distinct complementary relationship with each other. In this paper, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. We design a novel joint reconstruction pretraining task for both masked language and tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment via sufficient mutual information extraction between the two modalities. Moreover, we propose to jointly finetune the ID-based model and the PLM for downstream CTR prediction tasks, thus achieving superior performance by combining the advantages of both models. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines and is highly compatible with various ID-based models and PLMs.

翻訳日:2023-11-30 13:27:38 公開日:2023-11-26

# 3次元不確かさ場の推定:神経放射場に対する不確かさの定量化

Estimating 3D Uncertainty Field: Quantifying Uncertainty for Neural Radiance Fields ( http://arxiv.org/abs/2311.01815v2 )

ライセンス: Link先を確認

Jianxiong Shen and Ruijie Ren and Adria Ruiz and Francesc Moreno-Noguer

(参考訳) ニューラル・ラジアンス・フィールド(NeRF)に基づく現在の手法では、特に隠蔽されたシーンや外部シーンの内容を含む見えない領域において、予測の不確かさを定量化する能力が著しく欠如している。この制限は、モデル予測の信頼性を未知の環境でのロボット探索や計画といったタスクに考慮しなければならないロボット工学の広範な応用を妨げる。そこで本研究では,これらの不完全領域を明示的に識別する学習不完全シーン幾何に基づく3次元不確かさ場を推定する新しい手法を提案する。各カメラ線に沿って蓄積された透過率を考慮すると、不確実性フィールドは2次元不確かさを推定し、シーン内容の内外に直接投射する光に対して高い値を示す。学習面上の不確実性を定量化するために,確率的放射場をモデル化する。近年の手法と比較して、3D未確認領域と2Dレンダリングピクセルの両方で高い不確実性について明確に推論できるのは,本手法のみであることを示す。さらに,我々が設計した不確実性分野は,次の視点選択のような実世界のロボット作業に理想的に適していることを示す。

Current methods based on Neural Radiance Fields (NeRF) significantly lack the capacity to quantify uncertainty in their predictions, particularly in unseen space, including occluded content and content outside the scene. This limitation hinders their extensive application in robotics, where the reliability of model predictions has to be considered for tasks such as robotic exploration and planning in unknown environments. To address this, we propose a novel approach to estimate a 3D Uncertainty Field based on the learned incomplete scene geometry, which explicitly identifies these unseen regions. By considering the accumulated transmittance along each camera ray, our Uncertainty Field infers 2D pixel-wise uncertainty, exhibiting high values for rays cast directly toward occluded content or content outside the scene. To quantify the uncertainty on the learned surface, we model a stochastic radiance field. Our experiments demonstrate that, compared with recent methods, our approach is the only one that can explicitly reason about high uncertainty both in 3D unseen regions and in the corresponding 2D rendered pixels. Furthermore, we illustrate that our designed uncertainty field is ideally suited for real-world robotics tasks, such as next-best-view selection.
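The accumulated transmittance mentioned above is the standard NeRF volume-rendering quantity: the fraction of light that survives along a ray up to a given sample. The sketch below computes it from per-sample densities and segment lengths. How the paper converts transmittance into a pixel-wise uncertainty value is not stated in the abstract, so the code stops at the transmittance itself; the density values are illustrative.

```python
import math

def accumulated_transmittance(densities, deltas):
    # Returns the transmittance reaching each sample and the value left at the ray's end.
    reaching, optical_depth = [], 0.0
    for sigma, delta in zip(densities, deltas):
        reaching.append(math.exp(-optical_depth))
        optical_depth += sigma * delta  # density times segment length
    return reaching, math.exp(-optical_depth)

# A ray passing through empty space, one semi-opaque segment, then empty space again.
reaching, remaining = accumulated_transmittance([0.0, 2.0, 0.0], [0.5, 0.5, 0.5])
```

A ray whose remaining transmittance stays close to one never met any learned geometry, which is the kind of ray the abstract associates with content outside the observed scene.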

翻訳日:2023-11-30 13:17:19 公開日:2023-11-26

# EHR監査ログのエントロピー推定のための自己回帰型言語モデル

Autoregressive Language Models For Estimating the Entropy of Epic EHR Audit Logs ( http://arxiv.org/abs/2311.06401v3 )

ライセンス: Link先を確認

Benjamin C. Warner, Thomas Kannampallil, Seunghwan Kim

(参考訳) EHR監査ログは、臨床医の活動を捉えた、非常にきめ細かい出来事のストリームであり、電子健康記録(EHR)で臨床医のワークフローを特徴づける研究において重要な領域である。 EHR監査ログ(監査ログ)を通じてワークフローの複雑さを測定する既存のテクニックには、EHRセッションの完全な複雑さを捉えることができない時間または周波数ベースの横断的な集約が含まれる。ワークフロー内の動作シーケンスのエントロピーや不規則性を測定し、評価モデルを公開する上で、トランスフォーマティブベースの表型言語モデル(tabular lm)の使用法を簡単に評価する。

EHR audit logs are a highly granular stream of events that capture clinician activities, and are a significant area of interest for research in characterizing clinician workflow on the electronic health record (EHR). Existing techniques to measure the complexity of workflow through EHR audit logs (audit logs) involve time- or frequency-based cross-sectional aggregations that are unable to capture the full complexity of an EHR session. We briefly evaluate the usage of a transformer-based tabular language model (tabular LM) in measuring the entropy or disorderedness of action sequences within workflow and release the evaluated models publicly.
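The idea of scoring a session by the entropy an autoregressive model assigns to its action sequence can be shown with a much smaller stand-in. The sketch below replaces the transformer with an add-one smoothed bigram model and reports the average cross-entropy in bits per transition. The action names and training sessions are invented for the example and do not correspond to real audit log event types.

```python
import math
from collections import Counter

def train_bigram(sessions):
    # Count action-to-action transitions and collect the vocabulary.
    pair_counts, context_counts, vocab = Counter(), Counter(), set()
    for session in sessions:
        vocab.update(session)
        for prev, nxt in zip(session, session[1:]):
            pair_counts[(prev, nxt)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts, vocab

def cross_entropy_bits(session, model):
    # Average negative log2 probability per transition, with add-one smoothing.
    pair_counts, context_counts, vocab = model
    total = 0.0
    for prev, nxt in zip(session, session[1:]):
        prob = (pair_counts[(prev, nxt)] + 1) / (context_counts[prev] + len(vocab))
        total -= math.log2(prob)
    return total / (len(session) - 1)

routine = ["login", "chart", "note", "sign"]
model = train_bigram([routine] * 3)

regular = cross_entropy_bits(routine, model)
irregular = cross_entropy_bits(["sign", "note", "login", "chart"], model)
```

A session that follows the familiar order scores lower than one that visits the same actions in an unusual order, which is the sense in which entropy measures the disorderedness of a workflow.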

翻訳日:2023-11-30 13:07:45 公開日:2023-11-26
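
To make the idea of measuring the entropy of an action sequence with an autoregressive model concrete, here is a minimal sketch. It swaps the paper's transformer-based tabular LM for a smoothed bigram model, and the action names are hypothetical; only the quantity computed (average negative log-probability of each next action) carries over.

```python
import math
from collections import Counter, defaultdict

def fit_bigram(sequences):
    """Count next-action frequencies for every preceding action."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, vocab

def cross_entropy_bits(seq, counts, vocab):
    """Average -log2 p(next | prev) over the sequence, with add-one smoothing."""
    total = 0.0
    steps = 0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts[prev]
        p = (row[nxt] + 1) / (sum(row.values()) + len(vocab))
        total -= math.log2(p)
        steps += 1
    return total / steps if steps else 0.0

# Hypothetical audit-log sessions (action names are illustrative only).
training = [["login", "chart", "note", "sign"]] * 20
counts, vocab = fit_bigram(training)
routine = ["login", "chart", "note", "sign"]
erratic = ["sign", "login", "note", "chart"]
print(cross_entropy_bits(routine, counts, vocab))
print(cross_entropy_bits(erratic, counts, vocab))
```

A session that follows the usual order scores fewer bits per action than one that jumps around, which is the sense of "disorderedness" used in the abstract.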

# 証明可能訓練可能な回転同値量子機械学習

Provably Trainable Rotationally Equivariant Quantum Machine Learning ( http://arxiv.org/abs/2311.05873v2 )

ライセンス: Link先を確認

Maxwell T. West, Jamie Heredge, Martin Sevior and Muhammad Usman

(参考訳) 優れた機械学習アルゴリズムを実現するために量子計算のパワーを爆発させることは、近年では大きな研究の焦点となっているが、量子機械学習(QML)の展望は、かなりの技術的課題によって低下している。特に重要な問題は、一般的なQMLモデルは、トレーニングランドスケープにおいていわゆる不毛の台地に悩まされていることだ。この効果に対抗するための主要な戦略は、ヒルベルト空間のより小さく関連する部分集合に集中するために、データの対称性を考慮した問題固有のモデルを構築することである。本研究では、量子フーリエ変換に基づいて構築された回転同変QMLモデルの族を導入し、リー代数的なQMLモデルの最近の知見を活用し、我々のモデルのサブセットがバレンプラトーを示さないことを示す。解析結果に加えて, シリコン中のリン不純物の模擬走査トンネル顕微鏡画像のデータセット上で, 回転対称性が自然に生じる場合の回転同変モデルを数値的に検証し, それらが実用上劇的に向上していることを見出した。

Exploiting the power of quantum computation to realise superior machine learning algorithms has been a major research focus of recent years, but the prospects of quantum machine learning (QML) remain dampened by considerable technical challenges. A particularly significant issue is that generic QML models suffer from so-called barren plateaus in their training landscapes -- large regions where cost function gradients vanish exponentially in the number of qubits employed, rendering large models effectively untrainable. A leading strategy for combating this effect is to build problem-specific models which take into account the symmetries of their data in order to focus on a smaller, relevant subset of Hilbert space. In this work, we introduce a family of rotationally equivariant QML models built upon the quantum Fourier transform, and leverage recent insights from the Lie-algebraic study of QML models to prove that (a subset of) our models do not exhibit barren plateaus. In addition to our analytical results, we numerically test our rotationally equivariant models on a dataset of simulated scanning tunnelling microscope images of phosphorus impurities in silicon, where rotational symmetry naturally arises, and find that they dramatically outperform their generic counterparts in practice.

翻訳日:2023-11-30 13:07:14 公開日:2023-11-26

# 非定常テスト時間適応のための層間自動重み付け

Layer-wise Auto-Weighting for Non-Stationary Test-Time Adaptation ( http://arxiv.org/abs/2311.05858v3 )

ライセンス: Link先を確認

Junyoung Park, Jin Kim, Hyeongjun Kwon, Ilhoon Yoon, Kwanghoon Sohn

(参考訳) 実世界のアプリケーションにおける推論中のドメインシフトの必然性を考えると、テスト時間適応(TTA)はデプロイ後のモデル適応に不可欠である。しかし、目標分布を継続的に変化させる現実のシナリオは、破滅的な忘れ込みやエラーの蓄積といった課題を呈している。非定常領域シフトのための既存のTTAメソッドは、有効ではあるが過剰な計算負荷を発生させ、デバイス上の設定では実用的ではない。本稿では,保存や集中的適応のための層を自律的に識別する連続的および漸進的ttaの自動重み付けアルゴリズムを提案する。 fim(fisher information matrix)を活用することで,まず学習重みを設計,無関係なものを保存しつつ,ログライクな変化に関連するレイヤを選択的に重視する。そこで我々はさらに,特定の層をほぼ凍結させる指数的min-maxスケーラを提案する。これにより、忘れとエラーの蓄積を最小限に抑え、非定常目標分布に効率よく適応する。 CIFAR-10C, CIFAR-100C, ImageNet-C を用いた実験により,本手法は従来の連続的および漸進的TTA手法より優れ, 計算負荷を著しく低減し, 連続的あるいは漸進的な目標領域への適応におけるFIMベースの学習重みの重要性を強調した。

Given the inevitability of domain shifts during inference in real-world applications, test-time adaptation (TTA) is essential for model adaptation after deployment. However, the real-world scenario of continuously changing target distributions presents challenges including catastrophic forgetting and error accumulation. Existing TTA methods for non-stationary domain shifts, while effective, incur excessive computational load, making them impractical for on-device settings. In this paper, we introduce a layer-wise auto-weighting algorithm for continual and gradual TTA that autonomously identifies layers for preservation or concentrated adaptation. By leveraging the Fisher Information Matrix (FIM), we first design the learning weight to selectively focus on layers associated with log-likelihood changes while preserving unrelated ones. Then, we further propose an exponential min-max scaler to make certain layers nearly frozen while mitigating outliers. This minimizes forgetting and error accumulation, leading to efficient adaptation to non-stationary target distribution. Experiments on CIFAR-10C, CIFAR-100C, and ImageNet-C show our method outperforms conventional continual and gradual TTA approaches while significantly reducing computational load, highlighting the importance of FIM-based learning weight in adapting to continuously or gradually shifting target domains.

翻訳日:2023-11-30 13:06:52 公開日:2023-11-26
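
The layer-wise weighting described above can be pictured with a small sketch. The abstract does not give the exact formulas, so everything here is an assumption: the per-layer score is a diagonal empirical Fisher estimate (mean squared gradient), and the "exponential min-max scaler" is approximated by min-max normalisation raised to a hypothetical power `tau`.

```python
def layer_fisher_scores(layer_grads):
    """Diagonal empirical Fisher per layer: mean of squared gradient entries."""
    return {name: sum(g * g for g in grads) / len(grads)
            for name, grads in layer_grads.items()}

def exponential_min_max(scores, tau=2.0):
    """Map scores to [0, 1]; the exponent pushes low-scoring layers toward 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {name: 1.0 for name in scores}
    return {name: ((s - lo) / span) ** tau for name, s in scores.items()}

# Hypothetical flattened gradients for three layers.
grads = {"conv1": [0.1, 0.1], "conv2": [1.0, 1.0], "fc": [0.5, 0.5]}
weights = exponential_min_max(layer_fisher_scores(grads))
print(weights)
```

In this sketch each weight would multiply that layer's learning rate, so a layer with weight near 0 is effectively frozen while the highest-scoring layer adapts at the full rate.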

# Mirror: さまざまな情報抽出タスクのためのユニバーサルフレームワーク

Mirror: A Universal Framework for Various Information Extraction Tasks ( http://arxiv.org/abs/2311.05419v2 )

ライセンス: Link先を確認

Tong Zhu, Junfei Ren, Zijian Yu, Mengsong Wu, Guoliang Zhang, Xiaoye Qu, Wenliang Chen, Zhefeng Wang, Baoxing Huai, Min Zhang

(参考訳) 情報抽出タスク間の知識の共有は、さまざまなデータフォーマットとタスクのバリエーションのため、常に課題となっている。一方、この分散は情報の無駄を招き、実際のシナリオにおける複雑なアプリケーション構築の困難を増す。最近の研究は、しばしば三重項抽出問題としてIEタスクを定式化している。しかし、そのようなパラダイムはマルチスパンとn-ary抽出をサポートしておらず、弱い汎用性をもたらす。この目的のために、我々はIE問題を統一されたマルチスロットタプルに再編成し、様々なIEタスク、すなわちMirrorのための普遍的なフレームワークを提案する。具体的には、既存のIEタスクをマルチスパン循環グラフ抽出問題として再キャストし、非自己回帰グラフ復号アルゴリズムを考案し、すべてのスパンを1ステップで抽出する。このグラフ構造は驚くほど汎用性があり、複雑なIEタスクだけでなく、機械読み取りの理解や分類タスクもサポートしています。モデル事前学習のための57のデータセットを含むコーパスを手動で構築し、8つの下流タスクにわたる30のデータセットで実験を行う。実験結果から,本モデルは良好な互換性を示し,少数ショットおよびゼロショットの設定でSOTAシステムを上回るか、競合する性能を示した。コード、モデルの重み付け、事前トレーニングコーパスは https://github.com/Spico197/Mirror で入手できる。

Sharing knowledge between information extraction tasks has always been a challenge due to the diverse data formats and task variations. Meanwhile, this divergence leads to information waste and increases difficulties in building complex applications in real scenarios. Recent studies often formulate IE tasks as a triplet extraction problem. However, such a paradigm does not support multi-span and n-ary extraction, leading to weak versatility. To this end, we reorganize IE problems into unified multi-slot tuples and propose a universal framework for various IE tasks, namely Mirror. Specifically, we recast existing IE tasks as a multi-span cyclic graph extraction problem and devise a non-autoregressive graph decoding algorithm to extract all spans in a single step. It is worth noting that this graph structure is incredibly versatile, and it supports not only complex IE tasks, but also machine reading comprehension and classification tasks. We manually construct a corpus containing 57 datasets for model pretraining, and conduct experiments on 30 datasets across 8 downstream tasks. The experimental results demonstrate that our model has decent compatibility and outperforms or reaches competitive performance with SOTA systems under few-shot and zero-shot settings. The code, model weights, and pretraining corpus are available at https://github.com/Spico197/Mirror .

翻訳日:2023-11-30 13:05:05 公開日:2023-11-26

# 抽象・推論課題におけるヒト, GPT-4, GPT-4Vの比較

Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks ( http://arxiv.org/abs/2311.09247v2 )

ライセンス: Link先を確認

Melanie Mitchell, Alessandro B. Palmarini, Arseny Moskvichev

(参考訳) GPT-4のテキストのみおよびマルチモーダル版の抽象的推論能力について,コア知識の概念による堅牢な理解と推論の評価を目的としたConceptARCベンチマーク[10]を用いて検討する。我々はmoskvichevらの仕事を拡大する。 [10]概念ARCタスクのテキストバージョンでGPT-4をより詳細に評価し(単純なゼロショットプロンプトではなく)、最も単純なタスクの画像バージョンを用いてGPT-4のマルチモーダルバージョンであるGPT-4Vを評価する。実験結果から,GPT-4のどちらのバージョンも人間に近いレベルで頑健な抽象化能力を開発していないという結論が得られた。

We explore the abstract reasoning abilities of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark [10], which is designed to evaluate robust understanding and reasoning with core-knowledge concepts. We extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed, one-shot prompting (rather than simple, zero-shot prompts) with text versions of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4, on zero- and one-shot prompts using image versions of the simplest tasks. Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.

翻訳日:2023-11-30 12:56:24 公開日:2023-11-26

# 微粒化エンタングルメントの精製

Tetrationally Compact Entanglement Purification ( http://arxiv.org/abs/2311.10971v2 )

ライセンス: Link先を確認

Craig Gidney

(参考訳) 本論文は, 絡み合いの共有に使用される量子チャネルにのみノイズの源が存在することを前提として, 絡み合いをごくわずかなストレージで純化できることを示す。目標の不忠実度 $\epsilon$ を持つ絡み合ったペアは、$O(\log^{\ast} \frac{1}{\epsilon})$ のストレージ空間を使って $\tilde{O}(\log \frac{1}{\epsilon})$ 時間で作成することができる。ここで $\log^{\ast}$ は反復対数である。これは、エラー検出の複数のステージを使用し、各ステージ内でブースティングを行うことで達成される。例えば、11キュービットのノイズのないストレージは、不忠実度 $1/3$ のエンタングルメントを不忠実度 $10^{-1000000000000000000000000000}$ のエンタングルメントに変換するのに十分であることを示している。

This paper shows that entanglement can be purified using very little storage, assuming the only source of noise is in the quantum channel being used to share the entanglement. Entangled pairs with a target infidelity of $\epsilon$ can be created in $\tilde{O}(\log \frac{1}{\epsilon})$ time using $O(\log^{\ast} \frac{1}{\epsilon})$ storage space, where $\log^{\ast}$ is the iterated logarithm. This is achieved by using multiple stages of error detection, with boosting within each stage. For example, the paper shows that 11 qubits of noiseless storage is enough to turn entanglement with an infidelity of $1/3$ into entanglement with an infidelity of $10^{-1000000000000000000000000000}$.

翻訳日:2023-11-30 12:45:48 公開日:2023-11-26
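
The storage bound above is stated in terms of the iterated logarithm $\log^{\ast}$, which grows extremely slowly. The sketch below computes it with base-2 logarithms, a common convention that the abstract does not spell out; the helper for huge arguments and the example exponent are illustrative assumptions.

```python
import math

def log_star(x):
    """Number of times log2 must be applied before the value drops to <= 1."""
    count = 0
    while x > 1.0:
        x = math.log2(x)
        count += 1
    return count

def log_star_from_log10(log10_x):
    """log* of x = 10**log10_x, for x far too large to store as a float."""
    if log10_x <= 0:
        return 0
    first = log10_x * math.log2(10)  # result of the first log2 application
    return 1 + log_star(first)

print(log_star(16))
print(log_star(65536))
# 1/epsilon = 10**(10**27) is an illustrative value only.
print(log_star_from_log10(1e27))
```

Even for an inverse infidelity with an astronomically large exponent, the iterated logarithm stays in single digits, which gives some intuition for why a handful of storage qubits can suffice.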

# ケースリポジトリ:aiアライメントのためのケースベース推論に向けて

Case Repositories: Towards Case-Based Reasoning for AI Alignment ( http://arxiv.org/abs/2311.10934v3 )

ライセンス: Link先を確認

K. J. Kevin Feng, Quan Ze Chen, Inyoung Cheong, King Xia, Amy X. Zhang

(参考訳) ケーススタディは一般的に、法、倫理、その他の多くの領域において、人間の価値観によって知らされる複雑で曖昧な社会的問題に直面している。 aiが実際にどのように連携すべきかを考えると、同じような複雑さと曖昧さが生まれます。異なる個人やコミュニティの多様な(そして時には矛盾する)価値に直面するとき、その価値はaiと一致し、aiはどうすればよいのか? ケースベース推論(CBR)の考え方を基礎として,一組の事例に基づく判断による政策構築に焦点を当てた,立憲AIアライメントのための補完的アプローチを提案する。このようなケースリポジトリを組み立てるプロセスを示します。 1) 'seed'' ケースのセットの収集 -- ai システムに質問する可能性のある質問 -- 特定のドメインにおいて。 2【ドメインの専門家とのワークショップによるケースのドメイン固有のキーディメンジョンの抽出】 3) LLM を用いて野生で見られない症例のバリエーションを発生させ, 4) 事件の審理及び改善を公に行うこと。次に、このようなケースリポジトリがaiアライメントにどのように役立つかについて議論し、受け入れ可能な行動の先例として直接行動し、個人やコミュニティがaiの倫理的推論に携わる媒体としての役割を論じる。

Case studies commonly form the pedagogical backbone in law, ethics, and many other domains that face complex and ambiguous societal questions informed by human values. Similar complexities and ambiguities arise when we consider how AI should be aligned in practice: when faced with vast quantities of diverse (and sometimes conflicting) values from different individuals and communities, with whose values is AI to align, and how should AI do so? We propose a complementary approach to constitutional AI alignment, grounded in ideas from case-based reasoning (CBR), that focuses on the construction of policies through judgments on a set of cases. We present a process to assemble such a case repository by: 1) gathering a set of ``seed'' cases -- questions one may ask an AI system -- in a particular domain, 2) eliciting domain-specific key dimensions for cases through workshops with domain experts, 3) using LLMs to generate variations of cases not seen in the wild, and 4) engaging with the public to judge and improve cases. We then discuss how such a case repository could assist in AI alignment, both through directly acting as precedents to ground acceptable behaviors, and as a medium for individuals and communities to engage in moral reasoning around AI.

翻訳日:2023-11-30 12:45:25 公開日:2023-11-26

# OCT2 Confocal: 3D CycleGANによる網膜OCT画像の共焦点顕微鏡への変換

OCT2Confocal: 3D CycleGAN based Translation of Retinal OCT Images to Confocal Microscopy ( http://arxiv.org/abs/2311.10902v2 )

ライセンス: Link先を確認

Xin Tian, Nantheera Anantrasirichai, Lindsay Nicholson, Alin Achim

(参考訳) 光コヒーレンス断層撮影(oct)と共焦点顕微鏡は網膜イメージングにおいて重要な役割を果たす。 in vivo octは高速で非侵襲的なイメージングを提供するが、明快な問題やモーションアーティファクトによって妨げられる。生体内共焦点顕微鏡は高解像度の細胞色像を提供するが、侵襲的であり、倫理的懸念と潜在的な組織損傷をもたらす。これらのモダリティを橋渡しするために,生体共焦点顕微鏡画像へのOCTの教師なし翻訳のための3D CycleGANフレームワークを開発した。 OCT2Confocalのデータセットに適用すると、このフレームワークは3Dの医療データドメイン間で効果的に翻訳され、血管、テクスチャ、細胞の詳細を精度良くキャプチャする。これは、octの固有の3d情報を活用し、共焦点顕微鏡のリッチで詳細な色領域に変換する最初の試みである。 3D CycleGANフレームワークは、量的および質的なメトリクスを通じて評価され、圧縮可能な画像の忠実さと品質を示し、制限されたデータの制約にもかかわらず既存の手法より優れている。この非侵襲的な網膜共焦点画像の生成は、眼科における診断とモニタリング機能をさらに強化する可能性がある。

Optical coherence tomography (OCT) and confocal microscopy are pivotal in retinal imaging, each presenting unique benefits and limitations. In vivo OCT offers rapid, non-invasive imaging but can be hampered by clarity issues and motion artifacts. Ex vivo confocal microscopy provides high-resolution, cellular detailed color images but is invasive and poses ethical concerns and potential tissue damage. To bridge these modalities, we developed a 3D CycleGAN framework for unsupervised translation of in vivo OCT to ex vivo confocal microscopy images. Applied to our OCT2Confocal dataset, this framework effectively translates between 3D medical data domains, capturing vascular, textural, and cellular details with precision. This marks the first attempt to exploit the inherent 3D information of OCT and translate it into the rich, detailed color domain of confocal microscopy. Assessed through quantitative and qualitative metrics, the 3D CycleGAN framework demonstrates commendable image fidelity and quality, outperforming existing methods despite the constraints of limited data. This non-invasive generation of retinal confocal images has the potential to further enhance diagnostic and monitoring capabilities in ophthalmology.

翻訳日:2023-11-30 12:45:01 公開日:2023-11-26

# 信頼できる大規模ビジョンモデル:サーベイ

Trustworthy Large Models in Vision: A Survey ( http://arxiv.org/abs/2311.09680v3 )

ライセンス: Link先を確認

Ziyan Guo and Li Xu and Jun Liu

(参考訳) 大規模モデル(LM)の急速な進歩は、最近、自然言語処理(NLP)からコンピュータビジョン(CV)まで、様々な分野の深層学習に革命をもたらした。しかし、LMは強力な性能を持つが信頼できない行動のため、学界や業界によってますます批判され、信頼性の高い方法によって緊急に緩和される必要がある。 NLPにおける信頼できるLMに関する文献が豊富にあるにもかかわらず、CVにおけるLMの信頼性を特に調査する体系的な調査はいまだに残っていない。このギャップを緩和するために,本調査におけるlmsの視点における信頼に値する利用を妨げる4つの懸念を要約する。 1)人間の誤用。 2)脆弱性。 3)本質的な問題 4) 解釈可能。それぞれの課題、対策、議論を強調することにより、この調査が読者のこの分野に対する理解を促進し、LMと人間の期待との整合を促進し、人類社会の災害というよりは、信頼できるLMを福祉として機能させることを期待する。

The rapid progress of Large Models (LMs) has recently revolutionized various fields of deep learning with remarkable results, ranging from Natural Language Processing (NLP) to Computer Vision (CV). However, LMs are increasingly challenged and criticized by academia and industry because their powerful performance comes with untrustworthy behavior, which urgently needs to be alleviated by reliable methods. Despite the abundance of literature on trustworthy LMs in NLP, a systematic survey specifically delving into the trustworthiness of LMs in CV remains absent. To close this gap, in this survey we summarize four relevant concerns that obstruct the trustworthy usage of LMs in vision: 1) human misuse, 2) vulnerability, 3) inherent issues and 4) interpretability. By highlighting the corresponding challenges, countermeasures, and discussion for each topic, we hope this survey will facilitate readers' understanding of this field, promote alignment of LMs with human expectations and enable trustworthy LMs to serve as welfare rather than disaster for human society.

翻訳日:2023-11-30 12:41:12 公開日:2023-11-26

# 光フローのないビデオフレーム補間のためのマルチインシングルアウトネットワーク

A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow ( http://arxiv.org/abs/2311.11602v2 )

ライセンス: Link先を確認

Jaemin Lee, Minseok Seo, Sangwoo Lee, Hyobin Park, Dong-Geol Choi

(参考訳) 一般に、深層学習に基づくビデオフレーム補間(vfi)法は、主に2つの入力フレーム間の動きベクトルを推定し、それを目標時間にゆがめることに焦点を当てている。このアプローチは2つの入力フレーム間の線形運動に対して顕著な性能を示すが、オクルージョンや非線形運動を扱う際の限界を示す。近年,これらの問題に対処するための生成モデルがVFIに適用されている。しかしながら、VFIは可塑性画像の生成に重点を置いているのではなく、与えられた2つのフレーム間の正確な中間フレームの予測に重点を置いているため、性能制限は継続する。本稿では,動作ベクトル推定に依存しないマルチインシングルアウト(MISO)に基づくVFI手法を提案し,オクルージョンと非線形動作を効果的にモデル化する。さらに,MISO-VFIによりビデオフレーム内の時空間相関をよりよく捉えることができる新しい動き知覚損失を導入する。 MISO-VFI法は,VFIベンチマークのVimeo90K,Middlebury,UCF101において,既存手法と比較して高い性能差を示した。

In general, deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion vectors between two input frames and warping them to the target time. While this approach has shown impressive performance for linear motion between two input frames, it exhibits limitations when dealing with occlusions and nonlinear movements. Recently, generative models have been applied to VFI to address these issues. However, as VFI is not a task focused on generating plausible images, but rather on predicting accurate intermediate frames between two given frames, performance limitations still persist. In this paper, we propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation, allowing it to effectively model occlusions and nonlinear motion. Additionally, we introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames. Our MISO-VFI method achieves state-of-the-art results on VFI benchmarks Vimeo90K, Middlebury, and UCF101, with a significant performance gap compared to existing approaches.

翻訳日:2023-11-30 12:31:46 公開日:2023-11-26

# akconv: 任意のサンプル形状と任意の数のパラメータを持つ畳み込みカーネル

AKConv: Convolutional Kernel with Arbitrary Sampled Shapes and Arbitrary Number of Parameters ( http://arxiv.org/abs/2311.11587v2 )

ライセンス: Link先を確認

Xin Zhang, Yingze Song, Tingting Song, Degang Yang, Yichen Ye, Jie Zhou and Liming Zhang

(参考訳) 畳み込み操作に基づくニューラルネットワークは、ディープラーニングの分野で顕著な成果を上げているが、標準的な畳み込み操作には2つの固有の欠陥がある。一方、畳み込み操作はローカルウィンドウに制限され、他の場所からの情報をキャプチャできないため、サンプリングされた形状が固定される。一方、畳み込み核のサイズは k$\times$ k に固定されており、これは固定された正方形であり、パラメータの数はサイズとともに正方形に増加する傾向にある。ターゲットの形状とサイズが、異なるデータセットや異なる場所で異なることは明らかである。固定されたサンプル形状と正方形を持つ畳み込みカーネルは、ターゲットの変化にうまく適応しない。上記の質問に応えて、Alterable Kernel Convolution (AKConv) が本研究で検討され、畳み込みカーネルに任意の数のパラメータと任意のサンプル形状を与え、ネットワークオーバヘッドとパフォーマンスのトレードオフのためのよりリッチなオプションを提供する。 AKConvでは、新しい座標生成アルゴリズムを用いて任意の大きさの畳み込みカーネルの初期位置を定義する。ターゲットの変化に適応するため,各位置におけるサンプルの形状を調整するためのオフセットを導入する。さらに、同じ大きさと異なる初期サンプル形状のAKConvを用いてニューラルネットワークの効果について検討する。 AKConvは、不規則な畳み込み操作による効率的な特徴抽出のプロセスを完了し、畳み込みサンプリング形状に対するさらなる探索オプションを提供する。代表的なデータセットCOCO2017、VOC 7+12、VisDrone-DET2021のオブジェクト検出実験は、AKConvの利点を十分に証明している。 AKConvは、ネットワーク性能を改善するために畳み込み操作を置き換えるためのプラグアンドプレイ畳み込み操作として使用できる。関連するタスクのコードはhttps://github.com/CV-ZhangXin/AKConvで確認できる。

Neural networks based on convolutional operations have achieved remarkable results in the field of deep learning, but standard convolutional operations have two inherent flaws. On the one hand, the convolution operation is confined to a local window and cannot capture information from other locations, and its sampled shape is fixed. On the other hand, the size of the convolutional kernel is fixed to k $\times$ k, which is a fixed square shape, and the number of parameters tends to grow quadratically with size. The shape and size of targets clearly vary across datasets and across locations. Convolutional kernels with fixed sample shapes and square layouts do not adapt well to changing targets. In response to these problems, the Alterable Kernel Convolution (AKConv) is explored in this work, which gives the convolution kernel an arbitrary number of parameters and arbitrary sampled shapes to provide richer options for the trade-off between network overhead and performance. In AKConv, we define initial positions for convolutional kernels of arbitrary size by means of a new coordinate generation algorithm. To adapt to changes in targets, we introduce offsets to adjust the shape of the samples at each position. Moreover, we explore the effect on the neural network of using AKConv with the same size and different initial sampled shapes. AKConv completes the process of efficient feature extraction by irregular convolutional operations and brings more exploration options for convolutional sampling shapes. Object detection experiments on representative datasets COCO2017, VOC 7+12 and VisDrone-DET2021 fully demonstrate the advantages of AKConv. AKConv can be used as a plug-and-play convolutional operation to replace convolutional operations to improve network performance. The code for the relevant tasks can be found at https://github.com/CV-ZhangXin/AKConv.

翻訳日:2023-11-30 12:31:25 公開日:2023-11-26
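
One way to picture initial sampling positions for a kernel with an arbitrary number of parameters is sketched below. The layout rule (a near-square grid followed by a partial row) is an assumption made for illustration; the abstract only says that a coordinate generation algorithm exists, and the learned offsets are omitted entirely.

```python
import math

def initial_sample_coords(num_param):
    """Lay out num_param sampling points as a near-square grid plus a partial row."""
    if num_param < 1:
        raise ValueError("num_param must be at least 1")
    base = round(math.sqrt(num_param))
    rows, remainder = divmod(num_param, base)
    coords = [(r, c) for r in range(rows) for c in range(base)]
    coords += [(rows, c) for c in range(remainder)]
    return coords

for n in (3, 5, 9):
    print(n, initial_sample_coords(n))
```

With this layout the parameter count can take any value, such as 5 or 7, instead of jumping from 4 to 9 to 16 as a square k $\times$ k kernel does, so the number of parameters can grow linearly rather than quadratically.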

# インドにおける新型コロナウイルスワクチンの機械学習による感受性分析

Unveiling Public Perceptions: Machine Learning-Based Sentiment Analysis of COVID-19 Vaccines in India ( http://arxiv.org/abs/2311.11435v2 )

ライセンス: Link先を確認

Milind Gupta and Abhishek Kaushik

(参考訳) 2020年3月、世界保健機関(WHO)は新型コロナウイルスの世界的な感染拡大を宣言。 2021年半ばまでに、インドはコビシエルド、コヴァクシン、スプートニクの3つのワクチンを導入した。インドのような人口密度の高い国でワクチン接種が成功するためには、大衆の感情を理解することが不可欠だった。ソーシャルメディア、特にredditは4億3000万人のユーザーを抱えており、情報を広める上で重要な役割を果たした。この研究では、Redditのデータを分析し、新型コロナウイルスワクチンに対するインド人の感情を測定するためにデータマイニング技術を採用している。 PythonのText Blobライブラリを使って、コメントは一般的な感情を評価するために注釈付けされる。結果、インドのredditユーザーのほとんどが、予防接種に関する中立性を示しており、インド政府は人口のかなりの部分を予防接種しようとしている。

In March 2020, the World Health Organisation declared COVID-19 a global pandemic as it spread to nearly every country. By mid-2021, India had introduced three vaccines: Covishield, Covaxin, and Sputnik. To ensure successful vaccination in a densely populated country like India, understanding public sentiment was crucial. Social media, particularly Reddit with over 430 million users, played a vital role in disseminating information. This study employs data mining techniques to analyze Reddit data and gauge Indian sentiments towards COVID-19 vaccines. Using Python's TextBlob library, comments are annotated to assess general sentiments. Results show that most Reddit users in India expressed neutrality about vaccination, posing a challenge for the Indian government's efforts to vaccinate a significant portion of the population.

翻訳日:2023-11-30 12:30:11 公開日:2023-11-26
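
The annotation step can be sketched as a mapping from a polarity score to a sentiment label. TextBlob reports polarity as a float in [-1, 1] through `TextBlob(text).sentiment.polarity`; to keep this example free of external dependencies, the polarity values are hard-coded, and the neutral threshold of 0.05 is an assumption rather than a value taken from the study.

```python
from collections import Counter

def label_from_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

def sentiment_distribution(polarities, threshold=0.05):
    """Share of comments falling under each label."""
    labels = Counter(label_from_polarity(p, threshold) for p in polarities)
    total = sum(labels.values())
    return {label: count / total for label, count in labels.items()}

# Hypothetical polarity scores for four comments.
print(sentiment_distribution([0.0, 0.01, 0.6, -0.4]))
```

Short factual comments tend to score near zero on a lexicon-based polarity scale, so the chosen threshold has a direct effect on how large the neutral share turns out to be.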

# LION : デュアルレベルビジュアル知識を用いたマルチモーダル大言語モデルの構築

LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ( http://arxiv.org/abs/2311.11860v2 )

ライセンス: Link先を確認

Gongwei Chen, Leyang Shen, Rui Shao, Xiang Deng, Liqiang Nie

(参考訳) MLLM(Multimodal Large Language Models)は、マルチモーダル信号の知覚と理解が可能なLLMを提供する。しかし、既存のmllmの多くは、粗い画像テキストペアに事前学習された視覚エンコーダを主に採用しており、視覚知識の抽出と推論が不十分である。この問題に対処するために,2段階の視覚的知識を注入することでMLLMを増強するデュアルレベルvIsual knOwledge eNhanced Multimodal Large Language Model (LION)を考案した。 1)細粒度空間認識視覚知識の進歩的導入我々は,領域レベルの視覚言語(VL)タスクと連携した視覚アグリゲータを設計し,細粒度空間認識視覚知識をMLLMに組み込む。組込み時の画像レベルと領域レベルのVLタスク間の衝突を軽減するため,適応の混合によるステージワイドな指導学習戦略を考案した。このプログレッシブな組み込み方式は、これらの2種類のVLタスク間の相互促進に寄与する。 2)ハイレベルな視覚的証拠のソフトプロンプト。多様な画像タグを活用することで,MLLMの高度な意味的視覚的エビデンスを実現する。予測タグの不完全による潜在的な影響を軽減するため,学習可能なトークンをテキスト命令に組み込むことにより,ソフトプロンプト手法を提案する。複数のマルチモーダルベンチマークに関する総合的な実験は、我々のモデルの優位性を示している(例:VSRでの5%精度の改善、InstructBLIP上のTextCapsでの3%CIDEr、Cosmos-2上のRefCOCOgでの5%精度)。

Multimodal Large Language Models (MLLMs) have endowed LLMs with the ability to perceive and understand multi-modal signals. However, most of the existing MLLMs mainly adopt vision encoders pretrained on coarsely aligned image-text pairs, leading to insufficient extraction and reasoning of visual knowledge. To address this issue, we devise a dual-Level vIsual knOwledge eNhanced Multimodal Large Language Model (LION), which empowers the MLLM by injecting visual knowledge in two levels. 1) Progressive incorporation of fine-grained spatial-aware visual knowledge. We design a vision aggregator cooperated with region-level vision-language (VL) tasks to incorporate fine-grained spatial-aware visual knowledge into the MLLM. To alleviate the conflict between image-level and region-level VL tasks during incorporation, we devise a dedicated stage-wise instruction-tuning strategy with mixture-of-adapters. This progressive incorporation scheme contributes to the mutual promotion between these two kinds of VL tasks. 2) Soft prompting of high-level semantic visual evidence. We facilitate the MLLM with high-level semantic visual evidence by leveraging diverse image tags. To mitigate the potential influence caused by imperfect predicted tags, we propose a soft prompting method by embedding a learnable token into the tailored text instruction. Comprehensive experiments on several multi-modal benchmarks demonstrate the superiority of our model (e.g., improvement of 5% accuracy on VSR and 3% CIDEr on TextCaps over InstructBLIP, 5% accuracy on RefCOCOg over Kosmos-2).

翻訳日:2023-11-30 12:22:32 公開日:2023-11-26

# 変分探索モジュールVEM:地理空間モデリングとAIワークフローのためのクラウドネイティブ最適化と検証ツール

Variational Exploration Module VEM: A Cloud-Native Optimization and Validation Tool for Geospatial Modeling and AI Workflows ( http://arxiv.org/abs/2311.16196v1 )

ライセンス: Link先を確認

Julian Kuehnert (1), Hiwot Tadesse (1), Chris Dearden (2), Rosie Lickorish (3), Paolo Fraccaro (3), Anne Jones (3), Blair Edwards (3), Sekou L. Remy (1), Peter Melling (4), Tim Culmer (4) ((1) IBM Research, Nairobi, Kenya, (2) STFC Hartree Centre, Warrington, UK, (3) IBM Research, Daresbury, UK, (4) Riskaware Ltd., Bristol, UK)

(参考訳) 地理空間観測と計算モデルが組み合わさって、我々の環境の物理的システムを理解し、社会的な害を軽減するためのベストプラクティスの設計を可能にしている。クラウドベースのデプロイメントは、これらのモデリングとAIワークフローのスケールアップに役立つ。しかし、実践者が堅牢な結論を出すためには、モデルチューニングとテストが不可欠であり、モデル入力変数のバリエーションを伴うリソース集約的なプロセスである。本研究では,ワークフロー実行のオーケストレーションとベイジアンおよび機械学習に基づくモデル動作解析手法を用いて,クラウドにデプロイされたモデリングワークフローの最適化と検証を容易にする変分探索モジュールを開発した。ユーザ設定は、マルチエージェント環境で多様なサンプリング戦略を組み合わせることができる。モデルに依存しないモジュールの柔軟性と堅牢性は実世界のアプリケーションを用いて実証される。

Geospatial observations combined with computational models have become key to understanding the physical systems of our environment and enable the design of best practices to reduce societal harm. Cloud-based deployments help to scale up these modeling and AI workflows. Yet, for practitioners to draw robust conclusions, model tuning and testing are crucial; this is a resource-intensive process that involves varying model input variables. We have developed the Variational Exploration Module, which facilitates the optimization and validation of modeling workflows deployed in the cloud by orchestrating workflow executions and using Bayesian and machine learning-based methods to analyze model behavior. User configurations allow the combination of diverse sampling strategies in multi-agent environments. The flexibility and robustness of the model-agnostic module are demonstrated using real-world applications.

翻訳日:2023-11-29 21:44:44 公開日:2023-11-26

# 早期・適時診断のための基礎的枠組みと方法論

A Foundational Framework and Methodology for Personalized Early and Timely Diagnosis ( http://arxiv.org/abs/2311.16195v1 )

ライセンス: Link先を確認

Tim Schubert, Richard W Peck, Alexander Gimson, Camelia Davtyan, Mihaela van der Schaar

(参考訳) 病気の早期診断は、より良い治療オプションを可能にし、長期生存と生活の質を改善し、全体的なコストを下げることで、医療の深い変革の可能性を秘めている。医療用ビッグデータの出現、診断検査の進歩、および機械学習と統計学の進歩により、早期またはタイムリーな診断がリーチ内にあるように思われる。初期の診断研究は、個々の診断経路を最適化する可能性をしばしば無視する。パーソナライズされた早期診断を実現するためには, 診断過程を明確化し, 個々の患者に対して, 診断の時間依存性の値を体系的に同定する基盤的枠組みが必要である。本稿では,早期診断とタイムリー診断のための基礎的枠組みを提案する。診断プロセスを概説する意思決定論的アプローチに基づいており、最適なパーソナライズされた診断パスを推定するために機械学習と統計的方法論を統合する。提案するフレームワークと,おそらく他のフレームワークを説明するために,本質的な定義を提供する。基礎的なフレームワークの開発は、いくつかの理由から必要です。 1)形式主義は,意思決定支援ツールの開発を明快にする。 2 観察情報は、将来の患者の軌跡の推定と相補することができる。 3)非現実的診断パスと関連する不確実性の純利益を個人にモデル化できる 4)「早期」「時期」の診断は明確に定義することができる。 5) パーソナライズされた早期診断, 健康結果, 発生コストに対する影響の観点から, 技術の価値を評価するためのメカニズムが出現する。最後に、この基盤となる枠組みが、待望のタイムリーな診断と介入の可能性を解き明かし、患者の成果を改善し、医療システムにより高い費用効果をもたらすことを期待する。

Early diagnosis of diseases holds the potential for deep transformation in healthcare by enabling better treatment options, improving long-term survival and quality of life, and reducing overall cost. With the advent of medical big data, advances in diagnostic tests as well as in machine learning and statistics, early or timely diagnosis seems within reach. Early diagnosis research often neglects the potential for optimizing individual diagnostic paths. To enable personalized early diagnosis, a foundational framework is needed that delineates the diagnosis process and systematically identifies the time-dependent value of various diagnostic tests for an individual patient given their unique characteristics. Here, we propose the first foundational framework for early and timely diagnosis. It builds on decision-theoretic approaches to outline the diagnosis process and integrates machine learning and statistical methodology for estimating the optimal personalized diagnostic path. To describe the proposed framework as well as possibly other frameworks, we provide essential definitions. The development of a foundational framework is necessary for several reasons: 1) formalism provides clarity for the development of decision support tools; 2) observed information can be complemented with estimates of the future patient trajectory; 3) the net benefit of counterfactual diagnostic paths and associated uncertainties can be modeled for individuals; 4) 'early' and 'timely' diagnosis can be clearly defined; 5) a mechanism emerges for assessing the value of technologies in terms of their impact on personalized early diagnosis, resulting health outcomes and incurred costs. Finally, we hope that this foundational framework will unlock the long-awaited potential of timely diagnosis and intervention, leading to improved outcomes for patients and higher cost-effectiveness for healthcare systems.

翻訳日:2023-11-29 21:44:31 公開日:2023-11-26

# BadCLIP:CLIPのバックドア攻撃のためのトリガー対応プロンプト学習

BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP ( http://arxiv.org/abs/2311.16194v1 )

ライセンス: Link先を確認

Jiawang Bai, Kuofeng Gao, Shaobo Min, Shu-Tao Xia, Zhifeng Li, Wei Liu

(参考訳) CLIPとして知られるコントラストビジョンランゲージ事前トレーニングは、下流の画像認識タスクに対処する上で有望な効果を示している。しかし、最近の研究により、CLIPモデルは下流指向のバックドアで埋め込むことができることが明らかになった。下流タスクでは、1つの犠牲者モデルはクリーンなサンプルでうまく機能するが、特定のトリガーが存在するたびに特定のターゲットクラスを予測する。バックドアを注入するには、既存の攻撃は、トレーニング済みのCLIPモデル全体を悪質に微調整するために、大量のデータに依存するため、データ制限のシナリオには適用できない。本研究は,学習可能なプロンプトの最近の成功に動機づけられ,プロンプト学習段階でクリップモデルにバックドアを注入することでこの問題に対処した。 BadCLIP という手法は,CLIP に対するバックドア攻撃,すなわち画像エンコーダとテキストエンコーダの両方にトリガーを作用させる,新しい効果的な機構に基づいて構築されている。画像に適用される学習可能なトリガーとトリガー対応コンテキストジェネレータで構成されており、トリガーはトリガー対応プロンプトを通じてテキスト機能を変更でき、これにより強力で一般化可能な攻撃をもたらす。 11のデータセットで実施された大規模な実験では、BadCLIPのクリーンな精度は高度な急進的な学習手法と似ており、ほとんどの場合、攻撃成功率は99%以上である。 BadCLIPはまた、目に見えないクラスにも一般化可能で、クロスデータセットとクロスドメイン設定の下で強力な一般化機能を示している。

Contrastive Vision-Language Pre-training, known as CLIP, has shown promising effectiveness in addressing downstream image recognition tasks. However, recent works revealed that the CLIP model can be implanted with a downstream-oriented backdoor. On downstream tasks, one victim model performs well on clean samples but predicts a specific target class whenever a specific trigger is present. For injecting a backdoor, existing attacks depend on a large amount of additional data to maliciously fine-tune the entire pre-trained CLIP model, which makes them inapplicable to data-limited scenarios. In this work, motivated by the recent success of learnable prompts, we address this problem by injecting a backdoor into the CLIP model in the prompt learning stage. Our method named BadCLIP is built on a novel and effective mechanism in backdoor attacks on CLIP, i.e., influencing both the image and text encoders with the trigger. It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts, resulting in a powerful and generalizable attack. Extensive experiments conducted on 11 datasets verify that the clean accuracy of BadCLIP is similar to those of advanced prompt learning methods and the attack success rate is higher than 99% in most cases. BadCLIP is also generalizable to unseen classes, and shows a strong generalization capability under cross-dataset and cross-domain settings.

翻訳日:2023-11-29 21:44:04 公開日:2023-11-26

# 人工知能における知識獲得への学生の関心

Students' interest in knowledge acquisition in Artificial Intelligence ( http://arxiv.org/abs/2311.16193v1 )

ライセンス: Link先を確認

Manuela-Andreea Petrescu, Emilia-Loredana Pop and Tudor-Dan Mihoc

(参考訳) 本研究では,人工知能コースに関する学生の期待と視点を考察し,分析した。コンピュータサイエンス専門学校に入学した200人中58人の大学生から匿名回答を得た。回答は分析され、テーマ分析を用いて解釈され、人工知能研究のトピックに関連する関心や魅力、魅力のない側面を解明した。その傾向、適用性、その主題に対する情熱と関心、将来の成長の可能性、高い給与のために、学生は人工知能に興味を持っていると結論づけた。しかし、学生の期待は主に人工知能分野における中レベルの知識の獲得に関連しており、男性は女性よりも高度なスキルの獲得に関心があるようである。学生が楽しまなかった最も一般的な部分は、人工知能で使われる数学的側面であった。その一部(小さなグループ)は、否定的な目的のために非倫理的な方法で使用できる人工知能の可能性も認識していた。また,本研究は,中等知識の習得に学生はそれほど熱心でも興味も持たず,DBの使用状況や基本情報にも関係していたデータベース・コースと比較した。

Some students' expectations and points of view related to the Artificial Intelligence course are explored and analyzed in this study. We anonymously collected answers from 58 undergraduate students out of 200 enrolled in the Computer Science specialization. The answers were analysed and interpreted using thematic analysis to find out their interests and the attractive and unattractive aspects related to the Artificial Intelligence study topic. We concluded that students are interested in Artificial Intelligence due to its trendiness, applicability, their passion and interest in the subject, the potential for future growth, and high salaries. However, the students' expectations were mainly related to achieving medium knowledge in the Artificial Intelligence field, and men seem to be more interested in acquiring high-level skills than women. The part that students most commonly did not enjoy was the mathematical aspect used in Artificial Intelligence. A small group of them were also aware that Artificial Intelligence could be used in an unethical manner for negative purposes. Our study also provides a short comparison to the Databases course, in which students were not as passionate or interested in achieving medium knowledge; their interest was related to DB usage and basic information.

翻訳日:2023-11-29 21:43:37 公開日:2023-11-26

# 軸受の余寿命予測のための複数入力自己回帰モデルの利用

Utilizing Multiple Inputs Autoregressive Models for Bearing Remaining Useful Life Prediction ( http://arxiv.org/abs/2311.16192v1 )

ライセンス: Link先を確認

Junliang Wang, Qinghua Zhang, Guanhua Zhu, Guoxi Sun

(参考訳) 転がり軸受の余寿命(RUL)の正確な予測は工業生産において重要であるが、既存のモデルはすべての振動信号パターンを完全に処理できないため、限られた一般化能力に苦慮することが多い。軸受のRUL予測において,この課題に対処する新しい多入力自己回帰モデルを提案する。提案手法は, 振動信号と過去に予測されたHealth Indicator (HI) 値を一意に統合し, 特徴融合を利用して現在の窓の HI 値を出力する。自己回帰反復により、モデルはグローバルな受容場を獲得し、一般化の限界を効果的に克服する。さらに,自己回帰モデルにおける誤りの蓄積を軽減するために,セグメント化手法と複数のトレーニングイテレーションを革新的に取り入れた。 PHM2012データセットの実証評価では, 同様の自己回帰アプローチを用いた他のバックボーンネットワークと比較して, 二乗平均平方根誤差(RMSE)とスコアが有意に低いことが示されている。特に、ラベル値を入力として使用する従来の自己回帰モデルや非自己回帰的ネットワークよりも優れており、RMSEとScoreの指標において顕著なリードを持つ優れた一般化能力を示している。

Accurate prediction of the Remaining Useful Life (RUL) of rolling bearings is crucial in industrial production, yet existing models often struggle with limited generalization capabilities due to their inability to fully process all vibration signal patterns. We introduce a novel multi-input autoregressive model to address this challenge in RUL prediction for bearings. Our approach uniquely integrates vibration signals with previously predicted Health Indicator (HI) values, employing feature fusion to output current window HI values. Through autoregressive iterations, the model attains a global receptive field, effectively overcoming the limitations in generalization. Furthermore, we innovatively incorporate a segmentation method and multiple training iterations to mitigate error accumulation in autoregressive models. Empirical evaluation on the PHM2012 dataset demonstrates that our model, compared to other backbone networks using similar autoregressive approaches, achieves significantly lower Root Mean Square Error (RMSE) and Score. Notably, it outperforms traditional autoregressive models that use label values as inputs and non-autoregressive networks, showing superior generalization abilities with a marked lead in RMSE and Score metrics.
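
The autoregressive loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' network: `rollout_health_indicator`, `toy_step`, and the `history` parameter are hypothetical names, and the learned feature-fusion model is replaced by a hand-written blend so the example stays self-contained.

```python
import math

def rollout_health_indicator(windows, predict_step, initial_hi=0.0, history=3):
    """Autoregressive rollout: each step sees the current vibration window
    plus the previously *predicted* HI values (not ground-truth labels)."""
    predicted = []
    for window in windows:
        # Pad with initial_hi until enough earlier predictions exist.
        past = ([initial_hi] * history + predicted)[-history:]
        predicted.append(predict_step(window, past))
    return predicted

def rmse(predictions, targets):
    """Root Mean Square Error between two equally long sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))

def toy_step(window, past_hi):
    """Hypothetical stand-in for the fused network: blends a window
    feature (mean absolute amplitude) with the last predicted HI."""
    feature = sum(abs(x) for x in window) / len(window)
    return 0.5 * feature + 0.5 * past_hi[-1]

if __name__ == "__main__":
    windows = [[1.0, -1.0], [1.0, -1.0], [2.0, -2.0]]
    hi = rollout_health_indicator(windows, toy_step)
    print(hi, rmse(hi, [0.5, 0.75, 1.5]))
```

Because every prediction feeds the next step, an early error propagates forward, which is the accumulation problem the abstract says the segmentation method and repeated training iterations are meant to mitigate.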

翻訳日:2023-11-29 21:43:19 公開日:2023-11-26

# MACE:周波数領域における多重パターン調整および効率的な異常検出手法

MACE: A Multi-pattern Accommodated and Efficient Anomaly Detection Method in the Frequency Domain ( http://arxiv.org/abs/2311.16191v1 )

ライセンス: Link先を確認

Feiyi Chen, Yingying zhang, Zhen Qin, Lunting Fan, Renhe Jiang, Yuxuan Liang, Qingsong Wen, Shuiguang Deng

(参考訳) 異常検出は、クラウドシステムの堅牢性を大幅に向上させる。ニューラルネットワークベースの手法は、最近、強力なアドバンテージを示しているが、クラウド環境では実用的な課題に直面している。各サービスに対するユニークなモデルを維持することの非現実性と、統一モデルによる多様な正常なパターンを扱う能力の制限との矛盾、およびリアルタイムでの大量トラフィック処理や短期異常の検出感度の問題だ。そこで本研究では、時系列異常検出のための周波数領域におけるマルチパターン調整および効率的な異常検出手法であるMACEを提案する。そこには3つの新しい特徴がある。 (i)多様な正常パターンの扱いに優れるパターン抽出機構は、データサンプル自体にのみ注目するのではなく、データサンプルとサービス正常パターンとの相関を調べることにより、異常を識別することができる。 (ii)時間領域における短期異常を増幅し、周波数領域における異常の再構成を阻害する双対的畳み込み機構で、異常と正常との再構成誤差の差を増大させ、異常検出を容易にする。 (iii)周波数領域のスパーシティと並列性を利用して、モデル効率を向上させる。フーリエ基底の戦略的に選択された部分集合を使うことは、完全なスペクトルを使う場合と比べて、計算オーバーヘッドを減少させるだけでなく、異常の区別にも有益であることを理論的および実験的に証明した。さらに、多種多様な正常パターンを統一モデルで処理し、最先端の性能を高い効率で実現するMACEの有効性を広範な実験により示す。

Anomaly detection significantly enhances the robustness of cloud systems. While neural network-based methods have recently demonstrated strong advantages, they encounter practical challenges in cloud environments: the contradiction between the impracticality of maintaining a unique model for each service and the limited ability of a unified model to deal with diverse normal patterns, as well as issues with handling heavy traffic in real time and sensitivity to short-term anomalies. Thus, we propose MACE, a Multi-pattern Accommodated and efficient Anomaly detection method in the frequency domain for time series anomaly detection. It has three novel characteristics: (i) a pattern extraction mechanism excelling at handling diverse normal patterns, which enables the model to identify anomalies by examining the correlation between the data sample and its service normal pattern, instead of solely focusing on the data sample itself; (ii) a dualistic convolution mechanism that amplifies short-term anomalies in the time domain and hinders the reconstruction of anomalies in the frequency domain, which enlarges the reconstruction error disparity between anomaly and normality and facilitates anomaly detection; (iii) leveraging the sparsity and parallelism of the frequency domain to enhance model efficiency. We theoretically and experimentally prove that using a strategically selected subset of Fourier bases not only reduces computational overhead but also helps distinguish anomalies, compared to using the complete spectrum. Moreover, extensive experiments demonstrate MACE's effectiveness in handling diverse normal patterns with a unified model, achieving state-of-the-art performance with high efficiency.
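
The claim that a selected subset of Fourier bases can separate anomalies from normal behaviour can be illustrated with a toy reconstruction. The sketch below is an assumption-laden simplification, not MACE itself: it uses a plain DFT, the kept bins are chosen by hand rather than learned from a service's normal pattern, and the function names are hypothetical.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def reconstruct_from_subset(signal, kept_bins):
    """Inverse DFT that uses only the kept frequency bins (others are zeroed)."""
    n = len(signal)
    spectrum = dft(signal)
    kept = set(kept_bins)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in kept).real / n
        for t in range(n)
    ]

def reconstruction_errors(signal, kept_bins):
    """Point-wise absolute error between the signal and its reconstruction."""
    recon = reconstruct_from_subset(signal, kept_bins)
    return [abs(s - r) for s, r in zip(signal, recon)]

if __name__ == "__main__":
    n = 32
    normal = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
    spiky = list(normal)
    spiky[10] += 5.0  # inject a short-term anomaly
    errors = reconstruction_errors(spiky, [2, n - 2])
    print(errors.index(max(errors)))  # position of the largest error
```

The normal sinusoid lives entirely in bins 2 and n-2, so it is reconstructed almost exactly, while the spike spreads its energy over all bins and is mostly lost. The reconstruction error therefore peaks at the injected anomaly.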

翻訳日:2023-11-29 21:42:57 公開日:2023-11-26

# Q-Pilot:フライングアンシラを用いたフィールドプログラマブル量子アレイコンパイル

Q-Pilot: Field Programmable Quantum Array Compilation with Flying Ancillas ( http://arxiv.org/abs/2311.16190v1 )

ライセンス: Link先を確認

Hanrui Wang and Bochen Tan and Pengyu Liu and Yilian Liu and Jiaqi Gu and Jason Cong and Song Han

(参考訳) 中性原子配列は量子コンピューティングにとって有望なプラットフォームとなり、特に原子移動というユニークな能力を持つフィールドプログラマブル量子ビットアレイ(field programmable qubit array, FPQA)が注目されている。この機能は実行中のqubit接続の動的変更を可能にし、長距離ゲートの実行コストを削減し、並列性を改善する。しかし、この柔軟性の追加は、回路コンパイルに新たな課題をもたらす。 FPGAの配置とルーティング戦略に着想を得て,データキュービット間の2キュービットゲートのルーティングに可動原子を用いながら,すべてのデータキュービットを固定原子にマッピングすることを提案する。フライングアンシラ(flying ancillas)と名付けたこれらの移動原子は、ancilla qubitsとして機能し、実行中に動的に生成され、リサイクルされる。本稿では,回路の並列性を最大化するためにフライングアンシラを用いるFPQA用スケーラブルコンパイラQ-Pilotについて述べる。量子シミュレーションと量子近似最適化アルゴリズム(QAOA)という2つの重要な量子応用について、ドメイン固有のルーティング戦略を考案する。超伝導デバイスや固定原子配列などの代替技術と比較して、Q-PilotはFPQAの柔軟性を効果的に活用し、100キュービットのランダム回路、量子シミュレーション回路、QAOA回路の回路深度でそれぞれ1.4倍、27.7倍、6.3倍の低減を実現している。

Neutral atom arrays have become a promising platform for quantum computing, especially the field programmable qubit array (FPQA) endowed with the unique capability of atom movement. This feature allows dynamic alterations in qubit connectivity during runtime, which can reduce the cost of executing long-range gates and improve parallelism. However, this added flexibility introduces new challenges in circuit compilation. Inspired by the placement and routing strategies for FPGAs, we propose to map all data qubits to fixed atoms while utilizing movable atoms to route for 2-qubit gates between data qubits. Coined flying ancillas, these mobile atoms function as ancilla qubits, dynamically generated and recycled during execution. We present Q-Pilot, a scalable compiler for FPQA employing flying ancillas to maximize circuit parallelism. For two important quantum applications, quantum simulation and the Quantum Approximate Optimization Algorithm (QAOA), we devise domain-specific routing strategies. In comparison to alternative technologies such as superconducting devices or fixed atom arrays, Q-Pilot effectively harnesses the flexibility of FPQA, achieving reductions of 1.4×, 27.7×, and 6.3× in circuit depth for 100-qubit random, quantum simulation, and QAOA circuits, respectively.

翻訳日:2023-11-29 21:42:28 公開日:2023-11-26

# 大規模視覚言語モデルを用いた人間-物体インタラクション検出のための人間中心視覚手がかりの生成

Generating Human-Centric Visual Cues for Human-Object Interaction Detection via Large Vision-Language Models ( http://arxiv.org/abs/2311.16475v1 )

ライセンス: Link先を確認

Yu-Wei Zhan, Fan Liu, Xin Luo, Liqiang Nie, Xin-Shun Xu, Mohan Kankanhalli

(参考訳) human-object interaction (HOI) 検出は、人間とオブジェクトのペアを検出し、それらの相互作用を予測することを目的としている。しかし、人間の行動の複雑さとこれらの相互作用が起こる多様な文脈のため、これは困難である。直感的には、関与する参加者、ボディランゲージ、周囲の環境など、人間中心の視覚的手がかりは、これらの相互作用を形作る上で重要な役割を果たす。これらの手がかりは、特に未知の相互作用の解釈に不可欠である。本稿では,複数の人間の視点から画像内の人間中心の視覚的手がかりを生成するために,VLMを用いた3つのプロンプトを提案する。このようなリッチな人間中心の視覚的手がかり (Human-Centric Visual Cues) を活かすために,HOI検出のための HCVC という新しい手法を提案する。特に,視覚的手がかりの特徴をインスタンスデコーダとインタラクションデコーダに組み込むために,マルチタワーアーキテクチャを備えたトランスフォーマーベースのマルチモーダル融合モジュールを開発した。広範にわたる実験と解析により,生成された人間中心の視覚的手がかりを用いたHOI検出の有効性が検証された。特に, 実験結果から, 2つの広く使用されているデータセットにおいて,提案モデルが既存の最先端手法よりも優れていることが示された。

Human-object interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions. However, the complexity of human behavior and the diverse contexts in which these interactions occur make it challenging. Intuitively, human-centric visual cues, such as the involved participants, the body language, and the surrounding environment, play crucial roles in shaping these interactions. These cues are particularly vital in interpreting unseen interactions. In this paper, we propose three prompts with VLM to generate human-centric visual cues within an image from multiple perspectives of humans. To capitalize on these rich Human-Centric Visual Cues, we propose a novel approach named HCVC for HOI detection. Particularly, we develop a transformer-based multimodal fusion module with multitower architecture to integrate visual cue features into the instance and interaction decoders. Our extensive experiments and analysis validate the efficacy of leveraging the generated human-centric visual cues for HOI detection. Notably, the experimental results indicate the superiority of the proposed model over the existing state-of-the-art methods on two widely used datasets.

翻訳日:2023-11-29 20:27:42 公開日:2023-11-26

# GS-IR:逆レンダリングのための3次元ガウススプラッティング

GS-IR: 3D Gaussian Splatting for Inverse Rendering ( http://arxiv.org/abs/2311.16473v1 )

ライセンス: Link先を確認

Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia

(参考訳) 本稿では,3次元ガウススプラッティング(GS)に基づく新しい逆レンダリング手法であるGS-IRを提案する。GS-IRは前方マッピングのボリュームレンダリングを活用し、フォトリアリスティックな新規ビュー合成とリライティングを実現する。表現力が低く計算複雑性が高い暗黙的なニューラル表現とボリュームレンダリング(例えば、NeRF)を用いた従来の研究とは異なり、新規ビュー合成において最高性能の表現であるGSを拡張し、未知の照明条件下で撮影されたマルチビュー画像からシーン幾何、表面材質、環境照明を推定する。 GSを逆レンダリングに導入する場合、主な問題は2つある。 1)GSは,もっともらしい法線の生成を本質的にサポートしない。 2)前方マッピング(ラスタ化やスプラッティングなど)は後方マッピング(レイトレーシングなど)のようにオクルージョンを追跡することはできない。これらの課題に対処するため,GS-IRは,法線推定のための深度微分に基づく正則化と,間接照明をモデル化するためのベイキングに基づくオクルージョンを組み込んだ効率的な最適化手法を提案する。フレキシブルかつ表現力のあるGS表現は、高速かつコンパクトな幾何再構成、フォトリアリスティックな新規ビュー合成、効果的な物理ベースレンダリングを実現する。本手法は,様々な挑戦的なシーンの定性的・定量的評価を通じて,ベースライン法よりも優れていることを示す。

We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (e.g. NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions. There are two main problems when introducing GS to inverse rendering: 1) GS does not natively support producing plausible normals; 2) forward mapping (e.g. rasterization and splatting) cannot trace occlusion the way backward mapping (e.g. ray tracing) can. To address these challenges, our GS-IR proposes an efficient optimization scheme that incorporates a depth-derivation-based regularization for normal estimation and a baking-based occlusion to model indirect lighting. The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering. We demonstrate the superiority of our method over baseline methods through qualitative and quantitative evaluations on various challenging scenes.

翻訳日:2023-11-29 20:27:07 公開日:2023-11-26

# Eye vs. AI: 映像記憶における人間の視線とモデル注意

Eye vs. AI: Human Gaze and Model Attention in Video Memorability ( http://arxiv.org/abs/2311.16484v1 )

ライセンス: Link先を確認

Prajneya Kumar, Eshika Khandelwal, Makarand Tapaswi, Vishnu Sreekumar

(参考訳) ビデオの記憶可能性を決定する要因を理解することは、教育技術や広告などの分野で重要な応用となる。この目的に向けて,映像の記憶可能性を支える意味的および時間的注意機構について検討する。本研究では,大規模映像データセットにおける映像記憶性予測におけるsota性能に適合する時空間的注意を持つ変圧器モデルを提案する。さらに重要なのは、自己注意パターンは、モデルが記憶可能性を予測する場所を示しています。小型眼球追跡実験により収集された人間の視線固定密度マップに対するモデル注意力の比較を行った。定量的塩分濃度指標は、モデル注意と人間の視線が類似したパターンに従うことを示している。さらに, パノプティカルセグメンテーションでは, モデルや人間の方がモノのクラスに多く参加していることが確認されているが, 注目度の増加/減少するクラスは, 記憶可能性スコアが高い傾向にある。また,本モデルが人間の時間的注意パターンを模倣し,初期フレームに重きを置くことも観察した。

Understanding the factors that determine video memorability has important applications in areas such as educational technology and advertising. Towards this goal, we investigate the semantic and temporal attention mechanisms underlying video memorability. We propose a Transformer-based model with spatio-temporal attention that matches SoTA performance on video memorability prediction on a large naturalistic video dataset. More importantly, the self-attention patterns show us where the model looks to predict memorability. We compare model attention against human gaze fixation density maps collected through a small-scale eye-tracking experiment where humans perform a video memory task. Quantitative saliency metrics show that the model attention and human gaze follow similar patterns. Furthermore, while panoptic segmentation confirms that the model and humans attend more to thing classes, stuff classes that receive increased/decreased attention tend to have higher memorability scores. We also observe that the model assigns greater importance to the initial frames, mimicking temporal attention patterns found in humans.

翻訳日:2023-11-29 20:13:14 公開日:2023-11-26

# PISA: ポイントクラウドベースのインストラクションシーン拡張

PISA: Point-cloud-based Instructed Scene Augmentation ( http://arxiv.org/abs/2311.16501v1 )

ライセンス: Link先を確認

Yiyang Luo and Ke Lin

(参考訳) 屋内シーン拡張は、拡張現実と仮想現実の応用を含むコンピュータビジョンの分野において、新たなトピックとなっている。しかし、既存のシーン拡張手法は、主に所望の場所として所定の位置を持つ事前構築されたオブジェクトデータベースを必要とする。本稿では,テキスト命令で条件付きで周囲に整合した点雲オブジェクトを生成可能な,最初のエンドツーエンドマルチモーダルディープニューラルネットワークを提案する。我々のモデルは、クエリとポイントクラウドの入力に基づいて、適切な位置に一見オブジェクトを生成し、これにより、以前は目に見えないオブジェクトのレイアウトを含む新しいシナリオを作成することができる。プレストアされたCADモデルのデータベースはもはや不要である。生成モデルとしてPoint-Eを用い,不明瞭な言語記述による偽陰性問題を緩和するために,定量化位置予測とTop-K推定を含む手法を導入する。さらに,本モデルが実際の室内物体を生成できることを総合的に示し,生成物体の多様性,指示の有効性,定量的測定結果を示すことにより,モデルの能力を評価する。さらに詳細な評価のために、モデルによって生成されたシーンの品質を評価するためのメトリクスとして、視覚的な接地も取り入れています。

Indoor scene augmentation has become an emerging topic in the field of computer vision with applications in augmented and virtual reality. However, existing scene augmentation methods mostly require a pre-built object database with a given position as the desired location. In this paper, we propose the first end-to-end multi-modal deep neural network that can generate point cloud objects consistent with their surroundings, conditioned on text instructions. Our model generates a suitable object in the appropriate position based on the inputs of a query and point clouds, thereby enabling the creation of new scenarios involving previously unseen layouts of objects. A database of pre-stored CAD models is no longer needed. We use Point-E as our generative model and introduce methods including quantified position prediction and Top-K estimation to mitigate the false negative problems caused by ambiguous language description. Moreover, we evaluate the ability of our model by demonstrating the diversity of generated objects, the effectiveness of instruction, and quantitative metric results, which collectively indicate that our model is capable of generating realistic indoor objects. For a more thorough evaluation, we also incorporate visual grounding as a metric to assess the quality of the scenes generated by our model.

翻訳日:2023-11-29 20:00:22 公開日:2023-11-26

# 知識誘導予測アーキテクチャによるSAR ATRの自己教師付き学習

Self-Supervised Learning for SAR ATR with a Knowledge-Guided Predictive Architecture ( http://arxiv.org/abs/2311.15153v1 )

ライセンス: Link先を確認

Weijie Li, Yang Wei, Tianpeng Liu, Yuenan Hou, Yongxiang Liu, Li Liu

(参考訳) 近年,SAR(Synthetic Aperture Radar)センサやターゲットデータセットの出現により,下流タスクを自己教師付き学習技術と一体化することが可能となり,SAR目標認識分野における基礎モデル構築の道を開いた。 sar目標認識のための自己教師あり学習の主な課題は、低データ品質と雑音における一般化された表現学習であり、上記の問題に対処するために、局所マスクパッチを用いた知識誘導型予測アーキテクチャを提案する。提案アーキテクチャの中核は、従来のSARドメインの特徴抽出と最先端のスケーラブルな自己教師付き学習を組み合わせることで、正確な一般化された特徴表現を実現することである。提案フレームワークは、様々な下流データセット(MSTAR、FUSAR-Ship、SAR-ACD、SSDD)で検証され、SARターゲット認識に一貫したパフォーマンス改善をもたらすことができる。実験結果は,SAR目標認識のための自己教師付き学習手法の多種多様な目標,シーン,センサに対する統一的な性能向上を強く実証した。

Recently, the emergence of a large number of Synthetic Aperture Radar (SAR) sensors and target datasets has made it possible to unify downstream tasks with self-supervised learning techniques, which can pave the way for building a foundation model in the SAR target recognition field. The major challenge of self-supervised learning for SAR target recognition lies in learning generalizable representations from noisy, low-quality data. To address this problem, we propose a knowledge-guided predictive architecture that uses local masked patches to predict the multiscale SAR feature representations of unseen context. The core of the proposed architecture lies in combining traditional SAR domain feature extraction with state-of-the-art scalable self-supervised learning for accurate generalized feature representations. The proposed framework is validated on various downstream datasets (MSTAR, FUSAR-Ship, SAR-ACD and SSDD), and can bring consistent performance improvement for SAR target recognition. The experimental results strongly demonstrate the unified performance improvement of the self-supervised learning technique for SAR target recognition across diverse targets, scenes and sensors.

翻訳日:2023-11-29 17:05:18 公開日:2023-11-26

# ユーザフィードバックとアプリ更新ログの対一致機能点における時系列利用によるユーザ貢献率の推定

Estimation of the User Contribution Rate by Leveraging Time Sequence in Pairwise Matching function-point between Users Feedback and App Updating Log ( http://arxiv.org/abs/2311.15179v1 )

ライセンス: Link先を確認

Shiqi Duan, Jianxun Liu, Yong Xiao, Xiangping Zhang

(参考訳) モバイルアプリケーションは、人々の日常生活の不可分な部分となっている。それでも市場競争は非常に激しく、ほとんどのユーザーの間で認識されていないアプリは市場排除の影響を受けやすい。この目的のためにデベロッパーは、より広いユーザー基盤の要求を迅速かつ正確に理解し、アプリの秩序と健全な進化を効果的に戦略化し促進する必要がある。一般的なユーザ要件が開発者によって採用される率、あるいはユーザコントリビューションは、アプリケーション開発者やソフトウェアエンジニアリング研究者にとって、アプリ要件の進化を測ったり、洞察を得て、アプリのソフトウェアの進化を予測する上で重要なツールとなる、非常に価値のある指標です。残念なことに、この重要な指標には洗練された定量的分析アプローチやツールが欠けている。この問題に対処するために,本稿では,アプリの更新ログとユーザレビューに存在する時間的相関知覚に基づく定量的分析手法を提案する。本手法の主な考え方は,有効なユーザレビューをユーザ要求、アプリの更新ログを開発者対応とみなし,テキスト・コンピューティングによって両者の対関係と時系列関係を抽出・解析し,ユーザの貢献度を定量的に計算する実現可能なアプローチを構築することである。このアプローチの実現可能性を示すため,本論文では,中国本土のApp Storeの4つの中国語アプリと,米国地域の1つの英語アプリから,2,178件の更新ログと4,236,417件のユーザレビューを含むデータを収集し,実験結果から,これらのアプリの機能のうち16.6%～43.2%が,オンラインユーザの一般的な要求に関連していることが判明した。

Mobile applications have become an inseparable part of people's daily life. Nonetheless, the market competition is extremely fierce, and apps lacking recognition among most users are susceptible to market elimination. To this end, developers must swiftly and accurately apprehend the requirements of the wider user base to effectively strategize and promote their apps' orderly and healthy evolution. The rate at which general user requirements are adopted by developers, or user contribution, is a very valuable metric that can be an important tool for app developers or software engineering researchers to measure or gain insight into the evolution of app requirements and predict the evolution of app software. Regrettably, refined quantitative analysis approaches and tools for this pivotal indicator are lacking. To address this problem, this paper proposes an exploratory quantitative analysis approach based on the temporal correlation that exists between the app update log and user reviews, which provides a feasible solution for quantitatively obtaining the user contribution. The main idea of this scheme is to consider valid user reviews as user requirements and app update logs as developer responses, and to mine and analyze the pairwise and chronological relationships existing between the two by text computing, thus constructing a feasible approach for quantitatively calculating user contribution. To demonstrate the feasibility of the approach, this paper collects data from four Chinese apps in the App Store in mainland China and one English app in the U.S. region, comprising 2,178 update logs and 4,236,417 user reviews. The experimental results show that 16.6%-43.2% of the features of these apps are related to popular requirements expressed by online users.
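
The pairing idea in this abstract (reviews as requirements, update logs as responses, linked by order in time and text similarity) might be sketched as below. Everything concrete here is an assumption for illustration: the abstract does not specify the "text computing" step, so Jaccard token overlap stands in for it, and `user_contribution_rate` and `threshold` are hypothetical names.

```python
from datetime import date

def tokens(text):
    """Lower-cased whitespace tokens; a stand-in for real text processing."""
    return set(text.lower().split())

def jaccard(a, b):
    """Token-set overlap between two texts, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def user_contribution_rate(reviews, updates, threshold=0.3):
    """Share of update-log entries preceded by a sufficiently similar review.

    reviews, updates: lists of (date, text) pairs.
    """
    if not updates:
        return 0.0
    driven = 0
    for update_day, update_text in updates:
        # Only reviews posted strictly before the update can have driven it.
        if any(
            review_day < update_day and jaccard(review_text, update_text) >= threshold
            for review_day, review_text in reviews
        ):
            driven += 1
    return driven / len(updates)

if __name__ == "__main__":
    reviews = [
        (date(2023, 1, 1), "please add dark mode"),
        (date(2023, 3, 1), "add offline maps"),
    ]
    updates = [
        (date(2023, 2, 1), "add dark mode"),
        (date(2023, 2, 15), "add offline maps support"),
        (date(2023, 2, 20), "fixed login crash"),
    ]
    print(user_contribution_rate(reviews, updates))
```

In this toy data only the dark-mode update follows a matching review. The offline-maps review arrives after the corresponding update, so the chronological constraint excludes it.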

翻訳日:2023-11-28 19:15:01 公開日:2023-11-26

# ディープラーニングに基づく非接触指紋のセグメンテーションと抽出

Deep Learning-Based Approaches for Contactless Fingerprints Segmentation and Extraction ( http://arxiv.org/abs/2311.15163v1 )

ライセンス: Link先を確認

M.G. Sarwar Murshed, Syed Konain Abbas, Sandip Purnapatra, Daqing Hou and Faraz Hussain

(参考訳) 指紋は、人間のアイデンティティの最もユニークで信頼できる特徴の1つとして広く認識されている。現代の指紋認証システムでは、認証プロセス中に指紋をキャプチャするために指紋スキャナーや指紋センサーを使用する必要がある。光、容量、超音波センサーなどの様々なタイプの指紋センサーは、指紋データを収集し分析するための異なる技術を採用している。この特定のハードウェアやセンサーへの依存は、指紋ベースの生体認証システムを採用するための障壁や課題を生み出す。この制限は、様々なアプリケーションやシナリオにおける指紋認証の普及を妨げる。国境管理、医療システム、教育機関、金融取引、空港のセキュリティは、指紋センサーが一般に利用できない場合、課題に直面している。追加ハードウェアへの依存を軽減するために、代替として非接触指紋の使用が登場した。堅牢な非接触指紋認証システムの実現には,正確な指紋分割法,正確な指紋抽出ツール,信頼性の高い指紋照合器の開発が不可欠である。本稿では,コンタクトレス指紋定位とセグメンテーションのための深層学習に基づくセグメンテーションツールの開発に着目する。本システムは,非接触指紋画像から高いセグメンテーション精度と確実な指紋抽出を実現するために,ディープラーニング技術を活用する。本評価では,平均平均絶対誤差(mae)が30ピクセル,角度予測誤差(eap)が5.92度,ラベリング精度97.46%を示した。これらの結果は,新しい非接触指紋セグメンテーションおよび抽出ツールの有効性を示す。

Fingerprints are widely recognized as one of the most unique and reliable characteristics of human identity. Most modern fingerprint authentication systems rely on contact-based fingerprints, which require the use of fingerprint scanners or fingerprint sensors for capturing fingerprints during the authentication process. Various types of fingerprint sensors, such as optical, capacitive, and ultrasonic sensors, employ distinct techniques to gather and analyze fingerprint data. This dependency on specific hardware or sensors creates a barrier or challenge for the broader adoption of fingerprint based biometric systems. This limitation hinders the widespread adoption of fingerprint authentication in various applications and scenarios. Border control, healthcare systems, educational institutions, financial transactions, and airport security face challenges when fingerprint sensors are not universally available. To mitigate the dependence on additional hardware, the use of contactless fingerprints has emerged as an alternative. Developing precise fingerprint segmentation methods, accurate fingerprint extraction tools, and reliable fingerprint matchers are crucial for the successful implementation of a robust contactless fingerprint authentication system. This paper focuses on the development of a deep learning-based segmentation tool for contactless fingerprint localization and segmentation. Our system leverages deep learning techniques to achieve high segmentation accuracy and reliable extraction of fingerprints from contactless fingerprint images. In our evaluation, our segmentation method demonstrated an average mean absolute error (MAE) of 30 pixels, an error in angle prediction (EAP) of 5.92 degrees, and a labeling accuracy of 97.46%. These results demonstrate the effectiveness of our novel contactless fingerprint segmentation and extraction tools.

翻訳日:2023-11-28 19:14:29 公開日:2023-11-26

# ベイズ型新素材探索におけるドメイン知識注入

Domain Knowledge Injection in Bayesian Search for New Materials ( http://arxiv.org/abs/2311.15162v1 )

ライセンス: Link先を確認

Zikai Xie, Xenophon Evangelopoulos, Joseph Thacker, Andrew Cooper

(参考訳) 本稿では,探索空間における探索を調整するためのドメイン知識に対応するベイズ最適化(BO)アルゴリズムであるDKIBOを提案する。ベイズ最適化は、多くの難解な科学的問題に対するサンプル効率の最適化として最近登場した。既存のBOフレームワークは、空間を狭めることで、事前の信念の入力を加速させるが、そのような知識を組み込むことは必ずしも簡単ではなく、バイアスを導入し、パフォーマンスの低下につながることが多い。本稿では,ガウス過程の近似パワーを高めるために,追加決定論的サロゲートモデルを用いて構造知識を獲得関数に組み込む簡単な手法を提案する。これは、手前の問題の構造情報に基づいて好適に選択され、より良いインフォームドサンプリングに向けた補正項として機能する。材料設計タスクにドメイン知識をうまく注入することにより,提案手法の実用性を実証する。さらに, 実験条件およびアブレーション解析により, 提案手法の性能を検証した。

In this paper we propose DKIBO, a Bayesian optimization (BO) algorithm that accommodates domain knowledge to tune exploration in the search space. Bayesian optimization has recently emerged as a sample-efficient optimizer for many intractable scientific problems. While various existing BO frameworks allow the input of prior beliefs to accelerate the search by narrowing down the space, incorporating such knowledge is not always straightforward and can often introduce bias and lead to poor performance. Here we propose a simple approach to incorporate structural knowledge in the acquisition function by utilizing an additional deterministic surrogate model to enrich the approximation power of the Gaussian process. This surrogate is suitably chosen according to structural information of the problem at hand and acts as a corrective term towards better-informed sampling. We empirically demonstrate the practical utility of the proposed method by successfully injecting domain knowledge in a materials design task. We further validate our method's performance on different experimental settings and ablation analyses.

翻訳日:2023-11-28 19:14:07 公開日:2023-11-26

# 継続学習のためのヘッセ行列を考慮した低ランク重み摂動

Hessian Aware Low-Rank Weight Perturbation for Continual Learning ( http://arxiv.org/abs/2311.15161v1 )

ライセンス: Link先を確認

Jiaqi Li, Rui Wang, Yuanhao Lai, Changjian Shui, Sabyasachi Sahoo, Charles X. Ling, Shichun Yang, Boyu Wang, Christian Gagné, Fan Zhou

(参考訳) 連続学習は、前者から得られた知識を忘れることなく、一連のタスクを順次学習することを目的としている。本研究では,連続学習のためのヘッセン認識低ランク摂動アルゴリズムを提案する。重み行列変換を用いて逐次タスクに沿ったパラメータ遷移をモデル化することにより、ニューラルネットワークの各層におけるタスク適応パラメータに低ランク近似を適用することを提案する。具体的には,ヘッセン近似と提案した低ランク近似の量的関係を理論的に実証する。近似ランクは、層比勾配と低ランク近似誤差によって推定される経験的損失の限界増加に従って、全世界的に決定される。さらに,パラメータ成長を抑えるために,重要度を低くすることでモデル容量を制御する。大規模タスクを含むデータセットを含む様々なベンチマークについて広範な実験を行い,提案手法の有効性と拡張性を示すため,最近の最先端手法との比較を行った。実験結果から,本手法は異なるベンチマークにおいて,特にタスク順序の堅牢性を実現し,課題の処理において,より優れた性能を示すことがわかった。デモコードはhttps://github.com/lijiaqi/HALRPで見ることができる。

Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones. In this work, we propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning. By modeling the parameter transitions along the sequential tasks with the weight matrix transformation, we propose to apply the low-rank approximation on the task-adaptive parameters in each layer of the neural networks. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian and the proposed low-rank approximation. The approximation ranks are then globally determined according to the marginal increment of the empirical loss estimated by the layer-specific gradient and low-rank approximation error. Furthermore, we control the model capacity by pruning less important parameters to diminish the parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against some recent state-of-the-art methods to demonstrate the effectiveness and scalability of our proposed method. Empirical results show that our method performs better on different benchmarks, especially in achieving task order robustness and handling the forgetting issue. A demo code can be found at https://github.com/lijiaqi/HALRP.
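
To make the low-rank idea concrete, the sketch below adds a rank-r perturbation to a shared base weight matrix and counts the extra parameters per task. It is a simplified assumption, not the paper's formulation: the abstract does not give the exact weight transformation or the Hessian-based rank selection, so only the generic additive form W_task = W_base + U V^T is shown, with hypothetical function names.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(m):
    return [list(row) for row in zip(*m)]

def task_weights(base, u, v):
    """base is m x n, u is m x r, v is n x r; returns base + u @ v^T."""
    delta = matmul(u, transpose(v))
    return [[b + d for b, d in zip(brow, drow)] for brow, drow in zip(base, delta)]

def extra_parameters(m, n, rank):
    """Parameters stored per task for a rank-`rank` perturbation."""
    return rank * (m + n)

if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    u = [[1.0], [2.0]]
    v = [[3.0], [4.0]]
    print(task_weights(base, u, v))
    print(extra_parameters(512, 512, 4), "vs", 512 * 512)
```

Storing only U and V per task is what keeps parameter growth small. A lower rank saves more memory at the cost of a larger approximation error, which is the trade-off the abstract says is resolved per layer using the estimated increase in empirical loss.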

翻訳日:2023-11-28 19:13:50 公開日:2023-11-26

# Group-Mix Attentionによる視覚トランスフォーマーの進歩

Advancing Vision Transformers with Group-Mix Attention ( http://arxiv.org/abs/2311.15157v1 )

ライセンス: Link先を確認

Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo

(参考訳) 視覚変換器 (ViTs) は、MHSA (Multi-head Self-attention) による長距離依存をモデル化することで、視覚認識を強化することが示されている。しかし、Query and Keyから生成された注目マップは、1つの粒度でトークン間相関のみをキャプチャする。本稿では,表現能力を高めるために,トークンとグループ(すなわち複数の隣接トークン)間の相関を捉えるための,より包括的なメカニズムを持つべきである。そこで我々は,従来の自己注意の代替としてグループ・ミクス・アテンション(GMA)を提案し,トークン・ツー・トークン・ツー・グループ,グループ・ツー・グループ間の相関を様々なグループサイズで同時に捉えることができる。この目的のために、GMAはQuery、Key、Valueを一様にセグメントに分割し、グループプロキシを生成するために異なるグループアグリゲーションを実行する。アテンションマップはトークンとグループプロキシの混合に基づいて計算され、トークンとグループの値の再結合に使用される。 GMAに基づく強力なバックボーンであるGroupMixFormerを導入し、既存のモデルよりも少ないパラメータで画像分類、オブジェクト検出、セマンティックセグメンテーションにおける最先端のパフォーマンスを実現する。例えば、GroupMixFormer-L(70.3Mパラメータと384^2入力)はImageNet-1Kで86.2%、GroupMixFormer-B(45.8Mパラメータ)はADE20Kで51.2% mIoUに達する。

Vision Transformers (ViTs) have been shown to enhance visual recognition through modeling long-range dependencies with multi-head self-attention (MHSA), which is typically formulated as Query-Key-Value computation. However, the attention map generated from the Query and Key captures only token-to-token correlations at one single granularity. In this paper, we argue that self-attention should have a more comprehensive mechanism to capture correlations among tokens and groups (i.e., multiple adjacent tokens) for higher representational capacity. Thereby, we propose Group-Mix Attention (GMA) as an advanced replacement for traditional self-attention, which can simultaneously capture token-to-token, token-to-group, and group-to-group correlations with various group sizes. To this end, GMA splits the Query, Key, and Value into segments uniformly and performs different group aggregations to generate group proxies. The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value. Based on GMA, we introduce a powerful backbone, namely GroupMixFormer, which achieves state-of-the-art performance in image classification, object detection, and semantic segmentation with fewer parameters than existing models. For instance, GroupMixFormer-L (with 70.3M parameters and 384^2 input) attains 86.2% Top-1 accuracy on ImageNet-1K without external data, while GroupMixFormer-B (with 45.8M parameters) attains 51.2% mIoU on ADE20K.
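
A toy version of attending over a mixture of tokens and group proxies is sketched below. It is only loosely modelled on the description above and is not GroupMixFormer: group proxies are formed here by averaging adjacent tokens (the aggregation actually used is not stated in the abstract), the channel-wise splitting into segments is omitted, and all function names are hypothetical.

```python
import math

def group_proxies(vectors, group_size):
    """Average every window of `group_size` adjacent token vectors."""
    dim = len(vectors[0])
    return [
        [sum(vectors[i + j][d] for j in range(group_size)) / group_size for d in range(dim)]
        for i in range(len(vectors) - group_size + 1)
    ]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_attention(queries, keys, values, group_size=2):
    """Scaled dot-product attention over tokens plus their group proxies."""
    q = queries + group_proxies(queries, group_size)
    k = keys + group_proxies(keys, group_size)
    v = values + group_proxies(values, group_size)
    scale = math.sqrt(len(keys[0]))
    output = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        output.append(
            [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))]
        )
    return output

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in mixed_attention(tokens, tokens, tokens):
        print(row)
```

Because the query, key, and value sequences each contain both individual tokens and proxies, the resulting attention map holds token-to-token, token-to-group, and group-to-group scores at once, which is the property the abstract attributes to GMA.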

翻訳日:2023-11-28 19:13:33 公開日:2023-11-26

# xTrimoGene:シングルセルRNA-Seqデータのための効率的でスケーラブルな表現学習者

xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data ( http://arxiv.org/abs/2311.15156v1 )

ライセンス: Link先を確認

Jing Gong, Minsheng Hao, Xingyi Cheng, Xin Zeng, Chiming Liu, Jianzhu Ma, Xuegong Zhang, Taifeng Wang, Le Song

(参考訳) 高スループットシークエンシング技術の進歩は、単一細胞レベルでの遺伝子発現の測定に大きな進歩をもたらした。公開されているシングルセルRNA-seq(scRNA-seq)データの量は、1レコードあたり2万の遺伝子を計測したヒトのデータですでに5000万レコードを超えている。これは、これらのデータを十分に取り込むための教師なし表現学習の必要性を強調するものだが、古典的なトランスフォーマーアーキテクチャでは、計算とメモリの両面でそのようなデータでのトレーニングは現実的でない。この課題に対処するため、我々は、データのスパース性を活用して事前学習をスケールアップする、xTrimoGene$^\alpha$(略してxTrimoGene)と呼ばれるscRNA-seqデータのための新しい非対称エンコーダデコーダ変換器を提案する。 xTrimoGeneのこのスケーラブルな設計は、従来のトランスフォーマーに比べてFLOPを1～2桁削減し、高い精度を維持しながら、今日の最大のscRNA-seqデータセット上で最大のトランスフォーマーモデルをトレーニングすることができる。また,モデルサイズを拡大するにつれて,xTrimoGeneの性能が向上し,セルタイプアノテーションやperturb-seq効果予測,薬物の組み合わせ予測など,様々な下流タスクにおけるSOTA性能も向上することを示した。 xTrimoGeneモデルは現在、以下のリンクを通じてサービスとして利用可能である: https://api.biomap.com/xTrimoGene/apply

Advances in high-throughput sequencing technology have led to significant progress in measuring gene expressions at the single-cell level. The amount of publicly available single-cell RNA-seq (scRNA-seq) data is already surpassing 50M records for humans with each record measuring 20,000 genes. This highlights the need for unsupervised representation learning to fully ingest these data, yet classical transformer architectures are prohibitive to train on such data in terms of both computation and memory. To address this challenge, we propose a novel asymmetric encoder-decoder transformer for scRNA-seq data, called xTrimoGene$^\alpha$ (or xTrimoGene for short), which leverages the sparse characteristic of the data to scale up the pre-training. This scalable design of xTrimoGene reduces FLOPs by one to two orders of magnitude compared to classical transformers while maintaining high accuracy, enabling us to train the largest transformer models over the largest scRNA-seq dataset today. Our experiments also show that the performance of xTrimoGene improves as we scale up the model sizes, and it also leads to SOTA performance over various downstream tasks, such as cell type annotation, perturb-seq effect prediction, and drug combination prediction. xTrimoGene model is now available for use as a service via the following link: https://api.biomap.com/xTrimoGene/apply.

翻訳日:2023-11-28 19:13:02 公開日:2023-11-26

# 非対称Bethe Ansatz

Asymmetric Bethe Ansatz ( http://arxiv.org/abs/2311.15155v1 )

ライセンス: Link先を確認

Steven G. Jackson, Gregory E. Astrakharchik, and Maxim Olshanii

(参考訳) 最近提案された、ハードウォールボックス内の質量比 $3\!:\!1$ の2つの $\delta$ 関数相互作用粒子に対する厳密な量子解 [Y. Liu, F. Qi, Y. Zhang, S. Chen, iScience 22, 181 (2019)] は、半透明な $\delta$ 関数ミラーの系に対するベーテ・アンザッツ可積分性の従来の必要条件に反するように見える。その条件とは、ベーテ・アンザッツ可解モデルの2つのミラーが二面角 $\pi/(\text{odd number})$ で交差する場合、これらのミラーには等しい結合定数を割り当てなければならない、というものである。本論文では、この条件を緩和する方法を見出した: 従来の可積分系を取り、その半透明ミラーのいくつかを完全反射ミラーで置き換えることができる。後者の集合は、従来の系の対称性群の反射部分群のミラーで表されなければならない。この部分群は元の系の対称性に関して対称であることを要求されないため、提案する手法を非対称ベーテ・アンザッツ (ABA) と名付ける。我々は、Liu-Qi-Zhang-Chen問題の厳密解が ABA の特別な例であることを示す。

The recently proposed exact quantum solution for two $\delta$-function-interacting particles with a mass-ratio $3\!:\!1$ in a hard-wall box [Y. Liu, F. Qi, Y. Zhang and S. Chen, iScience 22, 181 (2019)] seemingly violates the conventional necessary condition for Bethe Ansatz integrability of a system of semitransparent $\delta$-function mirrors: if two mirrors of a Bethe-Ansatz-solvable model cross at a dihedral angle $\pi/(\text{odd number})$, these mirrors must be assigned equal coupling constants. In our article, we find a way to relax this condition: it turns out that one can take a conventional integrable system and replace some of its semitransparent mirrors by perfectly reflecting ones. The latter set must be represented by the mirrors of a reflection subgroup of the symmetry group of the conventional system. This subgroup is \emph{not} required to be symmetric with respect to the symmetries of the original system, hence the proposed name for the method: Asymmetric Bethe Ansatz (ABA). We show that the exact solution of the Liu-Qi-Zhang-Chen problem is a particular instance of the ABA.

翻訳日:2023-11-28 19:12:38 公開日:2023-11-26

# 分数非線形シュレーディンガー方程式におけるスペクトル分岐の観測

Observation of the spectral bifurcation in the Fractional Nonlinear Schrödinger Equation ( http://arxiv.org/abs/2311.15150v1 )

ライセンス: Link先を確認

Shilong Liu, Yingwen Zhang, Stéphane Virally, Ebrahim Karimi, Boris A. Malomed, Denis V. Seletskiy

(参考訳) 超高速ソリトンパルスのスペクトル分岐の包括的調査と実験的実現を報告する。これらの分岐は、分数非線形シュレーディンガー方程式の枠組みにおける分数群速度分散とKerr非線形性(自己位相変調)の相互作用によって引き起こされる。分数分散と非線形性の作用下でのパルスのダイナミクスを捉えるために、周波数チャープに基づく有効な「力」モデルを提案し、それらの相互作用を「斥力」「引力」「平衡」のいずれかとして特徴づける。「力」モデルを利用することで、スペクトル分岐 $\{1\} \rightarrow \{N\}$ を関連する非線形レベルで直接生成する、区分化された分数分散プロファイルを設計する。これらの結果は、非線形性の成長に伴う伝統的な分岐の列 $\{1\} \rightarrow \{2\} \rightarrow \{3\} \rightarrow \dots \rightarrow \{N\}$ を超えて拡張される。実験的検証では、パルス整形器のセットアップ内で精密に調整されたホログラムを、変更可能な非線形媒質に結合する。特に、逐次カスケードで必要となるよりも著しく低い非線形性の強度で、$\{1\} \rightarrow \{N\}$ 分岐において最大 $N=5$ を達成した。スペクトル分岐パターンを設計するという提案は、超高速信号処理アプリケーションにとって大きな可能性を秘めている。実例として、これらの分岐モードを用いて光データを圧縮し、100 kmの単一モードファイバで伝送する。

We report a comprehensive investigation and experimental realization of spectral bifurcations of ultrafast soliton pulses. These bifurcations are induced by the interplay between fractional group-velocity dispersion and Kerr nonlinearity (self-phase modulation) within the framework of the fractional nonlinear Schrödinger equation. To capture the dynamics of the pulses under the action of the fractional dispersion and nonlinearity, we propose an effective 'force' model based on the frequency chirp, which characterizes their interactions as either 'repulsion', 'attraction', or 'equilibration'. By leveraging the 'force' model, we design segmented fractional dispersion profiles that directly generate spectral bifurcations $\{1\} \rightarrow \{N\}$ at relevant nonlinearity levels. These results extend beyond the traditional sequence of bifurcations $\{1\} \rightarrow \{2\} \rightarrow \{3\} \rightarrow \dots \rightarrow \{N\}$ associated with the growth of the nonlinearity. The experimental validation involves a precisely tailored hologram within a pulse shaper setup, coupled to an alterable nonlinear medium. Notably, we achieve up to $N=5$ in $\{1\} \rightarrow \{N\}$ bifurcations at a significantly lower strength of nonlinearity than would otherwise be required in a sequential cascade. The proposal for engineering spectral bifurcation patterns holds significant potential for ultrafast signal processing applications. As a practical illustration, we employ these bifurcation modes to squeeze optical data and transmit it across a 100-km-long single-mode fiber.

翻訳日:2023-11-28 19:12:10 公開日:2023-11-26

# IBM量子プロセッサ上での人工ニューラルネットワークシンドロームデコード

Artificial Neural Network Syndrome Decoding on IBM Quantum Processors ( http://arxiv.org/abs/2311.15146v1 )

ライセンス: Link先を確認

Brhyeton Hall, Spiro Gicev, Muhammad Usman

(参考訳) シンドローム復号法は、フォールトトレラント量子コンピューティングのための量子エラー補正の実装において、積分的だが計算的に要求されるステップである。本稿では,IBM量子プロセッサ上でのニューラルネットワーク(ANN)デコードの開発とベンチマークについて報告する。 ANNは重六角形コードアーキテクチャからシンドローム計測データを効率よく復号し、適切な修正を適用し、エラー保護を容易にする。 IBMデバイスの現在の物理的エラー率は、コードのしきい値を超え、論理的エラー率抑制のためにANNデコーダの範囲を制限する。しかし,本研究では,実験装置から取得したシンドロームデータのANN復号法の適用性を確認し,近日中にしきい値誤差率未満の量子デバイスが利用可能になると,機械学習を量子エラー訂正の有望な経路として確立する。

Syndrome decoding is an integral but computationally demanding step in the implementation of quantum error correction for fault-tolerant quantum computing. Here, we report the development and benchmarking of Artificial Neural Network (ANN) decoding on IBM Quantum Processors. We demonstrate that ANNs can efficiently decode syndrome measurement data from heavy-hexagonal code architecture and apply appropriate corrections to facilitate error protection. The current physical error rates of IBM devices are above the code's threshold and restrict the scope of our ANN decoder for logical error rate suppression. However, our work confirms the applicability of ANN decoding methods of syndrome data retrieved from experimental devices and establishes machine learning as a promising pathway for quantum error correction when quantum devices with below threshold error rates become available in the near future.
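
To make the decoding task concrete, the sketch below shows syndrome decoding on a far smaller code than the one in the abstract. It is an illustration only: it assumes a classical 3-bit repetition code and uses a hand-written lookup table where the paper trains a neural network on heavy-hexagonal code data. The names `measure_syndrome`, `SYNDROME_TO_FLIP` and `decode` are hypothetical.

```python
# Toy illustration only: a 3-bit repetition code with a lookup-table decoder.
# The paper itself uses a heavy-hexagonal code and a trained neural network.


def measure_syndrome(bits):
    """Return the two parity checks (b0 XOR b1, b1 XOR b2) of a 3-bit word."""
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])


# Each syndrome maps to the index of the single bit most likely to have flipped.
SYNDROME_TO_FLIP = {
    (0, 0): None,  # no parity check fired: assume no error
    (1, 0): 0,     # only the first check fired: bit 0 flipped
    (1, 1): 1,     # both checks fired: bit 1 flipped
    (0, 1): 2,     # only the second check fired: bit 2 flipped
}


def decode(bits):
    """Apply the correction suggested by the syndrome and return the new word."""
    flip = SYNDROME_TO_FLIP[measure_syndrome(bits)]
    corrected = list(bits)
    if flip is not None:
        corrected[flip] ^= 1
    return corrected
```

A learned decoder plays the role of `SYNDROME_TO_FLIP`: it maps measured syndromes to corrections, which matters once the table becomes too large to enumerate.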

翻訳日:2023-11-28 19:11:44 公開日:2023-11-26

# 微妙な選択と深層学習:ドメイン一般化のためのCLIPによる選択的クロスモーダル蒸留

Choosing Wisely and Learning Deeply: Selective Cross-Modality Distillation via CLIP for Domain Generalization ( http://arxiv.org/abs/2311.15145v1 )

ライセンス: Link先を確認

Jixuan Leng, Yijiang Li, Haohan Wang

(参考訳) ドメインの一般化(DG)は重要な研究領域であり、複数のドメインにまたがるモデルをトレーニングし、目に見えない領域でテストすることを目指している。本稿では、ドメイン一般化のための選択的クロスモダリティ蒸留(scmd)という新しいアプローチを提案する。 SCMDは、大きな視覚言語モデル、特にCLIPモデルの能力を活用して、より効率的なモデルをトレーニングし、目に見えない領域にわたって堅牢な一般化能力を取得する。我々の主な貢献は、蒸留の難しいサンプルを特定するために戦略的に設計されたユニークな選択フレームワークである。並行して、新しいクロスモダリティモジュールを導入する。このモジュールは、学生モデルの投影された特徴とCLIPからのテキスト埋め込みをシームレスに組み合わせ、類似度分布のアライメントを保証する。 SCMDの性能を様々なベンチマークで評価し、ResNet50が既存のドメイン一般化手法を超越して最先端のパフォーマンスを提供できるようにします。さらに、我々は選択戦略の理論分析を行い、DG分野におけるその有効性と可能性について深い洞察を提供する。

Domain Generalization (DG), a crucial research area, seeks to train models across multiple domains and test them on unseen ones. In this paper, we introduce a novel approach, namely, Selective Cross-Modality Distillation for Domain Generalization (SCMD). SCMD leverages the capabilities of large vision-language models, specifically the CLIP model, to train a more efficient model, ensuring it acquires robust generalization capabilities across unseen domains. Our primary contribution is a unique selection framework strategically designed to identify hard-to-learn samples for distillation. In parallel, we introduce a novel cross-modality module. This module seamlessly combines the projected features of the student model with the text embeddings from CLIP, ensuring the alignment of similarity distributions. We assess SCMD's performance on various benchmarks, where it empowers a ResNet50 to deliver state-of-the-art performance, surpassing existing domain generalization methods. Furthermore, we provide a theoretical analysis of our selection strategy, offering deeper insight into its effectiveness and potential in the field of DG.

翻訳日:2023-11-28 19:11:30 公開日:2023-11-26

# LongStory: 一貫性があり、完結し、長さを制御できる長編ストーリー生成

LongStory: Coherent, Complete and Length Controlled Long story Generation ( http://arxiv.org/abs/2311.15208v1 )

ライセンス: Link先を確認

Kyeongman Park, Nakyeong Yang, Kyomin Jung

(参考訳) 人間の作者は、コヒーレンスを失うことなく、どんなストーリーでも書ける。また、彼らは常に適切な結末、現在の言語モデルに欠けている能力に物語をもたらします。本稿では,コヒーレントで完全かつ長さ制御の長いストーリー生成のためのLongStoryを提案する。 LongStoryは,(1)長期・短期の重み調整器(CWC)と(2)長期ストーリー構造位置(LSP)の2つの新しい手法を導入した。 cwcは長期的文脈記憶と短期的文脈の不正行為の重み付けを調整し、それぞれの役割を認めている。 LSPは長い物語の構造的位置を伝えるために談話トークンを使用している。平均ストーリーの長さの異なる3つのデータセットでトレーニングされたlongstoryは、強力なストーリージェネレータプロットマシン、一貫性、完全性、関連性、反復性を含む他のベースラインよりも優れている。また、各データセット上でゼロショットテストを実施し、トレーニングデータを超えた結果を予測するモデルの能力を評価し、そのパフォーマンスとモデルの変種を比較して方法論を検証する。

A human author can write any length of story without losing coherence. Also, they always bring the story to a proper ending, an ability that current language models lack. In this work, we present the LongStory for coherent, complete, and length-controlled long story generation. LongStory introduces two novel methodologies: (1) the long and short-term contexts weight calibrator (CWC) and (2) long story structural positions (LSP). The CWC adjusts weights for long-term context Memory and short-term context Cheating, acknowledging their distinct roles. The LSP employs discourse tokens to convey the structural positions of a long story. Trained on three datasets with varied average story lengths, LongStory outperforms other baselines, including the strong story generator Plotmachine, in coherence, completeness, relevance, and repetitiveness. We also perform zero-shot tests on each dataset to assess the model's ability to predict outcomes beyond its training data and validate our methodology by comparing its performance with variants of our model.

翻訳日:2023-11-28 19:02:03 公開日:2023-11-26

# 低次元ディスクリプタを用いた化合物空間における分子特性の効率的な補間

Efficient interpolation of molecular properties across chemical compound space with low-dimensional descriptors ( http://arxiv.org/abs/2311.15207v1 )

ライセンス: Link先を確認

Yun-Wen Mao and Roman V. Krems

(参考訳) 低次元ディスクリプタを用いた化合物空間における補間のための、少数データで学習した高精度な分子特性モデルを示す。出発点は、クーロン行列の固有値分布の性質から導かれる3次元の普遍的な物理的ディスクリプタである。分子の形状と組成を考慮するため、これらのディスクリプタをゲルシュゴリンの円定理に基づく6次元の特徴と組み合わせる。こうして得られた9次元ディスクリプタを、関数形が可変なカーネルに基づくガウス過程回帰に用いることで、極めて効率的な低次元補間モデルを実現する。100分子で訓練されたモデルは、エントロピーと温度の積 ($S \times T$) およびゼロ点振動エネルギー (ZPVE) を、テストデータ中の分子の $> 78$ % に対して絶対誤差 1 kcal mol$^{-1}$ 未満、$> 92$ % に対して 1.3 kcal mol$^{-1}$ 未満で予測できる。テストデータは、原子数が3から29までの2万分子からなり、$S \times T$ と ZPVE の範囲はそれぞれ 36 kcal mol$^{-1}$ と 161 kcal mol$^{-1}$ に及ぶ。また、ゲルシュゴリンの円定理に基づくディスクリプタは、分子の原子結合を明示的に考慮するグラフニューラルネットワークに基づくディスクリプタよりも正確な分子エントロピーモデルを与えることを示す。

We demonstrate accurate data-starved models of molecular properties for interpolation in chemical compound spaces with low-dimensional descriptors. Our starting point is based on three-dimensional, universal, physical descriptors derived from the properties of the distributions of the eigenvalues of Coulomb matrices. To account for the shape and composition of molecules, we combine these descriptors with six-dimensional features informed by the Gershgorin circle theorem. We use the nine-dimensional descriptors thus obtained for Gaussian process regression based on kernels with variable functional form, leading to extremely efficient, low-dimensional interpolation models. The resulting models trained with 100 molecules are able to predict the product of entropy and temperature ($S \times T$) and zero point vibrational energy (ZPVE) with the absolute error under 1 kcal mol$^{-1}$ for $> 78$ \% and under 1.3 kcal mol$^{-1}$ for $> 92$ \% of molecules in the test data. The test data comprises 20,000 molecules with complexity varying from three atoms to 29 atoms and the ranges of $S \times T$ and ZPVE covering 36 kcal mol$^{-1}$ and 161 kcal mol$^{-1}$, respectively. We also illustrate that the descriptors based on the Gershgorin circle theorem yield more accurate models of molecular entropy than those based on graph neural networks that explicitly account for the atomic connectivity of molecules.
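
The two ingredients named above, the Coulomb matrix and the Gershgorin circle theorem, can be sketched in a few lines. This is a minimal sketch under stated assumptions: it uses the commonly cited Coulomb-matrix definition (diagonal $0.5 Z_i^{2.4}$, off-diagonal $Z_i Z_j / |R_i - R_j|$) and returns one Gershgorin disc (center, radius) per atom. How the paper condenses such discs into its six-dimensional features is not stated in the abstract, so that step is not reproduced here; the function names are hypothetical.

```python
import math


def coulomb_matrix(charges, coords):
    """Build the Coulomb matrix from nuclear charges and 3D coordinates.

    Assumed definition: M[i][i] = 0.5 * Z_i**2.4 and
    M[i][j] = Z_i * Z_j / |R_i - R_j| for i != j.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


def gershgorin_discs(matrix):
    """Return one (center, radius) pair per row.

    By the Gershgorin circle theorem every eigenvalue lies in at least one
    disc centered at M[i][i] with radius sum_{j != i} |M[i][j]|.
    """
    discs = []
    for i, row in enumerate(matrix):
        radius = sum(abs(value) for j, value in enumerate(row) if j != i)
        discs.append((row[i], radius))
    return discs
```

The discs bound the eigenvalue spectrum without diagonalizing the matrix, while still depending on which atoms are present (the centers) and how they are arranged (the radii).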

翻訳日:2023-11-28 19:01:43 公開日:2023-11-26

# Insect-Foundation: Visual Insect Understandingのための基盤モデルと大規模100万データセット

Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding ( http://arxiv.org/abs/2311.15206v1 )

ライセンス: Link先を確認

Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan Bac Nguyen, Ashley Dowling, Xin Li, Khoa Luu

(参考訳) 精密農業において、昆虫の検出と認識は、作物が健康に育ち、高品質な収量を生み出す能力において重要な役割を果たす。現在のマシンビジョンモデルは、高いパフォーマンスを達成するために大量のデータを必要とする。しかし、世界中で約550万種の昆虫が生息している。既存の昆虫のデータセットは、地理的に異なる場所と取得コストのために、そのわずかしかカバーできない。本稿では,昆虫に関する基礎モデルトレーニングに革命をもたらすゲーム変換リソースである'Insect-1M''データセットを紹介する。私たちのデータセットは昆虫の幅広い範囲をカバーしており、100万枚の画像に分類階層と昆虫の記述の密接な識別ラベルがあり、昆虫学のパノラマ的なビューを提供しています。そこで本研究では,昆虫画像間の微妙な相違を識別できるパッチワイド関連注意機構を備えた,微小機能自己教師型学習法を開発した。さらに,昆虫記述による微小機能モデリングを改善するために,記述一貫性損失を導入する。本研究は,昆虫モデルにおける提案手法の有効性を実証し,昆虫関連課題の標準ベンチマークにおける最新性能を実現する。当社の昆虫財団モデルとデータセットは、次世代昆虫関連視覚モデルに力を与え、精密農業の究極の目標に近付くことを約束しています。

In precision agriculture, the detection and recognition of insects play an essential role in the ability of crops to grow healthy and produce a high-quality yield. The current machine vision model requires a large volume of data to achieve high performance. However, there are approximately 5.5 million different insect species in the world. None of the existing insect datasets can cover even a fraction of them due to varying geographic locations and acquisition costs. In this paper, we introduce a novel ``Insect-1M'' dataset, a game-changing resource poised to revolutionize insect-related foundation model training. Covering a vast spectrum of insect species, our dataset, including 1 million images with dense identification labels of taxonomy hierarchy and insect descriptions, offers a panoramic view of entomology, enabling foundation models to comprehend visual and semantic information about insects like never before. Then, to efficiently establish an Insect Foundation Model, we develop a micro-feature self-supervised learning method with a Patch-wise Relevant Attention mechanism capable of discerning the subtle differences among insect images. In addition, we introduce Description Consistency loss to improve micro-feature modeling via insect descriptions. Through our experiments, we illustrate the effectiveness of our proposed approach in insect modeling and achieve State-of-the-Art performance on standard benchmarks of insect-related tasks. Our Insect Foundation Model and Dataset promise to empower the next generation of insect-related vision models, bringing them closer to the ultimate goal of precision agriculture.

翻訳日:2023-11-28 19:01:15 公開日:2023-11-26

# SAR船舶分類のための手作り共同特徴ビュー付きデュアルストリームコントラスト予測ネットワーク

Dual-stream contrastive predictive network with joint handcrafted feature view for SAR ship classification ( http://arxiv.org/abs/2311.15202v1 )

ライセンス: Link先を確認

Xianting Feng, Hao zheng, Zhigang Hu, Liu Yang, Meiguang Zheng

(参考訳) 既存の合成開口レーダー(SAR)の船種分類技術は、ラベルのないSARの船種画像の識別特性を無視して、正確なラベル付きデータに大きく依存している。研究者は従来の手作りの機能を取り入れてCNNベースの機能を充実させようとするが、既存の手法は情報冗長性を容易に引き起こし、それらの相互作用を捉えるのに失敗する。これらの問題に対処するために,2つの非対称なタスク設計と偽陰性サンプル除去モジュールからなる新しい二ストリームコントラスト予測ネットワーク(DCPNet)を提案する。最初のタスクは正のサンプルペアを構築し、コアエンコーダにより一般的な表現を学習させることである。第2の課題は, 深部特徴と手話特徴との対応を適応的に把握し, モデル内での知識伝達を実現し, 特徴融合による冗長性を効果的に改善することである。クラスタ間の分離性を高めるため、クラスタレベルのタスクも設計する。 OpenSARShipとFUSAR-Shipデータセットの実験結果は、教師付きモデルの分類精度の向上を示し、DCPNetの効果的な表現の学習能力を確認する。

Most existing synthetic aperture radar (SAR) ship classification technologies heavily rely on correctly labeled data, ignoring the discriminative features of unlabeled SAR ship images. Even though researchers try to enrich CNN-based features by introducing traditional handcrafted features, existing methods easily cause information redundancy and fail to capture the interaction between them. To address these issues, we propose a novel dual-stream contrastive predictive network (DCPNet), which consists of two asymmetric task designs and a false negative sample elimination module. The first task is to construct positive sample pairs, guiding the core encoder to learn more general representations. The second task is to encourage adaptive capture of the correspondence between deep features and handcrafted features, achieving knowledge transfer within the model and effectively reducing the redundancy caused by feature fusion. To increase the separability between clusters, we also design a cluster-level task. The experimental results on the OpenSARShip and FUSAR-Ship datasets demonstrate the improvement in classification accuracy of supervised models and confirm DCPNet's capability of learning effective representations.

翻訳日:2023-11-28 19:00:48 公開日:2023-11-26

# SpliceMix: マルチラベル画像分類のためのクロススケール・セマンティックブレンド拡張戦略

SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification ( http://arxiv.org/abs/2311.15200v1 )

ライセンス: Link先を確認

Lei Wang and Yibing Zhan and Leilei Ma and Dapeng Tao and Liang Ding and Chen Gong

(参考訳) 近年、ミックススタイルのデータ拡張手法(例えばmixupやcutmix)が様々なビジュアルタスクで有望なパフォーマンスを示している。しかし、これらの手法は主にシングルラベル画像のために設計されており、シングルラベル画像とマルチラベル画像のかなりの差を無視している。一方で、従来のマルチラベル画像分類(mlic)法は、複雑なモデルを設計する傾向があり、高価な計算をもたらす。本稿では,マルチラベル画像分類,すなわちSpliceMixの簡易かつ効果的な拡張戦略を提案する。私たちのメソッドのspliceは2倍です。 1) 混合画像は,複数のダウンサンプリングされた画像を格子状に分割し,混合に係わる画像の意味を,共起バイアスを緩和する対象の欠陥を伴わずにブレンドする。 2)混合画像とオリジナルのミニバッチをスプライシングし,新しいスプライス混合ミニバッチを形成した。さらに、SpliceMixedのミニバッチは、混合画像と元の正規画像との相互作用を可能にする。また,一貫性学習(splicemix-cl)に基づく簡易かつ非パラメトリックな拡張を提供し,splicemixの柔軟な拡張性を示す。様々なタスクに関する大規模な実験は、ベースラインモデル(例えばResNet)でSpliceMixを使用するだけで、最先端のメソッドよりも優れたパフォーマンスが得られることを示した。さらに、SpliceMixの一般化性は、SpliceMixとの結婚時に現在のMLICメソッドの改善によってさらに検証される。コードはhttps://github.com/zuiran/splicemixで入手できる。

Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have shown promising performance in various visual tasks. However, these methods are primarily designed for single-label images, ignoring the considerable discrepancies between single- and multi-label images, i.e., a multi-label image involves multiple co-occurred categories and fickle object scales. On the other hand, previous multi-label image classification (MLIC) methods tend to design elaborate models, bringing expensive computation. In this paper, we introduce a simple but effective augmentation strategy for multi-label image classification, namely SpliceMix. The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together. Furthermore, such splice in our SpliceMixed mini-batch enables interactions between mixed images and original regular images. We also offer a simple and non-parametric extension based on consistency learning (SpliceMix-CL) to show the flexible extensibility of our SpliceMix. Extensive experiments on various tasks demonstrate that only using SpliceMix with a baseline model (e.g., ResNet) achieves better performance than state-of-the-art methods. Moreover, the generalizability of our SpliceMix is further validated by the improvements in current MLIC methods when married with our SpliceMix. The code is available at https://github.com/zuiran/SpliceMix.
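
The first kind of "splice" described above (several downsampled images tiled into one grid image) can be sketched as follows. This is a rough sketch, not the authors' implementation, which is available at the repository linked above. It assumes images are nested lists of pixel values, uses stride-based downsampling, and assumes the mixed image takes the union of the source label sets; `downsample` and `splice_grid` are hypothetical names.

```python
def downsample(image, factor):
    """Keep every `factor`-th pixel along both axes (nearest-neighbour style)."""
    return [row[::factor] for row in image[::factor]]


def splice_grid(images, labels, grid=2):
    """Tile grid*grid downsampled images into one image of the original size.

    `images` is a list of equally sized 2D lists; `labels` is a list of sets.
    The label of the mixed image is assumed to be the union of all labels.
    """
    assert len(images) == grid * grid
    small = [downsample(image, grid) for image in images]
    mixed = []
    for grid_row in range(grid):
        tiles = small[grid_row * grid:(grid_row + 1) * grid]
        for y in range(len(tiles[0])):
            # Concatenate row y of every tile in this grid row, left to right.
            mixed.append([pixel for tile in tiles for pixel in tile[y]])
    mixed_labels = set().union(*labels)
    return mixed, mixed_labels
```

Appending such mixed images to the original mini-batch would give the second kind of splice, so that the same objects are seen at full and reduced scale in one training step.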

翻訳日:2023-11-28 19:00:27 公開日:2023-11-26

# ChatGPTとその先: 教育における生成AI革命

ChatGPT and Beyond: The Generative AI Revolution in Education ( http://arxiv.org/abs/2311.15198v1 )

ライセンス: Link先を確認

Mohammad AL-Smadi

(参考訳) 生成的人工知能(AI)モデル、特にChatGPTの普及と利用が、教育現場におけるその潜在的な応用を探求する研究の急増を引き起こした。本調査は,2022年11月から2023年7月までに発行された学術文献について,特にscopus-indexed q1およびq2ジャーナルのハイインパクト研究を対象とする。この調査は、様々な教育的文脈における生成AIモデルの実践的応用と意味を掘り下げるものである。近年の学術文献の包括的かつ厳密な評価を通じて、この調査は、教育における生成的AIモデル、特にChatGPTの進化的役割を解明することを目指している。このダイナミックな分野における潜在的利益、課題、そして新たなトレンドを振り返ることで、この調査は、人工知能と教育の橋渡しの理解に寄与することに努めている。このレビューの結果は、教育者、研究者、政策立案者に対して、AI技術の学習環境への統合に関する情報的な決定を下すよう促す。

The wide adoption and usage of generative artificial intelligence (AI) models, particularly ChatGPT, has sparked a surge in research exploring their potential applications in the educational landscape. This survey examines academic literature published between November, 2022, and July, 2023, specifically targeting high-impact research from Scopus-indexed Q1 and Q2 journals. This survey delves into the practical applications and implications of generative AI models across a diverse range of educational contexts. Through a comprehensive and rigorous evaluation of recent academic literature, this survey seeks to illuminate the evolving role of generative AI models, particularly ChatGPT, in education. By shedding light on the potential benefits, challenges, and emerging trends in this dynamic field, the survey endeavors to contribute to the understanding of the nexus between artificial intelligence and education. The findings of this review will empower educators, researchers, and policymakers to make informed decisions about the integration of AI technologies into learning environments.

翻訳日:2023-11-28 18:59:59 公開日:2023-11-26

# アンサンブル窒素空孔中心を用いた高感度広帯域マイクロ波センシングの実証

Demonstration of highly-sensitive wideband microwave sensing using ensemble nitrogen-vacancy centers ( http://arxiv.org/abs/2311.15196v1 )

ライセンス: Link先を確認

Kensuke Ogawa, Shunsuke Nishimura, Kento Sasaki, Kensuke Kobayasahi

(参考訳) マイクロ波磁気測定はマイクロ波技術の進歩に不可欠である。ダイヤモンド中のアンサンブル窒素空孔(NV)中心を用いた、交流ゼーマン効果による広帯域マイクロ波センシングプロトコルを実証する。広視野顕微鏡により、マイクロ波共振器の周波数特性と非共鳴マイクロ波振幅の空間分布を可視化できる。さらに、この手法を動的デカップリングと組み合わせることで、$5.2 \, \mathrm{\mu T} / \sqrt{\mathrm{Hz}}$ のマイクロ波振幅感度を達成した。これは、先行研究のプロトコルで得られた $40.2 \, \mathrm{\mu T} / \sqrt{\mathrm{Hz}}$ より7.7倍優れている(センシング体積は $2.77 \, \mathrm{\mu m} \times 2.77 \, \mathrm{\mu m} \times 30 \, \mathrm{nm}$)。我々の成果は、広帯域かつ広視野のマイクロ波イメージングにアンサンブルNV中心を適用するための具体的な一歩である。

Microwave magnetometry is essential for the advancement of microwave technologies. We demonstrate a broadband microwave sensing protocol using the AC Zeeman effect with ensemble nitrogen-vacancy (NV) centers in diamond. A widefield microscope can visualize the frequency characteristics of the microwave resonator and the spatial distribution of off-resonant microwave amplitude. Furthermore, by combining this method with dynamical decoupling, we achieve the microwave amplitude sensitivity of $5.2 \, \mathrm{\mu T} / \sqrt{\mathrm{Hz}}$, which is 7.7 times better than $40.2 \, \mathrm{\mu T} / \sqrt{\mathrm{Hz}}$ obtained using the protocol in previous research over a sensing volume of $2.77 \, \mathrm{\mu m} \times 2.77 \, \mathrm{\mu m} \times 30 \, \mathrm{nm}$. Our achievement is a concrete step in adapting ensemble NV centers for wideband and widefield microwave imaging.
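
As a quick check, the quoted improvement factor follows directly from the two sensitivities given in the abstract (a smaller value means a better sensitivity):

```latex
\[
  \frac{40.2 \,\mathrm{\mu T}/\sqrt{\mathrm{Hz}}}{5.2 \,\mathrm{\mu T}/\sqrt{\mathrm{Hz}}} \approx 7.7
\]
```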

翻訳日:2023-11-28 18:59:42 公開日:2023-11-26

# 基本原理知識者になるためのニューラルネットワークモデル

Neural Network Models of Becoming a Cardinal Principle Knower ( http://arxiv.org/abs/2311.15194v1 )

ライセンス: Link先を確認

Vima Gupta, Sashank Varma

(参考訳) 小学校に入ると、最初の50～100個の数字を記憶した数列から、後継関数を理解し、数え切れないほど無限となる数列の順序構造を理解するようになる。本研究では,N in (0, 98) のペア (N, N+1) における後継関数を学習する2つのニューラルネットワークモデルの発達変化について検討する。第1モデルは入力および出力値のワンホットエンコーディングを使用し、カウントリストを記憶する子供に対応し、第2モデルは位置値エンコーディングを使用し、命名番号の言語規則を学習する子供に対応する。位置-値モデルでは、十の境界を越えた表現的類似性の低下が予測された。テンス境界を越えた数え上げは、2次元空間におけるベクトル演算として理解でき、同じテンス配置の数値は線形に分離可能な方法で構成され、同じテンス配置の数字はグループ分けされる。カリキュラム学習シミュレーションは, 発達期児の発達する数値環境において, より少ない数の表現が, より大きい数の表現が学習され始めれば, より鋭くなり続けることを示す。これらのモデルは、後続関数の学習を超えて、より一般的な数え上げ過程をシミュレートし、可算無限を理解することの意味をより深く理解するために、再帰的アーキテクチャを用いた将来の作業の舞台となった。

As children enter elementary school, their understanding of the ordinal structure of numbers transitions from a memorized count list of the first 50-100 numbers to knowing the successor function and understanding the countably infinite. We investigate this developmental change in two neural network models that learn the successor function on the pairs (N, N+1) for N in (0, 98). The first uses a one-hot encoding of the input and output values and corresponds to children memorizing a count list, while the second model uses a place-value encoding and corresponds to children learning the language rules for naming numbers. The place-value model showed a predicted drop in representational similarity across tens boundaries. Counting across a tens boundary can be understood as a vector operation in 2D space, where the numbers with the same tens place are organized in a linearly separable manner, whereas those with the same ones place are grouped together. A curriculum learning simulation shows that, in the expanding numerical environment of the developing child, representations of smaller numbers continue to be sharpened even as larger numbers begin to be learned. These models set the stage for future work using recurrent architectures to move beyond learning the successor function to simulating the counting process more generally, and point towards a deeper understanding of what it means to understand the countably infinite.
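
The difference between the two input encodings can be seen without training a network. The sketch below is an illustration under assumptions: the abstract does not give the exact vector layout, so the place-value code is taken here to be a one-hot tens digit concatenated with a one-hot ones digit. The helper names are hypothetical.

```python
def one_hot(n, size=100):
    """Encode n as a vector with a single 1 at position n."""
    vec = [0] * size
    vec[n] = 1
    return vec


def place_value(n):
    """Encode n as a one-hot tens digit followed by a one-hot ones digit."""
    tens, ones = divmod(n, 10)
    return one_hot(tens, 10) + one_hot(ones, 10)


def overlap(a, b):
    """Count the positions where both vectors are active (dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Training pairs (N, N+1) for N = 0..98, as described in the abstract.
pairs = [(n, n + 1) for n in range(0, 99)]
```

Under this assumed encoding, 17 and 18 share their tens unit (overlap 1), while 19 and 20 share nothing (overlap 0). This input-level discontinuity is one way to picture the similarity drop across tens boundaries that the place-value model shows. Distinct numbers never overlap under the one-hot code, so it carries no such structure.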

翻訳日:2023-11-28 18:59:14 公開日:2023-11-26

# IA-LSTM:歩行者軌道予測のための対話型LSTM

IA-LSTM: Interaction-Aware LSTM for Pedestrian Trajectory Prediction ( http://arxiv.org/abs/2311.15193v1 )

ライセンス: Link先を確認

Yuehai Chen

(参考訳) 群衆シナリオにおける歩行者の軌道予測は、衝突を避けるための政策決定に有用であるため、自動運転や自律移動ロボット分野において不可欠である。人間は異なる歩行運動を持ち、現在の環境における人間と物体、特に人間自身との相互作用は複雑であるため、これは難しい問題である。しかし、従来の研究では人間と人間の相互作用をモデル化する方法に焦点が当てられていた。この問題に対処するために,人間と人間の相互作用の相対的重要性を計測できるだけでなく,歩行者ごとに個人的な空間を構築できるコレントロピーに基づく新しいメカニズムを導入する。さらに,シーン内の動的ヒューマンインタラクションの特徴表現を効果的に抽出し,対応する重みを計算し,異なるインタラクションの重要性を表現できる,このデータ駆動機構を含むインタラクションモジュールを提案する。このような社会的メッセージを歩行者間で共有するために、軌道予測のためのLong Short-Term Memory(LSTM)ネットワークに基づく対話型アーキテクチャを設計する。 2つの公開データセットでモデルの性能を実証し, 実験結果から, 従来の手法よりも優れた性能が得られることを示した。

Predicting the trajectories of pedestrians in crowd scenarios is indispensable in the self-driving and autonomous mobile robot fields, because estimating the future locations of surrounding pedestrians benefits policy decisions for collision avoidance. It is a challenging problem because humans have different walking motions, and the interactions between humans and objects in the current environment, especially among humans themselves, are complex. Previous research has focused on how to model human-human interactions but has neglected the relative importance of those interactions. To address this issue, we introduce a novel mechanism based on correntropy, which not only measures the relative importance of human-human interactions but also builds a personal space for each pedestrian. We further propose an Interaction Module including this data-driven mechanism that can effectively extract feature representations of dynamic human-human interactions in the scene and calculate corresponding weights to represent the importance of different interactions. To share such social messages among pedestrians, we design an interaction-aware architecture based on the Long Short-Term Memory (LSTM) network for trajectory prediction. We demonstrate the performance of our model on two public datasets, and the experimental results show that it achieves better performance than several recent, well-performing methods.
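
Correntropy is usually defined as the expected value of a kernel, typically Gaussian, evaluated on the difference between two variables. The sketch below shows only how a Gaussian kernel can turn distances into interaction weights; it is an assumption-laden illustration, not the paper's Interaction Module, whose details the abstract does not give. The function names and the kernel width `sigma` are hypothetical.

```python
import math


def gaussian_kernel(p, q, sigma=1.0):
    """Gaussian kernel of the squared Euclidean distance between two points."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def interaction_weights(positions, focal, sigma=1.0):
    """Normalized kernel weights of all other pedestrians relative to `focal`.

    Nearby pedestrians receive weights close to the maximum, while distant
    ones decay rapidly, so `sigma` acts like the radius of a personal space.
    """
    raw = {
        index: gaussian_kernel(positions[focal], position, sigma)
        for index, position in enumerate(positions)
        if index != focal
    }
    total = sum(raw.values())
    return {index: weight / total for index, weight in raw.items()}
```

For example, `interaction_weights([(0, 0), (1, 0), (5, 0)], focal=0)` gives the pedestrian one unit away a much larger weight than the one five units away.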

翻訳日:2023-11-28 18:58:46 公開日:2023-11-26

# 大規模言語モデルのボラティリティのベンチマーク

Benchmarking Large Language Model Volatility ( http://arxiv.org/abs/2311.15180v1 )

ライセンス: Link先を確認

Boyang Yu

(参考訳) 大規模言語モデル(LLM)からの非決定論的アウトプットの影響は,財務テキスト理解タスクにおいて十分に検討されていない。ニュース感情分析による米国株式市場への投資に関する説得力のあるケーススタディを通じて、文レベルの感情分類結果の実質的な変動を明らかにし、llm出力の生来のボラティリティを強調する。これらの不確実性は下流に流れ込み、ポートフォリオの構築とリターンに大きな変化をもたらした。言語モデルデコーダの温度パラメータを微調整すると、潜在的な対策が提示されるが、創造性を損なうことになる。同様に、複数の出力をアンサンブルすることは揮発性出力の効果を緩和するが、注目すべき計算投資を必要とする。本研究は,LLMの金融意思決定への統合の不確実性,特に非決定論的情報によって決定されるシナリオにおいて,不確実性に対処するための貴重な洞察を実践者に与えている。

The impact of non-deterministic outputs from Large Language Models (LLMs) is not well examined for financial text understanding tasks. Through a compelling case study on investing in the US equity market via news sentiment analysis, we uncover substantial variability in sentence-level sentiment classification results, underscoring the innate volatility of LLM outputs. These uncertainties cascade downstream, leading to more significant variations in portfolio construction and return. While tweaking the temperature parameter in the language model decoder presents a potential remedy, it comes at the expense of stifled creativity. Similarly, while ensembling multiple outputs mitigates the effect of volatile outputs, it demands a notable computational investment. This work furnishes practitioners with invaluable insights for adeptly navigating uncertainty in the integration of LLMs into financial decision-making, particularly in scenarios dictated by non-deterministic information.
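
The ensembling remedy mentioned above can be sketched as a majority vote over repeated runs on the same sentence. This is a minimal sketch under assumptions: the labels are plain strings that would come from repeated model calls (not shown), and ties resolve to the label seen first. `majority_vote` and `disagreement_rate` are hypothetical helper names, not functions from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(labels)
    label, _ = counts.most_common(1)[0]
    return label


def disagreement_rate(labels):
    """Fraction of runs whose label differs from the majority label."""
    top = majority_vote(labels)
    return sum(1 for label in labels if label != top) / len(labels)
```

Each extra run is another model call, which is the computational cost the abstract refers to. The disagreement rate gives a simple per-sentence measure of how volatile the outputs are.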

翻訳日:2023-11-28 18:58:23 公開日:2023-11-26

# HumanRecon: 幾何学的手がかりと物理的事前知識を用いた動的人体のニューラル再構成

HumanRecon: Neural Reconstruction of Dynamic Human Using Geometric Cues and Physical Priors ( http://arxiv.org/abs/2311.15171v1 )

ライセンス: Link先を確認

Junhui Yin, Wei Yin, Hao Chen, Xuqian Ren, Zhanyu Ma, Jun Guo, Yifan Liu

(参考訳) 近年の動的人体再構成法は有望な再構成結果を得ている。これらの手法の多くは、明示的な幾何学的制約を考慮せずにRGB色による監視のみに依存している。そのため、既存の人体再構成技術は色に過剰適合しやすく、特に疎なマルチビュー設定において幾何学的に本質的な曖昧さを生じる。単眼幾何推定の分野における最近の進歩に着想を得て、動的人体再構成のためのニューラル暗黙表現の学習において、推定された深度と法線による幾何学的制約を考慮する。これは幾何学的正則化として、信頼性が高く明示的な監視情報を提供し、再構成品質を向上させる。また、視線方向へのノイズの付加や人体表面での密度の最大化など、いくつかの有益な物理的事前知識も活用する。これらの事前知識は、光線に沿ってレンダリングされる色が視線方向に対して頑健であることを保証し、光線に沿って推定される密度の本質的な曖昧さを低減する。実験の結果、人体に特化した単眼推定器によって予測される深度と法線の手がかりは、効果的な監視信号を提供し、より正確な画像のレンダリングを可能にすることが示された。最後に、提案する物理的事前知識により、過剰適合が著しく減少し、新規ビュー合成の全体的な品質が向上することを示す。コードは https://github.com/PRIS-CV/HumanRecon で利用可能である。

Recent methods for dynamic human reconstruction have attained promising reconstruction results. Most of these methods rely only on RGB color supervision without considering explicit geometric constraints. This makes existing human reconstruction techniques more prone to overfitting to color and causes inherent geometric ambiguities, especially in the sparse multi-view setup. Motivated by recent advances in the field of monocular geometry prediction, we consider the geometric constraints of estimated depth and normals in the learning of neural implicit representation for dynamic human reconstruction. As a geometric regularization, this provides reliable yet explicit supervision information and improves reconstruction quality. We also exploit several beneficial physical priors, such as adding noise into the view direction and maximizing the density on the human surface. These priors ensure that the color rendered along rays is robust to the view direction and reduce the inherent ambiguities of the density estimated along rays. Experimental results demonstrate that depth and normal cues, predicted by human-specific monocular estimators, can provide effective supervision signals and render more accurate images. Finally, we also show that the proposed physical priors significantly reduce overfitting and improve the overall quality of novel view synthesis. Our code is available at: https://github.com/PRIS-CV/HumanRecon.

翻訳日:2023-11-28 18:58:07 公開日:2023-11-26

# 配電系統における高インピーダンス故障位置推定のためのデータ駆動手法

A Data-Driven Approach for High-Impedance Fault Localization in Distribution Systems ( http://arxiv.org/abs/2311.15168v1 )

ライセンス: Link先を確認

Yuqi Zhou, Yuqing Dong and Rui Yang

(参考訳) 配電系統の信頼性の高い運用には,高精度で迅速な障害同定が不可欠である。送電網の他の故障とは異なり、hifは低故障電流のため従来の過電流リレーでは検出が極めて困難である。 HIFは様々な要因によって影響を受けるが、電圧電流特性は、システムが障害にどう反応するかを著しく示唆し、HIFを効果的にローカライズする機会を与える。本研究では,HIFイベントの識別のためのデータ駆動型手法を提案する。まず、電圧電流軌道の非線形性に取り組むため、分割関数で軌道を近似する最適化問題を定式化する。次に,すべてのセグメントの機能特徴を入力として収集し,サポートベクターマシンアプローチを用いて異なる場所でのhifを効率的に識別する。 IEEE 123-node test feederの数値的研究により,実時間HIF識別のための提案手法の有効性と精度が示された。

Accurate and quick identification of high-impedance faults is critical for the reliable operation of distribution systems. Unlike other faults in power grids, HIFs are very difficult to detect by conventional overcurrent relays due to the low fault current. Although HIFs can be affected by various factors, the voltage current characteristics can substantially imply how the system responds to the disturbance and thus provides opportunities to effectively localize HIFs. In this work, we propose a data-driven approach for the identification of HIF events. To tackle the nonlinearity of the voltage current trajectory, first, we formulate optimization problems to approximate the trajectory with piecewise functions. Then we collect the function features of all segments as inputs and use the support vector machine approach to efficiently identify HIFs at different locations. Numerical studies on the IEEE 123-node test feeder demonstrate the validity and accuracy of the proposed approach for real-time HIF identification.

翻訳日:2023-11-28 18:57:42 公開日:2023-11-26

# Self-supervised OCT Image Denoising with Slice-to-Slice Registration and Reconstruction

http://arxiv.org/abs/2311.15167v1

License: see the linked page

Shijie Li, Palaiologos Alexopoulos, Anse Vellappally, Ronald Zambrano, Wollstein Gadi, Guido Gerig

Strong speckle noise is inherent to optical coherence tomography (OCT) imaging and represents a significant obstacle to accurate quantitative analysis of retinal structures, which is key for advances in clinical diagnosis and monitoring of disease. Learning-based self-supervised methods for structure-preserving noise reduction have demonstrated superior performance over traditional methods but face unique challenges in OCT imaging. The high correlation of voxels generated by coherent A-scan beams undermines the efficacy of self-supervised learning methods because it violates the assumption of independent pixel noise. We conduct experiments demonstrating the limitations of existing models that stem from this independence assumption. We then introduce a new end-to-end self-supervised learning framework specifically tailored for OCT image denoising, integrating slice-by-slice training and registration modules into one network. An extensive ablation study is conducted for the proposed approach. Comparison with previously published self-supervised denoising models demonstrates improved performance of the proposed framework, which could serve as a preprocessing step towards superior segmentation performance and quantitative analysis.

Translated: 2023-11-28 18:57:29 Published: 2023-11-26

# Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off

http://arxiv.org/abs/2311.15165v1

License: see the linked page

Yatong Bai, Brendon G. Anderson, Somayeh Sojoudi

Machine learning models have recently found tremendous success in data-driven control systems. However, standard learning models often suffer from an accuracy-robustness trade-off, a limitation that must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we build upon the recent "locally biased smoothing" method to develop classifiers that simultaneously inherit high accuracy from standard models and high robustness from robust models. Specifically, we extend locally biased smoothing to the multi-class setting, and then overcome its performance bottleneck by generalizing the formulation to "mix" the outputs of a standard neural network and a robust neural network. We prove that when the robustness of the robust base model is certifiable, no alteration or attack on an input within a closed-form $\ell_p$ radius can result in misclassification by the mixed classifier; that is, the proposed model inherits the certified robustness. Moreover, we use numerical experiments on the CIFAR-10 benchmark dataset to verify that the mixed model noticeably improves the accuracy-robustness trade-off.
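
To make the idea of mixing two classifiers concrete, here is a minimal sketch. It assumes a plain convex combination of class-probability vectors with a fixed weight `alpha`, which is an illustrative simplification and not necessarily the formulation used in the paper. The margin condition in `robust_margin_dominates` follows from elementary algebra: probability differences of the standard model lie in [-1, 1], so a robust-model margin above (1 - alpha) / alpha cannot be overturned. All names and numbers are hypothetical.

```python
def mix(p_std, p_rob, alpha):
    """Convex combination of two probability vectors; alpha weights the robust model."""
    return [(1 - alpha) * s + alpha * r for s, r in zip(p_std, p_rob)]

def argmax(p):
    """Index of the largest entry."""
    return max(range(len(p)), key=p.__getitem__)

def robust_margin_dominates(p_rob, alpha):
    """True if the robust model's top-1 margin exceeds (1 - alpha) / alpha.

    In that case no output of the standard model can flip the mixed prediction.
    """
    top, runner_up = sorted(p_rob, reverse=True)[:2]
    return top - runner_up > (1 - alpha) / alpha

p_std = [0.05, 0.90, 0.05]  # standard model, possibly fooled by an attack
p_rob = [0.80, 0.15, 0.05]  # robust model
alpha = 0.7
mixed = mix(p_std, p_rob, alpha)
prediction = argmax(mixed)  # follows the robust model's class 0
```

With a smaller `alpha` the mixture leans on the standard model's accuracy, and the margin test fails more often, which is one way to picture the trade-off being tuned.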

Translated: 2023-11-28 18:57:09 Published: 2023-11-26

# Modeling the unphysical pseudomode model with physical ensembles: simulation, mitigation, and restructuring of non-Markovian quantum noise

http://arxiv.org/abs/2311.15240v1

License: see the linked page

Mauro Cirio, Si Luo, Pengfei Liang, Franco Nori, Neill Lambert

The influence of a Gaussian environment on a quantum system can be described by effectively replacing the continuum with a discrete set of ancillary quantum and classical degrees of freedom. This defines a pseudomode model which can be used to classically simulate the reduced system dynamics. Here, we consider an alternative point of view and analyze the potential benefits of an analog or digital quantum simulation of the pseudomode model itself. Superficially, such a direct experimental implementation is, in general, impossible due to the unphysical properties of the effective degrees of freedom involved. However, we show that the effects of the unphysical pseudomode model can still be reproduced using measurement results over an ensemble of physical systems involving ancillary harmonic modes and an optional stochastic driving field. This is done by introducing an extrapolation technique whose efficiency is limited by stability against imprecision in the measurement data. We examine how such a simulation would allow us to (i) perform accurate quantum simulation of the effects of complex non-perturbative and non-Markovian environments in regimes that are challenging for classical simulation, (ii) conversely, mitigate potential unwanted non-Markovian noise present in quantum devices, and (iii) restructure some of the properties of a given physical bath, such as its temperature.

Translated: 2023-11-28 18:50:07 Published: 2023-11-26

# A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

http://arxiv.org/abs/2311.15238v1

License: see the linked page

Heyang Zhao and Jiafan He and Quanquan Gu

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB), for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large and near-optimal policy switching cost of $\tilde{O}(dH)$, with $d$ being the eluder dimension of the function class, $H$ being the planning horizon, and $K$ being the number of episodes. Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.

Translated: 2023-11-28 18:49:31 Published: 2023-11-26

# Double Reverse Regularization Network Based on Self-Knowledge Distillation for SAR Object Classification

http://arxiv.org/abs/2311.15231v1

License: see the linked page

Bo Xu, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng

In current synthetic aperture radar (SAR) object classification, one of the major challenges is severe overfitting caused by limited (few-shot) datasets and noisy data. Considering the advantages of knowledge distillation as a learned label smoothing regularization, this paper proposes a novel Double Reverse Regularization Network based on Self-Knowledge Distillation (DRRNet-SKD). Specifically, by exploring the effect of the distillation weight on the distillation process, we are inspired to adopt a double reverse strategy that implements an effective regularization network by combining offline and online distillation in a complementary way. Then, the Adaptive Weight Assignment (AWA) module is designed to adaptively assign two reverse-changing weights based on the network performance, allowing the student network to benefit more from both teachers. The experimental results on OpenSARShip and FUSAR-Ship demonstrate that DRRNet-SKD yields a remarkable performance improvement on classical CNNs, outperforming state-of-the-art self-knowledge distillation methods.

Translated: 2023-11-28 18:48:57 Published: 2023-11-26

# GAIA: Zero-shot Talking Avatar Generation

http://arxiv.org/abs/2311.15230v1

License: see the linked page

Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian

Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech and a single portrait image. Previous methods have relied on domain-specific heuristics such as warping-based motion representation and 3D Morphable Models, which limit the naturalness and diversity of the generated avatars. In this work, we introduce GAIA (Generative AI for Avatar), which eliminates the domain priors in talking avatar generation. In light of the observation that the speech only drives the motion of the avatar while the appearance of the avatar and the background typically remain the same throughout the entire video, we divide our approach into two stages: 1) disentangling each frame into motion and appearance representations; 2) generating motion sequences conditioned on the speech and reference portrait image. We collect a large-scale high-quality talking avatar dataset and train the model on it at different scales (up to 2B parameters). Experimental results verify the superiority, scalability, and flexibility of GAIA: 1) the resulting model beats previous baseline models in terms of naturalness, diversity, lip-sync quality, and visual quality; 2) the framework is scalable since larger models yield better results; 3) it is general and enables different applications such as controllable talking avatar generation and text-instructed avatar generation.

Translated: 2023-11-28 18:48:23 Published: 2023-11-26

# One-bit Supervision for Image Classification: Problem, Solution, and Beyond

http://arxiv.org/abs/2311.15225v1

License: see the linked page

Hengtong Hu, Lingxi Xie, Xinyue Hue, Richang Hong, Qi Tian

This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification. Instead of training the model with the accurate label of each sample, our setting requires the model to interact with the system by predicting the class label of each sample and to learn from the answer whether the guess is correct, which provides one bit (yes or no) of information. An intriguing property of the setting is that the annotation burden is largely alleviated compared with providing the accurate label. There are two keys to one-bit supervision: (i) improving the guess accuracy and (ii) making good use of the incorrect guesses. To achieve these goals, we propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm. Theoretical analysis shows that one-bit annotation is more efficient than full-bit annotation in most cases and gives the conditions for combining our approach with active learning. Inspired by this, we further integrate the one-bit supervision framework into a self-supervised learning algorithm, which yields an even more efficient training schedule. Unlike training from scratch, when self-supervised learning is used for initialization, both hard example mining and class balance are verified to be effective in boosting the learning performance. However, these two frameworks still need full-bit labels in the initial stage. To cast off this burden, we utilize unsupervised domain adaptation to train the initial model and conduct pure one-bit annotation on the target dataset. On multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision.
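
The interaction loop described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the paper's training pipeline: the model output is a fixed probability vector, the annotator is simulated by comparing against a known label, and "negative label suppression" is approximated by zeroing out classes already answered "no" and renormalizing. The function names are hypothetical.

```python
def one_bit_query(guess, true_label):
    """The annotator only answers yes (True) or no (False)."""
    return guess == true_label

def suppress_negatives(probs, negatives):
    """Zero out classes known to be wrong and renormalize the rest."""
    kept = [0.0 if k in negatives else p for k, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def annotate(probs, true_label, negatives):
    """One round: guess the likeliest class not yet ruled out, record the answer.

    Returns the confirmed label, or None if the guess was wrong.
    """
    probs = suppress_negatives(probs, negatives)
    guess = max(range(len(probs)), key=probs.__getitem__)
    if one_bit_query(guess, true_label):
        return guess
    negatives.add(guess)  # a wrong guess still carries information
    return None

probs = [0.5, 0.3, 0.2]  # hypothetical model output for one sample
negatives = set()
first = annotate(probs, true_label=1, negatives=negatives)   # guesses 0: "no"
second = annotate(probs, true_label=1, negatives=negatives)  # guesses 1: "yes"
```

The second round succeeds only because the first "no" was remembered, which is the sense in which incorrect guesses are put to good use.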

Translated: 2023-11-28 18:47:55 Published: 2023-11-26

# Decision Tree Psychological Risk Assessment in Currency Trading

http://arxiv.org/abs/2311.15222v1

License: see the linked page

Jai Pal

This research paper focuses on the integration of Artificial Intelligence (AI) into the currency trading landscape, positing the development of personalized AI models that essentially function as intelligent personal assistants tailored to the idiosyncrasies of individual traders. The paper posits that AI models are capable of identifying nuanced patterns within a trader's historical data, facilitating a more accurate and insightful assessment of psychological risk dynamics in currency trading. The PRI is a dynamic metric that fluctuates in response to market conditions that foster psychological fragility among traders. By employing sophisticated techniques, a classifying decision tree is crafted, enabling clearer decision-making boundaries within the tree structure. By incorporating the user's chronological trade entries, the model becomes adept at identifying critical junctures when psychological risks are heightened. The real-time nature of the calculations enhances the model's utility as a proactive tool, offering timely alerts to traders about impending moments of psychological risk. The implications of this research extend beyond the confines of currency trading, reaching into other industries where the judicious application of personalized modeling emerges as an efficient and strategic approach. This paper positions itself at the intersection of cutting-edge technology and the intricate nuances of human psychology, offering a transformative paradigm for decision-making support in dynamic and high-pressure environments.

Translated: 2023-11-28 18:47:18 Published: 2023-11-26

# The Local Landscape of Phase Retrieval Under Limited Samples

http://arxiv.org/abs/2311.15221v1

License: see the linked page

Kaizhao Liu, Zihao Wang, Lei Wu

In this paper, we provide a fine-grained analysis of the local landscape of phase retrieval in the regime of limited samples. Our aim is to ascertain the minimal sample size necessary to guarantee a benign local landscape surrounding global minima in high dimensions. Let $n$ and $d$ denote the sample size and input dimension, respectively. We first explore the local convexity and establish that when $n=o(d\log d)$, for almost every fixed point in the local ball, the Hessian matrix must have negative eigenvalues as long as $d$ is sufficiently large. Consequently, the local landscape is highly non-convex. We next consider the one-point strong convexity and show that as long as $n=\omega(d)$, with high probability, the landscape is one-point strongly convex in the local annulus: $\{w\in\mathbb{R}^d: o_d(1)\leqslant \|w-w^*\|\leqslant c\}$, where $w^*$ is the ground truth and $c$ is an absolute constant. This implies that gradient descent initialized from any point in this domain can converge to an $o_d(1)$-loss solution exponentially fast. Furthermore, we show that when $n=o(d\log d)$, there is a radius of $\widetilde\Theta\left(\sqrt{1/d}\right)$ such that one-point convexity breaks in the corresponding smaller local ball. This indicates that convergence to the exact $w^*$ cannot be established for gradient descent under limited samples by relying solely on one-point convexity.
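
For readers unfamiliar with the terminology, the following block spells out one common way to formalize the statements above. It is a sketch under assumptions: the abstract does not state the loss, so a standard quartic phase-retrieval loss is assumed, and the constants $\mu$, $\beta$ and the step size $\eta$ are introduced here only for illustration. The last display shows why one-point strong convexity, together with a gradient bound, gives exponentially fast contraction towards $w^*$ while the iterates stay in the annulus.

```latex
% Assumed loss (not stated in the abstract): quartic phase-retrieval objective.
L(w) = \frac{1}{4n} \sum_{i=1}^{n} \left( (x_i^\top w)^2 - (x_i^\top w^*)^2 \right)^2 ,
\qquad x_i \in \mathbb{R}^d .

% One-point strong convexity with respect to w^* on the annulus, for some \mu > 0:
\langle \nabla L(w),\, w - w^* \rangle \;\geqslant\; \mu \, \| w - w^* \|^2
\quad \text{whenever } o_d(1) \leqslant \| w - w^* \| \leqslant c .

% Gradient descent w_{t+1} = w_t - \eta \nabla L(w_t), under the extra
% assumption \| \nabla L(w) \| \leqslant \beta \| w - w^* \|:
\| w_{t+1} - w^* \|^2
  = \| w_t - w^* \|^2 - 2\eta \langle \nabla L(w_t),\, w_t - w^* \rangle
    + \eta^2 \| \nabla L(w_t) \|^2
  \;\leqslant\; \left( 1 - 2\eta\mu + \eta^2 \beta^2 \right) \| w_t - w^* \|^2 .
```

Choosing $\eta = \mu/\beta^2$ gives the contraction factor $1 - \mu^2/\beta^2 < 1$. The argument stops applying once the iterate leaves the annulus through its inner boundary, which matches the abstract's point that exact recovery of $w^*$ does not follow from one-point convexity alone.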

Translated: 2023-11-28 18:46:53 Published: 2023-11-26

# Dataset for Stock Market Forecasting Based on Quantitative Analysis and Qualitative Data

http://arxiv.org/abs/2311.15218v1

License: see the linked page

Sai Akash Bathini, Dagli Cihan

The application of machine learning to finance has become a familiar approach, even more so in stock market forecasting. The stock market is highly volatile, and huge amounts of data are generated every minute globally. Extracting effective intelligence from this data is of critical importance. However, combining numerical stock data with qualitative text data can be a challenging task. In this work, we accomplish this and provide an unprecedented, publicly available dataset with technical and fundamental data together with sentiment that we gathered from news archives, TV news captions, radio transcripts, tweets, daily financial newspapers, and other sources. The text data entries used for sentiment extraction total more than 1.4 million. The dataset comprises daily entries from January 2018 to December 2022 for 8 different companies and the Dow Jones Index as a whole. Holistic fundamental and technical data are provided in training-ready form for model learning and deployment. The predictive power of deep learning models is largely determined by the training data provided. This dataset would benefit research worldwide that incorporates qualitative intelligence into stock market forecasting. The dataset is made available at https://github.com/batking24/Huge-Stock-Dataset.

Translated: 2023-11-28 18:46:16 Published: 2023-11-26

# Solve Large-scale Unit Commitment Problems by Physics-informed Graph Learning

http://arxiv.org/abs/2311.15216v1

License: see the linked page

Jingtao Qin, Nanpeng Yu

Unit commitment (UC) problems are typically formulated as mixed-integer programs (MIP) and solved by the branch-and-bound (B&B) scheme. Recent advances in graph neural networks (GNN) make it possible to enhance the B&B algorithm in modern MIP solvers by learning to dive and branch. Existing GNN models that tackle MIP problems are mostly constructed from the mathematical formulation, which is computationally expensive when dealing with large-scale UC problems. In this paper, we propose a physics-informed hierarchical graph convolutional network (PI-GCN) for neural diving that leverages the underlying features of various components of power systems to find high-quality variable assignments. Furthermore, we adopt the MIP model-based graph convolutional network (MB-GCN) for neural branching to select the optimal variables for branching at each node of the B&B tree. Finally, we integrate neural diving and neural branching into a modern MIP solver to establish a novel neural MIP solver designed for large-scale UC problems. Numerical studies show that PI-GCN has better performance and scalability than the baseline MB-GCN on neural diving. Moreover, when combined with our proposed neural diving model and the baseline neural branching model, the neural MIP solver yields the lowest operational cost and outperforms a modern MIP solver on all testing days.
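
As background for the mixed-integer formulation mentioned above, a simplified textbook-style UC model is sketched below. It is an assumption for illustration and not the formulation used in the paper: it omits ramping, minimum up/down times, reserves, and network constraints. The symbols are introduced here: $u_{g,t}$ is the on/off status of unit $g$ in period $t$, $v_{g,t}$ its start-up indicator, $p_{g,t}$ its output, $c_g$ and $s_g$ the marginal and start-up costs, $D_t$ the demand, and $\underline{P}_g, \overline{P}_g$ the output limits.

```latex
\begin{aligned}
\min_{u,\,v,\,p} \quad
  & \sum_{t=1}^{T} \sum_{g=1}^{G} \left( c_g\, p_{g,t} + s_g\, v_{g,t} \right) \\
\text{s.t.} \quad
  & \sum_{g=1}^{G} p_{g,t} = D_t
      && \forall t \quad \text{(power balance)} \\
  & \underline{P}_g\, u_{g,t} \le p_{g,t} \le \overline{P}_g\, u_{g,t}
      && \forall g, t \quad \text{(output limits when committed)} \\
  & v_{g,t} \ge u_{g,t} - u_{g,t-1}
      && \forall g, t \quad \text{(start-up detection)} \\
  & u_{g,t},\, v_{g,t} \in \{0, 1\}
      && \forall g, t .
\end{aligned}
```

The binary variables $u_{g,t}$ are what branch-and-bound branches on. In the abstract's terms, diving proposes values for many of them at once, while branching picks which one to split on at each tree node.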

Translated: 2023-11-28 18:45:59 Published: 2023-11-26

# A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization

http://arxiv.org/abs/2311.15214v1

License: see the linked page

Feiping Nie, Jitao Lu, Danyang Wu, Rong Wang, Xuelong Li

Normalized-Cut (N-Cut) is a famous model of spectral clustering. The traditional N-Cut solvers are two-stage: 1) calculating the continuous spectral embedding of the normalized Laplacian matrix; 2) discretization via $K$-means or spectral rotation. However, this paradigm brings two vital problems: 1) two-stage methods solve a relaxed version of the original problem, so they cannot obtain good solutions for the original N-Cut problem; 2) solving the relaxed problem requires eigenvalue decomposition, which has $\mathcal{O}(n^3)$ time complexity ($n$ is the number of nodes). To address these problems, we propose a novel N-Cut solver based on the famous coordinate descent method. Since the vanilla coordinate descent method also has $\mathcal{O}(n^3)$ time complexity, we design various accelerating strategies to reduce the time complexity to $\mathcal{O}(|E|)$ ($|E|$ is the number of edges). To avoid reliance on random initialization, which brings uncertainty to clustering, we propose an efficient initialization method that gives deterministic outputs. Extensive experiments on several benchmark datasets demonstrate that the proposed solver obtains larger N-Cut objective values while achieving better clustering performance than traditional solvers.
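
The idea of optimizing the discrete N-Cut objective directly, one node at a time, can be sketched as follows. This is a naive illustration under assumptions, not the paper's solver: it uses the maximization form of the objective (sum over clusters of within-cluster weight divided by cluster volume, which equals the number of clusters minus the N-Cut value), recomputes the objective from scratch for every trial move, and starts from an arbitrary labeling instead of the nearest-neighbor hierarchical initialization. None of the acceleration strategies mentioned in the abstract are reproduced.

```python
def ncut_objective(adj, labels, k):
    """Sum over clusters of (within-cluster weight) / (cluster volume).

    Larger is better; the value equals k minus the normalized-cut value.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    total = 0.0
    for c in range(k):
        members = [i for i in range(n) if labels[i] == c]
        vol = sum(degree[i] for i in members)
        if vol == 0:
            continue  # an empty cluster contributes nothing
        within = sum(adj[i][j] for i in members for j in members)
        total += within / vol
    return total

def coordinate_ascent(adj, labels, k, max_sweeps=20):
    """Greedy coordinate updates: move one node at a time to its best cluster."""
    labels = list(labels)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(adj)):
            best_c, best_val = labels[i], ncut_objective(adj, labels, k)
            for c in range(k):
                trial = labels[:i] + [c] + labels[i + 1:]
                val = ncut_objective(adj, trial, k)
                if val > best_val + 1e-12:
                    best_c, best_val = c, val
            if best_c != labels[i]:
                labels[i] = best_c
                changed = True
        if not changed:
            break
    return labels

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

init = [0, 0, 1, 1, 1, 1]  # node 2 starts in the wrong cluster
result = coordinate_ascent(adj, init, k=2)
```

On this toy graph the sweep moves node 2 back to its triangle and then stops, since no single-node move improves the objective further.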

Translated: 2023-11-28 18:45:33 Published: 2023-11-26

# Leveraging Anatomical Constraints with Uncertainty for Pneumothorax Segmentation

http://arxiv.org/abs/2311.15213v1

License: see the linked page

Han Yuan, Chuan Hong, Nguyen Tuan Anh Tran, Xinxing Xu, Nan Liu

Pneumothorax is a medical emergency caused by abnormal accumulation of air in the pleural space, the potential space between the lungs and chest wall. On 2D chest radiographs, pneumothorax occurs within the thoracic cavity and outside of the mediastinum, and we refer to this area as "lung+ space". While deep learning (DL) has increasingly been utilized to segment pneumothorax lesions in chest radiographs, many existing DL models employ an end-to-end approach. These models directly map chest radiographs to clinician-annotated lesion areas, often neglecting the vital domain knowledge that pneumothorax is inherently location-sensitive. We propose a novel approach that incorporates the lung+ space as a constraint during DL model training for pneumothorax segmentation on 2D chest radiographs. To circumvent the need for additional annotations and to prevent potential label leakage on the target task, our method utilizes external datasets and an auxiliary task of lung segmentation. This approach generates a specific lung+ space constraint for each chest radiograph. Furthermore, we incorporate a discriminator to eliminate unreliable constraints caused by the domain shift between the auxiliary and target datasets. Our results demonstrate significant improvements, with average performance gains of 4.6%, 3.6%, and 3.3% in Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and Hausdorff Distance (HD), respectively. Our research underscores the significance of incorporating medical domain knowledge about the location-specific nature of pneumothorax to enhance DL-based lesion segmentation.
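
The role of an anatomical region constraint and the two overlap metrics named above can be illustrated with a toy example. The sketch below is an assumption-laden simplification: it applies the region mask to a finished prediction, whereas the abstract describes using the constraint during training, and the tiny binary masks are made up. It only shows how IoU and Dice are computed and how removing predictions outside the allowed region can raise both.

```python
def apply_constraint(pred, region):
    """Keep predicted lesion pixels only where the region mask is 1."""
    return [[p & r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, region)]

def iou_and_dice(pred, target):
    """Intersection over Union and Dice coefficient for binary masks."""
    inter = sum(p & t for prow, trow in zip(pred, target)
                for p, t in zip(prow, trow))
    n_pred = sum(map(sum, pred))
    n_tgt = sum(map(sum, target))
    union = n_pred + n_tgt - inter
    iou = inter / union if union else 1.0
    dice = 2 * inter / (n_pred + n_tgt) if (n_pred + n_tgt) else 1.0
    return iou, dice

# Made-up 3x4 masks: the last column lies outside the allowed region.
pred   = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 0, 1]]
target = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
region = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0]]

before = iou_and_dice(pred, target)
after = iou_and_dice(apply_constraint(pred, region), target)
```

Here the two false positives in the last column are discarded by the mask, so both scores increase; a false positive inside the region would be unaffected.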

Translated: 2023-11-28 18:45:09 Published: 2023-11-26

# OpenPerf: A Benchmarking Framework for the Sustainable Development of the Open-Source Ecosystem

http://arxiv.org/abs/2311.15212v1

License: see the linked page

Fenglin Bi, Fanyu Han, Shengyu Zhao, Jinlu Li, Yanbin Zhang, Wei Wang

Benchmarking involves designing scientific test methods, tools, and frameworks to quantitatively and comparably assess specific performance indicators of certain test subjects. With the development of artificial intelligence, AI benchmarking datasets such as ImageNet and DataPerf have gradually become consensus standards in both academic and industrial fields. However, constructing a benchmarking framework remains a significant challenge in the open-source domain due to the diverse range of data types, the wide array of research issues, and the intricate nature of collaboration networks. This paper introduces OpenPerf, a benchmarking framework designed for the sustainable development of the open-source ecosystem. This framework defines 9 benchmarking tasks in open-source research, encompassing 3 data types (time series, text, and graphics) and addressing 6 research problems: regression, classification, recommendation, ranking, network building, and anomaly detection. Based on the above tasks, we implemented 3 data science task benchmarks, 2 index-based benchmarks, and 1 standard benchmark. Notably, the index-based benchmarks have been adopted by the China Electronics Standardization Institute as evaluation criteria for open-source community governance. Additionally, we have developed a comprehensive toolkit for OpenPerf, which not only offers robust data management, tool integration, and user interface capabilities but also adopts a Benchmarking-as-a-Service (BaaS) model to serve academic institutions, industries, and foundations. Through its application in renowned companies and institutions such as Alibaba, Ant Group, and East China Normal University, we have validated OpenPerf's pivotal role in the healthy evolution of the open-source ecosystem.

Translated: 2023-11-28 18:44:39 Published: 2023-11-26

# Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation

http://arxiv.org/abs/2311.15211v1

License: see the linked page

Haoyi Wu, Kewei Tu

Syntactic structures used to play a vital role in natural language processing (NLP), but since the deep learning revolution, NLP has been gradually dominated by neural models that do not consider syntactic structures in their design. One vastly successful class of neural models is transformers. When used as an encoder, a transformer produces contextual representations of the words in the input sentence. In this work, we propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective. Specifically, we design a conditional random field that models discrete latent representations of all words in a sentence as well as dependency arcs between them, and we use mean field variational inference for approximate inference. Strikingly, we find that the computation graph of our model resembles transformers, with correspondences between dependencies and self-attention and between distributions over latent representations and contextual embeddings of words. Experiments show that our model performs competitively with transformers on small to medium-sized datasets. We hope that our work could help bridge the gap between traditional syntactic and probabilistic approaches and cutting-edge neural approaches to NLP, and inspire more linguistically-principled neural approaches in the future.

翻訳日:2023-11-28 18:44:12 公開日:2023-11-26

# 子音認識のためのトポロジー複合機械学習

Topology combined machine learning for consonant recognition ( http://arxiv.org/abs/2311.15210v1 )

ライセンス: Link先を確認

Pingyao Feng, Siheng Yi, Qingrui Qu, Zhiwang Yu, Yifei Zhu

(参考訳) 人工知能による信号処理では、既存のディープラーニングモデルはしばしばブラックボックス構造を示し、その妥当性と理解性はいまだに不明である。トポロジカル手法の統合は、比較的初期段階の応用にもかかわらず、モデルをより解釈しやすくすると同時に、時間依存データから構造情報を抽出し、よりスマートな学習を可能にする。ここでは,機械学習の時系列に内在する最も有意義なトポロジ的特徴を捉えるための,透過的で広く適用可能な手法 topcap を提供する。高次元空間で回転するTopCapは、本質的な次元が低いデータセットでほとんど検出されない特徴をキャプチャできる。時間遅延埋め込みと持続的ホモロジーを応用して、シミュレーションデータを用いて、時系列の振動などの情報を、その周波数、振幅、平均線の可変性の観点からカプセル化する記述子を得る。この情報はベクトル化され、k-nearest近傍やサポートベクターマシンなどの複数の機械学習アルゴリズムに供給される。特に、音声および無声子音の分類において、TopCapは96%を超える精度を達成し、音声および音声信号の深層学習のためのトポロジ的畳み込み層の設計に向けられている。

In artificial-intelligence-aided signal processing, existing deep learning models often exhibit a black-box structure, and their validity and comprehensibility remain elusive. The integration of topological methods, despite its relatively nascent application, serves a dual purpose of making models more interpretable as well as extracting structural information from time-dependent data for smarter learning. Here, we provide a transparent and broadly applicable methodology, TopCap, to capture the most salient topological features inherent in time series for machine learning. Rooted in high-dimensional ambient spaces, TopCap is capable of capturing features rarely detected in datasets with low intrinsic dimensionality. Applying time-delay embedding and persistent homology, we obtain descriptors which encapsulate information such as the vibration of a time series, in terms of its variability of frequency, amplitude, and average line, demonstrated with simulated data. This information is then vectorised and fed into multiple machine learning algorithms such as k-nearest neighbours and support vector machine. Notably, in classifying voiced and voiceless consonants, TopCap achieves an accuracy exceeding 96% and is geared towards designing topological convolutional layers for deep learning of speech and audio signals.
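As a rough illustration of the time-delay embedding step mentioned in the abstract, the sketch below maps a scalar series to a point cloud. The function name `time_delay_embedding` and the parameters `dimension` and `delay` are illustrative assumptions, not TopCap's actual interface, and the persistent homology step (which would normally rely on a topological data analysis library) is left out.

```python
import math


def time_delay_embedding(series, dimension, delay):
    """Map a 1-D series to points (x[t], x[t+delay], ..., x[t+(dimension-1)*delay])."""
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    span = (dimension - 1) * delay
    n_points = len(series) - span
    if n_points <= 0:
        raise ValueError("series is too short for this dimension and delay")
    return [
        tuple(series[t + k * delay] for k in range(dimension))
        for t in range(n_points)
    ]


# A sampled sine wave. With a quarter-period delay, each embedded point is
# (sin(theta), cos(theta)), so the 2-D cloud lies on the unit circle.
samples_per_period = 40
signal = [math.sin(2 * math.pi * t / samples_per_period) for t in range(200)]
cloud = time_delay_embedding(signal, dimension=2, delay=samples_per_period // 4)
radii = [math.hypot(x, y) for x, y in cloud]
```

A periodic signal therefore shows up as a loop in the embedded cloud, which is the kind of structure a one-dimensional persistent homology computation is designed to pick up.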

翻訳日:2023-11-28 18:43:49 公開日:2023-11-26

# 仮想環境における具体化エージェント

See and Think: Embodied Agent in Virtual Environment ( http://arxiv.org/abs/2311.15209v1 )

ライセンス: Link先を確認

Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang

(参考訳) 大規模言語モデル(LLM)は、いくつかのオープンワールドタスクにおいて驚くべき進歩を遂げた。近年、LLMを用いたエンボディエージェントの構築がホットスポットとなっている。本稿では、Minecraft仮想環境における包括的で先見的なエンボディエージェントであるSTEVEを提案する。STEVEは視覚知覚、言語命令、コードアクションの3つの重要なコンポーネントから構成される。視覚知覚は、環境内の視覚情報の解釈を伴い、エージェントの状態とタスク命令と共にLLMコンポーネントに統合される。言語命令は、反復的な推論と、複雑なタスクを管理可能なガイドラインに分解する役割を担う。コードアクションはスキルデータベースの検索に基づいて実行可能なスキルアクションを生成し、エージェントがMinecraft環境内で効果的に対話できるようにする。また、600以上の視覚・環境ペア、20Kの知識質問応答ペア、200以上のスキル・コードペアを含むSTEVE-21Kデータセットも収集する。連続的ブロック探索、知識質問応答、技術ツリーの習得によって性能を評価する。大規模な実験によると、STEVEはこれまでの最先端手法と比べて、主要な技術ツリーのアンロックを最大$1.5 \times$、ブロック探索タスクを最大$2.5 \times$高速化する。

Large language models (LLMs) have achieved impressive progress on several open-world tasks. Recently, using LLMs to build embodied agents has been a hotspot. In this paper, we propose STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE consists of three key components: vision perception, language instruction, and code action. Vision perception involves the interpretation of visual information in the environment, which is then integrated into the LLM component together with the agent state and task instruction. Language instruction is responsible for iterative reasoning and for decomposing complex tasks into manageable guidelines. Code action generates executable skill actions based on retrieval from a skill database, enabling the agent to interact effectively within the Minecraft environment. We also collect the STEVE-21K dataset, which includes 600+ vision-environment pairs, 20K knowledge question-answering pairs, and 200+ skill-code pairs. We evaluate performance on continuous block search, knowledge question answering, and tech tree mastery. Extensive experiments show that STEVE unlocks key tech trees up to $1.5 \times$ faster and completes block search tasks up to $2.5 \times$ faster than previous state-of-the-art methods.

翻訳日:2023-11-28 18:43:27 公開日:2023-11-26

# ウェブ・モバイル技術研究における学生の関心

Student's Interests Related to Web and Mobile Technologies Study ( http://arxiv.org/abs/2311.15293v1 )

ライセンス: Link先を確認

Manuela Petrescu, Adrian Sterca and Ioan Badarinza

(参考訳) 本稿では,Webとモバイル技術に関する学生の関心と課題について考察する。本研究は,Webプログラミング講座に参加する大学生,学生を対象にした調査である。特に,Web やモバイル開発において,学生がキャリアを成功させる上での課題について検討した結果,Web やモバイル技術が急速に変化する中で,最新の状態を維持するのに必要な作業が,最も重要であることがわかった。調査対象となった大学生のWeb開発やモバイル開発に対する態度は概して肯定的であり,60%以上がウェブやモバイル開発に興味を持っていると回答している。また、その多くがバックエンドのWeb技術に取り組んでいることもわかりました。学生が関心を持つ特定のウェブ技術については、非常に多様である。本研究は,Webとモバイル技術に関する学生の関心や課題に関する貴重な知見を提供し,この領域における効果的な教育・学習手法の開発を導くものである。

In this paper, we explore the interests and challenges of students regarding web and mobile technologies. Our study is based on a survey of undergraduate students who attend a Web Programming course. In particular, we study the challenges students face in pursuing a successful career in web or mobile development, and we find that the most important one is the large effort required to keep up to date with fast-changing web and mobile technologies. Overall, the attitude of the surveyed undergraduate students towards web and mobile development is rather positive, as more than 60% of them said that they are interested in a career in web or mobile development. We also found that most of them prefer working on back-end web technologies. The specific web technologies students are interested in are highly varied. Overall, our study provides valuable insights into the interests and challenges of students regarding web and mobile technologies, which can guide the development of effective teaching and learning approaches in this area.

翻訳日:2023-11-28 18:36:00 公開日:2023-11-26

# Obj-NeRF:多視点画像から物体のNeRFを抽出する

Obj-NeRF: Extract Object NeRFs from Multi-view Images ( http://arxiv.org/abs/2311.15291v1 )

ライセンス: Link先を確認

Zhiyi Li, Lihe Ding, Tianfan Xue

(参考訳) ニューラル・ラジアンス・フィールド(NeRF)は3次元環境における新しいビュー合成において顕著な効果を示した。しかし,複数視点画像から特定の物体の放射能場を抽出することは,咬合や背景の複雑さからかなりの困難に直面するため,nerf編集や3dメッシュ抽出などの下流アプリケーションでは困難が伴う。この問題を解決するため,本論文では,単一プロンプトを用いた多視点画像から特定の物体の3次元形状を復元する包括的パイプラインであるObj-NeRFを提案する。この手法は, セグメンテーションモデル(SAM)の2次元セグメンテーション能力とNeRFの3次元再構成能力を組み合わせたものである。具体的には,指示対象の多視点セグメンテーションをSAMを用いて1つのプロンプトで取得する。次に,このセグメンテーション画像を用いてNeRF構築を監督し,いくつかの効果的な手法を統合する。さらに、様々なオブジェクトを含む大きなオブジェクトレベルのnerfデータセットを構築し、様々なダウンストリームタスクで役立ちます。また,本手法の実用性を示すため,Obj-NeRFを物体除去,回転,置換,再色など様々な用途に適用する。

Neural Radiance Fields (NeRFs) have demonstrated remarkable effectiveness in novel view synthesis within 3D environments. However, extracting a radiance field of one specific object from multi-view images encounters substantial challenges due to occlusion and background complexity, thereby presenting difficulties in downstream applications such as NeRF editing and 3D mesh extraction. To solve this problem, we propose Obj-NeRF, a comprehensive pipeline that recovers the 3D geometry of a specific object from multi-view images using a single prompt. This method combines the 2D segmentation capabilities of the Segment Anything Model (SAM) with the 3D reconstruction ability of NeRF. Specifically, we first obtain multi-view segmentation for the indicated object using SAM with a single prompt. Then, we use the segmentation images to supervise NeRF construction, integrating several effective techniques. Additionally, we construct a large object-level NeRF dataset containing diverse objects, which can be useful in various downstream tasks. To demonstrate the practicality of our method, we also apply Obj-NeRF to various applications, including object removal, rotation, replacement, and recoloring.

翻訳日:2023-11-28 18:35:44 公開日:2023-11-26

# 貨物旅行の時間的・空間的特徴:データ駆動探索分析

Spatial and Temporal Characteristics of Freight Tours: A Data-Driven Exploratory Analysis ( http://arxiv.org/abs/2311.15287v1 )

ライセンス: Link先を確認

Ali Nadi, L\'or\'ant Tavasszy, J.W.C. van Lint, Maaike Snelder

(参考訳) 本稿では,デジタル貨物輸送活動データから異なる貨物市場におけるスケジューリングと経路パターンを推定するモデリング手法を提案する。貨物輸送データから規則を抽出するための新しい離散連続決定木アプローチを含む,完全なモデリングフレームワークを提供する。これらのモデルをオランダで収集したツアーデータに適用し、出発時刻パターンとツアー戦略を理解するとともに、提案アルゴリズムの有効性を評価した。ツアーの種類や貨物活動の時間帯パターンを捉える上で,空間的・時間的特徴が重要であることがわかった。また、実証的な証拠は、ほとんどの輸送市場のキャリアが混雑のレベルに敏感であることを示している。それらの多くは、混雑するゾーンに直面した場合、ツアーの種類、出発時間、ツアー毎の停止数を調整する。結果は、実践者が輸送市場をより良く把握し、貨物・交通管理対策を開発するために利用することができる。

This paper presents a modeling approach to infer scheduling and routing patterns from digital freight transport activity data for different freight markets. We provide a complete modeling framework including a new discrete-continuous decision tree approach for extracting rules from the freight transport data. We apply these models to tour data collected for the Netherlands to understand departure time patterns and tour strategies, which also allows us to evaluate the effectiveness of the proposed algorithm. We find that spatial and temporal characteristics are important to capture the types of tours and time-of-day patterns of freight activities. Also, the empirical evidence indicates that carriers in most of the transport markets are sensitive to the level of congestion. Many of them adjust the type of tour, departure time, and the number of stops per tour when facing a congested zone. The results can be used by practitioners to gain a better grasp of transport markets and develop freight and traffic management measures.

翻訳日:2023-11-28 18:35:26 公開日:2023-11-26

# 高次元PDEのためのランダム化平滑化を用いた物理インフォームドニューラルネットワークのバイアス・分散トレードオフ

Bias-Variance Trade-off in Physics-Informed Neural Networks with Randomized Smoothing for High-Dimensional PDEs ( http://arxiv.org/abs/2311.15283v1 )

ライセンス: Link先を確認

Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George Em Karniadakis, Kenji Kawaguchi

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は低次元偏微分方程式(PDE)に有効であることが証明されているが、高次元シナリオでは計算コストがハードルとなっている。これは物理学的インフォームド損失における高次微分と高次元微分の計算において特に顕著である。 Randomized Smoothing PINN (RS-PINN) は、元のニューラルネットモデルの確率的滑らか化のためのガウスノイズを導入し、微分近似のためのモンテカルロ法を可能にし、コストのかかる自動微分の必要性を排除した。高次元での計算効率にもかかわらず、RS-PINNは損失と勾配の両方にバイアスを導入し、特に確率勾配降下(SGD)と組み合わせると、収束に悪影響を及ぼす。 RS-PINNにおけるバイアスの包括的解析は,平均二乗誤差(MSE)損失とPDE非線形性の非線形性に起因する。 PDE非線形性の順序に基づく補正バイアス補正手法を提案する。 RS-PINNはバイアスのないバージョンと比較して、その長所と短所を詳細に調べることができる。具体的には、偏りのあるバージョンは分散が低く、偏りのないバージョンよりも速く走るが、偏りのため正確ではない。バイアス分散のトレードオフを最適化するために,バイアス分散モデルの高速収束と非バイアスバージョンの高精度を両立するハイブリッド手法における2つのアプローチを組み合わせる。また,RS-PINNの実装も強化した。 fokker-planck, hjb, viscous burgers', allen-cahn, sine-gordon等を含む多種多様な高次元pdesに関する広範な実験はバイアス分散トレードオフを示し、ハイブリッドrs-pinnの有効性を強調している。特定のPDE問題の寸法や非線形性に応じてバイアス付き、バイアスなし、ハイブリッド版を選択するための実証的ガイドラインが提供される。

While physics-informed neural networks (PINNs) have been proven effective for low-dimensional partial differential equations (PDEs), the computational cost remains a hurdle in high-dimensional scenarios. This is particularly pronounced when computing high-order and high-dimensional derivatives in the physics-informed loss. Randomized Smoothing PINN (RS-PINN) introduces Gaussian noise for stochastic smoothing of the original neural net model, enabling Monte Carlo methods for derivative approximation, eliminating the need for costly auto-differentiation. Despite its computational efficiency in high dimensions, RS-PINN introduces biases in both loss and gradients, negatively impacting convergence, especially when coupled with stochastic gradient descent (SGD). We present a comprehensive analysis of biases in RS-PINN, attributing them to the nonlinearity of the Mean Squared Error (MSE) loss and the PDE nonlinearity. We propose tailored bias correction techniques based on the order of PDE nonlinearity. The unbiased RS-PINN allows for a detailed examination of its pros and cons compared to the biased version. Specifically, the biased version has a lower variance and runs faster than the unbiased version, but it is less accurate due to the bias. To optimize the bias-variance trade-off, we combine the two approaches in a hybrid method that balances the rapid convergence of the biased version with the high accuracy of the unbiased version. In addition, we present an enhanced implementation of RS-PINN. Extensive experiments on diverse high-dimensional PDEs, including Fokker-Planck, HJB, viscous Burgers', Allen-Cahn, and Sine-Gordon equations, illustrate the bias-variance trade-off and highlight the effectiveness of the hybrid RS-PINN. Empirical guidelines are provided for selecting biased, unbiased, or hybrid versions, depending on the dimensionality and nonlinearity of the specific PDE problem.
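The derivative approximation described above can be sketched with a small Monte Carlo estimator. This is a minimal sketch, assuming the smoothed model is $f_\sigma(x) = \mathbb{E}_{\delta \sim \mathcal{N}(0, I)}[f(x + \sigma\delta)]$, whose gradient equals $\mathbb{E}[\delta\, f(x + \sigma\delta)]/\sigma$. The antithetic (plus/minus) sampling used here is a standard variance-reduction choice, not necessarily the one used by RS-PINN, and `smoothed_gradient` and `quadratic` are illustrative names.

```python
import random


def smoothed_gradient(f, x, sigma, n_samples, seed=0):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed f at x.

    Uses antithetic pairs: mean over delta of
    delta * (f(x + sigma*delta) - f(x - sigma*delta)) / (2*sigma).
    No automatic differentiation of f is needed, only function evaluations.
    """
    rng = random.Random(seed)
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([xi + sigma * di for xi, di in zip(x, delta)])
        minus = f([xi - sigma * di for xi, di in zip(x, delta)])
        scale = (plus - minus) / (2.0 * sigma)
        for i in range(dim):
            grad[i] += scale * delta[i]
    return [g / n_samples for g in grad]


def quadratic(x):
    """Stand-in for a network output; smoothing only adds a constant, so the gradient stays 2x."""
    return sum(xi * xi for xi in x)


point = [1.0, -2.0, 0.5]
estimate = smoothed_gradient(quadratic, point, sigma=0.1, n_samples=20000)
```

The estimate is unbiased for the smoothed gradient but noisy. Squaring such an estimate inside a mean squared error loss is where bias enters, since for a random estimate $g$ we have $\mathbb{E}[g^2] = (\mathbb{E}[g])^2 + \mathrm{Var}(g)$.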

翻訳日:2023-11-28 18:35:11 公開日:2023-11-26

# 適応重み変調を用いた効率的でリハーサル不要な忘却ゼロの継続学習

Efficient Rehearsal Free Zero Forgetting Continual Learning using Adaptive Weight Modulation ( http://arxiv.org/abs/2311.15276v1 )

ライセンス: Link先を確認

Yonatan Sverdlov, Shimon Ullman

(参考訳) 人工ニューラルネットワークは、継続学習(continual learning)として知られる、長期にわたる複数のタスクの知識獲得という、注目すべき課題に直面している。この課題は、新しいタスクの目的に合うように以前に学習した重みが調整されてしまう傾向から生じ、破滅的忘却という現象を引き起こす。この問題に対するほとんどのアプローチは、新しいタスクのパフォーマンスを最大化することと、以前のタスクの忘却を最小化することのバランスを求める。対照的に、私たちのアプローチは、忘却ゼロを保証しつつ、新しいタスクのパフォーマンスを最大化しようとする。これは各タスクに対してタスク固有の変調パラメータを作成することで実現される。連続したタスクの学習中は、これらのみが学習可能なパラメータとなる。総合的な実験評価により、本モデルは他のマルチタスクモデルに困難をもたらす新しいタスクの獲得と保持において優れた性能を示す。これは、新たなタスクの獲得に対応しながら破滅的忘却を防ぐという本アプローチの有効性を強調している。

Artificial neural networks encounter a notable challenge known as continual learning, which involves acquiring knowledge of multiple tasks over an extended period. This challenge arises due to the tendency of previously learned weights to be adjusted to suit the objectives of new tasks, resulting in a phenomenon called catastrophic forgetting. Most approaches to this problem seek a balance between maximizing performance on the new tasks and minimizing the forgetting of previous tasks. In contrast, our approach attempts to maximize the performance of the new task while ensuring zero forgetting. This is accomplished by creating task-specific modulation parameters for each task. Only these are learnable parameters during the learning of consecutive tasks. Through comprehensive experimental evaluations, our model demonstrates superior performance in acquiring and retaining novel tasks that pose difficulties for other multi-task models. This emphasizes the efficacy of our approach in preventing catastrophic forgetting while accommodating the acquisition of new tasks.
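To make the zero-forgetting argument concrete, here is a toy sketch in which shared weights stay frozen and each task owns its own modulation vector. The abstract does not specify the form of the modulation, so the multiplicative per-unit gain below is an assumption, and `forward`, `train_gains`, and `SHARED` are hypothetical names.

```python
def forward(shared_weights, gains, inputs):
    """One ReLU layer with frozen shared weights and a task-specific gain per output unit."""
    outputs = []
    for row, gain in zip(shared_weights, gains):
        pre_activation = sum(w * x for w, x in zip(row, inputs))
        outputs.append(gain * max(0.0, pre_activation))
    return outputs


def train_gains(shared_weights, gains, inputs, target, lr=0.1, steps=200):
    """Gradient descent on squared error that updates only this task's gains."""
    gains = list(gains)  # copy, so the caller's vector is never modified
    for _ in range(steps):
        for i, row in enumerate(shared_weights):
            hidden = max(0.0, sum(w * x for w, x in zip(row, inputs)))
            error = gains[i] * hidden - target[i]
            gains[i] -= lr * 2.0 * error * hidden
    return gains


SHARED = [[1.0, 0.0], [0.5, 0.5]]  # assumed frozen once the first task is learned
task_gains = {"A": [1.0, 1.0]}
x = [1.0, 2.0]
before = forward(SHARED, task_gains["A"], x)

# Learning task B creates and trains a new gain vector; nothing used by task A changes.
task_gains["B"] = train_gains(SHARED, [1.0, 1.0], x, target=[3.0, 0.0])
after = forward(SHARED, task_gains["A"], x)
out_b = forward(SHARED, task_gains["B"], x)
```

Because task A's output depends only on `SHARED` and `task_gains["A"]`, and neither is written to while task B is trained, the output for task A is identical before and after. That is the sense in which forgetting is zero by construction rather than merely small.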

翻訳日:2023-11-28 18:34:35 公開日:2023-11-26

# 手書き数式認識のための知的検出ネットワーク

An Intelligent-Detection Network for Handwritten Mathematical Expression Recognition ( http://arxiv.org/abs/2311.15273v1 )

ライセンス: Link先を確認

Ziqi Ye

(参考訳) 教育における人工知能技術の利用は急速に増加しており、研究者による手書き数式認識(hmer)に注目が集まっている。しかし、hmerの既存の手法の多くは複雑な構造を持つ式を正確に読み取ることができない可能性がある。提案するHMER用知的検出ネットワーク(IDN)は,オブジェクト検出技術を用いて従来のエンコーダデコーダ法と異なる。具体的には,デジタルオブジェクトとシンボルオブジェクトの両方を正確に検出できる拡張YOLOv7ネットワークを開発した。次に、検出結果を双方向ゲート再帰ユニット(BiGRU)とベースラインシンボル関係ツリー(BSRT)に統合し、シンボルと数字の関係を決定する。提案手法は, 複雑な手書き数式認識において, エンコーダ・デコーダネットワークよりも優れていることを示す。これは記号と数字の正確な検出のためである。我々の研究は、HMERの分野に貴重な貢献をする可能性がある。これは、学校における課題グレーディングや文書情報の入力など、様々な実践的なシナリオに適用できる。

The use of artificial intelligence technology in education is growing rapidly, with increasing attention being paid to handwritten mathematical expression recognition (HMER) by researchers. However, many existing methods for HMER may fail to accurately read formulas with complex structures, as the attention results can be inaccurate due to illegible handwriting or large variations in writing styles. Our proposed Intelligent-Detection Network (IDN) for HMER differs from traditional encoder-decoder methods by utilizing object detection techniques. Specifically, we have developed an enhanced YOLOv7 network that can accurately detect both digital and symbolic objects. The detection results are then integrated into the bidirectional gated recurrent unit (BiGRU) and the baseline symbol relationship tree (BSRT) to determine the relationships between symbols and numbers. The experiments demonstrate that the proposed method outperforms those encoder-decoder networks in recognizing complex handwritten mathematical expressions. This is due to the precise detection of symbols and numbers. Our research has the potential to make valuable contributions to the field of HMER. This could be applied in various practical scenarios, such as assignment grading in schools and information entry of paper documents.

翻訳日:2023-11-28 18:34:18 公開日:2023-11-26

# Tessel: フレキシブルスケジュール検索による大規模DNNモデルの分散実行促進

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search ( http://arxiv.org/abs/2311.15269v1 )

ライセンス: Link先を確認

Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

(参考訳) ますます複雑で多様なディープニューラルネットワーク(dnn)モデルは、トレーニングや推論タスクのために複数のデバイスに分散し、パフォーマンスのために注意深く計画されたスケジュールを必要とする。しかしながら、既存のプラクティスは、新興の多様なモデル認識オペレータ配置戦略の利点を十分に活用しない、事前定義されたスケジュールに依存することが多い。大規模かつ多様なスケジュール空間のため、手作りの高効率スケジュールは困難である。本稿では,分散dnnトレーニングのための効率的なスケジュール検索と,多様なオペレータ配置戦略のための推論を行う自動システムであるtesselを提案する。検索コストを削減するため、Tessel氏は、最も効率的なスケジュールは、異なるデータ入力に対して繰り返しパターン(繰り返し)を示すことが多いという洞察を活用している。これは2段階のアプローチにつながる: 繰り返しの建設とスケジュールの完了。様々なオペレータ配置戦略のスケジュールを調べることで、テッセルはトレーニングと推論のパフォーマンスを著しく改善する。代表的DNNモデルによる実験では、Tesselは最大5.5倍のトレーニング性能向上と最大38%の推論遅延削減を実現している。

Increasingly complex and diverse deep neural network (DNN) models necessitate distributing the execution across multiple devices for training and inference tasks, and also require carefully planned schedules for performance. However, existing practices often rely on predefined schedules that may not fully exploit the benefits of emerging diverse model-aware operator placement strategies. Handcrafting high-efficiency schedules can be challenging due to the large and varying schedule space. This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies. To reduce search costs, Tessel leverages the insight that the most efficient schedules often exhibit a repetitive pattern (a repetend) across different data inputs. This leads to a two-phase approach: repetend construction and schedule completion. By exploring schedules for various operator placement strategies, Tessel significantly improves both training and inference performance. Experiments with representative DNN models demonstrate that Tessel achieves up to 5.5x training performance speedup and up to 38% inference latency reduction.

翻訳日:2023-11-28 18:33:59 公開日:2023-11-26

# スパース表現による未学習

Unlearning via Sparse Representations ( http://arxiv.org/abs/2311.15268v1 )

ライセンス: Link先を確認

Vedant Shah, Frederik Tr\"auble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

(参考訳) 訓練されたモデルから忘却集合(forget set)に関する知識を消去する機械アンラーニング(unlearning)は、既存の手法ではコストが高く、実行不可能となることがある。本稿では,離散表現ボトルネックに基づく、計算コストがほぼゼロのゼロショットアンラーニング手法を提案する。提案手法は忘却集合を効率的にアンラーニングし,残りのデータセットにおけるモデルの性能にはごくわずかなダメージしか与えないことを示す。CIFAR-10, CIFAR-100, LACUNA-100の3つのデータセットを用いて, クラスアンラーニング(class unlearning)の問題で提案手法の評価を行った。提案手法を,アンラーニングに知識蒸留を用いる最先端手法であるSCRUBと比較した。3つのデータセット全体にわたって、提案手法はSCRUBと同等以上の性能を示し、計算コストはほとんどかからない。

Machine unlearning, which involves erasing knowledge about a forget set from a trained model, can prove to be costly and infeasible with existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the data set. We evaluate the proposed technique on the problem of class unlearning using three datasets: CIFAR-10, CIFAR-100, and LACUNA-100. We compare the proposed technique to SCRUB, a state-of-the-art approach which uses knowledge distillation for unlearning. Across all three datasets, the proposed technique performs as well as, if not better than, SCRUB while incurring almost no computational cost.

翻訳日:2023-11-28 18:33:41 公開日:2023-11-26

# ChAda-ViT : 不均一顕微鏡像の同時表現学習におけるチャネル適応的注意

ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images ( http://arxiv.org/abs/2311.15264v1 )

ライセンス: Link先を確認

Nicolas Bourriez, Ihab Bendidi, Ethan Cohen, Gabriel Watkinson, Maxime Sanchez, Guillaume Bollot, Auguste Genovesio

(参考訳) rgbチャネルに一貫してエンコードされるカラー写真画像とは異なり、生物学的画像は様々なモダリティを包含しており、顕微鏡のタイプや各チャネルの意味は実験によって異なる。重要なことは、チャンネルの数は1から1ダース程度で、その相関関係はRGBよりも比較的低く、それぞれが特定の情報コンテンツを提供する。この側面は、バイオイメージ領域から設計された手法によって見落とされ、現在のソリューションは主にチャネル内の空間的注意に焦点を当てており、チャネル間の関係を無視していることが多いが、ほとんどの生物学的応用において不可欠である。重要なことに、可変チャネルタイプとカウントは、大規模な事前トレーニングのための統一表現へのいくつかの実験の投影を妨げる。本研究では,任意の数,順序,種類のチャネルを持つ画像に対して,チャネル間アテンション機構を用いた新しいチャネル適応型視覚トランスフォーマアーキテクチャであるChAda-ViTを提案する。 IDRCell100kは、7つの顕微鏡モードを多種多様なチャネルタイプでカバーし、1つの実験ごとに1から10までのチャネル数を変化させた79の実験セットである。提案したアーキテクチャは, 既存のアプローチを, 生物学的に関係のある下流タスクで上回っている。さらに、様々な画像や実験的なモダリティを統一された生物学的イメージ表現に埋め込むことで、異なる顕微鏡、チャネル番号、タイプで測定する間を初めて橋渡しすることができる。後者は、学際的な研究の促進と、生物学的画像に基づく分析における深層学習のより良い採用の道を開くべきである。コードとデータはまもなくリリースされる。

Unlike color photography images, which are consistently encoded into RGB channels, biological images encompass various modalities, where the type of microscopy and the meaning of each channel vary with each experiment. Importantly, the number of channels can range from one to a dozen, and their correlation is often much lower than in RGB, as each of them brings specific information content. This aspect is largely overlooked by methods designed outside the bioimage field, and current solutions mostly focus on intra-channel spatial attention, often ignoring the relationship between channels, which is nonetheless crucial in most biological applications. Importantly, the variable channel type and count prevent the projection of several experiments to a unified representation for large-scale pre-training. In this study, we propose ChAda-ViT, a novel Channel Adaptive Vision Transformer architecture employing an Inter-Channel Attention mechanism on images with an arbitrary number, order and type of channels. We also introduce IDRCell100k, a bioimage dataset with a rich set of 79 experiments covering 7 microscope modalities, with a multitude of channel types, and channel counts varying from 1 to 10 per experiment. Our proposed architecture, trained in a self-supervised manner, outperforms existing approaches in several biologically relevant downstream tasks. Additionally, it can be used to bridge the gap for the first time between assays with different microscopes, channel numbers or types by embedding various image and experimental modalities into a unified biological image representation. The latter should facilitate interdisciplinary studies and pave the way for better adoption of deep learning in biological image-based analyses. Code and Data to be released soon.

翻訳日:2023-11-28 18:33:27 公開日:2023-11-26

# 自己教師付きグラフ畳み込みネットワークを細胞グラフに適用した脳画像における皮質層の研究

Revealing Cortical Layers In Histological Brain Images With Self-Supervised Graph Convolutional Networks Applied To Cell-Graphs ( http://arxiv.org/abs/2311.15262v1 )

ライセンス: Link先を確認

Valentina Vadori, Antonella Peruffo, Jean-Marie Gra\"ic, Giulia Vadori, Livio Finos, Enrico Grisan

(参考訳) 大脳皮質の層を同定することは、脳構造と種間の機能の関係に関する洞察を提供することを目的とした細胞構造の比較研究に不可欠である。広範な注釈付きデータセットがないことは、通常、機械学習アプローチの採用を制限するものであり、神経解剖学者による皮質層の手作業による記述につながる。大脳皮質の2次元Nissl染色組織スライスにおける層検出のための自己監督的アプローチを導入する。それは、個々の細胞のセグメンテーションと、属性付きセルグラフの作成から始まる。自己教師付きグラフ畳み込みネットワークは、細胞環境の形態的および構造的特性を符号化した細胞埋め込みを生成し、最終層化のためのコミュニティ検出アルゴリズムにより活用する。本手法は, 空間的トランスクリプトミクスデータを含まない, 自己管理した最初の手法であり, 細胞構造解析の促進, アノテーションニーズの回避, 種間調査の進展を期待できる。

Identifying cerebral cortex layers is crucial for comparative studies of cytoarchitecture that aim to provide insights into the relations between brain structure and function across species. The absence of extensive annotated datasets typically limits the adoption of machine learning approaches, leading to the manual delineation of cortical layers by neuroanatomists. We introduce a self-supervised approach to detect layers in 2D Nissl-stained histological slices of the cerebral cortex. It starts with the segmentation of individual cells and the creation of an attributed cell-graph. A self-supervised graph convolutional network generates cell embeddings that encode morphological and structural traits of the cellular environment and are exploited by a community detection algorithm for the final layering. Our method, the first self-supervised method of its kind with no spatial transcriptomics data involved, holds the potential to accelerate cytoarchitecture analyses, sidestepping annotation needs and advancing cross-species investigation.

翻訳日:2023-11-28 18:33:02 公開日:2023-11-26

# NeuRAD: 自律運転のためのニューラルレンダリング

NeuRAD: Neural Rendering for Autonomous Driving ( http://arxiv.org/abs/2311.15260v1 )

ライセンス: Link先を確認

Adam Tonderski, Carl Lindstr\"om, Georg Hess, William Ljungbergh, Lennart Svensson, Christoffer Petersson

(参考訳) neural radiance fields(nerfs)は、自動運転(ad)コミュニティで人気を集めている。近年の手法では, クローズドループシミュレーションやADシステムのテスト, 高度なトレーニングデータ拡張技術などが実現されている。しかし、既存の手法では、長い訓練時間、密集した意味的監督、あるいは一般化可能性の欠如がしばしば必要である。これにより、大規模な AD への NeRF の適用が妨げられる。本稿では,動的ADデータに適した,堅牢なビュー合成手法であるNeuRADを提案する。我々の手法は単純なネットワーク設計、カメラとライダーの両方のための広範なセンサーモデリング -- ローリングシャッター、ビーム発散、レイドロップなど -- を備えており、最初から複数のデータセットに適用できる。一般的な5つのADデータセット上でのパフォーマンスを検証する。さらなる開発を促進するため、NeuRADソースコードを公開しています。 https://github.com/georghess/NeuRAD を参照。

Neural radiance fields (NeRFs) have gained popularity in the autonomous driving (AD) community. Recent methods show NeRFs' potential for closed-loop simulation, enabling testing of AD systems, and as an advanced training data augmentation technique. However, existing methods often require long training times or dense semantic supervision, or lack generalizability. This, in turn, hinders the application of NeRFs for AD at scale. In this paper, we propose NeuRAD, a robust novel view synthesis method tailored to dynamic AD data. Our method features a simple network design, extensive sensor modeling for both camera and lidar -- including rolling shutter, beam divergence and ray dropping -- and is applicable to multiple datasets out of the box. We verify its performance on five popular AD datasets, achieving state-of-the-art performance across the board. To encourage further development, we openly release the NeuRAD source code. See https://github.com/georghess/NeuRAD .

翻訳日:2023-11-28 18:32:45 公開日:2023-11-26

# メタバースを使うべきかどうか? メタ教育技術を活用した大学生の行動意図に関する研究

Should I use metaverse or not? An investigation of university students behavioral intention to use MetaEducation technology ( http://arxiv.org/abs/2311.15251v1 )

ライセンス: Link先を確認

Nikolaos Misirlis, Yiannis Nikolaidis, Anna Sabidussi

(参考訳) Metaverseは、バーチャルと拡張現実を組み合わせた急成長する技術トレンドであり、ユーザーがデジタルアバターを通じて仮想アイデンティティを仮定し、現実の世界にいる他の人と対話できる完全なデジタル環境を提供する。その応用分野は、経済(暗号通貨分野への参入)、金融、社会生活、労働環境、医療、不動産、教育など多岐にわたる。新型コロナウイルス(covid-19)とcovid-19後、大学はeラーニング技術を急速に採用し、学生に学習コンテンツやプラットフォームへのオンラインアクセスを提供してきた。そこで本研究では,TAM(Technology Acceptance Model)を参考に,大学生のメタバース技術の教育における受容と活用の意図を分析する枠組みを提案する。本研究は, 教育におけるメタバース技術活用の意図と, 態度, 認知的有用性, 使いやすさ, 教育におけるメタバース技術の自己有効性, 主観規範など, 選択されたtam構成との関係について検討することを目的とする。特に、自己効力感と主観的ノルムは、態度と知覚的有用性に肯定的な影響を及ぼすが、知覚的使用感は、態度や知覚的有用性と強く相関しない。著者らは、研究の構成要素間の弱い関連性はメタエデュケーションとその潜在的な利益に関する限られた知識に起因すると仮定している。高等教育分野におけるメタ教育技術の受容と活用に関わる複雑なダイナミクスを包括的に理解するために,提案モデルのさらなる調査と分析が求められている。

Metaverse, a burgeoning technological trend that combines virtual and augmented reality, provides users with a fully digital environment where they can assume a virtual identity through a digital avatar and interact with others as if they were in the real world. Its applications span diverse domains such as economy (with its entry into the cryptocurrency field), finance, social life, working environment, healthcare, real estate, and education. During the COVID-19 and post-COVID-19 era, universities have rapidly adopted e-learning technologies to provide students with online access to learning content and platforms, rendering previous considerations on integrating such technologies or preparing institutional infrastructures virtually obsolete. In light of this context, the present study proposes a framework for analyzing university students' acceptance and intention to use metaverse technologies in education, drawing upon the Technology Acceptance Model (TAM). The study aims to investigate the relationship between students' intention to use metaverse technologies in education, hereafter referred to as MetaEducation, and selected TAM constructs, including Attitude, Perceived Usefulness, Perceived Ease of Use, Self-efficacy of metaverse technologies in education, and Subjective Norm. Notably, Self-efficacy and Subjective Norm have a positive influence on Attitude and Perceived Usefulness, whereas Perceived Ease of Use does not exhibit a strong correlation with Attitude or Perceived Usefulness. The authors postulate that the weak associations between the study's constructs may be attributed to limited knowledge regarding MetaEducation and its potential benefits. Further investigation and analysis of the study's proposed model are warranted to comprehensively understand the complex dynamics involved in the acceptance and utilization of MetaEducation technologies in the realm of higher education.

翻訳日:2023-11-28 18:32:31 公開日:2023-11-26

# 大規模言語モデルを用いたアルゴリズム進化

Algorithm Evolution Using Large Language Model ( http://arxiv.org/abs/2311.15249v1 )

ライセンス: Link先を確認

Fei Liu, Xialiang Tong, Mingxuan Yuan and Qingfu Zhang

(参考訳) 最適化は多くの現実のアプリケーションで見られます。特定の最適化問題に対して効果的なアルゴリズムを設計するには、ドメイン知識とアルゴリズム設計スキルを持つ人間の専門家による退屈な努力が必要となる。本稿では,大規模言語モデル(AEL)を用いたアルゴリズム進化という新しい手法を提案する。大規模な言語モデル(LLM)を使用して、進化的フレームワークを通じて最適化アルゴリズムを自動生成する。 AELはモデルトレーニングなしでアルゴリズムレベルの進化を行う。人間の努力とドメイン知識の要求は大幅に削減できる。本研究では, AEL による構成的アルゴリズムは, 単純な手作りと LLM 生成のヒューリスティックよりも優れていることを示す。他のドメイン深層学習モデルベースアルゴリズムと比較して、これらの手法は様々な問題サイズにまたがる優れたスケーラビリティを示す。 AELはまた、アルゴリズムの探索演算子としてLLMを使用した以前の試みとは大きく異なる。

Optimization can be found in many real-life applications. Designing an effective algorithm for a specific optimization problem typically requires a tedious amount of effort from human experts with domain knowledge and algorithm design skills. In this paper, we propose a novel approach called Algorithm Evolution using Large Language Model (AEL). It utilizes a large language model (LLM) to automatically generate optimization algorithms via an evolutionary framework. AEL performs algorithm-level evolution without model training. Human effort and requirements for domain knowledge can be significantly reduced. We take constructive methods for the traveling salesman problem as a test example and show that the constructive algorithm obtained by AEL outperforms simple hand-crafted and LLM-generated heuristics. Compared with other domain-specific deep learning model-based algorithms, these methods exhibit excellent scalability across different problem sizes. AEL is also very different from previous attempts that utilize LLMs as search operators in algorithms.
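The abstract does not spell out the evolved heuristics, so the sketch below only shows the kind of constructive TSP procedure under discussion: a tour is built one city at a time, and the `select_next` rule is the slot that an evolutionary search could fill with candidate heuristics. `construct_tour`, `select_next`, and `nearest_neighbor` are illustrative names. Nearest neighbor stands in for a simple hand-crafted baseline and is not a heuristic produced by AEL.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour that visits `tour` in order and returns to the start."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def construct_tour(points, select_next, start=0):
    """Build a tour city by city; `select_next` is the pluggable heuristic."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        # Candidates are passed in sorted order so that tie-breaking is deterministic.
        chosen = select_next(points, tour[-1], sorted(unvisited))
        tour.append(chosen)
        unvisited.remove(chosen)
    return tour


def nearest_neighbor(points, current, candidates):
    """Hand-crafted baseline rule: always move to the closest unvisited city."""
    return min(candidates, key=lambda c: math.dist(points[current], points[c]))


cities = [(0, 0), (3, 0), (3, 4), (0, 4), (1, 0)]
tour = construct_tour(cities, nearest_neighbor)
length = tour_length(cities, tour)
```

Scoring a candidate rule then reduces to calling `tour_length` on the tours it constructs over a set of instances, which is the kind of fitness signal an evolutionary loop needs.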

翻訳日:2023-11-28 18:32:00 公開日:2023-11-26

# 少数ショット分布外検出のためのIDライクなプロンプト学習

ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection ( http://arxiv.org/abs/2311.15243v1 )

ライセンス: Link先を確認

Yichen Bai, Zongbo Han, Changqing Zhang, Bing Cao, Xiaoheng Jiang, Qinghua Hu

(参考訳) アウト・オブ・ディストリビューション(OOD)検出法は、OODサンプルを識別するモデルをトレーニングするために補助的な外れ値を利用することが多く、特に補助的な外れ値データセットから難しい外れ値を発見してOOD検出を改善する。しかし、これらの手法は、ID(in-distribution)データによく似た最も困難なOODサンプル、すなわちIDライクなサンプルを効果的に区別する際の制限に依然として直面している。そこで本研究では,IDサンプルの近傍空間からCLIPを用いてIDライクな外れ値を発見し、これら最も困難なOODサンプルの識別を助ける新しいOOD検出フレームワークを提案する。次に、識別されたIDライクな外れ値を利用して、OOD検出のためのCLIPの能力をさらに活用するプロンプト学習フレームワークを提案する。強力なCLIPの恩恵により、他の補助的な外れ値データセットを使わずに、少数のIDサンプルだけでモデルのプロンプトを学習できる。最も困難なIDライクなOODサンプルに着目し,CLIPの能力をエレガントに活用することにより,実世界の様々な画像データセットにおいて優れた少数ショット学習性能を実現する(例えば,ImageNet-1kデータセットにおける4ショットOOD検出では,最先端手法と比べて平均FPR95を12.16%削減し,平均AUROCを2.76%改善した)。

Out-of-distribution (OOD) detection methods often exploit auxiliary outliers to train models to identify OOD samples, in particular by discovering challenging outliers from an auxiliary outlier dataset to improve OOD detection. However, they may still face limitations in effectively distinguishing the most challenging OOD samples, which closely resemble in-distribution (ID) data, i.e., ID-like samples. To this end, we propose a novel OOD detection framework that discovers ID-like outliers using CLIP from the vicinity space of the ID samples, thus helping to identify these most challenging OOD samples. Then a prompt learning framework is proposed that utilizes the identified ID-like outliers to further leverage the capabilities of CLIP for OOD detection. Benefiting from the powerful CLIP, we only need a small number of ID samples to learn the prompts of the model without exposing other auxiliary outlier datasets. By focusing on the most challenging ID-like OOD samples and elegantly exploiting the capabilities of CLIP, our method achieves superior few-shot learning performance on various real-world image datasets (e.g., in 4-shot OOD detection on the ImageNet-1k dataset, our method reduces the average FPR95 by 12.16% and improves the average AUROC by 2.76%, compared to state-of-the-art methods).
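The two metrics quoted above can be computed as in the following sketch. It assumes the common convention that ID is the positive class and that higher scores mean "more in-distribution"; the abstract does not state the exact evaluation protocol. `fpr_at_tpr` and `auroc` are illustrative names, and the scores are made-up toy values.

```python
import math


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as ID at the threshold that keeps `tpr` of ID samples."""
    ranked = sorted(id_scores, reverse=True)
    # Number of ID samples that must be accepted; the small offset guards against
    # floating-point round-up in tpr * len(ranked).
    keep = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[keep - 1]
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample (ties count half)."""
    wins = 0.0
    for a in id_scores:
        for b in ood_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Toy scores, purely for illustration.
id_scores = list(range(1, 21))
ood_scores = [0.5, 1.5, 2.5, 10.0]
fpr95 = fpr_at_tpr(id_scores, ood_scores)
area = auroc(id_scores, ood_scores)
```

Lower FPR95 and higher AUROC are better. The pairwise AUROC above is quadratic in the number of samples, which is fine for a sketch but would normally be replaced by a rank-based computation on real benchmarks.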

翻訳日:2023-11-28 18:31:47 公開日:2023-11-26

# CalibFormer: トランスフォーマーによるLiDARカメラ自動校正ネットワーク

CalibFormer: A Transformer-based Automatic LiDAR-Camera Calibration Network ( http://arxiv.org/abs/2311.15241v1 )

ライセンス: Link先を確認

Yuxuan Xiao, Yao Li, Chengzhen Meng, Xingchen Li and Yanyong Zhang

(参考訳) LiDARとカメラの融合は、自動運転における認識タスクにますます採用されている。このような融合に基づくアルゴリズムの性能は、センサーキャリブレーションの精度に大きく依存する。従来、多くの校正手法は特定のターゲットや手動による介入を必要としていた。学習に基づくオンライン校正手法が提案されているが、ほとんどのケースでその性能は辛うじて満足できる程度である。これらの手法は通常、スパースな特徴マップ、信頼できないクロスモダリティの対応付け、不正確なキャリブレーションパラメータ回帰などの問題に苦しむ。本稿では,これらの問題を解決するために,自動LiDARカメラキャリブレーションのためのエンドツーエンドネットワークCalibFormerを提案する。高解像度表現を実現するために,カメラとLiDAR画像の特徴を複数の層にわたって集約する。マルチヘッド相関モジュールを用いて特徴間の相関をより正確に識別する。最後に,相関情報から正確な校正パラメータを推定するためにトランスフォーマーアーキテクチャを用いる。提案手法は, KITTIデータセット上で平均並進誤差が$0.8751 \mathrm{cm}$, 平均回転誤差が$0.0562 ^{\circ}$となり, 既存の最先端手法を上回り, 強い頑健性, 精度, 一般化能力を示した。

The fusion of LiDARs and cameras has been increasingly adopted in autonomous driving for perception tasks. The performance of such fusion-based algorithms largely depends on the accuracy of sensor calibration, which is challenging due to the difficulty of identifying common features across different data modalities. Previously, many calibration methods involved specific targets and/or manual intervention, which has proven to be cumbersome and costly. Learning-based online calibration methods have been proposed, but their performance is barely satisfactory in most cases. These methods usually suffer from issues such as sparse feature maps, unreliable cross-modality association, inaccurate calibration parameter regression, etc. In this paper, to address these issues, we propose CalibFormer, an end-to-end network for automatic LiDAR-camera calibration. We aggregate multiple layers of camera and LiDAR image features to achieve high-resolution representations. A multi-head correlation module is utilized to identify correlations between features more accurately. Lastly, we employ transformer architectures to estimate accurate calibration parameters from the correlation information. Our method achieved a mean translation error of $0.8751 \mathrm{cm}$ and a mean rotation error of $0.0562 ^{\circ}$ on the KITTI dataset, surpassing existing state-of-the-art methods and demonstrating strong robustness, accuracy, and generalization capabilities.
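The abstract reports a translation error and a rotation error but does not say how they are computed. The sketch below shows one common convention, assumed here for illustration: the Euclidean distance between translation vectors and the geodesic angle between rotation matrices. The helper `rot_z` and all names are hypothetical.

```python
import math


def translation_error(t_pred, t_true):
    """Euclidean distance between predicted and reference translations (same unit as the inputs)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(t_pred, t_true)))


def rotation_error_deg(r_pred, r_true):
    """Geodesic angle in degrees between two 3x3 rotation matrices.

    Uses trace(R_true^T R_pred) = sum_ij R_true[i][j] * R_pred[i][j]
    and angle = arccos((trace - 1) / 2).
    """
    trace = sum(r_true[i][j] * r_pred[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against rounding
    return math.degrees(math.acos(cos_angle))


def rot_z(deg):
    """Rotation about the z axis, used only to build test inputs."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


print(translation_error((0.03, 0.04, 0.0), (0.0, 0.0, 0.0)))
print(rotation_error_deg(rot_z(10.0), rot_z(9.5)))
```

Averaging these two quantities over a test set would give mean errors of the kind quoted above, although the paper may aggregate per axis instead.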

翻訳日:2023-11-28 18:31:18 公開日:2023-11-26

# ASI:ディープラーニングモデル評価のための精度安定度指標

ASI: Accuracy-Stability Index for Evaluating Deep Learning Models ( http://arxiv.org/abs/2311.15332v1 )

ライセンス: Link先を確認

Wei Dai, Daniel Berleant

(参考訳) 新しいモデルが次々に発表される深層学習研究の文脈では、効果的で効率的な評価の必要性が依然として最重要である。既存の手法は、しばしば精度の指標を重視し、安定性を見落としている。これを解決するために,深層学習モデルの評価に精度と安定性の両方を取り入れた定量的尺度であるASI(Accuracy-Stability Index)を提案する。実験によりASIの適用例を示し, ASI, 平均精度, 変動係数を可視化する3次元表面モデルを提示する。本稿は,深層学習モデルの定量的ベンチマーク指標という重要な課題に取り組み,精度と安定性を評価するための新しい手法を提供する。最後に,潜在的な弱点について議論し,今後の研究方向性を概説する。

In the context of deep learning research, where model introductions continually occur, the need for effective and efficient evaluation remains paramount. Existing methods often emphasize accuracy metrics, overlooking stability. To address this, the paper introduces the Accuracy-Stability Index (ASI), a quantitative measure incorporating both accuracy and stability for assessing deep learning models. Experimental results demonstrate the application of ASI, and a 3D surface model is presented for visualizing ASI, mean accuracy, and coefficient of variation. This paper addresses the important issue of quantitative benchmarking metrics for deep learning models, providing a new approach for accurately evaluating accuracy and stability of deep learning models. The paper concludes with discussions on potential weaknesses and outlines future research directions.
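The abstract names mean accuracy and the coefficient of variation as the quantities visualized alongside ASI, but it does not give the ASI formula itself. The sketch below therefore computes only those two inputs from repeated runs; the function name and the sample accuracies are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics


def mean_and_cv(accuracies):
    """Return (mean accuracy, coefficient of variation) over repeated training runs.

    The coefficient of variation is the sample standard deviation divided by the mean;
    a smaller value indicates more stable results across runs.
    """
    mean = statistics.fmean(accuracies)
    std = statistics.stdev(accuracies)  # sample standard deviation (n - 1 denominator)
    return mean, std / mean


# Accuracies from three hypothetical runs of the same model.
mean, cv = mean_and_cv([0.90, 0.92, 0.94])
print(f"mean={mean:.4f} cv={cv:.4f}")
```

How these two values are combined into a single index is defined in the paper and is deliberately not reproduced here.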

翻訳日:2023-11-28 18:24:00 公開日:2023-11-26

# 依存・忘却のための時間ネットワークの複雑な凸領域におけるオピニオンダイナミクスの展望

Perspective in Opinion Dynamics on Complex Convex Domains of Time Networks for Addiction, Forgetting ( http://arxiv.org/abs/2311.15318v1 )

ライセンス: Link先を確認

Yasuko Kawahata

(参考訳) 本稿では,先行研究を改訂し,時空間スケールの変化を導入する。本稿では,時間とともに忘却と依存の度合いが変化するA層とB層を含むモデルを提示する。また, ある条件下での層A, A', B, B'における依存と忘却の変化をモデル化する。さらに、忘却や依存を強化または妨害する行動を持ち、保守的、洗脳的、あるいは解毒的で、フィルターバブルに陥りにくい意見クラスターの形成を議論するため、時間とともに忘却や依存を推奨、妨害、ブロック、あるいは扇動する新しいクラスターCとDを導入する。この導入により、時間と空間の2次元における意見の拡大、意見空間の発展状況、世論の拡大に関する仮説を検証できる。コンセンサス構築における課題を取り上げ、意見の動的な性質と、異議、不信、メディアの影響といった要素を考慮する必要性を強調する。本稿では,コンセンサス構築モデルに信頼,不信,メディアの影響を取り入れた拡張フレームワークを提案する。我々は,より深い洞察を得る方法として,ダイマー化(dimerizing)を用いたネットワーク分析を導入する。この文脈で,ネットワーククラスタリング,メディアの影響,コンセンサス構築について議論する。ダイマーの位置と分布を分析し、ネットワークの構造とダイナミクスについて洞察を得る。ダイマータイリングは物理学や社会学など、ネットワーク分析以外の様々な分野に応用されてきた。論文は、コンセンサス構築における多様な視点、ネットワーク分析、影響力のあるエンティティの重要性を強調して結論づける。また、複雑なネットワーク構造の理解に役立つトーラスベースの可視化も導入している。

This paper revises previous work and introduces changes in spatio-temporal scales. The paper presents a model that includes layers A and B with varying degrees of forgetting and dependence over time. We also model changes in dependence and forgetting in layers A, A', B, and B' under certain conditions. In addition, to discuss the formation of opinion clusters that have reinforcing or obstructive behaviors of forgetting and dependence and are conservative or brainwashing or detoxifying and less prone to filter bubbling, new clusters C and D that recommend, obstruct, block, or incite forgetting and dependence over time are introduced. This introduction allows us to test hypotheses regarding the expansion of opinions in two dimensions over time and space, the state of development of opinion space, and the expansion of public opinion. Challenges in consensus building will be highlighted, emphasizing the dynamic nature of opinions and the need to consider factors such as dissent, distrust, and media influence. The paper proposes an extended framework that incorporates trust, distrust, and media influence into the consensus building model. We introduce network analysis using dimerizing as a method to gain deeper insights. In this context, we discuss network clustering, media influence, and consensus building. The location and distribution of dimers will be analyzed to gain insight into the structure and dynamics of the network. Dimer tiling has been applied in various fields other than network analysis, such as physics and sociology. The paper concludes by emphasizing the importance of diverse perspectives, network analysis, and influential entities in consensus building. It also introduces torus-based visualizations that aid in understanding complex network structures.

翻訳日:2023-11-28 18:23:41 公開日:2023-11-26

# 一般化グラフプロンプト:グラフ上の事前学習とダウンストリームタスクの統合に向けて

Generalized Graph Prompt: Toward a Unification of Pre-Training and Downstream Tasks on Graphs ( http://arxiv.org/abs/2311.15317v1 )

ライセンス: Link先を確認

Xingtong Yu, Zhenghao Liu, Yuan Fang, Zemin Liu, Sihong Chen and Xinming Zhang

(参考訳) グラフニューラルネットワークはグラフ表現学習の強力なツールとして登場したが、そのパフォーマンスは豊富なタスク固有の教師信号に大きく依存している。ラベル付けの要求を減らすため、"pre-train, prompt"パラダイムはますます一般的になっている。しかしながら、グラフ上でのプロンプティングに関する既存の研究は限定的であり、異なる下流タスクに対応するための普遍的な扱いが欠如している。本稿では,グラフ上の事前学習とプロンプティングのための新しいフレームワークであるGraphPromptを提案する。GraphPromptは、事前学習と下流タスクを共通のタスクテンプレートに統合するだけでなく、学習可能なプロンプトを使用して、下流タスクが事前学習済みモデルから最も関連する知識をタスク固有の方法で見つけ出すことを支援する。この2つのステージでGraphPromptをさらに強化するために、2つの大きな拡張を加えたGraphPrompt+へと拡張する。まず、単純なリンク予測を超えて、いくつかの一般的なグラフ事前学習タスクを一般化し、タスクテンプレートとの互換性を広げる。次に,読み出し層だけでなく異なる層にまたがる階層的情報を活用するため、事前学習したグラフエンコーダの各層に一連のプロンプトベクトルを組み込んだ,より一般化されたプロンプト設計を提案する。最後に、GraphPromptとGraphPrompt+を評価し分析するために、5つの公開データセットに関する広範な実験を行う。

Graph neural networks have emerged as a powerful tool for graph representation learning, but their performance heavily relies on abundant task-specific supervision. To reduce labeling requirement, the "pre-train, prompt" paradigms have become increasingly common. However, existing study of prompting on graphs is limited, lacking a universal treatment to appeal to different downstream tasks. In this paper, we propose GraphPrompt, a novel pre-training and prompting framework on graphs. GraphPrompt not only unifies pre-training and downstream tasks into a common task template but also employs a learnable prompt to assist a downstream task in locating the most relevant knowledge from the pre-trained model in a task-specific manner. To further enhance GraphPrompt in these two stages, we extend it into GraphPrompt+ with two major enhancements. First, we generalize several popular graph pre-training tasks beyond simple link prediction to broaden the compatibility with our task template. Second, we propose a more generalized prompt design that incorporates a series of prompt vectors within every layer of the pre-trained graph encoder, in order to capitalize on the hierarchical information across different layers beyond just the readout layer. Finally, we conduct extensive experiments on five public datasets to evaluate and analyze GraphPrompt and GraphPrompt+.

翻訳日:2023-11-28 18:22:55 公開日:2023-11-26

# 預言コモンセンス推論による共感・感情支援対話生成の促進

Enhancing Empathetic and Emotion Support Dialogue Generation with Prophetic Commonsense Inference ( http://arxiv.org/abs/2311.15316v1 )

ライセンス: Link先を確認

Lanrui Wang, Jiangnan Li, Chenxu Yang, Zheng Lin, Weiping Wang

(参考訳) 共感的および感情的支援の会話に対する人々の関心は大幅に高まっている。より敏感で理解力のある回答を提供するために、常識的な知識を活用することは、心理的側面や因果性をよりよく理解するための共通の戦略となっている。しかし、そのような常識推論は文脈外であり、今後の対話のテーマを予測できないため、一貫性や共感が欠如している。本稿では,この問題を解決するために,コモンセンス知識を推論する革新的なパラダイムである予言コモンセンス推論を提案する。対話の理解と常識的推論に大規模言語モデルの能力を活用することで,過去と将来の対話のギャップを埋めるために,可変モデルの訓練を行う。共感的ダイアログと感情支援会話に関する広範な実験により,対話エージェントと提案する予言的コモンセンス推論を併用することで,反応の質が著しく向上することが示された。

The interest in Empathetic and Emotional Support conversations among the public has significantly increased. To offer more sensitive and understanding responses, leveraging commonsense knowledge has become a common strategy to better understand psychological aspects and causality. However, such commonsense inferences can be out of context and unable to predict upcoming dialogue themes, resulting in responses that lack coherence and empathy. To remedy this issue, we present Prophetic Commonsense Inference, an innovative paradigm for inferring commonsense knowledge. By harnessing the capabilities of Large Language Models in understanding dialogue and making commonsense deductions, we train tunable models to bridge the gap between past and potential future dialogues. Extensive experiments conducted on EmpatheticDialogues and Emotion Support Conversation show that equipping dialogue agents with our proposed prophetic commonsense inference significantly enhances the quality of their responses.

翻訳日:2023-11-28 18:22:18 公開日:2023-11-26

# マルチパーティ量子和プロトコルのノイズロバスト性

Noise robustness of a multiparty quantum summation protocol ( http://arxiv.org/abs/2311.15314v1 )

ライセンス: Link先を確認

Antón Rodríguez Otero and Niels M. P. Neumann and Ward van der Schoot and Robert Wezeman

(参考訳) 量子コンピュータを量子ネットワークに接続することは、分散データセット上でセキュアに計算を行うなど、幅広い新しいアプリケーションを開く。しかし、短期的な量子ネットワークはノイズが多いため、プロトコルの正確性とセキュリティは保証されない。ノイズの影響を調べるために,不完全な共有エンタングル状態を用いるマルチパーティ総和プロトコルについて検討する。本研究では, このプロトコルに対する脱分極ノイズと位相緩和ノイズの両方の影響と, 確率分布に生じるノイズパターンについて解析的に検討する。最後に、シャミアの秘密分散を利用して、プロトコルにおける信頼できる第三者の必要性を排除する。

Connecting quantum computers to a quantum network opens a wide array of new applications, such as securely performing computations on distributed data sets. Near-term quantum networks are noisy, however, and hence correctness and security of protocols are not guaranteed. To study the impact of noise, we consider a multiparty summation protocol with imperfect shared entangled states. We study analytically the impact of both depolarising and dephasing noise on this protocol and the noise patterns arising in the probability distributions. We conclude by eliminating the need for a trusted third party in the protocol using Shamir's secret sharing.
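The abstract mentions Shamir's secret sharing as the tool that removes the trusted third party. The sketch below shows only the classical idea, not the quantum protocol: because Shamir shares are additive, parties can sum their shares locally and reconstruct only the total. The prime modulus, function names and inputs are assumptions made for illustration, and `random` is used for reproducibility where a real deployment would need a cryptographically secure source.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1, a Mersenne prime used as the field modulus (assumed choice)


def make_shares(secret, threshold, n_parties, rng):
    """Split `secret` into n shares; any `threshold` of them reconstruct it."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_parties + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def secure_sum(inputs, threshold, rng):
    """Each party shares its input; party k adds the k-th shares; only the sum is reconstructed."""
    n = len(inputs)
    all_shares = [make_shares(v, threshold, n, rng) for v in inputs]
    summed = [(k + 1, sum(all_shares[i][k][1] for i in range(n)) % PRIME) for k in range(n)]
    return reconstruct(summed[:threshold])


print(secure_sum([12, 30, 7], threshold=2, rng=random.Random(0)))
```

No single party sees another party's input in the clear, since each one only ever holds one share per input.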

翻訳日:2023-11-28 18:21:37 公開日:2023-11-26

# 周波数依存ミラーを用いた散逸的および分散的キャビティオプトメカニクス

Dissipative and dispersive cavity optomechanics with a frequency-dependent mirror ( http://arxiv.org/abs/2311.15311v1 )

ライセンス: Link先を確認

Juliette Monsel, Anastasiia Ciers, Sushanth Kini Manjeshwar, Witlef Wieczorek, Janine Splettstoesser

(参考訳) オプトメカニカルマイクロキャビティは、光をサブ波長体積に閉じ込めることで、光と機械運動の相互作用を著しく増強できる。しかし、これは光学損失率の増加という代償を伴う。そのため、マイクロキャビティベースのオプトメカニカル系は非分解サイドバンド領域にあり、サイドバンドに基づく基底状態冷却が妨げられる。このような系における光損失を減らす道筋は、キャビティミラー、すなわち機械共振器と相互作用する光モードを設計することである。本研究では,ミラーの1つが強い周波数依存性を持つ、すなわち懸架されたファノミラーであるようなオプトメカニカル系を解析する。このオプトメカニカル系は、懸架されたファノミラーの運動と結合する2つの光学モードからなる。我々は、標準的な分散的オプトメカニカル結合と散逸的結合の両方を含む量子結合モード記述を定式化する。線形領域における系のダイナミクスのランジュバン方程式を解くことにより, キャビティ自体は分解サイドバンド領域になくとも, 強い光モード結合により実効的なサイドバンド分解能が得られ、室温からの基底状態冷却が可能であることを示す。重要な点として, 機械共振器のフォノン占有数を推定するには, キャビティ出力スペクトルを実効的なレーザー離調に関して適切に解析する必要があることがわかった。また, 本研究は、ファノミラーの特性を設計することで, ファノ系マイクロキャビティにおいて非線形量子オプトメカニクスの領域に到達する方法を予測する。

An optomechanical microcavity can considerably enhance the interaction between light and mechanical motion by confining light to a sub-wavelength volume. However, this comes at the cost of an increased optical loss rate. Therefore, microcavity-based optomechanical systems are placed in the unresolved-sideband regime, preventing sideband-based ground-state cooling. A pathway to reduce optical loss in such systems is to engineer the cavity mirrors, i.e., the optical modes that interact with the mechanical resonator. In our work, we analyze such an optomechanical system, whereby one of the mirrors is strongly frequency-dependent, i.e., a suspended Fano mirror. This optomechanical system consists of two optical modes that couple to the motion of the suspended Fano mirror. We formulate a quantum-coupled-mode description that includes both the standard dispersive optomechanical coupling as well as dissipative coupling. We solve the Langevin equations of the system dynamics in the linear regime showing that ground-state cooling from room temperature can be achieved even if the cavity is per se not in the resolved-sideband regime, but achieves effective sideband resolution through strong optical mode coupling. Importantly, we find that the cavity output spectrum needs to be properly analyzed with respect to the effective laser detuning to infer the phonon occupation of the mechanical resonator. Our work also predicts how to reach the regime of nonlinear quantum optomechanics in a Fano-based microcavity by engineering the properties of the Fano mirror.

翻訳日:2023-11-28 18:21:23 公開日:2023-11-26

# 低コストゼロ知識証明によるセキュアで検証可能なデータコラボレーション

Secure and Verifiable Data Collaboration with Low-Cost Zero-Knowledge Proofs ( http://arxiv.org/abs/2311.15310v1 )

ライセンス: Link先を確認

Yizheng Zhu, Yuncheng Wu, Zhaojing Luo, Beng Chin Ooi, Xiaokui Xiao

(参考訳) 組織は、データ分析のためのデータコラボレーションの価値をますます認識している。しかし、厳格なデータ保護法は生データの直接交換を禁じている。データコラボレーションを容易にするために、フェデレートラーニング(FL)が実現可能なソリューションとして登場し、複数のクライアントが、その生データの機密性を確保しつつ、中央サーバの監督下で機械学習(ML)モデルを協調的にトレーニングすることができる。しかし、既存の研究は2つの大きなリスクを明らかにしている。 (i)サーバがクライアントのアップロードした更新(つまりモデル勾配)から機密情報を推測し、クライアントの入力プライバシを侵害する可能性があること。 (ii) 悪意のあるクライアントが不正な更新をアップロードしてFLモデルを汚染し、入力整合性を損なうリスクがあること。近年の研究では、ゼロ知識証明(ZKP)によるセキュアアグリゲーションを利用して、FLの入力プライバシーと整合性を保証する。それでも、これらは非常に低い効率に悩まされており、実際の配備には実用的ではない。本稿では,入力プライバシと整合性を同時に確保する,安全かつ検証可能なデータコラボレーションのための,新規かつ高効率な解RiseFLを提案する。第1に,ZKPの生成と検証のコストを大幅に削減する確率的整合性検査法を考案する。第2に,性能を向上させつつビザンチンロバスト性を満たすハイブリッドなコミットメントスキームを設計する。第3に,提案手法のセキュリティ保証を理論的に証明する。合成データと実世界のデータセットに関する広範な実験は、我々のソリューションが効果的であり、クライアントの計算と通信の両方において非常に効率的であることを示唆している。例えばRiseFLは、クライアント計算において3つの最先端ベースラインであるACORN, RoFL, EIFFeLよりも最大28x, 53x, 164x高速である。

Organizations are increasingly recognizing the value of data collaboration for data analytics purposes. Yet, stringent data protection laws prohibit the direct exchange of raw data. To facilitate data collaboration, federated learning (FL) emerges as a viable solution, which enables multiple clients to collaboratively train a machine learning (ML) model under the supervision of a central server while ensuring the confidentiality of their raw data. However, existing studies have unveiled two main risks: (i) the potential for the server to infer sensitive information from the client's uploaded updates (i.e., model gradients), compromising client input privacy, and (ii) the risk of malicious clients uploading malformed updates to poison the FL model, compromising input integrity. Recent works utilize secure aggregation with zero-knowledge proofs (ZKP) to guarantee input privacy and integrity in FL. Nevertheless, they suffer from extremely low efficiency and, thus, are impractical for real deployment. In this paper, we propose a novel and highly efficient solution RiseFL for secure and verifiable data collaboration, ensuring input privacy and integrity simultaneously. Firstly, we devise a probabilistic integrity check method that significantly reduces the cost of ZKP generation and verification. Secondly, we design a hybrid commitment scheme to satisfy Byzantine robustness with improved performance. Thirdly, we theoretically prove the security guarantee of the proposed solution. Extensive experiments on synthetic and real-world datasets suggest that our solution is effective and is highly efficient in both client computation and communication. For instance, RiseFL is up to 28x, 53x and 164x faster than three state-of-the-art baselines ACORN, RoFL and EIFFeL for the client computation.

翻訳日:2023-11-28 18:20:48 公開日:2023-11-26

# AV-Deepfake1M:大規模LLM駆動型オーディオビジュアルディープフェイクデータセット

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset ( http://arxiv.org/abs/2311.15308v1 )

ライセンス: Link先を確認

Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Kalin Stefanov

(参考訳) 高度にリアルなディープフェイクの視聴覚コンテンツの検出とローカライズは、最も高度な最先端手法でも困難である。この領域における研究のほとんどは高品質なディープフェイク画像やビデオの検出に重点を置いているが、実際のビデオに埋め込まれた視聴覚操作の小さな区間のローカライズの問題に対処する研究はほとんどない。本研究では,このようなコンテンツ生成の過程をエミュレートし,AV-Deepfake1Mデータセットを提案する。データセットには、コンテンツ駆動の (i)ビデオ操作、 (ii)音声操作、及び (iii)視聴覚操作が含まれ、2K以上の被写体に対して合計100万以上の映像からなる。本稿では,提案するデータ生成パイプラインの詳細な記述と,生成されたデータの品質の厳密な解析について述べる。最先端のディープフェイク検出とローカライズ手法を用いた提案データセットの総合ベンチマークは,従来のデータセットと比較して大幅な性能低下を示している。提案したデータセットは、次世代のディープフェイクローカライゼーション手法を構築する上で重要な役割を果たす。データセットと関連するコードはhttps://github.com/ControlNet/AV-Deepfake1M で公開されている。

The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake images and videos, only a few works address the problem of the localization of small segments of audio-visual manipulations embedded in real videos. In this research, we emulate the process of such content generation and propose the AV-Deepfake1M dataset. The dataset contains content-driven (i) video manipulations, (ii) audio manipulations, and (iii) audio-visual manipulations for more than 2K subjects resulting in a total of more than 1M videos. The paper provides a thorough description of the proposed data generation pipeline accompanied by a rigorous analysis of the quality of the generated data. The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets. The proposed dataset will play a vital role in building the next-generation deepfake localization methods. The dataset and associated code are available at https://github.com/ControlNet/AV-Deepfake1M .

翻訳日:2023-11-28 18:20:19 公開日:2023-11-26

# スケッチビデオ合成

Sketch Video Synthesis ( http://arxiv.org/abs/2311.15306v1 )

ライセンス: Link先を確認

Yudian Zheng, Xiaodong Cun, Menghan Xia, Chi-Man Pun

(参考訳) 画像スケッチ生成には意味的な複雑さやハイレベルな概念を理解することが不可欠であり、この課題はビデオの領域に適用されるとさらに困難になる。そこで本稿では,フレーム単位のBézier曲線で表現された映像スケッチのための新しい最適化ベースフレームワークを提案する。具体的には,まず各曲線の位置と幅をウォームアップするためのクロスフレームストローク初期化手法を提案する。次に,CLIP特徴に基づく意味的損失と,自己分解型2Dアトラスネットワークを用いて新たに設計された一貫性損失を利用して,これらの曲線の位置を最適化する。これらの設計要素に基づいて得られるスケッチビデオは、印象的な視覚的抽象化と時間的コヒーレンスを示している。さらに,スケッチ作成プロセスを通じて映像をSVGラインに変換することにより,ティーザーの例に示すように,ビデオ合成を通じたスケッチベースのビデオ編集やビデオ落書き(video doodling)への応用を可能にする。

Understanding semantic intricacies and high-level concepts is essential in image sketch generation, and this challenge becomes even more formidable when applied to the domain of videos. To address this, we propose a novel optimization-based framework for sketching videos represented by the frame-wise Bézier curve. In detail, we first propose a cross-frame stroke initialization approach to warm up the location and the width of each curve. Then, we optimize the locations of these curves by utilizing a semantic loss based on CLIP features and a newly designed consistency loss using the self-decomposed 2D atlas network. Built upon these design elements, the resulting sketch video showcases impressive visual abstraction and temporal coherence. Furthermore, by transforming a video into SVG lines through the sketching process, our method unlocks applications in sketch-based video editing and video doodling, enabled through video composition, as exemplified in the teaser.
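The stroke representation described above, curves with a location and a width that end up as SVG lines, can be sketched as follows. The sketch assumes cubic curves with four control points, which the abstract does not state, and the function names are hypothetical.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    u = 1.0 - t
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def svg_path(p0, p1, p2, p3, width):
    """Serialize one stroke as an SVG <path> element using the cubic 'C' command."""
    d = f"M {p0[0]} {p0[1]} C {p1[0]} {p1[1]}, {p2[0]} {p2[1]}, {p3[0]} {p3[1]}"
    return f'<path d="{d}" fill="none" stroke="black" stroke-width="{width}"/>'


# One stroke of one frame; a video sketch would hold one such list of strokes per frame.
stroke = ((0, 0), (0, 1), (1, 1), (1, 0))
print(cubic_bezier(*stroke, 0.5))
print(svg_path(*stroke, width=2))
```

Because each stroke is just a handful of control points and a width, the optimization described in the abstract can adjust those parameters directly while the output stays resolution-independent.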

翻訳日:2023-11-28 18:20:00 公開日:2023-11-26

# 概念蒸留:人間中心の説明をモデル改善に活用する

Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement ( http://arxiv.org/abs/2311.15303v1 )

ライセンス: Link先を確認

Avani Gupta, Saurabh Saini, P J Narayanan

(参考訳) 人間は理解のために、ハードな特徴の代わりに抽象的な概念を使う。近年の解釈可能性研究は、ニューラルネットワークの人間中心の概念説明に焦点を当てている。概念活性化ベクトル(CAV)は、与えられた概念に対するモデルの感度と潜在的バイアスを推定する。本稿では,CAVをポストホック解析からアンテホックトレーニングに拡張し,追加の概念損失を用いた微調整によりモデルバイアスを低減する。これまで概念は、ネットワークの最終層で定義されていた。我々はクラスプロトタイプを用いてこれを中間層に一般化する。これにより、最も情報量が多いことが知られている最後の畳み込み層でのクラス学習が促進される。また,教師として事前学習済みの知識豊富なモデルを用いて,より豊かな概念を作り出す概念蒸留を導入する。提案手法は,概念に対してモデルを敏感にすることも鈍感にすることもできる。いくつかの分類問題のバイアスを除去するための概念感受性トレーニングの応用を示す。また,概念を用いて、再構成問題であるIIDに事前知識を導入する。概念感受性トレーニングは、モデルの解釈性を改善し、バイアスを減らし、事前知識を導入できる。コードと詳細はhttps://avani17101.github.io/Concept-Distilllation/ を参照してください。

Humans use abstract concepts for understanding instead of hard features. Recent interpretability research has focused on human-centered concept explanations of neural networks. Concept Activation Vectors (CAVs) estimate a model's sensitivity and possible biases to a given concept. In this paper, we extend CAVs from post-hoc analysis to ante-hoc training in order to reduce model bias through fine-tuning using an additional Concept Loss. Concepts were defined on the final layer of the network in the past. We generalize it to intermediate layers using class prototypes. This facilitates class learning in the last convolution layer, which is known to be most informative. We also introduce Concept Distillation to create richer concepts using a pre-trained knowledgeable model as the teacher. Our method can sensitize or desensitize a model towards concepts. We show applications of concept-sensitive training to debias several classification problems. We also use concepts to induce prior knowledge into IID, a reconstruction problem. Concept-sensitive training can improve model interpretability, reduce biases, and induce prior knowledge. Please visit https://avani17101.github.io/Concept-Distilllation/ for code and more details.

翻訳日:2023-11-28 18:19:44 公開日:2023-11-26

# OCTスキャンにおけるアンサンブル学習とアテンションを用いた眼疾患予測

Eye Disease Prediction using Ensemble Learning and Attention on OCT Scans ( http://arxiv.org/abs/2311.15301v1 )

ライセンス: Link先を確認

Gauri Naik, Nandini Narvekar, Dimple Agarwal, Nishita Nandanwar, Himangi Pande

(参考訳) 眼疾患は何十年にもわたって大きな課題となっているが、技術の進歩により、その検出と治療のための新しい道が開かれた。機械学習とディープラーニングのアルゴリズムは、特に光コヒーレント技術(OCT)イメージングと組み合わせることで、この領域で重要な役割を果たしている。OCT画像から眼疾患を効率的に検出するための新しい手法を提案する。本手法は,患者を無疾患(正常眼)か,脈絡膜新生血管(CNV), 糖尿病黄斑浮腫(DME), ドルーゼン(Drusen)などの特定の病態に罹患しているかに分類することを可能にする。本研究では,効率的な眼疾患予測に機械学習とディープラーニング技術を利用するエンド・ツー・エンドのWebアプリケーションを提案する。このアプリケーションでは、患者が未加工のOCTスキャン画像を提出でき、画像は訓練済みのカスタムUNetモデルによってセグメンテーションされる。次に、セグメント化された画像は、自己注意層で強化されたInceptionV3とXceptionネットワークからなるアンサンブルモデルに入力される。この自己注意アプローチは、個々のモデルの特徴マップを活用して、分類精度を向上させる。アンサンブルモデルの出力を集約して様々な眼疾患を予測・分類する。アプリケーションの効率性と最適な性能を確保するため、広範な実験と最適化が実施されている。実験結果は、正確な眼疾患予測における提案手法の有効性を示す。開発したWebアプリケーションは早期発見やタイムリーな介入に大きな可能性を秘めており、眼科医療の成果の向上に寄与する。

Eye diseases have posed significant challenges for decades, but advancements in technology have opened new avenues for their detection and treatment. Machine learning and deep learning algorithms have become instrumental in this domain, particularly when combined with Optical Coherent Technology (OCT) imaging. We propose a novel method for efficient detection of eye diseases from OCT images. Our technique enables the classification of patients into disease free (normal eyes) or affected by specific conditions such as Choroidal Neovascularization (CNV), Diabetic Macular Edema (DME), or Drusen. In this work, we introduce an end to end web application that utilizes machine learning and deep learning techniques for efficient eye disease prediction. The application allows patients to submit their raw OCT scanned images, which undergo segmentation using a trained custom UNet model. The segmented images are then fed into an ensemble model, comprising InceptionV3 and Xception networks, enhanced with a self attention layer. This self attention approach leverages the feature maps of individual models to achieve improved classification accuracy. The ensemble model's output is aggregated to predict and classify various eye diseases. Extensive experimentation and optimization have been conducted to ensure the application's efficiency and optimal performance. Our results demonstrate the effectiveness of the proposed approach in accurate eye disease prediction. The developed web application holds significant potential for early detection and timely intervention, thereby contributing to improved eye healthcare outcomes.
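The abstract says the ensemble's outputs are aggregated but does not specify how. As a hedged illustration, the sketch below averages the class probabilities of two models, which is one common aggregation rule and an assumption here. The logits are made-up numbers standing in for the outputs of the two networks, and the class order is hypothetical.

```python
import math

CLASSES = ["CNV", "DME", "DRUSEN", "NORMAL"]  # the four categories named in the abstract


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(logits_per_model):
    """Average per-model class probabilities and return (label, averaged probabilities)."""
    probs = [softmax(logits) for logits in logits_per_model]
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda c: avg[c])
    return CLASSES[best], avg


# Placeholder logits for one scan from two models (assumed values).
label, avg = ensemble_predict([[2.0, 0.5, 0.1, 0.3], [1.5, 1.7, 0.0, 0.2]])
print(label, [round(p, 3) for p in avg])
```

In this toy case the second model alone would pick DME, while the averaged probabilities favor CNV, which shows how aggregation can override a single model's narrow preference.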

翻訳日:2023-11-28 18:19:29 公開日:2023-11-26

# コンテナ端末におけるタイムスロット管理のためのデータ駆動型マルチエージェント意思決定支援システム:ロッテルダム港を事例として

A Data-driven and multi-agent decision support system for time slot management at container terminals: A case study for the Port of Rotterdam ( http://arxiv.org/abs/2311.15298v1 )

ライセンス: Link先を確認

Ali Nadi, Maaike Snelder, J.W.C. van Lint, Lóránt Tavasszy

(参考訳) コンテナハブからのトラックの出発時間を制御することは、交通システムと物流システムの両方にとって重要である。しかしこれには、ターミナルゲートでのトラック到着時刻を制御し管理できるインテリジェントな意思決定支援システムが必要である。本稿では,港湾・後背地(port-hinterland)エコシステムにおける物流と交通の相互作用を理解し,予測し,制御するための統合モデルを提案する。このアプローチはコンテキスト対応であり、大規模な履歴データを使用してシステム状態を予測し、それに応じてトラックの流入と流出に制御ポリシーを適用する。制御ポリシーは、トラック会社、ターミナルオペレーター、道路交通機関を含む複数の利害関係者の満足を確保する。提案手法は, ゲート待ち時間の短縮と費用対効果の高いスケジュールが期待されるタイムスロットを選ぶようトラック運転者を体系的に誘導する、5つの統合モジュールから構成される。シミュレーションは実世界のデータに裏付けられており、システム内で大きな改善が得られることを示す。

Controlling the departure time of the trucks from a container hub is important to both the traffic and the logistics systems. This, however, requires an intelligent decision support system that can control and manage truck arrival times at terminal gates. This paper introduces an integrated model that can be used to understand, predict, and control logistics and traffic interactions in the port-hinterland ecosystem. This approach is context-aware and makes use of big historical data to predict system states and apply control policies accordingly, on truck inflow and outflow. The control policies ensure multiple stakeholders satisfaction including those of trucking companies, terminal operators, and road traffic agencies. The proposed method consists of five integrated modules orchestrated to systematically steer truckers toward choosing those time slots that are expected to result in lower gate waiting times and more cost-effective schedules. The simulation is supported by real-world data and shows that significant gains can be obtained in the system.

翻訳日:2023-11-28 18:19:04 公開日:2023-11-26

# ウォームスタートしたガウス過程を用いた制御可能な高コスト多目的最適化

Controllable Expensive Multi-objective Optimization with Warm-starting Gaussian Processes ( http://arxiv.org/abs/2311.15297v1 )

ライセンス: Link先を確認

Quang-Huy Nguyen, Long P. Hoang, Hoang V. Viet, Dung D. Le

(参考訳) Pareto Set Learning (PSL)は、多目的最適化(MOO)問題において、Paretoフロント全体を近似するための有望なアプローチである。しかしながら、既存の導関数を用いない(derivative-free)PSL法はしばしば不安定で非効率であり、特に、目的関数評価のコストが高い高価なブラックボックスMOO問題において顕著である。本研究では,Co-PSLと呼ばれる新しい制御可能なPSL法を用いて,既存のPSL法の不安定性と非効率性に対処することを提案する。具体的には、Co-PSLは2つの段階からなる。(1)質の高いガウス過程の事前分布を得るためのウォームスタートベイズ最適化と、(2)選好から対応するパレート解へのパラメトリックマッピングを正確に獲得する制御可能なパレート集合学習である。前者はPSLプロセスの安定化と高価な関数評価回数の削減に役立つ。後者は、競合する目的間のリアルタイムのトレードオフ制御をサポートする。合成および実世界のMOO問題における性能は、高価な多目的最適化タスクにおけるCo-PSLの有効性を示す。

Pareto Set Learning (PSL) is a promising approach for approximating the entire Pareto front in multi-objective optimization (MOO) problems. However, existing derivative-free PSL methods are often unstable and inefficient, especially for expensive black-box MOO problems where objective function evaluations are costly. In this work, we propose to address the instability and inefficiency of existing PSL methods with a novel controllable PSL method, called Co-PSL. Particularly, Co-PSL consists of two stages: (1) warm-starting Bayesian optimization to obtain quality Gaussian Processes priors and (2) controllable Pareto set learning to accurately acquire a parametric mapping from preferences to the corresponding Pareto solutions. The former is to help stabilize the PSL process and reduce the number of expensive function evaluations. The latter is to support real-time trade-off control between conflicting objectives. Performances across synthesis and real-world MOO problems showcase the effectiveness of our Co-PSL for expensive multi-objective optimization tasks.
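The Pareto front that PSL tries to approximate can be made concrete with a small sketch that filters the non-dominated points from a finite set of evaluated objective vectors. It assumes all objectives are minimized; the function names and data are hypothetical, and the code shows only the underlying dominance concept, not the Co-PSL method.

```python
def dominates(a, b):
    """True if objective vector `a` dominates `b` under minimization.

    `a` must be no worse in every objective and strictly better in at least one.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Five evaluated designs with two conflicting objectives (toy values).
evaluated = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
print(pareto_front(evaluated))
```

When each evaluation is expensive, only a few such points are available, which is the setting where a learned mapping from preferences to Pareto solutions becomes attractive.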

翻訳日:2023-11-28 18:18:46 公開日:2023-11-26

# UHGEval:制約のない生成による中国語大規模言語モデルの幻覚のベンチマーク

UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation ( http://arxiv.org/abs/2311.15296v1 )

ライセンス: Link先を確認

Xun Liang, Shichao Song, Simin Niu, Zhiyu Li, Feiyu Xiong, Bo Tang, Zhaohui Wy, Dawei He, Peng Cheng, Zhonghao Wang, Haiying Deng

(参考訳) 大規模言語モデル(LLM)は、現代自然言語処理において重要な貢献者として登場し、様々な産業に適用されつつある。しかし、これらの大規模確率論的統計モデルは、現在プロのコンテンツ生成に必要な品質を保証できない。これらのモデルは、しばしば幻覚テキストを生成し、専門的な文脈での実用性を損なう。テキスト生成におけるLLMの信頼性を評価するために,多くの取り組みが幻覚現象のベンチマーク評価を開発してきた。しかしながら、これらのベンチマークはコストと時間的制約のため、しばしば制約付き生成技術を利用する。これらの技術には、指示による幻覚誘導や、幻覚を生み出すために真正のテキストを意図的に変更する戦略が含まれる。これらのアプローチは、現実世界のアプリケーションが要求する制約のないテキスト生成と一致しない。さらに, テキスト生成における幻覚評価専用の確立された中国語データセットも, 現在不足している。そこで,LLMが最小限の制約で生成した出力を収集するUnconstrained Hallucination Generation Evaluation (UHGEval) ベンチマークを開発した。同時に,後続の研究者がスケーラブルで再現可能な実験を行えるよう、総合的なベンチマーク評価フレームワークを構築した。また,著名な中国語モデルとGPTシリーズモデルを評価し,幻覚の課題に関する専門的なパフォーマンス洞察を導出するための広範な実験を行った。

Large language models (LLMs) have emerged as pivotal contributors in contemporary natural language processing and are increasingly being applied across a diverse range of industries. However, these large-scale probabilistic statistical models cannot currently ensure the requisite quality in professional content generation. These models often produce hallucinated text, compromising their practical utility in professional contexts. To assess the authentic reliability of LLMs in text generation, numerous initiatives have developed benchmark evaluations for hallucination phenomena. Nevertheless, these benchmarks frequently utilize constrained generation techniques due to cost and temporal constraints. These techniques encompass the use of directed hallucination induction and strategies that deliberately alter authentic text to produce hallucinations. These approaches are not congruent with the unrestricted text generation demanded by real-world applications. Furthermore, a well-established Chinese-language dataset dedicated to the evaluation of hallucinations in text generation is presently lacking. Consequently, we have developed an Unconstrained Hallucination Generation Evaluation (UHGEval) benchmark, designed to compile outputs produced with minimal restrictions by LLMs. Concurrently, we have established a comprehensive benchmark evaluation framework to aid subsequent researchers in undertaking scalable and reproducible experiments. We have also executed extensive experiments, evaluating prominent Chinese language models and the GPT series models to derive professional performance insights regarding hallucination challenges.

翻訳日:2023-11-28 18:18:28 公開日:2023-11-26

# ロシアによるウクライナ侵攻におけるパルチザンニュース共有に関する研究

A Study of Partisan News Sharing in the Russian invasion of Ukraine ( http://arxiv.org/abs/2311.15294v1 )

ライセンス: Link先を確認

Yiming Zhu, Ehsan-Ul Haq, Gareth Tyson, Lik-Hang Lee, Yuyang Wang, Pan Hui

(参考訳) ロシアによるウクライナ侵攻以来、大量の偏ったパルチザン的なニュースがソーシャルメディアを通じて拡散してきた。これはより広範な社会的問題につながる可能性があるため、オンラインコミュニティのより良いガバナンスには、パルチザン的なニュース共有がユーザのコミュニケーションにどのように影響するかを理解することが重要であると論じる。本稿では,パルチザンニュース共有の計測研究を行う。我々は,このような共有がユーザのコミュニケーションに与える影響を特徴付けることを目的とする。われわれの分析は、ロシアの侵攻に関連するRedditの6つのコミュニティにわたる8ヶ月のデータセットを対象とする。まず,パルチザンニュース共有の時間変化の分析を行う。我々は,この侵攻が,パルチザンニュース共有量の増加を伴って,観察されたコミュニティの議論を刺激することを確認する。次に,このような共有に対するユーザの反応を特徴付ける。我々は、パルチザンバイアスがニュースの伝播を狭める役割を果たしていることを観察する。バイアスの強いメディアほど、複数のサブレディットにまたがって拡散される可能性が低い。しかし、パルチザン的なニュース共有はより多くのコメントを生み、より多くのユーザーを議論に引き付けることがわかった。その後、パルチザンニュースを広める可能性のあるユーザーを特定するための予測モデルを構築した。しかし、予測は難しく、精度は平均で61.57%である。コメントネットワークの中心性分析により,パルチザンニュースを広める利用者は,中立的なニュースを広める利用者に比べてネットワーク上の影響力が小さいことが示唆された。

Since the Russian invasion of Ukraine, a large volume of biased and partisan news has been spread via social media platforms. As this may lead to wider societal issues, we argue that understanding how partisan news sharing impacts users' communication is crucial for better governance of online communities. In this paper, we perform a measurement study of partisan news sharing. We aim to characterize the role of such sharing in influencing users' communications. Our analysis covers an eight-month dataset across six Reddit communities related to the Russian invasion. We first perform an analysis of the temporal evolution of partisan news sharing. We confirm that the invasion stimulates discussion in the observed communities, accompanied by an increased volume of partisan news sharing. Next, we characterize users' response to such sharing. We observe that partisan bias plays a role in narrowing its propagation. More biased media is less likely to be spread across multiple subreddits. However, we find that partisan news sharing attracts more users to engage in the discussion, by generating more comments. We then built a predictive model to identify users likely to spread partisan news. The prediction is challenging though, with 61.57% accuracy on average. Our centrality analysis on the commenting network further indicates that the users who disseminate partisan news possess lower network influence in comparison to those who propagate neutral news.

翻訳日:2023-11-28 18:18:09 公開日:2023-11-26

# BatchNormによる弱教師付きビデオ異常検出

BatchNorm-based Weakly Supervised Video Anomaly Detection ( http://arxiv.org/abs/2311.15367v1 )

ライセンス: Link先を確認

Yixuan Zhou, Yi Qu, Xing Xu, Fumin Shen, Jingkuan Song, Hengtao Shen

(参考訳) 異常発生の有無を示すビデオレベルラベルのみが使用可能な弱教師付きビデオ異常検出(wvad)では,異常発生の時間的アノテーションにおける内在的曖昧さが主な課題となっている。異常事象の時間的特徴がしばしば異常な特徴を示すという統計的知見に着想を得て,BatchNormをWVADに組み込んだBN-WVADを提案する。提案したBN-WVADでは,BatchNormの平均ベクトル(DFM)から特徴の偏差を信頼性のある異常基準として活用し,異常ビデオ中の潜在的な異常断片を識別する。提案したDFM基準は、異常認識にも適しており、ラベルノイズに対する耐性も高く、ノイズラベルに影響を受けやすい異常分類器の予測を補正するための追加の異常スコアとして機能する。さらに、より異常なイベントが発生するビデオ中の異常なスニペットをフィルタリングするために、バッチレベルの選択戦略が考案されている。提案したBN-WVADモデルでは、UCF-CrimeのAUCは87.24%、XD-Violenceは84.93%に達する。私たちのコード実装はhttps://github.com/cool-xuan/bn-wvadからアクセスできます。

In weakly supervised video anomaly detection (WVAD), where only video-level labels indicating the presence or absence of abnormal events are available, the primary challenge arises from the inherent ambiguity in temporal annotations of abnormal occurrences. Inspired by the statistical insight that temporal features of abnormal events often exhibit outlier characteristics, we propose a novel method, BN-WVAD, which incorporates BatchNorm into WVAD. In the proposed BN-WVAD, we leverage the Divergence of Feature from Mean vector (DFM) of BatchNorm as a reliable abnormality criterion to discern potential abnormal snippets in abnormal videos. The proposed DFM criterion is also discriminative for anomaly recognition and more resilient to label noise, serving as the additional anomaly score to amend the prediction of the anomaly classifier that is susceptible to noisy labels. Moreover, a batch-level selection strategy is devised to filter more abnormal snippets in videos where more abnormal events occur. The proposed BN-WVAD model demonstrates state-of-the-art performance on UCF-Crime with an AUC of 87.24%, and XD-Violence, where AP reaches up to 84.93%. Our code implementation is accessible at https://github.com/cool-xuan/BN-WVAD.
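
As a rough illustration of the DFM idea described above, the sketch below scores each snippet by its variance-normalised distance from a mean feature vector and keeps the highest-scoring snippets. It is a minimal sketch under stated assumptions: the mean and variance are computed from the batch on the fly rather than read from a BatchNorm layer's running statistics, and `dfm_scores`, `top_k_snippets`, and the toy features are hypothetical names and data, not taken from the paper's code.

```python
import math


def dfm_scores(features, eps=1e-5):
    """Distance of every snippet feature from the batch mean, scaled by the batch variance."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]
    return [
        math.sqrt(sum((f[d] - mean[d]) ** 2 / (var[d] + eps) for d in range(dim)))
        for f in features
    ]


def top_k_snippets(features, k):
    """Indices of the k snippets that deviate most from the mean vector."""
    scores = dfm_scores(features)
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data (assumption): four similar snippets and one clear outlier at index 4.
snippets = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [0.1, 0.1], [3.0, 2.5]]
print(top_k_snippets(snippets, k=1))
```

In this toy batch the outlying snippet receives the largest score, which mirrors the abstract's premise that abnormal snippets behave like outliers in feature space.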

翻訳日:2023-11-28 18:09:50 公開日:2023-11-26

# seq2seq変換による非ターゲットコードオーサシップ回避

Untargeted Code Authorship Evasion with Seq2Seq Transformation ( http://arxiv.org/abs/2311.15366v1 )

ライセンス: Link先を確認

Soohyeon Choi and Rhongho Jang and DaeHun Nyang and David Mohaisen

(参考訳) コードオーサシップの属性(Code Authorship Attribution)は、プログラム言語コードの作者をコード内のスタイリスティックな特徴を通じて識別する問題である。本稿では、StructCoderと呼ばれるSeq2Seqコードトランスフォーマーを利用する、コードオーサシップ難読化技術であるSCAEを紹介する。 SCAEは、ある言語から別の言語(例えばJavaからC#)への関数レベルのコード変換用に最初に設計されたシステムであるStructCoderを、転送学習を使ってカスタマイズする。 SCAEは、既存の作業と比べて、わずかに精度の低下で効率を向上した。また,85%のトランスフォーメーション成功率と95.77%の回避成功率を維持しながら,処理時間を約68%削減した。

Code authorship attribution is the problem of identifying authors of programming language codes through the stylistic features in their codes, a topic that recently witnessed significant interest with outstanding performance. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE customizes StructCoder, a system designed initially for function-level code translation from one language to another (e.g., Java to C#), using transfer learning. SCAE improved the efficiency at a slight accuracy degradation compared to existing work. We also reduced the processing time by about 68% while maintaining an 85% transformation success rate and up to 95.77% evasion success rate in the untargeted setting.

翻訳日:2023-11-28 18:09:26 公開日:2023-11-26

# Łojasiewicz-Simon不等式による連続的なディープラーニングモデルの収束結果

A Convergence result of a continuous model of deep learning via Łojasiewicz–Simon inequality ( http://arxiv.org/abs/2311.15365v1 )

ライセンス: Link先を確認

Noboru Isobe

(参考訳) 本研究では,Deep Neural Network (DNN) の連続モデルの最適化プロセスを表すWasserstein型勾配流に着目した。まず, モデルの平均損失に対する最小化器の存在を, $L^2$-正則化の下で確立する。その後、損失の最大傾斜曲線の存在を示す。私たちの主な結果は、時間が無限になるにつれて、損失の臨界点への流れの収束です。この結果を証明するための重要な側面は、損失に対するŁojasiewicz–Simon勾配不等式を確立することである。 NNと損失関数の解析性を仮定することで、この不等式を導出する。本証明は,非凸汎関数に対するWasserstein型勾配流の漸近的挙動を解析するための新しい手法を提供する。

This study focuses on a Wasserstein-type gradient flow, which represents an optimization process of a continuous model of a Deep Neural Network (DNN). First, we establish the existence of a minimizer for an average loss of the model under $L^2$-regularization. Subsequently, we show the existence of a curve of maximal slope of the loss. Our main result is the convergence of flow to a critical point of the loss as time goes to infinity. An essential aspect of proving this result involves the establishment of the Łojasiewicz–Simon gradient inequality for the loss. We derive this inequality by assuming the analyticity of NNs and loss functions. Our proofs offer a new approach for analyzing the asymptotic behavior of Wasserstein-type gradient flows for nonconvex functionals.
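
For orientation, the classical finite-dimensional Łojasiewicz gradient inequality and the standard argument for why it yields convergence can be written as follows. This is the textbook Euclidean form, given here as an assumption-laden reminder; the paper's own inequality is stated for a loss on a Wasserstein-type space, and its constants and hypotheses are not reproduced here.

```latex
% Classical Lojasiewicz gradient inequality: f real-analytic near a critical point x^*.
% There exist C > 0, \theta \in (0, 1/2], and a neighbourhood U of x^* such that
\[
  |f(x) - f(x^*)|^{1-\theta} \le C \,\|\nabla f(x)\| \qquad \text{for all } x \in U .
\]
% Along the gradient flow \dot{x}(t) = -\nabla f(x(t)), while x(t) \in U and f(x(t)) > f(x^*):
\[
  \frac{d}{dt}\bigl(f(x(t)) - f(x^*)\bigr)^{\theta}
  = -\theta \,\bigl(f(x(t)) - f(x^*)\bigr)^{\theta-1} \|\nabla f(x(t))\|^{2}
  \le -\frac{\theta}{C}\,\|\dot{x}(t)\| ,
\]
% so the trajectory has finite length and therefore converges to a single critical point.
```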

翻訳日:2023-11-28 18:09:11 公開日:2023-11-26

# 人間-ロボットインタラクションにおけるRGBカメラを用いた超遠距離ジェスチャー認識

Ultra-Range Gesture Recognition using an RGB Camera in Human-Robot Interaction ( http://arxiv.org/abs/2311.15361v1 )

ライセンス: Link先を確認

Eran Bamani, Eden Nissinman, Inbar Meir, Lisa Koenigsberg, Avishai Sintov

(参考訳) ハンドジェスチャは、非言語的意図、思考、命令が伝達される人間の相互作用において重要な役割を果たす。 HRI(Human-Robot Interaction)では、ハンドジェスチャはロボットエージェントに明確で迅速な指示を伝達するための類似した、効率的な媒体を提供する。しかし,ジェスチャ認識のための最先端の視覚ベース手法は,ユーザカメラ距離7mまでしか効果がないことが示されている。このような距離の短い範囲では、サービスロボット、捜索救助ロボット、ドローンといった実用的なhriを制限することができる。本研究では,最大25mの認識距離とHRIの文脈で,Ultra-Range Gesture Recognition (URGR)問題に対処する。シンプルなRGBカメラのみを用いたURGRのための新しいディープラーニングフレームワークを提案する。まず、HQ-Netと呼ばれる新しい超解像度モデルを用いて、ユーザの低解像度画像を強化する。次に,拡張画像を入力とする新しいurgr分類器であるgraph vision transformer(gvit)を提案する。 GViTは、グラフ畳み込みネットワーク(GCN)と修正されたビジョントランスフォーマー(ViT)の利点を組み合わせたものである。多様なテストデータに対する提案フレームワークの評価は、98.1%高い認識率をもたらす。このフレームワークは、超距離での人間の認識よりも優れた性能を示した。本研究では,複雑な屋内・屋外環境下での人間のジェスチャーによる自律的四足歩行ロボットの性能解析と実演を行う。

Hand gestures play a significant role in human interactions where non-verbal intentions, thoughts and commands are conveyed. In Human-Robot Interaction (HRI), hand gestures offer a similar and efficient medium for conveying clear and rapid directives to a robotic agent. However, state-of-the-art vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters. Such a short distance range limits practical HRI with, for example, service robots, search and rescue robots and drones. In this work, we address the Ultra-Range Gesture Recognition (URGR) problem by aiming for a recognition distance of up to 25 meters and in the context of HRI. We propose a novel deep-learning framework for URGR using solely a simple RGB camera. First, a novel super-resolution model termed HQ-Net is used to enhance the low-resolution image of the user. Then, we propose a novel URGR classifier termed Graph Vision Transformer (GViT) which takes the enhanced image as input. GViT combines the benefits of a Graph Convolutional Network (GCN) and a modified Vision Transformer (ViT). Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%. The framework has also exhibited superior performance compared to human recognition in ultra-range distances. With the framework, we analyze and demonstrate the performance of an autonomous quadruped robot directed by human gestures in complex ultra-range indoor and outdoor environments.

翻訳日:2023-11-28 18:09:00 公開日:2023-11-26

# レーザー支援量子反射による原子表面散乱の制御

Controlling Atom-Surface Scattering with Laser Assisted Quantum Reflection ( http://arxiv.org/abs/2311.15357v1 )

ライセンス: Link先を確認

A. L. Harris

(参考訳) 低エネルギー原子-表面散乱では、古典的な転回点を持たない引力ポテンシャルの領域で原子が反射されることがある。この現象は量子反射(quantum reflection)として知られており、原子が表面に付着する確率を減少させるほか、原子トラップにも利用できる。我々は、印加されたレーザー場の存在下でモースポテンシャル内をゆっくり動く原子を持つ1次元モデルで量子反射過程をシミュレートする。レーザー支援量子反射の場合、レーザー場は原子にさらなる運動量と運動エネルギーを与える。これにより、原子と表面の間の最接近距離が減少する。その結果,レーザーパルスのタイミングや強度によって最接近距離を制御でき,付着確率の増大や量子反射確率の低減につながりうることがわかった。

In low energy atom-surface scattering, it is possible for the atom to be reflected in a region of attractive potential with no classical turning point. This phenomenon has come to be known as quantum reflection, and it can reduce the sticking probability of atoms to surfaces, as well as be used for atom trapping. We simulate the quantum reflection process in a one-dimensional model with a slow-moving atom in a Morse potential in the presence of an applied laser field. We show that in the case of laser-assisted quantum reflection, the laser field imparts additional momentum and kinetic energy to the atom. This results in a decreased distance of closest approach between the atom and surface. Our results show that the distance of closest approach can be controlled through the timing and intensity of the laser pulse, which may result in enhanced sticking probability and/or reduced quantum reflection probability.

翻訳日:2023-11-28 18:08:36 公開日:2023-11-26

# 第二の考えを持つか? 聞いてみましょう

Having Second Thoughts? Let's hear it ( http://arxiv.org/abs/2311.15356v1 )

ライセンス: Link先を確認

Jung H. Lee and Sujith Vijayan

(参考訳) ディープラーニングモデルは、低次知覚領域から高次認知領域へのボトムアップ信号経路を緩く模倣する。訓練後、DLモデルはいくつかのドメイン固有のタスクにおいて人間より優れているが、意思決定プロセスは容易に破壊されることが知られている。人間の脳は複数の機能領域から構成されており、ボトムアップとトップダウン(高次から低次まで)の複雑な相互作用に依存しているため、トップダウン信号処理を取り入れることで、DLモデルをより堅牢にすることができると仮定する。この仮説に対処するため,我々は,DLモデルをより堅牢にできるかどうか,選択的注意を模倣した認証プロセスを提案する。実験的な評価から,新たに提案された認証により,DLモデルの精度が向上し,その脆弱性を人為的,自然的両面的な例で軽減する安全対策が構築できることが示唆された。

Deep learning models loosely mimic bottom-up signal pathways from low-order sensory areas to high-order cognitive areas. After training, DL models can outperform humans on some domain-specific tasks, but their decision-making process has been known to be easily disrupted. Since the human brain consists of multiple functional areas highly connected to one another and relies on intricate interplays between bottom-up and top-down (from high-order to low-order areas) processing, we hypothesize that incorporating top-down signal processing may make DL models more robust. To address this hypothesis, we propose a certification process mimicking selective attention and test if it could make DL models more robust. Our empirical evaluations suggest that this newly proposed certification can improve DL models' accuracy and help us build safety measures to alleviate their vulnerabilities with both artificial and natural adversarial examples.

翻訳日:2023-11-28 18:08:23 公開日:2023-11-26

# 強化学習における任意制約を伴う確率的行動の生成モデル

Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning ( http://arxiv.org/abs/2311.15341v1 )

ライセンス: Link先を確認

Changyu Chen, Ramesha Karunasena, Thanh Hong Nguyen, Arunesh Sinha, Pradeep Varakantham

(参考訳) 強化学習(rl)の多くの問題は、大きな離散的多次元かつ無秩序なアクション空間を持つ最適方針を求めており、複数のセキュリティリソースの配置や緊急対応ユニットなどのリソースのランダム配置の問題を含んでいる。この設定の課題は、下層の作用空間が分類的(離散的かつ非順序的)で大きく、既存のRL法ではうまく機能しないことである。さらに、これらの問題は実効作用(配置)の妥当性を必要とし、この妥当性制約はしばしば閉じた数学的形式でコンパクトに表現することが困難である。問題の割り当ての性質は、もし存在するならば、確率的最適政策を好む。本稿では,(1)(状態)条件付き正規化フローを適用して確率的ポリシーをコンパクトに表現すること -- ネットワークが1つのサンプルアクションとそれに対応するアクションのログ確率を生成することによって生じるコンパクト性 -- をアクタ-クリティックな方法で使用すること,(2)ベースポリシーを更新するために無効なアクション拒否法(有効なアクションオラクルによる)を使用することによって,これらの課題に対処する。アクション拒否は、私たちが導出する変更されたポリシー勾配によって実現されます。最後に、従来の手法と比較して、我々のアプローチのスケーラビリティと、任意の状態におけるアクションの分布のサポートに任意の状態条件制約を適用する能力を示すための広範な実験を行う。

Many problems in Reinforcement Learning (RL) seek an optimal policy with large discrete multidimensional yet unordered action spaces; these include problems in randomized allocation of resources such as placements of multiple security resources and emergency response units, etc. A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large, for which existing RL methods do not perform well. Moreover, these problems require validity of the realized action (allocation); this validity constraint is often difficult to express compactly in a closed mathematical form. The allocation nature of the problem also prefers stochastic optimal policies, if one exists. In this work, we address these challenges by (1) applying a (state) conditional normalizing flow to compactly represent the stochastic policy -- the compactness arises due to the network only producing one sampled action and the corresponding log probability of the action, which is then used by an actor-critic method; and (2) employing an invalid action rejection method (via a valid action oracle) to update the base policy. The action rejection is enabled by a modified policy gradient that we derive. Finally, we conduct extensive experiments to show the scalability of our approach compared to prior methods and the ability to enforce arbitrary state-conditional constraints on the support of the distribution of actions in any state.
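
The invalid-action rejection step mentioned above can be pictured as plain rejection sampling against a validity oracle. The sketch below is only that: `sample_valid_action`, the toy allocation policy, and the "no two units share a location" rule are hypothetical stand-ins, and the modified policy gradient that the authors derive is not reproduced.

```python
import random


def sample_valid_action(sample_action, is_valid, max_tries=1000):
    """Draw actions from the policy until the validity oracle accepts one."""
    for tries in range(1, max_tries + 1):
        action = sample_action()
        if is_valid(action):
            return action, tries
    raise RuntimeError("no valid action found within max_tries")


rng = random.Random(0)


def toy_policy():
    # Hypothetical policy: place 3 response units on 4 candidate locations at random.
    return tuple(rng.randrange(4) for _ in range(3))


def toy_oracle(action):
    # Hypothetical constraint: no two units may occupy the same location.
    return len(set(action)) == len(action)


action, tries = sample_valid_action(toy_policy, toy_oracle)
print(action, tries)
```

The oracle is treated as a black box, which matches the abstract's point that the validity constraint may be hard to write in closed form.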

翻訳日:2023-11-28 18:08:06 公開日:2023-11-26

# 情報マスキングの敵意的浄化

Adversarial Purification of Information Masking ( http://arxiv.org/abs/2311.15339v1 )

ライセンス: Link先を確認

Sitong Liu, Zhichao Lian, Shuangquan Zhang, Liang Xiao

(参考訳) 敵対的攻撃は、ニューラルネットワークを騙すために画像に極小で知覚できない摂動を生成する。これらに対抗して、敵の入力サンプルをクリーンな出力画像に変換し、敵の攻撃から守る。それでも、ある程度の生成モデルは、敵の摂動を効果的に排除できず、理想的でない浄化結果をもたらす。ターゲットモデルに対する残余の敵対的摂動の潜在的な脅威を強調し,摂動スケールと攻撃能力の関係を定量的に確立する。特に、精製画像上の残留摂動は、主に、対向サンプルの同じ位置パッチと類似のパッチに由来する。本稿では,情報マスク浄化 (IMPure) と呼ばれる新たな対外浄化手法を提案する。逆方向のサンプルを得るために,まずパッチ情報の一部をマスクし,次にパッチを再構築して,パッチからの逆方向の摂動に抵抗する。すべてのパッチを並列に再構築し,結束画像を得る。そして, 類似する局所的摂動に対して精製試料を保護するため, 特徴抽出ネットワークに入力する前に, 精製試料と入力試料をランダムに混合することにより, このリスクをシミュレートする。最後に,画素損失と知覚損失の組合せ制約を確立し,モデルの再構成適応性を高める。 3つの分類器モデルを用いたimagenetデータセットの広範囲な実験により,本手法は9つの攻撃手法に対して最先端の結果が得られることを示した。実装コードと事前トレーニングされたウェイトは、 https://github.com/NoWindButRain/IMPure でアクセスできる。

Adversarial attacks meticulously generate minuscule, imperceptible perturbations to images to deceive neural networks. Counteracting these, adversarial purification methods seek to transform adversarial input samples into clean output images to defend against adversarial attacks. Nonetheless, extant generative models fail to effectively eliminate adversarial perturbations, yielding less-than-ideal purification results. We emphasize the potential threat of residual adversarial perturbations to target models, quantitatively establishing a relationship between perturbation scale and attack capability. Notably, the residual perturbations on the purified image primarily stem from the same-position patch and similar patches of the adversarial sample. We propose a novel adversarial purification approach named Information Mask Purification (IMPure), which aims to extensively eliminate adversarial perturbations. To obtain an adversarial sample, we first mask part of the patch information, then reconstruct the patches to resist adversarial perturbations from the patches. We reconstruct all patches in parallel to obtain a cohesive image. Then, in order to protect the purified samples against potential similar regional perturbations, we simulate this risk by randomly mixing the purified samples with the input samples before inputting them into the feature extraction network. Finally, we establish a combined constraint of pixel loss and perceptual loss to augment the model's reconstruction adaptability. Extensive experiments on the ImageNet dataset with three classifier models demonstrate that our approach achieves state-of-the-art results against nine adversarial attack methods. Implementation code and pre-trained weights can be accessed at https://github.com/NoWindButRain/IMPure.

翻訳日:2023-11-28 18:07:41 公開日:2023-11-26

# 視覚変換器を用いた効率的なシーケンス推論のためのトークンリサイクル

Token Recycling for Efficient Sequential Inference with Vision Transformers ( http://arxiv.org/abs/2311.15335v1 )

ライセンス: Link先を確認

Jan Olszewski and Dawid Rymarczyk and Piotr Wójcik and Mateusz Pach and Bartosz Zieliński

(参考訳) 視覚変換器(ViT)は、欠損値の補完を必要としないため、不完全な入力の処理において畳み込みニューラルネットワークを上回る。したがって、ViTは、例えばActive Visual Exploration問題のようなシーケンシャルな意思決定に適している。しかし、新しいシーケンシャル情報が到着するたびにフルフォワードパスを実行するため、計算的に非効率である。この計算上の非効率を低減するために,任意のアーキテクチャで使用可能なViT推論のTOken REcycling (TORE)修正を導入する。 TOREはViTをイテレータとアグリゲータという2つの部分に分割する。イテレータはシーケンシャル情報を個別に処理して中間トークンに変換し、それをキャッシュする。アグリゲータは中間トークンを共同で処理して予測を得る。これにより、イテレータによる計算結果を再利用することができる。効率的な逐次推論に加えて,逐次的意思決定に伴う計算負担を大幅に軽減しつつ最先端の精度を達成する補完的な学習方針を提案する。

Vision Transformers (ViTs) outperform Convolutional Neural Networks in processing incomplete inputs because they do not require the imputation of missing values. Therefore, ViTs are well suited for sequential decision-making, e.g. in the Active Visual Exploration problem. However, they are computationally inefficient because they perform a full forward pass each time a piece of new sequential information arrives. To reduce this computational inefficiency, we introduce the TOken REcycling (TORE) modification for the ViT inference, which can be used with any architecture. TORE divides ViT into two parts, iterator and aggregator. The iterator processes sequential information separately into midway tokens, which are cached. The aggregator processes midway tokens jointly to obtain the prediction. This way, we can reuse the results of computations made by the iterator. Besides efficient sequential inference, we propose a complementary training policy, which significantly reduces the computational burden associated with sequential decision-making while achieving state-of-the-art accuracy.
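
The iterator/aggregator split described above can be sketched as a small cache: each new glimpse is encoded once into a midway token, and only the aggregator is re-run over the cached tokens. This is a toy sketch, not the TORE implementation; `TokenRecycler`, `toy_iterator`, and `toy_aggregator` are hypothetical names, and plain Python functions stand in for the ViT blocks.

```python
class TokenRecycler:
    """Caches per-glimpse tokens so earlier glimpses are never re-encoded."""

    def __init__(self, iterator, aggregator):
        self.iterator = iterator      # encodes ONE glimpse into a midway token
        self.aggregator = aggregator  # maps all cached tokens to a prediction
        self.cache = []
        self.iterator_calls = 0

    def observe(self, glimpse):
        """Encode only the new glimpse, then aggregate over every cached token."""
        self.cache.append(self.iterator(glimpse))
        self.iterator_calls += 1
        return self.aggregator(self.cache)


# Toy stand-ins (assumptions): a "token" is the mean value of a glimpse and the
# "prediction" is the mean of all tokens seen so far.
def toy_iterator(glimpse):
    return sum(glimpse) / len(glimpse)


def toy_aggregator(tokens):
    return sum(tokens) / len(tokens)


recycler = TokenRecycler(toy_iterator, toy_aggregator)
predictions = [recycler.observe(g) for g in ([1.0, 3.0], [5.0, 7.0], [9.0, 11.0])]
print(predictions, recycler.iterator_calls)
```

With three glimpses the iterator runs three times, whereas re-encoding every glimpse at every step would take 1 + 2 + 3 = 6 encoder calls.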

翻訳日:2023-11-28 18:07:10 公開日:2023-11-26

# どれくらいのデータが必要ですか? 医療データに関する事例研究

How much data do I need? A case study on medical data ( http://arxiv.org/abs/2311.15331v1 )

ライセンス: Link先を確認

Ayse Betul Cengiz and A. Stephen McGough

(参考訳) ディープラーニングネットワークをトレーニングするデータの収集には、労力とリソースの面でコストがかかる。多くの場合、特に医学的文脈では、有害な影響がある可能性がある。例えば、侵襲的な医療処置や、それ自体が医療被害を引き起こしうるプロセスが必要となる場合である。しかし、Deep Learningはデータを大量に必要とする方法だと見なされている。ここでは、広く信じられている2つの格言を検討する。 i) より多くのデータがより良い結果をもたらすこと、ii) 十分なデータがない場合、転送学習が役に立つこと。これらは広く真であると仮定され、深層学習に関わる問題を解決する方法を選択する証拠として使用される。 6つの医学データセットと6つの一般データセットを評価した。これらのデータセットのさまざまなサブセット上でResNet18ネットワークをトレーニングして、“より多くのデータがより良い結果をもたらす”かどうかを評価する。転送学習が普遍的に有益かどうかを判断するために、これらのデータセットのうち11個を、12番目のデータセットであるChestのサブセットに対する転送学習のソースとして用いる。マルチステージトランスファーラーニングが一貫したメリットをもたらすかどうか、さらに調べていきます。分析の結果、実際の状況はこれらの単純な格言よりも複雑であることが分かりました -- より多くのデータがリターンの減少につながる可能性があり、転送学習のためのデータセットの誤った選択は、パフォーマンスを悪化させる可能性があるのです。多段階転送学習も同様にデータセット間の複雑な関係を明らかにする。

The collection of data to train a Deep Learning network is costly in terms of effort and resources. In many cases, especially in a medical context, it may have detrimental impacts, such as requiring invasive medical procedures or processes which could in themselves cause medical harm. However, Deep Learning is seen as a data hungry method. Here, we look at two commonly held adages: i) more data gives better results and ii) transfer learning will aid you when you don't have enough data. These are widely assumed to be true and used as evidence for choosing how to solve a problem when Deep Learning is involved. We evaluate six medical datasets and six general datasets, training a ResNet18 network on varying subsets of these datasets to evaluate 'more data gives better results'. We take eleven of these datasets as the sources for Transfer Learning on subsets of the twelfth dataset -- Chest -- in order to determine whether Transfer Learning is universally beneficial. We go further to see whether multi-stage Transfer Learning provides a consistent benefit. Our analysis shows that the real situation is more complex than these simple adages -- more data could lead to a case of diminishing returns and an incorrect choice of dataset for transfer learning can lead to worse performance, with datasets which we would consider highly similar to the Chest dataset giving worse results than datasets which are more dissimilar. Multi-stage transfer learning likewise reveals complex relationships between datasets.

翻訳日:2023-11-28 18:06:52 公開日:2023-11-26

# BS-Diff:胸部X線画像からの条件拡散モデルを用いた効果的な骨抑制

BS-Diff: Effective Bone Suppression Using Conditional Diffusion Models from Chest X-Ray Images ( http://arxiv.org/abs/2311.15328v1 )

ライセンス: Link先を確認

Zhanghao Chen, Yifei Sun, Wenjian Qin, Ruiquan Ge, Cheng Pan, Wenming Deng, Zhou Liu, Wenwen Min, Ahmed Elazab, Xiang Wan, Changmiao Wang

(参考訳) 胸部X線(CXR)は肺検診の低用量モードとして一般的に用いられる。しかし、肺領域の約75%が骨と重なり、疾患の検出と診断を妨げているため、CXRsの有効性は幾らか阻害されている。改善策として骨抑制技術が導入された。現在の病院のデュアルエネルギーサブトラクションイメージング技術では、高価な機器と被写体が高放射線にさらされる必要がある。これらの問題を回避すべく,深層学習に基づく画像生成アルゴリズムが提案されている。しかし, 既存の手法では, 高品質な画像が得られず, 特に肺血管のテクスチャの細部が捉えられにくい。これらの課題に対処するために,U-Netアーキテクチャとオートエンコーダを組み込むシンプルな拡張モジュールを備えた条件拡散モデルを備えた骨抑制フレームワークであるBS-Diffを提案する。提案するネットワークは骨抑制率の高い軟部組織像を生成するだけでなく,微細な画像の詳細を捉える能力も備えている。また,2010年以降で最大のデータセットを収集し,高精細度CXRと軟部組織像を関連病院で収集した120例のデータを収集した。広範囲な実験、比較分析、アブレーション研究、臨床評価は、提案されたBS-Diffが複数の指標でいくつかの骨圧モデルより優れていることを示している。

Chest X-rays (CXRs) are commonly utilized as a low-dose modality for lung screening. Nonetheless, the efficacy of CXRs is somewhat impeded, given that approximately 75% of the lung area overlaps with bone, which in turn hampers the detection and diagnosis of diseases. As a remedial measure, bone suppression techniques have been introduced. The current dual-energy subtraction imaging technique in the clinic requires costly equipment and exposes subjects to high radiation. To circumvent these issues, deep learning-based image generation algorithms have been proposed. However, existing methods fall short in terms of producing high-quality images and capturing texture details, particularly with pulmonary vessels. To address these issues, this paper proposes a new bone suppression framework, termed BS-Diff, that comprises a conditional diffusion model equipped with a U-Net architecture and a simple enhancement module to incorporate an autoencoder. Our proposed network can not only generate soft tissue images with a high bone suppression rate but also capture fine image details. Additionally, we compiled the largest dataset since 2010, including data from 120 patients with high-definition, high-resolution paired CXRs and soft tissue images collected by our affiliated hospital. Extensive experiments, comparative analyses, ablation studies, and clinical evaluations indicate that the proposed BS-Diff outperforms several bone-suppression models across multiple metrics.

翻訳日:2023-11-28 18:06:28 公開日:2023-11-26

# FRAC-Q-Learning:社会ロボットのための退屈回避プロセスを備えた強化学習

FRAC-Q-Learning: A Reinforcement Learning with Boredom Avoidance Processes for Social Robots ( http://arxiv.org/abs/2311.15327v1 )

ライセンス: Link先を確認

Akinari Onishi

(参考訳) 強化学習アルゴリズムはしばしば社会ロボットに適用されている。しかし、ほとんどの強化学習アルゴリズムはソーシャルロボットの使用に最適化されておらず、従ってユーザを退屈させる可能性がある。本研究では,ユーザの退屈を回避できる、ソーシャルロボットに特化した新しい強化学習手法であるFRAC-Q-learningを提案する。提案アルゴリズムは,ランダム化と分類のプロセスに加えて,忘却プロセスから構成される。本研究では,従来のQ-learningとの比較により,FRAC-Q-learningの関心度スコアと退屈しにくさのスコアを評価した。 FRAC-Q-learningは,従来のQ-learningに比べて関心度スコアが有意に高い傾向を示し,ユーザを退屈させにくいことが有意に示された。したがって、FRAC-Q-learningはユーザーを退屈させないソーシャルロボットの開発に寄与することができる。提案アルゴリズムは、Webベースのコミュニケーションや教育システムにも応用できる。本稿では,FRAC-Q-learningのプロセス全体,詳細な実装,詳細な評価方法を初めて示す。

Reinforcement learning algorithms have often been applied to social robots. However, most reinforcement learning algorithms are not optimized for use in social robots, and consequently they may bore users. We propose a new reinforcement learning method specialized for social robots, FRAC-Q-learning, that can avoid user boredom. The proposed algorithm consists of a forgetting process in addition to randomizing and categorizing processes. This study evaluated the interest and boredom hardness scores of FRAC-Q-learning by comparison with traditional Q-learning. FRAC-Q-learning showed a significantly higher trend in interest score, and proved significantly harder for users to get bored with than traditional Q-learning. Therefore, FRAC-Q-learning can contribute to the development of social robots that will not bore users. The proposed algorithm can also find applications in Web-based communication and educational systems. This paper presents the entire process, detailed implementation and a detailed evaluation method of FRAC-Q-learning for the first time.

翻訳日:2023-11-28 18:06:05 公開日:2023-11-26

# 軽量顔認識: 改良されたMobileFaceNetモデル

Lightweight Face Recognition: An Improved MobileFaceNet Model ( http://arxiv.org/abs/2311.15326v1 )

ライセンス: Link先を確認

Ahmad Hassanpour, Yasamin Kowsari

(参考訳) 本稿では,MobileFaceNetとその修正版であるMMobileFaceNetに着目した,軽量顔認識(FR)モデルの広範な探索と比較分析を行う。計算資源が限られているデバイス上での効率的なFRモデルの必要性は、精度を犠牲にすることなく、メモリフットプリントと計算要求を削減したモデルの開発につながった。本研究は、データセット選択、モデルアーキテクチャ、最適化アルゴリズムがFRモデルの性能に与える影響について考察する。 EFaR-2023コンペティションでは,特にパラメータ数に制限されたカテゴリにおいて,当社のモデルが例外的なパフォーマンスを示した。 Webface42Mデータセットのサブセットを採用し、シャープネスを意識した最小化(SAM)最適化を統合することで、クロスポジション、クロスエイジ、クロスエッチのパフォーマンスをテストするものなど、さまざまなベンチマークで精度を大幅に向上しました。この結果は, 計算効率だけでなく, 多様な条件下で高い精度を維持できるモデルの構築における我々のアプローチの有効性を裏付けるものである。

This paper presents an extensive exploration and comparative analysis of lightweight face recognition (FR) models, specifically focusing on MobileFaceNet and its modified variant, MMobileFaceNet. The need for efficient FR models on devices with limited computational resources has led to the development of models with reduced memory footprints and computational demands without sacrificing accuracy. Our research delves into the impact of dataset selection, model architecture, and optimization algorithms on the performance of FR models. We highlight our participation in the EFaR-2023 competition, where our models showcased exceptional performance, particularly in categories restricted by the number of parameters. By employing a subset of the Webface42M dataset and integrating sharpness-aware minimization (SAM) optimization, we achieved significant improvements in accuracy across various benchmarks, including those that test for cross-pose, cross-age, and cross-ethnicity performance. The results underscore the efficacy of our approach in crafting models that are not only computationally efficient but also maintain high accuracy in diverse conditions.

翻訳日:2023-11-28 18:05:50 公開日:2023-11-26

# 集団効果を有するLEDの超熱光子統計量の集団変動機構

Population fluctuation mechanism of the super-thermal photon statistic of LEDs with collective effects ( http://arxiv.org/abs/2311.15324v1 )

ライセンス: Link先を確認

Igor E. Protsenko, Alexander V. Uskov

(参考訳) その結果,エミッタ数の変動は線形状態の小さなLEDの超熱光子統計につながり,強いエミッタ-フィールド結合と集合効果に好適なキャビティを有することがわかった。 2階相関関数 g_2 の簡単な解析式が見つかる。 2レベルLEDモデルでは、g_2はg_2=6まで上昇する。超熱光子統計は、自然発生のキャビティモードへの人口変動の増加に関連している。

We found that fluctuations in the number of emitters lead to super-thermal photon statistics of small LEDs in a linear regime, with a strong emitter-field coupling and a bad cavity favorable for collective effects. A simple analytical expression for the second-order correlation function g_2 is found. An increase of g_2 up to g_2=6 is predicted in the two-level LED model. The super-thermal photon statistics are related to the increase, caused by population fluctuations, in spontaneous emission into the cavity mode.
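
For readers less familiar with the quantity, the usual zero-delay second-order correlation function and its reference values are recalled below. This is the standard single-mode definition, written under the assumption that the abstract's g_2 denotes $g^{(2)}(0)$; the paper's own analytical expression is not reproduced here.

```latex
% n is the photon number in the cavity mode; <.> denotes the ensemble average.
\[
  g^{(2)}(0) = \frac{\langle n(n-1) \rangle}{\langle n \rangle^{2}}
\]
% Reference values:
%   g^{(2)}(0) = 1   coherent (Poissonian) light
%   g^{(2)}(0) = 2   single-mode thermal light
%   g^{(2)}(0) > 2   super-thermal light, e.g. the value 6 quoted above
```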

翻訳日:2023-11-28 18:05:33 公開日:2023-11-26

# まばらなポーリ・リンドブラッド雑音モデル学習手法

Techniques for learning sparse Pauli-Lindblad noise models ( http://arxiv.org/abs/2311.15408v1 )

ライセンス: Link先を確認

Ewout van den Berg, Pawel Wocjan

(参考訳) 確率的誤差キャンセルやゼロノイズ外挿のような誤差緩和技術は、正確なノイズモデルから恩恵を受ける。 sparse pauli-lindbladノイズモデルは、これらのアプリケーションでもっとも成功したモデルの1つです。既存の実装では、モデルは、キュービット位相に従う一項と二項の局所項を持つ一連の単純なパウリチャネルに分解される。このモデルは、現代の超伝導量子プロセッサの誤差軽減のためのノイズを正確に捉えることが示されているが、最寄りの相互作用を超えた高次項や効果を考慮することが重要である。しかし、そのような拡張モデルが実用的であり続けるためには、それらが効率的に学習できることを保証する必要がある。本研究では,これを実現する新しい手法を提案する。我々は,ポーリ回転に基づくtwirlingを導入することで,単一量子ビットの学習補正シーケンスを自動生成し,学習する必要のある独特なフィデリティの数を減らすことができる。さらに,学習ベース数を最小化するために,グラフカラー化と一様被覆配列を利用する基底選択戦略を提案する。これらの手法を組み合わせることで、拡張されたノイズモデルの学習が、複雑さが増しても効率的であることを保証する。

Error-mitigation techniques such as probabilistic error cancellation and zero-noise extrapolation benefit from accurate noise models. The sparse Pauli-Lindblad noise model is one of the most successful models for those applications. In existing implementations, the model decomposes into a series of simple Pauli channels with one- and two-local terms that follow the qubit topology. While the model has been shown to accurately capture the noise in contemporary superconducting quantum processors for error mitigation, it is important to consider higher-weight terms and effects beyond nearest-neighbor interactions. For such extended models to remain practical, however, we need to ensure that they can be learned efficiently. In this work we present new techniques that accomplish exactly this. We introduce twirling based on Pauli rotations, which enables us to automatically generate single-qubit learning correction sequences and reduce the number of unique fidelities that need to be learned. In addition, we propose a basis-selection strategy that leverages graph coloring and uniform covering arrays to minimize the number of learning bases. Taken together, these techniques ensure that the learning of the extended noise models remains efficient, despite their increased complexity.

翻訳日:2023-11-28 17:58:21 公開日:2023-11-26

# 統計的学習理論を深層学習に適用する

Applying statistical learning theory to deep learning ( http://arxiv.org/abs/2311.15404v1 )

ライセンス: Link先を確認

Cédric Gerbelot, Avetik Karagulyan, Stefani Karp, Kavya Ravichandran, Menachem Stern, Nathan Srebro

(参考訳) 統計的学習理論は教師付き学習を理解するための強固な枠組みを提供するが、深層学習の多くの理論的な側面はいまだに不明であり、特に、異なるアーキテクチャが勾配に基づく方法で訓練された場合、どのように帰納的バイアスをもたらすかである。これらの講義の目的は、学習理論の観点から深層学習を理解しようとするときに生じる主な疑問の概要を提供することである。統計的学習理論と確率的最適化に関する簡単なリマインダーの後、良性過剰の文脈で暗黙のバイアスについて論じる。その後、ミラー降下アルゴリズムの一般的な説明に移り、与えられた学習問題に対するパラメータ空間と対応する関数空間の間の行き来や、学習問題の幾何が計量テンソルによってどのように表現されるかを示す。この枠組みに基づき,線形対角ネットワーク上の勾配降下の暗黙的バイアスを,様々な回帰タスクに対して詳細に検討し,損失関数,初期化時のパラメータスケール,ネットワークの深さが,暗黙的バイアス,特にカーネルや特徴学習間の遷移にどのようにつながるかを示す。

Although statistical learning theory provides a robust framework to understand supervised learning, many theoretical aspects of deep learning remain unclear, in particular how different architectures may lead to inductive bias when trained using gradient based methods. The goal of these lectures is to provide an overview of some of the main questions that arise when attempting to understand deep learning from a learning theory perspective. After a brief reminder on statistical learning theory and stochastic optimization, we discuss implicit bias in the context of benign overfitting. We then move to a general description of the mirror descent algorithm, showing how we may go back and forth between a parameter space and the corresponding function space for a given learning problem, as well as how the geometry of the learning problem may be represented by a metric tensor. Building on this framework, we provide a detailed study of the implicit bias of gradient descent on linear diagonal networks for various regression tasks, showing how the loss function, scale of parameters at initialization and depth of the network may lead to various forms of implicit bias, in particular transitioning between kernel or feature learning.
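
As a concrete instance of the mirror descent algorithm mentioned above, the sketch below uses the negative-entropy mirror map on the probability simplex, which gives the well-known exponentiated-gradient update. The linear toy loss, the step size, and the function name `mirror_descent_step` are illustrative assumptions and are not taken from the lectures.

```python
import math


def mirror_descent_step(w, grad, lr):
    """One mirror descent step with the negative-entropy mirror map.

    The gradient step is taken on log(w) (the dual variables) and the result is
    mapped back to the simplex by normalisation.
    """
    unnormalised = [wi * math.exp(-lr * gi) for wi, gi in zip(w, grad)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Toy problem (assumption): minimise the linear loss <costs, w> over the simplex.
costs = [0.9, 0.1, 0.5]        # the gradient of a linear loss is constant
w = [1 / 3, 1 / 3, 1 / 3]      # uniform initialisation
for _ in range(200):
    w = mirror_descent_step(w, costs, lr=0.5)
print(w)
```

The iterate stays on the simplex without any explicit projection, and the mass concentrates on the coordinate with the smallest cost. The geometry induced by the mirror map, rather than the loss alone, determines which solution is approached, which is the mechanism behind the implicit-bias discussion.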

翻訳日:2023-11-28 17:58:04 公開日:2023-11-26

# マルチラベル文書分類のためのセクション重みの学習

Learning Section Weights for Multi-Label Document Classification ( http://arxiv.org/abs/2311.15402v1 )

ライセンス: Link先を確認

Maziar Moradi Fard, Paula Sorrolla Bayod, Kiomars Motarjem, Mohammad Alian Nejadi, Saber Akhondi, Camilo Thorne

(参考訳) マルチラベル文書分類は、NLPにおける伝統的なタスクである。シングルラベルの分類と比較すると、各文書は複数のクラスに割り当てられる。この問題は科学論文のタグ付けなど、様々な分野において極めて重要である。文書は、しばしば抽象やタイトルなどのいくつかのセクションに分けられる。現在のアプローチでは、異なるセクションを複数ラベルの分類に等しく扱う。これは現実的な仮定ではなく、準最適結果をもたらすと我々は主張する。そこで我々は,複数ラベル分類における各セクションの寄与を利用して,LSW(Learning Section Weights)と呼ばれる新しい手法を提案する。複数のフィードフォワード層によって、LSWは各セクションに重みを割り当て、予測に重みを組み込むことを学ぶ。我々は科学的論文にアプローチを実演する。パブリック(arXiv)およびプライベート(Elsevier)データセットの実験結果は、最先端のマルチラベル文書分類法と比較して、LSWの優位性を確認する。特に、lswはマクロ平均化f1-scoreでは1.3%改善され、公開利用可能なarxivデータセットでのマクロ平均リコールでは1.3%向上した。

Multi-label document classification is a traditional task in NLP. In contrast to single-label classification, each document can be assigned multiple classes. This problem is crucially important in various domains, such as tagging scientific articles. Documents are often structured into several sections such as abstract and title. Current approaches treat different sections equally for multi-label classification. We argue that this is not a realistic assumption and that it leads to sub-optimal results. Instead, we propose a new method called Learning Section Weights (LSW), which leverages the contribution of each distinct section for multi-label classification. Via multiple feed-forward layers, LSW learns to assign a weight to each section and to incorporate these weights in the prediction. We demonstrate our approach on scientific articles. Experimental results on public (arXiv) and private (Elsevier) datasets confirm the superiority of LSW compared to state-of-the-art multi-label document classification methods. In particular, LSW achieves a 1.3% improvement in macro-averaged F1-score and a 1.3% improvement in macro-averaged recall on the publicly available arXiv dataset.
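
The idea of weighting sections before prediction can be sketched as follows. This is an illustration under stated assumptions rather than the LSW architecture: the abstract does not say how the weights are normalized, so a softmax is assumed, and the per-section scores are passed in by hand instead of being produced by learned feed-forward layers. All names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_document_vector(section_vectors, section_scores):
    """Combine per-section embeddings into one document vector.

    section_scores stand in for the outputs of a learned scoring network
    (an assumption of this sketch); softmax turns them into weights.
    """
    weights = softmax(section_scores)
    dim = len(section_vectors[0])
    doc = [sum(w * vec[i] for w, vec in zip(weights, section_vectors))
           for i in range(dim)]
    return weights, doc

# Two toy sections (title, abstract) with equal scores get equal weight.
title_vec = [1.0, 0.0]
abstract_vec = [0.0, 1.0]
weights, doc = weighted_document_vector([title_vec, abstract_vec], [0.0, 0.0])
```

Treating all sections equally corresponds to fixing the scores to a constant, as in the usage lines above; learning the scores lets a classifier rely more on whichever section is most informative.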

翻訳日:2023-11-28 17:57:42 公開日:2023-11-26

# 日常生活行動の現実的シミュレーションのための枠組み

A Framework for Realistic Simulation of Daily Human Activity ( http://arxiv.org/abs/2311.15400v1 )

ライセンス: Link先を確認

Ifrah Idrees, Siddharth Singh, Kerui Xu, Dylan F. Glas

(参考訳) 家庭内のユーザの日常的な動きに反応し適応するAstroのようなソーシャルロボットにとって、機能開発とテストには、人間の活動の現実的なシミュレーションが必要である。本稿では,在宅環境における日常の行動パターンをシミュレーションし,異なるパーソナラや活動パターンの手動構成可能性,活動タイミングの変動,複数のホームレイアウトのテストを行うためのフレームワークを提案する。本稿では,スケジュールの日々の変動を特定する手法を提案し,テンプレートからスケジュールを生成する双方向制約伝搬アルゴリズムを提案する。ユースケースシナリオ分析を用いて、我々のフレームワークの表現力を検証するとともに、3つの公開データセットと自己収集データセットから人間の行動によく似たデータを生成することができることを示す。本研究の貢献は,社会ロボットの大規模行動の体系的テストを支援し,異なる家庭における人間の行動の合成データセットの手続き的生成を可能にし,トレーニングデータのバイアスを最小化し,家庭環境におけるより堅牢で効果的なロボットの実現に寄与する。

For social robots like Astro which interact with and adapt to the daily movements of users within the home, realistic simulation of human activity is needed for feature development and testing. This paper presents a framework for simulating daily human activity patterns in home environments at scale, supporting manual configurability of different personas or activity patterns, variation of activity timings, and testing on multiple home layouts. We introduce a method for specifying day-to-day variation in schedules and present a bidirectional constraint propagation algorithm for generating schedules from templates. We validate the expressive power of our framework through a use case scenario analysis and demonstrate that our method can be used to generate data closely resembling human behavior from three public datasets and a self-collected dataset. Our contribution supports systematic testing of social robot behaviors at scale, enables procedural generation of synthetic datasets of human movement in different households, and can help minimize bias in training data, leading to more robust and effective robots for home environments.

翻訳日:2023-11-28 17:57:20 公開日:2023-11-26

# 線形行動クローニング剤の最適指導

Optimally Teaching a Linear Behavior Cloning Agent ( http://arxiv.org/abs/2311.15399v1 )

ライセンス: Link先を確認

Shubham Kumar Bharti, Stephen Wright, Adish Singla, Xiaojin Zhu

(参考訳) 線形行動クローニング(LBC)学習者の最適指導について検討する。この設定では、教師はLBC学習者に示す状態を選択することができる。学習者は、デモと一致する無限個の線形仮説のバージョン空間を維持する。教師の目標は,最小限の状態デモンストレーション数を用いて,実現可能な目標方策を学習者に教えることである。この数は教示次元(TD)として知られている。本稿では,インスタンス最適なTDを実現する「Teach using Iterative Elimination (TIE)」という指導アルゴリズムを提案する。しかし、最適な教示集合の探索はNP困難であることも示す。さらに、教示次元に対して$\log(|A|-1)$の近似比を保証する近似アルゴリズムを提供する。最後に,本アルゴリズムの効率と有効性を検証する実験結果を提供する。

We study optimal teaching of Linear Behavior Cloning (LBC) learners. In this setup, the teacher can select which states to demonstrate to an LBC learner. The learner maintains a version space of infinite linear hypotheses consistent with the demonstration. The goal of the teacher is to teach a realizable target policy to the learner using the minimum number of state demonstrations. This number is known as the Teaching Dimension (TD). We present a teaching algorithm called "Teach using Iterative Elimination (TIE)" that achieves instance-optimal TD. However, we also show that finding an optimal teaching set is computationally NP-hard. We further provide an approximation algorithm that guarantees an approximation ratio of $\log(|A|-1)$ on the teaching dimension. Finally, we provide experimental results to validate the efficiency and effectiveness of our algorithm.

翻訳日:2023-11-28 17:56:50 公開日:2023-11-26

# 半制約クラスタリングのための制約マッチング

ConstraintMatch for Semi-constrained Clustering ( http://arxiv.org/abs/2311.15395v1 )

ライセンス: Link先を確認

Jann Goschenhofer, Bernd Bischl, Zsolt Kira

(参考訳) 制約付きクラスタリングによって、ペアワイズ制約のみを使用した分類モデルのトレーニングが可能になる。真の基盤となるクラスラベルがなくてもうまく機能するが、制約付きクラスタリングモデルはトレーニングに大量のバイナリ制約アノテーションを必要とする。本稿では,制約の小さなセットとともに大量の制約なし(unconstrained)データを利用できる半教師付きコンテキストを提案し,そのような制約のないデータを活用するために ConstraintMatch を提案する。完全なラベルを用いた半教師付き学習では、多くの進歩がなされているが、制約ベースのラベル設定において、結果のメソッドをナイーブに適用することを妨げる多くの課題がある。したがって、これらの課題について考察し分析する。具体的には、 1) 疑似ラベルの主な弱点である確認バイアスを克服するための擬似制約(pseudo-constraining)メカニズムを提案し、 2) 情報量の多い(informative)制約なしサンプルの選択に向けた擬似ラベル法を開発し、 3) これにより初期損失と補助損失にペアワイズ損失関数を使用でき、半制約型モデルトレーニングが容易になることを示す。大規模実験により,5つの難解なベンチマークにおいて,正規クラスタリングとオーバークラスタリングのシナリオの両方において,関連するベースラインに対するConstraintMatchの有効性を実証し,いくつかのコンポーネントの分析を提供する。

Constrained clustering allows the training of classification models using pairwise constraints only, which are weak and relatively easy to mine, while still yielding full-supervision-level model performance. While they perform well even in the absence of the true underlying class labels, constrained clustering models still require large amounts of binary constraint annotations for training. In this paper, we propose a semi-supervised context whereby a large amount of unconstrained data is available alongside a smaller set of constraints, and propose ConstraintMatch to leverage such unconstrained data. While a great deal of progress has been made in semi-supervised learning using full labels, there are a number of challenges that prevent a naive application of the resulting methods in the constraint-based label setting. Therefore, we reason about and analyze these challenges, specifically 1) proposing a pseudo-constraining mechanism to overcome the confirmation bias, a major weakness of pseudo-labeling, 2) developing new methods for pseudo-labeling towards the selection of informative unconstrained samples, 3) showing that this also allows the use of pairwise loss functions for the initial and auxiliary losses which facilitates semi-constrained model training. In extensive experiments, we demonstrate the effectiveness of ConstraintMatch over relevant baselines in both the regular clustering and overclustering scenarios on five challenging benchmarks and provide analyses of its several components.

翻訳日:2023-11-28 17:56:34 公開日:2023-11-26

# 2層非線形回帰に対する近似ニュートン法の局所収束

Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression ( http://arxiv.org/abs/2311.15390v1 )

ライセンス: Link先を確認

Zhihang Li, Zhao Song, Zifan Wang, Junze Yin

(参考訳) 日常生活の様々な側面において,大規模言語モデル(LLM)による顕著な進歩があった。LLMは自然言語処理における変革の力として機能し、テキスト生成、翻訳、感情分析、質問応答に応用されている。LLMの成果は、この分野における研究努力の大幅な増加につながった。ある特定の2層回帰問題は先行研究でよく研究されており、そこでは第1層がReLUユニットによって活性化され、第2層がsoftmaxユニットによって活性化される。先行研究は2層回帰の構築について堅固な分析を提供するが、2層を超える回帰問題を構成する分析には依然としてギャップがある。本稿では,この問題に対処するための重要なステップとして,2層回帰問題の解析を行う。先行研究とは対照的に、我々の第1層はsoftmaxユニットによって活性化される。これにより、softmax関数に基づいてより多くの活性化関数を作成する将来の分析の土台が整う。softmax関数の配置を変えると、大きく異なる分析になる。主な結果として, 正則化された訓練損失を最小化するために用いられる近似ニュートン法の収束特性を解析する。損失関数のヘッセ行列が、ある仮定の下で正定値かつリプシッツ連続であることを証明する。これにより,提案する訓練アルゴリズムの局所収束保証を確立することができる。具体的には、適切な初期化と$O(\log(1/\epsilon))$回の反復の後、高い確率で訓練損失の$\epsilon$-近似最小化解を見つけることができる。各反復にはおよそ$O(\mathrm{nnz}(C) + d^\omega)$の時間が必要であり、$d$はモデルのサイズ、$C$は入力行列、$\omega < 2.374$は行列乗算指数である。

There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit, and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem. In contrast to previous works, our first layer is activated by a softmax unit. This sets the stage for future analyses of creating more activation functions based on the softmax function. Rearranging the softmax function leads to significantly different analyses. Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the Hessian matrix of the loss function is positive definite and Lipschitz continuous under certain assumptions. This enables us to establish local convergence guarantees for the proposed training algorithm. Specifically, with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, our algorithm can find an $\epsilon$-approximate minimizer of the training loss with high probability. Each iteration requires approximately $O(\mathrm{nnz}(C) + d^\omega)$ time, where $d$ is the model size, $C$ is the input matrix, and $\omega < 2.374$ is the matrix multiplication exponent.

翻訳日:2023-11-28 17:55:56 公開日:2023-11-26

# Spectro-ViT: スペクトログラムを用いたGABA編集MRS再構成のためのVision Transformerモデル

Spectro-ViT: A Vision Transformer Model for GABA-edited MRS Reconstruction Using Spectrograms ( http://arxiv.org/abs/2311.15386v1 )

ライセンス: Link先を確認

Gabriel Dias, Rodrigo Pommot Berto, Mateus Oliveira, Lucas Ueda, Sergio Dertkigil, Paula D. P. Costa, Amirmohammad Shamaei, Roberto Souza, Ashley Harris, Leticia Rittner

(参考訳) 目的: Vision Transformer (ViT) を用いて、通常取得されるトランジェント数の4分の1から、スペクトログラムによりGABA編集磁気共鳴分光法 (MRS) を再構成・ノイズ除去することを検討する。理論と方法: GABA編集MRSスキャンで通常収集されるトランジェント数の4分の1を前処理し、短時間フーリエ変換(STFT)を用いてスペクトログラム画像表現に変換する。データの画像表現により、GABA編集MRSスペクトルを再構成するために事前訓練されたViTを適応させること(Spectro-ViT)が可能になる。Spectro-ViTは微調整され、その後 in vivo のGABA編集MRSデータを用いてテストされる。スペクトル品質指標と推定代謝物濃度値を用いて、Spectro-ViTの性能を文献中の他のモデルと比較した。結果: Spectro-ViTモデルは、5つの定量的指標のうち4つ(平均二乗誤差、形状スコア、GABA+/水のフィット誤差、半値全幅)で、他のすべてのモデルを大きく上回った。推定された代謝物濃度(GABA+/水、GABA+/Cr、Glx/水)は、通常収集される全トランジェントで再構成した典型的なGABA編集MRSスキャンから推定された代謝物濃度と一致した。結論: 提案したSpectro-ViTモデルはGABA編集MRSの再構成において最先端の結果を達成し、この結果はスキャンを最大4倍高速化できる可能性を示している。

Purpose: To investigate the use of a Vision Transformer (ViT) to reconstruct/denoise GABA-edited magnetic resonance spectroscopy (MRS) from a quarter of the typically acquired number of transients using spectrograms. Theory and Methods: A quarter of the typically acquired number of transients collected in GABA-edited MRS scans are pre-processed and converted to a spectrogram image representation using the Short-Time Fourier Transform (STFT). The image representation of the data allows the adaptation of a pre-trained ViT for reconstructing GABA-edited MRS spectra (Spectro-ViT). The Spectro-ViT is fine-tuned and then tested using in vivo GABA-edited MRS data. The Spectro-ViT performance is compared against other models in the literature using spectral quality metrics and estimated metabolite concentration values. Results: The Spectro-ViT model significantly outperformed all other models in four out of five quantitative metrics (mean squared error, shape score, GABA+/water fit error, and full width at half maximum). The metabolite concentrations estimated (GABA+/water, GABA+/Cr, and Glx/water) were consistent with the metabolite concentrations estimated using typical GABA-edited MRS scans reconstructed with the full amount of typically collected transients. Conclusion: The proposed Spectro-ViT model achieved state-of-the-art results in reconstructing GABA-edited MRS, and the results indicate these scans could be up to four times faster.

翻訳日:2023-11-28 17:55:10 公開日:2023-11-26

# ロバストかつ自動的なデータクラスタリング: Dirichlet過程とMedian-of-Meansの出会い

Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means ( http://arxiv.org/abs/2311.15384v1 )

ライセンス: Link先を確認

Supratik Basu, Jyotishka Ray Choudhury, Debolina Paul, Swagatam Das

(参考訳) クラスタリングは、教師なし機械学習の領域における最も顕著な課題の1つである。セントロイドベースのクラスタリングアルゴリズムの配列のうち、ロイドのヒューリスティックに根ざした古典的な$k$-meansアルゴリズムは、文献で広く使われている技法の1つとして中心的な段階を採っている。それでも、$k$-meansとその変種には注目すべき制限がある。これらは、初期クラスター中心に強く依存しており、目的関数の局所的ミニマムへの収束性があり、データの異常値やノイズに対する感受性が高い。ノイズや異常値を含むデータと向き合うと、中央値推定器(mom)が任意のcentroidベースのクラスタリングフレームワークの安定化力として現れる。別の注意として、既存のクラスタリング方法論の中で一般的な制約は、分析の前にクラスタ数に関する前提知識にある。ベイズ非パラメトリックモデルのようなモデルベース手法を利用することで、無限混合モデルの利点が得られるため、そのような要求を回避できる。本稿では,これらの事実に動機づけられて,クラスタ数を事前に指定せずに,ノイズがクラスタ品質に与える影響を緩和するモデルベースおよびセンタロイドベース手法の原則を統合することにより,効率的かつ自動的なクラスタリング手法を提案する。クラスタリングエラーの上限に関する統計的保証と、シミュレーションおよび実データによる厳密な評価は、既存のクラスタリングアルゴリズムよりも提案手法の利点を示唆している。

Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These encompass a heavy reliance on initial cluster centroids, susceptibility to converging into local minima of the objective function, and sensitivity to outliers and noise in the data. When confronted with data containing noisy or outlier-laden observations, the Median-of-Means (MoM) estimator emerges as a stabilizing force for any centroid-based clustering framework. On a different note, a prevalent constraint among existing clustering methodologies resides in the prerequisite knowledge of the number of clusters prior to analysis. Utilizing model-based methodologies, such as Bayesian nonparametric models, offers the advantage of infinite mixture models, thereby circumventing the need for such requirements. Motivated by these facts, in this article, we present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies that mitigates the effect of noise on the quality of clustering while ensuring that the number of clusters need not be specified in advance. Statistical guarantees on the upper bound of clustering error, and rigorous assessment through simulated and real datasets suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
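
The Median-of-Means (MoM) estimator that the abstract relies on for robustness can be sketched in a few lines. This is a minimal one-dimensional illustration, not the paper's clustering procedure; the deterministic strided partition into blocks is an assumption made here for reproducibility (random partitions are the usual choice), and the function name is hypothetical.

```python
import statistics

def median_of_means(values, num_blocks):
    """Split the data into blocks, average each block, return the median.

    A few extreme outliers can corrupt only a few block means, so the
    median of the block means stays close to the bulk of the data.
    """
    # Strided partition: block k holds items k, k + num_blocks, ...
    blocks = [values[i::num_blocks] for i in range(num_blocks)]
    block_means = [sum(b) / len(b) for b in blocks]
    return statistics.median(block_means)

# Eight observations near 1.0 plus one gross outlier.
data = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 1000.0]
mom = median_of_means(data, num_blocks=3)
plain_mean = sum(data) / len(data)
```

In this toy data the outlier lands in a single block, so the MoM estimate stays near 1 while the plain mean is pulled above 100. A centroid-based method can use the same estimator in place of the ordinary mean when updating cluster centers.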

翻訳日:2023-11-28 17:54:40 公開日:2023-11-26

# ゼロショットオープン語彙3次元視覚グラウンドのためのビジュアルプログラミング

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding ( http://arxiv.org/abs/2311.15383v1 )

ライセンス: Link先を確認

Zhihao Yuan, Jinke Ren, Chun-Mei Feng, Hengshuang Zhao, Shuguang Cui, Zhen Li

(参考訳) 3Dビジュアルグラウンド(3DVG)はテキスト記述に基づく3Dオブジェクトのローカライズを目的としている。従来の3DVGの教師付き手法は、しばしば広範囲のアノテーションと事前定義された語彙を必要とする。この問題に対処するために,大規模言語モデル(LLM)の能力を活かしたゼロショットオープン語彙3DVGのための新しいビジュアルプログラミング手法を提案する。提案手法は,ゼロショット3DVGの基本的な理解を確立するため,LLMに係わるユニークなダイアログベースの手法から始まる。これに基づいて、ビュー非依存、ビュー依存、機能モジュールという3つのタイプのモジュールからなる視覚プログラムを設計する。これらのモジュールは、特に3Dシナリオに適したもので、複雑な推論と推論を実行するために協調して動作する。さらに,既存の3次元オブジェクト検出器の範囲をオープン語彙シナリオに拡張する言語オブジェクト相関モジュールを開発した。我々のゼロショットアプローチは、いくつかの教師付きベースラインより優れており、効果的な3DVGへの大きな前進を示している。

3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual descriptions. Conventional supervised methods for 3DVG often necessitate extensive annotations and a predefined vocabulary, which can be restrictive. To address this issue, we propose a novel visual programming approach for zero-shot open-vocabulary 3DVG, leveraging the capabilities of large language models (LLMs). Our approach begins with a unique dialog-based method, engaging with LLMs to establish a foundational understanding of zero-shot 3DVG. Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules. These modules, specifically tailored for 3D scenarios, work collaboratively to perform complex reasoning and inference. Furthermore, we develop an innovative language-object correlation module to extend the scope of existing 3D object detectors into open-vocabulary scenarios. Extensive experiments demonstrate that our zero-shot approach can outperform some supervised baselines, marking a significant stride towards effective 3DVG.

翻訳日:2023-11-28 17:54:07 公開日:2023-11-26

# フェデレーション学習のためのマルチグローバルサーバアーキテクチャの評価

Evaluating Multi-Global Server Architecture for Federated Learning ( http://arxiv.org/abs/2311.15382v1 )

ライセンス: Link先を確認

Asfia Kawnine, Hung Cao, Atah Nuh Mih, Monica Wachowicz

(参考訳) 単一のグローバルサーバフレームワークによるフェデレーション学習(fl)は現在、モバイルデバイスやエッジデバイスといった分散環境でマシンラーニングモデルをトレーニングするための一般的なアプローチである。しかしながら、集中型サーバアーキテクチャは、中央/グローバルサーバ上のあらゆる課題がシステム全体の障害を引き起こすため、リスクを負う。このリスクを最小限に抑えるために,複数のグローバルサーバのデプロイを活用する新しいフェデレーション学習フレームワークを提案する。フェデレーション学習における複数のグローバルサーバの実装は,局所的なコラボレーションと知識の集約を生かして効率を向上し,単一サーバフレームワークにおける通信障害に対するエラー耐性を処理できることを実証する。そこで我々は,複数のグローバルサーバの展開を利用する新しいフレームワークを提案する。複数駅における電気自動車(ev)充電の事象履歴を含むデータセットを用いて,一連の実験を行った。複数のグローバルサーバとクライアントサーバを連携させて,各クライアントサーバが異なるリージョンを戦略的に表現し,グローバルサーバがそれらのデバイスからローカル更新を集約する役割を担った。グローバルモデルの予備結果は、複数のサーバに起因するパフォーマンスの差が1%未満であることを示している。モデル効率が向上するという仮説は期待通りではなかったが、アルゴリズムに付加された通信課題を扱うための規則は、誤り耐性の問題を解決した。将来の研究は、複数のグローバルサーバをデプロイするための特定の用途を特定することに集中できる。

Federated learning (FL) with a single global server framework is currently a popular approach for training machine learning models in decentralized environments, such as mobile devices and edge devices. However, the centralized server architecture poses a risk, as any challenge on the central/global server would result in the failure of the entire system. To minimize this risk, we propose a novel federated learning framework that leverages the deployment of multiple global servers. We posit that implementing multiple global servers in federated learning can enhance efficiency by capitalizing on local collaborations and aggregating knowledge, and can provide tolerance to the communication failures that affect the single-server framework. We conducted a series of experiments using a dataset containing the event history of electric vehicle (EV) charging at numerous stations. We deployed a federated learning setup with multiple global servers and client servers, where each client server strategically represented a different region and a global server was responsible for aggregating local updates from those devices. Our preliminary results of the global models demonstrate that the difference in performance attributed to multiple servers is less than 1%. While the hypothesized gain in model efficiency did not materialize as expected, the rule for handling communication challenges added to the algorithm could resolve the error-tolerance issue. Future research can focus on identifying specific uses for the deployment of multiple global servers.

翻訳日:2023-11-28 17:53:46 公開日:2023-11-26

# 計算効率の向上とAI能力の拡散

Increased Compute Efficiency and the Diffusion of AI Capabilities ( http://arxiv.org/abs/2311.15377v1 )

ライセンス: Link先を確認

Konstantin Pilz, Lennart Heim, Nicholas Brown

(参考訳) 高度なaiモデルのトレーニングには、計算リソースや計算に多大な投資が必要です。しかし、ハードウェアの革新が計算とアルゴリズムの進歩の価格を下げるにつれ、AIモデルを所定のパフォーマンスにトレーニングするコストは時間の経過とともに低下する。この現象を分析するために、計算投資のトレーニングと結果のAIモデルの性能を関連付ける計算(投資)効率を導入する。次に,計算効率の向上の概念モデルを提案し,社会的・統治的意味を評価する。アクセス効果は、時間とともにモデルをトレーニングできるアクターの数を増やすが、パフォーマンス効果は、大きな計算投資家が新しい機能を開拓し、能力が拡散してもパフォーマンス上の優位性を維持することができるように、アクターに利用可能なパフォーマンスを同時に向上させる。相対的なパフォーマンスの優位性はゼロサム競争において大きな利益をもたらすかもしれないが、パフォーマンスの天井はリーダーの優位性を減少させる可能性がある。それでも、最も深刻なリスクが最も先進的なモデルから生じた場合、大きな計算投資家は、まず危険な能力を発見すれば、特に精査を保証できる。そのため政府は、大規模な計算投資家に対して、危険な能力について警告し、適切な準備と、優れたモデルパフォーマンスと防御手段の計算アクセスを可能とするよう要求すべきである。過度なリスク、特に犯罪支配能力の場合、政府は完全に増殖を制限する必要があるかもしれない。

Training advanced AI models requires large investments in computational resources, or compute. Yet, as hardware innovation reduces the price of compute and algorithmic advances make its use more efficient, the cost of training an AI model to a given performance falls over time. To analyze this phenomenon, we introduce compute (investment) efficiency, which relates training compute investment to the resulting AI model performance. We then present a conceptual model of increases in compute efficiency and assess the social and governance implications. We find that while an access effect increases the number of actors who can train models to a given performance over time, a performance effect simultaneously increases the performance available to every actor - potentially enabling large compute investors to pioneer new capabilities and maintain a performance advantage even as capabilities diffuse. The market effects are multifaceted: while a relative performance advantage might grant outsized benefits in zero-sum competition, performance ceilings might reduce leaders' advantage. Nonetheless, we find that if the most severe risks arise from the most advanced models, large compute investors warrant particular scrutiny since they discover potentially dangerous capabilities first. Consequently, governments should require large compute investors to warn them about dangerous capabilities, thereby enabling timely preparation and potentially using their superior model performance and compute access for defensive measures. In cases of extreme risks, especially offense-dominant capabilities, the government might need to actively restrict the proliferation entirely.

翻訳日:2023-11-28 17:53:22 公開日:2023-11-26

# MI攻撃に必要なのは信頼だけ

Confidence Is All You Need for MI Attacks ( http://arxiv.org/abs/2311.15373v1 )

ライセンス: Link先を確認

Abhishek Sinha, Himanshi Tibrewal, Mansi Gupta, Nikhar Waghela, Shivank Garg

(参考訳) 機械学習のセキュリティの進化期において、機密データの機密性に対する強力な脅威としてメンバーシップ推論攻撃が出現した。この攻撃では、敵はターゲットモデルのトレーニング中に特定のポイントが使用されたかどうかを判定する。本稿では,モデルのトレーニングセットにおけるデータポイントのメンバシップを計測する新しい手法を提案する。伝統的に行われているように、損失とメンバシップを関連付ける代わりに、トレーニング例が一般的に実際のクラスに分類された時に高い信頼度を示すという事実を活用しています。トレーニング中、モデルは基本的にトレーニングデータに適合しており、見えないデータに対する一般化において特に困難に直面する可能性がある。この非対称性は、トレーニングデータに存在する特定のパターンやノイズを利用するため、トレーニングデータに対する信頼性を高めるモデルにつながる。提案手法は,機械学習モデルが生成する信頼度値を活用する。これらの信頼度は、予測におけるモデルの確信度を確率論的に測定し、与えられたデータポイントのメンバシップを推測するためにさらに利用できる。さらに,与えられたデータポイントの基底真理(真のクラス)を知らずにこの攻撃を実行できる別の手法を導入することにより,既存のラベル依存型攻撃手法に対するエッジを提供する。

In this evolving era of machine learning security, membership inference attacks have emerged as a potent threat to the confidentiality of sensitive data. In this attack, adversaries aim to determine whether a particular point was used during the training of a target model. This paper proposes a new method to gauge a data point's membership in a model's training set. Instead of correlating loss with membership, as is traditionally done, we have leveraged the fact that training examples generally exhibit higher confidence values when classified into their actual class. During training, the model is essentially being 'fit' to the training data and might face particular difficulties in generalization to unseen data. This asymmetry leads to the model achieving higher confidence on the training data as it exploits the specific patterns and noise present in the training data. Our proposed approach leverages the confidence values generated by the machine learning model. These confidence values provide a probabilistic measure of the model's certainty in its predictions and can further be used to infer the membership of a given data point. Additionally, we also introduce another variant of our method that allows us to carry out this attack without knowing the ground truth (true class) of a given data point, thus offering an edge over existing label-dependent attack methods.
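
The confidence-based decision rule described above can be sketched as a simple threshold test. This is an illustration under assumptions, not the authors' attack: the threshold value is arbitrary, and using the maximum class probability for the label-free variant is a guess at how such a variant could work, since the abstract does not specify it. The function name is hypothetical.

```python
def predict_membership(probabilities, threshold=0.9, true_label=None):
    """Guess whether a sample was in the training set from model confidence.

    probabilities: the target model's output distribution for one sample.
    true_label:    index of the true class if known; if None, fall back to
                   the top predicted class (assumed label-free variant).
    threshold:     hypothetical cut-off; in practice it would be calibrated.
    """
    if true_label is None:
        confidence = max(probabilities)
    else:
        confidence = probabilities[true_label]
    return confidence >= threshold

# A very confident prediction on the true class is flagged as a member,
# a diffuse prediction is not.
likely_member = predict_membership([0.97, 0.02, 0.01], true_label=0)
likely_non_member = predict_membership([0.40, 0.35, 0.25])
```

The rule exploits the asymmetry the abstract describes: a model fitted to its training data tends to be more confident on those points than on unseen ones.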

翻訳日:2023-11-28 17:52:57 公開日:2023-11-26

# TD-Net : スパース・ビューCT再構成のためのトリドメインネットワーク

TD-Net: A Tri-domain network for sparse-view CT reconstruction ( http://arxiv.org/abs/2311.15369v1 )

ライセンス: Link先を確認

Xinyuan Wang and Changqing Su and Bo Xiong

(参考訳) X線被曝リスクの低減を目的としたスパースビューCT再構成は、しばしば画質劣化に悩まされ、ノイズやアーティファクトとして現れる。既存の後処理やデュアルドメインの手法は、被曝低減には有効であるが、しばしば過度に平滑化された結果につながり、診断上の明瞭さを損なう。そこで本研究では,サイノグラム,画像,周波数領域の最適化を統合したトリドメイン手法TD-Netを提案する。周波数スーパービジョンモジュール(FSM)を組み込むことで、TD-Netは複雑な詳細を十分に保存し、過度な平滑化の問題を克服する。広範な評価は、スパースビューからの高画質CT画像の再構成におけるTD-Netの優れた性能を示し、放射線安全性と画像忠実度を効率的に両立する。様々なノイズシナリオにおけるTD-Netの機能強化は、医療画像のブレークスルーとしての可能性を強調している。

Sparse-view CT reconstruction, aimed at reducing X-ray radiation risks, frequently suffers from image quality degradation, manifested as noise and artifacts. Existing post-processing and dual-domain techniques, although effective in radiation reduction, often lead to over-smoothed results, compromising diagnostic clarity. Addressing this, we introduce TD-Net, a pioneering tri-domain approach that unifies sinogram, image, and frequency domain optimizations. By incorporating a Frequency Supervision Module (FSM), TD-Net adeptly preserves intricate details, overcoming the prevalent over-smoothing issue. Extensive evaluations demonstrate TD-Net's superior performance in reconstructing high-quality CT images from sparse views, efficiently balancing radiation safety and image fidelity. The enhanced capabilities of TD-Net in varied noise scenarios highlight its potential as a breakthrough in medical imaging.

翻訳日:2023-11-28 17:52:37 公開日:2023-11-26

# ビデオインペインティングのためのフローガイド拡散

Flow-Guided Diffusion for Video Inpainting ( http://arxiv.org/abs/2311.15368v1 )

ライセンス: Link先を確認

Bohai Gu, Yongsheng Yu, Heng Fan, Libo Zhang

(参考訳) ビデオインペインティングは、大きな動きや低照度条件といった複雑なシナリオに挑戦されている。新たな拡散モデルを含む現在の手法は、品質と効率の限界に直面している。本稿では,本論文で紹介するfgdvi(flow-guided diffusion model for video inpainting)について紹介する。我々は,1ステップ潜時伝播の高精度化に光フローを用い,モデル非依存な潜時補間手法を導入する。このテクニックは、追加のトレーニングなしで、任意のビデオ拡散モデル(vdm)とシームレスに統合する。我々のFGDVIは、既存の最先端手法に比べて、フローワープ誤差E_warpが10%向上したことを示す。包括的実験によりFGDVIの優れた性能が検証され,高度な映像のインペイントに期待できる方向性が得られた。コードと詳細な結果はhttps://github.com/nevsnev/fgdviで公開されている。

Video inpainting has been challenged by complex scenarios like large movements and low-light conditions. Current methods, including emerging diffusion models, face limitations in quality and efficiency. This paper introduces the Flow-Guided Diffusion model for Video Inpainting (FGDVI), a novel approach that significantly enhances temporal consistency and inpainting quality by reusing an off-the-shelf image generation diffusion model. We employ optical flow for precise one-step latent propagation and introduce a model-agnostic flow-guided latent interpolation technique. This technique expedites denoising and integrates seamlessly with any Video Diffusion Model (VDM) without additional training. Our FGDVI demonstrates a remarkable 10% improvement in flow warping error E_warp over existing state-of-the-art methods. Our comprehensive experiments validate the superior performance of FGDVI, offering a promising direction for advanced video inpainting. The code and detailed results will be publicly available at https://github.com/NevSNev/FGDVI.

翻訳日:2023-11-28 17:52:20 公開日:2023-11-26

# GGNN : 残差接続と重み付きメッセージパッシングを用いたGNNの一般化

GGNNs : Generalizing GNNs using Residual Connections and Weighted Message Passing ( http://arxiv.org/abs/2311.15448v1 )

ライセンス: Link先を確認

Abhinav Raghuvanshi and Kushal Sokke Malleshappa

(参考訳) 多くの実世界の現象はグラフとしてモデル化することができ、その普遍的存在のために非常に価値がある。 GNNはこれらのグラフ内の関係やパターンを捉え、効果的な学習と予測タスクを可能にする。 GNNはMulti-Layer Perceptrons (MLP)を使用して構築され、ノード間の機能のフローを容易にするためにメッセージパッシングのための追加レイヤが組み込まれている。一般に、GNNの一般化力は、ノードが隣人と情報を交換し、グラフのノード間で情報を効果的に取得し、伝播することができる層間のメッセージパッシング機構に起因していると考えられている。提案手法は,各ノードにアキュミュレートする前にメッセージを重み付けし,Residual接続を追加することによって,メッセージパッシング機構をさらに改良する。この2つのメカニズムは学習の大幅な改善とより高速な収束を示す

Many real-world phenomena can be modeled as graphs, which makes graphs extremely valuable due to their ubiquitous presence. GNNs excel at capturing the relationships and patterns within these graphs, enabling effective learning and prediction tasks. GNNs are constructed using Multi-Layer Perceptrons (MLPs) and incorporate additional layers for message passing to facilitate the flow of features among nodes. It is commonly believed that the generalizing power of GNNs is attributed to the message-passing mechanism between layers, where nodes exchange information with their neighbors, enabling them to effectively capture and propagate information across the nodes of a graph. Our technique builds on these results, modifying the message-passing mechanism further: first by weighting the messages before accumulating them at each node, and second by adding residual connections. These two mechanisms show significant improvements in learning and faster convergence.
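
The two modifications named in the abstract, weighting messages before accumulation and adding a residual connection, can be sketched on scalar node features as follows. This is a minimal illustration and not the authors' layer: the edge weights are fixed by hand here, whereas a trained model would presumably learn them, and the dictionary-based graph encoding and all names are assumptions.

```python
def message_passing_step(features, neighbors, edge_weights):
    """One round of weighted message passing with a residual connection.

    features:     node -> scalar feature
    neighbors:    node -> list of nodes that send it messages
    edge_weights: (source, target) -> weight applied to the message
    """
    updated = {}
    for node, value in features.items():
        # Weight each incoming message before accumulating it.
        aggregated = sum(edge_weights[(src, node)] * features[src]
                         for src in neighbors[node])
        # Residual connection: keep the node's own feature and add the update.
        updated[node] = value + aggregated
    return updated

features = {"a": 1.0, "b": 2.0, "c": 3.0}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
edge_weights = {("b", "a"): 0.5, ("c", "a"): 0.25, ("a", "b"): 1.0, ("a", "c"): 0.0}
updated = message_passing_step(features, neighbors, edge_weights)
```

Setting every weight to 1 and dropping the `value +` term recovers plain sum aggregation, which makes the effect of each modification easy to isolate.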

翻訳日:2023-11-28 17:43:57 公開日:2023-11-26

# FLAIR: 顔ビデオ復元のための条件付き拡散フレームワーク

FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration ( http://arxiv.org/abs/2311.15445v1 )

ライセンス: Link先を確認

Zihao Zou and Jiaming Liu and Shirin Shoushtari and Yubo Wang and Weijie Gan and Ulugbek S. Kamilov

(参考訳) 顔画像復元(FVR)は、低品質の入力から知覚的にリアルな顔映像を復元しようとする、難しいが重要な問題である。拡散確率モデル(dpms)は顔画像の復元において顕著な性能を発揮することが示されているが、しばしば時間的に一貫性のある高品質な映像を保存できず、再構成された顔の忠実さを損なう。 FLAIR for FVRと呼ばれる新しい条件拡散フレームワークを提案する。 FLAIRは、従来の画像DPMをビデオDPMに変換することにより、フレーム間の時間的一貫性を計算的に効率的に確保する。提案した変換は、繰り返しビデオリファインメント層と、異なるスケールでの時間的自己アテンションを用いる。 FLAIRはまた、推論中に知覚品質と歪み品質のバランスをとるために条件付き反復精製プロセスを使用する。このプロセスは、2つの重要なコンポーネントから構成される:データ一貫性モジュールは、生成されたビデオがその劣化した観察に正確に一致することを解析的に保証する。ビデオの超解像、デブロアリング、JPEG復元、および2つの高品質な顔ビデオデータセットに対する時空フレーム補間において、FLAIRが現在最先端(SOTA)よりも優れていることを示す。

Face video restoration (FVR) is a challenging but important problem where one seeks to recover perceptually realistic face videos from a low-quality input. While diffusion probabilistic models (DPMs) have been shown to achieve remarkable performance for face image restoration, they often fail to preserve temporally coherent, high-quality videos, compromising the fidelity of reconstructed faces. We present a new conditional diffusion framework called FLAIR for FVR. FLAIR ensures temporal consistency across frames in a computationally efficient fashion by converting a traditional image DPM into a video DPM. The proposed conversion uses a recurrent video refinement layer and temporal self-attention at different scales. FLAIR also uses a conditional iterative refinement process to balance the perceptual and distortion quality during inference. This process consists of two key components: a data-consistency module that analytically ensures that the generated video precisely matches its degraded observation and a coarse-to-fine image enhancement module specifically for facial regions. Our extensive experiments show the superiority of FLAIR over the current state-of-the-art (SOTA) for video super-resolution, deblurring, JPEG restoration, and space-time frame interpolation on two high-quality face video datasets.

翻訳日:2023-11-28 17:43:41 公開日:2023-11-26

# 量子拡散モデル

Quantum Diffusion Models ( http://arxiv.org/abs/2311.15444v1 )

ライセンス: Link先を確認

Andrea Cacioppo, Lorenzo Colantonio, Simone Bordoni and Stefano Giagu

(参考訳) 我々は生成拡散モデルの量子バージョンを提案する。このアルゴリズムでは、ニューラルネットワークは量子状態を直接生成するためにパラメータ化された量子回路に置き換えられる。我々はアルゴリズムの完全な量子バージョンと潜在量子バージョンの両方を示し、これらのモデルの条件付きバージョンも提示する。モデルの性能は質的評価によって補完される定量的指標を用いて評価されてきた。アルゴリズムの簡易版の実装は、実際のNISQ量子ハードウェア上で実行されている。

We propose a quantum version of a generative diffusion model. In this algorithm, artificial neural networks are replaced with parameterized quantum circuits, in order to directly generate quantum states. We present both a full quantum and a latent quantum version of the algorithm; we also present a conditioned version of these models. The models' performances have been evaluated using quantitative metrics complemented by qualitative assessments. An implementation of a simplified version of the algorithm has been executed on real NISQ quantum hardware.

翻訳日:2023-11-28 17:43:17 公開日:2023-11-26

# simplex 構造を用いたグラフィックプリミティブの効率的な符号化

Efficient Encoding of Graphics Primitives with Simplex-based Structures ( http://arxiv.org/abs/2311.15439v1 )

ライセンス: Link先を確認

Yibo Wen, Yunfan Yang

(参考訳) グリッドベースの構造は、実装が単純であることから、画像、符号付き距離関数(SDF)、ニューラルレイディアンスフィールド(NeRF)などのグラフィックプリミティブの明示的な特徴を符号化するのに一般的に用いられる。しかし、$n$次元空間では、サンプリングされた点の値を計算するには、その$2^n$個の隣接する頂点の値を補間する必要がある。次元に対する指数的なスケーリングは、大きな計算オーバーヘッドをもたらす。この問題に対処するため、本稿では、グラフィックプリミティブを符号化するためのsimplexベースの手法を提案する。simplexベースの構造における頂点の数は次元とともに線形に増加するので、グリッドベースの表現よりも効率的で一般化しやすい。軸に整列しないsimplicial構造の性質を用いて、simplexノイズアルゴリズムの変換手順に類似した、効率的なサンプリングのための座標変換、simplicial subdivision、barycentric interpolationのスキームを導出し、証明する。最後に、ハッシュテーブルを使用してsimplicialグリッド内のすべての関心点の多重解像度の特徴を格納し、それらを小さな全結合ニューラルネットワークに渡してグラフィックプリミティブをパラメータ化する。我々は、simplexベースの構造符号化アルゴリズムをC++とCUDAで実装した。2次元画像フィッティングタスクにおいて、提案手法は、同じ品質と圧縮率を維持しつつ、instant-ngpで提案されたベースライン手法に比べて9.4%短い時間でギガピクセル画像をフィッティングできる。ボリュームレンダリングの設定では、サンプルが十分に密な場合に最大41.2%の高速化を観測した。

Grid-based structures are commonly used to encode explicit features for graphics primitives such as images, signed distance functions (SDF), and neural radiance fields (NeRF) due to their simple implementation. However, in $n$-dimensional space, calculating the value of a sampled point requires interpolating the values of its $2^n$ neighboring vertices. The exponential scaling with dimension leads to significant computational overheads. To address this issue, we propose a simplex-based approach for encoding graphics primitives. The number of vertices in a simplex-based structure increases linearly with dimension, making it a more efficient and generalizable alternative to grid-based representations. Using the non-axis-aligned simplicial structure property, we derive and prove a coordinate transformation, simplicial subdivision, and barycentric interpolation scheme for efficient sampling, which resembles transformation procedures in the simplex noise algorithm. Finally, we use hash tables to store multiresolution features of all interest points in the simplicial grid, which are passed into a tiny fully connected neural network to parameterize graphics primitives. We implemented a detailed simplex-based structure encoding algorithm in C++ and CUDA using the methods outlined in our approach. In the 2D image fitting task, the proposed method is capable of fitting a giga-pixel image with 9.4% less time compared to the baseline method proposed by instant-ngp, while maintaining the same quality and compression rate. In the volumetric rendering setup, we observe a maximum 41.2% speedup when the samples are dense enough.
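The vertex-count argument in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: instead of the skewed simplex-noise transform the abstract mentions, it assumes the simpler Kuhn (sorted-coordinate) subdivision of an axis-aligned unit cell, and the names `simplex_interpolate` and `vertex_value` are hypothetical. The point is only that barycentric interpolation touches $n+1$ vertices rather than the $2^n$ corners used by multilinear grid interpolation.

```python
import numpy as np

def simplex_interpolate(point, vertex_value):
    """Barycentric interpolation inside the Kuhn simplex that contains `point`.

    `vertex_value` maps an integer lattice vertex (a tuple) to a scalar.
    Assumption: the unit cell is split by sorting fractional coordinates;
    the paper itself uses a non-axis-aligned simplicial structure.
    """
    x = np.asarray(point, dtype=float)
    base = np.floor(x).astype(int)
    frac = x - base
    order = np.argsort(-frac)                  # axes by decreasing fractional part
    sorted_frac = frac[order]
    # Barycentric weights are consecutive differences of 1, f_(1), ..., f_(n), 0.
    padded = np.concatenate(([1.0], sorted_frac, [0.0]))
    weights = padded[:-1] - padded[1:]
    vertex = base.copy()
    result = weights[0] * vertex_value(tuple(vertex))
    for k, axis in enumerate(order, start=1):
        vertex[axis] += 1                      # walk along one axis per step
        result += weights[k] * vertex_value(tuple(vertex))
    return result, len(weights)                # len(weights) == n + 1

def linear_field(v):
    """Toy feature field; any affine function is reproduced exactly."""
    return 2.0 * v[0] + 3.0 * v[1] - 1.0 * v[2] + 1.0

value, vertices_used = simplex_interpolate((0.3, 1.7, 2.2), linear_field)
```

In three dimensions this sketch reads 4 vertices per sample, where trilinear interpolation would read 8; the gap widens exponentially as the dimension grows.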

翻訳日:2023-11-28 17:43:11 公開日:2023-11-26

# ProtoArgNet: Super-Prototypes and Argumentationによる解釈可能な画像分類 [技術報告]

ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation [Technical Report] ( http://arxiv.org/abs/2311.15438v1 )

ライセンス: Link先を確認

Hamed Ayoobi, Nico Potyka, Francesca Toni

(参考訳) ProtoPNetなどに見られるprototypical-part-learningの精神に基づく、画像分類のための新しい解釈可能なディープニューラルアーキテクチャであるProtoArgNetを提案する。従来のアプローチは各クラスを複数のprototypical-partに関連付けるが、ProtoArgNetは、prototypical-partを単一のプロトタイプ的なクラス表現にまとめるsuper-prototypeを使用する。さらに、従来のアプローチはProtoPNetのロジスティック回帰のような解釈可能な分類層を使用するが、ProtoArgNetは、ある種の論証(argumentation)に基づく解釈可能な読み取りに依拠しつつ、多層パーセプトロンによって精度を向上させる。ProtoArgNetは、多層パーセプトロン/論証コンポーネントのスパース化のプロセスによって、ユーザの認知的要件に合わせてカスタマイズできる。また、他のprototypical-part-learningアプローチとは対照的に、ProtoArgNetは、CNNが前段の層で認識されたパターン間の関係を捉えるのと同様に、画像内の異なる領域に由来する異なるprototypical-part間の空間関係を認識できる。

We propose ProtoArgNet, a novel interpretable deep neural architecture for image classification in the spirit of prototypical-part-learning as found, e.g. in ProtoPNet. While earlier approaches associate every class with multiple prototypical-parts, ProtoArgNet uses super-prototypes that combine prototypical-parts into single prototypical class representations. Furthermore, while earlier approaches use interpretable classification layers, e.g. logistic regression in ProtoPNet, ProtoArgNet improves accuracy with multi-layer perceptrons while relying upon an interpretable reading thereof based on a form of argumentation. ProtoArgNet is customisable to user cognitive requirements by a process of sparsification of the multi-layer perceptron/argumentation component. Also, as opposed to other prototypical-part-learning approaches, ProtoArgNet can recognise spatial relations between different prototypical-parts that are from different regions in images, similar to how CNNs capture relations between patterns recognized in earlier layers.

翻訳日:2023-11-28 17:42:42 公開日:2023-11-26

# 緩和された自然シーン統計モデルの下での品質モデリング

Quality Modeling Under A Relaxed Natural Scene Statistics Model ( http://arxiv.org/abs/2311.15437v1 )

ライセンス: Link先を確認

Abhinau K. Venkataramanan and Alan C. Bovik

(参考訳) 視覚情報忠実度(VIF)や時空間縮小参照エントロピー差(ST-RRED)などの情報理論的な画像品質評価(IQA)モデルは、自然シーン統計(NSS)と情報理論をシームレスに統合することで大きな成功を収めてきた。自然画像のウェーブレットサブバンド係数を支配するガウススケール混合(GSM)モデルが、これらのアルゴリズムの基礎となっている。しかし、ソーシャルメディア上のユーザー生成コンテンツは、通常、多数の未知の劣化のうち1つ以上によって歪められており、その爆発的な増加は、単純なGSMモデルに依存するNSSベースのIQAモデルの限界を明らかにした。本稿では、多変量一般化ガウス分布(MGGD)の有用な性質を導出し、それらを用いて一般化GSM(GGSM)モデルの下でのVIFの挙動を調べることで、VIF指標の精緻化を試みる。

Information-theoretic image quality assessment (IQA) models such as Visual Information Fidelity (VIF) and Spatio-temporal Reduced Reference Entropic Differences (ST-RRED) have enjoyed great success by seamlessly integrating natural scene statistics (NSS) with information theory. The Gaussian Scale Mixture (GSM) model that governs the wavelet subband coefficients of natural images forms the foundation for these algorithms. However, the explosion of user-generated content on social media, which is typically distorted by one or more of many possible unknown impairments, has revealed the limitations of NSS-based IQA models that rely on the simple GSM model. Here, we seek to elaborate the VIF index by deriving useful properties of the Multivariate Generalized Gaussian Distribution (MGGD), and using them to study the behavior of VIF under a Generalized GSM (GGSM) model.

翻訳日:2023-11-28 17:42:20 公開日:2023-11-26

# 言語モデリングのためのスキップ学習

Learning to Skip for Language Modeling ( http://arxiv.org/abs/2311.15436v1 )

ライセンス: Link先を確認

Dewen Zeng, Nan Du, Tao Wang, Yuanzhong Xu, Tao Lei, Zhifeng Chen, Claire Cui

(参考訳) 過パラメータ化された大規模言語モデルは、文脈内few-shot学習において顕著な汎化性能を示す。しかし、ほとんどの言語モデルは、入力データの複雑さや重要性を考慮せず、各トークンに同じ量のパラメータや計算を割り当てている。我々は、言語モデルの事前学習では異なるトークンに可変量の計算を割り当てるべきであり、これは単純なルーティング機構によって効率的に実現できると主張する。トークンが初期の層でのみ早期に終了できる従来の早期停止技術とは異なり、バイナリルータを用いて任意の入力トークンに対する層(あるいはモジュール)の実行を動的にスキップする、より一般的な手法を提案する。24個のNLPタスクにわたる広範な評価において、提案手法は、推論時のわずかな追加コストだけで、他の競合ベースラインと比較して1-shot性能を大幅に向上できることを示す。

Overparameterized large-scale language models show impressive generalization performance in in-context few-shot learning. However, most language models allocate the same amount of parameters or computation to each token, disregarding the complexity or importance of the input data. We argue that in language model pretraining, a variable amount of computation should be assigned to different tokens, and this can be efficiently achieved via a simple routing mechanism. Different from conventional early stopping techniques where tokens can early exit at only early layers, we propose a more general method that dynamically skips the execution of a layer (or module) for any input token with a binary router. In our extensive evaluation across 24 NLP tasks, we demonstrate that the proposed method can significantly improve the 1-shot performance compared to other competitive baselines at only a mild extra cost for inference.
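The per-token skipping idea in this abstract can be sketched in a few lines of Python. This is an illustrative toy, not the paper's architecture: the linear router, the zero threshold, the residual connection, and the names `skip_layer`, `router_weights`, and `layer_fn` are all assumptions made for the example.

```python
import numpy as np

def skip_layer(tokens, router_weights, layer_fn):
    """Run `layer_fn` only on tokens the binary router selects.

    tokens:         (num_tokens, hidden) float array
    router_weights: (hidden,) array of a hypothetical linear router
    layer_fn:       maps (k, hidden) -> (k, hidden)
    Skipped tokens pass through unchanged (identity shortcut).
    """
    scores = tokens @ router_weights           # one scalar score per token
    execute = scores > 0.0                     # binary routing decision
    output = tokens.copy()
    if execute.any():
        # Only the selected rows are sent through the layer.
        output[execute] = tokens[execute] + layer_fn(tokens[execute])
    return output, execute

tokens = np.array([[1.0, 0.0], [-1.0, 0.0]])
router_weights = np.array([1.0, 0.0])
output, executed = skip_layer(tokens, router_weights, lambda h: 2.0 * h)
```

Because the layer is applied only to the selected rows, the compute spent on a batch scales with the number of routed tokens rather than with the full sequence length.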

翻訳日:2023-11-28 17:42:02 公開日:2023-11-26

# 関数拡散

Functional Diffusion ( http://arxiv.org/abs/2311.15435v1 )

ライセンス: Link先を確認

Biao Zhang and Peter Wonka

(参考訳) 本稿では、関数拡散(functional diffusion)と呼ばれる新しい生成拡散モデルのクラスを提案する。従来の研究とは対照的に、関数拡散は連続領域を持つ関数で表されるサンプルに作用する。関数拡散は、古典的な拡散モデルの無限次元領域への拡張と見なすことができる。関数拡散は、画像、ビデオ、オーディオ、3D形状、変形などを最小限の変更で同じフレームワークで扱えるため、非常に汎用的である。さらに、関数拡散は、不規則なデータや非標準領域で定義されたデータに特に適している。本研究では、関数拡散に必要な基礎を導出し、トランスフォーマーアーキテクチャに基づく最初の実装を提案する。3次元曲面上で定義された複雑な符号付き距離関数と変形関数に対する生成結果を示す。

We propose a new class of generative diffusion models, called functional diffusion. In contrast to previous work, functional diffusion works on samples that are represented by functions with a continuous domain. Functional diffusion can be seen as an extension of classical diffusion models to an infinite-dimensional domain. Functional diffusion is very versatile as images, videos, audio, 3D shapes, deformations, etc., can be handled by the same framework with minimal changes. In addition, functional diffusion is especially suited for irregular data or data defined in non-standard domains. In our work, we derive the necessary foundations for functional diffusion and propose a first implementation based on the transformer architecture. We show generative results on complicated signed distance functions and deformation functions defined on 3D surfaces.

翻訳日:2023-11-28 17:41:49 公開日:2023-11-26

# デジタル性・生殖健康における集団プライバシの重要性

The Importance of Collective Privacy in Digital Sexual and Reproductive Health ( http://arxiv.org/abs/2311.15432v1 )

ライセンス: Link先を確認

Teresa Almeida, Maryam Mehrnezhad, Stephen Cook

(参考訳) デジタル性と生殖の健康技術は豊富にあり、その潜在的な機密データ漏洩に関する懸念を示している。我々は15のIoTデバイスを性的および生殖的追跡サービスで分析し、この絶え間なく続くデータの収集が、パートナー、子、家族を含む個人以上の多くの意味を持つことがわかった。結果は、デジタル性的および生殖的健康データプライバシーは個人的および集団的努力であることを示している。

There is an abundance of digital sexual and reproductive health technologies that presents a concern regarding their potential sensitive data breaches. We analyzed 15 Internet of Things (IoT) devices with sexual and reproductive tracking services and found this ever-extending collection of data implicates many beyond the individual including partner, child, and family. Results suggest that digital sexual and reproductive health data privacy is both an individual and collective endeavor.

翻訳日:2023-11-28 17:41:39 公開日:2023-11-26

# ディープラーニングを用いた機械生成テキストの検出

Machine-Generated Text Detection using Deep Learning ( http://arxiv.org/abs/2311.15425v1 )

ライセンス: Link先を確認

Raghav Gaggar, Ashish Bhagchandani, Harsh Oza

(参考訳) 本研究は、大規模言語モデル(LLM)が生成したテキストを人間が書いたテキストから識別するという、さまざまな応用にとって重要な課題に焦点を当てる。このような機能を持つモデルの実現について議論が続くなか、我々はそのようなモデルの実現可能性を裏付ける証拠を提示する。Twitter Sentiment、Football Commentary、Project Gutenberg、PubMedQA、SQuADなど複数のデータセットでモデルを評価し、強化した検出手法の有効性を確認した。これらのデータセットは、あらゆる可能性を網羅する複雑な制約のもとでサンプリングされており、将来の研究の基礎となる。GPT-3.5-Turboを、SVM、RoBERTa-base、RoBERTa-largeなどの各種検出器に対して評価した。研究結果によると、結果は主に文のシーケンス長に依存していた。

Our research focuses on the crucial challenge of discerning text produced by Large Language Models (LLMs) from human-generated text, which holds significance for various applications. With ongoing discussions about attaining a model with such functionality, we present supporting evidence regarding the feasibility of such models. We evaluated our models on multiple datasets, including Twitter Sentiment, Football Commentary, Project Gutenberg, PubMedQA, and SQuAD, confirming the efficacy of the enhanced detection approaches. These datasets were sampled with intricate constraints encompassing every possibility, laying the foundation for future research. We evaluate GPT-3.5-Turbo against various detectors such as SVM, RoBERTa-base, and RoBERTa-large. Based on the research findings, the results predominantly relied on the sequence length of the sentence.

翻訳日:2023-11-28 17:41:31 公開日:2023-11-26

# Wired Perspectives:マルチビューのワイヤーアートが生成AIを取り入れる

Wired Perspectives: Multi-View Wire Art Embraces Generative AI ( http://arxiv.org/abs/2311.15421v1 )

ライセンス: Link先を確認

Zhiyu Qu and Lan Yang and Honggang Zhang and Tao Xiang and Kaiyue Pang and Yi-Zhe Song

(参考訳) 異なる視点から多様な解釈ができる静的な3D彫刻である多視点ワイヤーアート(MVWA: multi-view wire art)の制作は、熟練したアーティストにとっても複雑な作業である。そこで我々は、誰もがMVWAを容易に制作できるAIシステムDreamWireを紹介する。ユーザはテキストプロンプトやスクリブルを通じてビジョンを表現でき、複雑な3Dワイヤーの構成作業から解放される。提案手法は、3次元Bézier曲線、Primのアルゴリズム、および拡散モデルまたはその変種(例えばControlNet)からの知識蒸留を組み合わせる。この組み合わせにより、システムは3Dワイヤーアートを表現し、空間的連続性を確保し、データ不足を克服できる。接続性と視覚的な美しさのトレードオフを含め、提案システムの内部動作を明らかにするため、広範な評価と分析を行った。

Creating multi-view wire art (MVWA), a static 3D sculpture with diverse interpretations from different viewpoints, is a complex task even for skilled artists. In response, we present DreamWire, an AI system enabling everyone to craft MVWA easily. Users express their vision through text prompts or scribbles, freeing them from intricate 3D wire organisation. Our approach synergises 3D Bézier curves, Prim's algorithm, and knowledge distillation from diffusion models or their variants (e.g., ControlNet). This blend enables the system to represent 3D wire art, ensuring spatial continuity and overcoming data scarcity. Extensive evaluation and analysis are conducted to shed insight on the inner workings of the proposed system, including the trade-off between connectivity and visual aesthetics.

翻訳日:2023-11-28 17:41:17 公開日:2023-11-26

# MCReSANetを用いた低電圧グリッドにおける高調波電流放出のデータ駆動モデリングと解釈可能性分析

Data-Driven Modelling for Harmonic Current Emission in Low-Voltage Grid Using MCReSANet with Interpretability Analysis ( http://arxiv.org/abs/2311.15420v1 )

ライセンス: Link先を確認

Jieyu Yao, Hao Yu, Paul Judge, Jiabin Jia, Sasa Djokic, Verner Püvi, Matti Lehtonen, Jan Meyer

(参考訳) パワーエレクトロニクス(PE)負荷の利用は電力変換の効率と制御性を高めるが、これらは依然としてグリッドにおける高調波の主要な発生源である。配電系統に多様な負荷が接続されると、それらの相互作用により、高調波電圧と高調波電流の関係に関する解析モデルの構築が困難になる。これを解決するため、本論文では、MCReSANetを用いて高調波電圧と電流の間の高度に非線形な関係を構築するデータ駆動モデルを提案する。フィンランドとドイツのPCCから得られた2つのデータセットを用いて、選択したフィンランドとドイツのデータセットにさまざまなネットワーク特性が存在する場合でも、MCReSANetが正確な非線形マッピングを確立できることを実証する。MCReSANetで構築したモデルは、フィンランドとドイツの両データセットにおいて、CNNと比較してMAEを10%と14%、MLPと比較して8%と17%改善し、モデルの不確実性も他のモデルよりはるかに低い。これは、本論文でモデル解釈可能性分析の手法として用いる、SHAP値に基づく特徴重要度分析をより正確に行うための重要な前提条件である。特徴重要度分析の結果は、配電系統における各次数の高調波電圧と電流の詳細な関係を示している。各次数の高調波電流には相互作用的な影響があるが、一部の次数の高調波電圧は高調波電流の放出に支配的な影響を与える。すなわち、正相および零相の高調波が、それぞれフィンランドとドイツのネットワークにおいて支配的な重要度を持ち、これは選択した2つのフィンランドとドイツのデータセットにおける接続負荷の種類のパターンと一致する。本論文は、配電系統における多様なPE負荷による高調波電流放出の理解と予測の可能性を高めるものであり、多様なグリッド環境における電力品質を最適化するための、より効果的な管理に役立つ。

Even though the use of power electronics (PE) loads offers enhanced electrical energy conversion efficiency and control, they remain the primary sources of harmonics in grids. When diverse loads are connected in the distribution system, their interactions complicate establishing analytical models for the relationship between harmonic voltages and currents. To solve this, our paper presents a data-driven model using MCReSANet to construct the highly nonlinear mapping between harmonic voltage and current. Two datasets from PCCs in Finland and Germany are utilized, which demonstrates that MCReSANet is capable of establishing accurate nonlinear mappings, even in the presence of various network characteristics in the selected Finnish and German datasets. The model built by MCReSANet can improve the MAE by 10% and 14% compared to the CNN, and by 8% and 17% compared to the MLP for both Finnish and German datasets, also showing much lower model uncertainty than others. This is a crucial prerequisite for more precise SHAP value-based feature importance analysis, which is a method for the model interpretability analysis in this paper. The results of the feature importance analysis show the detailed relationships between each order of harmonic voltage and current in the distribution system. There is an interactive impact on each order of harmonic current, but some orders of harmonic voltages have a dominant influence on harmonic current emissions: positive sequence and zero sequence harmonics have the dominant importance in the Finnish and German networks, respectively, which conforms to the pattern of connected load types in the two selected Finnish and German datasets. This paper enhances the potential for understanding and predicting harmonic current emissions by diverse PE loads in distribution systems, which is beneficial to more effective management for optimizing power quality in diverse grid environments.

翻訳日:2023-11-28 17:41:02 公開日:2023-11-26

# 行列と線形写像のフロベニウス型ノルムと内積とニューラルネットワークトレーニングへの応用

Frobenius-Type Norms and Inner Products of Matrices and Linear Maps with Applications to Neural Network Training ( http://arxiv.org/abs/2311.15419v1 )

ライセンス: Link先を確認

Roland Herzog and Frederik Köhne and Leonie Kreis and Anton Schiela

(参考訳) フロベニウスノルムは、行列のノルムとして頻繁に選択される。特に、その基礎となるフロベニウス内積は、ニューラルネットワークのトレーニングで現れるような行列変数に関する目的関数の勾配を評価するために一般的に用いられる。我々は、線形写像や行列に対するフロベニウスノルムと内積について、より広い視点を提供し、それらが定義域と終域の空間における内積に依存することを明らかにする。これは、古典的なフロベニウスノルムが、より一般的なフロベニウス型ノルムの族に属する特別な要素の1つにすぎないことを示している。この認識によって得られる大きな追加の自由度は、とりわけ、ニューラルネットワークのトレーニングの前処理(preconditioning)に利用できる。

The Frobenius norm is a frequent choice of norm for matrices. In particular, the underlying Frobenius inner product is typically used to evaluate the gradient of an objective with respect to a matrix variable, such as those occurring in the training of neural networks. We provide a broader view on the Frobenius norm and inner product for linear maps or matrices, and establish their dependence on inner products in the domain and co-domain spaces. This shows that the classical Frobenius norm is merely one special element of a family of more general Frobenius-type norms. The significant extra freedom furnished by this realization can be used, among other things, to precondition neural network training.
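One way to make the dependence on domain and co-domain inner products concrete is the standard Hilbert-Schmidt construction, sketched below in Python with NumPy. It is an illustration under stated assumptions, not necessarily the paper's exact definition: the inner products are assumed to be given by symmetric positive definite matrices `M_dom` and `M_cod`, and the function names are hypothetical. With both matrices equal to the identity, the classical Frobenius inner product is recovered.

```python
import numpy as np

def frobenius_type_inner(A, B, M_dom, M_cod):
    """<A, B> = trace(A^* B), with the adjoint A^* = M_dom^{-1} A^T M_cod."""
    return float(np.trace(np.linalg.solve(M_dom, A.T @ M_cod @ B)))

def riesz_gradient(G, M_dom, M_cod):
    """Gradient w.r.t. the weighted inner product, given the Euclidean gradient G.

    Solves <R, D> = sum(G * D) for all D, which gives R = M_cod^{-1} G M_dom.
    """
    return np.linalg.solve(M_cod, G) @ M_dom

rng = np.random.default_rng(0)
m, n = 3, 4                                    # matrices map R^n -> R^m
A, B, G, D = (rng.standard_normal((m, n)) for _ in range(4))

def random_spd(k):
    """Random symmetric positive definite matrix (toy inner product)."""
    L = rng.standard_normal((k, k))
    return L @ L.T + k * np.eye(k)

M_dom, M_cod = random_spd(n), random_spd(m)
R = riesz_gradient(G, M_dom, M_cod)
```

A gradient step along `R` instead of `G` acts as a preconditioned update, which is the kind of extra freedom the abstract refers to.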

翻訳日:2023-11-28 17:40:30 公開日:2023-11-26

# GANに基づくLiDAR強度シミュレーション

GAN-Based LiDAR Intensity Simulation ( http://arxiv.org/abs/2311.15415v1 )

ライセンス: Link先を確認

Richard Marcus, Felix Gabel, Niklas Knoop and Marc Stamminger

(参考訳) 現実的な車両センサシミュレーションは、自動運転の開発における重要な要素である。LiDARのような視覚センサの物理ベースの実装は実際には複雑であるため、データに基づくアプローチが有望な解決策となる。実際のテスト走行で得られたカメラ画像とLiDARスキャンのペアを使って、GANを両者の間の変換を行うように訓練できる。このプロセスに対して、我々は2つの追加を行う。第1に、カメラ画像を活用し、セグメンテーションデータと密な深度マップをトレーニングの追加入力として取得する。第2に、物体検出ネットワークが実点群と合成点群の間でどの程度汎化するかを調べることでLiDARシミュレーションの性能を検証し、正解(ground truth)点群なしでの評価を可能にする。両者を組み合わせてLiDAR点群をシミュレートし、その現実性を実証する。

Realistic vehicle sensor simulation is an important element in developing autonomous driving. As physics-based implementations of visual sensors like LiDAR are complex in practice, data-based approaches promise solutions. Using pairs of camera images and LiDAR scans from real test drives, GANs can be trained to translate between them. For this process, we contribute two additions. First, we exploit the camera images, acquiring segmentation data and dense depth maps as additional input for training. Second, we test the performance of the LiDAR simulation by testing how well an object detection network generalizes between real and synthetic point clouds to enable evaluation without ground truth point clouds. Combining both, we simulate LiDAR point clouds and demonstrate their realism.

翻訳日:2023-11-28 17:40:18 公開日:2023-11-26

# KOPPA: Key-Query Orthogonal ProjectionとプロトタイプベースのOne-Versus-AllによるPromptベースの継続的学習の改善

KOPPA: Improving Prompt-based Continual Learning with Key-Query Orthogonal Projection and Prototype-based One-Versus-All ( http://arxiv.org/abs/2311.15414v1 )

ライセンス: Link先を確認

Quyen Tran, Lam Tran, Khoat Than, Toan Tran, Dinh Phung, Trung Le

(参考訳) 大規模言語モデルに適用された即時チューニング技術からインスピレーションを得た最近のViTネットワークは,連続学習分野において顕著な成果を上げている。具体的には、一連のプロンプトを維持し、そのサブセットをキー-クエリマッチング戦略を用いて各タスクの学習に割り当てることを提案する。しかしながら、古いタスククエリと将来のタスクのキーとの相関性、潜在空間の特徴のシフト、独立したタスクで学習された潜在ベクトルの相対的分離の制御を欠くと、制限を受ける可能性がある。本研究では,モデルに依存しないメタラーニングにインスパイアされた直交投影に基づく新しいキークエリ学習戦略を導入する。さらに,OVA(One-Versus-All)のプロトタイプベースコンポーネントを導入し,分類ヘッドの区別を強化する。ベンチマークデータを用いた実験結果から,提案手法は,現在の最先端手法を最大20%超える結果が得られることを示した。

Drawing inspiration from prompt tuning techniques applied to Large Language Models, recent methods based on pre-trained ViT networks have achieved remarkable results in the field of Continual Learning. Specifically, these approaches propose to maintain a set of prompts and allocate a subset of them to learn each task using a key-query matching strategy. However, they may encounter limitations when lacking control over the correlations between old task queries and keys of future tasks, the shift of features in the latent space, and the relative separation of latent vectors learned in independent tasks. In this work, we introduce a novel key-query learning strategy based on orthogonal projection, inspired by model-agnostic meta-learning, to enhance prompt matching efficiency and address the challenge of shifting features. Furthermore, we introduce a One-Versus-All (OVA) prototype-based component that enhances the classification head distinction. Experimental results on benchmark datasets demonstrate that our method empowers the model to achieve results surpassing those of current state-of-the-art approaches by a large margin of up to 20%.

翻訳日:2023-11-28 17:40:05 公開日:2023-11-26

# DISYRE: 教師なし異常検出のための拡散に着想を得た合成復元

DISYRE: Diffusion-Inspired SYnthetic REstoration for Unsupervised Anomaly Detection ( http://arxiv.org/abs/2311.15453v1 )

ライセンス: Link先を確認

Sergio Naval Marimont and Matthew Baugh and Vasilis Siomos and Christos Tzelepis and Bernhard Kainz and Giacomo Tarroni

(参考訳) 教師なし異常検出(Unsupervised Anomaly Detection, UAD)技術は、アノテーションに頼ることなく、異常を含まないことがわかっているデータセットで訓練したモデルのみを用いて、異常を識別し位置を特定することを目的とする。拡散モデルは、入力$x$が所望の分布に属する確率を高めるように$x$を修正することを学習する。すなわち、スコア関数$\nabla_x \log p(x)$をモデル化する。$\nabla_x \log p(x)$はそれ自体がピクセル単位の異常スコアであるため、このようなスコア関数はUADに関連する可能性がある。しかし、拡散モデルはガウス雑音に基づく劣化過程を逆転するように訓練されており、学習されたスコア関数が医学的な異常に一般化する可能性は低い。本研究は、UADに関連するスコア関数をどのように学習するかという問題に取り組み、DISYRE(Diffusion-Inspired SYnthetic REstoration)を提案する。拡散に似たパイプラインは維持しつつ、ガウス雑音による劣化を段階的な合成異常による劣化に置き換えることで、学習されたスコア関数が医学的な、自然に発生する異常に一般化するようにする。3つの一般的な脳MRIのUADベンチマークでDISYREを評価し、3つのタスクのうち2つで他の手法を大幅に上回った。

Unsupervised Anomaly Detection (UAD) techniques aim to identify and localize anomalies without relying on annotations, only leveraging a model trained on a dataset known to be free of anomalies. Diffusion models learn to modify an input $x$ to increase the probability of it belonging to a desired distribution, i.e., they model the score function $\nabla_x \log p(x)$. Such a score function is potentially relevant for UAD, since $\nabla_x \log p(x)$ is itself a pixel-wise anomaly score. However, diffusion models are trained to invert a corruption process based on Gaussian noise and the learned score function is unlikely to generalize to medical anomalies. This work addresses the problem of how to learn a score function relevant for UAD and proposes DISYRE: Diffusion-Inspired SYnthetic REstoration. We retain the diffusion-like pipeline but replace the Gaussian noise corruption with a gradual, synthetic anomaly corruption so the learned score function generalizes to medical, naturally occurring anomalies. We evaluate DISYRE on three common Brain MRI UAD benchmarks and substantially outperform other methods in two out of the three tasks.

翻訳日:2023-11-28 17:27:34 公開日:2023-11-26

# 選択質問応答のための不確かさ認識言語モデリング

Uncertainty-aware Language Modeling for Selective Question Answering ( http://arxiv.org/abs/2311.15451v1 )

ライセンス: Link先を確認

Qi Yang, Shreya Ravikumar, Fynn Schmitt-Ulms, Satvik Lolla, Ege Demir, Iaroslav Elistratov, Alex Lavaee, Sadhana Lolla, Elaheh Ahmadi, Daniela Rus, Alexander Amini, Alejandro Perez

(参考訳) 本稿では、予測のたびに不確実性を推定できる不確実性認識型LLMを生成する、大規模言語モデル(LLM)の自動変換手法を提案する。我々のアプローチはモデルにもデータにも依存せず、計算効率が高く、外部のモデルやシステムに依存しない。変換されたモデルを選択的質問応答の設定で評価する。これは、所定の精度を維持しつつ可能な限り多くの質問に答え、必要に応じて予測の提供を見送るという設定である。結果の一部として、SQuADの抽出型QAタスクとTruthfulQAの生成型QAタスクで、BERTおよびLlama 2のモデル変種をテストした。提案手法が与える不確実性推定値を用いて選択的に質問に答えることで、モデルの確率を直接用いる場合よりも精度が大幅に向上することを示す。

We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs capable of estimating uncertainty with every prediction. Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems. We evaluate converted models on the selective question answering setting -- to answer as many questions as possible while maintaining a given accuracy, forgoing providing predictions when necessary. As part of our results, we test BERT and Llama 2 model variants on the SQuAD extractive QA task and the TruthfulQA generative QA task. We show that using the uncertainty estimates provided by our approach to selectively answer questions leads to significantly higher accuracy over directly using model probabilities.
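The selective question answering setting described here can be sketched as a simple thresholding rule in Python. The sketch does not reproduce how the paper estimates uncertainty; the per-question uncertainty scores, the toy data, and the names `selective_qa` and `max_coverage_at_accuracy` are assumptions for illustration only.

```python
def selective_qa(uncertainties, correct, threshold):
    """Answer only when uncertainty <= threshold; return (coverage, accuracy)."""
    answered = [c for u, c in zip(uncertainties, correct) if u <= threshold]
    coverage = len(answered) / len(correct)
    accuracy = sum(answered) / len(answered) if answered else float("nan")
    return coverage, accuracy

def max_coverage_at_accuracy(uncertainties, correct, target_accuracy):
    """Largest coverage whose accuracy still meets the target, with its threshold."""
    best = (0.0, None)
    for t in sorted(set(uncertainties)):
        coverage, accuracy = selective_qa(uncertainties, correct, t)
        if accuracy >= target_accuracy and coverage > best[0]:
            best = (coverage, t)
    return best

# Toy data: lower uncertainty tends to go with correct answers (1 = correct).
uncertainties = [0.05, 0.10, 0.20, 0.40, 0.70, 0.90]
correct = [1, 1, 1, 0, 1, 0]
best_coverage, best_threshold = max_coverage_at_accuracy(uncertainties, correct, 0.9)
```

On this toy data the model can answer half of the questions while staying above 90% accuracy; a better-calibrated uncertainty estimate would push that coverage higher at the same accuracy.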

翻訳日:2023-11-28 17:27:11 公開日:2023-11-26

PDF登録状況（公開日: 20231126）