Fugu-MT Paper Translation (Abstract): Actively Learning Halfspaces without Synthetic Data

Paper overview: Actively Learning Halfspaces without Synthetic Data

arxiv url: http://arxiv.org/abs/2509.20848v1
Date: Thu, 25 Sep 2025 07:39:25 GMT
Status: Translation complete
System update date: 2025-09-26 20:58:12.759311
Title: Actively Learning Halfspaces without Synthetic Data
Title (reference translation): Actively Learning Halfspaces without Synthetic Data
Authors: Hadley Black, Kasper Green Larsen, Arya Mazumdar, Barna Saha, Geelon So
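Abstract summary: We design efficient algorithms for learning halfspaces without point synthesis. As a corollary, we obtain an optimal $O(d + \log n)$ query deterministic learner for axis-aligned halfspaces. Our algorithm solves the more general problem of learning a Boolean function $f$ over $n$ elements that is monotone under at least one of $D$ provided orderings.
Reference score (independently computed attention score): 34.777547976926456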
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the classic point location problem, one is given an arbitrary dataset $X \subset \mathbb{R}^d$ of $n$ points with query access to an unknown halfspace $f : \mathbb{R}^d \to \{0,1\}$, and the goal is to learn the label of every point in $X$. This problem is extremely well-studied and a nearly-optimal $\widetilde{O}(d \log n)$ query algorithm is known due to Hopkins-Kane-Lovett-Mahajan (FOCS 2020). However, their algorithm is granted the power to query arbitrary points outside of $X$ (point synthesis), and in fact without this power there is an $\Omega(n)$ query lower bound due to Dasgupta (NeurIPS 2004). In this work our goal is to design efficient algorithms for learning halfspaces without point synthesis. To circumvent the $\Omega(n)$ lower bound, we consider learning halfspaces whose normal vectors come from a set of size $D$, and show tight bounds of $\Theta(D + \log n)$. As a corollary, we obtain an optimal $O(d + \log n)$ query deterministic learner for axis-aligned halfspaces, closing a previous gap of $O(d \log n)$ vs. $\Omega(d + \log n)$. In fact, our algorithm solves the more general problem of learning a Boolean function $f$ over $n$ elements which is monotone under at least one of $D$ provided orderings. Our technical insight is to exploit the structure in these orderings to perform a binary search in parallel rather than considering each ordering sequentially, and we believe our approach may be of broader interest. Furthermore, we use our exact learning algorithm to obtain nearly optimal algorithms for PAC-learning. We show that $O(\min(D + \log(1/\varepsilon), 1/\varepsilon) \cdot \log D)$ queries suffice to learn $f$ within error $\varepsilon$, even in a setting when $f$ can be adversarially corrupted on a $c\varepsilon$-fraction of points, for a sufficiently small constant $c$. This bound is optimal up to a $\log D$ factor, including in the realizable setting.
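Abstract (reference translation): In the classic point location problem, we are given an arbitrary dataset $X \subset \mathbb{R}^d$ of $n$ points together with query access to an unknown halfspace $f : \mathbb{R}^d \to \{0,1\}$, and the goal is to learn the label of every point in $X$. The problem has been studied extensively, and a nearly optimal $\widetilde{O}(d \log n)$ query algorithm is known due to Hopkins-Kane-Lovett-Mahajan (FOCS 2020). That algorithm, however, is allowed to query arbitrary points outside of $X$ (point synthesis); without this power there is an $\Omega(n)$ query lower bound due to Dasgupta (NeurIPS 2004). The goal of this work is to design efficient algorithms for learning halfspaces without point synthesis. To circumvent the $\Omega(n)$ lower bound, we consider learning halfspaces whose normal vectors come from a set of size $D$, and we show tight bounds of $\Theta(D + \log n)$. As a corollary, we obtain an optimal $O(d + \log n)$ query deterministic learner for axis-aligned halfspaces, closing the previous gap between $O(d \log n)$ and $\Omega(d + \log n)$. In fact, our algorithm solves the more general problem of learning a Boolean function $f$ over $n$ elements that is monotone under at least one of $D$ provided orderings. Our technical insight is to exploit the structure of these orderings to run a binary search in parallel instead of considering each ordering sequentially, and we believe the approach may be of broader interest. Furthermore, we use our exact learning algorithm to obtain nearly optimal algorithms for PAC learning. We show that $O(\min(D + \log(1/\varepsilon), 1/\varepsilon) \cdot \log D)$ queries suffice to learn $f$ within error $\varepsilon$, even when $f$ may be adversarially corrupted on a $c\varepsilon$-fraction of points, for a sufficiently small constant $c$. This bound is optimal up to a $\log D$ factor, including in the realizable setting.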
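Example (illustrative sketch, not taken from the paper): the abstract contrasts its parallel binary search with the simpler strategy of handling each ordering one at a time. The Python sketch below shows only that sequential baseline, which spends roughly $O(D \log n)$ label queries and never queries a point outside $X$. It is not the paper's $\Theta(D + \log n)$ algorithm. All function names (`axis_aligned_orderings`, `binary_search_threshold`, `learn_labels_sequential`) are hypothetical, and the sketch assumes that labels are non-decreasing (0...0 1...1) along at least one of the supplied orderings.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def axis_aligned_orderings(points: Sequence[Sequence[float]]) -> List[List[int]]:
    """Build the 2d orderings of point indices induced by the normals +e_j and -e_j."""
    n, d = len(points), len(points[0])
    return [
        sorted(range(n), key=lambda i, j=j, s=s: s * points[i][j])
        for j in range(d)
        for s in (1, -1)
    ]


def binary_search_threshold(order: Sequence[int], query: Callable[[int], int]) -> int:
    """Return the first position in `order` labeled 1, assuming labels 0...0 1...1."""
    lo, hi = 0, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        if query(order[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return lo


def learn_labels_sequential(
    n: int,
    orderings: Sequence[Sequence[int]],
    query: Callable[[int], int],
) -> Tuple[List[int], int]:
    """Sequential baseline: one binary search per ordering, then resolve conflicts.

    Returns the learned labels of all n points and the number of distinct
    label queries made. Only indices of dataset points are ever queried.
    """
    cache: Dict[int, int] = {}

    def q(i: int) -> int:
        # Each dataset point is queried at most once.
        if i not in cache:
            cache[i] = query(i)
        return cache[i]

    # One candidate labeling per ordering. The candidate built from an ordering
    # under which f really is monotone equals f on every point.
    candidates: List[List[int]] = []
    for order in orderings:
        k = binary_search_threshold(order, q)
        labels = [0] * n
        for pos in range(k, n):
            labels[order[pos]] = 1
        candidates.append(labels)

    # Discard candidates that contradict labels already observed.
    candidates = [h for h in candidates if all(h[i] == y for i, y in cache.items())]

    while True:
        if not candidates:
            raise ValueError("no supplied ordering is consistent with the labels")
        first = candidates[0]
        # Find a point on which two surviving candidates disagree.
        diff = next(
            (i for h in candidates[1:] for i in range(n) if h[i] != first[i]), None
        )
        if diff is None:
            return first, len(cache)
        y = q(diff)  # each such query eliminates at least one candidate
        candidates = [h for h in candidates if h[diff] == y]
```

Each binary search costs at most about $\log_2 n$ queries and the conflict-resolution loop adds at most $D - 1$ more, so the total is on the order of $D \log n$. This matches the earlier $O(d \log n)$ upper bound for axis-aligned halfspaces, where $D = 2d$, that the abstract says the new algorithm improves to $O(d + \log n)$.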


Related papers