This paper considers over-the-air federated learning (OTA-FL). OTA-FL
exploits the superposition property of the wireless medium, and performs model
aggregation over the air for free. Thus, it can greatly reduce the
communication cost incurred in communicating model updates from the edge
devices. In order to fully utilize this advantage while providing comparable
learning performance to conventional federated learning that presumes model
aggregation via noiseless channels, we consider the joint design of
transmission scaling and the number of local iterations at each round, given
the power constraint at each edge device. We first characterize the training
error due to such channel noise in OTA-FL by establishing a fundamental lower
bound for general functions with Lipschitz-continuous gradients. Then, by
introducing an adaptive transceiver power scaling scheme, we propose an
over-the-air federated learning algorithm with joint adaptive computation and
power control (ACPC-OTA-FL). We provide the convergence analysis for
ACPC-OTA-FL in training with non-convex objective functions and heterogeneous
data. We show that the convergence rate of ACPC-OTA-FL matches that of FL with
noise-free communications.
I. INTRODUCTION
In recent years, advances in machine learning (ML) have achieved astonishing successes in many applications that transform our society, e.g., in computer vision, natural language processing, and robotics.
However, due to the rapidly increasing demands for training data, high latency and costs of data transmissions, as well as data privacy/security concerns, aggregating all data to the cloud for ML training is unlikely to remain feasible.
Federated learning (FL) employs multiple clients, typically deployed over wireless edge networks, to locally train a learning model and exchange only intermediate updates between the server and clients.
FL provides better avenues for privacy protection by avoiding the transmission of local data, while also being able to leverage parallel client computation for training speedup.
However, FL also inherits many design challenges of distributed ML.
One of the main challenges of FL stems from the communication constraint in the iterative FL learning process, particularly in resource (bandwidth and power)-limited wireless FL systems [2]–[4].
To receive the update information from multiple clients in each round, the conventional wisdom is to use orthogonal spectral or temporal channels for each client and avoid interference among the clients.
This work is supported in part by NSF-2112471 (AI-EDGE).
However, this is neither desirable (since, as the number of clients increases, the available rate per edge device decreases, lengthening the communication duration) nor necessary (since only the aggregated model update is needed at the server) in FL.
Over-the-air FL (OTA-FL) has recently emerged as an effective approach in that it exploits the superposition property of the wireless medium to perform model aggregation “for free” by allowing simultaneous transmission of all clients’ updates [5]–[7].
Specifically, under OTA-FL, the server directly recovers a noisy aggregation of the clients’ model updates that transmit in the same spectral-temporal channel, rather than trying to decode each client’s model update first in orthogonal spectral or temporal channels.
As a result, OTA-FL dramatically reduces the communication costs and overheads from collecting the update from each client, and accordingly enjoys better communication parallelism regardless of the number of clients.
Moreover, existing works on OTA-FL have not simultaneously considered data heterogeneity (i.e., datasets among clients are non-i.i.d. and of unbalanced sizes) and system heterogeneity (i.e., the computation and communication capacities vary among clients and could be time-varying) [2], [3].
Therefore, a fundamental question in OTA-FL system design is: how can one develop an efficient OTA-FL training algorithm that handles both data and system heterogeneity under noisy channels?
In this paper, we answer this question by studying the impact of channel noise on OTA-FL and proposing an over-the-air federated learning algorithm with joint adaptive computation and power control (ACPC-OTA-FL) for edge devices with heterogeneous capabilities.
Our main contributions are summarized as follows:
• We first characterize the training error of the conventional OTA-based FedAvg algorithm [1] by establishing a lower bound of the convergence error under Gaussian multiple access channels (MAC) for general functions with Lipschitz-continuous gradients.
the noise-free assumption. This insight motivates us to propose our ACPC-OTA-FL algorithm that considers local computation and power control co-design to best utilize the power resources at the edge devices.
• Our ACPC-OTA-FL algorithm allows each client to (in a distributed manner) adaptively determine its transmission power level and number of local update steps to fully utilize the computation and communication resources.
Despite the advantage of high scalability for a large number of clients, existing works on OTA-FL [9]–[13] have empirically shown that channel noise substantially degrades the learning performance.
To mitigate the impacts of channel noise under limited transmission power constraints, one popular approach is to utilize uniform transceiver scaling for all clients.
Reference [15] has proposed a new learning rate scheme that considers the quality of the gradient estimation; Reference [16] has developed a uniform-forcing transceiver scaling for OTA function computation, while the work in [17] has studied the optimal power control problem by taking gradient statistics into account.
Reference [18] has proposed a uniform transceiver scaling that accounts for data heterogeneity.
A common approach in the existing works above is to formulate the power control problem separately, so as to satisfy the power constraints after the local computation at the clients.
Moreover, these works do not address data and system heterogeneity, and the adaptation of power resources for computation is not tied to transmission power control, even though each edge device is constrained in the total power needed for both computation and communication.
Due to the coupling of computation and communication processes, a joint adaptive computation and power control for OTA-FL is necessary in order to better mitigate the combined impacts of channel noise, power constraints, as well as data and system heterogeneity, which constitutes the main goal of this paper.
Upon receiving $x_t$, client $i$ performs local computations with $x^i_{t,0} = x_t$:
$$x^i_{t,k+1} = x^i_{t,k} - \eta \nabla F(x^i_{t,k}, \xi^i_{t,k}), \quad k = 0, \ldots, \tau^i_t - 1, \tag{2}$$
where $\tau^i_t$ denotes the total number of local steps at client $i$ in round $t$ and $\xi^i_{t,k}$ is a random data sample used by client $i$ in step $k$ in round $t$.
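As a minimal sketch of the local update in (2), the following Python snippet runs the local SGD steps of a single client. The function name `local_sgd`, the gradient oracle `grad_fn`, and the quadratic toy loss are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def local_sgd(x_global, grad_fn, samples, eta):
    """Run len(samples) local SGD steps starting from the global model x_t.

    grad_fn(x, xi) is a hypothetical stochastic-gradient oracle; `samples`
    holds the data samples xi_{t,0}, ..., xi_{t,tau-1} drawn by this client.
    """
    x = np.array(x_global, dtype=float)   # x^i_{t,0} = x_t
    for xi in samples:                    # k = 0, ..., tau^i_t - 1
        x = x - eta * grad_fn(x, xi)      # one step of update (2)
    return x

# Toy oracle (assumption): F(x, xi) = 0.5 * ||x - xi||^2, so the gradient is x - xi.
def quad_grad(x, xi):
    return x - xi

x_t = np.zeros(2)
batch = [np.array([1.0, -1.0])] * 3       # tau = 3 identical samples
x_local = local_sgd(x_t, quad_grad, batch, eta=0.5)
```

Because the number of steps is simply the length of `samples`, each client can pass a different number of samples in each round, which mirrors the time-varying, device-dependent local step count.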
We note that one key feature of the OTA-FL model in this paper is that we allow $\tau^i_t$ to be time-varying and device-dependent.
While this makes the OTA-FL model more practical and flexible, it also introduces an extra dimension of challenges in algorithmic design and convergence analysis.
The specific communication model will be described in Section III-B.
B. Communication Model
We consider an OTA-FL system in which the server broadcasts to all clients in a downlink channel and the clients transmit to the server synchronously through a common uplink channel.
As a result, each client receives an error-free global model parameter $x_t$ for its local computation at the beginning of each round $t$, i.e., $x^i_{t,0} = x_t$.
For the uplink, we consider a Gaussian MAC, where the output of the channel in each communication round $t$ is:
$$y_t = \sum_{i \in [m]} h^i_t z^i_t + w_t. \tag{3}$$
In (3), $z^i_t \in \mathbb{R}^d$ is the input from client $i$, $h^i_t$ is the channel gain of client $i$, and $w_t \sim \mathcal{N}(0, \sigma_c^2 I_d)$ is an additive Gaussian channel noise.
We also consider an instantaneous power constraint at each client in each communication round:
$$\|z^i_t\|^2 \le P^i_t, \quad \forall i \in [m], \forall t, \tag{4}$$
where $P^i_t$ is the maximum transmission power limit for client $i$ in communication round $t$.
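The channel model (3) and the power constraint (4) can be sketched in a few lines of Python. The helper names `gaussian_mac` and `satisfies_power`, the unit channel gains, and the toy inputs are assumptions made for illustration only.

```python
import numpy as np

def gaussian_mac(z, h, sigma_c, rng):
    """Channel output y_t = sum_i h_i * z_i + w_t with w_t ~ N(0, sigma_c^2 I_d).

    z: (m, d) array of client inputs, h: (m,) array of channel gains.
    """
    noise = rng.normal(0.0, sigma_c, size=z.shape[1])
    return (h[:, None] * z).sum(axis=0) + noise

def satisfies_power(z, p_max):
    """Check the instantaneous constraint ||z_i||^2 <= P_i for every client."""
    return bool(np.all(np.sum(z ** 2, axis=1) <= p_max))

rng = np.random.default_rng(0)
z = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])   # m = 3 clients, d = 2
h = np.ones(3)                                        # unit gains (assumption)
y_noiseless = gaussian_mac(z, h, sigma_c=0.0, rng=rng)
ok = satisfies_power(z, p_max=np.array([1.0, 4.0, 2.0]))
```

The server only ever observes the superposed vector returned by `gaussian_mac`, never the individual rows of `z`, which is the property OTA-FL relies on.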
IV. IMPACTS OF CHANNEL NOISE AND SYSTEM-DATA HETEROGENEITY ON OTA-FL
In Section IV-A, we first characterize the impact of the channel noise on OTA-FL when directly applying the standard FedAvg framework with SGD local updates without considering power control at each client.
Then, in Section IV-B, we provide a concrete example to further illustrate the impact of channel noise coupled with heterogeneous numbers of local updates, i.e., system heterogeneity, on OTA-FL performance.
To study the impact of channel noise, we first consider a general L-smooth objective function (i.e., having L-Lipschitz continuous gradients) with a single local step, i.e., $\tau^i_t = 1$, $\forall i \in [m]$, $t \in [T]$.
We note that we consider the original FedAvg, where the model parameters $\{x^i_t, i \in [m]\}$ are aggregated over the air without any further scaling.
Consequently, the channel output could be simplified as $x_{t+1} = x_t - \eta \nabla F(x_t, \xi_t) + w_t$, where $\xi_t \triangleq \{\xi^i_t, \forall i \in [m]\}$ represents one collective data batch composed of random samples $\{\xi^i_t, \forall i\}$ from all clients.
Then, we have the following theorem to characterize the impact of the channel noise on the OTA version of the FedAvg algorithm:
Theorem 1 (Lower Bound for Gaussian Channel).
Consider an OTA-FL system for training an L-smooth objective function $F(x)$ with an optimal solution $x^*$.
Suppose that each client uses local SGD updates that are subject to additive white Gaussian noise (AWGN), i.e., $x_{t+1} = x_t - \eta \nabla F(x_t, \xi_t) + w_t$, where $\eta < \frac{1}{L}$ and $w_t \sim \mathcal{N}(0, \sigma_c^2 I_d)$.
[...] which further implies the following lower bound for the training convergence:
$$\lim_{t \to \infty} \mathbb{E}\left[\|x_t - x^*\|^2\right] \ge \frac{\eta^2 \sigma^2 + \sigma_c^2}{1 - (1 - \eta L)^2},$$
where the stochastic gradient noise is assumed to be Gaussian, i.e., $\nabla F(x_t, \xi_t) - \nabla F(x_t) \sim \mathcal{N}(0, \sigma^2 I_d)$.
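To see where the ratio in this bound comes from, the following worked example may help. It is an illustration under stated assumptions rather than the proof in the appendix: we take a one-dimensional quadratic $F(x) = \frac{L}{2}(x - x^*)^2$, so that $\nabla F(x) = L(x - x^*)$, and write the stochastic gradient as $L(x_t - x^*) + g_t$ with $g_t \sim \mathcal{N}(0, \sigma^2)$ independent of the channel noise $w_t \sim \mathcal{N}(0, \sigma_c^2)$. The symbol $g_t$ is introduced here only for this example.

```latex
\begin{align*}
x_{t+1} - x^* &= (1 - \eta L)(x_t - x^*) - \eta g_t + w_t, \\
\mathbb{E}\big[(x_{t+1} - x^*)^2\big]
  &= (1 - \eta L)^2\, \mathbb{E}\big[(x_t - x^*)^2\big] + \eta^2 \sigma^2 + \sigma_c^2, \\
\lim_{t \to \infty} \mathbb{E}\big[(x_t - x^*)^2\big]
  &= \frac{\eta^2 \sigma^2 + \sigma_c^2}{1 - (1 - \eta L)^2}.
\end{align*}
```

The cross terms vanish because $g_t$ and $w_t$ are zero-mean and independent of $x_t$, and the limit is the fixed point of the second line, which exists since $0 < \eta < \frac{1}{L}$ gives $(1 - \eta L)^2 < 1$. In this special case the bound of Theorem 1 is attained with equality, and the $\sigma_c^2$ term remains no matter how many rounds are run.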
Proof Sketch. By assuming independent stochastic gradient noise and channel noise, we can decouple these noise terms and thus produce an iteration relation for $\|x_t - x^*\|$ by L-smoothness with a proper learning rate $\eta < \frac{1}{L}$.
As the channel noise exists in every round, the noise variance term $\sigma_c^2$ is non-vanishing even for infinitely many rounds.
Due to space limitations, we refer readers to Appendix VIII for the complete proof.
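The non-vanishing noise term can also be checked numerically. The sketch below simulates the noisy update on a scalar quadratic $F(x) = \frac{L}{2}x^2$ (an assumed toy objective with $x^* = 0$) and compares the empirical steady-state error with the expression in Theorem 1. All function names and parameter values are illustrative choices.

```python
import numpy as np

def simulate_floor(L=1.0, eta=0.5, sigma=1.0, sigma_c=0.5,
                   rounds=60, chains=20000, seed=0):
    """Empirical E[(x_t - x*)^2] after `rounds` noisy SGD steps on
    F(x) = (L/2) x^2 (so x* = 0), averaged over independent chains."""
    rng = np.random.default_rng(seed)
    x = np.ones(chains)                               # arbitrary starting point
    for _ in range(rounds):
        g = L * x + rng.normal(0.0, sigma, chains)    # stochastic gradient
        w = rng.normal(0.0, sigma_c, chains)          # channel noise
        x = x - eta * g + w                           # noisy update
    return float(np.mean(x ** 2))

def predicted_floor(L=1.0, eta=0.5, sigma=1.0, sigma_c=0.5):
    """Right-hand side of the bound in Theorem 1."""
    return (eta ** 2 * sigma ** 2 + sigma_c ** 2) / (1.0 - (1.0 - eta * L) ** 2)

empirical = simulate_floor()
predicted = predicted_floor()
```

Setting `sigma_c=0.0` in both functions removes the channel-noise contribution, which makes it easy to see how much of the error floor is due to the channel alone.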
standard SGD updates are used locally at each client.
This motivates us to develop joint adaptive computation and power control for OTA-FL to mitigate the MAC noise effect.
As discussed in Section III-A, the FL optimization problem considered in this paper involves a non-convex objective function, heterogeneous (non-i.i.d.) data, and different numbers of local updates $\tau^i_t$ at each client.
As shown in previous work [19], different numbers of local steps (or optimization processes) among clients introduce objective inconsistency, potentially causing arbitrary deviation from optimal solutions in conventional FL.
Next, we show that similar negative impacts of data heterogeneity and different numbers of local steps also affect the OTA-FL performance even under proper power control.
end for
The server aggregates and updates the global model by receiver rescaling (12).
8: end for
that the complex coupling between power control and system-data heterogeneity renders a highly non-trivial OTA-FL power control and algorithmic design to guarantee convergence to an optimal solution.
This further motivates our OTA-FL algorithm design with joint adaptive computation and power control in Section V.
V. ALGORITHM DESIGN
To address the negative impacts of channel noise and system-data heterogeneity in Section IV, we propose an over-the-air federated learning algorithm with joint adaptive computation and power control (ACPC-OTA-FL) as shown in Algorithm 1.
The basic idea of our ACPC-OTA-FL algorithm is to utilize a time-varying number of local SGD steps at each client under the instantaneous power constraint at this particular client.
There are two advantages in our algorithm compared to previous works.
First, we jointly consider the computation-communication co-design due to their complex coupling relationship as shown in Section IV-B.
As a result, more powerful clients with more computation capacity and transmission power will execute more local update steps and have a larger fraction in the server-side aggregation.
This adaptive and client-dependent design is different from previous works [11], [12], [14]–[18], which considered the communication problem separately after finishing local update computation and used a uniform power control scaling factor without considering the heterogeneity among the clients.
Second, our ACPC-OTA-FL algorithm alleviates the straggler (i.e., slow client) problem by allowing different local step numbers across clients in each communication round.
Before providing the theoretical convergence result, we first state our assumptions:
Assumption 1. (L-Lipschitz Continuous Gradient) There exists a constant $L > 0$ such that $\|\nabla F_i(x) - \nabla F_i(y)\| \le L \|x - y\|$, $\forall x, y \in \mathbb{R}^d$ and $i \in [m]$.
Assumption 2. (Unbiased Local Stochastic Gradients and Their Bounded Variance) Let $\xi^i$ be a random local data sample at client $i$. The local stochastic gradient is unbiased and has a bounded variance, i.e., $\mathbb{E}[\nabla F_i(x, \xi^i)] = \nabla F_i(x)$, $\forall i \in [m]$, and $\mathbb{E}[\|\nabla F_i(x, \xi^i) - \nabla F_i(x)\|^2] \le \sigma^2$, where the expectation is taken over the local data distribution $X_i$.
Assumption 3. (Bounded Stochastic Gradient) There exists a constant $G \ge 0$ such that the norm of each local stochastic gradient is bounded: $\mathbb{E}[\|\nabla F_i(x, \xi^i)\|^2] \le G^2$, $\forall i \in [m]$.
Thus, we can decouple the channel noise term as an extra error scaled by $\frac{\sigma_c^2}{\beta_t^2}$ when calculating the function descent $(F(x_{t+1}) - F(x_t))$ in each round by L-smoothness.
Then, the technical challenge lies in heterogeneous local steps across clients.
By simulating the mini-batch SGD method, we could further bound the difference
$$\frac{1}{2} \eta_t \mathbb{E}_t\left[\left\| \sum_{i=1}^{m} \frac{\alpha_i}{\tau^i_t} \sum_{k=0}^{\tau^i_t - 1} \left(\nabla F_i(x_t) - \nabla F_i(x^i_{t,k})\right) \right\|^2\right] \le \frac{1}{2} \eta_t^3 m L^2 \sum_{i=1}^{m} (\alpha_i)^2 \left(\tau^i_t\right)^2 G^2,$$
which accounts for the size of dataset, data heterogeneity and different local steps.
The above two terms correspond to channel noise error and local update error, respectively.
Following the classic analysis for SGD-based methods, the optimization error and statistical error could be similarly derived, and the final convergence result naturally follows.
Due to space limitations, we relegate the full proof to Appendix VIII.
Theorem 2 characterizes four sources of errors that affect the convergence rate:
1) the optimization error dependent on the distance between the initial guess and optimal objective value;
2) the statistical error due to the use of stochastic gradients rather than full gradients;
3) local update error from local update steps coupled with data heterogeneity; and
4) channel noise error from over-the-air transmissions.
Among these four errors, only the optimization error (first term) vanishes as the total number of iterations $T$ gets large, while the other three terms are independent of $T$.
Similar to classic SGD or FedAvg convergence analysis, diminishing learning rates $O(\frac{1}{\sqrt{T}})$ can be used to remove the statistical and local update errors and obtain a convergence error bound $\min_{t \in [T]} \mathbb{E}\|\nabla F(x_t)\|^2 = O(\frac{1}{\sqrt{T}})$.
To mitigate the channel noise error, the parameter $\beta_t$ needs to be chosen judiciously.
Given $\delta^i_t$ in communication round $t$, we can set $\beta_t^2 = \min_{i \in [m]} \left\{ \frac{P^i_t (\tau^i_t)^2}{\|\delta^i_t\|^2 \alpha_i^2} \right\}$.
If the $\delta^i_t$-information is unavailable, we can replace $\|\delta^i_t\|^2$ by its upper bound $\|\delta^i_t\|^2 \le \eta_t^2 (\tau^i_t)^2 G^2$, and thus $\beta_t^2 = \min_{i \in [m]} \left\{ \frac{P^i_t}{\eta_t^2 \alpha_i^2 G^2} \right\}$, which for equal $\alpha_i = \alpha$ reads $\frac{P_t}{\eta_t^2 \alpha^2 G^2}$, where $P_t = \min_{i \in [m]} P^i_t$.
For the special case with $P = \frac{P^i_t}{\eta_t^2}$, $\forall i, t$, $\alpha_i = \frac{1}{m}$ (balanced datasets), and identical local steps $\tau^i_t = \tau$, $\forall i, t$, the channel noise error (the fourth term) becomes $\frac{\eta \sigma_c^2 G^2}{P m^2}$, and the following result immediately follows from Theorem 2:
Corollary 1 (Convergence Rate). Let $\alpha_i = \frac{1}{m}$, $\tau_i = \tau$, and $\eta = \frac{\sqrt{m}}{\sqrt{T}}$. The convergence rate of ACPC-OTA-FL under the special case above is $O(\frac{\sigma^2 + 1}{\sqrt{mT}})$.
Corollary 1 implies that, if $\tau \le \frac{T^{1/4}}{m^{3/4}}$, a linear speedup in terms of the number of clients (i.e., $O(\frac{1}{\sqrt{mT}})$) can be achieved, which shows the benefits of parallelism and matches the convergence rate of FedAvg in noise-free communication environments [21], [22].
Lastly, we note that it is straightforward to extend our results to fading channels with known CSI.
Specifically, the adaptive computation and power control strategy for fading channels is to choose local steps $\tau^i_t$ such that $\|\delta^i_t\|^2 \le P^i_t$, where $\delta^i_t = \beta^i_t \left(x^i_{t,\tau_i} - x^i_{t,0}\right)$, $\beta^i_t = \frac{\beta_t \alpha_i}{\tau^i_t h^i_t}$, and $P^i_t > 0$ represents the maximum transmission power for client $i$ in round $t$.
Under this joint computation and power control, the received signal remains the same as that in the non-fading OTA-FL setting.
Thus, the same convergence results in Theorem 2 and Corollary 1 continue to hold.
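The power-scaling rule discussed above can be sketched as follows, for the non-fading case with unit channel gains. This is a sketch under stated assumptions, not the paper's Algorithm 1: here `deltas[i]` denotes the unscaled local model change of client i (assumed nonzero), client i is assumed to transmit $(\beta_t \alpha_i / \tau^i_t)$ times that change, and the server is assumed to rescale the received signal by $1/\beta_t$. The names `beta_squared` and `ota_round` are hypothetical.

```python
import numpy as np

def beta_squared(deltas, alphas, taus, p_max):
    """beta_t^2 = min_i P_i * tau_i^2 / (||delta_i||^2 * alpha_i^2)."""
    norms_sq = np.array([np.sum(d ** 2) for d in deltas])
    return float(np.min(p_max * taus ** 2 / (norms_sq * alphas ** 2)))

def ota_round(x_t, deltas, alphas, taus, p_max, sigma_c, rng):
    """One noisy over-the-air aggregation round (unit gains assumed)."""
    beta = np.sqrt(beta_squared(deltas, alphas, taus, p_max))
    # Assumed transmit scaling: z_i = (beta * alpha_i / tau_i) * delta_i.
    z = np.stack([beta * a / t * d for a, t, d in zip(alphas, taus, deltas)])
    # Every client respects its power limit (small slack for rounding).
    assert np.all(np.sum(z ** 2, axis=1) <= p_max * (1 + 1e-9))
    y = z.sum(axis=0) + rng.normal(0.0, sigma_c, size=x_t.shape)
    return x_t + y / beta      # assumed receiver rescaling by 1 / beta

deltas = [np.array([3.0, 4.0]), np.array([0.0, 1.0])]
alphas = np.array([0.5, 0.5])
taus = np.array([5.0, 1.0])
p_max = np.array([1.0, 1.0])
b2 = beta_squared(deltas, alphas, taus, p_max)
x_next = ota_round(np.zeros(2), deltas, alphas, taus, p_max,
                   sigma_c=0.0, rng=np.random.default_rng(0))
```

With this scaling the effective noise seen by the model after rescaling has variance $\sigma_c^2 / \beta_t^2$ per coordinate, so choosing the largest $\beta_t$ that keeps every client within its power limit is what keeps the channel noise error small.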
TABLE I
LOGISTIC REGRESSION TEST ACCURACY (%) FOR ACPC-OTA-FL COMPARED WITH COTAF AND FEDAVG ON THE MNIST DATASET.

Non-IID Level   Algorithm      Signal-to-Noise Ratio: -1 dB
p = 1           ACPC-OTA-FL    78.22
p = 1           COTAF          46.55
p = 1           FedAvg         67.49
p = 2           ACPC-OTA-FL    81.89
p = 2           COTAF          63.59
p = 2           FedAvg         71.55
p = 5           ACPC-OTA-FL    86.48
p = 5           COTAF          79.64
p = 5           FedAvg         74.76
p = 10          ACPC-OTA-FL    86.21
p = 10          COTAF          86.43
p = 10          FedAvg         76.26
VI. NUMERICAL RESULTS
In this section, we conduct numerical experiments to verify our theoretical results using logistic regression on the MNIST dataset [23].
Following the same procedure as in existing works [1], [20], [22], we distribute the data evenly to m = 10 clients in a label-based partition to impose data heterogeneity across the clients, where the heterogeneity level can be characterized by a parameter p.
As the MNIST dataset contains 10 classes of labels in total, p = 10 represents the i.i.d. case.
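One plausible way to realize such a label-based partition is sketched below. The exact procedure of the cited works is not given here, so this scheme is an assumption: client i is given samples from only the p classes i, i+1, ..., i+p-1 (modulo the number of classes), each class is split evenly among the clients that hold it, and the number of clients is assumed to equal the number of classes. The function name `label_partition` and the synthetic labels are hypothetical.

```python
import numpy as np

def label_partition(labels, m=10, p=1, num_classes=10, seed=0):
    """Assign sample indices to m clients so that client i only sees the
    p classes {i, i+1, ..., i+p-1} (mod num_classes). Hypothetical scheme
    that assumes m == num_classes."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(m)]
    for c in range(num_classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Clients whose class window covers label c.
        holders = [i for i in range(m) if (c - i) % num_classes < p]
        for part, i in zip(np.array_split(idx, len(holders)), holders):
            clients[i].extend(part.tolist())
    return clients

labels = np.repeat(np.arange(10), 20)        # 200 synthetic, balanced labels
parts = label_partition(labels, p=2)
classes_per_client = [len(set(labels[ix])) for ix in parts]
```

Under this scheme p = 1 gives every client a single class (the most heterogeneous setting) and p = 10 gives every client all ten classes, which is consistent with p = 10 being the i.i.d. case.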
1) Test accuracy drops significantly (by up to 20%) when the FedAvg algorithm is applied directly to wireless OTA-FL under large channel noise, which validates Theorem 1 and is consistent with existing works [9]–[11]; and
For example, when SNR = −1 dB and p = 1, ACPC-OTA-FL improves the test accuracy by 31.67 and 10.73 percentage points compared to COTAF and FedAvg, respectively (78.22% versus 46.55% and 67.49% in Table I).
The intuition is that the gradients returned from the clients vary dramatically in highly heterogeneous data settings, and thus using adaptive local steps under limited power constraints allows each client to fully exploit both computation and communication resources.
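The accuracy gaps can be recomputed directly from the −1 dB column of Table I. The short Python snippet below does this arithmetic; the dictionary layout and the helper name `gain` are illustrative choices, and the numbers are copied from the table.

```python
# Test accuracy (%) at SNR = -1 dB, copied from Table I.
acc = {
    1:  {"ACPC-OTA-FL": 78.22, "COTAF": 46.55, "FedAvg": 67.49},
    2:  {"ACPC-OTA-FL": 81.89, "COTAF": 63.59, "FedAvg": 71.55},
    5:  {"ACPC-OTA-FL": 86.48, "COTAF": 79.64, "FedAvg": 74.76},
    10: {"ACPC-OTA-FL": 86.21, "COTAF": 86.43, "FedAvg": 76.26},
}

def gain(p, baseline):
    """Accuracy gap in percentage points between ACPC-OTA-FL and a baseline."""
    return round(acc[p]["ACPC-OTA-FL"] - acc[p][baseline], 2)

gaps_p1 = (gain(1, "COTAF"), gain(1, "FedAvg"))
gaps_all = {p: (gain(p, "COTAF"), gain(p, "FedAvg")) for p in acc}
```

Looking at `gaps_all` across the four values of p makes it easy to see how the advantage over each baseline changes as the data become less heterogeneous.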
We first characterized the training error due to channel noise for conventional OTA-FL by establishing a fundamental lower bound for general objective functions with Lipschitz-continuous gradients.
This motivated us to propose an over-the-air federated learning algorithm with joint adaptive computation and power control (ACPC-OTA-FL) to mitigate the impacts of channel noise on the learning performance, while taking the device heterogeneity into consideration.
We analyzed the convergence of ACPC-OTA-FL with non-convex objective functions and heterogeneous data, and showed that the convergence rate of ACPC-OTA-FL matches that of FedAvg with noise-free communications.
REFERENCES
[1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
[2] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 2, pp. 1–19, 2019.
[3] H. B. McMahan et al., “Advances and open problems in federated learning,” Foundations and Trends® in Machine Learning, vol.
[4] S. Niknam, H. S. Dhillon, and J. H. Reed, “Federated learning for wireless communications: Motivation, opportunities, and challenges,” IEEE Communications Magazine, vol. 58, no. 6, pp. 46–51, 2020.
[5] O. Abari, H. Rahul, and D. Katabi, “Over-the-air function computation in sensor networks,” arXiv preprint arXiv:1612.02307, 2016.
[6] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” IEEE Transactions on Wireless Communications, vol. 19, no. 3, pp. 2022–2035, 2020.
[7] G. Zhu, J. Xu, K. Huang, and S. Cui, “Over-the-air computing for wireless data aggregation in massive IoT,” IEEE Wireless Communications, vol. 28, no. 4, pp. 57–65, 2021.
[8] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning based on over-the-air computation,” in ICC 2019-2019 IEEE International Conference on Communications (ICC). IEEE, 2019, pp. 1–6.
[9] G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 491–506, 2019.
[10] T. Sery and K. Cohen, “On analog gradient descent learning over multiple access fading channels,” IEEE Transactions on Signal Processing, vol. 68, pp. 2897–2911, 2020.
[11] M. M. Amiri and D. Gündüz, “Federated learning over wireless fading channels,” IEEE Transactions on Wireless Communications, vol. 19, no. 5, pp. 3546–3557, 2020.
[12] M. M. Amiri and D. Gündüz, “Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air,” IEEE Transactions on Signal Processing, vol. 68, pp. 2155–2169, 2020.
[13] M. M. Amiri and D. Gündüz, “Over-the-air machine learning at the wireless edge,” in 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2019, pp. 1–5.
[14] G. Zhu, Y. Du, D. Gündüz, and K. Huang, “One-bit over-the-air aggregation for communication-efficient federated edge learning: Design and convergence analysis,” IEEE Transactions on Wireless Communications, vol. 20, no. 3, pp. 2120–2135, 2020.
[15] H. Guo, A. Liu, and V. K. Lau, “Analog gradient aggregation for federated learning over wireless networks: Customized design and convergence analysis,” IEEE Internet of Things Journal, vol. 8, no. 1, pp. 197–210, 2020.
[16] L. Chen, X. Qin, and G. Wei, “A uniform-forcing transceiver design for over-the-air function computation,” IEEE Wireless Communications Letters, vol. 7, no. 6, pp. 942–945, 2018.
[17] N. Zhang and M. Tao, “Gradient statistics aware power control for over-the-air federated learning in fading channels,” in 2020 IEEE International Conference on Communications Workshops (ICC Workshops). IEEE, 2020, pp. 1–6.
[18] T. Sery, N. Shlezinger, K. Cohen, and Y. C. Eldar, “Over-the-air federated learning from heterogeneous data,” IEEE Transactions on Signal Processing, 2021.
[19] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor, “Tackling the objective inconsistency problem in heterogeneous federated optimization,” Advances in Neural Information Processing Systems, vol. 33, 2020.
[20] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the in International Conference on Learning Representations, 2020.
[21] H. Yu, S. Yang, and S. Zhu, “Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 5693–5700.
[22] H. Yang, M. Fang, and J. Liu, “Achieving linear speedup with partial worker participation in non-IID federated learning,” in International Conference on Learning Representations, 2021.
[23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.