**4. FL hyper-parameter tuning for edge computing**

At present, there is no *de facto* method for incorporating FL into edge computing. We propose the automated tuning of FL hyper-parameters as a means to decrease the system overhead associated with FL training. The potential of adjusting FL hyper-parameters to minimize the system overhead of FL training is becoming more apparent. In this section, we use our preliminary work, called FedTune [22], to illustrate the potential value of FL hyper-parameter tuning for edge computing. FedTune takes into account the application's priorities for CompT, TransT, CompL, and TransL, represented by $\alpha$, $\beta$, $\gamma$, and $\delta$, respectively, with $\alpha + \beta + \gamma + \delta = 1$. For instance, $\alpha = 0.6$, $\beta = 0.2$, $\gamma = 0.1$, and $\delta = 0.1$ means that the application gives the highest priority to CompT, some importance to TransT, and the least importance to CompL and TransL. For two sets of FL hyper-parameters $\mathbb{S}_1$ and $\mathbb{S}_2$, FedTune defines the comparison function $I(\mathbb{S}_1, \mathbb{S}_2)$ as

$$I(\mathbb{S}_1, \mathbb{S}_2) = \alpha \times \frac{t_2 - t_1}{t_1} + \beta \times \frac{q_2 - q_1}{q_1} + \gamma \times \frac{z_2 - z_1}{z_1} + \delta \times \frac{v_2 - v_1}{v_1} \tag{1}$$

where $t_1$ and $t_2$ are the CompT of $\mathbb{S}_1$ and $\mathbb{S}_2$ when achieving the same model accuracy. Similarly, $q_1$ and $q_2$ denote TransT, $z_1$ and $z_2$ represent CompL, and $v_1$ and $v_2$ indicate TransL for $\mathbb{S}_1$ and $\mathbb{S}_2$, respectively. If $I(\mathbb{S}_1, \mathbb{S}_2) < 0$, then $\mathbb{S}_2$ is better than $\mathbb{S}_1$. In other words, one set of hyper-parameters is better than another if the weighted improvement in some training aspects (e.g., CompT and CompL) exceeds the weighted degradation, if any, in the remaining aspects (e.g., TransT and TransL). The weight of each aspect is determined by the application's training preferences for CompT, TransT, CompL, and TransL.
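As a concrete illustration, the Python sketch below evaluates the comparison function in Eq. (1). It is a minimal example of our own; the function and variable names are illustrative and not taken from the FedTune implementation in [22].

```python
def compare(prefs, s1, s2):
    """Return I(S1, S2) from Eq. (1).

    prefs: (alpha, beta, gamma, delta), non-negative weights summing to 1,
           for CompT, TransT, CompL, and TransL, respectively.
    s1, s2: (t, q, z, v) overheads of hyper-parameter sets S1 and S2,
            measured when reaching the same model accuracy.
    """
    alpha, beta, gamma, delta = prefs
    t1, q1, z1, v1 = s1
    t2, q2, z2, v2 = s2
    return (alpha * (t2 - t1) / t1
            + beta * (q2 - q1) / q1
            + gamma * (z2 - z1) / z1
            + delta * (v2 - v1) / v1)


# Example: an application that cares mostly about computation time.
prefs = (0.6, 0.2, 0.1, 0.1)
s1 = (100.0, 50.0, 10.0, 5.0)  # illustrative overheads under S1
s2 = (80.0, 60.0, 10.0, 5.0)   # illustrative overheads under S2
print(compare(prefs, s1, s2))  # -0.08 < 0, so S2 is preferred
```

In this example, the 20% reduction in CompT outweighs the 20% increase in TransT because the application weights CompT three times more heavily.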

FedTune utilizes an iterative algorithm to update the hyper-parameters for the next round (refer to [22] for details). The update is triggered only when the model accuracy has improved by at least $\epsilon$. After normalizing the current overheads, FedTune evaluates the comparison function between the previous hyper-parameters $\mathbb{S}_{prv}$ and the current hyper-parameters $\mathbb{S}_{cur}$, then updates the hyper-parameters and resumes training. Because it is lightweight, FedTune places a minimal computational burden on a standard edge computing system.
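The sketch below outlines this decision loop under our reading of [22]. It reuses the `compare` function from the previous sketch; `measure_overheads` and `propose` are hypothetical placeholders for system measurement and FedTune's actual update rule, the normalization is a simple stand-in, and the value of $\epsilon$ is assumed.

```python
EPSILON = 0.01  # assumed minimum accuracy gain before re-tuning


def normalize(overheads):
    # Placeholder normalization: scale each overhead by the largest one so
    # that the four terms of Eq. (1) stay on comparable scales.
    peak = max(overheads)
    return tuple(o / peak for o in overheads)


def fedtune_step(state, accuracy, prefs, measure_overheads, propose):
    """One FedTune decision, taken only after sufficient accuracy gain.

    state holds S_prv, S_cur, the normalized overheads recorded for S_prv,
    and the accuracy at which the hyper-parameters were last updated.
    """
    if accuracy - state["last_accuracy"] < EPSILON:
        return state  # keep training with the current hyper-parameters

    cur = normalize(measure_overheads(state["s_cur"]))
    if compare(prefs, state["o_prv"], cur) < 0:
        # S_cur lowered the weighted overhead: keep adjusting this way.
        s_next = propose(state["s_cur"], keep_direction=True)
    else:
        # S_cur made things worse overall: try a different direction.
        s_next = propose(state["s_cur"], keep_direction=False)

    return {"s_prv": state["s_cur"], "s_cur": s_next,
            "o_prv": cur, "last_accuracy": accuracy}
```

Since the loop runs only once per accuracy milestone and involves a handful of arithmetic operations, its cost is negligible next to a round of FL training.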

The results obtained by FedTune are promising. **Table 2** shows the performance of FedTune on various datasets when FedAvg is employed. For the speech-to-command and EMNIST datasets, the learning rate is set to 0.01, while for the CIFAR-100 dataset it is set to 0.1, all with a momentum of 0.9. Standard deviations are given in parentheses. The results demonstrate that FedTune consistently enhances overall performance on all three datasets. Specifically, averaged over 15 combinations of training preferences, FedTune reduces the system overhead on the speech-to-command dataset by 22.48% compared to the baseline. We also observe that FedTune is more beneficial when FL training takes more rounds to converge. **Table 3** presents the performance of FedTune with various aggregation methods for the ResNet-10 model on the speech-to-command dataset. A learning rate of 0.1, $\beta_1 = 0$, and $\tau = 10^{-3}$ were used for FedAdagrad. As shown, FedTune improves system performance across different aggregation methods; in particular, with FedAdagrad it reduces the system overhead by 26.75%.


**Table 2.**

*Performance of FedTune for diverse datasets when FedAvg aggregation method is applied.*


**Table 3.**

*Performance of FedTune for diverse aggregation algorithms. Speech-to-command dataset and ResNet-10 are used in this experiment.*
