Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks Using PAC-Bayesian Analysis

Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9636-9647, 2020.

Abstract

The notion of flat minima has gained attention as a key metric of the generalization ability of deep learning models. However, current definitions of flatness are known to be sensitive to parameter rescaling. While some previous studies have proposed to rescale flatness metrics using parameter scales to avoid the scale dependence, the normalized metrics lose the direct theoretical connections between flat minima and generalization. In this paper, we first provide generalization error bounds using existing normalized flatness measures. Using the analysis, we then propose a novel normalized flatness metric. The proposed metric enjoys both direct theoretical connections and better empirical correlation to generalization error.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-tsuzuku20a,
  title = {Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks Using {PAC}-{B}ayesian Analysis},
  author = {Tsuzuku, Yusuke and Sato, Issei and Sugiyama, Masashi},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {9636--9647},
  year = {2020},
  editor = {Daumé III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/tsuzuku20a/tsuzuku20a.pdf},
  url = {https://proceedings.mlr.press/v119/tsuzuku20a.html},
  abstract = {The notion of flat minima has gained attention as a key metric of the generalization ability of deep learning models. However, current definitions of flatness are known to be sensitive to parameter rescaling. While some previous studies have proposed to rescale flatness metrics using parameter scales to avoid the scale dependence, the normalized metrics lose the direct theoretical connections between flat minima and generalization. In this paper, we first provide generalization error bounds using existing normalized flatness measures. Using the analysis, we then propose a novel normalized flatness metric. The proposed metric enjoys both direct theoretical connections and better empirical correlation to generalization error.}
}
Endnote
%0 Conference Paper
%T Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks Using PAC-Bayesian Analysis
%A Yusuke Tsuzuku
%A Issei Sato
%A Masashi Sugiyama
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-tsuzuku20a
%I PMLR
%P 9636--9647
%U https://proceedings.mlr.press/v119/tsuzuku20a.html
%V 119
%X The notion of flat minima has gained attention as a key metric of the generalization ability of deep learning models. However, current definitions of flatness are known to be sensitive to parameter rescaling. While some previous studies have proposed to rescale flatness metrics using parameter scales to avoid the scale dependence, the normalized metrics lose the direct theoretical connections between flat minima and generalization. In this paper, we first provide generalization error bounds using existing normalized flatness measures. Using the analysis, we then propose a novel normalized flatness metric. The proposed metric enjoys both direct theoretical connections and better empirical correlation to generalization error.
APA
Tsuzuku, Y., Sato, I. & Sugiyama, M. (2020). Normalized Flat Minima: Exploring Scale Invariant Definition of Flat Minima for Neural Networks Using PAC-Bayesian Analysis. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9636-9647. Available from https://proceedings.mlr.press/v119/tsuzuku20a.html.
