A Novel Exponential Continuous Learning Rate Adaption Gradient Descent Optimization Method
DOI: https://doi.org/10.52825/th-wildau-ensp.v2i.2939

Keywords: Neural Network, Training, Optimizer

Abstract
We present two novel, fast gradient-based optimizer algorithms with a dynamic learning rate. The main idea is to adapt the learning rate α through situational awareness, chiefly by striving for orthogonal neighboring gradients. The method achieves a high success rate and fast convergence while relying far less on hand-tuned hyper-parameters, making it more broadly applicable. It scales linearly with dimension (order O(n)) and is rotation invariant, thereby overcoming known limitations of existing methods. The method comes in two variants, C2Min and P2Min, which differ slightly in how the learning rate is controlled. Their strong performance is demonstrated in experiments on several benchmark data sets (ranging from MNIST to Tiny ImageNet) against the state-of-the-art optimizers Adam and Lion.
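To make the stated idea concrete, the following Python toy sketches the principle described in the abstract: multiplicatively grow the learning rate while consecutive gradients stay aligned and shrink it when they oppose each other, so that neighboring gradients tend towards orthogonality. This is an illustrative sketch under assumptions, not the authors' published C2Min/P2Min update rules; the function name, the base factor gamma, and the exponential update alpha *= gamma**cos are assumptions made here for clarity (the full method is detailed in the ELRA preprint and the linked repository).

```python
import numpy as np

def cosine_adaptive_gd(grad_fn, x0, alpha0=1e-3, gamma=1.1, steps=1000):
    """Toy gradient descent with a cosine-driven learning-rate adaption.

    Illustrative only: grows alpha when consecutive gradients are aligned,
    shrinks it when they oppose each other, and leaves it unchanged when
    they are orthogonal. Not the published C2Min/P2Min algorithms.
    """
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    g_prev = grad_fn(x)
    for _ in range(steps):
        x = x - alpha * g_prev
        g = grad_fn(x)
        # cosine of the angle between neighboring gradients
        denom = np.linalg.norm(g) * np.linalg.norm(g_prev)
        cos = float(g @ g_prev) / denom if denom > 0 else 0.0
        # exponential learning-rate adaption (assumed form): cos > 0 means
        # alpha was too small, cos < 0 means it was too large, cos = 0
        # (orthogonal gradients) leaves alpha unchanged
        alpha *= gamma ** cos
        g_prev = g
    return x, alpha

if __name__ == "__main__":
    # usage example: minimize a simple ill-conditioned quadratic 0.5 * x^T A x
    A = np.diag([1.0, 10.0])
    x_min, alpha_final = cosine_adaptive_gd(lambda x: A @ x, x0=[5.0, -3.0])
    print(x_min, alpha_final)
```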
References
O. Borysenko, and M. Byshkin, "CoolMomentum: a method for stochastic optimization by Langevin dynamics with simulated annealing", Scientific Reports, vol. 11, p. 10705, 2021. DOI: 10.1038/s41598-021-90144-3.
M. N. Ab Wahab, S. Nefti-Meziani, and A. Atyabi, "A comprehensive review of swarm optimization algorithms", PloS one, vol. 10, no. 5, p. e0122827, 2015.
K. Mishchenko, and A. Defazio, Prodigy: An Expeditiously Adaptive Parameter-Free Learner, 2023. [Online]. Available: https://arxiv.org/abs/2306.06101, arXiv: 2306.06101 [cs.LG].
M. Ivgi, O. Hinder, and Y. Carmon, DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule, 2023. [Online]. Available: https://arxiv.org/abs/2302.12022, arXiv: 2302.12022 [cs.LG].
D. P. Kingma, and J. Ba, Adam: A Method for Stochastic Optimization, 2017. [Online]. Available: https://arxiv.org/abs/1412.6980, arXiv: 1412.6980 [cs.LG].
X. Chen, C. Liang, D. Huang et al., "Symbolic Discovery of Optimization Algorithms", ArXiv, vol. abs/2302.06675, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:256846990.
Y. Nesterov, Lectures on Convex Optimization, 2nd ed., Springer Publishing Company, Incorporated, 2018. ISBN: 3319915770.
B. Grimmer, Provably Faster Gradient Descent via Long Steps, 2023. [Online]. Available: https://arxiv.org/abs/2307.06324, arXiv: 2307.06324 [math.OC].
T. T. Truong, and H. Nguyen, "Backtracking Gradient Descent Method and Some Applications in Large Scale Optimisation. Part 2: Algorithms and Experiments", Applied Mathematics & Optimization, vol. 84, pp. 2557-2586, 2021. DOI: 10.1007/s00245-020-09718-8. [Online]. Available: https://doi.org/10.1007/s00245-020-09718-8.
S. Ling, N. Sharp, and A. Jacobson, VectorAdam for Rotation Equivariant Geometry Optimization, 2022. DOI: 10.48550/ARXIV.2205.13599. [Online]. Available: https://arxiv.org/abs/2205.13599.
A. Kleinsorge, S. Kupper, A. Fauck, and F. Rothe, ELRA: Exponential learning rate adaption gradient descent optimization method, 2023. [Online]. Available: https://arxiv.org/abs/2309.06274, arXiv: 2309.06274 [cs.LG].
a. git, "Python ELRA solver in git", anonymous.4open.science, 2024. [Online]. Available: https://anonymous.4open.science/r/solver-347A/README.md.
S. L. Smith, P. Kindermans, and Q. V. Le, "Don't Decay the Learning Rate, Increase the Batch Size", CoRR, vol. abs/1711.00489, 2017. [Online]. Available: http://arxiv.org/abs/1711.00489.
K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition", in CVPR2016, Jun. 2016, pp. 770-778. DOI: 10.1109/CVPR.2016.90.
S. Zagoruyko, and N. Komodakis, Wide Residual Networks, 2017. [Online]. Available: https://arxiv.org/abs/1605.07146, arXiv: 1605.07146 [cs.CV].
L. Sun, "ResNet on Tiny ImageNet", 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:196590979.
A. Shcherbina, "Tiny ImageNet Challenge", cs231n.stanford.edu, 2016. [Online]. Available: http://cs231n.stanford.edu/reports/2016/pdfs/401_Report.pdf.
M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, "Improved inception-residual convolutional neural network for object recognition", Neural Computing and Applications, vol. 32, pp. 279-293, 2020. DOI: 10.1007/s00521-018-3627-6. [Online]. Available: https://doi.org/10.1007/s00521-018-3627-6.
P. Zhou, J. Feng, C. Ma, C. Xiong, S. Hoi, and E. Weinan, "Towards Theoretically Understanding Why SGD Generalizes Better than ADAM in Deep Learning", in Proceedings of the 34th International Conference on Neural Information Processing Systems, ser. NIPS'20, Vancouver, BC, Canada: Curran Associates Inc., 2020. ISBN: 9781713829546.
W. R. Huang, Z. Emam, M. Goldblum et al., "Understanding Generalization through Visualizations", CoRR, vol. abs/1906.03291, 2019. [Online]. Available: http://arxiv.org/abs/1906.03291.
Y. Dauphin, and E. D. Cubuk, "Deconstructing the Regularization of BatchNorm", in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=d-XzF81Wg1.
J. Milnor, Lectures on the H-Cobordism Theorem, Princeton: Princeton University Press, 1965. ISBN: 9781400878055. DOI: 10.1515/9781400878055. [Online]. Available: https://doi.org/10.1515/9781400878055.
License
Copyright (c) 2025 Alexander Kleinsorge, Alexander Fauck, Stefan Kupper

This work is licensed under a Creative Commons Attribution 4.0 International License.