A Novel Exponential Continuous Learning Rate Adaption Gradient Descent Optimization Method
DOI: https://doi.org/10.52825/th-wildau-ensp.v2i.2939

Keywords: Neural Network, Training, Optimizer

Abstract
We present two novel, fast gradient-based optimizer algorithms with a dynamic learning rate. The main idea is to adapt the learning rate α through situational awareness, chiefly by striving for orthogonal neighboring gradients. The method achieves a high success rate and fast convergence while relying far less on hand-tuned hyper-parameters, making it more broadly applicable. It scales linearly with dimension (order O(n)) and is rotation invariant, thereby overcoming known limitations of existing methods. The method comes in two variants, C2Min and P2Min, which differ slightly in how the learning rate is controlled. Their strong performance is demonstrated in experiments on several benchmark data sets (ranging from MNIST to Tiny ImageNet) against the state-of-the-art optimizers Adam and Lion.
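To make the stated idea concrete, the following Python toy sketches the principle described in the abstract: multiplicatively grow the learning rate while consecutive gradients stay aligned and shrink it when they oppose each other, so that neighboring gradients tend towards orthogonality. This is an illustrative sketch under assumptions, not the authors' published C2Min/P2Min update rules; the function name, the base factor gamma, and the exponential update alpha *= gamma**cos are assumptions made here for clarity (the full method is detailed in the ELRA preprint and the linked repository).

```python
import numpy as np

def cosine_adaptive_gd(grad_fn, x0, alpha0=1e-3, gamma=1.1, steps=1000):
    """Toy gradient descent with a cosine-driven learning-rate adaption.

    Illustrative only: grows alpha when consecutive gradients are aligned,
    shrinks it when they oppose each other, and leaves it unchanged when
    they are orthogonal. Not the published C2Min/P2Min algorithms.
    """
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    g_prev = grad_fn(x)
    for _ in range(steps):
        x = x - alpha * g_prev
        g = grad_fn(x)
        # cosine of the angle between neighboring gradients
        denom = np.linalg.norm(g) * np.linalg.norm(g_prev)
        cos = float(g @ g_prev) / denom if denom > 0 else 0.0
        # exponential learning-rate adaption (assumed form): cos > 0 means
        # alpha was too small, cos < 0 means it was too large, cos = 0
        # (orthogonal gradients) leaves alpha unchanged
        alpha *= gamma ** cos
        g_prev = g
    return x, alpha

if __name__ == "__main__":
    # usage example: minimize a simple ill-conditioned quadratic 0.5 * x^T A x
    A = np.diag([1.0, 10.0])
    x_min, alpha_final = cosine_adaptive_gd(lambda x: A @ x, x0=[5.0, -3.0])
    print(x_min, alpha_final)
```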
References
O. Borysenko, and M. Byshkin, "CoolMomentum: a method for stochastic optimization by Langevin dynamics with simulated annealing", Scientific Reports, vol. 11, p. 10705, 2021. DOI: 10.1038/s41598-021-90144-3.
M. N. Ab Wahab, S. Nefti-Meziani, and A. Atyabi, "A comprehensive review of swarm optimization algorithms", PloS one, vol. 10, no. 5, p. e0122827, 2015.
K. Mishchenko, and A. Defazio, Prodigy: An Expeditiously Adaptive Parameter-Free Learner, 2023. [Online]. Available: https://arxiv.org/abs/2306.06101, arXiv: 2306.06101 [cs.LG].
M. Ivgi, O. Hinder, and Y. Carmon, DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule, 2023. [Online]. Available: https://arxiv.org/abs/2302.12022, arXiv: 2302.12022 [cs.LG].
D. P. Kingma, and J. Ba, Adam: A Method for Stochastic Optimization, 2017. [Online]. Available: https://arxiv.org/abs/1412.6980, arXiv: 1412.6980 [cs.LG].
X. Chen, C. Liang, D. Huang et al., "Symbolic Discovery of Optimization Algorithms", ArXiv, vol. abs/2302.06675, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:256846990.
Y. Nesterov, Lectures on Convex Optimization, 2nd ed., Springer Publishing Company, Incorporated, 2018. ISBN: 3319915770.
B. Grimmer, Provably Faster Gradient Descent via Long Steps, 2023. [Online]. Available: https://arxiv.org/abs/2307.06324, arXiv: 2307.06324 [math.OC].
T. T. Truong, and H. Nguyen, "Backtracking Gradient Descent Method and Some Applications in Large Scale Optimisation. Part 2: Algorithms and Experiments", Applied Mathematics & Optimization, vol. 84, pp. 2557-2586, 2021. DOI: 10.1007/s00245-020-09718-8. [Online]. Available: https://doi.org/10.1007/s00245-020-09718-8.
S. Ling, N. Sharp, and A. Jacobson, VectorAdam for Rotation Equivariant Geometry Optimization, 2022. DOI: 10.48550/ARXIV.2205.13599. [Online]. Available: https://arxiv.org/abs/2205.13599.
A. Kleinsorge, S. Kupper, A. Fauck, and F. Rothe, ELRA: Exponential learning rate adaption gradient descent optimization method, 2023. [Online]. Available: https://arxiv.org/abs/2309.06274, arXiv: 2309.06274 [cs.LG].
a. git, "Python ELRA solver in git", anonymous.4open.science, 2024. [Online]. Available: https://anonymous.4open.science/r/solver-347A/README.md.
S. L. Smith, P. Kindermans, and Q. V. Le, "Don't Decay the Learning Rate, Increase the Batch Size", CoRR, vol. abs/1711.00489, 2017. [Online]. Available: http://arxiv.org/abs/1711.00489.
K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition", in CVPR2016, Jun. 2016, pp. 770-778. DOI: 10.1109/CVPR.2016.90.
S. Zagoruyko, and N. Komodakis, Wide Residual Networks, 2017. [Online]. Available: https://arxiv.org/abs/1605.07146, arXiv: 1605.07146 [cs.CV].
L. Sun, "ResNet on Tiny ImageNet", 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:196590979.
A. Shcherbina, "Tiny ImageNet Challenge", cs231n.stanford.edu, 2016. [Online]. Available: http://cs231n.stanford.edu/reports/2016/pdfs/401_Report.pdf.
M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha, and V. K. Asari, "Improved inception-residual convolutional neural network for object recognition", Neural Computing and Applications, vol. 32, pp. 279-293, 2020. DOI: 10.1007/s00521-018-3627-6. [Online]. Available: https://doi.org/10.1007/s00521-018-3627-6.
P. Zhou, J. Feng, C. Ma, C. Xiong, S. Hoi, and E. Weinan, "Towards Theoretically Understanding Why SGD Generalizes Better than ADAM in Deep Learning", in Proceedings of the 34th International Conference on Neural Information Processing Systems, ser. NIPS'20, Vancouver, BC, Canada: Curran Associates Inc., 2020. ISBN: 9781713829546.
W. R. Huang, Z. Emam, M. Goldblum et al., "Understanding Generalization through Visualizations", CoRR, vol. abs/1906.03291, 2019. [Online]. Available: http://arxiv.org/abs/1906.03291.
Y. Dauphin, and E. D. Cubuk, "Deconstructing the Regularization of BatchNorm", in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=d-XzF81Wg1.
J. Milnor, Lectures on the H-Cobordism Theorem, Princeton: Princeton University Press, 1965. ISBN: 9781400878055. DOI: 10.1515/9781400878055. [Online]. Available: https://doi.org/10.1515/9781400878055.
License
Copyright (c) 2025 Alexander Kleinsorge, Alexander Fauck, Stefan Kupper

This work is licensed under a Creative Commons Attribution 4.0 International License.