TY - GEN
T1 - Training Artificial Neural Networks by Coordinate Search Algorithm
AU - Rokhsatyazdi, Ehsan
AU - Rahnamayan, Shahryar
AU - Miyandoab, Sevil Zanjani
AU - Bidgoli, Azam Asilian
AU - Tizhoosh, H. R.
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Training Artificial Neural Networks (ANNs) poses a challenging and critical problem in machine learning. Despite the effectiveness of gradient-based learning methods, such as Stochastic Gradient Descent (SGD), in training neural networks, they have several limitations. For instance, they require differentiable activation functions and cannot simultaneously optimize a model with respect to several independent non-differentiable loss functions; for example, the F1-score, which is normally computed only during testing, could be used directly as a training objective if a gradient-free optimization algorithm were employed. Furthermore, training a DNN (i.e., optimizing its weights) should remain possible even with a small training dataset. To address these concerns, we propose an efficient version of the gradient-free Coordinate Search (CS) algorithm, an instance of General Pattern Search (GPS) methods, for training (i.e., optimizing) neural networks. The proposed algorithm can be used with non-differentiable activation functions and can be tailored to multi-objective/multi-loss problems. Finding the optimal values for the weights of an ANN is a large-scale optimization problem. Therefore, instead of finding the optimal value for each variable, which is the common technique in classical CS, we accelerate optimization and convergence by bundling the variables (i.e., weights). In effect, this strategy is a form of dimension reduction for the optimization problem. Based on the experimental results, the proposed method is comparable to the SGD algorithm and, in some cases, outperforms the gradient-based approach, particularly in situations with insufficient labeled training data. The performance plots demonstrate a high convergence rate, highlighting the capability of the proposed method to find a reasonable solution with fewer function calls. At present, gradient-based algorithms such as SGD and Adam are the only practical and efficient way of training ANNs with hundreds of thousands of weights; in this paper, we introduce an alternative method for training ANNs.
AB - Training Artificial Neural Networks (ANNs) poses a challenging and critical problem in machine learning. Despite the effectiveness of gradient-based learning methods, such as Stochastic Gradient Descent (SGD), in training neural networks, they have several limitations. For instance, they require differentiable activation functions and cannot simultaneously optimize a model with respect to several independent non-differentiable loss functions; for example, the F1-score, which is normally computed only during testing, could be used directly as a training objective if a gradient-free optimization algorithm were employed. Furthermore, training a DNN (i.e., optimizing its weights) should remain possible even with a small training dataset. To address these concerns, we propose an efficient version of the gradient-free Coordinate Search (CS) algorithm, an instance of General Pattern Search (GPS) methods, for training (i.e., optimizing) neural networks. The proposed algorithm can be used with non-differentiable activation functions and can be tailored to multi-objective/multi-loss problems. Finding the optimal values for the weights of an ANN is a large-scale optimization problem. Therefore, instead of finding the optimal value for each variable, which is the common technique in classical CS, we accelerate optimization and convergence by bundling the variables (i.e., weights). In effect, this strategy is a form of dimension reduction for the optimization problem. Based on the experimental results, the proposed method is comparable to the SGD algorithm and, in some cases, outperforms the gradient-based approach, particularly in situations with insufficient labeled training data. The performance plots demonstrate a high convergence rate, highlighting the capability of the proposed method to find a reasonable solution with fewer function calls. At present, gradient-based algorithms such as SGD and Adam are the only practical and efficient way of training ANNs with hundreds of thousands of weights; in this paper, we introduce an alternative method for training ANNs.
KW - Artificial Neural Network (ANN)
KW - Coordinate Search
KW - Expensive Optimization
KW - Gradient-free
KW - Large-Scale Optimization
KW - Stochastic Gradient Descent (SGD)
UR - http://www.scopus.com/inward/record.url?scp=85182928622&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182928622&partnerID=8YFLogxK
U2 - 10.1109/SSCI52147.2023.10371958
DO - 10.1109/SSCI52147.2023.10371958
M3 - Conference contribution
AN - SCOPUS:85182928622
T3 - 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
SP - 1540
EP - 1546
BT - 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023
Y2 - 5 December 2023 through 8 December 2023
ER -