TY - GEN
T1 - Tradeoff between exploration and exploitation of OQ(λ) with non-Markovian update in dynamic environments
AU - Shokri, Maryam
AU - Tizhoosh, Hamid R.
AU - Kamel, Mohamed S.
PY - 2008
Y1 - 2008
N2 - This paper presents some investigations on tradeoff between exploration and exploitation of opposition-based Q(λ) with non-Markovian update (NOQ(λ)) in a dynamic environment. In the previous work the authors applied NOQ(λ) to the deterministic GridWorld problem. In this paper, we have implemented the NOQ(λ) algorithm for a simple elevator control problem to test the behavior of the algorithm for nondeterministic and dynamic environment. We also extend the NOQ(λ) algorithm by introducing the opposition weight to find a better tradeoff between exploration and exploitation for the NOQ(λ) technique. The value of the opposition weight increases as the number of steps increases. Hence, it has more positive effects on the Q-value updates for opposite actions as the learning progresses. The performance of NOQ(λ) method is compared with Q(λ) technique. The experiments indicate that NOQ(λ) performs better than Q(λ).
AB - This paper presents some investigations on tradeoff between exploration and exploitation of opposition-based Q(λ) with non-Markovian update (NOQ(λ)) in a dynamic environment. In the previous work the authors applied NOQ(λ) to the deterministic GridWorld problem. In this paper, we have implemented the NOQ(λ) algorithm for a simple elevator control problem to test the behavior of the algorithm for nondeterministic and dynamic environment. We also extend the NOQ(λ) algorithm by introducing the opposition weight to find a better tradeoff between exploration and exploitation for the NOQ(λ) technique. The value of the opposition weight increases as the number of steps increases. Hence, it has more positive effects on the Q-value updates for opposite actions as the learning progresses. The performance of NOQ(λ) method is compared with Q(λ) technique. The experiments indicate that NOQ(λ) performs better than Q(λ).
UR - http://www.scopus.com/inward/record.url?scp=56349096864&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56349096864&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2008.4634208
DO - 10.1109/IJCNN.2008.4634208
M3 - Conference contribution
AN - SCOPUS:56349096864
SN - 9781424418213
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 2915
EP - 2921
BT - 2008 International Joint Conference on Neural Networks, IJCNN 2008
T2 - 2008 International Joint Conference on Neural Networks, IJCNN 2008
Y2 - 1 June 2008 through 8 June 2008
ER -