Active exploratory Q-learning for large problems

Xianghai Wu, Jonathan Kofman, Hamid R. Tizhoosh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Although reinforcement learning (RL) emerged more than a decade ago, it is still under extensive investigation in its application to large problems, where states and actions are multi-dimensional and continuous, giving rise to the so-called curse of dimensionality. Conventional RL methods remain inefficient in very large state-action spaces, while value-function generalization-based approaches require a very large number of good training examples. This paper presents an active exploratory approach to address the challenge of RL in large problems. The core principle of this approach is that the agent does not rush to the next state. Instead, it first attempts a number of actions at the current state and then selects the action that returns the greatest immediate reward. The state resulting from performing the selected action becomes the next state. Four active-exploration algorithms for finding good actions are proposed: random-based search, opposition-based random search, search by cyclical adjustment, and opposition-based cyclical adjustment of each action dimension. The efficiency of these algorithms is evaluated in a visual-servoing experiment with a 6-axis robot.
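
The sketch below illustrates the core principle described in the abstract, in the spirit of the random-based search variant: probe several candidate actions at the current state, commit to the one with the greatest immediate reward, and apply a standard Q-learning update. It is not the paper's implementation; the `env.probe(state, action)` call, which evaluates an action's immediate reward and resulting state without committing to the transition, is a hypothetical interface assumed for illustration.

```python
import random

def active_exploratory_step(env, Q, state, actions, n_trials=5, alpha=0.1, gamma=0.9):
    """One illustrative step of active exploratory Q-learning (sketch only).

    Instead of rushing to the next state, the agent probes several candidate
    actions at the current state and commits to the most rewarding one
    (random-based search variant). Assumes a hypothetical
    env.probe(state, action) -> (reward, next_state).
    """
    # Randomly pick a handful of candidate actions to try at the current state.
    candidates = random.sample(actions, min(n_trials, len(actions)))

    # Probe each candidate and keep the one with the greatest immediate reward.
    action, (reward, next_state) = max(
        ((a, env.probe(state, a)) for a in candidates),
        key=lambda item: item[1][0],
    )

    # Standard Q-learning update on the selected transition.
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old_q = Q.get((state, action), 0.0)
    Q[(state, action)] = old_q + alpha * (reward + gamma * best_next - old_q)

    # The state reached by the selected action becomes the next state.
    return next_state
```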

Original language: English (US)
Title of host publication: 2007 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2007
Pages: 4040-4045
Number of pages: 6
DOIs
State: Published - 2007
Event: 2007 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2007 - Montreal, QC, Canada
Duration: Oct 7, 2007 – Oct 10, 2007

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
ISSN (Print): 1062-922X

Conference

Conference: 2007 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2007
Country/Territory: Canada
City: Montreal, QC
Period: 10/7/07 – 10/10/07

ASJC Scopus subject areas

  • General Engineering
