Opposition-based Q(λ) with non-markovian update

Maryam Shokri, Hamid R. Tizhoosh, Mohamed S. Kamel

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

The OQ(λ) algorithm benefits from an extension of eligibility traces called opposition traces. This technique combines the idea of opposition with eligibility traces to deal with large state-space problems in reinforcement learning applications. In our previous work, comparisons of OQ(λ) with conventional Watkins' Q(λ) showed a remarkable increase in performance for the OQ(λ) algorithm. However, the Markovian update of opposition traces is an issue, which is investigated in this paper. It has been assumed that the opposite state can be presented to the agent, which may limit the usability of the technique to deterministic environments. To relax this assumption, the Non-Markovian Opposition-Based Q(λ) (NOQ(λ)) algorithm is introduced in this work. The new method is a hybrid of a Markovian update for eligibility traces and a non-Markovian update for opposition traces. The experimental results show improved learning speed for the proposed technique compared to Q(λ) and OQ(λ). The new technique learns faster than the OQ(λ) algorithm with the same success rate and can be employed for a broader range of applications, since it does not require determining the state transition.
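The abstract does not state any update rules, so the following is only a minimal sketch to make the terms concrete. `q_lambda_step` follows the textbook tabular form of Watkins' Q(λ), the baseline the abstract compares against. `opposition_step` is a hypothetical illustration, not the authors' NOQ(λ) rule: it assumes that an opposite action `a_opp` and an opposite reward `r_opp` are supplied by the caller, and that the opposition trace is updated without looking at any next state. All function names, parameter names, and default values here are assumptions made for illustration.

```python
from typing import List

Table = List[List[float]]  # indexed as table[state][action]


def make_table(n_states: int, n_actions: int) -> Table:
    """Zero-initialised table, usable for Q-values or traces."""
    return [[0.0] * n_actions for _ in range(n_states)]


def greedy_action(Q: Table, s: int) -> int:
    """Index of the highest-valued action in state s (first one on ties)."""
    row = Q[s]
    return max(range(len(row)), key=row.__getitem__)


def q_lambda_step(Q: Table, E: Table, s: int, a: int, r: float,
                  s_next: int, a_next: int,
                  alpha: float = 0.1, gamma: float = 0.9,
                  lam: float = 0.8) -> float:
    """One step of tabular Watkins' Q(lambda); updates Q and E in place."""
    a_star = greedy_action(Q, s_next)
    if Q[s_next][a_next] == Q[s_next][a_star]:
        a_star = a_next  # a tie with the maximum counts as greedy
    # Markovian TD error: bootstraps on the value of the next state.
    delta = r + gamma * Q[s_next][a_star] - Q[s][a]
    E[s][a] += 1.0  # accumulating eligibility trace
    # Traces decay after a greedy action and are cut after an exploratory one.
    decay = gamma * lam if a_next == a_star else 0.0
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * E[i][j]
            E[i][j] *= decay
    return delta


def opposition_step(Q: Table, E_opp: Table, s: int, a_opp: int,
                    r_opp: float, alpha: float = 0.1,
                    gamma: float = 0.9, lam: float = 0.8) -> None:
    """HYPOTHETICAL opposition-trace update (an assumption, not the paper's rule).

    Uses only the opposite reward, with no next-state term, so no opposite
    state transition has to be known.
    """
    E_opp[s][a_opp] += 1.0
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * r_opp * E_opp[i][j]
            E_opp[i][j] *= gamma * lam
```

In this sketch an agent would call `q_lambda_step` for the action it actually took and `opposition_step` for that action's opposite in the same state, which mirrors the hybrid of Markovian and non-Markovian updates that the abstract describes only in general terms.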

Original language: English (US)
Title of host publication: Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
Pages: 288-295
Number of pages: 8
State: Published - 2007
Event: 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007 - Honolulu, HI, United States
Duration: Apr 1 2007 to Apr 5 2007

Conference

Conference: 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007
Country/Territory: United States
City: Honolulu, HI
Period: 4/1/07 to 4/5/07

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
