Széchenyi Plan Plus | Government of Hungary. Funded by the European Union. NextGeneration EU.

IEEE Access (Vol. 11) / 4 December 2023

Enhanced Experience Prioritization: A Novel Upper Confidence Bound Approach

Value-based Reinforcement Learning algorithms achieve superior performance by using experiences gathered in the past to update their value function. In most cases, this is accomplished by applying a sampling strategy to an experience buffer, in which state transitions are stored during training. However, designing such strategies is not intuitive. General theoretical approaches tend to estimate the expected learning progress from each experience, so that neural network updates can be carried out efficiently. A proper choice of method can both accelerate and significantly stabilize training by increasing sampling efficiency, which indirectly reduces time and computing capacity requirements. Because one of the most critical limitations of Machine Learning (ML) based techniques is the lack of sufficient computing power, the search for optimal solutions has long been a research topic in Reinforcement Learning. The main focus of this research has therefore been to develop an experience prioritization method that achieves competitive performance while considerably lowering the overall cost of training. In this paper, we propose a novel priority value assignment concept for experience prioritization in Reinforcement Learning, based on the Upper Confidence Bound algorithm. Furthermore, we present empirical findings showing that our solution outperforms the current state of the art in sampling efficiency, while enabling faster and more cost-efficient training.
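
The abstract does not state the priority formula itself, so the following is only a minimal sketch of how an Upper Confidence Bound score could drive sampling from an experience buffer. It assumes the classic UCB1 form (a running mean of the absolute TD error plus an exploration bonus that shrinks as a transition is replayed more often). The class name `UCBReplayBuffer`, the exploration weight `c`, and the small constant `eps` are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


class UCBReplayBuffer:
    """Toy replay buffer with UCB1-style priorities (illustrative, not the paper's method)."""

    def __init__(self, capacity, c=1.0, eps=1e-6):
        self.capacity = capacity
        self.c = c                # exploration weight (assumed parameter)
        self.eps = eps            # keeps every sampling weight strictly positive
        self.transitions = []     # stored (state, action, reward, next_state, done) tuples
        self.mean_error = []      # running mean of |TD error| per transition
        self.counts = []          # number of TD-error updates per transition
        self.total_updates = 0    # number of TD-error updates over the whole buffer
        self.next_slot = 0        # slot to overwrite once the buffer is full

    def add(self, transition):
        """Store a transition, overwriting the oldest slot when full."""
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.mean_error.append(0.0)
            self.counts.append(0)
        else:
            i = self.next_slot
            self.transitions[i] = transition
            self.mean_error[i] = 0.0
            self.counts[i] = 0
        self.next_slot = (self.next_slot + 1) % self.capacity

    def priority(self, i):
        """Mean |TD error| plus a bonus that is larger for rarely replayed transitions."""
        bonus = self.c * math.sqrt(
            math.log(self.total_updates + 1) / (self.counts[i] + 1)
        )
        return self.mean_error[i] + bonus + self.eps

    def sample(self, batch_size):
        """Draw indices with probability proportional to their priority."""
        weights = [self.priority(i) for i in range(len(self.transitions))]
        indices = random.choices(range(len(self.transitions)), weights=weights, k=batch_size)
        return indices, [self.transitions[i] for i in indices]

    def update(self, indices, td_errors):
        """Fold newly computed TD errors into the running means."""
        for i, err in zip(indices, td_errors):
            self.counts[i] += 1
            self.total_updates += 1
            self.mean_error[i] += (abs(err) - self.mean_error[i]) / self.counts[i]
```

In a training loop under these assumptions, one would call `sample` to obtain a batch, compute the TD errors of that batch with the current value network, and pass them back through `update`. Transitions with a large average error or few replays then receive a higher sampling weight on later draws.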

URL
https://doi.org/10.1109/ACCESS.2023.3339248
Authors
Kővári, B.
Pelenczei, B.
Bécsi, T.

Contact

Prof. Dr. Péter Gáspár

H-1111 Budapest, Kende u. 13-17.

+36 1 279 6000

autonom@nemzetilabor.hu

© 2020-2023 National Laboratory for Autonomous Systems, Budapest