MCTS-Based Policy Improvement for Reinforcement Learning
Curriculum Learning (CL) is a powerful area of Machine Learning that offers effective techniques for improving training performance on the same data, regardless of the training method used. In this work, we propose a novel Monte Carlo Tree Search (MCTS)-based technique that applies MCTS to Curriculum Learning to enhance model performance. The proposed approach uses MCTS to optimize the order of batches during training. We first demonstrate the method in Reinforcement Learning, where sparse rewards often slow convergence and degrade performance. By leveraging the strategic planning and exploration capabilities of MCTS, our method systematically identifies and selects trajectories that are more informative and more likely to drive policy improvement. This MCTS-guided batch optimization focuses learning on valuable experiences, accelerating convergence and improving overall performance. We evaluate our approach on standard RL benchmarks and show that it outperforms conventional batch selection methods in both learning speed and policy quality. These results highlight the potential of combining MCTS with CL to optimize batch selection, offering a promising direction for future research in efficient Reinforcement Learning.
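As a rough illustration of the core idea, the sketch below uses a generic UCT-style MCTS to search for an ordering of training batches. The per-batch informativeness scores, the discount factor that front-loads informative batches, and the function name are illustrative assumptions for this sketch, not the paper's actual algorithm or reward signal.

```python
import math
import random

def mcts_batch_order(batch_scores, iterations=2000, c=1.4, seed=0):
    """Illustrative MCTS over batch orderings: the 'reward' of an
    ordering is a discounted sum of hypothetical per-batch
    informativeness scores, so informative batches are favored early."""
    rng = random.Random(seed)
    n = len(batch_scores)

    def reward(order):
        # Discounted return: an informative batch placed earlier scores higher.
        return sum(0.9 ** t * batch_scores[b] for t, b in enumerate(order))

    # Tree statistics keyed by the partial ordering (a tuple of batch ids).
    visits, values = {(): 0}, {(): 0.0}

    for _ in range(iterations):
        node, path = (), [()]
        # Selection: descend while every child of the current node is expanded.
        while len(node) < n:
            unused = [b for b in range(n) if b not in node]
            children = [node + (b,) for b in unused]
            fresh = [ch for ch in children if ch not in visits]
            if fresh:
                # Expansion: add one unexplored child to the tree.
                node = rng.choice(fresh)
                visits[node], values[node] = 0, 0.0
                path.append(node)
                break
            # UCT rule: balance mean value against an exploration bonus.
            node = max(children, key=lambda ch: values[ch] / visits[ch]
                       + c * math.sqrt(math.log(visits[path[-1]]) / visits[ch]))
            path.append(node)
        # Rollout: complete the partial ordering uniformly at random.
        rest = [b for b in range(n) if b not in node]
        rng.shuffle(rest)
        r = reward(list(node) + rest)
        # Backpropagation: update statistics along the visited path.
        for nd in path:
            visits[nd] += 1
            values[nd] += r

    # Extract the final ordering by greedily following most-visited children.
    order = ()
    while len(order) < n:
        children = [order + (b,) for b in range(n) if b not in order]
        order = max(children, key=lambda ch: visits.get(ch, 0))
    return list(order)
```

In a real training loop, the informativeness scores would be replaced by a signal derived from the agent's experience (e.g. observed returns or policy-improvement estimates), and the chosen ordering would determine which batches are fed to the learner next.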