Carrot and Stick - Part 2 - Q Learning

From Theory to Practice

When I think of Reinforcement Learning, I usually picture an agent or robot traveling through a maze, avoiding traps and collecting supplies. At each step it observes its state and tries to estimate the best action to take based on all the experience it has gained. The way I visualize it, in each state the robot scans through a database, looks up all the valid actions it can take in that state, and picks the one with the best chance of being optimal. Q Learning is a fundamental Reinforcement Learning algorithm that works much like this, and this post is dedicated to it. By the end of this post you will be able to write your own Q Learning agent and test it in an interactive environment.
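To make the "database scan" intuition concrete, here is a minimal sketch of that lookup step in plain Python. The names here (`q_table`, `choose_action`) are illustrative, not part of the Carrot and Stick framework: the table maps (state, action) pairs to value estimates, and the agent picks the valid action with the highest estimate, occasionally exploring at random.

```python
import random
from collections import defaultdict

# Illustrative Q-table: maps (state, action) pairs to estimated values.
# Unseen pairs default to 0.0, so the agent starts with no preferences.
q_table = defaultdict(float)

def choose_action(state, valid_actions, epsilon=0.1):
    """Pick the best-known action; explore randomly with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(valid_actions)
    # Greedy choice: the action with the highest Q value in this state.
    return max(valid_actions, key=lambda a: q_table[(state, a)])

# Example: after learning, the agent prefers the higher-valued action.
q_table[("maze_cell_3", "left")] = 0.2
q_table[("maze_cell_3", "right")] = 0.8
print(choose_action("maze_cell_3", ["left", "right"], epsilon=0.0))  # → right
```

How those Q values get updated from experience is exactly what the algorithm below is about.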

In my previous post we talked about the Carrot and Stick framework - a Reinforcement Learning framework for all your Reinforcement Learning needs. We introduced all the modules of the framework - Agent, Decision Model, Game and World. We also showed how to implement the Hill Climb algorithm using these modules.

In this post we will go into depth (but not too much) on how Q Learning works, implement it with Carrot and Stick, run a game, and compare the results to the Hill Climb algorithm.

Q Learning in Theory

Temporal Difference



© 2019. All rights reserved