Rough Notes

What is Reinforcement Learning

In which we learn how to learn

25 Apr 2025

The minimum we need to know

There are lots of resources about Reinforcement Learning out there, so I won’t attempt a full introduction here.

Also, at this point, just asking Gemini or ChatGPT is a great option. Let’s try that with GPT-4.5, before it gets removed.

Write a one-page summary of Reinforcement Learning and how it works. Make it simple for an audience with a technical background but no experience in Machine Learning.


Not bad! But let’s focus on just the basics, so we can move on with the Connect-4 model.

Connect 4 and Reinforcement Learning

Gemini's idea of using RL to train a Connect 4 model


In the Connect 4 project we are training a model. This model will be the agent. The environment is the Connect 4 board. The agent modifies the environment using actions, that is, dropping a piece in a non-full column. That changes the state of the environment. After each move, and when the game finishes, the agent gets a reward.
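To make the agent, environment, action, state and reward vocabulary concrete, here is a minimal Python sketch of that loop. It is not the project's actual code: `Connect4Env`, `random_agent` and `play_episode` are hypothetical names, a random policy stands in for the model, and the reward (1 for a winning move, 0 otherwise) is only a placeholder assumption.

```python
import random

ROWS, COLS = 6, 7


class Connect4Env:
    """Hypothetical minimal environment: the board is the state."""

    def __init__(self):
        # 0 = empty cell, 1 and 2 = the two players' pieces. Row 0 is the top.
        self.board = [[0] * COLS for _ in range(ROWS)]

    def legal_actions(self):
        # An action is a column index; a column is playable while its top cell is empty.
        return [c for c in range(COLS) if self.board[0][c] == 0]

    def step(self, action, player):
        # Drop the piece into the lowest empty cell of the chosen column.
        for r in range(ROWS - 1, -1, -1):
            if self.board[r][action] == 0:
                self.board[r][action] = player
                break
        else:
            raise ValueError("column is full")
        won = self._wins(r, action, player)
        done = won or not self.legal_actions()
        reward = 1.0 if won else 0.0  # placeholder reward, an assumption
        return self.board, reward, done

    def _wins(self, r, c, player):
        # Count connected pieces through (r, c) along the four line directions.
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            count = 1
            for sign in (1, -1):
                rr, cc = r + sign * dr, c + sign * dc
                while 0 <= rr < ROWS and 0 <= cc < COLS and self.board[rr][cc] == player:
                    count += 1
                    rr, cc = rr + sign * dr, cc + sign * dc
            if count >= 4:
                return True
        return False


def random_agent(env, rng):
    # Stand-in for the trained model: pick any legal column at random.
    return rng.choice(env.legal_actions())


def play_episode(seed=0):
    rng = random.Random(seed)
    env = Connect4Env()
    player, done, reward, moves = 1, False, 0.0, 0
    while not done:
        action = random_agent(env, rng)             # agent chooses an action
        _, reward, done = env.step(action, player)  # environment changes state
        moves += 1
        if not done:
            player = 3 - player                     # switch between players 1 and 2
    return player, reward, moves  # last player to move, final reward, move count
```

Calling `play_episode()` plays one full game between two random players. Notice that with this placeholder every move except a winning one earns 0, so the agent gets almost no feedback during the game.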

ChatGPT's version is certainly different


Most of this is pretty straightforward, but the reward part is tricky. How do we decide what reward to give? How can we give a reward after a move if nobody has won or lost yet? We’ll analyze that in the next articles.