The minimum we need to know
There are plenty of resources on Reinforcement Learning out there, so I won’t attempt a full introduction here.
Also, at this point, just asking Gemini or ChatGPT is a great option. Let’s try that, with GPT 4.5, before it gets removed.

Not bad! But let’s focus on just the basics, so we can move on with the Connect 4 model.
Connect 4 and Reinforcement Learning

In the Connect 4 project we are training a model, and that model is the agent. The environment is the Connect 4 board. The agent acts on the environment through actions, that is, dropping a piece into a column that is not full. Each action changes the state of the environment. After each move, and again when the game finishes, the agent gets a reward.
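
To make those terms concrete, here is a minimal Python sketch of the agent/environment loop. It is not the project's actual code: `Connect4Env`, `random_agent` and `play_episode` are hypothetical names, the agent just picks random legal columns instead of using a trained model, and the reward (1 for a winning move, 0 otherwise) is only a placeholder assumption.

```python
import random

ROWS, COLS = 6, 7


class Connect4Env:
    """Toy environment: the board is the state, dropping a piece is the action."""

    def __init__(self):
        # 0 = empty cell, 1 and 2 = the two players. Row 0 is the top row.
        self.board = [[0] * COLS for _ in range(ROWS)]
        self.player = 1

    def legal_actions(self):
        # A column is playable while its top cell is still empty.
        return [c for c in range(COLS) if self.board[0][c] == 0]

    def _wins(self, row, col):
        # Count connected pieces through (row, col) in the four line directions.
        p = self.board[row][col]
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            count = 1
            for sign in (1, -1):
                r, c = row + sign * dr, col + sign * dc
                while 0 <= r < ROWS and 0 <= c < COLS and self.board[r][c] == p:
                    count += 1
                    r, c = r + sign * dr, c + sign * dc
            if count >= 4:
                return True
        return False

    def step(self, col):
        """Apply an action and return (reward, done) for the player who moved."""
        if col not in self.legal_actions():
            raise ValueError(f"column {col} is full or out of range")
        # The piece falls to the lowest empty cell of the column.
        row = max(r for r in range(ROWS) if self.board[r][col] == 0)
        self.board[row][col] = self.player
        won = self._wins(row, col)
        done = won or not self.legal_actions()
        reward = 1.0 if won else 0.0  # placeholder reward (assumption)
        self.player = 3 - self.player  # switch between player 1 and 2
        return reward, done


def random_agent(env):
    # Stand-in for the trained model: choose any legal column.
    return random.choice(env.legal_actions())


def play_episode(seed=0):
    # One full game: the agent acts, the environment changes state
    # and hands back a reward, until the game is over.
    random.seed(seed)
    env = Connect4Env()
    done, moves, reward = False, 0, 0.0
    while not done:
        reward, done = env.step(random_agent(env))
        moves += 1
    return moves, reward
```

Calling `play_episode()` runs one complete game and returns the number of moves played together with the final reward. With this placeholder, every move before the last one gets a reward of 0.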

Most of this is pretty straightforward, but the reward part is tricky. How do we decide what reward to give? How can we give a reward after a move when nobody has won or lost yet? We’ll analyze that in the next articles.