Rough Notes

Training a model to play Connect-4 - I

In which I explain what this is about

04 Jan 2025

Why Oh Why

I have a clear memory of several moments when AI made an impression on me. The first was when I learned about gradient descent back in college (in a class I basically never attended, like most of my classes; the fact that I still got a degree is a clear sign my university failed). In the last few years, generative AI has appeared as an almost magical technology (most people focus on the failures and the unfulfilled hype, but I can't help but recall how these advancements seemed impossible just a few years ago). But the one that really resonates with me is AlphaZero (I knew about AlphaGo, but I don't think I paid much attention to it).

Back in college, while I was skipping classes and exams, I spent a lot of time playing cards. Mostly typical Spanish games like Tute or Mus, usually with no serious bets — just the loser paying for the beers. I didn’t really understand gradient descent back then, but I had the idea of training a program to play cards. It sounded so cool!

My typical college studying setup (source)


I never got to it, partly because I didn’t know how to start, partly because it was the 90s, and the technology wasn’t there yet. But the idea of training a model to play a game stuck with me, and the success of AlphaZero pushed me to start reading about Machine Learning and Deep Learning (slightly) before it was trendy.

All this is to say I really wanted to start a project like this: training a model to play a game. Now the technology is definitely there, there are lots of internet communities dedicated to it, and for something simple you just need a personal computer.

The project

Connect-4 is a solved game: with perfect play, the first player always wins (by starting in the center column). It is a simple game in which two players take turns dropping colored disks into a vertical grid with 7 columns and 6 rows, trying to be the first to line up four of their own disks.

Connect-4 board (image source: Wikipedia)


It is also a game with roughly 4.5 trillion legal positions. So, it should be complex enough to require some work to find the right approach. What I don’t know (yet) is whether it will be too challenging for a learning experiment on a personal computer. We’ll see.
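
To give an idea of how simple the game logic itself is, here is a rough sketch of a board representation and the drop move. The names (new_board, valid_moves, drop_piece) are just illustrative and not necessarily what will end up in the repo:

```python
# Minimal sketch of a Connect-4 board as a 6x7 NumPy array.
import numpy as np

ROWS, COLS = 6, 7  # standard board size

def new_board():
    """Empty board: 0 = empty, 1 = first player, -1 = second player."""
    return np.zeros((ROWS, COLS), dtype=np.int8)

def valid_moves(board):
    """Columns that still have room (their top cell is empty)."""
    return [c for c in range(COLS) if board[0, c] == 0]

def drop_piece(board, col, player):
    """Drop a disk for `player` into `col`; it falls to the lowest empty row."""
    for row in range(ROWS - 1, -1, -1):
        if board[row, col] == 0:
            board[row, col] = player
            return row
    raise ValueError(f"column {col} is full")
```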

I am using this project to learn about Reinforcement Learning, probably starting with Deep Q-learning, but I still don’t know if that is the right approach. The exciting part is that there will be many other things to learn along the way.
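
For reference, Deep Q-learning here would mean training a network that maps a board position to one value per column (the estimated return of dropping a disk there) and then picking actions greedily from those values. A rough, illustrative sketch of what such a network could look like, with placeholder layer sizes:

```python
# Sketch of a Q-network for Connect-4: 6x7 board in, 7 Q-values out
# (one per column). Layer sizes are placeholders, not a final design.
import torch
import torch.nn as nn

class ConnectFourQNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),          # 6x7 board -> 42 inputs
            nn.Linear(42, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, 7),     # one Q-value per column (action)
        )

    def forward(self, board):
        # board: (batch, 6, 7) tensor of {-1, 0, 1}
        return self.net(board.float())

q_net = ConnectFourQNet()
board = torch.zeros(1, 6, 7)            # empty board, batch of one
action = q_net(board).argmax(dim=1)     # greedy column choice
# In practice this would be wrapped in epsilon-greedy exploration and
# masked so that full columns can't be chosen.
```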

I will be uploading the code to this repo.