Reinforcement Learning Tic-Tac-Toe
An interactive Tic-Tac-Toe game with two AI opponents: a Q-Learning agent trained through reinforcement learning and a classic Minimax player.
Overview
I built a Python-based reinforcement learning project that trains an AI agent to play Tic-Tac-Toe perfectly using Q-Learning. The project also includes an interactive web interface built with Streamlit, so users can play against the trained AI in real time. My goal was to explore reinforcement learning algorithms and show how a machine can learn optimal strategies through self-play.
My Approach
The project implements two distinct AI opponents: a Q-Learning agent and a classic Minimax AI. For the Q-Learning agent, I designed a custom game environment that handles the board state and win/loss conditions. The state is represented as a 9-character string, and the agent uses an epsilon-greedy policy during training to balance exploration and exploitation.
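As a minimal sketch of the epsilon-greedy policy over a 9-character state string, the snippet below may help. The empty-cell marker, the `(state, action)` key layout of the Q-table, and the names `legal_moves` and `choose_action` are assumptions made for illustration, not necessarily what the project uses.

```python
import random
from collections import defaultdict

EMPTY = " "  # assumed marker for an empty cell in the 9-character state


def legal_moves(state: str) -> list[int]:
    """Return the indices (0-8) of the empty cells in the board string."""
    return [i for i, cell in enumerate(state) if cell == EMPTY]


def choose_action(q_table, state: str, epsilon: float, rng=random) -> int:
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    moves = legal_moves(state)
    if rng.random() < epsilon:
        # Explore: pick any legal move uniformly at random.
        return rng.choice(moves)
    # Exploit: pick the legal move with the highest learned Q-value.
    return max(moves, key=lambda action: q_table[(state, action)])


# Hypothetical Q-table: unseen (state, action) pairs default to 0.0.
q_table = defaultdict(float)
```

With `epsilon=0` the function always exploits, which matches how a trained agent would be used during actual play; a larger `epsilon` during training keeps the agent trying moves it has not yet evaluated.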
- Reward Structure: The agent receives +1 for winning, -1 for losing, and 0 for draws or intermediate moves. Because a loss only becomes known after the opponent's move, the -1 penalty is applied retroactively to the agent's previous move.
- Persistent Learning: The AI saves its learned Q-table so it doesn't have to retrain every time the app starts.
- Interactive UI: The Streamlit app lets users challenge the AI. During actual gameplay the agent does not explore; it always plays the move its learned policy rates highest.
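The reward structure and the saved Q-table described above can be sketched roughly as follows. This is a standard tabular Q-learning update, not necessarily the exact code in the project; the values of `ALPHA` and `GAMMA`, the function names, and the use of `pickle` for storage are all assumptions.

```python
import pickle
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def q_update(q_table, state, action, reward, next_state, next_moves):
    """One tabular Q-learning step: Q(s, a) += ALPHA * (target - Q(s, a))."""
    # A terminal position has no next moves, so its future value is 0.
    best_next = max((q_table[(next_state, a)] for a in next_moves), default=0.0)
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])


def save_q_table(q_table, path):
    """Persist the Q-table so the app can skip retraining on startup."""
    with open(path, "wb") as f:
        pickle.dump(dict(q_table), f)


def load_q_table(path):
    """Load a saved Q-table; unseen (state, action) pairs default to 0.0."""
    with open(path, "rb") as f:
        return defaultdict(float, pickle.load(f))
```

For a winning move the call would pass `reward=1.0` and an empty `next_moves`. For a loss, the same function can be called with `reward=-1.0` on the agent's previous `(state, action)` pair once the opponent's winning move has been observed.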
Results & Learnings
The Q-Learning agent learned an optimal policy through self-play, showing that tabular reinforcement learning can master a small perfect-information game. Building this project deepened my understanding of the Bellman equation, reward shaping, and the trade-offs between different AI approaches like Minimax and Q-Learning. Integrating the Python backend with a Streamlit frontend also gave me valuable experience in deploying interactive machine learning applications.