Reinforcement Learning Tic-Tac-Toe

An interactive Tic-Tac-Toe game with two AI opponents: a Q-Learning agent trained by reinforcement learning and a classic Minimax player.

Python, Reinforcement Learning, Q-Learning, Minimax, Streamlit

Overview

I built a Python reinforcement learning project that trains an AI agent to play Tic-Tac-Toe optimally using Q-Learning. The project also includes an interactive web interface built with Streamlit, so users can play against the trained AI in real time. My goal was to explore reinforcement learning algorithms and show how a machine can learn an optimal strategy through self-play.

My Approach

The project implements two distinct AI opponents: a Q-Learning agent and a classic Minimax AI. For the Q-Learning agent, I designed a custom game environment that handles the board state and win/loss conditions. The state is represented as a 9-character string, and the agent uses an epsilon-greedy policy during training to balance exploration and exploitation.
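As an illustration of the state encoding and the epsilon-greedy policy described above, here is a minimal sketch rather than the project's actual code. It assumes that empty cells are stored as spaces, that the Q-table is a dictionary keyed by (state, action) pairs, and that unseen pairs default to 0; the names legal_moves and choose_action are hypothetical.

```python
import random

EMPTY = " "  # assumption: empty cells are spaces in the 9-character state string


def legal_moves(state: str) -> list[int]:
    """Return the indices (0-8, row-major) of the empty cells."""
    return [i for i, cell in enumerate(state) if cell == EMPTY]


def choose_action(q_table: dict, state: str, epsilon: float, rng=random) -> int:
    """Epsilon-greedy selection over the legal moves of `state`.

    With probability `epsilon` pick a random legal move (exploration);
    otherwise pick a move with the highest Q-value (exploitation),
    breaking ties at random. Unseen (state, action) pairs count as 0.0.
    """
    moves = legal_moves(state)
    if rng.random() < epsilon:
        return rng.choice(moves)
    values = {m: q_table.get((state, m), 0.0) for m in moves}
    best = max(values.values())
    return rng.choice([m for m, v in values.items() if v == best])
```

Setting epsilon to 0 turns this into a purely greedy policy, which is the behaviour one would want when the trained agent plays against a person.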

Reward Structure: The agent receives a +1 reward for winning, -1 for losing (penalized retroactively), and 0 for draws or intermediate moves.
Persistent Learning: The AI saves its learned Q-table so it doesn't have to retrain every time the app starts.
Interactive UI: The Streamlit app lets users challenge the AI directly in the browser. During gameplay the agent stops exploring and always plays the best move from its learned policy.
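The reward structure and the persistent Q-table can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the learning rate and discount factor values, the pickle file format, and the function names are all hypothetical, and "penalized retroactively" is interpreted here as applying the -1 reward to the agent's last move once the opponent's reply ends the game.

```python
import pickle
from collections import defaultdict
from pathlib import Path

ALPHA = 0.1  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)
REWARDS = {"win": 1.0, "loss": -1.0, "draw": 0.0}  # intermediate moves also get 0.0


def q_update(q_table, state, action, reward, next_state, next_moves):
    """One tabular Q-learning step.

    Q(s, a) <- Q(s, a) + ALPHA * (reward + GAMMA * max_a' Q(s', a') - Q(s, a))
    For terminal transitions pass an empty `next_moves`, so the future term is 0.
    """
    future = max((q_table[(next_state, a)] for a in next_moves), default=0.0)
    target = reward + GAMMA * future
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])


def penalize_last_move(q_table, last_state, last_action):
    """Apply the loss penalty to the agent's previous move.

    A loss only becomes known after the opponent's reply, so the -1 reward
    is credited to the last (state, action) pair the agent played.
    """
    q_update(q_table, last_state, last_action, REWARDS["loss"], None, [])


def save_q_table(q_table, path):
    """Write the Q-table to disk as a plain dict."""
    with open(path, "wb") as f:
        pickle.dump(dict(q_table), f)


def load_q_table(path):
    """Load a saved Q-table, or return an empty one if no file exists yet."""
    q_table = defaultdict(float)
    if Path(path).exists():
        with open(path, "rb") as f:
            q_table.update(pickle.load(f))
    return q_table
```

Loading at startup and falling back to an empty table keeps the app usable before any training has been saved. Pickle files should only be loaded from trusted sources.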

Results & Learnings

The Q-Learning agent learned an optimal policy, showing that reinforcement learning can master a small perfect-information game through self-play alone. Building this project deepened my understanding of the Bellman equation, reward shaping, and the trade-offs between AI approaches such as Minimax and Q-Learning. Integrating the Python backend with a Streamlit frontend also gave me practical experience in deploying interactive machine learning applications.
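To make the contrast with Q-Learning concrete, here is a minimal Minimax sketch. It is illustrative only and makes its own assumptions: the board is a 9-character string with spaces for empty cells, scores are taken from X's point of view, and the names winner and minimax are hypothetical. Where Q-Learning estimates move values from experience, Minimax computes them exactly by searching every continuation.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, as row-major cell indices.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: str):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board: str, player: str):
    """Return (score, move) for `player` to move on `board`.

    Scores are from X's perspective: +1 if X wins, -1 if O wins, 0 for a draw.
    X maximizes the score and O minimizes it. `move` is None at terminal boards.
    """
    w = winner(board)
    if w is not None:
        return (1 if w == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None  # board full with no winner: draw
    other = "O" if player == "X" else "X"
    best_score, best_move = None, None
    for m in moves:
        child = board[:m] + player + board[m + 1:]
        score, _ = minimax(child, other)
        if best_score is None or (
            score > best_score if player == "X" else score < best_score
        ):
            best_score, best_move = score, m
    return best_score, best_move
```

Calling minimax(" " * 9, "X") returns a score of 0, reflecting that Tic-Tac-Toe is a draw under optimal play. The cache keeps the full search fast because the game has only a few thousand reachable positions.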