2024-12-07 00:40:57 +04:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lab work: getting started"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### What needs to be done:\n",
"Deploy and run a reinforcement learning project for the game of tic-tac-toe. Port the project to the gymnasium library and a current version of Python. Implement the tic-tac-toe agent as a separate class (following the example from the lecture). Rewrite the main training loop to work with the separate agent class (following the example from the lecture). Test the new version of the program."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our tic-tac-toe project: https://github.com/nczempin/gym-tic-tac-toe"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Porting the project to the gymnasium library\n",
"Gymnasium is an open-source Python library that provides standardized environments for developing and testing reinforcement learning (RL) algorithms. It is the maintained fork of OpenAI Gym and has been developed under the new name by the Farama Foundation since 2022.\n",
"\n",
"The library lets developers of RL agents interact with a variety of simulations, from simple game tasks to complex physics models. Its unified interface makes it easier to test and compare algorithms.\n",
"\n",
"**Key features of Gymnasium:**\n",
"- Standardized interfaces for interacting with an environment.\n",
"- Straightforward integration with popular RL libraries.\n",
"- Support for custom environments.\n",
"\n",
"**Core functions of Gymnasium:**\n",
"- Creating an environment (`gymnasium.make()`): creates an environment instance from its registered ID.\n",
"- Resetting an environment (`reset()`): returns the initial observation and an `info` dictionary.\n",
"- Taking an action (`step(action)`): passes the agent's action to the environment and returns the new observation, the reward, the `terminated` and `truncated` flags, and an `info` dictionary.\n",
"- Closing an environment (`close()`): releases the resources associated with the environment.\n",
"- Rendering (`render()`): visualizes the current state of the environment."
]
},
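{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reset/step/close protocol listed above can be sketched as a plain episode loop. This is a minimal sketch, not code from the original project: `CoinFlipEnv` and `run_episode` are hypothetical names, and the class only imitates the Gymnasium method signatures with the standard library so that the loop runs without any installed environment.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"\n",
"class CoinFlipEnv:\n",
"    # Hypothetical stand-in that imitates the Gymnasium reset/step/close protocol.\n",
"    def __init__(self, max_steps=5):\n",
"        self.max_steps = max_steps\n",
"        self.steps = 0\n",
"        self.rng = random.Random()\n",
"\n",
"    def reset(self, seed=None, options=None):\n",
"        # Gymnasium-style reset returns (observation, info).\n",
"        self.rng = random.Random(seed)\n",
"        self.steps = 0\n",
"        return 0, {}\n",
"\n",
"    def step(self, action):\n",
"        # Gymnasium-style step returns\n",
"        # (observation, reward, terminated, truncated, info).\n",
"        self.steps += 1\n",
"        coin = self.rng.randint(0, 1)\n",
"        reward = 1 if action == coin else 0\n",
"        terminated = False  # this toy task has no terminal state\n",
"        truncated = self.steps >= self.max_steps  # stop at the step limit\n",
"        return coin, reward, terminated, truncated, {}\n",
"\n",
"    def close(self):\n",
"        pass\n",
"\n",
"\n",
"def run_episode(env, seed=0):\n",
"    observation, info = env.reset(seed=seed)\n",
"    total_reward = 0\n",
"    done = False\n",
"    while not done:\n",
"        action = observation  # naive policy: repeat the last coin seen\n",
"        observation, reward, terminated, truncated, info = env.step(action)\n",
"        total_reward += reward\n",
"        done = terminated or truncated\n",
"    env.close()\n",
"    return total_reward\n",
"```\n",
"\n",
"With a real Gymnasium environment the loop body should look the same; only the construction of `env` changes."
]
},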
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import gymnasium as gym\n",
"from gymnasium import spaces\n",
"\n",
"\n",
"class TicTacToeEnv(gym.Env):\n",
"    metadata = {'render_modes': ['human']}\n",
"\n",
"    # Index = cell value + 1: -1 -> 'O', 0 -> empty, 1 -> 'X'.\n",
"    symbols = ['O', ' ', 'X']\n",
"\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        # Discrete action space (0-8): the cell numbers on the board.\n",
"        # Note that step() expects a (player, square) pair, as produced by move_generator().\n",
"        self.action_space = spaces.Discrete(9)\n",
"        # Nominal size: 9 cells × 3 cell states (empty, X, O) × 2 players (who is on move).\n",
"        # The observation actually returned by reset() and step() is the state dict.\n",
"        self.observation_space = spaces.Discrete(9 * 3 * 2)\n",
"        self.reset()\n",
"\n",
"    def step(self, action):\n",
"        done = False\n",
"        reward = 0\n",
"\n",
"        p, square = action  # p is the player (1 or -1), square is the cell number\n",
"\n",
"        board = self.state['board']\n",
"        proposed = board[square]\n",
"        om = self.state['on_move']\n",
"        if proposed != 0:  # The cell is already occupied\n",
"            print(f\"Illegal move: square {square} is already occupied.\")\n",
"            done = True\n",
"            reward = -1 * om\n",
"        elif p != om:  # The wrong player is trying to move\n",
"            print(f\"Illegal move: player {p} is not on move.\")\n",
"            done = True\n",
"            reward = -1 * om\n",
"        else:\n",
"            board[square] = p\n",
"            self.state['on_move'] = -p\n",
"\n",
"            # Rows and columns\n",
"            for i in range(3):\n",
"                row = board[i * 3] == p and board[i * 3 + 1] == p and board[i * 3 + 2] == p\n",
"                col = board[i] == p and board[i + 3] == p and board[i + 6] == p\n",
"                if row or col:\n",
"                    reward = p\n",
"                    done = True\n",
"                    break\n",
"\n",
"            # Diagonals\n",
"            diag = board[0] == p and board[4] == p and board[8] == p\n",
"            anti = board[2] == p and board[4] == p and board[6] == p\n",
"            if diag or anti:\n",
"                reward = p\n",
"                done = True\n",
"\n",
"        # Gymnasium's Env.step normally returns (obs, reward, terminated, truncated, info);\n",
"        # this port keeps the older 4-tuple.\n",
"        return self.state, reward, done, {}\n",
"\n",
"    def reset(self, *, seed=None, options=None):\n",
"        super().reset(seed=seed)\n",
"        self.state = {}\n",
"        self.state['board'] = [0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"        self.state['on_move'] = 1  # X moves first\n",
"        return self.state, {}\n",
"\n",
"    def render(self, close=False):\n",
"        if close:\n",
"            return\n",
"        print(\"on move: \", self.symbols[self.state['on_move'] + 1])\n",
"        for i in range(9):\n",
"            print(self.symbols[self.state['board'][i] + 1], end=\" \")\n",
"            if (i % 3) == 2:\n",
"                print()\n",
"\n",
"    def move_generator(self):\n",
"        # Every empty cell is a legal move for the player on move.\n",
"        p = self.state['on_move']\n",
"        moves = []\n",
"        for i in range(9):\n",
"            if self.state['board'][i] == 0:\n",
"                moves.append([p, i])\n",
"        return moves"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Implementing the agent\n",
"In reinforcement learning (RL), an agent is the software component that interacts with an environment in order to learn which actions best achieve its goal.\n",
"\n",
"## Role of the agent:\n",
"The agent makes a decision (chooses an action) based on the current state of the environment and then receives feedback from the environment: a reward and the new state.\n",
"\n",
"## Functions of the agent:\n",
"- Action selection: uses an algorithm or strategy to decide what to do next.\n",
"- Learning: updates its knowledge or strategy from experience in order to perform the task better.\n",
"- Interactivity: adapts to changes in the environment."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"\n",
"class Agent:\n",
"    def __init__(self, symbol):\n",
"        self.symbol = symbol  # Player symbol (1 = X, -1 = O)\n",
"\n",
"    def get_action(self, moves):\n",
"        return random.choice(moves)  # Pick a random move from the available ones"
]
},
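{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Agent` above only selects random moves, so it covers action selection but not learning. A minimal sketch of a learning variant follows, under stated assumptions: `TabularAgent`, its `update` method, and the `epsilon` and `alpha` parameters are hypothetical and not part of the original project. It keeps a table of value estimates, chooses moves epsilon-greedily, and after a game shifts each visited estimate toward the final outcome (a Monte Carlo-style update, which may differ from the method used in the lecture).\n",
"\n",
"```python\n",
"import random\n",
"\n",
"\n",
"class TabularAgent:\n",
"    # Hypothetical learning agent; not part of the original notebook.\n",
"    def __init__(self, symbol, epsilon=0.1, alpha=0.5):\n",
"        self.symbol = symbol    # 1 for X, -1 for O\n",
"        self.epsilon = epsilon  # exploration probability\n",
"        self.alpha = alpha      # learning rate\n",
"        self.q = {}             # (board, square) -> estimated value\n",
"\n",
"    def get_action(self, board, moves):\n",
"        # moves is a list of [player, square] pairs, as built by move_generator().\n",
"        # Explore with probability epsilon, otherwise pick the best-known move.\n",
"        if random.random() < self.epsilon:\n",
"            return random.choice(moves)\n",
"        return max(moves, key=lambda m: self.q.get((tuple(board), m[1]), 0.0))\n",
"\n",
"    def update(self, history, reward):\n",
"        # history holds (board, square) pairs, with each board copied before the move.\n",
"        # Shift every visited estimate toward the final reward,\n",
"        # seen from this agent's side (reward * symbol).\n",
"        target = reward * self.symbol\n",
"        for board, square in history:\n",
"            key = (tuple(board), square)\n",
"            old = self.q.get(key, 0.0)\n",
"            self.q[key] = old + self.alpha * (target - old)\n",
"```\n",
"\n",
"To use it, the training loop would need to record each `(board, square)` pair the agent plays, copying the board before the move, and call `update` with the final reward once the episode ends."
]
},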
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Main training loop"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
"O X \n",
"on move: O\n",
" \n",
" \n",
"O X X \n",
"on move: X\n",
" \n",
" O \n",
"O X X \n",
"on move: O\n",
" \n",
" X O \n",
"O X X \n",
"on move: X\n",
" O \n",
" X O \n",
"O X X \n",
"on move: O\n",
" X O \n",
" X O \n",
"O X X \n",
"Episode 1, Total Reward: 1\n",
"Average Reward: 1.0\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" \n",
"O X \n",
" X \n",
"on move: X\n",
" \n",
"O O X \n",
" X \n",
"on move: O\n",
" X \n",
"O O X \n",
" X \n",
"on move: X\n",
" X O \n",
"O O X \n",
" X \n",
"on move: O\n",
" X O \n",
"O O X \n",
"X X \n",
"on move: X\n",
"O X O \n",
"O O X \n",
"X X \n",
"on move: O\n",
"O X O \n",
"O O X \n",
"X X X \n",
"Episode 2, Total Reward: 1\n",
"Average Reward: 1.0\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" X \n",
" O \n",
" X \n",
"on move: X\n",
" X \n",
" O O \n",
" X \n",
"on move: O\n",
" X \n",
" O O \n",
"X X \n",
"on move: X\n",
" X \n",
" O O \n",
"X X O \n",
"on move: O\n",
" X X \n",
" O O \n",
"X X O \n",
"on move: X\n",
" X X \n",
"O O O \n",
"X X O \n",
"Episode 3, Total Reward: -1\n",
"Average Reward: 0.3333333333333333\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" \n",
"X X \n",
"on move: X\n",
" O \n",
" \n",
"X X O \n",
"on move: O\n",
" O X \n",
" \n",
"X X O \n",
"on move: X\n",
" O X \n",
"O \n",
"X X O \n",
"on move: O\n",
" O X \n",
"O X \n",
"X X O \n",
"Episode 4, Total Reward: 1\n",
"Average Reward: 0.5\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
" O X \n",
" \n",
"X O \n",
"on move: O\n",
" O X \n",
" X \n",
"X O \n",
"Episode 5, Total Reward: 1\n",
"Average Reward: 0.6\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
"X \n",
" \n",
"on move: X\n",
"X O \n",
"X \n",
"O \n",
"on move: O\n",
"X O \n",
"X \n",
"O X \n",
"on move: X\n",
"X O \n",
"X \n",
"O X O \n",
"on move: O\n",
"X O \n",
"X X \n",
"O X O \n",
"on move: X\n",
"X O \n",
"X O X \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"X O X \n",
"O X O \n",
"Episode 6, Total Reward: 0\n",
"Average Reward: 0.5\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
" \n",
" X \n",
"X O O \n",
"on move: O\n",
"X \n",
" X \n",
"X O O \n",
"on move: X\n",
"X \n",
" O X \n",
"X O O \n",
"on move: O\n",
"X X \n",
" O X \n",
"X O O \n",
"on move: X\n",
"X X \n",
"O O X \n",
"X O O \n",
"on move: O\n",
"X X X \n",
"O O X \n",
"X O O \n",
"Episode 7, Total Reward: 1\n",
"Average Reward: 0.5714285714285714\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
" \n",
" X O \n",
"X O \n",
"on move: O\n",
"X \n",
" X O \n",
"X O \n",
"on move: X\n",
"X O \n",
" X O \n",
"X O \n",
"on move: O\n",
"X O \n",
"X X O \n",
"X O \n",
"Episode 8, Total Reward: 1\n",
"Average Reward: 0.625\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
"O \n",
"on move: O\n",
" X \n",
" X \n",
"O \n",
"on move: X\n",
" X \n",
" X \n",
"O O \n",
"on move: O\n",
" X \n",
" X \n",
"O X O \n",
"on move: X\n",
" X \n",
"O X \n",
"O X O \n",
"on move: O\n",
"X X \n",
"O X \n",
"O X O \n",
"on move: X\n",
"X O X \n",
"O X \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"O X X \n",
"O X O \n",
"Episode 9, Total Reward: 0\n",
"Average Reward: 0.5555555555555556\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
" X O \n",
"on move: O\n",
"X \n",
" \n",
" X O \n",
"on move: X\n",
"X \n",
"O \n",
" X O \n",
"on move: O\n",
"X X \n",
"O \n",
" X O \n",
"on move: X\n",
"X X \n",
"O O \n",
" X O \n",
"on move: O\n",
"X X \n",
"O X O \n",
" X O \n",
"Episode 10, Total Reward: 1\n",
"Average Reward: 0.6\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X \n",
" \n",
" X \n",
"on move: X\n",
"O O X \n",
" \n",
" X \n",
"on move: O\n",
"O O X \n",
" X \n",
" X \n",
"on move: X\n",
"O O X \n",
" X \n",
" O X \n",
"on move: O\n",
"O O X \n",
" X \n",
"X O X \n",
"Episode 11, Total Reward: 1\n",
"Average Reward: 0.6363636363636364\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X X \n",
" \n",
"on move: X\n",
" O \n",
" X X \n",
"O \n",
"on move: O\n",
" O \n",
" X X \n",
"O X \n",
"on move: X\n",
" O \n",
"O X X \n",
"O X \n",
"on move: O\n",
"X O \n",
"O X X \n",
"O X \n",
"Episode 12, Total Reward: 1\n",
"Average Reward: 0.6666666666666666\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
"O \n",
"on move: O\n",
" \n",
" X X \n",
"O \n",
"on move: X\n",
" O \n",
" X X \n",
"O \n",
"on move: O\n",
" O \n",
"X X X \n",
"O \n",
"Episode 13, Total Reward: 1\n",
"Average Reward: 0.6923076923076923\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X O \n",
" \n",
"on move: O\n",
" \n",
"X X O \n",
" \n",
"on move: X\n",
" O \n",
"X X O \n",
" \n",
"on move: O\n",
" O \n",
"X X O \n",
" X \n",
"on move: X\n",
" O O \n",
"X X O \n",
" X \n",
"on move: O\n",
"X O O \n",
"X X O \n",
" X \n",
"Episode 14, Total Reward: 1\n",
"Average Reward: 0.7142857142857143\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O \n",
" X \n",
" X O \n",
"on move: O\n",
"O X \n",
" X \n",
" X O \n",
"on move: X\n",
"O X \n",
" X \n",
"O X O \n",
"on move: O\n",
"O X \n",
"X X \n",
"O X O \n",
"on move: X\n",
"O X \n",
"X O X \n",
"O X O \n",
"Episode 15, Total Reward: -1\n",
"Average Reward: 0.6\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" X \n",
"O \n",
" X \n",
"on move: X\n",
" X \n",
"O \n",
"O X \n",
"on move: O\n",
"X X \n",
"O \n",
"O X \n",
"on move: X\n",
"X X \n",
"O O \n",
"O X \n",
"on move: O\n",
"X X \n",
"O X O \n",
"O X \n",
"on move: X\n",
"X O X \n",
"O X O \n",
"O X \n",
"on move: O\n",
"X O X \n",
"O X O \n",
"O X X \n",
"Episode 16, Total Reward: 1\n",
"Average Reward: 0.625\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" O \n",
" \n",
"on move: O\n",
"X X \n",
" O \n",
" \n",
"on move: X\n",
"X X \n",
" O \n",
"O \n",
"on move: O\n",
"X X \n",
" O \n",
"O X \n",
"on move: X\n",
"X X O \n",
" O \n",
"O X \n",
"Episode 17, Total Reward: -1\n",
"Average Reward: 0.5294117647058824\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
"O \n",
"on move: O\n",
"X X \n",
" \n",
"O \n",
"on move: X\n",
"X X O \n",
" \n",
"O \n",
"on move: O\n",
"X X O \n",
" \n",
"O X \n",
"on move: X\n",
"X X O \n",
" O \n",
"O X \n",
"Episode 18, Total Reward: -1\n",
"Average Reward: 0.4444444444444444\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" \n",
"X O \n",
" X \n",
"on move: X\n",
"O \n",
"X O \n",
" X \n",
"on move: O\n",
"O \n",
"X O X \n",
" X \n",
"on move: X\n",
"O \n",
"X O X \n",
"O X \n",
"on move: O\n",
"O \n",
"X O X \n",
"O X X \n",
"on move: X\n",
"O O \n",
"X O X \n",
"O X X \n",
"Episode 19, Total Reward: -1\n",
"Average Reward: 0.3684210526315789\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O \n",
"O X \n",
" X \n",
"on move: O\n",
" O \n",
"O X \n",
"X X \n",
"on move: X\n",
"O O \n",
"O X \n",
"X X \n",
"on move: O\n",
"O O \n",
"O X \n",
"X X X \n",
"Episode 20, Total Reward: 1\n",
"Average Reward: 0.4\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
"X X \n",
" \n",
"on move: X\n",
"O \n",
"X X \n",
"O \n",
"on move: O\n",
"O \n",
"X X X \n",
"O \n",
"Episode 21, Total Reward: 1\n",
"Average Reward: 0.42857142857142855\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
"X \n",
" X \n",
" O \n",
"on move: X\n",
"X \n",
"O X \n",
" O \n",
"on move: O\n",
"X X \n",
"O X \n",
" O \n",
"on move: X\n",
"X X \n",
"O X O \n",
" O \n",
"on move: O\n",
"X X X \n",
"O X O \n",
" O \n",
"Episode 22, Total Reward: 1\n",
"Average Reward: 0.45454545454545453\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" \n",
" X X \n",
"on move: X\n",
" O \n",
" O \n",
" X X \n",
"on move: O\n",
" O \n",
"X O \n",
" X X \n",
"on move: X\n",
"O O \n",
"X O \n",
" X X \n",
"on move: O\n",
"O X O \n",
"X O \n",
" X X \n",
"on move: X\n",
"O X O \n",
"X O \n",
"O X X \n",
"Episode 23, Total Reward: -1\n",
"Average Reward: 0.391304347826087\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
" \n",
" X \n",
"X O O \n",
"on move: O\n",
" \n",
" X X \n",
"X O O \n",
"on move: X\n",
" O \n",
" X X \n",
"X O O \n",
"on move: O\n",
" O \n",
"X X X \n",
"X O O \n",
"Episode 24, Total Reward: 1\n",
"Average Reward: 0.4166666666666667\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
"X X \n",
"O \n",
" \n",
"on move: X\n",
"X X \n",
"O \n",
" O \n",
"on move: O\n",
"X X \n",
"O \n",
" O X \n",
"on move: X\n",
"X X \n",
"O O \n",
" O X \n",
"on move: O\n",
"X X \n",
"O O \n",
"X O X \n",
"on move: X\n",
"X X O \n",
"O O \n",
"X O X \n",
"on move: O\n",
"X X O \n",
"O O X \n",
"X O X \n",
"Episode 25, Total Reward: 0\n",
"Average Reward: 0.4\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X X \n",
" \n",
" O \n",
"on move: X\n",
" X X \n",
" \n",
"O O \n",
"on move: O\n",
" X X \n",
" X \n",
"O O \n",
"on move: X\n",
"O X X \n",
" X \n",
"O O \n",
"on move: O\n",
"O X X \n",
" X \n",
"O X O \n",
"Episode 26, Total Reward: 1\n",
"Average Reward: 0.4230769230769231\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
"O \n",
"X \n",
"on move: O\n",
"X \n",
"O \n",
"X \n",
"on move: X\n",
"X O \n",
"O \n",
"X \n",
"on move: O\n",
"X O X \n",
"O \n",
"X \n",
"on move: X\n",
"X O X \n",
"O \n",
"X O \n",
"on move: O\n",
"X O X \n",
"O X \n",
"X O \n",
"on move: X\n",
"X O X \n",
"O X \n",
"X O O \n",
"on move: O\n",
"X O X \n",
"O X X \n",
"X O O \n",
"Episode 27, Total Reward: 1\n",
"Average Reward: 0.4444444444444444\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
"O \n",
" \n",
"on move: O\n",
"X \n",
"O X \n",
" \n",
"on move: X\n",
"X \n",
"O X \n",
" O \n",
"on move: O\n",
"X \n",
"O X \n",
" X O \n",
"on move: X\n",
"X \n",
"O X \n",
"O X O \n",
"on move: O\n",
"X X \n",
"O X \n",
"O X O \n",
"Episode 28, Total Reward: 1\n",
"Average Reward: 0.4642857142857143\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" X \n",
" O \n",
" X \n",
"on move: X\n",
" X \n",
" O \n",
"O X \n",
"on move: O\n",
" X X \n",
" O \n",
"O X \n",
"on move: X\n",
" X X \n",
" O \n",
"O X O \n",
"on move: O\n",
" X X \n",
" O X \n",
"O X O \n",
"on move: X\n",
" X X \n",
"O O X \n",
"O X O \n",
"on move: O\n",
"X X X \n",
"O O X \n",
"O X O \n",
"Episode 29, Total Reward: 1\n",
"Average Reward: 0.4827586206896552\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
"O \n",
"on move: O\n",
" X \n",
"X \n",
"O \n",
"on move: X\n",
" X \n",
"X O \n",
"O \n",
"on move: O\n",
"X X \n",
"X O \n",
"O \n",
"on move: X\n",
"X X \n",
"X O O \n",
"O \n",
"on move: O\n",
"X X \n",
"X O O \n",
"O X \n",
"on move: X\n",
"X X \n",
"X O O \n",
"O O X \n",
"on move: O\n",
"X X X \n",
"X O O \n",
"O O X \n",
"Episode 30, Total Reward: 1\n",
"Average Reward: 0.5\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
"O \n",
" \n",
" X \n",
"on move: O\n",
"O \n",
"X \n",
" X \n",
"on move: X\n",
"O \n",
"X O \n",
" X \n",
"on move: O\n",
"O \n",
"X O X \n",
" X \n",
"on move: X\n",
"O \n",
"X O X \n",
" X O \n",
"Episode 31, Total Reward: -1\n",
"Average Reward: 0.45161290322580644\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" \n",
" X \n",
"on move: X\n",
"X O O \n",
" \n",
" X \n",
"on move: O\n",
"X O O \n",
" X \n",
" X \n",
"on move: X\n",
"X O O \n",
" X \n",
" X O \n",
"on move: O\n",
"X O O \n",
" X X \n",
" X O \n",
"on move: X\n",
"X O O \n",
" X X \n",
"O X O \n",
"on move: O\n",
"X O O \n",
"X X X \n",
"O X O \n",
"Episode 32, Total Reward: 1\n",
"Average Reward: 0.46875\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" \n",
"X X \n",
"on move: X\n",
" O \n",
" O \n",
"X X \n",
"on move: O\n",
" O \n",
" O \n",
"X X X \n",
"Episode 33, Total Reward: 1\n",
"Average Reward: 0.48484848484848486\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
"O \n",
"on move: O\n",
" X \n",
" X \n",
"O \n",
"on move: X\n",
" X O \n",
" X \n",
"O \n",
"on move: O\n",
" X O \n",
" X \n",
"O X \n",
"on move: X\n",
" X O \n",
"O X \n",
"O X \n",
"on move: O\n",
" X O \n",
"O X \n",
"O X X \n",
"on move: X\n",
" X O \n",
"O O X \n",
"O X X \n",
"Episode 34, Total Reward: -1\n",
"Average Reward: 0.4411764705882353\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" O \n",
" \n",
"on move: O\n",
"X \n",
" O \n",
" X \n",
"on move: X\n",
"X O \n",
" O \n",
" X \n",
"on move: O\n",
"X O \n",
" O \n",
"X X \n",
"on move: X\n",
"X O \n",
" O O \n",
"X X \n",
"on move: O\n",
"X O X \n",
" O O \n",
"X X \n",
"on move: X\n",
"X O X \n",
"O O O \n",
"X X \n",
"Episode 35, Total Reward: -1\n",
"Average Reward: 0.4\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" X \n",
" O \n",
"on move: X\n",
" X \n",
" X \n",
" O O \n",
"on move: O\n",
" X X \n",
" X \n",
" O O \n",
"on move: X\n",
" X X \n",
" X \n",
"O O O \n",
"Episode 36, Total Reward: -1\n",
"Average Reward: 0.3611111111111111\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" \n",
"X O \n",
" X \n",
"on move: X\n",
" O \n",
"X O \n",
" X \n",
"on move: O\n",
"X O \n",
"X O \n",
" X \n",
"on move: X\n",
"X O O \n",
"X O \n",
" X \n",
"on move: O\n",
"X O O \n",
"X O \n",
"X X \n",
"Episode 37, Total Reward: 1\n",
"Average Reward: 0.3783783783783784\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
"O \n",
"on move: O\n",
"X \n",
"X \n",
"O \n",
"on move: X\n",
"X \n",
"X O \n",
"O \n",
"on move: O\n",
"X \n",
"X X O \n",
"O \n",
"on move: X\n",
"X O \n",
"X X O \n",
"O \n",
"on move: O\n",
"X O X \n",
"X X O \n",
"O \n",
"on move: X\n",
"X O X \n",
"X X O \n",
"O O \n",
"on move: O\n",
"X O X \n",
"X X O \n",
"O X O \n",
"Episode 38, Total Reward: 0\n",
"Average Reward: 0.3684210526315789\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
"X X \n",
" O \n",
" \n",
"on move: X\n",
"X X \n",
" O \n",
"O \n",
"on move: O\n",
"X X X \n",
" O \n",
"O \n",
"Episode 39, Total Reward: 1\n",
"Average Reward: 0.38461538461538464\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
" O \n",
"on move: O\n",
"X \n",
" X \n",
" O \n",
"on move: X\n",
"X \n",
" X \n",
" O O \n",
"on move: O\n",
"X X \n",
" X \n",
" O O \n",
"on move: X\n",
"X X \n",
" O X \n",
" O O \n",
"on move: O\n",
"X X X \n",
" O X \n",
" O O \n",
"Episode 40, Total Reward: 1\n",
"Average Reward: 0.4\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" \n",
" O \n",
"X X \n",
"on move: X\n",
" O \n",
" O \n",
"X X \n",
"on move: O\n",
" O \n",
"X O \n",
"X X \n",
"on move: X\n",
"O O \n",
"X O \n",
"X X \n",
"on move: O\n",
"O O \n",
"X O \n",
"X X X \n",
"Episode 41, Total Reward: 1\n",
"Average Reward: 0.4146341463414634\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
"X X \n",
" \n",
"on move: X\n",
" O \n",
"X X \n",
" O \n",
"on move: O\n",
" O X \n",
"X X \n",
" O \n",
"on move: X\n",
" O X \n",
"X X \n",
"O O \n",
"on move: O\n",
" O X \n",
"X X \n",
"O X O \n",
"on move: X\n",
" O X \n",
"X X O \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"X X O \n",
"O X O \n",
"Episode 42, Total Reward: 0\n",
"Average Reward: 0.40476190476190477\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" X O \n",
" \n",
" X \n",
"on move: X\n",
" X O \n",
" O \n",
" X \n",
"on move: O\n",
" X O \n",
" O X \n",
" X \n",
"on move: X\n",
"O X O \n",
" O X \n",
" X \n",
"on move: O\n",
"O X O \n",
" O X \n",
" X X \n",
"on move: X\n",
"O X O \n",
"O O X \n",
" X X \n",
"on move: O\n",
"O X O \n",
"O O X \n",
"X X X \n",
"Episode 43, Total Reward: 1\n",
"Average Reward: 0.4186046511627907\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" \n",
" \n",
"X O X \n",
"on move: X\n",
" \n",
" O \n",
"X O X \n",
"on move: O\n",
" X \n",
" O \n",
"X O X \n",
"on move: X\n",
"O X \n",
" O \n",
"X O X \n",
"on move: O\n",
"O X \n",
"X O \n",
"X O X \n",
"on move: X\n",
"O X O \n",
"X O \n",
"X O X \n",
"on move: O\n",
"O X O \n",
"X X O \n",
"X O X \n",
"Episode 44, Total Reward: 0\n",
"Average Reward: 0.4090909090909091\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X \n",
"X \n",
" \n",
"on move: X\n",
"O X \n",
"X \n",
" O \n",
"on move: O\n",
"O X \n",
"X X \n",
" O \n",
"on move: X\n",
"O X \n",
"X X \n",
"O O \n",
"on move: O\n",
"O X \n",
"X X X \n",
"O O \n",
"Episode 45, Total Reward: 1\n",
"Average Reward: 0.4222222222222222\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X O \n",
" \n",
"on move: O\n",
" \n",
"X O \n",
" X \n",
"on move: X\n",
" \n",
"X O O \n",
" X \n",
"on move: O\n",
" \n",
"X O O \n",
" X X \n",
"on move: X\n",
" O \n",
"X O O \n",
" X X \n",
"on move: O\n",
" O \n",
"X O O \n",
"X X X \n",
"Episode 46, Total Reward: 1\n",
"Average Reward: 0.43478260869565216\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
"O \n",
"X \n",
"on move: O\n",
" \n",
"O X \n",
"X \n",
"on move: X\n",
"O \n",
"O X \n",
"X \n",
"on move: O\n",
"O \n",
"O X \n",
"X X \n",
"on move: X\n",
"O O \n",
"O X \n",
"X X \n",
"on move: O\n",
"O O \n",
"O X \n",
"X X X \n",
"Episode 47, Total Reward: 1\n",
"Average Reward: 0.44680851063829785\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
"O \n",
" \n",
" X \n",
"on move: O\n",
"O X \n",
" \n",
" X \n",
"on move: X\n",
"O X \n",
"O \n",
" X \n",
"on move: O\n",
"O X X \n",
"O \n",
" X \n",
"on move: X\n",
"O X X \n",
"O \n",
"O X \n",
"Episode 48, Total Reward: -1\n",
"Average Reward: 0.4166666666666667\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
"O \n",
"X \n",
" \n",
"on move: O\n",
"O X \n",
"X \n",
" \n",
"on move: X\n",
"O O X \n",
"X \n",
" \n",
"on move: O\n",
"O O X \n",
"X \n",
" X \n",
"on move: X\n",
"O O X \n",
"X O \n",
" X \n",
"on move: O\n",
"O O X \n",
"X O \n",
" X X \n",
"on move: X\n",
"O O X \n",
"X O O \n",
" X X \n",
"on move: O\n",
"O O X \n",
"X O O \n",
"X X X \n",
"Episode 49, Total Reward: 1\n",
"Average Reward: 0.42857142857142855\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
"O \n",
"on move: O\n",
"X X \n",
" \n",
"O \n",
"on move: X\n",
"X X \n",
" O \n",
"O \n",
"on move: O\n",
"X X \n",
" X O \n",
"O \n",
"on move: X\n",
"X X \n",
"O X O \n",
"O \n",
"on move: O\n",
"X X X \n",
"O X O \n",
"O \n",
"Episode 50, Total Reward: 1\n",
"Average Reward: 0.44\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
"X \n",
" \n",
"X O \n",
"on move: X\n",
"X O \n",
" \n",
"X O \n",
"on move: O\n",
"X O \n",
" \n",
"X O X \n",
"on move: X\n",
"X O \n",
" O \n",
"X O X \n",
"Episode 51, Total Reward: -1\n",
"Average Reward: 0.4117647058823529\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O \n",
" X \n",
"O X \n",
"on move: O\n",
" O \n",
"X X \n",
"O X \n",
"on move: X\n",
" O \n",
"X O X \n",
"O X \n",
"on move: O\n",
" O X \n",
"X O X \n",
"O X \n",
"Episode 52, Total Reward: 1\n",
"Average Reward: 0.4230769230769231\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" \n",
"O \n",
" X X \n",
"on move: X\n",
" \n",
"O O \n",
" X X \n",
"on move: O\n",
" X \n",
"O O \n",
" X X \n",
"on move: X\n",
" X \n",
"O O O \n",
" X X \n",
"Episode 53, Total Reward: -1\n",
"Average Reward: 0.39622641509433965\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" X \n",
" \n",
"on move: X\n",
"X O \n",
" X O \n",
" \n",
"on move: O\n",
"X O \n",
" X O \n",
"X \n",
"on move: X\n",
"X O \n",
"O X O \n",
"X \n",
"on move: O\n",
"X O X \n",
"O X O \n",
"X \n",
"Episode 54, Total Reward: 1\n",
"Average Reward: 0.4074074074074074\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
"X O \n",
" \n",
"X \n",
"on move: X\n",
"X O \n",
" \n",
"X O \n",
"on move: O\n",
"X O \n",
" \n",
"X X O \n",
"on move: X\n",
"X O \n",
" O \n",
"X X O \n",
"Episode 55, Total Reward: -1\n",
"Average Reward: 0.38181818181818183\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X \n",
" O X \n",
" \n",
"on move: X\n",
" X \n",
"O O X \n",
" \n",
"on move: O\n",
" X X \n",
"O O X \n",
" \n",
"on move: X\n",
" X X \n",
"O O X \n",
" O \n",
"on move: O\n",
" X X \n",
"O O X \n",
" X O \n",
"on move: X\n",
" X X \n",
"O O X \n",
"O X O \n",
"on move: O\n",
"X X X \n",
"O O X \n",
"O X O \n",
"Episode 56, Total Reward: 1\n",
"Average Reward: 0.39285714285714285\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O O \n",
" X \n",
" X \n",
"on move: O\n",
"O O \n",
" X \n",
"X X \n",
"on move: X\n",
"O O \n",
"O X \n",
"X X \n",
"on move: O\n",
"O O X \n",
"O X \n",
"X X \n",
"Episode 57, Total Reward: 1\n",
"Average Reward: 0.40350877192982454\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
" O \n",
"on move: O\n",
" X \n",
"X \n",
" O \n",
"on move: X\n",
" X \n",
"X \n",
"O O \n",
"on move: O\n",
" X \n",
"X \n",
"O X O \n",
"on move: X\n",
" O X \n",
"X \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"X \n",
"O X O \n",
"on move: X\n",
"X O X \n",
"X O \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"X O X \n",
"O X O \n",
"Episode 58, Total Reward: 0\n",
"Average Reward: 0.39655172413793105\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
"O \n",
"X \n",
"on move: O\n",
" X \n",
"O \n",
"X \n",
"on move: X\n",
" X \n",
"O \n",
"X O \n",
"on move: O\n",
" X \n",
"O X \n",
"X O \n",
"on move: X\n",
" X \n",
"O O X \n",
"X O \n",
"on move: O\n",
"X X \n",
"O O X \n",
"X O \n",
"on move: X\n",
"X X \n",
"O O X \n",
"X O O \n",
"on move: O\n",
"X X X \n",
"O O X \n",
"X O O \n",
"Episode 59, Total Reward: 1\n",
"Average Reward: 0.4067796610169492\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" X \n",
" O \n",
" X \n",
"on move: X\n",
"O X \n",
" O \n",
" X \n",
"on move: O\n",
"O X \n",
" O \n",
" X X \n",
"on move: X\n",
"O X \n",
"O O \n",
" X X \n",
"on move: O\n",
"O X \n",
"O O X \n",
" X X \n",
"on move: X\n",
"O X \n",
"O O X \n",
"O X X \n",
"Episode 60, Total Reward: -1\n",
"Average Reward: 0.38333333333333336\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
"O \n",
"on move: O\n",
"X \n",
" X \n",
"O \n",
"on move: X\n",
"X \n",
" X \n",
"O O \n",
"on move: O\n",
"X \n",
" X X \n",
"O O \n",
"on move: X\n",
"X \n",
"O X X \n",
"O O \n",
"on move: O\n",
"X X \n",
"O X X \n",
"O O \n",
"on move: X\n",
"X X O \n",
"O X X \n",
"O O \n",
"on move: O\n",
"X X O \n",
"O X X \n",
"O O X \n",
"Episode 61, Total Reward: 1\n",
"Average Reward: 0.39344262295081966\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
"O \n",
" \n",
"on move: O\n",
"X \n",
"O \n",
" X \n",
"on move: X\n",
"X \n",
"O \n",
" X O \n",
"on move: O\n",
"X \n",
"O X \n",
" X O \n",
"on move: X\n",
"X O \n",
"O X \n",
" X O \n",
"on move: O\n",
"X O \n",
"O X X \n",
" X O \n",
"on move: X\n",
"X O \n",
"O X X \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"O X X \n",
"O X O \n",
"Episode 62, Total Reward: 0\n",
"Average Reward: 0.3870967741935484\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
" O \n",
" \n",
"X X \n",
"on move: X\n",
" O \n",
"O \n",
"X X \n",
"on move: O\n",
" O \n",
"O X \n",
"X X \n",
"on move: X\n",
" O \n",
"O X O \n",
"X X \n",
"on move: O\n",
" X O \n",
"O X O \n",
"X X \n",
"Episode 63, Total Reward: 1\n",
"Average Reward: 0.3968253968253968\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
"X \n",
"on move: X\n",
"O \n",
" O X \n",
"X \n",
"on move: O\n",
"O X \n",
" O X \n",
"X \n",
"on move: X\n",
"O X O \n",
" O X \n",
"X \n",
"on move: O\n",
"O X O \n",
" O X \n",
"X X \n",
"on move: X\n",
"O X O \n",
" O X \n",
"X O X \n",
"on move: O\n",
"O X O \n",
"X O X \n",
"X O X \n",
"Episode 64, Total Reward: 0\n",
"Average Reward: 0.390625\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
"X \n",
"on move: X\n",
"O O \n",
" X \n",
"X \n",
"on move: O\n",
"O O \n",
"X X \n",
"X \n",
"on move: X\n",
"O O O \n",
"X X \n",
"X \n",
"Episode 65, Total Reward: -1\n",
"Average Reward: 0.36923076923076925\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X \n",
" O X \n",
" \n",
"on move: X\n",
" O X \n",
" O X \n",
" \n",
"on move: O\n",
" O X \n",
" O X \n",
" X \n",
"Episode 66, Total Reward: 1\n",
"Average Reward: 0.3787878787878788\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
"X X \n",
" \n",
" O \n",
"on move: X\n",
"X X \n",
" O \n",
" O \n",
"on move: O\n",
"X X X \n",
" O \n",
" O \n",
"Episode 67, Total Reward: 1\n",
"Average Reward: 0.3880597014925373\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
"X X \n",
" O \n",
"on move: X\n",
" O \n",
"X X \n",
" O \n",
"on move: O\n",
" O \n",
"X X \n",
"X O \n",
"on move: X\n",
" O \n",
"X O X \n",
"X O \n",
"on move: O\n",
" O \n",
"X O X \n",
"X O X \n",
"on move: X\n",
"O O \n",
"X O X \n",
"X O X \n",
"on move: O\n",
"O X O \n",
"X O X \n",
"X O X \n",
"Episode 68, Total Reward: 0\n",
"Average Reward: 0.38235294117647056\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O \n",
" O X \n",
" X \n",
"on move: O\n",
"O \n",
" O X \n",
"X X \n",
"on move: X\n",
"O O \n",
" O X \n",
"X X \n",
"on move: O\n",
"O O X \n",
" O X \n",
"X X \n",
"on move: X\n",
"O O X \n",
"O O X \n",
"X X \n",
"on move: O\n",
"O O X \n",
"O O X \n",
"X X X \n",
"Episode 69, Total Reward: 1\n",
"Average Reward: 0.391304347826087\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X O \n",
" \n",
" \n",
"on move: O\n",
" X O \n",
" X \n",
" \n",
"on move: X\n",
" X O \n",
" X O \n",
" \n",
"on move: O\n",
" X O \n",
" X O \n",
" X \n",
"Episode 70, Total Reward: 1\n",
"Average Reward: 0.4\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X O \n",
" \n",
"on move: O\n",
" \n",
"X X O \n",
" \n",
"on move: X\n",
" \n",
"X X O \n",
" O \n",
"on move: O\n",
" \n",
"X X O \n",
"X O \n",
"on move: X\n",
" O \n",
"X X O \n",
"X O \n",
"on move: O\n",
"X O \n",
"X X O \n",
"X O \n",
"Episode 71, Total Reward: 1\n",
"Average Reward: 0.4084507042253521\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
" \n",
"O X \n",
" X \n",
"on move: X\n",
" \n",
"O X O \n",
" X \n",
"on move: O\n",
" X \n",
"O X O \n",
" X \n",
"on move: X\n",
" O X \n",
"O X O \n",
" X \n",
"on move: O\n",
"X O X \n",
"O X O \n",
" X \n",
"on move: X\n",
"X O X \n",
"O X O \n",
"O X \n",
"on move: O\n",
"X O X \n",
"O X O \n",
"O X X \n",
"Episode 72, Total Reward: 1\n",
"Average Reward: 0.4166666666666667\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O X \n",
" \n",
" X \n",
"on move: X\n",
"O O X \n",
" \n",
" X \n",
"on move: O\n",
"O O X \n",
" X \n",
" X \n",
"on move: X\n",
"O O X \n",
" X \n",
"O X \n",
"on move: O\n",
"O O X \n",
"X X \n",
"O X \n",
"on move: X\n",
"O O X \n",
"X X O \n",
"O X \n",
"on move: O\n",
"O O X \n",
"X X O \n",
"O X X \n",
"Episode 73, Total Reward: 0\n",
"Average Reward: 0.410958904109589\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O X \n",
" \n",
" \n",
"on move: X\n",
"X O X \n",
"O \n",
" \n",
"on move: O\n",
"X O X \n",
"O \n",
" X \n",
"on move: X\n",
"X O X \n",
"O \n",
" O X \n",
"on move: O\n",
"X O X \n",
"O X \n",
" O X \n",
"Episode 74, Total Reward: 1\n",
"Average Reward: 0.4189189189189189\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
" O X \n",
"on move: O\n",
" \n",
" \n",
"X O X \n",
"on move: X\n",
" O \n",
" \n",
"X O X \n",
"on move: O\n",
"X O \n",
" \n",
"X O X \n",
"on move: X\n",
"X O \n",
" O \n",
"X O X \n",
"on move: O\n",
"X O \n",
"X O \n",
"X O X \n",
"Episode 75, Total Reward: 1\n",
"Average Reward: 0.4266666666666667\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
" \n",
" X O \n",
"X O \n",
"on move: O\n",
" X \n",
" X O \n",
"X O \n",
"Episode 76, Total Reward: 1\n",
"Average Reward: 0.4342105263157895\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X X \n",
"O \n",
" \n",
"on move: X\n",
" X X \n",
"O \n",
" O \n",
"on move: O\n",
"X X X \n",
"O \n",
" O \n",
"Episode 77, Total Reward: 1\n",
"Average Reward: 0.44155844155844154\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" O \n",
"X \n",
"on move: O\n",
" X \n",
" O \n",
"X \n",
"on move: X\n",
" X \n",
" O \n",
"X O \n",
"on move: O\n",
" X X \n",
" O \n",
"X O \n",
"on move: X\n",
" X X \n",
"O O \n",
"X O \n",
"on move: O\n",
" X X \n",
"O O X \n",
"X O \n",
"on move: X\n",
"O X X \n",
"O O X \n",
"X O \n",
"on move: O\n",
"O X X \n",
"O O X \n",
"X O X \n",
"Episode 78, Total Reward: 1\n",
"Average Reward: 0.44871794871794873\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X \n",
" O X \n",
" \n",
"on move: X\n",
" O X \n",
" O X \n",
" \n",
"on move: O\n",
" O X \n",
" O X \n",
" X \n",
"Episode 79, Total Reward: 1\n",
"Average Reward: 0.45569620253164556\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" O \n",
"X \n",
" \n",
"on move: O\n",
"X O \n",
"X \n",
" \n",
"on move: X\n",
"X O \n",
"X O \n",
" \n",
"on move: O\n",
"X X O \n",
"X O \n",
" \n",
"on move: X\n",
"X X O \n",
"X O \n",
"O \n",
"on move: O\n",
"X X O \n",
"X O \n",
"O X \n",
"on move: X\n",
"X X O \n",
"X O O \n",
"O X \n",
"Episode 80, Total Reward: -1\n",
"Average Reward: 0.4375\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
"O \n",
"on move: O\n",
"X \n",
" X \n",
"O \n",
"on move: X\n",
"X \n",
" X O \n",
"O \n",
"on move: O\n",
"X \n",
"X X O \n",
"O \n",
"on move: X\n",
"X O \n",
"X X O \n",
"O \n",
"on move: O\n",
"X O \n",
"X X O \n",
"O X \n",
"Episode 81, Total Reward: 1\n",
"Average Reward: 0.4444444444444444\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" O X \n",
" \n",
"on move: O\n",
" \n",
" O X \n",
" X \n",
"on move: X\n",
" \n",
" O X \n",
"O X \n",
"on move: O\n",
" \n",
"X O X \n",
"O X \n",
"on move: X\n",
" \n",
"X O X \n",
"O X O \n",
"on move: O\n",
" X \n",
"X O X \n",
"O X O \n",
"on move: X\n",
" O X \n",
"X O X \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"X O X \n",
"O X O \n",
"Episode 82, Total Reward: 0\n",
"Average Reward: 0.43902439024390244\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
"X X \n",
" O \n",
"on move: X\n",
"O \n",
"X X \n",
" O \n",
"on move: O\n",
"O \n",
"X X \n",
" X O \n",
"on move: X\n",
"O O \n",
"X X \n",
" X O \n",
"on move: O\n",
"O O \n",
"X X X \n",
" X O \n",
"Episode 83, Total Reward: 1\n",
"Average Reward: 0.4457831325301205\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" X \n",
" O \n",
"on move: X\n",
"O X \n",
" X \n",
" O \n",
"on move: O\n",
"O X \n",
" X X \n",
" O \n",
"on move: X\n",
"O X \n",
"O X X \n",
" O \n",
"on move: O\n",
"O X \n",
"O X X \n",
" X O \n",
"on move: X\n",
"O O X \n",
"O X X \n",
" X O \n",
"on move: O\n",
"O O X \n",
"O X X \n",
"X X O \n",
"Episode 84, Total Reward: 1\n",
"Average Reward: 0.4523809523809524\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" X \n",
" X \n",
" O \n",
"on move: X\n",
" X \n",
" X \n",
" O O \n",
"on move: O\n",
" X \n",
" X \n",
"X O O \n",
"on move: X\n",
" X \n",
"O X \n",
"X O O \n",
"on move: O\n",
" X \n",
"O X X \n",
"X O O \n",
"on move: X\n",
" X O \n",
"O X X \n",
"X O O \n",
"on move: O\n",
"X X O \n",
"O X X \n",
"X O O \n",
"Episode 85, Total Reward: 0\n",
"Average Reward: 0.4470588235294118\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O X \n",
" X \n",
" \n",
"on move: X\n",
"O O X \n",
" X \n",
" \n",
"on move: O\n",
"O O X \n",
" X X \n",
" \n",
"on move: X\n",
"O O X \n",
" X X \n",
" O \n",
"on move: O\n",
"O O X \n",
" X X \n",
"X O \n",
"Episode 86, Total Reward: 1\n",
"Average Reward: 0.45348837209302323\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" X \n",
" \n",
"on move: X\n",
"X O \n",
"O X \n",
" \n",
"on move: O\n",
"X O \n",
"O X X \n",
" \n",
"on move: X\n",
"X O \n",
"O X X \n",
" O \n",
"on move: O\n",
"X O \n",
"O X X \n",
" O X \n",
"Episode 87, Total Reward: 1\n",
"Average Reward: 0.45977011494252873\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X O \n",
" \n",
"on move: O\n",
" X \n",
" X O \n",
" \n",
"on move: X\n",
" X O \n",
" X O \n",
" \n",
"on move: O\n",
"X X O \n",
" X O \n",
" \n",
"on move: X\n",
"X X O \n",
" X O \n",
"O \n",
"on move: O\n",
"X X O \n",
" X O \n",
"O X \n",
"Episode 88, Total Reward: 1\n",
"Average Reward: 0.4659090909090909\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
" X \n",
" \n",
"X O O \n",
"on move: O\n",
"X X \n",
" \n",
"X O O \n",
"on move: X\n",
"X X \n",
" O \n",
"X O O \n",
"on move: O\n",
"X X X \n",
" O \n",
"X O O \n",
"Episode 89, Total Reward: 1\n",
"Average Reward: 0.47191011235955055\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
" X \n",
" O \n",
"X O \n",
"on move: O\n",
"X X \n",
" O \n",
"X O \n",
"on move: X\n",
"X O X \n",
" O \n",
"X O \n",
"Episode 90, Total Reward: -1\n",
"Average Reward: 0.45555555555555555\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
" O \n",
"on move: O\n",
"X \n",
" X \n",
" O \n",
"on move: X\n",
"X \n",
"O X \n",
" O \n",
"on move: O\n",
"X X \n",
"O X \n",
" O \n",
"on move: X\n",
"X X \n",
"O X O \n",
" O \n",
"on move: O\n",
"X X X \n",
"O X O \n",
" O \n",
"Episode 91, Total Reward: 1\n",
"Average Reward: 0.46153846153846156\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X \n",
" O \n",
" X \n",
"on move: X\n",
" X \n",
" O \n",
"O X \n",
"on move: O\n",
" X X \n",
" O \n",
"O X \n",
"on move: X\n",
" X X \n",
"O O \n",
"O X \n",
"on move: O\n",
" X X \n",
"O X O \n",
"O X \n",
"on move: X\n",
" X X \n",
"O X O \n",
"O O X \n",
"on move: O\n",
"X X X \n",
"O X O \n",
"O O X \n",
"Episode 92, Total Reward: 1\n",
"Average Reward: 0.4673913043478261\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
" X O \n",
"on move: O\n",
" X \n",
" \n",
" X O \n",
"on move: X\n",
" X \n",
" O \n",
" X O \n",
"on move: O\n",
" X \n",
" X O \n",
" X O \n",
"Episode 93, Total Reward: 1\n",
"Average Reward: 0.4731182795698925\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" X \n",
" X \n",
" O \n",
"on move: X\n",
" X \n",
" X \n",
" O O \n",
"on move: O\n",
"X X \n",
" X \n",
" O O \n",
"on move: X\n",
"X X \n",
"O X \n",
" O O \n",
"on move: O\n",
"X X \n",
"O X X \n",
" O O \n",
"on move: X\n",
"X X \n",
"O X X \n",
"O O O \n",
"Episode 94, Total Reward: -1\n",
"Average Reward: 0.4574468085106383\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X \n",
" \n",
"X \n",
"on move: X\n",
"O X \n",
" \n",
"X O \n",
"on move: O\n",
"O X \n",
" X \n",
"X O \n",
"Episode 95, Total Reward: 1\n",
"Average Reward: 0.4631578947368421\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X O \n",
" \n",
"on move: O\n",
" \n",
"X O \n",
"X \n",
"on move: X\n",
"O \n",
"X O \n",
"X \n",
"on move: O\n",
"O \n",
"X O \n",
"X X \n",
"on move: X\n",
"O O \n",
"X O \n",
"X X \n",
"on move: O\n",
"O O \n",
"X O X \n",
"X X \n",
"on move: X\n",
"O O O \n",
"X O X \n",
"X X \n",
"Episode 96, Total Reward: -1\n",
"Average Reward: 0.4479166666666667\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
" X \n",
" X O \n",
"on move: X\n",
"O \n",
" X \n",
" X O \n",
"on move: O\n",
"O \n",
" X \n",
"X X O \n",
"on move: X\n",
"O \n",
"O X \n",
"X X O \n",
"on move: O\n",
"O X \n",
"O X \n",
"X X O \n",
"Episode 97, Total Reward: 1\n",
"Average Reward: 0.4536082474226804\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
" O \n",
"on move: O\n",
" \n",
"X \n",
"X O \n",
"on move: X\n",
" \n",
"X \n",
"X O O \n",
"on move: O\n",
" \n",
"X X \n",
"X O O \n",
"on move: X\n",
"O \n",
"X X \n",
"X O O \n",
"on move: O\n",
"O X \n",
"X X \n",
"X O O \n",
"Episode 98, Total Reward: 1\n",
"Average Reward: 0.45918367346938777\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
" X O \n",
"on move: O\n",
"X \n",
" \n",
" X O \n",
"on move: X\n",
"X O \n",
" \n",
" X O \n",
"on move: O\n",
"X O \n",
" X \n",
" X O \n",
"on move: X\n",
"X O O \n",
" X \n",
" X O \n",
"on move: O\n",
"X O O \n",
"X X \n",
" X O \n",
"on move: X\n",
"X O O \n",
"X X \n",
"O X O \n",
"on move: O\n",
"X O O \n",
"X X X \n",
"O X O \n",
"Episode 99, Total Reward: 1\n",
"Average Reward: 0.46464646464646464\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
"O \n",
"on move: O\n",
" \n",
"X \n",
"O X \n",
"on move: X\n",
"O \n",
"X \n",
"O X \n",
"on move: O\n",
"O \n",
"X X \n",
"O X \n",
"on move: X\n",
"O O \n",
"X X \n",
"O X \n",
"on move: O\n",
"O O \n",
"X X \n",
"O X X \n",
"on move: X\n",
"O O O \n",
"X X \n",
"O X X \n",
"Episode 100, Total Reward: -1\n",
"Average Reward: 0.45\n"
]
}
],
"source": [
"# Основной цикл обучения (работа с отдельным классом агента)\n",
"\n",
"# Создание среды для игры в крестики-нолики\n",
"environment = TicTacToeEnv()\n",
"\n",
"# Создание агента (играющего крестиками)\n",
"agent = Agent(symbol=1)\n",
"\n",
"num_episodes = 100 # Количество эпизодов (игр) для обучения\n",
"collected_rewards = [] # Список для хранения наград/побед в каждом эпизоде \n",
"\n",
"# Переменная для отслеживания символа и текущего игрока\n",
"oom = 1\n",
"\n",
"for i in range(num_episodes):\n",
" # Сброс среды и начало нового эпизода\n",
" state, _ = environment.reset() \n",
"\n",
" # Общая награда за эпизод\n",
" total_reward = 0\n",
"\n",
" # Флаг завершения игры\n",
" done = False\n",
" om = oom \n",
"\n",
" # Максимум 9 ходов, поскольку поле 3x3 \n",
" for j in range(9): \n",
" moves = environment.move_generator() \n",
"\n",
" # Ходов нет, заканчиваем игру\n",
" if not moves:\n",
" break\n",
"\n",
" \n",
" if len(moves) == 1:\n",
" move = moves[0] # Если остался один ход на основе стратегии\n",
" else:\n",
" move = agent.get_action(moves) # Агент выбирает ход на основе стратегии\n",
"\n",
" # Выполнение хода и обновление состояния игры\n",
" next_state, reward, done, info = environment.step(move)\n",
" total_reward += reward\n",
" state = next_state\n",
"\n",
" # Отображаем текущее состояние игры\n",
" environment.render()\n",
"\n",
" if done:\n",
" break\n",
"\n",
" om = -om # Смена игрока\n",
"\n",
" collected_rewards.append(total_reward)\n",
"\n",
" print(f\"Episode {i+1}, Total Reward: {total_reward}\")\n",
" average_reward = sum(collected_rewards) / len(collected_rewards)\n",
" print(f\"Average Reward: {average_reward}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "aimenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}