2024-12-07 00:32:06 +04:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab Assignment No. 6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task\n",
"\n",
"**Develop and run a reinforcement learning project for the game of tic-tac-toe**\n",
"\n",
"**What needs to be done:**\n",
"\n",
"1. **Port the project to the Gymnasium library and a current version of Python.** Gymnasium is the maintained successor to the Gym library and provides tools for building and testing reinforcement learning models.\n",
"2. **Implement the tic-tac-toe agent as a separate class.** Following the example from the lecture, create an agent class that is responsible for decision making and learning.\n",
"3. **Adapt the main training loop to work with the separate agent class.** Training proceeds through interaction with the agent class.\n",
"4. **Test the new version of the program.** Verify that the program runs correctly and that the agent learns to play tic-tac-toe.\n",
"\n",
"**Reinforcement learning (RL)** is a machine learning approach in which an agent learns to make decisions by interacting with an environment. The agent's goal is to choose actions that maximize the cumulative reward over the long run.\n",
"\n",
"**Link to the tic-tac-toe game:** [https://github.com/nczempin/gym-tic-tac-toe](https://github.com/nczempin/gym-tic-tac-toe)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Switching to the Gymnasium environment\n",
"\n",
"**Gymnasium (the maintained successor to OpenAI Gym)** is a library for developing and testing reinforcement learning algorithms. It ships a broad set of standard RL environments, including games, control tasks, and simulations.\n",
"\n",
"Compared with Gym, Gymnasium offers an updated API that:\n",
"\n",
"* **Revises** the `reset` and `step` methods: `reset` returns `(observation, info)`, and `step` returns `(observation, reward, terminated, truncated, info)`.\n",
"* **Adds** new features such as time-limit support: the `truncated` flag reports that an episode was cut off by a step limit rather than ended by the task itself.\n",
"* **Improves** support for custom environments.\n",
"\n",
"**Main advantages of Gymnasium:**\n",
"\n",
"* **Standardized interfaces** for interacting with an environment.\n",
"* **Easy integration** with popular RL libraries.\n",
"* **Support** for building your own environments.\n",
"\n",
"**Key Gymnasium methods:**\n",
"\n",
"* `env.reset()`: initializes the environment.\n",
"* `env.step(action)`: applies an action and moves the environment to a new state.\n",
"* `env.render()`: displays the current state of the environment."
]
},
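{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the Gymnasium call pattern, the sketch below uses a plain-Python stand-in rather than a real Gymnasium environment. `CountdownEnv` and `run_episode` are hypothetical names introduced here; only the shapes of the `reset` and `step` return values are meant to follow the Gymnasium API.\n",
"\n",
"```python\n",
"class CountdownEnv:\n",
"    '''Hypothetical stand-in that mirrors Gymnasium-style call signatures.'''\n",
"\n",
"    def __init__(self, start=3, max_steps=5):\n",
"        self.start = start\n",
"        self.max_steps = max_steps\n",
"\n",
"    def reset(self, seed=None, options=None):\n",
"        # Gymnasium-style reset returns (observation, info)\n",
"        self.value = self.start\n",
"        self.steps = 0\n",
"        return self.value, {}\n",
"\n",
"    def step(self, action):\n",
"        # Gymnasium-style step returns\n",
"        # (observation, reward, terminated, truncated, info)\n",
"        self.value -= action\n",
"        self.steps += 1\n",
"        terminated = self.value <= 0  # the task itself has ended\n",
"        # the step limit was reached before the task ended\n",
"        truncated = (not terminated) and self.steps >= self.max_steps\n",
"        reward = 1 if terminated else 0\n",
"        return self.value, reward, terminated, truncated, {}\n",
"\n",
"\n",
"def run_episode(env, policy, seed=None):\n",
"    # Play one episode and report how it ended\n",
"    obs, info = env.reset(seed=seed)\n",
"    total = 0\n",
"    while True:\n",
"        obs, reward, terminated, truncated, info = env.step(policy(obs))\n",
"        total += reward\n",
"        if terminated or truncated:\n",
"            return total, terminated, truncated\n",
"```\n",
"\n",
"A policy that always subtracts 1 ends the episode through `terminated`, while a policy that always subtracts 0 runs into the step limit and ends through `truncated`."
]
},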
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import gymnasium as gym\n",
"from gymnasium import spaces\n",
"\n",
"class TicTacToeEnv(gym.Env):\n",
"    metadata = {'render_modes': ['human']}\n",
"\n",
"    # symbols[value + 1] maps -1 to 'O', 0 to an empty square, 1 to 'X'\n",
"    symbols = ['O', ' ', 'X']\n",
"\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        # Nine squares; note that step() takes a (player, square) pair\n",
"        self.action_space = spaces.Discrete(9)\n",
"        self.observation_space = spaces.Discrete(9 * 3 * 2)\n",
"        self.reset()\n",
"\n",
"    def step(self, action):\n",
"        done = False\n",
"        reward = 0\n",
"\n",
"        p, square = action  # p: player (1 or -1), square: square index (0-8)\n",
"\n",
"        board = self.state['board']\n",
"        proposed = board[square]\n",
"        om = self.state['on_move']\n",
"        if proposed != 0:  # The square is already occupied\n",
"            print(f\"Illegal move: square {square} is already occupied.\")\n",
"            done = True\n",
"            reward = -1 * om\n",
"        elif p != om:  # The wrong player is trying to move\n",
"            print(f\"Illegal move: player {p} is not on move.\")\n",
"            done = True\n",
"            reward = -1 * om\n",
"        else:\n",
"            # Legal move: place the mark and pass the turn\n",
"            board[square] = p\n",
"            self.state['on_move'] = -p\n",
"\n",
"        for i in range(3):\n",
"            # Rows and columns\n",
"            if (board[i * 3] == p and board[i * 3 + 1] == p and board[i * 3 + 2] == p) or \\\n",
"               (board[i] == p and board[i + 3] == p and board[i + 6] == p):\n",
"                reward = p\n",
"                done = True\n",
"                break\n",
"\n",
"        # Diagonals\n",
"        if (board[0] == p and board[4] == p and board[8] == p) or \\\n",
"           (board[2] == p and board[4] == p and board[6] == p):\n",
"            reward = p\n",
"            done = True\n",
"\n",
"        return self.state, reward, done, {}\n",
"\n",
"    def reset(self, seed=None, options=None):\n",
"        super().reset(seed=seed)\n",
"        self.state = {}\n",
"        self.state['board'] = [0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"        self.state['on_move'] = 1  # X moves first\n",
"        return self.state, {}\n",
"\n",
"    def render(self, close=False):\n",
"        if close:\n",
"            return\n",
"        print(\"on move: \", self.symbols[self.state['on_move'] + 1])\n",
"        for i in range(9):\n",
"            print(self.symbols[self.state['board'][i] + 1], end=\" \")\n",
"            if (i % 3) == 2:\n",
"                print()\n",
"\n",
"    def move_generator(self):\n",
"        # List the legal moves as [player, square] pairs for the player on move\n",
"        moves = []\n",
"        for i in range(9):\n",
"            if self.state['board'][i] == 0:\n",
"                p = self.state['on_move']\n",
"                m = [p, i]\n",
"                moves.append(m)\n",
"        return moves"
]
},
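{
"cell_type": "markdown",
"metadata": {},
"source": [
"The environment above checks rows, columns, and diagonals inline and has no explicit signal for a draw. A table-driven check is one compact alternative. The sketch below is only an illustration: `WIN_LINES` and `outcome` are hypothetical names that do not appear in the notebook, and the board is assumed to be the same flat list of nine values (1, -1, or 0).\n",
"\n",
"```python\n",
"# The eight index triples that form a winning line on a flat 3x3 board\n",
"WIN_LINES = [\n",
"    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows\n",
"    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns\n",
"    (0, 4, 8), (2, 4, 6),             # diagonals\n",
"]\n",
"\n",
"\n",
"def outcome(board):\n",
"    '''Return 1 or -1 for a winner, 0 for a draw, None if play continues.'''\n",
"    for a, b, c in WIN_LINES:\n",
"        if board[a] != 0 and board[a] == board[b] == board[c]:\n",
"            return board[a]\n",
"    if all(cell != 0 for cell in board):\n",
"        return 0  # board is full and nobody has a line\n",
"    return None\n",
"```\n",
"\n",
"Returning `None` for an unfinished game keeps it distinct from a draw, which is reported as `0`."
]
},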
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building the agent\n",
"\n",
"In reinforcement learning (RL), an **agent** is a system that interacts with an environment to achieve a given goal. Its main task is to develop a strategy that maximizes the cumulative reward over the long run.\n",
"\n",
"**Role of the agent:**\n",
"\n",
"* **Decision making:** The agent chooses an action based on the current state of the environment.\n",
"* **Receiving feedback:** After acting, the agent receives a reward and information about the new state from the environment.\n",
"\n",
"**Main functions of the agent:**\n",
"\n",
"* **Action selection:** Uses an algorithm or strategy to determine the next step.\n",
"* **Learning:** Analyzes the experience it has gathered and updates its knowledge or strategy to perform better.\n",
"* **Adaptation:** Adjusts to changes in the environment."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"# Agent that interacts with the environment. This baseline version picks moves at random.\n",
"\n",
"class Agent:\n",
"    def __init__(self, symbol):\n",
"        self.symbol = symbol  # Player symbol (1 for X, -1 for O)\n",
"\n",
"    def get_action(self, moves):\n",
"        return random.choice(moves)  # Pick a random move from the legal ones"
]
},
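{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Agent` above chooses moves uniformly at random, so it covers action selection but not learning. One possible way to add learning is sketched below, under stated assumptions: `TabularAgent`, `learn`, `epsilon`, and `alpha` are hypothetical names and parameters, not part of the notebook or the lecture. The update simply nudges each visited estimate toward the final game result (a simple Monte Carlo-style rule), and moves use the same `[player, square]` format as `move_generator`.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"\n",
"class TabularAgent:\n",
"    '''Hypothetical epsilon-greedy agent with a table of move values.'''\n",
"\n",
"    def __init__(self, symbol, epsilon=0.1, alpha=0.5, seed=None):\n",
"        self.symbol = symbol    # 1 for X, -1 for O\n",
"        self.epsilon = epsilon  # probability of a random (exploratory) move\n",
"        self.alpha = alpha      # learning rate\n",
"        self.values = {}        # (board tuple, square) -> estimated value\n",
"        self.rng = random.Random(seed)\n",
"\n",
"    def get_action(self, board, moves):\n",
"        # Explore with probability epsilon, otherwise take the best-known move\n",
"        if self.rng.random() < self.epsilon:\n",
"            return self.rng.choice(moves)\n",
"        state = tuple(board)\n",
"        return max(moves, key=lambda m: self.values.get((state, m[1]), 0.0))\n",
"\n",
"    def learn(self, history, reward):\n",
"        # history: list of (board before the move, move) pairs from one game\n",
"        # reward: final result in the environment's convention (1, -1, or 0)\n",
"        target = reward * self.symbol  # result seen from this agent's side\n",
"        for board, move in history:\n",
"            k = (tuple(board), move[1])\n",
"            old = self.values.get(k, 0.0)\n",
"            self.values[k] = old + self.alpha * (target - old)\n",
"```\n",
"\n",
"Setting `epsilon=0.0` makes the agent purely greedy, which is convenient for evaluating what it has learned after training."
]
},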
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Main training loop\n",
"\n",
"The main training loop creates the environment and interacts with it. At the end of each game the agent receives a reward that reflects the result, and the running average of these rewards shows how well training is going."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X X \n",
" \n",
"on move: X\n",
" O O \n",
" X X \n",
" \n",
"on move: O\n",
"X O O \n",
" X X \n",
" \n",
"on move: X\n",
"X O O \n",
" X X \n",
" O \n",
"on move: O\n",
"X O O \n",
"X X X \n",
" O \n",
"Episode 1, Total Reward: 1\n",
"Average Reward: 1.0\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" O \n",
"X \n",
"on move: O\n",
" \n",
" O \n",
"X X \n",
"on move: X\n",
"O \n",
" O \n",
"X X \n",
"on move: O\n",
"O X \n",
" O \n",
"X X \n",
"on move: X\n",
"O X \n",
" O \n",
"X O X \n",
"on move: O\n",
"O X \n",
" X O \n",
"X O X \n",
"Episode 2, Total Reward: 1\n",
"Average Reward: 1.0\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O \n",
"O X \n",
" X \n",
"on move: O\n",
"O X \n",
"O X \n",
" X \n",
"on move: X\n",
"O O X \n",
"O X \n",
" X \n",
"on move: O\n",
"O O X \n",
"O X \n",
"X X \n",
"Episode 3, Total Reward: 1\n",
"Average Reward: 1.0\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X O \n",
" \n",
"on move: O\n",
" \n",
"X X O \n",
" \n",
"on move: X\n",
" O \n",
"X X O \n",
" \n",
"on move: O\n",
" O \n",
"X X O \n",
" X \n",
"on move: X\n",
" O O \n",
"X X O \n",
" X \n",
"on move: O\n",
" O O \n",
"X X O \n",
" X X \n",
"on move: X\n",
"O O O \n",
"X X O \n",
" X X \n",
"Episode 4, Total Reward: -1\n",
"Average Reward: 0.5\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X \n",
" X \n",
" \n",
"on move: X\n",
"O X \n",
" X \n",
"O \n",
"on move: O\n",
"O X \n",
" X X \n",
"O \n",
"on move: X\n",
"O X O \n",
" X X \n",
"O \n",
"on move: O\n",
"O X O \n",
" X X \n",
"O X \n",
"on move: X\n",
"O X O \n",
"O X X \n",
"O X \n",
"Episode 5, Total Reward: -1\n",
"Average Reward: 0.2\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" \n",
" O \n",
"X X \n",
"on move: X\n",
" O \n",
" O \n",
"X X \n",
"on move: O\n",
" O \n",
" O X \n",
"X X \n",
"on move: X\n",
" O \n",
"O O X \n",
"X X \n",
"on move: O\n",
" O \n",
"O O X \n",
"X X X \n",
"Episode 6, Total Reward: 1\n",
"Average Reward: 0.3333333333333333\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" O X \n",
" \n",
" \n",
"on move: O\n",
" O X \n",
"X \n",
" \n",
"on move: X\n",
" O X \n",
"X \n",
" O \n",
"on move: O\n",
"X O X \n",
"X \n",
" O \n",
"on move: X\n",
"X O X \n",
"X \n",
"O O \n",
"on move: O\n",
"X O X \n",
"X X \n",
"O O \n",
"on move: X\n",
"X O X \n",
"X O X \n",
"O O \n",
"on move: O\n",
"X O X \n",
"X O X \n",
"O X O \n",
"Episode 7, Total Reward: 0\n",
"Average Reward: 0.2857142857142857\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
"O \n",
"X \n",
" \n",
"on move: O\n",
"O \n",
"X \n",
" X \n",
"on move: X\n",
"O O \n",
"X \n",
" X \n",
"on move: O\n",
"O O \n",
"X \n",
" X X \n",
"on move: X\n",
"O O \n",
"X \n",
"O X X \n",
"on move: O\n",
"O O \n",
"X X \n",
"O X X \n",
"on move: X\n",
"O O \n",
"X X O \n",
"O X X \n",
"on move: O\n",
"O X O \n",
"X X O \n",
"O X X \n",
"Episode 8, Total Reward: 1\n",
"Average Reward: 0.375\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
"O \n",
" \n",
"on move: O\n",
"X \n",
"O X \n",
" \n",
"on move: X\n",
"X \n",
"O X \n",
" O \n",
"on move: O\n",
"X \n",
"O X \n",
"X O \n",
"on move: X\n",
"X \n",
"O X \n",
"X O O \n",
"on move: O\n",
"X \n",
"O X X \n",
"X O O \n",
"on move: X\n",
"X O \n",
"O X X \n",
"X O O \n",
"on move: O\n",
"X X O \n",
"O X X \n",
"X O O \n",
"Episode 9, Total Reward: 0\n",
"Average Reward: 0.3333333333333333\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
" \n",
"O X X \n",
" \n",
"on move: X\n",
" \n",
"O X X \n",
" O \n",
"on move: O\n",
" \n",
"O X X \n",
"X O \n",
"on move: X\n",
"O \n",
"O X X \n",
"X O \n",
"on move: O\n",
"O \n",
"O X X \n",
"X O X \n",
"on move: X\n",
"O O \n",
"O X X \n",
"X O X \n",
"on move: O\n",
"O X O \n",
"O X X \n",
"X O X \n",
"Episode 10, Total Reward: 0\n",
"Average Reward: 0.3\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X \n",
"O \n",
"X \n",
"on move: X\n",
" X \n",
"O \n",
"X O \n",
"on move: O\n",
" X \n",
"O \n",
"X X O \n",
"on move: X\n",
" X \n",
"O O \n",
"X X O \n",
"on move: O\n",
" X X \n",
"O O \n",
"X X O \n",
"on move: X\n",
" X X \n",
"O O O \n",
"X X O \n",
"Episode 11, Total Reward: -1\n",
"Average Reward: 0.18181818181818182\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
"X \n",
" \n",
"on move: X\n",
"X O \n",
"X \n",
"O \n",
"on move: O\n",
"X O X \n",
"X \n",
"O \n",
"on move: X\n",
"X O X \n",
"X O \n",
"O \n",
"on move: O\n",
"X O X \n",
"X O \n",
"O X \n",
"on move: X\n",
"X O X \n",
"X O O \n",
"O X \n",
"on move: O\n",
"X O X \n",
"X O O \n",
"O X X \n",
"Episode 12, Total Reward: 0\n",
"Average Reward: 0.16666666666666666\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
" O \n",
"X \n",
"O X \n",
"on move: O\n",
" O \n",
"X X \n",
"O X \n",
"on move: X\n",
" O O \n",
"X X \n",
"O X \n",
"on move: O\n",
"X O O \n",
"X X \n",
"O X \n",
"on move: X\n",
"X O O \n",
"X O X \n",
"O X \n",
"Episode 13, Total Reward: -1\n",
"Average Reward: 0.07692307692307693\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
"X \n",
" X \n",
" O \n",
"on move: X\n",
"X \n",
" X \n",
" O O \n",
"on move: O\n",
"X \n",
"X X \n",
" O O \n",
"on move: X\n",
"X \n",
"X X \n",
"O O O \n",
"Episode 14, Total Reward: -1\n",
"Average Reward: 0.0\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X \n",
" \n",
"X \n",
"on move: X\n",
"O X \n",
" \n",
"X O \n",
"on move: O\n",
"O X \n",
" \n",
"X X O \n",
"on move: X\n",
"O X \n",
" O \n",
"X X O \n",
"Episode 15, Total Reward: -1\n",
"Average Reward: -0.06666666666666667\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" X \n",
"O \n",
" X \n",
"on move: X\n",
" X \n",
"O O \n",
" X \n",
"on move: O\n",
" X \n",
"O X O \n",
" X \n",
"on move: X\n",
" X \n",
"O X O \n",
" X O \n",
"on move: O\n",
" X \n",
"O X O \n",
"X X O \n",
"Episode 16, Total Reward: 1\n",
"Average Reward: 0.0\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
"X X \n",
" \n",
" O \n",
"on move: X\n",
"X X O \n",
" \n",
" O \n",
"on move: O\n",
"X X O \n",
"X \n",
" O \n",
"on move: X\n",
"X X O \n",
"X \n",
"O O \n",
"on move: O\n",
"X X O \n",
"X X \n",
"O O \n",
"on move: X\n",
"X X O \n",
"X X \n",
"O O O \n",
"Episode 17, Total Reward: -1\n",
"Average Reward: -0.058823529411764705\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
"X X \n",
" \n",
" O \n",
"on move: X\n",
"X X \n",
" O \n",
" O \n",
"on move: O\n",
"X X \n",
" O \n",
"X O \n",
"on move: X\n",
"X X \n",
"O O \n",
"X O \n",
"on move: O\n",
"X X \n",
"O O \n",
"X O X \n",
"on move: X\n",
"X X \n",
"O O O \n",
"X O X \n",
"Episode 18, Total Reward: -1\n",
"Average Reward: -0.1111111111111111\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" \n",
" O \n",
"X X \n",
"on move: X\n",
" O \n",
" O \n",
"X X \n",
"on move: O\n",
" X O \n",
" O \n",
"X X \n",
"on move: X\n",
" X O \n",
" O \n",
"X X O \n",
"on move: O\n",
"X X O \n",
" O \n",
"X X O \n",
"on move: X\n",
"X X O \n",
" O O \n",
"X X O \n",
"Episode 19, Total Reward: -1\n",
"Average Reward: -0.15789473684210525\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" \n",
"X X \n",
"on move: X\n",
" O \n",
" \n",
"X O X \n",
"on move: O\n",
" X O \n",
" \n",
"X O X \n",
"on move: X\n",
" X O \n",
" O \n",
"X O X \n",
"on move: O\n",
" X O \n",
"X O \n",
"X O X \n",
"on move: X\n",
"O X O \n",
"X O \n",
"X O X \n",
"on move: O\n",
"O X O \n",
"X O X \n",
"X O X \n",
"Episode 20, Total Reward: 0\n",
"Average Reward: -0.15\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
" O \n",
"on move: O\n",
" \n",
"X X \n",
" O \n",
"on move: X\n",
" \n",
"X O X \n",
" O \n",
"on move: O\n",
"X \n",
"X O X \n",
" O \n",
"on move: X\n",
"X \n",
"X O X \n",
" O O \n",
"on move: O\n",
"X X \n",
"X O X \n",
" O O \n",
"on move: X\n",
"X X O \n",
"X O X \n",
" O O \n",
"on move: O\n",
"X X O \n",
"X O X \n",
"X O O \n",
"Episode 21, Total Reward: 1\n",
"Average Reward: -0.09523809523809523\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
"O \n",
"on move: O\n",
"X \n",
" X \n",
"O \n",
"on move: X\n",
"X \n",
"O X \n",
"O \n",
"on move: O\n",
"X \n",
"O X X \n",
"O \n",
"on move: X\n",
"X \n",
"O X X \n",
"O O \n",
"on move: O\n",
"X \n",
"O X X \n",
"O O X \n",
"Episode 22, Total Reward: 1\n",
"Average Reward: -0.045454545454545456\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O \n",
" X \n",
" O X \n",
"on move: O\n",
" O \n",
" X X \n",
" O X \n",
"on move: X\n",
" O O \n",
" X X \n",
" O X \n",
"on move: O\n",
"X O O \n",
" X X \n",
" O X \n",
"Episode 23, Total Reward: 1\n",
"Average Reward: 0.0\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
"O \n",
"on move: O\n",
" X \n",
"X \n",
"O \n",
"on move: X\n",
"O X \n",
"X \n",
"O \n",
"on move: O\n",
"O X \n",
"X \n",
"O X \n",
"on move: X\n",
"O X \n",
"X O \n",
"O X \n",
"on move: O\n",
"O X \n",
"X O \n",
"O X X \n",
"on move: X\n",
"O X \n",
"X O O \n",
"O X X \n",
"on move: O\n",
"O X X \n",
"X O O \n",
"O X X \n",
"Episode 24, Total Reward: 0\n",
"Average Reward: 0.0\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" O \n",
"X \n",
" \n",
"on move: O\n",
" O X \n",
"X \n",
" \n",
"on move: X\n",
" O X \n",
"X \n",
" O \n",
"on move: O\n",
" O X \n",
"X \n",
"X O \n",
"on move: X\n",
" O X \n",
"X \n",
"X O O \n",
"on move: O\n",
"X O X \n",
"X \n",
"X O O \n",
"Episode 25, Total Reward: 1\n",
"Average Reward: 0.04\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" X \n",
" X \n",
" O \n",
"on move: X\n",
" X \n",
" X \n",
"O O \n",
"on move: O\n",
" X X \n",
" X \n",
"O O \n",
"on move: X\n",
" X X \n",
" X \n",
"O O O \n",
"Episode 26, Total Reward: -1\n",
"Average Reward: 0.0\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" X \n",
" \n",
"on move: X\n",
"X O \n",
" X \n",
"O \n",
"on move: O\n",
"X O \n",
" X X \n",
"O \n",
"on move: X\n",
"X O \n",
" X X \n",
"O O \n",
"on move: O\n",
"X O \n",
" X X \n",
"O O X \n",
"Episode 27, Total Reward: 1\n",
"Average Reward: 0.037037037037037035\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
"X \n",
"on move: X\n",
"O \n",
" X \n",
"X O \n",
"on move: O\n",
"O X \n",
" X \n",
"X O \n",
"on move: X\n",
"O X \n",
"O X \n",
"X O \n",
"on move: O\n",
"O X \n",
"O X \n",
"X X O \n",
"Episode 28, Total Reward: 1\n",
"Average Reward: 0.07142857142857142\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X O \n",
" \n",
"on move: O\n",
" X \n",
" X O \n",
" \n",
"on move: X\n",
" X \n",
"O X O \n",
" \n",
"on move: O\n",
" X \n",
"O X O \n",
" X \n",
"Episode 29, Total Reward: 1\n",
"Average Reward: 0.10344827586206896\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" X \n",
" O \n",
"on move: X\n",
" X \n",
"O X \n",
" O \n",
"on move: O\n",
" X X \n",
"O X \n",
" O \n",
"on move: X\n",
"O X X \n",
"O X \n",
" O \n",
"on move: O\n",
"O X X \n",
"O X \n",
"X O \n",
"Episode 30, Total Reward: 1\n",
"Average Reward: 0.13333333333333333\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
" X \n",
" \n",
"X O O \n",
"on move: O\n",
" X \n",
" X \n",
"X O O \n",
"on move: X\n",
"O X \n",
" X \n",
"X O O \n",
"on move: O\n",
"O X \n",
"X X \n",
"X O O \n",
"on move: X\n",
"O X \n",
"X O X \n",
"X O O \n",
"Episode 31, Total Reward: -1\n",
"Average Reward: 0.0967741935483871\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" O X \n",
" \n",
"on move: O\n",
" \n",
" O X \n",
" X \n",
"on move: X\n",
" O \n",
" O X \n",
" X \n",
"on move: O\n",
"X O \n",
" O X \n",
" X \n",
"on move: X\n",
"X O O \n",
" O X \n",
" X \n",
"on move: O\n",
"X O O \n",
"X O X \n",
" X \n",
"on move: X\n",
"X O O \n",
"X O X \n",
" O X \n",
"Episode 32, Total Reward: -1\n",
"Average Reward: 0.0625\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
"O \n",
" \n",
"X \n",
"on move: O\n",
"O \n",
" X \n",
"X \n",
"on move: X\n",
"O O \n",
" X \n",
"X \n",
"on move: O\n",
"O O \n",
" X X \n",
"X \n",
"on move: X\n",
"O O \n",
" X X \n",
"X O \n",
"on move: O\n",
"O O \n",
" X X \n",
"X O X \n",
"on move: X\n",
"O O \n",
"O X X \n",
"X O X \n",
"on move: O\n",
"O O X \n",
"O X X \n",
"X O X \n",
"Episode 33, Total Reward: 1\n",
"Average Reward: 0.09090909090909091\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
" O \n",
"on move: O\n",
"X \n",
" X \n",
" O \n",
"on move: X\n",
"X O \n",
" X \n",
" O \n",
"on move: O\n",
"X O \n",
" X \n",
" X O \n",
"on move: X\n",
"X O \n",
"O X \n",
" X O \n",
"on move: O\n",
"X O \n",
"O X X \n",
" X O \n",
"on move: X\n",
"X O O \n",
"O X X \n",
" X O \n",
"on move: O\n",
"X O O \n",
"O X X \n",
"X X O \n",
"Episode 34, Total Reward: 0\n",
"Average Reward: 0.08823529411764706\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
"X X \n",
" \n",
" O \n",
"on move: X\n",
"X X \n",
"O \n",
" O \n",
"on move: O\n",
"X X \n",
"O \n",
"X O \n",
"on move: X\n",
"X X \n",
"O O \n",
"X O \n",
"on move: O\n",
"X X \n",
"O O \n",
"X X O \n",
"on move: X\n",
"X X \n",
"O O O \n",
"X X O \n",
"Episode 35, Total Reward: -1\n",
"Average Reward: 0.05714285714285714\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" O X \n",
" \n",
" \n",
"on move: O\n",
"X O X \n",
" \n",
" \n",
"on move: X\n",
"X O X \n",
" O \n",
" \n",
"on move: O\n",
"X O X \n",
" O \n",
" X \n",
"on move: X\n",
"X O X \n",
" O O \n",
" X \n",
"on move: O\n",
"X O X \n",
" O O \n",
"X X \n",
"on move: X\n",
"X O X \n",
" O O \n",
"X O X \n",
"Episode 36, Total Reward: -1\n",
"Average Reward: 0.027777777777777776\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" O \n",
"X \n",
" \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
"O O \n",
"X \n",
" X \n",
"on move: O\n",
"O O X \n",
"X \n",
" X \n",
"on move: X\n",
"O O X \n",
"X \n",
"O X \n",
"on move: O\n",
"O O X \n",
"X X \n",
"O X \n",
"Episode 37, Total Reward: 1\n",
"Average Reward: 0.05405405405405406\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" O X \n",
" \n",
"on move: O\n",
"X \n",
" O X \n",
" \n",
"on move: X\n",
"X \n",
" O X \n",
"O \n",
"on move: O\n",
"X \n",
" O X \n",
"O X \n",
"on move: X\n",
"X O \n",
" O X \n",
"O X \n",
"on move: O\n",
"X O \n",
"X O X \n",
"O X \n",
"on move: X\n",
"X O \n",
"X O X \n",
"O O X \n",
"Episode 38, Total Reward: -1\n",
"Average Reward: 0.02631578947368421\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
"X \n",
" \n",
"on move: X\n",
"X O O \n",
"X \n",
" \n",
"on move: O\n",
"X O O \n",
"X \n",
" X \n",
"on move: X\n",
"X O O \n",
"X \n",
" O X \n",
"on move: O\n",
"X O O \n",
"X X \n",
" O X \n",
"on move: X\n",
"X O O \n",
"X O X \n",
" O X \n",
"Episode 39, Total Reward: -1\n",
"Average Reward: 0.0\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
"X \n",
" O \n",
"on move: X\n",
" X \n",
"X \n",
" O O \n",
"on move: O\n",
" X \n",
"X \n",
"X O O \n",
"on move: X\n",
" X \n",
"X O \n",
"X O O \n",
"on move: O\n",
" X X \n",
"X O \n",
"X O O \n",
"on move: X\n",
"O X X \n",
"X O \n",
"X O O \n",
"Episode 40, Total Reward: -1\n",
"Average Reward: -0.025\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
" \n",
"O X \n",
" X \n",
"on move: X\n",
" \n",
"O X \n",
"O X \n",
"on move: O\n",
" X \n",
"O X \n",
"O X \n",
"on move: X\n",
" X \n",
"O X O \n",
"O X \n",
"on move: O\n",
"X X \n",
"O X O \n",
"O X \n",
"Episode 41, Total Reward: 1\n",
"Average Reward: 0.0\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
"X \n",
" O \n",
"on move: X\n",
" X O \n",
"X \n",
" O \n",
"on move: O\n",
" X O \n",
"X \n",
" O X \n",
"on move: X\n",
" X O \n",
"X \n",
"O O X \n",
"on move: O\n",
" X O \n",
"X X \n",
"O O X \n",
"on move: X\n",
" X O \n",
"X O X \n",
"O O X \n",
"Episode 42, Total Reward: -1\n",
"Average Reward: -0.023809523809523808\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
"X \n",
" \n",
"on move: X\n",
"X O \n",
"X \n",
"O \n",
"on move: O\n",
"X O \n",
"X \n",
"O X \n",
"on move: X\n",
"X O \n",
"X O \n",
"O X \n",
"on move: O\n",
"X O \n",
"X O \n",
"O X X \n",
"on move: X\n",
"X O \n",
"X O O \n",
"O X X \n",
"on move: O\n",
"X O X \n",
"X O O \n",
"O X X \n",
"Episode 43, Total Reward: 0\n",
"Average Reward: -0.023255813953488372\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
" O \n",
"on move: O\n",
"X \n",
" \n",
" O X \n",
"on move: X\n",
"X O \n",
" \n",
" O X \n",
"on move: O\n",
"X O X \n",
" \n",
" O X \n",
"on move: X\n",
"X O X \n",
" \n",
"O O X \n",
"on move: O\n",
"X O X \n",
" X \n",
"O O X \n",
"Episode 44, Total Reward: 1\n",
"Average Reward: 0.0\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
"X O \n",
" \n",
" X \n",
"on move: X\n",
"X O \n",
" O \n",
" X \n",
"on move: O\n",
"X O \n",
" O \n",
" X X \n",
"on move: X\n",
"X O \n",
"O O \n",
" X X \n",
"on move: O\n",
"X O \n",
"O O \n",
"X X X \n",
"Episode 45, Total Reward: 1\n",
"Average Reward: 0.022222222222222223\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" X \n",
" \n",
"on move: X\n",
"X O \n",
" X \n",
"O \n",
"on move: O\n",
"X O X \n",
" X \n",
"O \n",
"on move: X\n",
"X O X \n",
" X \n",
"O O \n",
"on move: O\n",
"X O X \n",
"X X \n",
"O O \n",
"on move: X\n",
"X O X \n",
"X X O \n",
"O O \n",
"on move: O\n",
"X O X \n",
"X X O \n",
"O X O \n",
"Episode 46, Total Reward: 0\n",
"Average Reward: 0.021739130434782608\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" O \n",
" \n",
"on move: O\n",
"X \n",
" O \n",
" X \n",
"on move: X\n",
"X \n",
" O \n",
" O X \n",
"on move: O\n",
"X \n",
" O \n",
"X O X \n",
"on move: X\n",
"X O \n",
" O \n",
"X O X \n",
"on move: O\n",
"X O \n",
"X O \n",
"X O X \n",
"Episode 47, Total Reward: 1\n",
"Average Reward: 0.0425531914893617\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
" O \n",
" \n",
"X X \n",
"on move: X\n",
" O \n",
" \n",
"X X O \n",
"on move: O\n",
" O \n",
"X \n",
"X X O \n",
"on move: X\n",
" O \n",
"X O \n",
"X X O \n",
"on move: O\n",
"X O \n",
"X O \n",
"X X O \n",
"Episode 48, Total Reward: 1\n",
"Average Reward: 0.0625\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
"O \n",
" \n",
" X \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O \n",
" X \n",
" X O \n",
"on move: O\n",
"O X \n",
" X \n",
" X O \n",
"on move: X\n",
"O X \n",
" X \n",
"O X O \n",
"on move: O\n",
"O X X \n",
" X \n",
"O X O \n",
"Episode 49, Total Reward: 1\n",
"Average Reward: 0.08163265306122448\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
"O \n",
"on move: O\n",
" \n",
" X X \n",
"O \n",
"on move: X\n",
" O \n",
" X X \n",
"O \n",
"on move: O\n",
" O \n",
" X X \n",
"O X \n",
"on move: X\n",
" O \n",
"O X X \n",
"O X \n",
"on move: O\n",
"X O \n",
"O X X \n",
"O X \n",
"Episode 50, Total Reward: 1\n",
"Average Reward: 0.1\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
"O X \n",
"on move: O\n",
" \n",
" X \n",
"O X \n",
"on move: X\n",
" \n",
" X \n",
"O O X \n",
"on move: O\n",
" X \n",
" X \n",
"O O X \n",
"Episode 51, Total Reward: 1\n",
"Average Reward: 0.11764705882352941\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X X \n",
" \n",
" \n",
"on move: X\n",
"O X X \n",
" \n",
"O \n",
"on move: O\n",
"O X X \n",
" \n",
"O X \n",
"on move: X\n",
"O X X \n",
" \n",
"O X O \n",
"on move: O\n",
"O X X \n",
" X \n",
"O X O \n",
"Episode 52, Total Reward: 1\n",
"Average Reward: 0.1346153846153846\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X \n",
" O X \n",
" \n",
"on move: X\n",
" X \n",
" O X \n",
"O \n",
"on move: O\n",
" X X \n",
" O X \n",
"O \n",
"on move: X\n",
" X X \n",
" O X \n",
"O O \n",
"on move: O\n",
" X X \n",
"X O X \n",
"O O \n",
"on move: X\n",
"O X X \n",
"X O X \n",
"O O \n",
"on move: O\n",
"O X X \n",
"X O X \n",
"O O X \n",
"Episode 53, Total Reward: 1\n",
"Average Reward: 0.1509433962264151\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
" X O \n",
" \n",
"X O \n",
"on move: O\n",
" X O \n",
" \n",
"X O X \n",
"on move: X\n",
" X O \n",
"O \n",
"X O X \n",
"on move: O\n",
" X O \n",
"O X \n",
"X O X \n",
"on move: X\n",
" X O \n",
"O X O \n",
"X O X \n",
"on move: O\n",
"X X O \n",
"O X O \n",
"X O X \n",
"Episode 54, Total Reward: 1\n",
"Average Reward: 0.16666666666666666\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O \n",
" O X \n",
" X \n",
"on move: O\n",
" X O \n",
" O X \n",
" X \n",
"on move: X\n",
" X O \n",
"O O X \n",
" X \n",
"on move: O\n",
" X O \n",
"O O X \n",
" X X \n",
"on move: X\n",
"O X O \n",
"O O X \n",
" X X \n",
"on move: O\n",
"O X O \n",
"O O X \n",
"X X X \n",
"Episode 55, Total Reward: 1\n",
"Average Reward: 0.18181818181818182\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X X \n",
" \n",
" \n",
"on move: X\n",
"O X X \n",
" \n",
" O \n",
"on move: O\n",
"O X X \n",
" X \n",
" O \n",
"on move: X\n",
"O X X \n",
"O X \n",
" O \n",
"on move: O\n",
"O X X \n",
"O X \n",
" X O \n",
"on move: X\n",
"O X X \n",
"O X \n",
"O X O \n",
"Episode 56, Total Reward: -1\n",
"Average Reward: 0.16071428571428573\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
"O \n",
" \n",
"on move: O\n",
"X \n",
"O \n",
" X \n",
"on move: X\n",
"X \n",
"O O \n",
" X \n",
"on move: O\n",
"X \n",
"O O \n",
"X X \n",
"on move: X\n",
"X \n",
"O O \n",
"X O X \n",
"on move: O\n",
"X X \n",
"O O \n",
"X O X \n",
"on move: X\n",
"X X \n",
"O O O \n",
"X O X \n",
"Episode 57, Total Reward: -1\n",
"Average Reward: 0.14035087719298245\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" O X \n",
" \n",
" \n",
"on move: O\n",
" O X \n",
" \n",
" X \n",
"on move: X\n",
" O X \n",
" \n",
"O X \n",
"on move: O\n",
" O X \n",
" X \n",
"O X \n",
"Episode 58, Total Reward: 1\n",
"Average Reward: 0.15517241379310345\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
"O \n",
"on move: O\n",
" \n",
"X \n",
"O X \n",
"on move: X\n",
" \n",
"X O \n",
"O X \n",
"on move: O\n",
" \n",
"X O \n",
"O X X \n",
"on move: X\n",
" O \n",
"X O \n",
"O X X \n",
"on move: O\n",
" O X \n",
"X O \n",
"O X X \n",
"on move: X\n",
"O O X \n",
"X O \n",
"O X X \n",
"on move: O\n",
"O O X \n",
"X X O \n",
"O X X \n",
"Episode 59, Total Reward: 0\n",
"Average Reward: 0.15254237288135594\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
" O X \n",
" \n",
"X O \n",
"on move: O\n",
" O X \n",
"X \n",
"X O \n",
"on move: X\n",
" O X \n",
"X \n",
"X O O \n",
"on move: O\n",
"X O X \n",
"X \n",
"X O O \n",
"Episode 60, Total Reward: 1\n",
"Average Reward: 0.16666666666666666\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
" O \n",
"on move: O\n",
"X \n",
" X \n",
" O \n",
"on move: X\n",
"X O \n",
" X \n",
" O \n",
"on move: O\n",
"X O \n",
" X X \n",
" O \n",
"on move: X\n",
"X O \n",
" X X \n",
"O O \n",
"on move: O\n",
"X O \n",
" X X \n",
"O X O \n",
"on move: X\n",
"X O O \n",
" X X \n",
"O X O \n",
"on move: O\n",
"X O O \n",
"X X X \n",
"O X O \n",
"Episode 61, Total Reward: 1\n",
"Average Reward: 0.18032786885245902\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
"O \n",
"on move: O\n",
" \n",
" X \n",
"O X \n",
"on move: X\n",
" O \n",
" X \n",
"O X \n",
"on move: O\n",
"X O \n",
" X \n",
"O X \n",
"on move: X\n",
"X O \n",
" X \n",
"O X O \n",
"on move: O\n",
"X O \n",
"X X \n",
"O X O \n",
"on move: X\n",
"X O \n",
"X O X \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"X O X \n",
"O X O \n",
"Episode 62, Total Reward: 0\n",
"Average Reward: 0.1774193548387097\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" O \n",
" \n",
"on move: O\n",
"X \n",
" O \n",
" X \n",
"on move: X\n",
"X \n",
" O \n",
"O X \n",
"on move: O\n",
"X \n",
" O \n",
"O X X \n",
"on move: X\n",
"X \n",
"O O \n",
"O X X \n",
"on move: O\n",
"X X \n",
"O O \n",
"O X X \n",
"on move: X\n",
"X X \n",
"O O O \n",
"O X X \n",
"Episode 63, Total Reward: -1\n",
"Average Reward: 0.15873015873015872\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
"O \n",
" \n",
"X \n",
"on move: O\n",
"O \n",
"X \n",
"X \n",
"on move: X\n",
"O \n",
"X O \n",
"X \n",
"on move: O\n",
"O X \n",
"X O \n",
"X \n",
"on move: X\n",
"O O X \n",
"X O \n",
"X \n",
"on move: O\n",
"O O X \n",
"X X O \n",
"X \n",
"Episode 64, Total Reward: 1\n",
"Average Reward: 0.171875\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O \n",
" X \n",
"O X \n",
"on move: O\n",
" X O \n",
" X \n",
"O X \n",
"Episode 65, Total Reward: 1\n",
"Average Reward: 0.18461538461538463\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" O \n",
" \n",
"on move: O\n",
"X \n",
" O \n",
" X \n",
"on move: X\n",
"X \n",
" O O \n",
" X \n",
"on move: O\n",
"X X \n",
" O O \n",
" X \n",
"on move: X\n",
"X X \n",
" O O \n",
" X O \n",
"on move: O\n",
"X X X \n",
" O O \n",
" X O \n",
"Episode 66, Total Reward: 1\n",
"Average Reward: 0.19696969696969696\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
"O \n",
"on move: O\n",
" \n",
" X X \n",
"O \n",
"on move: X\n",
" O \n",
" X X \n",
"O \n",
"on move: O\n",
" X O \n",
" X X \n",
"O \n",
"on move: X\n",
" X O \n",
"O X X \n",
"O \n",
"on move: O\n",
" X O \n",
"O X X \n",
"O X \n",
"on move: X\n",
"O X O \n",
"O X X \n",
"O X \n",
"Episode 67, Total Reward: -1\n",
"Average Reward: 0.1791044776119403\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O \n",
"O X \n",
" X \n",
"on move: O\n",
" O X \n",
"O X \n",
" X \n",
"on move: X\n",
" O X \n",
"O X \n",
"O X \n",
"on move: O\n",
" O X \n",
"O X X \n",
"O X \n",
"on move: X\n",
" O X \n",
"O X X \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"O X X \n",
"O X O \n",
"Episode 68, Total Reward: 0\n",
"Average Reward: 0.17647058823529413\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
"O X \n",
"on move: O\n",
" \n",
" X \n",
"O X \n",
"on move: X\n",
" \n",
" X \n",
"O O X \n",
"on move: O\n",
" X \n",
" X \n",
"O O X \n",
"on move: X\n",
"O X \n",
" X \n",
"O O X \n",
"on move: O\n",
"O X \n",
" X X \n",
"O O X \n",
"on move: X\n",
"O X O \n",
" X X \n",
"O O X \n",
"on move: O\n",
"O X O \n",
"X X X \n",
"O O X \n",
"Episode 69, Total Reward: 1\n",
"Average Reward: 0.18840579710144928\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
" O O \n",
"X \n",
" X \n",
"on move: O\n",
" O O \n",
"X X \n",
" X \n",
"on move: X\n",
" O O \n",
"X X \n",
" O X \n",
"on move: O\n",
" O O \n",
"X X \n",
"X O X \n",
"on move: X\n",
" O O \n",
"X O X \n",
"X O X \n",
"Episode 70, Total Reward: -1\n",
"Average Reward: 0.17142857142857143\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" O \n",
"X \n",
"on move: O\n",
"X \n",
" O \n",
"X \n",
"on move: X\n",
"X \n",
" O \n",
"X O \n",
"on move: O\n",
"X \n",
" O X \n",
"X O \n",
"on move: X\n",
"X O \n",
" O X \n",
"X O \n",
"Episode 71, Total Reward: -1\n",
"Average Reward: 0.15492957746478872\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" O \n",
" \n",
"on move: O\n",
"X \n",
"X O \n",
" \n",
"on move: X\n",
"X O \n",
"X O \n",
" \n",
"on move: O\n",
"X O \n",
"X O \n",
" X \n",
"on move: X\n",
"X O \n",
"X O \n",
" O X \n",
"on move: O\n",
"X O \n",
"X X O \n",
" O X \n",
"Episode 72, Total Reward: 1\n",
"Average Reward: 0.16666666666666666\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X O \n",
" \n",
"on move: O\n",
" \n",
" X O \n",
" X \n",
"on move: X\n",
" O \n",
" X O \n",
" X \n",
"on move: O\n",
" O \n",
" X O \n",
" X X \n",
"on move: X\n",
" O \n",
"O X O \n",
" X X \n",
"on move: O\n",
"X O \n",
"O X O \n",
" X X \n",
"Episode 73, Total Reward: 1\n",
"Average Reward: 0.1780821917808219\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
" O \n",
"on move: O\n",
" X \n",
"X \n",
" O \n",
"on move: X\n",
" X \n",
"X O \n",
" O \n",
"on move: O\n",
" X \n",
"X O X \n",
" O \n",
"on move: X\n",
" X \n",
"X O X \n",
"O O \n",
"on move: O\n",
"X X \n",
"X O X \n",
"O O \n",
"on move: X\n",
"X X O \n",
"X O X \n",
"O O \n",
"Episode 74, Total Reward: -1\n",
"Average Reward: 0.16216216216216217\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
"O X \n",
"on move: O\n",
" \n",
" \n",
"O X X \n",
"on move: X\n",
" O \n",
" \n",
"O X X \n",
"on move: O\n",
" X O \n",
" \n",
"O X X \n",
"on move: X\n",
" X O \n",
"O \n",
"O X X \n",
"on move: O\n",
" X O \n",
"O X \n",
"O X X \n",
"Episode 75, Total Reward: 1\n",
"Average Reward: 0.17333333333333334\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
"O \n",
" \n",
"X \n",
"on move: O\n",
"O \n",
" \n",
"X X \n",
"on move: X\n",
"O \n",
" O \n",
"X X \n",
"on move: O\n",
"O \n",
" O \n",
"X X X \n",
"Episode 76, Total Reward: 1\n",
"Average Reward: 0.18421052631578946\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" \n",
"O \n",
" X X \n",
"on move: X\n",
" O \n",
"O \n",
" X X \n",
"on move: O\n",
" O \n",
"O X \n",
" X X \n",
"on move: X\n",
" O \n",
"O X \n",
"O X X \n",
"on move: O\n",
" O \n",
"O X X \n",
"O X X \n",
"on move: X\n",
" O O \n",
"O X X \n",
"O X X \n",
"on move: O\n",
"X O O \n",
"O X X \n",
"O X X \n",
"Episode 77, Total Reward: 1\n",
"Average Reward: 0.19480519480519481\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
"O \n",
"on move: O\n",
" X \n",
" \n",
"O X \n",
"on move: X\n",
" X \n",
" O \n",
"O X \n",
"on move: O\n",
"X X \n",
" O \n",
"O X \n",
"on move: X\n",
"X X \n",
"O O \n",
"O X \n",
"on move: O\n",
"X X \n",
"O O \n",
"O X X \n",
"on move: X\n",
"X X \n",
"O O O \n",
"O X X \n",
"Episode 78, Total Reward: -1\n",
"Average Reward: 0.1794871794871795\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O X \n",
" X \n",
" \n",
"on move: X\n",
" O X \n",
" X \n",
" O \n",
"on move: O\n",
"X O X \n",
" X \n",
" O \n",
"on move: X\n",
"X O X \n",
" X \n",
" O O \n",
"on move: O\n",
"X O X \n",
"X X \n",
" O O \n",
"on move: X\n",
"X O X \n",
"X X O \n",
" O O \n",
"on move: O\n",
"X O X \n",
"X X O \n",
"X O O \n",
"Episode 79, Total Reward: 1\n",
"Average Reward: 0.189873417721519\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" \n",
" X X \n",
"on move: X\n",
"O O \n",
" \n",
" X X \n",
"on move: O\n",
"O O \n",
"X \n",
" X X \n",
"on move: X\n",
"O O O \n",
"X \n",
" X X \n",
"Episode 80, Total Reward: -1\n",
"Average Reward: 0.175\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
"O \n",
" \n",
"X \n",
"on move: O\n",
"O \n",
" X \n",
"X \n",
"on move: X\n",
"O \n",
" O X \n",
"X \n",
"on move: O\n",
"O \n",
" O X \n",
"X X \n",
"on move: X\n",
"O \n",
" O X \n",
"X O X \n",
"on move: O\n",
"O X \n",
" O X \n",
"X O X \n",
"Episode 81, Total Reward: 1\n",
"Average Reward: 0.18518518518518517\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
"O \n",
" X \n",
"X O \n",
"on move: O\n",
"O X \n",
" X \n",
"X O \n",
"Episode 82, Total Reward: 1\n",
"Average Reward: 0.1951219512195122\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
" X \n",
" O X \n",
"on move: X\n",
" \n",
" O X \n",
" O X \n",
"on move: O\n",
" X \n",
" O X \n",
" O X \n",
"on move: X\n",
" X \n",
"O O X \n",
" O X \n",
"on move: O\n",
" X \n",
"O O X \n",
"X O X \n",
"on move: X\n",
" X O \n",
"O O X \n",
"X O X \n",
"on move: O\n",
"X X O \n",
"O O X \n",
"X O X \n",
"Episode 83, Total Reward: 0\n",
"Average Reward: 0.1927710843373494\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
" X \n",
"O X \n",
" \n",
"on move: X\n",
" X \n",
"O X \n",
" O \n",
"on move: O\n",
" X X \n",
"O X \n",
" O \n",
"on move: X\n",
" X X \n",
"O X \n",
" O O \n",
"on move: O\n",
"X X X \n",
"O X \n",
" O O \n",
"Episode 84, Total Reward: 1\n",
"Average Reward: 0.20238095238095238\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
" O \n",
" \n",
"X X \n",
"on move: X\n",
"O O \n",
" \n",
"X X \n",
"on move: O\n",
"O O \n",
"X \n",
"X X \n",
"on move: X\n",
"O O \n",
"X \n",
"X X O \n",
"on move: O\n",
"O O X \n",
"X \n",
"X X O \n",
"on move: X\n",
"O O X \n",
"X O \n",
"X X O \n",
"Episode 85, Total Reward: -1\n",
"Average Reward: 0.18823529411764706\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" O X \n",
" \n",
" \n",
"on move: O\n",
" O X \n",
" \n",
" X \n",
"on move: X\n",
" O X \n",
" O \n",
" X \n",
"on move: O\n",
" O X \n",
" O \n",
" X X \n",
"on move: X\n",
" O X \n",
" O O \n",
" X X \n",
"on move: O\n",
"X O X \n",
" O O \n",
" X X \n",
"on move: X\n",
"X O X \n",
"O O O \n",
" X X \n",
"Episode 86, Total Reward: -1\n",
"Average Reward: 0.1744186046511628\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X X \n",
" O \n",
" \n",
"on move: X\n",
" X X \n",
" O O \n",
" \n",
"on move: O\n",
"X X X \n",
" O O \n",
" \n",
"Episode 87, Total Reward: 1\n",
"Average Reward: 0.1839080459770115\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
" O \n",
"on move: O\n",
" X \n",
"X \n",
" O \n",
"on move: X\n",
" X \n",
"X \n",
" O O \n",
"on move: O\n",
" X \n",
"X \n",
"X O O \n",
"on move: X\n",
" X \n",
"X O \n",
"X O O \n",
"on move: O\n",
" X X \n",
"X O \n",
"X O O \n",
"on move: X\n",
" X X \n",
"X O O \n",
"X O O \n",
"on move: O\n",
"X X X \n",
"X O O \n",
"X O O \n",
"Episode 88, Total Reward: 1\n",
"Average Reward: 0.19318181818181818\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
"O \n",
" \n",
" X \n",
"on move: O\n",
"O X \n",
" \n",
" X \n",
"on move: X\n",
"O X \n",
"O \n",
" X \n",
"on move: O\n",
"O X X \n",
"O \n",
" X \n",
"on move: X\n",
"O X X \n",
"O O \n",
" X \n",
"on move: O\n",
"O X X \n",
"O O \n",
"X X \n",
"on move: X\n",
"O X X \n",
"O O O \n",
"X X \n",
"Episode 89, Total Reward: -1\n",
"Average Reward: 0.1797752808988764\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" \n",
"X X \n",
"on move: X\n",
" O O \n",
" \n",
"X X \n",
"on move: O\n",
" O O \n",
" X \n",
"X X \n",
"on move: X\n",
" O O \n",
" O X \n",
"X X \n",
"on move: O\n",
" O O \n",
" O X \n",
"X X X \n",
"Episode 90, Total Reward: 1\n",
"Average Reward: 0.18888888888888888\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
"X \n",
"O X \n",
" \n",
"on move: X\n",
"X \n",
"O O X \n",
" \n",
"on move: O\n",
"X \n",
"O O X \n",
"X \n",
"on move: X\n",
"X \n",
"O O X \n",
"X O \n",
"on move: O\n",
"X X \n",
"O O X \n",
"X O \n",
"on move: X\n",
"X X O \n",
"O O X \n",
"X O \n",
"on move: O\n",
"X X O \n",
"O O X \n",
"X X O \n",
"Episode 91, Total Reward: 0\n",
"Average Reward: 0.18681318681318682\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
" O \n",
" \n",
"X X \n",
"on move: X\n",
" O \n",
" \n",
"X X O \n",
"on move: O\n",
" O \n",
"X \n",
"X X O \n",
"on move: X\n",
"O O \n",
"X \n",
"X X O \n",
"on move: O\n",
"O X O \n",
"X \n",
"X X O \n",
"on move: X\n",
"O X O \n",
"X O \n",
"X X O \n",
"Episode 92, Total Reward: -1\n",
"Average Reward: 0.17391304347826086\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
" X O \n",
"on move: O\n",
"X \n",
" \n",
" X O \n",
"on move: X\n",
"X \n",
"O \n",
" X O \n",
"on move: O\n",
"X X \n",
"O \n",
" X O \n",
"on move: X\n",
"X O X \n",
"O \n",
" X O \n",
"on move: O\n",
"X O X \n",
"O \n",
"X X O \n",
"on move: X\n",
"X O X \n",
"O O \n",
"X X O \n",
"on move: O\n",
"X O X \n",
"O O X \n",
"X X O \n",
"Episode 93, Total Reward: 0\n",
"Average Reward: 0.17204301075268819\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
"O \n",
"X \n",
" \n",
"on move: O\n",
"O \n",
"X \n",
" X \n",
"on move: X\n",
"O \n",
"X \n",
" O X \n",
"on move: O\n",
"O \n",
"X \n",
"X O X \n",
"on move: X\n",
"O O \n",
"X \n",
"X O X \n",
"on move: O\n",
"O O \n",
"X X \n",
"X O X \n",
"on move: X\n",
"O O \n",
"X X O \n",
"X O X \n",
"on move: O\n",
"O O X \n",
"X X O \n",
"X O X \n",
"Episode 94, Total Reward: 1\n",
"Average Reward: 0.18085106382978725\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" X O \n",
" \n",
" X \n",
"on move: X\n",
" X O \n",
" O \n",
" X \n",
"on move: O\n",
" X O \n",
" X O \n",
" X \n",
"on move: X\n",
" X O \n",
" X O \n",
"O X \n",
"on move: O\n",
" X O \n",
" X O \n",
"O X X \n",
"Episode 95, Total Reward: 1\n",
"Average Reward: 0.18947368421052632\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
" X \n",
" O X \n",
"on move: X\n",
" O \n",
" X \n",
" O X \n",
"on move: O\n",
" X O \n",
" X \n",
" O X \n",
"on move: X\n",
"O X O \n",
" X \n",
" O X \n",
"on move: O\n",
"O X O \n",
" X X \n",
" O X \n",
"on move: X\n",
"O X O \n",
" X X \n",
"O O X \n",
"on move: O\n",
"O X O \n",
"X X X \n",
"O O X \n",
"Episode 96, Total Reward: 1\n",
"Average Reward: 0.19791666666666666\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
" O \n",
"X O \n",
" X \n",
"on move: O\n",
" O \n",
"X O \n",
" X X \n",
"on move: X\n",
" O \n",
"X O O \n",
" X X \n",
"on move: O\n",
"X O \n",
"X O O \n",
" X X \n",
"on move: X\n",
"X O \n",
"X O O \n",
"O X X \n",
"on move: O\n",
"X O X \n",
"X O O \n",
"O X X \n",
"Episode 97, Total Reward: 0\n",
"Average Reward: 0.1958762886597938\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X \n",
"O X \n",
" \n",
"on move: X\n",
" O X \n",
"O X \n",
" \n",
"on move: O\n",
" O X \n",
"O X \n",
" X \n",
"on move: X\n",
"O O X \n",
"O X \n",
" X \n",
"on move: O\n",
"O O X \n",
"O X \n",
" X X \n",
"Episode 98, Total Reward: 1\n",
"Average Reward: 0.20408163265306123\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
"O O \n",
"X \n",
" X \n",
"on move: O\n",
"O O \n",
"X \n",
" X X \n",
"on move: X\n",
"O O \n",
"X \n",
"O X X \n",
"on move: O\n",
"O X O \n",
"X \n",
"O X X \n",
"on move: X\n",
"O X O \n",
"X O \n",
"O X X \n",
"on move: O\n",
"O X O \n",
"X X O \n",
"O X X \n",
"Episode 99, Total Reward: 1\n",
"Average Reward: 0.21212121212121213\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" \n",
"O \n",
" X X \n",
"on move: X\n",
" \n",
"O \n",
"O X X \n",
"on move: O\n",
" \n",
"O X \n",
"O X X \n",
"on move: X\n",
" O \n",
"O X \n",
"O X X \n",
"on move: O\n",
"X O \n",
"O X \n",
"O X X \n",
"on move: X\n",
"X O \n",
"O O X \n",
"O X X \n",
"Episode 100, Total Reward: -1\n",
"Average Reward: 0.2\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
"X O \n",
" X \n",
" \n",
"on move: X\n",
"X O \n",
" O X \n",
" \n",
"on move: O\n",
"X O \n",
" O X \n",
"X \n",
"on move: X\n",
"X O \n",
" O X \n",
"X O \n",
"on move: O\n",
"X O \n",
" O X \n",
"X X O \n",
"on move: X\n",
"X O O \n",
" O X \n",
"X X O \n",
"on move: O\n",
"X O O \n",
"X O X \n",
"X X O \n",
"Episode 101, Total Reward: 1\n",
"Average Reward: 0.2079207920792079\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X O \n",
" \n",
"on move: O\n",
" X \n",
"X O \n",
" \n",
"on move: X\n",
" X \n",
"X O O \n",
" \n",
"on move: O\n",
" X \n",
"X O O \n",
" X \n",
"on move: X\n",
" X \n",
"X O O \n",
"O X \n",
"on move: O\n",
" X X \n",
"X O O \n",
"O X \n",
"on move: X\n",
"O X X \n",
"X O O \n",
"O X \n",
"on move: O\n",
"O X X \n",
"X O O \n",
"O X X \n",
"Episode 102, Total Reward: 0\n",
"Average Reward: 0.20588235294117646\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O X \n",
" X \n",
" \n",
"on move: X\n",
"O X \n",
" X \n",
" O \n",
"on move: O\n",
"O X \n",
" X \n",
" O X \n",
"on move: X\n",
"O X \n",
" X \n",
"O O X \n",
"on move: O\n",
"O X \n",
" X X \n",
"O O X \n",
"Episode 103, Total Reward: 1\n",
"Average Reward: 0.21359223300970873\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
"O \n",
" \n",
"on move: O\n",
"X X \n",
"O \n",
" \n",
"on move: X\n",
"X X \n",
"O O \n",
" \n",
"on move: O\n",
"X X X \n",
"O O \n",
" \n",
"Episode 104, Total Reward: 1\n",
"Average Reward: 0.22115384615384615\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X \n",
"O \n",
"X \n",
"on move: X\n",
" X \n",
"O \n",
"X O \n",
"on move: O\n",
" X X \n",
"O \n",
"X O \n",
"on move: X\n",
"O X X \n",
"O \n",
"X O \n",
"on move: O\n",
"O X X \n",
"O X \n",
"X O \n",
"on move: X\n",
"O X X \n",
"O X \n",
"X O O \n",
"on move: O\n",
"O X X \n",
"O X X \n",
"X O O \n",
"Episode 105, Total Reward: 1\n",
"Average Reward: 0.22857142857142856\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
" X \n",
"O X \n",
" \n",
"on move: X\n",
" X \n",
"O X \n",
" O \n",
"on move: O\n",
"X X \n",
"O X \n",
" O \n",
"on move: X\n",
"X O X \n",
"O X \n",
" O \n",
"on move: O\n",
"X O X \n",
"O X \n",
"X O \n",
"on move: X\n",
"X O X \n",
"O O X \n",
"X O \n",
"on move: O\n",
"X O X \n",
"O O X \n",
"X X O \n",
"Episode 106, Total Reward: 0\n",
"Average Reward: 0.22641509433962265\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X X \n",
" \n",
"on move: X\n",
" O \n",
" X X \n",
" O \n",
"on move: O\n",
" O \n",
"X X X \n",
" O \n",
"Episode 107, Total Reward: 1\n",
"Average Reward: 0.2336448598130841\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
"X \n",
" X \n",
" O \n",
"on move: X\n",
"X \n",
" X \n",
" O O \n",
"on move: O\n",
"X X \n",
" X \n",
" O O \n",
"on move: X\n",
"X X \n",
"O X \n",
" O O \n",
"on move: O\n",
"X X X \n",
"O X \n",
" O O \n",
"Episode 108, Total Reward: 1\n",
"Average Reward: 0.24074074074074073\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" X \n",
"O \n",
" X \n",
"on move: X\n",
" X \n",
"O O \n",
" X \n",
"on move: O\n",
" X \n",
"O O \n",
"X X \n",
"on move: X\n",
" X \n",
"O O O \n",
"X X \n",
"Episode 109, Total Reward: -1\n",
"Average Reward: 0.22935779816513763\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
" \n",
" X \n",
"X O O \n",
"on move: O\n",
" \n",
" X X \n",
"X O O \n",
"on move: X\n",
" O \n",
" X X \n",
"X O O \n",
"on move: O\n",
" O X \n",
" X X \n",
"X O O \n",
"Episode 110, Total Reward: 1\n",
"Average Reward: 0.23636363636363636\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X \n",
" O \n",
" X \n",
"on move: X\n",
" X \n",
" O O \n",
" X \n",
"on move: O\n",
"X X \n",
" O O \n",
" X \n",
"on move: X\n",
"X X \n",
" O O \n",
" O X \n",
"on move: O\n",
"X X \n",
"X O O \n",
" O X \n",
"on move: X\n",
"X O X \n",
"X O O \n",
" O X \n",
"Episode 111, Total Reward: -1\n",
"Average Reward: 0.22522522522522523\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" \n",
"O \n",
"X X \n",
"on move: X\n",
" \n",
"O O \n",
"X X \n",
"on move: O\n",
" \n",
"O X O \n",
"X X \n",
"on move: X\n",
" O \n",
"O X O \n",
"X X \n",
"on move: O\n",
"X O \n",
"O X O \n",
"X X \n",
"on move: X\n",
"X O O \n",
"O X O \n",
"X X \n",
"on move: O\n",
"X O O \n",
"O X O \n",
"X X X \n",
"Episode 112, Total Reward: 1\n",
"Average Reward: 0.23214285714285715\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X \n",
" \n",
"X \n",
"on move: X\n",
"O X \n",
"O \n",
"X \n",
"on move: O\n",
"O X X \n",
"O \n",
"X \n",
"on move: X\n",
"O X X \n",
"O \n",
"X O \n",
"on move: O\n",
"O X X \n",
"O \n",
"X O X \n",
"on move: X\n",
"O X X \n",
"O O \n",
"X O X \n",
"on move: O\n",
"O X X \n",
"O X O \n",
"X O X \n",
"Episode 113, Total Reward: 1\n",
"Average Reward: 0.23893805309734514\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" \n",
" X X \n",
"on move: X\n",
" O \n",
" O \n",
" X X \n",
"on move: O\n",
" O \n",
" O \n",
"X X X \n",
"Episode 114, Total Reward: 1\n",
"Average Reward: 0.24561403508771928\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
" O \n",
"X \n",
"O X \n",
"on move: O\n",
"X O \n",
"X \n",
"O X \n",
"on move: X\n",
"X O \n",
"X O \n",
"O X \n",
"on move: O\n",
"X O \n",
"X O X \n",
"O X \n",
"on move: X\n",
"X O \n",
"X O X \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"X O X \n",
"O X O \n",
"Episode 115, Total Reward: 0\n",
"Average Reward: 0.24347826086956523\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
"O \n",
"on move: O\n",
" X X \n",
" \n",
"O \n",
"on move: X\n",
"O X X \n",
" \n",
"O \n",
"on move: O\n",
"O X X \n",
" X \n",
"O \n",
"on move: X\n",
"O X X \n",
" X \n",
"O O \n",
"on move: O\n",
"O X X \n",
" X \n",
"O O X \n",
"Episode 116, Total Reward: 1\n",
"Average Reward: 0.25\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" O \n",
"X \n",
"on move: O\n",
" \n",
" O \n",
"X X \n",
"on move: X\n",
" \n",
" O \n",
"X O X \n",
"on move: O\n",
" \n",
"X O \n",
"X O X \n",
"on move: X\n",
"O \n",
"X O \n",
"X O X \n",
"on move: O\n",
"O X \n",
"X O \n",
"X O X \n",
"on move: X\n",
"O X \n",
"X O O \n",
"X O X \n",
"on move: O\n",
"O X X \n",
"X O O \n",
"X O X \n",
"Episode 117, Total Reward: 0\n",
"Average Reward: 0.24786324786324787\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X X \n",
" \n",
"on move: X\n",
" O \n",
" X X \n",
" O \n",
"on move: O\n",
"X O \n",
" X X \n",
" O \n",
"on move: X\n",
"X O \n",
" X X \n",
"O O \n",
"on move: O\n",
"X O \n",
"X X X \n",
"O O \n",
"Episode 118, Total Reward: 1\n",
"Average Reward: 0.2542372881355932\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X \n",
"X \n",
" \n",
"on move: X\n",
"O X \n",
"X \n",
" O \n",
"on move: O\n",
"O X \n",
"X X \n",
" O \n",
"on move: X\n",
"O O X \n",
"X X \n",
" O \n",
"on move: O\n",
"O O X \n",
"X X X \n",
" O \n",
"Episode 119, Total Reward: 1\n",
"Average Reward: 0.2605042016806723\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" \n",
" O \n",
"X X \n",
"on move: X\n",
" O \n",
" O \n",
"X X \n",
"on move: O\n",
" O X \n",
" O \n",
"X X \n",
"on move: X\n",
" O X \n",
" O \n",
"X O X \n",
"on move: O\n",
" O X \n",
"X O \n",
"X O X \n",
"on move: X\n",
" O X \n",
"X O O \n",
"X O X \n",
"Episode 120, Total Reward: -1\n",
"Average Reward: 0.25\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X X \n",
" \n",
"on move: X\n",
"O O \n",
" X X \n",
" \n",
"on move: O\n",
"O O X \n",
" X X \n",
" \n",
"on move: X\n",
"O O X \n",
" X X \n",
"O \n",
"on move: O\n",
"O O X \n",
" X X \n",
"O X \n",
"Episode 121, Total Reward: 1\n",
"Average Reward: 0.256198347107438\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
"O \n",
"on move: O\n",
"X X \n",
" \n",
"O \n",
"on move: X\n",
"X X \n",
"O \n",
"O \n",
"on move: O\n",
"X X \n",
"O \n",
"O X \n",
"on move: X\n",
"X X O \n",
"O \n",
"O X \n",
"on move: O\n",
"X X O \n",
"O \n",
"O X X \n",
"on move: X\n",
"X X O \n",
"O O \n",
"O X X \n",
"Episode 122, Total Reward: -1\n",
"Average Reward: 0.2459016393442623\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" \n",
" \n",
"X X O \n",
"on move: X\n",
" \n",
"O \n",
"X X O \n",
"on move: O\n",
" X \n",
"O \n",
"X X O \n",
"on move: X\n",
" X \n",
"O O \n",
"X X O \n",
"on move: O\n",
" X X \n",
"O O \n",
"X X O \n",
"on move: X\n",
" X X \n",
"O O O \n",
"X X O \n",
"Episode 123, Total Reward: -1\n",
"Average Reward: 0.23577235772357724\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
"O \n",
" \n",
" X \n",
"on move: O\n",
"O X \n",
" \n",
" X \n",
"on move: X\n",
"O X \n",
" O \n",
" X \n",
"on move: O\n",
"O X \n",
" O \n",
" X X \n",
"on move: X\n",
"O O X \n",
" O \n",
" X X \n",
"on move: O\n",
"O O X \n",
"X O \n",
" X X \n",
"on move: X\n",
"O O X \n",
"X O O \n",
" X X \n",
"on move: O\n",
"O O X \n",
"X O O \n",
"X X X \n",
"Episode 124, Total Reward: 1\n",
"Average Reward: 0.24193548387096775\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" X \n",
"O \n",
" X \n",
"on move: X\n",
" X \n",
"O \n",
"O X \n",
"on move: O\n",
" X \n",
"O X \n",
"O X \n",
"on move: X\n",
" X \n",
"O X O \n",
"O X \n",
"on move: O\n",
" X X \n",
"O X O \n",
"O X \n",
"on move: X\n",
"O X X \n",
"O X O \n",
"O X \n",
"Episode 125, Total Reward: -1\n",
"Average Reward: 0.232\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
" \n",
"O X \n",
"X \n",
"on move: X\n",
" \n",
"O O X \n",
"X \n",
"on move: O\n",
" \n",
"O O X \n",
"X X \n",
"on move: X\n",
" \n",
"O O X \n",
"X O X \n",
"on move: O\n",
"X \n",
"O O X \n",
"X O X \n",
"on move: X\n",
"X O \n",
"O O X \n",
"X O X \n",
"Episode 126, Total Reward: -1\n",
"Average Reward: 0.2222222222222222\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X X \n",
" O \n",
" \n",
"on move: X\n",
" X X \n",
" O O \n",
" \n",
"on move: O\n",
" X X \n",
" O O \n",
" X \n",
"on move: X\n",
" X X \n",
" O O \n",
"O X \n",
"on move: O\n",
" X X \n",
"X O O \n",
"O X \n",
"on move: X\n",
" X X \n",
"X O O \n",
"O X O \n",
"on move: O\n",
"X X X \n",
"X O O \n",
"O X O \n",
"Episode 127, Total Reward: 1\n",
"Average Reward: 0.2283464566929134\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
"O \n",
"on move: O\n",
"X \n",
"X \n",
"O \n",
"on move: X\n",
"X \n",
"X O \n",
"O \n",
"on move: O\n",
"X \n",
"X O X \n",
"O \n",
"on move: X\n",
"X \n",
"X O X \n",
"O O \n",
"on move: O\n",
"X X \n",
"X O X \n",
"O O \n",
"on move: X\n",
"X X \n",
"X O X \n",
"O O O \n",
"Episode 128, Total Reward: -1\n",
"Average Reward: 0.21875\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X X \n",
" \n",
"on move: X\n",
"O O \n",
" X X \n",
" \n",
"on move: O\n",
"O O \n",
" X X \n",
" X \n",
"on move: X\n",
"O O O \n",
" X X \n",
" X \n",
"Episode 129, Total Reward: -1\n",
"Average Reward: 0.20930232558139536\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X \n",
" O \n",
"X \n",
"on move: X\n",
"O X \n",
" O \n",
"X \n",
"on move: O\n",
"O X \n",
" O \n",
"X X \n",
"on move: X\n",
"O X \n",
" O O \n",
"X X \n",
"on move: O\n",
"O X \n",
" O O \n",
"X X X \n",
"Episode 130, Total Reward: 1\n",
"Average Reward: 0.2153846153846154\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
"O X \n",
" \n",
"X O \n",
"on move: O\n",
"O X \n",
" \n",
"X X O \n",
"on move: X\n",
"O X \n",
" O \n",
"X X O \n",
"Episode 131, Total Reward: -1\n",
"Average Reward: 0.20610687022900764\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
" X \n",
"O X \n",
" \n",
"on move: X\n",
" O X \n",
"O X \n",
" \n",
"on move: O\n",
" O X \n",
"O X \n",
" X \n",
"Episode 132, Total Reward: 1\n",
"Average Reward: 0.21212121212121213\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O \n",
" X O \n",
" X \n",
"on move: O\n",
"O \n",
" X O \n",
"X X \n",
"on move: X\n",
"O O \n",
" X O \n",
"X X \n",
"on move: O\n",
"O O \n",
" X O \n",
"X X X \n",
"Episode 133, Total Reward: 1\n",
"Average Reward: 0.21804511278195488\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" O X \n",
" \n",
" \n",
"on move: O\n",
"X O X \n",
" \n",
" \n",
"on move: X\n",
"X O X \n",
" O \n",
" \n",
"on move: O\n",
"X O X \n",
" O \n",
"X \n",
"on move: X\n",
"X O X \n",
"O O \n",
"X \n",
"on move: O\n",
"X O X \n",
"O X O \n",
"X \n",
"Episode 134, Total Reward: 1\n",
"Average Reward: 0.22388059701492538\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
"O \n",
" X \n",
"X O \n",
"on move: O\n",
"O \n",
"X X \n",
"X O \n",
"on move: X\n",
"O O \n",
"X X \n",
"X O \n",
"on move: O\n",
"O O \n",
"X X X \n",
"X O \n",
"Episode 135, Total Reward: 1\n",
"Average Reward: 0.22962962962962963\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X \n",
"O \n",
"on move: O\n",
" X \n",
"X \n",
"O \n",
"on move: X\n",
"O X \n",
"X \n",
"O \n",
"on move: O\n",
"O X \n",
"X X \n",
"O \n",
"on move: X\n",
"O X \n",
"X X \n",
"O O \n",
"on move: O\n",
"O X \n",
"X X \n",
"O X O \n",
"on move: X\n",
"O X \n",
"X O X \n",
"O X O \n",
"Episode 136, Total Reward: -1\n",
"Average Reward: 0.22058823529411764\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
"O \n",
" \n",
" X \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O \n",
" X \n",
"O X \n",
"on move: O\n",
"O \n",
"X X \n",
"O X \n",
"on move: X\n",
"O \n",
"X X \n",
"O O X \n",
"on move: O\n",
"O \n",
"X X X \n",
"O O X \n",
"Episode 137, Total Reward: 1\n",
"Average Reward: 0.22627737226277372\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" O \n",
"X \n",
" \n",
"on move: O\n",
"X O \n",
"X \n",
" \n",
"on move: X\n",
"X O \n",
"X \n",
"O \n",
"on move: O\n",
"X O \n",
"X X \n",
"O \n",
"on move: X\n",
"X O O \n",
"X X \n",
"O \n",
"on move: O\n",
"X O O \n",
"X X X \n",
"O \n",
"Episode 138, Total Reward: 1\n",
"Average Reward: 0.2318840579710145\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
"O O \n",
" X \n",
" X \n",
"on move: O\n",
"O X O \n",
" X \n",
" X \n",
"Episode 139, Total Reward: 1\n",
"Average Reward: 0.23741007194244604\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
"O X \n",
"on move: O\n",
" X \n",
" \n",
"O X \n",
"on move: X\n",
" X \n",
" O \n",
"O X \n",
"on move: O\n",
" X \n",
"X O \n",
"O X \n",
"on move: X\n",
" X O \n",
"X O \n",
"O X \n",
"Episode 140, Total Reward: -1\n",
"Average Reward: 0.22857142857142856\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" X O \n",
" \n",
" X \n",
"on move: X\n",
" X O \n",
" \n",
" O X \n",
"on move: O\n",
" X O \n",
"X \n",
" O X \n",
"on move: X\n",
" X O \n",
"X O \n",
" O X \n",
"on move: O\n",
" X O \n",
"X X O \n",
" O X \n",
"on move: X\n",
" X O \n",
"X X O \n",
"O O X \n",
"on move: O\n",
"X X O \n",
"X X O \n",
"O O X \n",
"Episode 141, Total Reward: 1\n",
"Average Reward: 0.23404255319148937\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
"O \n",
" \n",
"on move: O\n",
"X \n",
"O X \n",
" \n",
"on move: X\n",
"X \n",
"O X \n",
" O \n",
"on move: O\n",
"X \n",
"O X \n",
" O X \n",
"Episode 142, Total Reward: 1\n",
"Average Reward: 0.23943661971830985\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
"O \n",
" \n",
" X \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O \n",
" X \n",
" O X \n",
"on move: O\n",
"O \n",
"X X \n",
" O X \n",
"on move: X\n",
"O O \n",
"X X \n",
" O X \n",
"on move: O\n",
"O X O \n",
"X X \n",
" O X \n",
"on move: X\n",
"O X O \n",
"X X O \n",
" O X \n",
"on move: O\n",
"O X O \n",
"X X O \n",
"X O X \n",
"Episode 143, Total Reward: 0\n",
"Average Reward: 0.23776223776223776\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
"O X \n",
" \n",
"X O \n",
"on move: O\n",
"O X \n",
" X \n",
"X O \n",
"Episode 144, Total Reward: 1\n",
"Average Reward: 0.24305555555555555\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
" O X \n",
" \n",
"X \n",
"on move: X\n",
"O O X \n",
" \n",
"X \n",
"on move: O\n",
"O O X \n",
" \n",
"X X \n",
"on move: X\n",
"O O X \n",
"O \n",
"X X \n",
"on move: O\n",
"O O X \n",
"O \n",
"X X X \n",
"Episode 145, Total Reward: 1\n",
"Average Reward: 0.2482758620689655\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" \n",
"X \n",
"on move: X\n",
"X O \n",
"O \n",
"X \n",
"on move: O\n",
"X O \n",
"O \n",
"X X \n",
"on move: X\n",
"X O \n",
"O O \n",
"X X \n",
"on move: O\n",
"X O \n",
"O O \n",
"X X X \n",
"Episode 146, Total Reward: 1\n",
"Average Reward: 0.2534246575342466\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" \n",
" O X \n",
" X \n",
"on move: X\n",
" \n",
"O O X \n",
" X \n",
"on move: O\n",
" \n",
"O O X \n",
"X X \n",
"on move: X\n",
" O \n",
"O O X \n",
"X X \n",
"on move: O\n",
"X O \n",
"O O X \n",
"X X \n",
"on move: X\n",
"X O \n",
"O O X \n",
"X O X \n",
"on move: O\n",
"X X O \n",
"O O X \n",
"X O X \n",
"Episode 147, Total Reward: 0\n",
"Average Reward: 0.25170068027210885\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" O X \n",
" \n",
" \n",
"on move: O\n",
"X O X \n",
" \n",
" \n",
"on move: X\n",
"X O X \n",
" \n",
" O \n",
"on move: O\n",
"X O X \n",
" X \n",
" O \n",
"on move: X\n",
"X O X \n",
" X \n",
"O O \n",
"on move: O\n",
"X O X \n",
"X X \n",
"O O \n",
"on move: X\n",
"X O X \n",
"X X O \n",
"O O \n",
"on move: O\n",
"X O X \n",
"X X O \n",
"O X O \n",
"Episode 148, Total Reward: 0\n",
"Average Reward: 0.25\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X \n",
"O \n",
" X \n",
"on move: X\n",
"O X \n",
"O \n",
" X \n",
"on move: O\n",
"O X \n",
"O X \n",
" X \n",
"Episode 149, Total Reward: 1\n",
"Average Reward: 0.2550335570469799\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" X \n",
" X \n",
" O \n",
"on move: X\n",
"O X \n",
" X \n",
" O \n",
"on move: O\n",
"O X X \n",
" X \n",
" O \n",
"on move: X\n",
"O X X \n",
" X \n",
" O O \n",
"on move: O\n",
"O X X \n",
" X X \n",
" O O \n",
"on move: X\n",
"O X X \n",
"O X X \n",
" O O \n",
"on move: O\n",
"O X X \n",
"O X X \n",
"X O O \n",
"Episode 150, Total Reward: 1\n",
"Average Reward: 0.26\n"
]
}
],
"source": [
"# Main training loop (works with a separate agent class)\n",
"\n",
"# Create the tic-tac-toe environment\n",
"environment = TicTacToeEnv()\n",
"\n",
"# Create the agent (playing X)\n",
"agent = Agent(symbol=1)\n",
"\n",
"num_episodes = 150  # Number of episodes (games) to train for\n",
"collected_rewards = []  # Total reward (win/loss/draw) of each episode\n",
"\n",
"# Symbol of the player on move at the start of each episode\n",
"oom = 1\n",
"\n",
"for i in range(num_episodes):\n",
"    # Reset the environment and start a new episode\n",
"    state, _ = environment.reset()\n",
"\n",
"    # Total reward for the episode\n",
"    total_reward = 0\n",
"\n",
"    # Game-over flag\n",
"    done = False\n",
"    om = oom\n",
"\n",
"    # At most 9 moves, since the board is 3x3\n",
"    for j in range(9):\n",
"        moves = environment.move_generator()\n",
"\n",
"        # No moves left: end the game\n",
"        if not moves:\n",
"            break\n",
"\n",
"\n",
"        if len(moves) == 1:\n",
"            move = moves[0]  # Only one move is left, so take it\n",
"        else:\n",
"            move = agent.get_action(moves)  # The agent picks a move according to its strategy\n",
"\n",
"        # Make the move and update the game state\n",
"        next_state, reward, done, info = environment.step(move)\n",
"        total_reward += reward\n",
"        state = next_state\n",
"\n",
"        # Render the current game state\n",
"        environment.render()\n",
"\n",
"        if done:\n",
"            break\n",
"\n",
"        om = -om  # Switch the player on move\n",
"\n",
"    collected_rewards.append(total_reward)\n",
"\n",
"    print(f\"Episode {i+1}, Total Reward: {total_reward}\")\n",
"    average_reward = sum(collected_rewards) / len(collected_rewards)\n",
"    print(f\"Average Reward: {average_reward}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}