AIM-PIbd-31-Alekseev-I-S/Lab_6/Lab6.ipynb
Иван Алексеев 7de16ac006 I want more
2024-12-06 17:30:55 +04:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Starting the last lab of this semester, sigh..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### What needs to be done:\n",
"Deploy and run a project that implements reinforcement learning for the game of tic-tac-toe. Port the project to the gymnasium library and a current version of Python. Implement the tic-tac-toe agent as a separate class (following the lecture example). Rewrite the main training loop to work with the separate agent class (following the lecture example). Test the new version of the program."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tic-tac-toe project we start from: https://github.com/nczempin/gym-tic-tac-toe"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Porting the project to the gymnasium library\n",
"Gymnasium is an open-source Python library that provides standardized environments for developing and testing reinforcement learning (RL) algorithms. It continues OpenAI Gym, which was developed under that name until 2022, and is now maintained as a fork by the Farama Foundation.\n",
"\n",
"The library lets developers of RL agents interact with a range of simulations, from simple game tasks to complex physics models. Its unified interface makes it easier to test and compare algorithms.\n",
"\n",
"**Main features of Gymnasium:**\n",
"1. A unified API for RL environments:\n",
"\n",
"- Gymnasium offers a standard interface for interacting with RL environments, built around methods such as reset() and step(action).\n",
"- This makes it easy to switch between environments without changing the agent's code.\n",
"2. A variety of built-in environments:\n",
"\n",
"- The library ships with many ready-made simulations, from simple ones (for example, CartPole and MountainCar) to complex ones (for example, robotics and Atari games).\n",
"- Environments are grouped into categories: control, games, physics, robotics, and others.\n",
"3. Support for classic RL tasks:\n",
"\n",
"- Environments for studying classic problems such as balancing a pendulum, solving puzzles, and controlling robots.\n",
"4. Flexible creation of custom environments:\n",
"\n",
"- Gymnasium lets developers build their own simulations that conform to the API.\n",
"5. Compatibility with RL libraries:\n",
"\n",
"- Gymnasium integrates with popular RL frameworks such as Stable-Baselines3, Ray RLlib, TensorFlow Agents, PyTorch RL, and others.\n",
"6. Visualization:\n",
"\n",
"- Environments can be rendered, which simplifies debugging and demonstrating how an algorithm behaves.\n",
"\n",
"**Main functions of Gymnasium:**\n",
"- Creating an environment (gymnasium.make()): creates an environment instance by its registered name.\n",
"- Resetting an environment (reset()): returns the initial observation together with an info dictionary.\n",
"- Taking an action (step(action)): passes the action to the environment and returns the new observation, the reward, the terminated and truncated flags, and an info dictionary.\n",
"- Closing an environment (close()): releases the resources associated with the environment.\n",
"- Rendering (render()): visualizes the current state of the environment."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import gymnasium as gym\n",
"from gymnasium import spaces\n",
"\n",
"class TicTacToeEnv(gym.Env):\n",
"    metadata = {'render_modes': ['human']}\n",
"\n",
"    symbols = ['O', ' ', 'X']\n",
"\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        # Discrete action space (0-8): the index of a cell on the board.\n",
"        self.action_space = spaces.Discrete(9)\n",
"        # Discrete observation space sized as 9 cells x 3 cell states (empty, X, O) x 2 players (whose move it is).\n",
"        # Note: this does not enumerate every board position (that would be 3**9 * 2); step() returns the state as a dict.\n",
"        self.observation_space = spaces.Discrete(9 * 3 * 2)\n",
"        self.reset()\n",
"\n",
"    def step(self, action):\n",
"        done = False\n",
"        reward = 0\n",
"\n",
"        p, square = action  # p is the player (1 or -1), square is the cell index\n",
"\n",
"        board = self.state['board']\n",
"        proposed = board[square]\n",
"        om = self.state['on_move']\n",
"        if proposed != 0:  # the cell is already occupied\n",
"            print(f\"Illegal move: square {square} is already occupied.\")\n",
"            done = True\n",
"            reward = -1 * om\n",
"        elif p != om:  # the wrong player is trying to move\n",
"            print(f\"Illegal move: player {p} is not on move.\")\n",
"            done = True\n",
"            reward = -1 * om\n",
"        else:\n",
"            board[square] = p\n",
"            self.state['on_move'] = -p\n",
"\n",
"        for i in range(3):\n",
"            # Rows and columns\n",
"            if (board[i * 3] == p and board[i * 3 + 1] == p and board[i * 3 + 2] == p) or \\\n",
"               (board[i] == p and board[i + 3] == p and board[i + 6] == p):\n",
"                reward = p\n",
"                done = True\n",
"                break\n",
"\n",
"        # Diagonals\n",
"        if (board[0] == p and board[4] == p and board[8] == p) or \\\n",
"           (board[2] == p and board[4] == p and board[6] == p):\n",
"            reward = p\n",
"            done = True\n",
"\n",
"        # Kept as the legacy 4-tuple (state, reward, done, info) from the original project;\n",
"        # the Gymnasium API proper returns (obs, reward, terminated, truncated, info).\n",
"        return self.state, reward, done, {}\n",
"\n",
"    def reset(self, seed=None, options=None):\n",
"        super().reset(seed=seed)  # seeds self.np_random when a seed is given\n",
"        self.state = {}\n",
"        self.state['board'] = [0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"        self.state['on_move'] = 1\n",
"        return self.state, {}\n",
"\n",
"    def render(self, close=False):\n",
"        if close:\n",
"            return\n",
"        print(\"on move: \", self.symbols[self.state['on_move'] + 1])\n",
"        for i in range(9):\n",
"            print(self.symbols[self.state['board'][i] + 1], end=\" \")\n",
"            if (i % 3) == 2:\n",
"                print()\n",
"\n",
"    def move_generator(self):\n",
"        moves = []\n",
"        for i in range(9):\n",
"            if self.state['board'][i] == 0:\n",
"                p = self.state['on_move']\n",
"                m = [p, i]\n",
"                moves.append(m)\n",
"        return moves"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Implementing the agent\n",
"In reinforcement learning (RL), an agent is a software component that interacts with an environment in order to learn to choose optimal actions for reaching its goal.\n",
"\n",
"**Key aspects of an agent:**\n",
"1. What does the agent do?\n",
"\n",
"- The agent makes decisions: in each state of the environment it chooses an action (for example, which cell to play in tic-tac-toe, or how to move in a game).\n",
"- It learns to choose actions that maximize the reward it receives from the environment.\n",
"2. How does the agent learn?\n",
"\n",
"- The agent improves its strategy based on its experience of interacting with the environment.\n",
"- This is done with reinforcement learning algorithms (for example, Q-learning, deep Q-learning, or policy-based methods).\n",
"\n",
"Key elements of an agent:\n",
"1. Observation: The agent perceives the current state of the environment, for example a game board, sensor readings, or an image. The observation is delivered to the agent as data from the environment.\n",
"2. Action: At each step the agent chooses an action from the set of allowed actions. This set depends on the rules of the environment (for example, which moves are legal in a game).\n",
"3. Policy: The policy is the strategy the agent follows when choosing actions.\n",
"It can be:\n",
"- Deterministic: the same state always leads to the same action.\n",
"- Stochastic: for a given state the agent chooses each action with a certain probability.\n",
"4. Reward: After performing an action the agent receives a reward, a numeric value that reflects how good the action was.\n",
"Rewards help the agent evaluate its actions and develop useful behavior.\n",
"5. Value function:\n",
"This is the agent's internal estimate of how good or promising a given state is.\n",
"It lets the agent predict the long-term consequences of its actions.\n",
"6. Learning algorithm:\n",
"The agent uses an algorithm to update its policy and value function based on the rewards it receives.\n",
"Examples: Q-learning, policy gradient methods, Actor-Critic algorithms, and so on.\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"class Agent:\n",
"    def __init__(self, symbol):\n",
"        self.symbol = symbol  # the player's symbol (1 is X, -1 is O)\n",
"\n",
"    def get_action(self, moves):\n",
"        return random.choice(moves)  # pick a random move from the available ones"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Main training loop"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" O X \n",
" \n",
" \n",
"on move: O\n",
" O X \n",
" X \n",
" \n",
"on move: X\n",
"O O X \n",
" X \n",
" \n",
"on move: O\n",
"O O X \n",
" X X \n",
" \n",
"on move: X\n",
"O O X \n",
" X X \n",
" O \n",
"on move: O\n",
"O O X \n",
" X X \n",
"X O \n",
"Episode 1, Total Reward: 1\n",
"Average Reward: 1.0\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
"O \n",
"on move: O\n",
"X X \n",
" \n",
"O \n",
"on move: X\n",
"X X \n",
" \n",
"O O \n",
"on move: O\n",
"X X \n",
" X \n",
"O O \n",
"on move: X\n",
"X X O \n",
" X \n",
"O O \n",
"on move: O\n",
"X X O \n",
" X X \n",
"O O \n",
"on move: X\n",
"X X O \n",
"O X X \n",
"O O \n",
"on move: O\n",
"X X O \n",
"O X X \n",
"O O X \n",
"Episode 2, Total Reward: 1\n",
"Average Reward: 1.0\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" \n",
" X O \n",
"on move: X\n",
" X \n",
" \n",
"O X O \n",
"on move: O\n",
"X X \n",
" \n",
"O X O \n",
"on move: X\n",
"X X \n",
" O \n",
"O X O \n",
"on move: O\n",
"X X \n",
" O X \n",
"O X O \n",
"on move: X\n",
"X X \n",
"O O X \n",
"O X O \n",
"on move: O\n",
"X X X \n",
"O O X \n",
"O X O \n",
"Episode 3, Total Reward: 1\n",
"Average Reward: 1.0\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O \n",
" X O \n",
" X \n",
"on move: O\n",
"O X \n",
" X O \n",
" X \n",
"on move: X\n",
"O X \n",
" X O \n",
" O X \n",
"on move: O\n",
"O X X \n",
" X O \n",
" O X \n",
"on move: X\n",
"O X X \n",
" X O \n",
"O O X \n",
"on move: O\n",
"O X X \n",
"X X O \n",
"O O X \n",
"Episode 4, Total Reward: 0\n",
"Average Reward: 0.75\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
" O \n",
"X \n",
"X \n",
"on move: X\n",
" O \n",
"X \n",
"X O \n",
"on move: O\n",
" O \n",
"X \n",
"X O X \n",
"on move: X\n",
" O O \n",
"X \n",
"X O X \n",
"on move: O\n",
"X O O \n",
"X \n",
"X O X \n",
"Episode 5, Total Reward: 1\n",
"Average Reward: 0.8\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X O \n",
" \n",
"on move: O\n",
" \n",
"X X O \n",
" \n",
"on move: X\n",
" \n",
"X X O \n",
"O \n",
"on move: O\n",
" \n",
"X X O \n",
"O X \n",
"on move: X\n",
" O \n",
"X X O \n",
"O X \n",
"on move: O\n",
" X O \n",
"X X O \n",
"O X \n",
"on move: X\n",
" X O \n",
"X X O \n",
"O O X \n",
"on move: O\n",
"X X O \n",
"X X O \n",
"O O X \n",
"Episode 6, Total Reward: 1\n",
"Average Reward: 0.8333333333333334\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" \n",
"X O \n",
" X \n",
"on move: X\n",
" O \n",
"X O \n",
" X \n",
"on move: O\n",
"X O \n",
"X O \n",
" X \n",
"on move: X\n",
"X O \n",
"X O \n",
" O X \n",
"Episode 7, Total Reward: -1\n",
"Average Reward: 0.5714285714285714\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
"X X \n",
" O \n",
"on move: X\n",
" \n",
"X X O \n",
" O \n",
"on move: O\n",
" X \n",
"X X O \n",
" O \n",
"on move: X\n",
" O X \n",
"X X O \n",
" O \n",
"on move: O\n",
" O X \n",
"X X O \n",
" X O \n",
"on move: X\n",
" O X \n",
"X X O \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"X X O \n",
"O X O \n",
"Episode 8, Total Reward: 0\n",
"Average Reward: 0.5\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
" X O \n",
"on move: O\n",
"X \n",
" \n",
" X O \n",
"on move: X\n",
"X \n",
" O \n",
" X O \n",
"on move: O\n",
"X X \n",
" O \n",
" X O \n",
"on move: X\n",
"X X \n",
" O \n",
"O X O \n",
"on move: O\n",
"X X \n",
"X O \n",
"O X O \n",
"on move: X\n",
"X O X \n",
"X O \n",
"O X O \n",
"on move: O\n",
"X O X \n",
"X X O \n",
"O X O \n",
"Episode 9, Total Reward: 0\n",
"Average Reward: 0.4444444444444444\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
" X O \n",
"on move: O\n",
" \n",
" \n",
"X X O \n",
"on move: X\n",
" \n",
" O \n",
"X X O \n",
"on move: O\n",
" \n",
" O X \n",
"X X O \n",
"on move: X\n",
"O \n",
" O X \n",
"X X O \n",
"Episode 10, Total Reward: -1\n",
"Average Reward: 0.3\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
"X X \n",
" \n",
"on move: X\n",
" O \n",
"X X \n",
"O \n",
"on move: O\n",
" O \n",
"X X X \n",
"O \n",
"Episode 11, Total Reward: 1\n",
"Average Reward: 0.36363636363636365\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X O \n",
" \n",
"on move: O\n",
" \n",
"X O \n",
" X \n",
"on move: X\n",
" O \n",
"X O \n",
" X \n",
"on move: O\n",
" O \n",
"X O \n",
"X X \n",
"on move: X\n",
" O \n",
"X O \n",
"X O X \n",
"on move: O\n",
" X O \n",
"X O \n",
"X O X \n",
"on move: X\n",
"O X O \n",
"X O \n",
"X O X \n",
"on move: O\n",
"O X O \n",
"X O X \n",
"X O X \n",
"Episode 12, Total Reward: 0\n",
"Average Reward: 0.3333333333333333\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" O X \n",
" \n",
"on move: O\n",
" X \n",
" O X \n",
" \n",
"on move: X\n",
" X \n",
" O X \n",
" O \n",
"on move: O\n",
" X \n",
" O X \n",
"X O \n",
"on move: X\n",
" X \n",
" O X \n",
"X O O \n",
"on move: O\n",
" X X \n",
" O X \n",
"X O O \n",
"on move: X\n",
" X X \n",
"O O X \n",
"X O O \n",
"on move: O\n",
"X X X \n",
"O O X \n",
"X O O \n",
"Episode 13, Total Reward: 1\n",
"Average Reward: 0.38461538461538464\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" \n",
"X O \n",
" \n",
"on move: O\n",
" \n",
"X O \n",
" X \n",
"on move: X\n",
" \n",
"X O \n",
" O X \n",
"on move: O\n",
" X \n",
"X O \n",
" O X \n",
"on move: X\n",
"O X \n",
"X O \n",
" O X \n",
"on move: O\n",
"O X \n",
"X X O \n",
" O X \n",
"on move: X\n",
"O X \n",
"X X O \n",
"O O X \n",
"on move: O\n",
"O X X \n",
"X X O \n",
"O O X \n",
"Episode 14, Total Reward: 0\n",
"Average Reward: 0.35714285714285715\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" X \n",
" O \n",
" X \n",
"on move: X\n",
" X \n",
"O O \n",
" X \n",
"on move: O\n",
" X \n",
"O O \n",
" X X \n",
"on move: X\n",
" O X \n",
"O O \n",
" X X \n",
"on move: O\n",
" O X \n",
"O X O \n",
" X X \n",
"on move: X\n",
" O X \n",
"O X O \n",
"O X X \n",
"on move: O\n",
"X O X \n",
"O X O \n",
"O X X \n",
"Episode 15, Total Reward: 1\n",
"Average Reward: 0.4\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
" X O \n",
" \n",
"X O \n",
"on move: O\n",
" X O \n",
" \n",
"X X O \n",
"on move: X\n",
"O X O \n",
" \n",
"X X O \n",
"on move: O\n",
"O X O \n",
"X \n",
"X X O \n",
"on move: X\n",
"O X O \n",
"X O \n",
"X X O \n",
"Episode 16, Total Reward: -1\n",
"Average Reward: 0.3125\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" X \n",
" X \n",
" O \n",
"on move: X\n",
" X \n",
" O X \n",
" O \n",
"on move: O\n",
"X X \n",
" O X \n",
" O \n",
"on move: X\n",
"X X \n",
"O O X \n",
" O \n",
"on move: O\n",
"X X X \n",
"O O X \n",
" O \n",
"Episode 17, Total Reward: 1\n",
"Average Reward: 0.35294117647058826\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
" \n",
"O X \n",
" X \n",
"on move: X\n",
" O \n",
"O X \n",
" X \n",
"on move: O\n",
"X O \n",
"O X \n",
" X \n",
"Episode 18, Total Reward: 1\n",
"Average Reward: 0.3888888888888889\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
" O X \n",
"on move: O\n",
"X \n",
" \n",
" O X \n",
"on move: X\n",
"X \n",
" O \n",
" O X \n",
"on move: O\n",
"X X \n",
" O \n",
" O X \n",
"on move: X\n",
"X O X \n",
" O \n",
" O X \n",
"Episode 19, Total Reward: -1\n",
"Average Reward: 0.3157894736842105\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
"O \n",
" X \n",
"on move: O\n",
" \n",
"O \n",
"X X \n",
"on move: X\n",
" O \n",
"O \n",
"X X \n",
"on move: O\n",
" O \n",
"O X \n",
"X X \n",
"on move: X\n",
" O \n",
"O X \n",
"X X O \n",
"on move: O\n",
" X O \n",
"O X \n",
"X X O \n",
"Episode 20, Total Reward: 1\n",
"Average Reward: 0.35\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
"O \n",
"on move: O\n",
"X X \n",
" \n",
"O \n",
"on move: X\n",
"X X \n",
" \n",
"O O \n",
"on move: O\n",
"X X \n",
" X \n",
"O O \n",
"on move: X\n",
"X O X \n",
" X \n",
"O O \n",
"on move: O\n",
"X O X \n",
"X X \n",
"O O \n",
"on move: X\n",
"X O X \n",
"X O X \n",
"O O \n",
"Episode 21, Total Reward: -1\n",
"Average Reward: 0.2857142857142857\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" O \n",
"X \n",
"on move: O\n",
"X \n",
" O \n",
"X \n",
"on move: X\n",
"X \n",
" O O \n",
"X \n",
"on move: O\n",
"X X \n",
" O O \n",
"X \n",
"on move: X\n",
"X X \n",
" O O \n",
"X O \n",
"on move: O\n",
"X X \n",
"X O O \n",
"X O \n",
"Episode 22, Total Reward: 1\n",
"Average Reward: 0.3181818181818182\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X \n",
"O \n",
"X \n",
"on move: X\n",
" X O \n",
"O \n",
"X \n",
"on move: O\n",
"X X O \n",
"O \n",
"X \n",
"on move: X\n",
"X X O \n",
"O \n",
"X O \n",
"on move: O\n",
"X X O \n",
"O X \n",
"X O \n",
"on move: X\n",
"X X O \n",
"O O X \n",
"X O \n",
"on move: O\n",
"X X O \n",
"O O X \n",
"X X O \n",
"Episode 23, Total Reward: 0\n",
"Average Reward: 0.30434782608695654\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
"X \n",
" X \n",
" O \n",
"on move: X\n",
"X \n",
" X \n",
" O O \n",
"on move: O\n",
"X X \n",
" X \n",
" O O \n",
"on move: X\n",
"X X \n",
" X \n",
"O O O \n",
"Episode 24, Total Reward: -1\n",
"Average Reward: 0.25\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
" O \n",
" X \n",
"X O \n",
"on move: O\n",
" O \n",
"X X \n",
"X O \n",
"on move: X\n",
"O O \n",
"X X \n",
"X O \n",
"on move: O\n",
"O O \n",
"X X X \n",
"X O \n",
"Episode 25, Total Reward: 1\n",
"Average Reward: 0.28\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" \n",
" \n",
"X O X \n",
"on move: X\n",
" \n",
" O \n",
"X O X \n",
"on move: O\n",
" X \n",
" O \n",
"X O X \n",
"on move: X\n",
" X \n",
" O O \n",
"X O X \n",
"on move: O\n",
" X X \n",
" O O \n",
"X O X \n",
"on move: X\n",
"O X X \n",
" O O \n",
"X O X \n",
"on move: O\n",
"O X X \n",
"X O O \n",
"X O X \n",
"Episode 26, Total Reward: 0\n",
"Average Reward: 0.2692307692307692\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
"O \n",
"on move: O\n",
" \n",
"X X \n",
"O \n",
"on move: X\n",
" \n",
"X X \n",
"O O \n",
"on move: O\n",
" \n",
"X X \n",
"O O X \n",
"on move: X\n",
" O \n",
"X X \n",
"O O X \n",
"on move: O\n",
"X O \n",
"X X \n",
"O O X \n",
"Episode 27, Total Reward: 1\n",
"Average Reward: 0.2962962962962963\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
"O X \n",
"on move: O\n",
" X \n",
" \n",
"O X \n",
"on move: X\n",
" X O \n",
" \n",
"O X \n",
"on move: O\n",
" X O \n",
" X \n",
"O X \n",
"on move: X\n",
" X O \n",
" X \n",
"O O X \n",
"on move: O\n",
" X O \n",
"X X \n",
"O O X \n",
"on move: X\n",
"O X O \n",
"X X \n",
"O O X \n",
"on move: O\n",
"O X O \n",
"X X X \n",
"O O X \n",
"Episode 28, Total Reward: 1\n",
"Average Reward: 0.32142857142857145\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" \n",
" X X \n",
"on move: X\n",
" O O \n",
" \n",
" X X \n",
"on move: O\n",
"X O O \n",
" \n",
" X X \n",
"on move: X\n",
"X O O \n",
" O \n",
" X X \n",
"on move: O\n",
"X O O \n",
" O X \n",
" X X \n",
"on move: X\n",
"X O O \n",
" O X \n",
"O X X \n",
"Episode 29, Total Reward: -1\n",
"Average Reward: 0.27586206896551724\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X X \n",
" \n",
"on move: X\n",
" O \n",
"O X X \n",
" \n",
"on move: O\n",
" O \n",
"O X X \n",
" X \n",
"on move: X\n",
"O O \n",
"O X X \n",
" X \n",
"on move: O\n",
"O O \n",
"O X X \n",
"X X \n",
"on move: X\n",
"O O O \n",
"O X X \n",
"X X \n",
"Episode 30, Total Reward: -1\n",
"Average Reward: 0.23333333333333334\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
" O \n",
" X \n",
"X O \n",
"on move: O\n",
" O \n",
"X X \n",
"X O \n",
"on move: X\n",
" O \n",
"X X \n",
"X O O \n",
"on move: O\n",
"X O \n",
"X X \n",
"X O O \n",
"Episode 31, Total Reward: 1\n",
"Average Reward: 0.25806451612903225\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" \n",
"X \n",
"on move: X\n",
"X O O \n",
" \n",
"X \n",
"on move: O\n",
"X O O \n",
" X \n",
"X \n",
"on move: X\n",
"X O O \n",
" O X \n",
"X \n",
"on move: O\n",
"X O O \n",
" O X \n",
"X X \n",
"on move: X\n",
"X O O \n",
"O O X \n",
"X X \n",
"on move: O\n",
"X O O \n",
"O O X \n",
"X X X \n",
"Episode 32, Total Reward: 1\n",
"Average Reward: 0.28125\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
"O \n",
"X \n",
" \n",
"on move: O\n",
"O \n",
"X \n",
"X \n",
"on move: X\n",
"O \n",
"X O \n",
"X \n",
"on move: O\n",
"O \n",
"X O \n",
"X X \n",
"on move: X\n",
"O O \n",
"X O \n",
"X X \n",
"on move: O\n",
"O X O \n",
"X O \n",
"X X \n",
"on move: X\n",
"O X O \n",
"X O \n",
"X X O \n",
"Episode 33, Total Reward: -1\n",
"Average Reward: 0.24242424242424243\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O \n",
" X O \n",
" X \n",
"on move: O\n",
" O X \n",
" X O \n",
" X \n",
"on move: X\n",
" O X \n",
"O X O \n",
" X \n",
"on move: O\n",
" O X \n",
"O X O \n",
"X X \n",
"Episode 34, Total Reward: 1\n",
"Average Reward: 0.2647058823529412\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X \n",
" O \n",
" X \n",
"on move: X\n",
" X \n",
"O O \n",
" X \n",
"on move: O\n",
" X X \n",
"O O \n",
" X \n",
"on move: X\n",
" X X \n",
"O O \n",
" O X \n",
"on move: O\n",
"X X X \n",
"O O \n",
" O X \n",
"Episode 35, Total Reward: 1\n",
"Average Reward: 0.2857142857142857\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X X \n",
" O \n",
" \n",
"on move: X\n",
" X X \n",
" O O \n",
" \n",
"on move: O\n",
" X X \n",
"X O O \n",
" \n",
"on move: X\n",
"O X X \n",
"X O O \n",
" \n",
"on move: O\n",
"O X X \n",
"X O O \n",
" X \n",
"on move: X\n",
"O X X \n",
"X O O \n",
" X O \n",
"Episode 36, Total Reward: -1\n",
"Average Reward: 0.25\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" X \n",
" O \n",
"on move: O\n",
" \n",
" X \n",
"X O \n",
"on move: X\n",
" \n",
" X O \n",
"X O \n",
"on move: O\n",
"X \n",
" X O \n",
"X O \n",
"on move: X\n",
"X \n",
"O X O \n",
"X O \n",
"on move: O\n",
"X X \n",
"O X O \n",
"X O \n",
"Episode 37, Total Reward: 1\n",
"Average Reward: 0.2702702702702703\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O X \n",
" X \n",
" \n",
"on move: X\n",
" O X \n",
" O X \n",
" \n",
"on move: O\n",
" O X \n",
" O X \n",
" X \n",
"on move: X\n",
" O X \n",
" O X \n",
"O X \n",
"on move: O\n",
" O X \n",
" O X \n",
"O X X \n",
"Episode 38, Total Reward: 1\n",
"Average Reward: 0.2894736842105263\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
"X \n",
"O X \n",
" \n",
"on move: X\n",
"X \n",
"O X \n",
"O \n",
"on move: O\n",
"X X \n",
"O X \n",
"O \n",
"on move: X\n",
"X X \n",
"O X \n",
"O O \n",
"on move: O\n",
"X X \n",
"O X \n",
"O X O \n",
"on move: X\n",
"X X \n",
"O O X \n",
"O X O \n",
"on move: O\n",
"X X X \n",
"O O X \n",
"O X O \n",
"Episode 39, Total Reward: 1\n",
"Average Reward: 0.3076923076923077\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X \n",
"O \n",
" X \n",
"on move: X\n",
" O X \n",
"O \n",
" X \n",
"on move: O\n",
" O X \n",
"O \n",
" X X \n",
"on move: X\n",
"O O X \n",
"O \n",
" X X \n",
"on move: O\n",
"O O X \n",
"O X \n",
" X X \n",
"Episode 40, Total Reward: 1\n",
"Average Reward: 0.325\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" \n",
"X O \n",
" X \n",
"on move: X\n",
" \n",
"X O \n",
" O X \n",
"on move: O\n",
" \n",
"X O \n",
"X O X \n",
"on move: X\n",
"O \n",
"X O \n",
"X O X \n",
"on move: O\n",
"O X \n",
"X O \n",
"X O X \n",
"on move: X\n",
"O X \n",
"X O O \n",
"X O X \n",
"on move: O\n",
"O X X \n",
"X O O \n",
"X O X \n",
"Episode 41, Total Reward: 0\n",
"Average Reward: 0.3170731707317073\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" X O \n",
" X \n",
" \n",
"on move: X\n",
" X O \n",
" X \n",
"O \n",
"on move: O\n",
" X O \n",
" X X \n",
"O \n",
"on move: X\n",
" X O \n",
"O X X \n",
"O \n",
"on move: O\n",
" X O \n",
"O X X \n",
"O X \n",
"Episode 42, Total Reward: 1\n",
"Average Reward: 0.3333333333333333\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
" X O \n",
" \n",
"X \n",
"on move: X\n",
" X O \n",
" \n",
"X O \n",
"on move: O\n",
"X X O \n",
" \n",
"X O \n",
"on move: X\n",
"X X O \n",
" O \n",
"X O \n",
"Episode 43, Total Reward: -1\n",
"Average Reward: 0.3023255813953488\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
"O O \n",
"X \n",
" X \n",
"on move: O\n",
"O O \n",
"X \n",
" X X \n",
"on move: X\n",
"O O \n",
"X O \n",
" X X \n",
"on move: O\n",
"O O X \n",
"X O \n",
" X X \n",
"on move: X\n",
"O O X \n",
"X O \n",
"O X X \n",
"on move: O\n",
"O O X \n",
"X O X \n",
"O X X \n",
"Episode 44, Total Reward: 1\n",
"Average Reward: 0.3181818181818182\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O O \n",
" X \n",
" X \n",
"on move: O\n",
" O O \n",
" X \n",
" X X \n",
"on move: X\n",
" O O \n",
" O X \n",
" X X \n",
"on move: O\n",
"X O O \n",
" O X \n",
" X X \n",
"on move: X\n",
"X O O \n",
"O O X \n",
" X X \n",
"on move: O\n",
"X O O \n",
"O O X \n",
"X X X \n",
"Episode 45, Total Reward: 1\n",
"Average Reward: 0.3333333333333333\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
" O \n",
"on move: O\n",
"X X \n",
" \n",
" O \n",
"on move: X\n",
"X X \n",
"O \n",
" O \n",
"on move: O\n",
"X X X \n",
"O \n",
" O \n",
"Episode 46, Total Reward: 1\n",
"Average Reward: 0.34782608695652173\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
"O \n",
" \n",
" X \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O O \n",
" X \n",
" X \n",
"on move: O\n",
"O O \n",
" X \n",
"X X \n",
"on move: X\n",
"O O \n",
" X O \n",
"X X \n",
"on move: O\n",
"O O \n",
" X O \n",
"X X X \n",
"Episode 47, Total Reward: 1\n",
"Average Reward: 0.3617021276595745\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
"O \n",
" \n",
" X \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O \n",
"O X \n",
" X \n",
"on move: O\n",
"O \n",
"O X \n",
" X X \n",
"on move: X\n",
"O \n",
"O X \n",
"O X X \n",
"Episode 48, Total Reward: -1\n",
"Average Reward: 0.3333333333333333\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X X \n",
" \n",
"on move: X\n",
" O \n",
" X X \n",
"O \n",
"on move: O\n",
" O \n",
" X X \n",
"O X \n",
"on move: X\n",
" O \n",
"O X X \n",
"O X \n",
"on move: O\n",
" O \n",
"O X X \n",
"O X X \n",
"on move: X\n",
"O O \n",
"O X X \n",
"O X X \n",
"Episode 49, Total Reward: -1\n",
"Average Reward: 0.30612244897959184\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O \n",
" X \n",
" X \n",
"on move: X\n",
"O O \n",
" X \n",
" X \n",
"on move: O\n",
"O O \n",
" X \n",
" X X \n",
"on move: X\n",
"O O \n",
" X \n",
"O X X \n",
"on move: O\n",
"O O \n",
"X X \n",
"O X X \n",
"on move: X\n",
"O O O \n",
"X X \n",
"O X X \n",
"Episode 50, Total Reward: -1\n",
"Average Reward: 0.28\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O \n",
" X \n",
"O X \n",
"on move: O\n",
" O \n",
" X X \n",
"O X \n",
"on move: X\n",
" O \n",
"O X X \n",
"O X \n",
"on move: O\n",
" O X \n",
"O X X \n",
"O X \n",
"on move: X\n",
"O O X \n",
"O X X \n",
"O X \n",
"Episode 51, Total Reward: -1\n",
"Average Reward: 0.2549019607843137\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X \n",
" X \n",
" \n",
"on move: X\n",
"O X \n",
" O X \n",
" \n",
"on move: O\n",
"O X \n",
" O X \n",
" X \n",
"on move: X\n",
"O X O \n",
" O X \n",
" X \n",
"on move: O\n",
"O X O \n",
"X O X \n",
" X \n",
"on move: X\n",
"O X O \n",
"X O X \n",
"O X \n",
"Episode 52, Total Reward: -1\n",
"Average Reward: 0.23076923076923078\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" O X \n",
" \n",
" \n",
"on move: O\n",
"X O X \n",
" \n",
" \n",
"on move: X\n",
"X O X \n",
" \n",
" O \n",
"on move: O\n",
"X O X \n",
" X \n",
" O \n",
"on move: X\n",
"X O X \n",
" X \n",
" O O \n",
"on move: O\n",
"X O X \n",
" X X \n",
" O O \n",
"on move: X\n",
"X O X \n",
" X X \n",
"O O O \n",
"Episode 53, Total Reward: -1\n",
"Average Reward: 0.20754716981132076\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
"O \n",
"on move: O\n",
"X \n",
" \n",
"O X \n",
"on move: X\n",
"X \n",
"O \n",
"O X \n",
"on move: O\n",
"X X \n",
"O \n",
"O X \n",
"on move: X\n",
"X X \n",
"O O \n",
"O X \n",
"on move: O\n",
"X X \n",
"O O X \n",
"O X \n",
"on move: X\n",
"X X O \n",
"O O X \n",
"O X \n",
"Episode 54, Total Reward: -1\n",
"Average Reward: 0.18518518518518517\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" O X \n",
" \n",
"on move: O\n",
" X \n",
" O X \n",
" \n",
"on move: X\n",
" X \n",
" O X \n",
"O \n",
"on move: O\n",
" X \n",
" O X \n",
"O X \n",
"on move: X\n",
"O X \n",
" O X \n",
"O X \n",
"on move: O\n",
"O X \n",
"X O X \n",
"O X \n",
"on move: X\n",
"O X O \n",
"X O X \n",
"O X \n",
"Episode 55, Total Reward: -1\n",
"Average Reward: 0.16363636363636364\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X O \n",
" \n",
" \n",
"on move: O\n",
" X O \n",
" \n",
"X \n",
"on move: X\n",
" X O \n",
" O \n",
"X \n",
"on move: O\n",
" X O \n",
" O \n",
"X X \n",
"on move: X\n",
" X O \n",
" O \n",
"X X O \n",
"on move: O\n",
" X O \n",
" O X \n",
"X X O \n",
"on move: X\n",
" X O \n",
"O O X \n",
"X X O \n",
"on move: O\n",
"X X O \n",
"O O X \n",
"X X O \n",
"Episode 56, Total Reward: 0\n",
"Average Reward: 0.16071428571428573\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
"O \n",
"on move: O\n",
" X \n",
"X \n",
"O \n",
"on move: X\n",
" X \n",
"X O \n",
"O \n",
"on move: O\n",
"X X \n",
"X O \n",
"O \n",
"on move: X\n",
"X O X \n",
"X O \n",
"O \n",
"on move: O\n",
"X O X \n",
"X O \n",
"O X \n",
"on move: X\n",
"X O X \n",
"X O O \n",
"O X \n",
"on move: O\n",
"X O X \n",
"X O O \n",
"O X X \n",
"Episode 57, Total Reward: 0\n",
"Average Reward: 0.15789473684210525\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" O \n",
"X \n",
"on move: O\n",
" X \n",
" O \n",
"X \n",
"on move: X\n",
" X \n",
"O O \n",
"X \n",
"on move: O\n",
"X X \n",
"O O \n",
"X \n",
"on move: X\n",
"X O X \n",
"O O \n",
"X \n",
"on move: O\n",
"X O X \n",
"O O \n",
"X X \n",
"on move: X\n",
"X O X \n",
"O O \n",
"X O X \n",
"on move: O\n",
"X O X \n",
"O X O \n",
"X O X \n",
"Episode 58, Total Reward: 1\n",
"Average Reward: 0.1724137931034483\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" O X \n",
" \n",
"on move: O\n",
" \n",
" O X \n",
" X \n",
"on move: X\n",
"O \n",
" O X \n",
" X \n",
"on move: O\n",
"O \n",
" O X \n",
" X X \n",
"on move: X\n",
"O O \n",
" O X \n",
" X X \n",
"on move: O\n",
"O O \n",
" O X \n",
"X X X \n",
"Episode 59, Total Reward: 1\n",
"Average Reward: 0.1864406779661017\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
" O \n",
"X \n",
" X O \n",
"on move: O\n",
" O \n",
"X X \n",
" X O \n",
"on move: X\n",
" O \n",
"X X \n",
"O X O \n",
"on move: O\n",
" O \n",
"X X X \n",
"O X O \n",
"Episode 60, Total Reward: 1\n",
"Average Reward: 0.2\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
" O \n",
"on move: O\n",
"X \n",
" \n",
"X O \n",
"on move: X\n",
"X \n",
" O \n",
"X O \n",
"on move: O\n",
"X \n",
" O \n",
"X X O \n",
"on move: X\n",
"X O \n",
" O \n",
"X X O \n",
"on move: O\n",
"X O \n",
"X O \n",
"X X O \n",
"Episode 61, Total Reward: 1\n",
"Average Reward: 0.21311475409836064\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" O \n",
"X \n",
"on move: O\n",
"X \n",
" O \n",
"X \n",
"on move: X\n",
"X \n",
" O \n",
"X O \n",
"on move: O\n",
"X \n",
"X O \n",
"X O \n",
"Episode 62, Total Reward: 1\n",
"Average Reward: 0.22580645161290322\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" O \n",
" \n",
"on move: O\n",
"X \n",
" O \n",
" X \n",
"on move: X\n",
"X \n",
"O O \n",
" X \n",
"on move: O\n",
"X \n",
"O O \n",
"X X \n",
"on move: X\n",
"X \n",
"O O \n",
"X X O \n",
"on move: O\n",
"X X \n",
"O O \n",
"X X O \n",
"on move: X\n",
"X X \n",
"O O O \n",
"X X O \n",
"Episode 63, Total Reward: -1\n",
"Average Reward: 0.20634920634920634\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" O \n",
"X \n",
" \n",
"on move: O\n",
"X O \n",
"X \n",
" \n",
"on move: X\n",
"X O \n",
"X \n",
" O \n",
"on move: O\n",
"X O \n",
"X X \n",
" O \n",
"on move: X\n",
"X O \n",
"X X \n",
"O O \n",
"on move: O\n",
"X O \n",
"X X \n",
"O X O \n",
"on move: X\n",
"X O O \n",
"X X \n",
"O X O \n",
"on move: O\n",
"X O O \n",
"X X X \n",
"O X O \n",
"Episode 64, Total Reward: 1\n",
"Average Reward: 0.21875\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
"X O \n",
" \n",
"X \n",
"on move: X\n",
"X O \n",
"O \n",
"X \n",
"on move: O\n",
"X O \n",
"O \n",
"X X \n",
"on move: X\n",
"X O O \n",
"O \n",
"X X \n",
"on move: O\n",
"X O O \n",
"O X \n",
"X X \n",
"on move: X\n",
"X O O \n",
"O X \n",
"X X O \n",
"on move: O\n",
"X O O \n",
"O X X \n",
"X X O \n",
"Episode 65, Total Reward: 0\n",
"Average Reward: 0.2153846153846154\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" \n",
"X X \n",
"on move: X\n",
"O O \n",
" \n",
"X X \n",
"on move: O\n",
"O O \n",
" X \n",
"X X \n",
"on move: X\n",
"O O O \n",
" X \n",
"X X \n",
"Episode 66, Total Reward: -1\n",
"Average Reward: 0.19696969696969696\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
" O X \n",
" \n",
"on move: O\n",
" X \n",
" O X \n",
" \n",
"on move: X\n",
" X \n",
" O X \n",
"O \n",
"on move: O\n",
" X X \n",
" O X \n",
"O \n",
"on move: X\n",
" X X \n",
"O O X \n",
"O \n",
"on move: O\n",
" X X \n",
"O O X \n",
"O X \n",
"on move: X\n",
"O X X \n",
"O O X \n",
"O X \n",
"Episode 67, Total Reward: -1\n",
"Average Reward: 0.1791044776119403\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" \n",
" O X \n",
"on move: X\n",
" X \n",
" O \n",
" O X \n",
"on move: O\n",
"X X \n",
" O \n",
" O X \n",
"on move: X\n",
"X X \n",
" O \n",
"O O X \n",
"on move: O\n",
"X X \n",
" X O \n",
"O O X \n",
"Episode 68, Total Reward: 1\n",
"Average Reward: 0.19117647058823528\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
"O X \n",
"on move: O\n",
" X \n",
" \n",
"O X \n",
"on move: X\n",
" O X \n",
" \n",
"O X \n",
"on move: O\n",
" O X \n",
" \n",
"O X X \n",
"on move: X\n",
" O X \n",
" O \n",
"O X X \n",
"on move: O\n",
" O X \n",
"X O \n",
"O X X \n",
"on move: X\n",
"O O X \n",
"X O \n",
"O X X \n",
"on move: O\n",
"O O X \n",
"X X O \n",
"O X X \n",
"Episode 69, Total Reward: 0\n",
"Average Reward: 0.18840579710144928\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X \n",
"O \n",
" X \n",
"on move: X\n",
"O X \n",
"O \n",
" X \n",
"on move: O\n",
"O X \n",
"O X \n",
" X \n",
"on move: X\n",
"O O X \n",
"O X \n",
" X \n",
"on move: O\n",
"O O X \n",
"O X \n",
"X X \n",
"on move: X\n",
"O O X \n",
"O X \n",
"X X O \n",
"on move: O\n",
"O O X \n",
"O X X \n",
"X X O \n",
"Episode 70, Total Reward: 1\n",
"Average Reward: 0.2\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X \n",
"O \n",
" X \n",
"on move: X\n",
" X \n",
"O O \n",
" X \n",
"on move: O\n",
" X \n",
"O O X \n",
" X \n",
"on move: X\n",
" X O \n",
"O O X \n",
" X \n",
"on move: O\n",
" X O \n",
"O O X \n",
" X X \n",
"on move: X\n",
" X O \n",
"O O X \n",
"O X X \n",
"Episode 71, Total Reward: -1\n",
"Average Reward: 0.18309859154929578\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" O \n",
" X \n",
" \n",
"on move: O\n",
" O \n",
" X \n",
"X \n",
"on move: X\n",
"O O \n",
" X \n",
"X \n",
"on move: O\n",
"O O \n",
" X \n",
"X X \n",
"on move: X\n",
"O O O \n",
" X \n",
"X X \n",
"Episode 72, Total Reward: -1\n",
"Average Reward: 0.16666666666666666\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" \n",
"X \n",
"on move: X\n",
"X O \n",
" O \n",
"X \n",
"on move: O\n",
"X O \n",
" X O \n",
"X \n",
"on move: X\n",
"X O \n",
" X O \n",
"X O \n",
"on move: O\n",
"X O \n",
"X X O \n",
"X O \n",
"Episode 73, Total Reward: 1\n",
"Average Reward: 0.1780821917808219\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
"O \n",
"X \n",
"on move: O\n",
" X \n",
"O \n",
"X \n",
"on move: X\n",
" X O \n",
"O \n",
"X \n",
"on move: O\n",
"X X O \n",
"O \n",
"X \n",
"on move: X\n",
"X X O \n",
"O \n",
"X O \n",
"on move: O\n",
"X X O \n",
"O X \n",
"X O \n",
"on move: X\n",
"X X O \n",
"O X \n",
"X O O \n",
"on move: O\n",
"X X O \n",
"O X X \n",
"X O O \n",
"Episode 74, Total Reward: 0\n",
"Average Reward: 0.17567567567567569\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
" O \n",
"on move: O\n",
"X \n",
" X \n",
" O \n",
"on move: X\n",
"X \n",
" O X \n",
" O \n",
"on move: O\n",
"X X \n",
" O X \n",
" O \n",
"on move: X\n",
"X X \n",
"O O X \n",
" O \n",
"on move: O\n",
"X X X \n",
"O O X \n",
" O \n",
"Episode 75, Total Reward: 1\n",
"Average Reward: 0.18666666666666668\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" \n",
" \n",
"X O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
" X \n",
" \n",
"X O O \n",
"on move: O\n",
"X X \n",
" \n",
"X O O \n",
"on move: X\n",
"X X \n",
" O \n",
"X O O \n",
"on move: O\n",
"X X X \n",
" O \n",
"X O O \n",
"Episode 76, Total Reward: 1\n",
"Average Reward: 0.19736842105263158\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
"O X \n",
" \n",
" \n",
"on move: O\n",
"O X \n",
" X \n",
" \n",
"on move: X\n",
"O X \n",
" X \n",
" O \n",
"on move: O\n",
"O X \n",
"X X \n",
" O \n",
"on move: X\n",
"O X \n",
"X X \n",
" O O \n",
"on move: O\n",
"O X X \n",
"X X \n",
" O O \n",
"on move: X\n",
"O X X \n",
"X X \n",
"O O O \n",
"Episode 77, Total Reward: -1\n",
"Average Reward: 0.18181818181818182\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" O \n",
"X \n",
" \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
" O \n",
"X \n",
" O X \n",
"on move: O\n",
" O \n",
"X X \n",
" O X \n",
"on move: X\n",
" O \n",
"X X \n",
"O O X \n",
"on move: O\n",
"X O \n",
"X X \n",
"O O X \n",
"on move: X\n",
"X O \n",
"X O X \n",
"O O X \n",
"Episode 78, Total Reward: -1\n",
"Average Reward: 0.16666666666666666\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
"X X \n",
" \n",
" O \n",
"on move: X\n",
"X X O \n",
" \n",
" O \n",
"on move: O\n",
"X X O \n",
" X \n",
" O \n",
"on move: X\n",
"X X O \n",
" X \n",
" O O \n",
"on move: O\n",
"X X O \n",
"X X \n",
" O O \n",
"on move: X\n",
"X X O \n",
"X X O \n",
" O O \n",
"Episode 79, Total Reward: -1\n",
"Average Reward: 0.1518987341772152\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" \n",
" O \n",
"on move: O\n",
"X X \n",
" \n",
" O \n",
"on move: X\n",
"X X \n",
" O \n",
" O \n",
"on move: O\n",
"X X X \n",
" O \n",
" O \n",
"Episode 80, Total Reward: 1\n",
"Average Reward: 0.1625\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X O \n",
" \n",
" \n",
"on move: O\n",
" X O \n",
"X \n",
" \n",
"on move: X\n",
" X O \n",
"X O \n",
" \n",
"on move: O\n",
" X O \n",
"X X O \n",
" \n",
"on move: X\n",
" X O \n",
"X X O \n",
" O \n",
"on move: O\n",
" X O \n",
"X X O \n",
" O X \n",
"on move: X\n",
"O X O \n",
"X X O \n",
" O X \n",
"on move: O\n",
"O X O \n",
"X X O \n",
"X O X \n",
"Episode 81, Total Reward: 0\n",
"Average Reward: 0.16049382716049382\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
" \n",
"O X \n",
" \n",
"on move: O\n",
"X \n",
"O X \n",
" \n",
"on move: X\n",
"X \n",
"O X \n",
" O \n",
"on move: O\n",
"X \n",
"O X \n",
" X O \n",
"on move: X\n",
"X \n",
"O X \n",
"O X O \n",
"on move: O\n",
"X X \n",
"O X \n",
"O X O \n",
"on move: X\n",
"X X \n",
"O X O \n",
"O X O \n",
"on move: O\n",
"X X X \n",
"O X O \n",
"O X O \n",
"Episode 82, Total Reward: 1\n",
"Average Reward: 0.17073170731707318\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" \n",
"X O \n",
"on move: X\n",
" X \n",
" \n",
"X O O \n",
"on move: O\n",
"X X \n",
" \n",
"X O O \n",
"on move: X\n",
"X X \n",
" O \n",
"X O O \n",
"on move: O\n",
"X X \n",
" O X \n",
"X O O \n",
"on move: X\n",
"X X O \n",
" O X \n",
"X O O \n",
"on move: O\n",
"X X O \n",
"X O X \n",
"X O O \n",
"Episode 83, Total Reward: 1\n",
"Average Reward: 0.18072289156626506\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" X \n",
" \n",
"on move: X\n",
"X O O \n",
" X \n",
" \n",
"on move: O\n",
"X O O \n",
" X \n",
" X \n",
"on move: X\n",
"X O O \n",
" X \n",
" O X \n",
"on move: O\n",
"X O O \n",
"X X \n",
" O X \n",
"on move: X\n",
"X O O \n",
"X X \n",
"O O X \n",
"on move: O\n",
"X O O \n",
"X X X \n",
"O O X \n",
"Episode 84, Total Reward: 1\n",
"Average Reward: 0.19047619047619047\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
"O X \n",
"on move: O\n",
" X \n",
" \n",
"O X \n",
"on move: X\n",
" X \n",
" \n",
"O O X \n",
"on move: O\n",
" X \n",
"X \n",
"O O X \n",
"on move: X\n",
" X \n",
"X O \n",
"O O X \n",
"on move: O\n",
" X \n",
"X O X \n",
"O O X \n",
"Episode 85, Total Reward: 1\n",
"Average Reward: 0.2\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
" O \n",
" \n",
"on move: O\n",
"X X \n",
" O \n",
" \n",
"on move: X\n",
"X X \n",
" O \n",
"O \n",
"on move: O\n",
"X X \n",
" X O \n",
"O \n",
"on move: X\n",
"X X \n",
" X O \n",
"O O \n",
"on move: O\n",
"X X X \n",
" X O \n",
"O O \n",
"Episode 86, Total Reward: 1\n",
"Average Reward: 0.20930232558139536\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X O \n",
" \n",
" \n",
"on move: O\n",
"X O \n",
" \n",
" X \n",
"on move: X\n",
"X O O \n",
" \n",
" X \n",
"on move: O\n",
"X O O \n",
" \n",
" X X \n",
"on move: X\n",
"X O O \n",
" \n",
"O X X \n",
"on move: O\n",
"X O O \n",
"X \n",
"O X X \n",
"on move: X\n",
"X O O \n",
"X O \n",
"O X X \n",
"on move: O\n",
"X O O \n",
"X X O \n",
"O X X \n",
"Episode 87, Total Reward: 1\n",
"Average Reward: 0.21839080459770116\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" \n",
" X O \n",
"on move: X\n",
" X \n",
" \n",
"O X O \n",
"on move: O\n",
" X X \n",
" \n",
"O X O \n",
"on move: X\n",
" X X \n",
"O \n",
"O X O \n",
"on move: O\n",
" X X \n",
"O X \n",
"O X O \n",
"Episode 88, Total Reward: 1\n",
"Average Reward: 0.22727272727272727\n",
"on move: O\n",
" \n",
" \n",
"X \n",
"on move: X\n",
" O \n",
" \n",
"X \n",
"on move: O\n",
" O \n",
"X \n",
"X \n",
"on move: X\n",
" O \n",
"X O \n",
"X \n",
"on move: O\n",
" O \n",
"X O \n",
"X X \n",
"on move: X\n",
" O O \n",
"X O \n",
"X X \n",
"on move: O\n",
" O O \n",
"X O X \n",
"X X \n",
"on move: X\n",
"O O O \n",
"X O X \n",
"X X \n",
"Episode 89, Total Reward: -1\n",
"Average Reward: 0.21348314606741572\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
"O \n",
" \n",
"on move: O\n",
" X X \n",
"O \n",
" \n",
"on move: X\n",
" X X \n",
"O \n",
" O \n",
"on move: O\n",
"X X X \n",
"O \n",
" O \n",
"Episode 90, Total Reward: 1\n",
"Average Reward: 0.2222222222222222\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
"X \n",
" X \n",
"on move: X\n",
" O \n",
"X \n",
"O X \n",
"on move: O\n",
" O \n",
"X X \n",
"O X \n",
"on move: X\n",
" O \n",
"X O X \n",
"O X \n",
"on move: O\n",
" O \n",
"X O X \n",
"O X X \n",
"on move: X\n",
" O O \n",
"X O X \n",
"O X X \n",
"Episode 91, Total Reward: -1\n",
"Average Reward: 0.2087912087912088\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" O \n",
" \n",
"on move: O\n",
" X \n",
"X O \n",
" \n",
"on move: X\n",
" X O \n",
"X O \n",
" \n",
"on move: O\n",
" X O \n",
"X O \n",
" X \n",
"on move: X\n",
"O X O \n",
"X O \n",
" X \n",
"on move: O\n",
"O X O \n",
"X O \n",
"X X \n",
"on move: X\n",
"O X O \n",
"X O \n",
"X X O \n",
"Episode 92, Total Reward: -1\n",
"Average Reward: 0.1956521739130435\n",
"on move: O\n",
"X \n",
" \n",
" \n",
"on move: X\n",
"X \n",
"O \n",
" \n",
"on move: O\n",
"X \n",
"O X \n",
" \n",
"on move: X\n",
"X O \n",
"O X \n",
" \n",
"on move: O\n",
"X O \n",
"O X \n",
"X \n",
"on move: X\n",
"X O O \n",
"O X \n",
"X \n",
"on move: O\n",
"X O O \n",
"O X \n",
"X X \n",
"on move: X\n",
"X O O \n",
"O O X \n",
"X X \n",
"on move: O\n",
"X O O \n",
"O O X \n",
"X X X \n",
"Episode 93, Total Reward: 1\n",
"Average Reward: 0.20430107526881722\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" O \n",
" X \n",
"on move: O\n",
" X \n",
" O \n",
" X \n",
"on move: X\n",
"O X \n",
" O \n",
" X \n",
"on move: O\n",
"O X \n",
" O \n",
" X X \n",
"on move: X\n",
"O X \n",
" O \n",
"O X X \n",
"on move: O\n",
"O X \n",
" X O \n",
"O X X \n",
"Episode 94, Total Reward: 1\n",
"Average Reward: 0.2127659574468085\n",
"on move: O\n",
" \n",
"X \n",
" \n",
"on move: X\n",
" O \n",
"X \n",
" \n",
"on move: O\n",
" O \n",
"X \n",
"X \n",
"on move: X\n",
" O \n",
"X \n",
"X O \n",
"on move: O\n",
" O \n",
"X X \n",
"X O \n",
"on move: X\n",
" O \n",
"X X \n",
"X O O \n",
"on move: O\n",
"X O \n",
"X X \n",
"X O O \n",
"Episode 95, Total Reward: 1\n",
"Average Reward: 0.22105263157894736\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
" X O \n",
"on move: O\n",
" X \n",
" \n",
" X O \n",
"on move: X\n",
" X \n",
"O \n",
" X O \n",
"on move: O\n",
" X \n",
"O \n",
"X X O \n",
"on move: X\n",
" X O \n",
"O \n",
"X X O \n",
"on move: O\n",
" X O \n",
"O X \n",
"X X O \n",
"Episode 96, Total Reward: 1\n",
"Average Reward: 0.22916666666666666\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" \n",
" \n",
"O X \n",
"on move: O\n",
"X \n",
" \n",
"O X \n",
"on move: X\n",
"X \n",
"O \n",
"O X \n",
"on move: O\n",
"X \n",
"O X \n",
"O X \n",
"on move: X\n",
"X \n",
"O O X \n",
"O X \n",
"on move: O\n",
"X X \n",
"O O X \n",
"O X \n",
"on move: X\n",
"X X \n",
"O O X \n",
"O O X \n",
"on move: O\n",
"X X X \n",
"O O X \n",
"O O X \n",
"Episode 97, Total Reward: 1\n",
"Average Reward: 0.23711340206185566\n",
"on move: O\n",
" \n",
" X \n",
" \n",
"on move: X\n",
"O \n",
" X \n",
" \n",
"on move: O\n",
"O X \n",
" X \n",
" \n",
"on move: X\n",
"O X \n",
" X \n",
" O \n",
"on move: O\n",
"O X \n",
"X X \n",
" O \n",
"on move: X\n",
"O X \n",
"X X \n",
"O O \n",
"on move: O\n",
"O X \n",
"X X \n",
"O X O \n",
"on move: X\n",
"O O X \n",
"X X \n",
"O X O \n",
"on move: O\n",
"O O X \n",
"X X X \n",
"O X O \n",
"Episode 98, Total Reward: 1\n",
"Average Reward: 0.24489795918367346\n",
"on move: O\n",
" \n",
" \n",
" X \n",
"on move: X\n",
" O \n",
" \n",
" X \n",
"on move: O\n",
" O \n",
" X \n",
" X \n",
"on move: X\n",
" O \n",
"O X \n",
" X \n",
"on move: O\n",
" O \n",
"O X \n",
" X X \n",
"on move: X\n",
"O O \n",
"O X \n",
" X X \n",
"on move: O\n",
"O O \n",
"O X \n",
"X X X \n",
"Episode 99, Total Reward: 1\n",
"Average Reward: 0.25252525252525254\n",
"on move: O\n",
" X \n",
" \n",
" \n",
"on move: X\n",
" X \n",
" \n",
" O \n",
"on move: O\n",
" X \n",
" X \n",
" O \n",
"on move: X\n",
" X \n",
" X \n",
"O O \n",
"on move: O\n",
" X X \n",
" X \n",
"O O \n",
"on move: X\n",
" X X \n",
"O X \n",
"O O \n",
"on move: O\n",
" X X \n",
"O X \n",
"O O X \n",
"Episode 100, Total Reward: 1\n",
"Average Reward: 0.26\n"
]
}
],
"source": [
"# Main training loop (uses the separate agent class)\n",
"\n",
"# Create the tic-tac-toe environment\n",
"environment = TicTacToeEnv()\n",
"\n",
"# Create the agent (playing X)\n",
"agent = Agent(symbol=1)\n",
"\n",
"num_episodes = 100  # Number of episodes (games) to train on\n",
"collected_rewards = []  # Rewards (wins) collected in each episode\n",
"\n",
"# Variable tracking the symbol of the current player\n",
"oom = 1\n",
"\n",
"for i in range(num_episodes):\n",
"    # Reset the environment and start a new episode\n",
"    state, _ = environment.reset()\n",
"\n",
"    # Total reward for the episode\n",
"    total_reward = 0\n",
"\n",
"    # Game-over flag\n",
"    done = False\n",
"    om = oom\n",
"\n",
"    # At most 9 moves, since the board is 3x3\n",
"    for j in range(9):\n",
"        moves = environment.move_generator()\n",
"\n",
"        # No moves left, so end the game\n",
"        if not moves:\n",
"            break\n",
"\n",
"        if len(moves) == 1:\n",
"            move = moves[0]  # Only one move left, so take it\n",
"        else:\n",
"            move = agent.get_action(moves)  # The agent picks a move according to its strategy\n",
"\n",
"        # Make the move and update the game state\n",
"        next_state, reward, done, info = environment.step(move)\n",
"        total_reward += reward\n",
"        state = next_state\n",
"\n",
"        # Render the current game state\n",
"        environment.render()\n",
"\n",
"        if done:\n",
"            break\n",
"\n",
"        om = -om  # Switch players\n",
"\n",
"    collected_rewards.append(total_reward)\n",
"\n",
"    print(f\"Episode {i+1}, Total Reward: {total_reward}\")\n",
"    average_reward = sum(collected_rewards) / len(collected_rewards)\n",
"    print(f\"Average Reward: {average_reward}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "aimenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}