AIM-PIbd-31-Shanygin-A-V/lab_3/Lab3.ipynb

2024-10-25 21:28:49 +04:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stores"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['Store ID ', 'Store_Area', 'Items_Available', 'Daily_Customer_Count',\n",
" 'Store_Sales'],\n",
" dtype='object')\n"
]
}
],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.model_selection import train_test_split\n",
"import seaborn as sns\n",
"import numpy as np\n",
"df = pd.read_csv(\".//static//csv//Stores.csv\")\n",
"print(df.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Business goals:**\n",
"\n",
"1. Optimize the store's product assortment. Increase sales efficiency by stocking an optimal number of available items, avoiding both shortages and excess inventory.\n",
"2. Increase store traffic. Raise the average number of customers per day to grow revenue and the store's competitiveness.\n",
"\n",
"**Technical project goals:**\n",
"\n",
"Build a predictive demand analysis using the data on available items and sales to determine the optimal number of items in a store.\n",
"\n",
"Run a correlation analysis between customer traffic, store area, and sales to identify the factors that increase the number of visitors.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Data preparation:**"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Store ID 0\n",
"Store_Area 0\n",
"Items_Available 0\n",
"Daily_Customer_Count 0\n",
"Store_Sales 0\n",
"dtype: int64\n"
]
}
],
"source": [
"# Data cleaning: check for missing values\n",
"print(df.isnull().sum())  # count missing values per column\n",
"# Fill missing values with the column median (a no-op here, since none are missing)\n",
"df.fillna(df.median(numeric_only=True), inplace=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Split the dataset into training, validation, and test sets to prevent data leakage:**"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set size: 572\n",
"Validation set size: 144\n",
"Test set size: 180\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Split into training and test sets (80% train, 20% test)\n",
"train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)\n",
"\n",
"# Split the training portion into training and validation sets\n",
"# (80% / 20%, i.e. about 64% / 16% of the full data)\n",
"train_data, val_data = train_test_split(train_data, test_size=0.2, random_state=42)\n",
"\n",
"print(\"Training set size:\", len(train_data))\n",
"print(\"Validation set size:\", len(val_data))\n",
"print(\"Test set size:\", len(test_data))"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[base64-encoded PNG data omitted]",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "[base64-encoded PNG data omitted]",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "[base64-encoded PNG data omitted]",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Summary statistics for the training set:\n",
"Mean: 10.95\n",
"Standard deviation: 0.31\n",
"Minimum: 9.70\n",
"Maximum: 11.56\n",
"Number of observations: 627\n",
"\n",
"Summary statistics for the validation set:\n",
"Mean: 10.92\n",
"Standard deviation: 0.29\n",
"Minimum: 10.01\n",
"Maximum: 11.53\n",
"Number of observations: 134\n",
"\n",
"Summary statistics for the test set:\n",
"Mean: 10.97\n",
"Standard deviation: 0.33\n",
"Minimum: 9.61\n",
"Maximum: 11.66\n",
"Number of observations: 135\n",
"\n"
]
}
],
"source": [
"df['store_sales_log'] = np.log(df['Store_Sales'])\n",
"\n",
"X = df.drop(['Store_Sales', 'store_sales_log'], axis=1)\n",
"y = df['store_sales_log']\n",
"\n",
"X = X.select_dtypes(include='number')\n",
"\n",
"# 70% train; the remaining 30% is split evenly into validation and test\n",
"X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)\n",
"\n",
"X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)\n",
"\n",
"def plot_distribution(data, title):\n",
"    \"\"\"Plot a histogram of the target variable's distribution.\"\"\"\n",
"    plt.figure(figsize=(10, 6))\n",
"    sns.histplot(data, kde=True, bins=30, color='skyblue')\n",
"    plt.title(title)\n",
"    plt.xlabel('Logarithm of Store_Sales')\n",
"    plt.ylabel('Count')\n",
"    plt.grid(True)\n",
"    plt.show()\n",
"\n",
"plot_distribution(y_train, 'Distribution of log sales in the training set')\n",
"plot_distribution(y_val, 'Distribution of log sales in the validation set')\n",
"plot_distribution(y_test, 'Distribution of log sales in the test set')\n",
"\n",
"def get_statistics(series, name):\n",
"    print(f\"Summary statistics for the {name} set:\")\n",
"    print(f\"Mean: {series.mean():.2f}\")\n",
"    print(f\"Standard deviation: {series.std():.2f}\")\n",
"    print(f\"Minimum: {series.min():.2f}\")\n",
"    print(f\"Maximum: {series.max():.2f}\")\n",
"    print(f\"Number of observations: {series.count()}\\n\")\n",
"\n",
"get_statistics(y_train, \"training\")\n",
"get_statistics(y_val, \"validation\")\n",
"get_statistics(y_test, \"test\")\n"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Class distribution after SMOTE (oversampling):\n",
"store_sales_category\n",
"0 129\n",
"1 129\n",
"2 129\n",
"3 129\n",
"4 129\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"from imblearn.over_sampling import SMOTE\n",
"df['store_sales_log'] = np.log(df['Store_Sales'])\n",
"\n",
"# Bin the log-transformed target into 5 quantile-based classes\n",
"df['store_sales_category'] = pd.qcut(df['store_sales_log'], q=5, labels=[0, 1, 2, 3, 4])\n",
"\n",
"X = df.drop(['Store_Sales', 'store_sales_log', 'store_sales_category'], axis=1)\n",
"y = df['store_sales_category']\n",
"X = pd.get_dummies(X, drop_first=True)\n",
"for col in X.columns:\n",
"    X[col] = X[col].fillna(X[col].mode()[0])\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n",
"\n",
"# Oversample the training set only, so the test set stays untouched\n",
"smote = SMOTE(random_state=42)\n",
"X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)\n",
"\n",
"print(\"Class distribution after SMOTE (oversampling):\")\n",
"print(pd.Series(y_train_smote).value_counts())\n"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[base64-encoded PNG data omitted]",
"text/plain": [
"<Figure size 800x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(8, 6))\n",
"pd.Series(y_train_smote).value_counts().sort_index().plot(kind='bar', color='skyblue')\n",
"\n",
"plt.title('Class distribution after SMOTE', fontsize=16)\n",
"plt.xlabel('Classes', fontsize=14)\n",
"plt.ylabel('Count', fontsize=14)\n",
"plt.xticks(rotation=0)  # keep the class labels horizontal\n",
"plt.grid(axis='y', linestyle='--', alpha=0.7)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plot shows that after applying SMOTE the class distribution is fully balanced: each of the 5 classes has the same number of samples (129) in the training set. Note that the classes were formed with `pd.qcut` into quintiles, so they were already close to balanced before oversampling; SMOTE only evens out the small differences introduced by the train/test split."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original data: (896, 7) Store ID int64\n",
"Store_Area int64\n",
"Items_Available int64\n",
"Daily_Customer_Count int64\n",
"Store_Sales int64\n",
"store_sales_log float64\n",
"store_sales_category category\n",
"dtype: object\n",
"After OneHotEncoder: (896, 7) Store ID int64\n",
"Store_Area int64\n",
"Items_Available int64\n",
"Daily_Customer_Count int64\n",
"Store_Sales int64\n",
"store_sales_log float64\n",
"store_sales_category category\n",
"dtype: object\n",
"After discretization: (896, 13) Store ID int64\n",
"Store_Area int64\n",
"Items_Available int64\n",
"Daily_Customer_Count int64\n",
"Store_Sales int64\n",
"store_sales_log float64\n",
"store_sales_category category\n",
"Store ID _bin float64\n",
"Store_Area_bin float64\n",
"Items_Available_bin float64\n",
"Daily_Customer_Count_bin float64\n",
"Store_Sales_bin float64\n",
"store_sales_log_bin float64\n",
"dtype: object\n",
"After synthesizing new features: (896, 15) Store ID int64\n",
"Store_Area int64\n",
"Items_Available int64\n",
"Daily_Customer_Count int64\n",
"Store_Sales int64\n",
"store_sales_log float64\n",
"store_sales_category category\n",
"Store ID _bin float64\n",
"Store_Area_bin float64\n",
"Items_Available_bin float64\n",
"Daily_Customer_Count_bin float64\n",
"Store_Sales_bin float64\n",
"store_sales_log_bin float64\n",
"interaction_1 int64\n",
"interaction_2 float64\n",
"dtype: object\n",
"After normalization: (896, 15) Store ID float64\n",
"Store_Area float64\n",
"Items_Available float64\n",
"Daily_Customer_Count float64\n",
"Store_Sales float64\n",
"store_sales_log float64\n",
"store_sales_category float64\n",
"Store ID _bin float64\n",
"Store_Area_bin float64\n",
"Items_Available_bin float64\n",
"Daily_Customer_Count_bin float64\n",
"Store_Sales_bin float64\n",
"store_sales_log_bin float64\n",
"interaction_1 float64\n",
"interaction_2 float64\n",
"dtype: object\n",
"After Featuretools feature generation: (896, 15) [<Feature: Store ID >, <Feature: Store_Area>, <Feature: Items_Available>, <Feature: Daily_Customer_Count>, <Feature: Store_Sales>, <Feature: store_sales_log>, <Feature: store_sales_category>, <Feature: Store ID _bin>, <Feature: Store_Area_bin>, <Feature: Items_Available_bin>, <Feature: Daily_Customer_Count_bin>, <Feature: Store_Sales_bin>, <Feature: store_sales_log_bin>, <Feature: interaction_1>, <Feature: interaction_2>]\n",
"Predictive performance:\n",
"MSE: 0.00, R²: 0.97\n",
"\n",
"Model training time: 0.0209 seconds\n",
"\n",
"Reliability (R² across 10 splits): 0.98 ± 0.00\n",
"\n"
]
},
{
"data": {
"image/png": "[base64-encoded PNG data omitted]",
"text/plain": [
"<Figure size 1200x800 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Missing values in the data:\n",
" Series([], dtype: int64)\n",
"\n",
"Uniqueness check for values in 'index': True\n"
]
},
{
"data": {
"image/png": "[base64-encoded PNG data omitted]",
nl5eXHrHv0NfdHw4cPN1q1alVt2Z+f41P9vJk6dWq11xUAnAjd3QE0C0OHDlVYWJhiY2N11VVXyc/PT3PnzlXLli0
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA+kAAAIjCAYAAAB/OVoZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAADeUElEQVR4nOzdeXxU1f3/8dedmcxMJvu+EMK+76IooIK7QFW01da2imur1bbWVq22atWfW60L/dpabRXcrQtqXXHFBVFBdgg7IUD2fZnMfn9/BKIxBLJPQt7PxyMPmTv3nvOZSWLmc885n2OYpmkiIiIiIiIiImFnCXcAIiIiIiIiItJASbqIiIiIiIhID6EkXURERERERKSHUJIuIiIiIiIi0kMoSRcRERERERHpIZSki4iIiIiIiPQQStJFREREREREeggl6SIiIiIiIiI9hJJ0ERERERERkR5CSbqIiIiIiIhID6EkXUSkD1m4cCGGYTR+OZ1Ohg8fztVXX01RUVG4wxMRERHp82zhDkBERLrf7bffzqBBg/B4PHz++ec88sgjvP3226xfvx6XyxXu8ERERET6LCXpIiJ90KxZszjyyCMBuOyyy0hKSuKBBx7g9ddf5/zzzw9zdCIiIiJ9l6a7i4gIJ554IgA7d+4EoLy8nD/84Q+MGzeO6OhoYmNjmTVrFmvWrGl2rcfj4S9/+QvDhw/H6XSSkZHBOeecw/bt2wHIzc1tMsX++18zZ85sbGvJkiUYhsF///tfbrrpJtLT04mKiuLMM89k9+7dzfr+6quvOP3004mLi8PlcjFjxgyWLl16wNc4c+bMA/b/l7/8pdm5zzzzDJMnTyYyMpLExER+8pOfHLD/g7227wqFQjz00EOMGTMGp9NJWloav/zlL6moqGhy3sCBA/nBD37QrJ+rr766WZsHiv2+++5r9p4CeL1ebr31VoYOHYrD4aB///5cf/31eL3eA75X3zVz5kzGjh3b7Pjf/vY3DMMgNze3yfF//vOfjBkzBofDQWZmJldddRWVlZXN2vzue5WcnMycOXNYv359s9d49dVXtxjb/uUb+2P46KOPsFgs3HLLLU3Oe+655zAMg0ceeeSQr/VgP6vffa37v1fvvfceEydOxOl0Mnr0aBYtWnTQGKHh52H8+PEYhsHChQsbj//lL39h9OjRjb9zxxxzDK+99lqzGFv7/Xj99deZM2cOmZmZOBwOhgwZwh133EEwGGzW5vd/Zu68804sFgvPPfdck+MvvfRS4+9GcnIyP//5z9m7d2+Tcy666KIm71tCQgIzZ87ks88+axa3iIg0p5F0ERFpTKiTkpIA2LFjB6+99hrnnnsugwYNoqioiEcffZQZM2awceNGMjMzAQgGg/zgBz/gww8/5Cc/+Qm//e1vqamp4f3332f9+vUMGTKksY/zzz+f2bNnN+n3xhtvPGA8d955J4ZhcMMNN1BcXMxDDz3EySefzOrVq4mMjAQaErJZs2YxefJkbr31ViwWCwsWLODEE0/ks88+Y8qUKc3azcrK4u677wagtraWK6+88oB933zzzZx33nlcdtlllJSU8H//938cf/zxrFq1ivj4+GbX/OIXv+C4444DYNGiRbz66qtNnv/lL3/JwoULufjii/nNb37Dzp07efjhh1m1ahVLly4lIiLigO9DW1RWVja+tu8KhUKceeaZfP755/ziF79g1KhRrFu3jgcffJAtW7Y0SwI74i9/+Qu33XYbJ598MldeeSWbN2/mkUceYfny5c1e58iRI/nTn/6EaZps376dBx54gNmzZ5OXl9fu/k888UR+9atfcffddzN37lyOOOIICgoK+PWvf83JJ5/MFVdcccg2vvszst/bb7/N888/3+zcrVu38uMf/5grrriCefPmsWDBAs4991zeffddTjnllBb7ePrpp1m3bl2z43V1dZx99tkMHDiQ+vp6Fi5cyA9/+EOWLVt2wJ/nQ1m4cCHR0d
Fce+21REdH89FHH3HLLbdQXV3Nfffd1+J1CxYs4M9//jP3338/P/3pT5u0d/HFF3PUUUdx9913U1RUxPz581m6dGmz343k5GQefPBBAPbs2cP8+fOZPXs2u3fvPuDvkIiIfIcpIiJ9xoIFC0zA/OCDD8ySkhJz9+7d5gsvvGAmJSWZkZGR5p49e0zTNE2Px2MGg8Em1+7cudN0OBzm7bff3njsiSeeMAHzgQceaNZXKBRqvA4w77vvvmbnjBkzxpwxY0bj448//tgEzH79+pnV1dWNx1988UUTMOfPn9/Y9rBhw8zTTjutsR/TNE23220OGjTIPOWUU5r1NW3aNHPs2LGNj0tKSkzAvPXWWxuP5ebmmlar1bzzzjubXLtu3TrTZrM1O75161YTMJ988snGY7feeqv53T+vn332mQmYzz77bJNr33333WbHBwwYYM6ZM6dZ7FdddZX5/T/Z34/9+uuvN1NTU83Jkyc3eU+ffvpp02KxmJ999lmT6//1r3+ZgLl06dJm/X3XjBkzzDFjxjQ7ft9995mAuXPnTtM0TbO4uNi02+3mqaee2uRn5+GHHzYB84knnmjS5ndjNE3TvOmmm0zALC4ubvIar7rqqhZj2//zvD8G0zTNuro6c+jQoeaYMWNMj8djzpkzx4yNjTV37dp10NfZltdqmg3fK8B85ZVXGo9VVVWZGRkZ5qRJk1qM0ePxmNnZ2easWbNMwFywYEGL8RQXF5uA+be//a1dMbrd7mbn/fKXvzRdLpfp8XiatLn/+/HWW2+ZNpvN/P3vf9/kOp/PZ6ampppjx4416+vrG4+/+eabJmDecsstjcfmzZtnDhgwoMn1jz32mAmYX3/9dYuvV0REGmi6u4hIH3TyySeTkpJC//79+clPfkJ0dDSvvvoq/fr1A8DhcGCxNPyJCAaDlJWVER0dzYgRI1i5cmVjO6+88grJycn8+te/btbH96dnt8WFF15ITExM4+Mf/ehHZGRk8PbbbwOwevVqtm7dyk9/+lPKysooLS2ltLSUuro6TjrpJD799FNCoVCTNj0eD06n86D9Llq0iFAoxHnnndfYZmlpKenp6QwbNoyPP/64yfk+nw9oeL9a8tJLLxEXF8cpp5zSpM3JkycTHR3drE2/39/kvNLSUjwez0Hj3rt3L//3f//HzTffTHR0dLP+R40axciRI5u0uX+Jw/f7P5BgMNgsJrfb3eScDz74AJ/PxzXXXNP4swNw+eWXExsby1tvvXXA11lSUsKyZct49dVXGT9+PMnJyU3O83g8lJaWUlZW1ux7eiAul4uFCxeSk5PD8ccfz1tvvcWDDz5Idnb2Ia9tq8zMTM4+++zGx7GxsVx44YWsWrWKwsLCA17zj3/8g7KyMm699dYDPr//fdm+fTv33HMPFouF6dOnNzmnNd8PoHHWCUBNTQ2lpaUcd9xxuN1uNm3a1Oz8r7/+mvPOO48f/vCHzUbaV6xYQXFxMb/61a+a/B7NmTOHkSNHNvv+hkKhxthWr17NU089RUZGBqNGjTrg6xYRkW9puruISB/0j3/8g+HDh2Oz2UhLS2PEiBFNEqtQKMT8+fP55z//yc6dO5usYd0/JR4apsmPGDECm61z/5wMGzasyWPDMBg6dGjjetutW7cCMG/evBbbqKqqIiEhofFxaWlps3a/b+vWrZim2eJ535+Wvn+t9fcT4++3WVVVRWpq6gGfLy4ubvL4vffeIyUl5aBxft+tt95KZmYmv/zlL3n55Zeb9Z+Tk9Nim9/v/0A2bdp0yJh27doFwIgRI5oct9vtDB48uPH5/b744osmbQ4bNozXXnut2c2dxx9/nMcff7yxraOPPpoHHnigsfDhgUyfPp0rr7ySf/zjH5x22mlccsklh3yN7TF06NBm8Q4fPhxoqFeQnp7e5Lmqqiruuusurr32WtLS0g7Y5ocffsisWbOAhqT/5Zdf5phjjmlyTmu+HwAbNmzgz3/+Mx999BHV1dXNYv
muvXv3MmfOHOrq6igrK2v2ulr6/kLD0oXPP/+8ybHdu3c3iTEjI4NXXnnloL8rIiLSQEm6iEgfNGXKlIMmOXfddRc
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.preprocessing import OneHotEncoder, KBinsDiscretizer, StandardScaler, MinMaxScaler\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"import time\n",
"import featuretools as ft\n",
"\n",
"\n",
"print(\"Исходные данные:\", df.shape, df.dtypes)\n",
"\n",
"# Обработка категориальных признаков\n",
"categorical_cols = df.select_dtypes(include=['object']).columns\n",
"encoder = OneHotEncoder(sparse_output=False, drop='first')\n",
"encoded_data = encoder.fit_transform(df[categorical_cols])\n",
"\n",
"# Преобразуем в DataFrame и объединяем с исходным\n",
"encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out(categorical_cols))\n",
"df = pd.concat([df.drop(categorical_cols, axis=1), encoded_df], axis=1)\n",
"print(\"После OneHotEncoder:\", df.shape, df.dtypes)\n",
"\n",
"# Дискретизация числовых столбцов\n",
"numeric_cols = df.select_dtypes(include=['int64', 'float64']).columns\n",
"discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')\n",
"discretized_data = discretizer.fit_transform(df[numeric_cols])\n",
"\n",
"# Преобразуем в DataFrame и объединяем с исходным\n",
"discretized_df = pd.DataFrame(discretized_data, columns=[f\"{col}_bin\" for col in numeric_cols])\n",
"df = pd.concat([df, discretized_df], axis=1)\n",
"print(\"После дискретизации:\", df.shape, df.dtypes)\n",
"\n",
"# Синтез новых признаков\n",
"df['interaction_1'] = df['Store_Area'] * df['Items_Available']\n",
"df['interaction_2'] = df['Daily_Customer_Count'] / (df['Store_Area'] + 1) # Избегаем деления на 0\n",
"print(\"После синтеза новых признаков:\", df.shape, df.dtypes)\n",
"\n",
"# Применяем стандартизацию и нормализацию\n",
"scaler_standard = StandardScaler()\n",
"scaler_minmax = MinMaxScaler()\n",
"\n",
"df_standard = pd.DataFrame(scaler_standard.fit_transform(df), columns=df.columns)\n",
"df_minmax = pd.DataFrame(scaler_minmax.fit_transform(df), columns=df.columns)\n",
"df = df_minmax\n",
"print(\"После нормализации:\", df.shape, df.dtypes)\n",
"\n",
"# Создаем сущность в Featuretools\n",
"es = ft.EntitySet(id=\"store_data\")\n",
"\n",
"# Добавляем таблицу данных как сущность\n",
"df = df.reset_index() # Сброс индекса и создание 'id'\n",
"es = es.add_dataframe(dataframe_name=\"stores\", dataframe=df, index=\"index\")\n",
"\n",
"# Генерация новых признаков с помощью Featuretools\n",
"feature_matrix, feature_defs = ft.dfs(entityset=es, target_dataframe_name=\"stores\", max_depth=1)\n",
"print(\"После генерации признаков Featuretools:\", feature_matrix.shape, feature_defs)\n",
"\n",
"# Объединяем новые признаки с исходными\n",
"df = pd.concat([df, feature_matrix], axis=1)\n",
"\n",
"# Оценка качества признаков\n",
"def evaluate_predictive_power(X_train, y_train, X_test, y_test):\n",
" model = LinearRegression()\n",
" model.fit(X_train, y_train)\n",
" y_pred = model.predict(X_test)\n",
" \n",
" mse = mean_squared_error(y_test, y_pred)\n",
" r2 = r2_score(y_test, y_pred)\n",
" \n",
" print(f\"Предсказательная способность:\\nMSE: {mse:.2f}, R²: {r2:.2f}\\n\")\n",
" return mse, r2\n",
"\n",
"def measure_computation_time(X_train, y_train):\n",
" model = LinearRegression()\n",
" start_time = time.time()\n",
" model.fit(X_train, y_train)\n",
" elapsed_time = time.time() - start_time\n",
" print(f\"Время вычисления для обучения модели: {elapsed_time:.4f} секунд\\n\")\n",
"\n",
"def check_reliability(X, y):\n",
" model = LinearRegression()\n",
" reliability_scores = []\n",
" for _ in range(10): # 10 случайных разбиений\n",
" X_train_sub, X_test_sub, y_train_sub, y_test_sub = train_test_split(X, y, test_size=0.3, random_state=None)\n",
" model.fit(X_train_sub, y_train_sub)\n",
" y_pred_sub = model.predict(X_test_sub)\n",
" r2_sub = r2_score(y_test_sub, y_pred_sub)\n",
" reliability_scores.append(r2_sub)\n",
" \n",
" print(f\"Надежность (R² для 10 разбиений): {np.mean(reliability_scores):.2f} ± {np.std(reliability_scores):.2f}\\n\")\n",
"\n",
"def check_correlation(df):\n",
" plt.figure(figsize=(12, 8))\n",
" correlation_matrix = df.corr()\n",
" sns.heatmap(correlation_matrix, annot=True, fmt=\".2f\", cmap='coolwarm')\n",
" plt.title(\"Корреляционная матрица\")\n",
" plt.show()\n",
"\n",
"def check_integrity(df):\n",
" missing_values = df.isnull().sum()\n",
" print(\"Пропуски в данных:\\n\", missing_values[missing_values > 0])\n",
" print(\"\\nПроверка на уникальность значений в 'index':\", df['index'].is_unique)\n",
"\n",
"# Разделение на признаки и целевую переменную\n",
"X = df.drop(['Store_Sales'], axis=1, errors='ignore')\n",
"y = df['Store_Sales']\n",
"\n",
"# Разделение на обучающую и тестовую выборки\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n",
"\n",
"# Выполнение оценок\n",
"evaluate_predictive_power(X_train, y_train, X_test, y_test)\n",
"measure_computation_time(X_train, y_train)\n",
"check_reliability(X, y)\n",
"check_correlation(df)\n",
"check_integrity(df)\n",
"\n",
"# Визуализация распределения целевой переменной (если есть)\n",
"if 'Store_Sales' in df.columns:\n",
" plt.figure(figsize=(12, 6))\n",
" sns.histplot(df['Store_Sales'], color='blue', kde=True, stat='density', bins=30)\n",
" plt.title(\"Распределение целевой переменной Store_Sales\")\n",
" plt.xlabel(\"Store Sales\")\n",
" plt.ylabel(\"Density\")\n",
" plt.show()\n",
"else:\n",
" print(\"Столбец 'Store_Sales' не найден в DataFrame.\")\n",
"\n",
"# Визуализация новых признаков (пример)\n",
"if 'interaction_1' in df.columns and 'interaction_2' in df.columns:\n",
" plt.figure(figsize=(12, 6))\n",
" sns.histplot(df['interaction_1'], color='orange', kde=True, stat='density', bins=30, label='interaction_1')\n",
" sns.histplot(df['interaction_2'], color='green', kde=True, stat='density', bins=30, label='interaction_2')\n",
" plt.title(\"Распределение новых признаков\")\n",
" plt.xlabel(\"Value\")\n",
" plt.ylabel(\"Density\")\n",
" plt.legend()\n",
" plt.show()\n",
"else:\n",
" print(\"Одного или нескольких новых признаков не найдено в DataFrame.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Оценить качество каждого набора признаков:**\n",
"\n",
"Предсказательная способность: Хорошая, особенно после дискретизации. Однако необходимо внимательно следить за переобучением.\n",
"\n",
"Cкорость вычисления: 0.0209 секунд.\n",
"Время вычисления достаточно короткое, что является положительным признаком, особенно при больших объемах данных. Это указывает на то, что процесс обработки данных эффективен.\n",
"\n",
"Надежность: R² для 10 разбиений: 0.98 ± 0.00. Очень высокая надежность (близкая к 1) указывает на то, что модель может хорошо предсказывать значения целевой переменной Store_Sales. Однако такая высокая оценка может также указывать на возможность переобучения, особенно если данные хорошо известны модели.\n",
"\n",
"Целостность: Данные чистые, без пропусков или дубликатов, что положительно влияет на качество модели.\n",
"\n",
"Корреляция: \n",
"\n",
"1. Параметры, такие как Store_Sales, store_sales_log, и store_sales_category, имеют очень высокую корреляцию между собой. Это говорит о том, что эти показатели связаны и изменяются почти одинаково. Например, рост объема продаж (Store_Sales) связан с ростом значений логарифма продаж (store_sales_log) и категории продаж (store_sales_category).\n",
"\n",
"2. Некоторые параметры, такие как количество клиентов в день (Daily_Customer_Count) и площадь магазина (Store_Area), имеют низкую корреляцию с другими переменными, что может свидетельствовать о слабой связи между этими показателями.\n",
"\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "aimenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}