{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Laboratory Work No. 5**\n",
"\n",
"**Business goals:**\n",
"\n",
"Improving financial planning\n",
"\n",
"Using clustering approaches to predict sales volumes and revenue across different segments\n",
"\n",
"**Dataset columns and their descriptions:**\n",
"\n",
"**Date** - The date the data refers to. It identifies the specific day on which Walmart shares were traded.\n",
"\n",
"**Open** - Opening price. The price of Walmart shares at the start of the trading day. It shows the price at which trading began on a given day and is often compared with the closing price to determine the daily trend.\n",
"\n",
"**High** - Daily high. The highest price Walmart shares reached during the trading day.\n",
"\n",
"**Low** - Daily low. The lowest price at which Walmart shares traded during the day.\n",
"\n",
"**Close** - Closing price. The price of Walmart shares at the end of the trading day. It is one of the main indicators used in stock analysis, because it reflects the final share price for the day and is often used to calculate daily changes and trends over long periods.\n",
"\n",
"**Adj Close** - Adjusted closing price. The closing price adjusted for all corporate actions (for example, stock splits and dividend payments).\n",
"\n",
"**Volume** - Trading volume. The number of Walmart shares bought and sold during the day."
]
},
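{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comparison of **Open** and **Close** described above can be sketched in a few lines. This is a minimal illustration, not part of the original analysis: it uses the first five rows of `WMT.csv` as they appear in this notebook's output, and the column names `DailyChange` and `Trend` are hypothetical names introduced only for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# First five rows of the dataset, copied from the notebook output\n",
"sample = pd.DataFrame({\n",
"    'Date': ['1/3/2000', '1/4/2000', '1/5/2000', '1/6/2000', '1/7/2000'],\n",
"    'Open': [22.791668, 21.833332, 21.291668, 21.000000, 21.500000],\n",
"    'Close': [22.270832, 21.437500, 21.000000, 21.229168, 22.833332],\n",
"})\n",
"\n",
"# Hypothetical helper columns: the intraday change and its direction\n",
"sample['DailyChange'] = sample['Close'] - sample['Open']\n",
"sample['Trend'] = sample['DailyChange'].apply(lambda d: 'up' if d > 0 else 'down')\n",
"\n",
"print(sample)"
]
},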
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Loading the data**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Date Open High Low Close Adj Close Volume\n",
"0 1/3/2000 22.791668 23.000000 21.833332 22.270832 14.469358 25109700\n",
"1 1/4/2000 21.833332 21.937500 21.395832 21.437500 13.927947 20235300\n",
"2 1/5/2000 21.291668 21.458332 20.729168 21.000000 13.643703 21056100\n",
"3 1/6/2000 21.000000 21.520832 20.895832 21.229168 13.792585 19633500\n",
"4 1/7/2000 21.500000 22.979168 21.500000 22.833332 14.834813 23930700\n",
"Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1/3/2000</td>\n",
" <td>22.791668</td>\n",
" <td>23.000000</td>\n",
" <td>21.833332</td>\n",
" <td>22.270832</td>\n",
" <td>14.469358</td>\n",
" <td>25109700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1/4/2000</td>\n",
" <td>21.833332</td>\n",
" <td>21.937500</td>\n",
" <td>21.395832</td>\n",
" <td>21.437500</td>\n",
" <td>13.927947</td>\n",
" <td>20235300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1/5/2000</td>\n",
" <td>21.291668</td>\n",
" <td>21.458332</td>\n",
" <td>20.729168</td>\n",
" <td>21.000000</td>\n",
" <td>13.643703</td>\n",
" <td>21056100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1/6/2000</td>\n",
" <td>21.000000</td>\n",
" <td>21.520832</td>\n",
" <td>20.895832</td>\n",
" <td>21.229168</td>\n",
" <td>13.792585</td>\n",
" <td>19633500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1/7/2000</td>\n",
" <td>21.500000</td>\n",
" <td>22.979168</td>\n",
" <td>21.500000</td>\n",
" <td>22.833332</td>\n",
" <td>14.834813</td>\n",
" <td>23930700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1/10/2000</td>\n",
" <td>22.416668</td>\n",
" <td>22.500000</td>\n",
" <td>21.875000</td>\n",
" <td>22.416668</td>\n",
" <td>14.564112</td>\n",
" <td>20142900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1/11/2000</td>\n",
" <td>22.354168</td>\n",
" <td>22.583332</td>\n",
" <td>21.875000</td>\n",
" <td>22.083332</td>\n",
" <td>14.347544</td>\n",
" <td>14829900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>1/12/2000</td>\n",
" <td>22.062500</td>\n",
" <td>22.250000</td>\n",
" <td>21.687500</td>\n",
" <td>21.687500</td>\n",
" <td>14.090372</td>\n",
" <td>12255000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1/13/2000</td>\n",
" <td>22.000000</td>\n",
" <td>22.041668</td>\n",
" <td>21.666668</td>\n",
" <td>21.708332</td>\n",
" <td>14.103909</td>\n",
" <td>15063000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1/14/2000</td>\n",
" <td>21.333332</td>\n",
" <td>21.979168</td>\n",
" <td>21.333332</td>\n",
" <td>21.500000</td>\n",
" <td>13.968553</td>\n",
" <td>18936600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>1/18/2000</td>\n",
" <td>21.062500</td>\n",
" <td>22.145832</td>\n",
" <td>21.020832</td>\n",
" <td>21.854168</td>\n",
" <td>14.198661</td>\n",
" <td>19326600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>1/19/2000</td>\n",
" <td>21.750000</td>\n",
" <td>21.937500</td>\n",
" <td>21.333332</td>\n",
" <td>21.354168</td>\n",
" <td>13.873807</td>\n",
" <td>14459700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>1/20/2000</td>\n",
" <td>21.479168</td>\n",
" <td>21.500000</td>\n",
" <td>20.833332</td>\n",
" <td>21.125000</td>\n",
" <td>13.724912</td>\n",
" <td>17214300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>1/21/2000</td>\n",
" <td>21.312500</td>\n",
" <td>21.312500</td>\n",
" <td>20.687500</td>\n",
" <td>20.812500</td>\n",
" <td>13.521886</td>\n",
" <td>20857500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>1/24/2000</td>\n",
" <td>21.145832</td>\n",
" <td>21.145832</td>\n",
" <td>19.166668</td>\n",
" <td>19.791668</td>\n",
" <td>12.858650</td>\n",
" <td>23399700</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date Open High Low Close Adj Close Volume\n",
"0 1/3/2000 22.791668 23.000000 21.833332 22.270832 14.469358 25109700\n",
"1 1/4/2000 21.833332 21.937500 21.395832 21.437500 13.927947 20235300\n",
"2 1/5/2000 21.291668 21.458332 20.729168 21.000000 13.643703 21056100\n",
"3 1/6/2000 21.000000 21.520832 20.895832 21.229168 13.792585 19633500\n",
"4 1/7/2000 21.500000 22.979168 21.500000 22.833332 14.834813 23930700\n",
"5 1/10/2000 22.416668 22.500000 21.875000 22.416668 14.564112 20142900\n",
"6 1/11/2000 22.354168 22.583332 21.875000 22.083332 14.347544 14829900\n",
"7 1/12/2000 22.062500 22.250000 21.687500 21.687500 14.090372 12255000\n",
"8 1/13/2000 22.000000 22.041668 21.666668 21.708332 14.103909 15063000\n",
"9 1/14/2000 21.333332 21.979168 21.333332 21.500000 13.968553 18936600\n",
"10 1/18/2000 21.062500 22.145832 21.020832 21.854168 14.198661 19326600\n",
"11 1/19/2000 21.750000 21.937500 21.333332 21.354168 13.873807 14459700\n",
"12 1/20/2000 21.479168 21.500000 20.833332 21.125000 13.724912 17214300\n",
"13 1/21/2000 21.312500 21.312500 20.687500 20.812500 13.521886 20857500\n",
"14 1/24/2000 21.145832 21.145832 19.166668 19.791668 12.858650 23399700"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Date 0\n",
"Open 0\n",
"High 0\n",
"Low 0\n",
"Close 0\n",
"Adj Close 0\n",
"Volume 0\n",
"dtype: int64\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\"..//static//csv//WMT.csv\").head(15000)\n",
"\n",
"print(df.head())\n",
"print(df.columns)\n",
"display(df.head(15))\n",
"print(df.isnull().sum()) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Data cleaning**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Open High Low Close Adj Close Volume\n",
"0 22.791668 23.000000 21.833332 22.270832 14.469358 25109700\n",
"1 21.833332 21.937500 21.395832 21.437500 13.927947 20235300\n",
"2 21.291668 21.458332 20.729168 21.000000 13.643703 21056100\n",
"3 21.000000 21.520832 20.895832 21.229168 13.792585 19633500\n",
"4 21.500000 22.979168 21.500000 22.833332 14.834813 23930700\n"
]
}
],
"source": [
"df_cleaned = df.drop(columns=['Date'], errors='ignore').dropna()\n",
"print(df_cleaned.head())  # Print the cleaned DataFrame\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Visualizing pairwise relationships**"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"sns.set(style=\"whitegrid\")\n",
"\n",
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Open'], y=df_cleaned['High'], alpha=0.6)\n",
"plt.title('Open vs High')\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Low'], y=df_cleaned['Close'], alpha=0.6)\n",
"plt.title('Low vs Close')\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['High'], y=df_cleaned['Adj Close'], alpha=0.6)\n",
"plt.title('High vs Adj Close')\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['Volume'], y=df_cleaned['Adj Close'], alpha=0.6)\n",
"plt.title('Volume vs Adj Close')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Standardizing the data for clustering**\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(df_cleaned)"
]
},
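{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check for the standardization step, sketched on hypothetical toy data rather than on `WMT.csv`. The array `toy` and its distribution parameters are assumptions made only for illustration: one column is price-like and one is volume-like, so their scales differ by several orders of magnitude. After `StandardScaler`, each column should have a mean close to 0 and a standard deviation close to 1, which keeps the large-valued column from dominating Euclidean distances in the clustering steps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Hypothetical toy data: a price-like column and a volume-like column\n",
"rng = np.random.default_rng(0)\n",
"toy = np.column_stack([\n",
"    rng.normal(50, 10, size=200),\n",
"    rng.normal(2e7, 5e6, size=200),\n",
"])\n",
"\n",
"toy_scaled = StandardScaler().fit_transform(toy)\n",
"\n",
"# Per-column mean and (population) standard deviation after scaling\n",
"print(toy_scaled.mean(axis=0).round(6))\n",
"print(toy_scaled.std(axis=0).round(6))"
]
},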
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Agglomerative (hierarchical) clustering**\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[13 13 13 ... 1 1 1]\n"
]
}
],
"source": [
"from scipy.cluster.hierarchy import dendrogram, linkage, fcluster\n",
"\n",
"linkage_matrix = linkage(data_scaled, method='ward')\n",
"plt.figure(figsize=(10, 7))\n",
"dendrogram(linkage_matrix)\n",
"plt.title('Dendrogram of agglomerative clustering')\n",
"plt.xlabel('Sample index')\n",
"plt.ylabel('Distance')\n",
"plt.show()\n",
"\n",
"# Assign flat cluster labels by cutting the dendrogram at a distance threshold\n",
"result = fcluster(linkage_matrix, t=10, criterion='distance')\n",
"print(result)  # Print the cluster label of each sample"
]
},
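{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `criterion='distance'`, the number of clusters is not chosen directly: it depends on where the threshold `t` cuts the dendrogram. The sketch below, on hypothetical toy data (three synthetic blobs, an assumption made only for this example), counts the clusters produced by a distance cut and contrasts it with `criterion='maxclust'`, which asks SciPy for at most a given number of clusters. Counting the labels in the same way on the real `result` array may help when picking `t`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.cluster.hierarchy import linkage, fcluster\n",
"\n",
"# Hypothetical toy data: three well-separated 2-D blobs\n",
"rng = np.random.default_rng(0)\n",
"toy = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in (0.0, 5.0, 10.0)])\n",
"\n",
"Z = linkage(toy, method='ward')\n",
"\n",
"# Cut by distance: the cluster count follows from the threshold\n",
"labels_by_distance = fcluster(Z, t=10, criterion='distance')\n",
"# Cut by count: request at most three clusters\n",
"labels_by_count = fcluster(Z, t=3, criterion='maxclust')\n",
"\n",
"print('clusters from distance cut:', np.unique(labels_by_distance).size)\n",
"print('clusters from maxclust cut:', np.unique(labels_by_count).size)"
]
},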
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Visualizing the cluster distribution**\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Open'], y=df_cleaned['High'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('Open vs High Clusters')\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Low'], y=df_cleaned['Close'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('Low vs Close Clusters')\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['High'], y=df_cleaned['Adj Close'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('High vs Adj Close Clusters')\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['Volume'], y=df_cleaned['Adj Close'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('Volume vs Adj Close Clusters')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**KMeans (non-hierarchical clustering) for comparison**\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cluster centers:\n",
" [[1.76367212e+01 1.78216288e+01 1.74494147e+01 1.76355079e+01\n",
" 1.23626925e+01 3.93726001e+07]\n",
" [4.68041860e+01 4.71914073e+01 4.64368902e+01 4.68130050e+01\n",
" 4.52526313e+01 2.24991882e+07]\n",
" [2.65805223e+01 2.67680688e+01 2.64133213e+01 2.65956796e+01\n",
" 2.26497465e+01 2.25562105e+07]]\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.cluster import KMeans\n",
"\n",
"random_state = 9\n",
"kmeans = KMeans(n_clusters=3, random_state=random_state)\n",
"labels = kmeans.fit_predict(data_scaled)\n",
"centers = kmeans.cluster_centers_\n",
"\n",
"# Retrieve the centroids in the original feature units\n",
"centers = scaler.inverse_transform(centers)  # Undo the standardization\n",
"print(\"Cluster centers:\\n\", centers)\n",
"\n",
"# Visualize the KMeans clustering results\n",
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Open'], y=df_cleaned['High'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 0], centers[:, 1], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Open vs High')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Low'], y=df_cleaned['Close'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 2], centers[:, 3], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Low vs Close')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['High'], y=df_cleaned['Adj Close'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 1], centers[:, 4], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: High vs Adj Close')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['Volume'], y=df_cleaned['Adj Close'], hue=labels, palette='Set1', alpha=0.6)\n",
"# Column 5 of centers is Volume (the x-axis of this panel); column 4 is Adj Close\n",
"plt.scatter(centers[:, 5], centers[:, 4], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Volume vs Adj Close')\n",
"plt.legend()\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
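{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since KMeans is run here for comparison with the hierarchical result, one possible way to put a number on that comparison is the silhouette score. This is a sketch under stated assumptions, not a step of the original analysis: the data `toy` is a hypothetical set of three synthetic blobs, `n_init=10` is set explicitly for the example, and the hierarchical labels are obtained with `criterion='maxclust'` so that both methods produce the same number of clusters. A higher score indicates more compact, better-separated clusters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.cluster.hierarchy import linkage, fcluster\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"# Hypothetical toy data: three synthetic 2-D blobs\n",
"rng = np.random.default_rng(0)\n",
"toy = np.vstack([rng.normal(loc, 0.5, size=(60, 2)) for loc in (0.0, 4.0, 8.0)])\n",
"\n",
"# Same number of clusters for both methods so the scores are comparable\n",
"kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=9).fit_predict(toy)\n",
"hier_labels = fcluster(linkage(toy, method='ward'), t=3, criterion='maxclust')\n",
"\n",
"print('KMeans silhouette:', silhouette_score(toy, kmeans_labels))\n",
"print('Hierarchical silhouette:', silhouette_score(toy, hier_labels))"
]
},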
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**PCA for visualization in reduced dimensionality**\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABi8AAAJHCAYAAADoqsXxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3hUVf7H8c+dmkwSEhJIoYTee5GiooKKXRfFn7oWZEVRLLsqiq4KlnV1XTsCdtlVXNFVsfe+iigqoAIC0ksSSnomkyn390fMyJCEOi3D+/U8PJBzZu75zpy54Z753nOOYZqmKQAAAAAAAAAAgDhhiXUAAAAAAAAAAAAAOyN5AQAAAAAAAAAA4grJCwAAAAAAAAAAEFdIXgAAAAAAAAAAgLhC8gIAAAAAAAAAAMQVkhcAAAAAAAAAACCukLwAAAAAAAAAAABxheQFAAAAAAAAAACIKyQvACABmKYZ6xDQiIO5bw7m1w4AAIBQXBseHOhnAOFE8gJIYOeff766desW8qd379466qijdNttt6m0tLTec9asWaNbb71VxxxzjPr27aujjjpK11xzjZYvX95oOw888IC6deumO+64I5Ivp1HTp09Xt27dYtJ2Q1555RV169ZNGzdujPjzampq9Pe//11vvPHGvoa5T84++2x169ZN7733XkTbibe+PBBlZWW6/vrrtXDhwmDZ+eefr/PPPz9qMezt+Txq1CjdcMMNYW175cqVOuecc8JyrI0bN6pbt2565ZVXwnI8AAAQPxizxEYijVm6deum6dOn1ytfsWKFhg8friOPPFJr164NPrZbt266//77GzxWIBDQiBEjmuy1Z2Fhoe655x4df/zx6tevnw4//HBdeumlIWMSKTLjkoKCAl1yySXatGlTWI7XWL8COLiQvAASXM+ePTV37tzgn2eeeUYXXnihXn75ZU2cODHkroj3339fY8aM0c8//6zLLrtMTzzxhK6++mqtXbtW//d//6cvv/yy3vEDgYDmzZunrl276rXXXpPb7Y7myzvoFRUV6V//+pd8Pl/E2li9erV++OEHde3aVS+88ELE2kk0y5Yt02uvvaZAIBAsmzZtmqZNmxaV9vfnfA6nd999Vz/88ENYjpWdna25c+fqqKOOCsvxAABAfGHMktiiMWbZ1cqVK3XhhRcqOTlZzz33nNq3bx+ss1gsevfddxt83rfffquioqIoRRle3333nU477TR98sknuuCCC/Too4/qpptuUnV1tc4//3zNmzcvou1/9dVX+uyzz8J2vLlz5+rMM88M2/EANE22WAcAILJSU1PVv3//kLJDDjlElZWVevjhh7V48WL1799f69ev15QpUzRixAg9+OCDslqtwcePHj1a55xzjqZMmaKPP/5YDocjWPe///1PBQUFuv/++3XeeefpzTff5AIjwbzyyitq3bq1Jk6cqMmTJ2vdunVq165drMNqkjp37hyVdvb3fI5XDoej3u8xAACQOBizIJx+/fVXjRs3TikpKfrXv/6lVq1ahdQPHDhQCxcu1NKlS9WzZ8+Qurfeeks9evTQsmXLohnyASspKdFf/vIXtW/fXs8884ySk5ODdccdd5wuueQSTZ06VYcffrhatGgRw0j3Htf/ACRmXgAHrd69e0uSNm/eLEl69tlnVVNTo5tvvjlkECBJycnJmjJlis4444x607Zffvllde3aVYMGDdLQoUM1d+7cPbY9atQo/f3vf9e4cePUt29f3XTTTZJqL7imTp2qQw89VH369NH//d//af78+SHP9Xg8uuuuu3TYYYdpwIABuvHGG+XxeEIe09AU2AULFqhbt25asGBBsGz16tW64oorNGTIEB1yyCGaOHGifv3115C27rnnHh155JHq3bu3TjnlFL399tshxw0EApo5c6aOOuoo9evXT5MmTWpwavuu9vZ5H374of74xz9qwIAB6t27t44//njNmTNHUu1SOkcffbQk6c
Ybb9SoUaOCz3vppZd0+umnq3///urbt69OO+00vfPOOyHH7tat2x6XCvL7/Zo3b55GjhypY445Ri6Xq8E+9nq9uvfee3XEEUeob9++uuiiizRv3rx6U8pfffVVnXjiierTp49OPfVUzZ8/Xz179tztlOy3335bp59+ugYMGKDDDjtMU6dODXmvpk+fruOPP14ffPCBTj75ZPXp00ennXaafvjhBy1atEhnnnmm+vbtq5NPPrne52nFihWaOHGiBg4cqIEDB+ryyy/Xhg0bgvV1n5sXXnhBI0eO1MCBA4N38+3uPV6wYIEuuOACSdIFF1wQ/Dzu/Nn805/+pNNPP73e6500aZJOPfXU4M8LFy7Ueeedp379+mnIkCGaMmWKduzY0ej7Je3/+bzza975XNk1dkn66aefNG7cOA0aNEgDBgzQhRdeqEWLFkmq7ZNHHnlEUuh070AgoMcff1zHHnusevfureOOO07PPvtsvXYmT56sq666Sv3799f48ePrLRv1yiuvqGfPnlq8eLHOOuss9enTRyNHjtRTTz0VcqyioiJdffXVwXN86tSpeuCBB0LOFQAAEL8YszBm2Zsxy85+/fVXXXDBBUpLS9Nzzz1XL3Eh1SbGWrRoUW/2hc/n0/vvv6+TTjqp3nP2pt937Nih2267TSNHjlTv3r01ZMgQXX755SHjofPPP1833XSTHn/8cR111FHq06ePzj77bC1ZsiT4mOrqat1666064ogjgu/nrte5u5o3b56Kior017/+NSRxIdXONJk8ebLOPfdcVVRU1HtuY0u03nDDDSH9tX79el166aUaOnSo+vXrp7POOis40+KVV17RjTfeKEk6+uijQ/rspZde0kknnRRcDm769Ony+/0h7YwbN07Tpk3TwIEDdeKJJ8rv94eMI+rOjfnz5+tPf/qT+vXrp8MOO0z//Oc/Q45VUVGhqVOnavjw4RowYICuvvpqzZ49O66WbAOwb0heAAepNWvWSJLatm0rSfriiy/Us2dP5eTkNPj44cOH6+qrr1bLli2DZSUlJfr444/1hz/8QZI0ZswY/fjjj/r555/32P6cOXPUp08fzZw5U2PHjpXH49G4ceP00Ucf6eqrr9Yjjzyi3NxcTZgwIeSi8LrrrtOLL76oiRMn6sEHH1Rpaalmz569z6+/sLBQZ511ltauXatbb71V//znP7Vt2zaNGzdOJSUlMk1Tl19+uV544QWNHz9es2bNCl787Dzd9p///KdmzJihsWPH6pFHHlFGRobuu+++Pba/N8/79NNPdfnll6tXr16aOXOmpk+frrZt2+r222/X4sWLlZ2dHfyC+LLLLgv+e86cOZo6daqOOeYYPfbYY7r33nvlcDg0efJkFRQUBI8/d+5cTZo0abdxfv7559q6dav+8Ic/KCkpSSeccIJeffVV1dTUhDxu6tSp+te//qXzzjtPM2bMUIsWLXTLLbeEPGbevHm64YYbNHDgQM2cOVPHHXecJk2aFHKxuauZM2fqmmuuUf/+/fXwww/r8ssv13vvvafzzz9f1dXVwccVFBTo7rvv1qWXXqqHHnpIZWVluuqqq3TNNdfozDPP1IwZM2Sapq6++urg89asWaOzzz5b27dv1z/+8Q/deeed2rBhg8455xxt3749JI5HHnlEU6ZM0dSpUzVgwIA9vse9evXS1KlTg+9NQ0tFnXrqqfr555+1bt26YFlZWZk+//xznXbaaZJqp61feOGFSkpK0oMPPqi//vWv+uabb3TBBReEvP5d7c/5vC8qKio0YcIENW/eXNOnT9cDDzwgt9utiy66SOXl5TrzzDM1duxYSaHTvW+99VY9/PDDOvXUU/Xoo4/q+OOP19///nfNmDEj5PjvvPOOUlJSNGvWLE2YMKHBGAKBgP7yl7/oxBNP1OOPP66BAwfqnnvu0RdffCGpdm3lcePG6fvvv9df//pX3XXXXVq+fLmefvrp/XrNAAAg+hizMGbZmzFLndWrV2
vcuHFKTU3Vc8891+jnxGq16rjjjquXvJg/f748Hk+9G132pt9N09TEiRP15ZdfavLkyXrqqad0xRVXaP78+fXGAu+
"text/plain": [
"<Figure size 1600x600 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"pca = PCA(n_components=2)\n",
"reduced_data = pca.fit_transform(data_scaled)\n",
"\n",
"# Визуализация сокращенных данных\n",
"plt.figure(figsize=(16, 6))\n",
"plt.subplot(1, 2, 1)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('PCA reduced data: Agglomerative Clustering')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.title('PCA reduced data: KMeans Clustering')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Анализ инерции для метода локтя (метод оценки суммы квадратов расстояний)**\n"
]
},
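{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the inertia computed in the next cell is the value scikit-learn stores in `kmeans.inertia_`: the sum of squared distances from each sample to its nearest cluster centre. The symbols $C_j$ (the set of samples assigned to cluster $j$) and $\\mu_j$ (the centroid of that cluster) are notation introduced here for illustration and do not appear in the notebook's code:\n",
"\n",
"$$W(k) = \\sum_{j=1}^{k} \\sum_{x_i \\in C_j} \\lVert x_i - \\mu_j \\rVert^2$$\n",
"\n",
"Inertia generally keeps decreasing as $k$ grows, so the elbow method looks for the value of $k$ after which the decrease flattens out, not for a minimum."
]
},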
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA2cAAAImCAYAAADXOPIYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB7GElEQVR4nO3deVjU5f7G8XsGhn2RTXAXQVED3FKzMo3K9k7mqU6llSfLzPKX1rFM62Rli6HmkpVZqaXZosfq2Gmz3dTUUlFxQcUdkV1kGWDm9wcyOaKCCDMDvF/XxQV8l2c+jE/G7bN8DVar1SoAAAAAgFMZnV0AAAAAAIBwBgAAAAAugXAGAAAAAC6AcAYAAAAALoBwBgAAAAAugHAGAAAAAC6AcAYAAAAALoBwBgAAAAAugHAGAAAAAC6AcAYAAAAALoBwBgAOMGTIEMXExOgf//jHGa8ZPXq0YmJi9OSTTzqwMgA1deDAAcXExGjp0qXOLgVAA0E4AwAHMRqN2rBhg9LS0iqdKygo0A8//OCEqgAAgKsgnAGAg3Tu3Fmenp766quvKp374Ycf5O3trfDwcCdUBgAAXAHhDAAcxMfHR/369TttOPvyyy919dVXy93dvdK57777Trfccovi4uJ0ySWX6IUXXlBBQYEkKSEhQTExMaf9OHDggCRp5cqVuvPOO9WjRw/17t1bjz32mA4fPmz3Go899thp26hqulbFdM3TfZwsKSlJ9913n3r37q3u3bvrwQcf1M6dO23n16xZo5iYGK1Zs0aStGPHDl155ZX6xz/+oZkzZ57xNWbOnClJ+uSTT3TttdcqNjbW7nxVU0Q//vjj07Z78n0VU9equq6mNVT3vTnb65/pfMWfw5NPPqmEhAS71128eLHde3jy66xfv97u2g8++EAxMTF2bRQVFWnKlCkaMGCAYmNj1b17dw0dOlTJycl2956priFDhthdU1HH6ZzaPyoMGTLErp3i4mK9/vrruuaaaxQXF6cBAwZozpw5slgsdvecWsuaNWuqdW9VrFarxo0bp/j4eP3666/Vvg8AKlT+LQAAUGeuu+46Pfroo0pLS1NERIQkKT8/Xz///LPee+89/fzzz3bXf/HFF3r88cd144036tFHH9XBgwc1bdo0paSk6L333tOsWbNkNpt19OhRPfzwwxoxYoT69+8vSWratKmWLVumJ554QjfccIOGDx+u7OxszZgxQ7fffrv+85//KCQkRFL5L7W33367brnlFkmytVcdnTt31r///W/b95988ok+/fRT2/erV6/WsGHD1Lt3b7344osqLi7WW2+9pX/84x/6+OOPFRUVVanNV199VbGxsRoxYoQCAwPVt29fSdLEiRMlyfZ6ERERWrt2rSZMmKC///3vmjBhgnx9fSWpWvUXFRUpLi5OEyZMsB07030nv7enXlfTGs7lvXnmmWd0wQUXnPb1P/roI0nSli1b9Nxzz1W69lS5ubl67bXXTnvO19dX33//vXr06GE79uWXX8potP/33LFjx2rdunUaM2aMWrdurb1792r69Ol67LHHtHz5chkMBtu1f//733Xrrbfavq/4c6xNVqtVDz74oDZs2KCHH35YHTt21Jo1a/Taa69p//79ev75523Xntpno6Kiqn3v2bzwwgv673//q9dff12XXnpprf+MABo+whkAOFD//v3l7e2tr776Svfee68k6dtvv1VISIjdL8NS+S+biYmJ6tu3rxITE23H27Ztq3vvvVc//fSTLSxUjJK1bt1aXbt2lSRZLBYlJibq0ksv1ZQpU2z3d+/eXdddd53eeecdjR07VpJUWFiotm3b2u6taK86/Pz8bPdJ0i+//GJ3fsqUKWrTpo3mzJkjNzc3SdKll16qq666SjNmzND06dPtrt+7d69+/fVXff7552rfvr0k2YKsn5+fJNm93vLlyyVJTz31lC0USZKHh0eVtRcWFio0NNSuvTPdd/J7e+p1mz
ZtqlEN5/LeREdHn/H1K44XFxef9tpTzZgxQ82bN1d2dnalc5dddplWrFihf/3rX5KktLQ0/fnnn7rwwgt18OBBSZLZbNbx48c1YcIEXXfddZKkXr16KT8/Xy+//LIyMjIUFhZmazMiIsKunoo/x9r0888/67ffftPUqVN1/fXXS5IuueQSeXl5afr06br77rtt/enUPvvTTz9V+94zmTJlij766CPNmjVLl112Wa3/fAAaB6Y1AoADeXl5KSEhwW5q4/Lly3XttdfajTRI0u7du5WWlqaEhASVlpbaPnr27Ck/Pz+tXLnyrK+1Z88eHT16VDfccIPd8datW6tbt276/fffbccOHz4sf3//WvgJ7RUUFCgpKUnXXnutLXxIUkBAgC6//HK7GiqunzZtmnr37l3lL8MV4uPjJUnvvvuu0tPTZTabVVpaWq17a+vnrkkN5/re1JYdO3boo48+0tNPP33a8wkJCUpNTdXu3bslSV999ZW6dOmiFi1a2K7x8PDQO++8o+uuu05HjhzR6tWrtXjxYtumNmaz+ZzrslgsKi0tldVqrfKaio+Tr/3999/l7u6ua665xu6em266yXb+TM7nXklauHCh5syZo+uvv95udBUAzhUjZwDgYNdee60efvhhpaWlydPTU6tWrdKjjz5a6bqcnBxJ5VPATjcNLD09/ayvU3F/aGhopXOhoaHaunWrpPIRukOHDqlly5bn9oNUw7Fjx2S1Ws9Yw7Fjx+yOPfjggwoICLCbFlmVnj17asKECZozZ45mzZp1TvUdPHjwrNP/6rKGc31vassLL7yg66+/Xt26dTvt+fDwcMXGxmrFihVq166dvvzyS91www22/lLhl19+0Ysvvqjdu3fL19dXHTt2lI+PjySdNWCdyezZszV79my5ubkpNDRUl156qf7v//7PbpOcitHmk/Xq1UtS+VTNoKAgu6AryTaCd7b383zulaRt27bp0ksv1X//+1/dc8896ty581mvB4AzIZwBgINddtll8vX11VdffSUfHx+1bNlSsbGxla4LCAiQVL62p+IX0JMFBgae9XWaNGkiScrIyKh07ujRowoKCpIkJScnq6ioqNImHrXB399fBoPhjDVU1Fhh7Nix+uqrrzRq1CgtXLiw2tPfbrvtNv36668qLS3VM888o5YtW2rEiBFnvcdisWjjxo0aNGhQtV7j1JHN863hXN+b2vC///1PmzdvtpvmejpXXHGFVqxYoWuvvVabN2/WrFmz7MLZvn37NHLkSF155ZV666231KpVKxkMBi1cuLDStFap6vdOKn//brvtNlksFh06dEjTpk3T/fffr88//9x2zcSJE+3C9MnrxgIDA5Wdna2ysjK7kFXxjxgV/f10zudeSfq///s/3X333br++us1YcIEffLJJ5WCHgBUB9MaAcDBPDw8dOWVV+rrr7/W//73P9sal1O1a9dOISEhOnDggOLi4mwf4eHhmjJlSqWRjFNFRkYqLCxM//3vf+2O79+/Xxs2bFD37t0lST/++KM6deqk4ODgc/5ZLBbLWX8J9fHxUWxsrP73v/+prKzMdvzYsWP68ccfK62zi42N1axZs3Tw4EG9+uqr1a5j+vTp+vHHH/Xyyy/r2muvVVxcXJXrvf744w8VFBSod+/eZ72uYhTo1A0xzreGc31vzpfZbNbkyZM1cuRIu/Vgp3PllVdq48aN+uCDD9SjRw81bdrU7vzmzZtVXFysBx54QK1bt7aFr4pgVvGeVex0WNV7J5VvYBMXF6cuXbro2muv1V133aXt27crNzfXdk1kZKTdfwsnr+/r1auXSktLK+2GWhHuzvZ+ns+9UvlIp5eXl5555hlt2bJF7733XpU/LwCcDiNnAOAE1113nYYPHy6j0Wi3U+DJ3NzcNHr0aD3zzDNyc3PT5Zdfrry8PM2ePVtHjhypcjqe0WjUmDFjNG7cOD322GO66aablJ2drVmzZikwMFBDhw7Vli1btHDhQl
1//fXasGGD7d6jR49KKh8hycrKqhTcsrKylJKSor1799pC3pk89thjuu+++/TAAw/ozjvvVElJiebMmSOz2ayRI0d
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"inertias = []\n",
"clusters_range = range(1, 11)\n",
"for i in clusters_range:\n",
" kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
" kmeans.fit(data_scaled)\n",
" inertias.append(kmeans.inertia_)\n",
"\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range, inertias, marker='o')\n",
"plt.title('Метод локтя для оптимального k')\n",
"plt.xlabel('Количество кластеров')\n",
"plt.ylabel('Инерция')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Расчет коэффициентов силуэта**\n"
]
},
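{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder of what `silhouette_score` measures, the coefficient for a single sample $i$ can be written as follows, where $a(i)$ is the mean distance from $i$ to the other samples in its own cluster and $b(i)$ is the mean distance from $i$ to the samples of the nearest other cluster (the symbols are notation introduced here, not names from the code):\n",
"\n",
"$$s(i) = \\frac{b(i) - a(i)}{\\max(a(i),\\, b(i))}$$\n",
"\n",
"`silhouette_score` returns the mean of $s(i)$ over all samples. Values lie between -1 and 1: values near 1 indicate compact, well-separated clusters, and values near 0 indicate overlapping clusters."
]
},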
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1oAAAImCAYAAABKNfuQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAACbl0lEQVR4nOzdd1zU9eMH8NfdccfeeykKygYH4MQ92rmzcjTUhivLkdm3rCzNWWo4UkstR27NBjnSnOAE2agoew/Z4+73B3E/CVQO7/wwXs/Hg0dy97kPr3tHxov3+/P+iBQKhQJERERERESkNmKhAxAREREREbU0LFpERERERERqxqJFRERERESkZixaREREREREasaiRUREREREpGYsWkRERERERGrGokVERERERKRmLFpERERERERqxqJFRERERESkZixaRNRqjB8/HuPHj6/12KVLl/DCCy/Aw8MDe/bs0ejX//DDDzFgwACVXzdgwAB8+OGHGkhERJri6uqKNWvWCB2DiASkJXQAIiKhZGdn4+2334anpyc2b94MV1dXoSMRERFRC8GiRUSt1g8//IDS0lIsXboU1tbWQschIiKiFoRLB4moVcrNzcWOHTvw/PPP1ylZCQkJmDFjBnr16oVOnTph/PjxuHz5cq1j/v77b4wYMQK+vr7o2bMnPv30U9y7d6/WMT///DP69+8PX19fzJo1C4WFhQCAdevWoUePHvDz88Onn36K8vJy5WvKy8vx2Wefwd/fH926dVMuPSoqKsKcOXPQqVMn9O3bFz///LPyNUlJSXB1dcX+/fuVj5WVlWHgwIG1ZunqWzp58eJFuLq64uLFi/V+DlTP/Pn5+dVZ9rhnzx48++yz8PLyQr9+/bBmzRpUVVUpn69vqeT9WWu+Vn0fNTkftWyyvvf0XxkZGZg3bx569OiBzp07Y9y4cbh69ary+f8u8VIoFBg7dixcXV2RlJRU67iHZZ0xYwb69OkDuVxe6+svWLAAQ4cOBQCkpaXh/fffR/fu3eHr64vx48fj2rVrAIA1a9Y88GvU5IuOjsa0adPQvXt3eHp6IjAwEIsWLUJpaelDx+Ds2bMPzd7Q9wgAx48fx/Dhw+Hr6/vQc91v//79cHV1xfXr1zF8+HD4+Pjg+eefxx9//FHruKSkJMydOxe9e/eGp6cnevTogblz5yI3N1d5TFRUFF599VV07twZgwYNwq5du5TP1ff9C9T9PnnUsr77v++2bdtW57+vCxcuwM3NDd99990Dz/Ffq1evhru7Ow4cONDg1xBR88YZLSJqVRQKBVJTU7Fo0SJUVlbirbfeqvV8fHw8xowZAycnJ3z88ceQSqXYtm0bJk6ciC1btiAgIAChoaF455138MILL+CDDz5AXFwcvvnmG8TGxuKnn36CRCLBsWPH8Pnnn2P8+PHo06cPdu/ejWPHjgEAfvvtNyxatAjJyclYvnw5dHR0MH/+fADAsmXLsG/fPsydOxc2NjZYtWoVkpOTkZycjKeeegqrV6/G6dOn8fnnn8PGxgYDBw6s931u2rSpVkl4HCtWrMC9e/dgZGSkfGzDhg1YtWoVxo0bh/nz5yMqKgpr1qxBamoqvvrqqwad19PTE7t37wZQXdr27t2r/NzAwEAt2YuKivDyyy+jqqoKc+bMgbW1NbZs2YI33ngDBw4cgJOTU53XHDp0qFYRu9+oUaMwevRo5eefffZZref+/PNPXLx4ET169AAAlJaW4o8//sDkyZNRXl6OSZMmoaKiAp9++imkUimCgoIwfvx4/PLLLxg9ejQCAwNrnffTTz8FANjY2CAjIwOvvvoqOnXqhCVLlkAmk+H06dP44YcfYGVlhSlTpjxwHEpLS2FjY4Nvv/223uwNfY93797FzJkzERgYiFmzZim/Jx50rv966623MG7cOMyaNQt79+7Fe++9hw0bNqBv374oKSnBhAkTYGpqik8//RSGhoa4evUq1q5dCx0dHX
z++ecoKSnB5MmTYW9vjzVr1uDKlSv49NNPYWdnhz59+jQog6rGjx+P4OBgfP311+jXrx9kMhk++ugjdOrUCW+//XaDzrF582YEBQVh0aJFGD58uEZyElHTw6JFRK1KaGgo+vXrB6lUiu+//77OD9pr166FTCbDtm3blD/s9+vXD8899xyWLl2KvXv34uDBg3BycsLixYshFovRq1cv6Orq4pNPPsGpU6cwYMAArF+/Ht26dcPHH38MAOjWrRt69eqFe/fuYfHixfDy8gIAFBQU4Pvvv8e7774LuVyO3bt3Y8qUKRg3bhwAwMLCAi+99BJMTEywfPlySKVS9OnTB7GxsdiwYUO9RSs1NRXff/89PD09ERER8VjjFR4ejkOHDsHd3R0FBQUAgHv37iEoKAgvvfSS8v317t0bJiYm+Pjjj/H666+jQ4cOjzy3gYEBOnXqBAD4559/AED5ubocOHAAycnJOHDgANzd3QEAXbp0wbBhwxAaGlrn339RURGWL1/+wLGzsbGplfH+Qti7d2/Y2Njg4MGDyqL1119/obi4GMOGDcO1a9dw69Yt/Pzzz+jcubMyy+DBgxEUFIQ1a9bAxsam1nnv/1pnzpyBu7s7vv32W+XzPXv2xNmzZ3Hx4sWHFq2SkhIYGRk9MHtD32NkZCQqKiowa9YsdOzY8ZHn+q/x48dj6tSpAIDAwEAMHz4c3333Hfr27YuEhATY2Njg66+/hqOjIwCge/fuuH79OkJCQgAAycnJ8Pb2xkcffQRHR0f07t0bO3bswD///KOxoiUSibB48WK88MILWLZsGSQSCfLy8rB161ZIJJJHvn7nzp1YtmwZPv/8c4waNUojGYmoaeLSQSJqVTw8PLBkyRIYGxtj/vz5dWZ9QkJC0L9//1o/OGppaeHZZ5/FjRs3UFRUhC+//BIHDx6EWCxGZWUlKisrMXToUIjFYoSGhqKyshKRkZHo3bu38hza2trw9fWFrq6usmQB1T+cl5aWIiYmBjExMSgrK1POagDVP2hra2vDx8cHUqm01usiIiJqLdWr8fXXX8PPzw/9+/d/rLFSKBRYtGgRRo0aBTc3N+XjV69eRWlpKQYMGKB8/5WVlcplgmfPnq11nvuP+e+yuobmaOxrL1++DAcHB2XJAgBdXV38+eeftWZtagQFBcHU1BQvv/yyyl9LLBZj+PDhCA4ORklJCYDqotezZ0/Y2NggICAA165dQ6dOnVBVVYXKykoYGRmhV69eCA0NfeT5e/fujZ9++gna2tqIj4/H8ePHsW7dOuTk5NRaflqf1NRUGBoaqvye/svT0xNaWlr46aefkJycjPLyclRWVkKhUDTo9ffP5ohEIgwePBhhYWEoLS2Fu7s7duzYAXt7eyQkJODUqVPYvHkzbt26pXx/Li4uWLduHRwdHVFeXo7Tp08jPz8fzs7Otb6OXC6v9X1XX76aYxqS3dHREbNnz8aBAwewZ88efPzxx8oy+DAnT57EZ599Bj8/P4wZM+aRxxNRy8IZLSJqVQwMDDB8+HC0b98eL7/8Mt577z3s3r1b+Zvp/Px8WFhY1HmdhYUFFAoFCgsLoa+vD21tbQDVP3jer6CgANnZ2aiqqoKpqWmt50xMTGBsbFzrsZqlV1lZWcrS9N/XGRsbw8TEpM7rKisra127AlQXxWPHjuHw4cM4evRoQ4bkgQ4ePIiEhASsX78eX3/9tfLxvLw8AHjgDEpGRobyz8nJyXXGqDE5Dh48CJFIBHNzc3Tt2hUzZ86s88N1ffLy8mBubt6gr5OQkICtW7di06ZNSElJaVTWkSNHYv369QgODkb37t1x/vx5LF++XPm8TCYDUH3d1v3X6jRkZkQul2PlypX4+eefUVxcDFtbW/j4+Ci/Fx8mOTkZ9vb2jXhHtTk6OmLZsmVYuXKlcplnjYCAgEe+3srKqtbn5ubmUCgUKCgogI6ODn744QesX78eeXl5sLCwgJeXF3R1detc/1hQUAB/f38AgK
WlJZ5++ulaz7/22mt1vvZ/8wUFBSEoKAgSiQQWFhbo3bs3Zs6c+cCNcZ555hksWbIEANCrV69HvlcAiIiIQL9+/fD
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.metrics import silhouette_score\n",
"\n",
"silhouette_scores = []\n",
"for i in clusters_range[1:]: \n",
" kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
" labels = kmeans.fit_predict(data_scaled)\n",
" score = silhouette_score(data_scaled, labels)\n",
" silhouette_scores.append(score)\n",
"\n",
"# Построение диаграммы значений силуэта\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range[1:], silhouette_scores, marker='o')\n",
"plt.title('Коэффициенты силуэта для разных k')\n",
"plt.xlabel('Количество кластеров')\n",
"plt.ylabel('Коэффициент силуэта')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Средний коэффициент силуэта: 0.466\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA00AAAJzCAYAAADTBPhFAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3iTVf8G8DtJkzRtuvee0N2yypI9HIAoICoyXLyKgnvy6k9f9VXRFxdLEFQEHIgMQUCUJUNW2UJZ3XvvpM38/VEbCW1D05WO+3NdXpLnJM/zTZ6mzZ1znnMEer1eDyIiIiIiImqQ0NIFEBERERERdWQMTURERERERCYwNBEREREREZnA0ERERERERGQCQxMREREREZEJDE1EREREREQmMDQRERERERGZwNBERERERERkAkMTERERERGRCQxNRNRsM2fORFhYmNF//fr1w6xZs3D8+HFLl0dEXVxYWBgWL15cb/uVK1cwaNAgDB8+HKmpqY0+fvHixQgLC0NMTAwqKysbvM/333+PsLAwjBo1qrXKJqJOiKGJiFokMjIS69evx/r16/Hdd99hwYIFEIvFePTRR3H16lVLl0dE3czVq1fx0EMPQSaTYd26dQgMDLzpYzQaDfbu3dtg244dO1q5QiLqjBiaiKhF5HI5evXqhV69eqFv374YM2YMFi9eDKFQiE2bNlm6PCLqRpKSkvDggw/C1tYW69atg5+fX5Me16dPH+zcubPe9ry8PCQkJCAiIqK1SyWiToahiYhanUwmg1QqhUAgMGybOXMmZs6caXS/jz76CGFhYUbhat26dRg9ejR69+6NGTNm4MqVKwCAb7/9FmFhYUhJSTHax88//4yIiAjk5OQAAHbv3o0HHngAvXv3RnR0NG6//XZ8++23Ro959dVX6w0rrPsvMzPTcJ8bh+P88MMP9YYD7dixA+PGjUOvXr0wefJkJCQkGD3mZvUcO3YMYWFhOHbsmNHjbny9mvL6qVQqfPDBBxg+fDgiIiKMnpepAHvjvt99913ExMTgwIEDAP4ZwtTQf9fX3ZTXPj8/H6+88goGDRpkOMenT58GAIwaNeqm5yUhIQEzZsxAXFwc+vfvj1deeQXFxcWG/W/atAlhYWE4e/YsJk2ahNjYWNx555349ddfjeqoqKjA+++/jzFjxiAmJgYTJkzATz/9ZHSf6+sJDw9HfHw8nnrqKZSUlDT6WgJAcnIy5s2bh/79+yM+Ph6PP/44kpKSGr2/qdf3+vOWmpqKp59+Grfccgt69eqFmTNn4uTJk4b2zMxMw+O2bt1qdIx9+/YZ2q63Y8cOTJ48Gb1798Ytt9yCN954A2VlZfVqu15DP4ujRo3Cq6++2ujtG9XVev3zO3XqFO677z7ExMTglltuwTvvvIPq6upG93GjpKQkzJo1C3Z2dli3bh28vb2b/Nhx48bh0KFD9Ybo/frrrwgKCkJ4eHi9x+zevRuTJ0821Pvf//4XCoWi3n2a8v4/cuQIHnnkEcTFxeGWW27B//73P2i1WsP9Dh8+jHvvvRe9e/dGfHw8nnjiCZM/U0TU+hiaiKhF9Ho9NBoNNBoN1Go1CgoK8NFHH0GlUmHKlCmNPi49PR2rV6822vbbb7/hnXfewfjx47F06VJotVrMmTMHKpUKd955J6RSKX7++Wejx2zZsgWDBg2Cl5cX9u/fj7lz5yIqKgrLli3D4sWL4efnh7fffhtnz541epybm5thWOH69evxxBNPmHyeZWVl+PTTT422nTt3Di+++CJ69eqFzz//HF5eXpgzZw4KCwsBwKx6zNXQ67dy5Up88803ePDBB/HNN99g/fr1WLJkiVn7PXfuHL7//nt8+umn6N27t1Hb9a/XG2+8YdTWlOdaVVWFadOm4dixY3jppZewZMkSSKVSPPLII0hNTcWSJUuMan7iiScMx3N3d8eJEyfw0EMPwdraGp9++in+/e9/4/jx45g1a1a9D9ePP/
44Ro8ejSVLliAoKAjPPvss/vjjDwBAdXU1HnjgAWzbtg2zZ8/GsmXL0LdvX7z22mtYvny50X6GDx+O9evXY+3atXjhhRdw+PBhvPvuu42+fnl5ebjvvvuQmpqK//znP/jf//6HwsJCPPjggygtLTX52l//+t543q5du4bJkycjMzMTr7/+OhYuXAiBQIAHH3yw3vWDtra29Yaa7dixA0Kh8Z/8ZcuW4fnnn0evXr2waNEizJ07F7t27cLMmTPNCiutIScnB48++iicnJywZMkSPP300/j555/x8ssvN+nxycnJePDBByGXy7Fu3Tp4eHiYdfzbbrsNWq22wddt/Pjx9e6/bds2zJ07F8HBwVi6dCnmzZuHrVu34sknn4Rerwdg3vv/xRdfRN++fbF8+XJMmDABq1atwoYNGwAAGRkZePLJJxEdHY3PP/8c7777LlJSUvDYY49Bp9OZ9TyJqPmsLF0AEXVuJ06cQFRUVL3tzz//PEJCQhp93HvvvYcePXrgwoULhm3FxcV44IEH8PzzzwOo7Tmp+5Y+IiICY8eOxdatW/HMM89AIBAgNzcXR48exf/+9z8AtR8sJ02ahNdee82wz969e2PAgAE4duwY4uLiDNslEgl69epluJ2cnGzyeS5atAje3t5GvQy5ubm47bbb8N///hdCoRCurq6YMGECzpw5gzFjxphVj7kaev3OnTuH8PBwPPLII4ZtdT00TVXX0zd69Oh6bde/XjU1NUZtTXmumzdvRlZWFjZv3mwY7tSnTx/cfffdOHHiBKZOnWpUs7+/v9ExP/roIwQFBWHFihUQiUQAgLi4OIwfPx4bN27E9OnTDfedOXMm5s6dCwAYOnQoJk2ahKVLl2L48OHYtGkTrly5gh9++MEQDIcOHQqNRoNly5bh/vvvh6OjIwDA2dnZUEN8fDz+/PNPo9f8RqtXr4ZKpcLXX38NNzc3AEB4eDimTZuGs2fPYvjw4Y0+9vrneuN5W7JkCSQSCdasWQO5XA4AGDFiBCZMmIAPP/zQqJds2LBhOHjwIFQqFSQSCWpqarBnzx7Ex8cbegbLysrw+eef49577zUKwD179sT06dPrvZ5tbeXKlXBycsLSpUsN51YoFOL111/H5cuX6/V2XS81NRWzZs1CYWEh1Gp1s4KEq6sr4uPjsXPnTkycOBEAkJWVhbNnz+LDDz/E559/brivXq/HwoULMXToUCxcuNCwPTAwEA899BD++OMPjBgxwqz3/9SpUw0/r4MGDcLu3buxf/9+3H///Th37hyqq6vx+OOPG8Kgp6cn9uzZA4VCYfh5IKK2xdBERC0SFRWFt956C0Dth4ny8nIcOHAAn3zyCRQKBZ577rl6jzlw4AD+/PNPrFy5ErNmzTJsv//++wEAOp0OCoUCv/32G6ytreHj4wMAuOeee/DLL78gISEB8fHx2LJlC2xtbTF27FgAwOzZswHU9mikpKQgPT0d58+fB1AbwJrrypUrht6GuhoB4NZbb8Wtt94KvV4PhUKBnTt3QigUIigoqE3raez1i4mJwRdffIFdu3Zh4MCBsLW1bfIHSL1ej9OnT2PHjh31erCaoinP9eTJk/D19TW6PkQmk2HXrl033b9SqcTZs2fx6KOPGno3AcDPzw8hISE4fPiw0Yf8SZMmGf4tEAgwduxYLF68GNXV1Th+/Dh8fHzq9aRNnDgRP/30k1G4qTuWTqfDpUuXcPLkSQwePLjROk+ePIlevXoZAhNQ+wF33759N32Ophw/fhwjR440+oBsZWVl6JWtqqoybB84cCAOHDiAY8eOYejQoThw4ADkcjn69etnCE1nzpyBSqXChAkTjI7Tr18/+Pj44Pjx4y0OTXWvnVAorNfLVUen00Gj0SAhIQFDhgwxBCagNvwBta+pqdD0yy+/IDo6Gp988gkeeeQRvPTSS1i9erXRMbVaraEHCKj9mbj+WEDtEL3//ve/qKyshFwux/bt2xEVFYWAgACj+yUnJy
M3NxePP/644ecQqA3Vcrkchw8fxogRI8x6/9/4s+jp6WkY6hcXFwepVIp77rkHt99+O4YNG4YBAwYgNja20deEiFo
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"# Применение K-Means\n",
"kmeans = KMeans(n_clusters=3, random_state=42) \n",
"df_clusters = kmeans.fit_predict(data_scaled)\n",
"\n",
"# Оценка качества кластеризации\n",
"silhouette_avg = silhouette_score(data_scaled, df_clusters)\n",
"print(f'Средний коэффициент силуэта: {silhouette_avg:.3f}')\n",
"\n",
"# Визуализация кластеров\n",
"pca = PCA(n_components=2)\n",
"df_pca = pca.fit_transform(data_scaled)\n",
"\n",
"plt.figure(figsize=(10, 7))\n",
"sns.scatterplot(x=df_pca[:, 0], y=df_pca[:, 1], hue=df_clusters, palette='viridis', alpha=0.7)\n",
"plt.title('Визуализация кластеров с помощью K-Means')\n",
"plt.xlabel('Первая компонентa PCA')\n",
"plt.ylabel('Вторая компонентa PCA')\n",
"plt.legend(title='Кластер', loc='upper right')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Средний коэффициент силуэта, равный 0.466, указывает на умеренно хорошую кластеризацию."
2024-11-29 20:49:12 +04:00
]
}
],
"metadata": {
"kernelspec": {
"display_name": "miienv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}