{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Laboratory Work No. 5**\n",
"\n",
"**Business goals:**\n",
"\n",
"Improve financial planning\n",
"\n",
"Use clustering approaches to predict sales volumes and revenue across different segments\n",
"\n",
"**Dataset columns and their meaning:**\n",
"\n",
"**Date** - The date the record refers to, i.e. the specific day on which Walmart shares were traded.\n",
"\n",
"**Open** - Opening price: the price of Walmart shares at the start of the trading day. It shows the price at which trading began that day and is often compared with the closing price to determine the daily trend.\n",
"\n",
"**High** - Daily high: the highest price Walmart shares reached during the trading day.\n",
"\n",
"**Low** - Daily low: the lowest price at which Walmart shares traded during the day.\n",
"\n",
"**Close** - Closing price: the price of Walmart shares at the end of the trading day. It is one of the main indicators in stock analysis because it reflects the final price for the day and is often used to compute daily changes and long-term trends.\n",
"\n",
"**Adj Close** - Adjusted closing price: the closing price adjusted for all corporate actions, such as stock splits and dividends.\n",
"\n",
"**Volume** - Trading volume: the number of Walmart shares bought and sold during the day. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Loading the data**"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Date Open High Low Close Adj Close Volume\n",
"0 1972-08-25 0.021159 0.021566 0.021159 0.021484 0.011664 7526400\n",
"1 1972-08-28 0.021484 0.021647 0.021403 0.021403 0.011620 2918400\n",
"2 1972-08-29 0.021322 0.021322 0.021159 0.021159 0.011488 5836800\n",
"3 1972-08-30 0.021159 0.021159 0.020996 0.021159 0.011488 1228800\n",
"4 1972-08-31 0.020996 0.020996 0.020833 0.020833 0.011311 2611200\n",
"Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1972-08-25</td>\n",
" <td>0.021159</td>\n",
" <td>0.021566</td>\n",
" <td>0.021159</td>\n",
" <td>0.021484</td>\n",
" <td>0.011664</td>\n",
" <td>7526400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1972-08-28</td>\n",
" <td>0.021484</td>\n",
" <td>0.021647</td>\n",
" <td>0.021403</td>\n",
" <td>0.021403</td>\n",
" <td>0.011620</td>\n",
" <td>2918400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1972-08-29</td>\n",
" <td>0.021322</td>\n",
" <td>0.021322</td>\n",
" <td>0.021159</td>\n",
" <td>0.021159</td>\n",
" <td>0.011488</td>\n",
" <td>5836800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1972-08-30</td>\n",
" <td>0.021159</td>\n",
" <td>0.021159</td>\n",
" <td>0.020996</td>\n",
" <td>0.021159</td>\n",
" <td>0.011488</td>\n",
" <td>1228800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1972-08-31</td>\n",
" <td>0.020996</td>\n",
" <td>0.020996</td>\n",
" <td>0.020833</td>\n",
" <td>0.020833</td>\n",
" <td>0.011311</td>\n",
" <td>2611200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1972-09-01</td>\n",
" <td>0.020915</td>\n",
" <td>0.020996</td>\n",
" <td>0.020915</td>\n",
" <td>0.020996</td>\n",
" <td>0.011400</td>\n",
" <td>768000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1972-09-05</td>\n",
" <td>0.020996</td>\n",
" <td>0.020996</td>\n",
" <td>0.020833</td>\n",
" <td>0.020833</td>\n",
" <td>0.011311</td>\n",
" <td>1689600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>1972-09-06</td>\n",
" <td>0.020996</td>\n",
" <td>0.020996</td>\n",
" <td>0.020996</td>\n",
" <td>0.020996</td>\n",
" <td>0.011400</td>\n",
" <td>768000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1972-09-07</td>\n",
" <td>0.020996</td>\n",
" <td>0.020996</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.011356</td>\n",
" <td>3532800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1972-09-08</td>\n",
" <td>0.020833</td>\n",
" <td>0.020833</td>\n",
" <td>0.020752</td>\n",
" <td>0.020752</td>\n",
" <td>0.011267</td>\n",
" <td>1996800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>1972-09-11</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.011356</td>\n",
" <td>2764800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>1972-09-12</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.011356</td>\n",
" <td>1843200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>1972-09-13</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.011356</td>\n",
" <td>460800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>1972-09-14</td>\n",
" <td>0.020915</td>\n",
" <td>0.021077</td>\n",
" <td>0.020833</td>\n",
" <td>0.021077</td>\n",
" <td>0.011443</td>\n",
" <td>3840000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>1972-09-15</td>\n",
" <td>0.020996</td>\n",
" <td>0.020996</td>\n",
" <td>0.020915</td>\n",
" <td>0.020915</td>\n",
" <td>0.011356</td>\n",
" <td>1536000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date Open High Low Close Adj Close Volume\n",
"0 1972-08-25 0.021159 0.021566 0.021159 0.021484 0.011664 7526400\n",
"1 1972-08-28 0.021484 0.021647 0.021403 0.021403 0.011620 2918400\n",
"2 1972-08-29 0.021322 0.021322 0.021159 0.021159 0.011488 5836800\n",
"3 1972-08-30 0.021159 0.021159 0.020996 0.021159 0.011488 1228800\n",
"4 1972-08-31 0.020996 0.020996 0.020833 0.020833 0.011311 2611200\n",
"5 1972-09-01 0.020915 0.020996 0.020915 0.020996 0.011400 768000\n",
"6 1972-09-05 0.020996 0.020996 0.020833 0.020833 0.011311 1689600\n",
"7 1972-09-06 0.020996 0.020996 0.020996 0.020996 0.011400 768000\n",
"8 1972-09-07 0.020996 0.020996 0.020915 0.020915 0.011356 3532800\n",
"9 1972-09-08 0.020833 0.020833 0.020752 0.020752 0.011267 1996800\n",
"10 1972-09-11 0.020915 0.020915 0.020915 0.020915 0.011356 2764800\n",
"11 1972-09-12 0.020915 0.020915 0.020915 0.020915 0.011356 1843200\n",
"12 1972-09-13 0.020915 0.020915 0.020915 0.020915 0.011356 460800\n",
"13 1972-09-14 0.020915 0.021077 0.020833 0.021077 0.011443 3840000\n",
"14 1972-09-15 0.020996 0.020996 0.020915 0.020915 0.011356 1536000"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Date 0\n",
"Open 0\n",
"High 0\n",
"Low 0\n",
"Close 0\n",
"Adj Close 0\n",
"Volume 0\n",
"dtype: int64\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\"data/wmt_data.csv\").head(15000)\n",
"\n",
"print(df.head())\n",
"print(df.columns)\n",
"display(df.head(15))\n",
"print(df.isnull().sum()) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Data cleaning**"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Open High Low Close Adj Close Volume\n",
"0 0.021159 0.021566 0.021159 0.021484 0.011664 7526400\n",
"1 0.021484 0.021647 0.021403 0.021403 0.011620 2918400\n",
"2 0.021322 0.021322 0.021159 0.021159 0.011488 5836800\n",
"3 0.021159 0.021159 0.020996 0.021159 0.011488 1228800\n",
"4 0.020996 0.020996 0.020833 0.020833 0.011311 2611200\n"
]
}
],
"source": [
"df_cleaned = df.drop(columns=['Date'], errors='ignore').dropna()\n",
"print(df_cleaned.head()) # Print the cleaned DataFrame\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Visualizing pairwise relationships**"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"sns.set(style=\"whitegrid\")\n",
"\n",
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned[\"Open\"], y=df_cleaned[\"High\"], alpha=0.6)\n",
"plt.title('Open vs High')\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Low'], y=df_cleaned['Close'], alpha=0.6)\n",
"plt.title('Low vs Close')\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['High'], y=df_cleaned['Adj Close'], alpha=0.6)\n",
"plt.title('High vs Adj Close')\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['Volume'], y=df_cleaned['Adj Close'], alpha=0.6)\n",
"plt.title('Volume vs Adj Close')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first three plots show a strong correlation, while the last one (Volume vs Adj Close) suggests that the correlation is weak and there is no direct dependence."
]
},
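{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visual impression from the scatter plots can be checked numerically. The cell below is a minimal sketch that is not part of the original analysis: it assumes `df_cleaned` from the cleaning step and uses the Pearson coefficient, which only measures linear dependence, so a curved but monotonic relationship may score lower than it looks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Pearson correlation matrix of all numeric columns (illustrative check)\n",
"corr = df_cleaned.corr()\n",
"\n",
"# Report the coefficient for each of the four plotted pairs\n",
"pairs = [('Open', 'High'), ('Low', 'Close'), ('High', 'Adj Close'), ('Volume', 'Adj Close')]\n",
"for x, y in pairs:\n",
"    print(f'{x} vs {y}: r = {corr.loc[x, y]:.3f}')"
]
},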
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Standardizing the data for clustering**\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(df_cleaned)"
]
},
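{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check on the scaling, added here as an illustrative sketch: after `StandardScaler`, every column of `data_scaled` should have a mean close to 0 and a standard deviation close to 1. Standardization matters for this dataset because `Volume` is in the millions while the prices are several orders of magnitude smaller, so unscaled Euclidean distances would be dominated by `Volume`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Per-column mean and (population) standard deviation of the standardized data\n",
"print('means:', np.round(data_scaled.mean(axis=0), 6))\n",
"print('stds: ', np.round(data_scaled.std(axis=0), 6))"
]
},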
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Agglomerative (hierarchical) clustering**\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2 1 2 ... 9 9 9]\n"
]
}
],
"source": [
"from scipy.cluster.hierarchy import dendrogram, linkage, fcluster\n",
"\n",
"linkage_matrix = linkage(data_scaled, method='ward')\n",
"plt.figure(figsize=(10, 7))\n",
"dendrogram(linkage_matrix)\n",
"plt.title('Dendrogram of agglomerative clustering')\n",
"plt.xlabel('Sample index')\n",
"plt.ylabel('Distance')\n",
"plt.show()\n",
"\n",
"# Get the clustering result by cutting the dendrogram at the given distance threshold\n",
"result = fcluster(linkage_matrix, t=10, criterion='distance')\n",
"print(result) # Print the clustering result"
]
},
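{
"cell_type": "markdown",
"metadata": {},
"source": [
"The distance threshold `t=10` leaves the number of clusters implicit. As a hedged sketch (the value `3` below is an assumption, chosen only so that the result is comparable with a three-cluster KMeans run), the same linkage matrix can be cut into a fixed number of clusters with `criterion='maxclust'`, and the cluster sizes can be inspected with `numpy.unique`. The name `result_k3` is introduced here for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.cluster.hierarchy import fcluster\n",
"\n",
"# Sizes of the clusters produced by the distance threshold used in the previous cell\n",
"ids, counts = np.unique(result, return_counts=True)\n",
"print(dict(zip(ids.tolist(), counts.tolist())))\n",
"\n",
"# Alternative cut: request at most 3 clusters (3 is an illustrative assumption)\n",
"result_k3 = fcluster(linkage_matrix, t=3, criterion='maxclust')\n",
"print(np.unique(result_k3, return_counts=True))"
]
},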
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Visualizing the cluster distribution**\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Open'], y=df_cleaned['High'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('Open vs High Clusters')\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Low'], y=df_cleaned['Close'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('Low vs Close Clusters')\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['High'], y=df_cleaned['Adj Close'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('High vs Adj Close Clusters')\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['Volume'], y=df_cleaned['Adj Close'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('Volume vs Adj Close Clusters')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**KMeans (non-hierarchical clustering) for comparison**\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cluster centers:\n",
" [[1.80172461e+00 1.82298328e+00 1.77878707e+00 1.80190896e+00\n",
" 1.12235004e+00 1.40331174e+07]\n",
" [2.01023414e+01 2.02889878e+01 1.99175185e+01 2.01054904e+01\n",
" 1.51394877e+01 3.34414658e+07]\n",
" [4.81867001e+01 4.85778072e+01 4.78128132e+01 4.82043999e+01\n",
" 4.64423199e+01 2.18625744e+07]]\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.cluster import KMeans\n",
"\n",
"random_state = 9\n",
"kmeans = KMeans(n_clusters=3, random_state=random_state)\n",
"labels = kmeans.fit_predict(data_scaled)\n",
"centers = kmeans.cluster_centers_\n",
"\n",
"# Display the centroids\n",
"centers = scaler.inverse_transform(centers) # Undo the standardization\n",
"print(\"Cluster centers:\\n\", centers)\n",
"\n",
"# Visualize the KMeans clustering results\n",
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Open'], y=df_cleaned['High'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 0], centers[:, 1], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Open vs High')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Low'], y=df_cleaned['Close'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 2], centers[:, 3], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Low vs Close')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['High'], y=df_cleaned['Adj Close'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 1], centers[:, 4], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: High vs Adj Close')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['Volume'], y=df_cleaned['Adj Close'], hue=labels, palette='Set1', alpha=0.6)\n",
"# Volume is column 5 of df_cleaned (Open, High, Low, Close, Adj Close, Volume)\n",
"plt.scatter(centers[:, 5], centers[:, 4], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Volume vs Adj Close')\n",
"plt.legend()\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
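{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare the two clusterings with a number rather than by eye, one option is the silhouette coefficient. The cell below is a sketch under stated assumptions: it reuses `data_scaled`, `labels` (KMeans) and `result` (agglomerative) from the earlier cells, and it subsamples 2,000 points (an arbitrary choice) because the exact score needs all pairwise distances. The two labelings use different numbers of clusters, so the scores are indicative rather than a strict like-for-like comparison."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import silhouette_score\n",
"\n",
"# Silhouette lies in [-1, 1]; higher means tighter, better separated clusters\n",
"for name, lab in [('KMeans', labels), ('Agglomerative', result)]:\n",
"    score = silhouette_score(data_scaled, lab, sample_size=2000, random_state=9)\n",
"    print(f'{name}: silhouette = {score:.3f}')"
]
},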
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**PCA for visualization in reduced dimensions**\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1600x600 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"pca = PCA(n_components=2)\n",
"reduced_data = pca.fit_transform(data_scaled)\n",
"\n",
"# Визуализация сокращенных данных\n",
"plt.figure(figsize=(16, 6))\n",
"plt.subplot(1, 2, 1)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('PCA reduced data: Agglomerative Clustering')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.title('PCA reduced data: KMeans Clustering')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Анализ инерции для метода локтя (метод оценки суммы квадратов расстояний)**\n"
]
},
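{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a brief sketch of what the next cell measures, assuming the standard K-Means objective: the value reported by `KMeans.inertia_` is the sum of squared distances from each sample to its nearest cluster center. The symbols $x_i$ (a row of `data_scaled`), $\\mu_j$ (a cluster center) and $n$ (the number of rows) are notation introduced here for illustration.\n",
"\n",
"$$\\text{inertia}(k) = \\sum_{i=1}^{n} \\min_{1 \\le j \\le k} \\lVert x_i - \\mu_j \\rVert^2$$\n",
"\n",
"Inertia generally decreases as $k$ grows, so the elbow method looks for the value of $k$ after which the decrease flattens out."
]
},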
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"inertias = []\n",
"clusters_range = range(1, 11)\n",
"for i in clusters_range:\n",
" kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
" kmeans.fit(data_scaled)\n",
" inertias.append(kmeans.inertia_)\n",
"\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range, inertias, marker='o')\n",
"plt.title('Метод локтя для оптимального k')\n",
"plt.xlabel('Количество кластеров')\n",
"plt.ylabel('Инерция')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Расчет коэффициентов силуэта**\n"
]
},
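{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a sketch of the quantity computed in the next cell, assuming the standard definition used by `silhouette_score`: for a sample $i$, let $a(i)$ be its mean distance to the other members of its own cluster and $b(i)$ its mean distance to the members of the nearest other cluster. The names $a(i)$, $b(i)$ and $s(i)$ are notation introduced here.\n",
"\n",
"$$s(i) = \\frac{b(i) - a(i)}{\\max(a(i), b(i))}, \\qquad -1 \\le s(i) \\le 1$$\n",
"\n",
"`silhouette_score` returns the mean of $s(i)$ over all samples. Values close to 1 suggest compact, well-separated clusters, while values near 0 suggest overlapping clusters."
]
},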
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.metrics import silhouette_score\n",
"\n",
"silhouette_scores = []\n",
"for i in clusters_range[1:]: \n",
" kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
" labels = kmeans.fit_predict(data_scaled)\n",
" score = silhouette_score(data_scaled, labels)\n",
" silhouette_scores.append(score)\n",
"\n",
"# Построение диаграммы значений силуэта\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range[1:], silhouette_scores, marker='o')\n",
"plt.title('Коэффициенты силуэта для разных k')\n",
"plt.xlabel('Количество кластеров')\n",
"plt.ylabel('Коэффициент силуэта')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Average silhouette coefficient: 0.620\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"# Apply K-Means (note: this cell uses its own seed, not the random_state variable defined earlier)\n",
"kmeans = KMeans(n_clusters=3, random_state=42)\n",
"df_clusters = kmeans.fit_predict(data_scaled)\n",
"\n",
"# Evaluate the clustering quality\n",
"silhouette_avg = silhouette_score(data_scaled, df_clusters)\n",
"print(f'Average silhouette coefficient: {silhouette_avg:.3f}')\n",
"\n",
"# Visualize the clusters in the space of the first two principal components\n",
"pca = PCA(n_components=2)\n",
"df_pca = pca.fit_transform(data_scaled)\n",
"\n",
"plt.figure(figsize=(10, 7))\n",
"sns.scatterplot(x=df_pca[:, 0], y=df_pca[:, 1], hue=df_clusters, palette='viridis', alpha=0.7)\n",
"plt.title('Cluster visualization with K-Means')\n",
"plt.xlabel('First PCA component')\n",
"plt.ylabel('Second PCA component')\n",
"plt.legend(title='Cluster', loc='upper right')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The silhouette coefficient ranges from -1 to 1, so an average value of 0.620 indicates moderately good clustering."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}