568 lines
340 KiB
Plaintext
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lab Work 5\n",
"\n",
"Defining the business goal for the clustering task\n",
"\n",
"Business goal: identify time periods with similar market conditions based on historical stock price data.\n",
"\n",
"Problem statement: group time periods (e.g., individual trading days) by similar characteristics of market activity.\n",
"\n",
"Dataset columns:\n",
"\n",
"Date - the date the record refers to, i.e. the specific day on which Starbucks shares were traded.\n",
"\n",
"Open - opening price: the price of a Starbucks share at the start of the trading day. It shows the price at which trading began on a given day and is often compared with the closing price to determine the day's trend.\n",
"\n",
"High - daily high: the highest price Starbucks shares reached during the trading day.\n",
"\n",
"Low - daily low: the lowest price at which Starbucks shares traded during the day.\n",
"\n",
"Close - closing price: the price of a Starbucks share at the end of the trading day. It is one of the main indicators in stock analysis because it reflects the final value of the shares for the day; it is commonly used to compute daily changes and long-term trends.\n",
"\n",
"Adj Close - adjusted closing price: the closing price adjusted for all corporate actions (such as stock splits and dividends).\n",
"\n",
"Volume - trading volume: the number of Starbucks shares traded during the day."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Loading the dataset"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1992-06-26</td>\n",
" <td>0.328125</td>\n",
" <td>0.347656</td>\n",
" <td>0.320313</td>\n",
" <td>0.335938</td>\n",
" <td>0.260703</td>\n",
" <td>224358400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1992-06-29</td>\n",
" <td>0.339844</td>\n",
" <td>0.367188</td>\n",
" <td>0.332031</td>\n",
" <td>0.359375</td>\n",
" <td>0.278891</td>\n",
" <td>58732800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1992-06-30</td>\n",
" <td>0.367188</td>\n",
" <td>0.371094</td>\n",
" <td>0.343750</td>\n",
" <td>0.347656</td>\n",
" <td>0.269797</td>\n",
" <td>34777600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1992-07-01</td>\n",
" <td>0.351563</td>\n",
" <td>0.359375</td>\n",
" <td>0.339844</td>\n",
" <td>0.355469</td>\n",
" <td>0.275860</td>\n",
" <td>18316800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1992-07-02</td>\n",
" <td>0.359375</td>\n",
" <td>0.359375</td>\n",
" <td>0.347656</td>\n",
" <td>0.355469</td>\n",
" <td>0.275860</td>\n",
" <td>13996800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8031</th>\n",
" <td>2024-05-17</td>\n",
" <td>75.269997</td>\n",
" <td>78.000000</td>\n",
" <td>74.919998</td>\n",
" <td>77.849998</td>\n",
" <td>77.849998</td>\n",
" <td>14436500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8032</th>\n",
" <td>2024-05-20</td>\n",
" <td>77.680000</td>\n",
" <td>78.320000</td>\n",
" <td>76.709999</td>\n",
" <td>77.540001</td>\n",
" <td>77.540001</td>\n",
" <td>11183800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8033</th>\n",
" <td>2024-05-21</td>\n",
" <td>77.559998</td>\n",
" <td>78.220001</td>\n",
" <td>77.500000</td>\n",
" <td>77.720001</td>\n",
" <td>77.720001</td>\n",
" <td>8916600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8034</th>\n",
" <td>2024-05-22</td>\n",
" <td>77.699997</td>\n",
" <td>81.019997</td>\n",
" <td>77.440002</td>\n",
" <td>80.720001</td>\n",
" <td>80.720001</td>\n",
" <td>22063400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8035</th>\n",
" <td>2024-05-23</td>\n",
" <td>80.099998</td>\n",
" <td>80.699997</td>\n",
" <td>79.169998</td>\n",
" <td>79.260002</td>\n",
" <td>79.260002</td>\n",
" <td>4651418</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8036 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" Date Open High Low Close Adj Close \\\n",
"0 1992-06-26 0.328125 0.347656 0.320313 0.335938 0.260703 \n",
"1 1992-06-29 0.339844 0.367188 0.332031 0.359375 0.278891 \n",
"2 1992-06-30 0.367188 0.371094 0.343750 0.347656 0.269797 \n",
"3 1992-07-01 0.351563 0.359375 0.339844 0.355469 0.275860 \n",
"4 1992-07-02 0.359375 0.359375 0.347656 0.355469 0.275860 \n",
"... ... ... ... ... ... ... \n",
"8031 2024-05-17 75.269997 78.000000 74.919998 77.849998 77.849998 \n",
"8032 2024-05-20 77.680000 78.320000 76.709999 77.540001 77.540001 \n",
"8033 2024-05-21 77.559998 78.220001 77.500000 77.720001 77.720001 \n",
"8034 2024-05-22 77.699997 81.019997 77.440002 80.720001 80.720001 \n",
"8035 2024-05-23 80.099998 80.699997 79.169998 79.260002 79.260002 \n",
"\n",
" Volume \n",
"0 224358400 \n",
"1 58732800 \n",
"2 34777600 \n",
"3 18316800 \n",
"4 13996800 \n",
"... ... \n",
"8031 14436500 \n",
"8032 11183800 \n",
"8033 8916600 \n",
"8034 22063400 \n",
"8035 4651418 \n",
"\n",
"[8036 rows x 7 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from scipy.cluster.hierarchy import dendrogram, linkage, fcluster\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"df = pd.read_csv(\"data/starbucks.csv\")\n",
"df "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Load the data and select the numeric features\n",
"data = pd.read_csv(\"data/starbucks.csv\")\n",
"features = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']\n",
"\n",
"# Standardize the numeric features (zero mean, unit variance per column)\n",
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(data[features])"
]
},
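{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `StandardScaler` step above replaces every feature with its z-score. The cell below is a minimal check on a toy matrix (made-up values, not taken from `starbucks.csv`). It shows that this is the same as subtracting each column's mean and dividing by its population standard deviation (`ddof=0`). Scaling matters here because `Volume` is in the millions while prices are in dollars. Without it, Euclidean distances in KMeans would be dominated by `Volume`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Toy matrix standing in for two feature columns (hypothetical values)\n",
"X = np.array([[1.0, 100.0],\n",
"              [2.0, 300.0],\n",
"              [3.0, 200.0],\n",
"              [6.0, 400.0]])\n",
"\n",
"scaled = StandardScaler().fit_transform(X)\n",
"\n",
"# Manual z-score: subtract the column mean, divide by the population std (ddof=0)\n",
"manual = (X - X.mean(axis=0)) / X.std(axis=0)\n",
"\n",
"print(np.allclose(scaled, manual))  # True\n",
"print(scaled.mean(axis=0))  # approximately 0 for every column\n",
"print(scaled.std(axis=0))   # 1 for every column"
]
},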
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dimensionality reduction with PCA\n",
"\n",
"We use principal component analysis (PCA) to reduce the data to two dimensions. This makes it possible to plot the data on a plane and get a sense of its structure. We then plot every object (trading day) in the space of the first two principal components."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting seaborn\n",
" Using cached seaborn-0.13.2-py3-none-any.whl.metadata (5.4 kB)\n",
"Requirement already satisfied: numpy!=1.24.0,>=1.20 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from seaborn) (2.1.3)\n",
"Requirement already satisfied: pandas>=1.2 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from seaborn) (2.2.3)\n",
"Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from seaborn) (3.9.3)\n",
"Requirement already satisfied: contourpy>=1.0.1 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.3.1)\n",
"Requirement already satisfied: cycler>=0.10 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.12.1)\n",
"Requirement already satisfied: fonttools>=4.22.0 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.55.1)\n",
"Requirement already satisfied: kiwisolver>=1.3.1 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.7)\n",
"Requirement already satisfied: packaging>=20.0 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (24.2)\n",
"Requirement already satisfied: pillow>=8 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (11.0.0)\n",
"Requirement already satisfied: pyparsing>=2.3.1 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.2.0)\n",
"Requirement already satisfied: python-dateutil>=2.7 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.9.0.post0)\n",
"Requirement already satisfied: pytz>=2020.1 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from pandas>=1.2->seaborn) (2024.2)\n",
"Requirement already satisfied: tzdata>=2022.7 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from pandas>=1.2->seaborn) (2024.2)\n",
"Requirement already satisfied: six>=1.5 in c:\\users\\ateks\\courses\\courses\\.venv\\lib\\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.16.0)\n",
"Using cached seaborn-0.13.2-py3-none-any.whl (294 kB)\n",
"Installing collected packages: seaborn\n",
"Successfully installed seaborn-0.13.2\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install seaborn ## could not install it from the console :("
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA0kAAAIlCAYAAAAAOLPVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/GU6VOAAAACXBIWXMAAA9hAAAPYQGoP6dpAACUVklEQVR4nOzdeXgUVdYG8Leq1+whkBAghE0RURZBQRYBEdxBEBVGxwF0lE9REFwZFRB1FFccRXBFxxERBUZwREVWEdkJRJRFthACIWRPequuut8fnW66SXdIh053Et7f80RJVS+nbi+pU/fecyUhhAAREREREREBAORIB0BERERERFSXMEkiIiIiIiLywiSJiIiIiIjIC5MkIiIiIiIiL0ySiIiIiIiIvDBJIiIiIiIi8sIkiYiIiIiIyAuTJCIiIiIiIi9MkoiIiIiIiLwwSapjWrduDUmSfH5MJhPS09MxcuRI/Pzzz1Xe/+jRo3jmmWdw5ZVXIjk5GQaDAYmJiejWrRsmTpyILVu2VHl/h8OB5ORkSJKE1NRUOJ3OUB6ex/Tp0ysd55nHumHDhlp5biIiIiKiqkhCCBHpIOi01q1b48iRI+jTpw8uuOACAEBRURG2bt2KY8eOQZIkvPbaa5g8eXKl+77yyit49tln4XA4EBsbi549eyIlJQWlpaXIzMzEkSNHAACPP/44XnnlFb/P/9VXX+GOO+7w/P7f//4Xt9xyS8iPc/r06XjuuefQtGlTXH/99Z7tRUVFyMjIwJEjRyBJEmbPno0HHngg5M9PRERERBSIPtIBkH9///vfMWbMGM/vNpsN48aNw7///W888cQTuPnmm9G+fXvP/qeeegozZ86EwWDAa6+9hoceeggmk8nnMTdu3Iinn34a+/btC/i8H330EQCgRYsWOHbsGD766KNaSZLcOnTogE8++cRnm6qqePzxx/Hmm29i8uTJuP3229GkSZNai4GIiIiIyBuH29UTZrMZs2fPRkxMDFRVxeLFiz37Vq5ciZkzZwIAvvzySzz66KOVEiQAuPLKK/HTTz/h0Ucf9fscR48exYoVK6DT6bBw4UJIkoTvvvsOx48fr52DCkCn0+Gf//wndDodbDYbfvnll7A+PxERERGd35gk1SOxsbG46KKLAACHDx/2bH/hhRcAAEOHDsXw4cOrfAxJknDVVVf53ffxxx9D0zTccMMN6N27NwYOHAhVVfHpp5+G5gCCYDabkZiYCACV5kV98sknkCTJp6fNzW63o3379p45Tt4KCgowevRoXHbZZUhOTobRaERqair69OmDuXPnwuFweG67evVqSJKEDh06INCIVJvNhsaNG0OSJPz++++e7Zs3b8YTTzyBHj16IDU1FUajEU2bNsWQIUPw008/VXnca9as8TtXy/vH2+HDhyFJElq3bu338UaPHu2535o1a3z2TZkyBX379kWzZs1gNpvRqFEjdO7cGVOnTsXJkycrPdbvv/+OadOmoU+fPmjRogWMRiMaN26MQYMGYeHChVUez4ABAwIe84ABA/zGdy7HBgCZmZm466670KpVK5hMpkrtWFVMVfE3b9D758yeUSD8bRdo+9keuzrP6W3MmDGVjnnjxo0wGo2IiopCRkZGpfvs2rUL0dHRMBgM1b4A4v7MV+fH+7vRraCgAP/4xz9wySWXIDo6GnFxcejevTteeeUVWK3WgM977NgxPP744+jUqRPi4uIQExOD9u3bY8yYMQHnTNbk/QEAX3/9Na6//nrPd1OLFi3w17/+1ee7pbqq21bTp0/3e/8FCxbgmmuuQVJSEkwmE1q1aoV77rkn4CgE9zH7a3sg8Hd2Tb/LN23ahLi4OMTHx/udZ+vvPoDruzk+Ph5xcXHYtGlTpf1WqxWvv/46rrzySiQmJsJsNuOiiy7CE088gfz8/Gofl7dAbVNVm82bN6/K1+jUqVN47LHH0LFjR0RHR5/178TZeN/ngw8+QPfu3RETE4
PExETceOON2Lhxo9/71eR7bdOmTRgxYgQuuugiJCYmet5fQ4YMwXfffVfp9t6f/X79+gU8hsGDB5/1M7Zy5UrceuutaNasGYxGI1JSUjB8+HD8+uuvIW2Xmn4e3H744QfcfPPNSElJgdFoRPPmzTFy5Ehs3bo14PELIbB48WLcfPPNnvOO1NRU9O3bFzNnzvR8z53t+ynQ30h/3/Vue/fu9fyNrenf1bqKw+3qmZKSEgDw9BQVFRVh3bp1AFwnjTUlhMC8efMAAPfcc4/n/ytXrsS8efPw1FNP+b3fmDFj8Omnn2L06NEBv5hq4uDBg54/Spdcckm17/fqq69i//79fvcVFBRg4cKF6NSpE/r06YOYmBicOHEC69evx4YNG/DNN99g+fLlAICrr74anTp1QmZmJn766ScMHjy40uN98cUXKCgowNVXX42OHTt6tv/jH//A6tWrcckll3i+VA8cOIBvv/0W3377LWbNmoWJEydWeRxnztUCEHSyun79evz73/8OuH/+/PmIjo5G9+7dkZCQgJKSEvzyyy94/vnnMW/ePOzYscNnmOMbb7yBjz76CB06dECnTp2QmJiIrKwsrF69GitXrsTGjRvxxhtvBBVjTZ3t2Hbu3InevXvDYrGgefPmuPHGG5GQkAAAOHHiBH744YdzjmHEiBGIjY31ienAgQN+b1uX2q62XXnllXj55Zfx6KOP4o477sC2bdsQFxcHACgtLcXtt98Oq9WKV155BX369Anqsdu1a4e+ffv63ff111+jvLy80vaDBw9i4MCBOHLkCJKTk3HjjTdCURSsXr0aTz75JL788kv89NNPaNSokc/9Vq5cidtuuw1FRUVISUnBNddcA6PRiMOHD2P+/PkAgN69eweMtbrvD6fTibvuugsLFy6EyWRC9+7d0aJFC+zbtw+ff/45Fi9ejMWLF1f6PqiOQH8TMjIysHPnzkrbhRAYM2YM/v3vf0Ov16Nfv35ISUnB9u3bMW/ePHz55ZdYtGhRjWIJVlXf5T179sTSpUtx44034vrrr8fatWtx6aWXVvl4v/32G2644QYoioLly5ejZ8+ePvtzcnJw/fXXIzMzE0lJSbjiiisQFxeH7du349VXX8VXX32FNWvWoFWrViE7Rn8KCwvx5JNPBtxvsVjQu3dv7N+/H9HR0RgwYICn2BIQ/N8Jb5MnT8asWbPQp08f3HLLLcjMzMTy5cuxYsUKLFy4sNJF2Jp8r2VmZmL16tXo1KkTOnXqBL1e7/P38aWXXgp4vvHzzz8jIyMDXbt29dm+e/fus16AfOyxx/D6669DlmVcfvnluOqqq5CVlYVvvvkGy5YtwwcffICxY8eGpF3OxbPPPosXXngBkiShd+/eSE9Pxx9//IGFCxdi0aJFeP/99z3naG6KomDUqFFYvHgxZFlGjx49MHDgQJw6dQq///47nnrqKYwcORKtW7fGbbfdhlOnTvnc3/2eOfP7okOHDtWK+aGHHvK5yNygCKpTWrVqJQCIefPmVdq3c+dOIcuyACA+/vhjIYQQK1euFAAEAJGVlVXj5/3xxx8FAJGSkiIcDocQQgir1SoSExMFALFu3Tq/9xs9erQAIEaPHh3U802bNk0AEP379/fZXlRUJFauXCm6du0qAIiRI0dWuu+8efP8PuehQ4dEVFSUSE9P97SJN6fTKRRFqfR4R44cEcnJyQKA+O233zzbP/jgAwFADB061O8xdO/eXQAQixYt8tn+3XffiZycnEq337Bhg4iPjxcGg0FkZ2f7fcyffvpJABADBgyotM/fMR06dEgAEK1atfLZriiK6NSpk9DpdKJ58+YCgFi9erXPbWw2W6XnsFgsYsCAAQKAeOedd3z2rVmzRhw4cKDSffbs2SPS0tIEALFp0yaffatXr/b7Onvr37+/3/jO5djuvfdeAUBce+21nvdzMDFVpWXLlgKAOHTokM9292fB32c33G0XaPvZHjvYtqnqmIcNGyYAiFGjRn
m2jRo1SgAQN998s9A0rVrPIUTgz7w393fnma9Lz549PZ/jsrIyz/aTJ0+Kbt26CQDizjvv9LlPVlaWSEhIEADEU08
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"# Dimensionality reduction with PCA\n",
"pca = PCA(n_components=2)\n",
"data_pca = pca.fit_transform(data_scaled)\n",
"\n",
"# Visualize the data after PCA\n",
"plt.figure(figsize=(10, 6))\n",
"sns.scatterplot(x=data_pca[:, 0], y=data_pca[:, 1], alpha=0.7, edgecolor=None)\n",
"plt.title(\"PCA: data after dimensionality reduction\", fontsize=16)\n",
"plt.xlabel(\"Principal component 1\")\n",
"plt.ylabel(\"Principal component 2\")\n",
"plt.grid()\n",
"plt.show()"
]
},
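{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before interpreting the 2-D scatter plot, it is worth checking how much of the original variance the two components keep. `PCA` exposes this as `explained_variance_ratio_`. The cell below is a sketch on synthetic data (an assumption, not the Starbucks data). In it, four price-like columns move together, as `Open`, `High`, `Low` and `Close` typically do, so most of the variance lands in the first component. In this notebook the equivalent check would be printing `pca.explained_variance_ratio_` after the cell above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"rng = np.random.default_rng(0)\n",
"# Synthetic stand-in (not the Starbucks data): four strongly correlated\n",
"# price-like columns plus one independent column playing the role of Volume\n",
"base = np.cumsum(rng.normal(size=500))\n",
"prices = np.column_stack([base + rng.normal(scale=0.1, size=500) for _ in range(4)])\n",
"volume = rng.normal(size=500)\n",
"X = StandardScaler().fit_transform(np.column_stack([prices, volume]))\n",
"\n",
"pca_demo = PCA(n_components=2).fit(X)\n",
"ratios = pca_demo.explained_variance_ratio_\n",
"print(ratios)        # share of the total variance captured by each component\n",
"print(ratios.sum())  # share retained by the 2-D projection"
]
},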
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Determining the number of clusters\n",
"\n",
"We use two methods:\n",
"\n",
"Elbow method: plot the inertia (the sum of squared distances from each point to the centroid of its cluster) against the number of clusters. The method suggests the number of clusters after which inertia stops decreasing substantially (the \"elbow\" of the curve).\n",
"\n",
"Silhouette coefficient: for each number of clusters, compute the mean silhouette coefficient. It measures clustering quality by comparing how close each point is to its own cluster versus the nearest other cluster (values range from -1 to 1). The plot helps choose the number of clusters with the highest silhouette value."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA2QAAAImCAYAAAA8D0kbAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/GU6VOAAAACXBIWXMAAA9hAAAPYQGoP6dpAACBHklEQVR4nO3dd3gU1f7H8c+mJ5CEntCJgvQOQuhEioAKUgTlh4AdAYFYUQSxXJSrgFJEEEGvogKCXSDSO4KECwiIEgWBhE6oSUjm98fc3WTJJqRPyvv1PPMkmTm7+93ds4FP5sw5NsMwDAEAAAAA8pyb1QUAAAAAQFFFIAMAAAAAixDIAAAAAMAiBDIAAAAAsAiBDAAAAAAsQiADAAAAAIsQyAAAAADAIgQyAAAAALAIgQwAAAAALOJhdQEAAGRFbGys1qxZo//+97+Kjo7WlStXFBsbq7Zt22r06NFWlwcAQIYQyFCkVatWTX///bck6amnntK7776bZtt///vfeu655yRJ7u7uun79ep7UCMBZQkKCXn75ZU2fPl1XrlxJdfzs2bMEMgBAgWEzDMOwugjAKikDWenSpXX8+HF5eXm5bFu7dm0dOHBAEoEMsEpiYqLuvvtu/fTTTwoJCdH48ePVqVMnVapUyerSAADIEq4hAyQ1a9ZMZ86c0TfffOPy+ObNm3XgwAE1b948jysDkNK///1v/fTTT+rYsaMiIyM1ZMgQwhgAoEAjkAGSHnroIUnSRx995PL4vHnznNoByHvXr1/XO++8o9KlS+urr75SQECA1SUBAJBtBDJAUv369dWsWTOtXLlSx44dczp26dIlLVq0SJUqVVKXLl3SvZ/r16/rww8/VIcOHVSqVCl5e3srJCREw4YN09GjR53aDhkyRDabLcPbjVasWKG77rpL5cqVk5eXlypUqKD+/ftrx44d6da4YMGCdB+nWrVqGXvRbvDXX3/d9Dn89ddfqW73zz//aOTIkapRo4Z8fHwUGBio1q1b64MPPlBiYmKa9Q8ZMiTVsYiICPn5+alYsWJavXp1hp7vjdvatWtTPV779u1VqlQpubu7p2q/YMGCTL9Wa9euzfT7bZfV1/nbb7/VnXfeqXLlysnDwyPVbV555ZUM1//KK6+kur2Pj49CQkLUv39/bdu2LdVt0nvfXOnQoUOq92P79u06ffq0evTooSVLlqh169YKDAyUj4+PbrnlFj3++OM6fPhwmvd59uxZvfjii6pbt678/Pzk7++vpk2bavLkybp69Wqq9vb3qUOHDrpy5YpefPFFVa9eXT4+PqpQoYIefvjhVL8v7H7++WeNHDlSjRo1UpkyZeTt7a1KlSqpf//++uWXXzL0GqSU2X6fUmY/Yzc+/4z2U3vfTOt3yODBg11+zm52O/vvyrQ+a6tWrVLv3r1Vvnx5eXl5qVy5crr33nu1ZcuWNJ/blStXNG3aNLVp00YlS5aUt7e3qlatqrvvvlsLFy50qiujW8rPULVq1VIdL1mypOrXr68JEybozJkzqWr6+++/9dZbbyksLExVqlSRt7e3SpQooTZt2uiDDz5QUlJSms8nLWn9Pjl//ryaN28um82mJ598Uq6uXnH1OU+5dejQwal9QkKCPv30Uw0cOFC1atVSQECAfH19VbNmTT311FM6fvx4mnUahqGlS5fqrrvuUnBwsLy8vBQcHKw2bdrorbfecnw+Xb2uGa1PynxfSfn6zZ07V02bNlWxYsVUokQJde/eXVu3bk3zOWX1903KzcPDQxUrVtQdd9yhL7/8Ms3HQsHHpB7A/zz00EPasWOHFixYoJdeesmxf9GiRbp06ZJGjRolN7e0/4Zx8eJF3XPPPVq7dq2KFy+upk2bqmzZstqzZ49mz56txYsXKyIiQo0bN5YktWnTJtV9LF++XDExMeratauCg4PTfKyXX35Zr7/+umw2m1q1aqUqVapo//79WrRokb766ivNmTPnpmfzbr31Vq
caLl26pK+++ird22REsWLF1LdvX6d9S5Ys0eXLl1O1/eWXX3TnnXfq7NmzqlKlinr16qULFy5o7dq12rx5s5YtW6Zvv/02zev6UoqIiFDPnj1ls9n0/fffq2PHjpKk6tWra/DgwU5tIyMjtXv3bjVs2FCNGjVyOpbydR83bpzeeOMNSeaw1ho1ajhq2bhxo/7888+bvyDpCAoK0p133um07+OPP87QbTPzOs+dO1ePPfaYJKlu3boKCwuTj4+PpOTXIitSvn5Xr17Vr7/+6uiDP/74403/gJFZ//zzjyRp8eLF+uSTT+Tt7a327durZMmS2rFjh+bMmaOFCxfq66+/1h133OF028OHDyssLEx///23ypYtq+7duyshIUFr1qzR888/ry+//FI///yzSpYsmepx4+Pjdccdd+i///2vOnTooCZNmmjjxo366KOP9OOPP2r9+vWqUaOG022eeOIJHT16VHXr1lXr1q3l4eGhAwcOaNGiRVq6dKm++OIL9enTJ9uvSVr93i4nPmPZ6ad2Gzdu1CeffJKp22TEM888o3feeUdubm5q1qyZ2rZtqyNHjuibb77Rd999p7lz52ro0KFOtzl69KjuvPNO/fbbb/Lz81Pr1q1VunRpHTt2TBs2bNCePXv0wAMPqHjx4ql+d0RHR2vFihUuX5Mbf5dIUp8+fVS8eHFJ0unTp7V27Vq9+uqr+uKLLxQZGSlfX19H2//85z96+eWXFRISottuu02tW7fWiRMntGXLFm3atEkrV67UkiVL0v2DTUacP39enTt31o4dO/Tkk09qxowZ6d7njb8n7a/BjWJiYjRo0CAFBgaqdu3aatCggS5fvqzIyEhNnz5dX3zxhTZv3qzq1as73S4hIUEDBgzQ0qVL5ebmpttvv11hYWE6ffq0fvvtN73wwgvq37+/qlWrpr59++r06dNOt7f3xRvfq1q1ajn9nJW+YhceHq5p06apdevW6tmzp/bs2aOffvpJERERWrRoke69916n9tn5fZOybyUkJOiPP/7Q6tWrtXr1au3du1evvfaayxpRwBlAEVa1alVDkrFhwwbj/Pnzhq+vr1G9enWnNq1btzZsNpvx559/GlFRUYYkw93dPdV9PfDAA4Yk46677jJiYmKcjk2dOtWQZNSoUcO4fv16mvW0b9/ekGSsWbMmzTY//fSTIcnw8fExVq5c6XTsww8/NCQZnp6ext69e13eft68eYYkY/DgwU777c+tatWqaT52ev744w9DklGtWrVUx+yvc1RUlGPftWvXHPufeOIJIz4+3nHszz//NKpVq2ZIMl588UWn+5o/f36q+leuXGn4+voafn5+6b52dhMmTDAkGRMmTEizTVxcnOHn52dIMj7++ONUxwcPHmxIMubPn3/Tx7vRqlWrDElG+/btUx2TZKT3qzmzr7NhGMatt95qSDImTpyY6jYZeS0yepukpCRj2LBhhiSjT58+TsdcvW/pcfVZsN+HJCMkJMT4448/HMcSExON5557zpBklC5d2jhz5ozT/bVo0cKQZNxzzz3GpUuXHPtPnjxpNGnSxJBkPPDAA063WbNmjePxqlevbvz999+OY1evXjX69OljSDJatmyZqv5ly5YZZ8+edbnfw8PDKF26tHHlypUMvRYpn3tm+n1WP2N2P//8syHJ6NChQ6pjrvppWr9DEhISjPr16xvu7u5GhQoVUr2vN/vdk9Znbc6cOY73Zvfu3U7H1q1bZ/j7+xteXl7G77//7tifmJhoNGvWzJBkdOnSxTh58qTT7a5evWr88MMPLuswjOQ+4eqzm1Jan8WYmBijSpUqhiTju+++czq2fft2Y8+ePanu69ixY0bDhg0NScaiRYvSfdwb3fg+nTt3zvH8hw8fnu5tx40bZ0gyXnnlFaf9ab0GsbGxxjfffGPExcU57Y+PjzfGjh1rSDK6d++e6nHCw8Mdv9MiIyOdjiUlJRk///yzcf78+Qw/R1ey0ldS3revr6
+xatUqp2OTJ082JBmBgYGp/s3Pzu8bV33riy++cPxuQ+HEkEXgfwIDA9W7d2/98ccfWrdunSTp4MGD2rRpk9q3b69
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1cAAAIlCAYAAAA5XwKOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/GU6VOAAAACXBIWXMAAA9hAAAPYQGoP6dpAACjbklEQVR4nOzdd1QU198G8GeWjoCgdFCxd7GjYlfsHURFxR5rRI0akxh72k9jjbFGAUvsGoyKYsOOHSzYERWsWFA67Lx/8O5GZFEWdhnK8zmHo8zOzD57d5nd786dewVRFEUQERERERFRrsikDkBERERERFQYsLgiIiIiIiLSABZXREREREREGsDiioiIiIiISANYXBEREREREWkAiysiIiIiIiINYHFFRERERESkASyuiIiIiIiINIDFFRERERERkQawuCpgnJycIAgCfH19s1zn33//hYGBAQRBwLfffpt34bJBEAQIgqCRfSna4uHDhxrZHxERERFRbrC4KmT27dsHd3d3JCcnY/Lkyfjtt9+kjkREREREVCSwuCpEDhw4oCysJk6ciPnz50sdiYiIiIioyGBxVUgEBgaiZ8+eSEpKgo+PDxYuXCh1JCIiIiKiIoXFVSFw6NAhZWH19ddfY/HixZ9d//z58/D09IS9vT309fVhbW2Nrl27IigoKNO6d+/exYQJE1CjRg0UL14cxsbGqFSpEkaPHo27d+9meR+vX7/GiBEjULJkSZiZmWHAgAF4/fq18vY3b95g0KBBKF68OCwsLODt7Y3nz5+r3FdCQgKmTp0KOzs7FCtWDF26dMGjR4+UtycmJmLChAmwsrKCqakpunfvjnv37mXaz/HjxyEIAlq2bKnyflq2bKm8JuzT67i+dH2Xr68vBEHA4MGDs9zv8ePHM9127Ngx5X2q2hYA7ty5g5EjR6J8+fIwNDRE8eLF0bx5c2zcuPGzj0PV/QGq22Hw4MHKHNn5UUdUVBSmTJmCmjVrwtTUFMWKFUOlSpUwePBgnDlzJsO6n9v/zJkzlbdndc3hrFmzPpv748es2N/IkSOzzH7+/HkIggAHBwekpqYiNTUVCxcuRMuWLWFnZwd9fX3Y2Nigc+fO2LVrl9p5Pv35+PW1a9cuDB8+HDVq1ICFhQUMDQ1RtmxZDB06FLdv3866wb/g4cOHauX41Je2VfW6O3nyJHr27Al7e3vo6ell2iar174qH/+txcTEYOzYsShdujQMDAxQpkwZTJw4EW/evFG5bU7adOHChWjdujUcHR1hZGQEMzMzVK1aFRMnTkRERESWOb/0N/XpY1a8VmbNmpVpX69evULJkiUhCAKcnJyyvR2Q9d/Ul/6Ws9rvl+7vU6qOnb///jsEQUClSpXw/v37TNusWbMGgiCgVKlSePXqVbbuR9Heqo4N8+bNgyAIqFixIp48eZLpdnXeEz+mzvEG+Pz7RFJSEipVqqTyefncdsDn359SU1Oxdu1atGzZEiVKlICBgQHKli2L0aNH4/Hjx1k+tuwctxW5svujODaoOgbJZDLY2NigSZMmWL16NVJTU7PMpk4bJCUloUuXLhAEAd27d0dycnKmbb/0OD79mwNyd3w+evQoevfuDUdHRxgYGMDKygoNGjTAzJkzERMTAyDj55Ev/ajKd+nSJfTv3195bCxRogTat2+P/fv3f7H9du/ejaZNm8LMzAympqZo2bJlltsBQHx8PH799VfUrVsXpqamMDY2RvXq1TF9+nSVx2JtPP/5ja7UASh3goKC0L17dyQmJmLMmDFYunTpZ9dfs2YNRo0aBblcjjp16qBly5aIjIzEv//+i3///RezZs3CzJkzlevv27cPS5YsQfXq1dGqVSvo6+sjLCwMK1euhK+vL3x9fdGnT58M9xEXF4dWrVohLCwM9vb2aNu2LUJCQtCtWzflOt26dUNUVBQ6duyIs2fPYsOGDTh79izOnTuHkiVLKteTy+Xo3r07goKCYGFhgY
4dO+Lu3bto27YtEhISAAAjRozAvXv30Lp1a4SFhSEgIABnzpzB2bNnUaFChWy148aNGxEcHJytdTUlJSUFY8eO/ew627dvh7e3NxITE1GlShV06tQJ7969Q0hICAYOHIijR49i3bp1uc7StGnTTMsCAwPx/PlztG/fHra2tjna75EjR+Dh4YG3b9/C2toabdq0gb6+Ph4+fIjNmzcDAJo0afLF/dy/f1+t6wednZ1Ru3Zt5e/Pnj3DwYMHM6wzevRo/Prrr9i0aRN+++03mJubZ9rP8uXLAQAjR46Erq4u3r59i2+++QbW1taoUaMGbGxsEBUVhcOHD2P//v3o3bs3Nm/eDF3d9ENr7dq1MWjQoAz7PHXqFO7fvw9XV9dMr08TExPl/z09PWFgYIBq1aqhdevWSE1NxfXr17F+/Xps27YNhw4dylbbZaVYsWLw8PDIsGzHjh2Ii4vL1vafPi7F6+VTgYGB6NKlC9LS0lCuXDn06NEDxYoVAwDcu3cPp0+fzlH+N2/ewMXFBTExMcoPIsePH8fixYtx4MABnDx5ElZWVhm2yUmb7tmzB1FRUahVqxbMzc2RkJCAkJAQLF68GOvWrcP58+dRuXLlLHN++jzn5DFPmzYtw5dTBd0333yDEydOICAgAF999RX+/vtv5W2hoaEYP348dHV1sXXrVlhaWubqvubNm4cff/wRFSpUwPHjx+Hg4JDhdnXfE1XJzvHmS+bPn//ZLyxz4v379+jWrRuOHz8OExMT1KtXD1ZWVrh27RpWrlyJ7du3IygoCHXq1MmwXXaP2xUqVMh0HLh69SpCQ0MztQmATO8jHx+D0tLS8OjRI5w6dQpnz57FyZMnsWHDhlw9/qSkJLi7u2Pfvn3o3r07tm/fDj09vSzXL1++fIb3wg8fPmDnzp0q183p8Xn8+PFYtmwZgPT3h2bNmuHdu3e4ffs25syZg1atWqFly5bo0KFDpqJJcXx2d3fP8F7x6d/IkiVLMGnSJMjlctSuXRsuLi549uwZjh8/jkOHDmH27NmYMWOGyse1dOlSLFq0CPXr10eXLl1w//59BAcHIzg4GEuXLsXXX3+dYf3Xr1+jTZs2uHr1KszMzNC6dWvo6ekhODgYP/30EzZv3oyjR4+qLAC1/fxLSqQCpUyZMiIAcf369eLhw4dFIyMjEYBoYWEhxsXFfXbbsLAwUVdXVxQEQfT3989w2/79+0V9fX0RgHjo0CHl8itXroi3bt3KtC9/f39RJpOJ+vr64pUrVzLcNn36dBGA2Lx5c2WmpKQksUePHiIAEYDYrVs3MTExUXlbt27dRADiV199lWFfa9euFQGIVapUEV++fCmKoijK5XJx7Nixyn25uLiIb968EUVRFNPS0sTRo0eLAMR27dpl2NexY8dEAGKLFi0yLH/37p1oa2srmpqaiubm5iIAMSIiIsM6inb/dLnC+vXrRQDioEGDMt3WokULEYB47NixDMt/++03EYBYunRplduGhYWJBgYGoqGhobhz584Mtz18+FCsWbOmCED08/PL1v19qR2ymzu7Hj16JBYvXlwEIE6bNk1MSkrKcPvz58/FkydPZlimeE4/1bFjxwxttX79epX3qXjtzZo1K8PyrB5z//79RQDiwoULM+3r5cuXooGBgainpyc+ffpUFEVRTEhIEPfu3SumpqZmWPfBgwdixYoVRQDilClTVGZTGDRo0Gcfg8KWLVvEDx8+ZFgml8vF5cuXiwDE6tWri3K5/LP7UOX+/fsiALFMmTKZbvvS6zw1NTXL5yir10ubNm1EAOKwYcMy5f3c301WFNsAEBs1aiTGxMQob3vz5o3YpEkTEYDYt2/fTNvmpE0Vx6mPpaamil5eXiIAcfLkySpzDhgwQAQg+vr6qsz/6WOeOXOmCECcOXNmhuVnz54VBUFQvvY/fd6y2k4hq+crq+Vf2u+X7u9TWb2m3rx5Izo5OYkAxBUrVoiiKIqxsbHKv6P58+dna/8Kqv
6u5s6dKwIQK1asKD558iTTNjl5T/yYusebrJ77iIgI0cjISPkcf/q8fOnvJKs2VrxGu3TpIj5//jzDbYsWLVK2zcf
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Elbow method: inertia for k = 1..10\n",
"inertia = []\n",
"for k in range(1, 11):\n",
"    kmeans = KMeans(n_clusters=k, random_state=42)\n",
"    kmeans.fit(data_scaled)\n",
"    inertia.append(kmeans.inertia_)\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(range(1, 11), inertia, marker='o', color='blue', linestyle='--')\n",
"plt.title(\"Elbow method for choosing the number of clusters\", fontsize=16)\n",
"plt.xlabel(\"Number of clusters\")\n",
"plt.ylabel(\"Inertia\")\n",
"plt.grid()\n",
"plt.show()\n",
"\n",
"# Silhouette coefficient for k = 2..10 (it is undefined for a single cluster)\n",
"silhouette_scores = []\n",
"for k in range(2, 11):\n",
"    kmeans = KMeans(n_clusters=k, random_state=42)\n",
"    kmeans.fit(data_scaled)\n",
"    score = silhouette_score(data_scaled, kmeans.labels_)\n",
"    silhouette_scores.append(score)\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(range(2, 11), silhouette_scores, marker='o', color='green', linestyle='-')\n",
"plt.title(\"Silhouette coefficient for different numbers of clusters\", fontsize=16)\n",
"plt.xlabel(\"Number of clusters\")\n",
"plt.ylabel(\"Silhouette coefficient\")\n",
"plt.grid()\n",
"plt.show()"
]
},
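{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both criteria can be made concrete. In scikit-learn, `inertia_` is the sum of squared distances from each point to the centroid of its assigned cluster. `silhouette_score` averages (b - a) / max(a, b) over all points, where a is the mean distance to the point's own cluster and b the mean distance to the nearest other cluster. The sketch below recomputes inertia by hand and picks the k with the highest silhouette. It runs on synthetic blobs with three well-separated centers (an illustrative assumption, not the Starbucks data). `n_init=10` is set explicitly because its default changed across scikit-learn versions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.datasets import make_blobs\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"# Synthetic data with three well-separated groups (illustrative assumption)\n",
"X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [0, 10]], cluster_std=0.5, random_state=0)\n",
"\n",
"scores = {}\n",
"for k in range(2, 7):\n",
"    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)\n",
"    # Inertia: sum of squared distances from each point to its assigned centroid\n",
"    manual_inertia = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()\n",
"    assert np.isclose(manual_inertia, km.inertia_)\n",
"    # Mean silhouette over all points\n",
"    scores[k] = silhouette_score(X, km.labels_)\n",
"\n",
"best_k = max(scores, key=scores.get)\n",
"print(scores)\n",
"print('best k by silhouette:', best_k)"
]
},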
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Clustering with KMeans\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA0kAAAIlCAYAAAAAOLPVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/GU6VOAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC1VklEQVR4nOzdd5xTVf7/8ddNmd7oHUVARRERRAQVkaKioCj2VQHdXVcREeyuvWH5ihVlLQs2flZYCyoiTRBBBBFQUOkgHYbpk3bP74/MhISZgckwk1Dez8djHkzuvbn3k5NMyCfnnM+xjDEGERERERERAcAR7wBEREREREQOJEqSREREREREwihJEhERERERCaMkSUREREREJIySJBERERERkTBKkkRERERERMIoSRIREREREQmjJElERERERCSMkiQREREREZEwSpIkZo488kgsy2LcuHEVHvPFF1+QmJiIZVncddddsQtOqsWgQYOwLItBgwaVuz8QCHDddddhWRaZmZnMnDkztM+yrNDPc889t9frDBkyJHRsq1atqvMhiIjIQWrcuHER/5dYloXD4SAzM5NTTjmFxx9/nPz8/Arvb4zh448/5sorr6RFixakpqaSlJREs2bN6Nu3L6+99hp5eXl7jWHUqFGha7/00kvV/RAlhpQkyQFj0qRJDBgwAK/Xy+23385TTz0V75CkGnk8Hi699FLGjh1LvXr1mD59OmeeeWa5x44dO7bC8xQXFzN+/PiaClNERA5yqampDBw4kIEDB/K3v/2Ntm3bsmDBAu677z46duzIli1bytxn1apVdOzYkUsvvZT333+f5ORkzj77bPr378+RRx7JlClTuOGGGzjqqKNYu3Zthdd+8803Q7//97//rZHHJ7HhincAIgBfffVVKEEaPnw4zzzzTLxDkmqUn59P//79mTp1Ks2bN2fKlCkcffTR5R578skn89NPPzF//nw6depUZv+ECRPYtWsXnTp1Yv78+TUduoiIHGTq1q1bZtTKjz/+SM+ePfnjjz+44447ePvtt0P71q1bR5cuXdi6dStdunRhzJgxtGvXLuL+eXl5vPrqqzz++ONkZ2dzxBFHlLnu3Llz+e2338jKysLn87Fo0SIWLlxIhw4dauRxSs1ST5LE3ddff81FF12Ex+Nh2LBhjBo1Kt4hSTXasWMHPXv2ZOrUqbRp04bvv/++wgQJ4LrrrgMq/gau9Fu60uNERET25ZRTTuG2224Dgl+2+f3+0L6rr76arVu3csoppzBt2rQyCRJAeno6d955JwsWLKBBgwblXqP0/6crr7ySSy+9NGKbHHyUJElcffPNN6EEaejQoTz//PN7PX7NmjVlxhvv+bNmzZqI+/z444/ceeednHLKKTRs2JCEhAQaNGhAv379+Pbbb/d6vT/++IObbrqJY445hpSUFDIyMjjuuOO46aabWLp0KQAPPfTQPmPaW3wbN25kxIgRtGnThpSUFNLT0+nUqRMvv/xyxJt4qdJ5P+PGjeOXX37h4osvpl69eiQnJ9OuXTteeOEFAoFAmfuVxvnQQw/t9TGXKp1Dtme80di4cSPdunXjxx9/pFOnTsyaNYumTZvu9T59+vShYcOGvP/++xQXF0fsW716NdOnT6dLly4ce+yxez1PUVERzz77LKeeeipZWVkkJSVxzDHHcOedd7Jjx44yx/t8Pt59913+9re/ceyxx5KRkUFycjLHHHMMt9xyCxs3biz3Ot27d8eyLGbMmMGiRYu4+OKLqVu3LomJiRx33HE8++yzGGPK3M/j8fDMM8/QsWNH0tPTSUhIoGHDhnTq1Ik777yTnTt37vXxVVbp8xj+U6tWLU444QQefPDBctsi/DHtafr06aHz7Dn37KuvvuKCCy6gZcuWofY76qijuOKKK/j+++8jjh04cCCWZTFy5MgKY//www+xLItTTjkltK2qz1Op0r+fin72fEx7+7vZvn07derUwbIsjj
zyyIh9K1as4Morr6Rt27bUrl2bhIQEmjRpQs+ePRk/fnyZ18SMGTOwLIvu3buXG3fpc1Le32T43+rEiRM5/fTTycjIID09ne7du/Pll19W2B6FhYU8+eSTdOjQgfT0dFJSUjj++OO57777yM7OLnN8ee/BDoeDBg0a0LVrV1577bVy37eq8j5cOr+kojmOez72ymzf17krc81we/tbAfj4448599xzqVevXug1cPXVV/Pbb79V6vyVjWtvj9fv9/PGG2/QvXt3ateuTWJiIi1atODGG29k/fr1ZY4Pfy0WFhZy77330qpVK5KSkmjcuDHXX389f/31V4WxZmdn8+CDD9K+ffvQa+qEE07gscceo7CwsMzx77zzDueeey5HHnkkaWlppKam0rp1a/7+97+zZMmSSrdRZXTs2BGAgoICtm/fDsDMmTOZNWsWAGPGjCEpKWmv52jVqhWNGjUqs72goIAPPvgAgOuvv57rr78egPHjx5f5v0wODhpuJ3EzZcoULrzwQoqLi7npppt48cUXK33f1NRULrnkkohtH3/8MQUFBWWOvffee5k+fTrHH388HTt2JDU1lZUrV/LFF1/wxRdf8PzzzzNs2LAy9xs/fjzXXXcdHo+H5s2bc95552HbNqtWrWLMmDHUr1+ftm3b0r59ewYOHBhx39mzZ7Ny5UpOO+20MoUF0tLSQr9/99139O/fn+zsbI488kh69+6Nx+Phxx9/ZOjQoXz++ed88cUXuN3uMvH9+OOP3HjjjTRs2JCePXuSnZ3NjBkzuPXWW5k9e3boA2a8rFixgt69e7NmzRp69uzJ//73v4jHXhGXy8W1117L008/zSeffMLf/va30L6xY8dijNlnL9LGjRs599xzWbJkCbVr16ZTp06kp6ezcOFCnnnmGT766CNmzJgRMVxiy5YtXHPNNWRmZtKmTRvatWtHQUEBixYt4qWXXuL9999nzpw5FRaKmDx5MqNGjaJly5b07t2bTZs2MXv2bG6//XbWr18f8QWAbducf/75TJ06lYyMDM444wyysrLYtm0bf/75J8888wxXXXUVtWvX3md7VdaAAQNC7b99+3ZmzJjBI488wvvvv8+iRYtITk7e5zl8Ph9DhgypcP/333/PvHnzaNu2bejDyG+//cYHH3zAhx9+yHvvvceVV14JwLBhw3j77bcZM2YMd955J06ns8z5Ro8eDcDNN98c2ra/z1OpPf82V6xYUSaR25e77767wmR2zZo1fPHFF7Rr146jjz6apKQk1q1bx8yZM5k2bRrfffcdY8aMqdR13n333YgiJxV58cUXee655zj55JPp27cvK1euZObMmcycOZMXX3yRoUOHRhy/c+dOevbsyaJFi8jIyKBHjx643W5mzpzJ448/zvjx45k2bVqZBBAi34MDgQDr1q1j9uzZ/PDDD8yaNYt33nkn4viqvg8frPx+P3/729/48MMPSUxMpGPHjjRp0oQ//viD9957jwkTJjBhwgTOPffcGo0jLy+PCy64gBkzZpCWlkbHjh2pV68eS5YsYcyYMXz00UdMmTKFk046qcx9vV4vPXv2ZPHixXTv3p0OHTowe/Zs/vvf//Lll1/y3Xff0bp164j7/Pbbb5x77rmsX7+eRo0acfrpp+N2u/nxxx+5//77+eSTT5gxYwaZmZmh+3zzzTf8+uuvHHfccXTp0gWv18uiRYt48803eeedd5gyZQrdunWrlvbIzc0N/Z6YmAjAp59+CsAJJ5xQbjtU1gcffEBeXh7t2rULvf8dffTR/PHHH0yYMIGrrrpqPyKXuDAiMXLEEUcYwIwdO9Z8++23Jjk52QCmVq1apqCgoFLnWLFihQHMkUceWeH5V69eHbH9yy+/NBs3bixz/Jw5c0xGRoZxu91mw4YNEft++ukn43a7jWVZ5sUXXzSBQCBi/5o1a8xPP/1UYZwDBw4MPdaKbNq0ydSpU8dYlmVeeeWViGts377d9OjRwwDm4YcfLv
fcgLnpppuMz+cL7Vu6dKmpV6+eAcyYMWMi7vfggw8awDz44IMVxhSuovbcm9LYTj31VNOgQQMDmAEDBpji4uJ93rf
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"optimal_k = 3\n",
"kmeans = KMeans(n_clusters=optimal_k, random_state=42)\n",
"data['KMeans Cluster'] = kmeans.fit_predict(data_scaled)\n",
"\n",
"# Visualize the KMeans clusters\n",
"plt.figure(figsize=(10, 6))\n",
"sns.scatterplot(x=data_pca[:, 0], y=data_pca[:, 1], hue=data['KMeans Cluster'], palette='viridis', alpha=0.8, edgecolor=None)\n",
"plt.title(\"KMeans clusters visualized with PCA\", fontsize=16)\n",
"plt.xlabel(\"Principal component 1\")\n",
"plt.ylabel(\"Principal component 2\")\n",
"plt.legend(title='Clusters')\n",
"plt.grid()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hierarchical clustering\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA+0AAALpCAYAAADGocexAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/GU6VOAAAACXBIWXMAAA9hAAAPYQGoP6dpAACKZUlEQVR4nOzdd3hUZd7G8XuSTCopBFIIJYIgTTouVUQQUERRWCur2MsLIuIqulbWVdR1LSjq6u6KXVAXd3WRIgqKNAUBBUQINUAo6T2Tmef9A2fMkEnIhAk5kO/nunJl5tTfmXJm7jnnPI/NGGMEAAAAAAAsJ6i+CwAAAAAAAL4R2gEAAAAAsChCOwAAAAAAFkVoBwAAAADAogjtAAAAAABYFKEdAAAAAACLIrQDAAAAAGBRhHYAAAAAACyK0A4AAHAS2rhxo5555hmvYeXl5Zo2bZo2btxYT1UBAAKN0A6cQnbu3CmbzXbMv0cffbS+SwUAHKecnBzdc889WrlypWfY3Llz9eijjyo3N7ceKwMABFJIfRcAIPBsNpuuvfbaSsPnz5+vAwcO1ENFAIBAO+uss3TGGWfo3HPP1YgRI1RWVqZFixapQ4cOOuuss+q7PABAgBDagVNQUFCQZs2aVWn44MGDCe0AcIoIDQ3VZ599prvvvlvLli2TJI0cOVLPPvus7HZ7PVcHAAgUQjtwCnG5XJKOHGkHAJz62rVrp//+97/1XQYAoA5xTTtwCiktLZUkhYTU/ve4NWvWaNy4cWrVqpXCwsIUHx+vESNGaN68eT6nP+2002Sz2bRz506f42fNmiWbzabrrruuynUe6xr8JUuW+JwvPT1dd9xxh9q1a6fw8HDFxsZqwIAB+vvf/y6n01llLdX9rVu3zjP94MGDPetfunSphg8frvj4eEVGRup3v/ud3n77bZ91HTp0SDNmzNDIkSPVunVrRUREKCYmRr1799ZTTz2lkpKSah+HoKAgbdu2zec07777rme6wYMHe41bsmSJZ1zr1q09P+Ic7eabb66yfYP8/Hy9/vrrGjNmjNq1a6eoqChFRUWpS5cueuCBB5STk+NzmTVxrMf/tNNOq3LeittW1V9V6zznnHMUHx+v4ODgSvP4OiOlKtddd12V82zZskVhYWE+nxe3ffv2acqUKerYsaMiIyMVHR2ts846Sy+99JLKy8urXd/69es1ZswYJSQkKCIiQl27dtULL7zg83Ve2+ew4nt57ty5GjhwoGJiYhQdHa3Bgwf73Ad89NFHstlsSkhIUHp6eqXxCxYsUHBwsGJjY7V169bjfiyrm8/dpkdVr6Ps7Gw98sgj6t69u6KjoxUZGakuXbroL3/5i4qKinzOIx3ZJ44fP16tW7dWeHi44uPj1a1bN91zzz3atWuXZzr3a9TX879+/Xo1bdpUISEhevfddyuNz8rK0p/+9Cd17tzZ89ro1auXnn76aRUXF1eavrp1Sb/tu6rbN1c339H7XGOMbrnlFtlsNvXt29fn9fK1eY9+8cUXuuOOO9S9e3c1bdpUYWFhatGiha644gp999131dZak+fF/Xqp6d/RfvnlF9166606/fTTPZ8xgwYN0jvvvHPMx8+fz4zavh9WrVqlsWPHqn379oqLi1NYWJhSU1N10UUX+Xy/OhwOvfPOOxo3bpw6dOigmJgYRUREqH379po0aZL27dt3zO3ypbrXY223ra5e48DJiiPtwCnE4XBIksLCwmo1/wsvvKApU6bI5XKpe/fu6tOnjzIyMrRkyRItXLhQ06ZN08MPPxzIkr2MHz/e63511+B/9913Ov/885WVlaVWrVrpkksuUW5urpYsWaLly5dr7ty5+u9//6vQ0NBK855++ukaOHCgz+XGx8dXGjZ37ly99NJL6tChg0aMGKF9+/Zp2bJluvbaa7Vu3Tr97W9/85p+wYIFuvPOO9W8eXO1bdtWffv21aFDh7Rq1Srdd999+s9//qOvvvqqyu
fJGKOXXnpJzz//fKVxL7zwgs95jrZz507997//1SWXXOI1PDMz02docFu/fr1uueUWJSQkqH379urVq5eys7O1Zs0aPfHEE5ozZ45WrlypJk2a1KgOX45+/AsKCvTxxx/XaN6kpCSdf/75XsPefPNNn9M++OCDevzxxyVJvXv3Vrt27Tyvh2XLliktLa025fs0ceJElZWVVTn+66+/1iWXXKLs7GyddtppGjZsmEpLS7V69Wrdcccd+vTTT/XZZ5/5PKV59erVuv3225WcnKyhQ4cqOztbS5Ys0eTJk7Vs2TLNmTPHK3Ac73M4Y8YMPffcc+rdu7dGjRqltLQ0LV26VEuXLtWMGTN0xx13eKb9/e9/rzvuuEMvvviirrrqKn311VeeHw337t2ra665Ri6XS6+//rratWsXkMeyNjZt2qTzzz9fe/bsUbNmzTRw4EDZ7XatXr1aDz30kD7++GMtWbJEsbGxXvP99a9/1X333SeXy6UzzjhDo0ePVnFxsbZt26ZnnnlGnTt3rvYHSenI8zF06FDl5OTorbfe0tVXX+01fvv27RoyZIh27dqlhIQEjRw5Ug6HQ1999ZWmTp2q2bNn64svvlDjxo1rtK3vvPOOli5d6tfjUx1jjG699Va9/vrr6tu3rxYsWKCYmJgqp/fnPXrbbbdpz5496ty5swYMGKCQkBD9/PPPmjNnjv7973/rgw8+0NixYyvNV9Pnxdd+3v25MmLECCUnJ1e5HR9++KGuvfZalZSUqEOHDho5cqRyc3O1atUqXXPNNfryyy/1r3/9y+e8/n5mVKe698OPP/6or776Sl26dFGXLl0UEhKitLQ0ffbZZ/rss880ffp03XfffZ7pDxw4oGuuuUaxsbHq2LGjunbtqsLCQq1bt04vvviiPvjgAy1fvlxt27atcX3Ho7bv9UC/xoGTggFwyli6dKmRZJKSknyOP+ecc4wk88gjj1QaN3/+fGOz2UzTpk3N0qVLvcZt2LDBtGjRwkgyS5Ys8RqXmppqJJkdO3b4XOcbb7xhJJnx48f7HO90Oo0k42t35K73q6++8hpeUlLiWe9tt91mysrKPOPS0tLMaaedZiSZP/3pT37VUtX6JZknnnjCa9ySJUtMRESEkWTmz5/vNW7Tpk1mxYoVlZaXlZVlhg8fbiSZp59+utJ497qGDBliYmJiTH5+vtf45cuXG0lm6NChRpI555xzvMZ/9dVXRpLp1auXiY2NNeeee26ldTzxxBNeyzj6tbBnzx7zxRdfGKfT6TW8sLDQXHvttUaS+b//+79Ky62Jf/7znz4f/x07dhhJJjU1tcp5v/jiCyPJDB48uNI4X6+f0tJSExkZaSSZN998s9I848ePN5LMG2+8UeP6q5pn9uzZRpJp1aqVz+dl//79pkmTJsZms5mXX37Z67E9fPiwGTJkiJFkpk2b5nN97sfc4XB4xv30008mISHBSDKvvvqq13y1fQ7d7ymbzWbeeecdr3EffPCBsdlsJiQkxPz4449e40pLS83vfvc7I8lMnTrVGGOMw+EwAwcONJLMhAkTAvZYVve8VfU6KioqMqeffrqRZB588EFTWlrq9ZhcddVVRpK5/vrrveb7z3/+YySZ8PBwM3v27Err27hxo9m0aZPnvvv9V7HmdevWmSZNmpjg4GDz3nvvVVqGMcb06dPHSDIXX3yxKSgo8Aw/ePCg6dmzp5Fkrr76aq95fK3LGGNyc3NNcnKyiY6ONnFxcdXum305ep/rcrnMzTffbCSZvn37mtzc3Crn9fc9aowxc+fONVlZWT6Hh4SEmCZNmpiioiKvcf4+L8faRl82bNhgwsLCTHh4uPn444+9xu3cudN06dLF576ltp8ZtX0/lJSU+Kz/888/N5JM06ZNvYbn5eWZ//znP17vAWOMKSsrM/fff7+RZEaOHFlpecd6zKp6PR7PttXVaxw4WRHagVPIxx9/bCSZzp07+xxfXWh3f3H86KOPfM47Z84cI8
mMHTvWa/jxhvbCwkIjyYSEhFRZ79FfFN5++20jyaSkpPj80vLRRx8ZSSY6OtoUFxfXuJaq1t+jRw+f4++++24jyQw
"text/plain": [
"<Figure size 1200x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA3MAAAImCAYAAAD5fdOKAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/GU6VOAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC8b0lEQVR4nOzdd3gU5doG8Htm+6aSnkDoICUU6cUDSAfpCAIWwI4oCiqKHRviUazY+QQLR1FBQVRAkCYiAtKL9FDSCelb5/3+WHbNZndDNqSw4f5dF+eYmdmZZ96d2Z1n3yYJIQSIiIiIiIgooMjVHQARERERERH5j8kcERERERFRAGIyR0REREREFICYzBEREREREQUgJnNEREREREQBiMkcERERERFRAGIyR0REREREFICYzBEREREREQUgJnNEFJB+//13fPLJJ27LsrKy8Nxzz+Hs2bPVFBURERFR1amQZK5+/fqQJAkLFy70uc2PP/4InU4HSZLw2GOPVcRhiegqdvLkSUybNg0nTpxwLfu///s/zJ49G5IkVWNkREREdKVauHAhJEly+yfLMsLCwtCpUye89NJLyM/P9/l6IQS+/fZbjB8/Hg0aNEBQUBD0ej0SExMxZMgQfPTRR8jLyys1hnnz5rmO/c4771zW+UhCCHFZe4AjmTt16hQ+/fRTTJo0yWP9ypUrMWrUKFgsFjzyyCP473//e7mHJKKr3Pnz59GkSRMoioI+ffrg/PnzWL9+PXr37o1ff/21usMjIiKiK9DChQsxefJkBAUF4cYbbwQA2O12HD9+HFu3boWiKGjatCk2btyI2NhYt9ceP34cN954I/7++28AQPPmzXHNNddAp9Ph7Nmz2LZtGywWC6KiorB9+3bUq1fPawwtW7bEgQMHAABt27Z17a881OV+ZRn9/PPPGD16NCwWC6ZPn85EjogqREREBFatWoWZM2fi119/hV6vx80334zXXnutukMjIiKiK1xUVJRHq8Jt27ahT58++Oeff/Doo4/is88+c61LTk5G165dkZ6ejq5du+KDDz5A69at3V6fl5eH999/Hy+99BKys7O9JnNbt27FgQMHEB4eDqvVil27dmHnzp1o165duc6jUvvM/fLLLxg5ciTMZjMefPBBzJs3rzIPR0RXmQ4dOmDdunW4cOECUlNT8fnnn3v8ikZERERUFp06dcLDDz8MAFi6dClsNptr3S233IL09HR06tQJ69at80jkACAkJAQzZ87Ejh07fD6PLFiwAAAwfvx4jBkzxm1ZeVRaMrd69WpXIvfAAw/gzTffLHX7kydPerRfLfnv5MmTbq/Ztm0bZs6ciU6dOiEuLg5arRaxsbEYOnToJZtZ/fPPP7jvvvtwzTXXwGg0IjQ0FC1atMB9992Hffv2AQCee+65S8ZUWnznzp3DjBkz0Lx5cxiNRoSEhKBjx45499133S4Op0mTJrn6Hu7evRujRo1CdHQ0DAYDWrdujbfeegt2u93jdc44n3vuuVLP2cnZx7FkvKXp1asXJEnC+vXrPdb99ttvrjLw1swWcJT3Pffcg0aNGkGv1yMsLAw9evTAF198ccnjbdiwAf3790dERASMRiM6deqEzz//3OvrMjIy8Pbbb2Pw4MFo0KABDAYDQkND0aFDB8ydOxcmk8nr65zxA8DHH3+M9u3bIygoCOHh4Rg8eDC2bt3q8ZrXX38dkiShadOmXttGf/zxx5AkCYmJicjMzPR6biWVVpalvW79+vWQJAm9evXyen7+XotO69atw5gxY1CnTh3odDpER0ejY8eOePbZZ5GVleXaztn+3Nv7v2bNGhiNRgQFBWHdunUe68+cOYMHHngATZo0cV0b3bt3x4cffuj1ei/tWGazGU2bNnV7P8vK131hNpsxZMgQSJKE4cOHw2Kx+IzJ17/69et7vGbp0qW48847kZSUhFq1akGv16NBgwa4/fbbcfjw4VJjLcv74rxeyvLPW3
w7duzAzTffjLp160Kn0yEiIgIDBgzATz/9dMnyW7ZsGa677jqEhoYiJCQEvXr18vm68t4PP//8M4YNG4ZGjRohNDQUBoMBDRs2xLhx4/D777977CsvLw8ff/wxRo0ahSZNmiAoKAhBQUFo1aoVnnzySVy4cOGS5+VNaddjec+tsq7x0jjPs/i/WrVqoVWrVh73u1NVvHcTJ06EJEmYM2eOz9iXLFkCSZLQqVMn1zKr1YovvvgCN998M5o1a+Y6zjXXXINp06bh3LlzpZaH8/vY17+S51Ta93BmZiYiIyO93mtHjx7F+PHjkZSUhIiICGi1WtSuXRt9+vTB4sWLUbInzKU+64vf9yWv2fLeowBQWFiIV155Be3atUNISAiMRiNatmyJp556CtnZ2R7be3umk2UZsbGx6NatGz766COv3z3lea4r7X7xdu5lWX6pfZflmMWVdq8AwLfffouBAwciOjradQ3ccsstrmZ4ZXWpuEo7X5vNhk8++QS9evVCREQEdDodGjRogClTpuD06dMe2xe/FgsLC/HEE0+gcePG0Ov1SEhIwB133FHqYGTZ2dl49tln0bZtW9c11apVK7z44osoLCz02P7zzz/HwIEDUb9+fQQHByMoKAhNmjTBnXfeib1795a5jMqiffv2AICCggLX89uGDRuwadMmAMAHH3wAvV5f6j4aN26M+Ph4j+UFBQX4+uuvAQB33HEH7rjjDgDA4sWLfT6jXpKoAPXq1RMAxKeffiqEEGL16tVCr9cLAOK+++4r0z5OnDghAIigoCAxceJEt39BQUECgDhx4oTba/r06SNkWRatWrUSgwcPFmPGjBHt2rUTAAQA8eabb3o91pdffil0Op0AIOrWrStGjx4tRo4cKdq0aSMkSRLPPvusEEKIZcuWecTSqFEjAUB0797dY11GRobrGBs2bBC1atUSAET9+vXFsGHDxIABA1zL+vfvLywWi1tcEydOFADElClThF6vF/Xr1xc33XST6N+/v9BqtQKAuPHGG4WiKG6ve/bZZwUAV9yX4ny/SpZnaXr27CkAiN9++81tucViEc2bN3eV+cSJEz1eu2TJEtf10KxZMzFy5EjRu3dv1/s6efJkn8ebNm2akGVZtGjRQowbN0706NFDyLIsAIgZM2Z4vO7zzz8XAETt2rVFz549xbhx40SfPn1EcHCwACC6du0qTCaTx+uc8U+fPl1IkiSuu+46MX78eJGUlCQACLVaLZYuXerxumHDhgkAYty4cW7Ld+3aJfR6vVCr1eL333+vkLL09TohhPjtt98EANGzZ0+PdeW5FoUQ4oEHHnDF0rZtWzFu3DgxaNAg0bBhQ484Pv30U68xr169WhgMBmE0GsW6des8jrFt2zYRERHhuhdvuukmMXDgQNf1MmDAAGE2m91e4+tYQgjxwgsvuGL29+PN231hMpnEDTfcIACI4cOHey2n4jE1atTI7TNh9OjRAoCoV6+ex2tUKpUwGo2iQ4cOYtSoUWLYsGGusg0KCvK4bpzK+r7MmTPH52fp6NGj3ZY//PDDbsd48803XfdZ27ZtxY033iiuu+461+fQ7NmzfZbf9OnTBQDRoUMHMX78eNGpUydXvG+//bbH68p7Pzz55JMiJiZG9O7dW4wZM0aMGTNGtGzZUgAQkiSJxYsXu22/adMmAUBER0eL6667zvXZGhkZKQCIxo0bi8zMTJ/n5evzsrTrsbznVlnXeGmc51n82rjhhhtc10zTpk1FYWFhhZyfP+/djh07XJ8PNpvNa+w9evQQAMSiRYtcy06fPi0AiLCwMNGlSxcxZswYMXjwYJGQkOC6Do4cOeKzPJzfxyW/67t37+71nEr7Hr7jjjtcZVHys2DNmjUiODhYdOvWTYwcOVKMHz9edO/eXahUKgFA3HPPPW7bl/ZZ7/z+c/4rec2W9x7NysoSbdu2FQBEaGioGDZsmBg9erSIiooSAESDBg08juXtme6WW25x+w6/5ZZbPI
5Vnue60u6Xkufuq0z8vb/LcszifN0rVqtVjB07VgAQOp1OdOvWTYwZM0a0adNGABAGg0H8/PPPZTpGWeLydb65ubm
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from scipy.cluster.hierarchy import dendrogram, linkage\n",
"from sklearn.cluster import KMeans, AgglomerativeClustering\n",
"\n",
"# Ward linkage matrix used to draw the dendrogram\n",
"Z = linkage(data_scaled, method='ward')\n",
"\n",
"plt.figure(figsize=(12, 8))\n",
"# Show only the last optimal_k merged clusters\n",
"dendrogram(Z, truncate_mode='lastp', p=optimal_k, leaf_rotation=90., leaf_font_size=12., show_contracted=True)\n",
"plt.title(\"Dendrogram for hierarchical clustering\", fontsize=16)\n",
"plt.xlabel(\"Objects\")\n",
"plt.ylabel(\"Distance (Ward linkage)\")\n",
"plt.grid()\n",
"plt.show()\n",
"\n",
"# Apply hierarchical clustering (Ward linkage, the same as in the dendrogram)\n",
"hierarchical = AgglomerativeClustering(n_clusters=optimal_k, linkage='ward')\n",
"data['Hierarchical Cluster'] = hierarchical.fit_predict(data_scaled)\n",
"\n",
"# Visualize the hierarchical clusters\n",
"plt.figure(figsize=(10, 6))\n",
"sns.scatterplot(x=data_pca[:, 0], y=data_pca[:, 1], hue=data['Hierarchical Cluster'], palette='coolwarm', alpha=0.8, edgecolor=None)\n",
"plt.title(\"Hierarchical clusters visualized with PCA\", fontsize=16)\n",
"plt.xlabel(\"Principal component 1\")\n",
"plt.ylabel(\"Principal component 2\")\n",
"plt.legend(title='Clusters')\n",
"plt.grid()\n",
"plt.show()"
]
},
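{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how the dendrogram relates to `AgglomerativeClustering`. It uses synthetic data from `make_blobs` instead of the Starbucks dataset, and `X_demo`, `Z_demo`, `scipy_labels` and `sklearn_labels` are hypothetical names that are not part of the original analysis. Cutting SciPy's Ward linkage matrix into a fixed number of flat clusters with `fcluster(..., criterion='maxclust')` should reproduce scikit-learn's Ward partition up to a renumbering of the labels. `adjusted_rand_score` ignores label numbering, so a value close to 1.0 indicates that the two partitions agree."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch on synthetic blobs (assumed data, not the stock prices)\n",
"from scipy.cluster.hierarchy import fcluster, linkage\n",
"from sklearn.cluster import AgglomerativeClustering\n",
"from sklearn.datasets import make_blobs\n",
"from sklearn.metrics import adjusted_rand_score\n",
"\n",
"X_demo, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)\n",
"\n",
"# Cut the Ward linkage tree into at most 3 flat clusters\n",
"Z_demo = linkage(X_demo, method='ward')\n",
"scipy_labels = fcluster(Z_demo, t=3, criterion='maxclust')\n",
"\n",
"# scikit-learn agglomerative clustering with the same (Ward) linkage\n",
"sklearn_labels = AgglomerativeClustering(n_clusters=3, linkage='ward').fit_predict(X_demo)\n",
"\n",
"# ARI does not depend on label numbering; 1.0 means identical partitions\n",
"print(adjusted_rand_score(scipy_labels, sklearn_labels))"
]
},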
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluating clustering quality\n",
"\n",
"To sum up, we evaluate the quality of both clusterings by computing the mean silhouette coefficient for:\n",
"\n",
"- KMeans clustering;\n",
"- hierarchical clustering.\n",
"\n",
"The silhouette coefficient measures how similar each object is to the other objects in its own cluster compared with the objects of the nearest neighboring cluster. It ranges from -1 to 1. Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest that objects may have been assigned to the wrong cluster."
]
},
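{
"cell_type": "markdown",
"metadata": {},
"source": [
"The silhouette coefficient described above can also be computed by hand. This is a minimal sketch that uses synthetic data from `make_blobs` rather than the Starbucks dataset. `X_demo`, `labels_demo`, `D_demo` and `s_demo` are hypothetical names used only here. For each object, `a` is its mean distance to the other objects of its own cluster and `b` is its mean distance to the objects of the nearest other cluster. The object's silhouette is then `(b - a) / max(a, b)`. The hand-computed values should match scikit-learn's `silhouette_samples`, and their mean is what `silhouette_score` reports."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: per-object silhouette computed by hand on toy data\n",
"import numpy as np\n",
"from sklearn.datasets import make_blobs\n",
"from sklearn.metrics import pairwise_distances, silhouette_samples, silhouette_score\n",
"\n",
"X_demo, labels_demo = make_blobs(n_samples=60, centers=2, random_state=1)\n",
"D_demo = pairwise_distances(X_demo)  # Euclidean distance matrix\n",
"\n",
"s_demo = np.empty(len(X_demo))\n",
"for i in range(len(X_demo)):\n",
"    same = labels_demo == labels_demo[i]\n",
"    same[i] = False  # exclude the object itself\n",
"    a = D_demo[i, same].mean()  # mean distance within its own cluster\n",
"    # b: mean distance to the objects of the nearest other cluster\n",
"    b = min(D_demo[i, labels_demo == k].mean() for k in np.unique(labels_demo) if k != labels_demo[i])\n",
"    s_demo[i] = (b - a) / max(a, b)\n",
"\n",
"# Compare with scikit-learn's per-object and mean silhouette values\n",
"print(np.allclose(s_demo, silhouette_samples(X_demo, labels_demo)))\n",
"print(s_demo.mean(), silhouette_score(X_demo, labels_demo))"
]
},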
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Silhouette score for KMeans: 0.5469\n",
"Silhouette score for hierarchical clustering: 0.5783\n"
]
}
],
"source": [
"from sklearn.metrics import silhouette_score\n",
"\n",
"# Mean silhouette coefficient over all objects (higher is better)\n",
"silhouette_kmeans = silhouette_score(data_scaled, data['KMeans Cluster'])\n",
"silhouette_hierarchical = silhouette_score(data_scaled, data['Hierarchical Cluster'])\n",
"\n",
"print(f\"Silhouette score for KMeans: {silhouette_kmeans:.4f}\")\n",
"print(f\"Silhouette score for hierarchical clustering: {silhouette_hierarchical:.4f}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}