2024-12-07 15:58:46 +04:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab Work No. 3\n",
"\n",
"## Students Performance in Exams dataset\n",
"\n",
"Loading the data from a CSV file into a DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 201,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn import set_config\n",
"from sklearn.decomposition import PCA\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"# Make scikit-learn transformers return pandas DataFrames\n",
"set_config(transform_output=\"pandas\")\n",
"\n",
"random_state = 9\n",
"# Load the data\n",
"df = pd.read_csv(\"../../static/csv/StudentsPerformance.csv\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dataset description\n",
"\n",
"**Context**: Marks obtained by students.\n",
"\n",
"**Content**: This dataset consists of the marks obtained by students in various subjects.\n",
"\n",
"**Inspiration**: To understand the influence of the parents' background, test preparation, and similar factors on students' performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Content analysis\n",
"\n",
"*Objects of observation:* students taking the exams.\n",
"\n",
"*Object attributes:*\n",
"\n",
"1. gender: the student's gender (male, female).\n",
"2. race/ethnicity: the group the student belongs to (for example, various racial/ethnic categories).\n",
"3. parental level of education: the parents' level of education (for example, secondary education, higher education, and so on).\n",
"4. lunch: the type of lunch, that is, whether the student receives a free or a paid lunch.\n",
"5. test preparation course: whether the student completed the test preparation course.\n",
"6. math score: the math exam score.\n",
"7. reading score: the reading exam score.\n",
"8. writing score: the writing exam score.\n",
"\n",
"\n",
"### Business goal:\n",
"\n",
"Segmenting students by their performance and by the factors that affect exam results. Clustering helps reveal groups of students with similar characteristics, which can be useful for:  \n",
"**Personalized approach**: identifying groups of students who need additional help or support, for example with exam preparation or with nutrition.  \n",
"**Targeted assistance**: developing support programs for students whose results can be improved, for example through a test preparation course.  \n",
"**Education monitoring**: assessing the factors that affect academic success in order to improve teaching methods.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dimensionality reduction and data visualization\n",
"\n",
"Before applying clustering algorithms, we can reduce the dimensionality of the data to simplify visualization and improve performance. Here we use PCA (Principal Component Analysis).\n",
"\n",
"Because the data contains categorical variables, they must first be converted to numeric form (for example with OneHotEncoder or LabelEncoder; the next cell uses `pd.get_dummies`), and only then can PCA be applied."
]
},
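{
"cell_type": "markdown",
"metadata": {},
"source": [
"The encoding step described above can also be done with scikit-learn's `OneHotEncoder`. The cell below is a minimal sketch on a made-up two-column frame, not the method used in the rest of this notebook; `toy`, `encoder` and `toy_encoded` are illustrative names, and `sparse_output` assumes scikit-learn 1.2 or newer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.preprocessing import OneHotEncoder\n",
"\n",
"# Hypothetical toy data, only for illustrating the encoding\n",
"toy = pd.DataFrame({\n",
"    'gender': ['female', 'male', 'female'],\n",
"    'lunch': ['standard', 'free/reduced', 'standard'],\n",
"})\n",
"\n",
"# drop='first' mirrors drop_first=True in pd.get_dummies;\n",
"# sparse_output=False returns a dense result instead of a sparse matrix\n",
"encoder = OneHotEncoder(drop='first', sparse_output=False)\n",
"encoded = np.asarray(encoder.fit_transform(toy))\n",
"\n",
"# Restore readable column names such as 'gender_male'\n",
"toy_encoded = pd.DataFrame(encoded, columns=encoder.get_feature_names_out())\n",
"print(toy_encoded)"
]
},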
{
"cell_type": "code",
"execution_count": 202,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# One-hot encode the categorical columns (drop_first removes one redundant dummy per feature)\n",
"df_encoded = pd.get_dummies(df, columns=['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course'], drop_first=True).astype(int)\n",
"# Standardize the three score columns to zero mean and unit variance\n",
"score_cols = ['math score', 'reading score', 'writing score']\n",
"scaler = StandardScaler()\n",
"df_encoded[score_cols] = scaler.fit_transform(df_encoded[score_cols])"
]
},
{
"cell_type": "code",
"execution_count": 203,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 800x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Apply PCA to reduce the data to 2 dimensions\n",
"pca = PCA(n_components=2)\n",
"X_pca = pca.fit_transform(df_encoded)\n",
"# Wrap the result in a DataFrame for convenient plotting with seaborn\n",
"pca_df = pd.DataFrame(X_pca, columns=['pca0', 'pca1'])\n",
"# Visualization\n",
"plt.figure(figsize=(8,6))\n",
"sns.scatterplot(data=pca_df, x='pca0', y='pca1')\n",
"plt.title('PCA for assessing student performance')\n",
"plt.xlabel('Principal Component 1')\n",
"plt.ylabel('Principal Component 2')\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each point on the plot corresponds to one student, and its position shows that student's coordinates along the first two principal components.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Choosing the number of clusters\n",
"\n",
"The following methods can be used to choose the optimal number of clusters:\n",
"\n",
"1. **Inertia** (the sum of squared distances from the points to their centroids). Inertia is often used to choose the number of clusters for the KMeans algorithm, typically by looking for the elbow point after which it stops decreasing sharply.\n",
"2. **Silhouette coefficient** (Silhouette Score), which measures how well each object fits its own cluster and how well it is separated from the other clusters."
]
},
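{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the two criteria above concrete, the following cell is a small sketch on synthetic data; the array `X_toy` and the model `km` are made up for this illustration and are not part of the analysis. It recomputes inertia by hand from its definition, compares it with `KMeans.inertia_`, and then prints the silhouette score."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"# Hypothetical toy data: two well-separated 2-D blobs\n",
"rng = np.random.default_rng(0)\n",
"X_toy = np.vstack([\n",
"    rng.normal(0.0, 0.5, size=(50, 2)),\n",
"    rng.normal(5.0, 0.5, size=(50, 2)),\n",
"])\n",
"\n",
"km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_toy)\n",
"\n",
"# Inertia: sum of squared distances from each point to its assigned centroid\n",
"manual_inertia = ((X_toy - km.cluster_centers_[km.labels_]) ** 2).sum()\n",
"print('manual inertia matches KMeans.inertia_:', np.isclose(manual_inertia, km.inertia_))\n",
"\n",
"# Mean silhouette over all samples (closer to 1 means better separation)\n",
"print('silhouette:', silhouette_score(X_toy, km.labels_))"
]
},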
{
"cell_type": "code",
"execution_count": 204,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1200x600 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.cluster import KMeans\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"# Inertia and silhouette for different numbers of clusters\n",
"inertia = []\n",
"silhouette_avg = []\n",
"\n",
"for n_clusters in range(2, 10):  # try 2 to 9 clusters (range excludes 10)\n",
"    kmeans = KMeans(n_clusters=n_clusters, random_state=42)\n",
"    kmeans.fit(X_pca)  # use the PCA-transformed data\n",
"    inertia.append(kmeans.inertia_)\n",
"    silhouette_avg.append(silhouette_score(X_pca, kmeans.labels_))\n",
"\n",
"# Plot inertia and silhouette\n",
"plt.figure(figsize=(12, 6))\n",
"plt.subplot(1, 2, 1)\n",
"plt.plot(range(2, 10), inertia, marker='o')\n",
"plt.title('Inertia')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"plt.plot(range(2, 10), silhouette_avg, marker='o')\n",
"plt.title('Silhouette coefficient')\n",
"\n",
"plt.show()\n"
]
},
{
"cell_type": "code",
"execution_count": 205,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1200x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.cluster import KMeans\n",
"from sklearn.metrics import silhouette_score\n",
"import matplotlib.pyplot as plt\n",
"\n",
"\n",
"# Lists for storing the metrics\n",
"inertia = []\n",
"silhouette_scores = []\n",
"\n",
"# Evaluate cluster counts from 2 to 10.\n",
"# Unlike the previous cell, this run clusters the full encoded data, not the 2-D PCA projection\n",
"k_values = range(2, 11)\n",
"for k in k_values:\n",
"    kmeans = KMeans(n_clusters=k, random_state=42)\n",
"    labels = kmeans.fit_predict(df_encoded)\n",
"\n",
"    inertia.append(kmeans.inertia_)\n",
"    silhouette_scores.append(silhouette_score(df_encoded, labels))\n",
"\n",
"# Plot inertia\n",
"plt.figure(figsize=(12, 5))\n",
"\n",
"plt.subplot(1, 2, 1)\n",
"plt.plot(k_values, inertia, marker='o', color='orange')\n",
"plt.title('Inertia for different k', fontsize=14)\n",
"plt.xlabel('Number of clusters (k)')\n",
"plt.ylabel('Inertia')\n",
"plt.grid(True)\n",
"\n",
"# Plot the silhouette coefficient\n",
"plt.subplot(1, 2, 2)\n",
"plt.plot(k_values, silhouette_scores, marker='o', color='orange')\n",
"plt.title('Silhouette coefficient for different k', fontsize=14)\n",
"plt.xlabel('Number of clusters (k)')\n",
"plt.ylabel('Silhouette')\n",
"plt.grid(True)\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The optimal number of clusters is between 2 and 5."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cluster analysis with hierarchical and non-hierarchical clustering algorithms"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Non-hierarchical algorithm (for example, KMeans)"
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 3\n",
"1 1\n",
"2 1\n",
"3 0\n",
"4 2\n",
"Name: Cluster_KMeans, dtype: int32\n"
]
}
],
"source": [
"from sklearn.cluster import KMeans\n",
"\n",
"# Use a number of clusters from the range chosen above (here 5)\n",
"kmeans = KMeans(n_clusters=5, random_state=random_state)\n",
"kmeans.fit(X_pca)\n",
"\n",
"# Get the cluster labels\n",
"labels = kmeans.labels_\n",
"\n",
"# Attach the cluster labels to the data\n",
"df_encoded['Cluster_KMeans'] = labels\n",
"\n",
"print(df_encoded['Cluster_KMeans'].head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Hierarchical algorithm (for example, agglomerative clustering)"
]
},
{
"cell_type": "code",
"execution_count": 207,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 4\n",
"1 2\n",
"2 2\n",
"3 0\n",
"4 1\n",
"Name: Cluster_Agglomerative, dtype: int64\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.cluster import AgglomerativeClustering\n",
"\n",
"# Agglomerative clustering\n",
"agg_clust = AgglomerativeClustering(n_clusters=5)\n",
"agg_clust.fit(X_pca)\n",
"\n",
"# Get the cluster labels\n",
"df_encoded['Cluster_Agglomerative'] = agg_clust.labels_\n",
"print(df_encoded['Cluster_Agglomerative'].head())\n",
"\n",
"from scipy.cluster import hierarchy\n",
"# Build the linkage on the same PCA data that was clustered above;\n",
"# df_encoded now also holds the cluster-label columns, which must not be used as features\n",
"linkage_matrix = hierarchy.linkage(X_pca, method='ward')\n",
"\n",
"plt.figure(figsize=(10, 7))\n",
"hierarchy.dendrogram(\n",
"    linkage_matrix,\n",
"    truncate_mode='lastp',\n",
"    p=5,\n",
"    leaf_rotation=90.,\n",
"    leaf_font_size=12.,\n",
"    show_contracted=True\n",
")\n",
"plt.title('Dendrogram of hierarchical agglomerative clustering')\n",
"plt.xlabel('Sample indices')\n",
"plt.ylabel('Distance')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluating the quality of the solution\n",
"\n",
"The quality of the clustering can be assessed with:  \n",
"**Visualization**: plotting the data with cluster labels (for example, in PCA coordinates).  \n",
"**Silhouette coefficient**: a numeric measure of cluster quality."
]
},
{
"cell_type": "code",
"execution_count": 208,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Silhouette coefficient for KMeans: 0.36834520397007814\n"
]
}
],
"source": [
"# Silhouette coefficient for KMeans\n",
"sil_score_kmeans = silhouette_score(X_pca, df_encoded['Cluster_KMeans'])\n",
"print(f\"Silhouette coefficient for KMeans: {sil_score_kmeans}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A silhouette coefficient close to 1 means that the clusters are well separated; a value close to 0 means that the clusters overlap; a value close to -1 means that many points are assigned to the wrong cluster. In our case the value is about 0.37, so the clusters are only moderately separated and partly overlap."
]
},
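{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the interpretation above follows from the standard definition of the silhouette of a single sample $i$. Here $a(i)$ is the mean distance from $i$ to the other points of its own cluster, and $b(i)$ is the mean distance from $i$ to the points of the nearest other cluster:\n",
"\n",
"$$s(i) = \\frac{b(i) - a(i)}{\\max\\{a(i),\\, b(i)\\}}, \\qquad -1 \\le s(i) \\le 1$$\n",
"\n",
"The value reported by `silhouette_score` is the mean of $s(i)$ over all samples."
]
},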
{
"cell_type": "code",
"execution_count": 209,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 800x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAr4AAAIjCAYAAADlfxjoAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzddXQUVxvA4d96nBBcgia4BHeXYMHdpUiLfAVKKVDcSoECbZFSKO7uCe5SLLgTPEBCSEJkfb4/0myzbBQChHKfcziHzNyZuSvZvHvn3veVSZIkIQiCIAiCIAj/cfJP3QFBEARBEARB+BhE4CsIgiAIgiB8EUTgKwiCIAiCIHwRROArCIIgCIIgfBFE4CsIgiAIgiB8EUTgKwiCIAiCIHwRROArCIIgCIIgfBFE4CsIgiAIgiB8EUTgKwiCIAiCIHwRROArCILwmfn5558pVKgQZrP5U3fFonv37uTJk+dTd+OTypMnD927d//U3UjU0qVLkclkPHjw4FN35Z18jOfY19cXJycngoKCPuh1hE9DBL7Cf0bsB/q5c+ds9v3555/IZDKaN2+OyWT6BL0TPifdu3dHJpNZ/rm4uFCyZElmzpyJTqezae/v70/nzp1xd3dHo9Hg5uZG3bp1WbJkSbzvt9DQUOzs7JDJZNy4cSNFfQsPD2fatGkMHz4cudz2I/x9zi0k7eTJk4wbN47Q0NBP3RUrJpOJJUuWULNmTdzc3NBoNOTJk4cePXrE+5n4oezevZtx48Z9tOt9CA0aNMDDw4OpU6d+6q4IH4AIfIX/vC1btvD1119TrVo11q5di0Kh+NRdEj4DGo2GFStWsGLFCqZMmYKbmxvfffcd3bp1s2q3aNEiypYty6FDh+jUqRPz5s1jzJgx2Nvb06tXL6ZNm2Zz7g0bNiCTyciaNSurVq1KUb/++usvjEYjHTp0iHf/+5xbSNrJkycZP358vIHvrVu3+PPPPz96n6Kjo2nSpAk9e/ZEkiRGjhzJ/Pnz6dq1K6dOnaJ8+fI8efLko/Rl9+7djB8//oOd/2M9x3379uWPP/7gzZs3H/xawsel/NQdEIQP6fDhw3To0IEiRYqwY8cO7OzsPnWXhM+EUqmkc+fOlp+/+eYbKlSowLp16/jll1/Inj07p0+fpl+/flSqVIndu3fj7Oxsaf/tt99y7tw5rl69anPulStX0qhRI3Lnzs3q1auZNGlSsvu1ZMkSmjZtmuB7+X3O/SWKjIzE0dExVc6l0WhS5TwpNWzYMHx9fZk1axbffvut1b6xY8cya9asT9Kv1CJJElqtFnt7+4/2HLdq1YqBAweyYcMGevbs+VGuKXwkkiD8RyxZskQCpLNnz0qSJEkXL16UXFxcpDx58kjPnj2L95iAgAAJiPdfXNOnT5cqVaokubm5SXZ2dlLp0qWlDRs2xHvOFStWSOXKlZPs7e0lV1dXqVq1apKfn58kSZKUO3fuBK8HSLlz57acx2QySbNmzZKKFCkiaTQaKXPmzFKfPn2kkJAQq+vlzp1baty4seTn5yeVLFlS0mg0UuHChaVNmzYl+vzEp0aNGlKNGjUS3B8ruY9BkiQpIiJCGjJkiJQzZ05JrVZLBQoUkKZPny6ZzeYUPXexDh06lKzrarVaacyYMVL+/PkltVot5cyZUxo2bJik1WqTfHzdunWTHB0dbbZ/9913EiCdOHFCkiRJatCggaRUKqWHDx8mec5YDx8+lGQymbR+/XrpzJkzVudLyv379yVAWrp06Xuf+/fff5fy5s0r2dnZSeXKlZOOHj0a7+v/4MEDycfHR3JwcJAyZcokffvtt5Kvr68ESIcOHbK069at2zu/9oDUv39/af369VLhwoUlOzs7qWLFitLly5clSZKkBQsWSPnz55c0Go1Uo0YNKSAgwObxnD59WvL29pZcXFwke3t7qXr16tLx48et2owdO1YCpGvXrkkdOnSQXF1dJS8vL0mSJOnSpUtSt2
7dpLx580oajUbKkiWL1KNHDyk4ONjm+Lf/xfYnd+7cUrdu3SRJkqSzZ88m+FrFPn87duywbHvy5InUo0cPKXPmzJJarZaKFCkiLV682ObYtz1+/FhSKpVSvXr1kmwrSf9+DsR9DgFp7NixNm3jPh5JkiS9Xi+NGzdO8vDwkDQajeTm5iZVqVJF2rt3ryRJMe+BxD5LU/qZ5uvrK5UpU0bSaDTSrFmz4u1T7OM5fvy4NHjwYCljxoySg4OD1Lx5c+nly5dW5zWZTNLYsWOlbNmySfb29lLNmjWla9eu2ZwzVqlSpaSmTZsm63kVPh9ixFf4T7p37x4NGjRAo9Hg5+dHtmzZEm3fp08fqlWrBsDmzZvZsmWL1f45c+bQtGlTOnXqhF6vZ+3atbRp04adO3fSuHFjS7vx48czbtw4KleuzIQJE1Cr1Zw5c4aDBw9Sv359Zs+eTUREBAA3btxgypQpjBw5ksKFCwPg5ORkOVffvn1ZunQpPXr0YNCgQQQEBPD7779z8eJFTpw4gUqlsrS9c+cO7dq1o1+/fnTr1o0lS5bQpk0bfH19qVev3vs9mQmoV68eXbt2tdo2c+ZMXr9+bflZkiSaNm3KoUOH6NWrF15eXvj5+TFs2DCePn1qNRKV1HP3trjP28KFC3n06JFln9lspmnTphw/fpw+ffpQuHBhrly5wqxZs7h9+zZbt259p8d87949ADJkyEBUVBQHDhygevXq5MqVK9nnWLNmDY6OjjRp0gR7e3vy58/PqlWrqFy5cpLHnjx5EoDSpUu/17nnz5/PgAEDqFatGoMHD+bBgwc0b96c9OnTkzNnTku7yMhIateuTWBgIP/73//ImjUrq1ev5tChQ0n2NSWvPcCxY8fYvn07/fv3B2Dq1Kk0adKE77//nnnz5vHNN9/w+vVrfv75Z3r27MnBgwctxx48eJCGDRtSpkwZxo4di1wuZ8mSJdSuXZtjx45Rvnx5q2u1adMGT09PpkyZgiRJAOzbt4/79+/To0cPsmbNyrVr11i4cCHXrl3j9OnTyGQyWrZsye3bt1mzZg2zZs0iY8aMAGTKlMnm8ZctW5Z8+fKxfv16m+kx69atI3369Hh7ewPw4sULKlasiEwmY8CAAWTKlIk9e/bQq1cvwsPDbUZx49qzZw9Go5EuXbok+Zq8r3HjxjF16lS++uorypcvT3h4OOfOnePChQvUq1ePvn378uzZM/bt28eKFStsjk/JZ9qtW7fo0KEDffv2pXfv3hQsWDDRvg0cOJD06dMzduxYHjx4wOzZsxkwYADr1q2ztBkxYgQ///wzPj4+eHt7c+nSJby9vdFqtfGes0yZMu/8WSGkYZ848BaEVBP7zX/nzp1S/vz5JUCqX79+osfcuXNHAqRly5ZZtsWO6sQVFRVl9bNer5eKFSsm1a5d2+pccrlcatGihWQymazaxze6GTtyGXfULNaxY8ckQFq1apXV9tiRorjbY0eR447whoWFSdmyZZNKlSpl2ZbaI779+/e32d64cWOrUb+tW7dKgDRp0iSrdq1bt5ZkMpl09+5dSZJS9tzt27dPAqQjR45Ytr092rhixQpJLpdLx44dszp2wYIFyRphjR3xDQoKkoKCgqS7d+9KU6ZMkWQymVSiRAlJkmJGCAHpf//7X6Lnelvx4sWlTp06WX4eOXKklDFjRslgMCR57I8//igB0ps3b9753DqdTsqQIYNUrlw5q+1Lly6VAKvXf+bMmRIgbd261bItOjpaKlSoUJIjvsl97SUp5v2k0WisRiH/+OMPCZCyZs0qhYeHW7aPGDHCasTSbDZLnp6ekre3t9V7JSoqSsqbN6/VSGjs73aHDh1snru3f8clSZLWrFkjAdLRo0ct26ZPn24zYhrr7ZHDESNGSCqVympEU6fTSa6urlLPnj0t23r16iVly5bNanRZkiSpffv2Urp06eLtW6zBgwdLgHTx4sUE28T1PiO+JUuWlBo3bpzo+fv372/z+S
lJ7/aZ5uvrm2SfYh9P3bp1rV7/wYMHSwqFQgoNDZUkSZKeP38uKZVKqXnz5lbnGzdunATEO+I7ZcoUCZBevHiR6GM
"text/plain": [
"<Figure size 800x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(8,6))\n",
"sns.scatterplot(data=pca_df, x='pca0', y='pca1', hue=df_encoded['Cluster_KMeans'], palette=\"viridis\", s=100)\n",
"plt.title('Кластеры после PCA (KMeans)')\n",
"plt.xlabel('Principal Component 1')\n",
"plt.ylabel('Principal Component 2')\n",
"plt.show()\n",
"\n",
"\n",
"plt.figure(figsize=(8,6))\n",
"sns.scatterplot(data=pca_df, x='pca0', y='pca1', hue=df_encoded['Cluster_Agglomerative'], palette=\"viridis\", s=100)\n",
"plt.title('Кластеры после PCA (Agglomerative Clustering)')\n",
"plt.xlabel('Principal Component 1')\n",
"plt.ylabel('Principal Component 2')\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "aimvenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}