AIM-PIbd-31-Afanasev-S-S/lab_5/lab5.ipynb

2024-11-22 23:46:42 +04:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start of the lab work"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Variant 3:* Diabetes among the Pima Indians"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Dataset attributes\n",
"\n",
"Pregnancies: number of pregnancies.\n",
"\n",
"Glucose: blood glucose level.\n",
"\n",
"BloodPressure: diastolic blood pressure.\n",
"\n",
"SkinThickness: triceps skin fold thickness.\n",
"\n",
"Insulin: serum insulin level.\n",
"\n",
"BMI: body mass index.\n",
"\n",
"DiabetesPedigreeFunction: diabetes pedigree function.\n",
"\n",
"Age: age.\n",
"\n",
"Outcome: presence of diabetes (0 = no, 1 = yes).\n",
"\n",
"Goal: group the Pima Indians by \"interesting\" characteristics for analysis and reporting: diabetes risk (based on glucose level, BMI and DiabetesPedigreeFunction); age groups with high incidence (for example, the young and the elderly); risk factors in women who have been pregnant (Pregnancies, Insulin and SkinThickness); blood pressure and insulin levels in people with confirmed diabetes."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n",
"0 6 148 72 35 0 33.6 \n",
"1 1 85 66 29 0 26.6 \n",
"2 8 183 64 0 0 23.3 \n",
"3 1 89 66 23 94 28.1 \n",
"4 0 137 40 35 168 43.1 \n",
"\n",
" DiabetesPedigreeFunction Age Outcome \n",
"0 0.627 50 1 \n",
"1 0.351 31 0 \n",
"2 0.672 32 1 \n",
"3 0.167 21 0 \n",
"4 2.288 33 1 \n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from scipy.cluster.hierarchy import dendrogram, linkage, fcluster\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"df = pd.read_csv(\"C:/Users/TIGR228/Desktop/МИИ/Lab1/AIM-PIbd-31-Afanasev-S-S/static/csv/diabetes.csv\")\n",
"df = df.head(1500)\n",
"print(df.head())"
]
},
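{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Data cleaning\n",
"\n",
"Drop the `SkinThickness` and `Insulin` columns, which are not used in the analysis below, and remove any rows with missing values."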
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Pregnancies Glucose BloodPressure BMI DiabetesPedigreeFunction Age \\\n",
"0 6 148 72 33.6 0.627 50 \n",
"1 1 85 66 26.6 0.351 31 \n",
"2 8 183 64 23.3 0.672 32 \n",
"3 1 89 66 28.1 0.167 21 \n",
"4 0 137 40 43.1 2.288 33 \n",
"\n",
" Outcome \n",
"0 1 \n",
"1 0 \n",
"2 1 \n",
"3 0 \n",
"4 1 \n"
]
}
],
"source": [
"df_cleaned = df.drop(columns=['SkinThickness', 'Insulin'], errors='ignore').dropna()\n",
"print(df_cleaned.head())  # Show the cleaned DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Visualizing pairwise relationships"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"sns.set(style=\"whitegrid\")\n",
"\n",
"plt.figure(figsize=(16, 12))\n",
"\n",
"# Glucose level vs. body mass index\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['BMI'], alpha=0.6)\n",
"plt.title('Glucose vs BMI')\n",
"\n",
"# Glucose level vs. age\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['Age'], alpha=0.6)\n",
"plt.title('Glucose vs Age')\n",
"\n",
"# Blood pressure vs. body mass index\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['BloodPressure'], y=df_cleaned['BMI'], alpha=0.6)\n",
"plt.title('BloodPressure vs BMI')\n",
"\n",
"# DiabetesPedigreeFunction vs. body mass index\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['DiabetesPedigreeFunction'], y=df_cleaned['BMI'], alpha=0.6)\n",
"plt.title('DiabetesPedigreeFunction vs BMI')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Standardizing the data for clustering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Standardization rescales all features (columns) to a common scale; `StandardScaler` gives each column zero mean and unit variance."
]
},
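{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what standardization computes, assuming a small made-up array `X` (the values are illustrative, not a statement about the dataset): each column has its mean subtracted and is divided by its population standard deviation, which is what `StandardScaler` does with its default settings.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Toy data (hypothetical values): two features on very different scales\n",
"X = np.array([[148.0, 33.6], [85.0, 26.6], [183.0, 23.3], [89.0, 28.1]])\n",
"\n",
"# Manual z-score: subtract the column mean, divide by the population std (ddof=0)\n",
"X_manual = (X - X.mean(axis=0)) / X.std(axis=0)\n",
"\n",
"# StandardScaler with default settings should give the same result\n",
"X_scaled = StandardScaler().fit_transform(X)\n",
"\n",
"print(np.allclose(X_manual, X_scaled))\n",
"print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))\n",
"```\n",
"\n",
"Without this step, features with large numeric ranges (such as glucose) would dominate the Euclidean distances used by the clustering methods."
]
},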
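{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Note: df_cleaned still contains the Outcome column, so the class label is scaled together with the features\n",
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(df_cleaned)"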
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Agglomerative (hierarchical) clustering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hierarchical clustering is a machine learning method that groups objects (data points) by their similarity or by the distance between them. The main idea is to build the cluster structure as a tree (a dendrogram) that shows how objects are merged into groups at different levels."
]
},
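{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the idea on synthetic data (the points, the random seed and the choice of three clusters are assumptions for illustration only): `linkage` builds the merge tree, and `fcluster` cuts it either at a distance threshold or, as here, at a requested maximum number of clusters.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.cluster.hierarchy import linkage, fcluster\n",
"\n",
"# Synthetic data: three well-separated blobs of 20 points each (illustrative only)\n",
"rng = np.random.default_rng(0)\n",
"centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])\n",
"X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])\n",
"\n",
"# Ward linkage merges, at each step, the two clusters whose union\n",
"# gives the smallest increase in within-cluster variance\n",
"Z = linkage(X, method='ward')\n",
"\n",
"# Cut the tree so that at most 3 flat clusters remain\n",
"labels = fcluster(Z, t=3, criterion='maxclust')\n",
"print(np.unique(labels, return_counts=True))\n",
"```\n",
"\n",
"Cutting by `criterion='maxclust'` fixes the number of groups directly, whereas a distance threshold lets the number of clusters depend on where the tree is cut."
]
},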
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 3 20 3 20 14 11 6 13 1 8 11 2 16 1 1 13 5 7 13 6 16 8 2 7\n",
" 2 7 7 20 8 11 8 4 20 11 8 15 10 3 6 3 9 16 8 2 9 14 18 18\n",
" 6 12 18 20 20 1 9 20 2 16 16 17 12 7 20 18 7 11 4 16 20 9 6 9\n",
" 2 17 17 19 11 20 13 20 20 12 11 20 2 18 10 17 2 20 20 9 10 1 18 10\n",
" 20 20 20 5 4 18 18 20 17 18 18 9 20 5 5 3 20 20 3 1 7 20 20 11\n",
" 5 17 17 8 6 13 17 20 7 1 5 3 5 11 18 17 18 18 18 17 8 11 20 7\n",
" 9 15 10 15 8 20 17 9 3 16 2 2 20 15 20 2 9 10 17 20 4 3 9 9\n",
" 20 18 7 6 13 17 20 3 11 5 10 3 11 17 19 20 9 3 1 4 3 6 20 10\n",
" 3 13 11 5 20 6 6 6 18 17 18 20 8 11 1 2 18 2 20 17 8 5 7 2\n",
" 6 9 3 7 4 1 13 8 18 17 17 5 14 17 5 2 18 17 20 5 1 5 3 18\n",
" 20 20 6 3 17 3 10 5 10 17 11 17 18 17 3 6 20 20 9 3 9 13 18 8\n",
" 6 11 13 16 20 13 2 20 9 17 10 17 7 18 8 15 5 10 10 1 7 8 16 5\n",
" 20 17 17 6 4 5 8 9 6 18 7 8 13 6 11 2 9 17 1 18 4 6 11 17\n",
" 6 20 3 17 20 5 17 1 9 6 6 3 17 18 6 10 5 11 16 20 13 8 20 5\n",
" 13 7 3 2 18 18 19 11 8 10 18 13 20 19 11 9 11 20 17 1 6 13 10 4\n",
" 3 8 8 1 9 20 6 20 18 7 14 15 17 20 17 2 18 17 5 16 18 20 15 15\n",
" 18 20 3 2 1 18 11 5 18 9 4 15 15 6 20 5 6 8 7 10 2 17 7 20\n",
" 3 5 11 17 16 18 5 5 15 5 18 6 17 20 17 20 2 5 12 5 17 7 13 11\n",
" 18 17 15 13 10 9 20 11 5 18 17 3 6 14 18 17 6 17 20 6 17 8 17 2\n",
" 8 20 3 8 8 20 11 11 11 18 20 17 13 16 16 17 17 10 17 8 6 11 11 8\n",
" 5 17 20 17 13 5 17 16 11 8 17 11 11 3 12 8 11 18 1 9 18 20 19 11\n",
" 11 10 5 18 15 8 7 18 8 20 20 5 3 10 10 8 20 17 12 3 17 20 20 11\n",
" 18 18 18 17 17 13 15 13 8 8 18 4 3 6 3 11 17 2 1 17 8 9 20 18\n",
" 8 18 17 11 17 8 10 10 7 5 17 20 18 15 17 11 9 6 11 18 20 20 18 20\n",
" 15 5 11 1 5 11 8 10 3 20 7 11 1 13 2 17 7 15 9 4 11 20 5 20\n",
" 18 13 17 3 13 17 4 20 17 20 20 5 3 11 3 11 11 20 3 13 17 15 16 17\n",
" 20 11 18 18 10 11 3 17 20 17 11 7 11 18 3 20 18 17 1 13 18 17 6 5\n",
" 7 20 20 17 17 20 17 5 20 16 10 3 10 4 2 2 7 17 1 7 11 10 9 20\n",
" 10 5 8 3 1 17 6 20 20 5 17 6 8 17 18 20 18 5 11 2 16 7 20 1\n",
" 5 13 16 16 17 3 1 13 11 11 12 20 1 6 9 9 2 18 11 3 4 8 17 7\n",
" 11 17 7 11 8 11 17 18 18 20 6 7 5 20 8 20 18 11 17 7 2 20 20 3\n",
" 16 8 5 16 5 1 4 17 20 5 3 4 10 7 17 1 15 2 11 8 17 11 7 18]\n"
]
}
],
"source": [
"linkage_matrix = linkage(data_scaled, method='ward')\n",
"plt.figure(figsize=(10, 7))\n",
"dendrogram(linkage_matrix)\n",
"plt.title('Dendrogram of agglomerative clustering')\n",
"plt.xlabel('Sample index')\n",
"plt.ylabel('Distance')\n",
"plt.show()\n",
"\n",
"# Cut the tree at a distance threshold of 10 to get flat cluster labels\n",
"result = fcluster(linkage_matrix, t=10, criterion='distance')\n",
"print(result)  # Print the cluster label of each sample"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Visualizing the cluster distribution"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"sns.set(style=\"whitegrid\")\n",
"\n",
"plt.figure(figsize=(16, 12))\n",
"\n",
"# Points are colored by the Outcome column, not by the cluster labels in `result`\n",
"# Glucose level vs. body mass index\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['BMI'], hue=df_cleaned['Outcome'], palette='Set1', alpha=0.6)\n",
"plt.title('Glucose vs BMI')\n",
"\n",
"# Glucose level vs. age\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['Age'], hue=df_cleaned['Outcome'], palette='Set1', alpha=0.6)\n",
"plt.title('Glucose vs Age')\n",
"\n",
"# Blood pressure vs. body mass index\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['BloodPressure'], y=df_cleaned['BMI'], hue=df_cleaned['Outcome'], palette='Set1', alpha=0.6)\n",
"plt.title('BloodPressure vs BMI')\n",
"\n",
"# DiabetesPedigreeFunction vs. body mass index\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['DiabetesPedigreeFunction'], y=df_cleaned['BMI'], hue=df_cleaned['Outcome'], palette='Set1', alpha=0.6)\n",
"plt.title('DiabetesPedigreeFunction vs BMI')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### KMeans (non-hierarchical clustering) for comparison"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Non-hierarchical clustering is a way of grouping data in which objects are assigned to a preset number of clusters ($k$ in the K-Means method used here), based on a chosen distance or similarity metric. Unlike hierarchical clustering, which builds a tree of clusters, the non-hierarchical approach works with a fixed number of clusters and assigns objects to groups directly.\n",
"\n",
"K-Means:\n",
"* One of the most popular methods.\n",
"* Splits the data into $k$ clusters by minimizing the sum of squared distances from each point to its centroid.\n",
"* The centroids are updated iteratively until the result stabilizes."
]
},
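{
"cell_type": "markdown",
"metadata": {},
"source": [
"The iterative update described for K-Means can be sketched in plain NumPy. This is a simplified illustration of Lloyd's algorithm, not scikit-learn's implementation: the function name `kmeans_sketch`, the synthetic data and the random initialization from data points are all assumptions made for the example.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def kmeans_sketch(X, k, n_iter=100, seed=0):\n",
"    rng = np.random.default_rng(seed)\n",
"    # Start from k distinct data points chosen at random\n",
"    centroids = X[rng.choice(len(X), size=k, replace=False)]\n",
"    for _ in range(n_iter):\n",
"        # Assignment step: each point goes to its nearest centroid\n",
"        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)\n",
"        labels = dists.argmin(axis=1)\n",
"        # Update step: each centroid moves to the mean of its points\n",
"        # (a centroid with no points is left where it is)\n",
"        new_centroids = np.array([\n",
"            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]\n",
"            for j in range(k)\n",
"        ])\n",
"        # Stop once the centroids no longer move\n",
"        if np.allclose(new_centroids, centroids):\n",
"            break\n",
"        centroids = new_centroids\n",
"    return labels, centroids\n",
"\n",
"# Synthetic data: two blobs around (0, 0) and (5, 5)\n",
"rng = np.random.default_rng(1)\n",
"X = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)), rng.normal(5.0, 0.5, size=(30, 2))])\n",
"labels, centroids = kmeans_sketch(X, k=2)\n",
"print(centroids.round(2))\n",
"```\n",
"\n",
"Because the result depends on the starting centroids, library implementations usually run several initializations and keep the best one; fixing a seed, as the notebook does with `random_state`, makes a run reproducible."
]
},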
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cluster centers:\n",
" [[103.03726708 33.13167702 72.86335404 29.18322981]\n",
" [105.31168831 25.04350649 45.6038961 25.57792208]\n",
" [136.91472868 29.89457364 78.20155039 53.64341085]\n",
" [158.21472393 37.96809816 76.68711656 32.34969325]]\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.cluster import KMeans\n",
"from sklearn.preprocessing import StandardScaler\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"# Scale the four features used for clustering\n",
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(df_cleaned[['Glucose', 'BMI', 'BloodPressure', 'Age']])\n",
"\n",
"# Fit K-Means\n",
"random_state = 17\n",
"kmeans = KMeans(n_clusters=4, random_state=random_state)\n",
"labels = kmeans.fit_predict(data_scaled)\n",
"centers = kmeans.cluster_centers_\n",
"\n",
"# Transform the cluster centers back to the original units\n",
"centers = scaler.inverse_transform(centers)\n",
"print(\"Cluster centers:\\n\", centers)\n",
"\n",
"# Plot the clustering result\n",
"plt.figure(figsize=(16, 12))\n",
"\n",
"# Glucose vs. BMI\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['BMI'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 0], centers[:, 1], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Glucose vs BMI')\n",
"plt.legend()\n",
"\n",
"# Glucose vs. Age\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['Age'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 0], centers[:, 3], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Glucose vs Age')\n",
"plt.legend()\n",
"\n",
"# BloodPressure vs. BMI\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['BloodPressure'], y=df_cleaned['BMI'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 2], centers[:, 1], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: BloodPressure vs BMI')\n",
"plt.legend()\n",
"\n",
"# BloodPressure vs. Age\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['BloodPressure'], y=df_cleaned['Age'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 2], centers[:, 3], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: BloodPressure vs Age')\n",
"plt.legend()\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### PCA for visualization in reduced dimensions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PCA (Principal Component Analysis) is a dimensionality reduction method that transforms high-dimensional data into a space with fewer dimensions while preserving as much of the information (variance) in the original data as possible.\n",
"\n",
"When plotting clustering results, PCA is used to project the multidimensional data onto a two-dimensional plane so that the clusters are easy to visualize."
]
},
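{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of what the projection does, the principal components can be computed by hand from the singular value decomposition of the centered data and compared with `sklearn.decomposition.PCA`. The random data below is an assumption for illustration; the signs of the components may differ between the two approaches, so absolute values are compared.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"\n",
"# Synthetic data: 100 samples, 4 correlated features (illustrative only)\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))\n",
"\n",
"# Center the data, then take the SVD: rows of Vt are the principal directions\n",
"X_centered = X - X.mean(axis=0)\n",
"U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)\n",
"X_2d_manual = X_centered @ Vt[:2].T\n",
"\n",
"# Share of the total variance kept by the first two components\n",
"explained = (S[:2] ** 2) / np.sum(S ** 2)\n",
"\n",
"pca = PCA(n_components=2)\n",
"X_2d = pca.fit_transform(X)\n",
"print(np.allclose(np.abs(X_2d_manual), np.abs(X_2d)))\n",
"print(explained.round(3), pca.explained_variance_ratio_.round(3))\n",
"```\n",
"\n",
"The explained variance ratio indicates how much of the original spread survives in the two-dimensional picture, which helps judge how far a 2D cluster plot can be trusted."
]
},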
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
477zwACPvOuvLbAYD33nsPZrMZjz/+OC677LKIMaiqij/+8Y+YM2cOnnrqKYwdOxZ///vf8eWXXwIIrmv8m9/8Bt9
"text/plain": [
"<Figure size 1600x600 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"# Снижение размерности с использованием PCA\n",
"pca = PCA(n_components=2)\n",
"reduced_data = pca.fit_transform(data_scaled)\n",
"\n",
"# Визуализация сокращенных данных\n",
"plt.figure(figsize=(16, 6))\n",
"\n",
"# Визуализация для KMeans кластеризации\n",
"plt.subplot(1, 2, 1)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.title('PCA Reduced Data: KMeans Clustering')\n",
"\n",
"# Визуализация для исходных данных с категорией Outcome\n",
"plt.subplot(1, 2, 2)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=df_cleaned['Outcome'], palette='Set2', alpha=0.6)\n",
"plt.title('PCA Reduced Data: Outcome Classification')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Анализ инерции для метода локтя (метод оценки суммы квадратов расстояний)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Анализ инерции для метода локтя — это техника, используемая для определения оптимального числа кластеров в задаче кластеризации (например, для алгоритма K-Means). Метод основывается на оценке суммы квадратичных отклонений (или инерции) объектов от центров их кластеров.\n",
"\n",
"Инерция (в контексте кластеризации) — это метрика, которая измеряет \"плотность\" кластеров, то есть, насколько близко точки внутри каждого кластера расположены к его центроиду.\n",
"Формально инерция определяется как сумма квадратов расстояний всех точек до ближайшего центра кластера.\n",
"\n",
"Метод локтя:\n",
"1. Для различных значений 𝑘 (количества кластеров) вычисляется инерция.\n",
"2. Значения инерции отображаются на графике в зависимости от 𝑘.\n",
"3. Смотрится точка, после которой уменьшение инерции значительно замедляется. Эта точка называется локтем, и соответствующее значение 𝑘 считается оптимальным числом кластеров."
]
},
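{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged formal sketch of the definition above (the symbols are notation introduced here, not taken from the original notebook): for points $x_1, \\dots, x_n$ partitioned into clusters $C_1, \\dots, C_k$ with centroids $\\mu_1, \\dots, \\mu_k$, the inertia can be written as\n",
"\n",
"$$W(k) = \\sum_{j=1}^{k} \\sum_{x_i \\in C_j} \\lVert x_i - \\mu_j \\rVert^2, \\qquad \\mu_j = \\frac{1}{|C_j|} \\sum_{x_i \\in C_j} x_i .$$\n",
"\n",
"This is the quantity scikit-learn exposes as `KMeans.inertia_`. For an optimal partition it does not increase as $k$ grows, which is why the elbow method looks for the point where the decrease flattens rather than for a minimum."
]
},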
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA14AAAImCAYAAABD3lvqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB8+0lEQVR4nO3deVhV1f7H8c9hngSZwXlAUQIHFM1+OZGZlXVTm9PKW2pmeVO7ll3rXm12TDMrs8EsK0sbLZtHpxwTxQkRxYF5Epnh/P5ATh5BQATOAd6v5+ER9l57n+85revl49prLYPRaDQKAAAAAFBnbCxdAAAAAAA0dgQvAAAAAKhjBC8AAAAAqGMELwAAAACoYwQvAAAAAKhjBC8AAAAAqGMELwAAAACoYwQvAAAAAKhjBC8AAAAAqGMELwAAAACoYwQvAKgFY8aMUXBwsG6//fYLtpkyZYqCg4P1+OOP12NlAGrq+PHjCg4O1tq1ay1dCoBGgOAFALXExsZGu3btUkJCQrlzOTk5+vnnny1QFQAAsAYELwCoJSEhIXJ0dNT69evLnfv555/l7Owsf39/C1QGAAAsjeAFALXExcVFAwcOrDB4ff3117rmmmtkZ2dX7twPP/ygkSNHKiwsTP/3f/+nZ555Rjk5OZKkyMhIBQcHV/h1/PhxSdKGDRt05513qlevXurbt6+mTZumU6dOmb3GtGnTKrxHVY9QlT1CWdHXuaKionTfffepb9++Cg8P1wMPPKBDhw6Zzm/ZskXBwcHasmWLJOngwYMaMmSIbr/9dr388ssXfI2XX35ZkvTxxx/r2muvVWhoqNn5qh7bXL16dYX3Pfe6ssfJqmpX0xqq+9lU9voXOl/23+Hxxx9XZGSk2et++OGHZp/hua+zfft2s7bvvfeegoODze6Rl5en+fPna+jQoQoNDVV4eLjGjh2rffv2mV17obrGjBlj1qasjoqc3z/KjBkzxuw++fn5euWVVzRs2DCFhYVp6NChWrZsmUpKSsyuOb+WLVu2VOvaqhiNRs2YMUPdunXTH3/8Ue3rAECSyv8GAACoseuuu06PPPKIEhISFBAQIEnKzs7Wb7/9prffflu//fabWfsvv/xSjz76qG644QY98sgjOnHihBYuXKiYmBi9/fbbWrJkiQoKCpScnKyHHnpIEydO1KBBgyRJfn5++uyzz/TYY49p+PDhmjBhgtLT07V48WLddttt+vTTT+Xt7S2p9BfW2267TSNHjpQk0/2qIyQkRP/9739NP3/88cf65JNPTD9v3rxZ999/v/r27avnnntO+fn5ev3113X77bdr9erV6tixY7l7zp07V6GhoZo4caI8PDzUv39/SdKsWbMkyfR6AQEB2rp1q2bOnKmbb75ZM2fOlKurqyRVq/68vDyFhYVp5syZpmMXuu7cz/b8djWt4WI+m6eeekqXXXZZha//0UcfSZL27t2r2bNnl2t7vszMTL300ksVnnN1ddVPP/2kXr16mY59/fXXsrEx/7fY6dOna9u2bZo6daratGmjo0ePatGiRZo2bZrWrVsng8FganvzzTfrlltuMf1c9t+xNhmNRj3wwAPatWuXHnroIXXp0kVbtmzRSy+9pPj4eD399NOmtuf32Y4dO1b72so888wz+uqrr/TKK6/oyiuvrPX3CKBxI3gBQC0aNGiQnJ2dtX79et17772SpO+//17e3t5mv+hKpb9Izps3T/3799e8efNMx9u1a6d7771Xv/76qykIlI1utWnTRj169JAklZSUaN68ebryyis1f/580/Xh4eG67rrr9Oabb2r69OmSpNzcXLVr1850bdn9qsPNzc10nST9/vvvZufnz5+vtm3batmyZbK1tZUkXXnllbr66qu1ePFiLVq0yKz90aNH9ccff+iLL75Qp06dJMkUUt3c3CTJ7PXWrVsnSXriiSdMgUeSHBwcqqw9NzdXPj4+Zve70HXnfrbnt9u9e3eNar
iYzyYoKOiCr192PD8/v8K251u8eLFatGih9PT0cucGDBigH3/8Uf/+978lSQkJCdq5c6d69+6tEydOSJIKCgp05swZzZw5U9ddd50kqU+fPsrOztYLL7yglJQU+fr6mu4ZEBBgVk/Zf8fa9Ntvv2njxo1asGCBrr/+eknS//3f/8nJyUmLFi3S3XffbepP5/fZX3/9tdrXXsj8+fP10UcfacmSJRowYECtvz8AjR+PGgJALXJyclJkZKTZ44br1q3TtddeazZCIEmxsbFKSEhQZGSkioqKTF8RERFyc3PThg0bKn2tI0eOKDk5WcOHDzc73qZNG/Xs2VN//vmn6dipU6fUrFmzWniH5nJychQVFaVrr73WFCwkyd3dXYMHDzaroaz9woUL1bdv3yp/0S3TrVs3SdJbb72lpKQkFRQUqKioqFrX1tb7rkkNF/vZ1JaDBw/qo48+0pNPPlnh+cjISMXFxSk2NlaStH79enXv3l0tW7Y0tXFwcNCbb76p6667TomJidq8ebM+/PBD0wIxBQUFF11XSUmJioqKZDQaq2xT9nVu2z///FN2dnYaNmyY2TU33nij6fyFXMq1kvT+++9r2bJluv76681GRQHgYjDiBQC17Nprr9VDDz2khIQEOTo6atOmTXrkkUfKtcvIyJBU+lhWRY9mJSUlVfo6Zdf7+PiUO+fj46Po6GhJpSNrJ0+eVKtWrS7ujVTD6dOnZTQaL1jD6dOnzY498MADcnd3N3tUsSoRERGaOXOmli1bpiVLllxUfSdOnKj0kby6rOFiP5va8swzz+j6669Xz549Kzzv7++v0NBQ/fjjj+rQoYO+/vprDR8+3NRfyvz+++967rnnFBsbK1dXV3Xp0kUuLi6SVGl4upClS5dq6dKlsrW1lY+Pj6688kr961//MltwpmyU+Fx9+vSRVPr4pKenp1mIlWQaeavs87yUayVp//79uvLKK/XVV1/pnnvuUUhISKXtAaAiBC8AqGUDBgyQq6ur1q9fLxcXF7Vq1UqhoaHl2rm7u0sqnUtT9svluTw8PCp9nebNm0uSUlJSyp1LTk6Wp6enJGnfvn3Ky8srtyBGbWjWrJkMBsMFayirscz06dO1fv16TZ48We+//361H0m79dZb9ccff6ioqEhPPfWUWrVqpYkTJ1Z6TUlJif766y+NGjWqWq9x/ojkpdZwsZ9Nbfjmm2+0Z88es0dPK3LVVVfpxx9/1LXXXqs9e/ZoyZIlZsHr2LFjmjRpkoYMGaLXX39drVu3lsFg0Pvvv1/uUVOp6s9OKv38br31VpWUlOjkyZNauHChxo0bpy+++MLUZtasWWZB+dx5Wh4eHkpPT1dxcbFZgCr7B4qy/l6RS7lWkv71r3/p7rvv1vXXX6+ZM2fq448/LhfiAKAqPGoIALXMwcFBQ4YM0bfffqtvvvnGNKfkfB06dJC3t7eOHz+usLAw05e/v7/mz59fbgTifO3bt5evr6+++uors+Px8fHatWuXwsPDJUm//PKLunbtKi8vr4t+LyUlJZX+guni4qLQ0FB98803Ki4uNh0/ffq0fvnll3Lz2kJDQ7VkyRKdOHFCc+fOrXYdixYt0i+//KIXXnhB1157rcLCwqqcX7Vjxw7l5OSob9++lbYrG705f3GJS63hYj+bS1VQUKA5c+Zo0qRJZvOvKjJkyBD99ddfeu+999SrVy/5+fmZnd+zZ4/y8/M1fvx4tWnTxhSsykJX2WdWtiJgVZ+dVLoYTFhYmLp3765rr71Wd911lw4cOKDMzExTm/bt25v9b+Hc+XR9+vRRUVFRuVVDy4JbZZ/npVwrlY5QOjk56amnntLevXv19ttvV/l+AeB8jHgBQB247rrrNGHCBNnY2JitqHcuW1tbTZkyRU899ZRsbW01ePBgZWVlaenSpUpMTKzyETkbGxtNnTpVM2bM0LRp03TjjTcqPT1dS5YskYeHh8aOHau9e/fq/fff1/XXX69du3aZrk1OTp
ZUOrKRlpZWLpSlpaUpJiZGR48eNQW4C5k2bZruu+8+jR8/XnfeeacKCwu1bNkyFRQUaNKkSeXa+/v765FHHtGzzz6
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"inertias = []\n",
"clusters_range = range(1, 23)\n",
"for i in clusters_range:\n",
" kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
" kmeans.fit(data_scaled)\n",
" inertias.append(kmeans.inertia_)\n",
"\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range, inertias, marker='o')\n",
"plt.title('Метод локтя для оптимального k')\n",
"plt.xlabel('Количество кластеров')\n",
"plt.ylabel('Инерция')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Можно заметить, что после 19-го кластера функция начинает принимать линейный вид, что говорит о следующем: создание более 19-го кластера - не самое оптимальное решение, дальнейшее разбиение данных становится избыточным. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Расчитаем коэффициенты силуэта"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1oAAAImCAYAAABKNfuQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAADAM0lEQVR4nOzdeVjUVdsH8O/MMMOw7zsqKrIJaCoguKZmPa1mZZvappZZPG1apI+2mPa6ZGXhlpZLZYtmZVZmlpUooriwIyDKvu/bwMy8f+BMIogMzjDD8P1cF5fyW+/fYcS555xzH4FSqVSCiIiIiIiItEao7wCIiIiIiIiMDRMtIiIiIiIiLWOiRUREREREpGVMtIiIiIiIiLSMiRYREREREZGWMdEiIiIiIiLSMiZaREREREREWsZEi4iIiIiISMuYaBEREREREWkZEy0i6jNmzZqFWbNmtdl28uRJ3H333QgICMA333yj0/u/9tprmDRpksbnTZo0Ca+99poOIiIiXfH19cX69ev1HQYR6ZGJvgMgItKXsrIyPPPMMxg6dCi2bt0KX19ffYdERERERoKJFhH1WZ9++ikaGxuxatUquLi46DscIiIiMiIcOkhEfVJFRQW++OIL3HXXXe2SrOzsbERGRmLMmDEYPnw4Zs2ahVOnTrU55s8//8T06dMxbNgwREREYNmyZaipqWlzzOeff46bb74Zw4YNw4svvoja2loAwIYNGxAeHo5Ro0Zh2bJlkMlk6nNkMhnefPNNhISEICwsTD30qK6uDgsXLsTw4cMxYcIEfP755+pzcnNz4evri71796q3NTU1YfLkyW166ToaOhkbGwtfX1/ExsZ2+D3Q2vM3atSodsMev/nmG9xxxx0IDAzExIkTsX79esjlcvX+joZKXhmr6l4dfanivN6wyY6e6WrFxcV49dVXER4ejptuugkzZ87E6dOn1fuvHuKlVCrx0EMPwdfXF7m5uW2O6yzWyMhIjB8/HgqFos39Fy9ejFtvvRUAUFhYiJdeegmjR4/GsGHDMGvWLJw5cwYAsH79+mveQxVfamoqnnvuOYwePRpDhw7FuHHjsHz5cjQ2NnbaBkePHu009q4+IwD8/vvvuPfeezFs2LBOr3WlvXv3wtfXF2fPnsW9996L4OBg3HXXXfjll1/aHJebm4tFixZh7NixGDp0KMLDw7Fo0SJUVFSoj0lJScGjjz6Km266CVOmTMHu3bvV+zp6/QLtXyfXG9Z35etux44d7f59HT9+HH5+fvj444+veY2rffjhh/D398d3333X5XOIqHdjjxYR9SlKpRIFBQVYvnw5Wlpa8PTTT7fZn5GRgRkzZsDLywtLliyBWCzGjh078Nhjj2Hbtm0IDQ1FXFwc5s+fj7vvvhsvv/wyzp8/j/fffx/p6enYtWsXRCIRDh06hLfeeguzZs3C+PHj8dVXX+HQoUMAgAMHDmD58uXIy8vDmjVrIJVKERUVBQBYvXo19uzZg0WLFsHV1RXr1q1DXl4e8vLycNttt+HDDz/EX3/9hbfeeguurq6YPHlyh8/5ySeftEkSbsTatWtRU1MDa2tr9bZNmzZh3bp1mDlzJqKiopCSkoL169ejoKAAK1as6NJ1hw4diq+++gpAa9L27bffqr+3tLTUSux1dXV4+OGHIZfLsXDhQri4uGDbtm148skn8d1338HLy6vdOd9//32bROxK999/Px544AH192+++Wabfb/++itiY2MRHh4OAGhsbMQvv/yCuXPnQiaTYc6cOWhubsayZcsgFosRHR2NWbNm4euvv8YDDzyAcePGtbnusmXLAACurq4oLi7Go48+iuHDh+Pdd9+FRCLBX3/9hU8//RTOzs6YN2/eNduhsbERrq6u+OCDDzqMvavPeOnSJfz3v//FuHHj8OKLL6pfE9e61tWefvppzJw5Ey+++CK+/fZbvPDCC9i0aRMmTJiAhoYGzJ49G3Z2dli2bBmsrK
xw+vRpfPTRR5BKpXjrrbfQ0NCAuXPnwsPDA+vXr0d8fDyWLVsGd3d3jB8/vksxaGrWrFk4ePAg/u///g8TJ06ERCLB66+/juHDh+OZZ57p0jW2bt2K6OhoLF++HPfee69O4iQiw8NEi4j6lLi4OEycOBFisRhbtmxp90b7o48+gkQiwY4dO9Rv9idOnIg777wTq1atwrfffot9+/bBy8sLK1euhFAoxJgxY2BmZoalS5fiyJEjmDRpEjZu3IiwsDAsWbIEABAWFoYxY8agpqYGK1euRGBgIACguroaW7ZswbPPPguFQoGvvvoK8+bNw8yZMwEAjo6OePDBB2Fra4s1a9ZALBZj/PjxSE9Px6ZNmzpMtAoKCrBlyxYMHToUSUlJN9ReCQkJ+P777+Hv74/q6moAQE1NDaKjo/Hggw+qn2/s2LGwtbXFkiVL8MQTT2DIkCHXvbalpSWGDx8OAPj7778BQP29tnz33XfIy8vDd999B39/fwDAiBEjMG3aNMTFxbX7+dfV1WHNmjXXbDtXV9c2MV6ZEI4dOxaurq7Yt2+fOtH67bffUF9fj2nTpuHMmTPIysrC559/jptuukkdyy233ILo6GisX78erq6uba575b3++ecf+Pv744MPPlDvj4iIwNGjRxEbG9tpotXQ0ABra+trxt7VZ0xOTkZzczNefPFF+Pj4XPdaV5s1axYWLFgAABg3bhzuvfdefPzxx5gwYQKys7Ph6uqK//u//0O/fv0AAKNHj8bZs2dx4sQJAEBeXh6CgoLw+uuvo1+/fhg7diy++OIL/P333zpLtAQCAVauXIm7774bq1evhkgkQmVlJbZv3w6RSHTd87/88kusXr0ab731Fu6//36dxEhEholDB4moTwkICMC7774LGxsbREVFtev1OXHiBG6++eY2bxxNTExwxx13IDExEXV1dXjnnXewb98+CIVCtLS0oKWlBbfeeiuEQiHi4uLQ0tKC5ORkjB07Vn0NU1NTDBs2DGZmZuokC2h9c97Y2Ii0tDSkpaWhqalJ3asBtL7RNjU1RXBwMMRicZvzkpKS2gzVU/m///s/jBo1CjfffPMNtZVSqcTy5ctx//33w8/PT7399OnTaGxsxKRJk9TP39LSoh4mePTo0TbXufKYq4fVdTWO7p576tQpeHp6qpMsADAzM8Ovv/7aptdGJTo6GnZ2dnj44Yc1vpdQKMS9996LgwcPoqGhAUBrohcREQFXV1eEhobizJkzGD58OORyOVpaWmBtbY0xY8YgLi7uutcfO3Ysdu3aBVNTU2RkZOD333/Hhg0bUF5e3mb4aUcKCgpgZWWl8TNdbejQoTAxMcGuXbuQl5cHmUyGlpYWKJXKLp1/ZW+OQCDALbfcgnPnzqGxsRH+/v744osv4OHhgezsbBw5cgRbt25FVlaW+vm8vb2xYcMG9OvXDzKZDH/99ReqqqowePDgNvdRKBRtXncdxac6piux9+vXD6+88gq+++47fPPNN1iyZIk6GezMH3/8gTfffBOjRo3CjBkzrns8ERkX9mgRUZ9iaWmJe++9F4MGDcLDDz+MF154AV999ZX6k+mqqio4Ojq2O8/R0RFKpRK1tbWwsLCAqakpgNY3nleqrq5GWVkZ5HI57Ozs2uyztbWFjY1Nm22qoVelpaXqpOnq82xsbGBra9vuvJaWljZzV4DWRPHQoUP44Ycf8NNPP3WlSa5p3759yM7OxsaNG/F///d/6u2VlZUAcM0elOLiYvXf8/Ly2rVRd+LYt28fBAIBHBwcMHLkSPz3v/9t9+a6I5WVlXBwcOjSfbKzs7F9+3Z88sknyM/P71as9913HzZu3IiDBw9i9OjROHbsGNasWaPeL5FIALTO27pyrk5XekYUCgXee+89fP7556ivr4ebmxuCg4PVr8XO5OXlwcPDoxtP1Fa/fv2wevVqvPfee+phniqhoaHXPd/Z2bnN9w4ODlAqlaiuroZUKsWnn36KjRs3orKyEo
6OjggMDISZmVm7+Y/V1dUICQkBADg5OeE///lPm/2PP/54u3tfHV90dDSio6MhEong6OiIsWPH4r///e81C+Pcfvv
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"silhouette_scores = []\n",
"for i in clusters_range[1:]: \n",
" kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
" labels = kmeans.fit_predict(data_scaled)\n",
" score = silhouette_score(data_scaled, labels)\n",
" silhouette_scores.append(score)\n",
"\n",
"# Построение диаграммы значений силуэта\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range[1:], silhouette_scores, marker='o')\n",
"plt.title('Коэффициенты силуэта для разных k')\n",
"plt.xlabel('Количество кластеров')\n",
"plt.ylabel('Коэффициент силуэта')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Средний коэффициент силуэта (silhouette score) используется для оценки качества кластеризации. Его значение лежит в диапазоне от -1 до 1. Что означают различные значения:\n",
"\n",
"* Близко к 1.0 (0.71.0): Кластеры хорошо разделены и компактны. Это отличный результат кластеризации.\n",
"* От 0.5 до 0.7: Кластеры четко различимы, но есть некоторое пересечение между ними. Это хороший результат.\n",
"* От 0.25 до 0.5: Кластеры перекрываются, что указывает на менее четкую границу между группами. Качество кластеризации удовлетворительное, но может потребоваться уточнение числа кластеров или доработка данных.\n",
"* Близко к 0.0: Кластеры сильно перекрываются или распределение данных не позволяет выделить четкие группы. В этом случае нужно пересмотреть выбор числа кластеров, алгоритм или исходные данные.\n",
"* Меньше 0.0: Плохая кластеризация: точки ближе к центрам чужих кластеров, чем к своим. Это сигнал о том, что данные плохо структурированы для текущей кластеризации."
]
},
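{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged formal sketch of the score discussed above, with notation introduced here rather than taken from the original notebook: for a point $i$ in cluster $C_I$, let $a(i)$ be its mean distance to the other points of $C_I$, and let $b(i)$ be the smallest mean distance from $i$ to the points of any other cluster. Then\n",
"\n",
"$$s(i) = \\frac{b(i) - a(i)}{\\max\\{a(i),\\, b(i)\\}}, \\qquad \\bar{s} = \\frac{1}{n} \\sum_{i=1}^{n} s(i).$$\n",
"\n",
"The mean $\\bar{s}$ is what `sklearn.metrics.silhouette_score` reports. A negative $s(i)$ means $a(i) > b(i)$, that is, point $i$ sits closer on average to a neighboring cluster than to its own."
]
},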
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Средний коэффициент силуэта: 0.213\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1EAAAJzCAYAAADulpkjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3xUVf7/8df0kmTSey+QEBIIvUuxYAEVe9ddu65+7auru+uu5acr6q69rWvBghUbCIIUQaT3TnpCek8mmf77I2TIkAkkkJCAn+fj4WPNvTP3nrm5We97zjmfo3C5XC6EEEIIIYQQQnSJsq8bIIQQQgghhBAnEwlRQgghhBBCCNENEqKEEEIIIYQQohskRAkhhBBCCCFEN0iIEkIIIYQQQohukBAlhBBCCCGEEN0gIUoIIYQQQgghukFClBBCCCGEEEJ0g4QoIYQQQgghhOgGCVFCiGN27bXXkpqa6vHPyJEjue6661i7dm1fN08IcYpLTU3l5Zdf7rB97969jBs3jsmTJ5OXl9fp+19++WVSU1PJzMyksbHR62s++eQTUlNTmTZtWk81WwhxCpAQJYQ4Lunp6cydO5e5c+fy8ccf88wzz6DRaLjxxhvZt29fXzdPCPE7s2/fPm644QYMBgNz5swhISHhqO+x2+38/PPPXvfNnz+/h1sohDgVSIgSQhwXX19fsrKyyMrKYsSIEZxxxhm8/PLLKJVKvvrqq75unhDidyQ7O5vrr78eHx8f5syZQ2xsbJfeN3z4cBYsWNBhe1lZGevXr2fQoEE93VQhxElOQpQQoscZDAZ0Oh0KhcK97dprr+Xaa6/1eN3zzz9PamqqR9iaM2cOp59+OsOGDeOaa65h7969AHz00UekpqaSm5vrcYxvvvmGQYMGUVJSAsDixYu56qqrGDZsGBkZGZx99tl89NFHHu95+OGHOwxDbPunqKjI/ZrDh+98+umnHYYPzZ8/n3PPPZesrCwuuugi1q9f7/Geo7VnzZo1pKamsmbNGo/3HX69unL9rFYrzz77LJMnT2bQoEEen+tIgfbwYz/11FNkZmayYsUK4NCQJ2//tG93V659eXk5f/7znxk3bpz7d7xp0yYApk2bdtTfy/r167nmmmsYOnQoo0eP5s9//jPV1dXu43/11VekpqayZcsWZs2axZAhQ5g5cyY//vijRzsaGhr4f//v/3HGGWeQmZnJjBkz+OKLLzxe0749aWlpjBo1irvuuouamppOryVATk4Of/rTnxg9ejSjRo3i1ltvJTs7u9PXH+n6tv+95eXlcffddzNhwgSysrK49tpr2bBhg3t/UVGR+33ffvutxzmWLl3q3tfe/Pnzueiiixg2bBgTJkzgb3/7G3V1dR3a1p63e3HatGk8/PDDnf58uLa2tv98Gzdu5PLLLyczM5MJEybwxBNP0NLS0ukxDpednc11112Hn58fc+bMISoqqsvvPffcc1m5cmWHIX0//vgjiYmJpKWldXjP4sWLueiii9ztffLJJzGbzR1e05W//9WrV/PHP/6RoUOHMmHCBJ577jkcDof7datWreKyyy5j2LBhjBo1ittvv/2I95QQovdJiBJCHBeXy4Xdbsdut2Oz2aioqOD555/HarVy8cUXd/q+goIC3nvvPY9tixYt4oknnuC8887j1VdfxeFwcNttt2G1Wpk5cyY6nY5vvvnG4z3z5s1j3LhxREZGsmzZMu68804GDx7Ma6+9xssvv0xsbCz//Oc/2bJli8f7QkND3cMQ586dy+23337Ez1lXV8e///1vj21bt27lgQceICsri9dff53IyEhuu+02KisrAbrVnu7ydv3efvtt3n//fa6//nref/995s6dyyuvvNKt427dupVPPvmEf//73wwbNsxjX/vr9be//c1jX1c+a1NTE1deeSVr1qzhwQcf5JVXXkGn0/HHP/6RvLw8XnnlFY8233777e7zhYWFsW7dOm644Qb0ej
3//ve/+ctf/sLatWu57rrrOjxs33rrrZx++um88sorJCYmcs8997B8+XIAWlpauOqqq/juu++46aabeO211xgxYgSPPvoob7zxhsdxJk+ezNy5c/nwww+5//77WbVqFU899VSn16+srIzLL7+cvLw8Hn/8cZ577jkqKyu5/vrrqa2tPeK1b399D/+97d+/n4suuoiioiIee+wxZs+ejUKh4Prrr+8w/9DHx6fD0LT58+ejVHr+J/+1117jvvvuIysri5deeok777yThQsXcu2113YrvPSEkpISbrzxRgIDA3nllVe4++67+eabb3jooYe69P6cnByuv/56fH19mTNnDuHh4d06//Tp03E4HF6v23nnndfh9d999x133nknSUlJvPrqq/zpT3/i22+/5Y477sDlcgHd+/t/4IEHGDFiBG+88QYzZszgnXfe4fPPPwegsLCQO+64g4yMDF5//XWeeuopcnNzueWWW3A6nd36nEKInqPu6wYIIU5u69atY/DgwR2233fffSQnJ3f6vqeffpoBAwawY8cO97bq6mquuuoq7rvvPqC1Z6XtW/xBgwZx5pln8u233/J///d/KBQKSktL+e2333juueeA1gfNWbNm8eijj7qPOWzYMMaMGcOaNWsYOnSoe7tWqyUrK8v9c05OzhE/50svvURUVJRHL0RpaSnTp0/nySefRKlUEhISwowZM9i8eTNnnHFGt9rTXd6u39atW0lLS+OPf/yje1tbD05XtfUEnn766R32tb9eFovFY19XPuvXX39NcXExX3/9tXt41PDhw7nwwgtZt24dl156qUeb4+LiPM75/PPPk5iYyJtvvolKpQJg6NChnHfeeXz55ZdcffXV7tdee+213HnnnQBMmjSJWbNm8eqrrzJ58mS++uor9u7dy6effuoOipMmTcJut/Paa69xxRVXEBAQAEBQUJC7DaNGjeLXX3/1uOaHe++997Barfzvf/8jNDQUgLS0NK688kq2bNnC5MmTO31v+896+O/tlVdeQavV8sEHH+Dr6wvAlClTmDFjBv/61788etFOO+00fvnlF6xWK1qtFovFwpIlSxg1apS757Curo7XX3+dyy67zCMQDxw4kKuvvrrD9extb7/9NoGBgbz66qvu361SqeSxxx5jz549HXrD2svLy+O6666jsrISm812TMEiJCSEUaNGsWDBAs4//3wAiouL2bJlC//61794/fXX3a91uVzMnj2bSZMmMXv2bPf2hIQEbrjhBpYvX86UKVO69fd/6aWXuu/XcePGsXjxYpYtW8YVV1zB1q1baWlp4dZbb3WHw4iICJYsWYLZbHbfD0KIE0tClBDiuAwePJh//OMfQOvDRX19PStWrODFF1/EbDZz7733dnjPihUr+PXXX3n77be57rrr3NuvuOIKAJxOJ2azmUWLFqHX64mOjgbgkksu4fvvv2f9+vWMGjWKefPm4ePjw5lnngnATTfdBLT2eOTm5lJQUMC2bduA1kB2rPbu3evujWhrI8BZZ53FWWedhcvlwmw2s2DBApRKJYmJib3ans6uX2ZmJm+99RYLFy5k7Nix+Pj4dPmB0uVysWnTJubPn9+hh6sruvJZN2zYQExMjMf8EoPBwMKFC496/ObmZrZs2cKNN97o7v0EiI2NJTk5mVWrVnk89M+aNcv97wqFgjPPPJOXX36ZlpYW1q5dS3R0dIeetvPPP58vvvjCI+y0ncvpdLJ79242bNjA+PHjO23nhg0byMrKcgcoaH3gXbp06VE/45GsXbuWqVOnejwwq9Vqd69tU1OTe/vYsWNZsWIFa9asYdKkSaxYsQJfX19GjhzpDlGbN2/GarUyY8YMj/OMHDmS6Oho1q5de9whqu3aKZXKDr1gbZxOJ3a7nfXr1zNx4kR3gILWMAit1/RIIer7778nIyODF198kT/+8Y88+OCDvPfeex7ndDgc7h4iaL0n2p8LWof0PfnkkzQ2NuLr68sPP/zA4M
GDiY+P93hdTk4OpaWl3Hrrre77EFpDtq+vL6tWrWLKlCnd+vs//F6MiIhwDw0cOnQoOp2OSy65hLPPPpvTTjuNMWP
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.metrics import silhouette_score\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.decomposition import PCA\n",
"\n",
"# ========================\n",
"# Масштабирование данных\n",
"# ========================\n",
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(df_cleaned[['Glucose', 'BMI', 'BloodPressure', 'Age']])\n",
"\n",
"# ========================\n",
"# Применение K-Means\n",
"# ========================\n",
"kmeans = KMeans(n_clusters=4, random_state=42) \n",
"df_clusters = kmeans.fit_predict(data_scaled)\n",
"\n",
"# ========================\n",
"# Оценка качества кластеризации\n",
"# ========================\n",
"silhouette_avg = silhouette_score(data_scaled, df_clusters)\n",
"print(f'Средний коэффициент силуэта: {silhouette_avg:.3f}')\n",
"\n",
"# ========================\n",
"# Визуализация кластеров\n",
"# ========================\n",
"pca = PCA(n_components=2)\n",
"df_pca = pca.fit_transform(data_scaled)\n",
"\n",
"plt.figure(figsize=(10, 7))\n",
"sns.scatterplot(x=df_pca[:, 0], y=df_pca[:, 1], hue=df_clusters, palette='viridis', alpha=0.7)\n",
"plt.title('Визуализация кластеров с помощью K-Means')\n",
"plt.xlabel('Первая компонентa PCA')\n",
"plt.ylabel('Вторая компонентa PCA')\n",
"plt.legend(title='Кластер', loc='upper right')\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В нашем случае, результат находится ближе к хорошему, но пока что больше соответствует удовлетворительному состоянию. На графике видно, что кластеры имеют некоторую степень пересечения, что приемлемо. Это может указывать на сложность четкого разделения групп пациентов из-за схожести их характеристик (например, уровня глюкозы, индекса массы тела или давления). Однако, кластеризация все же предоставляет полезное разделение для анализа данных и дальнейшей интерпретации"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "aimenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}