712 lines
2.1 MiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"## Start of the lab work"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"*Variant 3:* Diabetes among the Pima Indians"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"#### Dataset attributes\n",
"\n",
"Pregnancies: number of pregnancies.\n",
"\n",
"Glucose: blood glucose level.\n",
"\n",
"BloodPressure: diastolic blood pressure.\n",
"\n",
"SkinThickness: triceps skinfold thickness.\n",
"\n",
"Insulin: serum insulin level.\n",
"\n",
"BMI: body mass index.\n",
"\n",
"DiabetesPedigreeFunction: diabetes pedigree function.\n",
"\n",
"Age: age.\n",
"\n",
"Outcome: presence of diabetes (0 = no, 1 = yes).\n",
"\n",
"Goal: group the Pima Indians by \"interesting\" characteristics for analysis and reporting, covering the risk of developing diabetes (based on glucose level, BMI and DiabetesPedigreeFunction); age groups with high incidence (for example, the young and the elderly); risk factors in women with pregnancies (Pregnancies, Insulin and SkinThickness); and blood pressure and insulin levels in people with confirmed diabetes."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 1,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n",
|
|||
|
"0 6 148 72 35 0 33.6 \n",
|
|||
|
"1 1 85 66 29 0 26.6 \n",
|
|||
|
"2 8 183 64 0 0 23.3 \n",
|
|||
|
"3 1 89 66 23 94 28.1 \n",
|
|||
|
"4 0 137 40 35 168 43.1 \n",
|
|||
|
"\n",
|
|||
|
" DiabetesPedigreeFunction Age Outcome \n",
|
|||
|
"0 0.627 50 1 \n",
|
|||
|
"1 0.351 31 0 \n",
|
|||
|
"2 0.672 32 1 \n",
|
|||
|
"3 0.167 21 0 \n",
|
|||
|
"4 2.288 33 1 \n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"from scipy.cluster.hierarchy import dendrogram, linkage, fcluster\n",
|
|||
|
"from sklearn.cluster import KMeans\n",
|
|||
|
"from sklearn.decomposition import PCA\n",
|
|||
|
"from sklearn.preprocessing import StandardScaler\n",
|
|||
|
"from sklearn.metrics import silhouette_score\n",
|
|||
|
"\n",
|
|||
|
"df = pd.read_csv(\"C:/Users/TIGR228/Desktop/МИИ/Lab1/AIM-PIbd-31-Afanasev-S-S/static/csv/diabetes.csv\")\n",
|
|||
|
"df = df.head(1500)\n",
|
|||
|
"print(df.head())"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"#### Data cleaning\n",
"\n",
"Drop the columns that are not needed for the analysis (SkinThickness and Insulin) and any rows with missing values."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 2,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" Pregnancies Glucose BloodPressure BMI DiabetesPedigreeFunction Age \\\n",
|
|||
|
"0 6 148 72 33.6 0.627 50 \n",
|
|||
|
"1 1 85 66 26.6 0.351 31 \n",
|
|||
|
"2 8 183 64 23.3 0.672 32 \n",
|
|||
|
"3 1 89 66 28.1 0.167 21 \n",
|
|||
|
"4 0 137 40 43.1 2.288 33 \n",
|
|||
|
"\n",
|
|||
|
" Outcome \n",
|
|||
|
"0 1 \n",
|
|||
|
"1 0 \n",
|
|||
|
"2 1 \n",
|
|||
|
"3 0 \n",
|
|||
|
"4 1 \n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"df_cleaned = df.drop(columns=['SkinThickness', 'Insulin'], errors='ignore').dropna()\n",
|
|||
"print(df_cleaned.head())  # Print the cleaned DataFrame"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"#### Visualizing pairwise relationships"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data omitted)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1600x1200 with 4 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import seaborn as sns\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"\n",
|
|||
|
"sns.set(style=\"whitegrid\")\n",
|
|||
|
"\n",
|
|||
|
"plt.figure(figsize=(16, 12))\n",
|
|||
|
"\n",
|
|||
"# Relationship between glucose level and body mass index\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['BMI'], alpha=0.6)\n",
"plt.title('Glucose vs BMI')\n",
"\n",
"# Relationship between glucose level and age\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['Age'], alpha=0.6)\n",
"plt.title('Glucose vs Age')\n",
"\n",
"# Relationship between blood pressure and body mass index\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['BloodPressure'], y=df_cleaned['BMI'], alpha=0.6)\n",
"plt.title('BloodPressure vs BMI')\n",
"\n",
"# Relationship between DiabetesPedigreeFunction and body mass index\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['DiabetesPedigreeFunction'], y=df_cleaned['BMI'], alpha=0.6)\n",
"plt.title('DiabetesPedigreeFunction vs BMI')\n",
|
|||
|
"\n",
|
|||
|
"plt.tight_layout()\n",
|
|||
|
"plt.show()\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"#### Standardizing the data for clustering"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Standardization rescales every feature (column) to a common scale: zero mean and unit variance."
|
|||
|
]
|
|||
|
},
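To make the definition concrete, here is a minimal sketch of the z-score formula that standardization applies to a single column, using only the standard library. The helper name `standardize` is hypothetical; the population standard deviation is used on the assumption that it matches the convention of scikit-learn's `StandardScaler`.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation.

    Hypothetical helper for illustration; assumes the column is not constant,
    otherwise the standard deviation is zero and the division fails.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population std (divides by n, not n - 1)
    return [(x - mu) / sigma for x in values]


# First five Glucose values from the df.head() output above.
glucose = [148, 85, 183, 89, 137]
scaled = standardize(glucose)
print([round(z, 3) for z in scaled])
```

After this transformation the column has mean 0 and standard deviation 1, so features measured in different units contribute comparably to distance-based clustering.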
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"scaler = StandardScaler()\n",
|
|||
|
"data_scaled = scaler.fit_transform(df_cleaned)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"#### Agglomerative (hierarchical) clustering"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Hierarchical clustering is a machine learning method that groups objects (data points) by their similarity or their distance from one another. The core idea is to build a tree of clusters (a dendrogram) that shows how objects are grouped at different levels."
|
|||
|
]
|
|||
|
},
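The bottom-up merging described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `agglomerative` and the toy 1-D data are hypothetical, and it uses single linkage (closest pair of members) for brevity, whereas the notebook itself uses Ward linkage from SciPy.

```python
def agglomerative(points, n_clusters):
    """Naive single-linkage agglomerative clustering on 1-D points.

    Hypothetical sketch: starts with one cluster per point and repeatedly
    merges the two closest clusters until `n_clusters` remain.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []                       # merge history, which a dendrogram draws
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


clusters, merges = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
print(clusters)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 3))
```

Each entry in `merges` corresponds to one horizontal join in a dendrogram, drawn at the height of the merge distance; cutting the tree at a chosen height yields the flat clusters.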
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data omitted)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x700 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"[ 3 20 3 20 14 11 6 13 1 8 11 2 16 1 1 13 5 7 13 6 16 8 2 7\n",
|
|||
|
" 2 7 7 20 8 11 8 4 20 11 8 15 10 3 6 3 9 16 8 2 9 14 18 18\n",
|
|||
|
" 6 12 18 20 20 1 9 20 2 16 16 17 12 7 20 18 7 11 4 16 20 9 6 9\n",
|
|||
|
" 2 17 17 19 11 20 13 20 20 12 11 20 2 18 10 17 2 20 20 9 10 1 18 10\n",
|
|||
|
" 20 20 20 5 4 18 18 20 17 18 18 9 20 5 5 3 20 20 3 1 7 20 20 11\n",
|
|||
|
" 5 17 17 8 6 13 17 20 7 1 5 3 5 11 18 17 18 18 18 17 8 11 20 7\n",
|
|||
|
" 9 15 10 15 8 20 17 9 3 16 2 2 20 15 20 2 9 10 17 20 4 3 9 9\n",
|
|||
|
" 20 18 7 6 13 17 20 3 11 5 10 3 11 17 19 20 9 3 1 4 3 6 20 10\n",
|
|||
|
" 3 13 11 5 20 6 6 6 18 17 18 20 8 11 1 2 18 2 20 17 8 5 7 2\n",
|
|||
|
" 6 9 3 7 4 1 13 8 18 17 17 5 14 17 5 2 18 17 20 5 1 5 3 18\n",
|
|||
|
" 20 20 6 3 17 3 10 5 10 17 11 17 18 17 3 6 20 20 9 3 9 13 18 8\n",
|
|||
|
" 6 11 13 16 20 13 2 20 9 17 10 17 7 18 8 15 5 10 10 1 7 8 16 5\n",
|
|||
|
" 20 17 17 6 4 5 8 9 6 18 7 8 13 6 11 2 9 17 1 18 4 6 11 17\n",
|
|||
|
" 6 20 3 17 20 5 17 1 9 6 6 3 17 18 6 10 5 11 16 20 13 8 20 5\n",
|
|||
|
" 13 7 3 2 18 18 19 11 8 10 18 13 20 19 11 9 11 20 17 1 6 13 10 4\n",
|
|||
|
" 3 8 8 1 9 20 6 20 18 7 14 15 17 20 17 2 18 17 5 16 18 20 15 15\n",
|
|||
|
" 18 20 3 2 1 18 11 5 18 9 4 15 15 6 20 5 6 8 7 10 2 17 7 20\n",
|
|||
|
" 3 5 11 17 16 18 5 5 15 5 18 6 17 20 17 20 2 5 12 5 17 7 13 11\n",
|
|||
|
" 18 17 15 13 10 9 20 11 5 18 17 3 6 14 18 17 6 17 20 6 17 8 17 2\n",
|
|||
|
" 8 20 3 8 8 20 11 11 11 18 20 17 13 16 16 17 17 10 17 8 6 11 11 8\n",
|
|||
|
" 5 17 20 17 13 5 17 16 11 8 17 11 11 3 12 8 11 18 1 9 18 20 19 11\n",
|
|||
|
" 11 10 5 18 15 8 7 18 8 20 20 5 3 10 10 8 20 17 12 3 17 20 20 11\n",
|
|||
|
" 18 18 18 17 17 13 15 13 8 8 18 4 3 6 3 11 17 2 1 17 8 9 20 18\n",
|
|||
|
" 8 18 17 11 17 8 10 10 7 5 17 20 18 15 17 11 9 6 11 18 20 20 18 20\n",
|
|||
|
" 15 5 11 1 5 11 8 10 3 20 7 11 1 13 2 17 7 15 9 4 11 20 5 20\n",
|
|||
|
" 18 13 17 3 13 17 4 20 17 20 20 5 3 11 3 11 11 20 3 13 17 15 16 17\n",
|
|||
|
" 20 11 18 18 10 11 3 17 20 17 11 7 11 18 3 20 18 17 1 13 18 17 6 5\n",
|
|||
|
" 7 20 20 17 17 20 17 5 20 16 10 3 10 4 2 2 7 17 1 7 11 10 9 20\n",
|
|||
|
" 10 5 8 3 1 17 6 20 20 5 17 6 8 17 18 20 18 5 11 2 16 7 20 1\n",
|
|||
|
" 5 13 16 16 17 3 1 13 11 11 12 20 1 6 9 9 2 18 11 3 4 8 17 7\n",
|
|||
|
" 11 17 7 11 8 11 17 18 18 20 6 7 5 20 8 20 18 11 17 7 2 20 20 3\n",
|
|||
|
" 16 8 5 16 5 1 4 17 20 5 3 4 10 7 17 1 15 2 11 8 17 11 7 18]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"linkage_matrix = linkage(data_scaled, method='ward')\n",
|
|||
|
"plt.figure(figsize=(10, 7))\n",
|
|||
|
"dendrogram(linkage_matrix)\n",
|
|||
"plt.title('Dendrogram of agglomerative clustering')\n",
"plt.xlabel('Sample index')\n",
"plt.ylabel('Distance')\n",
"plt.show()\n",
"\n",
"# Cut the tree at the given distance threshold to obtain flat cluster labels\n",
"result = fcluster(linkage_matrix, t=10, criterion='distance')\n",
"print(result)  # Print the cluster labels"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"#### Visualizing the cluster distribution"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 6,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data omitted)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1600x1200 with 4 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import seaborn as sns\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"\n",
|
|||
|
"sns.set(style=\"whitegrid\")\n",
|
|||
|
"\n",
|
|||
|
"plt.figure(figsize=(16, 12))\n",
|
|||
|
"\n",
|
|||
"# Relationship between glucose level and body mass index\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['BMI'], hue=df_cleaned['Outcome'], palette='Set1', alpha=0.6)\n",
"plt.title('Glucose vs BMI')\n",
"\n",
"# Relationship between glucose level and age\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['Age'], hue=df_cleaned['Outcome'], palette='Set1', alpha=0.6)\n",
"plt.title('Glucose vs Age')\n",
"\n",
"# Relationship between blood pressure and body mass index\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['BloodPressure'], y=df_cleaned['BMI'], hue=df_cleaned['Outcome'], palette='Set1', alpha=0.6)\n",
"plt.title('BloodPressure vs BMI')\n",
"\n",
"# Relationship between DiabetesPedigreeFunction and body mass index\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['DiabetesPedigreeFunction'], y=df_cleaned['BMI'], hue=df_cleaned['Outcome'], palette='Set1', alpha=0.6)\n",
"plt.title('DiabetesPedigreeFunction vs BMI')\n",
|
|||
|
"\n",
|
|||
|
"plt.tight_layout()\n",
|
|||
|
"plt.show()\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"#### KMeans (non-hierarchical clustering) for comparison"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Non-hierarchical clustering is a way of grouping data in which objects are assigned to a predefined number of clusters (in our case, k in K-Means) based on a chosen distance or similarity metric. Unlike hierarchical clustering, which builds a tree of clusters, the non-hierarchical approach works with a fixed number of clusters and assigns objects to groups directly.\n",
"\n",
"K-Means:\n",
"* One of the most popular methods.\n",
"* Splits the data into k clusters by minimizing the sum of squared distances from each point to its centroid.\n",
"* The centroids are updated iteratively until the result stabilizes."
|
|||
|
]
|
|||
|
},
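The assign-then-update loop described above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the function name `kmeans`, the toy points and the hand-picked initial centroids are all hypothetical, and scikit-learn's `KMeans` chooses its initial centroids differently.

```python
def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm on tuples of coordinates.

    Hypothetical sketch: `centroids` holds the initial guesses (a list of
    tuples); the loop stops early once the centroids no longer move.
    """
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[(1.0, 1.0), (9.0, 8.5)])
print(labels)
print(centroids)
```

Because the result depends on the starting centroids, library implementations usually fix a random seed (as the notebook does with `random_state`) to make runs reproducible.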
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
"Cluster centers:\n",
|
|||
|
" [[103.03726708 33.13167702 72.86335404 29.18322981]\n",
|
|||
|
" [105.31168831 25.04350649 45.6038961 25.57792208]\n",
|
|||
|
" [136.91472868 29.89457364 78.20155039 53.64341085]\n",
|
|||
|
" [158.21472393 37.96809816 76.68711656 32.34969325]]\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data omitted)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1600x1200 with 4 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from sklearn.cluster import KMeans\n",
|
|||
|
"from sklearn.preprocessing import StandardScaler\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"\n",
|
|||
"# Scale the data\n",
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(df_cleaned[['Glucose', 'BMI', 'BloodPressure', 'Age']])\n",
"\n",
"# Fit K-Means\n",
"random_state = 17\n",
"kmeans = KMeans(n_clusters=4, random_state=random_state)\n",
"labels = kmeans.fit_predict(data_scaled)\n",
"centers = kmeans.cluster_centers_\n",
"\n",
"# Map the cluster centers back to the original units\n",
"centers = scaler.inverse_transform(centers)\n",
"print(\"Cluster centers:\\n\", centers)\n",
"\n",
"# Visualize the clustering\n",
"plt.figure(figsize=(16, 12))\n",
"\n",
"# Glucose vs BMI\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['BMI'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 0], centers[:, 1], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Glucose vs BMI')\n",
"plt.legend()\n",
"\n",
"# Glucose vs Age\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['Glucose'], y=df_cleaned['Age'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 0], centers[:, 3], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: Glucose vs Age')\n",
"plt.legend()\n",
"\n",
"# BloodPressure vs BMI\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['BloodPressure'], y=df_cleaned['BMI'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 2], centers[:, 1], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: BloodPressure vs BMI')\n",
"plt.legend()\n",
"\n",
"# BloodPressure vs Age\n",
|
|||
|
"plt.subplot(2, 2, 4)\n",
|
|||
|
"sns.scatterplot(x=df_cleaned['BloodPressure'], y=df_cleaned['Age'], hue=labels, palette='Set1', alpha=0.6)\n",
|
|||
|
"plt.scatter(centers[:, 2], centers[:, 3], s=300, c='red', label='Centroids')\n",
|
|||
|
"plt.title('KMeans Clustering: BloodPressure vs Age')\n",
|
|||
|
"plt.legend()\n",
|
|||
|
"\n",
|
|||
|
"plt.tight_layout()\n",
|
|||
|
"plt.show()\n"
|
|||
|
]
|
|||
|
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### PCA for reduced-dimension visualization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PCA (Principal Component Analysis) is a dimensionality-reduction method that maps high-dimensional data into a space with fewer dimensions while preserving as much of the information (variance) in the original data as possible.\n",
"\n",
"Here PCA projects the multidimensional data onto a two-dimensional plane so that the clustering results can be plotted and inspected visually."
]
},
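To make the phrase "preserving as much variance as possible" concrete, here is a minimal sketch of PCA on two-dimensional points using only the Python standard library. It is not how scikit-learn's `PCA` is implemented; the function name `pca_2d_explained_variance` and the sample points are illustrative assumptions, and the sketch assumes the points are not all identical.

```python
import math

def pca_2d_explained_variance(points):
    """Return (share of variance on the first component, first principal axis)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centered data.
    a = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Eigenvector for lam1 is (b, lam1 - a) unless the matrix is already diagonal.
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam1 / (lam1 + lam2), (vx / norm, vy / norm)

# Hypothetical sample: points lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
ratio, axis = pca_2d_explained_variance(points)
print(f"variance kept by the first component: {ratio:.3f}")
print(f"first principal axis: ({axis[0]:.3f}, {axis[1]:.3f})")
```

Because the sample points are almost collinear, nearly all of the variance ends up on the first component, and the axis points close to the diagonal direction.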
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "<base64-encoded PNG omitted>",
"text/plain": [
"<Figure size 1600x600 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"# Reduce the scaled data to two principal components\n",
"pca = PCA(n_components=2)\n",
"reduced_data = pca.fit_transform(data_scaled)\n",
"\n",
"# Plot the reduced data\n",
"plt.figure(figsize=(16, 6))\n",
"\n",
"# Left panel: points colored by K-Means cluster\n",
"plt.subplot(1, 2, 1)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.title('PCA Reduced Data: KMeans Clustering')\n",
"\n",
"# Right panel: the same points colored by the Outcome label\n",
"plt.subplot(1, 2, 2)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=df_cleaned['Outcome'], palette='Set2', alpha=0.6)\n",
"plt.title('PCA Reduced Data: Outcome Classification')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Inertia analysis for the elbow method (sum of squared distances)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inertia analysis for the elbow method is a technique for choosing the optimal number of clusters in a clustering task (for example, for the K-Means algorithm). It is based on the sum of squared deviations (the inertia) of the points from the centers of their clusters.\n",
"\n",
"Inertia, in the clustering context, measures how compact the clusters are, that is, how close the points within each cluster lie to its centroid.\n",
"Formally, inertia is the sum of squared distances from every point to its nearest cluster center.\n",
"\n",
"The elbow method:\n",
"1. Compute the inertia for a range of values of 𝑘 (the number of clusters).\n",
"2. Plot the inertia against 𝑘.\n",
"3. Find the point after which the decrease in inertia slows down markedly. This point is called the elbow, and the corresponding value of 𝑘 is taken as the optimal number of clusters."
]
},
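The definition of inertia above can be checked by hand on a tiny example. The sketch below uses only the standard library; the helper names and the four sample points are illustrative assumptions, and the centers are hand-picked cluster means rather than the output of an actual K-Means fit.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def inertia(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

# Two visually obvious groups: one near x = 0, one near x = 10.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_center = [(5.0, 1.0)]                 # k = 1: the overall mean
two_centers = [(0.0, 1.0), (10.0, 1.0)]   # k = 2: one mean per group
four_centers = list(points)               # k = 4: every point is its own center

for k, centers in [(1, one_center), (2, two_centers), (4, four_centers)]:
    print(k, inertia(points, centers))
```

Going from one center to two drops the inertia from 104 to 4, while going from two to four only removes the remaining 4. That sharp bend at k = 2 is the "elbow" the method looks for.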
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "<base64-encoded PNG omitted>",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Fit K-Means for k = 1..22 and record the inertia of each fit\n",
"inertias = []\n",
"clusters_range = range(1, 23)\n",
"for i in clusters_range:\n",
"    kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
"    kmeans.fit(data_scaled)\n",
"    inertias.append(kmeans.inertia_)\n",
"\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range, inertias, marker='o')\n",
"plt.title('Elbow method for the optimal k')\n",
"plt.xlabel('Number of clusters')\n",
"plt.ylabel('Inertia')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The curve becomes roughly linear after 𝑘 = 19, which suggests that using more than 19 clusters is not worthwhile: splitting the data further becomes redundant."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us compute the silhouette coefficients."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "<base64-encoded PNG omitted>",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from sklearn.metrics import silhouette_score\n",
"\n",
"# The silhouette score needs at least two clusters, so skip k = 1\n",
"silhouette_scores = []\n",
"for i in clusters_range[1:]:\n",
"    kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
"    labels = kmeans.fit_predict(data_scaled)\n",
"    score = silhouette_score(data_scaled, labels)\n",
"    silhouette_scores.append(score)\n",
"\n",
"# Plot the silhouette coefficient against k\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range[1:], silhouette_scores, marker='o')\n",
"plt.title('Silhouette coefficients for different k')\n",
"plt.xlabel('Number of clusters')\n",
"plt.ylabel('Silhouette coefficient')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mean silhouette coefficient (silhouette score) is used to assess clustering quality. It ranges from -1 to 1. The values can be read as follows:\n",
"\n",
"* Close to 1.0 (0.7–1.0): the clusters are well separated and compact. This is an excellent clustering result.\n",
"* From 0.5 to 0.7: the clusters are clearly distinguishable, with some overlap between them. This is a good result.\n",
"* From 0.25 to 0.5: the clusters overlap, which indicates a less clear boundary between groups. The quality is satisfactory, but the number of clusters may need tuning or the data may need further preparation.\n",
"* Close to 0.0: the clusters overlap heavily, or the distribution of the data does not allow clear groups to be identified. In this case the number of clusters, the algorithm, or the input data should be reconsidered.\n",
"* Below 0.0: poor clustering. Points are, on average, closer to another cluster than to their own, which signals that the data are poorly structured for the current clustering."
]
},
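The scale above is easier to read next to a hand-computable case. The following is a minimal standard-library sketch of the mean silhouette coefficient, not scikit-learn's implementation; the function names, the four sample points, and both labelings are illustrative assumptions. It assumes at least two clusters and scores singleton clusters as 0 by convention.

```python
import math

def mean_distance(p, others):
    """Mean Euclidean distance from p to every point in others."""
    return sum(math.dist(p, q) for q in others) / len(others)

def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(mean_distance(p, members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = silhouette(points, [0, 0, 1, 1])  # groups follow the geometry
bad = silhouette(points, [0, 1, 0, 1])   # groups cut across the geometry
print(f"good labeling: {good:.2f}, bad labeling: {bad:.2f}")
```

With the labeling that follows the two visible groups the score is about 0.80, in the top band of the scale, while the labeling that pairs distant points gives about -0.39, the "below 0.0" case.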
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean silhouette coefficient: 0.213\n"
]
},
{
"data": {
"image/png": "<base64-encoded PNG omitted>",
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.metrics import silhouette_score\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.decomposition import PCA\n",
"\n",
"# ========================\n",
"# Feature scaling\n",
"# ========================\n",
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(df_cleaned[['Glucose', 'BMI', 'BloodPressure', 'Age']])\n",
"\n",
"# ========================\n",
"# K-Means clustering\n",
"# ========================\n",
"# Note: random_state=42 here, unlike random_state = 17 in the cells above\n",
"kmeans = KMeans(n_clusters=4, random_state=42)\n",
"df_clusters = kmeans.fit_predict(data_scaled)\n",
"\n",
"# ========================\n",
"# Clustering quality\n",
"# ========================\n",
"silhouette_avg = silhouette_score(data_scaled, df_clusters)\n",
"print(f'Mean silhouette coefficient: {silhouette_avg:.3f}')\n",
"\n",
"# ========================\n",
"# Cluster visualization\n",
"# ========================\n",
"pca = PCA(n_components=2)\n",
"df_pca = pca.fit_transform(data_scaled)\n",
"\n",
"plt.figure(figsize=(10, 7))\n",
"sns.scatterplot(x=df_pca[:, 0], y=df_pca[:, 1], hue=df_clusters, palette='viridis', alpha=0.7)\n",
"plt.title('K-Means clusters in PCA space')\n",
"plt.xlabel('First PCA component')\n",
"plt.ylabel('Second PCA component')\n",
"plt.legend(title='Cluster', loc='upper right')\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In our case the mean silhouette coefficient is 0.213, which falls just below the 0.25 to 0.5 band labeled satisfactory above and well short of the good range. The plot shows that the clusters overlap to some degree. This may reflect how hard it is to separate groups of patients cleanly when their characteristics (for example, glucose level, body mass index, or blood pressure) are similar. Even so, the clustering provides a useful partition for analyzing and interpreting the data."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "aimenv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}