MII/mai/lab5.ipynb


2024-11-23 11:56:30 +04:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab Work 5\n",
"\n",
"Dataset: **Near-Earth Objects**, https://www.kaggle.com/datasets/sameepvani/nasa-nearest-earth-objects\n",
"\n",
"1. **name**: Name (designation) of the asteroid\n",
"2. **absolute_magnitude**: Absolute magnitude of the asteroid\n",
"3. **est_diameter_min**: Minimum estimated diameter of the asteroid (in kilometers)\n",
"4. **est_diameter_max**: Maximum estimated diameter of the asteroid (in kilometers)\n",
"5. **hazardous**: Whether the asteroid is potentially hazardous\n",
"6. **relative_velocity**: Velocity of the asteroid relative to Earth (in kilometers per hour)\n",
"7. **miss_distance**: Distance between Earth and the asteroid at the moment of closest approach (in kilometers)\n",
"8. **orbiting_body**: Celestial body of the close approach (`Earth` in the rows shown)\n",
"9. **sentry_object**: Whether the object is tracked by NASA's Sentry monitoring system\n",
"10. **id**: Numeric identifier of the object"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Business goals**:\n",
"1. Support educational and public-information programs.\n",
"2. Group asteroids by \"interesting\" characteristics for visualization and public outreach (for example, slow and large asteroids, the ones closest to Earth, etc.)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>est_diameter_min</th>\n",
" <th>est_diameter_max</th>\n",
" <th>relative_velocity</th>\n",
" <th>miss_distance</th>\n",
" <th>orbiting_body</th>\n",
" <th>sentry_object</th>\n",
" <th>absolute_magnitude</th>\n",
" <th>hazardous</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2162635</td>\n",
" <td>162635 (2000 SS164)</td>\n",
" <td>1.198271</td>\n",
" <td>2.679415</td>\n",
" <td>13569.249224</td>\n",
" <td>5.483974e+07</td>\n",
" <td>Earth</td>\n",
" <td>False</td>\n",
" <td>16.73</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2277475</td>\n",
" <td>277475 (2005 WK4)</td>\n",
" <td>0.265800</td>\n",
" <td>0.594347</td>\n",
" <td>73588.726663</td>\n",
" <td>6.143813e+07</td>\n",
" <td>Earth</td>\n",
" <td>False</td>\n",
" <td>20.00</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2512244</td>\n",
" <td>512244 (2015 YE18)</td>\n",
" <td>0.722030</td>\n",
" <td>1.614507</td>\n",
" <td>114258.692129</td>\n",
" <td>4.979872e+07</td>\n",
" <td>Earth</td>\n",
" <td>False</td>\n",
" <td>17.83</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3596030</td>\n",
" <td>(2012 BV13)</td>\n",
" <td>0.096506</td>\n",
" <td>0.215794</td>\n",
" <td>24764.303138</td>\n",
" <td>2.543497e+07</td>\n",
" <td>Earth</td>\n",
" <td>False</td>\n",
" <td>22.20</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3667127</td>\n",
" <td>(2014 GE35)</td>\n",
" <td>0.255009</td>\n",
" <td>0.570217</td>\n",
" <td>42737.733765</td>\n",
" <td>4.627557e+07</td>\n",
" <td>Earth</td>\n",
" <td>False</td>\n",
" <td>20.09</td>\n",
" <td>True</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id name est_diameter_min est_diameter_max \\\n",
"0 2162635 162635 (2000 SS164) 1.198271 2.679415 \n",
"1 2277475 277475 (2005 WK4) 0.265800 0.594347 \n",
"2 2512244 512244 (2015 YE18) 0.722030 1.614507 \n",
"3 3596030 (2012 BV13) 0.096506 0.215794 \n",
"4 3667127 (2014 GE35) 0.255009 0.570217 \n",
"\n",
" relative_velocity miss_distance orbiting_body sentry_object \\\n",
"0 13569.249224 5.483974e+07 Earth False \n",
"1 73588.726663 6.143813e+07 Earth False \n",
"2 114258.692129 4.979872e+07 Earth False \n",
"3 24764.303138 2.543497e+07 Earth False \n",
"4 42737.733765 4.627557e+07 Earth False \n",
"\n",
" absolute_magnitude hazardous \n",
"0 16.73 False \n",
"1 20.00 True \n",
"2 17.83 False \n",
"3 22.20 False \n",
"4 20.09 True "
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from scipy.cluster.hierarchy import dendrogram, linkage, fcluster\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"df = pd.read_csv(\"data/neo.csv\")\n",
"df = df.head(1500)  # use only the first 1500 rows\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
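"#### Data cleaning\n",
"\n",
"Drop the columns that are not used for clustering (`name`, `orbiting_body`, `sentry_object`, `hazardous`) and any rows with missing values; `id` and the numeric features remain."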
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>est_diameter_min</th>\n",
" <th>est_diameter_max</th>\n",
" <th>relative_velocity</th>\n",
" <th>miss_distance</th>\n",
" <th>absolute_magnitude</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2162635</td>\n",
" <td>1.198271</td>\n",
" <td>2.679415</td>\n",
" <td>13569.249224</td>\n",
" <td>5.483974e+07</td>\n",
" <td>16.73</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2277475</td>\n",
" <td>0.265800</td>\n",
" <td>0.594347</td>\n",
" <td>73588.726663</td>\n",
" <td>6.143813e+07</td>\n",
" <td>20.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2512244</td>\n",
" <td>0.722030</td>\n",
" <td>1.614507</td>\n",
" <td>114258.692129</td>\n",
" <td>4.979872e+07</td>\n",
" <td>17.83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3596030</td>\n",
" <td>0.096506</td>\n",
" <td>0.215794</td>\n",
" <td>24764.303138</td>\n",
" <td>2.543497e+07</td>\n",
" <td>22.20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3667127</td>\n",
" <td>0.255009</td>\n",
" <td>0.570217</td>\n",
" <td>42737.733765</td>\n",
" <td>4.627557e+07</td>\n",
" <td>20.09</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id est_diameter_min est_diameter_max relative_velocity \\\n",
"0 2162635 1.198271 2.679415 13569.249224 \n",
"1 2277475 0.265800 0.594347 73588.726663 \n",
"2 2512244 0.722030 1.614507 114258.692129 \n",
"3 3596030 0.096506 0.215794 24764.303138 \n",
"4 3667127 0.255009 0.570217 42737.733765 \n",
"\n",
" miss_distance absolute_magnitude \n",
"0 5.483974e+07 16.73 \n",
"1 6.143813e+07 20.00 \n",
"2 4.979872e+07 17.83 \n",
"3 2.543497e+07 22.20 \n",
"4 4.627557e+07 20.09 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# drop the name and the categorical/boolean columns, then any rows with missing values\n",
"df_cleaned = df.drop(columns=['name', 'orbiting_body', 'sentry_object', 'hazardous'], errors='ignore').dropna()\n",
"df_cleaned.head()"
]
},
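{
"cell_type": "markdown",
"metadata": {},
"source": [
"`id` is a catalogue number rather than a physical property, yet it stays in `df_cleaned` and therefore enters the standardization and clustering below with the same weight as the physical features. Also, in the rows shown, `est_diameter_max` is a fixed multiple (about 2.236) of `est_diameter_min`, which is why the second and third columns of the standardized array below coincide. A minimal sketch on the five rows printed above of a feature table without `id` and of the diameter ratio (`sample` and `features_demo` are illustrative names, not used elsewhere in this notebook):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# the five rows printed by df_cleaned.head() above, copied from that output\n",
"sample = pd.DataFrame({\n",
"    'id': [2162635, 2277475, 2512244, 3596030, 3667127],\n",
"    'est_diameter_min': [1.198271, 0.265800, 0.722030, 0.096506, 0.255009],\n",
"    'est_diameter_max': [2.679415, 0.594347, 1.614507, 0.215794, 0.570217],\n",
"    'relative_velocity': [13569.249224, 73588.726663, 114258.692129, 24764.303138, 42737.733765],\n",
"    'miss_distance': [5.483974e+07, 6.143813e+07, 4.979872e+07, 2.543497e+07, 4.627557e+07],\n",
"    'absolute_magnitude': [16.73, 20.00, 17.83, 22.20, 20.09],\n",
"})\n",
"\n",
"# keep only the physical features: drop the identifier\n",
"features_demo = sample.drop(columns=['id'])\n",
"print(list(features_demo.columns))\n",
"\n",
"# ratio of the two diameter estimates, row by row\n",
"ratio = sample['est_diameter_max'] / sample['est_diameter_min']\n",
"print(ratio.round(3).tolist())"
]
},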
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Visualizing pairwise relationships"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABi4AAASgCAYAAABv+bDPAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdeXxcdbn48c/MnNlnsu9J03RL95aW0paylFYFBQFFFBVERBQFuSIuyM8F0MsVFwRkX0SuCgpeFbggcBURpKwtbekKXZK22ZNJMpl95sw5vz+mM51JJslkaZvQ5/163deVWc7yPWfSmef5fp/HoOu6jhBCCCGEEEIIIYQQQgghxARgPNoHIIQQQgghhBBCCCGEEEIIkSSJCyGEEEIIIYQQQgghhBBCTBiSuBBCCCGEEEIIIYQQQgghxIQhiQshhBBCCCGEEEIIIYQQQkwYkrgQQgghhBBCCCGEEEIIIcSEIYkLIYQQQgghhBBCCCGEEEJMGJK4EEIIIYQQQgghhBBCCCHEhCGJCyGEEEIIIYQQQgghhBBCTBiSuBBCTAi6rh/tQzhijqVzneiOtWtxrJ2vEEIIIcTRJt+/RkfGbeKSayOEOFIkcSGEOKr6+vr4zne+w/r168dle7Nnz+aOO+4AoKmpidmzZ/OXv/xlXLY9Hl544QWuvfbao30Yw5qIYzfeNmzYwJe//OVx2dZf/vIXZs+eTVNTEwB33HEHs2fPHpdtj4doNMp//dd/8b//+79H+1CEEEIIISad0X43vvvuu/n1r3+d+u+J9h1xpPp/5x0P/cd2vH8fTgQT6br3P5bvfve7rF27Nuf379q1i8985jOH49CEEGIASVwIIY6qHTt28OSTT6Jp2rhvu6ysjMcee4zTTjtt3Lc9Wg8//DCtra1H+zCGNRHHbrz96U9/Ys+ePYdl25/85Cd57LHHDsu2R6Ojo4P//u//RlXVo30oQgghhBDHjNtvv51QKJT674n2HXEi6P+743D+PjxaJvJ1v+KKK7jzzjtzfv1zzz3Hxo0bD+MRCSHEIcrRPgAhhDhcLBYLxx133NE+jElJxm5sKioqqKioONqHIYQQQgghJhD5jjjQsfC7YyJf99ra2qN9CEIIMShZcSGEGJM//elPnHXWWSxYsIDTTjuNO+64g3g8nnq+u7ubb37zm5x00kksXLiQc889lyeeeAKAN954g4svvhiAiy++mM997nMj2vebb77JBRdcwOLFiznjjDN49dVXM57PtqT7rbfe4otf/CInnHACCxYsYO3atdxxxx2pGT3J9zz33HNcccUVHHfccaxatYq7774bv9/P//t//4/jjz+eVatW8fOf/zyjvmckEuFnP/sZq1evZsGCBZx99tn87W9/Sz3/uc99jjfffJM333yT2bNn88YbbwDQ29vLD3/4Q1atWsXChQv51Kc+xWuvvZZxLrNnz+bOO+/kvPPOY9GiRTnPihnt+fQfu7/85S/MmzePzZs3c8EFF7Bw4ULWrFmTsfS9v7fffpvZs2fz4osvZjy+Y8cOZs+ezd///ncAnn76ac455xwWLVrEypUr+da3vkV7e/uQ55XLmK1bt45PfepTLFmyhBNOOIGvfvWrqRUW3/3ud/nrX/9Kc3PziJf9a5rG3XffzWmnncbixYu54oor8Hq9Ga/pvwQ7Ho9z//3389GPfpRFixZx3HHH8elPf5rXX3894z0f/vCH+fvf/85HP/rR1Odl48aNbNq0iU9+8pMsWrSIj370owPO9b333uPyyy9n6dKlLF26lCuvvJIDBw4AiWv5gQ98AIDrrrsuYyn4+vXrueiii1i8eDHLly/n2muvpbu7O/V88rr/6U9/4qSTTmL58uXs3r07p3Eay/n84x//4LOf/SxLlixhwYIFfPjDH+aRRx5JPf+1r32NhQsXsnfv3oz9zZ07lzfffDOn4xNCCCHEsWft2rX813/9F5///OdZtGgR3/ve94Dcvl
v2N9zviuR3wTvvvDP1v9O/I957770sWLBgwPfIhx9+mPnz5+PxeABoaWnhmmuuYfny5SxevJjPf/7zbN++fUTn/YMf/ICTTjop43cawE033cSKFSuIxWLA0N8pB7Nu3To++9nPcvzxx7NixQq++c1vDlhhvnfvXr72ta+xfPlyTjjhBC6//PLU9/L03x3Zfh8+8sgjzJ49m4aGhoxtPvnkk8ydOzfravZcxlbTNG699VbWrl2bun633HJLaiyyGe332/6/Dfbv389XvvIVVqxYweLFi7ngggt46aWXUs+Hw2FuuOEGTj311NR34aF+dw0mEonwk5/8hJNOOoklS5Zw3XXXEYlEMl7Tv1TU1q1b+fznP8/xxx/PkiVLuOSSS9i0aVPqPJK/Q9NLNHd3d3PjjTeyZs0aFixYwPLly7nyyiszSop97nOf43vf+x73338/p512GgsXLuTTn/4077zzTsbxbNq0iUsvvZSlS5eycuVKrrnmmozfhqP5rAohJi9JXAghRu2+++7jBz/4ASeeeCL33nsvF154IQ888AA/+MEPUq/59re/zZ49e7jxxht54IEHmDdvHtdeey2vv/468+fP54c//CEAP/zhD7n++utz3ve2bdu49NJLcbvd/OpXv+Liiy/mmmuuGfI9O3fu5JJLLqGgoIBbb72Ve+65h2XLlnHnnXfy7LPPZrz2+9//PvX19dxzzz2ceOKJ3H777Zx//vnYbDbuvPNOTj/9dB588EGee+45INGg7Morr+SPf/wjX/jCF7jnnntYsmQJ3/jGN1KJmuuvv5558+Yxb948HnvsMebPn08kEuHzn/88L7zwAt/4xje48847qaio4LLLLhvwBezee+/l7LPP5le/+hVnnHFGzmM1mvPJRtM0rr76as4880zuv/9+li5dys9+9jP+/e9/Z3390qVLqa2t5Zlnnsl4/Omnn6agoIDVq1ezYcMGvvOd73D66afzwAMPcN111/H666/zzW9+c9DjyGXMDhw4wBVXXMGCBQu45557uOmmm2hoaODLX/4ymqZxxRVXsHr1akpLS0dcEuvnP/85d911F+effz533nknBQUF3HLLLUO+5xe/+AV33303F1xwAQ8++CA//vGP6e3t5etf/3pG+YC2tjZuvvlmvvKVr3D77bfT19fHf/zHf3DNNdfwyU9+krvuugtd1/nGN75BOBwGoKGhgU9/+tN4PB5++tOfctNNN3HgwAE+85nP4PF4KCsrS/3A+OpXv5r632+99RaXXHIJNpuN2267jf/3//4fb775JhdffHFq25BIujz00EPcdNNNXHfddcyYMSPnsRrN+fzrX//iyiuvZP78+dx9993ccccdTJkyhR/96Eds3rwZgBtuuAGHw5H6m7F161buvfdeLr30UpYvX57z8QkhhBDi2PPII4+wcOFC7r77bs4///wRfR9PyuV3RbI00Pnnn5+1TNDZZ5+Nqqr83//9X8bjzzzzDCeffDLFxcV0d3fz6U9/mm3btvGDH/yAW265BU3TuPDCC0dU8vTcc8+lq6srNXEKEt/tn332Wc466yzMZvOw3ymzeeKJJ7j00kuprKzkl7/8Jddddx0bN27kggsuSL2nvb2dCy64gMbGRm644QZ+/vOf09XVxec//3l6e3sztpft9+HZZ5+N1WrlySefHLDvE088kcrKylGN7QMPPMAf/vAHrrzySh566CE+85nP8Otf/5p77rlnyLEczffbdJqmcfnllxMKhfjZz37G3XffTUFBAV/96lfZt28fAP/1X//Fyy+/zLXXXsuvf/1rPvCBD/Czn/2MP//5z0MeW3/f/va3efzxx7n88su57bbb8Hq9PPzww4O+3u/3c9lll1FYWMgdd9zBrbfeSigU4otf/CI+n49PfvKTnH/++UDi/v7kJz+JrutcfvnlrFu3jm9961v8+te/5mtf+xqvvfbagN/3zz//PC+88ALf//73+eUvf0lXVxdXXXVVKqG2ff
t2LrrootSEwBtvvJGtW7fyxS9+EVVVR/VZFUJMcroQQoxCX1+fvmjRIv2HP/xhxuOPP/64Xl9fr7/33nu6ruv6ggU
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.set(style=\"whitegrid\")\n",
"\n",
"# relationship between the minimum and maximum estimated diameter\n",
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['est_diameter_min'], y=df_cleaned['est_diameter_max'], alpha=0.6)\n",
"plt.title('est_diameter_min vs est_diameter_max')\n",
"\n",
"# relationship between relative velocity and miss distance\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['relative_velocity'], y=df_cleaned['miss_distance'], alpha=0.6)\n",
"plt.title('relative_velocity vs miss_distance')\n",
"\n",
"# relationship between miss distance and absolute magnitude\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['miss_distance'], y=df_cleaned['absolute_magnitude'], alpha=0.6)\n",
"plt.title('miss_distance vs absolute_magnitude')\n",
"\n",
"# relationship between absolute magnitude (brightness) and relative velocity\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['absolute_magnitude'], y=df_cleaned['relative_velocity'], alpha=0.6)\n",
"plt.title('absolute_magnitude vs relative_velocity')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
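{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scatter plots show the pairwise relationships only visually. For one pair the relation appears to be exact: in the five rows printed earlier, log10(`est_diameter_min`) lies on a straight line in `absolute_magnitude` with slope about -0.2. This matches the size-magnitude relation D ∝ 10^(-H/5), which suggests (an assumption, not stated in the dataset description above) that the diameter columns are computed from the magnitude rather than measured separately. A minimal check with `np.polyfit` (the names `h` and `d_min` are illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# absolute_magnitude and est_diameter_min of the five rows shown in the tables above\n",
"h = np.array([16.73, 20.00, 17.83, 22.20, 20.09])\n",
"d_min = np.array([1.198271, 0.265800, 0.722030, 0.096506, 0.255009])\n",
"\n",
"# least-squares fit of log10(diameter) = slope * H + intercept\n",
"slope, intercept = np.polyfit(h, np.log10(d_min), deg=1)\n",
"print(round(slope, 4), round(intercept, 4))"
]
},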
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Standardizing the data for clustering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
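"Standardization rescales every feature (column) to zero mean and unit variance, so that features measured in very different units (for example, kilometers and kilometers per hour) contribute comparably to the distances used by clustering."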
]
},
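{
"cell_type": "markdown",
"metadata": {},
"source": [
"For each column, `StandardScaler` computes z = (x - mean) / std, using the population standard deviation (ddof=0). A quick check on the `relative_velocity` values of the five rows shown earlier (the array `x` is copied from that output and is only illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# relative_velocity of the five rows shown earlier, as a one-column matrix\n",
"x = np.array([[13569.249224], [73588.726663], [114258.692129], [24764.303138], [42737.733765]])\n",
"\n",
"# manual z-score: subtract the column mean, divide by the population std (ddof=0)\n",
"z_manual = (x - x.mean(axis=0)) / x.std(axis=0)\n",
"z_sklearn = StandardScaler().fit_transform(x)\n",
"\n",
"print(np.allclose(z_manual, z_sklearn))\n",
"print(z_manual.ravel().round(3))"
]
},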
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.51120494, 3.73744939, 3.73744939, -1.45805986, 0.5311181 ,\n",
" -2.02560332],\n",
" [-0.50518329, 0.35347528, 0.35347528, 0.77156175, 0.86292579,\n",
" -0.94818185],\n",
" [-0.49287316, 2.00915073, 2.00915073, 2.28238187, 0.27762431,\n",
" -1.66316796],\n",
" ...,\n",
" [-0.42311599, -0.5744506 , -0.5744506 , -0.12789786, 0.31239586,\n",
" 1.39117363],\n",
" [-0.42119188, -0.52315117, -0.52315117, -0.94732109, 1.179388 ,\n",
" 0.76514892],\n",
" [-0.51510459, 3.1416016 , 3.1416016 , -0.61031165, -0.21666262,\n",
" -1.92016758]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# rescale every column to zero mean and unit variance\n",
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(df_cleaned)\n",
"data_scaled"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Agglomerative (hierarchical) clustering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hierarchical clustering is an unsupervised machine-learning method that groups objects (data points) by their similarity or distance to each other. The agglomerative variant starts with every point in its own cluster and repeatedly merges the two closest clusters; the sequence of merges forms a tree (dendrogram) that shows how the objects group at different levels."
]
},
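{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each row of the matrix returned by `scipy.cluster.hierarchy.linkage` records one merge: the indices of the two clusters joined, the merge distance, and the size of the new cluster; `dendrogram` draws exactly these rows. A toy illustration on four 1-D points (the points and the name `Z_toy` are illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.cluster.hierarchy import linkage\n",
"\n",
"# four points on a line: two close together, two farther away\n",
"pts = np.array([[0.0], [1.0], [5.0], [11.0]])\n",
"Z_toy = linkage(pts, method='ward')\n",
"\n",
"# each row: [cluster_i, cluster_j, merge_distance, size_of_new_cluster]\n",
"# new clusters get indices n, n+1, ... (here 4, 5, 6)\n",
"print(Z_toy)"
]
},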
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1AAAAJ0CAYAAAAcUcKlAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAACzkElEQVR4nOzdeXhU1f3H8c8kM9lXAiHsOwgoiyyCVUCgtFW0Bat1X6o/96WKVVurVavVqmhR1EptVdwXqEvrvqF1AcFdVBTZtxCSsIVkltzfH/Re7kxmJncmk8wkeb+ex8cwc5dz1znfe875XpdhGIYAAAAAAI1KS3YBAAAAAKC1IIACAAAAAIcIoAAAAADAIQIoAAAAAHCIAAoAAAAAHCKAAgAAAACHCKAAAAAAwCECKAAAAABwiAAKAAAAABwigALg2JVXXqlBgwaF/e/KK69MdvEA2Gzfvl2jRo3S559/ru3bt+vcc8/VP//5z2QXCwBaPXeyCwCgdenUqZPmzp0b9NkFF1yQpNIAiKSwsFCnn366jj32WBmGoUGDBukvf/lLsosFAK0eARQAxwKBgHJycjRixIigzzMyMpJTIABRXXDBBTruuOO0Y8cO9erVS+np6ckuEgC0enThA+CY3+9XVlaWo2mXLl2qk046ScOHD9fYsWN1xRVXqLKy0vp+4cKFGjRokNavXx803+TJk4O6A/p8vojdBkOX9dlnn2nGjBkaNmyYjjzySL388stBy965c6duuukmTZ06VQcccICmT5+uZ555psH6Q9ezfv16nXzyybryyiv1t7/9TQcffLBGjRql8847Txs2bAia//XXX9cJJ5ygkSNHav/999dPf/pTPfroo9b3ixcvtpa7bNmyoHkfeeQRDRo0SJMnT25Qnj/84Q9B027fvl3777+/Bg0apMWLFztefyRPP/20Zs6cqREjRmjYsGH6+c9/rpdeeqnBPg7XbTPS8Tn55JOD1vHiiy9q5syZGjlypH70ox/pmmuu0fbt263v77rrLg0aNEgjR46U1+sNmveiiy5q0FW0rq5Ot9xyiyZOnKj9999fRx55pF588cWg+SZPnqw77rhDf/7znzVmzBgddNBBuvzyy1VdXe14+6N1XV24cKF1TO3HYdu2bRo9enTYYzlo0CDtt99+GjNmjC688EJVVVVZ0wwaNEh33XVXUNnM/RLPvpSkjh07qm/fvnr//fcb7W4buq7//Oc/GjNmjGbPni0p+PwN/c9e7m+++UYXXHCBxo0bp6FDh+rQQw/VDTfcoNraWmsar9erv/71r5oyZYqGDRum6dOn61//+pejfS5JGzdu1KWXXqqxY8dq+PDhOvXUU7V8+XJr+evXr9egQYP0n//8R+ecc46GDx+uSZMm6e6771Z9fX3QcQndJ5deemnQMTUMQ3PmzNGhhx6qUaNG6ZxzztGmTZus6QOBgObNm6fp06dr2LBhGjFihI477jh9+OGHUY+j1PCYh/7bMAwdd9xxQffLK6+8MujckqQnnngi7PkDIPFogQLg2J49e1RYWNjodB999JFOP/10jRs3Tn/961+1fft2zZkzR6eccoqeeeYZx0GYtLeSLEn33nuvOnToIGlvZTc08JGks88+WyeddJIuueQSPfPMM/rNb36j++67TxMnTlRtba1OOOEEbdu2TRdddJG6deum119/XVdddZUqKip0zjnnWMuZOHGizjvvPOvfpaWlkqQ33nhDxcXF+sMf/qD6+nrNnj1bJ598sv7zn/8oOztbb7/9ts4//3ydcsopuvDCC1VbW6vHHntM119/vfbff38NHz7cWmZubq7efPNNjRo1yvrsxRdfVFpaw+daubm5evvtt2UYhlwulyTp1VdfVSAQCJoulvXbPfroo7rhhht04YUXatSoUdq+fbv+/ve/67LLLtPIkSNVVlZmTTt37lx16tRJkqzjIUm//OUvdcwxx1j/vu6664LWcc899+jOO+/UCS
ecoEsuuUTr1q3TnDlz9Omnn+qpp54KOidcLpc++OADTZw4UZK0e/duLVq0KGjfGIah888/Xx9//LEuuugi9evXT6+99pouueQSeb1e/eIXv7Cmfeyxx9SrVy/ddNNNqqys1OzZs7VmzRo98cQTcrlcjW7/eeedp+OOO07S3hadIUOGWOdHz5499d133zXYp7Nnz9bOnTtVUFAQ9Ll5bvl8Pq1cuVK33HKLbrzxRt12221hj004sexLk8/n05///GfH65Ck2tpaXX/99TrzzDN15JFHBn13zTXXaOjQoda/f/WrX1l/l5eX68QTT9SIESN08803KyMjQ++8844eeOABlZaW6qyzzpIkXXbZZVq0aJHOPfdcDR8+XIsWLdKVV14pj8fT6D6vrKzUcccdp+zsbF199dXKzs7WQw89pBNPPFHPPPOM+vXrZ5Xn2muv1cSJE3XXXXdp2bJlmjt3rmpqavTb3/427HYvXbpU//nPf4I+e/DBB3Xffffp8ssvV58+fXTzzTfr4osv1lNPPSVJuu222/T4449r1qxZGjRokLZs2aK7775bF198sd5++21lZ2fHtO/tnnvuOX3yySdRp9m+fbv++te/xr0OALEhgALgWHV1tRVMRDN79mz16dNH9913n9VlaPjw4TriiCO0YMECnXjiiY7XWVNTI0kaOXKkiouLJUnvvvtu2GlPPvlknX/++ZKkQw89VDNmzNDdd9+tiRMnauHChVqxYoWeeOIJjRw50prG7/frnnvu0XHHHaeioiJJewOD0G6K0t4AcuHCherRo4ckqW/fvpoxY4aeffZZHX/88fr+++81Y8YMXXXVVdY8I0eO1EEHHaTFixcHBTATJkzQG2+8YVXiNm/erE8++USjR49u0Ko1fvx4LVq0SJ999plVrpdeekljxowJavWIZf1269at0xlnnBEUNHbr1k0zZ87UsmXLdMQRR1ifDx48WN27d2+wjLKysqB9lpeXZ/29fft23XvvvTr22GN1zTXXWJ8PHDhQJ554YoNzwtw3ZgD15ptvqlOnTkGtBu+//77effdd3XHHHTr88MMl7T2ee/bs0W233abp06fL7d77E5eWlqYHHnhA+fn5kvYe3/PPP1/vvvuuJkyY4Gj7e/bsKWlvd9VI54fpiy++0HPPPafBgwdrx44dQd/Z5x0zZozef/99ffXVVxGXFSrWfWl6+OGHVVNTo44dOzpe17///W95PB6deeaZDbr+9e/fP+I+WLFihQYPHqw5c+ZY58HBBx+s9957T4sXL9ZZZ52lFStW6JVXXtHvf/97nXrqqZL2nucbNmzQ4sWLNX369Kj7/I477lB1dbUef/xxdevWTdLe8+bwww/XnDlzdOedd1rTDh061ApQJ0yYoJqaGj300EM699xzg85TSaqvr9cNN9ygoUOHBh2XmpoanXfeeTrttNMk7W3duv7667Vjxw4VFBSovLxcl1xySVCra2Zmpi688EJ9++23Uc+XaHbv3q3bbrutQXlC3XnnneratWtQayaA5kMXPgCOlZeXq3PnzlGn2bNnjz777DNNnDhRhmHI7/fL7/erR48e6tevn957772g6evr661p/H5/g+Vt3rxZaWlpDSo64cyYMcP62+Vy6cc//rE+//xz1dbWasmSJerWrZsVPJmOOuoo1dXV6bPPPmt0+QceeKAVPEnSkCFD1KNHD3300UeSpDPPPFM333yzdu/erS+//FIvvvii7rvvPklq0CVt8uTJWr16tX744QdJ0ssvv6zhw4dblUG7/Px8jR07Vm+88YYkqbKyUosXLw4KbGJdv92VV16pyy67TDt27NCnn36q5557zur2F20+pz799FN5vV5Nnz496PPRo0erW7duWrJkSdDnU6ZM0ZtvvinDMCTtbZkzgyTTBx98IJfLpYkTJwadP5MnT9bWrVuDWoUmT55sBU/mv91ut3XcErn9hmHohhtu0C9/+Uvtt99+Yb/3+/3yer36/PPPtW
zZMu2///5B04ReE/bAMdZ9KUkVFRW6++67dcUVVygzM9PRdmzZskV///vfdcIJJ8Q8buqQQw7RI488oszMTH3//fd
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 7 17 13 ... 18 9 7]\n"
]
}
],
"source": [
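"# Ward linkage: at each step merge the two clusters whose union least increases the total within-cluster variance\n",
"linkage_matrix = linkage(data_scaled, method='ward')\n",
"\n",
"plt.figure(figsize=(10, 7))\n",
"dendrogram(linkage_matrix)\n",
"plt.title('Agglomerative clustering dendrogram')\n",
"plt.xlabel('Sample index')\n",
"plt.ylabel('Distance')\n",
"plt.show()\n",
"\n",
"# cut the tree at height 10: merges above this distance are undone, each remaining subtree is one cluster\n",
"result = fcluster(linkage_matrix, t=10, criterion='distance')\n",
"print(result)"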
]
},
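{
"cell_type": "markdown",
"metadata": {},
"source": [
"`fcluster(..., t=10, criterion='distance')` cuts the dendrogram at height 10, so the number of clusters is a by-product of the threshold. Labels are numbered consecutively from 1, so the printed labels (which reach 18) imply at least 18 clusters. To get a fixed number of clusters, for example to compare with KMeans below, the same linkage matrix can be cut with `criterion='maxclust'`. A sketch on synthetic 2-D blobs (the data and the names `X_demo`, `Z_demo` are illustrative assumptions, not the asteroid data):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.cluster.hierarchy import linkage, fcluster\n",
"from sklearn.datasets import make_blobs\n",
"\n",
"# four well-separated synthetic groups in 2-D\n",
"X_demo, _ = make_blobs(n_samples=200, centers=[[0, 0], [6, 0], [0, 6], [6, 6]], cluster_std=0.5, random_state=0)\n",
"Z_demo = linkage(X_demo, method='ward')\n",
"\n",
"# cut by height: the cluster count depends on the threshold t\n",
"by_distance = fcluster(Z_demo, t=10, criterion='distance')\n",
"# cut by count: at most t flat clusters\n",
"by_count = fcluster(Z_demo, t=4, criterion='maxclust')\n",
"\n",
"print(len(np.unique(by_distance)), len(np.unique(by_count)))"
]
},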
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Visualizing the cluster assignments"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABi4AAASgCAYAAABv+bDPAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3gc1dX48e+U7UWrXizbcpN7BWy6sVMgdAgJEAgQIIFAeN+EFOCXAiThDSkECL2EkAIJpAGBAKGFYqqNDTa2cZEtS1avu9q+M/f3x9prryXZcjdwPs+TJ3hmdubO3ZE0c8/cczSllEIIIYQQQgghhBBCCCGEEOIAoO/vBgghhBBCCCGEEEIIIYQQQmwmgQshhBBCCCGEEEIIIYQQQhwwJHAhhBBCCCGEEEIIIYQQQogDhgQuhBBCCCGEEEIIIYQQQghxwJDAhRBCCCGEEEIIIYQQQgghDhgSuBBCCCGEEEIIIYQQQgghxAFDAhdCCCGEEEIIIYQQQgghhDhgSOBCCCGEEEIIIYQQQgghhBAHDAlcCCEOCEqp/d2EfeaTdK4Huk/ad/FJO18hhBBCiP1N7r92jfTbgUu+GyHEviKBCyHEfhUOh/ne977HwoUL98j+xo8fz2233QZAY2Mj48eP5x//+Mce2fee8MILL3DVVVft72bs0IHYd3vaokWL+NrXvrZH9vWPf/yD8ePH09jYCMBtt93G+PHj98i+94RUKsX//d//8a9//Wt/N0UIIYQQ4iNnV++N77zzTn7729/m/n2g3SPurG3vefeEbft2Tz8fHggOpO9927ZcffXVzJ8/f8ifX716NWefffbeaJoQQvQjgQshxH61YsUKHn/8cWzb3uP7Lisr45FHHuGYY47Z4/veVQ8++CDNzc37uxk7dCD23Z7217/+lbVr1+6VfX/hC1/gkUce2Sv73hVtbW38/ve/J5PJ7O+mCCGEEEJ8Ytx6663E4/Hcvw+0e8QDwbbPHXvz+XB/OZC/98suu4zbb799yNs/88wzLF68eC+2SAghtjD3dwOEEGJvcTqdzJgxY3834yNJ+m73VFRUUFFRsb+bIYQQQgghDiByj9jfJ+G540D+3keMGLG/myCEEIOSGRdCiN3y17/+lRNOOIEpU6ZwzDHHcNttt2FZVm59V1cX3/72tzniiCOYOnUqp5xyCo899hgAb731Fueddx4A5513Hl/+8pd36thvv/02Z555JtOnT+fYY4/l9ddfz1s/0JTud955h4suuohDDjmEKVOmMH/+fG677bbcGz2bP/PMM89w2WWXMWPGDA4//HDuvPNO+vr6+H//7/9x0EEHcfjhh/PLX/4yL79nMpnkF7/4BXPnzmXKlCmcdNJJ/Pvf/86t//KXv8zbb7/N22+/zfjx43nrrbcA6Onp4Uc/+hGHH344U6dO5Ytf/CJvvPFG3rmMHz+e22+/ndNPP51p06YN+a2YXT2fbfvuH//4B5MmTeK9997jzDPPZOrUqcybNy9v6vu23n33XcaPH89LL72Ut3zFihWMHz+e5557DoAnn3ySk08+mWnTpnHooYfyne98h9bW1u2e11D6bMGCBXzxi19k5syZHHLIIXz961/PzbC4+uqr+ec//8nGjRt3etq/bdvceeedHHPMMUyfPp3LLruM3t7evG22nYJtWRb33nsvJ554ItOmTWPGjBmcddZZvPnmm3mfOe6443juuec48cQTcz8vixcvZsmSJXzhC19g2rRpnHjiif3OddWqVVxyySXMmjWLWbNmcfnll9PQ0ABkv8tPfepTAFxzzTV5U8EXLlzIueeey/Tp05k9ezZXXXUVXV1dufWbv/e//vWvHHHEEcyePZs1a9YMqZ9253yef/55vvSlLzFz5kymTJnCcccdx0MPPZRb/41vfIOpU6dSV1eXd7yJEyfy9ttvD6l9QgghhPjkmT9/Pv/3f//H+eefz7Rp0/j+978PDO3ecls7eq7YfC
94++235/5763vEu+++mylTpvS7j3zwwQeZPHkynZ2dADQ1NXHllVcye/Zspk+fzvnnn8/y5ct36rx/+MMfcsQRR+Q9pwHccMMNzJkzh3Q6DWz/nnIwCxYs4Etf+hIHHXQQc+bM4dvf/na/GeZ1dXV84xvfYPbs2RxyyCFccsklufvyrZ87Bno+fOihhxg/fjzr1q3L2+fjjz/OxIkTB5zNPpS+tW2bm2++mfnz5+e+v5tuuinXFwPZ1fvbbZ8NNmzYwKWXXsqcOXOYPn06Z555Ji+//HJufSKR4LrrruPoo4/O3Qtv77lrMMlkkp/97GccccQRzJw5k2uuuYZkMpm3zbapopYtW8b555/PQQcdxMyZM7ngggtYsmRJ7jw2P4dunaK5q6uL66+/nnnz5jFlyhRmz57N5ZdfnpdS7Mtf/jLf//73uffeeznmmGOYOnUqZ511Fu+//35ee5YsWcKFF17IrFmzOPTQQ7nyyivzng135WdVCPHRJYELIcQuu+eee/jhD3/IYYcdxt13380555zDfffdxw9/+MPcNt/97ndZu3Yt119/Pffddx+TJk3iqquu4s0332Ty5Mn86Ec/AuBHP/oR11577ZCP/cEHH3DhhRcSCAT4zW9+w3nnnceVV1653c+sXLmSCy64gFAoxM0338xdd93FwQcfzO23387TTz+dt+0PfvADamtrueuuuzjssMO49dZbOeOMM3C73dx+++189rOf5f777+eZZ54BsgXKLr/8cv7yl7/wla98hbvuuouZM2fyrW99Kxeoufbaa5k0aRKTJk3ikUceYfLkySSTSc4//3xeeOEFvvWtb3H77bdTUVHBxRdf3O8G7O677+akk07iN7/5Dccee+yQ+2pXzmcgtm3zzW9+k+OPP557772XWbNm8Ytf/IJXX311wO1nzZrFiBEjeOqpp/KWP/nkk4RCIebOncuiRYv43ve+x2c/+1nuu+8+rrnmGt58802+/e1vD9qOofRZQ0MDl112GVOmTOGuu+7ihhtuYN26dXzta1/Dtm0uu+wy5s6dS2lp6U6nxPrlL3/JHXfcwRlnnMHtt99OKBTipptu2u5nfvWrX3HnnXdy5plncv/99/OTn/yEnp4e/vd//zcvfUBLSws33ngjl156KbfeeivhcJj/+Z//4corr+QLX/gCd9xxB0opvvWtb5FIJABYt24dZ511Fp2dnfz85z/nhhtuoKGhgbPPPpvOzk7KyspyDxhf//rXc//9zjvvcMEFF+B2u7nlllv4f//v//H2229z3nnn5fYN2aDLAw88wA033MA111zDmDFjhtxXu3I+//3vf7n88suZPHkyd955J7fddhvDhw/nxz/+Me+99x4A1113HV6vN/c7Y9myZdx9991ceOGFzJ49e8jtE0IIIcQnz0MPPcTUqVO58847OeOMM3bqfnyzoTxXbE4NdMYZZwyYJuikk04ik8nwn//8J2/5U089xZFHHklxcTFdXV2cddZZfPDBB/zwhz/kpptuwrZtzjnnnJ1KeXrKKafQ0dGRe3EKsvf2Tz/9NCeccAIOh2OH95QDeeyxx7jwwguprKzk17/+Nddccw2LFy/mzDPPzH2mtbWVM888k/Xr13Pdddfxy1/+ko6ODs4//3x6enry9jfQ8+FJJ52Ey+Xi8ccf73fsww47jMrKyl3q2/vuu48///nPXH755TzwwAOcffbZ/Pa3v+Wuu+7abl/uyv3t1mzb5pJLLiEej/OLX/yCO++8k1AoxNe//nXq6+sB+L//+z9eeeUVrrrqKn7729/yqU99il/84hf8/e9/327btvXd736XRx99lEsuuYRbbrmF3t5eHnzwwUG37+vr4+KLL6awsJDbbruNm2++mXg8zkUXXUQkEuELX/gCZ5xxBpC9vr/whS+glOKSSy5hwYIFfOc73+G3v/0t3/jGN3jjjTf6Pd8/++yzvPDCC/zgBz/g17/+NR0dHVxxxRW5gNry5cs599xzcy
8EXn/99SxbtoyLLrqITCazSz+rQoiPOCWEELsgHA6radOmqR/96Ed5yx999FFVW1urVq1apZRSasqUKequu+7Krbc
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['est_diameter_min'], y=df_cleaned['est_diameter_max'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('est_diameter_min vs est_diameter_max')\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['relative_velocity'], y=df_cleaned['miss_distance'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('relative_velocity vs miss_distance')\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['miss_distance'], y=df_cleaned['absolute_magnitude'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('miss_distance vs absolute_magnitude')\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['absolute_magnitude'], y=df_cleaned['relative_velocity'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('absolute_magnitude vs relative_velocity')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### KMeans (non-hierarchical clustering) for comparison"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Non-hierarchical clustering groups objects into a predefined number of clusters (here, 𝑘 in K-Means) based on a distance or similarity metric. Unlike hierarchical clustering, which builds a tree of nested clusters, it works with a fixed number of clusters and assigns each object directly to a group.\n",
"\n",
"K-Means:\n",
"* Splits the data into 𝑘 clusters, **minimizing the sum of squared distances from each point to its cluster centroid**.\n",
"* The assignments and centroids are updated iteratively until the result stabilizes."
]
},
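{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two K-Means steps described above (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in plain NumPy. This is an illustrative Lloyd-style implementation with random initialization on synthetic data, not the code `sklearn.cluster.KMeans` runs internally (scikit-learn uses k-means++ initialization by default); the function and variable names are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def nearest_centroid(X, centroids):\n",
"    # squared Euclidean distance from every point to every centroid, shape (n_points, k)\n",
"    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)\n",
"    return d2.argmin(axis=1)\n",
"\n",
"def lloyd_kmeans(X, k, n_iter=100, seed=0):\n",
"    rng = np.random.default_rng(seed)\n",
"    # initialize the centroids with k distinct data points chosen at random\n",
"    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)\n",
"    for _ in range(n_iter):\n",
"        labels = nearest_centroid(X, centroids)  # assignment step\n",
"        new_centroids = centroids.copy()\n",
"        for j in range(k):  # update step: move each centroid to the mean of its points\n",
"            members = X[labels == j]\n",
"            if len(members) > 0:  # an empty cluster keeps its previous centroid\n",
"                new_centroids[j] = members.mean(axis=0)\n",
"        converged = np.allclose(new_centroids, centroids)\n",
"        centroids = new_centroids\n",
"        if converged:\n",
"            break\n",
"    labels = nearest_centroid(X, centroids)\n",
"    # the K-Means objective: sum of squared distances to the assigned centroids\n",
"    inertia = float(((X - centroids[labels]) ** 2).sum())\n",
"    return labels, centroids, inertia\n",
"\n",
"rng = np.random.default_rng(1)\n",
"X_demo = np.vstack([rng.normal(loc, 0.3, size=(30, 2)) for loc in ([0, 0], [5, 0], [0, 5])])\n",
"labels_demo, centroids_demo, inertia_demo = lloyd_kmeans(X_demo, k=3)\n",
"print(centroids_demo.round(2), round(inertia_demo, 2))"
]
},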
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cluster centers:\n",
" [[-0.43863035 0.09971675 0.09971675 0.63756469 0.62405041 -0.45475412]\n",
" [-0.4426337 -0.35846552 -0.35846552 -0.63523481 -0.61852806 0.41688706]\n",
" [ 2.21402038 -0.38941373 -0.38941373 -0.16164814 -0.0129622 0.65296724]\n",
" [-0.50737813 3.15522336 3.15522336 0.6094733 0.11681301 -1.85373482]]\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABi4AAASgCAYAAABv+bDPAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd5wV1fn48c+U27d3tsAuLLssvUiXaq/YosZuYjeaYiz5xXzVGI1JNCaKxh41NuwdO9KkSy/CUpftvd06M+f3x2UvXHYXliarnvfr5UuYuXfmTLnDzDznPI8ihBBIkiRJkiRJkiRJkiRJkiRJkiR1A+rRboAkSZIkSZIkSZIkSZIkSZIkSVIbGbiQJEmSJEmSJEmSJEmSJEmSJKnbkIELSZIkSZIkSZIkSZIkSZIkSZK6DRm4kCRJkiRJkiRJkiRJkiRJkiSp25CBC0mSJEmSJEmSJEmSJEmSJEmSug0ZuJAkSZIkSZIkSZIkSZIkSZIkqduQgQtJkiRJkiRJkiRJkiRJkiRJkroNGbiQJEmSJEmSJEmSJEmSJEmSJKnbkIELSZK6NSHE0W5Ct/JT2x8/te3tzn5Kx+KntK2SJEmSJB1e8j7i4Ml9d+DkPuve5PGRpEMjAxeStJdLL72USy+9tN30lpYWzj//fAYOHMgXX3wR+WxhYSEXXnhhp8v77W9/S2FhIXfccccRa/OREggEeP755zn33HMZMWIEo0aN4sILL+Tdd9+N+gf40UcfpbCw8LCuOxgMcv/99/PBBx8cluV1dlx/SJYtW8Y111xzWJb19ttvU1hYyM6dO4EjcwwPxeE+/kdSd9t3R8Ljjz/Os88+e1iWtfdvcerUqd3q+rhp0yZ+/vOfH+1mSJIkSVKXyeeX3eTzy9F3MPu2oqKCa665htLS0si07naPeKCOxPHbe98ezufD7qI7Hfe921JYWMijjz7a5e+/8cYb/O1vfzsSTZOknwz9aDdAkn4IWlpauOqqq9iwYQOPPfYYkyZNisxTVZUVK1ZQUVFBRkZG1Pe8Xi+zZs36vpt7WNTU1HDVVVdRXl7OpZdeyuDBg7Esi1mzZnHHHXewdOlS7r33XhRFOSLrr6qq4oUXXuCvf/3rYVneXXfddViWczS98cYbbN68+Ygs+2c/+xkTJkw4Iss+GIf7+B9J3W3fHQn//ve/+dWvfnVElj19+nRiYmKOyLIPxieffMLy5cuPdjMkSZIk6ZDI5xf5/PJD8s033zB79uyoad3tHrE72Pu540g+Hx4t3fm4z5gxo901c1/+85//MGrUqCPYIkn68ZOBC0naj7ab/vXr1/Of//yH8ePHR83v378/xcXFfPLJJ1xxxRVR82bNmoXL5SIuLu57bPHhcfvtt1NRUcGMGTPIzc2NTJ88eTKZmZn885//ZMqUKRx33HFHr5EHID8//2g3oVvLyMg4oJswaTe57w5N//79j3YTJEmSJOlHRT6/yOeXHwN5j9jeT+G5ozsf96FDhx7tJkjST45MFSVJ+9Da2srVV1/Nd999x1NPPdXuph/A7XYzadIkPvnkk3bzPv74Y0466SR0PTpGaFkWTz31FCeccAIDBw7kpJNO4n//+1/UZ0zT5KmnnuL0009n8ODBDB06lAsvvJCFCxdGPvPoo49ywgkn8PXXX3PGGWdElvXuu+9GLeuFF17g5JNPZtCgQUyYMIG7776blpaWTrd7/fr1zJs3j1/+8pdRN/1trrjiCi6++GLcbneH3+9oeOfeqYn8fj933303EydOZODAgZx88smRVDQ7d+6MPFD84Q9/YOrUqZHlLF26lEsuuYQhQ4YwatQobr/9durq6qLW079/f9544w3Gjx/PqFGjKC4ubjdUt7CwkJdffpk//vGPjBo1imHDhvHrX/+ampqaqHY/++yzHHfccQwePJgLL7yQr776isLCQhYtWhRpa1eGjDY0NPB///d/jBs3jkGDBnH++eezYMGCqM/Mnz
+f888/n2HDhjFy5Eiuv/76SA+aO+64g3feeYfS0lIKCwt5++2397m+PVmWxeOPP87kyZMZMmQIN9xwA42NjVGf2XvYcVfPv5NPPpnPP/+c008/nUGDBjFt2jSWL1/OihUr+NnPfsbgwYM5/fTT223rxo0bufbaaxk+fDjDhw/nxhtvpKSkJLJPD/fx74qD3Z69992ll17KH//4R5566ikmT57MoEGDuPDCC1m1alWn6/7Tn/7E+PHjMU0zavp9993H6NGjCYVC+/zN7Mv+9pllWTz88MNMnTqVgQMHMnXqVB566CFCoRBAZNumT59+wMP+y8rK+NWvfsWIESMYP348//3vf9t9Zu/rxc6dO7nttts49thjGTBgAGPHjuW2226jvr4+6jvTp0/n/vvvZ/To0QwbNoxbbrmF1tZWnnrqKSZOnMiIESO46aabor4H4Z5pp512GgMHDmTy5Mk8+uijkf3+6KOPMn369Mh2t/2uu3LNvvTSS/n973/PzTffzNChQ7nyyiu7vJ8Odnv8fj8PPfQQJ554IgMHDmT48OFceeWVrF+/HoDy8nJGjBgRde0LBAKceuqpnHbaaQQCgS63UZIkSfphkM8v8vnlcDy/LFq0iMLCQl577TWmTJnC8OHDmT9/fpe2Z2/7Oy/efvtt/vCHPwBw3HHHRY7DnsfkpJNO4uabb2637GnTpnH99ddH/v7FF19wzjnnMGjQIMaPH89f/vIXvF5vp23bWyAQYMSIEe3S+xiGwZgxY/jLX/4Smbave8rOlv3YY49FzusTTzyRp556Csuyoj737rvvcvbZZzNkyBAmT57MQw89RDAYBKKfOzp6Pjz33HM7TAN3xRVXdHpv2pV9u2PHDq677jpGjx7NkCFDuOCCC9qNkNnbwd7f7v1b/PDDDznzzDMZPHgwY8aM4fe//z2VlZWR+WvWrOHyyy9nxIgRDBs2jCuuuIIVK1bss20d2bBhA1deeSXDhg1jypQpvP/+++0+s/fvZl/XqalTp1JaWso777wTdR1ZsmQJv/zlLxk5cmTk2evRRx+NnAdtv8+ZM2dy8803M2zYMEaNGsWdd94ZdS4LIXj++ec55ZRTGDx4MCeccALPPvtsVEq8A/2tSlK3JCRJinLJJZeISy65RLS2toqLLrpIDB48WCxZsmSfn505c6YoLCwU5eXlkXnNzc1i4MCBYsmSJWLKlCni9ttvj8z705/+JAYMGCAeeeQRMXfuXPHPf/5T9OvXT0yfPj3ymQceeEAMGTJEvPjii2LRokXi/fffFyeddJIYNWqU8Hq9QgghHnnkETFkyBAxZcoU8frrr4v58+eLX/ziF6KgoEAUFxcLIYT44IMPxIABAyLLefXVV8XQoUPFbbfd1uk+ePLJJ6OWsT+PPPKIKCgoiPx97+0VQoi33npLFBQUiJKSksg+mDJlivjwww/FwoULxd///ndRUFAg3nzzTREIBMRnn30mCgoKxMMPPyzWrl0rhBBi8eLFYsCAAeKXv/yl+Oqrr8Q777wjJk+eLE477TTh8/mi1nPyySeLWbNmibfffltYlhU5Vm0KCgrEiBEjxB133CHmzp0rXnnlFTFo0CDx29/+NvKZRx99VPTr10/84x//EHPnzhX333+/GDRokCgoKBALFy4UQggRCATE8uXLo4793vx+vzjzzDPFuHHjxOuvvy6+/vprcdNNN4n+/fuLb775RgghxI4dO8TgwYPFPffcIxYsWCA+/fRTcdJJJ4mpU6cK0zTF9u3bxdVXXy3Gjx8vli9fLmpra7t0bIQIn0v9+/cXjz76qJgzZ474wx/+IAYMGBB1PPY+hgdy/k2dOlV88MEH4ssvvxSTJ08Wxx57rJgyZYqYMWOGmDNnjjj11FPF6NGjI8doy5YtYtiwYeLcc88Vn332mfj444/FGWecIcaPHy9qamqOyPHvioPdnr333SWXXCJGjBghzj//fPH555+Lzz77TBx33HFi4sSJwjCMDte9ZM
kSUVBQIObPnx+ZZpqmGD9+vLjnnnuEEPv+zXSmK/vsiSeeECNHjhRvvvmmWLRokXjqqadEUVGR+Pe//y2EEGL58uW
"text/plain": [
"<Figure size 1600x1200 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"random_state = 17\n",
"kmeans = KMeans(n_clusters=4, random_state=random_state)\n",
"\n",
"labels = kmeans.fit_predict(data_scaled)\n",
"\n",
"centers = kmeans.cluster_centers_\n",
"\n",
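"print(\"Cluster centers:\\n\", centers)\n",
"# back to original units; column order: 0 id, 1 est_diameter_min, 2 est_diameter_max, 3 relative_velocity, 4 miss_distance, 5 absolute_magnitude\n",
"centers = scaler.inverse_transform(centers)\n",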
"\n",
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['est_diameter_min'], y=df_cleaned['est_diameter_max'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 1], centers[:, 2], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: est_diameter_min vs est_diameter_max')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['relative_velocity'], y=df_cleaned['miss_distance'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 3], centers[:, 4], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: relative_velocity vs miss_distance')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['miss_distance'], y=df_cleaned['absolute_magnitude'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 4], centers[:, 5], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: miss_distance vs absolute_magnitude')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['absolute_magnitude'], y=df_cleaned['relative_velocity'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 5], centers[:, 3], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: absolute_magnitude vs relative_velocity')\n",
"plt.legend()\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
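{
"cell_type": "markdown",
"metadata": {},
"source": [
"`n_clusters=4` above is chosen by hand. `silhouette_score`, already imported in the first cell, can compare candidate values of 𝑘: it ranges from -1 to 1, and higher means tighter, better-separated clusters. A sketch on synthetic blobs with four known groups (illustrative data, not the asteroid set; `scores` and `best_k` are hypothetical names):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.cluster import KMeans\n",
"from sklearn.datasets import make_blobs\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"# four well-separated synthetic groups in 2-D\n",
"X_demo, _ = make_blobs(n_samples=200, centers=[[0, 0], [6, 0], [0, 6], [6, 6]], cluster_std=0.5, random_state=0)\n",
"\n",
"scores = {}\n",
"for k in range(2, 7):\n",
"    km = KMeans(n_clusters=k, n_init=10, random_state=17)\n",
"    scores[k] = silhouette_score(X_demo, km.fit_predict(X_demo))\n",
"\n",
"# the k with the highest mean silhouette\n",
"best_k = max(scores, key=scores.get)\n",
"print({k: round(s, 3) for k, s in scores.items()}, best_k)"
]
},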
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### PCA for visualizing the clusters in reduced dimensions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PCA (Principal Component Analysis) is a dimensionality reduction method. It projects high-dimensional data onto a space with fewer dimensions while preserving as much of the original information (variance) as possible.\n",
"\n",
"When visualizing clustering results, PCA projects the multidimensional feature vectors onto a 2D plane so that the clusters can be drawn on an ordinary scatter plot. The clustering itself is still performed in the full feature space (`data_scaled`). PCA only changes how the result is displayed."
]
},
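{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the idea above. It uses hypothetical synthetic data rather than the asteroid dataset: 5-dimensional points whose variance is concentrated along two hidden directions. The points are standardized and then projected onto two principal components with scikit-learn's `PCA`. The `explained_variance_ratio_` attribute shows what share of the total variance each component retains. The names `X`, `mixing` and `pca_demo` are illustrative only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Hypothetical synthetic data (not the asteroid dataset): 300 points in 5 dimensions\n",
"# generated from 2 hidden factors plus a little noise\n",
"rng = np.random.default_rng(0)\n",
"latent = rng.normal(size=(300, 2))\n",
"mixing = np.array([[1.0, 0.8, 0.0, 0.5, -0.7],\n",
"                   [0.0, 0.6, 1.0, -0.5, 0.7]])\n",
"X = latent @ mixing + 0.05 * rng.normal(size=(300, 5))\n",
"X_scaled = StandardScaler().fit_transform(X)\n",
"\n",
"# Project onto the two directions of largest variance\n",
"pca_demo = PCA(n_components=2)\n",
"X_2d = pca_demo.fit_transform(X_scaled)\n",
"\n",
"print(X_2d.shape)\n",
"# Share of the total variance kept by each component, and by both together\n",
"print(pca_demo.explained_variance_ratio_)\n",
"print(pca_demo.explained_variance_ratio_.sum())"
]
},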
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABi4AAAJHCAYAAAAHaK7PAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3wUZf7A8c/MbE8vpAKBEHoJAaSoVBU86+lPz3a2s4By550etrPheZa7w3L2Xk7xRAWxi6CoqIjSew0ESEjv2b4zvz+WLIQkECCbBPy+X6/ckXlm53l2dzbOd7/zPF/FMAwDIYQQQgghhBBCCCGEEEKIDkBt7wEIIYQQQgghhBBCCCGEEELUk8SFEEIIIYQQQgghhBBCCCE6DElcCCGEEEIIIYQQQgghhBCiw5DEhRBCCCGEEEIIIYQQQgghOgxJXAghhBBCCCGEEEIIIYQQosOQxIUQQgghhBBCCCGEEEIIIToMSVwIIYQQQgghhBBCCCGEEKLDkMSFEEIIIYQQQgghhBBCCCE6DElcCCHEMc4wjPYegmjGr/m9+TU/dyGEEEII0ZBcG/46yPsshGhNkrgQ4jh1+eWX07t37wY/AwYMYNy4cdx///1UVVU1esz27duZPn06p556KoMGDWLcuHHccsstbNy4sdl+Hn/8cXr37s0DDzwQzqfTrKeeeorevXu3S99NmTNnDr1792b37t1hf5zX6+Whhx7i448/PtxhHpaLL76Y3r17M2/evLD209Hey6NRXV3NbbfdxtKlS0PbLr/8ci6//PI2G0NLP88TJkzgjjvuaNW+t2zZwiWXXNIqx9q9eze9e/dmzpw5rXI8IYQQQnQcErO0j+MpZunduzdPPfVUo+2bN29m1KhRjB07lh07doT27d27N4899liTx9J1ndGjRx+z155FRUX861//4vTTTyc7O5uTTz6ZKVOmNIhJIDxxSWFhIddffz35+fmtcrzm3lchxK+LJC6EOI7169ePWbNmhX5ee+01rrrqKmbPns3kyZMb3A3x5Zdfct5557Fu3TpuuOEGXnrpJW6++WZ27NjB7373O3744YdGx9d1nblz59KrVy8+/PBDXC5XWz69X73i4mLeeOMN/H5/2PrIzc1lxYoV9OrVi3feeSds/RxvNmzYwIcffoiu66Ft9913H/fdd1+b9H8kn+fW9MUXX7BixYpWOVZSUhKzZs1i3LhxrXI8IYQQQnQsErMc39oiZjnQli1buOqqq7Db7bz11lt069Yt1KaqKl988UWTj/vll18oLi5uo1G2rmXLlnHuueeycOFCrrjiCp5//nnuuusu3G43l19+OXPnzg1r/z/++CPffvttqx1v1qxZXHjhha12PCHEscnU3gMQQoRPZGQkgwcPbrDthBNOoK6ujieffJJVq1YxePBgdu7cye23387o0aN54okn0DQttP/EiRO55JJLuP322/n666+xWCyhtu+//57CwkIee+wxfv/73/PJJ5/IxcVxZs6cOaSnpzN58mSmTZtGXl4eGRkZ7T2sY1JWVlab9HOkn+eOymKxNPo7JoQQQojjh8QsojVt27aNK6+8koiICN544w3S0tIatA8ZMoSlS5eyfv16+vXr16Dt008/pW/fvmzYsKEth3zUKisr+ctf/kK3bt147bXXsNvtobZJkyZx/fXXc++993LyySeTmJjYjiNtObn+F0KAzLgQ4ldpwIABABQUFADw5ptv4vV6ufvuuxsEAAB2u53bb7+d//u//2s0VXv27Nn06tWLoUOHMmLECGbNmnXIvidMmMBDDz3ElVdeyaBBg7jrrruA4MXWvffey4knnsjAgQP53e9+x+LFixs81uPx8PDDD3PSSSeRk5PDnXfeicfjabBPU9NelyxZQu/evVmyZEloW25uLn/84x8ZPnw4J5xwApMnT2bbtm0N+vrXv/7F2LFjGTBgAGeffTafffZZg+Pqus6zzz7LuHHjyM7O5s
Ybb2xyOvuBWvq4BQsWcOmll5KTk8OAAQM4/fTTmTlzJhBcPueUU04B4M4772TChAmhx7333nucf/75DB48mEGDBnHuuefy+eefNzh27969D7k8UCAQYO7cuYwfP55TTz0Vh8PR5Hvs8/mYMWMGY8aMYdCgQVxzzTXMnTu30TTyDz74gDPOOIOBAwdyzjnnsHjxYvr163fQadifffYZ559/Pjk5OZx00knce++9DV6rp556itNPP5358+dz1llnMXDgQM4991xWrFjBypUrufDCCxk0aBBnnXVWo/Np8+bNTJ48mSFDhjBkyBCmTp3Krl27Qu31580777zD+PHjGTJkSOguvoO9xkuWLOGKK64A4Iorrgidj/ufm3/4wx84//zzGz3fG2+8kXPOOSf0+9KlS/n9739PdnY2w4cP5/bbb6e8vLzZ1wuO/PO8/3Pe/7Ny4NgB1q5dy5VXXsnQoUPJycnhqquuYuXKlUDwPXn66aeBhlO8dV3nxRdf5LTTTmPAgAFMmjSJN998s1E/06ZN46abbmLw4MFcffXVjZaKmjNnDv369WPVqlVcdNFFDBw4kPHjx/PKK680OFZxcTE333xz6DN+77338vjjjzf4rAghhBCi45KYRWKWlsQs+9u2bRtXXHEFUVFRvPXWW42SFhBMiiUmJjaadeH3+/nyyy8588wzGz2mJe97eXk5999/P+PHj2fAgAEMHz6cqVOnNoiHLr/8cu666y5efPFFxo0bx8CBA7n44otZvXp1aB+328306dMZM2ZM6PU88Dr3QHPnzqW4uJi//e1vDZIWEJxhMm3aNC677DJqa2sbPba5ZVnvuOOOBu/Xzp07mTJlCiNGjCA7O5uLLrooNMNizpw53HnnnQCccsopDd6z9957jzPPPDO0BNxTTz1FIBBo0M+VV17Jfffdx5AhQzjjjDMIBAIN4oj6z8bixYv5wx/+QHZ2NieddBL//ve/GxyrtraWe++9l1GjRpGTk8PNN9/M66+/3qGWaRNCHB5JXAjxK7R9+3YAunTpAsCiRYvo168fycnJTe4/atQobr75Zjp16hTaVllZyddff81vf/tbAM477zzWrFnDunXrDtn/zJkzGThwIM8++ywXXHABHo+HK6+8kq+++oqbb76Zp59+mpSUFK699toGF4S33nor7777LpMnT+aJJ56gqqqK119//bCff1FRERdddBE7duxg+vTp/Pvf/6a0tJQrr7ySyspKDMNg6tSpvPPOO1x99dU899xzoQuf/afY/vvf/+aZZ57hggsu4OmnnyY2NpZHH330kP235HHffPMNU6dOpX///jz77LM89dRTdOnShb///e+sWrWKpKSk0JfDN9xwQ+jfM2fO5N577+XUU0/lhRdeYMaMGVgsFqZNm0ZhYWHo+LNmzeLGG2886Di/++47SkpK+O1vf4vNZuM3v/kNH3zwAV6vt8F+9957L2+88Qa///3veeaZZ0hMTOSee+5psM/cuXO54447GDJkCM8++yyTJk3ixhtvbHCheaBnn32WW265hcGDB/Pkk08ydepU5s2bx+WXX47b7Q7tV1hYyCOPPMKUKVP4z3/+Q3V1NTfddBO33HILF154Ic888wyGYXDzzTeHHrd9+3YuvvhiysrK+Oc//8mDDz7Irl27uOSSSygrK2swjqeffprbb7+de++9l5ycnEO+xv379+fee+8NvTZNLQ91zjnnsG7dOvLy8kLbqqur+e677zj33HOB4FT1q666CpvNxhNPPMHf/vY3fv75Z6644ooGz/9AR/J5Phy1tbVce+21xMXF8dRTT/H444/jcrm45pprqKmp4cILL+SCCy4AGk7xnj59Ok8++STnnHMOzz//PKeffjoPPfQQzzzzTIPjf/7550RERPDcc89x7bXXNjkGXdf5y1/+whlnnMGLL77IkCFD+Ne//sWiRYuA4FrKV155JcuXL+dvf/sbDz/8MBs3buTVV189oucshBBCiLYnMY
vELC2JWerl5uZy5ZVXEhkZyVtvvdXseaJpGpMmTWqUuFi8eDEej6fRTS4ted8Nw2Dy5Mn88MMPTJs2jVdeeYU//vG
"text/plain": [
"<Figure size 1600x600 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"pca = PCA(n_components=2)\n",
"reduced_data = pca.fit_transform(data_scaled)\n",
"\n",
"plt.figure(figsize=(16, 6))\n",
"plt.subplot(1, 2, 1)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('PCA reduced data: Agglomerative Clustering')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.title('PCA reduced data: KMeans Clustering')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Inertia analysis for the elbow method (within-cluster sum of squared distances)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inertia analysis for the elbow method is a technique for choosing the number of clusters in a clustering task (for example, for the K-Means algorithm). **The method is based on the sum of squared deviations (the inertia) of the objects from the centers of their clusters**.\n",
"\n",
"Inertia (in the context of clustering) is a metric that measures how \"compact\" the clusters are, that is, how close the points of each cluster lie to its centroid.\n",
"Formally, inertia is defined as **the sum of squared distances from every point to its nearest cluster center**: $I = \\sum_{i=1}^{n} \\min_{j} \\lVert x_i - \\mu_j \\rVert^2$, where $x_i$ are the data points and $\\mu_j$ are the cluster centers.\n",
"\n",
"The elbow method:\n",
"1. Compute the inertia for a range of values of $k$ (the number of clusters).\n",
"2. Plot the inertia against $k$.\n",
"3. Find the point after which the decrease in inertia slows down markedly. This point is called the \"elbow\", and the corresponding $k$ is usually taken as the optimal number of clusters."
]
},
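{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the definition above. It assumes hypothetical toy data with three well-separated 2D blobs, not the asteroid data. Inertia is recomputed by hand as the sum of squared distances from each point to its nearest centroid and compared with `KMeans.inertia_`. An elbow curve is then computed for $k = 1, \\dots, 6$. On data like this, one would expect the inertia to drop sharply up to $k = 3$ and to flatten afterwards. All variable names here are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"\n",
"# Hypothetical toy data (not the asteroid data): three well-separated 2D blobs of 50 points each\n",
"rng = np.random.default_rng(42)\n",
"centers_true = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])\n",
"X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers_true])\n",
"\n",
"km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)\n",
"\n",
"# Inertia by definition: squared distance from every point to its nearest centroid, summed\n",
"sq_dists = ((X[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(axis=2)\n",
"manual_inertia = sq_dists.min(axis=1).sum()\n",
"print(manual_inertia, km.inertia_)\n",
"\n",
"# Elbow curve on the same data: inertia for k = 1..6\n",
"inertias_demo = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_\n",
"                 for k in range(1, 7)]\n",
"print([round(v, 1) for v in inertias_demo])"
]
},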
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA14AAAImCAYAAABD3lvqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAACHHElEQVR4nOzdeVzVVf7H8fe97DsCCm6IouKGO26pmZnti1naYoujaWbjpJVtTmX7zzRLHSvT1BZHLc2psd3Kyh3LFVBRUVQQ2UUu+/39gdzpCgIicC/wej4ePK58v+d77gc6w/DmnO/5Gsxms1kAAAAAgBpjtHUBAAAAAFDfEbwAAAAAoIYRvAAAAACghhG8AAAAAKCGEbwAAAAAoIYRvAAAAACghhG8AAAAAKCGEbwAAAAAoIYRvAAAAACghhG8AAAAAKCGEbwAoBrcd999CgsL01133XXRNlOnTlVYWJiefvrpWqwMQFWdOHFCYWFhWrt2ra1LAVAPELwAoJoYjUbt2rVLiYmJpc5lZ2fr559/tkFVAADAHhC8AKCadOrUSS4uLvr2229Lnfv555/l5uamwMBAG1QGAABsjeAFANXE3d1dV155ZZnB6+uvv9a1114rR0fHUud+/PFH3X777QoPD9cVV1yhV155RdnZ2ZKkoUOHKiwsrMyPEydOSJI2bdqke+65R7169VLfvn31+OOPKyEhweo9Hn/88TL7qGgJVckSyrI+/mrv3r0aN26c+vbtq549e+rhhx/WoUOHLOe3bdumsLAwbdu2TZJ08OBBDRs2THfddZfmz59/0feYP3++JOmzzz7T9ddfry5dulidr2jZ5urVq8vs96/XlSwnq6hdVWuo7PemvPe/2PmS/w5PP/20hg4davW+K1eutPoe/vV9du7cadX2k08+UVhYmFUfOTk5mjNnjoYPH64uXbqoZ8+eGjt2rKKjo62uvVhd9913n1WbkjrKcuH4KHHfffdZ9ZObm6t//etfuu666xQeHq7hw4dr0aJFKioqsrrmwlq2bdtWqWsrYjab9cwzz6hr1676/fffK30dAEhS6d8AAABVdsMNN+ixxx5TYmKigoKCJElZWVn69ddftXTpUv36669W7b/66is98cQTuvnmm/XYY4/p5MmTmjt3rmJjY7V06VItWLBAeXl5OnPmjB599FFNmjRJQ4YMkSQ1adJE69at01NPPaWbbrpJEydOVFpamubNm6fRo0friy++kL+/v6TiX1hHjx6t22+/XZIs/VVGp06d9MILL1g+/+yzz/T5559bPt+6davGjx+vvn376rXXXlNubq7ef/993XXXXVq9erVCQ0NL9fnmm2+qS5cumjRpknx8fDRo0CBJ0syZMyXJ8n5BQUHasWOHZsyYoTvuuEMzZsyQh4eHJFWq/pycHIWHh2vGjBmWYxe77q/f2wvbVbWGS/nePP/88+rcuXOZ779q1SpJ0v79+/XSSy+VanuhjIwMvf3222We8/Dw0E8//aRevXpZjn399dcyGq3/Fjt9+nRFRkZq2rRpCg4O1rFjx/TOO+/o8ccf1/r162UwGCxt77jjDt15552Wz0v+O1Yns9mshx9+WLt27dKjjz6qDh06aNu2bXr77bcVHx+vl19+2dL2wjEbGhpa6WvL88orr+i///2v/vWvf2ngwIHV/jUCqN8IXgBQjYYMGSI3Nzd9++23evDBByVJP/zwg/z9/a1+0ZWKf5GcPXu2Bg0apNmzZ1uOh4SE6MEHH9TGjRstQaBkdis4OFjdu3eXJBUVFWn27NkaOHCg5syZY7m+Z8+euuGGG7RkyRJNnz5dkmQymRQSEmK5tqS/yvD09LRcJ0m//fab1fk5c+aoVatWWrRokRwcHCRJAwcO1DXXXKN58+bpnXfesWp/7Ngx/f777/ryyy/Vrl07SbKEVE9PT0myer/169dLkp599llL4JEkZ2fnCms3mUwKCAiw6u9i1/31e3thuz179l
Sphkv53rRt2/ai719yPDc3t8y2F5o3b56aNWumtLS0UucGDx6sDRs26Mknn5QkJSYm6s8//1Tv3r118uRJSVJeXp7OnTunGTNm6IYbbpAk9enTR1lZWXrjjTeUnJysxo0bW/oMCgqyqqfkv2N1+vXXX7V582a99dZbuvHGGyVJV1xxhVxdXfXOO+/o/vvvt4ynC8fsxo0bK33txcyZM0erVq3SggULNHjw4Gr/+gDUfyw1BIBq5OrqqqFDh1otN1y/fr2uv/56qxkCSTpy5IgSExM1dOhQFRQUWD4iIiLk6empTZs2lfteR48e1ZkzZ3TTTTdZHQ8ODlaPHj20fft2y7GEhAR5eXlVw1doLTs7W3v37tX1119vCRaS5O3trauuusqqhpL2c+fOVd++fSv8RbdE165dJUkffvihkpKSlJeXp4KCgkpdW11fd1VquNTvTXU5ePCgVq1apX/+859lnh86dKji4uJ05MgRSdK3336rbt26qXnz5pY2zs7OWrJkiW644QadPn1aW7du1cqVKy0bxOTl5V1yXUVFRSooKJDZbK6wTcnHX9tu375djo6Ouu6666yuueWWWyznL+ZyrpWkTz/9VIsWLdKNN95oNSsKAJeCGS8AqGbXX3+9Hn30USUmJsrFxUVbtmzRY489Vqpdenq6pOJlWWUtzUpKSir3fUquDwgIKHUuICBAUVFRkopn1k6dOqUWLVpc2hdSCWfPnpXZbL5oDWfPnrU69vDDD8vb29tqqWJFIiIiNGPGDC1atEgLFiy4pPpOnjxZ7pK8mqzhUr831eWVV17RjTfeqB49epR5PjAwUF26dNGGDRvUpk0bff3117rpppss46XEb7/9ptdee01HjhyRh4eHOnToIHd3d0kqNzxdzMKFC7Vw4UI5ODgoICBAAwcO1D/+8Q+rDWdKZon/qk+fPpKKl082atTIKsRKssy8lff9vJxrJSkmJkYDBw7Uf//7Xz3wwAPq1KlTue0BoCwELwCoZoMHD5aHh4e+/fZbubu7q0WLFurSpUupdt7e3pKK76Up+eXyr3x8fMp9H19fX0lScnJyqXNnzpxRo0aNJEnR0dHKyckptSFGdfDy8pLBYLhoDSU1lpg+fbq+/fZbTZkyRZ9++mmll6SNGjVKv//+uwoKCvT888+rRYsWmjRpUrnXFBUVaffu3Ro5cmSl3uPCGcnLreFSvzfV4ZtvvtG+ffuslp6W5eqrr9aGDRt0/fXXa9++fVqwYIFV8Dp+/LgmT56sYcOG6f3331fLli1lMBj06aefllpqKlX8vZOKv3+jRo1SUVGRTp06pblz5+qhhx7Sl19+aWkzc+ZMq6D81/u0fHx8lJaWpsLCQqsAVfIHipLxXpbLuVaS/vGPf+j+++/XjTfeqBkzZuizzz4rFeIAoCIsNQSAaubs7Kxhw4bpu+++0zfffGO5p+RCbdq0kb+/v06cOKHw8HDLR2BgoObMmVNqBuJCrVu3VuPGjfXf//7X6nh8fLx27dqlnj17SpJ++eUXdezYUX5+fpf8tRQVFZX7C6a7u7u6dOmib775RoWFhZbjZ8+e1S+//FLqvrYuXbpowYIFOnnypN58881K1/HOO+/ol19+0RtvvKHrr79e4eHhFd5f9ccffyg7O1t9+/Ytt13J7M2Fm0tcbg2X+r25XHl5eZo1a5YmT55sdf9VWYYNG6bdu3frk08+Ua9evdSkSROr8/v27VNubq4mTJig4OBgS7AqCV0l37OSHQEr+t5JxZvBhIeHq1u3brr++ut177336sCBA8rIyLC0ad26tdX/Fv56P12fPn1UUFBQatfQkuBW3vfzcq6VimcoXV1d9fzzz2v//v1aunRphV8vAFyIGS8AqAE33HCDJk6cKKPRaLWj3l85ODho6tSpev755+Xg4KCrrrpKmZmZWrhwoU6fPl3hEjmj0ahp06bpmWee0eOPP65bbrlFaWlpWrBggXx8fDR27Fjt379fn376qW688U
bt2rXLcu2ZM2ckFc9spKamlgplqampio2N1bFjxywB7mIef/xxjRs3ThMmTNA999yj/Px8LVq0SHl5eZo8eXKp9oG
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"inertias = []\n",
"clusters_range = range(1, 23)\n",
"for i in clusters_range:\n",
" kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
" kmeans.fit(data_scaled)\n",
" inertias.append(kmeans.inertia_)\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range, inertias, marker='o')\n",
"plt.title('Elbow method for choosing k')\n",
"plt.xlabel('Number of clusters')\n",
"plt.ylabel('Inertia')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's compute the silhouette scores for different numbers of clusters"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1oAAAImCAYAAABKNfuQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAADHbElEQVR4nOzdeViUZfcH8O8zw8CwDLuACLjhAgouueG+lWWb2l5q9WaWWbap5U/fbLH0dcnS0tK01DTLtTItl0pLU3HLBVwQEUTZYYZtGJiZ3x/DjCKIDMzwzPL9XFdXOfPMM2duMTmc+z5H0Ov1ehAREREREZHFSMQOgIiIiIiIyNEw0SIiIiIiIrIwJlpEREREREQWxkSLiIiIiIjIwphoERERERERWRgTLSIiIiIiIgtjokVERERERGRhTLSIiIiIiIgsjIkWERERERGRhTHRIiKnMWbMGIwZM6bKY0eOHMEDDzyA6OhobNiwwarv//bbb2Pw4MFmv27w4MF4++23rRAREVlLu3btsHjxYrHDICIRuYgdABGRWHJzc/Hiiy+iQ4cOWLFiBdq1ayd2SEREROQgmGgRkdP6+uuvoVarMXfuXAQHB4sdDhERETkQbh0kIqeUn5+PdevW4f7776+WZKWkpGDSpEno06cPOnfujDFjxuDo0aNVrvnzzz8xatQodOrUCb1798bMmTNRWFhY5Zq1a9di0KBB6NSpE15//XUUFRUBAJYuXYq4uDh069YNM2fOhEajMb1Go9HgvffeQ/fu3dGzZ0/T1qPi4mJMmTIFnTt3xoABA7B27VrTa65cuYJ27dph8+bNpsfKysowZMiQKlW6mrZOHjp0CO3atcOhQ4dq/DVgqPx169at2rbHDRs24N5770XHjh0xcOBALF68GFqt1vR8TVslb4zV+F41/WOM83bbJmv6TDfLysrCW2+9hbi4OHTp0gWjR4/G8ePHTc/fvMVLr9fj8ccfR7t27XDlypUq19UW66RJk9C/f3/odLoq7z99+nQMGzYMAJCRkYE33ngDvXr1QqdOnTBmzBicOHECALB48eJbvocxvrNnz+Lll19Gr1690KFDB/Tr1w+zZs2CWq2udQ32799fa+x1/YwAsGfPHowcORKdOnWq9V432rx5M9q1a4d///0XI0eORGxsLO6//378+uuvVa67cuUKpk6dir59+6JDhw6Ii4vD1KlTkZ+fb7omMTERTz31FLp06YKhQ4di/fr1pudq+voFqn+d3G5b341fd6tXr6725+vgwYNo3749Pv/881ve42aLFi1CVFQUtmzZUufXEJF9Y0WLiJyKXq/HtWvXMGvWLFRUVOCFF16o8nxSUhIeffRRtGjRAjNmzIBMJsPq1avx9NNPY+XKlejRowfi4+MxYcIEPPDAA3jzzTdx4cIFfPLJJzh//jy+/fZbSKVS7N69G++//z7GjBmD/v374/vvv8fu3bsBANu3b8esWbOQnp6O+fPnQy6XY9q0aQCAefPmYdOmTZg6dSpCQkKwcOFCpKenIz09HXfffTcWLVqEffv24f3330dISAiGDBlS4+f86quvqiQJDbFgwQIUFhbC29vb9NiXX36JhQsXYvTo0Zg2bRoSExOxePFiXLt2DR999FGd7tuhQwd8//33AAxJ28aNG02/9vLyskjsxcXFeOKJJ6DVajFlyhQEBwdj5cqV+M9//oMtW7agRYsW1V7z448/VknEbvTwww/jkUceMf36vffeq/Lcb7/9hkOHDiEuLg4AoFar8euvv+L555+HRqPBuHHjUF5ejpkzZ0Imk2HJkiUYM2YMfvjhBzzyyCPo169flfvOnDkTABASEoKsrCw89dRT6Ny5M+bMmQNXV1fs27cPX3/9NYKCgjB+/PhbroNarUZISAg+/fTTGmOv62dMTU3Fq6++in79+uH11183fU3c6l43e+GFFzB69Gi8/vrr2LhxI1577TV8+eWXGDBgAEpLSzF27F
j4+flh5syZUCgUOH78OD777DPI5XK8//77KC0txfPPP49mzZph8eLFOHbsGGbOnInQ0FD079+/TjGYa8yYMdi5cyf+97//YeDAgXB1dcX//d//oXPnznjxxRfrdI8VK1ZgyZIlmDVrFkaOHGmVOInI9jDRIiKnEh8fj4EDB0Imk2H58uXVvtH+7LPP4OrqitWrV5u+2R84cCDuu+8+zJ07Fxs3bsTWrVvRokULzJ49GxKJBH369IG7uzveeecd7N27F4MHD8YXX3yBnj17YsaMGQCAnj17ok+fPigsLMTs2bPRsWNHAIBKpcLy5cvx0ksvQafT4fvvv8f48eMxevRoAEBgYCAee+wx+Pr6Yv78+ZDJZOjfvz/Onz+PL7/8ssZE69q1a1i+fDk6dOiAM2fONGi9Tp06hR9//BFRUVFQqVQAgMLCQixZsgSPPfaY6fP17dsXvr6+mDFjBp599lm0adPmtvf28vJC586dAQB//fUXAJh+bSlbtmxBeno6tmzZgqioKABA165dMWLECMTHx1f7/S8uLsb8+fNvuXYhISFVYrwxIezbty9CQkKwdetWU6K1a9culJSUYMSIEThx4gSSk5Oxdu1adOnSxRTLnXfeiSVLlmDx4sUICQmpct8b3+vvv/9GVFQUPv30U9PzvXv3xv79+3Ho0KFaE63S0lJ4e3vfMva6fsaEhASUl5fj9ddfR9u2bW97r5uNGTMGEydOBAD069cPI0eOxOeff44BAwYgJSUFISEh+N///ofw8HAAQK9evfDvv//i8OHDAID09HTExMTg//7v/xAeHo6+ffti3bp1+Ouvv6yWaAmCgNmzZ+OBBx7AvHnzIJVKUVBQgFWrVkEqld729d999x3mzZuH999/Hw8//LBVYiQi28Stg0TkVKKjozFnzhz4+Phg2rRp1ao+hw8fxqBBg6p84+ji4oJ7770Xp0+fRnFxMT788ENs3boVEokEFRUVqKiowLBhwyCRSBAfH4+KigokJCSgb9++pnu4ubmhU6dOcHd3NyVZgOGbc7VajXPnzuHcuXMoKyszVTUAwzfabm5uiI2NhUwmq/K6M2fOVNmqZ/S///0P3bp1w6BBgxq0Vnq9HrNmzcLDDz+M9u3bmx4/fvw41Go1Bg8ebPr8FRUVpm2C+/fvr3KfG6+5eVtdXeOo72uPHj2KsLAwU5IFAO7u7vjtt9+qVG2MlixZAj8/PzzxxBNmv5dEIsHIkSOxc+dOlJaWAjAker1790ZISAh69OiBEydOoHPnztBqtaioqIC3tzf69OmD+Pj4296/b9+++Pbbb+Hm5oakpCTs2bMHS5cuRV5eXpXtpzW5du0aFAqF2Z/pZh06dICLiwu+/fZbpKenQ6PRoKKiAnq9vk6vv7GaIwgC7rzzTpw8eRJqtRpRUVFYt24dmjVrhpSUFOzduxcrVqxAcnKy6fNFRkZi6dKlCA8Ph0ajwb59+6BUKtG6desq76PT6ap83dUUn/GausQeHh6OyZMnY8uWLdiwYQNmzJhhSgZr88cff+C9995Dt27d8Oijj972eiJyLKxoEZFT8fLywsiRI9GqVSs88cQTeO211/D999+bfjKtVCoRGBhY7XWBgYHQ6/UoKiqCp6cn3NzcABi+8byRSqVCbm4utFot/Pz8qjzn6+sLHx+fKo8Zt17l5OSYkqabX+fj4wNfX99qr6uoqKhydgUwJIq7d+/GTz/9hF9++aUuS3JLW7duRUpKCr744gv873//Mz1eUFAAALesoGRlZZn+Oz09vdoa1SeOrVu3QhAEBAQE4I477sCrr75a7ZvrmhQUFCAgIKBO75OSkoJVq1bhq6++wtWrV+sV60MPPYQvvvgCO3fuRK9evfDPP/9g/vz5puddXV0BGM5t3XhWpy6VEZ1Oh48//hhr165FSUkJmjZtitjYWNPXYm3S09PRrFmzenyiqsLDwzFv3jx8/PHHpm2eRj169Ljt64OCgqr8OiAgAHq9HiqVCn
K5HF9//TW++OILFBQUIDAwEB07doS7u3u1848qlQrdu3cHADRp0gT33HNPleefeeaZau99c3xLlizBkiVLIJVKERg
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"silhouette_scores = []\n",
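"# The silhouette score requires at least 2 clusters, so k starts at 2.\n",
"for i in clusters_range[1:]:\n",
"    kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
"    # A separate name keeps `labels` from the 4-cluster model intact for the plots.\n",
"    cluster_labels = kmeans.fit_predict(data_scaled)\n",
"    score = silhouette_score(data_scaled, cluster_labels)\n",
"    silhouette_scores.append(score)\n",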
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range[1:], silhouette_scores, marker='o')\n",
"plt.title('Silhouette scores for different k')\n",
"plt.xlabel('Number of clusters')\n",
"plt.ylabel('Silhouette score')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mean silhouette coefficient (silhouette score) is used to assess clustering quality. Its value lies between -1 and 1. Commonly cited rules of thumb for interpreting it:\n",
"\n",
"* Close to 1.0 (0.7 to 1.0): the clusters are well separated and compact. This is an excellent clustering result.\n",
"* From 0.5 to 0.7: the clusters are clearly distinguishable, but they overlap to some extent. This is a good result.\n",
"* From 0.25 to 0.5: the clusters overlap, so the boundaries between the groups are less clear. The clustering quality is satisfactory, but the number of clusters or the data preparation may need refinement.\n",
"* Close to 0.0: the clusters overlap heavily, or the data distribution does not allow clear groups to be identified. In this case, reconsider the number of clusters, the algorithm, or the input data.\n",
"* Below 0.0: poor clustering. Points are, on average, closer to the points of a neighboring cluster than to the points of their own cluster. This signals that the data are poorly structured for the current clustering."
]
},
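{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the definition concrete, here is a minimal sketch on a hypothetical six-point 1D dataset with hand-assigned labels. For each point, $a$ is the mean distance to the other points of its own cluster and $b$ is the mean distance to the points of the nearest other cluster. The per-point score is $s = (b - a) / \\max(a, b)$. The average of $s$ over all points should match `sklearn.metrics.silhouette_score` with the default Euclidean metric."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import pairwise_distances, silhouette_score\n",
"\n",
"# Hypothetical toy data: six 1D points in two hand-labelled groups\n",
"X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [13.0]])\n",
"y = np.array([0, 0, 0, 1, 1, 1])\n",
"\n",
"D = pairwise_distances(X)  # Euclidean distance matrix\n",
"s = []\n",
"for i in range(len(X)):\n",
"    same = y == y[i]\n",
"    same[i] = False  # exclude the point itself\n",
"    a = D[i, same].mean()  # mean distance to the rest of its own cluster\n",
"    # mean distance to the nearest other cluster\n",
"    b = min(D[i, y == c].mean() for c in np.unique(y) if c != y[i])\n",
"    s.append((b - a) / max(a, b))\n",
"\n",
"print(np.mean(s), silhouette_score(X, y))"
]
},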
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Average silhouette score: 0.302\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA04AAAJzCAYAAAA4M0NGAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3gc1dXA4d+d2b6rLtmSLPci925jYww2vdcQqg0JBEgooYWEhFQCHySUhB4gVEMwvVfTbcC94oa75aJq9a0z9/tjpLWFJBdcZOzzPo8frLlT7o5mzZ49956rtNYaIYQQQgghhBCtMtq6A0IIIYQQQgixv5PASQghhBBCCCF2QAInIYQQQgghhNgBCZyEEEIIIYQQYgckcBJCCCGEEEKIHZDASQghhBBCCCF2QAInIYQQQgghhNgBCZyEEEIIIYQQYgckcBJCCCGEEEKIHZDASQjxg02YMIHCwsImf4YPH87EiROZMWNGW3dPCHGAKyws5P7772+2ffny5YwePZojjjiCNWvWtHr8/fffT2FhIQMGDKC2trbFff73v/9RWFjIkUceuae6LYT4kZLASQixW/r27cvkyZOZPHkyzz//PHfccQdut5tLLrmE7777rq27J4Q4yHz33XdcfPHF+P1+Jk2aRJcuXXZ4TCKR4JNPPmmx7d13393DPRRC/FhJ4CSE2C2hUIjBgwczePBghg0bxtFHH83999+PYRi8+uqrbd09IcRBZOXKlVx00UUEg0EmTZpEx44dd+q4oUOH8t577zXbXlxczKxZs+jTp8+e7qoQ4kdIAichxB7n9/vxer0opZLbJkyYwIQJE5rsd/fdd1NYWNgkwJo0aRJHHXUUQ4YM4cILL2T58uUAPPfccxQWFrJ69eom53jjjTfo06cPmzZtAmDKlCmcf/75DBkyhP79+3P88cfz3HPPNTnmd7/7XbMhho1/ioqKkvt8f2jOCy+80Gxo0LvvvsuJJ57I4MGDOfPMM5k1a1aTY3bUn+nTp1NYWMj06dObHPf9+7Uz9y8Wi3HnnXdyxBFH0KdPnyava3tB7PfPfdtttzFgwAC++OILYOtwppb+bNvvnbn3JSUl/Pa3v2X06NHJ3/HcuXMBOPLII3f4e5k1axYXXnghgwYNYuTIkfz2t7+loqIief5XX32VwsJC5s+fzxlnnMHAgQM55ZRTeP/995v0o6amhv/7v//j6KOPZsCAAZx88sm8/PLLTfbZtj+9e/dmxIgRXH311WzZsqXVewmwatUqrrrqKkaOHMmIESO4/PLLWblyZav7b+/+bvt7W7NmDddccw1jxoxh8ODBTJgwgdmzZyfbi4qKkse9+eabTa7x6aefJtu29e6773LmmWcyZMgQxowZw5/+9Ceqqqqa9W1bLT2LRx55JL/73e9a/fn7Gvu67eubM2cO55xzDgMGDGDMmDHceuutRCKRVs/xfStXrmTixImkpKQwadIk8vPzd/rYE088kalTpzYbrvf+++/TtWtXevfu3eyYKVOmcOaZZyb7+/e//536+vpm++zM+//rr7/m5z//OYMGDWLMmDH885//xLKs5H7Tpk3jpz/9KUOGDGHEiBH88pe/3O4zJYTYOyRwEkLsFq01iUSCRCJBPB6ntLSUu+++m1gsxllnndXqcevWreOpp55qsu3DDz/k1ltv5aSTTuLBBx/EsiyuuOIKYrEYp5xyCl6vlzfeeKPJMa+//jqjR48mLy+Pzz77jCuvvJJ+/frx0EMPcf/999OxY0f+9re/MX/+/CbH5eTkJIcYTp48mV/+8pfbfZ1VVVX861//arJtwYIF3HjjjQwePJiHH36YvLw8rrjiCsrKygB2qT+7qqX799hjj/H0009z0UUX8fTTTzN58mQeeOCBXTrvggUL+N///se//vUvhgwZ0qRt2/v1pz/9qUnbzrzWuro6zjvvPKZPn85vfvMbHnjgAbxeLz//+c9Zs2YNDzzwQJM+//
KXv0xer127dsycOZOLL74Yn8/Hv/71L37/+98zY8YMJk6c2OwD9uWXX85RRx3FAw88QNeuXbn22mv5/PPPAYhEIpx//vm89dZbXHrppTz00EMMGzaMP/zhDzzyyCNNznPEEUcwefJknn32WW644QamTZvGbbfd1ur9Ky4u5pxzzmHNmjX85S9/4Z///CdlZWVcdNFFVFZWbvfeb3t/v/97W7FiBWeeeSZFRUXccsst3HXXXSiluOiii5rNJwwGg82Gnb377rsYRtP/5T/00ENcf/31DB48mPvuu48rr7ySDz74gAkTJuxSwLInbNq0iUsuuYSMjAweeOABrrnmGt544w1uuummnTp+1apVXHTRRYRCISZNmkT79u136frHHXcclmW1eN9OOumkZvu/9dZbXHnllXTr1o0HH3yQq666ijfffJNf/epXaK2BXXv/33jjjQwbNoxHHnmEk08+mccff5yXXnoJgPXr1/OrX/2K/v378/DDD3PbbbexevVqLrvsMmzb3qXXKYTYPa627oAQ4sdt5syZ9OvXr9n266+/nu7du7d63O23307Pnj359ttvk9sqKio4//zzuf766wEng9L4bX2fPn045phjePPNN/n1r3+NUorNmzfzzTff8M9//hNwPlyeccYZ/OEPf0iec8iQIRxyyCFMnz6dQYMGJbd7PB4GDx6c/HnVqlXbfZ333Xcf+fn5TbINmzdv5rjjjuPvf/87hmGQnZ3NySefzLx58zj66KN3qT+7qqX7t2DBAnr37s3Pf/7z5LbGTM3Oasz4HXXUUc3atr1f0Wi0SdvOvNbXXnuNDRs28NprryWHPg0dOpTTTz+dmTNncvbZZzfpc6dOnZpc8+6776Zr16785z//wTRNAAYNGsRJJ53EK6+8wgUXXJDcd8KECVx55ZUAjB07ljPOOIMHH3yQI444gldffZXly5fzwgsvJIPDsWPHkkgkeOihhzj33HNJT08HIDMzM9mHESNG8NVXXzW559/31FNPEYvFePLJJ8nJyQGgd+/enHfeecyfP58jjjii1WO3fa3f/7098MADeDwennnmGUKhEADjxo3j5JNP5h//+EeTbNnhhx/Ol19+SSwWw+PxEI1G+fjjjxkxYkQyQ1hVVcXDDz/MT3/60yZBcK9evbjgggua3c+97bHHHiMjI4MHH3ww+bs1DINbbrmFZcuWNct6bWvNmjVMnDiRsrIy4vH4DwomsrOzGTFiBO+99x6nnnoqABs2bGD+/Pn84x//4OGHH07uq7XmrrvuYuzYsdx1113J7V26dOHiiy/m888/Z9y4cbv0/j/77LOTz+vo0aOZMmUKn332Geeeey4LFiwgEolw+eWXJwPC3NxcPv74Y+rr65PPgxBi75PASQixW/r168df//pXwPlAUV1dzRdffMG9995LfX091113XbNjvvjiC7766isee+wxJk6cmNx+7rnnAmDbNvX19Xz44Yf4fD46dOgAwE9+8hPefvttZs2axYgRI3j99dcJBoMcc8wxAFx66aWAk9lYvXo169atY+HChYAThP1Qy5cvT2YdGvsIcOyxx3Lssceitaa+vp733nsPwzDo2rXrXu1Pa/dvwIABPProo3zwwQeMGjWKYDC40x8itdbMnTuXd999t1kma2fszGudPXs2BQUFTeaL+P1+Pvjggx2ePxwOM3/+fC655JJklhOgY8eOdO/enWnTpjX5oH/GGWck/66U4phjjuH+++8nEokwY8YMOnTo0Cyjduqpp/Lyyy83CXAar2XbNkuXLmX27NkceuihrfZz9uzZDB48OBk0gfMh99NPP93ha9yeGTNmMH78+CYfkl0uVzI7W1dXl9w+atQovvjiC6ZPn87YsWP54osvCIVCDB8+PBk4zZs3j1gsxsknn9zkOsOHD6dDhw7MmDFjtwOnxntnGEazbFcj27ZJJBLMmjWLww47LBk0gRMAgnNPtxc4vf322/Tv3597772Xn//85/zmN7
/hqaeeanJNy7KSmSBwnoltrwXOcL2///3v1NbWEgqFeOedd+jXrx+dO3dust+qVavYvHkzl19+efI5BCewDoVCTJs
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"kmeans = KMeans(n_clusters=4, random_state=42) \n",
"df_clusters = kmeans.fit_predict(data_scaled)\n",
"\n",
"silhouette_avg = silhouette_score(data_scaled, df_clusters)\n",
"print(f'Average silhouette score: {silhouette_avg:.3f}')\n",
"\n",
"pca = PCA(n_components=2)\n",
"df_pca = pca.fit_transform(data_scaled)\n",
"\n",
"plt.figure(figsize=(10, 7))\n",
"sns.scatterplot(x=df_pca[:, 0], y=df_pca[:, 1], hue=df_clusters, palette='viridis', alpha=0.7)\n",
"plt.title('Cluster visualization with K-Means')\n",
"plt.xlabel('First PCA component')\n",
"plt.ylabel('Second PCA component')\n",
"plt.legend(title='Cluster', loc='upper right')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In our case the mean silhouette score is 0.302. This falls in the satisfactory range (0.25 to 0.5), close to its lower bound. The result is acceptable and indicates that the clusters overlap to some extent."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}