{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"# Lab 5\n",
"\n",
"Dataset: **NASA Nearest Earth Objects** https://www.kaggle.com/datasets/sameepvani/nasa-nearest-earth-objects\n",
"\n",
"1. **name**: Name of the asteroid\n",
"2. **absolute_magnitude**: Absolute magnitude of the asteroid (its intrinsic brightness)\n",
"3. **est_diameter_min**: Minimum estimated diameter of the asteroid (in kilometers)\n",
"4. **est_diameter_max**: Maximum estimated diameter of the asteroid (in kilometers)\n",
"5. **hazardous**: Whether the asteroid is potentially hazardous\n",
"6. **relative_velocity**: Velocity of the asteroid relative to Earth (in kilometers per hour)\n",
"7. **miss_distance**: Distance between Earth and the asteroid at its closest approach (in kilometers)\n",
"8. **orbiting_body**: Central body that the asteroid orbits\n",
"9. **sentry_object**: Whether the object is tracked by NASA's Sentry monitoring system\n",
"10. **id**: Unique identifier of the asteroid"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"**Business goals**:\n",
"1. Support educational and public information programs.\n",
"2. Group asteroids by \"interesting\" characteristics for visualization and public outreach (for example, slow and large asteroids, or those that pass closest to Earth)."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 1,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>id</th>\n",
|
|||
|
" <th>name</th>\n",
|
|||
|
" <th>est_diameter_min</th>\n",
|
|||
|
" <th>est_diameter_max</th>\n",
|
|||
|
" <th>relative_velocity</th>\n",
|
|||
|
" <th>miss_distance</th>\n",
|
|||
|
" <th>orbiting_body</th>\n",
|
|||
|
" <th>sentry_object</th>\n",
|
|||
|
" <th>absolute_magnitude</th>\n",
|
|||
|
" <th>hazardous</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>2162635</td>\n",
|
|||
|
" <td>162635 (2000 SS164)</td>\n",
|
|||
|
" <td>1.198271</td>\n",
|
|||
|
" <td>2.679415</td>\n",
|
|||
|
" <td>13569.249224</td>\n",
|
|||
|
" <td>5.483974e+07</td>\n",
|
|||
|
" <td>Earth</td>\n",
|
|||
|
" <td>False</td>\n",
|
|||
|
" <td>16.73</td>\n",
|
|||
|
" <td>False</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>2277475</td>\n",
|
|||
|
" <td>277475 (2005 WK4)</td>\n",
|
|||
|
" <td>0.265800</td>\n",
|
|||
|
" <td>0.594347</td>\n",
|
|||
|
" <td>73588.726663</td>\n",
|
|||
|
" <td>6.143813e+07</td>\n",
|
|||
|
" <td>Earth</td>\n",
|
|||
|
" <td>False</td>\n",
|
|||
|
" <td>20.00</td>\n",
|
|||
|
" <td>True</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>2512244</td>\n",
|
|||
|
" <td>512244 (2015 YE18)</td>\n",
|
|||
|
" <td>0.722030</td>\n",
|
|||
|
" <td>1.614507</td>\n",
|
|||
|
" <td>114258.692129</td>\n",
|
|||
|
" <td>4.979872e+07</td>\n",
|
|||
|
" <td>Earth</td>\n",
|
|||
|
" <td>False</td>\n",
|
|||
|
" <td>17.83</td>\n",
|
|||
|
" <td>False</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>3596030</td>\n",
|
|||
|
" <td>(2012 BV13)</td>\n",
|
|||
|
" <td>0.096506</td>\n",
|
|||
|
" <td>0.215794</td>\n",
|
|||
|
" <td>24764.303138</td>\n",
|
|||
|
" <td>2.543497e+07</td>\n",
|
|||
|
" <td>Earth</td>\n",
|
|||
|
" <td>False</td>\n",
|
|||
|
" <td>22.20</td>\n",
|
|||
|
" <td>False</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>3667127</td>\n",
|
|||
|
" <td>(2014 GE35)</td>\n",
|
|||
|
" <td>0.255009</td>\n",
|
|||
|
" <td>0.570217</td>\n",
|
|||
|
" <td>42737.733765</td>\n",
|
|||
|
" <td>4.627557e+07</td>\n",
|
|||
|
" <td>Earth</td>\n",
|
|||
|
" <td>False</td>\n",
|
|||
|
" <td>20.09</td>\n",
|
|||
|
" <td>True</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" id name est_diameter_min est_diameter_max \\\n",
|
|||
|
"0 2162635 162635 (2000 SS164) 1.198271 2.679415 \n",
|
|||
|
"1 2277475 277475 (2005 WK4) 0.265800 0.594347 \n",
|
|||
|
"2 2512244 512244 (2015 YE18) 0.722030 1.614507 \n",
|
|||
|
"3 3596030 (2012 BV13) 0.096506 0.215794 \n",
|
|||
|
"4 3667127 (2014 GE35) 0.255009 0.570217 \n",
|
|||
|
"\n",
|
|||
|
" relative_velocity miss_distance orbiting_body sentry_object \\\n",
|
|||
|
"0 13569.249224 5.483974e+07 Earth False \n",
|
|||
|
"1 73588.726663 6.143813e+07 Earth False \n",
|
|||
|
"2 114258.692129 4.979872e+07 Earth False \n",
|
|||
|
"3 24764.303138 2.543497e+07 Earth False \n",
|
|||
|
"4 42737.733765 4.627557e+07 Earth False \n",
|
|||
|
"\n",
|
|||
|
" absolute_magnitude hazardous \n",
|
|||
|
"0 16.73 False \n",
|
|||
|
"1 20.00 True \n",
|
|||
|
"2 17.83 False \n",
|
|||
|
"3 22.20 False \n",
|
|||
|
"4 20.09 True "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 1,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from scipy.cluster.hierarchy import dendrogram, linkage, fcluster\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.metrics import silhouette_score\n",
"\n",
"# Load the dataset and keep only the first 1500 rows\n",
"df = pd.read_csv(\"data/neo.csv\")\n",
"df = df.head(1500)\n",
"df.head()"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"#### Data cleaning\n",
"\n",
"Drop the columns that are not used for clustering, along with any rows that have missing values."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 2,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>id</th>\n",
|
|||
|
" <th>est_diameter_min</th>\n",
|
|||
|
" <th>est_diameter_max</th>\n",
|
|||
|
" <th>relative_velocity</th>\n",
|
|||
|
" <th>miss_distance</th>\n",
|
|||
|
" <th>absolute_magnitude</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>0</th>\n",
|
|||
|
" <td>2162635</td>\n",
|
|||
|
" <td>1.198271</td>\n",
|
|||
|
" <td>2.679415</td>\n",
|
|||
|
" <td>13569.249224</td>\n",
|
|||
|
" <td>5.483974e+07</td>\n",
|
|||
|
" <td>16.73</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>1</th>\n",
|
|||
|
" <td>2277475</td>\n",
|
|||
|
" <td>0.265800</td>\n",
|
|||
|
" <td>0.594347</td>\n",
|
|||
|
" <td>73588.726663</td>\n",
|
|||
|
" <td>6.143813e+07</td>\n",
|
|||
|
" <td>20.00</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>2</th>\n",
|
|||
|
" <td>2512244</td>\n",
|
|||
|
" <td>0.722030</td>\n",
|
|||
|
" <td>1.614507</td>\n",
|
|||
|
" <td>114258.692129</td>\n",
|
|||
|
" <td>4.979872e+07</td>\n",
|
|||
|
" <td>17.83</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>3</th>\n",
|
|||
|
" <td>3596030</td>\n",
|
|||
|
" <td>0.096506</td>\n",
|
|||
|
" <td>0.215794</td>\n",
|
|||
|
" <td>24764.303138</td>\n",
|
|||
|
" <td>2.543497e+07</td>\n",
|
|||
|
" <td>22.20</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>4</th>\n",
|
|||
|
" <td>3667127</td>\n",
|
|||
|
" <td>0.255009</td>\n",
|
|||
|
" <td>0.570217</td>\n",
|
|||
|
" <td>42737.733765</td>\n",
|
|||
|
" <td>4.627557e+07</td>\n",
|
|||
|
" <td>20.09</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" id est_diameter_min est_diameter_max relative_velocity \\\n",
|
|||
|
"0 2162635 1.198271 2.679415 13569.249224 \n",
|
|||
|
"1 2277475 0.265800 0.594347 73588.726663 \n",
|
|||
|
"2 2512244 0.722030 1.614507 114258.692129 \n",
|
|||
|
"3 3596030 0.096506 0.215794 24764.303138 \n",
|
|||
|
"4 3667127 0.255009 0.570217 42737.733765 \n",
|
|||
|
"\n",
|
|||
|
" miss_distance absolute_magnitude \n",
|
|||
|
"0 5.483974e+07 16.73 \n",
|
|||
|
"1 6.143813e+07 20.00 \n",
|
|||
|
"2 4.979872e+07 17.83 \n",
|
|||
|
"3 2.543497e+07 22.20 \n",
|
|||
|
"4 4.627557e+07 20.09 "
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 2,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
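"# Drop the text and boolean columns, then drop rows with missing values.\n",
"# The numeric `id` column is kept, so it remains in the feature set used below.\n",
"df_cleaned = df.drop(columns=['name', 'orbiting_body', 'sentry_object', 'hazardous'], errors='ignore').dropna()\n",
"df_cleaned.head()"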
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"#### Visualizing pairwise relationships"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1600x1200 with 4 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"sns.set(style=\"whitegrid\")\n",
"\n",
"# minimum vs maximum estimated diameter\n",
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['est_diameter_min'], y=df_cleaned['est_diameter_max'], alpha=0.6)\n",
"plt.title('est_diameter_min vs est_diameter_max')\n",
"\n",
"# relative velocity vs miss distance\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['relative_velocity'], y=df_cleaned['miss_distance'], alpha=0.6)\n",
"plt.title('relative_velocity vs miss_distance')\n",
"\n",
"# miss distance vs absolute magnitude\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['miss_distance'], y=df_cleaned['absolute_magnitude'], alpha=0.6)\n",
"plt.title('miss_distance vs absolute_magnitude')\n",
"\n",
"# absolute magnitude (brightness) vs relative velocity\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['absolute_magnitude'], y=df_cleaned['relative_velocity'], alpha=0.6)\n",
"plt.title('absolute_magnitude vs relative_velocity')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"#### Standardizing the data for clustering"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"Standardization rescales every feature (column) to a common scale: each value is transformed as $z = (x - \\mu) / \\sigma$, so that every column has zero mean and unit variance."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[-0.51120494, 3.73744939, 3.73744939, -1.45805986, 0.5311181 ,\n",
|
|||
|
" -2.02560332],\n",
|
|||
|
" [-0.50518329, 0.35347528, 0.35347528, 0.77156175, 0.86292579,\n",
|
|||
|
" -0.94818185],\n",
|
|||
|
" [-0.49287316, 2.00915073, 2.00915073, 2.28238187, 0.27762431,\n",
|
|||
|
" -1.66316796],\n",
|
|||
|
" ...,\n",
|
|||
|
" [-0.42311599, -0.5744506 , -0.5744506 , -0.12789786, 0.31239586,\n",
|
|||
|
" 1.39117363],\n",
|
|||
|
" [-0.42119188, -0.52315117, -0.52315117, -0.94732109, 1.179388 ,\n",
|
|||
|
" 0.76514892],\n",
|
|||
|
" [-0.51510459, 3.1416016 , 3.1416016 , -0.61031165, -0.21666262,\n",
|
|||
|
" -1.92016758]])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"scaler = StandardScaler()\n",
"data_scaled = scaler.fit_transform(df_cleaned)\n",
"data_scaled"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"#### Agglomerative (hierarchical) clustering"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"Hierarchical clustering is a machine learning method that groups objects (data points) by their similarity or by the distance between them. It builds a tree of clusters (a dendrogram) that shows how objects are merged into groups at each level."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x700 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"[ 7 17 13 ... 18 9 7]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"# Build the merge hierarchy with Ward linkage\n",
"linkage_matrix = linkage(data_scaled, method='ward')\n",
"\n",
"plt.figure(figsize=(10, 7))\n",
"dendrogram(linkage_matrix)\n",
"plt.title('Agglomerative clustering dendrogram')\n",
"plt.xlabel('Sample index')\n",
"plt.ylabel('Distance')\n",
"plt.show()\n",
"\n",
"# Cut the tree at distance 10 to get a flat cluster label for each sample\n",
"result = fcluster(linkage_matrix, t=10, criterion='distance')\n",
"print(result)"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"#### Visualizing the cluster assignments"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 6,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1600x1200 with 4 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['est_diameter_min'], y=df_cleaned['est_diameter_max'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('est_diameter_min vs est_diameter_max')\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['relative_velocity'], y=df_cleaned['miss_distance'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('relative_velocity vs miss_distance')\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['miss_distance'], y=df_cleaned['absolute_magnitude'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('miss_distance vs absolute_magnitude')\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['absolute_magnitude'], y=df_cleaned['relative_velocity'], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('absolute_magnitude vs relative_velocity')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"#### KMeans (non-hierarchical clustering) for comparison"
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
"Non-hierarchical clustering groups data by assigning objects to a preset number of clusters (𝑘 in K-Means) according to a distance or similarity metric. Unlike hierarchical clustering, which builds a tree of clusters, it works with a fixed number of clusters and assigns each object directly to a group.\n",
"\n",
"K-Means:\n",
"* Splits the data into 𝑘 clusters by **minimizing the sum of squared distances from each point to its centroid**.\n",
"* Updates the centroids iteratively until the result stabilizes."
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
"Cluster centers:\n",
|
|||
|
" [[-0.43863035 0.09971675 0.09971675 0.63756469 0.62405041 -0.45475412]\n",
|
|||
|
" [-0.4426337 -0.35846552 -0.35846552 -0.63523481 -0.61852806 0.41688706]\n",
|
|||
|
" [ 2.21402038 -0.38941373 -0.38941373 -0.16164814 -0.0129622 0.65296724]\n",
|
|||
|
" [-0.50737813 3.15522336 3.15522336 0.6094733 0.11681301 -1.85373482]]\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAABi4AAASgCAYAAABv+bDPAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd5wV1fn48c+U27d3tsAuLLssvUiXaq/YosZuYjeaYiz5xXzVGI1JNCaKxh41NuwdO9KkSy/CUpftvd06M+f3x2UvXHYXliarnvfr5UuYuXfmTLnDzDznPI8ihBBIkiRJkiRJkiRJkiRJkiRJkiR1A+rRboAkSZIkSZIkSZIkSZIkSZIkSVIbGbiQJEmSJEmSJEmSJEmSJEmSJKnbkIELSZIkSZIkSZIkSZIkSZIkSZK6DRm4kCRJkiRJkiRJkiRJkiRJkiSp25CBC0mSJEmSJEmSJEmSJEmSJEmSug0ZuJAkSZIkSZIkSZIkSZIkSZIkqduQgQtJkiRJkiRJkiRJkiRJkiRJkroNGbiQJEmSJEmSJEmSJEmSJEmSJKnbkIELSZK6NSHE0W5Ct/JT2x8/te3tzn5Kx+KntK2SJEmSJB1e8j7i4Ml9d+DkPuve5PGRpEMjAxeStJdLL72USy+9tN30lpYWzj//fAYOHMgXX3wR+WxhYSEXXnhhp8v77W9/S2FhIXfccccRa/OREggEeP755zn33HMZMWIEo0aN4sILL+Tdd9+N+gf40UcfpbCw8LCuOxgMcv/99/PBBx8cluV1dlx/SJYtW8Y111xzWJb19ttvU1hYyM6dO4EjcwwPxeE+/kdSd9t3R8Ljjz/Os88+e1iWtfdvcerUqd3q+rhp0yZ+/vOfH+1mSJIkSVKXyeeX3eTzy9F3MPu2oqKCa665htLS0si07naPeKCOxPHbe98ezufD7qI7Hfe921JYWMijjz7a5e+/8cYb/O1vfzsSTZOknwz9aDdAkn4IWlpauOqqq9iwYQOPPfYYkyZNisxTVZUVK1ZQUVFBRkZG1Pe8Xi+zZs36vpt7WNTU1HDVVVdRXl7OpZdeyuDBg7Esi1mzZnHHHXewdOlS7r33XhRFOSLrr6qq4oUXXuCvf/3rYVneXXfddViWczS98cYbbN68+Ygs+2c/+xkTJkw4Iss+GIf7+B9J3W3fHQn//ve/+dWvfnVElj19+nRiYmKOyLIPxieffMLy5cuPdjMkSZIk6ZDI5xf5/PJD8s033zB79uyoad3tHrE72Pu540g+Hx4t3fm4z5gxo901c1/+85//MGrUqCPYIkn68ZOBC0naj7ab/vXr1/Of//yH8ePHR83v378/xcXFfPLJJ1xxxRVR82bNmoXL5SIuLu57bPHhcfvtt1NRUcGMGTPIzc2NTJ88eTKZmZn885//ZMqUKRx33HFHr5EHID8//2g3oVvLyMg4oJswaTe57w5N//79j3YTJEmSJOlHRT6/yOeXHwN5j9jeT+G5ozsf96FDhx7tJkjST45MFSVJ+9Da2srVV1/Nd999x1NPPdXuph/A7XYzadIkPvnkk3bzPv74Y0466SR0PTpGaFkWTz31FCeccAIDBw7kpJNO4n//+1/UZ0zT5KmnnuL0009n8ODBDB06lAsvvJCFCxdGPvPoo49ywgkn8PXXX3PGGWdElvXuu+9GLeuFF17g5JNPZtCgQUyYMIG7776blpaWTrd7/fr1zJs3j1/+8pdRN/1trrjiCi6++GLcbneH3+9oeOfeqYn8fj933303EydOZODAgZx88smRVDQ7d+6MPFD84Q9/YOrUqZHlLF26lEsuuYQhQ4YwatQobr/9durq6qLW079/f9544w3Gjx/PqFGjKC4ubjdUt7CwkJdffpk//vGPjBo1imHDhvHrX/+ampqaqHY/++yzHHfccQwePJgLL7yQr776isLCQhYtWhRpa1eGjDY0NPB///d/jBs3jkGDBnH++eezYMGCqM/Mnz
+f888/n2HDhjFy5Eiuv/76SA+aO+64g3feeYfS0lIKCwt5++2397m+PVmWxeOPP87kyZMZMmQIN9xwA42NjVGf2XvYcVfPv5NPPpnPP/+c008/nUGDBjFt2jSWL1/OihUr+NnPfsbgwYM5/fTT223rxo0bufbaaxk+fDjDhw/nxhtvpKSkJLJPD/fx74qD3Z69992ll17KH//4R5566ikmT57MoEGDuPDCC1m1alWn6/7Tn/7E+PHjMU0zavp9993H6NGjCYVC+/zN7Mv+9pllWTz88MNMnTqVgQMHMnXqVB566CFCoRBAZNumT59+wMP+y8rK+NWvfsWIESMYP348//3vf9t9Zu/rxc6dO7nttts49thjGTBgAGPHjuW2226jvr4+6jvTp0/n/vvvZ/To0QwbNoxbbrmF1tZWnnrqKSZOnMiIESO46aabor4H4Z5pp512GgMHDmTy5Mk8+uijkf3+6KOPMn369Mh2t/2uu3LNvvTSS/n973/PzTffzNChQ7nyyiu7vJ8Odnv8fj8PPfQQJ554IgMHDmT48OFceeWVrF+/HoDy8nJGjBgRde0LBAKceuqpnHbaaQQCgS63UZIkSfphkM8v8vnlcDy/LFq0iMLCQl577TWmTJnC8OHDmT9/fpe2Z2/7Oy/efvtt/vCHPwBw3HHHRY7DnsfkpJNO4uabb2637GnTpnH99ddH/v7FF19wzjnnMGjQIMaPH89f/vIXvF5vp23bWyAQYMSIEe3S+xiGwZgxY/jLX/4Smbave8rOlv3YY49FzusTTzyRp556Csuyoj737rvvcvbZZzNkyBAmT57MQw89RDAYBKKfOzp6Pjz33HM7TAN3xRVXdHpv2pV9u2PHDq677jpGjx7NkCFDuOCCC9qNkNnbwd7f7v1b/PDDDznzzDMZPHgwY8aM4fe//z2VlZWR+WvWrOHyyy9nxIgRDBs2jCuuuIIVK1bss20d2bBhA1deeSXDhg1jypQpvP/+++0+s/fvZl/XqalTp1JaWso777wTdR1ZsmQJv/zlLxk5cmTk2evRRx+NnAdtv8+ZM2dy8803M2zYMEaNGsWdd94ZdS4LIXj++ec55ZRTGDx4MCeccALPPvtsVEq8A/2tSlK3JCRJinLJJZeISy65RLS2toqLLrpIDB48WCxZsmSfn505c6YoLCwU5eXlkXnNzc1i4MCBYsmSJWLKlCni9ttvj8z705/+JAYMGCAeeeQRMXfuXPHPf/5T9OvXT0yfPj3ymQceeEAMGTJEvPjii2LRokXi/fffFyeddJIYNWqU8Hq9QgghHnnkETFkyBAxZcoU8frrr4v58+eLX/ziF6KgoEAUFxcLIYT44IMPxIABAyLLefXVV8XQoUPFbbfd1uk+ePLJJ6OWsT+PPPKIKCgoiPx97+0VQoi33npLFBQUiJKSksg+mDJlivjwww/FwoULxd///ndRUFAg3nzzTREIBMRnn30mCgoKxMMPPyzWrl0rhBBi8eLFYsCAAeKXv/yl+Oqrr8Q777wjJk+eLE477TTh8/mi1nPyySeLWbNmibfffltYlhU5Vm0KCgrEiBEjxB133CHmzp0rXnnlFTFo0CDx29/+NvKZRx99VPTr10/84x//EHPnzhX333+/GDRokCgoKBALFy4UQggRCATE8uXLo4793vx+vzjzzDPFuHHjxOuvvy6+/vprcdNNN4n+/fuLb775RgghxI4dO8TgwYPFPffcIxYsWCA+/fRTcdJJJ4mpU6cK0zTF9u3bxdVXXy3Gjx8vli9fLmpra7t0bIQIn0v9+/cXjz76qJgzZ474wx/+IAYMGBB1PPY+hgdy/k2dOlV88MEH4ssvvxSTJ08Wxx57rJgyZYqYMWOGmDNnjjj11FPF6NGjI8doy5YtYtiwYeLcc88Vn332mfj444/FGWecIcaPHy9qamqOyPHvioPdnr333SWXXCJGjBghzj//fPH555+Lzz77TBx33HFi4sSJwjCMDte9ZM
kSUVBQIObPnx+ZZpqmGD9+vLjnnnuEEPv+zXSmK/vsiSeeECNHjhRvvvmmWLRokXjqqadEUVGR+Pe//y2EEGL58uW
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1600x1200 with 4 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
"source": [
"random_state = 17\n",
"# Fit K-Means with 4 clusters on the standardized features.\n",
"kmeans = KMeans(n_clusters=4, random_state=random_state)\n",
"\n",
"labels = kmeans.fit_predict(data_scaled)\n",
"\n",
"centers = kmeans.cluster_centers_\n",
"\n",
"print(\"Cluster centers:\\n\", centers)\n",
"# Map the centroids back to the original feature units for plotting.\n",
"# The column indices used below assume the feature order of data_scaled.\n",
"centers = scaler.inverse_transform(centers)\n",
"\n",
"plt.figure(figsize=(16, 12))\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(x=df_cleaned['est_diameter_min'], y=df_cleaned['est_diameter_max'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 1], centers[:, 2], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: est_diameter_min vs est_diameter_max')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 2)\n",
"sns.scatterplot(x=df_cleaned['relative_velocity'], y=df_cleaned['miss_distance'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 3], centers[:, 4], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: relative_velocity vs miss_distance')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 3)\n",
"sns.scatterplot(x=df_cleaned['miss_distance'], y=df_cleaned['absolute_magnitude'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 4], centers[:, 5], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: miss_distance vs absolute_magnitude')\n",
"plt.legend()\n",
"\n",
"plt.subplot(2, 2, 4)\n",
"sns.scatterplot(x=df_cleaned['absolute_magnitude'], y=df_cleaned['relative_velocity'], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.scatter(centers[:, 5], centers[:, 3], s=300, c='red', label='Centroids')\n",
"plt.title('KMeans Clustering: absolute_magnitude vs relative_velocity')\n",
"plt.legend()\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### PCA for reduced-dimensionality visualization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PCA (Principal Component Analysis) is a dimensionality-reduction method that transforms high-dimensional data into a space with fewer dimensions while preserving as much of the information (variance) in the original data as possible.\n",
"\n",
"When plotting clustering results, PCA is used to project the multidimensional data onto a two-dimensional space so that the clusters can be visualized easily."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG data omitted)",
"text/plain": [
"<Figure size 1600x600 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"pca = PCA(n_components=2)\n",
"reduced_data = pca.fit_transform(data_scaled)\n",
"\n",
"plt.figure(figsize=(16, 6))\n",
"plt.subplot(1, 2, 1)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=result, palette='Set1', alpha=0.6)\n",
"plt.title('PCA reduced data: Agglomerative Clustering')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"sns.scatterplot(x=reduced_data[:, 0], y=reduced_data[:, 1], hue=labels, palette='Set1', alpha=0.6)\n",
"plt.title('PCA reduced data: KMeans Clustering')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Inertia analysis for the elbow method (evaluating the sum of squared distances)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inertia analysis for the elbow method is a technique for choosing the optimal number of clusters in a clustering task (for example, for the K-Means algorithm). **The method is based on the sum of squared deviations (the inertia) of the objects from the centers of their clusters**.\n",
"\n",
"Inertia (in the context of clustering) is a metric that measures the \"compactness\" of the clusters, that is, how close the points inside each cluster lie to its centroid.\n",
"Formally, inertia is defined as **the sum of squared distances from every point to its nearest cluster center**.\n",
"\n",
"The elbow method:\n",
"1. Inertia is computed for several values of 𝑘 (the number of clusters).\n",
"2. The inertia values are plotted against 𝑘.\n",
"3. We look for the point after which the decrease in inertia slows down markedly. This point is called the elbow, and the corresponding value of 𝑘 is taken as the optimal number of clusters."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG data omitted)",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Collect the K-Means inertia for k = 1..22.\n",
"inertias = []\n",
"clusters_range = range(1, 23)\n",
"for i in clusters_range:\n",
"    kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
"    kmeans.fit(data_scaled)\n",
"    inertias.append(kmeans.inertia_)\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range, inertias, marker='o')\n",
"plt.title('Elbow method for choosing the optimal k')\n",
"plt.xlabel('Number of clusters')\n",
"plt.ylabel('Inertia')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us compute the silhouette coefficients."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG data omitted)",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"silhouette_scores = []\n",
"# The silhouette coefficient is undefined for a single cluster, so start from k = 2.\n",
"for i in clusters_range[1:]:\n",
"    kmeans = KMeans(n_clusters=i, random_state=random_state)\n",
"    labels = kmeans.fit_predict(data_scaled)\n",
"    score = silhouette_score(data_scaled, labels)\n",
"    silhouette_scores.append(score)\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.plot(clusters_range[1:], silhouette_scores, marker='o')\n",
"plt.title('Silhouette coefficients for different k')\n",
"plt.xlabel('Number of clusters')\n",
"plt.ylabel('Silhouette coefficient')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The average silhouette coefficient (silhouette score) is used to assess the quality of a clustering. Its value lies in the range from -1 to 1. The different values are interpreted as follows:\n",
"\n",
"* Close to 1.0 (0.7–1.0): the clusters are well separated and compact. This is an excellent clustering result.\n",
"* From 0.5 to 0.7: the clusters are clearly distinguishable, but there is some overlap between them. This is a good result.\n",
"* From 0.25 to 0.5: the clusters overlap, which indicates a less clear boundary between the groups. The clustering quality is satisfactory, but the number of clusters may need tuning or the data may need further preparation.\n",
"* Close to 0.0: the clusters overlap heavily, or the distribution of the data does not allow clear groups to be identified. In this case the choice of the number of clusters, the algorithm, or the input data should be reconsidered.\n",
"* Below 0.0: poor clustering. Points are, on average, closer to the members of a neighboring cluster than to those of their own. This signals that the data are poorly structured for the current clustering."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Average silhouette score: 0.302\n"
]
},
{
"data": {
"image/png": "(base64-encoded PNG data omitted)",
"text/plain": [
"<Figure size 1000x700 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"kmeans = KMeans(n_clusters=4, random_state=42)\n",
"df_clusters = kmeans.fit_predict(data_scaled)\n",
"\n",
"silhouette_avg = silhouette_score(data_scaled, df_clusters)\n",
"print(f'Average silhouette score: {silhouette_avg:.3f}')\n",
"\n",
"# Project the standardized features onto two principal components for plotting.\n",
"pca = PCA(n_components=2)\n",
"df_pca = pca.fit_transform(data_scaled)\n",
"\n",
"plt.figure(figsize=(10, 7))\n",
"sns.scatterplot(x=df_pca[:, 0], y=df_pca[:, 1], hue=df_clusters, palette='viridis', alpha=0.7)\n",
"plt.title('Cluster visualization with K-Means')\n",
"plt.xlabel('First PCA component')\n",
"plt.ylabel('Second PCA component')\n",
"plt.legend(title='Cluster', loc='upper right')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In our case the average silhouette score of 0.302 falls in the satisfactory range (0.25 to 0.5). This is acceptable, but it indicates some overlap between the clusters."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
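As a supplementary sketch of the two metrics this notebook relies on, the following pure-Python code computes inertia and the mean silhouette coefficient for a given labeling, and maps a score onto the interpretation ranges listed in the notebook. The helper names (`group_by_label`, `sq_dist`, `inertia`, `mean_silhouette`, `interpret`) and the six toy points are illustrative assumptions, not part of the original notebook, which uses scikit-learn's `KMeans.inertia_` and `silhouette_score` instead. Inertia is computed here against the centroid of each point's assigned cluster, which should agree with the K-Means definition only once the algorithm has converged.

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    """Collect points into a dict {label: [points]} (hypothetical helper)."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def inertia(points, labels):
    """Sum of squared distances from every point to its cluster centroid."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        dims = len(members[0])
        center = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum(sq_dist(p, center) for p in members)
    return total


def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all points."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance of the point to itself is 0, so it adds nothing).
        a = sum(math.sqrt(sq_dist(point, q)) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.sqrt(sq_dist(point, q)) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def interpret(score):
    """Map a score onto the ranges used in the notebook's interpretation list."""
    if score >= 0.7:
        return "excellent"
    if score >= 0.5:
        return "good"
    if score >= 0.25:
        return "satisfactory"
    if score >= 0.0:
        return "weak"
    return "poor"


# Toy data (assumed for illustration): two tight groups far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]   # labels that follow the two groups
mixed = [0, 1, 0, 1, 0, 1]  # labels that cut across the groups

for name, labels in (("good", good), ("mixed", mixed)):
    s = mean_silhouette(points, labels)
    print(name, round(inertia(points, labels), 3), round(s, 3), interpret(s))
```

Under these assumptions, `interpret(0.302)` returns `"satisfactory"`, which matches how the notebook reads its own score. The labeling that follows the two groups should yield both a lower inertia and a higher silhouette than the mixed one.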