AIM-PIbd-31-Kouvshinoff-T-A/lab_2/laba2.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1 Датасет: NASA - Nearest Earth Objects\n",
"#### https://www.kaggle.com/datasets/sameepvani/nasa-nearest-earth-objects \n",
"\n",
"There is an infinite number of objects in the outer space. Some of them are closer than we think. Even though we might think that a distance of 70,000 Km can not potentially harm us, but at an astronomical scale, this is a very small distance and can disrupt many natural phenomena. These objects/asteroids can thus prove to be harmful. Hence, it is wise to know what is surrounding us and what can harm us amongst those. Thus, this dataset compiles the list of NASA certified asteroids that are classified as the nearest earth object.\n",
"\n",
"В космосе находится бесконечное количество объектов. Некоторые из них находятся ближе, чем мы думаем. Хотя мы можем думать, что расстояние в 70 000 км не может потенциально навредить нам, но в астрономических масштабах это очень малое расстояние и может нарушить многие природные явления. Таким образом, эти объекты/астероиды могут оказаться вредными. Следовательно, разумно знать, что нас окружает и что из этого может навредить нам. Таким образом, этот набор данных составляет список сертифицированных NASA астероидов, которые классифицируются как ближайшие к Земле объекты.\n",
"\n",
"- Из этого описания очевидно что объектами иследования являются околоземные объекты.\n",
"- Атрибуты объектов: id, name, est_diameter_min, est_diameter_max, relative_velocity, miss_distance, orbiting_body, sentry_object, absolute_magnitude, hazardous\n",
"- Очевидная цель этого датасета - это научиться определять опасность объекта автоматически.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"количество колонок: 10\n",
"колонки: id, name, est_diameter_min, est_diameter_max, relative_velocity, miss_distance, orbiting_body, sentry_object, absolute_magnitude, hazardous\n"
]
}
],
"source": [
"import pandas as pd\n",
"df = pd.read_csv(\"..//static//csv//neo_v2.csv\", sep=\",\")\n",
"print('количество колонок: ' + str(df.columns.size)) \n",
"print('колонки: ' + ', '.join(df.columns))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Получение сведений о пропущенных данных\n",
"\n",
"Типы пропущенных данных:\n",
"- None - представление пустых данных в Python\n",
"- NaN - представление пустых данных в Pandas\n",
"- '' - пустая строка"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"id 0\n",
"name 0\n",
"est_diameter_min 0\n",
"est_diameter_max 0\n",
"relative_velocity 0\n",
"miss_distance 0\n",
"orbiting_body 0\n",
"sentry_object 0\n",
"absolute_magnitude 0\n",
"hazardous 0\n",
"dtype: int64\n",
"\n",
"id False\n",
"name False\n",
"est_diameter_min False\n",
"est_diameter_max False\n",
"relative_velocity False\n",
"miss_distance False\n",
"orbiting_body False\n",
"sentry_object False\n",
"absolute_magnitude False\n",
"hazardous False\n",
"dtype: bool\n",
"\n"
]
}
],
"source": [
"# Количество пустых значений признаков\n",
"print(df.isnull().sum())\n",
"\n",
"print()\n",
"\n",
"# Есть ли пустые значения признаков\n",
"print(df.isnull().any())\n",
"\n",
"print()\n",
"\n",
"# Процент пустых значений признаков\n",
"for i in df.columns:\n",
" null_rate = df[i].isnull().sum() / len(df) * 100\n",
" if null_rate > 0:\n",
" print(f\"{i} процент пустых значений: %{null_rate:.2f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Итог: пропущеных значений нет"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"инфографика на сайте и в datawrangelere показывает, что в столбцах orbiting_body и sentry_object у всех записей одно и тоже значение. Значит эти столбцы можно выкинуть из набора данных."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"количество колонок: 8\n",
"колонки: id, name, est_diameter_min, est_diameter_max, relative_velocity, miss_distance, absolute_magnitude, hazardous\n"
]
}
],
"source": [
"df = df.drop(columns=['orbiting_body'])\n",
"df = df.drop(columns=['sentry_object'])\n",
"print('количество колонок: ' + str(df.columns.size)) \n",
"print('колонки: ' + ', '.join(df.columns))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"я быстро посмотрев данные зашумленности не выявил"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"поля id и name в предсказании не помогут, но я их пока выкидывать не буду"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"посмотрим выбросы:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Колонка est_diameter_min:\n",
" Есть выбросы: Да\n",
" Количество выбросов: 8306\n",
" Минимальное значение: 0.0006089126\n",
" Максимальное значение: 37.8926498379\n",
" 1-й квартиль (Q1): 0.0192555078\n",
" 3-й квартиль (Q3): 0.1434019235\n",
"\n",
"Колонка est_diameter_max:\n",
" Есть выбросы: Да\n",
" Количество выбросов: 8306\n",
" Минимальное значение: 0.00136157\n",
" Максимальное значение: 84.7305408852\n",
" 1-й квартиль (Q1): 0.0430566244\n",
" 3-й квартиль (Q3): 0.320656449\n",
"\n",
"Колонка relative_velocity:\n",
" Есть выбросы: Да\n",
" Количество выбросов: 1574\n",
" Минимальное значение: 203.34643253\n",
" Максимальное значение: 236990.1280878666\n",
" 1-й квартиль (Q1): 28619.02064490995\n",
" 3-й квартиль (Q3): 62923.60463276395\n",
"\n",
"Колонка miss_distance:\n",
" Есть выбросы: Нет\n",
" Количество выбросов: 0\n",
" Минимальное значение: 6745.532515957\n",
" Максимальное значение: 74798651.4521972\n",
" 1-й квартиль (Q1): 17210820.23576468\n",
" 3-й квартиль (Q3): 56548996.45139917\n",
"\n",
"Колонка absolute_magnitude:\n",
" Есть выбросы: Да\n",
" Количество выбросов: 101\n",
" Минимальное значение: 9.23\n",
" Максимальное значение: 33.2\n",
" 1-й квартиль (Q1): 21.34\n",
" 3-й квартиль (Q3): 25.7\n",
"\n"
]
}
],
"source": [
"numeric_columns = ['est_diameter_min', 'est_diameter_max', 'relative_velocity', 'miss_distance', 'absolute_magnitude']\n",
"for column in numeric_columns:\n",
" if pd.api.types.is_numeric_dtype(df[column]): # Проверяем, является ли колонка числовой\n",
" q1 = df[column].quantile(0.25) # Находим 1-й квартиль (Q1)\n",
" q3 = df[column].quantile(0.75) # Находим 3-й квартиль (Q3)\n",
" iqr = q3 - q1 # Вычисляем межквартильный размах (IQR)\n",
"\n",
" # Определяем границы для выбросов\n",
" lower_bound = q1 - 1.5 * iqr # Нижняя граница\n",
" upper_bound = q3 + 1.5 * iqr # Верхняя граница\n",
"\n",
" # Подсчитываем количество выбросов\n",
" outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]\n",
" outlier_count = outliers.shape[0]\n",
"\n",
" print(f\"Колонка {column}:\")\n",
" print(f\" Есть выбросы: {'Да' if outlier_count > 0 else 'Нет'}\")\n",
" print(f\" Количество выбросов: {outlier_count}\")\n",
" print(f\" Минимальное значение: {df[column].min()}\")\n",
" print(f\" Максимальное значение: {df[column].max()}\")\n",
" print(f\" 1-й квартиль (Q1): {q1}\")\n",
" print(f\" 3-й квартиль (Q3): {q3}\\n\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"построим графики в надежде найти какие то зависимости опасности от других колонок"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZ0AAAK9CAYAAAD/m7EJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABXv0lEQVR4nO3dd3xT9f4/8FfSJulMS+mGDpZgmdciWFYr0FaWIHC5gkpBxMEQ6UWEL7JcVVBEEAGvl6HAVQFxsmpZggW0gFCWFFlCBwW62zRNPr8/aPMjbQppmp604fV8PPqAfM7nnLyTpp9XzjmfnMiEEAJEREQSkNu6ACIiun8wdIiISDIMHSIikgxDh4iIJMPQISIiyTB0iIhIMgwdIiKSDEOHiIgkw9AhIiLJMHTIboSGhmLMmDGG23v27IFMJsOePXtsVtP9ICoqClFRUbYuo0bWrFkDmUyG33//3dalWEVDeq3XSeisWLECsbGx8PPzg0KhgL+/PyIjI/H5559Dr9fXxV1SA7NhwwYsXrzY1mXYXFFREebNm9cgBgsia3Csi42uXbsWAQEBmD17NtRqNXJycnDw4EGMGTMG27Ztw//+97+6uFtqQDZs2IDU1FS88sordXYfvXr1QnFxMZRKZZ3dR20VFRVh/vz5ANDg9hYq7Ny509YlUANSJ6Gzb98+KBQKo7aXX34ZjRs3xscff4yEhASEhobWxV0TGcjlcjg5Odm6DJsoLCyEq6urJPdVn0PdXgghUFJSAmdnZ1uXUmt1cnitcuBUqAgaufz/3+13332HAQMGIDAwECqVCi1atMCbb74JnU5ntG5UVBRkMpnhx9vbGwMGDEBqaqpRP5lMhnnz5hm1LVy4EDKZrMo7yZKSEsybNw8PPPAAnJycEBAQgKFDh+L8+fMAgIsXL0Imk2HNmjVG602cOBEymczo/EHFMWKlUonr168b9U9OTjbUXfkY8saNGxEeHg5nZ2d4e3vj6aefxtWrV6s8d2fOnMGIESPg4+MDZ2dntG7dGrNmzQIAzJs3z+i5MfVTcfgmKioK7dq1q7J9cx06dAiPPfYYPDw84OLigsjISBw4cMCoT35+Pl555RWEhoZCpVLB19cX0dHROHLkiKGGn376CZcuXTLUV5M3IUIIvPXWW2jatClcXFzw6KOP4uTJk1X6mTrO/csvv+Cf//wngoODoVKpEBQUhKlTp6K4uNho3TFjxsDNzQ2XL1/GwIED4ebmhiZNmmDZsmUAgBMnTqB3795wdXVFSEgINmzYUOX+c3Jy8MorryAoKAgqlQotW7bEe++9ZzjEfPHiRfj4+AAA5s+fb3gu7nz9njlzBsOHD4eXlxecnJzQuXNnfP/990b3U/Ha27t3LyZMmABfX180bdrUrOey4jX+/vvvY9myZWjevDlcXFwQExODK1euQAiBN998E02bNoWzszMGDx6MmzdvGm2j8jmdiuf966+/xttvv42mTZvCyckJffr0QVpamll1HT16FP369YNarYabmxv69OmDgwcPVnnMd/up/HdrikajQXx8PHx8fODq6oonnniiyt+vOWPU3eq587lZvXo1evfuDV9fX6hUKoSFhWH58uVV6goNDcXAgQOxY8cOdO7cGc7Ozli5ciUA4O+//8aQIUPg6uoKX19fTJ06FRqNxuTjM2d8qe6c3JgxY6r8XX755ZcIDw+Hu7s71Go12rdvj48++uhuT3EVdbKnUyEnJwdlZWXIz89HSkoK3n//fTz55JMIDg429FmzZg3c3NwQHx8PNzc37Nq1C3PmzEFeXh4WLlxotL02bdpg1qxZEELg/PnzWLRoEfr374/Lly/ftYaEhIQq7TqdDgMHDkRSUhKefPJJTJkyBfn5+UhMTERqaipatGhhcntpaWn4z3/+U+39OTg4YN26dZg6daqhbfXq1XByckJJSYlR3zVr1mDs2LF4+OGHkZCQgMzMTHz00Uc4cOAAjh49Ck9PTw
DA8ePH0bNnTygUCjz//PMIDQ3F+fPn8cMPP+Dtt9/G0KFD0bJlS8N2p06digcffBDPP/+8oe3BBx+stmZz7dq1C/369UN4eDjmzp0LuVxu+CP65Zdf0KVLFwDAiy++iE2bNmHSpEkICwvDjRs3sH//fpw+fRoPPfQQZs2ahdzcXPz999/48MMPAQBubm5m1zFnzhy89dZb6N+/P/r3748jR44gJiYGpaWl91x348aNKCoqwksvvYTGjRvj8OHDWLp0Kf7++29s3LjRqK9Op0O/fv3Qq1cvLFiwAOvXr8ekSZPg6uqKWbNm4amnnsLQoUOxYsUKjB49GhEREWjWrBmA24fNIiMjcfXqVbzwwgsIDg7Gr7/+ipkzZyI9PR2LFy+Gj48Pli9fjpdeeglPPPEEhg4dCgDo0KEDAODkyZPo3r07mjRpghkzZsDV1RVff/01hgwZgs2bN+OJJ54wqnfChAnw8fHBnDlzUFhYaPbzCQDr169HaWkpJk+ejJs3b2LBggUYMWIEevfujT179uC1115DWloali5dimnTpmHVqlX33Oa7774LuVyOadOmITc3FwsWLMBTTz2FQ4cO3XW9kydPomfPnlCr1Zg+fToUCgVWrlyJqKgo7N27F127dkWvXr3wxRdfGNZ5++23AcDwRgwAunXrds8aJ0+ejEaNGmHu3Lm4ePEiFi9ejEmTJuGrr74y9DFnjKpcDwBcunQJr7/+Onx9fQ1ty5cvR9u2bfH444/D0dERP/zwAyZMmAC9Xo+JEycarX/27FmMHDkSL7zwAsaPH4/WrVujuLgYffr0weXLl/Hyyy8jMDAQX3zxBXbt2lXlsZk7vpgrMTERI0eORJ8+ffDee+8BAE6fPo0DBw5gypQp5m9I1KHWrVsLAIaf0aNHC61Wa9SnqKioynovvPCCcHFxESUlJYa2yMhIERkZadTv//7v/wQAkZWVZWgDIObOnWu4PX36dOHr6yvCw8ON1l+1apUAIBYtWlTl/vV6vRBCiAsXLggAYvXq1YZlI0aMEO3atRNBQUEiLi7O0L569WoBQIwcOVK0b9/e0F5YWCjUarUYNWqUACB+++03IYQQpaWlwtfXV7Rr104UFxcb+v/4448CgJgzZ46hrVevXsLd3V1cunTJZJ2VhYSEGNV2p8jISNG2bVuTy+5Gr9eLVq1aidjYWKP7LSoqEs2aNRPR0dGGNg8PDzFx4sS7bm/AgAEiJCSkxnVkZWUJpVIpBgwYYFRHxWvhzse9e/duAUDs3r3bqN7KEhIShEwmM3p+4+LiBADxzjvvGNpu3bolnJ2dhUwmE19++aWh/cyZM1Ved2+++aZwdXUVf/75p9F9zZgxQzg4OIjLly8LIYS4fv16lXUr9OnTR7Rv397o70Cv14tu3bqJVq1aGdoqXns9evQQZWVlJp616lW8xn18fEROTo6hfebMmQKA6Nixo9Hf7MiRI4VSqbzr32bF8/7ggw8KjUZjaP/oo48EAHHixIm71jRkyBChVCrF+fPnDW3Xrl0T7u7uolevXibXMTU+3E3Fc9a3b1+j19HUqVOFg4OD0XNh7hh1p+LiYhEeHi4CAwNFenr6XbcVGxsrmjdvbtQWEhIiAIjt27cbtS9evFgAEF9//bWhrbCwULRs2dLotV6T8aW65y4uLs7ob3TKlClCrVbX+DVWWZ1OmV69ejUSExOxfv16jBs3DuvXrzd69w3A6Bhlfn4+srOz0bNnTxQVFeHMmTNGfbVaLbKzs3H9+nUkJydjy5Yt6NChA7y9vU3e/9WrV7F06VLMnj27yjvpzZs3w9vbG5MnT66ynkwmM7m9lJQUbNy4EQkJCUaHCO/0zDPP4MyZM4bDaJs3b4aHhwf69Olj1O/3339HVlYWJkyYYHTeYcCAAWjTpg1++uknAMD169exb98+PPvss0Z7iHer8150Oh2ys7ORnZ1t1t4BABw7dgznzp3DqFGjcOPGDcP6hYWF6NOnD/bt22c4bO
Tp6YlDhw7h2rVrFtV3Nz///LPhHfmdj9/cCQl3vt4KCwuRnZ2Nbt26QQiBo0ePVun/3HPPGf7v6emJ1q1bw9XVFSN
"text/plain": [
"<Figure size 400x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAaAAAAK9CAYAAABilriBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABV9klEQVR4nO3deZyN9f//8eeZfTMzllnQmLFlp0+kECOGEUqptGfpW31CWT4SLZZUkpLI0mr5pE+b9hChlIRIJUvIFmbGOvt+rt8fZs7PmYWZ4/BmPO6329zMeV/v6zqvc5lzPc91Xe9zXTbLsiwBAHCeeZguAABwaSKAAABGEEAAACMIIACAEQQQAMAIAggAYAQBBAAwggACABhBAAEAjCCAcNGLiYlRv379HI+/++472Ww2fffdd8ZqwsVj3LhxstlsOnLkiOlS3GLu3Lmy2Wzas2eP6VLOyK0BNHv2bMXHxysiIkLe3t6KjIxUbGys5s+fL7vd7s6nwkXqvffe09SpU02XYVxGRobGjRtHSOKS5uXOhc2bN0/Vq1fX008/reDgYJ04cUI///yz+vXrp8WLF+t///ufO58OF6H33ntPmzdv1tChQ8/Zc3To0EGZmZny8fE5Z89xtjIyMjR+/HhJUseOHc0WAxji1gBatWqVvL29ndoeffRRVa1aVa+99pomTpyomJgYdz4lUIyHh4f8/PxMl2FEenq6AgMDTZeBc8hutysnJ6dC/I279RBc0fApVBg6Hh7//+k+//xz9ejRQzVq1JCvr6/q1q2rCRMmKD8/32nejh07ymazOX6qVaumHj16aPPmzU79bDabxo0b59Q2efJk2Wy2Yp8ws7KyNG7cOF1++eXy8/NT9erV1bt3b+3atUuStGfPHtlsNs2dO9dpvkGDBslmszmdbyg83urj46PDhw879V+zZo2j7l9++cVp2kcffaSWLVvK399f1apV0z333KMDBw4UW3fbtm1Tnz59FBYWJn9/fzVo0EBPPvmkpP9/7Pp0P4WHeDp27KimTZsWW35ZrV27Vt26dVNISIgCAgIUGxur1atXO/VJTU3V0KFDFRMTI19fX4WHh6tLly7auHGjo4avv/5ae/fuddRXng8klmXp2Wef1WWXXaaAgABdd911+vPPP4v1K+kc0A8//KDbbrtNtWrVkq+vr6KiojRs2DBlZmY6zduvXz8FBQVp37596tmzp4KCglSzZk3NmDFDkvTHH3+oU6dOCgwMVHR0tN57771iz3/ixAkNHTpUUVFR8vX1Vb169TRp0iTHYeg9e/YoLCxMkjR+/HjHujj173fbtm269dZbVaVKFfn5+alVq1b64osvnJ6n8G/v+++/18CBAxUeHq7LLrusTOuy8G/8pZde0owZM1SnTh0FBASoa9eu2r9/vyzL0oQJE3TZZZfJ399fvXr10rFjx5yWUZb38NatW+Xv76/77rvPad4ff/xRnp6eevzxx89Y64oVK9S+fXsFBgYqNDRUvXr10tatWx3Ty/M+OJ0TJ06oX79+Cg0NVUhIiPr376+MjAynPnPmzFGnTp0UHh4uX19fNW7cWLNmzXLqc7p6Tt12vPTSS2rbtq2qVq0qf39/tWzZUh9//HGxumw2mwYPHqwFCxaoSZMm8vX11ZIlSyRJf/75pzp16iR/f39ddtllevbZZ0s93TFz5kzH/DVq1NCgQYN04sQJpz5Fz6cW6tixY7Ht6PTp09WkSRMFBASocuXKatWqVYnvh9Nx6x5QoRMnTigvL0+pqanasGGDXnrpJd1xxx2qVauWo8/cuXMVFBSk4cOHKygoSCtWrNCYMWOUkpKiyZMnOy2vYcOGevLJJ2VZlnbt2qUpU6aoe/fu2rdv32lrmDhxYrH2/Px89ezZU8uXL9cdd9yhIUOGKDU1VcuWLdPmzZtVt27dEpe3c+dOvfnmm6U+n6enp959910NGzbM0TZnzhz5+fkpKyvLqe/cuXPVv39/XX
XVVZo4caISExP16quvavXq1fr1118VGhoqSfr999/Vvn17eXt768EHH1RMTIx27dqlL7/8Us8995x69+6tevXqOZY7bNgwNWrUSA8++KCjrVGjRqXWXFYrVqzQ9ddfr5YtW2rs2LHy8PBwvBF/+OEHtW7dWpL073//Wx9//LEGDx6sxo0b6+jRo/rxxx+1detWXXnllXryySeVnJysf/75R6+88ookKSgoqMx1jBkzRs8++6y6d++u7t27a+PGjeratatycnLOOO9HH32kjIwMPfzww6patarWrVun6dOn659//tFHH33k1Dc/P1/XX3+9OnTooBdffFELFizQ4MGDFRgYqCeffFJ33323evfurdmzZ+u+++5TmzZtVLt2bUknD63FxsbqwIEDeuihh1SrVi399NNPGj16tA4dOqSpU6cqLCxMs2bN0sMPP6ybb75ZvXv3liQ1b95c0smNSrt27VSzZk2NGjVKgYGB+vDDD3XTTTdp4cKFuvnmm53qHThwoMLCwjRmzBilp6eXeX1K0oIFC5STk6NHHnlEx44d04svvqg+ffqoU6dO+u677/T4449r586dmj59ukaMGKF33nnHMW9Z3sONGjXShAkT9Nhjj+nWW2/VjTfeqPT0dPXr108NGzbUM888c9r6vv32W11//fWqU6eOxo0bp8zMTE2fPl3t2rXTxo0bFRMT47b3QZ8+fVS7dm1NnDhRGzdu1FtvvaXw8HBNmjTJ0WfWrFlq0qSJbrzxRnl5eenLL7/UwIEDZbfbNWjQIEkqVo8kbdiwQVOnTlV4eLij7dVXX9WNN96ou+++Wzk5OXr//fd122236auvvlKPHj2c5l+xYoU+/PBDDR48WNWqVVNMTIwSEhJ03XXXKS8vz/F38sYbb8jf37/Yaxs3bpzGjx+vuLg4Pfzww9q+fbtmzZql9evXa/Xq1aXuPJTmzTff1KOPPqpbb71VQ4YMUVZWln7//XetXbtWd911V9kXZJ0DDRo0sCQ5fu677z4rNzfXqU9GRkax+R566CErICDAysrKcrTFxsZasbGxTv2eeOIJS5KVlJTkaJNkjR071vF45MiRVnh4uNWyZUun+d955x1LkjVlypRiz2+32y3Lsqzdu3dbkqw5c+Y4pvXp08dq2rSpFRUVZfXt29fRPmfOHEuSdeedd1rNmjVztKenp1vBwcHWXXfdZUmy1q9fb1mWZeXk5Fjh4eFW06ZNrczMTEf/r776ypJkjRkzxtHWoUMHq1KlStbevXtLrLOo6Ohop9pOFRsbazVp0qTEaadjt9ut+vXrW/Hx8U7Pm5GRYdWuXdvq0qWLoy0kJMQaNGjQaZfXo0cPKzo6utx1JCUlWT4+PlaPHj2c6ij8Wzj1da9cudKSZK1cudKp3qImTpxo2Ww2p/Xbt29fS5L1/PPPO9qOHz9u+fv7WzabzXr//fcd7du2bSv2dzdhwgQrMDDQ+uuvv5yea9SoUZanp6e1b98+y7Is6/Dhw8XmLdS5c2erWbNmTu8Du91utW3b1qpfv76jrfBv79prr7Xy8vJKWGulK/wbDwsLs06cOOFoHz16tCXJatGihdN79s4777R8fHycairrezg/P9+69tprrYiICOvIkSPWoEGDLC8vL8d74nSuuOIKKzw83Dp69Kij7bfffrM8PDys++67r8R5Tvc+KMnYsWMtSdaAAQOc2m+++WaratWqTm0lveb4+HirTp06pS7/8OHDVq1ataxmzZpZaWlppS4rJyfHatq0qdWpUyendkmWh4eH9eeffzq1Dx061JJkrV271tGWlJRkhYSEWJKs3bt3O9p8fHysrl27Wvn5+Y6+r732miXJeueddxxtpa27otvhXr16ubQ9KeqcDMOeM2eOli1bpgULFuj+++/XggULnD6NSHJK6dTUVB05ckTt27dXRkaGtm3b5tQ3NzdXR44c0eHDh7VmzRp9+umnat68uapVq1bi8x84cEDTp0/X008/XewT9s
KFC1WtWjU98sgjxeaz2WwlLm/Dhg366KOPNHHiRKfDiKe69957tW3bNsehtoULFyokJESdO3d26vfLL78oKSlJAwc
"text/plain": [
"<Figure size 400x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAboAAAK9CAYAAABSGqmgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABsU0lEQVR4nO3dd3gU5doG8Hu2b3pCOgQSivSiQZAmCIFQREBUwAJEj3wqHIWoHLBQVY6oKCCClaLoASx4VAQigihGUHpoEgiCgSSEkJ5sfb8/QvawKbAJG3Yy3L/r4tKdeXb2yWYz9847TRJCCBARESmUytMNEBER1SUGHRERKRqDjoiIFI1BR0REisagIyIiRWPQERGRojHoiIhI0Rh0RESkaAw6IiJSNAYdURWio6Mxfvx4ty5z/PjxiI6OdusyPW3FihWQJAmnTp2qk+XPmjULkiTVybKvt23btkGSJHz++eeebsUtTp06BUmSsGLFCk+3clU3dNAtW7YM8fHxCAsLg1arRXh4OHr37o1Vq1bBbrd7uj2qh86ePYtZs2Zh3759nm5FsV555RWsX7/e021QPXJDB93KlSvh7e2NF198ER9++CGee+45NGzYEOPHj8cDDzzg6faoHjp79ixmz55dZdC9//77OHbs2PVvqh574YUXUFJS4jSNQUc1pfF0A560fft2aLVap2lPPvkkGjRogLfffhvz5s1T3FDTjaq4uBheXl4e7aHiZ42uTqPRQKO5oVdTHiWHvxt3uKG36Kpb8ZSHm0r1v7fn66+/xpAhQxAZGQm9Xo9mzZph7ty5sNlsTs/t06cPJEly/AsODsaQIUOQkpLiVCdJEmbNmuU07bXXXoMkSejTp4/T9NLSUsyaNQs33XQTDAYDIiIicPfdd+PEiRMAqh8rnzhxIiRJctrXVL5PRafT4fz58071ycnJjr7/+OMPp3nr1q1DbGwsjEYjgoOD8eCDDyI9Pb3Se3f06FHcd999CAkJgdFoRMuWLfH8888D+N/+liv927Ztm+N9bNeuXaXlu6L8ubt378btt98OLy8vPPfccwAAk8mEmTNnonnz5tDr9YiKisLUqVNhMpmuuMycnBw888wzaN++PXx8fODn54dBgwZh//79jppt27bh1ltvBQAkJCQ4fqby38vl++gsFguCgoKQkJBQ6bXy8/NhMBjwzDPPOKbVtu/Lvf7665AkCX/99VeledOnT4dOp8PFixcd03bu3ImBAwfC398fXl5e6N27N3bs2OHSa73zzjto27Yt9Ho9IiMjMXHiROTm5laq27lzJwYPHozAwEB4e3ujQ4cOWLhwoWN+xX10kiShqKgIK1eudLy/48ePx9atWyFJEr766qtKr/Hpp59CkiQkJydfseeTJ0/i3nvvRVBQELy8vHDbbbfhu+++c8wv38d2pX8V/6arYrfb8fLLL6NRo0YwGAzo168fUlNTnWp+/vln3HvvvWjcuLHj9z1lyhSnrdsr9XP5F/SarLuq+7vJzc3F+PHj4e/vj4CAAIwbN67K3ycA/Pjjj+jVqxe8vb0REBCAYcOG4ciRI0411e2vrmqfbFJSEnr27ImAgAD4+PigZcuWjr5cxa9KKPslWq1WFBQUYPfu3Xj99dcxevRoNG7c2FGzYsUK+Pj4IDExET4+Pvjxxx8xY8YM5Ofn47XXXnNaXqtWrfD8889DCIETJ05gwYIFGDx4ME6fPn3FHubNm1dpus1mw5133oktW7Zg9OjReOqpp1BQUICkpCSkpKSgWbNmVS4vNTUV77//frWvp1ar8cknn2DKlCmOacuXL4fBYEBpaalT7YoVK5CQkIBbb70V8+bNQ2ZmJhYuXIgdO3Zg7969CAgIAAAcOHAAvXr1glarxYQJExAdHY0TJ07gm2++wcsvv4y7774bzZs3dyx3ypQpaN26NSZMmOCY1rp162p7rokLFy5g0KBBGD16NB588EGEhYXBbrfjrrvuwi+//I
IJEyagdevWOHjwIN588038+eefVxwOO3nyJNavX497770XMTExyMzMxLvvvovevXvj8OHDiIyMROvWrTFnzhzMmDEDEyZMQK9evQAA3bt3r7Q8rVaLESNG4Msvv8S7774LnU7nmLd+/XqYTCaMHj0aAK6p78vdd999mDp1KtauXYtnn33Wad7atWsxYMAABAYGAihbWQ0aNAixsbGYOXMmVCoVli9fjr59++Lnn39Gly5dqn2dWbNmYfbs2YiLi8Pjjz+OY8eOYenSpfj999+xY8cOxxfMpKQk3HnnnYiIiMBTTz2F8PBwHDlyBN9++y2eeuqpKpf98ccf4x//+Ae6dOni+Nw0a9YMt912G6KiorB69WqMGDHC6TmrV69Gs2bN0K1bt2p7zszMRPfu3VFcXOwY1Vm5ciXuuusufP755xgxYgRat26Njz/+2PGc9957D0eOHMGbb77pmNahQ4dqX6Pcv//9b6hUKjzzzDPIy8vD/Pnz8cADD2Dnzp2OmnXr1qG4uBiPP/44GjRogF27dmHx4sX4+++/sW7dOgCo1A9Qth5JTExEaGioY1pN1l1V/d0IITBs2DD88ssveOyxx9C6dWt89dVXGDduXKWf7YcffsCgQYPQtGlTzJo1CyUlJVi8eDF69OiBPXv21HiE7NChQ7jzzjvRoUMHzJkzB3q9HqmpqS5/4XIQJFq2bCkAOP6NHTtWWCwWp5ri4uJKz/u///s/4eXlJUpLSx3TevfuLXr37u1U99xzzwkAIisryzENgJg5c6bj8dSpU0VoaKiIjY11ev5HH30kAIgFCxZUen273S6EECItLU0AEMuXL3fMu++++0S7du1EVFSUGDdunGP68uXLBQAxZswY0b59e8f0oqIi4efnJ+6//34BQPz+++9CCCHMZrMIDQ0V7dq1EyUlJY76b7/9VgAQM2bMcEy7/fbbha+vr/jrr7+q7LOiJk2aOPV2ud69e4u2bdtWOe9qevfuLQCIZcuWOU3/+OOPhUqlEj///LPT9GXLlgkAYseOHdX2VlpaKmw2m9Pz0tLShF6vF3PmzHFM+/333yv9LsqNGzdONGnSxPF406ZNAoD45ptvnOoGDx4smjZtWqu+r6Zbt24iNjbWadquXbsEALFq1SohRNnvq0WLFiI+Pt7pd1dcXCxiYmJE//79HdPKP09paWlCCCGysrKETqcTAwYMcHq/3n77bQFAfPTRR0IIIaxWq4iJiRFNmjQRFy9edOrn8tecOXOmqLia8vb2rvJzM336dKHX60Vubq5jWlZWltBoNE5/a1WZPHmyAOD0HhcUFIiYmBgRHR1d6XcvROXf59Vs3bpVABCtW7cWJpPJMX3hwoUCgDh48KBjWlXrm3nz5glJkir9fZWz2+3izjvvFD4+PuLQoUNXXFZ1666q/m7Wr18vAIj58+c7plmtVtGrV69Kn/VOnTqJ0NBQceHCBce0/fv3C5VKJcaOHeuYVt17V/H3/eabbwoA4vz581X+zK66oYcuyy1fvhxJSUlYvXo1HnnkEaxevdppKwMAjEaj4/8LCgqQnZ2NXr16obi4GEePHnWqtVgsyM7Oxvnz55GcnIyvvvoKHTp0QHBwcJWvn56ejsWLF+PFF1+Ej4+P07wvvvgCwcHB+Oc//1npedUddr17926sW7cO8+bNcxp+vdxDDz2Eo0ePOoYov/jiC/j7+6Nfv35OdX/88QeysrLwxBNPwGAwOKYPGTIErVq1cgztnD9/Htu3b8fDDz/stCV8pT6vxmazITs7G9nZ2TCbzTV6rl6vrzQsuG7dOrRu3RqtWrVyLDc7Oxt9+/YFAGzduvWKyyt/L202Gy5cuOAYRtmzZ08Nf7Iyffv2RXBwMNasWeOYdvHiRSQlJWHUqFFu6buiUaNGYffu3Y5hbwBYs2YN9Ho9hg0bBgDYt28fjh8/jvvvvx8XLlxwvF5RURH69euH7du3V3tU8g8//ACz2YzJkyc7ffYeffRR+Pn5OT
4ve/fuRVpaGiZPnuwYEShX28/L2LFjYTKZnA7fX7NmDaxWKx588MErPnfDhg3o0qULevbs6Zjm4+ODCRMm4NSpUzh
"text/plain": [
"<Figure size 400x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAALMCAYAAADkXsVPAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABIGElEQVR4nO3deXxMZ///8feEZJLIhiQIEdvPVlRLKUG0VGqraqslWkH39i6Vuwu3W5toS5dvVS2lrbtoVbWo9m5VbaW62JfaKY2laguVIERkzu8PJ3Mbk0SMkYO8no9HHpzrXOfMZ04m13vOMnNshmEYAgAUez5WFwAAuDoQCAAASQQCAMBEIAAAJBEIAAATgQAAkEQgAABMBAIAQBKBAAAwEQi4akyaNEk2m027du2yuhQnm82m5ORk5/TVWOP1aPHixbLZbJoxY4bVpXjFrl27ZLPZNGnSJKtLKZAlgTB+/HjFx8erXLly8vX1Vfny5RUXF6ePP/5YDofDipKAKyYzM1PJyclavHix1aUABbIkECZPnqxSpUppyJAh+s9//qN//etfqlixonr37q2ePXtaURKuAg899JBOnTqlmJgYq0vJlyc1ZmZmKiUlhUDAVa+kFQ+6ZMkS+fr6urT169dPZcuW1ZgxYzR8+HBVqVLFitJgoRIlSqhEiRJWl1Gga6FGWCMzM1OBgYFWl3FZLNlDuDAMcuWGgI/P/8r6+uuv1bFjR0VFRclut6t69ep65ZVXlJOT47Js69atZbPZnD/h4eHq2LGjNm7c6NLvwmPCkvTWW2/JZrOpdevWLu2nT59WcnKyatasKX9/f1WoUEH33HOPdu7cKSn/44JPP/20bDabevfu7WzLPfbs5+enw4cPu/RfunSps+5Vq1a5zJs+fboaNWqkgIAAhYeH68EHH9S+ffvctt3WrVt1//33KyIiQgEBAapVq5YGDx4sSUpOTnbZNnn95L57bd26terVq+e2/sLIXXb9+vWKi4tTYGCgatSo4TwO/OOPP6pp06bO+hYsWOCyfF7H51etWqX4+HiFh4crICBAVatWVd++fV2WmzZtmho1aqTg4GCFhISofv36evfddy+p9qysLA0YMEAREREKDg7WXXfdpT///NOt36XWuGvXLkVEREiSUlJSnNs79zW4fv169e7dW9WqVZO/v7/Kly+vvn376siRIy6Pm/s73LFjh3r37q2wsDCFhoaqT58+yszMdKtzypQpatKkiQIDA1W6dGm1atVK8+bNc+kzZ84ctWzZUqVKlVJwcLA6duyoTZs2FWp7/fHHH+rWrZvKlCmjwMBA3XrrrZo9e7Zzfu45gIJ+Lvw7zIvD4dBrr72mSpUqyd/fX23atNGOHTtc+vz000/q1q2bKleuLLvdrujoaA0YMECnTp0qVD3nv/m8lPGmXr16Wr16tVq1aqXAwED961//kiQdO3ZMvXv3VmhoqMLCwpSYmKhjx47l+fx++OEH5+8gLCxMXbp00ZYtW1z69O7dO883yLmvifPNnz9fLVq0UFhYmIKCglSrVi1nXYVhyR5CrmPHjuns2bM6fvy4Vq9erf/7v/9T9+7dVblyZWefSZMmKSgoSElJSQoKCtIPP/ygl156SRkZGXrrrbdc1le7dm0NHjxYhmFo586dGjFihDp06KA9e/YUWMPw4cPd2nNyctSpUyctXLhQ3bt3V//+/XX8+HHNnz9fGzduVPXq1fNc344dO/Thhx/m+3glSpTQlClTNGDAAGfbxIkT5e/vr9OnT7v0nTRpkvr06aNbbrlFw4cP18GDB/Xuu+/ql19+0dq1axUWFibp3KDSsmVL+fr66rHHHlOVKlW0c+dOffPNN3rttdd0zz33qEaNGs71DhgwQHXq1NFjjz3mbKtTp06+NV+Kv//+W506dVL37t3VrVs3jRs3Tt27d9enn36qZ599Vk888YQSEhL01ltv6b777tPevXsVHByc57oOHTqkdu3aKS
IiQgMHDlRYWJh27dqlL7/80tln/vz56tGjh9q0aaM33nhDkrRlyxb98ssv6t+/f6HrfuSRRzRlyhQlJCSoefPm+uGHH9SxY8eLLnexGiMiIjRu3Dg9+eST6tq1q+655x5JUoMGDZz1//HHH+rTp4/Kly+vTZs26YMPPtCmTZu0bNkytz/4+++/X1WrVtXw4cO1Zs0aTZgwQZGRkc7nLp0LnuTkZDVv3lxDhw6Vn5+fli9frh9++EHt2rWTJH3yySdKTExUfHy83njjDWVmZmrcuHFq0aKF1q5dW+Ae+sGDB9W8eXNlZmY69+wnT56su+66SzNmzFDXrl1Vp04dffLJJ85lPvjgA23ZskXvvPOOsy13GxTk9ddfl4+Pj5577jmlp6frzTffVM+ePbV8+XJnn+nTpyszM1NPPvmkypYtqxUrVmj06NH6888/NX36dElyq0c697eflJSkyMhIZ9uljDdHjhxR+/bt1b17dz344IMqV66cDMNQly5d9PPPP+uJJ55QnTp1NGvWLCUmJro9twULFqh9+/aqVq2akpOTderUKY0ePVqxsbFas2bNJR8l2bRpkzp16qQGDRpo6NChstvt2rFjh3755ZfCr8SwUK1atQxJzp9evXoZ2dnZLn0yMzPdlnv88ceNwMBA4/Tp0862uLg4Iy4uzqXfv/71L0OScejQIWebJOPll192Tr/wwgtGZGSk0ahRI5flP/roI0OSMWLECLfHdzgchmEYRmpqqiHJmDhxonPe/fffb9SrV8+Ijo42EhMTne0TJ040JBk9evQw6tev72w/efKkERISYiQkJBiSjJUrVxqGYRhnzpwxIiMjjXr16hmnTp1y9v/2228NScZLL73kbGvVqpURHBxs7N69O886LxQTE+NS2/ni4uKMG264Ic95FxMXF2dIMqZOneps27p1qyHJ8PHxMZYtW+Zsnzt3rtu2y91GqamphmEYxqxZs1y2SV769+9vhISEGGfPnvWoZsMwjHXr1hmSjKeeesqlPfd3cv7rxZMaDx8+7LaeXHm9vj/77DNDkrFkyRJn28svv2xIMvr27evSt2vXrkbZsmWd07///rvh4+NjdO3a1cjJyXHpm/t6OH78uBEWFmY8+uijLvMPHDhghIaGurVf6NlnnzUkGT/99JOz7fjx40bVqlWNKlWquD2uYRhGYmKiERMTU+B6z7do0SJDklGnTh0jKyvL2f7uu+8akowNGzY42/LahsOHDzdsNpvb30Quh8NhdOrUyQgKCjI2bdpU4LryG28kGePHj3fp+9VXXxmSjDfffNPZdvbsWaNly5Zur/eGDRsakZGRxpEjR5xtv/32m+Hj42P06tXL2Zbftst9TeR65513DEnG4cOH83zOhWHpZacTJ07U/Pnz9emnn+rhhx/Wp59+6vKuVZICAgKc/z9+/LjS0tLUsmVLZWZmauvWrS59s7OzlZaWpsOHD2vp0qWaNWuWGjRooPDw8Dwff9++fRo9erSGDBmioKAgl3kzZ85UeHi4nnnmGbflLnzXlmv16tWaPn26hg8f7nLY63wPPfSQtm7d6jw0NHPmTIWGhqpNmzYu/VatWqVDhw7pqaeekr+/v7O9Y8eOql27tnP3/PDhw1qyZIn69u3rsmdVUJ0Xk5OTo7S0NKWlpenMmTOXtGxQUJC6d+/unK5Vq5bCwsJUp04dNW3a1Nme+/8//vgj33Xl7gF9++23ys7OzrfPyZMnNX/+/Euq83zfffedpHPnsc737LPPXnTZwtRYkPNf36dPn1ZaWppuvfVWSdKaNWvc+j/xxBMu0y1bttSRI0eUkZEhSfrqq6/kcDj00ksvub0Gc18P8+fP17Fjx9SjRw/n7zktLU0lSpRQ06ZNtWjRogJr/u6779SkSRO1aNHC2RYUFKTHHntMu3bt0ubNmy9hCxSsT58+8vPzc063bNlSkuvr5vxtePLkSaWlpal58+YyDENr167Nc72vvPKKvv32W02aNEl169
bNc10XG2/sdrv69Onj0vbdd9+pZMmSevLJJ51tJUqUcBtH9u/fr3Xr1ql3794qU6aMs71Bgwa64447nK/JS5H7Wvz
"text/plain": [
"<Figure size 400x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAacAAAK9CAYAAACASqP4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABc3klEQVR4nO3dd3gU1f4G8Hc22ZJeSCMQSAjdUBSkS4dIk2YBC6EoemlSvCgqVa9RvCLS5SdSRCyAYEWqgCKConRCTQAhCSSkkL7l/P7IZm82u4HNssmO2ffzPHlgz5yd/e5kM+/OzJkZSQghQEREJCMKZxdARERUFsOJiIhkh+FERESyw3AiIiLZYTgREZHsMJyIiEh2GE5ERCQ7DCciIpIdhhMREckOw4lsNnLkSHh7e1fpayYlJUGSJKxZs6ZKX9dVzJkzB5IkVdnrSZKEOXPmVNnrOUJkZCT69+/v7DIcpmvXrujatauzy7irCoXTihUrEBsbi9DQUCiVSoSFhaFLly5Yt24dDAZDZdVIZJe8vDzMmTMHe/fudXYp/yhvvfUWtm7d6uwyyMVVKJzWrl0LLy8vzJw5E6tWrcKrr76KWrVqYeTIkXjqqacqq0Yiu+Tl5WHu3LkMpzt4/fXXkZ+fb9bGcCI5cK9I5/3790OpVJq1TZo0CTVq1MCSJUsQHx+PyMhIR9ZHRJXI3d0d7u4VWg2QzOXm5sLLy8vZZdyzCm05lQ2mEiWBpFD8b3Zff/01+vXrh/DwcKjVakRHR+ONN96AXq83e27Xrl0hSZLpJygoCP369cPJkyfN+lnbV/3uu+9CkiSL/acFBQWYM2cOGjZsCI1Gg5o1a2LIkCG4ePEigPKPY4wfPx6SJGHkyJGmtjVr1kCSJKhUKty8edOs/8GDB011//HHH2bTNm7ciFatWsHDwwNBQUF4+umnce3aNYtll5CQgMcffxzBwcHw8PBAo0aN8NprrwH43/GAO/2UbBV07doVMTExFvO3xc8//4zHHnsMderUgVqtRkREBKZMmWLxjbrEpUuXEBsbCy8vL4SHh2PevHkoe3H7zz//HK1atYKPjw98fX3RrFkzfPDBBxbzeeyxxxAYGAhPT0+0a9cO33///V3rLW+f+ciRI02fxaSkJAQHBwMA5s6da1pepT9DCQkJePTRRxEYGAiNRoPWrVvjm2++uevrlyVJEiZMmICNGzeiadOm8PDwQPv27XHixAkAwIcffoj69etDo9Gga9euSEpKMnt+RZZ/yWtoNBrExMRgy5YtZu+75L1LkoT//ve/WLlyJaKjo6FWq/Hggw/i999/N5tf2WNOkiQhNzcXa9euNS2zkr+Hsq9T3jwAoLCwEFOmTEFwcDB8fHzwyCOP4O+//7a6/K5du4bRo0cjNDQUarUa9913Hz7++OPyFrcZnU6HN954w/QeIyMj8eqrr6KwsNDUJzIy8o5/Q7Z+of7ll1/Qpk0baDQa1KtXD+vWrTObfuvWLbz00kto1qwZvL294evriz59+uDYsWNm/e5UT8nf8+XLlzFu3Dg0atQIHh4eqFGjBh577DGLz07J+mnfvn0YN24cQkJCULt2bdP0kt+/h4cH2rRpg59//tnqe7tx4wbGjBmD0NBQaDQatGjRAmvXrjXrs3fvXrMaS1hbn6akpGDUqFGoXbs21Go1atasiYEDB1rUfyd2fWXKzMyETqfD7du3ceTIEfz3v//FsGHDUKdOHVOfNWvWwNvbG1OnToW3tzf27NmDWbNmITs7G++++67Z/Bo3bozXXnsNQghcvHgRCxYsQN++fXHlypU71hAfH2/Rrtfr0b9/f+zevRvDhg3Diy++iNu3b2Pnzp04efIkoqOjrc7vwoUL+L//+79yX8/NzQ3r16/HlClTTG2rV6+GRqNBQUGBWd81a9Zg1KhRePDBBxEfH4/U1FR88MEHOHDgAP766y/4+/sDAI4fP46HHnoISqUSY8eORWRkJC5evI
hvv/0W//nPfzBkyBDUr1/fNN8pU6agSZMmGDt2rKmtSZMm5dZsq40bNyIvLw//+te/UKNGDRw+fBiLFy/G33//jY0bN5r11ev1ePjhh9GuXTvMnz8fP/74I2bPng2dTod58+YBAHbu3Inhw4ejR48eeOeddwAAZ86cwYEDB/Diiy8CAFJTU9GhQwfk5eWZtr7Xrl2LRx55BJs2bcLgwYPv6T0FBwdj+fLl+Ne//oXBgwdjyJAhAIDmzZsDAE6dOoWOHTuiVq1aeOWVV+Dl5YUvv/wSgwYNwubNmyv8+j///DO++eYbjB8/HgAQHx+P/v37Y/r06Vi2bBnGjRuHjIwMzJ8/H6NHj8aePXtMz7V1+X///fd44okn0KxZM8THxyMjIwNjxoxBrVq1rNa0YcMG3L59G88//zwkScL8+fMxZMgQXLp0qdwvmp988gmeffZZtGnTxvQ5K+9v5k6effZZrF+/Hk8++SQ6dOiAPXv2oF+/fhb9UlNT0a5dO1PABwcHY9u2bRgzZgyys7MxefLku77O2rVr8eijj2LatGk4dOgQ4uPjcebMGWzZsgUAsHDhQuTk5AAo/hy+9dZbePXVV01/O7YM8rlw4QIeffRRjBkzBnFxcfj4448xcuRItGrVCvfddx+A4i9bW7duxWOPPYaoqCikpqbiww8/RJcuXXD69GmEh4db1FPi/fffx9GjR1GjRg0AwO+//45ff/0Vw4YNQ+3atZGUlITly5eja9euOH36NDw9Pc2eP27cOAQHB2PWrFnIzc0FAKxatQrPP/88OnTogMmTJ+PSpUt45JFHEBgYiIiICNNz8/Pz0bVrV1y4cAETJkxAVFQUNm7ciJEjRyIzM9P0N1sRQ4cOxalTpzBx4kRERkbixo0b2LlzJ65cuWL73jVhh0aNGgkApp8RI0YIrVZr1icvL8/iec8//7zw9PQUBQUFprYuXbqILl26mPV79dVXBQBx48YNUxsAMXv2bNPj6dOni5CQENGqVSuz53/88ccCgFiwYIHF6xsMBiGEEImJiQKAWL16tWna448/LmJiYkRERISIi4szta9evVoAEMOHDxfNmjUztefm5gpfX1/x5JNPCgDi999/F0IIUVRUJEJCQkRMTIzIz8839f/uu+8EADFr1ixTW+fOnYWPj4+4fPmy1TrLqlu3rlltpXXp0kXcd999VqfdjbXfVXx8vJAkyay2uLg4AUBMnDjRrNZ+/foJlUolbt68KYQQ4sUXXxS+vr5Cp9OV+5qTJ08WAMTPP/9sart9+7aIiooSkZGRQq/XCyGs/66sfWZK6qtbt67p8c2bNy0+NyV69OghmjVrZvZZNBgMokOHDqJBgwbl1m0NAKFWq0ViYqKp7cMPPxQARFhYmMjOzja1z5gxQwAw62vr8m/WrJmoXbu2uH37tqlt7969AoDZ+y5ZZjVq1BC3bt0ytX/99dcCgPj2229NbbNnzxZlVwNeXl5WP2dll2958zh69KgAIMaNG2fWr+RvpfTvY8yYMaJmzZoiLS3NrO+wYcOEn5+f1WVT9nWeffZZs/aXXnpJABB79uyxeM5PP/0kAIiffvqp3PmWVbduXQFA7N+/39R248YNoVarxbRp00xtBQUFps9ticTERKFWq8W8efPKnf+XX34pAJj1sfa+Dx48KACIdevWmdpK1k+dOnUy+3srWQ+1bNlSFBYWmtpXrlwpAJj9/SxcuFAAEOvXrzd7fvv27YW3t7fp81vesiv7N5qRkSEAiHfffbfc92wLu4aSr169Gjt37sSnn36KMWPG4NNPPzX7Ng8AHh4epv/fvn0baWlpeOihh5CXl4eEhASzvlqtFmlpabh58yYOHjyILVu2oHnz5ggKCrL6+teuXcPixYsxc+ZMi289mzdvRlBQECZOnGjxvPKGzB45cgQbN25EfHy82a7J0p555hkkJCSYdt9t3rwZfn5+6NGjh1m/P/74Azdu3MC4ceOg0WhM7f369UPjxo1Nu6
1u3ryJ/fv3Y/To0WZbnHeq8270ej3S0tKQlpaGoqIim59X+neVm5uLtLQ0dOjQAUII/PXXXxb9J0yYYFbrhAkTUFR
"text/plain": [
"<Figure size 400x800 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Список числовых колонок, для которых мы будем строить графики\n",
"numeric_columns = ['est_diameter_min', 'est_diameter_max', 'relative_velocity', 'miss_distance', 'absolute_magnitude']\n",
"\n",
"# Создание диаграмм зависимости\n",
"for column in numeric_columns:\n",
" plt.figure(figsize=(4, 8)) # Установка размера графика\n",
" plt.scatter(df['hazardous'], df[column], alpha=0.5) # Создаем диаграмму рассеяния\n",
" plt.title(f'Зависимость {column} от hazardous')\n",
" plt.xlabel('hazardous (0 = нет, 1 = да)')\n",
" plt.ylabel(column)\n",
" plt.xticks([0, 1]) # Установка меток по оси X\n",
" plt.grid() # Добавление сетки для удобства восприятия\n",
" plt.show() # Отображение графика"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Создадим выборки данных. разбивать будем относительно параметра опасный, ведь это тот самый параметр по которому наша выборка разбивается на классы. И собственно его нам и надо будет предсказывать"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: scikit-learn in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (1.5.2)\n",
"Requirement already satisfied: numpy>=1.19.5 in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (from scikit-learn) (2.1.1)\n",
"Requirement already satisfied: scipy>=1.6.0 in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (from scikit-learn) (1.14.1)\n",
"Requirement already satisfied: joblib>=1.2.0 in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (from scikit-learn) (1.4.2)\n",
"Requirement already satisfied: threadpoolctl>=3.1.0 in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (from scikit-learn) (3.5.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install scikit-learn"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Функция для создания выборок\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"def split_stratified_into_train_val_test(\n",
" df_input,\n",
" stratify_colname=\"y\",\n",
" frac_train=0.6,\n",
" frac_val=0.15,\n",
" frac_test=0.25,\n",
" random_state=None,\n",
"):\n",
" \"\"\"\n",
" Splits a Pandas dataframe into three subsets (train, val, and test)\n",
" following fractional ratios provided by the user, where each subset is\n",
" stratified by the values in a specific column (that is, each subset has\n",
" the same relative frequency of the values in the column). It performs this\n",
" splitting by running train_test_split() twice.\n",
"\n",
" Parameters\n",
" ----------\n",
" df_input : Pandas dataframe\n",
" Input dataframe to be split.\n",
" stratify_colname : str\n",
" The name of the column that will be used for stratification. Usually\n",
" this column would be for the label.\n",
" frac_train : float\n",
" frac_val : float\n",
" frac_test : float\n",
" The ratios with which the dataframe will be split into train, val, and\n",
" test data. The values should be expressed as float fractions and should\n",
" sum to 1.0.\n",
" random_state : int, None, or RandomStateInstance\n",
" Value to be passed to train_test_split().\n",
"\n",
" Returns\n",
" -------\n",
" df_train, df_val, df_test :\n",
" Dataframes containing the three splits.\n",
" \"\"\"\n",
"\n",
" if frac_train + frac_val + frac_test != 1.0:\n",
" raise ValueError(\n",
" \"fractions %f, %f, %f do not add up to 1.0\"\n",
" % (frac_train, frac_val, frac_test)\n",
" )\n",
"\n",
" if stratify_colname not in df_input.columns:\n",
" raise ValueError(\"%s is not a column in the dataframe\" % (stratify_colname))\n",
"\n",
" X = df_input # Contains all columns.\n",
" y = df_input[\n",
" [stratify_colname]\n",
" ] # Dataframe of just the column on which to stratify.\n",
"\n",
" # Split original dataframe into train and temp dataframes.\n",
" df_train, df_temp, y_train, y_temp = train_test_split(\n",
" X, y, stratify=y, test_size=(1.0 - frac_train), random_state=random_state\n",
" )\n",
"\n",
" # Split the temp dataframe into val and test dataframes.\n",
" relative_frac_test = frac_test / (frac_val + frac_test)\n",
" df_val, df_test, y_val, y_test = train_test_split(\n",
" df_temp,\n",
" y_temp,\n",
" stratify=y_temp,\n",
" test_size=relative_frac_test,\n",
" random_state=random_state,\n",
" )\n",
"\n",
" assert len(df_input) == len(df_train) + len(df_val) + len(df_test)\n",
"\n",
" return df_train, df_val, df_test"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"hazardous\n",
"False 81996\n",
"True 8840\n",
"Name: count, dtype: int64\n",
"\n",
"Обучающая выборка: (54501, 6)\n",
"hazardous\n",
"False 49197\n",
"True 5304\n",
"Name: count, dtype: int64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgkAAADECAYAAAAVi7K7AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA9TklEQVR4nO3dd1xT1/sH8E8SIGHvLbLBgQNxtFbBjavUVVu3WBxVa7X6tdXWgdWidddtHVDxa61StVonbqxVceBWkOFmbwiQ5Pz+4Jd8CUkQELwgz/v1yktzOffc5557c/PknDt4jDEGQgghhJBy+FwHQAghhJC6iZIEQgghhKhFSQIhhBBC1KIkgRBCCCFqUZJACCGEELUoSSCEEEKIWpQkEEIIIUQtShIIIYQQopYW1wEQQkhDUFxcjIyMDMhkMtjZ2XEdDqlBYrEYGRkZ0NLSgpWVFdfh1CjqSSCkDhg7diwMDAy4DqPGLFy4EDwej+swOBcdHY3hw4fDwsICQqEQtra2GDx4MNdh1Rvr169HVlaW4v2aNWuQn5/PXUBlREZGIiAgACYmJtDV1YW9vT2+/vprrsOqcVXqSQgNDUVgYKDivVAoROPGjdGrVy/MmzcP1tbWNR4gIYTUR4cOHcJnn32GJk2aYMmSJXB1dQWA9+6XZm06fPgw4uLiMHPmTFy4cAHz5s3DtGnTuA4LGzduxFdffYVOnTph7dq1sLe3BwA4OjpyHFnNq9Zww6JFi+Ds7AyxWIyoqChs2rQJR48exd27d6Gnp1fTMRJCSL2SkZGBoKAg+Pv7Y9++fdDR0eE6pHpp7ty5CAgIwNq1a8Hn87Fy5Urw+dx2gMfGxuKbb77BhAkTsHHjxve+x6xaSUKfPn3Qtm1bAEBQUBDMzc2xatUqHDp0CMOGDavRAAkhdY9EIoFMJqMvPw127twJsViM0NBQaqO34Ofnh6SkJDx48AAODg5o1KgR1yHhl19+gY2NDX755Zf3PkEAauichG7dugEAEhISAJRm0bNmzUKLFi1gYGAAIyMj9OnTBzExMSrzisViLFy4EB4eHhCJRLC1tcWgQYPw5MkTAEBiYiJ4PJ7GV5cuXRR1nTt3DjweD3v37sXcuXNhY2MDfX19BAQE4NmzZyrLvnLlCnr37g1jY2Po6enBz88Ply5dUruOXbp0Ubv8hQsXqpQNDw+Hj48PdHV1YWZmhs8//1zt8itat7JkMhnWrFmD5s2bQyQSwdraGhMnTkRmZqZSOScnJ/Tv319lOVOnTlWpU13sy5cvV2lTACgqKsKCBQvg5uYGoVAIBwcHzJ49G0VFRWrbqqwuXbqo1LdkyRLw+Xz897//rVZ7rFixAh07doS5uTl0dXXh4+OD/fv3q11+eHg42rdvDz09PZiamsLX1xcnT55UKnPs2DH4+fnB0NAQRkZGaNeunUps+/btU2xTCwsLjBw5Ei9evFAqM3bsWKWYTU1N0aVLF1y8ePGN7ST34sULDBgwAAYGBrC0tMSsWbMglUqrvP7lY1G3zxYXF2P+/Pnw8fGBsbEx9PX10blzZ5w9e1apLvl2WbFiBdasWQNXV1cIhULcv38fABAVFYV27dpBJBLB1dUVW7ZsUbtuEokEP/74o2J+JycnzJ07V2U/0vS5cnJywtixYxXvS0pKEBwcDHd3d4hEIpibm6NTp044depUhW0cGhqq1B56enpo0aIFtm3bVuF8cvHx8fj0009hZmYGPT09fPDBB/j777+Vyvz7779o3bo1fvrpJzg4OEAoFMLd3R1Lly6FTCZTlPPz80OrVq3ULsfT0xP+/v5KMScmJiqVKf/5quw2BVTb+fXr1xg9ejQsLS0hFArh5eWFX3/9VWmesvtCWV5eXiqf8xUrVqiN+cWLFxg3bhysra0hFArRvHlz7NixQ6mM/Fh+7tw5mJiY4MMPP0SjRo3Qr18/jfuHuvnlL6FQCA8PD4SEhKDsg4/l586kpaVprKv8fv
fvv//Cx8cHkydPVqyDurYCgPz8fMycOVOxD3h6emLFihUo//BlHo+HqVOnYvfu3fD09IRIJIKPjw8uXLigVE7duT5nz56FUCjEpEmTlKZXpp0ro0aubpB/oZubmwMo/RAdPHgQn376KZydnZGcnIwtW7bAz88P9+/fV5zZK5VK0b9/f5w+fRqff/45vv76a+Tm5uLUqVO4e/euYgwPAIYNG4a+ffsqLXfOnDlq41myZAl4PB6+/fZbpKSkYM2aNejRowdu3boFXV1dAMCZM2fQp08f+Pj4YMGCBeDz+di5cye6deuGixcvon379ir1NmrUCCEhIQCAvLw8fPnll2qXPW/ePAwdOhRBQUFITU3FunXr4Ovri5s3b8LExERlngkTJqBz584AgD///BMHDhxQ+vvEiRMV54NMmzYNCQkJWL9+PW7evIlLly5BW1tbbTtURVZWlmLdypLJZAgICEBUVBQmTJiApk2b4s6dO1i9ejUeP36MgwcPVmk5O3fuxA8//ICVK1di+PDhasu8qT3Wrl2LgIAAjBgxAsXFxfj999/x6aef4siRI+jXr5+iXHBwMBYuXIiOHTti0aJF0NHRwZUrV3DmzBn06tULQOnBd9y4cWjevDnmzJkDExMT3Lx5E8ePH1fEJ2/7du3aISQkBMnJyVi7di0uXbqksk0tLCywevVqAMDz58+xdu1a9O3bF8+ePVO77cuSSqXw9/dHhw4dsGLFCkRGRmLlypVwdXVV2tcqs/4TJ05Ejx49lOo/fvw4du/erRgTz8nJwbZt2zBs2DCMHz8eubm52L59O/z9/XH16lW0bt1aZduJxWJMmDABQqEQZmZmuHPnDnr16gVLS0ssXLgQEokECxYsUHt+UlBQEMLCwjBkyBDMnDkTV65cQUhICB48eKCyjStj4cKFCAkJQVBQENq3b4+cnBxER0fjxo0b6Nmz5xvnX716NSwsLJCTk4MdO3Zg/PjxcHJyUmm3spKTk9GxY0cUFBRg2rRpMDc3R1hYGAICArB//34MHDgQAJCeno6oqChERUVh3Lhx8PHxwenTpzFnzhwkJiZi8+bNAIBRo0Zh/PjxuHv3Lry8vBTLuXbtGh4/fowffvihSm1S1W0qV1xcjB49euDhw4f48ssv4enpiYMHD2LChAlIT0/Hd999V6U4NElOTsYHH3yg+FK0tLTEsWPH8MUXXyAnJwfTp0/XOO+FCxdw9OjRKi1v7ty5aNq0KQoLCxU/Hq2srPDFF19Uex3S09MRHR0NLS0tTJkyBa6urmrbijGGgIAAnD17Fl988QVat26NEydO4D//+Q9evHihOE7InT9/Hnv37sW0adMgFAqxceNG9O7dG1evXlXaN8qKiYnBgAED0LdvX2zYsEEx/W3aWQWrgp07dzIALDIykqWmprJnz56x33//nZmbmzNdXV32/PlzxhhjYrGYSaVSpXkTEhKYUChkixYtUkzbsWMHA8BWrVqlsiyZTKaYDwBbvny5SpnmzZszPz8/xfuzZ88yAMze3p7l5OQopv/xxx8MAFu7dq2ibnd3d+bv769YDmOMFRQUMGdnZ9azZ0+VZXXs2JF5eXkp3qempjIAbMGCBYppiYmJTCAQsCVLlijNe+fOHaalpaUyPTY2lgFgYWFhimkLFixgZTfLxYsXGQC2e/dupXmPHz+uMt3R0ZH169dPJfYpU6aw8pu6fOyzZ89mVlZWzMfHR6lNd+3axfh8Prt48aLS/Js3b2YA2KVLl1SWV5afn5+ivr///ptpaWmxmTNnqi1bmfZgrHQ7lVVcXMy8vLxYt27dlOri8/ls4MCBKvuifJtnZWUxQ0ND1qFDB1ZYWKi2THFxMbOysmJeXl5KZY4cOcIAsPnz5yumjRkzhjk6OirVs3XrVgaAXb16Ve06l50XgNLngzHGvL29mY+PT5XXv7zY2FhmbGzMevbsySQSCWOMMYlEwoqKipTKZWZmMmtrazZu3DjFNPln0MjIiKWkpC
iVHzBgABOJRCwpKUkx7f79+0wgEChtt1u3bjEALCgoSGn+WbNmMQDszJkzimnl9005R0dHNmbMGMX7Vq1aqd3f30R
"text/plain": [
"<Figure size 200x200 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Контрольная выборка: (18167, 6)\n",
"hazardous\n",
"False 16399\n",
"True 1768\n",
"Name: count, dtype: int64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAhUAAADECAYAAAAoGdPdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA8hUlEQVR4nO3dd1wT9/8H8FcSQgJhLxm1iAxRUaso1oG4EbEWZ11V6Re1tWhttba2PxWtLXXUUXdbB47aCrbSasW9UOsGtyJLRURA2QRI8vn9wTf5EgJIIHCA7+fjkYfmuPvc++5yl3c+447HGGMghBBCCKklPtcBEEIIIaRpoKSCEEIIITpBSQUhhBBCdIKSCkIIIYToBCUVhBBCCNEJSioIIYQQohOUVBBCCCFEJyipIIQQQohOUFJBCCGE1LOsrCw8fPgQMpmM61B0ipIKQhqAyZMnw8jIiOswdCYkJAQ8Ho/rMMhr5smTJ9i+fbvqfVJSEnbv3s1dQGWUlJRg2bJl6NChA0QiEczNzeHq6orjx49zHZpOaZVUbN++HTweT/USi8Vwc3NDcHAw0tLS6ipGQggh5JV4PB4+/vhjHD58GElJSZg7dy7Onj3LdVgoKipC//79MX/+fPTu3Rvh4eE4evQoTpw4gW7dunEdnk7p1WShxYsXw8nJCVKpFNHR0di4cSP++ecf3Lp1C4aGhrqOkRBCCHklBwcHTJkyBYMGDQIA2NnZ4dSpU9wGBWDp0qW4ePEiDh8+jN69e3MdTp2qUVLh5+eHzp07AwCCgoJgaWmJlStXIjIyEmPHjtVpgISQhkcmk0GhUEBfX5/rUAhRs3r1asyYMQMZGRnw8PCARCLhNB6ZTIbVq1dj9uzZTT6hAHTUp6Jv374AgMTERADAixcvMGfOHLRr1w5GRkYwMTGBn58fYmNjNZaVSqUICQmBm5sbxGIx7OzsMHz4cMTHxwMobRMr2+RS/lX2IJ06dQo8Hg+///47vvrqK9ja2kIikWDo0KF4/PixxrovXryIQYMGwdTUFIaGhvDx8cG5c+cq3MbevXtXuP6QkBCNeXft2gVPT08YGBjAwsICY8aMqXD9VW1bWQqFAqtXr0bbtm0hFovRrFkzTJs2DS9fvlSbr0WLFhgyZIjGeoKDgzXKrCj25cuXa+xToLTqbuHChXBxcYFIJELz5s0xd+5cFBUVVbivyurdu7dGed9++y34fD5+/fXXGu2PFStWoHv37rC0tISBgQE8PT0RERFR4fp37doFLy8vGBoawtzcHL169cKRI0fU5jl06BB8fHxgbGwMExMTdOnSRSO28PBw1TG1srLChAkTkJKSojbP5MmT1WI2NzdH7969tap+TUlJQUBAAIyMjGBtbY05c+ZALpdrvf3lY6noM1tcXIwFCxbA09MTpqamkEgk8Pb2xsmTJ9XKUh6XFStWYPXq1XB2doZIJMKdO3cAANHR0ejSpQvEYjGcnZ2xefPmCrdNJpPhm2++US3fokULfPXVVxqfo8rOqxYtWmDy5Mmq9yUlJVi0aBFcXV0hFothaWmJnj174ujRo1Xu4/LNuIaGhmjXrh1++eWXKpcru2xSUpJq2u3bt2Fubo4hQ4aodbpLSEjAqFGjYGFhAUNDQ7z99ts4ePCgWnnKa1ZFn18jIyPV9paPuaKXsi+Bsn9OQkICfH19IZFIYG9vj8WLF6P8Q6nz8/Mxe/ZsNG/eHCKRCK1atcKKFSs05qsqhrLnt3KeK1euVLkfK+tDFBERAR6Pp1G7UN3zr0WLFgAAZ2dndO3aFS9evICBgYHGMasspuqcv5VdZ5WUx1S5Dffv38fLly9hbGwMHx8fGBoawtTUFEOGDMGtW7c0lr9+/Tr8/PxgYmICIyMj9OvXD//++6/aPMr9fObMGUybNg2WlpYwMTHBxIkTK/xeKHveAMDUqVMhFos19vOhQ4fg7e0NiUQCY2Nj+Pv74/bt21Xut/JqVFNRnjIBsL
S0BFB6Mu3fvx+jRo2Ck5MT0tLSsHnzZvj4+ODOnTuwt7cHAMjlcgwZMgTHjx/HmDFj8MknnyA3NxdHjx7FrVu34OzsrFrH2LFjMXjwYLX1zps3r8J4vv32W/B4PHzxxRd4/vw5Vq9ejf79+yMmJgYGBgYAgBMnTsDPzw+enp5YuHAh+Hw+tm3bhr59++Ls2bPw8vLSKPeNN95AaGgoACAvLw8fffRRheueP38+Ro8ejaCgIKSnp2Pt2rXo1asXrl+/DjMzM41lpk6dCm9vbwDAH3/8gT///FPt79OmTcP27dsRGBiImTNnIjExEevWrcP169dx7tw5CIXCCveDNrKyslTbVpZCocDQoUMRHR2NqVOnonXr1rh58yZWrVqFBw8eYP/+/VqtZ9u2bfi///s//PDDDxg3blyF87xqf6xZswZDhw7F+PHjUVxcjN9++w2jRo3CgQMH4O/vr5pv0aJFCAkJQffu3bF48WLo6+vj4sWLOHHiBAYOHAig9OT84IMP0LZtW8ybNw9mZma4fv06oqKiVPEp932XLl0QGhqKtLQ0rFmzBufOndM4plZWVli1ahWA0k5ja9asweDBg/H48eMKj31Zcrkcvr6+6Nq1K1asWIFjx47hhx9+gLOzs9pnrTrbP23aNPTv31+t/KioKOzevRs2NjYAgJycHPzyyy8YO3YspkyZgtzcXGzZsgW+vr64dOkS3nrrLY1jJ5VKMXXqVIhEIlhYWODmzZsYOHAgrK2tERISAplMhoULF6JZs2Ya2xcUFISwsDCMHDkSs2fPxsWLFxEaGoq7d+9qHOPqCAkJQWhoKIKCguDl5YWcnBxcuXIF165dw4ABA165/KpVq2BlZYWcnBxs3boVU6ZMQYsWLTT2W1UeP36MQYMGwd3dHXv37oWeXuklNS0tDd27d0dBQQFmzpwJS0tLhIWFYejQoYiIiMCwYcO02tZevXph586dqvfffvstAODrr79WTevevbvq/3K5HIMGDcLbb7+NZcuWISoqCgsXLoRMJsPixYsBAIwxDB06FCdPnsR//vMfvPXWWzh8+DA+//xzpKSkqD7H5Sn3W9k46pI25195CxYsgFQqrfa6anP+ViYzMxNA6feVq6srFi1aBKlUivXr16NHjx64fPky3NzcAJQmqN7e3jAxMcHcuXMhFAqxefNm9O7dG6dPn0bXrl3Vyg4ODoaZmRlCQkJw//59bNy4EcnJyarEpiILFy7Eli1b8Pvvv6slhDt37sSkSZPg6+uLpUuXoqCgABs3bkTPnj1x/fp1VcL2SkwL27ZtYwDYsWPHWHp6Onv8+DH77bffmKWlJTMwMGBPnjxhjDEmlUqZXC5XWzYxMZGJRCK2ePFi1bStW7cyAGzlypUa61IoFKrlALDly5drzNO2bVvm4+Ojen/y5EkGgDk4OLCcnBzV9L179zIAbM2aNaqyXV1dma+vr2o9jDFWUFDAnJyc2IABAzTW1b17d+bh4aF6n56ezgCwhQsXqqYlJSUxgUDAvv32W7Vlb968yfT09DSmx8XFMQAsLCxMNW3hwoWs7GE5e/YsA8B2796ttmxUVJTGdEdHR+bv768R+8cff8zKH+rysc+dO5fZ2NgwT09PtX26c+dOxufz2dmzZ9WW37RpEwPAzp07p7G+snx8fFTlHTx4kOnp6bHZs2dXOG919gdjpceprOLiYubh4cH69u2rVhafz2fDhg3T+Cwqj3lWVhYzNjZmXbt2ZYWFhRXOU1xczGxsbJiHh4faPAcOHGAA2IIFC1TTJk2axBwdHdXK+emnnxgAdunSpQq3ueyyANTOD8YY69ixI/P09NR6+8uLi4tjpqambMCAAUwmkzHGGJPJZKyoqEhtvpcvX7JmzZqxDz74QDVNeQ6amJiw58+fq80fEBDAxGIxS05OVk27c+cOEwgEasctJiaGAWBBQUFqy8+ZM4cBYCdOnFBNK//ZVHJ0dGSTJk1Sve/QoUOFn/dXUV7HEhMTVd
MePHjAALBly5ZVe9kXL16wNm3asFatWrGMjAy1+WbNmsUAqJ03ubm5zMnJibVo0UL1mVRes8LDwzXWJZFI1La3rLL
"text/plain": [
"<Figure size 200x200 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Тестовая выборка: (18168, 6)\n",
"hazardous\n",
"False 16400\n",
"True 1768\n",
"Name: count, dtype: int64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAfQAAADECAYAAABp29OTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA5wElEQVR4nO3dd1xT1/sH8E8SQgJhhi0OkKEoThTrQNCqiFjFqrRq6/riqLWtVmtr+1VxtPxad922VVGsrava2jqrVrFW6wBFUZHhqrIUhEAISc7vD5p8CWELXEye9+uVl+Zy7rnPvbk5zx3n3PAYYwyEEEIIeanxuQ6AEEIIIS+OEjohhBBiACihE0IIIQaAEjohhBBiACihE0IIIQaAEjohhBBiACihE0IIIQaAEjohhBBiACihE0IIMRp5eXlIS0uDTCbjOpQ6RwmdkEZg/PjxsLCw4DqMOhMZGQkej8d1GKSB5OfnY9WqVdr3OTk5WLduHXcBlcIYw+bNm/HKK6/A3NwcVlZWcHd3R0xMDNeh1bkaJfRt27aBx+NpX2KxGN7e3pg+fTrS09PrK0ZCCCGNmJmZGf773/9i586dePDgASIjI/HLL79wHRYAYPTo0Zg6dSp8fHywY8cOHD9+HCdOnMDrr7/OdWh1zqQ2My1atAju7u6Qy+WIjY3Fhg0b8NtvvyEhIQHm5uZ1HSMhhJBGTCAQYOHChRg7dizUajWsrKzw66+/ch0Wtm/fjh9//BExMTEYPXo01+HUu1ol9JCQEHTp0gUAEBERATs7O6xYsQIHDx7EqFGj6jRAQkjjo1QqoVarYWpqynUopJGYNWsW3njjDTx48AA+Pj6wsbHhOiQsXboUo0aNMopkDtTRPfS+ffsCAFJTUwEAT58+xezZs9GuXTtYWFjAysoKISEhiI+P15tXLpcjMjIS3t7eEIvFcHFxweuvv47k5GQAQFpams5l/rKvoKAgbV2nT58Gj8fDjz/+iE8//RTOzs6QSCQYMmQIHjx4oLfsCxcuYODAgbC2toa5uTkCAwNx7ty5ctcxKCio3OVHRkbqlY2JiYGfnx/MzMwglUrx5ptvlrv8ytatNLVajVWrVqFt27YQi8VwcnLClClT8OzZM51ybm5uGDx4sN5ypk+frldnebEvXbpUb5sCQFFRERYsWABPT0+IRCI0a9YMc+bMQVFRUbnbqrSgoCC9+j7//HPw+Xx8//33tdoey5YtQ48ePWBnZwczMzP4+flh79695S4/JiYG/v7+MDc3h62tLXr37o1jx47plDl8+DACAwNhaWkJKysrdO3aVS+2PXv2aD9Te3t7vPXWW3j06JFOmfHjx+vEbGtri6CgIJw9e7bK7aTx6NEjhIWFwcLCAg4ODpg9ezZUKlWN179sLOXtswqFAvPnz4efnx+sra0hkUgQEBCAU6dO6dSl+VyWLVuGVatWwcPDAyKRCDdv3gQAxMbGomvXrhCLxfDw8MCmTZvKXTelUonFixdr53dzc8Onn36qtx9V9L1yc3PD+PHjte+Li4uxcOFCeHl5QSwWw87ODr169cLx48cr3cZlbx2am5ujXbt2+Pbbb2s0X3mvbdu2acvfunULI0aMgFQqhVgsRpcuXfDzzz/r1ZuTk4OZM2fCzc0NIpEITZs2xdixY5GVlaVt0yp7ld5WV69eRUhICKysrGBhYYFXX30Vf/31V63X/+TJkwgICIBEIoGNjQ2GDh2KxMREnTKl+0s0bdoU3bt3h4mJCZydncHj8XD69OlKt6tmfs3L0tIS/v7+OHDggE65oKAg+Pr6VliPZj/VfAYymQwJCQlo1qwZQkNDYWVlBYlEUuF3MiUlBSNHjoRUKoW5uTleeeUVvasMNckxNWn7apKLKlOrM/SyNMnXzs4OQMmGOXDgAEaOHAl3d3ekp6dj06ZNCAwMxM2bN9GkSRMAgEqlwuDBg/H777/jzTffxAcffIC8vDwcP34cCQkJ8PDw0C
5j1KhRGDRokM5y586dW248n3/+OXg8Hj7++GNkZGRg1apV6NevH+Li4mBmZgagZEcNCQmBn58fFixYAD6fj61bt6Jv3744e/Ys/P399ept2rQpoqKiAJR0AnnnnXfKXfa8efMQHh6OiIgIZGZmYs2aNejduzeuXr1a7lHr5MmTERAQAADYv38/fvrpJ52/T5kyBdu2bcOECRPw/vvvIzU1FWvXrsXVq1dx7tw5CIXCcrdDTeTk5GjXrTS1Wo0hQ4YgNjYWkydPho+PD65fv46VK1fizp07el+6qmzduhX//e9/sXz58gqPmqvaHqtXr8aQIUMwZswYKBQK/PDDDxg5ciQOHTqE0NBQbbmFCxciMjISPXr0wKJFi2BqaooLFy7g5MmTGDBgAICSxm3ixIlo27Yt5s6dCxsbG1y9ehVHjhzRxqfZ9l27dkVUVBTS09OxevVqnDt3Tu8ztbe3x8qVKwEADx8+xOrVqzFo0CA8ePCgyjMWlUqF4OBgdOvWDcuWLcOJEyewfPlyeHh46Oxr1Vn/KVOmoF+/fjr1HzlyBDt37oSjoyMA4Pnz5/j2228xatQoTJo0CXl5efjuu+8QHByMixcvomPHjnqfnVwux+TJkyESiSCVSnH9+nUMGDAADg4OiIyMhFKpxIIFC+Dk5KS3fhEREYiOjsaIESMwa9YsXLhwAVFRUUhMTNT7jKsjMjISUVFRiIiIgL+/P54/f45Lly7hypUr6N+/f5Xzr1y5Evb29nj+/Dm2bNmCSZMmwc3NTW+7afTu3Rs7duzQvv/8888BAJ999pl2Wo8ePQAAN27cQM+ePeHq6opPPvkEEokEu3fvRlhYGPbt24dhw4YBKGlHAgICkJiYiIkTJ6Jz587IysrCzz//jIcPH2rv+2ps3rwZiYmJ2n0MANq3b69dZkBAAKysrDBnzhwIhUJs2rQJQUFB+OOPP9CtW7carf+JEycQEhKCli1bIjIyEoWFhVizZg169uyJK1euwM3NrcJtu3z58hr3q9KsZ1ZWFtavX4+RI0ciISEBrVq1qlE9GtnZ2QCAL7/8Es7Ozvjoo48gFovxzTffoF+/fjh+/Dh69+4NAEhPT0ePHj1QUFCA999/H3Z2doiOjsaQIUOwd+9e7eelUZ0cU1ZFbV9tclGFWA1s3bqVAWAnTpxgmZmZ7MGDB+yHH35gdnZ2zMzMjD18+JAxxphcLmcqlUpn3tTUVCYSidiiRYu007Zs2cIAsBUrVugtS61Wa+cDwJYuXapXpm3btiwwMFD7/tSpUwwAc3V1Zc+fP9dO3717NwPAVq9era3by8uLBQcHa5fDGGMFBQXM3d2d9e/fX29ZPXr0YL6+vtr3mZmZDABbsGCBdlpaWhoTCATs888/15n3+vXrzMTERG96UlISA8Cio6O10xYsWMBKfyxnz55lANjOnTt15j1y5Ije9BYtWrDQ0FC92N99911W9qMuG/ucOXOYo6Mj8/Pz09mmO3bsYHw+n509e1Zn/o0bNzIA7Ny5c3rLKy0wMFBb36+//spMTEzYrFmzyi1bne3BWMnnVJpCoWC+vr6sb9++OnXx+Xw2bNgwvX1R85nn5OQwS0tL1q1bN1ZYWFhuGYVCwRwdHZmvr69OmUOHDjEAbP78+dpp48aNYy1atNCpZ/PmzQwAu3jxYrnrXHpeADrfD8YY69SpE/Pz86vx+peVlJTErK2tWf/+/ZlSqWSMMaZUKllRUZFOuWfPnjEnJyc2ceJE7TTNd9DKyoplZGTolA8LC2NisZjdu3dPO+3mzZtMIBDofG5xcXEMAIuIiNCZf/bs2QwAO3nypHZa2X1To0WLFmzcuHHa9x06dCh3f6+Kph1LTU3VTrtz5w4DwL766qtq11N63y7r1VdfZe3atWNyuVw7Ta1Wsx49ejAvLy/ttPnz5zMAbP/+/Xp1lG6bNMrbxzTCwsKYqakpS05O1k77559/mKWlJevdu7d2WnXXv2PHjszR0ZFlZ2drp8
XHxzM+n8/Gjh2rnVb2O5qRkcEsLS1ZSEgIA8BOnTpVbrwVzc8YY8eOHWMA2O7du7XTAgMDWdu2bSusR7Ofbt26Vee
"text/plain": [
"<Figure size 200x200 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Вывод распределения количества наблюдений по меткам (классам)\n",
"print(df.hazardous.value_counts())\n",
"print()\n",
"\n",
"\n",
"data = df[['est_diameter_min', 'est_diameter_max', 'relative_velocity', 'miss_distance', 'absolute_magnitude', 'hazardous']].copy()\n",
"\n",
"df_train, df_val, df_test = split_stratified_into_train_val_test(\n",
" data, stratify_colname=\"hazardous\", frac_train=0.60, frac_val=0.20, frac_test=0.20\n",
")\n",
"\n",
"print(\"Обучающая выборка: \", df_train.shape)\n",
"print(df_train.hazardous.value_counts())\n",
"hazardous_counts = df_train['hazardous'].value_counts()\n",
"plt.figure(figsize=(2, 2))# Установка размера графика\n",
"plt.pie(hazardous_counts, labels=hazardous_counts.index, autopct='%1.1f%%', startangle=90)# Построение круговой диаграммы\n",
"plt.title('Распределение классов hazardous в обучающей выборке')# Добавление заголовка\n",
"plt.show()# Отображение графика\n",
"\n",
"print(\"Контрольная выборка: \", df_val.shape)\n",
"print(df_val.hazardous.value_counts())\n",
"hazardous_counts = df_val['hazardous'].value_counts()\n",
"plt.figure(figsize=(2, 2))\n",
"plt.pie(hazardous_counts, labels=hazardous_counts.index, autopct='%1.1f%%', startangle=90)\n",
"plt.title('Распределение классов hazardous в контрольной выборке')\n",
"plt.show()\n",
"\n",
"print(\"Тестовая выборка: \", df_test.shape)\n",
"print(df_test.hazardous.value_counts())\n",
"hazardous_counts = df_test['hazardous'].value_counts()\n",
"plt.figure(figsize=(2, 2))\n",
"plt.pie(hazardous_counts, labels=hazardous_counts.index, autopct='%1.1f%%', startangle=90)\n",
"plt.title('Распределение классов hazardous в тестовой выборке')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"распределение плохое, соотношение классов сильно смещено, это может привести к проблемам в обучении модели, так как модель будет обучаться в основном на одном классе. В таких случаях стоит рассмотреть методы аугментации данных."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"аугментация данных оверсемплингом(Этот метод увеличивает количество примеров меньшинства)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting imblearn\n",
" Downloading imblearn-0.0-py2.py3-none-any.whl.metadata (355 bytes)\n",
"Collecting imbalanced-learn (from imblearn)\n",
" Downloading imbalanced_learn-0.12.4-py3-none-any.whl.metadata (8.3 kB)\n",
"Requirement already satisfied: numpy>=1.17.3 in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (from imbalanced-learn->imblearn) (2.1.1)\n",
"Requirement already satisfied: scipy>=1.5.0 in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (from imbalanced-learn->imblearn) (1.14.1)\n",
"Requirement already satisfied: scikit-learn>=1.0.2 in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (from imbalanced-learn->imblearn) (1.5.2)\n",
"Requirement already satisfied: joblib>=1.1.1 in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (from imbalanced-learn->imblearn) (1.4.2)\n",
"Requirement already satisfied: threadpoolctl>=2.0.0 in d:\\мии\\aim-pibd-31-kouvshinoff-t-a\\laba\\lib\\site-packages (from imbalanced-learn->imblearn) (3.5.0)\n",
"Downloading imblearn-0.0-py2.py3-none-any.whl (1.9 kB)\n",
"Downloading imbalanced_learn-0.12.4-py3-none-any.whl (258 kB)\n",
"Installing collected packages: imbalanced-learn, imblearn\n",
"Successfully installed imbalanced-learn-0.12.4 imblearn-0.0\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install imblearn"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Обучающая выборка после oversampling: (100447, 6)\n",
"hazardous\n",
"True 51250\n",
"False 49197\n",
"Name: count, dtype: int64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAq0AAADECAYAAAC1OBgQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA/cUlEQVR4nO3dd3gUVdsG8Ht3k+ymQkIqCElIIXQkNMEkgDQJIk0UECkfAiqvAgIKKgkIRpr0agGEAAoovCJdQAkvHektBFBqSEJ6393z/YG7ZrObSpJZkvt3XXslO+XMMzNnZp+dOXNWJoQQICIiIiIyY3KpAyAiIiIiKgqTViIiIiIye0xaiYiIiMjsMWklIiIiIrPHpJWIiIiIzB6TViIiIiIye0xaiYiIiMjsMWklIiIiIrPHpJWIiOgZp9VqER8fj5s3b0odClG5YdJKZAaGDh0KOzs7qcMoM+Hh4ZDJZFKHQfTUoqKicOjQIf37Q4cO4ciRI9IFlMfDhw8xduxYeHp6wsrKCi4uLmjQoAFSUlKkDo2oXJQoaV2zZg1kMpn+pVKp4O/vjzFjxiA2Nra8YiQiIpLEnTt38O677+LChQu4cOEC3n33Xdy5c0fqsHDjxg20bNkSmzZtwqhRo7Bjxw7s27cPv/32G2xtbaUOj0po0qRJkMlkeP31102Ov337tkH+ZWlpCWdnZ7Rt2xZTpkzB33///VTlA0BcXBw++OADBAQEwNraGq6urmjVqhU++ugjpKWlISkpCR4eHmjXrh2EEEbzHzt2DHK5HBMnTgTw78ULNzc3ZGRkGE3v5eWFHj16FBp3fhYlmvof06dPh7e3N7KyshAVFYXly5dj586duHjxImxsbEpTJBERkdnp06cPFixYgCZNmgAAXnjhBfTp00fiqIBRo0bBysoKx44dQ61ataQOh56CEAIbN26El5cXfvnlF6SmpsLe3t7ktAMGDED37t2h1WqRmJiIkydPYsGCBVi4cCG+/fZbvPHGG6Uq//Hjx2jRogVSUlIwfPhwBAQEICEhAefPn8fy5cvxzjvvwMvLCwsWLMAbb7yBr7/+GiNHjtTPr1arMXr0aHh6emLatGkGZT969AjLly/Hhx9+WCYbq9hWr14tAIiTJ08aDB8/frwAIDZs2FCS4ojoH0OGDBG2trZSh1Fsubm5Ijs7u8DxYWFhooSnFyKzpVarxdmzZ8XZs2eFWq2WOhxx6tQpAUDs3btX6lCoDBw4cEAAEAcOHBCWlpZizZo1RtPcunVLABBz5swxGnf79m3h7+8vrKysxNmzZ0tV/uzZswUAceTIEaNxycnJIjMzU//+5ZdfFo6OjuLhw4f6YXPnzhUAxM6dO/XDdJ8DzZo1E25ubiIjI8OgXE9PTxEaGlrAVjGtTNq0duzYEQBw69YtAE8y9gkTJqBx48aws7ODg4MDXn75ZZw7d85o3qysLISHh8Pf3x8qlQoeHh7o06cPYmJiABhfEs//at++vb6sQ4cOQSaT4YcffsCUKVPg7u4OW1tb9OzZ0+TtnOPHj6Nbt26oVq0abGxsEBISUmBbpfbt25tcfnh4uNG069evR2BgIKytreHk5IQ33njD5PILW7e8tFotFixYgIYNG0KlUsHNzQ2jRo1CYmKiwXQFXWofM2aMUZmmYp8zZ47RNgWA7OxshIWFwdfXF0qlErVr18akSZOQnZ1tclvl1b59e6PyZs6cCblcjg0bNpRqe8ydOxdt27ZFjRo1YG1tjcDAQGzZssXk8tevX49WrVrBxsYGjo6OCA4Oxt69ew2m2bVrF0JCQmBvbw8HBwe0bNnSKLbNmzfr96mzszPefPNN3Lt3z2CaoUOHGsTs6OiI9u3b4/Dhw0VuJ5179+6hV69esLOzg4uLCyZMmACNRlPi9c8fi6k6m5OTg6lTpyIwMBDVqlWDra0tgoKCcPDgQYOydPtl7ty5WLBgAXx8fKBUKnH58mUAT9r8tWzZEiqVCj
4+Pli5cqXJdVOr1fj888/183t5eWHKlClG9aig48rLywtDhw7Vv8/NzcW0adPg5+cHlUqFGjVq4MUXX8S+ffsK3cb5mznZ2NigcePG+Oabb0o0n6nXmjVrAPzbRvnmzZvo2rUrbG1tUbNmTUyfPt3otpqUx3dJz5lleRxs374doaGhqFmzJpRKJXx8fPD5558b1XdT66LbF7dv3y7V9iluXdTVOYVCgaZNm6Jp06b46aefIJPJ4OXlZbSs/Ly8vPTbQS6Xw93dHa+//rrBrdy8x1dB8rcRP3bsGFQqFWJiYtCwYUMolUq4u7tj1KhRePz4sdH8xd1vxamzunh1dR0AUlNTERgYCG9vbzx48EA/vLh125TCzmEymcygnXFx1zH/OQT49xjIX15Zfz4XJjIyEg0aNECHDh3QqVMnREZGFnteAPD09MSaNWuQk5OD2bNnl6r8mJgYKBQKtGnTxmicg4MDVCqV/v2yZcuQnZ2N8ePHA3jShCY8PByvv/46Xn75ZaP5p06ditjYWCxfvrxE62VKqZoH5KdLMGvUqAEAuHnzJrZt24bXXnsN3t7eiI2NxcqVKxESEoLLly+jZs2aAACNRoMePXrgt99+wxtvvIEPPvgAqamp2LdvHy5evAgfHx/9MnSXxPOaPHmyyXhmzpwJmUyGjz76CI8ePcKCBQvQqVMnnD17FtbW1gCAAwcO4OWXX0ZgYCDCwsIgl8uxevVqdOzYEYcPH0arVq2Myn3uuecQEREBAEhLS8M777xjctmfffYZ+vfvjxEjRiAuLg6LFy9GcHAw/vzzT1SvXt1onpEjRyIoKAgA8NNPP+Hnn382GD9q1CisWbMGw4YNw/vvv49bt25hyZIl+PPPP3HkyBFYWlqa3A4lkZSUpF+3vLRaLXr27ImoqCiMHDkS9evXx4ULFzB//nxcv34d27ZtK9FyVq9ejU8//RTz5s3DwIEDTU5T1PZYuHAhevbsiUGDBiEnJwebNm3Ca6+9hh07diA0NFQ/3bRp0xAeHo62bdti+vTpsLKywvHjx3HgwAF06dIFwJMPv+HDh6Nhw4aYPHkyqlevjj///BO7d+/Wx6fb9i1btkRERARiY2OxcOFCHDlyxGifOjs7Y/78+QCAu3fvYuHChejevTvu3Lljct/npdFo0LVrV7Ru3Rpz587F/v37MW/ePPj4+BjUteKs/6hRo9CpUyeD8nfv3o3IyEi4uroCAFJSUvDNN99gwIABePvtt5Gamopvv/0WXbt2xYkTJ9CsWTOjfZeVlYWRI0dCqVTCyckJFy5cQJcuXeDi4oLw8HCo1WqEhYXBzc3NaP1GjBiBtWvXol+/fvjwww9x/PhxRERE4MqVK0b7uDjCw8MRERGBESNGoFWrVkhJScGpU6dw5swZdO7cucj558+fD2dnZ6SkpOC7777D22+/DS8vL6PtphMcHIx169bp38+cORMA8Mknn+iHtW3bVv+/RqNBt27d0KZNG8yePRu7d+9GWFgY1Go1pk+frp9OyuM777oUdc4s6+NgzZo1sLOzw/jx42FnZ4cDBw5g6tSpSElJwZw5c556nQtT2rqoVqsN9ndxBAUFYeTIkdBqtbh48SIWLFiA+/fvl+jLbH4JCQnIysrCO++8g44dO2L06NGIiYnB0qVLcfz4cRw/fhxKpRJAyfZbcetsXrm5uejbty/+/vtvHDlyBB4eHvpxT1u3lUql0ZfJkydPYtGiRQbDSrKOJVXen8/Z2dnYunWr/tb5gAEDMGzYMDx8+BDu7u7FjvOFF16Aj4+P0Zf24pbv6ekJjUaDdevWYciQIYUuy8vLC9OmTcPEiRMxdOhQLFu2DBYWFliwYIHJ6YOCgtCxY0fMnj0b77zzjv6cUioluSyrax6wf/9+ERcXJ+7cuSM2bdokatSoIaytrcXdu3eFEEJkZWUJjUZjMO+tW7eEUqkU06dP1w/77rvvBADx1VdfGS1Lq9Xq50MBl8QbNmwoQkJC9O8PHj
woAIhatWqJlJQU/fAff/xRABALFy7Ul+3n5ye6du2qX44QQmRkZAhvb2/RuXNno2W1bdtWNGrUSP8+Li5OABBhYWH
"text/plain": [
"<Figure size 200x200 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from imblearn.over_sampling import ADASYN\n",
"\n",
"# Создание экземпляра ADASYN\n",
"ada = ADASYN()\n",
"\n",
"# Применение ADASYN\n",
"X_resampled, y_resampled = ada.fit_resample(df_train.drop(columns=['hazardous']), df_train['hazardous'])\n",
"\n",
"# Создание нового DataFrame\n",
"df_train_adasyn = pd.DataFrame(X_resampled)\n",
"df_train_adasyn['hazardous'] = y_resampled # Добавление целевой переменной\n",
"\n",
"# Вывод информации о новой выборке\n",
"print(\"Обучающая выборка после oversampling: \", df_train_adasyn.shape)\n",
"print(df_train_adasyn['hazardous'].value_counts())\n",
"hazardous_counts = df_train_adasyn['hazardous'].value_counts()\n",
"plt.figure(figsize=(2, 2))\n",
"plt.pie(hazardous_counts, labels=hazardous_counts.index, autopct='%1.1f%%', startangle=90)\n",
"plt.title('Распределение классов hazardous в тренировачной выборке после ADASYN')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"P.S. можно было использовать ещё SMOTE, SVM-SMOTE, K-means SMOTE, SMOTE-N, SMOTE-NC, RandomOverSampler."
]
},
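{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one of these alternatives, SMOTE, assuming the same `df_train` split produced earlier in this notebook. The names `smote`, `X_smote`, `y_smote`, `df_train_smote` and the value `random_state=42` are illustrative assumptions, not part of the original analysis. The cell has not been executed here, so no output is shown."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from imblearn.over_sampling import SMOTE\n",
"\n",
"# Hypothetical alternative to ADASYN: plain SMOTE with a fixed seed (assumed value)\n",
"smote = SMOTE(random_state=42)\n",
"\n",
"# Resample the training split only; the validation and test sets stay untouched\n",
"X_smote, y_smote = smote.fit_resample(df_train.drop(columns=['hazardous']), df_train['hazardous'])\n",
"\n",
"# Build a new DataFrame with the target column attached\n",
"df_train_smote = pd.DataFrame(X_smote)\n",
"df_train_smote['hazardous'] = y_smote\n",
"\n",
"print(\"Training set after SMOTE: \", df_train_smote.shape)\n",
"print(df_train_smote['hazardous'].value_counts())"
]
},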
{
"cell_type": "markdown",
"metadata": {},
"source": [
"проведём также балансировку данных методом андерсемплинга. Этот метод помогает сбалансировать выборку, уменьшая количество экземпляров класса большинства, чтобы привести его в соответствие с классом меньшинства."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Обучающая выборка после undersampling: (10608, 6)\n",
"hazardous\n",
"False 5304\n",
"True 5304\n",
"Name: count, dtype: int64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAuYAAADECAYAAADTYuRHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABAH0lEQVR4nO3dd1gUV9sG8Ht3gaWrFClqFEFBxIolsWFHxc8aNUZjiy3G5NXEFDXGEpXXmNhbEmM3mthfNdZYYo8aCzZE7I0iUgQW2N3z/UF2w7JLdXEQ7t917aU7O3PmmTNnZp6dPXOQCSEEiIiIiIhIUnKpAyAiIiIiIibmRERERETFAhNzIiIiIqJigIk5EREREVExwMSciIiIiKgYYGJORERERFQMMDEnIiIiIioGmJgTERERERUDTMyJiIhec1qtFrGxsbh9+7bUoRDRS2BiTlQMDBo0CPb29lKHYTZTpkyBTCaTOgyil3b8+HEcOXJE//7IkSM4ceKEdAFl8fTpU4wZMwaVK1eGlZUVXF1d4e/vj8TERKlDo2Jo0KBBqFKlitRhFAtHjhyBTCYzOLaLS/0UKDFftWoVZDKZ/mVtbY3q1atj9OjRiIqKKqoYiYiIJPHgwQOMGjUKYWFhCAsLw6hRo/DgwQOpw8KtW7fQsGFDbNy4ESNGjMCuXbtw4MAB/PHHH7Czs5M6PMqDLp86d+6cyc87d+5cLJJEevUsCrPQtGnT4OXlBZVKhePHj2Pp0qX4/fffceXKFdja2po7RiIiIkn06NED8+bNQ+3atQEAb731Fnr06CFxVMCIESNgZWWF06dPo0KFClKHQ/Ta++mnn6DVaqUOo3CJeceOHdGgQQMAwNChQ+Hs7Iw5c+Zgx44d6Nu3r1kDJKLiR61WQ6vVwsrKSupQiIqUUqnEyZMnceXKFQBAQEAAFAqFpDGdP38ehw4dwv79+5mUkySEEFCpVLCxsZE6FLOxtLSUOgQAZupj3rp1awDAnTt3AABxcXEYN24catWqBXt7ezg6OqJjx464dOmS0bIqlQpTpkxB9erVYW1tDQ8PD/To0QORkZEAgLt37xp0n8n+atmypb4sXZ+hX3/9FRMmTIC7uzvs7OzQpUsXkz89njlzBh06dECZMmVga2uLoKCgHPsOtmzZ0uT6p0yZYjTvunXrEBgYCBsbGzg5OeGdd94xuf7cti0rrVaLefPmoWbNmrC2toabmxtGjBiB58+fG8xXpUoVdO7c2Wg9o0ePNirTVOyzZ882qlMASEtLw+TJk+Hj4wOlUolKlSrh888/R1pamsm6yqply5ZG5c2YMQNyuRy//PJLoerju+++Q5MmTeDs7AwbGxsEBgZi8+bNJte/bt06NGrUCLa2tihXrhxatGiB/fv3G8yzZ88eBAUFwcHBAY6OjmjYsKFRbJs2bdLvUxcXF/Tv3x+PHj0ymGfQoEEGMZcrVw4tW7bEsWPH8qwnnUePHqFbt26wt7eHq6srxo0bB41GU+Dtzx6LqTabnp6Or7/+GoGBgShTpgzs7OzQvHlzHD582KAs3X757rvvMG/ePHh7e0OpVOLatWsAMvvgNmzYENbW1vD29sYPP/xgctvUajW++eYb/fJVqlTBhAkTjNpRTsdVlSpVMGjQIP37jIwMTJ06FdWqVYO1tTWcnZ3RrFkzHDhwINc6zt4lz9bWFrVq1cLy5csLtJyp16pVqwD8+8zA7du3ERwcDDs7O3h6emLatGkQQhiUK+XxXdBzprmPgyVLlqBmzZpQKpXw9PTEhx9+iPj4+Dy3Rbcv7t69W6j6yW9b1LU5hUKBOnXqoE6dOti6dStkMlm+uhlUqVJFXw9yuRzu7u7o06cP7t+/r58n6/GVk+zPbJw+fRrW1taIjIzU15+7uztGjBiBuLg4o+Xzu9/y02Z18eraOgAkJSUhMDAQXl5eePLkiX56ftu2Kbmdw7L3Dc7vNgLAjRs30Lt3b7i6usLGxga+vr
6YOHGi0XxZ911u692zZw+aN28OOzs7ODg4ICQkBFevXs1z+woqazv58ccf9W23YcOGOHv2rNH827dvR0BAAKytrREQEIBt27aZLLeg5599+/ahQYMGsLGx0Z/rDxw4gGbNmqFs2bKwt7eHr68vJkyYoF+2MNeaxYsXo2rVqrC1tUX79u3x4MEDCCHwzTffoGLFirCxsUHXrl2N2rsuzv3796Nu3bqwtraGv78/tm7dmmcdZ+9jXtA637RpE/z9/Q3qvDD91gt1xzw7XRLt7OwMALh9+za2b9+OXr16wcvLC1FRUfjhhx8QFBSEa9euwdPTEwCg0WjQuXNn/PHHH3jnnXfwn//8B0lJSThw4ACuXLkCb29v/Tr69u2LTp06Gax3/PjxJuOZMWMGZDIZvvjiC0RHR2PevHlo27YtLl68qP92d+jQIXTs2BGBgYGYPHky5HI5Vq5cidatW+PYsWNo1KiRUbkVK1ZEaGgoAODFixf44IMPTK570qRJ6N27N4YOHYqYmBgsXLgQLVq0wIULF1C2bFmjZYYPH47mzZsDALZu3Wp0AI0YMQKrVq3C4MGD8fHHH+POnTtYtGgRLly4gBMnTpjlW158fLx+27LSarXo0qULjh8/juHDh6NGjRoICwvD3LlzcfPmTWzfvr1A61m5ciW++uorfP/993j33XdNzpNXfcyfPx9dunRBv379kJ6ejo0bN6JXr17YtWsXQkJC9PNNnToVU6ZMQZMmTTBt2jRYWVnhzJkzOHToENq3bw8g8wI/ZMgQ1KxZE+PHj0fZsmVx4cIF7N27Vx+fru4bNmyI0NBQREVFYf78+Thx4oTRPnVxccHcuXMBAA8fPsT8+fPRqVMnPHjwwOS+z0qj0SA4OBiNGzfGd999h4MHD+L777+Ht7e3QVvLz/aPGDECbdu2NSh/7969WL9+PcqXLw8ASExMxPLly9G3b18MGzYMSUlJ+PnnnxEcHIy//voLdevWNdp3KpUKw4cPh1KphJOTE8LCwtC+fXu4urpiypQpUKvVmDx5Mtzc3Iy2b+jQoVi9ejXefvttfPrppzhz5gxCQ0Nx/fr1HC8auZkyZQpCQ0MxdOhQNGrUCImJiTh37hz+/vtvtGvXLs/l586dCxcXFyQmJmLFihUYNmwYqlSpYlRvOi1atMDatWv172fMmAEABhf1Jk2a6P+v0WjQoUMHvPnmm/j222+xd+9eTJ48GWq1GtOmTdPPJ+XxnXVb8jpnmvs4mDJlCqZOnYq2bdvigw8+QHh4OJYuXYqzZ8+abbtzUti2qFarTSZxuWnevDmGDx8OrVaLK1euYN68eXj8+HGBvrBn9+zZM6hUKnzwwQdo3bo1Ro4cicjISCxevBhnzpzBmTNnoFQqARRsv+W3zWaVkZGBnj174v79+zhx4gQ8PDz0n71s21YqlUZfmM+ePYsFCxYYTMvvNl6+fBnNmzeHpaUlhg8fjipVqiAyMhI7d+7UH89Z6fYdAFy/fh0zZ840+Hzt2rUYOHAggoODMWvWLKSkpGDp0qVo1qwZLly4UCR9xH/55RckJSVhxIgRkMlk+Pbbb9GjRw/cvn1bX5/79+9Hz5494e/vj9DQUDx79gyDBw9GxYoVjcoryD4KDw9H3759MWLECAwbNgy+vr64evUqOnfujNq1a2PatGlQKpW4deuWwU3Ogl5r1q9fj/T0dHz00UeIi4vDt99+i969e6N169Y4cuQIvvjiC9y6dQsLFy7EuHHjsGLFCoPlIyIi0KdPH4wcORIDBw7EypUr0atXL+zduzdf14bC1Pnu3bvRp08f1KpVC6GhoXj+/Dnef//9wv2iJQpg5cqVAoA4ePCgiImJEQ8ePBAbN24Uzs7OwsbGRjx8+FAIIYRKpRIajcZg2Tt37gilUimmTZumn7ZixQoBQMyZM8doXVqtVr8cADF79myjeWrWrCmCgoL07w8fPiwAiAoVKojExET99N9++00AEPPnz9eXXa1aNREcHK
xfjxBCpKSkCC8vL9GuXTujdTVp0kQEBATo38fExAgAYvLkyfppd+/eFQqFQsyYMcNg2bCwMGFhYWE0PSIiQgAQq1e
"text/plain": [
"<Figure size 200x200 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from imblearn.under_sampling import RandomUnderSampler\n",
"\n",
"rus = RandomUnderSampler()# Создание экземпляра RandomUnderSampler\n",
"\n",
"# Применение RandomUnderSampler\n",
"X_resampled, y_resampled = rus.fit_resample(df_train.drop(columns=['hazardous']), df_train['hazardous'])\n",
"\n",
"# Создание нового DataFrame\n",
"df_train_undersampled = pd.DataFrame(X_resampled)\n",
"df_train_undersampled['hazardous'] = y_resampled # Добавление целевой переменной\n",
"\n",
"# Вывод информации о новой выборке\n",
"print(\"Обучающая выборка после undersampling: \", df_train_undersampled.shape)\n",
"print(df_train_undersampled['hazardous'].value_counts())\n",
"\n",
"# Визуализация распределения классов\n",
"hazardous_counts = df_train_undersampled['hazardous'].value_counts()\n",
"plt.figure(figsize=(2, 2))\n",
"plt.pie(hazardous_counts, labels=hazardous_counts.index, autopct='%1.1f%%', startangle=90)\n",
"plt.title('Распределение классов hazardous в тренировочной выборке после Undersampling')\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "laba",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}