{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Laboratory Work No. 2\n",
"## The following datasets were selected:\n",
" - ### 11. Diamond prices.\n",
" - ### 18. Mobile device prices.\n",
" - ### 19. Data on millionaires."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let us start by analyzing dataset No. 11.\n",
"\n",
"Source data: https://www.kaggle.com/datasets/nancyalaswad90/diamonds-prices\n",
"\n",
"**General description**: This dataset contains prices and attributes for almost 54,000 round-cut diamonds (the CSV used here has 53,943 rows). There are 10 features (carat, cut, color, clarity, depth, table, price, x, y, and z). Most variables are numeric, but cut, color, and clarity are ordered categorical variables.\n",
"\n",
"**Problem domain**: Pricing analysis and price forecasting for diamonds.\n",
"\n",
"**Objects of observation**: Diamonds, described by the attributes _Carat, Cut, Color, Clarity, Depth, Table, Price_.\n",
"\n",
"**Business goals**:\n",
"- ***Diamond price forecasting***: Helps buyers and sellers navigate market prices and supports decisions about buying or selling diamonds.\n",
"- ***Analysis of the factors that affect value***: Understanding which characteristics of a diamond (for example, cut quality or color) have the greatest effect on its price can help in developing pricing strategies and improving the product range.\n",
"\n",
"**Technical project goals**:\n",
"1. ***Diamond price forecasting***: Input data: the diamond attributes; target feature: _price_.\n",
"2. ***Analysis of influencing factors***: Input data: the attributes describing the quality and characteristics of a diamond; target: the effect of each attribute on the final price, which can be analyzed with regression methods and data visualization."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\"../data/Diamonds-Prices.csv\")\n",
"print(df.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Attributes:\n",
"- Row index (Unnamed: 0),\n",
"- Weight in carats (carat),\n",
"- Cut quality (cut),\n",
"- Color (color),\n",
"- Clarity (clarity),\n",
"- Total depth percentage (depth),\n",
"- Table width percentage (table),\n",
"- Price (price),\n",
"- Length in mm (x),\n",
"- Width in mm (y),\n",
"- Depth in mm (z)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the data for outliers."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Load the data\n",
"df = pd.read_csv(\"../data/Diamonds-Prices.csv\")\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(df[\"price\"], df[\"carat\"])\n",
"plt.xlabel(\"Price\")\n",
"plt.ylabel(\"Carat\")\n",
"plt.title(\"Scatter plot of price and carat\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The most extreme outlier was observed at a price of about 17,500.\n",
"We will use the interquartile range (IQR) method to remove outliers."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Missing values per column:\n",
"Unnamed: 0 0\n",
"carat 0\n",
"cut 0\n",
"color 0\n",
"clarity 0\n",
"depth 0\n",
"table 0\n",
"price 0\n",
"x 0\n",
"y 0\n",
"z 0\n",
"dtype: int64\n",
"\n",
"Number of duplicates: 0\n",
"\n",
"Statistical summary of the data:\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>carat</th>\n",
" <th>depth</th>\n",
" <th>table</th>\n",
" <th>price</th>\n",
" <th>x</th>\n",
" <th>y</th>\n",
" <th>z</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>53943.000000</td>\n",
" <td>53943.000000</td>\n",
" <td>53943.000000</td>\n",
" <td>53943.000000</td>\n",
" <td>53943.000000</td>\n",
" <td>53943.000000</td>\n",
" <td>53943.000000</td>\n",
" <td>53943.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>26972.000000</td>\n",
" <td>0.797935</td>\n",
" <td>61.749322</td>\n",
" <td>57.457251</td>\n",
" <td>3932.734294</td>\n",
" <td>5.731158</td>\n",
" <td>5.734526</td>\n",
" <td>3.538730</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>15572.147122</td>\n",
" <td>0.473999</td>\n",
" <td>1.432626</td>\n",
" <td>2.234549</td>\n",
" <td>3989.338447</td>\n",
" <td>1.121730</td>\n",
" <td>1.142103</td>\n",
" <td>0.705679</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.200000</td>\n",
" <td>43.000000</td>\n",
" <td>43.000000</td>\n",
" <td>326.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>13486.500000</td>\n",
" <td>0.400000</td>\n",
" <td>61.000000</td>\n",
" <td>56.000000</td>\n",
" <td>950.000000</td>\n",
" <td>4.710000</td>\n",
" <td>4.720000</td>\n",
" <td>2.910000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>26972.000000</td>\n",
" <td>0.700000</td>\n",
" <td>61.800000</td>\n",
" <td>57.000000</td>\n",
" <td>2401.000000</td>\n",
" <td>5.700000</td>\n",
" <td>5.710000</td>\n",
" <td>3.530000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>40457.500000</td>\n",
" <td>1.040000</td>\n",
" <td>62.500000</td>\n",
" <td>59.000000</td>\n",
" <td>5324.000000</td>\n",
" <td>6.540000</td>\n",
" <td>6.540000</td>\n",
" <td>4.040000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>53943.000000</td>\n",
" <td>5.010000</td>\n",
" <td>79.000000</td>\n",
" <td>95.000000</td>\n",
" <td>18823.000000</td>\n",
" <td>10.740000</td>\n",
" <td>58.900000</td>\n",
" <td>31.800000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 carat depth table price \\\n",
"count 53943.000000 53943.000000 53943.000000 53943.000000 53943.000000 \n",
"mean 26972.000000 0.797935 61.749322 57.457251 3932.734294 \n",
"std 15572.147122 0.473999 1.432626 2.234549 3989.338447 \n",
"min 1.000000 0.200000 43.000000 43.000000 326.000000 \n",
"25% 13486.500000 0.400000 61.000000 56.000000 950.000000 \n",
"50% 26972.000000 0.700000 61.800000 57.000000 2401.000000 \n",
"75% 40457.500000 1.040000 62.500000 59.000000 5324.000000 \n",
"max 53943.000000 5.010000 79.000000 95.000000 18823.000000 \n",
"\n",
" x y z \n",
"count 53943.000000 53943.000000 53943.000000 \n",
"mean 5.731158 5.734526 3.538730 \n",
"std 1.121730 1.142103 0.705679 \n",
"min 0.000000 0.000000 0.000000 \n",
"25% 4.710000 4.720000 2.910000 \n",
"50% 5.700000 5.710000 3.530000 \n",
"75% 6.540000 6.540000 4.040000 \n",
"max 10.740000 58.900000 31.800000 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count missing values in every column\n",
"null_values_diamond = df.isnull().sum()\n",
"print(\"Missing values per column:\")\n",
"print(null_values_diamond)\n",
"\n",
"# Count fully duplicated rows\n",
"duplicates = df.duplicated().sum()\n",
"print(f\"\\nNumber of duplicates: {duplicates}\")\n",
"\n",
"print(\"\\nStatistical summary of the data:\")\n",
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Skewness coefficient for column 'Unnamed: 0': 0.0\n",
"\n",
"Skewness coefficient for column 'carat': 1.1167052359880187\n",
"\n",
"Skewness coefficient for column 'depth': -0.08218721424717913\n",
"\n",
"Skewness coefficient for column 'table': 0.7968359775412807\n",
"\n",
"Skewness coefficient for column 'price': 1.6184763222032386\n",
"\n",
"Skewness coefficient for column 'x': 0.37868453466912216\n",
"\n",
"Skewness coefficient for column 'y': 2.4342330799873775\n",
"\n",
"Skewness coefficient for column 'z': 1.5224810204974413\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"# Print the skewness of every numeric column\n",
"for column in df.select_dtypes(include=[np.number]).columns:\n",
"    asymmetry = df[column].skew()\n",
"    print(f\"\\nSkewness coefficient for column '{column}': {asymmetry}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The strong positive skew of `carat`, `price`, `y`, and `z` points to outliers. Let us clean the noise out of the data."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(df[\"price\"], df[\"carat\"])\n",
"plt.xlabel(\"Price\")\n",
"plt.ylabel(\"Carat\")\n",
"plt.xticks(rotation=45)\n",
"plt.title(\"Scatter plot before cleaning\")\n",
"plt.show()\n",
"\n",
"\n",
"# Columns to analyze\n",
"column1 = \"carat\"\n",
"column2 = \"price\"\n",
"\n",
"\n",
"# Keep only the rows whose value in `column` lies within the 1.5 * IQR fences\n",
"def remove_outliers(df, column):\n",
"    Q1 = df[column].quantile(0.25)\n",
"    Q3 = df[column].quantile(0.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_bound = Q1 - 1.5 * IQR\n",
"    upper_bound = Q3 + 1.5 * IQR\n",
"    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]\n",
"\n",
"\n",
"# Remove outliers column by column\n",
"df_cleaned = df.copy()\n",
"for column in [column1, column2]:\n",
"    df_cleaned = remove_outliers(df_cleaned, column)\n",
"\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"# Price on x and carat on y, matching the axis labels and the first plot\n",
"plt.scatter(df_cleaned[column2], df_cleaned[column1])\n",
"plt.xlabel(\"Price\")\n",
"plt.ylabel(\"Carat\")\n",
"plt.xticks(rotation=45)\n",
"plt.title(\"Scatter plot after cleaning\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of rows before outlier removal: 53943\n",
"Number of rows after outlier removal: 49517\n"
]
}
],
"source": [
"# Print the number of rows before and after outlier removal\n",
"print(f\"Number of rows before outlier removal: {len(df)}\")\n",
"print(f\"Number of rows after outlier removal: {len(df_cleaned)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us move on to creating the training, validation, and test sets."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set size: 32365\n",
"Validation set size: 10789\n",
"Test set size: 10789\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Note: the raw CSV is reloaded here, so the splits below do not use df_cleaned\n",
"df = pd.read_csv(\"../data/Diamonds-Prices.csv\")\n",
"\n",
"# Select the features and the target variable\n",
"X = df.drop(\"price\", axis=1)  # all columns except price\n",
"y = df[\"price\"]\n",
"\n",
"# Split the data into a training part (60%) and a remainder (validation + test)\n",
"X_train, X_temp, y_train, y_temp = train_test_split(\n",
"    X, y, test_size=0.4, random_state=42\n",
")\n",
"\n",
"# Split the remainder equally into validation and test sets\n",
"X_val, X_test, y_val, y_test = train_test_split(\n",
"    X_temp, y_temp, test_size=0.5, random_state=42\n",
")\n",
"\n",
"# Print the size of each set\n",
"print(f\"Training set size: {X_train.shape[0]}\")\n",
"print(f\"Validation set size: {X_val.shape[0]}\")\n",
"print(f\"Test set size: {X_test.shape[0]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us analyze how balanced the sets are."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Распределение Price в обучающей выборке:\n",
"price\n",
"327 1\n",
"334 1\n",
"336 1\n",
"337 1\n",
"338 1\n",
" ..\n",
"18791 1\n",
"18795 2\n",
"18797 1\n",
"18804 1\n",
"18806 1\n",
"Name: count, Length: 9476, dtype: int64\n",
"Процент положительных значений: 100.00%\n",
"Процент отрицательных значений: 0.00%\n",
"\n",
"Необходима аугментация данных для балансировки классов.\n",
"\n",
"Распределение Price в контрольной выборке:\n",
"price\n",
"326 2\n",
"340 1\n",
"344 1\n",
"354 1\n",
"357 1\n",
" ..\n",
"18781 1\n",
"18784 1\n",
"18791 1\n",
"18803 1\n",
"18823 1\n",
"Name: count, Length: 5389, dtype: int64\n",
"Percentage of positive values: 100.00%\n",
"Percentage of negative values: 0.00%\n",
"\n",
"Data augmentation is needed to balance the classes.\n",
"\n",
"Price distribution in the test set:\n",
"price\n",
"335 1\n",
"336 1\n",
"337 1\n",
"351 1\n",
"353 1\n",
" ..\n",
"18766 1\n",
"18768 1\n",
"18780 1\n",
"18788 1\n",
"18818 1\n",
"Name: count, Length: 5308, dtype: int64\n",
"Percentage of positive values: 100.00%\n",
"Percentage of negative values: 0.00%\n",
"\n",
"Data augmentation is needed to balance the classes.\n",
"\n"
]
}
],
"source": [
"def analyze_distribution(data, title):\n",
"    print(f\"Price distribution in the {title}:\")\n",
"    distribution = data.value_counts().sort_index()\n",
"    print(distribution)\n",
"    total = len(data)\n",
"    positive_count = (data > 0).sum()\n",
"    negative_count = (data < 0).sum()\n",
"    positive_percent = (positive_count / total) * 100\n",
"    negative_percent = (negative_count / total) * 100\n",
"    print(f\"Percentage of positive values: {positive_percent:.2f}%\")\n",
"    print(f\"Percentage of negative values: {negative_percent:.2f}%\")\n",
"    # Note: this message is printed unconditionally, whatever the distribution looks like\n",
"    print(\"\\nData augmentation is needed to balance the classes.\\n\")\n",
"\n",
"\n",
"# Analyze the distribution in each set\n",
"analyze_distribution(y_train, \"training set\")\n",
"analyze_distribution(y_val, \"validation set\")\n",
"analyze_distribution(y_test, \"test set\")"
]
},
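{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `price` is a continuous target, counting every distinct price says little about balance. One possible alternative is to compare the share of rows per price quantile bin across the sets. The sketch below uses synthetic prices and a hypothetical helper `bin_shares`, so the numbers are illustrative only:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def bin_shares(prices, edges):\n",
"    # Share of rows that fall into each price interval\n",
"    binned = pd.cut(prices, bins=edges)\n",
"    return binned.value_counts(normalize=True, sort=False)\n",
"\n",
"\n",
"rng = np.random.default_rng(42)\n",
"# Synthetic, right-skewed stand-in for the price column\n",
"prices = pd.Series(rng.lognormal(mean=8, sigma=1, size=1000))\n",
"train, test = prices.iloc[:800], prices.iloc[800:]\n",
"\n",
"# Bin edges are taken from the training part only\n",
"edges = np.quantile(train, [0, 0.25, 0.5, 0.75, 1.0])\n",
"edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the training range\n",
"\n",
"print(bin_shares(train, edges))\n",
"print(bin_shares(test, edges))\n",
"```\n",
"\n",
"If the shares in the validation and test sets stay close to the training shares, the random split has preserved the price distribution reasonably well."
]
},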
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply data augmentation (oversampling)."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Price distribution in the training set after oversampling:\n",
"price\n",
"327 85\n",
"334 85\n",
"336 85\n",
"337 85\n",
"338 85\n",
" ..\n",
"18791 85\n",
"18795 85\n",
"18797 85\n",
"18804 85\n",
"18806 85\n",
"Name: count, Length: 9476, dtype: int64\n",
"Percentage of positive values: 100.00%\n",
"Percentage of negative values: 0.00%\n",
"\n",
"Data augmentation is needed to balance the classes.\n",
"\n",
"Price distribution in the validation set:\n",
"price\n",
"326 2\n",
"340 1\n",
"344 1\n",
"354 1\n",
"357 1\n",
" ..\n",
"18781 1\n",
"18784 1\n",
"18791 1\n",
"18803 1\n",
"18823 1\n",
"Name: count, Length: 5389, dtype: int64\n",
"Percentage of positive values: 100.00%\n",
"Percentage of negative values: 0.00%\n",
"\n",
"Data augmentation is needed to balance the classes.\n",
"\n",
"Price distribution in the test set:\n",
"price\n",
"335 1\n",
"336 1\n",
"337 1\n",
"351 1\n",
"353 1\n",
" ..\n",
"18766 1\n",
"18768 1\n",
"18780 1\n",
"18788 1\n",
"18818 1\n",
"Name: count, Length: 5308, dtype: int64\n",
"Percentage of positive values: 100.00%\n",
"Percentage of negative values: 0.00%\n",
"\n",
"Data augmentation is needed to balance the classes.\n",
"\n"
]
}
],
"source": [
"from imblearn.over_sampling import RandomOverSampler\n",
"\n",
"# Apply oversampling to the training set only\n",
"# (RandomOverSampler treats every distinct price as a separate class)\n",
"oversampler = RandomOverSampler(random_state=42)\n",
"X_train_resampled, y_train_resampled = oversampler.fit_resample(X_train, y_train)\n",
"\n",
"# Analyze the distribution in each set\n",
"analyze_distribution(y_train_resampled, \"training set after oversampling\")\n",
"analyze_distribution(y_val, \"validation set\")\n",
"analyze_distribution(y_test, \"test set\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let us analyze dataset No. 18.\n",
"\n",
"Link to the source data: https://www.kaggle.com/datasets/dewangmoghe/mobile-phone-price-prediction\n",
"\n",
"**General description**: This dataset contains prices and attributes for 1370 mobile phones of various configurations and manufacturers. There are 17 characteristics (model name, rating (min 0, max 5), spec-based score (min 0, max 100), dual-SIM and network technology support (3G, 4G, 5G, VoLTE), amount of RAM, battery characteristics, display information, camera characteristics, external memory support, Android version, price, manufacturer, fast charging support, screen resolution, processor type, processor name).\n",
"\n",
"**Problem domain**: Financial analysis and forecasting of mobile phone prices.\n",
"\n",
"**Objects of observation**: a phone, described by the attributes _Name, Rating, Spec_score, No_of_sim, Ram, Battery, Display, Camera, External_Memory, Android_version, Price, company, Inbuilt_memory, fast_charging, Screen_resolution, Processor, Processor_name_.\n",
"\n",
"**Business goals**:\n",
"- ***Predicting mobile phone prices from the spec score***.\n",
"- ***Predicting the spec score from the company and the price***.\n",
"\n",
"**Technical project goals**:\n",
"1. ***Predicting phone prices***: Input data - _spec score_; target feature - _price_,\n",
"2. ***Analysis of influencing factors***: Input data - _company and price_; target feature - _spec score_."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['Unnamed: 0', 'Name', 'Rating', 'Spec_score', 'No_of_sim', 'Ram',\n",
" 'Battery', 'Display', 'Camera', 'External_Memory', 'Android_version',\n",
" 'Price', 'company', 'Inbuilt_memory', 'fast_charging',\n",
" 'Screen_resolution', 'Processor', 'Processor_name'],\n",
" dtype='object')\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\"../data/mobile-phone-price-prediction.csv\")\n",
"print(df.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Attributes:\n",
"- Unnamed row index (Unnamed: 0),\n",
"- Phone name (Name),\n",
"- Rating (Rating),\n",
"- Spec-based score (Spec_score),\n",
"- Number of SIM cards and supported network technologies (No_of_sim),\n",
"- Amount of RAM (Ram),\n",
"- Battery information (Battery),\n",
"- Display information (Display),\n",
"- Camera information (Camera),\n",
"- External memory information (External_Memory),\n",
"- Android version (Android_version),\n",
"- Price (Price),\n",
"- Manufacturer (company),\n",
"- Internal memory information (Inbuilt_memory),\n",
"- Fast charging (fast_charging),\n",
"- Screen resolution (Screen_resolution),\n",
"- Processor type (Processor),\n",
"- Processor name (Processor_name)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1400x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(14, 6))\n",
"\n",
"# Scatter plot of the spec score by company (names are lowercased to merge case variants)\n",
"plt.scatter(df[\"company\"].str.lower(), df[\"Spec_score\"])\n",
"plt.xlabel(\"Company\")\n",
"plt.ylabel(\"Spec score\")\n",
"plt.xticks(rotation=45)\n",
"plt.title(\"Figure 1\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The attributes are related to each other. For example, the scatter plot above (Figure 1) shows the relationship between the company and the spec score."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, check the data for missing values, duplicates, and outliers."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Missing values per column:\n",
"Unnamed: 0 0\n",
"Name 0\n",
"Rating 0\n",
"Spec_score 0\n",
"No_of_sim 0\n",
"Ram 0\n",
"Battery 0\n",
"Display 0\n",
"Camera 0\n",
"External_Memory 0\n",
"Android_version 443\n",
"Price 0\n",
"company 0\n",
"Inbuilt_memory 19\n",
"fast_charging 89\n",
"Screen_resolution 2\n",
"Processor 28\n",
"Processor_name 0\n",
"dtype: int64\n",
"\n",
"Number of duplicates: 0\n",
"\n",
"Statistical summary of the data:\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Rating</th>\n",
" <th>Spec_score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>1370.000000</td>\n",
" <td>1370.000000</td>\n",
" <td>1370.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>684.500000</td>\n",
" <td>4.374416</td>\n",
" <td>80.234307</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>395.629246</td>\n",
" <td>0.230176</td>\n",
" <td>8.373922</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>0.000000</td>\n",
" <td>3.750000</td>\n",
" <td>42.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>342.250000</td>\n",
" <td>4.150000</td>\n",
" <td>75.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>684.500000</td>\n",
" <td>4.400000</td>\n",
" <td>82.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>1026.750000</td>\n",
" <td>4.550000</td>\n",
" <td>86.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>1369.000000</td>\n",
" <td>4.750000</td>\n",
" <td>98.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Rating Spec_score\n",
"count 1370.000000 1370.000000 1370.000000\n",
"mean 684.500000 4.374416 80.234307\n",
"std 395.629246 0.230176 8.373922\n",
"min 0.000000 3.750000 42.000000\n",
"25% 342.250000 4.150000 75.000000\n",
"50% 684.500000 4.400000 82.000000\n",
"75% 1026.750000 4.550000 86.000000\n",
"max 1369.000000 4.750000 98.000000"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"null_values = df.isnull().sum()\n",
"print(\"Missing values per column:\")\n",
"print(null_values)\n",
"\n",
"duplicates = df.duplicated().sum()\n",
"print(f\"\\nNumber of duplicates: {duplicates}\")\n",
"\n",
"print(\"\\nStatistical summary of the data:\")\n",
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are missing values but no duplicates. Drop the rows that contain missing values."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Removed 553 rows with missing values from the 'Phones' dataset.\n"
]
}
],
"source": [
"def drop_missing_values(dataframe, name):\n",
"    before_shape = dataframe.shape\n",
"    cleaned_dataframe = dataframe.dropna()\n",
"    after_shape = cleaned_dataframe.shape\n",
"    print(\n",
"        f\"Removed {before_shape[0] - after_shape[0]} rows with missing values from the '{name}' dataset.\"\n",
"    )\n",
"    return cleaned_dataframe\n",
"\n",
"\n",
"cleaned_df = drop_missing_values(df, \"Phones\")"
]
},
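{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dropping every row with a gap removed 553 of 1370 rows, and `Android_version` alone accounts for 443 missing values. Before dropping rows, it may help to look at the share of missing values per column. A minimal sketch on a made-up toy frame; `missing_share` is a hypothetical helper, not part of this notebook:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"\n",
"def missing_share(frame):\n",
"    # Fraction of missing values in each column, largest first\n",
"    return frame.isnull().mean().sort_values(ascending=False)\n",
"\n",
"\n",
"# Made-up rows that only mimic the column names of the dataset\n",
"toy = pd.DataFrame({\n",
"    'Android_version': ['12', None, None, '14'],\n",
"    'Price': ['9,999', '6,950', '8,999', '7,990'],\n",
"    'fast_charging': ['10W', None, '10W', '10W'],\n",
"})\n",
"print(missing_share(toy))\n",
"# Rows left after dropping every row that has at least one gap\n",
"print(len(toy.dropna()))\n",
"```\n",
"\n",
"When a single column causes most of the loss, dropping or imputing that column instead of the rows may keep more data; whether that is appropriate here depends on how the column is used later."
]
},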
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compute the skewness coefficient."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Skewness coefficient for column 'Unnamed: 0': 0.0\n",
"\n",
"Skewness coefficient for column 'Rating': -0.06697860128699223\n",
"\n",
"Skewness coefficient for column 'Spec_score': -0.7393772365886471\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"# Skewness of every numeric column (computed on df, not on cleaned_df)\n",
"for column in df.select_dtypes(include=[np.number]).columns:\n",
"    asymmetry = df[column].skew()\n",
"    print(f\"\\nSkewness coefficient for column '{column}': {asymmetry}\")"
]
},
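{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the value returned by `Series.skew()` can be reproduced by hand. The sketch below assumes that pandas reports the bias-adjusted Fisher-Pearson coefficient, sqrt(n(n-1))/(n-2) * m3 / m2^1.5, where m2 and m3 are the second and third central moments; `adjusted_skew` is a hypothetical helper and the scores are made up:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def adjusted_skew(values):\n",
"    # Bias-adjusted Fisher-Pearson skewness: sqrt(n(n-1))/(n-2) * m3 / m2**1.5\n",
"    x = np.asarray(values, dtype=float)\n",
"    n = x.size\n",
"    d = x - x.mean()\n",
"    m2 = np.mean(d ** 2)\n",
"    m3 = np.mean(d ** 3)\n",
"    return np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5\n",
"\n",
"\n",
"scores = pd.Series([42, 75, 80, 82, 84, 86, 90, 98])  # made-up values\n",
"print(adjusted_skew(scores), scores.skew())\n",
"```\n",
"\n",
"A negative value, as for `Spec_score` above, means the left tail is longer: a few phones score well below the bulk."
]
},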
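{
"cell_type": "markdown",
"metadata": {},
"source": [
"The skewness coefficients are small in magnitude (the largest is about -0.74 for Spec_score), so the outliers are minor.\n",
"\n",
"Next, clean the data of noise."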
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Outliers in the dataset:\n",
" Unnamed: 0 Name Rating Spec_score \\\n",
"99 99 Vivo Y02 4.35 54 \n",
"214 214 Realme C30s 4.55 58 \n",
"802 802 Vivo Y02 (2GB RAM + 32GB) 4.50 53 \n",
"803 803 Vivo Y02 4.35 54 \n",
"1344 1344 TCL 501 4.25 55 \n",
"\n",
" No_of_sim Ram Battery Display \\\n",
"99 Dual Sim, 3G, 4G, VoLTE, 3 GB RAM 5000 mAh Battery 6.51 inches \n",
"214 Dual Sim, 3G, 4G, VoLTE, 2 GB RAM 5000 mAh Battery 6.5 inches \n",
"802 Dual Sim, 3G, 4G, VoLTE, 2 GB RAM 5000 mAh Battery 6.51 inches \n",
"803 Dual Sim, 3G, 4G, VoLTE, 3 GB RAM 5000 mAh Battery 6.51 inches \n",
"1344 Dual Sim, 3G, 4G, VoLTE, 2 GB RAM 3000 mAh Battery 6 inches \n",
"\n",
" Camera External_Memory \\\n",
"99 8 MP Rear & 5 MP Front Camera Memory Card Supported, upto 1 TB \n",
"214 8 MP Rear & 5 MP Front Camera Memory Card Supported, upto 1 TB \n",
"802 8 MP Rear & 5 MP Front Camera Memory Card Supported, upto 1 TB \n",
"803 8 MP Rear & 5 MP Front Camera Memory Card Supported, upto 1 TB \n",
"1344 5 MP Rear & 2 MP Front Camera Memory Card Supported \n",
"\n",
" Android_version Price company Inbuilt_memory fast_charging \\\n",
"99 12 9,999 Vivo 32 GB inbuilt 10W Fast Charging \n",
"214 12 6,950 Realme 32 GB inbuilt 10W Fast Charging \n",
"802 12 8,999 Vivo 32 GB inbuilt 10W Fast Charging \n",
"803 12 8,489 Vivo 32 GB inbuilt 10W Fast Charging \n",
"1344 14 7,990 TCL 32 GB inbuilt 10W Fast Charging \n",
"\n",
" Screen_resolution Processor \\\n",
"99 720 x 1600 px Display with Water Drop Notch Octa Core Processor \n",
"214 720 x 1600 px Display with Water Drop Notch Octa Core \n",
"802 720 x 1600 px Display with Water Drop Notch Octa Core Processor \n",
"803 720 x 1600 px Display with Water Drop Notch Octa Core Processor \n",
"1344 540 x 1092 px Display Octa Core \n",
"\n",
" Processor_name \n",
"99 Helio \n",
"214 Unisoc SC9863A \n",
"802 Helio \n",
"803 Helio \n",
"1344 Helio G36 \n"
]
},
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(cleaned_df[\"company\"].str.lower(), cleaned_df[\"Spec_score\"])\n",
"plt.xlabel(\"Company\")\n",
"plt.ylabel(\"Spec score\")\n",
"plt.xticks(rotation=45)\n",
"plt.title(\"Scatter plot before cleaning\")\n",
"plt.show()\n",
"\n",
"# Quartiles and interquartile range of the spec score\n",
"Q1 = cleaned_df[\"Spec_score\"].quantile(0.25)\n",
"Q3 = cleaned_df[\"Spec_score\"].quantile(0.75)\n",
"\n",
"IQR = Q3 - Q1\n",
"\n",
"# Values more than 1.5 * IQR beyond the quartiles are treated as outliers\n",
"threshold = 1.5 * IQR\n",
"lower_bound = Q1 - threshold\n",
"upper_bound = Q3 + threshold\n",
"\n",
"outliers = (cleaned_df[\"Spec_score\"] < lower_bound) | (\n",
"    cleaned_df[\"Spec_score\"] > upper_bound\n",
")\n",
"\n",
"print(\"Outliers in the dataset:\")\n",
"print(cleaned_df[outliers])\n",
"\n",
"# Replace the outliers with the median spec score\n",
"median_score = cleaned_df[\"Spec_score\"].median()\n",
"cleaned_df.loc[outliers, \"Spec_score\"] = median_score\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(cleaned_df[\"company\"].str.lower(), cleaned_df[\"Spec_score\"])\n",
"plt.xlabel(\"Company\")\n",
"plt.ylabel(\"Spec score\")\n",
"plt.xticks(rotation=45)\n",
"plt.title(\"Scatter plot after cleaning\")\n",
"plt.show()"
]
},
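{
"cell_type": "markdown",
"metadata": {},
"source": [
"The outlier rule used in the cell above (fences at 1.5 IQR beyond the quartiles) can be wrapped in a small function. A minimal sketch with made-up scores; `iqr_bounds` is a hypothetical helper name, not part of this notebook:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"\n",
"def iqr_bounds(series, k=1.5):\n",
"    # Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR\n",
"    q1, q3 = series.quantile(0.25), series.quantile(0.75)\n",
"    iqr = q3 - q1\n",
"    return q1 - k * iqr, q3 + k * iqr\n",
"\n",
"\n",
"scores = pd.Series([53, 75, 78, 80, 82, 84, 86, 88])  # made-up values\n",
"low, high = iqr_bounds(scores)\n",
"mask = (scores < low) | (scores > high)\n",
"# Replace flagged values with the median, as in the cell above\n",
"cleaned = scores.mask(mask, scores.median())\n",
"print(low, high, cleaned.tolist())\n",
"```\n",
"\n",
"Replacing with the median keeps the row count unchanged, at the cost of pulling the flagged phones toward the centre of the distribution."
]
},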
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Split the data into training, validation, and test sets."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set size: 489\n",
"Validation set size: 164\n",
"Test set size: 164\n",
"\n",
"Spec score distribution in the training set:\n",
"Spec_score\n",
"75 48\n",
"86 35\n",
"80 34\n",
"84 32\n",
"85 23\n",
"78 23\n",
"83 23\n",
"77 19\n",
"79 19\n",
"82 18\n",
"89 17\n",
"88 17\n",
"71 16\n",
"73 15\n",
"72 13\n",
"74 13\n",
"87 12\n",
"69 11\n",
"76 10\n",
"81 10\n",
"67 9\n",
"90 9\n",
"70 8\n",
"68 8\n",
"91 8\n",
"64 7\n",
"93 7\n",
"92 6\n",
"66 5\n",
"94 4\n",
"63 4\n",
"96 2\n",
"95 1\n",
"65 1\n",
"60 1\n",
"61 1\n",
"Name: count, dtype: int64\n",
"\n",
"Spec score distribution in the validation set:\n",
"Spec_score\n",
"75 18\n",
"81 12\n",
"74 11\n",
"79 9\n",
"82 9\n",
"85 9\n",
"84 8\n",
"86 8\n",
"76 7\n",
"78 7\n",
"77 7\n",
"83 6\n",
"89 5\n",
"71 5\n",
"72 5\n",
"80 4\n",
"70 4\n",
"88 3\n",
"68 3\n",
"65 3\n",
"73 3\n",
"67 2\n",
"87 2\n",
"63 2\n",
"95 2\n",
"93 2\n",
"90 2\n",
"94 1\n",
"66 1\n",
"92 1\n",
"69 1\n",
"98 1\n",
"61 1\n",
"Name: count, dtype: int64\n",
"\n",
"Spec score distribution in the test set:\n",
"Spec_score\n",
"75 15\n",
"84 13\n",
"76 11\n",
"82 10\n",
"81 9\n",
"80 9\n",
"77 8\n",
"83 8\n",
"86 7\n",
"89 6\n",
"78 6\n",
"79 6\n",
"87 5\n",
"71 5\n",
"74 5\n",
"85 5\n",
"70 4\n",
"94 3\n",
"72 3\n",
"73 3\n",
"66 3\n",
"91 3\n",
"88 3\n",
"92 3\n",
"93 2\n",
"96 1\n",
"64 1\n",
"90 1\n",
"67 1\n",
"62 1\n",
"65 1\n",
"68 1\n",
"95 1\n",
"69 1\n",
"Name: count, dtype: int64\n",
"\n"
]
}
],
"source": [
"# Hold out 20% for testing, then 25% of the remaining 80% for validation (60/20/20 overall)\n",
"train_df, test_df = train_test_split(cleaned_df, test_size=0.2, random_state=42)\n",
"\n",
"train_df, val_df = train_test_split(train_df, test_size=0.25, random_state=42)\n",
"\n",
"print(\"Training set size:\", len(train_df))\n",
"print(\"Validation set size:\", len(val_df))\n",
"print(\"Test set size:\", len(test_df))\n",
"\n",
"print()\n",
"\n",
"\n",
"def check_balance(df, name):\n",
"    # Number of rows per distinct spec score, most frequent first\n",
"    counts = df[\"Spec_score\"].value_counts()\n",
"    print(f\"Spec score distribution in the {name}:\")\n",
"    print(counts)\n",
"    print()\n",
"\n",
"\n",
"check_balance(train_df, \"training set\")\n",
"check_balance(val_df, \"validation set\")\n",
"check_balance(test_df, \"test set\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Oversampling and undersampling"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Oversampling:\n",
"Spec score distribution in the training set:\n",
"Spec_score\n",
"85 48\n",
"78 48\n",
"75 48\n",
"82 48\n",
"64 48\n",
"73 48\n",
"79 48\n",
"87 48\n",
"86 48\n",
"80 48\n",
"70 48\n",
"83 48\n",
"68 48\n",
"74 48\n",
"71 48\n",
"72 48\n",
"66 48\n",
"93 48\n",
"77 48\n",
"88 48\n",
"69 48\n",
"89 48\n",
"84 48\n",
"94 48\n",
"76 48\n",
"95 48\n",
"90 48\n",
"63 48\n",
"81 48\n",
"67 48\n",
"91 48\n",
"92 48\n",
"96 48\n",
"65 48\n",
"60 48\n",
"61 48\n",
"Name: count, dtype: int64\n",
"\n",
"Spec score distribution in the validation set:\n",
"Spec_score\n",
"75 18\n",
"94 18\n",
"72 18\n",
"82 18\n",
"70 18\n",
"74 18\n",
"68 18\n",
"88 18\n",
"71 18\n",
"80 18\n",
"92 18\n",
"86 18\n",
"66 18\n",
"81 18\n",
"84 18\n",
"79 18\n",
"73 18\n",
"76 18\n",
"67 18\n",
"95 18\n",
"78 18\n",
"85 18\n",
"83 18\n",
"77 18\n",
"89 18\n",
"98 18\n",
"69 18\n",
"90 18\n",
"87 18\n",
"65 18\n",
"63 18\n",
"93 18\n",
"61 18\n",
"Name: count, dtype: int64\n",
"\n",
"Spec score distribution in the test set:\n",
"Spec_score\n",
"80 15\n",
"94 15\n",
"82 15\n",
"77 15\n",
"75 15\n",
"79 15\n",
"96 15\n",
"83 15\n",
"76 15\n",
"71 15\n",
"64 15\n",
"78 15\n",
"84 15\n",
"91 15\n",
"74 15\n",
"93 15\n",
"87 15\n",
"89 15\n",
"81 15\n",
"66 15\n",
"86 15\n",
"92 15\n",
"88 15\n",
"73 15\n",
"90 15\n",
"67 15\n",
"85 15\n",
"72 15\n",
"62 15\n",
"70 15\n",
"65 15\n",
"68 15\n",
"95 15\n",
"69 15\n",
"Name: count, dtype: int64\n",
"\n",
"Undersampling:\n",
"Spec score distribution in the training set:\n",
"Spec_score\n",
"60 1\n",
"61 1\n",
"63 1\n",
"64 1\n",
"65 1\n",
"66 1\n",
"67 1\n",
"68 1\n",
"69 1\n",
"70 1\n",
"71 1\n",
"72 1\n",
"73 1\n",
"74 1\n",
"75 1\n",
"76 1\n",
"77 1\n",
"78 1\n",
"79 1\n",
"80 1\n",
"81 1\n",
"82 1\n",
"83 1\n",
"84 1\n",
"85 1\n",
"86 1\n",
"87 1\n",
"88 1\n",
"89 1\n",
"90 1\n",
"91 1\n",
"92 1\n",
"93 1\n",
"94 1\n",
"95 1\n",
"96 1\n",
"Name: count, dtype: int64\n",
"\n",
"Spec score distribution in the validation set:\n",
"Spec_score\n",
"61 1\n",
"63 1\n",
"65 1\n",
"66 1\n",
"67 1\n",
"68 1\n",
"69 1\n",
"70 1\n",
"71 1\n",
"72 1\n",
"73 1\n",
"74 1\n",
"75 1\n",
"76 1\n",
"77 1\n",
"78 1\n",
"79 1\n",
"80 1\n",
"81 1\n",
"82 1\n",
"83 1\n",
"84 1\n",
"85 1\n",
"86 1\n",
"87 1\n",
"88 1\n",
"89 1\n",
"90 1\n",
"92 1\n",
"93 1\n",
"94 1\n",
"95 1\n",
"98 1\n",
"Name: count, dtype: int64\n",
"\n",
"Распределение оценки характеристик в тестовой выборке:\n",
"Spec_score\n",
"62 1\n",
"64 1\n",
"65 1\n",
"66 1\n",
"67 1\n",
"68 1\n",
"69 1\n",
"70 1\n",
"71 1\n",
"72 1\n",
"73 1\n",
"74 1\n",
"75 1\n",
"76 1\n",
"77 1\n",
"78 1\n",
"79 1\n",
"80 1\n",
"81 1\n",
"82 1\n",
"83 1\n",
"84 1\n",
"85 1\n",
"86 1\n",
"87 1\n",
"88 1\n",
"89 1\n",
"90 1\n",
"91 1\n",
"92 1\n",
"93 1\n",
"94 1\n",
"95 1\n",
"96 1\n",
"Name: count, dtype: int64\n",
"\n"
]
}
],
"source": [
"import pandas as pd\n",
"from imblearn.over_sampling import RandomOverSampler\n",
"from imblearn.under_sampling import RandomUnderSampler\n",
"\n",
"def oversample(df, target_column):\n",
"    # Randomly duplicate rows of the smaller classes until every class\n",
"    # has as many rows as the largest one.\n",
"    X = df.drop(target_column, axis=1)\n",
"    y = df[target_column]\n",
"\n",
"    oversampler = RandomOverSampler(random_state=42)\n",
"    x_resampled, y_resampled = oversampler.fit_resample(X, y)\n",
"\n",
"    resampled_df = pd.concat([x_resampled, y_resampled], axis=1)\n",
"    return resampled_df\n",
"\n",
"\n",
"def undersample(df, target_column):\n",
"    # Randomly drop rows of the larger classes until every class\n",
"    # has as many rows as the smallest one.\n",
"    X = df.drop(target_column, axis=1)\n",
"    y = df[target_column]\n",
"\n",
"    undersampler = RandomUnderSampler(random_state=42)\n",
"    x_resampled, y_resampled = undersampler.fit_resample(X, y)\n",
"\n",
"    resampled_df = pd.concat([x_resampled, y_resampled], axis=1)\n",
"    return resampled_df\n",
"\n",
"train_df_oversampled = oversample(train_df, \"Spec_score\")\n",
"val_df_oversampled = oversample(val_df, \"Spec_score\")\n",
"test_df_oversampled = oversample(test_df, \"Spec_score\")\n",
"\n",
"train_df_undersampled = undersample(train_df, \"Spec_score\")\n",
"val_df_undersampled = undersample(val_df, \"Spec_score\")\n",
"test_df_undersampled = undersample(test_df, \"Spec_score\")\n",
"\n",
"print(\"Оверсэмплинг:\")\n",
"check_balance(train_df_oversampled, \"обучающей выборке\")\n",
"check_balance(val_df_oversampled, \"контрольной выборке\")\n",
"check_balance(test_df_oversampled, \"тестовой выборке\")\n",
"\n",
"print(\"Андерсэмплинг:\")\n",
"check_balance(train_df_undersampled, \"обучающей выборке\")\n",
"check_balance(val_df_undersampled, \"контрольной выборке\")\n",
"check_balance(test_df_undersampled, \"тестовой выборке\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's begin analyzing dataset No. 19.\n",
"\n",
"Source data: https://www.kaggle.com/datasets/surajjha101/forbes-billionaires-data-preprocessed\n",
"\n",
"**General description**: *The World's Billionaires* is an annual ranking of the documented net worth of the world's wealthiest billionaires, compiled and published every March by the American business magazine Forbes. The list was first published in March 1987. Each person's total net worth is estimated in US dollars from their documented assets, taking debt and other factors into account. Royalty and dictators whose wealth stems from their position are excluded from these lists. The ranking is an index of the wealthiest documented individuals and leaves out anyone whose wealth cannot be fully ascertained.\n",
"\n",
"**Problem domain**: Analysis of the net worth, age, and sources of wealth of the richest people in the world.\n",
"\n",
"**Objects of observation**: The world's richest people represented in the dataset.\n",
"\n",
"**Relationships between objects**: the following relationships can be identified:\n",
"- Between age and net worth\n",
"- Between country of residence and source of income\n",
"- Between industry and level of wealth.\n",
"\n",
"**Business goals**:\n",
"- ***Understand success factors***: Investigate which factors (age, country, source of income) are associated with large fortunes. This may help new entrepreneurs and startups learn from the experience of successful people.\n",
"- ***Analyze wealth trends***: Understand how sources of wealth change over time and how this relates to economic conditions in different countries. This can help investors and analysts identify which sectors may be the most promising for future investment.\n",
"\n",
"**Technical project goals**:\n",
"1. ***Success factor study***: Input data - data on the richest people (age, net worth, industry); target feature - identification of the factors that contribute to accumulating wealth.\n",
"2. ***Wealth trend analysis***: Input data - data on the richest people (age, country, source of wealth); target feature - presence of a dependency between source of wealth and country."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['Rank ', 'Name', 'Networth', 'Age', 'Country', 'Source', 'Industry'], dtype='object')\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\"../data/Forbes Billionaires.csv\")\n",
"print(df.columns)"
]
},
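{
"cell_type": "markdown",
"metadata": {},
"source": [
"The column index printed above shows that the first column is named `'Rank '`, with a trailing space, which is easy to miss when selecting it. A minimal sketch of normalizing the names without modifying `df`; the name `df_clean` is an assumption introduced here and is not used elsewhere in the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: strip surrounding whitespace from the column names on a copy.\n",
"# `df_clean` is a hypothetical name; the original `df` is left untouched.\n",
"df_clean = df.rename(columns=lambda name: name.strip())\n",
"print(list(df_clean.columns))"
]
},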
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Attributes:\n",
"- Rank (`Rank`; in the file the column name has a trailing space: `'Rank '`),\n",
"- Name (`Name`),\n",
"- Net worth, in billions of US dollars (`Networth`),\n",
"- Age (`Age`),\n",
"- Country (`Country`),\n",
"- Source of income (`Source`),\n",
"- Industry (`Industry`)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the relationships."
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"\n",
"# Relationship between age and net worth\n",
"plt.subplot(2, 2, 1)\n",
"sns.scatterplot(data=df, x=\"Age\", y=\"Networth\")\n",
"plt.title(\"Связь между возрастом и состоянием\")\n",
"plt.xlabel(\"Возраст\")\n",
"plt.ylabel(\"Состояние (млрд)\")\n",
"plt.show()\n",
"\n",
"# Relationship between country of residence and net worth (top 10 countries)\n",
"plt.subplot(2, 2, 2)\n",
"top_countries = df[\"Country\"].value_counts().index[:10]\n",
"sns.boxplot(data=df[df[\"Country\"].isin(top_countries)], x=\"Country\", y=\"Networth\")\n",
"plt.title(\"Связь между страной проживания и состоянием\")\n",
"plt.xticks(rotation=90)\n",
"plt.xlabel(\"Страна\")\n",
"plt.ylabel(\"Состояние (млрд)\")\n",
"plt.show()\n",
"\n",
"# Relationship between source of income and net worth (top 10 sources)\n",
"plt.subplot(2, 2, 3)\n",
"top_sources = df[\"Source\"].value_counts().index[:10]\n",
"sns.boxplot(data=df[df[\"Source\"].isin(top_sources)], x=\"Source\", y=\"Networth\")\n",
"plt.title(\"Связь между источником дохода и состоянием\")\n",
"plt.xticks(rotation=90)\n",
"plt.xlabel(\"Источник дохода\")\n",
"plt.ylabel(\"Состояние (млрд)\")\n",
"plt.show()\n",
"\n",
"# Relationship between industry and net worth (top 10 industries)\n",
"plt.subplot(2, 2, 4)\n",
"top_industries = df[\"Industry\"].value_counts().index[:10]\n",
"sns.boxplot(data=df[df[\"Industry\"].isin(top_industries)], x=\"Industry\", y=\"Networth\")\n",
"plt.title(\"Связь между отраслью и состоянием\")\n",
"plt.xticks(rotation=90)\n",
"plt.xlabel(\"Отрасль\")\n",
"plt.ylabel(\"Состояние (млрд)\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, check the data for missing values and outliers."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Пропущенные значения в данных:\n",
" Rank 0\n",
"Name 0\n",
"Networth 0\n",
"Age 0\n",
"Country 0\n",
"Source 0\n",
"Industry 0\n",
"dtype: int64\n"
]
}
],
"source": [
"missing_values = df.isnull().sum()\n",
"print(\"Пропущенные значения в данных:\\n\", missing_values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No missing values were found.\n"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1500x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Размер данных до удаления выбросов: (2600, 7)\n"
]
}
],
"source": [
"fig, axs = plt.subplots(1, 2, figsize=(15, 5))\n",
"\n",
"sns.boxplot(data=df, x='Networth', ax=axs[0])\n",
"axs[0].set_title(\"Выбросы по состоянию\")\n",
"\n",
"sns.boxplot(data=df, x=\"Age\", ax=axs[1])\n",
"axs[1].set_title(\"Выбросы по возрасту\")\n",
"\n",
"plt.show()\n",
"print(\"Размер данных до удаления выбросов: \", df.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No outliers are visible in this case; the data lie within the acceptable range."
]
},
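{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visual impression from the box plots can be cross-checked numerically. A minimal sketch, assuming the common 1.5 x IQR rule; the helper name `count_iqr_outliers` and its parameter `k` are assumptions introduced here, and this is not necessarily the criterion the author applied."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def count_iqr_outliers(series, k=1.5):\n",
"    # Count values outside the interval [Q1 - k*IQR, Q3 + k*IQR].\n",
"    q1, q3 = series.quantile([0.25, 0.75])\n",
"    iqr = q3 - q1\n",
"    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)\n",
"    return int(mask.sum())\n",
"\n",
"# Hypothetical check on the two numeric columns plotted above.\n",
"for column in [\"Networth\", \"Age\"]:\n",
"    print(column, count_iqr_outliers(df[column]))"
]
},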
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Histogram of the net worth distribution\n",
"plt.figure(figsize=(12, 6))\n",
"sns.histplot(df['Networth'], bins=10, kde=True)\n",
"plt.title(\"Histogram of the net worth distribution\")\n",
"plt.xlabel(\"Net worth (billions of US dollars)\")\n",
"plt.ylabel(\"Frequency\")\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The net worth distribution is strongly right-skewed: most values are concentrated at the low end of the range, and only a small number are very large. In other words, most people in the dataset have a net worth of a few billion dollars, while a handful hold fortunes that are far larger."
]
},
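{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough way to put a number on the skew described above, the sketch below compares the mean with the median and computes the sample skewness. The series `networth_demo` is a hypothetical stand-in with made-up values, not the notebook's data; with the real data one would presumably pass `df['Networth']` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in values (in billions), not taken from the dataset\n",
"networth_demo = pd.Series([1.0, 1.2, 1.5, 2.4, 3.0, 4.3, 10.0, 219.0])\n",
"\n",
"# For a right-skewed distribution the mean exceeds the median\n",
"# and the sample skewness is positive\n",
"print('mean:', networth_demo.mean())\n",
"print('median:', networth_demo.median())\n",
"print('skewness:', networth_demo.skew())"
]
},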
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABAwAAAKICAYAAAD0EmiCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3QU5dvG8XtTgZBCIIXQA6GETmihl0joVRBEuqAISJOmNOlFBZGqIqACKh2RXhSVIlXpItIEQiehpd/vH7w7v6wTIISQBPx+ztlzkpnZee6Znd2dueaZWYuqqgAAAAAAACRgl9YFAAAAAACA9IfAAAAAAAAAmBAYAAAAAAAAEwIDAAAAAABgQmAAAAAAAABMCAwAAAAAAIAJgQEAAAAAADAhMAAAAAAAACYEBgAAAAAAwITAAAAApDuHDx+WlStXGv8fPHhQfvjhh7QrCACA/yACAwBAks2fP18sFovs3bvXNO6zzz4Ti8UiTZs2lbi4uDSoDi+S27dvyxtvvCG7du2SkydPSu/eveXQoUNpXRYAAP8pDmldAADg+bdixQrp3r27VK1aVb755huxt7dP65LwnAsODjYeIiIFCxaUrl27pnFVAAD8txAYAACeyo8//iht2rSRwMBA+f777yVDhgxpXRJeECtXrpSjR4/K/fv3pXjx4uLk5JTWJQEA8J/CJQkAgGQ7ePCgNGnSRLJnzy4bNmwQd3d30zRLliyRoKAgyZgxo2TLlk1ee+01uXDhQqLzs1gsiT7OnDljM83IkSNtnjd58mSxWCxSo0YNY9jIkSPFYrGY2sibN6907NjRZtitW7ekT58+kitXLnF2dpYCBQrIxIkTJT4+3ma6+Ph4+fjjj6V48eKSIUMG8fLykrp16xqXaDysfuvDWt+PP/5oM9zZ2VkKFiwo48ePF1W1afPAgQNSr149cXNzk8yZM0vt2rVl165dia6/xFjXw78fCdeB9VKThOt5w4YNUqlSJcmUKZO4u7tLw4YN5fDhw4m2UaNGjUTb+PfrJCLy9ddfG9uDp6entG7dWs6fP2+an3VdBQYGSlBQkPz+++/GfB/nYfUktj2JiMycOVOKFi0qzs7O4ufnJz169JBbt249th0RkQsXLkiXLl3Ez89PnJ2dJV++fNK9e3eJjo421uujHvPnzxcRkY4dO0rmzJnl77//ltDQUHFxcRE/Pz8ZNWqUaZv44IMPpFKlSpI1a1bJmDGjBAUFydKlS021WSwW6dmzp2l4w4YNJW/evMmep8VikalTp5rGFS5c+KFtAgCeT/QwAAAky6lTp6Ru3bri7OwsGzZskOzZs5ummT9/vnTq1EnKlSsn48ePl8uXL8vHH38sv/76qxw4cEA8PDxMz2nWrJk0b95cRER+/vln+fTTTx9Zx61bt2T8+PHJXo579+5J9erV5cKFC/LGG29I7ty5ZceOHTJkyBC5dOmSzYFRly5dZP78+VKvXj15/fXXJTY2Vn7++WfZtWuXlC1bVr766itjWmvtU6ZMkWzZsomIiI+Pj03b7777rhQpUkTu378v3377rbz77rvi7e0tXbp0ERGRI0eOSNWqVcXNzU0GDhwojo6OMmfOHKlRo4b89NNPUqFChSQvZ8La+vbt+8hpf/75Z6lfv77kyZNHRowYITExMTJz5kypXLmy7NmzRwoWLGh6Ts6cOY3X4c6dO9K9e3fTNGPHjpVhw4ZJq1at5PXXX5erV6/KJ598ItWqVXvo9mA1aNCgJC6puR6rtWvXyuLFi22GjRw5Ut5//30JCQmR7t27y4kTJ2TWrFmyZ88e+fXXX8XR0fGhbVy8eFHKly8vt27dkm7duknhwoXlwoULsnTpUrl3755Uq1bNZr2PHTtWRETee+89Y1ilSpWMv+Pi4qRu3bpSsWJFmTRpkqxfv15GjBghsbGxMmrUKGO6jz/+WBo3bixt27aV6Oho+eabb6Rly5ayZs0aadCgwROtp+TMM0
OGDDJv3jzp06ePMWzHjh1y9uzZZLUNAEjHFACAJJo3b56KiK5Zs0bz58+vIqJ16tRJdNro6Gj19vbWYsWK6f37943ha9asURHR4cOH20wfExOjIqLvv/++qb3Tp08bw0RER4wYYfw/cOBA9fb21qCgIK1evbox/P3331cR0fj4eJt28uTJox06dDD+Hz16tLq4uOiff/5pM93gwYPV3t5ez507p6qqW7duVRHRt99+27Ss/27jYbVbbdu2TUVEt23bZgyLjIxUOzs7feutt4xhTZs2VScnJz116pQx7OLFi+rq6qrVqlUzzTcx7733nlosFpth/14H/641KChI3d3dNSwszJjmzz//VEdHR23RooWpjUqVKmmxYsWM/69evWp6nc6cOaP29vY6duxYm+ceOnRIHRwcbIZXr17d5rVcu3atiojWrVtXk7LrUr16dS1atKhp+OTJk22W88qVK+rk5KR16tTRuLg4Y7rp06eriOgXX3zxyHbat2+vdnZ2umfPHtO4xLaJfy9XQh06dFAR0V69etnMo0GDBurk5KRXr141ht+7d8/mudHR0VqsWDGtVauWzXAR0R49epjaatCggebJk8dm2JPM8+WXX1YHBwfdu3evMbxLly766quvPrRNAMDziUsSAABPrGPHjnL+/Hl59dVXZePGjbJkyRLTNHv37pUrV67IW2+9ZXNfgwYNGkjhwoVNP5EXHR0tIiLOzs5JruPChQvyySefyLBhwyRz5sw247y9vUVE5J9//nnkPJYsWSJVq1aVLFmyyLVr14xHSEiIxMXFyfbt20VEZNmyZWKxWGTEiBGmeSSlm3xiwsPD5dq1a3Lu3DmZNGmSxMfHS61atUTkwdnmjRs3StOmTcXf3994Tvbs2eXVV1+VX375RSIiIh7bRnR0dJLX6c2bN+XPP/+Uffv2Sdu2bW16RAQEBEjjxo1l/fr1pl/BiIyMfOy9K5YvXy7x8fHSqlUrm/Xs6+srAQEBsm3btkSfp6oyZMgQadGixRP1qEiKzZs3S3R0tPTp00fs7P63S9S1a1dxc3N75M84xsfHy8qVK6VRo0ZStmxZ0/jkbhMJu/Nbu/dHR0fL5s2bjeEZM2Y0/r5586aEh4dL1apVZf/+/ab5RUZG2qzva9euSUxMjGm6J5mnj4+PNGjQQObNmyciD3rpfPfdd9KpU6dkLTMAIP3ikgQAwBO7ceOGfPPNN9KsWTM5evSo9O7dW+rUqWNzDwNr9+RChQqZnl+4cGH55ZdfbIZZrxn/94H/o4wYMUL8/PzkjTfeMF1vHRwcLBaLRYYMGSJjxowx5vvv+xKcPHlS/vjjD/Hy8kq0jStXrojIg0sw/Pz8xNPTM8n1PU7Tpk2Nv+3s7GTo0KHSokULERG5evWq3Lt3L9H1V6RIEYmPj5fz589L0aJFH9nGrVu3krxOy5QpY/z9sHaXLVsm165dswkTrl27JgEBAY+c98mTJ0VVHzrdw7r+L1y4UI4cOSLfffedLFq0KCmLkWQP20adnJzE39//kV3sr169KhEREVKsWLEUq8fOzs4mHBIR4/KPhPddWLNmjYwZM0YOHjwoUVFRxvDEQoq5c+fK3LlzTcPz5Mlj8/+TzFNEpFOnTtKpUyf58MMPZcmSJZIlSxYj7AIAvDgIDAAAT2zy5MnSsmVLERH59NNPpWLFijJkyBCZOXNmsucZFhYmIiK+vr5Jmv7YsWMyf/58+frrrxM92CxZsqSMGDFC3n//fVm4cOFD5xMfHy8vvfSSDBw4MNHxiV2vn1I++OADKVmypMTExMiePXtkzJgx4uDgkGgvhuQKCwtL8jr9+uuv5d69e9KtW7ckzz86OlouXbokL7300iOni4+PF4vFIuvWrUv0ZzcTCzWio6Nl2LBh0qVLl2f6OjxPfv75Z2ncuLFUq1ZNZs6cKdmzZxdHR0eZN29eooFKkyZNTDchHDp0qPF+S848RR70FHJycpKVK1fKvHnzpEOHDja9NA
AALwYCAwDAE6tWrZrxd7ly5aRHjx4yY8YMad++vVSsWFFE/ncG88SJE6YzjydOnDCd4Tx69
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABAsAAAKpCAYAAADaPqVoAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdd3QU5fv38WspCSUFQkmhhNAJndBCLyGU0ItKERCkSUeKEekiUgTpWCgq8EWpItJBQWlCAGlKR2roJKElhL2eP3h2fhkSiiFkk/h+nbPnJDOzM9fszpb57D33bVFVFQAAAAAAgP8vlb0LAAAAAAAASQthAQAAAAAAMCEsAAAAAAAAJoQFAAAAAADAhLAAAAAAAACYEBYAAAAAAAATwgIAAAAAAGBCWAAAAAAAAEwICwAAAAAAgAlhAQAAsKsjR47IqlWrjP8PHjwoP//8s/0KAgAAhAUAgLgtWLBALBaL7Nu3L9a8r776SiwWizRt2lQeP35sh+qQkkREREi3bt1k9+7dcvLkSenbt68cPnzY3mUBAPCflsbeBQAAkpeVK1dKjx49pGrVqrJkyRJJnTq1vUtCMufv72/cREQKFiwoXbp0sXNVAAD8txEWAABe2q+//iqtW7cWX19f+emnnyRdunT2LgkpxKpVq+TYsWPy4MEDKV68uDg4ONi7JAAA/tO4DAEA8FIOHjwoTZo0EU9PT9mwYYO4urrGWmbp0qXi5+cn6dOnl6xZs0q7du3k0qVLca7PYrHEeTt37pxpmZEjR5ruN3HiRLFYLFKjRg1j2siRI8ViscTaRp48eaRjx46maXfu3JF+/fpJrly5xNHRUfLnzy/jx48Xq9VqWs5qtcrUqVOlePHiki5dOsmWLZvUq1fPuCzjWfXbbrb6fv31V9N0R0dHKViwoIwbN05U1bTNAwcOSP369cXFxUWcnJykdu3asnv37jgfv7jYHoenbzEfA9vlJTEf5w0bNkilSpUkQ4YM4urqKg0bNpQjR47EuY0aNWrEuY2nnycRkYULFxrHg5ubm7z11lty4cKFWOuzPVa+vr7i5+cnf/75p7HeF3lWPXEdTyIis2bNkqJFi4qjo6N4eXlJz5495c6dOy/cjsiLnx/bY/u824IFC0REpGPHjuLk5CRnzpyRunXrSsaMGcXLy0tGjx4d67iYNGmSVKpUSbJkySLp06cXPz8/WbZsWZw1Lly4UMqXLy8ZMmSQzJkzS7Vq1WTjxo2mZZ4+Jm23PHnymJY7c+aMtGrVSry8vCRVqlTGcsWKFYtzXQcPHjTd/9KlS5I6dWqxWCzPrBcAkHTRsgAA8EKnT5+WevXqiaOjo2zYsEE8PT1jLbNgwQJ55513pFy5cjJu3Di5evWqTJ06VXbs2CEHDhyQTJkyxbpPs2bNpHnz5iIi8ttvv8mXX3753Dru3Lkj48aNi/d+3L9/X6pXry6XLl2Sbt26Se7cuWXnzp0SHBwsV65ckc8//9xYtnPnzrJgwQKpX7++vPvuuxIdHS2//fab7N69W8qWLSvfffedsayt9ilTpkjWrFlFRMTd3d207Q8//FCKFCkiDx48kO+//14+/PBDyZ49u3Tu3FlERI4ePSpVq1YVFxcXGTx4sKRNm1a++OILqVGjhmzbtk0qVKjw0vsZs7b+/fs/d9nffvtNGjRoIN7e3jJixAh59OiRzJo1SypXrix79+6VggULxrpPzpw5jefh7t270qNHj1jLjB07VoYNGyZvvPGGvPvuu3L9+nWZPn26VKtW7ZnHg82QIUNeck9j12Ozdu1a+d///meaNnLkSBk1apQEBARIjx495Pjx4zJ79mzZu3ev7NixQ9KmTfvMbbzM81OtWjXTYz927FgRERk6dKgxrVKlSsbfjx8/lnr16knFihVlwoQJsn79ehkxYoRER0fL6NGjjeWmTp0qjRs3lrZt20pUVJQsWbJEWrVqJWvWrJGgoCBjuVGjRsnIkSOlUq
VKMnr0aHFwcJA9e/bI1q1bJTAwMNY+2Y5JEZEvv/xSzp8/b6qtcePG8s8//0i/fv2kYMGCYrFYjH16Wrp06WT+/PkydepUY9o333wjDg4O8vDhw2c+rgCAJEwBAIjD/PnzVUR0zZo1mi9fPhURDQwMjHPZqKgozZ49uxYrVkwfPHhgTF+zZo2KiA4fPty0/KNHj1REdNSoUbG2d/bsWWOaiOiIESOM/wcPHqzZs2dXPz8/rV69ujF91KhRKiJqtVpN2/H29tYOHToY/48ZM0YzZsyoJ06cMC33wQcfaOrUqfX8+fOqqrp161YVEe3Tp0+sfX16G8+q3eaXX35REdFffvnFmPbw4UNNlSqVvvfee8a0pk2bqoODg54+fdqYdvnyZXV2dtZq1arFWm9chg4dqhaLxTTt6cfg6Vr9/PzU1dVVQ0NDjWVOnDihadOm1RYtWsTaRqVKlbRYsWLG/9evX4/1PJ07d05Tp06tY8eONd338OHDmiZNGtP06tWrm57LtWvXqohovXr19GW+plSvXl2LFi0aa/rEiRNN+3nt2jV1cHDQwMBAffz4sbHcjBkzVER03rx5z91OfJ6fp/ctpg4dOqiIaO/evY1pVqtVg4KC1MHBQa9fv25Mv3//vum+UVFRWqxYMa1Vq5Yx7eTJk5oqVSpt1qyZaf9s641p06ZNKiK6bds2Uz3e3t7G/8ePH1cR0XHjxsXap5iPt+34bt26tWbJkkUjIyONeQUKFNA2bdqoiOjSpUvjfBwAAEkXlyEAAJ6rY8eOcuHCBWnTpo1s3LhRli5dGmuZffv2ybVr1+S9994z9WMQFBQkhQsXjjUMXlRUlIiIODo6vnQdly5dkunTp8uwYcPEycnJNC979uwiInLx4sXnrmPp0qVStWpVyZw5s9y4ccO4BQQEyOPHj2X79u0iIrJ8+XKxWCwyYsSIWOt4mabxcQkLC5MbN27I+fPnZcKECWK1WqVWrVoi8uRX3I0bN0rTpk0lb968xn08PT2lTZs28vvvv0t4ePgLtxEVFfXSj+nt27flxIkTEhISIm3btjW1hChQoIA0btxY1q9fH2u0i4cPH76wr4oVK1aI1WqVN954w/Q4e3h4SIECBeSXX36J836qKsHBwdKiRYt/1ZLiZWzevFmioqKkX79+kirV/3396dKli7i4uDx3qMaEen7i0qtXL+Nvi8UivXr1kqioKNm8ebMxPX369Mbft2/flrCwMKlatars37/fmL5q1SqxWq0yfPhw0/7Z1hvTy7z+IiIiREQkS5YsL7UfjRo1EovFIqtXrxaRJy1WLl68KG+++eZL3R8AkPRwGQIA4Llu3bolS5YskWbNmsmxY8ekb9++EhgYaOqz4J9//hERkUKFCsW6f+HCheX33383TbNdI/70Sf/zjBgxQry8vKRbt26xrn/29/cXi8UiwcHB8vHHHxvrfbofgpMnT8qhQ4ckW7ZscW7j2rVrIvLksgsvLy9xc3N76fpepGnTpsbfqVKlko8++khatGghIiLXr1+X+/fvx/n4FSlSRKxWq1y4cEGKFi363G3cuXPnpR/TMmXKGH8/a7vLly+XGzdumIKEGzduSIECBZ677pMnT4qqPnO5ZzX3X7RokRw9elR++OEHWbx48cvsxkt71jHq4OAgefPmNebHJaGen6elSpXKFD6IiHHZR8y+FtasWSMff/yxHDx4UCIjI43pMUOA06dPS6pUqcTX1/eF232Z11+hQoUkc+bM8tlnn4mvr69xGcKjR4/iXD5t2rTSrl07mTdvnrRs2VLmzZsnLVq0EBcXlxfWAwBImggLAADPNXHiRGnVqpWIPLmuuWLFihIcHCyzZs2K9zpDQ0NFRMTDw+Ollv/rr79kwYIFsnDhwjhPNEuWLCkjRoyQUaNGyaJFi565HqvVKnXq1JHBgwfHOT+u6/MTyqRJk6RkyZLy6NEj2bt3r3z88ceSJk2aOFsvxFdoaOhLP6YLFy6U+/fvS9euXV
96/VFRUXLlyhWpU6fOc5ezWq1isVhk3bp1cQ6tGddJalRUlAwbNkw6d+78Wp+H5Oa3336Tx
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1IAAAHWCAYAAAB9mLjgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB8xElEQVR4nOzdd3wUdeLG8c/upvfeCxAghI4gGLGLAqKiYIFDxe6d4Nl+Fs6zn/3O3nvD3lA8QAQBkV5CDS2UAOm9t935/RHJGakJm0zK83699iWZnf3Os+NC8mRmvmMxDMNAREREREREjpnV7AAiIiIiIiLtjYqUiIiIiIhIE6lIiYiIiIiINJGKlIiIiIiISBOpSImIiIiIiDSRipSIiIiIiEgTqUiJiIiIiIg0kYqUiIiIiIhIE6lIiYiIiIiINJGKlIiIiIiISBOpSImItKD3338fi8XS8PDw8KBnz55MnTqV7Oxss+OJiIhIM7mYHUBEpDN45JFH6Nq1K1VVVSxevJjXXnuN//73v2zcuBEvLy+z44mIiEgTqUiJiLSC0aNHM2TIEACuv/56goODefbZZ5kxYwYTJ040OZ2IiIg0lU7tExExwVlnnQXArl27ACgoKOD//u//6NevHz4+Pvj5+TF69GjWrVt30Gurqqp46KGH6NmzJx4eHkRGRjJu3DjS0tIA2L17d6PTCf/8OOOMMxrGWrBgARaLhc8//5x//OMfRERE4O3tzYUXXsjevXsP2vby5csZNWoU/v7+eHl5cfrpp/Pbb78d8j2eccYZh9z+Qw89dNC6H3/8MYMHD8bT05OgoCAmTJhwyO0f6b39kcPh4Pnnn6dPnz54eHgQHh7OTTfdRGFhYaP1unTpwvnnn3/QdqZOnXrQmIfK/swzzxy0TwGqq6t58MEH6d69O+7u7sTGxnL33XdTXV19yH31R3/ebyEhIYwZM4aNGzc2Wq+uro5HH32UhIQE3N3d6dKlC//4xz8O2sbYsWPp0qULHh4ehIWFceGFF7Jhw4aD3tvUqVOZPn06iYmJeHh4MHjwYBYtWtRovT179nDzzTeTmJiIp6cnwcHBXHrppezevfug91FUVMTtt99Oly5dcHd3JyYmhquuuoq8vLyGz92RHgf2dVO2KSLSmnRESkTEBAdKT3BwMAA7d+7ku+++49JLL6Vr165kZ2fzxhtvcPrpp7N582aioqIAsNvtnH/++cybN48JEyZw6623Ulpayty5c9m4cSMJCQkN25g4cSLnnXdeo+1OmzbtkHkee+wxLBYL99xzDzk5OTz//POMGDGClJQUPD09AZg/fz6jR49m8ODBPPjgg1itVt577z3OOussfv31V4YOHXrQuDExMTzxxBMAlJWV8be//e2Q277//vu57LLLuP7668nNzeWll17itNNOY+3atQQEBBz0mhtvvJFTTz0VgG+++YZvv/220fM33XQT77//Ptdccw1///vf2bVrFy+//DJr167lt99+w9XV9ZD7oSmKiooa3tsfORwOLrzwQhYvXsyNN95IUlISGzZs4LnnnmPbtm189913Rx27V69e3HfffRiGQVpaGs8++yznnXce6enpDetcf/31fPDBB1xyySXceeedLF++nCeeeILU1NSD9seNN95IREQEGRkZvPzyy4wYMYJdu3Y1Oq104cKFfP755/z973/H3d2dV199lVGjRrFixQr69u0LwMqVK1myZAkTJkwgJiaG3bt389prr3HGGWewefPmhvHKyso49dRTSU1N5dprr+WEE04gLy+P77//nn379pGUlMRHH33UsO0333yT1NRUnnvuuYZl/fv3b9I2RURanSEiIi3mvffeMwDj559/NnJzc429e/can332mREcHGx4enoa+/btMwzDMKqqqgy73d7otbt27TLc3d2NRx55pGHZu+++awDGs88+e9C2HA5Hw+sA45lnnjlonT59+hinn356w9e//PKLARjR0dFGSU
lJw/IvvvjCAIwXXnihYewePXoYI0eObNiOYRhGRUWF0bVrV+Occ845aFsnn3yy0bdv34avc3NzDcB48MEHG5bt3r3bsNlsxmOPPdbotRs2bDBcXFwOWr59+3YDMD744IOGZQ8++KDxx29nv/76qwEY06dPb/Ta2bNnH7Q8Pj7eGDNmzEHZp0yZYvz5W+Sfs999991GWFiYMXjw4Eb79KOPPjKsVqvx66+/Nnr966+/bgDGb7/9dtD2/uj0009vNJ5hGMY//vEPAzBycnIMwzCMlJQUAzCuv/76Ruv93//9nwEY8+fPP+z4B/7frlq1qtF7+/OyPXv2GB4eHsbFF1/csKyiouKg8ZYuXWoAxocfftiw7IEHHjAA45tvvjlo/T9+fg6YPHmyER8ff8i8x7pNEZHWplP7RERawYgRIwgNDSU2NpYJEybg4+PDt99+S3R0NADu7u5YrfX/JNvtdvLz8/Hx8SExMZE1a9Y0jPP1118TEhLCLbfcctA2/nwqWlNcddVV+Pr6Nnx9ySWXEBkZyX//+18AUlJS2L59O3/5y1/Iz88nLy+PvLw8ysvLOfvss1m0aBEOh6PRmFVVVXh4eBxxu9988w0Oh4PLLrusYcy8vDwiIiLo0aMHv/zyS6P1a2pqgPr9dThffvkl/v7+nHPOOY3GHDx4MD4+PgeNWVtb22i9vLw8qqqqjph7//79vPTSS9x///34+PgctP2kpCR69erVaMwDp3P+efuHciBTbm4uS5cu5dtvv6V///6EhIQANPx/ueOOOxq97s477wTgxx9/bLS8oqKCvLw8UlJSeOuttwgPD6dnz56N1klOTmbw4MENX8fFxTF27FjmzJmD3W4HaDg6eSBjfn4+3bt3JyAg4KDP6YABA7j44osPem9N/Zwe6zZFRFqbTu0TEWkFr7zyCj179sTFxYXw8HASExMbihPUnw72wgsv8Oqrr7Jr166GH1zhf6f/Qf0pgYmJibi4OPef7x49ejT62mKx0L1794brULZv3w7A5MmTDztGcXExgYGBDV/n5eUdNO6fbd++HcMwDrven0/BKyoqAjiovPx5zOLiYsLCwg75fE5OTqOvf/rpJ0JDQ4+Y888efPBBoqKiuOmmm/jqq68O2n5qauphx/zz9g9lyZIljV7fo0cPvvvuu4YSsmfPHqxWK927d2/0uoiICAICAtizZ0+j5Y888ghPPfVUw1gLFixoVJwPLP+znj17UlFRQW5uLhEREVRWVvLEE0/w3nvvsX//fgzDaFi3uLi44c9paWmMHz/+qO/zWBzrNkVEWpuKlIhIKxg6dGjDrH2H8vjjj3P//fdz7bXX8uijjxIUFITVauW222476EiPGQ5keOaZZxg4cOAh1/ljuampqSEzM5NzzjnnqONaLBZmzZqFzWY74pgAWVlZQH1hONKYYWFhTJ8+/ZDP/7ngDBs2jH/961+Nlr388svMmDHjkK9PTU3l/fff5+OPPz7ktVYOh4N+/frx7LPPHvL1sbGxh81+QP/+/fnPf/4DQG5uLi+++CJnnHEGa9asafTej/XozvXXX8/ZZ5/Nvn37eO655xg/fjxLlizB39//mF5/wC233MJ7773HbbfdRnJyMv7+/lgsFiZMmNBin1MztikicixUpERE2oCvvvqKM888k3feeafR8qKioobTuQASEhJYvnw5tbW1Tpkw4YADR5wOMAyDHTt2NFzwf2ASCz8/P0aMGHHU8datW0dtbe0Ry+OBcQ3DoGvXrgedanYomzdvxmKxkJiYeMQxf/75Z4YPH97otLDDCQkJOeg9HWlCiGnTpjFw4EAuv/zyw25/3bp1nH322c0+3TIwMLBRpjPOOIOoqCjee+89pk2bRnx8PA6Hg+3bt5OUlNSwXnZ2NkVFRcTHxzcar3v37g1Hr0aMGEFcXByffPJJo8k//vwZANi2bRteXl4N5fOrr75i8uTJDSUP6k/hPHCk8I/74M+zDDbXsW5TRKS16RopEZ
E2wGazNTplCeqvtdm/f3+jZePHjycvL4+XX375oDH+/Pqm+PDDDyktLW34+quvviIzM5PRo
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# 1. Bar chart of counts by country\n",
"plt.figure(figsize=(12, 6))\n",
"sns.countplot(data=df, x=\"Country\", order=df[\"Country\"].value_counts().index)\n",
"plt.title(\"Number of people by country\")\n",
"plt.xlabel(\"Country\")\n",
"plt.ylabel(\"Count\")\n",
"plt.xticks(rotation=45)\n",
"plt.show()\n",
"\n",
"# 2. Bar chart of counts by industry\n",
"plt.figure(figsize=(12, 6))\n",
"sns.countplot(data=df, x=\"Industry\", order=df[\"Industry\"].value_counts().index)\n",
"plt.title(\"Number of people by industry\")\n",
"plt.xlabel(\"Industry\")\n",
"plt.ylabel(\"Count\")\n",
"plt.xticks(rotation=45)\n",
"plt.show()\n",
"\n",
"# 3. Histogram of the age distribution\n",
"plt.figure(figsize=(10, 5))\n",
"sns.histplot(df[\"Age\"], bins=30, kde=True)\n",
"plt.title(\"Age distribution\")\n",
"plt.xlabel(\"Age\")\n",
"plt.ylabel(\"Frequency\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The charts show the variety of countries and industries represented in the dataset, which indicates that the data covers many regions and sectors.\n",
"\n",
"Next, we split the dataset into training, validation, and test sets."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((1560, 6), (520, 6), (520, 6))"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Separate the features (X) from the target (y)\n",
"X = df.drop(columns=[\"Networth\"])\n",
"y = df[\"Networth\"]\n",
"\n",
"# Split into training (60%), validation (20%), and test (20%) sets\n",
"X_train, X_temp, y_train, y_temp = train_test_split(\n",
"    X, y, test_size=0.4, random_state=42\n",
")\n",
"X_val, X_test, y_val, y_test = train_test_split(\n",
"    X_temp, y_temp, test_size=0.5, random_state=42\n",
")\n",
"\n",
"# Check the sizes of the splits\n",
"(X_train.shape, X_val.shape, X_test.shape)"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(count 1560.000000\n",
" mean 5.208173\n",
" std 12.653032\n",
" min 1.000000\n",
" 25% 1.500000\n",
" 50% 2.400000\n",
" 75% 4.300000\n",
" max 219.000000\n",
" Name: Networth, dtype: float64,\n",
" count 520.000000\n",
" mean 4.443654\n",
" std 7.267615\n",
" min 1.000000\n",
" 25% 1.500000\n",
" 50% 2.400000\n",
" 75% 4.825000\n",
" max 91.400000\n",
" Name: Networth, dtype: float64,\n",
" count 520.000000\n",
" mean 4.235577\n",
" std 5.861496\n",
" min 1.000000\n",
" 25% 1.600000\n",
" 50% 2.500000\n",
" 75% 4.500000\n",
" max 60.000000\n",
" Name: Networth, dtype: float64)"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check the distribution of the target across the splits\n",
"train_dist = y_train.describe()\n",
"val_dist = y_val.describe()\n",
"test_dist = y_test.describe()\n",
"\n",
"train_dist, val_dist, test_dist"
]
},
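{
"cell_type": "markdown",
"metadata": {},
"source": [
"The summaries above describe a continuous target. Random over- and under-sampling in imbalanced-learn work on discrete class labels, so one possible preparation step is to bin the target into categories first. The sketch below does this with `pd.cut`; the series `y_demo`, the bin edges, and the labels are illustrative assumptions, not values chosen in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for y_train (net worth in billions); values are made up\n",
"y_demo = pd.Series([1.0, 1.2, 1.5, 1.8, 2.4, 3.0, 4.3, 10.0, 219.0])\n",
"\n",
"# Assumed bin edges: (0, 2], (2, 5], (5, inf)\n",
"y_binned = pd.cut(y_demo, bins=[0, 2, 5, float('inf')], labels=['low', 'medium', 'high'])\n",
"\n",
"# Class counts show how unevenly the categories are populated\n",
"print(y_binned.value_counts())"
]
},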
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shapes after oversampling: (13910, 10047) (13910,)\n",
"Shapes after undersampling: (13065, 10047) (13065,)\n"
]
}
],
"source": [
"from imblearn.over_sampling import RandomOverSampler\n",
"from imblearn.under_sampling import RandomUnderSampler\n",
"\n",
"# Note: imblearn samplers expect discrete class labels in y_train\n",
"oversampler = RandomOverSampler(random_state=12)\n",
"X_train_over, y_train_over = oversampler.fit_resample(X_train, y_train)\n",
"\n",
"undersampler = RandomUnderSampler(random_state=12)\n",
"X_train_under, y_train_under = undersampler.fit_resample(X_train, y_train)\n",
"\n",
"print(\"Shapes after oversampling:\", X_train_over.shape, y_train_over.shape)\n",
"print(\"Shapes after undersampling:\", X_train_under.shape, y_train_under.shape)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}