|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"# Laboratory Work No. 2\n",
"## The following datasets were selected:\n",
" - ### 11. Diamond prices.\n",
" - ### 18. Mobile device prices.\n",
" - ### 19. Data on millionaires."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"## Let us begin with dataset No. 11.\n",
"\n",
"Link to the source data: https://www.kaggle.com/datasets/nancyalaswad90/diamonds-prices\n",
"\n",
"**General description**: This dataset contains the prices and attributes of approximately 53,940 round-cut diamonds. There are 10 features (carat, cut, color, clarity, depth, table, price, x, y and z). Most of the variables are numeric, but cut, color and clarity are ordered categorical (factor) variables.\n",
"\n",
"**Problem domain**: Financial analysis and forecasting of diamond prices.\n",
"\n",
"**Objects of observation**: Diamonds, described by the attributes _Carat, Cut, Color, Clarity, Depth, Table, Price_.\n",
"\n",
"**Business goals**:\n",
"- ***Diamond price forecasting***: Helps buyers and sellers navigate market prices and supports decisions about buying or selling diamonds,\n",
"- ***Analysis of the factors that affect value***: Understanding which characteristics of a diamond (for example, cut quality or color) influence its price the most can help in developing pricing strategies and improving the product range.\n",
"\n",
"**Technical project goals**:\n",
"1. ***Diamond price forecasting***: Input data - diamond attributes; target feature - _price_,\n",
"2. ***Analysis of influencing factors***: Input data - attributes describing the quality and characteristics of a diamond; target - the influence of each attribute on the final price, which can be analyzed with regression methods and data visualization."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Index(['Unnamed: 0', 'carat', 'cut', 'color', 'clarity', 'depth', 'table',\n",
|
|||
|
" 'price', 'x', 'y', 'z'],\n",
|
|||
|
" dtype='object')\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"\n",
|
|||
|
"df = pd.read_csv(\"../data/Diamonds-Prices.csv\")\n",
|
|||
|
"print(df.columns)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Attributes: \n",
"- Unnamed: 0 (an unnamed first column, apparently a row index), \n",
"- Carat weight (carat), \n",
"- Cut quality (cut), \n",
"- Color (color), \n",
"- Clarity (clarity), \n",
"- Total depth percentage (depth), \n",
"- Table width (table), \n",
"- Price (price), \n",
"- Length in mm (x), \n",
"- Width in mm (y), \n",
"- Depth in mm (z). "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Checking for outliers"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data, truncated in the source)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x600 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Load the data\n",
"df = pd.read_csv(\"../data/Diamonds-Prices.csv\")\n",
"\n",
"# Price on the x axis, carat on the y axis\n",
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(df[\"price\"], df[\"carat\"])\n",
"plt.xlabel(\"price\")\n",
"plt.ylabel(\"carat\")\n",
"plt.title(\"Scatter plot of carat versus price\")\n",
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"The most extreme outlier was observed at a price of roughly 17,500\n",
|
|||
"We will use the interquartile range (IQR) method to remove outliers."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data, truncated in the source)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x600 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
"Number of rows before outlier removal: 53943\n",
"Number of rows after outlier removal: 49517\n"
]
}
],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Load the data\n",
"df = pd.read_csv(\"../data/Diamonds-Prices.csv\")\n",
"\n",
"# Columns to analyze\n",
"column1 = \"carat\"\n",
"column2 = \"price\"\n",
"\n",
"\n",
"# Keep only the rows whose value in `column` lies within the 1.5 * IQR fences\n",
"def remove_outliers(df, column):\n",
"    Q1 = df[column].quantile(0.25)\n",
"    Q3 = df[column].quantile(0.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_bound = Q1 - 1.5 * IQR\n",
"    upper_bound = Q3 + 1.5 * IQR\n",
"    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]\n",
"\n",
"\n",
"# Remove outliers column by column (the second pass uses quartiles of the already filtered data)\n",
"df_cleaned = df.copy()\n",
"for column in [column1, column2]:\n",
"    df_cleaned = remove_outliers(df_cleaned, column)\n",
"\n",
"# Scatter plot after outlier removal\n",
"plt.figure(figsize=(10, 6))\n",
"plt.scatter(df_cleaned[column1], df_cleaned[column2], alpha=0.5)\n",
"plt.xlabel(column1)\n",
"plt.ylabel(column2)\n",
"plt.title(f\"Scatter plot of {column1} versus {column2} (after outlier removal)\")\n",
"plt.show()\n",
"\n",
"# Print the number of rows before and after outlier removal\n",
"print(f\"Number of rows before outlier removal: {len(df)}\")\n",
"print(f\"Number of rows after outlier removal: {len(df_cleaned)}\")"
|
|||
|
]
|
|||
|
},
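{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the 1.5 * IQR rule used above concrete, here is a minimal sketch on made-up numbers. The `values` series is an illustrative assumption and is not taken from the dataset; only the quartile logic mirrors `remove_outliers`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up sample with one obviously extreme value (not from the dataset)\n",
"values = pd.Series([10, 12, 13, 14, 15, 16, 18, 95])\n",
"\n",
"q1 = values.quantile(0.25)\n",
"q3 = values.quantile(0.75)\n",
"iqr = q3 - q1\n",
"\n",
"# Fences: anything outside [lower, upper] is treated as an outlier\n",
"lower = q1 - 1.5 * iqr\n",
"upper = q3 + 1.5 * iqr\n",
"\n",
"print(f\"Q1={q1}, Q3={q3}, IQR={iqr}\")\n",
"print(f\"Fences: [{lower}, {upper}]\")\n",
"print(\"Outliers:\", values[(values < lower) | (values > upper)].tolist())"
]
},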
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 8,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
"Number of missing values in each column:\n",
"Unnamed: 0 0\n",
"carat 0\n",
"cut 0\n",
"color 0\n",
"clarity 0\n",
"depth 0\n",
"table 0\n",
"price 0\n",
"x 0\n",
"y 0\n",
"z 0\n",
"dtype: int64\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"# Count the missing values in each column\n",
"missing_values = df.isnull().sum()\n",
"\n",
"# Print the result\n",
"print(\"Number of missing values in each column:\")\n",
"print(missing_values)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"No missing values were found. Let us move on to creating the samples."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
"Training set size: 32365\n",
"Validation set size: 10789\n",
"Test set size: 10789\n"
]
}
],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Load the data (the CSV is reloaded, so the outlier filtering above is not applied here)\n",
"df = pd.read_csv(\"../data/Diamonds-Prices.csv\")\n",
"\n",
"# Select the features and the target variable\n",
"X = df.drop(\"price\", axis=1)  # Features (all columns except 'price')\n",
"y = df[\"price\"]  # Target variable ('price')\n",
"\n",
"# Split the data into a training part (60%) and a remainder (validation + test)\n",
"X_train, X_temp, y_train, y_temp = train_test_split(\n",
"    X, y, test_size=0.4, random_state=42\n",
")\n",
"\n",
"# Split the remainder equally into validation and test sets\n",
"X_val, X_test, y_val, y_test = train_test_split(\n",
"    X_temp, y_temp, test_size=0.5, random_state=42\n",
")\n",
"\n",
"# Print the sample sizes\n",
"print(f\"Training set size: {X_train.shape[0]}\")\n",
"print(f\"Validation set size: {X_val.shape[0]}\")\n",
"print(f\"Test set size: {X_test.shape[0]}\")"
|
|||
|
]
|
|||
|
},
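{
"cell_type": "markdown",
"metadata": {},
"source": [
"A plain random split does not guarantee that the three samples cover the price range in the same proportions. One possible refinement, sketched here under the assumption that `X` and `y` from the cell above are available, is to stratify on price deciles. The names `price_bins`, `X_tr`, `X_v` and `X_te` are illustrative and are chosen so that the original splits are not overwritten."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Discretize the continuous target into deciles, used only for stratification\n",
"price_bins = pd.qcut(y, q=10, labels=False, duplicates=\"drop\")\n",
"\n",
"# 60% training part, 40% remainder, keeping the decile proportions\n",
"X_tr, X_rest, y_tr, y_rest, bins_tr, bins_rest = train_test_split(\n",
"    X, y, price_bins, test_size=0.4, random_state=42, stratify=price_bins\n",
")\n",
"\n",
"# Remainder split equally into validation and test sets\n",
"X_v, X_te, y_v, y_te = train_test_split(\n",
"    X_rest, y_rest, test_size=0.5, random_state=42, stratify=bins_rest\n",
")\n",
"\n",
"print(len(X_tr), len(X_v), len(X_te))"
]
},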
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Now let us analyze how balanced the samples are"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 10,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
"Distribution of Price in the training set:\n",
"price\n",
"327 1\n",
"334 1\n",
"336 1\n",
"337 1\n",
"338 1\n",
" ..\n",
"18791 1\n",
"18795 2\n",
"18797 1\n",
"18804 1\n",
"18806 1\n",
"Name: count, Length: 9476, dtype: int64\n",
"Share of positive values: 100.00%\n",
"Share of negative values: 0.00%\n",
"\n",
"Data augmentation is needed to balance the classes.\n",
"\n",
"Distribution of Price in the validation set:\n",
"price\n",
"326 2\n",
"340 1\n",
"344 1\n",
"354 1\n",
"357 1\n",
" ..\n",
"18781 1\n",
"18784 1\n",
"18791 1\n",
"18803 1\n",
"18823 1\n",
"Name: count, Length: 5389, dtype: int64\n",
"Share of positive values: 100.00%\n",
"Share of negative values: 0.00%\n",
"\n",
"Data augmentation is needed to balance the classes.\n",
"\n",
"Distribution of Price in the test set:\n",
"price\n",
"335 1\n",
"336 1\n",
"337 1\n",
"351 1\n",
"353 1\n",
" ..\n",
"18766 1\n",
"18768 1\n",
"18780 1\n",
"18788 1\n",
"18818 1\n",
"Name: count, Length: 5308, dtype: int64\n",
"Share of positive values: 100.00%\n",
"Share of negative values: 0.00%\n",
"\n",
"Data augmentation is needed to balance the classes.\n",
"\n"
]
}
],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Load the data\n",
"df = pd.read_csv(\"../data/Diamonds-Prices.csv\")\n",
"\n",
"# Select the features and the target variable\n",
"X = df.drop(\"price\", axis=1)  # Features (all columns except 'price')\n",
"y = df[\"price\"]  # Target variable ('price')\n",
"\n",
"# Split the data into a training part (60%) and a remainder (validation + test)\n",
"X_train, X_temp, y_train, y_temp = train_test_split(\n",
"    X, y, test_size=0.4, random_state=42\n",
")\n",
"\n",
"# Split the remainder equally into validation and test sets\n",
"X_val, X_test, y_val, y_test = train_test_split(\n",
"    X_temp, y_temp, test_size=0.5, random_state=42\n",
")\n",
"\n",
"\n",
"# Print the distribution of the target and the share of positive and negative values\n",
"def analyze_distribution(data, title):\n",
"    print(f\"Distribution of Price in the {title}:\")\n",
"    # price is continuous, so this counts rows per unique price rather than per class\n",
"    distribution = data.value_counts().sort_index()\n",
"    print(distribution)\n",
"    total = len(data)\n",
"    positive_count = (data > 0).sum()\n",
"    negative_count = (data < 0).sum()\n",
"    positive_percent = (positive_count / total) * 100\n",
"    negative_percent = (negative_count / total) * 100\n",
"    print(f\"Share of positive values: {positive_percent:.2f}%\")\n",
"    print(f\"Share of negative values: {negative_percent:.2f}%\")\n",
"    # This message is printed unconditionally, whatever the shares above are\n",
"    print(\"\\nData augmentation is needed to balance the classes.\\n\")\n",
"\n",
"\n",
"# Analyze the distribution in each sample\n",
"analyze_distribution(y_train, \"training set\")\n",
"analyze_distribution(y_val, \"validation set\")\n",
"analyze_distribution(y_test, \"test set\")"
|
|||
|
]
|
|||
|
},
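{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `price` is a continuous target, `value_counts()` mostly returns counts of one, and the share of positive values says little about balance. A minimal sketch of an alternative check, assuming the `y_train`, `y_val` and `y_test` series from the cell above: bin the price into quantile ranges computed on the training split and compare the share of each bin across the three samples. The helper name `bin_shares` and the choice of five bins are illustrative assumptions, not part of the original lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"\n",
"def bin_shares(target, edges):\n",
"    \"\"\"Return the share of observations that fall into each price bin.\"\"\"\n",
"    binned = pd.cut(target, bins=edges)\n",
"    return binned.value_counts(normalize=True).sort_index()\n",
"\n",
"\n",
"# Bin edges come from the training split only (quintiles of price)\n",
"_, edges = pd.qcut(y_train, q=5, retbins=True, duplicates=\"drop\")\n",
"# Open the outer edges so that prices outside the training range still get a bin\n",
"edges[0], edges[-1] = float(\"-inf\"), float(\"inf\")\n",
"\n",
"summary = pd.DataFrame(\n",
"    {\n",
"        \"train\": bin_shares(y_train, edges),\n",
"        \"val\": bin_shares(y_val, edges),\n",
"        \"test\": bin_shares(y_test, edges),\n",
"    }\n",
")\n",
"print(summary.round(3))"
]
},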
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Applying data augmentation methods"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 12,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
"Distribution of Price in the training set after oversampling:\n",
"price\n",
"327 85\n",
"334 85\n",
"336 85\n",
"337 85\n",
"338 85\n",
" ..\n",
"18791 85\n",
"18795 85\n",
"18797 85\n",
"18804 85\n",
"18806 85\n",
"Name: count, Length: 9476, dtype: int64\n",
"Share of positive values: 100.00%\n",
"Share of negative values: 0.00%\n",
"Distribution of Price in the validation set:\n",
"price\n",
"326 2\n",
"340 1\n",
"344 1\n",
"354 1\n",
"357 1\n",
" ..\n",
"18781 1\n",
"18784 1\n",
"18791 1\n",
"18803 1\n",
"18823 1\n",
"Name: count, Length: 5389, dtype: int64\n",
"Share of positive values: 100.00%\n",
"Share of negative values: 0.00%\n",
"Distribution of Price in the test set:\n",
"price\n",
"335 1\n",
"336 1\n",
"337 1\n",
"351 1\n",
"353 1\n",
" ..\n",
"18766 1\n",
"18768 1\n",
"18780 1\n",
"18788 1\n",
"18818 1\n",
"Name: count, Length: 5308, dtype: int64\n",
"Share of positive values: 100.00%\n",
"Share of negative values: 0.00%\n"
]
}
],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"from imblearn.over_sampling import RandomOverSampler\n",
"\n",
"# Load the data\n",
"df = pd.read_csv(\"../data/Diamonds-Prices.csv\")\n",
"\n",
"# Select the features and the target variable\n",
"X = df.drop(\"price\", axis=1)  # Features (all columns except 'price')\n",
"y = df[\"price\"]  # Target variable ('price')\n",
"\n",
"# Split the data into a training part (60%) and a remainder (validation + test)\n",
"X_train, X_temp, y_train, y_temp = train_test_split(\n",
"    X, y, test_size=0.4, random_state=42\n",
")\n",
"\n",
"# Split the remainder equally into validation and test sets\n",
"X_val, X_test, y_val, y_test = train_test_split(\n",
"    X_temp, y_temp, test_size=0.5, random_state=42\n",
")\n",
"\n",
"# Apply oversampling to the training set only.\n",
"# RandomOverSampler treats every unique price as a class and duplicates rows\n",
"# until each price occurs as often as the most frequent one.\n",
"oversampler = RandomOverSampler(random_state=42)\n",
"X_train_resampled, y_train_resampled = oversampler.fit_resample(X_train, y_train)\n",
"\n",
"\n",
"# Print the distribution of the target and the share of positive and negative values\n",
"def analyze_distribution(data, title):\n",
"    print(f\"Distribution of Price in the {title}:\")\n",
"    distribution = data.value_counts().sort_index()\n",
"    print(distribution)\n",
"    total = len(data)\n",
"    positive_count = (data > 0).sum()\n",
"    negative_count = (data < 0).sum()\n",
"    positive_percent = (positive_count / total) * 100\n",
"    negative_percent = (negative_count / total) * 100\n",
"    print(f\"Share of positive values: {positive_percent:.2f}%\")\n",
"    print(f\"Share of negative values: {negative_percent:.2f}%\")\n",
"\n",
"\n",
"# Analyze the distribution in each sample\n",
"analyze_distribution(y_train_resampled, \"training set after oversampling\")\n",
"analyze_distribution(y_val, \"validation set\")\n",
"analyze_distribution(y_test, \"test set\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"## Let us begin with dataset No. 18.\n",
"\n",
"Link to the source data: https://www.kaggle.com/datasets/dewangmoghe/mobile-phone-price-prediction\n",
"\n",
"**General description**: This dataset contains prices and attributes for 1369 mobile phones of various configurations and manufacturers. There are 17 features (model name, rating (min 0, max 5), specification-based score (min 0, max 100), dual-SIM and network technology support (3G, 4G, 5G, VoLTE), amount of RAM, battery characteristics, display information, camera characteristics, external memory support, Android version, price, manufacturer, built-in memory, fast-charging support, screen resolution, processor type, processor name).\n",
"\n",
"**Problem domain**: Financial analysis and forecasting of mobile phone prices.\n",
"\n",
"**Objects of observation**: Phones, described by the attributes _Name, Rating, Spec_score, No_of_sim, Ram, Battery, Display, Camera, External_Memory, Android_version, Price, company, Inbuilt_memory, fast_charging, Screen_resolution, Processor, Processor_name_.\n",
"\n",
"**Business goals**:\n",
"- ***Forecasting mobile phone prices from the specification score***.\n",
"- ***Forecasting the specification score from the manufacturer and the price***.\n",
"\n",
"**Technical project goals**:\n",
"1. ***Phone price forecasting***: Input data - _specification score_; target feature - _price_,\n",
"2. ***Analysis of influencing factors***: Input data - _manufacturer and price_; target feature - _specification score_."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 13,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Index(['Unnamed: 0', 'Name', 'Rating', 'Spec_score', 'No_of_sim', 'Ram',\n",
|
|||
|
" 'Battery', 'Display', 'Camera', 'External_Memory', 'Android_version',\n",
|
|||
|
" 'Price', 'company', 'Inbuilt_memory', 'fast_charging',\n",
|
|||
|
" 'Screen_resolution', 'Processor', 'Processor_name'],\n",
|
|||
|
" dtype='object')\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"\n",
|
|||
|
"df = pd.read_csv(\"../data/mobile-phone-price-prediction.csv\")\n",
|
|||
|
"print(df.columns)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
"Attributes: \n",
"- Unnamed: 0 (an unnamed first column, apparently a row index), \n",
"- Phone name (Name), \n",
"- Rating (Rating),\n",
"- Specification-based score (Spec_score),\n",
"- SIM and network technology support (No_of_sim),\n",
"- Amount of RAM (Ram),\n",
"- Battery information (Battery),\n",
"- Display information (Display),\n",
"- Camera information (Camera),\n",
"- External memory information (External_Memory),\n",
"- Android version (Android_version),\n",
"- Price (Price),\n",
"- Manufacturer (company),\n",
"- Built-in memory information (Inbuilt_memory),\n",
"- Fast charging (fast_charging),\n",
"- Screen resolution (Screen_resolution),\n",
"- Processor type (Processor),\n",
"- Processor name (Processor_name)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 14,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
"image/png": "(base64-encoded PNG data, truncated in the source)",
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1400x600 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"plt.figure(figsize=(14, 6))\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"plt.scatter(df[\"company\"].str.lower(), df[\"Spec_score\"])\n",
|
|||
|
"plt.xlabel(\"Фирма\")\n",
|
|||
|
"plt.ylabel(\"Оценка характеристик\")\n",
|
|||
|
"plt.xticks(rotation=45)\n",
|
|||
|
"plt.title(\"Диаграмма 1\")\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Между атрибутами присутствует связь. Пример, на диаграмме 1 - связь между фирмой и оценкой характеристик"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Перейдем к проверке на выбросы"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Пустые значения по столбцам:\n",
|
|||
|
"Unnamed: 0 0\n",
|
|||
|
"Name 0\n",
|
|||
|
"Rating 0\n",
|
|||
|
"Spec_score 0\n",
|
|||
|
"No_of_sim 0\n",
|
|||
|
"Ram 0\n",
|
|||
|
"Battery 0\n",
|
|||
|
"Display 0\n",
|
|||
|
"Camera 0\n",
|
|||
|
"External_Memory 0\n",
|
|||
|
"Android_version 443\n",
|
|||
|
"Price 0\n",
|
|||
|
"company 0\n",
|
|||
|
"Inbuilt_memory 19\n",
|
|||
|
"fast_charging 89\n",
|
|||
|
"Screen_resolution 2\n",
|
|||
|
"Processor 28\n",
|
|||
|
"Processor_name 0\n",
|
|||
|
"dtype: int64\n",
|
|||
|
"\n",
|
|||
|
"Количество дубликатов: 0\n",
|
|||
|
"\n",
|
|||
|
"Статистический обзор данных:\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/html": [
|
|||
|
"<div>\n",
|
|||
|
"<style scoped>\n",
|
|||
|
" .dataframe tbody tr th:only-of-type {\n",
|
|||
|
" vertical-align: middle;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe tbody tr th {\n",
|
|||
|
" vertical-align: top;\n",
|
|||
|
" }\n",
|
|||
|
"\n",
|
|||
|
" .dataframe thead th {\n",
|
|||
|
" text-align: right;\n",
|
|||
|
" }\n",
|
|||
|
"</style>\n",
|
|||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|||
|
" <thead>\n",
|
|||
|
" <tr style=\"text-align: right;\">\n",
|
|||
|
" <th></th>\n",
|
|||
|
" <th>Unnamed: 0</th>\n",
|
|||
|
" <th>Rating</th>\n",
|
|||
|
" <th>Spec_score</th>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </thead>\n",
|
|||
|
" <tbody>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>count</th>\n",
|
|||
|
" <td>1370.000000</td>\n",
|
|||
|
" <td>1370.000000</td>\n",
|
|||
|
" <td>1370.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>mean</th>\n",
|
|||
|
" <td>684.500000</td>\n",
|
|||
|
" <td>4.374416</td>\n",
|
|||
|
" <td>80.234307</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>std</th>\n",
|
|||
|
" <td>395.629246</td>\n",
|
|||
|
" <td>0.230176</td>\n",
|
|||
|
" <td>8.373922</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>min</th>\n",
|
|||
|
" <td>0.000000</td>\n",
|
|||
|
" <td>3.750000</td>\n",
|
|||
|
" <td>42.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>25%</th>\n",
|
|||
|
" <td>342.250000</td>\n",
|
|||
|
" <td>4.150000</td>\n",
|
|||
|
" <td>75.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>50%</th>\n",
|
|||
|
" <td>684.500000</td>\n",
|
|||
|
" <td>4.400000</td>\n",
|
|||
|
" <td>82.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>75%</th>\n",
|
|||
|
" <td>1026.750000</td>\n",
|
|||
|
" <td>4.550000</td>\n",
|
|||
|
" <td>86.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" <tr>\n",
|
|||
|
" <th>max</th>\n",
|
|||
|
" <td>1369.000000</td>\n",
|
|||
|
" <td>4.750000</td>\n",
|
|||
|
" <td>98.000000</td>\n",
|
|||
|
" </tr>\n",
|
|||
|
" </tbody>\n",
|
|||
|
"</table>\n",
|
|||
|
"</div>"
|
|||
|
],
|
|||
|
"text/plain": [
|
|||
|
" Unnamed: 0 Rating Spec_score\n",
|
|||
|
"count 1370.000000 1370.000000 1370.000000\n",
|
|||
|
"mean 684.500000 4.374416 80.234307\n",
|
|||
|
"std 395.629246 0.230176 8.373922\n",
|
|||
|
"min 0.000000 3.750000 42.000000\n",
|
|||
|
"25% 342.250000 4.150000 75.000000\n",
|
|||
|
"50% 684.500000 4.400000 82.000000\n",
|
|||
|
"75% 1026.750000 4.550000 86.000000\n",
|
|||
|
"max 1369.000000 4.750000 98.000000"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"null_values = df.isnull().sum()\n",
|
|||
|
"print(\"Пустые значения по столбцам:\")\n",
|
|||
|
"print(null_values)\n",
|
|||
|
"\n",
|
|||
|
"duplicates = df.duplicated().sum()\n",
|
|||
|
"print(f\"\\nКоличество дубликатов: {duplicates}\")\n",
|
|||
|
"\n",
|
|||
|
"print(\"\\nСтатистический обзор данных:\")\n",
|
|||
|
"df.describe()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Видим, что есть пустые данные, но нет дубликатов. Удаляем их"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 17,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"В наборе данных 'Phones' было удалено 553 строк с пустыми значениями.\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"def drop_missing_values(dataframe, name):\n",
|
|||
|
" before_shape = dataframe.shape\n",
|
|||
|
" cleaned_dataframe = dataframe.dropna()\n",
|
|||
|
" after_shape = cleaned_dataframe.shape\n",
|
|||
|
" print(\n",
|
|||
|
" f\"В наборе данных '{name}' было удалено {before_shape[0] - after_shape[0]} строк с пустыми значениями.\"\n",
|
|||
|
" )\n",
|
|||
|
" return cleaned_dataframe\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"cleaned_df = drop_missing_values(df, \"Phones\")"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Рассчитаем коэффициент ассиметрии"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 20,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"\n",
|
|||
|
"Коэффициент асимметрии для столбца 'Unnamed: 0': 0.0\n",
|
|||
|
"\n",
|
|||
|
"Коэффициент асимметрии для столбца 'Rating': -0.06697860128699223\n",
|
|||
|
"\n",
|
|||
|
"Коэффициент асимметрии для столбца 'Spec_score': -0.7393772365886471\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import numpy as np\n",
|
|||
|
"for column in df.select_dtypes(include=[np.number]).columns:\n",
|
|||
|
" asymmetry = df[column].skew()\n",
|
|||
|
" print(f\"\\nКоэффициент асимметрии для столбца '{column}': {asymmetry}\")"
|
|||
|
]
|
|||
|
},
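A minimal sketch of what the skewness coefficient measures, written in plain Python so it runs without the dataset. It assumes the bias-corrected (adjusted Fisher-Pearson) estimator, which is believed to be what `Series.skew()` reports. The function name `sample_skewness` and the sample lists are illustrative and not part of the notebook.

```python
import math


def sample_skewness(values):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson), pure Python."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = sum(values) / n
    # Second and third central moments (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    g1 = m3 / m2 ** 1.5
    # Small-sample correction factor.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# A plain row index 0..n-1 is perfectly symmetric, hence skewness 0.
row_index = list(range(10))
# Hypothetical scores with one very low value: the long left tail gives a negative result.
left_tailed_scores = [42, 75, 80, 82, 84, 86, 90]

print(sample_skewness(row_index))
print(sample_skewness(left_tailed_scores))
```

A value near zero means a roughly symmetric distribution, and a negative value means a longer tail towards low values. This matches the sign seen for `Spec_score` above.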
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Выбросы незначительные.\n",
|
|||
|
"\n",
|
|||
|
"Очистим данные от шумов."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 21,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1IAAAJLCAYAAADtiKfgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAADus0lEQVR4nOzdeXxTVfo/8E/a0oUusQW6AELL4mBFBAQUZBsFQRQQHRFHUOQnMDKoIKMjDKuyjKKi4ICAyqqjjgiCIoiALFIWBUQsqEDL2lKg0AW60Ob8/ug3SdMmaZLmJk/Sz/v16kt7c0iennvvufe59+Y8OqWUAhERERERETkswNsBEBERERER+RomUkRERERERE5iIkVEREREROQkJlJEREREREROYiJFRERERETkJCZSRERERERETmIiRURERERE5CQmUkRERERERE5iIkVEREREROQkJlJEREREgp05cwZLly41/Z6eno6PPvrIewEREQAmUkSksc8//xw6nc7qT8uWLb0dHhGReDqdDn//+9+xceNGpKen46WXXsKOHTu8HRZRjRfk7QCIqGaYMGECbr75ZtPvM2bM8GI0RES+o0GDBhg+fDh69+4NAEhISMD333/v3aCICDqllPJ2EETkvz7//HM88sgj2Lp1K7p3725a3r17d1y8eBGHDx/2XnBERD7k+PHjuHjxIlq2bInw8HBvh0NU4/HRPiLSVHFxMQAgIKDq4Wbp0qXQ6XRIT083LTMYDGjVqhV0Op3FdwQOHTqEoUOHokmTJggNDUV8fDyGDRuGS5cuWbzn1KlTrT5WGBRkviHfvXt3tGzZEj/99BM6deqEsLAwJCUl4b333qv0t0yePBm333479Ho9wsPD0aVLF2zdutWiXXp6uulz1qxZY/FaYWEhoqOjodPp8MYbb1SKMzY2FtevX7f4N//9739N73fx4kXT8i+//BL3338/6tevj5CQEDRt2hSvvvoqSktLq+xr4+cdPXoUAwcORFRUFOrUqYPnn38ehYWFFm2XLFmCu+++G7GxsQgJCUFycjIWLFhg9X2/+eYbdOvWDZGRkYiKikL79u3x8ccfW7TZs2cP+vTpg+joaISHh6NVq1Z45513LNocPXoUf/nLXxATE4PQ0FC0a9cOa9eutWjjzPYydOhQi/UfHR2N7t27V3o8ytE+NW4zFb3xxhuVYkpMTMTQoUMt2v3vf/+DTqdDYmKixfKsrCz8v//3/9CoUSMEBgaa4o2IiKj0WRUlJibafIxWp9NVar9y5UrcfvvtCAsLQ0xMDAYNGoTTp09b/Tur2jcAoKioCFOmTEGzZs0QEhKCG2+8ES+99BKKiooqtf3+++8djrMi47Zr7e8v38/ObB8ATPtCvXr1EBYWhj/96U/417/+ZfGZ9n6Md4i6d+9ucdEIKLsDHxAQUGlf+N///mdaB3Xr1sXgwYNx9uxZizZDhw41bSdNmzbFHXfcgezsbISFhVX6+4jIs/hoHxFpyphIhYSEuPTvV6xYgV9++aXS8k2bNuHEiRN46qmnEB8fj19//RWLFi3Cr7/+it27d1c60VqwYIHFyWjFxO7y5cvo06cPBg4ciMceewyfffYZnnnmGQQHB2PYsGEAgNzcXLz//vt47LHHMHz4cOTl5eGDDz5Ar169sHfvXrRu3driPUNDQ7FkyRI8+OCDpmVffPFFpUSlvLy8PHz11VcYMGCAadmSJUsQGhpa6d8tXboUEREReOGFFxAREYEtW7Zg8uTJyM3NxezZs21+RnkDBw5EYmIiZs2ahd27d2Pu3Lm4fPkyli9fbtF3t9xyC/r164egoCCsW7cOo0aNgsFgwN///neLeIYNG4ZbbrkF48ePxw033IADBw5gw4YN+Otf/wqgbL098MADSEhIwPPPP4/4+HgcOXIEX331FZ5//nkAwK+//oq77roLDRo0wMsvv4zw8HB89tlnePDBB7Fq1SqLvqnI1vYCAHXr1s
WcOXMAlH15/5133kGfPn1w+vRp3HDDDW7r06qUlJSYTtArevLJJ/Hdd9/h2WefxW233YbAwEAsWrQI+/fvd+i9W7dujXHjxlksW758OTZt2mSxbMaMGZg0aRIGDhyIp59+GhcuXMC8efPQtWtXHDhwwNQfgGP7hsFgQL9+/bBz506MGDECN998M3755RfMmTMHv//+e6ULCkbPPfcc2rdvbzNOd7O1fRw6dAhdunRBrVq1MGLECCQmJuL48eNYt24dZsyYgYceegjNmjUztR87dixuvvlmjBgxwrSs/KPL5S1ZsgQTJ07Em2++adoPgLJt7amnnkL79u0xa9YsnD9/Hu+88w5++OGHSuugosmTJ9sdR4jIQxQRkYbefvttBUD9/PPPFsu7deumbrnlFotlS5YsUQBUWlqaUkqpwsJC1ahRI3XfffcpAGrJkiWmtteuXav0Wf/9738VALV9+3bTsilTpigA6sKFCzZj7NatmwKg3nzzTdOyoqIi1bp1axUbG6uKi4uVUkqVlJSooqIii397+fJlFRcXp4YNG2ZalpaWpgCoxx57TAUFBanMzEzTa/fcc4/661//qgCo2bNnV4rzscceUw888IBp+cmTJ1VAQIB67LHHKv0d1vpg5MiRqnbt2qqwsNDm31v+8/r162exfNSoUZXWl7XP6dWrl2rSpInp9ytXrqjIyEh1xx13qIKCAou2BoNBKVXWf0lJSapx48bq8uXLVtsoVdZHt956q8XfYDAYVKdOnVTz5s1Ny5zZXp588knVuHFji89ctGiRAqD27t1r92+11qfWtl+llJo9e7ZFTEop1bhxY/Xkk0+afp8/f74KCQlRf/7zny1iKigoUAEBAWrkyJEW7/nkk0+q8PDwSp9VUePGjdX9999fafnf//53Vf5wn56ergIDA9WMGTMs2v3yyy8qKCjIYrmj+8aKFStUQECA2rFjh8V7vvfeewqA+uGHHyyWf/vttwqA+vzzz23Gacu0adMUAIttxvj3l+9nZ7aPrl27qsjISHXy5EmL96z4GbY+q7xu3bqpbt26KaWU+vrrr1VQUJAaN26cRZvi4mIVGxurWrZsabG/fPXVVwqAmjx5smlZxW338OHDKiAgwPR3lN/WiMiz+GgfEWnK+KhdvXr1nP63//nPf3Dp0iVMmTKl0mthYWGm/y8sLMTFixdx5513AoDDV+/LCwoKwsiRI02/BwcHY+TIkcjKysJPP/0EAAgMDERwcDCAsivw2dnZKCkpQbt27ax+Ztu2bXHLLbdgxYoVAICTJ09i69atlR7zKm/YsGHYsGEDMjMzAQDLli1Dx44dcdNNN1VqW74P8vLycPHiRXTp0gXXrl3D0aNHHfq7y99RAoBnn30WALB+/Xqrn5OTk4OLFy+iW7duOHHiBHJycgCU3WnKy8vDyy+/jNDQUIv3NN4dPHDgANLS0jBmzJhKV9uNbbKzs7FlyxYMHDjQ9DddvHgRly5dQq9evfDHH39UevTJyN72ApStM+P7HTx4EMuXL0dCQoLFnQRn+rS0tNT0fsafa9euWf1so2vXruGVV17B6NGj0ahRI4vXrl69CoPBgDp16th9j+r64osvYDAYMHDgQIvY4+Pj0bx580qPqjqyb/zvf//DzTffjBYtWli859133w0Ald7TeDel4rbiiNjYWABldxWdYWv7uHDhArZv345hw4ZVWieOPGpoy969ezFw4EA8/PDDle5m/vjjj8jKysKoUaMs+uD+++9HixYt8PXXX9t83/Hjx6Nt27Z45JFHXI6NiNyDj/YRkaZOnjyJoKAgpxOpnJwczJw5Ey+88ALi4uIqvZ6dnY1p06bhk08+QVZWVqV/66z69etX+vK2MXlJT083JWnLli3Dm2++iaNHj1p8lykpKcnq+z711FNYtGgR/vGPf2Dp0qXo1KkTmjdvbjOO1q1bo2XLlli+fDlefPFFLF26FBMmTKj03RWg7BG4iRMnYsuWLcjNzbV4zd
E+qBhL06ZNERAQYPG9ix9++AFTpkxBSkpKpUQhJycHer0ex48fBwC7U9o70ubYsWNQSmHSpEmYNGmS1TZZWVlo0KB
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x600 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Выбросы в датасете:\n",
|
|||
|
" Unnamed: 0 Name Rating Spec_score \\\n",
|
|||
|
"99 99 Vivo Y02 4.35 54 \n",
|
|||
|
"214 214 Realme C30s 4.55 58 \n",
|
|||
|
"802 802 Vivo Y02 (2GB RAM + 32GB) 4.50 53 \n",
|
|||
|
"803 803 Vivo Y02 4.35 54 \n",
|
|||
|
"1344 1344 TCL 501 4.25 55 \n",
|
|||
|
"\n",
|
|||
|
" No_of_sim Ram Battery Display \\\n",
|
|||
|
"99 Dual Sim, 3G, 4G, VoLTE, 3 GB RAM 5000 mAh Battery 6.51 inches \n",
|
|||
|
"214 Dual Sim, 3G, 4G, VoLTE, 2 GB RAM 5000 mAh Battery 6.5 inches \n",
|
|||
|
"802 Dual Sim, 3G, 4G, VoLTE, 2 GB RAM 5000 mAh Battery 6.51 inches \n",
|
|||
|
"803 Dual Sim, 3G, 4G, VoLTE, 3 GB RAM 5000 mAh Battery 6.51 inches \n",
|
|||
|
"1344 Dual Sim, 3G, 4G, VoLTE, 2 GB RAM 3000 mAh Battery 6 inches \n",
|
|||
|
"\n",
|
|||
|
" Camera External_Memory \\\n",
|
|||
|
"99 8 MP Rear & 5 MP Front Camera Memory Card Supported, upto 1 TB \n",
|
|||
|
"214 8 MP Rear & 5 MP Front Camera Memory Card Supported, upto 1 TB \n",
|
|||
|
"802 8 MP Rear & 5 MP Front Camera Memory Card Supported, upto 1 TB \n",
|
|||
|
"803 8 MP Rear & 5 MP Front Camera Memory Card Supported, upto 1 TB \n",
|
|||
|
"1344 5 MP Rear & 2 MP Front Camera Memory Card Supported \n",
|
|||
|
"\n",
|
|||
|
" Android_version Price company Inbuilt_memory fast_charging \\\n",
|
|||
|
"99 12 9,999 Vivo 32 GB inbuilt 10W Fast Charging \n",
|
|||
|
"214 12 6,950 Realme 32 GB inbuilt 10W Fast Charging \n",
|
|||
|
"802 12 8,999 Vivo 32 GB inbuilt 10W Fast Charging \n",
|
|||
|
"803 12 8,489 Vivo 32 GB inbuilt 10W Fast Charging \n",
|
|||
|
"1344 14 7,990 TCL 32 GB inbuilt 10W Fast Charging \n",
|
|||
|
"\n",
|
|||
|
" Screen_resolution Processor \\\n",
|
|||
|
"99 720 x 1600 px Display with Water Drop Notch Octa Core Processor \n",
|
|||
|
"214 720 x 1600 px Display with Water Drop Notch Octa Core \n",
|
|||
|
"802 720 x 1600 px Display with Water Drop Notch Octa Core Processor \n",
|
|||
|
"803 720 x 1600 px Display with Water Drop Notch Octa Core Processor \n",
|
|||
|
"1344 540 x 1092 px Display Octa Core \n",
|
|||
|
"\n",
|
|||
|
" Processor_name \n",
|
|||
|
"99 Helio \n",
|
|||
|
"214 Unisoc SC9863A \n",
|
|||
|
"802 Helio \n",
|
|||
|
"803 Helio \n",
|
|||
|
"1344 Helio G36 \n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA0kAAAJLCAYAAAAyxt3/AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzde3hTZbo28DtJSdNjbAqlRaQtoONUYASlM2UsuFUEURCd2aIjIDJWNoyjoltHGLFUlG51vsHTyEAdQcBRnO1hU6dW8QRlRMsWlMGo4yEtIsViAz3QpmmS9f2RnbZpDl1Js1bepPfvunpdNH1In6618q715F15H40kSRKIiIiIiIgIAKCNdgJEREREREQiYZFERERERETUC4skIiIiIiKiXlgkERERERER9cIiiYiIiIiIqBcWSURERERERL2wSCIiIiIiIuqFRRIREREREVEvLJKIiIiIiIh6YZFERERERETUC4skIlLEf//3f0Oj0fj9GjduXLTTIyIiIgooIdoJEFF8W7lyJX784x93f//ggw9GMRsiIiKi/rFIIiJFTZ8+HRdeeGH3908//TR++OGH6CVERERE1A/ebkdEirDb7QAArbb/YWbz5s3QaDSoq6vrfszlcmHChAnQaDTYvHlz9+MHDx7EokWLMHr0aBgMBmRnZ2Px4sVoamryes7Vq1f7vdUvIaHnvaELL7wQ48aNw0cffYQpU6YgKSkJ+fn5+POf/+zzt9x3330477zzYDQakZKSguLiYrz77rtecXV1dd2/59VXX/X6mc1mQ0ZGBjQaDf7whz/45JmVlYWuri6v//P88893P1/vwvJ//ud/cPnll2PEiBFITEzEmDFjsGbNGjidzn63tef3ff7557jmmmuQnp6OzMxM3HbbbbDZbF6xmzZtwkUXXYSsrCwkJiaioKAA69ev9/u8r7/+OqZNm4a0tDSkp6dj8uTJ+Otf/+oV8+GHH2LWrFnIyMhASkoKJkyYgMcee8wr5vPPP8cvf/lLmEwmGAwGnH/++dixY4dXTCjHy6JFi7z2f0ZGBi688ELU1NR4Pafcbeo5Zvr6wx/+4JNTXl4eFi1a5BX3t7/9DRqNBnl5eV6PNzY24te//jVGjRoFnU7XnW9qaqrP7+orLy8v4K2tGo3GK9bhcGDNmjUYM2YMEhMTkZeXh5UrV6Kzs9PneeXs097HfLDf63K58Oijj+Kcc86BwWDA8OHDsWTJEpw4cULW39d3O7733nvQaDR47733uh+78MILvd6QAYB9+/b5zQcAtm3bhsLCQiQnJyMjIwNTp07Fm2++2f07g21Tz/7z/P29j7nW1lacd955yM/PR0NDQ8A4APjNb34DjUbj8/cRUfRxJomIFOEpkhITE8P6/1u3bsU///lPn8d37tyJb775BjfeeCOys7Px6aefYuPGjfj000/xwQcf+FwMrV+/3utCs2/RduLECcyaNQvXXHMNrrvuOrz44otYunQp9Ho9Fi9eDABoaWnB008/jeuuuw4lJSVobW3FX/7yF8yYMQO1tbU499xzvZ7TYDBg06ZNmDt3bvdjL7/8sk8R0ltraytee+01XHXVVd2Pbdq0CQaDwef/bd68GampqbjjjjuQmpqKd955B/fddx9aWlrwyCOPBPwdvV1zzTXIy8tDeXk5PvjgAzz++OM4ceIEtmzZ4rXtzjnnHMyZMwcJCQmorKzEsmXL4HK58Jvf/MYrn8WLF+Occ87BihUrcNppp+HAgQOorq7Gr371KwDu/XbFFVcgJycHt912G7Kzs/HZZ5/htddew2233QYA+PTTT/Hzn/8cp59+Ou655x6kpKTgxRdfxNy5c/HSSy95bZu+Ah0vADB06FCsW7cOAHDkyBE89thjmDVrFr799lucdtppEdum/XE4HPj973/v92c33HAD3nrrLfz2t7/FT37yE+h0OmzcuBH79++X9dznnnsu7rzzTq/HtmzZgp07d3o9dt
NNN+HZZ5/FL3/5S9x555348MMPUV5ejs8++wyvvPJKd5ycfdrbzTffjOLiYgDuY733cwHAkiVLsHnzZtx444249dZbYbFY8OSTT+LAgQP4xz/+gSFDhsj6O0P1u9/9zu/jZWVlWL16NaZMmYL7778fer0eH374Id555x1ceumlePTRR9HW1gYA+Oyzz7B27VqvW4cDFa9dXV34xS9+gcOHD+Mf//gHcnJyAub21VdfoaKiYoB/IREpRiIiUsCjjz4qAZA++eQTr8enTZsmnXPOOV6Pbdq0SQIgWSwWSZIkyWazSaNGjZIuu+wyCYC0adOm7tj29naf3/X8889LAKTdu3d3P1ZaWioBkI4fPx4wx2nTpkkApP/3//5f92OdnZ3SueeeK2VlZUl2u12SJElyOBxSZ2en1/89ceKENHz4cGnx4sXdj1ksFgmAdN1110kJCQnSsWPHun928cUXS7/61a8kANIjjzzik+d1110nXXHFFd2P19fXS1qtVrruuut8/g5/22DJkiVScnKyZLPZAv69vX/fnDlzvB5ftmyZz/7y93tmzJghjR49uvv7kydPSmlpadJPf/pTqaOjwyvW5XJJkuTefvn5+VJubq504sQJvzGS5N5G48eP9/obXC6XNGXKFOnMM8/sfiyU4+WGG26QcnNzvX7nxo0bJQBSbW1t0L/V3zb1d/xKkiQ98sgjXjlJkiTl5uZKN9xwQ/f3Tz31lJSYmCj927/9m1dOHR0dklarlZYsWeL1nDfccIOUkpLi87v6ys3NlS6//HKfx3/zm99IvU/zH3/8sQRAuummm7zi/vM//1MCIL3zzjuSJMnbpx5ffvmlBEB69tlnux/zHGMeNTU1EgDpueee8/q/1dXVfh/vKz8/X1q4cKHXY++++64EQHr33Xe7H5s2bZo0bdq07u+rqqokANLMmTO98vnyyy8lrVYrXXXVVZLT6Qz69wX6XR6e1/ymTZskl8slXX/99VJycrL04YcfBozzuOaaa6Rx48ZJZ5xxhtdxQkRi4O12RKQIz+1vw4YNC/n//ulPf0JTUxNKS0t9fpaUlNT9b5vNhh9++AE/+9nPAED2u+69JSQkYMmSJd3f6/V6LFmyBI2Njfjoo48AADqdDnq9HoD7tiGr1QqHw4Hzzz/f7++cNGkSzjnnHGzduhUAUF9fj3fffTfoLTWLFy9GdXU1jh07BgB49tlnUVRUhLPOOssntvc2aG1txQ8//IDi4mK0t7fj888/l/V3954JAoDf/va3AICqqiq/v6e5uRk//PADpk2bhm+++QbNzc0A3DNEra2tuOeee2AwGLye0zOrd+DAAVgsFtx+++3dMzd9Y6xWK9555x1cc8013X/TDz/8gKamJsyYMQNffvklvvvuO79/S7DjBXDvM8/zffzxx9iyZQtycnK8FhQJZZs6nc7u5/N8tbe3+/3dHu3t7bj//vtxyy23YNSoUV4/O3XqFFwuFzIzM4M+x0B59u0dd9zh9bhnBurvf/87AHn71EPOjPHf/vY3GI1GTJ8+3WubnXfeeUhNTfW5bbWvrKwsHDlyRMZf2EOSJKxYsQK/+MUv8NOf/tTrZ6+++ipcLhfuu+8+n5llf7flyXXXXXfhueeew4svvojCwsKgsR999BH+9re/oby8XNYtyUSkPr4yiUgR9fX1SEhICLlIam5uxtq1a3HHHXdg+PDhPj+3Wq247bbbMHz4cCQlJWHYsGHIz8/v/r+hGjFiBFJSUrwe8xQmvT9f8uyzz2LChAkwGAzIzMzEsGHD8Pe//z3g77zxxhuxadMmAO5bl6ZMmYIzzzwzYB7nnnsuxo0bhy1btkCSpO5bk/z59NNPcdVVV8FoNCI9PR3Dhg3D/PnzAcjfBn1zGTNmDLRardff/I9//AOXXHIJUlJScNppp2HYsGFYuXKl1+/5+uuvASDosu5yYr766itIkoRVq1Zh2LBhXl+e4qexsdHn//V3vADAt99+2/1cEydOxN
dff42XXnrJ65apULbp559/HjDHQP74xz/CZrN1b7/eMjMzceaZZ+Lpp5/Gm2++icbGRvzwww9+Pyc0EPX19dBqtRg
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x600 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"plt.figure(figsize=(10, 6))\n",
|
|||
|
"plt.scatter(cleaned_df[\"company\"].str.lower(), cleaned_df[\"Spec_score\"])\n",
|
|||
|
"plt.xlabel(\"Фирма\")\n",
|
|||
|
"plt.ylabel(\"Оценка характеристик\")\n",
|
|||
|
"plt.xticks(rotation=45)\n",
|
|||
|
"plt.title(\"Диаграмма рассеивания перед чисткой\")\n",
|
|||
|
"plt.show()\n",
|
|||
|
"\n",
|
|||
|
"Q1 = cleaned_df[\"Spec_score\"].quantile(0.25)\n",
|
|||
|
"Q3 = cleaned_df[\"Spec_score\"].quantile(0.75)\n",
|
|||
|
"\n",
|
|||
|
"IQR = Q3 - Q1\n",
|
|||
|
"\n",
|
|||
|
"threshold = 1.5 * IQR\n",
|
|||
|
"lower_bound = Q1 - threshold\n",
|
|||
|
"upper_bound = Q3 + threshold\n",
|
|||
|
"\n",
|
|||
|
"outliers = (cleaned_df[\"Spec_score\"] < lower_bound) | (\n",
|
|||
|
" cleaned_df[\"Spec_score\"] > upper_bound\n",
|
|||
|
")\n",
|
|||
|
"\n",
|
|||
|
"print(\"Выбросы в датасете:\")\n",
|
|||
|
"print(cleaned_df[outliers])\n",
|
|||
|
"\n",
|
|||
|
"median_score = cleaned_df[\"Spec_score\"].median()\n",
|
|||
|
"cleaned_df.loc[outliers, \"Spec_score\"] = median_score\n",
|
|||
|
"\n",
|
|||
|
"plt.figure(figsize=(10, 6))\n",
|
|||
|
"plt.scatter(cleaned_df[\"company\"].str.lower(), cleaned_df[\"Spec_score\"])\n",
|
|||
|
"plt.xlabel(\"Фирма\")\n",
|
|||
|
"plt.ylabel(\"Оценка характеристик\")\n",
|
|||
|
"plt.xticks(rotation=45)\n",
|
|||
|
"plt.title(\"Диаграмма рассеивания после чистки\")\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
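The cleaning step above can be sketched on a small list without pandas. This is an illustration under stated assumptions, not the notebook's code: the helper names `quantile` and `replace_outliers_with_median` and the sample scores are hypothetical, and quantiles use linear interpolation between neighbouring sorted values.

```python
def quantile(sorted_values, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the median."""
    ordered = sorted(values)
    q1 = quantile(ordered, 0.25)
    q3 = quantile(ordered, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    median = quantile(ordered, 0.5)
    cleaned = [median if (v < lower or v > upper) else v for v in values]
    return cleaned, (lower, upper)


# Hypothetical spec scores; 53 lies far below the rest.
scores = [53, 75, 78, 80, 82, 84, 86, 88, 90]
cleaned, bounds = replace_outliers_with_median(scores)
print(bounds)
print(cleaned)
```

Replacing with the median keeps the row count unchanged, which is why the dataset still has the same number of rows after this step. Dropping the outlier rows would be the alternative design choice.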
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Разбиваем на выборки."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 22,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Размер обучающей выборки: 489\n",
|
|||
|
"Размер контрольной выборки: 164\n",
|
|||
|
"Размер тестовой выборки: 164\n",
|
|||
|
"\n",
|
|||
|
"Распределение оценки характеристик в обучающей выборке:\n",
|
|||
|
"Spec_score\n",
|
|||
|
"75 48\n",
|
|||
|
"86 35\n",
|
|||
|
"80 34\n",
|
|||
|
"84 32\n",
|
|||
|
"85 23\n",
|
|||
|
"78 23\n",
|
|||
|
"83 23\n",
|
|||
|
"77 19\n",
|
|||
|
"79 19\n",
|
|||
|
"82 18\n",
|
|||
|
"89 17\n",
|
|||
|
"88 17\n",
|
|||
|
"71 16\n",
|
|||
|
"73 15\n",
|
|||
|
"72 13\n",
|
|||
|
"74 13\n",
|
|||
|
"87 12\n",
|
|||
|
"69 11\n",
|
|||
|
"76 10\n",
|
|||
|
"81 10\n",
|
|||
|
"67 9\n",
|
|||
|
"90 9\n",
|
|||
|
"70 8\n",
|
|||
|
"68 8\n",
|
|||
|
"91 8\n",
|
|||
|
"64 7\n",
|
|||
|
"93 7\n",
|
|||
|
"92 6\n",
|
|||
|
"66 5\n",
|
|||
|
"94 4\n",
|
|||
|
"63 4\n",
|
|||
|
"96 2\n",
|
|||
|
"95 1\n",
|
|||
|
"65 1\n",
|
|||
|
"60 1\n",
|
|||
|
"61 1\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"\n",
|
|||
|
"Распределение оценки характеристик в контрольной выборке:\n",
|
|||
|
"Spec_score\n",
|
|||
|
"75 18\n",
|
|||
|
"81 12\n",
|
|||
|
"74 11\n",
|
|||
|
"79 9\n",
|
|||
|
"82 9\n",
|
|||
|
"85 9\n",
|
|||
|
"84 8\n",
|
|||
|
"86 8\n",
|
|||
|
"76 7\n",
|
|||
|
"78 7\n",
|
|||
|
"77 7\n",
|
|||
|
"83 6\n",
|
|||
|
"89 5\n",
|
|||
|
"71 5\n",
|
|||
|
"72 5\n",
|
|||
|
"80 4\n",
|
|||
|
"70 4\n",
|
|||
|
"88 3\n",
|
|||
|
"68 3\n",
|
|||
|
"65 3\n",
|
|||
|
"73 3\n",
|
|||
|
"67 2\n",
|
|||
|
"87 2\n",
|
|||
|
"63 2\n",
|
|||
|
"95 2\n",
|
|||
|
"93 2\n",
|
|||
|
"90 2\n",
|
|||
|
"94 1\n",
|
|||
|
"66 1\n",
|
|||
|
"92 1\n",
|
|||
|
"69 1\n",
|
|||
|
"98 1\n",
|
|||
|
"61 1\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"\n",
|
|||
|
"Распределение оценки характеристик в тестовой выборке:\n",
|
|||
|
"Spec_score\n",
|
|||
|
"75 15\n",
|
|||
|
"84 13\n",
|
|||
|
"76 11\n",
|
|||
|
"82 10\n",
|
|||
|
"81 9\n",
|
|||
|
"80 9\n",
|
|||
|
"77 8\n",
|
|||
|
"83 8\n",
|
|||
|
"86 7\n",
|
|||
|
"89 6\n",
|
|||
|
"78 6\n",
|
|||
|
"79 6\n",
|
|||
|
"87 5\n",
|
|||
|
"71 5\n",
|
|||
|
"74 5\n",
|
|||
|
"85 5\n",
|
|||
|
"70 4\n",
|
|||
|
"94 3\n",
|
|||
|
"72 3\n",
|
|||
|
"73 3\n",
|
|||
|
"66 3\n",
|
|||
|
"91 3\n",
|
|||
|
"88 3\n",
|
|||
|
"92 3\n",
|
|||
|
"93 2\n",
|
|||
|
"96 1\n",
|
|||
|
"64 1\n",
|
|||
|
"90 1\n",
|
|||
|
"67 1\n",
|
|||
|
"62 1\n",
|
|||
|
"65 1\n",
|
|||
|
"68 1\n",
|
|||
|
"95 1\n",
|
|||
|
"69 1\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"train_df, test_df = train_test_split(cleaned_df, test_size=0.2, random_state=42)\n",
|
|||
|
"\n",
|
|||
|
"train_df, val_df = train_test_split(train_df, test_size=0.25, random_state=42)\n",
|
|||
|
"\n",
|
|||
|
"print(\"Размер обучающей выборки:\", len(train_df))\n",
|
|||
|
"print(\"Размер контрольной выборки:\", len(val_df))\n",
|
|||
|
"print(\"Размер тестовой выборки:\", len(test_df))\n",
|
|||
|
"\n",
|
|||
|
"print()\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"def check_balance(df, name):\n",
|
|||
|
" counts = df[\"Spec_score\"].value_counts()\n",
|
|||
|
" print(f\"Распределение оценки характеристик в {name}:\")\n",
|
|||
|
" print(counts)\n",
|
|||
|
" print()\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"check_balance(train_df, \"обучающей выборке\")\n",
|
|||
|
"check_balance(val_df, \"контрольной выборке\")\n",
|
|||
|
"check_balance(test_df, \"тестовой выборке\")"
|
|||
|
]
|
|||
|
},
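The reported sizes (489 / 164 / 164) follow from the two-stage split: first 20% of all rows go to the test set, then 25% of the remainder goes to the validation set. A small sketch of that arithmetic, assuming the splitter rounds a fractional test size up to a whole number of rows; the function name `two_stage_split_sizes` is illustrative only.

```python
import math


def two_stage_split_sizes(n_rows, test_size=0.2, val_size=0.25):
    """Return (train, validation, test) row counts for a two-stage split."""
    # Stage 1: carve the test set out of all rows (rounded up, by assumption).
    n_test = math.ceil(test_size * n_rows)
    n_rest = n_rows - n_test
    # Stage 2: carve the validation set out of what is left.
    n_val = math.ceil(val_size * n_rest)
    n_train = n_rest - n_val
    return n_train, n_val, n_test


# 1370 original rows minus 553 rows with missing values leaves 817 rows.
print(two_stage_split_sizes(1370 - 553))
```

Because the second fraction applies to the remaining 80% of the rows, 0.25 × 0.8 = 0.2 of the full dataset ends up in the validation set, which gives the roughly 60 / 20 / 20 proportions.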
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Оверсемплинг и андерсемплинг"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 25,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Оверсэмплинг:\n",
|
|||
|
"Распределение оценки характеристик в обучающей выборке:\n",
|
|||
|
"Spec_score\n",
|
|||
|
"85 48\n",
|
|||
|
"78 48\n",
|
|||
|
"75 48\n",
|
|||
|
"82 48\n",
|
|||
|
"64 48\n",
|
|||
|
"73 48\n",
|
|||
|
"79 48\n",
|
|||
|
"87 48\n",
|
|||
|
"86 48\n",
|
|||
|
"80 48\n",
|
|||
|
"70 48\n",
|
|||
|
"83 48\n",
|
|||
|
"68 48\n",
|
|||
|
"74 48\n",
|
|||
|
"71 48\n",
|
|||
|
"72 48\n",
|
|||
|
"66 48\n",
|
|||
|
"93 48\n",
|
|||
|
"77 48\n",
|
|||
|
"88 48\n",
|
|||
|
"69 48\n",
|
|||
|
"89 48\n",
|
|||
|
"84 48\n",
|
|||
|
"94 48\n",
|
|||
|
"76 48\n",
|
|||
|
"95 48\n",
|
|||
|
"90 48\n",
|
|||
|
"63 48\n",
|
|||
|
"81 48\n",
|
|||
|
"67 48\n",
|
|||
|
"91 48\n",
|
|||
|
"92 48\n",
|
|||
|
"96 48\n",
|
|||
|
"65 48\n",
|
|||
|
"60 48\n",
|
|||
|
"61 48\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"\n",
|
|||
|
"Распределение оценки характеристик в контрольной выборке:\n",
|
|||
|
"Spec_score\n",
|
|||
|
"75 18\n",
|
|||
|
"94 18\n",
|
|||
|
"72 18\n",
|
|||
|
"82 18\n",
|
|||
|
"70 18\n",
|
|||
|
"74 18\n",
|
|||
|
"68 18\n",
|
|||
|
"88 18\n",
|
|||
|
"71 18\n",
|
|||
|
"80 18\n",
|
|||
|
"92 18\n",
|
|||
|
"86 18\n",
|
|||
|
"66 18\n",
|
|||
|
"81 18\n",
|
|||
|
"84 18\n",
|
|||
|
"79 18\n",
|
|||
|
"73 18\n",
|
|||
|
"76 18\n",
|
|||
|
"67 18\n",
|
|||
|
"95 18\n",
|
|||
|
"78 18\n",
|
|||
|
"85 18\n",
|
|||
|
"83 18\n",
|
|||
|
"77 18\n",
|
|||
|
"89 18\n",
|
|||
|
"98 18\n",
|
|||
|
"69 18\n",
|
|||
|
"90 18\n",
|
|||
|
"87 18\n",
|
|||
|
"65 18\n",
|
|||
|
"63 18\n",
|
|||
|
"93 18\n",
|
|||
|
"61 18\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"\n",
|
|||
|
"Распределение оценки характеристик в тестовой выборке:\n",
|
|||
|
"Spec_score\n",
|
|||
|
"80 15\n",
|
|||
|
"94 15\n",
|
|||
|
"82 15\n",
|
|||
|
"77 15\n",
|
|||
|
"75 15\n",
|
|||
|
"79 15\n",
|
|||
|
"96 15\n",
|
|||
|
"83 15\n",
|
|||
|
"76 15\n",
|
|||
|
"71 15\n",
|
|||
|
"64 15\n",
|
|||
|
"78 15\n",
|
|||
|
"84 15\n",
|
|||
|
"91 15\n",
|
|||
|
"74 15\n",
|
|||
|
"93 15\n",
|
|||
|
"87 15\n",
|
|||
|
"89 15\n",
|
|||
|
"81 15\n",
|
|||
|
"66 15\n",
|
|||
|
"86 15\n",
|
|||
|
"92 15\n",
|
|||
|
"88 15\n",
|
|||
|
"73 15\n",
|
|||
|
"90 15\n",
|
|||
|
"67 15\n",
|
|||
|
"85 15\n",
|
|||
|
"72 15\n",
|
|||
|
"62 15\n",
|
|||
|
"70 15\n",
|
|||
|
"65 15\n",
|
|||
|
"68 15\n",
|
|||
|
"95 15\n",
|
|||
|
"69 15\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"\n",
|
|||
|
"Андерсэмплинг:\n",
|
|||
|
"Распределение оценки характеристик в обучающей выборке:\n",
|
|||
|
"Spec_score\n",
|
|||
|
"60 1\n",
|
|||
|
"61 1\n",
|
|||
|
"63 1\n",
|
|||
|
"64 1\n",
|
|||
|
"65 1\n",
|
|||
|
"66 1\n",
|
|||
|
"67 1\n",
|
|||
|
"68 1\n",
|
|||
|
"69 1\n",
|
|||
|
"70 1\n",
|
|||
|
"71 1\n",
|
|||
|
"72 1\n",
|
|||
|
"73 1\n",
|
|||
|
"74 1\n",
|
|||
|
"75 1\n",
|
|||
|
"76 1\n",
|
|||
|
"77 1\n",
|
|||
|
"78 1\n",
|
|||
|
"79 1\n",
|
|||
|
"80 1\n",
|
|||
|
"81 1\n",
|
|||
|
"82 1\n",
|
|||
|
"83 1\n",
|
|||
|
"84 1\n",
|
|||
|
"85 1\n",
|
|||
|
"86 1\n",
|
|||
|
"87 1\n",
|
|||
|
"88 1\n",
|
|||
|
"89 1\n",
|
|||
|
"90 1\n",
|
|||
|
"91 1\n",
|
|||
|
"92 1\n",
|
|||
|
"93 1\n",
|
|||
|
"94 1\n",
|
|||
|
"95 1\n",
|
|||
|
"96 1\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"\n",
|
|||
|
"Распределение оценки характеристик в контрольной выборке:\n",
|
|||
|
"Spec_score\n",
|
|||
|
"61 1\n",
|
|||
|
"63 1\n",
|
|||
|
"65 1\n",
|
|||
|
"66 1\n",
|
|||
|
"67 1\n",
|
|||
|
"68 1\n",
|
|||
|
"69 1\n",
|
|||
|
"70 1\n",
|
|||
|
"71 1\n",
|
|||
|
"72 1\n",
|
|||
|
"73 1\n",
|
|||
|
"74 1\n",
|
|||
|
"75 1\n",
|
|||
|
"76 1\n",
|
|||
|
"77 1\n",
|
|||
|
"78 1\n",
|
|||
|
"79 1\n",
|
|||
|
"80 1\n",
|
|||
|
"81 1\n",
|
|||
|
"82 1\n",
|
|||
|
"83 1\n",
|
|||
|
"84 1\n",
|
|||
|
"85 1\n",
|
|||
|
"86 1\n",
|
|||
|
"87 1\n",
|
|||
|
"88 1\n",
|
|||
|
"89 1\n",
|
|||
|
"90 1\n",
|
|||
|
"92 1\n",
|
|||
|
"93 1\n",
|
|||
|
"94 1\n",
|
|||
|
"95 1\n",
|
|||
|
"98 1\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"\n",
|
|||
|
"Распределение оценки характеристик в тестовой выборке:\n",
|
|||
|
"Spec_score\n",
|
|||
|
"62 1\n",
|
|||
|
"64 1\n",
|
|||
|
"65 1\n",
|
|||
|
"66 1\n",
|
|||
|
"67 1\n",
|
|||
|
"68 1\n",
|
|||
|
"69 1\n",
|
|||
|
"70 1\n",
|
|||
|
"71 1\n",
|
|||
|
"72 1\n",
|
|||
|
"73 1\n",
|
|||
|
"74 1\n",
|
|||
|
"75 1\n",
|
|||
|
"76 1\n",
|
|||
|
"77 1\n",
|
|||
|
"78 1\n",
|
|||
|
"79 1\n",
|
|||
|
"80 1\n",
|
|||
|
"81 1\n",
|
|||
|
"82 1\n",
|
|||
|
"83 1\n",
|
|||
|
"84 1\n",
|
|||
|
"85 1\n",
|
|||
|
"86 1\n",
|
|||
|
"87 1\n",
|
|||
|
"88 1\n",
|
|||
|
"89 1\n",
|
|||
|
"90 1\n",
|
|||
|
"91 1\n",
|
|||
|
"92 1\n",
|
|||
|
"93 1\n",
|
|||
|
"94 1\n",
|
|||
|
"95 1\n",
|
|||
|
"96 1\n",
|
|||
|
"Name: count, dtype: int64\n",
|
|||
|
"\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"from imblearn.over_sampling import RandomOverSampler\n",
|
|||
|
"from imblearn.under_sampling import RandomUnderSampler\n",
|
|||
|
"\n",
|
|||
|
"def oversample(df, target_column):\n",
|
|||
|
" X = df.drop(target_column, axis=1)\n",
|
|||
|
" y = df[target_column]\n",
|
|||
|
"\n",
|
|||
|
" oversampler = RandomOverSampler(random_state=42)\n",
|
|||
|
" x_resampled, y_resampled = oversampler.fit_resample(X, y)\n",
|
|||
|
"\n",
|
|||
|
" resampled_df = pd.concat([x_resampled, y_resampled], axis=1)\n",
|
|||
|
" return resampled_df\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"def undersample(df, target_column):\n",
|
|||
|
" X = df.drop(target_column, axis=1)\n",
|
|||
|
" y = df[target_column]\n",
|
|||
|
"\n",
|
|||
|
" undersampler = RandomUnderSampler(random_state=42)\n",
|
|||
|
" x_resampled, y_resampled = undersampler.fit_resample(X, y)\n",
|
|||
|
"\n",
|
|||
|
" resampled_df = pd.concat([x_resampled, y_resampled], axis=1)\n",
|
|||
|
" return resampled_df\n",
|
|||
|
"\n",
|
|||
|
"train_df_oversampled = oversample(train_df, \"Spec_score\")\n",
|
|||
|
"val_df_oversampled = oversample(val_df, \"Spec_score\")\n",
|
|||
|
"test_df_oversampled = oversample(test_df, \"Spec_score\")\n",
|
|||
|
"\n",
|
|||
|
"train_df_undersampled = undersample(train_df, \"Spec_score\")\n",
|
|||
|
"val_df_undersampled = undersample(val_df, \"Spec_score\")\n",
|
|||
|
"test_df_undersampled = undersample(test_df, \"Spec_score\")\n",
|
|||
|
"\n",
|
|||
|
"print(\"Оверсэмплинг:\")\n",
|
|||
|
"check_balance(train_df_oversampled, \"обучающей выборке\")\n",
|
|||
|
"check_balance(val_df_oversampled, \"контрольной выборке\")\n",
|
|||
|
"check_balance(test_df_oversampled, \"тестовой выборке\")\n",
|
|||
|
"\n",
|
|||
|
"print(\"Андерсэмплинг:\")\n",
|
|||
|
"check_balance(train_df_undersampled, \"обучающей выборке\")\n",
|
|||
|
"check_balance(val_df_undersampled, \"контрольной выборке\")\n",
|
|||
|
"check_balance(test_df_undersampled, \"тестовой выборке\")"
|
|||
|
]
|
|||
|
},
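The balanced counts above (every class at 48 after oversampling the training set, every class at 1 after undersampling) follow from how random resampling works: oversampling repeats rows of the smaller classes until each class matches the largest one, and undersampling keeps only as many rows per class as the smallest one has. A plain-Python sketch of the oversampling half, with hypothetical data and a hypothetical helper name `random_oversample`; it illustrates the idea and is not the library's implementation.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=42):
    """Duplicate randomly chosen rows until every class is as large as the biggest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)
    target = max(len(group) for group in by_label.values())

    out_rows, out_labels = [], []
    for label, group in by_label.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical labels: spec score 75 is common, 86 less so, 60 appears once.
labels = [75] * 5 + [86] * 3 + [60]
rows = [f"phone_{i}" for i in range(len(labels))]
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

With a class that has a single row, oversampling only repeats that one row, and undersampling shrinks every class to one row. That is worth keeping in mind when reading the distributions above.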
|
|||
|
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's start analyzing dataset No. 19.\n",
"\n",
"Source data: https://www.kaggle.com/datasets/surajjha101/forbes-billionaires-data-preprocessed\n",
"\n",
"**General description**: \"The World's Billionaires\" is an annual ranking of the documented net worth of the world's wealthiest billionaires, compiled and published every March by the American business magazine Forbes. The list was first published in March 1987. Each person's total net worth is estimated and reported in US dollars based on their documented assets, taking debt and other factors into account. Royalty and dictators whose wealth derives from their position are excluded from the lists. The ranking is an index of the wealthiest documented individuals and excludes anyone whose wealth cannot be fully ascertained.\n",
"\n",
"**Problem domain**: Analysis of the net worth, age, and sources of wealth of the richest people in the world.\n",
"\n",
"**Objects of observation**: The world's richest people represented in the dataset.\n",
"\n",
"**Relationships between objects**: the following relationships can be identified:\n",
"- Between age and net worth\n",
"- Between country of residence and source of income\n",
"- Between industry and level of wealth.\n",
"\n",
"**Business goals**:\n",
"- ***Understand success factors***: Investigate which factors (age, country, source of income) are associated with large fortunes. This may help new entrepreneurs and startups learn from the experience of successful people.\n",
"- ***Analyze wealth trends***: Understand how sources of wealth change over time and how this relates to economic conditions in different countries. This may help investors and analysts identify which sectors could be the most promising for future investment.\n",
"\n",
"**Technical project goals**:\n",
"1. ***Study of success factors***: Input data: data on the richest people (age, net worth, industry); target: identifying the factors that contribute to wealth accumulation.\n",
"2. ***Analysis of wealth trends***: Input data: data on the richest people (age, country, source of wealth); target: whether the source of wealth depends on the country."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['Rank ', 'Name', 'Networth', 'Age', 'Country', 'Source', 'Industry'], dtype='object')\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\"../data/Forbes-Billionaires.csv\")\n",
"print(df.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Attributes:\n",
"- `Rank`: position in the ranking,\n",
"- `Name`: name,\n",
"- `Networth`: net worth,\n",
"- `Age`: age,\n",
"- `Country`: country,\n",
"- `Source`: source of income,\n",
"- `Industry`: industry."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the relationships."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG data omitted)",
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "(base64-encoded PNG data omitted)",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "(base64-encoded PNG data omitted)",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "(base64-encoded PNG data omitted)",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"# Each relationship is drawn in its own figure and shown immediately.\n",
"\n",
"# Age vs. net worth\n",
"plt.figure(figsize=(10, 6))\n",
"sns.scatterplot(data=df, x=\"Age\", y=\"Networth\")\n",
"plt.title(\"Age vs. net worth\")\n",
"plt.xlabel(\"Age\")\n",
"plt.ylabel(\"Net worth (billions of USD)\")\n",
"plt.show()\n",
"\n",
"# Net worth by country of residence (10 most frequent countries)\n",
"top_countries = df[\"Country\"].value_counts().index[:10]\n",
"plt.figure()\n",
"sns.boxplot(data=df[df[\"Country\"].isin(top_countries)], x=\"Country\", y=\"Networth\")\n",
"plt.title(\"Net worth by country of residence\")\n",
"plt.xticks(rotation=90)\n",
"plt.xlabel(\"Country\")\n",
"plt.ylabel(\"Net worth (billions of USD)\")\n",
"plt.show()\n",
"\n",
"# Net worth by source of income (10 most frequent sources)\n",
"top_sources = df[\"Source\"].value_counts().index[:10]\n",
"plt.figure()\n",
"sns.boxplot(data=df[df[\"Source\"].isin(top_sources)], x=\"Source\", y=\"Networth\")\n",
"plt.title(\"Net worth by source of income\")\n",
"plt.xticks(rotation=90)\n",
"plt.xlabel(\"Source of income\")\n",
"plt.ylabel(\"Net worth (billions of USD)\")\n",
"plt.show()\n",
"\n",
"# Net worth by industry (10 most frequent industries)\n",
"top_industries = df[\"Industry\"].value_counts().index[:10]\n",
"plt.figure()\n",
"sns.boxplot(data=df[df[\"Industry\"].isin(top_industries)], x=\"Industry\", y=\"Networth\")\n",
"plt.title(\"Net worth by industry\")\n",
"plt.xticks(rotation=90)\n",
"plt.xlabel(\"Industry\")\n",
"plt.ylabel(\"Net worth (billions of USD)\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before looking for outliers, check the data for missing values and duplicates and review the summary statistics."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Missing values per column:\n",
"Rank        0\n",
"Name        0\n",
"Networth    0\n",
"Age         0\n",
"Country     0\n",
"Source      0\n",
"Industry    0\n",
"dtype: int64\n",
"\n",
"Number of duplicate rows: 0\n",
"\n",
"Summary statistics:\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Rank</th>\n",
"      <th>Networth</th>\n",
"      <th>Age</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>count</th>\n",
"      <td>2600.000000</td>\n",
"      <td>2600.000000</td>\n",
"      <td>2600.000000</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>mean</th>\n",
"      <td>1269.570769</td>\n",
"      <td>4.860750</td>\n",
"      <td>64.271923</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>std</th>\n",
"      <td>728.146364</td>\n",
"      <td>10.659671</td>\n",
"      <td>13.220607</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>min</th>\n",
"      <td>1.000000</td>\n",
"      <td>1.000000</td>\n",
"      <td>19.000000</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>25%</th>\n",
"      <td>637.000000</td>\n",
"      <td>1.500000</td>\n",
"      <td>55.000000</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>50%</th>\n",
"      <td>1292.000000</td>\n",
"      <td>2.400000</td>\n",
"      <td>64.000000</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>75%</th>\n",
"      <td>1929.000000</td>\n",
"      <td>4.500000</td>\n",
"      <td>74.000000</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>max</th>\n",
"      <td>2578.000000</td>\n",
"      <td>219.000000</td>\n",
"      <td>100.000000</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"             Rank      Networth          Age\n",
"count  2600.000000  2600.000000  2600.000000\n",
"mean   1269.570769     4.860750    64.271923\n",
"std     728.146364    10.659671    13.220607\n",
"min       1.000000     1.000000    19.000000\n",
"25%     637.000000     1.500000    55.000000\n",
"50%    1292.000000     2.400000    64.000000\n",
"75%    1929.000000     4.500000    74.000000\n",
"max    2578.000000   219.000000   100.000000"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"null_values = df.isnull().sum()\n",
"print(\"Missing values per column:\")\n",
"print(null_values)\n",
"\n",
"duplicates = df.duplicated().sum()\n",
"print(f\"\\nNumber of duplicate rows: {duplicates}\")\n",
"\n",
"print(\"\\nSummary statistics:\")\n",
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No missing values or duplicate rows were found.\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "(base64-encoded PNG data omitted)",
"text/plain": [
"<Figure size 1500x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Data shape before outlier removal: (2600, 7)\n"
]
}
],
"source": [
"fig, axs = plt.subplots(1, 2, figsize=(15, 5))\n",
"\n",
"sns.boxplot(data=df, x=\"Networth\", ax=axs[0])\n",
"axs[0].set_title(\"Outliers in net worth\")\n",
"\n",
"sns.boxplot(data=df, x=\"Age\", ax=axs[1])\n",
"axs[1].set_title(\"Outliers in age\")\n",
"\n",
"plt.show()\n",
"print(\"Data shape before outlier removal:\", df.shape)"
]
},
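{
"cell_type": "markdown",
"metadata": {},
"source": [
"The extreme values here do not look like errors: net worth is strongly right-skewed (maximum 219 with a median of 2.4), but such values are plausible for this ranking, and ages fall between 19 and 100. The data are therefore treated as being within acceptable ranges."
]
},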
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[PNG image data omitted: histogram of the net worth distribution]",
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Histogram of the net worth distribution\n",
"plt.figure(figsize=(12, 6))\n",
"sns.histplot(df[\"Networth\"], bins=10, kde=True)\n",
"plt.title(\"Histogram of the net worth distribution\")\n",
"plt.xlabel(\"Net worth (billions of USD)\")\n",
"plt.ylabel(\"Frequency\")\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The net worth distribution is strongly right-skewed: most values are concentrated in the lower range, and only a few are very high. Most people in the dataset therefore have a comparatively modest net worth of a few billion dollars, while a small number hold extremely large fortunes."
]
},
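{
"cell_type": "markdown",
"metadata": {},
"source": [
"The skew described above can also be quantified. The following is a minimal sketch rather than part of the original analysis: `toy_networth` is a small made-up series (illustrative values only), and the `log1p` transform is assumed here as one common way to tame a long right tail.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Made-up values that mimic a long right tail; not taken from the dataset\n",
"toy_networth = pd.Series([1.0, 1.5, 2.4, 4.3, 12.0, 60.0, 219.0], name=\"Networth\")\n",
"\n",
"# A positive skewness coefficient indicates a long right tail\n",
"print(\"skewness, raw values:\", toy_networth.skew())\n",
"\n",
"# log1p compresses large values, which usually lowers the skewness\n",
"print(\"skewness, log1p values:\", np.log1p(toy_networth).skew())\n",
"```\n",
"\n",
"Running the same two calls on `df[\"Networth\"]` would give the figures for the real data."
]
},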
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "[PNG image data omitted: bar chart of the number of people by country]",
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "[PNG image data omitted: bar chart of the number of people by industry]",
"text/plain": [
"<Figure size 1200x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "[PNG image data omitted: histogram of the age distribution]",
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# 1. Bar chart of counts by country\n",
"plt.figure(figsize=(12, 6))\n",
"sns.countplot(data=df, x=\"Country\", order=df[\"Country\"].value_counts().index)\n",
"plt.title(\"Number of people by country\")\n",
"plt.xlabel(\"Country\")\n",
"plt.ylabel(\"Count\")\n",
"plt.xticks(rotation=45)\n",
"plt.show()\n",
"\n",
"# 2. Bar chart of counts by industry\n",
"plt.figure(figsize=(12, 6))\n",
"sns.countplot(data=df, x=\"Industry\", order=df[\"Industry\"].value_counts().index)\n",
"plt.title(\"Number of people by industry\")\n",
"plt.xlabel(\"Industry\")\n",
"plt.ylabel(\"Count\")\n",
"plt.xticks(rotation=45)\n",
"plt.show()\n",
"\n",
"# 3. Histogram of age\n",
"plt.figure(figsize=(10, 5))\n",
"sns.histplot(df[\"Age\"], bins=30, kde=True)\n",
"plt.title(\"Age distribution\")\n",
"plt.xlabel(\"Age\")\n",
"plt.ylabel(\"Frequency\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The charts show that the dataset covers many countries and industries, so it spans a wide range of regions and lines of business."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Split the dataset into training, validation, and test sets (60/20/20)."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((1560, 6), (520, 6), (520, 6))"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Separate the features (X) from the target (y)\n",
"X = df.drop(columns=[\"Networth\"])\n",
"y = df[\"Networth\"]\n",
"\n",
"# Split into training, validation, and test sets:\n",
"# 40% is held out first, then halved into validation and test (60/20/20)\n",
"X_train, X_temp, y_train, y_temp = train_test_split(\n",
"    X, y, test_size=0.4, random_state=42\n",
")\n",
"X_val, X_test, y_val, y_test = train_test_split(\n",
"    X_temp, y_temp, test_size=0.5, random_state=42\n",
")\n",
"\n",
"# Check the size of each split\n",
"(X_train.shape, X_val.shape, X_test.shape)"
]
},
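{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two-step split above produces a 60/20/20 partition: 40% of the rows are held out first, and that part is then halved. A minimal sketch that wraps the same two calls in a helper and checks the proportions on a toy frame; the function name `split_train_val_test`, its `holdout` parameter, and the toy data are assumptions made for illustration only.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"\n",
"def split_train_val_test(X, y, holdout=0.4, random_state=42):\n",
"    # Step 1: hold out a share of the rows (validation + test together)\n",
"    X_train, X_temp, y_train, y_temp = train_test_split(\n",
"        X, y, test_size=holdout, random_state=random_state\n",
"    )\n",
"    # Step 2: halve the held-out part into validation and test\n",
"    X_val, X_test, y_val, y_test = train_test_split(\n",
"        X_temp, y_temp, test_size=0.5, random_state=random_state\n",
"    )\n",
"    return X_train, X_val, X_test, y_train, y_val, y_test\n",
"\n",
"\n",
"# Toy frame with 100 rows, used only to inspect the resulting sizes\n",
"toy = pd.DataFrame({\"feature\": range(100), \"target\": range(100)})\n",
"parts = split_train_val_test(toy[[\"feature\"]], toy[\"target\"])\n",
"print([len(part) for part in parts[:3]])\n",
"```\n",
"\n",
"Keeping `random_state` fixed in both calls makes the partition reproducible between runs."
]
},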
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(count    1560.000000\n",
" mean        5.208173\n",
" std        12.653032\n",
" min         1.000000\n",
" 25%         1.500000\n",
" 50%         2.400000\n",
" 75%         4.300000\n",
" max       219.000000\n",
" Name: Networth, dtype: float64,\n",
" count    520.000000\n",
" mean       4.443654\n",
" std        7.267615\n",
" min        1.000000\n",
" 25%        1.500000\n",
" 50%        2.400000\n",
" 75%        4.825000\n",
" max       91.400000\n",
" Name: Networth, dtype: float64,\n",
" count    520.000000\n",
" mean       4.235577\n",
" std        5.861496\n",
" min        1.000000\n",
" 25%        1.600000\n",
" 50%        2.500000\n",
" 75%        4.500000\n",
" max       60.000000\n",
" Name: Networth, dtype: float64)"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Compare the distribution of the target across the splits\n",
"train_dist = y_train.describe()\n",
"val_dist = y_val.describe()\n",
"test_dist = y_test.describe()\n",
"\n",
"train_dist, val_dist, test_dist"
]
},
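{
"cell_type": "markdown",
"metadata": {},
"source": [
"The summaries above describe a continuous, heavy-tailed target: the training maximum is 219, while the test maximum is 60. Resamplers from imbalanced-learn operate on class labels, so a continuous target has to be binned into classes before it can be passed to them. A minimal sketch under stated assumptions: the toy frame `toy_train` and the bin edges `[0, 5, 50, inf]` are hypothetical and chosen only for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"from imblearn.over_sampling import RandomOverSampler\n",
"\n",
"# Toy training frame; the values are illustrative, not taken from the dataset\n",
"toy_train = pd.DataFrame(\n",
"    {\n",
"        \"Age\": [45, 52, 61, 38, 70, 66, 58, 49],\n",
"        \"Networth\": [1.2, 1.5, 2.4, 3.0, 4.3, 8.0, 60.0, 219.0],\n",
"    }\n",
")\n",
"\n",
"# Assumed bin edges (billions of USD); labels=False yields integer bin codes\n",
"bins = pd.cut(toy_train[\"Networth\"], bins=[0, 5, 50, np.inf], labels=False)\n",
"\n",
"# Resample whole rows so that every bin becomes as frequent as the largest one\n",
"sampler = RandomOverSampler(random_state=12)\n",
"toy_over, bins_over = sampler.fit_resample(toy_train, bins)\n",
"\n",
"print(\"rows per bin before:\", bins.value_counts().sort_index().to_dict())\n",
"print(\"rows per bin after:\", bins_over.value_counts().sort_index().to_dict())\n",
"```\n",
"\n",
"Passing the whole frame as `X` keeps the original `Networth` values attached to the resampled rows, so the regression target is still available afterwards."
]
},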
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from imblearn.over_sampling import RandomOverSampler\n",
"from imblearn.under_sampling import RandomUnderSampler\n",
"\n",
"# Note: imblearn samplers are built for class labels; a continuous target\n",
"# such as Networth would first have to be binned into classes.\n",
"oversampler = RandomOverSampler(random_state=12)\n",
"X_train_over, y_train_over = oversampler.fit_resample(X_train, y_train)\n",
"\n",
"undersampler = RandomUnderSampler(random_state=12)\n",
"X_train_under, y_train_under = undersampler.fit_resample(X_train, y_train)\n",
"\n",
"print(\"Shapes after oversampling:\", X_train_over.shape, y_train_over.shape)\n",
"print(\"Shapes after undersampling:\", X_train_under.shape, y_train_under.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Shapes after oversampling: (13910, 10047) (13910,)\n",
"Shapes after undersampling: (13065, 10047) (13065,)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (.venv)",
"language": "python",
"name": ".venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}