MAI_ISE-31_Andrikhov-A-S/lab2.ipynb

2024-10-19 13:14:28 +04:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lab 2. Analysis of Several Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Choose three datasets that do not correspond to your assignment variant\n",
"Selected variants: stroke data (variant 4), house sales (variant 6), mobile device prices (variant 18)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Analyze the information about each dataset on its Kaggle download page. What is the problem domain?\n",
"\n",
"#### Stroke data:\n",
"- **Problem domain:** stroke risk among patients\n",
"- **Goals:** analyze patient data and identify the factors associated with having a stroke\n",
"- **Dataset:** 5110 records, 12 variables:\n",
" - id\n",
" - gender\n",
" - age\n",
" - hypertension\n",
" - heart_disease\n",
" - ever_married\n",
" - work_type\n",
" - Residence_type\n",
" - avg_glucose_level\n",
" - bmi\n",
" - smoking_status\n",
" - stroke\n",
"- **Data description:** demographic, health and lifestyle information about patients, together with whether each patient has had a stroke\n",
"\n",
"#### House sales:\n",
"- **Problem domain:** house sales and how prices depend on various factors\n",
"- **Goals:** analyze house sales and identify the factors that affect prices\n",
"- **Dataset:** 21613 records, 21 variables:\n",
" - id\n",
" - date\n",
" - price\n",
" - bedrooms\n",
" - bathrooms\n",
" - sqft_living\n",
" - sqft_lot\n",
" - floors\n",
" - waterfront\n",
" - view\n",
" - condition\n",
" - grade\n",
" - sqft_above\n",
" - sqft_basement\n",
" - yr_built\n",
" - yr_renovated\n",
" - zipcode\n",
" - lat\n",
" - long\n",
" - sqft_living15\n",
" - sqft_lot15\n",
"- **Data description:** information about houses sold in King County, USA\n",
"\n",
"#### Mobile device prices:\n",
"- **Problem domain:** mobile device prices\n",
"- **Goals:** analyze mobile device prices and identify the factors that affect them\n",
"- **Dataset:** 1370 records, 18 variables:\n",
" - Unnamed: 0 (row index saved in the CSV)\n",
" - Name\n",
" - Rating\n",
" - Spec_score\n",
" - No_of_sim\n",
" - Ram\n",
" - Battery\n",
" - Display\n",
" - Camera\n",
" - External_Memory\n",
" - Android_version\n",
" - Price\n",
" - company\n",
" - Inbuilt_memory\n",
" - fast_charging\n",
" - Screen_resolution\n",
" - Processor\n",
" - Processor_name\n",
"- **Data description:** mobile device prices depending on various characteristics"
]
},
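The record and variable counts can be read directly from a loaded DataFrame via `DataFrame.shape`. A raw line count of the CSV file also includes the header line, so it typically comes out one higher than the number of records. A minimal sketch, assuming an in-memory CSV built from two rows of the stroke preview instead of the real file:

```python
import io

import pandas as pd

# Tiny CSV in the layout of the stroke file: one header line + two records
# (values copied from the preview of healthcare-dataset-stroke-data.csv; empty bmi = missing).
csv_text = (
    "id,gender,age,hypertension,heart_disease,ever_married,work_type,"
    "Residence_type,avg_glucose_level,bmi,smoking_status,stroke\n"
    "9046,Male,67.0,0,1,Yes,Private,Urban,228.69,36.6,formerly smoked,1\n"
    "51676,Female,61.0,0,0,Yes,Self-employed,Rural,202.21,,never smoked,1\n"
)

# Counting raw lines includes the header line...
n_lines = len(csv_text.splitlines())

# ...whereas pandas turns the header into column names and counts only records.
df = pd.read_csv(io.StringIO(csv_text))
n_records, n_columns = df.shape

print(n_lines, n_records, n_columns)
```

On the real data the same call is simply `var4.shape`, `var6.shape` or `var18.shape`.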
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Stroke data:\n",
"Each row of the dataset describes one patient, which makes it possible to analyze the data and build models that predict stroke risk."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>gender</th>\n",
" <th>age</th>\n",
" <th>hypertension</th>\n",
" <th>heart_disease</th>\n",
" <th>ever_married</th>\n",
" <th>work_type</th>\n",
" <th>Residence_type</th>\n",
" <th>avg_glucose_level</th>\n",
" <th>bmi</th>\n",
" <th>smoking_status</th>\n",
" <th>stroke</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>9046</td>\n",
" <td>Male</td>\n",
" <td>67.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Urban</td>\n",
" <td>228.69</td>\n",
" <td>36.6</td>\n",
" <td>formerly smoked</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>51676</td>\n",
" <td>Female</td>\n",
" <td>61.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Self-employed</td>\n",
" <td>Rural</td>\n",
" <td>202.21</td>\n",
" <td>NaN</td>\n",
" <td>never smoked</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>31112</td>\n",
" <td>Male</td>\n",
" <td>80.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Rural</td>\n",
" <td>105.92</td>\n",
" <td>32.5</td>\n",
" <td>never smoked</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>60182</td>\n",
" <td>Female</td>\n",
" <td>49.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Urban</td>\n",
" <td>171.23</td>\n",
" <td>34.4</td>\n",
" <td>smokes</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1665</td>\n",
" <td>Female</td>\n",
" <td>79.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Self-employed</td>\n",
" <td>Rural</td>\n",
" <td>174.12</td>\n",
" <td>24.0</td>\n",
" <td>never smoked</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5105</th>\n",
" <td>18234</td>\n",
" <td>Female</td>\n",
" <td>80.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Urban</td>\n",
" <td>83.75</td>\n",
" <td>NaN</td>\n",
" <td>never smoked</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5106</th>\n",
" <td>44873</td>\n",
" <td>Female</td>\n",
" <td>81.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Self-employed</td>\n",
" <td>Urban</td>\n",
" <td>125.20</td>\n",
" <td>40.0</td>\n",
" <td>never smoked</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5107</th>\n",
" <td>19723</td>\n",
" <td>Female</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Self-employed</td>\n",
" <td>Rural</td>\n",
" <td>82.99</td>\n",
" <td>30.6</td>\n",
" <td>never smoked</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5108</th>\n",
" <td>37544</td>\n",
" <td>Male</td>\n",
" <td>51.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Private</td>\n",
" <td>Rural</td>\n",
" <td>166.29</td>\n",
" <td>25.6</td>\n",
" <td>formerly smoked</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5109</th>\n",
" <td>44679</td>\n",
" <td>Female</td>\n",
" <td>44.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Yes</td>\n",
" <td>Govt_job</td>\n",
" <td>Urban</td>\n",
" <td>85.28</td>\n",
" <td>26.2</td>\n",
" <td>Unknown</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5110 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" id gender age hypertension heart_disease ever_married \\\n",
"0 9046 Male 67.0 0 1 Yes \n",
"1 51676 Female 61.0 0 0 Yes \n",
"2 31112 Male 80.0 0 1 Yes \n",
"3 60182 Female 49.0 0 0 Yes \n",
"4 1665 Female 79.0 1 0 Yes \n",
"... ... ... ... ... ... ... \n",
"5105 18234 Female 80.0 1 0 Yes \n",
"5106 44873 Female 81.0 0 0 Yes \n",
"5107 19723 Female 35.0 0 0 Yes \n",
"5108 37544 Male 51.0 0 0 Yes \n",
"5109 44679 Female 44.0 0 0 Yes \n",
"\n",
" work_type Residence_type avg_glucose_level bmi smoking_status \\\n",
"0 Private Urban 228.69 36.6 formerly smoked \n",
"1 Self-employed Rural 202.21 NaN never smoked \n",
"2 Private Rural 105.92 32.5 never smoked \n",
"3 Private Urban 171.23 34.4 smokes \n",
"4 Self-employed Rural 174.12 24.0 never smoked \n",
"... ... ... ... ... ... \n",
"5105 Private Urban 83.75 NaN never smoked \n",
"5106 Self-employed Urban 125.20 40.0 never smoked \n",
"5107 Self-employed Rural 82.99 30.6 never smoked \n",
"5108 Private Rural 166.29 25.6 formerly smoked \n",
"5109 Govt_job Urban 85.28 26.2 Unknown \n",
"\n",
" stroke \n",
"0 1 \n",
"1 1 \n",
"2 1 \n",
"3 1 \n",
"4 1 \n",
"... ... \n",
"5105 0 \n",
"5106 0 \n",
"5107 0 \n",
"5108 0 \n",
"5109 0 \n",
"\n",
"[5110 rows x 12 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"var4 = pd.read_csv(\"./datasets/var4/healthcare-dataset-stroke-data.csv\")\n",
"var4"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"id int64\n",
"gender object\n",
"age float64\n",
"hypertension int64\n",
"heart_disease int64\n",
"ever_married object\n",
"work_type object\n",
"Residence_type object\n",
"avg_glucose_level float64\n",
"bmi float64\n",
"smoking_status object\n",
"stroke int64\n",
"dtype: object"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var4.dtypes"
]
},
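The preview shows gaps in `bmi` (NaN) and an `Unknown` value in `smoking_status`, and the target `stroke` is a 0/1 flag. Before building a stroke-risk model it helps to count both kinds of gaps and the class balance. A minimal sketch on ten rows copied from the preview; `describe_quality` is a hypothetical helper, not part of the original notebook:

```python
import pandas as pd

# Ten rows copied from the var4 preview above (indices 0-4 and 5105-5109), four columns only.
stroke_sample = pd.DataFrame(
    {
        "id": [9046, 51676, 31112, 60182, 1665, 18234, 44873, 19723, 37544, 44679],
        "bmi": [36.6, None, 32.5, 34.4, 24.0, None, 40.0, 30.6, 25.6, 26.2],
        "smoking_status": [
            "formerly smoked", "never smoked", "never smoked", "smokes", "never smoked",
            "never smoked", "never smoked", "never smoked", "formerly smoked", "Unknown",
        ],
        "stroke": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    }
)


def describe_quality(df: pd.DataFrame) -> dict:
    """Summarize gaps and class balance of a stroke-style DataFrame (hypothetical helper)."""
    return {
        # Real NaN values per column (bmi is the column with gaps in the preview).
        "missing": df.isna().sum().to_dict(),
        # "Unknown" is stored as an ordinary category, so isna() does not catch it.
        "unknown_smoking": int((df["smoking_status"] == "Unknown").sum()),
        # Share of each target class; a small share of 1s means imbalanced data.
        "stroke_share": df["stroke"].value_counts(normalize=True).to_dict(),
    }


report = describe_quality(stroke_sample)
print(report)
```

On the full data this would be `describe_quality(var4)`. The 50/50 class split of the sample only reflects that the preview takes the first and last rows of the file, so it says nothing about the real balance.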
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### House sales\n",
"Each row of the dataset describes one house sale, which makes it possible to analyze the data and build models that predict the house price."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>date</th>\n",
" <th>price</th>\n",
" <th>bedrooms</th>\n",
" <th>bathrooms</th>\n",
" <th>sqft_living</th>\n",
" <th>sqft_lot</th>\n",
" <th>floors</th>\n",
" <th>waterfront</th>\n",
" <th>view</th>\n",
" <th>...</th>\n",
" <th>grade</th>\n",
" <th>sqft_above</th>\n",
" <th>sqft_basement</th>\n",
" <th>yr_built</th>\n",
" <th>yr_renovated</th>\n",
" <th>zipcode</th>\n",
" <th>lat</th>\n",
" <th>long</th>\n",
" <th>sqft_living15</th>\n",
" <th>sqft_lot15</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>7129300520</td>\n",
" <td>20141013T000000</td>\n",
" <td>221900.0</td>\n",
" <td>3</td>\n",
" <td>1.00</td>\n",
" <td>1180</td>\n",
" <td>5650</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>1180</td>\n",
" <td>0</td>\n",
" <td>1955</td>\n",
" <td>0</td>\n",
" <td>98178</td>\n",
" <td>47.5112</td>\n",
" <td>-122.257</td>\n",
" <td>1340</td>\n",
" <td>5650</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>6414100192</td>\n",
" <td>20141209T000000</td>\n",
" <td>538000.0</td>\n",
" <td>3</td>\n",
" <td>2.25</td>\n",
" <td>2570</td>\n",
" <td>7242</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>2170</td>\n",
" <td>400</td>\n",
" <td>1951</td>\n",
" <td>1991</td>\n",
" <td>98125</td>\n",
" <td>47.7210</td>\n",
" <td>-122.319</td>\n",
" <td>1690</td>\n",
" <td>7639</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>5631500400</td>\n",
" <td>20150225T000000</td>\n",
" <td>180000.0</td>\n",
" <td>2</td>\n",
" <td>1.00</td>\n",
" <td>770</td>\n",
" <td>10000</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>6</td>\n",
" <td>770</td>\n",
" <td>0</td>\n",
" <td>1933</td>\n",
" <td>0</td>\n",
" <td>98028</td>\n",
" <td>47.7379</td>\n",
" <td>-122.233</td>\n",
" <td>2720</td>\n",
" <td>8062</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2487200875</td>\n",
" <td>20141209T000000</td>\n",
" <td>604000.0</td>\n",
" <td>4</td>\n",
" <td>3.00</td>\n",
" <td>1960</td>\n",
" <td>5000</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>1050</td>\n",
" <td>910</td>\n",
" <td>1965</td>\n",
" <td>0</td>\n",
" <td>98136</td>\n",
" <td>47.5208</td>\n",
" <td>-122.393</td>\n",
" <td>1360</td>\n",
" <td>5000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1954400510</td>\n",
" <td>20150218T000000</td>\n",
" <td>510000.0</td>\n",
" <td>3</td>\n",
" <td>2.00</td>\n",
" <td>1680</td>\n",
" <td>8080</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>1680</td>\n",
" <td>0</td>\n",
" <td>1987</td>\n",
" <td>0</td>\n",
" <td>98074</td>\n",
" <td>47.6168</td>\n",
" <td>-122.045</td>\n",
" <td>1800</td>\n",
" <td>7503</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21608</th>\n",
" <td>263000018</td>\n",
" <td>20140521T000000</td>\n",
" <td>360000.0</td>\n",
" <td>3</td>\n",
" <td>2.50</td>\n",
" <td>1530</td>\n",
" <td>1131</td>\n",
" <td>3.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>1530</td>\n",
" <td>0</td>\n",
" <td>2009</td>\n",
" <td>0</td>\n",
" <td>98103</td>\n",
" <td>47.6993</td>\n",
" <td>-122.346</td>\n",
" <td>1530</td>\n",
" <td>1509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21609</th>\n",
" <td>6600060120</td>\n",
" <td>20150223T000000</td>\n",
" <td>400000.0</td>\n",
" <td>4</td>\n",
" <td>2.50</td>\n",
" <td>2310</td>\n",
" <td>5813</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>2310</td>\n",
" <td>0</td>\n",
" <td>2014</td>\n",
" <td>0</td>\n",
" <td>98146</td>\n",
" <td>47.5107</td>\n",
" <td>-122.362</td>\n",
" <td>1830</td>\n",
" <td>7200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21610</th>\n",
" <td>1523300141</td>\n",
" <td>20140623T000000</td>\n",
" <td>402101.0</td>\n",
" <td>2</td>\n",
" <td>0.75</td>\n",
" <td>1020</td>\n",
" <td>1350</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>1020</td>\n",
" <td>0</td>\n",
" <td>2009</td>\n",
" <td>0</td>\n",
" <td>98144</td>\n",
" <td>47.5944</td>\n",
" <td>-122.299</td>\n",
" <td>1020</td>\n",
" <td>2007</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21611</th>\n",
" <td>291310100</td>\n",
" <td>20150116T000000</td>\n",
" <td>400000.0</td>\n",
" <td>3</td>\n",
" <td>2.50</td>\n",
" <td>1600</td>\n",
" <td>2388</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>1600</td>\n",
" <td>0</td>\n",
" <td>2004</td>\n",
" <td>0</td>\n",
" <td>98027</td>\n",
" <td>47.5345</td>\n",
" <td>-122.069</td>\n",
" <td>1410</td>\n",
" <td>1287</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21612</th>\n",
" <td>1523300157</td>\n",
" <td>20141015T000000</td>\n",
" <td>325000.0</td>\n",
" <td>2</td>\n",
" <td>0.75</td>\n",
" <td>1020</td>\n",
" <td>1076</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>7</td>\n",
" <td>1020</td>\n",
" <td>0</td>\n",
" <td>2008</td>\n",
" <td>0</td>\n",
" <td>98144</td>\n",
" <td>47.5941</td>\n",
" <td>-122.299</td>\n",
" <td>1020</td>\n",
" <td>1357</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>21613 rows × 21 columns</p>\n",
"</div>"
],
"text/plain": [
" id date price bedrooms bathrooms \\\n",
"0 7129300520 20141013T000000 221900.0 3 1.00 \n",
"1 6414100192 20141209T000000 538000.0 3 2.25 \n",
"2 5631500400 20150225T000000 180000.0 2 1.00 \n",
"3 2487200875 20141209T000000 604000.0 4 3.00 \n",
"4 1954400510 20150218T000000 510000.0 3 2.00 \n",
"... ... ... ... ... ... \n",
"21608 263000018 20140521T000000 360000.0 3 2.50 \n",
"21609 6600060120 20150223T000000 400000.0 4 2.50 \n",
"21610 1523300141 20140623T000000 402101.0 2 0.75 \n",
"21611 291310100 20150116T000000 400000.0 3 2.50 \n",
"21612 1523300157 20141015T000000 325000.0 2 0.75 \n",
"\n",
" sqft_living sqft_lot floors waterfront view ... grade \\\n",
"0 1180 5650 1.0 0 0 ... 7 \n",
"1 2570 7242 2.0 0 0 ... 7 \n",
"2 770 10000 1.0 0 0 ... 6 \n",
"3 1960 5000 1.0 0 0 ... 7 \n",
"4 1680 8080 1.0 0 0 ... 8 \n",
"... ... ... ... ... ... ... ... \n",
"21608 1530 1131 3.0 0 0 ... 8 \n",
"21609 2310 5813 2.0 0 0 ... 8 \n",
"21610 1020 1350 2.0 0 0 ... 7 \n",
"21611 1600 2388 2.0 0 0 ... 8 \n",
"21612 1020 1076 2.0 0 0 ... 7 \n",
"\n",
" sqft_above sqft_basement yr_built yr_renovated zipcode lat \\\n",
"0 1180 0 1955 0 98178 47.5112 \n",
"1 2170 400 1951 1991 98125 47.7210 \n",
"2 770 0 1933 0 98028 47.7379 \n",
"3 1050 910 1965 0 98136 47.5208 \n",
"4 1680 0 1987 0 98074 47.6168 \n",
"... ... ... ... ... ... ... \n",
"21608 1530 0 2009 0 98103 47.6993 \n",
"21609 2310 0 2014 0 98146 47.5107 \n",
"21610 1020 0 2009 0 98144 47.5944 \n",
"21611 1600 0 2004 0 98027 47.5345 \n",
"21612 1020 0 2008 0 98144 47.5941 \n",
"\n",
" long sqft_living15 sqft_lot15 \n",
"0 -122.257 1340 5650 \n",
"1 -122.319 1690 7639 \n",
"2 -122.233 2720 8062 \n",
"3 -122.393 1360 5000 \n",
"4 -122.045 1800 7503 \n",
"... ... ... ... \n",
"21608 -122.346 1530 1509 \n",
"21609 -122.362 1830 7200 \n",
"21610 -122.299 1020 2007 \n",
"21611 -122.069 1410 1287 \n",
"21612 -122.299 1020 1357 \n",
"\n",
"[21613 rows x 21 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var6 = pd.read_csv(\"./datasets/var6/kc_house_data.csv\")\n",
"var6"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"id int64\n",
"date object\n",
"price float64\n",
"bedrooms int64\n",
"bathrooms float64\n",
"sqft_living int64\n",
"sqft_lot int64\n",
"floors float64\n",
"waterfront int64\n",
"view int64\n",
"condition int64\n",
"grade int64\n",
"sqft_above int64\n",
"sqft_basement int64\n",
"yr_built int64\n",
"yr_renovated int64\n",
"zipcode int64\n",
"lat float64\n",
"long float64\n",
"sqft_living15 int64\n",
"sqft_lot15 int64\n",
"dtype: object"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var6.dtypes"
]
},
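In the dtypes output `date` is `object`, i.e. strings such as `20141013T000000`, so it has to be parsed before it can be used as a time feature. A minimal sketch on five rows copied from the preview; treating `yr_renovated == 0` as "never renovated" and the derived `age_at_sale` column are assumptions added for illustration:

```python
import pandas as pd

# Five rows copied from the var6 preview above (indices 0-4), four columns only.
houses = pd.DataFrame(
    {
        "date": ["20141013T000000", "20141209T000000", "20150225T000000",
                 "20141209T000000", "20150218T000000"],
        "price": [221900.0, 538000.0, 180000.0, 604000.0, 510000.0],
        "yr_built": [1955, 1951, 1933, 1965, 1987],
        "yr_renovated": [0, 1991, 0, 0, 0],
    }
)

# "date" is read as text (dtype object); parse it with its exact layout.
houses["date"] = pd.to_datetime(houses["date"], format="%Y%m%dT%H%M%S")

# Assumption: 0 in yr_renovated means "never renovated", so turn it into NaN.
houses["yr_renovated"] = houses["yr_renovated"].where(houses["yr_renovated"] != 0)

# Hypothetical derived feature: age of the house in the year it was sold.
houses["age_at_sale"] = houses["date"].dt.year - houses["yr_built"]

print(houses.dtypes)
print(houses[["date", "age_at_sale", "yr_renovated"]])
```

Passing an explicit `format` keeps the parsing strict: a malformed date raises an error instead of being guessed.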
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Mobile device prices\n",
"Each row of the dataset describes one mobile device, which makes it possible to analyze the data and build models that predict its price."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>Name</th>\n",
" <th>Rating</th>\n",
" <th>Spec_score</th>\n",
" <th>No_of_sim</th>\n",
" <th>Ram</th>\n",
" <th>Battery</th>\n",
" <th>Display</th>\n",
" <th>Camera</th>\n",
" <th>External_Memory</th>\n",
" <th>Android_version</th>\n",
" <th>Price</th>\n",
" <th>company</th>\n",
" <th>Inbuilt_memory</th>\n",
" <th>fast_charging</th>\n",
" <th>Screen_resolution</th>\n",
" <th>Processor</th>\n",
" <th>Processor_name</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>Samsung Galaxy F14 5G</td>\n",
" <td>4.65</td>\n",
" <td>68</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>6000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP + 2 MP Dual Rear &amp;amp; 13 MP Front Camera</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>13</td>\n",
" <td>9,999</td>\n",
" <td>Samsung</td>\n",
" <td>128 GB inbuilt</td>\n",
" <td>25W Fast Charging</td>\n",
" <td>2408 x 1080 px Display with Water Drop Notch</td>\n",
" <td>Octa Core Processor</td>\n",
" <td>Exynos 1330</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>Samsung Galaxy A11</td>\n",
" <td>4.20</td>\n",
" <td>63</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>2 GB RAM</td>\n",
" <td>4000 mAh Battery</td>\n",
" <td>6.4 inches</td>\n",
" <td>13 MP + 5 MP + 2 MP Triple Rear &amp;amp; 8 MP Fro...</td>\n",
" <td>Memory Card Supported, upto 512 GB</td>\n",
" <td>10</td>\n",
" <td>9,990</td>\n",
" <td>Samsung</td>\n",
" <td>32 GB inbuilt</td>\n",
" <td>15W Fast Charging</td>\n",
" <td>720 x 1560 px Display with Punch Hole</td>\n",
" <td>1.8 GHz Processor</td>\n",
" <td>Octa Core</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>Samsung Galaxy A13</td>\n",
" <td>4.30</td>\n",
" <td>75</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP Quad Rear &amp;amp; 8 MP Front Camera</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>12</td>\n",
" <td>11,999</td>\n",
" <td>Samsung</td>\n",
" <td>64 GB inbuilt</td>\n",
" <td>25W Fast Charging</td>\n",
" <td>1080 x 2408 px Display with Water Drop Notch</td>\n",
" <td>2 GHz Processor</td>\n",
" <td>Octa Core</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>Samsung Galaxy F23</td>\n",
" <td>4.10</td>\n",
" <td>73</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>6000 mAh Battery</td>\n",
" <td>6.4 inches</td>\n",
" <td>48 MP Quad Rear &amp;amp; 13 MP Front Camera</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>12</td>\n",
" <td>11,999</td>\n",
" <td>Samsung</td>\n",
" <td>64 GB inbuilt</td>\n",
" <td>NaN</td>\n",
" <td>720 x 1600 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Helio G88</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>Samsung Galaxy A03s (4GB RAM + 64GB)</td>\n",
" <td>4.10</td>\n",
" <td>69</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.5 inches</td>\n",
" <td>13 MP + 2 MP + 2 MP Triple Rear &amp;amp; 5 MP Fro...</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>11</td>\n",
" <td>11,999</td>\n",
" <td>Samsung</td>\n",
" <td>64 GB inbuilt</td>\n",
" <td>15W Fast Charging</td>\n",
" <td>720 x 1600 px Display with Water Drop Notch</td>\n",
" <td>Octa Core</td>\n",
" <td>Helio P35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1365</th>\n",
" <td>1365</td>\n",
" <td>TCL 40R</td>\n",
" <td>4.05</td>\n",
" <td>75</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE,</td>\n",
" <td>4 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP + 2 MP + 2 MP Triple Rear &amp;amp; 8 MP Fro...</td>\n",
" <td>Memory Card (Hybrid)</td>\n",
" <td>12</td>\n",
" <td>18,999</td>\n",
" <td>TCL</td>\n",
" <td>64 GB inbuilt</td>\n",
" <td>15W Fast Charging</td>\n",
" <td>720 x 1612 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Dimensity 700 5G</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1366</th>\n",
" <td>1366</td>\n",
" <td>TCL 50 XL NxtPaper 5G</td>\n",
" <td>4.10</td>\n",
" <td>80</td>\n",
" <td>Dual Sim, 3G, 4G, VoLTE,</td>\n",
" <td>8 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.8 inches</td>\n",
" <td>50 MP + 2 MP Dual Rear &amp;amp; 16 MP Front Camera</td>\n",
" <td>Memory Card (Hybrid)</td>\n",
" <td>14</td>\n",
" <td>24,990</td>\n",
" <td>TCL</td>\n",
" <td>128 GB inbuilt</td>\n",
" <td>33W Fast Charging</td>\n",
" <td>1200 x 2400 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Dimensity 7050</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1367</th>\n",
" <td>1367</td>\n",
" <td>TCL 50 XE NxtPaper 5G</td>\n",
" <td>4.00</td>\n",
" <td>80</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE,</td>\n",
" <td>6 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP + 2 MP Dual Rear &amp;amp; 16 MP Front Camera</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>13</td>\n",
" <td>23,990</td>\n",
" <td>TCL</td>\n",
" <td>256 GB inbuilt</td>\n",
" <td>18W Fast Charging</td>\n",
" <td>720 x 1612 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Dimensity 6080</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1368</th>\n",
" <td>1368</td>\n",
" <td>TCL 40 NxtPaper 5G</td>\n",
" <td>4.50</td>\n",
" <td>79</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE,</td>\n",
" <td>6 GB RAM</td>\n",
" <td>5000 mAh Battery</td>\n",
" <td>6.6 inches</td>\n",
" <td>50 MP + 2 MP + 2 MP Triple Rear &amp;amp; 8 MP Fro...</td>\n",
" <td>Memory Card Supported, upto 1 TB</td>\n",
" <td>13</td>\n",
" <td>22,499</td>\n",
" <td>TCL</td>\n",
" <td>256 GB inbuilt</td>\n",
" <td>15W Fast Charging</td>\n",
" <td>720 x 1612 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Dimensity 6020</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1369</th>\n",
" <td>1369</td>\n",
" <td>TCL Trifold</td>\n",
" <td>4.65</td>\n",
" <td>93</td>\n",
" <td>Dual Sim, 3G, 4G, 5G, VoLTE, Vo5G,</td>\n",
" <td>12 GB RAM</td>\n",
" <td>4600 mAh Battery</td>\n",
" <td>10 inches</td>\n",
" <td>Foldable Display, Dual Display</td>\n",
" <td>50 MP + 48 MP + 8 MP Triple Rear &amp;amp; 32 MP F...</td>\n",
" <td>13</td>\n",
" <td>1,19,990</td>\n",
" <td>TCL</td>\n",
" <td>256 GB inbuilt</td>\n",
" <td>67W Fast Charging</td>\n",
" <td>1916 x 2160 px</td>\n",
" <td>Octa Core</td>\n",
" <td>Snapdragon 8 Gen2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1370 rows × 18 columns</p>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 Name Rating Spec_score \\\n",
"0 0 Samsung Galaxy F14 5G 4.65 68 \n",
"1 1 Samsung Galaxy A11 4.20 63 \n",
"2 2 Samsung Galaxy A13 4.30 75 \n",
"3 3 Samsung Galaxy F23 4.10 73 \n",
"4 4 Samsung Galaxy A03s (4GB RAM + 64GB) 4.10 69 \n",
"... ... ... ... ... \n",
"1365 1365 TCL 40R 4.05 75 \n",
"1366 1366 TCL 50 XL NxtPaper 5G 4.10 80 \n",
"1367 1367 TCL 50 XE NxtPaper 5G 4.00 80 \n",
"1368 1368 TCL 40 NxtPaper 5G 4.50 79 \n",
"1369 1369 TCL Trifold 4.65 93 \n",
"\n",
" No_of_sim Ram Battery \\\n",
"0 Dual Sim, 3G, 4G, 5G, VoLTE, 4 GB RAM 6000 mAh Battery \n",
"1 Dual Sim, 3G, 4G, VoLTE, 2 GB RAM 4000 mAh Battery \n",
"2 Dual Sim, 3G, 4G, VoLTE, 4 GB RAM 5000 mAh Battery \n",
"3 Dual Sim, 3G, 4G, VoLTE, 4 GB RAM 6000 mAh Battery \n",
"4 Dual Sim, 3G, 4G, VoLTE, 4 GB RAM 5000 mAh Battery \n",
"... ... ... ... \n",
"1365 Dual Sim, 3G, 4G, 5G, VoLTE, 4 GB RAM 5000 mAh Battery \n",
"1366 Dual Sim, 3G, 4G, VoLTE, 8 GB RAM 5000 mAh Battery \n",
"1367 Dual Sim, 3G, 4G, 5G, VoLTE, 6 GB RAM 5000 mAh Battery \n",
"1368 Dual Sim, 3G, 4G, 5G, VoLTE, 6 GB RAM 5000 mAh Battery \n",
"1369 Dual Sim, 3G, 4G, 5G, VoLTE, Vo5G, 12 GB RAM 4600 mAh Battery \n",
"\n",
" Display Camera \\\n",
"0 6.6 inches 50 MP + 2 MP Dual Rear &amp; 13 MP Front Camera \n",
"1 6.4 inches 13 MP + 5 MP + 2 MP Triple Rear &amp; 8 MP Fro... \n",
"2 6.6 inches 50 MP Quad Rear &amp; 8 MP Front Camera \n",
"3 6.4 inches 48 MP Quad Rear &amp; 13 MP Front Camera \n",
"4 6.5 inches 13 MP + 2 MP + 2 MP Triple Rear &amp; 5 MP Fro... \n",
"... ... ... \n",
"1365 6.6 inches 50 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Fro... \n",
"1366 6.8 inches 50 MP + 2 MP Dual Rear &amp; 16 MP Front Camera \n",
"1367 6.6 inches 50 MP + 2 MP Dual Rear &amp; 16 MP Front Camera \n",
"1368 6.6 inches 50 MP + 2 MP + 2 MP Triple Rear &amp; 8 MP Fro... \n",
"1369 10 inches Foldable Display, Dual Display \n",
"\n",
" External_Memory Android_version \\\n",
"0 Memory Card Supported, upto 1 TB 13 \n",
"1 Memory Card Supported, upto 512 GB 10 \n",
"2 Memory Card Supported, upto 1 TB 12 \n",
"3 Memory Card Supported, upto 1 TB 12 \n",
"4 Memory Card Supported, upto 1 TB 11 \n",
"... ... ... \n",
"1365 Memory Card (Hybrid) 12 \n",
"1366 Memory Card (Hybrid) 14 \n",
"1367 Memory Card Supported, upto 1 TB 13 \n",
"1368 Memory Card Supported, upto 1 TB 13 \n",
"1369 50 MP + 48 MP + 8 MP Triple Rear &amp; 32 MP F... 13 \n",
"\n",
" Price company Inbuilt_memory fast_charging \\\n",
"0 9,999 Samsung 128 GB inbuilt 25W Fast Charging \n",
"1 9,990 Samsung 32 GB inbuilt 15W Fast Charging \n",
"2 11,999 Samsung 64 GB inbuilt 25W Fast Charging \n",
"3 11,999 Samsung 64 GB inbuilt NaN \n",
"4 11,999 Samsung 64 GB inbuilt 15W Fast Charging \n",
"... ... ... ... ... \n",
"1365 18,999 TCL 64 GB inbuilt 15W Fast Charging \n",
"1366 24,990 TCL 128 GB inbuilt 33W Fast Charging \n",
"1367 23,990 TCL 256 GB inbuilt 18W Fast Charging \n",
"1368 22,499 TCL 256 GB inbuilt 15W Fast Charging \n",
"1369 1,19,990 TCL 256 GB inbuilt 67W Fast Charging \n",
"\n",
" Screen_resolution Processor \\\n",
"0 2408 x 1080 px Display with Water Drop Notch Octa Core Processor \n",
"1 720 x 1560 px Display with Punch Hole 1.8 GHz Processor \n",
"2 1080 x 2408 px Display with Water Drop Notch 2 GHz Processor \n",
"3 720 x 1600 px Octa Core \n",
"4 720 x 1600 px Display with Water Drop Notch Octa Core \n",
"... ... ... \n",
"1365 720 x 1612 px Octa Core \n",
"1366 1200 x 2400 px Octa Core \n",
"1367 720 x 1612 px Octa Core \n",
"1368 720 x 1612 px Octa Core \n",
"1369 1916 x 2160 px Octa Core \n",
"\n",
" Processor_name \n",
"0 Exynos 1330 \n",
"1 Octa Core \n",
"2 Octa Core \n",
"3 Helio G88 \n",
"4 Helio P35 \n",
"... ... \n",
"1365 Dimensity 700 5G \n",
"1366 Dimensity 7050 \n",
"1367 Dimensity 6080 \n",
"1368 Dimensity 6020 \n",
"1369 Snapdragon 8 Gen2 \n",
"\n",
"[1370 rows x 18 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var18 = pd.read_csv(\"./datasets/var18/mobile_phone_price_prediction.csv\")\n",
"var18"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Unnamed: 0 int64\n",
"Name object\n",
"Rating float64\n",
"Spec_score int64\n",
"No_of_sim object\n",
"Ram object\n",
"Battery object\n",
"Display object\n",
"Camera object\n",
"External_Memory object\n",
"Android_version object\n",
"Price object\n",
"company object\n",
"Inbuilt_memory object\n",
"fast_charging object\n",
"Screen_resolution object\n",
"Processor object\n",
"Processor_name object\n",
"dtype: object"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var18.dtypes"
]
},
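Almost every column of this dataset, including the target `Price`, is `object`: prices use Indian digit grouping (`1,19,990`), and `Ram`/`Battery` carry units as text. The preview also shows a shifted row (index 1369: `Camera` holds `Foldable Display, Dual Display`, and the camera description sits in `External_Memory`). A minimal sketch of the conversion on three preview rows; the regular expressions and the "no `MP` in `Camera`" shift check are assumptions, not a complete cleaning procedure:

```python
import pandas as pd

# Three rows copied from the var18 preview above (indices 0, 2 and 1369), selected columns.
phones = pd.DataFrame(
    {
        "Name": ["Samsung Galaxy F14 5G", "Samsung Galaxy A13", "TCL Trifold"],
        "Ram": ["4 GB RAM", "4 GB RAM", "12 GB RAM"],
        "Battery": ["6000 mAh Battery", "5000 mAh Battery", "4600 mAh Battery"],
        "Camera": [
            "50 MP + 2 MP Dual Rear &amp; 13 MP Front Camera",
            "50 MP Quad Rear &amp; 8 MP Front Camera",
            "Foldable Display, Dual Display",
        ],
        "Price": ["9,999", "11,999", "1,19,990"],
    }
)

# Price uses Indian digit grouping ("1,19,990" = 119990), so drop every comma.
phones["Price"] = pd.to_numeric(phones["Price"].str.replace(",", "", regex=False))

# Take the number in front of the unit; values in other units (e.g. MB) become NaN.
phones["Ram_gb"] = phones["Ram"].str.extract(r"(\d+(?:\.\d+)?)\s*GB", expand=False).astype(float)
phones["Battery_mah"] = phones["Battery"].str.extract(r"(\d+)\s*mAh", expand=False).astype(float)

# Heuristic: a Camera value without "MP" hints that the row is shifted by one column.
phones["suspect_shift"] = ~phones["Camera"].str.contains("MP", regex=False)

print(phones[["Name", "Price", "Ram_gb", "Battery_mah", "suspect_shift"]])
```

The RAM pattern accepts a decimal part so that a value like `1.5 GB` is not read as `5`.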
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Analyze the contents of each dataset. What are the objects of observation? What are their attributes? Are there relationships between the objects?\n",
"\n",
"1. Stroke risk dataset\n",
" - Objects of observation: patients\n",
"2. Real estate sales dataset\n",
" - Objects of observation: sales of houses in King County, USA\n",
"3. Mobile device price dataset\n",
" - Objects of observation: phone models and their prices\n",
"\n",
"All attributes are listed above."
]
},
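Within one dataset, a relationship between records can show up as a repeated key, for example the same house `id` appearing in several sale records. A minimal sketch of that check with `Series.duplicated`; `repeated_keys` is a hypothetical helper and the toy frame below is invented for illustration, not taken from the dataset:

```python
import pandas as pd


def repeated_keys(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return every row whose `key` value occurs more than once, sorted by key."""
    mask = df[key].duplicated(keep=False)  # keep=False marks all copies, not only later ones
    return df[mask].sort_values(key)


# Hypothetical toy data: house 1 appears in two sales.
toy = pd.DataFrame(
    {
        "id": [1, 2, 1, 3],
        "date": ["20140501T000000", "20140612T000000", "20150301T000000", "20140720T000000"],
        "price": [300000.0, 450000.0, 340000.0, 280000.0],
    }
)

print(repeated_keys(toy, "id"))
print(toy["id"].nunique(), len(toy))  # unique keys vs. number of records
```

On the real data the call would be `repeated_keys(var6, "id")`; comparing `var6["id"].nunique()` with `len(var6)` answers the same question in one line.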
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Give examples of business goals that the selected datasets could help achieve. What is the business impact?\n",
"1. Stroke risk dataset\n",
" - Business goal: identify stroke risk factors and help prevent strokes in patients.\n",
" - Business impact: fewer stroke cases, lower treatment costs and a better clinic reputation.\n",
"2. Real estate sales dataset\n",
" - Business goal: identify the factors that affect property sales.\n",
" - Business impact: a better sales strategy and more effective selection of properties to buy for later profit.\n",
"3. Mobile device price dataset\n",
" - Business goal: identify the factors that affect mobile device prices.\n",
" - Business impact: a better pricing strategy, more effective sales and higher profit."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. Give examples of technical project goals for each business goal above. What is the input, and what is the target feature?\n",
"1. Stroke risk dataset\n",
"   - Business goal: build an early-warning system for stroke.\n",
"   - Technical project goal: train a machine learning model that predicts the probability of stroke.\n",
"   - Input: gender, age, hypertension, heart_disease, ever_married, work_type, Residence_type, avg_glucose_level, bmi, smoking_status.\n",
"   - Target feature: presence of stroke (`stroke`).\n",
"2. House sales dataset\n",
"   - Business goal: build a system that recommends properties to buy for later resale.\n",
"   - Technical project goal: train a machine learning model that predicts house prices.\n",
"   - Input: living area (`sqft_living`), lot area (`sqft_lot`), number of bedrooms and bathrooms, floors, condition, grade and other features.\n",
"   - Target feature: house price (`price`).\n",
"3. Mobile device price dataset\n",
"   - Business goal: optimize pricing and improve the sales strategy for mobile devices.\n",
"   - Technical project goal: build a model that predicts a recommended price for a device from its specifications.\n",
"   - Input: Name, Rating, Spec_score, No_of_sim, Ram, Battery, Display, Camera, External_Memory and the remaining specification columns.\n",
"   - Target feature: device price (`Price`)."
]
},
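{
"cell_type": "markdown",
"metadata": {},
"source": [
"The input/target choice above translates directly into separating a feature matrix from the target column. A minimal sketch on a hypothetical toy frame with a few stroke-dataset columns; `split_features_target` is an illustrative helper, not part of the original notebook:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def split_features_target(frame, target, drop=('id',)):\n",
"    # Remove the target and any identifier columns that carry no predictive signal\n",
"    to_drop = [target] + [c for c in drop if c in frame.columns]\n",
"    return frame.drop(columns=to_drop), frame[target]\n",
"\n",
"# Hypothetical toy rows, not taken from the real dataset\n",
"toy = pd.DataFrame({\n",
"    'id': [1, 2, 3],\n",
"    'age': [45.0, 70.0, 30.0],\n",
"    'bmi': [27.5, 31.0, 22.4],\n",
"    'stroke': [0, 1, 0],\n",
"})\n",
"X, y = split_features_target(toy, 'stroke')\n",
"print(list(X.columns))  # ['age', 'bmi']\n",
"```\n",
"\n",
"With the notebook's frames this could look like `split_features_target(var4, 'stroke')` or `split_features_target(var6, 'price')`; for `var18` the index-like column would be dropped with `drop=('Unnamed: 0',)`, and `Price` first needs converting from text to numbers."
]
},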
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. Identify the problems of the chosen datasets: noise, bias, relevance, outliers, data leakage.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. Give examples of how to address the detected problems for each dataset\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# 1. Noise check: share of missing cells, as a percentage of all cells\n",
"def check_noise(dataframe):\n",
" total_values = dataframe.size\n",
" missing_values = dataframe.isnull().sum().sum()\n",
" noise_percentage = (missing_values / total_values) * 100\n",
" return f\"Зашумленность: {noise_percentage:.2f}%\"\n",
"\n",
"# 2. Bias check: share of unique values in the given column (a cardinality ratio, not a class-balance measure)\n",
"def check_bias(dataframe, target_column):\n",
" if target_column in dataframe.columns:\n",
" unique_values = dataframe[target_column].nunique()\n",
" total_values = len(dataframe)\n",
" bias_percentage = (unique_values / total_values) * 100\n",
" return f\"Смещение по {target_column}: {bias_percentage:.2f}% уникальных значений\"\n",
" return \"Целевой признак не найден.\"\n",
"\n",
"# 3. Duplicate check: share of fully duplicated rows\n",
"def check_duplicates(dataframe):\n",
" duplicate_percentage = dataframe.duplicated().mean() * 100\n",
" return f\"Количество дубликатов: {duplicate_percentage:.2f}%\"\n",
"\n",
"# 4. Outlier check: share of rows outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR\n",
"def check_outliers(dataframe, column):\n",
" if column in dataframe.columns:\n",
" Q1 = dataframe[column].quantile(0.25)\n",
" Q3 = dataframe[column].quantile(0.75)\n",
" IQR = Q3 - Q1\n",
" lower_bound = Q1 - 1.5 * IQR\n",
" upper_bound = Q3 + 1.5 * IQR\n",
" outlier_count = dataframe[(dataframe[column] < lower_bound) | (dataframe[column] > upper_bound)].shape[0]\n",
" total_count = dataframe.shape[0]\n",
" outlier_percentage = (outlier_count / total_count) * 100\n",
" return f\"Выбросы по {column}: {outlier_percentage:.2f}%\"\n",
" return f\"Признак {column} не найден.\"\n",
"\n",
"# 5. Leakage check: absolute correlation of numeric features with the target (top 10); very high values hint at leakage\n",
"def check_data_leakage(dataframe, target_column):\n",
" if target_column in dataframe.columns:\n",
" correlation_matrix = dataframe.select_dtypes(include=[np.number]).corr()\n",
" leakage_info = correlation_matrix[target_column].abs().nlargest(10)\n",
" leakage_report = \", \".join([f\"{feature}: {value:.2f}\" for feature, value in leakage_info.items() if feature != target_column])\n",
" return f\"Признаки просачивания данных: {leakage_report}\"\n",
" return \"Целевой признак не найден.\""
]
},
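{
"cell_type": "markdown",
"metadata": {},
"source": [
"`check_bias` reports the share of unique values in a column, which describes cardinality rather than bias. For a categorical target such as `stroke`, imbalance is usually judged by the share of each class instead. A minimal sketch (the helper name `class_balance` is hypothetical):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def class_balance(frame, target):\n",
"    # Percentage of rows that fall into each class of the target column\n",
"    return frame[target].value_counts(normalize=True).mul(100).round(2)\n",
"\n",
"# Hypothetical toy target: 1 positive case out of 10\n",
"toy = pd.DataFrame({'stroke': [0] * 9 + [1]})\n",
"print(class_balance(toy, 'stroke'))\n",
"```\n",
"\n",
"Applied as `class_balance(var4, 'stroke')`, a small share for class 1 would indicate that the target is imbalanced and that the augmentation in tasks 11 and 12 is worth considering."
]
},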
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Stroke risk dataset:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Зашумленность: 0.33%\n",
"Смещение по avg_glucose_level: 77.87% уникальных значений\n",
"Количество дубликатов: 0.00%\n",
"Выбросы по avg_glucose_level: 12.27%\n",
"Признаки просачивания данных: age: 0.25, heart_disease: 0.13, avg_glucose_level: 0.13, hypertension: 0.13, bmi: 0.04, id: 0.01\n"
]
}
],
"source": [
"noise_columns = check_noise(var4)\n",
"bias_info = check_bias(var4, 'avg_glucose_level') \n",
"duplicate_count = check_duplicates(var4)\n",
"outliers_data = check_outliers(var4, 'avg_glucose_level') \n",
"leakage_info = check_data_leakage(var4, 'stroke') \n",
"\n",
"print(noise_columns)\n",
"print(bias_info)\n",
"print(duplicate_count)\n",
"print(outliers_data)\n",
"print(leakage_info)"
]
},
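{
"cell_type": "markdown",
"metadata": {},
"source": [
"`check_outliers` only counts values outside the fences. One common way to address them (task 7), assuming the extreme values are plausible rather than data-entry errors, is to clip them to the fences. `clip_outliers_iqr` is a hypothetical helper that reuses the same 1.5 * IQR rule:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def clip_outliers_iqr(series, k=1.5):\n",
"    # Same fences as check_outliers: [Q1 - k * IQR, Q3 + k * IQR]\n",
"    q1, q3 = series.quantile(0.25), series.quantile(0.75)\n",
"    iqr = q3 - q1\n",
"    # Values beyond a fence are replaced by that fence\n",
"    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)\n",
"\n",
"toy = pd.Series([1, 2, 3, 4, 100])\n",
"print(clip_outliers_iqr(toy).tolist())\n",
"```\n",
"\n",
"For a right-skewed column such as `avg_glucose_level`, a log transform is an alternative that keeps the ordering of extreme values; either way, the fences are best computed on the training split only."
]
},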
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### House sales dataset:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Зашумленность: 0.00%\n",
"Смещение по price: 18.64% уникальных значений\n",
"Количество дубликатов: 0.00%\n",
"Выбросы по yr_renovated: 4.23%\n",
"Признаки просачивания данных: yr_built: 0.36, floors: 0.26, sqft_basement: 0.17, sqft_above: 0.16, grade: 0.14, bathrooms: 0.12, long: 0.11, sqft_living15: 0.09, yr_renovated: 0.06\n"
]
}
],
"source": [
"noise_columns = check_noise(var6)\n",
"bias_info = check_bias(var6, 'price') \n",
"duplicate_count = check_duplicates(var6)\n",
"outliers_data = check_outliers(var6, 'yr_renovated') \n",
"leakage_info = check_data_leakage(var6, 'condition') \n",
"\n",
"print(noise_columns)\n",
"print(bias_info)\n",
"print(duplicate_count)\n",
"print(outliers_data)\n",
"print(leakage_info)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Mobile device price dataset:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Зашумленность: 2.36%\n",
"Смещение по company: 1.90% уникальных значений\n",
"Количество дубликатов: 0.00%\n",
"Выбросы по Spec_score: 1.24%\n",
"Признаки просачивания данных: Spec_score: 0.06, Unnamed: 0: 0.03\n"
]
}
],
"source": [
"noise_columns = check_noise(var18)\n",
"bias_info = check_bias(var18, 'company') \n",
"duplicate_count = check_duplicates(var18)\n",
"outliers_data = check_outliers(var18, 'Spec_score') \n",
"leakage_info = check_data_leakage(var18, 'Rating') \n",
"\n",
"print(noise_columns)\n",
"print(bias_info)\n",
"print(duplicate_count)\n",
"print(outliers_data)\n",
"print(leakage_info)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 9. Handle missing data. Use a different method for each dataset: deletion, filling with a constant (0 or similar), filling with the mean"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"id 0\n",
"gender 0\n",
"age 0\n",
"hypertension 0\n",
"heart_disease 0\n",
"ever_married 0\n",
"work_type 0\n",
"Residence_type 0\n",
"avg_glucose_level 0\n",
"bmi 201\n",
"smoking_status 0\n",
"stroke 0\n",
"dtype: int64"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Stroke\n",
"var4.isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"var4['bmi'] = var4['bmi'].fillna(var4['bmi'].mean())"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"id 0\n",
"gender 0\n",
"age 0\n",
"hypertension 0\n",
"heart_disease 0\n",
"ever_married 0\n",
"work_type 0\n",
"Residence_type 0\n",
"avg_glucose_level 0\n",
"bmi 0\n",
"smoking_status 0\n",
"stroke 0\n",
"dtype: int64"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var4.isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"id 0\n",
"date 0\n",
"price 0\n",
"bedrooms 0\n",
"bathrooms 0\n",
"sqft_living 0\n",
"sqft_lot 0\n",
"floors 0\n",
"waterfront 0\n",
"view 0\n",
"condition 0\n",
"grade 0\n",
"sqft_above 0\n",
"sqft_basement 0\n",
"yr_built 0\n",
"yr_renovated 0\n",
"zipcode 0\n",
"lat 0\n",
"long 0\n",
"sqft_living15 0\n",
"sqft_lot15 0\n",
"dtype: int64"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Houses\n",
"var6.isnull().sum()"
]
},
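{
"cell_type": "markdown",
"metadata": {},
"source": [
"Task 9 lists deletion as one of the three strategies; since the check above finds no gaps in `var6`, `dropna()` would remove nothing there. A minimal sketch of the deletion strategy on a hypothetical frame with gaps:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical frame with gaps; var6 itself reports no missing values\n",
"toy = pd.DataFrame({\n",
"    'price': [300000.0, np.nan, 450000.0],\n",
"    'bedrooms': [3, 2, np.nan],\n",
"})\n",
"\n",
"# Deletion strategy: drop every row that has at least one missing value\n",
"cleaned = toy.dropna()\n",
"print(len(toy), '->', len(cleaned))  # 3 -> 1\n",
"\n",
"# Narrower variant: drop a row only when a specific column is missing\n",
"cleaned_price = toy.dropna(subset=['price'])\n",
"print(len(cleaned_price))  # 2\n",
"```\n",
"\n",
"The `subset` variant loses fewer rows, which matters when gaps are concentrated in a column that is not essential for the model."
]
},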
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Unnamed: 0 0\n",
"Name 0\n",
"Rating 0\n",
"Spec_score 0\n",
"No_of_sim 0\n",
"Ram 0\n",
"Battery 0\n",
"Display 0\n",
"Camera 0\n",
"External_Memory 0\n",
"Android_version 443\n",
"Price 0\n",
"company 0\n",
"Inbuilt_memory 19\n",
"fast_charging 89\n",
"Screen_resolution 2\n",
"Processor 28\n",
"Processor_name 0\n",
"dtype: int64"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Mobile devices\n",
"var18.isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"# Fill the gaps in each text column with a placeholder constant (each column from itself)\n",
"cols_to_fill = ['Android_version', 'Inbuilt_memory', 'fast_charging', 'Screen_resolution', 'Processor']\n",
"var18[cols_to_fill] = var18[cols_to_fill].fillna('No info')"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Unnamed: 0 0\n",
"Name 0\n",
"Rating 0\n",
"Spec_score 0\n",
"No_of_sim 0\n",
"Ram 0\n",
"Battery 0\n",
"Display 0\n",
"Camera 0\n",
"External_Memory 0\n",
"Android_version 0\n",
"Price 0\n",
"company 0\n",
"Inbuilt_memory 0\n",
"fast_charging 0\n",
"Screen_resolution 0\n",
"Processor 0\n",
"Processor_name 0\n",
"dtype: int64"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var18.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10. Split each dataset into training, validation and test sets\n"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"var4 Dataset:\n",
"Train: 80.00%\n",
"Validation: 10.00%\n",
"Test: 10.00%\n",
"\n"
]
}
],
"source": [
"# Split var4 (stroke)\n",
"\n",
"original_var4_size = len(var4)\n",
"train_var4, temp_var4 = train_test_split(var4, test_size=0.2, random_state=42)\n",
"val_var4, test_var4 = train_test_split(temp_var4, test_size=0.5, random_state=42)\n",
"\n",
"print(\"var4 Dataset:\")\n",
"print(f\"Train: {len(train_var4)/original_var4_size*100:.2f}%\")\n",
"print(f\"Validation: {len(val_var4)/original_var4_size*100:.2f}%\")\n",
"print(f\"Test: {len(test_var4)/original_var4_size*100:.2f}%\\n\")"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"var6 Dataset:\n",
"Train: 80.00%\n",
"Validation: 10.00%\n",
"Test: 10.00%\n",
"\n"
]
}
],
"source": [
"# Split var6 (houses)\n",
"original_var6_size = len(var6)\n",
"train_var6, temp_var6 = train_test_split(var6, test_size=0.2, random_state=42)\n",
"val_var6, test_var6 = train_test_split(temp_var6, test_size=0.5, random_state=42)\n",
"\n",
"print(\"var6 Dataset:\")\n",
"print(f\"Train: {len(train_var6)/original_var6_size*100:.2f}%\")\n",
"print(f\"Validation: {len(val_var6)/original_var6_size*100:.2f}%\")\n",
"print(f\"Test: {len(test_var6)/original_var6_size*100:.2f}%\\n\")"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"var18 Dataset:\n",
"Train: 80.00%\n",
"Validation: 10.00%\n",
"Test: 10.00%\n",
"\n"
]
}
],
"source": [
"# Split var18 (mobile devices)\n",
"original_var18_size = len(var18)\n",
"train_var18, temp_var18 = train_test_split(var18, test_size=0.2, random_state=42)\n",
"val_var18, test_var18 = train_test_split(temp_var18, test_size=0.5, random_state=42)\n",
"\n",
"print(\"var18 Dataset:\")\n",
"print(f\"Train: {len(train_var18)/original_var18_size*100:.2f}%\")\n",
"print(f\"Validation: {len(val_var18)/original_var18_size*100:.2f}%\")\n",
"print(f\"Test: {len(test_var18)/original_var18_size*100:.2f}%\\n\")"
]
},
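{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three cells above repeat the same 80/10/10 split. Because `stroke` is a binary target, an unstratified split can leave the validation or test part with very few positive cases if they are rare. A minimal sketch of a reusable helper that optionally passes `stratify=` to scikit-learn's `train_test_split` (the helper name `split_80_10_10` is hypothetical):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"def split_80_10_10(frame, stratify_col=None, seed=42):\n",
"    # First cut: 80% train vs 20% temporary hold-out\n",
"    strat = frame[stratify_col] if stratify_col else None\n",
"    train, temp = train_test_split(frame, test_size=0.2, random_state=seed, stratify=strat)\n",
"    # Second cut: halve the hold-out into validation and test\n",
"    strat_temp = temp[stratify_col] if stratify_col else None\n",
"    val, test = train_test_split(temp, test_size=0.5, random_state=seed, stratify=strat_temp)\n",
"    return train, val, test\n",
"\n",
"# Hypothetical toy data with a 10% positive class\n",
"toy = pd.DataFrame({'x': range(100), 'stroke': [1] * 10 + [0] * 90})\n",
"train, val, test = split_80_10_10(toy, stratify_col='stroke')\n",
"print(len(train), len(val), len(test), train['stroke'].sum(), test['stroke'].sum())\n",
"```\n",
"\n",
"For the continuous targets of `var6` and `var18`, stratifying would require binning the target first, so plain random splitting as above is the simpler default."
]
},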
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 11. Assess how balanced the splits of each dataset are. Assess whether data augmentation is needed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 12. Augment the data with oversampling and undersampling. Examples of both methods must be given for the splits of each dataset."
]
},
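{
"cell_type": "markdown",
"metadata": {},
"source": [
"Random oversampling duplicates rows of the smaller classes, while random undersampling discards rows of the larger ones. A minimal pandas-only sketch of both for a categorical target; the helper names are hypothetical, and the `imbalanced-learn` classes `RandomOverSampler` and `RandomUnderSampler` implement the same random schemes:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def random_oversample(frame, target, seed=42):\n",
"    # Resample smaller classes with replacement until each matches the largest class\n",
"    n_max = frame[target].value_counts().max()\n",
"    parts = []\n",
"    for _, group in frame.groupby(target):\n",
"        if len(group) < n_max:\n",
"            group = group.sample(n=n_max, replace=True, random_state=seed)\n",
"        parts.append(group)\n",
"    return pd.concat(parts).reset_index(drop=True)\n",
"\n",
"def random_undersample(frame, target, seed=42):\n",
"    # Keep a random subset of every class, sized to the smallest class\n",
"    n_min = frame[target].value_counts().min()\n",
"    parts = [group.sample(n=n_min, random_state=seed) for _, group in frame.groupby(target)]\n",
"    return pd.concat(parts).reset_index(drop=True)\n",
"\n",
"# Hypothetical toy training split: 2 positive and 8 negative rows\n",
"toy = pd.DataFrame({'x': range(10), 'stroke': [1] * 2 + [0] * 8})\n",
"print(random_oversample(toy, 'stroke')['stroke'].value_counts().sort_index())\n",
"print(random_undersample(toy, 'stroke')['stroke'].value_counts().sort_index())\n",
"```\n",
"\n",
"Resampling is normally applied to the training split only (e.g. `train_var4`), so the validation and test splits keep the real class proportions. For the continuous targets of `var6` and `var18`, the target would first have to be binned (for instance with `pd.qcut`) to define the groups."
]
},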
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def plot_sample_balance(y, sample_name):\n",
" plt.figure(figsize=(8, 5))\n",
" sns.histplot(y, bins=30, kde=True)\n",
" plt.title(f'Распределение целевой переменной для {sample_name}')\n",
" plt.xlabel(sample_name)\n",
" plt.ylabel('Частота')\n",
" plt.show()\n",
"\n",
"# Assess how balanced the splits are\n",
"plot_sample_balance(train_var6['price'], 'Train var6')\n",
"plot_sample_balance(val_var6['price'], 'Validation var6')\n",
"plot_sample_balance(test_var6['price'], 'Test var6')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The price distributions of the train, validation and test splits of this dataset look alike, which suggests that the random split kept the splits balanced."
]
},
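{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visual comparison above can be backed by numbers: if the splits are balanced, their quantiles of the target should be close. A minimal sketch on hypothetical right-skewed values standing in for house prices; `quantile_table` is an illustrative helper:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def quantile_table(samples, qs=(0.1, 0.25, 0.5, 0.75, 0.9)):\n",
"    # One row per split, one column per quantile of the target variable\n",
"    return pd.DataFrame({name: s.quantile(list(qs)) for name, s in samples.items()}).T\n",
"\n",
"# Hypothetical skewed values, not the real prices\n",
"rng = np.random.default_rng(0)\n",
"values = pd.Series(rng.lognormal(mean=13.0, sigma=0.5, size=1000))\n",
"table = quantile_table({\n",
"    'train': values.iloc[:800],\n",
"    'val': values.iloc[800:900],\n",
"    'test': values.iloc[900:],\n",
"})\n",
"print(table.round(0))\n",
"```\n",
"\n",
"With the notebook's data this would be `quantile_table({'train': train_var6['price'], 'val': val_var6['price'], 'test': test_var6['price']})`; small 10% splits will naturally show more noise in the extreme quantiles."
]
},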
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAArcAAAHWCAYAAABt3aEVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABYLElEQVR4nO3de5xN9f7H8ffec7+7zs0t5J7LaYQh12hCyqHjkkSJYnRC0RFyLSUnldDllOlCoqJfCiGpGCUhuYVohBkmzZW57u/vD2f2sc0MM2PYY3s9H4/1YH/Xd6/1WXvtNfOetb9rbYsxxggAAABwAVZnFwAAAACUFsItAAAAXAbhFgAAAC6DcAsAAACXQbgFAACAyyDcAgAAwGUQbgEAAOAyCLcAAABwGYRbAAAAuAzCLQCUwLPPPiubzSZJstlsmjlzppMrQnH88ssvWrFihf3xjh079PnnnzuvoDLIYrFoypQp9scxMTGyWCw6cuTIJZ97ww03aPDgwaVaz+DBg3XDDTeU6jLhmgi3kPS/H1p5k7e3t+rWrauRI0cqISHB2eUBZc4777yj2bNn648//tC///1vvfPOO84uCcWQmpqqhx9+WFu2bNGBAwf02GOPadeuXc4uq0T++c9/ymKx6ODBg4X2mTBhgiwWi37++eerWFnxHT9+XFOmTNGOHTucXUqZsGjRIlksFvn7+zu7lGsK4RYOpk2bpvfee0+vvvqqWrdurQULFigyMlJnzpxxdmlAmTJt2jRNmjRJ1apV06RJkzRjxgxnl4RiiIyMtE9169ZVfHy8hg4d6uyySmTAgAGSpMWLFxfa54MPPlDjxo3VpEmTEq9n4MCBOnv2rGrUqFHiZVzK8ePHNXXq1ALD7Ztvvqn9+/dfsXWXNWlpaRo3bpz8/PycXco1h3ALB127dtV9992nhx56SDExMRo1apQOHz6sTz/91NmlAWVK3759dfToUW3atElHjx7VPffc4+ySUEwrVqzQ7t279eOPP2rXrl2qWLGis0sqkZYtW+rGG2/UBx98UOD82NhYHT582B6CS8rNzU3e3t6yWCyXtZyS8vDwkJeXl1PWfaVlZGTYhznlmTFjhgICAtSzZ0/nFHUNI9ziojp16iRJOnz4sCTp9OnTeuKJJ9S4cWP5+/srMDBQXbt21c6dO/M9NyMjQ1OmTFHdunXl7e2tsLAw9erVS4cOHZIkHTlyxGEoxIVThw4d7Mv6+uuvZbFY9OGHH+qpp55SaGio/Pz8dNddd+no0aP51v3999/rjjvuUFBQkHx9fdW+fXtt2rSpwG3s0KFDges/f6xZnvfff18RERHy8fFRhQoV1K9fvwLXf7FtO5/NZtNLL72kRo0aydvbWyEhIXr44Yf1119/OfS74YYbdOedd+Zbz8iRI/Mts6DaX3jhhXyvqSRlZmZq8uTJuvHGG+Xl5aVq1app3LhxyszMLPC1Ol+HDh1000035WufPXt2gePykpKSNGrUKFWrVk1eXl668cYb9fzzz+f7gS5JU6ZMKfC1u3AM37Fjx/Tggw8qJCREXl5eatSokd5++22HPnnvnbzJy8tLdevW1cyZM2WMcei7fft2de3aVYGBgfL399dtt92mLVu2OPQ5f9xhcHCwWrdurYoVK6pJkyayWCyKiYm56Ot24RCgS73virONpXl85O2D4OBgZWdnO8z74IMP7PUmJiY6zFu1apXatm0rPz8/BQQEqHv37tq9e7dDn8GDBxf4MetHH30ki8Wir7/+2t5W3PfZ/Pnz1ahRI3l5eSk8PFzR0dFKSkpy6NOhQwf7sdCwYUNFRERo586dBR6jF1PYPjy//vO3uSj7+6OPPlLz5s0VEBDg0G/27NkXrWXAgAHat2+ffvrpp3zzFi9eLIvFov79+ysrK0tPP/20IiIiFBQUJD8/P7Vt21YbNmy45PYWNObWGKMZM2aoatWq8vX1VceOHfPtb6lovzu+/vpr3XLLLZKkBx
54wL7tecdUQWNu09PT9fjjj9t/rtSrV0+zZ8/Od2xbLBaNHDlSK1as0E033WQ/llavXn3RbU5ISJC7u7umTp2ab97+/ftlsVj06quvFnkb87bTYrFoyZIlmjhxoqpUqSJfX1+lpKTY+xw4cEBz5szRiy++KHd394vWiPx4xXBReUE074zGb7/9phUrVugf//iHatasqYSEBL3++utq37699uzZo/DwcElSbm6u7rzzTq1fv179+vXTY489ptTUVK1du1a//PKLateubV9H//791a1bN4f1jh8/vsB6nnnmGVksFj355JM6efKkXnrpJXXu3Fk7duyQj4+PJOmrr75S165dFRERocmTJ8tqtWrhwoXq1KmTvv32W7Vo0SLfcqtWrWq/ICgtLU3Dhw8vcN2TJk1Snz599NBDD+nUqVOaO3eu2rVrp+3bt6tcuXL5njNs2DC1bdtWkvTJJ59o+fLlDvMffvhhxcTE6IEHHtA///lPHT58WK+++qq2b9+uTZs2ycPDo8DXoTiSkpIKvNjJZrPprrvu0nfffadhw4apQYMG2rVrl+bMmaNff/3V4WKby3XmzBm1b99ex44d08MPP6zq1atr8+bNGj9+vE6cOKGXXnqpwOe999579v+PHj3aYV5CQoJatWpl/6VVuXJlrVq1SkOGDFFKSopGjRrl0P+pp55SgwYNdPbsWXsIDA4O1pAhQyRJu3fvVtu2bRUYGKhx48bJw8NDr7/+ujp06KCNGzeqZcuWhW7fe++9V+zxmtOmTVPNmjXtjwt63xV3G6/E8ZGamqqVK1fq73//u71t4cKF8vb2VkZGRr7XYdCgQYqKitLzzz+vM2fOaMGCBbr11lu1ffv2K34x0JQpUzR16lR17txZw4cP1/79+7VgwQJt3br1ksfTk08+WaJ1dunSRffff78kaevWrXrllVcK7VupUiXNmTPH/njgwIEO82NjY9WnTx81bdpUzz33nIKCgpSYmJjvvV+QAQMGaOrUqVq8eLFuvvlme3tubq6WLl2qtm3bqnr16kpMTNR//vMf9e/fX0OHDlVqaqreeustRUVF6YcfflCzZs2Ktf1PP/20ZsyYoW7duqlbt2766aefdPvttysrK8uhX1F+dzRo0EDTpk3T008/7fCzs3Xr1gWu2xiju+66Sxs2bNCQIUPUrFkzrVmzRmPHjtWxY8ccXmtJ+u677/TJJ59oxIgRCggI0CuvvKLevXsrLi6u0LP2ISEhat++vZYuXarJkyc7zPvwww/l5uamf/zjH0XexvNNnz5dnp6eeuKJJ5SZmSlPT0/7vFGjRqljx47q1q2bli5dWoQ9AQcGMMYsXLjQSDLr1q0zp06dMkePHjVLliwxFStWND4+PuaPP/4wxhiTkZFhcnNzHZ57+PBh4+XlZaZNm2Zve/vtt40k8+KLL+Zbl81msz9PknnhhRfy9WnUqJFp3769/fGGDRuMJFOlShWTkpJib1+6dKmRZF5++WX7suvUqWOioqLs6zHGmDNnzpiaNWuaLl265FtX69atzU033WR/fOrUKSPJTJ482d525MgR4+bmZp555hmH5+7atcu4u7vnaz9w4ICRZN555x172+TJk835h9y3335rJJlFixY5PHf16tX52mvUqGG6d++er/bo6Ghz4WF8Ye3jxo0zwcHBJiIiwuE1fe+994zVajXffvutw/Nfe+01I8ls2rQp3/rO1759e9OoUaN87S+88IKRZA4fPmxvmz59uvHz8zO//vqrQ99//etfxs3NzcTFxTm0T5gwwVgsFoe2GjVqmEGDBtkfDxkyxISFhZnExESHfv369TNBQUHmzJkzxpj/vXc2bNhg75ORkWGsVqsZMWKEva1nz57G09PTHDp0yN52/PhxExAQYNq1a2dvyztW8rYvIyPDVK9e3XTt2tVIMgsXLsz/Yp0n7/lbt251aC/ofVfcbSzN4yPv/dq/f39z55132tt///13Y7VaTf/+/Y0kc+rUKW
OMMampqaZcuXJm6NChDrXGx8eboKAgh/ZBgwYZPz+/fK/NsmXL8u2ror7PTp48aTw9Pc3tt9/u8DPq1VdfNZLM22+
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAArcAAAHWCAYAAABt3aEVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABSTUlEQVR4nO3deZyNdf/H8fc5s++DMTMGI2SP1AhjzTpJi5sSyRJRosJ9qxRZQxKV0I6KhKI7SXYqoxCyJ0sjzDBpdrNfvz/cc36OmWFmjDnj8no+HteD63t9z3V9rnOd5T3X+Z7rWAzDMAQAAACYgNXRBQAAAADFhXALAAAA0yDcAgAAwDQItwAAADANwi0AAABMg3ALAAAA0yDcAgAAwDQItwAAADANwi0AAABMg3ALAKXQ5MmTlZ2dLUnKzs7WlClTHFwRCmPfvn1asWKFbX737t369ttvHVcQcBMh3KJEzJ8/XxaLxTa5u7urZs2aGjp0qGJiYhxdHlDqLFiwQNOnT9dff/2lN954QwsWLHB0SSiExMREPfnkk9q2bZuOHDmi5557Tnv37nV0WUVyyy232L1+5zfNnz+/WLY3efJkuz8MSquFCxfKYrHI29vb0aXgMhbDMAxHFwHzmz9/vh5//HFNmDBBVatWVWpqqn788Ud9+umnqlKlivbt2ydPT09HlwmUGl988YX69Omj9PR0ubm56bPPPtNDDz3k6LJQCF26dNHXX38tSapZs6a2bt2qcuXKObiqwluxYoWSkpJs86tWrdLnn3+umTNnKiAgwNberFkzVatW7Zq35+3trYceeqjYwvL1kJSUpFq1aik+Pt42j9LD2dEF4ObSqVMnNWrUSJL0xBNPqFy5cpoxY4a+/vpr9ezZ08HVAaXHI488ojZt2uiPP/5QjRo1VL58eUeXhEJasWKFDhw4oAsXLqh+/fpydXV1dElF0qVLF7v56Ohoff755+rSpYtuueUWh9RUklJTU+Xq6iqr9f8/7J40aZJ8fHzUpk2bG+Is882GYQlwqLZt20qSjh8/Lkk6f/68/vOf/6h+/fry9vaWr6+vOnXqpD179uS6bWpqqsaNG6eaNWvK3d1dFSpUUNeuXXX06FFJ0okTJ674Edrdd99tW9emTZtksVj0xRdf6KWXXlJwcLC8vLz0wAMP6OTJk7m2/fPPP+uee+6Rn5+fPD091bp1a/3000957uPdd9+d5/bHjRuXq+9nn32msLAweXh4qGzZsurRo0ee27/Svl0qOztbb775purVqyd3d3cFBQXpySef1D///GPX75ZbbtF9992XaztDhw7Ntc68an/99ddz3aeSlJaWprFjx+rWW2+Vm5ubKleurOeff15paWl53leXuvvuu3Xbbbflap8+fbosFotOnDhh1x4XF6dhw4apcuXKcnNz06233qrXXnvNNm71UuPGjcvzvuvXr59dv1OnTql///4KCgqSm5ub6tWrp48//tiuT85jJ2dyc3NTzZo1NWXKFF3+wdiuXbvUqVMn+fr6ytvbW+3atdO2bdvs+uQM4Tlx4oQCAwPVrFkzlStXTg0aNCjQR7+XDwG62uOuMPtYnM+PnGMQGBiojIwMu2Wff/65rd7Y2Fi7Zd99951atmwpLy8v+fj4qHPnztq/f79dn379+uX5UfGyZctksVi0adMmW1thH2dz5sxRvXr15ObmppCQEA0ZMkRxcXF2fe6++27bc6Fu3boKCwvTnj178nyOXkl+x/DS+i/d54Ic72XLlqlRo0by8fGx6zd9+vQC15Wfgrx+HTlyRN26dVNwcLDc3d1VqVIl9ejRw3YG1GKxKDk5WQsWLMj3eZkjJiZGzs7OGj9+fK5lhw8flsVi0TvvvCOp4O8tOY/1xYsXa/To0apYsaI8PT2VkJBgtw8zZ87UjBkz5OzMOcLSiKMCh8oJojkf1R07dkwrVqzQww8/rKpVqyomJkbvvfeeWrdurQMHDigkJESSlJ
WVpfvuu0/r169Xjx499NxzzykxMVFr167Vvn37VL16dds2evbsqXvvvdduu6NGjcqznldffVUWi0UvvPCCzp49qzfffFPt27fX7t275eHhIUnasGGDOnXqpLCwMI0dO1ZWq1Xz5s1T27Zt9cMPP6hx48a51lupUiXbF4KSkpI0ePDgPLc9ZswYde/eXU888YTOnTunWbNmqVWrVtq1a5f8/f1z3WbQoEFq2bKlJOmrr77S8uXL7ZY/+eSTtiEhzz77rI4fP6533nlHu3bt0k8//SQXF5c874fCiIuLy/PLTtnZ2XrggQf0448/atCgQapTp4727t2rmTNn6vfffy/Wsx0pKSlq3bq1Tp06pSeffFKhoaHaunWrRo0apTNnzujNN9/M83affvqp7f/Dhw+3WxYTE6OmTZvKYrFo6NChKl++vL777jsNGDBACQkJGjZsmF3/l156SXXq1NGFCxdsITAwMFADBgyQJO3fv18tW7aUr6+vnn/+ebm4uOi9997T3Xffrc2bN6tJkyb57t+nn35a6PGaOUOAcuT1uCvsPl6P50diYqJWrlypf/3rX7a2efPmyd3dXampqbnuh759+yoiIkKvvfaaUlJSNHfuXLVo0UK7du267mcRx40bp/Hjx6t9+/YaPHiwDh8+rLlz52r79u1XfT698MILRdpmhw4d1KdPH0nS9u3b9fbbb+fbNyAgQDNnzrTN9+7d2255ZGSkunfvrttvv11Tp06Vn5+fYmNjcz32i6Igr1/p6emKiIhQWlqannnmGQUHB+vUqVNauXKl4uLi5Ofnp08//VRPPPGEGjdurEGDBkmS3ev5pYKCgtS6dWstWbJEY8eOtVv2xRdfyMnJSQ8//LCkgr+35Jg4caJcXV31n//8R2lpaXZn3YcNG6Y2bdro3nvv1ZIlS675vsN1YAAlYN68eYYkY926dca5c+eMkydPGosXLzbKlStneHh4GH/99ZdhGIaRmppqZGVl2d32+PHjhpubmzFhwgRb28cff2xIMmbMmJFrW9nZ2bbbSTJef/31XH3q1atntG7d2ja/ceNGQ5JRsWJFIyEhwda+ZMkSQ5Lx1ltv2dZdo0YNIyIiwrYdwzCMlJQUo2rVqkaHDh1ybatZs2bGbbfdZps/d+6cIckYO3asre3EiROGk5OT8eqrr9rddu/evYazs3Ou9iNHjhiSjAULFtjaxo4da1z6lP7hhx8MScbChQvtbrt69epc7VWqVDE6d+6cq/YhQ4YYl79MXF77888/bwQGBhphYWF29+mnn35qWK1W44cffrC7/bvvvmtIMn766adc27tU69atjXr16uVqf/311w1JxvHjx21tEydONLy8vIzff//dru+LL75oODk5GVFRUXbtL7/8smGxWOzaqlSpYvTt29c2P2DAAKNChQpGbGysXb8ePXoYfn5+RkpKimEY///Y2bhxo61PamqqYbVajaefftrW1qVLF8PV1dU4evSore306dOGj4+P0apVK1tbznMlZ/9SU1ON0NBQo1OnToYkY968ebnvrEvk3H779u127Xk97gq7j8X5/Mh5vPbs2dO47777bO1//vmnYbVajZ49exqSjHPnzhmGYRiJiYmGv7+/MXDgQLtao6OjDT8/P7v2vn37Gl5eXrnum6VLl+Y6VgV9nJ09e9ZwdXU1OnbsaPca9c477xiSjI8//thunZc+F1atWmVIMu65555cz6f8pKenG5KMoUOHXrH+HL169TKqVq1q13b58R41apQhyThz5oyt7Uqvk/m5/L4p6OvXrl27DEnG0qVLr7h+Ly8vu+filbz33nuGJGPv3r127XXr1jXatm1rmy/oe0vOY71atWq2x/+lVq5caTg7Oxv79+83DCP/xxoci2EJKFHt27dX+fLlVblyZfXo0UPe3t5avny5KlasKElyc3OzjWvKysrS33//LW9vb9WqVUu//vqrbT1ffvmlAgIC9Mwzz+TaRmE+9rtcnz595OPjY5
t/6KGHVKFCBa1atUrSxcv5HDlyRI8++qj+/vtvxcbGKjY2VsnJyWrXrp22bNmS62Pw1NRUubu7X3G7X331lbKzs9W
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_sample_balance(train_var4['stroke'], 'Train var4')\n",
"plot_sample_balance(val_var4['stroke'], 'Validation var4')\n",
"plot_sample_balance(test_var4['stroke'], 'Test var4')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
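    "The three splits look similar, but all of them show a pronounced class imbalance: stroke cases are a small minority. This is a problem for later modelling, because a classifier trained on such data tends to predict the majority class and miss stroke cases."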
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAArwAAAHWCAYAAACVPVriAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB/a0lEQVR4nO3dd3hTZf8G8PskadO9N3RRCi1ll2EZsgoIqCAogkURfIFXQUUUEBUHDkT5KVMRXwVFUMGB6wVkCYKl7F1WaWnpTktHmjZtk/P7ozQvoS10pD1pen+uKxfknJMn3+Rk3H3ynOcIoiiKICIiIiKyUDKpCyAiIiIiakwMvERERERk0Rh4iYiIiMiiMfASERERkUVj4CUiIiIii8bAS0REREQWjYGXiIiIiCwaAy8RERERWTQGXiIiIiKyaAy8REQSeO+996DX6wEAer0eixcvlrgiqouzZ89i69athusnT57EH3/8IV1BzcCbb74JQRCkLoNaKAZeMon169dDEATDxcbGBu3atcOsWbOQmZkpdXlEZuerr77C0qVLcf36dfzf//0fvvrqK6lLojooLCzEjBkzcOjQIVy+fBnPP/88zpw5I3VZ9RIUFGT0+V3TZf369VKXKql3330XDz74ILy9vSEIAt58880at921axcGDRoEDw8PuLi4oFevXtiwYUPTFUtVCKIoilIXQc3f+vXrMWXKFCxatAjBwcEoKSnBgQMHsGHDBgQGBuLs2bOws7OTukwis/H999/jiSeeQGlpKZRKJb755hs8/PDDUpdFdTBmzBj88ssvAIB27drhn3/+gbu7u8RV1d3WrVuhVqsN1//73//i22+/xccffwwPDw/D8j59+qBNmzb1vp/y8nKUl5fDxsamQfVKRRAE+Pj4oEuXLtixYwfeeOONakPvr7/+ijFjxiAqKgoTJ06EIAjYvHkz9u/fj48++ggvvPBC0xdPDLxkGpWB98iRI+jRo4dh+YsvvoiPPvoImzZtwsSJEyWskMj8ZGVl4cqVKwgNDYWnp6fU5VA9nD9/HsXFxejUqROsra2lLsckli5dirlz5yIxMRFBQUE1bldUVAR7e/umK0wCJSUlsLa2hkwmQ1JSEoKCgqBSqeDp6Vlj4B02bBjOnTuHq1evQqlUAqgI+2FhYbC3t8epU6ea+FEQwCEN1MgGDx4MAEhMTAQA5Obm4qWXXkKnTp3g4OAAJycnjBgxotoPgJKSErz55pto164dbGxs4Ovri7FjxyIhIQEAkJSUdMef3wYOHGho66+//oIgCPj+++/xyiuvwMfHB/b29njwwQeRkpJS5b7j4uJw3333wdnZGXZ2dhgwYAAOHjxY7WMcOHBgtfdf3QfhN998g8jISNja2sLNzQ0TJkyo9v7v9NhupdfrsWzZMkRERMDGxgbe3t6YMWMGbty4YbRdUFAQ7r///ir3M2vWrCptVlf7hx9+WOU5BQCtVos33ngDbdu2hVKphL+/P+bNmwetVlvtc3WrgQMHomPHjlWWL126FIIgICkpyWh5Xl4eZs+eDX9/fyiVSrRt2xZLliwxjIO9VeVYwdsvTz75pNF2qampmDp1Kry9vaFUKhEREYEvv/zSaJvK107lRalUol27dli8eDFu7y84ceIERowYAScnJzg4OGDIkCE4dOiQ0TaVw3+SkpLg5eWFPn36wN3dHZ07d67Vz8a3Dx+62+uuLo/RlO+Pyn3g5eWFsrIyo3XffvutoV6VSmW0btu2bejfvz/s7e3h6OiIUaNG4dy5c0bbPPnkk3BwcKhS1w8//ABBEPDXX38ZltX1dfbJJ58gIiICSqUSfn5+mDlzJvLy8oy2GThwoOG90KFDB0RGRuLUqVPVvkfvpKZ9eGv9tz7m2uzvH374AT169ICjo6PRdkuXLq11XdWpfM4TEhIwcuRIODo6IiYmBgDw999/45FHHkFAQIDhc+CFF15AcXGxURvVjeEVBAGzZs3C1q
1b0bFjR8NrdPv27XesJzMzEwqFAm+99VaVdRcvXoQgCFi1ahWA2n/vVL4PvvvuO7z22mto1aoV7OzsUFBQAAB3DP+3KigogKurqyHsAoBCoYCHhwdsbW1r1QaZnkLqAsiyVYbTyp/5rl69iq1bt+KRRx5BcHAwMjMz8dlnn2HAgAE4f/48/Pz8AAA6nQ73338/du/ejQkTJuD5559HYWEhdu7cibNnzyIkJMRwHxMnTsTIkSON7nfBggXV1vPuu+9CEATMnz8fWVlZWLZsGaKjo3Hy5EnDB9GePXswYsQIREZG4o033oBMJsO6deswePBg/P333+jVq1eVdlu3bm046EitVuPpp5+u9r4XLlyI8ePH41//+heys7OxcuVK3HvvvThx4gRcXFyq3Gb69Ono378/AOCnn37Czz//bLR+xowZht715557DomJiVi1ahVOnDiBgwcPwsrKqtrnoS7y8vKqPaBKr9fjwQcfxIEDBzB9+nSEh4fjzJkz+Pjjj3Hp0iWjA3oaSqPRYMCAAUhNTcWMGTMQEBCAf/75BwsWLEB6ejqWLVtW7e1uHTN3+8+ImZmZuOeeewxfuJ6enti2bRueeuopFBQUYPbs2Ubbv/LKKwgPD0dxcbEhGHp5eeGpp54CAJw7dw79+/eHk5MT5s2bBysrK3z22WcYOHAg9u3bh969e9f4+DZs2FDn8Z+Vw4cqVfe6q+tjbIz3R2FhIX7//Xc89NBDhmXr1q2DjY0NSkpKqjwPkydPxvDhw7FkyRJoNBp8+umn6NevH06cOFHrwFFfb775Jt566y1ER0fj6aefxsWLF/Hpp5/iyJEjd30/zZ8/v173OXToUDzxxBMAgCNHjmDFihU1buvh4YGPP/7YcP3xxx83Wh8bG4vx48ejS5cueP/99+Hs7AyVSmWyn9DLy8sxfPhw9OvXD0uXLjUMU9uyZQs0Gg2efvppuLu74/Dhw1i5ciWuX7+OLVu23LXdAwcO4KeffsIzzzwDR0dHrFixAuPGjUNycnKNQ0S8vb0xYMAAbN68GW+88YbRuu+//x5yuRyPPPIIgNp/71R6++23YW1tjZdeeglarbbOPfcDBw7EkiVLsHDhQkyePBmCIGDTpk04evQoNm/eXKe2yIREIhNYt26dCEDctWuXmJ2dLaakpIjfffed6O7uLtra2orXr18XRVEUS0pKRJ1OZ3TbxMREUalUiosWLTIs+/LLL0UA4kcffVTlvvR6veF2AMQPP/ywyjYRERHigAEDDNf37t0rAhBbtWolFhQUGJZv3rxZBCAuX77c0HZoaKg4fPhww/2IoihqNBoxODhYHDp0aJX76tOnj9ixY0fD9ezsbBGA+MYbbxiWJSUliXK5XHz33XeNbnvmzBlRoVBUWX758mURgPjVV18Zlr3xxhvirW/Zv//+WwQgbty40ei227dvr7I8MDBQHDVqVJXaZ86cKd7+MXB77fPmzRO9vLzEyMhIo+d0w4YNokwmE//++2+j269Zs0YEIB48eLDK/d1qwIABYkRERJXlH374oQhATExMNCx7++23RXt7e/HSpUtG27788suiXC4Xk5OTjZa/+uqroiAIRssCAwPFyZMnG64/9dRToq+vr6hSqYy2mzBhgujs7CxqNBpRFP/32tm7d69hm5KSElEmk4nPPPOMYdmYMWNEa2trMSEhwbAsLS1NdHR0FO+9917Dssr3SuXjKykpEQMCAsQRI0aIAMR169ZVfbJuUXn7I0eOGC2v7nVX18doyvdH5et14sSJ4v33329Yfu3aNVEmk4kTJ04UAYjZ2dmiKIpiYWGh6OLiIk6bNs2o1oyMDNHZ2dlo+eTJk0V7e/sqz82WLVuq7Kvavs6ysrJEa2trcdiwYUafUatWrRIBiF9++aVRm7e+F/773/+KAMT77ruvyvupJqWlpSIAcdasWXesv1JMTIwYHBxstOz2/b1gwQIRgJienm5YdqfPyZpU9x6cPHmyCEB8+eWXq2
xf+Tq61eLFi0VBEMRr164Zlt3+GVb5GKytrcUrV64Ylp06dUoEIK5cufKOdX722WciAPHMmTNGyzt06CAOHjzYcL2
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAq8AAAHWCAYAAABZpGAJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB2RElEQVR4nO3dd3hTZf8G8DtJ23TvTSdll12G7CkbRVSGgCwRBQVEgbciIigi4k9BQQFfBURAlgwHICBLdoEytBRaSltKV7pn2ibP74/avIS20JE2SXt/rivXRc45ec43pycnNyfPeY5ECCFARERERGQEpPougIiIiIioohheiYiIiMhoMLwSERERkdFgeCUiIiIio8HwSkRERERGg+GViIiIiIwGwysRERERGQ2GVyIiIiIyGgyvRERERGQ0GF6JqF77+OOPoVarAQBqtRrLly/Xc0VUGTdv3sS+ffs0z0NDQ/Hbb7/pryADJJFI8MEHH2ieb9q0CRKJBPfu3Xvia/38/DBp0iSd1jNp0iT4+fnptE2qXxhe65iSg1LJw9zcHE2aNMEbb7yBxMREfZdHZHA2b96Mzz77DPfv38f//d//YfPmzfouiSohKysL06dPx/nz53Hnzh3Mnj0bN27c0HdZVTJr1ixIJBJERESUu8zChQshkUhw/fr1Wqys8h48eIAPPvgAoaGh+i5Fb5YtW4ZnnnkGbm5upf4D8aijR4+iT58+cHZ2hr29PTp16oQtW7bUXrFGhuG1jlq6dCm2bNmCNWvWoGvXrvjmm2/QpUsX5Obm6rs0IoOydOlSLFq0CN7e3li0aBE++ugjfZdEldClSxfNo0mTJkhISMC0adP0XVaVjBs3DgCwbdu2cpfZvn07WrVqhdatW1d5PRMmTEBeXh58fX2r3MaTPHjwAEuWLCkzvH777bcIDw+vsXUbivfeew+XLl1Cu3btHrvcgQMHMGDAABQUFOCDDz7AsmXLYGFhgZdffhlffPFFLVVrXEz0XQDVjMGDB6NDhw4AgFdeeQVOTk74/PPPsX//fowdO1bP1REZjtGjR6NPnz6IiIhA48aN4eLiou+SqJL27duHf/75B3l5eWjVqhXMzMz0XVKVdO7cGY0aNcL27dvx/vvvl5p/7tw5REVF4ZNPPqnWemQyGWQyWbXaqA5TU1O9rbum5efnw8zMDFKpFFFRUfDz84NCoXjscWXNmjXw8PDAn3/+CblcDgCYPn06mjVrhk2bNuGtt96qrfKNBs+81hN9+/YFAERFRQEAUlNT8c4776BVq1awtraGra0tBg8ejGvXrpV6bX5+Pj744AM0adIE5ubm8PDwwMiRIxEZGQkAuHfvnlZXhUcfvXv31rR14sQJSCQS7NixA++++y7c3d1hZWWFZ555BrGxsaXWfeHCBQwaNAh2dnawtLREr169cObMmTLfY+/evctcf1k/1fz4448ICgqChYUFHB0dMWbMmDLX/7j39jC1Wo1Vq1YhMDAQ5ubmcHNzw/Tp05GWlqa1nJ+fH4YNG1ZqPW+88UapNsuqfeXKlaW2KQAolUosXrwYjRo1glwuh7e3N+bPnw+lUlnmtnpY79690bJly1LTP/vsszL7xaWnp2POnDnw9vaGXC5Ho0aNsGLFCk2/0Yd98MEHZW67R/vQxcXFYcqUKXBzc4NcLkdgYCC+//57rWVK9p2Sh1wuR5MmTbB8+XIIIbSWvXr1KgYPHgxbW1tYW1ujX79+OH/+vNYyD/f7c3V1RdeuXeHk5ITWrVtDIpFg06ZNj91uj3bRedJ+V5n3qMvPR8nfwNXVFYWFhVrztm/frqlXoVBozTt48CB69OgBKysr2NjYYOjQofj777+1lpk0aRKsra1L1bV7925IJBKcOHFCM62y+9nXX3+NwMBAyOVyeHp6YubMmUhPT9dapnfv3prPQosWLRAUFIRr166V+Rl9nPL+hg/X//B7rsjfe/fu3ejQoQNsbGy0lvvss8
8eW8u4ceNw69YtXLlypdS8bdu2QSKRYOzYsSgoKMD777+PoKAg2NnZwcrKCj169MDx48ef+H7L6vMqhMBHH30ELy8vWFpaok+fPqX+3kDFvjtOnDiBjh07AgAmT56see8ln6my+rzm5OTg7bff1hxXmjZtis8++6zUZ1sikeCNN97Avn370LJlS81n6dChQ499z4mJiTAxMcGSJUtKzQsPD4dEIsGaNWsq/B5L3qdEIsFPP/2E9957Dw0aNIClpSUyMzMBoML9ejMzM+Hg4KAJrgBgYmICZ2dnWFhYVKiN+oZnXuuJkqDp5OQEALh79y727duHF198Ef7+/khMTMT69evRq1cv/PPPP/D09AQAqFQqDBs2DMeOHcOYMWMwe/ZsZGVl4ciRI7h58yYCAgI06xg7diyGDBmitd7g4OAy61m2bBkkEgkWLFiApKQkrFq1Cv3790doaKjmw/rnn39i8ODBCAoKwuLFiyGVSrFx40b07dsXp0+fRqdOnUq16+XlpbngJjs7G6+//nqZ6160aBFGjRqFV155BcnJyfjqq6/Qs2dPXL16Ffb29qVe8+qrr6JHjx4AgJ9//hl79+7Vmj99+nRs2rQJkydPxqxZsxAVFYU1a9bg6tWrOHPmjE7ONKSnp5d5MZFarcYzzzyDv/76C6+++iqaN2+OGzdu4IsvvsDt27e1LmaprtzcXPTq1QtxcXGYPn06fHx8cPbsWQQHByM+Ph6rVq0q83UP99169CxCYmIinnrqKc2XkouLCw4ePIipU6ciMzMTc+bM0Vr+3XffRfPmzZGXl6cJea6urpg6dSoA4O+//0aPHj1ga2uL+fPnw9TUFOvXr0fv3r1x8uRJdO7cudz3t2XLlkr3l1y6dCn8/f01z8va7yr7Hmvi85GVlYVff/0Vzz33nGbaxo0bYW5ujvz8/FLbYeLEiRg4cCBWrFiB3NxcfPPNN+jevTuuXr1a4xfbfPDBB1iyZAn69++P119/HeHh4fjmm29w6dKlJ36eFixYUKV1Pv3003j55ZcBAJcuXcKXX35Z7rLOzs5aP+dOmDBBa/65c+cwatQotGnTBp988gns7OygUCgqdAZt3LhxWLJkCbZt24b27dtrpqtUKuzcuRM9evSAj48PFAoF/vvf/2Ls2LGYNm0asrKy8N1332HgwIG4ePEi2rZtW6n3//777+Ojjz7CkCFDMGTIEFy5ckXzU/bDKvLd0bx5cyxduhTvv/++1rGza9euZa5bCIFnnnkGx48fx9SpU9G2bVscPnwY8+bNQ1xcXKmfzv/66y/8/PPPmDFjBmxsbPDll1/i+eefR0xMjOY77lFubm7o1asXdu7cicWLF2vN27FjB2QyGV588cUKv8eHffjhhzAzM8M777wDpVJZ6TP/vXv3xooVK7Bo0SJMnDgREokE27ZtQ0hICHbu3FmptuoNQXXKxo0bBQBx9OhRkZycLGJjY8VPP/0knJychIWFhbh//74QQoj8/HyhUqm0XhsVFSXkcrlYunSpZtr3338vAIjPP/+81LrUarXmdQDEypUrSy0TGBgoevXqpXl+/PhxAUA0aNBAZGZmaqbv3LlTABCrV6/WtN24cWMxcOBAzXqEECI3N1f4+/uLp59+utS6unbtKlq2bKl5npycLACIxYsXa6bdu3dPyGQysWzZMq3X3rhxQ5iYmJSafufOHQFAbN68WTNt8eLF4uGPzunTpwUAsXXrVq3XHjp0qNR0X19fMXTo0FK1z5w5Uzz6cXy09vnz5wtXV1cRFBSktU23bNkipFKpOH36tNbr161bJwCIM2fOlFrfw3r16iUCAwNLTV+5cqUAIKKiojTTPvzwQ2FlZSVu376ttex//vMfIZPJRExMjNb0hQsXColEojXN19dXTJw4UfN86tSpwsPDQygUCq3lxowZI+zs7ERubq4Q4n/7zvHjxzXL5OfnC6lUKmbMmKGZNmLECGFmZiYiIyM10x48eCBsbGxEz549Nd
NKPisl7y8/P1/4+PiIwYMHCwBi48aNpTfWQ0pef+nSJa3pZe13lX2Puvx8lOyvY8eOFcOGDdNMj46OFlKpVIwdO1Y
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAArwAAAHWCAYAAACVPVriAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB1eUlEQVR4nO3deVhU5d8G8HsWGHaQHRQQEAERN0zD3TQRzTJL0zQ1yzatzDKzMk0rM8u0NFveTHPJLVMrl9RcU3PFFRERHJR1QPZ95nn/IObnCCjL4AzD/bk6V85ZnvkOnDlzc+Y5z5EIIQSIiIiIiEyU1NAFEBERERE1JAZeIiIiIjJpDLxEREREZNIYeImIiIjIpDHwEhEREZFJY+AlIiIiIpPGwEtEREREJo2Bl4iIiIhMGgMvEREREZk0Bl4iIiPzySefQKPRAAA0Gg3mzZtn4IqoNi5cuIAtW7ZoH0dFReHPP/80XEFExMBLDW/FihWQSCTaycLCAq1bt8bkyZORmppq6PKIjM7KlSvx+eef48aNG/jiiy+wcuVKQ5dEtZCbm4sXX3wRx44dQ2xsLF5//XWcP3/e0GXVScuWLXWO39VNK1as0MvzffLJJzp/LBiTjz/+GI8++ijc3NwgkUgwe/bsatfds2cP+vbtC2dnZzg4OKBLly5YtWrV/SuWKpEbugBqOubMmQNfX18UFRXh8OHDWLZsGbZv344LFy7AysrK0OURGY05c+Zg7NixmD59OhQKBVavXm3okqgWwsPDtRMAtG7dGhMnTjRwVXWzaNEi5OXlaR9v374dv/zyC7788ks4Oztr53fr1k0vz/fJJ5/gySefxNChQ/XSnj69//77cHd3R8eOHbFr165q19u2bRuGDh2K8PBwzJ49GxKJBBs2bMDYsWOhUqnwxhtv3MeqqQIDL903kZGR6Ny5MwDg+eefh5OTExYuXIitW7di1KhRBq6OyHg89dRT6Nu3L65evYqAgAC4uLgYuiSqpS1btuDSpUsoLCxEaGgozM3NDV1SndwZPFNSUvDLL79g6NChaNmypUFqup+Kiopgbm4OqVSK+Ph4tGzZEiqV6q7vySVLlsDDwwN///03FAoFAODFF19EUFAQVqxYwcBrIOzSQAbz0EMPAQDi4+MBAJmZmXjrrbcQGhoKGxsb2NnZITIyEmfPnq20bVFREWbPno3WrVvDwsICHh4eGDZsGOLi4gAACQkJd/36rU+fPtq29u/fD4lEgvXr1+Pdd9+Fu7s7rK2t8eijjyIxMbHSc//7778YOHAg7O3tYWVlhd69e+Off/6p8jX26dOnyuev6quw1atXIywsDJaWlnB0dMTIkSOrfP67vbbbaTQaLFq0CCEhIbCwsICbmxtefPFF3Lp1S2e9li1b4pFHHqn0PJMnT67UZlW1L1iwoNLPFACKi4sxa9YstGrVCgqFAl5eXnj77bdRXFxc5c/qdn369EHbtm0rzf/8888hkUiQkJCgMz8rKwtTpkyBl5cXFAoFWrVqhfnz52v7wd6u4ozLndP48eN11rt58yYmTJgANzc3KBQKhISEYPny5TrrVOw7FZNCoUDr1q0xb948CCF01j1z5gwiIyNhZ2cHGxsb9OvXD8eOHdNZp6L7T0JCAlxdXdGtWzc4OTmhXbt2Nfra+M7uQ/fa72rzGvX5/qj4Hbi6uqK0tFRn2S+//KKtV6VS6SzbsWMHevbsCWtra9ja2mLw4MG4ePGizjrjx4+HjY1Npbo2bdoEiUSC/fv3a+fVdj/75ptvEBISAoVCAU9PT0yaNAlZWVk66/Tp00f7XmjTpg3CwsJw9uzZKt+jd1Pd7/D2+m9/zTX5fW/atAmdO3eGra2tznqff/55jeuqTk2OX7GxsXjiiSfg7u4OCwsLtGjRAiNHjkR2drb2Nefn52PlypXVvi8rpKamQi6X48MPP6y0LCYmBhKJBEuWLAFQ88+Win193bp1eP/999G8eXNYWV
khJycHAGoc8HNyctCsWTNt2AUAuVwOZ2dnWFpa1qgN0j+e4SWDqQinTk5OAIBr165hy5YtGD58OHx9fZGamorvvvsOvXv3xqVLl+Dp6QkAUKvVeOSRR7B3716MHDkSr7/+OnJzc7F7925cuHAB/v7+2ucYNWoUBg0apPO8M2bMqLKejz/+GBKJBNOnT0daWhoWLVqE/v37IyoqSnuQ+vvvvxEZGYmwsDDMmjULUqkUP/30Ex566CEcOnQIXbp0qdRuixYttBcd5eXl4eWXX67yuWfOnIkRI0bg+eefR3p6Or7++mv06tULZ86cgYODQ6VtXnjhBfTs2RMAsHnzZvz22286y1988UWsWLECzz77LF577TXEx8djyZIlOHPmDP755x+YmZlV+XOojaysrCovqNJoNHj00Udx+PBhvPDCCwgODsb58+fx5Zdf4sqVK3rto1dQUIDevXvj5s2bePHFF+Ht7Y0jR45gxowZSE5OxqJFi6rc7vb+dHeecUlNTcWDDz4IiUSCyZMnw8XFBTt27MBzzz2HnJwcTJkyRWf9d999F8HBwSgsLNQGQ1dXVzz33HMAgIsXL6Jnz56ws7PD22+/DTMzM3z33Xfo06cPDhw4gK5du1b7+latWlXr/p8V3YcqVLXf1fY1NsT7Izc3F3/88Qcef/xx7byffvoJFhYWKCoqqvRzGDduHCIiIjB//nwUFBRg2bJl6NGjB86cOdPgZxtnz56NDz/8EP3798fLL7+MmJgYLFu2DCdOnLjn+2n69Ol1es6HH34YY8eOBQCcOHECX331VbXrOjs748svv9Q+fuaZZ3SWHz16FCNGjED79u3x6aefwt7eXm9fr9fk+FVSUoKIiAgUFxfj1Vdfhbu7O27evIk//vgDWVlZsLe3x6pVq/D888+jS5cueOGFFwBA53h+Ozc3N/Tu3RsbNmzArFmzdJatX78eMpkMw4cPB1Dzz5YKc+fOhbm5Od566y0UFxfX+ux8nz59MH/+fMycORPjxo2DRCLB2rVrcfLkSWzYsKFWbZEeCaIG9tNPPwkAYs+ePSI9PV0kJiaKdevWCScnJ2FpaSlu3LghhBCiqKhIqNVqnW3j4+OFQqEQc+bM0c5bvny5ACAWLlxY6bk0Go12OwBiwYIFldYJCQkRvXv31j7et2+fACCaN28ucnJytPM3bNggAIjFixdr2w4ICBARERHa5xFCiIKCAuHr6ysefvjhSs/VrVs30bZtW+3j9PR0AUDMmjVLOy8hIUHIZDLx8ccf62x7/vx5IZfLK82PjY0VAMTKlSu182bNmiVufzsfOnRIABBr1qzR2Xbnzp2V5vv4+IjBgwdXqn3SpEnizkPEnbW//fbbwtXVVYSFhen8TFetWiWkUqk4dOiQzvbffvutACD++eefSs93u969e4uQkJBK8xcsWCAAiPj4eO28uXPnCmtra3HlyhWddd955x0hk8mEUqnUmf/ee+8JiUSiM8/Hx0eMGzdO+/i5554THh4eQqVS6aw3cuRIYW9vLwoKCoQQ/9t39u3bp12nqKhISKVS8corr2jnDR06VJibm4u4uDjtvKSkJGFrayt69eqlnVfxXql4fUVFRcLb21tERkYKAOKnn36q/MO6TcX2J06c0Jlf1X5X29eoz/dHxf46atQo8cgjj2jnX79+XUilUjFq1CgBQKSnpwshhMjNzRUODg5i4sSJOrWmpKQIe3t7nfnjxo0T1tbWlX42GzdurPS7qul+lpaWJszNzcWAAQN0jlFLliwRAMTy5ct12rz9vbB9+3YBQAwcOLDS+6k6JSUlAoCYPHnyXeuvMHr0aOHr66sz787f94wZMwQAkZycrJ13t+Nkde782dT0+HXmzBkBQGzcuPGu7VtbW+u8F+/mu+++EwDE+fPndea3adNGPPTQQ9rHNf1sqdjX/fz8tPt/Vap6P90uLy9PjBgxQkgkEgFAABBWVlZiy5YtNXpd1DDYpYHum/79+8PFxQVeXl
4YOXIkbGxs8Ntvv6F58+YAAIVCAam0fJdUq9XIyMiAjY0NAgMDcfr0aW07v/76K5ydnfHqq69Weo7afGV4p7Fjx8L
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_sample_balance(train_var18['Spec_score'], 'Train var18')\n",
"plot_sample_balance(val_var18['Spec_score'], 'Validation var18')\n",
"plot_sample_balance(test_var18['Spec_score'], 'Test var18')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "The distributions of the three splits for this dataset look similar, which suggests that the split preserved the shape of the target distribution. However, the training split covers a noticeably wider range of values."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "### 12. Augment the data using oversampling and undersampling. Examples of both methods must be provided for the samples of each dataset.\n"
]
},
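  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The idea behind both methods can be shown without `imblearn`. The cell below is a minimal plain-pandas sketch, assuming a DataFrame with a class-label column; `random_oversample` and `random_undersample` are hypothetical helpers, not library functions, and the toy data is not one of the lab datasets. Random oversampling duplicates rows of the smaller classes (sampling with replacement) up to the size of the largest class, while random undersampling keeps a random subset of every class (sampling without replacement) down to the size of the smallest class. Unlike SMOTE, no synthetic points are created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def random_oversample(df, target, random_state=42):\n",
    "    # Hypothetical helper: resample every smaller class WITH replacement up to the largest class size\n",
    "    max_count = df[target].value_counts().max()\n",
    "    parts = [\n",
    "        group.sample(n=max_count, replace=True, random_state=random_state) if len(group) < max_count else group\n",
    "        for _, group in df.groupby(target)\n",
    "    ]\n",
    "    return pd.concat(parts).reset_index(drop=True)\n",
    "\n",
    "\n",
    "def random_undersample(df, target, random_state=42):\n",
    "    # Hypothetical helper: sample every class WITHOUT replacement down to the smallest class size\n",
    "    min_count = df[target].value_counts().min()\n",
    "    parts = [\n",
    "        group.sample(n=min_count, replace=False, random_state=random_state)\n",
    "        for _, group in df.groupby(target)\n",
    "    ]\n",
    "    return pd.concat(parts).reset_index(drop=True)\n",
    "\n",
    "\n",
    "# Toy data (not one of the lab datasets): 8 rows of class 0 and 2 rows of class 1\n",
    "toy = pd.DataFrame({'x': range(10), 'stroke': [0] * 8 + [1] * 2})\n",
    "toy_over = random_oversample(toy, 'stroke')\n",
    "toy_under = random_undersample(toy, 'stroke')\n",
    "print(toy_over['stroke'].value_counts().sort_index().to_dict())   # {0: 8, 1: 8}\n",
    "print(toy_under['stroke'].value_counts().sort_index().to_dict())  # {0: 2, 1: 2}"
   ]
  },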
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### Stroke"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"После oversampling (var4): stroke\n",
"1 4861\n",
"0 4861\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"from imblearn.over_sampling import SMOTE\n",
"\n",
"X_var4 = var4.drop('stroke', axis=1)\n",
"y_var4 = var4['stroke']\n",
"\n",
    "# Encode categorical features as integer codes\n",
"for column in X_var4.select_dtypes(include=['object']).columns:\n",
" X_var4[column] = X_var4[column].astype('category').cat.codes\n",
"\n",
    "# Apply SMOTE (synthetic minority oversampling)\n",
"smote = SMOTE(random_state=42)\n",
"X_resampled_var4, y_resampled_var4 = smote.fit_resample(X_var4, y_var4)\n",
"\n",
    "# Show the class counts after resampling\n",
"print(f'После oversampling (var4): {pd.Series(y_resampled_var4).value_counts()}')"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"После undersampling (var4): stroke\n",
"0 249\n",
"1 249\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"from imblearn.under_sampling import RandomUnderSampler\n",
"\n",
    "# Random undersampling for var4\n",
"undersample = RandomUnderSampler(random_state=42)\n",
"X_under_var4, y_under_var4 = undersample.fit_resample(X_var4, y_var4)\n",
"\n",
"print(f'После undersampling (var4): {pd.Series(y_under_var4).value_counts()}')"
]
},
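  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cells above resample the full `var4` dataset rather than the training split created earlier. A commonly recommended order is to split first and resample only the training data, so that duplicated or synthetic rows never end up in the validation or test sets and the evaluation reflects the real class ratio. The cell below is a minimal sketch of that order on hypothetical toy data (the `toy_*` names are illustrative only), using plain random oversampling instead of SMOTE for brevity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data standing in for a binary-target dataset: 90 negatives, 10 positives\n",
    "toy_df = pd.DataFrame({'feature': range(100), 'stroke': [0] * 90 + [1] * 10})\n",
    "\n",
    "# 1. Split first, stratified by class: 80% of each class goes to the training split\n",
    "toy_train = toy_df.groupby('stroke').sample(frac=0.8, random_state=42)\n",
    "toy_test = toy_df.drop(toy_train.index)\n",
    "\n",
    "# 2. Oversample only the training split; the test split keeps the real class ratio\n",
    "toy_max_count = toy_train['stroke'].value_counts().max()\n",
    "toy_train_balanced = pd.concat([\n",
    "    g.sample(n=toy_max_count, replace=True, random_state=42) if len(g) < toy_max_count else g\n",
    "    for _, g in toy_train.groupby('stroke')\n",
    "])\n",
    "\n",
    "print(toy_train_balanced['stroke'].value_counts().sort_index().to_dict())  # {0: 72, 1: 72}\n",
    "print(toy_test['stroke'].value_counts().sort_index().to_dict())            # {0: 18, 1: 2}"
   ]
  },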
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### Houses"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
   ],
   "source": [
    "# The target column is named 'price' (lowercase), and it is continuous. SMOTE needs class\n",
    "# labels, so the price is first binned into categories. The bin edges below are illustrative.\n",
    "price_bins = [0, 300_000, 600_000, 1_000_000, float('inf')]\n",
    "price_labels = ['low', 'medium', 'high', 'premium']\n",
    "\n",
    "X_var6 = var6.drop('price', axis=1)\n",
    "y_var6 = pd.cut(var6['price'], bins=price_bins, labels=price_labels).astype(str)\n",
    "\n",
    "# Encode categorical features (e.g. the string 'date' column) as integer codes\n",
    "for column in X_var6.select_dtypes(include=['object']).columns:\n",
    "    X_var6[column] = X_var6[column].astype('category').cat.codes\n",
    "\n",
    "# Apply SMOTE (synthetic minority oversampling)\n",
    "smote = SMOTE(random_state=42)\n",
    "X_resampled_var6, y_resampled_var6 = smote.fit_resample(X_var6, y_var6)\n",
    "\n",
    "# Show the class counts after resampling\n",
    "print(f'After oversampling (var6): {pd.Series(y_resampled_var6).value_counts()}')"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "Expected n_neighbors <= n_samples_fit, but n_neighbors = 6, n_samples_fit = 1, n_samples = 1",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[1;32mIn[69], line 10\u001b[0m\n\u001b[0;32m 8\u001b[0m \u001b[38;5;66;03m# Теперь применяем SMOTE\u001b[39;00m\n\u001b[0;32m 9\u001b[0m smote \u001b[38;5;241m=\u001b[39m SMOTE(random_state\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m42\u001b[39m)\n\u001b[1;32m---> 10\u001b[0m X_resampled_var18, y_resampled_var18 \u001b[38;5;241m=\u001b[39m \u001b[43msmote\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit_resample\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX_var18\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my_var18\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 12\u001b[0m \u001b[38;5;66;03m# Получаем результаты\u001b[39;00m\n\u001b[0;32m 13\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mПосле oversampling (var18): \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mpd\u001b[38;5;241m.\u001b[39mSeries(y_resampled_var18)\u001b[38;5;241m.\u001b[39mvalue_counts()\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m'\u001b[39m)\n",
"File \u001b[1;32mc:\\Users\\HomePC\\Desktop\\MII_Lab1\\.venv\\Lib\\site-packages\\imblearn\\base.py:208\u001b[0m, in \u001b[0;36mBaseSampler.fit_resample\u001b[1;34m(self, X, y)\u001b[0m\n\u001b[0;32m 187\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"Resample the dataset.\u001b[39;00m\n\u001b[0;32m 188\u001b[0m \n\u001b[0;32m 189\u001b[0m \u001b[38;5;124;03mParameters\u001b[39;00m\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 205\u001b[0m \u001b[38;5;124;03m The corresponding label of `X_resampled`.\u001b[39;00m\n\u001b[0;32m 206\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[0;32m 207\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_validate_params()\n\u001b[1;32m--> 208\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit_resample\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m)\u001b[49m\n",
"File \u001b[1;32mc:\\Users\\HomePC\\Desktop\\MII_Lab1\\.venv\\Lib\\site-packages\\imblearn\\base.py:112\u001b[0m, in \u001b[0;36mSamplerMixin.fit_resample\u001b[1;34m(self, X, y)\u001b[0m\n\u001b[0;32m 106\u001b[0m X, y, binarize_y \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_check_X_y(X, y)\n\u001b[0;32m 108\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msampling_strategy_ \u001b[38;5;241m=\u001b[39m check_sampling_strategy(\n\u001b[0;32m 109\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msampling_strategy, y, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_sampling_type\n\u001b[0;32m 110\u001b[0m )\n\u001b[1;32m--> 112\u001b[0m output \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_fit_resample\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 114\u001b[0m y_ \u001b[38;5;241m=\u001b[39m (\n\u001b[0;32m 115\u001b[0m label_binarize(output[\u001b[38;5;241m1\u001b[39m], classes\u001b[38;5;241m=\u001b[39mnp\u001b[38;5;241m.\u001b[39munique(y)) \u001b[38;5;28;01mif\u001b[39;00m binarize_y \u001b[38;5;28;01melse\u001b[39;00m output[\u001b[38;5;241m1\u001b[39m]\n\u001b[0;32m 116\u001b[0m )\n\u001b[0;32m 118\u001b[0m X_, y_ \u001b[38;5;241m=\u001b[39m arrays_transformer\u001b[38;5;241m.\u001b[39mtransform(output[\u001b[38;5;241m0\u001b[39m], y_)\n",
"File \u001b[1;32mc:\\Users\\HomePC\\Desktop\\MII_Lab1\\.venv\\Lib\\site-packages\\imblearn\\over_sampling\\_smote\\base.py:389\u001b[0m, in \u001b[0;36mSMOTE._fit_resample\u001b[1;34m(self, X, y)\u001b[0m\n\u001b[0;32m 386\u001b[0m X_class \u001b[38;5;241m=\u001b[39m _safe_indexing(X, target_class_indices)\n\u001b[0;32m 388\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mnn_k_\u001b[38;5;241m.\u001b[39mfit(X_class)\n\u001b[1;32m--> 389\u001b[0m nns \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mnn_k_\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mkneighbors\u001b[49m\u001b[43m(\u001b[49m\u001b[43mX_class\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mreturn_distance\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mFalse\u001b[39;49;00m\u001b[43m)\u001b[49m[:, \u001b[38;5;241m1\u001b[39m:]\n\u001b[0;32m 390\u001b[0m X_new, y_new \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_make_samples(\n\u001b[0;32m 391\u001b[0m X_class, y\u001b[38;5;241m.\u001b[39mdtype, class_sample, X_class, nns, n_samples, \u001b[38;5;241m1.0\u001b[39m\n\u001b[0;32m 392\u001b[0m )\n\u001b[0;32m 393\u001b[0m X_resampled\u001b[38;5;241m.\u001b[39mappend(X_new)\n",
"File \u001b[1;32mc:\\Users\\HomePC\\Desktop\\MII_Lab1\\.venv\\Lib\\site-packages\\sklearn\\neighbors\\_base.py:834\u001b[0m, in \u001b[0;36mKNeighborsMixin.kneighbors\u001b[1;34m(self, X, n_neighbors, return_distance)\u001b[0m\n\u001b[0;32m 832\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m 833\u001b[0m inequality_str \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mn_neighbors <= n_samples_fit\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m--> 834\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[0;32m 835\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mExpected \u001b[39m\u001b[38;5;132;01m{\u001b[39;00minequality_str\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m, but \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m 836\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mn_neighbors = \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mn_neighbors\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m, n_samples_fit = \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mn_samples_fit\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m, \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m 837\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mn_samples = \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mX\u001b[38;5;241m.\u001b[39mshape[\u001b[38;5;241m0\u001b[39m]\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;66;03m# include n_samples for common tests\u001b[39;00m\n\u001b[0;32m 838\u001b[0m )\n\u001b[0;32m 840\u001b[0m n_jobs \u001b[38;5;241m=\u001b[39m effective_n_jobs(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mn_jobs)\n\u001b[0;32m 841\u001b[0m chunked_results \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n",
"\u001b[1;31mValueError\u001b[0m: Expected n_neighbors <= n_samples_fit, but n_neighbors = 6, n_samples_fit = 1, n_samples = 1"
]
}
],
"source": [
"X_var18 = var18.drop('Price', axis=1)\n",
"y_var18 = var18['Price']\n",
"\n",
"# Encode categorical features as integer codes\n",
"for column in X_var18.select_dtypes(include=['object']).columns:\n",
"    X_var18[column] = X_var18[column].astype('category').cat.codes\n",
"\n",
"# Apply SMOTE (raises ValueError: 'Price' is a regression target, so every distinct price is treated as a class, and a single-sample class cannot supply the 6 neighbors SMOTE queries)\n",
"smote = SMOTE(random_state=42)\n",
"X_resampled_var18, y_resampled_var18 = smote.fit_resample(X_var18, y_var18)\n",
"\n",
"# Show the class distribution after resampling\n",
"print(f'After oversampling (var18): {pd.Series(y_resampled_var18).value_counts()}')"
]
},
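{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ValueError above comes from SMOTE's neighbor search: every class it oversamples must contain at least `k_neighbors + 1` samples (6 with the default `k_neighbors=5`), while a price target with many distinct values produces many single-sample 'classes'. A minimal sketch of a pre-check, assuming a hypothetical helper `smote_blockers` and synthetic labels rather than the real datasets:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"\n",
"def smote_blockers(y, k_neighbors=5):\n",
"    # Count samples per class label.\n",
"    counts = pd.Series(y).value_counts()\n",
"    # With sampling_strategy='auto', SMOTE only generates samples for classes smaller than the largest one.\n",
"    to_oversample = counts[counts < counts.max()]\n",
"    # Each oversampled class needs k_neighbors + 1 samples for the neighbor search.\n",
"    return to_oversample[to_oversample < k_neighbors + 1]\n",
"\n",
"\n",
"# Synthetic labels (hypothetical), not the real datasets.\n",
"y_binary = [0] * 20 + [1] * 8              # stroke-like binary target\n",
"y_prices = [100, 200, 200, 300, 300, 300]  # price-like target with many distinct values\n",
"\n",
"print(len(smote_blockers(y_binary)))  # 0: SMOTE can run\n",
"print(len(smote_blockers(y_prices)))  # 2: classes 200 and 100 are too small\n",
"```\n",
"\n",
"For a regression target this check fails almost by construction, which is consistent with not augmenting the house-price and mobile-price datasets."
]
},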
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"After undersampling (var4): stroke\n",
"0 249\n",
"1 249\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"from imblearn.under_sampling import RandomUnderSampler\n",
"\n",
"# Randomly undersample the majority class of var4 (stroke)\n",
"undersample = RandomUnderSampler(random_state=42)\n",
"X_under_var4, y_under_var4 = undersample.fit_resample(X_var4, y_var4)\n",
"\n",
"print(f'After undersampling (var4): {pd.Series(y_under_var4).value_counts()}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Only one of the three datasets (stroke, variant 4) is intended for a classification task. We dealt with its class imbalance by applying undersampling and oversampling.\n",
"\n",
"The other two datasets contain no classes because they are intended for regression (predicting real estate prices or mobile device prices), so data augmentation is not required."
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}