{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Лабораторная работа №4**\n",
"\n",
"### **Определение бизнес-целей для решения задач регрессии и классификации**\n",
"\n",
"**Вариант задания:** Н а б о р данных о ценах на акции Walmart.\n",
"\n",
"**Бизнес-цели:**\n",
"\n",
"1. **Регрессия:** Предсказание цены закрытия акции (Close) на основе исторических данных.\n",
"\n",
"2. **Классификация:** Определение направления изменения цены (повышение или понижение) на следующий день, что можно выразить в бинарной метке (например, 1 — цена повысилась, 0 — снизилась). Метка будет рассчитываться как разница между Close сегодняшнего и завтрашнего дня.\n",
"\n",
"**Столбцы датасета и их пояснение:**\n",
"\n",
"*Date* - Дата, на которую относятся данные. Эта характеристика указывает конкретный день, в который происходила торговля акциями Walmart.\n",
"\n",
"*Open* - Цена открытия. Стоимость акций Walmart в начале торгового дня. Это важный показатель, который показывает, по какой цене начались торги в конкретный день, и часто используется для сравнения с ценой закрытия для определения дневного тренда.\n",
"\n",
"*High* - Максимальная цена за день. Наибольшая цена, достигнутая акциями Walmart в течение торгового дня. Эта характеристика указывает, какой была самая высокая стоимость акций за день.\n",
"\n",
"*Low* - Минимальная цена за день. Наименьшая цена, по которой торговались акции Walmart в течение дня.\n",
"\n",
"*Close* - Цена закрытия. Стоимость акций Walmart в конце торгового дня. Цена закрытия — один из основных показателей, используемых для анализа акций, так как она отображает итоговую стоимость акций за день и часто используется для расчета дневных изменений и трендов на длительных временных периодах.\n",
"\n",
"*Adj Close* - Скорректированная цена закрытия. Цена закрытия, скорректированная с учетом всех корпоративных действий.\n",
"\n",
"*Volume* - Объем торгов. Количество акций Walmart, проданных и купленных в течение дня. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Date Open High Low Close Adj Close Volume\n",
"0 1/3/2000 22.791668 23.000000 21.833332 22.270832 14.469358 25109700\n",
"1 1/4/2000 21.833332 21.937500 21.395832 21.437500 13.927947 20235300\n",
"2 1/5/2000 21.291668 21.458332 20.729168 21.000000 13.643703 21056100\n",
"3 1/6/2000 21.000000 21.520832 20.895832 21.229168 13.792585 19633500\n",
"4 1/7/2000 21.500000 22.979168 21.500000 22.833332 14.834813 23930700\n",
"Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1/3/2000</td>\n",
" <td>22.791668</td>\n",
" <td>23.000000</td>\n",
" <td>21.833332</td>\n",
" <td>22.270832</td>\n",
" <td>14.469358</td>\n",
" <td>25109700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1/4/2000</td>\n",
" <td>21.833332</td>\n",
" <td>21.937500</td>\n",
" <td>21.395832</td>\n",
" <td>21.437500</td>\n",
" <td>13.927947</td>\n",
" <td>20235300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1/5/2000</td>\n",
" <td>21.291668</td>\n",
" <td>21.458332</td>\n",
" <td>20.729168</td>\n",
" <td>21.000000</td>\n",
" <td>13.643703</td>\n",
" <td>21056100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1/6/2000</td>\n",
" <td>21.000000</td>\n",
" <td>21.520832</td>\n",
" <td>20.895832</td>\n",
" <td>21.229168</td>\n",
" <td>13.792585</td>\n",
" <td>19633500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1/7/2000</td>\n",
" <td>21.500000</td>\n",
" <td>22.979168</td>\n",
" <td>21.500000</td>\n",
" <td>22.833332</td>\n",
" <td>14.834813</td>\n",
" <td>23930700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1/10/2000</td>\n",
" <td>22.416668</td>\n",
" <td>22.500000</td>\n",
" <td>21.875000</td>\n",
" <td>22.416668</td>\n",
" <td>14.564112</td>\n",
" <td>20142900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1/11/2000</td>\n",
" <td>22.354168</td>\n",
" <td>22.583332</td>\n",
" <td>21.875000</td>\n",
" <td>22.083332</td>\n",
" <td>14.347544</td>\n",
" <td>14829900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>1/12/2000</td>\n",
" <td>22.062500</td>\n",
" <td>22.250000</td>\n",
" <td>21.687500</td>\n",
" <td>21.687500</td>\n",
" <td>14.090372</td>\n",
" <td>12255000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1/13/2000</td>\n",
" <td>22.000000</td>\n",
" <td>22.041668</td>\n",
" <td>21.666668</td>\n",
" <td>21.708332</td>\n",
" <td>14.103909</td>\n",
" <td>15063000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1/14/2000</td>\n",
" <td>21.333332</td>\n",
" <td>21.979168</td>\n",
" <td>21.333332</td>\n",
" <td>21.500000</td>\n",
" <td>13.968553</td>\n",
" <td>18936600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>1/18/2000</td>\n",
" <td>21.062500</td>\n",
" <td>22.145832</td>\n",
" <td>21.020832</td>\n",
" <td>21.854168</td>\n",
" <td>14.198661</td>\n",
" <td>19326600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>1/19/2000</td>\n",
" <td>21.750000</td>\n",
" <td>21.937500</td>\n",
" <td>21.333332</td>\n",
" <td>21.354168</td>\n",
" <td>13.873807</td>\n",
" <td>14459700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>1/20/2000</td>\n",
" <td>21.479168</td>\n",
" <td>21.500000</td>\n",
" <td>20.833332</td>\n",
" <td>21.125000</td>\n",
" <td>13.724912</td>\n",
" <td>17214300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>1/21/2000</td>\n",
" <td>21.312500</td>\n",
" <td>21.312500</td>\n",
" <td>20.687500</td>\n",
" <td>20.812500</td>\n",
" <td>13.521886</td>\n",
" <td>20857500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>1/24/2000</td>\n",
" <td>21.145832</td>\n",
" <td>21.145832</td>\n",
" <td>19.166668</td>\n",
" <td>19.791668</td>\n",
" <td>12.858650</td>\n",
" <td>23399700</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Date Open High Low Close Adj Close Volume\n",
"0 1/3/2000 22.791668 23.000000 21.833332 22.270832 14.469358 25109700\n",
"1 1/4/2000 21.833332 21.937500 21.395832 21.437500 13.927947 20235300\n",
"2 1/5/2000 21.291668 21.458332 20.729168 21.000000 13.643703 21056100\n",
"3 1/6/2000 21.000000 21.520832 20.895832 21.229168 13.792585 19633500\n",
"4 1/7/2000 21.500000 22.979168 21.500000 22.833332 14.834813 23930700\n",
"5 1/10/2000 22.416668 22.500000 21.875000 22.416668 14.564112 20142900\n",
"6 1/11/2000 22.354168 22.583332 21.875000 22.083332 14.347544 14829900\n",
"7 1/12/2000 22.062500 22.250000 21.687500 21.687500 14.090372 12255000\n",
"8 1/13/2000 22.000000 22.041668 21.666668 21.708332 14.103909 15063000\n",
"9 1/14/2000 21.333332 21.979168 21.333332 21.500000 13.968553 18936600\n",
"10 1/18/2000 21.062500 22.145832 21.020832 21.854168 14.198661 19326600\n",
"11 1/19/2000 21.750000 21.937500 21.333332 21.354168 13.873807 14459700\n",
"12 1/20/2000 21.479168 21.500000 20.833332 21.125000 13.724912 17214300\n",
"13 1/21/2000 21.312500 21.312500 20.687500 20.812500 13.521886 20857500\n",
"14 1/24/2000 21.145832 21.145832 19.166668 19.791668 12.858650 23399700"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Date 0\n",
"Open 0\n",
"High 0\n",
"Low 0\n",
"Close 0\n",
"Adj Close 0\n",
"Volume 0\n",
"dtype: int64\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\"..//static//csv//WMT.csv\").head(15000)\n",
"\n",
"print(df.head())\n",
"print(df.columns)\n",
"display(df.head(15))\n",
"print(df.isnull().sum())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Выберем три модели для задач регрессии и классификации**\n",
"\n",
"Сделаем выбор подходящих моделей для решения задач классификации и регрессии на основе анализа данных и целей. \n",
"\n",
"Для регрессии выберем:\n",
"\n",
"- LinearRegression\n",
"- DecisionTreeRegressor\n",
"- GradientBoostingRegressor\n",
"\n",
"Для классификации выберем:\n",
"\n",
"- LogisticRegression\n",
"- RandomForestClassifier\n",
"- GradientBoostingClassifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Разбиение на выборки и создание ориентира для задач регрессии**\n",
"\n",
"Мы будем использовать подход к задаче регрессии, где целевой переменной будет выступать цена товара, а другие характеристики, кроме ссылок, выбраны в качестве признаков."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Размер обучающей выборки: (4894, 6)\n",
"Размер тестовой выборки: (1224, 6)\n",
"Baseline MAE: 9.224148034130094\n",
"Baseline MSE: 129.81371036926848\n",
"Baseline R²: -0.002482369649123406\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n",
"\n",
"# Определяем признаки и целевой признак для задачи регрессии\n",
"features = ['Date', 'Open', 'High', 'Low', 'Adj Close', 'Volume'] \n",
"target = 'Close' # Целевая переменная\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(df[features], df[target], test_size=0.2, random_state=42)\n",
"\n",
"print(\"Размер обучающей выборки:\", X_train.shape)\n",
"print(\"Размер тестовой выборки:\", X_test.shape)\n",
"\n",
"baseline_predictions = [y_train.mean()] * len(y_test)\n",
"\n",
"print('Baseline MAE:', mean_absolute_error(y_test, baseline_predictions))\n",
"print('Baseline MSE:', mean_squared_error(y_test, baseline_predictions))\n",
"print('Baseline R²:', r2_score(y_test, baseline_predictions))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Построение конвейера и обучение моделей для задач регрессии**\n",
"\n",
"Переделаем характристики под числовые данные и построим конвейер для обучения моделей, а также оценим их качество."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: Linear Regression trained.\n",
"Model: Decision Tree trained.\n",
"Model: Gradient Boosting trained.\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.tree import DecisionTreeRegressor\n",
"from sklearn.ensemble import GradientBoostingRegressor\n",
"\n",
"df['Date'] = pd.to_datetime(df['Date'], errors='coerce')\n",
"\n",
"# Извлечение признаков из даты\n",
"df['Year'] = df['Date'].dt.year\n",
"df['Month'] = df['Date'].dt.month\n",
"df['Day'] = df['Date'].dt.day\n",
"\n",
"categorical_features = [] \n",
"numeric_features = ['Year', 'Month', 'Day', 'Open', 'High', 'Low', 'Adj Close', 'Volume']\n",
"\n",
"target = 'Close'\n",
"features = numeric_features + categorical_features\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(df[features], df[target], test_size=0.2, random_state=42)\n",
"\n",
"preprocessor = ColumnTransformer(\n",
" transformers=[\n",
" ('num', StandardScaler(), numeric_features),\n",
" ('cat', OneHotEncoder(), categorical_features)], \n",
" remainder='passthrough')\n",
"\n",
"pipeline_linear_regression = Pipeline(steps=[\n",
" ('preprocessor', preprocessor),\n",
" ('regressor', LinearRegression())\n",
"])\n",
"\n",
"pipeline_decision_tree = Pipeline(steps=[\n",
" ('preprocessor', preprocessor),\n",
" ('regressor', DecisionTreeRegressor(random_state=42))\n",
"])\n",
"\n",
"pipeline_gradient_boosting = Pipeline(steps=[\n",
" ('preprocessor', preprocessor),\n",
" ('regressor', GradientBoostingRegressor(random_state=42))\n",
"])\n",
"\n",
"pipelines = [\n",
" ('Linear Regression', pipeline_linear_regression),\n",
" ('Decision Tree', pipeline_decision_tree),\n",
" ('Gradient Boosting', pipeline_gradient_boosting)\n",
"]\n",
"\n",
"for name, pipeline in pipelines:\n",
" pipeline.fit(X_train, y_train)\n",
" print(f\"Model: {name} trained.\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Оценка качества моделей для регрессии**\n",
"\n",
"Оценим качество моделей для решения задач регресси и обоснуем выбор метрик."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: Linear Regression\n",
"MAE: 0.09839901002366848\n",
"MSE: 0.021197782995962776\n",
"R²: 0.9998363007754062\n",
"\n",
"Model: Decision Tree\n",
"MAE: 0.12540266748366016\n",
"MSE: 0.03189181356212172\n",
"R²: 0.9997537164545931\n",
"\n",
"Model: Gradient Boosting\n",
"MAE: 0.1251011338066786\n",
"MSE: 0.031244571650786104\n",
"R²: 0.9997587147602665\n",
"\n"
]
}
],
"source": [
"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n",
"\n",
"for name, pipeline in pipelines:\n",
" y_pred = pipeline.predict(X_test)\n",
" print(f\"Model: {name}\")\n",
" print('MAE:', mean_absolute_error(y_test, y_pred))\n",
" print('MSE:', mean_squared_error(y_test, y_pred))\n",
" print('R²:', r2_score(y_test, y_pred))\n",
" print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**В качестве метрик для оценки качества регрессионных моделей выбраны:**\n",
"\n",
"- **MAE (Mean Absolute Error)** — средняя абсолютная ошибка. Она измеряет среднюю величину отклонений предсказанных значений от фактических, что позволяет понять, насколько в среднем модель ошибается. MAE удобна для интерпретации, так как измеряется в тех же единицах, что и целевая переменная.\n",
"\n",
"- **MSE (Mean Squared Error)** — среднеквадратичная ошибка, которая учитывает квадраты ошибок, что увеличивает вес больших ошибок по сравнению с MAE. Это полезно, когда нам нужно сильнее штрафовать крупные отклонения.\n",
"\n",
"- **R² (коэффициент детерминации)** — доля объясненной дисперсии, которая показывает, насколько хорошо модель объясняет изменчивость целевой переменной. Значение R² близкое к 1 указывает на высокую точность модели, а отрицательные значения — на низкое качество, когда модель хуже, чем простое усреднение.\n",
"\n",
"\n",
"**Анализ результатов:**\n",
"1. **Baseline MAE, MSE, R²:**\n",
"\n",
"- Baseline MAE: 9.22\n",
"\n",
"- Baseline MSE: 129.81\n",
"\n",
"- Baseline R²: -0.002\n",
"\n",
"2. **Linear Regression:**\n",
"\n",
"- MAE: 0.098\n",
"\n",
"- MSE: 0.021\n",
"\n",
"- R²: 0.999\n",
"\n",
"*Вывод*: Линейная регрессия показала самую низкую ошибку (MAE и MSE) и наивысший R², что указывает на то, что она лучше всего подходит для данного набора данных\n",
"\n",
"3. **Decision Tree:**\n",
"\n",
"- MAE: 0.125\n",
"\n",
"- MSE: 0.031\n",
"\n",
"- R²: 0.999\n",
"\n",
"*Вывод*: Дерево решений показало хорошие результаты, хотя и немного хуже, чем линейная регрессия. R² также очень высокий, что указывает на хорошее объяснение изменчивости данных.\n",
"\n",
"4. **Gradient Boosting:**\n",
"\n",
"- MAE: 0.125\n",
"\n",
"- MSE: 0.031\n",
"\n",
"- R²: 0.999\n",
"\n",
"*Вывод*: Градиентный бустинг показал результаты, близкие к дереву решений, но немного лучше по MAE и MSE. R² также очень высокий."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Разбиение на выборки и создание ориентира для задач классификации**\n",
"\n",
"Мы будем использовать подход к задаче регрессии, где целевой переменной будет выступать цена закрытия акции, а другие характеристики выбраны в качестве признаков."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Размер обучающей выборки: (4894, 4)\n",
"Размер тестовой выборки: (1224, 4)\n"
]
}
],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"df = pd.read_csv(\"..//static//csv//WMT.csv\")\n",
"# Создание целевой переменной для классификации направления изменения цены\n",
"# Если цена закрытия следующего дня выше текущего дня — 1 (повышение), иначе — 0 (снижение)\n",
"df['Price_Up'] = (df['Close'].shift(-1) > df['Close']).astype(int)\n",
"\n",
"features = ['Open', 'High', 'Low', 'Volume'] \n",
"target = 'Price_Up'\n",
"\n",
"# Удаление последней строки, так как для неё нет значения следующего дня\n",
"df = df.dropna()\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(df[features], df[target], test_size=0.2, random_state=42)\n",
"\n",
"print(\"Размер обучающей выборки:\", X_train.shape)\n",
"print(\"Размер тестовой выборки:\", X_test.shape)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Построение конвейера и обучение моделей для задач классификации**\n",
"\n",
"Построим конвейер где проведем обучение моделей, а так же создадим отдельную переменную 'Price_Up' для точного подсчета направления изменения цены (повышение или понижение) на следующий день."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 0.4880\n",
"Precision: 0.4821\n",
"Recall: 0.3891\n",
"F1-Score: 0.4306\n",
"ROC-AUC: 0.4836\n",
"Accuracy: 0.4936\n",
"Precision: 0.4906\n",
"Recall: 0.4630\n",
"F1-Score: 0.4764\n",
"ROC-AUC: 0.5052\n",
"Accuracy: 0.4952\n",
"Precision: 0.4923\n",
"Recall: 0.4630\n",
"F1-Score: 0.4772\n",
"ROC-AUC: 0.4972\n",
"\n",
"Результаты моделей:\n",
"\n",
"Logistic Regression:\n",
"Accuracy: 0.4880\n",
"Precision: 0.4821\n",
"Recall: 0.3891\n",
"F1: 0.4306\n",
"Roc_auc: 0.4836\n",
"\n",
"Random Forest:\n",
"Accuracy: 0.4936\n",
"Precision: 0.4906\n",
"Recall: 0.4630\n",
"F1: 0.4764\n",
"Roc_auc: 0.5052\n",
"\n",
"XGBoost:\n",
"Accuracy: 0.4952\n",
"Precision: 0.4923\n",
"Recall: 0.4630\n",
"F1: 0.4772\n",
"Roc_auc: 0.4972\n"
]
}
],
"source": [
"import pandas as pd\n",
"from imblearn.over_sampling import SMOTE\n",
"from sklearn.model_selection import train_test_split, RandomizedSearchCV\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score\n",
"from scipy.stats import uniform, randint\n",
"import xgboost as xgb\n",
"\n",
"features = ['Open', 'High', 'Low', 'Volume']\n",
"target = 'Price_Up'\n",
"\n",
"X = df[features]\n",
"y = df[target]\n",
"\n",
"smote = SMOTE(random_state=42)\n",
"X_resampled, y_resampled = smote.fit_resample(X, y)\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2, random_state=42)\n",
"\n",
"def evaluate_model(model, X_test, y_test):\n",
" y_pred = model.predict(X_test)\n",
" y_pred_proba = model.predict_proba(X_test)[:, 1]\n",
" \n",
" accuracy = accuracy_score(y_test, y_pred)\n",
" precision = precision_score(y_test, y_pred)\n",
" recall = recall_score(y_test, y_pred)\n",
" f1 = f1_score(y_test, y_pred)\n",
" roc_auc = roc_auc_score(y_test, y_pred_proba)\n",
" \n",
" print(f\"Accuracy: {accuracy:.4f}\")\n",
" print(f\"Precision: {precision:.4f}\")\n",
" print(f\"Recall: {recall:.4f}\")\n",
" print(f\"F1-Score: {f1:.4f}\")\n",
" print(f\"ROC-AUC: {roc_auc:.4f}\")\n",
" \n",
" return {'accuracy': accuracy, 'precision': precision, 'recall': recall, 'f1': f1, 'roc_auc': roc_auc}\n",
"\n",
"\n",
"# Логистическая регрессия\n",
"logreg_pipeline = Pipeline([\n",
" ('scaler', StandardScaler()),\n",
" ('classifier', LogisticRegression(max_iter=1000, random_state=42))\n",
"])\n",
"logreg_param_dist = {\n",
" 'classifier__C': uniform(loc=0, scale=4),\n",
" 'classifier__penalty': ['l1', 'l2'],\n",
" 'classifier__solver': ['liblinear', 'saga']\n",
"}\n",
"logreg_random_search = RandomizedSearchCV(logreg_pipeline, param_distributions=logreg_param_dist, n_iter=50, cv=5, random_state=42, n_jobs=-1)\n",
"logreg_random_search.fit(X_train, y_train)\n",
"logreg_best_model = logreg_random_search.best_estimator_\n",
"logreg_results = evaluate_model(logreg_best_model, X_test, y_test)\n",
"\n",
"# Случайный лес\n",
"rf_pipeline = Pipeline([\n",
" ('scaler', StandardScaler()),\n",
" ('classifier', RandomForestClassifier(random_state=42))\n",
"])\n",
"rf_param_dist = {\n",
" 'classifier__n_estimators': randint(100, 1000),\n",
" 'classifier__max_depth': [None] + list(randint(10, 100).rvs(10)),\n",
" 'classifier__min_samples_split': randint(2, 20),\n",
" 'classifier__min_samples_leaf': randint(1, 20),\n",
" 'classifier__bootstrap': [True, False]\n",
"}\n",
"rf_random_search = RandomizedSearchCV(rf_pipeline, param_distributions=rf_param_dist, n_iter=50, cv=5, random_state=42, n_jobs=-1)\n",
"rf_random_search.fit(X_train, y_train)\n",
"rf_best_model = rf_random_search.best_estimator_\n",
"rf_results = evaluate_model(rf_best_model, X_test, y_test)\n",
"\n",
"# XGBoost\n",
"xgb_pipeline = Pipeline([\n",
" ('scaler', StandardScaler()),\n",
" ('classifier', xgb.XGBClassifier(random_state=42))\n",
"])\n",
"xgb_param_dist = {\n",
" 'classifier__n_estimators': randint(100, 1000),\n",
" 'classifier__learning_rate': uniform(0.01, 0.5),\n",
" 'classifier__max_depth': randint(3, 10),\n",
" 'classifier__min_child_weight': randint(1, 10),\n",
" 'classifier__subsample': uniform(0.5, 0.5),\n",
" 'classifier__colsample_bytree': uniform(0.5, 0.5)\n",
"}\n",
"xgb_random_search = RandomizedSearchCV(xgb_pipeline, param_distributions=xgb_param_dist, n_iter=50, cv=5, random_state=42, n_jobs=-1)\n",
"xgb_random_search.fit(X_train, y_train)\n",
"xgb_best_model = xgb_random_search.best_estimator_\n",
"xgb_results = evaluate_model(xgb_best_model, X_test, y_test)\n",
"\n",
"print(\"\\nР е зу льта ты моделей:\")\n",
"print(\"\\nLogistic Regression:\")\n",
"for metric, value in logreg_results.items():\n",
" print(f\"{metric.capitalize()}: {value:.4f}\")\n",
"\n",
"print(\"\\nRandom Forest:\")\n",
"for metric, value in rf_results.items():\n",
" print(f\"{metric.capitalize()}: {value:.4f}\")\n",
"\n",
"print(\"\\nXGBoost:\")\n",
"for metric, value in xgb_results.items():\n",
" print(f\"{metric.capitalize()}: {value:.4f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Оценка качества моделей для классификации**\n",
"\n",
"Оценим качество моделей для решения задач классификации и обоснуем выбор метрик. "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Logistic Regression Metrics:\n",
"Accuracy: 0.4880\n",
"Precision: 0.4821\n",
"Recall: 0.3891\n",
"F1-Score: 0.4306\n",
"ROC-AUC: 0.4836\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiQAAAHdCAYAAAAthmI8AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABzSklEQVR4nO3ddVhUaRsG8HtokDQAwQZBJRQFTOxeXXONNXFXXbs71sBc7G4MbF1W165V10DEQEVsUQwQkBCBAWa+P2DO5yyIDAwc1Pt3XXMp57znnecchpln3joSuVwuBxEREZGINMQOgIiIiIgJCREREYmOCQkRERGJjgkJERERiY4JCREREYmOCQkRERGJjgkJERERiY4JCREREYmOCQl907ju3/eBv2eir99Xn5BIpVIcPXoUAwcORJMmTeDk5ARXV1d0794dW7duRXJysmixXb16FZ07d0bVqlVRo0YNLFy4MN+f097eHvb29khNTc3358oJRTz29vY4efLkF8u3atVKKB8aGpqn5w4KCkLXrl1VuhYrVqyAvb09lixZkqfnzkpSUhJmz56NunXrwtHREfXr10d4eLjan+dzGjduDHt7e/j7+xfYc35Jr169YG9vj8uXL+fq+LS0NPj6+mLOnDlK2w8ePAh7e3uMHTtWHWEqUVzHrB6VK1dGjRo10K5dOyxduhQfPnxQ+/N/Swrb+xWJS0vsAPLi8ePHGDlyJB49egR9fX3Y29vDwcEBERERuHv3Lm7cuIE9e/bAx8cH5ubmBRpbfHw8Bg8ejISEBDg6OqJUqVJwdHQs0BgKmxMnTqB58+af3X///n08ffpUbc/XpUuXQvXNed26ddixYwcMDQ3RsGFDSCSSAn9dfmsOHz6MWbNmoW3btgX+3HXq1EGxYsWUtkmlUrx+/Rr37t1DSEgI/vnnH+zcuRMGBgYFHh/R1+arTUhCQ0PRpUsXJCQkoFevXhgyZAjMzMyE/W/fvsWkSZNw+fJl9OnTBwcOHCjQN4XHjx8jISEBpUqVwv79+yGRSArkeY8ePQoA0NIqXL9aY2NjnDt3DsnJydDV1c2yjCJ2bW1tpKSk5Pk5c5OM9OjRA61bt1Z6LalLUFAQAGDKlCno2LGj2uv/Gi1YsACJiYmwsrLK1fEymSzL7c2aNUPVqlVhZGSUl/Cy9dtvv6FmzZpZ7nvw4AH69euH+/fvY8eOHRgwYEC+xfE1K6zvVySOr7LLRi6XY8yYMUhISMDAgQMxderUTB8glpaWWLVqFcqVK4enT59i7969BRqjVCoFAJibmxdYMgIANjY2sLGxKbDny6mmTZsiISEBFy9e/GyZo0ePwt7eXtRWg6JFi8LGxgZFixZVe92K14SlpaXa6/5aWVlZwcbGBvr6+mqt18jICDY2NqK9luzt7TF48GAAwNmzZ0WJ4WtQWN+vSBxfZUISGBiIO3fuoESJEsIffVYMDAwwaNAg1KhRI8uk4PDhw+jRoweqV68OZ2dntG3bFmvWrEFiYqJSubCwMOENJiIiApMmTUK9evXg5OSEH374AZs3b0ZaWppQ3t7eHr179wYA3LhxA/b29mjcuDEAYOLEibC3t8e+ffsyxePv7w97e3t0795daXt4eDimT5+OVq1awdnZGe7u7ujduzcOHTqUqY7P9cm+efMGM2bMQOPGjeHo6IhatWphyJAhuHXrVqY6FDHev38ff/31Fzp16oRq1arB3d0dw4YNw6NHjz5zxT+vZcuWAIDjx49nuf/27dsICwtDmzZtPltHeHg4FixYgLZt28LFxQWOjo5o2LAhJkyYoNTVoxg/oODg4KD0s729Pdq1a4dr166hZcuWcHJyQvPmzfH8+fNMY0ju3bsHBwcHVKpUCdevX1eKJzIyErVq1YK9vT3OnDnz2bgVdV67dg0A4OnpCXt7exw8eFAok5vfz7Vr1zBs2DA4OzujVq1a8PHx+WwMuaUYo6F4DVSrVg2dOnXCjh07Ptvvf+nSJfTt2xc1a9ZE9erV0b9/fwQHB2PKlCmZxrBkNYZEJpNh+/bt6NKlC9zd3VG1alX88MMP8Pb2xvv375WOnTRpEoD0v2V7e3tMnDgRQPZjSPz9/TF48GDUq1cPLi4uwt/9x48f1XLNFKytrQEAMTExmfbFxcVhyZIlwuuvZs2aGDhwYKbXmEJycjLWr1+P1q1bo2rVqmjQoAEWLFiAhIQEVKlSRXh/AZTfr44dO4ZGjRrByckJbdu2RXx8vFDu6NGj6NWrF2rUqIGqVauiXbt28PHxybJ18smTJxgzZgyaNWsmvD4HDBiA8+fPZyr7LbxfkTi+ynYyRTNf06ZNoaenl23Z9u3bo3379krb5HI5xo8fj0OHDkFHRwdubm4wMDBAQEAAli5diuPHj8PHxydTq8u7d+/QuXNnJCYmolq1akhOTkZAQAAWLFiAV69eYdq0aQCAtm3bIioqCpcvX0bRokVRt27dXH/jjoqKQufOnREREQE7Ozs0bNgQsbGxCAgIgL+/P0JDQzFs2LBs6wgKCsIvv/yCuLg4lC1bFo0bN0Z4eDhOnz6Ns2fPYsaMGejatWum41atWoVTp06hUqVK8PDwQFBQEE6ePInLly/Dz88PpUuXzvF51K5dG2ZmZjh37hykUil0dHSU9it+p61bt8bu3bszHf/06VP06NED0dHRsLW1Rb169fDx40cEBQXBz88PZ86cweHDh1GyZEmUKVMGbdu2xeHDhwEAbdq0yZSQRkVFYdCgQbCyskK9evUQFhaGsmXLZnpeBwcHDBw4EKtWrcL06dPh5+cnxD516lS8f/8eXbt2RZMmTT577vb29mjbti0uX76MqKgo1K5dG8WLF0eZMmUA5P73M23aNERHR8PDwwOPHz9WSrrUITk5GQMGDMDVq1dRpEgR1KxZExKJBP7+/pg9ezbOnDmDdevWKf0ufX19MXv2bGhoaMDV1RVGRka4du0aunfvjnLlyuXoeadNm4b9+/fD1NQULi4u0NTUxO3bt7FhwwacOXMGfn5+0NXVRZ06dZCSkoKbN2+iVKlScHFxgYuLS7Z1b9y4Ed7e3pBIJKhevTrMzMxw69YtLF26FBcvXoSPj0+m12ZuKT6s7ezslLa/ffsWvXv3RmhoKCwtLeHh4YG4uDhcuHABFy5cwKxZs/DTTz8J5ZOSktC/f39cu3YNpqam8PDwwPv377FlyxYEBAR8tmvywYMHOH/+PBwcHGBra4vU1FShC2v69OnYs2cP9PT04OzsDCMjIwQGBmLevHm4cOEC1q5dK1yHJ0+e4KeffkJCQgKcnJxQuXJlRERE4Pz58zh//jzmzp2LTp06Afh23q9IJPKvUJ8+feR2dnbygwcP5ur4bdu2ye3s7OTNmjWTv3jxQtgeHx8vHzBggNzOzk4+dOhQYfvLly/ldnZ2cjs7O3nPnj3lUVFRwr4zZ87I7ezs5FWqVJHHxsYK269evSq3s7OTd+vWTem5J0yYILezs5Pv3bs3U1xZHbNy5Uq5nZ2dfNGiRUplb9++LXdwcJA7OzvLExMThe2KO
FNSUuRyuVyelJQk9/DwkNvZ2clXrVoll8lkQtl//vlH7uTkJK9SpYr83r17mWKsXLmy/MiRI8L2pKQkebdu3eR2dnby+fPnZ3OF/+/TeKZNmya3s7OTnz59WqmMTCaTe3h4yLt06SKXy+XyRo0aye3s7OTPnz8XygwcOFBuZ2cn37Bhg9KxcXFx8s6dO8vt7Ozkq1ev/uxzZ7V94MCBwvVIS0uTy+Vy+fLly+V2dnbyxYsXC+WlUqm8Xbt2cjs7O/nKlSvlcrlcvnfvXrmdnZ28efPm8oSEhBxdi549e8rt7Ozkly5dErbl5fdTrVo14fUrk8mUjs2K4rpevXo1R/HOmzdPbmdnJ//pp5+UXvORkZHyTp06ye3s7OTz5s0Ttj958kTu4OAgd3FxkV+/fl3YHh0dLe/SpYtw3T99/v9ek1evXgnXNT4+XiiXmJgo1HHgwAFh+4EDB+R2dnbyMWPGKMWe1fagoCB5pUqV5NWrV1eK7+PHj0Icmzdv/uJ1ye46JiYmyh89eiRfsGCB3M7
"text/plain": [
"<Figure size 600x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAsEAAAIqCAYAAADFMpc1AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAADpj0lEQVR4nOzdd3hUVfrA8e+dkt47JSDFBBGQFhAF6YgIAsG1rSC6tl27LvZeUNeCBQVlERUragggKAhIUaSJCqiI8AOWEhIS0tu0+/sjzpBJJiGTmUkmue/neXyEe2fOnLnv3Ms7Z859j6KqqooQQgghhBAaomvuDgghhBBCCNHUJAkWQgghhBCaI0mwEEIIIYTQHEmChRBCCCGE5kgSLIQQQgghNEeSYCGEEEIIoTmSBAshhBBCCM2RJFgIIYQQQmiOJMFCCCF8TtZlEkL4G0Nzd0CIlmDLli1Mmzatzv1Go5HIyEhSUlK48sorGTNmTL3tbdy4kSVLlrBjxw7y8vIAaN++PYMGDeKqq66ic+fOp+3TgQMH+OKLL/juu+/IysqitLSUxMRE0tLSuPrqq+nRo4d7b/Ivx48fJyMjg2+//ZajR49SVFRETEwMffr04YorrmDQoEGNarclKCgo4JlnnmHDhg2UlZWRmJjI119/jcHQNJfK1NRUANasWUP79u2b5DVPZ8SIERw9epRVq1bRsWNHt59fWVnJvHnzMBgM3HzzzY7tr7/+OrNnz+bmm2/mrrvu8maXHcfRFb1eT2hoKB07duSiiy5i6tSpBAQEePX1W4sjR44wcuRIEhMT2bBhQ3N3RwivkyRYCDeEhIQwcuTIWtuLiorYt28fmzZtYtOmTdx1111O/+DbFRYWcs8997Bx40YAUlJSOPvssykrK2Pv3r0sXLiQjz76iLvuuosbbrjBZR9sNhtvvvkmb775JlarleTkZHr16oVOp+PPP/9k8eLFLFmyhHvvvZdrr73Wrff3ySefMHPmTCorK0lMTCQ1NZXg4GD+7//+j6+//pqvv/6aa665hgcffNCtdluKmTNnsnTpUmJjYxk+fDhRUVFNlgC3VvPmzeP11193eT742qhRowgODnbaVllZyf/+9z927drFrl27+OGHH3jrrbfQ6/VN3j8hRDNThRCntXnzZjUlJUUdPnx4nY+xWq3q/Pnz1ZSUFLV79+7qsWPHnPaXlpaqEyZMUFNSUtQrr7xS/f3332s9f8WKFerAgQPVlJQU9amnnnL5Oo899pijLxs3bnTaZ7PZ1MzMTPXss89WU1JS1EWLFjX4Pb799ttqSkqKOmDAAHXFihWq1Wp12r9+/Xq1f//+akpKijpr1qwGt9uSjBkzRk1JSVE3b97cLK+fkpKipqSkqIcPH26W13fl0KFD6r59+1STydSo57/22mtqSkqK+vLLLzttz8vLU/ft26fm5eV5o5tOGnIcN2/erJ5zzjlqSkqK+uWXX3q9D62ByWRS9+3bpx46dKi5uyKET8icYCG8RKfTcd1119GjRw8sFotjtNfu+eef548//mDw4MG8//77dOvWrdbzL7roIhYtWkR4eDgLFy5k/fr1To9Zt24dH3/8MbGxsXz44YcMHjzYab+iKEycOJHHHnsMgFdeeYXKysrT9v3333/n1VdfJTAwkPfee4+LLroInc758nDBBRfw6quvAjB//nxycnIadmBaELPZDEBSUlIz98R/dOjQgS5dumA0Gr3abkxMDF26dCEmJsar7TbUwIEDufLKKwFYu3Zts/TB3xmNRrp06UKHDh2auytC+IQkwUJ4Wbt27YCq+aV22dnZfPHFFxgMBp555pl6f2Lv0KGDY47kG2+84bRvwYIFANx00020adOmzjbS09Pp168f5557LsePHz9tnxcuXIjZbOaKK66olZxXd9555zF69GiGDRvm1G5qaiqpqalYLJZaz7n//vtJTU3ls88+c2x7/fXXSU1NZcWKFTzyyCP06dOHtLQ0Hn74YXr27Mk555xDSUmJyz5ccsklpKam8scffzi22eedXnLJJZxzzjn069ePadOmNTi5sffx6NGjAIwZM4bU1FS2bNnieMy+ffu49957GTJkCD169GDw4MHMmDGDffv21Wpv6tSppKamsnfvXqZNm0bPnj0ZPHgwK1asaFB/3FFRUcGcOXOYMGECvXr1om/fvlx11VUsXbq0zuesWLGCK664gn79+jFgwADuvPNODh8+zPTp00lNTeXIkSOOx44YMYLU1FQOHTrk2FZZWckbb7zB5MmT6du3L3369GHy5MnMnTuX8vJyp+fOnj0bgLlz55Kamsrrr78OnPoMzJo1q1b/vvnmG6699lrOPfdc+vbtS3p6Oh9++KHjS4q32OddVz9X7XJycnjqqacYMWIEPXr04LzzzuOuu+5i7969LtsqKiripZdeYvTo0fTq1YvRo0czZ84cDh8+TGpqKlOnTnU8dsuWLaSmpvLMM8+wcOFCzjvvPM455xyuuuoqbDYbAFarlU8//ZS//e1v9OnThz59+nD55ZeTkZHh8ibDn3/+mVtuucXR38GDB3P77bfz888/13rs/v37ueeeexg9ejQ9evTg3HPP5cYbb6z1pfvIkSOkpqZywQUX1GqjMedDQUEB77//PuPHj6dXr16cd955PPDAAxw7dszlMRXC12SymxBeVFpayo8//gjAmWee6di+bt06zGYzQ4YMadAo48SJE3nuuef45ZdfOHToEB07diQvL4+tW7cCMH78+Hqfr9fr+eijjxrUZ6vVyqpVqxrULuBIarzh1VdfJSsri/PPP59jx47Ru3dvioqKWLlyJWvWrGHixIlOj9+3bx9//PEH3bp1c9z8VFJSwrXXXsvOnTuJiYnh3HPPxWQysW3bNrZs2cKtt97KbbfdVm8/+vTpg8ViYc2aNZSVlTFy5EhCQkKIi4sDqkYK77zzTiorK0lNTaVv374cOHCApUuXsmrVKl555RWGDx9eq93bbruN0tJShg4dyu7duxt9s2Jd8vPzmTZtGnv37iUqKoohQ4ZQUVHB1q1b+fHHH/n+++957rnnUBTF8Zz//Oc/zJ8/n4CAAAYOHIher2fdunVs3ryZiIiI076mqqr885//5PvvvycxMZGBAweiqirbt29n1qxZfP/997z//vsoisKoUaP44Ycf2Lt3LykpKY4vS/V56qmn+OCDDzAajfTv35/g4GC2b9/Ok08+ybZt25g1a5bT+/GE/WavlJQUp+179uzhuuuuIy8vj44dOzJs2DCys7NZsWIFa9as4fXXX2fo0KGOx588eZJp06bx559/kpiYyLBhwzh69CivvPIK69atq/f1Dx06xIABA1AUhbZt26LT6bBYLNx66618++23hIeH07dvXwwGA1u3buWBBx5g69atPPfcc452Nm/ezPXXX4/VaqVv37706NGDw4cPO86jt99+m/PPPx+oSoD/9re/UVpaSs+ePTnrrLPIyclh/fr1rF+/npkzZzJlypR6j1tjz4eHH36Y1atX06tXLy644AK2bdtGRkYGm
zZt4ssvvyQ8PPy0MRPCq5p5OoYQLUJ9c4KtVqtaUFCgbtq0Sb388svVlJQUdfLkyarFYnE85oEHHlBTUlLUV155pcGvaW9r+fLlqqqq6rZt2047L7kxsrKyHPOYq/fZHfY5mGazuda+++67r9b8ZPs80dTUVPWXX35xbLdarerq1avVlJQU9cYbb6zV1ssvv6ympKSo8+fPr9X+3XffrZaWljq2HzhwQB0+fLiakpKifv/99w16H/bHHzx40LEtJydH7d27t5qamqpmZGQ4Pf6zzz5TU1NT1T59+qjHjx93bL/66qvVlJQUdejQoWp+fr7jvZ2Ou3OCb731VjUlJUW9+eab1ZKSEsf2gwcPqiNHjlRTUlLU999/37H9hx9+UFNSUtTBgwer+/btc2w/cuSIOnr0aJevX/OY2D+HV199tdM84by8PMdrVp9TXdecYFfbV61apaakpKhDhgxx6l9eXp46duxYNSUlRV25cuVpj0t9x7G0tFT99ddf1fvvv19NSUlR+/Xrpx49etSx32Q
"text/plain": [
"<Figure size 800x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Random Forest Metrics:\n",
"Accuracy: 0.4936\n",
"Precision: 0.4906\n",
"Recall: 0.4630\n",
"F1-Score: 0.4764\n",
"ROC-AUC: 0.5052\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiQAAAHdCAYAAAAthmI8AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABqzklEQVR4nO3dd1QUVxsG8GfpIB0RBDsIKoiigBVQ7EZjjSVWTMTYSxQ1lliw994NNuz9s8Sa2BGxEIMFG4IKqDRB+s73B7BxpcjiwqA+v3P2KDN37r4zu7Dv3jYSQRAEEBEREYlIRewAiIiIiJiQEBERkeiYkBAREZHomJAQERGR6JiQEBERkeiYkBAREZHomJAQERGR6JiQEBERkeiYkBApgOsIEhUd/n5929TEDqCkSU1NxZkzZ3D48GE8evQIUVFR0NTURNWqVdG6dWv06NEDmpqaosR27do1LFy4ECEhIVBTU0P37t3h7e1dpM9pa2sLAPj333+hpib+2yU7HgBYsWIFWrZsmW/5Nm3a4MmTJwCAU6dOoWLFioV+7qCgIPj4+MDPz6/A12LFihVYuXIlfvnlF4wePbrQz52b5ORkLFiwACdPnkRcXByMjY2xd+9emJmZKfV58tKnTx9cv349130SiQTa2tooW7YsGjZsiMGDB8PExKRY4iooNzc3REZG4uzZsyhXrpzY4eTpwIEDmDhxYoHKzpkzB507dy7iiJQvJSUFGzZsgJqaGn755RexwyGRiP8JU4I8evQIo0aNQkhICLS1tWFraws7OztERUXh7t27uHnzJnbv3g1fX1+UKVOmWGN79+4dhgwZgsTERNjb26NcuXKwt7cv1hhKmj///DPfhOTevXuyZEQZunXrVqK+wa1btw7bt2+Hrq4umjRpAolEUuzvSwBwdHTM8YGekZGBV69e4e7du3j8+DHOnDmDvXv3wtTUtNjj+1qYmJigYcOG+ZapUKFCMUWjXBs2bMCKFSuYjHzjmJBkCQ0NRbdu3ZCYmIg+ffpg6NChMDIyku2PiIjAxIkTceXKFfTr1w/79++Hjo5OscX36NEjJCYmoly5cti3bx8kEkmxPO/x48cBoES0jnxIX18f58+fR0pKSp4tVtmxq6urIy0t7bOfszDJSK9evdC2bVu595KyBAUFAQAmTZok6rfibt265fn84eHhGDRoEB49eoQVK1ZgxowZxRzd18PKygoLFy4UO4wiUZISfRIPx5Ag85fh119/RWJiIgYNGoTJkyfn+AAxNzfHqlWrUKlSJTx58gR79uwp1hhTU1MBAGXKlCm2ZATI/CNoZWVVbM9XUM2bN0diYiIuXryYZ5njx4/D1tZWlFaDbMbGxrCysoKxsbHS685+T5ibmyu9bmUpV64cxo0bBwA4f/68yNEQUUnGhARAYGAg/vnnH5iammLIkCF5ltPR0cHgwYNRt27dXJOCo0ePolevXqhTpw4cHBzQvn17rFmzBklJSXLlwsPDYWtriyFDhiAqKgoTJ05E48aNUbNmTXz33XfYvHkzMjIyZOVtbW3Rt29fAMDNmzdha2sLDw8PAMCECRNga2uLvXv35ojH398ftra26Nmzp9z2yMhITJ06FW3atIGDgwNcXFzQt29fHDlyJEcdtra2sLW1RXp6utz2V69eYdq0afDw8IC9vT3q16+PoUOH4vbt2znqyI7x3r17OHz4MLp06YLatWvDxcUFw4cPR0hISB5XPG+tW7cGAJw8eTLX/Xfu3EF4eDjatWuXZx2RkZGYN28e2rdvD0dHR9jb26NJkyYYP368XFfPgQMH5Mau2NnZyf1sa2uLDh064Pr162jdujVq1qyJli1b4tmzZ1ixYgVsbW2xZMkSAJljcezs7FCtWjXcuHFDLp43b96gfv36sLW1xdmzZ/OMO7vO7PEbnp6esLW1xYEDB2RlCvP6XL9+HcOHD4eDgwPq168PX1/fPGNQhKWlJQAgNjY2x767d+9i3Lhx8PDwQM2aNVG7dm20adMGCxcuRHx8fK7nfebMGfz999/o3bs3HB0dUadOHfz0008IDAzM9fmvXr0KT09PODs7w8nJCaNHj8bLly/zjDc5ORlr1qxB+/bt4eDggDp16uDHH3/M9fcj+72xfft23Lx5EwMGDECdOnXg7OyMwYMH4/nz5wCAM2fO4IcffkCtWrXg4eGBWbNm4f379wW9hIX26NEjeHt7w9XVFfb29mjcuDHGjRuHR48e5Sjbp08f2Nra4uHDh+jbty9q1qyJxo0by1oaAeDZs2eYMGEC3NzcYG9vDzc3N0yaNAkvXrzIUV9KSgpWrVqFTp06oU6dOnB0dESnTp2wdu1aub+JHh4eWLlyJQBg7dq1sLW1xYoVK4rgalBJx4QE/zXtN2/eHFpaWvmW7dixI/z8/NCvXz/ZNkEQMG7cOIwdOxZBQUGoXbs23NzcEBUVhaVLl6JHjx6IiYnJUdfr16/RtWtXnDlzBtWrV4ejoyOePHmCefPmYfbs2bJy7du3l/UdGxsbo3379mjevHmhzvXt27fo2rUrdu/eDTU1NTRp0gTVq1fHjRs3MG7cuAL9IQgKCsL333+PnTt3Qk1NDR4eHqhYsSLOnDmDnj17Yvfu3bket2rVKnh7eyM9PR2urq7Q1tbGqVOn0KNHD4SFhSl0Hg0aNICRkRHOnz8vayn4UPZr2rZt21yPf/LkCTp27IjNmzdDKpWicePGqFevHhITE3Ho0CF069YNr169ApDZL9++fXvZse3atZP7Gci8roMHD4a6ujoaN24MTU3NXAfQ2tnZYdCgQRAEAVOnTpWLffLkyYiJiUH37t3RrFmzPM/d1tYW7du3lw0SbdCgAdq3by8bP1DY12fKlCm4du0aXF1dYWBgIJd0fY6///4bAFC1alW57SdOnEC3bt1w9OhRlClTBk2bNoWdnR2eP3+ODRs2wNPTE1KpNEd9hw4dgpeXF16/fo1GjRrB1NQUly5dQr9+/XIkXHv37oWnpyeuXbuGatWqoV69erh48SJ69OiRa0IQExODH374AUuXLkVUVBRcXV3h6OiIf/75B+PGjcP48eNz7V7ITpBevHiBhg0bQkdHB+fOnUOfPn3g6+uLoUOHQhAENGrUCDExMdi6dWuRD0g/d+4cOnfujMOHD8PIyAjNmjWDsbExjhw5gi5duuTZYjV8+HA8efIE7u7uUFNTk41Vu3r1Kjp16oSDBw/CwMAAHh4eMDAwwL59+9C5c2fcvXtXVocgCBg8eDCWL1+Ot2/fol69eqhXrx7CwsKwZMkSeHl5ya5j8+bNYWNjAwCwsbFB+/btlfbeoy+MQEK/fv0EGxsb4cCBA4U6fuvWrYKNjY3QokUL4fnz57Lt7969E7y8vAQbGxth2LBhsu1hYWGCjY2NYGNjI/Tu3Vt4+/atbN/Zs2cFGxsboUaNGkJcXJxs+7Vr1wQbGxuhR48ecs89fvx4wcbGRtizZ0+OuHI7ZuXKlYKNjY2waNEiubJ37twR7OzsBAcHByEpKUm2PTvOtLQ0QRAEITk5WXB1dRVsb
GyEVatWCVKpVFb2r7/+EmrWrCnUqFFD+Pfff3PEWL16deHYsWOy7cnJyUKPHj0EGxsbYe7cuflc4f98GM+UKVMEGxsb4cyZM3JlpFKp4OrqKnTr1k0QBEFo2rSpYGNjIzx79kxWZtCgQYKNjY2wYcMGuWPj4+OFrl27CjY2NsLq1avzfO7ctg8aNEh2PTIyMgRBEITly5cLNjY2wuLFi2XlU1NThQ4dOgg2NjbCypUrBUEQhD179gg2NjZCy5YthcTExAJdi969ews2NjbC5cuXZds+5/WpXbu27P0rlUrljs3v+ffv359jX0pKihAaGiqsW7dOsLOzE2xsbITTp0/L7a9Xr55gZ2cnBAQEyB376NEjoU6dOoKNjY1w48YN2fbsa2ljYyNs3rxZ7lqPGDFCsLGxEYYPHy4r//LlS8HBwUGwt7cXrl27Jtv+9u1boWPHjrK6wsLCZPuGDRsm2NjYCL/88ouQkJAg2/7s2TOhWbNmgo2NjbB161bZ9v3798vqmTt3riymd+/eCe7u7rJ
"text/plain": [
"<Figure size 600x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAsEAAAIqCAYAAADFMpc1AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAADii0lEQVR4nOzdd3gU5fbA8e9sSe89QEAEE0Q6BEFBEBQVQULwigWwXAvXghV7v9ZrwYKIYseuQABFQQTBShGkqFj4CVJSSO/ZNr8/4i7ZZBPYZFsy5/M8PiYzs5Ozc3aHs7PvnFdRVVVFCCGEEEIIDdH5OwAhhBBCCCF8TYpgIYQQQgihOVIECyGEEEIIzZEiWAghhBBCaI4UwUIIIYQQQnOkCBZCCCGEEJojRbAQQgghhNAcKYKFEEIIIYTmSBEshBBCiKMmc2yJjsLg7wCE0KoNGzYwY8aMZtcbjUaio6NJT0/nggsuYNy4cS3u7+uvv2bp0qVs2bKFoqIiALp06cLw4cO58MILOfbYY48Y019//cWiRYv45ptvyM3NpaqqiuTkZDIzM5k2bRp9+vRx70n+Iy8vj8WLF7N27VoOHDhAeXk5cXFxDBw4kPPPP5/hw4e3ar/tQWlpKQ8//DDr16+nurqa5ORkPv/8cwwG35x+x4wZw4EDB1yu0+l0hIWF0aVLF0499VQuv/xyIiIifBLX0bBYLJxwwgkA/Pbbb36OpmXPP/88c+fOPapt33rrLU488UQvR+R55eXlPPfcc5xwwglMnjzZ3+EI0WZSBAvhZ2FhYYwdO7bJ8vLycv7880++++47vvvuO2688UZmzpzZZLuysjJuvvlmvv76awDS09M54YQTqK6u5vfff2fhwoW8++673HjjjVxxxRUuY7DZbMybN4958+ZhtVpJS0ujX79+6HQ6/vjjD5YsWcLSpUu59dZbufTSS916fu+//z6PPPIIdXV1JCcnk5GRQWhoKP/3f//H559/zueff87FF1/MnXfe6dZ+24tHHnmEZcuWER8fz6mnnkpMTIzPCuCGTjrpJOLj452WmUwmDh48yM8//8yuXbv46quvePfddwkLC/N5fB1FWloaAwYMaHGbhIQE3wTjYY8++iiLFy/moYce8ncoQniEFMFC+FlsbCxPPvmky3U2m4033niDxx9/nOeff55JkyaRmprqWF9dXc306dP57bffGDx4MPfeey+9evVyevzKlSt54IEHePLJJ8nPz+fuu+9u8ncefPBB3nvvPTp37syDDz7IiBEjHOtUVWXZsmXcddddPPbYY0RERPCvf/3rqJ7bggULePLJJ4mJieHxxx/njDPOQKc7PApr/fr13Hzzzbz55puEhYVxww03HNV+25Nt27YBMGfOHL9e/Zs5c2azf/+3337jsssu49dff+Xtt9/myiuv9HF0HceQIUN47LHH/B2GV8gwCNHRyJhgIQKYTqfjsssuo0+fPlgsFsfVXrvHH3+c3377jREjRvDWW285FcD2x5911ll8+OGHREZGsnDhQtatW+e0zVdffcV7771HfHw877zzjlMBDKAoCpMmTeK+++4D4JlnnqGuru6Isf/66688++yzBAcH8+abb3LWWWc5FcAAp5xyCs8++ywAr776KgUFBUd3YNoRs9kMQEpKip8jaV5GRgZXX301AGvWrPFzNEII4RtSBAvRDnTu3BmoH19ql5+fz6JFizAYDDz88MMtfsXetWtXbrzxRgBeeOEFp3Wvv/46AFdddZXTVebGsrOzGTx4MMOGDSMvL++IMS9cuBCz2cz555/fpDhv6KSTTuL0009n9OjRTvvNyMggIyMDi8XS5DG33347GRkZfPTRR45lzz//PBkZGaxYsYJ77rmHgQMHkpmZyd13303fvn3p378/lZWVLmM455xzyMjIcBp3WldXx4IFCzjnnHPo378/gwcPZsaMGUddJNpjtI/HHTduHBkZGWzYsMGxzZ9//smtt97KyJEj6dOnDyNGjGD27Nn8+eefTfY3ffp0MjIy+P3335kxYwZ9+/ZlxIgRrFix4qjiORJXrzG777//nlmzZnHKKafQp08fBg4cyKRJk3jxxRcxmUwun/evv/7K0qVLmTJlCgMGDGDo0KFcd911/PHHHy7//meffcb555/PoEGDGDZsGPfeey9lZWXNxltSUsITTzzBGWecQZ8+fRg6dCj//ve/m3zIg8OvjbVr17J27VrOP/98BgwYwLBhw5g9ezbFxcUAfPTRR0ycOJH+/ftzxhln8MILLzg+xHjT1q1bufbaaxk+fDh9+vTh1FNP5b777iM3N7fJtmPGjGHIkCH8/vvvZGdnO7bfvHmzY5udO3cya9Ysx/7Gjh3Lo48+6nieDZWXl/P4448zceJEBgwYwODBgzn//PN55513nN57GRkZLFmyBIC7776bjIwMFi9e7IWjIYTvyHAIIQJcVVUVP/74IwDHHXecY/lXX32F2Wxm5MiRR3WVcdKkSTz22GNs27aNvXv30q1bN4qKiti4cSMAEyZMaPHxer2ed99996hitlqtrFq16qj2Cxz1DUVH49lnnyU3N5eTTz6ZgwcPMmDAAMrLy1m5ciVffvklkyZNctr+zz//5LfffqNXr15kZGQAUFlZyaWXXsr27duJi4tj2LBhmEwmNm3axIYNG7j22mu57rrrWoxj4MCBWCwWvvzyS6qrqxk7dixhYWGO8aBr1qzhhhtuoK6ujoyMDAYNGsRff/3FsmXLWLVqFc888wynnnpqk/1ed911VFVVMWrUKHbu3NnqmxUbsxeP6enpTstff/11HnvsMYxGIwMHDmTAgAHk5eWxfft2du3axc8//+wyfy+88AJffPEFvXr1YuTIkWzfvp1Vq1bx3XffkZOTQ1pammPbZ599lnnz5mE0Ghk2bBh6vZ6lS5c6XveN7du3j2nTppGXl0dycjJjxoyhpKSE77//nm+++Yb//Oc/LofWfPDBB6xdu5bjjz+ek046iS1btrBs2TJ2797NSSedxKuvvsrAgQMZNmwY3377Lc899xzl5eXccccdbTiyLXvvvfd48MEHsdls9O/fn5SUFHbt2sX777/P559/zoIFC+jXr5/TY8xmM1deeSUGg4FRo0bx66+/Oj5oLl26lDvvvBOr1coJJ5xA586d+fXXX3njjTf44osveOutt+jSpQsAtbW1XHTRRfz+++907dqVESNGUFNTw6ZNm9i6dSs7duxwDO2YOHEiP/30E/v27WPAgAGkpaXRtWtXrx0XIXxCFUL4xQ8//KCmp6erp556apN1VqtVLS0tVb/77jt16tSpanp6ujp58mTVYrE4trnjjjvU9PR09Zlnnjnqv2nf16effqqqqqpu2rSp2RjaIjc3V01PT1d79+7tFLM70tPT1fT0dNVsNjdZd9ttt6np6enqhx9+6Fj23HPPqenp6WpGRoa6bds2x3Kr1aquXr1aTU9PV6+88som+3r66afV9PR09dVXX22y/5tuukmtqqpyLP/rr7/UU089VU1PT1e//fbbo3oe9u337
NnjWFZQUKAOGDBAzcjIUBcvXuy0/UcffaRmZGSoAwcOVPPy8hzLp02bpqanp6ujRo1SS0pKHM/taP/+Dz/80GRdTU2N+scff6iPP/64mp6erp5wwglOxy4/P1894YQT1MzMTHX37t1Oj920aZPau3dvNT09Xc3NzXUstx+7448/3vE6U1VVra2tVc8//3w1PT1dfeyxxxzLt23bpmZkZKhDhw5Vf/vtN8fyv//+Wx09erTjdWBns9nUyZMnq+np6eo999yjmkwmp30NHTpUTU9PV7/88kvHcvtrIz09XV24cKFjeV5entq/f39HvA2P0bp169T09HR10KBBR3Wc7X/jtttuO+K2dr/++qvaq1cvtV+/fur69esdy61Wq/r888+r6enp6imnnKLW1NQ41tnzOXnyZLWurs6xvaqq6u7du9U+ffqoAwcOVDds2OC0P/vr/IILLnAsX7JkiZqenq7efPPNqs1mcyzfu3evmpmZqaanp6t///23Y7mr950Q7ZkMhxDCzw4cOOD46t/+3/H
"text/plain": [
"<Figure size 800x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Gradient Boosting Metrics:\n",
"Accuracy: 0.4952\n",
"Precision: 0.4923\n",
"Recall: 0.4630\n",
"F1-Score: 0.4772\n",
"ROC-AUC: 0.4972\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiQAAAHdCAYAAAAthmI8AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABrN0lEQVR4nO3dd1QUVxsG8GfpoCiiCIodXFAQRQErYm/RWGOJFY0Yxd6NYsVesGNFNEqssX32FmMXKxo0dhQLIFWQvvP9AbtxpcjCwqA+v3P2KDN37r6z9d3bRiIIggAiIiIiEWmIHQARERERExIiIiISHRMSIiIiEh0TEiIiIhIdExIiIiISHRMSIiIiEh0TEiIiIhIdExIiIiISHRMS+q5xXUBSp+/59fQ9nzupxzefkCQlJeHo0aMYMmQImjdvjho1asDBwQG9evXC1q1bkZiYKFpsV69eRbdu3VCzZk3UqVMHixYtyvf7tLKygpWVFVJSUvL9vnJCHo+VlRVOnjz5xfJt27ZVlA8KCsrTfQcEBKBHjx4qPRarVq2ClZUVvLy88nTfmUlISMCcOXPQsGFD2NraonHjxggJCVH7/eTE+/fv4evriz59+qBp06awtbWFk5MTunTpgsWLF+PVq1eixPWp8ePHw8rKCn/++adiW34+P9lJTEzE6tWrsX79+hyVv3btmtJr/9Obg4MDfvzxR3h5eSE2NjafI8+7rM5drOeCvl5aYgeQn548eYLRo0fj8ePH0NfXh5WVFWxsbBAaGor79+/j1q1b2LVrF3x9fVG6dOkCje3Dhw8YNmwY4uLiYGtri3LlysHW1rZAYyhsTpw4gVatWmW5/8GDB3j27Jna7q979+6F6lfd+vXrsX37dhQtWhRNmjSBRCIp8NclAGzZsgVeXl5ITEyEgYEBatSoATs7O8TGxuL+/fvYtGkTfH19MX78eLi6uhZ4fIXRxo0bsWrVKvz6668qHWdgYIDmzZsr/hYEAR8+fEBgYCDWrVuH48ePY9euXTAyMlJzxOqT23Mn+tw3m5AEBQWhe/fuiIuLQ9++feHu7o4SJUoo9r979w5TpkzB5cuX0b9/f+zbtw8GBgYFFt+TJ08QFxeHcuXKYe/evZBIJAVyv0ePHgUAaGkVrqe+WLFiOHfuHBITE6Grq5tpGXns2traSE5OzvN95iYZ6d27N9q1a6f0WlKXgIAAAMDUqVPRpUsXtdefE6tXr8aqVatgYGCAGTNmoFu3btDR0VHsT01NxeHDhzFr1iwsWLAA5ubm2SaRBS0/n5/s5DaxLVGiBJYsWZJhe1JSEsaPH48TJ05gxYoVmDFjRl5DzDdZnbtYzwV9vb7JLhtBEDBu3DjExcVhyJAhmDZtWoY3hZmZGdasWYNKlSrh2bNn2L17d4HGmJSUBAAoXbp0gSUjAGBhYQELC4sCu7+catGiBeLi4nDhwoUsyxw9ehRWVlaitBrIGRsbw8LCAsbGxmqvW/6aMDMzU3vdOXH//n2sWbMG2tra2LJlC37++WelZAQANDU10alTJyxevBgAsHz5chEizVp+Pj8FSUdHB6NGjQIAnDp1SuRocudbeS6o4HyTCcnNmzdx7949mJiYYNiwYVmWMzAwwNChQ1GnTp1Mk4LDhw+jd+/eqF27Nuzs7NChQwd4e3sjPj5eqVxwcDCsrKwwbNgwhIaGYsqUKWjUqBFq1KiBH374AT4+PkhNTVWUt7KyQr9+/QAAt27dgpWVFZo1awYAmDx5MqysrLBnz54M8cj7nXv16qW0PSQkBNOnT0fbtm1hZ2cHJycn9OvXD4cOHcpQR1ZjSN6+fYuZM2eiWbNmsLW1Rb169eDu7o47d+5kqEMe44MHD3Dw4EF07doVtWrVgpOTE0aMGIHHjx9n8YhnrU2bNgCA48ePZ7r/7t27CA4ORvv27bOsIyQkBAsXLkSHDh1gb28PW1tbNGnSBJMmTVLq6vnzzz9hZWWl+NvGxkbpbysrK3Ts2BHXr19HmzZtUKNGDbRq1QovXrzI0C/+zz//wMbGBtbW1rhx44ZSPO/fv0e9evVgZWWFM2fOZBm3vM7r168DAFxdXTOMjcjN83P9+nWMGDECdnZ2qFevHnx9fbOMAQC2bt0KmUwGV1dX1KpVK9uyLVq0QLdu3dCpUydFIgUAffv2hZWVFR49eoR+/fqhRo0aaNSokaJ1KyUlBXv27EH//v1Rt25d2NjYwMnJCX379sWRI0cyva8XL15gwoQJaNSoEWrVqoU+ffpkeKzlshu3cPnyZQwePBh169ZFjRo10LZtW6xatQofP35UKqfq+7lZs2ZYvXo1AGDdunWwsrLCqlWrsn38ckKemMbFxWXYl5qaih07dijee7Vq1ULXrl2xffv2LMdE3b59G8OHD0f9+vVha2uLpk2bYsaMGXj79m2GsomJiVizZg06d+6M2rVrw97eHp07d8a6deuUPv+yO/fMngv5ttOnT+P8+fPo06cP7O3tUbt2bQwaNAg3b97MNPaAgAAMHToUDRo0ULwGrly5grVr12Z4r9DXq3C126uJ/MOvRYsW0NPTy7Zsp06d0KlTJ6VtgiBg4sSJOHToEHR0dODo6AgDAwP4+/tj+fLlOH78OHx9fTO0uoSFhaFbt26Ij49HrVq1kJiYCH9/fyxcuBCvX7+Gh4cHAKBDhw4IDw/H5cuXYWxsjIYNG+b6V0R4eDi6deuG0NBQSKVSNGnSBNHR0fD398e1a9cQFBSEESNGZFtHQEAABg0ahJiYGFSsWBHNmjVDSEgITp8+jbNnz2LmzJno0aNHhuPWrFmDU6dOwdraGs7OzggICMDJkydx+fJlHDhwAOXLl8/xedSvXx8lSpTAuXPnkJSUlOGXufw5bdeuHXbu3Jnh+GfPnqF3796IiIiApaUlGjVqhI8fPyIgIAAHDhzAmTNncPjwYZQpUwYVKlRAhw4dcPjwYQBA+/btMySk4eHhGDp0KMqWLYtGjRohODgYFStWzHC/NjY2GDJkCNasWYPp06fjwIEDitinTZuGyMhI9OjRQ2mcwOesrKzQoUMHXL58GeHh4ahfvz5KlSqFChUqAMj98+Ph4YGIiAg4OzvjyZMnSknX5xISEnD69GkAQOfOnbMs96m5c+dmuW/EiBGIi4uDi4sL7t+/D1tbWwiCgBEjRuDs2bMoXrw4atasCV1dXTx58gTXr1/H9evXER4erkjWgbRWm4EDByI6OhpWVlaoXbs27t+/jwEDBsDc3DxHcQLAhg0bsHTpUmhra8PW1hYmJia4c+cOVq9ejTNnzmDr1q0oXry40jE5fT+3aNECV65cwaNHjyCVShVJf179888/AAB7e3ul7YmJiXBzc8PVq1dRpEgR1K1bFxKJBNeuXcOcOXNw5swZrF+/Xuk99Mcff2D27NmQyWSoWbMmzMzM8PDhQ+zcuRPHjx/Hxo0bYWdnByDt82/o0KG4dOkSTE1NUbduXQiCgBs3bsDLywuXLl3Ctm3bIJFIcn3uBw4cwKlTp1CpUiU0bNgQjx8/xsWLF3Ht2jVs375dK
SE+ffo0Ro8ejeTkZNjb26N06dK4efMmBg4cCBsbmzw/zlSICN+g/v37C1KpVPjzzz9zdfy2bdsEqVQqtGzZUnj58qVi+4cPHwQ3NzdBKpUKw4cPV2x/9eqVIJVKBalUKvTp00cIDw9X7Dtz5owglUqF6tWrC9HR0YrtV69eFaRSqdCzZ0+l+540aZIglUqF3bt3Z4grs2NWr14tSKVSYenSpUpl7969K9jY2Ah2dnZCfHy8Yrs8zuTkZEEQBCEhIUFwdnYWpFKpsGbNGkEmkynK/vXXX0KNGjWE6tWrC//880+GGKtVqyYcOXJEsT0hIUHo2bOnIJVKhQULFmTzCP/n03g8PDwEqVQqnD59WqmMTCYTnJ2dhe7duwuCIAhNmzYVpFKp8OLFC0WZIUOGCFKpVNi4caPSsTExMUK3bt0EqVQqrF27Nsv7zmz7kCFDFI9HamqqIAiCsHLlSkEqlQrLli1TlE9KShI6duwoSKVSYfXq1YIgCMLu3bsFqVQqtGrVSoiLi8vRY9GnTx9BKpU
"text/plain": [
"<Figure size 600x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAsEAAAIqCAYAAADFMpc1AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAADsM0lEQVR4nOzdd3gUVffA8e9sSe89QEAEAwKhB0VBEBRRQSCogC8gYH1fFSsK9gr6s2BBRbEhVpSqICBFUJCiCAQQKQJSQnpv2+b3R8zCsgmwyW42yZzP8/iYzMzePTNndjg7uXOvoqqqihBCCCGEEBqi83YAQgghhBBC1DUpgoUQQgghhOZIESyEEEIIITRHimAhhBBCCKE5UgQLIYQQQgjNkSJYCCGEEEJojhTBQgghhBBCc6QIFkIIIYQQmiNFsBBCCM3Q8vxQWt53Iapi8HYAQjQGmzZtYuzYsdWuNxqNhIaGkpiYyKhRoxgwYMAZ2/v5559ZtGgRW7duJTs7G4BmzZrRs2dPbrrpJs4///yzxnTw4EHmzZvHL7/8QlpaGsXFxcTGxpKcnMzo0aPp0KGDazv5rxMnTjB//nzWrFnDsWPHKCgoICIigi5dujBy5Eh69uxZo3Ybgry8PF544QXWrVtHSUkJsbGxLFu2DIOh7i+lR48eZdGiRaxbt460tDRycnIICgrivPPOo1evXowcOZKoqKg6j+tUo0aNYuvWrXz66adcdNFFAEyePJkFCxbw/PPPc8MNN9RZLAUFBbz55pu0b9+eYcOGnXX7+fPnM2XKFKfliqIQEhJC8+bNueqqq7j55pvx8fHxRMhuU92+eysXQtQXUgQL4UYBAQH079/faXlBQQH79+9nw4YNbNiwgfvvv58777zTabv8/HwefPBBfv75ZwASExNp3749JSUl7N27lzlz5vDFF19w//33c9ttt1UZg81m45133uGdd97BarWSkJBAx44d0el07Nu3jwULFrBo0SIefvhhxo8f79L+ffXVV0ydOpXy8nJiY2Np06YN/v7+/P333yxbtoxly5Zx88038+ijj7rUbkMxdepUFi9eTGRkJJdffjlhYWF1XgBbrVZeffVVPvnkE6xWK2FhYbRv356QkBDy8vLYsWMHf/zxB7NmzWLq1Klce+21dRpffTVt2jTmz5/P888/79LrIiMjueSSS+y/22w28vPzSU1N5ZVXXmHVqlV8+umn9boQrum+C9HoqUKIWtu4caOamJioXn755dVuY7Va1Q8//FBNTExU27Vrpx4/ftxhfXFxsTp48GA1MTFRHTVqlPrnn386vX7p0qXqRRddpCYmJqrPPfdcle/z1FNP2WP5+eefHdbZbDZ14cKFavv27dXExER17ty557yP77//vpqYmKj26NFDXbp0qWq1Wh3Wr127Vu3evbuamJioTp8+/ZzbbUgGDBigJiYmqhs3bvRaDJMnT1YTExPVSy65RF2yZIlqsVgc1peXl6sfffSR2q5dO/XCCy9Ut2/f7qVIVXXkyJFOxys9PV3dv3+/WlBQUKexPPLIIy6d8/PmzVMTExPV0aNHV7m+sLBQHTFihJqYmKjOmjXLnaG6XXX77q1cCFFfSJ9gIeqITqdjwoQJdOjQAYvFYr/bW+mll17ir7/+olevXnz66ae0bdvW6fVXX301c+fOJTg4mDlz5rB27VqHbX766Se+/PJLIiMj+fzzz+nVq5fDekVRGDJkCE899RQAr7/+OuXl5WeN/c8//+SNN97A19eX2bNnc/XVV6PTOV4+LrvsMt544w0APvzwQzIyMs7twDQgZrMZgLi4OK+8/48//sj8+fMJCwvjyy+/5JprrkGv1zts4+Pjw/jx45k0aRJWq5W33nrLK7FWJyYmhlatWhEcHOztUGolKCiI//3vfwCsXLnSy9HUTGPJhRA1JUWwEHWsadOmQEX/0krp6enMmzcPg8HACy+8cMY/sTdv3pz7778fgLffftth3ccffwzAHXfcQXx8fLVtpKSk0K1bNy6++GJOnDhx1pjnzJmD2Wxm5MiRTsX5qS655BKuvPJK+vbt69BumzZtaNOmDRaLxek1kydPpk2bNnzzzTf2ZW+99RZt2rRh6dKlPPHEE3Tp0oXk5GQef/xxkpKS6NSpE0VFRVXGcN1119GmTRv++usv+7Ly8nJmzZrFddddR6dOnejWrRtjx45l9erVZ933U2M8duwYAAMGDKBNmzZs2rTJvs3+/ft5+OGH6d27Nx06dKBXr15MmjSJ/fv3O7U3ZswY2rRpw969exk7dixJSUn06tWLpUuXnjGOyvzef//9NG/e/Izb3nTTTVxzzTX06dPHYXm/fv3o3r07e/fuJSUlhQ4dOnD55Zfz22+/AVBWVsbHH3/MyJEj6dGjB+3bt+fiiy/mtttu45dffqnyvXbu3Mn//vc/evbsSZcuXbjtttvYu3dvldtWle9KS5cuZcyYMXTr1o1OnToxZMgQPvnkE/uXj0qbNm2iTZs2vPDCC/z9999MnDiRiy66iI4dO5KSksK8efMctm/Tpg0LFiwA4PHHH6dNmzbMnz//jMfvXFR+GSouLnZaV1ZWxrvvvsvgwYPp2LEjXbt25aabbmLx4sXVtrd27VpuueUWevToQVJSEldddRUvv/yyw7WiUkFBAS+99BKDBw+mc+fOdOvWjZEjR/L55587fM7OtO9V5aJy2Z9//smiRYsYPnw4nTt3pkePHtxzzz3s27evytjXr1/PuHHjuOiii+jatSu33XYbu3fv5rHHHnP6rAhRX0ifYCHqUHFxMb///jsAF1xwgX35Tz/9hNlspnfv3ud0l3HIkCG8+OKLbN++ncOHD9OiRQuys7PZvHkzAIMGDTrj6/V6PV988cU5xWy1WlmxYsU5tQswY8aMc2r3XLzxxhukpaVx6aWXcvz4cTp37kxBQQHLly9n1apVDBkyxGH7/fv389dff9G2bVvatGkDQFFREePHj2fHjh1ERERw8cUXYzKZ2LJlC5s2beLuu+/mnnvuOWMcXbp0wWKxsGrVKkpKSujfvz8BAQH2B89Wr17NfffdR3l5OW3atKFr164cPHiQxYsXs2LFCl5//XUuv/xyp3bvueceiouL6dOnDzt37jzjw4rHjx9n69atGI3Gc8qDj48P06dPr3Kd2Wzm9ttvx2Aw0KdPH/7880/atm1LeXk5o0ePJjU1lejoaLp27YqiKPz111+sW7eOn3/+mRkzZnDFFVfY21q7di133303JpOJzp07Exsby++//86oUaMICgo6a5yVnnzySb7++mv8/Pzo2LEjwcHB/P7770ybNo1169Yxc+ZMp363+/fv54YbbsDX15cuXbqQl5fHH3/8waOPPkp+fj4TJkwAYPDgwWzbto0jR47QuXNnEhISzvol4lzs2rULgM6dOzssz83NZezYsezdu5ewsDB69+5NWVkZmzdv5vfff2f9+vW8+OKLKIpif8306dOZOXMmer2ebt26ER4ezrZt2/jggw/44YcfmD17NgkJCUBFgf2f//yHvXv30
rx5c3r16kVpaSlbtmzhjz/+IDU1lRdffLFW+/7222/z448/0rZtW3r37s2OHTtYsWIFGzZsYOHChfZYAD7//HOee+45dDod3bt3Jzg4mM2bNzNq1CjOO++8Wh5lITzI2/0xhGgMztQn2Gq1qnl5eeqGDRvsfQiHDRvm0JdzypQpamJiovr666+f83tWtrVkyRJVVVV1y5YtZ+2XXBNpaWn2fsyn9z89V4mJiWpiYqJqNpud1lXVX/HNN99UExMT1TZt2jj0abVarerKlSvVxMRE9fbbb3dq67XXXlMTExPVDz/80Kn9Bx54QC0uLrYvP3jwoHr55ZeriYmJ6vr1689pPyq3P3TokH1ZRkaG2rlzZ7VNmzbq/PnzHbb/5ptv1DZt2qhdunRRT5w4YV8+evRoNTExUe3Tp4+am5tr37czWbZsmZqYmKiOGDHinGI92z4MGzZMLS8vd3jvjz76SE1MTFT/+9//qiaTyf4ai8WiPv3002piYqI6btw4+/KioiL10ksvVdu0aaN+99139uX
"text/plain": [
"<Figure size 800x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.metrics import confusion_matrix, roc_curve, auc, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score\n",
"\n",
"sns.set(style=\"whitegrid\")\n",
"\n",
"def plot_confusion_matrix(y_true, y_pred, title):\n",
" cm = confusion_matrix(y_true, y_pred)\n",
" plt.figure(figsize=(6, 5))\n",
" ax = sns.heatmap(cm, annot=True, fmt='d', cmap='coolwarm', cbar=True, annot_kws={\"size\": 14})\n",
" plt.title(title, fontsize=16)\n",
" plt.xlabel('Предсказанные значения', fontsize=12)\n",
" plt.ylabel('Истинные значения', fontsize=12)\n",
" plt.xticks(fontsize=10)\n",
" plt.yticks(fontsize=10)\n",
" cbar = ax.collections[0].colorbar\n",
" cbar.set_label('Count', rotation=270, labelpad=20, fontsize=12)\n",
" plt.show() \n",
"\n",
"def plot_roc_curve(y_true, y_pred_proba, title):\n",
" fpr, tpr, _ = roc_curve(y_true, y_pred_proba)\n",
" roc_auc = auc(fpr, tpr)\n",
" \n",
" plt.figure(figsize=(8, 6))\n",
" plt.plot(fpr, tpr, color='#FF6347', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')\n",
" plt.plot([0, 1], [0, 1], color='gray', linestyle='--', lw=1.5, label='Random Guess')\n",
" plt.xlim([0.0, 1.0])\n",
" plt.ylim([0.0, 1.05])\n",
" plt.xlabel('Показатель ложных положительных результатов', fontsize=12)\n",
" plt.ylabel('Показатель истинных положительных результатов', fontsize=12)\n",
" plt.title(title, fontsize=16)\n",
" plt.legend(loc=\"lower right\", fontsize=10)\n",
" plt.grid(True, linestyle='--', alpha=0.6)\n",
" plt.show()\n",
"\n",
"def evaluate_and_plot_model(model, X_test, y_test, model_name):\n",
" y_pred = model.predict(X_test)\n",
" y_pred_proba = model.predict_proba(X_test)[:, 1]\n",
" \n",
" accuracy = accuracy_score(y_test, y_pred)\n",
" precision = precision_score(y_test, y_pred, pos_label=1)\n",
" recall = recall_score(y_test, y_pred, pos_label=1)\n",
" f1 = f1_score(y_test, y_pred, pos_label=1)\n",
" roc_auc = roc_auc_score(y_test, y_pred_proba)\n",
" \n",
" print(f\"\\n{model_name} Metrics:\")\n",
" print(f\"Accuracy: {accuracy:.4f}\")\n",
" print(f\"Precision: {precision:.4f}\")\n",
" print(f\"Recall: {recall:.4f}\")\n",
" print(f\"F1-Score: {f1:.4f}\")\n",
" print(f\"ROC-AUC: {roc_auc:.4f}\")\n",
" \n",
" plot_confusion_matrix(y_test, y_pred, f'Confusion Matrix for {model_name}')\n",
" plot_roc_curve(y_test, y_pred_proba, f'ROC Curve for {model_name}')\n",
"\n",
"evaluate_and_plot_model(logreg_best_model, X_test, y_test, 'Logistic Regression')\n",
"evaluate_and_plot_model(rf_best_model, X_test, y_test, 'Random Forest')\n",
"evaluate_and_plot_model(xgb_best_model, X_test, y_test, 'Gradient Boosting')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Вывод по результатам задач классификации** \n",
"Результаты обучения моделей для задачи классификации направления изменения цены показали, что качество прогнозирования остаётся на уровне случайного угадывания.\n",
"\n",
"**Анализ метрик для моделей:**\n",
"\n",
"- **Логистическая регрессия:** Данная модель показала точность (Accuracy) 0.4880 и F1-меру 0.4306. Значения Precision (0.4821) и Recall (0.3891) также указывают на трудности модели с корректной классификацией. ROC-AUC на уровне 0.4836 близок к случайному значению (0.5), что говорит о слабой предсказательной способности.\n",
"\n",
"- **Случайный лес:** Случайный лес продемонстрировал лучшие результаты по сравнению с логистической регрессией: точность 0.4936 и F1-меру 0.4764. Метрика ROC-AUC составила 0.5052, что превышает уровень случайного угадывания, но не является достаточным показателем качества.\n",
"\n",
"- **Градиентный бустинг:** Градиентный бустинг показал схожие результаты с о случайным лесом: точность 0.4952, F1-меру 0.4772, и ROC-AUC на уровне 0.4972. Данные значения говорят о том, что, несмотря на сложности задачи, эта модель на данный момент является наилучшей из предложенных."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "miienv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}