AIM-PIbd-32-Petrushin-E-A/lab_4/lab4.ipynb

2024-11-02 00:25:32 +04:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Choosing business goals\n",
"### Regression task:\n",
"\n",
"Goal: Predict the price of a car (Price) from its other characteristics.\n",
"\n",
"Application: This can be useful for car dealerships, online car marketplaces, and private individuals who want to estimate the market value of their car.\n",
"\n",
"### Classification task:\n",
"\n",
"Goal: Classify cars into price categories (for example, \"Economy\", \"Mid-range\", \"Premium\") from their characteristics; the categories themselves are defined by price ranges.\n",
"\n",
"Application: This can be useful for marketing campaigns, identifying the target audience, and analyzing the car market."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"         ID  Price  Levy  Manufacturer    Model  Prod. year   Category  \\\n",
"0  45654403  13328  1399         LEXUS   RX 450        2010       Jeep   \n",
"1  44731507  16621  1018     CHEVROLET  Equinox        2011       Jeep   \n",
"2  45774419   8467     -         HONDA      FIT        2006  Hatchback   \n",
"3  45769185   3607   862          FORD   Escape        2011       Jeep   \n",
"4  45809263  11726   446         HONDA      FIT        2014  Hatchback   \n",
"\n",
"   Leather interior  Fuel type  Engine volume    Mileage  Cylinders  \\\n",
"0               Yes     Hybrid            3.5  186005 km        6.0   \n",
"1                No     Petrol              3  192000 km        6.0   \n",
"2                No     Petrol            1.3  200000 km        4.0   \n",
"3               Yes     Hybrid            2.5  168966 km        4.0   \n",
"4               Yes     Petrol            1.3   91901 km        4.0   \n",
"\n",
"   Gear box type  Drive wheels   Doors             Wheel   Color  Airbags  \n",
"0      Automatic           4x4  04-May        Left wheel  Silver       12  \n",
"1      Tiptronic           4x4  04-May        Left wheel   Black        8  \n",
"2       Variator         Front  04-May  Right-hand drive   Black        2  \n",
"3      Automatic           4x4  04-May        Left wheel   White        0  \n",
"4      Automatic         Front  04-May        Left wheel  Silver        4  \n",
"Index(['ID', 'Price', 'Levy', 'Manufacturer', 'Model', 'Prod. year',\n",
"       'Category', 'Leather interior', 'Fuel type', 'Engine volume', 'Mileage',\n",
"       'Cylinders', 'Gear box type', 'Drive wheels', 'Doors', 'Wheel', 'Color',\n",
"       'Airbags'],\n",
"      dtype='object')\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import sklearn\n",
"from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.linear_model import LinearRegression, LogisticRegression\n",
"from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier\n",
"from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier\n",
"from sklearn.metrics import mean_squared_error, f1_score, accuracy_score, roc_auc_score, confusion_matrix, classification_report\n",
"df = pd.read_csv(\"./static/csv/car_price_prediction.csv\")\n",
"print(df.head())\n",
"print(df.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ID                  0\n",
"Price               0\n",
"Levy                0\n",
"Manufacturer        0\n",
"Model               0\n",
"Prod. year          0\n",
"Category            0\n",
"Leather interior    0\n",
"Fuel type           0\n",
"Engine volume       0\n",
"Mileage             0\n",
"Cylinders           0\n",
"Gear box type       0\n",
"Drive wheels        0\n",
"Doors               0\n",
"Wheel               0\n",
"Color               0\n",
"Airbags             0\n",
"dtype: int64\n",
"object\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\Egor\\AppData\\Local\\Temp\\ipykernel_18436\\3209090058.py:21: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n",
"The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n",
"\n",
"For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n",
"\n",
"\n",
" df['Levy'].fillna(df['Levy'].median(), inplace=True)\n",
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\numpy\\lib\\_nanfunctions_impl.py:1241: RuntimeWarning: Mean of empty slice\n",
" return np.nanmean(a, axis, out=out, keepdims=keepdims)\n",
"C:\\Users\\Egor\\AppData\\Local\\Temp\\ipykernel_18436\\3209090058.py:22: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n",
"The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n",
"\n",
"For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n",
"\n",
"\n",
" df['Mileage'].fillna(df['Mileage'].median(), inplace=True)\n",
"C:\\Users\\Egor\\AppData\\Local\\Temp\\ipykernel_18436\\3209090058.py:23: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n",
"The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n",
"\n",
"For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n",
"\n",
"\n",
" df['Engine volume'].fillna(df['Engine volume'].median(), inplace=True)\n"
]
}
],
"source": [
"# Check for missing values\n",
"print(df.isnull().sum())\n",
"\n",
"# 'Levy' uses '-' as a placeholder; coerce non-numeric entries to NaN\n",
"df['Levy'] = pd.to_numeric(df['Levy'], errors='coerce')\n",
"\n",
"# 'Mileage' values look like '186005 km': strip the unit before converting,\n",
"# otherwise every value is coerced to NaN and the column ends up empty\n",
"df['Mileage'] = pd.to_numeric(df['Mileage'].astype(str).str.replace(' km', '', regex=False), errors='coerce')\n",
"\n",
"# Check the dtype of the 'Engine volume' column\n",
"print(df['Engine volume'].dtype)\n",
"\n",
"# If 'Engine volume' is not a string column, convert it to strings\n",
"if df['Engine volume'].dtype != 'object':\n",
"    df['Engine volume'] = df['Engine volume'].astype(str)\n",
"\n",
"# Strip non-numeric characters from 'Engine volume'\n",
"df['Engine volume'] = df['Engine volume'].str.replace(r'[^0-9.]', '', regex=True).astype(float)\n",
"\n",
"# Fill missing values with the column median\n",
"# (assign the result back instead of calling fillna(inplace=True) on a column selection)\n",
"df['Levy'] = df['Levy'].fillna(df['Levy'].median())\n",
"df['Mileage'] = df['Mileage'].fillna(df['Mileage'].median())\n",
"df['Engine volume'] = df['Engine volume'].fillna(df['Engine volume'].median())"
]
},
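{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cleaning step above can be sketched on a toy frame. This is a minimal illustration, not the notebook's actual data: the `raw` frame and its three rows are hypothetical, chosen to mimic values such as `186005 km` and `-` seen in `df.head()`. It shows why a unit suffix should be stripped before `pd.to_numeric`, and that a `-` placeholder can then be filled with the median.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample that mimics the raw string columns\n",
"raw = pd.DataFrame({\n",
"    'Mileage': ['186005 km', '192000 km', '91901 km'],\n",
"    'Levy': ['1399', '-', '862'],\n",
"})\n",
"\n",
"# Coercing the raw strings directly turns every 'NNN km' value into NaN\n",
"naive = pd.to_numeric(raw['Mileage'], errors='coerce')\n",
"\n",
"# Stripping the unit first keeps the numbers\n",
"mileage = pd.to_numeric(raw['Mileage'].str.replace(' km', '', regex=False), errors='coerce')\n",
"\n",
"# '-' is a genuine placeholder, so coercion followed by a median fill is reasonable\n",
"levy = pd.to_numeric(raw['Levy'], errors='coerce')\n",
"levy = levy.fillna(levy.median())\n",
"\n",
"print(naive.isna().all())\n",
"print(mileage.tolist())\n",
"print(levy.tolist())\n",
"```\n",
"\n",
"Checking `isna().all()` before imputing is a cheap guard against a column that has been coerced to NaN in full."
]
},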
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Define the numeric and categorical features\n",
"numeric_features = ['Levy', 'Prod. year', 'Engine volume', 'Mileage', 'Cylinders', 'Airbags']\n",
"categorical_features = ['Manufacturer', 'Model', 'Category', 'Leather interior', 'Fuel type', 'Gear box type', 'Drive wheels', 'Doors', 'Wheel', 'Color']\n",
"\n",
"# One-hot encode the categorical features\n",
"df = pd.get_dummies(df, columns=categorical_features, drop_first=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Splitting the data into training and test sets"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Regression task\n",
"X_reg = df.drop(['ID', 'Price'], axis=1)\n",
"y_reg = df['Price']\n",
"\n",
"# Classification task: derive three price classes from Price\n",
"df['Category'] = pd.cut(df['Price'], bins=[0, 10000, 20000, np.inf], labels=['Economy', 'Mid-range', 'Premium'])\n",
"X_class = df.drop(['ID', 'Price', 'Category'], axis=1)\n",
"y_class = df['Category']\n",
"\n",
"X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)\n",
"X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(X_class, y_class, test_size=0.2, random_state=42)"
]
},
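{
"cell_type": "markdown",
"metadata": {},
"source": [
"How `pd.cut` assigns the price classes can be checked on a few boundary values. The sketch below is only an illustration: the prices are hypothetical and the English label names are illustrative. It assumes the default `right=True`, under which each bin includes its right edge and excludes its left edge.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical prices placed on and around the bin edges\n",
"prices = pd.Series([0, 5000, 10000, 10001, 25000])\n",
"\n",
"# Bins are (0, 10000], (10000, 20000], (20000, inf]\n",
"classes = pd.cut(prices, bins=[0, 10000, 20000, np.inf],\n",
"                 labels=['Economy', 'Mid-range', 'Premium'])\n",
"\n",
"# A price of exactly 0 falls outside the first bin and becomes NaN\n",
"print(classes.tolist())\n",
"```\n",
"\n",
"Whether the real dataset contains zero prices is not established here; if it does, passing `include_lowest=True` would keep them in the first class instead of producing missing labels."
]
},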
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. Building the pipeline and training the models\n",
"#### 5.1. Regression task"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"LinearRegression RMSE: 16981.208711977062\n",
"DecisionTreeRegressor RMSE: 141914.29349587928\n",
"RandomForestRegressor RMSE: 173537.46233609488\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
" warnings.warn(\n"
]
}
],
"source": [
"# Pipeline for the regression task\n",
"from sklearn.impute import SimpleImputer\n",
"\n",
"\n",
"numeric_transformer = Pipeline(steps=[\n",
"    ('imputer', SimpleImputer(strategy='median')),\n",
"    ('scaler', StandardScaler())\n",
"])\n",
"\n",
"# Only the numeric features are passed on; ColumnTransformer drops all other columns by default\n",
"preprocessor_reg = ColumnTransformer(\n",
"    transformers=[\n",
"        ('num', numeric_transformer, numeric_features)\n",
"    ])\n",
"\n",
"pipeline_reg = Pipeline(steps=[\n",
"    ('preprocessor', preprocessor_reg),\n",
"    ('regressor', LinearRegression())\n",
"])\n",
"\n",
"# Train the models\n",
"models_reg = {\n",
"    'LinearRegression': LinearRegression(),\n",
"    'DecisionTreeRegressor': DecisionTreeRegressor(),\n",
"    'RandomForestRegressor': RandomForestRegressor()\n",
"}\n",
"\n",
"for name, model in models_reg.items():\n",
"    pipeline_reg.set_params(regressor=model)\n",
"    pipeline_reg.fit(X_train_reg, y_train_reg)\n",
"    y_pred_reg = pipeline_reg.predict(X_test_reg)\n",
"    rmse = np.sqrt(mean_squared_error(y_test_reg, y_pred_reg))\n",
"    print(f'{name} RMSE: {rmse}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 5.2. Classification task"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"LogisticRegression F1-score: 0.48010296192139407\n",
"DecisionTreeClassifier F1-score: 0.6836168013771631\n",
"RandomForestClassifier F1-score: 0.6943295967769952\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
" warnings.warn(\n"
]
}
],
"source": [
"# Pipeline for the classification task\n",
"preprocessor_class = ColumnTransformer(\n",
"    transformers=[\n",
"        ('num', numeric_transformer, numeric_features)\n",
"    ])\n",
"\n",
"pipeline_class = Pipeline(steps=[\n",
"    ('preprocessor', preprocessor_class),\n",
"    ('classifier', LogisticRegression())\n",
"])\n",
"\n",
"# Train the models\n",
"models_class = {\n",
"    'LogisticRegression': LogisticRegression(),\n",
"    'DecisionTreeClassifier': DecisionTreeClassifier(),\n",
"    'RandomForestClassifier': RandomForestClassifier()\n",
"}\n",
"\n",
"for name, model in models_class.items():\n",
"    pipeline_class.set_params(classifier=model)\n",
"    pipeline_class.fit(X_train_class, y_train_class)\n",
"    y_pred_class = pipeline_class.predict(X_test_class)\n",
"    f1 = f1_score(y_test_class, y_pred_class, average='weighted')\n",
"    print(f'{name} F1-score: {f1}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. Model quality assessment"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"LinearRegression RMSE: 16981.208711977062, MAE: 11731.578355206166\n",
"DecisionTreeRegressor RMSE: 141914.29349587928, MAE: 9887.588955657844\n",
"RandomForestRegressor RMSE: 173537.46233609488, MAE: 12656.846663315797\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
" warnings.warn(\n"
]
}
],
"source": [
"# Evaluate the regression models\n",
"# from sklearn.metrics import r2_score  # r2 = r2_score(y_test_reg, y_pred_reg)\n",
"from sklearn.metrics import mean_absolute_error\n",
"\n",
"# Each estimator in models_reg was already fitted in section 5.1,\n",
"# so the pipeline is reused here without refitting\n",
"for name, model in models_reg.items():\n",
"    pipeline_reg.set_params(regressor=model)\n",
"    y_pred_reg = pipeline_reg.predict(X_test_reg)\n",
"    rmse = np.sqrt(mean_squared_error(y_test_reg, y_pred_reg))\n",
"    mae = mean_absolute_error(y_test_reg, y_pred_reg)\n",
"\n",
"    print(f'{name} RMSE: {rmse}, MAE: {mae}')"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"LogisticRegression F1-score: 0.48010296192139407, Accuracy: 0.502079002079002, ROC-AUC: 0.6953729054676709\n",
"DecisionTreeClassifier F1-score: 0.6836168013771631, Accuracy: 0.6876299376299376, ROC-AUC: 0.8222065250497814\n",
"RandomForestClassifier F1-score: 0.6943295967769952, Accuracy: 0.6993243243243243, ROC-AUC: 0.856645400908623\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
" warnings.warn(\n"
]
}
],
"source": [
"# Evaluate the classification models (estimators were already fitted in section 5.2)\n",
"for name, model in models_class.items():\n",
"    pipeline_class.set_params(classifier=model)\n",
"    y_pred_class = pipeline_class.predict(X_test_class)\n",
"    f1 = f1_score(y_test_class, y_pred_class, average='weighted')\n",
"    accuracy = accuracy_score(y_test_class, y_pred_class)\n",
"    roc_auc = roc_auc_score(y_test_class, pipeline_class.predict_proba(X_test_class), multi_class='ovr')\n",
"    print(f'{name} F1-score: {f1}, Accuracy: {accuracy}, ROC-AUC: {roc_auc}')"
]
},
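{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whether an accuracy near 0.5 is good or bad depends on the class balance, so a trivial baseline is a useful reference point. The sketch below is a hedged illustration on hypothetical labels: the 50/30/20 class shares are an assumption, not the notebook's real distribution. It uses scikit-learn's `DummyClassifier` with `strategy='most_frequent'`, which always predicts the largest class.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.dummy import DummyClassifier\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"# Hypothetical labels: three classes with shares of 50%, 30% and 20%\n",
"y = np.array(['Economy'] * 50 + ['Mid-range'] * 30 + ['Premium'] * 20)\n",
"X = np.zeros((len(y), 1))  # the dummy model ignores the features\n",
"\n",
"# Baseline that always predicts the most frequent class\n",
"baseline = DummyClassifier(strategy='most_frequent').fit(X, y)\n",
"acc = accuracy_score(y, baseline.predict(X))\n",
"\n",
"print(acc)\n",
"```\n",
"\n",
"With these assumed shares the baseline already reaches an accuracy of 0.5, so a real model should be compared against such a baseline rather than against a fixed threshold."
]
},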
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conclusions for section 6: model quality assessment\n",
"#### Regression task\n",
"Linear regression (LinearRegression):\n",
"\n",
"RMSE: 16981.208711977062\n",
"\n",
"MAE: 11731.578355206166\n",
"\n",
"Conclusion: Linear regression gave relatively poor predictions. An MAE of about 11,700 is of the same order as the price-class boundaries used here (10,000 and 20,000), so the model predicts car prices poorly.\n",
"\n",
"Decision tree (DecisionTreeRegressor):\n",
"\n",
"RMSE: 141914.29349587928\n",
"\n",
"MAE: 9887.588955657844\n",
"\n",
"Conclusion: The decision tree's RMSE is roughly eight times higher than that of linear regression, while its MAE is lower. Because RMSE squares the errors, this combination points to a small number of very large errors (most likely on cars with extreme prices) rather than uniformly worse predictions; on a typical car the tree is off by less than linear regression. A test-set RMSE alone does not demonstrate overfitting: that would require comparing training and test errors.\n",
"\n",
"Random forest (RandomForestRegressor):\n",
"\n",
"RMSE: 173537.46233609488\n",
"\n",
"MAE: 12656.846663315797\n",
"\n",
"Conclusion: The random forest has the highest RMSE, about ten times that of linear regression, and its MAE is also higher, so on this split it predicts prices worse than linear regression on both metrics. As with the decision tree, the gap between RMSE and MAE suggests that a few very large errors dominate the RMSE.\n",
"\n",
"#### Classification task\n",
"Logistic regression (LogisticRegression):\n",
"\n",
"F1-score: 0.48010296192139407\n",
"\n",
"Accuracy: 0.502079002079002\n",
"\n",
"ROC-AUC: 0.6953729054676709\n",
"\n",
"Conclusion: Logistic regression classified poorly. Its F1-score and accuracy are close to 0.5; with three classes, uniform random guessing would give an accuracy of about 0.33, so the model is better than chance, but only modestly. Its ROC-AUC of about 0.70 is the lowest of the three models, so it separates the classes worst.\n",
"\n",
"Decision tree (DecisionTreeClassifier):\n",
"\n",
"F1-score: 0.6836168013771631\n",
"\n",
"Accuracy: 0.6876299376299376\n",
"\n",
"ROC-AUC: 0.8222065250497814\n",
"\n",
"Conclusion: The decision tree classified markedly better than logistic regression. F1-score and accuracy are close to 0.69, and ROC-AUC rises to about 0.82, so the model separates the classes considerably better.\n",
"\n",
"Random forest (RandomForestClassifier):\n",
"\n",
"F1-score: 0.6943295967769952\n",
"\n",
"Accuracy: 0.6993243243243243\n",
"\n",
"ROC-AUC: 0.856645400908623\n",
"\n",
"Conclusion: The random forest gave the best classification quality of the three models. Its F1-score, accuracy, and ROC-AUC are all higher than the decision tree's, although the margin in F1-score and accuracy is only about 0.01; the clearer gain is in ROC-AUC (about 0.86 against 0.82).\n",
"\n",
"#### Overall conclusions\n",
"Regression task: Linear regression has by far the lowest RMSE, while the decision tree has the lowest MAE, so the ranking depends on the metric. All three models have large errors. The very high RMSE of the tree-based models, combined with a moderate MAE, is more consistent with a few extreme errors than with uniformly bad predictions; whether the models overfit cannot be judged from test metrics alone.\n",
"\n",
"Classification task: The random forest gave the best classification quality, slightly ahead of the decision tree and well ahead of logistic regression.\n",
"\n",
"Note: the imputer warnings in the outputs show that Mileage had no observed values in this run, so all models were effectively trained without that feature."
]
},
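{
"cell_type": "markdown",
"metadata": {},
"source": [
"RMSE and MAE can diverge sharply when a few errors are very large, which is one way to read the regression results above. The sketch below uses made-up numbers (not the notebook's data): 99 perfect predictions and one prediction that misses by 990,000.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical targets: 99 ordinary prices and one extreme price\n",
"y_true = np.array([10000.0] * 99 + [1000000.0])\n",
"# A model that predicts 10000 for every car\n",
"y_pred = np.full(100, 10000.0)\n",
"\n",
"errors = y_true - y_pred\n",
"mae = np.mean(np.abs(errors))         # average absolute error\n",
"rmse = np.sqrt(np.mean(errors ** 2))  # squaring lets the single large error dominate\n",
"\n",
"print(mae, rmse)\n",
"```\n",
"\n",
"Here a single outlier makes RMSE ten times larger than MAE, so a large RMSE paired with a moderate MAE is a reason to inspect the largest residuals before concluding that a model overfits."
]
},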
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"LinearRegression Cross-Validation RMSE: 100651.03159099314, Std: 161863.4449796077\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"DecisionTreeRegressor Cross-Validation RMSE: 194034.64594171714, Std: 136171.92328322295\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"RandomForestRegressor Cross-Validation RMSE: 181627.2578040142, Std: 137879.8905706371\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"LogisticRegression Cross-Validation F1-score: 0.4742308354293046, Std: 0.007525407236566359\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"DecisionTreeClassifier Cross-Validation F1-score: 0.6862381973987357, Std: 0.004587968007336983\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"RandomForestClassifier Cross-Validation F1-score: 0.692567227648008, Std: 0.004169193228958696\n"
]
}
],
"source": [
"# Оценка смещения и дисперсии для задачи регрессии\n",
"for name, model in models_reg.items():\n",
" pipeline_reg.set_params(regressor=model)\n",
" scores = cross_val_score(pipeline_reg, X_reg, y_reg, cv=5, scoring='neg_mean_squared_error')\n",
" rmse_scores = np.sqrt(-scores)\n",
" print(f'{name} Cross-Validation RMSE: {rmse_scores.mean()}, Std: {rmse_scores.std()}')\n",
"\n",
"# Оценка смещения и дисперсии для задачи классификации\n",
"for name, model in models_class.items():\n",
" pipeline_class.set_params(classifier=model)\n",
" scores = cross_val_score(pipeline_class, X_class, y_class, cv=5, scoring='f1_weighted')\n",
" print(f'{name} Cross-Validation F1-score: {scores.mean()}, Std: {scores.std()}')"
]
},
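{
"cell_type": "markdown",
"metadata": {},
"source": [
"The warnings in the output above report that `Mileage` has no observed values, so the median imputer skips that feature in every fold. One possible cause, stated here as an assumption rather than something the notebook confirms, is that the raw values carry a unit suffix (the data preview shows strings such as `186005 km`) and were never converted to numbers. A minimal sketch of such a conversion in plain Python follows; `parse_mileage` is a hypothetical helper, not part of the notebook.\n",
"\n",
"```python\n",
"# Hypothetical helper: turn a string such as '186005 km' into an integer.\n",
"# Assumes every value is a whole number followed by an optional 'km' suffix.\n",
"def parse_mileage(raw):\n",
"    return int(raw.replace('km', '').strip())\n",
"\n",
"# Sample values copied from the data preview at the top of the notebook.\n",
"samples = ['186005 km', '192000 km', '91901 km']\n",
"print([parse_mileage(value) for value in samples])\n",
"```\n",
"\n",
"With pandas the same idea could be applied column-wise before the pipeline is built, for example `df['Mileage'].str.replace(' km', '', regex=False).astype(int)`, where `df` stands for the loaded DataFrame and every value is assumed to follow this pattern."
]
},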
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Оценка смещения и дисперсии моделей\n",
"Задача регрессии\n",
"Дерево решений (DecisionTreeRegressor):\n",
"\n",
"Cross-Validation RMSE: 194034.64594171714\n",
"\n",
"Std: 136171.92328322295\n",
"\n",
"Вывод: Дерево решений показало очень высокое значение RMSE при кросс-валидации, что указывает на сильное переобучение. Стандартное отклонение также высокое, что говорит о нестабильности модели. Это означает, что модель плохо обобщает данные и имеет высокую дисперсию.\n",
"\n",
"Случайный лес (RandomForestRegressor):\n",
"\n",
"Cross-Validation RMSE: 181627.2578040142\n",
"\n",
"Std: 137879.8905706371\n",
"\n",
"Вывод: Случайный лес также показал высокое значение RMSE при кросс-валидации, хотя и немного ниже, чем у дерева решений. Стандартное отклонение также высокое, что указывает на нестабильность модели. Это говорит о том, что модель также переобучена и имеет высокую дисперсию.\n",
"\n",
"Задача классификации\n",
"Дерево решений (DecisionTreeClassifier):\n",
"\n",
"Cross-Validation F1-score: 0.6862381973987357\n",
"\n",
"Std: 0.004587968007336983\n",
"\n",
"Вывод: Дерево решений показало хороший F1-score при кросс-валидации, но стандартное отклонение относительно высокое. Это указывает на некоторую нестабильность модели, хотя и не такую высокую, как в случае регрессии. Модель имеет умеренную дисперсию.\n",
"\n",
"Случайный лес (RandomForestClassifier):\n",
"\n",
"Cross-Validation F1-score: 0.692567227648008\n",
"\n",
"Std: 0.004169193228958696\n",
"\n",
"#### Вывод: Случайный лес показал лучший F1-score при кросс-валидации по сравнению с деревом решений. Стандартное отклонение также ниже, что указывает на более стабильную модель. Это говорит о том, что случайный лес лучше обобщает данные и имеет меньшую дисперсию по сравнению с деревом решений.\n",
"\n",
"Общие выводы:\n",
"Задача регрессии: И дерево решений, и случайный лес показали высокие значения RMSE и высокое стандартное отклонение при кросс-валидации. Это указывает на сильное переобучение и высокую дисперсию. Модели плохо обобщают данные и нестабильны.\n",
"\n",
"Задача классификации: Дерево решений показало хороший F1-score, но с высоким стандартным отклонением, что указывает на некоторую нестабильность. Случайный лес показал лучший F1-score и более низкое стандартное отклонение, что говорит о более стабильной и обобщающей способности модели."
]
},
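{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare fold-to-fold stability across metrics with very different scales (RMSE in price units, F1 between 0 and 1), one option is to divide each standard deviation by its mean. The sketch below does this for the values printed by the cross-validation cell; `cv_results` and `relative_spread` are illustrative names introduced here, not objects from the notebook.\n",
"\n",
"```python\n",
"# Means and standard deviations copied from the cross-validation output.\n",
"cv_results = {\n",
"    'DecisionTreeRegressor RMSE': (194034.64594171714, 136171.92328322295),\n",
"    'RandomForestRegressor RMSE': (181627.2578040142, 137879.8905706371),\n",
"    'LogisticRegression F1': (0.4742308354293046, 0.007525407236566359),\n",
"    'DecisionTreeClassifier F1': (0.6862381973987357, 0.004587968007336983),\n",
"    'RandomForestClassifier F1': (0.692567227648008, 0.004169193228958696),\n",
"}\n",
"\n",
"# Illustrative helper: fold-to-fold std expressed as a fraction of the mean.\n",
"def relative_spread(mean, std):\n",
"    return std / mean\n",
"\n",
"for name, (mean, std) in cv_results.items():\n",
"    print(f'{name}: std/mean = {relative_spread(mean, std):.3f}')\n",
"```\n",
"\n",
"The regression models come out at roughly 0.7 to 0.8, while all three classifiers stay below 0.02, which makes the difference in stability between the two tasks easy to see."
]
},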
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAIjCAYAAAAEMVqQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABdsElEQVR4nO3dd3wUdf7H8fem7GbTQxLASAiBRHpTBBWIBU5A9Cx4AupJsR+IniennkezIXee5x0WrKg/EVEBC5YTQUUQBaSJCAIGQQEhQLLpm+x+f3+ErCzpQ2ATeD0fjzwkM9+d+czMbvy+d2a+YzPGGAEAAAAA6iwo0AUAAAAAQGNFoAIAAAAAiwhUAAAAAGARgQoAAAAALCJQAQAAAIBFBCoAAAAAsIhABQAAAAAWEagAAAAAwCICFQAAAABYRKACgBOczWbT5MmTA11GwJ133nk677zzfL9v375dNptNL730UsBqOtKRNR4rDXHbAaCxIlABQB089dRTstls6tWrl+Vl7Nq1S5MnT9batWvrr7AG7rPPPpPNZvP9hIaGqnXr1rruuuv0448/Brq8Ovnyyy81efJkZWdnB6yGVq1a+e3Ppk2bqm/fvpo/f37AagKAk1VIoAsAgMZk1qxZatWqlVasWKGtW7cqLS2tzsvYtWuXpkyZolatWqlbt271X2QDNm7cOJ155pkqKSnR6tWr9eyzz+r999/Xt99+q6SkpONaS0pKigoLCxUaGlqn13355ZeaMmWKRo4cqdjY2GNTXC1069ZNf/nLXySVvaeeeeYZXXHFFXr66ad1yy23VPtaq9sOAKiIM1QAUEuZmZn68ssv9dhjjykxMVGzZs0KdEmNTt++fXXttddq1KhRmj59uh599FEdOHBAL7/8cpWvyc/PPya12Gw2hYWFKTg4+Jgs/1g79dRTde211+raa6/VX//6Vy1btkwRERH697//XeVrSktL5Xa7G/22A0BDQqACgFqaNWuW4uLiNHjwYF155ZVVBqrs7Gz9+c9/VqtWreRwONSiRQtdd911ysrK0meffaYzzzxTkjRq1CjfJVvl97K0atVKI0eOrLDMI++tcbvdmjhxos444wzFxMQoIiJCffv21aefflrn7fr1118VEhKiKVOmVJi3efNm2Ww2PfHEE5KkkpISTZkyRenp6QoLC1N8fLz69OmjhQsX1nm9knTBBRdIKgurkjR58mTZbDZt3LhRV199teLi4tSnTx9f+1dffVVnnHGGnE6nmjRpomHDhmnnzp0Vlvvss8+qTZs2cjqd6tmzp7744osKbaq6j2jTpk266qqrlJiYKKfTqbZt2+q+++7z1Td+/HhJUmpqqu/4bd++/ZjUWBfNmzdX+/btffuyfPseffRRPf7442rTpo0cDoc2btxoadvL/fLLLxo9erSaNWsmh8Ohjh076sUXX6xQz/Tp09WxY0eFh4crLi5OPXr00GuvvXZU2wgADRGX/AFALc2aNUtXXHGF7Ha7hg8frqefflorV670BSRJysvLU9++ffX9999r9OjROv3005WVlaV3331XP//8s9q3b6/7779fEydO1E033aS+fftKks4555w61eJyufT8889r+PDhuvHGG5Wbm6sXXnhBAwYM0IoVK+p0KWGzZs107rnn6o033tCkSZP85s2ZM0fBwcH6wx/+IKksUEydOlU33HCDevbsKZfLpVWrVmn16tX63e9+V6dtkKRt27ZJkuLj4/2m/+EPf1B6eroefvhhGWMkSQ899JAmTJigq666SjfccIP27dun6dOnKyMjQ2vWrPFdfvfCCy/o5ptv1jnnnKM77rhDP/74o37/+9+rSZMmSk5Orrae9evXq2/fvgoNDdVNN92kVq1aadu2bXrvvff00EMP6YorrtAPP/yg2bNn69///rcSEhIkSYmJicetxqqUlJRo586dFfblzJkzVVRUpJtuukkOh0NNmjSR1+ut87ZLZeH7rLPOks
1m09ixY5WYmKgPP/xQ119/vVwul+644w5J0nPPPadx48bpyiuv1O23366ioiKtX79eX3/9ta6++mpL2wcADZYBANRo1apVRpJZuHChMcYYr9drWrRoYW6//Xa/dhMnTjSSzLx58yosw+v1GmOMWblypZFkZs6cWaFNSkqKGTFiRIXp5557rjn33HN9v5eWlpri4mK/NgcPHjTNmjUzo0eP9psuyUyaNKna7XvmmWeMJPPtt9/6Te/QoYO54IILfL937drVDB48uNplVebTTz81ksyLL75o9u3bZ3bt2mXef/9906pVK2Oz2czKlSuNMcZMmjTJSDLDhw/3e/327dtNcHCweeihh/ymf/vttyYkJMQ33e12m6ZNm5pu3br57Z9nn33WSPLbh5mZmRWOQ0ZGhomKijI//fST33rKj50xxvzzn/80kkxmZuYxr7EqKSkp5sILLzT79u0z+/btM+vWrTPDhg0zksxtt93mt33R0dFm7969fq+3uu3XX3+9OeWUU0xWVpZfm2HDhpmYmBhTUFBgjDHm0ksvNR07dqxxOwDgRMAlfwBQC7NmzVKzZs10/vnnSyq7/2bo0KF6/fXX5fF4fO3mzp2rrl276vLLL6+wDJvNVm/1BAcHy263S5K8Xq8OHDig0tJS9ejRQ6tXr67z8q644gqFhIRozpw5vmkbNmzQxo0bNXToUN+02NhYfffdd9qyZYulukePHq3ExEQlJSVp8ODBys/P18svv6wePXr4tTtyUIV58+bJ6/XqqquuUlZWlu+nefPmSk9P913quGrVKu3du1e33HKLb/9I0siRIxUTE1Ntbfv27dOSJUs0evRotWzZ0m9ebY7d8ajxcB9//LESExOVmJiorl276s0339Qf//hHTZs2za/dkCFDfGfQqlKbbTfGaO7cubrkkktkjPHbxgEDBignJ8f33ouNjdXPP/+slStX1np7AKCxOmEC1ZIlS3TJJZcoKSlJNptNb7/9dp2XYYzRo48+qtNOO00Oh0Onnnqq7zIHACcvj8ej119/Xeeff74yMzO1detWbd26Vb169dKvv/6qRYsW+dpu27ZNnTp1Oi51vfzyy+rSpYvvXqbExES9//77ysnJqfOyEhIS1K9fP73xxhu+aXPmzFFISIiuuOIK37T7779f2dnZOu2009S5c2eNHz9e69evr/V6Jk6cqIULF2rx4sVav369du3apT/+8Y8V2qWmpvr9vmXLFhljlJ6e7gsR5T/ff/+99u7dK0n66aefJEnp6el+ry8fpr065cO3Wz1+x6PGw/Xq1UsLFy7UJ598oi+//FJZWVl65ZVX5HQ6/doduS8rU5tt37dvn7Kzs/Xss89W2L5Ro0ZJkm8b7777bkVGRqpnz55KT0/XmDFjtGzZslpvGwA0JifMPVT5+fnq2rWrRo8e7fc//7q4/fbb9fHHH+vRRx9V586ddeDAAR04cKCeKwXQ2CxevFi7d+/W66+/rtdff73C/FmzZunCCy+sl3VVdSbE4/H4jcj26quvauTIkbrssss0fvx4NW3aVMHBwZo6darvvqS6GjZsmEaNGqW1a9eqW7dueuONN9SvXz/ffUKSlJGRoW3btumdd97Rxx9/rOeff17//ve/NWPGDN1www01rqNz587q379/je2ODAVer1c2m00ffvhhpSPTRUZG1mILj63jXWNCQoKlfWlV+X1X1157rUaMGFFpmy5dukiS2rdvr82bN2vBggX66KOPNHfuXD311FOaOHFipYOfAEBjdsIEqkGDBmnQoEFVzi8uLtZ9992n2bNnKzs7W506ddK0adN8o2Z9//33evrpp7Vhwwa1bdtWUu2+1QNw4ps1a5aaNm2qJ598ssK8efPmaf78+ZoxY4acTqfatGmjDRs2VLu86i4fi4uLq/SBsT/99JPf2Yu33npLrVu31rx58/yWd+SgEnVx2WWX6eabb/Zd9vfDDz/o3nvvrdCuSZMmGjVqlEaNGqW8vDxlZG
Ro8uTJtQpUVrVp00bGGKWmpuq0006rsl1KSoqksrNF5SMISmUDNmRmZqpr165VvrZ8/1o9fsejxmOlNtuemJioqKg
"text/plain": [
"<Figure size 1000x600 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiQAAAHHCAYAAACPy0PBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABQBElEQVR4nO3deVhUZf8G8HuGZUB2RBhwQdxBzbUQd5NURMUtM30LzbQM3HClN/cUwx030rfUDM2yNLM0SVJSERXFLcVdNAVUBGQbljm/P/w5OYEd0DkcHO/Pe811xXOeOed75p3o5vucM6MQBEEAERERkYyUchdARERExEBCREREsmMgISIiItkxkBAREZHsGEiIiIhIdgwkREREJDsGEiIiIpIdAwkRERHJjoGEiIiIZMdAQiShS5cuoVu3brCzs4NCocCOHTsMuv/r169DoVBgw4YNBt3vi6xz587o3Lmz3GUQUTkxkJDRu3LlCj744APUqVMHFhYWsLW1Rbt27bB8+XLk5eVJeuzAwECcOXMG8+bNw6ZNm9C6dWtJj1eRhg0bBoVCAVtb21Jfx0uXLkGhUEChUGDRokXl3v/t27cxa9YsJCYmGqBaIqrsTOUugEhKP//8M958802oVCq8++67aNKkCQoKCnDw4EFMnjwZ586dw9q1ayU5dl5eHuLi4vDf//4XwcHBkhzD3d0deXl5MDMzk2T/YkxNTZGbm4uffvoJgwYN0tsWFRUFCwsL5OfnP9O+b9++jdmzZ6N27dpo3rx5mZ+3d+/eZzoeEcmLgYSM1rVr1zB48GC4u7sjJiYGrq6uum1BQUG4fPkyfv75Z8mOf/fuXQCAvb29ZMdQKBSwsLCQbP9iVCoV2rVrhy1btpQIJJs3b4a/vz++//77CqklNzcXVapUgbm5eYUcj4gMi0s2ZLTCw8ORnZ2NL774Qi+MPFavXj2MGzdO93NRURHmzp2LunXrQqVSoXbt2vj444+h0Wj0nle7dm306tULBw8exGuvvQYLCwvUqVMHX331lW7OrFmz4O7uDgCYPHkyFAoFateuDeDRUsfjf37SrFmzoFAo9Maio6PRvn172Nvbw9raGg0bNsTHH3+s2/60a0hiYmLQoUMHWFlZwd7eHgEBATh//nypx7t8+TKGDRsGe3t72NnZYfjw4cjNzX36C/sPQ4YMwe7du5GRkaEbO3bsGC5duoQhQ4aUmJ+eno5JkyahadOmsLa2hq2tLfz8/HDq1CndnP379+PVV18FAAwfPly39PP4PDt37owmTZogISEBHTt2RJUqVXSvyz+vIQkMDISFhUWJ8+/evTscHBxw+/btMp8rEUmHgYSM1k8//YQ6deqgbdu2ZZr//vvvY8aMGWjZsiWWLl2KTp06ISwsDIMHDy4x9/Llyxg4cCDeeOMNLF68GA4ODhg2bBjOnTsHAOjfvz+WLl0KAHj77bexadMmLFu2rFz1nzt3Dr169YJGo8GcOXOwePFi9OnTB4cOHfrX5/3222/o3r070tLSMGvWLISEhODw4cNo164drl+/XmL+oEGD8PDhQ4SFhWHQoEHYsGEDZs+eXeY6+/fvD4VCgR9++EE3tnnzZjRq1AgtW7YsMf/q1avYsWMHevXqhSVLlmDy5Mk4c+YMOnXqpAsHnp6emDNnDgBg1KhR2LRpEzZt2oSOHTvq9nP//n34+fmhefPmWLZsGbp06VJqfcuXL0e1atUQGBiI4uJiAMDnn3+OvXv3YsWKFXBzcyvzuRKRhAQiI5SZmSkAEAICAso0PzExUQAgvP/++3rjkyZNEgAIMTExujF3d3cBgBAbG6sbS0tLE1QqlTBx4kTd2LVr1wQAwsKFC/X2GRgYKLi7u5eoYebMmcKT/0ouXbpUACDcvXv3qXU/Psb69et1Y82bNxecnZ2F+/fv68ZOnTolKJVK4d133y1xvPfee09vn/369ROqVq361GM+eR5WVlaCIAjCwIEDha5duw
qCIAjFxcWCWq0WZs+eXeprkJ+fLxQXF5c4D5VKJcyZM0c3duzYsRLn9linTp0EAEJkZGSp2zp16qQ39uuvvwoAhE8//VS4evWqYG1tLfTt21f0HImo4rBDQkYpKysLAGBjY1Om+b/88gsAICQkRG984sSJAFDiWhMvLy906NBB93O1atXQsGFDXL169Zlr/qfH1578+OOP0Gq1ZXrOnTt3kJiYiGHDhsHR0VE3/sorr+CNN97QneeTPvzwQ72fO3TogPv37+tew7IYMmQI9u/fj5SUFMTExCAlJaXU5Rrg0XUnSuWjXz3FxcW4f/++bjnqxIkTZT6mSqXC8OHDyzS3W7du+OCDDzBnzhz0798fFhYW+Pzzz8t8LCKSHgMJGSVbW1sAwMOHD8s0/8aNG1AqlahXr57euFqthr29PW7cuKE3XqtWrRL7cHBwwIMHD56x4pLeeusttGvXDu+//z5cXFwwePBgfPvtt/8aTh7X2bBhwxLbPD09ce/ePeTk5OiN//NcHBwcAKBc59KzZ0/Y2Nhg69atiIqKwquvvlritXxMq9Vi6dKlqF+/PlQqFZycnFCtWjWcPn0amZmZZT5m9erVy3UB66JFi+Do6IjExERERETA2dm5zM8lIukxkJBRsrW1hZubG86ePVuu5/3zotKnMTExKXVcEIRnPsbj6xses7S0RGxsLH777Te88847OH36NN566y288cYbJeY+j+c5l8dUKhX69++PjRs3Yvv27U/tjgDA/PnzERISgo4dO+Lrr7/Gr7/+iujoaDRu3LjMnSDg0etTHidPnkRaWhoA4MyZM+V6LhFJj4GEjFavXr1w5coVxMXFic51d3eHVqvFpUuX9MZTU1ORkZGhu2PGEBwcHPTuSHnsn10YAFAqlejatSuWLFmCP//8E/PmzUNMTAx+//33Uvf9uM6kpKQS2y5cuAAnJydYWVk93wk8xZAhQ3Dy5Ek8fPiw1AuBH9u2bRu6dOmCL774AoMHD0a3bt3g6+tb4jUpazgsi5ycHAwfPhxeXl4YNWoUwsPDcezYMYPtn4ieHwMJGa0pU6bAysoK77//PlJTU0tsv3LlCpYvXw7g0ZIDgBJ3wixZsgQA4O/vb7C66tati8zMTJw+fVo3dufOHWzfvl1vXnp6eonnPv6AsH/eivyYq6srmjdvjo0bN+r9B/7s2bPYu3ev7jyl0KVLF8ydOxcrV66EWq1+6jwTE5MS3ZfvvvsOf/31l97Y4+BUWngrr6lTpyI5ORkbN27EkiVLULt2bQQGBj71dSSiiscPRiOjVbduXWzevBlvvfUWPD099T6p9fDhw/juu+8wbNgwAECzZs0QGBiItWvXIiMjA506dcLRo0exceNG9O3b96m3lD6LwYMHY+rUqejXrx/Gjh2L3NxcrFmzBg0aNNC7qHPOnDmIjY2Fv78/3N3dkZaWhtWrV6NGjRpo3779U/e/cOFC+Pn5wcfHByNGjEBeXh5WrFgBOzs7zJo1y2Dn8U9KpRKffPKJ6LxevXphzpw5GD58ONq2bYszZ84gKioKderU0ZtXt25d2NvbIzIyEjY2NrCysoK3tzc8PDzKVVdMTAxWr16NmTNn6m5DXr9+PTp37ozp06cjPDy8XPsjIonIfJcPkeQuXrwojBw5Uqhdu7Zgbm4u2NjYCO3atRNWrFgh5Ofn6+YVFhYKs2fPFjw8PAQzMzOhZs2aQmhoqN4cQXh026+/v3+J4/zzdtOn3fYrCIKwd+9eoUmTJoK5ubnQsGFD4euvvy5x2+++ffuEgIAAwc3NTTA3Nxfc3NyEt99+W7h48WKJY/zz1tjffvtNaNeunWBpaSnY2toKvXv3Fv7880+9OY+P98/bitevXy8AEK5du/bU11QQ9G/7fZqn3fY7ceJEwdXVVbC0tBTatWsnxMXFlXq77o8//ih4eXkJpqameufZqVMnoXHjxqUe88n9ZGVlCe7u7kLLli2FwsJCvXkTJkwQlEqlEBcX96
/nQEQVQyEI5bhyjYiIiEgCvIaEiIiIZMdAQkRERLJjICEiIiLZMZAQERGR7BhIiIiISHYMJERERCQ7BhIiIiKSnVF
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Визуализация результатов для задачи регрессии\n",
"plt.figure(figsize=(10, 6))\n",
"sns.scatterplot(x=y_test_reg, y=y_pred_reg)\n",
"plt.xlabel('Actual Prices')\n",
"plt.ylabel('Predicted Prices')\n",
"plt.title('Actual vs Predicted Prices')\n",
"plt.show()\n",
"\n",
"# Визуализация результатов для задачи классификации\n",
"conf_matrix = confusion_matrix(y_test_class, y_pred_class)\n",
"sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')\n",
"plt.xlabel('Predicted')\n",
"plt.ylabel('Actual')\n",
"plt.title('Confusion Matrix')\n",
"plt.show()"
]
},
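{
"cell_type": "markdown",
"metadata": {},
"source": [
"A confusion matrix such as the one plotted above can also be summarized numerically as per-class recall and precision, which makes it easier to see whether one class is confused more often than the others. The sketch below uses plain Python and a made-up 3x3 matrix; the counts are hypothetical and are not the notebook's actual results. Rows are taken to be actual classes and columns predicted classes, as in the heatmap.\n",
"\n",
"```python\n",
"# Recall per class: diagonal count divided by the row total (actual class).\n",
"def per_class_recall(matrix):\n",
"    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]\n",
"\n",
"# Precision per class: diagonal count divided by the column total (predicted class).\n",
"def per_class_precision(matrix):\n",
"    n = len(matrix)\n",
"    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]\n",
"    return [matrix[c][c] / col_totals[c] if col_totals[c] else 0.0 for c in range(n)]\n",
"\n",
"# Hypothetical counts for three price classes; NOT the notebook's matrix.\n",
"example = [\n",
"    [50, 10, 0],\n",
"    [5, 30, 5],\n",
"    [0, 10, 40],\n",
"]\n",
"print(per_class_recall(example))\n",
"print(per_class_precision(example))\n",
"```\n",
"\n",
"In the notebook the same functions could be called on `conf_matrix.tolist()`; scikit-learn's `classification_report` gives comparable figures directly."
]
},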
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1) Плохое качество предсказаний в первой диаграмме: \n",
"Модель регрессии плохо предсказывает цены автомобилей, так как точки на диаграмме рассеяния распределены хаотично и далеко от диагонали. Значительные ошибки: Ошибки предсказаний значительны, что указывает на то, что модель не может точно предсказать цены автомобилей. Необходимость улучшения модели: Для улучшения качества предсказаний стоит рассмотреть другие модели, такие как градиентный бустинг или нейронные сети, а также улучшить предобработку данных.\n",
"\n",
"2) Выводы по второй диаграмме:\n",
"Хорошее качество классификации: Матрица ошибок показывает высокие значения на диагонали и низкие значения вне диагонали, что указывает на хорошее качество классификации.\n",
"Правильные предсказания: Большинство предсказаний модели являются правильными, что говорит о ее способности хорошо различать классы.\n",
"Низкие ошибки: Низкие значения вне диагонали указывают на то, что модель допускает мало ошибок при классификации."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}