877 lines
114 KiB
Plaintext
877 lines
114 KiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Выбор бизнес-целей\n",
|
|||
|
"### Задача регрессии:\n",
|
|||
|
"\n",
|
|||
|
"Цель: Предсказать цену автомобиля (Price) на основе других характеристик.\n",
|
|||
|
"\n",
|
|||
|
"Применение: Это может быть полезно для автосалонов, онлайн-площадок по продаже автомобилей, а также для частных лиц, которые хотят оценить рыночную стоимость своего автомобиля.\n",
|
|||
|
"\n",
|
|||
|
"Задача классификации:\n",
|
|||
|
"\n",
|
|||
|
"Цель: Классифицировать автомобили по категориям (например, \"Эконом\", \"Средний\", \"Премиум\") на основе цены и других характеристик.\n",
|
|||
|
"\n",
|
|||
|
"Применение: Это может быть полезно для маркетинговых кампаний, определения целевой аудитории, а также для анализа рынка автомобилей."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 1,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" ID Price Levy Manufacturer Model Prod. year Category \\\n",
|
|||
|
"0 45654403 13328 1399 LEXUS RX 450 2010 Jeep \n",
|
|||
|
"1 44731507 16621 1018 CHEVROLET Equinox 2011 Jeep \n",
|
|||
|
"2 45774419 8467 - HONDA FIT 2006 Hatchback \n",
|
|||
|
"3 45769185 3607 862 FORD Escape 2011 Jeep \n",
|
|||
|
"4 45809263 11726 446 HONDA FIT 2014 Hatchback \n",
|
|||
|
"\n",
|
|||
|
" Leather interior Fuel type Engine volume Mileage Cylinders \\\n",
|
|||
|
"0 Yes Hybrid 3.5 186005 km 6.0 \n",
|
|||
|
"1 No Petrol 3 192000 km 6.0 \n",
|
|||
|
"2 No Petrol 1.3 200000 km 4.0 \n",
|
|||
|
"3 Yes Hybrid 2.5 168966 km 4.0 \n",
|
|||
|
"4 Yes Petrol 1.3 91901 km 4.0 \n",
|
|||
|
"\n",
|
|||
|
" Gear box type Drive wheels Doors Wheel Color Airbags \n",
|
|||
|
"0 Automatic 4x4 04-May Left wheel Silver 12 \n",
|
|||
|
"1 Tiptronic 4x4 04-May Left wheel Black 8 \n",
|
|||
|
"2 Variator Front 04-May Right-hand drive Black 2 \n",
|
|||
|
"3 Automatic 4x4 04-May Left wheel White 0 \n",
|
|||
|
"4 Automatic Front 04-May Left wheel Silver 4 \n",
|
|||
|
"Index(['ID', 'Price', 'Levy', 'Manufacturer', 'Model', 'Prod. year',\n",
|
|||
|
" 'Category', 'Leather interior', 'Fuel type', 'Engine volume', 'Mileage',\n",
|
|||
|
" 'Cylinders', 'Gear box type', 'Drive wheels', 'Doors', 'Wheel', 'Color',\n",
|
|||
|
" 'Airbags'],\n",
|
|||
|
" dtype='object')\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"import pandas as pd\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import seaborn as sns\n",
|
|||
|
"import sklearn\n",
|
|||
|
"from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score\n",
|
|||
|
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
|||
|
"from sklearn.compose import ColumnTransformer\n",
|
|||
|
"from sklearn.pipeline import Pipeline\n",
|
|||
|
"from sklearn.linear_model import LinearRegression, LogisticRegression\n",
|
|||
|
"from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier\n",
|
|||
|
"from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier\n",
|
|||
|
"from sklearn.metrics import mean_squared_error, f1_score, accuracy_score, roc_auc_score, confusion_matrix, classification_report\n",
|
|||
|
"df = pd.read_csv(\"./static/csv/car_price_prediction.csv\")\n",
|
|||
|
"print(df.head())\n",
|
|||
|
"print(df.columns)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Предобработка данных"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 2,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"ID 0\n",
|
|||
|
"Price 0\n",
|
|||
|
"Levy 0\n",
|
|||
|
"Manufacturer 0\n",
|
|||
|
"Model 0\n",
|
|||
|
"Prod. year 0\n",
|
|||
|
"Category 0\n",
|
|||
|
"Leather interior 0\n",
|
|||
|
"Fuel type 0\n",
|
|||
|
"Engine volume 0\n",
|
|||
|
"Mileage 0\n",
|
|||
|
"Cylinders 0\n",
|
|||
|
"Gear box type 0\n",
|
|||
|
"Drive wheels 0\n",
|
|||
|
"Doors 0\n",
|
|||
|
"Wheel 0\n",
|
|||
|
"Color 0\n",
|
|||
|
"Airbags 0\n",
|
|||
|
"dtype: int64\n",
|
|||
|
"object\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"C:\\Users\\Egor\\AppData\\Local\\Temp\\ipykernel_18436\\3209090058.py:21: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n",
|
|||
|
"The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n",
|
|||
|
"\n",
|
|||
|
"For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
" df['Levy'].fillna(df['Levy'].median(), inplace=True)\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\numpy\\lib\\_nanfunctions_impl.py:1241: RuntimeWarning: Mean of empty slice\n",
|
|||
|
" return np.nanmean(a, axis, out=out, keepdims=keepdims)\n",
|
|||
|
"C:\\Users\\Egor\\AppData\\Local\\Temp\\ipykernel_18436\\3209090058.py:22: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n",
|
|||
|
"The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n",
|
|||
|
"\n",
|
|||
|
"For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
" df['Mileage'].fillna(df['Mileage'].median(), inplace=True)\n",
|
|||
|
"C:\\Users\\Egor\\AppData\\Local\\Temp\\ipykernel_18436\\3209090058.py:23: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.\n",
|
|||
|
"The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.\n",
|
|||
|
"\n",
|
|||
|
"For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
" df['Engine volume'].fillna(df['Engine volume'].median(), inplace=True)\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Проверка наличия пропущенных значений\n",
|
|||
|
"print(df.isnull().sum())\n",
|
|||
|
"\n",
|
|||
|
"# Очистка столбца 'Levy' от нечисловых значений\n",
|
|||
|
"df['Levy'] = pd.to_numeric(df['Levy'], errors='coerce')\n",
|
|||
|
"\n",
|
|||
|
"# Очистка столбца 'Mileage' от нечисловых значений\n",
|
|||
|
"df['Mileage'] = pd.to_numeric(df['Mileage'], errors='coerce')\n",
|
|||
|
"\n",
|
|||
|
"# Проверка типа данных в столбце 'Engine volume'\n",
|
|||
|
"print(df['Engine volume'].dtype)\n",
|
|||
|
"\n",
|
|||
|
"# Если столбец 'Engine volume' не является строковым, преобразуем его в строку\n",
|
|||
|
"if df['Engine volume'].dtype != 'object':\n",
|
|||
|
" df['Engine volume'] = df['Engine volume'].astype(str)\n",
|
|||
|
"\n",
|
|||
|
"# Очистка столбца 'Engine volume' от нечисловых значений\n",
|
|||
|
"df['Engine volume'] = df['Engine volume'].str.replace(r'[^0-9.]', '', regex=True).astype(float)\n",
|
|||
|
"\n",
|
|||
|
"# Заполнение пропущенных значений\n",
|
|||
|
"df['Levy'].fillna(df['Levy'].median(), inplace=True)\n",
|
|||
|
"df['Mileage'].fillna(df['Mileage'].median(), inplace=True)\n",
|
|||
|
"df['Engine volume'].fillna(df['Engine volume'].median(), inplace=True)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Определение числовых и категориальных признаков\n",
|
|||
|
"numeric_features = ['Levy', 'Prod. year', 'Engine volume', 'Mileage', 'Cylinders', 'Airbags']\n",
|
|||
|
"categorical_features = ['Manufacturer', 'Model', 'Category', 'Leather interior', 'Fuel type', 'Gear box type', 'Drive wheels', 'Doors', 'Wheel', 'Color']\n",
|
|||
|
"\n",
|
|||
|
"# Преобразование категориальных признаков в числовые\n",
|
|||
|
"df = pd.get_dummies(df, columns=categorical_features, drop_first=True)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Разделение данных на тренировочный и тестовый наборы"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Задача регрессии\n",
|
|||
|
"X_reg = df.drop(['ID', 'Price'], axis=1)\n",
|
|||
|
"y_reg = df['Price']\n",
|
|||
|
"\n",
|
|||
|
"# Задача классификации\n",
|
|||
|
"df['Category'] = pd.cut(df['Price'], bins=[0, 10000, 20000, np.inf], labels=['Эконом', 'Средний', 'Премиум'])\n",
|
|||
|
"X_class = df.drop(['ID', 'Price', 'Category'], axis=1)\n",
|
|||
|
"y_class = df['Category']\n",
|
|||
|
"\n",
|
|||
|
"X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)\n",
|
|||
|
"X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(X_class, y_class, test_size=0.2, random_state=42)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### 5. Построение конвейера и обучение моделей\n",
|
|||
|
"#### 5.1. Задача регрессии"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"LinearRegression RMSE: 16981.208711977062\n",
|
|||
|
"DecisionTreeRegressor RMSE: 141914.29349587928\n",
|
|||
|
"RandomForestRegressor RMSE: 173537.46233609488\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Конвейер для задачи регрессии\n",
|
|||
|
"from sklearn.impute import SimpleImputer\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"numeric_transformer = Pipeline(steps=[\n",
|
|||
|
" ('imputer', SimpleImputer(strategy='median')),\n",
|
|||
|
" ('scaler', StandardScaler())\n",
|
|||
|
"])\n",
|
|||
|
"\n",
|
|||
|
"preprocessor_reg = ColumnTransformer(\n",
|
|||
|
" transformers=[\n",
|
|||
|
" ('num', numeric_transformer, numeric_features)\n",
|
|||
|
" ])\n",
|
|||
|
"\n",
|
|||
|
"pipeline_reg = Pipeline(steps=[\n",
|
|||
|
" ('preprocessor', preprocessor_reg),\n",
|
|||
|
" ('regressor', LinearRegression())\n",
|
|||
|
"])\n",
|
|||
|
"\n",
|
|||
|
"# Обучение моделей\n",
|
|||
|
"models_reg = {\n",
|
|||
|
" 'LinearRegression': LinearRegression(),\n",
|
|||
|
" 'DecisionTreeRegressor': DecisionTreeRegressor(),\n",
|
|||
|
" 'RandomForestRegressor': RandomForestRegressor()\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"for name, model in models_reg.items():\n",
|
|||
|
" pipeline_reg.set_params(regressor=model)\n",
|
|||
|
" pipeline_reg.fit(X_train_reg, y_train_reg)\n",
|
|||
|
" y_pred_reg = pipeline_reg.predict(X_test_reg)\n",
|
|||
|
" rmse = np.sqrt(mean_squared_error(y_test_reg, y_pred_reg))\n",
|
|||
|
" print(f'{name} RMSE: {rmse}')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### 5.2. Задача классификации"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 8,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"LogisticRegression F1-score: 0.48010296192139407\n",
|
|||
|
"DecisionTreeClassifier F1-score: 0.6836168013771631\n",
|
|||
|
"RandomForestClassifier F1-score: 0.6943295967769952\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Конвейер для задачи классификации\n",
|
|||
|
"preprocessor_class = ColumnTransformer(\n",
|
|||
|
" transformers=[\n",
|
|||
|
" ('num', numeric_transformer, numeric_features)\n",
|
|||
|
" ])\n",
|
|||
|
"\n",
|
|||
|
"pipeline_class = Pipeline(steps=[\n",
|
|||
|
" ('preprocessor', preprocessor_class),\n",
|
|||
|
" ('classifier', LogisticRegression())\n",
|
|||
|
"])\n",
|
|||
|
"\n",
|
|||
|
"# Обучение моделей\n",
|
|||
|
"models_class = {\n",
|
|||
|
" 'LogisticRegression': LogisticRegression(),\n",
|
|||
|
" 'DecisionTreeClassifier': DecisionTreeClassifier(),\n",
|
|||
|
" 'RandomForestClassifier': RandomForestClassifier()\n",
|
|||
|
"}\n",
|
|||
|
"\n",
|
|||
|
"for name, model in models_class.items():\n",
|
|||
|
" pipeline_class.set_params(classifier=model)\n",
|
|||
|
" pipeline_class.fit(X_train_class, y_train_class)\n",
|
|||
|
" y_pred_class = pipeline_class.predict(X_test_class)\n",
|
|||
|
" f1 = f1_score(y_test_class, y_pred_class, average='weighted')\n",
|
|||
|
" print(f'{name} F1-score: {f1}')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### 6. Оценка качества моделей"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 17,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"LinearRegression RMSE: 16981.208711977062, MAE: 11731.578355206166\n",
|
|||
|
"DecisionTreeRegressor RMSE: 141914.29349587928, MAE: 9887.588955657844\n",
|
|||
|
"RandomForestRegressor RMSE: 173537.46233609488, MAE: 12656.846663315797\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Оценка качества моделей регрессии\n",
|
|||
|
"import sklearn\n",
|
|||
|
"#from sklearn.base import r2_score #r2 = r2_score(y_test_reg, y_pred_reg)\n",
|
|||
|
"from sklearn.metrics import mean_absolute_error\n",
|
|||
|
"\n",
|
|||
|
"\n",
|
|||
|
"for name, model in models_reg.items():\n",
|
|||
|
" pipeline_reg.set_params(regressor=model)\n",
|
|||
|
" y_pred_reg = pipeline_reg.predict(X_test_reg)\n",
|
|||
|
" rmse = np.sqrt(mean_squared_error(y_test_reg, y_pred_reg))\n",
|
|||
|
" mae = mean_absolute_error(y_test_reg, y_pred_reg)\n",
|
|||
|
" \n",
|
|||
|
" print(f'{name} RMSE: {rmse}, MAE: {mae}')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 18,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"LogisticRegression F1-score: 0.48010296192139407, Accuracy: 0.502079002079002, ROC-AUC: 0.6953729054676709\n",
|
|||
|
"DecisionTreeClassifier F1-score: 0.6836168013771631, Accuracy: 0.6876299376299376, ROC-AUC: 0.8222065250497814\n",
|
|||
|
"RandomForestClassifier F1-score: 0.6943295967769952, Accuracy: 0.6993243243243243, ROC-AUC: 0.856645400908623\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Оценка качества моделей классификации\n",
|
|||
|
"for name, model in models_class.items():\n",
|
|||
|
" pipeline_class.set_params(classifier=model)\n",
|
|||
|
" y_pred_class = pipeline_class.predict(X_test_class)\n",
|
|||
|
" f1 = f1_score(y_test_class, y_pred_class, average='weighted')\n",
|
|||
|
" accuracy = accuracy_score(y_test_class, y_pred_class)\n",
|
|||
|
" roc_auc = roc_auc_score(y_test_class, pipeline_class.predict_proba(X_test_class), multi_class='ovr')\n",
|
|||
|
" print(f'{name} F1-score: {f1}, Accuracy: {accuracy}, ROC-AUC: {roc_auc}')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Выводы по пункту 6: Оценка качества моделей\n",
|
|||
|
"Задача регрессии\n",
|
|||
|
"Линейная регрессия (LinearRegression):\n",
|
|||
|
"\n",
|
|||
|
"RMSE: 16981.208711977062\n",
|
|||
|
"\n",
|
|||
|
"MAE: 11731.578355206166\n",
|
|||
|
"\n",
|
|||
|
"Вывод: Линейная регрессия показала относительно низкое качество предсказаний. RMSE и MAE достаточно высоки, что указывает на то, что модель плохо предсказывает цены автомобилей.\n",
|
|||
|
"\n",
|
|||
|
"Дерево решений (DecisionTreeRegressor):\n",
|
|||
|
"\n",
|
|||
|
"RMSE: 141914.29349587928\n",
|
|||
|
"\n",
|
|||
|
"MAE: 9887.588955657844\n",
|
|||
|
"\n",
|
|||
|
"Вывод: Дерево решений показало значительно более высокое значение RMSE по сравнению с линейной регрессией, что указывает на то, что модель сильно переобучилась. Однако MAE ниже, чем у линейной регрессии, что может указывать на то, что модель лучше предсказывает средние значения цен.\n",
|
|||
|
"\n",
|
|||
|
"Случайный лес (RandomForestRegressor):\n",
|
|||
|
"\n",
|
|||
|
"RMSE: 173537.46233609488\n",
|
|||
|
"\n",
|
|||
|
"MAE: 12656.846663315797\n",
|
|||
|
"\n",
|
|||
|
"Вывод: Случайный лес показал еще более высокое значение RMSE, что указывает на то, что модель также сильно переобучилась. MAE выше, чем у линейной регрессии, что говорит о том, что модель предсказывает цены хуже, чем линейная регрессия.\n",
|
|||
|
"\n",
|
|||
|
"Задача классификации\n",
|
|||
|
"Логистическая регрессия (LogisticRegression):\n",
|
|||
|
"\n",
|
|||
|
"F1-score: 0.48010296192139407\n",
|
|||
|
"\n",
|
|||
|
"Accuracy: 0.502079002079002\n",
|
|||
|
"\n",
|
|||
|
"ROC-AUC: 0.6953729054676709\n",
|
|||
|
"\n",
|
|||
|
"Вывод: Логистическая регрессия показала низкое качество классификации. F1-score и точность близки к 0.5, что указывает на то, что модель почти не лучше случайного угадывания. ROC-AUC также низкий, что говорит о плохой способности модели различать классы.\n",
|
|||
|
"\n",
|
|||
|
"Дерево решений (DecisionTreeClassifier):\n",
|
|||
|
"\n",
|
|||
|
"F1-score: 0.6836168013771631\n",
|
|||
|
"\n",
|
|||
|
"Accuracy: 0.6876299376299376\n",
|
|||
|
"\n",
|
|||
|
"ROC-AUC: 0.8222065250497814\n",
|
|||
|
"\n",
|
|||
|
"Вывод: Дерево решений показало значительно лучшее качество классификации по сравнению с логистической регрессией. F1-score и точность выше, а ROC-AUC значительно лучше, что указывает на то, что модель хорошо различает классы.\n",
|
|||
|
"\n",
|
|||
|
"Случайный лес (RandomForestClassifier):\n",
|
|||
|
"\n",
|
|||
|
"F1-score: 0.6943295967769952\n",
|
|||
|
"\n",
|
|||
|
"Accuracy: 0.6993243243243243\n",
|
|||
|
"\n",
|
|||
|
"ROC-AUC: 0.856645400908623\n",
|
|||
|
"\n",
|
|||
|
"Вывод: Случайный лес показал лучшее качество классификации среди всех моделей. F1-score, точность и ROC-AUC выше, чем у дерева решений, что указывает на то, что модель хорошо обобщает данные и различает классы.\n",
|
|||
|
"\n",
|
|||
|
"Общие выводы:\n",
|
|||
|
"Задача регрессии: Линейная регрессия показала лучшее качество предсказаний цен по сравнению с деревьями решений и случайным лесом, несмотря на высокие значения RMSE и MAE. Деревья решений и случайный лес показали сильное переобучение, что привело к очень высоким значениям RMSE.\n",
|
|||
|
"\n",
|
|||
|
"Задача классификации: Случайный лес показал лучшее качество классификации по сравнению с логистической регрессией и деревом решений. Логистическая регрессия показала низкое качество, в то время как дерево решений и случайный лес показали хорошие результаты, причем случайный лес показал наилучшие результаты"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 19,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"LinearRegression Cross-Validation RMSE: 100651.03159099314, Std: 161863.4449796077\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"DecisionTreeRegressor Cross-Validation RMSE: 194034.64594171714, Std: 136171.92328322295\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"RandomForestRegressor Cross-Validation RMSE: 181627.2578040142, Std: 137879.8905706371\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"LogisticRegression Cross-Validation F1-score: 0.4742308354293046, Std: 0.007525407236566359\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"DecisionTreeClassifier Cross-Validation F1-score: 0.6862381973987357, Std: 0.004587968007336983\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n",
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"RandomForestClassifier Cross-Validation F1-score: 0.692567227648008, Std: 0.004169193228958696\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"c:\\Users\\Egor\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\sklearn\\impute\\_base.py:598: UserWarning: Skipping features without any observed values: ['Mileage']. At least one non-missing value is needed for imputation with strategy='median'.\n",
|
|||
|
" warnings.warn(\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Оценка смещения и дисперсии для задачи регрессии\n",
|
|||
|
"for name, model in models_reg.items():\n",
|
|||
|
" pipeline_reg.set_params(regressor=model)\n",
|
|||
|
" scores = cross_val_score(pipeline_reg, X_reg, y_reg, cv=5, scoring='neg_mean_squared_error')\n",
|
|||
|
" rmse_scores = np.sqrt(-scores)\n",
|
|||
|
" print(f'{name} Cross-Validation RMSE: {rmse_scores.mean()}, Std: {rmse_scores.std()}')\n",
|
|||
|
"\n",
|
|||
|
"# Оценка смещения и дисперсии для задачи классификации\n",
|
|||
|
"for name, model in models_class.items():\n",
|
|||
|
" pipeline_class.set_params(classifier=model)\n",
|
|||
|
" scores = cross_val_score(pipeline_class, X_class, y_class, cv=5, scoring='f1_weighted')\n",
|
|||
|
" print(f'{name} Cross-Validation F1-score: {scores.mean()}, Std: {scores.std()}')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Оценка смещения и дисперсии моделей\n",
|
|||
|
"Задача регрессии\n",
|
|||
|
"Дерево решений (DecisionTreeRegressor):\n",
|
|||
|
"\n",
|
|||
|
"Cross-Validation RMSE: 194034.64594171714\n",
|
|||
|
"\n",
|
|||
|
"Std: 136171.92328322295\n",
|
|||
|
"\n",
|
|||
|
"Вывод: Дерево решений показало очень высокое значение RMSE при кросс-валидации, что указывает на сильное переобучение. Стандартное отклонение также высокое, что говорит о нестабильности модели. Это означает, что модель плохо обобщает данные и имеет высокую дисперсию.\n",
|
|||
|
"\n",
|
|||
|
"Случайный лес (RandomForestRegressor):\n",
|
|||
|
"\n",
|
|||
|
"Cross-Validation RMSE: 181627.2578040142\n",
|
|||
|
"\n",
|
|||
|
"Std: 137879.8905706371\n",
|
|||
|
"\n",
|
|||
|
"Вывод: Случайный лес также показал высокое значение RMSE при кросс-валидации, хотя и немного ниже, чем у дерева решений. Стандартное отклонение также высокое, что указывает на нестабильность модели. Это говорит о том, что модель также переобучена и имеет высокую дисперсию.\n",
|
|||
|
"\n",
|
|||
|
"Задача классификации\n",
|
|||
|
"Дерево решений (DecisionTreeClassifier):\n",
|
|||
|
"\n",
|
|||
|
"Cross-Validation F1-score: 0.6862381973987357\n",
|
|||
|
"\n",
|
|||
|
"Std: 0.004587968007336983\n",
|
|||
|
"\n",
|
|||
|
"Вывод: Дерево решений показало хороший F1-score при кросс-валидации, но стандартное отклонение относительно высокое. Это указывает на некоторую нестабильность модели, хотя и не такую высокую, как в случае регрессии. Модель имеет умеренную дисперсию.\n",
|
|||
|
"\n",
|
|||
|
"Случайный лес (RandomForestClassifier):\n",
|
|||
|
"\n",
|
|||
|
"Cross-Validation F1-score: 0.692567227648008\n",
|
|||
|
"\n",
|
|||
|
"Std: 0.004169193228958696\n",
|
|||
|
"\n",
|
|||
|
"#### Вывод: Случайный лес показал лучший F1-score при кросс-валидации по сравнению с деревом решений. Стандартное отклонение также ниже, что указывает на более стабильную модель. Это говорит о том, что случайный лес лучше обобщает данные и имеет меньшую дисперсию по сравнению с деревом решений.\n",
|
|||
|
"\n",
|
|||
|
"Общие выводы:\n",
|
|||
|
"Задача регрессии: И дерево решений, и случайный лес показали высокие значения RMSE и высокое стандартное отклонение при кросс-валидации. Это указывает на сильное переобучение и высокую дисперсию. Модели плохо обобщают данные и нестабильны.\n",
|
|||
|
"\n",
|
|||
|
"Задача классификации: Дерево решений показало хороший F1-score, но с высоким стандартным отклонением, что указывает на некоторую нестабильность. Случайный лес показал лучший F1-score и более низкое стандартное отклонение, что говорит о более стабильной и обобщающей способности модели."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 20,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA1QAAAIjCAYAAAAEMVqQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABdsElEQVR4nO3dd3wUdf7H8fem7GbTQxLASAiBRHpTBBWIBU5A9Cx4AupJsR+IniennkezIXee5x0WrKg/EVEBC5YTQUUQBaSJCAIGQQEhQLLpm+x+f3+ErCzpQ2ATeD0fjzwkM9+d+czMbvy+d2a+YzPGGAEAAAAA6iwo0AUAAAAAQGNFoAIAAAAAiwhUAAAAAGARgQoAAAAALCJQAQAAAIBFBCoAAAAAsIhABQAAAAAWEagAAAAAwCICFQAAAABYRKACgBOczWbT5MmTA11GwJ133nk677zzfL9v375dNptNL730UsBqOtKRNR4rDXHbAaCxIlABQB089dRTstls6tWrl+Vl7Nq1S5MnT9batWvrr7AG7rPPPpPNZvP9hIaGqnXr1rruuuv0448/Brq8Ovnyyy81efJkZWdnB6yGVq1a+e3Ppk2bqm/fvpo/f37AagKAk1VIoAsAgMZk1qxZatWqlVasWKGtW7cqLS2tzsvYtWuXpkyZolatWqlbt271X2QDNm7cOJ155pkqKSnR6tWr9eyzz+r999/Xt99+q6SkpONaS0pKigoLCxUaGlqn13355ZeaMmWKRo4cqdjY2GNTXC1069ZNf/nLXySVvaeeeeYZXXHFFXr66ad1yy23VPtaq9sOAKiIM1QAUEuZmZn68ssv9dhjjykxMVGzZs0KdEmNTt++fXXttddq1KhRmj59uh599FEdOHBAL7/8cpWvyc/PPya12Gw2hYWFKTg4+Jgs/1g79dRTde211+raa6/VX//6Vy1btkwRERH697//XeVrSktL5Xa7G/22A0BDQqACgFqaNWuW4uLiNHjwYF155ZVVBqrs7Gz9+c9/VqtWreRwONSiRQtdd911ysrK0meffaYzzzxTkjRq1CjfJVvl97K0atVKI0eOrLDMI++tcbvdmjhxos444wzFxMQoIiJCffv21aefflrn7fr1118VEhKiKVOmVJi3efNm2Ww2PfHEE5KkkpISTZkyRenp6QoLC1N8fLz69OmjhQsX1nm9knTBBRdIKgurkjR58mTZbDZt3LhRV199teLi4tSnTx9f+1dffVVnnHGGnE6nmjRpomHDhmnnzp0Vlvvss8+qTZs2cjqd6tmzp7744osKbaq6j2jTpk266qqrlJiYKKfTqbZt2+q+++7z1Td+/HhJUmpqqu/4bd++/ZjUWBfNmzdX+/btffuyfPseffRRPf7442rTpo0cDoc2btxoadvL/fLLLxo9erSaNWsmh8Ohjh076sUXX6xQz/Tp09WxY0eFh4crLi5OPXr00GuvvXZU2wgADRGX/AFALc2aNUtXXHGF7Ha7hg8frqefflorV670BSRJysvLU9++ffX9999r9OjROv3005WVlaV3331XP//8s9q3b6/7779fEydO1E033aS+fftKks4555w61eJyufT8889r+PDhuvHGG5Wbm6sXXnhBAwYM0IoVK+p0KWGzZs107rnn6o033tCkSZP85s2ZM0fBwcH6wx/+IKksUEydOlU33HCDevbsKZfLpVWrVmn16tX63e9+V6dtkKRt27ZJkuLj4/2m/+EPf1B6eroefvhhGWMkSQ899JAmTJigq666SjfccIP27dun6dOnKyMjQ2vWrPFdfvfCCy/o5ptv1jnnnKM77rhDP/74o37/+9+rSZMmSk5Orrae9evXq2/fvgoNDdVNN92kVq1aadu2bXrvvff00EMP6YorrtAPP/yg2bNn69///rcSEhIkSYmJicetxqqUlJRo586dFfblzJkzVVRUpJtuukkOh0NNmjSR1+ut87ZLZeH7rLPOks1m09ixY5WYmKgPP/xQ119/vVwul+644w5J0nPPPadx48bpyiuv1O23366ioiKtX79eX3/9ta6++mpL2wcADZYBANRo1apVRpJZuHChMcYYr9drWrRoYW6//Xa/dhMnTjSSzLx58yosw+v1GmOMWblypZFkZs6cWaFNSkqKGTFiRIXp5557rjn33HN9v5eWlpri4mK/NgcPHjTNmjUzo0eP9psuyUyaNKna7XvmmWeMJPPtt9/6Te/QoYO54IILfL937drVDB48uNplVebTTz81ksyLL75o9u3bZ3bt2mXef/9906pVK2Oz2czKlSuNMcZMmjTJSDLDhw/3e/327dtNcHCweeihh/ymf/vttyYkJMQ33e12m6ZNm5pu3br57Z9nn33WSPLbh5mZmRWOQ0ZGhomKijI//fST33rKj50xxvzzn/80kkxmZuYxr7EqKSkp5sILLzT79u0z+/btM+vWrTPDhg0zksxtt93mt33R0dFm7969fq+3uu3XX3+9OeWUU0xWVpZfm2HDhpmYmBhTUFBgjDHm0ksvNR07dqxxOwDgRMAlfwBQC7NmzVKzZs10/vnnSyq7/2bo0KF6/fXX5fF4fO3mzp2rrl276vLLL6+wDJvNVm/1BAcHy263S5K8Xq8OHDig0tJS9ejRQ6tXr67z8q644gqFhIRozpw5vmkbNmzQxo0bNXToUN+02NhYfffdd9qyZYulukePHq3ExEQlJSVp8ODBys/P18svv6wePXr4tTtyUIV58+bJ6/XqqquuUlZWlu+nefPmSk9P913quGrVKu3du1e33HKLb/9I0siRIxUTE1Ntbfv27dOSJUs0evRotWzZ0m9ebY7d8ajxcB9//LESExOVmJiorl276s0339Qf//hHTZs2za/dkCFDfGfQqlKbbTfGaO7cubrkkktkjPHbxgEDBignJ8f33ouNjdXPP/+slStX1np7AKCxOmEC1ZIlS3TJJZcoKSlJNptNb7/9dp2XYYzRo48+qtNOO00Oh0Onnnqq7zIHACcvj8ej119/Xeeff74yMzO1detWbd26Vb169dKvv/6qRYsW+dpu27ZNnTp1Oi51vfzyy+rSpYvvXqbExES9//77ysnJqfOyEhIS1K9fP73xxhu+aXPmzFFISIiuuOIK37T7779f2dnZOu2009S5c2eNHz9e69evr/V6Jk6cqIULF2rx4sVav369du3apT/+8Y8V2qWmpvr9vmXLFhljlJ6e7gsR5T/ff/+99u7dK0n66aefJEnp6el+ry8fpr065cO3Wz1+x6PGw/Xq1UsLFy7UJ598oi+//FJZWVl65ZVX5HQ6/doduS8rU5tt37dvn7Kzs/Xss89W2L5Ro0ZJkm8b7777bkVGRqpnz55KT0/XmDFjtGzZslpvGwA0JifMPVT5+fnq2rWrRo8e7fc//7q4/fbb9fHHH+vRRx9V586ddeDAAR04cKCeKwXQ2CxevFi7d+/W66+/rtdff73C/FmzZunCCy+sl3VVdSbE4/H4jcj26quvauTIkbrssss0fvx4NW3aVMHBwZo6darvvqS6GjZsmEaNGqW1a9eqW7dueuONN9SvXz/ffUKSlJGRoW3btumdd97Rxx9/rOeff17//ve/NWPGDN1www01rqNz587q379/je2ODAVer1c2m00ffvhhpSPTRUZG1mILj63jXWNCQoKlfWlV+X1X1157rUaMGFFpmy5dukiS2rdvr82bN2vBggX66KOPNHfuXD311FOaOHFipYOfAEBjdsIEqkGDBmnQoEFVzi8uLtZ9992n2bNnKzs7W506ddK0adN8o2Z9//33evrpp7Vhwwa1bdtWUu2+1QNw4ps1a5aaNm2qJ598ssK8efPmaf78+ZoxY4acTqfatGmjDRs2VLu86i4fi4uLq/SBsT/99JPf2Yu33npLrVu31rx58/yWd+SgEnVx2WWX6eabb/Zd9vfDDz/o3nvvrdCuSZMmGjVqlEaNGqW8vDxlZGRo8uTJtQpUVrVp00bGGKWmpuq0006rsl1KSoqksrNF5SMISmUDNmRmZqpr165VvrZ8/1o9fsejxmOlNtuemJioqKg
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 1000x600 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiQAAAHHCAYAAACPy0PBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8hTgPZAAAACXBIWXMAAA9hAAAPYQGoP6dpAABQBElEQVR4nO3deVhUZf8G8HuGZUB2RBhwQdxBzbUQd5NURMUtM30LzbQM3HClN/cUwx030rfUDM2yNLM0SVJSERXFLcVdNAVUBGQbljm/P/w5OYEd0DkcHO/Pe811xXOeOed75p3o5vucM6MQBEEAERERkYyUchdARERExEBCREREsmMgISIiItkxkBAREZHsGEiIiIhIdgwkREREJDsGEiIiIpIdAwkRERHJjoGEiIiIZMdAQiShS5cuoVu3brCzs4NCocCOHTsMuv/r169DoVBgw4YNBt3vi6xz587o3Lmz3GUQUTkxkJDRu3LlCj744APUqVMHFhYWsLW1Rbt27bB8+XLk5eVJeuzAwECcOXMG8+bNw6ZNm9C6dWtJj1eRhg0bBoVCAVtb21Jfx0uXLkGhUEChUGDRokXl3v/t27cxa9YsJCYmGqBaIqrsTOUugEhKP//8M958802oVCq8++67aNKkCQoKCnDw4EFMnjwZ586dw9q1ayU5dl5eHuLi4vDf//4XwcHBkhzD3d0deXl5MDMzk2T/YkxNTZGbm4uffvoJgwYN0tsWFRUFCwsL5OfnP9O+b9++jdmzZ6N27dpo3rx5mZ+3d+/eZzoeEcmLgYSM1rVr1zB48GC4u7sjJiYGrq6uum1BQUG4fPkyfv75Z8mOf/fuXQCAvb29ZMdQKBSwsLCQbP9iVCoV2rVrhy1btpQIJJs3b4a/vz++//77CqklNzcXVapUgbm5eYUcj4gMi0s2ZLTCw8ORnZ2NL774Qi+MPFavXj2MGzdO93NRURHmzp2LunXrQqVSoXbt2vj444+h0Wj0nle7dm306tULBw8exGuvvQYLCwvUqVMHX331lW7OrFmz4O7uDgCYPHkyFAoFateuDeDRUsfjf37SrFmzoFAo9Maio6PRvn172Nvbw9raGg0bNsTHH3+s2/60a0hiYmLQoUMHWFlZwd7eHgEBATh//nypx7t8+TKGDRsGe3t72NnZYfjw4cjNzX36C/sPQ4YMwe7du5GRkaEbO3bsGC5duoQhQ4aUmJ+eno5JkyahadOmsLa2hq2tLfz8/HDq1CndnP379+PVV18FAAwfPly39PP4PDt37owmTZogISEBHTt2RJUqVXSvyz+vIQkMDISFhUWJ8+/evTscHBxw+/btMp8rEUmHgYSM1k8//YQ6deqgbdu2ZZr//vvvY8aMGWjZsiWWLl2KTp06ISwsDIMHDy4x9/Llyxg4cCDeeOMNLF68GA4ODhg2bBjOnTsHAOjfvz+WLl0KAHj77bexadMmLFu2rFz1nzt3Dr169YJGo8GcOXOwePFi9OnTB4cOHfrX5/3222/o3r070tLSMGvWLISEhODw4cNo164drl+/XmL+oEGD8PDhQ4SFhWHQoEHYsGEDZs+eXeY6+/fvD4VCgR9++EE3tnnzZjRq1AgtW7YsMf/q1avYsWMHevXqhSVLlmDy5Mk4c+YMOnXqpAsHnp6emDNnDgBg1KhR2LRpEzZt2oSOHTvq9nP//n34+fmhefPmWLZsGbp06VJqfcuXL0e1atUQGBiI4uJiAMDnn3+OvXv3YsWKFXBzcyvzuRKRhAQiI5SZmSkAEAICAso0PzExUQAgvP/++3rjkyZNEgAIMTExujF3d3cBgBAbG6sbS0tLE1QqlTBx4kTd2LVr1wQAwsKFC/X2GRgYKLi7u5eoYebMmcKT/0ouXbpUACDcvXv3qXU/Psb69et1Y82bNxecnZ2F+/fv68ZOnTolKJVK4d133y1xvPfee09vn/369ROqVq361GM+eR5WVlaCIAjCwIEDha5duwqCIAjFxcWCWq0WZs+eXeprkJ+fLxQXF5c4D5VKJcyZM0c3duzYsRLn9linTp0EAEJkZGSp2zp16qQ39uuvvwoAhE8//VS4evWqYG1tLfTt21f0HImo4rBDQkYpKysLAGBjY1Om+b/88gsAICQkRG984sSJAFDiWhMvLy906NBB93O1atXQsGFDXL169Zlr/qfH1578+OOP0Gq1ZXrOnTt3kJiYiGHDhsHR0VE3/sorr+CNN97QneeTPvzwQ72fO3TogPv37+tew7IYMmQI9u/fj5SUFMTExCAlJaXU5Rrg0XUnSuWjXz3FxcW4f/++bjnqxIkTZT6mSqXC8OHDyzS3W7du+OCDDzBnzhz0798fFhYW+Pzzz8t8LCKSHgMJGSVbW1sAwMOHD8s0/8aNG1AqlahXr57euFqthr29PW7cuKE3XqtWrRL7cHBwwIMHD56x4pLeeusttGvXDu+//z5cXFwwePBgfPvtt/8aTh7X2bBhwxLbPD09ce/ePeTk5OiN//NcHBwcAKBc59KzZ0/Y2Nhg69atiIqKwquvvlritXxMq9Vi6dKlqF+/PlQqFZycnFCtWjWcPn0amZmZZT5m9erVy3UB66JFi+Do6IjExERERETA2dm5zM8lIukxkJBRsrW1hZubG86ePVuu5/3zotKnMTExKXVcEIRnPsbj6xses7S0RGxsLH777Te88847OH36NN566y288cYbJeY+j+c5l8dUKhX69++PjRs3Yvv27U/tjgDA/PnzERISgo4dO+Lrr7/Gr7/+iujoaDRu3LjMnSDg0etTHidPnkRaWhoA4MyZM+V6LhFJj4GEjFavXr1w5coVxMXFic51d3eHVqvFpUuX9MZTU1ORkZGhu2PGEBwcHPTuSHnsn10YAFAqlejatSuWLFmCP//8E/PmzUNMTAx+//33Uvf9uM6kpKQS2y5cuAAnJydYWVk93wk8xZAhQ3Dy5Ek8fPiw1AuBH9u2bRu6dOmCL774AoMHD0a3bt3g6+tb4jUpazgsi5ycHAwfPhxeXl4YNWoUwsPDcezYMYPtn4ieHwMJGa0pU6bAysoK77//PlJTU0tsv3LlCpYvXw7g0ZIDgBJ3wixZsgQA4O/vb7C66tati8zMTJw+fVo3dufOHWzfvl1vXnp6eonnPv6AsH/eivyYq6srmjdvjo0bN+r9B/7s2bPYu3ev7jyl0KVLF8ydOxcrV66EWq1+6jwTE5MS3ZfvvvsOf/31l97Y4+BUWngrr6lTpyI5ORkbN27EkiVLULt2bQQGBj71dSSiiscPRiOjVbduXWzevBlvvfUWPD099T6p9fDhw/juu+8wbNgwAECzZs0QGBiItWvXIiMjA506dcLRo0exceNG9O3b96m3lD6LwYMHY+rUqejXrx/Gjh2L3NxcrFmzBg0aNNC7qHPOnDmIjY2Fv78/3N3dkZaWhtWrV6NGjRpo3779U/e/cOFC+Pn5wcfHByNGjEBeXh5WrFgBOzs7zJo1y2Dn8U9KpRKffPKJ6LxevXphzpw5GD58ONq2bYszZ84gKioKderU0ZtXt25d2NvbIzIyEjY2NrCysoK3tzc8PDzKVVdMTAxWr16NmTNn6m5DXr9+PTp37ozp06cjPDy8XPsjIonIfJcPkeQuXrwojBw5Uqhdu7Zgbm4u2NjYCO3atRNWrFgh5Ofn6+YVFhYKs2fPFjw8PAQzMzOhZs2aQmhoqN4cQXh026+/v3+J4/zzdtOn3fYrCIKwd+9eoUmTJoK5ubnQsGFD4euvvy5x2+++ffuEgIAAwc3NTTA3Nxfc3NyEt99+W7h48WKJY/zz1tjffvtNaNeunWBpaSnY2toKvXv3Fv7880+9OY+P98/bitevXy8AEK5du/bU11QQ9G/7fZqn3fY7ceJEwdXVVbC0tBTatWsnxMXFlXq77o8//ih4eXkJpqameufZqVMnoXHjxqUe88n9ZGVlCe7u7kLLli2FwsJCvXkTJkwQlEqlEBcX96/nQEQVQyEI5bhyjYiIiEgCvIaEiIiIZMdAQkRERLJjICEiIiLZMZAQERGR7BhIiIiISHYMJERERCQ7BhIiIiKSnVF
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 640x480 with 2 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Визуализация результатов для задачи регрессии\n",
|
|||
|
"plt.figure(figsize=(10, 6))\n",
|
|||
|
"sns.scatterplot(x=y_test_reg, y=y_pred_reg)\n",
|
|||
|
"plt.xlabel('Actual Prices')\n",
|
|||
|
"plt.ylabel('Predicted Prices')\n",
|
|||
|
"plt.title('Actual vs Predicted Prices')\n",
|
|||
|
"plt.show()\n",
|
|||
|
"\n",
|
|||
|
"# Визуализация результатов для задачи классификации\n",
|
|||
|
"conf_matrix = confusion_matrix(y_test_class, y_pred_class)\n",
|
|||
|
"sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')\n",
|
|||
|
"plt.xlabel('Predicted')\n",
|
|||
|
"plt.ylabel('Actual')\n",
|
|||
|
"plt.title('Confusion Matrix')\n",
|
|||
|
"plt.show()"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"1) Плохое качество предсказаний в первой диаграмме: \n",
|
|||
|
"Модель регрессии плохо предсказывает цены автомобилей, так как точки на диаграмме рассеяния распределены хаотично и далеко от диагонали. Значительные ошибки: Ошибки предсказаний значительны, что указывает на то, что модель не может точно предсказать цены автомобилей. Необходимость улучшения модели: Для улучшения качества предсказаний стоит рассмотреть другие модели, такие как градиентный бустинг или нейронные сети, а также улучшить предобработку данных.\n",
|
|||
|
"\n",
|
|||
|
"2) Выводы по второй диаграмме:\n",
|
|||
|
"Хорошее качество классификации: Матрица ошибок показывает высокие значения на диагонали и низкие значения вне диагонали, что указывает на хорошее качество классификации.\n",
|
|||
|
"Правильные предсказания: Большинство предсказаний модели являются правильными, что говорит о ее способности хорошо различать классы.\n",
|
|||
|
"Низкие ошибки: Низкие значения вне диагонали указывают на то, что модель допускает мало ошибок при классификации."
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.12.5"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 2
|
|||
|
}
|